CSE-6421: Advanced Database Systems
York University
Winter 2016
Syllabus
Instructor: Parke Godfrey
Office: #2050 Lassonde
Office Hours: We 2–4pm
& by appointment / availability
Ph#: 416-736-2100 x66671
e-mail: godfrey@cse.yorku.ca
 
Term: winter 2015
Time: Tu & Th 5:30pm–7:00pm
Place: CC #335
Textbook: Ramakrishnan & Gerhke
Database Management Systems
Third Edition, 2003
McGraw Hill
ISBN: 0-07-246563-8
URL: http://www.cs.wisc.edu/~dbbook
Readings: Posted throughout term.
Class URL: http://www.cse.yorku.ca/course/6421/
  About the Course
Description (from the academic calendar)

This course provides an introduction to, and an in-depth study on, several new developments in database systems and intelligent information systems. Topics include: internet databases, data warehousing and OLAP, object-relational, object-oriented, and deductive databases.

Formal semantics of relational databases and systems, physical database tuning, advanced issues in query optimization and transaction processing, advanced database facilities such as triggers and materialized views, query caching, and database mediation. Current topics in database research and development.

Course Objectives & Content

The course is arranged in three parts.

  1. Logic
  2. System
  3. Beyond Relational
  4. Directions

First, we start at the theoretical end of databases to consider issues of semantics. This is necessary, if we are to address important questions as in the following.

  • What does a query mean; how do we know whether the answer table returned by the database system is correct?
  • How expressive is the query language? What types of queries are we unable to ask? Would recursion add anything? Would negation?
  • When are two queries equivalent?
  • When are two database schemas equivalent?
  • How computationally difficult are these questions to answer?

To address these, we study formal, logical models of databases and queries.

Next, we move to the systems end of databases and go “under the hood” of the system (a relational database management system, RDBMS) to understand how it works. We study how a RDBMS is constructed. We study the physical database, how the data is stored, organized, and efficiently accessed. We next study how SQL queries are efficiently evaluated. In particular, we look at the query optimizer which turns the SQL query into an access plan that the database system can execute. We review other functionalities that a database system needs to support such as transaction management.

We then look at alternative data models and query languages. We study their differences, their advantages and disadvantages, and what they offer.

Last, we move to a selection of topics found in current database research and development.

This course is meant to prepare students with a good operating knowledge of modern relational database systems, a knowledge of current research and development issues in databases, and a foundation for conducting research in this area. It is expected that students enter the course with some background in databases, namely an understanding of, and facility with, SQL and schema design, and have some hands-on experience with an RDBMS.

Readings

Reading and study material will be assigned from the course textbook, from relevant conference and journal papers, and lecture notes.

 
  Grading Criteria & Course Requirements
Components
What Percentage
Assignments 5 × 5%
Report 25%
Presentation 10%
Summary Log 10%
Participation 5%
Final Exam 25%
Assignments

There will be five assignments throughout the term, approximately one due every two weeks. These will be a mix of homework assignments related to the topics we discuss and small projects, such as to write queries or analyze query plans.

Report (“The Proposal”)

Each student will write a report. The report topic will be negotiated between the instructor and student. The report will be written as a proposal for research work to be done. (You will not actually do that research. Your work here is to identify a viable research problem in databases that should be addressed.) The task is to identify an unaddressed problem in databases which would be useful to address, to conduct a literature search with respect to the problem, and to propose a methodology by which the problem could potentially be addressed.

The report is to be written in conference-paper format in the LNCS format. It should have

  • an abstract,
  • an introduction which describes and motivates the problem at hand, and demonstrates the importance of the problem,
  • a related works section which demonstrates evidence that the problem is unresolved,
  • a methodologies section in which you propose how the problem might be successfully addressed,
  • conclusions, and
  • a proper bibliograpghy.

Milestones for the report will be:

  1. Topic meeting with instructor
  2. Initial Proposal (~1 page)
  3. Progress Review (meeting)
  4. Extended Abstract (~2-3 pages)
  5. Final Report (≤ 12 pages)

Presentation

You will make a presentation to the class on a given topic. It will be based usually upon two or three research papers. One of the papers will be assigned to the class as a whole to read before the talk. The talk will be for twenty minutes, with ten more minutes for a question & answer period and discussion.

The topic of a student's presentation typically will be closely related to his / her report topic, although this is not strictly necessary.

Talks will be graded in part by the class. The topic a student chooses for his or her talk can be related to his or her project, but it does not have to be. Talks will be scheduled for the last quarter of the semester (Directions).

The milestones for talks are:

  1. Choosing a talk topic. The student and instructor will negotiate the research papers to be read.
  2. The presentation itself.

Summary Log & Participation

Readings of research papers in addition to the text will be assigned, particularly during the Semantics and Topics parts of the course.

For each of the research papers assigned to the class as a whole, the student should

  • write a reasonable one to two paragraph summary of the paper, and
  • compose one question about the work that he or she feels is not well addressed in the paper.

These summaries can be turned in by e-mail to the instructor progressively through the term (up to the final due date).

Clearly, this component is simply a trick, er mechanism, to ensure that students stay somewhat current with the reading. Students may leave out up to three papers from their summary log with no penalty.

Class participation will be graded for 5% towards the final grade.

Final Exam

There will be a comprehensive final exam that contributes 25% towards the final grade.

 
  Schedule

The schedule may be revised as the term progresses. Textbook readings are indicated. Readings of research papers will also be assigned. Note that this schedule is tentative and is subject to change.

Wk# Day Topic Reading Due
#1 Tu 5 Jan Into the briar patch. Introduction. §1  
I. Logic: What does it mean?
  Th 7 Jan It's only logical. slides  
#2 Tu 12 Jan Logic Primer.    
  Th 14 Jan I refute that! §24  
#3 Tu 19 Jan Deductive databases, Datalog, & Prolog    
  Th 21 Jan Negation by any other name... A A1
#4 Tu 26 Jan Negation & non-monotonicity B  
  Th 28 Jan query containment, relational calculus §4.3, C R1
#5 Tu 2 Feb What was the question? §5, D  
  Th 4 Feb SQL E R2
II. System: How does it work?
#6 Tu 9 Feb the physical database & cost model §8, 9  
  Th 11 Feb indexes §10, 11 A2
Reading Week (13–19 February)
#7 Tu 23 Feb Psst. Want a join algorithm, cheap? §14  
  Th 25 Feb Implementing the algebra. F P1
#8 Tu 1 Mar I have a plan. §15, G  
  Th 3 Mar the query optimizer(s) H A3
III. Beyond Relational: What more is there?
#9 Tu 8 Mar XML & XQuery §27, I  
  Th 10 Mar RDF & SPARQL J R3
#10 Tu 15 Mar Scale up & Scale out (“Big Data”) K  
  Th 17 Mar PIG, et al. L A4
IV. Directions: Where to next?
#11 Tu 22 Mar   M, N P2
  Th 24 Mar   O, P R4
#12 Tu 29 Mar   Q, R P2
  Th 31 Mar   S, T A5
 
#13 ? Apr Final Exam R5
 

§x: Readings from the textbook.
Q: Provided reading assigned.
Ax: Assignment x due.
Rx: Report milestone due.
Px: Presentation milestone due.

 
  Policies
Academic Integrity / Honesty / Plagiarism

The Department of Electrical Engineering & Computer Science Academic Honesty Guidelines are in effect for this course, as, indeed, they are for any CS&E / EECS course.

Plagiarism is defined as taking the language, ideas, or thoughts of another, and representing them as your own. If you use someone else's ideas, cite them. If you use someone else's words, clearly mark them as a quotation. Note that plagiarism includes using another's computer programs or pieces of a program. All noted instances of plagiarism will be reported.

These policies are not intended to keep students from working with other students. One can learn much working with others, so this is to be encouraged. Should you encounter any situations for which you are uncertain whether the collaboration is permitted or not, please ask.