Project: SQL Jeopardy
Querying the York River Bookseller Database
  Project

In this project, you will work with an existing database, the York River Bookseller's Database (YRB DB). Actually, each will install his or her own private copy of the database on which to work. For SQL Jeopardy, you are to implement a number of SQL queries over the YRB DB. The project is 20% of the total grade.

There are ten queries that you are to write. Each is worth two point for a total of 20 points for the project. I am the manager and I provide you with English specifications for the queries that I want answered from the database. You are providing SQL implementations of these queries for YRB DB to do the job.

Each query is given a name for bookkeeping. You have successfully implemented the SQL query if you get the same result as I do when run in DB2 on YRB DB. Otherwise, your query is unsuccessful. Grading per query is all-or-nothing. Either your query produces the right results, or it does not.

The queries below are arranged roughly in order of increasing difficulty. The ones at the beginning are straightforward, and the ones towards the end are difficult. Do not get too discouraged if you are unable to do the final few.

Honesty: For projects, you are permitted to confer with others, seek advice, and (to a reasonable extent) help. However, remember that copying someone else's queries and claiming them as your own work is plagiarism. You must do your own work.

The TA will be grading the project so all questions related to the project should be directed to her (info on the course web page).

 
  The YRB Database

Two scripts are provided:

The script yrb-create will create the YRB DB schema for you. It will also populate the tables with mock data. The script yrb-drop is provided for convenience. It will drop your copy of YRB DB from your DB2 schema space. If you mess things up, you can always drop YRB DB and then re-create it easily.

To create the YRB DB in your DB2 schema space:

  • % db2 -tf yrb-create

Read the schema definition in yrb-create for YRB DB to understand the design and what the YRB DB is about.

York River Booksellers is an online bookstore. Customers belong to various clubs. Everybody belongs to at least the club basic. Books are available via different offers per club. Thus the price of a book is determined by which offer (thus club) it was bought under.

Consider an order to be all the books a customer bought at the same time (when in purchase). These will be shipped together.

 
  The Queries
1. one

Select distinct customers (by name) who buy books from club(s) whose name have 'A' as the second letter.?

Show customer's name.

Answer

2. two

Which pairs of customers purchased at least one book in common?

List distinct pairs of customers (by name).

Answer

3. three

Which club or clubs have the most members?

Show the club(s) and the number.

Answer

4. four

For each customer, find the total cost he or she has paid for books in one category. You don't need to consider qnty for this query, just can assume that qnty for every purchase is 1

Show the customers's cid and name, the category and the cost.

Answer

5. five

Select distinct customers (by name) who didn't purchase books from all the club(s) he or she belongs to.

Show the name of the customer.

Hint: for each customer in you answer table, there is at least one club that he or she had participated but didn't purchase from.

Answer

6. six

For each customer, show the category he or she spent more on the books of that category than on books of any other category?

Show customer name, the category and the cost.

Answer

7. seven

Find the books which are only offered by nine clubs.

Show the title and the year.

Answer

8. eight

Select the books, which are more expensive than the average price (across all clubs) of the most expensive books (for each club) written in English. If a club doesn't have books written in English, do not count.

Show the title, the year and the price of the book.

Answer

9. nine

Suppose all books shipped to the customer in purchases made on the same time (when) are shipped as one package. Remember, that customers are billed for the books and the postage.

For each customer, how much do they spend on each package? If the weight of one package is X grams, the entry just higher than (or equal to) X is found in the shipping table and the associated shipping price is the postage for this package.

Show the customer's cid, the day of the package and the cost, order by cid.

Answer

10. ten

Suppose all the books shipped to a customer purchased **in one club** made on the same **day** are shipped as one package. If the total price of one package is higher than (or equal to) 50 dollars, it's post-free. Otherwise, postage is extra.

For each customer, find the package that costs him the most.

Show customers's cid, the day of the package and the cost, order by cid.

Hint: don't forget about the quantity of each purchase! Also, '1998-1-1-17.18.00' is a string! So for DB2 to consider it as a date, you must cast it as well: cast('1998-1-1-17.18.00' as date).

Answer

 
  Deliverables

Due by midnight Nov 12.

For each query, write your query in a file with the corresponding name as above (one, ..., ten).

Do not include the answers generated by each of your queries. We will be testing your queries ourselves for grading your project.

When you are finished, use the submit command to turn in your work. The command will look something like:

% submit 3421 a1 one, ..., ten

 
  Hints & DB2 Comments

DB2 SQL provides a useful clause called with. It is easiest to show its use by example.

refund

Who has bought a book via some club that he or she could have gotten less expensively using another club to which he or she belongs? List the customer's name, the book's title and year, the quantity bought, the price for which it was bought, and the best price for which it could have been bought.

with
    best (cid, title, year, lowest) as (
        select distinct M.cid, O.title, O.year, min(price)
            from yrb_member M, yrb_purchase P, yrb_offer O
            where M.club = O.club and
                  M.cid = P.cid and
                  P.title = O.title and P.year = O.year
            group by M.cid, O.title, O.year
    )
select C.name, P.title, P.year, qnty, price, lowest
    from yrb_customer C, yrb_purchase P, Best B, yrb_offer O
    where P.cid = B.cid and P.title = B.title and
          P.year = B.year and P.title = O.title and
          P.year = O.year and P.club = O.club and
          C.cid = P.cid and
          O.price > B.lowest
    order by C.name, P.title, P.year;

Answer table for query refund.

One may have more tables in the with clause, if need be:

with
    first (...) as
        (...),
    second (...) as
        (...),
    third (...) as
        (...)
select ...
    ...;

Note that you can use the (temporary) view first inside the definition for second, and the views first and second inside the definition for third, and so forth.

DB2 is picky about nested SELECTS always being named via AS. For example,

select ...
    from (select ...) as A,
         (select ...) as B,
         ...
    ...;

Even if you do not need the alias names A or B later on, give the sub-queries (nested SELECTS) names anyway. Otherwise, DB2 will complain.