|
CSE-2041A
Net-Centric Computing
York University
Fall 2011
|
Lab #2: Databases
Intergalactic Data Speak
|
Posted: 20 September 2011
Revised: 2 October 2011
|
|
|
|
Use
the ij utility and your assigned username and password
to connect to the database at the URL
-
jdbc:derby://indigo.cse.yorku.ca:9999/CSE
To connect to the database,
once in the ij shell,
type
-
connect 'jdbc:derby://indigo.cse.yorku.ca:9999/CSE;user=cseNNNNN;password=XXXX';
where the NNNNN is the rest of your CSE account,
and the XXXX are the last four digits
of your York student number.
(This Derby account is not the same as your CSE account.
I just set it up to use the same names.)
The jdbc is the scheme that identifies
the protocol.
(This is like HTTP is for Web.)
The derby is the sub-scheme.
A URL most often does not have one of these,
but it may.
In this case,
it identifies that we are connecting to a Derby database system.
Then,
indigo.cse.yorku.ca
is the host,
the server machine where the service is running,
and the 9999 indicates the port
where it listens.
The path,
CSE ,
in this case,
indicates the database to which we are connecting
(that the Derby database system is managing).
Following the path are the parameters (attribute-value pairs)
that are passed to the service.
In this case, we are passing just two:
a user name
and a password
for that user.
In a URL,
the path and the parameters are usually
separated by a “? ”.
It is a syntactic quirk that they need to be separated,
in this case,
by a semicolon (“; ”),
instead.
The parameters are separated by semicolons,
which is usual.
In ij ,
remember to type the URL of the database to which you are connecting
in single quotes (“' ”), as above.
And end the command with a semicolon.
Every command in the ij
session needs to be ended with a semicolon.
If ij is just sitting there seemingly waiting
after you have hit return on a command,
expecting a response,
check that you typed the semicolon.
The nice thing is that your command can span a number of screen lines.
The command is not issued until the semicolon appears.
Next,
in your ij session,
issue the command
-
set schema roumani;
This sets the schema you use as default
to roumani .
A database schema is just a namespace
in a database,
used to group collections of related tables.
The table we are interested in exploring
is called sis ,
and is within schema roumani .
Issue
-
describe sis;
to see the format
(
called the schema of the table
—
sorry, yet another use of the word
“schema”!)
of the table sis .
You will be issuing SQL queries directly to the database system
from your ij session.
When you are finished with the session,
to disconnect from the database, do
-
disconnect;
To leave the session
—
the ij shell
—
say
-
exit;
|
|
|
|
See
SQL Tutorial
&
Another SQL Tutorial
for good introductions to SQL.
SQL is a standardized query language
for retrieving information from a relational database
(and for updating information in the database,
and manipulating databases too).
We will be querying
—
retrieving information
—
from a database (CSE ),
specifically from a table sis ,
which holds student records.
The basic syntax of an SQL query is
-
select list-of-columns
from list-of-tables
where list-of-conditions;
In the select clause,
we specify the columns
— the attributes —
we want in the answer table.
The shorthand of
“* ”
is used here if we want all the columns of the table(s) back.
In the from clause,
we specify which tables
the information is to draw from.
For our queries,
it will be just a single table: sis .
The where clause
specifies conditions that the rows from the table(s)
must satisfy to be returned in the answer table.
For instance,
GPA > 8.0
would find just students with a GPA above 8.
Conditions can be combined with
and,
or,
not, and
other logical operators,
to make complex conditions.
The where clause is optional;
if we leave it out, all the rows are returned.
SQL does not have to order the rows in the answer table
in any particular way.
Usually, we do want them ordered somehow,
so we can interpret the results more easily.
An order by column-list
can be added at the end of the query for this.
For example,
say we want to find
all computer science majors who have a GPA of more than 8.
Let's report the columns
ID ,
surname ,
givenName ,
GPA ,
and
city .
-
select ID, surname, givenName, GPA, city
from sis
where major = 'Computer Science' and GPA > 8.0
order by surname, givenName, ID;
Aggregation is when we want to consolidate results.
For instance,
we simply might wish to know how many students there are
from each city.
SQL accommodates this with aggregate functions
for return columns,
and with two more clauses,
group by
and
having .
-
select list-of-columns
from list-of-tables
where list-of-conditions
group by list-of-tables
having list-of-conditions.
The group by
lets us specify which are the grouping columns,
for which we want to know aggregate information.
In our example of number of students per city,
this is city .
The having clause is like the where clause.
It lets us filter out the aggregated answer rows
in the answer table by conditions.
Notice this could not be accomplished in the where clause!
Like the where clause,
the having clause is optional.
If we have nothing filter, we can leave it off.
If the group by clause is left off
but aggregate column functions are used
in the select
—
so it is an aggredate query
—
a single answer row is returned,
representing the aggregate over the whole input table.
Again, an order by clause can go at the end,
to order the answer rows.
So, the number of students per city?
-
select city, count(*) as students
from sis
group by city
order by city;
Notice the as with the name
after the aggregate-function column.
This is the convention to give a name to the column
in the answer table.
This is the tip of the iceberg for SQL.
There are many more types of complex queries possible.
For instance,
we can retrieve information from more than one table.
This usually involves joining information from each.
But this offers a good basis.
Also, anywhere a table name can appear in a query,
we can put another query instead!
Since the answer table to a query has the same form as any other table,
this works.
Such is called a sub-query,
and it just goes between
“( ”
&
“) ”.
(The SQL parser can be picky, though.
A sub-query in the from
usually ought to be named.
E.g., (...) as myTable .
Here is another useful construct for the lab.
The clause
-
fetch first k rows only;
can be added as the last clause
(after the order by ),
where k is a positive integer, like 5.
This only returns the first k rows of the answer table.
This is useful when just playing about,
and you do not want huge numbers of rows thrown back at you.
|
|
|
|
These
query exercises retrieve information from the database,
and report it in specified ways.
-
list
-
Retrieve all the records.
-
Do the same, but now
order by
major ,
surname ,
givenName .
-
selection
Retrieve the record for the student
Dante Golden
(givenName and
surname ,
respectively).
-
projection
List the majors (major ).
Show just
major .
Do not have duplicate rows!
Order by
major .
-
range
-
Retrieve the student records for students with
a surname between Go and Gu,
inclusive.
Just show
ID ,
surname ,
givenName ,
yearAdmitted .
Order by
surname ,
givenName ,
ID .
-
Do the same query,
but find instead the students whose surname 's
second letter is y.
-
compound conditions
-
Retrieve the students majoring in
Computer Science,
who entered in 2008 or before,
and who have a GPA of 6.5 or higher.
Show
surname ,
givenName ,
birthdate ,
GPA .
Order by
surname ,
givenName ,
ID .
-
Do the same query,
but find
Computer Science majors
who entered in 2008 or before,
or who have a GPA of 6.5 or higher.
|
|
|
|
These
query exercises aggregate information from the database,
to see patterns in the data.
-
full-set aggregation
-
How many students are in the table?
-
Do the same again, but name the return column
students .
-
group by
List per major the average GPA (avg_gpa ).
Show
major ,
avg_gpa .
Order by
major .
-
compound group by
List per major and year admitted the average GPA
(avg_gpa ).
Show
major ,
yearAdmitted ,
avg_gpa .
Order by
major ,
yearAdmitted .
-
having
List the major,
the number of students in the major (students ),
and the average GPA of the major (avg_gpa )
for majors with ten or more students.
Order by
major .
-
sub-query
Show the student with the highest GPA for each major.
Show
ID ,
surname ,
givenName ,
major ,
GPA .
Order by
major ,
ID .
Might more than one student be reported for a given major?
Why?
|
|
|
|
Send
and e-mail to godfrey@acm.org
with an SQL query that does the following.
-
Retrieves the student record that would
come just before yours,
if we were to insert a record for you in the table,
as ordered by surname.
Show
ID ,
surname ,
givenName ,
major .
If there were two or more
students with the same surname
that would be just before you,
it is fine to list both, or more, records.
E.g.,
for Godfrey, the returned results would be
-
ID |SURNAME |GIVENNAME |MAJOR
----------------------------------------------------------------------------------
200971224|Glover |Alexander |Business
200878456|Glover |Emma |Latin American & Caribbean Studies
-
For the e-mail's subject, say
- cse-2041 / lab #2 / cseXXXXX
where cseXXXXX is your CSE account name.
|
|
|
|