INFS614 - Lecture week 9 More on SQL Lecture Week 9 INFS 614, Fall 2008

Upload: rose-simon

Post on 24-Dec-2015


INFS614 - Lecture week 9 1

More on SQL

Lecture Week 9

INFS 614, Fall 2008

INFS614 - Lecture week 9 2

Example Instances

S1:
sid   sname    rating   age
22    dustin   7        45.0
31    lubber   8        55.5
58    rusty    10       35.0

S2:
sid   sname    rating   age
28    yuppy    9        35.0
31    lubber   8        55.5
44    guppy    5        35.0
58    rusty    10       35.0

R1:
sid   bid   day
22    101   10/10/96
58    103   11/12/96

INFS614 - Lecture week 9 3

SQL: Seen so far…

Syntax of a basic SQL query:
– Relation-list (FROM)
– Target-list (SELECT)
– Qualification (WHERE)

Semantics: Conceptual Evaluation Strategy.
String operators: LIKE, NOT LIKE.
Set operators: UNION [ALL], INTERSECT [ALL], EXCEPT [ALL].
Qualifications involving sets: IN, EXISTS, UNIQUE, ANY, ALL (and their negated versions).
Nested queries.

INFS614 - Lecture week 9 4

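Post Processing

Processing on the result of an SQL query:
– Sorting: can sort the tuples in the output by any column (even the ones not appearing in the SELECT clause)
– Duplicate removal
– Aggregation operators

Example:

SELECT DISTINCT S.sname
FROM Sailors S, Reserves R
WHERE S.sid = R.sid AND R.bid = 103
ORDER BY S.sid ASC;

(Some systems, e.g. PostgreSQL and Oracle, reject this particular combination: when DISTINCT is used, the ORDER BY columns must appear in the target list.)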

INFS614 - Lecture week 9 5

Order By: Example

• For each attribute in the sort list, we can specify its order:
  • ASC: ascending order
  • DESC: descending order
• The default order is ascending.

SELECT S.sname, S.age
FROM Sailors S
WHERE S.sname LIKE '%u%'
ORDER BY S.age, S.sname

Sailors instance:
sid   sname    rating   age
28    yuppy    9        35.0
31    lubber   8        55.0
32    andy     8        25.5
44    guppy    5        35.0
58    rusty    10       35.0
71    zorba    10       16.0

Qualifying tuples in the order given by ORDER BY S.age, S.sname:
sid   sname    rating   age
44    guppy    5        35.0
58    rusty    10       35.0
28    yuppy    9        35.0
31    lubber   8        55.0

Qualifying tuples in the order given by ORDER BY S.age ASC, S.sname DESC:
sid   sname    rating   age
28    yuppy    9        35.0
58    rusty    10       35.0
44    guppy    5        35.0
31    lubber   8        55.0
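The two sort orders can be checked with a minimal sketch that loads the slide's Sailors instance into an in-memory SQLite database through Python's sqlite3 module. SQLite is an assumption of this example (the course labs use Oracle), and the pattern is written in lower case because the names are stored in lower case.

```python
import sqlite3

# Sailors instance copied from the slide.
rows = [
    (28, "yuppy", 9, 35.0),
    (31, "lubber", 8, 55.0),
    (32, "andy", 8, 25.5),
    (44, "guppy", 5, 35.0),
    (58, "rusty", 10, 35.0),
    (71, "zorba", 10, 16.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL)"
)
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", rows)

base = "SELECT S.sname, S.age FROM Sailors S WHERE S.sname LIKE '%u%' "

# Default direction is ascending for every sort attribute.
default_order = conn.execute(base + "ORDER BY S.age, S.sname").fetchall()

# Ties on age are now broken by name in descending order.
mixed_order = conn.execute(base + "ORDER BY S.age ASC, S.sname DESC").fetchall()

print(default_order)
print(mixed_order)
```

Only the order of the three sailors aged 35.0 changes between the two runs, because the second sort attribute matters only when the first one ties.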

INFS614 - Lecture week 9 6

Aggregate Operators

Significant extension of relational algebra.

COUNT (*)
COUNT ( [DISTINCT] A)
SUM ( [DISTINCT] A)
AVG ( [DISTINCT] A)
MAX (A)
MIN (A)

A is a single column.

INFS614 - Lecture week 9 7

Aggregate Operators

SELECT COUNT (*)
FROM Sailors S

SELECT AVG (S.age)
FROM Sailors S
WHERE S.rating = 10

SELECT AVG (DISTINCT S.age)
FROM Sailors S
WHERE S.rating = 10

SELECT COUNT (DISTINCT S.rating)
FROM Sailors S
WHERE S.sname = 'Bob'

SELECT S.sname
FROM Sailors S
WHERE S.rating = (SELECT MAX (S2.rating)
                  FROM Sailors S2)
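As a hedged illustration of these operators, the sketch below runs a few of them in SQLite via Python's sqlite3 module. The ten-sailor instance and the `scalar` helper are assumptions of this example; the slide itself gives no data for these queries.

```python
import sqlite3

# Assumed Sailors instance (sid, sname, rating, age) for illustration only.
SAILORS = [
    (22, "dustin", 7, 45.0), (31, "lubber", 8, 55.0), (32, "andy", 8, 25.5),
    (71, "zorba", 10, 16.0), (64, "horatio", 7, 35.0), (29, "brutus", 1, 33.0),
    (58, "rusty", 10, 35.0), (74, "horatio", 9, 40.0), (85, "art", 3, 25.5),
    (95, "bob", 3, 63.5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL)"
)
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", SAILORS)


def scalar(sql):
    """Run a query that yields one row with one column and return that value."""
    return conn.execute(sql).fetchone()[0]


total = scalar("SELECT COUNT(*) FROM Sailors S")
avg_age_10 = scalar("SELECT AVG(S.age) FROM Sailors S WHERE S.rating = 10")
ages = scalar("SELECT COUNT(S.age) FROM Sailors S")
distinct_ages = scalar("SELECT COUNT(DISTINCT S.age) FROM Sailors S")

# Aggregate inside a nested query: names of the sailors with the top rating.
top_rated = [
    row[0]
    for row in conn.execute(
        "SELECT S.sname FROM Sailors S "
        "WHERE S.rating = (SELECT MAX(S2.rating) FROM Sailors S2) "
        "ORDER BY S.sname"
    )
]

print(total, avg_age_10, ages, distinct_ages, top_rated)
```

COUNT(S.age) and COUNT(DISTINCT S.age) differ here because two ages (25.5 and 35.0) each occur twice in the assumed instance.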

INFS614 - Lecture week 9 8

Find name and age of the oldest sailor(s)

First query:

SELECT S.sname, MAX (S.age)
FROM Sailors S

Second query:

SELECT S.sname, S.age
FROM Sailors S
WHERE S.age = (SELECT MAX (S2.age)
               FROM Sailors S2)

Third query:

SELECT S.sname, S.age
FROM Sailors S
WHERE (SELECT MAX (S2.age)
       FROM Sailors S2) = S.age

The first query is illegal: S.sname is neither aggregated nor grouped. (We’ll look into the reason a bit later, when we discuss GROUP BY.)

The third query is equivalent to the second query, and is allowed in the SQL/92 standard, but is not supported in some systems.
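A small sketch of the two legal formulations, run in SQLite through Python's sqlite3 module. The Sailors instance is assumed for illustration, and SQLite is only one example of a system that accepts the subquery on the left-hand side of the comparison.

```python
import sqlite3

# Assumed Sailors instance (sid, sname, rating, age) for illustration only.
SAILORS = [
    (22, "dustin", 7, 45.0), (31, "lubber", 8, 55.0), (32, "andy", 8, 25.5),
    (71, "zorba", 10, 16.0), (64, "horatio", 7, 35.0), (29, "brutus", 1, 33.0),
    (58, "rusty", 10, 35.0), (74, "horatio", 9, 40.0), (85, "art", 3, 25.5),
    (95, "bob", 3, 63.5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL)"
)
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", SAILORS)

# Subquery on the right of the comparison.
subquery_right = conn.execute(
    "SELECT S.sname, S.age FROM Sailors S "
    "WHERE S.age = (SELECT MAX(S2.age) FROM Sailors S2)"
).fetchall()

# Same condition with the subquery on the left.
subquery_left = conn.execute(
    "SELECT S.sname, S.age FROM Sailors S "
    "WHERE (SELECT MAX(S2.age) FROM Sailors S2) = S.age"
).fetchall()

print(subquery_right, subquery_left)
```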

INFS614 - Lecture week 9 9

GROUP BY and HAVING

So far, we’ve applied aggregate operators to all (qualifying) tuples. Sometimes, we want to apply them to each of several groups of tuples.

Consider: Find the age of the youngest sailor for each rating level.
– In general, we don’t know how many rating levels exist, and what the rating values for these levels are!
– Suppose we know that rating values go from 1 to 10; we can write 10 queries that look like this (!):

For i = 1, 2, ... , 10:

SELECT MIN (S.age)
FROM Sailors S
WHERE S.rating = i

INFS614 - Lecture week 9 10

Queries With GROUP BY and HAVING

SELECT [DISTINCT] target-list
FROM relation-list
WHERE qualification
GROUP BY grouping-list
[HAVING group-qualification]

The target-list contains (i) attribute names and (ii) terms with aggregate operations (e.g., MIN (S.age)).
– The attribute list (i) must be a subset of grouping-list.
– Intuitively, each answer tuple corresponds to a group, and these attributes must have a single value per group. (A group is a set of tuples that have the same value for all attributes in grouping-list.)

INFS614 - Lecture week 9 11

Conceptual Evaluation

The cross-product of relation-list is computed, tuples that fail qualification are discarded, `unnecessary’ fields are deleted, and the remaining tuples are partitioned into groups by the value of attributes in grouping-list.

The group-qualification is then applied to eliminate some groups. Expressions in group-qualification must have a single value per group!
– In effect, an attribute in group-qualification that is not an argument of an aggregate op also appears in grouping-list. (SQL does not exploit primary key semantics here!)

One answer tuple is generated per qualifying group.
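The evaluation steps above can be sketched directly in Python. This is a conceptual model only, not how a real system executes a query; the function name `evaluate`, its callback parameters, and the sample instance are all assumptions of this example, and the step that deletes unnecessary fields is omitted for brevity.

```python
from collections import defaultdict
from itertools import product

COLUMNS = ("sid", "sname", "rating", "age")
# Assumed Sailors instance for illustration only.
SAILORS = [dict(zip(COLUMNS, row)) for row in [
    (22, "dustin", 7, 45.0), (31, "lubber", 8, 55.0), (32, "andy", 8, 25.5),
    (71, "zorba", 10, 16.0), (64, "horatio", 7, 35.0), (29, "brutus", 1, 33.0),
    (58, "rusty", 10, 35.0), (74, "horatio", 9, 40.0), (85, "art", 3, 25.5),
    (95, "bob", 3, 63.5),
]]


def evaluate(relations, qualification, grouping_key, group_qualification, target):
    """Conceptual evaluation of SELECT ... FROM ... WHERE ... GROUP BY ... HAVING."""
    # 1. Cross-product of the relation-list.
    combined = product(*relations)
    # 2. Discard tuples that fail the qualification (WHERE).
    kept = [t for t in combined if qualification(t)]
    # 3. Partition the remaining tuples by the grouping-list values.
    groups = defaultdict(list)
    for t in kept:
        groups[grouping_key(t)].append(t)
    # 4. Drop groups that fail the group-qualification (HAVING);
    # 5. produce one answer tuple per qualifying group.
    return [
        target(key, members)
        for key, members in groups.items()
        if group_qualification(key, members)
    ]


# Youngest sailor older than 18, for each rating with at least 2 such sailors.
answer = sorted(evaluate(
    relations=[SAILORS],
    qualification=lambda t: t[0]["age"] > 18,
    grouping_key=lambda t: t[0]["rating"],
    group_qualification=lambda key, members: len(members) > 1,
    target=lambda key, members: (key, min(m[0]["age"] for m in members)),
))
print(answer)
```

Each element `t` is a tuple with one row per relation in the FROM list, so `t[0]` is the Sailors row; a join would simply pass more relations and inspect `t[1]`, `t[2]`, and so on.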

INFS614 - Lecture week 9 12

Query with GROUP BY and HAVING

Query: Find the age of the youngest sailor for each rating level.

SELECT S.rating, MIN (S.age)
FROM Sailors S
GROUP BY S.rating

INFS614 - Lecture week 9 13

Find the age of the youngest sailor older than 18, for each rating with at least 2 such sailors

SELECT S.rating, MIN (S.age)
FROM Sailors S
WHERE S.age > 18
GROUP BY S.rating
HAVING COUNT (*) > 1

• Only S.rating and S.age are mentioned in the SELECT, GROUP BY or HAVING clauses; other attributes are `unnecessary’.

• 2nd column of result is unnamed. (Use AS to name it.)

Sailors instance:
sid   sname     rating   age
22    dustin    7        45.0
31    lubber    8        55.0
32    andy      8        25.5
71    zorba     10       16.0
64    horatio   7        35.0
29    brutus    1        33.0
58    rusty     10       35.0
74    horatio   9        40.0
85    art       3        25.5
95    bob       3        63.5

After applying the WHERE clause (zorba, age 16.0, is discarded):
sid   sname     rating   age
22    dustin    7        45.0
31    lubber    8        55.0
32    andy      8        25.5
64    horatio   7        35.0
29    brutus    1        33.0
58    rusty     10       35.0
74    horatio   9        40.0
85    art       3        25.5
95    bob       3        63.5

After deleting the unnecessary fields and grouping by rating:
rating   age
1        33.0
3        25.5
3        63.5
7        45.0
7        35.0
8        55.0
8        25.5
9        40.0
10       35.0

Answer relation:
rating   (unnamed)
3        25.5
7        35.0
8        25.5
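The worked example can be reproduced with a short sketch in SQLite via Python's sqlite3 module (SQLite is an assumption here; the labs use Oracle). The alias `minage` and the added ORDER BY are choices of this example: the alias illustrates naming the second column with AS, and the ordering only makes the output deterministic.

```python
import sqlite3

# Sailors instance from the slide (sid, sname, rating, age).
SAILORS = [
    (22, "dustin", 7, 45.0), (31, "lubber", 8, 55.0), (32, "andy", 8, 25.5),
    (71, "zorba", 10, 16.0), (64, "horatio", 7, 35.0), (29, "brutus", 1, 33.0),
    (58, "rusty", 10, 35.0), (74, "horatio", 9, 40.0), (85, "art", 3, 25.5),
    (95, "bob", 3, 63.5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL)"
)
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", SAILORS)

cursor = conn.execute(
    """
    SELECT S.rating, MIN(S.age) AS minage
    FROM Sailors S
    WHERE S.age > 18
    GROUP BY S.rating
    HAVING COUNT(*) > 1
    ORDER BY S.rating
    """
)
# cursor.description holds one entry per output column; item 0 is the column name.
column_names = [col[0] for col in cursor.description]
answer = cursor.fetchall()

print(column_names)
print(answer)
```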

INFS614 - Lecture week 9 14

For each red boat, find the number of reservations for this boat

SELECT B.bid, COUNT (*) AS rescount
FROM Boats B, Reserves R
WHERE R.bid = B.bid AND B.color = 'red'
GROUP BY B.bid

Grouping over a join of two relations.

What do we get if we remove B.color = 'red' from the WHERE clause and add a HAVING clause with this condition?

INFS614 - Lecture week 9 15

Find the age of the youngest sailor with age > 18, for each rating with at least 2 sailors (of any age)

SELECT S.rating, MIN (S.age)
FROM Sailors S
WHERE S.age > 18
GROUP BY S.rating
HAVING 1 < (SELECT COUNT (*)
            FROM Sailors S2
            WHERE S.rating = S2.rating)

The subquery computes the total number of sailors who have rating equal to S.rating.

• The HAVING clause can also contain a correlated subquery.

• Compare this with the query where we considered only ratings with 2 sailors over 18!

• What if the HAVING clause is replaced by:
  – HAVING COUNT(*) > 1
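To make the difference between the two HAVING clauses concrete, the sketch below runs both variants in SQLite via Python's sqlite3 module. The Sailors instance is assumed for illustration; in it, rating 10 has two sailors but only one of them is older than 18.

```python
import sqlite3

# Assumed Sailors instance (sid, sname, rating, age) for illustration only.
SAILORS = [
    (22, "dustin", 7, 45.0), (31, "lubber", 8, 55.0), (32, "andy", 8, 25.5),
    (71, "zorba", 10, 16.0), (64, "horatio", 7, 35.0), (29, "brutus", 1, 33.0),
    (58, "rusty", 10, 35.0), (74, "horatio", 9, 40.0), (85, "art", 3, 25.5),
    (95, "bob", 3, 63.5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL)"
)
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", SAILORS)

# HAVING counts all sailors with this rating, regardless of age.
any_age = conn.execute(
    """
    SELECT S.rating, MIN(S.age)
    FROM Sailors S
    WHERE S.age > 18
    GROUP BY S.rating
    HAVING 1 < (SELECT COUNT(*) FROM Sailors S2 WHERE S.rating = S2.rating)
    ORDER BY S.rating
    """
).fetchall()

# HAVING counts only the tuples that survived the WHERE clause.
over_18_only = conn.execute(
    """
    SELECT S.rating, MIN(S.age)
    FROM Sailors S
    WHERE S.age > 18
    GROUP BY S.rating
    HAVING COUNT(*) > 1
    ORDER BY S.rating
    """
).fetchall()

print(any_age)
print(over_18_only)
```

Rating 10 appears only in the first result: the correlated subquery sees both sailors with that rating, while COUNT(*) in the second query sees only the one who passed the age filter.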

INFS614 - Lecture week 9 16

For each sailor with more than three reservations, find the number of his reservations

SELECT R.sid, COUNT (*) AS rescount
FROM Reserves R
GROUP BY R.sid
HAVING COUNT(*) > 3

Find the age of the youngest sailor, for each rating level > 5 with an average age > 25

SELECT S.rating, MIN(S.age) AS minAge
FROM Sailors S
WHERE S.rating > 5
GROUP BY S.rating
HAVING AVG(S.age) > 25

or, equivalently:

SELECT S.rating, MIN(S.age) AS minAge
FROM Sailors S
GROUP BY S.rating
HAVING S.rating > 5 AND AVG(S.age) > 25

INFS614 - Lecture week 9 17

Find those ratings for which the average age of sailors is the minimum over all ratings

Aggregate operations cannot be nested! WRONG:

SELECT S.rating
FROM Sailors S
WHERE AVG(S.age) = (SELECT MIN (AVG (S2.age))
                    FROM Sailors S2
                    GROUP BY S2.rating)

INFS614 - Lecture week 9 18

Continue from previous slide

Correct solution (in SQL/92):

SELECT Temp.rating, Temp.avgage
FROM (SELECT S.rating, AVG (S.age) AS avgage
      FROM Sailors S
      GROUP BY S.rating) AS Temp
WHERE Temp.avgage = (SELECT MIN (Temp.avgage)
                     FROM Temp)
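Many systems do not let a nested FROM clause refer to a derived-table alias such as Temp. The sketch below therefore swaps in a different technique, a common table expression (WITH), to express the same idea in SQLite via Python's sqlite3 module. The CTE form, the alias T2, and the Sailors instance are assumptions of this example, not part of the slide.

```python
import sqlite3

# Assumed Sailors instance (sid, sname, rating, age) for illustration only.
SAILORS = [
    (22, "dustin", 7, 45.0), (31, "lubber", 8, 55.0), (32, "andy", 8, 25.5),
    (71, "zorba", 10, 16.0), (64, "horatio", 7, 35.0), (29, "brutus", 1, 33.0),
    (58, "rusty", 10, 35.0), (74, "horatio", 9, 40.0), (85, "art", 3, 25.5),
    (95, "bob", 3, 63.5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL)"
)
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", SAILORS)

# Temp is computed once as a named intermediate relation, so both the outer
# query and the nested query can read from it.
query = """
WITH Temp AS (
    SELECT S.rating AS rating, AVG(S.age) AS avgage
    FROM Sailors S
    GROUP BY S.rating
)
SELECT Temp.rating, Temp.avgage
FROM Temp
WHERE Temp.avgage = (SELECT MIN(T2.avgage) FROM Temp T2)
"""
lowest_avg = conn.execute(query).fetchall()
print(lowest_avg)
```

The inner MIN is applied to the already-computed averages, which avoids nesting one aggregate inside another.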

INFS614 - Lecture week 9 19

Note on Oracle SQL in the labs

Do not use the keyword “AS” to name a temporary table within the FROM clause.

INFS614 - Lecture week 9 20

Note on Oracle SQL in the labs

In Oracle, write the query without AS in the FROM clause:

SELECT Temp.rating, Temp.avgage
FROM (SELECT S.rating, AVG (S.age) AS avgage
      FROM Sailors S
      GROUP BY S.rating) Temp
WHERE Temp.avgage = (SELECT MIN (Temp.avgage)
                     FROM Temp)

INFS614 - Lecture week 9 21

Conclusions (so far…)

Post processing on the result of queries is supported.

Aggregation is the most complex “post processing”:
– The “Group by” clause partitions the results into groups.
– The “Having” clause puts conditions on groups (just like the Where clause on tuples).

INFS614 - Lecture week 9 22

Null Values

Field values in a tuple are sometimes unknown (e.g., a rating has not been assigned) or inapplicable (e.g., no spouse’s name).

SQL provides a special value null for such situations.

INFS614 - Lecture week 9 23

Dealing with the null value

Special operators are needed to check if a value is/is not null.
– Attribute_name IS NULL: TRUE if the attribute’s value is null, FALSE otherwise.
– Attribute_name IS NOT NULL: TRUE if the attribute’s value is not null, FALSE otherwise.

Is rating > 8 true or false when rating is equal to null?
– Actually, it’s unknown.

How about: rating > 8 AND age < 40?
– Three-valued logic.

INFS614 - Lecture week 9 24

Three Valued Logic

Logical operators AND, OR and NOT using a 3-valued logic (TRUE, FALSE and unknown).
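The three-valued connectives can be sketched in a few lines of Python. Representing unknown as `None` and the helper names `and3`, `or3`, `not3` and `gt` are assumptions of this example; the truth values themselves follow the standard SQL rules (FALSE dominates AND, TRUE dominates OR, and NOT unknown stays unknown).

```python
UNKNOWN = None  # stands for SQL's "unknown" truth value


def and3(a, b):
    """Three-valued AND: FALSE if either side is FALSE, else unknown if either is unknown."""
    if a is False or b is False:
        return False
    if a is UNKNOWN or b is UNKNOWN:
        return UNKNOWN
    return True


def or3(a, b):
    """Three-valued OR: TRUE if either side is TRUE, else unknown if either is unknown."""
    if a is True or b is True:
        return True
    if a is UNKNOWN or b is UNKNOWN:
        return UNKNOWN
    return False


def not3(a):
    """Three-valued NOT: unknown stays unknown."""
    return UNKNOWN if a is UNKNOWN else (not a)


def gt(x, y):
    """x > y, where a null operand (None) makes the comparison unknown."""
    return UNKNOWN if x is None or y is None else x > y


def lt(x, y):
    """x < y, where a null operand (None) makes the comparison unknown."""
    return UNKNOWN if x is None or y is None else x < y


# rating > 8 AND age < 40 for a sailor whose rating is null.
young = and3(gt(None, 8), lt(35.0, 40))   # unknown AND true
old = and3(gt(None, 8), lt(45.0, 40))     # unknown AND false
print(young, old)
```

With a null rating, the condition is unknown for the 35-year-old but definitely false for the 45-year-old, since a false conjunct settles AND regardless of the other side.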

INFS614 - Lecture week 9 25

Null Values: Some issues

The presence of null complicates many issues:

The WHERE clause eliminates tuples for which the qualification does not evaluate to TRUE. This is important in nested queries involving EXISTS or UNIQUE.

Two tuples in SQL are duplicates in a relation if corresponding attributes have the same value or both contain null.

SELECT S1.sname
FROM Sailors S1
WHERE EXISTS (SELECT *
              FROM Sailors S2
              WHERE S1.sname = S2.sname
                AND S1.sid < S2.sid
                AND S1.rating <> S2.rating)

sid   sname     rating   age
22    dustin    7        45.0
31    lubber    8        55.0
64    horatio   null     35.0
32    andy      8        25.5
74    horatio   9        40.0
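A minimal sketch of this example in SQLite via Python's sqlite3 module, using the five-sailor instance from the slide (with the age 450 read as 45.0). The second run, where the null rating is replaced by 7, is an addition of this example to show the contrast.

```python
import sqlite3

# Instance from the slide; None is stored as SQL null.
SAILORS = [
    (22, "dustin", 7, 45.0),
    (31, "lubber", 8, 55.0),
    (64, "horatio", None, 35.0),
    (32, "andy", 8, 25.5),
    (74, "horatio", 9, 40.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL)"
)
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", SAILORS)

query = """
SELECT S1.sname
FROM Sailors S1
WHERE EXISTS (SELECT *
              FROM Sailors S2
              WHERE S1.sname = S2.sname
                AND S1.sid < S2.sid
                AND S1.rating <> S2.rating)
"""

# null <> 9 is unknown, so the inner WHERE keeps no tuple and EXISTS is false.
with_null = conn.execute(query).fetchall()

# Hypothetical change: give sailor 64 a known rating and run the query again.
conn.execute("UPDATE Sailors SET rating = 7 WHERE sid = 64")
without_null = conn.execute(query).fetchall()

print(with_null, without_null)
```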

INFS614 - Lecture week 9 26

Issues with the null value: Summary

WHERE and HAVING clauses eliminate rows (groups) for which the qualification does not evaluate to true (i.e., it evaluates to false or unknown).

Aggregate functions ignore null values (except count(*)).

Distinct treats all null values as the same.

The arithmetic operations +, -, *, / return null if one of the arguments is null.
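Each point in this summary can be observed on a tiny table. The sketch below uses SQLite via Python's sqlite3 module; the table T, its single column x, and its four rows are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical one-column table containing two nulls.
conn.execute("CREATE TABLE T (x INTEGER)")
conn.executemany("INSERT INTO T VALUES (?)", [(1,), (None,), (None,), (3,)])


def scalar(sql):
    """Run a query that yields one row with one column and return that value."""
    return conn.execute(sql).fetchone()[0]


count_star = scalar("SELECT COUNT(*) FROM T")   # counts rows, nulls included
count_x = scalar("SELECT COUNT(x) FROM T")      # ignores nulls
sum_x = scalar("SELECT SUM(x) FROM T")          # ignores nulls
avg_x = scalar("SELECT AVG(x) FROM T")          # averages the non-null values only

# DISTINCT collapses the two nulls into a single row.
distinct_rows = conn.execute("SELECT DISTINCT x FROM T").fetchall()

# x > 0 is unknown for the null rows, so WHERE drops them.
kept_by_where = scalar("SELECT COUNT(*) FROM T WHERE x > 0")

# Arithmetic with a null argument yields null (None in Python).
one_plus_null = scalar("SELECT 1 + NULL")

print(count_star, count_x, sum_x, avg_x, len(distinct_rows), kept_by_where, one_plus_null)
```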

INFS614 - Lecture week 9 27

Outer joins

Reserves instance:
sid   bid   day
22    101   10/10/96
58    103   11/12/96

Sailors instance:
sid   sname    rating   age
22    dustin   7        45.0
31    lubber   8        55.5
58    rusty    10       35.0

Sailors (left outer-join) Reserves =
sid   sname    rating   age    bid    day
22    dustin   7        45.0   101    10/10/96
31    lubber   8        55.5   Null   Null
58    rusty    10       35.0   103    11/12/96

INFS614 - Lecture week 9 28

In Oracle:

Select *
From Sailors S, Reserves R
Where S.sid = R.sid (+);

How about:

Select S.sid, count(R.bid)
From Sailors S, Reserves R
Where S.sid = R.sid (+)
Group by S.sid;

OR

Select S.sid, count(*)
From Sailors S, Reserves R
Where S.sid = R.sid (+)
Group by S.sid;
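The difference between count(R.bid) and count(*) over an outer join can be checked with a short sketch. SQLite does not accept Oracle's (+) notation, so this example swaps in the standard LEFT OUTER JOIN ... ON syntax; that substitution, the reduced Reserves schema, and the use of Python's sqlite3 module are assumptions of this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL)"
)
conn.execute("CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT)")
conn.executemany(
    "INSERT INTO Sailors VALUES (?, ?, ?, ?)",
    [(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)],
)
conn.executemany(
    "INSERT INTO Reserves VALUES (?, ?, ?)",
    [(22, 101, "10/10/96"), (58, 103, "11/12/96")],
)

join = "FROM Sailors S LEFT OUTER JOIN Reserves R ON S.sid = R.sid "

# count(R.bid) skips the null bid that pads sailor 31's row.
count_bid = conn.execute(
    "SELECT S.sid, COUNT(R.bid) " + join + "GROUP BY S.sid ORDER BY S.sid"
).fetchall()

# count(*) counts the padded row itself, so sailor 31 gets 1.
count_star = conn.execute(
    "SELECT S.sid, COUNT(*) " + join + "GROUP BY S.sid ORDER BY S.sid"
).fetchall()

print(count_bid)
print(count_star)
```

Only count(R.bid) reports zero reservations for lubber, which is usually the intended answer.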

INFS614 - Lecture week 9 29

More outer joins

Left outer join: + sign on the right in Oracle:
Select * from R, S where R.id = S.id (+)

Right outer join: + sign on the left in Oracle:
Select * from R, S where R.id (+) = S.id

Full outer join: cannot be written with the (+) notation in Oracle (the ANSI FULL OUTER JOIN syntax is accepted from Oracle9i onward).

INFS614 - Lecture week 9 30

More on value functions

Values can be transformed before being aggregated, e.g.:

SELECT sum(S.A/2) FROM S;

An interesting decode function (Oracle specific):
decode(value, if1, then1, if2, then2, …, else)

SELECT sum(decode(Major, 'INFS', 1, 0)) as No_IS_Stu,
       sum(decode(Major, 'INFS', 0, 1)) as No_NonIS_Stu
FROM Student;

Calculating GPA from letter grades?
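Outside Oracle, the same conditional counting is commonly written with a CASE expression. The sketch below uses CASE in SQLite via Python's sqlite3 module as a stand-in for decode; the substitution and the four hypothetical Student rows are assumptions of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (Ssn INTEGER PRIMARY KEY, S_Name TEXT, Major TEXT)")
# Hypothetical rows for illustration only.
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [(1, "ann", "INFS"), (2, "bob", "CS"), (3, "cat", "INFS"), (4, "dan", "MATH")],
)

# CASE plays the role of decode(Major, 'INFS', 1, 0): map each row to 1 or 0,
# then sum the mapped values.
no_is_stu, no_nonis_stu = conn.execute(
    """
    SELECT SUM(CASE WHEN Major = 'INFS' THEN 1 ELSE 0 END) AS No_IS_Stu,
           SUM(CASE WHEN Major = 'INFS' THEN 0 ELSE 1 END) AS No_NonIS_Stu
    FROM Student
    """
).fetchone()

print(no_is_stu, no_nonis_stu)
```

The same pattern extends to the GPA question: map each letter grade to its point value inside the aggregate and average the mapped values.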

INFS614 - Lecture week 9 31

Examples

Department (D-code, D-Name, Chair-SSn)
Course (D-code, C-no, Title, Units)
Prereq (D-code, C-no, P-code, P-no)
Class (Class-no, D-code, C-no, Instructor-SSn)
Faculty (Ssn, F-Name, D-Code, Rank)
Student (Ssn, S-Name, Major, Status)
Enrollment (Class-no, Student-Ssn)
Transcript (Student-Ssn, D-Code, C-no, Grade)

INFS614 - Lecture week 9 32

Query 1

List the classes (Class-no) currently taken by students whose names start with 'T'.

SELECT distinct e.Class-no
FROM Enrollment e, Student s
WHERE e.Student-Ssn = s.Ssn AND s.S-Name LIKE 'T%'


INFS614 - Lecture week 9 33

Query 2

List the students (SSN) who are currently taking exactly one class.

SELECT e.Student-Ssn
FROM Enrollment e
GROUP BY e.Student-Ssn
HAVING 1 = count(*)


INFS614 - Lecture week 9 34

Query 3

Give the percentage of the students (among all students) who are currently taking courses offered by ISE (D-code='ISE').

SELECT count(distinct e.Student-Ssn)/count(distinct s.Ssn) as Percent
FROM Enrollment e, Class c, Student s
WHERE e.Class-no = c.Class-no and c.D-code = 'ISE';


INFS614 - Lecture week 9 35

Query 4

List the faculty members (F-Name) who teach 2 or more classes. List these faculty members by the number of classes they teach (ascending order).

SELECT f.F-Name
FROM Faculty f, Class c
WHERE f.Ssn = c.Instructor-SSn
GROUP BY f.Ssn, f.F-Name
HAVING count(distinct c.Class-no) >= 2
ORDER BY count(distinct c.Class-no), F-Name;


INFS614 - Lecture week 9 36

Query 5

List the students (SSN and Name) along with the number of classes they are taking. If a student is not taking any class, the student should also be listed (with 0 as the number of classes he/she is taking). The list should be ordered by the number of classes (in an ascending order), and in case of a tie, by the SSN of the students.

SELECT s.Ssn, s.S-Name, count(distinct Class-no)
FROM Student s, Enrollment e
WHERE s.Ssn = e.Student-Ssn (+)
GROUP BY s.Ssn, s.S-Name
ORDER BY count(distinct Class-no), Ssn
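A runnable sketch of this query in SQLite via Python's sqlite3 module. Several things here are assumptions of the example rather than part of the slide: hyphens in attribute names are replaced by underscores so they parse as identifiers, Oracle's (+) is replaced by LEFT OUTER JOIN, and the Student and Enrollment rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (Ssn INTEGER PRIMARY KEY, S_Name TEXT)")
conn.execute("CREATE TABLE Enrollment (Class_no INTEGER, Student_Ssn INTEGER)")
# Hypothetical data: Tom (333) is not enrolled in any class.
conn.executemany(
    "INSERT INTO Student VALUES (?, ?)",
    [(111, "Ann"), (222, "Bob"), (333, "Tom")],
)
conn.executemany(
    "INSERT INTO Enrollment VALUES (?, ?)",
    [(1, 111), (2, 111), (1, 222)],
)

# The outer join keeps students without enrollments; COUNT over the padded
# null Class_no gives 0 for them.
class_counts = conn.execute(
    """
    SELECT s.Ssn, s.S_Name, COUNT(DISTINCT e.Class_no) AS n_classes
    FROM Student s LEFT OUTER JOIN Enrollment e ON s.Ssn = e.Student_Ssn
    GROUP BY s.Ssn, s.S_Name
    ORDER BY n_classes, s.Ssn
    """
).fetchall()

print(class_counts)
```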


INFS614 - Lecture week 9 37

Query 6

List the faculty members (F-Name) who teach more than twice as many classes as Professor Smith (F-Name='Smith') is teaching. (Note that if Professor Smith is not teaching anything, then any professor who teaches at least one class will satisfy the above query.)

SELECT f.F-Name
FROM Faculty f, Class c
WHERE f.Ssn = c.Instructor-SSn
GROUP BY f.Ssn, f.F-Name
HAVING count(distinct c.Class-no) > (SELECT 2*count(distinct Class-no)
                                     FROM Faculty f, Class c
                                     WHERE f.F-Name = 'Smith' and f.Ssn = c.Instructor-SSn)


INFS614 - Lecture week 9 38

Query 7

Find the number of departments which do not have a chairman (Chair-SSn is null).

SELECT count(d.D-code)
FROM Department d
WHERE d.Chair-SSn is NULL


INFS614 - Lecture week 9 39

Query 8

For each department (i.e., student’s Major), give the number of graduate students (status='Grad') and the number of other students (status <> 'Grad'). The two numbers must be shown in the same row as the department code. Hint: use decode.

SELECT Major,
       sum(decode(Status, 'Grad', 1, 0)) as Grad,
       sum(decode(Status, 'Grad', 0, 1)) as NoGrad
FROM Student
GROUP BY Major


INFS614 - Lecture week 9 40

Query 9

For each department (D-code), give the highest rank of the professors in the department along with the number of the faculty with that highest rank. The output contains one row for each department. Assume Full > Associate > Assistant, i.e., lexicographic order is fine. Note that some departments may not have professors in some ranks (e.g., a department may not have full, or associate, or assistant professors).

SELECT f.D-Code as dept, f.maxrank, count(e.Ssn) as num
FROM (SELECT D-code, max(Rank) as maxrank
      FROM Faculty
      GROUP BY D-code) AS f, Faculty e
WHERE f.D-code = e.D-code AND e.Rank = f.maxrank
GROUP BY f.D-code, f.maxrank
ORDER BY f.D-code