sql - lecture 3 (aggregation, etc.) - home | george …ksmits/cs550/sql3.pdf · infs614 3 sql: seen...

41
INFS614 1 SQL - Lecture 3 (Aggregation, etc.) INFS 614

Upload: truongtu

Post on 10-Sep-2018

223 views

Category:

Documents


0 download

TRANSCRIPT

INFS614 1

SQL - Lecture 3 (Aggregation, etc.)

INFS 614

INFS614 2

Example Instances

sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0sid sname rating age28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0

sid bid day22 101 10/10/9658 103 11/12/96

R1

S1

S2

INFS614 3

SQL: Seen so far… ❖  Syntax of Basic SQL query:

–  Relation-list (FROM) –  Target-list (SELECT) –  Qualification (WHERE)

❖  Semantic: Conceptual Evaluation Strategy; ❖  String operators: LIKE, NOT LIKE; ❖  Set operators: Union (all), Intersect (all), Except

(all); ❖  Qualification involving sets: IN, EXISTS,

UNIQUE, ANY, ALL (and their negated version); ❖  Nested queries.

INFS614 4

Post Processing ❖  Processes the result of an SQL query: ❖  1) Order by: sort the result tuples by any

column(s), (even columns not appearing in the SELECT clause).

❖  2) Distinct: Duplicate removal in result tuples ❖  Example:

❖  Aggregates ... (later in lecture)

SELECT Distinct S.sname FROM Sailors S, Reserves R WHERE S.sid=R.sid and R.bid=103 Order by S.sid asc;

INFS614 5

Order By: Example

•  Sort order: •  ASC : Ascending (default) •  DESC : Descending

•  Listed by priority •  E.g., ties in S.age ordered by

S.sname

SELECT S.sname, S.age FROM Sailors S WHERE S.sname LIKE ‘%U%’ ORDER BY S.age,S.sname

sid sname rating age 28 yuppy 9 35.0 31 lubber 8 55.0 32 andy 8 25.5 44 guppy 5 35.0 58 rusty 10 35.0 71 zorba 10 16.0

sid sname rating age 44 guppy 5 35.0 58 rusty 10 35.0 28 yuppy 9 35.0 31 lubber 8 55.0

S.age ASC, S.sname DESC

sid sname rating age 28 yuppy 9 35.0 58 rusty 10 35.0 44 guppy 5 35.0 31 lubber 8 55.0

INFS614 6

Aggregate Operators ❖  Significant extension of relational algebra.

COUNT (*) COUNT ( [DISTINCT] A) SUM ( [DISTINCT] A) AVG ( [DISTINCT] A) MAX (A) MIN (A)

single column

INFS614 7

Aggregate Operators COUNT (*) COUNT ( [DISTINCT] A) SUM ( [DISTINCT] A) AVG ( [DISTINCT] A) MAX (A) MIN (A)

SELECT AVG (S.age) FROM Sailors S WHERE S.rating=10

SELECT COUNT (*) FROM Sailors S

SELECT AVG ( DISTINCT S.age) FROM Sailors S WHERE S.rating=10

SELECT S.sname FROM Sailors S WHERE S.rating= (SELECT MAX(S2.rating) FROM Sailors S2)

single column SELECT COUNT (DISTINCT S.rating) FROM Sailors S WHERE S.sname=‘Bob’

INFS614 8

Find name and age of the oldest sailor(s)

❖  The first query is illegal! (We’ll look into the reason a bit later, when we discuss GROUP BY.)

❖  The third query is equivalent to the second query, and is allowed in the SQL/92 standard, but is not supported in many systems.

SELECT S.sname, MAX (S.age) FROM Sailors S

SELECT S.sname, S.age FROM Sailors S WHERE S.age = (SELECT MAX (S2.age) FROM Sailors S2)

SELECT S.sname, S.age FROM Sailors S WHERE (SELECT MAX (S2.age) FROM Sailors S2) = S.age

INFS614 9

GROUP BY and HAVING ❖  So far, we’ve applied aggregate operators to

all (qualifying) tuples. Sometimes, we want to apply them to each of several groups of tuples.

❖  Consider: Find the age of the youngest sailor for each rating level. –  In general, we don’t know how many rating levels

exist, and what the rating values for these levels are!

–  Suppose we know that rating values go from 1 to 10; we can write 10 queries that look like this (!):

SELECT MIN (S.age) FROM Sailors S WHERE S.rating = i

For i = 1, 2, ... , 10:

INFS614 10

Solution with GROUP BY

❖  Query: Find the age of the youngest sailor for each rating level.

❖  This query:

❖  1) Patitions Sailors S tuples into groups, according to their s.rating value

❖  2) Returns one tuple per partion including: a) the s.rating value, b) the minimum s.age for that partition.

SELECT S.rating, MIN (S.age) FROM Sailors S GROUP BY S.rating

INFS614 11

Queries With GROUP BY and HAVING

❖  The grouping-list forms a partition of tuples with the same value for attributes in the list (e.g., same s.rating value).

❖  The target-list consists of an (i) attribute list (e.g., s.rating) and (ii) aggregate terms (e.g., MIN (S.age)).

❖  The attribute list must be a subset of the grouping-list –  This is because each target-list attribute must have a single value

per group/partition!

SELECT [DISTINCT] target-list FROM relation-list WHERE qualification GROUP BY grouping-list [HAVING group-qualification]

INFS614 12

Conceptual Evaluation ❖  The cross-product of relation-list is computed, tuples that

fail qualification are discarded, `unnecessary’ fields are deleted.

❖  The remaining tuples are partitioned into groups by the value of attributes in grouping-list.

❖  The group-qualification is then applied to eliminate some groups.

–  Any attribute in group-qualification that is not an argument of an aggregate op must also appear in the grouping-list.

–  This is because expressions in group-qualification must have a single value per group!

❖  One answer tuple is generated per qualifying group.

INFS614 13

For each red boat, find the number of reservations for this boat

❖  Grouping over a join of two relations.

❖  What do we get if we remove B.color=‘red’ from the WHERE clause and add a HAVING clause with this condition? –  ... an error!

SELECT B.bid, COUNT (*) AS rescount FROM Boats B, Reserves R WHERE R.bid=B.bid AND B.color=‘red’ GROUP BY B.bid

(B.color is not in the grouping list)

INFS614 14

For each sailor with more than three reservations, find the number of his reservations

SELECT R.sid, COUNT (*) AS rescount FROM Reserves R GROUP BY R.sid HAVING COUNT(*) > 3

Find the age of the youngest sailor for each rating level > 5 which has an average age > 25

SELECT S.rating, MIN(S.age) AS minAge FROM Sailors S WHERE S.rating > 5 GROUP BY S.rating HAVING AVG(S.age) > 25

SELECT S.rating, MIN(S.age) AS minAge FROM Sailors S GROUP BY S.rating HAVING S.rating > 5 AND AVG(S.age) > 25

INFS614 15

SELECT S.rating, MIN (S.age) FROM Sailors S WHERE S.age > 18 GROUP BY S.rating HAVING COUNT (*) > 1

sid sname rating age 22 Dustin 7 450 31 lubber 8 55.0 32 Andy 8 25.5 71 Zorba 10 16.0 64 Horatio 7 35.0 29 brutus 1 33.0 58 rusty 10 35.0 74 Horatio 9 40.0 85 Art 3 25.5 95 Bob 3 63.5

sid sname rating age 22 dustin 7 45.0 31 lubber 8 55.0 32 andy 8 25.5 64 horatio 7 35.0 29 brutus 1 33.0 58 rusty 10 35.0 74 horatio 9 40.0 85 art 3 25.5 95 bob 3 63.5

Answer relation

rating 3 25.5 7 35.0 8 25.5

Find the age of the youngest sailor older than 18, for each rating with at least 2 (such) sailors older than 18

rating age 1 33.0 3 25.5 3 63.5 7 45.0 7 35.0 8 55.0 8 25.5 9 40.0 10 35.0

•  GROUP BY applied •  S.rating and S.age are

mentioned in the SELECT, GROUP BY, HAVING clauses; other attributes discarded

•  The 2nd column of result is unnamed. (Use AS to name it.)

INFS614 16

Find the age of the youngest sailor with age > 18, for each rating with at least 2 sailors (of any age!)

•  HAVING clause includes a correlated subquery. •  Counts partitions without an S.age>18 predicate •  What if HAVING clause is replaced by:

–  HAVING COUNT(*) >1 (previous query)? –  WHERE clause always applied before HAVING clause!!

SELECT S.rating, MIN (S.age) FROM Sailors S WHERE S.age > 18 GROUP BY S.rating HAVING 1 < (SELECT COUNT (*) FROM Sailors S2 WHERE S.rating=S2.rating)

Computes total number of sailors who have rating equal to S.rating

INFS614 17

Find the ratings with the minimum average sailor age.

❖  Aggregate operations cannot be nested! ❖  The following is WRONG:

SELECT S.rating FROM Sailors S WHERE AVG(S.age) = (SELECT MIN (AVG (S2.age))

FROM Sailors S2 GROUP BY S2.rating)

INFS614 18

Continue from previous slide ❖  Use nested FROM to compute a temp table

of average sailor age, for each rating level –  (A bit like relational algebra, or views!)

❖  Any query with a HAVING can be rewritten without a HAVING via the nested FROM

SELECT Temp.rating, Temp.avgage FROM (SELECT S.rating, AVG (S.age) AS avgage FROM Sailors S GROUP BY S.rating) AS Temp WHERE Temp.avgage = (SELECT MIN (Temp.avgage) FROM Temp)

INFS614 19

Note on Oracle SQL in the labs

Do not use the keyword “AS” to name a temporary table within the FROM clause.

INFS614 20

Note on Oracle SQL in the labs

SELECT Temp.rating, Temp.avgage FROM (SELECT S.rating, AVG (S.age) AS avgage FROM Sailors S GROUP BY S.rating) AS Temp WHERE Temp.avgage = (SELECT MIN (Temp.avgage) FROM Temp)

INFS614 21

Conclusions (so far…)

❖  Post processing on the result of queries is supported.

❖  Aggregation is the most complex “post processing” –  “Group by” clause partitions the results into

groups –  “Having” clause puts condition on groups (just

like Where clause on tuples).

INFS614 22

Null Values

❖  Field values in a tuple are sometimes unknown (e.g., a rating has not been assigned) or inapplicable (e.g., no spouse’s name).

❖  They can also be hidden (for security reasons)

❖  SQL provides a single special value null for all such situations.

INFS614 23

Dealing with the null value ❖  Special operators are needed to check if value is/

is not null. –  Attribute_name IS NULL : TRUE if the

attribute’s value is null. FALSE otherwise –  Attribute_name IS NOT NULL : TRUE if the

attribute’s value is not null. FALSE otherwise ❖  Is rating>8 true or false when rating is equal to

null? –  Actually, it’s unknown.

❖  How about: rating > 8 AND age < 40 ? –  Three-valued logic.

INFS614 24

Three Valued Logic ❖  Logical operators AND, OR and NOT using a 3-

valued logic (TRUE, FALSE and unknown).

NOT T F F T U U

AND T F U T T F U F F F F U U F U

OR T F U T T T T F T F U U T U U

INFS614 25

Null Values: Some issues The presence of null complicates many issues: ❖  WHERE clause eliminates tuples for which the qualification

does not evaluate to TRUE. Important in nested queries involving EXISTS or UNIQUE

❖  Two tuples in SQL are duplicates in a relation if corresponding attributes have the same value or both contains null.

SELECT S1.sname FROM SAILORS S1 WHERE EXISTS ( SELECT * FROM SAILORS S2 WHERE S1.sname = S2.sname AND AND S1.SID < S2.SID AND S1.RATING <> S2.RATING)

sid sname rating age 22 dustin 7 450 31 lubber 8 55.0 64 horatio null 35.0 32 andy 8 25.5 74 horatio 9 40.0

INFS614 26

Issues with the null value: Summary

❖  WHERE and HAVING clause eliminates rows (groups) for which the qualification does not evaluate to true (i.e., evaluate to false or unknown).

❖  Aggregate functions ignore null values (except count(*)).

❖  Distinct treats all null values as the same. ❖  The arithmetic operations +, -, *, / return

null if one of the arguments is null.

INFS614 27

Outer joins sid bid day22 101 10/10/9658 103 11/12/96

sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0

(left outer-join)

= sid sname rating age bid day 22 dustin 7 45.0 101 10/10/96 31 lubber 8 55.5 Null Null 58 rusty 10 35.0 103 11/12/96

Outer joins are generators of null values!

INFS614 28

Oracle Notation: Select * From Sailor S left/right/full outer join Reserves R on S.sid =

R.sid;

Select S.sid, count(R.bid) From Sailor S left outer join Reserves R on S.sid = R.sid; Group by s.sid;

Select S.sid, count(*) From Sailor S left outer join Reserves R on S.sid = R.sid; Group by s.sid;

What do these return??

INFS614 29

SELECT statement functions ❖  Values can be transformed before aggregation:

e.g.: SELECT sum(S.A/2) FROM S;

❖  The decode function (Oracle specific): decode(attribute, domain value1, output value1,

domain value2, output value2, …, default output): SELECT sum(decode(Major, ‘INFS’, 1, 0)) as Total_IS_Stu,

sum(decode(Major, ‘INFS’, 0, 1)) as Total_NonIS_Stu FROM Student ;

❖  Calculating GPA from letter grades?

INFS614 30

Examples Department (D-code, D-Name, Chair-SSn)

Course (D-code, C-no, Title, Units) Prereq (D-code, C-no, P-code, P-no) Class (Class-no, D-code, C-no, Instructor-SSn) Faculty (Ssn, F-Name, D-Code, Rank) Student (Ssn, S-Name, Major, Status) Enrollment (Class-no, Student-Ssn) Transcript (Student-Ssn, D-Code, C-no, Grade)

INFS614 31

Query 1

List the classes (Class-no) currently taken by students whose names start with 'T'.

SELECT distinct e.Class-no FROM Enrollment e, Student s WHERE e.Student-Ssn = s.Ssn AND s.S-Name LIKE 'T%'

Department (D-code, D-Name, Chair-SSn) Course (D-code, C-no, Title, Units) Prereq (D-code, C-no, P-code, P-no) Class (Class-no, D-code, C-no, Instructor-SSn) Faculty (Ssn, F-Name, D-Code, Rank) Student (Ssn, S-Name, Major, Status) Enrollment (Class-no, Student-Ssn) Transcript (Student-Ssn, D-Code, C-no, Grade)

INFS614 32

Query 2

List the students (SSN) who are currently taking exactly one class.

SELECT e.Student-Ssn FROM Enrollment e GROUP BY e.Student-Ssn HAVING 1=count(*)

Department (D-code, D-Name, Chair-SSn) Course (D-code, C-no, Title, Units) Prereq (D-code, C-no, P-code, P-no) Class (Class-no, D-code, C-no, Instructor-SSn) Faculty (Ssn, F-Name, D-Code, Rank) Student (Ssn, S-Name, Major, Status) Enrollment (Class-no, Student-Ssn) Transcript (Student-Ssn, D-Code, C-no, Grade)

INFS614 33

Query 3

Give the percentage of the students (among all students) who are currently taking courses offered by ISE (D-code='ISE').

SELECT count(distinct e.Student-Ssn)/count(distinct s.Ssn) *100.0 as Percent

FROM Enrollment e, Class c, Student s WHERE e.Class-no=c.Class-no and c.D-code='ISE';

Department (D-code, D-Name, Chair-SSn) Course (D-code, C-no, Title, Units) Prereq (D-code, C-no, P-code, P-no) Class (Class-no, D-code, C-no, Instructor-SSn) Faculty (Ssn, F-Name, D-Code, Rank) Student (Ssn, S-Name, Major, Status) Enrollment (Class-no, Student-Ssn) Transcript (Student-Ssn, D-Code, C-no, Grade)

INFS614 34

Query 4

List the faculty members (F-Name) who teach 2 or more classes. List these faculty members by the number of classes they teach (ascending order).

SELECT f.F-Name FROM Faculty f, Class c WHERE f.Ssn=c.Instructor-SSn GROUP BY f.Ssn, f.F-Name HAVING count(distinct c.Class-no)>=2 ORDER BY count(distinct c.Class-no), F-Name;

Department (D-code, D-Name, Chair-SSn) Course (D-code, C-no, Title, Units) Prereq (D-code, C-no, P-code, P-no) Class (Class-no, D-code, C-no, Instructor-SSn) Faculty (Ssn, F-Name, D-Code, Rank) Student (Ssn, S-Name, Major, Status) Enrollment (Class-no, Student-Ssn) Transcript (Student-Ssn, D-Code, C-no, Grade)

INFS614 35

Query 5

List the students (SSN and Name) along with the number of classes they are taking. If a student is not taking any class, the student should also be listed (with 0 as the number of classes he/she is taking). The list should be ordered by the number of classes (in an ascending order), and in case of a tie, by the SSN of the students.

SELECT s.Ssn, s.S-Name, count(distinct Class-no) FROM Student s, Enrollment e WHERE s.Ssn = e.Student-Ssn (+) GROUP BY s.Ssn, s.S-Name ORDER BY count(distinct Class-no), Ssn

Department (D-code, D-Name, Chair-SSn) Course (D-code, C-no, Title, Units) Prereq (D-code, C-no, P-code, P-no) Class (Class-no, D-code, C-no, Instructor-SSn) Faculty (Ssn, F-Name, D-Code, Rank) Student (Ssn, S-Name, Major, Status) Enrollment (Class-no, Student-Ssn) Transcript (Student-Ssn, D-Code, C-no, Grade)

INFS614 36

Query 6

List the faculty members (F-Name) who teach more than twice as many classes as Professor Smith (F-Name='Smith') is teaching. (Note that if Professor Smith is not teaching anything, then any professor who teaches at least one class will satisfy the query.)

SELECT f.F-Name FROM Faculty f, Class c WHERE f.Ssn=c.Instructor-SSn GROUP BY f.Ssn, f.F-Name HAVING count(distinct c.Class-no) > (SELECT 2*count(distinct c2.Class-no) FROM Faculty f2, Class c2 WHERE f2.F-Name='Smith' and f2.Ssn=c2.Instructor-SSn)

Department (D-code, D-Name, Chair-SSn) Course (D-code, C-no, Title, Units) Prereq (D-code, C-no, P-code, P-no) Class (Class-no, D-code, C-no, Instructor-SSn) Faculty (Ssn, F-Name, D-Code, Rank) Student (Ssn, S-Name, Major, Status) Enrollment (Class-no, Student-Ssn) Transcript (Student-Ssn, D-Code, C-no, Grade)

INFS614 37

Query 7

Find the number of departments which do not have a chairman (Chair_ssn is 'NULL').

SELECT count(d.D-code) FROM Department d WHERE d.Chair-SSn is NULL

Department (D-code, D-Name, Chair-SSn) Course (D-code, C-no, Title, Units) Prereq (D-code, C-no, P-code, P-no) Class (Class-no, D-code, C-no, Instructor-SSn) Faculty (Ssn, F-Name, D-Code, Rank) Student (Ssn, S-Name, Major, Status) Enrollment (Class-no, Student-Ssn) Transcript (Student-Ssn, D-Code, C-no, Grade)

INFS614 38

Query 8

For each department (i.e., student’s Major), give the number of graduate students (status='Grad') and the number of other students (status <> 'Grad'). The two numbers must be shown in the same row as the department code. Hint: use decode.

SELECT Major, sum(decode(Status, 'Grad', 1,0)) as Grad, sum(decode(Status, 'Grad', 0,1)) as NoGrad FROM Student GROUP BY Major

Department (D-code, D-Name, Chair-SSn) Course (D-code, C-no, Title, Units) Prereq (D-code, C-no, P-code, P-no) Class (Class-no, D-code, C-no, Instructor-SSn) Faculty (Ssn, F-Name, D-Code, Rank) Student (Ssn, S-Name, Major, Status) Enrollment (Class-no, Student-Ssn) Transcript (Student-Ssn, D-Code, C-no, Grade)

INFS614 39

Query 9 For each department (D-code), give the highest rank(s) of the professors in the department along with the number of the faculty with that highest rank. The output contains one row for each department. Assume Full>Associate>Assistant, i.e., lexicographic order is fine. Note that some department may not have professors in some ranks (e.g., a department may not have full, or associate or assistant professors).

SELECT d.D-Code as dept, d.maxrank, count(f.Ssn) as num FROM (SELECT D-code, max(Rank) as maxrank FROM Faculty

GROUP BY D-code) AS d, Faculty f WHERE d.D-code=f.D-code AND f.Rank=d.maxrank GROUP BY d.D-code, d.maxrank ORDER BY d.D-code

INFS614 40

Query 10 Student(snum: integer, sname: string, major: string, level: string, age: integer) Class(name: string, meets at: string, room: string, fid: integer)!Enrolled(snum: integer, cname: string)!Faculty(fid: integer, fname: string, deptid: integer)"

SELECT DISTINCT S.sname!FROM Student S!WHERE S.snum IN! (SELECT E1.snum! FROM Enrolled E1, Enrolled E2, Class C1, Class C2! WHERE E1.snum = E2.snum AND E1.cname <> E2.cname! AND E1.cname = C1.name AND E2.cname = C2.name! AND C1.meets at = C2.meets at)"

Find the names of all students who are enrolled in two classes that meet at the same time"

INFS614 41

Examples of Oracle’s Rollup/Cube

❖  https://oracle-base.com/articles/misc/rollup-cube-grouping-functions-and-grouping-sets