m.p. johnson, dbms, stern/nyu, spring 20081 c20.0046: database management systems lecture #11 m.p....
Post on 19-Dec-2015
244 views
TRANSCRIPT
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
1
C20.0046: Database Management SystemsLecture #11
M.P. Johnson
Stern School of Business, NYU
Spring, 2008
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
3
Acc(name,bal,type,…) Q2: Find holder of largest account of each type
Note:1. scope of variables
2. this can still be expressed as single SFW
SELECT name, typeFROM Acc a1WHERE bal >= ALL (SELECT bal FROM Acc WHERE type=a1.type)
SELECT name, typeFROM Acc a1WHERE bal >= ALL (SELECT bal FROM Acc WHERE type=a1.type)
Recall: correlated subqueries
correlation
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
4
New topic: Nulls in SQL If we don’t have a value, can put a NULL
Null can mean several things: Value does not exists Value exists but is unknown Value not applicable
But null is not the same as 0 See Douglas Foster Wallace…
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
5
Null Values x = NULL 4*(3-x)/7 = NULL x = NULL x + 3 – x = NULL x = NULL 3 + (x-x) = NULL x = NULL x = 'Joe' is UNKNOWN
In general: no row using null fields appear in the selection test will pass the test With one exception
Pace Boole, SQL has three boolean values: FALSE = 0 TRUE = 1 UNKNOWN = 0.5
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
6
Null values in boolean expressions C1 AND C2 = min(C1, C2) C1 OR C2 = max(C1, C2) NOT C1 = 1 – C1
height > 6 = UNKNOWN UNKNOWN OR weight > 190 = UNKOWN (age < 25) AND UNKNOWN = UNKNOWN
E.g.age=20height=NULLweight=180
SELECT *FROM PersonWHERE (age < 25) AND (height > 6 OR weight > 190)
SELECT *FROM PersonWHERE (age < 25) AND (height > 6 OR weight > 190)
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
7
Comparing null and non-nulls The schema specifies whether null is allowed for
each attribute NOT NULL to forbid Nulls are allowed by default
Unexpected behavior:
Some Persons are not included! The “trichotomy law” does not hold!
SELECT *FROM PersonWHERE age < 25 OR age >= 25
SELECT *FROM PersonWHERE age < 25 OR age >= 25
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
8
Testing for null values Can test for NULL explicitly:
x IS NULL x IS NOT NULL
But: x = NULL is never true
Now it includes all Persons
SELECT *FROM PersonWHERE age < 25 OR age >= 25 OR age IS NULL
SELECT *FROM PersonWHERE age < 25 OR age >= 25 OR age IS NULL
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
9
Null/logic review TRUE AND UNKNOWN = ?
TRUE OR UNKNOWN = ?
UNKNOWN OR UNKNOWN = ?
X = NULL = ?
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
10
Next: Outer join Like inner join except that dangling tuples are
included, padded with nulls
Left outerjoin: dangling tuples from left are included Nulls appear “on the right”
Right outerjoin: dangling tuples from right are included Nulls appear “on the left”
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
11
Cross join - example
Name Address Gender Birthdate
Hanks 123 Palm Rd M 01/01/60
Taylor 456 Maple Av F 02/02/40
Lucas 789 Oak St M 03/03/55
Name Address Networth
Spielberg 246 Palm Rd 10M
Taylor 456 Maple Av 20M
Lucas 789 Oak St 30M
MovieStar
MovieExec
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
12
Name Address G. Birthdate Name Address Net
Hanks 123 Palm Rd M 01/01/60
Taylor 456 Maple Av F 02/02/40 Taylor 456 Maple Av 20M
Lucas 789 Oak St M 03/03/55 Lucas 789 Oak St 30M
Spielberg 246 Palm Rd 10M
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
13
Outer Join - ExampleSELECT * FROM MovieStar LEFT OUTER JOIN MovieExec ON MovieStart.name=MovieExec.name
SELECT * FROM MovieStar LEFT OUTER JOIN MovieExec ON MovieStart.name=MovieExec.name
SELECT * FROM MovieStar RIGHT OUTER JOIN MovieExec ON MovieStart.name=MovieExec.name
SELECT * FROM MovieStar RIGHT OUTER JOIN MovieExec ON MovieStart.name=MovieExec.name
Name Address G. Birthdate Name Address Net
Hanks 123 Palm Rd M 01/01/60 Null Null Null
Taylor 456 Maple Av F 02/02/40 Taylor 456 Maple Av 20M
Lucas 789 Oak St M 03/03/55 Lucas 789 Oak St 30M
Null Null Null Null Spielberg 246 Palm Rd 10M
Name Address G. Birthdate Name Address Net
Hanks 123 Palm Rd M 01/01/60 Null Null Null
Taylor 456 Maple Av F 02/02/40 Taylor 456 Maple Av 20M
Lucas 789 Oak St M 03/03/55 Lucas 789 Oak St 30M
Null Null Null Null Spielberg 246 Palm Rd 10M
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
14
Outer Join - Example
Name Address Gender Birthdate
Hanks 123 Palm Rd M 01/01/60
Taylor 456 Maple Av F 02/02/40
Lucas 789 Oak St M 03/03/55
Name Address Networth
Spielberg 246 Palm Rd 10M
Taylor 456 Maple Av 20M
Lucas 789 Oak St 30M
MovieStar MovieExec
SELECT * FROM MovieStar FULL OUTER JOIN MovieExec ON MovieStart.name=MovieExec.name
SELECT * FROM MovieStar FULL OUTER JOIN MovieExec ON MovieStart.name=MovieExec.name
Name Address G. Birthdate Name Address Net
Hanks 123 Palm Rd M 01/01/60 Null Null Null
Taylor 456 Maple Av F 02/02/40 Taylor 456 Maple Av 20M
Lucas 789 Oak St M 03/03/55 Lucas 789 Oak St 30M
Null Null Null Null Spielberg 246 Palm Rd 10M
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
15
New-style outer joins Outer joins may be left, right, or full
FROM A LEFT [OUTER] JOIN B; FROM A RIGHT [OUTER] JOIN B; FROM A FULL [OUTER] JOIN B;
OUTER is optional If OUTER is included, then FULL is the default
Q: How to remember left v. right? A: It indicates the side whose rows are always
included
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
16
Next: Grouping & Aggregation In SQL:
aggregation operators in SELECT, Grouping in GROUP BY clause
Recall aggregation operators: sum, avg, min, max, count
strings, numbers, dates Each applies to scalars Count also applies to row: count(*) Can DISTINCT inside aggregation op: count(DISTINCT x)
Grouping: group rows that agree on single value Each group becomes one row in result
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
17
Aggregation functions Numerical: SUM, AVG, MIN, MAX Char: MIN, MAX
In lexocographic/alphabetic order Any attribute: COUNT
Number of values
SUM(B) = 10 AVG(A) = 1.5 MIN(A) = 1 MAX(A) = 3 COUNT(A) = 4
A B
1 2
3 4
1 2
1 2
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
18
Straight aggregation In R.A. sum(x)total(R) In SQL:
Just put the aggregation op in SELECT NB: aggreg. ops applied to each non-null val
count(x) counts the number of nun-null vals in field x Use count(*) to count the number of rows
SELECT SUM(x) totalFROM R
SELECT SUM(x) totalFROM R
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
19
Straight aggregation example COUNT applies to duplicates, unless otherwise stated:
Better:
Can we say:
same as Count(*), except excludes nulls
SELECT Count(category)FROM ProductWHERE year > 1995
SELECT Count(category)FROM ProductWHERE year > 1995
SELECT COUNT(DISTINCT category)FROM ProductWHERE year > 1995
SELECT COUNT(DISTINCT category)FROM ProductWHERE year > 1995
SELECT category, COUNT(category)FROM ProductWHERE year > 1995
SELECT category, COUNT(category)FROM ProductWHERE year > 1995
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
20
Straight aggregation example Purchase(product, date, price, quantity)
Q: Find total sales for the entire database:
Q: Find total sales of bagels:
SELECT SUM(price * quantity)FROM Purchase
SELECT SUM(price * quantity)FROM Purchase
SELECT SUM(price * quantity)FROM PurchaseWHERE product = 'bagel'
SELECT SUM(price * quantity)FROM PurchaseWHERE product = 'bagel'
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
21
Largest balance again Acc(name,bal,type) Q: Who has the largest balance? Q: Who has the largest balance of each
type?
Can we do these with aggregation functions?
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
22
Straight grouping Group rows together by field values Produces one row for each group
I.e., by each (combin. of) grouped val(s) Don’t select non-grouped fields
Reduces to DISTINCT selections:
SELECT productFROM PurchaseGROUP BY product
SELECT productFROM PurchaseGROUP BY product
SELECT DISTINCT productFROM Purchase
SELECT DISTINCT productFROM Purchase
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
23
Grouping & aggregation Sometimes want to group and compute
aggregations by group Aggregation op applied to rows in group, not to all rows in table
Purchase(product, date, price, quantity) Find total sales for products that sold for > 0.50:
SELECT product, SUM(price*quantity) totalFROM PurchaseWHERE price > .50GROUP BY product
SELECT product, SUM(price*quantity) totalFROM PurchaseWHERE price > .50GROUP BY product
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
24
Illustrated G&A example
Product Date Price Quantity
Bagel 10/21 0.85 15
Banana 10/22 0.52 7
Banana 10/19 0.52 17
Bagel 10/20 0.85 20
Purchase
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
25
Product Date Price Quantity
Banana 10/19 0.52 17
Banana 10/22 0.52 7
Bagel 10/20 0.85 20
Bagel 10/21 0.85 15
First compute the FROM-WHERE Then GROUP BY product:
Illustrated G&A example
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
26
Product TotalSales
Bagel $29.75
Banana $12.48
Finally, aggregate and select:
Illustrated G&A example
SELECT product, SUM(price*quantity) totalFROM PurchaseWHERW price > .50GROUP BY product
SELECT product, SUM(price*quantity) totalFROM PurchaseWHERW price > .50GROUP BY product
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
27
Illustrated G&A example GROUP BY may be reduced to (a possibly more
complicated) subquery:
SELECT product, SUM(price*quantity) totalFROM PurchaseWHERE price > .50GROUP BY product
SELECT product, SUM(price*quantity) totalFROM PurchaseWHERE price > .50GROUP BY product
SELECT DISTINCT x.product, (SELECT SUM(y.price*y.quantity) FROM Purchase y WHERE x.product = y.product AND y.price > .50) totalFROM Purchase xWHERE x.price > .50
SELECT DISTINCT x.product, (SELECT SUM(y.price*y.quantity) FROM Purchase y WHERE x.product = y.product AND y.price > .50) totalFROM Purchase xWHERE x.price > .50
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
28
For every product, what is the total sales and max quantity sold?
Product SumSales MaxQuantity
Banana $12.48 17
Bagel $29.75 20
Multiple aggregations
SELECT product, SUM(price * quantity) SumSales, MAX(quantity) MaxQuantityFROM PurchaseWHERE price > .50GROUP BY product
SELECT product, SUM(price * quantity) SumSales, MAX(quantity) MaxQuantityFROM PurchaseWHERE price > .50GROUP BY product
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
29
Another grouping/aggregation e.g. Movie(title, year, length, studioName)
Q: How many total minutes of film have been produced by each studio?
Strategy: Divide movies into groups per studio, then add lengths per group
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
30
Another grouping/aggregation e.g.
Title Year Length Studio
Star Wars 1977 120 Fox
Jedi 1980 105 Fox
Aviator 2004 800 Miramax
Pulp Fiction 1995 110 Miramax
Lost in Translation
2003 95 Universal
SELECT studio, sum(length) totalLengthFROM MoviesGROUP BY studio
SELECT studio, sum(length) totalLengthFROM MoviesGROUP BY studio
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
31
Another grouping/aggregation e.g.
Title Year Length Studio
Star Wars 1977 120 Fox
Jedi 1980 105 Fox
Aviator 2004 800 Miramax
Pulp Fiction 1995 110 Miramax
Lost in Translation
2003 95 Universal
SELECT studio, sum(length) lengthFROM MoviesGROUP BY studio
SELECT studio, sum(length) lengthFROM MoviesGROUP BY studio
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
32
Another grouping/aggregation e.g.
Title Year Length Studio
Star Wars 1977 120 Fox
Jedi 1980 105 Fox
Aviator 2004 800 Miramax
Pulp Fiction 1995 110 Miramax
Lost in Translation
2003 95 Universal
Studio Length
Fox 225
Miramax 910
Universal 95
SELECT studio, sum(length) totalLengthFROM MoviesGROUP BY studio
SELECT studio, sum(length) totalLengthFROM MoviesGROUP BY studio
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
33
Grouping/aggregation example StarsIn(SName,Title,Year) Q: Find the year of each star’s first movie
Q: Find the span of each star’s career Look up first and last movies
SELECT sname, min(year) firstyearFROM StarsInGROUP BY sname
SELECT sname, min(year) firstyearFROM StarsInGROUP BY sname
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
34
Account types again Acc(name,bal,type) Q: Who has the largest balance of each
type?
Can we do this with grouping/aggregation?
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
35
G & A for constructed relations Movie(title,year,producerSsn,length) MovieExec(name,ssn,netWorth)
Can do the same thing for larger, non-atomic relations Q: How many mins. of film did each producer make?
What happens to non-producer movie-execs?
SELECT name, sum(length) totalFROM Movie, MovieExecWHERE producerSsn = ssnGROUP BY name
SELECT name, sum(length) totalFROM Movie, MovieExecWHERE producerSsn = ssnGROUP BY name
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
36
HAVING clauses Sometimes want to limit which rows may be grouped Q: How many mins. of film did each rich producer
make? Rich = netWorth > 10000000
Q: Is HAVING necessary here? A: No, could just add rich req. to WHERE
SELECT name, sum(length) totalFROM Movie, MovieExecWHERE producerSsn = ssnGROUP BY nameHAVING netWorth > 10000000
SELECT name, sum(length) totalFROM Movie, MovieExecWHERE producerSsn = ssnGROUP BY nameHAVING netWorth > 10000000
M.P. Johnson, DBMS, Stern/NYU, Spring 2008
37
HAVING clauses Sometimes want to limit which rows may be
grouped Q: How many mins. of film did each rich producer
make? Old = made movies before 1930
Q: Is HAVING necessary here?
SELECT name, sum(length) totalFROM Movie, MovieExecWHERE producerSsn = ssnGROUP BY nameHAVING min(year) < 1930
SELECT name, sum(length) totalFROM Movie, MovieExecWHERE producerSsn = ssnGROUP BY nameHAVING min(year) < 1930