lecture 3 sql adv - github pages · introduction to sql lecture 2. 3. multi-table queries 6 lecture...

55
Lecture 2 (cont’d) & Lecture 3: Advanced SQL – Part I

Upload: phamxuyen

Post on 15-Jul-2018

221 views

Category:

Documents


0 download

TRANSCRIPT

Lecture2(cont’d)&Lecture3:AdvancedSQL– PartI

Announcements!

1. YoushouldbeJupyter notebookNinjas!

2. WelcomeTing!• NewTA-Officehoursonwebsite(roomtobeannounced)

3. Projectgroupsfinalized!• IfyoudonothaveagrouptalkwithusASAP!

4. ProblemSet#1released

2

Lecture2

Lecture2(cont’d)&Lecture3:AdvancedSQL– PartI

Lecture2

Today’sLecture

1. RecapfromLecture2&Multi-tablequeries• ACTIVITY:Multi-tablequeries

2. Setoperators&nestedqueries• ACTIVITY:Setoperatorsubtleties

4

Lecture2

Lecture2(cont’d):IntroductiontoSQL

Lecture2

3.Multi-tablequeries

6

Lecture2>Section3

Whatyouwilllearnaboutinthissection

1. PrimarykeysandForeignkeysrecap

2. Joins:SQLsemantics

3. ACTIVITY:Multi-tablequeries

7

Lecture2>Section3

8

KeysandForeignKeys

PName Price Category ManufacturerGizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorksSingleTouch $149.99 Photography CanonMultiTouch $203.99 Household Hitachi

Product

CompanyCName StockPrice Country

GizmoWorks 25 USACanon 65 JapanHitachi 15 Japan

Whatisaforeignkeyvs.akeyhere?

Lecture2>Section3>ForeignKeys

Akey isaminimalsubsetofattributesthatactsasauniqueidentifierfortuplesinarelation

Iftwotuplesagreeonthevaluesofthekey,thentheymustbethesame tuple!

9

KeysandForeignKeys

PName Price Category ManufacturerGizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorksSingleTouch $149.99 Photography CanonMultiTouch $203.99 Household Hitachi

Product

CompanyCName StockPrice Country

GizmoWorks 25 USACanon 65 JapanHitachi 15 Japan

Whatisaforeignkeyvs.akeyhere?

Lecture2>Section3>ForeignKeys

A foreignkey isanattribute(orcollectionofattributes)inonetablethatuniquelyidentifiesarowofanothertable.

The foreignkey is defined inasecondtable,butitreferstotheprimary key inthefirsttable.

DeclaringForeignKeys

Lecture2>Section3>ForeignKeys

Company(CName: string, StockPrice: float, Country: string)Product(PName: string, Price: float, Category: string, Manufacturer: string)

CREATE TABLE Product(pname VARCHAR(100),price FLOAT,category VARCHAR(100),manufacturer VARCHAR(100),PRIMARY KEY (pname, manufacturer),FOREIGN KEY (manufacturer) REFERENCES Company(cname)

)

Lecture2>Section3>ForeignKeys

Canwedothis?Whatwouldbetheproblem?

DeclaringForeignKeysCREATE TABLE Company(

cname VARCHAR(100),stockprice FLOAT,country VARCHAR(100),PRIMARY KEY (cname),FOREIGN KEY (cname) REFERENCES Product(pname, manufacturer)

)

CREATE TABLE Product(pname VARCHAR(100),price FLOAT,category VARCHAR(100),manufacturer VARCHAR(100),PRIMARY KEY (pname, manufacturer)

)

DeclaringForeignKeysLecture2>Section3>ForeignKeys

CREATE TABLE Company(cname VARCHAR(100),stockprice FLOAT,country VARCHAR(100),PRIMARY KEY (cname),FOREIGN KEY (cname) REFERENCES Product(pname, manufacturer)

)

CREATE TABLE Product(pname VARCHAR(100),price FLOAT,category VARCHAR(100),manufacturer VARCHAR(100),PRIMARY KEY (pname, manufacturer)

)

Wecanhaveproductswithoutaregisteredcompany!Baddesign!We’llseemorenextweek.

Canwedothis?Whatwouldbetheproblem?

Lecture2>Section3>ForeignKeys

DeclaringForeignKeysCREATE TABLE Company(

cname VARCHAR(100),stockprice FLOAT,country VARCHAR(100),PRIMARY KEY (cname),FOREIGN KEY (cname) REFERENCES Product(pname, manufacturer)

)

CREATE TABLE Product(pname VARCHAR(100),price FLOAT,category VARCHAR(100),manufacturer VARCHAR(100),PRIMARY KEY (pname, manufacturer)

)

Ifthe primarykey isasetofcolumns(a compositekey),thenthe foreignkey alsomustbeasetofcolumnsthatcorrespondstothe compositekey.

14

Joins

PName Price Category ManufGizmo $19 Gadgets GWorks

Powergizmo $29 Gadgets GWorks

SingleTouch $149 Photography Canon

MultiTouch $203 Household Hitachi

ProductCompany

Cname Stock CountryGWorks 25 USACanon 65 JapanHitachi 15 Japan

PName PriceSingleTouch $149.99

Lecture2>Section3>Joins:Basics

SELECT PName, PriceFROM Product, CompanyWHERE Manufacturer = CName

AND Country=‘Japan’AND Price <= 200

AnexampleofSQLsemantics

15

SELECT R.AFROM R, SWHERE R.A = S.B

A13

B C2 33 43 5

A B C1 2 31 3 41 3 53 2 33 3 43 3 5

CrossProduct

A B C3 3 43 3 5

A33

ApplyProjection

Lecture2>Section3>Joins:semantics

ApplySelections/Conditions

Output

Notethesemantics ofajoin

16

SELECT R.AFROM R, SWHERE R.A = S.B

Lecture2>Section3>Joins:semantics

Recall:Crossproduct(AXB)isthesetofalluniquetuplesinA,B

Ex:{a,b,c}X{1,2}={(a,1),(a,2),(b,1),(b,2),(c,1),(c,2)}

=Filtering!

=Returningonlysome attributes

Rememberingthisorderiscriticaltounderstandingtheoutputofcertainqueries(seelateron…)

1. Takecrossproduct:𝑋 = 𝑅×𝑆

2. Applyselections/conditions:𝑌 = 𝑟, 𝑠 ∈ 𝑋 𝑟. 𝐴 == 𝑟. 𝐵}

3. Applyprojections togetfinaloutput:𝑍 = (𝑦. 𝐴, )𝑓𝑜𝑟𝑦 ∈ 𝑌

Note:wesay“semantics”not“executionorder”

• Theprecedingslidesshowwhatajoinmeans

• NotactuallyhowtheDBMSexecutesitunderthecovers

Lecture2>Section3>Joins:semantics

18

ASubtletyaboutJoins

Findallcountriesthatmanufacturesomeproductinthe‘Gadgets’category.

SELECT CountryFROM Product, CompanyWHERE Manufacturer=CName AND Category=‘Gadgets’

Lecture2>Section3>ACTIVITYLecture2>Section3>Joins:semantics

Product(PName, Price, Category, Manufacturer)

Company(CName, StockPrice, Country)

19

AsubtletyaboutJoins

PName Price Category Manuf

Gizmo $19 Gadgets GWorks

Powergizmo $29 Gadgets GWorks

SingleTouch $149 Photography Canon

MultiTouch $203 Household Hitachi

Product CompanyCname Stock Country

GWorks 25 USA

Canon 65 Japan

Hitachi 15 Japan

Country??

SELECT CountryFROM Product, CompanyWHERE Manufacturer=Cname

AND Category=‘Gadgets’

Whatistheproblem?What’sthesolution?

Lecture2>Section3>ACTIVITYLecture2>Section3>Joins:semantics

ACTIVITY:Lecture-2-3.ipynb

20

Lecture2>Section3>ACTIVITY

21

SELECT DISTINCT R.AFROM R, S, TWHERE R.A=S.A OR R.A=T.A

AnUnintuitiveQuery

Whatdoesitcompute?

Lecture2>Section3>ACTIVITY

22

SELECT DISTINCT R.AFROM R, S, TWHERE R.A=S.A OR R.A=T.A

AnUnintuitiveQuery

ComputesRÇ (SÈ T)

ButwhatifS=f?

S T

R

Gobacktothesemantics!

Lecture2>Section3>ACTIVITY

23

SELECT DISTINCT R.AFROM R, S, TWHERE R.A=S.A OR R.A=T.A

AnUnintuitiveQuery

• Recallthesemantics!1. Takecross-product2. Applyselections /conditions3. Applyprojection

• IfS={},thenthecrossproductofR,S,T={},andthequeryresult={}!

Mustconsidersemanticshere.Aretheremoreexplicitwaytodosetoperationslikethis?

Lecture2>Section3>ACTIVITY

Lecture3:AdvancedSQL– PartI

1.SetOperators&NestedQueries

25

Lecture3>Section1

Whatyouwilllearnaboutinthissection

1. Multiset operatorsinSQL

2. Nestedqueries

3. ACTIVITY:Setoperatorsubtleties

26

Lecture3>Section1

27

SELECT DISTINCT R.AFROM R, S, TWHERE R.A=S.A OR R.A=T.A

AnUnintuitiveQuery

ComputesRÇ (SÈ T)

ButwhatifS=f?

Lecture3>Section1>SetOperators

S T

R

Gobacktothesemantics!

Whatdoesitcompute?

28

SELECT DISTINCT R.AFROM R, S, TWHERE R.A=S.A OR R.A=T.A

AnUnintuitiveQuery

Lecture3>Section1>SetOperators

• Recallthesemantics!1. Takecross-product2. Applyselections /conditions3. Applyprojection

• IfS={},thenthecrossproductofR,S,T={},andthequeryresult={}!

Mustconsidersemanticshere.Aretheremoreexplicitwaytodosetoperationslikethis?

29

SELECT DISTINCT R.AFROM R, S, TWHERE R.A=S.A OR R.A=T.A

WhatdoesthislooklikeinPython?

Lecture3>Section1>SetOperators

• Semantics:1. Takecross-product

2. Applyselections /conditions

3. Applyprojection

Joins/cross-products arejustnestedforloops (insimplestimplementation)!

If-thenstatements!

RÇ (SÈ T)

S T

R

30

SELECT DISTINCT R.AFROM R, S, TWHERE R.A=S.A OR R.A=T.A

WhatdoesthislooklikeinPython?

Lecture3>Section1>SetOperators

RÇ (SÈ T)

S T

R

output = {}

for r in R:for s in S:

for t in T:if r[‘A’] == s[‘A’] or r[‘A’] == t[‘A’]:

output.add(r[‘A’])return list(output)

CanyouseenowwhathappensifS=[]?

Multiset Operations

31

Lecture3>Section1>SetOperators

RecallMultisets

32

Lecture3>Section1>SetOperators

Tuple

(1,a)

(1,a)

(1, b)

(2,c)

(2,c)

(2,c)

(1,d)

(1,d)

Tuple 𝝀(𝑿)

(1,a) 2

(1,b) 1

(2,c) 3

(1, d) 2EquivalentRepresentationsofaMultiset

Multiset X

Multiset X

Note:Inasetallcountsare{0,1}.

𝝀 𝑿 =“CountoftupleinX”(Itemsnotlistedhaveimplicitcount0)

GeneralizingSetOperationstoMultisetOperations

33

Lecture3>Section1>SetOperators

Tuple 𝝀(𝑿)

(1,a) 2

(1,b) 0

(2,c) 3

(1, d) 0

Multiset X

Tuple 𝝀(𝒀)

(1,a) 5

(1,b) 1

(2,c) 2

(1, d) 2

Multiset Y

Tuple 𝝀(𝒁)

(1,a) 2

(1,b) 0

(2,c) 2

(1, d) 0

Multiset Z

∩ =

𝝀 𝒁 = 𝒎𝒊𝒏(𝝀 𝑿 , 𝝀 𝒀 )Forsets,thisisintersection

34

Lecture3>Section1>SetOperators

Tuple 𝝀(𝑿)

(1,a) 2

(1,b) 0

(2,c) 3

(1, d) 0

Multiset X

Tuple 𝝀(𝒀)

(1,a) 5

(1,b) 1

(2,c) 2

(1, d) 2

Multiset Y

Tuple 𝝀(𝒁)

(1,a) 5

(1,b) 1

(2,c) 3

(1, d) 2

Multiset Z

∪ =

𝝀 𝒁 = 𝒎𝒂𝒙(𝝀 𝑿 , 𝝀 𝒀 )Forsets,

thisisunion

GeneralizingSetOperationstoMultisetOperations

Multiset OperationsinSQL

35

Lecture3>Section1>SetOperators

ExplicitSetOperators:INTERSECT

36

SELECT R.AFROM R, SWHERE R.A=S.AINTERSECTSELECT R.AFROM R, TWHERE R.A=T.A

Lecture3>Section1>SetOperators

Q1 Q2

𝑟. 𝐴 𝑟. 𝐴 = 𝑠. 𝐴 ∩ 𝑟. 𝐴 𝑟. 𝐴 = 𝑡. 𝐴}

UNION

37

SELECT R.AFROM R, SWHERE R.A=S.AUNIONSELECT R.AFROM R, TWHERE R.A=T.A

Lecture3>Section1>SetOperators

Q1 Q2

𝑟. 𝐴 𝑟. 𝐴 = 𝑠. 𝐴 ∪ 𝑟. 𝐴 𝑟. 𝐴 = 𝑡. 𝐴}

Whyaren’tthereduplicates?

Whatifwewantduplicates?

UNIONALL

38

SELECT R.AFROM R, SWHERE R.A=S.AUNION ALLSELECT R.AFROM R, TWHERE R.A=T.A

Lecture3>Section1>SetOperators

Q1 Q2

𝑟. 𝐴 𝑟. 𝐴 = 𝑠. 𝐴 ∪ 𝑟. 𝐴 𝑟. 𝐴 = 𝑡. 𝐴}

ALLindicatestheMultisetdisjointunionoperation

39

Lecture3>Section1>SetOperators

Tuple 𝝀(𝑿)

(1,a) 2

(1,b) 0

(2,c) 3

(1, d) 0

Multiset X

Tuple 𝝀(𝒀)

(1,a) 5

(1,b) 1

(2,c) 2

(1, d) 2

Multiset Y

Tuple 𝝀(𝒁)

(1,a) 7

(1,b) 1

(2,c) 5

(1, d) 2

Multiset Z

=

𝝀 𝒁 = 𝝀 𝑿 + 𝝀 𝒀Forsets,

thisisdisjointunion

GeneralizingSetOperationstoMultisetOperations

t

EXCEPT

40

SELECT R.AFROM R, SWHERE R.A=S.AEXCEPTSELECT R.AFROM R, TWHERE R.A=T.A

Lecture3>Section1>SetOperators

Q1 Q2

𝑟. 𝐴 𝑟. 𝐴 = 𝑠. 𝐴 \{𝑟. 𝐴|𝑟. 𝐴 = 𝑡. 𝐴}

Whatisthemultiset version?

𝝀 𝒁 = 𝝀 𝑿 − 𝝀 𝒀ForelementsthatareinX

INTERSECT:Stillsomesubtleproblems…

41

Company(name, hq_city)Product(pname, maker, factory_loc)

SELECT hq_cityFROM Company, ProductWHERE maker = name

AND factory_loc = ‘US’INTERSECTSELECT hq_cityFROM Company, ProductWHERE maker = name

AND factory_loc = ‘China’

WhatiftwocompanieshaveHQinUS:BUTonehasfactoryinChina(butnotUS)andviceversa? Whatgoeswrong?

“HeadquartersofcompanieswhichmakegizmosinUSAND China”

Lecture3>Section1>SetOperators

INTERSECT:Rememberthesemantics!

42

Company(name, hq_city) AS CProduct(pname, maker, factory_loc) AS P

SELECT hq_cityFROM Company, ProductWHERE maker = name

AND factory_loc=‘US’INTERSECTSELECT hq_cityFROM Company, ProductWHERE maker = nameAND factory_loc=‘China’

Lecture3>Section1>SetOperators

Example:CJOINPonmaker=nameC.name C.hq_city P.pname P.maker P.factory_loc

XCo. Seattle X XCo. U.S.

YInc. Seattle X Y Inc. China

INTERSECT:Rememberthesemantics!

43

Company(name, hq_city) AS CProduct(pname, maker, factory_loc) AS P

SELECT hq_cityFROM Company, ProductWHERE maker = name

AND factory_loc=‘US’INTERSECTSELECT hq_cityFROM Company, ProductWHERE maker = nameAND factory_loc=‘China’

Lecture3>Section1>SetOperators

Example:CJOINPonmaker=nameC.name C.hq_city P.pname P.maker P.factory_loc

XCo. Seattle X XCo. U.S.

YInc. Seattle X Y Inc. China

XCohasafactoryintheUS(butnotChina)YInc.hasafactorinChina(butnotUS)

ButSeattleisreturnedbythequery!

WedidtheINTERSECTonthewrongattributes!

OneSolution:NestedQueries

44

Company(name, hq_city)Product(pname, maker, factory_loc)

SELECT DISTINCT hq_cityFROM Company, ProductWHERE maker = name

AND name IN (SELECT makerFROM ProductWHERE factory_loc = ‘US’)

AND name IN (SELECT makerFROM ProductWHERE factory_loc = ‘China’)

Lecture3>Section1>NestedQueries

“HeadquartersofcompanieswhichmakegizmosinUSAND China”

Note:Ifwehadn’tusedDISTINCThere,howmanycopiesofeachhq_city wouldhavebeenreturned?

High-levelnoteonnestedqueries

• WecandonestedqueriesbecauseSQLiscompositional:

• Everything(inputs/outputs)isrepresentedasmultisets- theoutputofonequerycanthusbeusedastheinputtoanother(nesting)!

• Thisisextremely powerful!

Lecture3>Section1>NestedQueries

46

Nestedqueries:Sub-queriesReturningRelations

SELECT c.cityFROM Company cWHERE c.name IN (

SELECT pr.makerFROM Purchase p, Product prWHERE p.product = pr.nameAND p.buyer = ‘Joe Blow‘)

“CitieswhereonecanfindcompaniesthatmanufactureproductsboughtbyJoeBlow”

Company(name, city)Product(name, maker)Purchase(id, product, buyer)

Lecture3>Section1>NestedQueries

Anotherexample:

47

NestedQueries

SELECT c.cityFROM Company c,

Product pr, Purchase p

WHERE c.name = pr.makerAND pr.name = p.productAND p.buyer = ‘Joe Blow’

Isthisqueryequivalent?

Bewareofduplicates!

Lecture3>Section1>NestedQueries

48

NestedQueries

SELECT DISTINCT c.cityFROM Company c,

Product pr, Purchase p

WHERE c.name = pr.makerAND pr.name = p.productAND p.buyer = ‘Joe Blow’

Nowtheyareequivalent

Lecture3>Section1>NestedQueries

SELECT DISTINCT c.cityFROM Company cWHERE c.name IN (SELECT pr.makerFROM Purchase p, Product prWHERE p.product = pr.name

AND p.buyer = ‘Joe Blow‘)

49

Subqueries ReturningRelations

SELECT nameFROM ProductWHERE price > ALL(

SELECT priceFROM ProductWHERE maker = ‘Gizmo-Works’)

Product(name, price, category, maker)

Youcanalsouseoperationsoftheform:• s>ALLR• s<ANYR• EXISTSR

Lecture3>Section1>NestedQueries

Findproductsthataremoreexpensivethanallthoseproducedby“Gizmo-Works”

Ex:

ANYandALLnotsupportedbySQLite.

50

SubqueriesReturningRelations

SELECT p1.nameFROM Product p1WHERE p1.maker = ‘Gizmo-Works’

AND EXISTS(SELECT p2.nameFROM Product p2WHERE p2.maker <> ‘Gizmo-Works’

AND p1.name = p2.name)

Product(name, price, category, maker)

Youcanalsouseoperationsoftheform:• s>ALLR• s<ANYR• EXISTSR

Lecture3>Section1>NestedQueries

Find‘copycat’products,i.e.productsmadebycompetitorswiththesamenamesasproductsmadeby“Gizmo-Works”

Ex:

<>means!=

51

NestedqueriesasalternativestoINTERSECTandEXCEPT

(SELECT R.A, R.BFROM R)

INTERSECT(SELECT S.A, S.BFROM S)

SELECT R.A, R.BFROM RWHERE EXISTS(

SELECT *FROM SWHERE R.A=S.A AND R.B=S.B)

SELECT R.A, R.BFROM RWHERE NOT EXISTS(

SELECT *FROM SWHERE R.A=S.A AND R.B=S.B)

Lecture3>Section1>NestedQueries

INTERSECTandEXCEPTnotinsomeDBMSs!

IfR,Shavenoduplicates,thencanwritewithoutsub-queries(HOW?)(SELECT R.A, R.B

FROM R)EXCEPT(SELECT S.A, S.BFROM S)

52

CorrelatedQueries

SELECT DISTINCT titleFROM Movie AS mWHERE year <> ANY(

SELECT yearFROM MovieWHERE title = m.title)

Movie(title, year, director, length)

Notealso:thiscanstillbeexpressedassingleSFWquery…

Lecture3>Section1>NestedQueries

Findmovieswhosetitleappearsmorethanonce.

Notethescopingofthevariables!

53

ComplexCorrelatedQuery

SELECT DISTINCT x.name, x.makerFROM Product AS xWHERE x.price > ALL(

SELECT y.priceFROM Product AS yWHERE x.maker = y.maker

AND y.year < 1972)

Lecture3>Section1>NestedQueries

Findproducts(andtheirmanufacturers)thataremoreexpensivethanallproductsmadebythesamemanufacturerbefore1972

Product(name, price, category, maker, year)

Canbeverypowerful(alsomuchhardertooptimize)

Activity-3-1.ipynb

54

Lecture3>Section1>ACTIVITY

BasicSQLSummary

• SQLprovidesahigh-leveldeclarativelanguageformanipulatingdata(DML)

• TheworkhorseistheSFWblock

• Setoperatorsarepowerfulbuthavesomesubtleties

• Powerful,nestedqueriesalsoallowed.

55

Lecture3>Section1>Summary