database design and implementation - umass...
TRANSCRIPT
Database design and implementation CMPSCI 645
Lecture 04: SQL and Datalog
What you need
2
} InstallPostgres.} Instruc0onsareontheassignmentspageonthewebsite.
} Useittoprac0ce
} RefreshyourSQL:} h?p://sqlzoo.net
Simple SQL query
PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
Mul0Touch $203.99 Household Hitachi
SELECT !* FROM !Product WHERE !category='Gadgets'!
Product
PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorksSelec0on
3
�category=�Gadgets�
Simple SQL query
PName Price Manufacturer
SingleTouch $149.99 Canon
Mul0Touch $203.99 Hitachi
SELECT !pName, price, manufacturer FROM !Product WHERE !price > 100!
Product
Selec0on&Projec0on
4
�price>100
�pname,price,manufacturer
PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
Mul0Touch $203.99 Household Hitachi
Eliminating duplicates
Category
Gadgets
Gadgets
Photography
Household
Category
Gadgets
Photography
Household
SELECT !DISTINCT category FROM !Product!
SELECT !category FROM !Product!
Setvs.Bagseman0cs
PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
PowerGizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
Mul0Touch $203.99 Household Hitachi
Product
5
Ordering the results
} Tiesinpricea"ributebrokenbypnamea?ribute
} Orderingisascendingbydefault.Descending:
SELECT ! !pName, price, manufacturer!FROM ! !Product!WHERE ! !category='Gadgets' ! ! !and price > 10!ORDER BY !price, pName!
... ORDER BY price, pname DESC!
6
PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
Mul0Touch $203.99 Household Hitachi
SELECT category!FROM Product!ORDER BY pName!
SELECT DISTINCT category!FROM Product!ORDER BY category!
SELECT DISTINCT category!FROM Product!ORDER BY pName!
Product
???
7
Category
Gadgets
Household
Photography
Category
Gadgets
Household
Gadgets
Photography
Syntax error* 8
SELECT category!FROM Product!ORDER BY pName!
SELECT DISTINCT category!FROM Product!ORDER BY category!
SELECT DISTINCT category!FROM Product!ORDER BY pName!
PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
Mul0Touch $203.99 Household Hitachi
Product
Joins
Q:Findallproductsunder$200manufacturedinJapan;returntheirnamesandprices!
SELECT !pName, price FROM !Product, Company WHERE !manufacturer=cName ! !and country='Japan' !and price <= 200!
JoinbetweenProductandCompany
Product(pName,price,category,manufacturer)Company(cName,stockPrice,country)
9
Joins
PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
Mul0Touch $203.99 Household Hitachi
Product CompanyCName StockPrice Country
GizmoWorks 25 USA
Canon 65 Japan
Hitachi 15 Japan
PName Price
SingleTouch $149.99
SELECT !pName, price FROM !Product, Company WHERE !manufacturer=cName ! !and country='Japan' !and price <= 200!
10
Semantics are tricky…
SELECT !DISTINCT R.a!FROM !R, S, T!WHERE !R.a=S.a ! !or R.a=T.a!
IfS≠∅ andT≠∅ thenreturnsR∩ (S ∪ T) elsereturns∅
Whatdothesequeriescompute?
SELECT !DISTINCT R.a!FROM !R, S!WHERE !R.a=S.a!
ReturnsR∩ S
R(a),S(a),T(a)
11
Formal semantics of SQL queries
Answer={}forx1inR1doforx2inR2do…..forxninRndoifCondi0onsthenAnswer=Answer∪{(a1,…,ak)}returnAnswer
SELECT !a1, a2, …, ak!FROM !R1 as x1, R2 as x2, …, Rn as xn!WHERE !Conditions!
Conceptualevalua0onstrategy(nestedforloops):
12
Joins introduce duplicates
Country
USA
USA
PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
Mul0Touch $203.99 Household Hitachi
Product CompanyCName StockPrice Country
GizmoWorks 25 USA
Canon 65 Japan
Hitachi 15 Japan
RemembertouseDISTINCT13
SELECT !country!FROM !Product, Company WHERE !manufacturer = cName ! and category = 'Gadgets'!
Q:Findallcountriesthatmanufacturesomeproductinthe‘Gadgets’category!
Subqueries } AsubqueryisaSQLquerynestedinsidealargerquery
} Suchinner-outerqueriesarecallednestedqueries
} Asubquerymayoccurin:} ASELECTclause} AFROMclause} AWHEREclause
} Ruleofthumb:avoidwri0ngnestedquerieswhenpossible;keepinmindthatsome0mesit’simpossible
14
1. Subqueries in SELECT
Whathappensifthesubqueryreturnsmorethanonecity?Run0meerror
Q:Foreachproductreturnthecitywhereitismanufactured!
SELECT !P.pname, (SELECT !C.city!! ! FROM! !Company C!! ! WHERE! !C.cid = P.cid)!
FROM !Product P!
Product(pname,price,cid)Company(cid,cname,city)
15
1. Subqueries in SELECT
Q:Foreachproductreturnthecitywhereitismanufactured!
SELECT !P.pname, C.city!FROM !Product P, Company C!WHERE !C.cid = P.cid!
"unnes0ngthequery" Wheneverpossible,don'tusenestedqueries
Product(pname,price,cid)Company(cid,cname,city)
16
SELECT !P.pname, (SELECT !C.city!! ! FROM! !Company C!! ! WHERE! !C.cid = P.cid)!
FROM !Product P!
2. Subqueries in FROM
Q:Findallproductswhosepricesare>20and<30!
SELECT !X.pname !FROM !(SELECT! *!
! FROM ! Product as P!! WHERE! price >20 ) as X!
WHERE !X.price < 30!
SELECT !pname!FROM !Product!WHERE !price > 20 and price < 30!
unnes0ng
Product(pname,price,cid)Company(cid,cname,city)
17
3. Subqueries in WHERE
Existen0alquan0fiers∃
UsingEXISTS:
Q:Findallcompaniesthatmakesomeproductswithprice<100!
SELECT !DISTINCT C.cname !FROM !Company C!WHERE !EXISTS !(SELECT !*!
! ! FROM ! !Product P!! ! WHERE! !C.cid = P.cid!! ! ! !and P.price < 100)!
Product(pname,price,cid)Company(cid,cname,city)
18
3. Subqueries in WHERE
Existen0alquan0fiers∃
UsingIN:
Q:Findallcompaniesthatmakesomeproductswithprice<100!
SELECT !DISTINCT C.cname !FROM !Company C!WHERE !C.cid IN !(SELECT P.cid!
! ! FROM ! Product P!! ! WHERE! P.price < 100)!
Product(pname,price,cid)Company(cid,cname,city)
19
3. Subqueries in WHERE
Existen0alquan0fiers∃
UsingANY:
Q:Findallcompaniesthatmakesomeproductswithprice<100!
SELECT !DISTINCT C.cname !FROM !Company C!WHERE !100 > ANY!(SELECT price!
! ! FROM !!Product P!! ! WHERE !!P.cid = C.cid)!
Product(pname,price,cid)Company(cid,cname,city)
20
3. Subqueries in WHERE
Existen0alquan0fiers∃
Now,let'sunnest:
Q:Findallcompaniesthatmakesomeproductswithprice<100!
SELECT !DISTINCT C.cname !FROM !Company C, Product P!WHERE !C.cid = P.cid! !and P.price < 100 !
Existen0alquan0fiersareeasy!J
Product(pname,price,cid)Company(cid,cname,city)
21
3. Subqueries in WHERE
Universalquan0fiers∀
Q:Findallcompaniesthatmakeonlyproductswithprice<100!
Q:Findallcompaniesforwhichallproductshaveprice<100!
Universalquan0fiersaremorecomplicated!L
sameas:
Product(pname,price,cid)Company(cid,cname,city)
22
3. Subqueries in WHERE
2. Find all companies s.t. all their products have price < 100!
1. Find the other companies: i.e. they have some product ≥ 100!
SELECT !DISTINCT C.cname !FROM !Company C!WHERE !C.cid IN (SELECT P.cid!
! ! FROM !! Product P!! ! WHERE!! P.price >= 100)!
SELECT !DISTINCT C.cname !FROM !Company C!WHERE !C.cid NOT IN (SELECT P.cid!
! ! FROM!! Product P!! ! WHERE! P.price >= 100)!
23
3. Subqueries in WHERE
Using NOT EXISTS:
SELECT !DISTINCT C.cname !FROM !Company C!WHERE !NOT EXISTS!(SELECT *!
! ! FROM !Product P!! ! WHERE!C.cid = P.cid!! ! !and P.price >= 100)!
Universal quantifiers ∀
Q: Find all companies that make only products with price < 100!
Product (pname, price, cid) Company (cid, cname, city)
24
3. Subqueries in WHERE
Using ALL:
Universal quantifiers ∀
Q: Find all companies that make only products with price < 100!
SELECT !DISTINCT C.cname !FROM !Company C!WHERE !100 > ALL!(SELECT price!
! ! FROM Product P!! ! WHERE P.cid = C.cid)!
Product (pname, price, cid) Company (cid, cname, city)
25
Challenging question } HowcanweunnesttheuniversalquanLfierquery?
26
Queries that must be nested
} AqueryQismonotoneif:} Addingtuplestotheinputcannotremovetuplesfromtheoutput
} Fact:allunnestedqueriesaremonotone} Proof:usingthe“nestedforloops”seman0cs
} Fact:Querywithuniversalquan0fierisnotmonotone} Addonetupleviola0ngthecondi0on.Thennot"all"...
} Consequence:wecannotunnestaquerywithauniversalquan0fier
27
The drinkers-bars-beers example
Finddrinkersthatfrequentsomebarthatservessomebeertheylike.
Finddrinkersthatfrequentonlybarsthatservesomebeertheylike.
Finddrinkersthatfrequentonlybarsthatserveonlybeertheylike.
Finddrinkersthatfrequentsomebarthatservesonlybeerstheylike.
Likes(drinker,beer)Frequents(drinker,bar)Serves(bar,beer)
Challenge:writetheseinSQL
28
Aggregation
SELECT !count(*)!FROM !Product!WHERE !year > 1995!
Exceptcount,allaggrega0onsapplytoasinglea?ribute
SELECT !avg(price)!FROM !Product!WHERE !maker='Toyota'!
SQLsupportsseveralaggrega0onopera0ons:sum,count,min,max,avg
29
COUNTappliestoduplicates,unlessotherwisestated:
SELECT !count (category) !FROM !Product!WHERE !year > 1995!
Weprobablywant:
SELECT !count (DISTINCT category)!FROM !Product!WHERE !year > 1995!
Aggregation: count distinct
30
Simple aggregation PurchaseProduct Price Quan0tyBagel 3 20Bagel 2 20Banana 1 50Banana 2 10Banana 4 10
3*20=602*20=40sum:100
(Nocolumnname)100
SQLcreatesa?ributename
31
SELECT !sum (price * quantity)!FROM !Purchase!WHERE !product = 'Bagel'!
canusearithme0cexpressions
Grouping and Aggregation
Product TotalSalesBagel 40Banana 20
Product Price Quan0tyBagel 3 20Bagel 2 20Banana 1 50Banana 2 10Banana 4 10
FindtotalquanLLesforallsalesover$1,byproduct.
32
From → Where → Group By → Select
SELECT !product, sum(quantity) as TotalSales!FROM !Purchase!WHERE !price > 1!GROUP BY !product!
Product TotalSalesBagel 40Banana 20
Product Price Quan0tyBagel 3 20Bagel 2 20Banana 1 50Banana 2 10Banana 4 10
123
4
Selectcontains• groupeda?ributes• andaggregates
33
Another example
SELECT ! product,!! sum(quantity) as SumQuantity,!! max(price) as MaxPrice!
FROM ! Purchase!GROUP BY product!
Product TotalSales MaxPriceBagel 40 3Banana 70 4
Next, focus only on products with at least 50 sales 34
HAVING clause
Q:Similartobefore,butonlyproductswithatleast50sales.
Product TotalSales MaxPriceBanana 70 4
35
SELECT ! product,!! sum(quantity) as SumQuantity,!! max(price) as MaxPrice!
FROM ! Purchase!GROUP BY product!HAVING ! sum(quantity) >= 50!
General form of grouping and aggregation
36
S: maycontaina?ributesa1,…,akand/oranyaggregatesbutnoothera?ributes
SELECT !S!FROM !R1,…,Rn WHERE !C1!GROUP BY !a1,…,ak HAVING !C2!
Evalua0on1. EvaluateFrom→Where,applycondi0onC12. Groupbythea?ributesa1,…,ak3. Applycondi0onC2toeachgroup(mayhaveaggregates)4. ComputeaggregatesinSandreturntheresult
1234
5
C1:isanycondi0ononthea?ributesinR1,…,Rn
C2:isanycondi0ononaggregatesandona?ributesa1,…,ak
Finding witnesses Store(sid,sname)Product(pid,pname,price,sid)
Q:Foreachstore,finditsmostexpensiveproducts
SELECT Store.sid, max(Product.price)!FROM Store, Product!WHERE Store.sid = Product.sid!GROUP BY Store.sid!
Butwewantthe“witnesses”,i.e.theproductswithmaxprice
Findingthemaximumpriceiseasy…
37
Finding witnesses
} Computemaxpriceinasubquery} CompareitwitheachproductpriceSELECT Store.sname, Product.pname FROM Store, Product, (SELECT Store.sid as sid, !
! ! ! max(Product.price) as p! FROM Store, Product !
! WHERE Store.sid = Product.sid!! GROUP BY Store.sid) X!
WHERE Store.sid = Product.sid and Store.sid = X.sid ! and Product.price = X.p!
38
Finding witnesses
Thereisamoreconcisesolu0onhere:
SELECT Store.sname, x.pname!FROM Store, Product x!WHERE Store.sid = x.sid ! and x.price >= ALL (SELECT y.price FROM Product y WHERE Store.sid = y.sid)!
39
NULLS in SQL } Wheneverwedon’thaveavalue,wecanputaNULL
} Canmeanmanythings:} Valuedoesnotexists} Valueexistsbutisunknown} Valuenotapplicable} Etc.
} Theschemaspecifiesforeacha?ributeifitcanbeNULLornot
} HowdoesSQLcopewithtablesthathaveNULLs?
40
Null values } Ifx=NULLthen
} Arithme0copera0onsproduceNULL.E.g:4*(3-x)/7} Booleancondi0onsarealsoNULL.E.g:x=‘Joe’
} InSQLtherearethreebooleanvalues:FALSE,TRUE,UNKNOWN
} Reasoning:FALSE=0 xANDy=min(x,y)TRUE=1 xORy=max(x,y)UNKNOWN=0.5 NOTx=(1–x)
41
42
SELECT *!FROM Person!WHERE (age < 25) and !
! (height > 6 or weight > 190)!
RuleinSQL:includeonlytuplesthatyieldTRUE
Age Height Weight20 NULL 200NULL 6.5 170
SELECT *!FROM Person!WHERE age < 25 or age >= 25!
Unexpectedbehavior
SELECT *!FROM Person!WHERE age < 25 or age >= 25 ! or age IS NULL!
TestNULLexplicitly
Outer joins
Ifwewantthenever-soldproducts,weneedan“outerjoin”:
Product(name,category)Purchase(prodName,store)
SELECT Product.name, Purchase.store!FROM Product LEFT OUTER JOIN Purchase ON! Product.name = Purchase.prodName!
Name CategoryGizmo GadgetCamera PhotoOneClick Photo
ProductProdName Store Gizmo Wiz Camera Ritz Camera Wiz
PurchaseName StoreGizmo WizCamera RitzCamera WizOneClick NULL
Result
Innerjoindoesnotproducethistuple 43
Example
} Compute,foreachproduct,thetotalnumberofsalesin‘September’
What’swrong?
Product(name,category)Purchase(prodName,month,store)
SELECT !Product.name, count(*)! FROM Product, Purchase ! WHERE !Product.name = Purchase.prodName! !and Purchase.month = ‘September’! GROUP BY Product.name!
44
Example
} Compute,foreachproduct,thetotalnumberofsalesin‘September’
Product(name,category)Purchase(prodName,month,store)
SELECT !Product.name, count(*)! FROM Product LEFT OUTER JOIN Purchase ON !
! Product.name = Purchase.prodName! WHERE Purchase.month = ‘September’! GROUP BY Product.name!
45
What’swrong?
Example
} Compute,foreachproduct,thetotalnumberofsalesin‘September’
Product(name,category)Purchase(prodName,month,store)
SELECT !Product.name, count(month)! FROM Product LEFT OUTER JOIN Purchase ON !
! !Product.name = Purchase.prodName! WHERE Purchase.month = ‘September’! GROUP BY Product.name!
46
Weneedtousethea?ributetogetthecorrect0count.
47
Datalog
Datalog
48
} Friendlynota0onforqueries} Designedforrecursivequeriesinthe80s.} Today:inacoupleofcommercialproducts,e.g.,LogicBlox,Datomic
} Today:recursion-freedatalogwithnega0on
Datalog: Facts and Rules
49
Actor(34524,’Johnny’,’Depp’)Casts(34524,28756)Casts(67725,28756)Movie(28756,‘SweeneyTodd’,2007)Movie(28757,‘TheDaVinciCode’,2006)
Q1(y):-Movie(x,y,z),z=‘2007’
Facts=tuplesinthedatabase
Rules=queriesFindmoviesmadein2007
Q2(f,l):-Actor(z,f,l),Casts(z,x),Movie(x,y,’2007’)Findactorswhoactedinamoviein2007
Q3(f,l):-Actor(z,f,l),Casts(z,x1),Movie(x1,y,’2007’),Casts(z,x2),Movie(x2,y2,’2006’)
Findactorswhoactedinamoviein2007andin2006
EDB and IDB
50
} ExtensionalDatabasePredicates:EDB} Actor,casts,movie
} Inten0onalDatabasePredicates:IDB} Q1,Q2,Q3
Q2(f,l):-Actor(z,f,l),Casts(z,x),Movie(x,y,’2007’)
Terminology
51
Q2(f,l):-Actor(z,f,l),Casts(z,x),Movie(x,y,’2007’)
head body
atom atom atom
f,l :headvariablesx,y,z :existen0alvariables
Datalog Program
52
B0(x):-Actor(x,’Kevin’,’Bacon’)B1(x):-Actor(x,f,l),Casts(x,z),Casts(y,z),B0(y)B2(x):-Actor(x,f,l),Casts(x,z),Casts(y,z),B1(y)Q4(x):-B1(x)Q4(x):-B2(x)union
FindactorswithBaconnumber≤2
Simple datalog programs
53
1
2 3
4
5
1 2
2 1
2 3
1 4
3 4
4 5
R=
T(x,y):-R(x,y)T(x,y):-R(x,z)T(z,y)
Whatdoesthiscompute?
Simple datalog programs
54
1
2 3
4
5
1 2
2 1
2 3
1 4
3 4
4 5
R=
T(x,y):-R(x,y)T(x,y):-R(x,z)T(z,y)
Whatdoesthiscompute?
Tisini0allyempty
Simple datalog programs
55
1
2 3
4
5
1 2
2 1
2 3
1 4
3 4
4 5
R=
T(x,y):-R(x,y)T(x,y):-R(x,z)T(z,y)
Whatdoesthiscompute?
1 2
2 1
2 3
1 4
3 4
4 5
1stitera0on
Simple datalog programs
56
1
2 3
4
5
1 2
2 1
2 3
1 4
3 4
4 5
R=
T(x,y):-R(x,y)T(x,y):-R(x,z)T(z,y)
Whatdoesthiscompute?
1 2
2 1
2 3
1 4
3 4
4 5
1stitera0on
1 2
2 1
2 3
1 4
3 4
4 5
2nditera0on
1 1
1 3
2 2
2 4
1 5
3 5
Simple datalog programs
57
1
2 3
4
5
1 2
2 1
2 3
1 4
3 4
4 5
R=
T(x,y):-R(x,y)T(x,y):-R(x,z)T(z,y)
Whatdoesthiscompute?
1 2
2 1
2 3
1 4
3 4
4 5
1stitera0on
1 2
2 1
2 3
1 4
3 4
4 5
2nditera0on
1 1
1 3
2 2
2 4
1 5
3 5
1 2
2 1
2 3
1 4
3 4
4 5
3rditera0on
1 1
1 3
2 2
2 4
1 5
3 5
2 5
Datalog with Negation
58
B0(x):-Actor(x,’Kevin’,’Bacon’)B1(x):-Actor(x,f,l),Casts(x,z),Casts(y,z),B0(y)Q5(x):-Actor(x,f,l),notB1(x),notB0(x)
FindactorswithBaconnumber≥2
Recursion and negation: L
59
S(x):-R(x),notT(x)T(x):-R(x),notS(x)EDB:R(a)
Thefixpointinunclear!
Unsafe Datalog Rules
60
Whatisunsafeabouttheserules?
U1(x,y):-Movie(x,z,’2007’),y>‘2000’
U2(x,u):-Movie(x,z,’2007’),notCasts(u,x)
Aruleissafeifeveryvariableappearsinsomeposi0verela0onalatom