tiresias - people.cs.umass.eduameli/projects/tiresias/slides/sigmod2012.pdf · 0 10 20 30 40 50 60...
TRANSCRIPT
![Page 1: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)](https://reader031.vdocuments.mx/reader031/viewer/2022021916/5e09ec795ed9793fba16abb0/html5/thumbnails/1.jpg)
University of Washington Database Group
Tiresias The Database Oracle for How-‐To Queries
Alexandra Meliou§✜
Dan Suciu✜
§University of Massachuse9s Amherst ✜University of Washington
![Page 2: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)](https://reader031.vdocuments.mx/reader031/viewer/2022021916/5e09ec795ed9793fba16abb0/html5/thumbnails/2.jpg)
Hypothetical (What-‐if) Queries
Brokerage company DB
Key Performance Indicators (KPI)
Example from [Balmin et al. VLDB’00]: “An analyst of a brokerage company wants to know what would be the effect on the return of customers’ porPolios if during the last 3 years they had suggested Intel stocks instead of Motorola.”
change something in the source (hypothesis)
observe the effect in the target
forward
![Page 3: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)](https://reader031.vdocuments.mx/reader031/viewer/2022021916/5e09ec795ed9793fba16abb0/html5/thumbnails/3.jpg)
How-‐To Queries
Brokerage company DB
Key Performance Indicators (KPI)
Modified example: “An analyst wants to ask how to achieve a 10% return in customer porPolios, with the least number of trades.”
find changes to the source that achieve the desired effect
declare a desired effect in the target
reverse
![Page 4: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)](https://reader031.vdocuments.mx/reader031/viewer/2022021916/5e09ec795ed9793fba16abb0/html5/thumbnails/4.jpg)
TPC-‐H example A manufacturing company keeps records of inventory orders in a LineItem table. KPI: Cannot order more than 7% of the inventory from any single country
Can reassign orders to new suppliers as long as the supplier can supply the part
Minimize the number of changes
(constraints)
(variables)
(op\miza\on objec\ve) constraint op\miza\on
![Page 5: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)](https://reader031.vdocuments.mx/reader031/viewer/2022021916/5e09ec795ed9793fba16abb0/html5/thumbnails/5.jpg)
extract data
Constraint Optimization on Big Data
DB
construct op\miza\on model
this is for a set of 10 lineitems and 40
suppliers
Mixed Integer Programming (MIP)
solver
transform into data updates
MathProg
Imprac\cal!
![Page 6: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)](https://reader031.vdocuments.mx/reader031/viewer/2022021916/5e09ec795ed9793fba16abb0/html5/thumbnails/6.jpg)
Demo: Tiresias
a tool that makes how-‐to queries prac\cal
![Page 7: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)](https://reader031.vdocuments.mx/reader031/viewer/2022021916/5e09ec795ed9793fba16abb0/html5/thumbnails/7.jpg)
Tiresias: How-‐To Query Engine
DBMS
MIP solver
Tiresias
TiQL (Tiresias Query Language)
Declara\ve interface, extension to Datalog
RULES:
HChooseS(ok,pk,sk,qnt,sk’,country) :- PartSupp(pk,sk’) & LineItem(ok,pk,sk,qnt)
& SuppNation(sk2,country)
HLineItem(ok,pk,sk’,qnt) :- HChooseS(ok,pk,sk,qnt,sk’,country)
HOrderSum(country,count(*)) :- HChooseS(ok,pk,sk,qnt,sk’,country)
[c? <= 10] <- HOrderSum(country,c?)
MAXIMIZE(count(*)) :- HChooseS(ok,pk,sk,qnt,sk,country)
![Page 8: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)](https://reader031.vdocuments.mx/reader031/viewer/2022021916/5e09ec795ed9793fba16abb0/html5/thumbnails/8.jpg)
Overview
MathProg or AMPL
RULES:
HChooseS(ok,pk,sk,qnt,sk’,country) :- PartSupp(pk,sk’) & LineItem(ok,pk,sk,qnt)
& SuppNation(sk2,country)
HLineItem(ok,pk,sk’,qnt) :- HChooseS(ok,pk,sk,qnt,sk’,country)
HOrderSum(country,count(*)) :- HChooseS(ok,pk,sk,qnt,sk’,country)
[c? <= 10] <- HOrderSum(country,c?)
MAXIMIZE(count(*)) :- HChooseS(ok,pk,sk,qnt,sk,country)
TiQL
Visualiza\ons
![Page 9: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)](https://reader031.vdocuments.mx/reader031/viewer/2022021916/5e09ec795ed9793fba16abb0/html5/thumbnails/9.jpg)
RULES:
HChooseS(ok,pk,sk,qnt,sk’,country) :- PartSupp(pk,sk’) & LineItem(ok,pk,sk,qnt)
& SuppNation(sk2,country)
HLineItem(ok,pk,sk’,qnt) :- HChooseS(ok,pk,sk,qnt,sk’,country)
HOrderSum(country,count(*)) :- HChooseS(ok,pk,sk,qnt,sk’,country)
[c? <= 10] <- HOrderSum(country,c?)
MAXIMIZE(count(*)) :- HChooseS(ok,pk,sk,qnt,sk,country)
MathProg or AMPL
TiQL
Visualiza\ons
Overview
Demo
![Page 10: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)](https://reader031.vdocuments.mx/reader031/viewer/2022021916/5e09ec795ed9793fba16abb0/html5/thumbnails/10.jpg)
Demo
MathProg or AMPL
RULES:
HChooseS(ok,pk,sk,qnt,sk’,country) :- PartSupp(pk,sk’) & LineItem(ok,pk,sk,qnt)
& SuppNation(sk2,country)
HLineItem(ok,pk,sk’,qnt) :- HChooseS(ok,pk,sk,qnt,sk’,country)
HOrderSum(country,count(*)) :- HChooseS(ok,pk,sk,qnt,sk’,country)
[c? <= 10] <- HOrderSum(country,c?)
MAXIMIZE(count(*)) :- HChooseS(ok,pk,sk,qnt,sk,country)
TiQL
Visualiza\ons
Overview
Language seman\cs
Evalua\on of a TiQL program: Transla\on from TiQL to linear constraints
Performance op\miza\ons
![Page 11: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)](https://reader031.vdocuments.mx/reader031/viewer/2022021916/5e09ec795ed9793fba16abb0/html5/thumbnails/11.jpg)
Tiresias Query Language Datalog-‐like nota\on:
TiQL seman\cs: Mapping from EDBs (Extensional Database) to possible worlds over HDBs (Hypothe\cal Database)
head body: conjunc\on of predicates
HDB
P (x̄) :-B1(x̄1) ∧B2(x̄2) ∧ · · · ∧Bn(x̄n)
HP (x̄) :- body
![Page 12: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)](https://reader031.vdocuments.mx/reader031/viewer/2022021916/5e09ec795ed9793fba16abb0/html5/thumbnails/12.jpg)
TiQL Rules
Reduc\on Rule HP(x̄) :< body
Constraint Rule [arithm-pred] <- body
Deduc\on Rule HP(x̄) :- body
Seman+cs: Similar to repair-‐key seman\cs [Antonova et al. SIGMOD’07], [Koch ICDT’09]
Seman+cs: Takes a subset of tuples
Seman+cs: The head predicate needs to hold for all tuples
![Page 13: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)](https://reader031.vdocuments.mx/reader031/viewer/2022021916/5e09ec795ed9793fba16abb0/html5/thumbnails/13.jpg)
Demo
MathProg or AMPL
RULES:
HChooseS(ok,pk,sk,qnt,sk’,country) :- PartSupp(pk,sk’) & LineItem(ok,pk,sk,qnt)
& SuppNation(sk2,country)
HLineItem(ok,pk,sk’,qnt) :- HChooseS(ok,pk,sk,qnt,sk’,country)
HOrderSum(country,count(*)) :- HChooseS(ok,pk,sk,qnt,sk’,country)
[c? <= 10] <- HOrderSum(country,c?)
MAXIMIZE(count(*)) :- HChooseS(ok,pk,sk,qnt,sk,country)
TiQL
Visualiza\ons
Overview
Language seman\cs
Evalua\on of a TiQL program: Transla\on from TiQL to linear constraints
Performance op\miza\ons
![Page 14: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)](https://reader031.vdocuments.mx/reader031/viewer/2022021916/5e09ec795ed9793fba16abb0/html5/thumbnails/14.jpg)
Evaluating a TiQL Program
Mixed Integer Program (MIP)
TiQL RULES:
HChooseS(ok,pk,sk,qnt,sk’,country) :- PartSupp(pk,sk’) & LineItem(ok,pk,sk,qnt)
& SuppNation(sk2,country)
HLineItem(ok,pk,sk’,qnt) :- HChooseS(ok,pk,sk,qnt,sk’,country)
HOrderSum(country,count(*)) :- HChooseS(ok,pk,sk,qnt,sk’,country)
[c? <= 10] <- HOrderSum(country,c?)
MAXIMIZE(count(*)) :- HChooseS(ok,pk,sk,qnt,sk,country)
DB
![Page 15: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)](https://reader031.vdocuments.mx/reader031/viewer/2022021916/5e09ec795ed9793fba16abb0/html5/thumbnails/15.jpg)
CORE HChooseSok pk sk sk’1 P15 S10 S101 P15 S10 S212 P32 S43 S102 P32 S43 S43
Evaluating a TiQL Program HChooseS(ok,pk,sk,sk’) :- PartSupp(pk,sk’) & LineItem(ok,pk,sk,qnt)
LineItemok pk sk quant1 P15 S10 222 P32 S43 45
PartSupppk skP15 S10P15 S21P32 S10P32 S43
ok pk sk sk’1 P15 S10 S102 P32 S43 S10
ok pk sk sk’1 P15 S10 S102 P32 S43 S43
ok pk sk sk’1 P15 S10 S212 P32 S43 S43
ok pk sk sk’1 P15 S10 S212 P32 S43 S10
possible worlds
![Page 16: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)](https://reader031.vdocuments.mx/reader031/viewer/2022021916/5e09ec795ed9793fba16abb0/html5/thumbnails/16.jpg)
Key Constraints CORE HChooseSok pk sk sk’1 P15 S10 S101 P15 S10 S212 P32 S43 S102 P32 S43 S43
x1
x2
x3
x4
0 ≤ xi ≤ 1∀kj :
�
i,key(xi)=kj
xi ≤ 1x1 + x2 ≤ 1x3 + x4 ≤ 1
Key(ok, pk, sk)
ok pk sk sk’1 P15 S10 S101 P15 S10 S21
NOT a possible world
![Page 17: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)](https://reader031.vdocuments.mx/reader031/viewer/2022021916/5e09ec795ed9793fba16abb0/html5/thumbnails/17.jpg)
Provenance Constraints A TiQL rule specifies transforma\ons Transforma\ons define provenance
Boolean seman\cs for queries without aggregates Semi-‐module provenance for queries with aggregates [Amsterdamer et al. PODS’11]
Y = X1 ∨X2 ∨ . . . ∨Xn
∀i, y ≥ xi
y ≤�
i
xi
Disjunc+on:
Y = X1 ∧X2 ∧ . . . ∧Xn
∀i, y ≤ xi
y ≥�
i
xi − (n− 1)
Conjunc+on:
![Page 18: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)](https://reader031.vdocuments.mx/reader031/viewer/2022021916/5e09ec795ed9793fba16abb0/html5/thumbnails/18.jpg)
Demo
MathProg or AMPL
RULES:
HChooseS(ok,pk,sk,qnt,sk’,country) :- PartSupp(pk,sk’) & LineItem(ok,pk,sk,qnt)
& SuppNation(sk2,country)
HLineItem(ok,pk,sk’,qnt) :- HChooseS(ok,pk,sk,qnt,sk’,country)
HOrderSum(country,count(*)) :- HChooseS(ok,pk,sk,qnt,sk’,country)
[c? <= 10] <- HOrderSum(country,c?)
MAXIMIZE(count(*)) :- HChooseS(ok,pk,sk,qnt,sk,country)
TiQL
Visualiza\ons
Overview
Language seman\cs
Evalua\on of a TiQL program: Transla\on from TiQL to linear constraints
Performance op\miza\ons
![Page 19: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)](https://reader031.vdocuments.mx/reader031/viewer/2022021916/5e09ec795ed9793fba16abb0/html5/thumbnails/19.jpg)
Optimizing Performance Model op\mizer
eliminates variables, constraints, and parameters uses key constraints, func\onal dependencies, and provenance
Par\\oning op\mizer Significantly faster than lefng the MIP solver do it
max(x1 + x2 + x3 + x4)s.t : x1 + x2 ≤ 50
x3 + x4 ≤ 50xi ≥ 0
max(x1 + x2)s.t : x1 + x2 ≤ 50
xi ≥ 0
max(x3 + x4)s.t : x3 + x4 ≤ 50
xi ≥ 0
![Page 20: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)](https://reader031.vdocuments.mx/reader031/viewer/2022021916/5e09ec795ed9793fba16abb0/html5/thumbnails/20.jpg)
1
10
100
1000
10000
100 1000 10000
Runt
ime
(sec
)
Data size (# of tuples)
without model optimizationwith model optimization
Evaluation of the Model Optimizer
baseline
with op\miza\on
![Page 21: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)](https://reader031.vdocuments.mx/reader031/viewer/2022021916/5e09ec795ed9793fba16abb0/html5/thumbnails/21.jpg)
0
10
20
30
40
50
60
70
80
0 100 200 300 400 500 600
Runt
ime
(sec
)
Group size (# of partitions per group)
TiQL to MIP translationMIP solver
total Tiresias query time
0
5000
10000
15000
20000
25000
0 100 200 300 400 500 600
Runt
ime
(sec
)
Group size (# of partitions per group)
TiQL to MIP translationMIP solver
total Tiresias query time
Evaluation of Tiresias Partitioning
10k tuples 1M tuples
granularity of par\\oning
complex dependency on the granularity of par\\oning
![Page 22: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)](https://reader031.vdocuments.mx/reader031/viewer/2022021916/5e09ec795ed9793fba16abb0/html5/thumbnails/22.jpg)
0
1
2
3
4
5
6
7
8
5k 10k 50k 100k 500k 1M
Runt
ime
(sec
)
Data size (# of tuples)
MIP solver / per partition (avg)MIP constructor / per partition (avg)
Scalability
The MIP solver run\me (per par\\on) does not increase
with data size
Constructor \me depends on DB query execu\on \me
![Page 23: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)](https://reader031.vdocuments.mx/reader031/viewer/2022021916/5e09ec795ed9793fba16abb0/html5/thumbnails/23.jpg)
Related Work Provenance
[Amsterdamer et al. PODS’11], [Cui et al. TODS’00], [Green et al. PODS’07]
Incomplete databases [Antonova et al. SIGMOD’07], [Imielinski et al. JACM’84], [Koch ICDT’09]
Other RDM problems [Arasu et al. SIGMOD’11], [Binnig et al. ICDE’07], [Bohannon et al. PODS’06], [Fagin et al. JACM’10]
![Page 24: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)](https://reader031.vdocuments.mx/reader031/viewer/2022021916/5e09ec795ed9793fba16abb0/html5/thumbnails/24.jpg)
Next Steps with Tiresias
Tiresias Handling non-‐par\\onable problems
Approxima\ons
Paralleliza\on and handling of skew
Result analysis and feedback-‐based problem genera\on
![Page 25: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)](https://reader031.vdocuments.mx/reader031/viewer/2022021916/5e09ec795ed9793fba16abb0/html5/thumbnails/25.jpg)
SIGMOD Demo Group C Loca+on: Vaquero A Time: 13:30-‐15:00
![Page 26: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)](https://reader031.vdocuments.mx/reader031/viewer/2022021916/5e09ec795ed9793fba16abb0/html5/thumbnails/26.jpg)
Contributions How-‐To queries
Using MIP solvers to answer How-‐To queries
Tiresias prototype implementa\on
hlp://db.cs.washington.edu/\resias