tractability in probabilistic databases · this talk: query evaluation • i will describe recent...
TRANSCRIPT
Tractability in Probabilistic Databases
Dan Suciu University of Washington
ICDT 2011 1 Dan Suciu: Tractability in Probabilistic Databases
Slides available from h0p://www.cs.washington.edu/homes/suciu/
Motivation Lots of uncertain data: • External data in business intelligence: blogs,
emails, twitter, social network data
• Data generated by large information extraction systems
• Data integration in the presence of uncertainty
• Specialized data (RFID, scientific data)
ICDT 2011 Dan Suciu: Tractability in Probabilistic Databases 2
Probabilistic Databases
• In traditional databases: data is certain • In new applications: data is uncertain
• Probabilistic Databases: model uncertainty using probabilities
• Theoretical foundations: logic+probability
ICDT 2011 Dan Suciu: Tractability in Probabilistic Databases 3
Landscape of PDB Research Incomplete list: • Representation
– [Widom’05, Antova’07, Sen’07, Benjelloun’08] • Query evaluation
– [Dalvi’04, Antova’08,Olteanu’09, Koch’08,Kimelfeld’09,Deutsch’10] • Data integration, data exchange
– [Dong’07,Agrawal’10,Fagin’10] • Applications
– [Nierman’02,Keulen’05, Gupta’06,Hassanzadeh’09,Stoyanovich’10] • Ranking
– [Re’07,Soliman’07,Zhang’08,Li’09] • Indexing
– [Letchner’09,Kanagal’09, Kimura’10] • Pushing ML in the db engine
– [Wang’08,Wick’10]
ICDT 2011 Dan Suciu: Tractability in Probabilistic Databases 4
Healthy research …
Landscape of PDB Systems
• Some research prototypes: – MaybeMS (Oxford&Cornell), Trio (Stanford), MystiQ
(UW), ProbDB (Maryland), Orion (Purdue) – They do not scale to large datasets
• Commercial systems: NONE ! – Why ? Because to date we do not know how to build
scalable probabilistic database systems
Main challenge: Query Evaluation
This Talk: Query Evaluation
• I will describe recent progress: tractable queries
• I will discuss where we are stuck: intractable queries
This talk is based on:
• Dalvi, S. Efficient query evaluation on probabilistic databases. VLDB’04
• Dalvi,S. The Dichotomy of Conjunctive Queries on Probabilistic Structures, PODS’07
• Dalvi, Schnaitter, S.: Computing query probability with incidence algebras. PODS’10
• Gatterbauer, Jha, S., Dissociation and Propagation for Efficient Query Evaluation over Probabilistic Databases. MUD'2010
Adver&sement: Koch, Olteanu, Re, S., Probabilis)c Databases, Morgan & Claypool, planned for June, 2011
Outline
• Problem Statement
• Intractable queries
• Tractable queries
• Summary and Open Problems
7 Dan Suciu: Tractability in Probabilistic Databases ICDT 2011
Probabilistic Database Every tuple t in D = random variable (present/absent) Possible worlds semantics: D = (W, p), where p:Wà[0,1] s.t. ΣW∈W p(W) = 1
A B a1 b1 .7 a2 b2 .4 a2 b3 .2 a2 b4 .2 a3 b5 .5
S(A,B)
A a1 .4 a2 .2 a3 .5
R(A) 8 tuples à 28 possible worlds W = {W1, W2, …, W256}
In this talk = all tuples are independent
Need correlaPons ? Normalize database !
Queries = UCQ
9
Q := R(x1,…,xk) | ∃x.Q | Q1 ∧ Q2 | Q1 ∨ Q2
R(x),S(x,y) ∃x.∃y. R(x)∧S(x,y) ≣
Traditional database: D ⊨ Q (true/false) Probabilistic database: P(Q)=ΣW∈W: W ⊨ Q p(W) (in [0,1])
In this talk, I consider only Boolean queries:
Unions of ConjuncPve Queries:
Lineage
Query Q + Database D = Lineage FQ • Every tuple t in D = Boolean variable Xt
• The lineage FQ = a propositional formula – Definition: next slide – Intuitively: it says when Q is true on D
• Terminology alert: what we call here lineage corresponds to the PosBool semiring [Tannen, ICDT’10]
ICDT 2011 Dan Suciu: Tractability in Probabilistic Databases 10
Definition of Lineage FQ
A B a1 b1 Y1
a1 b2 Y2
a2 b3 Y3
a2 b4 Y4
a2 b5 Y5
S(A,B) A a1 X1
a2 X2
a3 X3
R(A)
Q = R(x),S(x,y)
FQ = X1 Y1 ∨ X1 Y2 ∨ X2 Y3 ∨ X2 Y4 ∨ X2 Y5
Def. Ft = Xt (t = ground tuple t) F∃x.Q = FQ[a1/x] ∨ . . . ∨ FQ[an/x] (active dom.={a1,…an}) FQ1∧Q2 = FQ1 ∧ FQ2 FQ1∨Q2 = FQ1 ∨ FQ2
Example
For any fixed Q, FQ has a DNF of size polynomial
in size(D)
Problem Statement
ICDT 2011 Dan Suciu: Tractability in Probabilistic Databases 12
Problem. Fix Q. Given D, compute P(Q); Equivalently: compute P(FQ).
This is a form of probabilistic inference (next)
Dichotomy Theorem. For any, fixed UCQ Q: • either P(Q) is in PTIME -- tractable • or P(Q) is hard for FP#P -- intractable
This talk I discuss tractable and intractable queries
Data complexity
Discussion: Probabilistic Inference
ICDT 2011 Dan Suciu: Tractability in Probabilistic Databases 13
Graphical models: X Y
Z U
P(X,Y,Z,U) = P(X) P(Y) P(Z|X,Y) P(U|Y)
Bayesian Network X Y
Z U
Markov Network
Φ(X,Y,Z,U) = 1/Z * Φ1(X,Y,Z) Φ2(Y,U)
Propositional formula = very special case
F = X Y ∨ Y Z ∨ X Z Y ZX
∧ ∧∧
∨
Y ZX
∧ ∧∧
∨
Inference: exponential in
tree-width
Large tree-width
[Pearl’88,Koller&Friedman’09,Darwiche’09]
Discussion: Probabilistic Inference
ICDT 2011 Dan Suciu: Tractability in Probabilistic Databases 14
Graphical Models Prob DBs
Model Complex P(X) = 1/Z Πi Φi(Xi)
Simple Independent tuples
Query Simple Q = P(X,Y | Z,U,V)
Complex Q =∃x.∃y.R(x)∧S(x,y)
Network Static Bayesian/Markov Network
Dynamic FQ = Q + D
Complexity f(tree-width, network) f(Q, D)
Inference in GM’s is exponential in tw; ill suited for PDB Need new, query-based approach: data complexity
Outline
• Problem Statement
• Intractable queries
• Tractable queries
• Summary and Open Problems
15 Dan Suciu: Tractability in Probabilistic Databases ICDT 2011
Intractable Queries
16
Theorem. Data complexity of Hk is hard for FP#P
H1= R(x0),S(x0,y0) ∨ S(x1,y1),T(y1)
H2= R(x0),S1(x0,y0) ∨ S1(x1,y1),S2(x1,y1) ∨ S2(x2,y2),T(y2)
H3=R(x0),S1(x0,y0) ∨ S1(x1,y1),S2(x1,y1) ∨ S2(x2,y2),S3(x2,y2)∨S3(x3,y3),T(y3)
H0= R(x),S(x,y),T(y)
. . .
Will give the proof for H0 and H1 next
[Dalvi&S.’04]
[Dalvi&S.’07, Dalvi,Schnai0er,S.’10]
Model Counting
Theorem The problem: “given a PP2DNF F, count the number of satisfying assignments #F” is #P-hard.
Definition A propositional formula F is a Positive, Partite, 2DNF if F = ∨i,j Xi Yj
Example: F = X1 Y1 ∨ X1 Y2 ∨ X2 Y3 ∨ X2 Y4 ∨ X2 Y5
[Provan&Ball’83]
Holds also for PP2CNF
Proof: H0 is hard for FP#P Reduction from #PP2DNF • Given F = Xi1 Yj1 ∨ Xi2 Yj2 ∨ …
construct D =
• Assignment θ ↔ possible world RW,TW
• θ satisfies F ↔ (RW,S, TW) ⊨ H0 • Therefore P(H0) = #F / 2n,
where n = number of variables
H0= R(x),S(x,y),T(y)
[Dalvi&S.’04]
A B
Xi1 Yj1 1
XI2 Yj2 1
Xi3 Yj3 1
…
A
X1 ½
X2 ½
…
B
Y1 ½
Y2 ½ …
S R T
We can compute #F using an oracle for P(H0); hence H0 is hard for FP#P
Proof: H1 is hard for FP#P By reduction from #PP2CNF: • F = (Xi1∨Yj1)∧(Xi2∨Yj2)∧… ∧(Xim∨Yjm)
Construct D =
• Assignment θ ↔ possible world RW,TW (no S!) • P(¬H1 | θ) = z#cnt(θ), where #cnt(θ) = |{k | θ(Xik∨ Yjk) = true}| • P(¬H1) = ΣθP(¬H1 |θ)P(θ) = 1/2n Σk=0,m ak zk = f(z)
where ak = |{θ | #cnt(θ) = k}|
• Compute f(z) at m+1 distinct values z, using oracle for P(H1). Obtain all coefficients a0,a1,…,am
• Return #F = am
A B
Xi1 Yj1 1-z
XI2 Yj2 1-z
Xi3 Yj3 1-z
…
A
X1 ½
X2 ½
…
B
Y1 ½
Y2 ½ …
S R T H1= R(x0),S(x0,y0) ∨ S(x1,y1),T(y1)
[Dalvi&S.’07]
We can compute #F using an oracle for P(H1); hence H1 is hard for FP#P
Discussion of Intractable Queries
• H0 corresponds to a Restricted Boltzmann Machine
• The hardness proof for Hk, k ≥ 2, and other queries, requires significant extensions
• There are other intractable queries: will describe all of them shortly
Open problems: Is the model counting problem for an intractable query #P hard ? How do we evaluate/approximate them ?
ICDT 2011 Dan Suciu: Tractability in Probabilistic Databases 20
[Salakhutdinov’10]
Outline
• Problem Statement
• Intractable queries
• Tractable queries – Safe plans – An algorithm for UCQ queries
• Summary and Open Problems
21 Dan Suciu: Tractability in Probabilistic Databases ICDT 2011
Overview
• Traditional query processing is done with query plans using relational operators
• Query processing on probabilistic databases is done with query plans using simple extensions of the relational operators
ICDT 2011 Dan Suciu: Tractability in Probabilistic Databases 22
A B P a1 b1 q1 Y1
a1 b2 q2 Y2
a2 b3 q3 Y3
a2 b4 q4 Y4
a2 b5 q5 Y5
S(A,B)
A P a1 p1 X1
a2 p2 X2
a3 p3 X3
R(A)
Q = ∃x.∃y. R(x),S(x,y)
P(Q) = 1-(1-q1)*(1-q2) p1*[ ]
1-(1-q3)*(1-q4)*(1-q5) p2*[ ]
1- {1- } *
{1- }
FQ = X1Y1 ∨ X1Y2 ∨ X2Y3 ∨ X2Y4 ∨ X2Y5
“Read-once” *
= X1(Y1 ∨ Y2) ∨ X2(Y3 ∨ Y4 ∨ Y5)
Example
* See Sudeepa Roy’s talk on Tuesday
A B P
a1 b1 q1 Y1
a1 b2 q2 Y2
a2 b3 q3 Y3
a2 b4 q4 Y4
a2 b5 q5 Y5
S(A,B)
A P
a1 p1 X1
a2 p2 X2
a3 p3 X3
R(A)
⋈
A B P
a1 b1 p1*q1 X1 ∧ Y1
a1 b2 p1*q2 X1 ∧ Y2
a2 b3 p2*q3 X2 ∧ Y3
a2 b4 p2*q4 X2 ∧ Y4
a2 b5 p2*q5 X2 ∧ Y5
Independent join operator
T1(A,B)
A B P
a1 b1 q1 Y1
a1 b1 q2 Y2
a2 b2 q3 Y3
a2 b3 q4 Y4
a2 b2 q5 Y5
S(A,B)
A P
a1 1 - (1-p1)*(1-p2) Y1 ∨ Y2
a2 1 - (1-p3)*(1-p4)*(1-p5) Y3 ∨ Y4 ∨Y5
Independent projection operator
ΠA
T2(A)
Operators
… other operators for self-joins and for unions
i i
Q = ∃x.∃y. R(x),S(x,y)
q1
q2
q3
q4
q5
p1
p2
p3
⋈
p1 q1
p1 q2
p2 q3
p2 q4
p2 q5
ΠΦ
S(A,B) R(A)
1-(1-p1q1)(1-p1q2)(1-p2q3)(1-p2q4)(1-p2q5)
1-(1-q1)(1-q2)
1-(1-q4)(1-q5) (1-q6)
1-{1-p1[1-(1-q1)(1-q2)]}* {1-p2[1-(1-q4)(1-q5) (1-q6)]}
Wrong Right
T2(A) T1(A,B)
i
i
⋈
ΠΦ
S(A,B) R(A)
ΠA
i
i
i
[Dalvi&S.’04]
Discussion • An extended operator manipulates
probabilities explicitly: join, projection, etc
• A safe plan is a plan that computes the query probability correctly
• Next: I will give an algorithm for computing all tractable UCQ queries. Each tractable query can be computed using a safe plan, using a small set of extended operators, but I will not discuss safe plans in the rest of the talk
ICDT 2011 Dan Suciu: Tractability in Probabilistic Databases 26
Outline
• Problem Statement
• Intractable queries
• Tractable queries – Safe plans – An algorithm for UCQ queries
• Summary and Open Problems
27 Dan Suciu: Tractability in Probabilistic Databases ICDT 2011
CQ with Self-Joins
28
QJ = q1, q2 = R(x1),S(x1,y1), T(x2),S(x2,y2)
FJ = [X1(Y1 ∨ Y2) ∨ X2 (Y3 ∨ Y4 ∨ Y5)] ∧ [Z1(Y1 ∨ Y2) ∨ Z2 (Y3 ∨ Y4 ∨ Y5)] Not read-once
QU = q1 ∨ q2 = R(x1),S(x1,y1) ∨ T(x2),S(x2,y2)
FU = (X1∨Z1) (Y1∨Y2) ∨(X2∨Z2) (Y3∨Y4∨ Y5) Read-once
P(QJ)= P(q1, q2) = P(q1) + P(q2) - P(q1 ∨ q2)
PTIME !
[Dalvi,Schai0er,S.’10]
Discussion
• In order to handle self-joins, we needed ∨
• CQ = not a natural class to study
• UCQ = the natural class to study
• Will give the algorithm next as five rules
29 Dan Suciu: Tractability in Probabilistic Databases ICDT 2011
BUMMER !
Five Rules for Query Evaluation
Rule 1: Inclusion/Exclusion Formula P(Q1 ∧ Q2 ∧ Q3) = P(Q1) + P(Q2) + P(Q3) - P(Q1 ∨ Q2) ‒ P(Q1 ∨ Q3) ‒ P(Q2 ∨ Q3) + P(Q1 ∨ Q2 ∨ Q3)
Note1: this is the dual of the more popular: P(Q1 ∨Q2 ∨Q3) = . . .
Note2: this rule is not used for inference
on GM or on propositional formulae; new in PDBs
Five Rules for Query Evaluation
31
Rule 2: Independent Project If z is separator variable, P(∃z.Q) = 1 – (1 – P(Q[a1/z])) × (1 – P(Q[a2/z])) × … Where active domain = {a1, a2, a3, … an}
Definition. z is called separator variable if: • z occurs in every atom A in Q (z is a root variable) • If two atoms A1, A2 in Q are unifiable,
then z occurs on a common position in A1, A2.
Example: QU = R(x1),S(x1,y1) ∨ T(x2),S(x2,y2) = ∃z.(∃y1.R(z),S(z,y1) ∨∃y2.T(z),S(z,y2)) z is “separator variable”
Five Rules for Query Evaluation
32
If Q1, Q2 are independent (no common symbols)
Rule 3: Independent Join P(Q1∧Q2) = P(Q1) × P(Q2)
Rule 4: Independent Union P(Q1∨Q2) = 1-(1-P(Q1)) × (1-P(Q2))
Five Rules for Query Evaluation
33
Rule 5a: Ranking attribute-constant Given attribute R(…A…) and constant ‘a’, replace R in Q with R = R1 ∪ R2, where R1 = σA=‘a’(R), R2 = σA≠’a’(R)
Rule 5b: Ranking attribute-attribute Given two attributes R(…A…B…) replace R in Q with R = R1 ∪ R2 ∪ R3, where R1 = σA<B(R) , R2 = σA>B(R) , R3 = σA=B(R)
Example: q = R(x,y),R(y,x) rewrite to q = R1(x,y),R2(y,x) ∨ R3(z,z)
Note: ranking is applied before all other rules
Summary
• Five rules: 1. Inclusion/exclusion 2. Existential quantifier elimination (separator !) 3. Independent AND 4. Independent OR 5. Ranking
• Each rule reduces P(Q) to a simpler P(Q’) – If we succeed, then P(Q) in PTIME – If we fail, then P(Q) is hard for FP#P
ICDT 2011 Dan Suciu: Tractability in Probabilistic Databases 34
Specific to PDB, not used in general
probabilisPc inference
An Example
35
QV = R(x1),S(x1,y1) ∨ S(x2,y2),T(y2) ∨ R(x3),T(y3)
DNF = H1 (hard !)
[Dalvi,Schai0er,S.’10]
An Example
36
QV = R(x1),S(x1,y1) ∨ S(x2,y2),T(y2) ∨ R(x3),T(y3)
DNF Disconnected query = H1 (hard !)
[Dalvi,Schai0er,S.’10]
An Example
37
QV = R(x1),S(x1,y1) ∨ S(x2,y2),T(y2) ∨ R(x3),T(y3)
QV =[S(x2,y2),T(y2)∨ R(x3)] ∧ [R(x1),S(x1,y1)∨T(y3)]
DNF
CNF
Disconnected query = H1 (hard !)
[Dalvi,Schai0er,S.’10]
An Example
38
QV = R(x1),S(x1,y1) ∨ S(x2,y2),T(y2) ∨ R(x3),T(y3)
QV =[S(x2,y2),T(y2)∨ R(x3)] ∧ [R(x1),S(x1,y1)∨T(y3)]
DNF
CNF
= R(x3) ∨ T(y3)
P(QV) = P(q1∧q2)= P(q1) + P(q2)-P(q1∨q2)
PTIME !
Disconnected query = H1 (hard !)
Inclusion/exclusion:
QV has a subquery H1 that is hard, yet it is in PTIME !
[Dalvi,Schai0er,S.’10]
Another Example
QW = [R(x0),S1(x0,y0) ∨ S2(x2,y2),S3(x2,y2)] ∧ /* Q1 */ [R(x0),S1(x0,y0) ∨ S3(x3,y3),T(y3)] ∧ /* Q2 */ [S1(x1,y1),S2(x1,y1) ∨ S3(x3,y3),T(y3)] /* Q3 */
39
[Dalvi,Schai0er,S.’10]
Another Example
P(QW) = P(Q1) + P(Q2) + P(Q3) + - P(Q1 ∨ Q2) - P(Q2 ∨ Q3) – P(Q1 ∨ Q3) + P(Q1 ∨ Q2 ∨ Q3)
Also = H3
40
= H3 (hard !)
QW = [R(x0),S1(x0,y0) ∨ S2(x2,y2),S3(x2,y2)] ∧ /* Q1 */ [R(x0),S1(x0,y0) ∨ S3(x3,y3),T(y3)] ∧ /* Q2 */ [S1(x1,y1),S2(x1,y1) ∨ S3(x3,y3),T(y3)] /* Q3 */
[Dalvi,Schai0er,S.’10]
Another Example
P(QW) = P(Q1) + P(Q2) + P(Q3) + - P(Q1 ∨ Q2) - P(Q2 ∨ Q3) – P(Q1 ∨ Q3) + P(Q1 ∨ Q2 ∨ Q3)
Also = H3
41
PTIME !
QW = [R(x0),S1(x0,y0) ∨ S2(x2,y2),S3(x2,y2)] ∧ /* Q1 */ [R(x0),S1(x0,y0) ∨ S3(x3,y3),T(y3)] ∧ /* Q2 */ [S1(x1,y1),S2(x1,y1) ∨ S3(x3,y3),T(y3)] /* Q3 */
= H3 (hard !)
[Dalvi,Schai0er,S.’10]
How do we need to detect cancellaPons in the inclusion/excusion rule ?
August Ferdinand Möbius 1790-1868
• Möbius strip • Möbius function µ in
number theory • Generalized to lattices
[Stanley’97,Rota’09] • And now to queries !
42
From Query Q to Lattice L(Q)
• Write Q in CNF: Q = Q1 ∧ Q2 ∧ … ∧ Qm.
• For s ⊆ [m], denote Qs = ∨i∈sQi
Def. The CNF lattice of Q is L(Q)=(L,≤) where: • L = { Qs | s ⊆ [m] } (up to logical equivalence) • Qs1 ≤ Qs2 if the logical implication Qs2 à Qs1 holds
[Dalvi,Schai0er,S.’10]
Example
Q1 Q2 Q3
Q2∨Q3 Q1∨Q2
Q1∨Q2∨Q3 = Q1∨ Q3 44
Blue nodes are in PTIME, Red nodes are #P hard.
1̂ =max(L) 1̂
[Dalvi,Schai0er,S.’10]
QW = [R(x0),S1(x0,y0) ∨ S2(x2,y2),S3(x2,y2)] ∧ /* Q1 */ [R(x0),S1(x0,y0) ∨ S3(x3,y3),T(y3)] ∧ /* Q2 */ [S1(x1,y1),S2(x1,y1) ∨ S3(x3,y3),T(y3)] /* Q3 */
L(QW) =
The Möbius’ Function 1̂
-1 -1 -1
1 1
0
45
1
1̂ -1 -1 -1
2
1
Def. The Möbius function: µ( , ) = 1 µ(u, ) = - Σu < v ≤ µ(v, )
1̂ 1̂ 1̂ 1̂ 1̂
Möbius’ Inversion Formula: P(Q) = - Σ Qi < µ(Qi, ) P(Qi) 1̂ 1̂
[Stanley 97]
Dan Suciu: Tractability in Probabilistic Databases ICDT 2011
Rule 1 (revised) Inclusion/Exclusion à Mobius’ Inversion Formula
The Dichotomy
Dichotomy Theorem Fix a UCQ query Q. 1. Algorithm terminates, then P(Q) is in PTIME 2. Algorithm fails, then P(Q) is hard for FP#P
Note 1: dichotomy into PTIME/FP#P based on “syntax” where “syntax” includes the Möbius function !
Note 2: the query complexity is open. Note 3: we cannot avoid the Möbius function (next).
[Dalvi,Schai0er,S.’10]
Representation Theorem
Proof • Let u0, u1, …, uk be the “join irreducibles” of L • Associate them to the k+1 components of
Hk = hk0 ∨ hk1 ∨ … ∨ hkk • For every v ∈ L, define Qv = ∨{hki | ui ≰ v} • Define Q = ∧ {Qv | v = co-atom in L}
ICDT 2011 Dan Suciu: Tractability in Probabilistic Databases 47
THEOREM For every lattice L, there exists Q s.t. L(Q) ≅ L and: • The query at (= min(L)) is hard for FP#P
• All other queries are in PTIME 0̂
1̂
Q9
0 Q is in PTIME iff µ( , )=0 ! 1̂ 0̂
[Dalvi,Schai0er,S.’10]
Example:
Summary of Tractable Queries Five simple rules • Each rule has operator (not shown) à safe plan • Open problem: engineering challenges of safe
plans
Other approaches to compute P(F) include: read-once formulae, OBDDs, FBDDs, d-DNNF’s • For probabilistic databases, the Möbius Rule rules
them all (see Abhay Jha’s talk Tuesday)
ICDT 2011 Dan Suciu: Tractability in Probabilistic Databases 48
Möbius Über Alles
49
UCQ PTIME
d-DNNF
FBDD
OBDD=TW=PW
RO QJ
QV
QW
Q9
H0 H1 H2
?
QU
Necessary and sufficient
characterization
Sufficient Characterization
(Necessary conjectured)
H3
Preview of Abhay Jha’s talk on Tuesday
Outline
• Problem Statement
• Intractable queries
• Tractable queries
• Summary and Open Problems
50 Dan Suciu: Tractability in Probabilistic Databases ICDT 2011
Summary • Why we care:
– Strong demand for managing uncertain data • What is difficult:
– Computational complexity of Logic + Probabilities
• What we have seen today: – Tractable queries: the five rules are simple – Intractable queries: hardness proofs are difficult – Dichotomy into PTIME/FP#P based on syntax
51 Dan Suciu: Tractability in Probabilistic Databases ICDT 2011
Open Problems • Correlations using views:
– Markov Logic à Markov DBs ? • Model counting for H0 (and others) #P hard ? • Efficient approximation of H0 (and others) ? • Query complexity for checking hardness ? • Data (and query) complexity for:
FO (¬ ∀), interpreted predicates (≠, < ) ? • Characterize UCQ(FBDD), UCQ(d-DNNF) • Prove UCQ(d-DNNF) ≠ UCQ(PTIME)
52 Dan Suciu: Tractability in Probabilistic Databases ICDT 2011
[Richardson&Domingos]
The 4x Grand Challenge Challenge: make probabilistic dbs run at most 4x slower than deterministic dbs, by using approximations Suggested approach: • A rich class of tractable UCQ queries are known • Given any Q, find all tractable Q’, s.t. P(Q) ≈ P(Q’):
– Model theoretic (Robert Fink’s talk on Tuesday) – Dissociation [Gatterbauer’10] http://LaPushDB.com
This is an engineering challenge: search space for Q’; opPmize and execute Q’
Thank You !
QuesPons ?
Slides available from h0p://www.cs.washington.edu/homes/suciu/