Full Disjunctions: Polynomial-Delay Iterators in Action
Sara Cohen, Technion, Israel
Yaron Kanza, University of Toronto, Canada
Benny Kimelfeld, Hebrew University, Israel
Yehoshua Sagiv, Hebrew University, Israel
Itzhak Fadida, Technion, Israel
VLDB 2006, Seoul, Korea
Full Disjunctions: Polynomial-Delay Iterators in Action, VLDB 06
Computing Full Disjunctions
The full disjunction is a relational operator that maximally combines data from several relations
– It extends the natural join by allowing incompleteness
– It extends the binary outerjoin to many relations
This paper presents algorithms and optimizations for computing full disjunctions
– Theoretically, full disjunctions are more tractable than previously known
– Practically, a significant improvement over the state of the art, with an iterator-like evaluation
Contents
Full Disjunctions
− Complexity
Contributions
Algorithms
− Algorithm NLOJ for Tree-Structured Schemes
− Algorithm PDelayFD for General Schemes
− Algorithm BiComNLOJ − Main Algorithm
Experimental Results
Conclusion
The Natural Join Operator
Climates:
Country | Climate
Canada | diverse
Bahamas | tropical
UK | temperate
Accommodations:
Country | City | Hotel | Stars
Canada | Toronto | Plaza | 4
Canada | London | Ramada | 3
Bahamas | Nassau | Hilton |
Sites:
Country | City | Site
Canada | London | Air Show
Canada | | Mount Logan
UK | London | Buckingham
UK | London | Hyde Park
Climates ⋈ Accommodations ⋈ Sites:
Country | Climate | City | Hotel | Stars | Site
Canada | diverse | London | Ramada | 3 | Air Show
The Natural Join Misses Information
Climates ⋈ Accommodations ⋈ Sites contains a single tuple:
Country | Climate | City | Hotel | Stars | Site
Canada | diverse | London | Ramada | 3 | Air Show
– Bahamas is not in Sites, so the natural join misses it
– Mount Logan is not in a city, hence missed
– Empty space means null value
A looser notion of join is needed: one that enables joining tuples from some of the tables
The Natural Join Operator
A tuple of the join corresponds to a set of tuples from the source relations. The set is:
– Join consistent
– Connected: no Cartesian product
– Complete: one tuple from each relation
Join-Consistent Sets of Tuples
Two tuples t1 and t2 are join-consistent if for every common attribute A:
1. t1[A] and t2[A] are non-null
2. t1[A] = t2[A]
A set T of tuples is join-consistent if every two tuples of T are join-consistent
Example of two join-consistent tuples:
Country | City | Hotel | Stars
Canada | London | Ramada | 3
Country | City | Site
Canada | London | Air Show
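
The definition above can be sketched in a few lines of Python. This is only an illustration: tuples are modeled as dictionaries from attribute names to values, `None` stands for a null, and the function names `join_consistent` and `set_join_consistent` are made up for this sketch.

```python
from itertools import combinations

def join_consistent(t1, t2):
    """True if every common attribute is non-null and equal in both tuples."""
    common = t1.keys() & t2.keys()
    return all(t1[a] is not None and t1[a] == t2[a] for a in common)

def set_join_consistent(tuples):
    """A set of tuples is join-consistent if every two of its tuples are."""
    return all(join_consistent(a, b) for a, b in combinations(tuples, 2))

# Tuples taken from the running example (None models a null value).
ramada = {"Country": "Canada", "City": "London", "Hotel": "Ramada", "Stars": 3}
air_show = {"Country": "Canada", "City": "London", "Site": "Air Show"}
mount_logan = {"Country": "Canada", "City": None, "Site": "Mount Logan"}

print(join_consistent(ramada, air_show))     # True: Country and City agree
print(join_consistent(ramada, mount_logan))  # False: City is null in one tuple
```

Note that a null on a common attribute rules out join consistency, which is why the site without a city cannot be combined with any hotel.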
Connected Sets of Tuples
The join graph of a set T of tuples:
– The nodes are the tuples of T
– An edge between every two tuples with a common attribute
A set of tuples is connected if its join graph is connected
Example tuples:
Country | Climate
Canada | diverse
Country | City | Site
UK | London | Buckingham
City | Hotel | Stars
Toronto | Plaza | 4
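
As a rough sketch of the connectivity test, the join graph can be explored with a depth-first search. Tuples are again modeled as dictionaries, the name `is_connected` is hypothetical, and treating the empty set as connected is a convention chosen here, not something the slides state.

```python
def is_connected(tuples):
    """Check whether the join graph of a list of dict tuples is connected.

    Nodes are the tuples; two tuples are adjacent when they share an attribute.
    """
    if not tuples:
        return True  # convention chosen for this sketch
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j, other in enumerate(tuples):
            if j not in seen and tuples[i].keys() & other.keys():
                seen.add(j)
                stack.append(j)
    return len(seen) == len(tuples)

climate = {"Country": "Canada", "Climate": "diverse"}
site = {"Country": "UK", "City": "London", "Site": "Buckingham"}
hotel = {"City": "Toronto", "Hotel": "Plaza", "Stars": 4}

print(is_connected([climate, site, hotel]))  # True: linked via Country and City
print(is_connected([climate, hotel]))        # False: no common attribute
```

Connectivity only looks at shared attribute names; whether the values agree is the separate join-consistency condition.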
Natural Join (w/o Cartesian Product)
Each tuple of the result corresponds to a set T of tuples from the source relations, such that:
1. T is join consistent
2. T is connected: no Cartesian product
3. T is complete: one tuple from each relation
JCC: T is join consistent and connected (conditions 1 and 2)
Full Disjunction (Galindo-Legaria 1994)
Each tuple of the result corresponds to a set T of tuples from the source relations, such that:
1. T is join consistent
2. T is connected: no Cartesian product
3. T is maximal: not properly contained in any JCC set (this replaces the completeness condition of the natural join)
JCC: T is join consistent and connected (conditions 1 and 2)
An Example of a Full Disjunction
R consists of three relations.
Climates:
Country | Climate
Canada | diverse
UK | temperate
Accommodations:
Country | City | Hotel | Stars
Canada | Toronto | Plaza | 4
Canada | London | Ramada | 3
Sites:
Country | City | Site
Canada | London | Air Show
Canada | | Mount Logan
UK | London | Buckingham
FD(R):
Country | Climate | City | Hotel | Stars | Site
Canada | diverse | Toronto | Plaza | 4 |
Canada | diverse | London | Ramada | 3 | Air Show
Canada | diverse | | | | Mount Logan
UK | temperate | London | | | Buckingham
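
The example can be checked with a brute-force sketch that enumerates every subset of the database and keeps the maximal JCC sets. This is deliberately naive (exponential in the number of tuples) and is not one of the algorithms of the talk. The helper names and the representation (dict tuples, `None` for null, tuple sets as sets of positions in the list `DB`) are assumptions made for illustration.

```python
from itertools import combinations

DB = [
    {"Country": "Canada", "Climate": "diverse"},                              # 0
    {"Country": "UK", "Climate": "temperate"},                                # 1
    {"Country": "Canada", "City": "Toronto", "Hotel": "Plaza", "Stars": 4},   # 2
    {"Country": "Canada", "City": "London", "Hotel": "Ramada", "Stars": 3},   # 3
    {"Country": "Canada", "City": "London", "Site": "Air Show"},              # 4
    {"Country": "Canada", "City": None, "Site": "Mount Logan"},               # 5
    {"Country": "UK", "City": "London", "Site": "Buckingham"},                # 6
]

def consistent(t1, t2):
    common = t1.keys() & t2.keys()
    return all(t1[a] is not None and t1[a] == t2[a] for a in common)

def connected(ts):
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in range(len(ts)):
            if j not in seen and ts[i].keys() & ts[j].keys():
                seen.add(j)
                stack.append(j)
    return len(seen) == len(ts)

def is_jcc(ts):
    """Join consistent and connected."""
    return all(consistent(a, b) for a, b in combinations(ts, 2)) and connected(ts)

def brute_force_fd(db):
    """Return all maximal JCC sets, each as a frozenset of positions in db."""
    n = len(db)
    jcc = [frozenset(c)
           for size in range(1, n + 1)
           for c in combinations(range(n), size)
           if is_jcc([db[i] for i in c])]
    # Keep only sets that are not properly contained in another JCC set.
    return [s for s in jcc if not any(s < other for other in jcc)]

for tuple_set in brute_force_fd(DB):
    print(sorted(tuple_set))
```

Each printed list of positions corresponds to one row of FD(R) once its tuples are joined and padded with nulls.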
Padding Joined Tuple Sets with Nulls
The tuple set consisting of
Country | City | Site
Canada | | Mount Logan
and
Country | Climate
Canada | diverse
is joined into a single tuple over all attributes, padded with nulls:
Country | Climate | City | Hotel | Stars | Site
Canada | diverse | | | | Mount Logan
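
A minimal sketch of the padding step, assuming tuples are dictionaries and `None` stands for a null. The function name `pad_join` and the explicit attribute list are illustrative choices, not part of the talk.

```python
ATTRIBUTES = ["Country", "Climate", "City", "Hotel", "Stars", "Site"]

def pad_join(tuple_set, attributes):
    """Join a JCC tuple set into one row; attributes without a value stay null."""
    row = dict.fromkeys(attributes)  # every attribute starts as None
    for t in tuple_set:
        for attr, value in t.items():
            if value is not None:
                row[attr] = value
    return row

site = {"Country": "Canada", "City": None, "Site": "Mount Logan"}
climate = {"Country": "Canada", "Climate": "diverse"}

print(pad_join([site, climate], ATTRIBUTES))
```

Because the input set is assumed to be join consistent, two tuples never disagree on a non-null value, so the order in which they are merged does not matter.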
The Outerjoin Operator
The outerjoin of two relations R1 and R2, written R1 ⟗ R2:
The natural join R1 ⋈ R2 and, in addition, all dangling tuples padded with nulls
Example of an Outerjoin
Climates:
Country | Climate
Canada | diverse
Bahamas | tropical
UK | temperate
Accommodations:
Country | City | Hotel | Stars
Canada | Toronto | Plaza | 4
France | Paris | Atala | 4
Bahamas | Nassau | Hilton |
Climates ⟗ Accommodations:
Country | Climate | City | Hotel | Stars
Canada | diverse | Toronto | Plaza | 4
Bahamas | tropical | Nassau | Hilton |
UK | temperate | | |
France | | Paris | Atala | 4
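
The outerjoin of this example can be reproduced with a small nested-loop sketch. It assumes dict tuples with `None` for nulls, and it materializes the whole result, so it only illustrates the operator and not the iterator-based evaluation discussed later. The names `consistent` and `outerjoin` are hypothetical.

```python
def consistent(t1, t2):
    common = t1.keys() & t2.keys()
    return all(t1[a] is not None and t1[a] == t2[a] for a in common)

def outerjoin(r1, r2):
    """Full outerjoin: matching pairs plus dangling tuples of both sides."""
    attrs = list(dict.fromkeys(a for t in r1 + r2 for a in t))

    def pad(*parts):
        row = dict.fromkeys(attrs)  # missing attributes become None
        for p in parts:
            row.update({a: v for a, v in p.items() if v is not None})
        return row

    result, matched_right = [], set()
    for t1 in r1:
        found = False
        for j, t2 in enumerate(r2):
            if consistent(t1, t2):
                result.append(pad(t1, t2))
                matched_right.add(j)
                found = True
        if not found:
            result.append(pad(t1))  # dangling tuple of r1
    # Dangling tuples of r2 come last.
    result += [pad(t2) for j, t2 in enumerate(r2) if j not in matched_right]
    return result

climates = [
    {"Country": "Canada", "Climate": "diverse"},
    {"Country": "Bahamas", "Climate": "tropical"},
    {"Country": "UK", "Climate": "temperate"},
]
accommodations = [
    {"Country": "Canada", "City": "Toronto", "Hotel": "Plaza", "Stars": 4},
    {"Country": "France", "City": "Paris", "Hotel": "Atala", "Stars": 4},
    {"Country": "Bahamas", "City": "Nassau", "Hotel": "Hilton", "Stars": None},
]

for row in outerjoin(climates, accommodations):
    print(row)
```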
Combining Relations using Outerjoins
The outerjoin operator is not associative
– For more than two relations, the result depends on the order in which the outerjoin is applied
In general, outerjoins cannot maximally combine relations (no matter what order is used)
Outerjoin is not suitable for combining more than two relations!
Efficiency of Evaluation
The full-disjunction operator (as well as other operators like the Cartesian product or the natural join) can generate an exponential (in the input size) number of tuples
– Polynomial running time is not a suitable yardstick
The usual notion:
– Polynomial time in the combined size of the input and the output
History of Algorithms for Full Disjunctions
Source | Time | Databases
RU96 | O(n + F²) | γ-acyclic
KS03 | O(n⁵N²F²) | general
CS05 | O(n³NF²), “incremental polynomial” | general
n: number of relations
N: number of tuples in the DB
F: number of tuples in the FD
F is typically very large
– Can be exponential in the size of the database
This paper: linear dependence on F
Polynomial Delay
One way to obtain an evaluation with a running time linear in the output is to devise an algorithm that acts as an iterator with an efficient next() operator, that is, an enumeration algorithm that runs with polynomial delay
An enumeration algorithm runs with polynomial delay if the time between every two successive answers is polynomial in the size of the input
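
To make the notion concrete, here is a toy enumeration with polynomial delay. It is not a full-disjunction algorithm: it enumerates a Cartesian product whose output is exponential in the number of relations, yet each answer is produced after a small amount of work. The function name `enumerate_product` is made up for this sketch.

```python
from itertools import islice

def enumerate_product(relations, prefix=()):
    """Yield the Cartesian product one answer at a time (depth-first).

    The work between two successive answers is bounded by the recursion
    depth times the relation size, not by the size of the whole output.
    """
    if not relations:
        yield prefix
        return
    for value in relations[0]:
        yield from enumerate_product(relations[1:], prefix + (value,))

relations = [(0, 1)] * 30  # 2**30 answers in total
first_answers = list(islice(enumerate_product(relations), 3))
for answer in first_answers:
    print(answer)
```

The consumer obtains the first three answers immediately and may stop at any point, which is the behavior an iterator with an efficient next() provides.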
Other Benefits of Polynomial Delay
Incremental evaluation
– First tuples are generated quickly
– Full disjunctions are large, yet the user need not wait for the whole result to be generated
– Suitable for Web applications, where users expect to get the first few pages quickly
– In addition, the user can decide anytime that enough information has been shown
Enable parallel query processing
– While one processor generates the FD tuples, other processors apply further processing
Main Contributions
1. First algorithm for computing full disjunctions with polynomial delay
2. First algorithm for computing full disjunctions in time linear in the output
3. A general optimization technique for computing full disjunctions
– Division into biconnected components
Substantial improvement over the state of the art is proved theoretically and experimentally
Our Algorithms
Algorithm NLOJ: tree schemes
Algorithm PDelayFD: general schemes
Optimization: division into biconnected components
Combined: Algorithm BiComNLOJ, the main algorithm, for general schemes
Tree Schemes
Scheme graphs w/o cycles
In the scheme graph, the relation schemes are the nodes and there is an edge between every two schemes with one or more common attributes
[Figure: a tree-shaped scheme graph over the relations R1, …, R7]
Left-Deep Sequence of Outerjoins
R: a set of relations with a tree scheme
R1,…,Rn: a connected-prefix order of R
Algorithm NLOJ (Nested Loop OuterJoin)
1. Compute a connected-prefix order of R
2. Apply outerjoins in a left-deep order
Proposition: FD(R) = (…((R1 ⟗ R2) ⟗ R3) …) ⟗ Rn
Connected-Prefix Order of Relations
A connected-prefix order of relations: each prefix forms a (connected) subtree
[Figure: the tree-shaped scheme graph over the relations R1, …, R7]
Example order: R1 R3 R2 R7 R4 R5 R6
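
One simple way to obtain a connected-prefix order is a breadth-first traversal of the scheme graph, since every visited node is adjacent to an earlier one. The sketch below assumes this approach; the slides do not say how the order is computed. The schemes and the names `scheme_graph` and `connected_prefix_order` are hypothetical, because the edges of the figure are not recoverable from the text.

```python
from collections import deque

def scheme_graph(schemes):
    """schemes: dict relation name -> set of attributes.

    Two relation schemes are adjacent when they share at least one attribute.
    """
    names = list(schemes)
    return {a: [b for b in names if b != a and schemes[a] & schemes[b]]
            for a in names}

def connected_prefix_order(schemes, root):
    """Breadth-first order from root; every prefix of it is connected."""
    graph = scheme_graph(schemes)
    order, seen, queue = [], {root}, deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order

# Hypothetical tree scheme: R1-R2 (A), R1-R3 (B), R2-R4 (C), R3-R5 (D).
SCHEMES = {
    "R1": {"A", "B"},
    "R2": {"A", "C"},
    "R3": {"B", "D"},
    "R4": {"C", "E"},
    "R5": {"D", "F"},
}

order = connected_prefix_order(SCHEMES, "R1")
print(order)
```

A depth-first traversal would serve equally well; the only requirement is that each relation shares an attribute with some relation placed before it.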
Achieving Polynomial Delay
Algorithm NLOJ (Nested Loop OuterJoin)
1. Compute a connected-prefix order of R
2. Apply outerjoins in a left-deep order: (…((R1 ⟗ R2) ⟗ R3) … ⟗ Rn-1) ⟗ Rn
Problem: exponential delay, since an intermediate result can already have exponential size
Solution: use iterators
Iterators
To obtain polynomial delay, we use iterators
An iterator:
– Operates on top of an enumeration algorithm
– Implements next() by controlling the execution
Using Iterators for Outerjoins
[Figure: Iterator 1, Iterator 2, …, Iterator n-1, Iterator n attached to the left-deep sequence (…((R1 ⟗ R2) ⟗ R3) … ⟗ Rn-1) ⟗ Rn]
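
The idea of stacking iterators over a left-deep sequence can be sketched with Python generators. This is a simplified illustration under stated assumptions, not the NLOJ implementation of the paper: tuples are dicts, a missing attribute plays the role of a null, base relations are lists, and the names `joins`, `outerjoin_iter`, and `scan` are hypothetical.

```python
def joins(left, right):
    """Tuples match if they share attributes and agree (non-null) on all of them."""
    common = left.keys() & right.keys()
    return bool(common) and all(
        left[a] is not None and left[a] == right[a] for a in common)

def outerjoin_iter(left, right):
    """Lazy outerjoin: left is an iterator of rows, right is a base relation."""
    matched = set()
    for t1 in left:
        found = False
        for j, t2 in enumerate(right):
            if joins(t1, t2):
                matched.add(j)
                found = True
                yield {**t1, **t2}
        if not found:
            yield t1  # dangling row from the left input
    for j, t2 in enumerate(right):
        if j not in matched:
            yield t2  # dangling tuple of the base relation

consumed = []

def scan(relation):
    """Base iterator that records which tuples have been read so far."""
    for t in relation:
        consumed.append(t)
        yield t

# Hypothetical path scheme R1(A,B) - R2(B,C) - R3(C,D).
R1 = [{"A": 1, "B": 1}, {"A": 2, "B": 2}]
R2 = [{"B": 1, "C": 1}, {"B": 3, "C": 3}]
R3 = [{"C": 3, "D": 3}, {"C": 4, "D": 4}]

fd = outerjoin_iter(outerjoin_iter(scan(R1), R2), R3)
first = next(fd)
read_before_first_answer = len(consumed)
rest = list(fd)
print(first, read_before_first_answer)
```

Every row of the left input produces at least one output row, so each call to next() at one level triggers at most one call at the level below plus a scan of one base relation. That is the intuition behind the bounded delay; the full argument is in the paper.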
Outerjoins are not Always Applicable
It is not always possible to formulate a full disjunction as a left-deep sequence of outerjoins
Rajaraman and Ullman [PODS 96]: Some full disjunctions cannot be formulated as expressions of outerjoins (i.e., even with arbitrary placement of parentheses)
About the AlgorithmAbout the Algorithm
Unlike NLOJ, the next algorithm, PDelayFD, is applicable to all schemes (and not just trees)
Algorithm PDelayFD has a polynomial delay, but the delay is larger than that of NLOJ
Nevertheless, PDelayFD by itself is a significant improvement over the state of the art
Shifting a Maximal JCC Tuple Set T
t-shifting T, for a tuple t:
1. Add t to T
2. Extract the maximal JCC subset containing t
3. Extend to a maximal JCC set
The result is the t-shift of T
Algorithm PDelayFD
1. Generate a maximal JCC set T0
2. Insert T0 into Q
Repeat until Q is empty:
1. Move some T from Q to C
2. Print the join of T, padded with nulls
3. Insert into Q a t-shift of T for all tuples t in the database
– Validate that the t-shift is not already in Q or C
Theorem: PDelayFD(R) computes FD(R) with polynomial delay
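
The loop above can be sketched as a Python generator. This is a simplified reading of the slide, not the authors' implementation. It assumes dict tuples with `None` for nulls, identifies tuples by their position in the list `db`, extends sets greedily, and keeps Q and C together in one `seen` set. All helper names are hypothetical.

```python
from collections import deque

def consistent(t1, t2):
    common = t1.keys() & t2.keys()
    return all(t1[a] is not None and t1[a] == t2[a] for a in common)

def component_of(i, members, db):
    """Members reachable from i in the join graph (shared attribute = edge)."""
    seen, stack = {i}, [i]
    while stack:
        u = stack.pop()
        for v in members:
            if v not in seen and db[u].keys() & db[v].keys():
                seen.add(v)
                stack.append(v)
    return seen

def extend(members, db):
    """Greedily add tuples that keep the set join consistent and connected."""
    members, changed = set(members), True
    while changed:
        changed = False
        for i in range(len(db)):
            if (i not in members
                    and all(consistent(db[i], db[j]) for j in members)
                    and any(db[i].keys() & db[j].keys() for j in members)):
                members.add(i)
                changed = True
    return frozenset(members)

def shift(T, i, db):
    """t-shift of T: add tuple i, keep the JCC part around i, then extend."""
    keep = {j for j in T if j != i and consistent(db[i], db[j])} | {i}
    return extend(component_of(i, keep, db), db)

def pdelay_fd(db):
    """Yield maximal JCC sets one at a time."""
    if not db:
        return
    T0 = extend({0}, db)
    Q, seen = deque([T0]), {T0}  # seen holds everything in Q or C
    while Q:
        T = Q.popleft()
        yield T
        for i in range(len(db)):
            S = shift(T, i, db)
            if S not in seen:
                seen.add(S)
                Q.append(S)

DB = [
    {"Country": "Canada", "Climate": "diverse"},
    {"Country": "UK", "Climate": "temperate"},
    {"Country": "Canada", "City": "Toronto", "Hotel": "Plaza", "Stars": 4},
    {"Country": "Canada", "City": "London", "Hotel": "Ramada", "Stars": 3},
    {"Country": "Canada", "City": "London", "Site": "Air Show"},
    {"Country": "Canada", "City": None, "Site": "Mount Logan"},
    {"Country": "UK", "City": "London", "Site": "Buckingham"},
]

results = list(pdelay_fd(DB))
for T in results:
    print(sorted(T))
```

Between two answers the sketch performs one shift per database tuple, each taking polynomial time. The `seen` set, however, grows with the output, so this version makes no attempt to save space.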
NLOJ vs. PDelayFD
[Figure: three copies of a scheme graph over the relations R1, …, R10, labeled NLOJ, PDelayFD, and "?"]
Shorter delays, less space, simpler to implement
Our approach: divide and conquer
Biconnected Components
Biconnected component: a maximal subset B of relations, s.t. the scheme graph has two (or more) disjoint paths between every two relations of B
[Figure: a scheme graph over the relations R1, …, R9 and its division into biconnected components]
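
Biconnected components of a scheme graph can be computed with the standard depth-first-search algorithm that maintains a stack of edges. The sketch below is that textbook algorithm, not code from the paper, and the example graph is hypothetical. In the textbook formulation, a bridge edge is also reported as a two-node component.

```python
def biconnected_components(graph):
    """graph: dict node -> list of neighbours (undirected, no parallel edges).

    Returns a list of node sets, one per biconnected component.
    """
    index, low = {}, {}
    edge_stack, components = [], []
    counter = [0]

    def dfs(u, parent):
        index[u] = low[u] = counter[0]
        counter[0] += 1
        for v in graph[u]:
            if v not in index:
                edge_stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= index[u]:
                    # u separates the subtree of v: pop one component.
                    component = set()
                    while True:
                        a, b = edge_stack.pop()
                        component.update((a, b))
                        if (a, b) == (u, v):
                            break
                    components.append(component)
            elif v != parent and index[v] < index[u]:
                edge_stack.append((u, v))  # back edge to an ancestor
                low[u] = min(low[u], index[v])

    for node in graph:
        if node not in index:
            dfs(node, None)
    return components

# Hypothetical scheme graph: two triangles joined by the edge R3-R4.
GRAPH = {
    "R1": ["R2", "R3"],
    "R2": ["R1", "R3"],
    "R3": ["R1", "R2", "R4"],
    "R4": ["R3", "R5", "R6"],
    "R5": ["R4", "R6"],
    "R6": ["R4", "R5"],
}

components = biconnected_components(GRAPH)
for component in components:
    print(sorted(component))
```

The recursion depth equals the depth of the search tree, which is unproblematic for scheme graphs with a handful of relations.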
Left-Deep Sequence of Outerjoins
R: a set of relations
Theorem: There exists an (efficiently computable) order B1,…,Bk of the biconnected components of R, s.t.
FD(R) = (…(FD(B1) ⟗ FD(B2)) …) ⟗ FD(Bk)
Optimized Algorithm:
1. Compute the biconnected components of R
2. Compute the full disjunction of each component
3. Apply outerjoins in a suitable order
BiComNLOJ: a Naïve Attempt
1. Divide R into biconnected components
→ B1,…,Bk in a suitable order
2. Compute FD(B1),…,FD(Bk)
− using PDelayFD
3. Using NLOJ, compute (…(FD(B1) ⟗ FD(B2)) …) ⟗ FD(Bk)
Problem: each FD(Bi) can be exponential in the input
– Non-polynomial delay!
Solution: use iterators
Retaining Polynomial Delay: 1st Problem
For simplification, assume only two components
[Figure: a scheme graph over the relations R1, …, R8, divided into two components B1 and B2]
• After generating a tuple t of FD(B1), we need to generate all tuples of FD(B2) that can join t
• Non-polynomial delay if all of FD(B2) is computed for finding these tuples!
• Solution: PDelayFD can be modified so that it generates only those tuples of FD(B2) that can join t
Details in the proceedings…
Retaining Polynomial Delay: 2nd Problem
For simplification, assume only two components
[Figure: a scheme graph over the relations R1, …, R8, divided into two components B1 and B2]
• The last step is to generate all tuples of FD(B2) that cannot be joined with tuples of FD(B1)
• However, this task is by itself NP-hard!
• Solution: When generating all tuples of FD(B2) that can be joined with some tuple of FD(B1), we collect enough information for generating the remaining tuples of FD(B2)
Details in the proceedings…
Experimental Setting
Algorithms:
– PDelayFD, BiComNLOJ (main)
– IncrementalFD (CS05, state of the art)
Implementation:
– PostgreSQL (open source)
– HW: Pentium 4, 1.6 GHz, 512 MB RAM
Data:
• Synthetic data (randomly generated)
• Fixed schemes
[Figure: three scheme graphs, Scheme S1, Scheme S2 and Scheme S3, each over the relations R1, …, R10]
State-of-Art vs. Main Algorithm
[Figure: average delay (msec) vs. number of tuples in each relation (1000 to 5000), one plot for each of Schemes 1, 2 and 3, comparing IncrementalFD (state of the art, CS05) with BiComNLOJ (our main algorithm)]
BiComNLOJ is a substantial improvement over the state of the art
Division into Biconnected Components
[Figure: average delay (msec) vs. number of tuples in each relation (1000 to 5000), one plot for each of Schemes 1, 2 and 3, comparing PDelayFD (no division into biconnected components) with BiComNLOJ (our main algorithm)]
Division reduces delays (amount depends on the scheme)
Behavior of Delay
Measure the delay before each generated tuple
[Figure: delay (msec) vs. tuple number (0 to 15000), comparing IncrementalFD (state of the art, CS05) with BiComNLOJ (our main algorithm)]
While IncrementalFD has a slowdown, the delay of BiComNLOJ remains almost constant
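
A measurement of this kind could be taken with a small wrapper around any iterator. The sketch below is only an assumption about how such a harness might look and is not the authors' experimental code. The name `measure_delays` is hypothetical.

```python
import time

def measure_delays(iterator):
    """Return the delay in milliseconds before each item the iterator yields."""
    delays = []
    start = time.perf_counter()
    for _ in iterator:
        now = time.perf_counter()
        delays.append((now - start) * 1000.0)
        start = now  # the next delay is measured from this answer
    return delays

# Stand-in for an enumeration algorithm; any generator of tuples would do.
delays = measure_delays(iter(range(5)))
print(len(delays), max(delays))
```

Plotting the returned list against the tuple number gives a curve of the kind described above.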
Summary
Full Disjunction: an associative extension of the outerjoin operator to an arbitrary number of relations
3 algorithms for computing FD:
– NLOJ (Nested-Loop Outerjoin): tree-structured schemes
– PDelayFD (Polynomial-Delay Full Disjunction): general schemes
– BiComNLOJ: combines the first two and deploys division into biconnected components; general schemes
Contributions
Substantial improvement of evaluation time over the state of the art
– Proved theoretically and experimentally
Full disjunctions can be computed with polynomial delay and in time linear in the output size
Optimization techniques for computing FDs
Implementation within PostgreSQL (ongoing…)
Incorporating our algorithms into an SQL optimizer
– E.g., some operators can be pushed through the FD
– Not discussed here, appears in the proceedings…
Thank you.
Questions?