APPROXIMATE QUERY PROCESSING USING WAVELETS Kaushik Chakrabarti (Univ. of Illinois), Minos Garofalakis (Bell Labs), Rajeev Rastogi (Bell Labs), Kyuseok Shim (KAIST and AITrc). Presented By: Charanmai Koorapati Ramesh, Harika Guniganti

Page 1

APPROXIMATE QUERY PROCESSING USING WAVELETS

Kaushik Chakrabarti (Univ. of Illinois)

Minos Garofalakis (Bell Labs)

Rajeev Rastogi (Bell Labs)

Kyuseok Shim (KAIST and AITrc)

Presented By:

Charanmai Koorapati Ramesh

Harika Guniganti

Page 2

AGENDA

• Introduction
• Motivation
• Prior Work
• Wavelet Decomposition
• Building Wavelet Synopses
• Processing Relational Queries
• Experimental Study
• Quality Metrics
• Query Execution Times
• Conclusion

Page 3

DECISION SUPPORT SYSTEMS

Comparative sales figures between one week and the next

Projected revenue figures based on new product sales assumptions

The consequences of different decision alternatives, given past experience in a context that is described

Page 4

MOTIVATION

DSS users pose very complex queries to the underlying DBMS that require complex operations over Gigabytes or Terabytes of disk-resident data.

[Figure: an SQL query is sent to the decision support system, which returns an exact answer only after long response times.]

Page 5

Exact answers NOT always required. User may prefer a fast, approximate answer.

[Figure: an SQL query against the decision support system (GB/TB of data) returns an exact answer, with long response times; a “transformed” query against compact data synopses (KB/MB) returns an approximate answer, fast.]

Page 6

APPROXIMATE QUERY PROCESSING

Viable solution for dealing with:
• Huge amounts of data
• High query complexities
• Increasingly stringent response-time requirements

Page 7

PRIOR WORK

Sampling-Based Techniques
Limitations:
• Join operator applied to two uniform samples
• Non-aggregate queries

Histogram-Based Techniques
Limitations:
• Storage overhead
• Construction cost
Both are incurred to achieve reasonable error rates for high-dimensional data sets.

Page 8

WAVELET BASED TECHNIQUES

Wavelet: a mathematical function used to divide a given function or continuous-time signal into different frequency components and study each component with a resolution that matches its scale.

This paper extends the scope of earlier work, establishing the viability and effectiveness of wavelets as a generic approximate query processing tool for modern high-dimensional DSS applications.

Page 9

APPROXIMATE QUERY PROCESSING USING WAVELETS

Novel approach consisting of two steps:
• Multidimensional Haar wavelets: effective, compact synopses
• Novel query processing algorithms: fast and accurate approximate query answers

Page 10

WAVELET DECOMPOSITION/TRANSFORM

One-dimensional Haar Wavelets

Data vector A = [2,2,5,7]

Resolution   Averages    Detail Coefficients
2            [2,2,5,7]   -
1            [2,6]       [0,-1]
0            [4]         [-2]

Wavelet transform, W_A = [4,-2,0,-1]; each entry of W_A is a wavelet coefficient.
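The pairwise averaging and differencing above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function names `haar_decompose` and `haar_reconstruct` are made up for this example, and the input length is assumed to be a power of two.

```python
def haar_decompose(values):
    """Return [overall average, detail coefficients from coarsest to finest].

    Assumes len(values) is a power of two.
    """
    averages = list(values)
    details = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        # Detail coefficient: half the difference (first minus second) of each pair.
        details = [(a - b) / 2 for a, b in pairs] + details
        # The pairwise averages feed the next (coarser) resolution level.
        averages = [(a + b) / 2 for a, b in pairs]
    return averages + details


def haar_reconstruct(coeffs):
    """Invert haar_decompose by expanding one resolution level at a time."""
    values = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(values)]
        # Each average splits back into (average + detail, average - detail).
        values = [x for avg, d in zip(values, details) for x in (avg + d, avg - d)]
        pos += len(details)
    return values


print(haar_decompose([2, 2, 5, 7]))      # [4.0, -2.0, 0.0, -1.0]
print(haar_reconstruct([4, -2, 0, -1]))  # [2, 2, 5, 7]
```

The decomposition is lossless: the four coefficients reproduce the four original values exactly, so any approximation comes only from later discarding coefficients.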

Page 11

NORMALIZED WAVELET TRANSFORM

To equalize the importance of all the wavelet coefficients, we normalize the final entries of W_A by dividing each wavelet coefficient by √(2^l), where l is the level of resolution at which the coefficient appears (l = 0 is the coarsest level).

Thus W_A = [4, -2, 0, -1/√2]
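As a rough sketch of the normalization step, the snippet below divides each coefficient by √(2^l). It assumes the coefficients are stored as on the previous slide (overall average first, then details from coarsest to finest), so that position i >= 1 sits at level floor(log2(i)); the function name `normalize` is illustrative only.

```python
import math


def normalize(coeffs):
    """Divide each coefficient by sqrt(2**l), with l its resolution level.

    Assumes the layout [average, coarsest detail, ..., finest details],
    so index 0 and index 1 are at level 0, indexes 2-3 at level 1, etc.
    """
    out = []
    for i, c in enumerate(coeffs):
        level = 0 if i == 0 else i.bit_length() - 1  # floor(log2(i)) for i >= 1
        out.append(c / math.sqrt(2 ** level))
    return out


normalized = normalize([4, -2, 0, -1])
print(normalized)  # last entry is -1/sqrt(2)
```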

Page 12

MULTIDIMENSIONAL HAAR WAVELETS

Standard Decomposition
First, fix an ordering for the data dimensions (say 1, 2, …, d) and then apply the complete one-dimensional wavelet transform to each one-dimensional “row” of array cells along dimension k, for all k = 1, 2, …, d.

Non-standard Decomposition
Given an ordering for the data dimensions (1, 2, …, d), we perform one step of pairwise averaging and differencing for each one-dimensional row of array cells along dimension k, for each k = 1, …, d. This process is then repeated recursively only on the quadrant containing the averages across all dimensions.
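The difference between the two decompositions is easiest to see in two dimensions. The sketch below is an illustration under simplifying assumptions, not the paper's implementation: it handles only square 2-D arrays whose side is a power of two, stores them as lists of lists, and uses made-up helper names (`haar_1d`, `haar_step`, `standard_2d`, `nonstandard_2d`).

```python
def haar_1d(values):
    """Complete 1-D Haar transform: [average, details coarsest to finest]."""
    values = list(values)
    details = []
    while len(values) > 1:
        pairs = list(zip(values[0::2], values[1::2]))
        details = [(a - b) / 2 for a, b in pairs] + details
        values = [(a + b) / 2 for a, b in pairs]
    return values + details


def haar_step(values):
    """One step of pairwise averaging and differencing: averages, then details."""
    pairs = list(zip(values[0::2], values[1::2]))
    return [(a + b) / 2 for a, b in pairs] + [(a - b) / 2 for a, b in pairs]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def standard_2d(array):
    """Complete 1-D transform along dimension 1, then along dimension 2."""
    rows = [haar_1d(r) for r in array]
    cols = [haar_1d(c) for c in transpose(rows)]
    return transpose(cols)


def nonstandard_2d(array):
    """One averaging/differencing step per dimension, then recurse on the
    top-left quadrant, which holds the averages across both dimensions."""
    out = [[float(x) for x in row] for row in array]
    n = len(out)
    while n > 1:
        block = [haar_step(row[:n]) for row in out[:n]]
        block = transpose([haar_step(c) for c in transpose(block)])
        for i in range(n):
            out[i][:n] = block[i]
        n //= 2
    return out


grid = [[4 * i + j for j in range(4)] for i in range(4)]
std = standard_2d(grid)
nonstd = nonstandard_2d(grid)
```

In both cases the entry at position [0][0] is the overall average of the array; the two methods differ in how the remaining detail coefficients are organized.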

Page 13

NON-STANDARD DECOMPOSITION

Page 14

EXAMPLE DECOMPOSITION OF A 4×4 ARRAY

Page 15

MULTIDIMENSIONAL HAAR COEFFICIENTS- SEMANTICS AND REPRESENTATION

Page 16

SUPPORT REGIONS AND SIGNS FOR 16 NONSTANDARD 2-DIMENSIONAL HAAR BASIS FUNCTIONS

Page 17

A Haar wavelet coefficient can be represented with the triple W = <R, S, v>, where:

1) W.R is the d-dimensional support hyper-rectangle of W. Along each dimension j, 1 <= j <= d:

Low boundary value: W.R.bound[j].lo
High boundary value: W.R.bound[j].hi

Coefficient W contributes to each data cell A[i1,…,id] satisfying the condition W.R.bound[j].lo <= ij <= W.R.bound[j].hi for all dimensions j, 1 <= j <= d.

Page 18

2) W.S stores sign information for all d-dimensional quadrants of W.R.

The two elements of the sign vector of coefficient W along dimension j are denoted by W.S.sign[j].lo and W.S.sign[j].hi, corresponding to the lower and upper half of W.R’s extent along dimension j.

The sign for a given quadrant is computed as the product of the d sign-vector entries that map to that quadrant.

3) W.v is the (scalar) magnitude of coefficient W. This is exactly the quantity that W contributes, with the sign given by W.S, to all data array cells enclosed in W.R.
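One possible way to hold the triple W = <R, S, v> in code is sketched below. The class and field names (`Coefficient`, `bounds`, `signs`, `value`) are assumptions made for this illustration; they mirror W.R.bound, W.S.sign and W.v but are not taken from the paper. The example rebuilds the vector A = [2,2,5,7] from the four one-dimensional coefficients of the earlier slide.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Coefficient:
    bounds: List[Tuple[int, int]]  # W.R.bound[j] as (lo, hi), inclusive
    signs: List[Tuple[int, int]]   # W.S.sign[j] as (lower-half sign, upper-half sign)
    value: float                   # W.v

    def contribution(self, cell):
        """Signed amount this coefficient adds to the given data cell."""
        sign = 1
        for (lo, hi), (s_lo, s_hi), i in zip(self.bounds, self.signs, cell):
            if not lo <= i <= hi:
                return 0.0  # cell lies outside the support hyper-rectangle
            last_of_lower_half = (lo + hi) // 2
            sign *= s_lo if i <= last_of_lower_half else s_hi
        return sign * self.value


# The 1-D coefficients [4, -2, 0, -1] of A = [2, 2, 5, 7], written as triples.
coefficients = [
    Coefficient([(0, 3)], [(1, 1)], 4),    # overall average
    Coefficient([(0, 3)], [(1, -1)], -2),  # coarsest detail
    Coefficient([(0, 1)], [(1, -1)], 0),   # detail for cells 0-1
    Coefficient([(2, 3)], [(1, -1)], -1),  # detail for cells 2-3
]

rebuilt = [sum(w.contribution((i,)) for w in coefficients) for i in range(4)]
print(rebuilt)
```

Each cell value is simply the sum of the signed contributions of the coefficients whose support region covers it.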

Page 19

BUILDING WAVELET-COEFFICIENT SYNOPSES

Relation (ROLAP) Representation

Attr1   Attr2   Count
2       0       4
1       1       6
3       1       3

Joint Data Distribution Array

[Figure: 4×4 array with Attr1 (0 to 3) on the horizontal axis and Attr2 (0 to 3) on the vertical axis; the cells for the three tuples above hold the counts 4, 6 and 3.]

Capturing the d-dimensional array A_R (joint frequency distribution) from relational table R (“set of tuples” ROLAP)
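A small sketch of this step for the two-attribute example: the (Attr1, Attr2, Count) tuples are scattered into a dense array indexed by attribute value. The function name `joint_distribution` and the choice of a dense list-of-lists layout are assumptions made for illustration; a real system would avoid materializing a large sparse array this way.

```python
def joint_distribution(tuples, size1, size2):
    """Build the joint frequency array, indexed as array[attr1][attr2]."""
    array = [[0] * size2 for _ in range(size1)]
    for attr1, attr2, count in tuples:
        array[attr1][attr2] += count
    return array


# Tuples from the relation shown on the slide: (Attr1, Attr2, Count).
relation = [(2, 0, 4), (1, 1, 6), (3, 1, 3)]
a_r = joint_distribution(relation, 4, 4)
for row in a_r:
    print(row)
```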

Page 20

What is the size of the wavelet-coefficient synopsis?

Page 21

PROCESSING RELATIONAL QUERIES IN WAVELET-COEFFICIENT DOMAIN

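[Figure: two paths from the wavelet synopses to the final approximate results. Compressed domain (FAST): querying in the wavelet domain produces query results in the wavelet domain, which are then rendered. Relation domain (SLOW): the synopses are first rendered into approximate relations, which are then queried in the relation domain.]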

• Reduce relations into compact wavelet-coefficient synopses

Page 22

WAVELET QUERY PROCESSING

[Figure: operator tree with select, select, project, join and a final render step; each edge between operators carries a set of coefficients.]

Each operator (e.g., select, project, join, aggregates, etc.)
• input: set of wavelet coefficients
• output: set of wavelet coefficients

Finally, rendering step
• input: set of wavelet coefficients
• output: (multi)set of tuples

Page 23

QUICK REVIEW OF NOTATIONS

Page 24

SELECTION OPERATOR (SELECT)

Page 25

SELECTION -- RELATIONAL DOMAIN

In relational domain, interested in only those cells inside query range

In wavelet domain, interested in only the coefficients that contribute to those cells

Relation

Dim D1 (Attr1)   Dim D2 (Attr2)   Count
0                6                6
1                2                3
1                3                4
1                5                6
1                6                8
2                6                7
3                0                1
4                2                3
5                2                2
6                1                3
6                2                2
6                5                1
6                6                3

Joint Data Distribution Array

[Figure: the thirteen counts above plotted as cells of the joint data distribution array over Dim. D1 and Dim. D2, with the query range marked as a rectangle.]
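The contrast between the two domains can be sketched as follows. This is a simplified illustration, not the paper's select algorithm: the query range, the three coefficients and all function names are hypothetical, and the wavelet-domain step shown here only keeps the coefficients whose support region overlaps the query range, without adjusting their boundaries or signs.

```python
# Relation from the slide: (D1, D2, Count).
relation = [
    (0, 6, 6), (1, 2, 3), (1, 3, 4), (1, 5, 6), (1, 6, 8), (2, 6, 7), (3, 0, 1),
    (4, 2, 3), (5, 2, 2), (6, 1, 3), (6, 2, 2), (6, 5, 1), (6, 6, 3),
]

# Hypothetical query range (not the one drawn on the slide): 4 <= D1 <= 6, 0 <= D2 <= 3.
query = [(4, 6), (0, 3)]


def select_tuples(relation, query):
    """Relational domain: keep only the cells inside the query range."""
    (lo1, hi1), (lo2, hi2) = query
    return [(a, b, c) for a, b, c in relation if lo1 <= a <= hi1 and lo2 <= b <= hi2]


def overlaps(bounds, query):
    """True if a support hyper-rectangle intersects the query range in every dimension."""
    return all(lo <= q_hi and q_lo <= hi for (lo, hi), (q_lo, q_hi) in zip(bounds, query))


def select_coefficients(coefficients, query):
    """Wavelet domain: keep only coefficients that contribute to cells in the range."""
    return [w for w in coefficients if overlaps(w["bounds"], query)]


# Hypothetical coefficients over an 8 x 8 domain, given only by support region and value.
coefficients = [
    {"bounds": [(0, 7), (0, 7)], "value": 0.8},
    {"bounds": [(4, 7), (0, 3)], "value": -0.5},
    {"bounds": [(0, 3), (4, 7)], "value": 1.2},
]

selected_tuples = select_tuples(relation, query)
selected_coefficients = select_coefficients(coefficients, query)
```

The third coefficient is dropped because its support region covers only D1 values 0 to 3, so it cannot contribute to any cell in the query range.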

Page 26

APPROXIMATE QUERY EXECUTION ENGINE PROCESS FOR SELECT

Page 27

SELECTION -- WAVELET DOMAIN

[Figure: sign quadrants (+/-) of wavelet coefficients over dimensions D1 and D2, shown against the query range.]

Page 28

PROJECTION OPERATOR (PROJECT)

Page 29

PROJECTION- WAVELET DOMAIN

Page 30

JOIN OPERATOR (JOIN)

Page 31

EQUI-JOIN -- RELATIONAL DOMAIN

Relational domain: Join count = 7*3 = (A1-A3)*(B2+B3)

Wavelet domain: A1*B2 + A1*B3 - A3*B2 - A3*B3

Consider all pairs of coefficients:
(1) check joinability (overlap in join dimension(s))
(2) compute output coefficients

Relation 1

Dim D1 (Attr1)   Dim D2 (Attr2)   Count
6                2                7
4                3                6

Relation 2

Dim D1 (Attr1)   Dim D3 (Attr3)   Count
6                3                3

Join along D1

Dim D1 (Attr1)   Dim D2 (Attr2)   Dim D3 (Attr3)   Count
6                2                3                21

[Figure: joint data distribution of Relation 1 (over D1 and D2, with cells 7 and 6) and of Relation 2 (over D1 and D3, with cell 3), sharing the join dimension D1. Coefficients A1 (+) and A3 (-) contribute to the cell with count 7; coefficients B2 (+) and B3 (+) contribute to the cell with count 3.]
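The arithmetic on this slide can be checked with a short sketch. The equi-join on D1 reproduces the count 21, and expanding the product term by term gives the same number. The individual coefficient values below are hypothetical, chosen only so that A1 - A3 = 7 and B2 + B3 = 3 as on the slide; the function name `equi_join` is also just for this example.

```python
from collections import defaultdict

relation1 = [(6, 2, 7), (4, 3, 6)]  # (D1, D2, Count)
relation2 = [(6, 3, 3)]             # (D1, D3, Count)


def equi_join(r1, r2):
    """Join on D1; the output count is the product of the two input counts."""
    by_d1 = defaultdict(list)
    for d1, d3, count in r2:
        by_d1[d1].append((d3, count))
    return [
        (d1, d2, d3, c1 * c2)
        for d1, d2, c1 in r1
        for d3, c2 in by_d1.get(d1, [])
    ]


joined = equi_join(relation1, relation2)

# Hypothetical coefficient values with A1 - A3 = 7 and B2 + B3 = 3.
A1, A3 = 5, -2
B2, B3 = 1, 2

# Signed contributions to the two joining cells.
a_terms = [+A1, -A3]
b_terms = [+B2, +B3]

# Wavelet domain: sum over all pairs, i.e. A1*B2 + A1*B3 - A3*B2 - A3*B3.
pairwise_sum = sum(a * b for a in a_terms for b in b_terms)
print(joined, pairwise_sum)
```

Because multiplication distributes over addition, joining pairs of coefficients and summing gives the same count as joining the reconstructed cells.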

Page 32

Page 33

Page 34

EQUI-JOIN -- WAVELET DOMAIN

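[Figure: two joinable coefficients with values v1 and v2, one over dimensions D1 and D2 and the other over D1 and D3, shown with their sign patterns (+/-); they overlap along the join dimension D1.]

Join output coefficient (over D1, D2 and D3): v = v1 * v2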
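A rough sketch of joining a single pair of coefficients is given below. It is deliberately simplified and should not be read as the paper's join algorithm: sign information is ignored entirely, coefficients are plain dictionaries with hypothetical `bounds` and `value` keys, and the output support region is assumed to be the overlap along D1 combined with the D2 and D3 extents of the inputs.

```python
def join_pair(w1, w2):
    """Join w1 (over D1, D2) with w2 (over D1, D3) along D1.

    Returns None when the supports do not overlap in the join dimension.
    Sign handling is omitted in this sketch.
    """
    (lo1, hi1), d2_bounds = w1["bounds"]
    (lo2, hi2), d3_bounds = w2["bounds"]
    lo, hi = max(lo1, lo2), min(hi1, hi2)
    if lo > hi:
        return None  # (1) joinability check failed
    # (2) output coefficient: v = v1 * v2 over the combined support region
    return {
        "bounds": [(lo, hi), d2_bounds, d3_bounds],
        "value": w1["value"] * w2["value"],
    }


# Hypothetical input coefficients.
w_a = {"bounds": [(4, 7), (0, 3)], "value": 2.0}
w_b = {"bounds": [(6, 7), (2, 3)], "value": 3.0}
w_c = {"bounds": [(0, 3), (0, 7)], "value": 1.5}

print(join_pair(w_a, w_b))
print(join_pair(w_a, w_c))
```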

Page 35

EXPERIMENTAL STUDY

Improved Answer Quality

Low Synopsis Construction Costs

Fast Query Execution

Page 36

ERROR METRICS FOR SET-VALUED QUERY ANSWERS

Need an error metric for (multi)sets that accounts for both:
• differences in element frequencies
• differences in element values

Proposed Solutions
• MAC (Match-And-Compare) Error [IP99]: based on perfect bipartite graph matching
• EMD (Earth Mover’s Distance) Error [CGR00, RTG98]: based on bipartite network flows

Page 37

QUERY EXECUTION TIMES

Page 38

SELECT-JOIN-SUM QUERY ERRORS ON REAL-LIFE DATA

Page 39

SELECT QUERY ERRORS ON REAL-LIFE DATA

Page 40

 SELECT-SUM QUERY ERRORS ON REAL-LIFE DATA

Page 41

CONCLUSION

Multidimensional wavelets are an effective tool for general-purpose approximate query processing in modern, high-dimensional applications.

The query processing algorithms operate directly on the wavelet-coefficient synopses of relational data, allowing very fast processing of arbitrarily complex queries entirely in the wavelet-coefficient domain.

An extensive experimental study with synthetic as well as real-life data sets verifies the effectiveness of the wavelet-based approach compared to both sampling and histograms.

Page 42

Questions???

THANK YOU