Approximate Query Processing in Spatial Databases Using Raster Signatures


Post on 04-Jan-2016


TRANSCRIPT

Page 1: Leonardo Guerreiro Azevedo Geraldo Zimbrão Jano Moreira de Souza Approximate Query Processing in Spatial Databases Using Raster Signatures Federal University

Leonardo Guerreiro Azevedo, Geraldo Zimbrão, Jano Moreira de Souza

Approximate Query Processing in Spatial Databases Using Raster Signatures

Federal University of Rio de Janeiro

{azevedo, zimbrao,jano}@cos.ufrj.br

Page 2

Presentation plan

First considerations
Goals and contributions
Four-Color Raster Signature (4CRS)
Proposals of algorithms
Final considerations

Page 3

Motivation

There are many cases where a query can take a long time to process, for example:
– When processing a huge volume of data that requires a large number of I/O operations
  • Disk access time is still higher than memory access time
– When processing highly complex queries
– When accessing remote data over a slow network link, or when the data is temporarily unavailable

An exact answer can demand a long time.

Page 4

Motivation

A fast answer can be more important than an exact response.

Page 5

Motivation

The challenge becomes bigger in spatial data environments.

[Figure: two examples, labeled "399,0000 segments" and "475,434 segments"]

Page 6

Motivation

The precision of the query can be lessened and an approximate answer returned to the user:
– Approximate answers can be computed quickly
– The precision remains acceptable

Page 7

Motivation

There are many approaches in the field of approximate query processing; however, most of them are not suitable for spatial data.

“Research new techniques for approximate query processing that support the uniqueness of spatial data is a major issue in the database field”. (Roddick et al., 2004)

Page 8

Scenarios and Applications

Decision Support System
– Increasing business competitiveness
– More use of accumulated data

Data mining
– During a drill-down query sequence in ad-hoc data mining
– Earlier queries in a sequence can be used to find out which queries are interesting

Data warehouse
– Performance and scalability when accessing very large volumes of data during the analysis process

Page 9

Scenarios and Applications

Mobile computing
An approximate answer may be an alternative:
– When the data is not available
– To save storage space

Page 10

Traditional SDBMS query processing environment

[Diagram: queries, new data (inserts or updates) and deleted data go to the Spatial DBMS, which returns exact answers. The path is marked "Slow".]

Page 11

SDBMS set-up for providing approximate query answers

[Diagram: queries go to an Approximate Query Processing Engine, which returns an approximate answer plus a confidence interval (marked "Fast answer"). The Spatial DBMS, which receives new data (inserts or updates) and deleted data, returns the exact answer.]

Page 12

Goals

Execute approximate query processing in spatial databases using raster signatures
– Four-Color Raster Signature (4CRS) (Zimbrao and Souza, 1998)

Provide fast approximate answers for queries over spatial data.

Page 13

Contributions

Proposals of algorithms for many spatial operations that can be approximately processed using 4CRS:

Spatial operators returning numbers
– Area, distance, diameter, perimeter…

Spatial predicates
– Equal, different, disjoint, area disjoint, inside, meet, adjacent…

Operators returning spatial data type values
– Intersection, plus (union), minus, common border…

Spatial operators on sets of objects
– Sum, closest, decompose, overlay, fusion.

Page 14

Contributions

Proposals of algorithms
– Approximate Area of Polygon
– Distance
– Diameter
– Perimeter and Contour
– Equal and Different
– Disjoint, Area Disjoint, Edge Disjoint
– Inside (Encloses), Edge Inside, Vertex Inside
– Intersects and Intersection
– Overlay
– Adjacent, Border in Common, Common border
– Plus and Sum
– Minus
– Fusion
– Closest
– Decompose

Page 15

Four-Color Raster Signature (4CRS)

4CRS is a raster approximation
– It is a representation of an object upon a grid of cells

Grid resolution can be changed
– Trade-off: precision × storage requirements

Page 16

Four-Color Raster Signature (4CRS)

Each cell stores relevant information using few bits. 4CRS has 4 types of cells:

Bit value  Cell type  Description
00         Empty      The cell is not intersected by the polygon
01         Weak       The cell contains an intersection of 50% or less with the polygon
10         Strong     The cell contains an intersection of more than 50% and less than 100% with the polygon
11         Full       The cell is fully occupied by the polygon
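The cell-type table can be illustrated with a minimal Python sketch. It assumes the fraction of the cell covered by the polygon (`coverage`) has already been computed, and it treats zero coverage as "not intersected". The names `Cell`, `classify` and `pack`, and the packing order of four cells per byte, are hypothetical choices for illustration, not part of the original slides.

```python
from enum import IntEnum


class Cell(IntEnum):
    """The four 4CRS cell types, valued by their 2-bit codes."""
    EMPTY = 0b00   # cell not intersected by the polygon
    WEAK = 0b01    # 50% or less of the cell is covered
    STRONG = 0b10  # more than 50% and less than 100% is covered
    FULL = 0b11    # cell fully occupied by the polygon


def classify(coverage):
    """Map the covered fraction of a cell (in [0, 1]) to a cell type."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must lie in [0, 1]")
    if coverage == 0.0:
        return Cell.EMPTY  # assumption: zero coverage means no intersection
    if coverage <= 0.5:
        return Cell.WEAK
    if coverage < 1.0:
        return Cell.STRONG
    return Cell.FULL


def pack(cells):
    """Pack cell types four per byte, first cell in the high bits."""
    out = bytearray((len(cells) + 3) // 4)
    for k, cell in enumerate(cells):
        out[k // 4] |= int(cell) << (6 - 2 * (k % 4))
    return bytes(out)
```

With two bits per cell, a grid of n cells needs about n / 4 bytes, which is why a finer grid trades storage for precision.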

Page 17

Four-Color Raster Signature (4CRS) - Generation

[Figure: a polygon and the 4CRS generated from it]

Page 18

Approximate Area of Polygon

Approximate area of polygon
– Based on the expected area of polygon within cell

Approximate area of polygon within window
– Based on the expected area of polygon within cell

Approximate overlapping area of polygon join
– Based on the expected intersection area of two types of cells

Page 19

Approximate area of polygon

Expected area (µ) of cell type, as a fraction of the cell area:

Cell type   Expected area  µ
Empty (E)   zero           0
Weak (W)    (0, 0.50]      0.25
Strong (S)  (0.50, 1)      0.75
Full (F)    100%           1

Grid and polygon are independent from each other.

Approximate area of polygon within cell, summed over the cells of the signature, with t the type of each cell:

$$\text{Approximate answer} = \sum_{t} \mu_t \times \text{cell area}$$
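The area estimate on this slide can be sketched in Python as follows. This is only an illustration under stated assumptions: the signature is represented as a list of strings over the letters E, W, S and F (a hypothetical encoding), and `approximate_area` is a made-up name. The µ values are the ones given on the slide.

```python
from collections import Counter

# Expected covered fraction (mu) per cell type, as given on the slide.
MU = {"E": 0.0, "W": 0.25, "S": 0.75, "F": 1.0}


def approximate_area(signature, cell_area):
    """Estimate a polygon's area from its grid of cell types."""
    # Count the cells of each type, then weight each count by its mu.
    counts = Counter(t for row in signature for t in row)
    return sum(MU[t] * n for t, n in counts.items()) * cell_area


# Hypothetical 4 x 4 signature: 4 Empty, 7 Weak, 2 Strong and 3 Full cells.
signature = [
    "EWWE",
    "WFFS",
    "WSFW",
    "EWWE",
]
area = approximate_area(signature, cell_area=4.0)  # (1.75 + 1.5 + 3) * 4 = 25.0
```

The polygon itself is never read: only the per-cell types and the cell size enter the estimate.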

Page 20

Approximate overlapping area of polygon join

[Figure: expected area of cells overlapping, shown for the cell-type pairs W×E, S×E, S×W and S×S, with expected areas µW×E, µS×E, µS×W and µS×S]

Page 21

Approximate overlapping area of polygon join

Table of expected area of cells overlapping

Cell types  Empty  Weak    Strong  Full
Empty       0      0       0       0
Weak        0      0.0625  0.1875  0.25
Strong      0      0.1875  0.5625  0.75
Full        0      0.25    0.75    1

$$\text{Approximate answer} = \sum_{i,j} \mu_{i \times j} \times \text{cell area}$$
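A minimal Python sketch of the overlapping-area estimate follows. It assumes both signatures are laid over the same grid at the same resolution, so that cells can be compared position by position; that alignment, the string encoding of signatures and the function name are assumptions made for illustration. Each entry of the table above equals the product of the two single-cell expected areas (for example Weak × Strong = 0.25 × 0.75 = 0.1875), which is how the lookup is built here.

```python
# Expected covered fraction (mu) per cell type.
MU = {"E": 0.0, "W": 0.25, "S": 0.75, "F": 1.0}

# Expected overlapping fraction for every pair of cell types; the
# products reproduce the table of expected area of cells overlapping.
MU_PAIR = {(a, b): MU[a] * MU[b] for a in MU for b in MU}


def approximate_overlap_area(sig_a, sig_b, cell_area):
    """Estimate the area shared by two polygons from aligned signatures."""
    same_shape = len(sig_a) == len(sig_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(sig_a, sig_b)
    )
    if not same_shape:
        raise ValueError("signatures must cover the same grid")
    total = sum(
        MU_PAIR[a, b]
        for row_a, row_b in zip(sig_a, sig_b)
        for a, b in zip(row_a, row_b)
    )
    return total * cell_area


# Pairs compared: W x S, S x S, F x E, F x W -> 0.1875 + 0.5625 + 0 + 0.25 = 1.0
overlap = approximate_overlap_area(["WS", "FF"], ["SS", "EW"], cell_area=2.0)
```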

Page 22

Affinity degree

For some proposed algorithms, it is possible to return an approximate answer by evaluating only cell types.

For other algorithms, when evaluating cell types it is also required to compute an approximate value in the interval [0,1] that indicates the percentage to which the response is true.
– Affinity degree: it is based on the expected area of cells overlapping (Azevedo et al., 2005).

Table of affinity degree

Cell types  Empty  Weak    Strong  Full
Empty       0      0       0       0
Weak        0      0.0625  0.1875  0.25
Strong      0      0.1875  0.5625  0.75
Full        0      0.25    0.75    1

Page 23

Equal

Equal algorithm using 4CRS: the approximate answer is equal to the sum of affinity degrees divided by the number of comparisons of pairs of objects, if no trivial case occurs.

Sum of affinity degree, for overlaps of equal cell types:
– E × E: µE×E = 1
– W × W: µW×W = 0.0625
– S × S: µS×S = 0.5625
– F × F: µF×F = 1

Trivial case: not equal
– An overlap of different cell types (for example S × E, S × W, F × S) results in false
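The equal test on this slide might be sketched in Python as below. Two points are assumptions rather than statements from the slides: the signatures are aligned on the same grid, and the "number of comparisons" is read as the number of cell pairs compared. The function name and the string encoding of signatures are hypothetical.

```python
# Affinity degree for overlaps of equal cell types, as listed on the slide.
EQUAL_AFFINITY = {"E": 1.0, "W": 0.0625, "S": 0.5625, "F": 1.0}


def approximate_equal(sig_a, sig_b):
    """Return a value in [0, 1]; 0.0 means the objects are not equal."""
    pairs = [
        (a, b)
        for row_a, row_b in zip(sig_a, sig_b)
        for a, b in zip(row_a, row_b)
    ]
    if not pairs:
        raise ValueError("signatures must contain at least one cell")
    # Trivial case: any overlap of different cell types means "not equal".
    if any(a != b for a, b in pairs):
        return 0.0
    # Otherwise average the affinity degrees over the comparisons made.
    return sum(EQUAL_AFFINITY[a] for a, _ in pairs) / len(pairs)


same = approximate_equal(["EW", "SF"], ["EW", "SF"])    # (1 + 0.0625 + 0.5625 + 1) / 4
differs = approximate_equal(["EW", "SF"], ["EW", "FF"])  # S x F triggers the trivial case
```

Identical signatures do not yield 1.0 unless they contain only Empty and Full cells, because two Weak or two Strong cells may cover different parts of the same cell.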

Page 24

Different

The different algorithm is the opposite of the equal algorithm
– Its affinity degree is equal to 1 minus the affinity degree of the equal algorithm

Sum of affinity degree, for overlaps of equal cell types:
– E × E: µE×E = 0
– W × W: µW×W = 1 - 0.0625
– S × S: µS×S = 1 - 0.5625
– F × F: µF×F = 0

Trivial case: different
– An overlap of different cell types (for example S × E, S × W, F × S) results in true
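A short Python sketch of the different test, mirroring the description on this slide. It assumes aligned signatures and, since the slide only mentions a sum of affinity degrees, it also assumes the same averaging over the number of cell comparisons that the equal algorithm uses. The function name and string encoding are hypothetical.

```python
# Affinity degree for overlaps of equal cell types: 1 minus the values
# used by the equal algorithm, as listed on the slide.
DIFFERENT_AFFINITY = {"E": 0.0, "W": 1 - 0.0625, "S": 1 - 0.5625, "F": 0.0}


def approximate_different(sig_a, sig_b):
    """Return a value in [0, 1]; 1.0 means the objects are different."""
    pairs = [
        (a, b)
        for row_a, row_b in zip(sig_a, sig_b)
        for a, b in zip(row_a, row_b)
    ]
    if not pairs:
        raise ValueError("signatures must contain at least one cell")
    # Trivial case: any overlap of different cell types means "different".
    if any(a != b for a, b in pairs):
        return 1.0
    # Assumption: average over the comparisons, as in the equal algorithm.
    return sum(DIFFERENT_AFFINITY[a] for a, _ in pairs) / len(pairs)


same_sig = approximate_different(["EW", "SF"], ["EW", "SF"])  # (0 + 0.9375 + 0.4375 + 0) / 4
other_sig = approximate_different(["EW", "SF"], ["EW", "FF"])  # trivial case
```

Under these assumptions the result for a pair of signatures is 1 minus the result of the equal sketch on the same pair.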

Page 25

Disjoint

Disjoint: two objects are disjoint if they have no portion in common.

Case I: at least one overlap of cell types that necessarily share area (Strong × Strong, or a Full cell with any non-empty cell)
– Trivial case: not disjoint (exact answer)

Case II: only overlaps that involve an Empty cell
– Disjoint (partial answer); affinity degree += 1

Case III: weak × weak, weak × strong
– Disjoint (partial answer); affinity degree += 1 – expected area(type1, type2)
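The three cases might be sketched in Python as follows. Several points are assumptions because the slide's figure did not survive extraction: Case I is taken to be Strong × Strong or a Full cell over any non-empty cell (pairs that must share area), Case II is taken to be any pair involving an Empty cell, and the final answer is normalised by the number of cell comparisons as in the equal algorithm. The function name and string encoding are hypothetical.

```python
# Expected covered fraction (mu) per cell type.
MU = {"E": 0.0, "W": 0.25, "S": 0.75, "F": 1.0}


def approximate_disjoint(sig_a, sig_b):
    """Return a value in [0, 1]; 0.0 means the objects are not disjoint."""
    pairs = [
        (a, b)
        for row_a, row_b in zip(sig_a, sig_b)
        for a, b in zip(row_a, row_b)
    ]
    if not pairs:
        raise ValueError("signatures must contain at least one cell")
    total = 0.0
    for a, b in pairs:
        if "E" in (a, b):
            # Case II: nothing can overlap inside an Empty cell.
            total += 1.0
        elif "F" in (a, b) or (a, b) == ("S", "S"):
            # Case I: these pairs must share area, so the answer is exact.
            return 0.0
        else:
            # Case III: W x W or W x S, weighted by the expected overlap.
            total += 1.0 - MU[a] * MU[b]
    # Assumption: normalise by the number of comparisons.
    return total / len(pairs)


# Pairs: W x W, E x W, S x W, W x E -> 0.9375 + 1 + 0.8125 + 1 = 3.75
likely = approximate_disjoint(["WE", "SW"], ["WW", "WE"])
certain = approximate_disjoint(["WS"], ["WS"])  # S x S: not disjoint
```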

Page 26

Distance

Distance can be estimated from 4CRS signatures by computing the distance among the cells that correspond to the polygons’ borders (Weak and Strong cells).

Distance = average of the minimum and maximum distances

[Figure: panels (a), (b) and (c), showing the minimum distance and the maximum distance]
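One possible reading of the distance estimate is sketched below in Python. The slide does not say which cells the minimum and maximum distances are taken between, so this sketch assumes they are measured between the closest pair of border cells (one Weak or Strong cell from each signature), with both signatures indexed on one common grid of square cells. All names are hypothetical.

```python
import math
from itertools import product


def border_cells(signature):
    """Return (column, row) of the Weak and Strong cells, which hold the border."""
    return [
        (x, y)
        for y, row in enumerate(signature)
        for x, t in enumerate(row)
        if t in "WS"
    ]


def cell_gap(c1, c2, size):
    """Minimum and maximum distance between two square cells of side `size`."""
    dx = abs(c1[0] - c2[0])
    dy = abs(c1[1] - c2[1])
    # Nearest points: the empty gap between the cells along each axis.
    d_min = math.hypot(max(dx - 1, 0), max(dy - 1, 0)) * size
    # Farthest points: opposite corners of the two cells.
    d_max = math.hypot(dx + 1, dy + 1) * size
    return d_min, d_max


def approximate_distance(sig_a, sig_b, size):
    """Average of the min and max distance of the closest border-cell pair."""
    pairs = product(border_cells(sig_a), border_cells(sig_b))
    # Tuples compare by d_min first, so this picks the closest pair;
    # min() raises ValueError if either signature has no border cell.
    d_min, d_max = min(cell_gap(a, b, size) for a, b in pairs)
    return (d_min + d_max) / 2


# Border cells three columns apart: d_min = 2, d_max = hypot(4, 1).
estimate = approximate_distance(["WEEE"], ["EEEW"], size=1.0)
```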

Page 27

Conclusions

Goal
– Provide an estimated result in orders of magnitude less time than the time needed to compute an exact answer, along with a confidence interval for the answer.

Proposals
– Use raster approximations for approximate query processing in spatial databases
– Use the 4CRS signature to process queries over polygons, avoiding access to the real data
– Propose many algorithms for approximate processing
– Use the expected area of polygons (Azevedo et al., 2005) to estimate responses

Page 28

Future work

Implement and evaluate algorithms involving other kinds of datasets, for example points and polylines, and combinations of them:
• point × polyline, polyline × polygon and polygon × polyline.

The experimental evaluation is not addressed in this work; it is ongoing work developed on Secondo (Güting et al., 2005), which is an extensible DBMS platform for research prototyping and teaching.
