GraphREL: A Relational Graph Query Processor


GraphREL: A Decomposition-Based and Selectivity-Aware Relational Framework for Processing Sub-graph Queries

Sherif Sakr

School of Computer Science and Engineering
University of New South Wales

http://www.cse.unsw.edu.au/~ssakr/

BIT Seminars ’09 - Free University of Bolzano, Italy

16 November 2009

S. Sakr (CSE, UNSW) BIT Seminars’09 16 November 2009 1 / 40


Outline

Previous Work: Pathfinder - Relational XQuery Compiler.

Current Work: GraphREL - General Graph Query Processor.

Future Work: Scalable Graph Query Processing for the New Generation of Database Applications.


Pathfinder: A Relational XQuery Processor

[Diagram: Pathfinder compiles an XQuery expression into relational algebra. From there, either the MIL code generator produces MIL scripts for the Monet DBMS, or the SQL code generator produces SQL scripts for a conventional RDBMS.]

http://pathfinder-xquery.org/

Pathfinder: A Relational XQuery Processor

[Diagram: Pathfinder architecture. An XML document is stored in a conventional RDBMS as XPath Accelerator encoding tuples, together with a statistical guide holding statistical histograms. An XQuery expression is compiled via translation templates [VLDB’04] into relational algebra annotated with special properties. An XQuery estimator applies estimation rules to derive cardinality properties [VLDB’08]. The SQL generator emits cardinality-properties-aware SQL scripts, and an XML serializer turns the relational results back into XML. A system administrator component is also shown. Related publications: [SIGMOD’07][IJWIS’09][JDM’09].]


GraphREL: Motivations

Graphs are among the most complex and general forms of data structures.

Recently, they have been widely used to model many kinds of complex, structured and schemaless data, such as social networks, chemical compounds, biological pathways, spatial databases, the semantic web and business process models.

Retrieving the graphs that contain a given query graph from a large graph database is a key performance issue in all of these graph-based applications.

The success of any graph database application depends directly on the efficiency of its graph indexing and query processing mechanisms.

RDBMSs have repeatedly proven to be efficient, scalable and successful at hosting many different kinds of data.


Preliminaries: Graph Data Model

In labelled graphs, vertices and edges represent entities and the relationships between them, respectively.

The attributes associated with these entities and relationships are called labels.

A graph database D is a collection of member graphs D = {g1, g2, ..., gn}, where each member graph gi is denoted as (V, E, Lv, Le):

V is the set of vertices.
E ⊆ V × V is the set of edges joining two distinct vertices.
Lv is the set of vertex labels.
Le is the set of edge labels.

Labelled graphs are classified into two main classes according to the direction of their edges:

1. Directed labelled graphs, such as XML, RDF and traffic networks.
2. Undirected labelled graphs, such as social networks and chemical compounds.
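The data model above can be sketched as a small Python class. This is our own illustration, not code from GraphREL. A member graph is held as two dictionaries: one maps vertex IDs to vertex labels, the other maps directed (source, destination) pairs to edge labels. Lv and Le are derived from these. The class name `LabelledGraph` is hypothetical, and keying edges by vertex pair assumes at most one edge per ordered pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelledGraph:
    """A member graph g_i = (V, E, Lv, Le) of a graph database D (illustrative sketch)."""
    vertex_labels: dict[int, str]            # V, each vertex mapped to its label
    edge_labels: dict[tuple[int, int], str]  # E, each (source, destination) mapped to its label

    def __post_init__(self) -> None:
        for s, d in self.edge_labels:
            # E ⊆ V × V: both endpoints must be known vertices
            if s not in self.vertex_labels or d not in self.vertex_labels:
                raise ValueError(f"edge ({s}, {d}) uses an unknown vertex")
            # every edge joins two distinct vertices
            if s == d:
                raise ValueError(f"edge ({s}, {d}) is a self-loop")

    @property
    def Lv(self) -> set[str]:
        """Set of vertex labels."""
        return set(self.vertex_labels.values())

    @property
    def Le(self) -> set[str]:
        """Set of edge labels."""
        return set(self.edge_labels.values())


# Member graph g2 from the Vertex-Edge encoding example of this talk
g2 = LabelledGraph(
    vertex_labels={1: "A", 2: "C", 3: "D", 4: "C", 5: "B"},
    edge_labels={(1, 2): "e", (2, 3): "m", (4, 3): "m", (4, 2): "n", (5, 4): "x", (1, 5): "f"},
)
print(sorted(g2.Lv), sorted(g2.Le))  # ['A', 'B', 'C', 'D'] ['e', 'f', 'm', 'n', 'x']
```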


Preliminaries: Graph Queries

In principle, queries in graph databases can be broadly classified into the following main categories:

Subgraph queries: this category searches for a specific pattern in the graph database. The pattern can be either a small graph or a graph in which some parts are uncertain, e.g., vertices with wildcard labels.

Supergraph queries: this category searches for the graph database members whose whole structures are contained in the input query.

Similarity (approximate matching) queries: this category finds graphs that are similar, but not necessarily isomorphic, to a given query graph.


Preliminaries: Subgraph Search Queries

Given a graph database D = {g1, g2, ..., gn} and a graph query q, a subgraph search query returns the answer set A = {gi | q ⊆ gi, gi ∈ D}.

A graph q is a sub-graph of a graph database member gi if the vertices and edges of q can be mapped onto a subset of the vertices and edges of gi.

Formally, g1(V1, E1, Lv1, Le1) is a sub-graph of g2(V2, E2, Lv2, Le2) if and only if:

1. For every distinct vertex x ∈ V1 with label vl ∈ Lv1, there is a distinct vertex y ∈ V2 with label vl ∈ Lv2.

2. For every distinct edge ab ∈ E1 with label el ∈ Le1, there is a distinct edge ab ∈ E2 with label el ∈ Le2, joining the vertices of V2 matched to a and b.
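The definition can be checked directly with a brute-force backtracking search. This is a minimal sketch for intuition only. GraphREL itself delegates matching to SQL, and this search is exponential in the query size. Each query vertex is mapped to a distinct data vertex with the same label, so the mapping is injective. A candidate mapping is accepted only if every directed, labelled query edge lands on an equally labelled edge of the data graph. The function names and the example query are hypothetical. The data graphs g1 and g2 are taken from the encoding example in this talk.

```python
# A graph is a pair: (vertex ID -> vertex label, (source, destination) -> edge label)
Graph = tuple[dict[int, str], dict[tuple[int, int], str]]


def is_subgraph(q: Graph, g: Graph) -> bool:
    """True if q's vertices map injectively onto equally labelled vertices of g
    such that every directed, labelled edge of q lands on an equal edge of g."""
    q_vertices, q_edges = q
    g_vertices, g_edges = g
    order = list(q_vertices)

    def extend(mapping: dict[int, int]) -> bool:
        if len(mapping) == len(order):
            # All query vertices are placed: check the edges between mapped endpoints
            return all(g_edges.get((mapping[a], mapping[b])) == label
                       for (a, b), label in q_edges.items())
        x = order[len(mapping)]
        used = set(mapping.values())
        for y, label in g_vertices.items():
            if label == q_vertices[x] and y not in used:  # same label, distinct vertex
                mapping[x] = y
                if extend(mapping):
                    return True
                del mapping[x]
        return False

    return extend({})


def answer_set(q: Graph, database: dict[str, Graph]) -> set[str]:
    """A = {g_i | q ⊆ g_i, g_i ∈ D}"""
    return {name for name, g in database.items() if is_subgraph(q, g)}


D = {
    "g1": ({1: "A", 2: "A", 3: "D", 4: "A", 5: "C", 6: "B"},
           {(1, 2): "n", (1, 3): "m", (2, 3): "n", (4, 3): "x",
            (5, 4): "x", (6, 5): "y", (5, 2): "z", (1, 6): "m"}),
    "g2": ({1: "A", 2: "C", 3: "D", 4: "C", 5: "B"},
           {(1, 2): "e", (2, 3): "m", (4, 3): "m", (4, 2): "n", (5, 4): "x", (1, 5): "f"}),
}
q = ({1: "C", 2: "A", 3: "D"}, {(1, 2): "x", (2, 3): "x"})  # hypothetical query C -x-> A -x-> D
print(answer_set(q, D))  # {'g1'}
```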


Preliminaries: Subgraph Search Queries

[Figure: An example graph database and graph query. (a) Sample graph database with member graphs g1, g2 and g3; (b) graph query q. Vertices carry labels A to D, and edges carry labels such as e, f, m, n, x, y and z.]


Our Approach: GraphREL

Relational encoding of graph data.

SQL translation of sub-graph search queries.

Filtering phase.

Optional verification phase.

Partitioned B-tree Indexes.

Statistical Summaries.

Decomposition-Based and Selectivity-Aware SQL Translation.


Relational Encoding of Graph Data

The starting point of our relational framework is to find an efficient and suitable encoding for each member graph gi in the graph database D.

We use the Vertex-Edge mapping scheme to store directed labelled graphs, with the following structure:

Vertices(graphID, vertexID, vertexLabel)

Edges(graphID, sVertex, dVertex, edgeLabel)
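Here is a minimal sketch of this Vertex-Edge scheme as concrete DDL. SQLite stands in for the RDBMS. The column types, primary keys and foreign keys are our own assumptions and are not specified on the slide. In particular, the Edges primary key assumes no parallel edges between the same ordered vertex pair. `store_graph` is a hypothetical helper that shreds one member graph into rows.

```python
import sqlite3

# Vertex-Edge mapping scheme (SQLite dialect; column types and keys are our assumptions)
SCHEMA = """
CREATE TABLE Vertices (
    graphID     INTEGER NOT NULL,
    vertexID    INTEGER NOT NULL,
    vertexLabel TEXT    NOT NULL,
    PRIMARY KEY (graphID, vertexID)
);
CREATE TABLE Edges (
    graphID   INTEGER NOT NULL,
    sVertex   INTEGER NOT NULL,  -- source vertex of the directed edge
    dVertex   INTEGER NOT NULL,  -- destination vertex
    edgeLabel TEXT    NOT NULL,
    PRIMARY KEY (graphID, sVertex, dVertex),  -- assumes no parallel edges
    FOREIGN KEY (graphID, sVertex) REFERENCES Vertices (graphID, vertexID),
    FOREIGN KEY (graphID, dVertex) REFERENCES Vertices (graphID, vertexID)
);
"""


def store_graph(conn, graph_id, vertex_labels, edge_labels):
    """Shred one member graph into rows of the two encoding tables (hypothetical helper)."""
    conn.executemany("INSERT INTO Vertices VALUES (?, ?, ?)",
                     [(graph_id, v, label) for v, label in vertex_labels.items()])
    conn.executemany("INSERT INTO Edges VALUES (?, ?, ?, ?)",
                     [(graph_id, s, d, label) for (s, d), label in edge_labels.items()])


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(SCHEMA)
store_graph(conn, 2,
            {1: "A", 2: "C", 3: "D", 4: "C", 5: "B"},
            {(1, 2): "e", (2, 3): "m", (4, 3): "m", (4, 2): "n", (5, 4): "x", (1, 5): "f"})
vertex_count = conn.execute("SELECT COUNT(*) FROM Vertices").fetchone()[0]
edge_count = conn.execute("SELECT COUNT(*) FROM Edges").fetchone()[0]
print(vertex_count, edge_count)  # 5 6
```

The composite foreign keys make the "edges join vertices of the same graph" rule explicit. A production schema might drop them for faster bulk loading.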


Relational Encoding of Graph Data

[Figure: member graphs g1 and g2 of the sample database, with numbered vertices]

Vertices Table

graphID vertexID vLabel
1 1 A
1 2 A
1 3 D
1 4 A
1 5 C
1 6 B
2 1 A
2 2 C
2 3 D
2 4 C
2 5 B

Edges Table

graphID sVertex dVertex eLabel
1 1 2 n
1 1 3 m
1 2 3 n
1 4 3 x
1 5 4 x
1 6 5 y
1 5 2 z
1 1 6 m
2 1 2 e
2 2 3 m
2 4 3 m
2 4 2 n
2 5 4 x
2 1 5 f


SQL Translation of Graph Queries

Filtering phase: a sub-graph query q, consisting of a set of vertices QV of size m and a set of edges QE of size n, is evaluated using the following SQL translation template:

SELECT DISTINCT V1.graphID, V1.vertexID, ..., Vm.vertexID
FROM Vertices AS V1, ..., Vertices AS Vm, Edges AS E1, ..., Edges AS En
WHERE ∀ i = 2..m: (V1.graphID = Vi.graphID)
AND ∀ j = 1..n: (V1.graphID = Ej.graphID)
AND ∀ i = 1..m: (Vi.vertexLabel = QVi.vertexLabel)
AND ∀ j = 1..n: (Ej.edgeLabel = QEj.edgeLabel)
AND ∀ j = 1..n: (Ej.sVertex = Vs.vertexID AND Ej.dVertex = Vd.vertexID);

Here, Vs and Vd are the vertex aliases matched to the source and destination vertices of the query edge QEj.

Verification phase: an optional phase that verifies that the vertices filtered for each candidate graph are distinct. It is applied only if more than one vertex in the set of query vertices QV has the same label. This check is easily performed by comparing vertex IDs.
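The template can be exercised end to end with a small sketch. It rests on several assumptions. SQLite stands in for the RDBMS. The sample rows are those of g1 and g2 from the encoding example. Query edges refer to query vertices by list position. Labels are bound as SQL parameters rather than inlined. `translate` and `evaluate` are hypothetical helper names. The filtering phase runs as one SQL query, and the verification phase then checks in Python that the returned vertex IDs of each candidate are distinct.

```python
import sqlite3

# Sample data: member graphs g1 and g2 of the Vertex-Edge encoding example
VERTICES = [(1, 1, "A"), (1, 2, "A"), (1, 3, "D"), (1, 4, "A"), (1, 5, "C"), (1, 6, "B"),
            (2, 1, "A"), (2, 2, "C"), (2, 3, "D"), (2, 4, "C"), (2, 5, "B")]
EDGES = [(1, 1, 2, "n"), (1, 1, 3, "m"), (1, 2, 3, "n"), (1, 4, 3, "x"), (1, 5, 4, "x"),
         (1, 6, 5, "y"), (1, 5, 2, "z"), (1, 1, 6, "m"), (2, 1, 2, "e"), (2, 2, 3, "m"),
         (2, 4, 3, "m"), (2, 4, 2, "n"), (2, 5, 4, "x"), (2, 1, 5, "f")]


def translate(q_vertices, q_edges):
    """Instantiate the filtering-phase template; returns (sql, parameters).

    q_vertices: vertex labels QV_1..QV_m (list position i maps to alias V{i+1})
    q_edges:    (source position, destination position, edge label) for QE_1..QE_n
    """
    m, n = len(q_vertices), len(q_edges)
    select = ["V1.graphID"] + [f"V{i}.vertexID" for i in range(1, m + 1)]
    tables = ([f"Vertices AS V{i}" for i in range(1, m + 1)]
              + [f"Edges AS E{j}" for j in range(1, n + 1)])
    where = [f"V1.graphID = V{i}.graphID" for i in range(2, m + 1)]
    where += [f"V1.graphID = E{j}.graphID" for j in range(1, n + 1)]
    where += [f"V{i}.vertexLabel = ?" for i in range(1, m + 1)]
    params = list(q_vertices)  # parameters follow the textual order of the "?" markers
    for j, (s, d, label) in enumerate(q_edges, start=1):
        where += [f"E{j}.edgeLabel = ?",
                  f"E{j}.sVertex = V{s + 1}.vertexID",
                  f"E{j}.dVertex = V{d + 1}.vertexID"]
        params.append(label)
    sql = (f"SELECT DISTINCT {', '.join(select)} FROM {', '.join(tables)} "
           f"WHERE {' AND '.join(where)}")
    return sql, params


def evaluate(conn, q_vertices, q_edges):
    """Filtering phase in SQL, then the optional verification phase in Python."""
    sql, params = translate(q_vertices, q_edges)
    needs_check = len(set(q_vertices)) < len(q_vertices)  # some query labels repeat
    answers = set()
    for graph_id, *vertex_ids in conn.execute(sql, params):
        # Verification: the vertices matched within one candidate graph must be distinct
        if not needs_check or len(set(vertex_ids)) == len(vertex_ids):
            answers.add(graph_id)
    return answers


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Vertices (graphID, vertexID, vertexLabel)")
conn.execute("CREATE TABLE Edges (graphID, sVertex, dVertex, edgeLabel)")
conn.executemany("INSERT INTO Vertices VALUES (?, ?, ?)", VERTICES)
conn.executemany("INSERT INTO Edges VALUES (?, ?, ?, ?)", EDGES)

print(evaluate(conn, ["C", "A", "D"], [(0, 1, "x"), (1, 2, "x")]))  # {1}
print(evaluate(conn, ["A", "A"], []))  # {1}: g2 passes filtering but has only one A vertex
```

The second query shows why verification is needed. The filtering SQL matches both V1 and V2 to vertex 1 of g2, and only the distinctness check rejects that candidate.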


Partitioned B-tree Indexes

Partitioned B-tree indexing is a slight variant of the B-tree indexing structure.

The main idea is to use low-selectivity leading columns to maintain partitions within the associated B-tree.

In labelled graphs, the number of distinct vertex labels and distinct edge labels is generally far smaller than the number of vertices and edges, respectively.

For example, an index defined on the columns (vertexLabel, graphID) can reduce the access cost of a sub-graph query with only one label to a single disk page. In contrast, an index defined on the columns (graphID, vertexLabel) requires scanning a large number of disk pages, because the entries for one label are spread across all graphs.

Partitioned B-tree indexes on the high-selectivity attributes achieve fixed execution times that no longer depend on the size of the whole graph database.
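The effect of column order can be observed in any B-tree-based engine. The sketch below uses SQLite's EXPLAIN QUERY PLAN as an assumed stand-in. A composite index that leads with the label column only approximates a true partitioned B-tree, which uses an artificial leading partition column. The synthetic data of 1,000 graphs × 20 vertices drawn from 5 labels is hypothetical. With the label first, SQLite should report an index SEARCH on the label. With graphID first, it can only SCAN the whole index.

```python
import sqlite3


def query_plan(index_columns: str) -> list[str]:
    """Build a toy Vertices table indexed on `index_columns`; return SQLite's plan details."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Vertices (graphID INTEGER, vertexID INTEGER, vertexLabel TEXT)")
    # Synthetic data (assumption): 1,000 graphs x 20 vertices drawn from 5 labels
    conn.executemany(
        "INSERT INTO Vertices VALUES (?, ?, ?)",
        [(g, v, "ABCDE"[(g + v) % 5]) for g in range(1000) for v in range(20)],
    )
    conn.execute(f"CREATE INDEX idx ON Vertices ({index_columns})")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT DISTINCT graphID FROM Vertices WHERE vertexLabel = 'C'"
    ).fetchall()
    return [row[-1] for row in rows]  # the last column holds the human-readable detail


# Label first: the query seeks straight into the contiguous 'C' range of the index
print(query_plan("vertexLabel, graphID"))
# graphID first: the 'C' entries are scattered, so the whole index must be scanned
print(query_plan("graphID, vertexLabel"))
```

The exact wording of the plan text varies between SQLite versions. The SEARCH versus SCAN distinction is what matters here.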


Limitations of SQL-Based Translation Approach

An obvious problem with the SQL translation template is that it involves a large number of conjunctive SQL predicates and join operations between the encoding tables.

Most relational query engines fail to execute the SQL translations of medium-sized or large sub-graph queries because these translations are too long and too complex (which does not necessarily mean they are too expensive).

Therefore, we need a decomposition mechanism that divides this large and complex SQL translation query into a sequence of intermediate queries.

Applying this decomposition mechanism blindly may lead to inefficient execution plans with very large, unnecessary and expensive intermediate results.

We use statistical summary information to make the decomposition process efficient.


Statistical Summaries

In general, one of the most effective techniques for optimizing the execution times of SQL queries is to select the relational execution plan based on accurate selectivity information about the query predicates.

We construct three Markov tables that store the frequency of occurrence of the distinct vertex labels, the distinct edge labels, and the connections between pairs of edges.

Markov Table summary of vertices labels

Vertex Label Frequency
A 100
B 200
C 38
D 4
E 50
L 6
M 10
N 250
O 3
P 40
R 55

Markov Table summary of edges labels

Edge Label Frequency
a 40
c 5
e 28
l 54
m 140
n 3
o 20
p 15
x 8
y 60
z 15

Markov Table summary of pair-wise edge connections

Edge Label Connection Frequency
ab 3
ac 15
ae 45
ec 14
em 103
la 5
pc 18
px 45
xy 25
xz 2
za 1
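The three summaries can be derived from the encoding tables with simple counting. The frequencies above come from a dataset that is not reproduced here. This sketch instead counts over the small g1 and g2 sample from the encoding example, so its numbers differ. The meaning of a pair-wise connection "ab" is our assumption: an edge labelled a whose destination vertex is the source of an edge labelled b in the same graph, i.e., a directed path of length two.

```python
from collections import Counter, defaultdict

# Vertex-Edge rows of sample graphs g1 and g2 (from the relational encoding example)
VERTICES = [(1, 1, "A"), (1, 2, "A"), (1, 3, "D"), (1, 4, "A"), (1, 5, "C"), (1, 6, "B"),
            (2, 1, "A"), (2, 2, "C"), (2, 3, "D"), (2, 4, "C"), (2, 5, "B")]
EDGES = [(1, 1, 2, "n"), (1, 1, 3, "m"), (1, 2, 3, "n"), (1, 4, 3, "x"), (1, 5, 4, "x"),
         (1, 6, 5, "y"), (1, 5, 2, "z"), (1, 1, 6, "m"), (2, 1, 2, "e"), (2, 2, 3, "m"),
         (2, 4, 3, "m"), (2, 4, 2, "n"), (2, 5, 4, "x"), (2, 1, 5, "f")]

# Markov table of vertex labels: label -> number of vertices carrying it
vertex_freq = Counter(label for _graph, _vertex, label in VERTICES)
# Markov table of edge labels: label -> number of edges carrying it
edge_freq = Counter(label for _graph, _src, _dst, label in EDGES)

# Assumed definition of a pair-wise connection "ab": an edge labelled a whose
# destination is the source of an edge labelled b in the same graph
out_labels = defaultdict(list)  # (graphID, vertexID) -> labels of outgoing edges
for graph, src, _dst, label in EDGES:
    out_labels[(graph, src)].append(label)
connection_freq = Counter(
    a + b for graph, _src, dst, a in EDGES for b in out_labels.get((graph, dst), [])
)

print(vertex_freq.most_common(2), edge_freq["m"], connection_freq["yz"])
```

In a relational deployment, the same counts would be maintained as GROUP BY aggregates over the Vertices and Edges tables.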


Decomposition-Based and Selectivity-Aware SQL Translation

Identifying the pruning points.

Calculating the number of partitions.

Decomposed SQL translation:
  Blindly single-level decomposition.
  Pruned single-level decomposition.
  Pruned multi-level decomposition.

Selectivity-aware annotations.


Decomposition-Based and Selectivity-Aware SQL Translation

Identifying the pruning points: each vertex label, edge label or edge connection with a low frequency is considered a pruning point in our relational evaluation mechanism.

Given a query graph q, we first check the structure of q against our summary Markov tables to identify the possible pruning points; their number is denoted NPP.
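Identifying pruning points amounts to looking up each query label and connection in the Markov tables. This is a minimal sketch under two assumptions. "Low frequency" is modelled as a hypothetical cut-off `threshold` (5 here). The query is described by its vertex labels, edge labels and connections. The frequencies are those of the Markov tables above. A label missing from the summary gets frequency 0, which marks it as a (maximally selective) pruning point.

```python
# Markov-table frequencies from the Statistical Summaries slide
VERTEX_FREQ = {"A": 100, "B": 200, "C": 38, "D": 4, "E": 50, "L": 6, "M": 10,
               "N": 250, "O": 3, "P": 40, "R": 55}
EDGE_FREQ = {"a": 40, "c": 5, "e": 28, "l": 54, "m": 140, "n": 3, "o": 20,
             "p": 15, "x": 8, "y": 60, "z": 15}
CONNECTION_FREQ = {"ab": 3, "ac": 15, "ae": 45, "ec": 14, "em": 103, "la": 5,
                   "pc": 18, "px": 45, "xy": 25, "xz": 2, "za": 1}


def pruning_points(vertex_labels, edge_labels, connections, threshold=5):
    """Return the query's vertex labels, edge labels and edge connections whose
    summary frequency is at most `threshold` (a hypothetical cut-off)."""
    points = [("vertex", v) for v in vertex_labels if VERTEX_FREQ.get(v, 0) <= threshold]
    points += [("edge", e) for e in edge_labels if EDGE_FREQ.get(e, 0) <= threshold]
    points += [("connection", c) for c in connections
               if CONNECTION_FREQ.get(c, 0) <= threshold]
    return points


# Hypothetical query with vertex labels A, D, N; edge labels x, z, m; connections xz, em
points = pruning_points(["A", "D", "N"], ["x", "z", "m"], ["xz", "em"])
print(points, "NPP =", len(points))  # [('vertex', 'D'), ('connection', 'xz')] NPP = 2
```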

Calculating the number of partitions: evaluating a sub-graph query q requires NJP join operations.

Assume that the relational query engine can evaluate at most MJP join operations in one query.

The number of partitions (NOP) is then computed as NOP = ⌈NJP / MJP⌉.
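As a worked example, the partition count can be computed with integer ceiling division. Counting NJP as m + n − 1 for a flat template with m Vertices aliases and n Edges aliases is our own reading, one join per additional table. The query size and the engine limit MJP = 8 are hypothetical values.

```python
def number_of_joins(m: int, n: int) -> int:
    """NJP for the flat template, assuming one join per additional table alias:
    m Vertices aliases plus n Edges aliases in one FROM clause need m + n - 1 joins."""
    return m + n - 1


def number_of_partitions(njp: int, mjp: int) -> int:
    """NOP = ceil(NJP / MJP), computed with integer arithmetic."""
    return (njp + mjp - 1) // mjp


njp = number_of_joins(m=10, n=12)       # hypothetical 10-vertex, 12-edge query: 21 joins
nop = number_of_partitions(njp, mjp=8)  # hypothetical engine limit of 8 joins per query
print(njp, nop)  # 21 3
```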


Decomposition-Based and Selectivity-Aware SQL Translation

Blindly single-level decomposition: if NPP = 0, we blindly decompose the query q into NOP partitions. Each partition is translated into an intermediate evaluation step Si. The final evaluation step joins all intermediate evaluation steps and adds the conjunctive conditions of the partitions' connectors.

Pruned single-level decomposition: if NPP ≥ NOP, we distribute the pruning points across the NOP intermediate partitions. This ensures balanced and effective pruning of all intermediate results.

Pruned multi-level decomposition: if NPP < NOP, we assign the pruning points to NPP of the NOP partitions, which form the first level of intermediate results. An intermediate collective pruned step (IPS) is constructed by joining all the pruned first-level intermediate results. IPS is then used as an entry pruning point for the remaining (NOP − NPP) non-pruned partitions in a hierarchical, multi-level fashion. Each pruning point can be used to prune more than one partition, if possible.
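The choice among the three strategies depends only on NPP and NOP. This sketch adds a simple way of spreading pruning points over partitions. Round-robin assignment is our simplification: each point prunes exactly one partition, whereas the slide allows a point to prune several. Partitions that receive points form the first level and are joined into the IPS. The remaining partitions are evaluated afterwards against the IPS. All function names are hypothetical.

```python
def choose_strategy(npp: int, nop: int) -> str:
    """Pick one of the three decomposition strategies from NPP and NOP."""
    if npp == 0:
        return "blind single-level"
    if npp >= nop:
        return "pruned single-level"
    return "pruned multi-level"


def decompose(pruning_points: list[str], nop: int):
    """Assign pruning points round-robin to NOP partitions and split the partitions
    into pruned (first level, joined into the IPS) and non-pruned (evaluated after the IPS)."""
    partitions = [[] for _ in range(nop)]
    for i, point in enumerate(pruning_points):
        partitions[i % nop].append(point)
    pruned = [i for i, points in enumerate(partitions) if points]
    non_pruned = [i for i, points in enumerate(partitions) if not points]
    return choose_strategy(len(pruning_points), nop), partitions, pruned, non_pruned


strategy, partitions, first_level, later = decompose(["D", "xz"], nop=3)
print(strategy, partitions, first_level, later)
# pruned multi-level [['D'], ['xz'], []] [0, 1] [2]
```

When NPP ≥ NOP, every partition receives at least one point, so `non_pruned` is empty and a single level suffices.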


Decomposition-Based and Selectivity-Aware SQL Translation

[Figure: Selectivity-aware decomposition process. Intermediate steps S1, S2 and S3 are each translated into SQL (S1-SQL, S2-SQL, S3-SQL) and combined by a final evaluation step (FES-SQL). (a) NPP > NOP; (b) NPP < NOP.]


Decomposition-Based and Selectivity-Aware SQL Translation

Selectivity-aware Annotations

For any given SQL query, there is a large number of alternative execution plans, which may differ significantly in their use of system resources and in their response times.

We use the statistical summary information to give influencing hints to the query optimizer by injecting additional selectivity information for the individual query predicates into the SQL translations of the graph queries:

SELECT fieldlist
FROM tablelist
WHERE Pi SELECTIVITY Si
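Annotated predicates can be generated directly from the Markov tables. This is a minimal sketch, assuming the selectivity of a label predicate is estimated as label frequency divided by the total vertex count in the summary (756 in the table above). It also assumes the target engine accepts a `SELECTIVITY` clause. To our knowledge, IBM DB2 does, for basic predicates only when its DB2_SELECTIVITY registry variable is enabled. Other systems use different hint mechanisms. Labels are inlined as literals for readability, and a real translator should escape them.

```python
# Vertex-label Markov table from the Statistical Summaries slide
VERTEX_FREQ = {"A": 100, "B": 200, "C": 38, "D": 4, "E": 50, "L": 6, "M": 10,
               "N": 250, "O": 3, "P": 40, "R": 55}
TOTAL_VERTICES = sum(VERTEX_FREQ.values())  # 756


def annotated_label_predicates(query_labels: list[str]) -> list[str]:
    """Attach a SELECTIVITY hint, estimated as frequency / total, to each label predicate."""
    predicates = []
    for i, label in enumerate(query_labels, start=1):
        # Labels missing from the summary get frequency 1 so the hint stays in (0, 1]
        selectivity = max(VERTEX_FREQ.get(label, 0), 1) / TOTAL_VERTICES
        predicates.append(f"V{i}.vertexLabel = '{label}' SELECTIVITY {selectivity:.4f}")
    return predicates


where = " AND ".join(annotated_label_predicates(["D", "N"]))
print(where)
# V1.vertexLabel = 'D' SELECTIVITY 0.0053 AND V2.vertexLabel = 'N' SELECTIVITY 0.3307
```

The rare label D receives a very low selectivity, which tells the optimizer to evaluate that predicate first.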


Experimental Results: Performance and Scalability

[Figure: The scalability of GraphREL. Execution time (ms, log scale) against query size (Q4 to Q20). (a) Synthetic dataset, series D2kV10E20L40M50, D10kV10E20L40M50, D50kV30E40L90M150 and D100kV30E40L90M150; (b) DBLP dataset, series 1MB, 10MB, 50MB and 100MB.]


Experimental Results: The Effect of Using Partitioned B-tree Indexes and Selectivity Injections

[Figure: The speedup improvement for the relational evaluation of sub-graph queries using partitioned B-tree indexes and selectivity-aware annotations. (a) Partitioned B-tree indexes: percentage of improvement (%) against query size (Q4 to Q20) for the synthetic and DBLP datasets; (b) Injection of selectivity annotations: execution times (ms) against query size (Q4 to Q20) for the synthetic and DBLP datasets.]


QBP: An Application of GraphREL

Many of today’s information systems are driven by explicit process models.

A business process is a set of coordinated activities that achieve a specific business objective.

With the rapid and continuous growth in the number of process models, it becomes crucial for business process designers to be able to search their model repositories efficiently.

QBP is a query processor for business process models.

QBP is based on BPMN-Q, a new visual query language for business processes. The language addresses process definitions and extends the standard BPMN notation for modeling business processes as its concrete syntax.

A BPMN-Q query is treated as a graph that is matched against one or more process graphs.


QBP: An Application of GraphREL

[Figure: Example business process model for a real-estate credit application. Activities: Customer applies for real-estate credit; Check credit rating; Check real-estate construction document; Check land register record; Prepare contract; Reject application; Offer loan protection insurance; Offer residence insurance. Data object states: Credit Rating [rejected]/[accepted]; Const. Doc. [invalid]/[valid]; Record [absent]/[present]. Condition: All OK.]


QBP: Application Architecture

[Diagram: QBP application architecture. Business process designers work with a model editor (process models in EPC, BPEL, BPMN, UML ADs, ...), whose updates pass through a translation middleware into a relational business process repository hosted in an RDBMS. Queries written in the BPM-Q query editor are semantically expanded by the semantic query expander. The SQL-based query processor sends the resulting SQL scripts to the RDBMS and returns the query results as result process models.]


BPMN-Q Query Constructs

Anonymous Activity: indicates an unknown activity in a query. It resembles an activity but is distinguished by the @ sign at the beginning of its label.

Generic Node: indicates an unknown node in a process. It can evaluate to any node type.

Generic Split: refers to any type of split gateway.

Generic Join: refers to any type of join gateway.

Negative Sequence Flow: states that two nodes A and B are not directly related by a sequence flow.

Path: states that there must be a path from A to B. A query usually returns all such paths.

Negative Path: states that there is no path between two nodes A and B.
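The Path and Negative Path constructs reduce to reachability. One way to evaluate them over the Edges encoding is a recursive common table expression. The sketch below is our illustration, not QBP's actual implementation, and it makes several assumptions. SQLite is the engine. Process graphs are stored in the same Edges layout as above. The four-step process graph is hypothetical. The sketch only tests whether a path exists, whereas returning all matching paths, as the Path construct usually does, would also require tracking each visited sequence.

```python
import sqlite3

# Reachability over the Edges encoding: collect every vertex reachable from a start vertex
PATH_EXISTS = """
WITH RECURSIVE reach(vertex) AS (
    SELECT ?                                  -- start node A
    UNION                                     -- UNION drops duplicates, so cycles terminate
    SELECT e.dVertex
    FROM Edges AS e JOIN reach AS r ON e.sVertex = r.vertex
    WHERE e.graphID = ?
)
SELECT EXISTS (SELECT 1 FROM reach WHERE vertex = ?)
"""


def path_exists(conn, graph_id, a, b):
    """Path construct: is b reachable from a in process graph graph_id?
    (A vertex trivially reaches itself.)"""
    return bool(conn.execute(PATH_EXISTS, (a, graph_id, b)).fetchone()[0])


def negative_path(conn, graph_id, a, b):
    """Negative path construct: there is no path from a to b."""
    return not path_exists(conn, graph_id, a, b)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Edges (graphID, sVertex, dVertex, edgeLabel)")
# Hypothetical process graph 1: 1 -> 2 -> 3 -> 4, with a side branch 1 -> 5
conn.executemany("INSERT INTO Edges VALUES (1, ?, ?, 'seq')", [(1, 2), (2, 3), (3, 4), (1, 5)])
print(path_exists(conn, 1, 1, 4), negative_path(conn, 1, 5, 4))  # True True
```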


QBP: An Application of GraphREL

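[Figure: (a) BPMN-Q query example: the activities "Customer applies for real-estate credit" and "Reject application", connected by a path edge (//). (b) BPMN-Q query match: the matched process fragment, containing "Customer applies for real-estate credit", "Check credit rating", "Check real-estate construction document", "Check land register record" and "Reject application".]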


QBP: Use Cases

Searching the structure of the process models.

Compliance checking.

Detecting design anomalies.

Discovery of frequent process patterns/anti-patterns.


QBP: An Application of GraphREL

http://bpmnq.sourceforge.net/


Conclusions

GraphREL is a purely relational framework for storing and querying graph data.

In principle, GraphREL has the following advantages:

It can reside on any relational database system and exploits that system's well-known, mature query optimization techniques as well as its efficient and scalable query processing techniques.

It requires no offline or pre-processing steps, and therefore incurs no time cost for them.

It handles both static and dynamic (frequently updated) graph databases very well.

The selectivity annotations in the SQL evaluation scripts enable relational query optimizers to select the most efficient execution plans and to efficiently prune graph database members that cannot contain the query graph.


Future Work: Large-Scale Graph Query Processing (e.g., Social Networks)


Future Work: Parallel Processing / MapReduce (HadoopDB)


Future Work: Storing and Querying Hypergraphs


References

[CIDR’03] G. Graefe. Sorting and Indexing with Partitioned B-Trees.

[VLDB’04] T. Grust, S. Sakr, and J. Teubner. XQuery on SQL Hosts.

[SIGMOD’07] T. Grust, M. Mayr, J. Rittinger, S. Sakr, and J. Teubner. A SQL:1999 Code Generator for the Pathfinder XQuery Compiler.

[VLDB’08] J. Teubner, T. Grust, S. Maneth, and S. Sakr. Dependable Cardinality Forecasts for XQuery.


[IJWIS’08] S. Sakr. Algebraic-Based XQuery Cardinality Estimation.

[DASFAA’09] S. Sakr. GraphREL: A Decomposition-Based and Selectivity-Aware Relational Framework for Processing Sub-graph Queries.

[UNISCON’09] S. Sakr. Storing and Querying Graph Data Using Efficient Relational Processing Techniques.

[JDM’09] S. Sakr. Purely Relational Implementation of an XQuery Processor.


The End

Thank You
