marcus paradies, michael rudolf, christof bornhoevd...
TRANSCRIPT
Marcus Paradies, Michael Rudolf, Christof Bornhoevd, Wolfgang Lehner
GRATIN: Accelerating Graph Traversalsin Main-Memory Column StoresGRADES’14 Workshop
June 22, 2014
Graphs from an Enterprise View
Relational+ Application Logic
Application Layer
RDBMS
Graph Processing
Data Data • Recursive Queries
• Chained Joins
: Data already in RDBMS
6 SQL as interface
6 Data transfer to application
Relational + Graph+ Application Logic
Application Layer
RDBMS GDBMS
Replicate
Data Data Data Data
: Efficient processing in GDBMS
6 Processing on replicated data
6 No combination with relational data
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 2
Graphs from an Enterprise View
Relational+ Application Logic
Application Layer
RDBMS
Graph Processing
Data Data • Recursive Queries
• Chained Joins
: Data already in RDBMS
6 SQL as interface
6 Data transfer to application
Relational + Graph+ Application Logic
Application Layer
RDBMS GDBMS
Replicate
Data Data Data Data
: Efficient processing in GDBMS
6 Processing on replicated data
6 No combination with relational data
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 2
Graphs from an Enterprise View
Relational+ Application Logic
Application Layer
RDBMS
Graph Processing
Data Data • Recursive Queries
• Chained Joins
: Data already in RDBMS
6 SQL as interface
6 Data transfer to application
Relational + Graph+ Application Logic
Application Layer
RDBMS GDBMS
Replicate
Data Data Data Data
: Efficient processing in GDBMS
6 Processing on replicated data
6 No combination with relational data
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 2
Graphs from an Enterprise View
Relational+ Application Logic
Application Layer
RDBMS
Graph Processing
Data Data • Recursive Queries
• Chained Joins
: Data already in RDBMS
6 SQL as interface
6 Data transfer to application
Relational + Graph+ Application Logic
Application Layer
RDBMS GDBMS
Replicate
Data Data Data Data
: Efficient processing in GDBMS
6 Processing on replicated data
6 No combination with relational data
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 2
Graphs from an Enterprise View
Relational+ Application Logic
Application Layer
RDBMS
Graph Processing
Data Data • Recursive Queries
• Chained Joins
: Data already in RDBMS
6 SQL as interface
6 Data transfer to application
Relational + Graph+ Application Logic
Application Layer
RDBMS GDBMS
Replicate
Data Data Data Data
: Efficient processing in GDBMS
6 Processing on replicated data
6 No combination with relational data
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 2
Graph Processing (The New World)
Graph Representation
id=1name=Johntype=User
id=2title=The Shiningtype=Product
id=3title=The Standtype=Product
id=4name=Horrortype=Category
id=5name=Literaturetype=Category
type=category
type=similar
type=belongs
type=belongs
type=ratedrating=4.0
type=ratedrating=5.0
Example graph
id type name . . . title
1 User John . . . ?2 Product ? . . . The Shining3 Product ? . . . The Stand4 Category Horror . . . ?5 Category Literature . . . ?
Vertex table
Vs Vt type . . . rating
3 2 similar . . . ?2 3 similar . . . ?2 4 belongs . . . ?3 4 belongs . . . ?1 3 rated . . . 5.01 2 rated . . . 4.04 5 category . . . ?
Edge table
• Each vertex/edge represented as a single record in universal tables
• Support for transactions and compression
• Combination with other data models (spatial, text, temporal)
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 3
Graph Processing (The New World)
Graph Representationid=1name=Johntype=User
id=2title=The Shiningtype=Product
id=3title=The Standtype=Product
id=4name=Horrortype=Category
id=5name=Literaturetype=Category
type=category
type=similar
type=belongs
type=belongs
type=ratedrating=4.0
type=ratedrating=5.0
Example graph
id type name . . . title
1 User John . . . ?2 Product ? . . . The Shining3 Product ? . . . The Stand4 Category Horror . . . ?5 Category Literature . . . ?
Vertex table
Vs Vt type . . . rating
3 2 similar . . . ?2 3 similar . . . ?2 4 belongs . . . ?3 4 belongs . . . ?1 3 rated . . . 5.01 2 rated . . . 4.04 5 category . . . ?
Edge table
• Each vertex/edge represented as a single record in universal tables
• Support for transactions and compression
• Combination with other data models (spatial, text, temporal)
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 3
Graph Processing (The New World)
Graph Representationid=1name=Johntype=User
id=2title=The Shiningtype=Product
id=3title=The Standtype=Product
id=4name=Horrortype=Category
id=5name=Literaturetype=Category
type=category
type=similar
type=belongs
type=belongs
type=ratedrating=4.0
type=ratedrating=5.0
Example graph
id type name . . . title
1 User John . . . ?2 Product ? . . . The Shining3 Product ? . . . The Stand4 Category Horror . . . ?5 Category Literature . . . ?
Vertex table
Vs Vt type . . . rating
3 2 similar . . . ?2 3 similar . . . ?2 4 belongs . . . ?3 4 belongs . . . ?1 3 rated . . . 5.01 2 rated . . . 4.04 5 category . . . ?
Edge table
• Each vertex/edge represented as a single record in universal tables
• Support for transactions and compression
• Combination with other data models (spatial, text, temporal)© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 3
Graph Processing (The New World)
Query Execution
EDGES
σ (Selection)
� (Traversal)
π (Projection)
EDGES
σ (Selection)
� (Traversal)
π (Projection)
Graph TraversalsExample query:{ id:D }-a->(1,*)
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 4
Graph Processing (The New World)
Query Execution
EDGES
σ (Selection)
� (Traversal)
π (Projection)
EDGES
σ (Selection)
� (Traversal)
π (Projection)
Graph TraversalsExample query:{ id:D }-a->(1,*)
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 4
Graph Processing (The New World)
Query Execution
EDGES
σ (Selection)
� (Traversal)
π (Projection)
EDGES
σ (Selection)
� (Traversal)
π (Projection)
Graph TraversalsExample query:{ id:D }-a->(1,*)
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 4
Graph Processing (The New World)
Query Execution
EDGES
σ (Selection)
� (Traversal)
π (Projection)
EDGES
σ (Selection)
� (Traversal)
π (Projection)
Graph TraversalsExample query:{ id:D }-a->(1,*)
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 4
Graph Processing (The New World)
Query Execution
EDGES
σ (Selection)
� (Traversal)
π (Projection)
EDGES
σ (Selection)
� (Traversal)
π (Projection)
Graph TraversalsExample query:{ id:D }-a->(1,*)
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 4
Graph Processing (The New World)
Query Execution
EDGES
σ (Selection)
� (Traversal)
π (Projection)
EDGES
σ (Selection)
� (Traversal)
π (Projection)
Graph TraversalsExample query:{ id:D }-a->(1,*)
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 4
Graph Processing (The New World)
Query Execution
EDGES
σ (Selection)
� (Traversal)
π (Projection)
EDGES
σ (Selection)
� (Traversal)
π (Projection)
Graph TraversalsExample query:{ id:D }-a->(1,*)
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 4
Graph Processing (The New World)
Query Execution
EDGES
σ (Selection)
� (Traversal)
π (Projection)
EDGES
σ (Selection)
� (Traversal)
π (Projection)
Graph TraversalsExample query:{ id:D }-a->(1,*)
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 4
Graph Processing (The New World)
Query Execution
EDGES
σ (Selection)
� (Traversal)
π (Projection)
EDGES
σ (Selection)
� (Traversal)
π (Projection)
Graph TraversalsExample query:{ id:D }-a->(1,*)
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 4
Edge Clustering
• Clustering by edge type preservessubgraph meaning
• Clustering by edge source preserves vertexneighborhood
• Increases spatial locality in memory
• Allows reducing scan to range in column
Type ClusteringEdge Clustering
Vs Vt Type
D F aA D aA B aA C aE B aE G aD B bB E bF G b
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 5
GRATIN
Column
2
2
2
1
3
1
2
3
4
5
B1
B2
Minimal blocksize: 2
2
1
2
1
2
3
Value Blocks
Block ranges
2
1 3
5
• gratin replaces full column scans by block scans
• gratin indexes column with variable block size
• Allows efficient handling of vertices with high outdegree
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 6
GRATIN
Column
2
2
2
1
3
1
2
3
4
5
B1
B2
Minimal blocksize: 2
2
1
2
1
2
3
Value Blocks
Block ranges
2
1 3
5
• gratin replaces full column scans by block scans
• gratin indexes column with variable block size
• Allows efficient handling of vertices with high outdegree
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 6
GRATIN
Column
2
2
2
1
3
1
2
3
4
5
B1
B2
Minimal blocksize: 2
2
1
2
1
2
3
Value Blocks
Block ranges
2
1 3
5
• gratin replaces full column scans by block scans
• gratin indexes column with variable block size
• Allows efficient handling of vertices with high outdegree
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 6
GRATIN
Column
2
2
2
1
3
1
2
3
4
5
B1
B2
Minimal blocksize: 2
2
1
2
1
2
3
Value Blocks
Block ranges
2
1 3
5
• gratin replaces full column scans by block scans
• gratin indexes column with variable block size
• Allows efficient handling of vertices with high outdegree
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 6
GRATIN
Column
2
2
2
1
3
1
2
3
4
5
B1
B2
Minimal blocksize: 2
2
1
2
1
2
3
Value Blocks
Block ranges
2
1 3
5
• gratin replaces full column scans by block scans
• gratin indexes column with variable block size
• Allows efficient handling of vertices with high outdegree
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 6
GRATIN
Column
2
2
2
1
3
1
2
3
4
5
B1
B2
Minimal blocksize: 2
2
1
2
1
2
3
Value Blocks
Block ranges
2
1 3
5
• gratin replaces full column scans by block scans
• gratin indexes column with variable block size
• Allows efficient handling of vertices with high outdegree
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 6
Experiments on Static Graphs
ID |V | |E | d̄out
Amazon 0.4 M 3.3 M 16.8California-Roads 1.9 M 2.7 M 2.8
1 2 3 4 5 6 70
5
10
15
20
# of Traversal Iterations
Exe
cutio
nTi
me
(ms)
Amazon
SCAN gratin-512
gratin-4096 gratin-32768
Figure: Comparison for different block sizes.
2 4 6 8 10
10
20
30
40
50
Traversal Iteration
Que
rytim
e(m
s)
Amazon
2 4 6 8 10
5
10
15
20
25
Traversal Iteration
California-Roads
Figure: Query time for scan-based traversal ( ) andgratin-based traversal ( ).
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 7
Handling Updates
Column
21
22
23
14
35
26
Minimal blocksize: 2
B1
B2
B3
2
1
2
3
1
2
3
Value Blocks
Health factor
hB1 = 1.0
hB2 = 1.0hB2 = 0.5
hB3 = 1.0
GRATIN health
h = 1.0h = 0.83
Block ranges
3
2
1 3
5
6
• Health factor describes viability of gratin
• If global health factor below threshold→ rebuild index
• gratin allows updates in constant time (append-only)
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 8
Handling Updates
Column
21
22
23
14
35
26
Minimal blocksize: 2
B1
B2
B3
2
1
2
3
1
2
3
Value Blocks Health factor
hB1 = 1.0
hB2 = 1.0
hB2 = 0.5
hB3 = 1.0
GRATIN health
h = 1.0
h = 0.83
Block ranges
3
2
1 3
5
6
• Health factor describes viability of gratin
• If global health factor below threshold→ rebuild index
• gratin allows updates in constant time (append-only)
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 8
Handling Updates
Column
21
22
23
14
35
26
Minimal blocksize: 2
B1
B2
B3
2
1
2
3
1
2
3
Value Blocks Health factor
hB1 = 1.0
hB2 = 1.0
hB2 = 0.5
hB3 = 1.0
GRATIN health
h = 1.0
h = 0.83
Block ranges
3
2
1 3
5
6
• Health factor describes viability of gratin
• If global health factor below threshold→ rebuild index
• gratin allows updates in constant time (append-only)
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 8
Handling Updates
Column
21
22
23
14
35
26
Minimal blocksize: 2
B1
B2
B3
2
1
2
3
1
2
3
Value Blocks Health factor
hB1 = 1.0
hB2 = 1.0
hB2 = 0.5
hB3 = 1.0
GRATIN health
h = 1.0
h = 0.83
Block ranges
3
2
1 3
5
6
• Health factor describes viability of gratin
• If global health factor below threshold→ rebuild index
• gratin allows updates in constant time (append-only)
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 8
Experiments on Dynamic Graphs
0 +20K +20K +20K +20K +20K
1
1.2
1.4
1.6
1.8
2
2.2
∆Batch Insertions
Slo
wdo
wn
Fact
or(
)
Amazon
0
0.2
0.4
0.6
0.8
1
Hea
lthFa
ctor
()
0 +20K +20K +20K +20K +20K
1
1.2
1.4
1.6
∆Batch Insertions
Slo
wdo
wn
Fact
or(
)
California-Roads
0
0.2
0.4
0.6
0.8
1
Hea
lthFa
ctor
()
Figure: Query time for gratin-based traversal on dynamic graphs. Slowdown factor describesthe relative execution time in multiples of the execution time on a static graph.
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 9
Summary
• Tight integration of traversal operatorinto main-memory column store
• gratin is a lightweight secondaryindex structure
• Handles dynamic graphs inpredictable time
• Experiments show a diversespectrum of performanceimprovements
• Performance of gratin depends ongraph topology
If you want to know more, join me for a chat at my poster!
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 10
Summary
• Tight integration of traversal operatorinto main-memory column store
• gratin is a lightweight secondaryindex structure
• Handles dynamic graphs inpredictable time
• Experiments show a diversespectrum of performanceimprovements
• Performance of gratin depends ongraph topology
If you want to know more, join me for a chat at my poster!
© Marcus Paradies | GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores | 10
1 Backup Slides