graph data processing - david r. cheriton school of ...kmsalem/courses/cs743/f14/slides/graph... ·...

75
Graph Data Processing M. Tamer ¨ Ozsu 1 / 75

Upload: others

Post on 14-Jul-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Graph Data Processing

M. Tamer Ozsu

1 / 75

Page 2: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Outline

Introduction

RDF Graph Querying

General Graph ProcessingOffline analyticsOnline querying

2 / 75

Page 3: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Graph Data are Very Common

Internet

3 / 75

Page 4: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Graph Data are Very Common

Socialnetworks

4 / 75

Page 5: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Graph Data are Very Common

Tradevolumes andconnections

5 / 75

Page 6: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Graph Data are Very Common

Biologicalnetworks

6 / 75

Page 7: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Graph Data are Very Common

As of September 2011

MusicBrainz

(zitgist)

P20

Turismo de

Zaragoza

yovisto

Yahoo! Geo

Planet

YAGO

World Fact-book

El ViajeroTourism

WordNet (W3C)

WordNet (VUA)

VIVO UF

VIVO Indiana

VIVO Cornell

VIAF

URIBurner

Sussex Reading

Lists

Plymouth Reading

Lists

UniRef

UniProt

UMBEL

UK Post-codes

legislationdata.gov.uk

Uberblic

UB Mann-heim

TWC LOGD

Twarql

transportdata.gov.

uk

Traffic Scotland

theses.fr

Thesau-rus W

totl.net

Tele-graphis

TCMGeneDIT

TaxonConcept

Open Library (Talis)

tags2con delicious

t4gminfo

Swedish Open

Cultural Heritage

Surge Radio

Sudoc

STW

RAMEAU SH

statisticsdata.gov.

uk

St. Andrews Resource

Lists

ECS South-ampton EPrints

SSW Thesaur

us

SmartLink

Slideshare2RDF

semanticweb.org

SemanticTweet

Semantic XBRL

SWDog Food

Source Code Ecosystem Linked Data

US SEC (rdfabout)

Sears

Scotland Geo-

graphy

ScotlandPupils &Exams

Scholaro-meter

WordNet (RKB

Explorer)

Wiki

UN/LOCODE

Ulm

ECS (RKB

Explorer)

Roma

RISKS

RESEX

RAE2001

Pisa

OS

OAI

NSF

New-castle

LAASKISTI

JISC

IRIT

IEEE

IBM

Eurécom

ERA

ePrints dotAC

DEPLOY

DBLP (RKB

Explorer)

Crime Reports

UK

Course-ware

CORDIS (RKB

Explorer)CiteSeer

Budapest

ACM

riese

Revyu

researchdata.gov.

ukRen. Energy Genera-

tors

referencedata.gov.

uk

Recht-spraak.

nl

RDFohloh

Last.FM (rdfize)

RDF Book

Mashup

Rådata nå!

PSH

Product Types

Ontology

ProductDB

PBAC

Poké-pédia

patentsdata.go

v.uk

OxPoints

Ord-nance Survey

Openly Local

Open Library

OpenCyc

Open Corpo-rates

OpenCalais

OpenEI

Open Election

Data Project

OpenData

Thesau-rus

Ontos News Portal

OGOLOD

JanusAMP

Ocean Drilling Codices

New York

Times

NVD

ntnusc

NTU Resource

Lists

Norwe-gian

MeSH

NDL subjects

ndlna

myExperi-ment

Italian Museums

medu-cator

MARC Codes List

Man-chester Reading

Lists

Lotico

Weather Stations

London Gazette

LOIUS

Linked Open Colors

lobidResources

lobidOrgani-sations

LEM

LinkedMDB

LinkedLCCN

LinkedGeoData

LinkedCT

LinkedUser

FeedbackLOV

Linked Open

Numbers

LODE

Eurostat (OntologyCentral)

Linked EDGAR

(OntologyCentral)

Linked Crunch-

base

lingvoj

Lichfield Spen-ding

LIBRIS

Lexvo

LCSH

DBLP (L3S)

Linked Sensor Data (Kno.e.sis)

Klapp-stuhl-club

Good-win

Family

National Radio-activity

JP

Jamendo (DBtune)

Italian public

schools

ISTAT Immi-gration

iServe

IdRef Sudoc

NSZL Catalog

Hellenic PD

Hellenic FBD

PiedmontAccomo-dations

GovTrack

GovWILD

GoogleArt

wrapper

gnoss

GESIS

GeoWordNet

GeoSpecies

GeoNames

GeoLinkedData

GEMET

GTAA

STITCH

SIDER

Project Guten-berg

MediCare

Euro-stat

(FUB)

EURES

DrugBank

Disea-some

DBLP (FU

Berlin)

DailyMed

CORDIS(FUB)

Freebase

flickr wrappr

Fishes of Texas

Finnish Munici-palities

ChEMBL

FanHubz

EventMedia

EUTC Produc-

tions

Eurostat

Europeana

EUNIS

EU Insti-

tutions

ESD stan-dards

EARTh

Enipedia

Popula-tion (En-AKTing)

NHS(En-

AKTing) Mortality(En-

AKTing)

Energy (En-

AKTing)

Crime(En-

AKTing)

CO2 Emission

(En-AKTing)

EEA

SISVU

education.data.g

ov.uk

ECS South-ampton

ECCO-TCP

GND

Didactalia

DDC Deutsche Bio-

graphie

datadcs

MusicBrainz

(DBTune)

Magna-tune

John Peel

(DBTune)

Classical (DB

Tune)

AudioScrobbler (DBTune)

Last.FM artists

(DBTune)

DBTropes

Portu-guese

DBpedia

dbpedia lite

Greek DBpedia

DBpedia

data-open-ac-uk

SMCJournals

Pokedex

Airports

NASA (Data Incu-bator)

MusicBrainz(Data

Incubator)

Moseley Folk

Metoffice Weather Forecasts

Discogs (Data

Incubator)

Climbing

data.gov.uk intervals

Data Gov.ie

databnf.fr

Cornetto

reegle

Chronic-ling

America

Chem2Bio2RDF

Calames

businessdata.gov.

uk

Bricklink

Brazilian Poli-

ticians

BNB

UniSTS

UniPathway

UniParc

Taxonomy

UniProt(Bio2RDF)

SGD

Reactome

PubMedPub

Chem

PRO-SITE

ProDom

Pfam

PDB

OMIMMGI

KEGG Reaction

KEGG Pathway

KEGG Glycan

KEGG Enzyme

KEGG Drug

KEGG Com-pound

InterPro

HomoloGene

HGNC

Gene Ontology

GeneID

Affy-metrix

bible ontology

BibBase

FTS

BBC Wildlife Finder

BBC Program

mes BBC Music

Alpine Ski

Austria

LOCAH

Amster-dam

Museum

AGROVOC

AEMET

US Census (rdfabout)

Media

Geographic

Publications

Government

Cross-domain

Life sciences

User-generated content

Linked data

7 / 75

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch.http://lod-cloud.net/

Page 8: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Graph Types

RDF graph

mdb:film/2014

“1980-05-23”

movie:initial release date

“The Shining”refs:label

bm:books/0743424425

4.7

rev:rating

bm:offers/0743424425amazonOffer

geo:2635167

“United Kingdom”

gn:name

62348447

gn:population

mdb:actor/29704

“Jack Nicholson”

movie:actor name

mdb:film/3418

“The Passenger”

refs:label

mdb:film/1267

“The Last Tycoon”

refs:label

mdb:director/8476

“Stanley Kubrick”

movie:director name

mdb:film/2685

“A Clockwork Orange”

refs:label

mdb:film/424

“Spartacus”

refs:label

mdb:actor/30013

movie:relatedBook

scam:hasOffer

foaf:based nearmovie:actor

movie:directormovie:actor

movie:actor movie:actor

movie:director movie:director

8 / 75

Page 9: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Graph Types

Property graph

film 2014(initial release date, “1980-05-23”)

(label, “The Shining”)

books 0743424425(rating, 4.7)

offers 0743424425amazonOffer

geo 2635167(name, “United Kingdom”)

(population, 62348447) actor 29704(actor name, “Jack Nicholson”)

film 3418(label, “The Passenger”)

film 1267(label, “The Last Tycoon”)

director 8476(director name, “Stanley Kubrick”)

film 2685(label, “A Clockwork Orange”)

film 424(label, “Spartacus”)

actor 30013

(relatedBook)

(hasOffer)

(based near)(actor)

(director)(actor)

(actor) (actor)

(director) (director)

9 / 75

Page 10: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Graph Types

RDF graph

mdb:film/2014

“1980-05-23”

movie:initial release date

“The Shining”refs:label

bm:books/0743424425

4.7

rev:rating

bm:offers/0743424425amazonOffer

geo:2635167

“United Kingdom”

gn:name

62348447

gn:population

mdb:actor/29704

“Jack Nicholson”

movie:actor name

mdb:film/3418

“The Passenger”

refs:label

mdb:film/1267

“The Last Tycoon”

refs:label

mdb:director/8476

“Stanley Kubrick”

movie:director name

mdb:film/2685

“A Clockwork Orange”

refs:label

mdb:film/424

“Spartacus”

refs:label

mdb:actor/30013

movie:relatedBook

scam:hasOffer

foaf:based nearmovie:actor

movie:directormovie:actor

movie:actor movie:actor

movie:director movie:director

I Workload: SPARQLqueries

I Query execution: subgraphmatching byhomomorphism

Property graph

film 2014(initial release date, “1980-05-23”)

(label, “The Shining”)

books 0743424425(rating, 4.7)

offers 0743424425amazonOffer

geo 2635167(name, “United Kingdom”)

(population, 62348447) actor 29704(actor name, “Jack Nicholson”)

film 3418(label, “The Passenger”)

film 1267(label, “The Last Tycoon”)

director 8476(director name, “Stanley Kubrick”)

film 2685(label, “A Clockwork Orange”)

film 424(label, “Spartacus”)

actor 30013

(relatedBook)

(hasOffer)

(based near)(actor)

(director)(actor)

(actor) (actor)

(director) (director)

I Workload: Online queriesand analytic workloads

I Query execution: Muchmore varied

10 / 75

Page 11: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Outline

Introduction

RDF Graph Querying

General Graph ProcessingOffline analyticsOnline querying

11 / 75

Page 12: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

RDF Graph

mdb:film/2014

“1980-05-23”

movie:initial release date

“The Shining”refs:label

bm:books/0743424425

4.7

rev:rating

bm:offers/0743424425amazonOffer

geo:2635167

“United Kingdom”

gn:name

62348447

gn:population

mdb:actor/29704

“Jack Nicholson”

movie:actor name

mdb:film/3418

“The Passenger”

refs:label

mdb:film/1267

“The Last Tycoon”

refs:label

mdb:director/8476

“Stanley Kubrick”

movie:director name

mdb:film/2685

“A Clockwork Orange”

refs:label

mdb:film/424

“Spartacus”

refs:label

mdb:actor/30013

movie:relatedBook

scam:hasOffer

foaf:based nearmovie:actor

movie:directormovie:actor

movie:actor movie:actor

movie:director movie:director

12 / 75

Page 13: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

SPARQL Queries

SELECT ?nameWHERE {

?m r d f s : l a b e l ?name . ?m movie : d i r e c t o r ?d .?d movie : d i r e c t o r n a m e ” S t a n l e y K u b r i c k ” .?m movie : r e l a t e d B o o k ?b . ?b r e v : r a t i n g ? r .FILTER(? r > 4 . 0 )

}

?m ?dmovie:director

?name

rdfs:label

?b

movie:relatedBook

“Stanley Kubrick”

movie:director name

?rrev:rating

FILTER(?r > 4.0)

13 / 75

Page 14: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Graph-based Approach

I Answering SPARQL query ≡ subgraph matchingI gStore, chameleon-db

?m ?dmovie:director

?name

rdfs:label

?b

movie:relatedBook

“Stanley Kubrick”

movie:director name

?rrev:rating

FILTER(?r > 4.0)

mdb:film/2014

“1980-05-23”

movie:initial release date

“The Shining”refs:label

bm:books/0743424425

4.7

rev:rating

bm:offers/0743424425amazonOffer

geo:2635167

“United Kingdom”

gn:name

62348447

gn:population

mdb:actor/29704

“Jack Nicholson”

movie:actor name

mdb:film/3418

“The Passenger”

refs:label

mdb:film/1267

“The Last Tycoon”

refs:label

mdb:director/8476

“Stanley Kubrick”

movie:director name

mdb:film/2685

“A Clockwork Orange”

refs:label

mdb:film/424

“Spartacus”

refs:label

mdb:actor/30013

movie:relatedBook

scam:hasOffer

foaf:based nearmovie:actor

movie:directormovie:actor

movie:actor movie:actor

movie:director movie:director

SubgraphM

atching

14 / 75

Page 15: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

gStore

General Approach:

I Work directly on the RDF graph and the SPARQL query graph

I Use a signature-based encoding of each entity and class vertexto speed up matching

I Filter-and-evaluateI Use a false positive algorithm to prune nodes and obtain a set

of candidates; then do more detailed evaluation on those

I Use an index (VS∗-tree) over the data signature graph (haslight maintenance load) for efficient pruning

15 / 75

Page 16: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

1. Encode Q and G to Get Signature GraphsQuery signature graph Q∗

0100 0000 1000 000000010

0000 010010000

Data signature graph G∗

0010 1000

0100 0001

00001

1000 000100010

0000 0100

10000

0000 1000

10000

0000 0010

10000

0000 1001

00100

0001 000101000

0100 1000

01000

1001 1000

01000

0001 0100

01000

16 / 75

Page 17: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

2. Filter-and-EvaluateQuery signature graph Q∗

0100 0000 1000 000000010

0000 010010000

Data signature graph G∗

0010 1000

0100 0001

00001

1000 000100010

0000 0100

10000

0000 1000

10000

0000 0010

10000

0000 1001

00100

0001 000101000

0100 1000

01000

1001 1000

01000

0001 0100

01000

Find matches of Q∗ oversignature graph G ∗

Verify each match inRDF graph G

17 / 75

Page 18: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

How to Generate Candidate List

I Two step process:

1. For each node of Q∗ get lists of nodes in G∗ that include thatnode.

2. Do a multi-way join to get the candidate list

I Alternatives:

I Sequential scan of G∗

I Both steps are inefficient

I Use S-treesI Height-balanced tree over signaturesI Run an inclusion query for each node of Q∗ and get lists of

nodes in G∗ that include that node.

• Given query signature q and a set of data signatures S ,find all data signatures si ∈ S where q&si = q

I Does not support second step – expensive

I VS-tree (and VS∗-tree)I Multi-resolution summary graph based on S-treeI Supports both steps efficientlyI Grouping by vertices

18 / 75

Page 19: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

How to Generate Candidate List

I Two step process:

1. For each node of Q∗ get lists of nodes in G∗ that include thatnode.

2. Do a multi-way join to get the candidate list

I Alternatives:

I Sequential scan of G∗

I Both steps are inefficient

I Use S-treesI Height-balanced tree over signaturesI Run an inclusion query for each node of Q∗ and get lists of

nodes in G∗ that include that node.

• Given query signature q and a set of data signatures S ,find all data signatures si ∈ S where q&si = q

I Does not support second step – expensive

I VS-tree (and VS∗-tree)I Multi-resolution summary graph based on S-treeI Supports both steps efficientlyI Grouping by vertices

19 / 75

Page 20: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

How to Generate Candidate List

I Two step process:

1. For each node of Q∗ get lists of nodes in G∗ that include thatnode.

2. Do a multi-way join to get the candidate list

I Alternatives:I Sequential scan of G∗

I Both steps are inefficient

I Use S-treesI Height-balanced tree over signaturesI Run an inclusion query for each node of Q∗ and get lists of

nodes in G∗ that include that node.

• Given query signature q and a set of data signatures S ,find all data signatures si ∈ S where q&si = q

I Does not support second step – expensive

I VS-tree (and VS∗-tree)I Multi-resolution summary graph based on S-treeI Supports both steps efficientlyI Grouping by vertices

20 / 75

Page 21: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

How to Generate Candidate List

I Two step process:

1. For each node of Q∗ get lists of nodes in G∗ that include thatnode.

2. Do a multi-way join to get the candidate list

I Alternatives:I Sequential scan of G∗

I Both steps are inefficient

I Use S-treesI Height-balanced tree over signaturesI Run an inclusion query for each node of Q∗ and get lists of

nodes in G∗ that include that node.

• Given query signature q and a set of data signatures S ,find all data signatures si ∈ S where q&si = q

I Does not support second step – expensive

I VS-tree (and VS∗-tree)I Multi-resolution summary graph based on S-treeI Supports both steps efficientlyI Grouping by vertices

21 / 75

Page 22: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

How to Generate Candidate List

I Two step process:

1. For each node of Q∗ get lists of nodes in G∗ that include thatnode.

2. Do a multi-way join to get the candidate list

I Alternatives:I Sequential scan of G∗

I Both steps are inefficient

I Use S-treesI Height-balanced tree over signaturesI Run an inclusion query for each node of Q∗ and get lists of

nodes in G∗ that include that node.

• Given query signature q and a set of data signatures S ,find all data signatures si ∈ S where q&si = q

I Does not support second step – expensive

I VS-tree (and VS∗-tree)I Multi-resolution summary graph based on S-treeI Supports both steps efficientlyI Grouping by vertices

22 / 75

Page 23: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

S-tree Solution

1111 1111

0110 1111 1101 1101

0000 1110 0110 1001 1100 1001 1001 1101

0000 1000

0000 0100 0000 0010

0010 1000

0100 0001

1000 0001

0000 1001

0100 1000

1001 1000

0001 0100

0001 0001

005

004 006

001

002

003

007

011

008

009

010

d11

d21 d2

2

d31 d3

2 d33 d3

4

G 3

G 2

G 1

1000 00000100 000000010

0000 010010000

Possibly large join space!

23 / 75

Page 24: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

S-tree Solution

1111 1111

0110 1111 1101 1101

0000 1110 0110 1001 1100 1001 1001 1101

0000 1000

0000 0100 0000 0010

0010 1000

0100 0001

1000 0001

0000 1001

0100 1000

1001 1000

0001 0100

0001 0001

005

004 006

001

002

003

007

011

008

009

010

d11

d21 d2

2

d31 d3

2 d33 d3

4

G 3

G 2

G 1

1000 00000100 000000010

0000 010010000

Possibly large join space!

24 / 75

Page 25: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

S-tree Solution

1111 1111

0110 1111 1101 1101

0000 1110 0110 1001 1100 1001 1001 1101

0000 1000

0000 0100 0000 0010

0010 1000

0100 0001

1000 0001

0000 1001

0100 1000

1001 1000

0001 0100

0001 0001

005

004 006

001

002

003

007

011

008

009

010

d11

d21 d2

2

d31 d3

2 d33 d3

4

G 3

G 2

G 1

1000 00000100 000000010

0000 010010000 002

011

Possibly large join space!

25 / 75

Page 26: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

S-tree Solution

1111 1111

0110 1111 1101 1101

0000 1110 0110 1001 1100 1001 1001 1101

0000 1000

0000 0100 0000 0010

0010 1000

0100 0001

1000 0001

0000 1001

0100 1000

1001 1000

0001 0100

0001 0001

005

004 006

001

002

003

007

011

008

009

010

d11

d21 d2

2

d31 d3

2 d33 d3

4

G 3

G 2

G 1

1000 00000100 000000010

0000 010010000 002

011

003

008

Possibly large join space!

26 / 75

Page 27: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

S-tree Solution

1111 1111

0110 1111 1101 1101

0000 1110 0110 1001 1100 1001 1001 1101

0000 1000

0000 0100 0000 0010

0010 1000

0100 0001

1000 0001

0000 1001

0100 1000

1001 1000

0001 0100

0001 0001

005

004 006

001

002

003

007

011

008

009

010

d11

d21 d2

2

d31 d3

2 d33 d3

4

G 3

G 2

G 1

1000 00000100 000000010

0000 010010000 002

011

003

008

004

009

Possibly large join space!

27 / 75

Page 28: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

S-tree Solution

1111 1111

0110 1111 1101 1101

0000 1110 0110 1001 1100 1001 1001 1101

0000 1000

0000 0100 0000 0010

0010 1000

0100 0001

1000 0001

0000 1001

0100 1000

1001 1000

0001 0100

0001 0001

005

004 006

001

002

003

007

011

008

009

010

d11

d21 d2

2

d31 d3

2 d33 d3

4

G 3

G 2

G 1

1000 00000100 000000010

0000 010010000 002

011

003

008

004

009on on

Possibly large join space!

28 / 75

Page 29: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

S-tree Solution

1111 1111

0110 1111 1101 1101

0000 1110 0110 1001 1100 1001 1001 1101

0000 1000

0000 0100 0000 0010

0010 1000

0100 0001

1000 0001

0000 1001

0100 1000

1001 1000

0001 0100

0001 0001

005

004 006

001

002

003

007

011

008

009

010

d11

d21 d2

2

d31 d3

2 d33 d3

4

G 3

G 2

G 1

1000 00000100 000000010

0000 010010000 002

011

003

008

004

009on on

Possibly large join space!

29 / 75

Page 30: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Outline

Introduction

RDF Graph Querying

General Graph ProcessingOffline analyticsOnline querying

30 / 75

Page 31: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Classification

Graph Dynamism

StaticGraphs

DynamicGraphs

StreamingGraphs

EvolvingGraphs

Algorithm Types

Offline Online

Streaming Incremental

Dynamic

BatchDynamic

Workload Types

OnlineQueries

AnalyticsWorkloads

31 / 75

Page 32: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Classification

Graph Dynamism

StaticGraphs

DynamicGraphs

StreamingGraphs

EvolvingGraphs

Algorithm Types

Offline Online

Streaming Incremental

Dynamic

BatchDynamic

Workload Types

OnlineQueries

AnalyticsWorkloads

Focus here is on the

dynamism of the

graphs in whether or

not they change and

how they change.

32 / 75

Page 33: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Classification

Graph Dynamism

StaticGraphs

DynamicGraphs

StreamingGraphs

EvolvingGraphs

Algorithm Types

Offline Online

Streaming Incremental

Dynamic

BatchDynamic

Workload Types

OnlineQueries

AnalyticsWorkloads

Focus here is on the

dynamism of the

graphs in whether or

not they change and

how they change.

Focus here is on the

how algorithms behave

as their input changes.

33 / 75

Page 34: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Classification

Graph Dynamism

StaticGraphs

DynamicGraphs

StreamingGraphs

EvolvingGraphs

Algorithm Types

Offline Online

Streaming Incremental

Dynamic

BatchDynamic

Workload Types

OnlineQueries

AnalyticsWorkloads

Focus here is on the

dynamism of the

graphs in whether or

not they change and

how they change.

Focus here is on the

how algorithms behave

as their input changes.

The types of workloads

that the approaches are

designed to handle.

34 / 75

Page 35: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Classification

Graph Dynamism

StaticGraphs

DynamicGraphs

StreamingGraphs

EvolvingGraphs

Algorithm Types

Offline Online

Streaming Incremental

Dynamic

BatchDynamic

Workload Types

OnlineQueries

AnalyticsWorkloads

35 / 75

Page 36: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Classification

Graph Dynamism

StaticGraphs

DynamicGraphs

StreamingGraphs

EvolvingGraphs

Algorithm Types

Offline Online

Streaming Incremental

Dynamic

BatchDynamic

Workload Types

OnlineQueries

AnalyticsWorkloads

Graphs do not

change or we

are not inter-

ested in their

changes – only

a snapshot is

considered.

36 / 75

Page 37: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Classification

Graph Dynamism

StaticGraphs

DynamicGraphs

StreamingGraphs

EvolvingGraphs

Algorithm Types

Offline Online

Streaming Incremental

Dynamic

BatchDynamic

Workload Types

OnlineQueries

AnalyticsWorkloads

Graphs do not

change or we

are not inter-

ested in their

changes – only

a snapshot is

considered.

Graphs change

and we are

interested in

their changes.

37 / 75

Page 38: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Classification

Graph Dynamism

StaticGraphs

DynamicGraphs

StreamingGraphs

EvolvingGraphs

Algorithm Types

Offline Online

Streaming Incremental

Dynamic

BatchDynamic

Workload Types

OnlineQueries

AnalyticsWorkloads

Graphs do not

change or we

are not inter-

ested in their

changes – only

a snapshot is

considered.

Graphs change

and we are

interested in

their changes.

Dynamic

graphs with

high veloc-

ity changes –

not possible to

see the entire

graph at once.

38 / 75

Page 39: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Classification

Graph Dynamism

StaticGraphs

DynamicGraphs

StreamingGraphs

EvolvingGraphs

Algorithm Types

Offline Online

Streaming Incremental

Dynamic

BatchDynamic

Workload Types

OnlineQueries

AnalyticsWorkloads

Graphs do not

change or we

are not inter-

ested in their

changes – only

a snapshot is

considered.

Graphs change

and we are

interested in

their changes.

Dynamic

graphs with

high veloc-

ity changes –

not possible to

see the entire

graph at once.

Dynamic

graphs with un-

known changes

– requires re-

discovery of

the graph (e.g.,

LOD).

39 / 75

Page 40: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Classification

Graph Dynamism

StaticGraphs

DynamicGraphs

StreamingGraphs

EvolvingGraphs

Algorithm Types

Offline Online

Streaming Incremental

Dynamic

BatchDynamic

Workload Types

OnlineQueries

AnalyticsWorkloads

40 / 75

Page 41: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Classification

Graph Dynamism

StaticGraphs

DynamicGraphs

StreamingGraphs

EvolvingGraphs

Algorithm Types

Offline Online

Streaming Incremental

Dynamic

BatchDynamic

Workload Types

OnlineQueries

AnalyticsWorkloads

Computation accesses a

portion of the graph

and the results are

computed for a subset

of vertices; e.g., point-

to-point shortest path,

subgraph matching,

reachability, SPARQL.

41 / 75

Page 42: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Classification

Graph Dynamism

StaticGraphs

DynamicGraphs

StreamingGraphs

EvolvingGraphs

Algorithm Types

Offline Online

Streaming Incremental

Dynamic

BatchDynamic

Workload Types

OnlineQueries

AnalyticsWorkloads

Computation accesses a

portion of the graph

and the results are

computed for a subset

of vertices; e.g., point-

to-point shortest path,

subgraph matching,

reachability, SPARQL.

Computation accesses

the entire graph and

may require multiple

iterations; e.g., PageR-

ank, clustering, graph

colouring, all pairs

shortest path.

42 / 75

Page 43: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Classification

Graph Dynamism

StaticGraphs

DynamicGraphs

StreamingGraphs

EvolvingGraphs

Algorithm Types

Offline Online

Streaming Incremental

Dynamic

BatchDynamic

Workload Types

OnlineQueries

AnalyticsWorkloads

43 / 75

Page 44: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Classification

Graph Dynamism

StaticGraphs

DynamicGraphs

StreamingGraphs

EvolvingGraphs

Algorithm Types

Offline Online

Streaming Incremental

Dynamic

BatchDynamic

Workload Types

OnlineQueries

AnalyticsWorkloads

Sees the en-

tire input in

advance.

44 / 75

Page 45: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Classification

Graph Dynamism

StaticGraphs

DynamicGraphs

StreamingGraphs

EvolvingGraphs

Algorithm Types

Offline Online

Streaming Incremental

Dynamic

BatchDynamic

Workload Types

OnlineQueries

AnalyticsWorkloads

Sees the en-

tire input in

advance.

Sees the input

piece-meal as it

executes.

45 / 75

Page 46: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Classification

Graph Dynamism

StaticGraphs

DynamicGraphs

StreamingGraphs

EvolvingGraphs

Algorithm Types

Offline Online

Streaming Incremental

Dynamic

BatchDynamic

Workload Types

OnlineQueries

AnalyticsWorkloads

Sees the en-

tire input in

advance.

Sees the input

piece-meal as it

executes.

One-pass on-

line algorithm

with limited

memory.

46 / 75

Page 47: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Classification

Graph Dynamism

StaticGraphs

DynamicGraphs

StreamingGraphs

EvolvingGraphs

Algorithm Types

Offline Online

Streaming Incremental

Dynamic

BatchDynamic

Workload Types

OnlineQueries

AnalyticsWorkloads

Sees the en-

tire input in

advance.

Sees the input

piece-meal as it

executes.

One-pass on-

line algorithm

with limited

memory.

Online algo-

rithm with

some info

about forth-

coming input.

47 / 75

Page 48: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Classification

Graph Dynamism

StaticGraphs

DynamicGraphs

StreamingGraphs

EvolvingGraphs

Algorithm Types

Offline Online

Streaming Incremental

Dynamic

BatchDynamic

Workload Types

OnlineQueries

AnalyticsWorkloads

Sees the en-

tire input in

advance.

Sees the input

piece-meal as it

executes.

One-pass on-

line algorithm

with limited

memory.

Online algo-

rithm with

some info

about forth-

coming input.

Sees the en-

tire input

in advance,

which may

change; an-

swers computed

as change oc-

curs.

48 / 75

Page 49: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Classification

Graph Dynamism

StaticGraphs

DynamicGraphs

StreamingGraphs

EvolvingGraphs

Algorithm Types

Offline Online

Streaming Incremental

Dynamic

BatchDynamic

Workload Types

OnlineQueries

AnalyticsWorkloads

Sees the en-

tire input in

advance.

Sees the input

piece-meal as it

executes.

One-pass on-

line algorithm

with limited

memory.

Online algo-

rithm with

some info

about forth-

coming input.

Sees the en-

tire input

in advance,

which may

change; an-

swers computed

as change oc-

curs.

Similar to dy-

namic, but

computation

happens in

batches of

changes. 49 / 75

Page 50: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Example Design Points – Not all alternatives make sense

Graph Dynamism

StaticGraphs

DynamicGraphs

StreamingGraphs

EvolvingGraphs

Algorithm Types

Offline Online

Streaming Incremental

Dynamic

BatchDynamic

Workload Types

OnlineQueries

AnalyticsWorkloads

Dynamic (or batch-dynamic) algorithms do not make sense for staticgraphs.

50 / 75

Page 51: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Graph Workloads

Offline graph analytics

I PageRank

I Clustering

I Strongly connectedcomponents

I Diameter finding

I Graph colouring

I All pairs shortest path

I Graph pattern mining

I Machine learningalgorithms (Beliefpropagation, Gaussiannon-negative matrixfactorization)

Online graph querying

I Reachability

I Single source shortest-path

I Subgraph matching

I SPARQL queries

51 / 75

Page 52: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

PageRank Computation

A web page is important if it is pointed to by other importantpages.

P1 P2

P3

P5P6

P4

r(Pi ) =∑

Pj∈BPi

r(Pj)

|FPj|

r(P2) =r(P1)

2+

r(P3)

3

rk+1(Pi ) =∑

Pj∈BPi

rk(Pj)

|FPj|

BPi: in-neighbours of Pi

FPi: out-neighbours of Pi

52 / 75

Page 53: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

PageRank Computation

A web page is important if it is pointed to by other importantpages.

P1 P2

P3

P5P6

P4

rk+1(Pi ) =∑

Pj∈BPi

rk(Pj)

|FPj|

Iteration 0 Iteration 1 Iteration 2Rank atIter. 2

r0(P1) = 1/6 r1(P1) = 1/18 r2(P1) = 1/36 5r0(P2) = 1/6 r1(P2) = 5/36 r2(P2) = 1/18 4r0(P3) = 1/6 r1(P3) = 1/12 r2(P3) = 1/36 5r0(P4) = 1/6 r1(P4) = 1/4 r2(P4) = 17/72 1r0(P5) = 1/6 r1(P5) = 5/36 r2(P5) = 11/72 3r0(P6) = 1/6 r1(P6) = 1/6 r2(P6) = 14/72 2

Iterative processing.

53 / 75

Page 54: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Some Alternative Computational Models for OfflineAnalytics

I MapReduceI map and reduce functionsI Not suitable for iterative processing due to data movement at

each stageI Need to save in storage system intermediate results of each

iteration

I Vertex-centric paradigmI Specify (a) the computation to be performed at each vertex,

and (b) its communication with neighbour verticesI Designed specifically for interactive graph processingI Synchronous (e.g., Pregel, Giraph)

I Asynchronous (e.g., GraphLab)

54 / 75

Page 55: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Some Alternative Computational Models for OfflineAnalytics

I MapReduceI map and reduce functionsI Not suitable for iterative processing due to data movement at

each stageI Need to save in storage system intermediate results of each

iterationI Vertex-centric paradigm

I Specify (a) the computation to be performed at each vertex,and (b) its communication with neighbour vertices

I Designed specifically for interactive graph processingI Synchronous (e.g., Pregel, Giraph)

I Asynchronous (e.g., GraphLab)

55 / 75

Page 56: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Pregel-like Graph Processing Systems

Pregel-like systems are BSP, vertex-centric programs.

Machine 1

Machine 2

Machine 3

Machine 1

Machine 2

Machine 3

Machine 1

Machine 2

Machine 3

CommunicationBarrier

CommunicationBarrier

Superstep 1 Superstep 2 Superstep 3Computation

56 / 75

Page 57: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Pregel-like Graph Processing Systems

Pregel-like systems are BSP, vertex-centric programs.

Machine 1

Machine 2

Machine 3

Machine 1

Machine 2

Machine 3

Machine 1

Machine 2

Machine 3

CommunicationBarrier

CommunicationBarrier

Superstep 1 Superstep 2 Superstep 3

Computation

57 / 75

Page 58: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Pregel-like Graph Processing Systems

Pregel-like systems are BSP, vertex-centric programs.

Machine 1

Machine 2

Machine 3

Machine 1

Machine 2

Machine 3

Machine 1

Machine 2

Machine 3

CommunicationBarrier

CommunicationBarrier

Superstep 1 Superstep 2 Superstep 3

Computation

58 / 75

Page 59: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Pregel-like Graph Processing Systems

Pregel-like systems are BSP, vertex-centric programs.

Machine 1

Machine 2

Machine 3

Machine 1

Machine 2

Machine 3

Machine 1

Machine 2

Machine 3

CommunicationBarrier

CommunicationBarrier

Superstep 1 Superstep 2 Superstep 3

Computation

59 / 75

Page 60: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Pregel-like Graph Processing Systems

Pregel-like systems are BSP, vertex-centric programs.

Machine 1

Machine 2

Machine 3

Machine 1

Machine 2

Machine 3

Machine 1

Machine 2

Machine 3

CommunicationBarrier

CommunicationBarrier

Superstep 1 Superstep 2 Superstep 3

Computation

60 / 75

Page 61: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Pregel-like Graph Processing Systems

Pregel-like systems are BSP, vertex-centric programs.

Machine 1

Machine 2

Machine 3

Machine 1

Machine 2

Machine 3

Machine 1

Machine 2

Machine 3

CommunicationBarrier

CommunicationBarrier

Superstep 1 Superstep 2 Superstep 3

Computation

61 / 75

Page 62: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Pregel-like Graph Processing Systems

Pregel-like systems are BSP, vertex-centric programs.

I “Think like a vertex”:

?

62 / 75

Page 63: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

GraphLab (Asynchronous)

GraphLab features asynchronous execution:

I No communication barriers. 3

I Uses the most recent vertex values. 3

Machine 1

Machine 2

Machine 3

Machine 1

Machine 2

Machine 3

63 / 75

Page 64: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

GraphLab (Asynchronous)

Implemented via distributed locking:

v0

v1 v2

v3 v4

64 / 75

Page 65: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

GraphLab (Asynchronous)

Implemented via distributed locking:

v0

v1 v2

v3 v4

65 / 75

Page 66: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

GraphLab (Asynchronous)

Implemented via distributed locking:

v0

v1 v2

v3 v4

66 / 75

Page 67: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

GraphLab (Asynchronous)

Implemented via distributed locking:

v0

v1 v2

v3 v4

67 / 75

Page 68: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

GraphLab (Asynchronous)

Implemented via distributed locking:

v0

v1 v2

v3 v4

68 / 75

Page 69: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Reachability Queries

film 2014(initial release date, “1980-05-23”)

(label, “The Shining”)

books 0743424425(rating, 4.7)

offers 0743424425amazonOffer

geo 2635167(name, “United Kingdom”)

(population, 62348447) actor 29704(actor name, “Jack Nicholson”)

film 3418(label, “The Passenger”)

film 1267(label, “The Last Tycoon”)

director 8476(director name, “Stanley Kubrick”)

film 2685(label, “A Clockwork Orange”)

film 424(label, “Spartacus”)

actor 30013

(relatedBook)

(hasOffer)

(based near)(actor)

(director)(actor)

(actor) (actor)

(director) (director)

69 / 75

Page 70: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Reachability Queries

film 2014(initial release date, “1980-05-23”)

(label, “The Shining”)

books 0743424425(rating, 4.7)

offers 0743424425amazonOffer

geo 2635167(name, “United Kingdom”)

(population, 62348447) actor 29704(actor name, “Jack Nicholson”)

film 3418(label, “The Passenger”)

film 1267(label, “The Last Tycoon”)

director 8476(director name, “Stanley Kubrick”)

film 2685(label, “A Clockwork Orange”)

film 424(label, “Spartacus”)

actor 30013

(relatedBook)

(hasOffer)

(based near)(actor)

(director)(actor)

(actor) (actor)

(director) (director)

Can you reach film 1267 from film 2014?

70 / 75

Page 71: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Reachability Queries

film 2014(initial release date, “1980-05-23”)

(label, “The Shining”)

books 0743424425(rating, 4.7)

offers 0743424425amazonOffer

geo 2635167(name, “United Kingdom”)

(population, 62348447) actor 29704(actor name, “Jack Nicholson”)

film 3418(label, “The Passenger”)

film 1267(label, “The Last Tycoon”)

director 8476(director name, “Stanley Kubrick”)

film 2685(label, “A Clockwork Orange”)

film 424(label, “Spartacus”)

actor 30013

(relatedBook)

(hasOffer)

(based near)(actor)

(director)(actor)

(actor) (actor)

(director) (director)

Is there a book whose rating is > 4.0 associated with a film thatwas directed by Stanley Kubrick?

71 / 75

Page 72: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Reachability Queries

Think of Facebook graph and finding friends of friends.

72 / 75

Page 73: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

Subgraph Matching

?m ?dmovie:director

?name

rdfs:label

?b

movie:relatedBook

“Stanley Kubrick”

movie:director name

?rrev:rating

FILTER(?r > 4.0)

mdb:film/2014

“1980-05-23”

movie:initial release date

“The Shining”refs:label

bm:books/0743424425

4.7

rev:rating

bm:offers/0743424425amazonOffer

geo:2635167

“United Kingdom”

gn:name

62348447

gn:population

mdb:actor/29704

“Jack Nicholson”

movie:actor name

mdb:film/3418

“The Passenger”

refs:label

mdb:film/1267

“The Last Tycoon”

refs:label

mdb:director/8476

“Stanley Kubrick”

movie:director name

mdb:film/2685

“A Clockwork Orange”

refs:label

mdb:film/424

“Spartacus”

refs:label

mdb:actor/30013

movie:relatedBook

scam:hasOffer

foaf:based nearmovie:actor

movie:directormovie:actor

movie:actor movie:actor

movie:director movie:director

SubgraphM

atching

73 / 75

Page 74: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

For more information I

Graph-based SPARQL processing

I Lei Zou, Jinghui Mo, Lei Chen, M. Tamer Ozsu, and Dongyan Zhao.gStore: answering SPARQL queries via subgraph matching. Proc. VLDBEndowment, 4(8):482–493, 2011

I Lei Zou, M. Tamer Ozsu, Lei Chen, Xuchuan Shen, Ruizhe Huang, andDongyan Zhao. gStore: A graph-based SPARQL query engine. VLDB J.,23(4):565–590, 2014

I Gunes Aluc, M. Tamer Ozsu, Khuzaima Daudjee, and Olaf Hartig.chameleon-db: a workload-aware robust RDF data management system.Technical Report CS-2013-10, University of Waterloo, 2013. Available athttps://cs.uwaterloo.ca/sites/ca.computer-science/files/

uploads/files/CS-2013-10.pdf

74 / 75

Page 75: Graph Data Processing - David R. Cheriton School of ...kmsalem/courses/cs743/F14/slides/Graph... · \A Clockwork Orange" refs:label mdb: lm/424 \Spartacus" refs:label mdb:actor/30013

For more information IIMore on RDF graph management

I Olaf Hartig and M. Tamer Ozsu. Linked data query processing. In Proc.30th Int. Conf. on Data Engineering, pages 1286–1289, 2014. Tutorialdescription

I More organized slides from my talks are available at http://www.slideshare.net/MTamerOzsu/web-data-management-with-rdf

General graph processingI Arijit Khan and Sameh Elnikety. Systems for big-graphs. Proc. VLDB

Endowment, 7(13):1709–1710, 2014

I Slides available at http://people.inf.ethz.ch/khana/Papers/2014_VLDB_GraphSystemsTutorial.pptx

75 / 75