Query Languages for Graph Databases

Peter T. Wood

School of Computer Science and Information Systems
Birkbeck, University of London

[email protected]

Third Alberto Mendelzon International Workshop on Foundations of Data Management

Outline

Background
- Graphs
- Personal interest
- Uses of graphs

Graph models and query languages
- G, G+ and Graphlog
- Lore/Lorel
- YAGO/NAGA
- Other models and languages

Query Functionality
- Overview
- Graph pattern matching
- Path finding
- Edge and path variables
- Aggregation
- Approximate matching and ranking

Motivation

Graphs are widely used for representing data:
- transportation and other networks
- geographical information
- semistructured data
- (hyper)document structure
- semantic associations in criminal investigations
- bibliographic citation analysis
- pathways in biological processes
- knowledge representation (e.g. semantic web)
- program analysis
- workflow systems
- data provenance
- . . .

Example

A graph of cities and flight durations:

[Figure: a graph whose nodes are the airports LHR, CDG, JFK, MAD, LIM and SCL, and whose edges are labelled with flight durations.]

Nodes: LHR, CDG, JFK, MAD, LIM, SCL

Edges: flights between cities, each labelled with its duration

Types of Edges

Undirected

Directed

Types of labels

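Node labels: [figure: a graph whose nodes are labelled A, B, C and D]

Edge labels: [figure: a graph whose edges are labelled with numbers]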

Cyclic graphs

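Undirected: [figure: a cyclic undirected graph on nodes A, B, C and D]

Directed: [figure: a cyclic directed graph on nodes A, B, C and D]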

Acyclic graphs

[Figure: three acyclic graphs on nodes A, B, C and D, captioned Tree, DAG and Tree.]

Formal graph definition

For our purposes:
- database comprises a single labelled (multi-)graph G
- (finite) set of nodes N with identifiers drawn from an infinite vocabulary V
- (finite) set of (directed) edges E
- incidence function φ : E → N × N (allows multi-edges)
- edge labelling function λ : E → Σ
  - Σ is a finite alphabet

So G = (N, E, V, Σ, φ, λ)
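The definition above can be sketched as a small data structure. This is only an illustrative Python sketch, not something taken from the slides: the class name `LabelledMultigraph`, its field names and the sample edges are assumptions. The infinite vocabulary V is left implicit (any string may serve as a node identifier), and Σ is simply accumulated from the labels used.

```python
from dataclasses import dataclass, field


@dataclass
class LabelledMultigraph:
    """Sketch of G = (N, E, V, Sigma, phi, lam); V is implicit (any string)."""
    nodes: set = field(default_factory=set)   # N
    sigma: set = field(default_factory=set)   # Sigma, collected from labels seen
    phi: dict = field(default_factory=dict)   # incidence: edge id -> (source, target)
    lam: dict = field(default_factory=dict)   # labelling: edge id -> label

    def add_edge(self, edge_id, source, target, label):
        # Edges have their own identity, so parallel edges are allowed.
        if edge_id in self.phi:
            raise ValueError(f"duplicate edge id: {edge_id}")
        self.nodes.update((source, target))
        self.sigma.add(label)
        self.phi[edge_id] = (source, target)
        self.lam[edge_id] = label

    @property
    def edges(self):
        """E, the set of edge identifiers."""
        return set(self.phi)

    def edges_between(self, source, target):
        """All edge ids e with phi(e) = (source, target)."""
        return sorted(e for e, st in self.phi.items() if st == (source, target))


# Hypothetical sample data: two parallel edges and one further edge.
g = LabelledMultigraph()
g.add_edge("e1", "LHR", "JFK", "flight")
g.add_edge("e2", "LHR", "JFK", "flight")
g.add_edge("e3", "JFK", "LIM", "flight")
```

Because φ maps edge identifiers to node pairs, rather than E being a set of pairs, `e1` and `e2` can coexist between the same two nodes; this is what "allows multi-edges" amounts to in this sketch.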

Many years ago . . .

- PhD on “Queries on Graphs” (1988)
- supervised by Alberto Mendelzon

More recently

- querying RDF (allowing for query relaxation and ranking)
- ranking approximate answers to semantic web queries
- investigating operators for finding/manipulating paths
- . . . with Pablo Barcelo and Carlos Hurtado (Chile) and Alex Poulovassilis (Birkbeck)

Transportation and other networks

- airline, train, bus . . . networks
- communication networks
- planning networks: single source and sink, acyclic

Typical queries:
- reachability: can I get from a to b?
- shortest path: find the quickest/shortest route from a to b
- reliability/capacity of paths
- critical path
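The first two query types can be illustrated with standard graph algorithms. The following is a minimal Python sketch, assuming a directed network stored as an adjacency dictionary. The `FLIGHTS` data (routes and durations in hours) is hypothetical and is not the graph from the earlier example; `reachable` uses breadth-first search and `shortest_path` uses Dijkstra's algorithm, which are common choices rather than anything prescribed by the slides.

```python
import heapq
from collections import deque

# Hypothetical directed network: node -> {neighbour: duration in hours}.
FLIGHTS = {
    "LHR": {"CDG": 1, "JFK": 8},
    "CDG": {"MAD": 2},
    "JFK": {"LIM": 8},
    "MAD": {"LIM": 12},
    "LIM": {"SCL": 4},
    "SCL": {},
}


def reachable(graph, a, b):
    """Reachability: is there a directed path from a to b? (breadth-first search)"""
    seen, queue = {a}, deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return True
        for nxt in graph.get(node, {}):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def shortest_path(graph, a, b):
    """Shortest path by total duration (Dijkstra); returns (total, path) or None."""
    heap = [(0, a, [a])]
    done = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == b:
            return cost, path
        if node in done:
            continue
        done.add(node)
        for nxt, weight in graph.get(node, {}).items():
            if nxt not in done:
                heapq.heappush(heap, (cost + weight, nxt, path + [nxt]))
    return None
```

With this sample data, `reachable(FLIGHTS, "LHR", "SCL")` holds while `reachable(FLIGHTS, "SCL", "LHR")` does not, and the quickest route from LHR to SCL goes via CDG, MAD and LIM with a total of 19 hours.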

Knowledge representation

- semantic networks
- conceptual graphs
- RDF/S, OWL
- ontologies
- taxonomies
- . . .

Typical queries:
- instance and subclass relationships
- finding connections between entities
- . . .

Program/workflow analysis

- nodes are program points or agents/products
- edges are program or workflow steps
- often single source and sink nodes
- also data provenance applications

Typical queries:
- reachability of code
- variables used before defined
- deadlock/livelock
- what agents/processes/products were involved in producing something

Biological applications

- metabolic pathways
- gene regulatory networks
- protein interaction networks
- . . .

Typical queries include:
- path existence
- subgraph isomorphism
- k-shortest paths
- neighbourhood queries
- approximate matching
- . . .

(see https://hpcrd.lbl.gov/staff/olken/graphdm/graphdm.htm)

G, G+ and Graphlog

- from the 1980s [Cruz, Mendelzon and Wood, 1987] [Cruz, Mendelzon and Wood, 1988] [Consens and Mendelzon, 1989]
- developed at University of Toronto
- data model is a labelled, directed graph
- in G and G+, query is a set of pairs of pattern graphs and summary graphs
- pattern graph nodes are labelled with variables or constants
- pattern graph edges are labelled with regular expressions over edge labels and variables
- Graphlog adds edge inversion, negation, distinguished edge and different semantics

G, G+ Example

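- given a graph
  - nodes representing people
  - edges labelled with m (for motherOf) and f (for fatherOf)
- following query finds parents, followed by pairs of people who have a common ancestor

[Figure: two pairs of pattern and summary graphs. In one pair, the pattern graph has an edge labelled m|f between nodes x and y, and the summary graph has an edge labelled p between x and y. In the other pair, the pattern graph has nodes x, z and y with two edges labelled p*, and the summary graph has an edge labelled a between x and y.]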
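The effect of this query can be sketched procedurally. The Python below is an illustration only and is not G+ syntax. It assumes that edges run from parent to child, that z in the pattern is the common ancestor of x and y, and that p* includes paths of length zero; the names in `EDGES` are hypothetical.

```python
from itertools import product

# Hypothetical data: (parent, label, child), where label is "m" or "f".
EDGES = [
    ("ann", "m", "bob"),
    ("al", "f", "bob"),
    ("ann", "m", "cat"),
    ("bob", "f", "dan"),
    ("cat", "m", "eve"),
]


def parent_edges(edges):
    """First pair: every edge matching m|f yields a summary edge p."""
    return {(x, y) for x, label, y in edges if label in ("m", "f")}


def reflexive_transitive_closure(pairs, nodes):
    """p*: zero or more p edges, computed by naive fixpoint iteration."""
    closure = {(n, n) for n in nodes} | set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def common_ancestor_pairs(edges):
    """Second pair: (x, y) gets a summary edge a if some z reaches both via p*."""
    p = parent_edges(edges)
    nodes = {n for pair in p for n in pair}
    p_star = reflexive_transitive_closure(p, nodes)
    return {(x, y) for (z1, x) in p_star for (z2, y) in p_star if z1 == z2}


pairs = common_ancestor_pairs(EDGES)
```

Since p* admits empty paths, every person counts as their own ancestor here, so the result also contains pairs such as `("ann", "dan")` and the reflexive pairs; a stricter reading would use p+ instead.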

Lore/Lorel

- from the 1990s [Abiteboul et al., 1997]
- developed at Stanford
- Lore: Lightweight Object Repository
- Lorel: Lore query language
- for semistructured data
  - no predefined schema
  - may be heterogeneous
- uses Object Exchange Model (OEM)
- Lore/Lorel can be viewed as extension of ODMG model/OQL

Lore model

- data model is graph with two types of nodes
  - complex objects
  - atomic objects (values) with no outgoing edges
- each node has a unique oid
- each edge is labelled with a string
- graph has a number of named nodes (entry points)
- every node must be reachable from a named node
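These constraints can be made concrete with a small sketch. The Python below is not Lore's actual storage format; the dictionary encoding, the oids and the function name `unreachable_oids` are assumptions for illustration. Complex objects map edge labels to lists of target oids, atomic objects are plain values (so they cannot have outgoing edges), and the function reports any object that violates the reachability rule.

```python
from collections import deque

# Hypothetical OEM-style database: oid -> object.
# A dict is a complex object (edge label -> list of target oids);
# anything else is an atomic object, i.e. a value.
OBJECTS = {
    "o1": {"restaurant": ["o2", "o3"]},
    "o2": {"name": ["o4"], "address": ["o5"]},
    "o3": {"name": ["o6"]},
    "o4": "First Restaurant",
    "o5": "1 Example Street",
    "o6": "Second Restaurant",
}
NAMED = {"Guide": "o1"}  # named nodes (entry points): name -> oid


def unreachable_oids(objects, named):
    """Return the oids that cannot be reached from any named node."""
    seen = set(named.values())
    queue = deque(seen)
    while queue:
        obj = objects[queue.popleft()]
        if isinstance(obj, dict):  # only complex objects have outgoing edges
            for targets in obj.values():
                for target in targets:
                    if target not in seen:
                        seen.add(target)
                        queue.append(target)
    return set(objects) - seen
```

An empty result means the database satisfies the last rule in the list; adding an object that no edge points to would make its oid appear in the result.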

Lorel example

- graph representing a restaurant guide
- find addresses of restaurants with a given zipcode

    select Guide.restaurant.address
    where Guide.restaurant.address.zipcode = 92310

- Guide is a named node
- restaurant, address and zipcode are edge labels
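One way to read this query is as path navigation from the named node. The following Python sketch mimics that reading on a hypothetical dictionary encoding of the graph (it is not the Lore system or its API). It assumes that the two occurrences of Guide.restaurant.address refer to the same address object, and it ignores other Lorel features such as type coercion.

```python
# Hypothetical data: dicts are complex objects (edge label -> target oids),
# other values are atomic objects.
OBJECTS = {
    "guide": {"restaurant": ["r1", "r2"]},
    "r1": {"address": ["a1"]},
    "r2": {"address": ["a2"]},
    "a1": {"street": ["s1"], "zipcode": ["z1"]},
    "a2": {"zipcode": ["z2"]},
    "s1": "1 Example Street",
    "z1": 92310,
    "z2": 10001,
}
NAMED = {"Guide": "guide"}


def follow(objects, oids, label):
    """Follow all edges with the given label from each of the given objects."""
    out = []
    for oid in oids:
        obj = objects[oid]
        if isinstance(obj, dict):
            out.extend(obj.get(label, []))
    return out


def eval_path(objects, named, path):
    """Evaluate a dotted path such as 'Guide.restaurant.address'."""
    name, *labels = path.split(".")
    oids = [named[name]]
    for label in labels:
        oids = follow(objects, oids, label)
    return oids


def addresses_with_zipcode(objects, named, zipcode):
    """Addresses on Guide.restaurant.address that have the given zipcode."""
    result = []
    for addr in eval_path(objects, named, "Guide.restaurant.address"):
        values = [objects[z] for z in follow(objects, [addr], "zipcode")]
        if zipcode in values:
            result.append(addr)
    return result
```

Here `addresses_with_zipcode(OBJECTS, NAMED, 92310)` selects only the address object `a1`. Objects that lack a zipcode edge are simply skipped, which reflects the absence of a predefined schema.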

YAGO/NAGA

- from the 2000s [Weikum et al., 2009]
- developed at Max Planck Institute for Informatics
- YAGO: Yet Another Great Ontology
- NAGA: Not Another Google Answer
- semantic search engine for web derived knowledge
- combines DB and IR
- 26 relationships between entities derived using information extraction
  - e.g., isA, bornInYear, hasWonPrize, locatedIn, . . .


NAGA model

- data model is directed, weighted multigraph
- nodes represent entities
- edges represent relationships
- weights represent confidence of extracted facts
- query is a connected, directed graph
- each edge labelled with a regular expression over edge labels or a variable or connect keyword
- answers are ranked by
  - informativeness
  - confidence
  - compactness


NAGA examples

- graph representing information on people and films
- in which films did a governor act?

    X isA governor
    X actedIn Y
    Y isA film

- X and Y are node variables
- isA and actedIn are relationships (edge labels)
- what do Albert Einstein and Niels Bohr have in common?

    Albert_Einstein connect Niels_Bohr

- Albert_Einstein and Niels_Bohr are node labels
- asks for paths connecting the nodes, ranked


Other graph data models and query languages

- Functional Data Model
- Logical Data Model
- O2
- GOOD, GDM
- Strudel and StruQL
- G-BASE, Gram, GraphDB, GRAS
- hypergraphs, hypernode model, hygraphs
- RDF/S and SPARQL

See Survey of Graph Database Models [Angles and Gutierrez, 2008]


Query functionality

- graph pattern matching
- path finding
- edge label variables
- negation
- path variables
- aggregation
- approximate matching and ranking
- (disjunction)


Example graph

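A graph of authors, prizes they have won, and countries where they were born:

[Figure: prize nodes (Nobel, Booker), author nodes (Neruda, Gordimer, Coetzee, Carey) and country nodes (Chile, SouthAfrica, Australia), connected by hasWon edges between authors and prizes and bornIn edges between authors and countries.]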


Example query

Which authors born in South Africa have won both the Nobel Prize in Literature and the Man Booker prize?

[Figure: query graph with a variable node X, hasWon edges from X to Booker and to Nobel, and a bornIn edge from X to SouthAfrica.]

X is a variable


Matching subgraphs

Two matching subgraphs

[Figure: the example graph of prizes (Nobel, Booker), authors (Neruda, Gordimer, Coetzee, Carey) and countries (Chile, SouthAfrica, Australia) with hasWon and bornIn edges. Successive builds highlight the two matches: one binding X to Gordimer and one binding X to Coetzee.]


Query answers

Depending on the query language and whether the database is a set of graphs or a single graph, answers might be the

- set of graphs in which a match is found (e.g. biological applications)
- set of matching subgraphs (NAGA)
- set of variable bindings for each variable (most others)


Forms of query expression

- similar to SQL/OQL (Lorel, RQL):

    select X
    from X.hasWon Y, X.hasWon Z, X.bornIn W
    where Y = Nobel and Z = Booker and W = SouthAfrica

  W, X, Y and Z are variables

- conjunctive query (similar to NAGA and others):

    (X) ← (X, hasWon, Nobel), (X, hasWon, Booker), (X, bornIn, SouthAfrica)
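
To make the conjunctive form concrete, here is a minimal Python sketch that evaluates it by extending variable bindings one triple pattern at a time. The triples are an assumed encoding of the example graph (the slides do not list its edges), the `?X` variable convention and the function names are illustrative, and this nested-loop join is only one of many possible evaluation strategies.

```python
# Assumed edge set for the example graph of authors, prizes and countries.
TRIPLES = [
    ("Neruda", "hasWon", "Nobel"), ("Neruda", "bornIn", "Chile"),
    ("Gordimer", "hasWon", "Nobel"), ("Gordimer", "hasWon", "Booker"),
    ("Gordimer", "bornIn", "SouthAfrica"),
    ("Coetzee", "hasWon", "Nobel"), ("Coetzee", "hasWon", "Booker"),
    ("Coetzee", "bornIn", "SouthAfrica"),
    ("Carey", "hasWon", "Booker"), ("Carey", "bornIn", "Australia"),
]

def is_var(term):
    """Variables are written with a leading '?' (a convention of this sketch)."""
    return isinstance(term, str) and term.startswith("?")

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new

def evaluate(patterns, triples):
    """Return all variable bindings that satisfy every pattern."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                new = match(pattern, triple, binding)
                if new is not None:
                    extended.append(new)
        bindings = extended
    return bindings

QUERY = [
    ("?X", "hasWon", "Nobel"),
    ("?X", "hasWon", "Booker"),
    ("?X", "bornIn", "SouthAfrica"),
]
answers = sorted({b["?X"] for b in evaluate(QUERY, TRIPLES)})
print(answers)  # ['Coetzee', 'Gordimer']
```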


Query evaluation problem

Given a query expression Q and a graph (database) G, is Q(G) non-empty?

- Combined complexity: both Q and G are part of the input
- Query complexity: input is Q while G is fixed
- Data complexity: input is G while Q is fixed

Often consider data complexity since graphs are assumed to be large while query expressions are assumed to be short


Complexity of query evaluation

For graph pattern matching, the complexity is the same as

- relational conjunctive queries
- subgraph isomorphism

namely

- NP-complete in terms of query and combined complexity
- PTIME in terms of data complexity

Query and combined complexity are in PTIME if the variables in the query satisfy an acyclicity condition [Yannakakis, 1981]

But evaluation can still take exponential time if all variable bindings have to be output


More flexible matching

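[Figure: query graph with a variable node X, hasWon edges from X to Booker and to Nobel, and an edge from X to SouthAfrica labelled with the regular expression below.]

    citizenOf | ((bornIn | livesIn) · locatedIn∗)

X counts as South African if X is a citizen of South Africa, or was born or lives in a place located there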


Regular expressions

Regular expression over alphabet Σ of edge labels:

- ε (empty string) is a regular expression
- any label in Σ is a regular expression
- if r1 and r2 are regular expressions, then so are
  - (r1|r2) (alternation)
  - (r1 · r2) (concatenation)
- if r is a regular expression, then so is r∗ (closure)
- may also use a⁻ to mean traversal of edge labelled a in the reverse direction
- r+ is shorthand for (r · r∗)
- r? is shorthand for (r|ε)
- Σ is shorthand for (a1| · · · |an) if Σ = {a1, ..., an}


Regular languages

Language L(r) (set of sequences of labels) denoted by r is given by:

- ε denotes {ε}
- a ∈ Σ denotes {a}
- (r1|r2) denotes L(r1) ∪ L(r2)
- (r1 · r2) denotes L(r1) · L(r2)
- r∗ denotes L(r)∗


Paths satisfying regular expressions

Given a graph G = (N, E, V, Σ, φ, λ)

- a path p is a sequence of edges (e_1, e_2, ..., e_n) such that, for each 1 ≤ i < n, if φ(e_i) = (x, y), then φ(e_{i+1}) = (y, z) for some x, y, z ∈ N
- the path label of p is given by λ(e_1) · λ(e_2) · · · λ(e_n) and is denoted λ(p)
- path p satisfies regular expression r if λ(p) ∈ L(r)

Regular path query: given r and G, find all pairs of nodes (x, y) in G such that there is a path from x to y which satisfies r
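
The definitions above can be checked mechanically for a single path. The Python sketch below is an illustration only: the edge ids and endpoints are hypothetical, φ and λ are merged into one dictionary, and the regular expression over labels is encoded for Python's `re` module by closing every label with `;` so that labels cannot run into each other.

```python
import re

# Hypothetical edges: edge id -> (source, target, label), merging phi and lambda.
EDGES = {
    "e1": ("b", "CT", "bornIn"),
    "e2": ("CT", "SA", "locatedIn"),
    "e3": ("a", "SA", "citizenOf"),
}

# r = citizenOf | ((bornIn | livesIn) . locatedIn*), with ';' closing each label.
PATTERN = re.compile(r"citizenOf;|(?:bornIn;|livesIn;)(?:locatedIn;)*")

def is_path(edges, path):
    """Each edge must start at the node where the previous edge ended."""
    return all(edges[e][1] == edges[f][0] for e, f in zip(path, path[1:]))

def path_label(edges, path):
    """Return the sequence of edge labels along the path."""
    return [edges[e][2] for e in path]

def satisfies(edges, path):
    """True if `path` is a path and its label belongs to L(r)."""
    word = "".join(label + ";" for label in path_label(edges, path))
    return is_path(edges, path) and PATTERN.fullmatch(word) is not None

print(satisfies(EDGES, ["e1", "e2"]))  # True
print(satisfies(EDGES, ["e2"]))        # False
```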


Examples of regular path queries

[Figure: four query graphs, revealed one at a time over successive builds.]

- edge from X to Y labelled citizenOf | ((bornIn | livesIn) · locatedIn∗)
- edge from X to Y labelled restaurant · (address)? · zipcode
- edge from X to Y labelled instanceOf · (subclass)∗
- edge from X to itself labelled (selling · bidder)+


Complexity of regular path query evaluation

- REGULAR PATH PROBLEM

  Given graph G, pair of nodes x and y and regular expression r, is there a path from x to y satisfying r?

- algorithm:
  - construct a nondeterministic finite automaton (NFA) M accepting L(r)
  - assume M has initial state s0 and final state sf
  - consider G as an NFA with initial state x and final state y
  - form the "intersection" I of M and G
  - check if there is a path from (s0, x) to (sf, y)
- Each step can be done in PTIME, so REGULAR PATH PROBLEM has PTIME combined complexity
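
The algorithm above can be sketched in Python as a breadth-first search that builds the intersection lazily instead of materialising it. This is a minimal illustration under stated assumptions: the NFA is written out by hand for r = citizenOf | ((bornIn | livesIn) · locatedIn∗), the graph edges are an assumed example (nodes a, b, c, SA, CT, UK), pairs are ordered (node, state), and the function and variable names are not from the tutorial.

```python
from collections import deque

EPSILON = None  # marks an epsilon transition in the NFA

# Hand-written NFA M: state -> list of (label, next state).
NFA = {
    "s0": [("citizenOf", "sf"), ("bornIn", "s1"), ("livesIn", "s1")],
    "s1": [("locatedIn", "s1"), (EPSILON, "sf")],
    "sf": [],
}

# Assumed example graph G as (source, label, target) edges.
GRAPH = [
    ("a", "citizenOf", "SA"),
    ("b", "bornIn", "CT"),
    ("b", "livesIn", "UK"),
    ("c", "bornIn", "UK"),
    ("CT", "locatedIn", "SA"),
]

def regular_path_exists(graph, nfa, start_state, final_state, x, y):
    """Search the product of G and M from (x, start_state) for (y, final_state)."""
    out_edges = {}
    for source, label, target in graph:
        out_edges.setdefault(source, []).append((label, target))
    start = (x, start_state)
    seen = {start}
    queue = deque([start])
    while queue:
        node, state = queue.popleft()
        if (node, state) == (y, final_state):
            return True
        successors = []
        for label, next_state in nfa[state]:
            if label is EPSILON:
                # Epsilon moves change the NFA state but stay on the same node.
                successors.append((node, next_state))
            else:
                successors.extend(
                    (target, next_state)
                    for edge_label, target in out_edges.get(node, [])
                    if edge_label == label
                )
        for pair in successors:
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return False

print(regular_path_exists(GRAPH, NFA, "s0", "sf", "b", "SA"))  # True
print(regular_path_exists(GRAPH, NFA, "s0", "sf", "c", "SA"))  # False
```

Each (node, state) pair is visited at most once, so the search is polynomial in the sizes of G and M, in line with the PTIME bound stated above.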


Regular path query evaluation

NFA M for r = citizenOf | ((bornIn | livesIn) · locatedIn∗)

[Figure: NFA with states s0 (start), s1 and sf (final), and transitions labelled citizenOf, bornIn, livesIn, locatedIn and ε.]


Regular path query evaluation

Graph G:

[Figure: graph with nodes a, b, c, SA, CT and UK, and edges labelled citizenOf, bornIn, livesIn, bornIn and locatedIn.]


Regular path query evaluation

Intersection of G and M

[Figure: product graph with nodes (a, s0), (b, s0), (c, s0), (SA, s1), (CT, s1), (UK, s1), (SA, sf), (CT, sf) and (UK, sf), and edges labelled citizenOf, bornIn, livesIn, bornIn, locatedIn and three ε edges. Successive builds highlight the answers found: (a, s0) to (SA, sf); (b, s0) to (SA, sf), (CT, sf) and (UK, sf); (c, s0) to (UK, sf).]


Regular path query evaluation

Alternatively can translate

    citizenOf | ((bornIn | livesIn) · locatedIn∗)

to Datalog (as done by Graphlog, e.g.)

    assoc(X, Y) ← bornIn(X, Y)
    assoc(X, Y) ← livesIn(X, Y)
    partOf(X, Y) ← locatedIn(X, Y)
    partOf(X, Y) ← locatedIn(X, Z), partOf(Z, Y)
    answer(X, Y) ← citizenOf(X, Y)
    answer(X, Y) ← assoc(X, Y)
    answer(X, Y) ← assoc(X, Z), partOf(Z, Y)
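
As a rough illustration of what these rules compute, the Python sketch below evaluates them bottom-up, iterating the recursive partOf rule to a fixpoint. It is hand-specialised to this one program rather than a general Datalog engine, and the input relations are an assumed example, not data from the tutorial.

```python
def evaluate_rules(citizen_of, born_in, lives_in, located_in):
    """Bottom-up evaluation of the assoc, partOf and answer rules."""
    # assoc(X, Y) <- bornIn(X, Y)   and   assoc(X, Y) <- livesIn(X, Y)
    assoc = set(born_in) | set(lives_in)

    # partOf(X, Y) <- locatedIn(X, Y)
    part_of = set(located_in)
    # partOf(X, Y) <- locatedIn(X, Z), partOf(Z, Y), repeated until nothing new.
    while True:
        derived = {(x, y) for (x, z) in located_in
                   for (z2, y) in part_of if z == z2}
        if derived <= part_of:
            break
        part_of |= derived

    # answer(X, Y) <- citizenOf(X, Y)   and   answer(X, Y) <- assoc(X, Y)
    answer = set(citizen_of) | assoc
    # answer(X, Y) <- assoc(X, Z), partOf(Z, Y)
    answer |= {(x, y) for (x, z) in assoc for (z2, y) in part_of if z == z2}
    return answer

# Assumed example relations.
result = evaluate_rules(
    citizen_of={("a", "SA")},
    born_in={("b", "CT"), ("c", "UK")},
    lives_in={("b", "UK")},
    located_in={("CT", "SA")},
)
print(sorted(result))
# [('a', 'SA'), ('b', 'CT'), ('b', 'SA'), ('b', 'UK'), ('c', 'UK')]
```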


Regular simple path queries

- path p is simple if no node is repeated on p
- REGULAR SIMPLE PATH PROBLEM

  Given graph G, pair of nodes x and y and regular expression r, is there a simple path from x to y satisfying r?

- REGULAR SIMPLE PATH PROBLEM is NP-complete, even for fixed expressions [Mendelzon and Wood, 1989]
- there can be a path from x to y satisfying r but no simple path satisfying r, e.g., r = (c · d)∗

[Figure: small graph on nodes a and b with edges labelled c and d.]
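
The gap between arbitrary and simple paths can be seen on a two-node graph. The Python sketch below is an illustration under assumptions: the edge endpoints (c from a to b, d as a loop on b) are one possible reading of the figure, the ε-free NFA for (c · d)∗ is written by hand, and the simple-path search is plain backtracking, which may take exponential time in general.

```python
# Assumed edges: c goes from a to b, d is a loop on b.
GRAPH = [("a", "c", "b"), ("b", "d", "b")]

# Hand-written epsilon-free NFA for (c . d)*: q0 is both initial and final.
NFA = {"q0": [("c", "q1")], "q1": [("d", "q0")]}

def path_exists(graph, nfa, start_state, final_states, x, y, simple):
    """Is there a (simple, if requested) path from x to y accepted by the NFA?"""
    out_edges = {}
    for source, label, target in graph:
        out_edges.setdefault(source, []).append((label, target))

    def search(node, state, used_nodes, seen_pairs):
        if node == y and state in final_states:
            return True
        for label, target in out_edges.get(node, []):
            for nfa_label, next_state in nfa.get(state, []):
                if nfa_label != label:
                    continue
                if simple:
                    # A simple path may not revisit any node.
                    if target in used_nodes:
                        continue
                elif (target, next_state) in seen_pairs:
                    # For arbitrary paths each (node, state) pair is tried once.
                    continue
                seen_pairs.add((target, next_state))
                if search(target, next_state, used_nodes | {target}, seen_pairs):
                    return True
        return False

    return search(x, start_state, {x}, {(x, start_state)})

print(path_exists(GRAPH, NFA, "q0", {"q0"}, "a", "b", simple=False))  # True
print(path_exists(GRAPH, NFA, "q0", {"q0"}, "a", "b", simple=True))   # False
```

Under these assumed edges the path a, b, b has label c · d and satisfies r, but it repeats node b; the only simple path from a to b has label c, which is not in L(r).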


Edge label variables

- what relationship(s) exist between Coetzee and SouthAfrica?

  X ← (Coetzee, X, SouthAfrica)

  a "schema-level" query
- answers might be: {bornIn, livesIn, citizenOf}
- find people X and things Y such that X is related to Y in the same way as Coetzee is related to Y

  (X, Y) ← (Coetzee, Z, Y), (Y, Z⁻, X)

  superscript − indicates traversal in reverse direction
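Both queries can be mimicked over a set of triples. The sketch below assumes a toy graph stored as (subject, label, object) tuples; the data and the function names are illustrative only.

```python
# Toy edge-labelled graph as (subject, label, object) triples; illustrative data only.
TRIPLES = {
    ("Coetzee", "bornIn", "SouthAfrica"),
    ("Coetzee", "citizenOf", "SouthAfrica"),
    ("Gordimer", "bornIn", "SouthAfrica"),
    ("Coetzee", "hasWon", "NobelPrize"),
    ("Gordimer", "hasWon", "NobelPrize"),
}


def labels_between(triples, source, target):
    """X <- (source, X, target): all edge labels leading from source to target."""
    return {label for s, label, o in triples if s == source and o == target}


def related_like(triples, source):
    """(X, Y) <- (source, Z, Y), (Y, Z^-, X): X reaches Y by a label Z that source also uses."""
    out = {(label, o) for s, label, o in triples if s == source}
    # (Y, Z^-, X) traverses an edge X -Z-> Y backwards, so we look for triples (X, Z, Y).
    return {(x, y) for x, label, y in triples if (label, y) in out}


print(labels_between(TRIPLES, "Coetzee", "SouthAfrica"))
print(related_like(TRIPLES, "Coetzee"))
```

Note that the second query also returns Coetzee itself as a binding for X, since nothing in the pattern excludes it.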


Edge label variables

- program analysis example: database transaction
- graph:
  - nodes represent points in a program
  - special nodes start and end
  - edges represent operations, e.g., lock(b) and unlock(b) of some data item b
- is it the case that a transaction tries to lock the same item more than once (not two-phase)?

  ← (start, (Σ∗ · lock(X) · Σ∗ · lock(X) · Σ∗), end)

- Σ∗ matches any sequence of edge labels
- sometimes called parameterised regular expressions
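One simple way to evaluate such a parameterised expression is to try each possible binding of X and, for each, run an ordinary reachability search that counts lock(X) edges. The sketch below assumes edges are given as (node, (operation, item), node) tuples; the encoding, the sample programs and the function name are hypothetical.

```python
from collections import defaultdict


def item_locked_twice(edges, start="start", end="end"):
    """Return an item b such that some start-to-end path contains lock(b) twice, else None."""
    succ = defaultdict(list)
    for u, label, v in edges:
        succ[u].append((label, v))
    items = sorted({arg for _, (op, arg), _ in edges if op == "lock"})
    for item in items:  # try each binding of the label variable X
        # Search state: (program point, number of lock(item) edges seen, capped at 2).
        seen = {(start, 0)}
        stack = [(start, 0)]
        while stack:
            node, count = stack.pop()
            if node == end and count == 2:
                return item
            for label, nxt in succ[node]:
                nxt_count = min(2, count + (label == ("lock", item)))
                if (nxt, nxt_count) not in seen:
                    seen.add((nxt, nxt_count))
                    stack.append((nxt, nxt_count))
    return None


# Two assumed straight-line transactions.
RELOCKS = [
    ("start", ("lock", "a"), "p1"),
    ("p1", ("unlock", "a"), "p2"),
    ("p2", ("lock", "a"), "p3"),
    ("p3", ("unlock", "a"), "end"),
]
LOCKS_ONCE = [
    ("start", ("lock", "a"), "p1"),
    ("p1", ("lock", "b"), "p2"),
    ("p2", ("unlock", "a"), "p3"),
    ("p3", ("unlock", "b"), "end"),
]

print(item_locked_twice(RELOCKS))
print(item_locked_twice(LOCKS_ONCE))
```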


Negation

- program analysis: def and use of program variables
- to find program points that immediately follow a use of an uninitialized variable

  Y ← (start, (¬def(X))∗ · use(X), Y)

- to find only the first use of each uninitialized variable along each path

  Y, Z ← (start, (¬(def(X) | use(X)))∗, Y), (Y, use(X), Z)
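The first query can be evaluated per variable: collect the program points reachable from start without crossing a def(X) edge, then follow use(X) edges out of them. The sketch below assumes the same (node, (operation, variable), node) edge encoding as an illustration, and returns the variable together with Y for readability (the query itself returns only Y).

```python
def after_uninitialized_use(edges, start="start"):
    """Pairs (X, Y) with a path start -(not def(X))*-> . -use(X)-> Y."""
    variables = {arg for _, (op, arg), _ in edges if op == "use"}
    result = set()
    for var in variables:
        # Program points reachable from start along edges not labelled def(var).
        reach = {start}
        stack = [start]
        while stack:
            u = stack.pop()
            for a, label, b in edges:
                if a == u and label != ("def", var) and b not in reach:
                    reach.add(b)
                    stack.append(b)
        # A use(var) edge leaving such a point is a use before any definition.
        for a, label, b in edges:
            if a in reach and label == ("use", var):
                result.add((var, b))
    return result


# Assumed program: a is defined before use, b is used before its definition.
PROGRAM = [
    ("start", ("def", "a"), "n1"),
    ("n1", ("use", "a"), "n2"),
    ("n2", ("use", "b"), "n3"),
    ("n3", ("def", "b"), "n4"),
    ("n4", ("use", "b"), "n5"),
]

print(after_uninitialized_use(PROGRAM))
```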


Path variables

- may want to know path(s) connecting two nodes:
  - linked data on the web (DBPedia, Freebase)
  - link analysis in criminal networks
  - data provenance
- given regular expression r and variable X, use (r)%X to bind path matching r to X
- edge label variable X is a special case where first occurrence is equivalent to (Σ)%X
- paths connecting Coetzee and Gordimer given by

  X ← (Coetzee, ((Σ | Σ⁻)∗)%X, Gordimer)

  answers: bornIn · bornIn⁻ and hasWon · hasWon⁻
- Lorel uses @, not %; NAGA uses connect keyword
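A path variable binds to the label sequence of a matching path. The sketch below enumerates such sequences for (Σ | Σ⁻)∗ over assumed toy data, writing a reverse traversal of label p as "p-". Restricting the search to simple paths of bounded length is an assumption made here only to keep the enumeration finite.

```python
from collections import defaultdict

# Illustrative data only.
TRIPLES = [
    ("Coetzee", "bornIn", "SouthAfrica"),
    ("Gordimer", "bornIn", "SouthAfrica"),
    ("Coetzee", "hasWon", "NobelPrize"),
    ("Gordimer", "hasWon", "NobelPrize"),
]


def connecting_paths(triples, source, target, max_len=2):
    """Label sequences of simple paths from source to target, edges usable in both directions."""
    moves = defaultdict(list)
    for s, p, o in triples:
        moves[s].append((p, o))        # forward traversal
        moves[o].append((p + "-", s))  # reverse traversal, written p-
    results = set()

    def walk(node, labels, visited):
        if node == target and labels:
            results.add(tuple(labels))
            return
        if len(labels) == max_len:
            return
        for label, nxt in moves[node]:
            if nxt not in visited:
                walk(nxt, labels + [label], visited | {nxt})

    walk(source, [], {source})
    return results


print(connecting_paths(TRIPLES, "Coetzee", "Gordimer"))
```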


Path variables

- find entities X and Y such that Coetzee is connected to Y in the same way as X is connected to Y

  (X, Y) ← (Coetzee, (Σ∗)%Z, Y), (X, Z, Y)

- similar to regular expressions with backreferencing, e.g., in egrep (Unix) and in Perl
- membership problem is NP-complete [Aho, 1980]; data complexity is PTIME
- in general, can denote non-context-free languages, e.g., {ww | w ∈ Σ∗} as above
- can also do local binding: ((Σ%X) · X)∗
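The analogy with backreferences can be tried directly with Python's re module. In this sketch each character stands for one edge label, which is an assumption made purely for illustration.

```python
import re

# (.*)\1 plays the role of (Sigma*)%Z followed by Z: it matches exactly the words ww.
SQUARE = re.compile(r"(.*)\1")

# (?:(.)\1)* mirrors the local binding ((Sigma%X) . X)*: X is rebound on every iteration.
PAIRS = re.compile(r"(?:(.)\1)*")


def is_square(word):
    return SQUARE.fullmatch(word) is not None


def is_doubled_letters(word):
    return PAIRS.fullmatch(word) is not None


print(is_square("abab"), is_square("aba"))
print(is_doubled_letters("aabbaa"), is_doubled_letters("abab"))
```

With global binding the whole captured path must repeat; with local binding only each single label must be repeated immediately.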


Aggregation: motivation

To be able to answer traditional graph queries like
- degree of a node
- distance between pairs of nodes
- eccentricity of a node
- diameter, radius and centre of a graph

and applications like
- shortest path
- most reliable path
- critical path
- bill of materials
- . . .

need operators such as count, min, max, sum
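As a rough illustration of why these operators are needed, the graph measures listed above reduce to count, min and max over adjacency lists and distances. The sketch assumes a small connected undirected graph given as an adjacency dictionary; the data is made up.

```python
from collections import deque

# Assumed undirected path graph a - b - c - d.
ADJ = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}


def distances_from(adj, source):
    """Breadth-first search distances (number of edges) from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


degree = {u: len(vs) for u, vs in ADJ.items()}                            # count
eccentricity = {u: max(distances_from(ADJ, u).values()) for u in ADJ}    # max
diameter = max(eccentricity.values())                                     # max
radius = min(eccentricity.values())                                       # min
centre = sorted(u for u in ADJ if eccentricity[u] == radius)

print(degree, eccentricity, diameter, radius, centre)
```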


Aggregation in Graphlog

- aggregate terms are allowed in label of distinguished edge or distinguished node
- following query computes, for each directory D, the total file space used by all contained files and sub-directories, other than those residing on disk1

[Figure: Graphlog query graph with nodes D, F, S, disk1 and sum(S); edge labels contains+, size and residesOn; distinguished edge diskUtil]


From Graphlog to Datalog

[Figure: the diskUtil query graph, with nodes D, F, S, disk1 and sum(S); edge labels contains+, size and residesOn; distinguished edge diskUtil]

containsPlus(X, Y) ← contains(X, Y)
containsPlus(X, Y) ← contains(X, Z), containsPlus(Z, Y)
diskUtil(D, sum(S)) ← containsPlus(D, F), size(F, S), ¬residesOn(F, disk1)
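The three rules can be evaluated bottom-up: a naive fixpoint for containsPlus, followed by a grouped sum. The sketch below is not how a Graphlog or Datalog engine would necessarily do it; the file-system data is invented, and size and residesOn are assumed to be functional so that they fit in dictionaries.

```python
from collections import defaultdict


def contains_plus(contains):
    """Transitive closure of contains, computed as a naive fixpoint of the two rules."""
    closure = set(contains)
    changed = True
    while changed:
        changed = False
        for x, z in contains:
            for z2, y in list(closure):
                if z == z2 and (x, y) not in closure:
                    closure.add((x, y))
                    changed = True
    return closure


def disk_util(contains, size, resides_on, excluded_disk="disk1"):
    """diskUtil(D, sum(S)): total size of items below D that are not on excluded_disk."""
    totals = defaultdict(int)
    for d, f in contains_plus(contains):
        if f in size and resides_on.get(f) != excluded_disk:
            totals[d] += size[f]
    return dict(totals)


# Illustrative data only.
CONTAINS = {("home", "docs"), ("docs", "a.txt"), ("home", "b.txt")}
SIZE = {"docs": 1, "a.txt": 10, "b.txt": 5}
RESIDES_ON = {"docs": "disk2", "a.txt": "disk2", "b.txt": "disk1"}

print(disk_util(CONTAINS, SIZE, RESIDES_ON))
```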


Graphlog example

- we might also want to summarise values along a path and then aggregate
- following query computes the length of the shortest path between each pair of nodes

[Figure: Graphlog query graph. An edge from X to Y is labelled with the closure (+) of distance, collecting the values D; the distinguished edge from X to Y is labelled shortestPath(min(sum(D))).]

- D is called a collecting variable
- sum is used to summarise distances along a path
- min is used to aggregate the summarised distances
- query evaluation is in PTIME if summarisation and aggregation operators form a closed semiring [Consens and Mendelzon, 1990]
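With sum as the summarisation operator and min as the aggregation operator, the query is an instance of the (min, +) semiring, for which a Floyd-Warshall style closure runs in polynomial time. The sketch below uses that standard algorithm as an illustration, not as the evaluation method of the cited paper; the edge data is assumed and weights are taken to be non-negative.

```python
import math


def shortest_paths(nodes, distance):
    """min over paths of sum of distances, for paths with at least one edge."""
    best = {(u, v): math.inf for u in nodes for v in nodes}
    for (u, v), d in distance.items():
        best[(u, v)] = min(best[(u, v)], d)
    for k in nodes:  # allow k as an intermediate node
        for i in nodes:
            for j in nodes:
                via = best[(i, k)] + best[(k, j)]  # sum summarises along the path
                if via < best[(i, j)]:             # min aggregates over paths
                    best[(i, j)] = via
    return {pair: d for pair, d in best.items() if d < math.inf}


# Illustrative data only.
NODES = ["a", "b", "c"]
DISTANCE = {("a", "b"): 1, ("b", "c"): 2, ("a", "c"): 5}

print(shortest_paths(NODES, DISTANCE))
```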


Approximate matching and ranking: motivation

- users may not be familiar with graph structure/constraints
- may formulate queries which return no answers or too few answers, e.g.
  - expression course · student when correct path is student · course
  - expression restaurant · zipcode when address is required between them
- can perform approximate matching of paths
- rank results in terms of "closeness" to original query


Approximate matching

- can modify the user's original query (regular expression r)
- one way is to apply edit operations to L(r)
  - insertions
  - deletions
  - substitutions
  - transpositions
  - inversions
- each operation may have a different cost
- somewhat related to user preferences
  - prepared to substitute train by bus but at cost 2
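The effect of per-operation costs can be seen on single label sequences. The sketch below is a simplification: it compares one query word with one path's labels rather than editing the whole language L(r), it does not model inversions, and the default costs and the function name are assumptions.

```python
def edit_cost(query, path, sub_cost=None, ins=1, dele=1, swap=1, default_sub=1):
    """Weighted edit distance between two label sequences, with adjacent transpositions."""
    sub_cost = sub_cost or {}
    n, m = len(query), len(path)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * dele
    for j in range(1, m + 1):
        d[0][j] = j * ins
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = query[i - 1], path[j - 1]
            s = 0 if a == b else sub_cost.get((a, b), default_sub)
            d[i][j] = min(
                d[i - 1][j] + dele,   # drop a query label
                d[i][j - 1] + ins,    # accept an extra label on the path
                d[i - 1][j - 1] + s,  # match or substitute
            )
            if i > 1 and j > 1 and a == path[j - 2] and query[i - 2] == b:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + swap)  # transposition
    return d[n][m]


print(edit_cost(["course", "student"], ["student", "course"]))
print(edit_cost(["restaurant", "zipcode"], ["restaurant", "address", "zipcode"]))
print(edit_cost(["train"], ["bus"], sub_cost={("train", "bus"): 2}))
```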


Approximate matching algorithm

- for conjunctive regular path queries
- can use algorithms from approximate string matching
- incrementally build an approximate NFA
- perform incremental joins for conjuncts
- PTIME combined complexity if conjuncts are acyclic and fixed number of head variables
- in general, can transform NFA using a regular transducer
- see [Hurtado, Poulovassilis and Wood, 2009]
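The general idea of ranking answers by edit cost can be sketched as a cheapest-first search over pairs (graph node, automaton state). This is not the algorithm of the cited paper: the query is restricted to a single fixed label sequence (a chain automaton), only insertion, deletion and substitution are modelled, and the costs, the cost bound and the data are assumptions.

```python
import heapq


def ranked_answers(edges, source, query, ins=1, dele=2, sub=2, max_cost=2):
    """Nodes reachable from source by a path within max_cost of query, cheapest first."""
    succ = {}
    for u, label, v in edges:
        succ.setdefault(u, []).append((label, v))
    n = len(query)
    best = {(source, 0): 0}
    heap = [(0, source, 0)]
    while heap:
        cost, node, i = heapq.heappop(heap)
        if cost > best.get((node, i), float("inf")):
            continue  # stale heap entry
        steps = []
        if i < n:
            steps.append((dele, node, i + 1))  # skip a query label
        for label, v in succ.get(node, []):
            steps.append((ins, v, i))  # follow an edge the query does not mention
            if i < n:
                steps.append((0 if label == query[i] else sub, v, i + 1))
        for step_cost, v, j in steps:
            new_cost = cost + step_cost
            if new_cost <= max_cost and new_cost < best.get((v, j), float("inf")):
                best[(v, j)] = new_cost
                heapq.heappush(heap, (new_cost, v, j))
    return sorted((c, v) for (v, j), c in best.items() if j == n)


# Illustrative data only: an address sits between restaurant and zipcode.
EDGES = [
    ("city", "restaurant", "r1"),
    ("r1", "address", "a1"),
    ("a1", "zipcode", "z1"),
]

print(ranked_answers(EDGES, "city", ["restaurant", "zipcode"]))
```

The exact query has no answer on this data; the cheapest approximate answer inserts the missing address edge.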


Summary

- motivated that graph-based data is widely used and available
- a brief high-level overview of some query languages for graph databases
- focussed on query language functionality
- some discussion of query evaluation algorithms
- some complexity results mentioned


Issues not covered

Many issues not covered
- other languages
- more query evaluation strategies, e.g., using indexes
- graphs with schemas
- query optimisation, e.g., containment
- . . .


References

S. Abiteboul, D. Quass, J. McHugh, J. Widom, and J. L. Wiener. The LOREL query language for semistructured data. Int. J. on Digital Libraries, 1(1):68–88, April 1997.

A. V. Aho. Pattern matching in strings. In R. V. Book, editor, Formal Language Theory: Perspectives and Open Problems, pages 325–347. Academic Press, 1980.

R. Angles and C. Gutierrez. Survey of graph database models. ACM Comput. Surv., 40(1):1–39, 2008.

M. P. Consens and A. O. Mendelzon. Expressing structural hypertext queries in GraphLog. In Proc. Second ACM Conf. on Hypertext, pages 269–292, 1989.


M. P. Consens and A. O. Mendelzon. Low complexity aggregation in GraphLog and Datalog. In Proc. 3rd Int. Conf. on Database Theory, pages 379–394, 1990.

I. F. Cruz, A. O. Mendelzon, and P. T. Wood. A graphical query language supporting recursion. In ACM SIGMOD Int. Conf. on Management of Data, pages 323–330, 1987.

I. F. Cruz, A. O. Mendelzon, and P. T. Wood. G+: Recursive queries without recursion. In Proc. 2nd Int. Conf. on Expert Database Systems, pages 355–368, 1988.


C. A. Hurtado, A. Poulovassilis, and P. T. Wood. Ranking Approximate Answers to Semantic Web Queries. In Proc. 6th European Semantic Web Conference, pages 263–277, 2009.

A. O. Mendelzon and P. T. Wood. Finding regular simple paths in graph databases. In Proc. 15th Int. Conf. on Very Large Data Bases, pages 185–193, 1989.

G. Weikum, G. Kasneci, M. Ramanath, and F. Suchanek. Database and information-retrieval methods for knowledge discovery. Commun. ACM, 52(4):56–64, 2009.


M. Yannakakis. Algorithms for acyclic database schemes. In Proc. 7th Int. Conf. on Very Large Data Bases, pages 82–94, 1981.