Property graphs with time

Julia Stoyanovich, joint work with Vera Moffitt

Drexel University, Philadelphia, PA, USA

stoyanovich.org

openCypher Meetup, October 25, 2017

[Figure: panels labeled 2007, 2008, 2009, 2010, 2011]

https://www.kenedict.com/apples-internal-innovation-network-unraveled-part-1-evolving-networks/

https://arxiv.org/abs/1709.06176

Exploratory analysis of evolving graphs

• Which nodes are showing an increasing popularity trend?

• Have any changes in network connectivity been observed?

• At what time scale can interesting trends be observed?

• How can multiple data sources be used jointly to complement or corroborate information about network evolution?

Goal

Principled and systematic support for usable, scalable, and extensible analysis of evolving graphs

Are Alice and Bill connected?

TNGP

… by a path?

Snapshot reducibility

Are Alice and Bill connected?

extended snapshot reducibility

… by a journey?

… by a path that persists over >2 time instants
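The three readings of "connected" can be contrasted with a minimal sketch. It assumes a toy encoding that is not from the talk: each undirected edge carries the set of discrete time instants at which it exists, and the names `path_at`, `journey`, and `persistent_path` as well as the data are hypothetical.

```python
from collections import deque

# Hypothetical toy data: (node, node, instants at which the edge exists).
EDGES = [
    ("Alice", "Cathy", {1, 2}),
    ("Cathy", "Bill", {3, 4, 5}),
    ("Alice", "Dave", {4, 5, 6}),
    ("Dave", "Bill", {4, 5, 6}),
]

def reachable(pairs, sources):
    """BFS over undirected (u, v) pairs, starting from a set of nodes."""
    adj = {}
    for u, v in pairs:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, queue = set(sources), deque(sources)
    while queue:
        n = queue.popleft()
        for m in adj.get(n, ()):
            if m not in seen:
                seen.add(m)
                queue.append(m)
    return seen

def snapshot(edges, t):
    """Edges that exist at time instant t."""
    return [(u, v) for u, v, ts in edges if t in ts]

def path_at(edges, a, b, t):
    """Snapshot-reducible query: an ordinary path query on the snapshot at t."""
    return b in reachable(snapshot(edges, t), {a})

def journey(edges, a, b, instants):
    """Time-respecting path: edge times never decrease along the way."""
    reached = {a}
    for t in sorted(instants):
        reached = reachable(snapshot(edges, t), reached)
        if b in reached:
            return True
    return False

def persistent_path(edges, a, b, window):
    """A single path whose edges all exist at every instant of the window."""
    stable = [(u, v) for u, v, ts in edges if set(window) <= ts]
    return b in reachable(stable, {a})
```

In this toy data no snapshot among instants 1 to 3 contains a path from Alice to Bill, yet a journey exists (Alice to Cathy at instant 1, Cathy to Bill at instant 3). The journey and persistent-path queries need information from more than one snapshot at a time, which is why they fall outside plain snapshot reducibility.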

TGraph: an evolving property graph

TGA: Temporal Graph Algebra

• Temporal variants of standard graph operators + novel time-specific operators

• Compositional: TGraph (or a pair of TGraphs) as input - TGraph as output

• Operations maintain model integrity

- graph integrity at each time instant: no dangling edges, a node/edge appears at most once

- temporal integrity: semantics of temporal operations are automatically enforced (formally: point semantics)
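The graph-integrity conditions above can be illustrated with a small checker. This is only a sketch under assumptions not stated in the talk: time is discrete, validity is stored as half-open `(start, end)` records, and the function names and data layout are hypothetical rather than Portal's.

```python
def instants(records):
    """Expand half-open (start, end) records into a list of time instants."""
    return [t for start, end in records for t in range(start, end)]

def check_integrity(nodes, edges):
    """Return a list of violations of graph integrity at some time instant.

    nodes: {node_id: [(start, end), ...]}
    edges: {(src, dst): [(start, end), ...]}
    """
    problems = []
    # A node or edge may appear at most once at any instant,
    # so its validity records must not overlap.
    for key, records in list(nodes.items()) + list(edges.items()):
        pts = instants(records)
        if len(pts) != len(set(pts)):
            problems.append(f"{key} appears more than once at some instant")
    # No dangling edges: both endpoints must exist whenever the edge does.
    for (src, dst), records in edges.items():
        for t in instants(records):
            for endpoint in (src, dst):
                if t not in instants(nodes.get(endpoint, [])):
                    problems.append(f"edge {(src, dst)} dangles at {t}: {endpoint} missing")
    return problems
```

Checking at the level of individual instants mirrors point semantics: the interval records are just a compact encoding, and two encodings that cover the same instants describe the same graph.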

TGA operations

• trim

• temporal versions of

- vertex-map, edge-map

- subgraph, path

- aggregate messages

- union, intersection, difference (binary operators)

• snapshot analytics

- PageRank, connected components,… - Pregel
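To make the compositional flavor of these operators concrete, here is a minimal sketch of trim, union, and intersection under point semantics. The representation is an assumption for illustration only: a graph is a dictionary from a node id or an edge tuple to the set of discrete instants at which it exists. None of the function names are Portal's API.

```python
def trim(g, start, end):
    """Restrict a graph to the instants in [start, end)."""
    window = set(range(start, end))
    out = {k: ts & window for k, ts in g.items()}
    return {k: ts for k, ts in out.items() if ts}

def union(g1, g2):
    """An entity exists at t if it exists at t in either input."""
    return {k: g1.get(k, set()) | g2.get(k, set()) for k in g1.keys() | g2.keys()}

def intersection(g1, g2):
    """An entity exists at t only if it exists at t in both inputs."""
    out = {k: g1[k] & g2[k] for k in g1.keys() & g2.keys()}
    return {k: ts for k, ts in out.items() if ts}

# Hypothetical inputs: string keys are nodes, tuple keys are edges.
A = {"Alice": {1, 2, 3}, "Bill": {2, 3}, ("Alice", "Bill"): {2, 3}}
B = {"Alice": {3, 4}, "Cathy": {3, 4}, ("Alice", "Cathy"): {4}}

# Every operator returns a graph, so calls compose freely.
recent = trim(union(A, B), 3, 5)
```

Because each operator is applied instant by instant, an edge that survives always has both endpoints at the same instants, provided the inputs had no dangling edges.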

TGA operations

• node creation

- based on temporal window: temporal zoom

- attribute-based: structural zoom

• edge creation

Structural zoom

add university nodes Drexel and CMU, and edges between students and these universities
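The university example can be sketched as attribute-based node and edge creation. The sketch assumes, hypothetically, that each student node has a time-varying `school` attribute recorded per discrete instant; the function name and data are illustrative and not taken from Portal.

```python
def create_school_nodes(students):
    """Create one node per distinct attribute value, plus student-to-school edges.

    students: {name: {instant: school}}
    Returns (new_nodes, new_edges), each mapping an id to the instants it exists.
    """
    new_nodes, new_edges = {}, {}
    for name, history in students.items():
        for t, school in history.items():
            # The university node exists whenever some student is enrolled there.
            new_nodes.setdefault(school, set()).add(t)
            # The edge exists exactly while this student has this attribute value.
            new_edges.setdefault((name, school), set()).add(t)
    return new_nodes, new_edges

# Hypothetical student histories.
STUDENTS = {
    "Alice": {1: "Drexel", 2: "Drexel", 3: "CMU"},
    "Bill": {2: "CMU", 3: "CMU"},
}
universities, enrollments = create_school_nodes(STUDENTS)
```

Deriving the new node's validity from the students that map to it keeps the result free of dangling edges without a separate repair step.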

Temporal zoom

coarsen taxi trip start-times into 10-min intervals
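A minimal sketch of this kind of coarsening, assuming trips are given as (pickup zone, drop-off zone, start time) tuples. The zone names, the function names, and the choice to count trips per window are illustrative assumptions, not details from the talk.

```python
from collections import Counter
from datetime import datetime, timedelta

def window_start(ts, minutes=10):
    """Floor a timestamp to the start of its window (minutes must divide 60)."""
    return ts - timedelta(
        minutes=ts.minute % minutes,
        seconds=ts.second,
        microseconds=ts.microsecond,
    )

def coarsen(trips, minutes=10):
    """Count trips per (pickup, dropoff, window start)."""
    return Counter((src, dst, window_start(ts, minutes)) for src, dst, ts in trips)

# Hypothetical trips.
TRIPS = [
    ("Midtown", "JFK", datetime(2017, 10, 25, 9, 3, 10)),
    ("Midtown", "JFK", datetime(2017, 10, 25, 9, 8, 55)),
    ("Midtown", "JFK", datetime(2017, 10, 25, 9, 12, 0)),
]
counts = coarsen(TRIPS)
```

Changing `minutes` re-runs the same analysis at a different time scale, which is the kind of exploration temporal zoom is meant to support.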

System architecture

[Figure: architecture diagram with components labeled Portal, Interactive Shell, Query Parser, System Catalog, Portal Runtime (optimizer, operators, etc.), SparkSQL, Spark Runtime, GraphX Data Structures, and two Workers, each with a Spark Runtime and HDFS]

Spark 2.0, interoperable with SparkSQL and with BigDatalog

Physical data representation

• On-disk: Apache Parquet

- vertex / edge files

- broken down into snapshot groups

- each file sorted on start time followed by node / edge id

• In-memory:

- nested relational (Vertex-Edge RDDs)

- GraphX-based: RepresentativeGraphs (RG), One Graph (OG), HybridGraph (HG)
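One way to picture a bitset-based in-memory layout is sketched below. It assumes, as the BitSet labels in the slide's figure suggest but the transcript does not spell out, that each node and edge is stored once together with a bitset of the periods in which it exists. The assignment of bitsets to particular nodes and edges, and all function names, are hypothetical.

```python
PERIODS = ["p1", "p2", "p3", "p4", "p5"]

def bitset(*names):
    """Encode a set of periods as an integer bit mask."""
    mask = 0
    for name in names:
        mask |= 1 << PERIODS.index(name)
    return mask

def members(mask):
    """Decode a bit mask back into period names."""
    return [p for i, p in enumerate(PERIODS) if mask >> i & 1]

def snapshot(entities, period):
    """Ids of the entities that exist in the given period."""
    bit = 1 << PERIODS.index(period)
    return sorted(k for k, mask in entities.items() if mask & bit)

# Hypothetical assignment of periods to nodes 1, 2, 3 and two edges.
NODES = {
    1: bitset("p1", "p2", "p3", "p4"),
    2: bitset("p1", "p2", "p3", "p4", "p5"),
    3: bitset("p2", "p3", "p4", "p5"),
}
EDGE_PERIODS = {(1, 2): bitset("p2", "p3"), (2, 3): bitset("p5")}

# Periods in which nodes 1 and 3 coexist: a single bitwise AND.
both = members(NODES[1] & NODES[3])
```

Storing each entity once keeps the structure compact when the graph changes slowly, and period-wise set operations reduce to bitwise operations on the masks.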

[Figure: graph with nodes 1, 2, 3; nodes and edges are labeled with BitSets of periods: BitSet(p1,p2,p3,p4), BitSet(p2,p3,p4,p5), BitSet(p5), BitSet(p1,p2,p3,p4,p5), BitSet(p2,p3)]

Performance highlights

• 16-node OpenStack cluster

• Apache Spark 2.0

• 4 cores and 16 GB of RAM per node

PageRank on wiki-talk

PageRank on nGrams

PageRank on Twitter

Aggregate messages on wiki-talk

Vertex-subgraph on wiki-talk

Portal vs. G*

average node degree, wiki-talk

Take-aways

• TGraph: a logical model of property graphs with time

• TGA: a compositional temporal graph algebra under point semantics

• Portal: a library on top of Apache Spark, inter-operable with SparkSQL

• Ongoing work on a declarative language, multi-operator query optimization, benchmarking

• Planned open source release this Fall

References

• Temporal Graph Algebra, Moffitt & Stoyanovich, DBPL 2017.

• Zooming in on NYC taxi data with Portal, Stoyanovich, Gilbride and Moffitt, DSSG 2017 (arXiv).

• Towards sequenced semantics for evolving graphs, Moffitt & Stoyanovich, EDBT 2017.

• Towards a distributed infrastructure for evolving graph analytics, Moffitt & Stoyanovich, TempWeb 2016.

• Vera Moffitt’s Ph.D. thesis.

Thank you!