Property graphs with time
Julia Stoyanovich, joint work with Vera Moffitt
Drexel University, Philadelphia, PA, USA
stoyanovich.org
openCypher Meetup, October 25, 2017
[Figure: network snapshots labeled 2007, 2008, 2009, 2010, 2011]
https://www.kenedict.com/apples-internal-innovation-network-unraveled-part-1-evolving-networks/
https://arxiv.org/abs/1709.06176
Exploratory analysis of evolving graphs
• Which nodes are showing an increasing popularity trend?
• Have any changes in network connectivity been observed?
• At what time scale can interesting trends be observed?
• How can multiple data sources be used jointly to complement or corroborate information about network evolution?
Goal
Principled and systematic support for usable, scalable, and extensible analysis of evolving graphs
Are Alice and Bill connected?
TNGP
… by a path?
Snapshot reducibility
Are Alice and Bill connected?
extended snapshot reducibility
… by a journey?
… by a path that persists over >2 time instants
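The three readings of "connected" (a path within one snapshot, a journey, and a path that persists over a window) can be contrasted on a toy graph. The following is a minimal Python sketch, not Portal's API: the `EDGES` data, the function names, and the modeling of time as integer instants are assumptions made for illustration. A journey is taken here to mean a walk whose edges are crossed at non-decreasing time instants.

```python
from collections import deque

# Hypothetical toy data: undirected edge -> set of time instants at which it exists.
EDGES = {
    ("Alice", "Cathy"): {1, 2},
    ("Cathy", "Bill"): {3, 4},
    ("Alice", "Dan"): {1, 2, 3},
}

def reachable(src, dst, edges):
    """Plain BFS reachability over a list of undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, queue = {src}, deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            return True
        for nxt in adj.get(cur, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def path_at(src, dst, t):
    """Snapshot reading: a path inside the single snapshot at instant t."""
    return reachable(src, dst, [e for e, ts in EDGES.items() if t in ts])

def persistent_path(src, dst, window):
    """Extended reading: one and the same path exists at every instant of the window."""
    return reachable(src, dst, [e for e, ts in EDGES.items() if set(window) <= ts])

def journey(src, dst, instants):
    """Time-respecting reading: edges are crossed at non-decreasing instants."""
    reached = {src}
    for t in sorted(instants):
        live = [e for e, ts in EDGES.items() if t in ts]
        # Extend the reached set by everything reachable within this snapshot.
        reached |= {n for e in live for n in e
                    if any(reachable(r, n, live) for r in reached)}
    return dst in reached
```

On this data, Alice and Bill are never joined by a path within a single snapshot, yet a journey from Alice to Bill exists (through Cathy), while the reverse journey from Bill to Alice does not.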
TGraph: an evolving property graph
TGA: Temporal Graph Algebra
• Temporal variants of standard graph operators + novel time-specific operators
• Compositional: TGraph (or a pair of TGraphs) as input, TGraph as output
• Operations maintain model integrity
- graph integrity at each time instant: no dangling edges, a node/edge appears at most once
- temporal integrity: semantics of temporal operations are automatically enforced (formally: point semantics)
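The two graph-integrity conditions can be made concrete with a small checker. This is only a sketch under assumptions of my own: nodes and edges are modeled as Python dicts mapping an identifier to a list of half-open `(start, end)` intervals over integer instants, and the name `check_integrity` is hypothetical rather than part of TGA or Portal.

```python
def covers(intervals, t):
    """True if instant t falls inside one of the half-open (start, end) intervals."""
    return any(start <= t < end for start, end in intervals)

def overlaps(intervals):
    """True if two intervals of the same element share an instant."""
    ordered = sorted(intervals)
    return any(prev_end > start
               for (_, prev_end), (start, _) in zip(ordered, ordered[1:]))

def check_integrity(nodes, edges):
    """Return a list of violations; an empty list means both conditions hold."""
    problems = []
    for node_id, intervals in nodes.items():
        if overlaps(intervals):
            problems.append(f"node {node_id} appears more than once at some instant")
    for (src, dst), intervals in edges.items():
        if overlaps(intervals):
            problems.append(f"edge {src}->{dst} appears more than once at some instant")
        for start, end in intervals:
            for t in range(start, end):
                # An edge dangles if either endpoint is absent at instant t.
                if not (covers(nodes.get(src, []), t) and covers(nodes.get(dst, []), t)):
                    problems.append(f"edge {src}->{dst} dangles at t={t}")
    return problems

# Hypothetical example data.
NODES = {"Alice": [(1, 5)], "Bill": [(2, 4)]}
GOOD_EDGES = {("Alice", "Bill"): [(2, 4)]}
BAD_EDGES = {("Alice", "Bill"): [(1, 4)]}  # Bill does not exist yet at t=1
```

With this data, `check_integrity(NODES, GOOD_EDGES)` reports no violations, while `BAD_EDGES` yields one dangling-edge violation at `t=1`.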
TGA operations
• trim
• temporal versions of
- vertex-map, edge-map
- subgraph, path
- aggregate messages
- union, intersection, difference (binary)
• snapshot analytics
- PageRank, connected components, … (Pregel)
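To illustrate how such operators compose, here is a hedged sketch of trim and temporal intersection under point semantics. It assumes a deliberately simple representation that is my own, not Portal's: a graph is a dict with `"nodes"` and `"edges"` entries, each mapping an identifier to the set of integer instants at which the element exists. The function names are hypothetical.

```python
def _restrict(elements, keep):
    """Apply keep(key, instants) to every element; drop elements left with no instants."""
    result = {}
    for key, instants in elements.items():
        kept = keep(key, instants)
        if kept:
            result[key] = kept
    return result

def trim(graph, start, end):
    """Keep only the part of the graph that exists in the half-open window [start, end)."""
    window = set(range(start, end))
    return {
        part: _restrict(graph[part], lambda _key, instants: instants & window)
        for part in ("nodes", "edges")
    }

def intersection(left, right):
    """An element exists at instant t in the result iff it exists at t in both inputs."""
    result = {}
    for part in ("nodes", "edges"):
        other = right[part]
        result[part] = _restrict(
            left[part], lambda key, instants: instants & other.get(key, set())
        )
    return result

# Hypothetical example graphs.
G1 = {"nodes": {"a": {1, 2, 3}, "b": {2, 3}}, "edges": {("a", "b"): {2, 3}}}
G2 = {"nodes": {"a": {3, 4}, "b": {3, 4}, "c": {4}}, "edges": {("a", "b"): {4}}}
```

Because each function takes graphs in this representation and returns one, calls nest freely, for example `trim(intersection(G1, G2), 3, 4)`. If both inputs have no dangling edges, neither does the output, since an edge kept at an instant has both endpoints kept at that instant.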
TGA operations
• node creation
- based on temporal window: temporal zoom
- attribute-based: structural zoom
• edge creation
Structural zoom
add university nodes Drexel and CMU, and edges between students and these universities
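The idea of attribute-based node creation can be sketched in a few lines of Python. The data layout (each student mapped to a per-instant school attribute), the sample values, and the function name `structural_zoom` are illustrative assumptions, not Portal's interface.

```python
# Hypothetical input: student -> {time instant: value of the school attribute}.
STUDENTS = {
    "Alice": {1: "Drexel", 2: "Drexel", 3: "CMU"},
    "Bill": {2: "CMU", 3: "CMU"},
}

def structural_zoom(students):
    """Create one node per attribute value, plus student-to-school edges.

    A school node exists at every instant at which some student has that
    attribute value; a student-school edge exists at the instants at which
    that particular student has that value.
    """
    school_nodes = {}  # school -> set of instants
    edges = {}         # (student, school) -> set of instants
    for student, by_time in students.items():
        for t, school in by_time.items():
            school_nodes.setdefault(school, set()).add(t)
            edges.setdefault((student, school), set()).add(t)
    return school_nodes, edges

school_nodes, edges = structural_zoom(STUDENTS)
```

Here the new nodes and edges inherit their validity periods from the attribute values that produced them, so no edge outlives the school node it points to.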
Structural zoom
Temporal zoom
coarsen taxi trip start-times into 10-min intervals
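As a rough illustration of window-based coarsening, the sketch below rounds each trip's start time down to its 10-minute window and groups trips by window. The trip records, zone names, and the choice to aggregate by counting trips are assumptions for the example; the rounding helper only works for window sizes that divide 60 minutes.

```python
from collections import Counter
from datetime import datetime, timedelta

def window_start(ts, minutes=10):
    """Round a timestamp down to the start of its window (minutes must divide 60)."""
    return ts - timedelta(minutes=ts.minute % minutes,
                          seconds=ts.second,
                          microseconds=ts.microsecond)

# Hypothetical trips: (pickup zone, drop-off zone, start time).
TRIPS = [
    ("zone_1", "zone_2", datetime(2017, 10, 25, 8, 3, 10)),
    ("zone_1", "zone_2", datetime(2017, 10, 25, 8, 9, 59)),
    ("zone_1", "zone_2", datetime(2017, 10, 25, 8, 13, 10)),
]

def temporal_zoom(trips, minutes=10):
    """Count trips per (pickup, drop-off, window) after coarsening start times."""
    counts = Counter()
    for pickup, dropoff, started in trips:
        counts[(pickup, dropoff, window_start(started, minutes))] += 1
    return counts

coarse = temporal_zoom(TRIPS)
```

The first two trips fall into the window starting at 08:00 and the third into the window starting at 08:10.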
System architecture
[Figure: architecture diagram with components Portal, Interactive Shell, Query Parser, System Catalog, Portal Runtime (optimizer, operators, etc.), SparkSQL, Spark Runtime, GraphX Data Structures, and multiple workers, each with a Spark Runtime and HDFS]
Spark 2.0, interoperable with SparkSQL and with BigDatalog
Physical data representation
• On-disk: Apache Parquet
- vertex / edge files
- broken down into snapshot groups
- each file sorted on start time, then on node/edge id
• In-memory:
- nested relational (Vertex-Edge RDDs)
- GraphX-based: RepresentativeGraphs (RG), One Graph (OG), HybridGraph (HG)
[Figure: example graph over nodes 1, 2, 3, with elements annotated by BitSet(p1,p2,p3,p4), BitSet(p2,p3,p4,p5), BitSet(p5), BitSet(p1,p2,p3,p4,p5), BitSet(p2,p3)]
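The BitSet annotations suggest a representation in which each node and edge is stored once, together with a bit set recording the periods in which it exists. The sketch below imitates this with Python integers used as bit masks. Which bit set belongs to which node or edge is my own assumption (the slide text does not say), as are all names in the code.

```python
PERIODS = ["p1", "p2", "p3", "p4", "p5"]

def bitset(*periods):
    """Encode a set of periods as an integer bit mask."""
    mask = 0
    for p in periods:
        mask |= 1 << PERIODS.index(p)
    return mask

# Assumed assignment of the five bit sets to three nodes and two edges.
NODES = {
    1: bitset("p1", "p2", "p3", "p4"),
    2: bitset("p1", "p2", "p3", "p4", "p5"),
    3: bitset("p2", "p3", "p4", "p5"),
}
EDGES = {
    (1, 2): bitset("p2", "p3"),
    (2, 3): bitset("p5"),
}

def snapshot(period):
    """Recover the nodes and edges present in one period by testing a single bit."""
    bit = 1 << PERIODS.index(period)
    nodes = sorted(n for n, mask in NODES.items() if mask & bit)
    edges = sorted(e for e, mask in EDGES.items() if mask & bit)
    return nodes, edges

def no_dangling_edges():
    """An edge's periods must be a subset of the periods of both its endpoints."""
    return all((mask & NODES[u] & NODES[v]) == mask
               for (u, v), mask in EDGES.items())
```

A design consequence worth noting: extracting a snapshot or checking integrity becomes a matter of bitwise operations on the masks, rather than a scan over separate per-period copies of the graph.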
Performance highlights
• 16-node OpenStack cluster
• Apache Spark 2.0
• 4 cores and 16 GB RAM per node
PageRank on wiki-talk
PageRank on nGrams
PageRank on Twitter
Aggregate messages on wiki-talk
Vertex-subgraph on wiki-talk
Portal vs. G*
average node degree, wiki-talk
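For readers unfamiliar with the query, average node degree per snapshot can be sketched as follows. This is not the benchmark code from either system: the data are hypothetical, edges are treated as undirected (so each edge contributes to two degrees), and nodes and edges are modeled as dicts from identifier to a set of integer instants.

```python
# Hypothetical evolving graph: element -> set of instants at which it exists.
NODES = {"a": {1, 2}, "b": {1, 2}, "c": {2}}
EDGES = {("a", "b"): {1, 2}, ("b", "c"): {2}}

def average_degree(nodes, edges, t):
    """Average degree in the snapshot at instant t, or None if the snapshot is empty."""
    live_nodes = [n for n, instants in nodes.items() if t in instants]
    live_edges = [e for e, instants in edges.items() if t in instants]
    if not live_nodes:
        return None
    # Each undirected edge adds one to the degree of both of its endpoints.
    return 2 * len(live_edges) / len(live_nodes)

degrees = {t: average_degree(NODES, EDGES, t) for t in (1, 2, 3)}
```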
Take-aways
• TGraph: a logical model of property graphs with time
• TGA: a compositional temporal graph algebra under point semantics
• Portal: a library on top of Apache Spark, interoperable with SparkSQL
• Ongoing work on a declarative language, multi-operator query optimization, benchmarking
• Planned open source release this Fall
References
• Temporal Graph Algebra, Moffitt & Stoyanovich, DBPL 2017.
• Zooming in on NYC taxi data with Portal, Stoyanovich, Gilbride and Moffitt, DSSG 2017 (arXiv).
• Towards sequenced semantics for evolving graphs, Moffitt & Stoyanovich, EDBT 2017.
• Towards a distributed infrastructure for evolving graph analytics, Moffitt & Stoyanovich, TempWeb 2016.
• Vera Moffitt’s Ph.D. thesis.
Thank you!