![Page 1: Pregel: A System for Large-Scale Graph Processing Presented by Dylan Davis Authors: Grzegorz Malewicz, Matthew H. Austern, Aart J.C. Bik, James C. Dehnert,](https://reader030.vdocuments.mx/reader030/viewer/2022032605/56649e7e5503460f94b825a5/html5/thumbnails/1.jpg)
Pregel: A System for Large-Scale Graph Processing
Presented by Dylan Davis
Authors: Grzegorz Malewicz, Matthew H. Austern, Aart J.C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski (Google, Inc.)
![Page 2](https://reader030.vdocuments.mx/reader030/viewer/2022032605/56649e7e5503460f94b825a5/html5/thumbnails/2.jpg)
Overview
• What is a graph?
• Graph Problems
• The Purpose of Pregel
• Model of Computation
• C++ API
• Implementation
• Applications
• Experiments
![Page 3](https://reader030.vdocuments.mx/reader030/viewer/2022032605/56649e7e5503460f94b825a5/html5/thumbnails/3.jpg)
What is a graph?
G = (V, E): a set of vertices V and a set of edges E connecting pairs of vertices.
Example (figure): Binary Tree
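As a concrete picture of G = (V, E), the sketch below stores a graph as an adjacency list and builds a small binary tree. The numbering scheme (vertex v has children 2v and 2v + 1) is an illustrative assumption, not something taken from the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// A directed graph G = (V, E) stored as an adjacency list: every vertex in V
// maps to the targets of its outgoing edges in E.
using Graph = std::map<int, std::vector<int>>;

// Builds a binary tree with vertices 1..n in which vertex v has edges to its
// children 2v and 2v + 1 whenever those vertices exist.
Graph MakeBinaryTree(int n) {
  Graph g;
  for (int v = 1; v <= n; ++v) {
    g[v];  // make sure leaves also appear in V
    for (int child = 2 * v; child <= 2 * v + 1 && child <= n; ++child) {
      g[v].push_back(child);
    }
  }
  return g;
}

// |E| is the total number of out-edges over all vertices.
std::size_t EdgeCount(const Graph& g) {
  std::size_t count = 0;
  for (const auto& entry : g) count += entry.second.size();
  return count;
}
```

For example, `MakeBinaryTree(7)` yields 7 vertices and 6 edges, as any tree with n vertices has n - 1 edges.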
![Page 4](https://reader030.vdocuments.mx/reader030/viewer/2022032605/56649e7e5503460f94b825a5/html5/thumbnails/4.jpg)
Graph Problems
Examples (figures): Network Routing, Social Network Connections
![Page 5](https://reader030.vdocuments.mx/reader030/viewer/2022032605/56649e7e5503460f94b825a5/html5/thumbnails/5.jpg)
The Purpose of Pregel
• Google was interested in applications that perform internet-related graph algorithms, such as PageRank, so it designed Pregel to run these tasks efficiently.
• Pregel is a scalable, general-purpose system for implementing graph algorithms in a distributed environment.
• It focuses on “Thinking Like a Vertex” and on parallelism.
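To make “Thinking Like a Vertex” concrete for PageRank, here is a sequential C++ simulation in which each vertex only uses its own value, its out-edges, and the messages sent to it in the previous superstep. This is not Pregel code: the damping factor 0.85, the 0.15/|V| term, and the fixed superstep count are conventional PageRank choices assumed for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Vertex-centric PageRank, simulated on one machine. out_edges[v] lists the
// targets of v's outgoing edges. In every superstep each vertex (1) recomputes
// its rank from the messages it received, then (2) sends an equal share of
// its rank along each out-edge. The messages are read in the next superstep.
std::vector<double> PageRank(const std::vector<std::vector<int>>& out_edges,
                             int num_supersteps) {
  const std::size_t n = out_edges.size();
  std::vector<double> value(n, 1.0 / n);  // initial rank 1/|V|
  std::vector<double> inbox(n, 0.0);      // sum of messages per vertex

  for (int superstep = 0; superstep < num_supersteps; ++superstep) {
    std::vector<double> next_inbox(n, 0.0);
    for (std::size_t v = 0; v < n; ++v) {
      // From superstep 1 on, the rank is rebuilt from incoming messages.
      if (superstep > 0) value[v] = 0.15 / n + 0.85 * inbox[v];
      for (int target : out_edges[v]) {
        next_inbox[target] += value[v] / out_edges[v].size();
      }
    }
    inbox = next_inbox;  // messages become visible in the next superstep
  }
  return value;
}
```

On a directed 3-cycle every vertex keeps rank 1/3, which is a quick sanity check for the message flow.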
![Page 6](https://reader030.vdocuments.mx/reader030/viewer/2022032605/56649e7e5503460f94b825a5/html5/thumbnails/6.jpg)
Model of Computation
![Page 7](https://reader030.vdocuments.mx/reader030/viewer/2022032605/56649e7e5503460f94b825a5/html5/thumbnails/7.jpg)
Model of Computation (Vertex)
Figure: a vertex with its Vertex ID and Vertex Value; each outgoing edge carries an Edge Value and points to another vertex, identified by its Vertex ID.
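The vertex model in the figure maps naturally onto a pair of plain structs. The type and field names below (`Edge`, `VertexData`, `TotalOutEdgeValue`) are hypothetical; the sketch only mirrors the figure's labels: an ID, a value, and outgoing edges that each carry a value and a target ID.

```cpp
#include <cassert>
#include <string>
#include <vector>

// An outgoing edge: the ID of the target vertex plus a user-defined edge value.
template <typename EdgeValue>
struct Edge {
  std::string target_id;
  EdgeValue value;
};

// A vertex: a unique string ID, a modifiable user-defined value, and its
// outgoing edges. In this sketch edges are stored with their source vertex.
template <typename VertexValue, typename EdgeValue>
struct VertexData {
  std::string id;
  VertexValue value;
  std::vector<Edge<EdgeValue>> out_edges;
};

// Example query: sum of the values on a vertex's outgoing edges.
template <typename VertexValue, typename EdgeValue>
EdgeValue TotalOutEdgeValue(const VertexData<VertexValue, EdgeValue>& v) {
  EdgeValue total{};
  for (const auto& e : v.out_edges) total += e.value;
  return total;
}
```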
![Page 8](https://reader030.vdocuments.mx/reader030/viewer/2022032605/56649e7e5503460f94b825a5/html5/thumbnails/8.jpg)
Model of Computation (Superstep)
Figure: along the Execution Time axis, Superstep 0, Superstep 1, and Superstep 2 run in sequence; within each superstep, Compute() is invoked on many vertices in parallel.
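The superstep structure can be simulated on a simple chain graph. The sketch assumes a toy program (hypothetical, not from the slides) in which vertex 0 starts with value 0 and every vertex that receives a message stores it and forwards value + 1. Because messages sent in superstep S are only delivered after the barrier, in superstep S + 1, the value advances exactly one hop per superstep.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Runs bulk-synchronous supersteps on the directed chain 0 -> 1 -> ... -> n-1
// and returns how many supersteps were executed before no messages remained.
int RunChain(std::size_t n, std::vector<int>& value) {
  value.assign(n, -1);  // -1 means "not reached yet"
  std::vector<std::vector<int>> inbox(n);
  int superstep = 0;
  bool messages_in_flight = true;
  while (messages_in_flight) {
    std::vector<std::vector<int>> outbox(n);  // delivered in the next superstep
    for (std::size_t v = 0; v < n; ++v) {     // "Compute()" for each vertex
      bool active = false;
      if (superstep == 0 && v == 0) { value[v] = 0; active = true; }
      for (int msg : inbox[v]) { value[v] = msg; active = true; }
      if (active && v + 1 < n) outbox[v + 1].push_back(value[v] + 1);
    }
    inbox = outbox;  // barrier: sent messages become next superstep's input
    ++superstep;
    messages_in_flight = false;
    for (const auto& box : inbox) {
      if (!box.empty()) messages_in_flight = true;
    }
  }
  return superstep;
}
```

On a 4-vertex chain this takes 4 supersteps and leaves the values 0, 1, 2, 3.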
![Page 9](https://reader030.vdocuments.mx/reader030/viewer/2022032605/56649e7e5503460f94b825a5/html5/thumbnails/9.jpg)
Model of Computation (Vertex Actions)
A vertex (with its Vertex ID and Vertex Value) can:
• Modify its values
• Receive messages sent in the previous superstep
• Send messages
• Request topology changes
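The first three actions can be seen together in a small, hypothetical “propagate the maximum value” program: each vertex reads last superstep's messages, raises its value if it sees a larger one, tells its neighbors only when its value changed, and otherwise votes to halt. This is a single-process simulation with made-up names; topology changes are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// A vertex for the max-value program.
struct MaxVertex {
  int value;
  std::vector<std::size_t> out_edges;  // indices of target vertices
  bool halted = false;
};

// Runs supersteps until every vertex has voted to halt and no messages remain.
void RunMaxValue(std::vector<MaxVertex>& graph) {
  const std::size_t n = graph.size();
  std::vector<std::vector<int>> inbox(n);
  for (int superstep = 0;; ++superstep) {
    std::vector<std::vector<int>> outbox(n);
    bool any_ran = false;
    for (std::size_t i = 0; i < n; ++i) {
      MaxVertex& v = graph[i];
      if (!inbox[i].empty()) v.halted = false;  // a message reactivates the vertex
      if (v.halted) continue;
      any_ran = true;
      const int old_value = v.value;
      for (int msg : inbox[i]) v.value = std::max(v.value, msg);  // modify value
      if (superstep == 0 || v.value > old_value) {
        for (std::size_t t : v.out_edges) outbox[t].push_back(v.value);  // send
      } else {
        v.halted = true;  // vote to halt: nothing new to tell the neighbors
      }
    }
    if (!any_ran) break;  // all vertices halted and no messages were delivered
    inbox = outbox;
  }
}
```

Starting from values 3, 6, 2, 1 on a bidirectional path, every vertex ends with 6 and halted.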
![Page 10](https://reader030.vdocuments.mx/reader030/viewer/2022032605/56649e7e5503460f94b825a5/html5/thumbnails/10.jpg)
Model of Computation (State Machine)
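The vertex state machine has two states: a vertex becomes inactive by voting to halt, and any incoming message makes it active again. The job ends once every vertex is inactive and no messages are in transit. A minimal sketch with hypothetical names:

```cpp
#include <cassert>
#include <vector>

enum class State { kActive, kInactive };

// State of a vertex going into the next superstep.
State NextState(State current, bool voted_to_halt, bool message_arrives) {
  if (message_arrives) return State::kActive;  // messages reactivate a vertex
  if (voted_to_halt) return State::kInactive;  // explicit vote to halt
  return current;                              // otherwise unchanged
}

// The whole computation terminates only when every vertex is inactive and
// no messages are still in transit.
bool JobFinished(const std::vector<State>& states, bool messages_in_transit) {
  if (messages_in_transit) return false;
  for (State s : states) {
    if (s == State::kActive) return false;
  }
  return true;
}
```

Note that a message arriving for a vertex that has just voted to halt wins: the vertex runs again in the next superstep.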
![Page 11](https://reader030.vdocuments.mx/reader030/viewer/2022032605/56649e7e5503460f94b825a5/html5/thumbnails/11.jpg)
![Page 12](https://reader030.vdocuments.mx/reader030/viewer/2022032605/56649e7e5503460f94b825a5/html5/thumbnails/12.jpg)
C++ API
![Page 13](https://reader030.vdocuments.mx/reader030/viewer/2022032605/56649e7e5503460f94b825a5/html5/thumbnails/13.jpg)
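Pregel programs are written by subclassing a Vertex class and overriding Compute(). Since Pregel itself is not public, the block below is a toy single-process stand-in. The method names Compute(), vertex_id(), superstep(), GetValue(), MutableValue(), SendMessageTo(), and VoteToHalt() follow the Pregel paper's description of the Vertex class; everything else (messages passed as a `std::vector` instead of an iterator, `GetOutEdges()`, the `RunSuperstep()` hook, and `CountingVertex`) is a hypothetical simplification.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

template <typename VertexValue, typename EdgeValue, typename MessageValue>
class Vertex {
 public:
  using Message = std::pair<std::string, MessageValue>;  // (destination ID, value)

  Vertex(std::string id, VertexValue value, std::map<std::string, EdgeValue> out_edges)
      : id_(std::move(id)), value_(std::move(value)), out_edges_(std::move(out_edges)) {}
  virtual ~Vertex() = default;

  // User code overrides Compute(); msgs were sent to this vertex last superstep.
  virtual void Compute(const std::vector<MessageValue>& msgs) = 0;

  // Operations available inside Compute().
  const std::string& vertex_id() const { return id_; }
  std::int64_t superstep() const { return superstep_; }
  const VertexValue& GetValue() const { return value_; }
  VertexValue* MutableValue() { return &value_; }
  const std::map<std::string, EdgeValue>& GetOutEdges() const { return out_edges_; }
  void SendMessageTo(const std::string& dest, const MessageValue& msg) {
    outbox_.emplace_back(dest, msg);
  }
  void VoteToHalt() { halted_ = true; }

  // Hypothetical runtime hook: runs one superstep, returns outgoing messages.
  std::vector<Message> RunSuperstep(std::int64_t step, const std::vector<MessageValue>& msgs) {
    superstep_ = step;
    if (!msgs.empty()) halted_ = false;  // incoming messages reactivate the vertex
    outbox_.clear();
    if (!halted_) Compute(msgs);
    return outbox_;
  }
  bool halted() const { return halted_; }

 private:
  std::string id_;
  VertexValue value_;
  std::map<std::string, EdgeValue> out_edges_;
  std::int64_t superstep_ = 0;
  bool halted_ = false;
  std::vector<Message> outbox_;
};

// Example user program: count received messages and, in superstep 0, send a
// message along every out-edge. Every call ends by voting to halt.
class CountingVertex : public Vertex<int, double, int> {
 public:
  using Vertex<int, double, int>::Vertex;
  void Compute(const std::vector<int>& msgs) override {
    *MutableValue() += static_cast<int>(msgs.size());
    if (superstep() == 0) {
      for (const auto& edge : GetOutEdges()) SendMessageTo(edge.first, 1);
    }
    VoteToHalt();
  }
};
```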
![Page 14](https://reader030.vdocuments.mx/reader030/viewer/2022032605/56649e7e5503460f94b825a5/html5/thumbnails/14.jpg)
C++ API (Message Passing)
Figure: each message pairs a Destination Vertex ID with a Message Value; outgoing messages are collected in a Message Buffer.
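A message buffer of the kind in the figure can be sketched as a map from destination vertex ID to the values queued for it, emptied when messages are handed over after the barrier. The class and its methods are hypothetical; the sketch ignores networking and batching between machines.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical outgoing-message buffer: each message is a
// (destination vertex ID, message value) pair. Messages accumulate during a
// superstep and are grouped per destination when delivered.
template <typename MessageValue>
class MessageBuffer {
 public:
  void Send(const std::string& dest_vertex, const MessageValue& value) {
    pending_[dest_vertex].push_back(value);
  }

  // Hands every queued message to its destination and empties the buffer.
  std::map<std::string, std::vector<MessageValue>> Deliver() {
    std::map<std::string, std::vector<MessageValue>> delivered;
    delivered.swap(pending_);
    return delivered;
  }

  bool empty() const { return pending_.empty(); }

 private:
  std::map<std::string, std::vector<MessageValue>> pending_;
};
```

Grouping by destination means each receiving vertex gets all of its messages at once at the start of the next superstep.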
![Page 15](https://reader030.vdocuments.mx/reader030/viewer/2022032605/56649e7e5503460f94b825a5/html5/thumbnails/15.jpg)
C++ API (Combiners & Aggregators)
Figures: Combiner; Aggregator
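A combiner merges messages bound for the same vertex into one, which is only safe for commutative and associative operations such as min or sum. An aggregator collects one value from every vertex in superstep S, reduces them, and exposes the result to all vertices in superstep S + 1. The sketch below shows a min-combiner (useful for shortest paths) and a sum aggregator; all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <map>
#include <string>
#include <vector>

// Combiner: reduce all messages for one vertex to a single message.
int CombineMin(const std::vector<int>& msgs) {
  int best = std::numeric_limits<int>::max();
  for (int m : msgs) best = std::min(best, m);
  return best;
}

// Applies the combiner to every destination's queued messages.
std::map<std::string, int> ApplyCombiner(
    const std::map<std::string, std::vector<int>>& outgoing) {
  std::map<std::string, int> combined;
  for (const auto& [dest, msgs] : outgoing) combined[dest] = CombineMin(msgs);
  return combined;
}

// Aggregator: vertices contribute during superstep S; the reduced value is
// published at the barrier and readable by every vertex in superstep S + 1.
class SumAggregator {
 public:
  void Aggregate(long long v) { partial_ += v; }
  void EndSuperstep() {  // called at the barrier
    published_ = partial_;
    partial_ = 0;
  }
  long long GetAggregatedValue() const { return published_; }

 private:
  long long partial_ = 0;
  long long published_ = 0;
};
```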
![Page 16](https://reader030.vdocuments.mx/reader030/viewer/2022032605/56649e7e5503460f94b825a5/html5/thumbnails/16.jpg)
C++ API (Topology Mutations)
Figure (labels: V, Superstep)
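Topology changes requested during superstep S take effect before any Compute() call in superstep S + 1. The sketch applies queued requests in the partial order described in the Pregel paper: edge removals, then vertex removals, then vertex additions, then edge additions. Pregel also lets users resolve conflicting requests with handlers; this sketch omits that and simply creates a missing source vertex when an edge is added. All type and function names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Graph = std::map<std::string, std::set<std::string>>;  // vertex -> out-edge targets
using EdgeRequest = std::pair<std::string, std::string>;     // (source, target)

// Mutation requests collected during one superstep.
struct MutationRequests {
  std::vector<EdgeRequest> remove_edges;
  std::vector<std::string> remove_vertices;
  std::vector<std::string> add_vertices;
  std::vector<EdgeRequest> add_edges;
};

// Applied at the barrier, before the next superstep's Compute() calls.
void ApplyMutations(Graph& g, const MutationRequests& r) {
  for (const auto& [src, dst] : r.remove_edges) {
    auto it = g.find(src);
    if (it != g.end()) it->second.erase(dst);
  }
  for (const auto& v : r.remove_vertices) g.erase(v);  // also drops v's out-edges
  for (const auto& v : r.add_vertices) g[v];           // no-op if v already exists
  for (const auto& [src, dst] : r.add_edges) g[src].insert(dst);
}
```

Because removals run first, removing and re-adding the same vertex in one superstep yields a fresh vertex without its old out-edges.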
![Page 17](https://reader030.vdocuments.mx/reader030/viewer/2022032605/56649e7e5503460f94b825a5/html5/thumbnails/17.jpg)
C++ API (Input and Output)
Figure: adjacency matrix for a 5-vertex graph (row and column headers are vertex IDs)

|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 1 | 0 |
| 1 | 0 | 0 | 0 | 1 | 1 |
| 2 | 1 | 1 | 0 | 1 | 1 |
| 3 | 0 | 1 | 1 | 0 | 1 |
| 4 | 1 | 1 | 1 | 0 | 0 |
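An input reader turns raw data like this matrix into vertices with out-edges. The sketch below is a hypothetical reader; it assumes one whitespace-separated row per line, without the header row or column, and reads row i as the out-edges of vertex i, which the slide does not state explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Parses a 0/1 adjacency matrix and returns out_edges[i], the list of j with
// entry [i][j] == 1 (interpreted as an edge i -> j).
std::vector<std::vector<int>> ReadAdjacencyMatrix(const std::string& text) {
  std::vector<std::vector<int>> out_edges;
  std::istringstream lines(text);
  std::string line;
  while (std::getline(lines, line)) {
    if (line.empty()) continue;
    std::istringstream cells(line);
    std::vector<int> targets;
    int bit = 0;
    for (int j = 0; cells >> bit; ++j) {
      if (bit == 1) targets.push_back(j);
    }
    out_edges.push_back(targets);
  }
  return out_edges;
}
```

Read this way, the matrix above gives vertex 2 the out-edges 0, 1, 3, 4 and the graph 14 edges in total.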
![Page 18](https://reader030.vdocuments.mx/reader030/viewer/2022032605/56649e7e5503460f94b825a5/html5/thumbnails/18.jpg)
Implementation
![Page 19](https://reader030.vdocuments.mx/reader030/viewer/2022032605/56649e7e5503460f94b825a5/html5/thumbnails/19.jpg)
Implementation (Basic Architecture)
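In the basic architecture the graph is split into partitions, each holding a set of vertices and their out-edges. The Pregel paper describes the default partitioning as hash(ID) mod N, with N the number of partitions. In the sketch, `std::hash` stands in for whatever hash function Pregel actually uses, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// The partition that owns a vertex depends only on its ID.
std::size_t PartitionOf(const std::string& vertex_id, std::size_t num_partitions) {
  return std::hash<std::string>{}(vertex_id) % num_partitions;
}

// Groups vertex IDs by partition; a master would then assign partitions to workers.
std::map<std::size_t, std::vector<std::string>> Partition(
    const std::vector<std::string>& ids, std::size_t num_partitions) {
  std::map<std::size_t, std::vector<std::string>> parts;
  for (const auto& id : ids) parts[PartitionOf(id, num_partitions)].push_back(id);
  return parts;
}
```

A design benefit of hashing is that any worker can compute which partition owns a destination vertex without consulting a lookup table.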
![Page 20](https://reader030.vdocuments.mx/reader030/viewer/2022032605/56649e7e5503460f94b825a5/html5/thumbnails/20.jpg)
Implementation (Program Execution)
Flow:
1. Copy the user program to many machines: one copy becomes the master, the others become workers.
2. The master assigns graph partitions to the workers.
3. The master assigns portions of the user's input data to the workers, which load the vertex data.
4. Run supersteps (workers call Compute() on their vertices and send messages).
5. Save the output.
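Step 4 of the flow can be pictured as a master loop that tells every worker to run one superstep and stops once no worker reports active vertices. `ToyWorker` and its per-vertex countdown are hypothetical stand-ins for real vertex programs, and message traffic (which also keeps a real job running) is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical worker: each vertex stays active until its countdown hits 0.
struct ToyWorker {
  std::vector<int> countdown;

  // Runs one superstep and reports how many vertices remain active.
  std::size_t RunSuperstep() {
    std::size_t active = 0;
    for (int& c : countdown) {
      if (c > 0) --c;
      if (c > 0) ++active;
    }
    return active;
  }
};

// Master loop: in Pregel the workers run in parallel; here they run in turn.
int RunSupersteps(std::vector<ToyWorker>& workers) {
  int superstep = 0;
  std::size_t active = 1;
  while (active > 0) {
    active = 0;
    for (auto& w : workers) active += w.RunSuperstep();
    ++superstep;
  }
  return superstep;
}
```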
![Page 21](https://reader030.vdocuments.mx/reader030/viewer/2022032605/56649e7e5503460f94b825a5/html5/thumbnails/21.jpg)
Implementation (Fault Tolerance)
Figure (Checkpoint): each Worker calls Save().
Figure (Recover): the Workers call Recompute(); one Worker is marked X (failed).
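Checkpoint-based recovery can be illustrated with a toy computation: state (vertex values plus pending messages) is saved every few supersteps, and after a simulated failure the run reloads the latest checkpoint and recomputes the lost supersteps. The ring computation, the `Snapshot` type, and the single simulated failure are assumptions made for this sketch; in the real system the state is saved to persistent storage.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// State saved at a checkpoint: vertex values plus messages awaiting delivery.
struct Snapshot {
  int superstep = 0;
  std::vector<long long> values;
  std::vector<long long> inbox;
};

// One toy superstep on a ring: each vertex adds its incoming message to its
// value and sends the new value to the next vertex.
void Step(Snapshot& s) {
  const std::size_t n = s.values.size();
  std::vector<long long> next(n, 0);
  for (std::size_t v = 0; v < n; ++v) {
    s.values[v] += s.inbox[v];
    next[(v + 1) % n] = s.values[v];
  }
  s.inbox = next;
  ++s.superstep;
}

// Runs `total` supersteps, checkpointing every `interval` (> 0) supersteps.
// If fail_at >= 0, state is lost once at that superstep and the run resumes
// from the most recent checkpoint.
Snapshot Run(Snapshot s, int total, int interval, int fail_at) {
  Snapshot checkpoint = s;
  bool failed = false;
  while (s.superstep < total) {
    if (s.superstep % interval == 0) checkpoint = s;  // "Save()"
    if (!failed && s.superstep == fail_at) {
      failed = true;
      s = checkpoint;  // recover, then recompute from the checkpoint
    }
    Step(s);
  }
  return s;
}
```

Because each superstep is deterministic, the recovered run ends in the same state as a run without failure.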
![Page 22](https://reader030.vdocuments.mx/reader030/viewer/2022032605/56649e7e5503460f94b825a5/html5/thumbnails/22.jpg)
Implementation (Worker)
Figure: two Workers
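A worker keeps its partition's vertices in memory. One useful design is double buffering of incoming messages: one queue is read during the current superstep while another fills up for the next, and the two are swapped at the barrier. The class below is a hypothetical sketch of that idea, not Pregel's actual worker code.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Per-vertex state a worker keeps for its partition.
struct VertexRecord {
  double value = 0.0;
  std::vector<std::string> out_edges;
  bool active = true;
};

class WorkerPartition {
 public:
  std::map<std::string, VertexRecord> vertices;

  // Messages arriving during a superstep are queued for the next one.
  void Deliver(const std::string& dest, double msg) { next_[dest].push_back(msg); }

  // Messages readable by vertex `id` in the current superstep.
  const std::vector<double>& Incoming(const std::string& id) { return current_[id]; }

  // Barrier: the next-superstep queue becomes current; the old one is cleared.
  void Swap() {
    std::swap(current_, next_);
    next_.clear();
  }

 private:
  std::map<std::string, std::vector<double>> current_, next_;
};
```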
![Page 23](https://reader030.vdocuments.mx/reader030/viewer/2022032605/56649e7e5503460f94b825a5/html5/thumbnails/23.jpg)
Implementation (Master)
Figure: the Master keeps a List of Workers and the Partitions assigned to them.
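The master's bookkeeping from the figure (live workers and partition ownership) can be sketched as below. Round-robin assignment and the way a failed worker's partitions are spread over survivors are illustrative assumptions; the sketch also requires at least one worker at construction.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

class Master {
 public:
  // Requires num_workers > 0.
  Master(int num_workers, int num_partitions) {
    for (int w = 0; w < num_workers; ++w) live_workers_.insert(w);
    for (int p = 0; p < num_partitions; ++p) owner_[p] = p % num_workers;
  }

  int OwnerOf(int partition) const { return owner_.at(partition); }
  std::size_t NumLiveWorkers() const { return live_workers_.size(); }

  // Called when a worker stops responding to the master's pings:
  // its partitions are reassigned to the surviving workers.
  void HandleFailure(int failed) {
    live_workers_.erase(failed);
    if (live_workers_.empty()) return;  // nobody left to take over
    std::vector<int> survivors(live_workers_.begin(), live_workers_.end());
    std::size_t next = 0;
    for (auto& entry : owner_) {
      if (entry.second == failed) entry.second = survivors[next++ % survivors.size()];
    }
  }

 private:
  std::set<int> live_workers_;  // the "List of Workers"
  std::map<int, int> owner_;    // partition -> worker
};
```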
![Page 24](https://reader030.vdocuments.mx/reader030/viewer/2022032605/56649e7e5503460f94b825a5/html5/thumbnails/24.jpg)
Applications
![Page 25](https://reader030.vdocuments.mx/reader030/viewer/2022032605/56649e7e5503460f94b825a5/html5/thumbnails/25.jpg)
Applications (Shortest Path)
Figure: example weighted graph (edge weights 2, 1, 5, 3)
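Single-source shortest paths fits the vertex-centric model well: each vertex keeps its best-known distance, takes the minimum over incoming messages, and only when that improves does it update its value and send distance + edge weight along each out-edge; otherwise it stays halted until a message wakes it. The sketch follows that shape in a sequential simulation. The graph in the usage example is made up, since the figure's edges cannot be recovered.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

constexpr int kInf = std::numeric_limits<int>::max();

// out_edges[v] holds (target, weight) pairs with non-negative weights.
std::vector<int> ShortestPaths(
    const std::vector<std::vector<std::pair<std::size_t, int>>>& out_edges,
    std::size_t source) {
  const std::size_t n = out_edges.size();
  std::vector<int> dist(n, kInf);  // vertex values start at "infinity"
  std::vector<std::vector<int>> inbox(n);
  for (int superstep = 0;; ++superstep) {
    std::vector<std::vector<int>> outbox(n);
    bool sent = false;
    for (std::size_t v = 0; v < n; ++v) {
      // Compute(): the source proposes 0 in superstep 0; others use messages.
      int min_dist = (superstep == 0 && v == source) ? 0 : kInf;
      for (int m : inbox[v]) min_dist = std::min(min_dist, m);
      if (min_dist < dist[v]) {
        dist[v] = min_dist;
        for (const auto& [target, weight] : out_edges[v]) {
          outbox[target].push_back(min_dist + weight);
          sent = true;
        }
      }
      // Otherwise the vertex votes to halt and waits for new messages.
    }
    if (!sent) break;  // all vertices halted and no messages in transit
    inbox = std::move(outbox);
  }
  return dist;
}
```

For edges 0→1 (5), 0→2 (1), 2→1 (2), 1→3 (1), 2→3 (5) and source 0, the distances are 0, 3, 1, 4, and an unreachable vertex keeps `kInf`. A min-combiner could shrink the message traffic without changing the result.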
![Page 26](https://reader030.vdocuments.mx/reader030/viewer/2022032605/56649e7e5503460f94b825a5/html5/thumbnails/26.jpg)
Experiments
![Page 27](https://reader030.vdocuments.mx/reader030/viewer/2022032605/56649e7e5503460f94b825a5/html5/thumbnails/27.jpg)
Experiments (Description)
• Measure the execution times of Pregel running the Single-Source Shortest Path algorithm.
• Use a cluster of 300 multicore commodity PCs.
• Run Pregel on binary tree graphs and on a more realistic, randomly distributed graph.
• Results do not include initialization, graph generation, or result verification times.
• Failure recovery is not included, which reduces overhead.
![Page 28](https://reader030.vdocuments.mx/reader030/viewer/2022032605/56649e7e5503460f94b825a5/html5/thumbnails/28.jpg)
![Page 29](https://reader030.vdocuments.mx/reader030/viewer/2022032605/56649e7e5503460f94b825a5/html5/thumbnails/29.jpg)
![Page 30](https://reader030.vdocuments.mx/reader030/viewer/2022032605/56649e7e5503460f94b825a5/html5/thumbnails/30.jpg)
![Page 31](https://reader030.vdocuments.mx/reader030/viewer/2022032605/56649e7e5503460f94b825a5/html5/thumbnails/31.jpg)
Conclusion
• Pregel is a model suitable for large-scale graph computing, with a production-quality, scalable, and fault-tolerant implementation.
• Programs are expressed as a sequence of iterations (supersteps); in each one, a vertex can receive messages sent in the previous iteration, send messages to other vertices, and modify its own state and that of its outgoing edges.
• The implementation is flexible enough to express a broad set of algorithms.