Piccolo: Building fast distributed programs with partitioned tables Russell Power Jinyang Li New York University


Page 1:

Piccolo: Building fast distributed programs with partitioned tables

Russell Power Jinyang Li New York University

Page 2:

Motivating Example: PageRank

for each node X in graph:
    for each edge X->Z:
        next[Z] += curr[X]

Repeat until convergence

[Animation: input graph A->B,C,D; B->E; C->D, with Curr and Next rank tables updated each iteration, e.g. Curr A:0.12 B:0.15 C:0.2 with Next zeroed, then Curr/Next A:0.2 B:0.16 C:0.21, then A:0.25 B:0.17 C:0.22.]

Fits in memory!
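The update rule above can be sketched as a plain single-machine loop (a minimal sketch with hypothetical names, not the Piccolo API; no damping factor, and rank is split by out-degree as in the later Piccolo kernel):

```python
# Toy single-machine PageRank matching the slide's update rule: each node
# pushes its current rank, split evenly, to its out-neighbors, then the
# curr and next tables swap roles.
def pagerank(graph, iters):
    curr = {n: 1.0 for n in graph}
    for _ in range(iters):
        nxt = {n: 0.0 for n in graph}          # next.clear()
        for x, outs in graph.items():
            for z in outs:
                nxt[z] += curr[x] / len(outs)  # next[Z] += curr[X] / |out(X)|
        curr, nxt = nxt, curr                  # swap(curr, next)
    return curr

# The slide's example graph: A->B,C,D  B->E  C->D
graph = {"A": ["B", "C", "D"], "B": ["E"], "C": ["D"], "D": [], "E": []}
```

After one iteration on this graph, A (which has no in-edges) drops to 0 while D has collected rank from both A and C.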

Page 3:

PageRank in MapReduce

[Diagram: MapReduce jobs 1, 2, 3 chained through distributed storage; each reads the graph stream (A->B,C, B->D) and rank stream (A:0.1, B:0.2) and writes a new rank stream (A:0.1, B:0.2) back.]

  Data flow models do not expose global state.


Page 5:

PageRank With MPI/RPC

[Diagram: workers 1, 2, 3, each storing a graph partition (A->B,C; B->D; C->E,F) and its ranks (A: 0; B: 0; C: 0) in memory and exchanging messages directly.]

User explicitly programs communication

Page 6:

Piccolo’s Goal: Distributed Shared State

[Diagram: workers 1, 2, 3 read/write shared Graph (A->B,C; B->D; ...) and Ranks (A: 0; B: 0; ...) tables.]

Distributed in-memory state

Page 7:

Piccolo’s Goal: Distributed Shared State

[Diagram: each worker holds one graph partition (A->B,C; B->D; C->E,F) and the matching rank partition (A: 0; B: 0; C: 0).]

Piccolo runtime handles communication

Page 8:

Ease of use Performance

Page 9:

Page 10:

Talk outline

  Motivation
  Piccolo's Programming Model
  Runtime Scheduling
  Evaluation

Page 11:

Programming Model

[Diagram: workers 1, 2, 3 share Graph (A->B,C; B->D; ...) and Ranks (A: 0; B: 0; ...) tables, accessed via get/put and update/iterate rather than raw read/write.]

Implemented as library for C++ and Python
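As a rough in-process sketch of that table interface (hypothetical stand-in mirroring the slide's get/put/update/iterate; real Piccolo tables are partitioned across machines):

```python
# Single-process stand-in for a Piccolo-style table. update() merges a new
# value with the old one via a user-supplied accumulation function instead
# of overwriting it, which is how concurrent writers avoid conflicts.
class Table:
    def __init__(self, accumulate=None):
        self._data = {}
        self._accumulate = accumulate  # e.g. lambda old, upd: old + upd

    def put(self, key, value):
        self._data[key] = value       # blind overwrite

    def get(self, key):
        return self._data[key]

    def update(self, key, value):
        # NewValue = Accum(OldValue, Update), if an accumulator was given
        if self._accumulate is not None and key in self._data:
            self._data[key] = self._accumulate(self._data[key], value)
        else:
            self._data[key] = value

    def iterate(self):
        return iter(self._data.items())
```

For example, two update("A", ...) calls with a sum accumulator merge into one value, while put() always overwrites.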

Page 12:

Naïve PageRank with Piccolo

curr = Table(key=PageID, value=double)
next = Table(key=PageID, value=double)

def main():  # run by a single controller; launches jobs in parallel
    for i in range(50):
        launch_jobs(NUM_MACHINES, pr_kernel, graph, curr, next)
        swap(curr, next)
        next.clear()

def pr_kernel(graph, curr, next):  # jobs run by many machines
    i = my_instance
    n = len(graph) / NUM_MACHINES
    for s in graph[(i-1)*n : i*n]:
        for t in s.out:
            next[t] += curr[s.id] / len(s.out)

Page 13:

Naïve PageRank is Slow

[Diagram: workers 1, 2, 3, each holding graph (A->B,C; B->D; C->E,F) and rank partitions; gets and puts frequently target remote partitions.]

Page 14:

PageRank: Exploiting Locality

# Control table partitioning, co-locate tables, co-locate execution with tables
curr = Table(…, partitions=100, partition_by=site)
next = Table(…, partitions=100, partition_by=site)
group_tables(curr, next, graph)

def pr_kernel(graph, curr, next):
    for s in graph.get_iterator(my_instance):
        for t in s.out:
            next[t] += curr[s.id] / len(s.out)

def main():
    for i in range(50):
        launch_jobs(curr.num_partitions, pr_kernel,
                    graph, curr, next, locality=curr)
        swap(curr, next)
        next.clear()
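The effect of partition_by=site can be sketched as a hash on the page's site, so all of a site's pages (and their rows in curr, next, and graph) land in the same partition. This is a hypothetical helper; Piccolo's real partitioning policy is user-pluggable:

```python
# Sketch of site-based partitioning: pages from the same site hash to the
# same partition, so a kernel instance running against that partition
# resolves most of its curr/next lookups locally. Assumes page ids look
# like "site/path".
def site_of(page_id):
    return page_id.split("/", 1)[0]

def partition_of(page_id, num_partitions=100):
    return hash(site_of(page_id)) % num_partitions
```

Since intra-site links dominate the web graph, most rank updates from a page then stay inside its own partition.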

Page 15:

Exploiting Locality

[Diagram: workers 1, 2, 3 with graph and rank partitions; get and put traffic shown between workers.]

Page 16:

Exploiting Locality

[Diagram: with tables and execution co-located, each worker's gets and puts stay within its own partition.]

Page 17:

Synchronization

[Diagram: two workers concurrently write the same key: put(a=0.3) from one and put(a=0.2) from the other.]

How to handle synchronization?

Page 18:

Synchronization Primitives

  Avoid write conflicts with accumulation functions
    NewValue = Accum(OldValue, Update)
    e.g. sum, product, min, max
  Global barriers are sufficient
  Tables provide release consistency
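Because an accumulation function is commutative and associative, the merged result does not depend on the order in which concurrent updates arrive; a quick sketch:

```python
# Merge a stream of updates into one cell with NewValue = Accum(OldValue, Update).
# For commutative/associative Accum (sum, product, min, max), any arrival
# order yields the same final value, so concurrent writers cannot conflict.
def apply_updates(old, updates, accum):
    value = old
    for u in updates:
        value = accum(value, u)
    return value
```

For instance, updates of +0.2 and +0.3 under sum accumulation produce 0.5 in either order, which is exactly why the runtime can apply buffered updates whenever it likes.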

Page 19:

PageRank: Efficient Synchronization

curr = Table(…, partition_by=site, accumulate=sum)   # accumulation via sum
next = Table(…, partition_by=site, accumulate=sum)
group_tables(curr, next, graph)

def pr_kernel(graph, curr, next):
    for s in graph.get_iterator(my_instance):
        for t in s.out:
            next.update(t, curr.get(s.id) / len(s.out))  # update invokes accumulation

def main():
    for i in range(50):
        handle = launch_jobs(curr.num_partitions, pr_kernel,
                             graph, curr, next, locality=curr)
        barrier(handle)  # explicitly wait between iterations
        swap(curr, next)
        next.clear()

Page 20:

Efficient Synchronization

[Diagram: workers send update(a, 0.2) and update(a, 0.3) instead of conflicting puts; the runtime computes the sum.]

Workers buffer updates locally (release consistency)

Page 21:

Table Consistency

[Diagram: same update scenario; buffered updates are merged at the destination partition, keeping the table consistent.]

Page 22:

PageRank with Checkpointing

curr = Table(…, partition_by=site, accumulate=sum)
next = Table(…, partition_by=site, accumulate=sum)
group_tables(curr, next)

def pr_kernel(graph, curr, next):
    for s in graph.get_iterator(my_instance):
        for t in s.out:
            next.update(t, curr.get(s.id) / len(s.out))

def main():
    curr, userdata = restore()  # restore previous computation
    last = userdata.get('iter', 0)
    for i in range(last, 50):
        handle = launch_jobs(curr.num_partitions, pr_kernel,
                             graph, curr, next, locality=curr)
        cp_barrier(handle, tables=(next,), userdata={'iter': i})
        swap(curr, next)
        next.clear()

User decides which tables to checkpoint and when
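A toy version of the restore()/cp_barrier() pair can be sketched as follows (assumption: a pickle file stands in for Piccolo's checkpoint storage, and these single-process helpers only mimic the slide's names; the real system snapshots live tables with Chandy-Lamport):

```python
# Toy user-driven checkpointing: cp_barrier() persists table contents plus
# user metadata (e.g. the iteration counter); restore() reloads them on
# restart so the computation resumes where it left off.
import os
import pickle
import tempfile

CKPT_PATH = os.path.join(tempfile.gettempdir(), "piccolo_demo_ckpt.pkl")

def cp_barrier(tables, userdata):
    with open(CKPT_PATH, "wb") as f:
        pickle.dump({"tables": tables, "userdata": userdata}, f)

def restore():
    if not os.path.exists(CKPT_PATH):
        return {}, {}   # fresh start: nothing to restore
    with open(CKPT_PATH, "rb") as f:
        state = pickle.load(f)
    return state["tables"], state["userdata"]
```

After a crash, restore() hands back the last checkpointed ranks and the saved 'iter' value, so main() restarts the loop from that iteration instead of iteration 0.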

Page 23:

Recovery via Checkpointing

[Diagram: workers 1, 2, 3 write their graph and rank partitions to distributed storage as a checkpoint.]

Runtime uses the Chandy-Lamport snapshot protocol

Page 24:

Talk Outline

  Motivation
  Piccolo's Programming Model
  Runtime Scheduling
  Evaluation

Page 25:

Load Balancing

[Diagram: master tracks each worker's partitions (P1-P6) and pending jobs (J1-J6). When worker 3 falls behind on partition P6, the master warns "Other workers are updating P6! Pause updates!" before migrating P6's pending job to an idle worker.]

Master coordinates work stealing
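The stealing step can be sketched as a toy master decision (assumption: this simplifies Piccolo's actual protocol, which must also pause in-flight updates to the stolen partition before migrating it):

```python
# Toy master-coordinated work stealing. Each worker has a queue of pending
# jobs; when one worker drains its queue, the master moves a not-yet-started
# job from the most loaded worker to the idle one.
from collections import deque

def steal_one(queues):
    idle = [w for w, q in queues.items() if not q]
    if not idle:
        return None                      # everyone still has work
    donor = max(queues, key=lambda w: len(queues[w]))
    if len(queues[donor]) <= 1:
        return None                      # never steal a job that may be running
    job = queues[donor].pop()            # take from the tail: not yet started
    queues[idle[0]].append(job)
    return job
```

Stealing from the tail is the key design choice: the head job may already be executing, but tail jobs are guaranteed idle and safe to migrate.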

Page 26:

Talk Outline

  Motivation
  Piccolo's Programming Model
  System Design
  Evaluation

Page 27:

Piccolo is Fast

  NYU cluster: 12 nodes, 64 cores
  100M-page graph

Main Hadoop overheads:
  Sorting
  HDFS
  Serialization

[Bar chart: PageRank iteration time (seconds, 0-400) for 8, 16, 32, and 64 workers, Hadoop vs. Piccolo.]

Page 28:

Piccolo Scales Well

  EC2 cluster, linearly scaled input graph (1 billion pages at 200 workers)

[Line chart: PageRank iteration time (seconds, 0-70) for 12, 24, 48, 100, and 200 workers, compared against ideal scaling.]

Page 29:

Other applications

  Iterative applications
    N-body simulation
    Matrix multiply
  Asynchronous applications
    Distributed web crawler (no straightforward Hadoop implementation)

Page 30:

Related Work

  Data flow: MapReduce, Dryad
  Tuple spaces: Linda, JavaSpaces
  Distributed shared memory: CRL, TreadMarks, Munin, Ivy; UPC, Titanium

Page 31:

Conclusion

  Distributed shared table model
  User-specified policies provide for:
    Effective use of locality
    Efficient synchronization
    Robust failure recovery

Page 32:

Gratuitous Cat Picture

I can haz kwestions?

Try it out: piccolo.news.cs.nyu.edu