TRANSCRIPT
Stale Synchronous Parallel Iterations on Flink
TRAN Nam-Luc / Engineer @EURA NOVA Research & Development
FLINK FORWARD 2015, BERLIN, GERMANY, OCTOBER 2015
EURA NOVA? OUR INNOVATION-DRIVEN MODEL & DISRUPTIVE CULTURE
“EURA NOVA is a team of passionate IT experts devoted to providing knowledge & skills to people with great ideas”
Data science, distributed computing, software engineering, big data.
Our people: 40 employees, from business engineers to data scientists; 7 freelancers; 3 founding partners.
Key figures, our research since 2009: 2 PhD theses & 18 master’s theses with 4 renowned universities; 20 publications in conferences as lecturer; 4 large R&D projects; 3 open-source products.
THE BIG PICTURE
How not to synchronize workers
[Figure: ten workers at a synchronization barrier; a single straggler keeps all the others waiting]
Bulk Synchronous Parallelism synchronizes threads after each iteration.
There are always stragglers in a cluster.
In large clusters, that causes a lot of workers waiting!
Gonna dig me a hole (gonna dig me a hole),
Gonna put a nerd in it (gonna put a nerd in it),
Gonna take a firecracker (gonna take a firecracker)…
CONTRIBUTION
1. STALE SYNCHRONOUS PARALLEL ITERATIONS
Tackling the straggler problem within Flink
2. DISTRIBUTED FRANK-WOLFE ALGORITHM
Applied to LASSO regression as a use case
THE STRAGGLER PROBLEM
There are stragglers in distributed processing frameworks…
→ Hardware heterogeneity
→ Skewed data distribution
→ Garbage collection
…especially in the context of data center operating systems: iteration time is not predictable, and stragglers are costly to reschedule!
Distribution of iterative-convergent algorithms:
BULK VS STALE SYNCHRONOUS
[Figure: classic execution with an explicit synchronization barrier vs. stale synchronous execution, where workers may run ahead on stale state]
PARAMETER SERVER
How to keep workers up-to-date?
[Figure: stale synchronous workers crossing the explicit synchronization barrier, kept up-to-date through a parameter server]
1. SSP iteration control model
2. Parameter server
INTEGRATION WITH FLINK
What does Flink need to enable SSP?
if clock_i <= cluster-wide clock + staleness:
    do iteration
    ++clock_i, then send clock_i to the synchronization sink
else:
    wait until clock_i <= cluster-wide clock + staleness
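That check can be sketched as a one-line predicate in Python (a minimal single-process illustration; the name `ssp_can_proceed` is ours, not Flink API):

```python
def ssp_can_proceed(worker_clock: int, cluster_clock: int, staleness: int) -> bool:
    """A worker may start its next iteration only while it is at most
    `staleness` clocks ahead of the slowest worker (the cluster-wide clock)."""
    return worker_clock <= cluster_clock + staleness

# With staleness = 2 and the slowest worker at clock 3, a worker at clock 5
# may still iterate, but a worker at clock 6 must wait.
assert ssp_can_proceed(5, 3, 2)      # within bounds: iterate, then ++clock
assert not ssp_can_proceed(6, 3, 2)  # too far ahead: wait
```

Note that staleness = 0 degenerates to plain bulk synchronous behavior: no worker may be ahead of the slowest one.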
ITERATION CONTROL MODEL IN FLINK
Worker p_i sends clock_i to the Clock Synchronization Sink and receives the cluster-wide clock back.
The Clock Synchronization Sink:
∙ store clock_i in C
∙ cluster-wide clock = min(C)
∙ broadcast cluster-wide clock if changed
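The sink’s three rules can be condensed into a small Python sketch (a single-process stand-in for the real Flink task; class and field names are illustrative):

```python
class ClockSynchronizationSink:
    """Tracks each worker's clock, keeps the cluster-wide clock as the
    minimum over all workers, and broadcasts it only when it changes."""
    def __init__(self, n_workers: int):
        self.C = [0] * n_workers   # last clock_i seen per worker
        self.cluster_clock = 0
        self.broadcasts = []       # record of broadcast values (stand-in for sending)

    def on_clock(self, worker: int, clock: int) -> None:
        self.C[worker] = clock                 # store clock_i in C
        new_clock = min(self.C)                # cluster-wide clock = min(C)
        if new_clock != self.cluster_clock:    # broadcast only on change
            self.cluster_clock = new_clock
            self.broadcasts.append(new_clock)

sink = ClockSynchronizationSink(3)
sink.on_clock(0, 1)
sink.on_clock(1, 1)
assert sink.cluster_clock == 0                 # worker 2 is still at clock 0
sink.on_clock(2, 1)
assert sink.cluster_clock == 1 and sink.broadcasts == [1]
```

Taking the minimum is what makes the staleness bound meaningful: the cluster-wide clock advances only when the slowest worker does.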
[Figure: bulk synchronous iteration control. Each worker runs an IterationHead → IterationIntermediate → IterationTail pipeline closed by a backchannel. Every IterationHead signals “worker done” to the IterationSynchronizationTask, which replies “all workers done” before the next superstep starts.]
[Figure: stale synchronous iteration control. The same IterationHead → IterationIntermediate → IterationTail pipelines now send a ClockEvent carrying Clock p_i to the ClockSynchronizationTask, which broadcasts the cluster-wide clock back to every IterationHead.]
Bulk synchronous components and their stale synchronous counterparts:
SuperstepBarrier → ClockHolder
IterationHeadPACTTask → SSPIterationHeadPACTTask
SyncEventHandler → ClockSyncEventHandler
IterationSynchronizationTask → ClockSynchronizationTask
CONVERGENCE CHECK
BULK SYNCHRONOUS PARALLEL: convergence determined at the synchronization barrier.
STALE SYNCHRONOUS PARALLEL: convergence reached when no worker can improve the solution any more.
STALE SYNCHRONOUS API
Classic: dataSet.Iterate(nIterations)
SSP: dataSet.IterateWithSSP(nIterations, staleness)
Simple API:
RichMapFunctionWithParameterServer extends RichMapFunction {
    update(id, clock, parameter)
    get(id)
}
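The `update(id, clock, parameter)` / `get(id)` contract can be mimicked with a few lines of Python (a single-process stand-in, not the Flink implementation; resolving concurrent writes by keeping the highest clock per id is our illustrative assumption):

```python
class ParameterServer:
    """Stand-in for the shared model: each parameter id maps to the value
    carried by the highest clock seen so far (assumed conflict resolution)."""
    def __init__(self):
        self.store = {}  # id -> (clock, parameter)

    def update(self, id, clock, parameter):
        # Keep the freshest value per id; updates with an older clock are ignored.
        if id not in self.store or clock >= self.store[id][0]:
            self.store[id] = (clock, parameter)

    def get(self, id):
        return self.store[id][1]

ps = ParameterServer()
ps.update("alpha", clock=1, parameter=[0.0, 0.5])
ps.update("alpha", clock=0, parameter=[9.9, 9.9])  # older clock: ignored
assert ps.get("alpha") == [0.0, 0.5]
```

In the Flink version this state lives behind the RichMapFunctionWithParameterServer methods shown above rather than in a local dictionary.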
PARAMETER SERVER
Architecture: the shared model lives in a data grid that all workers read from and update.
[Figure: four workers connected to a data grid holding the shared model]
DISTRIBUTED FRANK-WOLFE ALGORITHM
Solving the current optimization problem: the solution is a linear combination of atoms with sparse α coefficients.
Distributed version (Bellet et al. 2015):
[Figure: atoms Atom 1 … Atom n partitioned across workers W1, W2, W3]
1. Local selection of atoms
2. Global consensus
3. α coefficients update
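The three steps can be sketched as a single-process Python simulation on a toy LASSO-style instance (a sketch under stated assumptions, not the paper’s implementation: the problem sizes are made up, and the step size uses an exact line search, valid for a quadratic objective, so each step provably never increases the objective):

```python
import numpy as np

def distributed_fw_step(A, b, alpha, partitions, tau):
    """One Frank-Wolfe step on min 0.5*||A @ alpha - b||^2 subject to
    ||alpha||_1 <= tau, with the atoms (columns of A) split across workers."""
    grad = A.T @ (A @ alpha - b)                     # gradient w.r.t. alpha
    # 1. Local selection: each worker picks its best atom by gradient magnitude.
    local_picks = [max(part, key=lambda j: abs(grad[j])) for part in partitions]
    # 2. Global consensus: keep the single best atom across all workers.
    j = max(local_picks, key=lambda j: abs(grad[j]))
    # 3. alpha update toward the l1-ball vertex -tau*sign(grad_j)*e_j,
    #    with an exact line search (closed form for a quadratic objective).
    s = np.zeros_like(alpha)
    s[j] = -tau * np.sign(grad[j])
    d = s - alpha
    Ad = A @ d
    den = float(Ad @ Ad)
    gamma = min(max(-float(grad @ d) / den, 0.0), 1.0) if den > 0.0 else 0.0
    return alpha + gamma * d

rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 9)), rng.standard_normal(20)
partitions = [range(0, 3), range(3, 6), range(6, 9)]  # 3 workers, 3 atoms each
objective = lambda a: 0.5 * np.sum((A @ a - b) ** 2)
alpha = np.zeros(9)
start = objective(alpha)
for _ in range(30):
    alpha = distributed_fw_step(A, b, alpha, partitions, tau=2.0)
assert objective(alpha) < start              # the objective went down
assert np.abs(alpha).sum() <= 2.0 + 1e-9     # iterate stays inside the l1 ball
```

The iterate stays in the ℓ1 ball by construction: each update is a convex combination of the current point and a vertex of the ball, which is what keeps the α coefficients sparse.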
Stale synchronous version:
[Figure: atoms Atom 1 … Atom n partitioned across workers W1, W2, W3, with a parameter server holding the α coefficients]
1. Get α coefficients from the parameter server
2. Local selection of atoms
3. Compute α coefficients from locally selected atoms
4. Update α coefficients to the parameter server
Repeat while within staleness bounds.
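Steps 1-4 and the staleness rule combine into the following control-flow sketch (pure Python, single process; `improve` stands in for local atom selection plus coefficient computation, and returning `"waiting"` stands in for a real worker blocking until the slowest catches up):

```python
def ssp_worker_loop(worker, server, clocks, staleness, n_iterations, improve):
    """One worker's SSP loop over the distributed Frank-Wolfe steps:
    pull alpha, improve it locally, push it back, advance the clock,
    but never run more than `staleness` clocks ahead of the slowest worker."""
    while clocks[worker] < n_iterations:
        if clocks[worker] > min(clocks) + staleness:
            return "waiting"                 # out of staleness bounds: block
        alpha = server["alpha"]              # 1. get coefficients from the server
        alpha = improve(worker, alpha)       # 2.-3. local selection + new coefficients
        server["alpha"] = alpha              # 4. update the parameter server
        clocks[worker] += 1
    return "done"

server = {"alpha": 0}
clocks = [0, 0]
bump = lambda w, a: a + 1                    # trivial stand-in for a local FW update

assert ssp_worker_loop(0, server, clocks, 1, 3, bump) == "waiting"
assert clocks == [2, 0]                      # worker 0 ran ahead by the staleness bound
assert ssp_worker_loop(1, server, clocks, 1, 3, bump) == "done"
assert ssp_worker_loop(0, server, clocks, 1, 3, bump) == "done"
assert server["alpha"] == 6                  # 2 workers x 3 iterations all applied
```

With staleness = 1 the fast worker gets exactly one iteration of slack before it must wait, which is the behavior the iteration control model above enforces via the cluster-wide clock.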
See our full paper for:
∙ full implementation details
∙ properties
∙ application to LASSO regression
∙ convergence proof
N-L. Tran, T. Peel, S. Skhiri, “Distributed Frank-Wolfe under Pipelined Stale Synchronous Parallelism”, Proceedings of IEEE BigData 2015, Santa Clara, November 2015.
EXPERIMENTS
Application to LASSO regression
∙ Random sparse 1,000 × 10,000 matrices
∙ Sparsity ratio = 0.001
∙ Generated load: at any time, 1 random node under 100% load for 12 seconds
∙ 5 nodes, 2 GHz, 3 GB RAM
RECAP
Stragglers in a cluster are an issue.
Mitigate them with Stale Synchronous Parallel iterations.
WANNA TRY IT OUT?
Pull request #967: Stale Synchronous Parallel iterations + API
Pull request #1101: Frank-Wolfe algorithm + LASSO regression
AGENDA
1. STALE SYNCHRONOUS PARALLEL ITERATIONS
∙ The straggler problem
∙ BSP vs SSP
∙ Integration with Flink
∙ Iteration control model
∙ API
2. DISTRIBUTED FRANK-WOLFE ALGORITHM
∙ Problem statement
∙ Application: LASSO regression
∙ Experiments