TRANSCRIPT
Concurrency Control for Machine Learning
Joseph E. Gonzalez, Post-doc, UC Berkeley AMPLab
[email protected]
In Collaboration with Xinghao Pan, Stefanie Jegelka, Tamara Broderick, Michael I. Jordan
Serial Machine Learning Algorithm
(Diagram: Data → Model Parameters)

Parallel Machine Learning
(Diagram: Data → Model Parameters)

Parallel Machine Learning
Concurrency: more machines = less time
Correctness: serial equivalence
(Diagram: Data → Model Parameters, with conflicts)

Coordination-free
(Diagram: Data → Model Parameters)

Concurrency Control
Serializability
(Diagram: Data → Model Parameters)
Research Summary
Coordination Free (e.g., Hogwild!):
Provably fast and correct under key assumptions.
Concurrency Control (e.g., Mutual Exclusion):
Provably correct and fast under key assumptions.
Research Focus
Optimistic Concurrency Control to parallelize:
Non-Parametric Clustering
and
Submodular Maximization

Optimistic Concurrency Control
• Optimistic updates
• Validation: detect conflict
• Resolution: fix conflict
(Diagram: Data → Model Parameters, with conflicts)
Hsiang-Tsung Kung and John T. Robinson. On optimistic methods for concurrency control. ACM Transactions on Database Systems (TODS), 6(2):213–226, 1981.
(Diagram: Correctness vs. Concurrency trade-off)
Example: Serial DP-means Clustering
Sequential!
Brian Kulis and Michael I. Jordan. Revisiting k-means: New algorithms via Bayesian nonparametrics. In Proceedings of the 29th International Conference on Machine Learning, 2012.
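The serial algorithm can be sketched in a few lines. This is a minimal sketch, assuming Euclidean data and using λ as a plain distance threshold for spawning new clusters (the Kulis–Jordan objective penalizes squared distance; the function name and iteration count here are illustrative):

```python
import math

def dp_means(points, lam, n_iters=10):
    """Serial DP-means (sketch): like k-means, but a point farther than lam
    from every existing center spawns a new cluster, so the number of
    clusters is chosen by the data rather than fixed up front."""
    centers = [list(points[0])]
    assignments = [0] * len(points)
    for _ in range(n_iters):
        # Assignment step: inherently sequential, since later points must
        # see clusters created by earlier points.
        for i, x in enumerate(points):
            dists = [math.dist(x, c) for c in centers]
            j = min(range(len(centers)), key=dists.__getitem__)
            if dists[j] > lam:
                centers.append(list(x))     # spawn a new cluster at x
                assignments[i] = len(centers) - 1
            else:
                assignments[i] = j
        # Update step: move each center to the mean of its assigned points.
        for j in range(len(centers)):
            members = [points[i] for i in range(len(points)) if assignments[i] == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignments
```

The sequential dependence in the assignment step is exactly what makes naive parallelization unsafe: two processors can simultaneously create duplicate clusters for nearby points.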
Validation
Resolution: first proposal wins
Assumption: no new cluster created nearby
Example: OCC DP-means Clustering
Optimistic Concurrency Control for DP-means
Theorem: OCC DP-means is serializable.
Corollary: OCC DP-means preserves theoretical properties of DP-means.
Theorem: The expected overhead of OCC DP-means, in terms of the number of rejected proposals, does not depend on the size of the data set.
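The OCC scheme (optimistic proposals, validation, first-proposal-wins resolution) can be sketched as a single epoch over batched data. This is a minimal sketch in which batches stand in for the per-processor partitions; function and variable names are illustrative:

```python
import math

def occ_dp_means_epoch(batches, centers, lam):
    """One epoch of OCC DP-means (sketch). Each batch is processed
    optimistically against a snapshot of the centers: points farther than
    lam from every snapshot center become new-cluster proposals. A serial
    validation phase then accepts a proposal only if no nearby cluster was
    created in the meantime (resolution: first proposal wins)."""
    snapshot = [list(c) for c in centers]
    proposals = []
    # Optimistic phase: in the real system each batch runs on its own worker.
    for batch in batches:
        for x in batch:
            if all(math.dist(x, c) > lam for c in snapshot):
                proposals.append(x)
    # Validation phase: serial, in proposal order.
    for x in proposals:
        if all(math.dist(x, c) > lam for c in centers):
            centers.append(list(x))   # accepted: a new cluster is created
        # else rejected: a cluster was created nearby first; the point
        # simply joins that nearby cluster on a later pass.
    return centers
```

Because validation only touches proposals, not the whole data set, the rejection overhead tracks the number of conflicting proposals rather than the number of points.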
(Diagram: Correctness vs. Concurrency)
Evaluation: Amazon EC2
(Plot: Runtime in Seconds per Complete Pass over Data vs. Number of Machines; OCC DP-means Runtime against Projected Linear Scaling)
~140 million data points; 1, 2, 4, 8 machines
Optimistic Concurrency Control to parallelize:
Non-Parametric Clustering (Summary)
Submodular Maximization (Next)
Motivating Example: Bidding on Keywords
Keywords (A–H): Apple, iPhone, Android, Games, xBox, Samsung, Microwave, Appliances
Common Queries (1–8): “How big is Apple iPhone”, “iPhone vs Android”, “best Android and iPhone games”, “Samsung sues Apple over iPhone”, “Samsung Microwaves”, “Appliance stores in SF”, “Playing games on a Samsung TV”, “xBox game of the year”
(Diagram: bipartite graph linking keywords A–H to the queries they cover)
Motivating Example: Bidding on Keywords
Keyword costs: A $2, B $5, C $1, D $2, E $5, F $1, G $4, H $2
Query values: 1 $2, 2 $2, 3 $4, 4 $4, 5 $3, 6 $6, 7 $5, 8 $1
Motivating Example: Bidding on Keywords
Purchase B (cost $5): it covers queries 1–4, with revenue $12. Profit: $7.
Motivating Example: Bidding on Keywords
Also purchase C (cost +$1): it covers queries 2 and 3, which are already covered, so revenue stays $12. Profit drops to $6.
Submodularity = Diminishing Returns
Motivating Example: Bidding on Keywords
A better purchase set: Cost $10, Revenue $20, Profit $10.
Motivating Example: Bidding on Keywords
Cost $10, Revenue $20, Profit $10. Can swapping keywords in and out (−$4 here, +$6 there) do better? Finding the optimal set is NP-Hard in general.
Submodular Maximization
• NP-Hard in general
• Buchbinder et al. [FOCS’12] proposed the double greedy randomized algorithm, which achieves the optimal approximation ratio in expectation.
Double Greedy Algorithm
Process keywords serially. Maintain two sets: X (initially empty) and Y (initially all keywords, A–F). For each keyword in turn, evaluate f(·, X, Y): the marginal gain of adding the keyword to X and the marginal gain of removing it from Y. A uniform random draw ("rand") on [0, 1], compared against a threshold set by these gains, decides between Add X and Rem. Y. When X and Y meet, the common set is the keywords to purchase.
(Animation: A is added to X; B is removed from Y; C is added to X, so the keywords to purchase so far are A and C.)
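The serial procedure above can be sketched directly. This is a minimal sketch, assuming f is a nonnegative submodular set function (such as the profit of a purchase set); the function and parameter names are illustrative:

```python
import random

def double_greedy(elements, f, seed=None):
    """Randomized double greedy (Buchbinder et al., FOCS'12, sketch) for
    unconstrained submodular maximization. X starts empty and Y starts with
    every element; each element is either added to X or removed from Y, so
    the two sets meet, and the common set achieves 1/2 * OPT in expectation
    for nonnegative submodular f."""
    rng = random.Random(seed)
    X, Y = set(), set(elements)
    for u in elements:
        a = max(f(X | {u}) - f(X), 0.0)        # marginal gain of Add X
        b = max(f(Y - {u}) - f(Y), 0.0)        # marginal gain of Rem. Y
        threshold = a / (a + b) if a + b > 0 else 1.0
        if rng.random() < threshold:           # the "rand" draw on [0, 1]
            X.add(u)
        else:
            Y.remove(u)
    return X   # == Y: the elements to purchase
```

When both marginal gains are zero, either choice is fine; the convention here of adding the element is one fixed tie-break that preserves the guarantee.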
Concurrency Control Double Greedy Algorithm
Process keywords in parallel. Sets X and Y are shared by all processors. Within each processor, f(·, X_bnd, Y_bnd) is evaluated on published bounds: X_bnd is a subset of the true X, and Y_bnd is a superset of the true Y. The bounds induce an uncertainty region inside [0, 1]: if the random draw lands outside it, the decision (Add X or Rem. Y) is safe; if it lands inside, the decision is unsafe and must be validated.
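The safe/unsafe test can be made concrete. This is a minimal sketch, assuming f is submodular, so by diminishing returns the marginal gains computed on X_bnd and on Y_bnd bracket the true gains; the function names are illustrative:

```python
def gain(f, S, u):
    """Marginal gain f(S + u) - f(S)."""
    return f(S | {u}) - f(S)

def cc_decide(u, f, X_bnd, Y_bnd, r):
    """One CC double greedy decision (sketch). X_bnd is a subset of the true
    X and Y_bnd a superset of the true Y, so by diminishing returns the
    gains seen by this processor bracket the true ones, and the true Add-X
    threshold a/(a+b) lies in [t_lo, t_hi]: the uncertainty region."""
    g_hi = gain(f, X_bnd, u)          # largest the true add-gain can be
    g_lo = gain(f, Y_bnd - {u}, u)    # smallest the true add-gain can be
    a_hi, a_lo = max(g_hi, 0.0), max(g_lo, 0.0)
    b_hi, b_lo = max(-g_lo, 0.0), max(-g_hi, 0.0)
    t_hi = a_hi / (a_hi + b_lo) if a_hi + b_lo > 0 else 1.0
    t_lo = a_lo / (a_lo + b_hi) if a_lo + b_hi > 0 else 1.0
    if r < t_lo:
        return "add"        # safe: r is below every possible threshold
    if r >= t_hi:
        return "remove"     # safe: r is above every possible threshold
    return "validate"       # unsafe: must be checked against the true X, Y
```

For a modular f the two bounds coincide and no decision ever needs validation; the more the gains depend on the shared state, the wider the uncertainty region.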
Concurrency Control Double Greedy Algorithm: System Design
Implemented in multicore (shared memory). A model server (the validator) holds Set X and Set Y and publishes bounds (X, Y) to the worker threads. Each thread evaluates f(·, X_bnd, Y_bnd) for its element against the published bounds. Safe decisions commit directly as transactions (e.g., Trx. Add D to X), while unsafe ones enter the validation queue at the model server, where they are either validated or failed (e.g., E fails validation).
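The validator's role in this design amounts to a serial replay. This is a minimal sketch, assuming each queued decision carries the worker's original uniform draw so the replay matches what the serial double greedy would have done; the queue layout and names are illustrative:

```python
from collections import deque

def run_validator(queue, f, X, Y):
    """Model-server validation loop (sketch). Unsafe decisions arrive in
    processing order as (element, original uniform draw) pairs; each is
    re-decided against the authoritative X and Y, exactly as the serial
    double greedy would have decided it, which is what makes the overall
    execution serializable."""
    while queue:
        u, r = queue.popleft()
        a = max(f(X | {u}) - f(X), 0.0)     # true gain of Add X
        b = max(f(Y - {u}) - f(Y), 0.0)     # true gain of Rem. Y
        threshold = a / (a + b) if a + b > 0 else 1.0
        if r < threshold:
            X.add(u)        # validated as Add X
        else:
            Y.remove(u)     # validated as Rem. Y
    return X, Y
```

Reusing the original draw, rather than drawing fresh randomness at validation time, is what keeps the parallel execution equivalent to one serial run.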
Provable Properties
Theorem: CC double greedy is serializable.
Corollary: CC double greedy preserves the optimal approximation guarantee of ½ OPT.
Lemma: CC has bounded overhead: set cover with costs: 2τ; sparse max cut: 2cτ/n.
(Diagram: Correctness vs. Concurrency)
Provable Properties: Coordination Free?
Theorem: CF double greedy is not serializable.
Lemma: CF double greedy achieves an approximation guarantee of ½ OPT − ¼ · (…), where the subtracted term depends on the uncertainty region and is of similar order to the CC overhead.
CF: no coordination overhead.
(Diagram: Correctness vs. Concurrency)
Early Results
Runtime and Strong-Scaling: Coordination Free vs. Concurrency Ctrl.
IT-2004: Italian Web-graph (41M vertices, 1.1B edges); UK-2005: UK Web-graph (39M vertices, 921M edges); Arabic-2005: Arabic Web-graph (22M vertices, 631M edges)

Coordination and Guarantees (same datasets)
(Plot: increase in coordination and decrease in objective; both are bad)
Summary
• New primitives for robust parallel algorithm design
  – Exploit properties in ML algorithms
• Introduced parallel algorithms for:
  – DP-Means
  – Submodular Maximization
• Future Work: Integrate with Velox Model Server