Distributed Machine Learning: Communication, Efficiency, and
Privacy
Avrim Blum
[RaviKannan60]
Joint work with Maria-Florina Balcan, Shai Fine, and Yishay Mansour
Carnegie Mellon University
Happy birthday Ravi!
And thank you for many enjoyable years working together on challenging problems where machine learning meets high-dimensional geometry.
This talk: algorithms for machine learning in a distributed, cloud-computing context.
Related to Ravi's interest in algorithms for cloud computing.
For full details see [Balcan-B-Fine-Mansour COLT'12].
Machine Learning
What is Machine Learning about?
– Making useful, accurate generalizations or predictions from data.
– Given access to a sample of some population, classified in some way, we want to learn a rule that will have high accuracy over the population as a whole.
Typical ML problems:
• Given a sample of images, classified as male or female, learn a rule to classify new images.
• Given a set of protein sequences, labeled by function, learn a rule to predict the functions of new proteins.
Distributed Learning
Many ML problems today involve massive amounts of data distributed across multiple locations: click data, customer data, scientific data. Each location has only a piece of the overall data pie, so in order to learn over the combined distribution D, the data holders will need to communicate.
A classic ML question: how much data is needed to learn a given type of function well?
These settings bring up a new question: how much communication? (Plus issues like privacy, etc.) That is the focus of this talk.
Distributed Learning: Scenarios
Two natural high-level scenarios:
1. Each location has data from the same distribution.
– So each could in principle learn on its own.
– But we want to use limited communication to speed up, ideally to the centralized learning rate. [Dekel, Gilad-Bachrach, Shamir, Xiao]
2. The overall distribution is arbitrarily partitioned.
– Learning without communication is impossible.
– This will be our focus here.
The distributed PAC learning model
• Goal is to learn an unknown function f ∈ C given labeled data from some probability distribution D.
• However, D is arbitrarily partitioned among k entities (players) 1, 2, …, k. [Already k = 2 is interesting.]
• Players can sample (x, f(x)) from their own Di, where D = (D1 + D2 + … + Dk)/k.
• Goal: learn a good rule over the combined D.
The distributed PAC learning model
Interesting special case to think about:
– k = 2.
– One player has the positives and one has the negatives.
– How much communication to learn, e.g., a good linear separator?
– In general, view k as small compared to the sample size needed for learning.
The distributed PAC learning model
Some simple baselines.
• Baseline #1: based on the fact that any class of VC-dimension d can be learned to error ε from O((d/ε) log(1/ε)) samples.
– Each player sends a 1/k fraction of this to player 1.
– Player 1 finds a good rule h over the combined sample and sends h to the others.
– Total: 1 round, O((d/ε) log(1/ε)) examples sent.
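As a concrete illustration, baseline #1 fits in a few lines. This is a hypothetical sketch, not code from the talk: `players` are sampling functions standing in for the Di, and `train` is any learner for the class.

```python
def baseline1(players, train, n_samples):
    """Baseline #1 sketch: each of the k players sends a 1/k fraction
    of the O((d/eps) log(1/eps)) samples to player 1, who trains a
    single rule h on the pooled data and broadcasts it back.

    `players`: list of sample() functions, each returning one labeled
    example (x, y) drawn from that player's distribution D_i (an
    assumed interface for this sketch).
    `train`: maps a list of examples to a hypothesis h.
    """
    k = len(players)
    pooled = []
    for sample in players:                      # each player contributes 1/k of the data
        pooled.extend(sample() for _ in range(n_samples // k))
    h = train(pooled)                           # player 1 learns h over the pooled sample
    return h                                    # h is then broadcast to all players
```

The communication cost is exactly the pooled sample plus one hypothesis, matching the 1-round bound on the slide.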
The distributed PAC learning model
Some simple baselines.
• Baseline #2: Suppose the function class has an online algorithm A with mistake bound M. E.g., the Perceptron algorithm learns linear separators of margin γ with mistake bound O(1/γ²).
The distributed PAC learning model
• Baseline #2, the protocol:
– Player 1 runs A and broadcasts its current hypothesis.
– If any player has a counterexample, it sends the example to player 1. Player 1 updates and re-broadcasts.
– At most M examples and rules are communicated in total.
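The counterexample-driven protocol can be sketched as follows, instantiated with the Perceptron as the online learner A (a minimal sketch assuming linearly separable data through the origin; names are illustrative):

```python
def mistake_bound_protocol(datasets, d):
    """Baseline #2 sketch: player 1 runs an online learner (here the
    Perceptron) and broadcasts its current hypothesis w.  Any player
    holding a counterexample sends it back; player 1 updates and
    re-broadcasts.  Communication: at most M examples and M hypotheses,
    where M is the learner's mistake bound.

    `datasets`: list of per-player lists of (x, y) pairs, x a length-d
    tuple and y in {-1, +1}; assumed linearly separable through the
    origin so the loop terminates.
    """
    w = [0.0] * d                                   # player 1's current hypothesis
    examples_sent = 0
    while True:
        counterexample = None
        for data in datasets:                       # each player checks w on its own data
            for x, y in data:
                if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                    counterexample = (x, y)         # misclassified: send to player 1
                    break
            if counterexample:
                break
        if counterexample is None:                  # every player is consistent: done
            return w, examples_sent
        x, y = counterexample
        w = [wi + y * xi for wi, xi in zip(w, x)]   # Perceptron update by player 1
        examples_sent += 1
```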
Dependence on 1/ε
So far we had linear dependence on d and 1/ε, or on M with no dependence on 1/ε. [ε = final error rate]
• Can you get O(d log 1/ε) examples of communication?
• Yes: distributed boosting.
Distributed Boosting
Idea:
• Run baseline #1 with ε = 1/4. [Everyone sends a small amount of data to player 1, enough to learn to error 1/4.]
• Get an initial rule h1 and send it to the others.
Distributed Boosting
Idea (continued):
• Players then reweight their Di to focus on the regions where h1 did poorly.
• Repeat.
• This is a distributed implementation of the Adaboost algorithm.
• Some additional low-order communication is needed too (players send their current performance level to player 1, who can then request more data from the players where the current rule h is doing badly).
• Key point: each round uses only O(d) samples and lowers the error multiplicatively.
Distributed Boosting
Final result:
• O(d) examples of communication per round, plus low-order extra bits.
• O(log 1/ε) rounds of communication.
• So, O(d log 1/ε) examples of communication in total, plus low-order extra info.
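The reweight-and-repeat loop is standard Adaboost; the sketch below shows the mechanics being distributed. It is a simplification, not the paper's protocol: all data sits in one process and is reweighted explicitly, so only the per-round pattern (small weighted sample in, weak rule h_t out) is illustrated.

```python
import math

def distributed_boosting(datasets, weak_learn, rounds):
    """Adaboost-style sketch of the distributed boosting idea: each
    round, examples the current rule gets wrong are up-weighted, a weak
    rule h_t is learned on the weighted data (by player 1) and
    broadcast, and the error drops multiplicatively per round, so
    O(log 1/eps) rounds suffice.  Labels are in {-1, +1}.
    """
    data = [ex for d in datasets for ex in d]     # conceptual combined sample
    w = [1.0] * len(data)                         # one weight per example
    hs, alphas = [], []
    for _ in range(rounds):
        h = weak_learn(data, w)                   # player 1 learns on weighted data
        err = sum(wi for wi, (x, y) in zip(w, data) if h(x) != y) / sum(w)
        err = min(max(err, 1e-9), 0.5 - 1e-9)     # guard the log below
        alpha = 0.5 * math.log((1 - err) / err)
        # players reweight locally: up-weight mistakes, down-weight correct points
        w = [wi * math.exp(alpha if h(x) != y else -alpha)
             for wi, (x, y) in zip(w, data)]
        hs.append(h)
        alphas.append(alpha)
    def final(x):                                 # weighted-majority combined rule
        s = sum(a if h(x) == 1 else -a for a, h in zip(alphas, hs))
        return 1 if s >= 0 else -1
    return final
```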
Agnostic learning (no perfect h)
[Balcan-Hanneke] give a robust halving algorithm that can be implemented in the distributed setting.
• Based on the analysis of a generalized active learning model.
• The algorithms are especially suited to the distributed setting.
Agnostic learning (no perfect h)
[Balcan-Hanneke] (continued):
• Get error 2·OPT(C) + ε using a total of only O(k log|C| log(1/ε)) examples.
• Not computationally efficient, but shows that logarithmic dependence is possible in principle.
Can we do better for specific classes of functions?
Interesting class: parity functions
Examples x ∈ {0,1}^d. f(x) = x·vf mod 2, for an unknown vf.
• Interesting already for k = 2.
• A classic communication lower bound for determining whether two subspaces intersect implies an Ω(d²)-bit lower bound to output a single good parity vector v.
• What if we allow rules that "look different"?
Interesting class: parity functions
Examples x ∈ {0,1}^d. f(x) = x·vf mod 2, for an unknown vf.
• Parity has the interesting property that:
(a) It can be learned outputting a parity vector vh. [Given a dataset S of size O(d/ε), just solve the linear system.]
(b) It can be learned using the sample S itself, in the reliable-useful model of Rivest-Sloan'88. [If x is in the subspace spanned by S, predict accordingly; else say "??".]
Interesting class: parity functions
• Algorithm (for k = 2):
– Each player i PAC-learns over Di to get a parity function gi, and also R-U learns to get a rule hi. Sends gi to the other player.
– Each then uses the rule: "if hi predicts, use it; else use g3-i."
– Can one extend this to k = 3?
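The reliable-useful half of the protocol (property (b)) is concrete enough to sketch. Since parity is linear over GF(2), any x in the span of the sample has a forced label. Representing d-bit examples as Python ints, an XOR basis with labels carried along gives the "predict or say ??" rule (an illustrative sketch, not code from the talk):

```python
def parity_ru_learner(samples):
    """Reliable-useful learner for parity: maintain an XOR basis of the
    labeled examples (x as a d-bit int, label y in {0,1}).  A query x
    in the span of the sample reduces to 0 and its label is forced by
    linearity; outside the span we return None ("??"), never guessing.
    """
    basis = {}                                # pivot bit -> (vector, label)
    for x, y in samples:
        while x:
            p = x.bit_length() - 1            # highest set bit of the residual
            if p not in basis:
                basis[p] = (x, y)             # new basis element
                break
            v, lab = basis[p]
            x ^= v                            # eliminate the pivot bit
            y ^= lab                          # labels follow the XORs (linearity)
    def predict(x):
        y = 0
        while x:
            p = x.bit_length() - 1
            if p not in basis:
                return None                   # x outside span(S): say "??"
            v, lab = basis[p]
            x ^= v
            y ^= lab
        return y                              # x reduced to 0: label is forced
    return predict
```

Player i would answer queries with this rule where it predicts, falling back to the other player's PAC-learned parity g3-i on the "??" cases.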
Linear Separators
Linear separators through the origin (can assume the points lie on the sphere).
• Say we have a near-uniform probability distribution D over S^d.
• The VC bound, margin bound, and Perceptron mistake bound all give O(d) examples needed to learn, so O(d) examples of communication using the baselines (for constant k, ε).
Can one do better?
Linear Separators
Idea: Use the margin version of the Perceptron algorithm [update until f(x)(w·x) ≥ 1 for all x] and run it round-robin.
Linear Separators
• So long as the examples xi of player i and xj of player j are reasonably orthogonal, the updates of player j don't mess too much with the data of player i.
– Few updates ⇒ no damage.
– Many updates ⇒ lots of progress!
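The round-robin mechanics can be sketched directly (a minimal illustration assuming data separable with positive margin, so the update loops terminate; names are illustrative):

```python
def margin_perceptron_roundrobin(datasets):
    """Round-robin margin-Perceptron sketch: the hypothesis w is handed
    from player to player; each updates (w += y*x) on any local example
    whose margin y(w.x) is below 1, then forwards w.  Communication is
    one hypothesis per hand-off; the margin analysis bounds how many
    hand-offs can involve updates.

    `datasets`: per-player lists of (x, y), x a tuple, y in {-1, +1},
    assumed separable through the origin with positive margin.
    """
    d = len(datasets[0][0][0])
    w = [0.0] * d
    passes = 0
    while True:
        updated = False
        for data in datasets:                    # hand w to the next player
            passes += 1
            for x, y in data:                    # local margin-Perceptron updates
                while y * sum(wi * xi for wi, xi in zip(w, x)) < 1:
                    w = [wi + y * xi for wi, xi in zip(w, x)]
                    updated = True
        if not updated:                          # a full quiet round: everyone happy
            return w, passes
```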
Linear Separators
• If the overall distribution D is near uniform [density bounded by c times uniform], then the total communication (for constant k, ε) is O((d log d)^(1/2)) rather than O(d).
Can we get similar savings for general distributions?
Preserving Privacy of Data
It is natural also to consider privacy in this setting.
• Data elements could be patient records, customer records, click data.
• We want to preserve the privacy of the individuals involved.
• A compelling notion is differential privacy: if we replace any one record with a fake record, nobody else can tell. [Dwork, Nissim, …]
Preserving Privacy of Data
Differential privacy, formally: for all sequences of interactions σ (with probability taken over the randomness in the algorithm A),
e^(−ε) ≤ Pr(A(Si) = σ) / Pr(A(Si′) = σ) ≤ e^(ε),
i.e., replacing Si by a neighboring Si′ changes each outcome's probability only by a factor between roughly 1 − ε and 1 + ε.
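A toy mechanism makes the ratio condition concrete. Randomized response (a standard textbook example, not from the talk) reports a sensitive bit truthfully with probability e^ε/(1+e^ε) and flips it otherwise; changing one person's bit then changes any output's probability by a factor of exactly e^ε, so the definition holds with equality:

```python
import math
import random

def randomized_response(bit, eps):
    """Report the true bit with probability e^eps / (1 + e^eps), else
    flip it.  For any output value, Pr(output | bit=1) / Pr(output |
    bit=0) is e^eps or e^-eps, meeting the differential-privacy ratio.
    """
    p_true = math.exp(eps) / (1 + math.exp(eps))
    return bit if random.random() < p_true else 1 - bit
```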
Preserving Privacy of Data
• A number of algorithms have been developed for differentially private learning in the centralized setting.
• We can ask how to maintain differential privacy without increasing the communication overhead.
Preserving Privacy of Data
Another notion that is natural to consider in this setting: a kind of privacy for the data holder.
• View the distribution Di as non-sensitive (statistical information about the population of people who are sick in city i).
• But the sample Si ∼ Di is sensitive (actual patients).
• Can we reveal no more about Si than is inherent in Di?
Preserving Privacy of Data
• Want the protocol to reveal no more information about Si than is inherent in Di.
Preserving Privacy of Data
Formally: compare the protocol A run on the actual sample Si with A run on a "ghost sample" Si′ drawn from the same Di, and require
Pr_{Si,Si′}[ ∀σ, Pr(A(Si) = σ) / Pr(A(Si′) = σ) ∈ 1 ± ε ] ≥ 1 − δ.
Can get algorithms with this guarantee.
Conclusions
As we move to large distributed datasets, communication issues become important.
• Rather than only asking "how much data is needed to learn well?", we should also ask "how much communication do we need?"
• Issues like privacy also become more critical.
Quite a number of open questions remain.