# Privacy by Learning the Database

Moritz Hardt, DIMACS, October 24, 2012
## Slide 2

Isn't privacy the opposite of learning the database?
## Slide 3

Setup: a curator holds a data set D, a multi-set over a universe U. Given a query set Q from the analyst, the curator releases a privacy-preserving structure S that is accurate on Q.
## Slide 4

Represent the data set D as an N-dimensional histogram, where N = |U| and D[i] = number of elements in D of type i. The normalized histogram is a distribution over the universe.

A statistical query q (aka linear/counting query) is a vector q in [0,1]^N; its answer is q(D) := <q, D>, which lies in [0,1] on the normalized histogram.
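As a concrete illustration of these definitions, here is a minimal sketch (plain Python; the names and the toy data are mine, not from the talk):

```python
# Universe U = {0, 1, ..., N-1}; the data set D is a list of elements of U.

def histogram(data, N):
    """Normalized histogram: a distribution over the universe."""
    h = [0.0] * N
    for x in data:
        h[x] += 1.0 / len(data)
    return h

def statistical_query(q, h):
    """q is a vector in [0,1]^N; the answer <q, h> lies in [0,1]."""
    return sum(qi * hi for qi, hi in zip(q, h))

N = 4
D = [0, 0, 1, 3]                # four records over a universe of size 4
h = histogram(D, N)             # [0.5, 0.25, 0.0, 0.25]
q = [1.0, 1.0, 0.0, 0.0]        # "what fraction has type 0 or 1?"
print(statistical_query(q, h))  # 0.75
```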
## Slide 5

Why statistical queries? Lots of data analysis reduces to multiple statistical queries:

- Perceptron, ID3 decision trees, PCA/SVM, k-means clustering [Blum-Dwork-McSherry-Nissim'05]
- Any SQ-learning algorithm [Kearns'98], which includes "most" known PAC-learning algorithms
## Slide 6

Curator's wildest dream: a synopsis that matches D on every query in Q. This seems hard!
## Slide 7

Curator's 2nd attempt: maximize entropy subject to the accuracy constraints. Intuition: entropy implies privacy.
## Slide 8

Two pleasant surprises:

1. The program is approximately solved by the multiplicative weights update [Littlestone'89, ...].
2. It can easily be made differentially private.
## Slide 9

Why did learning theorists care to solve privacy problems 20 years ago? Answer: entropy implies generalization.
## Slide 10

Learning setup: given an example set Q labeled by an unknown concept, the learner outputs a hypothesis h accurate on all examples. Maximizing entropy implies the hypothesis generalizes.
## Slide 11

| Privacy | Learning |
|---|---|
| Sensitive database | Unknown concept |
| Queries labeled by answer on DB | Examples labeled by concept |
| Synopsis approximates DB on query set | Hypothesis approximates target concept on examples |
| Must preserve privacy | Must generalize |
## Slide 12

How can we solve this? It is a concave maximization subject to linear constraints, so the ellipsoid method applies. We'll take a different route.
## Slide 13

Start with the uniform distribution D0 and ask: "What's wrong with it?" Some query q violates a constraint. Minimize entropy loss subject to correcting that constraint. Closed-form expression for Dt+1? Well...
## Slide 14

Closed-form expression for Dt+1? Yes, once we relax, approximate, and think.
## Slide 15

Multiplicative Weights Update
## Slides 16-18

(Histogram figures: at step t, the estimate Dt is shown alongside the true distribution D over universe elements 1, ..., N. Suppose q(Dt) < q(D). After step t, the update has shifted mass in Dt toward the coordinates where q is large, closing the gap.)
## Slide 19

Multiplicative Weights Update

Algorithm: D0 uniform. For t = 1...T: find a bad query q; set Dt+1 = Update(Dt, q).

How quickly do we run out of bad queries?
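One update step can be sketched as follows (illustrative Python; the step size eta and the sign convention are simplifications of the actual MW rule):

```python
import math

def mw_update(Dt, q, direction, eta=0.5):
    """One multiplicative weights step on the estimate Dt.

    direction = +1 if q(Dt) < q(D), pushing mass toward large q[i];
    direction = -1 otherwise. Returns the renormalized distribution.
    """
    w = [Dt[i] * math.exp(direction * eta * q[i]) for i in range(len(Dt))]
    total = sum(w)
    return [wi / total for wi in w]

# Uniform start over a universe of size 4; true answer exceeds the estimate.
D0 = [0.25] * 4
q = [1.0, 1.0, 0.0, 0.0]
D1 = mw_update(D0, q, direction=+1)
# Mass shifted onto coordinates 0 and 1, so q(D1) > q(D0) = 0.5.
```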
## Slides 20-21

Progress Lemma: if q is bad, the update makes measurable progress. Put Ψt = RE(D ‖ Dt), the relative entropy between the true histogram and the estimate; each bad-query update decreases Ψt by Ω(α²).

Facts: Ψ0 ≤ log N (uniform start) and Ψt ≥ 0 always. Hence there are at most O(log N / α²) update steps, which yields the error bound.
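The potential argument behind the lemma, in standard notation (a sketch; η-dependent constants are omitted):

```latex
\Psi_t := \mathrm{RE}\!\left(D \,\middle\|\, D_t\right)
        = \sum_{i=1}^{N} D[i]\,\log\frac{D[i]}{D_t[i]}
\qquad \text{(potential)}

\text{If } q \text{ is bad, i.e. } |q(D) - q(D_t)| > \alpha, \text{ then }
\Psi_t - \Psi_{t+1} \ge \Omega(\alpha^2).

\text{Since } \Psi_0 \le \log N \text{ and } \Psi_t \ge 0, \text{ the number of updates is }
T \le O\!\left(\frac{\log N}{\alpha^2}\right).
```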
## Slide 22

Algorithm: D0 uniform. For t = 1...T: find a bad query q; set Dt+1 = Update(Dt, q).

What about privacy? Finding the bad query is the only step that interacts with D.
## Slide 23

Differential Privacy [Dwork-McSherry-Nissim-Smith'06]

Two data sets D, D' are called neighboring if they differ in one element.

Definition (Differential Privacy): A randomized algorithm M(D) is called (ε,δ)-differentially private if for any two neighboring data sets D, D' and all events S:

Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D') ∈ S] + δ.
## Slide 24

Laplace Mechanism [DMNS'06]

Given query q:
1. Compute q(D)
2. Output q(D) + Lap(1/(ε0·n))

Fact: this satisfies ε0-differential privacy. Note: the sensitivity of q is 1/n, since changing one of the n elements changes q(D) by at most 1/n.
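A sketch of the mechanism (Python; the inverse-CDF Laplace sampler is a standard textbook construction, not from the slides):

```python
import math
import random

def laplace_noise(scale):
    """Draw one sample from the Laplace distribution with the given scale."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def laplace_mechanism(true_answer, eps0, n):
    """The slide's mechanism: a statistical query has sensitivity 1/n,
    so adding Lap(1/(eps0*n)) noise gives eps0-differential privacy."""
    return true_answer + laplace_noise(1.0 / (eps0 * n))

random.seed(0)
noisy = laplace_mechanism(0.75, eps0=0.5, n=1000)
# With n = 1000 and eps0 = 0.5 the noise scale is 0.002, so the
# released answer is accurate to a few thousandths w.h.p.
```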
## Slides 25-28

Query selection: for queries q1, q2, ..., qk, consider the violations |q(D) - q(Dt)|. Add independent Lap(1/(ε0·n)) noise to each violation, then pick the query with the maximal noisy violation.

Lemma [McSherry-Talwar'07]: the selected index satisfies ε0-differential privacy, and w.h.p. its true violation is within O(log k / (ε0·n)) of the maximum.
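The selection step can be sketched as follows (illustrative Python; releasing only the winning index, not the noisy scores, is the point of the lemma):

```python
import math
import random

def noisy_max_query(D_answers, Dt_answers, eps0, n):
    """Noisy-max selection as on the slides: add Lap(1/(eps0*n)) to each
    violation |q(D) - q(Dt)| and return only the argmax index."""
    def lap(scale):
        u = random.random() - 0.5
        return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    scale = 1.0 / (eps0 * n)
    noisy = [abs(a - b) + lap(scale) for a, b in zip(D_answers, Dt_answers)]
    return max(range(len(noisy)), key=noisy.__getitem__)

random.seed(1)
true_answers = [0.9, 0.2, 0.5]   # q_i(D)
estimates    = [0.1, 0.2, 0.5]   # q_i(Dt): query 0 has the largest violation
print(noisy_max_query(true_answers, estimates, eps0=0.5, n=1000))  # 0
```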
## Slide 29

Algorithm: D0 uniform. For t = 1...T: noisily select a bad query q; set Dt+1 = Update(Dt, q). Also use the noisy answer in the update rule.

Now each step satisfies ε0-differential privacy! What is the total privacy guarantee?

New error bound: the selection and answer noise add roughly O(log|Q| / (ε0·n)) per update to the error.
## Slide 30

T-fold composition of ε0-differential privacy satisfies:

- Answer 1 [DMNS'06]: ε0·T-differential privacy.
- Answer 2 [DRV'10]: (ε,δ)-differential privacy with ε ≈ ε0·sqrt(T·log(1/δ)).

Note: Answer 2 is the smaller bound for small enough ε.
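A small numeric comparison of the two bounds (the Answer 2 formula below is the standard [DRV'10] advanced composition bound; the parameter values are mine):

```python
import math

def basic_composition(eps0, T):
    """Answer 1 [DMNS'06]: T-fold composition is (eps0*T)-DP."""
    return eps0 * T

def advanced_composition(eps0, T, delta):
    """Answer 2 [DRV'10]: T-fold composition is (eps, delta)-DP with
    eps = sqrt(2*T*ln(1/delta))*eps0 + T*eps0*(e^eps0 - 1)."""
    return math.sqrt(2 * T * math.log(1 / delta)) * eps0 \
        + T * eps0 * (math.exp(eps0) - 1)

eps0, T, delta = 0.01, 1000, 1e-6
print(basic_composition(eps0, T))            # 10.0
print(advanced_composition(eps0, T, delta))  # ~1.76: far smaller
```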
## Slide 31

Composition Theorem: optimize T and ε0 against the error bound to get the final (ε,δ) guarantee.

Theorem 1. On databases of size n, MW achieves ε-differential privacy with error α = Õ((log|Q| · log N / (ε·n))^(1/3)).

Theorem 2. MW achieves (ε,δ)-differential privacy with error α = Õ(sqrt(log|Q|) · (log N)^(1/4) / sqrt(ε·n)).

Optimal dependence on |Q| and n.
## Slide 32

Two settings:

- Offline (non-interactive): given the whole query set Q up front, release one synopsis S. ✔ [H-Ligett-McSherry'12, Gupta-H-Roth-Ullman'11]
- Online (interactive): queries q1, q2, ... arrive one at a time, and each answer a1, a2, ... must be given before the next query arrives. ? [H-Rothblum'10]

See also: Roth-Roughgarden'10, Dwork-Rothblum-Vadhan'10, Dwork-Naor-Reingold-Rothblum-Vadhan'09, Blum-Ligett-Roth'08.
## Slide 33

Private MW Online [H-Rothblum'10]

Algorithm: given query qt:

- If |qt(Dt) - qt(D)| < α/2 + Lap(1/(ε0·n)): output qt(Dt).
- Otherwise: output qt(D) + Lap(1/(ε0·n)) and set Dt+1 = Update(Dt, qt).

Achieves the same error bounds!
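Putting the pieces together, one round of the online algorithm might look like this (an illustrative sketch, not the paper's exact pseudocode; eta and the per-round noise scales are simplified):

```python
import math
import random

def lap(scale):
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def pmw_answer(D, Dt, q, alpha, eps0, n, eta=0.5):
    """One round of Private MW Online (sketch). D is the true normalized
    histogram, Dt the public estimate, q a vector in [0,1]^N.
    Returns (answer, new estimate)."""
    qD = sum(qi * di for qi, di in zip(q, D))
    qDt = sum(qi * di for qi, di in zip(q, Dt))
    if abs(qDt - qD) < alpha / 2 + lap(1.0 / (eps0 * n)):
        return qDt, Dt                       # lazy round: answer from Dt
    noisy = qD + lap(1.0 / (eps0 * n))       # update round: pay for noisy answer
    direction = 1.0 if noisy > qDt else -1.0
    w = [Dt[i] * math.exp(direction * eta * q[i]) for i in range(len(Dt))]
    total = sum(w)
    return noisy, [wi / total for wi in w]

random.seed(0)
D  = [0.7, 0.1, 0.1, 0.1]   # true (hidden) histogram
Dt = [0.25] * 4             # public estimate
q  = [1.0, 0.0, 0.0, 0.0]
ans, Dt = pmw_answer(D, Dt, q, alpha=0.1, eps0=0.5, n=1000)
# Large violation (0.45), so this is an update round: ans is a noisy q(D)
# and the estimate shifts weight onto coordinate 0.
```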
## Slide 34

Overview: privacy analysis

- Offline setting: T << n steps, so a simple analysis using composition theorems suffices.
- Online setting: k >> n invocations of the Laplace mechanism; composition theorems alone don't suggest small error!
- Idea: analyze the privacy loss like a lazy random walk (goes back to Dinur-Dwork-Nissim'03).
## Slides 35-43

Privacy loss as a lazy random walk: plot the privacy loss against the number of steps. Each round is either lazy or busy, where a busy round is one whose noisy answer comes close to forcing an update. Lazy rounds contribute nothing to the privacy loss; each busy round contributes a step of magnitude 1 (in units of ε0). W.h.p. the total is bounded by O(sqrt(#busy)).
## Slide 44

Formalizing the random walk: imagine the output of PMW is a 0/1 indicator vector v = (v1, ..., vk), where vt = 1 if round t is an update and 0 otherwise. Recall: very few updates, so the vector is sparse.

Theorem: the vector v is (ε,δ)-differentially private.
## Slide 45

Let D, D' be neighboring databases and let P, Q be the corresponding output distributions.

Approach:
1. Sample v from P.
2. Consider X = log(P(v)/Q(v)).
3. Argue Pr{ |X| > ε } ≤ δ.

Lemma: (3) implies (ε,δ)-differential privacy. Intuition: X is the privacy loss.
## Slide 46

Let Xt be the privacy loss in round t. We'll show:

1. Xt = 0 if t is not busy.
2. |Xt| ≤ ε0 if t is busy.
3. The number of busy rounds is O(#updates).

Total privacy loss: by [DRV'10], E[X1 + ... + Xk] ≤ O(ε0² · #updates), and Azuma's inequality gives strong concentration around the expectation.
## Slide 47

Defining the "busy" event: the update condition compares the noisy violation to the threshold α/2; a round is busy when the noisy answer lands within a small margin of that threshold, so that neighboring databases could disagree on whether an update occurs.
## Slide 48

Offline (non-interactive) ✔ and online (interactive) ✔: with the random-walk analysis, both settings are solved.
## Slide 49

What we can do:

- Offline/batch setting: every set of linear queries.
- Online/interactive setting: every sequence of adaptive and adversarial linear queries.
- Theoretical performance: nearly optimal in the worst case. For instance-by-instance guarantees see H-Talwar'10 and Nikolov-Talwar (upcoming!), which use different techniques.
- Practical performance: compares favorably to previous work! See Katrina's talk.

Are we done?
## Slide 50

What we would like to do: the running time depends linearly on |U|, and |U| is exponential in the number of attributes of the data.

Can we get poly(n)? No in the worst case for synthetic data [DNRRV'09], even for simple query classes [Ullman-Vadhan'10]. No in the interactive setting without restricting the query class [Ullman'12].

What can we do about it?
## Slide 51

Look beyond the worst case! Find meaningful assumptions on data, queries, models, etc. Design better heuristics!

In this talk: get more mileage out of learning theory!
## Slide 52

Recall the analogy:

| Privacy | Learning |
|---|---|
| Sensitive database | Unknown concept |
| Queries labeled by answer on DB | Examples labeled by concept |
| Synopsis approximates DB on query set | Hypothesis approximates target concept on examples |

Can we turn this into an efficient reduction? Yes. [H-Rothblum-Servedio'12]
## Slide 53

Informal Theorem: There is an efficient differentially private release mechanism for a query class Q provided that there is an efficient PAC-learning algorithm for a related concept class Q'.

Interfaces nicely with existing learning algorithms:

- Learning based on polynomial threshold functions [Klivans-Servedio]
- The Harmonic Sieve [Jackson] and its extension [Jackson-Klivans-Servedio]
## Slide 54

Database as a function: view the database as the map F: q ↦ q(D) on queries, and for a threshold t let Ft(q) indicate whether q(D) > t.

Observation: it is enough to learn Ft for t = α, 2α, ..., (1-α) in order to approximate F to within α.
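The observation can be checked mechanically (Python sketch; the grid construction is mine):

```python
def approx_from_thresholds(Fq, alpha):
    """Reconstruct F(q) in [0,1] from the threshold bits
    F_t(q) = [F(q) > t] for t = alpha, 2*alpha, ..., (1-alpha).
    The answer alpha * (#thresholds passed) is accurate to within alpha."""
    k = round(1.0 / alpha)
    bits = [1 if Fq > i * alpha else 0 for i in range(1, k)]
    return alpha * sum(bits)

print(approx_from_thresholds(0.62, 0.1))  # 0.6 (error 0.02 <= alpha)
```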
## Slide 55

High-level idea: feed the learning algorithm labeled examples derived from the database function and receive back a hypothesis h that approximates Ft.

Observation: if all labels are privacy-preserving, then so will be the hypothesis h.
## Slide 56

Main hurdles:

- Privacy requires noise, and noise might defeat the learning algorithm.
- We can only generate |D| examples efficiently before running out of privacy.
## Slide 57

Threshold Oracle (TO): on the query "F(x) > t?", compute a = F(x) + N for fresh noise N. If |a - t| is tiny, output "fail"; else if a > t, output 1; else output 0. So the answer b is in {0, 1, fail}.

This ensures:
1. Privacy
2. It "removes" the noise from the labels
3. Complexity independent of |D|

Generating samples:
1. Pick x1, x2, ..., xm.
2. Receive b1, b2, ..., bm from the TO.
3. Remove all "failed" examples.
4. Pass the remaining labeled examples (y1, l1), ..., (yr, lr) on to the learner.
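A sketch of the oracle (illustrative Python; noise_scale and margin stand in for the calibrated values in the actual analysis):

```python
import math
import random

def lap(scale):
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def threshold_oracle(F_x, t, noise_scale, margin):
    """Answer "F(x) > t?" from a noisy value. Returns 1, 0, or "fail".
    Failing whenever the noisy value lands near the threshold is what lets
    the learner treat the surviving labels as effectively noiseless."""
    a = F_x + lap(noise_scale)
    if abs(a - t) < margin:
        return "fail"
    return 1 if a > t else 0

random.seed(0)
# Far above, far below, and right at the threshold:
labels = [threshold_oracle(v, 0.5, noise_scale=0.002, margin=0.05)
          for v in (0.9, 0.1, 0.5)]
# Failed examples are simply dropped before reaching the learner.
kept = [b for b in labels if b != "fail"]
```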
## Slide 58

Application: Boolean conjunctions, an important class of queries in differential privacy [BCDKMT07, KRSU10, GHRU11, HMT12, ...]. Universe U = {0,1}^d.

| Salary > $50k | Syphilis | Height > 6'1 | Weight < 180 | Male |
|---|---|---|---|---|
| True | False | True | False | True |
| True | True | True | True | True |
| False | False | False | True | False |
| True | False | False | True | True |
| False | False | False | False | False |

Example conjunction: "(Salary > $50k) AND (Male)". It evaluates to true on rows 1, 2, and 4, i.e. on a 3/5 fraction of this database.
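Evaluating a conjunction as a statistical query on the slide's example table (Python sketch; the attribute names are mine):

```python
def conjunction_query(rows, attrs):
    """Evaluate a Boolean conjunction as a counting query: the fraction
    of database rows on which all listed attributes are True."""
    count = sum(1 for row in rows if all(row[a] for a in attrs))
    return count / len(rows)

# The slide's example database, columns in the table's order.
db = [
    {"salary>50k": True,  "syphilis": False, "height>6'1": True,  "weight<180": False, "male": True},
    {"salary>50k": True,  "syphilis": True,  "height>6'1": True,  "weight<180": True,  "male": True},
    {"salary>50k": False, "syphilis": False, "height>6'1": False, "weight<180": True,  "male": False},
    {"salary>50k": True,  "syphilis": False, "height>6'1": False, "weight<180": True,  "male": True},
    {"salary>50k": False, "syphilis": False, "height>6'1": False, "weight<180": False, "male": False},
]
print(conjunction_query(db, ["salary>50k", "male"]))  # 0.6
```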
## Slide 59

Informal Corollary (subexponential algorithm for conjunctions). There is a differentially private release algorithm with running time poly(|D|) such that for any distribution over Boolean conjunctions the algorithm is w.h.p. α-accurate, provided the database is large enough (a bound subexponential in d). Previous: 2^O(d).

Informal Corollary (small width). There is a differentially private release algorithm with running time poly(|D|) such that for any distribution over width-k Boolean conjunctions the algorithm is w.h.p. α-accurate, provided the database is large enough. Previous: d^O(k).
## Slide 60

Follow-up work: Thaler-Ullman-Vadhan'12 remove the distributional relaxation and get exp(O(d^(1/2))) complexity for all Boolean conjunctions. Idea: use the polynomial encodings from the learning algorithm directly.
## Slide 61

Summary:

- Derived a simple and powerful private data release algorithm from first principles.
- The privacy/learning analogy serves as a guiding principle and can be turned into an efficient reduction.
- Can we use these ideas outside theory and in new settings?
## Slide 62

Thank you
## Slide 63

Open problems:

- Is PMW close to instance optimal?
- Is there a converse to the privacy-to-learning reduction?
- No barriers for cut/spectral analysis of graphs/matrices (universe small).
- Releasing k-way conjunctions in time poly(n) with error poly(d, k).