Local Privacy and Statistical Minimax Rates · 2020-01-03
TRANSCRIPT
Local Privacy and Statistical Minimax Rates
John C. Duchi, Michael I. Jordan, Martin J. Wainwright
University of California, Berkeley
December 2013
Goals for this talk
Bring together some classical concepts of decision theory and newer concepts of privacy
Illustration of problem

[Figure: database of X1, ..., Xn passes through channel Q to produce private views Z1, ..., Zn; infer θ(P)]

- Have data X1, ..., Xn
- Private views Z1, ..., Zn constructed from the Xi
- Often the goal is statistics of {X1, ..., Xn} (e.g., average salary)
- A distribution P and parameter θ(P) generate the data
- The sample X1, ..., Xn from P is not observed (only the Zi)
- Goal: infer the population parameter θ(P) based on Z1, ..., Zn
Primer on minimax rates of convergence and statistical inference
Illustration of classical problem

[Figure: sample X1, ..., Xn drawn from P; infer θ(P)]

- Have a distribution P and parameter θ(P) of P
- Sample X1, ..., Xn drawn from P and observed
- Goal: infer the population parameter θ(P)

Why? We care about making future predictions:
- What is the likelihood that a new resident of San Francisco needs food stamps?
- Biological prediction, web advertising, search, ...
Minimax risk

Central object of study: the minimax risk
- Parameter θ(P) of distribution P, e.g. the mean θ(P) = E_P[X]
- Loss ρ that measures the error of an estimate θ̂ of θ: ρ(θ̂, θ)
- E.g. ρ(θ̂, θ) = ‖θ̂ − θ‖₂², or something more esoteric/robust such as the Huber loss

    ρ_u(θ̂, θ) = { ½‖θ̂ − θ‖₂²           if ‖θ̂ − θ‖₂ ≤ u
                { u‖θ̂ − θ‖₂ − u²/2     otherwise

- Family of distributions P that we study, e.g. all P such that E_P[X²] ≤ 1
Minimax risk

Central object of study: the minimax risk
- Parameter θ(P) of distribution P
- Loss ρ that measures the error of an estimate θ̂ of θ: ρ(θ̂, θ), e.g. ρ(θ̂, θ) = ‖θ̂ − θ‖₂²
- Family of distributions P that we study

Look at the expected loss:

    Mn(θ(P), ρ) := inf_{θ̂} sup_{P∈P} E_P[ρ(θ̂(X1, ..., Xn), θ(P))]    (classical minimax risk)

- Worst case over distributions P
- Best case over all estimators θ̂ : Xⁿ → Θ

To study: the rate at which Mn(θ(P), ρ) → 0 as n grows
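As a concrete instance of these definitions (a standard worked example, not from the slides): for mean estimation over distributions with variance at most σ², the sample mean certifies the upper bound, so the classical rate is 1/n.

```latex
% Worked example (standard): mean estimation, squared-error loss.
% Family P = {P : Var_P(X) <= sigma^2}, parameter theta(P) = E_P[X].
\[
  \mathbb{E}_P\!\left[\big(\bar X_n - \theta(P)\big)^2\right]
  = \frac{\operatorname{Var}_P(X)}{n} \le \frac{\sigma^2}{n}
  \qquad \text{for } \bar X_n = \frac{1}{n}\sum_{i=1}^n X_i,
\]
\[
  \text{so } \mathfrak{M}_n\big(\theta(\mathcal{P}), \rho\big) \le \frac{\sigma^2}{n};
  \text{ a matching } \Omega(1/n) \text{ lower bound follows from the two-point argument below.}
\]
```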
Proving minimax bounds

This talk:
- Upper bounds will be ad hoc
- Lower bounds will be information-theoretic [Hasminskii 78, Birgé 83, Ibragimov and Hasminskii 81, Yang and Barron 99, Yu 97]
- NB: many known information-theoretic upper bounds [Barron, Birgé, Kivinen, ...]
Proving minimax lower bounds
Step 1: Reduce from estimation to testing
Step 2: Canonical testing problem
Step 3: Classical bounds on error probabilities
Proving minimax lower bounds

Step 1: Reduce from estimation to testing

[Figure: a 2δ-packing {θi} of Θ with an estimate θ̂; at most one packing point lies within δ of θ̂]

- Have a 2δ-packing of Θ, i.e. {θ1, θ2, ..., θK}
- Estimator θ̂
- At most one index is close to θ̂
- Can test the index i ∈ [K]
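The reduction rests on a one-line packing argument, spelled out here for completeness (standard reasoning, not on the slide):

```latex
% Why at most one index can be close to \hat\theta:
% if \|\hat\theta - \theta_i\| < \delta and \|\hat\theta - \theta_j\| < \delta for i \ne j, then
\[
  \|\theta_i - \theta_j\|
  \le \|\theta_i - \hat\theta\| + \|\hat\theta - \theta_j\|
  < 2\delta,
\]
% contradicting the 2\delta-packing property \|\theta_i - \theta_j\| \ge 2\delta.
% Hence the test \hat v = \arg\min_{v \in [K]} \|\hat\theta - \theta_v\| is well defined.
```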
Proving minimax lower bounds

Step 2: Canonical testing problem

- Nature chooses a random index V ∈ [K]
- Conditional on V = v, sample X1, ..., Xn i.i.d. from Pv
- Lower bound on the minimax error:

    sup_{P∈P} E_P[ρ(θ̂, θ(P))] ≥ (1/K) Σ_{v=1}^K E_v[ρ(θ̂, θv)]
                               ≥ (1/K) Σ_{v=1}^K ρ(δ) Pv(ρ(θ̂, θv) ≥ 2δ)

- Final canonical testing problem:

    Mn(θ(P), ρ) ≥ ρ(δ) min_{v̂} P(v̂(X1, ..., Xn) ≠ V).
Proving minimax lower bounds

Step 3: Classical bounds on error probabilities

Le Cam's method:

    P0(v̂ ≠ 0) + P1(v̂ ≠ 1) ≥ 1 − ‖P0 − P1‖_TV

Fano's inequality (for K > 2):

    P(v̂ ≠ V) ≥ 1 − (I(X1, ..., Xn; V) + log 2) / log K

Summarizing [diagram: Markov chain V → X → θ̂ → v̂]:

    Mn(θ(P), ρ) ≥ ρ(δ) min_{v̂} P(v̂(X1, ..., Xn) ≠ V)

Key idea: control the information-theoretic divergences ‖P0 − P1‖_TV or I(X1, ..., Xn; V) to attain the minimax rate.
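To make Le Cam's bound concrete, here is a small numeric sketch (my own illustration, not from the talk): for n i.i.d. coin flips, the likelihood ratio between the product measures P0ⁿ and P1ⁿ depends only on the number of heads, so their total variation distance can be computed exactly from binomial probabilities.

```python
# Numeric sketch (my own) of Le Cam's two-point bound for coin flips.
# The likelihood ratio between P0^n and P1^n depends only on the head
# count, so ||P0^n - P1^n||_TV equals the TV distance between the two
# Binomial(n, p) laws of that count.
from math import comb


def binom_pmf(n: int, p: float, k: int) -> float:
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def tv_product_bernoulli(n: int, p0: float, p1: float) -> float:
    return 0.5 * sum(
        abs(binom_pmf(n, p0, k) - binom_pmf(n, p1, k)) for k in range(n + 1)
    )


n, p0, p1 = 100, 0.5, 0.55
tv = tv_product_bernoulli(n, p0, p1)
# Le Cam: for ANY test v_hat, P0(v_hat != 0) + P1(v_hat != 1) >= 1 - TV.
print(f"TV = {tv:.4f}; summed testing error >= {1.0 - tv:.4f}")
```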
Inference under privacy constraints

[Figure: database of X1, ..., Xn passes through channel Q to produce private views Z1, ..., Zn; infer θ(P)]

- Have a distribution P and parameter θ(P)
- Sample X1, ..., Xn drawn from P and not observed
- Private views Z1, ..., Zn constructed from the Xi
- Goal: infer the population parameter θ(P) based on Z1, ..., Zn
Model of privacy

Local privacy: don't trust the collector of the data (Evfimievski et al. 2003, Warner 1965)

[Figure: each individual i sends Xi through the channel Q(Z | X) to produce Zi]

- Individuals i ∈ {1, ..., n} have personal data Xi ∼ Pθ
- Estimator (Z1, ..., Zn) ↦ θ̂(Z_{1:n})
Differential privacy

Definition: The channel Q is α-differentially private if

    max_{z, x, x′} Q(Z = z | x) / Q(Z = z | x′) ≤ e^α.

[Dwork, McSherry, Nissim, Smith 2006]

What does this mean?
- Given Z, one cannot tell which x produced Z
- Testing argument: based on Z, an adversary must distinguish between x and x′:

    FNR + FPR ≥ 2 / (1 + e^α)

[Wasserman and Zhou 2011]
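A minimal sketch of an α-differentially private local channel: Warner's binary randomized response. The function names are my own; the likelihood-ratio check at the end confirms the e^α bound from the definition.

```python
# Sketch (my own code): Warner's binary randomized response as an
# alpha-DP local channel, plus a direct check of the definition.
import math
import random


def channel_prob(z: int, x: int, alpha: float) -> float:
    """Q(Z = z | x): report the true bit with probability e^a / (1 + e^a)."""
    p_truth = math.exp(alpha) / (1.0 + math.exp(alpha))
    return p_truth if z == x else 1.0 - p_truth


def randomized_response(x: int, alpha: float) -> int:
    return x if random.random() < channel_prob(x, x, alpha) else 1 - x


alpha = 0.5
# Differential privacy check: max over z, x, x' of Q(z|x)/Q(z|x') <= e^alpha.
worst = max(
    channel_prob(z, x, alpha) / channel_prob(z, xp, alpha)
    for z in (0, 1) for x in (0, 1) for xp in (0, 1)
)
print(f"worst-case ratio {worst:.4f} vs e^alpha = {math.exp(alpha):.4f}")
print("ten private reports of x = 1:", [randomized_response(1, alpha) for _ in range(10)])
```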
Minimax risk

Central object of study: the private minimax risk
- Parameter θ(P) of distribution P
- Loss ρ that measures the error of an estimate θ̂ of θ: ρ(θ̂, θ), e.g. ρ(θ̂, θ) = ‖θ̂ − θ‖₂²
- Family of distributions P that we study

Look at the expected loss:

    Mn(θ(P), ρ, α) := inf_{Q∈Qα} inf_{θ̂} sup_{P∈P} E_{P,Q}[ρ(θ̂(Z1, ..., Zn), θ(P))]    (private minimax risk)

- Worst case over distributions P
- Best case over all estimators θ̂ : Zⁿ → Θ
- Best case over all α-private channels Q ∈ Qα from X to Z
Goal for the rest of the talk

How does the minimax risk

    Mn(θ(P), ρ, α) := inf_{Q∈Qα} inf_{θ̂} sup_{P∈P} E_{P,Q}[ρ(θ̂(Z1, ..., Zn), θ(P))]

change with the privacy parameter α and the number of samples n?

Many related results:
- Non-population lower bounds [Hardt and Talwar 10; Nikolov, Talwar, Zhang 13; Hall, Rinaldo, Wasserman 11; Chaudhuri, Monteleoni, Sarwate 12]
- Related population bounds:
  - Two-point hypotheses [Chaudhuri & Hsu 12; Beimel, Nissim, Omri 08] (e.g., for 1-dimensional bias estimation, one gets 1/(nα)² error)
  - PAC learning results [Beimel, Brenner, Kasiviswanathan, Nissim 13]
Examples

- Mean estimation
- Fixed-design regression
- Convex risk minimization (i.e., online learning)
- Multinomial (probability) estimation
- Nonparametric density estimation
Example 1: Mean estimation

Problem: Estimate the mean of distributions P with a bounded kth moment, k ≥ 2:

    θ(P) := E_P[X],    E_P[|X|^k] ≤ 1.

Standard estimator: θ̂ = (1/n) Σ_{i=1}^n Xi, which satisfies E[(θ̂ − E[X])²] ≤ 1/n.

Proposition (D., Jordan, Wainwright): Non-private minimax rate

    1/n ≲ E[(θ̂ − E[X])²] ≲ 1/n

Proposition (D., Jordan, Wainwright): Private minimax rate

    1/(nα²)^((k−1)/k) ≲ Mn(E[X], (·)², α) ≲ 1/(nα²)^((k−1)/k)

Examples:
- For two moments (k = 2), the rate degrades from the parametric 1/n to 1/√(nα²)
- For k → ∞ (bounded random variables), the parametric rate is retained with the decreased effective sample size n ↦ nα²
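For intuition, here is an illustrative locally private mean estimator: clip each sample to [−T, T] and add Laplace(2T/α) noise, which makes each report α-differentially private because the clipped value has ℓ1-sensitivity 2T. This clip-and-noise channel is a standard construction, not necessarily the optimal mechanism behind the proposition; choosing T ≈ (nα²)^(1/(2k)) balances clipping bias (≲ T^(1−k)) against noise variance (≲ T²/(nα²)) and recovers the (nα²)^(−(k−1)/k) rate.

```python
# Illustrative sketch (NOT necessarily the optimal mechanism from the
# proposition): clip to [-T, T], then add Laplace(2T/alpha) noise; the
# clipped value has L1-sensitivity 2T, so each report is alpha-DP.
import random


def private_report(x: float, alpha: float, T: float) -> float:
    clipped = max(-T, min(T, x))
    # Laplace(scale = 2T/alpha) as a random sign times an exponential.
    noise = random.choice((-1.0, 1.0)) * random.expovariate(alpha / (2.0 * T))
    return clipped + noise


def private_mean(xs: list[float], alpha: float, T: float) -> float:
    """Average of alpha-DP reports; clipping bias is at most T^(1-k)
    under E|X|^k <= 1, and the noise contributes O(T^2 / (n alpha^2))."""
    return sum(private_report(x, alpha, T) for x in xs) / len(xs)


# Usage with k = 2 moments: T ~ (n alpha^2)^(1/4) balances bias and noise.
random.seed(0)
n, alpha = 100_000, 1.0
xs = [random.gauss(0.3, 0.5) for _ in range(n)]   # E[X^2] = 0.34 <= 1
print(private_mean(xs, alpha, T=(n * alpha**2) ** 0.25))
```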
Example 2: Multinomial estimation

Problem: Get observations X ∈ [d] and wish to estimate

    θj := P(X = j)

Example (income brackets):
- $0–$10,000: θ1 = .05
- $10,001–$20,000: θ2 = .1
- $20,001–$40,000: θ3 = .2
- $40,001–$80,000: θ4 = .4
- $80,001–$160,000: θ5 = .2
- $160,001–$320,000: θ6 = .04
- $320,001+: θ7 = .01
Example 2: Multinomial estimation

Standard estimator (counts):

    θ̂j = (1/n) Σ_{i=1}^n 1{Xi = j} = P̂(X = j)

Usual rate:

    E[‖θ̂ − θ‖₂²] ≤ 1/n.
Example 2: Multinomial estimation

Proposition: Non-private minimax rate

    1/n ≲ E[‖θ̂ − θ‖₂²] ≲ 1/n

Proposition: Private minimax rate

    d/(nα²) ≲ Mn([P(X = j)]_{j=1}^d, ‖·‖₂², α) ≲ d/(nα²)

Take-away: sample size reduction

    n ↦ nα²/d
Example 2: Multinomial estimation

- Optimal mechanism: randomized response. Resample each coordinate of the one-hot answer by Bernoulli coin flips (see the sketch after this slide)

[Figure: true income bracket marked on the left; randomized-response output on the right, with some brackets' bits flipped]
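A sketch of the randomized-response mechanism and its debiasing step (my own code; the flipping probability 1 − e^(α/2)/(1 + e^(α/2)) is one standard calibration, not constants from the talk): two one-hot inputs differ in exactly two coordinates, so the per-report likelihood ratio is at most e^α.

```python
# Sketch: d-ary randomized response for multinomial estimation.
# Each one-hot coordinate is independently kept with probability
# p = e^{alpha/2} / (1 + e^{alpha/2}) and flipped otherwise; two one-hot
# inputs differ in exactly 2 coordinates, so the channel is alpha-DP.
import math
import random


def rr_report(x: int, d: int, alpha: float) -> list[int]:
    """Privatize category x in {0, ..., d-1} by flipping one-hot bits."""
    p = math.exp(alpha / 2) / (1 + math.exp(alpha / 2))
    onehot = [1 if j == x else 0 for j in range(d)]
    return [b if random.random() < p else 1 - b for b in onehot]


def estimate_theta(xs: list[int], d: int, alpha: float) -> list[float]:
    """Debiased estimate, using E[Z_j] = (2p - 1) * theta_j + (1 - p)."""
    p = math.exp(alpha / 2) / (1 + math.exp(alpha / 2))
    n = len(xs)
    counts = [0.0] * d
    for x in xs:
        for j, z in enumerate(rr_report(x, d, alpha)):
            counts[j] += z
    return [(counts[j] / n - (1 - p)) / (2 * p - 1) for j in range(d)]


random.seed(0)
theta = [0.05, 0.1, 0.2, 0.4, 0.2, 0.04, 0.01]   # income brackets above
xs = random.choices(range(len(theta)), weights=theta, k=50_000)
print([round(t, 3) for t in estimate_theta(xs, len(theta), alpha=1.0)])
```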
Main consequences

Goal: Understand the tradeoff between the differential privacy bound α and the sample size n

"Theorem 1": The effective sample size for essentially any¹ problem is made worse by at least

    n ↦ nα²

"Theorem 2": The effective sample size for d-dimensional problems scales as

    n ↦ nα²/d

¹ essentially any: any problem whose minimax rate can be controlled by information-theoretic techniques
General theory

Showing minimax bounds:
- Have possible "true" parameters {θv} we want to find
- A distribution Pv is associated with each parameter
- The problem is hard when Pv ≈ Pv′

[Figure: well-separated Pv and Pv′ (easy) vs. heavily overlapping Pv and Pv′ (hard)]
Differential privacy and probability distributions

Samples: the Zi are drawn as Xi → Q → Zi from the marginal

    Mv(Z) := ∫ Q(Z | X = x) dPv(x)

Strong data processing: If Q(Z | x)/Q(Z | x′) ≤ e^α, then

    Dkl(M1‖M2) + Dkl(M2‖M1) ≤ 4(e^α − 1)² ‖P1 − P2‖²_TV

[Figure: P1 and P2 pushed through the channel Q yield marginals M1 and M2]
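Since the strong data-processing inequality is a theorem, a numeric sanity check should never fail; the sketch below (my own code) tests it with a d-ary randomized-response channel, which satisfies Q(z | x)/Q(z | x′) ≤ e^α.

```python
# Numeric sanity check (my own) of the strong data-processing inequality
#   Dkl(M1||M2) + Dkl(M2||M1) <= 4 (e^alpha - 1)^2 ||P1 - P2||_TV^2
# using a d-ary randomized-response channel, which is alpha-DP.
import math
import random


def rand_simplex(d: int) -> list[float]:
    w = [random.random() for _ in range(d)]
    s = sum(w)
    return [x / s for x in w]


def marginal(P: list[float], alpha: float) -> list[float]:
    """M(z) = sum_x Q(z|x) P(x), with Q(z|x) = e^a/(e^a + d - 1) if z == x."""
    d = len(P)
    denom = math.exp(alpha) + d - 1
    keep, flip = math.exp(alpha) / denom, 1.0 / denom
    return [sum((keep if z == x else flip) * P[x] for x in range(d)) for z in range(d)]


def kl(P: list[float], Q: list[float]) -> float:
    return sum(p * math.log(p / q) for p, q in zip(P, Q) if p > 0)


random.seed(1)
alpha, d = 0.8, 5
for _ in range(5):
    P1, P2 = rand_simplex(d), rand_simplex(d)
    M1, M2 = marginal(P1, alpha), marginal(P2, alpha)
    lhs = kl(M1, M2) + kl(M2, M1)
    tv = 0.5 * sum(abs(a - b) for a, b in zip(P1, P2))
    rhs = 4 * (math.exp(alpha) - 1) ** 2 * tv ** 2
    print(f"{lhs:.5f} <= {rhs:.5f}: {lhs <= rhs}")
```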
Contraction and lower bounds

Samples: the Zi are drawn as Xi → Q → Zi from the marginal Mv(Z) := ∫ Q(Z | X = x) dPv(x)

Le Cam's method:
- θ1 and θ2 are 2δ-separated
- Non-private version:

    Mn(Θ, (·)²) ≥ δ²(1 − √(n Dkl(P1‖P2)))

- Private version:

    Mn(Θ, (·)², α) ≥ δ²(1 − √(nα² ‖P1 − P2‖²_TV))

[Figure: P1 and P2 centered at θ1 and θ2 at distance 2δ; the private marginals M1 and M2 overlap more]
Variational results on privacy and probability distributions

Samples: the Zi are drawn as Xi → Q → Zi from the marginal Mv(Z) := ∫ Q(Z | X = x) dPv(x)

Canonical problem: Nature samples V uniformly from {1, ..., K} and draws Xi i.i.d. ∼ Pv when V = v

[Markov chain: V → X → Z]

Goal: find V based on Z1, ..., Zn

Difficulty of the problem: the mutual information seen earlier is replaced,

    I(X1, ..., Xn; V) ↦ I(Z1, ..., Zn; V)
Fano inequality, lower bounds, contraction

[Markov chain: V → X → Z]

- Have parameters θ1, ..., θK; choose V ∈ [K] uniformly at random
- Sample Xi according to θv when V = v
- Sample Zi according to Q(· | Xi)
- Non-private Fano inequality:

    P(Error) ≥ 1 − I(X1, ..., Xn; V)/log K − o(1)

- Private Fano inequality:

    P(Error) ≥ 1 − I(Z1, ..., Zn; V)/log K − o(1)

  which, by the mutual information contraction on the next slide, yields roughly

    P(Error) ≥ 1 − α² I(X1, ..., Xn; V)/(d log K) − o(1)
Variational results on privacy and probability distributions

[Markov chain: V → X → Z]

- Define the mixture P̄ = (1/K) Σ_{v=1}^K Pv

Mutual information contraction: For any non-interactive α-locally private channel Q,

    I(Z1, ..., Zn; V) ≤ n(e^α − 1)² sup_S (1/K) Σ_{v=1}^K (Pv(S) − P̄(S))²

where the supremum term is a dimension-dependent total variation.

What happens? Roughly,

    n sup_S (1/K) Σ_{v=1}^K (Pv(S) − P̄(S))² ≈ (1/d) I(X1, ..., Xn; V)    (classical)

so that

    I(Z1, ..., Zn; V) ≤ (α²/d) I(X1, ..., Xn; V)
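Chaining the two displays above (a short derivation in the spirit of the slides, with o(1) and constant-factor bookkeeping suppressed) makes the sample-size interpretation explicit:

```latex
% Plugging the contraction into the private Fano inequality:
\[
  \mathbb{P}(\hat v \ne V)
  \;\ge\; 1 - \frac{I(Z_1,\ldots,Z_n; V) + \log 2}{\log K}
  \;\gtrsim\; 1 - \frac{(\alpha^2/d)\, I(X_1,\ldots,X_n; V) + \log 2}{\log K}.
\]
% The private problem thus behaves like the classical one with mutual
% information shrunk by a factor alpha^2 / d, i.e. an effective sample size
\[
  n \;\mapsto\; \frac{n\alpha^2}{d}.
\]
```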
Summary

High-level results:
- A formal minimax framework for local differential privacy
- Two main theorems bound distances between probability distributions as a function of privacy:
  - Pairwise contraction: Le Cam's method
  - Mutual information contraction: Fano's method
- Trade-offs between privacy and statistical utility

Extensions and other conclusions:
- In essentially any problem, the effective number of samples is n ↦ nα²
- In d-dimensional problems, the effective number of samples is n ↦ nα²/d
- Rates for regression, multinomial estimation, convex optimization
- Dimension-dependent effects: high-dimensional problems are impossible (no logarithmic dependence on dimension)
- Identification of the optimal mechanism requires geometric understanding

Many open problems:
- Other models of privacy?
- Non-local notions of privacy?
- Privacy without knowing the statistical objective?

Pre/post-print online: "Local Privacy and Statistical Minimax Rates." D., Jordan, & Wainwright (2013). arXiv:1302.3203 [stat.TH]