Local Privacy and Statistical Minimax Rates · 2020-01-03
TRANSCRIPT
Local Privacy and Statistical Minimax Rates
John C. Duchi, Michael I. Jordan, Martin J. Wainwright
University of California, Berkeley
December 2013
Goals for this talk
Bring together some classical concepts of decision theory and newer concepts of privacy
Illustration of problem

[Figure: database of X1, ..., Xn passes through channel Q to produce private views Z1, ..., Zn; infer θ(P)]

- Have data X1, ..., Xn
- Private views Z1, ..., Zn constructed from the Xi
- Often the goal is statistics of {X1, ..., Xn} (e.g., average salary)
- A distribution P and parameter θ(P) generate the data
- The sample X1, ..., Xn from P is not observed (only the Zi)
- Goal: infer the population parameter θ(P) based on Z1, ..., Zn
Primer on minimax rates of convergence and statistical inference
Illustration of classical problem

[Figure: sample X1, ..., Xn drawn from P; infer θ(P)]

- Have a distribution P and parameter θ(P) of P
- Sample X1, ..., Xn drawn from P and observed
- Goal: infer the population parameter θ(P)

Why? We care about making future predictions:
- What is the likelihood that a new resident of San Francisco needs food stamps?
- Biological prediction, web advertising, search, ...
Minimax risk

Central object of study: the minimax risk
- Parameter θ(P) of distribution P, e.g. the mean θ(P) = E_P[X]
- Loss ρ that measures the error of an estimate θ̂ of θ: ρ(θ̂, θ)
- E.g. ρ(θ̂, θ) = ‖θ̂ − θ‖₂², or something more esoteric/robust such as the Huber loss

    ρ_u(θ̂, θ) = { ½‖θ̂ − θ‖₂²           if ‖θ̂ − θ‖₂ ≤ u
                { u‖θ̂ − θ‖₂ − u²/2     otherwise

- Family of distributions P that we study, e.g. all P such that E_P[X²] ≤ 1
Minimax risk

Central object of study: the minimax risk
- Parameter θ(P) of distribution P
- Loss ρ that measures the error of an estimate θ̂ of θ: ρ(θ̂, θ), e.g. ρ(θ̂, θ) = ‖θ̂ − θ‖₂²
- Family of distributions P that we study

Look at the expected loss:

    Mn(θ(P), ρ) := inf_{θ̂} sup_{P∈P} E_P[ρ(θ̂(X1, ..., Xn), θ(P))]    (classical minimax risk)

- Worst case over distributions P
- Best case over all estimators θ̂ : Xⁿ → Θ

To study: the rate at which Mn(θ(P), ρ) → 0 as n grows
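As a concrete instance of these definitions (a standard worked example, not from the slides): for mean estimation over distributions with variance at most σ², the sample mean certifies the upper bound, so the classical rate is 1/n.

```latex
% Worked example (standard): mean estimation, squared-error loss.
% Family P = {P : Var_P(X) <= sigma^2}, parameter theta(P) = E_P[X].
\[
  \mathbb{E}_P\!\left[\big(\bar X_n - \theta(P)\big)^2\right]
  = \frac{\operatorname{Var}_P(X)}{n} \le \frac{\sigma^2}{n}
  \qquad \text{for } \bar X_n = \frac{1}{n}\sum_{i=1}^n X_i,
\]
\[
  \text{so } \mathfrak{M}_n\big(\theta(\mathcal{P}), \rho\big) \le \frac{\sigma^2}{n};
  \text{ a matching } \Omega(1/n) \text{ lower bound follows from the two-point argument below.}
\]
```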
Proving minimax bounds

This talk:
- Upper bounds will be ad hoc
- Lower bounds will be information-theoretic [Hasminskii 78, Birgé 83, Ibragimov and Hasminskii 81, Yang and Barron 99, Yu 97]
- NB: many known information-theoretic upper bounds [Barron, Birgé, Kivinen, ...]
Proving minimax lower bounds
Step 1: Reduce from estimation to testing
Step 2: Canonical testing problem
Step 3: Classical bounds on error probabilities
Proving minimax lower bounds

Step 1: Reduce from estimation to testing

[Figure: a 2δ-packing {θi} of Θ with an estimate θ̂; at most one packing point lies within δ of θ̂]

- Have a 2δ-packing of Θ, i.e. {θ1, θ2, ..., θK}
- Estimator θ̂
- At most one index is close to θ̂
- Can test the index i ∈ [K]
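The reduction rests on a one-line packing argument, spelled out here for completeness (standard reasoning, not on the slide):

```latex
% Why at most one index can be close to \hat\theta:
% if \|\hat\theta - \theta_i\| < \delta and \|\hat\theta - \theta_j\| < \delta for i \ne j, then
\[
  \|\theta_i - \theta_j\|
  \le \|\theta_i - \hat\theta\| + \|\hat\theta - \theta_j\|
  < 2\delta,
\]
% contradicting the 2\delta-packing property \|\theta_i - \theta_j\| \ge 2\delta.
% Hence the test \hat v = \arg\min_{v \in [K]} \|\hat\theta - \theta_v\| is well defined.
```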
Proving minimax lower bounds

Step 2: Canonical testing problem

- Nature chooses a random index V ∈ [K]
- Conditional on V = v, sample X1, ..., Xn i.i.d. from Pv
- Lower bound on the minimax error:

    sup_{P∈P} E_P[ρ(θ̂, θ(P))] ≥ (1/K) Σ_{v=1}^K E_v[ρ(θ̂, θv)]
                               ≥ (1/K) Σ_{v=1}^K ρ(δ) Pv(ρ(θ̂, θv) ≥ 2δ)

- Final canonical testing problem:

    Mn(θ(P), ρ) ≥ ρ(δ) min_{v̂} P(v̂(X1, ..., Xn) ≠ V).
Proving minimax lower bounds

Step 3: Classical bounds on error probabilities

Le Cam's method:

    P0(v̂ ≠ 0) + P1(v̂ ≠ 1) ≥ 1 − ‖P0 − P1‖_TV

Fano's inequality (for K > 2):

    P(v̂ ≠ V) ≥ 1 − (I(X1, ..., Xn; V) + log 2) / log K

Summarizing [diagram: Markov chain V → X → θ̂ → v̂]:

    Mn(θ(P), ρ) ≥ ρ(δ) min_{v̂} P(v̂(X1, ..., Xn) ≠ V)

Key idea: control the information-theoretic divergences ‖P0 − P1‖_TV or I(X1, ..., Xn; V) to attain the minimax rate.
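To make Le Cam's bound concrete, here is a small numeric sketch (my own illustration, not from the talk): for n i.i.d. coin flips, the likelihood ratio between the product measures P0ⁿ and P1ⁿ depends only on the number of heads, so their total variation distance can be computed exactly from binomial probabilities.

```python
# Numeric sketch (my own) of Le Cam's two-point bound for coin flips.
# The likelihood ratio between P0^n and P1^n depends only on the head
# count, so ||P0^n - P1^n||_TV equals the TV distance between the two
# Binomial(n, p) laws of that count.
from math import comb


def binom_pmf(n: int, p: float, k: int) -> float:
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def tv_product_bernoulli(n: int, p0: float, p1: float) -> float:
    return 0.5 * sum(
        abs(binom_pmf(n, p0, k) - binom_pmf(n, p1, k)) for k in range(n + 1)
    )


n, p0, p1 = 100, 0.5, 0.55
tv = tv_product_bernoulli(n, p0, p1)
# Le Cam: for ANY test v_hat, P0(v_hat != 0) + P1(v_hat != 1) >= 1 - TV.
print(f"TV = {tv:.4f}; summed testing error >= {1.0 - tv:.4f}")
```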
Inference under privacy constraints

[Figure: database of X1, ..., Xn passes through channel Q to produce private views Z1, ..., Zn; infer θ(P)]

- Have a distribution P and parameter θ(P)
- Sample X1, ..., Xn drawn from P and not observed
- Private views Z1, ..., Zn constructed from the Xi
- Goal: infer the population parameter θ(P) based on Z1, ..., Zn
Model of privacy

Local privacy: don't trust the collector of the data (Evfimievski et al. 2003, Warner 1965)

[Figure: each individual i sends Xi through the channel Q(Z | X) to produce Zi]

- Individuals i ∈ {1, ..., n} have personal data Xi ∼ Pθ
- Estimator (Z1, ..., Zn) ↦ θ̂(Z_{1:n})
Differential privacy

Definition: The channel Q is α-differentially private if

    max_{z, x, x′} Q(Z = z | x) / Q(Z = z | x′) ≤ e^α.

[Dwork, McSherry, Nissim, Smith 2006]

What does this mean?
- Given Z, one cannot tell which x produced Z
- Testing argument: based on Z, an adversary must distinguish between x and x′:

    FNR + FPR ≥ 2 / (1 + e^α)

[Wasserman and Zhou 2011]
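A minimal sketch of an α-differentially private local channel: Warner's binary randomized response. The function names are my own; the likelihood-ratio check at the end confirms the e^α bound from the definition.

```python
# Sketch (my own code): Warner's binary randomized response as an
# alpha-DP local channel, plus a direct check of the definition.
import math
import random


def channel_prob(z: int, x: int, alpha: float) -> float:
    """Q(Z = z | x): report the true bit with probability e^a / (1 + e^a)."""
    p_truth = math.exp(alpha) / (1.0 + math.exp(alpha))
    return p_truth if z == x else 1.0 - p_truth


def randomized_response(x: int, alpha: float) -> int:
    return x if random.random() < channel_prob(x, x, alpha) else 1 - x


alpha = 0.5
# Differential privacy check: max over z, x, x' of Q(z|x)/Q(z|x') <= e^alpha.
worst = max(
    channel_prob(z, x, alpha) / channel_prob(z, xp, alpha)
    for z in (0, 1) for x in (0, 1) for xp in (0, 1)
)
print(f"worst-case ratio {worst:.4f} vs e^alpha = {math.exp(alpha):.4f}")
print("ten private reports of x = 1:", [randomized_response(1, alpha) for _ in range(10)])
```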
Minimax risk

Central object of study: the private minimax risk
- Parameter θ(P) of distribution P
- Loss ρ that measures the error of an estimate θ̂ of θ: ρ(θ̂, θ), e.g. ρ(θ̂, θ) = ‖θ̂ − θ‖₂²
- Family of distributions P that we study

Look at the expected loss:

    Mn(θ(P), ρ, α) := inf_{Q∈Qα} inf_{θ̂} sup_{P∈P} E_{P,Q}[ρ(θ̂(Z1, ..., Zn), θ(P))]    (private minimax risk)

- Worst case over distributions P
- Best case over all estimators θ̂ : Zⁿ → Θ
- Best case over all α-private channels Q ∈ Qα from X to Z
Goal for the rest of the talk

How does the minimax risk

    Mn(θ(P), ρ, α) := inf_{Q∈Qα} inf_{θ̂} sup_{P∈P} E_{P,Q}[ρ(θ̂(Z1, ..., Zn), θ(P))]

change with the privacy parameter α and the number of samples n?

Many related results:
- Non-population lower bounds [Hardt and Talwar 10; Nikolov, Talwar, Zhang 13; Hall, Rinaldo, Wasserman 11; Chaudhuri, Monteleoni, Sarwate 12]
- Related population bounds:
  - Two-point hypotheses [Chaudhuri & Hsu 12; Beimel, Nissim, Omri 08] (e.g., for 1-dimensional bias estimation, one gets 1/(nα)² error)
  - PAC learning results [Beimel, Brenner, Kasiviswanathan, Nissim 13]
Examples

- Mean estimation
- Fixed-design regression
- Convex risk minimization (i.e., online learning)
- Multinomial (probability) estimation
- Nonparametric density estimation
Example 1: Mean estimation

Problem: Estimate the mean of distributions P with a bounded kth moment, k ≥ 2:

    θ(P) := E_P[X],    E_P[|X|^k] ≤ 1.

Standard estimator: θ̂ = (1/n) Σ_{i=1}^n Xi, which satisfies E[(θ̂ − E[X])²] ≤ 1/n.

Proposition (D., Jordan, Wainwright): Non-private minimax rate

    1/n ≲ E[(θ̂ − E[X])²] ≲ 1/n

Proposition (D., Jordan, Wainwright): Private minimax rate

    1/(nα²)^((k−1)/k) ≲ Mn(E[X], (·)², α) ≲ 1/(nα²)^((k−1)/k)

Examples:
- For two moments (k = 2), the rate degrades from the parametric 1/n to 1/√(nα²)
- For k → ∞ (bounded random variables), the parametric rate is retained with the decreased effective sample size n ↦ nα²
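For intuition, here is an illustrative locally private mean estimator: clip each sample to [−T, T] and add Laplace(2T/α) noise, which makes each report α-differentially private because the clipped value has ℓ1-sensitivity 2T. This clip-and-noise channel is a standard construction, not necessarily the optimal mechanism behind the proposition; choosing T ≈ (nα²)^(1/(2k)) balances clipping bias (≲ T^(1−k)) against noise variance (≲ T²/(nα²)) and recovers the (nα²)^(−(k−1)/k) rate.

```python
# Illustrative sketch (NOT necessarily the optimal mechanism from the
# proposition): clip to [-T, T], then add Laplace(2T/alpha) noise; the
# clipped value has L1-sensitivity 2T, so each report is alpha-DP.
import random


def private_report(x: float, alpha: float, T: float) -> float:
    clipped = max(-T, min(T, x))
    # Laplace(scale = 2T/alpha) as a random sign times an exponential.
    noise = random.choice((-1.0, 1.0)) * random.expovariate(alpha / (2.0 * T))
    return clipped + noise


def private_mean(xs: list[float], alpha: float, T: float) -> float:
    """Average of alpha-DP reports; clipping bias is at most T^(1-k)
    under E|X|^k <= 1, and the noise contributes O(T^2 / (n alpha^2))."""
    return sum(private_report(x, alpha, T) for x in xs) / len(xs)


# Usage with k = 2 moments: T ~ (n alpha^2)^(1/4) balances bias and noise.
random.seed(0)
n, alpha = 100_000, 1.0
xs = [random.gauss(0.3, 0.5) for _ in range(n)]   # E[X^2] = 0.34 <= 1
print(private_mean(xs, alpha, T=(n * alpha**2) ** 0.25))
```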
Example 2: Multinomial estimation

Problem: Get observations X ∈ [d] and wish to estimate

    θj := P(X = j)

Example (income brackets):
- $0–$10,000: θ1 = .05
- $10,001–$20,000: θ2 = .1
- $20,001–$40,000: θ3 = .2
- $40,001–$80,000: θ4 = .4
- $80,001–$160,000: θ5 = .2
- $160,001–$320,000: θ6 = .04
- $320,001+: θ7 = .01
Example 2: Multinomial estimation

Standard estimator (counts):

    θ̂j = (1/n) Σ_{i=1}^n 1{Xi = j} = P̂(X = j)

Usual rate:

    E[‖θ̂ − θ‖₂²] ≤ 1/n.
Example 2: Multinomial estimation

Proposition: Non-private minimax rate

    1/n ≲ E[‖θ̂ − θ‖₂²] ≲ 1/n

Proposition: Private minimax rate

    d/(nα²) ≲ Mn([P(X = j)]_{j=1}^d, ‖·‖₂², α) ≲ d/(nα²)

Take-away: sample size reduction

    n ↦ nα²/d
Example 2: Multinomial estimation

- Optimal mechanism: randomized response. Resample each coordinate of the one-hot answer by Bernoulli coin flips (see the sketch after this slide)

[Figure: true income bracket marked on the left; randomized-response output on the right, with some brackets' bits flipped]
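A sketch of the randomized-response mechanism and its debiasing step (my own code; the flipping probability 1 − e^(α/2)/(1 + e^(α/2)) is one standard calibration, not constants from the talk): two one-hot inputs differ in exactly two coordinates, so the per-report likelihood ratio is at most e^α.

```python
# Sketch: d-ary randomized response for multinomial estimation.
# Each one-hot coordinate is independently kept with probability
# p = e^{alpha/2} / (1 + e^{alpha/2}) and flipped otherwise; two one-hot
# inputs differ in exactly 2 coordinates, so the channel is alpha-DP.
import math
import random


def rr_report(x: int, d: int, alpha: float) -> list[int]:
    """Privatize category x in {0, ..., d-1} by flipping one-hot bits."""
    p = math.exp(alpha / 2) / (1 + math.exp(alpha / 2))
    onehot = [1 if j == x else 0 for j in range(d)]
    return [b if random.random() < p else 1 - b for b in onehot]


def estimate_theta(xs: list[int], d: int, alpha: float) -> list[float]:
    """Debiased estimate, using E[Z_j] = (2p - 1) * theta_j + (1 - p)."""
    p = math.exp(alpha / 2) / (1 + math.exp(alpha / 2))
    n = len(xs)
    counts = [0.0] * d
    for x in xs:
        for j, z in enumerate(rr_report(x, d, alpha)):
            counts[j] += z
    return [(counts[j] / n - (1 - p)) / (2 * p - 1) for j in range(d)]


random.seed(0)
theta = [0.05, 0.1, 0.2, 0.4, 0.2, 0.04, 0.01]   # income brackets above
xs = random.choices(range(len(theta)), weights=theta, k=50_000)
print([round(t, 3) for t in estimate_theta(xs, len(theta), alpha=1.0)])
```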
Main consequences

Goal: Understand the tradeoff between the differential privacy bound α and the sample size n

"Theorem 1": The effective sample size for essentially any¹ problem is made worse by at least

    n ↦ nα²

"Theorem 2": The effective sample size for d-dimensional problems scales as

    n ↦ nα²/d

¹ essentially any: any problem whose minimax rate can be controlled by information-theoretic techniques
General theory

Showing minimax bounds:
- Have possible "true" parameters {θv} we want to find
- A distribution Pv is associated with each parameter
- The problem is hard when Pv ≈ Pv′

[Figure: well-separated Pv and Pv′ (easy) vs. heavily overlapping Pv and Pv′ (hard)]
Differential privacy and probability distributions

Samples: the Zi are drawn as Xi → Q → Zi from the marginal

    Mv(Z) := ∫ Q(Z | X = x) dPv(x)

Strong data processing: If Q(Z | x)/Q(Z | x′) ≤ e^α, then

    Dkl(M1‖M2) + Dkl(M2‖M1) ≤ 4(e^α − 1)² ‖P1 − P2‖²_TV

[Figure: P1 and P2 pushed through the channel Q yield marginals M1 and M2]
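Since the strong data-processing inequality is a theorem, a numeric sanity check should never fail; the sketch below (my own code) tests it with a d-ary randomized-response channel, which satisfies Q(z | x)/Q(z | x′) ≤ e^α.

```python
# Numeric sanity check (my own) of the strong data-processing inequality
#   Dkl(M1||M2) + Dkl(M2||M1) <= 4 (e^alpha - 1)^2 ||P1 - P2||_TV^2
# using a d-ary randomized-response channel, which is alpha-DP.
import math
import random


def rand_simplex(d: int) -> list[float]:
    w = [random.random() for _ in range(d)]
    s = sum(w)
    return [x / s for x in w]


def marginal(P: list[float], alpha: float) -> list[float]:
    """M(z) = sum_x Q(z|x) P(x), with Q(z|x) = e^a/(e^a + d - 1) if z == x."""
    d = len(P)
    denom = math.exp(alpha) + d - 1
    keep, flip = math.exp(alpha) / denom, 1.0 / denom
    return [sum((keep if z == x else flip) * P[x] for x in range(d)) for z in range(d)]


def kl(P: list[float], Q: list[float]) -> float:
    return sum(p * math.log(p / q) for p, q in zip(P, Q) if p > 0)


random.seed(1)
alpha, d = 0.8, 5
for _ in range(5):
    P1, P2 = rand_simplex(d), rand_simplex(d)
    M1, M2 = marginal(P1, alpha), marginal(P2, alpha)
    lhs = kl(M1, M2) + kl(M2, M1)
    tv = 0.5 * sum(abs(a - b) for a, b in zip(P1, P2))
    rhs = 4 * (math.exp(alpha) - 1) ** 2 * tv ** 2
    print(f"{lhs:.5f} <= {rhs:.5f}: {lhs <= rhs}")
```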
Contraction and lower bounds

Samples: the Zi are drawn as Xi → Q → Zi from the marginal Mv(Z) := ∫ Q(Z | X = x) dPv(x)

Le Cam's method:
- θ1 and θ2 are 2δ-separated
- Non-private version:

    Mn(Θ, (·)²) ≥ δ²(1 − √(n Dkl(P1‖P2)))

- Private version:

    Mn(Θ, (·)², α) ≥ δ²(1 − √(nα² ‖P1 − P2‖²_TV))

[Figure: P1 and P2 centered at θ1 and θ2 at distance 2δ; the private marginals M1 and M2 overlap more]
Variational results on privacy and probability distributions

Samples: the Zi are drawn as Xi → Q → Zi from the marginal Mv(Z) := ∫ Q(Z | X = x) dPv(x)

Canonical problem: Nature samples V uniformly from {1, ..., K} and draws Xi i.i.d. ∼ Pv when V = v

[Markov chain: V → X → Z]

Goal: find V based on Z1, ..., Zn

Difficulty of the problem: the mutual information seen earlier is replaced,

    I(X1, ..., Xn; V) ↦ I(Z1, ..., Zn; V)
Fano inequality, lower bounds, contraction

[Markov chain: V → X → Z]

- Have parameters θ1, ..., θK; choose V ∈ [K] uniformly at random
- Sample Xi according to θv when V = v
- Sample Zi according to Q(· | Xi)
- Non-private Fano inequality:

    P(Error) ≥ 1 − I(X1, ..., Xn; V)/log K − o(1)

- Private Fano inequality:

    P(Error) ≥ 1 − I(Z1, ..., Zn; V)/log K − o(1)

  which, by the mutual information contraction on the next slide, yields roughly

    P(Error) ≥ 1 − α² I(X1, ..., Xn; V)/(d log K) − o(1)
Variational results on privacy and probability distributions

[Markov chain: V → X → Z]

- Define the mixture P̄ = (1/K) Σ_{v=1}^K Pv

Mutual information contraction: For any non-interactive α-locally private channel Q,

    I(Z1, ..., Zn; V) ≤ n(e^α − 1)² sup_S (1/K) Σ_{v=1}^K (Pv(S) − P̄(S))²

where the supremum term is a dimension-dependent total variation.

What happens? Roughly,

    n sup_S (1/K) Σ_{v=1}^K (Pv(S) − P̄(S))² ≈ (1/d) I(X1, ..., Xn; V)    (classical)

so that

    I(Z1, ..., Zn; V) ≤ (α²/d) I(X1, ..., Xn; V)
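Chaining the two displays above (a short derivation in the spirit of the slides, with o(1) and constant-factor bookkeeping suppressed) makes the sample-size interpretation explicit:

```latex
% Plugging the contraction into the private Fano inequality:
\[
  \mathbb{P}(\hat v \ne V)
  \;\ge\; 1 - \frac{I(Z_1,\ldots,Z_n; V) + \log 2}{\log K}
  \;\gtrsim\; 1 - \frac{(\alpha^2/d)\, I(X_1,\ldots,X_n; V) + \log 2}{\log K}.
\]
% The private problem thus behaves like the classical one with mutual
% information shrunk by a factor alpha^2 / d, i.e. an effective sample size
\[
  n \;\mapsto\; \frac{n\alpha^2}{d}.
\]
```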
Summary

High-level results:
- A formal minimax framework for local differential privacy
- Two main theorems bound distances between probability distributions as a function of privacy:
  - Pairwise contraction: Le Cam's method
  - Mutual information contraction: Fano's method
- Trade-offs between privacy and statistical utility

Extensions and other conclusions:
- In essentially any problem, the effective number of samples is n ↦ nα²
- In d-dimensional problems, the effective number of samples is n ↦ nα²/d
- Rates for regression, multinomial estimation, convex optimization
- Dimension-dependent effects: high-dimensional problems are impossible (no logarithmic dependence on dimension)
- Identification of the optimal mechanism requires geometric understanding

Many open problems:
- Other models of privacy?
- Non-local notions of privacy?
- Privacy without knowing the statistical objective?

Pre/post-print online: "Local Privacy and Statistical Minimax Rates." D., Jordan, & Wainwright (2013). arXiv:1302.3203 [stat.TH]