
Page 1: Foundations of Privacy Lecture 5

Foundations of Privacy

Lecture 5

Lecturer: Moni Naor

Page 2: Foundations of Privacy Lecture 5

Recap of last week's lecture

• The Exponential Mechanism
  – Differential privacy
  – May yield utility/approximation
  – Is defined and evaluated by considering all possible answers

• Counting Queries
  – The BLR Algorithm
  – Efficient Algorithm

Page 3: Foundations of Privacy Lecture 5

[Diagram: a Database is fed to the Sanitizer; queries (query 1, query 2, …) are answered (answer 1, answer 2, answer 3) from the sanitized output.]

Synthetic DB: the output is itself a DB (of entries from the same universe U); the user reconstructs answers by evaluating each query on the output DB.

Software- and people-compatible; answers are consistent.
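To make the "user reconstructs answers" step concrete, here is a minimal sketch of the consumer side (mine, not from the slides; the predicate and both databases are hypothetical):

```python
# Minimal sketch: the user evaluates a counting query directly on the
# released synthetic DB, exactly as they would on the original DB.

def counting_query(c, db):
    """Fraction of db rows satisfying the predicate c: U -> {0,1}."""
    return sum(c(x) for x in db) / len(db)

# Hypothetical example: universe U = small ints, predicate "is even".
original_db = [1, 2, 2, 3, 4, 6, 7, 8]
synthetic_db = [2, 3, 4, 8]                   # what a sanitizer might release

is_even = lambda x: x % 2 == 0
print(counting_query(is_even, original_db))   # true answer: 0.625
print(counting_query(is_even, synthetic_db))  # reconstructed answer: 0.75
```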

Page 4: Foundations of Privacy Lecture 5

Counting Queries

• Queries with low sensitivity

Counting queries: C is a set of predicates c: U → {0,1}.
Query: how many participants in D satisfy c?

Relaxed accuracy: answer each query within α additive error w.h.p.
Not so bad: such error is inherent in statistical analysis anyway.

Assume all queries are given in advance: the non-interactive setting.

[Diagram: universe U, database D of size n, query c.]
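Spelled out (my formalization; the slides leave it implicit), the query and the accuracy requirement are:

```latex
% Counting query and (alpha, beta)-accuracy, written out explicitly.
q_c(D) \;=\; \bigl|\{\, i \in [n] \;:\; c(x_i) = 1 \,\}\bigr|
\qquad\text{for each } c \in C,

\Pr\bigl[\;\forall c \in C:\ \lvert A(D)(c) - q_c(D)\rvert \le \alpha \;\bigr] \;\ge\; 1-\beta .
```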

Page 5: Foundations of Privacy Lecture 5

The BLR Algorithm

For DBs F and D: dist(F,D) = max_{q∈C} |q(F) − q(D)|

Intuition: far-away DBs get smaller probability.

Algorithm on input DB D: sample from a distribution on DBs of size m (m < n), where

DB F gets picked w.p. ∝ e^{−ε·dist(F,D)}

[Blum Ligett Roth 08]
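As a toy illustration of the sampler (my sketch, not the authors' code; queries are normalized to fractions, and the tiny universe, predicates, and ε are hypothetical), one can enumerate all size-m databases, which already foreshadows the running-time issue below:

```python
# Toy BLR sampler by brute-force enumeration: pick a size-m DB with
# probability proportional to exp(-eps * dist(F, D)). Enumerating all
# size-m DBs over U is exponential -- exactly the running-time problem
# discussed on the next slides.
import itertools, math, random

def dist(F, D, C):
    """max over predicates c in C of |c-fraction(F) - c-fraction(D)|."""
    q = lambda c, db: sum(c(x) for x in db) / len(db)
    return max(abs(q(c, F) - q(c, D)) for c in C)

def blr_sample(D, U, C, m, eps):
    candidates = list(itertools.combinations_with_replacement(U, m))
    weights = [math.exp(-eps * dist(F, D, C)) for F in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

U = range(4)                                   # hypothetical tiny universe
C = [lambda x: x % 2 == 0, lambda x: x >= 2]   # two counting predicates
D = [0, 1, 2, 2, 3, 3]
print(blr_sample(D, U, C, m=3, eps=1.0))
```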

Page 6: Foundations of Privacy Lecture 5

Counting Queries (cont.)

Same setting as before: C is a set of predicates c: U → {0,1}; answer each query within α additive error w.h.p.

New ingredient: release a sample F of size m that approximates D on all the given predicates.

[Diagram: universe U, database D of size n, sample F of size m, query c.]

Page 7: Foundations of Privacy Lecture 5

The BLR Algorithm: Error Õ(n^{2/3} log|C|)

There exists F_good of size m = Õ((n/α)²·log|C|) s.t. dist(F_good, D) ≤ α

Pr[F_good] ∝ e^{−εα}

For any F_bad with dist ≥ 2α: Pr[F_bad] ∝ e^{−2εα}

Union bound: ∑_{bad DBs F_bad} Pr[F_bad] ≲ |U|^m · e^{−2εα}

For α = Õ(n^{2/3} log|C|): Pr[F_good] >> ∑ Pr[F_bad]

Reminder, algorithm on input DB D: sample from a distribution on DBs of size m (m < n); DB F gets picked w.p. ∝ e^{−ε·dist(F,D)}
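The choice of α can be checked by balancing the two probabilities; this back-of-the-envelope calculation is my reconstruction (Õ absorbs the log factors):

```latex
% We need  e^{-\varepsilon\alpha} \gg |U|^m e^{-2\varepsilon\alpha},
% i.e.  \varepsilon\alpha \gtrsim m\log|U|,
% with  m = \tilde{O}((n/\alpha)^2 \log|C|):
\varepsilon\alpha \;\gtrsim\; \Bigl(\frac{n}{\alpha}\Bigr)^{2}\log|C|\,\log|U|
\;\;\Longrightarrow\;\;
\alpha^{3} \;\gtrsim\; \frac{n^{2}\log|C|\,\log|U|}{\varepsilon}
\;\;\Longrightarrow\;\;
\alpha \;=\; \tilde{O}\!\Bigl(n^{2/3}\Bigl(\tfrac{\log|C|\,\log|U|}{\varepsilon}\Bigr)^{1/3}\Bigr).
```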

Page 8: Foundations of Privacy Lecture 5

The BLR Algorithm: Running Time

Generating the distribution by enumeration: need to enumerate every size-m database, where m = Õ((n/α)²·log|C|)

Running time ≈ |U|^{Õ((n/α)²·log|C|)}

Reminder, algorithm on input DB D: sample from a distribution on DBs of size m (m < n); DB F gets picked w.p. ∝ e^{−ε·dist(F,D)}

Page 9: Foundations of Privacy Lecture 5

Conclusion

Offline algorithm, 2ε-differential privacy, for any set C of counting queries

• Error α is Õ(n^{2/3} log|C|/ε)

• Super-poly running time: |U|^{Õ((n/α)²·log|C|)}

Page 10: Foundations of Privacy Lecture 5

Can we Efficiently Sanitize?

The good news: if the universe is small, we can sanitize EFFICIENTLY, in time poly(|C|,|U|).

The bad news: we cannot do much better, namely we cannot sanitize in time sub-poly(|C|) AND sub-poly(|U|).

Page 11: Foundations of Privacy Lecture 5

How Efficiently Can We Sanitize?

[2×2 table: rows |U| ∈ {subpoly, poly}, columns |C| ∈ {subpoly, poly}. The regime where the runtime may be poly in both |C| and |U| is marked "Good news!"; the sub-poly regimes are marked "?".]

Page 12: Foundations of Privacy Lecture 5

The Good News: Can Sanitize When Universe is Small

Efficient sanitizer for query set C:
• DB size n ≥ Õ(|C|^{o(1)} log|U|)
• error ~ n^{2/3}
• runtime poly(|C|,|U|)

Output is a synthetic database.

Compare to [Blum Ligett Roth]: n ≥ Õ(log|C| log|U|), runtime super-poly(|C|,|U|).

Page 13: Foundations of Privacy Lecture 5

Recursive Algorithm

C₀ = C ⊇ C₁ ⊇ C₂ ⊇ ⋯ ⊇ C_b

Start with DB D and large query set C. Repeatedly choose a random subset C_{i+1} of C_i: shrink the query set by a (small) factor.

Page 14: Foundations of Privacy Lecture 5

Recursive Algorithm

Start with DB D and large query set C. Repeatedly choose a random subset C_{i+1} of C_i: shrink the query set by a (small) factor. End of recursion: sanitize D w.r.t. the small query set C_b.

Unwinding one level of the recursion (see the sketch after this slide):
• The output is good for all queries in the small set C_{i+1}
• Extract utility on almost all queries in the large set C_i
• Fix the remaining "underprivileged" queries in the large set C_i

C₀ = C ⊇ C₁ ⊇ C₂ ⊇ ⋯ ⊇ C_b
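Here is the sketch referred to above: a structural skeleton of the recursion. It is my paraphrase of the slides; the shrink factor, noise scale, accuracy threshold, and the brute-force base case are simplified stand-ins, and no differential-privacy calibration is done here.

```python
# Toy skeleton of the recursive sanitizer; every constant is hypothetical.
import random

SHRINK = 2       # hypothetical shrink factor f
NOISE = 0.05     # hypothetical noise scale for the "manual" fixes

def q(c, db):
    """Normalized counting query: fraction of db satisfying predicate c."""
    return sum(c(x) for x in db) / len(db)

def brute_force(D, C, U, m=20):
    """Base-case stand-in: best of a few random size-m DBs on the small
    query set C (the real base case is the exponential mechanism)."""
    cands = (tuple(random.choices(U, k=m)) for _ in range(200))
    return list(min(cands, key=lambda F: max(abs(q(c, F) - q(c, D)) for c in C)))

def sanitize(D, C, U, depth):
    if depth == 0:
        return brute_force(D, C, U), {}
    C_next = random.sample(C, max(1, len(C) // SHRINK))   # shrink query set
    y, fixes = sanitize(D, C_next, U, depth - 1)          # good for C_next
    for c in C:                                           # fix the rest
        if abs(q(c, y) - q(c, D)) > 0.1:                  # "underprivileged"
            fixes[c] = q(c, D) + random.gauss(0, NOISE)   # noisy count
    return y, fixes
```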

Page 15: Foundations of Privacy Lecture 5

Recursive Algorithm Overview

Want to sanitize DB D for query set C. Say we have a sanitizer A' for smaller subsets C' ⊆ C, and A' outputs a small synthetic database. Choose a random C' ⊆ C and sanitize D for C' using A'.

"Magic": the sanitization gives accurate answers on all but a small subset B ⊆ C.

Fix the "underprivileged" queries in B "manually".

[Diagram: C' and B drawn inside C; A' sanitizes C', the queries in B are fixed manually. Annotated: Why? How? Where?]

Page 16: Foundations of Privacy Lecture 5

Sanitize for few queries, get utility for almost all

Consider an m-bit synthetic DB output y of A' vs. the DB D. If y is "bad" for a query set B_y of fractional size ≥ m/s, then

Pr_{C'}[C' ∩ B_y = ∅] ≤ (1 − m/s)^{|C'|} ≈ e^{−m}  (for |C'| = s)

Since there are only ~2^m potential m-bit outputs y, w.h.p. simultaneously for all y's with a large bad set B_y, C' intersects B_y (an Occam's-razor argument, written out below).

So y* = A'(D), which is good for all of C', must be good for almost all of C.

[Diagram: C' inside C; a potential m-bit output DB y with its bad set B_y; the realized output y* whose bad set B_{y*} is missed by C'.]
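Written out, the union bound behind the "magic" is (my reconstruction):

```latex
% Union bound over the at most 2^m possible m-bit outputs y:
\Pr_{C'}\Bigl[\exists\, y\in\{0,1\}^m :\;
   |B_y|\ge \tfrac{m}{s}|C| \ \text{and}\ C'\cap B_y=\emptyset\Bigr]
\;\le\; 2^m\Bigl(1-\frac{m}{s}\Bigr)^{|C'|}
\;\le\; 2^m e^{-m|C'|/s}
\;=\; 2^m e^{-m} \;<\; 1
\qquad\text{for } |C'|=s .
```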

Page 17: Foundations of Privacy Lecture 5

How to Get a Synthetic DB? The Syntheticizer

Problem: we need a small synthetic DB, but we have a large output of some other form.

Lemma [“Syntheticizer”]

Given sanitizer A with α-accuracy and arbitrary output

Produce sanitizer A' with 2α-accuracy and synthetic DB output of size Õ(log|C|/α²)

Runtime is poly(|U|,|C|)

Transform output to synthetic DB using linear programming

Variable per item in U, constraint per query in C

Page 18: Foundations of Privacy Lecture 5

The Linear Program

• Run the sanitizer A, then use its output to obtain differentially private counts v_c for all the concepts in C
  – The database is never used again, hence privacy

• Come up with a low-weight fractional database that approximates these counts

• Transform this fractional database into a standard synthetic database by rounding the fractional counts

Page 19: Foundations of Privacy Lecture 5

• For every i ∈ U, a variable x_i

• For every c ∈ C, a constraint:

v_c − α ≤ ∑_{i s.t. c(i)=1} x_i ≤ v_c + α
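A minimal sketch of this LP with scipy's linprog (my illustration; the slides specify only the variables and constraints, and the toy universe, predicates, noisy counts v_c, and α below are hypothetical):

```python
# One variable x_i per universe item, two inequalities per query in C.
import numpy as np
from scipy.optimize import linprog

def fractional_db(U, C, v, alpha):
    """Find x >= 0 with  v_c - alpha <= sum_{i: c(i)=1} x_i <= v_c + alpha
    for every predicate c in C."""
    n = len(U)
    A_ub, b_ub = [], []
    for c, vc in zip(C, v):
        row = np.array([1.0 if c(i) else 0.0 for i in U])
        A_ub.append(row);  b_ub.append(vc + alpha)   # sum <= v_c + alpha
        A_ub.append(-row); b_ub.append(alpha - vc)   # sum >= v_c - alpha
    # Any feasible point will do; minimize total weight as the objective.
    res = linprog(c=np.ones(n), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(0, None)] * n, method="highs")
    return res.x

U = list(range(10))                              # hypothetical toy universe
C = [lambda i: i % 2 == 0, lambda i: i >= 5]     # two predicates
x = fractional_db(U, C, v=[0.52, 0.47], alpha=0.05)
print(x)
```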

Page 20: Foundations of Privacy Lecture 5

The Linear Program

• Why is there a fractional solution?
  – The real (integer) database is one example!

• Rounding:
  – Scale the fractional database so that its total weight is 1
  – Round each fractional point down to the closest multiple of α/|U|
  – Treat the rounded fractional database as an integer synthetic database of size at most |U|/α
  – If it is too large, subsample
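The rounding step as a short sketch, under the same caveats (the granularity α/|U| and the size bound |U|/α are the slide's; the code is mine):

```python
# Scale to total weight 1, round each coordinate down to a multiple of
# alpha/|U|, then read off an integer DB of size at most |U|/alpha.
import random

def round_to_synthetic(x, U, alpha, max_size=None):
    total = sum(x)
    grid = alpha / len(U)
    w = [int((xi / total) / grid) for xi in x]   # copies of item i
    db = [u for u, k in zip(U, w) for _ in range(k)]
    if max_size and len(db) > max_size:          # if too large, subsample
        db = random.sample(db, max_size)
    return db
```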

Page 21: Foundations of Privacy Lecture 5

How Do We Use Synthetic DB?

Why a Synthetic DB?

1. Easy to "shrink" DBs by sub-sampling Õ(log|C|/α²) DB items

2. Gives counts for every query: the output is well-defined even for queries that were not around when sanitizing

Page 22: Foundations of Privacy Lecture 5

Utility for all queries: First Attempt

Sanitizing a small C' is easy ("brute force"), and we can "shrink" the output using the syntheticizer.

Sub-sample a small C', work for all but a few queries. Repeat many times and take a majority?

Doesn't work: "underprivileged" queries remain.

[Diagram: C with sub-samples C', C'' and the set B of underprivileged queries.]

Page 23: Foundations of Privacy Lecture 5

Utility for All Queries: Fix the "Underprivileged"

Lemma

Given a query set C and a diff. private sanitizer A that:
1. works for every C' ⊆ C with |C'| = s, and
2. outputs a synthetic DB of size ≤ m,
we get a sanitizer for C with utility on all queries. Need DB size n ≥ Õ(|C|m/s).

Page 24: Foundations of Privacy Lecture 5

Proof Outline

Subsample a small C', get a synthetic DB that works for all but a few (~|C|m/s) "underprivileged" queries.

Now "manually" correct those few. "Brute force": release noisy counts v_c (noise ~ |C|m/s).

We also need to say which queries are underprivileged… and that depends on the DB D. What about privacy?

Key point: regardless of D, almost all queries are strongly privileged. Release a noisy indicator vector.

For the privacy analysis, we need only consider the ~|C|m/s potentially underprivileged queries.

Page 25: Foundations of Privacy Lecture 5

Recursive Algorithm: Recap

C₀ = C ⊇ C₁ ⊇ C₂ ⊇ ⋯ ⊇ C_b

Start with DB D and large query set C. Repeatedly choose a random subset C_{i+1} of C_i: shrink by a factor of f.

Page 26: Foundations of Privacy Lecture 5

Recursive Algorithm: Recap

Start with DB D and large query set C. Repeatedly choose a random subset C_{i+1} of C_i: shrink by a factor of f. Sanitize D w.r.t. the small set C_b (use the "brute force" sanitizer). The syntheticizer transforms the output into a small synthetic DB. Fix the "underprivileged" queries (need n ≥ Õ(f)). We lose a 2^b factor in accuracy, and the "brute force" step needs n ≥ 2^b·|C_b|.

C₀ = C ⊇ C₁ ⊇ C₂ ⊇ ⋯ ⊇ C_b

n ≥ |C|^{o(1)} by trading off b and f

Page 27: Foundations of Privacy Lecture 5

And Now… Bad News

Runtime cannot be sub-poly in |C| or in |U|:
• when the output is a synthetic DB (as in the positive result)
• for general output

The Exponential Mechanism cannot be implemented efficiently.

Want hardness… Got Crypto?

Page 28: Foundations of Privacy Lecture 5

The Bad News

For large C and U we can't get efficient sanitizers!
• when the output is a synthetic DB (as in the positive result)
• for general output

The Exponential Mechanism cannot be implemented efficiently.

Want hardness… Got Crypto?

Page 29: Foundations of Privacy Lecture 5

Digital Signatures

Digital signatures: a key pair (sk, vk). Can be built from any one-way function [NaYu, Ro].

[Diagram: pairs (m₁, sig(m₁)), (m₂, sig(m₂)), …, (m_n, sig(m_n)), all valid signatures under vk; a new pair (m', sig(m')).]

Hard to forge a new signature.

Page 30: Foundations of Privacy Lecture 5

Signatures ⇒ No Synthetic DB

Universe: (m, s) message–signature pairs.
Queries: c_vk(m, s) outputs 1 iff s is a valid signature of m under vk.

[Diagram: an input DB of valid pairs (m₁, sig(m₁)), …, (m_n, sig(m_n)) goes into the sanitizer, which outputs pairs (m'₁, s₁), …, (m'_k, s_k).]

For accuracy, most output pairs must be valid signatures under the same vk. Since forging is hard, the sanitizer can only output signatures it was given: inputs appear in the output, so there is no privacy!
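A concrete instance of this universe and query, sketched with Ed25519 from the Python cryptography package (my illustration; the slides are scheme-agnostic, and the choice of scheme and messages is mine):

```python
# Universe = (message, signature) pairs; the query c_vk checks validity.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

sk = Ed25519PrivateKey.generate()
vk = sk.public_key()

# Database: n (message, signature) pairs, all valid under vk.
D = [(m, sk.sign(m)) for m in [b"row-%d" % i for i in range(8)]]

def c_vk(pair):
    """The counting-query predicate: 1 iff the pair verifies under vk."""
    m, s = pair
    try:
        vk.verify(s, m)
        return 1
    except InvalidSignature:
        return 0

# Any accurate synthetic DB must mostly consist of pairs with c_vk = 1;
# unforgeability says such pairs can only be copies of input rows.
print(sum(c_vk(p) for p in D) / len(D))   # 1.0 on the real DB
```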

Page 31: Foundations of Privacy Lecture 5

Can We Output a Synthetic DB Efficiently?

[2×2 table: rows |U| ∈ {subpoly, poly}, columns |C| ∈ {subpoly, poly}; all cells are still marked "?".]

Page 32: Foundations of Privacy Lecture 5

Where is Hardness Coming From?

Signature example:
• Hard to satisfy a given query
• Easy to maintain utility for all queries but one

More natural:
• Easy to satisfy each individual query
• Hard to maintain utility for most queries

Page 33: Foundations of Privacy Lecture 5

Hardness on Average

Universe: triples (vk, m, s) of key, message, signature.
Queries: c_i(vk, m, s) = the i-th bit of ECC(vk);
         c_v(vk, m, s) = 1 iff s is a valid signature of m under vk.

[Diagram: an input DB of triples (m₁, sig(m₁), vk), …, (m_n, sig(m_n), vk), all valid under the same vk, goes into the sanitizer, which outputs triples (m'₁, s₁, vk'₁), …, (m'_k, s_k, vk'_k).]

Are these output keys related to vk? Yes! At least one is vk itself!

Page 34: Foundations of Privacy Lecture 5

Hardness on Average

Samples: triples (vk, m, s) of key, message, signature.
Queries: c_i(vk, m, s) = the i-th bit of ECC(vk);
         c_v(vk, m, s) = 1 iff s is a valid signature of m under vk.

[Diagram: output triples (m'₁, s₁, vk'₁), …, (m'_k, s_k, vk'_k).]

∀i: 3/4 of the vk'_j agree with ECC(vk)[i]
  ⇒ ∃ vk'_j s.t. ECC(vk'_j) and ECC(vk) are 3/4-close
  ⇒ vk'_j = vk (by the error-correcting code)
  ⇒ m'_j appears in the input. No privacy!

Are these output keys related to vk? Yes! At least one is vk itself!
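The averaging/decoding step can be made concrete with a small sketch (mine; the toy codewords are hypothetical, and a real ECC with relative distance > 1/2 is assumed):

```python
# If for every bit i at least 3/4 of the output keys agree with
# ECC(vk)[i], then by averaging some single output key agrees with
# ECC(vk) on >= 3/4 of its bits, and the code's distance forces it
# to be vk itself.
def closest_codeword(ecc_vk, ecc_outputs):
    """Return the output codeword with the highest agreement with ECC(vk)."""
    agreement = lambda w: sum(a == b for a, b in zip(w, ecc_vk)) / len(ecc_vk)
    return max(ecc_outputs, key=agreement)

ecc_vk = [1, 0, 1, 1, 0, 0, 1, 0]             # hypothetical ECC(vk)
outputs = [[1, 0, 1, 0, 0, 0, 1, 0],          # 7/8-close: decodes to vk
           [0, 1, 0, 0, 1, 1, 0, 1]]          # far from ECC(vk)
print(closest_codeword(ecc_vk, outputs))
```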

Page 35: Foundations of Privacy Lecture 5

Where is Hardness Coming From?

Signature example:
• Hard to satisfy a given query
• Easy to maintain utility for all queries but one

More natural:
• Easy to satisfy each individual query
• Hard to maintain utility for most queries

Page 36: Foundations of Privacy Lecture 5

Can We Output a Synthetic DB Efficiently?

[2×2 table as before: rows |U| ∈ {subpoly, poly}, columns |C| ∈ {subpoly, poly}. Some cells are now resolved by the hardness results ("Signatures", "Hard on Avg. Using PRFs"); the rest remain "?".]

Page 37: Foundations of Privacy Lecture 5

General-Output Sanitizers

Theorem

Traitor-tracing schemes exist if and only if sanitizing is hard.

Tight connection between the |U|, |C| that are hard to sanitize and the key and ciphertext sizes in traitor tracing.

The separation between efficient and non-efficient sanitizers uses the [BoSaWa] scheme.

Page 38: Foundations of Privacy Lecture 5

Traitor Tracing: The Problem

• A center transmits a message to a large group
• Some users leak their keys to pirates
• The pirates construct a clone: an unauthorized decryption device
• Given a pirate box, we want to find who leaked the keys

[Diagram: E(Content) is fed to a pirate box containing leaked keys K₁, K₃, K₈, which outputs Content.]

The traitors' "privacy" is violated!

Page 39: Foundations of Privacy Lecture 5

Equivalence of TT and Hardness of Sanitizing

Traitor Tracing               ↔  Sanitizing hard
key                           ↔  database entry
(collection of) ciphertexts   ↔  (collection of) queries
TT pirate                     ↔  sanitizer for a distribution of DBs

Page 40: Foundations of Privacy Lecture 5

Traitor Tracing ⇒ Hard Sanitizing

Theorem

If there exists a TT scheme with
– ciphertext length c(n), and
– key length k(n),
then we can construct:
1. a query set C of size ≈ 2^{c(n)},
2. a data universe U of size ≈ 2^{k(n)}, and
3. a distribution D on n-user databases with entries from U,
such that D is "hard to sanitize": there exists a tracer that can extract an entry of D from any sanitizer's output, violating its privacy!

The separation between efficient and non-efficient sanitizers uses the [BoSaWa06] scheme.