foundations of privacy lecture 5

Foundations of Privacy

Lecture 5

Lecturer: Moni Naor

Recap of last week’s lecture

• The Exponential Mechanism
  – Differential privacy
  – May yield utility/approximation
  – Is defined and evaluated by considering all possible answers

• Counting Queries
  – The BLR Algorithm
  – Efficient Algorithm

Synthetic DB: Output is a DB

[Figure: a user sends query 1, query 2, . . . to the Sanitizer, which holds the Database and returns answer 1, answer 2, answer 3, . . .]

Synthetic DB: output is also a DB (of entries from the same universe X); the user reconstructs answers by evaluating the query on the output DB

Software and people compatible

Consistent answers

Counting Queries

• Queries with low sensitivity

Counting queries: C is a set of predicates $c: U \to \{0,1\}$
Query: how many participants in D satisfy c?

Relaxed accuracy: answer each query within α additive error w.h.p.
Not so bad: error is inherent in statistical analysis anyway

Non-interactive: assume all queries are given in advance

[Figure: universe U containing the database D of size n, with a query c marking a subset of U]

The BLR Algorithm

For DBs F and D: $\mathrm{dist}(F,D) = \max_{q \in C} |q(F) - q(D)|$

Intuition: far-away DBs get smaller probability

Algorithm on input DB D:
Sample from a distribution on DBs of size m (m < n):

DB F gets picked w.p. $\propto e^{-\varepsilon \cdot \mathrm{dist}(F,D)}$

[Blum, Ligett, Roth 08]
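
The sampling step can be sketched in Python for a toy universe. This is an illustrative sketch, not the algorithm exactly as stated in [Blum, Ligett, Roth 08]: the names `frac` and `blr_sample`, the brute-force enumeration of all size-m multisets, and the choice to compare counts on F after rescaling them to a database of size n are assumptions made here.

```python
import itertools
import math
import random


def frac(rows, predicate):
    """Fraction of rows satisfying a 0/1 predicate."""
    return sum(predicate(r) for r in rows) / len(rows)


def blr_sample(db, universe, queries, m, eps, rng):
    """Pick a size-m database F with probability proportional to
    exp(-eps * dist(F, D)), by enumerating every candidate F."""
    n = len(db)
    candidates = list(itertools.combinations_with_replacement(universe, m))
    weights = []
    for f in candidates:
        # Assumption: counts on F are rescaled to a database of size n.
        dist = max(n * abs(frac(f, q) - frac(db, q)) for q in queries)
        weights.append(math.exp(-eps * dist))
    return rng.choices(candidates, weights=weights, k=1)[0]


universe = [0, 1, 2, 3]
db = [0, 0, 1, 3, 3, 3]
queries = [lambda r: r >= 2, lambda r: r % 2 == 1]
synthetic = blr_sample(db, universe, queries, m=2, eps=1.0, rng=random.Random(0))
```

The enumeration over every size-m candidate is what makes this approach super-polynomial in general; it is only feasible here because the universe and m are tiny.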

Sample F of size m that approximates D on all given predicates c

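The BLR Algorithm: Error $\tilde{O}(n^{2/3} \log|C|)$

There exists $F_{good}$ of size $m = \tilde{O}((n/\alpha)^2 \cdot \log|C|)$ s.t. $\mathrm{dist}(F_{good},D) \le \alpha$

$\Pr[F_{good}] \propto e^{-\varepsilon\alpha}$ (at least)

For any $F_{bad}$ with dist $\ge 2\alpha$: $\Pr[F_{bad}] \propto e^{-2\varepsilon\alpha}$ (at most)

Union bound: $\sum_{\text{bad DB } F_{bad}} \Pr[F_{bad}] \propto |U|^m e^{-2\varepsilon\alpha}$

For $\alpha = \tilde{O}(n^{2/3} \log|C|)$: $\Pr[F_{good}] \gg \sum \Pr[F_{bad}]$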
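
The last step can be spelled out as a short derivation. This is a sketch under the assumption that $m = (n/\alpha)^2 \log|C|$ exactly, dropping the constants and lower-order factors hidden by $\tilde{O}$; it recovers the $n^{2/3}$ scaling stated above.

```latex
\frac{\Pr[F_{good}]}{\sum_{F_{bad}} \Pr[F_{bad}]}
  \;\ge\; \frac{e^{-\varepsilon\alpha}}{|U|^{m}\, e^{-2\varepsilon\alpha}}
  \;=\; e^{\varepsilon\alpha - m \ln|U|}

% The ratio is large once the exponent is large and positive:
\varepsilon\alpha \;\gg\; m \ln|U| \;=\; \frac{n^{2}}{\alpha^{2}} \log|C| \,\ln|U|
\;\iff\;
\alpha^{3} \;\gg\; \frac{n^{2} \log|C| \,\ln|U|}{\varepsilon}
\;\iff\;
\alpha \;\gg\; n^{2/3} \left( \frac{\log|C| \,\ln|U|}{\varepsilon} \right)^{1/3}
```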



The BLR Algorithm: Running Time

Generating the distribution by enumeration:
Need to enumerate every size-m database, where $m = \tilde{O}((n/\alpha)^2 \cdot \log|C|)$

Running time ≈ $|U|^{\tilde{O}((n/\alpha)^2 \cdot \log|C|)}$



Conclusion

Offline algorithm, 2ε-differential privacy for any set C of counting queries

• Error α is $\tilde{O}(n^{2/3} \log|C| / \varepsilon)$

• Super-poly running time: $|U|^{\tilde{O}((n/\alpha)^2 \cdot \log|C|)}$

Can we Efficiently Sanitize?

The good news: if the universe is small, can sanitize EFFICIENTLY, in time poly(|C|,|U|)

The bad news: cannot do much better, namely cannot sanitize in time sub-poly(|C|) AND sub-poly(|U|)

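How Efficiently Can We Sanitize?

[Figure: 2×2 grid of running times, with |C| (subpoly, poly) on one axis and |U| (subpoly, poly) on the other. One cell is marked “Good news!”; the remaining cells are marked “?”.]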

The Good News: Can Sanitize When Universe is Small

Efficient Sanitizer for query set C
• DB size $n \ge \tilde{O}(|C|^{o(1)} \log|U|)$
• Error is ~ $n^{2/3}$
• Runtime poly(|C|,|U|)

Output is a synthetic database

Compare to [Blum Ligett Roth]:

$n \ge \tilde{O}(\log|C| \log|U|)$, runtime super-poly(|C|,|U|)

Recursive Algorithm

Start with DB D and large query set C
Repeatedly choose random subset $C_{i+1}$ of $C_i$: shrink query set by (small) factor
End recursion: sanitize D w.r.t. small query set $C_b$

Output is good for all queries in small set $C_{i+1}$

Extract utility on almost all queries in large set $C_i$

Fix remaining “underprivileged” queries in large set $C_i$

$C_0 = C \supseteq C_1 \supseteq C_2 \supseteq \dots \supseteq C_b$

Recursive Algorithm Overview

Want to sanitize DB D for query set C
Say we have a small sanitizer A’ for smaller subsets $C' \subset C$, and A’ outputs a small synthetic database
Choose random $C' \subset C$, sanitize D for C’ using A’

“Magic”: sanitization gives accurate answers on all but a small subset $B \subset C$

Fix “underprivileged” queries in B “manually”

[Figure: query set C containing the sampled subset C’ (A’ sanitizes) and the small subset B (fix manually), annotated with the questions Why? How? Where?]

Sanitize for few queries, get utility for almost all

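Consider m-bit synthetic DB output y of A’ vs. DB D:
If y is “bad” for a query set $B_y$ of fractional size ≥ m/s:

$\Pr_{C'}[C' \cap B_y = \emptyset] \le (1 - m/s)^{|C'|} \approx e^{-m}$ (for |C’| = s)

W.h.p., simultaneously for all y’s with a large set $B_y$ of bad queries, C’ intersects $B_y$

$y^* = A'(D)$ is good for all of C’, so $y^*$ is good for almost all of C

Occam’s Razor

[Figure: query set C containing the sample C’; y ranges over potential m-bit output DBs, each with its bad set $B_y$; $B_{y^*}$ is the bad set of the actual output $y^*$]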
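
The counting argument can be checked numerically. The sketch below assumes that C’ is formed by s independent uniform draws from C and that the "simultaneously for all y" step is a union bound over the $2^m$ possible m-bit outputs; the function names `miss_probability` and `union_bound` are hypothetical.

```python
import math


def miss_probability(m, s):
    """Bound on Pr[C' misses B_y] when B_y has fractional size >= m/s
    and C' consists of s independent uniform draws from C."""
    return (1 - m / s) ** s


def union_bound(m, s):
    """Bound on Pr[some m-bit output y with a large bad set B_y is
    missed by C'], taking a union over all 2^m candidate outputs."""
    return (2 ** m) * miss_probability(m, s)


single = miss_probability(40, 10_000)   # at most e^{-40}
overall = union_bound(40, 10_000)       # roughly (2/e)^40
```

Since $(1 - m/s)^s \le e^{-m}$, the union bound is at most $(2/e)^m$, which shrinks exponentially in m.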

How to get Synthetic DB? Syntheticizer

Problem: need small synthetic DB, have large other output

Lemma [“Syntheticizer”]

Given sanitizer A with α-accuracy and arbitrary output

Produce sanitizer A’ with 2α-accuracy and synthetic DB output of size $\tilde{O}(\log|C|/\alpha^2)$

Runtime is poly(|U|,|C|)

Transform output to synthetic DB using linear programming

Variable per item in U, constraint per query in C

The Linear Program

• Run the sanitizer A and then use it to get differentially private counts $v_c$ on all the concepts in C
  – Database never used again: privacy

• Come up with a low-weight fractional database that approximates these counts.

• Transform this fractional database into a standard synthetic database by rounding the fractional counts.

• For all $i \in U$: variable $x_i$

• For all $c \in C$: constraint

$v_c - \alpha \le \sum_{i \text{ s.t. } c(i)=1} x_i \le v_c + \alpha$

The Linear Program

• Why is there a fractional solution?
  – The real database is one example: it is an integer solution!

• Rounding:
  – Scale the fractional database so that its total weight is 1
  – Round down each fractional point to the closest multiple of α/|U|
  – Treat the rounded fractional database as an integer synthetic database of size at most |U|/α
  – If too large: sample
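
The rounding step can be sketched as follows. This is a minimal illustration assuming the fractional solution is given as a list of nonnegative weights, one per universe item; the function name `round_fractional_db` and the use of exact `Fraction` arithmetic are choices made here, and the linear program itself is not solved.

```python
from fractions import Fraction
from math import floor


def round_fractional_db(x, alpha):
    """x[i] is the nonnegative weight of universe item i; alpha is the
    accuracy parameter. Returns copies[i], the number of copies of item i
    in the integer synthetic database, and the weight of a single copy."""
    total = sum(Fraction(w) for w in x)      # normalize total weight to 1
    unit = Fraction(alpha) / len(x)          # granularity alpha/|U|
    copies = [floor(Fraction(w) / total / unit) for w in x]
    return copies, unit


x = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6), Fraction(0)]
alpha = Fraction(1, 10)
copies, unit = round_fractional_db(x, alpha)
```

Each item loses less than α/|U| of weight, so the weight of any predicate changes by less than α, and the output has at most |U|/α rows.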

How Do We Use Synthetic DB?

Why Synthetic DB?

1. Easy to “shrink” DBs by sub-sampling $\tilde{O}(\log|C|/\alpha^2)$ DB items

2. Gives counts for every query: output is well-defined even for queries that were not around when sanitizing
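
Both points can be illustrated with a small sub-sampling sketch. The sample-size formula below is an assumption of this example: it is the standard Hoeffding-plus-union-bound estimate with an explicit failure probability `delta`, one concrete instance of the $\tilde{O}(\log|C|/\alpha^2)$ bound. The function names are hypothetical.

```python
import math
import random


def sample_size(num_queries, alpha, delta):
    """Rows needed so that all fractional counts are within alpha with
    probability >= 1 - delta (Hoeffding bound plus union bound)."""
    return math.ceil(math.log(2 * num_queries / delta) / (2 * alpha ** 2))


def subsample(rows, m, rng):
    """Shrink a database by drawing m rows uniformly with replacement."""
    return [rng.choice(rows) for _ in range(m)]


def count_fraction(rows, predicate):
    """Fractional count; works for any predicate, even one chosen later."""
    return sum(1 for r in rows if predicate(r)) / len(rows)


big = [i % 10 for i in range(10_000)]
small = subsample(big, 2_000, random.Random(1))
estimate = count_fraction(small, lambda r: r < 3)   # true fraction is 0.3
```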

Utility for all queries: First Attempt

Sanitizing small C’ is easy (“brute force”), can “shrink” using syntheticizer

Sub-sample small C’, work for all but a few queries
Repeat many times, take majority

Doesn’t work: underprivileged queries

[Figure: query set C with two sampled subsets C’ and C’’, and the set B of underprivileged queries]

Utility for all queries: fix “underprivileged”

Lemma

Given query set C and a diff. private sanitizer A that:
1. Works for every $C' \subset C$, |C’| = s
2. Outputs synthetic DB of size ≤ m
Get sanitizer for C, with utility on all queries
Need DB size $n \ge \tilde{O}(|C|m/s)$

Proof Outline
Subsample small C’, get synthetic DB that works for all but a few (~|C|m/s) “underprivileged” queries

Now “manually” correct those few:
“brute force”: release noisy counts $v_c$ (noise ~|C|m/s)

Also need to say which ones are underprivileged… this depends on DB D. What about privacy?

Key point: regardless of D, almost all queries are strongly privileged. Release noisy indicator vector.

For privacy analysis, need only consider the ~|C|m/s potentially underprivileged queries
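
The “brute force” correction can be illustrated with a standard Laplace-noise release. This is a minimal sketch under assumptions not in the slides: the helper names `laplace` and `noisy_counts` are hypothetical, the noise is taken to be Laplace with scale k/ε for k counting queries of sensitivity 1, and the set of underprivileged queries is taken as given rather than identified privately.

```python
import random


def laplace(scale, rng):
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_counts(db, queries, eps, rng):
    """Release every count with Laplace noise of scale k/eps, where k is
    the number of queries being corrected (assumed noise calibration)."""
    scale = len(queries) / eps
    return [sum(q(r) for r in db) + laplace(scale, rng) for q in queries]


db = [0, 0, 1, 3, 3, 3]
underprivileged = [lambda r: r >= 2, lambda r: r == 0]
released = noisy_counts(db, underprivileged, eps=0.5, rng=random.Random(0))
```

The noise grows linearly with the number of corrected queries, which matches the slide's noise of order |C|m/s when only that many queries need fixing.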

Recursive Algorithm: Recap

Start with DB D and large query set C
Repeatedly choose rand. subset $C_{i+1}$ of $C_i$: shrink by f factor
Sanitize D w.r.t. small $C_b$ (use “brute force” sanitizer)
Syntheticizer transforms output to small synthetic DB
Fix “underprivileged” (need $n \ge \tilde{O}(f)$)
Lose $2^b$ accuracy, “brute force” needs $n \ge 2^b |C_b|$

$C_0 = C \supseteq C_1 \supseteq C_2 \supseteq \dots \supseteq C_b$

$n \ge |C|^{o(1)}$ by trading off b, f

And Now… Bad News

Runtime cannot be subpoly in |C| or |U|: for large C and U, can’t get efficient sanitizers!
• Output is synthetic DB (as in positive result)
• General output

Exponential Mechanism cannot be implemented efficiently

Want hardness… Got Crypto?

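Digital Signatures

Digital signatures with key pair (sk, vk): signing key sk, verification key vk

Can build from one-way function [NaYu, Ro]

[Figure: the pairs (m_1, sig(m_1)), (m_2, sig(m_2)), …, (m_n, sig(m_n)) are valid signatures under vk; a new pair (m’, sig(m’)) is hard to forge]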

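Signatures ⇒ No Synthetic DB

Universe: (m,s) msg,sig pair
Queries: $c_{vk}(m,s)$ outputs 1 iff s is a valid sig of m under vk

[Figure: the input DB (m_1, sig(m_1)), (m_2, sig(m_2)), …, (m_n, sig(m_n)) consists of valid signatures under vk; the sanitizer outputs a synthetic DB (m’_1, s_1), …, (m’_k, s_k)]

Most output entries are valid signatures under the same vk
Since forging is hard, inputs appear in the output: no privacy!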

Can We output Synthetic DB Efficiently?

[Figure: 2×2 grid over |C| (subpoly, poly) and |U| (subpoly, poly), with cells marked “?”.]

Where is Hardness Coming From?

Signature example:

Hard to satisfy a given query
Easy to maintain utility for all queries but one

More natural:

Easy to satisfy each individual query
Hard to maintain utility for most queries

Hardness on Average

Universe: (vk,m,s) key,msg,sig
Queries: $c_i(vk,m,s)$: i-th bit of ECC(vk)
$c_v(vk,m,s)$: 1 iff s is a valid sig of m under vk

[Figure: the input DB (vk, m_1, sig(m_1)), (vk, m_2, sig(m_2)), …, (vk, m_n, sig(m_n)) consists of valid signatures under vk; the sanitizer outputs (vk’_1, m’_1, s_1), …, (vk’_k, m’_k, s_k)]

Are these keys related to vk? Yes! At least one is vk!


$\forall i$: 3/4 of the $vk'_j$ agree with ECC(vk)[i]
⇒ $\exists\, vk'_j$ s.t. ECC($vk'_j$) and ECC(vk) are 3/4-close
⇒ $vk'_j = vk$ (error-correcting code)
⇒ $m'_j$ appears in input. No privacy!

Can We output Synthetic DB Efficiently?

[Figure: 2×2 grid over |C| (subpoly, poly) and |U| (subpoly, poly), annotated “Signatures”, “Hard on Avg.” and “Using PRFs”; the remaining cells are marked “?”.]

General output sanitizers

Theorem

Traitor tracing schemes exist if and only if sanitizing is hard

Tight connection between |U|, |C| hard to sanitize and key, ciphertext sizes in traitor tracing

Separation between efficient/non-efficient sanitizers uses [BoSaWa] scheme

Traitor Tracing: The Problem
• Center transmits a message to a large group
• Some users leak their keys to pirates
• Pirates construct a clone: unauthorized decryption devices
• Given a Pirate Box, want to find who leaked the keys

[Figure: the center broadcasts E(Content); a Pirate Box built from keys K1, K3, K8 outputs Content]

Traitors’ “privacy” is violated!

Equivalence of TT and Hardness of Sanitizing

Traitor Tracing ⇔ Sanitizing hard
Key ⇔ Database entry
Ciphertext ⇔ Query
TT Pirate ⇔ Sanitizer for distribution of DBs

[Slide annotation attached to two of the entries: “(collection of)”]

Traitor Tracing ⇒ Hard Sanitizing

Theorem
If there exists a TT scheme with
  – cipher length c(n),
  – key length k(n),
can construct:
1. Query set C of size ≈ $2^{c(n)}$
2. Data universe U of size ≈ $2^{k(n)}$
3. Distribution D on n-user databases with entries from U
D is “hard to sanitize”: there exists a tracer that can extract an entry in D from any sanitizer’s output. Violate its privacy!

Separation between efficient/non-efficient sanitizers uses [BoSaWa06] scheme
