Foundations of Privacy
Lecture 5
Lecturer: Moni Naor
Recap of last week’s lecture
• The Exponential Mechanism
– Differential privacy
– May yield utility/approximation
– Is defined and evaluated by considering all possible answers
• Counting Queries
– The BLR Algorithm
– Efficient Algorithm
Synthetic DB: Output is a DB
[Figure: queries 1, 2, ... are posed to a sanitizer that holds the database; it returns answers 1, 2, 3, ...]
Synthetic DB: the output is also a DB (of entries from the same universe X); the user reconstructs answers by evaluating the query on the output DB
• Software and people compatible
• Consistent answers
Counting Queries
• Queries with low sensitivity
Counting queries: C is a set of predicates $c: U \to \{0,1\}$
Query: how many participants in D satisfy c?
Relaxed accuracy: answer each query within additive error α w.h.p.
Not so bad: error is inherent in statistical analysis anyway
Non-interactive: assume all queries are given in advance
[Figure: universe U, database D of size n, query c]
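To make the definition concrete, here is a minimal Python sketch of a counting query and of the relaxed accuracy check. The toy database of ages and the predicate `is_adult` are illustrative assumptions, not part of the lecture.

```python
from typing import Callable, Hashable, Sequence

Predicate = Callable[[Hashable], int]  # c: U -> {0,1}

def count_query(db: Sequence[Hashable], c: Predicate) -> int:
    """Number of participants in db that satisfy predicate c."""
    return sum(c(x) for x in db)

def within_alpha(answer: float, db: Sequence[Hashable], c: Predicate, alpha: float) -> bool:
    """Relaxed accuracy: is the answer within additive error alpha of the true count?"""
    return abs(answer - count_query(db, c)) <= alpha

# Hypothetical toy data: universe U = ages, database D of size n = 6.
D = [23, 35, 41, 17, 68, 35]
is_adult = lambda age: int(age >= 18)

true_count = count_query(D, is_adult)

# Low sensitivity: replacing one participant moves the count by at most 1.
D_neighbor = D[:3] + [30] + D[4:]  # the 17-year-old is replaced by a 30-year-old
sensitivity_witness = abs(count_query(D_neighbor, is_adult) - true_count)
```

Here `true_count` is 5 and the neighboring database changes it by exactly 1, which is the "low sensitivity" property the slide refers to.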
The BLR Algorithm [Blum, Ligett, Roth 08]
For DBs F and D: $\mathrm{dist}(F,D) = \max_{q \in C} |q(F) - q(D)|$
Intuition: far-away DBs get smaller probability
Algorithm on input DB D:
Sample from a distribution on DBs of size m (m < n):
DB F gets picked w.p. $\propto e^{-\varepsilon \cdot \mathrm{dist}(F,D)}$
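The sampling step can be sketched in Python by brute-force enumeration, which is feasible only for a tiny universe. This is a sketch under stated assumptions: counts are normalized to fractions so that a size-m database F can be compared with the size-n database D, and the distance is then rescaled by n. The slide does not spell out that scaling, and the toy universe, queries and parameters are hypothetical.

```python
import itertools
import math
import random

def dist(F, D, queries):
    """max over queries of |q(F) - q(D)|, with counts normalized to fractions
    (an assumption, so that databases of different sizes are comparable)."""
    return max(abs(sum(map(q, F)) / len(F) - sum(map(q, D)) / len(D)) for q in queries)

def blr_distribution(D, universe, queries, m, eps):
    """Enumerate every size-m multiset F over the universe and give it
    probability proportional to exp(-eps * n * dist(F, D))."""
    n = len(D)
    candidates = list(itertools.combinations_with_replacement(universe, m))
    weights = [math.exp(-eps * n * dist(F, D, queries)) for F in candidates]
    total = sum(weights)
    return candidates, [w / total for w in weights]

def blr_sample(D, universe, queries, m, eps, rng):
    """Draw one synthetic database F from the distribution above."""
    candidates, probs = blr_distribution(D, universe, queries, m, eps)
    return rng.choices(candidates, weights=probs, k=1)[0]

# Hypothetical toy instance: |U| = 4, n = 12, two counting queries, m = 3.
U = [0, 1, 2, 3]
D = [0] * 6 + [1] * 2 + [3] * 4
C = [lambda x: int(x <= 1), lambda x: int(x % 2 == 0)]

candidates, probs = blr_distribution(D, U, C, m=3, eps=1.0)
F = blr_sample(D, U, C, m=3, eps=1.0, rng=random.Random(0))
```

Even in this toy case the enumeration already lists every multiset of size m over U, which is the source of the running time discussed later in the lecture.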
Sample F of size m that approximates D on all given predicates c
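The BLR Algorithm: Error $\tilde{O}(n^{2/3} \log|C|)$
There exists $F_{good}$ of size $m = \tilde{O}((n/\alpha)^2 \cdot \log|C|)$ s.t. $\mathrm{dist}(F_{good},D) \le \alpha$
$\Pr[F_{good}] \propto e^{-\varepsilon \cdot \mathrm{dist}(F_{good},D)} \ge e^{-\varepsilon\alpha}$
For any $F_{bad}$ with $\mathrm{dist}(F_{bad},D) \ge 2\alpha$: $\Pr[F_{bad}] \propto e^{-\varepsilon \cdot \mathrm{dist}(F_{bad},D)} \le e^{-2\varepsilon\alpha}$
Union bound (same normalization): $\sum_{\text{bad DB } F_{bad}} \Pr[F_{bad}] \propto$ at most $|U|^m e^{-2\varepsilon\alpha}$
For $\alpha = \tilde{O}(n^{2/3} \log|C|)$: $\Pr[F_{good}] \gg \sum \Pr[F_{bad}]$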
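The $n^{2/3}$ dependence can be recovered by comparing the weight of the good database with the union bound over bad ones. The derivation below is a sketch: it treats the hidden constants and the exact powers of the logarithmic factors loosely, so only the exponent of n should be read as matching the slide.

```latex
\begin{align*}
\frac{\Pr[F_{good}]}{\sum_{F_{bad}} \Pr[F_{bad}]}
  &\ge \frac{e^{-\varepsilon\alpha}}{|U|^{m}\, e^{-2\varepsilon\alpha}}
   = \frac{e^{\varepsilon\alpha}}{|U|^{m}}, \\
\text{which is large once}\quad
\varepsilon\alpha &\gg m \ln|U|,
  \qquad m = \tilde{O}\!\left((n/\alpha)^{2} \log|C|\right). \\
\text{Substituting } m:\quad
\alpha^{3} &\gg \frac{n^{2}\, \log|C|\, \ln|U|}{\varepsilon}
\;\Longrightarrow\;
\alpha = \tilde{O}\!\left(n^{2/3} \left(\frac{\log|C|\,\log|U|}{\varepsilon}\right)^{1/3}\right).
\end{align*}
```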
The BLR Algorithm: Running Time
Generating the distribution by enumeration:
Need to enumerate every size-m database, where $m = \tilde{O}((n/\alpha)^2 \cdot \log|C|)$
Running time $\approx |U|^{\tilde{O}((n/\alpha)^2 \cdot \log|C|)}$
Conclusion
Offline algorithm, 2ε-differential privacy for any set C of counting queries
• Error α is $\tilde{O}(n^{2/3} \log|C| / \varepsilon)$
• Super-poly running time: $|U|^{\tilde{O}((n/\alpha)^2 \cdot \log|C|)}$
Can We Efficiently Sanitize?
The good news: if the universe is small, can sanitize EFFICIENTLY, in time poly(|C|,|U|)
The bad news: cannot do much better, namely sanitize in time sub-poly(|C|) AND sub-poly(|U|)
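How Efficiently Can We Sanitize?
[Figure: 2×2 grid of running time in |C| (subpoly / poly) versus |U| (subpoly / poly); the poly/poly cell is marked “Good news!”, the other cells are marked “?”]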
The Good News: Can Sanitize When Universe is Small
Efficient sanitizer for query set C
• DB size $n \ge \tilde{O}(|C|^{o(1)} \log|U|)$
• Error is ~ $n^{2/3}$
• Runtime poly(|C|,|U|)
Output is a synthetic database
Compare to [Blum Ligett Roth]:
$n \ge \tilde{O}(\log|C| \log|U|)$, runtime super-poly(|C|,|U|)
Recursive Algorithm
Start with DB D and large query set C
Repeatedly choose a random subset $C_{i+1}$ of $C_i$: shrink the query set by a (small) factor
End of recursion: sanitize D w.r.t. the small query set $C_b$
• Output is good for all queries in the small set $C_{i+1}$
• Extract utility on almost all queries in the large set $C_i$
• Fix the remaining “underprivileged” queries in the large set $C_i$
[Figure: nested query sets $C_0 = C, C_1, C_2, \ldots, C_b$]
Recursive Algorithm Overview
Want to sanitize DB D for query set C
Say we have a sanitizer A’ for smaller subsets $C' \subset C$, and A’ outputs a small synthetic database
Choose a random $C' \subset C$, sanitize D for C’ using A’
“Magic”: the sanitization gives accurate answers on all but a small subset $B \subset C$
Fix the “underprivileged” queries in B “manually”
[Figure: query set C containing C’ (sanitized by A’) and B (fixed manually)]
Questions to answer: Why? How? Where?
Sanitize for few queries, get utility for almost all (Occam’s Razor)
Consider an m-bit synthetic DB output y of A’ vs. DB D (y: a potential m-bit output DB)
If y is “bad” for a query set $B_y$ of fractional size ≥ m/s, then for |C’| = s:
$\Pr_{C'}[C' \cap B_y = \emptyset] \le (1 - m/s)^{|C'|} \approx e^{-m}$
W.h.p., simultaneously for all y’s with a large set $B_y$ of bad queries, C’ intersects $B_y$
$y^* = A'(D)$ is good for all of C’, so $B_{y^*}$ cannot be large: $y^*$ is good for almost all of C
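The counting behind this argument can be checked numerically. The sketch below assumes C’ consists of s independent uniform draws from C (the slide does not fix the sampling model) and takes a union bound over all $2^m$ possible m-bit outputs; the values of m and s are arbitrary.

```python
import math

def miss_probability(m: int, s: int) -> float:
    """Probability that s independent draws from C all miss a bad set
    that makes up a fraction m/s of C."""
    return (1 - m / s) ** s

def union_bound(m: int, s: int) -> float:
    """Union bound over all 2^m possible m-bit outputs y."""
    return (2 ** m) * miss_probability(m, s)

m, s = 20, 1000  # hypothetical sizes
p_miss = miss_probability(m, s)
p_any_bad_y_survives = union_bound(m, s)
```

Since $1 - x \le e^{-x}$, the miss probability is at most $e^{-m}$, and the union bound is at most $(2/e)^m$, which shrinks exponentially in m. That is why one random C’ works for every candidate output simultaneously.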
How to get Synthetic DB? Syntheticizer
Problem: need a small synthetic DB, but the sanitizer may produce a large output of some other form
Lemma [“Syntheticizer”]
Given sanitizer A with α-accuracy and arbitrary output
Produce sanitizer A’ with 2α-accuracy and synthetic DB output of size $\tilde{O}(\log|C|/\alpha^2)$
Runtime is poly(|U|,|C|)
Transform output to synthetic DB using linear programming
Variable per item in U, constraint per query in C
The Linear Program
• Run the sanitizer A and then use it to get differentially private counts $v_c$ on all the concepts in C
– The database is never used again, which preserves privacy
• Come up with a low-weight fractional database that approximates these counts.
• Transform this fractional database into a standard synthetic database by rounding the fractional counts.
• For all $i \in U$: variable $x_i$
• For all $c \in C$: constraint
$v_c - \alpha \le \sum_{i \text{ s.t. } c(i)=1} x_i \le v_c + \alpha$
The Linear Program
• Why is there a fractional solution?
– The real database, an integer solution, is one example!
• Rounding:
– Scale the fractional database so that its total weight is 1
– Round down each fractional point to the closest multiple of α/|U|
– Treat the rounded fractional database as an integer synthetic database of size at most |U|/α
– If too large: sample
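The rounding step can be sketched in Python as follows. The sketch assumes the rounding granularity is α/|U| and represents the fractional database as a dictionary from universe items to non-negative weights; the function name and toy weights are hypothetical, and the final "sample if too large" step is not implemented.

```python
import math
from collections import Counter

def round_fractional_db(x: dict, alpha: float) -> Counter:
    """Turn a fractional database (item -> non-negative weight) into an
    integer synthetic DB (item -> number of copies).

    Normalize to total weight 1, round each weight down to a multiple of
    alpha/|U|, then read the number of multiples as the item's count.
    """
    universe_size = len(x)
    total = sum(x.values())
    step = alpha / universe_size
    db = Counter()
    for item, weight in x.items():
        # The small tolerance guards against float round-off such as
        # 0.3 / 0.1 == 2.9999999999999996; it is not part of the method.
        copies = math.floor((weight / total) / step + 1e-12)
        if copies:
            db[item] = copies
    return db

# Hypothetical fractional solution over a universe of 4 items.
x = {"a": 0.5, "b": 0.3, "c": 0.2, "d": 0.0}
alpha = 0.4
synthetic = round_fractional_db(x, alpha)
size = sum(synthetic.values())
```

Each item loses less than α/|U| of weight, so the total loss is below α, and the number of rows is at most |U|/α because the normalized weights sum to 1.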
How Do We Use Synthetic DB?
Why Synthetic DB?
1. Easy to “shrink” DBs by sub-sampling $\tilde{O}(\log|C|/\alpha^2)$ DB items
2. Gives counts for every query: the output is well-defined even for queries that were not around when sanitizing
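Both points can be illustrated with a short Python sketch. It assumes sampling with replacement and ignores the constants hidden in the Õ; here α is a fractional (normalized) error, and the toy database and predicates are hypothetical.

```python
import math
import random

def shrink(db, num_queries: int, alpha: float, rng: random.Random):
    """Sub-sample about log|C| / alpha^2 items (with replacement) from a synthetic DB."""
    m = math.ceil(math.log(num_queries) / alpha ** 2)
    return [rng.choice(db) for _ in range(m)]

def fraction(db, c) -> float:
    """Fraction of rows in db that satisfy predicate c."""
    return sum(c(x) for x in db) / len(db)

rng = random.Random(0)
big_db = list(range(10)) * 1000          # 10,000 rows over the universe {0,...,9}
small_db = shrink(big_db, num_queries=100, alpha=0.05, rng=rng)

below_three = lambda x: int(x < 3)       # a query known at sanitization time
is_even = lambda x: int(x % 2 == 0)      # a query posed only afterwards

estimate_known = fraction(small_db, below_three)
estimate_new = fraction(small_db, is_even)
```

The smaller database still answers any predicate over the universe, including `is_even`, which played no role when it was produced; accuracy guarantees, of course, only cover the queries that were sanitized for.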
Utility for all queries: First Attempt
Sanitizing a small C’ is easy (“brute force”); can “shrink” the output using the syntheticizer
Sub-sample a small C’: works for all but a few queries
Repeat many times, take majority
Doesn’t work: underprivileged queries
[Figure: query set C with sub-samples C’ and C’’ and the bad set B]
Utility for all queries: fix “underprivileged”
Lemma
Given query set C and a diff. private sanitizer A that:
1. Works for every $C' \subset C$ with |C’| = s
2. Outputs a synthetic DB of size ≤ m
Get a sanitizer for C with utility on all queries
Need DB size n ≥ Õ(|C|m/s)
Proof Outline
Subsample a small C’, get a synthetic DB that works for all but a few (~|C|m/s) “underprivileged” queries
Now “manually” correct those few by “brute force”: release noisy counts $v_c$ (noise ~|C|m/s)
Also need to say which queries are underprivileged… this depends on DB D. What about privacy?
Key point: regardless of D, almost all queries are strongly privileged. Release a noisy indicator vector.
For the privacy analysis, need only consider the ~|C|m/s potentially underprivileged queries
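The "brute force" correction can be sketched with the standard Laplace mechanism. This is an assumption about how the noisy counts are produced, since the slide only says the noise is on the order of the number of underprivileged queries; with k queries of sensitivity 1 each, Laplace noise of scale k/ε gives ε-differential privacy for the whole vector of counts.

```python
import random

def laplace(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) noise, sampled as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def noisy_counts(db, queries, eps: float, rng: random.Random):
    """Release one noisy count per query. Changing one row moves each count
    by at most 1, so the vector of k counts has L1 sensitivity at most k."""
    k = len(queries)
    return [sum(c(x) for x in db) + laplace(k / eps, rng) for c in queries]

# Hypothetical toy data: three "underprivileged" queries on a small DB.
rng = random.Random(1)
db = [3, 8, 1, 9, 4, 7, 7, 2]
underprivileged = [lambda x: int(x > 5), lambda x: int(x % 2 == 0), lambda x: int(x == 7)]
released = noisy_counts(db, underprivileged, eps=0.5, rng=rng)
```

The noise scale grows linearly with the number of corrected queries, which is why the lemma needs the underprivileged set to be small relative to the database size n.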
Recursive Algorithm: Recap
Start with DB D and large query set C
Repeatedly choose a random subset $C_{i+1}$ of $C_i$: shrink by a factor f
Sanitize D w.r.t. the small set $C_b$ (use the “brute force” sanitizer)
Syntheticizer transforms the output to a small synthetic DB
Fix “underprivileged” queries (need n ≥ Õ(f))
Lose a factor $2^b$ in accuracy; “brute force” needs $n \ge 2^b |C_b|$
[Figure: nested query sets $C_0 = C, C_1, C_2, \ldots, C_b$]
$n \ge |C|^{o(1)}$ by trading off b and f
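The chain of query sets is easy to sketch in Python. The code below only builds the nested random subsets and reports the recursion depth b; the sanitizer, syntheticizer and correction steps are left out, and the function name, stopping rule and sizes are hypothetical.

```python
import random

def nested_query_sets(queries, f: int, min_size: int, rng: random.Random):
    """Build C_0 = C, C_1, ..., C_b, each a random subset of the previous one,
    smaller by a factor f, stopping before the size drops below min_size."""
    chain = [list(queries)]
    while len(chain[-1]) // f >= min_size:
        current = chain[-1]
        chain.append(rng.sample(current, len(current) // f))
    return chain

# Hypothetical instance: 1024 query identifiers, shrink factor f = 4.
chain = nested_query_sets(range(1024), f=4, min_size=16, rng=random.Random(0))
sizes = [len(c) for c in chain]
b = len(chain) - 1
```

A larger f means fewer levels (smaller b, so less accuracy lost to repeated doubling) but a larger database requirement at each correction step, which is the trade-off named on the slide.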
And Now… Bad News
Runtime cannot be subpoly in |C| or |U|
• Output is synthetic DB (as in positive result)
• General output
Exponential Mechanism cannot be implemented
Want hardness… Got Crypto?
The Bad News
For large C and U can’t get efficient sanitizers!
• Output is synthetic DB (as in positive result)
• General output
Digital Signatures
Digital signatures with key pair (sk, vk)
Can be built from a one-way function [NaYu,Ro]
Given valid signatures under vk: $(m_1, sig(m_1)), (m_2, sig(m_2)), \ldots, (m_n, sig(m_n))$
Hard to forge a new signature $(m', sig(m'))$
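As an illustration of signatures built from a one-way function, here is a Python sketch of Lamport's one-time signature scheme, with SHA-256 standing in for the one-way function. This is a swapped-in textbook scheme, not the construction cited on the slide: it is secure for signing only one message per key, whereas the hardness argument in the lecture needs many messages signed under the same vk.

```python
import hashlib
import secrets

def H(data: bytes) -> bytes:
    """SHA-256, used here as a stand-in for a one-way function."""
    return hashlib.sha256(data).digest()

def keygen():
    """sk: 256 pairs of random strings; vk: the hashes of those strings."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    vk = [(H(a), H(b)) for a, b in sk]
    return sk, vk

def bits(msg: bytes):
    """The 256 bits of the message digest, most significant bit first."""
    digest = H(msg)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]

def sign(sk, msg: bytes):
    """Reveal, for each digest bit, the secret string selected by that bit."""
    return [sk[i][b] for i, b in enumerate(bits(msg))]

def verify(vk, msg: bytes, sig) -> bool:
    """Check that every revealed string hashes to the matching entry of vk."""
    if len(sig) != 256:
        return False
    return all(H(s) == vk[i][b] for (i, b), s in zip(enumerate(bits(msg)), sig))

sk, vk = keygen()
sig = sign(sk, b"m1")
valid = verify(vk, b"m1", sig)
forged = verify(vk, b"m'", sig)  # reusing the signature on a new message
```

Verification needs only vk, which is what lets a public query of the form "is this a valid signature under vk?" be evaluated on any database row.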
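Signatures ⇒ No Synthetic DB
Universe: (m, s) message, signature pairs
Queries: $c_{vk}(m,s)$ outputs 1 iff s is a valid signature of m under vk
Input DB: $(m_1, sig(m_1)), (m_2, sig(m_2)), \ldots, (m_n, sig(m_n))$, all valid signatures under the same vk
Sanitizer output: $(m'_1, s_1), \ldots, (m'_k, s_k)$; for utility, most are valid signatures under vk
Signatures are hard to forge, so inputs appear in the output: no privacy!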
Can We Output Synthetic DB Efficiently?
[Figure: 2×2 grid of running time in |C| (subpoly / poly) versus |U| (subpoly / poly) for synthetic DB output; cells are marked “?”]
Where is Hardness Coming From?
Signature example:
• Hard to satisfy a given query
• Easy to maintain utility for all queries but one
More natural:
• Easy to satisfy each individual query
• Hard to maintain utility for most queries
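Hardness on Average
Universe: (vk, m, s) key, message, signature triples
Queries: $c_i(vk,m,s)$: the i-th bit of ECC(vk)
$c_v(vk,m,s)$: 1 iff s is a valid signature of m under vk
[Figure: input DB $(vk, m_1, sig(m_1)), \ldots, (vk, m_n, sig(m_n))$, all valid signatures under vk, goes through the sanitizer, which outputs $(vk'_1, m'_1, s_1), \ldots, (vk'_k, m'_k, s_k)$]
Are these keys related to vk? Yes! At least one is vk!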
Hardness on Average
Samples: (vk, m, s) key, message, signature triples
Queries: $c_i(vk,m,s)$: the i-th bit of ECC(vk)
$c_v(vk,m,s)$: 1 iff s is a valid signature of m under vk
Sanitizer output: $(vk'_1, m'_1, s_1), \ldots, (vk'_k, m'_k, s_k)$
$\forall i$: 3/4 of the $vk'_j$ agree with $ECC(vk)[i]$
⇒ $\exists\, vk'_j$ s.t. $ECC(vk'_j)$ and $ECC(vk)$ are 3/4-close
⇒ $vk'_j = vk$ (error-correcting code)
⇒ $m'_j$ appears in the input. No privacy!
Are these keys related to vk? Yes! At least one is vk!
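The step from "every bit position is right for 3/4 of the output keys" to "some output key is right on 3/4 of the positions" is an averaging argument, which the following Python sketch checks on a toy example. The 4-bit "codewords" are hypothetical placeholders rather than the output of a real error-correcting code.

```python
def agreement_by_column(target, rows):
    """For each position i, the fraction of rows that agree with target[i]."""
    return [sum(r[i] == target[i] for r in rows) / len(rows) for i in range(len(target))]

def agreement_by_row(target, rows):
    """For each row, the fraction of positions on which it agrees with target."""
    return [sum(r[i] == target[i] for i in range(len(target))) / len(target) for r in rows]

# Hypothetical encoded keys: target plays the role of ECC(vk),
# the rows play the role of ECC(vk'_j) for the keys in the sanitizer's output.
target = [1, 0, 1, 1]
rows = [
    [1, 0, 1, 1],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 1],
]

per_position = agreement_by_column(target, rows)
per_key = agreement_by_row(target, rows)
closest_key_index = per_key.index(max(per_key))
```

Both lists average the same agreement matrix, so their means coincide; if every position has agreement at least 3/4, the best row must reach 3/4 as well. The error-correcting code then forces that row to decode to vk itself.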
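Can We Output Synthetic DB Efficiently?
[Figure: 2×2 grid of running time in |C| (subpoly / poly) versus |U| (subpoly / poly) for synthetic DB output; cells are labeled “Signatures”, “Hard on Avg.” and “Using PRFs”]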
General output sanitizers
Theorem
Traitor tracing schemes exist if and only if sanitizing is hard
Tight connection between the |U|, |C| that are hard to sanitize and the key, ciphertext sizes in traitor tracing
Separation between efficient/non-efficient sanitizers uses the [BoSaWa] scheme
Traitor Tracing: The Problem
• Center transmits a message to a large group
• Some users leak their keys to pirates
• Pirates construct a clone: an unauthorized decryption device
• Given a pirate box, want to find who leaked the keys
[Figure: E(Content) is fed to a pirate box built from keys K1, K3, K8, which outputs Content]
Traitors’ “privacy” is violated!
Equivalence of TT and Hardness of Sanitizing
Traitor Tracing ↔ Sanitizing hard
Key ↔ Database entry
Ciphertext ↔ Query
TT Pirate ↔ Sanitizer for distribution of DBs
(Two of the entries carry the annotation “(collection of)” on the slide.)
Traitor Tracing ⇒ Hard Sanitizing
Theorem
If there exists a TT scheme with
– cipher length c(n),
– key length k(n),
can construct:
1. Query set C of size $\approx 2^{c(n)}$
2. Data universe U of size $\approx 2^{k(n)}$
3. Distribution D on n-user databases with entries from U
D is “hard to sanitize”: there exists a tracer that can extract an entry in D from any sanitizer’s output. Violate its privacy!
Separation between efficient/non-efficient sanitizers uses the [BoSaWa06] scheme