Differential Privacy (Part II)
Differential privacy (recap)
• Neutralizes linkage attacks
A query mechanism $M$ is $\epsilon$-differentially private if, for any two adjacent databases $D$ and $D'$ (differing in just one entry) and any $C \subseteq \mathrm{range}(M)$:
$$\Pr(M(D) \in C) \le e^{\epsilon} \cdot \Pr(M(D') \in C)$$
Sequential composition theorem (proof)
Let $M_i$ each provide $\epsilon_i$-differential privacy. The sequence of $M_i(X)$ provides $\left(\sum_i \epsilon_i\right)$-differential privacy.
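To illustrate how the sequential composition theorem is used in practice, the sketch below adds up the privacy cost of several queries run on the same data. The function name `sequential_epsilon` and the example budgets are hypothetical and chosen only for illustration.

```python
def sequential_epsilon(epsilons):
    """Total privacy cost of running mechanisms with these budgets on the same data."""
    if any(e < 0 for e in epsilons):
        raise ValueError("privacy parameters must be non-negative")
    # Sequential composition: the individual epsilons simply add up.
    return sum(epsilons)


# Three queries on the same database with budgets 0.1, 0.2 and 0.2.
total = sequential_epsilon([0.1, 0.2, 0.2])
print(total)
```

An analyst with a fixed overall budget can use such a running total to decide when to stop answering queries.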
Parallel composition theorem (proof)
Let $M_i$ each provide $\epsilon$-differential privacy. Let $D_i$ be arbitrary disjoint subsets of the input domain $\mathcal{D}$. The sequence of $M_i(X \cap D_i)$ provides $\epsilon$-differential privacy.
Sensitivity of a function (recap)
The sensitivity of a function $f : \mathcal{D} \to \mathbb{R}$ is defined as:
$$\Delta f = \max_{D, D'} |f(D) - f(D')|$$
for all adjacent $D, D' \in \mathcal{D}$
• Sensitivity measures how much the function amplifies the distance of the inputs
• Exercises: what is the sensitivity of
  • counting queries (e.g., "How many patients in the database have diabetes?")
  • "How old is the oldest patient in the database?"
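One way to build intuition for the two exercises is to compute the sensitivity by brute force on a tiny domain. The sketch below is only an illustration: the helper `sensitivity`, the database size of 3, and the toy age range 0..5 are assumptions, and adjacency is taken to mean replacing a single entry.

```python
from itertools import product


def sensitivity(f, domain, n):
    """Brute-force max |f(D) - f(D')| over size-n databases differing in one entry."""
    worst = 0
    for db in product(domain, repeat=n):
        for i in range(n):
            for v in domain:
                neighbor = db[:i] + (v,) + db[i + 1:]
                worst = max(worst, abs(f(db) - f(neighbor)))
    return worst


# Counting query: each entry is 1 if the patient has diabetes, else 0.
count_sens = sensitivity(lambda db: sum(db), domain=(0, 1), n=3)

# Max query: age of the oldest patient, with ages restricted to 0..5 for the toy run.
max_sens = sensitivity(lambda db: max(db), domain=range(6), n=3)

print(count_sens, max_sens)  # 1 5
```

The counting query has sensitivity 1, while the sensitivity of the max query equals the width of the age range, so it grows with the domain rather than staying constant.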
Laplace distribution (recap)
• Denoted by Lap(b)
• Increasing b flattens the curve
$$\mathrm{pr}(z) = \frac{e^{-|z|/b}}{2b}$$
variance $= 2b^2$
standard deviation $\sigma = \sqrt{2}\, b$
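The density and the variance formula can be checked empirically. The following sketch assumes a sampler built as the difference of two independent exponential variables with mean b, which is a standard way to draw from Lap(b). The names `laplace_pdf` and `sample_laplace` are hypothetical.

```python
import math
import random


def laplace_pdf(z, b):
    """Density of Lap(b) at z."""
    return math.exp(-abs(z) / b) / (2.0 * b)


def sample_laplace(b, rng):
    """Draw from Lap(b) as the difference of two i.i.d. exponentials with mean b."""
    return rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)


rng = random.Random(0)
b = 2.0
samples = [sample_laplace(b, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)

# The empirical variance should be close to 2 * b**2 = 8.
print(var, 2 * b ** 2)
```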
Laplace mechanism (recap)
Let $f : \mathcal{D} \to \mathbb{R}$ be a function with sensitivity $\Delta f$. Then $g = f(X) + \mathrm{Lap}\left(\frac{\Delta f}{\epsilon}\right)$ is $\epsilon$-differentially private.
• General sanitization mechanism
  • we just have to compute the sensitivity of the function
• Noise depends on $f$ and $\epsilon$, not on the database!
• Remember what the Laplace distribution looks like: smaller sensitivity (and/or less privacy) means less distortion
• Exercise: how much noise do we have to add to sanitize the following question?
  • "How many people in the database are female?"
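For the exercise, a counting query has sensitivity 1, so the noise is drawn from Lap(1/ε). The sketch below shows this on a made-up five-record database. The function `laplace_mechanism`, the data, and the choice ε = 0.5 are assumptions for illustration only.

```python
import random


def laplace_mechanism(true_answer, sensitivity, epsilon, rng):
    """Return true_answer + Lap(sensitivity / epsilon)."""
    scale = sensitivity / epsilon
    # Difference of two i.i.d. exponentials with mean `scale` is Lap(scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_answer + noise


# Hypothetical toy database; only the "sex" attribute is shown.
database = ["F", "M", "F", "F", "M"]
true_count = sum(1 for sex in database if sex == "F")

rng = random.Random(42)
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5, rng=rng)
print(true_count, noisy_count)
```

The scale of the noise is 1/ε = 2 here regardless of how many records the database holds.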
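Proof of Laplace mechanism
$$
\begin{aligned}
\frac{\Pr(f(D) + \mathrm{Lap}(\Delta f/\epsilon) = y)}{\Pr(f(D') + \mathrm{Lap}(\Delta f/\epsilon) = y)}
&= \frac{e^{-|y - f(D)| \frac{\epsilon}{\Delta f}}}{e^{-|y - f(D')| \frac{\epsilon}{\Delta f}}} \\
&= e^{\frac{\epsilon}{\Delta f} \cdot (|y - f(D')| - |y - f(D)|)} \\
&\le e^{\frac{\epsilon}{\Delta f} \cdot |f(D) - f(D')|} \\
&\le e^{\epsilon}
\end{aligned}
$$
• That's it :)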
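The bound in the proof can also be checked numerically. The sketch below evaluates the ratio of the two output densities on a grid of values y. The numbers f(D) = 10, f(D') = 11, ε = 0.5, and the grid itself are arbitrary assumptions.

```python
import math


def laplace_pdf(z, b):
    """Density of Lap(b) at z."""
    return math.exp(-abs(z) / b) / (2.0 * b)


epsilon, delta_f = 0.5, 1.0
b = delta_f / epsilon
f_D, f_D_prime = 10.0, 11.0  # adjacent databases: |f(D) - f(D')| <= delta_f

# Density of the output y under D divided by the density under D'.
ys = [i / 10.0 for i in range(-200, 400)]
worst_ratio = max(
    laplace_pdf(y - f_D, b) / laplace_pdf(y - f_D_prime, b) for y in ys
)

# The worst-case ratio should not exceed e^epsilon.
print(worst_ratio, math.exp(epsilon))
```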
Vector-valued queries
The sensitivity of a function $f : \mathcal{D} \to \mathbb{R}^d$ is defined as:
$$\Delta f = \max_{D, D'} \|f(D) - f(D')\|_1$$
for all adjacent $D, D' \in \mathcal{D}$
• where $\|(x_1, \ldots, x_n)\|_1 = \sum_i |x_i|$
Let $f : \mathcal{D} \to \mathbb{R}^d$ be a function with sensitivity $\Delta f$. Then $g = f(X) + (Y_1, \ldots, Y_d)$, where the $Y_i$ are drawn i.i.d. from $\mathrm{Lap}\left(\frac{\Delta f}{\epsilon}\right)$, is $\epsilon$-differentially private.
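A minimal sketch of the vector-valued mechanism follows. It assumes a hypothetical query returning two counts (number of female patients, number of diabetic patients). Replacing one entry can change each count by 1, so the L1 sensitivity used here is 2. The function names and the example answers are illustrative.

```python
import random


def l1_norm(xs):
    """Sum of absolute values of the coordinates."""
    return sum(abs(x) for x in xs)


def vector_laplace_mechanism(answers, l1_sensitivity, epsilon, rng):
    """Add i.i.d. Lap(l1_sensitivity / epsilon) noise to every coordinate."""
    scale = l1_sensitivity / epsilon
    return [
        a + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        for a in answers
    ]


rng = random.Random(7)
true_answers = [120, 45]  # hypothetical counts: (female, diabetic)
noisy = vector_laplace_mechanism(true_answers, l1_sensitivity=2.0, epsilon=1.0, rng=rng)
print(noisy)
```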
Tools to play with
Applications of the Laplace sanitization mechanism:
✦ A Privacy-Integrated Query Language (PINQ)
http://research.microsoft.com/en-us/projects/pinq/
✦ Fuzz: a typed functional language for differentially private computations
http://privacy.cis.upenn.edu/software.html
Other sanitization mechanisms
• OK, we know how to handle numeric queries
  • "How many people in this room have blue eyes?"
  • Perturb the result by an amount of noise proportional to the sensitivity of the query
• But what about non-numeric queries?
  • "What is the most common eye color in this room?"
• What if the perturbed answer isn't almost as good as the exact answer?
  • "Which price would bring the most money from a set of buyers?"
Example: items for sale
(Figure: four buyers, labeled $1.00, $1.00, $1.00, and $4.01.)
• Could set the price of apples at $1.00 for profit $4.00
• Could set the price of apples at $4.01 for profit $4.01
• Best price: $4.01
• Second best price: $1.00
• Profit if you set the price at $4.02: $0
• Profit if you set the price at $1.01: $1.01
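The profit figures on this slide can be reproduced with a few lines of code, assuming the four amounts shown ($1.00, $1.00, $1.00, $4.01) are the maximum prices the four buyers are willing to pay. Prices are kept in cents to avoid floating-point rounding, and the function name `revenue` is hypothetical.

```python
def revenue(price_cents, valuations_cents):
    """Profit at a given price: every buyer whose valuation covers the price buys once."""
    buyers = sum(1 for v in valuations_cents if v >= price_cents)
    return price_cents * buyers


# Assumed buyer valuations, in cents.
valuations = [100, 100, 100, 401]

for price in (100, 101, 401, 402):
    print(price, revenue(price, valuations))
# 100 400
# 101 101
# 401 401
# 402 0
```

A one-cent change in the price moves the profit from $4.01 to $0, which is why adding numeric noise to the best price gives a poor answer here.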
Exponential mechanism [McSherry and Talwar, FOCS'07]
• A mechanism $M : \mathbb{N}^{|X|} \to R$ for some abstract range $R$
  • R = {Red, Blue, Green, Brown, Purple}
  • R = {$1.00, $1.01, $1.02, $1.03}
• Here the database is represented as a histogram
  • e.g., (Red, Green, Red, Brown, Blue, Green, Green) represented as (2, 1, 3, 1, 0)
• Paired with a quality score $q : \mathbb{N}^{|X|} \times R \to \mathbb{R}$
  • q(D, r) represents how good output r is for database D
  • the higher the score, the more appealing the result
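The histogram representation and a simple quality score can be sketched as follows. The ordering of the colors follows the range R = {Red, Blue, Green, Brown, Purple} from the slide. The `quality` function, which scores a color by how many people have it, is one possible choice assumed for illustration.

```python
from collections import Counter

R = ["Red", "Blue", "Green", "Brown", "Purple"]
observations = ["Red", "Green", "Red", "Brown", "Blue", "Green", "Green"]

# Count each color; colors that never occur get a count of 0.
counts = Counter(observations)
histogram = tuple(counts[color] for color in R)
print(histogram)  # (2, 1, 3, 1, 0)


def quality(hist, r):
    """q(D, r): number of people in the histogram D with eye color r."""
    return hist[R.index(r)]


print(quality(histogram, "Green"))  # 3
```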
Exponential mechanism (cont'd)
• The first idea is to define and compute the sensitivity of the scoring function (called global sensitivity):
$$GS(q) = \max_{r \in R,\; D, D' : \|D - D'\|_1 \le 1} |q(D, r) - q(D', r)|$$
• The global sensitivity tells us the maximum change in the scoring function for two adjacent databases, for all possible results
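Exponential mechanism (cont'd)
• Idea: make high quality outputs exponentially more likely at a rate that depends on the sensitivity of the quality score (and the privacy parameter)
Exponential$(D, R, q, \epsilon)$:
1. Let $\Delta = GS(q)$.
2. Output $r \sim R$ with probability proportional to $\exp\left(\frac{\epsilon\, q(D, r)}{2\Delta}\right)$
$$\Pr(\mathrm{Exponential}(D, R, q, \epsilon) = r) = \frac{\exp\left(\frac{\epsilon\, q(D, r)}{2\Delta}\right)}{\sum_{r' \in R} \exp\left(\frac{\epsilon\, q(D, r')}{2\Delta}\right)}$$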
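The two steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a hardened implementation: the function names, the eye-color counts, and the choice ε = 1 are hypothetical, and the sensitivity is passed in by the caller rather than computed.

```python
import math
import random


def selection_probabilities(database, outputs, quality, sensitivity, epsilon):
    """Probability of each output: proportional to exp(eps * q(D, r) / (2 * sensitivity))."""
    scores = [quality(database, r) for r in outputs]
    top = max(scores)  # subtracting the max avoids overflow and leaves the ratios unchanged
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def exponential_mechanism(database, outputs, quality, sensitivity, epsilon, rng):
    """Sample one output according to the probabilities above."""
    probs = selection_probabilities(database, outputs, quality, sensitivity, epsilon)
    return rng.choices(outputs, weights=probs, k=1)[0]


# Hypothetical eye-color counts; the score of a color is its count (sensitivity 1).
eye_counts = {"Red": 2, "Blue": 1, "Green": 3, "Brown": 1, "Purple": 0}
colors = list(eye_counts)
score = lambda db, r: db[r]

rng = random.Random(0)
print(selection_probabilities(eye_counts, colors, score, 1.0, 1.0))
print(exponential_mechanism(eye_counts, colors, score, 1.0, 1.0, rng))
```

With a small ε the probabilities are close to uniform. As ε grows, the mass concentrates on the highest-scoring output.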
Exponential mechanism: privacy theorem
Exponential$(D, R, q, \epsilon)$ is $\epsilon$-differentially private
Exponential mechanism: accuracy theorem
• What about the accuracy of the answer?
• It depends… it is good if there is a sufficient mass of values r with value q close to the optimum
Define:
• $\mathrm{OPT}_q(D) = \max_{r \in R} q(D, r)$
• $R_{\mathrm{OPT}} = \{r \in R : q(D, r) = \mathrm{OPT}_q(D)\}$
• $r^* = \mathrm{Exponential}(D, R, q, \epsilon)$
$$\Pr\left[ q(D, r^*) \le \mathrm{OPT}_q(D) - \frac{2\Delta}{\epsilon}\left(\log\frac{|R|}{|R_{\mathrm{OPT}}|} + t\right) \right] \le e^{-t}$$
✦ For the proofs of these results, see www.cis.upenn.edu/~aaroth/courses/slides/Lecture3.pdf
Exponential mechanism: example
• "What is the most common eye color in this room?"
  • R = {Red, Blue, Green, Brown, Purple}
• Here q returns the number of people with that eye color
• We can answer with a color that is shared by at least $\mathrm{OPT} - \frac{2}{\epsilon}(\ln 5 + 3) > \mathrm{OPT} - \frac{9.3}{\epsilon}$ people
• Except with probability $e^{-3} < .05$
• Independent of the number of people in the room
• Very small error if n is large
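The numbers in this example follow from plugging |R| = 5, |R_OPT| ≥ 1, Δ = 1, and t = 3 into the accuracy theorem. The sketch below evaluates the bound with the natural logarithm, which is the logarithm that matches the exponential weights of the mechanism. The function name `accuracy_slack` is hypothetical.

```python
import math


def accuracy_slack(num_outputs, num_optimal, sensitivity, epsilon, t):
    """Loss (2 * sensitivity / epsilon) * (ln(|R| / |R_OPT|) + t) from the accuracy theorem."""
    return (2.0 * sensitivity / epsilon) * (math.log(num_outputs / num_optimal) + t)


# Eye-color example: 5 possible colors, at least one optimal color, t = 3.
# Setting epsilon = 1 gives the numerator of the slack, i.e. slack * epsilon.
slack_times_eps = accuracy_slack(5, 1, 1.0, 1.0, 3.0)
failure_prob = math.exp(-3.0)

print(round(slack_times_eps, 2), round(failure_prob, 4))  # 9.22 0.0498
```

The slack depends only on ε and the number of candidate colors, not on how many people are in the room, so it is small relative to OPT when the room is large.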
Laplace mechanism vs exponential mechanism
• The Laplace mechanism is just an instance of the exponential mechanism
• For simplicity, let's focus on functions of sensitivity 1
• We just have to define $q(D, r) = -2|f(D) - r|$
  • less noise gives better quality
• In general, it can be shown that the exponential mechanism captures any differentially private mechanism by choosing an appropriate scoring function [McSherry and Talwar, FOCS'07]
Approximate (or (ε, δ))-differential privacy
• Generalized definition of differential privacy allowing for a (supposedly small) additive factor
• Used in a variety of applications
A query mechanism $M$ is $(\epsilon, \delta)$-differentially private if, for any two adjacent databases $D$ and $D'$ (differing in just one entry) and any $C \subseteq \mathrm{range}(M)$:
$$\Pr(M(D) \in C) \le e^{\epsilon} \cdot \Pr(M(D') \in C) + \delta$$
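For a mechanism with finitely many outputs, the smallest δ that satisfies the inequality for a given ε can be computed directly: the worst-case set C collects exactly the outputs y where Pr(M(D) = y) exceeds e^ε · Pr(M(D') = y). The sketch below applies this to two made-up output distributions. The function name `min_delta` and the probabilities are assumptions for illustration.

```python
import math


def min_delta(p, q, epsilon):
    """Smallest delta such that P(C) <= e^epsilon * Q(C) + delta for every event C."""
    outcomes = set(p) | set(q)
    return sum(
        max(0.0, p.get(y, 0.0) - math.exp(epsilon) * q.get(y, 0.0))
        for y in outcomes
    )


# Hypothetical output distributions of M on two adjacent databases.
dist_D = {"a": 0.50, "b": 0.50, "c": 0.00}
dist_D_prime = {"a": 0.49, "b": 0.50, "c": 0.01}

epsilon = 0.1
# The definition must hold in both directions (D vs D' and D' vs D).
delta = max(
    min_delta(dist_D, dist_D_prime, epsilon),
    min_delta(dist_D_prime, dist_D, epsilon),
)
print(delta)  # 0.01
```

Output "c" has probability 0 under one database and 0.01 under the other, so no finite ε gives pure ε-differential privacy here, but the pair satisfies (0.1, 0.01)-differential privacy.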