james honaker & salil vadhan school of engineering...
Post on 08-Jun-2020
7 Views
Preview:
TRANSCRIPT
CS208: Applied Privacy for Data ScienceReidentification & Reconstruction Attacks
James Honaker & Salil Vadhan
School of Engineering & Applied SciencesHarvard University
February 4, 2019
Cohen & NissimLinear Program Reconstruction in Practice
• Use queries of sums over random subsets toreconstruct individual data.
• Importantly, the members of the subset are reportedin each sum.
• Received the Aircloak Bounty ($5000) forreidentifying challenge data in the Diffix commercialsystem.
Regression Based Reconstruction
yi = β1x1,i + β2x2,i + . . .+ βNxN,i + εi
Here:
N is the Number of people in the databasei is query index
yi is i-th query releasexh,i is a {0, 1}-indicator of whether person h
was included in query iβh is h’s sensitive dataεi is the noise added to the i-th query
Regression Based Reconstruction
yi = β1x1,i + β2x2,i + . . .+ βNxN,i + εi
7 = 1·1 + 0·1 + 1·0 + 0·0 + . . .+ 0·1 + 24 = 1·0 + 0·1 + 1·1 + 0·1 + . . .+ 0·1 + (−1)6 = 1·0 + 0·0 + 1·0 + 0·1 + . . .+ 0·0 + 1
Here:
N is the Number of people in the databasei is query index
yi is i-th query releasexh,i is a {0, 1}-indicator of whether person h
was included in query iβh is h’s sensitive dataεi is the noise added to the i-th query
Regression Based Reconstruction
yi = β1x1,i + β2x2,i + . . .+ βNxN,i + εi
7 = 1·1 + 0·1 + 1·0 + 0·0 + . . .+ 0·1 + 24 = 1·0 + 0·1 + 1·1 + 0·1 + . . .+ 0·1 + (−1)6 = 1·0 + 0·0 + 1·0 + 0·1 + . . .+ 0·0 + 1
Here:
N is the Number of people in the databasei is query index
yi is i-th query releasexh,i is a {0, 1}-indicator of whether person h
was included in query iβh is h’s sensitive dataεi is the noise added to the i-th query
Regression Based Reconstruction
Find β1, . . . , βn s.t.:
β = argmin[∑
i
(yi − yi)2]
where yi = β1x1,i + β2x2,i + . . .+ βNxN,i
In R see:lm()In Python see for example:linear_model.LinearRegression()from scikit-learn.
Example
From regressionAttack.r:
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
Reconstruction of Latino Variable
estimate
sens
itive
val
ue
fraction ones correct: 1
fraction zeros correct: 1
−15 −10 −5 0 5 10 15
0.0
0.2
0.4
0.6
0.8
1.0
Reconstruction of Latino Variable
estimate
sens
itive
val
ue
fraction ones correct: 0.53
fraction zeros correct: 0.38
Garfinkel et al.Understanding Database Reconstruction Attacks on Public Data
• Demonstrates feasibility of Database ReconstructionAttacks (DRAs) on small Census blocks by usingtheir released aggregate statistics.
• 1.5M census blocks with between 1 and 7 residents• Each released statistic provides a constraint, and
some blocks have only one possible dataset thatsatisfy all constraints.
Example Dataset
SAT Solvers
SAT Solvers state the feasibility of a solution to a series oflogical formulae, and find a solution if one exists.
PicoSAT has R and Python bindings.In R:install.packages("rpicosat")In Python:pip install pycosat
Conjunctive Normal FormConjunctive Normal Form (CNF) is expressed asconjunctions of disjunctions, that is, clauses entirelycomposed of OR ∨, each of which are bound together byAND ∧.Negations of literals are allowed.
Examples:• (A ∨ B) ∧ (¬C ∨D)
• (A ∨ B) ∧ (C ∨D ∨ E ∨ F)• (A ∨ B) ∧ (C)• (A)
Construction:• A→ B is expressed as (¬A ∨ B)• A↔ B is expressed as (¬A ∨ B) ∧ (A ∨ ¬B)
Dataset
ActualSex Married
1 1 02 1 13 0 14 0 15 0 0
LabelsSex MarriedA FB GC HD IE J
Release:∑
Sex = 2,∑Married = 3,∑
Sex ·Married = 1.
Dataset
ActualSex Married
1 1 02 1 13 0 14 0 15 0 0
LabelsSex MarriedA FB GC HD IE J
Release:∑
Sex = 2,∑Married = 3,∑
Sex ·Married = 1.
Dataset
ActualSex Married
1 1 02 1 13 0 14 0 15 0 0
LabelsSex MarriedA FB GC HD IE J
Release:∑
Sex = 2,∑Married = 3,∑
Sex ·Married = 1.
Challenge
Transform into CNF:(A ∧ F) ∨ (B ∧ G) ∨ (C ∧H) ∨ (D ∧ I) ∨ (E ∧ J)
What about:(A ∧ F)⊕ (B ∧ G)⊕ (C ∧H)⊕ (D ∧ I)⊕ (E ∧ J)
(requires 24 clauses)
Challenge
Transform into CNF:(A ∧ F) ∨ (B ∧ G) ∨ (C ∧H) ∨ (D ∧ I) ∨ (E ∧ J)
What about:(A ∧ F)⊕ (B ∧ G)⊕ (C ∧H)⊕ (D ∧ I)⊕ (E ∧ J)
(requires 24 clauses)
top related