additive data perturbation: data reconstruction attacks
Outline (paper 15)
- Overview
- Data Reconstruction Methods
  - PCA-based method
  - Bayes method
- Comparison
- Summary
Overview
Data reconstruction: Z = X + R
Problem: given Z and the distribution of R, estimate the value of X
Extend it to the matrix case: X contains multiple dimensions, or fold the vector X into a matrix
Approach 1: apply matrix analysis techniques
Approach 2: Bayes estimation
Two major approaches:
- Principal component analysis (PCA)-based approach
- Bayes analysis approach
Variance and covariance
Definition: for a random variable x with mean μ,
Var(x) = E[(x - μ)^2]
Cov(xi, xj) = E[(xi - μi)(xj - μj)]
For the multidimensional case X = (x1, x2, …, xm), we use the covariance matrix:
cov(X) =
| var(x1)     cov(x1,x2)  …  cov(x1,xm) |
| cov(x2,x1)  var(x2)     …  cov(x2,xm) |
| …                                     |
| cov(xm,x1)  cov(xm,x2)  …  var(xm)    |

If each dimension xi has zero mean: cov(X) = (1/m) X^T * X
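The zero-mean covariance formula above can be sketched with numpy; the sample count and dimensionality here are made-up, and m is taken as the number of sample rows (an assumption, since the slide overloads m):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))   # 100 samples, 3 dimensions (illustrative sizes)
X = X - X.mean(axis=0)          # make each dimension zero mean

n = X.shape[0]                  # the 1/m factor, read as the number of samples
C = (X.T @ X) / n               # cov(X) = (1/m) X^T X for zero-mean X

# diagonal entries are var(x_i); off-diagonal entries are cov(x_i, x_j)
print(np.allclose(C, C.T))      # the covariance matrix is symmetric -> True
```

The same matrix can be obtained with `np.cov(X, rowvar=False, bias=True)`, which also divides by the sample count.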
PCA intuition
A vector in space: the original space base vectors E = {e1, e2, …, em}
Example: in 3-dimensional space, the x, y, z axes correspond to {(1 0 0), (0 1 0), (0 0 1)}
If we want to use the red axes from the slide figure to represent the vectors: the new base vectors are U = (u1, u2), and the transformation is matrix X → XU
[Figure: points in the (X1, X2) plane with new base directions u1, u2]
Why do we want to use different bases? The actual data distribution can often be described with fewer dimensions.
[Figure: points in the (X1, X2) plane clustered along direction u1]
Example: projecting the points onto u1, we can use one dimension (u1) to approximately describe all of them.
The key problem: finding the directions that maximize the variance of the points. These directions are called principal components.
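The idea above — points spread mostly along one direction u1 — can be sketched numerically with numpy; the 2-D data here is synthetic, generated only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic 2-D points stretched along one direction
t = rng.normal(size=200)
pts = np.column_stack([t, 0.5 * t + 0.05 * rng.normal(size=200)])
pts = pts - pts.mean(axis=0)

C = (pts.T @ pts) / len(pts)
eigvals, U = np.linalg.eigh(C)   # eigh sorts eigenvalues in ascending order
u1 = U[:, -1]                    # direction of maximum variance
proj = pts @ u1                  # one coordinate per point along u1

# the variance along u1 equals the largest eigenvalue of C
print(np.isclose(proj.var(), eigvals[-1]))
```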
How to do PCA? Calculating covariance matrix:
C = (1/m) X^T * X    (X is zero mean on each dimension)

"Eigenvalue decomposition" on C:
Matrix C is symmetric, so we can always find an orthonormal matrix U (U * U^T = I) such that C = U * B * U^T, where B is a diagonal matrix:

B = diag(d1, d2, …, dm)

Explanation: the di in B are the variances in the transformed space, and the columns of U are the new base vectors.
Look at the diagonal matrix B (eigenvalues): we know the variance in each transformed direction, so we can select the maximum ones (e.g., k elements) to approximately describe the total variance.
Approximation with maximum eigenvalues: select the corresponding k eigenvectors in U → U'; transform A → AU'.
AU' has only k dimensions.
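These two steps — eigenvalue decomposition of C, then keeping the k largest-variance directions — can be sketched with numpy; the matrix sizes and k are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(500, 4))
A = A - A.mean(axis=0)
C = (A.T @ A) / len(A)

# C is symmetric, so eigh returns real eigenvalues (B's diagonal) and orthonormal U
d, U = np.linalg.eigh(C)
assert np.allclose(U @ np.diag(d) @ U.T, C)    # C = U B U^T
assert np.allclose(U @ U.T, np.eye(4))         # U * U^T = I

k = 2
U_k = U[:, -k:]        # U': eigenvectors of the k largest eigenvalues
A_reduced = A @ U_k    # A -> AU', each row now has only k dimensions
assert A_reduced.shape == (500, k)
```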
PCA-based reconstruction
Covariance matrix for Y = X + R, where the elements of R are i.i.d. with variance σ²:
Cov(Xi+Ri, Xj+Rj) = cov(Xi, Xi) + σ² for the diagonal elements (i = j), and cov(Xi, Xj) for i ≠ j.
Therefore, removing σ² from the diagonal of cov(Y), we get the covariance matrix for X.
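A numerical sketch of this diagonal correction with numpy; the data-generating mix, noise level, and sizes are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, sigma = 20000, 3, 0.5
# correlated original data X, zero mean on each dimension
mix = np.array([[1.0, 0.4, 0.0], [0.0, 1.0, 0.3], [0.0, 0.0, 1.0]])
X = rng.normal(size=(n, m)) @ mix
X = X - X.mean(axis=0)
Y = X + rng.normal(scale=sigma, size=(n, m))   # R: iid noise, variance sigma^2

cov_Y = (Y.T @ Y) / n
cov_X_est = cov_Y - sigma**2 * np.eye(m)   # remove sigma^2 from the diagonal

# the estimate is close to the true sample covariance of X
print(np.max(np.abs(cov_X_est - (X.T @ X) / n)))
```

The residual error shrinks with the sample size n, since the cross terms X^T R / n average out.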
Reconstruct X
We have obtained C = cov(X). Apply PCA to the covariance matrix C:
C = U * B * U^T
Select the major principal components and get the corresponding eigenvectors U'.
Reconstruct X: X^ = Y * U' * U'^T
Reason: for X' = X * U, X = X' * U^(-1) = X' * U^T ≈ X' * U'^T; approximate X' with Y * U' and plug in.
The error comes from this approximation.
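Putting the pieces together, a sketch of the whole PCA-based attack on synthetic data (all sizes, the noise level sigma, and k are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, k, sigma = 5000, 4, 2, 1.0
# original data lies in a k-dimensional subspace, so PCA can separate it from noise
basis = rng.normal(size=(k, m))
X = rng.normal(size=(n, k)) @ basis
X = X - X.mean(axis=0)
Y = X + rng.normal(scale=sigma, size=(n, m))    # perturbed release

C = (Y.T @ Y) / n - sigma**2 * np.eye(m)        # estimated cov(X)
d, U = np.linalg.eigh(C)
U_k = U[:, -k:]                                 # U': major principal components
X_hat = Y @ U_k @ U_k.T                         # X^ = Y U' U'^T

err_noise = np.mean((Y - X) ** 2)               # error if the attacker just uses Y
err_rec = np.mean((X_hat - X) ** 2)             # error after reconstruction
print(err_rec < err_noise)
```

Projecting onto the k major components discards the noise lying outside the signal subspace, which is why the reconstruction error drops below the raw noise level.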
Bayes Method
Assumptions:
- the original data follows a multidimensional normal distribution
- the noise also follows a normal distribution
The covariance matrix can be approximated with the method discussed above.
Data:
(x11, x12, …, x1m) → vector x1
(x21, x22, …, x2m) → vector x2
…
Problem: given a vector yi, yi = xi + ri, find the vector xi that maximizes the posterior probability P(X|Y).
Again, applying the Bayes rule:

f_X|Y(x|y) = f_Y|X(y|x) * f_X(x) / f_Y(y)

f_Y(y) is constant for all x, so it suffices to maximize the numerator f_Y|X(y|x) * f_X(x).
With f_Y|X(y|x) = f_R(y-x), plug in the distributions f_X and f_R; we maximize:

f_R(y-x) * f_X(x)
It is equivalent to maximizing the exponential part.
A function is maximized/minimized where its derivative = 0, i.e. (with Σ and μ the covariance and mean of x, and σ² the noise variance):

-Σ^(-1)(x - μ) + (1/σ²)(y - x) = 0

Solving the above equation, we get

x^ = (Σ^(-1) + (1/σ²) I)^(-1) (Σ^(-1) μ + (1/σ²) y)
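Under the stated normality assumptions this is the standard Gaussian MAP estimate; a sketch with numpy, where Σ, μ, and σ are made-up illustrative values:

```python
import numpy as np

rng = np.random.default_rng(5)
m, sigma = 3, 0.8
A = rng.normal(size=(m, m))
Sigma = A @ A.T + np.eye(m)      # assumed covariance of x (positive definite)
mu = np.array([1.0, -2.0, 0.5])  # assumed mean of x

x = rng.multivariate_normal(mu, Sigma)      # true record
y = x + rng.normal(scale=sigma, size=m)     # perturbed record

# solve (Sigma^-1 + sigma^-2 I) x^ = Sigma^-1 mu + sigma^-2 y
P = np.linalg.inv(Sigma)
x_hat = np.linalg.solve(P + np.eye(m) / sigma**2, P @ mu + y / sigma**2)
```

The same estimate can be written as mu + Sigma (Sigma + σ² I)^(-1) (y - mu), a form that avoids inverting Sigma directly.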
Reconstruction: for each vector y, plug in the covariance, the mean of vector x, and the noise variance; we get the estimate of the corresponding x.
Experiments
Errors vs. number of dimensions
Conclusion: covariance between dimensions helps reduce errors
Errors vs. # of principal components
Conclusion: the best # of principal components depends on the amount of noise
Discussion
The key: finding the covariance matrix of the original data X. Increasing the difficulty of estimating Cov(X) decreases the accuracy of data reconstruction.
The Bayes method assumes a normal distribution; what about other distributions?