Privacy Preservation for Data Streams
Feifei Li, Boston University
Joint work with: Jimeng Sun (CMU), Spiros Papadimitriou, George A. Mihaila and Ioana Stanoi (IBM T.J. Watson Research Center)

Application (1)
[Diagram: Corp. A, Corp. B and Corp. C each hold sensitive data and pass it through P before it reaches Analytical Services]
Analytical Services: finding trends, clusters, patterns, aggregations

Application (2)
[Diagram: Corp. A Information Hub publishes its data through P to Client A and Client B]
Corp. A Information Hub: publish data as a service
Client A, Client B: subscribe to the data to identify trends, patterns, classes

Target Application
[Figure: four value-vs-time plots, stream 1 to stream 4]
Identify trends
Cluster/classification

Problem Formulation
[Figure: streams A1, A2, ..., AN plotted over time, with time t and the point A1t marked]
$A_t \in \mathbb{R}^N$, $A \in \mathbb{R}^{T \times N}$, $T \in [1, \infty)$
$A^* = A + E$, with $E, A^* \in \mathbb{R}^{T \times N}$
E: online generated noise, one vector at a time

Problem Formulation (continued)
[Figure: perturbed streams over time, passed through a reconstruction R]
$A^* \xrightarrow{R} \tilde{A}$, with $A^*, \tilde{A} \in \mathbb{R}^{T \times N}$
$\min_R D(A, \tilde{A})$: offline and online
Given $\sigma^2$, obtain $A^*$ online such that $D(A, A^*) = \sigma^2$ and, for a given R, $D(A, \tilde{A})$ is close to $\sigma^2$
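As a rough numerical check of the condition D(A, A*) = σ², the sketch below perturbs a toy stream one vector at a time and measures the resulting discrepancy. It is an illustration only. The slides do not spell out D, so `discrepancy` assumes the squared Euclidean distance between rows averaged over time, and the noise is plain i.i.d. Gaussian with its variance split evenly over the N streams. The names `perturb_stream` and `discrepancy` and the toy data are hypothetical.

```python
import random

def perturb_stream(stream, sigma2, rng):
    """Yield A*_t = A_t + E_t, generating one noise vector per time step."""
    for a_t in stream:
        # Assumption: i.i.d. Gaussian noise, variance sigma2 / N per stream.
        std = (sigma2 / len(a_t)) ** 0.5
        yield [x + rng.gauss(0.0, std) for x in a_t]

def discrepancy(a, b):
    """Assumed D(A, B): squared distance between rows, averaged over time."""
    total = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / len(a)

rng = random.Random(0)
sigma2 = 0.9
# Toy stream: T = 20000 time steps, N = 3 streams.
A = [[float(t % 7), float(t % 5), float(t % 3)] for t in range(20000)]
A_star = list(perturb_stream(A, sigma2, rng))
d = discrepancy(A, A_star)  # lands close to sigma2 for a long stream
```

Under these assumptions the measured discrepancy approaches σ² as T grows, which is the utility constraint the data owner has to meet.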

Data Perturbation
[Figure: each stream over time + random i.i.d. noise]
i.i.d.: independent and identically distributed

Principal Component Analysis: PCA
[Figure: i.i.d. noise]
[Figure: correlated noise]

PCA Based Data Reconstruction
[Figure: A, A* and A~ drawn around the principal direction]
Labels: Principal Direction; Added Noise ($\sigma^2$): Utility; Removed Noise; Remaining Noise: Privacy; Projection Error
A: Original Data
A*: Perturbed Data
A~: Reconstructed Data
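The picture above can be reproduced on toy data. The sketch below is a simplified stand-in, not the procedure from the talk. It assumes two perfectly correlated, zero-mean streams, adds i.i.d. Gaussian noise, and lets an adversary project the perturbed points onto the principal direction estimated from the perturbed data itself. All function names and parameters are hypothetical, and the principal direction uses the closed form for a 2x2 symmetric matrix.

```python
import math
import random

def principal_direction(points):
    """Top principal direction of zero-mean 2-D points (closed form for 2x2)."""
    n = len(points)
    sxx = sum(x * x for x, _ in points) / n
    syy = sum(y * y for _, y in points) / n
    sxy = sum(x * y for x, y in points) / n
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return math.cos(theta), math.sin(theta)

def project(points, u):
    """Project each point onto the line spanned by the unit vector u."""
    ux, uy = u
    return [((x * ux + y * uy) * ux, (x * ux + y * uy) * uy) for x, y in points]

def discrepancy(a, b):
    """Assumed D: mean squared Euclidean distance between matching points."""
    return sum((xa - xb) ** 2 + (ya - yb) ** 2
               for (xa, ya), (xb, yb) in zip(a, b)) / len(a)

rng = random.Random(1)
original = [(s, s) for s in (rng.gauss(0.0, 3.0) for _ in range(5000))]
perturbed = [(x + rng.gauss(0.0, 0.5), y + rng.gauss(0.0, 0.5))
             for x, y in original]

u = principal_direction(perturbed)          # adversary only sees perturbed data
reconstructed = project(perturbed, u)
d_perturbed = discrepancy(original, perturbed)          # added noise
d_reconstructed = discrepancy(original, reconstructed)  # remaining noise
```

With i.i.d. noise, the component orthogonal to the principal direction is removed by the projection, so the remaining noise is clearly smaller than the added noise.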

PCA Based Data Reconstruction: Correlated Noise!
[Figure: A, A* and A~ drawn around the principal direction, with correlated noise]
Labels: Principal Direction; Added Noise ($\sigma^2$): Utility; Remaining Noise: Privacy; Projection Error

Data Perturbation: main idea
Observations
– The amount of random noise controls the privacy/utility tradeoff
– i.i.d. (independent and identically distributed) noise does not preserve privacy well enough
Lesson learned
– Noise should be correlated with the original data
• Z. Huang et al. SIGMOD 05.

Challenge 1: Dynamic Correlation

Challenge 2: Dynamic Autocorrelation

Online Random Noise for Autocorrelation: Stock

State of the Art
Privacy Preservation
– Given a utility requirement, maximize the privacy
Existing Work (Z. Huang et al. SIGMOD 05)
– Batch mode, static data
– And many other works (see our paper for a detailed literature review)

Adding Dynamic Correlated Noise
[Figure: streams A1, A2, A3; U (3x3): online estimation of principal components]
For each incoming At:
1. Update U
2. Generate noise Et distributed along U
3. Publish A*t = At + Et
S. Papadimitriou et al. VLDB05
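The update/generate/publish loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm from the talk. It tracks only one principal direction, and it swaps in Oja's rule for the online estimator cited on the slide. The step size `eta`, the toy streams and the function names are all hypothetical.

```python
import random

def oja_update(u, a_t, eta):
    """One Oja's-rule step toward the top principal direction (kept unit norm)."""
    y = sum(ui * xi for ui, xi in zip(u, a_t))
    u = [ui + eta * y * (xi - y * ui) for ui, xi in zip(u, a_t)]
    norm = sum(ui * ui for ui in u) ** 0.5
    return [ui / norm for ui in u]

def publish(a_t, u, sigma2, rng):
    """Return A*_t = A_t + E_t with E_t drawn along the current direction u."""
    g = rng.gauss(0.0, sigma2 ** 0.5)
    return [xi + g * ui for xi, ui in zip(a_t, u)]

rng = random.Random(0)
u = [1.0, 0.0, 0.0]          # arbitrary starting direction
published = []
for _ in range(3000):
    s = rng.gauss(0.0, 1.0)
    # Three strongly correlated toy streams: a shared signal plus small jitter.
    a_t = [s + rng.gauss(0.0, 0.05) for _ in range(3)]
    u = oja_update(u, a_t, eta=0.01)                         # 1. update U
    published.append(publish(a_t, u, sigma2=0.1, rng=rng))   # 2.-3. noise, publish

# |cos| of the angle between u and the true direction (1, 1, 1) / sqrt(3)
alignment = abs(sum(u)) / 3 ** 0.5
```

Because the noise follows the tracked direction, it lies where the data itself varies, which is what makes it hard to filter out by projection.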

Put it into Algorithm: Distribute Noise
k=3, U: eigenvectors, V: eigenvalues
[Figure: the noise budget $\sigma^2$ is split across the principal components according to V(1), V(2), ..., then multiplied by $U^T$]
Noise distributed in the principal components' subspace
Rotate back to data space, added to At
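A small sketch of this step is given below. The exact normalization is not recoverable from the slide text, so splitting σ² in proportion to the eigenvalues is an assumption made here for illustration. The names `noise_shares` and `correlated_noise` and the toy values of U and V are hypothetical.

```python
import random

def noise_shares(eigenvalues, sigma2):
    """Assumed split: component i gets sigma2 * V(i) / sum(V)."""
    total = sum(eigenvalues)
    return [sigma2 * v / total for v in eigenvalues]

def correlated_noise(components, eigenvalues, sigma2, rng):
    """Draw noise in the principal subspace, then rotate back to data space."""
    shares = noise_shares(eigenvalues, sigma2)
    coeffs = [rng.gauss(0.0, s ** 0.5) for s in shares]
    n = len(components[0])
    # E_t = sum_i coeff_i * U_i, i.e. the coefficients multiplied by U^T.
    return [sum(c * u[j] for c, u in zip(coeffs, components)) for j in range(n)]

rng = random.Random(0)
U = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]  # two orthonormal directions in 3-D
V = [3.0, 1.0]                          # their eigenvalues
shares = noise_shares(V, sigma2=0.4)
e_t = correlated_noise(U, V, 0.4, rng)  # noise vector to add to A_t
```

The shares always sum to σ², and the noise has no component outside the subspace spanned by U.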

Why is our algorithm better (state of the art)?
[Figure: local principal components vs. the global principal component]
Noise added along the global PC (offline) is removed by online reconstruction

Online Reconstruction vs. Offline Reconstruction
Choice of adversary:
– Offline reconstruction based on global principal components
– Online tracking of the principal components, followed by local reconstruction
– Please see the details in the paper

Tracking Autocorrelation
a = [1 2 3 4 5 6]^T
W (rows: windows w1, w2, w3, w4 along time; columns: h streams) =
1 2 3
2 3 4
3 4 5
4 5 6
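The window matrix W can be built from a single stream with a short helper. This is a minimal sketch; `window_matrix` is a hypothetical name, and h = 3 is read off the example above.

```python
def window_matrix(a, h):
    """Stack every length-h sliding window of the sequence a as a row."""
    return [a[i:i + h] for i in range(len(a) - h + 1)]

W = window_matrix([1, 2, 3, 4, 5, 6], 3)
# W == [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
```

Treating the h columns of W as h streams lets the correlation-based machinery be reused for autocorrelation within one stream.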

Distribute Noise
[Figure: the matrix W shown five times]
W =
1 2 3
2 3 4
3 4 5
4 5 6
Avoid adding noise > allowed threshold, and keep the noise auto-correlated with the stream!
Idea: constrain the next k noise values based on the previous h-k noises + the current estimation of U; this becomes a linear system

Experiments
Three Real Data Streams
– Sensor streams, Lab: Light, Humidity, Volt, Temperature. 7712x198
– Chlorine environmental streams: 4310x166
– Stock streams: 8000x2

Perturbation vs. Reconstruction
Perturbation
– i.i.d-N
– Offline-N: noise correlated with global principal components
– Online-N: SCAN (streaming correlated additive noise) / SACAN (streaming auto-correlated additive noise)
Reconstruction
– Baseline: take perturbed data as the reconstruction
– Offline-R: offline reconstruction based on global principal components
– Online-R: SCOR (streaming correlated online reconstruction) / SACOR (streaming auto-correlated online reconstruction)
Noise (discrepancy) is represented by the relative energy as a percentage of the original data streams, i.e., D(A, A*)/||A||
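To make the relative measure concrete, the following sketch computes noise energy as a fraction of data energy. The slide does not say which norm is meant, so using the sum of squared entries for both numerator and denominator is an assumption. The function names and the toy matrices are hypothetical.

```python
def energy(a):
    """Sum of squared entries of a matrix given as a list of rows."""
    return sum(x * x for row in a for x in row)

def relative_discrepancy(a, a_star):
    """Assumed relative noise: energy of (A - A*) divided by energy of A."""
    diff = [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, a_star)]
    return energy(diff) / energy(a)

A = [[3.0, 4.0], [0.0, 5.0]]
A_star = [[3.0, 5.0], [1.0, 5.0]]
ratio = relative_discrepancy(A, A_star)  # 2 / 50 = 0.04, i.e. 4% noise
```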

Reconstruction Error: Online-R vs. Offline-R
10% noise, k=10
Online reconstruction achieves better accuracy as it minimizes the projection error

Reconstruction Error: vary k
1. Online reconstruction achieves better accuracy
2. Large k reduces projection error

Privacy vs. Discrepancy, online-R: Lab data

Privacy vs. Discrepancy, online-R: Chlorine

Online Random Noise for Autocorrelation: Chlorine

Online Random Noise for Autocorrelation: Stock

Privacy vs. Discrepancy: Online-R (Chlorine)

Privacy vs. Discrepancy: Online-R (Stock)

Running Time Analysis

Future Work
– Combining correlation and autocorrelation
– Other types of data streams beyond numeric data, such as categorical data

Questions
Thank you!