differentially private aggregation of distributed time-series

Differentially Private Aggregation of Distributed Time-Series Vibhor Rastogi (University of Washington) Suman Nath (Microsoft Research)


Post on 31-Dec-2015



Page 1: Differentially Private Aggregation  of Distributed Time-Series

Differentially Private Aggregation of Distributed Time-Series

Vibhor Rastogi (University of Washington)

Suman Nath (Microsoft Research)

Page 2: Differentially Private Aggregation  of Distributed Time-Series

Participatory Data Mining

Untrusted Aggregator

Alice Bob Charlie Delta

google.com private.com

hateBoss.com

yahoo.com likeBoss.com

espn.com

facebook.com findDate.com private.com

google.com private.com findGift.com

How many people visit google.com and then visit yahoo.com on day i ?

• Web History
• Medical Info
• GPS Traces

Page 3: Differentially Private Aggregation  of Distributed Time-Series

Participatory Data Mining

Untrusted Aggregator

Alice Bob Charlie Delta

Week 10 Weight: 120

Cholesterol: 60

Week 10 Weight: 120

Cholesterol: 60

Week 10 Weight: 120

Cholesterol: 60

Week 10 Weight: 120

Cholesterol: 60

How many people weigh > 200 pounds and have cholesterol > 80 in week i?

• Web History
• Medical Info
• GPS Traces

Page 4: Differentially Private Aggregation  of Distributed Time-Series

Participatory Data Mining

Alice Bob Charlie Delta

How many people take a particular route at 5 PM? How many people take a particular route at 5:15PM? …

Privacy Concerns!

Untrusted Aggregator

• Web History
• Medical Info
• GPS Traces

Goal: Enable untrusted aggregator to query users’ time-series data with formal privacy

Page 5: Differentially Private Aggregation  of Distributed Time-Series

Current State-of-the-art

Alice Bob Charlie Delta

Traffic Analyzer

How many people were at 148th Street & 36th Ave at 5PM?

Page 6: Differentially Private Aggregation  of Distributed Time-Series

Current State-of-the-art

Alice Bob Charlie Delta

Traffic Analyzer

Yes No No Yes

Trusted Server

How many people were at 148th Street & 36th Ave at 5PM?

Page 7: Differentially Private Aggregation  of Distributed Time-Series

Current State-of-the-art

Alice Bob Charlie Delta

Traffic Analyzer

Trusted Server

Actual answer = 2

Noisy answer = 3.6

Formal Privacy achieved for right noise
Noise still small for a single query

How many people were at 148th Street & 36th Ave at 5PM?

Page 8: Differentially Private Aggregation  of Distributed Time-Series

Alice Bob Charlie Delta

Traffic Analyzer

At 5PM, were you at 148th Street & 36th Ave?

Current State-of-the-art

Formal Privacy achieved for right noise
Noise still small for a single query

Two Main Challenges

1. Noise in each answer = O(# of queries)

2. Trusted Server required

Trusted Server

Actual answer = 2

Noisy answer = 3.6

How many people at 148th Street & 36th Ave at 5PM?

How many people at 148th Street & 36th Ave at 5:15PM?

How many people at 148th Street & 36th Ave at 7AM?

Noisy answer = 203.6

???

Page 9: Differentially Private Aggregation  of Distributed Time-Series

Outline

• Background: Differential Privacy
• Challenge #1: Sequence of Queries
• Challenge #2: No Trusted Server
• Experimental Evaluation

Page 10: Differentially Private Aggregation  of Distributed Time-Series

Background: Differential Privacy [Dwork 06]

For a sequence of queries q1,q2,…,qN

Laplace random noise added to each query
Worst case: noise has to increase linearly with N

56 + Laplace Noise

How many in Building 99 at 5?
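A minimal sketch of the standard approach on this slide, assuming every count query has sensitivity 1 and a single user can affect all N answers, so each answer gets Laplace noise of scale N/ε. The function names, the ε value, and the query counts are illustrative assumptions, not taken from the talk.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) drawn as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def answer_queries(true_counts, epsilon, rng):
    """Perturb N count queries; the per-query noise scale grows linearly with N."""
    n = len(true_counts)
    scale = n / epsilon  # assumption: sensitivity 1 per query, budget split over N queries
    return [count + laplace_noise(scale, rng) for count in true_counts]

rng = random.Random(0)
single = answer_queries([56], epsilon=1.0, rng=rng)       # one query: scale 1
many = answer_queries([56] * 200, epsilon=1.0, rng=rng)   # 200 queries: scale 200
```

With one query the noise scale is 1/ε; with 200 queries it is 200 times larger, which is the linear growth the slide warns about.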

Privacy

Algorithm [Dwork 06]

Differential Privacy [Dwork 06]: Output should be indistinguishable:

VS.

Name Location Time

….. …. ….

Smith Building 99 5:00

Alice 148 & 36 5:00

….. …. ….

Name Location Time

….. …. ….

Smith Building 99 5:00

Name Location Time

….. …. ….

Smith Building 99 5:00

Alice 148 & 36 5:00

Alice 148 & 38 5:15

…. …. ….
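Stated formally, this is the standard ε-differential privacy condition from [Dwork 06]. The symbols A, D, D′, S and ε are notation introduced here, not taken from the slides: A is the privacy algorithm, D and D′ are two databases that differ only in one user's data (as in the tables above, with and without Alice), and S is any set of possible outputs.

```latex
\Pr[A(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[A(D') \in S]
```

A smaller ε means the two output distributions are closer, so the aggregator learns less about whether Alice's tuples are present.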

Page 11: Differentially Private Aggregation  of Distributed Time-Series

Outline

• Background: Differential Privacy
• Challenge #1: Sequence of Queries
• Challenge #2: No Trusted Server
• Experimental Evaluation

Page 12: Differentially Private Aggregation  of Distributed Time-Series

Answering Sequence of Queries

q1 = # of people in 148th & Sr 520 at 5:00PM
q2 = # of people in 148th & Sr 520 at 5:15PM
……
qN = # of people in 148th & Sr 520 at 1:25AM

Standard algorithm [Dwork et al. 06] results in Θ(N) noise

Noise too large for long sequences!

Name Location Time

….. …. ….

Smith Building 99 5PM

Name Location Time

….. …. ….

Smith Building 99 5PM

Alice 148th & Sr 520 5PM

Alice 148th & Sr 520 5:15PM

…. …. ….

Alice 148th & Sr 520 1:25AM

Page 13: Differentially Private Aggregation  of Distributed Time-Series

Solution: Compress the sequence

Reduce effective N by compressing the sequence

DFT-based Compression (NOT private):

1. DFT: q1,…,qN → f1,..,fk,fk+1,..,fN
2. Keep only the first k coefficients: f1,..,fk,0,..,0
3. Inverse DFT: f1,..,fk,0,..,0 → q’1,…,q’N

q’i has some error compared to qi
– Error is small if qi has periodic nature
– k/N is the compression ratio
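The compression step can be sketched as follows. This is an illustrative, non-private example in plain Python with a naive O(N²) DFT. One assumption beyond the slide: to keep the reconstruction real-valued, the sketch also retains the conjugate mirror of each kept coefficient (it carries no extra information for a real input). The function names and the test signal are hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]

def inverse_dft(coeffs):
    """Naive inverse discrete Fourier transform."""
    n = len(coeffs)
    return [sum(coeffs[f] * cmath.exp(2j * cmath.pi * f * t / n) for f in range(n)) / n
            for t in range(n)]

def compress(q, k):
    """Keep the k lowest-frequency coefficients (plus mirrors), zero the rest, invert."""
    n = len(q)
    assert 2 * k <= n
    f = dft(q)
    kept = [0j] * n
    for i in range(k):
        kept[i] = f[i]
        if i > 0:
            kept[n - i] = f[n - i]  # conjugate mirror of f[i] for real-valued q
    return [v.real for v in inverse_dft(kept)]

N = 64
q = [50 + 10 * math.cos(2 * math.pi * 3 * t / N) for t in range(N)]  # periodic counts
q_approx = compress(q, k=5)  # frequency 3 is among the kept coefficients
```

For this purely periodic sequence, k = 5 out of N = 64 coefficients reproduces q up to floating-point error; dropping the dominant frequency (for example k = 2) leaves only the average.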

Page 14: Differentially Private Aggregation  of Distributed Time-Series

DFT-based Compression - Examples

[Figure: original qi vs. compressed qi’, k = 20, N = 2000]

Page 15: Differentially Private Aggregation  of Distributed Time-Series

DFT-based Compression - Examples

[Figure: original qi vs. compressed qi’, k = 10, N = 2000; x-axis: Day #]

Page 16: Differentially Private Aggregation  of Distributed Time-Series

Our DFT-based Perturbation Algorithm

• Perturbation error: reduced from O(N) to O(k)
– An improvement by a factor of k/N

• Additional compression error often quite small

Our Algorithm
1. DFT: q1,..,qN → f1,..,fk
2. Perturb: fi’ = fi + noise
3. Inverse DFT: f1’,..,fk’,0,0,…,0 → q’1,…,q’N

Main Result
Strong differential privacy achieved;
Error in qi’ = O(k) + Compression error
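The three steps can be sketched as below. This is a hedged illustration, not the authors' implementation: `noise_scale` is a hypothetical parameter (the slide does not give the calibrated Laplace scale), the noise is added to the real and imaginary parts independently, and conjugate mirrors are filled in so the output is real-valued, which is an assumption of this sketch.

```python
import cmath
import math
import random

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]

def inverse_dft(coeffs):
    n = len(coeffs)
    return [sum(coeffs[f] * cmath.exp(2j * cmath.pi * f * t / n) for f in range(n)) / n
            for t in range(n)]

def laplace(scale, rng):
    """Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def fourier_perturb(q, k, noise_scale, rng):
    n = len(q)
    assert 2 * k <= n
    f = dft(q)                                    # step 1: DFT
    noisy = [0j] * n                              # coefficients beyond k stay zero
    for i in range(k):                            # step 2: perturb the first k
        noisy[i] = f[i] + complex(laplace(noise_scale, rng), laplace(noise_scale, rng))
    for i in range(1, k):
        noisy[n - i] = noisy[i].conjugate()       # mirror so the inverse is real
    return [v.real for v in inverse_dft(noisy)]   # step 3: inverse DFT (real part)

N = 64
q = [50 + 10 * math.cos(2 * math.pi * 3 * t / N) for t in range(N)]
q_noisy = fourier_perturb(q, k=5, noise_scale=20.0, rng=random.Random(0))
```

Only k coefficients receive noise regardless of N, which is where the O(N) to O(k) reduction comes from; the price is the compression error from the discarded coefficients.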

Page 17: Differentially Private Aggregation  of Distributed Time-Series

Outline

• Background: Differential Privacy
• Challenge #1: Sequence of Queries
• Challenge #2: No Trusted Server
• Experimental Evaluation

Page 18: Differentially Private Aggregation  of Distributed Time-Series

No Trusted Server

No known efficient technique for distributed Laplace noise

Laplace noise = combination of Gaussian noise
Gaussian noise can be generated in a distributed manner

Individual noise too small, = (total noise/m)

Use cryptographic techniques to hide individual data despite small individual noise

Distributed Paillier Cryptosystem
• Homomorphic encryption: add encrypted data
• Threshold decryption: many private keys distributed among users; decryption requires a threshold # of users
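A small sketch of the "Laplace from Gaussians" idea. It relies on a standard identity: if G1..G4 are independent N(0, σ²) variables, then G1² + G2² − G3² − G4² is Laplace with scale 2σ². The slide does not spell out which combination the authors use, so treat the formula, the function names, and the value of m (number of users) as assumptions. The sketch computes everything in the clear; in the protocol the combination would have to happen under encryption.

```python
import random

def distributed_gaussian(m, variance, rng):
    """Sum of m per-user shares, each N(0, variance/m); the total is N(0, variance)."""
    share_std = (variance / m) ** 0.5
    return sum(rng.gauss(0.0, share_std) for _ in range(m))

def laplace_from_gaussians(scale, m, rng):
    """Laplace(0, scale) built from four distributed Gaussians of variance scale/2."""
    g = [distributed_gaussian(m, scale / 2.0, rng) for _ in range(4)]
    return g[0] ** 2 + g[1] ** 2 - g[2] ** 2 - g[3] ** 2

rng = random.Random(0)
samples = [laplace_from_gaussians(scale=2.0, m=4, rng=rng) for _ in range(5000)]
mean_abs = sum(abs(s) for s in samples) / len(samples)  # E|Laplace(b)| = b
```

Each user's share has only 1/m of the total variance, which is why the individual noise alone is too small to protect that user's value.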

Page 19: Differentially Private Aggregation  of Distributed Time-Series

Basic Protocol

Alice Bob Charlie Delta

Traffic Analyzer

How many were at 148th Street & 36th Ave at 5PM?

Page 20: Differentially Private Aggregation  of Distributed Time-Series

Basic Protocol (Contd.)

Alice Bob Charlie Delta

Traffic Analyzer

1 0 0 1

Trusted Server

+noise +noise +noise +noise

E(1+noise) E(0+noise) E(0+noise) E(1+noise)

Addition over encrypted data
Exploiting homomorphic property
E(sum) = E(user1) * E(user2) * …
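The homomorphic property can be illustrated with a toy Paillier implementation. This is a sketch under loud assumptions: the primes are tiny and insecure, the per-user integer noise values are made up, and a single private key is used for decryption only to check the result (in the protocol no single party holds it).

```python
import math
import random

def keygen(p, q):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                               # valid for generator g = n + 1
    return n, (lam, mu)

def encrypt(n, m, rng):
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(1 + n, m % n, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    m = ((pow(c, lam, n * n) - 1) // n) * mu % n
    return m - n if m > n // 2 else m                  # map back to signed integers

rng = random.Random(0)
n, priv = keygen(293, 433)          # toy primes, NOT secure
bits = [1, 0, 0, 1]                 # Alice, Bob, Charlie, Delta
noise = [2, -1, 3, -2]              # hypothetical integer noise shares
ciphertexts = [encrypt(n, b + e, rng) for b, e in zip(bits, noise)]

c_sum = 1
for c in ciphertexts:
    c_sum = (c_sum * c) % (n * n)   # E(sum) = E(user1) * E(user2) * ...

total = decrypt(n, priv, c_sum)     # equals sum(bits) + sum(noise)
```

Multiplying ciphertexts adds the underlying plaintexts, so the aggregator can form E(sum) without seeing any individual value.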

Page 21: Differentially Private Aggregation  of Distributed Time-Series

Basic Protocol (Contd.)

Alice Bob Charlie Delta

Traffic Analyzer

E(sum)

Page 22: Differentially Private Aggregation  of Distributed Time-Series

Basic Protocol (Contd.)

Alice Bob Charlie Delta

Traffic Analyzer

D1[E(sum)] D2[E(sum)] D3[E(sum)] D4[E(sum)]

Each user partially decrypts using her key

Finally combines all decryptions
Exploiting threshold property
Sum = D1[E(sum)] * D2[E(sum)] * …

Page 23: Differentially Private Aggregation  of Distributed Time-Series

One Tricky Challenge

Alice Bob Charlie Delta

Traffic Analyzer

E(sum)

During the protocol, the encrypted aggregate is sent back to the users

Third-party agent can be malicious: it can send E(Alice’s data) instead of E(sum)

D1[E(Alice’s data)] … … D4[E(Alice’s data)]

Alice’s data is breached

Page 24: Differentially Private Aggregation  of Distributed Time-Series

Outline

• Challenge #1: Handling correlations
• Challenge #2: No trusted Server
• Experimental Evaluation
• Conclusion

Page 25: Differentially Private Aggregation  of Distributed Time-Series

Experimental Evaluation

• Implemented both solutions on
– 2.8 GHz Intel Pentium PC with 1GB RAM

• Evaluated:
– Accuracy improvement by Fourier perturbation
– Performance overhead in distributed noise-addition

Page 26: Differentially Private Aggregation  of Distributed Time-Series

Fourier Perturbation: Real Datasets

GPS Traces (Source: Predestination [Krum et al. 08])
[Figure: Fourier-based vs. Standard [Dwork et al. 06]]

Weight Data (Source: hackers.com)
[Figure: Fourier-based vs. Standard [Dwork et al. 06]]

Page 27: Differentially Private Aggregation  of Distributed Time-Series

Distributed Noise Addition: Performance Overhead

• Computation Overhead

• Space Overhead
– 0.5 Kb for each user
– 0.5 Kb/user for the aggregator

Page 28: Differentially Private Aggregation  of Distributed Time-Series

Conclusion

• Participatory Data Mining applications require
– Accurate answers to a sequence of queries
– Distributed noise-addition

• We saw a solution based on
– Fourier compression & perturbation
– Cryptographic protocols

Page 29: Differentially Private Aggregation  of Distributed Time-Series

Backup slides

Page 30: Differentially Private Aggregation  of Distributed Time-Series

Current State-of-the-art

Formal Privacy achieved for right noise
Noise still small for a single query

At 5PM, were you at 148th Street & 36th Ave?

1. Noise in each answer = O(# of queries)
2. Trusted Server required

At 5:15PM, were you at 148th Street & 36th Ave?

At 7AM, were you at 148th Street & 36th Ave? …

Two Main Challenges

Alice Bob Charlie Delta

Traffic Analyzer

Yes No No Yes

Trusted Server

No No No Yes

Page 31: Differentially Private Aggregation  of Distributed Time-Series

Two main challenges

Challenge #1: Correlations in Time-Series Data

Name Age Location Time

Alice 25 Building 99 5 PM

Alice 25 36th Street 5:02 PM

Alice 25 Building 112 5:03 PM

Bob 32 Building 99 5:35 PM

Lots of tuple correlations!

Building 99

36th Street

Building 112

Current privacy techniques can’t handle tuple correlations!

Page 32: Differentially Private Aggregation  of Distributed Time-Series

Two main challenges

Challenge #2: No trusted server

Alice Bob Charlie Delta

Traffic Analyzer

Were you at 520 bridge at 5 PM?

Yes No No Yes

Trusted Server

Users add noise individually

No No No Yes

Total error grows with # of users!