Narayanan-Shmatikov attack (IEEE S&P 2009)

Structural Data De-anonymization: Quantification, Practice, and Implications

Shouling Ji, Weiqing Li, and Raheem Beyah, Georgia Institute of Technology

Mudhakar Srivatsa, IBM T. J. Watson Research Center

Post on 31-Dec-2015

TRANSCRIPT

Page 1: Narayanan-Shmatikov attack (IEEE S&P 2009)

Structural Data De-anonymization: Quantification, Practice, and Implications

Shouling Ji, Weiqing Li, and Raheem Beyah Georgia Institute of Technology

Mudhakar Srivatsa, IBM T. J. Watson Research Center

Page 2: Narayanan-Shmatikov attack (IEEE S&P 2009)

S. Ji, W. Li, M. Srivatsa and R. Beyah Structural Data De-anonymization

Narayanan-Shmatikov attack (IEEE S&P 2009)

• A. Narayanan and V. Shmatikov, De-anonymizing Social Networks, IEEE S&P 2009.

• Anonymized data: Twitter (crawled in late 2007)

– A microblogging service

– 224K users, 8.5M edges

• Auxiliary data: Flickr (crawled in late 2007/early 2008)

– A photo-sharing service

– 3.3M users, 53M edges

• Result: 30.8% of the users are successfully de-anonymized

[Figure: user mapping between the Twitter and Flickr graphs]

• Heuristics: eccentricity, edge directionality, node degree, revisiting nodes, reverse match
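
The eccentricity heuristic named on this slide can be illustrated with a minimal sketch. It assumes the usual definition from the Narayanan-Shmatikov paper, the gap between the best and second-best candidate scores divided by the standard deviation of all scores. The function names, the score dictionary, and the threshold value of 0.5 are hypothetical and chosen only for illustration.

```python
from statistics import pstdev


def eccentricity(scores):
    """Return (best - second best) / standard deviation of all scores.

    Returns 0.0 when there are fewer than two scores or no spread at all,
    because in that case no candidate stands out.
    """
    if len(scores) < 2:
        return 0.0
    ordered = sorted(scores, reverse=True)
    sigma = pstdev(scores)
    if sigma == 0:
        return 0.0
    return (ordered[0] - ordered[1]) / sigma


def accept_match(candidate_scores, threshold=0.5):
    """Return the best-scoring candidate if it stands out enough, else None.

    candidate_scores maps a candidate auxiliary user to a similarity score.
    The threshold is an illustrative assumption, not a value from the slides.
    """
    if not candidate_scores:
        return None
    if eccentricity(list(candidate_scores.values())) < threshold:
        return None
    return max(candidate_scores, key=candidate_scores.get)
```

A mapping is rejected when two candidates tie for the best score, since the gap (and so the eccentricity) is zero.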

Page 3: Narayanan-Shmatikov attack (IEEE S&P 2009)

Srivatsa-Hicks Attacks (ACM CCS 2012)

• M. Srivatsa and M. Hicks, De-anonymizing Mobility Traces: Using Social Networks as a Side-Channel, ACM CCS 2012.

• Anonymized data

– Mobility traces: St Andrews, Smallblue, and Infocom 2006

• Auxiliary data

– Social networks: Facebook and DBLP

• Over 80% of users can be successfully de-anonymized

Page 4: Narayanan-Shmatikov attack (IEEE S&P 2009)

Motivation

• Question 1: Why can structural data be de-anonymized?

• Question 2: What are the conditions for successful data de-anonymization?

• Question 3: What portion of users can be de-anonymized in a structural dataset?

[1] P. Pedarsani and M. Grossglauser, On the Privacy of Anonymized Networks, KDD 2011.

[2] L. Yartseva and M. Grossglauser, On the Performance of Percolation Graph Matching, COSN 2013.

[3] N. Korula and S. Lattanzi, An Efficient Reconciliation Algorithm for Social Networks, VLDB 2014.

Page 5: Narayanan-Shmatikov attack (IEEE S&P 2009)

Motivation

• Our contribution: address the above three open questions under a practical data model.

Page 6: Narayanan-Shmatikov attack (IEEE S&P 2009)

Outline

• Introduction and Motivation

• System Model

• De-anonymization Quantification

• Evaluation

• Implication 1: Optimization based De-anonymization (ODA) Practice

• Implication 2: Secure Data Publishing

• Conclusion

Page 7: Narayanan-Shmatikov attack (IEEE S&P 2009)

System Model

• Anonymized Data

• Auxiliary Data

• De-anonymization

• Measurement

[Figure: two example graphs, each with nodes labeled 1 to 10]

Page 8: Narayanan-Shmatikov attack (IEEE S&P 2009)

System Model

• Anonymized Data

• Auxiliary Data

• De-anonymization

• Measurement

• Quantification

[Figure: three example graphs, each with nodes labeled 1 to 10, one of them the conceptual underlying graph]

• Configuration Model: G can have an arbitrary degree sequence that follows any distribution
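
To make the configuration model concrete, here is a minimal sketch of the standard stub-matching construction: each node gets as many "stubs" as its target degree, and stubs are paired uniformly at random. The function name and the choice to drop self-loops and repeated edges (so the result is a simple graph) are assumptions made for illustration, not details taken from the slides.

```python
import random


def configuration_model(degrees, seed=None):
    """Build a random simple graph from a degree sequence by stub matching.

    degrees[i] is the target degree of node i. Self-loops are discarded and
    parallel edges are collapsed, so a node's realized degree can be slightly
    lower than its target.
    """
    if sum(degrees) % 2 != 0:
        raise ValueError("the degree sum must be even")
    rng = random.Random(seed)
    # One stub per unit of degree, e.g. degrees [2, 1, 1] -> [0, 0, 1, 2].
    stubs = [node for node, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    edges = set()
    # Pair consecutive stubs after shuffling.
    for u, v in zip(stubs[0::2], stubs[1::2]):
        if u != v:
            edges.add((min(u, v), max(u, v)))
    return edges
```

Because the degree sequence is an input, any distribution (power law, Poisson, empirical) can be plugged in, which is the flexibility the slide refers to.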

Page 10: Narayanan-Shmatikov attack (IEEE S&P 2009)

De-anonymization Quantification

• Perfect De-anonymization Quantification

– Structural Similarity Condition

– Graph/Data Size Condition

Page 11: Narayanan-Shmatikov attack (IEEE S&P 2009)

De-anonymization Quantification

• (1−ε)-Perfect De-anonymization Quantification

– Structural Similarity Condition

– Graph/Data Size Condition

Page 13: Narayanan-Shmatikov attack (IEEE S&P 2009)

Evaluation

• Datasets

Page 14: Narayanan-Shmatikov attack (IEEE S&P 2009)

Evaluation

• Perfect De-anonymization Condition

– Structural Similarity Condition

– Graph/Data Size Condition

Page 15: Narayanan-Shmatikov attack (IEEE S&P 2009)

Evaluation

• (1−ε)-Perfect De-anonymization Condition

– Structural Similarity Condition

– Graph/Data Size Condition

– Projection/Sampling Condition

Page 16: Narayanan-Shmatikov attack (IEEE S&P 2009)

Evaluation

• (1−ε)-Perfect De-anonymizability

– Structural Similarity Condition

Page 17: Narayanan-Shmatikov attack (IEEE S&P 2009)

Evaluation

• (1−ε)-Perfect De-anonymizability

– Structural Similarity Condition

– How many users can be successfully de-anonymized?

Page 19: Narayanan-Shmatikov attack (IEEE S&P 2009)

Optimization based De-anonymization (ODA)

• Our quantification implies

– An optimum de-anonymization solution exists

– However, it is difficult to find it.

• Select candidate users from unmapped users with top degrees

• Map candidate users by minimizing the Edge Error function
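
The two steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: the slides do not define the Edge Error function, so `edge_error` below uses a hypothetical definition (the number of edges among mapped users that exist in only one of the two graphs), and the exhaustive search over permutations of a small candidate set of size `k` stands in for whatever optimization the authors use. All function names are invented for this example.

```python
from itertools import permutations


def edge_error(mapping, anon_edges, aux_edges):
    """Hypothetical edge error: edges among mapped users present in one graph only."""
    images = set(mapping.values())
    projected = {frozenset((mapping[u], mapping[v]))
                 for u, v in anon_edges if u in mapping and v in mapping}
    aux = {frozenset((u, v)) for u, v in aux_edges if u in images and v in images}
    return len(projected ^ aux)


def top_unmapped(edges, already_mapped, k):
    """Return up to k unmapped nodes with the highest degree."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    pool = [n for n in degree if n not in already_mapped]
    return sorted(pool, key=lambda n: (-degree[n], str(n)))[:k]


def oda_sketch(anon_edges, aux_edges, k=3):
    """Seed-free greedy mapping: repeatedly map the top-degree candidates."""
    mapping = {}
    while True:
        cand_anon = top_unmapped(anon_edges, set(mapping), k)
        cand_aux = top_unmapped(aux_edges, set(mapping.values()), k)
        size = min(len(cand_anon), len(cand_aux))
        if size == 0:
            return mapping
        cand_anon = cand_anon[:size]
        best_error, best_mapping = None, None
        # Try every assignment of the candidates and keep the lowest edge error.
        for perm in permutations(cand_aux, size):
            trial = dict(mapping)
            trial.update(zip(cand_anon, perm))
            error = edge_error(trial, anon_edges, aux_edges)
            if best_error is None or error < best_error:
                best_error, best_mapping = error, trial
        mapping = best_mapping
```

Note that no seed mappings are passed in: the first round starts from an empty mapping and relies only on degree and edge structure. The permutation search grows factorially in `k`, so this sketch is only usable for very small candidate sets.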

Page 20: Narayanan-Shmatikov attack (IEEE S&P 2009)

Optimization based De-anonymization (ODA)

• Space complexity

• Time complexity

ODA Features

1. Cold start (seed-free)

2. Can be used by other attacks for landmark (seed) identification

3. Optimization based

Page 21: Narayanan-Shmatikov attack (IEEE S&P 2009)

ODA Evaluation

• Dataset

– Google+ (4.7M users, 90.8M edges): using random sampling to get anonymized graphs and auxiliary graphs

– Gowalla:

• Anonymized graphs: constructed based on 6.4M check-ins <UserID, latitude, longitude, timestamp, location ID> generated by 0.2M users

• Auxiliary graph: the Gowalla social graph of the 0.2M users (1M edges)

– Results: landmark identification
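
The random-sampling setup described for Google+ can be sketched as below. The slides do not say how the sampling is done, so this assumes independent edge sampling: each edge of the original graph is kept with probability `s_anon` for the anonymized graph and `s_aux` for the auxiliary graph, and the anonymized copy then has its user IDs replaced by pseudonyms. The function names, parameters, and pseudonym format are hypothetical.

```python
import random


def sample_edges(edges, keep_prob, rng):
    """Keep each edge independently with probability keep_prob."""
    return {e for e in sorted(edges) if rng.random() < keep_prob}


def make_graph_pair(underlying_edges, s_anon, s_aux, seed=None):
    """Derive an anonymized graph and an auxiliary graph from one underlying graph.

    Returns (anon_edges, aux_edges, truth), where truth maps each pseudonym
    back to the real user ID and serves as ground truth for scoring an attack.
    """
    rng = random.Random(seed)
    nodes = sorted({n for edge in underlying_edges for n in edge})
    anon_sample = sample_edges(underlying_edges, s_anon, rng)
    aux_edges = sample_edges(underlying_edges, s_aux, rng)
    # Assign pseudonyms in a random order so IDs leak nothing about identity.
    shuffled = nodes[:]
    rng.shuffle(shuffled)
    pseudonym = {node: f"u{i}" for i, node in enumerate(shuffled)}
    anon_edges = {(pseudonym[u], pseudonym[v]) for u, v in anon_sample}
    truth = {p: node for node, p in pseudonym.items()}
    return anon_edges, aux_edges, truth
```

With the `truth` mapping in hand, the accuracy of a de-anonymization attack is simply the fraction of pseudonyms it maps to the correct real ID.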

Page 22: Narayanan-Shmatikov attack (IEEE S&P 2009)

ODA Evaluation

– Results: de-anonymization

Page 24: Narayanan-Shmatikov attack (IEEE S&P 2009)

Secure Structural Data Publishing

• Structural information is important

• Based on our quantification

– Secure structural data publishing is difficult, at least theoretically

• Open problem …

Page 25: Narayanan-Shmatikov attack (IEEE S&P 2009)

Conclusion

– We proposed the first quantification framework for structural data de-anonymization under a practical data model

– We conducted a large-scale de-anonymizability evaluation of 26 real-world structural datasets

– We designed a cold-start optimization-based de-anonymization algorithm

Acknowledgement

We thank the anonymous reviewers very much for their valuable comments!

Page 26: Narayanan-Shmatikov attack (IEEE S&P 2009)

Thank you!

Shouling Ji

http://users.ece.gatech.edu/sji/