Data Privacy and Anonymization
DESCRIPTION
In the world of Big Data, there has been a lot of research into creating efficient algorithms that can help us gain statistical insight from the large databases that record much of our lives. However, as our digital footprints grow, many databases that were originally considered anonymous can now be re-identified. How do we make sure that doesn't happen?
TRANSCRIPT
Big Data and Attacks on Privacy: How to Properly Anonymize Social Networks and Databases (and Keep Them That Way)
AC 298r Final Presentation
Ryan Lee and Jeffrey Wang
Obligatory Social Network Stats
http://www.mediabistro.com/alltwitter/files/2013/11/growth-of-social-media-2013.jpg
Uses of Social Data: Research
Bollen et al. (2011). CS109 Harvard Univ. Fall 2013
Christakis & Fowler (2010). Christakis & Fowler (2007).
Uses of Social Data: Marketing
Facebook.com
Bio-Rad
Chang, R., Lee, A., Ghoniem, M., Kosara, R., Ribarsky, W., Yang, J., ... & Sudjianto, A. (2008). Scalable and interactive visual analysis of financial wire transactions for fraud detection. Information Visualization, 7(1), 63-76.
Uses of Social Data: Government
Challenge: Privacy
Naive Approach: Anonymization
Name              Favorite Pizza   Favorite Course
Ryan Lee          Supreme          AC298r
Jeffrey Wang      Pepperoni        AC298r
Daniel Weinstock  Anchovies        AC298r
Priority: Security
Concern: Digital Footprint
NSA Data Warehouse
Deanonymization is Possible
Sweeney, Fuzziness and Knowledge-based Systems, 2002
Netflix Prize 2
Netflix De-anon: How they did it
● 500,000-record dataset was super-sparse
Netflix “Anonymized” Data
Public Data (IMDb, Twitter, blogs, etc.)
Match if:
difference in rating time < threshold
difference in movie rating < threshold
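The matching rule above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the actual Netflix study: the function name link_records, the thresholds, the minimum number of matches, and all of the sample data are assumptions made up for this example.

```python
from datetime import date

def link_records(anon_ratings, public_profiles, max_days=14, max_stars=2, min_matches=2):
    """Return names of public profiles consistent with one anonymized record.

    Ratings are dicts of movie -> (stars, date). All thresholds are
    illustrative assumptions, not values from the original attack.
    """
    candidates = []
    for name, ratings in public_profiles.items():
        matches = 0
        for movie, (stars, day) in ratings.items():
            if movie not in anon_ratings:
                continue
            anon_stars, anon_day = anon_ratings[movie]
            close_in_time = abs((day - anon_day).days) < max_days
            close_in_rating = abs(stars - anon_stars) < max_stars
            if close_in_time and close_in_rating:
                matches += 1
        if matches >= min_matches:
            candidates.append(name)
    return candidates

# Hypothetical "anonymized" record: no name, only ratings and dates.
anon_record = {
    "Movie A": (5, date(2005, 3, 1)),
    "Movie B": (2, date(2005, 3, 9)),
    "Movie C": (4, date(2005, 4, 2)),
}

# Hypothetical public reviews (for example, scraped from a review site).
public_profiles = {
    "alice": {"Movie A": (5, date(2005, 3, 3)), "Movie B": (1, date(2005, 3, 10))},
    "bob": {"Movie A": (2, date(2005, 3, 2)), "Movie C": (4, date(2005, 9, 1))},
}

print(link_records(anon_record, public_profiles))  # ['alice']
```

Because the dataset is so sparse, even a couple of approximate matches can be enough to single out one profile, which is why the thresholds can afford to be loose.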
Names
Surnames in Genomic Sequences
TACATA is a real last name...
“Anonymized” Cell Phone Data
de Montjoye, Y. A., Hidalgo, C. A., Verleysen, M., & Blondel, V. D. (2013). Unique in the Crowd: The privacy bounds of human mobility. Scientific reports, 3.
Defenses (lol JK)
K-Anonymity
Sweeney, Fuzziness and Knowledge-based Systems, 2002
A Tough Problem
Date of birth, gender, and ZIP code are enough to uniquely identify 87% of the US population
Sweeney, Fuzziness and Knowledge-based Systems, 2002
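A claim like the one above can be checked on any table by counting how many records have a combination of quasi-identifiers that nobody else shares. The sketch below assumes a list of dictionaries with hypothetical field names (dob, gender, zip) and made-up records; it does not reproduce the census analysis behind the 87% figure.

```python
from collections import Counter

def fraction_unique(records, quasi_identifiers):
    """Fraction of records whose quasi-identifier combination is unique."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)

# Made-up records; the field names are assumptions for this example.
people = [
    {"dob": "1980-01-02", "gender": "F", "zip": "02138"},
    {"dob": "1980-01-02", "gender": "F", "zip": "02139"},
    {"dob": "1975-06-30", "gender": "M", "zip": "02138"},
    {"dob": "1975-06-30", "gender": "M", "zip": "02138"},
]

print(fraction_unique(people, ["dob", "gender", "zip"]))  # 0.5
print(fraction_unique(people, ["gender"]))                # 0.0
```

Each attribute on its own identifies nobody here, but the combination already pins down half of the table.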
Solution?
First     Last     Age  Race
Harry     Stone    34   African American
John      Reyser   36   Caucasian
Beatrice  Stone    34   African American
John      Delgado  22   Hispanic
Sweeney, Fuzziness and Knowledge-based Systems, 2002
Solution: Suppression and Generalization
First     Last     Age  Race
Harry     Stone    34   African American
John      Reyser   36   Caucasian
Beatrice  Stone    34   African American
John      Delgado  22   Hispanic
k=2: Polynomial Solution! (Simplex Matching)
k>=3: NP-Hard (Graph Decomposition)
Sweeney, Fuzziness and Knowledge-based Systems, 2002
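Suppression and generalization can be tried out on the table above with a short Python sketch. It is a simplification under stated assumptions: suppression here blanks out a whole column (the optimization results quoted above concern finer-grained choices), the helper names suppress, generalize_age, and k_anonymity are made up for this example, and no attempt is made to find an optimal anonymization.

```python
from collections import Counter

ROWS = [
    {"first": "Harry", "last": "Stone", "age": 34, "race": "African American"},
    {"first": "John", "last": "Reyser", "age": 36, "race": "Caucasian"},
    {"first": "Beatrice", "last": "Stone", "age": 34, "race": "African American"},
    {"first": "John", "last": "Delgado", "age": 22, "race": "Hispanic"},
]

def suppress(rows, column):
    """Suppression: replace every value in a column with '*'."""
    return [{**r, column: "*"} for r in rows]

def generalize_age(rows, width):
    """Generalization: replace an exact age with a range, e.g. 34 -> '30-39'."""
    out = []
    for r in rows:
        low = (r["age"] // width) * width
        out.append({**r, "age": f"{low}-{low + width - 1}"})
    return out

def k_anonymity(rows):
    """Size of the smallest group of rows that agree on every column."""
    groups = Counter(tuple(sorted(r.items())) for r in rows)
    return min(groups.values())

print(k_anonymity(ROWS))  # 1: every row is unique

# Hiding first names and bucketing ages by decade is not enough:
# Reyser and Delgado are still alone in their groups.
mild = generalize_age(suppress(ROWS, "first"), 10)
print(k_anonymity(mild))  # 1

# Column-level suppression only reaches k >= 2 here after discarding
# almost everything, which overshoots to k = 4.
harsh = ROWS
for column in ("first", "last", "race"):
    harsh = suppress(harsh, column)
harsh = generalize_age(harsh, 20)
print(k_anonymity(harsh))  # 4
```

The gap between the mild and harsh versions is the trade-off behind the hardness results: the goal is to reach k while destroying as little information as possible.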
Differential Privacy
● The probability of any released output changes by at most a factor of e^ε whether or not a user chooses to participate in the database
Dwork, ICALP, 2006
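One standard way to obtain a differential privacy guarantee for counting queries is the Laplace mechanism: add Laplace noise with scale 1/ε to the true count, since adding or removing one person changes a count by at most 1. The slides do not say which mechanism was presented, so the sketch below is an assumption for illustration, and the names laplace_noise and private_count and the sample records are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(records, predicate, epsilon, rng=random):
    """Noisy count of records matching predicate.

    A counting query has sensitivity 1 (one person changes the count
    by at most 1), so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1 / epsilon, rng)

# Hypothetical survey data.
favorites = ["Supreme", "Pepperoni", "Anchovies", "Pepperoni", "Supreme", "Pepperoni"]

rng = random.Random(42)  # seeded only to make the demo repeatable
noisy = private_count(favorites, lambda pizza: pizza == "Pepperoni", epsilon=0.5, rng=rng)
print(f"noisy count of Pepperoni fans: {noisy:.2f}")
```

Smaller values of ε mean more noise and stronger privacy; larger values give more accurate answers and weaker privacy.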
Anonymity in Social Networks
Peter S. Bearman, James Moody, and Katherine Stovel, Chains of affection: The structure of adolescent romantic and sexual networks, American Journal of Sociology 110, 44-91 (2004).
http://www-personal.umich.edu/~mejn/networks/addhealth.gif
High School Dating Network
Information-rich Network Structure
Backstrom, L., & Kleinberg, J. (2013). Romantic Partnerships and the Dispersion of Social Ties: A Network Analysis of Relationship Status on Facebook. arXiv preprint arXiv:1310.6753.
Attacks on Social Networks
● Passive: Find yourselves
● Active: structural steganography
http://www.cse.psu.edu/~asmith/courses/privacy598d/www/lec-notes/Attacking%20Social%20Network%20FINAL.pdf
No other isomorphic copy in the network
No non-trivial automorphism
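The automorphism condition above can be checked by brute force on a small planted subgraph: try every relabeling of the nodes and count how many leave the edge set unchanged. This is a sketch for tiny graphs only (it tries n! permutations); the function name automorphisms and both example graphs are assumptions for illustration, not taken from the cited lecture notes.

```python
from itertools import permutations

def automorphisms(nodes, edges):
    """Return every node relabeling that maps the edge set onto itself."""
    edge_set = {frozenset(e) for e in edges}
    found = []
    for perm in permutations(nodes):
        mapping = dict(zip(nodes, perm))
        mapped = {frozenset((mapping[u], mapping[v])) for u, v in edges}
        if mapped == edge_set:
            found.append(mapping)
    return found

# A 3-node path can be flipped end to end, so an attacker could not
# tell its two endpoints apart after the network is released.
path_nodes = ["a", "b", "c"]
path_edges = [("a", "b"), ("b", "c")]
print(len(automorphisms(path_nodes, path_edges)))  # 2

# A tree with legs of length 1, 2 and 3 around a center has only the
# identity automorphism, so every node in it is distinguishable.
spider_nodes = ["c", "a1", "b1", "b2", "d1", "d2", "d3"]
spider_edges = [("c", "a1"), ("c", "b1"), ("b1", "b2"),
                ("c", "d1"), ("d1", "d2"), ("d2", "d3")]
print(len(automorphisms(spider_nodes, spider_edges)))  # 1
```

A count of 1 means only the identity mapping survives, which is what lets the attacker label the planted nodes unambiguously once the subgraph is found.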
Obfuscating Social Networks
Zhou and Pei, KAIS, 2011
Step 1: Construct Min-DFS Tree for Neighborhood
Zhou and Pei, KAIS, 2011
2 Useful Properties
1. Vertex degrees in social networks follow a Power-Law Distribution
2. Social Networks typically have a small diameter (6 degrees of separation)
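Both properties can be measured directly on a graph. The sketch below computes a degree histogram and the diameter (by breadth-first search from every node) for a toy edge list; the helper names and the example graph are made up for illustration, and a graph this small cannot show a real power law.

```python
from collections import Counter, deque

def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree_histogram(adj):
    """Map each degree to the number of vertices that have it."""
    return Counter(len(neighbors) for neighbors in adj.values())

def diameter(adj):
    """Longest shortest path, assuming the graph is connected."""
    longest = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        longest = max(longest, max(dist.values()))
    return longest

# Toy network: one hub with four friends, one of whom has another friend.
toy_edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"), ("d", "e")]
toy = build_adjacency(toy_edges)

print(sorted(degree_histogram(toy).items()))  # [(1, 4), (2, 1), (4, 1)]
print(diameter(toy))                          # 3
```

Many low-degree vertices and a few hubs mean most neighborhoods are small, and a small diameter keeps distances between vertices short.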
Step 2: Anonymize Similar Vertices
Zhou and Pei, KAIS, 2011
Step 3: ??? => Step 4: Profit!
Zhou and Pei, KAIS, 2011
thanks
bye