SecGraph: A Uniform and Open-source Evaluation System for Graph Data Anonymization and De-anonymization

Shouling Ji, Weiqing Li, and Raheem Beyah
Georgia Institute of Technology

Prateek Mittal
Princeton University

Xin Hu
IBM Research

S. Ji, W. Li, P. Mittal, X. Hu, and R. Beyah, SecGraph: A Uniform and Open-Source Evaluation System

Introduction

• Graph Data
– Social network data
– Mobility traces
– Medical data
– ……
• Why data sharing/publishing
– Academic research
– Business applications
– Government applications
– Healthcare applications
– ……


Anonymization and De-anonymization

[Figure: overview diagram with the labels Anonymization, De-anonymization, Utility, and Anonymization vs DA]


Contribution

• We studied and analyzed
– 11 state-of-the-art graph anonymization techniques
– 15 modern Structure-based De-Anonymization (SDA) attacks
• We developed and evaluated SecGraph
– 11 anonymization techniques
– 12 graph utility metrics
– 7 application utility metrics
– 15 de-anonymization attacks
• We released SecGraph
– Publicly available at http://www.ece.gatech.edu/cap/secgraph/


Outline

• Introduction
• Graph Anonymization
• Graph De-anonymization
• Anonymization vs. De-anonymization Analysis
• SecGraph
• Conclusion


Graph Anonymization

• Naïve ID Removal
• Edge Editing (EE) based anonymization
– Add/Del
– Switch
• K-anonymity
– K-Neighborhood Anonymization (k-NA)
– K-Degree Anonymization (k-DA)
– K-automorphism (k-auto)
– K-isomorphism (k-iso)
• Aggregation/class/cluster based anonymization
• Differential Privacy (DP) based anonymization
• Random Walk (RW) based anonymization
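To make two of the families above concrete, here is a minimal Python sketch of edge editing and of a k-degree anonymity check. It is an illustration only, not SecGraph's code. It assumes the common reading of the names: Add/Del deletes k random edges and inserts k random new ones, Switch rewires two edges so that every node keeps its degree, and a graph is k-degree anonymous when every degree value that occurs is shared by at least k nodes. All function names (`norm`, `add_del`, `switch`, `degree_counts`, `is_k_degree_anonymous`) are hypothetical.

```python
import random
from collections import Counter


def norm(u, v):
    """Store an undirected edge as an ordered tuple."""
    return (u, v) if u <= v else (v, u)


def add_del(edges, nodes, k, rng):
    """Delete k random edges, then add k random edges absent from the input."""
    original = {norm(u, v) for u, v in edges}
    removed = set(rng.sample(sorted(original), k))
    # Enumerating all non-edges is O(n^2): fine for a toy graph only.
    non_edges = [norm(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]
                 if norm(u, v) not in original]
    added = set(rng.sample(non_edges, k))
    return (original - removed) | added


def switch(edges, k, rng, max_tries=10_000):
    """Try to perform k switches (a,b),(c,d) -> (a,d),(c,b); degrees are preserved."""
    current = {norm(u, v) for u, v in edges}
    done = tries = 0
    while done < k and tries < max_tries:
        tries += 1
        (a, b), (c, d) = rng.sample(sorted(current), 2)
        if len({a, b, c, d}) < 4:
            continue  # the two edges share an endpoint
        e1, e2 = norm(a, d), norm(c, b)
        if e1 in current or e2 in current:
            continue  # the switch would duplicate an existing edge
        current -= {(a, b), (c, d)}
        current |= {e1, e2}
        done += 1
    return current


def degree_counts(edges, nodes):
    """Degree of every node, including isolated ones."""
    deg = {n: 0 for n in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def is_k_degree_anonymous(edges, nodes, k):
    """True if every degree value that occurs is shared by at least k nodes."""
    sizes = Counter(degree_counts(edges, nodes).values())
    return all(c >= k for c in sizes.values())


# Toy input (an assumption for the demo): a ring of six nodes.
nodes = list(range(6))
ring = {norm(i, (i + 1) % 6) for i in range(6)}
rng = random.Random(0)
perturbed = add_del(ring, nodes, 2, rng)
switched = switch(ring, 2, rng)
print(len(perturbed & ring), is_k_degree_anonymous(switched, nodes, 6))
```

Switch keeps the degree sequence intact while Add/Del generally does not, which is one reason the two variants can affect degree-based utility metrics differently.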


Utility Metrics

• Graph utility (12)
– Degree (Deg.)
– Joint Degree (JD)
– Effective Diameter (ED)
– Path Length (PL)
– Local Clustering Coefficient (LCC)
– Global Clustering Coefficient (GCC)
– Closeness Centrality (CC)
– Betweenness Centrality (BC)
– Eigenvector (EV)
– Network Constraint (NC)
– Network Resilience
– Infectiousness (Infe.)
• Application utility (7)
– Role eXtraction (RX)
– Influence Maximization (IM)
– Minimum-sized Influential Node Set (MINS)
– Community Detection (CD)
– Secure Routing (SR)
– Sybil Detection (SD)

Graph utility captures how well the anonymized data preserve fundamental structural properties of the original graph.

Application utility captures how useful the anonymized data are for practical graph applications and mining tasks.
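As an illustration of what a graph utility metric measures, the sketch below computes three of the listed quantities (degree distribution, LCC, GCC) in plain Python and compares an original toy graph with an edited copy. The function names and the toy graphs are assumptions made for this example; how SecGraph itself scores the difference between the two graphs is not stated on the slide.

```python
from collections import Counter
from itertools import combinations


def adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_distribution(adj):
    """Fraction of nodes that have each degree."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {d: c / len(adj) for d, c in counts.items()}


def local_clustering(adj, node):
    """Fraction of neighbour pairs of `node` that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def global_clustering(adj):
    """Closed triplets divided by all connected triplets (transitivity)."""
    closed = triplets = 0
    for nbrs in adj.values():
        k = len(nbrs)
        triplets += k * (k - 1) // 2
        closed += sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return closed / triplets if triplets else 0.0


# Toy graphs (assumptions): a triangle with one pendant node, and the
# same graph after one triangle edge has been deleted.
original = adjacency([(0, 1), (1, 2), (0, 2), (2, 3)])
edited = adjacency([(1, 2), (0, 2), (2, 3)])

print(degree_distribution(original), degree_distribution(edited))
print(local_clustering(original, 2), local_clustering(edited, 2))
print(global_clustering(original), global_clustering(edited))
```

Deleting a single edge here removes the only triangle, so both clustering coefficients drop to zero even though the degree distribution changes only slightly. Different metrics can react very differently to the same edit.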


Anonymization vs Utility

[Table not recovered. Surviving labels: Anonymization Techniques; Resilient to DA attacks. Legend: preserve the utility; conditionally preserve the utility; partially preserve the utility; not preserve the utility.]



Graph De-Anonymization (DA)

• Seed-based DA
– Backstrom et al.’s attack (BDK)
– Narayanan-Shmatikov’s attack (NS)
– Narayanan et al.’s attack (NSR)
– Nilizadeh et al.’s attack (NKA)
– Distance Vector (Srivatsa-Hicks attack) (DV)
– Randomized Spanning Trees (Srivatsa-Hicks attack) (RST)
– Recursive Subgraph Matching (Srivatsa-Hicks attack) (RSM)
– Yartseva-Grossglauser’s attack (YG)
– De-Anonymization attack (Ji et al.’s attack) (DeA)
– Adaptive De-Anonymization attack (Ji et al.’s attack) (ADA)
– Korula-Lattanzi’s attack (KL)
• Seed-free DA
– Pedarsani et al.’s attack (PFG)
– Ji et al.’s attack (JLSB)

[Figure: an anonymized graph and an auxiliary graph linked by seed mappings. Figure source: E. Kazemi et al., Growing a Graph Matching from a Handful of Seeds, VLDB 2015]
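The general idea behind seed-based DA, growing a node mapping outward from a few known pairs, can be sketched as follows. This is a simplified, greedy variant written in the spirit of percolation-style matching; it is not the implementation of any attack listed above, and the threshold `r`, the function names, and the toy graphs are assumptions for illustration.

```python
from collections import defaultdict


def adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def percolation_match(adj_anon, adj_aux, seeds, r=2):
    """Grow a mapping (anonymized node -> auxiliary node) from seed pairs.

    Every matched pair gives one mark to each pair formed by an unmatched
    neighbour on either side; a candidate pair is matched greedily as soon
    as it has collected r marks.
    """
    mapping = dict(seeds)
    used_aux = set(mapping.values())
    marks = defaultdict(int)
    frontier = list(mapping.items())
    while frontier:
        u, v = frontier.pop()
        for nu in adj_anon[u]:
            if nu in mapping:
                continue
            for nv in adj_aux[v]:
                if nv in used_aux:
                    continue
                marks[(nu, nv)] += 1
                if marks[(nu, nv)] >= r:
                    mapping[nu] = nv
                    used_aux.add(nv)
                    frontier.append((nu, nv))
                    break  # nu is matched; move on to the next neighbour
    return mapping


def accuracy(mapping, truth):
    """Fraction of anonymized nodes mapped to their true identity."""
    return sum(mapping.get(u) == v for u, v in truth.items()) / len(truth)


# Toy data (assumption): the anonymized graph is the auxiliary graph with
# every node id shifted by 10, and two seed pairs are known.
aux_edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
anon_edges = [(u + 10, v + 10) for u, v in aux_edges]
truth = {i + 10: i for i in range(5)}
seeds = {10: 0, 11: 1}

result = percolation_match(adjacency(anon_edges), adjacency(aux_edges), seeds)
print(result, accuracy(result, truth))
```

In this noise-free toy case two seeds are enough to recover all five nodes. On real data the two graphs differ, and the choice of threshold trades coverage against wrong matches.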


DA Attacks

[Table not recovered. Column abbreviations:]
SF: seed-free; AGF: auxiliary graph-free; SemF: semantics-free; A/P: active/passive attack; Scal.: scalable; Prac.: practical; Rob.: robust to noise


Anonymization vs. DA

[Table not recovered: state-of-the-art anonymization techniques versus modern DA attacks. Legend (symbols lost in extraction): vulnerable; conditionally vulnerable; invulnerable.]


SecGraph: Overview & Implementation

[Figure: SecGraph architecture, taking Raw Data as input and producing Anonymized Data and Publishing Data]

• Anonymization Module (AM): 11 Anonymization Techniques
• Utility Module (UM): 19 Utility Metrics
• DA Module (DM): 15 DA Attacks

Data flows shown in the figure:
– Raw Data -> AM -> Anonymized Data -> UM/DM -> Publishing Data
– Raw Data -> Publishing Data
– Raw Data -> UM/DM -> AM -> Anonymized Data -> Publishing Data
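The first data flow (Raw Data -> AM -> Anonymized Data -> UM/DM) can be pictured as a small function pipeline. The sketch below is purely illustrative: `evaluate_before_publishing` and the three toy stand-ins are hypothetical names invented here, not SecGraph's API, and the "attack" is only a crude proxy rather than one of the 15 implemented attacks.

```python
from typing import Callable, Dict, Set, Tuple

Graph = Set[Tuple[int, int]]
Scorer = Callable[[Graph, Graph], float]


def evaluate_before_publishing(
    raw: Graph,
    anonymize: Callable[[Graph], Graph],
    utility_metrics: Dict[str, Scorer],
    da_attacks: Dict[str, Scorer],
) -> Dict[str, Dict[str, float]]:
    """Run AM, then score the anonymized graph with every UM and DM entry."""
    anonymized = anonymize(raw)
    return {
        "utility": {n: m(raw, anonymized) for n, m in utility_metrics.items()},
        "da_success": {n: a(raw, anonymized) for n, a in da_attacks.items()},
    }


# --- Toy stand-ins (assumptions, for demonstration only) ---

def drop_smallest_edge(g: Graph) -> Graph:
    """Toy anonymizer: delete one edge."""
    return g - {min(g)}


def edge_overlap(raw: Graph, anon: Graph) -> float:
    """Toy utility metric: Jaccard similarity of the two edge sets."""
    return len(raw & anon) / len(raw | anon)


def unchanged_degree(raw: Graph, anon: Graph) -> float:
    """Toy attack proxy: share of nodes whose degree survived anonymization."""
    def deg(g: Graph, n: int) -> int:
        return sum(n in e for e in g)
    nodes = {n for e in raw for n in e}
    return sum(deg(raw, n) == deg(anon, n) for n in nodes) / len(nodes)


raw_graph: Graph = {(0, 1), (1, 2), (2, 3), (0, 3)}
report = evaluate_before_publishing(
    raw_graph,
    drop_smallest_edge,
    {"edge_overlap": edge_overlap},
    {"unchanged_degree": unchanged_degree},
)
print(report)
```

A data owner would inspect such a report and publish only if utility is high enough and attack success is low enough; where those thresholds lie is a policy decision outside this sketch.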


SecGraph: Evaluation

• Datasets
– Enron (36.7K users, 0.2M edges): an email network dataset
– Facebook – New Orleans (63.7K users, 0.82M edges): a Facebook friendship dataset in the New Orleans area
• Evaluation settings
– Follow the same/similar settings as in the original papers
– See details in the paper and http://www.ece.gatech.edu/cap/secgraph/
• Evaluation scenarios
– Anonymization vs Utility
– DA performance
– Robustness of DA attacks
– Anonymization vs DA


SecGraph: Anonymity vs Utility

C1: existing anonymization techniques are better at preserving graph utilities than application utilities

C2: all the anonymization techniques lose one or more application utilities


SecGraph: DA Performance

[Figure: two plots, "De-anonymize Enron" and "De-anonymize Facebook", each comparing NS, DV, and JLSB. x-axis values run from 0.6 to 0.95 (axis label not recovered); y-axis runs from 0% to 60% for Enron and from 0% to 100% for Facebook. Annotation: phase transitional phenomena.]

C1: the phase transitional (percolation) phenomenon is attack-dependent


SecGraph: Robustness of DA Attacks

[Figure: two plots, "De-anonymize Enron" and "De-anonymize Facebook", each comparing NS, DV, and ADA. x-axis: % of seed error, from 4% to 48% in steps of 4%; y-axis runs from 0% to 40% for Enron and from 0% to 100% for Facebook.]

C1: global structure-based DA attacks are more robust to seed errors
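A robustness experiment of this kind needs a way to inject a controlled percentage of wrong seeds. The sketch below shows one possible way to do that; it is an assumption about the setup, not the procedure used in the paper, and `corrupt_seeds` is a hypothetical helper.

```python
import random


def corrupt_seeds(seeds, error_rate, aux_nodes, rng):
    """Return a copy of `seeds` in which round(error_rate * len(seeds))
    pairs point to a wrong auxiliary node.

    Wrong targets are drawn from auxiliary nodes not yet used by any seed,
    so the noisy mapping stays one-to-one. `rng.choice` raises IndexError
    if no unused auxiliary node is left.
    """
    noisy = dict(seeds)
    used = set(noisy.values())
    n_bad = round(error_rate * len(seeds))
    for u in rng.sample(sorted(seeds), n_bad):
        candidates = [v for v in aux_nodes if v not in used]
        noisy[u] = rng.choice(candidates)
        used.add(noisy[u])
    return noisy


# Toy setup (assumption): 50 correct seeds, 100 auxiliary nodes.
true_seeds = {i: i for i in range(50)}
aux_nodes = list(range(100))

# Sweep the same error levels as the x-axis of the plots: 4%, 8%, ..., 48%.
for pct in range(4, 52, 4):
    noisy = corrupt_seeds(true_seeds, pct / 100, aux_nodes, random.Random(pct))
    wrong = sum(noisy[u] != true_seeds[u] for u in true_seeds)
    print(f"{pct}% seed error -> {wrong} of {len(true_seeds)} seeds corrupted")
```

Feeding each noisy seed set to an attack and recording its accuracy would give one curve per attack; the attacks themselves are not shown here.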


SecGraph: Anonymization vs DA

C1: state-of-the-art anonymization techniques are vulnerable to one or several modern DA attacks

C2: no DA attack is generally optimal. The DA performance depends on several factors, e.g., similarity between anonymized and auxiliary graphs, graph density, and DA heuristics


SecGraph: Release & Support

• Website
– http://www.ece.gatech.edu/cap/secgraph/
– Software
– Datasets
– Documents
– Demo
– Q&A
• Modes
– GUI
– Command line
• Supporting
– Windows
– Linux
– Mac


Conclusion

– We analyzed existing graph anonymization techniques and de-anonymization attacks
– We implemented and evaluated SecGraph: a uniform and open-source evaluation system for graph data anonymization and de-anonymization

Acknowledgement

We thank the anonymous reviewers very much for their valuable comments!

We are grateful to the following researchers for their help in developing SecGraph: Ada Fu, Michael Hay, Davide Proserpio, Qian Xiao, Shirin Nilizadeh, Jing S. He, Wei Chen, and the Stanford SNAP developers


Thank you!

Shouling Ji
[email protected]
http://www.ece.gatech.edu/cap/secgraph/
http://users.ece.gatech.edu/sji/