de-anonymizing social networks
Post on 25-Feb-2016
De-anonymizing Social Networks
Presenter: Lijie Zhang
Advisor: Weining Zhang
Outline
  Motivation
  Attack Model
  De-anonymization Algorithm
  Experiments
  Conclusions
Motivation
Social network (SN) owners publish graph data for sharing:
  Academic and government data mining: phone call networks
  Advertising
  Third-party applications: 550,000 Facebook applications
Private information on SNs:
  Node attributes: node degree in a sexual network
  Edge presence: a single call, a romantic relationship
Motivation
SN owner publishes an anonymized graph:
  Nodes have no identifying attributes
Propose a model to identify nodes from the anonymized graph:
  Re-identification: learn the entity to which a node belongs
  Entity: an account, a real person, a group, an organization
Model – Social Network
Social network S:
  A directed graph G = (V, E)
  A set of node attributes X: name, telephone number
  A set of edge attributes Y: type of relationship
  Attribute values are treated as coming from a discrete domain
Model – Data Release
A sanitized subset of the nodes and edges in S
Computation:
  Vsan: subset of V
  Xsan: subset of X, including sensitive attributes
  Ysan: subset of Y, including sensitive attributes
  Published attributes by themselves are insufficient for re-identification
  Compute the induced subgraph on Vsan
  Remove some edges and add fake edges
Released network:
$$S_{san} = \left(V_{san},\ E_{san},\ \{X(v) \mid v \in V_{san},\ X \in X_{san}\},\ \{Y(e) \mid e \in E_{san},\ Y \in Y_{san}\}\right)$$
Model – Attacker
Purpose: extract sensitive information about specific individuals from anonymized SN graphs
Attacker’s knowledge:
  Aggregate auxiliary information
  Individual auxiliary information
Aggregate auxiliary information
Large-scale information from other data sources and social networks whose membership overlaps with the target network Ssan
  Gaux = (Vaux, Eaux)
  AuxX and AuxY: probability distributions of each node attribute in Vaux and each edge attribute in Eaux, respectively (prior knowledge)
Individual auxiliary information
Identifiable details about a small number of individuals from the target network Ssan and possibly relationships between them
Model – Breaching Privacy
Extract sensitive information about specific individuals from Ssan
Re-identify nodes from the target SN Ssan
Re-identification: find a mapping μ between a node in Vaux and a node in Vsan
  $\mu_G$: ground truth mapping
  Succeeds on a node v if
$$\mu(v) = \mu_G(v)$$
Model – Breaching Privacy
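Re-identification algorithm:
  Input: Ssan and Saux
  Output: a probabilistic mapping $\tilde{\mu}$, where $\tilde{\mu}(v_{aux}, v_{san})$ is the probability that vaux maps to vsan
$$\tilde{\mu} : V_{san} \times (V_{aux} \cup \{\perp\}) \to [0, 1]$$
Mapping adversary:
$$Adv[X, v_{aux}, x] = \frac{\sum_{v \in V_{san},\ X[v] = x} \tilde{\mu}(v_{aux}, v)}{\sum_{v \in V_{san},\ X[v] \neq \perp} \tilde{\mu}(v_{aux}, v)}$$
$$Adv[Y, u_{aux}, v_{aux}, y] = \frac{\sum_{u, v \in V_{san},\ Y[u, v] = y} \tilde{\mu}(u_{aux}, u)\, \tilde{\mu}(v_{aux}, v)}{\sum_{u, v \in V_{san},\ Y[u, v] \neq \perp} \tilde{\mu}(u_{aux}, u)\, \tilde{\mu}(v_{aux}, v)}$$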
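As a rough illustration of the mapping adversary for node attributes, the sketch below computes Adv[X, vaux, x] from a probabilistic mapping. The function name, the dictionary layout of the mapping, and the use of None for a missing attribute value (⊥) are assumptions made for this example; they are not part of the original model.

```python
def adv_node_attribute(mu, attr, v_aux, x):
    """Estimate Adv[X, v_aux, x] from a probabilistic mapping.

    mu   : dict {(aux_node, san_node): probability}   (assumed layout)
    attr : dict {san_node: attribute value, or None if undefined}
    x    : the attribute value of interest (must not be None)
    """
    # Numerator: probability mass on sanitized nodes whose attribute equals x.
    num = sum(p for (a, v), p in mu.items()
              if a == v_aux and attr.get(v) == x)
    # Denominator: probability mass on sanitized nodes with a defined attribute.
    den = sum(p for (a, v), p in mu.items()
              if a == v_aux and attr.get(v) is not None)
    return num / den if den else 0.0


# Hypothetical data: "alice" maps to three candidate sanitized nodes.
mu = {("alice", "s1"): 0.5, ("alice", "s2"): 0.25, ("alice", "s3"): 0.25}
city = {"s1": "NYC", "s2": "LA", "s3": None}
p_nyc = adv_node_attribute(mu, city, "alice", "NYC")
```

Node s3 has no value for the attribute, so it is left out of the denominator and the estimate is renormalized over s1 and s2 only.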
Model – Breaching Privacy
Privacy breach: the privacy of vsan is breached w.r.t. adversary Adv and privacy parameter δ if
$$Adv[X, v_{aux}, x] - Aux[X, v_{aux}, x] > \delta$$
or
$$Adv[Y, u_{aux}, v_{aux}, y] - Aux[Y, u_{aux}, v_{aux}, y] > \delta$$
Model – Measuring Success of an Attack
Let $V_{mapped} = \{v \in V_{aux} : \mu_G(v) \neq \perp\}$. The success rate of a de-anonymization algorithm outputting a probabilistic mapping $\tilde{\mu}$, w.r.t. a centrality measure $\nu$, is the probability that μ sampled from $\tilde{\mu}$ maps a node v to $\mu_G(v)$ if v is selected according to $\nu$:
$$\frac{\sum_{v \in V_{mapped}} PR[\mu(v) = \mu_G(v)]\ \nu(v)}{\sum_{v \in V_{mapped}} \nu(v)}$$
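The success rate is a centrality-weighted average, which the following sketch makes concrete. It assumes that PR[μ(v) = μG(v)] can be read directly from the probabilistic mapping as the probability assigned to the pair (v, μG(v)); the function name and the dictionary layouts are likewise illustrative assumptions.

```python
def success_rate(mu, ground_truth, centrality):
    """Centrality-weighted probability of mapping a node to its true match.

    mu           : dict {(aux_node, san_node): probability}   (assumed layout)
    ground_truth : dict {aux_node: san_node, or None if unmapped}
    centrality   : dict {aux_node: non-negative weight}
    """
    # V_mapped: auxiliary nodes that have a counterpart in the target graph.
    mapped = [v for v, target in ground_truth.items() if target is not None]
    total = sum(centrality[v] for v in mapped)
    # Weight each node's probability of a correct match by its centrality.
    hits = sum(mu.get((v, ground_truth[v]), 0.0) * centrality[v] for v in mapped)
    return hits / total if total else 0.0


# Hypothetical data: node "c" has no counterpart and is ignored.
truth = {"a": "s1", "b": "s2", "c": None}
weights = {"a": 3, "b": 1, "c": 5}
mu = {("a", "s1"): 0.9, ("a", "s2"): 0.1, ("b", "s1"): 0.5, ("b", "s2"): 0.5}
rate = success_rate(mu, truth, weights)
```

With these numbers the high-centrality node "a" dominates the result: (0.9 × 3 + 0.5 × 1) / 4.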
De-anonymization Algorithm
Seed identification: apply individual auxiliary information
Propagation: apply aggregate auxiliary information
Algorithm - Seed Identification
Input:
  The target graph
  A clique of k nodes which are present in both the auxiliary and the target graphs
  The degree values of the k nodes
  $\binom{k}{2}$ pairs of common-neighbor counts
  Error parameter ε
Output: a k-clique with matching (within a factor of 1 ± ε) node degrees and common-neighbor counts.
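A minimal sketch of the seed search described above, written as a brute-force backtracking search. It assumes an undirected target graph stored as a dictionary of neighbor sets; the function names and the data layout are illustrative, not taken from the original work.

```python
def within(observed, expected, eps):
    """True if observed lies within a factor of 1 +/- eps of expected."""
    return (1 - eps) * expected <= observed <= (1 + eps) * expected


def find_seed_matches(adj, degrees, common, eps):
    """Find k-cliques in the target graph that match the auxiliary clique.

    adj     : dict {node: set of neighbors}, undirected (an assumption)
    degrees : list of k expected degrees, one per auxiliary clique node
    common  : dict {(i, j): expected common-neighbor count}, for i < j
    eps     : error parameter
    Returns a list of k-tuples; position i holds the match for clique node i.
    """
    k = len(degrees)
    matches = []

    def extend(partial):
        i = len(partial)
        if i == k:
            matches.append(tuple(partial))
            return
        for cand in adj:
            if cand in partial:
                continue
            if not within(len(adj[cand]), degrees[i], eps):
                continue
            # cand must be adjacent to every node chosen so far (clique)
            # and share the expected number of neighbors with each of them.
            if all(cand in adj[p]
                   and within(len(adj[cand] & adj[p]), common[(j, i)], eps)
                   for j, p in enumerate(partial)):
                extend(partial + [cand])

    extend([])
    return matches


# Hypothetical target graph containing the 3-clique {1, 2, 4}.
adj = {1: {2, 3, 4, 5}, 2: {1, 3, 4}, 3: {1, 2, 5},
       4: {1, 2, 6}, 5: {1, 3}, 6: {4}}
found = find_seed_matches(adj, [4, 3, 3], {(0, 1): 2, (0, 2): 1, (1, 2): 1}, 0.0)
```

A caller would accept the seeds only when exactly one match is returned; several matches mean the auxiliary information does not single out a clique.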
Algorithm - Propagation
Inputs: G1, G2, and a seed mapping $\mu_S$
Output: μ
Iteratively find new mappings using the topological structure of the network and the feedback from previously constructed mappings.
Algorithm - Propagation
function propagationStep(lgraph, rgraph, mapping)
  for lnode in lgraph.nodes:
    scores[lnode] = matchScores(lgraph, rgraph, mapping, lnode)
    if eccentricity(scores[lnode]) < theta: continue
    rnode = (pick node from rgraph.nodes where
             scores[lnode][node] = max(scores[lnode]))

    scores[rnode] = matchScores(rgraph, lgraph, invert(mapping), rnode)
    if eccentricity(scores[rnode]) < theta: continue
    reverse_match = (pick node from lgraph.nodes where
                     scores[rnode][node] = max(scores[rnode]))
    if reverse_match != lnode: continue

    mapping[lnode] = rnode
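The propagation step can be tried out with the small runnable sketch below. Several parts are assumptions made for illustration: the DiGraph helper class, a scoring function that counts already-mapped neighbors and divides by the square root of the candidate's degree, the use of the population standard deviation in the eccentricity, the default theta of 0.5, and the choice to skip nodes that are already mapped.

```python
from collections import defaultdict
from statistics import pstdev


class DiGraph:
    """Tiny directed graph holding successor and predecessor sets."""
    def __init__(self, edges):
        self.succ, self.pred, self.nodes = defaultdict(set), defaultdict(set), set()
        for u, v in edges:
            self.succ[u].add(v)
            self.pred[v].add(u)
            self.nodes.update((u, v))


def eccentricity(values):
    """(largest - second largest) / standard deviation; 0 if undefined."""
    vals = sorted(values, reverse=True)
    if len(vals) < 2:
        return 0.0
    sd = pstdev(vals)
    return (vals[0] - vals[1]) / sd if sd > 0 else 0.0


def match_scores(lgraph, rgraph, mapping, lnode):
    """Score each unmapped rgraph node by neighbors it shares with lnode."""
    scores = {r: 0.0 for r in rgraph.nodes}
    taken = set(mapping.values())
    for lnbr in lgraph.pred[lnode]:              # edges lnbr -> lnode
        if lnbr in mapping:
            for rnode in rgraph.succ[mapping[lnbr]]:
                if rnode not in taken:
                    scores[rnode] += 1 / len(rgraph.pred[rnode]) ** 0.5
    for lnbr in lgraph.succ[lnode]:              # edges lnode -> lnbr
        if lnbr in mapping:
            for rnode in rgraph.pred[mapping[lnbr]]:
                if rnode not in taken:
                    scores[rnode] += 1 / len(rgraph.succ[rnode]) ** 0.5
    return scores


def propagation_step(lgraph, rgraph, mapping, theta=0.5):
    """One pass over lgraph; extends mapping in place and returns it."""
    for lnode in sorted(lgraph.nodes):
        if lnode in mapping:                     # simplification: no revisits
            continue
        scores = match_scores(lgraph, rgraph, mapping, lnode)
        if eccentricity(scores.values()) < theta:
            continue
        rnode = max(scores, key=scores.get)
        # Reverse check: rnode's best match in lgraph must be lnode.
        inverse = {r: l for l, r in mapping.items()}
        rscores = match_scores(rgraph, lgraph, inverse, rnode)
        if eccentricity(rscores.values()) < theta:
            continue
        if max(rscores, key=rscores.get) != lnode:
            continue
        mapping[lnode] = rnode
    return mapping


# Two copies of the same small graph with different labels, two seed pairs.
left = DiGraph([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")])
right = DiGraph([(1, 2), (1, 3), (2, 3), (3, 4)])
result = propagation_step(left, right, {"a": 1, "b": 2})
```

In this toy run, node c is matched first through its two seeded predecessors, and that new pair then provides the evidence needed to match d within the same pass, which is the feedback effect the slide describes.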
Algorithm - Propagation
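Eccentricity: measures how much an item in a set X “stands out” from the rest; here X is the set of mapping scores computed for a node.
Rejects the match if the eccentricity of the set of mapping scores is below a threshold θ.
$$\frac{\max(X) - \max_2(X)}{\sigma(X)}$$
where max and max2 denote the highest and second highest values in X, and σ denotes the standard deviation.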
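A short worked example may help with reading the eccentricity measure, taken here as the gap between the highest and second highest score divided by the standard deviation. The score values are made up, and the population standard deviation is assumed.

```latex
\[
X = \{5, 1, 1, 1\}, \qquad \max(X) = 5, \qquad \max\nolimits_2(X) = 1, \qquad \bar{X} = 2
\]
\[
\sigma(X) = \sqrt{\frac{(5-2)^2 + 3\,(1-2)^2}{4}} = \sqrt{3} \approx 1.73,
\qquad
\frac{\max(X) - \max_2(X)}{\sigma(X)} = \frac{4}{\sqrt{3}} \approx 2.31
\]
```

A score set such as {3, 3, 1, 1} has two tied best candidates, so its eccentricity is 0 and the match is rejected for any positive threshold.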
Algorithm - Propagation
Complexity: O((|E1| + |E2|) d1 d2)
  d1: a bound on the degree of the nodes in V1
  d2: a bound on the degree of the nodes in V2
Experiments – Data Sets
Twitter, Flickr, LiveJournal:
Experiments – Seed Identification
Evaluate the feasibility of seed identification by measuring how much auxiliary information is needed to identify a unique node in the target graph.
LiveJournal graph: serves as both the auxiliary and the target graph
Construct 4-cliques, and treat a 4-clique in the target graph as a match as long as each degree and common-neighbor count matches within a factor of 1 ± ε
Experiments – Propagation
Evaluate the robustness against perturbation and seed selection
Pairs of subgraphs (V1, V2) of a real-world SN, with over 100,000 nodes each
  One serves as the auxiliary SN, the other as the target SN
  Perturbation strategy: the two subgraphs overlap in 25% of their nodes and 50% of their edges
Experiments – Propagation
Mapping between two real-world social networks: Flickr and Twitter
Finding the ground truth $\mu_G$:
  Exact matches in either the username or the name field
  27,000 mappings
  Human inspection found the ground truth error rate to be under 5%
Mapping between two real-world social networks
Seeds: 150 pairs of nodes selected from the ground truth $\mu_G$
Results:
  30.8% of the mappings were re-identified correctly, 12.1% were identified incorrectly, and 57% were not identified.
  41% of the incorrectly identified mappings (5% overall) were mapped to nodes which are at a distance 1 from the true mapping.
  55% of the incorrectly identified mappings (6.7% overall) were mapped to nodes where the same geographic location was reported.
  The above two categories overlap; of all the incorrect mappings, only 27% (or 3.3% overall) fall into neither category and are completely erroneous.
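The "overall" figures follow from multiplying each share of the incorrect mappings by the 12.1% error rate. The same numbers also give the size of the overlap between the two categories by inclusion-exclusion; the overlap is derived here and is not stated on the slide.

```latex
\[
0.41 \times 12.1\% \approx 5.0\%, \qquad
0.55 \times 12.1\% \approx 6.7\%, \qquad
0.27 \times 12.1\% \approx 3.3\%
\]
\[
\text{both categories} = 41\% + 55\% - (100\% - 27\%) = 23\% \text{ of the incorrect mappings}
\approx 0.23 \times 12.1\% \approx 2.8\% \text{ overall}
\]
```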
Conclusions
Anonymity is not sufficient for privacy when dealing with social networks.
Demonstrated the feasibility of successful re-identification based solely on the network topology, assuming that the target graph is completely anonymized.
Reference
[1] Arvind Narayanan and Vitaly Shmatikov, “De-anonymizing Social Networks”, IEEE Security & Privacy '09.