07 analyzing privacy enterprise slide
TRANSCRIPT
-
8/8/2019 07 Analyzing Privacy Enterprise Slide
1/21
Bruno Ribeiro, Gerome Miklau, Don TowsleyUMass Amherst
Weifeng Chen
California University of Pennsylvania
Analyzing Privacy in Enterprise Packet
Trace Anonymization
-
8/8/2019 07 Analyzing Privacy Enterprise Slide
2/21
2
Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization
Motivation
InternetEnterprise(university)
Monitor
src address dest address src port dest port
14.1.1.1 11.0.0.3 6738 80
18.0.0.1 11.0.0.1 2434 22
11.0.0.1 20.0.0.3 6913 80
Packets
Packet header traces
Used for networking research
Many public repositories (UMass, CAIDA, LBNL, )
Raw trace may violate user privacy
If enterprise IP addresses can be tied to individuals
-
8/8/2019 07 Analyzing Privacy Enterprise Slide
3/21
3
Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization
Motivation
src addr. dest addr. src port dest port
14.1.1.1 11.0.0.3 6738 80
11.0.0.1 20.0.0.3 7913 22
src addr. dest addr. src port dest port
200.0.1.2 128.0.64.2 6738 80 128.0.64.0 5.0.4.5 7913 22
anonymizationmapping
Anonymized
trace
Trace repositoriesAnonymize IP addresses
Two most widely used schemes
Full prefix preservation (Xuet al. , 2001)Partial prefix preservation (Pang et al. 2006)
Original
trace
-
8/8/2019 07 Analyzing Privacy Enterprise Slide
4/21
4
Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization
Adversary
Adversarial model:
De-anonymizeenterprise IP addresses in the trace1. Probes (scan) enterprise network2. Collects similar information from the trace
De-anonymizes trace IPsmatching (1) with (2)
-
8/8/2019 07 Analyzing Privacy Enterprise Slide
5/21
5
Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization
Outline
Our contributions
New attack on IP anonymization:
Attack overviewDefined as a tree editing distance problem
Worst-case analysis:
From a set of trace labels (information)Assesses worst-case attack
Related work
Conclusions
-
8/8/2019 07 Analyzing Privacy Enterprise Slide
6/21
6
Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization
Proposed attack overview
Adversary provides:
Labeled treeconstructed using anonymizedtraceLabeled tree constructed from probingenterpriseA cost (or distance) function(to deal with mismatchedlabels)
Our algorithm finds:
All de-anonymizationsthatcomply withprefix preservation restrictionsandhave minimum total cost
An instance of the tree edit distanceproblem
-
8/8/2019 07 Analyzing Privacy Enterprise Slide
7/21
7
Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization
Full prefix preserving anonymization
Full prefix preservation
If two real addresses share first X bits, then
the same two anonymizedaddresses share first X bitsIt imposes restrictions on the real IP AnonymizedIP mapping
-
8/8/2019 07 Analyzing Privacy Enterprise Slide
8/21
8
Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization
Labeled trees
Trace tree
Match sets:
00 maps to{01
} 10 maps to {10, 11}
Probed tree
Web server
Not a Web server
Probed IP leaf labels
No traffic on port 80
Trace IP leaf labels
00 01 10 11 00 01 10 11
0
0
1
1
0 1
Match set:
00 maps to {00, 01, 10, 11} 10 maps to {00, 01, 10, 11}
Traffic on port 80
-
8/8/2019 07 Analyzing Privacy Enterprise Slide
9/21
9
Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization
Imperfect information
Trace treeProbed tree
Web server
Not a Web server
Traffic on port 80
No traffic on port 80
Backup Web server Correct mapping
Other sources of imperfect labels: Dynamic IP addresses, host shutdown, etc.
Probed IP leaf labels Trace IP leaf labels
00 01 10 11 00 01 10 11
-
8/8/2019 07 Analyzing Privacy Enterprise Slide
10/21
10
Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization
Mapping costs
Assign a cost to map two IPs with different labels
Is zero if labels are equal
Mapping cost
Sum of all individual costs
Trace treeProbed tree
Cost = 0
Cost = 1
Cost = 1
Example:
1
0 00
Total cost = 1
-
8/8/2019 07 Analyzing Privacy Enterprise Slide
11/21
11
Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization
Proposed attack
Allminimum costmappings (over the whole network)Because it is prefix-preserving
Every de-anonymization limits future de-anonymizations
And our algorithm is fast
10 seconds (on this laptop) for all mappings of a network with 216addresses
Probed tree Trace tree
?
00 01 10 11 00 01 10 11
-
8/8/2019 07 Analyzing Privacy Enterprise Slide
12/21
12
Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization
Experiment
Network: class B (64K addresses)
Labels
Active host
Active ports: FTP, SSH, Telnet, E-mail, Time, DNS, Web, POP3, SOCKS
Trace IP labels
Active hostlabel recorded any outgoing trafficActive portsRecorded traffic from ports 80, 22, .
Probed IP labels
Probed over all network
Active hostlabel PINGActive portsTCP SYN ACK reply from ports 80, 22,
Nave cost function: Zero is labels are equal, one otherwise
-
8/8/2019 07 Analyzing Privacy Enterprise Slide
13/21
13
Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization
Experiment results
Trace collected: 2007, June 18th
(9097 active IPs)Network probed: 2007, June 18th
0%
10%
20%
30%
40%
50%
60%
1 2 3 4 5 6 7 8
size of matching set
cumulativefractionof
hosts
inthetrace
Correct
matches
Uniquely re-identified
BAD
Datapublishe
rsview
Incorrect
matches
-
8/8/2019 07 Analyzing Privacy Enterprise Slide
14/21
14
Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization
Worst-case analysis
Given a labeled trace tree
Findbestde-anonymization
We provide an algorithm that
Obtains worst attack matching set size
For each IP address in the trace
For any label mismatch cost function
For any labeledprobed tree
-
8/8/2019 07 Analyzing Privacy Enterprise Slide
15/21
15
Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization
Worst-case experiment
Full prefix preservationJune 18th experiment
0%
20%
40%
60%
80%
100%
1 2 3 4 5 6 7 8
size of matching set
Nave attack
Worst-caseattack
Uniquely re-identified
BAD
Datapublishersview
cumulativefractionof
hostsin
thetrace
-
8/8/2019 07 Analyzing Privacy Enterprise Slide
16/21
16
Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization
Partial prefix preservation
Does not retain part of the address structure
Used in Pang et al., 2006
Solution also formulated as an instance of the tree edit distanceproblem
Probed tree rootProbed tree root
8 bits8 bits
Anonymized tree rootAnonymized tree root
8 bits8 bits 8 bits8 bits
Anonymization
mapping
8 bits8 bits
Up to 256 addresses
-
8/8/2019 07 Analyzing Privacy Enterprise Slide
17/21
17
Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization
0%
20%
40%
60%
80%
100%
1 2 3 4 5 6 7 8
size of matching set
Partial vs. Full prefix preservation
Intuition: Partial is much safer than full prefix preservation
Worst case:Full prefixpreservation
Worst case:Partial prefixpreservation
BAD
Datapublishersview
cumulativefractionof
hostsin
thetrace
-
8/8/2019 07 Analyzing Privacy Enterprise Slide
18/21
18
Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization
Worst-case analysis (II)
Uniquely re-identified
Full prefix preservation: 2713 active IP addresses in the trace
Partial prefix preservation: 113 active IP addresses in the trace
Partial prefix preservation is saferbut not completely safe
-
8/8/2019 07 Analyzing Privacy Enterprise Slide
19/21
19
Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization
Related work
Playing Devil's Advocate: Inferring Sensitive Information from Anonymized Traces,Scott Coull, Charles Wright, Fabian Monrose, Michael Collins andMichael Reiter,NDSS 2007
An attack on partial prefix preservation
Taming the Devil: Techniques for Evaluating Anonymized Network Data,Scott Coull, Charles Wright, Fabian Monrose, Angelos Keromytis and Michael Reiter,NDSS 2008
Comes right after this talk
-
8/8/2019 07 Analyzing Privacy Enterprise Slide
20/21
20
Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization
Conclusions
Attack
Include global mapping restrictions
An instance of the tree edit distance problem
Indicates that full prefix preservation has flawsImpact of late probing on the de-anonymization
Worst-case analysis
Can help future anonymization schemes
A tool for data publishers
Experiments indicate that:
Partial is much safer than full prefix preservation
But still not completely safe
-
8/8/2019 07 Analyzing Privacy Enterprise Slide
21/21
21
Bruno Ribeiro, Weifeng Chen, Gerome Miklau, and Don Towsley, Analyzing Privacy in Enterprise Packet Trace Anonymization
Thanks
Jim Kurose, UMassAmherstEdmundo de Souza e Silva, Federal University of Rio de Janeiro
Kyoungwon Suh,Illinois State UniversityAnonymous NDSS08 reviewers
Neils Provos, Google Inc.