mosaic: quantifying privacy leakage in mobile networks

21
Mosaic: Quantifying Privacy Leakage in Mobile Networks Ning Xia (Northwestern University) Han Hee Song (Narus Inc.) Yong Liao (Narus Inc.) Marios Iliofotou (Narus Inc.) Antonio Nucci (Narus Inc.) Zhi-Li Zhang (University of Minnesota) Aleksandar Kuzmanovic

Upload: trudy

Post on 25-Feb-2016

38 views

Category:

Documents


0 download

DESCRIPTION

Ning Xia ( Northwestern University) Han Hee Song ( Narus Inc.) Yong Liao ( Narus Inc.) Marios Iliofotou ( Narus Inc.) Antonio Nucci ( Narus Inc.) Zhi -Li Zhang (University of Minnesota) Aleksandar Kuzmanovic ( Northwestern University). - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Mosaic: Quantifying Privacy Leakage in Mobile Networks

Mosaic: Quantifying Privacy Leakage in Mobile Networks

Ning Xia (Northwestern University)Han Hee Song (Narus Inc.)Yong Liao (Narus Inc.)Marios Iliofotou (Narus Inc.)Antonio Nucci (Narus Inc.)Zhi-Li Zhang

(University of Minnesota)Aleksandar Kuzmanovic

(Northwestern University)

Page 2: Mosaic: Quantifying Privacy Leakage in Mobile Networks

2

Scenario

IP1

ISP A … IPK IP1

CSP B … IPK

Different services

Dynamic IP,CSP/ISP

Different devices

Different information

Page 3: Mosaic: Quantifying Privacy Leakage in Mobile Networks

3

Problem

Other research

work

We are here!Tessellation Mosaic

IP1

ISP A … IPK

IP1

CSP B … IPK

Input packet traces

How much private information can be obtained and expanded about end users by monitoring network traffic?

Page 4: Mosaic: Quantifying Privacy Leakage in Mobile Networks

4

Motivation

IP1

ISP A

… IPK

IP1

CSP B

… IPK

I will know everything about everyone!

Agencies

Bad guys

Mobile Traffic:• Relevant: more personal information• Challenging: frequent IP changes

Page 5: Mosaic: Quantifying Privacy Leakage in Mobile Networks

5

• How to track users when they hop over different IPs?

Challenges

With Traffic Markers, it is possible to connect the users’ true identities to their sessions.

Sessions:Flows(5-tuple) are grouped into sessions

timeIP3

timeIP2

timeIP1Traffic Markers:Identifiers in the traffic that can be used to differentiate users

Page 6: Mosaic: Quantifying Privacy Leakage in Mobile Networks

6

Datasets

• 3h-Dataset: main dataset for most experiments• 9h-Dataset: for quantifying privacy leakage• Ground Truth Dataset: for evaluation of session

attribution• RADIUS: provide session owners

Dataset Source Description

3h-Dataset CSP-A Complete payload

9h-Dataset CSP-A Only HTTP headers

Ground Truth Dataset CSP-B Payload & RADUIS info.

Page 7: Mosaic: Quantifying Privacy Leakage in Mobile Networks

Methodology OverviewTessellationIP1

ISP A … IPK

IP1

CSP B … IPK

Mosaic construction

Network data analysis

Web crawling

Mapping fromsessions to users

Traffic attribution

Viatraffic markers

Via activity fingerprinting

Combine information from both network data and OSN profiles to infer the user mosaic.

7

Page 8: Mosaic: Quantifying Privacy Leakage in Mobile Networks

8

Traffic Attribution via Traffic Markers

Traffic Markers: • Identifiers in the traffic to differentiate users• Key/value pairs from HTTP header• User IDs, device IDs or sessions IDs

Domain Keywords Category Source

osn1.com c_user=<OSN1_ID> OSN User ID Cookies

osn2.com oauth_token=<OSN2_ID>-## OSN User ID HTTP header

admob.com X-Admob-ISU Advertising HTTP header

pandora.com user_id User ID Cookies

google.com sid Session ID Cookies

How can we select and evaluate traffic markers from network data?

Page 9: Mosaic: Quantifying Privacy Leakage in Mobile Networks

9

Traffic Attribution via Traffic Markers

OSN IDs as Anchors:• The most popular user identifiers among all services• Linked to user public profiles

OSN IDs can be used as anchors, but their coverage on sessions is too small

OSN Source Session Coverage

OSN1 ID HTTP URL and cookies 1.3%

OSN2 ID HTTP header 1.0%

Top 2 OSN providers from North America

Only 2.3% sessions contain OSN IDs

Page 10: Mosaic: Quantifying Privacy Leakage in Mobile Networks

Block Generation: Group Sessions into Blocks

10

Traffic Attribution via Traffic Markers

99K session blocks generated from the 12M sessions

Session interval δ • Depends on the CSP• δ=60 seconds in our

study

Other sessions?OSN ID

≥δ

IP 1

Block• Session group on the

same IP within a short period of time

• Traffic markers shared by the same block

timeIP 1IP

timeIP

Block

Page 11: Mosaic: Quantifying Privacy Leakage in Mobile Networks

Culling the Traffic Markers: OSN IDs are not enough

11

Traffic Attribution via Traffic Markers

• Uniqueness: Can the traffic marker differentiate between users?• Persistency: How long does a traffic marker remain the same?

Traffic markers

mobclix.com

#uidO

SN

1 ID

pandora.com#user_id

mobclix.com

#u

mydas.m

obi#m

ac-id

google.com#sid

craigslist.org#cl_b

Persistency ~= 1The value of Google.com#sid remains the same for the same user nearly all the observation duration

Uniqueness = 1No two users will share the same google.com#sid value

We pick 625 traffic markers with uniqueness = 1, persistency > 0.9

Page 12: Mosaic: Quantifying Privacy Leakage in Mobile Networks

Traffic Attribution: Connecting the Dots

12

Traffic Attribution via Traffic Markers

IP 1

IP 2

IP 3

Same OSN ID

Same traffic marker

Traffic markers are the key in attributing sessions to the same user over different IP addresses

Tessellation User Ti

( )

Page 13: Mosaic: Quantifying Privacy Leakage in Mobile Networks

13

Traffic Attribution via Activity Fingerprinting

• What if a session block has no traffic markers?

Assumption (Activity Fingerprinting):• Users can be identified from the DNS names of their

favorite services

DNS names:• Extracted 54,000 distinct DNS names• Classified into 21 classes

Serviceclasses

Serviceproviders

Search bing, google, yahoo

Chat skype, mtalk.googl.com

Dating plentyoffish, date

E-commerce amazon, ebay

Email google, hotmail, yahoo

News msnbc, ew, cnn

Picture Flickr, picasa

… …

Activity Fingerprinting:• Favorite (top-k) DNS names as the

user’s “fingerprint”

Page 14: Mosaic: Quantifying Privacy Leakage in Mobile Networks

14

Traffic Attribution via Activity Fingerprinting

Y-axis:closer to 1, more distinct

the fingerprint is

Mobile users can be identified by the DNS names from their preferred services

Normalized DNS names

• Fi : Top k DNS names from user as “activity fingerprint”• : Uniqueness of the fingerprint

X-axis:normalized by the total number of DNS names

Page 15: Mosaic: Quantifying Privacy Leakage in Mobile Networks

15

Traffic Attribution Evaluation

RiRADIUS user

(Ground Truth)

Session

Ri Rj

Ti

Correct (Not complete)

Tj

Not correct

Coverage = -----------------------

identified sessions/users

total sessions/users

= -------------------

correctly identified sessions/users

total identified sessions/users

Accuracy on Covered Set

TiTessellation user

(Correct?)

Page 16: Mosaic: Quantifying Privacy Leakage in Mobile Networks

16

Traffic Attribution Evaluation

• Evaluation Results

Session

User

Cov

erag

e

78.60%

69.00%

49.80%

43.20%

2.40%

15.70%

OSN ID extraction Via traffic markers Via activity fingerprinting

Session

User

Acc

urac

y on

Cov

ered

Set

92.50%

96.40%

94.50%

99.30%

100.00%

100.00%

OSN ID extraction Via traffic markers Via activity fingerprinting

Page 17: Mosaic: Quantifying Privacy Leakage in Mobile Networks

• Mosaic of Real User

MOSAIC with 12 information classes(tesserae):• Information (Education, affiliation and etc.) from OSN profiles• Information (Locations, devices and etc.) from users’ network data 17

Construction of User Mosaic

Least gain

Most gain

Sub-classes:Residence, coordinates, city, state, and etc.

Page 18: Mosaic: Quantifying Privacy Leakage in Mobile Networks

• Leakage from OSN profiles vs. from Network Data

18

Information from OSN profiles and network data can complete and corroborate each other

Analysis on network data provides real-time activities and locations

OSN profiles provide static user information (education, interests)

Quantifying Privacy Leakage

Information from both sides can corroborate to

each other

Page 19: Mosaic: Quantifying Privacy Leakage in Mobile Networks

• Traffic markers (OSN IDs and etc.) should be limited and encrypted

Preventing User Privacy Leakage19

Protect traffic markers

Protect user profiles

Restrict3rd parties

• Third party applications/developers should be strongly regulated

• OSN public profiles should be carefully obfuscated

Page 20: Mosaic: Quantifying Privacy Leakage in Mobile Networks

• Prevalence in the use of OSNs leaves users’ true identities available in the network

• Tracking techniques used by mobile apps and services make traffic attribution easier

• Sessions can be labeled with network users’ true identities, even without any identity leaks

• Various types of information can be gleaned to paint rich digital Mosaic about users

20

Conclusions

Page 21: Mosaic: Quantifying Privacy Leakage in Mobile Networks

Thanks!

Mosaic: Quantifying Privacy Leakage in Mobile Network

21