CS 3640: Introduction to Networks and Their Applications. Fall 2018, Lecture 21: Internet Privacy II (Credit: Prof. Christo Wilson @ NEU). Instructor: Rishab Nithyanand. Teaching Assistant: Md. Kowsar Hossain.


Page 1: CS 3640: Introduction to Networks and Their Applications (homepage.divms.uiowa.edu/~rnithyanand/cs3640-f18/slides/l22-nov08.pdf)

CS 3640: Introduction to Networks and Their Applications

Fall 2018, Lecture 21: Internet Privacy II
(Credit: Prof. Christo Wilson @ NEU)

Instructor: Rishab Nithyanand
Teaching Assistant: Md. Kowsar Hossain

Page 2

You should…

• Be working on Assignment 4
  • Due tomorrow at midnight
• Know and understand:
  • The three Internet design principles and components of the Internet.
  • Circuit- vs. packet-switched networks.
  • Components of end-to-end delay.
  • The link layer: error detection, MAC, local addressing/routing.
  • The network layer: addressing, fragmentation, IPv4 vs. IPv6, Inter/Intra-domain routing.
  • The transport layer: core functionality, TCP vs. UDP, flow control vs. congestion control.
  • Network Address Translation: Why do we need it? How does it work?
  • DNS: Why do we need it? How does it work? Types of records? Indirect benefits.
  • HTTP (The Web): Components of Web pages. HTTP connection and message types.
  • Web privacy: How is the Internet economy driven? IP tracking vs. cookies vs. fingerprinting. Can trackers share cookies?

Page 3

Today in class

1. Tracking technologies
2. Tracking the trackers
3. Real-time bidding

Page 4

Recap: Tracking technologies

• Stateful tracking.
  • Store state on the client (browser).
  • Identify clients uniquely through this stored state.
  • Example: Cookie-based tracking.
• Stateless tracking.
  • Do not store state on the client (browser).
  • Identify clients by characteristics of their device.
  • Example: IP-based tracking, fingerprinting.
• Discuss: Benefits and problems of each approach.

Page 5

Stateful tracking technologies: Cookies

[Diagram: the browser visits nike.com, which loads content from tracker1.com.]

• tracker1.com to browser: "Set this cookie: id=123"
• Browser cookie store: tracker1.com: id=123
• tracker1.com's Browsing Profile for User 123:
  • nike.com

* Slide concept taken from Lerner et al. (USENIX '16)

Page 6

Stateful tracking technologies: Cookies

[Diagram: the browser visits adidas.com, which loads content from tracker1.com.]

• Browser to tracker1.com: "Your cookie: id=123"
• Browser cookie store: tracker1.com: id=123
• tracker1.com's Browsing Profile for User 123:
  • nike.com
  • adidas.com

* Slide concept taken from Lerner et al. (USENIX '16)

Page 7

Stateful tracking technologies: Cookies

[Diagram: the browser visits cnn.com, which loads content from tracker1.com.]

• Browser to tracker1.com: "Your cookie: id=123"
• Browser cookie store: tracker1.com: id=123
• tracker1.com's Browsing Profile for User 123:
  • nike.com
  • adidas.com
  • cnn.com

* Slide concept taken from Lerner et al. (USENIX '16)
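The cookie walk-through on these slides can be sketched as a tiny simulation. This is a minimal illustration under stated assumptions, not how any real tracker is built: the `Tracker` and `Browser` classes and their methods are hypothetical names, and the embedding site is handed to the tracker directly instead of being read from a Referer header.

```python
import itertools
from collections import defaultdict


class Tracker:
    """Hypothetical third-party tracker that keys profiles on its own cookie."""

    def __init__(self, domain, first_id=123):
        self.domain = domain
        self._next_id = itertools.count(first_id)
        self.profiles = defaultdict(list)  # cookie id -> sites where the user was seen

    def handle_request(self, cookie, embedding_site):
        # No cookie yet: mint an id and ask the browser to store it.
        if cookie is None:
            cookie = str(next(self._next_id))
        self.profiles[cookie].append(embedding_site)
        return cookie  # stands in for a Set-Cookie response header


class Browser:
    """Hypothetical browser holding one cookie per tracker domain."""

    def __init__(self):
        self.cookie_jar = {}

    def visit(self, site, embedded_trackers):
        for tracker in embedded_trackers:
            cookie = self.cookie_jar.get(tracker.domain)
            self.cookie_jar[tracker.domain] = tracker.handle_request(cookie, site)


tracker1 = Tracker("tracker1.com")
browser = Browser()
for site in ["nike.com", "adidas.com", "cnn.com"]:
    browser.visit(site, [tracker1])

print(browser.cookie_jar)       # {'tracker1.com': '123'}
print(dict(tracker1.profiles))  # {'123': ['nike.com', 'adidas.com', 'cnn.com']}
```

The state lives in the browser (the cookie jar), which is what makes this tracking stateful: clearing the jar would make the tracker mint a fresh id on the next visit.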

Page 8

Stateful tracking technologies: Cookies

[Diagram: the browser visits cnn.com, which loads content from three trackers. Each tracker keeps its own cookie and its own profile.]

• tracker1.com: id=123, Browsing Profile for User 123
• tracker2.com: id=abc, Browsing Profile for User abc
• tracker3.com: id=xyz, Browsing Profile for User xyz

* Slide concept taken from Lerner et al. (USENIX '16)

Page 9

Stateless tracking technologies: Browser fingerprinting

GET / HTTP/1.1
Host: www.google.com
Connection: keep-alive
Cache-Control: max-age=0
Accept: text/html
User-Agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.68 Safari/537.36
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8

Discuss: Which of these are client-dependent?
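One way to picture header-based fingerprinting is to hash the header values that depend on the client rather than on the request. The sketch below is only an illustration: `header_fingerprint` is a hypothetical helper, and the choice of headers in `CLIENT_DEPENDENT` is an assumption made for the example, not a list taken from the lecture.

```python
import hashlib

# Header names assumed (for this example only) to vary with the client
# rather than with the page being requested.
CLIENT_DEPENDENT = ("User-Agent", "Accept", "Accept-Encoding", "Accept-Language")


def header_fingerprint(headers):
    """Hash the client-dependent header values into a short identifier."""
    parts = [f"{name}={headers.get(name, '')}" for name in CLIENT_DEPENDENT]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()[:16]


request_headers = {
    "Host": "www.google.com",
    "Connection": "keep-alive",
    "Cache-Control": "max-age=0",
    "Accept": "text/html",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/42.0.2311.68 Safari/537.36",
    "Accept-Encoding": "gzip,deflate,sdch",
    "Accept-Language": "en-US,en;q=0.8",
}

fp = header_fingerprint(request_headers)
# Same client asking for a different host: the fingerprint is unchanged.
same = header_fingerprint({**request_headers, "Host": "www.cnn.com"})
# Different language preference: the fingerprint changes.
other = header_fingerprint({**request_headers, "Accept-Language": "de-DE"})
print(fp == same, fp == other)  # True False
```

Nothing is stored on the client here, so clearing cookies has no effect on the identifier; it only changes when the browser configuration itself changes.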

Page 10

Stateless tracking technologies: Canvas fingerprinting

• The browser’s Canvas API lets a page generate images locally: drawings, rendered fonts.
• The rendered output depends on the implementation in the OS and the underlying hardware.
  • E.g., the GPU for 3D graphics.
• The canvas objects generated can be highly distinctive.

Page 11

Stateless tracking technologies: Audio and battery fingerprinting

• AudioContext API.
  • Process an audio signal for fingerprinting.
  • Does not need microphone access!
  • Used by Liverail.
• Battery Status API.
  • Originally intended to allow websites to serve “lite” versions of webpages for users when required.
  • Battery level
    • Check the current battery level. Returns a value between 0 and 1.
  • Battery charging
    • A boolean indicating whether the device/computer is currently being charged.
  • Battery chargingTime
    • Time left in seconds until it is fully charged. Available when charging.
  • Battery dischargingTime
    • Time left in seconds until it is discharged. Available when not charging.
  • Mozilla removed it from Firefox. Other browsers still allow it.

Page 12

Stateless tracking technologies: Other fingerprinting approaches

• Discuss: What other client characteristics could be used to identify a user?
• Many more high-entropy characteristics are observable via JavaScript/plugins.
  • What time zone are you in?
  • What fonts are installed on your machine?
  • What plugins are installed, and what are their versions?
  • What is your screen resolution and color depth?
  • Availability of specific JS APIs (i.e., browser version or platform-dependent features)
  • Existence of specific browser extensions (e.g., AdBlock)
  • Order in which HTTP headers are sent
  • Hardware-level characteristics like CPU ID and frequency (MHz)

Page 13

Stateless tracking technologies: Putting everything together

• Advertisers combine multiple fingerprinting techniques.
• They build probabilistic models to (re)identify users with high probability.
• How fingerprintable are you?
  • Test yourself: https://panopticlick.eff.org
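A rough way to see why combining attributes works is to add up the bits of identifying information each one reveals: a value shared by a fraction p of browsers contributes -log2(p) bits. The sketch below uses made-up fractions (the numbers in `attribute_share` are hypothetical, not measurements) and assumes the attributes are independent, which overstates the total in practice.

```python
import math

# Hypothetical fraction of browsers sharing each of this user's attribute values.
attribute_share = {
    "time zone": 0.10,
    "screen resolution and color depth": 0.05,
    "installed fonts": 0.001,
    "installed plugins": 0.002,
}


def surprisal_bits(share):
    """Bits of identifying information revealed by a value with this share."""
    return -math.log2(share)


# Independence assumption: bits simply add up across attributes.
total_bits = sum(surprisal_bits(s) for s in attribute_share.values())
anonymity_set = 2 ** total_bits  # roughly one in this many browsers matches

for name, share in attribute_share.items():
    print(f"{name}: {surprisal_bits(share):.2f} bits")
print(f"total: {total_bits:.2f} bits, about 1 in {anonymity_set:,.0f} browsers")
```

No single attribute here is unique, but under these assumptions the combination narrows the user down to about one browser in a hundred million.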

Page 14

Today in class

1. Tracking technologies
2. Tracking the trackers
3. Real-time bidding

Page 15

Tracking the trackers: Web

• A measurement of tracker behavior on the Alexa Top 1M websites.
  • Work by Steven Englehardt and Arvind Narayanan from Princeton University.

Page 16

Tracking the trackers: Web

• Methodology
  • Start with a list of known trackers.
    • Limitation: no new trackers are discovered, so the results are a lower bound on tracker presence.
  • Visit the front pages of the top 1 million websites. Log all HTTP requests and responses.
    • Limitation: Not representative of real user interactions.
  • Analyze cookie usage and JavaScript API calls by known trackers to identify methods of tracking.
• In spite of the limitations, the amount and complexity of the analysis performed make this the gold standard for methodology.
  • 81 million requests and responses were analyzed.
  • The limitations mainly had to be accepted to deal with scalability and technology constraints (convincing browser automation is hard!)
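The list-based part of this methodology can be sketched in a few lines. Everything below is a hypothetical stand-in: `KNOWN_TRACKERS` and `crawl_log` are invented sample data, and `base_domain` is a deliberately naive helper (a real crawl would need the Public Suffix List to find registrable domains). It is not the measurement code used in the study.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical stand-ins for a known-tracker list and a crawl log of
# (visited site, requested URL) pairs.
KNOWN_TRACKERS = {"tracker1.com", "tracker2.com"}
crawl_log = [
    ("nike.com", "http://tracker1.com/pixel.gif"),
    ("nike.com", "https://nike.com/style.css"),
    ("cnn.com", "https://cdn.tracker2.com/t.js"),
    ("cnn.com", "http://tracker1.com/pixel.gif"),
    ("cnn.com", "https://unlisted-tracker.example/beacon"),
]


def base_domain(host):
    """Naive last-two-labels domain; real crawls need the Public Suffix List."""
    return ".".join(host.lower().split(".")[-2:])


def trackers_per_site(log, known):
    """Map each visited site to the known tracker domains it contacted."""
    found = defaultdict(set)
    for site, url in log:
        domain = base_domain(urlsplit(url).hostname or "")
        # Count only third-party requests whose domain is on the list.
        if domain != base_domain(site) and domain in known:
            found[site].add(domain)
    return dict(found)


result = trackers_per_site(crawl_log, KNOWN_TRACKERS)
print(result)
```

The request to `unlisted-tracker.example` is ignored because it is not on the list, which is the lower-bound limitation from the slide in miniature.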

Page 17

Tracking the trackers: Web

• Parent organizations of trackers found on the top million sites.
  • ~75% of sites had a Google-owned tracker
  • ~35% of sites had a Facebook-owned tracker
  • Fewer than 100 organizations in the entire ecosystem (with presence in at least 1000 sites).

Page 18

Tracking the trackers: Web

• Third parties and trackers are everywhere.
  • Worst offender: News websites had ~35 trackers on average.
  • Smallest offender: Adult websites had ~4 trackers on average.

Page 19

Tracking the trackers: Web

• Many third parties and trackers do not use encrypted communication.
  • 54% of third parties were sending/receiving all user data as cleartext.

Page 20

Tracking the trackers: Web

• Canvas fingerprinting is quite common.
  • 14.3K of the top million sites had a tracker performing canvas image or font fingerprinting.
  • Popular sites were more likely to use this approach.

Page 21

Tracking the trackers: Mobile

• A measurement of tracker behavior on Android apps.
  • (My) work in collaboration with UC-Berkeley, UMass-Amherst, SBU.
• We built an app called the Lumen Privacy Monitor (download it!)
  • [as of 12/2017] 12K users, 40K monitored apps.
  • What personally identifiable information (PII) is sent by which apps to which domains?
  • We only gather aggregated data. Everything is anonymized.
  • Limitation: Doing deep dives is hard due to privacy concerns.
  • Benefit: What we do find is representative of real users and their interactions.

Page 22

Who gets your data?

[Figure: recipients of user data, grouped as "The Alphabet monopoly", "Mobile-specific ATSes", and "The Outsiders". Values shown in the figure: 3.6% and 73%.]

Page 23

Challenges with tracking the trackers: How do you enforce regulations?

• 70% of all tracker flows with PII terminate in a different country.
• General Data Protection Regulation (GDPR)
  • EU laws apply to data from EU citizens.
• Of all PII flows from the EU:
  • 90% terminate in the US
  • 4% terminate in China

Page 24

Challenges with tracking the trackers: How is user data disseminated?

[Subject of my current research]

• Of the top 10 ATS organizations:
  • 10 reserve the right to share data with subsidiaries.
  • 8 reserve the right to sell data to third parties.
• We don’t know who the subsidiaries and third parties are.

Page 25

Challenges with tracking the trackers: Identifying violations of regulations

[Subject of my current research]

• Children’s Online Privacy Protection Act (COPPA)
  • Parental permission is required for tracking in software targeted at children.
• Of the top 10 ATS organizations:
  • Only 4 have COPPA-specific policies.
  • Only 5 have an inappropriate ads reporting system.
• 24% of all apps targeted at children perform tracking.
  • Identifying COPPA violations requires the ability to inspect consent gathering & opt-out procedures.
    • These are not uniform or automatically measurable.
• We expect similar problems will arise with GDPR tracking consent forms.

Page 26

Today in class

1. Tracking technologies
2. Tracking the trackers
3. Real-time bidding

Page 27

Real-time bidding

Page 28

Real-time bidding

• How does tracking data actually feed into the advertising ecosystem?

Page 29

Real-time bidding

Real Time Bidding (RTB)

[Diagram: UserX → Publisher → SSP → Ad Exchange → DSPs]

• Cookies set on UserX along the way: CNN’s Cookie, Rubicon’s Cookie=XYZ, DoubleClick’s Cookie=123
• Ad Exchange to DSPs: "Solicit bids, DoubleClick’s Cookie"
• RightMedia’s Cookie=ABC
• DSPs’ own identifiers for the user: UserX=xo$, UserX=ABC
• Bids: RightMedia’s Bid = $1.5, Criteo’s Bid = $1.0
• Advertisement is returned to UserX.

DSPs cannot read their cookie! How can they bid if they cannot identify the user?

Page 30

Real-time bidding and cookie matching

• Key problem: DSPs cannot read their cookies in the RTB auction
  • How can they submit reasonable bids if they cannot identify the user?
• Solution: cookie matching
  • Also known as cookie syncing
  • Process of linking the identifiers used by two ad networks

[Diagram: cookie matching between DoubleClick and RightMedia]

1. Browser to DoubleClick: Cookie=123
2. DoubleClick to browser: 301 Redirect, Location=http://rightmedia.com/?dblclk_id=123
3. Browser to RightMedia: ?dblclk_id=123, Cookie=ABC

Matching Table (kept by RightMedia)

Partner      My ID  PID
DoubleClick  ABC    123
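The redirect-based exchange on this slide can be sketched as follows. This is a simplified illustration, not either company's actual implementation: `doubleclick_redirect`, the `RightMedia` class, and the keys of `browser_cookies` are hypothetical names chosen for the example; only the redirect URL and the ids 123 and ABC come from the slide.

```python
from urllib.parse import parse_qs, urlsplit


def doubleclick_redirect(doubleclick_cookie):
    """Exchange side: put its own id for the user into a redirect to the partner."""
    return 301, f"http://rightmedia.com/?dblclk_id={doubleclick_cookie}"


class RightMedia:
    """Partner side: pairs the id carried in the URL with its own cookie."""

    def __init__(self):
        self.matching_table = {}  # (partner, partner id) -> my id

    def handle_sync(self, url, my_cookie):
        query = parse_qs(urlsplit(url).query)
        partner_id = query["dblclk_id"][0]
        self.matching_table[("DoubleClick", partner_id)] = my_cookie


# The browser holds one cookie per domain and follows the redirect,
# attaching the RightMedia cookie to the redirected request.
browser_cookies = {"doubleclick": "123", "rightmedia": "ABC"}
status, location = doubleclick_redirect(browser_cookies["doubleclick"])
rightmedia = RightMedia()
rightmedia.handle_sync(location, browser_cookies["rightmedia"])

print(status, location)
print(rightmedia.matching_table)  # {('DoubleClick', '123'): 'ABC'}
```

Neither party ever reads the other's cookie directly; the partner id travels in the URL, and the browser supplies RightMedia's own cookie because the redirected request goes to RightMedia's domain.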

Page 31

Real-time bidding

Real Time Bidding (RTB)

[Diagram: UserX → Publisher → SSP → Ad Exchange → DSPs]

• Cookies set on UserX along the way: CNN’s Cookie, Rubicon’s Cookie=XYZ, DoubleClick’s Cookie=123
• Ad Exchange to DSPs: "Solicit bids, DoubleClick’s Cookie"
• RightMedia’s Cookie=ABC
• DSPs’ own identifiers for the user: UserX=xo$, UserX=ABC
• Bids: RightMedia’s Bid = $1.5, Criteo’s Bid = $1.0
• Advertisement is returned to UserX.

Now RightMedia (rm) can look up its DB to find that DoubleClick’s user 123 is its own user ABC.
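With matching tables in place, the auction itself can be sketched as below. This is a toy model under loud assumptions: the `DSP` class, `run_auction`, and the `unknown_bid` value of 0.1 are hypothetical, and the pricing rule is simplified to "highest bid wins". The ids and the $1.5 and $1.0 bids are the ones shown on the slide.

```python
class DSP:
    """Hypothetical bidder that values users it can recognize."""

    def __init__(self, name, matching_table, known_bid, unknown_bid):
        self.name = name
        self.matching_table = matching_table  # exchange's id -> this DSP's own id
        self.known_bid = known_bid
        self.unknown_bid = unknown_bid

    def bid(self, exchange_user_id):
        own_id = self.matching_table.get(exchange_user_id)
        # A recognized user has a profile, so the DSP can justify a higher bid.
        return self.known_bid if own_id is not None else self.unknown_bid


def run_auction(exchange_user_id, dsps):
    """Solicit one bid per DSP and return (winner name, winning bid)."""
    bids = {dsp.name: dsp.bid(exchange_user_id) for dsp in dsps}
    winner = max(bids, key=bids.get)
    return winner, bids[winner]


# Matching tables as built by earlier cookie matching with the exchange.
rightmedia = DSP("RightMedia", {"123": "ABC"}, known_bid=1.5, unknown_bid=0.1)
criteo = DSP("Criteo", {"123": "xo$"}, known_bid=1.0, unknown_bid=0.1)

print(run_auction("123", [rightmedia, criteo]))  # ('RightMedia', 1.5)
```

The exchange only ever sends its own id (123) in the bid request; each DSP translates it through its matching table, which is why cookie matching has to happen before the auction.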