CS 3640: Introduction to
Networks and Their Applications
Fall 2018, Lecture 21: Internet Privacy II (Credit: Prof. Christo Wilson @ NEU)
Instructor: Rishab Nithyanand
Teaching Assistant: Md. Kowsar Hossain
You should…
• Be working on Assignment 4
  • Due tomorrow at midnight
• Know and understand:
  • The three Internet design principles and components of the Internet.
  • Circuit- vs. packet-switched networks.
  • Components of end-to-end delay.
  • The link layer: error detection, MAC, local addressing/routing.
  • The network layer: addressing, fragmentation, IPv4 vs. IPv6, Inter/Intra-domain routing.
  • The transport layer: core functionality, TCP vs. UDP, flow control vs. congestion control.
  • Network Address Translation: Why do we need it? How does it work?
  • DNS: Why do we need it? How does it work? Types of records? Indirect benefits.
  • HTTP (The Web): Components of Web pages. HTTP connection and message types.
  • Web privacy: How is the Internet economy driven? IP tracking vs. cookies vs. fingerprinting. Can trackers share cookies?
Today in class
1. Tracking technologies
2. Tracking the trackers
3. Real-time bidding
Recap: Tracking technologies
• Stateful tracking.
  • Store state on the client (browser).
  • Identify clients uniquely through this stored state.
  • Example: Cookie-based tracking.
• Stateless tracking.
  • Do not store state on the client (browser).
  • Identify clients by characteristics of their device.
  • Example: IP-based tracking, fingerprinting.
• Discuss: Benefits and problems of each approach.
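[Diagram: the user visits nike.com, which embeds tracker1.com.]
tracker1.com to browser: Set this cookie: id=123
Cookie stored in the browser: tracker1.com: id=123
Browsing Profile for User 123:
• nike.com
* Slide concept taken from Lerner et al. (USENIX ’16)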
Stateful tracking technologies: Cookies
[Diagram: the user visits adidas.com, which embeds tracker1.com.]
Browser to tracker1.com: Your cookie: id=123
Cookie stored in the browser: tracker1.com: id=123
Browsing Profile for User 123:
• nike.com
• adidas.com
Stateful tracking technologies: Cookies
[Diagram: the user visits cnn.com, which embeds tracker1.com.]
Browser to tracker1.com: Your cookie: id=123
Cookie stored in the browser: tracker1.com: id=123
Browsing Profile for User 123:
• nike.com
• adidas.com
• cnn.com
Stateful tracking technologies: Cookies
[Diagram: cnn.com embeds tracker1.com, tracker2.com, and tracker3.com.]
Cookies stored in the browser:
• tracker1.com: id=123
• tracker2.com: id=abc
• tracker3.com: id=xyz
Each tracker keeps its own profile:
• tracker1.com: Browsing Profile for User 123
• tracker2.com: Browsing Profile for User abc
• tracker3.com: Browsing Profile for User xyz
Stateful tracking technologies: Cookies
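The cookie slides above can be sketched in a few lines of Python. This is only a toy model, not how any real tracker is implemented: the `Tracker` class, `handle_request`, and `cookie_jar` are hypothetical names introduced for illustration, and the starting ID of 123 is chosen to match the slides.

```python
import itertools
from collections import defaultdict


class Tracker:
    """Toy third-party tracker that keys browsing profiles on a cookie ID."""

    def __init__(self):
        self._next_id = itertools.count(123)  # first ID matches the slides
        self.profiles = defaultdict(list)     # cookie ID -> sites visited

    def handle_request(self, embedding_site, cookie=None):
        """Record the visit and return the cookie the browser should keep."""
        if cookie is None:                    # first contact: "Set this cookie"
            cookie = str(next(self._next_id))
        self.profiles[cookie].append(embedding_site)
        return cookie


tracker1 = Tracker()
cookie_jar = {}  # the browser's cookie store: tracker domain -> cookie value

for site in ["nike.com", "adidas.com", "cnn.com"]:
    # Each page embeds tracker1.com, so the browser attaches tracker1's cookie
    # (if it has one) to the request it sends to the tracker.
    cookie_jar["tracker1.com"] = tracker1.handle_request(
        site, cookie_jar.get("tracker1.com")
    )

user_id = cookie_jar["tracker1.com"]
print(user_id, tracker1.profiles[user_id])
```

Note that the state lives in the browser: if the user clears `cookie_jar`, the tracker assigns a fresh ID and the old profile can no longer be extended.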
GET / HTTP/1.1
Host: www.google.com
Connection: keep-alive
Cache-Control: max-age=0
Accept: text/html
User-Agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.68 Safari/537.36
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Discuss: Which of these are client dependent?
Stateless tracking technologies: Browser fingerprinting
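As a rough illustration of the discussion question, the sketch below hashes a handful of request headers into a short identifier. Which headers count as client dependent is an assumption made here (`CLIENT_DEPENDENT` is a hypothetical choice, not a list from the lecture), and real fingerprinting scripts combine many more signals.

```python
import hashlib

# Headers from the request on the slide.
headers = {
    "Host": "www.google.com",
    "Connection": "keep-alive",
    "Cache-Control": "max-age=0",
    "Accept": "text/html",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/42.0.2311.68 Safari/537.36"
    ),
    "Accept-Encoding": "gzip,deflate,sdch",
    "Accept-Language": "en-US,en;q=0.8",
}

# Assumption: these headers vary with the client's browser, OS, and settings,
# while Host, Connection, and Cache-Control depend on the request instead.
CLIENT_DEPENDENT = ("User-Agent", "Accept", "Accept-Encoding", "Accept-Language")


def header_fingerprint(request_headers):
    """Hash the client-dependent headers into a short, stable identifier."""
    material = "\n".join(
        f"{name}:{request_headers.get(name, '')}" for name in CLIENT_DEPENDENT
    )
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]


print(header_fingerprint(headers))
```

Nothing is stored on the client: the same browser produces the same value on every site, and the value changes only when the browser configuration does.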
• The browser’s Canvas API lets a page draw images, shapes, and text locally.
• The rendered output depends on the implementation in the OS and the underlying hardware.
  • E.g., the GPU for 3D graphics.
• The resulting canvas objects can be highly distinctive.
Stateless tracking technologies: Canvas fingerprinting
• AudioContext API.
  • Process an audio signal for fingerprinting.
  • Does not need microphone access!
  • Used by Liverail.
• Battery Status API.
  • Originally intended to allow websites to serve “lite” versions of webpages for users when required.
  • Battery level: the current battery level, a value between 0 and 1.
  • Battery charging: a boolean indicating whether the device/computer is currently being charged.
  • Battery chargingTime: time left in seconds until it is fully charged. Available when charging.
  • Battery dischargingTime: time left in seconds until it is discharged. Available when not charging.
  • Mozilla removed it from Firefox. Other browsers still allow it.
Stateless tracking technologies: Audio and battery fingerprinting
• Discuss: What other client characteristics could be used to identify a user?
• Many more high-entropy characteristics are observable via JavaScript/plugins.
  • What time zone are you in?
  • What fonts are installed on your machine?
  • What plugins are installed, and what are their versions?
  • What is your screen resolution and color depth?
  • Availability of specific JS APIs (i.e., browser-version- or platform-dependent features)
  • Existence of specific browser extensions (e.g., AdBlock)
  • Order in which HTTP headers are sent
  • Hardware-level characteristics like CPU ID and frequency (MHz)
Stateless tracking technologies: Other fingerprinting approaches
• Advertisers use multiple ways to fingerprint.
• They build probabilistic models to (re)identify users with high probability.
• How fingerprintable are you?
• Test yourself: https://panopticlick.eff.org
Stateless tracking technologies: Putting everything together
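One way to reason about "how fingerprintable are you?" is to count bits of identifying information: a characteristic shared by a fraction p of browsers contributes -log2(p) bits. The sketch below applies that formula to made-up fractions; the numbers in `observed` are hypothetical, and adding the bits assumes the characteristics are independent, which real measurements do not guarantee.

```python
import math


def surprisal_bits(fraction_sharing_value):
    """Bits of identifying information revealed by one observed value."""
    return -math.log2(fraction_sharing_value)


# Hypothetical fractions of browsers that share this user's value.
observed = {
    "time zone": 1 / 8,
    "screen resolution": 1 / 16,
    "installed fonts": 1 / 1024,
}

# Assumption: characteristics are independent, so their bits simply add.
total_bits = sum(surprisal_bits(p) for p in observed.values())

for name, p in observed.items():
    print(f"{name}: {surprisal_bits(p):.1f} bits")
print(f"total: {total_bits:.1f} bits, about 1 in {2 ** total_bits:,.0f} browsers")
```

Roughly 33 bits are enough to single out one person among the world's population, which is why a few individually harmless characteristics become identifying once combined.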
Tracking the trackers: Web
• A measurement of tracker behavior on the Alexa Top 1M websites.
  • Work by Steven Englehardt and Arvind Narayanan from Princeton University.
Tracking the trackers: Web
• Methodology
  • Start with a list of known trackers.
    • Limitation: no new discoveries, so results are a lower bound on tracker presence.
  • Visit the front pages of the top 1 million websites. Log all HTTP requests and responses.
    • Limitation: Not representative of real user interactions.
  • Analyze cookie usage and JavaScript API calls by known trackers to identify methods of tracking.
• In spite of the limitations, the amount and complexity of analysis performed make this the gold standard for methodology.
  • 81 million requests and responses were analyzed.
  • The limitations were mainly imposed by scalability and technology constraints (convincing browser automation is hard!)
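The first methodology steps (start from a list of known trackers, then inspect the logged requests of each front page) might look roughly like the sketch below. It is not the study's actual tooling: `KNOWN_TRACKERS`, `host_matches`, and `trackers_on_page` are hypothetical names, and the tracker domains are the placeholder ones from the earlier cookie slides.

```python
from urllib.parse import urlsplit

# Hypothetical tracker list; a real study would load a curated blocklist.
KNOWN_TRACKERS = {"tracker1.com", "tracker2.com", "tracker3.com"}


def host_matches(host, domain):
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def trackers_on_page(page_url, request_urls):
    """Return the known tracker domains contacted as third parties by a page."""
    page_host = urlsplit(page_url).hostname or ""
    found = set()
    for url in request_urls:
        host = urlsplit(url).hostname or ""
        for domain in KNOWN_TRACKERS:
            # Skip first-party requests: a tracker's own site is not a third party.
            if host_matches(host, domain) and not host_matches(page_host, domain):
                found.add(domain)
    return found


logged_requests = [
    "https://cnn.com/style.css",
    "https://ads.tracker1.com/pixel.gif",
    "http://tracker2.com/t.js",
    "https://nottracker1.com/lib.js",
]
print(sorted(trackers_on_page("https://cnn.com/", logged_requests)))
```

The lower-bound limitation is visible here: any tracker missing from `KNOWN_TRACKERS` is silently ignored, however many requests it receives.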
Tracking the trackers: Web
• Parent organizations of trackers found on the top million sites.
  • ~75% of sites had a Google-owned tracker
  • ~35% of sites had a Facebook-owned tracker
  • Fewer than 100 organizations in the entire ecosystem (with presence in at least 1000 sites).
Tracking the trackers: Web
• Third parties and trackers are everywhere.
  • Worst offender: News websites had ~35 trackers on average.
  • Smallest offender: Adult websites had ~4 trackers on average.
Tracking the trackers: Web
• Third parties and trackers do not use encrypted communication.
  • 54% of third parties were sending/receiving all user data as cleartext.
Tracking the trackers: Web
• Canvas fingerprinting is quite common.
  • 14.3K of the top million sites had a tracker performing canvas image or font fingerprinting.
  • Popular sites were more likely to use this approach.
Tracking the trackers: Mobile
• A measurement of tracker behavior on Android apps.
  • (My) work in collaboration with UC Berkeley, UMass Amherst, SBU.
• We built an app called the Lumen Privacy Monitor (download it!)
  • [as of 12/2017] 12K users, 40K monitored apps.
  • What personally identifiable information (PII) is sent by which apps to which domains?
  • We only gather aggregated data. Everything is anonymized.
  • Limitation: Doing deep dives is hard due to privacy concerns.
  • Benefit: What we do find is representative of real users and their interactions.
Who gets your data?
[Figure: groups shown are The Alphabet monopoly, Mobile-specific ATSes, and The Outsiders. Values shown: 3.6%, 73%.]
70% of all tracker flows with PII terminate in a different country.
General Data Protection Regulation (GDPR)
• EU laws apply to data from EU citizens.
Of all PII flows from the EU:
• 90% terminate in the US
• 4% terminate in China
Challenges with tracking the trackers: How do you enforce regulations?
Of the top 10 ATS organizations:
• 10 reserve the right to share data with subsidiaries.
• 8 reserve the right to sell data to third parties.
We don’t know who the subsidiaries and third parties are.
Challenges with tracking the trackers: How is user data disseminated?
[Subject of my current research]
• Children’s Online Privacy Protection Act (COPPA)
  • Parental permission is required for tracking in software targeted at children.
• Of the top 10 ATS organizations:
  • Only 4 have COPPA-specific policies.
  • Only 5 have an inappropriate ads reporting system.
• 24% of all apps targeted at children perform tracking.
• Identifying COPPA violations requires the ability to inspect consent gathering & opt-out procedures.
  • These are not uniform or automatically measurable.
• We expect similar problems will arise with GDPR tracking consent forms.
Challenges with tracking the trackers: Identifying violations of regulations
[Subject of my current research]
Real-time bidding
• How does tracking data actually feed into the
advertising ecosystem?
Real-time bidding
Real Time Bidding (RTB)
[Diagram: parties shown are User, Publisher, SSP, Ad Exchange, and DSPs.]
• Cookies in the user's browser: CNN’s Cookie, Rubicon’s Cookie=XYZ, DoubleClick’s Cookie=123, RightMedia’s Cookie=ABC
• Ad Exchange to DSPs: Solicit bids, DoubleClick’s Cookie
• RightMedia’s Bid = $1.5
• Criteo’s Bid = $1.0
• Advertisement
DSPs cannot read their cookie! How can they bid if they cannot identify the user?
Real-time bidding
• Key problem: DSPs cannot read their cookies in the RTB auction
  • How can they submit reasonable bids if they cannot identify the user?
• Solution: cookie matching
  • Also known as cookie syncing
  • Process of linking the identifiers used by two ad networks
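Cookie matching exchange:
1. Browser to DoubleClick: Cookie=123
2. DoubleClick to browser: 301 Redirect, Location=http://rightmedia.com/?dblclk_id=123
3. Browser to RightMedia: ?dblclk_id=123, Cookie=ABC

Matching Table (kept by RightMedia):
Partner      My ID  PID
DoubleClick  ABC    123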
Real-time bidding and cookie matching
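The redirect-based exchange on this slide can be sketched as two small functions, one per party. This is a simplified model under stated assumptions: `exchange_redirect`, `dsp_handle_sync`, and `matching_table` are hypothetical names, and only the URL, the `dblclk_id` parameter, and the IDs 123 and ABC come from the slide.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def exchange_redirect(exchange_cookie):
    """DoubleClick's side: reply with a 301 whose URL carries its own user ID."""
    location = "http://rightmedia.com/?" + urlencode({"dblclk_id": exchange_cookie})
    return 301, location


# RightMedia's matching table: (partner, partner's ID) -> RightMedia's own ID.
matching_table = {}


def dsp_handle_sync(url, my_cookie):
    """RightMedia's side: pair the partner ID in the URL with its own cookie."""
    query = parse_qs(urlsplit(url).query)
    partner_id = query["dblclk_id"][0]
    matching_table[("DoubleClick", partner_id)] = my_cookie


# 1. The browser contacts DoubleClick and sends Cookie=123.
status, location = exchange_redirect("123")
# 2. The browser follows the redirect and attaches RightMedia's Cookie=ABC.
dsp_handle_sync(location, my_cookie="ABC")

print(status, location)
print(matching_table)
```

The browser itself carries the identifier from one party to the other: each party only ever reads its own cookie, yet after the redirect RightMedia knows both IDs.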
Real Time Bidding (RTB)
[Diagram: parties shown are User, Publisher, SSP, Ad Exchange, and DSPs.]
• Cookies in the user's browser: CNN’s Cookie, Rubicon’s Cookie=XYZ, DoubleClick’s Cookie=123, RightMedia’s Cookie=ABC
• Ad Exchange to DSPs: Solicit bids, DoubleClick’s Cookie
• RightMedia’s Bid = $1.5
• Criteo’s Bid = $1.0
• Advertisement
Now RightMedia (rm) can look up its DB to find that DoubleClick's user 123 is its own user ABC.
Real-time bidding
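Putting the pieces together, a DSP that holds a matching table can translate the exchange's ID in a bid request into its own ID before bidding. The sketch below is illustrative only: the bid amounts mirror the slide, but the bidding logic, the `profiles` data, and the highest-bid-wins rule are assumptions, since the slides do not specify how the exchange picks or prices the winner.

```python
# RightMedia's state after cookie matching (IDs taken from the slides).
matching_table = {("DoubleClick", "123"): "ABC"}
profiles = {"ABC": ["nike.com", "adidas.com"]}  # hypothetical profile data


def rightmedia_bid(bid_request):
    """Bid only if the exchange's user ID maps to a user RightMedia knows."""
    my_id = matching_table.get((bid_request["exchange"], bid_request["user_id"]))
    if my_id is None or my_id not in profiles:
        return None       # cannot identify the user, so do not bid
    return 1.5            # bid amount from the slide


def criteo_bid(bid_request):
    """Stand-in for a second DSP; always submits the slide's bid."""
    return 1.0


def run_auction(bid_request, bidders):
    """Collect bids and pick the highest one (assumed rule, see lead-in)."""
    bids = {}
    for name, bidder in bidders.items():
        amount = bidder(bid_request)
        if amount is not None:
            bids[name] = amount
    winner = max(bids, key=bids.get) if bids else None
    return winner, bids


bidders = {"RightMedia": rightmedia_bid, "Criteo": criteo_bid}
request = {"exchange": "DoubleClick", "user_id": "123"}
print(run_auction(request, bidders))
```

Without the matching table entry, `rightmedia_bid` returns `None` for the same request, which is the situation the cookie-matching slides set out to fix.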