Non-tracking Web Analytics
Istemi Ekin Akkus¹, Ruichuan Chen¹, Michaela Hardt², Paul Francis¹, Johannes Gehrke³
¹ Max Planck Institute for Software Systems
² Twitter Inc.
³ Cornell University
Web Analytics
Statistics about users visiting a publisher website
Akkus et al.
Analytics by Data Aggregators
• Collect analytics for many publishers from many clients
• Infer extended analytics
  – Age, gender, education level, other sites visited, …
• Provide aggregate information to publishers & advertisers
[Figure: the data aggregator provides aggregate extended analytics to the publisher]
Tracking
• Data aggregators criticized
  – Collection of individual information
• Criticisms led to reactions
  – Do-not-Track proposal, EU cookie law
  – Voluntary opt-out mechanisms by aggregators
  – Client-side tools to blacklist aggregators
• Fewer tracked users → less data for inference → worse extended analytics for publishers
Goal
Replicate the functionality of today’s
systems without tracking
Specific Goals
• Privacy
  – No individual information collected by publishers & aggregators
• Functionality
  – Aggregate information for publishers & aggregators
  – No new organizational components
  – Practical and efficient
Outline
• Motivation & Goals
• Components & Assumptions
• Non-tracking Analytics
• Implementation & Evaluation
• Conclusion
Components
• Client locally stores information about the user
• Publisher serves webpages to clients
• Aggregator provides aggregation service
Assumptions
• Potentially malicious client
  – May try to distort results
• Potentially malicious publisher
  – May try to violate individual user privacy
• Honest-but-curious data aggregator
  – Follows the protocol
  – Doesn’t collude with publishers
Outline
• Motivation & Goals
• Components & Assumptions
• Non-tracking Analytics
  – Publisher as Proxy
  – Noise
  – Yes-No Queries
  – Auditing
• Implementation & Evaluation
• Conclusion
Today
• Not anonymous; need a proxy…
• …, but don’t want a new component
Publisher already interacts with clients!
Publisher as Anonymizing Proxy
1. Publisher distributes queries to be executed
2. Publisher collects encrypted answers
3. Publisher forwards answers to the aggregator
4. Aggregator counts anonymous answers and returns results
Clients never exposed to the data aggregator
[Figure: 1. Queries; 2. Encrypted Answers; 3. Encrypted Answers; 4. Results]
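The four steps above can be sketched as a small message flow. This is a minimal illustration, not the authors' implementation: `Sealed` is a hypothetical stand-in for a ciphertext under the aggregator's public key (no real cryptography is performed), and the class names, the IP addresses, and the shuffling step are assumptions made for the example.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Sealed:
    """Stand-in for an answer encrypted under the aggregator's public key.

    No real cryptography happens here; by convention only the
    Aggregator class reads `plaintext`.
    """
    plaintext: str


class Client:
    def __init__(self, ip, attributes):
        self.ip = ip                  # seen by the publisher, never by the aggregator
        self.attributes = attributes  # information stored locally on the client

    def answer(self, query):
        # Answer the query locally and "encrypt" it for the aggregator.
        return self.ip, Sealed(self.attributes[query])


class Aggregator:
    def count(self, sealed_answers):
        # Step 4: "decrypt" and count the anonymous answers.
        return Counter(s.plaintext for s in sealed_answers)


class Publisher:
    def __init__(self, aggregator):
        self.aggregator = aggregator

    def run(self, query, clients, rng):
        # Steps 1 and 2: distribute the query and collect encrypted answers.
        received = [client.answer(query) for client in clients]
        # Step 3: forward the ciphertexts only, without client addresses.
        forwarded = [sealed for _ip, sealed in received]
        rng.shuffle(forwarded)  # assumption: also hide the arrival order
        return self.aggregator.count(forwarded)


clients = [
    Client("10.0.0.1", {"job": "Student"}),
    Client("10.0.0.2", {"job": "CEO"}),
    Client("10.0.0.3", {"job": "Student"}),
]
results = Publisher(Aggregator()).run("job", clients, random.Random(0))
print(results["Student"], results["CEO"])  # 2 1
```

The aggregator only ever receives `Sealed` values, and the publisher never reads their contents, which is the property the slide summarizes as clients never being exposed to the data aggregator.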
Identifiers in Responses
• Rare attributes
  – Job: CEO of ACME
[Figure: a client's answer Enc(CEO of ACME) is forwarded by example.com to the data aggregator. Publisher: “CEO of ACME visits my site!” Data aggregator: “CEO of ACME visits example.com”]
Noise
2. Encrypted Answers
3. Add Noise_Publisher
4. Noisy Encrypted Answers
5. Add Noise_Aggregator
6. Double-noisy Result
7. Remove Noise_Publisher
Both entities obtain noisy results
  – Publisher: Result with Noise_Aggregator
  – Aggregator: Result with Noise_Publisher
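The arithmetic behind steps 3 to 7 can be traced with plain integers. This is a sketch under simplifying assumptions: noise is modeled as a fixed integer offset on a single count, and the function and parameter names are illustrative only.

```python
def double_noise_round(true_count, noise_publisher, noise_aggregator):
    """Trace one count through the double-noise steps."""
    # Step 3: the publisher adds its noise before forwarding (step 4).
    forwarded = true_count + noise_publisher
    # The aggregator decrypts and counts; this is the only value it sees.
    aggregator_view = forwarded
    # Step 5: the aggregator adds its own noise; step 6: double-noisy result.
    double_noisy = aggregator_view + noise_aggregator
    # Step 7: the publisher removes the noise it added itself.
    publisher_view = double_noisy - noise_publisher
    return aggregator_view, publisher_view


aggregator_view, publisher_view = double_noise_round(
    true_count=40, noise_publisher=3, noise_aggregator=-2
)
print(aggregator_view, publisher_view)  # 43 38
```

Neither party ends up with the exact count of 40: the aggregator holds the count plus the publisher's noise, and the publisher holds the count plus the aggregator's noise.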
Differentially-private Noise
Hides the existence of an individual answer
CEO: real or noise??
• Requires numerical values
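Noise of this kind is added to a number, such as a count of matching answers. The transcript does not name the noise distribution; the sketch below assumes the textbook Laplace mechanism for a counting query (sensitivity 1), and `epsilon`, `laplace_noise`, and `noisy_count` are illustrative names.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng):
    # Adding or removing one answer changes a count by at most 1,
    # so the sensitivity is 1 and the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(7)
# A true count of 1 (one CEO) and a true count of 0 (no CEO) give
# noisy values that are hard to tell apart from a single release.
print(noisy_count(1, epsilon=0.5, rng=rng))
print(noisy_count(0, epsilon=0.5, rng=rng))
```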
Yes-No Questions
Convert queries to binary & count answers
“What is your job?” → “Is your job ‘CEO’?”
Noise as additional answers
  – Enc(‘Yes’), Enc(‘No’)
• Bonus: limits a malicious client
  – Either +1 or 0
• Many possible values → Many questions
  – Job: ‘CEO’, ‘Student’, ‘Gardener’, ...
Buckets
Multiple yes-no questions with one query
1. Enumerate possible answer values
  – Job: {‘CEO’, ‘Student’, ‘Gardener’, ‘Teacher’, ...}
2. A fixed number of ‘Yes’ answers
  – Job: 1
3. Clients choose ‘Yes’ for the matching bucket
  – Enc(‘CEO = Yes’)
4. Publisher generates additional answers
  – Enc(‘CEO = Yes’), Enc(‘Student = Yes’), ...
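The bucket steps above can be sketched as follows. Encryption is left out to keep the counting visible, the bucket list is a shortened illustrative subset, and the per-bucket noise is passed in explicitly rather than drawn from a distribution; all function names are assumptions for this example.

```python
from collections import Counter

# Step 1: enumerated answer values (illustrative subset).
BUCKETS = ["CEO", "Student", "Gardener", "Teacher"]
# Step 2: the 'Job' query allows one 'Yes' answer per client.
MAX_YES = 1


def client_answers(job):
    # Step 3: 'Yes' for the matching bucket; no answer if nothing matches.
    matching = [bucket for bucket in BUCKETS if bucket == job]
    return matching[:MAX_YES]


def publisher_noise(extra_yes_per_bucket):
    # Step 4: additional 'Yes' answers generated by the publisher.
    return [
        bucket
        for bucket, extra in extra_yes_per_bucket.items()
        for _ in range(extra)
    ]


def aggregate(answers):
    # Count 'Yes' answers per bucket, ignoring values outside the buckets.
    counts = Counter({bucket: 0 for bucket in BUCKETS})
    counts.update(a for a in answers if a in BUCKETS)
    return counts


jobs = ["Student", "CEO", "Student", "Pilot"]
answers = [a for job in jobs for a in client_answers(job)]
answers += publisher_noise({"CEO": 2, "Teacher": 1})
counts = aggregate(answers)
print(dict(counts))  # {'CEO': 3, 'Student': 2, 'Gardener': 0, 'Teacher': 1}
```

One query now covers every enumerated job value, and a client that follows the protocol changes at most one bucket by +1.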
Impracticalities of Differential Privacy
• Requires a privacy budget
  – Stop answering when budget expires
  – No answers from clients → low-utility results
• Assumes a static database; our setting is dynamic
  – User population of a publisher changes
  – Certain user data may change
Clients keep answering queries
Malicious Publishers
• Isolation attacks
  – Isolate a user’s response
  – Repeat the same query
  – Cancel out noise
1. Specific query conditions or buckets
  – Monitoring and approval by the data aggregator
2. Selectively dropping client responses
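Why repeating the same query cancels out noise can be seen by averaging many noisy releases of one isolated count. The sketch assumes zero-mean Laplace noise drawn independently for every repetition; the transcript does not specify the noise distribution, and the names and parameters here are illustrative.

```python
import random


def laplace_noise(scale, rng):
    # Difference of two independent exponentials: Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def averaged_answer(true_count, scale, repeats, rng):
    # The attacker repeats the query and averages the noisy results.
    total = sum(true_count + laplace_noise(scale, rng) for _ in range(repeats))
    return total / repeats


rng = random.Random(3)
single = averaged_answer(true_count=1, scale=2.0, repeats=1, rng=rng)
many = averaged_answer(true_count=1, scale=2.0, repeats=20000, rng=rng)
print(single)  # one release: the isolated answer stays hidden in the noise
print(many)    # many releases: the average settles close to the true count
```

With independent zero-mean noise the error of the average shrinks roughly with the square root of the number of repetitions, so an isolated response is eventually exposed.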
Isolation via Dropping Responses
[Figure: clients send Enc(CEO), Enc(Student) and Enc(Gardener) to example.com; example.com forwards only Enc(CEO), alongside Enc(Driver), Enc(Mechanic) and Enc(Driver)]
Result: Mechanic: 1 + noise, Driver: 2 + noise, CEO: 1 + noise
Publisher: “User in the middle is a CEO!”
Auditing
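[Figure: clients send Enc(CEO), Enc(Student) and Enc(nonce) to example.com, shown alongside Enc(Driver), Enc(Mechanic) and Enc(Driver) as in the isolation example. A separate report Enc(example.com, nonce) reaches the data aggregator, which checks whether the nonce arrived through example.com: “example.com nonce?”]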
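One reading of the auditing figure is that a client occasionally sends a nonce in place of its answer and separately reports the pair (publisher, nonce) to the aggregator, which then checks that the nonce arrived through that publisher. The sketch below follows that reading only; the class, the tuple encoding of answers, and the way the report reaches the aggregator are assumptions, and encryption is omitted.

```python
import secrets


class AuditingAggregator:
    def __init__(self):
        self.expected = set()  # (publisher, nonce) pairs from audit reports
        self.seen = set()      # (publisher, nonce) pairs forwarded by publishers

    def report_audit(self, publisher, nonce):
        # Models Enc(example.com, nonce), received over a channel that the
        # audited publisher cannot drop (assumption).
        self.expected.add((publisher, nonce))

    def receive_from_publisher(self, publisher, answers):
        # The publisher cannot tell nonces from answers once they are encrypted.
        for kind, value in answers:
            if kind == "nonce":
                self.seen.add((publisher, value))

    def missing_nonces(self, publisher):
        return {n for p, n in self.expected - self.seen if p == publisher}


def honest_publisher(answers):
    return list(answers)


def dropping_publisher(answers):
    # Keeps a single response and drops the rest, as in the isolation attack.
    return list(answers[:1])


def run_audit(publisher_behavior):
    aggregator = AuditingAggregator()
    nonce = secrets.token_hex(8)
    answers = [("answer", "CEO"), ("answer", "Student"), ("nonce", nonce)]
    aggregator.report_audit("example.com", nonce)
    aggregator.receive_from_publisher("example.com", publisher_behavior(answers))
    return aggregator.missing_nonces("example.com")


print(len(run_audit(honest_publisher)), len(run_audit(dropping_publisher)))  # 0 1
```

A publisher that drops responses cannot know which ones carry nonces, so dropping risks leaving an audit report unmatched at the aggregator.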
Outline
• Motivation & Goals
• Components & Assumptions
• Non-tracking Analytics
  – Publisher as Proxy
  – Noise
  – Yes-No Answer
  – Auditing
• Implementation & Evaluation
• Conclusion
Implementation
• 2000 lines of code in total
  – Client: Firefox extension
  – Publisher software: Piwik plugin
  – Aggregator software: simple server
• Deployed and tested with over 200 users
• RSA public key cryptosystem
Evaluation – Decryption Overhead
• Aggregator: 2.4 GHz CPU, 2048-bit key
• Publisher: 50K users, 2 sets of queries/week
1. Information currently provided
  – Demographics, other sites
  – 3.6 CPU hours/week
2. Information available through our system
  – # pages browsed, search engines, visit frequency to other sites
  – 3 CPU hours/week
Evaluation – Client Overhead
• Bandwidth overhead
  – <100KB/week to download 11 queries
  – 8KB/week for all query responses
• CPU overhead for encryption
  – Google Chrome: 380 enc/sec
  – Firefox: 20 enc/sec