
Non-tracking Web Analytics

Istemi Ekin Akkus 1, Ruichuan Chen 1, Michaela Hardt 2, Paul Francis 1, Johannes Gehrke 3

1 Max Planck Institute for Software Systems
2 Twitter Inc.
3 Cornell University


Web Analytics

Statistics about users visiting a publisher website


Analytics by Data Aggregators

• Collect analytics for many publishers from many clients

• Infer extended analytics
  – Age, gender, education level, other sites visited, …

• Provide aggregate information to publishers & advertisers

[Diagram: Data Aggregator and Publisher, labeled "Aggregate Extended Analytics"]


Analytics Today

[Diagram: Publisher, Client, Data Aggregator]


Tracking

• Data aggregators criticized
  – Collection of individual information

• Criticisms led to reactions
  – Do-not-Track proposal, EU cookie law
  – Voluntary opt-out mechanisms by aggregators
  – Client-side tools to blacklist aggregators

• Fewer tracked users → less data for inference → worse extended analytics for publishers


Goal

Replicate the functionality of today’s systems without tracking


Specific Goals

• Privacy
  – No individual information collected by publishers & aggregators

• Functionality
  – Aggregate information for publishers & aggregators
  – No new organizational components
  – Practical and efficient


Outline

• Motivation & Goals

• Components & Assumptions

• Non-tracking Analytics

• Implementation & Evaluation

• Conclusion


Components

• Client locally stores information about the user

• Publisher serves webpages to clients

• Aggregator provides aggregation service


Assumptions


• Potentially malicious client
  – May try to distort results

• Potentially malicious publisher
  – May try to violate individual user privacy

• Honest-but-curious data aggregator
  – Follows the protocol
  – Doesn’t collude with publishers


Outline

• Motivation & Goals
• Components & Assumptions
• Non-tracking Analytics
  – Publisher as Proxy
  – Noise
  – Yes-No Queries
  – Auditing
• Implementation & Evaluation
• Conclusion


Today

• Not anonymous; need a proxy…
• …but don’t want a new component

Publisher already interacts with clients!


Publisher as Anonymizing Proxy

1. Publisher distributes queries to be executed
2. Publisher collects encrypted answers
3. Publisher forwards answers to the aggregator
4. Aggregator counts anonymous answers and returns results

Clients never exposed to the data aggregator

[Diagram: 1. Queries; 2. Encrypted Answers; 3. Encrypted Answers; 4. Results]
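The four steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy RSA key built from two small Mersenne primes, and the ad hoc salting are all assumptions made here so that the example runs with only the Python standard library. The key is far too small to be secure.

```python
import random
import secrets
from collections import Counter

# Toy RSA key from two Mersenne primes. Illustration only: the key is tiny and
# the salting below is ad hoc, so this offers no real security.
P, Q, E = 2**61 - 1, 2**89 - 1, 65537
N = P * Q                              # public modulus, known to clients
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent, held by the aggregator only


def client_encrypt(answer: str) -> int:
    """Step 2: a client encrypts its answer under the aggregator's public key."""
    # A marker byte plus a random salt makes equal answers encrypt differently.
    payload = b"\x01" + secrets.token_bytes(4) + answer.encode()
    m = int.from_bytes(payload, "big")
    if m >= N:
        raise ValueError("answer too long for the toy key")
    return pow(m, E, N)


def publisher_forward(ciphertexts: list) -> list:
    """Step 3: the publisher forwards answers it cannot read; shuffling models
    the fact that the aggregator does not learn which client sent which one."""
    forwarded = list(ciphertexts)
    random.shuffle(forwarded)
    return forwarded


def aggregator_count(ciphertexts: list) -> Counter:
    """Step 4: the aggregator decrypts the anonymous answers and counts them."""
    counts = Counter()
    for c in ciphertexts:
        m = pow(c, D, N)
        payload = m.to_bytes((m.bit_length() + 7) // 8, "big")
        counts[payload[5:].decode()] += 1   # skip marker byte and 4-byte salt
    return counts


# Step 1 (query distribution) is implicit: every client answers "What is your job?"
jobs = ["Student", "CEO", "Student", "Gardener"]
encrypted = [client_encrypt(job) for job in jobs]
results = aggregator_count(publisher_forward(encrypted))
print(results)
```

The publisher only ever handles the integers in `encrypted`, and the aggregator only ever sees answers without a sender, which is the separation the slide relies on.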


Identifiers in Responses

• Rare attributes
  – Job: CEO of ACME

[Diagram: Enc(CEO of ACME) passes through the publisher example.com. Callouts: "CEO of ACME visits my site!" and "CEO of ACME visits example.com"]


Noise

2. Encrypted Answers
3. Add Noise_Publisher
4. Noisy Encrypted Answers
5. Add Noise_Aggregator
6. Double-noisy Result
7. Remove Noise_Publisher

Both entities obtain noisy results
  – Result with Noise_Aggregator
  – Result with Noise_Publisher
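A rough sketch of how steps 2 to 7 fit together, working on plain counts for brevity (encryption is left out). Everything specific here is an assumption for illustration: the function names, the use of Laplace noise with the given scale, and modeling the publisher's noise as a non-negative number of extra 'Yes' answers.

```python
import random


def laplace(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def run_query(true_yes: int, scale: float, rng: random.Random):
    # Step 3: the publisher adds its own noise as extra 'Yes' answers.
    # Assumption: a non-negative count, here the rounded magnitude of a Laplace draw.
    noise_publisher = round(abs(laplace(scale, rng)))
    noisy_answers = true_yes + noise_publisher        # step 4, what the aggregator counts

    # Step 5: the aggregator adds its own noise to the count.
    noise_aggregator = round(laplace(scale, rng))
    double_noisy = noisy_answers + noise_aggregator   # step 6, returned to the publisher

    # Step 7: the publisher removes the noise it added itself.
    publisher_view = double_noisy - noise_publisher   # true count + Noise_Aggregator
    aggregator_view = noisy_answers                   # true count + Noise_Publisher
    return publisher_view, aggregator_view, noise_publisher, noise_aggregator


pub_view, agg_view, n_pub, n_agg = run_query(true_yes=40, scale=2.0, rng=random.Random(1))
print(pub_view, agg_view)
```

Under this reading, neither party sees the exact count: the publisher's result still carries the aggregator's noise, and the aggregator's count still carries the publisher's.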


Differentially-private Noise

Hides the existence of an individual answer
  – CEO: real or noise?

• Requires numerical values
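For reference, the standard definition of differential privacy and the Laplace mechanism for a counting query, which shows why the noise needs a numerical value to be added to. The symbols (M, D, D', S, f, ε) are conventional notation and do not appear on the slide.

```latex
% \varepsilon-differential privacy: for all neighboring datasets D, D'
% (differing in one individual's data) and every set of outputs S,
\Pr[M(D) \in S] \le e^{\varepsilon} \, \Pr[M(D') \in S]

% Laplace mechanism for a counting query f (sensitivity 1):
M(D) = f(D) + \mathrm{Lap}(1/\varepsilon)
```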


Yes-No Questions

Convert queries to binary & count answers
  – “What is your job?” → “Is your job ‘CEO’?”

Noise as additional answers
  – Enc(‘Yes’), Enc(‘No’)

• Bonus: limits a malicious client
  – Either +1 or 0

• Many possible values → many questions
  – Job: ‘CEO’, ‘Student’, ‘Gardener’, ...
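A small sketch of the conversion and of why it bounds a malicious client. The function names are hypothetical, and encryption is omitted.

```python
def to_yes_no(user_value: str, asked_value: str) -> str:
    """Answer the binary form of the query, e.g. "Is your job 'CEO'?"."""
    return "Yes" if user_value == asked_value else "No"


def count_yes(answers: list) -> int:
    """Aggregator side: each answer adds +1 if it is exactly 'Yes', else 0."""
    return sum(1 for a in answers if a == "Yes")


jobs = ["CEO", "Student", "CEO"]
answers = [to_yes_no(job, "CEO") for job in jobs]
answers.append("Yes" * 1000)   # a malicious client's malformed answer counts as 0
print(count_yes(answers))
```

A client that lies can still shift the count, but only by one per answer, unlike a free-form numeric answer.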


Buckets

Multiple yes-no questions with one query

1. Enumerate possible answer values
  – Job: {‘CEO’, ‘Student’, ‘Gardener’, ‘Teacher’, ...}

2. A fixed number of ‘Yes’ answers
  – Job: 1

3. Clients choose ‘Yes’ for the matching bucket
  – Enc(‘CEO = Yes’)

4. Publisher generates additional answers
  – Enc(‘CEO = Yes’), Enc(‘Student = Yes’), ...
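The bucket steps above might look like this in code. It is a sketch under stated assumptions: only the four job values named on the slide are used, encryption is omitted, the helper names are hypothetical, and the publisher's noise uses a uniform placeholder distribution rather than properly calibrated noise.

```python
import random
from collections import Counter

# Step 1: only the four values named on the slide; the real bucket list is longer.
BUCKETS = ["CEO", "Student", "Gardener", "Teacher"]
YES_PER_CLIENT = 1  # step 2: a fixed number of 'Yes' answers for the Job query


def client_answers(job: str) -> list:
    """Step 3: the client says 'Yes' for exactly the matching bucket."""
    if job not in BUCKETS:
        raise ValueError(f"no bucket for {job!r}")
    return [f"{job} = Yes"] * YES_PER_CLIENT


def publisher_noise(max_extra: int, rng: random.Random) -> list:
    """Step 4: the publisher generates additional answers per bucket.
    Placeholder distribution: uniform in [0, max_extra], not calibrated noise."""
    return [f"{b} = Yes" for b in BUCKETS for _ in range(rng.randint(0, max_extra))]


rng = random.Random(7)
real = [a for job in ["CEO", "Student", "Student"] for a in client_answers(job)]
noise = publisher_noise(max_extra=3, rng=rng)
tally = Counter(real + noise)  # what the aggregator would count after decryption
print(tally)
```

Because real and generated answers have the same form, a single 'CEO = Yes' in the tally no longer reveals whether a real CEO answered.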


Impracticalities of Differential Privacy

• Requires a privacy budget
  – Stop answering when budget expires
  – No answers from clients → low-utility results

• Assumes a static database; our setting is dynamic
  – User population of a publisher changes
  – Certain user data may change

Clients keep answering queries


Malicious Publishers

• Isolation attacks
  – Isolate a user’s response
  – Repeat the same query
  – Cancel out noise

1. Specific query conditions or buckets
  – Monitoring and approval by the data aggregator

2. Selectively dropping client responses
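To see why repeating the same query cancels out noise, the sketch below averages many noisy answers to one isolated response. The Laplace noise, its scale, and the number of repetitions are illustrative assumptions, not values from the slides.

```python
import random
import statistics


def laplace(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


rng = random.Random(0)
true_count = 1          # the isolated user's single 'Yes'
scale = 5.0             # noise much larger than the value it hides

single = true_count + laplace(scale, rng)
repeated = [true_count + laplace(scale, rng) for _ in range(10_000)]
estimate = statistics.mean(repeated)

print(f"one noisy answer: {single:.2f}")
print(f"mean of {len(repeated)} repeats: {estimate:.2f}")
```

A single answer is dominated by noise, but the mean of independent repeats converges on the true value, which is why the isolation step has to be prevented rather than relying on noise alone.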


Isolation via Dropping Responses

[Diagram: publisher example.com between clients and aggregator. Labels: Enc(CEO), Enc(Student), Enc(Gardener); Enc(CEO), Enc(Student), Enc(Gardener); Enc(Driver), Enc(Mechanic), Enc(Driver)]

Result:
  – Mechanic: 1 + noise
  – Driver: 2 + noise
  – CEO: 1 + noise

User in the middle is a CEO!
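One way to read this slide, sketched in code: the publisher forwards only the targeted user's response, pads the batch with answers it generated itself, and then subtracts what it knows it added. The variable names are hypothetical, encryption is omitted, and the aggregator's own noise is left out to keep the arithmetic visible.

```python
from collections import Counter

client_responses = ["Student", "CEO", "Gardener"]   # what three clients sent
target = 1                                          # the user the publisher wants to isolate

# The publisher drops every response except the target's ...
kept = [client_responses[target]]
# ... and pads the batch with noise answers that it generated and therefore knows.
publisher_noise = ["Driver", "Mechanic", "Driver"]

# Aggregator side: count the forwarded answers (its own noise is left out here).
result = Counter(kept + publisher_noise)

# Publisher side: subtract the known noise; what remains is the target's answer.
leaked = result - Counter(publisher_noise)
print(result)
print(leaked)
```

With the aggregator's noise included, the leftover is "1 + noise" rather than exactly 1, but repeating the query lets the publisher average that noise away.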


Auditing

[Diagram: publisher example.com between clients and aggregator. Labels: Enc(CEO), Enc(Student); Enc(CEO), Enc(Student), Enc(nonce); Enc(Driver), Enc(Mechanic); Enc(Driver), Enc(nonce); Enc(example.com, nonce) (twice). Callout: "example.com nonce?"]
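A plausible reading of the diagram, offered as an assumption rather than the authors' exact protocol: a client occasionally sends a nonce through the publisher in place of an answer and separately reports the pair (publisher, nonce) to the aggregator, which then checks that the nonce actually arrived. How the report reaches the aggregator is not stated on the slide. The function name and the message tuples below are hypothetical, and all values are shown after decryption.

```python
import secrets


def missing_nonces(publisher: str, forwarded: list, reports: list) -> list:
    """Aggregator-side check: which nonces reported for `publisher`
    never showed up among the answers that publisher forwarded?"""
    seen = {value for kind, value in forwarded if kind == "nonce"}
    return [n for pub, n in reports if pub == publisher and n not in seen]


nonce = secrets.token_hex(8)
reports = [("example.com", nonce)]                    # Enc(example.com, nonce), decrypted

honest = [("answer", "CEO"), ("answer", "Student"), ("nonce", nonce)]
dropping = [("answer", "CEO")]                        # publisher dropped the rest

print(missing_nonces("example.com", honest, reports))    # empty list: audit passes
print(missing_nonces("example.com", dropping, reports))  # contains the nonce: audit fails
```

Since the publisher cannot tell an encrypted nonce from an encrypted answer, dropping responses selectively risks dropping a nonce and being caught.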


Outline

• Motivation & Goals
• Components & Assumptions
• Non-tracking Analytics
  – Publisher as Proxy
  – Noise
  – Yes-No Answer
  – Auditing
• Implementation & Evaluation
• Conclusion


Implementation

• 2000 lines of code in total
  – Client: Firefox extension
  – Publisher software: Piwik plugin
  – Aggregator software: simple server

• Deployed and tested with over 200 users

• RSA public key cryptosystem


Evaluation – Decryption Overhead

• Aggregator: 2.4 GHz CPU, 2048-bit key
• Publisher: 50K users, 2 sets of queries/week

1. Information currently provided
  – Demographics, other sites
  – 3.6 CPU hours/week

2. Information available through our system
  – # pages browsed, search engines, visit frequency to other sites
  – 3 CPU hours/week
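To put these figures in perspective, the arithmetic below converts them to a per-user cost and to the share of one CPU core kept busy. Only the numbers from the slide are used; the dictionary keys are shorthand labels chosen here.

```python
USERS = 50_000
HOURS_IN_WEEK = 7 * 24
CPU_HOURS_PER_WEEK = {
    "currently provided": 3.6,
    "available through the system": 3.0,
}

per_user_seconds = {k: h * 3600 / USERS for k, h in CPU_HOURS_PER_WEEK.items()}
core_share = {k: h / HOURS_IN_WEEK for k, h in CPU_HOURS_PER_WEEK.items()}

for label in CPU_HOURS_PER_WEEK:
    print(f"{label}: {per_user_seconds[label]:.3f} CPU-s per user per week, "
          f"{core_share[label]:.1%} of one core")
```

That works out to roughly 0.26 and 0.22 CPU-seconds per user per week, or about 2% of a single core for a 50K-user publisher.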


Evaluation – Client Overhead

• Bandwidth overhead
  – <100KB/week to download 11 queries
  – 8KB/week for all query responses

• CPU overhead for encryption
  – Google Chrome: 380 enc/sec
  – Firefox: 20 enc/sec
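A back-of-the-envelope reading of these numbers. The encryption rates and the 8KB figure come from the slide; the rest is assumption: that 1 KB is 1024 bytes and that each response is a single 2048-bit RSA ciphertext with no framing overhead (the 2048-bit key size is taken from the decryption-overhead slide).

```python
ENC_PER_SEC = {"Google Chrome": 380, "Firefox": 20}   # from the slide
RESPONSE_BYTES_PER_WEEK = 8 * 1024                    # 8KB/week, assuming 1 KB = 1024 bytes
CIPHERTEXT_BYTES = 2048 // 8                          # assumption: one 2048-bit ciphertext per answer

answers_per_week = RESPONSE_BYTES_PER_WEEK // CIPHERTEXT_BYTES

ms_per_encryption = {b: 1000 / r for b, r in ENC_PER_SEC.items()}
seconds_per_week = {b: answers_per_week / r for b, r in ENC_PER_SEC.items()}

for browser in ENC_PER_SEC:
    print(f"{browser}: {ms_per_encryption[browser]:.1f} ms per encryption, "
          f"{seconds_per_week[browser]:.2f} s per week for {answers_per_week} answers")
```

Under these assumptions a client encrypts a few dozen answers per week, which costs well under two seconds of CPU time even at the slower Firefox rate.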


Summary

• Extended analytics without tracking
  – Differential privacy guarantees for users
  – Aggregate information for publishers & aggregators

• No new organizational component

• Practical & feasible to deploy
