![Page 1: What’s the Gist? Privacy-Preserving Aggregation of User Profiles](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56813590550346895d9cfbdb/html5/thumbnails/1.jpg)
What’s the Gist? Privacy-Preserving Aggregation of User Profiles
Igor Bilogrevic (Google), Julien Freudiger (PARC), Emiliano De Cristofaro (UCL), Ersin Uzun (PARC)
Scott Kildall – Data Crystals
![Page 2: What’s the Gist? Privacy-Preserving Aggregation of User Profiles](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56813590550346895d9cfbdb/html5/thumbnails/2.jpg)
Data is the Crux of Internet Economy
Corporations seek personal data for better targeting
More data and more sensitive data
Data Brokers
Third Parties
Users
Credit card transactions, interests, political party, app usage, browsing history, mobility patterns, …
![Page 3: What’s the Gist? Privacy-Preserving Aggregation of User Profiles](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56813590550346895d9cfbdb/html5/thumbnails/3.jpg)
Issues with Current Approach
Privacy
What personal data is collected?
How much and how good is it?
Transparency
Who knows what about me? [1]
Where does this data come from?
Remuneration
Users value their data
Users don’t get money for it
Data Brokers: A Call for Transparency and Accountability, FTC, May 2014
[1] aboutthedata.com
![Page 4: What’s the Gist? Privacy-Preserving Aggregation of User Profiles](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56813590550346895d9cfbdb/html5/thumbnails/4.jpg)
“This question calls for Acxiom to provide information that would reveal business practices that are of a highly competitive nature. Acxiom cannot provide a list of each entity that has provided data from, or about, consumers to us.”
ACXIOM
![Page 5: What’s the Gist? Privacy-Preserving Aggregation of User Profiles](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56813590550346895d9cfbdb/html5/thumbnails/5.jpg)
![Page 6: What’s the Gist? Privacy-Preserving Aggregation of User Profiles](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56813590550346895d9cfbdb/html5/thumbnails/6.jpg)
Julian Oliver - 2013
![Page 7: What’s the Gist? Privacy-Preserving Aggregation of User Profiles](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56813590550346895d9cfbdb/html5/thumbnails/7.jpg)
An Emerging Model
Data Brokers
Third Parties
Users
Participatory Data Brokers
Benefits
Users retain control over who accesses what about them
Users decide what data can be monetized
Users get some revenue
![Page 8: What’s the Gist? Privacy-Preserving Aggregation of User Profiles](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56813590550346895d9cfbdb/html5/thumbnails/8.jpg)
“What if Facebook paid you? Several startups envision an era in which we are all the brokers, and beneficiaries, of our own personal data.”
David Zax, Is personal data the new currency? MIT Tech Review
You
![Page 9: What’s the Gist? Privacy-Preserving Aggregation of User Profiles](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56813590550346895d9cfbdb/html5/thumbnails/9.jpg)
Our Contribution
What’s the Gist?
Method for monetization of user personal data with privacy
Users choose what to share
Brokers are not required to be trustworthy
Idea
Rather than selling data as-is, monetize a model of the data
User data (age): User1 22, User2 56, User3 43, User4 33, …
Aggregate (age): pdf over age (axis ticks 20 to 60)
![Page 10: What’s the Gist? Privacy-Preserving Aggregation of User Profiles](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56813590550346895d9cfbdb/html5/thumbnails/10.jpg)
System Architecture
Parties: Third Party, Aggregator, Users
1. Query
2. Select users
3. Queries
4. Extract features
5. Noisy encrypted answers
6. Aggregate, decrypt, sample, and monetize
7. Answer
Interactive mode: customer queries for certain desired aggregates
Batch mode: aggregator prepares certain aggregates
![Page 11: What’s the Gist? Privacy-Preserving Aggregation of User Profiles](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56813590550346895d9cfbdb/html5/thumbnails/11.jpg)
Users – Profile Computation
Each user $i$ has a profile $p_i$ with $K$ attributes $\{a_{i,j}\}$, $j = 1, \dots, K$
Each element $a_{i,j}$ is an integer representing a value or a preference
User $i$: $p_i = [a_{i,1}, a_{i,2}, a_{i,3}, \dots, a_{i,K}]$
Example: $p_i = [28, 223, 5, 6, \dots, 2, 3]$ for the attributes (age, # of friends, action movies, drama movies, …, rock music, history books)
![Page 12: What’s the Gist? Privacy-Preserving Aggregation of User Profiles](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56813590550346895d9cfbdb/html5/thumbnails/12.jpg)
Users – Feature Computation
Features depend on the chosen probability model
For the Gaussian model, each user $i$ computes
$f_i = \{[a_{i,1}, a_{i,1}^2], \dots, [a_{i,K}, a_{i,K}^2]\}$
Example: $f_i = \{[28, 28^2], [223, 223^2], [5, 5^2], [6, 6^2], \dots, [2, 2^2], [3, 3^2]\}$ for the profile $p_i$ with attributes (age, # of friends, action movies, drama movies, …, rock music, history books)
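
The Gaussian feature step can be illustrated with a minimal Python sketch. The function name `gaussian_features` and the list-of-tuples layout are assumptions made for illustration; the slides only specify that each attribute contributes the pair $[a, a^2]$.

```python
from typing import List, Tuple


def gaussian_features(profile: List[int]) -> List[Tuple[int, int]]:
    """Return the pair (a, a^2) for every attribute a of one user's profile."""
    return [(a, a * a) for a in profile]


# Visible entries of the example profile from the slide:
# age, # of friends, action movies, drama movies, rock music, history books.
example_profile = [28, 223, 5, 6, 2, 3]
example_features = gaussian_features(example_profile)
```

Sending both the value and its square is what later lets the aggregator recover a mean and a variance from sums alone.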
![Page 13: What’s the Gist? Privacy-Preserving Aggregation of User Profiles](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56813590550346895d9cfbdb/html5/thumbnails/13.jpg)
Private Aggregation
Privacy
Differentially private noise $r_i$ prevents the aggregator from deducing user data [1]
Security
Aggregator can only decrypt the sum
No shared secret, no pairwise distributed computations
[Figure: User 1, …, User i, …, User n send their contributions to the Aggregator; panels labeled Assume, Knows, Computes]
[1] E. Shi et al. Privacy-Preserving Aggregation of Time-Series Data. NDSS, 2011
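
To make the "aggregator can only decrypt the sum" property concrete, here is a toy Python sketch. It is a simplified additive-masking stand-in, not the construction of Shi et al. and not the scheme used in the talk. The modulus, the key dealer in `make_keys`, the two-sided geometric noise, and all function names are assumptions for illustration, and `random.Random` is not a cryptographic generator.

```python
import math
import random
from typing import List, Tuple

MODULUS = 2**61 - 1  # hypothetical modulus, assumed much larger than any real sum


def two_sided_geometric(rng: random.Random, epsilon: float, sensitivity: int) -> int:
    """Integer noise r_i, drawn as the difference of two geometric variables."""
    log_alpha = -epsilon / sensitivity  # alpha = exp(-epsilon / sensitivity)

    def geometric() -> int:
        # rng.random() is in [0, 1), so 1 - u is in (0, 1] and the log is <= 0.
        return int(math.floor(math.log(1.0 - rng.random()) / log_alpha))

    return geometric() - geometric()


def make_keys(rng: random.Random, n: int) -> Tuple[List[int], int]:
    """Toy key setup: user keys plus an aggregator key that cancels their sum."""
    user_keys = [rng.randrange(MODULUS) for _ in range(n)]
    aggregator_key = (-sum(user_keys)) % MODULUS
    return user_keys, aggregator_key


def encrypt(value: int, noise: int, key: int) -> int:
    """Mask one noisy value; a single ciphertext alone looks uniformly random."""
    return (value + noise + key) % MODULUS


def aggregate(ciphertexts: List[int], aggregator_key: int) -> int:
    """Unmask the sum of all noisy values; individual values stay hidden."""
    total = (sum(ciphertexts) + aggregator_key) % MODULUS
    # Decode to a signed integer so that a negative noisy sum is handled.
    return total - MODULUS if total > MODULUS // 2 else total
```

The masks cancel only when every ciphertext is summed, which is why the aggregator learns the noisy total and nothing about a single contribution. Unlike the toy dealer used here, the slides state that the real scheme needs no shared secret.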
![Page 14: What’s the Gist? Privacy-Preserving Aggregation of User Profiles](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56813590550346895d9cfbdb/html5/thumbnails/14.jpg)
Aggregator – Gaussian Approximation
Entities contribute
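$\mathrm{Enc}[a_1], \mathrm{Enc}[a_1^2], \dots, \mathrm{Enc}[a_i], \mathrm{Enc}[a_i^2]$
Broker aggregates to compute mean $\mu$ and variance $\sigma^2$
Obtains Gaussian approximation $N(\mu, \sigma^2)$ for each attribute
[Figure: pdf of the Gaussian approximation $N(\mu, \sigma^2)$ over age]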
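
A short Python sketch of how the two aggregated sums yield the Gaussian parameters, using the standard identities $\mu = \frac{1}{n}\sum_i a_i$ and $\sigma^2 = \frac{1}{n}\sum_i a_i^2 - \mu^2$. The function names are hypothetical, and clamping the variance at zero is an assumption added here because noisy sums could otherwise produce a slightly negative value.

```python
import math
from typing import Tuple


def gaussian_from_sums(sum_a: float, sum_a_sq: float, n: int) -> Tuple[float, float]:
    """Recover (mean, variance) from the sum of values and the sum of squares."""
    mean = sum_a / n
    variance = sum_a_sq / n - mean * mean
    return mean, max(variance, 0.0)  # clamp: an assumption for noisy inputs


def gaussian_pdf(x: float, mean: float, variance: float) -> float:
    """Density of N(mean, variance) at x; requires variance > 0."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)


# The four ages shown earlier in the slides (22, 56, 43, 33), without noise.
ages = [22, 56, 43, 33]
mu, sigma_sq = gaussian_from_sums(sum(ages), sum(a * a for a in ages), len(ages))
```

Only the two sums and the number of users are needed, so the aggregator never has to see an individual value.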
![Page 15: What’s the Gist? Privacy-Preserving Aggregation of User Profiles](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56813590550346895d9cfbdb/html5/thumbnails/15.jpg)
Aggregator - Attribute Ranking
Assumption
Attributes with a uniform distribution reveal less information about individual entities
Measure divergence
Distance between two probability distributions
Jensen-Shannon (JS) divergence
A small JS distance from the uniform distribution means low value
Uniform distribution
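
The ranking idea can be sketched in Python as follows. This assumes each attribute's distribution has already been discretized into a histogram of probabilities that sum to 1; the helper names and the base-2 logarithm are choices made for this illustration, not details given in the slides.

```python
import math
from typing import Dict, List, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits, between 0 and 1."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def js_from_uniform(p: Sequence[float]) -> float:
    """JS divergence between p and the uniform distribution on the same bins."""
    uniform = [1.0 / len(p)] * len(p)
    return js_divergence(p, uniform)


def rank_attributes(histograms: Dict[str, Sequence[float]]) -> List[str]:
    """Attribute names ordered from most to least divergent from uniform."""
    return sorted(histograms, key=lambda name: js_from_uniform(histograms[name]), reverse=True)
```

For example, `rank_attributes({"a": [0.7, 0.1, 0.1, 0.1], "b": [0.25, 0.25, 0.25, 0.25]})` places the hypothetical attribute `"a"` first, since a perfectly uniform histogram has zero divergence.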
![Page 16: What’s the Gist? Privacy-Preserving Aggregation of User Profiles](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56813590550346895d9cfbdb/html5/thumbnails/16.jpg)
Performance
Dataset and implementation
100,000 real users from U.S. Census [data.gov, July 2013]
3 types of attributes (income, education, age)
Java, measurements on Core i5 2.53 GHz, 8 GB RAM
Metrics
Accuracy of Gaussian approximation
Information leakage for each attribute
Revenue
Overhead
![Page 17: What’s the Gist? Privacy-Preserving Aggregation of User Profiles](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56813590550346895d9cfbdb/html5/thumbnails/17.jpg)
[Figure: panels for income, education, and age at 100 users, 1,000 users, and 100,000 users]
![Page 18: What’s the Gist? Privacy-Preserving Aggregation of User Profiles](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56813590550346895d9cfbdb/html5/thumbnails/18.jpg)
Gaussian Approximations
Accuracy improves quickly with the number of users (100 users already give a good fit)
Fit for income and age is 3x better than for education
![Page 19: What’s the Gist? Privacy-Preserving Aggregation of User Profiles](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56813590550346895d9cfbdb/html5/thumbnails/19.jpg)
Information Leakage vs Uniform
Maximum information leakage achieved at about 1,000 users
Information leakage does not necessarily increase with the number of users (it is stable after a while)
Larger user samples do not necessarily provide better discriminating features
![Page 20: What’s the Gist? Privacy-Preserving Aggregation of User Profiles](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56813590550346895d9cfbdb/html5/thumbnails/20.jpg)
Revenue Model
Value of user information: from $0.0005[2] to $33[1]
Where w=0.1 is the commission.
[1] J. P. Carrascal, C. Riederer, V. Erramilli, M. Cherubini, and R. de Oliveira. Your browsing behavior for a big mac: Economics of personal information online. WWW, 2013
[2] L. Olejnik, T. Minh-Dung, and C. Castelluccia. Selling off privacy at auction. NDSS, 2014
![Page 21: What’s the Gist? Privacy-Preserving Aggregation of User Profiles](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56813590550346895d9cfbdb/html5/thumbnails/21.jpg)
Revenue per Attribute
Three privacy sensitivity distributions
User revenue is small and does not increase with the number of participants
Revenue similar to Amazon Mechanical Turk
Broker incentivized to collect as many users as possible ($0.07 to $2897)
Third parties incentivized to select a demographic group of size 100
![Page 22: What’s the Gist? Privacy-Preserving Aggregation of User Profiles](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56813590550346895d9cfbdb/html5/thumbnails/22.jpg)
Overhead
User
1 ms total
Independent of number of users
Aggregator
1.5 min for 100 users
27.7 h for 100,000 users
Can and should be parallelized
![Page 23: What’s the Gist? Privacy-Preserving Aggregation of User Profiles](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56813590550346895d9cfbdb/html5/thumbnails/23.jpg)
Related Work
Privacy-preserving aggregation
Modified version of the Paillier encryption scheme [1,2]
But P2P communications between participants
Homomorphic encryption and differential privacy [3,4]
But differential privacy by third party and contributions linkable to users before aggregation
[1] Z. Erkin and G. Tsudik. Private computation of spatial and temporal power consumption with smart meters. ACNS, 2012
[2] J. Shi, R. Zhang, Y. Liu, and Y. Zhang. PriSense: privacy-preserving data aggregation in people-centric urban sensing systems. INFOCOM, 2010
[3] R. Chen, I. E. Akkus, and P. Francis. SplitX: high-performance private analytics. SIGCOMM, 2013
[4] R. Chen, A. Reznichenko, P. Francis, and J. Gehrke. Towards statistical queries over distributed private user data. NSDI, 2012
![Page 24: What’s the Gist? Privacy-Preserving Aggregation of User Profiles](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56813590550346895d9cfbdb/html5/thumbnails/24.jpg)
Related Work
Privacy-preserving monetization
Local user profile generation, categorization, and ad selection[1,2]
Anonymizing proxies to shield users’ behavioral data from third parties[3]
[1] V. Toubiana, A. Narayanan, D. Boneh, H. Nissenbaum, and S. Barocas. Adnostic: Privacy preserving targeted advertising. NDSS, 2010
[2] S. Guha, B. Cheng, and P. Francis. Privad: practical privacy in online advertising. NSDI, 2011
[3] C. Riederer, V. Erramilli, A. Chaintreau, B. Krishnamurthy, and P. Rodriguez. For sale: your data: by: you. HotNets, 2011
![Page 25: What’s the Gist? Privacy-Preserving Aggregation of User Profiles](https://reader036.vdocuments.mx/reader036/viewer/2022070400/56813590550346895d9cfbdb/html5/thumbnails/25.jpg)
Conclusion
Designed a method to monetize sensitive data with privacy
If data is the new currency, we are creating the marketplace
Evaluation shows practical performance, good accuracy with as few as 100 users, and good incentives for the parties involved
Future work
Enhance security features (range checks to thwart pollution attacks, fault tolerance, efficient key establishment)
Enable targeting of users after aggregation
Enable subsequent collection of more than the model (i.e., black swan)