privacy-preserving data mining in the fully …our work •[wy04,yw05]: privacy-preserving...
TRANSCRIPT
![Page 1: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/1.jpg)
Privacy-Preserving Data Mining in theFully Distributed Model
Rebecca WrightStevens Institute of Technology
www.cs.stevens.edu/~rwrightMADNES ‘05
22 September, 2005
(Includes joint work with Zhiqiang Yang, ShengZhong, Geetha Jagannathan, Hiran Subramaniam,Onur Kardes, Rafi Ryger, and Joan Feigenbaum).
![Page 2: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/2.jpg)
The Data Revolution
• The current data revolution is fueled by advances innetworking and computing devices, as well as theperceived, actual, and potential usefulness of the data.
• Most electronic and physical activities leave some kindof data trail. These trails can provide useful informationto various parties.
• However, there are also concerns about appropriatehandling and use of sensitive information.
• Privacy-preserving methods of data handling seek toprovide sufficient privacy as well as sufficient utility.
![Page 3: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/3.jpg)
Advantages of Privacy Protection• protection of personal information
• protection of proprietary or sensitive information
• enables collaboration between different dataowners (since they may be more willing or able tocollaborate if they need not reveal theirinformation)
• compliance with legislative policies
![Page 4: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/4.jpg)
Outline
• Models for distributed data mining
• Overview of our work in privacy-preserving datamining
• Privacy-preserving classification via frequencymining in the fully distributed model
• Privacy-preserving k-anonymization in the fullydistributed model
• Future work and conclusions
![Page 5: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/5.jpg)
Models for Distributed Data Mining, I
• Horizontally Partitioned • Vertically Partitioned
… … …
… …
… …
P1
P2
P3
… … … … … … …
P1 P2
…
… … … … … … …
…
![Page 6: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/6.jpg)
Models for Distributed Data Mining, II
… … … … … … …
P1 P2
…
• Arbitrarily partitioned
![Page 7: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/7.jpg)
Models for Distributed Data Mining, III
• Fully Distributed • Client/Server(s)
CLIENT
Wishes tocompute
on servers’data
SERVER(S)
Each holdsdatabase
…
…
…
…
…
…
P1
P2
P3
Pn-1
Pn
![Page 8: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/8.jpg)
Cryptography vs. Randomization
inefficiency
privacy loss
inaccuracy
randomization approach
cryptographic approach
![Page 9: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/9.jpg)
Cryptography vs. Randomization
inefficiency
privacy loss
inaccuracy
randomization approach
cryptographic approach
![Page 10: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/10.jpg)
Our Work• [WY04,YW05]: privacy-preserving construction of Bayesian
networks from vertically partitioned data.
• [YZW05]: frequency mining and classification in the fullydistributed model (naïve Bayes classification, decision trees, andassociation rule mining).
• [JW05]: privacy-preserving k-means clustering for arbitrarilypartitioned data.
• [ZYW05]: solutions for a data publisher to learn a k-anonymized version of a fully distributed database withoutlearning anything else.
• [YWZ05b]: anonymous data collection in the fully distributedmodel.
![Page 11: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/11.jpg)
Outline
• Models for distributed data mining
• Overview of our work in privacy-preserving datamining
• Privacy-preserving classification via frequencymining in the fully distributed model
• Privacy-preserving k-anonymization in the fullydistributed model
• Future work and conclusions
![Page 12: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/12.jpg)
Privacy-Preserving Frequency [YZW05]
• Cryptographic primitive for computingfrequency on encrypted inputs in the fullydistributed setting.
• Many data mining classifiers rely only onfrequencies: e.g. naïve Bayes classifier, ID3trees.
![Page 13: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/13.jpg)
Our Setting
user user user user
miner
• Privacy Goal:
– Miner learns nothing (else) about the users’ inputs.
ClassifierFrequencies
![Page 14: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/14.jpg)
Privacy-Preserving Frequency Mining
• Setting:
– n users U1, …Un and each has a binary input denotedby di (either 0 or 1)
– A miner wants to compute d = Σ di
• Privacy goal:
– The miner learns nothing (else) about the users’inputs.
![Page 15: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/15.jpg)
Privacy-Preserving Frequency Mining
• Protocol key setup:
– Public keys for encryption 〈 X, Y 〉
– Each user Ui keeps her private keys (xi, yi)
• Data submission:
– Each user Ui sends one flow of data to the miner: EX,Y(di)
• Miner locally computes the frequency fromusers’ submitted data
![Page 16: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/16.jpg)
ElGamal Encryption• Public encryption 1. Key pair (pub, priv) 2. Encryption algorithm with public key, E(a) 3. Decryption algorithm with private key, D(•)• Additive homomorphic property E(a)E(b)=E(a+b)
(ayr, gr)
(G, g, y=gxi)
(G, g, xi)
c=(gr)xi
a=ayrc-1
• Based on discrete logarithm problem: give 〈G, g, y〉 where y=gx, it is computationally infeasible to compute x.
![Page 17: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/17.jpg)
Protocol Parameter Setup
• Public key parameters 〈G, g〉
• Each user Ui has two pairs of keys
(xi, Xi=gxi) (yi, Yi=gyi)
• Public keys for encryption 〈X=∏Xi , Y= ∏Yi 〉
• Each user Ui keeps private her private keys (xi, yi)
![Page 18: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/18.jpg)
Protocol of Frequency Mining
• Each user i sends one flow of data to the miner:
EX,Y(di) = (mi, hi) where
mi = gdi • Xyi
hi = Yxi
• The miner computes r = ∏ (mi / hi) and thencomputes d from r.
![Page 19: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/19.jpg)
Frequency Mining Protocol
d1
di
dn
EX,Y (di)
EX,Y (d1)
EX,Y (dn)
Each user Ui sends the miner EX,Y(di)
d = ∑ di
![Page 20: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/20.jpg)
Correctness and Privacy
• Correctness:
– The protocol computes the sum of each input.
• Privacy:
– The protocol protects each honest user’s privacyagainst the miner and up to n – 2 corrupted users.
– Proof based on the semantic security of ElGamalencryption.
€
{M(d,[di,xi,yi]i ∈ I,[Xj,Yi] j ∉ I)}≡c{viewminer,{Ui} i ∈ I([di,xi,yi]i=1
n )}
![Page 21: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/21.jpg)
Experimental Results - Primitive
0.05 0.05
0.11
0.14
0.08
0
0.04
0.08
0.12
0.16
2000 4000 6000 8000 10000
Number of Customers
Tim
e (
Sec
on
ds)
![Page 22: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/22.jpg)
Experimental Results - Bayes Classifier
![Page 23: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/23.jpg)
Outline
• Models for distributed data mining
• Overview of our work in privacy-preserving datamining
• Privacy-preserving classification via frequencymining in the fully distributed model
• Privacy-preserving k-anonymization in the fullydistributed model
• Future work and conclusions
![Page 24: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/24.jpg)
ColitisNo Allergy0703008-01-40
DiphtheriaSulfur0702908-02-57
PolioNo Allergy0703011-12-39
StrokeNo Allergy0702808-02-57
PharyngitisPenicillin0703003-24-79
History of IllnessAllergyZip CodeDate of Birth
Medical ResearchDatabase
SensitiveInformation
Anonymization to Protect Privacy?
![Page 25: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/25.jpg)
ColitisNo Allergy0703008-01-40
DiphtheriaSulfur0702908-02-57
PolioNo Allergy0703011-12-39
StrokeNo Allergy0702808-02-57
PharyngitisPenicillin0703003-24-79
History of IllnessAllergyZip CodeDate of Birth
I know Victor is in this table, and I knowhis birthday is 08-02-57 and he lives inthe 07028… Now I’ve learned he has ahistory of stroke!
StrokeNo Allergy0702808-02-57
Quasi-identifiers
Risk of Re-identification
![Page 26: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/26.jpg)
ColitisNo Allergy07030*
DiphtheriaSulfur0702*08-02-57
PolioNo Allergy07030*
StrokeNo Allergy0702*08-02-57
PharyngitisPenicillin07030*
History of IllnessAllergyZip CodeDate of Birth
2-anonymous table
Idea: Make re-identification more difficult, by ensuringthat there are at least k records with a given quasi-id.
k-Anonymity [SS98]
![Page 27: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/27.jpg)
Property of k-Anonymous Table
• Each value of quasi-identifier attributes appears ≥k times in the table (or it does not appear at all)
• Each row of the table is hidden in ≥ k rows
• Each person involved is hidden in ≥ k peers
![Page 28: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/28.jpg)
ColitisNo Allergy07030*
DiphtheriaSulfur0702*08-02-57
PolioNo Allergy07030*
StrokeNo Allergy0702*08-02-57
PharyngitisPenicillin07030*
History of IllnessAllergyZip CodeDate of Birth
StrokeNo Allergy0702*08-02-57
DiphtheriaSulfur0702*08-02-57
Which of them is Victor’s record?Confusing…
k-Anonymity May Protect Privacy
![Page 29: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/29.jpg)
Centralized k-Anonymization
• k-anonymity has been extensively studied[Sam01, Swe02a, Swe02b, MW04, AFK+04, BA05].
• Existing k-anonymization algorithms assume asingle party that has access to the entireoriginal table.
• We remove the need for this assumption,thereby providing additional privacy.
![Page 30: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/30.jpg)
user user user user
minerk-anonymous
table
Distributed k-Anonymization
• N customers + 1 publisher (or miner)
• Each user/respondent/customer has her personal data,comprising a row of the table.
![Page 31: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/31.jpg)
user user user user
minerk-anonymous
table
Distributed k-Anonymization [ZYW05]
Privacy Goal: The miner should not be able toassociate sensitive information in the table withthe corresponding customer.
![Page 32: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/32.jpg)
Hiding quasi-identifierssuppressed by MW algorithm
MW algorithm[MW04]
Formulation 2
Hiding sensitive informationoutside this part
Extracting the k-anonymous part
Formulation 1
Privacy Protectionk-AnonymizationMethod
ProblemFormulation
Formulations of Problem
• We can formulate the problem differently byconsidering different methods to k-anonymizethe table.
• We consider two formulations:
![Page 33: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/33.jpg)
ColitisNo Allergy0703011-12-39
ColitisNo Allergy0703011-12-39
DiphtheriaSulfur0702908-02-57
PharyngitisPenicillin0703003-24-79
StrokeNo Allergy0702908-02-57
History of IllnessAllergyZip CodeDate of Birthk-anonymouspart: largestsubset of rows thatis k-anonymous
Privacy Guarantee: The miner should not learn theprivacy-sensitive attributes of the rows outside thek-anonymous part.
Problem Formulation 1
![Page 34: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/34.jpg)
Basic Idea of Solution
• Each user encrypts her sensitive attributes using anencryption key that can be derived by the miner if andonly if there are ≥ k rows whose quasi-identifiers areequal. No user knows others’ encryption key.
• ⇒ the miner is able to see the sensitive attributes if andonly if there are ≥ k users whose quasi-identifiers areequal.
• Uses (2N, k)-Shamir secret sharing, hash functions, andElGamal-like threshold encryption with keyreconstruction via La Grange interpolation in theexponent.
![Page 35: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/35.jpg)
Degree-(k-1)polynomial
Point that decidesencryption key
Point submitted tominer
Illustration of Solution
![Page 36: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/36.jpg)
Summary of Formulation 1
• Miner learns only those rows of the table for whichthe same quasi-identifier appears at least k times.
• Therefore, this solution is mainly applicable when thedata is already close to k-anonymous.
• Users may agree to generalize data before submissionin order to increase likelihood of k-anonymity.
– For example, use birth year not entire birth date.
![Page 37: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/37.jpg)
Formulation 2
• In this formulation, the miner learns a table thatsuppresses some quasi-ids of the original table inorder to k-anonymize it.
• Our solution is a distributed privacy-preservingversion of the [MW04] algorithm.
• It protects miner from learning the values of thesuppressed quasi-ids. (Miner does learn certainadditional information.)
![Page 38: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/38.jpg)
ColitisNo Allergy0703008-01-40
DiphtheriaSulfur0703008-02-57
PolioNo Allergy0703011-12-39
StrokeNo Allergy0703008-02-57
PharyngitisPenicillin0703003-24-79
History of IllnessAllergyZip CodeDate of Birth
ColitisNo Allergy07030*
DiphtheriaSulfur0703008-02-57
PolioNo Allergy07030*
StrokeNo Allergy0703008-02-57
PharyngitisPenicillin07030*
History of IllnessAllergyZip CodeDate of Birth
MW algorithm
Privacy Guarantee: Theminer should not learn thequasi-identifierssuppressed by MWalgorithm.
Problem Formulation 2
![Page 39: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/39.jpg)
MW Algorithm
• Phase 1: Compute the distance between each pair ofrows.
– The distance between two rows is the number of quasi-idattributes in which they have different entries
• Phase 2: Compute a k-partition of the table.
– A k-partition is a collection of disjoint subsets of rows in whicheach subset contains at least k rows and the union of thesesubsets is the entire table.
• Phase 3: Compute the k-anonymized table.
– Replace any differing entries in each k-partition with *’s.
![Page 40: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/40.jpg)
Private Distributed Algorithm
• Phase 1: Compute the distance between each pair ofrows.
– We give a protocol for the miner to learn these distanceswithout learning the quasi-ids.
• Phase 2: Compute a k-partition of the table.
– Miner can do this locally using distances from Phase 1.
• Phase 3: Compute the k-anonymized table.
– We give a protocol by which the miner learns the quasi-idsthat do not need to be suppressed.
![Page 41: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/41.jpg)
Summary of Formulation 2
• This solution is a distributed privacy-preservingversion of the MW algorithm.
• It works well even if the original table is not closeto k-anonymous.
• However, the users must be willing to allow theminer to learn the inter-row distances.
![Page 42: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/42.jpg)
Ongoing Work
• Experimental platform for privacy-preserving datamining:
– [SYW04]: Experimental analysis of secure scalar product.
– [KRWF#]: Experimental analysis of [WY04,WY05] Bayesnetwork algorithm.
• Enforce policies about what kind of queries orcomputations on data are allowed:
– [JW#]: Extends private inference control of [WS04] to workwith more complex query functions. Client learns query resultif and only if inference rule is met. (Neither client nor serverlearns anything else).
![Page 43: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/43.jpg)
Other Future Directions
• Preprocessing of data for PPDM.
• Privacy-preserving data solutions that use bothrandomization and cryptography in order to gainsome of the advantages of both.
• Policies for privacy-preserving data mining:languages, reconciliation, and enforcement.
• Incentive-compatible privacy-preserving datamining.
![Page 44: Privacy-Preserving Data Mining in the Fully …Our Work •[WY04,YW05]: privacy-preserving construction of Bayesian networks from vertically partitioned data. •[YZW05]: frequency](https://reader033.vdocuments.mx/reader033/viewer/2022041715/5e4b2c52cfb7b944a0745450/html5/thumbnails/44.jpg)
Conclusions
• Increasing use of computers and networks has led to aproliferation of sensitive data.
• Without proper precautions, this data could be misused.
• Many technologies exist for supporting proper datahandling, but much work remains, and some barriers mustbe overcome in order for them to be deployed.
• Cryptography is a useful component, but not the wholesolution.
• Technology, policy, and education must work together.