
Privacy-preserving rule mining

Upload: raymond-dixon

Post on 14-Jan-2016


Page 1: Privacy-preserving rule mining. Outline  A brief introduction to association rule mining  Privacy preserving rule mining Single party  Perturbation

Privacy-preserving rule mining


Outline

- A brief introduction to association rule mining
- Privacy-preserving rule mining
  - Single party
    - Perturbation (publishing)
    - Encryption (outsourcing)
  - Distributed multiparty
    - Cryptographic protocols (collaborative mining)
- Hiding sensitive rules


Association Rule Mining

- Transactional datasets
  - A transaction t = {a, b, c, …}
  - a, b, c, … are called items
  - The length of a transaction is its number of items
  - Transaction length can vary
- Equivalent representation
  - Let I be the set of all items
  - A transaction t can be transformed into a boolean vector of length |I|
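A minimal Python sketch of the boolean-vector representation. The item universe `ITEMS` and the helper name `to_boolean_vector` are illustrative assumptions, not something the slides define.

```python
# Hypothetical item universe I, in a fixed order (an assumption for illustration).
ITEMS = ["a", "b", "c", "d", "e"]

def to_boolean_vector(transaction, items=ITEMS):
    """Return a 0/1 list of length |I|; position i is 1 if items[i] is in the transaction."""
    return [1 if item in transaction else 0 for item in items]

t = {"a", "c", "d"}
vector = to_boolean_vector(t)
print(vector)  # [1, 0, 1, 1, 0]
```

The number of 1s in the vector equals the transaction length, so variable-length transactions all map to vectors of the same fixed length |I|.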


Association Rule Mining

- Rule mining
  - Goal: find the frequent itemsets
  - Some itemsets, e.g. {a,b,c}, appear frequently, i.e. with support above a given threshold
  - Rules can be derived from a frequent itemset: A → B
  - If {a,b,c} is frequent, then candidate rules are ab → c, a → bc, …
- Metrics
  - Support = # of occurrences of the itemset / total # of transactions, i.e., Pr(A, B)
  - Confidence = # of occurrences of the itemset / # of occurrences of the left-hand side of the rule, i.e., the conditional probability Pr(B | A)
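The two metrics can be sketched in a few lines of Python. The function names and the toy transactions below are assumptions made for illustration only.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Estimate Pr(rhs | lhs) as support(lhs plus rhs) / support(lhs)."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

# Toy data (hypothetical): four transactions over the items a, b, c.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
print(support({"a", "b"}, transactions))            # 0.5
print(confidence({"a", "b"}, {"c"}, transactions))  # 0.5
```

Note that `confidence` divides by the support of the left-hand side, so it is only defined when that side occurs at least once.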


Example

- E.g., a and b appear together 100 times, while a, b and c appear together 50 times, in a total of 5000 transactions
  - Support of ab → c = 50/5000 = 0.01
  - Confidence of ab → c = 50/100 = 0.5


Algorithms

- Apriori observation: if itemset A is a part of B, then support(A) >= support(B)
- Steps in finding frequent itemsets:
  - Start from single-item sets, pruning the itemsets that have support < threshold
  - Given a set of frequent k-itemsets, expand it to (k+1)-itemsets and check their supports
  - Use data structures such as a hash tree to speed up the counting process
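The level-wise search can be sketched as follows. This is a simplified illustration, not an optimized implementation: it counts support with a plain scan instead of a hash tree, and the function name `apriori` and the toy data are assumptions.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset: support} for every itemset with support >= min_support."""
    n = len(transactions)

    def sup(candidate):
        # Plain linear scan; a hash tree would speed this up on large data.
        return sum(1 for t in transactions if candidate <= t) / n

    items = {item for t in transactions for item in t}
    level = {frozenset([i]): sup(frozenset([i])) for i in items}
    level = {c: s for c, s in level.items() if s >= min_support}

    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori observation): every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c: sup(c) for c in candidates}
        level = {c: s for c, s in level.items() if s >= min_support}
    return frequent

# Toy data (hypothetical).
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
for itemset, s in sorted(apriori(transactions, 0.5).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), s)
```

With a threshold of 0.5, all single items and all pairs survive, while {a,b,c} (support 0.4) is rejected at the third level.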


Algorithm

- Generating rules with a confidence threshold
  - Confidence(A → B) = P(B|A) = support(A ∪ B) / support(A)
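A hedged sketch of rule generation from already-computed supports. The `supports` table is hypothetical toy data, and `generate_rules` is an illustrative name; the only formula used is the confidence definition given here.

```python
from itertools import combinations

def generate_rules(supports, min_confidence):
    """Return (lhs, rhs, confidence) for every split of a frequent itemset that passes the threshold."""
    rules = []
    for itemset, s in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty left and right side
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = s / supports[lhs]  # support(A ∪ B) / support(A)
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, conf))
    return rules

# Hypothetical supports of frequent itemsets (every subset of a frequent set is present).
supports = {
    frozenset("a"): 0.8, frozenset("b"): 0.8, frozenset("c"): 0.8,
    frozenset("ab"): 0.6, frozenset("ac"): 0.6, frozenset("bc"): 0.6,
}
for lhs, rhs, conf in generate_rules(supports, 0.7):
    print(sorted(lhs), "->", sorted(rhs), round(conf, 2))
```

Because of the Apriori observation, every subset of a frequent itemset is also frequent, so the lookup `supports[lhs]` is always available when the table holds all frequent itemsets.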


Centralized approaches

- Two types of methods (categorical data)
  - Perturbation
  - Encryption


Perturbation

- Papers
  1. "Privacy Preserving Mining of Association Rules," SIGKDD 2002
  2. "Maintaining data privacy in association rule mining," VLDB 2002
  3. "A framework for high-accuracy privacy-preserving mining," ICDE 2005


Basic ideas

- Consider a transaction as a boolean bit vector
- Perturb each bit with a certain method
  - Paper 1: randomly select j items from t; then each of the remaining items is selected with probability p
  - Paper 2: each bit keeps its original value with probability p and is flipped with probability 1-p
  - Paper 3: unifies the methods with a perturbation matrix
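A minimal sketch of the bit-flipping idea attributed to Paper 2, operating on the boolean-vector form of a transaction. The function name `perturb` and the seed are assumptions for illustration; this is not code from any of the cited papers.

```python
import random

def perturb(bits, p, rng=random):
    """Keep each bit with probability p; flip it with probability 1 - p."""
    return [b if rng.random() < p else 1 - b for b in bits]

original = [1, 0, 1, 1, 0]
rng = random.Random(0)  # fixed seed so repeated runs give the same perturbed vector
print(perturb(original, 0.9, rng))
```

With p = 1 the data is published unchanged, and with p = 0.5 every published bit is an unbiased coin flip that carries no information about the original.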


The key is…

- After you perturb the data, you should still be able to find the supported rules correctly.
- Accuracy is traded off against the intensity of perturbation (controlled by p)
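To illustrate why the rules stay discoverable, here is a sketch for the simplest case only: a single item under the bit-flipping scheme (keep with probability p, flip with probability 1-p). If the true support is s, the expected observed support is s·p + (1-s)·(1-p), which can be inverted. The function names are assumptions, and larger itemsets need the more general machinery of the cited papers.

```python
def expected_observed_support(s, p):
    """Expected fraction of 1s after flipping: true 1s that are kept plus true 0s that are flipped."""
    return s * p + (1 - s) * (1 - p)

def estimate_original_support(observed, p):
    """Invert the relation above; undefined at p = 0.5, where the data is pure noise."""
    if p == 0.5:
        raise ValueError("p = 0.5 destroys all information about the original bits")
    return (observed - (1 - p)) / (2 * p - 1)

observed = expected_observed_support(0.3, 0.9)
print(observed)
print(estimate_original_support(observed, 0.9))
```

As p approaches 0.5 the divisor 2p - 1 approaches zero, so sampling noise in the observed support is amplified; this is the accuracy side of the tradeoff.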


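Methods for discovering the original support

- Paper 1 uses the relationship between "partial supports" to find the original support
- Concept of partial support: the fraction of the N transactions in T that contain exactly l items of A

$$\mathrm{sup}_l^{T}(A) = \frac{\#\{\, t \in T \mid \#(t \cap A) = l \,\}}{N}$$

- Probability that the length of the matched part changes from l to l', where t' is the perturbed version of t

$$p_k^m[l \to l'] = \Pr\left[\#(t' \cap A) = l' \mid \#(t \cap A) = l\right]$$

- The size of t is m; the size of itemset A is k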
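A small sketch of partial support, assuming transactions are Python sets; the function name `partial_supports` and the toy data are illustrative. Entry l of the result is the fraction of transactions that share exactly l items with A, so the last entry is the ordinary support of A.

```python
def partial_supports(transactions, itemset):
    """Return [sup_0, ..., sup_k], where sup_l is the fraction of transactions with exactly l items of the itemset."""
    itemset = set(itemset)
    counts = [0] * (len(itemset) + 1)
    for t in transactions:
        counts[len(itemset & t)] += 1
    return [c / len(transactions) for c in counts]

# Toy data (hypothetical).
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}, {"c"}]
print(partial_supports(transactions, {"a", "b"}))  # [0.2, 0.4, 0.4]
```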


Some results

Let s_i be the partial support sup_i(A) in the original data and s_i' be the partial support sup_i'(A) in the perturbed data

The matrices P and D are defined using only p[l → l']

1.

2.

- From 1, we can estimate the original support
- From 2, we can estimate the reliability (variance) of the support estimate, which is related to the perturbation rate p
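As a hedged sketch of what results 1 and 2 can look like: assuming each transaction is perturbed independently, the expectation and covariance of the perturbed partial supports follow directly from the definitions of partial support and p[l → l']. The vector notation, the Kronecker delta δ, and the entry-wise definitions of P and D[l] below are written out as assumptions and may differ from the exact form on the slide.

```latex
\begin{align}
\mathbb{E}[\vec{s}\,'] &= P\,\vec{s},
  & P_{l'l} &= p[l \to l'] \\
\operatorname{Cov}[\vec{s}\,'] &= \frac{1}{N}\sum_{l=0}^{k} s_l\, D[l],
  & D[l]_{ij} &= p[l \to i]\,\delta_{ij} - p[l \to i]\,p[l \to j]
\end{align}
```

Under these assumptions, inverting the first relation gives the estimate $\hat{\vec{s}} = P^{-1}\vec{s}\,'$ whenever P is invertible, and the second relation bounds how far that estimate can be trusted.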


Privacy

- Given an itemset A contained in the perturbed transaction t', what is the probability that an item a of A was really in the original transaction t?

$$\Pr[\, a \in t \mid A \subseteq t' \,]$$
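For intuition, the simplest instance of this probability can be computed with Bayes' rule. The sketch assumes a single-item itemset A = {a}, the bit-flipping scheme (keep with probability p), and a prior support s for the item; these assumptions and the function name are mine, not the general analysis from the cited papers.

```python
def posterior_item_in_original(s, p):
    """Pr[a in t | a in t'] for one item with prior support s under keep-probability p."""
    kept = s * p                    # a was in t and its bit was kept
    flipped_in = (1 - s) * (1 - p)  # a was absent and its bit was flipped to 1
    return kept / (kept + flipped_in)

# A rare item (support 1%) published with p = 0.9: seeing it in t' is weak evidence.
print(posterior_item_in_original(0.01, 0.9))
```

For a rare item most of its published occurrences come from flipped zeros, so the posterior stays low (roughly 0.08 in this run) even though 90% of bits are kept.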


Tradeoff between utility and privacy

- Lowest discoverable support: the smallest support whose estimate is still distinguishable from zero, given the variance of the support estimate


Encryption method

- Paper: "Security in Outsourcing of Association Rule Mining," VLDB 2007
- Substitution encryption
  - 1-1 substitution: a → 1, b → 2, …
  - 1-n substitution: a → {1,10}, b → {2,11,12}, …
- Problem
  - 1-1 substitution is weak
  - Arbitrary 1-n substitution does not work: the original rules cannot be recovered from the rules mined on the substituted items


The basic idea

- Fake items
  - The original n items, plus m additional fake items
- Define an "admissible 1-n mapping"
  - An arbitrary 1-n mapping may give irreversible results
  - E.g., a → {1,2}, b → {2}, c → {3}
  - If we find the frequent itemset {1,2,3} in the substituted set, is the right original itemset {a,c} or {a,b,c}?
- Admissible 1-n mapping
  - For each item, the mapped result must contain at least one unique substitute item that does not appear in any other item's mapping
  - E.g., a → {1,2}, b → {2}, c → {3} breaks the definition, while a → {1,2}, b → {2,4}, c → {3} is admissible
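The admissibility condition, as stated on this slide, can be checked mechanically. The sketch below encodes only that stated condition; the function name `is_admissible` is an assumption, and the paper may impose further requirements.

```python
def is_admissible(mapping):
    """True if every item's substitute set has at least one substitute used by no other item."""
    for item, substitutes in mapping.items():
        others = set().union(*(subs for other, subs in mapping.items() if other != item))
        if not (substitutes - others):
            return False  # every substitute of this item also appears in another item's mapping
    return True

print(is_admissible({"a": {1, 2}, "b": {2}, "c": {3}}))     # False: b has no unique substitute
print(is_admissible({"a": {1, 2}, "b": {2, 4}, "c": {3}}))  # True
```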


Recovering rules

- When we use admissible mappings, we are able to reverse the rules discovered on the substituted set
- E.g., if we find that {1,2,4} is a frequent set, check all mappings: {1,2} → a, {2,4} → b, so {1,2,4} → {a,b}
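A sketch of the encode and decode steps using the admissible mapping from the example. The names `MAPPING`, `encode` and `decode` are illustrative assumptions, and the decoding rule (an item is recovered when all of its substitutes are present) follows the slide's "check all mappings" description rather than the paper's exact procedure.

```python
# The admissible mapping from the example above.
MAPPING = {"a": {1, 2}, "b": {2, 4}, "c": {3}}

def encode(transaction, mapping=MAPPING):
    """Replace every original item by all of its substitute items."""
    return set().union(*(mapping[item] for item in transaction))

def decode(substituted_itemset, mapping=MAPPING):
    """Recover the original items whose whole substitute set is contained in the itemset."""
    return {item for item, subs in mapping.items() if subs <= substituted_itemset}

print(sorted(encode({"a", "b"})))   # [1, 2, 4]
print(sorted(decode({1, 2, 4})))    # ['a', 'b']
```

With the non-admissible mapping a → {1,2}, b → {2}, c → {3}, both {a,c} and {a,b,c} encode to {1,2,3}, which is exactly the ambiguity that admissibility rules out.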


Cost

- Additional cost
  - Generating the item mapping
  - Generating the transaction transformation
  - Significant
- Cost of rule mining
  - Both the number of items and the average transaction length increase, so the total cost increases


Features of encryption method

- Rules can be accurately recovered
- A tradeoff between cost and privacy
  - Privacy is better preserved with more fake items
  - More fake items result in higher additional cost
- Weaknesses of the VLDB 2007 paper
  - No strong proof of security
  - Does not mention any attacks; there may be attacks on this scheme


Distributed datasets

- Perturbation works, but existing encryption protocols do not
  - Some protocols might be needed to make the encryption method applicable
- Horizontally or vertically partitioned
  - Paper 1: "Privacy-Preserving Distributed Mining of Association Rules on Horizontally Partitioned Data"
  - Paper 2: "Privacy preserving association rule mining in vertically partitioned data"
  - Shared feature: both use cryptographic methods to construct protocols


Hiding sensitive rules

- When publishing data for rule mining, the rule itself can be sensitive too.
- Basic methods
  - Decrease the support
    - Support = # of occurrences of the itemset / total # of transactions
  - Decrease the confidence
    - Confidence = # of occurrences of the itemset / # of occurrences of the left-hand side of the rule
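One way to picture "decrease the support" is to delete a chosen item from supporting transactions until the sensitive itemset falls below the mining threshold. The sketch below is an illustration under that assumption; the function name, the `victim` parameter and the toy data are hypothetical, and published rule-hiding algorithms choose what to modify more carefully.

```python
def hide_by_lowering_support(transactions, sensitive, min_support, victim):
    """Return a copy of the data in which `sensitive` has support below min_support."""
    sensitive = set(sensitive)
    assert victim in sensitive, "the removed item must belong to the sensitive itemset"
    result = [set(t) for t in transactions]  # copy, so the owner's data is untouched
    supporting = [t for t in result if sensitive <= t]
    count = len(supporting)
    for t in supporting:
        if count / len(result) < min_support:
            break
        t.discard(victim)  # this transaction no longer supports the sensitive itemset
        count -= 1
    return result

# Toy data (hypothetical): hide {a, b} for a mining threshold of 0.4.
data = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"c"}, {"a"}]
published = hide_by_lowering_support(data, {"a", "b"}, 0.4, victim="b")
print(published)
```

Deleting `b` also lowers the support of every other itemset that contains `b`, which is why non-sensitive rules can be damaged as a side effect.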


Discussion on rule hiding

- Requires a considerable amount of computation at the data owner side
- You need to know in advance which rules are sensitive
  - So it is only necessary when you have to share the data
- When hiding sensitive rules, other rules might be damaged


Summary

- Two methods for single-party rule mining: perturbation and encryption
- Distributed multiparty mining can use protocols and the perturbation method
- Hiding sensitive rules is also important in some cases