Privacy-preserving Anonymization of Set Value Data Manolis Terrovitis Institute for the Management of Information Systems (IMIS), RC Athena Nikos Mamoulis University of Hong Kong (HKU) Panos Kalnis King Abdullah University of Science and Technology (KAUST)

Page 1

Privacy-preserving Anonymization of Set Value Data

Manolis Terrovitis
Institute for the Management of Information Systems (IMIS), RC Athena

Nikos Mamoulis
University of Hong Kong (HKU)

Panos Kalnis
King Abdullah University of Science and Technology (KAUST)

Page 2

Motivation

Attacker can see up to m items
  Any m items
  No distinction between sensitive and non-sensitive items

[Figure: Helen buys Beer, 0% Milk, and a Pregnancy test]

Page 3

Motivation (cont.)

Database:
Helen: Beer, 0% Milk, Pregnancy test
John: Cola, Cheese
Tom: 2% Milk, Coffee
…
Mary: Wine, Beer, Full-fat Milk

Published:
t1: Beer, 0% Milk, Pregnancy test
t2: Cola, Cheese
t3: 2% Milk, Coffee
…
tn: Wine, Beer, Full-fat Milk

Attacker: find all transactions that contain Beer & 0% Milk

Published with generalized items:
t1: Beer, Milk, Pregnancy test
t2: Cola, Cheese
t3: Milk, Coffee
…
tn: Wine, Beer, Milk

Page 4

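$k^m$-anonymity

Set of items: $I = \{o_1, o_2, \ldots, o_{|I|}\}$
Transaction: $t \subseteq I$
Database: $D = \{t_1, t_2, \ldots, t_{|D|}\}$

Query terms: $qs \subseteq I$ with $|qs| \le m$
$res = \{t \mid t \in D \wedge qs \subseteq t\}$

$k^m$-anonymity: for every $qs$ with $|qs| \le m$, either $|res| \ge k$ or $|res| = 0$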
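The definition on this slide can be checked by brute force on a small database. The sketch below is only an illustration, not the authors' implementation: it assumes the condition is that every query of at most m items matches either no transaction or at least k transactions, and the function name `is_km_anonymous` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def is_km_anonymous(database, k, m):
    """Check that every itemset of size <= m occurring in the database
    is contained in at least k transactions.

    Itemsets that occur in no transaction have an empty result and are
    therefore allowed, so only occurring itemsets need to be counted.
    """
    for size in range(1, m + 1):
        support = Counter()
        for t in database:
            for qs in combinations(sorted(set(t)), size):
                support[qs] += 1
        if any(count < k for count in support.values()):
            return False
    return True


# Toy data (assumption): two shoppers distinguishable by their milk type.
original = [{"Beer", "0% Milk"}, {"Beer", "2% Milk"}]
# After generalizing both milk types to "Milk" they become indistinguishable.
generalized = [{"Beer", "Milk"}, {"Beer", "Milk"}]

print(is_km_anonymous(original, k=2, m=2))
print(is_km_anonymous(generalized, k=2, m=2))
```

Enumerating all subsets is exponential in m, which is acceptable only for a toy check like this one.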

Page 5

Related Work: K-Anonymity [Swe02]

(a) Microdata (quasi-identifier: Age, ZipCode)

Age    ZipCode    Disease
42     25000      Flu
46     35000      AIDS
50     20000      Cancer
54     40000      Gastritis
48     50000      Dyspepsia
56     55000      Bronchitis

(b) 2-anonymous microdata

Age      ZipCode        Disease
42-46    25000-35000    Flu
42-46    25000-35000    AIDS
50-54    20000-40000    Cancer
50-54    20000-40000    Gastritis
48-56    50000-55000    Dyspepsia
48-56    50000-55000    Bronchitis

[Swe02] L. Sweeney. k-Anonymity: A Model for Protecting Privacy. Int. J. of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557-570, 2002.

NOT suitable for high-dimensional data
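As a small illustration of what the 2-anonymous table on this slide guarantees, the sketch below groups rows by their quasi-identifier values and checks that each group has at least k rows. The helper name `is_k_anonymous` and the dictionary layout are assumptions made for this example only.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifier, k):
    """True if every combination of quasi-identifier values is shared
    by at least k rows."""
    groups = Counter(tuple(row[a] for a in quasi_identifier) for row in rows)
    return all(size >= k for size in groups.values())


microdata = [
    {"Age": "42", "ZipCode": "25000", "Disease": "Flu"},
    {"Age": "46", "ZipCode": "35000", "Disease": "AIDS"},
    {"Age": "50", "ZipCode": "20000", "Disease": "Cancer"},
    {"Age": "54", "ZipCode": "40000", "Disease": "Gastritis"},
    {"Age": "48", "ZipCode": "50000", "Disease": "Dyspepsia"},
    {"Age": "56", "ZipCode": "55000", "Disease": "Bronchitis"},
]

generalized = [
    {"Age": "42-46", "ZipCode": "25000-35000", "Disease": "Flu"},
    {"Age": "42-46", "ZipCode": "25000-35000", "Disease": "AIDS"},
    {"Age": "50-54", "ZipCode": "20000-40000", "Disease": "Cancer"},
    {"Age": "50-54", "ZipCode": "20000-40000", "Disease": "Gastritis"},
    {"Age": "48-56", "ZipCode": "50000-55000", "Disease": "Dyspepsia"},
    {"Age": "48-56", "ZipCode": "50000-55000", "Disease": "Bronchitis"},
]

print(is_k_anonymous(microdata, ("Age", "ZipCode"), 2))
print(is_k_anonymous(generalized, ("Age", "ZipCode"), 2))
```

The check relies on a fixed, small quasi-identifier. In transactional data any combination of items can play that role, which is why this model does not carry over directly.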

Page 6

Related Work: L-diversity in Transactions

[GTK08] G. Ghinita, Y. Tao, P. Kalnis. On the Anonymization of Sparse High-Dimensional Data. ICDE, 2008.

Requires knowledge of (non)-sensitive attributes

Page 7

Our Approach: Employs Generalization

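[Figure: generalization hierarchy, e.g. items $a_1, a_2$ generalize to $A$; example with k=2, m=2]

Information loss:
$NCP(p) = 0$ if $p$ is a leaf node, and $NCP(p) = |u_p| / |I|$ otherwise,
where $u_p$ is the hierarchy node to which $p$ is generalized and $|u_p|$ is the number of leaves below it.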
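To make the information-loss measure concrete, the sketch below computes it on a tiny hierarchy. It assumes that NCP is 0 for a leaf and otherwise equals the fraction of original items covered by the generalized node, that is $|u_p| / |I|$. The hierarchy in `PARENT` and the function names are hypothetical.

```python
# Hypothetical hierarchy, stored as child -> parent.
PARENT = {
    "a1": "A", "a2": "A",
    "b1": "B", "b2": "B",
    "A": "ALL", "B": "ALL",
}


def leaves_under(node, parent):
    """Return the set of leaf items in the subtree rooted at node."""
    children = [c for c, p in parent.items() if p == node]
    if not children:
        return {node}
    result = set()
    for child in children:
        result |= leaves_under(child, parent)
    return result


def ncp(node, parent):
    """Assumed penalty: 0 for a leaf, else covered leaves / all leaves."""
    leaves = leaves_under(node, parent)
    if leaves == {node}:
        return 0.0
    # Leaves are the nodes that never appear as a parent.
    all_items = {n for n in parent if n not in parent.values()}
    return len(leaves) / len(all_items)


for node in ("a1", "A", "ALL"):
    print(node, ncp(node, PARENT))
```

With four leaf items, generalizing to `A` covers two of them and generalizing to the root covers all four, so the penalty grows with the height of the generalization.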

Page 8

Lattice of Generalizations

Page 9

Optimal Algorithm


Page 10

Count Tree

[Figure: example count tree over items a1, a2, b1, b2 and generalized items A, B, with a count stored at each node]

All generalized forms of the paths reside in the tree
We can easily find which anonymizations are needed
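The counting idea behind the count tree can be sketched as follows. This is a simplification: a flat `Counter` keyed by sorted itemsets stands in for the tree, the hierarchy in `PARENT` is hypothetical, and itemsets that combine an item with its own ancestor are skipped on the assumption that they carry no extra information.

```python
from collections import Counter
from itertools import combinations

# Hypothetical two-level hierarchy, stored as child -> parent.
PARENT = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}


def ancestors(item):
    """Return all generalizations of item, nearest first."""
    result = []
    while item in PARENT:
        item = PARENT[item]
        result.append(item)
    return result


def extend(transaction):
    """Add every generalized form of every item to the transaction."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item))
    return extended


def related(x, y):
    """True if one item is a generalization of the other."""
    return x in ancestors(y) or y in ancestors(x)


def count_itemsets(database, m):
    """Count the support of every itemset of size <= m, including
    generalized forms (a flat stand-in for the count tree)."""
    counts = Counter()
    for t in database:
        items = sorted(extend(t))
        for size in range(1, m + 1):
            for combo in combinations(items, size):
                if any(related(x, y) for x, y in combinations(combo, 2)):
                    continue
                counts[combo] += 1
    return counts


database = [{"a1", "b1"}, {"a2", "b1"}]
counts = count_itemsets(database, m=2)
print(counts[("a1", "b1")], counts[("A", "b1")])
```

Here the pair `("a1", "b1")` has support 1 while its generalized form `("A", "b1")` has support 2, which is the kind of comparison that shows which generalization would fix a violation.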

Page 11

Apriori-based Anonymization

Global optimal vs. local optimal solution for each path

We examine the paths
  By size (Apriori principle)
  Paths with invalid nodes are skipped

Page 12

Apriori-based Anonymization

1. Initialize gen_map
2. For i := 1 to m do
   1. For all t ∈ D do
      1. Extend t according to gen_map
      2. Add all i-subsets of extended t to count-tree
   2. Check all paths in count-tree and update gen_map
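The level-wise loop can be sketched in a few lines. This is a deliberately simplified stand-in and not the algorithm from the talk: it recounts itemsets with a `Counter` instead of a count tree, and it fixes a violation greedily by generalizing the first item of the violating itemset to its parent, rather than choosing the path solution with the lowest information loss. The hierarchy `PARENT` and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical hierarchy, stored as child -> parent.
PARENT = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "A": "ALL", "B": "ALL"}


def descends_from(item, node):
    """True if item equals node or has node among its ancestors."""
    while item != node and item in PARENT:
        item = PARENT[item]
    return item == node


def apply_map(transaction, gen_map):
    """Replace each item by its current generalization."""
    return frozenset(gen_map[item] for item in transaction)


def find_violation(database, gen_map, size, k):
    """Return an itemset of the given size with support below k, or None."""
    counts = Counter()
    for t in database:
        for combo in combinations(sorted(apply_map(t, gen_map)), size):
            counts[combo] += 1
    for combo, count in sorted(counts.items()):
        if count < k:
            return combo
    return None


def apriori_anonymize(database, k, m):
    items = {item for t in database for item in t}
    gen_map = {item: item for item in items}       # step 1: initialize gen_map
    for size in range(1, m + 1):                   # step 2: for i := 1 to m
        while True:
            violation = find_violation(database, gen_map, size, k)
            if violation is None:
                break
            # Greedy choice (assumption): lift the first liftable item.
            node = next((x for x in violation if x in PARENT), None)
            if node is None:
                raise ValueError("hierarchy cannot satisfy the requested k")
            target = PARENT[node]
            # Global recoding: every item below target maps to target.
            for item in gen_map:
                if descends_from(item, target):
                    gen_map[item] = target
    return gen_map


database = [{"a1", "b1"}, {"a2", "b1"}, {"a1", "b2"}, {"a2", "b2"}]
gen_map = apriori_anonymize(database, k=2, m=2)
print(sorted(gen_map.items()))
```

On this toy input every single item already appears twice, so nothing changes for i = 1. For i = 2 each pair appears only once, and generalizing `a1` and `a2` to `A` is enough to give every remaining pair a support of 2.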

Page 13

Small Datasets (2-15K, BMS-WebView2)

|I|=40..60, k=100, m=3

Page 14

Small Datasets (BMS-WebView2)

|D|=10K, k=100, m=1..4

Page 15

Apriori Anonymization for Large Datasets

[Figure: running times; values shown: 500 sec, 10 sec, 100 sec]

|D|     |I|
515K    1657
59K     497
77K     3340

k=5, m=3

Page 16

Points to Remember

Anonymization of Transactional Data
  Attacker knows m items
  Any m items can be the quasi-identifier

Global recoding method
  Optimal solution: too slow
  Apriori Anonymization: fast and low information loss

Extensions (VLDBJ 2010)
  Local recoding (sort by Gray order and partition)
  Global recoding (by partitioning the data domain)