Privacy Preserving K-means Clustering on Vertically Partitioned Data


Page 1

Privacy Preserving K-means Clustering on Vertically Partitioned Data

Presented by: Jaideep Vaidya

Joint work: Prof. Chris Clifton

Page 2

Overview

• Global Problem: Privacy Preserving Distributed Data Mining

• Specific Problem: Clustering (K-Means)

• For: Vertically Partitioned Data

• Using: Cryptographic Tools

Page 3

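Vertical Partitioning of Data

Medical Records
TID | Brain Tumor? | Diabetes?
RPJ | Yes | Diabetic
CAC | No Tumor | No
PTR | No Tumor | Diabetic

Cell Phone Data
TID | Model | Battery
RPJ | 5210 | Li/Ion
CAC | none | none
PTR | 3650 | NiCd

Global Database View (the two tables joined on TID)
TID | Brain Tumor? | Diabetes? | Model | Battery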
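To make the partitioning concrete, the Python sketch below stores the slide's two tables as dictionaries keyed by TID and joins them into the global view. The dictionary layout and the `global_view` helper are illustrative assumptions; the point of the protocol is precisely that no site ever materializes this join.

```python
# Each site holds different attributes for the same entities (keyed by TID).
medical = {
    "RPJ": {"Brain Tumor?": "Yes", "Diabetes?": "Diabetic"},
    "CAC": {"Brain Tumor?": "No Tumor", "Diabetes?": "No"},
    "PTR": {"Brain Tumor?": "No Tumor", "Diabetes?": "Diabetic"},
}
cell_phone = {
    "RPJ": {"Model": "5210", "Battery": "Li/Ion"},
    "CAC": {"Model": "none", "Battery": "none"},
    "PTR": {"Model": "3650", "Battery": "NiCd"},
}

def global_view(*partitions):
    """Join vertical partitions on their shared TID key (what a non-private approach would do)."""
    tids = set(partitions[0])
    for part in partitions[1:]:
        tids &= set(part)          # keep only entities present at every site
    view = {}
    for tid in sorted(tids):
        row = {}
        for part in partitions:
            row.update(part[tid])  # append this site's columns to the row
        view[tid] = row
    return view

view = global_view(medical, cell_phone)
for tid, row in view.items():
    print(tid, row)
```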

Page 4

Is the problem trivial?

Page 5

Privacy Preserving Data Mining

• Perturbation: Agrawal & Srikant; Agrawal & Aggarwal; Rizvi & Haritsa; Evfimievski et al.

• Cryptographic: Lindell & Pinkas; Du & Zhan; Vaidya & Clifton; Kantarcioglu & Clifton

Page 6

Secure Multiparty Computation (SMC)

• Given a function f and n inputs $x_1, x_2, \ldots, x_n$, distributed at n sites, compute the result

$y = f(x_1, x_2, \ldots, x_n)$

while revealing nothing to any site except its own input(s) and the result.
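As a concrete instance of this definition, take f to be the sum of the sites' inputs. The sketch below uses additive secret sharing modulo a public modulus, a standard SMC building block chosen here only for illustration (it is not claimed to be part of this talk's protocol). `MODULUS`, `share`, and `secure_sum` are hypothetical names, and all parties are simulated in a single process.

```python
import secrets

MODULUS = 2**61 - 1  # assumed public modulus; inputs and the result must be smaller

def share(value, n_parties):
    """Split value into n_parties random shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]

def secure_sum(inputs):
    """Site i holds inputs[i]; compute y = sum(inputs) via additive sharing."""
    n = len(inputs)
    # Row i holds the shares produced by site i; site i sends share j to site j.
    all_shares = [share(x, n) for x in inputs]
    # Site j adds the shares it received; any single share is uniformly random.
    partial = [sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # Publishing the partial sums yields the result y.
    return sum(partial) % MODULUS

print(secure_sum([12, 7, 30]))  # 49
```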

Page 7

Results

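• Cluster assignment for entities: not private (every site learns which cluster each entity belongs to)

• Cluster centers: semi-private (each site learns only the components of each center that correspond to its own attributes)

Example cluster center, spread across the sites' attributes: 2.3 | 34 | 19 | 15.5 | 5210 | Li/Ion | Piezo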
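Because the assignment is public, each site can recompute the center components for its own attributes without any further exchange. The sketch below assumes numeric attributes and made-up data; `local_means` is a hypothetical helper, not code from the talk.

```python
def local_means(local_rows, assignment, k):
    """Per-cluster mean of one site's attributes.

    local_rows[t] is this site's list of attribute values for entity t;
    assignment[t] is the (public) cluster index of entity t.
    """
    dim = len(local_rows[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for row, c in zip(local_rows, assignment):
        counts[c] += 1
        for a, value in enumerate(row):
            sums[c][a] += value
    # An empty cluster gets None here; a real run needs a policy for this case.
    return [[s / counts[c] for s in sums[c]] if counts[c] else None for c in range(k)]

assignment = [0, 0, 1]                          # public result of the assignment step
site_a = [[1.0, 10.0], [3.0, 20.0], [8.0, 4.0]]  # site A's attributes (made up)
site_b = [[5.0], [7.0], [0.0]]                   # site B's attribute (made up)
print(local_means(site_a, assignment, 2))  # [[2.0, 15.0], [8.0, 4.0]]
print(local_means(site_b, assignment, 2))  # [[6.0], [0.0]]
```

Site A never sees site B's column, so it learns only its own slice of each center.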

Page 8

Secure K-means clustering

K-means clustering:

Arbitrarily select k starting points $\mu'_1, \mu'_2, \ldots, \mu'_k$

Repeat
- Assign $\mu'_1, \mu'_2, \ldots, \mu'_k$ to $\mu_1, \mu_2, \ldots, \mu_k$ respectively
- (Re)assign each object to the closest cluster, based on distance from the mean
- Re-compute the cluster means $\mu'_1, \mu'_2, \ldots, \mu'_k$

Until no change
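For reference, the standard (non-secure) algorithm can be written as a short Python sketch. Squared Euclidean distance, the `seed`, and the `max_iter` cap are assumptions added for illustration; the slide only says "closest cluster" and "until no change".

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Componentwise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    new_centers = [list(p) for p in rng.sample(points, k)]  # starting points mu'_1..mu'_k
    for _ in range(max_iter):
        centers = new_centers                               # mu_i <- mu'_i
        # (Re)assign each object to the cluster with the closest mean.
        assignment = [min(range(k), key=lambda i: sq_dist(p, centers[i])) for p in points]
        # Re-compute the cluster means; an empty cluster keeps its old mean.
        new_centers = []
        for i in range(k):
            members = [p for p, c in zip(points, assignment) if c == i]
            new_centers.append(mean(members) if members else centers[i])
        if new_centers == centers:                          # until no change
            break
    return new_centers, assignment

centers, assignment = kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)
print(sorted(centers))  # [[0.0, 0.5], [10.0, 10.5]]
```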

Page 9

Assigning objects to closest cluster

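For every object/entity O, with the data set D vertically partitioned across sites $P_1, P_2, \ldots, P_r$, compute

$\arg\min_{i = 1, \ldots, k} \sum_{j=1}^{r} x_{ij}$

where $x_{ij}$ is the component of the distance between O and cluster mean $\mu_i$ that site $P_j$ computes from its own attributes.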
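The work splits cleanly across sites because, assuming squared Euclidean distance (the slide does not name a metric), the distance from O to $\mu_i$ is the sum of per-site terms over each site's own attributes. The sketch below computes each site's components $x_{ij}$ and the argmin in the clear; `local_components`, `closest_cluster`, and the data are hypothetical.

```python
def local_components(site_values, site_centers):
    """x_{ij} for one site j: squared distance over site j's attributes only, one entry per cluster i."""
    return [sum((v - c) ** 2 for v, c in zip(site_values, center)) for center in site_centers]

def closest_cluster(per_site_components):
    """argmin_i sum_j x_{ij}, computed in the clear (the secure protocol hides the x_{ij})."""
    k = len(per_site_components[0])
    totals = [sum(site[i] for site in per_site_components) for i in range(k)]
    return min(range(k), key=totals.__getitem__)

# r = 2 sites, k = 3 clusters; each site sees only its own slice of O and of each mean.
obj_site1, centers_site1 = [1.0, 2.0], [[0.0, 0.0], [1.0, 3.0], [5.0, 5.0]]
obj_site2, centers_site2 = [4.0], [[4.0], [9.0], [4.0]]
x = [local_components(obj_site1, centers_site1), local_components(obj_site2, centers_site2)]
print(x)                   # [[5.0, 1.0, 25.0], [0.0, 25.0, 0.0]]
print(closest_cluster(x))  # 0
```

Site 1 alone would pick cluster 1, but the combined distances (5, 26, 25) select cluster 0, which is why the components must be combined across sites.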

Page 10

Key Idea

• Disguise each site's distance components with random values

• Compare distances while revealing only the result of each comparison

• Permute the order of the clusters to conceal the meaning of the comparison results

Page 11

Closest Cluster Computation

• 3 special sites: P1, P2 and Pr

• P1 generates
- r random vectors $V_1, \ldots, V_r$ such that $\sum_{i=1}^{r} V_i = 0$
- a permutation π over 1..k
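P1's setup step can be sketched as follows. Working modulo a public integer `modulus` (so that "random" has a well-defined uniform meaning) is an assumption of this sketch, as are the name `p1_setup` and the use of Python's `secrets.SystemRandom`; the slide only requires that the vectors sum to zero.

```python
import secrets

def p1_setup(r, k, modulus, rng=None):
    """P1's setup: r random vectors V_1..V_r whose componentwise sum is 0 (mod modulus),
    plus a random permutation pi of the cluster indices 0..k-1."""
    if r < 2:
        raise ValueError("need at least two sites")
    rng = rng or secrets.SystemRandom()
    # r - 1 uniformly random vectors of length k ...
    vectors = [[rng.randrange(modulus) for _ in range(k)] for _ in range(r - 1)]
    # ... and a last vector chosen so that every column sums to 0 mod modulus.
    vectors.append([(-sum(col)) % modulus for col in zip(*vectors)])
    pi = list(range(k))
    rng.shuffle(pi)
    return vectors, pi

V, pi = p1_setup(r=4, k=3, modulus=2**61 - 1)
```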

Page 12

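Permutation Protocol (Du and Atallah ’01)

A holds a permutation π and a vector V; B holds a vector X.
- B sends its (public) encryption key E and E(X) to A
- A computes E(X) · E(V) = E(X + V) and sends π(E(X + V)) back to B
- B decrypts to obtain π(X + V)

A sees only ciphertexts, so it learns nothing about X; B learns π(X + V) but neither π nor V.

Homomorphic encryption: $E_k(x) \cdot E_k(y) = E_k(x + y)$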
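The protocol relies on an additively homomorphic cipher. The sketch below uses a toy Paillier cryptosystem with deliberately tiny, insecure primes. Paillier is a common choice for this role, but it is an assumption here: the slide only states the homomorphic property. All arithmetic is modulo N, and the names (`encrypt`, `decrypt`, `permutation_protocol`) are hypothetical.

```python
import math
import secrets

# Toy Paillier key pair (tiny primes: insecure, illustration only).
# In the real protocol B generates the keys and A receives only the public key.
P, Q = 293, 433
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)  # valid because the generator g = N + 1 is used

def encrypt(m):
    """E(m) = g^m * r^N mod N^2 with g = N + 1 and a random r coprime to N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2

def decrypt(c):
    """m = L(c^LAM mod N^2) * MU mod N, where L(u) = (u - 1) // N."""
    return ((pow(c, LAM, N2) - 1) // N) * MU % N

def permutation_protocol(x, v, pi):
    """B holds x; A holds v and pi. Returns what B learns: pi(x + v) mod N."""
    enc_x = [encrypt(xi) for xi in x]                            # B -> A: E(X)
    enc_sum = [c * encrypt(vi) % N2 for c, vi in zip(enc_x, v)]  # A: E(X) * E(V) = E(X + V)
    permuted = [enc_sum[j] for j in pi]                          # A -> B: pi(E(X + V))
    return [decrypt(c) for c in permuted]                        # B decrypts pi(X + V)
```

For example, with `x = [5, 0, 12]`, `v = [100, 7, N - 3]` (where `N - 3` acts as -3 mod N) and `pi = [2, 0, 1]` (output position t holds input index `pi[t]`), `permutation_protocol(x, v, pi)` returns `[9, 105, 7]`.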

Page 13

Closest Cluster Computation

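Here $X_i$ is the vector of site $P_i$'s distance components, one per cluster.

Stage 1: P1 runs the permutation protocol with each of P2, ..., Pr. Site $P_i$ sends $(E_i, E_i(X_i))$ to P1 and receives $\pi(E_i(X_i + V_i))$, which it decrypts to obtain $\pi(X_i + V_i)$. P1 computes $\pi(X_1 + V_1)$ locally.

Stage 2: P1, P3, ..., $P_{r-1}$ send $\pi(X_1 + V_1), \pi(X_3 + V_3), \ldots, \pi(X_{r-1} + V_{r-1})$ to Pr, which computes

$\sum_{i \neq 2} \pi(X_i + V_i)$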
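Because $\sum_i V_i = 0$, the vector held by P2 and the sum held by Pr add up (componentwise, modulo an assumed public modulus) to $\pi(\sum_i X_i)$, yet each looks random on its own. The simulation below checks that identity. It replaces the encrypted permutation protocol with direct computation, and the function name and modulus are assumptions.

```python
import secrets

def stages_1_and_2(x, modulus):
    """x[i] is site P_{i+1}'s vector of distance components (one entry per cluster).

    Returns (P2's vector, Pr's sum, pi). The permutation protocol is simulated
    in the clear; in the real protocol each P_i learns only pi(X_i + V_i).
    """
    r, k = len(x), len(x[0])
    rng = secrets.SystemRandom()
    # P1's setup: V_1..V_r summing to 0 mod modulus, and a permutation pi.
    v = [[rng.randrange(modulus) for _ in range(k)] for _ in range(r - 1)]
    v.append([(-sum(col)) % modulus for col in zip(*v)])
    pi = list(range(k))
    rng.shuffle(pi)

    def disguised(i):  # pi(X_i + V_i): position t holds cluster pi[t]
        return [(x[i][j] + v[i][j]) % modulus for j in pi]

    p2_vector = disguised(1)  # Stage 1 result kept by P2
    # Stage 2: every site except P2 contributes to Pr's running sum (Pr includes its own).
    pr_sum = [0] * k
    for i in range(r):
        if i != 1:
            pr_sum = [(a + b) % modulus for a, b in zip(pr_sum, disguised(i))]
    return p2_vector, pr_sum, pi

M = 2**61 - 1
x = [[5, 1, 25], [0, 25, 0], [3, 3, 3]]
p2_vector, pr_sum, pi = stages_1_and_2(x, M)
combined = [(a + b) % M for a, b in zip(p2_vector, pr_sum)]
totals = [sum(col) for col in zip(*x)]  # [8, 29, 28]
assert combined == [totals[j] for j in pi]
```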

Page 14

Closest Cluster Computation

• Stage 3: P2 and Pr determine i, the (permuted) index of the cluster with minimum distance, using secure comparisons that reveal only the comparison results

• Stage 4: P1 computes and broadcasts $\pi^{-1}(i)$
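Stages 3 and 4 can be sketched as below. The comparisons are done in the clear here; in the protocol each one would be a secure two-party comparison between P2 and Pr. With the convention that position t of a permuted vector holds cluster `pi[t]`, undoing the permutation is just `pi[t]`. Both helper names are hypothetical.

```python
def stage3_min_index(p2_vector, pr_sum, modulus):
    """Position (in permuted order) of the smallest total distance.

    Simulated in the clear: the real protocol runs each pairwise comparison
    securely, so P2 and Pr learn only its outcome.
    """
    totals = [(a + b) % modulus for a, b in zip(p2_vector, pr_sum)]
    best = 0
    for t in range(1, len(totals)):
        if totals[t] < totals[best]:  # one comparison; only the outcome is revealed
            best = t
    return best

def stage4_unpermute(best, pi):
    """P1 maps the permuted position back to the true cluster index and broadcasts it."""
    return pi[best]

best = stage3_min_index([50, 3, 90], [70, 90, 20], 101)  # totals [19, 93, 9]
print(best)                               # 2
print(stage4_unpermute(best, [1, 2, 0]))  # 0
```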

Page 15

When to stop?

• Locally compute the difference in means

• Globally known threshold

• Use a simple random-adding technique to disguise the actual values:
- The first party adds a random value to its distance and sends the result to the next party
- Each party adds its value to the running total and sends it on
- The last party compares the total with the first party's random value + threshold
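A sketch of the random-adding check, assuming each party's local difference has been scaled to a non-negative integer and that iteration stops when the true sum falls below the threshold. The function name, the `bound` on the random value, and the integer scaling are assumptions. Intermediate parties see only values offset by the first party's random number; the final comparison between the last and first party is shown in the clear.

```python
import secrets

def converged(local_diffs, threshold, bound=2**64):
    """Random-adding check of sum(local_diffs) < threshold.

    local_diffs[j] is party j's locally computed change in its mean components,
    assumed scaled to a non-negative integer much smaller than bound.
    """
    rand = secrets.randbelow(bound)   # first party's secret random value
    running = rand + local_diffs[0]   # first party adds its value and sends it on
    for d in local_diffs[1:]:
        running += d                  # each party adds its value and sends it on
    # Last party compares with the first party's random value + threshold.
    # Shown in the clear; in practice this would be a secure two-party comparison.
    return running < rand + threshold

print(converged([1, 2, 3], threshold=10))  # True
print(converged([5, 5, 5], threshold=10))  # False
```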

Page 16

Communication Cost

• r parties, n data elements, m-bit distances

Method | Bits | Rounds
Basic Algorithm | O(knr) | O(r + k)
Optimized Algorithm | O(kmr) | O(r)
Generic Method | O(kmnr^3) | 1
Non-Secure Method | O(n) | 1

Page 17

Conclusion

• Presented a solution to the privacy-preserving k-means clustering problem

• How to use the clusters?

• Will parties share the required information for the possible benefits?

• Improve efficiency

• Ongoing work: EM clustering, implementations