privacy preserving in data mining with hybrid approach

1

Privacy Preserving in Data Mining With Hybrid Approach

Guided by: Prof. Paresh M. Solanki

Presented by: Narenndra Dhadhal, M.Tech. (III) IT, 14014021007

2

OUTLINE

1) Introduction to PPDM
2) Need for Privacy
3) Privacy Preserving Techniques
4) Literature Survey
5) K-Anonymization
6) Proposed Work
7) References

3

Introduction

Privacy preservation is one of the most important research topics in the data security field, and in recent years it has become a serious concern in the secure transformation of personal data. [1]

A number of algorithmic techniques have been designed for Privacy Preserving Data Mining (PPDM).[1]

4

Introduction (cont.)

PPDM is used to efficiently protect individual privacy in data sharing. [1]

Various models have therefore been designed for privacy preserving data sharing. [1]

The various privacy preserving approaches to data sharing, along with their merits and demerits, are analyzed in [1].

5

Need for Privacy [2]

Privacy preserving data mining has become increasingly popular because it allows sharing of privacy-sensitive data for analysis purposes.

Suppose a hospital has some person-specific patient data which it wants to publish.

It wants to publish the data such that:

1) Information remains practically useful
2) Identity of an individual cannot be determined

6

Need for Privacy[4]

Non-Sensitive Data Sensitive Data

#  Zip    Age  Nationality  Name   Condition
1  13053  28   Indian       Kumar  Heart Disease
2  13067  29   American     Bob    Heart Disease
3  13053  35   Canadian     Ivan   Viral Infection
4  13067  36   Japanese     Umeko  Cancer

Fig 1:- Sensitive and Non-Sensitive Data.[4]

7

Need for Privacy [5]

Quasi-identifiers are a set of attributes that could potentially identify a record owner when combined with publicly available data.

Sensitive attributes are a set of attributes that contain sensitive person-specific information such as disease or salary.

Non-sensitive attributes are a set of attributes that create no problem if revealed, even to untrustworthy parties.

8

Need for Privacy [4]

Published Data (Non-Sensitive Data: Zip, Age, Nationality; Sensitive Data: Condition)

#  Zip    Age  Nationality  Condition
1  13053  28   Indian       Heart Disease
2  13067  29   American     Heart Disease
3  13053  35   Canadian     Viral Infection
4  13067  36   Japanese     Cancer

#  Name   Zip    Age  Nationality
1  John   13053  28   American
2  Bob    13067  29   American
3  Chris  13053  23   American

Data leak!

Fig 2:- Sensitive and Non-Sensitive Data Leak.[4]
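The leak in Fig 2 can be reproduced with a small sketch that joins the published table with the second, name-bearing table on Zip, Age and Nationality. The rows are copied from the figure; the function name `link_records` and the dictionary layout are assumptions made only for illustration.

```python
# Published table from Fig 2 (names removed, condition kept).
published = [
    {"zip": "13053", "age": 28, "nationality": "Indian", "condition": "Heart Disease"},
    {"zip": "13067", "age": 29, "nationality": "American", "condition": "Heart Disease"},
    {"zip": "13053", "age": 35, "nationality": "Canadian", "condition": "Viral Infection"},
    {"zip": "13067", "age": 36, "nationality": "Japanese", "condition": "Cancer"},
]

# Second table from Fig 2 (names present, no condition).
external = [
    {"name": "John", "zip": "13053", "age": 28, "nationality": "American"},
    {"name": "Bob", "zip": "13067", "age": 29, "nationality": "American"},
    {"name": "Chris", "zip": "13053", "age": 23, "nationality": "American"},
]

QUASI_IDENTIFIERS = ("zip", "age", "nationality")


def link_records(published_rows, external_rows, keys=QUASI_IDENTIFIERS):
    """Return (name, condition) pairs whose quasi-identifiers match exactly."""
    leaks = []
    for ext in external_rows:
        for pub in published_rows:
            if all(ext[k] == pub[k] for k in keys):
                leaks.append((ext["name"], pub["condition"]))
    return leaks


leaks = link_records(published, external)
print(leaks)  # [('Bob', 'Heart Disease')]
```

Only Bob's row matches on all three attributes, which is enough to attach a name to a condition even though the published table contains no names.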

9

Privacy Preserving Techniques

The important techniques of privacy preserving data mining are: [3]

1) The randomization method
2) The encryption method
3) The anonymization method

10

Privacy Preserving Techniques

1. The Randomization Method [3]

The randomization method is an important and popular method among current privacy preserving data mining techniques.

It masks the values of the records by adding additional data (noise) to the original data.
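A minimal sketch of the randomization method described above, assuming the added data is independent uniform noise. The noise distribution, the `spread` parameter and the function name `randomize` are illustrative assumptions, not taken from [3]; the ages come from Fig 1.

```python
import random


def randomize(values, spread, rng):
    """Mask each value by adding independent uniform noise in [-spread, spread]."""
    return [v + rng.uniform(-spread, spread) for v in values]


ages = [28, 29, 35, 36]   # Age column from Fig 1
rng = random.Random(0)    # fixed seed only to make the sketch repeatable
masked = randomize(ages, spread=5.0, rng=rng)

# Each individual value is distorted by at most `spread` ...
max_distortion = max(abs(m - a) for m, a in zip(masked, ages))
# ... so the mean of the column also shifts by at most `spread`.
mean_shift = abs(sum(masked) / len(masked) - sum(ages) / len(ages))
print(max_distortion <= 5.0, mean_shift <= 5.0)  # True True
```

The individual values are hidden behind the noise, while aggregate statistics remain approximately usable for mining.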

11

2. The Encryption Method [3]

The encryption method mainly addresses the case where several parties jointly conduct mining tasks based on the private inputs they each provide.

These privacy preserving mining tasks could occur between mutually untrusted parties, or even between competitors.

Therefore, protecting privacy becomes an important concern in the distributed data mining setting.
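The slide does not name a specific protocol. As one illustration of jointly computing on private inputs, the sketch below uses additive secret sharing to compute a sum without any party revealing its own value. The technique, the modulus, and the names `make_shares` and `secure_sum` are assumptions chosen for illustration, not the method of [3].

```python
import random

MODULUS = 2**61 - 1  # public modulus, assumed larger than any possible total


def make_shares(secret, n_parties, rng):
    """Split `secret` into n additive shares that sum to it modulo MODULUS."""
    shares = [rng.randrange(MODULUS) for _ in range(n_parties - 1)]
    last = (secret - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_inputs, rng):
    """Simulate all parties in one process; returns the sum of the inputs."""
    n = len(private_inputs)
    # Party i splits its input into n shares; share j is sent to party j.
    all_shares = [make_shares(x, n, rng) for x in private_inputs]
    # Party j adds the shares it received and publishes only that partial sum.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # The published partial sums add up to the total of the private inputs.
    return sum(partial_sums) % MODULUS


rng = random.SystemRandom()
counts = [120, 75, 310]  # hypothetical private counts held by three parties
total = secure_sum(counts, rng)
print(total)  # 505
```

Each share on its own is a uniformly random number, so no single message reveals a party's input; only the final total is learned.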


12

3. The Anonymization Method [3]

The anonymization method aims to make each individual record indistinguishable within a group of records by using generalization and suppression techniques.

K-Anonymity is the representative anonymization method.


13

Literature Survey [1]

Title: Privacy Preserving Data Mining Techniques-Survey

Author: Ms. Dhanalakshmi.M, Mrs. Siva Sankari (2014)

Summary: This paper discusses models of privacy preservation: the Trusted Third Party Model, the Semi-honest Model, the Malicious Model, and other models (Incentive Compatibility). It also surveys privacy preserving techniques such as the randomization, anonymization, and encryption methods.

Issues/Challenges: Personalized privacy preservation will become an issue.

14

Literature Survey [2]

Title: A Survey on Privacy Preserving Data Mining

Author: K. Saranya, K. Premalatha, S. S. Rajasekar (2015)

Summary: This paper presents a brief survey of standard techniques for privacy preserving data mining, namely classification, clustering, and association rule mining.

Issues/Challenges: The merits and demerits of the different techniques are pointed out. As future work, the authors propose a hybrid approach combining these techniques.

15

Literature Survey [3]

Title: A Survey on Privacy Preserving Data Mining

Author: Jian Wang, Yongcheng Luo, Yan Zhao, Jiajin Le (2009)

Summary: This paper reviews several privacy preserving data mining technologies and then analyzes the merits and shortcomings of these technologies.

Issues/Challenges: The limitations of the k-anonymity model stem from two assumptions. First, it may be very hard for the owner of a database to determine which of the attributes are or are not available in external tables. Second, the k-anonymity model assumes a certain method of attack, while in real scenarios there is no reason why the attacker should not try other methods.

16

Literature Survey [4]

Title: A Survey on Anonymity-based Privacy Preserving

Author: Jian Wang, Yongcheng Luo, Shuo Jiang, Jiajin Le (2009)

Summary: The authors first show that a k-anonymous dataset permits strong attacks due to a lack of diversity in the sensitive attributes.

Issues/Challenges: While k-anonymity protects against identity disclosure, it does not provide sufficient protection against attribute disclosure.

17

Literature Survey [5]

Title: Analysis of Privacy Preserving K-Anonymity Methods and Techniques

Author: S. Vijayarani, A. Tamilarasi, M. Sampoorna (2010)

Summary: This paper presents a survey of recent approaches that have been applied to the k-anonymity problem. Two main techniques have been proposed for enforcing k-anonymity on a private table: generalization and suppression.

Issues/Challenges: Threats to k-anonymity that can arise from mining a collection of data, and approaches to combining k-anonymity with data mining.

18

Literature Survey [6]

Title: Privacy Preserving in Data Mining Using Hybrid Approach

Author: Savita Lohiya, Lata Ragha (2012)

Summary: This paper proposes a hybrid approach to privacy preservation. The original data is first randomized, and generalization is then applied to the randomized (modified) data. According to the authors, this technique protects private data with better accuracy, can reconstruct the original data, and provides data with no information loss, which keeps the data usable.

Issues/Challenges: The k-anonymity method is vulnerable to the homogeneity attack and the background knowledge attack.

19

K-Anonymization

Data anonymization is a type of information sanitization whose intent is privacy protection. [6]

It is the process of either encrypting or removing personally identifiable information from data sets, so that the people whom the data describe remain anonymous. [6]

For example, a hospital may release patient records so that researchers can study the characteristics of various diseases. [6]

20

K-Anonymization

There are two common methods for achieving k-anonymity for some value of k. [3]

Suppression: In this method, certain values of the attributes are replaced by an asterisk '*'. All or some values of a column may be replaced by '*'. [3]

Generalization: In this method, individual values of attributes are replaced with a broader category. For example, the value '33' of the attribute 'Age' may be replaced by '< 40', the value '24' by '20 < Age ≤ 30', etc. [3]
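The two operations above can be sketched as small helper functions. The function names, the `keep` parameter and the fixed range width of 10 are assumptions made for illustration; the slides do not prescribe an implementation.

```python
def suppress(value, keep):
    """Suppression: keep the first `keep` characters, replace the rest with '*'."""
    text = str(value)
    return text[:keep] + "*" * (len(text) - keep)


def suppress_cell(_value):
    """Cell-level suppression: replace the whole value with a single '*'."""
    return "*"


def generalize_age(age, width=10):
    """Generalization: replace an exact age with the range 'low < Age ≤ high'."""
    low = ((age - 1) // width) * width
    return f"{low} < Age ≤ {low + width}"


print(suppress("13053", keep=3))  # 130**
print(suppress_cell("Indian"))    # *
print(generalize_age(24))         # 20 < Age ≤ 30
print(generalize_age(33))         # 30 < Age ≤ 40
```

Wider ranges and shorter prefixes give stronger anonymity at the cost of less precise data.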

21

K-Anonymization (cont.)

#  Zip    Age   Nationality  Condition
1  130**  < 40  *            Heart Disease
2  130**  < 40  *            Heart Disease
3  130**  < 40  *            Viral Infection
4  130**  < 40  *            Cancer

Generalization    Suppression (cell-level)

Fig 3:- Generalization and Suppression.[2]

22

K-Anonymization (cont.) [4]

TABLE I. MICRODATA

ID  Attributes
    Age  Sex  Zip Code  Disease
1   26   M    83661     Headache
2   24   M    83634     Headache
3   31   M    83967     Viral Infection
4   39   F    83949     Cough

TABLE II. VOTER REGISTRATION LIST

ID  Attributes
    Name  Age  Sex  Zip Code
1   Jim   26   M    83661
2   Jay   24   M    83634
3   Tom   31   M    83967
4   Lily  39   F    83949

23

Classification of Attributes

1) Key attributes: [5] Name, address, phone number. These are uniquely identifying and are always removed before release.

2) Quasi-identifiers: [5] A set of features whose associated values may be useful for linking with another data set to re-identify the entity that is the subject of the data. Example: the combination (5-digit ZIP code, birth date, gender) can uniquely identify an individual.

24

K-Anonymization (cont.) [4]

TABLE III. 2-ANONYMOUS TABLE

ID  Attributes
    Age  Sex  Zip Code  Disease
1   2*   M    836**     Headache
2   2*   M    836**     Headache
3   3*   *    839**     Viral Infection
4   3*   *    839**     Cough
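Whether a table such as Table III is k-anonymous can be checked by counting how many rows share each combination of quasi-identifier values. This is a minimal sketch; the function name `is_k_anonymous` and the tuple layout (quasi-identifiers first, sensitive attribute last) are assumptions for illustration.

```python
from collections import Counter

# Table III: (Age, Sex, Zip Code) are the quasi-identifiers, Disease is sensitive.
table_iii = [
    ("2*", "M", "836**", "Headache"),
    ("2*", "M", "836**", "Headache"),
    ("3*", "*", "839**", "Viral Infection"),
    ("3*", "*", "839**", "Cough"),
]


def is_k_anonymous(rows, k, n_quasi=3):
    """True if every quasi-identifier combination occurs in at least k rows."""
    class_sizes = Counter(row[:n_quasi] for row in rows)
    return all(size >= k for size in class_sizes.values())


print(is_k_anonymous(table_iii, k=2))  # True
print(is_k_anonymous(table_iii, k=3))  # False
```

Table III has two equivalence classes of two rows each, so it satisfies 2-anonymity but not 3-anonymity.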

25

K- Anonymization[3]

In general, k-anonymity guarantees that an individual can be associated with his real tuple with a probability at most 1/k.

While k-anonymity protects against identity disclosure, it does not provide sufficient protection against attribute disclosure.

Two attacks have been identified: the homogeneity attack and the background knowledge attack.

26

K-Anonymization [6]

Suppose Jay knows that Jim is a 26-year-old man and that his zip code is 83661. He can then conclude that Jim corresponds to the first equivalence class, and thus must have a headache. This is the homogeneity attack.

Suppose that, by knowing Lily's age and zip code, Jay can conclude that Lily corresponds to a record in the last equivalence class. Furthermore, suppose that Jay knows that Lily has a very low risk of viral infection. This background knowledge enables Jay to conclude that Lily most likely has a cough. This is the background knowledge attack.
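Both attacks can be traced on Table III with a short sketch that collects the sensitive values of each equivalence class. The function name `sensitive_values_by_class` and the tuple layout are assumptions for illustration; the data and the two scenarios come from the slides.

```python
from collections import defaultdict

# Table III: (Age, Sex, Zip Code) are the quasi-identifiers, Disease is sensitive.
table_iii = [
    ("2*", "M", "836**", "Headache"),
    ("2*", "M", "836**", "Headache"),
    ("3*", "*", "839**", "Viral Infection"),
    ("3*", "*", "839**", "Cough"),
]


def sensitive_values_by_class(rows, n_quasi=3):
    """Map each equivalence class to the set of sensitive values it contains."""
    classes = defaultdict(set)
    for row in rows:
        classes[row[:n_quasi]].add(row[n_quasi])
    return dict(classes)


classes = sensitive_values_by_class(table_iii)

# Homogeneity attack: a class with one distinct sensitive value reveals it.
homogeneous = {qi: vals for qi, vals in classes.items() if len(vals) == 1}
print(homogeneous)  # {('2*', 'M', '836**'): {'Headache'}}

# Background knowledge attack: Lily is in the last class; ruling out
# "Viral Infection" leaves a single candidate.
lily_class = ("3*", "*", "839**")
remaining = classes[lily_class] - {"Viral Infection"}
print(remaining)  # {'Cough'}
```

In both cases the table is still 2-anonymous; the disclosure comes from the lack of diversity in the sensitive attribute, not from re-identifying a specific row.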

27

Proposed Work [5]

In today's world, privacy is a major concern when protecting sensitive data. People are very much concerned about sensitive information that they do not want to share.

The proposed method combines k-anonymity with a perturbation technique.
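The slides do not give the details of the proposed method, so the following is only a rough sketch of one way perturbation and k-anonymity-style generalization could be chained: the age is first perturbed with random noise, then the perturbed record is generalized. The noise range, the generalization rules, the decision to suppress Sex in every row, and all function names are assumptions for illustration. The rows are taken from Table I.

```python
import random

# Table I (microdata): Age, Sex, Zip Code, Disease.
microdata = [
    (26, "M", "83661", "Headache"),
    (24, "M", "83634", "Headache"),
    (31, "M", "83967", "Viral Infection"),
    (39, "F", "83949", "Cough"),
]


def perturb_age(age, max_shift, rng):
    """Perturbation step: shift the age by a random integer in [-max_shift, max_shift]."""
    return age + rng.randint(-max_shift, max_shift)


def generalize(age, zip_code):
    """Generalization step: keep the decade of the age and the first three zip digits."""
    return f"{age // 10}*", zip_code[:3] + "**"


def hybrid_anonymize(rows, max_shift=2, rng=None):
    """Perturb each record, then generalize it; Sex is suppressed in every row."""
    rng = rng or random.Random()
    released = []
    for age, _sex, zip_code, disease in rows:
        noisy_age = perturb_age(age, max_shift, rng)
        age_band, zip_band = generalize(noisy_age, zip_code)
        released.append((age_band, "*", zip_band, disease))
    return released


for row in hybrid_anonymize(microdata, rng=random.Random(7)):
    print(row)
```

Because the noise can push an age across a decade boundary, the released table should still be checked for k-anonymity after both steps.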

28

References

[1] Dhanalakshmi, M., and E. Siva Sankari. "Privacy preserving data mining techniques-survey." Information Communication and Embedded Systems (ICICES), 2014 International Conference on. IEEE, 2014.

[2] K. Saranya, K. Premalatha, S. S. Rajasekar. "A Survey on Privacy Preserving Data Mining." International Journal of Innovations & Advancement in Computer Science 2015, IEEE, 2015.

29

References (cont.)

[3] Wang, Jian, et al. "A survey on privacy preserving data mining." Database Technology and Applications, 2009 First International Workshop on. IEEE, 2009.

[4] Wang, Jian, et al. "A survey on anonymity-based privacy preserving." E-Business and Information System Security, 2009. EBISS'09. International Conference on. IEEE, 2009.

30

References (cont.)

[5] Vijayarani, S., A. Tamilarasi, and M. Sampoorna. "Analysis of privacy preserving k-anonymity methods and techniques." Communication and Computational Intelligence (INCOCCI), 2010 International Conference on. IEEE, 2010.

[6] Lohiya, Savita, and Lata Ragha. "Privacy Preserving in Data Mining Using Hybrid Approach." Computational Intelligence and Communication Networks (CICN), 2012 Fourth International Conference on. IEEE, 2012.

31

Thank You
