dcmeet second 07-07-11

Mining Association Rules using Optimal Genetic Algorithm & Quantum Swarm Intelligent PSO. Presented by K. Indira under the guidance of Dr. S. Kanmani, Professor, Department of Information Technology, Pondicherry Engineering College.

Upload: harinima

Post on 27-Dec-2015


Page 1: Dcmeet Second 07-07-11

Mining Association Rules using Optimal Genetic Algorithm & Quantum Swarm Intelligent PSO

Presented by K. Indira

Under the guidance of Dr. S. Kanmani, Professor, Department of Information Technology, Pondicherry Engineering College.

Page 2: Dcmeet Second 07-07-11

Contents

• Objective
• Introduction: Data Mining; Association Analysis; Limitations of the Existing System; GA and PSO – An Introduction
• Existing Work: Based on GA; Based on PSO
• Work Done So Far
• Proposed Work
• Execution Plan
• Papers Published
• References

Page 3: Dcmeet Second 07-07-11

Objective

To propose an efficient methodology for mining association rules (ARs) using an Optimal Genetic Algorithm and Quantum Swarm Intelligent PSO.

Page 4: Dcmeet Second 07-07-11

Data Mining

Data mining is the extraction of interesting information or patterns from data in large databases.

Page 5: Dcmeet Second 07-07-11

Association Analysis

• Association analysis is the discovery of what are commonly called association rules.

• It studies the frequency of items occurring together in transactional databases.

• Association rule mining provides valuable information for assessing significant correlations.

Page 6: Dcmeet Second 07-07-11

Association Rules

Find all rules X → Y that satisfy minimum support and minimum confidence.

• Support, s: the probability that a transaction contains X ∪ Y.

• Confidence, c: the conditional probability that a transaction containing X also contains Y.

Tid    Items bought
10     Milk, Nuts, Sugar
20     Milk, Coffee, Sugar
30     Milk, Sugar, Eggs
40     Nuts, Eggs, Bread
50     Nuts, Coffee, Sugar, Eggs, Bread

[Figure: Venn diagram of customers who buy milk, customers who buy sugar, and customers who buy both]

Let minsup = 50%, minconf = 50%.

Frequent patterns: Milk:3, Nuts:3, Sugar:4, Eggs:3, {Milk, Sugar}:3

Association rules (support, confidence):
Milk → Sugar (60%, 100%)
Sugar → Milk (60%, 75%)
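The support and confidence figures on this slide can be checked with a few lines of Python. This is only an illustrative sketch: the names `transactions`, `support_count`, `support`, and `confidence` are chosen here and do not come from the presentation.

```python
# The five transactions from the slide, keyed by Tid.
transactions = {
    10: {"Milk", "Nuts", "Sugar"},
    20: {"Milk", "Coffee", "Sugar"},
    30: {"Milk", "Sugar", "Eggs"},
    40: {"Nuts", "Eggs", "Bread"},
    50: {"Nuts", "Coffee", "Sugar", "Eggs", "Bread"},
}


def support_count(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for items in db.values() if itemset <= items)


def support(itemset, db):
    """Fraction of transactions that contain `itemset`."""
    return support_count(itemset, db) / len(db)


def confidence(antecedent, consequent, db):
    """Estimate of P(consequent | antecedent) from transaction counts."""
    both = support_count(set(antecedent) | set(consequent), db)
    return both / support_count(antecedent, db)


if __name__ == "__main__":
    for x, y in [("Milk", "Sugar"), ("Sugar", "Milk")]:
        s = support({x, y}, transactions)
        c = confidence({x}, {y}, transactions)
        print(f"{x} -> {y}: support {s:.0%}, confidence {c:.0%}")
```

Both rules clear minsup = 50% and minconf = 50%, which matches the values listed on the slide.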

Page 7: Dcmeet Second 07-07-11

Limitations of Existing System

• Apriori, FP-Growth, and Eclat are some of the popular algorithms for mining ARs.

• They traverse the database many times.

• I/O overhead and computational complexity are high.

• They cannot meet the requirements of large-scale database mining.

Page 8: Dcmeet Second 07-07-11

GA and PSO – An Introduction

• Evolutionary algorithms provide a robust and efficient approach to exploring large search spaces.

• A Genetic Algorithm (GA) is a procedure used to find approximate solutions to search problems through the application of the principles of evolutionary biology.

• The PSO mechanism is inspired by the social and cooperative behavior displayed by various species, such as birds and fish, as well as human beings.

Page 9: Dcmeet Second 07-07-11

Existing Work

Mining ARs Based on Genetic Algorithm

• Efficient Distributed Genetic Algorithm: the population is spatially partitioned into several semi-isolated nodes, each evolving in parallel and possibly exploring different regions of the search space.

• Genetic algorithm that does not take minimum support and confidence into account: it extracts the rules that have the best correlation between support and confidence.

• Improved niched Pareto genetic algorithm (INPGA): selects accurate candidates and saves selection time by combining BNPGA and SDNPGA.

• Genetic relation algorithm (GRA) with a new operator called guided mutation: GRA considers the correlation coefficient between nodes in each individual.

Page 10: Dcmeet Second 07-07-11

Existing Work contd..

Mining ARs Based on Particle Swarm Optimization

• A novel algorithm for association rule mining that improves computational efficiency and automatically determines suitable threshold values.

• An algorithm that operates at three evolution levels and uses an adaptive inertia weight. A safety distance, together with a proximity index, is introduced to move the particle from its current position.

• A self-adaptive method that adjusts the inertia weight of the velocity update rule based on empirical values and a negative feedback technique, which relieves the burden of specifying the parameter values.

• A method that combines Particle Swarm Optimization (PSO) and Genetic Algorithms (GAs), using fuzzy logic to integrate the results of both methods and to tune parameters. It combines the advantages of PSO and GA to give an improved FPSO + FGA hybrid approach.
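The inertia weight mentioned in these bullets enters PSO through the velocity update rule. The sketch below shows the canonical update with an inertia weight `w`, plus a linearly decreasing schedule as a simple stand-in for the adaptive schemes of the cited papers, whose exact formulas are not given in the slides. The function names and default coefficients are assumptions made for illustration.

```python
import random


def pso_step(position, velocity, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=random):
    """One canonical PSO update for a single particle.

    v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x);  x <- x + v
    """
    new_position, new_velocity = [], []
    for x, v, p, g in zip(position, velocity, pbest, gbest):
        r1, r2 = rng.random(), rng.random()
        nv = w * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)
        new_velocity.append(nv)
        new_position.append(x + nv)
    return new_position, new_velocity


def linear_inertia(iteration, max_iter, w_start=0.9, w_end=0.4):
    """Assumed schedule: inertia weight decreasing linearly over the run."""
    return w_start - (w_start - w_end) * iteration / max_iter
```

A larger `w` favors exploration and a smaller `w` favors local refinement, which is why the surveyed methods adapt it during the run instead of fixing it.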

Page 11: Dcmeet Second 07-07-11

Work Done So Far

• Association rule mining was carried out using a genetic algorithm in Matlab 2008a.

• Association rule mining was carried out using a self-adaptive genetic algorithm in Java.

• The GA parameters were varied and the results were recorded for each case.

Page 12: Dcmeet Second 07-07-11

Mining ARs using GA in Matlab 2008a

Methodology

Dataset : Lenses, Iris, and Haberman from the UCI Machine Learning Repository

Population : Fixed (tested with 3 values)

Selection : Tournament

Crossover Probability : Fixed (tested with 3 values)

Mutation Probability : No mutation

Fitness Function : [equation not captured in the transcript]
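As a rough illustration of this configuration (tournament selection, a fixed crossover probability, no mutation), the sketch below shows the two operators in Python, not in the Matlab used for the actual work. The tournament size `k`, the single-point crossover, and the list-based chromosome encoding are assumptions; the slide does not specify them, and the fitness values are simply passed in because the fitness function is not available in the transcript.

```python
import random


def tournament_select(population, fitness, k=2, rng=random):
    """Pick k distinct individuals at random and return the fittest of them."""
    contestants = rng.sample(range(len(population)), k)
    best = max(contestants, key=lambda i: fitness[i])
    return population[best]


def single_point_crossover(parent_a, parent_b, pc, rng=random):
    """With probability pc, swap the tails of two equal-length chromosomes."""
    if rng.random() >= pc or len(parent_a) < 2:
        return parent_a[:], parent_b[:]          # no crossover: copy the parents
    cut = rng.randrange(1, len(parent_a))        # cut point strictly inside
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b
```

With no mutation, new genetic material can only come from recombining the initial population.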

Page 13: Dcmeet Second 07-07-11

[Figure: Flow chart of the GA]

Page 14: Dcmeet Second 07-07-11

Results Analysis

Comparison based on variation in population size (Acc. % = accuracy, Gens = number of generations)

Population size   No. of Instances          No. of Instances × 1.25   No. of Instances × 1.5
Dataset           Acc. %    Gens            Acc. %    Gens            Acc. %    Gens
Lenses            75        7               82        12              95        17
Haberman          71        114             68        88              64        70
Iris              77        88              87        53              82        45

Comparison based on variation in minimum support and minimum confidence

                  Sup = 0.4 & Con = 0.4   Sup = 0.9 & Con = 0.9   Sup = 0.9 & Con = 0.2   Sup = 0.2 & Con = 0.9
Dataset           Acc. %    Gens          Acc. %    Gens          Acc. %    Gens          Acc. %    Gens
Lenses            22        20            49        11            70        21            95        18
Haberman          45        68            58        83            71        90            62        75
Iris              40        28            59        37            78        48            87        55

Page 15: Dcmeet Second 07-07-11

Comparison based on variation in crossover probability (Acc. % = accuracy, Gens = number of generations)

                  Pc = 0.25           Pc = 0.5            Pc = 0.75
Dataset           Acc. %    Gens      Acc. %    Gens      Acc. %    Gens
Lenses            95        8         95        16        95        13
Haberman          69        77        71        83        70        80
Iris              84        45        86        51        87        55

Comparison of the optimum parameter values for the maximum accuracy achieved

Dataset    No. of Instances   No. of Attributes   Population Size   Minimum Support   Minimum Confidence   Crossover Rate   Accuracy %
Lenses     24                 4                   36                0.2               0.9                  0.25             95
Haberman   306                3                   306               0.9               0.2                  0.5              71
Iris       150                5                   225               0.2               0.9                  0.75             87

Page 16: Dcmeet Second 07-07-11

Inferences

• The values of minimum support, minimum confidence, and population size determine the accuracy of the system more than the other GA parameters do.

• The crossover rate affects the convergence rate rather than the accuracy of the system.

• The optimum values of the GA parameters vary from dataset to dataset, and the fitness function plays a major role in optimizing the results.

• The size of the dataset and the relationships between its attributes contribute to the setting of the parameters.

Page 17: Dcmeet Second 07-07-11

Mining ARs using Self Adaptive GA in Java

Methodology

Dataset : Lenses, Iris, and Car from the UCI Machine Learning Repository

Population : Fixed (tested with 3 values)

Selection : Roulette Wheel

Crossover Probability : Fixed (tested with 3 values)

Mutation Probability : Self Adaptive

Fitness Function : [equation not captured in the transcript]
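Roulette wheel selection, named above as the selection operator, picks an individual with probability proportional to its fitness. A minimal Python sketch follows (the actual work used Java). It assumes non-negative fitness values, and the function name and fallback behavior are illustrative choices, not details taken from the slides.

```python
import random


def roulette_select(population, fitness, rng=random):
    """Select one individual with probability proportional to its fitness.

    Assumes all fitness values are non-negative.
    """
    total = sum(fitness)
    if total <= 0:
        return rng.choice(population)   # degenerate case: pick uniformly
    pick = rng.random() * total         # a point on the wheel, in [0, total)
    running = 0.0
    for individual, f in zip(population, fitness):
        running += f
        if pick < running:
            return individual
    return population[-1]               # guard against floating-point drift
```

Unlike tournament selection, the selection pressure here depends on the scale of the fitness values, so a single very fit rule can dominate the wheel.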

Page 18: Dcmeet Second 07-07-11

Procedure SAGA

Begin
    Initialize population p(k);
    Define the crossover and mutation rate;
    Do
    {
        Do
        {
            Calculate support of all k rules;
            Calculate confidence of all k rules;
            Obtain fitness;
            Select individuals for crossover / mutation;
            Calculate the average fitness of the nth and (n-1)th generations;
            Calculate the maximum fitness of the nth and (n-1)th generations;
            Based on the fitness of the selected item, calculate the new crossover and mutation rate;
            Choose the operation to be performed;
        } k times;
    }
End
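The procedure recomputes the crossover and mutation rates from the average and maximum fitness, but the transcript does not give the formula. The sketch below uses a common adaptive scheme as a stand-in: below-average individuals keep the full base rate, and above-average individuals get a rate that shrinks as they approach the best fitness. It uses only the current generation's statistics, which is a simplification of the two-generation comparison in the procedure. The name `adaptive_rate` and the `floor` parameter are hypothetical.

```python
def adaptive_rate(f_selected, f_avg, f_max, base_rate, floor=0.001):
    """Assumed self-adaptive rate (crossover or mutation) for one individual.

    f_selected: fitness of the selected individual
    f_avg, f_max: average and maximum fitness of the current generation
    """
    if f_max <= f_avg:
        return base_rate        # population has converged: keep the base rate
    if f_selected < f_avg:
        return base_rate        # weak individual: disrupt it at the full rate
    # Strong individual: scale the rate down, reaching `floor` for the best one.
    scaled = base_rate * (f_max - f_selected) / (f_max - f_avg)
    return max(floor, scaled)
```

The intent of any such rule is to protect good rules from disruption while still perturbing poor ones.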

Page 19: Dcmeet Second 07-07-11

[Figure: Self Adaptive GA]

Page 20: Dcmeet Second 07-07-11

Results Analysis

Accuracy comparison between GA and SAGA when parameters are ideal for traditional GA

                 Traditional GA                     Self Adaptive GA
Dataset          Accuracy %   No. of Generations    Accuracy %   No. of Generations
Lenses           75           38                    87.5         35
Haberman         52           36                    68           28
Car Evaluation   85           29                    96           21

Accuracy comparison between GA and SAGA when parameters are according to termination of SAGA

                 Traditional GA                     Self Adaptive GA
Dataset          Accuracy %   No. of Generations    Accuracy %   No. of Generations
Lenses           50           35                    87.5         35
Haberman         36           38                    68           28
Car Evaluation   74           36                    96           21

Page 21: Dcmeet Second 07-07-11

Inferences

• Better accuracy.

• Better convergence.

• Self Adaptive GA gives better accuracy than Traditional GA.

Page 22: Dcmeet Second 07-07-11

Proposed Work

1. To implement a distributed niched Pareto memetic algorithm for rule mining.

2. To propose an association rule mining algorithm based on chaotic PSO and swarm intelligence.

3. To propose a particle swarm optimization rule mining methodology combined with quantum computing and quantum differential evolution.

Page 23: Dcmeet Second 07-07-11

Niched Pareto Selection Algorithm

• Obtain the comparison set S from clustering-based samples.

• For any two candidates and the comparison set S, if one candidate is dominated and the other is not, the non-dominated candidate is selected. Exit.

• Otherwise, for the two candidates (cd_1 and cd_2), compute the number of samples in their two niches, count1 and count2.

• If count1 = 0, cd_1 is selected; if count2 = 0, cd_2 is selected. Exit.

• If count1 - count2 > delta, cd_2 is selected; if count2 - count1 > delta, cd_1 is selected. Exit.

• If abs(count1 - count2) < delta, compute the standard deviations of the two niches, sd1 and sd2.

• If sd1 > sd2, cd_1 is selected; otherwise, cd_2 is selected.

• Exit.
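The selection steps on this slide can be sketched in Python as follows. Several details are assumptions made only so the example is self-contained: candidates are tuples of objectives to maximize (for example support and confidence), a niche is the set of samples within a Euclidean `radius` of a candidate, and the "standard deviation of a niche" is taken as the spread of member distances from the candidate. The slide defines none of these.

```python
import math
import statistics


def dominates(a, b):
    """True if a is no worse than b on every objective and better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def niche_members(cand, samples, radius):
    """Samples lying within `radius` of the candidate (assumed niche definition)."""
    return [s for s in samples if math.dist(cand, s) <= radius]


def niche_sd(cand, members):
    """Assumed spread measure: std. dev. of member distances to the candidate."""
    dists = [math.dist(cand, m) for m in members]
    return statistics.pstdev(dists) if dists else 0.0


def niched_pareto_select(cd_1, cd_2, comparison_set, samples, radius, delta):
    dom1 = any(dominates(s, cd_1) for s in comparison_set)
    dom2 = any(dominates(s, cd_2) for s in comparison_set)
    if dom1 != dom2:                      # exactly one candidate is dominated
        return cd_2 if dom1 else cd_1
    n1 = niche_members(cd_1, samples, radius)
    n2 = niche_members(cd_2, samples, radius)
    count1, count2 = len(n1), len(n2)
    if count1 == 0:                       # an empty niche wins outright
        return cd_1
    if count2 == 0:
        return cd_2
    if count1 - count2 > delta:           # prefer the less crowded niche
        return cd_2
    if count2 - count1 > delta:
        return cd_1
    # Counts are within delta of each other: fall back to niche spread.
    sd1, sd2 = niche_sd(cd_1, n1), niche_sd(cd_2, n2)
    return cd_1 if sd1 > sd2 else cd_2
```

The slide leaves the case abs(count1 - count2) = delta unspecified; this sketch treats it like the "within delta" case and compares standard deviations.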

Page 24: Dcmeet Second 07-07-11

Distributed Model

[Figure: diagram with a Full Dataset box, four GA subpopulations (GA1 to GA4), and a "Rules Generated" box for each subpopulation; remaining labels: Concept, Description]

Page 25: Dcmeet Second 07-07-11

Association Rule Mining Algorithm Based on Chaotic PSO and Swarm Intelligence

[Figure: concept diagram with the labels "Based on chaotic maps" and "Swarm Intelligence"]
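The slide only says that the chaotic PSO variant is based on chaotic maps. As a hedged illustration of the idea, the sketch below uses the logistic map, one standard chaotic map, to generate a sequence in [0, 1] that could replace a random number or drive a PSO parameter such as the inertia weight. The choice of map and the `chaotic_inertia` helper are assumptions, not details from the presentation.

```python
def logistic_map(x0, steps, r=4.0):
    """Return `steps` successive iterates of x -> r * x * (1 - x), starting from x0."""
    values = []
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
        values.append(x)
    return values


def chaotic_inertia(x, w_min=0.4, w_max=0.9):
    """Map a chaotic value in [0, 1] to an inertia weight in [w_min, w_max]."""
    return w_min + (w_max - w_min) * x
```

With r = 4 the map is chaotic for most seeds in (0, 1), but seeds such as 0.25, 0.5, and 0.75 settle on fixed points and should be avoided.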

Page 26: Dcmeet Second 07-07-11

Execution Plan

July : Niched Pareto sampling based selection. Implementing µGA for local intensity search.

August : Distributed methodology implementation. Preparing the above work as a paper.

September & October : Particle Swarm Optimization based rule mining to be implemented.

November : Chaotic PSO and swarm intelligence based PSO for mining ARs to be implemented. Documenting the same as a paper.

December & January : Study of quantum computing and differential evolution concepts.

Page 27: Dcmeet Second 07-07-11

Papers Published

Paper titled “Framework for Comparison of Association Rule Mining Using Genetic Algorithm” was presented at the International Conference on Computers, Communication & Intelligence at VCET, 2010.

Paper titled “Mining Association Rules Using Genetic Algorithm: The Role of Estimation Parameters” has been selected for presentation at the International Conference on Advances in Computing and Communications, 2011. To be published in the Springer LNCS (CCIS) series.

Paper titled “Rule Acquisition in Data Mining Using a Self Adaptive Genetic Algorithm” has been selected for presentation at the First International Conference on Computer Science and Information Technology (CCSEIT-2011). To be published in the Springer LNCS (CCIS) series.

Page 28: Dcmeet Second 07-07-11

References

• Jing Li, Han Rui-feng, “A Self-Adaptive Genetic Algorithm Based On Real-Coded”, International Conference on Biomedical Engineering and Computer Science, pp. 1–4, 2010.

• Chuan-Kang Ting, Wei-Ming Zeng, Tzu-Chieh Lin, “Linkage Discovery through Data Mining”, IEEE Computational Intelligence Magazine, Volume 5, February 2010.

• Caises, Y., Leyva, E., Gonzalez, A., Perez, R., “An Extension of the Genetic Iterative Approach for Learning Rule Subsets”, 4th International Workshop on Genetic and Evolutionary Fuzzy Systems, pp. 63–67, 2010.

• Shangping Dai, Li Gao, Qiang Zhu, Changwu Zhu, “A Novel Genetic Algorithm Based on Image Databases for Mining Association Rules”, 6th IEEE/ACIS International Conference on Computer and Information Science, pp. 977–980, 2007.

• Peregrin, A., Rodriguez, M.A., “Efficient Distributed Genetic Algorithm for Rule Extraction”, Eighth International Conference on Hybrid Intelligent Systems, HIS '08, pp. 531–536, 2008.

Page 29: Dcmeet Second 07-07-11

References Contd..

• Mansoori, E.G., Zolghadri, M.J., Katebi, S.D., “SGERD: A Steady-State Genetic Algorithm for Extracting Fuzzy Classification Rules From Data”, IEEE Transactions on Fuzzy Systems, Volume 16, Issue 4, pp. 1061–1071, 2008.

• Xiaoyuan Zhu, Yongquan Yu, Xueyan Guo, “Genetic Algorithm Based on Evolution Strategy and the Application in Data Mining”, First International Workshop on Education Technology and Computer Science, ETCS '09, Volume 1, pp. 848–852, 2009.

• Hong Guo, Ya Zhou, “An Algorithm for Mining Association Rules Based on Improved Genetic Algorithm and its Application”, 3rd International Conference on Genetic and Evolutionary Computing, WGEC '09, pp. 117–120, 2009.

• Genxiang Zhang, Haishan Chen, “Immune Optimization Based Genetic Algorithm for Incremental Association Rules Mining”, International Conference on Artificial Intelligence and Computational Intelligence, AICI '09, Volume 4, pp. 341–345, 2009.

Page 30: Dcmeet Second 07-07-11

References Contd..

• Maria J. Del Jesus, Jose A. Gamez, Pedro Gonzalez, Jose M. Puerta, “On the Discovery of Association Rules by means of Evolutionary Algorithms”, Advanced Review, John Wiley & Sons, Inc., 2011.

• Junli Lu, Fan Yang, Momo Li, Lizhen Wang, “Multi-objective Rule Discovery Using the Improved Niched Pareto Genetic Algorithm”, Third International Conference on Measuring Technology and Mechatronics Automation, 2011.

• Hamid Reza Qodmanan, Mahdi Nasiri, Behrouz Minaei-Bidgoli, “Multi Objective Association Rule Mining with Genetic Algorithm without Specifying Minimum Support and Minimum Confidence”, Expert Systems with Applications 38 (2011) 288–298.

• Miguel Rodriguez, Diego M. Escalante, Antonio Peregrin, “Efficient Distributed Genetic Algorithm for Rule Extraction”, Applied Soft Computing 11 (2011) 733–743.

• J.H. Ang, K.C. Tan, A.A. Mamun, “An Evolutionary Memetic Algorithm for Rule Extraction”, Expert Systems with Applications 37 (2010) 1302–1315.

Page 31: Dcmeet Second 07-07-11

References Contd..

• R.J. Kuo, C.M. Chao, Y.T. Chiu, “Application of Particle Swarm Optimization to Association Rule Mining”, Applied Soft Computing 11 (2011) 326–336.

• Bilal Alatas, Erhan Akin, “Multi-objective Rule Mining Using a Chaotic Particle Swarm Optimization Algorithm”, Knowledge-Based Systems 22 (2009) 455–460.

• Mourad Ykhlef, “A Quantum Swarm Evolutionary Algorithm for Mining Association Rules in Large Databases”, Journal of King Saud University – Computer and Information Sciences (2011) 23, 1–6.

• Haijun Su, Yupu Yang, Liang Zhao, “Classification Rule Discovery with DE/QDE Algorithm”, Expert Systems with Applications 37 (2010) 1216–1222.


Page 33: Dcmeet Second 07-07-11

References Contd..

• Yan Chen, Shingo Mabu, Kotaro Hirasawa, “Genetic Relation Algorithm with Guided Mutation for the Large-Scale Portfolio Optimization”, Expert Systems with Applications 38 (2011) 3353–3363.

Page 34: Dcmeet Second 07-07-11

References Contd..

• Yamina Mohamed Ben Ali, “Soft Adaptive Particle Swarm Algorithm for Large Scale Optimization”, IEEE, 2010.

• Feng Lu, Yanfeng Ge, LiQun Gao, “Self-adaptive Particle Swarm Optimization Algorithm for Global Optimization”, Sixth International Conference on Natural Computation (ICNC 2010), 2010.

• Fevrier Valdez, Patricia Melin, Oscar Castillo, “An Improved Evolutionary Method with Fuzzy Logic for Combining Particle Swarm Optimization and Genetic Algorithms”, Applied Soft Computing 11 (2011) 2625–2632.

Page 35: Dcmeet Second 07-07-11


Thank You