Other Classification Methods in Data Mining

By- Ajaydeep Abhishek kutiyal

Upload: kumar-deepak

Post on 04-Dec-2014


Page 1: Other classification methods in data mining

By- Ajaydeep Abhishek kutiyal

Page 2: Other classification methods in data mining

Classification

Classification is the process of finding a model that describes and distinguishes data classes or concepts, so that the model can be used to predict the class of objects whose class label is unknown.

Classification predicts categorical class labels (discrete or nominal).

It constructs a model from the training set and the values (class labels) of a classifying attribute, and then uses that model to classify new data.

Page 3: Other classification methods in data mining

Training Data

name   age      income   loan_decision
Mike   young    low      risky
Mary   young    low      risky
Bill   midage   high     safe
Jim    midage   low      risky
Dave   senior   low      safe
Anne   senior   medium   safe

Classification Algorithms

Classifier (Model)

IF age = young THEN loan_decision = risky
IF income = high THEN loan_decision = safe
IF age = midage AND income = low THEN loan_decision = risky

Page 4: Other classification methods in data mining

Classifier

Testing Data

name     age      income   loan_decision
Tom      senior   low      safe
Mariya   midage   low      risky
George   midage   high     safe
...      ...      ...      ...

Unseen Data

(John, midage, low)

Loan decision?
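The flow on these two slides (rules learned from the training data, checked on the testing data, then applied to an unseen tuple) can be sketched in Python. This is only an illustration: the function names `classify` and `covered_accuracy` are made up here, the age value is spelled `midage` throughout, and returning `None` when no rule fires is an assumption, since the slides do not say what happens in that case.

```python
# The three IF-THEN rules from the classifier slide, tried in order.
def classify(age, income):
    if age == "young":
        return "risky"
    if income == "high":
        return "safe"
    if age == "midage" and income == "low":
        return "risky"
    return None  # assumption: no rule fires, so no prediction is made


# Testing data from the slide: (name, age, income, known loan decision).
testing_data = [
    ("Tom", "senior", "low", "safe"),
    ("Mariya", "midage", "low", "risky"),
    ("George", "midage", "high", "safe"),
]


def covered_accuracy(rows):
    """Return (correct, covered) over the tuples that some rule covers."""
    covered = [
        (classify(age, income), label)
        for _, age, income, label in rows
        if classify(age, income) is not None
    ]
    correct = sum(1 for predicted, label in covered if predicted == label)
    return correct, len(covered)


print(covered_accuracy(testing_data))
print(classify("midage", "low"))  # the unseen tuple (John, midage, low)
```

With these three rules the tuple for Tom (senior, low) is not covered by any rule, which is why the sketch reports accuracy only over the covered tuples.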

Page 5: Other classification methods in data mining

• Genetic Algorithms
• Rough Set Approach
• Fuzzy Set Approach

Page 6: Other classification methods in data mining

Genetic algorithms are examples of evolutionary computing methods and are optimization-type algorithms.

Given a population of potential problem solutions (individuals), evolutionary computing expands this population with new and potentially better solutions.

Page 7: Other classification methods in data mining

The basis for evolutionary computing algorithms is biological evolution, where over time evolution produces the best or “fittest” individuals.

In data mining, genetic algorithms may be used for clustering, prediction, and even association rules.

Page 8: Other classification methods in data mining

Individual (chromosome)
• A feasible solution in an optimization problem

Population
• A set of individuals
• Should be maintained in each generation

Page 9: Other classification methods in data mining

The most important starting point in developing a genetic algorithm is the representation (encoding) of an individual.

Each gene has its own meaning. Based on this representation, we can define:
• a fitness evaluation function
• a crossover operator
• a mutation operator

Page 10: Other classification methods in data mining

The fitness function takes a single chromosome as input and returns a measure of the goodness of the solution represented by the chromosome.
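As a minimal sketch of that interface, the function below takes one chromosome and returns a single number. The scoring rule (count the 1 bits) is a toy assumption chosen only for illustration; the slides do not specify a particular fitness measure, and a real classifier would score something like accuracy on training data instead.

```python
def fitness(chromosome):
    """Toy fitness (an assumption): the number of 1 bits in the string."""
    return chromosome.count("1")


# Higher values mean a "better" individual under this toy measure.
for individual in ["11011", "01001", "00000"]:
    print(individual, fitness(individual))
```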

Page 11: Other classification methods in data mining

In genetic algorithms, reproduction is defined by precise algorithms that indicate how to combine the given set of individuals to produce new ones. These are called “crossover algorithms”.

Given two individuals (parents) from a population, the crossover technique generates new individuals (offspring, or children) by swapping subsequences of their strings.

Page 12: Other classification methods in data mining

Single-point Crossover

Parents:     1 1 0 1 1 | 0 0 1 0 0 0
             0 0 0 0 1 | 0 1 0 1 0 1

Offspring:   1 1 0 1 1 | 0 1 0 1 0 1
             0 0 0 0 1 | 0 0 1 0 0 0

Two-point Crossover

Parents:     1 1 0 | 1 1 0 0 | 1 0 0 0
             0 0 0 | 0 1 0 1 | 0 1 0 1

Offspring:   1 1 0 | 0 1 0 1 | 1 0 0 0
             0 0 0 | 1 1 0 0 | 0 1 0 1

Uniform Crossover

Crossover template:   1 0 1 0 1 0 1 0 0 1 1

Parents:     1 1 0 1 1 0 0 1 0 0 0
             0 0 0 0 1 0 1 0 1 0 1

Offspring:   1 0 0 0 1 0 0 0 1 0 0
             0 1 0 1 1 0 1 1 0 0 1
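The three crossover variants on this slide can be sketched as small Python functions over bit strings. The parent strings and the crossover template are read off the slide's diagram; the function names and the cut positions (5 for single-point, 3 and 7 for two-point) are assumptions inferred from the offspring shown, and in the uniform variant a template bit of 1 is taken to mean "first child copies from the first parent".

```python
def single_point(p1, p2, point):
    """Swap everything after one cut point."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


def two_point(p1, p2, a, b):
    """Swap only the segment between cut points a and b."""
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]


def uniform(p1, p2, template):
    """Choose each bit from one parent or the other according to a template."""
    c1 = "".join(x if t == "1" else y for x, y, t in zip(p1, p2, template))
    c2 = "".join(y if t == "1" else x for x, y, t in zip(p1, p2, template))
    return c1, c2


p1, p2 = "11011001000", "00001010101"
print(single_point(p1, p2, 5))         # ('11011010101', '00001001000')
print(two_point(p1, p2, 3, 7))         # ('11001011000', '00011000101')
print(uniform(p1, p2, "10101010011"))  # ('10001000100', '01011011001')
```

In practice the cut points and the template would be drawn at random; they are fixed here so the output can be compared with the slide.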

Page 13: Other classification methods in data mining

Mutation usually changes a single bit in a bit string.

This operator should be applied with very low probability.

Before mutation:   0 1 0 1 1
After mutation:    0 1 1 1 1

Mutation point (random): here, the third bit
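A bit-flip mutation along these lines might look like the sketch below. The names `flip_at` and `mutate` and the default rate of 0.01 are assumptions for illustration; the slide only says the probability should be very low.

```python
import random


def flip_at(bits, index):
    """Flip the single bit at a given position (0-based)."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]


def mutate(bits, rate=0.01, rng=random):
    """Flip each bit independently with a small probability `rate`."""
    return "".join(
        ("1" if b == "0" else "0") if rng.random() < rate else b
        for b in bits
    )


print(flip_at("01011", 2))       # 01111, the example on the slide
print(mutate("01011", rate=0.01))
```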

Page 14: Other classification methods in data mining

Crossover mates are probabilistically selected based on their fitness value.

[Figure: producing the new generation from the old generation]

1. Probabilistically select individuals from the old generation, e.g. 1 1 0 1 1 and 0 1 0 0 1.
2. Apply crossover at a randomly selected crossover point (here, after the third bit), giving 1 1 0 0 1 and 0 1 0 1 1.
3. Apply mutation at a random mutation point (here, the third bit), changing 0 1 0 1 1 into 0 1 1 1 1.
4. Place the resulting individuals in the new generation.
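Putting selection, crossover and mutation together, one generation step might be sketched as follows. Everything here is an assumption made for illustration: the toy fitness (number of 1 bits), the fitness-proportionate ("roulette wheel") selection via `random.choices`, the mutation rate, and the starting population of six 5-bit strings.

```python
import random


def fitness(bits):
    return bits.count("1")  # toy fitness, an assumption


def select_pair(population, rng):
    # Fitness-proportionate selection; +1 keeps all-zero strings selectable.
    weights = [fitness(ind) + 1 for ind in population]
    return rng.choices(population, weights=weights, k=2)


def crossover(p1, p2, rng):
    point = rng.randrange(1, len(p1))  # randomly selected crossover point
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


def mutate(bits, rate, rng):
    return "".join(
        ("1" if b == "0" else "0") if rng.random() < rate else b
        for b in bits
    )


def next_generation(population, rate, rng):
    new = []
    while len(new) < len(population):
        a, b = select_pair(population, rng)
        for child in crossover(a, b, rng):
            new.append(mutate(child, rate, rng))
    return new[:len(population)]  # keep the population size constant


rng = random.Random(0)
population = ["01001", "11010", "00111", "01011", "11010", "11011"]
for _ in range(20):
    population = next_generation(population, 0.01, rng)
print(max(population, key=fitness))
```

Keeping the population size fixed from one generation to the next matches the earlier remark that the population should be maintained in each generation.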

Page 15: Other classification methods in data mining

A rough set is a formal approximation of a crisp set in terms of a pair of sets which give the lower and the upper approximation of the original set.

The tuple composed of the lower and upper approximation is called a rough set.

Page 16: Other classification methods in data mining

• A rough set definition for a given class C is approximated by two sets:

1. The lower approximation of C consists of all the data tuples that, based on the knowledge of the attributes, are certain to belong to C without ambiguity.

2. The upper approximation of C consists of all the data tuples that, based on the knowledge of the attributes, cannot be described as not belonging to C.
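The two approximations can be computed by grouping tuples that are indistinguishable on the chosen attributes. The sketch below reuses the loan training data from the earlier slide; the function name `approximations` and the choice to look at income alone are assumptions made so that the two sets differ.

```python
from collections import defaultdict

rows = [
    {"name": "Mike", "age": "young", "income": "low", "loan": "risky"},
    {"name": "Mary", "age": "young", "income": "low", "loan": "risky"},
    {"name": "Bill", "age": "midage", "income": "high", "loan": "safe"},
    {"name": "Jim", "age": "midage", "income": "low", "loan": "risky"},
    {"name": "Dave", "age": "senior", "income": "low", "loan": "safe"},
    {"name": "Anne", "age": "senior", "income": "medium", "loan": "safe"},
]


def approximations(rows, attributes, target):
    """Return (lower, upper) approximations of the class loan == target."""
    groups = defaultdict(list)
    for row in rows:
        # Tuples with equal attribute values cannot be told apart.
        groups[tuple(row[a] for a in attributes)].append(row)
    lower, upper = [], []
    for members in groups.values():
        labels = {m["loan"] for m in members}
        names = [m["name"] for m in members]
        if labels == {target}:   # every member certainly belongs to the class
            lower.extend(names)
        if target in labels:     # membership cannot be ruled out
            upper.extend(names)
    return sorted(lower), sorted(upper)


lower, upper = approximations(rows, ["income"], "safe")
print(lower)  # ['Anne', 'Bill']
print(upper)  # ['Anne', 'Bill', 'Dave', 'Jim', 'Mary', 'Mike']
```

Knowing only income, the four low-income tuples are indistinguishable and carry both labels, so they fall in the upper approximation of "safe" but not in the lower one.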

Page 17: Other classification methods in data mining

Rough set theory is one of the newer data mining theories. It can be used for:

1. Classification, to discover structural relationships within noisy data

2. Attribute subset selection

3. Reduction of the data set

4. Finding hidden data patterns

5. Generation of decision rules

Page 18: Other classification methods in data mining

Fuzzy logic uses truth values between 0.0 and 1.0 to represent the degree of membership (for example, as read from a fuzzy membership graph).

Attribute values are converted to fuzzy values.
• e.g., income is mapped into the discrete categories {low, medium, high}, with a fuzzy value calculated for each category

For a given new sample, more than one fuzzy value may apply.

Each applicable rule contributes a vote for membership in the categories.

Typically, the truth values for each predicted category are summed, and these sums are combined.
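The fuzzification and voting steps can be sketched as follows. All specifics are hypothetical: the income breakpoints (20, 40 and 60, in thousands), the shape of the membership functions, and the three single-condition rules linking income categories to a loan decision are assumptions, not values from the slides.

```python
def falling(x, full, zero):
    """Membership 1.0 up to `full`, falling linearly to 0.0 at `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def rising(x, zero, full):
    """Membership 0.0 up to `zero`, rising linearly to 1.0 at `full`."""
    if x <= zero:
        return 0.0
    if x >= full:
        return 1.0
    return (x - zero) / (full - zero)


def fuzzify_income(income):
    # Hypothetical breakpoints, in thousands.
    return {
        "low": falling(income, 20, 40),
        "medium": min(rising(income, 20, 40), falling(income, 40, 60)),
        "high": rising(income, 40, 60),
    }


# Hypothetical rules: (fuzzy income category, predicted loan decision).
RULES = [("low", "risky"), ("medium", "safe"), ("high", "safe")]


def vote(income):
    """Sum the truth values contributed by each rule, per predicted category."""
    memberships = fuzzify_income(income)
    totals = {}
    for category, decision in RULES:
        totals[decision] = totals.get(decision, 0.0) + memberships[category]
    return totals


print(fuzzify_income(35))  # {'low': 0.25, 'medium': 0.75, 'high': 0.0}
print(vote(35))            # {'risky': 0.25, 'safe': 0.75}
```

An income of 35 is partly "low" and partly "medium" at the same time, so more than one rule applies; one simple way to combine the sums, assumed here, is to pick the category with the largest total.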
