
Feb 2007 Alon Slapak 1 of 1

Classification

A practical approach

Classification Methods

Training Set

Classifier

Example

Definition

Bibliography


What is Classification?

“Statistical classification is a statistical procedure in which individual items are placed into groups based on quantitative information on one or more characteristics inherent in the items and based on a training set of previously labeled items.” (Wikipedia)

Example:

We can measure a student's shoe size and classify the student's gender according to an a priori (already known) distribution of male and female shoe sizes.




What is a Training set?

First, we need a database that gives us information on the distribution of students’ shoe sizes (the feature). This database is called a training set.

Example: We may measure the shoe sizes of 21 male students and 26 female students and sketch a histogram of the measurements. The histogram may be regarded (under several assumptions) as an estimate of the shoe size distribution.
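The histogram step can be sketched in a few lines of Python. The measurements below are hypothetical (the slides do not list the raw data); only the group sizes, 21 male and 26 female students, come from the example above.

```python
from collections import Counter

# Hypothetical shoe sizes (illustrative only): 21 male and 26 female students.
male_sizes = [6.0] * 2 + [6.5] * 2 + [7.0] * 5 + [7.5] * 7 + [8.0] * 5
female_sizes = [4.5] * 3 + [5.0] * 7 + [5.5] * 8 + [6.0] * 5 + [6.5] * 3


def histogram(sizes):
    """Count how many students wear each shoe size, ordered by size."""
    return dict(sorted(Counter(sizes).items()))


def relative_frequencies(sizes):
    """Normalize the counts so they can serve as a rough distribution estimate."""
    counts = histogram(sizes)
    return {size: count / len(sizes) for size, count in counts.items()}


# Print a simple text histogram for each group.
for label, sizes in (("male", male_sizes), ("female", female_sizes)):
    print(label)
    for size, count in histogram(sizes).items():
        print(f"  {size:4.1f} {'#' * count}")
```

The relative frequencies are only an estimate of the underlying distribution, and the estimate is only as good as the assumption that the measured students represent the population.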




What is a Training set?

Based on the assumption that the training set represents the group of objects to be classified, one can choose shoe size 6.5 as the discriminant boundary between male and female students.



But what if, for some reason, one of the female students in the training set wears a size 9.5 shoe?

Adding another discriminant boundary (anything above size 9.25 is a female student) does not make sense.
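As a minimal sketch of how such a boundary might be chosen, the code below tries the midpoints between consecutive shoe sizes and keeps the one with the fewest training errors. The training set, the function names, and the choice to label sizes at or below the boundary as female are all assumptions made for illustration.

```python
# Hypothetical training set of (shoe size, gender) pairs; values are illustrative only.
training_set = [
    (5.0, "female"), (5.5, "female"), (5.5, "female"), (6.0, "female"), (6.0, "female"),
    (7.0, "male"), (7.0, "male"), (7.5, "male"), (8.0, "male"),
]


def classify(shoe_size, boundary):
    """Threshold rule: sizes up to the boundary are labeled female, the rest male."""
    return "female" if shoe_size <= boundary else "male"


def training_errors(boundary, samples):
    """Number of training samples the threshold rule gets wrong."""
    return sum(classify(x, boundary) != label for x, label in samples)


def best_boundary(samples):
    """Try midpoints between consecutive distinct sizes; keep the one with fewest errors."""
    sizes = sorted({x for x, _ in samples})
    candidates = [(a + b) / 2 for a, b in zip(sizes, sizes[1:])]
    return min(candidates, key=lambda b: training_errors(b, samples))


boundary = best_boundary(training_set)

# Add one unusual sample: a female student with a size 9.5 shoe.
with_outlier = training_set + [(9.5, "female")]
boundary_with_outlier = best_boundary(with_outlier)

print(boundary, training_errors(boundary, training_set))
print(boundary_with_outlier, training_errors(boundary_with_outlier, with_outlier))
```

With this data the single boundary stays where it was after the unusual sample is added. The rule simply accepts one training error instead of growing a second boundary to accommodate it.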


“Perfect” Vs. “Simple” classifier

[Figure: scatter plot in the (x1, x2) feature plane, both axes from 0 to 6.]



“Perfect” classifier

“Simple” classifier

Which is better?

Check on a test set (cross-validation).


Cross validation of the “Perfect” classifier

[Figure: training and test patterns in the (x1, x2) feature plane, both axes from 0 to 6.]



Training set: 0% misclassification

Test set: 8/25 = 32% misclassification


Cross validation of the “simple” classifier

[Figure: training and test patterns in the (x1, x2) feature plane, both axes from 0 to 6.]



Training set: 6/29 ≈ 21% misclassification

Test set: 2/25 = 8% misclassification


Training set Vs. Test set

Overtraining is said to occur when the decision boundaries are fit too specifically to the training set.

To avoid overtraining, the classification algorithm should be tested on a different set of patterns.

Attention! A common mistake is to test the classification algorithm on the training set.

The statistical distribution of the training set and the test set should be the same.

Overtraining is often the result of a training set that is too small.
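The effect can be illustrated with a small experiment. The sketch below is not the classifier pair from the slides: it assumes synthetic two-class Gaussian data and compares a 1-nearest-neighbor rule (which memorizes the training set and so plays the role of a “perfect” classifier) with a nearest-class-mean rule (a “simple” classifier). All names and parameters are hypothetical.

```python
import random


def make_set(rng, n_per_class):
    """Draw 2-D patterns from two overlapping Gaussian classes (synthetic data)."""
    data = []
    for label, center in ((0, 2.0), (1, 4.0)):
        for _ in range(n_per_class):
            data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return data


def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def nearest_neighbor_classifier(train):
    """Memorizing rule: return the label of the closest training pattern."""
    def classify(x):
        _, label = min(train, key=lambda sample: sq_dist(sample[0], x))
        return label
    return classify


def nearest_mean_classifier(train):
    """Simple rule: return the label of the closest class mean."""
    means = {}
    for label in (0, 1):
        points = [x for x, y in train if y == label]
        means[label] = (
            sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points),
        )

    def classify(x):
        return min(means, key=lambda label: sq_dist(means[label], x))
    return classify


def error_rate(classify, samples):
    """Fraction of samples whose predicted label differs from the true label."""
    return sum(classify(x) != y for x, y in samples) / len(samples)


rng = random.Random(0)
train = make_set(rng, 15)
test = make_set(rng, 15)  # same distribution, different patterns

memorizer = nearest_neighbor_classifier(train)
simple = nearest_mean_classifier(train)

for name, clf in (("1-NN", memorizer), ("nearest mean", simple)):
    print(f"{name}: train {error_rate(clf, train):.0%}, test {error_rate(clf, test):.0%}")
```

The memorizing rule always scores 0% on its own training set, which is why the training error says little about how a classifier will behave on new patterns. Only the test-set figure is a fair basis for comparison.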




Occam’s razor

Given a training set of data, we want to use this data to choose a classifier (or decision rule), which will classify future elements based on their features.

Based on a particular training set, which of the infinitely many possible classifiers is most likely to be accurate for future data?

As seen in the previous example, Occam's Razor is an important factor in making this choice.



“Plurality should not be posited without necessity.”

Or, in pattern recognition language:

“Simpler classifiers should be preferred over complex ones.”


What is a Classifier?

“A classifier is a mapping from a (discrete or continuous) feature space X to a discrete set of labels Y.” (Wikipedia)

$g: X \rightarrow Y$




Notations for classifiers

A commonly used notation for a class is $\omega_i$, where $\omega$ stands for “class” and $i$ stands for the label (or index) of the class: $\omega_1$, $\omega_2$, and so on.

For a pattern (a vector in the feature space) we prefer the notation $x_i$: $x_1$, $x_2$, $x_3$, and so on.




Crisp classifier Vs. Fuzzy classifier

The crisp classifier: “100 percent it’s a male student.”

The fuzzy classifier: “I’m hesitating. 74 percent it’s a male student, 26 percent it’s a female.”




Crisp classifier Vs. Fuzzy classifier

A crisp classifier is a single-valued mapping from a feature space X to a discrete set of labels Y.

A fuzzy classifier is a multi-valued mapping from a feature space X to a discrete set of labels Y.

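[Figure: two mapping diagrams from the patterns x1, x2, x3, x4 in X to the class labels 1, 2, 3 in Y. In the crisp diagram each pattern maps to a single label. In the fuzzy diagram the arrows carry membership weights: 0.6 and 0.4, 0.1 and 0.9, 1.0, 0.23 and 0.77.]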
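The difference between the two mappings can be sketched as two lookup tables. The membership weights are those shown in the diagram, but which pattern maps to which label is an assumption, since the arrows are not recoverable from the text.

```python
import math

# Crisp classifier: every pattern maps to exactly one label (assignments are hypothetical).
crisp = {"x1": 1, "x2": 3, "x3": 2, "x4": 2}

# Fuzzy classifier: every pattern maps to one or more labels, each with a
# membership weight. The weights come from the diagram; the pairing with
# patterns and labels is assumed for illustration.
fuzzy = {
    "x1": {1: 0.6, 2: 0.4},
    "x2": {2: 0.1, 3: 0.9},
    "x3": {2: 1.0},
    "x4": {1: 0.23, 3: 0.77},
}


def memberships_are_normalized(mapping):
    """Check that the membership weights of every pattern sum to 1."""
    return all(math.isclose(sum(m.values()), 1.0) for m in mapping.values())


print(crisp["x1"])   # a single label
print(fuzzy["x1"])   # several labels with weights
print(memberships_are_normalized(fuzzy))
```

A crisp classifier can be seen as the special case in which one label carries weight 1.0, as for x3 here.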




Example – separate classes

Let $\omega_1$ stand for the class of female students, $\omega_2$ stand for the class of male students, and let $x$ stand for the feature, which is the student's shoe size.

It is easy to see that the following crisp classifier classifies the students' gender without error:

$$\omega_i = \begin{cases} \omega_1 & x \le 6.25 \\ \omega_2 & \text{else} \end{cases}$$
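In code, this crisp classifier is a one-line threshold rule. The sketch below assumes that a shoe size exactly on the boundary is assigned to the female class; the function name and string labels are hypothetical.

```python
def classify_crisp(shoe_size, boundary=6.25):
    """Crisp rule: sizes up to the boundary are female, all others male.

    Assigning the boundary value itself to the female class is an assumption.
    """
    return "female" if shoe_size <= boundary else "male"


for size in (5.5, 6.0, 6.5, 7.0):
    print(size, classify_crisp(size))
```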




Example – non-separate classes

A crisp classifier may be:

$$\omega_i = \begin{cases} \omega_1 & x \le 6.25 \\ \omega_2 & \text{else} \end{cases}$$

where:
$\omega_1$ is the female class
$\omega_2$ is the male class
$x$ is the shoe size feature



[Figure label: Classification error]


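Example – non-separate classes

A fuzzy classifier may be:

$$\omega_i = \begin{cases}
\omega_1 \text{ w.p. } 1 & 4.25 < x \le 5.75 \\
\omega_1 \text{ w.p. } 5/8, \quad \omega_2 \text{ w.p. } 3/8 & 5.75 < x \le 6.25 \\
\omega_1 \text{ w.p. } 1/3, \quad \omega_2 \text{ w.p. } 2/3 & 6.25 < x \le 6.75 \\
\omega_2 \text{ w.p. } 1 & 6.75 < x \le 8.25
\end{cases}$$

where:
$\omega_1$ is the female class
$\omega_2$ is the male class
$x$ is the shoe size feature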
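A fuzzy classifier of this kind can be sketched as a table of intervals, each carrying the class probabilities given on the slide. Treating each interval as open on the left and closed on the right is an assumption, as are the names used here.

```python
from fractions import Fraction

# (lower bound, upper bound, class probabilities) for each shoe-size interval.
# The probabilities are those of the example; the interval endpoints'
# open/closed convention is assumed.
FUZZY_TABLE = [
    (4.25, 5.75, {"female": Fraction(1), "male": Fraction(0)}),
    (5.75, 6.25, {"female": Fraction(5, 8), "male": Fraction(3, 8)}),
    (6.25, 6.75, {"female": Fraction(1, 3), "male": Fraction(2, 3)}),
    (6.75, 8.25, {"female": Fraction(0), "male": Fraction(1)}),
]


def classify_fuzzy(shoe_size):
    """Return the probability of each class for the given shoe size."""
    for low, high, probabilities in FUZZY_TABLE:
        if low < shoe_size <= high:
            return probabilities
    raise ValueError(f"shoe size {shoe_size} is outside the range covered by the example")


for size in (5.0, 6.0, 6.5, 7.5):
    result = classify_fuzzy(size)
    print(size, {label: str(p) for label, p in result.items()})
```

Exact fractions are used so that the probabilities in each interval sum to exactly 1.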




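Example – non-separate classes

A fuzzy classifier converted to a crisp classifier:

$$\omega_i = \underset{i=1,2}{\arg\max}\; P(\omega_i \mid x) = \begin{cases}
\omega_1 & 4.25 < x \le 5.75 \\
\omega_1 & 5.75 < x \le 6.25 \\
\omega_2 & 6.25 < x \le 6.75 \\
\omega_2 & 6.75 < x \le 8.25
\end{cases} = \begin{cases} \omega_1 & x \le 6.25 \\ \omega_2 & \text{else} \end{cases}$$

where:
$\omega_1$ is the female class
$\omega_2$ is the male class
$x$ is the shoe size feature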
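The conversion amounts to picking the most probable class in each interval. The sketch below repeats the interval table of the example so that it is self-contained, and checks on a grid of quarter sizes that the result matches the threshold rule at 6.25. The interval convention and all names are assumptions.

```python
from fractions import Fraction

# Class probabilities per shoe-size interval (low < x <= high is an assumed convention).
FUZZY_TABLE = [
    (4.25, 5.75, {"female": Fraction(1), "male": Fraction(0)}),
    (5.75, 6.25, {"female": Fraction(5, 8), "male": Fraction(3, 8)}),
    (6.25, 6.75, {"female": Fraction(1, 3), "male": Fraction(2, 3)}),
    (6.75, 8.25, {"female": Fraction(0), "male": Fraction(1)}),
]


def fuzzy_to_crisp(shoe_size):
    """Return the class with the largest probability (the arg max) for this size."""
    for low, high, probabilities in FUZZY_TABLE:
        if low < shoe_size <= high:
            return max(probabilities, key=probabilities.get)
    raise ValueError(f"shoe size {shoe_size} is outside the range covered by the example")


def threshold_rule(shoe_size, boundary=6.25):
    """The crisp classifier the arg max is expected to reduce to."""
    return "female" if shoe_size <= boundary else "male"


# Quarter-size grid from 4.5 to 8.25 (all values are exact in binary floating point).
grid = [4.5 + 0.25 * k for k in range(16)]
agreement = all(fuzzy_to_crisp(x) == threshold_rule(x) for x in grid)
print(agreement)
```

The crisp result discards the probabilities, so the information that sizes between 5.75 and 6.75 are uncertain is lost in the conversion.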

[Figure label: Classification error]


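Example – non-separate classes

In several books and papers, a fuzzy classifier may be written as:

$$\mu_D^1(x) = \begin{cases}
1 & 4.25 < x \le 5.75 \\
5/8 & 5.75 < x \le 6.25 \\
1/3 & 6.25 < x \le 6.75
\end{cases}$$

$$\mu_D^2(x) = \begin{cases}
3/8 & 5.75 < x \le 6.25 \\
2/3 & 6.25 < x \le 6.75 \\
1 & 6.75 < x \le 8.25
\end{cases}$$

where:
$\mu_D^i(x) \in [0, 1]$
$D$ is the classifier
$x$ is the shoe size feature
$\mu_D^i(x)$ is the membership of $x$ in $\omega_i$ according to $D$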




Crisp classifier Vs. Fuzzy classifier

From: L. I. Kuncheva, J. C. Bezdek and R. P. W. Duin, “Decision Templates for Multiple Classifier Fusion: An Experimental Comparison”, Pattern Recognition, 34 (2), pp. 299-314, 2001.




Classification Methods (a partial list)

• Bayes

• Distance

• Adaptive filters

• Neural networks

• Hidden Markov Model (HMM)

• Clustering

• K-Nearest-Neighbors (KNN)

• Support Vector Machine (SVM)




Bibliography

1. K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd ed., Academic Press, San Diego, 1990.

2. L. I. Kuncheva, J. C. Bezdek and R. P. W. Duin, “Decision Templates for Multiple Classifier Fusion: An Experimental Comparison”, Pattern Recognition, 34 (2), pp. 299-314, 2001.

3. C. Kiesling, MPI for Physics, Munich - ACAT03 Workshop, KEK, Japan, Dec. 2003.

