
Feb 2007 Alon Slapak 1 of 1

Classification

A practical approach

Classification Methods

Training Set

Classifier

Example

Definition

Bibliography

Feb 2007 Alon Slapak 2 of 2

What is Classification?

“Statistical classification is a statistical procedure in which individual items are placed into groups based on quantitative information on one or more characteristics inherent in the items and based on a training set of previously labeled items.” (Wikipedia)

Example:

We can measure a student’s shoe size and classify the student’s gender according to an a priori (already known) distribution of male and female shoe sizes.


What is a Training set?

First, we need a database that gives us information on the distribution of the students’ shoe sizes (the feature). This database is called a training set.

Example: We may measure the shoe size of 21 male students and 26 female students and sketch a histogram of the measurements for each group. Under several assumptions (for instance, that the measured students represent the whole population), the histogram may be regarded as an estimate of the shoe size distribution.
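A minimal sketch of turning such measurements into a normalized histogram, meaning the relative frequency of each shoe size within one group. The measurements below are hypothetical values chosen for illustration. They are not the data behind the slides. Shoe sizes are assumed to be recorded in half-size steps.

```python
from collections import Counter

def shoe_size_histogram(sizes):
    """Relative frequency of each observed shoe size: a histogram-based estimate
    of the class-conditional shoe size distribution (assumes half-size steps)."""
    counts = Counter(sizes)
    n = len(sizes)
    return {size: counts[size] / n for size in sorted(counts)}

# Hypothetical training measurements, for illustration only.
female_sizes = [5.0, 5.5, 5.5, 6.0, 6.0, 6.0, 6.5]
male_sizes = [6.5, 7.0, 7.0, 7.5, 7.5, 8.0]

female_hist = shoe_size_histogram(female_sizes)
male_hist = shoe_size_histogram(male_sizes)
print("female:", female_hist)
print("male:", male_hist)
```

Dividing by the group size makes each histogram sum to 1. This lets groups of different sizes, such as 21 and 26 students, be compared directly.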


What is a Training set?

Based on the assumption that the training set represents the group of objects to be classified, one can choose shoe size 6.5 as the discriminant boundary between male and female students.


But what if, for some reason, one of the female students in the training set wears a size 9.5 shoe?

No: another discriminant boundary (classifying every shoe size above 9.25 as a female student) does not make sense.
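One way to "determine" a single discriminant boundary from a training set is to try candidate thresholds and keep the one with the fewest training errors. The sketch below does this. The data are hypothetical. Placing the candidates halfway between consecutive observed sizes is an assumption; the slides do not say how the boundary was chosen. The second run adds one female student with a size 9.5 shoe. A single threshold cannot fit her, so she simply counts as a training error instead of forcing a second boundary.

```python
def classify(size, threshold):
    """Crisp rule: shoe size <= threshold -> "female", otherwise "male"."""
    return "female" if size <= threshold else "male"

def training_error(threshold, female_sizes, male_sizes):
    """Fraction of training patterns the rule misclassifies."""
    wrong = sum(classify(s, threshold) != "female" for s in female_sizes)
    wrong += sum(classify(s, threshold) != "male" for s in male_sizes)
    return wrong / (len(female_sizes) + len(male_sizes))

def best_threshold(female_sizes, male_sizes):
    """Try a boundary halfway between each pair of consecutive observed sizes and
    keep the one with the fewest training errors (the first one wins on ties)."""
    sizes = sorted(set(female_sizes) | set(male_sizes))
    candidates = [(a + b) / 2 for a, b in zip(sizes, sizes[1:])]
    return min(candidates, key=lambda t: training_error(t, female_sizes, male_sizes))

# Hypothetical training set, for illustration only.
female_sizes = [5.0, 5.5, 5.5, 6.0, 6.0, 6.0]
male_sizes = [7.0, 7.0, 7.5, 8.0]

t = best_threshold(female_sizes, male_sizes)
print(t, training_error(t, female_sizes, male_sizes))  # 6.5 0.0

# Add one female student with a size 9.5 shoe: on this data the best single
# boundary does not move; the outlier just becomes one training error.
with_outlier = female_sizes + [9.5]
t_outlier = best_threshold(with_outlier, male_sizes)
print(t_outlier, training_error(t_outlier, with_outlier, male_sizes))
```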


“Perfect” Vs. “Simple” classifier

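[Figure: training patterns in the (x1, x2) feature plane, both axes ranging from 0 to 6, with the decision boundary of each classifier]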


“Perfect” classifier

“Simple” classifier

Which is better?

Check each one on a test set (cross validation).


Cross validation of the “Perfect” classifier

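[Figure: the “perfect” classifier’s boundary in the (x1, x2) feature plane (axes 0 to 6), shown once with the training (Train) patterns and once with the test (Test) patterns]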


Training set: 0% misclassification

Test set: 8/25 = 32% misclassification


Cross validation of the “simple” classifier

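[Figure: the “simple” classifier’s boundary in the (x1, x2) feature plane (axes 0 to 6), shown once with the training (Train) patterns and once with the test (Test) patterns]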


Training set: 6/29 ≈ 21% misclassification

Test set: 2/25 = 8% misclassification


Training set Vs. Test set

• Overtraining is said to occur when the decision boundaries fit the training set too specifically.

• To detect (and thus avoid) overtraining, the classification algorithm should be tested on a different set of patterns.

• Attention! A common mistake is to test the classification algorithm on the training set.

• The training set and the test set should have the same statistical distribution.

• Overtraining typically results from a training set that is too small for the complexity of the classifier.
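The train-versus-test check can be sketched in a few lines. The slides do not say how the “perfect” and “simple” boundaries were produced. As stand-ins, this sketch assumes a 1-nearest-neighbor classifier, which memorizes the training set and so always reaches 0% training error. It also assumes a nearest-mean classifier, whose boundary is a single straight line. The two-class data are hypothetical: Gaussian clouds around (2, 2) and (4, 4), 15 patterns per class in each set.

```python
import math
import random

def make_patterns(n_per_class, rng):
    """Hypothetical 2-D patterns: class 1 scattered around (2, 2), class 2 around (4, 4)."""
    patterns = []
    for label, (cx, cy) in ((1, (2.0, 2.0)), (2, (4.0, 4.0))):
        for _ in range(n_per_class):
            patterns.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    return patterns

def one_nearest_neighbor(train):
    """Stand-in for the "perfect" classifier: copy the label of the closest training pattern."""
    def classify(x):
        return min(train, key=lambda p: math.dist(p[0], x))[1]
    return classify

def nearest_mean(train):
    """Stand-in for the "simple" classifier: pick the class whose mean is closest,
    which gives a straight-line boundary (the perpendicular bisector of the means)."""
    means = {}
    for label in sorted({lab for _, lab in train}):
        pts = [x for x, lab in train if lab == label]
        means[label] = (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
    def classify(x):
        return min(means, key=lambda lab: math.dist(means[lab], x))
    return classify

def error_rate(classify, patterns):
    """Fraction of patterns whose predicted label differs from the true label."""
    return sum(classify(x) != label for x, label in patterns) / len(patterns)

rng = random.Random(0)              # fixed seed so the run is repeatable
train_set = make_patterns(15, rng)  # used to build the classifiers
test_set = make_patterns(15, rng)   # never seen while building them

for name, build in (("perfect (1-NN)", one_nearest_neighbor),
                    ("simple (nearest mean)", nearest_mean)):
    clf = build(train_set)
    print(f"{name}: training error {error_rate(clf, train_set):.0%}, "
          f"test error {error_rate(clf, test_set):.0%}")
```

The number to watch is the gap between training error and test error, not the training error alone. Exact percentages depend on the seed and the hypothetical data.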


Occam’s razor

Given a training set of data, we want to use this data to choose a classifier (or decision rule), which will classify future elements based on their features.

Based on a particular training set, which of the infinitely many possible classifiers is most likely to be accurate for future data?

As seen in the previous example, Occam's Razor is an important factor in making this choice.


“Plurality should not be posited without necessity.” Or, in pattern recognition language:

“simpler classifiers should be preferred over complex ones.”


Notations for classifiers

A commonly used notation for a class is $\omega_i$, where $\omega$ stands for “class” and $i$ stands for the label (or index) of the class.

[Figure: two classes labeled $\omega_1$ and $\omega_2$, and patterns labeled $x_1$, $x_2$, $x_3$]

For a pattern (a vector in the feature space), we prefer the notation $x_i$.


Crisp classifier Vs. Fuzzy classifier

[Figure: a student of unknown gender (“?”) is presented to two classifiers]

The crisp classifier: “100 percent it’s a male student.”

The fuzzy classifier: “I’m hesitating. 74 percent it’s a male student, 26 percent it’s a female.”


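Crisp classifier Vs. Fuzzy classifier

A crisp classifier is a single-valued mapping from a feature space X to a discrete set of labels Y: each pattern is assigned exactly one label.

A fuzzy classifier is a multi-valued mapping from a feature space X to a discrete set of labels Y: each pattern may be assigned to several labels, each with a degree of membership.

[Figure: two mapping diagrams from patterns x1, x2, x3, x4 in X to labels 1, 2, 3 in Y. In the crisp mapping, each pattern points to a single label. In the fuzzy mapping, the arrows carry membership degrees (0.6 and 0.4; 0.1 and 0.9; 1.0; 0.23 and 0.77), which sum to 1 for each pattern.]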


Example – separate classes

Let $\omega_1$ stand for the female students class, $\omega_2$ for the male students class, and let $x$ stand for the feature, which is the student’s shoe size.

It is easy to see that the following crisp classifier classifies the students’ gender without error:

$$
x_i \in \begin{cases} \omega_1 & x_i \le 6.25 \\ \omega_2 & \text{else} \end{cases}
$$


Example – non-separate classes

A crisp classifier may be:

$$
x_i \in \begin{cases} \omega_1 & x_i \le 6.25 \\ \omega_2 & \text{else} \end{cases}
$$

where $\omega_1$ is the female class, $\omega_2$ is the male class, and $x$ is the shoe size feature.


[Figure: the female and male shoe-size histograms overlap; the overlap region is labeled “Classification error”]


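Example – non-separate classes

A fuzzy classifier may be:

$$
x_i \in \begin{cases}
\omega_1 \text{ w.p. } 1 & 4.25 < x_i \le 5.75 \\
\omega_1 \text{ w.p. } 5/8, \ \omega_2 \text{ w.p. } 3/8 & 5.75 < x_i \le 6.25 \\
\omega_1 \text{ w.p. } 1/3, \ \omega_2 \text{ w.p. } 2/3 & 6.25 < x_i \le 6.75 \\
\omega_2 \text{ w.p. } 1 & 6.75 < x_i \le 8.25
\end{cases}
$$

where $\omega_1$ is the female class, $\omega_2$ is the male class, $x$ is the shoe size feature, and “w.p.” means “with probability”.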


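Example – non-separate classes

From a fuzzy classifier to a crisp classifier:

$$
x \in \arg\max_{i=1,2} P(\omega_i \mid x) =
\begin{cases}
\omega_1 & 4.25 < x \le 5.75 \\
\omega_1 & 5.75 < x \le 6.25 \\
\omega_2 & 6.25 < x \le 6.75 \\
\omega_2 & 6.75 < x \le 8.25
\end{cases}
= \begin{cases} \omega_1 & x \le 6.25 \\ \omega_2 & \text{else} \end{cases}
$$

where $\omega_1$ is the female class, $\omega_2$ is the male class, and $x$ is the shoe size feature. In each interval the class with the larger probability wins (5/8 > 3/8 and 2/3 > 1/3), which yields the crisp classifier with its boundary at 6.25.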

[Figure: the overlapping female and male shoe-size histograms, with the overlap region labeled “Classification error”]

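Example – non-separate classes

In several books and papers, a fuzzy classifier is written as:

$$
\mu^D_{\omega_1}(x) = \begin{cases} 1 & 4.25 < x \le 5.75 \\ 5/8 & 5.75 < x \le 6.25 \\ 1/3 & 6.25 < x \le 6.75 \end{cases}
\qquad
\mu^D_{\omega_2}(x) = \begin{cases} 3/8 & 5.75 < x \le 6.25 \\ 2/3 & 6.25 < x \le 6.75 \\ 1 & 6.75 < x \le 8.25 \end{cases}
$$

where $D$ is the classifier, $x$ is the shoe size feature, and $\mu^D_{\omega_i}(x) \in [0,1]$ is the membership of $x$ in $\omega_i$ according to $D$.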
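The fuzzy classifier of this example and its argmax conversion to a crisp classifier can be sketched as follows. The membership degrees are copied from the example and stored as exact fractions. Several details are assumptions for illustration, because the slides do not specify them: treating each interval as (low, high], using the class names "female"/"male" in place of $\omega_1$/$\omega_2$, and returning `None` outside 4.25 to 8.25.

```python
from fractions import Fraction

# Membership degrees (female = omega_1, male = omega_2) per shoe-size interval,
# taken from the example. The (low, high] convention is an assumption.
MEMBERSHIP_TABLE = [
    (4.25, 5.75, Fraction(1), Fraction(0)),
    (5.75, 6.25, Fraction(5, 8), Fraction(3, 8)),
    (6.25, 6.75, Fraction(1, 3), Fraction(2, 3)),
    (6.75, 8.25, Fraction(0), Fraction(1)),
]

def fuzzy_classify(x):
    """Return the membership vector D(x) = {"female": mu_1(x), "male": mu_2(x)},
    or None when x lies outside the tabulated range."""
    for low, high, mu_female, mu_male in MEMBERSHIP_TABLE:
        if low < x <= high:
            return {"female": mu_female, "male": mu_male}
    return None

def to_crisp(memberships):
    """Defuzzify with argmax: keep only the class with the largest membership."""
    return max(memberships, key=memberships.get)

for size in (5.0, 6.0, 6.5, 7.5):
    memberships = fuzzy_classify(size)
    print(size, {c: str(m) for c, m in memberships.items()}, "->", to_crisp(memberships))
```

Keeping the full membership vector and deferring the argmax preserves the “hesitation” information (5/8 versus 1), which the crisp label throws away.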


Crisp classifier Vs. Fuzzy classifier

From: L. I. Kuncheva, J. C. Bezdek and R. P. W. Duin, “Decision Templates for Multiple Classifier Fusion: An Experimental Comparison”, Pattern Recognition, 34 (2), pp. 299-314, 2001.


Classification Methods (a partial list)

• Bayes

• Distance

• Adaptive filters

• Neural networks

• Hidden Markov Model (HMM)

• Clustering

• K-Nearest-Neighbors (KNN)

• Support Vector Machine (SVM)


Bibliography

1. K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd ed., Academic Press, San Diego, 1990.

2. L. I. Kuncheva, J. C. Bezdek and R. P. W. Duin, “Decision Templates for Multiple Classifier Fusion: An Experimental Comparison”, Pattern Recognition, 34 (2), pp. 299-314, 2001.

3. C. Kiesling, MPI for Physics, Munich - ACAT03 Workshop, KEK, Japan, Dec. 2003.
