Chapter 20 Classification and Estimation




• 20.2 Classification

– 20.2.1 Feature selection

• Good features have four characteristics:

– Discrimination. Features should take on significantly different values for objects belonging to different classes.

– Reliability. Features should take on similar values for all objects of the same class.

– Independence. The various features used should be uncorrelated with each other.

– Small numbers. The number of features should be small because the complexity of a pattern recognition system increases rapidly with the dimensionality of the system.


• Classifier design

– Classifier design consists of establishing the logical structure of the classifier and the mathematical basis of the classification rule.

• Classifier training

– A group of known objects is used to train a classifier to determine its threshold values.


– The training set is a collection of objects from each class that have been previously identified by some accurate method.

– Training rule: minimize an error function or a cost function.

– Pitfalls: an unrepresentative or biased training set produces a poorly trained classifier.


– 20.2.4 Measurement of performance

• A classifier's accuracy can be estimated directly by classifying a known test set of objects.

• An alternative is to use a test set of known objects to estimate the PDFs of the features for objects belonging to each group.

• Using a test set different from the training set is the better way to evaluate a classifier.
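To make the training/evaluation distinction concrete, here is a minimal sketch (not from the chapter; data and labels are made up) that trains a one-feature threshold classifier by minimizing training-set error and then estimates its accuracy on a separate test set:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-feature training and test sets for two classes.
train_x = np.concatenate([rng.normal(2.0, 0.5, 100), rng.normal(4.0, 0.5, 100)])
train_y = np.concatenate([np.zeros(100), np.ones(100)])   # known class labels
test_x = np.concatenate([rng.normal(2.0, 0.5, 50), rng.normal(4.0, 0.5, 50)])
test_y = np.concatenate([np.zeros(50), np.ones(50)])

# Training: pick the threshold that minimizes the error on the training set.
candidates = np.sort(train_x)
errors = [np.mean((train_x > t) != train_y) for t in candidates]
threshold = candidates[int(np.argmin(errors))]

# Evaluation: accuracy is estimated on the separate test set, not the training set.
accuracy = np.mean((test_x > threshold) == test_y)
print(f"threshold = {threshold:.2f}, test accuracy = {accuracy:.2%}")
```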


20.3 Feature selection

• Feature selection is the process of eliminating some features and combining others that are related, until the feature set becomes manageable and performance is still adequate.

• The brute-force approach to feature selection evaluates every possible subset of features and keeps the best-performing one; it becomes impractical as the number of features grows.
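As a sketch of the brute-force approach, assuming some hypothetical scoring function `evaluate(subset)` that trains and tests a classifier on the chosen features, one can score every nonempty subset and keep the best:

```python
from itertools import combinations

def best_subset(features, evaluate):
    """Exhaustively score every nonempty subset of features.

    `evaluate` is assumed to return a classifier performance score
    (higher is better) for a given tuple of feature names.
    """
    best_score, best = float("-inf"), None
    for r in range(1, len(features) + 1):
        for subset in combinations(features, r):
            score = evaluate(subset)
            if score > best_score:
                best_score, best = score, subset
    return best, best_score

# With n features this loop scores 2**n - 1 subsets, which is why the
# brute-force approach is only practical for small feature sets.
```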


• For a training set containing objects from M different classes, let $N_j$ be the number of objects in class j, and let $x_{ij}$ and $y_{ij}$ be two features measured on the ith object of class j. The mean value of each feature within class j is

$$\hat{\mu}_{xj} = \frac{1}{N_j} \sum_{i=1}^{N_j} x_{ij} \qquad\qquad \hat{\mu}_{yj} = \frac{1}{N_j} \sum_{i=1}^{N_j} y_{ij}$$


• 20.3.1 Feature variance

– All objects within the same class should take on similar feature values. The variance of the features within class j is

$$\hat{\sigma}_{xj}^{2} = \frac{1}{N_j} \sum_{i=1}^{N_j} \left( x_{ij} - \hat{\mu}_{xj} \right)^{2} \qquad\qquad \hat{\sigma}_{yj}^{2} = \frac{1}{N_j} \sum_{i=1}^{N_j} \left( y_{ij} - \hat{\mu}_{yj} \right)^{2}$$
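The within-class mean and variance defined above translate directly into code; a minimal numpy sketch (feature values are made up):

```python
import numpy as np

def class_mean_and_variance(x_j):
    """Within-class mean and variance of one feature, as defined above."""
    n_j = len(x_j)
    mu = np.sum(x_j) / n_j                 # (1/N_j) * sum of x_ij
    var = np.sum((x_j - mu) ** 2) / n_j    # (1/N_j) * sum of (x_ij - mu)^2
    return mu, var

x_j = np.array([1.9, 2.1, 2.0, 2.2, 1.8])  # made-up feature values for class j
print(class_mean_and_variance(x_j))
```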


• 20.3.2 Feature correlation

– The correlation of the features x and y within class j is

$$\hat{\rho}_{xyj} = \frac{\frac{1}{N_j} \sum_{i=1}^{N_j} \left( x_{ij} - \hat{\mu}_{xj} \right) \left( y_{ij} - \hat{\mu}_{yj} \right)}{\hat{\sigma}_{xj}\, \hat{\sigma}_{yj}}$$

– A value of zero indicates that the two features are uncorrelated, while a magnitude near 1 implies a high degree of correlation.
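A sketch of the within-class correlation computation, reusing the class means and standard deviations defined above (inputs are assumed to be numpy arrays of the two feature values for class j):

```python
import numpy as np

def class_correlation(x_j, y_j):
    """Correlation of features x and y within class j, as defined above."""
    n_j = len(x_j)
    mu_x, mu_y = x_j.mean(), y_j.mean()
    sigma_x = np.sqrt(np.sum((x_j - mu_x) ** 2) / n_j)
    sigma_y = np.sqrt(np.sum((y_j - mu_y) ** 2) / n_j)
    cov = np.sum((x_j - mu_x) * (y_j - mu_y)) / n_j
    # 0 means uncorrelated; magnitude near 1 means highly correlated.
    return cov / (sigma_x * sigma_y)
```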


• 20.3.3 Class separation distance

– The variance-normalized distance between two classes j and k is

$$\hat{D}_{xjk} = \frac{\left| \hat{\mu}_{xj} - \hat{\mu}_{xk} \right|}{\sqrt{\hat{\sigma}_{xj}^{2} + \hat{\sigma}_{xk}^{2}}}$$

– The greater the distance, the better the feature separates the two classes.
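The variance-normalized distance is likewise a one-liner once the class statistics are available; a sketch with made-up statistics:

```python
import numpy as np

def separation_distance(mu_j, var_j, mu_k, var_k):
    """Variance-normalized distance between classes j and k for one feature."""
    return abs(mu_j - mu_k) / np.sqrt(var_j + var_k)

# Example with made-up statistics: well-separated classes give a large D.
print(separation_distance(2.0, 0.25, 4.0, 0.25))   # -> about 2.83
```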


• 20.3.4 Dimension reduction

– Many features can be combined to form a smaller number of features.

– Linear combination. Two features x and y can produce a new feature z by

$$z = ax + by$$

which can be reduced to

$$z = x \cos(\theta) + y \sin(\theta)$$
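As an illustrative sketch, the angle θ can be chosen by searching for the projection that maximizes the class separation distance of the new feature z; this search criterion is an assumption for illustration, since the text only defines the combination itself:

```python
import numpy as np

def project(x, y, theta):
    """New feature z as the projection of (x, y) onto a line at angle theta."""
    return x * np.cos(theta) + y * np.sin(theta)

def best_angle(x1, y1, x2, y2, steps=180):
    """Pick theta maximizing the variance-normalized separation of z."""
    best_theta, best_d = 0.0, -np.inf
    for theta in np.linspace(0.0, np.pi, steps):
        z1, z2 = project(x1, y1, theta), project(x2, y2, theta)
        d = abs(z1.mean() - z2.mean()) / np.sqrt(z1.var() + z2.var())
        if d > best_d:
            best_theta, best_d = theta, d
    return best_theta
```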


– This is a projection of the (x, y) plane onto the line z.

[Figure: two clusters, Class 1 and Class 2, in the (x, y) plane, projected onto a line z.]


20.4 Statistical Classification

• 20.4.1 Statistical decision theory

– An approach that performs classification by statistical methods. The PDFs of the features are assumed to be known.

– The PDF of a feature may be estimated by measuring a large number of objects and plotting a histogram of the feature.
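A sketch of the histogram approach with made-up sample data, using numpy's density-normalized histogram as the PDF estimate:

```python
import numpy as np

rng = np.random.default_rng(1)
samples = rng.normal(3.0, 1.0, 10_000)   # many measurements of one feature

# density=True scales bin counts so the histogram integrates to 1,
# giving a piecewise-constant estimate of the feature's PDF.
pdf, edges = np.histogram(samples, bins=50, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
```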


– 20.4.1.1 A priori probabilities

• The a priori probabilities represent our knowledge about an object before it has been measured.

• The conditional probability $P(E_1|E_2)$ is the probability of event $E_1$, given that event $E_2$ occurs.


• 20.4.1.2 Bayes' theorem

– The a posteriori probability is the conditional probability $P(C_i|x)$, the probability that the object belongs to class $C_i$, given that the feature value x occurs.

– Bayes' theorem (two classes):

$$P(C_i|x) = \frac{p(x|C_i)\, P(C_i)}{p(x)} = \frac{p(x|C_i)\, P(C_i)}{\sum_{i=1}^{2} p(x|C_i)\, P(C_i)}$$


• Bayes' theorem may be applied to pattern classification. For example, when there are only two classes, an object is assigned to class 1 if

$$P(C_1|x) > P(C_2|x)$$

• This is equivalent to

$$p(x|C_1)\, P(C_1) > p(x|C_2)\, P(C_2)$$

• The classifier defined by this decision rule is called a maximum-likelihood classifier.
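A sketch of this two-class decision rule, assuming normal conditional PDFs with illustrative parameters (the normal assumption and all numbers are mine, not the chapter's):

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def classify(x, priors=(0.5, 0.5), params=((2.0, 0.5), (4.0, 0.5))):
    """Assign x to class 1 if p(x|C1)P(C1) > p(x|C2)P(C2), else class 2."""
    s1 = normal_pdf(x, *params[0]) * priors[0]
    s2 = normal_pdf(x, *params[1]) * priors[1]
    return 1 if s1 > s2 else 2

print(classify(2.8))   # -> 1 (closer to the class-1 mean)
print(classify(3.4))   # -> 2
```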


– If there is more than one feature, the feature vector is $x = [x_1, x_2, \ldots, x_n]^T$, and if there are m classes, Bayes' theorem becomes

$$p(C_i|x_1, x_2, \ldots, x_n) = \frac{p(x_1, x_2, \ldots, x_n|C_i)\, p(C_i)}{\sum_{i=1}^{m} p(x_1, x_2, \ldots, x_n|C_i)\, p(C_i)}$$

• Bayes' risk. The conditional risk is

$$R(C_i|x_1, x_2, \ldots, x_n) = \sum_{j=1}^{m} l_{ij}\, p(C_j|x_1, x_2, \ldots, x_n)$$

where $l_{ij}$ is the cost (loss) of assigning an object to class i when it really belongs in class j.
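A sketch of the minimum-risk decision: given the a posteriori probabilities for m classes and a loss matrix $l_{ij}$ (all values made up), compute each conditional risk and pick the class that minimizes it:

```python
import numpy as np

# Hypothetical a posteriori probabilities p(C_j | x) for m = 3 classes.
posterior = np.array([0.2, 0.5, 0.3])

# loss[i, j]: cost of deciding class i when the true class is j.
loss = np.array([[0.0, 1.0, 4.0],
                 [1.0, 0.0, 2.0],
                 [2.0, 1.0, 0.0]])

risk = loss @ posterior          # R(C_i|x) = sum_j l_ij * p(C_j|x)
decision = int(np.argmin(risk))  # Bayes' decision rule: minimum conditional risk
print(risk, "-> assign class", decision + 1)
```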


– Bayes' decision rule. Each object should be assigned to the class that produces the minimum conditional risk. The Bayes risk, the expected value of the minimum conditional risk $R_m$ over the feature space, is

$$R = \int \cdots \int R_m(x_1, \ldots, x_n)\, p(x_1, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n$$

– Parametric and nonparametric classifiers

• If the functional form of the conditional PDFs is known, but some parameters are unknown, the classifier is called parametric.

• If the functional form of some or all of the conditional PDFs is unknown, the classifier is called nonparametric.


20.4.3 Parameter estimation and classifier training

• The process of estimating the conditional PDFs or their parameters is referred to as training the classifier.

• Supervised and unsupervised training

– Supervised training. The class to which each object in the training set belongs is known.

– Unsupervised training. The conditional PDFs are estimated using samples whose class is unknown.


• Maximum-likelihood estimation

– The maximum-likelihood estimation approach assumes that the parameters to be estimated are fixed but unknown.

– The maximum-likelihood estimate of a parameter is the value that makes the occurrence of the observed training set most likely.

– The maximum-likelihood estimates of the mean and standard deviation of a normal distribution are the sample mean and sample standard deviation, respectively.
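A minimal sketch with made-up normal samples; numpy's default `mean` and `std` (with ddof = 0) are exactly the maximum-likelihood estimates:

```python
import numpy as np

rng = np.random.default_rng(2)
samples = rng.normal(5.0, 2.0, 1_000)   # training samples from one class

mu_hat = samples.mean()                 # ML estimate of the mean
sigma_hat = samples.std()               # ML estimate of the standard deviation
print(f"mu_hat = {mu_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
```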


20.4.3.3 Bayesian Estimation

– Bayesian estimation treats the unknown parameter as a random variable that has a known a priori PDF before any samples are taken.

– After the training set has been measured, Bayes' theorem is used to update the a priori PDF, which results in an a posteriori PDF of the unknown parameter value.

– Ideally, the a posteriori PDF has a single narrow peak centered on the true value of the parameter.


• An example of Bayesian estimation

– Estimate the mean $\mu$ of a normal distribution with known variance. The a priori PDF is $p(\mu)$.

– The functional form of the PDF conditioned on the unknown mean, $p(x|\mu)$, is assumed known; this means that, given a value for $\mu$, we know $p(x)$.

– Suppose $X$ represents the set of sample values obtained by measuring the training set.


– Bayes' theorem gives the a posteriori PDF

$$p(\mu|X) = \frac{p(X|\mu)\, p(\mu)}{\int p(X|\mu)\, p(\mu)\, d\mu}$$

– What we really want is

$$p(x|X) = \int p(x, \mu|X)\, d\mu = \int p(x|\mu)\, p(\mu|X)\, d\mu$$

– For example, if $p(\mu|X)$ has a single sharp peak at $\mu_0$, it can be approximated as an impulse, $p(\mu|X) \approx \delta(\mu - \mu_0)$.


– Then

$$p(x|X) = \int p(x|\mu)\, \delta(\mu - \mu_0)\, d\mu = p(x|\mu_0)$$

This means that $\mu_0$ is the best estimate of the unknown mean.

– If $p(\mu|X)$ has a relatively broad peak, then $p(x|X)$ becomes a weighted average of many PDFs.

– Both maximum-likelihood and Bayesian estimation place the unknown mean at the mean of a large training set.


• Steps of Bayesian estimation

– 1. Assume an a priori PDF for the unknown parameter.

– 2. Collect sample values from the population by measuring the training set.

– 3. Use Bayes' theorem to refine the a priori PDF into the a posteriori PDF.

– 4. Form the joint density of x and the unknown parameter, and integrate out the latter to leave the desired estimate of the PDF.
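For the running example (normal data with known variance and a normal a priori PDF on the mean), the a posteriori PDF has a closed form, so the four steps reduce to a few lines. This conjugate-update sketch uses made-up numbers:

```python
import numpy as np

# Step 1: a priori PDF of the unknown mean, assumed normal(mu0, tau0^2).
mu0, tau0 = 0.0, 10.0          # broad prior: weak ideas about the mean
sigma = 2.0                    # known standard deviation of the data

# Step 2: sample values X measured from the training set (made up here).
rng = np.random.default_rng(3)
X = rng.normal(5.0, sigma, 50)

# Step 3: Bayes' theorem -> a posteriori PDF is normal(mu_n, tau_n^2).
n = len(X)
tau_n2 = 1.0 / (1.0 / tau0**2 + n / sigma**2)
mu_n = tau_n2 * (mu0 / tau0**2 + X.sum() / sigma**2)

# Step 4: with a sharply peaked posterior, p(x|X) ~ p(x|mu_n),
# i.e. mu_n serves as the estimate of the unknown mean.
print(f"posterior mean = {mu_n:.3f}, posterior std = {np.sqrt(tau_n2):.3f}")
```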


– If we have strong ideas about the probable values of the unknown parameter, we may assume a narrow a priori PDF; otherwise, we should assume a relatively broad one.


20.5 Neural Networks

• Layered feedforward neural networks

– Each neuron k forms a weighted sum of its inputs and passes it through an activation function:

$$S_k = \sum_{j} w_{kj}\, x_j \qquad\qquad x_k = g(S_k)$$

where the activation function $g[\cdot]$ is usually a sigmoidal function.

– For a single neuron with inputs $x_1, \ldots, x_N$ and weights $w_1, \ldots, w_N$, the output is

$$O = g\!\left[\sum_{i=1}^{N} x_i\, w_i\right] = g\!\left[X^T W\right] = g[S]$$

[Figure: a single neuron with inputs $x_1, x_2, \ldots, x_N$, weights $w_{k1}, w_{k2}, \ldots, w_{kN}$, and output y.]
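A sketch of one feedforward layer implementing the equations above, with random weights and a made-up input; a real network would stack such layers and train the weights:

```python
import numpy as np

def sigmoid(s):
    """Sigmoidal activation function g[.]"""
    return 1.0 / (1.0 + np.exp(-s))

def layer(x, W):
    """One feedforward layer: S_k = sum_j w_kj * x_j, output x_k = g(S_k)."""
    return sigmoid(W @ x)

rng = np.random.default_rng(4)
x = np.array([0.5, -1.2, 3.0])   # input feature vector
W = rng.normal(size=(2, 3))      # weights w_kj for 2 neurons, 3 inputs
print(layer(x, W))               # outputs of the 2 neurons
```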
