Classification Problem

Upload: amberly-simon

Post on 14-Jan-2016


Page 1:

Classification Problem

• Given: training data $\{(\mathbf{x}_n, y_n)\}_{n=1}^{N}$, with $\mathbf{x}_n \in \mathbb{R}^q$ and $y_n \in \{-, +\}$

• Predict the class label of a given query $\mathbf{x}_0$

[Figure: scatter plot of "+" and "-" training points in the $(x_1, x_2)$ plane, with the query point $\mathbf{x}_0$ marked by a dot.]

Page 2:

Classification Problem

• Unknown probability distribution $P(\mathbf{x}, y)$

• We need to estimate:

$P(+ \mid \mathbf{x}_0) = f_+(\mathbf{x}_0)$

$P(- \mid \mathbf{x}_0) = f_-(\mathbf{x}_0)$

Page 3:

The Bayesian Classifier

• Loss function: $\lambda(j \mid k)$

• Expected loss (conditional risk) associated with class $j$:

$R(j \mid \mathbf{x}) = \sum_{k=1}^{J} \lambda(j \mid k) \, P(k \mid \mathbf{x})$

• Bayes rule:

$j^* = \arg\min_{1 \le j \le J} R(j \mid \mathbf{x})$

• Zero-one loss function:

$\lambda(j \mid k) = \begin{cases} 0 & \text{if } j = k \\ 1 & \text{if } j \ne k \end{cases}$

With the zero-one loss, Bayes rule becomes:

$j^* = \arg\max_{1 \le j \le J} P(j \mid \mathbf{x})$
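The conditional-risk rule on this page can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the names `conditional_risk` and `bayes_decision`, the list-of-lists loss table `loss[j][k]`, and the example numbers are all assumptions made for the sketch.

```python
def conditional_risk(loss, posterior, j):
    """R(j | x) = sum_k loss[j][k] * P(k | x) for one candidate class j."""
    return sum(loss[j][k] * posterior[k] for k in range(len(posterior)))


def bayes_decision(loss, posterior):
    """Index of the class with the smallest conditional risk."""
    risks = [conditional_risk(loss, posterior, j) for j in range(len(loss))]
    return min(range(len(risks)), key=risks.__getitem__)


# Zero-one loss for three classes: 0 on the diagonal, 1 elsewhere.
zero_one = [[0 if j == k else 1 for k in range(3)] for j in range(3)]

# Hypothetical posterior probabilities P(k | x) for one query.
posterior = [0.2, 0.5, 0.3]

# With the zero-one loss the decision is the class of largest posterior.
print(bayes_decision(zero_one, posterior))

# An asymmetric loss (hypothetical values) can overrule the largest posterior.
asymmetric = [[0, 10], [1, 0]]
print(bayes_decision(asymmetric, [0.8, 0.2]))
```

With an asymmetric loss the decision need not coincide with the most probable class, which is why the arg max form holds only for the zero-one loss.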

Page 4:

The Bayesian Classifier

• Bayes rule achieves the minimum error rate:

$j^* = \arg\max_{1 \le j \le J} P(j \mid \mathbf{x})$

• How to estimate the posterior probabilities $\{P(j \mid \mathbf{x})\}_{j=1}^{J}$:

$\hat{j}(\mathbf{x}) = \arg\max_{1 \le j \le J} \hat{P}(j \mid \mathbf{x})$

Page 5:

Density estimation

• Use Bayes theorem to estimate the posterior probability values:

$P(j \mid \mathbf{x}) = \dfrac{p(\mathbf{x} \mid j) \, P(j)}{\sum_{k=1}^{J} p(\mathbf{x} \mid k) \, P(k)}$

$p(\mathbf{x} \mid j)$ is the probability density function of $\mathbf{x}$ given class $j$

$P(j)$ is the prior probability of class $j$
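As a small illustration of the Bayes theorem computation above, the sketch below turns class-conditional densities and priors into posterior probabilities. The one-dimensional Gaussian densities, their parameters, and the function names are assumptions chosen only for the example; the slides do not prescribe a particular density model.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian (assumed form of p(x | j) for this example)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posteriors(x, densities, priors):
    """P(j | x) = p(x | j) P(j) / sum_k p(x | k) P(k)."""
    joint = [density(x) * prior for density, prior in zip(densities, priors)]
    evidence = sum(joint)
    return [value / evidence for value in joint]


# Two hypothetical classes with densities centred at -1 and +1.
densities = [lambda x: gaussian_pdf(x, -1.0, 1.0),
             lambda x: gaussian_pdf(x, +1.0, 1.0)]

print(posteriors(0.0, densities, [0.5, 0.5]))
print(posteriors(0.0, densities, [0.25, 0.75]))
```

At $x = 0$ both densities are equal, so the posterior reduces to the prior; this makes the role of $P(j)$ easy to see.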

Page 6:

Naïve Bayes Classifier

• Makes the assumption of independence of features given the class:

$p(\mathbf{x} \mid j) = p(x_1, x_2, \ldots, x_q \mid j) = \prod_{i=1}^{q} p(x_i \mid j)$

• The task of estimating a $q$-dimensional density function is reduced to the estimation of $q$ one-dimensional density functions. Thus, the complexity of the task is drastically reduced.

• The use of Bayes theorem becomes much simpler.

• Proven to be effective in practice.
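A minimal sketch of the factorization above, assuming each one-dimensional density $p(x_i \mid j)$ is Gaussian. That modelling choice, the variance floor, the function names, and the toy data are all assumptions for illustration and are not taken from the slides.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate a prior plus q one-dimensional Gaussians for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # The small floor keeps a constant feature from giving zero variance.
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-9)
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model


def log_gaussian(x, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def predict_nb(model, x):
    """arg max_j of log P(j) + sum_i log p(x_i | j)."""
    def score(label):
        prior, means, variances = model[label]
        return math.log(prior) + sum(
            log_gaussian(xi, m, v) for xi, m, v in zip(x, means, variances))
    return max(model, key=score)


# Hypothetical two-feature training set with labels "-" and "+".
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [4.0, 5.0], [4.2, 4.8], [3.8, 5.2]]
y = ["-", "-", "-", "+", "+", "+"]
model = fit_gaussian_nb(X, y)
print(predict_nb(model, [1.1, 2.1]), predict_nb(model, [4.1, 4.9]))
```

Working with log-densities turns the product over features into a sum, which avoids numerical underflow when $q$ is large.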

Page 7:

Nearest-Neighbor Methods

• Predict the class label of $\mathbf{x}_0$ as the most frequent one occurring in the $K$ neighbors

[Figure: scatter plot of "+" and "-" training points in the $(x_1, x_2)$ plane, with the query point marked by a dot.]

Page 8:

Nearest-Neighbor Methods

• Predict the class label of $\mathbf{x}_0$ as the most frequent one occurring in the $K$ neighbors

[Figure: same scatter plot as on the previous page.]

Page 9:

Nearest-Neighbor Methods

• Predict the class label of $\mathbf{x}_0$ as the most frequent one occurring in the $K$ neighbors

[Figure: same scatter plot, with the query point marked by a dot and the label "distance metric".]

Basic assumption:

$f_+(\mathbf{x} + \delta\mathbf{x}) \approx f_+(\mathbf{x})$

$f_-(\mathbf{x} + \delta\mathbf{x}) \approx f_-(\mathbf{x})$

for small $\|\delta\mathbf{x}\|$
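The nearest-neighbor rule described on these pages can be sketched as follows. The Euclidean distance is only one possible choice of distance metric, and the function names and toy data are assumptions for the example.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(X, y, query, k=3, distance=euclidean):
    """Most frequent label among the k training points closest to the query."""
    order = sorted(range(len(X)), key=lambda n: distance(X[n], query))
    votes = Counter(y[n] for n in order[:k])
    # On a tied vote, most_common returns the label met first, i.e. the
    # label of the nearer neighbor.
    return votes.most_common(1)[0][0]


# Hypothetical training points in the (x1, x2) plane.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["-", "-", "-", "+", "+", "+"]

print(knn_predict(X, y, (0.5, 0.5), k=3))
print(knn_predict(X, y, (4, 4), k=1))
```

Passing a different `distance` function changes which points count as neighbors, which is where the choice of metric enters the method.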

Page 10:

Example: Letter Recognition

[Figure: letter samples plotted against two features; axis labels: "Edge count" and "First statistical moment".]

Page 11:

Asymptotic Properties of K-NN Methods

$\lim_{N \to \infty} \hat{f}_j(\mathbf{x}) = f_j(\mathbf{x})$ if $\lim_{N \to \infty} K = \infty$ and $\lim_{N \to \infty} K/N = 0$

• The first condition reduces the variance by making the estimation independent of the accidental characteristics of the $K$ nearest neighbors.

• The second condition reduces the bias by assuring that the $K$ nearest neighbors are arbitrarily close to the query point.

Page 12:

Asymptotic Properties of K-NN Methods

$\lim_{N \to \infty} E_1 \le 2E^*$

$E_1$: classification error rate of the 1-NN rule

$E^*$: classification error rate of the Bayes rule

In the asymptotic limit no decision rule is more than twice as accurate as the 1-NN rule

Page 13:

Finite-sample settings

• If the number of training data $N$ is large and the number of input features $q$ is small, then the asymptotic results may still be valid.

• However, for a moderate to large number of input variables, the sample size required for their validity is beyond feasibility.

• How well does the 1-NN rule work in finite-sample settings?

Page 14:

Curse-of-Dimensionality

• This phenomenon is known as the curse-of-dimensionality

• It refers to the fact that in high dimensional spaces data become extremely sparse and are far apart from each other

• It affects any estimation problem with high dimensionality

Page 15:

Curse of Dimensionality

Sample of size $N = 500$ uniformly distributed in $[0, 1]^q$

[Figure: plots labelled DMAX, DMIN, and DMAX/DMIN.]

Page 16:

Curse of Dimensionality

[Figure: DMAX/DMIN plotted against dimensionality; axis label: dim.]

The distribution of the ratio DMAX/DMIN converges to 1 as the dimensionality increases
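The behaviour of the ratio can be checked with a short simulation. The sketch below assumes that DMAX and DMIN are the largest and smallest Euclidean distances from one given point to a sample of $N = 500$ points uniform in $[0, 1]^q$, and that the given point is itself drawn uniformly; the slides do not spell out these details, so treat them as assumptions.

```python
import math
import random


def dmax_dmin_ratio(q, n=500, seed=0):
    """Largest over smallest distance from one point to n uniform samples."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(q)]  # assumption: uniform query point
    distances = []
    for _ in range(n):
        point = [rng.random() for _ in range(q)]
        distances.append(
            math.sqrt(sum((a - b) ** 2 for a, b in zip(query, point))))
    return max(distances) / min(distances)


if __name__ == "__main__":
    for q in (2, 10, 100, 1000):
        print(q, round(dmax_dmin_ratio(q), 2))
```

In low dimensions the nearest sample is much closer than the farthest one, so the ratio is large; as $q$ grows the two distances become comparable and the ratio approaches 1.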

Page 17:

Curse of Dimensionality

[Figure: variance of distances from a given point, plotted against dimensionality; axis label: dim.]

Page 18:

Curse of Dimensionality

The variance of distances from a given point converges to 0 as the dimensionality increases

[Figure: variance of distances plotted against dimensionality; axis label: dim.]

Page 19:

Curse of Dimensionality

Distance values from a given point

Values flatten out as dimensionality increases

Page 20:

Computing radii of nearest neighborhoods

Page 21:

Median radius of a nearest neighborhood

Uniform distribution in the unit cube $[-0.5, 0.5]^q$

Page 22:

Curse-of-Dimensionality

q        4     4     6     6     10    10     20     20     20
N        100   1000  100   1000  1000  10000  10000  10^6   10^10
d(q,N)   0.42  0.23  0.71  0.48  0.91  0.72   1.51   1.20   0.76

• Random sample of size $N$ drawn from the uniform distribution in the $q$-dimensional unit hypercube

• Diameter of a $K = 1$ neighborhood using Euclidean distance: $d(q, N) = O(N^{-1/q})$

As dimensionality increases, the distance from the closest point increases faster

Large $d(q, N)$ $\Rightarrow$ highly biased estimations
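The slides do not state the exact formula behind the tabulated $d(q, N)$ values. One estimate that comes within about 0.01 of each entry assumes the $K = 1$ neighborhood is the Euclidean ball whose volume equals $1/N$, the share of the unit hypercube available to each sample point, and reports its diameter. The sketch below implements that assumed formula; the function names are hypothetical.

```python
import math


def unit_ball_volume(q):
    """Volume of the q-dimensional Euclidean ball of radius 1."""
    return math.pi ** (q / 2) / math.gamma(q / 2 + 1)


def nn_diameter(q, n):
    """Diameter of the ball of volume 1/n (assumed form of d(q, N))."""
    radius = (1.0 / (n * unit_ball_volume(q))) ** (1.0 / q)
    return 2.0 * radius


if __name__ == "__main__":
    for q, n in [(4, 100), (4, 1000), (6, 100), (6, 1000), (10, 1000),
                 (10, 10000), (20, 10000), (20, 10**6), (20, 10**10)]:
        print(q, n, round(nn_diameter(q, n), 2))
```

The $N^{-1/q}$ factor is what makes the diameter shrink so slowly in high dimensions: for $q = 20$, even $N = 10^{10}$ points leave a nearest-neighbor diameter that is a large fraction of the unit edge length.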

Page 23:

Curse-of-Dimensionality

• It is a serious problem in many real-world applications

• Microarray data: 3,000-4,000 genes;

• Documents: 10,000-20,000 words in dictionary;

• Images, face recognition, etc.

Page 24:

How can we deal with the curse of dimensionality?

Page 25:

$\begin{pmatrix} 7.68 & 92.2 \\ 92.2 & 1912.5 \end{pmatrix}$

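Page 26:

Covariance matrix

For $\mathbf{x} = (x_1, x_2)^T$ with mean $\boldsymbol{\mu} = E[\mathbf{x}] = \frac{1}{N} \sum_{i=1}^{N} \mathbf{x}_i$, the $2 \times 2$ covariance matrix is

$$
\Sigma = E\left[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T\right]
= E \begin{pmatrix} (x_1 - \mu_1)^2 & (x_1 - \mu_1)(x_2 - \mu_2) \\ (x_1 - \mu_1)(x_2 - \mu_2) & (x_2 - \mu_2)^2 \end{pmatrix}
= \frac{1}{N} \sum_{i=1}^{N} \begin{pmatrix} (x_{1i} - \mu_1)^2 & (x_{1i} - \mu_1)(x_{2i} - \mu_2) \\ (x_{1i} - \mu_1)(x_{2i} - \mu_2) & (x_{2i} - \mu_2)^2 \end{pmatrix}
$$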

Page 27:

$$
\Sigma = \begin{pmatrix}
\frac{1}{N} \sum_{i=1}^{N} (x_{1i} - \mu_1)^2 & \frac{1}{N} \sum_{i=1}^{N} (x_{1i} - \mu_1)(x_{2i} - \mu_2) \\
\frac{1}{N} \sum_{i=1}^{N} (x_{1i} - \mu_1)(x_{2i} - \mu_2) & \frac{1}{N} \sum_{i=1}^{N} (x_{2i} - \mu_2)^2
\end{pmatrix}
$$

Diagonal entries: variance

Off-diagonal entries: covariance
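The variance and covariance entries can be computed directly from a two-feature sample. The sketch below uses the $1/N$ normalisation; the function name and the toy data are assumptions for illustration.

```python
def covariance_matrix(points):
    """2x2 covariance matrix of (x1, x2) pairs with 1/N normalisation."""
    n = len(points)
    mu1 = sum(p[0] for p in points) / n
    mu2 = sum(p[1] for p in points) / n
    var1 = sum((p[0] - mu1) ** 2 for p in points) / n            # variance of x1
    var2 = sum((p[1] - mu2) ** 2 for p in points) / n            # variance of x2
    cov12 = sum((p[0] - mu1) * (p[1] - mu2) for p in points) / n  # covariance
    return [[var1, cov12], [cov12, var2]]


# Hypothetical sample in which x2 is exactly twice x1.
sample = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
for row in covariance_matrix(sample):
    print(row)
```

A large off-diagonal entry relative to the diagonal ones signals that the two coordinates are strongly interdependent, as in this toy sample.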

Page 28:

$\begin{pmatrix} 0.99 & 0.5 \\ 0.5 & 1.06 \end{pmatrix}$

$\begin{pmatrix} 1.04 & 1.05 \\ 1.05 & 1.15 \end{pmatrix}$

$\begin{pmatrix} 0.93 & 0.01 \\ 0.01 & 1.05 \end{pmatrix}$

$\begin{pmatrix} 0.97 & 0.49 \\ 0.49 & 1.04 \end{pmatrix}$

$\begin{pmatrix} 0.94 & 0.93 \\ 0.93 & 1.03 \end{pmatrix}$

Page 29:

Dimensionality Reduction

• Many dimensions are often interdependent (correlated);

We can:

• Reduce the dimensionality of problems;

• Transform interdependent coordinates into significant and independent ones;