
Page 1: Gaussian Classifiers

Gaussian Classifiers

CS498

Page 2: Gaussian Classifiers

Today’s lecture

•  The Gaussian

•  Gaussian classifiers – A slightly more sophisticated classifier

Page 3: Gaussian Classifiers

Nearest Neighbors

•  We can classify with nearest neighbors

•  Can we get a probability?

[Figure: points from two classes with means m1 and m2, a query point x, and the decision boundary between the classes]

Page 4: Gaussian Classifiers

Nearest Neighbors

•  Nearest neighbors offers an intuitive distance measure

$d_i \propto (x_1 - m_{i,1})^2 + (x_2 - m_{i,2})^2 = \|x - m_i\|^2$

[Figure: the query point x with distances d1 and d2 to the class means m1 and m2]

Page 5: Gaussian Classifiers

Making a “Soft” Decision

•  What if I didn’t want to classify – What if I wanted a “degree of belief”

•  How would you do that?

[Figure: the query point x with distances d1 and d2 to the class means m1 and m2]

Page 6: Gaussian Classifiers

From a Distance to a Probability

•  If the distance is 0 the probability is high

•  If the distance is ∞ the probability is zero

•  How do we make a function like that?

Page 7: Gaussian Classifiers

Here’s a first crack at it

•  Use exponentiation:

$e^{-\|x - m\|}$

[Plot: the function over distances from −10 to 10; it peaks at 1 when x = m and decays toward 0 as the distance grows]

Page 8: Gaussian Classifiers

Adding an “importance” factor

•  Let’s try to tune the output by adding a factor denoting importance

$e^{-\frac{\|x - m\|}{c}}$

[Plot: the same function for several values of the importance factor c; larger c gives a wider, slower decay]

Page 9: Gaussian Classifiers

One more problem

•  Not all dimensions are equal

Page 10: Gaussian Classifiers

Adding variable “importance” to dimensions

•  Somewhat more complicated now:

$e^{-(x - m)^T C^{-1} (x - m)}$

Page 11: Gaussian Classifiers

The Gaussian Distribution

•  This is the idea behind the Gaussian – Adding some normalization we get:

$P(x; m, C) = \frac{1}{(2\pi)^{k/2}\,|C|^{1/2}}\, e^{-\frac{1}{2}(x - m)^T C^{-1}(x - m)}$
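As a quick check of the formula, here is a minimal sketch in Python/NumPy (the slides later use MATLAB; the function name `gaussian_pdf` and the example numbers are illustrative, not part of the lecture):

```python
import numpy as np

def gaussian_pdf(x, m, C):
    """Evaluate the multivariate Gaussian density P(x; m, C) at a point x."""
    k = m.size
    diff = x - m
    # Normalization term: 1 / ((2*pi)^(k/2) * |C|^(1/2))
    norm = 1.0 / (np.power(2 * np.pi, k / 2) * np.sqrt(np.linalg.det(C)))
    # Quadratic form in the exponent: (x - m)^T C^-1 (x - m)
    quad = diff @ np.linalg.solve(C, diff)
    return norm * np.exp(-0.5 * quad)

# Example: evaluating a 2-D Gaussian at its mean gives the peak value
m = np.array([0.0, 0.0])
C = np.array([[1.0, 0.3], [0.3, 2.0]])
print(gaussian_pdf(np.array([0.0, 0.0]), m, C))
```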

Page 12: Gaussian Classifiers

Gaussian models

•  We can now describe data using Gaussians

•  How? That’s very easy


Page 13: Gaussian Classifiers

Learn Gaussian parameters

•  Estimate the mean: $m = \frac{1}{N}\sum_{i=1}^{N} x_i$

•  Estimate the covariance: $C = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - m)(x_i - m)^T$
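A minimal NumPy sketch of these two estimates (the data matrix X and all numbers below are made up for illustration):

```python
import numpy as np

# X holds one training sample per row (N samples, k dimensions)
X = np.random.randn(100, 2) * [1.0, 0.5] + [3.0, -1.0]

m = X.mean(axis=0)            # sample mean, m = (1/N) * sum_i x_i
C = np.cov(X, rowvar=False)   # sample covariance with the 1/(N-1) factor
# Equivalent explicit form:
# C = (X - m).T @ (X - m) / (len(X) - 1)
```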

Page 14: Gaussian Classifiers

Now we can make classifiers

•  We will use probabilities this time

•  We’ll compute a “belief” of class assignment

Page 15: Gaussian Classifiers

The Classification Process

•  We provide examples of classes

•  We make models of each class

•  We assign all new input data to a class

Page 16: Gaussian Classifiers

Making an assignment decision

•  Face classification example

•  Having a probability for each face, how do we make a decision?

Page 17: Gaussian Classifiers

Motivating example

•  Face 1 is more “likely”

[Figure: an unknown face x is compared against template face 1 and template face 2; the resulting P(y | {face1, face2}) is 0.93 for face 1 and 0.87 for face 2]

Page 18: Gaussian Classifiers

How the decision is made

•  In simple cases the answer is intuitive

•  To get a complete picture we need to probe a bit deeper

Page 19: Gaussian Classifiers

Starting simple

•  Two class case, ω1 and ω2

Page 20: Gaussian Classifiers

Starting simple

•  Given a sample x, is it ω1 or ω2? – i.e. P(ωi | x) = ?

Page 21: Gaussian Classifiers

Getting the answer

•  The class posterior probability is:

$P(\omega_i \mid x) = \frac{P(x \mid \omega_i)\, P(\omega_i)}{P(x)}$

where $P(x \mid \omega_i)$ is the likelihood, $P(\omega_i)$ the prior, and $P(x)$ the evidence

•  To find the answer we need to fill in the terms on the right-hand side

Page 22: Gaussian Classifiers

Filling the unknowns

•  Class priors – How much of each class? P(ω1) ≈ N1/N, P(ω2) ≈ N2/N

•  Class likelihood P(x | ωi) – Requires that we know the distribution of ωi

•  We'll assume it is the Gaussian

Page 23: Gaussian Classifiers

Filling the unknowns

•  Evidence: P(x) = P(x | ω1)P(ω1) + P(x | ω2)P(ω2)

•  We now have P(ω1 | x) and P(ω2 | x)

Page 24: Gaussian Classifiers

Making the decision

•  Bayes classification rule:
   If P(ω1 | x) > P(ω2 | x) then x belongs to class ω1
   If P(ω1 | x) < P(ω2 | x) then x belongs to class ω2

•  Easier version – the evidence P(x) is the same for both classes and cancels: compare P(x | ω1)P(ω1) with P(x | ω2)P(ω2)

•  Equiprobable class version – with equal priors: compare P(x | ω1) with P(x | ω2)
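A small sketch of this rule for two 1-D Gaussian classes, assuming illustrative means, variances and priors (SciPy's `norm.pdf` supplies the likelihoods; none of the numbers come from the lecture):

```python
import numpy as np
from scipy.stats import norm

# 1-D two-class example with assumed parameters
mu = [0.0, 2.0]          # class means
sigma = [1.0, 1.0]       # class standard deviations
prior = [0.5, 0.5]       # class priors P(w1), P(w2)

def posteriors(x):
    # Likelihoods P(x | w_i) under the Gaussian assumption
    like = np.array([norm.pdf(x, mu[i], sigma[i]) for i in range(2)])
    # Evidence P(x) = sum_i P(x | w_i) P(w_i)
    evidence = np.dot(like, prior)
    # Posteriors P(w_i | x) via Bayes' rule
    return like * prior / evidence

x = 0.7
p = posteriors(x)
decision = 1 if p[0] > p[1] else 2   # Bayes rule: pick the larger posterior
print(p, "-> class", decision)
```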

Page 25: Gaussian Classifiers

Visualizing the decision

•  Assume Gaussian data – P(x | ωi) = N(x | µi, σi)

•  The boundary x0 lies where P(ω1 | x) = P(ω2 | x)

[Figure: two 1-D class densities; the threshold x0 splits the axis into decision regions R1 and R2]

Page 26: Gaussian Classifiers

Errors in classification

•  We can’t win all the time though –  Some inputs will be misclassified

•  Probability of errors

$P_{error} = \frac{1}{2}\int_{-\infty}^{x_0} P(x \mid \omega_2)\,dx + \frac{1}{2}\int_{x_0}^{\infty} P(x \mid \omega_1)\,dx$

[Figure: the overlapping class densities; the tail of each class that falls on the wrong side of x0 contributes to the error]
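For concreteness, a sketch that evaluates this error for two assumed 1-D Gaussians with equal priors, using the normal CDF from SciPy (all parameters are illustrative):

```python
from scipy.stats import norm

# Two 1-D Gaussian classes with equal priors (illustrative parameters)
mu1, mu2, sigma = 0.0, 2.0, 1.0
x0 = (mu1 + mu2) / 2.0          # boundary where the posteriors are equal

# P_error = 1/2 * P(x > x0 | w1) + 1/2 * P(x < x0 | w2)
p_error = 0.5 * (1 - norm.cdf(x0, mu1, sigma)) + 0.5 * norm.cdf(x0, mu2, sigma)
print(p_error)   # ~0.159 for these parameters
```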

Page 27: Gaussian Classifiers

Minimizing misclassifications

•  The Bayes classification rule minimizes any potential misclassifications

•  Can you do any better moving the line?

[Figure: decision threshold x0 with regions R1 and R2]

Page 28: Gaussian Classifiers

Minimizing risk

•  Not all errors are equal! –  e.g. medical diagnoses

•  Some misclassifications are more tolerable than others, depending on the assumed risks

[Figure: the threshold x0 shifted to reflect unequal risks, changing regions R1 and R2]

Page 29: Gaussian Classifiers

Example

•  Minimum classification error

•  Minimum risk with λ21 > λ12 –  i.e. class 2 is more important

[Figures: the threshold x0 and regions R1, R2 under the minimum-error rule, and the shifted threshold under the minimum-risk rule with λ21 > λ12]

Page 30: Gaussian Classifiers

True/False – Positives/Negatives

•  Naming the outcomes

•  False positive / false alarm / Type I error
•  False negative / miss / Type II error

classifying for ω1        x is ω1           x is ω2
x classified as ω1        True positive     False positive
x classified as ω2        False negative    True negative

Page 31: Gaussian Classifiers

Receiver Operating Characteristic

•  Visualize classification balance

[Figure: decision threshold x0 with regions R1 and R2, and the corresponding R.O.C. curve plotting true positive rate against false positive rate]

http://www.anaesthetist.com/mnm/stats/roc/Findex.htm
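A sketch of how such a curve can be traced for two assumed 1-D Gaussian classes by sweeping the threshold x0 (the class parameters are illustrative):

```python
import numpy as np
from scipy.stats import norm

# Two 1-D Gaussian classes (illustrative parameters); w1 is the "positive" class
mu1, mu2, sigma = 2.0, 0.0, 1.0

thresholds = np.linspace(-4, 6, 200)
# Classify x as w1 when x > threshold
tpr = 1 - norm.cdf(thresholds, mu1, sigma)   # true positive rate
fpr = 1 - norm.cdf(thresholds, mu2, sigma)   # false positive rate

# Each (fpr, tpr) pair is one point on the R.O.C. curve
for f, t in zip(fpr[::50], tpr[::50]):
    print(f"FPR={f:.3f}  TPR={t:.3f}")
```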

Page 32: Gaussian Classifiers

Classifying Gaussian data

•  Remember that we need the class likelihood to make a decision – For now we'll assume that:

P(x | ωi) = N(x | µi, σi)

–  i.e. that the input data is Gaussian distributed

Page 33: Gaussian Classifiers

Overall methodology

•  Obtain training data

•  Fit a Gaussian model to each class – Perform parameter estimation for mean, variance and class priors

•  Define decision regions based on models and any given constraints
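A minimal end-to-end sketch of this methodology on synthetic 2-D data (all class parameters, sample sizes and names are illustrative; SciPy's `multivariate_normal` supplies the class likelihoods):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Synthetic training data: two 2-D classes (illustrative)
rng = np.random.default_rng(0)
X1 = rng.multivariate_normal([0, 0], [[1, 0.2], [0.2, 1]], size=200)
X2 = rng.multivariate_normal([3, 2], [[1.5, -0.4], [-0.4, 0.7]], size=100)

# Fit a Gaussian model to each class and estimate the class priors
models, priors = [], []
for X in (X1, X2):
    models.append(multivariate_normal(X.mean(axis=0), np.cov(X, rowvar=False)))
    priors.append(len(X) / (len(X1) + len(X2)))

def classify(x):
    # Decision regions follow from comparing P(x | w_i) P(w_i)
    scores = [m.pdf(x) * p for m, p in zip(models, priors)]
    return int(np.argmax(scores)) + 1

print(classify([0.5, 0.0]))   # expected: class 1
print(classify([2.8, 2.1]))   # expected: class 2
```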

Page 34: Gaussian Classifiers

1D example

•  In 1-D the decision boundary is a threshold x0 separating the two class regions

[Figure: 1-D class densities with threshold x0 and decision regions R1, R2]

Page 35: Gaussian Classifiers

2D example

Page 36: Gaussian Classifiers

2D example fitted Gaussians

Page 37: Gaussian Classifiers

Gaussian decision boundaries

•  The decision boundary is defined as: P(x | ω1)P(ω1) = P(x | ω2)P(ω2)

•  We can substitute Gaussians and solve to find what the boundary looks like

Page 38: Gaussian Classifiers

Discriminant functions

•  Define a function gi(x) so that we classify x in ωi if gi(x) > gj(x), ∀j ≠ i

•  Decision boundaries are now defined as: gij(x) ≡ (gi(x) = gj(x))

Page 39: Gaussian Classifiers

Back to the data

•  A shared spherical covariance $\Sigma_i = \sigma^2 I$ produces line boundaries

•  Discriminant: $g_i(x) = w_i^T x + b_i$ with

$w_i = \frac{\mu_i}{\sigma^2}$

$b_i = -\frac{\mu_i^T \mu_i}{2\sigma^2} + \log P(\omega_i)$
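A small sketch of this linear discriminant, assuming illustrative means, a shared variance and priors (nothing below is taken from the lecture data):

```python
import numpy as np

sigma2 = 1.0                       # shared variance sigma^2 (assumed known)
means = [np.array([0.0, 0.0]),     # class means mu_i (illustrative)
         np.array([2.0, 1.0])]
priors = [0.6, 0.4]

def linear_discriminant(x, mu, prior):
    w = mu / sigma2                              # w_i = mu_i / sigma^2
    b = -mu @ mu / (2 * sigma2) + np.log(prior)  # b_i = -mu_i^T mu_i / (2 sigma^2) + log P(w_i)
    return w @ x + b

x = np.array([1.0, 0.2])
scores = [linear_discriminant(x, m, p) for m, p in zip(means, priors)]
print("class", int(np.argmax(scores)) + 1)
```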

Page 40: Gaussian Classifiers

Quadratic boundaries

•  Arbitrary covariance produces more elaborate patterns in the boundary

$g_i(x) = x^T W_i x + w_i^T x + w_{i0}$

$W_i = -\frac{1}{2}\Sigma_i^{-1}$

$w_i = \Sigma_i^{-1}\mu_i$

$w_{i0} = -\frac{1}{2}\mu_i^T \Sigma_i^{-1}\mu_i - \frac{1}{2}\log|\Sigma_i| + \log P(\omega_i)$
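A corresponding sketch of the quadratic discriminant with per-class covariances (all parameters below are illustrative):

```python
import numpy as np

def quadratic_discriminant(x, mu, Sigma, prior):
    Sigma_inv = np.linalg.inv(Sigma)
    W = -0.5 * Sigma_inv                          # W_i = -1/2 * Sigma_i^-1
    w = Sigma_inv @ mu                            # w_i = Sigma_i^-1 mu_i
    w0 = (-0.5 * mu @ Sigma_inv @ mu              # -1/2 mu_i^T Sigma_i^-1 mu_i
          - 0.5 * np.log(np.linalg.det(Sigma))    # -1/2 log |Sigma_i|
          + np.log(prior))                        # + log P(w_i)
    return x @ W @ x + w @ x + w0

# Two classes with different covariances give a curved boundary
x = np.array([1.0, 1.0])
g1 = quadratic_discriminant(x, np.array([0, 0]), np.array([[1.0, 0.0], [0.0, 1.0]]), 0.5)
g2 = quadratic_discriminant(x, np.array([2, 2]), np.array([[2.0, 0.5], [0.5, 1.0]]), 0.5)
print("class", 1 if g1 > g2 else 2)
```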


Page 43: Gaussian Classifiers

Example classification run

•  Learning to recognize two handwritten digits

•  Let’s try this in MATLAB

Page 44: Gaussian Classifiers

Recap

•  The Gaussian

•  Bayesian Decision Theory
   –  Risk, decision regions
   –  Gaussian classifiers