Gaussian Classifiers
CS498
Today’s lecture
• The Gaussian
• Gaussian classifiers – A slightly more sophisticated classifier
Nearest Neighbors
• We can classify with nearest neighbors
• Can we get a probability?
[Figure: two class means m1 and m2, a query point x, and the decision boundary between them]
Nearest Neighbors
• Nearest neighbors offers an intuitive distance measure
di ∝ (x1 − mi,1)² + (x2 − mi,2)² = ‖x − mi‖²
[Figure: distances d1 and d2 from the point x to the means m1 and m2]
Making a “Soft” Decision
• What if I didn’t want to classify – What if I wanted a “degree of belief”
• How would you do that?
[Figure: distances d1 and d2 from the point x to the means m1 and m2]
From a Distance to a Probability
• If the distance is 0 the probability is high
• If the distance is ∞ the probability is zero
• How do we make a function like that?
Here’s a first crack at it
• Use exponentiation:
e^(−‖x − m‖²)
[Plot: e^(−‖x − m‖²) over x ∈ [−10, 10], decaying from 1 at the mean toward 0]
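The mapping above can be sketched in plain Python. The class means and query point below are made-up values, and `soft_score` is an illustrative name, not anything from the lecture:

```python
import math

def soft_score(x, m):
    """Map a squared distance to a score in (0, 1]: 1 at distance 0, -> 0 as distance -> inf."""
    d2 = (x - m) ** 2
    return math.exp(-d2)

# Hypothetical class means and a query point
m1, m2 = -2.0, 3.0
x = 0.0
s1, s2 = soft_score(x, m1), soft_score(x, m2)
print(s1, s2)  # s1 > s2, since x is closer to m1
```

The score is not yet a probability (the two values need not sum to 1), which is exactly the gap the Gaussian and Bayes' rule close below.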
Adding an “importance” factor
• Let’s try to tune the output by adding a factor denoting importance
e^(−‖x − m‖²⁄c)
[Plot: e^(−‖x − m‖²⁄c) over x ∈ [−10, 10]]
One more problem
• Not all dimensions are equal
Adding variable “importance” to dimensions
• Somewhat more complicated now:
e^(−(x − m)ᵀC⁻¹(x − m))
The Gaussian Distribution
• This is the idea behind the Gaussian – Adding some normalization we get:
P(x; m, C) = (2π)^(−k⁄2) |C|^(−1⁄2) · e^(−½ (x − m)ᵀC⁻¹(x − m))
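As a concrete check, here is a minimal sketch that evaluates this density for k = 2 using only the standard library; the hand-rolled 2×2 inverse and determinant are purely for illustration (a real implementation would use a linear-algebra library):

```python
import math

def gauss_pdf_2d(x, m, C):
    """Evaluate the 2-D Gaussian density P(x; m, C)."""
    (a, b), (c, d) = C
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]  # 2x2 inverse
    dx = [x[0] - m[0], x[1] - m[1]]
    # quadratic form (x - m)^T C^{-1} (x - m)
    q = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
         + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    norm = 1.0 / (2 * math.pi * math.sqrt(det))  # (2*pi)^(k/2) |C|^(1/2) with k = 2
    return norm * math.exp(-0.5 * q)

# Standard 2-D Gaussian evaluated at its mean: 1 / (2*pi) ≈ 0.159
print(gauss_pdf_2d([0, 0], [0, 0], [[1, 0], [0, 1]]))
```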
Gaussian models
• We can now describe data using Gaussians
• How? That’s very easy
Learn Gaussian parameters
• Estimate the mean: m = (1/N) Σᵢ xᵢ
• Estimate the covariance: C = (1/(N − 1)) Σᵢ (xᵢ − m)(xᵢ − m)ᵀ
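A minimal sketch of these two estimators for 2-D samples, using plain lists rather than a matrix library; the data below is made up:

```python
def fit_gaussian(X):
    """Estimate the mean and unbiased covariance of 2-D samples X."""
    N = len(X)
    m = [sum(x[0] for x in X) / N, sum(x[1] for x in X) / N]
    C = [[0.0, 0.0], [0.0, 0.0]]
    for x in X:
        d = [x[0] - m[0], x[1] - m[1]]  # deviation from the mean
        for i in range(2):
            for j in range(2):
                C[i][j] += d[i] * d[j] / (N - 1)  # 1/(N-1): unbiased estimate
    return m, C

X = [[0, 0], [2, 0], [0, 2], [2, 2]]  # hypothetical training points
m, C = fit_gaussian(X)
print(m)  # mean is [1.0, 1.0]
```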
Now we can make classifiers
• We will use probabilities this time
• We’ll compute a “belief” of class assignment
The Classification Process
• We provide examples of classes
• We make models of each class
• We assign all new input data to a class
Making an assignment decision
• Face classification example
• Having a probability for each face, how do we make a decision?
Motivating example
• Face 1 is more “likely”
[Figure: an unknown face x compared against template face 1 and template face 2; P(y | {face1, face2}) is 0.93 for face 1 and 0.87 for face 2]
How the decision is made
• In simple cases the answer is intuitive
• To get a complete picture we need to probe a bit deeper
Starting simple
• Two class case, ω1 and ω2
Starting simple
• Given a sample x, is it ω1 or ω2? – i.e.
P(ωi | x)= ?
Getting the answer
• The class posterior probability is:
• To find the answer we need to fill in the terms on the right-hand side
P(ωᵢ | x) = P(x | ωᵢ) P(ωᵢ) / P(x)
(likelihood × prior, divided by the evidence)
Filling the unknowns
• Class priors – how much of each class? P(ω1) ≈ N1/N, P(ω2) ≈ N2/N
• Class likelihood P(x | ωᵢ) – requires that we know the distribution of each ωᵢ
• We’ll assume it is the Gaussian
Filling the unknowns
• Evidence: P(x) = P(x | ω1)P(ω1) + P(x | ω2)P(ω2)
• We now have P(ω1 | x) and P(ω2 | x)
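Putting the three terms together, a minimal sketch of the posterior computation; the likelihood and prior values are made up for illustration:

```python
def posterior(lik1, lik2, prior1, prior2):
    """Bayes' rule: P(w_i | x) = P(x | w_i) P(w_i) / P(x), with the evidence
    expanded as P(x) = P(x | w1) P(w1) + P(x | w2) P(w2)."""
    evidence = lik1 * prior1 + lik2 * prior2
    return lik1 * prior1 / evidence, lik2 * prior2 / evidence

# Hypothetical likelihoods and equal priors
p1, p2 = posterior(0.4, 0.1, 0.5, 0.5)
print(p1, p2)  # ≈ 0.8 and ≈ 0.2; the posteriors sum to 1
```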
Making the decision
• Bayes classification rule:
If P(ω1 | x) > P(ω2 | x), then x belongs to class ω1
If P(ω1 | x) < P(ω2 | x), then x belongs to class ω2
• Easier version (the evidence cancels): compare P(x | ω1)P(ω1) with P(x | ω2)P(ω2)
• Equiprobable class version: compare P(x | ω1) with P(x | ω2)
Visualizing the decision
• Assume Gaussian data – P(x | ωᵢ) = N(x | µᵢ, σᵢ)
• The boundary x0 is where P(ω1 | x) = P(ω2 | x), splitting the axis into regions R1 and R2
[Figure: two 1-D class densities with the boundary x0 between regions R1 and R2]
Errors in classification
• We can’t win all the time though – Some inputs will be misclassified
• Probability of errors
P_error = ½ ∫_{−∞}^{x0} P(x | ω2) dx + ½ ∫_{x0}^{∞} P(x | ω1) dx
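This formula (for two equiprobable classes) can be checked numerically. The sketch below uses a simple trapezoid rule over a wide finite interval instead of the true infinite integrals; the means, variance, and boundary are made-up values:

```python
import math

def gauss1d(x, mu, sigma):
    """1-D Gaussian density N(x | mu, sigma)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def error_prob(mu1, mu2, sigma, x0, lo=-30.0, hi=30.0, n=20000):
    """Trapezoid approximation of
    P_error = 1/2 * int_{-inf}^{x0} P(x|w2) dx + 1/2 * int_{x0}^{inf} P(x|w1) dx."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        x = lo + k * h
        w = h if 0 < k < n else h / 2          # trapezoid end-point weights
        f = gauss1d(x, mu2, sigma) if x < x0 else gauss1d(x, mu1, sigma)
        total += 0.5 * f * w
    return total

# Symmetric, equiprobable case: the boundary sits midway between the means
print(error_prob(mu1=-1.0, mu2=1.0, sigma=1.0, x0=0.0))  # ≈ 0.1587, i.e. Phi(-1)
```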
Minimizing misclassifications
• The Bayes classification rule minimizes the probability of misclassification
• Can you do any better moving the line?
Minimizing risk
• Not all errors are equal! – e.g. medical diagnoses
• How tolerable a misclassification is depends on the assumed risks
Example
• Minimum classification error
• Minimum risk with λ21 > λ12 – i.e. class 2 is more important
[Figures: decision regions R1, R2 with boundary x0 under minimum classification error vs. minimum risk]
True/False – Positives/Negatives
• Naming the outcomes
• False positive / false alarm / Type I error
• False negative / miss / Type II error
Classifying for ω1:

| | x is ω1 | x is ω2 |
|---|---|---|
| x classified as ω1 | True positive | False positive |
| x classified as ω2 | False negative | True negative |
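Counting the four outcomes from the table is straightforward; the labels below are made up, with 1 standing for ω1 and 2 for ω2:

```python
def confusion(y_true, y_pred, positive=1):
    """Count true/false positives and negatives when classifying for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn

# Toy labels: 1 = omega_1, 2 = omega_2
print(confusion([1, 1, 2, 2, 1], [1, 2, 2, 1, 1]))  # (2, 1, 1, 1)
```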
Receiver Operating Characteristic
• Visualize classification balance
[Plot: R.O.C. curve – true positive rate vs. false positive rate, traced as the decision boundary x0 moves]
http://www.anaesthetist.com/mnm/stats/roc/Findex.htm
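Each position of the decision boundary yields one (false positive rate, true positive rate) point; sweeping a threshold over classifier scores traces the curve. A minimal sketch with made-up scores (all names and values below are hypothetical):

```python
def roc_points(scores_pos, scores_neg, thresholds):
    """One (FPR, TPR) point per threshold: predict positive when score >= t."""
    pts = []
    for t in thresholds:
        tpr = sum(1 for s in scores_pos if s >= t) / len(scores_pos)
        fpr = sum(1 for s in scores_neg if s >= t) / len(scores_neg)
        pts.append((fpr, tpr))
    return pts

pos = [0.9, 0.8, 0.6, 0.4]   # hypothetical scores for true omega_1 samples
neg = [0.7, 0.5, 0.3, 0.2]   # hypothetical scores for true omega_2 samples
print(roc_points(pos, neg, [0.0, 0.55, 1.1]))  # [(1.0, 1.0), (0.25, 0.75), (0.0, 0.0)]
```

A permissive threshold catches everything (top-right corner); a strict one rejects everything (bottom-left); the balance in between is the classifier's operating range.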
Classifying Gaussian data
• Remember that we need the class likelihood to make a decision
• For now we’ll assume that P(x | ωᵢ) = N(x | µᵢ, σᵢ) – i.e. that the input data is Gaussian distributed
Overall methodology
• Obtain training data
• Fit a Gaussian model to each class – perform parameter estimation for mean, variance and class priors
• Define decision regions based on models and any given constraints
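The three steps above can be sketched end to end in 1-D. The training data, class names, and query points below are all made up; the constant −½ log(2π) is dropped from the log-likelihood since it cancels when classes are compared:

```python
import math

def fit(xs):
    """Estimate mean and standard deviation of 1-D samples."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / (n - 1)
    return mu, math.sqrt(var)

def loglik(x, mu, sigma):
    """log N(x | mu, sigma) up to a constant shared by all classes."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma)

def classify(x, models, priors):
    """Pick the class maximizing log P(x | w_i) + log P(w_i)."""
    scores = [loglik(x, mu, s) + math.log(p) for (mu, s), p in zip(models, priors)]
    return scores.index(max(scores))

# Hypothetical training data for two classes, equal priors (N1 = N2)
class1 = [0.9, 1.1, 1.0, 0.8, 1.2]
class2 = [3.1, 2.9, 3.0, 3.2, 2.8]
models = [fit(class1), fit(class2)]
priors = [0.5, 0.5]
print(classify(1.05, models, priors), classify(2.95, models, priors))  # 0 1
```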
1D example
• In 1-D the decision boundary is a single point x0 (drawn as a vertical line) separating the two class regions
2D example
2D example fitted Gaussians
Gaussian decision boundaries
• The decision boundary is defined as: P(x | ω1)P(ω1) = P(x | ω2)P(ω2)
• We can substitute Gaussians and solve to find what the boundary looks like
Discriminant functions
• Define a function gᵢ for each class so that: classify x in ωᵢ if gᵢ(x) > gⱼ(x), ∀j ≠ i
• Decision boundaries are now defined as: g_{ij}(x): gᵢ(x) = gⱼ(x)
Back to the data
• Σᵢ = σ²I (a covariance shared by all classes) produces line boundaries
• Discriminant: gᵢ(x) = wᵢᵀx + b, with
wᵢ = µᵢ ⁄ σ²
b = −µᵢᵀµᵢ ⁄ (2σ²) + log P(ωᵢ)
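In 1-D this linear discriminant is just a scalar expression; a minimal sketch, with hypothetical means, shared variance, and query point:

```python
import math

def linear_g(x, mu, sigma2, log_prior):
    """g_i(x) = w_i * x + b for shared covariance sigma^2 (1-D case):
    w_i = mu_i / sigma^2,  b = -mu_i^2 / (2 sigma^2) + log P(w_i)."""
    w = mu / sigma2
    b = -mu * mu / (2 * sigma2) + log_prior
    return w * x + b

# Equal priors, shared sigma^2 = 1: the boundary sits midway between the means
g1 = linear_g(0.9, 0.0, 1.0, math.log(0.5))
g2 = linear_g(0.9, 2.0, 1.0, math.log(0.5))
print(g1 > g2)  # True: x = 0.9 falls on the mu = 0 side of the midpoint at 1.0
```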
Quadratic boundaries
• Arbitrary covariance produces more elaborate patterns in the boundary
gᵢ(x) = xᵀWᵢx + wᵢᵀx + w_{i0}, with
Wᵢ = −½ Σᵢ⁻¹
wᵢ = Σᵢ⁻¹µᵢ
w_{i0} = −½ µᵢᵀΣᵢ⁻¹µᵢ − ½ log|Σᵢ| + log P(ωᵢ)
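A minimal sketch of this quadratic discriminant for 2×2 covariances, again with a hand-rolled inverse for illustration only; the means, covariances, and query point are made up:

```python
import math

def quad_g(x, mu, C, log_prior):
    """g_i(x) = x^T W_i x + w_i^T x + w_i0 with W_i = -1/2 C^{-1},
    w_i = C^{-1} mu_i, w_i0 = -1/2 mu_i^T C^{-1} mu_i - 1/2 log|C| + log P(w_i)."""
    (a, b), (c, d) = C
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]  # 2x2 inverse

    def mv(M, v):  # matrix-vector product
        return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

    def dot(u, v):
        return u[0] * v[0] + u[1] * v[1]

    xWx = -0.5 * dot(x, mv(inv, x))
    wx = dot(mv(inv, mu), x)
    w0 = -0.5 * dot(mu, mv(inv, mu)) - 0.5 * math.log(det) + log_prior
    return xWx + wx + w0

# Two classes with different covariances -> a curved boundary between them
gA = quad_g([0.0, 0.0], [0.0, 0.0], [[1, 0], [0, 1]], math.log(0.5))
gB = quad_g([0.0, 0.0], [3.0, 0.0], [[4, 0], [0, 4]], math.log(0.5))
print(gA > gB)  # True: the origin is assigned to the first class
```

Up to a constant, gᵢ is the log of P(x | ωᵢ)P(ωᵢ), so comparing discriminants is the Bayes rule in disguise; when the covariances differ, the xᵀWᵢx terms no longer cancel and the boundary bends.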
Example classification run
• Learning to recognize two handwritten digits
• Let’s try this in MATLAB
Recap
• The Gaussian
• Bayesian Decision Theory – Risk, decision regions – Gaussian classifiers