chapter 2 (part 2): bayesian decision theory (sections 2.3-2.5)

Pattern Classification, Chapter 2 (Part 2)

Pattern Classification

All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000 with the permission of the authors and the publisher

Chapter 2 (Part 2): Bayesian Decision Theory

(Sections 2.3-2.5)

• Minimum-Error-Rate Classification

• Classifiers, Discriminant Functions and Decision Surfaces

• The Normal Density

Minimum-Error-Rate Classification

•Actions are decisions on classesIf action i is taken and the true state of nature is j then:

the decision is correct if i = j and in error if i j

•Seek a decision rule that minimizes the probability of error which is the error rate

• Introduction of the zero-one loss function:

Therefore, the conditional risk is:

“The risk corresponding to this loss function is the average probability error”

c,...,1j,i ji 1

ji 0),( ji

1jjjii

)x|(P1)x|(P

)x|(P)|()x|(R

•Minimize the risk requires maximize P(i | x)

(since R(i | x) = 1 – P(i | x))

•For Minimum error rate

•Decide i if P (i | x) > P(j | x) j i

• Regions of decision and zero-one loss function, therefore:

• If is the zero-one loss function which means:

)(P2 then

2 0 if

)(P then

)|x(P :if decide then

)(P. Let

8Classifiers, Discriminant Functions

and Decision Surfaces

•The multi-category case

•Set of discriminant functions gi(x), i = 1,…, c

•The classifier assigns a feature vector x to class i

if: gi(x) > gj(x) j i

•Let gi(x) = - R(i | x)

(max. discriminant corresponds to min. risk!)

•For the minimum error rate, we take gi(x) = P(i | x)

(max. discrimination corresponds to max. posterior!)

gi(x) P(x | i) P(i)

gi(x) = ln P(x | i) + ln P(i)

(ln: natural logarithm!)

•Feature space divided into c decision regions

if gi(x) > gj(x) j i then x is in Ri

(Ri means assign x to i)

•The two-category case•A classifier is a “dichotomizer” that has two

discriminant functions g1 and g2

Let g(x) g1(x) – g2(x)

Decide 1 if g(x) > 0 ; Otherwise decide 2

•The computation of g(x)

)|x(Pln

)x|(P)x|(P)x(g

The Normal Density

• Univariate density

• Density which is analytically tractable

• Continuous density

• A lot of processes are asymptotically Gaussian

• Handwritten characters, speech sounds are ideal or prototype corrupted by random process (central limit theorem)

Where: = mean (or expected value) of x 2 = expected squared deviation or variance

• Multivariate density

• Multivariate normal density in d dimensions is:

where:

x = (x1, x2, …, xd)t (t stands for the transpose vector form)

= (1, 2, …, d)t mean vector = d*d covariance matrix

|| and -1 are determinant and inverse respectively

)x()x(

1)x(P 1t

2/12/d

Appendix

•Variance=S2

•Standard Deviation=S

Bays theorem

A ﹁ A

B A and B ﹁ A and B

﹁ B A and ﹁ B ﹁ A and ﹁ B

)|()()|()(

)|()()|(

ABPAPABPAP

ABPAPBAP

)|()()|(

ABPAPBAP

chapter 2 (part 2): bayesian decision theory (sections 2.3-2.5)

Documents

bayesian inference for categorical data...

bayesian networks textbook: probabilistic reasoning,...

bayesian bayesian network

cs 536 – montana state university bayesian learning...

2.5 construction & commissioning 2.5.1 labour...

bayesian learning and learning bayesian networks ·...

3. bayesian decision theory - sophia - inria · bayesian...

bayesian decision theory (sections 2.1-2.2)

sets, boolean algebras, and relations sections 2.1–2.5

chapter 14, sections 1–4 · c stuart russel and peter...

chapter 2, sections 2.5 - 2.6, “applications” and...

stan - columbia university · • books with sections on...

newton’s first law of motion sections 2.4 and 2.5

rds, dublin13-14 march 2016 sept-2344.pdf · 2 2.5 2 2.5 2...

bayesian estimation of non-linear vector error correction...

© 2011 ifrs foundation 1 the ifrs for smes topic 2.5 quiz...

radiography -...

sections 2.1, 2.2, 2.3, 2.4, 2.5

bayesian analysis of stochastic constraints in … · 2.4...

bayesian thinking in spatial statistics - emory …€¦ ·...