Ch 2, Part 1

TRANSCRIPT


  • Introduction

    Statistical decision theory deals with situations where decisions have to be made under a state of uncertainty, and its goal is to provide a rational framework for dealing with such situations.

    The Bayesian approach is a particular way of formulating and dealing with statistical decision problems. More specifically, it offers a method of formalizing a priori beliefs and of combining them with the available observations.


  • Introduction (cont.)

    The sea bass/salmon example: state of nature, prior.

    The state of nature ω is a random variable.

    The catch of salmon and sea bass is equiprobable:

    P(ω1) = P(ω2) (uniform priors)

    P(ω1) + P(ω2) = 1

    [Figure: a salmon and a sea bass]


  • Decision Rule From Only Priors

    The a priori or prior probability reflects our knowledge of how likely each state of nature is before we can actually observe it.

    What is a reasonable decision rule if

    the only available information is the prior, and

    the cost of any incorrect classification is equal?

    Decide ω1 if P(ω1) > P(ω2); otherwise decide ω2.

    What can we say about this decision rule? It seems reasonable, but it will always choose the same fish.

    If the priors are uniform, this rule will behave poorly.
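    A minimal sketch of this prior-only rule in Python; the 0.7/0.3 priors below are illustrative assumptions, not values from the slides:

    ```python
    # Prior-only classifier: with no features, always pick the most probable class.
    P_w1, P_w2 = 0.7, 0.3  # illustrative priors P(w1), P(w2)

    decision = "w1" if P_w1 > P_w2 else "w2"

    # The rule always returns the same class, so its error rate is the
    # probability of the other class: P(error) = min(P_w1, P_w2).
    print(decision)                       # w1
    print("P(error) =", min(P_w1, P_w2))  # 0.3
    ```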


  • Class-Conditional Density

    The class-conditional probability density function is the probability density function for x, our feature, given that the state of nature is ωj: P(x | ωj).

    P(x | ω1) and P(x | ω2) describe the difference in lightness between populations of sea bass and salmon.


  • Class-Conditional Probabilities

    [Figure: class-conditional densities P(x | ω1) and P(x | ω2) as functions of the lightness feature x]


  • Posterior Probability: Bayes Formula

    Combine the prior and class-conditional probabilities. The posterior probability is the probability of a certain state of nature given our observables:

    P(ωj | x) = P(x | ωj) P(ωj) / P(x), where the evidence P(x) = Σj P(x | ωj) P(ωj)
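    A quick numeric sketch of the formula; the priors and the likelihood values at the observed x are made-up numbers for illustration:

    ```python
    # Bayes formula: P(w_j | x) = P(x | w_j) P(w_j) / P(x),
    # where the evidence P(x) = sum_j P(x | w_j) P(w_j) normalizes the posteriors.
    priors = [0.5, 0.5]        # P(w1), P(w2): illustrative uniform priors
    likelihoods = [0.6, 0.2]   # P(x | w1), P(x | w2) at some observed x

    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    posteriors = [l * p / evidence for l, p in zip(likelihoods, priors)]

    print(posteriors)  # [0.75, 0.25] -> decide w1
    ```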


  • Posterior Probability: Bayes Formula (cont.)

    [Figure: posterior probabilities P(ω1 | x) and P(ω2 | x) as functions of the feature x]


  • Likelihood Ratio Test (LRT)

    The rule for a 2-class problem:

    if P(ω1 | x) > P(ω2 | x) choose ω1, else choose ω2

    Or, in a more compact form:

    P(ω1 | x) ≷ P(ω2 | x)

    Applying Bayes rule, the evidence P(x) cancels and the rule becomes a likelihood ratio test:

    Λ(x) = P(x | ω1) / P(x | ω2) ≷ P(ω2) / P(ω1)
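    A small sketch of the test as a function; the argument values in the example call are illustrative assumptions:

    ```python
    def lrt_decision(px_w1, px_w2, P_w1, P_w2):
        """Two-class likelihood ratio test: decide w1 if
        P(x|w1)/P(x|w2) > P(w2)/P(w1). The evidence P(x) has already
        cancelled out, so only likelihoods and priors are needed."""
        return "w1" if px_w1 / px_w2 > P_w2 / P_w1 else "w2"

    # With equal priors the threshold is 1, so the larger likelihood wins:
    print(lrt_decision(px_w1=0.3, px_w2=0.1, P_w1=0.5, P_w2=0.5))  # w1
    ```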


  • Probability of Error

    For a given observation x, we would be inclined to let the posterior govern our decision: decide ω1 if P(ω1 | x) > P(ω2 | x), and ω2 otherwise.

    The probability of error is:

    P(error | x) = P(ω1 | x) if we decide ω2, and P(ω2 | x) if we decide ω1.



  • How Good Is the LRT Decision Rule?

    To answer this question, it is convenient to express P[error] in terms of the posterior:

    P[error] = ∫ P(error | x) p(x) dx

    The optimal decision rule will minimize P(error | x) at every value of x in feature space, so that the integral above is minimized.
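    One way to make this concrete is to approximate the integral numerically. The sketch below assumes two made-up unit-variance Gaussian class-conditionals with means 0 and 2 and equal priors; for that setup the exact Bayes error is Φ(-1) ≈ 0.1587, which the Riemann sum should approach:

    ```python
    import math

    def gaussian(x, mu, sigma):
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    P_w1, P_w2 = 0.5, 0.5                 # assumed equal priors
    p1 = lambda x: gaussian(x, 0.0, 1.0)  # assumed P(x | w1)
    p2 = lambda x: gaussian(x, 2.0, 1.0)  # assumed P(x | w2)

    # P[error] = integral of P(error | x) p(x) dx, where the Bayes rule
    # gives P(error | x) = min[P(w1|x), P(w2|x)].
    dx, err, x = 0.001, 0.0, -10.0
    while x < 12.0:
        evidence = p1(x) * P_w1 + p2(x) * P_w2  # p(x)
        post1 = p1(x) * P_w1 / evidence         # P(w1 | x)
        err += min(post1, 1.0 - post1) * evidence * dx
        x += dx

    print(f"Bayes error ~ {err:.4f}")  # ~0.1587
    ```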


  • Minimizing the Probability of Error

    Decide ω1 if P(ω1 | x) > P(ω2 | x);

    otherwise decide ω2.

    Therefore:

    P(error | x) = min [P(ω1 | x), P(ω2 | x)]

    (Bayes decision)


  • Bayesian Decision Theory: Continuous Features

    Generalization of the preceding ideas:

    Use more than one feature (e.g., length and lightness)

    Use more than two states of nature

    Allow actions, and not only decisions on the state of nature

    Introduce a loss function which is more general than the probability of error (e.g., errors are not equally costly)

    Allowing actions other than classification primarily allows the possibility of rejection.

    The loss function states how costly each action taken is.


  • Loss Functions

    A loss function states exactly how costly each action is.

    Let {ω1, ω2, …, ωc} be the set of c states of nature (or categories).

    Let {α1, α2, …, αa} be the set of possible actions.

    Let the loss function λ(αi | ωj) be the loss incurred for taking action αi when the state of nature is ωj.

    A general decision rule is a function α(x) that tells us which action to take for every possible observation.


  • Overall Risk

    The expected loss, or conditional risk, from taking action αi is:

    R(αi | x) = Σ(j=1..c) λ(αi | ωj) P(ωj | x),  for i = 1, …, a

    The overall risk is the expected loss of the decision rule over all x: R = ∫ R(α(x) | x) p(x) dx.

    Minimizing R implies minimizing R(αi | x) at every x.
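    A sketch of evaluating these conditional risks, with a hypothetical third "reject" action added to illustrate the rejection option mentioned a few slides back; the loss matrix and posteriors are assumed values:

    ```python
    # R(a_i | x) = sum_j lambda(a_i | w_j) P(w_j | x)
    loss = [
        [0.0, 1.0],    # a1: decide w1
        [1.0, 0.0],    # a2: decide w2
        [0.25, 0.25],  # a3: reject, at a small fixed loss either way
    ]
    posteriors = [0.6, 0.4]  # P(w1 | x), P(w2 | x) at some observed x

    risks = [sum(l * p for l, p in zip(row, posteriors)) for row in loss]
    print(risks)  # [0.4, 0.6, 0.25] -> rejecting is the minimum-risk action
    ```

    Here neither posterior is large enough to beat the cheap reject action, so the minimum-risk choice is to withhold a decision.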


  • The Minimum Overall Risk

    The Bayes decision rule gives us a method for minimizing the overall risk.

    Select the action that minimizes the conditional risk:

    α* = argmin(i) R(αi | x)

    R is then minimum, and R in this case is called the Bayes risk: the best performance that can be achieved!


  • Two-Category Classification Examples

    Consider two classes and two actions:

    α1: deciding ω1; α2: deciding ω2

    λij = λ(αi | ωj) is the loss incurred for deciding ωi when the true state of nature is ωj.

    Conditional risk:

    R(α1 | x) = λ11 P(ω1 | x) + λ12 P(ω2 | x)

    R(α2 | x) = λ21 P(ω1 | x) + λ22 P(ω2 | x)
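    A direct numeric sketch of these two risks; the loss matrix and the posteriors are illustrative assumptions. Note how a large λ12 can overturn the class favored by the posteriors alone:

    ```python
    # lam[i][j] = loss for deciding w_(i+1) when the true class is w_(j+1)
    lam = [[0.0, 3.0],
           [1.0, 0.0]]
    P1_x, P2_x = 0.7, 0.3  # posteriors P(w1 | x), P(w2 | x)

    R1 = lam[0][0] * P1_x + lam[0][1] * P2_x  # R(a1 | x) = 0.9
    R2 = lam[1][0] * P1_x + lam[1][1] * P2_x  # R(a2 | x) = 0.7

    print("decide w1" if R1 < R2 else "decide w2")  # decide w2
    ```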


  • Two-Category Classification Examples (cont.)

    Our rule is the following: if R(α1 | x) < R(α2 | x), action α1 (decide ω1) is taken.

    In terms of posteriors, decide ω1 if:

    (λ21 - λ11) P(ω1 | x) > (λ12 - λ22) P(ω2 | x)

    and decide ω2 otherwise.

    Or, expanding via Bayes rule, decide ω1 if:

    (λ21 - λ11) P(x | ω1) P(ω1) > (λ12 - λ22) P(x | ω2) P(ω2)

    and decide ω2 otherwise.


  • Likelihood Ratio

    The preceding rule is equivalent to the following rule:

    if P(x | ω1) / P(x | ω2) > [(λ12 - λ22) / (λ21 - λ11)] · [P(ω2) / P(ω1)]

    then take action α1 (decide ω1); otherwise take action α2 (decide ω2).
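    The same illustrative numbers as in the risk-comparison sketch above, now pushed through the threshold form; with equal priors the posteriors equal the normalized likelihoods, so the decision agrees:

    ```python
    def decide(px_w1, px_w2, P_w1, P_w2, l11=0.0, l12=3.0, l21=1.0, l22=0.0):
        """Decide w1 iff the likelihood ratio exceeds the loss- and
        prior-dependent threshold (the default losses are assumed values)."""
        threshold = ((l12 - l22) / (l21 - l11)) * (P_w2 / P_w1)
        return "w1" if px_w1 / px_w2 > threshold else "w2"

    # Likelihoods 0.7 and 0.3 at some x, equal priors (assumed numbers):
    print(decide(0.7, 0.3, 0.5, 0.5))  # ratio 2.33 vs threshold 3.0 -> w2
    ```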


  • Optimal Decision Property

    If the likelihood ratio exceeds a threshold value that is independent of the input pattern x, we can take optimal actions.


  • Likelihood Ratio Test: An Example


  • Likelihood Ratio Test: Example 2


  • Example 2: Answer


  • Likelihood Ratio Test: Example 3

    Select the optimal decision where:

    Ω = {ω1, ω2}

    P(x | ω1) = N(2, 0.5)

    P(x | ω2) = N(1.5, 0.2)

    P(ω1) = 2/3

    P(ω2) = 1/3
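    A worked sketch of Example 3, under two stated assumptions: a zero-one loss, so the rule reduces to comparing prior-weighted likelihoods, and N(μ, σ) read as mean and standard deviation:

    ```python
    import math

    def gaussian(x, mu, sigma):
        """Normal density with mean mu and standard deviation sigma."""
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    P_w1, P_w2 = 2.0 / 3.0, 1.0 / 3.0

    def decide(x):
        # Zero-one loss: decide w1 iff P(x | w1) P(w1) > P(x | w2) P(w2).
        g1 = gaussian(x, 2.0, 0.5) * P_w1
        g2 = gaussian(x, 1.5, 0.2) * P_w2
        return "w1" if g1 > g2 else "w2"

    for x in (1.0, 1.5, 1.8, 2.5):
        print(x, "->", decide(x))  # w1, w2, w1, w1
    ```

    With these assumptions, ω2 wins only on a narrow band around x = 1.5, where its sharp, low-variance density overcomes the 2:1 prior in favor of ω1.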