Bayesian decision theory: A framework for making decisions when uncertainty exists. Lecture Notes for E. Alpaydın, Introduction to Machine Learning 2e, © 2010 The MIT Press (V1.0)


TRANSCRIPT

Page 1:

Bayesian decision theory: A framework for making decisions when uncertainty exists

Page 2:


Modeling data as random variables

Example: coin toss. Given sufficient knowledge, we could use Newton's laws of motion to calculate the result of each toss with minimal uncertainty.

In conjunction with our model, analysis of experimental trajectories will probably reveal why the coin is unfair if heads and tails do not occur with equal probability

Alternative: accept uncertainty about the result of the toss. Treat the result as a random variable X governed by P(X = x), and use P(X = x) to make a rational decision about the result of the next toss. Assume we are not interested in why the coin is unfair, if it is: "the reason is in the data."

Page 3:

Statistical Analysis of Coin-Toss Data

• Let heads = 1; tails = 0
• Boolean random variables obey Bernoulli statistics: P(x) = p_o^x (1 − p_o)^(1 − x), where p_o is the probability of heads
• Given a sample of N tosses, an unbiased estimator of p_o is the fraction of tosses that show heads
• Prediction of the next toss: heads if the estimate of p_o is > ½, tails otherwise
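A minimal sketch of this estimator and decision rule in Python; the sample of tosses below is made up for illustration:

```python
import numpy as np

# Hypothetical sample of N coin tosses (1 = heads, 0 = tails).
tosses = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])

# Unbiased estimate of p_o: the fraction of tosses that show heads.
p_hat = tosses.mean()

# Predict the next toss: heads if the estimate exceeds 1/2, tails otherwise.
prediction = "heads" if p_hat > 0.5 else "tails"
print(f"p_hat = {p_hat:.2f}, predict {prediction}")
```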


Page 4:

Review: Bayes' rule for binary classification

P(C|x) = p(x|C) P(C) / p(x)

posterior = (class likelihood × prior) / normalization, where the normalization (evidence) is p(x) = p(x|C=1) P(C=1) + p(x|C=0) P(C=0)

The prior P(C) is information relevant to classification that is independent of the attributes x. The class likelihood p(x|C) is the probability that a member of class C has attributes x. Assign a client with attributes x to class C if P(C|x) > 0.5.
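A small numeric sketch of this rule; the prior and class likelihoods below are made-up values, not from the notes:

```python
# Assumed prior and class likelihoods for one observed attribute value x.
P_C1 = 0.3            # prior P(C = 1)
p_x_C1 = 0.7          # class likelihood p(x | C = 1)
p_x_C0 = 0.2          # class likelihood p(x | C = 0)

# Normalization (evidence): p(x) = p(x|C=1)P(C=1) + p(x|C=0)P(C=0)
p_x = p_x_C1 * P_C1 + p_x_C0 * (1 - P_C1)

# Posterior by Bayes' rule, and the 0.5-threshold decision.
P_C1_x = p_x_C1 * P_C1 / p_x
print(P_C1_x, "-> assign to C1" if P_C1_x > 0.5 else "-> assign to C0")
```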

Page 5:

Review: Bayes’ Rule: K>2 Classes

P(C_i|x) = p(x|C_i) P(C_i) / p(x) = p(x|C_i) P(C_i) / Σ_{k=1..K} p(x|C_k) P(C_k)

with P(C_i) ≥ 0 and Σ_{i=1..K} P(C_i) = 1

Assign client with attributes x to C_i if P(C_i|x) = max_k P(C_k|x)
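A sketch of the K-class rule; the priors and likelihoods are made-up numbers for illustration:

```python
import numpy as np

# Assumed priors P(C_i) and likelihoods p(x | C_i) for K = 3 classes.
priors = np.array([0.5, 0.3, 0.2])
likelihoods = np.array([0.10, 0.40, 0.30])

# Posterior is proportional to likelihood times prior; divide by the evidence.
joint = likelihoods * priors
posteriors = joint / joint.sum()        # p(x) = sum_k p(x|C_k) P(C_k)

# Assign x to the class with the maximum posterior.
print(posteriors, "-> choose C%d" % (np.argmax(posteriors) + 1))
```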

Page 6:

Review: Estimating priors and class likelihoods from data

If we assume the members of class C_i are Gaussian distributed, the mean μ_i and covariance Σ_i parameterize the class likelihood:

p(x|C_i) = 1 / ((2π)^(d/2) |Σ_i|^(1/2)) exp( −½ (x − μ_i)^T Σ_i^(−1) (x − μ_i) )

The fraction of examples belonging to a class is an estimate of its prior.

With class labels r_i^t (r_i^t = 1 if x^t belongs to C_i, 0 otherwise), the estimators are

P̂(C_i) = Σ_t r_i^t / N
m_i = Σ_t r_i^t x^t / Σ_t r_i^t
S_i = Σ_t r_i^t (x^t − m_i)(x^t − m_i)^T / Σ_t r_i^t
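A sketch of these estimators in Python; the data and one-hot labels below are randomly generated placeholders, not from the notes:

```python
import numpy as np

# Made-up training data: X is (N, d) attribute vectors, r is (N, K) one-hot labels r_i^t.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
r = np.eye(3)[rng.integers(0, 3, size=100)]

N, K = r.shape
priors, means, covs = [], [], []
for i in range(K):
    ri = r[:, i]                                  # indicator of class i
    Ni = ri.sum()
    priors.append(Ni / N)                         # P_hat(C_i) = sum_t r_i^t / N
    m_i = (ri[:, None] * X).sum(axis=0) / Ni      # m_i = sum_t r_i^t x^t / sum_t r_i^t
    diff = X - m_i
    S_i = (ri[:, None, None] * np.einsum('tj,tk->tjk', diff, diff)).sum(axis=0) / Ni
    means.append(m_i)
    covs.append(S_i)                              # S_i = sum_t r_i^t (x^t - m_i)(x^t - m_i)^T / sum_t r_i^t
```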

Page 7:

Review: naïve Bayes classification

A simpler model results from assuming that the components of x are independent random variables. The covariance matrix is then diagonal, and p(x|C_i) is the product of the probabilities for each component of x:

p(x|C_i) = Π_{j=1..d} 1/(√(2π) s_ij) exp( −½ ((x_j − m_ij) / s_ij)² )

Each class is characterized by a set of means m_ij and variances s_ij² for the components of the attributes in that class.
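A sketch of this product-of-Gaussians likelihood; the per-class means and standard deviations below are assumed values for illustration:

```python
import numpy as np

def naive_bayes_likelihood(x, m_i, s_i):
    """p(x | C_i) as a product of one-dimensional Gaussians, one per attribute."""
    per_dim = np.exp(-0.5 * ((x - m_i) / s_i) ** 2) / (np.sqrt(2 * np.pi) * s_i)
    return per_dim.prod()

# Made-up attribute vector and class parameters (d = 2).
x = np.array([1.0, 2.0])
print(naive_bayes_likelihood(x, m_i=np.array([0.5, 1.5]), s_i=np.array([1.0, 0.8])))
```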

Page 8:

Minimizing risk given attributes x

• Actions: α_i is the action of assigning x to C_i, one of K classes
• Loss λ_ik occurs if we take α_i when x actually belongs to C_k
• Expected risk (Duda and Hart, 1973):

R(α_i|x) = Σ_{k=1..K} λ_ik P(C_k|x)

Choose α_i if R(α_i|x) = min_k R(α_k|x)
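A sketch of this rule with a made-up loss matrix and made-up posteriors:

```python
import numpy as np

# loss[i, k] = lambda_ik: loss of taking action alpha_i when x belongs to C_k (assumed values).
loss = np.array([[0.0, 1.0, 4.0],
                 [2.0, 0.0, 1.0],
                 [4.0, 2.0, 0.0]])
posteriors = np.array([0.2, 0.5, 0.3])   # P(C_k | x), assumed

# R(alpha_i | x) = sum_k lambda_ik P(C_k | x), computed for every action at once.
risks = loss @ posteriors

# Take the action with minimum expected risk.
print(risks, "-> choose alpha_%d" % (np.argmin(risks) + 1))
```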

Page 9:

Special case: correct decisions incur no loss and all errors have equal cost: the "0/1 loss function"

λ_ik = 0 if i = k, 1 if i ≠ k

R(α_i|x) = Σ_{k=1..K} λ_ik P(C_k|x) = Σ_{k≠i} P(C_k|x) = 1 − P(C_i|x)

For minimum risk, choose the most probable class.

Page 10:

Add rejection option: don’t assign a class

λ_ik = 0 if i = k; λ if i = K + 1 (the reject action); 1 otherwise, with 0 < λ < 1

Risk of no assignment (reject):
R(α_{K+1}|x) = Σ_{k=1..K} λ P(C_k|x) = λ

Risk of choosing C_i:
R(α_i|x) = Σ_{k≠i} P(C_k|x) = 1 − P(C_i|x)

Choose C_i if P(C_i|x) > P(C_k|x) for all k ≠ i and P(C_i|x) > 1 − λ; otherwise reject.
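A sketch of the reject rule under this 0/1/λ loss, with made-up posteriors:

```python
import numpy as np

def classify_with_reject(posteriors, lam):
    """Return the index of the chosen class, or None to reject."""
    i = int(np.argmax(posteriors))
    # Choosing C_i has risk 1 - P(C_i|x); rejecting has risk lambda.
    if posteriors[i] > 1.0 - lam:
        return i
    return None   # no class is probable enough, so reject

print(classify_with_reject(np.array([0.55, 0.30, 0.15]), lam=0.3))  # None (reject)
print(classify_with_reject(np.array([0.80, 0.15, 0.05]), lam=0.3))  # 0 (choose C_1)
```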

Page 11:

Example of risk minimization with λ_11 = λ_22 = 0, λ_12 = 10, and λ_21 = 1

Recall that loss λ_ik occurs if we take α_i when x belongs to C_k, R(α_i|x) = Σ_k λ_ik P(C_k|x), and we choose the action with minimum risk.

R(α_1|x) = λ_11 P(C_1|x) + λ_12 P(C_2|x) = 10 P(C_2|x)
R(α_2|x) = λ_21 P(C_1|x) + λ_22 P(C_2|x) = P(C_1|x)

Choose C_1 if R(α_1|x) < R(α_2|x), which is true if 10 P(C_2|x) < P(C_1|x), which becomes P(C_1|x) > 10/11 using the normalization of posteriors (P(C_1|x) + P(C_2|x) = 1).

The consequence of erroneously assigning an instance to C_1 is so bad that we choose C_1 only when we are virtually certain it is correct.
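A quick numerical check of this threshold, with two made-up posterior values:

```python
# lambda_12 = 10, lambda_21 = 1, lambda_11 = lambda_22 = 0
for p1 in (0.85, 0.95):            # assumed values of P(C1|x)
    p2 = 1.0 - p1
    R1 = 10 * p2                   # R(alpha_1|x) = 10 P(C2|x)
    R2 = p1                        # R(alpha_2|x) = P(C1|x)
    choice = "C1" if R1 < R2 else "C2"
    print(f"P(C1|x) = {p1}: R1 = {R1:.2f}, R2 = {R2:.2f} -> choose {choice}")
# Only P(C1|x) = 0.95 exceeds 10/11 ~ 0.909, so only then is C1 chosen.
```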

Page 12:

Page 13:


Bayes' classifier based on neighbors

Recall Bayes' rule for K classes:
P(C_i|x) = p(x|C_i) P(C_i) / Σ_k p(x|C_k) P(C_k), summing over all classes

Consider a data set with N examples, N_i of which belong to class C_i, so P(C_i) = N_i / N.

Given a new example x, draw a hyper-sphere of volume V in attribute space, centered on x and containing precisely K training examples, irrespective of their class.

Suppose this sphere contains n_i examples from class C_i. Then p(x|C_i) ≈ n_i / (N_i V), so p(x|C_i) P(C_i) = (n_i / (N_i V)) (N_i / N) = n_i / (N V).

Substituting into Bayes' rule (the sum is over classes, and the n_k sum to K, the number of neighbors):

P(C_i|x) = p(x|C_i) P(C_i) / Σ_k p(x|C_k) P(C_k) = (n_i / (N V)) / Σ_k (n_k / (N V)) = n_i / K

Page 14:


Using Bayes’ rule we find posteriors p(Ck|x) = nk/K

Assign x to the class with highest posterior, which is the class with the highest representation among the K training examples in the hyper-sphere centered on x

K = 1 (nearest-neighbor rule): assign x to the class of its nearest neighbor in the training data.

Bayes’ classifier based on neighbors

Page 15:


Bayes' classifier based on K nearest neighbors (KNN)

Usually, K is chosen from a range of values based on validation error.

In 2D, we can visualize the classification by applying KNN to every point in the (x1, x2) plane. As K increases, expect fewer islands and smoother decision boundaries.
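A sketch of the KNN posterior estimate P(C_i|x) ≈ n_i / K; the training data and query point below are made up:

```python
import numpy as np

def knn_posteriors(x, X_train, y_train, K=5):
    """Estimate P(C_i | x) as n_i / K over the K nearest training examples."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = y_train[np.argsort(dists)[:K]]
    classes = np.unique(y_train)
    return classes, np.array([(nearest == c).mean() for c in classes])

rng = np.random.default_rng(1)
X_train = rng.normal(size=(60, 2))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)   # synthetic labels
classes, post = knn_posteriors(np.array([0.5, 0.5]), X_train, y_train, K=5)
print("predict class", classes[np.argmax(post)], "with posteriors", post)
```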

Page 16:

Page 17:

Analysis of binary classification: beyond the confusion matrix

Page 18:

Quantities defined by binary confusion matrix

Let C1 be the positive class, C2 the negative class, and N the number of instances.
Error rate = (FP + FN) / N = 1 − accuracy
False positive rate = FP / (FP + TN) = fraction of C2 instances misclassified
True positive rate = TP / (TP + FN) = fraction of C1 instances correctly classified
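A sketch of these quantities on a made-up confusion matrix:

```python
# Assumed counts: C1 = positive class, 100 instances in total.
TP, FN, FP, TN = 40, 10, 5, 45
N = TP + FN + FP + TN

error_rate = (FP + FN) / N          # = 1 - accuracy
fp_rate = FP / (FP + TN)            # fraction of C2 (negative) instances misclassified
tp_rate = TP / (TP + FN)            # fraction of C1 (positive) instances correctly classified
print(error_rate, fp_rate, tp_rate) # 0.15 0.1 0.8
```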


Page 19:


Receiver operating characteristic (ROC) curve

Let C1 be the positive class, and let q be the threshold on P(C1|x) for assigning x to C1.

If q is near 1, assignments to C1 are rare but have a high probability of being correct, so both the FP-rate and the TP-rate are small. As q decreases, both the FP-rate and the TP-rate increase. For every value of q, (FP-rate, TP-rate) is a point on the ROC curve.

Page 20:


ROC curves

[Figure: example ROC curves; the diagonal corresponds to chance alone, and a curve just above it to marginal success]

Page 21:

Drawing ROC curves

Assume C1 is the positive class. Rank all examples by decreasing P(C1|x). In decreasing rank order, move up by 1/N1 for each positive example and move right by 1/N2 for each negative example, where N1 and N2 are the numbers of positive and negative examples.

If all examples are correctly classified, the ROC curve hugs the upper-left corner.

If P(C1|x) is not correlated with the class labels, the ROC curve will be close to the diagonal.
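A sketch of this construction; the scores P(C1|x) and labels below are made-up values:

```python
import numpy as np

scores = np.array([0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20])  # P(C1|x), assumed
labels = np.array([1, 1, 0, 1, 0, 1, 0, 0])                          # 1 = positive (C1)

order = np.argsort(-scores)                  # decreasing rank order
n_pos, n_neg = labels.sum(), (labels == 0).sum()

fpr, tpr = [0.0], [0.0]
for lab in labels[order]:
    # Move up by 1/N1 for a positive example, right by 1/N2 for a negative one.
    tpr.append(tpr[-1] + (lab == 1) / n_pos)
    fpr.append(fpr[-1] + (lab == 0) / n_neg)
print(list(zip(fpr, tpr)))                   # (FP-rate, TP-rate) points of the ROC curve
```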

Page 22:

Performance with the reduced attribute set is slightly improved: the number of misclassified malignant cases decreased by 2.