On Discriminative vs. Generative Classifiers: Naïve Bayes
Presenter: Seung Hwan, Bae


TRANSCRIPT

Page 1:

On Discriminative vs. Generative Classifiers: Naïve Bayes

Presenter: Seung Hwan, Bae

Page 2:

Andrew Y. Ng and Michael I. Jordan, Neural Information Processing Systems (NIPS), 2001
(slides adapted from Ke Chen, University of Manchester, and YangQiu Song, MSRA)
Total citations: 831

Page 3:

Machine Learning

Page 4:

Generative vs. Discriminative Classifiers

Training classifiers involves estimating $f: X \to Y$, or $P(Y \mid X)$
– X: training data, Y: labels

Discriminative classifiers:
– Assume some functional form for $P(Y \mid X)$
– Estimate the parameters of $P(Y \mid X)$ directly from the training data

Generative classifiers (also called 'informative' by Rubinstein & Hastie):
– Assume some functional form for $P(X \mid Y)$, $P(Y)$
– Estimate the parameters of $P(X \mid Y)$, $P(Y)$ directly from the training data
– Use Bayes rule to calculate $P(Y \mid X)$

Page 5:

Bayes Formula
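The slide's content is the rule its title names (restated on Page 9):

$$P(C \mid \mathbf{X}) = \frac{P(\mathbf{X} \mid C)\,P(C)}{P(\mathbf{X})}$$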

Page 6:

Generative Model

• Color
• Size
• Texture
• Weight
• …

Page 7:

Discriminative Model

Logistic Regression

• Color
• Size
• Texture
• Weight
• …

Page 8:

Comparison

Generative models
– Assume some functional form for $P(X \mid Y)$, $P(Y)$
– Estimate the parameters of $P(X \mid Y)$, $P(Y)$ directly from the training data
– Use Bayes rule to calculate $P(Y \mid X = x)$

Discriminative models
– Directly assume some functional form for $P(Y \mid X)$
– Estimate the parameters of $P(Y \mid X)$ directly from the training data

[Diagrams: Naïve Bayes (generative), with arrows from Y to X1 and X2; Logistic Regression (discriminative), with arrows from X1 and X2 to Y.]
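A minimal sketch of the contrast, assuming scikit-learn is available (the dataset and variable names are illustrative, not from the slides); both classifiers expose the same interface, but GaussianNB estimates $P(X \mid Y)$ and $P(Y)$ while LogisticRegression fits $P(Y \mid X)$ directly:

    # Illustrative sketch: generative vs. discriminative fit on the same data.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression  # discriminative: P(Y|X)
    from sklearn.naive_bayes import GaussianNB           # generative: P(X|Y), P(Y)

    X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                               n_redundant=0, random_state=0)

    gen = GaussianNB().fit(X, y)           # learns class priors + per-class Gaussians
    disc = LogisticRegression().fit(X, y)  # learns the weights of P(Y|X) directly

    # Both produce posterior class probabilities, via different routes.
    print(gen.predict_proba(X[:1]), disc.predict_proba(X[:1]))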

Page 9:

Probability Basics

• Prior, conditional and joint probability for random variables
– Prior probability: $P(X)$
– Conditional probability: $P(X_1 \mid X_2)$, $P(X_2 \mid X_1)$
– Joint probability: $\mathbf{X} = (X_1, X_2)$, $P(\mathbf{X}) = P(X_1, X_2)$
– Relationship: $P(X_1, X_2) = P(X_2 \mid X_1)\,P(X_1) = P(X_1 \mid X_2)\,P(X_2)$
– Independence: $P(X_2 \mid X_1) = P(X_2)$, $P(X_1 \mid X_2) = P(X_1)$, $P(X_1, X_2) = P(X_1)\,P(X_2)$

• Bayesian Rule

$$P(C \mid \mathbf{X}) = \frac{P(\mathbf{X} \mid C)\,P(C)}{P(\mathbf{X})}, \qquad \text{Posterior} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Evidence}}$$
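A small worked instance of the rule with made-up numbers for two classes (the evidence in the denominator expands over the classes):

$$P(c_1 \mid \mathbf{x}) = \frac{P(\mathbf{x} \mid c_1)\,P(c_1)}{P(\mathbf{x} \mid c_1)\,P(c_1) + P(\mathbf{x} \mid c_2)\,P(c_2)} = \frac{0.2 \times 0.3}{0.2 \times 0.3 + 0.1 \times 0.7} \approx 0.46$$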

Page 10:

Probabilistic Classification

Establishing a probabilistic model for classification
– Discriminative model: $P(C \mid \mathbf{X})$, where $C = c_1, \dots, c_L$ and $\mathbf{X} = (X_1, \dots, X_n)$

[Diagram: a discriminative probabilistic classifier takes an input $\mathbf{x} = (x_1, x_2, \dots, x_n)$ and outputs the posteriors $P(c_1 \mid \mathbf{x}), P(c_2 \mid \mathbf{x}), \dots, P(c_L \mid \mathbf{x})$.]

Page 11:

Probabilistic Classification

Establishing a probabilistic model for classification (cont.)
– Generative model: $P(\mathbf{X} \mid C)$, where $C = c_1, \dots, c_L$ and $\mathbf{X} = (X_1, \dots, X_n)$

[Diagram: one generative probabilistic model per class; the model for class $i$ takes $\mathbf{x} = (x_1, x_2, \dots, x_n)$ and outputs the likelihood $P(\mathbf{x} \mid c_i)$, for $i = 1, 2, \dots, L$.]

Page 12:

Probabilistic Classification

MAP classification rule
– MAP: Maximum A Posteriori
– Assign $\mathbf{x}$ to $c^*$ if

$$P(C = c^* \mid \mathbf{X} = \mathbf{x}) > P(C = c \mid \mathbf{X} = \mathbf{x}), \quad c \ne c^*, \; c = c_1, \dots, c_L$$

Generative classification with the MAP rule
– Apply the Bayesian rule to convert the likelihoods into posterior probabilities:

$$P(C = c_i \mid \mathbf{X} = \mathbf{x}) = \frac{P(\mathbf{X} = \mathbf{x} \mid C = c_i)\,P(C = c_i)}{P(\mathbf{X} = \mathbf{x})} \propto P(\mathbf{X} = \mathbf{x} \mid C = c_i)\,P(C = c_i), \quad i = 1, 2, \dots, L$$

– Then apply the MAP rule
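A minimal sketch of the generative MAP rule (function names are illustrative, not from the slides); the evidence $P(\mathbf{X} = \mathbf{x})$ can be dropped because it is the same for every class:

    # Illustrative sketch of MAP classification with a generative model.
    def map_classify(x, classes, likelihood, prior):
        """likelihood(x, c) -> P(X=x | C=c); prior(c) -> P(C=c)."""
        # P(X=x) is omitted: it is constant across classes, so the argmax is unchanged.
        return max(classes, key=lambda c: likelihood(x, c) * prior(c))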

Page 13:

Naïve Bayes

Bayes classification

$$P(C \mid \mathbf{X}) \propto P(\mathbf{X} \mid C)\,P(C) = P(X_1, \dots, X_n \mid C)\,P(C)$$

– Difficulty: learning the joint probability $P(X_1, \dots, X_n \mid C)$
– If the number of features $n$ is large, or a feature can take on a large number of values, then basing such a model on probability tables is infeasible. (For example, $n$ binary features already require a table of $2^n$ entries per class.)

Page 14:

Naïve Bayes

Naïve Bayes classification
– Assume that all input attributes are conditionally independent!

$$P(X_1, X_2, \dots, X_n \mid C) = P(X_1 \mid X_2, \dots, X_n; C)\,P(X_2, \dots, X_n \mid C) = P(X_1 \mid C)\,P(X_2, \dots, X_n \mid C) = P(X_1 \mid C)\,P(X_2 \mid C) \cdots P(X_n \mid C)$$

– MAP classification rule: for $\mathbf{x} = (x_1, x_2, \dots, x_n)$, assign $c^*$ if

$$[P(x_1 \mid c^*) \cdots P(x_n \mid c^*)]\,P(c^*) > [P(x_1 \mid c) \cdots P(x_n \mid c)]\,P(c), \quad c \ne c^*, \; c = c_1, \dots, c_L$$
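A minimal sketch of the factored naïve Bayes score for a single class (illustrative, not from the slides); summing logs avoids numeric underflow when many small conditionals are multiplied:

    import math

    # Illustrative sketch: log P(c) + sum_j log P(x_j | c).
    def nb_log_score(x, c, cond_prob, prior):
        """cond_prob(j, v, c) -> P(X_j = v | C = c); prior(c) -> P(C = c)."""
        return math.log(prior(c)) + sum(math.log(cond_prob(j, v, c))
                                        for j, v in enumerate(x))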

Page 15:

Naïve Bayes

Naïve Bayes algorithm (for discrete input attributes)
– Learning phase: given a training set S,

For each target value $c_i$ ($i = 1, \dots, L$):
  $\hat{P}(C = c_i) \leftarrow$ estimate $P(C = c_i)$ with the examples in S;
For every attribute value $x_{jk}$ of each attribute $X_j$ ($j = 1, \dots, n$; $k = 1, \dots, N_j$):
  $\hat{P}(X_j = x_{jk} \mid C = c_i) \leftarrow$ estimate $P(X_j = x_{jk} \mid C = c_i)$ with the examples in S;

Output: conditional probability tables; $N_j \times L$ elements for each $X_j$

– Test phase: given an unknown instance $\mathbf{X}' = (a_1, \dots, a_n)$, look up the tables to assign the label $c^*$ to $\mathbf{X}'$ if

$$[\hat{P}(a_1 \mid c^*) \cdots \hat{P}(a_n \mid c^*)]\,\hat{P}(c^*) > [\hat{P}(a_1 \mid c) \cdots \hat{P}(a_n \mid c)]\,\hat{P}(c), \quad c \ne c^*, \; c = c_1, \dots, c_L$$
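A minimal sketch of both phases for discrete attributes (illustrative, not from the slides); probabilities are estimated as relative frequencies, exactly as in the Play Tennis tables on the next pages:

    from collections import Counter, defaultdict

    def train(examples):
        """examples: list of (attribute_tuple, label); returns priors and CPTs."""
        class_counts = Counter(label for _, label in examples)
        cond_counts = defaultdict(Counter)     # per-class (attribute, value) counts
        for attrs, label in examples:
            for j, v in enumerate(attrs):
                cond_counts[label][(j, v)] += 1
        total = len(examples)
        priors = {c: k / total for c, k in class_counts.items()}
        cpts = {c: {jv: k / class_counts[c] for jv, k in counts.items()}
                for c, counts in cond_counts.items()}
        return priors, cpts

    def classify(attrs, priors, cpts):
        def score(c):                          # unnormalized posterior
            p = priors[c]
            for j, v in enumerate(attrs):
                p *= cpts[c].get((j, v), 0.0)  # unseen value -> 0 (see Page 20)
            return p
        return max(priors, key=score)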

Page 16:

Example

• Example: Play Tennis

Page 17:

Example

Learning phase

Outlook      Play=Yes  Play=No
Sunny        2/9       3/5
Overcast     4/9       0/5
Rain         3/9       2/5

Temperature  Play=Yes  Play=No
Hot          2/9       2/5
Mild         4/9       2/5
Cool         3/9       1/5

Humidity     Play=Yes  Play=No
High         3/9       4/5
Normal       6/9       1/5

Wind         Play=Yes  Play=No
Strong       3/9       3/5
Weak         6/9       2/5

P(Play=Yes) = 9/14
P(Play=No) = 5/14

Page 18:

Example

Test phase
– Given a new instance x' = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
– Look up tables:

P(Outlook=Sunny | Play=Yes) = 2/9
P(Temperature=Cool | Play=Yes) = 3/9
P(Humidity=High | Play=Yes) = 3/9
P(Wind=Strong | Play=Yes) = 3/9
P(Play=Yes) = 9/14

P(Outlook=Sunny | Play=No) = 3/5
P(Temperature=Cool | Play=No) = 1/5
P(Humidity=High | Play=No) = 4/5
P(Wind=Strong | Play=No) = 3/5
P(Play=No) = 5/14

– MAP rule:

P(Yes | x') ∝ [P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)] P(Play=Yes) = 0.0053
P(No | x') ∝ [P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No)] P(Play=No) = 0.0206

Since P(Yes | x') < P(No | x'), we label x' as "No".
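A quick arithmetic check of the two unnormalized posteriors above:

    # The two scores from the looked-up table entries.
    p_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)
    p_no = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)
    print(round(p_yes, 4), round(p_no, 4))  # 0.0053 0.0206 -> predict "No"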


Page 20:

Relevant Issues

Violation of the independence assumption
– For many real-world tasks, $P(X_1, \dots, X_n \mid C) \ne P(X_1 \mid C) \cdots P(X_n \mid C)$
– Nevertheless, naïve Bayes works surprisingly well anyway!

Zero conditional probability problem
– If no example contains the attribute value $X_j = a_{jk}$, then $\hat{P}(X_j = a_{jk} \mid C = c_i) = 0$
– In this circumstance, during test, $\hat{P}(x_1 \mid c_i) \cdots \hat{P}(a_{jk} \mid c_i) \cdots \hat{P}(x_n \mid c_i) = 0$
– As a remedy, conditional probabilities are estimated with the m-estimate (a code sketch follows below):

$$\hat{P}(X_j = a_{jk} \mid C = c_i) = \frac{n_c + m p}{n + m}$$

where
$n$: number of training examples for which $C = c_i$
$n_c$: number of training examples for which $X_j = a_{jk}$ and $C = c_i$
$p$: prior estimate (usually $p = 1/t$ for $t$ possible values of $X_j$)
$m$: weight given to the prior (number of "virtual" examples, $m \ge 1$)
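A minimal sketch of the m-estimate above (illustrative, not from the slides); with $m = t$ and $p = 1/t$ it reduces to Laplace (add-one) smoothing:

    def m_estimate(n_c, n, t, m=1.0):
        """n_c: count of X_j = a_jk within class c_i; n: count of class c_i;
        t: number of possible values of X_j; m: weight of the virtual examples."""
        p = 1.0 / t                     # uniform prior over the t attribute values
        return (n_c + m * p) / (n + m)

    # Outlook=Overcast with Play=No had count 0/5 in the Play Tennis tables;
    # smoothing keeps its probability small but nonzero.
    print(m_estimate(n_c=0, n=5, t=3))  # ~0.0556 instead of 0.0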

Page 21:

Relevant Issues

Continuous-valued input attributes
– An attribute can take on infinitely many values, so probability tables no longer apply
– The conditional probability is instead modeled with the normal distribution:

$$\hat{P}(X_j \mid C = c_i) = \frac{1}{\sqrt{2\pi}\,\sigma_{ji}} \exp\!\left(-\frac{(X_j - \mu_{ji})^2}{2\sigma_{ji}^2}\right)$$

$\mu_{ji}$: mean (average) of the attribute values $X_j$ of examples for which $C = c_i$
$\sigma_{ji}$: standard deviation of the attribute values $X_j$ of examples for which $C = c_i$

– Learning phase: for $\mathbf{X} = (X_1, \dots, X_n)$, $C = c_1, \dots, c_L$
Output: $n \times L$ normal distributions and $P(C = c_i)$, $i = 1, \dots, L$
– Test phase: for $\mathbf{X}' = (X'_1, \dots, X'_n)$
• Calculate the conditional probabilities with all the normal distributions
• Apply the MAP rule to make a decision
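A minimal sketch of the per-attribute Gaussian conditional above (illustrative, not from the slides):

    import math

    def gaussian_likelihood(x, mu, sigma):
        """P_hat(X_j = x | C = c_i) with the class-conditional mean mu and std sigma."""
        return (math.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
                / (math.sqrt(2 * math.pi) * sigma))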

Page 22:

Advantages of Naïve Bayes

Naïve Bayes is built on the independence assumption
– Only a small amount of training data is needed to estimate the parameters (the means and variances of the variables)
– Only the variances of the variables for each class need to be determined, not the entire covariance matrix
– Testing is straightforward: just look up tables or calculate conditional probabilities with the normal distributions

Page 23:

Conclusion

Performance is competitive with most state-of-the-art classifiers, even when the independence assumption is violated

Many successful applications, e.g., spam mail filtering

A good candidate for a base learner in ensemble learning

Apart from classification, naïve Bayes can do more…