Sep 10th, 2001
Copyright © 2001, Andrew W. Moore

Learning Gaussian Bayes Classifiers

Andrew W. Moore
Associate Professor
School of Computer Science
Carnegie Mellon University
www.cs.cmu.edu/~awm
[email protected]
412-268-7599

Note to other teachers and users of these slides. Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew’s tutorials: http://www.cs.cmu.edu/~awm/tutorials . Comments and corrections gratefully received.


Maximum Likelihood learning of Gaussians for Classification

• Why we should care
• 3 seconds to teach you a new learning algorithm
• What if there are 10,000 dimensions?
• What if there are categorical inputs?
• Examples “out the wazoo”

Why we should care

• One of the original “Data Mining” algorithms
• Very simple and effective
• Demonstrates the usefulness of our earlier groundwork

Where we were at the end of the MLE lecture…

[Diagram: three tasks — Inputs → Classifier → predict category; Inputs → Density Estimator → probability; Inputs → Regressor → predict real number — arranged against three input types. Categorical inputs only: Joint BC, Naïve BC, Joint DE, Naïve DE. Mixed Real / Cat okay: Dec Tree. Real-valued inputs only: Gauss DE.]

This lecture…

[Diagram: the same task / input-type chart, with Gauss BC now added under real-valued inputs alongside Gauss DE.]

Road Map

[Road-map diagram linking the topics so far: Probability, PDFs, Gaussians, MLE of Gaussians, MLE, Density Estimation, Bayes Classifiers, Decision Trees.]

Road Map

[The same road-map diagram, with Gaussian Bayes Classifiers now added.]

Gaussian Bayes Classifier Assumption

• The i’th record in the database is created using the following algorithm:

1. Generate the output (the “class”) by drawing y_i ~ Multinomial(p_1, p_2, …, p_Ny)
2. Generate the inputs from a Gaussian PDF that depends on the value of y_i: x_i ~ N(μ_{y_i}, Σ_{y_i})

Test your understanding. Given Ny classes and m input attributes, how many distinct scalar parameters need to be estimated?

MLE Gaussian Bayes Classifier

Let DB_i = subset of database DB in which the output class is y = i. The maximum likelihood estimates are:

p_i^mle = |DB_i| / |DB|

(μ_i^mle, Σ_i^mle) = MLE Gaussian for DB_i:

μ_i^mle = (1 / |DB_i|) ∑_{x_k ∈ DB_i} x_k

Σ_i^mle = (1 / |DB_i|) ∑_{x_k ∈ DB_i} (x_k − μ_i^mle)(x_k − μ_i^mle)^T
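The MLE recipe above — class priors from counts, then a separate Gaussian fit to each DB_i — takes only a few lines of numpy. This is my own illustrative sketch, not code from the slides; `fit_gbc` and its dict-based return layout are invented names:

```python
import numpy as np

def fit_gbc(X, y):
    """MLE for a Gaussian Bayes classifier.

    For each class i: p_i = |DB_i| / |DB|, mu_i = mean of X over DB_i,
    Sigma_i = MLE covariance over DB_i (divides by |DB_i|, no Bessel correction).
    """
    params = {}
    n = len(y)
    for i in np.unique(y):
        Xi = X[y == i]                            # the subset DB_i
        mu = Xi.mean(axis=0)
        centered = Xi - mu
        Sigma = centered.T @ centered / len(Xi)   # MLE (biased) covariance
        params[i] = {"p": len(Xi) / n, "mu": mu, "Sigma": Sigma}
    return params
```

Each class gets its own full covariance, which is exactly why the parameter count grows quadratically in the number of attributes.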

Gaussian Bayes Classification

P(y = i | x) = p(x | y = i) P(y = i) / p(x)

Gaussian Bayes Classification

P(y = i | x) = p(x | y = i) P(y = i) / p(x)

             = [ 1 / ((2π)^{m/2} ||Σ_i||^{1/2}) ] exp( −½ (x − μ_i)^T Σ_i^{−1} (x − μ_i) ) · p_i / p(x)

How do we deal with that?
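One standard answer to the question on this slide: the denominator p(x) is the same for every class, so for classification it can simply be dropped and the predicted class is argmax_i p(x | y = i) P(y = i). A sketch (my own code, not the slides’; it assumes per-class parameters stored as dicts with keys "p", "mu", "Sigma", an invented convention), working in log space for numerical safety:

```python
import numpy as np

def log_gauss(x, mu, Sigma):
    """Log of the multivariate Gaussian density N(x; mu, Sigma)."""
    m = len(mu)
    d = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (m * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(Sigma, d))

def predict_gbc(x, params):
    """argmax_i log p(x | y=i) + log P(y=i); the p(x) term is class-independent."""
    scores = {i: np.log(c["p"]) + log_gauss(x, c["mu"], c["Sigma"])
              for i, c in params.items()}
    return max(scores, key=scores.get)
```

If calibrated posteriors are needed, exponentiate the scores and normalize so they sum to one — that recovers the p(x) division.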

Here is a dataset

48,000 records, 16 attributes [Kohavi 1995]

age  employment        education     edunum  marital                …  job                relation       race         gender  hours_worked  country         wealth
39   State_gov         Bachelors     13      Never_married          …  Adm_clerical       Not_in_family  White        Male    40            United_States   poor
51   Self_emp_not_inc  Bachelors     13      Married                …  Exec_managerial    Husband        White        Male    13            United_States   poor
39   Private           HS_grad       9       Divorced               …  Handlers_cleaners  Not_in_family  White        Male    40            United_States   poor
54   Private           11th          7       Married                …  Handlers_cleaners  Husband        Black        Male    40            United_States   poor
28   Private           Bachelors     13      Married                …  Prof_specialty     Wife           Black        Female  40            Cuba            poor
38   Private           Masters       14      Married                …  Exec_managerial    Wife           White        Female  40            United_States   poor
50   Private           9th           5       Married_spouse_absent  …  Other_service      Not_in_family  Black        Female  16            Jamaica         poor
52   Self_emp_not_inc  HS_grad       9       Married                …  Exec_managerial    Husband        White        Male    45            United_States   rich
31   Private           Masters       14      Never_married          …  Prof_specialty     Not_in_family  White        Female  50            United_States   rich
42   Private           Bachelors     13      Married                …  Exec_managerial    Husband        White        Male    40            United_States   rich
37   Private           Some_college  10      Married                …  Exec_managerial    Husband        Black        Male    80            United_States   rich
30   State_gov         Bachelors     13      Married                …  Prof_specialty     Husband        Asian        Male    40            India           rich
24   Private           Bachelors     13      Never_married          …  Adm_clerical       Own_child      White        Female  30            United_States   poor
33   Private           Assoc_acdm    12      Never_married          …  Sales              Not_in_family  Black        Male    50            United_States   poor
41   Private           Assoc_voc     11      Married                …  Craft_repair       Husband        Asian        Male    40            *MissingValue*  rich
34   Private           7th_8th       4       Married                …  Transport_moving   Husband        Amer_Indian  Male    45            Mexico          poor
26   Self_emp_not_inc  HS_grad       9       Never_married          …  Farming_fishing    Own_child      White        Male    35            United_States   poor
33   Private           HS_grad       9       Never_married          …  Machine_op_inspct  Unmarried      White        Male    40            United_States   poor
38   Private           11th          7       Married                …  Sales              Husband        White        Male    50            United_States   poor
44   Self_emp_not_inc  Masters       14      Divorced               …  Exec_managerial    Unmarried      White        Female  45            United_States   rich
41   Private           Doctorate     16      Married                …  Prof_specialty     Husband        White        Male    60            United_States   rich
 :    :                 :             :       :                     …   :                  :              :            :       :             :               :

Predicting wealth from age [figure]

Wealth from hours worked [figure]

Wealth from years of education [figure]

age, hours → wealth [figures]

Having 2 inputs instead of one helps in two ways:
1. Combining evidence from two 1-d Gaussians
2. Off-diagonal covariance distinguishes class “shape”

age, edunum → wealth [figures]

hours, edunum → wealth [figures]

Accuracy [figure]

An “MPG” example [figures]

An “MPG” example

Things to note:
• Class boundaries can be weird shapes (hyperconic sections)
• Class regions can be non-simply-connected
• But it’s impossible to model arbitrarily weirdly shaped regions
• Test your understanding: with one input, must classes be simply connected?

Overfitting dangers

• Problem with “Joint” Bayes classifier: #parameters exponential in #dimensions. This means we just memorize the training data, and can overfit.
• Problemette with Gaussian Bayes classifier: #parameters quadratic in #dimensions. With 10,000 dimensions and only 1,000 datapoints we could overfit.

Question: Any suggested solutions?

General: O(m²) parameters

Σ = [ σ_1²   σ_12   …   σ_1m ]
    [ σ_12   σ_2²   …   σ_2m ]
    [  ⋮             ⋱    ⋮  ]
    [ σ_1m   σ_2m   …   σ_m² ]

Aligned: O(m) parameters

Σ = [ σ_1²   0      0     …   0    ]
    [ 0      σ_2²   0     …   0    ]
    [ 0      0      σ_3²  …   0    ]
    [  ⋮                  ⋱    ⋮   ]
    [ 0      0      0     …   σ_m² ]

Spherical: O(1) cov parameters

Σ = [ σ²   0    0   …   0  ]
    [ 0    σ²   0   …   0  ]
    [ 0    0    σ²  …   0  ]
    [  ⋮            ⋱   ⋮  ]
    [ 0    0    0   …   σ² ]
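The three covariance shapes differ only in how many distinct scalars must be estimated per class: a full symmetric matrix has m(m+1)/2, a diagonal one has m, a spherical one has 1. A tiny counting helper makes the gap at m = 10,000 concrete (my own hypothetical function; the mode names follow the slides):

```python
def cov_params(m, shape):
    """Number of distinct covariance parameters for an m-dimensional Gaussian."""
    if shape == "general":      # full symmetric matrix: O(m^2)
        return m * (m + 1) // 2
    if shape == "aligned":      # diagonal (axis-aligned): O(m)
        return m
    if shape == "spherical":    # one shared variance: O(1)
        return 1
    raise ValueError(shape)
```

At m = 10,000 the general form needs over 50 million covariance parameters per class, which is why the aligned and spherical restrictions matter when data is scarce.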

BCs that have both real and categorical inputs?

[Diagram: the task / input-type chart again. The classifier cell for “Mixed Real / Cat okay” holds Dec Tree and a gap marked “BC Here???”]

Easy! Guess how?

[Diagram: the chart filled in — Mixed Real / Cat classifiers: Dec Tree, Gauss/Joint BC, Gauss Naïve BC; Mixed Real / Cat density estimators: Gauss/Joint DE, Gauss Naïve DE. Joint BC, Naïve BC, Joint DE, Naïve DE stay under categorical-only; Gauss DE and Gauss BC stay under real-valued-only.]

Mixed Categorical / Real Density Estimation

• Write x = (u, v) = (u_1, u_2, …, u_q, v_1, v_2, …, v_{m−q})
  where u_1 … u_q are real-valued and v_1 … v_{m−q} are categorical.

P(x | M) = P(u, v | M)   (where M is any Density Estimation Model)

Not sure which tasty DE to enjoy? Try our…

Joint / Gauss DE Combo

P(u, v | M) = P(u | v, M) P(v | M)

where P(u | v, M) is a Gaussian with parameters depending on v, and P(v | M) is a big (m−q)-dimensional lookup table.

MLE learning of the Joint / Gauss DE Combo

P(u, v | M) = P(u | v, M) P(v | M)

u | v, M ~ N(μ_v, Σ_v),   P(v | M) = q_v

μ_v = mean of u among records matching v = (1 / R_v) ∑_{k s.t. v_k = v} u_k

Σ_v = covariance of u among records matching v = (1 / R_v) ∑_{k s.t. v_k = v} (u_k − μ_v)(u_k − μ_v)^T

q_v = fraction of records that match v = R_v / R

where R_v = # records that match v and R = total # records.
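In code, this MLE is just a group-by on the categorical tuple v followed by a Gaussian fit per group. A sketch under my own conventions (`fit_joint_gauss_de` and its return layout are invented; not code from the slides):

```python
import numpy as np
from collections import defaultdict

def fit_joint_gauss_de(U, V):
    """MLE for the Joint/Gauss DE combo: one Gaussian per categorical tuple v.

    U: (R, q) array of real-valued parts; V: length-R list of categorical tuples.
    Returns {v: (q_v, mu_v, Sigma_v)} with q_v = R_v / R.
    """
    groups = defaultdict(list)
    for u, v in zip(U, V):
        groups[tuple(v)].append(u)      # bucket records by their v
    R = len(U)
    model = {}
    for v, us in groups.items():
        us = np.array(us)
        mu = us.mean(axis=0)
        centered = us - mu
        model[v] = (len(us) / R, mu, centered.T @ centered / len(us))
    return model
```

The defaultdict plays the role of the “big (m−q)-dimensional lookup table”: only tuples v that actually occur in the data get an entry.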

Gender and Hours Worked* [figure]

*As with all the results from the UCI “adult census” dataset, we can’t draw any real-world conclusions since it’s such a non-real-world sample.

Joint / Gauss DE Combo — what we just did [figure]

Joint / Gauss BC Combo — what we do next [figure]

Joint / Gauss BC Combo

P(Y = i | u, v) = p(u, v | Y = i) P(Y = i) / p(u, v)
                = p(u | v, M_i) P(v | M_i) p_i / p(u, v)
                = N(u; μ_{i,v}, Σ_{i,v}) q_{i,v} p_i / p(u, v)

(Rather so-so notation: N(u; μ_{i,v}, Σ_{i,v}) means “the Gaussian with mean μ_{i,v} and covariance Σ_{i,v}, evaluated at u”.)

μ_{i,v} = mean of u among records matching v and in which y = i
Σ_{i,v} = covariance of u among records matching v and in which y = i
q_{i,v} = fraction of “y = i” records that match v
p_i = fraction of records in which y = i

Gender, Hours → Wealth [figures]

Joint / Gauss DE Combo and Joint / Gauss BC Combo: the downside

• (Yawn… we’ve done this before…) More than a few categorical attributes blah blah blah massive table blah blah lots of parameters blah blah just memorize training data blah blah blah do worse on future data blah blah need to be more conservative blah.

Naïve/Gauss combo for Density Estimation

P(u, v | M) = ∏_{j=1}^{q} p(u_j | M) · ∏_{j=1}^{m−q} P(v_j | M)

u_j | M ~ N(μ_j, σ_j²)   (real attributes)
v_j | M ~ Multinomial[q_j1, q_j2, …, q_jNj]   (categorical attributes)

How many parameters?

Copyright © 2001, Andrew W. Moore Gaussian Bayes Classifiers: Slide 56

Naïve/Gauss combo for Density Estimation

qm

jj

q

jj MvPMupMp

11

)|()|()|,( vu

),(~| 2jjj NMu ],...,,[~| 21lMultinomia

jjNjjj qqqMv

Real Categorical

k

kjj uR

1

R

hvq jjh

in which records of #

22 )(1

kjkjj u

R
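Because every attribute is modeled independently, fitting reduces to per-column statistics: a 1-d Gaussian for each real attribute, a frequency table for each categorical one. A sketch (my own code and naming, not the slides’):

```python
import numpy as np
from collections import Counter

def fit_naive_gauss_de(U, V):
    """Naive/Gauss DE: independent MLE fit per attribute.

    U: R x q real-valued columns; V: R rows of (m-q) categorical values.
    Returns ([(mu_j, sigma2_j), ...], [{h: q_jh}, ...]).
    """
    U = np.asarray(U, dtype=float)
    R = len(U)
    gauss = [(col.mean(), ((col - col.mean()) ** 2).mean()) for col in U.T]
    multi = []
    for j in range(len(V[0])):
        counts = Counter(row[j] for row in V)          # frequency of each level h
        multi.append({h: c / R for h, c in counts.items()})
    return gauss, multi
```

The variance uses the 1/R MLE divisor, matching the formulas above rather than the unbiased 1/(R−1) estimator.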

Naïve/Gauss DE Example [figures]

Naïve / Gauss BC

P(Y = i | u, v) = p(u, v | Y = i) P(Y = i) / p(u, v)
                = [ ∏_{j=1}^{q} p(u_j | Y = i) · ∏_{j=1}^{m−q} P(v_j | Y = i) ] P(Y = i) / p(u, v)
                = [ ∏_{j=1}^{q} N(u_j; μ_ij, σ_ij²) · ∏_{j=1}^{m−q} q_ij[v_j] ] p_i / p(u, v)

μ_ij = mean of u_j among records in which y = i
σ_ij² = variance of u_j among records in which y = i
q_ij[h] = fraction of “y = i” records in which v_j = h
p_i = fraction of records in which y = i
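Prediction again drops the class-independent p(u, v) and takes the argmax; multiplying many small terms is numerically fragile, so sums of logs are the usual implementation. A sketch under an assumed model layout, model[i] = (p_i, [(μ_ij, σ_ij²), …], [{h: q_ij[h]}, …]) — an invented convention, not the slides’ code:

```python
import math

def predict_naive_gauss_bc(u, v, model):
    """argmax_i log p_i + sum_j log N(u_j; mu_ij, var_ij) + sum_j log q_ij[v_j]."""
    def log_n(x, mu, var):
        return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
    best, best_ll = None, -math.inf
    for i, (p_i, gauss, multi) in model.items():
        ll = math.log(p_i)
        ll += sum(log_n(x, mu, var) for x, (mu, var) in zip(u, gauss))
        try:
            ll += sum(math.log(q[h]) for h, q in zip(v, multi))
        except (KeyError, ValueError):
            ll = -math.inf      # level never seen for this class: zero probability
        if ll > best_ll:
            best, best_ll = i, ll
    return best
```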

Gauss / Naïve BC Example [figures]

Learn Wealth from 15 attributes [figure]

Learn Wealth from 15 attributes — same data, except all real values discretized to 3 levels [figure]

Learn Race from 15 attributes [figure]

What you should know

• A lot of this should have just been a corollary of what you already knew
• Turning Gaussian DEs into Gaussian BCs
• Mixing categorical and real-valued attributes

Questions to Ponder

• Suppose you wanted to create an example dataset where a BC involving Gaussians crushed decision trees like a bug. What would you do?
• Could you combine Decision Trees and Bayes Classifiers? How? (Maybe there is more than one possible way.)