# Discriminative and Generative Classifiers

Tom Mitchell. Statistical Approaches to Learning and Discovery, 10-702 and 15-802. March 19, 2003.

Lecture based on "On Discriminative vs. Generative classifiers: A comparison of logistic regression and naïve Bayes," A. Ng and M. Jordan, NIPS 2002.

## Lecture Outline

- Generative and discriminative classifiers
- Asymptotic comparison (as # examples grows)
  - when model correct
  - when model incorrect
- Non-asymptotic analysis
  - convergence of parameter estimates
  - convergence of expected error
- Experimental results

## Generative vs. Discriminative Classifiers

Training classifiers involves estimating $f: X \to Y$, or $P(Y \mid X)$.

Discriminative classifiers:

- Assume some functional form for $P(Y \mid X)$
- Estimate parameters of $P(Y \mid X)$ directly from training data

Generative classifiers (also called "informative" by Rubinstein & Hastie):

- Assume some functional form for $P(X \mid Y)$, $P(Y)$
- Estimate parameters of $P(X \mid Y)$, $P(Y)$ directly from training data
- Use Bayes rule to calculate $P(Y \mid X = x_i)$

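## Generative-Discriminative Pairs

Example: assume $Y$ is boolean and $X = \langle x_1, \ldots, x_n \rangle$, where the $x_i$ are boolean, perhaps dependent on $Y$, and conditionally independent given $Y$.

Generative model: naïve Bayes, with smoothed parameter estimates

$$\hat{P}(x_i = 1 \mid Y = b) = \frac{s\{x_i = 1, Y = b\} + l}{s\{Y = b\} + 2l}, \qquad \hat{P}(Y = b) = \frac{s\{Y = b\} + l}{m + 2l}$$

where $s\{\cdot\}$ indicates the size of a set (the number of training examples satisfying the condition), $m$ is the number of training examples, and $l$ is a smoothing parameter.

Classify a new example $x$ based on the ratio

$$\frac{\hat{P}(Y = 1 \mid x)}{\hat{P}(Y = 0 \mid x)} = \frac{\hat{P}(Y = 1) \prod_{i=1}^{n} \hat{P}(x_i \mid Y = 1)}{\hat{P}(Y = 0) \prod_{i=1}^{n} \hat{P}(x_i \mid Y = 0)}$$

Equivalently, classify based on the sign of the log of this ratio.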
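The naïve Bayes rule for boolean features can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code. It assumes the smoothed estimates take the form (count + l) / (total + 2l), as in the Ng and Jordan paper. The function names and the tiny data set are hypothetical.

```python
import math

def train_naive_bayes(X, y, l=1.0):
    """Estimate smoothed naive Bayes parameters for boolean features and labels.

    X: list of feature vectors (lists of 0/1); y: list of labels (0/1).
    Returns (prior, cond), where prior[b] estimates P(Y = b) and
    cond[b][i] estimates P(x_i = 1 | Y = b).
    Assumes l > 0 so that no estimate is exactly 0 or 1.
    """
    n = len(X[0])
    prior, cond = {}, {}
    for b in (0, 1):
        rows = [x for x, label in zip(X, y) if label == b]
        prior[b] = (len(rows) + l) / (len(X) + 2 * l)
        cond[b] = [(sum(x[i] for x in rows) + l) / (len(rows) + 2 * l)
                   for i in range(n)]
    return prior, cond

def log_ratio(x, prior, cond):
    """Log of P(Y=1) prod_i P(x_i | Y=1) over P(Y=0) prod_i P(x_i | Y=0)."""
    total = math.log(prior[1] / prior[0])
    for i, xi in enumerate(x):
        p1 = cond[1][i] if xi else 1.0 - cond[1][i]
        p0 = cond[0][i] if xi else 1.0 - cond[0][i]
        total += math.log(p1 / p0)
    return total

def classify(x, prior, cond):
    """Predict Y = 1 when the log ratio is positive, else Y = 0."""
    return 1 if log_ratio(x, prior, cond) > 0 else 0

# Hypothetical toy training set: feature 0 tends to be on when Y = 1,
# feature 2 tends to be on when Y = 0.
X = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]]
y = [1, 1, 0, 0]
prior, cond = train_naive_bayes(X, y, l=1.0)
print(classify([1, 0, 0], prior, cond), classify([0, 1, 1], prior, cond))
```

Working with the log ratio rather than the ratio itself avoids multiplying many small probabilities together.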

Discriminative model: logistic regression

Note both learn linear decision surface over X in this case
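To see why naïve Bayes yields a linear decision surface for boolean features, note that each feature contributes $x_i \log\frac{a_i}{b_i} + (1 - x_i)\log\frac{1 - a_i}{1 - b_i}$ to the log ratio, where $a_i = P(x_i = 1 \mid Y = 1)$ and $b_i = P(x_i = 1 \mid Y = 0)$. Collecting terms gives a score of the form $w \cdot x + \text{bias}$, the same functional form that logistic regression thresholds. The sketch below checks this numerically. The parameter values are hypothetical and chosen only for illustration.

```python
import math
from itertools import product

# Hypothetical naive Bayes parameters for n = 3 boolean features.
prior1 = 0.4             # P(Y = 1)
p1 = [0.8, 0.5, 0.3]     # P(x_i = 1 | Y = 1)
p0 = [0.2, 0.6, 0.7]     # P(x_i = 1 | Y = 0)

def nb_log_ratio(x):
    """Log ratio computed directly from the naive Bayes factorization."""
    total = math.log(prior1 / (1 - prior1))
    for xi, a, b in zip(x, p1, p0):
        total += math.log((a if xi else 1 - a) / (b if xi else 1 - b))
    return total

# Equivalent linear form: weights and bias derived from the same parameters.
w = [math.log(a / b) - math.log((1 - a) / (1 - b)) for a, b in zip(p1, p0)]
bias = math.log(prior1 / (1 - prior1)) + sum(
    math.log((1 - a) / (1 - b)) for a, b in zip(p1, p0))

def linear_score(x):
    """The same quantity written as bias + w . x."""
    return bias + sum(wi * xi for wi, xi in zip(w, x))

# Compare the two forms on every boolean input vector.
max_gap = max(abs(nb_log_ratio(x) - linear_score(x))
              for x in product([0, 1], repeat=3))
print(max_gap)
```

The two classifiers therefore share a hypothesis class here and differ only in how the weights are fit: naïve Bayes sets them from estimated class-conditional probabilities, while logistic regression fits them directly.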

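## What is the difference asymptotically?

Notation: let $\epsilon(h_{A,m})$ denote the error of the hypothesis learned via algorithm $A$ from $m$ examples.

If the assumed model is correct (e.g., the naïve Bayes model) and has a finite number of parameters, then

$$\epsilon(h_{Dis,\infty}) = \epsilon(h_{Gen,\infty})$$

If the assumed model is incorrect,

$$\epsilon(h_{Dis,\infty}) \le \epsilon(h_{Gen,\infty})$$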

Note assumed discriminative model can be correct even when generative model incorrect, but not vice versa

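## Rate of convergence: logistic regression

Let $h_{Dis,m}$ be logistic regression trained on $m$ examples in $n$ dimensions. Then with high probability:

$$\epsilon(h_{Dis,m}) \le \epsilon(h_{Dis,\infty}) + O\left(\sqrt{\frac{n}{m} \log \frac{m}{n}}\right)$$

Implication: if we want $\epsilon(h_{Dis,m}) \le \epsilon(h_{Dis,\infty}) + \epsilon_0$ for some constant $\epsilon_0 > 0$, it suffices to pick $m = \Omega(n)$.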

Converges to the best linear classifier in order of $n$ examples (the result follows from Vapnik's structural risk bound, plus the fact that the VC dimension of $n$-dimensional linear separators is $n + 1$).
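A quick numerical check shows why a number of examples proportional to $n$ suffices. The bound in the Ng and Jordan paper has an excess-error term of order $\sqrt{(n/m)\log(m/n)}$. The sketch below evaluates that term with the unknown constant omitted (an assumption of this illustration), taking $m = c \cdot n$ for several hypothetical multipliers $c$.

```python
import math

def excess_error_term(n, m):
    """sqrt((n / m) * log(m / n)): the rate term without its unknown constant."""
    return math.sqrt((n / m) * math.log(m / n))

# For m = c * n the term depends only on c, not on the dimension n.
for n in (10, 100, 1000):
    terms = [round(excess_error_term(n, c * n), 4) for c in (10, 100, 1000)]
    print(n, terms)
```

Each row prints the same three values. Holding the ratio $m/n$ fixed holds the rate term fixed, so reaching a target excess error $\epsilon_0$ takes a number of examples that grows linearly with $n$.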

## Rate of convergence: naïve Bayes

Consider first how quickly parameter estimates converge toward their asymptotic values.

Then we'll ask how this influences the rate of convergence toward asymptotic classification error.

## Rate of convergence: naïve Bayes parameters
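One way to get a feel for how quickly many parameter estimates converge at once is a Hoeffding bound combined with a union bound. The sketch below is a simplified illustration under stated assumptions, not the argument given in the lecture. It treats each of $n$ parameters as the mean of $m$ independent Bernoulli draws, which ignores the fact that the number of examples per class is itself random. Under that assumption, all $n$ estimates are within $\epsilon$ of their true values with probability at least $1 - \delta$ once $2n\exp(-2m\epsilon^2) \le \delta$. The function name is hypothetical.

```python
import math

def samples_for_all_parameters(n, epsilon, delta):
    """Smallest integer m with 2 * n * exp(-2 * m * epsilon**2) <= delta.

    Hoeffding's inequality bounds each estimate's failure probability by
    2 * exp(-2 * m * epsilon**2); the union bound multiplies that by n.
    """
    return math.ceil(math.log(2 * n / delta) / (2 * epsilon ** 2))

# The required m grows only logarithmically with the number of parameters.
for n in (10, 100, 1000, 10000):
    print(n, samples_for_all_parameters(n, epsilon=0.1, delta=0.05))
```

Under these assumptions, each tenfold increase in $n$ adds roughly the same number of examples. This is consistent with the logarithmic dependence on $n$ that Ng and Jordan report for naïve Bayes parameter estimates, in contrast to the linear dependence for logistic regression.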

## Rate of convergence: naïve Bayes classification error

See blackboard

## Some experiments from UCI data sets

Pairs of plots comparing naïve Bayes and logistic regression with a quadratic regularization penalty. Left plots show training error vs. number of examples; right plots show test error. Each row uses a different regularization penalty: the top row uses a small penalty, and the penalty increases as you move down the page.

Thanks to John Lafferty.
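An experiment of this kind can be sketched without any plotting library. The code below is a rough illustration under stated assumptions, not a reproduction of the lecture's experiments. It uses hypothetical synthetic data in which the naïve Bayes assumption holds, rather than UCI data sets. It trains smoothed naïve Bayes and logistic regression with a quadratic penalty by plain gradient descent, then records training and test error at several training-set sizes. The penalty, step size, and data-generating probabilities are arbitrary choices.

```python
import math
import random

def make_data(m, n, rng):
    """Hypothetical synthetic data: features are independent given the label."""
    X, y = [], []
    for _ in range(m):
        label = 1 if rng.random() < 0.5 else 0
        p = 0.7 if label else 0.3          # P(x_i = 1 | Y = label)
        X.append([1 if rng.random() < p else 0 for _ in range(n)])
        y.append(label)
    return X, y

def train_nb(X, y, l=1.0):
    """Return (weights, bias) of the linear rule implied by smoothed naive Bayes."""
    n, m = len(X[0]), len(X)
    c1 = sum(y)
    c0 = m - c1
    prior1 = (c1 + l) / (m + 2 * l)
    bias = math.log(prior1 / (1 - prior1))
    w = []
    for i in range(n):
        a = (sum(x[i] for x, t in zip(X, y) if t == 1) + l) / (c1 + 2 * l)
        b = (sum(x[i] for x, t in zip(X, y) if t == 0) + l) / (c0 + 2 * l)
        w.append(math.log(a / b) - math.log((1 - a) / (1 - b)))
        bias += math.log((1 - a) / (1 - b))
    return w, bias

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_lr(X, y, penalty=0.01, steps=300, step_size=0.5):
    """Logistic regression by batch gradient descent with a quadratic penalty on w."""
    n, m = len(X[0]), len(X)
    w, bias = [0.0] * n, 0.0
    for _ in range(steps):
        grad_w, grad_b = [penalty * wi for wi in w], 0.0
        for x, t in zip(X, y):
            err = sigmoid(bias + sum(wi * xi for wi, xi in zip(w, x))) - t
            grad_b += err / m
            for i, xi in enumerate(x):
                grad_w[i] += err * xi / m
        w = [wi - step_size * gi for wi, gi in zip(w, grad_w)]
        bias -= step_size * grad_b
    return w, bias

def error(model, X, y):
    """Fraction of examples misclassified by the linear rule (w, bias)."""
    w, bias = model
    wrong = sum((bias + sum(wi * xi for wi, xi in zip(w, x)) > 0) != (t == 1)
                for x, t in zip(X, y))
    return wrong / len(X)

rng = random.Random(0)
X_test, y_test = make_data(500, 10, rng)
results = []   # rows: (m, NB train, NB test, LR train, LR test)
for m in (10, 30, 100):
    X_train, y_train = make_data(m, 10, rng)
    nb_model = train_nb(X_train, y_train)
    lr_model = train_lr(X_train, y_train)
    results.append((m,
                    error(nb_model, X_train, y_train), error(nb_model, X_test, y_test),
                    error(lr_model, X_train, y_train), error(lr_model, X_test, y_test)))
for row in results:
    print(row)
```

Repeating the loop for several values of `penalty` would give one row of curves per penalty setting, mirroring the layout described above.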