# Hybrids of generative and discriminative methods for machine learning



MSRC Summer School – 30/06/2009, Cambridge, UK

## Motivation

Generative models:
- incorporate prior knowledge
- handle missing data, such as labels

Discriminative models:
- perform well at classification

However, there is no straightforward way to combine the two.

## Content

- Generative and discriminative methods
- A principled hybrid framework
- Study of the properties on a toy example
- Influence of the amount of labelled data


## Generative methods

Answer the question: what does a cat look like? And a dog? ⇒ model the joint distribution of data and labels.

- x: data
- c: label
- θ: parameters

Objective function:

$$G(\theta) = p(\theta)\, p(X, C \mid \theta) = p(\theta) \prod_n p(x_n, c_n \mid \theta)$$
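As a minimal sketch (on hypothetical 1-D toy data, not the talk's experiments), maximising the joint likelihood with one Gaussian per class decomposes into per-class maximum-likelihood fits, and prediction falls out of Bayes' rule:

```python
import math

def fit_generative(xs, cs):
    """MLE of theta for prod_n p(x_n, c_n | theta): class prior + 1-D Gaussian per class."""
    classes = sorted(set(cs))
    theta = {}
    for c in classes:
        pts = [x for x, ci in zip(xs, cs) if ci == c]
        mu = sum(pts) / len(pts)
        var = sum((x - mu) ** 2 for x in pts) / len(pts)
        theta[c] = (len(pts) / len(xs), mu, max(var, 1e-6))  # (p(c), mean, variance)
    return theta

def log_gauss(x, mu, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def predict(theta, x):
    # Bayes' rule: argmax_c p(c) p(x | c, theta)
    return max(theta, key=lambda c: math.log(theta[c][0]) + log_gauss(x, *theta[c][1:]))

xs = [0.1, 0.3, -0.2, 4.0, 4.2, 3.8]  # hypothetical toy data
cs = [0, 0, 0, 1, 1, 1]
theta = fit_generative(xs, cs)
print(predict(theta, 0.0))   # -> 0
print(predict(theta, 4.1))   # -> 1
```

Because the model defines p(x, c), the same fitted θ can also score unlabelled points via p(x) = Σ_c p(x, c), which is what lets generative models exploit incomplete data.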

One reusable model per class; can deal with incomplete data.

Example: GMMs

Example of a generative model (figure not transcribed).

## Discriminative methods

Answer the question: is it a cat or a dog? ⇒ model the posterior distribution of the labels.

- x: data
- c: label
- θ: parameters

The objective function is:

$$D(\theta) = p(\theta)\, p(C \mid X, \theta) = p(\theta) \prod_n p(c_n \mid x_n, \theta)$$
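What "model only the conditional" means can be sketched with 1-D logistic regression on hypothetical toy data, trained by plain gradient ascent on Σ_n log p(c_n | x_n, θ) (for simplicity, the prior term p(θ) is omitted here):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_discriminative(xs, cs, lr=0.1, steps=2000):
    """Gradient ascent on the conditional log-likelihood; p(x) is never modelled."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        for x, c in zip(xs, cs):
            err = c - sigmoid(w * x + b)  # d/dz log p(c | x, theta)
            w += lr * err * x
            b += lr * err
    return w, b

xs = [0.1, 0.3, -0.2, 4.0, 4.2, 3.8]  # hypothetical toy data
cs = [0, 0, 0, 1, 1, 1]
w, b = fit_discriminative(xs, cs)
print(sigmoid(w * 0.0 + b) < 0.5)   # True: x=0.0 falls on the class-0 side
print(sigmoid(w * 4.1 + b) > 0.5)   # True: x=4.1 falls on the class-1 side
```

All the modelling capacity goes into the decision boundary, which is why such models do well at classification but cannot use unlabelled data on their own.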

Focus on regions of ambiguity, make faster predictions

Examples: neural networks, SVMs.

Example of a discriminative model: SVMs / neural networks (figure not transcribed).

## Generative versus discriminative

A second mode in the data distribution has no effect on the discriminative decision boundary.

## A principled hybrid framework

## Semi-supervised learning

Few labelled data points, lots of unlabelled data.

Discriminative methods overfit; generative models only help classification if they are good.

We need the modelling power of generative models while still discriminating well ⇒ hybrid models.

## Discriminative training (Bach et al., ICASSP 2005)

Discriminative objective function:

$$D(\theta) = p(\theta) \prod_n p(c_n \mid x_n, \theta)$$

Using a generative model:

$$D(\theta) = p(\theta) \prod_n \frac{p(x_n, c_n \mid \theta)}{p(x_n \mid \theta)} = p(\theta) \prod_n \frac{p(x_n, c_n \mid \theta)}{\sum_{c'} p(x_n, c' \mid \theta)}$$
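In this spirit, the conditional p(c | x, θ) can be computed from a generative model's joint by normalising over classes. A minimal sketch (per-class 1-D Gaussians with hypothetical fixed parameters; an optimiser would ascend this quantity with respect to θ):

```python
import math

def log_joint(x, c, theta):
    """log p(x, c | theta) for a class prior + 1-D Gaussian class-conditional."""
    prior, mu, var = theta[c]
    return math.log(prior) - 0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def conditional_log_lik(xs, cs, theta):
    """sum_n log p(c_n | x_n, theta) = sum_n [log p(x_n, c_n) - log sum_c' p(x_n, c')]."""
    total = 0.0
    for x, c in zip(xs, cs):
        log_norm = math.log(sum(math.exp(log_joint(x, k, theta)) for k in theta))
        total += log_joint(x, c, theta) - log_norm
    return total

# Hypothetical parameters: (class prior, mean, variance) per class.
theta = {0: (0.5, 0.0, 1.0), 1: (0.5, 4.0, 1.0)}
ll = conditional_log_lik([0.1, 4.2], [0, 1], theta)
print(ll > -0.01)   # True: well-separated data gives near-zero conditional log-likelihood
```

The generative parameterisation is kept, but the data-marginal p(x | θ) is divided out, so only classification performance drives the training signal.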

## Convex combination (Bouchard et al., COMPSTAT 2004)

Generative objective function:

$$G(\theta) = p(\theta) \prod_n p(x_n, c_n \mid \theta)$$

Discriminative objective function:

$$D(\theta) = p(\theta) \prod_n p(c_n \mid x_n, \theta)$$

Convex combination:

$$\log L(\theta) = \alpha \log D(\theta) + (1 - \alpha) \log G(\theta), \quad \alpha \in [0, 1]$$
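The blend acts directly on the two log-objectives; a trivial sketch with placeholder log-likelihood values (not from any real model):

```python
def blended_objective(log_D, log_G, alpha):
    """Convex combination: alpha * log D(theta) + (1 - alpha) * log G(theta)."""
    assert 0.0 <= alpha <= 1.0
    return alpha * log_D + (1.0 - alpha) * log_G

# Placeholder values for log D(theta) and log G(theta) at some fixed theta.
print(blended_objective(-10.0, -50.0, 0.0))   # -50.0: alpha=0 recovers the generative objective
print(blended_objective(-10.0, -50.0, 1.0))   # -10.0: alpha=1 recovers the discriminative objective
print(blended_objective(-10.0, -50.0, 0.5))   # -30.0: an even mix
```

Note that both objectives are evaluated at the *same* θ; the interpolation happens in the objective, not in the parameters, which is the key difference from the hybrid framework below.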

## A principled hybrid model

- θ models the posterior distribution of the labels
- θ̃ models the marginal distribution of the data
- θ and θ̃ communicate through a joint prior

Hybrid objective function:

$$L(\theta, \tilde{\theta}) = p(\theta, \tilde{\theta}) \prod_n p(c_n \mid x_n, \theta) \prod_n p(x_n \mid \tilde{\theta})$$

Two limiting cases:

σ = 0 ⇒ $p(\theta, \tilde{\theta}) = p(\tilde{\theta})\,\delta(\theta - \tilde{\theta})$:

$$L(\theta, \tilde{\theta}) = p(\tilde{\theta})\,\delta(\theta - \tilde{\theta}) \prod_n p(c_n \mid x_n, \theta) \prod_n p(x_n \mid \tilde{\theta}) \;\Rightarrow\; L(\theta) = G(\theta) \quad \text{(generative case)}$$

σ → ∞ ⇒ $p(\theta, \tilde{\theta}) = p(\theta)\,p(\tilde{\theta})$:

$$L(\theta, \tilde{\theta}) = \Big[ p(\theta) \prod_n p(c_n \mid x_n, \theta) \Big] \Big[ p(\tilde{\theta}) \prod_n p(x_n \mid \tilde{\theta}) \Big] = D(\theta)\, f(\tilde{\theta}) \quad \text{(discriminative case)}$$

Anything in between ⇒ hybrid case.

Choice of prior:

$$p(\theta, \tilde{\theta}) = p(\tilde{\theta})\, \mathcal{N}(\theta \mid \tilde{\theta}, \sigma(a))$$

- a → 0 ⇒ σ → 0 ⇒ θ = θ̃ (generative limit)
- a → 1 ⇒ σ → ∞ ⇒ θ and θ̃ decouple (discriminative limit)
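A minimal sketch of how the coupling prior trades the two regimes off, using hypothetical scalar parameters and toy stand-in log-likelihood terms (the flat prior p(θ̃) is dropped for simplicity):

```python
import math

def log_coupling(theta, theta_t, sigma):
    """log N(theta | theta_t, sigma^2): the prior term tying the two parameter sets."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (theta - theta_t) ** 2 / (2 * sigma ** 2)

def log_hybrid(theta, theta_t, sigma, log_cond, log_marg):
    """log L = log p(theta, theta~) + sum_n log p(c_n|x_n, theta) + sum_n log p(x_n|theta~)."""
    return log_coupling(theta, theta_t, sigma) + log_cond(theta) + log_marg(theta_t)

# Toy stand-ins: the conditional term prefers theta near 2, the marginal prefers theta~ near 0.
log_cond = lambda t: -(t - 2.0) ** 2
log_marg = lambda t: -t ** 2

# Small sigma heavily penalises theta != theta~ (generative limit, parameters tied);
# large sigma lets each term sit at its own optimum (discriminative limit, decoupled).
print(log_hybrid(2.0, 0.0, 0.1, log_cond, log_marg) <
      log_hybrid(2.0, 0.0, 10.0, log_cond, log_marg))   # True
```

Varying σ (or a) thus sweeps continuously from a single shared parameter set to two independent ones, which is exactly the generative-to-discriminative spectrum the framework exposes.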

## Why principled?

- Consistent with the likelihood of graphical models ⇒ a single way to train the system
- Everything can now be modelled ⇒ potential to be fully Bayesian
- Potential to learn the trade-off parameter a

## Learning

- EM / Laplace approximation / MCMC: either intractable or too slow
- Conjugate gradients: flexible and easy to check, but sensitive to initialisation and slow
- Variational inference


## Toy example

- Two elongated distributions
- Only spherical Gaussians allowed ⇒ wrong model
- Two labelled points per class ⇒ strong risk of overfitting


Decision boundaries


## A real example

Images are a special case, as each contains several features.

Two levels of supervision, at the image level and at the feature level:
- image label only ⇒ weakly labelled
- image label + segmentation ⇒ fully labelled

## The underlying generative model

Combines Gaussian and multinomial components, with variants for weakly and fully labelled data (diagrams not transcribed).

## Experimental set-up

- 3 classes: bikes, cows, sheep
- 1 Gaussian per class ⇒ poor generative model
- 75 training images per category

HF framework

HF versus CC (HF: hybrid framework; CC: convex combination)

## Results

- When increasing the proportion of fully labelled data, the trend is: generative → hybrid → discriminative
- Weakly labelled data has little influence on the trend
- With sufficient fully labelled data, HF tends to perform better than CC

## Experimental set-up

- 3 classes: lions, tigers and cheetahs
- 1 Gaussian per class ⇒ poor generative model
- 75 training images per category

HF framework

HF versus CC

## Results

- Hybrid models consistently perform better
- However, generative and discriminative models haven't reached saturation
- No clear difference between HF and CC

## Conclusion

- A principled hybrid framework
- Possibility to learn the best trade-off
- Helps on ambiguous datasets when labelled data is scarce
- Optimisation remains a problem

## Future avenues

- A Bayesian version (posterior distribution of a) is under study
- Replace σ by a diagonal matrix to allow more flexibility ⇒ needs the Bayesian version
- Choice of priors

Thank you!
