# Hybrids of generative and discriminative methods for machine learning

MSRC Summer School - 30/06/2009 Cambridge – UK Hybrids of generative and discriminative methods for machine learning

TRANSCRIPT

• MSRC Summer School - 30/06/2009 - Cambridge, UK
Hybrids of generative and discriminative methods for machine learning

• Motivation
Generative models: incorporate prior knowledge; handle missing data, such as missing labels.

Discriminative models: perform well at classification.

However, there is no straightforward way to combine them.

• Content
Generative and discriminative methods
A principled hybrid framework
Study of the properties on a toy example
Influence of the amount of labelled data


• Generative methods
Answer: what does a cat look like? And a dog? => model the joint distribution of data and labels
x : data
c : label
θ : parameters

• Generative methods
Objective function:
G(θ) = p(θ) p(X, C | θ)
G(θ) = p(θ) ∏_n p(x_n, c_n | θ)

One reusable model per class; can deal with incomplete data.

Example: GMMs
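As a minimal sketch (not from the slides) of the generative objective above, the following fits one spherical Gaussian per class plus class priors by maximum likelihood, then classifies by the largest joint density p(x, c | θ). The helper names and data layout are assumptions.

```python
import numpy as np

def fit_generative(X, c, n_classes):
    """Per-class means, one shared spherical variance, and class priors."""
    means = np.array([X[c == k].mean(axis=0) for k in range(n_classes)])
    resid = np.concatenate([X[c == k] - means[k] for k in range(n_classes)])
    var = (resid ** 2).mean()                      # shared spherical variance
    priors = np.bincount(c, minlength=n_classes) / len(c)
    return means, var, priors

def log_joint(X, means, var, priors):
    """log p(x, c = k | theta) for every sample and every class k."""
    d = X.shape[1]
    sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)   # (n, K)
    return np.log(priors) - 0.5 * sq / var - 0.5 * d * np.log(2 * np.pi * var)

def predict(X, params):
    """Classify by the largest joint density (equivalently, Bayes' rule)."""
    means, var, priors = params
    return log_joint(X, means, var, priors).argmax(axis=1)
```

Because the model is a density over both x and c, it can also score unlabelled points through p(x | θ) = Σ_c p(x, c | θ), which is what makes generative models useful when labels are missing.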

• Example of generative model

• Discriminative methods
Answer: is it a cat or a dog? => model the posterior distribution of the labels
x : data
c : label
θ : parameters

• Discriminative methods
The objective function is:
D(θ) = p(θ) p(C | X, θ)
D(θ) = p(θ) ∏_n p(c_n | x_n, θ)

Focus on regions of ambiguity; make faster predictions.

Examples: neural networks, SVMs
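As a minimal sketch (not from the slides) of a discriminative model, the following trains binary logistic regression by gradient ascent on the conditional likelihood D(θ) = ∏_n p(c_n | x_n, θ). The learning rate and step count are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, c, lr=0.5, steps=1000):
    """Gradient ascent on the log conditional likelihood."""
    Xb = np.hstack([X, np.ones((len(X), 1))])      # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = sigmoid(Xb @ w)                        # p(c = 1 | x, w)
        w += lr * Xb.T @ (c - p) / len(X)          # gradient of the log conditional
    return w

def predict_logistic(X, w):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (sigmoid(Xb @ w) > 0.5).astype(int)
```

Note that only the boundary p(c | x) is modelled; nothing is said about p(x), so unlabelled data cannot help this model directly.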

• Example of discriminative model: SVMs / NNs

• Generative versus discriminative
The double mode (bimodality) of the class density has no effect on the decision boundary

• Content
Generative and discriminative methods
A principled hybrid framework
Study of the properties on a toy example
Influence of the amount of labelled data

• Semi-supervised learning
Few labelled data / lots of unlabelled data.

Discriminative methods overfit; generative models only help classification if they are good.

We need the modelling power of generative models while performing as well as discriminative models at classification => hybrid models

• Discriminative training (Bach et al., ICASSP 2005)
Discriminative objective function:
D(θ) = p(θ) ∏_n p(c_n | x_n, θ)

Using a generative model:
D(θ) = p(θ) ∏_n [ p(x_n, c_n | θ) / p(x_n | θ) ]

D(θ) = p(θ) ∏_n [ p(x_n, c_n | θ) / Σ_c' p(x_n, c' | θ) ]
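The normalisation step above can be sketched in a few lines: a generative model is trained discriminatively by computing p(c | x, θ) = p(x, c | θ) / Σ_c' p(x, c' | θ). Working in log space with the log-sum-exp trick keeps this numerically stable. `log_joint` here is an assumed (n_samples, n_classes) array of log p(x_n, c = k | θ) values.

```python
import numpy as np

def log_conditional(log_joint):
    """log p(c | x, theta) from log p(x, c | theta), normalised row-wise."""
    m = log_joint.max(axis=1, keepdims=True)                # log-sum-exp shift
    log_marginal = m + np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True))
    return log_joint - log_marginal                         # subtract log p(x | theta)
```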

• Convex combination (Bouchard et al., COMPSTAT 2004)
Generative objective function:
G(θ) = p(θ) ∏_n p(x_n, c_n | θ)
Discriminative objective function:
D(θ) = p(θ) ∏_n p(c_n | x_n, θ)

Convex combination:
log L(θ) = α log D(θ) + (1 - α) log G(θ), α ∈ [0, 1]
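The convex combination itself is a one-line interpolation; here is a trivial sketch, where `log_D` and `log_G` stand for log D(θ) and log G(θ) computed elsewhere.

```python
def blended_objective(log_D, log_G, alpha):
    """Convex combination of the discriminative and generative log objectives."""
    assert 0.0 <= alpha <= 1.0
    return alpha * log_D + (1.0 - alpha) * log_G
```

At α = 1 the objective is purely discriminative; at α = 0 it is purely generative.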

• A principled hybrid model


• A principled hybrid model
θ̃ : parameters of the posterior distribution of the labels
θ : parameters of the marginal distribution of the data
θ̃ and θ communicate through a prior

Hybrid objective function:
L(θ̃, θ) = p(θ̃, θ) ∏_n p(c_n | x_n, θ̃) ∏_n p(x_n | θ)

• A principled hybrid model
θ̃ = θ => p(θ̃, θ) = p(θ) δ(θ̃ - θ)
L(θ̃, θ) = p(θ) δ(θ̃ - θ) ∏_n p(c_n | x_n, θ̃) ∏_n p(x_n | θ)
L(θ) = G(θ): generative case

θ̃, θ independent => p(θ̃, θ) = p(θ̃) p(θ)
L(θ̃, θ) = [ p(θ̃) ∏_n p(c_n | x_n, θ̃) ] [ p(θ) ∏_n p(x_n | θ) ]
L(θ̃, θ) = D(θ̃) f(θ): discriminative case

• A principled hybrid model
Anything in between => hybrid case

Choice of prior:
p(θ̃, θ) = p(θ) N(θ̃ | θ, σ(α))
α → 0 => σ → 0 => θ̃ = θ (generative case)
α → 1 => σ → ∞ => θ̃, θ independent (discriminative case)
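The coupling prior can be sketched as a penalty pulling the discriminative parameters θ̃ toward the generative ones θ. The mapping from α to σ below is an illustrative choice, not the one from the slides; it only reproduces the two limits (σ → 0 as α → 0, σ → ∞ as α → 1).

```python
import numpy as np

def sigma(alpha, eps=1e-12):
    """Assumed schedule: sigma -> 0 as alpha -> 0, sigma -> inf as alpha -> 1."""
    return alpha / max(1.0 - alpha, eps)

def log_coupling(theta_tilde, theta, alpha):
    """Log of the Gaussian coupling N(theta_tilde | theta, sigma(alpha)^2 I),
    up to additive constants: a quadratic pull of theta_tilde toward theta."""
    s = sigma(alpha)
    d = np.asarray(theta_tilde) - np.asarray(theta)
    return -0.5 * np.sum(d * d) / (s * s) - d.size * np.log(s)
```

Small α makes any mismatch between θ̃ and θ very costly (the generative limit); large α makes the penalty negligible (the discriminative limit).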

• Why principled?
Consistent with the likelihood of graphical models => one way to train a system

Everything can now be modelled => potential to be Bayesian

Potential to learn α

• Learning
EM / Laplace approximation / MCMC: either intractable or too slow

Conjugate gradients: flexible and easy to check, BUT sensitive to initialisation and slow

Variational inference

• Content
Generative and discriminative methods
A principled hybrid framework
Study of the properties on a toy example
Influence of the amount of labelled data

• Toy example

• Toy example
2 elongated distributions

Only spherical Gaussians allowed => wrong model

2 labelled points per class => strong risk of overfitting
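The toy set-up can be sketched as follows: two elongated (anisotropic) Gaussian classes, while the fitted model only keeps a mean and a single scalar variance per class, so the generative model is deliberately misspecified. The means, covariance, and sample counts here are illustrative assumptions, not the slides' exact values.

```python
import numpy as np

rng = np.random.default_rng(1)
cov = np.array([[4.0, 0.0], [0.0, 0.1]])                  # elongated along x
X0 = rng.multivariate_normal([0.0, 0.0], cov, size=200)   # class 0
X1 = rng.multivariate_normal([0.0, 1.5], cov, size=200)   # class 1

# A spherical fit keeps one mean and one scalar variance per class,
# discarding the elongation; this is the source of the model mismatch.
mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
var0 = ((X0 - mu0) ** 2).mean()
var1 = ((X1 - mu1) ** 2).mean()
```

The scalar variance averages the large and small axes, so the spherical model can neither represent the class shape nor place the boundary well from only a few labelled points.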

• Toy example

• Decision boundaries

• Content
Generative and discriminative methods
A principled hybrid framework
Study of the properties on a toy example
Influence of the amount of labelled data

• A real example
Images are a special case, as each image contains several features

2 levels of supervision: at the image level and at the feature level
Image label only => weakly labelled
Image label + segmentation => fully labelled

• The underlying generative model
[Figure: graphical model with Gaussian and multinomial component distributions]

• The underlying generative model
[Figure: weakly labelled versus fully labelled versions of the model]

• Experimental set-up
3 classes: bikes, cows, sheep

θ: 1 Gaussian per class => poor generative model

75 training images for each category

• HF framework

• HF (hybrid framework) versus CC (convex combination)

• Results
When increasing the proportion of fully labelled data, the trend is:
generative → hybrid → discriminative

Weakly labelled data has little influence on the trend

With sufficient fully labelled data, HF tends to perform better than CC

• Experimental set-up
3 classes: lions, tigers and cheetahs

θ: 1 Gaussian per class => poor generative model

75 training images for each category

• HF framework

• HF versus CC

• Results
Hybrid models consistently perform better

However, generative and discriminative models haven't reached saturation

No clear difference between HF and CC

• Conclusion
A principled hybrid framework

Possibility to learn the best trade-off

Helps for ambiguous datasets when labelled data is scarce

Optimisation remains a difficulty

• Future avenues
A Bayesian version (posterior distribution of α) is under study

Replace σ by a diagonal matrix to allow more flexibility => needs the Bayesian version

Choice of priors

• Thank you!
