hybrids of generative and discriminative methods for machine learning

MSRC Summer School, 30/06/2009, Cambridge – UK: Hybrids of generative and discriminative methods for machine learning


  • MSRC Summer School - 30/06/2009, Cambridge – UK
    Hybrids of generative and discriminative methods for machine learning

  • Motivation
    Generative models: prior knowledge; handle missing data such as labels

    Discriminative models: perform well at classification

    However: no straightforward way to combine them

  • Content
    Generative and discriminative methods
    A principled hybrid framework
    Study of the properties on a toy example
    Influence of the amount of labelled data

  • Content
    Generative and discriminative methods
    A principled hybrid framework
    Study of the properties on a toy example
    Influence of the amount of labelled data

  • Generative methods
    Answer: what does a cat look like? And a dog?
    => model the joint distribution of data and labels, p(x, c | θ)
    x: data, c: label, θ: parameters

  • Generative methods
    Objective function:
    G(θ) = p(θ) p(X, C | θ)
    G(θ) = p(θ) ∏n p(xn, cn | θ)

    1 reusable model per class, can deal with incomplete data

    Example: GMMs

  • Example of generative model
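    The slides themselves carry no code; as an illustrative sketch only (hypothetical 2-D data, and the names `fit_gaussian`, `log_joint`, and `classify` are invented here), a spherical-Gaussian generative classifier in the spirit of the slide above: fit p(x, c | θ) per class, then classify with Bayes' rule.

    ```python
    import numpy as np

    # Hypothetical 2-D data: one spherical Gaussian per class (as in the slides).
    rng = np.random.default_rng(0)
    X0 = rng.normal(loc=[-2.0, 0.0], scale=1.0, size=(50, 2))  # class 0
    X1 = rng.normal(loc=[2.0, 0.0], scale=1.0, size=(50, 2))   # class 1

    def fit_gaussian(X):
        """Maximum-likelihood mean and spherical variance for one class."""
        return X.mean(axis=0), X.var()

    def log_joint(x, mu, var, log_prior):
        """log p(x, c | theta) = log p(c) + log N(x | mu, var * I)."""
        d = len(x)
        sq = np.sum((x - mu) ** 2)
        return log_prior - 0.5 * d * np.log(2 * np.pi * var) - sq / (2 * var)

    params = [fit_gaussian(X0), fit_gaussian(X1)]
    log_priors = np.log([0.5, 0.5])

    def classify(x):
        """Bayes' rule: pick the class maximising the joint p(x, c | theta)."""
        scores = [log_joint(x, mu, var, lp)
                  for (mu, var), lp in zip(params, log_priors)]
        return int(np.argmax(scores))
    ```

    Because the model is fit per class, adding a new class only requires fitting one more Gaussian, which is the "1 reusable model per class" point above.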

  • Discriminative methods
    Answer: is it a cat or a dog?
    => model the posterior distribution of the labels, p(c | x, θ)
    x: data, c: label, θ: parameters

  • Discriminative methods
    The objective function is:
    D(θ) = p(θ) p(C | X, θ)
    D(θ) = p(θ) ∏n p(cn | xn, θ)

    Focus on regions of ambiguity, make faster predictions

    Example: neural networks, SVMs

  • Example of discriminative model: SVMs / NNs
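    As a minimal discriminative counterpart (not code from the talk; the 1-D data and learning-rate settings are made up), gradient ascent on ∑n log p(cn | xn, θ), i.e. plain logistic regression with θ = (w, b):

    ```python
    import numpy as np

    # Hypothetical 1-D data: class 1 tends to have larger x.
    rng = np.random.default_rng(1)
    x = np.concatenate([rng.normal(-1.0, 1.0, 100), rng.normal(1.0, 1.0, 100)])
    c = np.concatenate([np.zeros(100), np.ones(100)])

    # Maximise sum_n log p(c_n | x_n, theta) by gradient ascent.
    w, b = 0.0, 0.0
    lr = 0.1
    for _ in range(500):
        p = 1.0 / (1.0 + np.exp(-(w * x + b)))  # p(c=1 | x, theta)
        w += lr * np.mean((c - p) * x)          # gradient of mean log-likelihood in w
        b += lr * np.mean(c - p)                # gradient in b

    p = 1.0 / (1.0 + np.exp(-(w * x + b)))
    accuracy = np.mean((p > 0.5) == c)
    ```

    Note the contrast with the generative sketch: nothing here models p(x), so unlabelled points contribute nothing to the objective, which is exactly the semi-supervised weakness discussed later.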

  • Generative versus discriminative
    No effect of the double mode on the decision boundary

  • Content
    Generative and discriminative methods
    A principled hybrid framework
    Study of the properties on a toy example
    Influence of the amount of labelled data

  • Semi-supervised learning
    Few labelled data / lots of unlabelled data

    Discriminative methods overfit; generative models only help classification if they are good

    Need the modelling power of generative models while performing well at discrimination => hybrid models

  • Discriminative training (Bach et al., ICASSP 05)
    Discriminative objective function:
    D(θ) = p(θ) ∏n p(cn | xn, θ)

    Using a generative model:
    D(θ) = p(θ) ∏n [ p(xn, cn | θ) / p(xn | θ) ]

    D(θ) = p(θ) ∏n [ p(xn, cn | θ) / Σc' p(xn, c' | θ) ]
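    The division of the joint by the marginal p(xn | θ) = Σc' p(xn, c' | θ) is a log-sum-exp in practice; a small sketch (the function name `log_posterior_from_joint` and the joint scores are made up for illustration):

    ```python
    import numpy as np

    def log_posterior_from_joint(log_joint):
        """Turn per-class joint scores log p(x_n, c | theta) (shape N x K)
        into log p(c | x_n, theta) by subtracting the log marginal
        log p(x_n | theta) = log sum_c' p(x_n, c' | theta)."""
        m = log_joint.max(axis=1, keepdims=True)  # stabilised log-sum-exp
        log_marginal = m + np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True))
        return log_joint - log_marginal

    # Made-up joint log-probabilities for 2 points and 3 classes.
    lj = np.array([[-1.0, -2.0, -3.0],
                   [-0.5, -0.5, -4.0]])
    lp = log_posterior_from_joint(lj)
    ```

    Each row of `np.exp(lp)` sums to 1: the generative model's parameters are now scored purely on how well they discriminate.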

  • Convex combination (Bouchard et al., COMPSTAT 04)
    Generative objective function:
    G(θ) = p(θ) ∏n p(xn, cn | θ)
    Discriminative objective function:
    D(θ) = p(θ) ∏n p(cn | xn, θ)

    Convex combination:
    log L(θ) = α log D(θ) + (1 - α) log G(θ),  α ∈ [0, 1]
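    The blended objective is a one-liner to evaluate; a sketch with made-up values for log D(θ) and log G(θ) (the name `blended_log_objective` is invented here), showing that the endpoints recover the pure criteria:

    ```python
    import numpy as np

    def blended_log_objective(log_D, log_G, alpha):
        """log L(theta) = alpha * log D(theta) + (1 - alpha) * log G(theta)."""
        assert 0.0 <= alpha <= 1.0
        return alpha * log_D + (1.0 - alpha) * log_G

    # Made-up objective values; alpha sweeps from generative to discriminative.
    log_D, log_G = -10.0, -50.0
    curve = [blended_log_objective(log_D, log_G, a) for a in np.linspace(0, 1, 5)]
    ```

    At α = 0 the curve equals log G(θ), at α = 1 it equals log D(θ), with a linear interpolation in between; unlike the hybrid framework below, α is a fixed knob rather than something with a probabilistic interpretation.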

  • A principled hybrid model


  • A principled hybrid model
    θ: posterior distribution of the labels
    θ̃: marginal distribution of the data
    θ and θ̃ communicate through a prior p(θ, θ̃)

    Hybrid objective function:
    L(θ, θ̃) = p(θ, θ̃) ∏n p(cn | xn, θ) ∏n p(xn | θ̃)

  • A principled hybrid model
    θ̃ = θ => p(θ, θ̃) = p(θ) δ(θ̃ - θ)
    L(θ, θ̃) = p(θ) δ(θ̃ - θ) ∏n p(cn | xn, θ) ∏n p(xn | θ̃)
    L(θ) = G(θ): generative case

    θ̃, θ independent => p(θ, θ̃) = p(θ) p(θ̃)
    L(θ, θ̃) = [ p(θ) ∏n p(cn | xn, θ) ] [ p(θ̃) ∏n p(xn | θ̃) ]
    L(θ, θ̃) = D(θ) f(θ̃): discriminative case

  • A principled hybrid model
    Anything in between: hybrid case

    Choice of prior:
    p(θ, θ̃) = p(θ) N(θ̃ | θ, σ(a))
    a → 0 => σ → 0 => θ̃ = θ (generative case)
    a → 1 => σ → ∞ => θ̃ and θ independent (discriminative case)
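    The role of σ in the coupling prior N(θ̃ | θ, σ) can be illustrated numerically; `log_coupling` and the parameter values below are invented for this sketch, assuming an isotropic Gaussian prior:

    ```python
    import numpy as np

    def log_coupling(theta, theta_tilde, sigma):
        """log N(theta_tilde | theta, sigma^2 I): the prior term coupling the
        discriminative parameters theta to the generative ones theta_tilde."""
        d = theta.size
        sq = np.sum((theta_tilde - theta) ** 2)
        return -0.5 * d * np.log(2.0 * np.pi * sigma ** 2) - sq / (2.0 * sigma ** 2)

    theta = np.array([0.0, 0.0])
    near = np.array([0.1, 0.1])   # theta_tilde close to theta
    far = np.array([3.0, 3.0])    # theta_tilde far from theta

    # Small sigma: a huge penalty gap pins theta_tilde to theta (generative limit).
    tight_gap = log_coupling(theta, near, 0.1) - log_coupling(theta, far, 0.1)
    # Large sigma: the gap vanishes and the two sets decouple (discriminative limit).
    loose_gap = log_coupling(theta, near, 100.0) - log_coupling(theta, far, 100.0)
    ```

    This is what makes the framework "anything in between": σ continuously trades the δ-function coupling (generative) against full independence (discriminative).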

  • Why principled?
    Consistent with the likelihood of graphical models => one way to train a system

    Everything can now be modelled => potential to be Bayesian

    Potential to learn the trade-off parameter a

  • Learning
    EM / Laplace approximation / MCMC: either intractable or too slow

    Conjugate gradients: flexible and easy to check, BUT sensitive to initialisation and slow

    Variational inference

  • Content
    Generative and discriminative methods
    A principled hybrid framework
    Study of the properties on a toy example
    Influence of the amount of labelled data

  • Toy example

  • Toy example
    2 elongated distributions

    Only spherical Gaussians allowed => wrong model

    2 labelled points per class => strong risk of overfitting

  • Toy example

  • Decision boundaries

  • Content
    Generative and discriminative methods
    A principled hybrid framework
    Study of the properties on a toy example
    Influence of the amount of labelled data

  • A real example
    Images are a special case, as each image contains several features

    2 levels of supervision: at the image level and at the feature level
    Image label only => weakly labelled
    Image label + segmentation => fully labelled

  • The underlying generative model (figure: Gaussian and multinomial components)

  • The underlying generative model (figure: weakly vs. fully labelled cases)

  • Experimental set-up
    3 classes: bikes, cows, sheep

    θ̃: 1 Gaussian per class => poor generative model

    75 training images for each category

  • HF framework

  • HF versus CC

  • Results
    When increasing the proportion of fully labelled data, the trend is:
    generative => hybrid => discriminative

    Weakly labelled data has little influence on the trend

    With sufficient fully labelled data, HF tends to perform better than CC

  • Experimental set-up
    3 classes: lions, tigers and cheetahs

    θ̃: 1 Gaussian per class => poor generative model

    75 training images for each category

  • HF framework

  • HF versus CC

  • Results
    Hybrid models consistently perform better

    However, generative and discriminative models haven't reached saturation

    No clear difference between HF and CC

  • Conclusion
    Principled hybrid framework

    Possibility to learn the best trade-off

    Helps for ambiguous datasets when labelled data is scarce

    Problem of optimisation

  • Future avenues
    Bayesian version (posterior distribution of σ) under study

    Replace σ by a diagonal matrix to allow more flexibility => need for the Bayesian version

    Choice of priors

  • Thank you!
