hybrid generative-discriminative visual...

of 39/39
Hybrid Generative-Discriminative Visual Categorization. Alex D. Holub 1 , Max Welling 2 , Pietro Perona 1 1 Computation and Neural Systems 2 Department of Computer Science California Institute of Technology, MC 136-93 University of California Irvine Pasadena, CA 91125 Irvine, CA 92697-3425 [email protected] [email protected] Abstract Learning models for detecting and classifying object categories is a challeng- ing problem in machine vision. While discriminative approaches to learning and classification have, in principle, superior performance, generative approaches provide many useful features, one of which is the ability to naturally establish explicit correspondence between model components and scene features – this, in turn, allows for the handling of missing data and unsupervised learning in clut- ter. We explore a hybrid generative/discriminative approach, using ‘Fisher Ker- nels’ [12], which retains most of the desirable properties of generative methods, while increasing the classification performance through a discriminative setting. Our experiments, conducted on a number of popular benchmarks, show strong performance improvements over the corresponding generative approach. In ad- dition, we demonstrate how this hybrid learning paradigm can be extended to address several outstanding challenges within computer vision including how to combine multiple object models and learning with unlabelled data. 1

Post on 07-Feb-2018

214 views

Category:

Documents

1 download

Embed Size (px)

TRANSCRIPT

  • Hybrid Generative-Discriminative VisualCategorization.

    Alex D. Holub1, Max Welling2, Pietro Perona1

    1Computation and Neural Systems 2 Department of Computer Science

    California Institute of Technology, MC 136-93 University of California Irvine

    Pasadena, CA 91125 Irvine, CA 92697-3425

    [email protected] [email protected]

    AbstractLearning models for detecting and classifying object categories is a challeng-ing problem in machine vision. While discriminative approaches to learning andclassification have, in principle, superior performance, generative approachesprovide many useful features, one of which is the ability to naturally establishexplicit correspondence between model components and scene features this, inturn, allows for the handling of missing data and unsupervised learning in clut-ter. We explore a hybrid generative/discriminative approach, using Fisher Ker-nels [12], which retains most of the desirable properties of generative methods,while increasing the classification performance through a discriminative setting.Our experiments, conducted on a number of popular benchmarks, show strongperformance improvements over the corresponding generative approach. In ad-dition, we demonstrate how this hybrid learning paradigm can be extended toaddress several outstanding challenges within computer vision including how tocombine multiple object models and learning with unlabelled data.

    1

  • 1 IntroductionDetecting and classifying objects and object categories in images is currently oneof the most interesting, useful, and difficult challenges for machine vision. Muchprogress has been made during the past decade in formulating models that capturethe visual and geometrical statistics of natural objects, in designing algorithmsthat can quickly match these models to images, and in developing learning tech-niques that can estimate these models from training images with limited supervi-sion [1, 26, 30, 8, 16, 5, 24, 15, 11]. However, our best algorithms are not close tomatching human abilities. Machine vision systems are at least two orders of mag-nitude worse than humans in several aspects including the number of categoriesthat can be learned and recognized, the classification error rates, the classificationspeed, and the ease and flexibility with which new categories can be learned.

    This work is motivated by the challenge of learning to recognize categoriesthat look similar to one another. A number of methods have shown good perfor-mance on dissimilar categories (for example airplanes, automobiles, spotted cats,faces and motorcycles as in [8, 5]). None of these methods has been shown to per-form well on visual categories which look similar to one another such as bicyclesand motorcycles or male and female faces. For example, while the constellationmodel [8] has error rates of a few percent on dissimilar categories such as facesvs. airplanes and cars vs. cats, it has error rates around 30% if it is asked to rec-ognize faces of different people (see the x axis of the plots in Fig. 1). Why doesthis discrepancy exist? As we shall see, one potential confound is the underlyinggenerative learning algorithm.

    Learning and classification methods fall into two broad categories (see Fig-ure 2). Let y be the label of the class and x the measured data associated with thatclass. A generative approach will estimate the joint probability density functionp(x, y) (or, equivalently, p(x|y) and p(y)) and will classify using p(y|x) whichis obtained using Bayes rule. Conversely, discriminative approaches will esti-mate p(y|x) (or, alternatively, a classification function y = f(x)) directly fromthe data. It has been argued that the discriminative approach results in superiorperformance, i.e. why bother learning the details for models of different classesif we can directly learn a criteria for discriminating between the classes [27]? In-deed, it was shown that the asymptotic (in the number of training examples) errorof discriminative methods is lower than for generative ones when using simplelearning models [17].

    Yet, among machine vision researchers generative models remain popular [26,30, 8, 5, 15, 20]. There are at least five good reasons why generative approaches

    2