ibug talk dec 2014 - imperial college london · pdf file.as_vector() .from_vector() x⇤ ......

Menpo

A Comprehensive Platform for Parametric Image Alignment and Visual Deformable Models

Joan Alabort-i-Medina Epameinondas Antonakos

James Booth Patrick Snape

Supervised by Stefanos Zafeiriou

A Comprehensive Platform for Parametric Image Alignment and Visual Deformable Models

Topics

• What Menpo does

• Why we made Menpo

• A tour of the Menpo Libraries

• Demonstration

• How iBUG researchers can use Menpo

• Upcoming talks

1/12

BEHAVIOURAL ANALYSIS

how interested is this person?

is this person lying?

does this person have a medical

disorder?

RECOGNITIONis this James

Booth?

is this person happy?

FEATURE POINT TRACKINGhow does the

nose tip move in this video?

RECONSTRUCTIONhow would this person

look in 3D?how would this sad person

look if they were happy?

SEMANTIC IMAGE ANALYSIS

2/12

BEHAVIOURAL ANALYSIS

how interested is this person?

is this person lying?

does this person have a medical

disorder?

RECOGNITIONis this James

Booth?

is this person happy?

FEATURE POINT TRACKINGhow does the

nose tip move in this video?

RECONSTRUCTIONhow would this person

look in 3D?how would this sad person

look if they were happy?FEATURE POINT LOCALISATION

where is the nose-tip in this

image?

OBJECT DETECTION

where is the face in this image?

2/12

Menpo is not specific to faces!

Motivation

• iBUG ! Matlab

• Each researcher prepares for papers independently

• Isolated scripts, not reusable frameworks

• Our dream in 2012:

• What if we had a shared, well tested, ever improving codebase that we all collaborated on?

• If we did we’d call it Menpo!

3/12

Why Python? - the best of both worlds

INTERACTIVE LINEAR

ALGEBRA

Matlab

Octave

Mathematica

JuliaGENERAL PURPOSE LANGUAGE

C++

C#

Ruby

Java

4/12

R E S E A R C HA P P L I C A T I O N S

menpofit menpo3d menpodetect

automatic image

annotation

M E N P O L I B R A R I E S

menpo

boothiccv2016 …facial point

tracker

emotion detector

…

5/12

6/12M E N P O L I B R A R I E S

menpomenpofit menpo3d menpodetect

ImagesIO Transforms Statistical Models

LandmarksShapes Visualization Vectorization

.JPG .PNG .BMP .TIFF

…

.PTS

.LM2 .GIF .LAN

…

PointCloud TriMesh

PointTree…

Image

AffineHomogeneousProcrustes

ThinPlateSplinesPiecewiseAffine

…

.as_vector()

.from_vector()

x

⇤ = x̄+X

i

↵ixiMaskedImage

6/12M E N P O L I B R A R I E S

menpomenpofit menpo3d menpodetect

ImagesIO Transforms Statistical Models

LandmarksShapes Visualization Vectorization

.JPG .PNG .BMP .TIFF

…

.PTS

.LM2 .GIF .LAN

…

PointCloud TriMesh

PointTree…

Image

AffineHomogeneousProcrustes

ThinPlateSplinesPiecewiseAffine

…

.as_vector()

.from_vector()

x

⇤ = x̄+X

i

↵ixiMaskedImage

Vectorization

7/12

M E N P O L I B R A R I E S

menpo

menpofit

menpo3d

menpodetect

Active Appearance

Models

Constrained Local

Models

Supervised Descent Method

Frontal Face Detector

(mesh) IO

Benchmark*

OpenGL Rasterizer

Morphable Models*

*in development Feature Point Localisation

3D Model Construction

Object detection

8/12

Blending the best from the scientific software community and contributing back

computation 2D plotting 3D plotting machine learning

image importing 3D asset importing object detection interactive computing

OpenGL management image warping image features dependency management

PIL

9/12

Implemented Research Unified design accentuates similarities across papers

Adaptive and Constrained Algorithms for Inverse Compositional

Active Appearance Model Fitting

George Papandreou and Petros MaragosSchool of E.C.E., National Technical University of Athens, Greece

http://cvsp.cs.ntua.gr∗

Abstract

Parametric models of shape and texture such as Ac-

tive Appearance Models (AAMs) are diverse tools for de-

formable object appearance modeling and have found im-

portant applications in both image synthesis and analy-

sis problems. Among the numerous algorithms that have

been proposed for AAM fitting, those based on the inverse-

compositional image alignment technique have recently re-

ceived considerable attention due to their potential for high

efficiency. However, existing fitting algorithms perform

poorly when used in conjunction with models exhibiting

significant appearance variation, such as AAMs trained

on multiple-subject human face images. We introduce two

enhancements to inverse-compositional AAM matching al-

gorithms in order to overcome this limitation. First, we

propose fitting algorithm adaptation, by means of (a) fit-

ting matrix adjustment and (b) AAM mean template up-

date. Second, we show how prior information can be in-

corporated and constrain the AAM fitting process. The

inverse-compositional nature of the algorithm allows effi-

cient implementation of these enhancements. Both tech-

niques substantially improve AAM fitting performance, as

demonstrated with experiments on publicly available multi-

person face datasets.

1. Introduction

Parametric models of shape and texture, such as Ac-tive Appearance Models [8], Active Blobs [25], MorphableModels [18], and other related approaches [5, 11, 16] arewidely used techniques for object appearance modeling.Employing a number of parameters controlling shape andtexture variation, these models bring a target image into reg-istration with a reference template, even in cases that thetarget image is a deformed version of the template; imaging

∗This work was supported in part by grant ΠENE∆-2003-E∆865 [co-financed by E.U.-European Social Fund (75%) and the Greek Ministryof Development-GSRT (25%)] and in part by the European Communityprojects ASPI, MUSCLE, and HIWIRE.

conditions such as camera position and object illuminationcan also differ significantly between the template and thetarget image. Active Appearance Models can additionalyrepresent appearance variability in a whole class of objects,such as faces or cars, after learning it from examples duringa training phase. Such parametric models can be applied toboth image synthesis and analysis problems. Representativeapplications are object tracking in video [25], face recogni-tion [6], face synthesis [18], and image stitching [26].

An important issue with AAMs concerns matching themto images by finding the parameters that minimize the dis-crepancy between observed and synthesized object appear-ances, possibly also including a prior penalty on the pa-rameter values. This is a difficult non-linear optimizationtask and general-purpose optimization procedures can beinneficient. Most existing AAM fitting algorithms solvethe matching problem by iteratively updating the model pa-rameters, assuming that there is a fixed AAM fitting matrix

which maps the synthesis error image to model parameterincrements; this matrix is learned in a precomputation phaseand is subsequently used unaltered, resulting in a very ef-ficient class of algorithms, reviewed in Section 2. How-ever, the fixed mapping approach reaches its limits whenone works with models allowing considerable appearancedeviation from the mean template, such as AAMs built onlarge multi-person face datasets [3, 10].

As has been highlighted in [2], in performing incremen-tal image registration one can choose between (a) updat-ing the warp parameters either additively or composition-ally and (b) calculating incremental warps and image gra-dients either forwardly on the target image or inversely onthe model template. In our work, similarly to [2,12,22], weuse the inverse compositional combination because it facil-itates efficient calculations, since manipulating the modeltemplate instead of the target image allows precomputinguseful quantities and thus accelerates image fitting.

Our first main contribution is that we introduce twocomplementary adaptation mechanisms in inverse com-positional AAM fitting. The first adaptation mechanismamounts to AAM fitting matrix adjustment and compensates

In Proc. IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR-2008), Anchorage, AL, June 2008

Parametric Image Alignment Using EnhancedCorrelation Coefficient Maximization

Georgios D. Evangelidis andEmmanouil Z. Psarakis

Abstract—In this work, we propose the use of a modified version of the

correlation coefficient as a performance criterion for the image alignment problem.

The proposed modification has the desirable characteristic of being invariant with

respect to photometric distortions. Since the resulting similarity measure is a

nonlinear function of the warp parameters, we develop two iterative schemes for

its maximization, one based on the forward additive approach and the second on

the inverse compositional method. As is customary in iterative optimization, in

each iteration, the nonlinear objective function is approximated by an alternative

expression for which the corresponding optimization is simple. In our case, we

propose an efficient approximation that leads to a closed-form solution (per

iteration) which is of low computational complexity, the latter property being

particularly strong in our inverse version. The proposed schemes are tested

against the Forward Additive Lucas-Kanade and the Simultaneous Inverse

Compositional (SIC) algorithm through simulations. Under noisy conditions and

photometric distortions, our forward version achieves more accurate alignments

and exhibits faster convergence, whereas our inverse version has similar

performance as the SIC algorithm but at a lower computational complexity.

Index Terms—Image registration, motion estimation, gradient methods,

parametric motion, correlation coefficient.

Ç

1 INTRODUCTION

THE parametric image alignment problem consists of finding atransformation which aligns two image profiles. The profiles caneither be entire images, as in the image registration problem [1],[2], or subimages, as in the region tracking [3], [4], [5], motionestimation [6], [7], [8], [9], and stereo correspondence [10], [11]problems. In image registration, the alignment problem needs tobe solved only once, whereas, in region tracking, a template imagehas to be matched over a sequence of images. Finally, in motionestimation and stereo correspondences, the goal is to find thecorrespondence for all image points in a pair of images.

The alignment problem can be seen as a mapping between thecoordinate systems of two images; therefore, the first step towardits solution is the suitable selection of a geometric transformationthat adequately models this mapping. Existing models arebasically parametric [12] and their exact form heavily dependson the specific application and the strategy selected to solve thealignment problem [3], [13]. The class of affine transformationsand, in particular, several special cases (as pure translation) havebeen the center of attention in many applications [1], [2], [3], [4],[6], [10], [11], [13]. Alternative approaches rely on projectivetransformations (homography) and, more generally, on nonlineartransformations [5], [13], [14], [15].

Once the geometric parametric transformation has been defined,the alignment problem reduces itself to a parameter estimationproblem. Therefore, the second step toward its solution consists of

coming up with an appropriate performance measure, that is, anobjective function. The latter, when optimized, will yield theoptimum parameter estimates. Most existing approaches adoptmeasures that rely on lp norms of the error between either the wholeimage profiles (pixel-based techniques) or a specific feature of theimage profiles (feature-based techniques) [12]. Clearly, the l2 norm isby far the most popular selection so far [1], [3], [6], [7], [9], [10], [13],[15], [16]. The l2-based objective function is usually referred to as theSum-Squared-Differences (SSD) measure and the correspondingoptimization problem is known as the SSD technique [5], [9].Variations on this approach have been proposed for the importantproblem of optical flow determination [5], [7], [17], and robustversions that can combat outliers were developed in [18].

For the optimum parameter estimation, all existing objectivefunctions require nonlinear optimization techniques. Dependingon the adopted solution strategy, the corresponding techniquescan be broadly classified into two categories. The first includesgradient-based or differential approaches and the second includesdirect search techniques [12]. Gradient-based schemes, because oftheir low computational cost, are regarded as more well fitted toCV applications [13], [19]. They are, however, characterized bynoticeable convergence failure whenever homogeneous areasand/or single slanted edges (aperture problem [20]) are present.Meaningless estimates may also arise whenever we have strongdisplacement values. Direct search techniques, on the other hand,do not suffer the latter drawback. Indeed, these approaches caneasily accommodate large motions since they rely on global imagesearches. Unfortunately, the latter require an exceedingly highcomputational cost, which becomes more intense in the cases offine quantization needed in the case of accurate estimates [6].Efforts to reduce complexity by adopting interpolation instead offine quantization or hybrid techniques that combine the twoclasses can be found in [9], [15].

A common assumption encountered in most existing techniquesis the brightness constancy of corresponding points or regions in thetwo profiles [20]. However, this assumption is valid only in specificcases and it is obviously violated under varying illuminationconditions. There, it becomes clear that, in a practical situation, it isimportant that the alignment algorithm be able to take into accountillumination changes. Alignment techniques that compensate forphotometric distortions in contrast and brightness have beenproposed in [1], [6], [8], [10], [16]. Alternative schemes make useof a set of basis images for handling arbitrary lighting conditions [3],[21] or use spatially dependent photometric models [7].

In this paper, we adopt a recently proposed similarity measure[11], the enhanced correlation coefficient, as our objective function forthe alignment problem. Our measure is characterized by two verydesirable properties. First, it is invariant to photometric distortionsin contrast and brightness. Second, although it is a nonlinearfunction of the parameters, the iterative scheme we are going todevelop for the optimization problem will turn out to be linear,thus requiring reduced computational complexity. Despite theresemblance of our final algorithm to well-known variants of theLucas-Kanade alignment method which take lighting changes intoaccount [10], [19], its performance, as we are going to see, isnotably superior. We would like to mention that the enhancedcorrelation coefficient criterion was successfully applied to theproblem of 1D translation estimation in stereo correspondence [11]and 2D translation estimation in registration [2].

The remainder of this paper is organized as follows: InSection 2, we formulate the parametric image alignment problem.Section 3 contains our main analytic results, namely, the definitionof our objective function, the development of a forward and aninverse compositional iterative scheme for its optimization, andthe relation of the proposed schemes to existing SSD techniques. InSection 4, our schemes are tested in a number of experiments

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 30, NO. 10, OCTOBER 2008 1

. The authors are with the Signal Processing and Communications Lab,Department of Computer Engineering and Informatics, University ofPatras, 26504 Rio-Patras, Greece.E-mail: {evagelid, psarakis}@ceid.upatras.gr.

Manuscript received 17 Jan. 2007; revised 12 Feb. 2008; accepted 7 Apr. 2008;published online 2 May 2008.Recommended for acceptance by F. Dellaert.For information on obtaining reprints of this article, please send e-mail to:[email protected], and reference IEEECS Log NumberTPAMI-0026-0107.Digital Object Identifier no. 10.1109/TPAMI.2008.113.

0162-8828/08/$25.00 ! 2008 IEEE Published by the IEEE Computer Society

Supervised Descent Method and its Applications to Face Alignment

Xuehan Xiong Fernando De la Torre

The Robotics Institute, Carnegie Mellon University, Pittsburgh PA, [email protected] [email protected]

Abstract

Many computer vision problems (e.g., camera calibra-tion, image alignment, structure from motion) are solvedthrough a nonlinear optimization method. It is generallyaccepted that 2nd order descent methods are the most ro-bust, fast and reliable approaches for nonlinear optimiza-tion of a general smooth function. However, in the context ofcomputer vision, 2nd order descent methods have two maindrawbacks: (1) The function might not be analytically dif-ferentiable and numerical approximations are impractical.(2) The Hessian might be large and not positive definite.

To address these issues, this paper proposes a SupervisedDescent Method (SDM) for minimizing a Non-linear LeastSquares (NLS) function. During training, the SDM learnsa sequence of descent directions that minimizes the meanof NLS functions sampled at different points. In testing,SDM minimizes the NLS objective using the learned descentdirections without computing the Jacobian nor the Hes-sian. We illustrate the benefits of our approach in syntheticand real examples, and show how SDM achieves state-of-the-art performance in the problem of facial feature detec-tion. The code is available at www.humansensing.cs.cmu.edu/intraface.

1. Introduction

Mathematical optimization has a fundamental impact insolving many problems in computer vision. This fact isapparent by having a quick look into any major confer-ence in computer vision, where a significant number of pa-pers use optimization techniques. Many important prob-lems in computer vision such as structure from motion, im-age alignment, optical flow, or camera calibration can beposed as solving a nonlinear optimization problem. Thereare a large number of different approaches to solve thesecontinuous nonlinear optimization problems based on firstand second order methods, such as gradient descent [1] fordimensionality reduction, Gauss-Newton for image align-ment [22, 5, 14] or Levenberg-Marquardt for structure frommotion [8].

“I am hungry. Where is the apple? Gotta do Gradient descent”

𝑓 𝐱 = ℎ 𝐱 − 𝐲 2

∆𝐱 = −𝐇(𝐱𝑘)−1𝐉𝑓 𝐱𝑘

𝐱∗

(𝐚)

𝑓 𝐱 = ℎ 𝐱 − 𝐲 2

𝑓(𝐱, 𝐲1)

𝐱∗1 𝐱∗3 𝐱∗2

(𝐛) ∆𝐱𝟏 = 𝐑𝑘 ×

𝑓(𝐱, 𝐲2) 𝑓(𝐱, 𝐲3)

∆𝐱𝟐 = 𝐑𝑘 × ∆𝐱𝟑 = 𝐑𝑘 ×

Figure 1: a) Using Newton’s method to minimize f(x). b) SDMlearns from training data a set of generic descent directions {R

k

}.Each parameter update (�x

i) is the product of Rk

and an image-specific component (yi), illustrated by the 3 great Mathematicians.Observe that no Jacobian or Hessian approximation is needed attest time. We dedicate this figure to I. Newton, C. F. Gauss, and J.L. Lagrange for their everlasting impact on today’s sciences.

Despite its many centuries of history, the Newton’smethod (and its variants) is regarded as a major optimiza-tion tool for smooth functions when second derivatives areavailable. Newton’s method makes the assumption thata smooth function f(x) can be well approximated by aquadratic function in a neighborhood of the minimum. Ifthe Hessian is positive definite, the minimum can be foundby solving a system of linear equations. Given an initial es-timate x0 2 <p⇥1, Newton’s method creates a sequence ofupdates as

xk+1 = x

k

�H�1(x

k

)Jf

(xk

), (1)

where H(xk

) 2 <p⇥p and Jf

(xk

) 2 <p⇥1 are the Hessianmatrix and Jacobian matrix evaluated at x

k

. Newton-typemethods have two main advantages over competitors. First,when it converges, the convergence rate is quadratic. Sec-ond, it is guaranteed to converge provided that the initialestimate is sufficiently close to the minimum.

1

Int J Comput Vis (2011) 91: 200–215DOI 10.1007/s11263-010-0380-4

Deformable Model Fitting by Regularized Landmark Mean-Shift

Jason M. Saragih · Simon Lucey · Jeffrey F. Cohn

Received: 12 May 2009 / Accepted: 10 September 2010 / Published online: 25 September 2010© Springer Science+Business Media, LLC 2010

Abstract Deformable model fitting has been actively pur-sued in the computer vision community for over a decade.As a result, numerous approaches have been proposed withvarying degrees of success. A class of approaches that hasshown substantial promise is one that makes independentpredictions regarding locations of the model’s landmarks,which are combined by enforcing a prior over their jointmotion. A common theme in innovations to this approachis the replacement of the distribution of probable landmarklocations, obtained from each local detector, with simplerparametric forms. In this work, a principled optimizationstrategy is proposed where nonparametric representationsof these likelihoods are maximized within a hierarchy ofsmoothed estimates. The resulting update equations are rem-iniscent of mean-shift over the landmarks but with regu-larization imposed through a global prior over their jointmotion. Extensions to handle partial occlusions and reducecomputational complexity are also presented. Through nu-merical experiments, this approach is shown to outperformsome common existing methods on the task of generic facefitting.

J.M. Saragih (!)ICT Center, CSIRO, Cnr Vimiera and Pembroke Rds, Sydney,NSW 2122, Australiae-mail: [email protected]

S. LuceyICT Center, CSIRO, 1 Technology Court Pullenvale, Brisbane,QLD 4069, Australiae-mail: [email protected]

J.F. CohnRobotics Institute, Carnegie Mellon University, 5000 ForbesAvenue, Pittsburgh, PA 15213, USAe-mail: [email protected]

Keywords Deformable · Registration · Mean-shift

1 Introduction

Deformable model fitting is the problem of registering aparametrized shape model to an image such that its land-marks correspond to consistent locations on the object ofinterest. It is a difficult problem as it involves an opti-mization in high dimensions, where appearance can varygreatly between instances of the object due to lighting con-ditions, image noise, resolution and intrinsic sources ofvariability. Many approaches have been proposed for thisproblem with varying degrees of success. Of these, oneof the most promising is that which models an object us-ing local spatially-coherent image observations (i.e. imagepatches) centered around landmarks of interest within theobject (Cootes and Taylor 1992; Cristinacce and Cootes2004, 2006, 2007; Wang et al. 2008a). For computationand generalization purposes, these image patches are as-sumed to be conditionally independent of one another, anassumption that has shown superior performance in com-parison to holistic approaches in recent literature (Liu 2007;Matthews and Baker 2004; Nguyen and De la Torre Frade2008; Zhou and Comaniciu 2007). Local image patch de-tectors are typically learned, from labeled training images,for each landmark in the object. Due to their small localsupport and large appearance variation in training, however,these local detectors are plagued by the problem of ambigu-ity. This ambiguity can be observed in the non-parametricdistribution of landmark locations (i.e., the response map)obtained from each landmark detector. The central dilemmaaddressed in this work is how to synergetically employ thesenon-parametric measures of the likely locations for eachlandmark, while limiting the effects of their ambiguity, whenfitting the deformable model.

International Journal of Computer Vision 56(3), 221–255, 2004c⃝ 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.

Lucas-Kanade 20 Years On: A Unifying Framework

SIMON BAKER AND IAIN MATTHEWSThe Robotics Institute, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA

[email protected]

[email protected]

Received July 10, 2002; Revised February 6, 2003; Accepted February 7, 2003

Abstract. Since the Lucas-Kanade algorithm was proposed in 1981 image alignment has become one of the mostwidely used techniques in computer vision. Applications range from optical flow and tracking to layered motion,mosaic construction, and face coding. Numerous algorithms have been proposed and a wide variety of extensionshave been made to the original formulation. We present an overview of image alignment, describing most of thealgorithms and their extensions in a consistent framework. We concentrate on the inverse compositional algorithm,an efficient algorithm that we recently proposed. We examine which of the extensions to Lucas-Kanade can be usedwith the inverse compositional algorithm without any significant loss of efficiency, and which cannot. In this paper,Part 1 in a series of papers, we cover the quantity approximated, the warp update rule, and the gradient descentapproximation. In future papers, we will cover the choice of the error function, how to allow linear appearancevariation, and how to impose priors on the parameters.

Keywords: image alignment, Lucas-Kanade, a unifying framework, additive vs. compositional algorithms, for-wards vs. inverse algorithms, the inverse compositional algorithm, efficiency, steepest descent, Gauss-Newton,Newton, Levenberg-Marquardt

1. Introduction

Image alignment consists of moving, and possibly de-forming, a template to minimize the difference betweenthe template and an image. Since the first use of im-age alignment in the Lucas-Kanade optical flow al-gorithm (Lucas and Kanade, 1981), image alignmenthas become one of the most widely used techniquesin computer vision. Besides optical flow, some of itsother applications include tracking (Black and Jepson,1998; Hager and Belhumeur, 1998), parametric andlayered motion estimation (Bergen et al., 1992), mo-saic construction (Shum and Szeliski, 2000), medicalimage registration (Christensen and Johnson, 2001),and face coding (Baker and Matthews, 2001; Cooteset al., 1998).

The usual approach to image alignment is gradi-ent descent. A variety of other numerical algorithms

such as difference decomposition (Gleicher, 1997) andlinear regression (Cootes et al., 1998) have also beenproposed, but gradient descent is the defacto standard.Gradient descent can be performed in variety of dif-ferent ways, however. One difference between the var-ious approaches is whether they estimate an additiveincrement to the parameters (the additive approach(Lucas and Kanade, 1981)), or whether they estimatean incremental warp that is then composed with thecurrent estimate of the warp (the compositional ap-proach (Shum and Szeliski, 2000)). Another differenceis whether the algorithm performs a Gauss-Newton, aNewton, a steepest-descent, or a Levenberg-Marquardtapproximation in each gradient descent step.

We propose a unifying framework for image align-ment, describing the various algorithms and their ex-tensions in a consistent manner. Throughout the frame-work we concentrate on the inverse compositional

Robust and Efficient Parametric Face Alignment

Georgios Tzimiropoulos † Stefanos Zafeiriou††Dept. of Computing,

Imperial College London180 Queen’s Gate

London SW7 2AZ, U.K.{gt204,s.zafeiriou,m.pantic}@imperial.ac.uk

Maja Pantic †,∗

∗EEMCSUniversity of Twente

Drienerlolaan 57522 NB EnschedeThe Netherlands ∗

Abstract

We propose a correlation-based approach to parametric

object alignment particularly suitable for face analysis ap-

plications which require efficiency and robustness against

occlusions and illumination changes. Our algorithm regis-

ters two images by iteratively maximizing their correlation

coefficient using gradient ascent. We compute this corre-

lation coefficient from complex gradients which capture the

orientation of image structures rather than pixel intensities.

The maximization of this gradient correlation coefficient re-

sults in an algorithm which is as computationally efficient

as ℓ2 norm-based algorithms, can be extended within the in-verse compositional framework (without the need for Hes-

sian re-computation) and is robust to outliers. To the best

of our knowledge, no other algorithm has been proposed so

far having all three features. We show the robustness of our

algorithm for the problem of face alignment in the presence

of occlusions and non-uniform illumination changes. The

code that reproduces the results of our paper can be found

at http://ibug.doc.ic.ac.uk/resources.

1. Introduction

Object alignment methods aim at finding the transforma-tion or deformation which minimizes the discrepancies be-tween two or more images/objects. In automated face anal-ysis, these discrepancies usually stem from rigid head mo-tions induced by observing faces at different time instancesand from different viewpoints as well as from non-rigid fa-cial deformations induced by facial expressions. Alignmentmethods aim at estimating these motions and, therefore,play a central role in the efficacy and robustness of high-level applications such as face recognition, speech reading

∗This work has been funded by the European Research Council underthe ERC Starting Grant agreement no. ERC- 2007-StG-203143 (MAH-NOB).

and facial expression analysis.

In this work, we focus on object alignment methodsbased on gradient descent optimization. Since the first al-gorithm of this type, the Lucas-Kanade (LK) algorithm [1],gradient descent has become one of the key ingredients inface alignment algorithms. Numerous extensions to theLK algorithm have been proposed to address issues relatedto efficiency [2–5], generalization capacity [2, 6, 7], opti-mization [8, 9] and robustness [10–13]. Most prior workis based on ℓ2 norm minimization. The ℓ2 norm is thestandard choice [1, 2, 5, 13, 14], as it can result in compu-tationally efficient algorithms. Perhaps, the most notableexample of such algorithms is the inverse compositional al-gorithm proposed by Baker and Matthews [4, 5]. At eachiteration, the method solves a linear least squares problemwith the Hessian pre-computed and constant across itera-tions. As usual, the choice of the norm imposes a trade-offbetween robustness and computational complexity. Robustapproaches typically replace the ℓ2 norm with a robust errorfunction [10, 11]. Such methods solve a re-weighted leastsquares problem, where the weights are updated at eachiteration. This additional computation makes them muchslower. For example, replacing the ℓ2 norm with a robustfunction within the inverse compositional framework, re-quires the re-computation of the Hessian at each iteration,resulting in a less efficient algorithm [11].

In this paper, we propose a new cost function for gradientascend face alignment: the maximization of the correlationof image gradient orientations. The use of this correlationcoefficient has been motivated by the recent success of FFT-based gradient correlation methods for the robust estimationof translational displacements [15–17]. More specifically,we use a correlation coefficient which takes the form of thesum of cosines of gradient orientation differences. The useof gradient orientation differences is the key to the robust-ness of the proposed scheme. As it is was shown in [15,18],local orientation mismatches caused by outliers can be well-described by a uniform distribution which, under a number

Generic Active Appearance Models Revisited

Georgios Tzimiropoulos1,2, Joan Alabort-i-Medina1, Stefanos Zafeiriou1, andMaja Pantic1,3

1 Department of Computing, Imperial College London, United Kingdom2 School of Computer Science, University of Lincoln, United Kingdom

3 Faculty of Electrical Engineering, Mathematics and Computer Science, Universityof Twente, The Netherlands

{gt204, ja310, s.zafeiriou, m.pantic}@imperial.ac.uk

Abstract. The proposed Active Orientation Models (AOMs) are gen-erative models of facial shape and appearance. Their main di↵erenceswith the well-known paradigm of Active Appearance Models (AAMs)are (i) they use a di↵erent statistical model of appearance, (ii) they areaccompanied by a robust algorithm for model fitting and parameter es-timation and (iii) and, most importantly, they generalize well to unseenfaces and variations. Their main similarity is computational complex-ity. The project-out version of AOMs is as computationally e�cient asthe standard project-out inverse compositional algorithm which is ad-mittedly the fastest algorithm for fitting AAMs. We show that not onlydoes the AOM generalize well to unseen identities, but also it outper-forms state-of-the-art algorithms for the same task by a large margin.Finally, we prove our claims by providing Matlab code for reproducingour experiments (http://ibug.doc.ic.ac.uk/resources).

1 Introduction

Because of their numerous applications in HCI, face analysis/recognition andmedical imaging, the problems of learning and fitting deformable models havebeen the focus of cutting edge research in computer vision and machine learn-ing for more than two decades. Put in simple terms, these problems can besummarized as follows: Learning a deformable model consists of (a) annotating(typically manually) a set of points (or landmarks) over a set of training im-ages capturing an object of interest (e.g. faces), (b) learning a shape model (orpoint distribution model) which e↵ectively represents the structure and varia-tions among the annotated points and (c) learning appearance models from theimage texture associated with the learned shape. Fitting a deformable modelutilizes the learned shape and appearance models to detect the location of land-marks in new images; this can be done using regression, classification or couldbe formulated as a non-linear optimization problem.

Depending on the application and/or approach many terms have been usedto coin this research: deformable model fitting, Active Shape Models (ASMs) [1],Constrained Local Models (CLMs) [2, 3] landmark localization, point detection,

Active Appearance Models Revisited

Iain Matthews and Simon Baker

The Robotics InstituteCarnegie Mellon University

Abstract

Active Appearance Models (AAMs) and the closely related concepts of Morphable Models andActive Blobs are generative models of a certain visual phenomenon. Although linear in both shapeand appearance, overall, AAMs are nonlinear parametric models in terms of the pixel intensities.Fitting an AAM to an image consists of minimising the error between the input image and the clos-est model instance; i.e. solving a nonlinear optimisation problem. We propose an efficient fittingalgorithm for AAMs based on the inverse compositional image alignment algorithm. We showthat the effects of appearance variation during fitting can be precomputed (“projected out”) usingthis algorithm and how it can be extended to include a global shape normalising warp, typically a2D similarity transformation. We evaluate our algorithm to determine which of its novel aspectsimprove AAM fitting performance.

Keywords: Active Appearance Models, AAMs, Active Blobs, Morphable Models, fitting, effi-ciency, Gauss-Newton gradient descent, inverse compositional image alignment.

Lucas-Kanade 20 Years On: A Unifying Framework

Parametric Image Alignment Using Enhanced Correlation Coefficient Maximization

Robust and Efficient Parametric Face Alignment

Adaptive and Constrained Algorithms for Inverse Compositional Active Appearance Model Fitting

Active Appearance Models Revisited

Deformable Model Fitting by Regularized Landmark Mean-Shift

Generic Active Appearance Models Revisited

Supervised Descent Method and its Applications to Face Alignment

10/12

Patrick

Demo

PYTHON COMMAND LINE INTERFACE

Access full power of Menpo

Long term strategy to make your future research easier

Learn a powerful new language (useful outside of research)

Learn a new language

Can’t easily leverage existing code (do you need it?)

Simple interface for common Menpo operations

Short term - no need to learn Python!

Great for comparing against methods

Only scratches the surface of what’s possible

Cannot contribute back to improve Menpo

Upcoming talks

• Held with either ACM Student Chapter/IC Python

• Aim - Improve software engineering in research

Python

• Python basics

• Python for Matlab users

• Menpo basics

• Advanced Python

Git (Version Control)

• Git basics

• Collaborating with Github

• Advanced Git

menpo.org github.com/menpo menpo-users New BSD 500+

Site: Code:

Google Group: Licence:

Unit tests:

menpo v0.4.4

menpofit v0.1.0

menpo3d v0.1.0

menpodetect v0.1.0

ibug talk dec 2014 - imperial college london · pdf file.as_vector() .from_vector() x⇤ ......

Documents