
1

Face Tracking in Videos

Gaurav Aggarwal, Ashok Veeraraghavan, Rama Chellappa

2

Why video?

Illumination, pose, expression

Video
•Multiple images (better hope!)
•Dynamic information (distinguishability?)

3

3D facial pose tracking

The goal is to recover the 3D configuration of a face in each frame of a given video.
•3D configuration: 3 translation parameters and 3 orientation parameters.

Important for applications requiring head normalization, such as face recognition, expression analysis, lip reading, etc.
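As a minimal sketch of what the 6-parameter configuration looks like in code, the function below maps three translations and three angles to a rigid transform. The name `pose_to_matrix`, the parameter order, and the Euler convention (yaw about y, pitch about x, roll about z) are assumptions made for illustration; the slides do not specify them.

```python
import numpy as np

def pose_to_matrix(pose):
    """Map (tx, ty, tz, yaw, pitch, roll) to a 4x4 rigid transform.

    Angles are in radians. The Euler order used here is an assumption
    of this sketch, not something stated in the slides.
    """
    tx, ty, tz, yaw, pitch, roll = pose
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    R_yaw = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])    # about y
    R_pitch = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])  # about x
    R_roll = np.array([[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]])   # about z
    T = np.eye(4)
    T[:3, :3] = R_roll @ R_pitch @ R_yaw
    T[:3, 3] = [tx, ty, tz]
    return T
```

A tracker would estimate one such 6-vector per frame; head normalization then amounts to undoing the recovered transform.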

4

Challenges

Self-occlusions (due to pose changes)

Expression changes

Illumination variation

PS: unlike 2D tracking, pose-based appearance changes are crucial.

5

Earlier approaches

2D appearance based
•Output: region of interest on the image
•3D configuration?

Active appearance models

3D face model based

Cylindrical models
•Inter-frame warping usually assumed to be linear
•Simple inter-frame pose changes

6

Our Approach

Hybrid: geometrical + statistical

Geometric modeling takes care of pose and self-occlusion.

Statistical inference handles tracking under occlusions, illumination and expression variations.

7

The Geometric Model

We use a cylindrical model with an elliptic cross-section.
•The ellipticity becomes important when yaw is high.

Why not a simple planar model?
•Tracking becomes difficult and does not provide 3D pose.

Why not a complicated face model (based on a few laser scans)?
•Very susceptible to errors in initialization and registration.
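A rough sketch of such a model: the function below samples points and outward normals on a cylinder with an elliptic cross-section. The name `elliptic_cylinder`, the axis convention (cylinder axis along y, semi-axes `a` along x and `b` along z), and the grid resolution are assumptions for illustration only.

```python
import numpy as np

def elliptic_cylinder(a, b, height, n_theta=16, n_h=8):
    """Sample cell centres and outward normals on an elliptic cylinder.

    a, b: semi-axes along x (width) and z (depth); the axis runs along y.
    Returns (points, normals), each shaped (n_h, n_theta, 3).
    """
    # Cell centres in angle and height.
    theta = (np.arange(n_theta) + 0.5) * 2 * np.pi / n_theta
    y = (np.arange(n_h) + 0.5) / n_h * height - height / 2
    th, yy = np.meshgrid(theta, y)
    points = np.stack([a * np.cos(th), yy, b * np.sin(th)], axis=-1)
    # The gradient of x^2/a^2 + z^2/b^2 gives the outward normal direction.
    normals = np.stack(
        [np.cos(th) / a, np.zeros_like(th), np.sin(th) / b], axis=-1
    )
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
    return points, normals
```

With `a == b` this reduces to a circular cylinder, which is the case the slide argues is not enough at high yaw.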

8

The Projection model

Orthographic
•Restrictive

Perspective
•Calibration parameters?

We use a perspective projection model and show robustness to errors in focal length.
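The robustness claim can be checked numerically with a small sketch. It assumes a plain pinhole model with the principal point at the origin; `project` and `focal_error_gap` are hypothetical helper names, and the comparison against an object pushed to depth k*Z0 follows the fictitious-cylinder idea used in the focal length slides.

```python
import numpy as np

def project(points, f):
    """Perspective projection of (N, 3) camera-frame points, focal length f."""
    points = np.asarray(points, dtype=float)
    return f * points[:, :2] / points[:, 2:3]

def focal_error_gap(offsets, centre, f0, k):
    """Largest image-plane gap between the true projections and those of
    the same object moved to depth k*Z0 and viewed with focal length k*f0.

    offsets: (N, 3) point offsets relative to the object centre.
    centre:  (X0, Y0, Z0) object centre in the camera frame.
    """
    offsets = np.asarray(offsets, dtype=float)
    X0, Y0, Z0 = centre
    true_xy = project(offsets + [X0, Y0, Z0], f0)
    fict_xy = project(offsets + [X0, Y0, k * Z0], k * f0)
    return np.abs(true_xy - fict_xy).max()
```

For a perfectly flat object the gap is zero up to rounding, and it grows with the object's depth range relative to its distance from the camera.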

9

Errors in focal length assignment (1)

Suppose true focal length = f0

True projections:

Say, assigned focal length = kf0

Consider a fictitious cylinder of same dimensions but placed at (X0, Y0, kZ0)

10


The projections under the assumed f :

If these match the true projections, we are fine

11


Now, if

The assumption means that the depth variations within the object are small
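The equations on the focal length slides did not survive extraction. What follows is a sketch of the argument they appear to make, assuming a standard pinhole model. The point coordinates $(X_0 + \Delta X,\, Y_0 + \Delta Y,\, Z_0 + \Delta Z)$ and the symbols $x, y, x', y'$ are notation introduced here, not taken from the slides.

```latex
% True projection of a cylinder point at
% (X_0 + \Delta X, Y_0 + \Delta Y, Z_0 + \Delta Z) with focal length f_0
x = f_0 \frac{X_0 + \Delta X}{Z_0 + \Delta Z}, \qquad
y = f_0 \frac{Y_0 + \Delta Y}{Z_0 + \Delta Z}

% Same point on the fictitious cylinder at (X_0, Y_0, k Z_0),
% projected with the assigned focal length k f_0
x' = k f_0 \frac{X_0 + \Delta X}{k Z_0 + \Delta Z}
   = f_0 \frac{X_0 + \Delta X}{Z_0 + \Delta Z / k}, \qquad
y' = f_0 \frac{Y_0 + \Delta Y}{Z_0 + \Delta Z / k}

% If |\Delta Z| \ll Z_0 and |\Delta Z| / k \ll Z_0:
x' \approx x \approx f_0 \frac{X_0 + \Delta X}{Z_0}, \qquad
y' \approx y \approx f_0 \frac{Y_0 + \Delta Y}{Z_0}
```

Under this reading, a tracker given the wrong focal length $k f_0$ can explain the same images by placing the cylinder at depth $k Z_0$, so the recovered orientation is largely unaffected as long as the depth variation within the face is small compared with its distance from the camera.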

12

Choice of features

Desirable properties
•Easy to detect and compute
•Robust to occlusions, changes in illumination, expression, etc.

We stress-test our approach by using an extremely simple and easily computable feature.

13

Features

We superimpose a rectangular grid all around the cylinder.

The mean intensities of the visible grid cells constitute the feature vector.

Given a configuration, the grid can be projected onto the image frame and the feature vector can be computed.
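The feature computation described above might be sketched as follows. This is an illustration under several assumptions not found in the slides: each grid cell is represented by a few sample points, intensities are read by nearest-pixel lookup, a cell counts as visible when its normal faces the camera and all its samples land inside the image, and the model is in front of the camera. The function name and all parameter names are hypothetical.

```python
import numpy as np

def grid_feature_vector(image, cell_points, cell_normals, R, t, f, centre):
    """Mean image intensity per visible grid cell of a posed 3D model.

    image:        (H, W) gray-level array.
    cell_points:  (C, S, 3) model-frame sample points, S samples per cell.
    cell_normals: (C, 3) outward unit normals, one per cell.
    R, t:         rotation (3, 3) and translation (3,) of the pose.
    f, centre:    focal length in pixels and principal point (cx, cy).
    Returns (features, visible); hidden cells are set to NaN.
    """
    cam = cell_points @ R.T + t          # model frame -> camera frame
    normals = cell_normals @ R.T
    # A cell faces the camera when its normal opposes the viewing ray.
    visible = np.einsum("cd,cd->c", normals, cam.mean(axis=1)) < 0
    # Perspective projection to the nearest pixel.
    u = np.rint(f * cam[..., 0] / cam[..., 2] + centre[0]).astype(int)
    v = np.rint(f * cam[..., 1] / cam[..., 2] + centre[1]).astype(int)
    inside = (u >= 0) & (u < image.shape[1]) & (v >= 0) & (v < image.shape[0])
    visible &= inside.all(axis=1)
    features = np.full(len(cell_points), np.nan)
    features[visible] = image[v[visible], u[visible]].mean(axis=1)
    return features, visible
```

Because each entry is only a mean intensity, the vector is cheap to evaluate for every candidate pose, which matters when it has to be recomputed for each particle.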

14

Tracking (1)

Dynamic state estimation problem
•State consists of 3D orientation and translation parameters
•We use particle-filter-based inference

15

Tracking (2)

The particle filter (PF) approximates the desired posterior pdf by a set of weighted particles.

Random-walk motion model
•Keeps the tracker generic

The observation model
•Ds is the mapping that transforms an image frame to the feature vector
•N is the feature model
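One frame of such a filter could look like the sketch below. It assumes a standard resample, predict, reweight cycle with Gaussian random-walk noise; the slides' own equations are missing, so the noise distribution, the order of the steps, and the names `particle_filter_step`, `likelihood` and `sigma` are all assumptions.

```python
import numpy as np

def particle_filter_step(particles, weights, likelihood, sigma, rng):
    """One resample/predict/update cycle over 6-D pose particles.

    particles:  (N, 6) array of (tx, ty, tz, yaw, pitch, roll).
    weights:    (N,) normalised weights from the previous frame.
    likelihood: callable mapping an (N, 6) array to (N,) positive scores.
    sigma:      (6,) standard deviations of the random-walk noise.
    rng:        a numpy.random.Generator.
    """
    n = len(particles)
    # Resample in proportion to the previous weights.
    idx = rng.choice(n, size=n, p=weights)
    # Random-walk motion model: new state = old state + Gaussian noise.
    predicted = particles[idx] + rng.normal(0.0, sigma, size=(n, 6))
    # Reweight by how well each predicted pose explains the new frame.
    w = likelihood(predicted)
    return predicted, w / w.sum()
```

The pose estimate for the frame can then be read off as the weighted mean `weights @ particles` or as the highest-weight particle. Nothing in the motion model is specific to faces, which is the sense in which it keeps the tracker generic.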

16

Tracking (3)

Likelihood of each particle is computed using average SSD between the feature model and the mean vector corresponding to the particle.
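A minimal sketch of an SSD-based particle likelihood follows. The slides only say that the average SSD is used; the Gaussian form `exp(-SSD / (2 sigma^2))`, the default `sigma`, and the NaN convention for hidden cells are assumptions of this example.

```python
import numpy as np

def ssd_likelihood(model, candidate, sigma=10.0):
    """Likelihood from the average SSD between two feature vectors.

    NaN entries (cells hidden in either vector) are ignored. The
    exponential form and the value of sigma are assumptions.
    """
    diff = np.asarray(model, dtype=float) - np.asarray(candidate, dtype=float)
    valid = ~np.isnan(diff)
    if not valid.any():
        return 0.0  # no cell in common, so no evidence for this particle
    avg_ssd = np.mean(diff[valid] ** 2)
    return float(np.exp(-avg_ssd / (2.0 * sigma ** 2)))
```

Averaging rather than summing keeps the score comparable between particles that see different numbers of grid cells.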

Choice of feature model
•Ability to handle variations in the appearance
•Immune to drift

17

Tracking (4)

Two feature models
•Lost model (the feature vector in the 1st frame)
  •not capable of handling drastic appearance changes
•Wander model (the feature vector corresponding to the best particle at the previous instant)
  •can handle appearance changes
  •susceptible to drift

We use a combination of both, which makes the tracker very resilient.
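The slides do not say how the two models are combined. One plausible sketch, purely as an assumption, is to mix the two average SSDs with a weight `alpha` before converting to a likelihood; the class name `AppearanceModel`, `alpha`, and `sigma` are hypothetical.

```python
import numpy as np

class AppearanceModel:
    """Keeps a fixed first-frame template (lost) and a moving one (wander)."""

    def __init__(self, first_features):
        self.lost = np.array(first_features, dtype=float)
        self.wander = self.lost.copy()

    def score(self, features, alpha=0.5, sigma=10.0):
        """Likelihood from a weighted mix of the two average SSDs."""
        features = np.asarray(features, dtype=float)
        ssd_lost = np.mean((features - self.lost) ** 2)
        ssd_wander = np.mean((features - self.wander) ** 2)
        mixed = alpha * ssd_lost + (1.0 - alpha) * ssd_wander
        return float(np.exp(-mixed / (2.0 * sigma ** 2)))

    def update(self, best_features):
        """Replace the wander template with the best particle's features."""
        self.wander = np.array(best_features, dtype=float)
```

In this sketch the lost term anchors the score to the first frame and so limits drift, while the wander term lets the score follow gradual appearance changes.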

18

Tracking (5)

Robust statistics
•Trust only the top half of the means and treat the rest as outliers.
•Makes the tracker robust to illumination and expression changes, occlusions, etc.

Robustified likelihood computation
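A trimmed SSD is one way to read "trust only the top half". The sketch below assumes that "top half" means the half of the grid cells whose means agree best with the feature model; the function name `robust_ssd` and the `keep` parameter are hypothetical.

```python
import numpy as np

def robust_ssd(model, candidate, keep=0.5):
    """Average squared difference over the best-matching fraction of cells.

    Cells with NaN (hidden) are dropped first; of the rest, only the
    `keep` fraction with the smallest squared differences is averaged,
    and the remainder is treated as outliers.
    """
    diff2 = (np.asarray(model, dtype=float) - np.asarray(candidate, dtype=float)) ** 2
    diff2 = diff2[~np.isnan(diff2)]
    if diff2.size == 0:
        return float("inf")
    n_keep = max(1, int(np.ceil(keep * diff2.size)))
    return float(np.sort(diff2)[:n_keep].mean())
```

For example, if half of the cells are covered by a hand, their large residuals fall in the discarded half and the score is driven by the unoccluded cells only.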

19

Experiments and results

3 different datasets
•Ground truth available for one, to evaluate the performance of the tracker

Experiments
•Tracking: extreme poses, occlusion, expression variations
•Comparison to ground truth
•Recognition with non-overlapping poses

20

Tracking results

21

Comparison to ground truth

22

Small Recognition Experiment

Gallery of 10 subjects.

No overlap between poses present in gallery and probes.
•Nearest poses were at least 30 degrees apart

100% recognition rate.

24

More results

25

Contributors

Rama Chellappa Gaurav Aggarwal Ashok Veeraraghavan
