michael j. blackfebruary 2002 learning the appearance and motion of people in video hedvig...

55
Michael J. Black February 2002 Learning the Learning the Appearance and Motion Appearance and Motion of People in Video of People in Video Hedvig Sidenbladh Michael J. Black http://www.cs.brown.edu/~blac Department of Computer Science Brown University Defense Research Institute Stockholm Sweden ttp://www.nada.kth.se/~hedvig (The Science of Silly Walks) (The Science of Silly Walks)

Upload: anis-andrews

Post on 03-Jan-2016

213 views

Category:

Documents


0 download

TRANSCRIPT

Michael J. BlackFebruary 2002

Learning the Appearance and Learning the Appearance and Motion of People in VideoMotion of People in Video

Hedvig Sidenbladh Michael J. Black

http://www.cs.brown.edu/~black

Department of Computer ScienceBrown University

Defense Research InstituteStockholm Sweden

http://www.nada.kth.se/~hedvig

(The Science of Silly Walks)(The Science of Silly Walks)

Michael J. BlackFebruary 2002

CollaboratorsCollaborators

David Fleet, Xerox PARC

Nancy Pollard, Brown University

Dirk Ormoneit and Trevor Hastie Dept. of Statistics, Stanford University

Allan Jepson, University of Toronto

Michael J. BlackFebruary 2002

The (Silly) ProblemThe (Silly) Problem

Unsolved without manual intervention.

Michael J. BlackFebruary 2002

Inferring 3D Human MotionInferring 3D Human Motion

* No special clothing* Monocular, grayscale, sequences (archival data)* Unknown, cluttered, environment* Incremental estimation

* Infer 3D human motion from 2D image properties.

Michael J. BlackFebruary 2002

Why is it Hard?Why is it Hard?

Low contrast

Self occlusion

Singularities in viewing direction

Unusual viewpoints

Ambiguous matches

Michael J. BlackFebruary 2002

Clothing and LightingClothing and Lighting

Michael J. BlackFebruary 2002

Large MotionsLarge Motions

Limbs move rapidly with respect to their width.

Non-linear dynamics.

Motion blur.

Michael J. BlackFebruary 2002

AmbiguitiesAmbiguities

Where is the leg?

Which leg is in front?

Michael J. BlackFebruary 2002

AmbiguitiesAmbiguities

Accidental alignment

Michael J. BlackFebruary 2002

AmbiguitiesAmbiguities

Whose legs are whose?Occlusion

Michael J. BlackFebruary 2002

RequirementsRequirements

1. Represent uncertainty and multiple hypotheses.

2. Model non-linear dynamics of the body.

3. Exploit image cues in a robust fashion.

4. Integrate information over time.

5. Combine multiple image cues.

Michael J. BlackFebruary 2002

Simple Body ModelSimple Body Model

* Limbs are truncated cones* Parameter vector of joint angles and angular velocities =

Michael J. BlackFebruary 2002

Inference/IssuesInference/IssuesBayesian formulation

p(model | cues) = p(cues | model) p(model)

3. Need an effective way to explore the model space (very high dimensional) and represent ambiguities.

p(cues)

1. Need a constraining likelihood model that is alsoinvariant to variations in human appearance.

2. Need a prior model of how people move.

Michael J. BlackFebruary 2002

What Image Cues?What Image Cues?

Pixels?

Temporal differences?

Background differences?

Edges?

Color?

Silhouettes?

Optical flow?

Michael J. BlackFebruary 2002

Brightness ConstancyBrightness Constancy

I(x, t+1) = I(x+u, t) +

Image motion of foreground as a function of the 3D motion of the body.

Problem: no fixed model of appearance (drift).

t1t

Michael J. BlackFebruary 2002

Bregler and Malik ‘98Bregler and Malik ‘98

State of the Art.

* Brightness constancy cue

• insensitive to appearance

* Full-body required multiple cameras.

* Single hypothesis.

• MAP estimate

Michael J. BlackFebruary 2002

Cham and Rehg ‘99Cham and Rehg ‘99

State of the Art.

* Single camera, multiple hypotheses.

* 2D templates (solves drift but is view dependent)

I(x, t) = I(x+u, 0) +

Michael J. BlackFebruary 2002

Edges as a Cue?Edges as a Cue?

• Probabilistic model?• Under/over-segmentation, thresholds, …

Michael J. BlackFebruary 2002

Deutscher, North, Bascle, & Deutscher, North, Bascle, & Blake ‘99Blake ‘99

* Multiple cameras

* Simplified, clothing, lighting and background.

State of the Art.

Michael J. BlackFebruary 2002

Changing background

Low contrast limb boundaries

Occlusion

Varying shadows

Deforming clothing

What do people look like?

What do non-people look like?

Michael J. BlackFebruary 2002

Key Idea #1 Key Idea #1 (Rigorous Likelihood)(Rigorous Likelihood)

1. Use the 3D model to predict the location of limb boundaries (not necessarily features) in the scene.

2. Compute various filter responses steered to the predicted orientation of the limb.

3. Compute likelihood of filter responses using a statistical model learned from examples.

Michael J. BlackFebruary 2002

Natural Image StatisticsNatural Image Statistics

Ruderman. Lee, Mumford, Huang. Portilla and Simoncelli. Olshausen & Field. Xu, Wu, & Mumford. …

* Statistics of image derivatives are non-Gaussian.* Consistent across scale.

Michael J. BlackFebruary 2002

Statistics of EdgesStatistics of Edges

Statistics of filter responses, F, on edges, pon(F), differs from background statistics, poff (F).

Likelihood ratio, pon/ poff , can be used for edge detection and road following.

Geman & Jednyak and Konishi, Yuille, & Coughlan

What about the object specific statistics of limbs?

* edge may be present or not.

Michael J. BlackFebruary 2002

Object-Specific StatisticsObject-Specific Statistics

Michael J. BlackFebruary 2002

Edge FiltersEdge FiltersNormalized derivatives of Gaussians (Lindeberg, Granlund and Knutsson, Perona, Freeman&Adelson, …)

),(cos),(sin),,( xxx yxe fff

Edge filter response steered to limb orientation:

Filter responses steered to arm orientation.

Michael J. BlackFebruary 2002

Distribution of Edge Distribution of Edge Filter ResponsesFilter Responses

pon(F) poff(F)

Michael J. BlackFebruary 2002

Contrast Normalization?Contrast Normalization?

contrast

OcontrastSw

*2

)*tanh(1

Lee, Mumford & Huang

log(I

IInorm

Michael J. BlackFebruary 2002

Contrast NormalizationContrast NormalizationMaximize difference between distributions

* e.g. Bhattarcharyya distance:

dyypyppp offonoffonB )()(log),(

Michael J. BlackFebruary 2002

Local Contrast NormalizationLocal Contrast Normalization

Michael J. BlackFebruary 2002

Ridge FeaturesRidge Features

|),(cossin2),(sin),(cos|

|),(cossin2),(cos),(sin|),,(22

22

xxx

xxxx

xyyyxx

xyyyxxr

fff

ffff

Scale specific

Michael J. BlackFebruary 2002

Ridge FiltersRidge Filters

Relationship between limb diameter in image and scale of maximum ridge filter response.

Michael J. BlackFebruary 2002

Ridge Thigh StatisticsRidge Thigh Statistics

Michael J. BlackFebruary 2002

Brightness ConstancyBrightness Constancy

What are the statistics of brightness variationI(x, t) - I(x+u, t+1)?

Variation due to clothing, self shadowing, etc.

I(x, t) I(x+u, t+1)

Michael J. BlackFebruary 2002

Brightness ConstancyBrightness Constancy

• well fit by t-distribution or Cauchy distribution (heavy tails)

• related to robust statistics

Michael J. BlackFebruary 2002

Key Idea #2 Key Idea #2 (Explain the Image)(Explain the Image)

p(image | foreground, background)

Generic, unknown, background

Foreground person

Foreground should explain what the background can’t.

pixelsfore

pixelsfore

backimagep

foreimagepconst

)|(

)|(

See also McCormick and Isard, ICCV’01.

Michael J. BlackFebruary 2002

LikelihoodLikelihood

Steered edgefilter responses

crude assumption: filter responses independent across scale.

limbs cues )background|responsefilter(

)person|responsefilter(

p

p

Michael J. BlackFebruary 2002

Inference/IssuesInference/IssuesBayesian formulation

p(model | cues) = p(cues | model) p(model) p(cues)

1. Need a constraining likelihood model that is alsoinvariant to variations in human appearance.

2. Need a prior model of how people move.

Michael J. BlackFebruary 2002

Learning Human MotionLearning Human Motion

* constrain the posterior to likely & valid poses/motions* model the variability

time

joint angles

3D motion-capture data. * Database with multiple actors and a variety of motions.

(from M. Gleicher)

Michael J. BlackFebruary 2002

Key Idea #3 Key Idea #3 (Trade learning for search.)(Trade learning for search.)

Problem:

* insufficient data to learn a prior probabilistic model of human motion.

Alternative:

* the data represents all we know

* replace representation and learning with search. (challenge: search has to be fast)

Michael J. BlackFebruary 2002

Texture SynthesisTexture Synthesis

Efros & Freeman’01

“Database”Synthetic Texture

* De Bonnet & Viola, Efros & Leung, Efros & Freeman, Paztor & Freeman, Hertzmann et al, …

* Image(s) as an implicit probabilistic model.

Michael J. BlackFebruary 2002

Implicit Probabilistic ModelImplicit Probabilistic Model

Key idea: probabilistic search (log time) of this tree approximates sampling from p(stored sequence | generated sequence).

Michael J. BlackFebruary 2002

SynthesisSynthesis

* Colors indicate different training sequences.

* For graphics, we need- editability, constraints (ground contact, pose, interpenetration), key frames, style, …

Michael J. BlackFebruary 2002

Tracking* Efficiently generate samples (image data will sort out which are good).

* Temperature parameter controls randomness of tree search.

Michael J. BlackFebruary 2002

Bayesian FormulationBayesian Formulation

1111 ))|()|(()(

)|(

ttttttt

tt

dppp

p

II

I

Posterior over model parameters given an image sequence.

Likelihood ofobserving the imagegiven the model parameters

Temporal model (prior)

Posterior fromprevious time instant

Michael J. BlackFebruary 2002

What does the posterior look like?What does the posterior look like?

x yz

Shoulder: 3dofElbow: 1dof

Elbow bends

Michael J. BlackFebruary 2002

Inference/IssuesInference/IssuesBayesian formulation

p(model | cues) = p(cues | model) p(model)

3. Need an effective way to explore the model space (very high dimensional) and represent ambiguities.

p(cues)

1. Need a constraining likelihood model that is alsoinvariant to variations in human appearance.

2. Need a prior model of how people move.

Michael J. BlackFebruary 2002

Key Idea #4 Key Idea #4 (Represent Ambiguity)(Represent Ambiguity)

Samples from a distributionover 3D poses.

* Represent a multi-modal posterior probability distribution over model parameters - sampled representation - each sample is a pose and its probability - predict over time using a particle filtering approach.

Michael J. BlackFebruary 2002

Particle FilteringParticle Filtering* large literature (Gordon et al ‘93, Isard & Blake ‘96,…)

* non-Gaussian posterior approximated by N discrete samples

* explicitly represent the ambiguities

* exploit stochastic sampling for tracking

)(nt )10( 3NNn ,...,1

Michael J. BlackFebruary 2002

Particle FilterParticle Filter

samplesample

samplesample

normalizenormalize

Posterior)I|( 11 ttp

Temporal dynamics)|( 1ttp

Likelihood

)|I( ttp )I|( ttp

Posterio

r

Michael J. BlackFebruary 2002

Particle FilterParticle Filter

Isard & Blake ‘96

Michael J. BlackFebruary 2002

Tracking with OcclusionTracking with Occlusion

1500 samples, ~2 minutes/frame.

Michael J. BlackFebruary 2002

Moving CameraMoving Camera

1500 samples, ~2 minutes/frame.

Michael J. BlackFebruary 2002

Stochastic 3D TrackingStochastic 3D Tracking

* 2500 samples (now down as low as 300 with the new prior).

Michael J. BlackFebruary 2002

ConclusionsConclusionsInferring human motion, silly or not, from video is challenging.

We have tackled three important parts of the problem:

1. Probabilistically modeling human appearance in a generic, yet useful, way.

2. Representing the range of possible motions using techniques from texture modeling.

3. Dealing with ambiguities and non-linearities using particle filtering for Bayesian inference.

Michael J. BlackFebruary 2002

Ongoing and Future WorkOngoing and Future WorkBetter search algorithms Hybrid Monte Carlo tracker (Choo and Fleet ’01) Covariance scaled sampling (Schiminescu&Triggs’01)

Richer prior models of motion.

Estimate background motion.

Statistical models of color and texture.

Automatic initialization.

Training data and likelihood models to be available in the web.