c280, computer vision

112
C280, Computer Vision Prof. Trevor Darrell [email protected] Lecture 9: Motion

Upload: gunda

Post on 25-Feb-2016

48 views

Category:

Documents


1 download

DESCRIPTION

C280, Computer Vision. Prof. Trevor Darrell [email protected] Lecture 9: Motion. Roadmap. Previous: Image formation, filtering, local features, (Texture)… Tues: Feature-based Alignment Stitching images together Homographies , RANSAC, Warping, Blending - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: C280, Computer Vision

C280, Computer Vision

Prof. Trevor [email protected]

Lecture 9: Motion

Page 2: C280, Computer Vision

Roadmap• Previous: Image formation, filtering, local features, (Texture)…• Tues: Feature-based Alignment

– Stitching images together– Homographies, RANSAC, Warping, Blending– Global alignment of planar models

• Today: Dense Motion Models– Local motion / feature displacement– Parametric optic flow

• No classes next week: ICCV conference• Oct 6th: Stereo / ‘Multi-view’: Estimating depth with known

inter-camera pose• Oct 8th: ‘Structure-from-motion’: Estimation of pose and 3D

structure– Factorization approaches– Global alignment with 3D point models

Page 3: C280, Computer Vision

Last Time: Alignment

• Homographies• Rotational Panoramas• RANSAC• Global alignment• Warping• Blending

Page 4: C280, Computer Vision

Today: Motion and Flow

• Motion estimation• Patch-based motion

(optic flow)• Regularization and line

processes• Parametric (global)

motion• Layered motion models

Page 5: C280, Computer Vision

Why estimate visual motion?

• Visual Motion can be annoying– Camera instabilities, jitter– Measure it; remove it (stabilize)

• Visual Motion indicates dynamics in the scene– Moving objects, behavior– Track objects and analyze trajectories

• Visual Motion reveals spatial layout – Motion parallax

Szeliski

Page 6: C280, Computer Vision

Classes of Techniques

• Feature-based methods– Extract visual features (corners, textured areas) and track them– Sparse motion fields, but possibly robust tracking– Suitable especially when image motion is large (10s of pixels)

• Direct-methods– Directly recover image motion from spatio-temporal image

brightness variations– Global motion parameters directly recovered without an

intermediate feature motion calculation– Dense motion fields, but more sensitive to appearance variations– Suitable for video and when image motion is small (< 10 pixels)

Szeliski

Page 7: C280, Computer Vision

Patch-based motion estimation

Page 8: C280, Computer Vision

Image motion

How do we determine correspondences?

Assume all change between frames is due to motion:

),(),( ),(),( yxyx vyuxIyxJ

I J

Page 9: C280, Computer Vision

• Brightness Constancy Equation:),(),( ),(),( yxyx vyuxIyxJ

Or, equivalently, minimize :2)),(),((),( vyuxIyxJvuE

),(),(),(),(),(),( yxvyxIyxuyxIyxIyxJ yx

Linearizing (assuming small (u,v))using Taylor series expansion:

The Brightness Constraint

Szeliski

Page 10: C280, Computer Vision

Gradient Constraint (or the Optical Flow Constraint)

2)(),( tyx IvIuIvuE Minimizing:

0)(

0)(

0

tyxy

tyxx

IvIuII

IvIuIIdvE

duE

In general 0, yx II

0 tyx IvIuIHence,

Szeliski

Page 11: C280, Computer Vision

yx

tyx IvyxIuyxIvuE,

2),(),(),(

Minimizing

Assume a single velocity for all pixels within an image patch

ty

tx

yyx

yxx

IIII

vu

IIIIII2

2

tT IIUII

LHS: sum of the 2x2 outer product of the gradient vector

Patch Translation [Lucas-Kanade]

Szeliski

Page 12: C280, Computer Vision

Local Patch Analysis

• How certain are the motion estimates?

Szeliski

Page 13: C280, Computer Vision

The Aperture Problem

TIIMLet

• Algorithm: At each pixel compute by solving

• M is singular if all gradient vectors point in the same direction• e.g., along an edge• of course, trivially singular if the summation is over a single pixel

or there is no texture• i.e., only normal flow is available (aperture problem)

• Corners and textured areas are OK

and

ty

tx

IIII

b

U bMU

Szeliski

Page 14: C280, Computer Vision

SSD Surface – Textured area

Page 15: C280, Computer Vision

SSD Surface -- Edge

Page 16: C280, Computer Vision

SSD – homogeneous area

Page 17: C280, Computer Vision

Iterative Refinement

• Estimate velocity at each pixel using one iteration of Lucas and Kanade estimation

• Warp one image toward the other using the estimated flow field(easier said than done)

• Refine estimate by repeating the process

Szeliski

Page 18: C280, Computer Vision

Optical Flow: Iterative Estimation

xx0

Initial guess: Estimate:

estimate update

(using d for displacement here instead of u)

Szeliski

Page 19: C280, Computer Vision

Optical Flow: Iterative Estimation

xx0

estimate update

Initial guess: Estimate:

Szeliski

Page 20: C280, Computer Vision

Optical Flow: Iterative Estimation

xx0

Initial guess: Estimate:Initial guess: Estimate:

estimate update

Szeliski

Page 21: C280, Computer Vision

Optical Flow: Iterative Estimation

xx0

Szeliski

Page 22: C280, Computer Vision

Optical Flow: Iterative Estimation

• Some Implementation Issues:– Warping is not easy (ensure that errors in warping

are smaller than the estimate refinement)– Warp one image, take derivatives of the other so

you don’t need to re-compute the gradient after each iteration.

– Often useful to low-pass filter the images before motion estimation (for better derivative estimation, and linear approximations to image intensity)

Szeliski

Page 23: C280, Computer Vision

Optical Flow: Aliasing

Temporal aliasing causes ambiguities in optical flow because images can have many pixels with the same intensity.I.e., how do we know which ‘correspondence’ is correct?

nearest match is correct (no aliasing)

nearest match is incorrect (aliasing)

To overcome aliasing: coarse-to-fine estimation.

actual shift

estimated shift

Szeliski

Page 24: C280, Computer Vision

Limits of the gradient method

Fails when intensity structure in window is poorFails when the displacement is large (typical

operating range is motion of 1 pixel)Linearization of brightness is suitable only for small

displacements• Also, brightness is not strictly constant in

imagesactually less problematic than it appears, since we can

pre-filter images to make them look similar

Szeliski

Page 25: C280, Computer Vision

image Iimage J

aJwwarp refine

a aΔ+

Pyramid of image J Pyramid of image I

image Iimage J

Coarse-to-Fine Estimation

u=10 pixels

u=5 pixels

u=2.5 pixels

u=1.25 pixels

Szeliski

Page 26: C280, Computer Vision

J Jw Iwarp refine

ina

a+

J Jw Iwarp refine

a

a+

J

pyramid construction

J Jw Iwarp refine

a+

I

pyramid construction

outa

Coarse-to-Fine Estimation

Szeliski

Page 27: C280, Computer Vision

Spatial Coherence

Assumption * Neighboring points in the scene typically belong to the same surface and hence typically have similar motions. * Since they also project to nearby points in the image, we expect spatial coherence in image flow.

Black

Page 28: C280, Computer Vision

Formalize this Idea

Noisy 1D signal:

x

u

Noisy measurementsu(x)

Black

Page 29: C280, Computer Vision

Regularization

Find the “best fitting” smoothed function v(x)

x

u

Noisy measurementsu(x)

v(x)

Black

Page 30: C280, Computer Vision

Membrane model

Find the “best fitting” smoothed function v(x)

x

uv(x)

Black

Page 31: C280, Computer Vision

Membrane model

Find the “best fitting” smoothed function v(x)

x

uv(x)

Black

Page 32: C280, Computer Vision

Membrane model

Find the “best fitting” smoothed function v(x)

uv(x)

Black

Page 33: C280, Computer Vision

Spatial smoothness assumptionFaithful to the data

Regularization

x

uv(x)

Minimize:

1

1

2

1

2 ))()1(())()(()(N

x

N

x

xvxvxuxvvE

Black

Page 34: C280, Computer Vision

Bayesian Interpretation

1

1

2

1

2 ))()1(())()(()(N

x

N

x

xvxvxuxvvE

)()|()|( vpvupuvp

)()( xvxu ),0(~ 1 N

2)1()( xvxv ),0(~ 22 N

)/))()((21exp(

21)|( 2

12

1 1

xvxuvupN

x

)/))((21exp(

21)( 2

22

1

1 2

xvvp x

N

x

Black

Page 35: C280, Computer Vision

Discontinuities

x

uv(x)

What about this discontinuity?What is happening here?What can we do?

Black

Page 36: C280, Computer Vision

Robust EstimationNoise distributions are often non-Gaussian, having much heavier tails. Noise

samples from the tails are called outliers.• Sources of outliers (multiple motions):

– specularities / highlights– jpeg artifacts / interlacing / motion blur– multiple motions (occlusion boundaries, transparency)

velocity space

u1

u2

++

Black

Page 37: C280, Computer Vision

Occlusion

occlusion disocclusion shear

Multiple motions within a finite region.

Black

Page 38: C280, Computer Vision

Coherent Motion

Possibly Gaussian.

Black

Page 39: C280, Computer Vision

Multiple Motions

Definitely not Gaussian.

Black

Page 40: C280, Computer Vision

Weak membrane modelu

v(x)

1

1

2

1

2 ))(1())()1()(())()((),(N

x

N

x

xlxvxvxlxuxvlvE

x

}1,0{)( xl

Black

Page 41: C280, Computer Vision

Analog line process

1

1

2

1

2 ))(())()1()(())()((),(N

x

N

x

xlxvxvxlxuxvlvE

1)(0 xl

Penalty function Family of quadratics

Black

Page 42: C280, Computer Vision

Analog line process

1

1

2

1

2 ))(())()1()(())()((),(N

x

N

x

xlxvxvxlxuxvlvE

Infimum defines a robust error function.

1

12

1

2 )),()1(())()(()(N

x

N

x

xvxvxuxvvE

Minima are the same:

Black

Page 43: C280, Computer Vision

Robust Regularization

x

uv(x)

Minimize:

1

12

11 )),()1(()),()(()(

N

x

N

x

xvxvxuxvvE

Treat large spatial derivatives as outliers.

Black

Page 44: C280, Computer Vision

Robust EstimationProblem: Least-squares estimators penalize deviations between data & model with quadratic error fn (extremely sensitive to outliers)

error penalty function influence function

Redescending error functions (e.g., Geman-McClure) help to reduce the influence of outlying measurements.

error penalty function influence function

Black

Page 45: C280, Computer Vision

Optical flow

Outlier with respect to neighbors.

)()()()(),( yxyxS vvuuvuE

Robust formulation of spatial coherence term

Black

Page 46: C280, Computer Vision

“Dense” Optical Flow

)),()()()()(())(( DtyxD IvIuIE xxxxxxu

)]),()(()),()(([),()(

SG

SS vvuuvuE yxyxxy

x

xuxuu ))(())(()( SD EEE Objective function:

When is quadratic = “Horn and Schunck”

Black

Page 47: C280, Computer Vision

Optimization

vE

vTvv

uE

uTuu

nn

nn

)(1

)(1

)()1(

)()1(

)(

),(),(sGn

SnsxDtsusxs

uuIIvIuIuE

T(u)= max of second derivative

Black

Page 48: C280, Computer Vision

Optimization

• Gradient descent• Coarse-to-fine (pyramid)• Deterministic annealing

Black

Page 49: C280, Computer Vision

Example

Black

Page 50: C280, Computer Vision

Example

Black

Page 51: C280, Computer Vision

Quadratic:

Robust:

Black

Page 52: C280, Computer Vision

Magnitude of horizontal flow

Black

Page 53: C280, Computer Vision

Outliers

Points where the influence is reduced

Spatial term Data term

Black

Page 54: C280, Computer Vision

With 5% uniform random noise added to the images.

Black

Page 55: C280, Computer Vision

Horizontal Component

Black

Page 56: C280, Computer Vision

More Noise

Quadratic:

Quadratic data term, robust spatial term:

Black

Page 57: C280, Computer Vision

Both Terms Robust

Spatial and data outliers:

Black

Page 58: C280, Computer Vision

Pepsi

Black

Page 59: C280, Computer Vision

Real Sequence

Deterministic annealing.First stage (large ):

Black

Page 60: C280, Computer Vision

Real Sequence

Final result after annealing:

Black

Page 61: C280, Computer Vision

Parametric motion estimation

Page 62: C280, Computer Vision

Global (parametric) motion models• 2D Models:• Affine• Quadratic• Planar projective transform (Homography)

• 3D Models:• Instantaneous camera motion models • Homography+epipole• Plane+Parallax

Szeliski

Page 63: C280, Computer Vision

Motion models

Translation

2 unknowns

Affine

6 unknowns

Perspective

8 unknowns

3D rotation

3 unknowns

Szeliski

Page 64: C280, Computer Vision

0)()( 654321 tyx IyaxaaIyaxaaI

• Substituting into the B.C. Equation:yaxaayxvyaxaayxu

654

321

),(),(

Each pixel provides 1 linear constraint in 6 global unknowns

0 tyx IvIuI

2 tyx IyaxaaIyaxaaIaErr )()()( 654321

Least Square Minimization (over all pixels):

Example: Affine Motion

Szeliski

Page 65: C280, Computer Vision

Last lecture: Alignment / motion warping• “Alignment”: Assuming we know the correspondences,

how do we get the transformation?

),( ii yx ),( ii yx

2

1

43

21

tt

yx

mmmm

yx

i

i

i

i

• Expressed in terms of absolute coordinates of corresponding points…

• Generally presumed features separately detected in each frame

e.g., affine model in abs. coords…

Page 66: C280, Computer Vision

Today: “flow”, “parametric motion”• Two views presumed in temporal sequence…track or

analyze spatio-temporal gradient

),( ii yx),( ii yx

• Sparse or dense in first frame• Search in second frame• Motion models expressed in terms of position change

Page 67: C280, Computer Vision

Today: “flow”, “parametric motion”• Two views presumed in temporal sequence…track or

analyze spatio-temporal gradient

),( ii yx),( ii yx

• Sparse or dense in first frame• Search in second frame• Motion models expressed in terms of position change

Page 68: C280, Computer Vision

Today: “flow”, “parametric motion”• Two views presumed in temporal sequence…track or

analyze spatio-temporal gradient

),( ii yx),( ii yx

• Sparse or dense in first frame• Search in second frame• Motion models expressed in terms of position change

Page 69: C280, Computer Vision

Today: “flow”, “parametric motion”• Two views presumed in temporal sequence…track or

analyze spatio-temporal gradient

),( ii yx

• Sparse or dense in first frame• Search in second frame• Motion models expressed in terms of position change

(ui,vi)

Page 70: C280, Computer Vision

Today: “flow”, “parametric motion”• Two views presumed in temporal sequence…track or

analyze spatio-temporal gradient

• Sparse or dense in first frame• Search in second frame• Motion models expressed in terms of position change

(ui,vi)

Page 71: C280, Computer Vision

Today: “flow”, “parametric motion”• Two views presumed in temporal sequence…track or

analyze spatio-temporal gradient

• Sparse or dense in first frame• Search in second frame• Motion models expressed in terms of position change

(ui,vi)

yaxaayxvyaxaayxu

654

321

),(),(

2

1

43

21

tt

yx

mmmm

yx

i

i

i

i

4

1

65

32

aa

yx

aaaa

vu

i

i

i

i

Previous Alignment model:

Now, Displacement model:

Page 72: C280, Computer Vision

Quadratic – instantaneous approximation to planar motion 2

87654

82

7321

yqxyqyqxqqv

xyqxqyqxqqu

yyvxxu

yhxhhyhxhhy

yhxhhyhxhhx

',' and

'

'

987

654

987

321

Projective – exact planar motion

Other 2D Motion Models

Szeliski

Page 73: C280, Computer Vision

ZxTTxxyyv

ZxTTyxxyu

ZYZYX

ZXZYX

)()1(

)()1(2

2

yyvxxuthyhxhthyhxhy

thyhxhthyhxhx

',' :and

'

'

3987

1654

3987

1321

)(1

)(1

233

133

tytt

xyv

txtt

xxu

w

w

Local Parameter:

ZYXZYX TTT ,,,,,

),( yxZ

Instantaneous camera motion:

Global parameters:

Global parameters:

32191 ,,,,, ttthh

),( yx

Homography+Epipole

Local Parameter:

Residual Planar Parallax Motion

Global parameters:

321 ,, ttt

),( yxLocal Parameter:

3D Motion Models

Szeliski

Page 74: C280, Computer Vision

• Consider image I translated by

• The discrete search method simply searches for the best estimate.

• The gradient method linearizes the intensity function and solves for the estimate

21

,00

2

,1

)),(),(),((

)),(),((),(

yxvvyuuxIyxI

vyuxIyxIvuE

yx

yx

00 ,vu

),(),(),(),(),(

1001

0

yxyxIvyuxIyxIyxI

Discrete Search vs. Gradient Based

Szeliski

Page 75: C280, Computer Vision

Correlation and SSD

• For larger displacements, do template matching– Define a small area around a pixel as the template– Match the template against each pixel within a

search area in next image.– Use a match measure such as correlation,

normalized correlation, or sum-of-squares difference– Choose the maximum (or minimum) as the match– Sub-pixel estimate (Lucas-Kanade)

Szeliski

Page 76: C280, Computer Vision

Shi-Tomasi feature tracker

1. Find good features (min eigenvalue of 22 Hessian)

2. Use Lucas-Kanade to track with pure translation

3. Use affine registration with first feature patch4. Terminate tracks whose dissimilarity gets too

large5. Start new tracks when needed

Szeliski

Page 77: C280, Computer Vision

Tracking results

Szeliski

Page 78: C280, Computer Vision

Tracking - dissimilarity

Szeliski

Page 79: C280, Computer Vision

Tracking results

Szeliski

Page 80: C280, Computer Vision

How well do thesetechniques work?

Page 81: C280, Computer Vision

A Database and Evaluation Methodology for Optical Flow

Simon Baker, Daniel Scharstein, J.P Lewis, Stefan Roth, Michael Black, and Richard Szeliski

ICCV 2007http://vision.middlebury.edu/flow/

Page 82: C280, Computer Vision

Limitations of Yosemite• Only sequence used for quantitative evaluation

• Limitations:• Very simple and synthetic• Small, rigid motion• Minimal motion discontinuities/occlusions

Image 7 Image 8

Yose

mite

Ground-Truth FlowFlow Color

Coding

Szeliski

Page 83: C280, Computer Vision

Limitations of Yosemite

• Only sequence used for quantitative evaluation

• Current challenges:• Non-rigid motion• Real sensor noise• Complex natural scenes• Motion discontinuities• Need more challenging and more realistic benchmarks

Image 7 Image 8

Yose

mite

Ground-Truth FlowFlow Color

Coding

Szeliski

Page 84: C280, Computer Vision

Motion estimation 84

Realistic synthetic imagery• Randomly generate scenes with “trees” and “rocks”• Significant occlusions, motion, texture, and blur• Rendered using Mental Ray and “lens shader” plugin

Rock

Grov

e

Szeliski

Page 85: C280, Computer Vision

85

Modified stereo imagery

• Recrop and resample ground-truth stereo datasets to have appropriate motion for OF

Venu

sM

oebi

us

Szeliski

Page 86: C280, Computer Vision

• Paint scene with textured fluorescent paint• Take 2 images: One in visible light, one in UV light• Move scene in very small steps using robot• Generate ground-truth by tracking the UV images

Dense flow with hidden texture

Setup

Visible

UV

Lights Image Cropped

Szeliski

Page 87: C280, Computer Vision

Experimental results

• Algorithms:• Pyramid LK: OpenCV-based implementation of

Lucas-Kanade on a Gaussian pyramid• Black and Anandan: Author’s implementation• Bruhn et al.: Our implementation• MediaPlayerTM: Code used for video frame-rate

upsampling in Microsoft MediaPlayer• Zitnick et al.: Author’s implementation

Szeliski

Page 88: C280, Computer Vision

Motion estimation 88

Experimental results

Szeliski

Page 89: C280, Computer Vision

Conclusions

• Difficulty: Data substantially more challenging than Yosemite

• Diversity: Substantial variation in difficulty across the various datasets

• Motion GT vs Interpolation: Best algorithms for one are not the best for the other

• Comparison with Stereo: Performance of existing flow algorithms appears weak

Szeliski

Page 90: C280, Computer Vision

Layered Scene Representations

Page 91: C280, Computer Vision

Motion representations

• How can we describe this scene?

Szeliski

Page 92: C280, Computer Vision

Block-based motion prediction

• Break image up into square blocks• Estimate translation for each block• Use this to predict next frame, code difference

(MPEG-2)

Szeliski

Page 93: C280, Computer Vision

Layered motion

• Break image sequence up into “layers”:

• =

• Describe each layer’s motion

Szeliski

Page 94: C280, Computer Vision

Layered motion

• Advantages:• can represent occlusions / disocclusions• each layer’s motion can be smooth• video segmentation for semantic processing• Difficulties:• how do we determine the correct number?• how do we assign pixels?• how do we model the motion?

Szeliski

Page 95: C280, Computer Vision

Layers for video summarization

Szeliski

Page 96: C280, Computer Vision

Background modeling (MPEG-4)

• Convert masked images into a background sprite for layered video coding

• + + +

•=

Szeliski

Page 97: C280, Computer Vision

What are layers?

• [Wang & Adelson, 1994; Darrell & Pentland 1991]

• intensities• alphas• velocities

Szeliski

Page 98: C280, Computer Vision

Fragmented Occlusion

Page 99: C280, Computer Vision

Results

Page 100: C280, Computer Vision

Results

Page 101: C280, Computer Vision

How do we form them?

Szeliski

Page 102: C280, Computer Vision

How do we estimate the layers?

1. compute coarse-to-fine flow2. estimate affine motion in blocks (regression)3. cluster with k-means4. assign pixels to best fitting affine region5. re-estimate affine motions in each region…

Szeliski

Page 103: C280, Computer Vision

Layer synthesis

• For each layer:• stabilize the sequence with the affine motion• compute median value at each pixel• Determine occlusion relationships

Szeliski

Page 104: C280, Computer Vision

Results

Szeliski

Page 105: C280, Computer Vision

Recent results: SIFT Flow

Page 106: C280, Computer Vision

Recent GPU Implementation

• http://gpu4vision.icg.tugraz.at/• Real time flow exploiting robust norm +

regularized mapping

Page 107: C280, Computer Vision

Today: Motion and Flow

• Motion estimation• Patch-based motion (optic flow)• Regularization and line processes• Parametric (global) motion• Layered motion models

Page 108: C280, Computer Vision

Slide Credits

• Rick Szeliski• Michael Black

Page 109: C280, Computer Vision

Roadmap• Previous: Image formation, filtering, local features, (Texture)…• Tues: Feature-based Alignment

– Stitching images together– Homographies, RANSAC, Warping, Blending– Global alignment of planar models

• Today: Dense Motion Models– Local motion / feature displacement– Parametric optic flow

• No classes next week: ICCV conference• Oct 6th: Stereo / ‘Multi-view’: Estimating depth with known

inter-camera pose• Oct 8th: ‘Structure-from-motion’: Estimation of pose and 3D

structure– Factorization approaches– Global alignment with 3D point models

Page 110: C280, Computer Vision

Final project

• Significant novel implementation of technique related to course content

• Teams of 2 encouraged (document role!)• Or journal length review article (no teams)• Three components:

– proposal document (no more than 5 pages)– in class results presentation (10 minutes)– final write-up (no more than 15 pages)

Page 111: C280, Computer Vision

Project Proposals• Due next Friday!• No more than 5 pages• Explain idea:

– Motivation– Approach– Datasets– Evaluation– Schedule

• Proposal should convince me (and you) that project is interesting and doable given the resources you have.

• You can change topics after proposal (at your own risk!)• I’ll consider proposal, final presentation, and report when

evaluating project: a well-thought out proposal can have significant positive weight.

• Teams OK; overlap with thesis or other courses OK.

Page 112: C280, Computer Vision

Project Ideas?

• Face annotation with online social networks?• Classifying sports or family events from photo

collections?• Optic flow to recognize gesture?• Finding indoor structures for scene context?• Shape models of human silhouettes? • Salesin: classify aesthetics?

– Would gist regression work?