Recognizing Action at a Distance. A.A. Efros, A.C. Berg, G. Mori, J. Malik. UC Berkeley Computer Vision Group, ICCV 2003.


Page 1: Recognizing Action at a Distance

A.A. Efros, A.C. Berg, G. Mori, J. Malik

UC Berkeley Computer Vision Group

ICCV 2003

Page 2: Looking at People

Far field
• 3-pixel man
• Blob tracking
  – vast surveillance literature

Near field
• 300-pixel man
• Limb tracking
  – e.g. Yacoob & Black, Rao & Shah, etc.

Page 3: Medium-field Recognition

The 30-Pixel Man

Page 4: Appearance vs. Motion

Jackson Pollock, Number 21 (detail)

Page 5: Goals

• Recognize human actions at a distance
  – Low resolution, noisy data
  – Moving camera, occlusions
  – Wide range of actions (including non-periodic)

Page 6: Our Approach

• Motion-based approach
  – Non-parametric; use large amount of data
  – Classify a novel motion by finding the most similar motion from the training set

• Related Work
  – Periodicity analysis
    • Polana & Nelson; Seitz & Dyer; Bobick et al.; Cutler & Davis; Collins et al.
  – Model-free
    • Temporal Templates [Bobick & Davis]
    • Orientation histograms [Freeman et al.; Zelnik & Irani]
    • Using MoCap data [Zhao & Nevatia, Ramanan & Forsyth]
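The classification rule on this slide (label a novel motion with the label of its most similar training motion) might be sketched as follows. The function name `classify_nearest`, the `similarity` callable, and the toy two-element descriptors are illustrative assumptions, not the authors' code; in the talk the similarity comes from comparing motion descriptors.

```python
def classify_nearest(query, training, similarity):
    """Return (label, score) of the training example most similar to `query`.

    `training` is a list of (descriptor, label) pairs; `similarity` is any
    function where a larger value means a closer match (an assumption here).
    """
    if not training:
        raise ValueError("training set is empty")
    best_label, best_score = None, float("-inf")
    for example, label in training:
        score = similarity(query, example)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


def dot(a, b):
    """Toy similarity: plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical training set with made-up descriptors.
training = [([1.0, 0.0], "run"), ([0.0, 1.0], "walk left")]
label, score = classify_nearest([0.9, 0.2], training, dot)
```

Being non-parametric, this rule has no model to fit: adding data just means appending more labeled examples to `training`.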

Page 7: Gathering action data

• Tracking
  – Simple correlation-based tracker
  – User-initialized
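One way a simple correlation-based tracker could work is sketched below: the user supplies the initial box, and each new frame is searched near the previous position for the patch that best matches a template under normalized cross-correlation. The search radius, the fixed template, and all function names are assumptions for illustration; the slide does not specify these details.

```python
import math


def ncc(patch_a, patch_b):
    """Normalized cross-correlation of two equally sized 2D patches."""
    a = [v for row in patch_a for v in row]
    b = [v for row in patch_b for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0.0:
        return 0.0  # flat patch: correlation undefined, treat as no match
    return sum(x * y for x, y in zip(da, db)) / denom


def crop(frame, top, left, height, width):
    """Extract a height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in frame[top:top + height]]


def track_step(frame, template, prev_top, prev_left, radius=2):
    """Return the (top, left) within `radius` of the previous position
    whose window correlates best with the template."""
    h, w = len(template), len(template[0])
    best = (float("-inf"), prev_top, prev_left)
    top_range = range(max(0, prev_top - radius),
                      min(len(frame) - h, prev_top + radius) + 1)
    left_range = range(max(0, prev_left - radius),
                       min(len(frame[0]) - w, prev_left + radius) + 1)
    for top in top_range:
        for left in left_range:
            score = ncc(crop(frame, top, left, h, w), template)
            if score > best[0]:
                best = (score, top, left)
    return best[1], best[2]
```

Calling `track_step` once per frame, feeding each result in as the next `prev_top` and `prev_left`, yields a track of the figure.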

Page 8: Figure-centric Representation

• Stabilized spatio-temporal volume
  – No translation information
  – All motion caused by person’s limbs
• Good news: indifferent to camera motion
• Bad news: hard!
• Good test to see if actions, not just translation, are being captured
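A minimal sketch of building such a stabilized volume, assuming the tracker provides one (row, column) center per frame: crop a fixed-size window around each center so that the figure's translation is removed. The zero padding outside the frame and the name `figure_centric_volume` are assumptions made for this example.

```python
def figure_centric_volume(frames, centers, half_height, half_width):
    """Crop a fixed-size window around each tracked center.

    Pixels that fall outside the frame are filled with 0 so that every
    window has the same size.
    """
    volume = []
    for frame, (cy, cx) in zip(frames, centers):
        window = []
        for y in range(cy - half_height, cy + half_height + 1):
            row = []
            for x in range(cx - half_width, cx + half_width + 1):
                inside = 0 <= y < len(frame) and 0 <= x < len(frame[0])
                row.append(frame[y][x] if inside else 0)
            window.append(row)
        volume.append(window)
    return volume
```

If the figure only translates between frames, the stacked windows come out identical, which is why any remaining motion in the volume has to come from the limbs.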

Page 9: Remembrance of Things Past

• “Explain” novel motion sequence by matching to previously seen video clips
  – For each frame, match based on some temporal extent

Challenge: how to compare motions?

[Figure: input sequence → motion analysis against a database of labeled clips (run, walk left, swing, walk right, jog)]

Page 10: How to describe motion?

• Appearance
  – Not preserved across different clothing
• Gradients (spatial, temporal)
  – Same problem (e.g. contrast reversal)
• Edges/Silhouettes
  – Too unreliable
• Optical flow
  – Explicitly encodes motion
  – Least affected by appearance
  – …but too noisy

Page 11: Spatial Motion Descriptor

[Figure: image frame → optical flow $F_{x,y}$ → components $F_x, F_y$ → half-wave rectified channels $F_x^+, F_x^-, F_y^+, F_y^-$ → blurred $F_x^+, F_x^-, F_y^+, F_y^-$]
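The descriptor pipeline on this slide (optical flow, split into x and y components, then into four channels, then blurred) can be sketched roughly as below. The sketch assumes the four channels are the positive and negative parts of each flow component, uses a box blur as a stand-in for whatever blur the authors applied, and compares frames with an unnormalized sum of products; all three choices and every function name are assumptions.

```python
def half_wave_rectify(flow_x, flow_y):
    """Split signed flow into four non-negative channels: Fx+, Fx-, Fy+, Fy-."""
    def positive(img):
        return [[max(v, 0.0) for v in row] for row in img]

    def negative(img):
        return [[max(-v, 0.0) for v in row] for row in img]

    return [positive(flow_x), negative(flow_x),
            positive(flow_y), negative(flow_y)]


def box_blur(img, radius=1):
    """Mean filter over the in-bounds neighbors of each pixel."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def motion_descriptor(flow_x, flow_y, radius=1):
    """Four blurred, half-wave rectified flow channels for one frame."""
    return [box_blur(c, radius) for c in half_wave_rectify(flow_x, flow_y)]


def frame_similarity(desc_a, desc_b):
    """Sum of per-pixel products over all four channels (unnormalized)."""
    return sum(a * b
               for chan_a, chan_b in zip(desc_a, desc_b)
               for row_a, row_b in zip(chan_a, chan_b)
               for a, b in zip(row_a, row_b))
```

Rectifying before blurring matters: blurring the signed flow directly would let opposite motions of nearby pixels cancel, whereas separate non-negative channels keep both.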

Page 12: Spatio-temporal Motion Descriptor

[Figure: Sequence A and Sequence B are compared along time t over a temporal extent E. The frame-to-frame similarity matrix (A vs. B) is combined with an E × E identity (I) matrix, or a blurry I matrix, to give the motion-to-motion similarity matrix (A vs. B).]
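One plausible reading of this slide is that the motion-to-motion similarity at (i, j) sums frame-to-frame similarities along the diagonal through (i, j) over the temporal extent, with a blurry identity kernel also giving some weight to near-diagonal entries. The sketch below follows that reading. The kernel shape `falloff ** |i - j|`, the window of `2 * extent + 1` frames, and the function names are assumptions, not the authors' exact formulation.

```python
def blurry_identity(extent, falloff=0.5):
    """Kernel K[i][j] = falloff ** |i - j| of size (2 * extent + 1).

    falloff = 0.0 gives the plain identity matrix, since 0.0 ** 0 == 1.0.
    """
    size = 2 * extent + 1
    return [[falloff ** abs(i - j) for j in range(size)] for i in range(size)]


def motion_similarity(frame_sim, extent, kernel):
    """Aggregate a frame-to-frame similarity matrix over a temporal window.

    out[i][j] = sum over offsets a, b in [-extent, extent] of
                kernel[a + extent][b + extent] * frame_sim[i + a][j + b],
    skipping offsets that fall outside either sequence.
    """
    rows, cols = len(frame_sim), len(frame_sim[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = 0.0
            for a in range(-extent, extent + 1):
                for b in range(-extent, extent + 1):
                    if 0 <= i + a < rows and 0 <= j + b < cols:
                        weight = kernel[a + extent][b + extent]
                        total += weight * frame_sim[i + a][j + b]
            out[i][j] = total
    return out
```

With the plain identity kernel, two frames score highly only if their neighbors in time also match in the same order; the blurry version tolerates small differences in timing.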

Page 13: Football Actions: matching

[Figure: input sequence alongside matched frames (input vs. matched)]

Page 14: Football Actions: classification

10 actions; 4500 total frames; 13-frame motion descriptor

Page 15: Classifying Ballet Actions

16 actions; 24800 total frames; 51-frame motion descriptor. Men used to classify women and vice versa.

Page 16: Classifying Tennis Actions

6 actions; 4600 frames; 7-frame motion descriptor. Woman player used as training, man as testing.

Page 17: Classifying Tennis

• Red bars show classification results

Page 18: Querying the Database

[Figure: input sequence matched against a database of labeled clips (run, walk left, swing, walk right, jog). Outputs: Action Recognition (run, walk left, swing, walk right, jog) and Joint Positions.]

Page 19: 2D Skeleton Transfer

• We annotate database with 2D joint positions
• After matching, transfer data to novel sequence
  – Adjust the match for best fit

[Figure: input sequence and transferred 2D skeletons]
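The transfer step might look like the sketch below: for every frame of the novel sequence, find the best-matching database frame and copy its annotated joints. The joint dictionaries, the similarity-matrix layout (rows are novel frames, columns are database frames), and the name `transfer_skeletons` are assumptions; the "adjust the match for best fit" refinement is not implemented here.

```python
def transfer_skeletons(motion_sim, database_joints):
    """Copy joint annotations from the best-matching database frame.

    motion_sim[i][j] is the similarity between novel frame i and database
    frame j; database_joints[j] maps joint names to (x, y) positions in
    figure-centric coordinates.
    """
    transferred = []
    for row in motion_sim:
        best = max(range(len(row)), key=row.__getitem__)
        transferred.append(dict(database_joints[best]))  # copy, do not alias
    return transferred
```

Because both sequences are figure-centric, the copied coordinates land in roughly the right place without any extra translation.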

Page 20: 3D Skeleton Transfer

• We populate database with rendered stick figures from 3D Motion Capture data
• Matching as before, we get 3D joint positions (kind of)!

[Figure: input sequence and transferred 3D skeletons]

Page 21: “Do as I Do” Motion Synthesis

• Matching two things:
  – Motion similarity across sequences
  – Appearance similarity within sequence (like VideoTextures)
• Dynamic Programming

[Figure: input sequence and synthetic sequence]
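The two matching terms and the dynamic-programming step on this slide could be combined as in the Viterbi-style sketch below. It picks one target frame per input frame so that the total motion similarity plus a weighted smoothness term between consecutive picks is maximal. The exact objective, the `weight` parameter, and the meaning given to `appearance_sim` (how natural it looks to show target frame s right after target frame p) are assumptions made for illustration.

```python
def synthesize(motion_sim, appearance_sim, weight=1.0):
    """Choose a target frame index for every input frame.

    Maximizes  sum_t motion_sim[t][s_t]
             + weight * sum_t appearance_sim[s_{t-1}][s_t]
    by dynamic programming over the target frames.
    """
    n_states = len(appearance_sim)
    score = list(motion_sim[0])  # best score of a path ending in each state
    back = []                    # back[t - 1][s]: best predecessor of s at t
    for t in range(1, len(motion_sim)):
        new_score, pointers = [], []
        for s in range(n_states):
            prev = max(range(n_states),
                       key=lambda p: score[p] + weight * appearance_sim[p][s])
            new_score.append(score[prev]
                             + weight * appearance_sim[prev][s]
                             + motion_sim[t][s])
            pointers.append(prev)
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=score.__getitem__)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

With `weight=0.0` the result is just the best motion match per frame, which can jump around the target sequence; raising `weight` trades some motion fidelity for a smoother-looking output.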

Page 22: “Do as I Do”

[Figure: Source Motion, Source Appearance, Result]

3400 Frames

Page 23: “Do as I Say” Synthesis

• Synthesize given action labels
  – e.g. video game control

[Figure: action labels (run, walk left, swing, walk right, jog) and the resulting synthetic sequence, drawn from a database of labeled clips (run, walk left, swing, walk right, jog)]

Page 24: “Do as I Say”

• Red box shows when constraint is applied

Page 25: Actor Replacement

SHOW VIDEO (GregWorldCup.avi, DivX)

Page 26: Conclusions

• In medium field, action is about motion
• What we propose:
  – A way of matching motions at coarse scale
• What we get out:
  – Action recognition
  – Skeleton transfer
  – Synthesis: “Do as I Do” & “Do as I Say”
• What we learned?
  – A lot to be said for the “little guy”!