lecture 18: human motion recognition - stanford...

72
Lecture 18: Human Motion Recognition Professor FeiFei Li Stanford Vision Lab Stanford Vision Lab Lecture 18 - Fei-Fei Li 7Mar11 1

Upload: hadat

Post on 07-Aug-2018

231 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Lecture 18: Human Motion Recognition

Professor Fei‐Fei Li

Stanford Vision LabStanford Vision Lab

Lecture 18 -Fei-Fei Li 7‐Mar‐111

Page 2: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

What we will learn today?

• IntroductionIntroduction

• Motion classification using template matching

i l ifi i i i l• Motion classification using spatio‐temporal features

• Motion classification using pose estimation

Lecture 18 -Fei-Fei Li 7‐Mar‐112

Page 3: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

search

sport analysis

Human

sport analysissurveillance

MotionAnalysis

h isynthesisbiomechanics

signlanguage

Lecture 18 -Fei-Fei Li 7‐Mar‐113

game interfacesmotion captureg g

Page 4: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

It’s challenging…

Variation in Appearance Variation in Pose

Occlusion & clutter

Lecture 18 -Fei-Fei Li 7‐Mar‐114

Adapted from http://luthuli.cs.uiuc.edu/~daf/tutorial.htmlVariation in view‐point

Page 5: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

It’s challenging…It s challenging…

• No clear action/movement classesNo clear action/movement classes– Unless in specific domain

• At different temporal spans motion may• At different temporal spans, motion may acquire different meanings

I i i fl h i• Interactions influence the meaning– With objects

– With people

• Goals & intentions

Lecture 18 -Fei-Fei Li 7‐Mar‐115

Page 6: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Current motion technologyCurrent motion technology

• Mostly action discrimination• Mostly action discrimination– Walking or jumping?

M tl i i l i• Mostly in simple scenarios

• We don’t have a winner feature/representation yet

Lecture 18 -Fei-Fei Li 7‐Mar‐116

Page 7: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

KTH Dataset

Lecture 18 -Fei-Fei Li 7‐Mar‐117

[Schuldt et al ICPR 04]

Page 8: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Weizmann Dataset

Bend JumpRunWave2P‐Jump

SideSkipWalk Wave1Jacks

Lecture 18 -Fei-Fei Li 7‐Mar‐118

[Blank et al ICCV 2005]

Page 9: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

UCF SportsLifting

13 action classes

Golf‐swing • Diving • Liftingd• Golf‐swing

• Kicking• Riding‐Horse• Run

Lecture 18 -Fei-Fei Li 7‐Mar‐119

• Walk • … [Rodriguez et al CVPR 2008]

Page 10: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Hollywood Human ActionsHollywood Human Actions•AnswerPhone•EatEat•FightPerson•GetOutCarH dSh k

12 Action •HandShake•Kiss•Run

Classes

•SitDown•SitUp•StandUp

[Laptev et al 

StandUp

Lecture 18 -Fei-Fei Li 7‐Mar‐1110

CVPR08/CVPR09]

Page 11: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Three representations

• Global templateGlobal template

• Local video patches

• Kinematic/body pose

Lecture 18 -Fei-Fei Li 7‐Mar‐1111

Page 12: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Fields of view

Lecture 18 -Fei-Fei Li 7‐Mar‐1112

Page 13: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

The Big PictureThe Big Picture

• Goal: Action classificationGoal: Action classification– E.g. walk left, walk right, run left, run right, etc

• Steps• Steps:1. Tracking person in video

2. Feature extraction

3. Classification• Nearest Neighbor

• AdaBoost

Lecture 18 -Fei-Fei Li 7‐Mar‐1113

Page 14: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

How To Gather Action Data

A person in a particular body 

T ki

configuration should always map to approximately the same stabilized image

> We lose all translational• Tracking – Simple correlation‐based tracker

– User‐initialized

‐> We lose all translational information but optical flow features allow us to distinguish between running and walking.

Lecture 18 -Fei-Fei Li 7‐Mar‐1114

• Forms a spatio‐temporal volume for each person

Page 15: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Low‐level Motion Features

1.  Compute Optical Flow (Lucas and Kanadealgorithm).

2.  Optical flow vector field F is split into horizontal and vertical components (Fx, Fy)., y

3 F and F are split into four non negative3.  Fx and Fy are split into four non‐negative channels Fx+, Fx‐ , Fy+, Fy‐.

4.  Each channel is blurred to reduce noise.

Lecture 18 -Fei-Fei Li

7‐Mar‐1115

Page 16: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Motion Context Descriptor: A Feature that Combines Shape and MotionA Feature that Combines Shape and Motion

2x2 grid,18‐bin radial histogram

histogramconcatenation

concatenate medium scale 

motion

normalized 120x120 box

3 information channels

Lecture 18 -Fei-Fei Li 7‐Mar‐1116

[Sorokin&Tran ECCCV 08]

histogram motion summaries

120x120 box channels

Page 17: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Classifying ActionsClassifying Actions

• Nearest Neighbors [Efros et al ICCV 2003]Nearest Neighbors [Efros et al ICCV 2003]

• Boosting [Fathi & Mori CVPR 2008]

i i [ & S ki CC 2008]• Metric Learning [Tran & Sorokin ECCV 2008]

Lecture 18 -Fei-Fei Li 7‐Mar‐1117

Page 18: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Nearest Neighbors[Efros et al, ICCV 2003]

• Assume we have template videos containing ssu e e a e te p ate deos co ta gexamples of actions we want to classify– We form motion descriptors (i.e. feature vectors) for both the train and test data

• Find the k‐nearest neighbors and use majority g j yvoting to identify the action class

• The motion descriptors are constructed to be• The motion descriptors are constructed to be discriminative so a simple classification algorithm is sufficient.

Lecture 18 -Fei-Fei Li 7‐Mar‐1118

Page 19: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Nearest Neighbor Results[Efros et al, ICCV 2003]

I t id N t i hb t hInput video: Nearest neighbor match:

Action Classification Result

Lecture 18 -Fei-Fei Li 7‐Mar‐1119

Page 20: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Boosting [Fathi and Mori CVPR 2008]

• Low‐level features alone are not capable of discriminating p gbetween positive and negative classes.

– They are weak‐classifiers

– Combine them via AdaBoost

• Mid‐level features are obtained via AdaBoost from low‐level featuresfeatures.

Fi l l ifi bt i d i Ad b t f id l l f t

(Examples of low‐level features selected for mid‐level features.)

Lecture 18 -Fei-Fei Li

• Final classifier obtained via Adaboost from mid‐level features.

7‐Mar‐1120

Page 21: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Boosting [Fathi and Mori CVPR 2008]

• Mid‐level motion featuresMid level motion features

• Final classifier

Lecture 18 -Fei-Fei Li 7‐Mar‐1121

Page 22: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Three representations

• Global templateGlobal template

• Local video patches

• Kinematic/body pose

Lecture 18 -Fei-Fei Li 7‐Mar‐1122

Page 23: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Big Picture: Local Video PatchesBig Picture: Local Video Patches

• Similar to SIFT, we would like to identify key points inSimilar to SIFT, we would like to identify key points in videos– Space‐time Interest Points

• As done in bag‐of‐words approach, we can combine h ‘ ’ f fthe ‘Space‐time Interest Points’ to form a feature vector

Bag of space time features– Bag‐of‐space‐time features

• Allows the use of a discriminative classifier (e.g. SVM)

Lecture 18 -Fei-Fei Li

Allows the use of a discriminative classifier (e.g. SVM)

7‐Mar‐1123

Page 24: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Motivation

Sparse and Localrepresentation

Spatio temporalSpatio-temporalinformation

Lecture 18 -Fei-Fei Li

Johansson, 1973

7‐Mar‐1124

Page 25: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Local Video PatchesNo global assumptions  

Consider local spatio‐temporal neighborhoods 

b ihand waving

boxing

Lecture 18 -Fei-Fei Li 7‐Mar‐1125

Page 26: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Local Video PatchesNo global assumptions  

Consider local spatio‐temporal neighborhoods 

b ihand waving

boxing

Lecture 18 -Fei-Fei Li 7‐Mar‐1126

Page 27: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Local Video Patches

Bag of space‐time features + Statistical classifier ‐> Action Classification[Schuldt’04, Niebles’06, Zhang’07, Laptev’08,’09]

Collection of space‐time patches

Space‐time interest point detection

Video Feature RepresentationVideo Feature Representation

ClassifierClassifier

Lecture 18 -Fei-Fei Li 7‐Mar‐1127

RepresentationRepresentation

Page 28: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Space‐Time Features: DetectorSpace Time Features: Detector

• Extend the idea of 2D interest points to 3DExtend the idea of 2D interest points to 3D interest points in space and time.

• Build on the idea of Harris and Förstnerinterest point operators

• We want to find regions with large variation in b th d ti d iboth space and time domains

Lecture 18 -Fei-Fei Li 7‐Mar‐1128

Page 29: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Space‐Time Features: Detector Space‐time corner detector

[Laptev, IJCV 2005]

Dense scale sampling (no explicit scale selection)

Lecture 18 -Fei-Fei Li 7‐Mar‐1129

Page 30: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Space‐Time Features: Detectorp

Feature extraction given interest point locations:

Histogram of oriented  Histogram of oriented 

Lecture 18 -Fei-Fei Li 7‐Mar‐1130

gradient (HOG) flow (HOF)

Page 31: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Codebook and

Training …

and Representation

Videos …

Feature Space

Lecture 18 -Fei-Fei Li 7‐Mar‐1131

Page 32: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Codebook and

Training …

and Representation

Videos … Bag of features!

Input Video

Feature Space

C codewords

Codewords

Lecture 18 -Fei-Fei Li 7‐Mar‐1132

Page 33: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Classifying ActionsClassifying Actions• SVM [Schuldt et al ICPR04, Laptev CVPR09]

• Topic Models (pLSA, LDA) [Niebles et al IJCV 08]

• ……

Lecture 18 -Fei-Fei Li 7‐Mar‐1133

Page 34: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Topic Models (pLSA, LDA) [Niebles et al IJCV 08]

Lecture 18 -Fei-Fei Li 7‐Mar‐1134

Page 35: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

ObjectObject Bag of ‘words’Bag of ‘words’

Lecture 18 -Fei-Fei Li

Visual dictionary

7‐Mar‐1135

Page 36: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

ActionAction Bag of motion ‘words’Bag of motion ‘words’

Lecture 18 -Fei-Fei Li 7‐Mar‐1136

Visual dictionary

Page 37: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Codebookwalking running joggingwalking running jogging

boxinghandclappinghandwaving

Lecture 18 -Fei-Fei Li

boxinghandclappinghandwaving

7‐Mar‐1137

Page 38: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Unsupervised learning using pLSA

NWd

z wd

spatio temporal word

input

spatio‐temporal word

action category“camel spin”

Lecture 18 -Fei-Fei Li

pJ.C. Niebles, H. Wang, L. Fei‐Fei,  BMVC, 2006

7‐Mar‐1138

Page 39: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

pLSA Model

NWd

z wd

K

K

kjkkiji dzpzwpdwp

1)|()|()|(

action category vectors

Word distribution per action category

action category weights

Action category distribution per video

Lecture 18 -Fei-Fei Li

g y p

7‐Mar‐1139

J.C. Niebles, H. Wang, L. Fei‐Fei,  BMVC, 2006

Page 40: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

pLSA Model

NWd

z wd

K

k

jkkiji dzpzwpdwp1

)|()|()|(

Unsupervised LearningUnsupervised Learning

M Ndwn

jijidwpL ,)|(

Lecture 18 -Fei-Fei Li

i j1 1

7‐Mar‐1140

Page 41: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Recognition

New video

Codewords

Word distribution given action category

Action category distribution given test video given action categorygiven test video

zw PdzP dwP K

k ktestktest

1|||

testk dzP |maxargcategoryaction

Lecture 18 -Fei-Fei Li

k

7‐Mar‐1141

Page 42: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

KTH d t t

Experiment I:

KTH dataset[Schuldt et al., 2004]:

25 persons, indoors and outdoors, 4 long sequences per person

Lecture 18 -Fei-Fei Li

p , , g q p pJ.C. Niebles, H. Wang, L. Fei‐Fei,  BMVC, 2006

7‐Mar‐1142

Page 43: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Experiment I: Performance

• Leave-one person out cross validation

• Average performance: 81 50%

• Unsupervised training• Handle multiple motions

• Average performance: 81.50%

81.50% 81.17%85%

71 72%75%

80%

71.72%

70%

62.96%

60%

65%

Our Dollar et Schuldt Ke et al

Lecture 18 -

OurMethod

Dollar etal.

Schuldtet al.

Ke et al.

7‐Mar‐11

Page 44: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Experiment I: Caltech dataset

walkingg

running

Trained with the KTH data

Tested with the Caltech dataset

Onl o ds f om the co esponding action a e sho n

Lecture 18 -

Only words from the corresponding action are shownJ.C. Niebles, H. Wang, L. Fei‐Fei,  BMVC, 2006

7‐Mar‐11

Page 45: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Experiment I: A longer sequence

walkingwalking

running

Trained with the KTH data

Tested with our own data

Lecture 18 -

J.C. Niebles, H. Wang, L. Fei‐Fei,  BMVC, 2006

7‐Mar‐11

Page 46: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Experiment I: Multiple motions

handclapping

handwaving

Trained with the KTH data

Tested with our own dataown data

Lecture 18 -

J.C. Niebles, H. Wang, L. Fei‐Fei,  BMVC, 2006

7‐Mar‐11

Page 47: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Figure Skating data set:

Experiment II:Figure Skating data set:[Y.Wang, G.Mori et al, CVPR 2006]

7 pe sons 3 action classes camel spin stand spin sit spin

Lecture 18 -Fei-Fei Li

7 persons, 3 action classes: camel spin, stand spin, sit spinJ.C. Niebles, H. Wang, L. Fei‐Fei,  BMVC, 2006

7‐Mar‐1147

Page 48: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Experiment II: Examples

Figure skating actions

Camel spin Sit spin Stand spinCa e sp S t sp p

Lecture 18 -Fei-Fei Li

J.C. Niebles, H. Wang, L. Fei‐Fei,  BMVC, 2006

7‐Mar‐1148

Page 49: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Experiment II: Long Sequences

Lecture 18 -Fei-Fei Li

J.C. Niebles, H. Wang, L. Fei‐Fei,  BMVC, 2006

7‐Mar‐1149

Page 50: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

But…

W d d l th t l it t i l t f f t

Lecture 18 -Fei-Fei Li

• We need models that exploit geometrical arrangements of features

7‐Mar‐1150

Page 51: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

In the object recognition world

• all have same P under bag‐of‐words model.

• part‐based models that capture geometrical informationC t ll ti d l– Constellation model

– Pictorial structures

– etc

Lecture 18 -Fei-Fei Li 7‐Mar‐1151

Page 52: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

bags of featuresConstellation Model

P1 P2

P3 P4

w w

Image Image

Bg

Image g

Strong shape representation Large number of features

Lecture 18 -Fei-Fei Li

Small number of features No shape information

7‐Mar‐1152

Page 53: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

bags of featuresConstellation ModelConstellation of bags of features

P1 P2 P1 P2

P3 P4 P3 P4

w ww

Image ImageImage

Bg

Image g

Bg

Image

Strong shape representation Large number of features

Lecture 18 -Fei-Fei Li

J.C. Niebles, & L. Fei‐Fei,  CVPR, 2007

7‐Mar‐1153

Page 54: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Constellation ModelConstellation of bags of 

features

P1 P2 P1 P2

P3 P4 P3 P4P3 P4 P3 P4

w w

Bg

Image

Bg

Image

Lecture 18 -Fei-Fei Li 7‐Mar‐1154

J.C. Niebles, & L. Fei‐Fei,  CVPR, 2007

Page 55: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Constellation of bags of features

P1 P2

P3 P4

Partlayer

P3 P4

wFeaturelayer

Bg

Image

Lecture 18 -Fei-Fei Li 7‐Mar‐1155

J.C. Niebles, & L. Fei‐Fei,  CVPR, 2007

Page 56: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Constellation of bags of featuresθω

bendP1 P2

θω

P3 P4P3 P4

w

• Use a mixture to account for data multimodality. Bg

Image

Lecture 18 -Fei-Fei Li 7‐Mar‐1156

J.C. Niebles, & L. Fei‐Fei,  CVPR, 2007

Page 57: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Constellation of bags of featuresθω

P1 P2

θω

Partlayer

P3 P4P3 P4

Featurelayer

w

Bg

Image

Lecture 18 -Fei-Fei Li 7‐Mar‐1157

J.C. Niebles, & L. Fei‐Fei,  CVPR, 2007

Page 58: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

θω

Mixture model

Single component

bag of words

P

P1 P2

PP3

P1 P2

P4P3 P4

w

3 4

w w

BgImage

BgImage Image

Classification performance

Lecture 18 -Fei-Fei Li 7‐Mar‐1158

Page 59: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Constellation of bags of featuresθω

P1 P2

θω

P3 P4

Partlayer

P3 P4

wFeaturelayer

Bg

Image

Lecture 18 -Fei-Fei Li 7‐Mar‐1159

J.C. Niebles, & L. Fei‐Fei,  CVPR, 2007

Page 60: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Constellation of bags of features

S i fθω• Static features

Edge map + Shape Context[Belongie ’03]

P1 P2

θω

• Motion featuresInterest Points + ST‐gradients[D ll ’05] P3 P4[Dollar ’05] P3 P4

Featurelayer

w

Bg

Image

Lecture 18 -Fei-Fei Li 7‐Mar‐1160

J.C. Niebles, & L. Fei‐Fei,  CVPR, 2007

Page 61: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Constellation of bags of featuresθωS i f

P1 P2

θω• Static featuresEdge map + Shape Context

[Belongie ’03]

P3 P4

• Motion featuresInterest Points + ST‐gradients[D ll ’05] P3 P4[Dollar ’05]

w

Bg

Image

Lecture 18 -Fei-Fei Li 7‐Mar‐1161

J.C. Niebles, & L. Fei‐Fei,  CVPR, 2007

Page 62: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Dataset

Bend JumpRunWave2P-Jump

SideSkipWalk Wave1Jacks

“Actions as Space‐Time Shapes”M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri.IEEE International Conference on Computer Vision (ICCV), Beijing, October 2005.

SideSkipWalk Wave1Jacks

Lecture 18 -Fei-Fei Li

f p ( ) j g

7‐Mar‐1162

J.C. Niebles, & L. Fei‐Fei,  CVPR, 2007

Page 63: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Experiment

• 9 action classes,performed by 9subjects [Blank et al 2005]

• Leave one outcross‐validation

• Video Classificationfperformance: 72.8% 

Lecture 18 -Fei-Fei Li 7‐Mar‐1163

J.C. Niebles, & L. Fei‐Fei,  CVPR, 2007

Page 64: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Recognition with constellation of bags of features 

Predicted frame label

Ground truth frame label frame label

PredictedAccumulated votes

Predicted sequence label

Correct Incorrect

Lecture 18 -Fei-Fei Li 7‐Mar‐1164

J.C. Niebles, & L. Fei‐Fei,  CVPR, 2007

Page 65: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Recognition with constellation of bags of features 

Predicted frame label

Ground truth frame label frame label

PredictedAccumulated votes

Predicted sequence label

Correct Incorrect

Lecture 18 -Fei-Fei Li 7‐Mar‐1165

J.C. Niebles, & L. Fei‐Fei,  CVPR, 2007

Page 66: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Three representations

• Global templateGlobal template

• Local video patches

• Kinematic/body pose

Lecture 18 -Fei-Fei Li 7‐Mar‐1166

Page 67: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Lecture 18 -Fei-Fei Li 7‐Mar‐1167

[Ramanan& Forsyth, NIPS 2003]

Page 68: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Lecture 18 -Fei-Fei Li 7‐Mar‐1168

Page 69: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Lecture 18 -Fei-Fei Li 7‐Mar‐1169

Page 70: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

search

sport analysis

Human

sport analysissurveillance

MotionAnalysis

th isynthesisbiomechanics

signlanguage

Lecture 18 -Fei-Fei Li 7‐Mar‐1170

game interfacesmotion capture

Page 71: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

Useful SurveysUseful Surveys

• Turaga, P.K., Chellappa, R., Subrahmanian, V.S., Udrea, g , , pp , , , , ,O.Machine Recognition of Human Activities: A SurveyCirSysVideo(18), No. 11, November 2008, pp. 1473‐14881488.

• David A. Forsyth, Okan Arikan, Leslie Ikemoto.y , ,Computational Studies of Human Motion: Tracking and Motion Synthesis, Part 1

Lecture 18 -Fei-Fei Li 7‐Mar‐1171

Page 72: Lecture 18: Human Motion Recognition - Stanford …vision.stanford.edu/teaching/cs231a_autumn1112/lecture/lecture18... · Lecture 18: Human Motion Recognition ... discriminative so

What we have learned today?

• IntroductionIntroduction

• Motion classification using template matching

i l ifi i i i l• Motion classification using spatio‐temporal features

• Motion classification using pose estimation

Lecture 18 -Fei-Fei Li 7‐Mar‐1172