lecture 18: human motion recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... ·...
TRANSCRIPT
![Page 1: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/1.jpg)
Lecture 18: Human Motion Recognition
Professor Fei‐Fei Li
Stanford Vision LabStanford Vision Lab
Lecture 18 -Fei-Fei Li 7‐Mar‐111
![Page 2: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/2.jpg)
What we will learn today?
• IntroductionIntroduction
• Motion classification using template matching
i l ifi i i i l• Motion classification using spatio‐temporal features
• Motion classification using pose estimation
Lecture 18 -Fei-Fei Li 7‐Mar‐112
![Page 3: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/3.jpg)
search
sport analysis
Human
sport analysissurveillance
MotionAnalysis
h isynthesisbiomechanics
signlanguage
Lecture 18 -Fei-Fei Li 7‐Mar‐113
game interfacesmotion captureg g
![Page 4: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/4.jpg)
It’s challenging…
Variation in Appearance Variation in Pose
Occlusion & clutter
Lecture 18 -Fei-Fei Li 7‐Mar‐114
Adapted from http://luthuli.cs.uiuc.edu/~daf/tutorial.htmlVariation in view‐point
![Page 5: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/5.jpg)
It’s challenging…It s challenging…
• No clear action/movement classesNo clear action/movement classes– Unless in specific domain
• At different temporal spans motion may• At different temporal spans, motion may acquire different meanings
I i i fl h i• Interactions influence the meaning– With objects
– With people
• Goals & intentions
Lecture 18 -Fei-Fei Li 7‐Mar‐115
![Page 6: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/6.jpg)
Current motion technologyCurrent motion technology
• Mostly action discrimination• Mostly action discrimination– Walking or jumping?
M tl i i l i• Mostly in simple scenarios
• We don’t have a winner feature/representation yet
Lecture 18 -Fei-Fei Li 7‐Mar‐116
![Page 7: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/7.jpg)
KTH Dataset
Lecture 18 -Fei-Fei Li 7‐Mar‐117
[Schuldt et al ICPR 04]
![Page 8: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/8.jpg)
Weizmann Dataset
Bend JumpRunWave2P‐Jump
SideSkipWalk Wave1Jacks
Lecture 18 -Fei-Fei Li 7‐Mar‐118
[Blank et al ICCV 2005]
![Page 9: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/9.jpg)
UCF SportsLifting
13 action classes
Golf‐swing • Diving • Liftingd• Golf‐swing
• Kicking• Riding‐Horse• Run
Lecture 18 -Fei-Fei Li 7‐Mar‐119
• Walk • … [Rodriguez et al CVPR 2008]
![Page 10: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/10.jpg)
Hollywood Human ActionsHollywood Human Actions•AnswerPhone•EatEat•FightPerson•GetOutCarH dSh k
12 Action •HandShake•Kiss•Run
Classes
•SitDown•SitUp•StandUp
[Laptev et al
StandUp
Lecture 18 -Fei-Fei Li 7‐Mar‐1110
CVPR08/CVPR09]
![Page 11: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/11.jpg)
Three representations
• Global templateGlobal template
• Local video patches
• Kinematic/body pose
Lecture 18 -Fei-Fei Li 7‐Mar‐1111
![Page 12: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/12.jpg)
Fields of view
Lecture 18 -Fei-Fei Li 7‐Mar‐1112
![Page 13: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/13.jpg)
The Big PictureThe Big Picture
• Goal: Action classificationGoal: Action classification– E.g. walk left, walk right, run left, run right, etc
• Steps• Steps:1. Tracking person in video
2. Feature extraction
3. Classification• Nearest Neighbor
• AdaBoost
Lecture 18 -Fei-Fei Li 7‐Mar‐1113
![Page 14: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/14.jpg)
How To Gather Action Data
A person in a particular body
T ki
configuration should always map to approximately the same stabilized image
> We lose all translational• Tracking – Simple correlation‐based tracker
– User‐initialized
‐> We lose all translational information but optical flow features allow us to distinguish between running and walking.
Lecture 18 -Fei-Fei Li 7‐Mar‐1114
• Forms a spatio‐temporal volume for each person
![Page 15: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/15.jpg)
Low‐level Motion Features
1. Compute Optical Flow (Lucas and Kanadealgorithm).
2. Optical flow vector field F is split into horizontal and vertical components (Fx, Fy)., y
3 F and F are split into four non negative3. Fx and Fy are split into four non‐negative channels Fx+, Fx‐ , Fy+, Fy‐.
4. Each channel is blurred to reduce noise.
Lecture 18 -Fei-Fei Li
7‐Mar‐1115
![Page 16: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/16.jpg)
Motion Context Descriptor: A Feature that Combines Shape and MotionA Feature that Combines Shape and Motion
2x2 grid,18‐bin radial histogram
histogramconcatenation
concatenate medium scale
motion
normalized 120x120 box
3 information channels
Lecture 18 -Fei-Fei Li 7‐Mar‐1116
[Sorokin&Tran ECCCV 08]
histogram motion summaries
120x120 box channels
![Page 17: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/17.jpg)
Classifying ActionsClassifying Actions
• Nearest Neighbors [Efros et al ICCV 2003]Nearest Neighbors [Efros et al ICCV 2003]
• Boosting [Fathi & Mori CVPR 2008]
i i [ & S ki CC 2008]• Metric Learning [Tran & Sorokin ECCV 2008]
Lecture 18 -Fei-Fei Li 7‐Mar‐1117
![Page 18: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/18.jpg)
Nearest Neighbors[Efros et al, ICCV 2003]
• Assume we have template videos containing ssu e e a e te p ate deos co ta gexamples of actions we want to classify– We form motion descriptors (i.e. feature vectors) for both the train and test data
• Find the k‐nearest neighbors and use majority g j yvoting to identify the action class
• The motion descriptors are constructed to be• The motion descriptors are constructed to be discriminative so a simple classification algorithm is sufficient.
Lecture 18 -Fei-Fei Li 7‐Mar‐1118
![Page 19: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/19.jpg)
Nearest Neighbor Results[Efros et al, ICCV 2003]
I t id N t i hb t hInput video: Nearest neighbor match:
Action Classification Result
Lecture 18 -Fei-Fei Li 7‐Mar‐1119
![Page 20: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/20.jpg)
Boosting [Fathi and Mori CVPR 2008]
• Low‐level features alone are not capable of discriminating p gbetween positive and negative classes.
– They are weak‐classifiers
– Combine them via AdaBoost
• Mid‐level features are obtained via AdaBoost from low‐level featuresfeatures.
Fi l l ifi bt i d i Ad b t f id l l f t
(Examples of low‐level features selected for mid‐level features.)
Lecture 18 -Fei-Fei Li
• Final classifier obtained via Adaboost from mid‐level features.
7‐Mar‐1120
![Page 21: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/21.jpg)
Boosting [Fathi and Mori CVPR 2008]
• Mid‐level motion featuresMid level motion features
• Final classifier
Lecture 18 -Fei-Fei Li 7‐Mar‐1121
![Page 22: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/22.jpg)
Three representations
• Global templateGlobal template
• Local video patches
• Kinematic/body pose
Lecture 18 -Fei-Fei Li 7‐Mar‐1122
![Page 23: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/23.jpg)
Big Picture: Local Video PatchesBig Picture: Local Video Patches
• Similar to SIFT, we would like to identify key points inSimilar to SIFT, we would like to identify key points in videos– Space‐time Interest Points
• As done in bag‐of‐words approach, we can combine h ‘ ’ f fthe ‘Space‐time Interest Points’ to form a feature vector
Bag of space time features– Bag‐of‐space‐time features
• Allows the use of a discriminative classifier (e.g. SVM)
Lecture 18 -Fei-Fei Li
Allows the use of a discriminative classifier (e.g. SVM)
7‐Mar‐1123
![Page 24: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/24.jpg)
Motivation
Sparse and Localrepresentation
Spatio temporalSpatio-temporalinformation
Lecture 18 -Fei-Fei Li
Johansson, 1973
7‐Mar‐1124
![Page 25: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/25.jpg)
Local Video PatchesNo global assumptions
Consider local spatio‐temporal neighborhoods
b ihand waving
boxing
Lecture 18 -Fei-Fei Li 7‐Mar‐1125
![Page 26: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/26.jpg)
Local Video PatchesNo global assumptions
Consider local spatio‐temporal neighborhoods
b ihand waving
boxing
Lecture 18 -Fei-Fei Li 7‐Mar‐1126
![Page 27: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/27.jpg)
Local Video Patches
Bag of space‐time features + Statistical classifier ‐> Action Classification[Schuldt’04, Niebles’06, Zhang’07, Laptev’08,’09]
Collection of space‐time patches
Space‐time interest point detection
Video Feature RepresentationVideo Feature Representation
ClassifierClassifier
Lecture 18 -Fei-Fei Li 7‐Mar‐1127
RepresentationRepresentation
![Page 28: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/28.jpg)
Space‐Time Features: DetectorSpace Time Features: Detector
• Extend the idea of 2D interest points to 3DExtend the idea of 2D interest points to 3D interest points in space and time.
• Build on the idea of Harris and Förstnerinterest point operators
• We want to find regions with large variation in b th d ti d iboth space and time domains
Lecture 18 -Fei-Fei Li 7‐Mar‐1128
![Page 29: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/29.jpg)
Space‐Time Features: Detector Space‐time corner detector
[Laptev, IJCV 2005]
Dense scale sampling (no explicit scale selection)
Lecture 18 -Fei-Fei Li 7‐Mar‐1129
![Page 30: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/30.jpg)
Space‐Time Features: Detectorp
Feature extraction given interest point locations:
Histogram of oriented Histogram of oriented
Lecture 18 -Fei-Fei Li 7‐Mar‐1130
gradient (HOG) flow (HOF)
![Page 31: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/31.jpg)
Codebook and
Training …
and Representation
Videos …
Feature Space
Lecture 18 -Fei-Fei Li 7‐Mar‐1131
![Page 32: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/32.jpg)
Codebook and
Training …
and Representation
Videos … Bag of features!
Input Video
Feature Space
C codewords
Codewords
Lecture 18 -Fei-Fei Li 7‐Mar‐1132
![Page 33: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/33.jpg)
Classifying ActionsClassifying Actions• SVM [Schuldt et al ICPR04, Laptev CVPR09]
• Topic Models (pLSA, LDA) [Niebles et al IJCV 08]
• ……
Lecture 18 -Fei-Fei Li 7‐Mar‐1133
![Page 34: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/34.jpg)
Topic Models (pLSA, LDA) [Niebles et al IJCV 08]
Lecture 18 -Fei-Fei Li 7‐Mar‐1134
![Page 35: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/35.jpg)
ObjectObject Bag of ‘words’Bag of ‘words’
Lecture 18 -Fei-Fei Li
Visual dictionary
7‐Mar‐1135
![Page 36: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/36.jpg)
ActionAction Bag of motion ‘words’Bag of motion ‘words’
Lecture 18 -Fei-Fei Li 7‐Mar‐1136
Visual dictionary
![Page 37: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/37.jpg)
Codebookwalking running joggingwalking running jogging
boxinghandclappinghandwaving
Lecture 18 -Fei-Fei Li
boxinghandclappinghandwaving
7‐Mar‐1137
![Page 38: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/38.jpg)
Unsupervised learning using pLSA
NWd
z wd
spatio temporal word
input
spatio‐temporal word
action category“camel spin”
Lecture 18 -Fei-Fei Li
pJ.C. Niebles, H. Wang, L. Fei‐Fei, BMVC, 2006
7‐Mar‐1138
![Page 39: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/39.jpg)
pLSA Model
NWd
z wd
K
K
kjkkiji dzpzwpdwp
1)|()|()|(
action category vectors
Word distribution per action category
action category weights
Action category distribution per video
Lecture 18 -Fei-Fei Li
g y p
7‐Mar‐1139
J.C. Niebles, H. Wang, L. Fei‐Fei, BMVC, 2006
![Page 40: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/40.jpg)
pLSA Model
NWd
z wd
K
k
jkkiji dzpzwpdwp1
)|()|()|(
Unsupervised LearningUnsupervised Learning
M Ndwn
jijidwpL ,)|(
Lecture 18 -Fei-Fei Li
i j1 1
7‐Mar‐1140
![Page 41: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/41.jpg)
Recognition
New video
Codewords
Word distribution given action category
Action category distribution given test video given action categorygiven test video
zw PdzP dwP K
k ktestktest
1|||
testk dzP |maxargcategoryaction
Lecture 18 -Fei-Fei Li
k
7‐Mar‐1141
![Page 42: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/42.jpg)
KTH d t t
Experiment I:
KTH dataset[Schuldt et al., 2004]:
25 persons, indoors and outdoors, 4 long sequences per person
Lecture 18 -Fei-Fei Li
p , , g q p pJ.C. Niebles, H. Wang, L. Fei‐Fei, BMVC, 2006
7‐Mar‐1142
![Page 43: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/43.jpg)
Experiment I: Performance
• Leave-one person out cross validation
• Average performance: 81 50%
• Unsupervised training• Handle multiple motions
• Average performance: 81.50%
81.50% 81.17%85%
71 72%75%
80%
71.72%
70%
62.96%
60%
65%
Our Dollar et Schuldt Ke et al
Lecture 18 -
OurMethod
Dollar etal.
Schuldtet al.
Ke et al.
7‐Mar‐11
![Page 44: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/44.jpg)
Experiment I: Caltech dataset
walkingg
running
Trained with the KTH data
Tested with the Caltech dataset
Onl o ds f om the co esponding action a e sho n
Lecture 18 -
Only words from the corresponding action are shownJ.C. Niebles, H. Wang, L. Fei‐Fei, BMVC, 2006
7‐Mar‐11
![Page 45: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/45.jpg)
Experiment I: A longer sequence
walkingwalking
running
Trained with the KTH data
Tested with our own data
Lecture 18 -
J.C. Niebles, H. Wang, L. Fei‐Fei, BMVC, 2006
7‐Mar‐11
![Page 46: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/46.jpg)
Experiment I: Multiple motions
handclapping
handwaving
Trained with the KTH data
Tested with our own dataown data
Lecture 18 -
J.C. Niebles, H. Wang, L. Fei‐Fei, BMVC, 2006
7‐Mar‐11
![Page 47: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/47.jpg)
Figure Skating data set:
Experiment II:Figure Skating data set:[Y.Wang, G.Mori et al, CVPR 2006]
7 pe sons 3 action classes camel spin stand spin sit spin
Lecture 18 -Fei-Fei Li
7 persons, 3 action classes: camel spin, stand spin, sit spinJ.C. Niebles, H. Wang, L. Fei‐Fei, BMVC, 2006
7‐Mar‐1147
![Page 48: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/48.jpg)
Experiment II: Examples
Figure skating actions
Camel spin Sit spin Stand spinCa e sp S t sp p
Lecture 18 -Fei-Fei Li
J.C. Niebles, H. Wang, L. Fei‐Fei, BMVC, 2006
7‐Mar‐1148
![Page 49: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/49.jpg)
Experiment II: Long Sequences
Lecture 18 -Fei-Fei Li
J.C. Niebles, H. Wang, L. Fei‐Fei, BMVC, 2006
7‐Mar‐1149
![Page 50: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/50.jpg)
But…
W d d l th t l it t i l t f f t
Lecture 18 -Fei-Fei Li
• We need models that exploit geometrical arrangements of features
7‐Mar‐1150
![Page 51: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/51.jpg)
In the object recognition world
• all have same P under bag‐of‐words model.
• part‐based models that capture geometrical informationC t ll ti d l– Constellation model
– Pictorial structures
– etc
Lecture 18 -Fei-Fei Li 7‐Mar‐1151
![Page 52: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/52.jpg)
bags of featuresConstellation Model
P1 P2
P3 P4
w w
Image Image
Bg
Image g
Strong shape representation Large number of features
Lecture 18 -Fei-Fei Li
Small number of features No shape information
7‐Mar‐1152
![Page 53: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/53.jpg)
bags of featuresConstellation ModelConstellation of bags of features
P1 P2 P1 P2
P3 P4 P3 P4
w ww
Image ImageImage
Bg
Image g
Bg
Image
Strong shape representation Large number of features
Lecture 18 -Fei-Fei Li
J.C. Niebles, & L. Fei‐Fei, CVPR, 2007
7‐Mar‐1153
![Page 54: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/54.jpg)
Constellation ModelConstellation of bags of
features
P1 P2 P1 P2
P3 P4 P3 P4P3 P4 P3 P4
w w
Bg
Image
Bg
Image
Lecture 18 -Fei-Fei Li 7‐Mar‐1154
J.C. Niebles, & L. Fei‐Fei, CVPR, 2007
![Page 55: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/55.jpg)
Constellation of bags of features
P1 P2
P3 P4
Partlayer
P3 P4
wFeaturelayer
Bg
Image
Lecture 18 -Fei-Fei Li 7‐Mar‐1155
J.C. Niebles, & L. Fei‐Fei, CVPR, 2007
![Page 56: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/56.jpg)
Constellation of bags of featuresθω
bendP1 P2
θω
P3 P4P3 P4
w
• Use a mixture to account for data multimodality. Bg
Image
Lecture 18 -Fei-Fei Li 7‐Mar‐1156
J.C. Niebles, & L. Fei‐Fei, CVPR, 2007
![Page 57: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/57.jpg)
Constellation of bags of featuresθω
P1 P2
θω
Partlayer
P3 P4P3 P4
Featurelayer
w
Bg
Image
Lecture 18 -Fei-Fei Li 7‐Mar‐1157
J.C. Niebles, & L. Fei‐Fei, CVPR, 2007
![Page 58: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/58.jpg)
θω
Mixture model
Single component
bag of words
P
P1 P2
PP3
P1 P2
P4P3 P4
w
3 4
w w
BgImage
BgImage Image
Classification performance
Lecture 18 -Fei-Fei Li 7‐Mar‐1158
![Page 59: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/59.jpg)
Constellation of bags of featuresθω
P1 P2
θω
P3 P4
Partlayer
P3 P4
wFeaturelayer
Bg
Image
Lecture 18 -Fei-Fei Li 7‐Mar‐1159
J.C. Niebles, & L. Fei‐Fei, CVPR, 2007
![Page 60: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/60.jpg)
Constellation of bags of features
S i fθω• Static features
Edge map + Shape Context[Belongie ’03]
P1 P2
θω
• Motion featuresInterest Points + ST‐gradients[D ll ’05] P3 P4[Dollar ’05] P3 P4
Featurelayer
w
Bg
Image
Lecture 18 -Fei-Fei Li 7‐Mar‐1160
J.C. Niebles, & L. Fei‐Fei, CVPR, 2007
![Page 61: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/61.jpg)
Constellation of bags of featuresθωS i f
P1 P2
θω• Static featuresEdge map + Shape Context
[Belongie ’03]
P3 P4
• Motion featuresInterest Points + ST‐gradients[D ll ’05] P3 P4[Dollar ’05]
w
Bg
Image
Lecture 18 -Fei-Fei Li 7‐Mar‐1161
J.C. Niebles, & L. Fei‐Fei, CVPR, 2007
![Page 62: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/62.jpg)
Dataset
Bend JumpRunWave2P-Jump
SideSkipWalk Wave1Jacks
“Actions as Space‐Time Shapes”M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri.IEEE International Conference on Computer Vision (ICCV), Beijing, October 2005.
SideSkipWalk Wave1Jacks
Lecture 18 -Fei-Fei Li
f p ( ) j g
7‐Mar‐1162
J.C. Niebles, & L. Fei‐Fei, CVPR, 2007
![Page 63: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/63.jpg)
Experiment
• 9 action classes,performed by 9subjects [Blank et al 2005]
• Leave one outcross‐validation
• Video Classificationfperformance: 72.8%
Lecture 18 -Fei-Fei Li 7‐Mar‐1163
J.C. Niebles, & L. Fei‐Fei, CVPR, 2007
![Page 64: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/64.jpg)
Recognition with constellation of bags of features
Predicted frame label
Ground truth frame label frame label
PredictedAccumulated votes
Predicted sequence label
Correct Incorrect
Lecture 18 -Fei-Fei Li 7‐Mar‐1164
J.C. Niebles, & L. Fei‐Fei, CVPR, 2007
![Page 65: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/65.jpg)
Recognition with constellation of bags of features
Predicted frame label
Ground truth frame label frame label
PredictedAccumulated votes
Predicted sequence label
Correct Incorrect
Lecture 18 -Fei-Fei Li 7‐Mar‐1165
J.C. Niebles, & L. Fei‐Fei, CVPR, 2007
![Page 66: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/66.jpg)
Three representations
• Global templateGlobal template
• Local video patches
• Kinematic/body pose
Lecture 18 -Fei-Fei Li 7‐Mar‐1166
![Page 67: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/67.jpg)
Lecture 18 -Fei-Fei Li 7‐Mar‐1167
[Ramanan& Forsyth, NIPS 2003]
![Page 68: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/68.jpg)
Lecture 18 -Fei-Fei Li 7‐Mar‐1168
![Page 69: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/69.jpg)
Lecture 18 -Fei-Fei Li 7‐Mar‐1169
![Page 70: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/70.jpg)
search
sport analysis
Human
sport analysissurveillance
MotionAnalysis
th isynthesisbiomechanics
signlanguage
Lecture 18 -Fei-Fei Li 7‐Mar‐1170
game interfacesmotion capture
![Page 71: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/71.jpg)
Useful SurveysUseful Surveys
• Turaga, P.K., Chellappa, R., Subrahmanian, V.S., Udrea, g , , pp , , , , ,O.Machine Recognition of Human Activities: A SurveyCirSysVideo(18), No. 11, November 2008, pp. 1473‐14881488.
• David A. Forsyth, Okan Arikan, Leslie Ikemoto.y , ,Computational Studies of Human Motion: Tracking and Motion Synthesis, Part 1
Lecture 18 -Fei-Fei Li 7‐Mar‐1171
![Page 72: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐](https://reader034.vdocuments.mx/reader034/viewer/2022051902/5ff1f4df6d9019128859155d/html5/thumbnails/72.jpg)
What we have learned today?
• IntroductionIntroduction
• Motion classification using template matching
i l ifi i i i l• Motion classification using spatio‐temporal features
• Motion classification using pose estimation
Lecture 18 -Fei-Fei Li 7‐Mar‐1172