School of Electronic Information Engineering, Tianjin University
Human Action Recognition by Learning Bases of Action Attributes and Parts
Jia Pingping
Outline:
1. Action Classification in Still Images
2. Intuition: Action Attributes and Parts
3. Algorithm: Learning Bases of Attributes and Parts
4. Experiments: PASCAL & Stanford 40 Actions
5. Conclusion
Action Classification in Still Images
Low-level feature → high-level representation → Riding bike
The high-level representation is built from semantic concepts:
- Attributes: riding a bike, sitting on a bike seat, wearing a helmet, pedaling the pedals, …
- Parts: objects and human poses
- Contexts of attributes & parts
Advantages: incorporates human knowledge; gives more understanding of the image content; yields a more discriminative classifier.
Outline (part 2 of 5): Intuition: Action Attributes and Parts
Action Attributes and Parts
Attributes: semantic descriptions of human actions. Each attribute is learned with a discriminative classifier, e.g., an SVM that separates positive examples ("Riding bike") from negative ones ("Not riding bike").
Parts: objects (Parts-Objects) and poselets (Parts-Poselets). Each part is scored by a pre-trained detector.
The outputs of attribute classification, object detection, and poselet detection together form a, the image feature vector.
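A minimal sketch of how a could be assembled, assuming linear attribute classifiers (weights, bias) and detectors that report (detector index, confidence) pairs. Keeping only the strongest response of each detector, with a floor of 0.0, is our assumption and not a detail given on the slides; the function names are hypothetical. In the PASCAL setting (14 attributes, 27 objects, 150 poselets), a has 14 + 27 + 150 = 191 entries.

```python
from typing import List, Sequence, Tuple

# Hypothetical types: a linear attribute classifier is (weights, bias);
# a detection is (detector_index, confidence).
LinearClassifier = Tuple[Sequence[float], float]
Detection = Tuple[int, float]


def attribute_scores(x: Sequence[float], classifiers: Sequence[LinearClassifier]) -> List[float]:
    """Decision value w.x + b of each attribute classifier on the low-level feature x."""
    return [sum(wi * xi for wi, xi in zip(w, x)) + b for w, b in classifiers]


def detector_scores(detections: Sequence[Detection], n_detectors: int) -> List[float]:
    """Strongest response of each detector in the image (assumed floor of 0.0)."""
    best = [0.0] * n_detectors
    for k, score in detections:
        best[k] = max(best[k], score)
    return best


def feature_vector(x, attr_classifiers, object_dets, poselet_dets, n_objects, n_poselets):
    """a = [attribute scores | object scores | poselet scores]."""
    return (attribute_scores(x, attr_classifiers)
            + detector_scores(object_dets, n_objects)
            + detector_scores(poselet_dets, n_poselets))
```

For example, `feature_vector(x, clfs, obj_dets, pos_dets, 27, 150)` with 14 classifiers in `clfs` returns a 191-dimensional list.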
Action bases Φ: the image feature vector is approximated as a ≈ Φw, where each column of Φ is an action basis and w holds the bases coefficients. An SVM then classifies the image (e.g., Riding bike).
Outline (part 3 of 5): Algorithm: Learning Bases of Attributes and Parts
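Bases of Attributes & Parts: Training
Each feature vector is approximated as a ≈ Φw.
• Input: a_1, …, a_N
• Output: action bases Φ = [φ_1, …, φ_M] and sparse coefficients W = [w_1, …, w_N]
• Jointly estimate Φ and W:
$$\min_{\Phi, W} \sum_{i=1}^{N} \left( \frac{1}{2} \lVert a_i - \Phi w_i \rVert_2^2 + \lambda \lVert w_i \rVert_1 \right) \quad \text{s.t.} \quad \lVert \phi_j \rVert_1 + \frac{\gamma}{2} \lVert \phi_j \rVert_2^2 \le 1, \; \forall j$$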
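The joint estimation can be sketched as alternating minimization: with Φ fixed, each w_i is computed by ISTA (iterative soft-thresholding); with W fixed, Φ takes one projected-gradient step, and each basis is projected back onto {φ : ||φ||_1 + (γ/2)||φ||_2^2 ≤ 1} by bisection on the Lagrange multiplier. This is a plain-Python sketch of the objective above under our own choices of solver, step sizes, initialization and iteration counts; it is not necessarily the optimizer used by the authors, and all function names are hypothetical.

```python
import random


def soft(x, t):
    """Soft-thresholding, the proximal operator of t*|x|."""
    return x - t if x > t else x + t if x < -t else 0.0


def reconstruct(cols, w):
    """Phi w, with Phi stored as a list of M basis columns of length D."""
    return [sum(w[j] * cols[j][d] for j in range(len(cols))) for d in range(len(cols[0]))]


def sparse_code(a, cols, lam, iters=200):
    """ISTA for min_w 0.5*||a - Phi w||_2^2 + lam*||w||_1."""
    L = sum(x * x for c in cols for x in c) or 1.0  # ||Phi||_F^2 bounds the gradient's Lipschitz constant
    w = [0.0] * len(cols)
    for _ in range(iters):
        r = [p - q for p, q in zip(reconstruct(cols, w), a)]  # residual Phi w - a
        w = [soft(w[j] - sum(c * rd for c, rd in zip(cols[j], r)) / L, lam / L)
             for j in range(len(cols))]
    return w


def project_basis(v, gamma, tol=1e-10):
    """Euclidean projection of v onto {phi : ||phi||_1 + gamma/2*||phi||_2^2 <= 1}."""
    def g(x):
        return sum(abs(t) for t in x) + 0.5 * gamma * sum(t * t for t in x)

    def shrink(mu):  # minimiser of 0.5*||x - v||^2 + mu*(||x||_1 + gamma/2*||x||^2)
        return [soft(t, mu) / (1.0 + mu * gamma) for t in v]

    if g(v) <= 1.0:
        return list(v)
    lo, hi = 0.0, max(abs(t) for t in v)  # shrink(hi) is all zeros, hence feasible
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(shrink(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    return shrink(hi)


def learn_bases(A, M, lam, gamma, outer=20, seed=0):
    """Alternate a W-step (ISTA, Phi fixed) and a projected-gradient Phi-step (W fixed)."""
    rng = random.Random(seed)
    D = len(A[0])
    cols = [project_basis([rng.gauss(0.0, 1.0) for _ in range(D)], gamma) for _ in range(M)]
    for _ in range(outer):
        W = [sparse_code(a, cols, lam) for a in A]
        R = [[p - q for p, q in zip(reconstruct(cols, w), a)] for a, w in zip(A, W)]
        L = sum(x * x for w in W for x in w) or 1.0  # ||W||_F^2 bounds the Lipschitz constant in Phi
        cols = [project_basis([cols[j][d] - sum(R[i][d] * W[i][j] for i in range(len(A))) / L
                               for d in range(D)], gamma)
                for j in range(M)]
    return cols, [sparse_code(a, cols, lam) for a in A]
```

With the slide settings one would call, e.g., `learn_bases(A, M=400, lam=0.1, gamma=0.15)`; pure Python is only practical for toy sizes, so a real run would use a vectorized solver.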
Bases of Attributes & Parts: Testing
• Input: a, together with the learned bases Φ = [φ_1, …, φ_M]
• Output: sparse w
• Estimate w:
$$\min_{w} \frac{1}{2} \lVert a - \Phi w \rVert_2^2 + \lambda \lVert w \rVert_1$$
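At test time Φ is fixed, so estimating w is a Lasso problem. A minimal sketch using cyclic coordinate descent, a standard Lasso solver that we substitute here for illustration (the slides do not say which solver is used); `lasso_cd` is a hypothetical name, and Φ is passed as a list of basis columns.

```python
def lasso_cd(a, cols, lam, sweeps=200):
    """Cyclic coordinate descent for min_w 0.5*||a - Phi w||_2^2 + lam*||w||_1.

    Phi is given as a list of M basis columns, each of length len(a).
    """
    w = [0.0] * len(cols)
    r = list(a)  # residual a - Phi w, kept up to date (w starts at zero)
    for _ in range(sweeps):
        for j, phi in enumerate(cols):
            norm2 = sum(x * x for x in phi)
            if norm2 == 0.0:
                continue  # an all-zero basis can never be used
            # correlation of basis j with the residual that ignores its own contribution
            rho = sum(p * (rd + p * w[j]) for p, rd in zip(phi, r))
            # closed-form 1-D Lasso update: soft-threshold rho by lam, then rescale
            new = (rho - lam if rho > lam else rho + lam if rho < -lam else 0.0) / norm2
            if new != w[j]:
                r = [rd - p * (new - w[j]) for p, rd in zip(phi, r)]
                w[j] = new
    return w
```

With orthonormal bases the solution is simply the soft-thresholded projection: `lasso_cd([2.0, 0.05], [[1.0, 0.0], [0.0, 1.0]], 0.1)` drives the second coefficient to exactly zero, which is the sparsity the test-time objective asks for.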
Outline (part 4 of 5): Experiments: PASCAL & Stanford 40 Actions
1. PASCAL Action Dataset
http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2008/
• Contains 9 classes, with 21,738 images in total;
• 50% of the images of each class are randomly selected for training/validation, and the remaining images are used for testing;
• 14 attributes, 27 objects, 150 poselets;
• The number of action bases is set to 400 and 600, respectively; λ and γ are set to 0.1 and 0.15.
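The per-class random split can be sketched as follows. This is an illustration under our own assumptions (a hypothetical helper, a fixed seed, and rounding of the 50% share); passing `n_train=100` instead reproduces the fixed per-class count used for Stanford 40 Actions.

```python
import random
from collections import defaultdict


def split_per_class(labels, train_fraction=0.5, n_train=None, seed=0):
    """Randomly split example indices class by class.

    With n_train=None, round(train_fraction * class size) examples per class go to
    training; with n_train=k, exactly k per class do. The rest go to testing.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        k = n_train if n_train is not None else round(train_fraction * len(idxs))
        train.extend(idxs[:k])
        test.extend(idxs[k:])
    return sorted(train), sorted(test)
```

Splitting each class separately keeps the class proportions of the training and testing sides equal, which matters when class sizes differ.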
Classification Result
[Figure: average precision on each of the 9 PASCAL action classes (Phoning, Playing instrument, Reading, Riding bike, Riding horse, Running, Taking photo, Using computer, Walking), comparing our method using "a", our method using "w", Poselet (Maji et al., 2011) and SURREY_MKUCLEAR_DOSP; a panel labeled "400 action bases" shows attributes, objects and poselets.]
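The results are reported as average precision (AP) per class. As a hedged sketch, assuming the 11-point interpolated AP used in early PASCAL VOC evaluations (the slides do not state which AP variant was used), AP for one class can be computed from the ranked classifier scores; averaging it over the classes gives the mean AP. The function name is hypothetical.

```python
def average_precision_11pt(scores, labels):
    """11-point interpolated average precision for one class.

    scores: classifier confidences; labels: truthy for positive examples.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(1 for y in labels if y)
    if n_pos == 0:
        return 0.0
    tp, prec, rec = 0, [], []
    for rank, i in enumerate(order, start=1):
        tp += 1 if labels[i] else 0
        prec.append(tp / rank)   # precision after the top `rank` results
        rec.append(tp / n_pos)   # recall after the top `rank` results
    # at each recall level t, take the best precision reached at recall >= t
    points = [t / 10 for t in range(11)]
    return sum(max((p for p, r in zip(prec, rec) if r >= t), default=0.0) for t in points) / 11
```

For scores `[0.9, 0.8, 0.7, 0.6]` with labels `[1, 0, 1, 0]`, the interpolated precision is 1 for the six recall levels up to 0.5 and 2/3 for the remaining five, giving AP = 28/33.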
Control Experiment
[Figure: PASCAL control experiment comparing classification using "a" and using "w" for feature groups A: attribute, O: object, P: poselet.]
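The control-experiment legend splits a into attribute (A), object (O) and poselet (P) entries. A minimal sketch of enumerating the group combinations and restricting a to each one, assuming a is ordered [A | O | P] with the PASCAL sizes (14, 27, 150); which combinations the authors actually evaluated is not stated on the slide, and the helpers are hypothetical. For the "use w" setting, the bases would have to be learned on each reduced a before coding, since Φ must match its dimension.

```python
from itertools import combinations

# Assumed layout of a for PASCAL: 14 attribute, 27 object, 150 poselet scores, in that order.
GROUP_SIZES = {"A": 14, "O": 27, "P": 150}


def feature_subsets(sizes):
    """Map every non-empty group combination ("A", "AO", ..., "AOP") to the indices of a it keeps."""
    ranges, start = {}, 0
    for name, n in sizes.items():
        ranges[name] = list(range(start, start + n))
        start += n
    subsets = {}
    for k in range(1, len(sizes) + 1):
        for combo in combinations(sizes, k):
            subsets["".join(combo)] = [i for g in combo for i in ranges[g]]
    return subsets


def select(a, indices):
    """Restrict a feature vector to the chosen entries."""
    return [a[i] for i in indices]
```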
2. Stanford 40 Actions
Applauding, Blowing bubbles, Brushing teeth, Calling, Cleaning floor, Climbing wall, Cooking, Cutting trees, Cutting vegetables, Drinking, Feeding horse, Fishing, Fixing bike, Gardening, Holding umbrella, Jumping, Playing guitar, Playing violin, Pouring liquid, Pushing cart, Reading, Repairing car, Riding bike, Riding horse, Rowing, Running, Shooting arrow, Smoking cigarette, Taking photo, Texting message, Throwing frisbee, Using computer, Using microscope, Using telescope, Walking dog, Washing dishes, Watching television, Waving hands, Writing on board, Writing on paper
http://vision.stanford.edu/Datasets/40actions.html
• Contains 40 diverse daily human actions;
• 180 to 300 images per class, 9,532 real-world images in total;
• All images were collected from Google, Bing, and Flickr;
• Large variations in human pose, appearance, and background clutter.
[Figure: example images from Stanford 40 Actions, e.g., Cutting vegetables, Drinking, Gardening, Smoking cigarette, Walking dog.]
Result:
• 100 images of each class are randomly selected for training, and the remaining images are used for testing;
• 45 attributes, 81 objects, 150 poselets. The number of action bases is set to 400 and 600, respectively; λ and γ are set to 0.1 and 0.15;
• Our method is compared with the Locality-constrained Linear Coding (LLC, Wang et al., CVPR 2010) baseline.
[Figure: average precision on each of the 40 Stanford 40 Actions classes (x-axis from Riding a horse, Rowing a boat and Riding a bike to Taking photos and Texting message), comparing LLC and Our Method.]
Control Experiment
[Figure: Stanford 40 Actions control experiment comparing classification using "a" and using "w" for feature groups A: attribute, O: object, P: poselet.]
Outline (part 5 of 5): Conclusion
• Partwise Bag-of-Words (PBoW) representation: local features → body part localization → PBoW generation, producing a head-wise BoW, a limb-wise BoW, a leg-wise BoW and a foot-wise BoW.
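A minimal sketch of PBoW generation, assuming body part localization yields one axis-aligned box per part and each local feature is a ((x, y), descriptor) pair quantized to its nearest codeword. The box representation, the first-match assignment of a feature to a part and the per-part L1 normalization are our assumptions; the helper names are hypothetical.

```python
def nearest_codeword(desc, codebook):
    """Index of the codeword closest to desc (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((d - c) ** 2 for d, c in zip(desc, codebook[k])))


def partwise_bow(features, part_boxes, codebook):
    """Concatenate one L1-normalised BoW histogram per body part.

    features: list of ((x, y), descriptor) pairs from the local feature step.
    part_boxes: dict part name -> (x0, y0, x1, y1) from body part localization.
    """
    hists = {part: [0.0] * len(codebook) for part in part_boxes}
    for (x, y), desc in features:
        for part, (x0, y0, x1, y1) in part_boxes.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                hists[part][nearest_codeword(desc, codebook)] += 1.0
                break  # a feature counts for the first part box that contains it
    pbow = []
    for part in part_boxes:  # fixed order, e.g. head, limb, leg, foot
        total = sum(hists[part]) or 1.0
        pbow.extend(h / total for h in hists[part])
    return pbow
```

Features outside every part box are ignored, so the PBoW keeps only evidence tied to a localized body part.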
• Local Action Attribute Method:
1. Label the action samples according to different parts. For each part, we define a new set of low-level semantic labels and use them to re-classify the training action samples:
Head: static, vertical move, horizontal move, …
Limb: static, swing, …
Leg: static, …
Foot: static, …
• Local Action Attribute Method:
2. For each part, train a set of attribute classifiers, one for each semantic label defined for that part.
• Local Action Attribute Method:
3. For each action sample, map its low-level representation to a mid-level representation: the head-wise, limb-wise, leg-wise and foot-wise BoW of the sample are passed to the corresponding part's attribute classifiers, and the four part outputs are combined to build a new histogram representation of the sample.
• Local Action Attribute Method:
4. Based on the local action attributes, we thus construct a new descriptor for each action sample, which can be used for classification: a classifier such as an SVM or K-NN is trained on the training set and evaluated on the testing set.
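For the K-NN option in step 4, a minimal sketch that labels a test descriptor by majority vote among its k nearest training descriptors. Squared Euclidean distance and k = 3 are our assumptions, and `knn_predict` is a hypothetical helper.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training descriptors closest to x (squared Euclidean distance)."""
    dists = [sum((p - q) ** 2 for p, q in zip(xi, x)) for xi in train_X]
    nearest = sorted(range(len(train_X)), key=lambda i: dists[i])[:k]
    votes = Counter(train_y[i] for i in nearest)
    # on a tie in votes, the label that appears first among the sorted neighbours wins
    return votes.most_common(1)[0][0]
```

Unlike an SVM, K-NN needs no training step beyond storing the training descriptors, which makes it a quick first check of how separable the new descriptors are.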
Thank you