school of electronic information engineering , tianjin university

School of Electronic Information Engineering , Tianjin University Human Action Recognition by Learning Bases of Action Attributes and Parts Jia pingping


Post on 17-Jan-2016


Page 1: School of Electronic Information Engineering , Tianjin University

School of Electronic Information Engineering , Tianjin University

Human Action Recognition by Learning Bases of Action Attributes and Parts

Jia pingping

Page 2: School of Electronic Information Engineering , Tianjin University

Outline:

1. Action Classification in Still Images
2. Intuition: Action Attributes and Parts
3. Algorithm: Learning Bases of Attributes and Parts
4. Experiments: PASCAL & Stanford 40 Actions
5. Conclusion

Page 3: School of Electronic Information Engineering , Tianjin University

Action Classification in Still Images

Low-level feature → Riding bike

Page 4: School of Electronic Information Engineering , Tianjin University

Action Classification in Still Images

Low-level feature → High-level representation → Riding bike

High-level representation:
- Semantic concepts – Attributes

Riding a bike; sitting on a bike seat; wearing a helmet; pedaling the pedals; …

Page 5: School of Electronic Information Engineering , Tianjin University

Action Classification in Still Images

Low-level feature → High-level representation → Riding bike

High-level representation:
- Semantic concepts – Attributes
- Objects

Riding a bike; sitting on a bike seat; wearing a helmet; pedaling the pedals; …

Page 6: School of Electronic Information Engineering , Tianjin University

Action Classification in Still Images

Low-level feature → High-level representation → Riding bike

High-level representation:
- Semantic concepts – Attributes
- Objects
- Human poses

Parts

Riding a bike; sitting on a bike seat; wearing a helmet; pedaling the pedals; …

Page 7: School of Electronic Information Engineering , Tianjin University

Action Classification in Still Images

Low-level feature → High-level representation → Riding bike

High-level representation:
- Semantic concepts – Attributes
- Objects
- Human poses
- Contexts of attributes & parts

Parts

Riding a bike; sitting on a bike seat; wearing a helmet; pedaling the pedals; …

Riding

Page 8: School of Electronic Information Engineering , Tianjin University

Action Classification in Still Images

Low-level feature → High-level representation → Riding bike

High-level representation:
- Semantic concepts – Attributes
- Objects
- Human poses
- Contexts of attributes & parts

Parts

Riding a bike; wearing a helmet; pedaling the pedal; sitting on bike seat

Incorporate human knowledge; more understanding of image content; more discriminative classifier.

Page 9: School of Electronic Information Engineering , Tianjin University
Page 10: School of Electronic Information Engineering , Tianjin University

Outline:

1. Action Classification in Still Images
2. Intuition: Action Attributes and Parts
3. Algorithm: Learning Bases of Attributes and Parts
4. Experiments: PASCAL & Stanford 40 Actions
5. Conclusion

Page 11: School of Electronic Information Engineering , Tianjin University

Action Attributes and Parts

Attributes:

… …

semantic descriptions of human actions

Page 12: School of Electronic Information Engineering , Tianjin University

Action Attributes and Parts

Attributes:

… …

semantic descriptions of human actions

Riding bike

Not riding bike

Discriminative classifier, e.g. SVM

Page 13: School of Electronic Information Engineering , Tianjin University

Action Attributes and Parts

Attributes:

… …

Parts-Objects:

… …

Parts-Poselets:

… …

A pre-trained detector

Page 14: School of Electronic Information Engineering , Tianjin University

Action Attributes and Parts

Attributes:

… …

Parts-Objects:

… …

Parts-Poselets:

… …

Attribute classification

Object detection

Poselet detection

a: Image feature vector
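
As a rough illustration of how the outputs of attribute classification, object detection, and poselet detection could be stacked into the image feature vector a, the sketch below simply concatenates the three groups of confidence scores. The function name `build_feature_vector` and the example scores are hypothetical; the slides do not state how the scores are ordered or normalized.

```python
from typing import List, Sequence


def build_feature_vector(
    attribute_scores: Sequence[float],
    object_scores: Sequence[float],
    poselet_scores: Sequence[float],
) -> List[float]:
    """Concatenate attribute, object, and poselet scores into one vector a.

    Assumption: plain concatenation in this fixed order, with no rescaling.
    """
    return [float(s) for s in (*attribute_scores, *object_scores, *poselet_scores)]


# Toy example: 2 attribute scores, 1 object score, 3 poselet scores.
a = build_feature_vector([0.9, 0.1], [0.7], [0.3, 0.2, 0.8])
print(len(a))  # 6
```

With the PASCAL settings quoted in the experiments (14 attributes, 27 objects, 150 poselets), a vector built this way would have 191 entries.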

Page 15: School of Electronic Information Engineering , Tianjin University

Action Attributes and Parts

Attributes:

… …

Parts-Objects:

… …

Parts-Poselets:

… …

Attribute classification

Object detection

Poselet detection

a: Image feature vector

Action bases Φ

Page 16: School of Electronic Information Engineering , Tianjin University

Action Attributes and Parts

Attributes:

… …

Parts-Objects:

… …

Parts-Poselets:

… …

a: Image feature vector

Action bases Φ

Page 17: School of Electronic Information Engineering , Tianjin University

Action Attributes and Parts

Attributes:

… …

Parts-Objects:

… …

Parts-Poselets:

… …

a: Image feature vector

Action bases Φ

Page 18: School of Electronic Information Engineering , Tianjin University

Action Attributes and Parts

Attributes:

… …

Parts-Objects:

… …

Parts-Poselets:

… …

a: Image feature vector

Action bases Φ

Bases coefficients w

a ≈ Φw

SVM

Page 19: School of Electronic Information Engineering , Tianjin University

Action Attributes and Parts

Attributes:

… …

Parts-Objects:

… …

Parts-Poselets:

… …

a: Image feature vector

Action bases Φ

Bases coefficients w

a ≈ Φw

Riding bike

Page 20: School of Electronic Information Engineering , Tianjin University
Page 21: School of Electronic Information Engineering , Tianjin University

Outline:

1. Action Classification in Still Images
2. Intuition: Action Attributes and Parts
3. Algorithm: Learning Bases of Attributes and Parts
4. Experiments: PASCAL & Stanford 40 Actions
5. Conclusion

Page 22: School of Electronic Information Engineering , Tianjin University

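Bases of Attributes & Parts: Training

a ≈ Φw

• Input: $a_1, \dots, a_N$

• Output: $\Phi = [\Phi_1, \dots, \Phi_M]$, $W = [w_1, \dots, w_N]$ (sparse)

• Jointly estimate Φ and W:

$$\min_{\Phi, W} \sum_{i=1}^{N} \left( \frac{1}{2} \|a_i - \Phi w_i\|_2^2 + \lambda \|w_i\|_1 \right) \quad \text{s.t. } \forall j,\ \|\Phi_j\|_1 + \frac{\gamma}{2} \|\Phi_j\|_2^2 \le 1$$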
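
To make the training problem concrete, here is a minimal sketch that only evaluates the objective and checks the constraint on the bases; it does not perform the joint optimization. It assumes the objective reads as a sum over samples of 0.5·‖a_i − Φw_i‖² + λ‖w_i‖₁, with each basis constrained by ‖Φ_j‖₁ + (γ/2)‖Φ_j‖² ≤ 1. The function names and the toy numbers are hypothetical.

```python
from typing import List

Vector = List[float]


def reconstruct(bases: List[Vector], w: Vector) -> Vector:
    """Return Phi @ w, where bases[j] holds the j-th basis vector Phi_j."""
    dim = len(bases[0])
    return [sum(w[j] * bases[j][d] for j in range(len(bases))) for d in range(dim)]


def training_objective(
    samples: List[Vector], bases: List[Vector], coeffs: List[Vector], lam: float
) -> float:
    """Sum over samples of 0.5 * squared residual + lam * L1 norm of w_i."""
    total = 0.0
    for a, w in zip(samples, coeffs):
        recon = reconstruct(bases, w)
        total += 0.5 * sum((x - r) ** 2 for x, r in zip(a, recon))
        total += lam * sum(abs(v) for v in w)
    return total


def bases_feasible(bases: List[Vector], gamma: float) -> bool:
    """Check ||Phi_j||_1 + (gamma / 2) * ||Phi_j||_2^2 <= 1 for every basis."""
    return all(
        sum(abs(v) for v in b) + 0.5 * gamma * sum(v * v for v in b) <= 1.0 + 1e-12
        for b in bases
    )


# Toy example: two 2-D bases, one sample reconstructed exactly.
toy_bases = [[0.5, 0.0], [0.0, 0.5]]
print(training_objective([[1.0, 0.0]], toy_bases, [[2.0, 0.0]], lam=0.1))
print(bases_feasible(toy_bases, gamma=0.15))
```

A full solver would typically alternate between updating W with Φ fixed and updating Φ with W fixed; the slides do not say which solver was used.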

Page 23: School of Electronic Information Engineering , Tianjin University

Bases of Attributes & Parts: Testing

a ≈ Φw

• Input: $a$, $\Phi = [\Phi_1, \dots, \Phi_M]$

• Output: $w$ (sparse)

• Estimate w:

$$\min_{w} \frac{1}{2} \|a - \Phi w\|_2^2 + \lambda \|w\|_1$$
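
The test-time step is an L1-regularized least-squares fit of a by Φw with the bases held fixed. One standard way to solve such a problem is the iterative shrinkage-thresholding algorithm (ISTA); the sketch below uses it purely as an illustration, since the slides do not name a solver. The function names, the step size, and the iteration count are assumptions.

```python
from typing import List

Vector = List[float]


def soft_threshold(x: float, t: float) -> float:
    """Shrink x toward zero by t (proximal operator of t * |x|)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def sparse_code(
    a: Vector, bases: List[Vector], lam: float, step: float, n_iter: int = 200
) -> Vector:
    """ISTA for min_w 0.5 * ||a - Phi w||^2 + lam * ||w||_1.

    bases[j] is the j-th basis vector Phi_j. The step size should not
    exceed 1 / L, where L is the largest eigenvalue of Phi^T Phi.
    """
    m, dim = len(bases), len(a)
    w = [0.0] * m
    for _ in range(n_iter):
        recon = [sum(w[j] * bases[j][d] for j in range(m)) for d in range(dim)]
        residual = [r - x for r, x in zip(recon, a)]
        # Gradient of the smooth term: Phi^T (Phi w - a).
        grad = [sum(bases[j][d] * residual[d] for d in range(dim)) for j in range(m)]
        w = [soft_threshold(w[j] - step * grad[j], step * lam) for j in range(m)]
    return w


# With identity bases the solution is the soft-thresholded input.
w = sparse_code([1.0, 0.05], [[1.0, 0.0], [0.0, 1.0]], lam=0.1, step=1.0)
print(w)
```

The small second entry of the input is driven exactly to zero, which is the sparsity effect the L1 term is meant to produce.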

Page 24: School of Electronic Information Engineering , Tianjin University

Outline:

1. Action Classification in Still Images
2. Intuition: Action Attributes and Parts
3. Algorithm: Learning Bases of Attributes and Parts
4. Experiments: PASCAL & Stanford 40 Actions
5. Conclusion

Page 25: School of Electronic Information Engineering , Tianjin University

1. PASCAL Action Dataset

http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2008/

Page 26: School of Electronic Information Engineering , Tianjin University

1. PASCAL Action Dataset

• Contains 9 classes; there are 21,738 images in total;

• Randomly select 50% of each class for training/validation and the remaining images for testing;

• 14 attributes, 27 objects, 150 poselets;

• The number of action bases is set to 400 and 600, respectively. The λ and γ values are set to 0.1 and 0.15.
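
The per-class random split described above can be sketched as follows. This is an illustrative helper, not the authors' code: the name `split_per_class`, the fixed seed, and the rounding down of odd class sizes are all assumptions.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def split_per_class(
    labels: Sequence[Hashable], train_fraction: float = 0.5, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Randomly assign a fraction of each class to training, the rest to testing.

    Returns two sorted lists of sample indices (train, test).
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class, key=str):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = int(len(indices) * train_fraction)  # rounds down for odd sizes
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


labels = ["phoning"] * 4 + ["reading"] * 6
train_idx, test_idx = split_per_class(labels)
print(len(train_idx), len(test_idx))  # 5 5
```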

Page 27: School of Electronic Information Engineering , Tianjin University

Classification Result

[Figure: per-class average precision on the nine PASCAL action classes (Phoning, Playing instrument, Reading, Riding bike, Riding horse, Running, Taking photo, Using computer, Walking). Methods shown: Our method, use “a”; POSELETS; SURREY_MK; UCLEAR_DOSP.]

Page 28: School of Electronic Information Engineering , Tianjin University

Classification Result

[Figure: per-class average precision on the nine PASCAL action classes (Phoning, Playing instrument, Reading, Riding bike, Riding horse, Running, Taking photo, Using computer, Walking). Methods shown: Our method, use “a”; Our method, use “w”; POSELETS; SURREY_MK; UCLEAR_DOSP.]

Page 29: School of Electronic Information Engineering , Tianjin University

Classification Result

[Figure: per-class average precision on the nine PASCAL action classes (Phoning, Playing instrument, Reading, Riding bike, Riding horse, Running, Taking photo, Using computer, Walking). Methods shown: Our method, use “a”; Our method, use “w”; Poselet, Maji et al, 2011; SURREY_MK; UCLEAR_DOSP. Annotation: 400 action bases (attributes, objects, poselets).]

Page 30: School of Electronic Information Engineering , Tianjin University

Classification Result

[Figure: per-class average precision on the nine PASCAL action classes (Phoning, Playing instrument, Reading, Riding bike, Riding horse, Running, Taking photo, Using computer, Walking). Methods shown: Our method, use “a”; Our method, use “w”; Poselet, Maji et al, 2011; SURREY_MK; UCLEAR_DOSP. Annotation: 400 action bases (attributes, objects, poselets).]

Page 31: School of Electronic Information Engineering , Tianjin University

Classification Result

[Figure: per-class average precision on the nine PASCAL action classes (Phoning, Playing instrument, Reading, Riding bike, Riding horse, Running, Taking photo, Using computer, Walking). Methods shown: Our method, use “a”; Our method, use “w”; Poselet, Maji et al, 2011; SURREY_MK; UCLEAR_DOSP. Annotation: 400 action bases (attributes, objects, poselets).]

Page 32: School of Electronic Information Engineering , Tianjin University

Control Experiment

[Figure: results for Use “a” and Use “w”. Legend: A: attribute; O: object; P: poselet.]

Page 33: School of Electronic Information Engineering , Tianjin University

2. Stanford 40 Actions

Applauding, Blowing bubbles, Brushing teeth, Calling, Cleaning floor, Climbing wall, Cooking, Cutting trees, Cutting vegetables, Drinking, Feeding horse, Fishing, Fixing bike, Gardening, Holding umbrella, Jumping, Playing guitar, Playing violin, Pouring liquid, Pushing cart, Reading, Repairing car, Riding bike, Riding horse, Rowing, Running, Shooting arrow, Smoking cigarette, Taking photo, Texting message, Throwing frisbee, Using computer, Using microscope, Using telescope, Walking dog, Washing dishes, Watching television, Waving hands, Writing on board, Writing on paper

http://vision.stanford.edu/Datasets/40actions.html

Page 34: School of Electronic Information Engineering , Tianjin University

2. Stanford 40 Actions

• Contains 40 diverse daily human actions;

• 180∼300 images for each class, 9532 real-world images in total;

• All the images are obtained from Google, Bing, and Flickr;

• Large variations in human pose, appearance, and background clutter.

[Figure: example images labelled Cutting vegetables, Drinking, Feeding horse, Fixing bike, Gardening, Holding umbrella, Playing guitar, Playing violin, Pouring liquid, Reading, Repairing car, Riding bike, Shooting arrow, Smoking cigarette, Taking photo, Walking dog, Washing dishes, Watching television.]

Page 35: School of Electronic Information Engineering , Tianjin University


Result:

• Randomly select 100 images in each class for training, and the remaining images for testing.

• 45 attributes, 81 objects, 150 poselets. The number of action bases is set to 400 and 600, respectively. The λ and γ values are set to 0.1 and 0.15.

• Compare our method with the Locality-constrained Linear Coding (LLC, Wang et al, CVPR 2010) baseline.
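
The result plots for both datasets report per-class average precision. As a reminder of what that number measures, the sketch below computes a non-interpolated average precision from classifier scores and binary labels. This is one common definition and is an assumption here: the benchmarks may use their own evaluation protocol, and the function name is hypothetical.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP: mean of the precision at the rank of each positive.

    labels use 1 for positive and 0 for negative; returns 0.0 if there are
    no positives.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Positives are ranked 1st and 3rd: AP = (1/1 + 2/3) / 2.
print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```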

[Figure: per-class average precision on Stanford 40 Actions (y-axis from 0 to 0.9), comparing LLC and Our Method. Classes on the x-axis: Riding a horse, Rowing a boat, Riding a bike, Climbing mountain, Jumping, Cleaning the floor, Walking a dog, Shooting an arrow, Playing guitar, Fishing, Holding up an umbrella, Running, Throwing a frisbee, Writing on a board, Watching TV, Cutting trees, Feeding a horse, Gardening, Writing on a book, Repairing a car, Looking thru a microscope, Cutting vegetables, Blowing bubbles, Playing violin, Brushing teeth, Repairing a bike, Pushing a cart, Using a computer, Applauding, Cooking, Smoking cigarette, Looking thru a telescope, Washing dishes, Drinking, Calling, Waving hands, Pouring liquid, Reading a book, Taking photos, Texting message.]

Page 36: School of Electronic Information Engineering , Tianjin University

Control Experiment

[Figure: results for Use “a” and Use “w”. Legend: A: attribute; O: object; P: poselet.]

Page 37: School of Electronic Information Engineering , Tianjin University

Outline:

1. Action Classification in Still Images
2. Intuition: Action Attributes and Parts
3. Algorithm: Learning Bases of Attributes and Parts
4. Experiments: PASCAL & Stanford 40 Actions
5. Conclusion

Page 38: School of Electronic Information Engineering , Tianjin University

• Partwise Bag-of-Words (PBoW) Representation: Local feature → Body part localization → PBoW generation

head-wise BoW

limb-wise BoW

leg-wise BoW

foot-wise BoW
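
A minimal sketch of the PBoW idea, assuming that local descriptors have already been assigned to body parts and that a visual-word codebook is available. The nearest-codeword assignment, the L1 normalization, and all names (`nearest_word`, `partwise_bow`) are illustrative assumptions, not details given on the slide.

```python
from typing import Dict, List, Sequence


def nearest_word(descriptor: Sequence[float], codebook: List[Sequence[float]]) -> int:
    """Index of the codeword closest to the descriptor (squared Euclidean)."""
    def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(u, v))

    return min(range(len(codebook)), key=lambda k: sq_dist(descriptor, codebook[k]))


def partwise_bow(
    features_by_part: Dict[str, List[Sequence[float]]],
    codebook: List[Sequence[float]],
) -> Dict[str, List[float]]:
    """Build one normalized bag-of-words histogram per body part."""
    histograms = {}
    for part, descriptors in features_by_part.items():
        hist = [0.0] * len(codebook)
        for d in descriptors:
            hist[nearest_word(d, codebook)] += 1.0
        total = sum(hist)
        # Leave an all-zero histogram for parts without any descriptor.
        histograms[part] = [h / total for h in hist] if total else hist
    return histograms


codebook = [[0.0, 0.0], [1.0, 1.0]]
features = {"head": [[0.1, 0.0], [0.9, 1.0], [1.0, 1.2]], "foot": []}
print(partwise_bow(features, codebook))
```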

Page 39: School of Electronic Information Engineering , Tianjin University

• Local Action Attribute Method: 1. Label the action samples according to different parts.

For each part, we define a new set of low-level semantics to re-classify the training action samples:

Head: static, vertical move, horizontal move

Limb: static, swing, …

Leg: static, …

Foot: static

Page 40: School of Electronic Information Engineering , Tianjin University

• Local Action Attribute Method: 2. For each part, train a set of attribute classifiers according to the set of semantics we define.

for each part → train → ……

Page 41: School of Electronic Information Engineering , Tianjin University

• Local Action Attribute Method: 3. For each action sample, map its low-level representation to a middle-level representation through the following framework:

One action sample → Head-wise BoW, Limb-wise BoW, Leg-wise BoW, Foot-wise BoW

Combine these four parts to build a new histogram representation of the sample.
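
One possible reading of this mapping is sketched below: each part-wise BoW is scored by that part's attribute classifiers, and the scores of the four parts are concatenated into the mid-level representation. The linear scoring function, the part order, and the toy weights are assumptions for illustration; the slides do not specify the classifier form.

```python
from typing import Dict, List, Sequence, Tuple

PARTS = ("head", "limb", "leg", "foot")

# A linear attribute classifier represented as (weights, bias); hypothetical form.
LinearClassifier = Tuple[Sequence[float], float]


def linear_score(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Score of a linear classifier: w . x + b."""
    return sum(w * v for w, v in zip(weights, x)) + bias


def mid_level_descriptor(
    part_bows: Dict[str, Sequence[float]],
    part_classifiers: Dict[str, List[LinearClassifier]],
) -> List[float]:
    """Concatenate the attribute scores of all four parts, in PARTS order."""
    descriptor = []
    for part in PARTS:
        bow = part_bows[part]
        for weights, bias in part_classifiers[part]:
            descriptor.append(linear_score(weights, bias, bow))
    return descriptor


bows = {part: [1.0, 0.0] for part in PARTS}
classifiers = {
    "head": [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.5)],
    "limb": [([2.0, 0.0], -1.0)],
    "leg": [([2.0, 0.0], -1.0)],
    "foot": [([2.0, 0.0], -1.0)],
}
print(mid_level_descriptor(bows, classifiers))  # [1.0, 0.5, 1.0, 1.0, 1.0]
```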

Page 42: School of Electronic Information Engineering , Tianjin University

• Local Action Attribute Method: 4. Thus, based on local action attributes, we construct a new descriptor of action samples, which can be used for classification.

Classifiers: SVM, K-NN

Data: training set, testing set
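
For the K-NN option, a minimal sketch of classifying a test descriptor by majority vote among its nearest training descriptors is shown below. The function name, the Euclidean distance, and k = 3 are assumptions; the slide does not give these settings.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def knn_predict(
    train_x: List[Sequence[float]],
    train_y: List[Hashable],
    query: Sequence[float],
    k: int = 3,
) -> Hashable:
    """Return the majority label among the k training samples closest to query."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


train_x = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]
train_y = ["walk", "walk", "run", "run", "run"]
print(knn_predict(train_x, train_y, [5.2, 5.1]))  # run
```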

Page 43: School of Electronic Information Engineering , Tianjin University

School of Electronic Information Engineering , Tianjin University

Thank you