technical report: recognizing human actions by attributes


Technical Report: Recognizing Human Actions by Attributes

Jingen Liu, Benjamin Kuipers, Silvio Savarese
Department of EECS, University of Michigan
{liujg,kuipers,silvio}@umich.edu

This is the supplemental material for the main paper. In this document, we include the following items:

1. The mathematical derivation of Equation (6) (LINE 416).

2. Optimal partition of X (LINE 466).

3. Action attribute definitions for the UIUC action dataset and the MIXED action dataset (LINE 558).

4. Action attribute definitions for the Olympic Sports dataset (LINE 737).

5. Visual demonstrations of attribute detection.

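1. FOR LINE 416: The mathematical derivation of Equation (6)

From the definition of mutual information (i.e., Equation (4), lines 396-399), we have

$$MI(X;Y) = \sum_i \sum_j p(x_j)\, p(y_i|x_j) \log \frac{p(y_i|x_j)}{p(y_i)}, \tag{1}$$

$$MI(\hat{X};Y) = \sum_i \sum_t p(\hat{x}_t)\, p(y_i|\hat{x}_t) \log \frac{p(y_i|\hat{x}_t)}{p(y_i)}, \tag{2}$$

where $\hat{X}$ represents the clusters of $X$, and $\hat{x}_t$ is a cluster of instances $x_j$. Then we have the following facts:

$$p(\hat{x}_t) = \sum_{x_j \in \hat{x}_t} p(x_j), \tag{3}$$

$$p(y_i|\hat{x}_t) = \sum_{x_j \in \hat{x}_t} \frac{p(x_j)}{p(\hat{x}_t)}\, p(y_i|x_j). \tag{4}$$

Substituting Eq. (3) and Eq. (4) into Eq. (2), we obtain

$$\begin{aligned}
MI(\hat{X};Y) &= \sum_i \sum_t p(\hat{x}_t) \cdot \sum_{x_j \in \hat{x}_t} \frac{p(x_j)}{p(\hat{x}_t)}\, p(y_i|x_j) \cdot \log p(y_i|\hat{x}_t) - A \\
&= \sum_i \sum_t \Big\{ \sum_{x_j \in \hat{x}_t} p(x_j)\, p(y_i|x_j) \Big\} \cdot \log p(y_i|\hat{x}_t) - A \\
&= \sum_i \sum_t \sum_{x_j \in \hat{x}_t} p(x_j)\, p(y_i|x_j) \log p(y_i|\hat{x}_t) - A,
\end{aligned} \tag{5}$$

where $A = \sum_i \sum_j p(x_j)\, p(y_i|x_j) \log p(y_i) = \sum_i \sum_t p(\hat{x}_t)\, p(y_i|\hat{x}_t) \log p(y_i)$. Eq. (1) expands in the same way to $\sum_i \sum_j p(x_j)\, p(y_i|x_j) \log p(y_i|x_j) - A$, so subtracting Eq. (5) from Eq. (1) gives the loss of mutual information,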


Objective: Given a training dataset D, find an optimal partition $\Pi^*(\hat{X})$ over X such that $L_{inf}$ is minimized.

1. Initialization: Randomly assign each $x_j$ to a cluster $\hat{x}_t$.

2. Update cluster distributions: For each cluster $\hat{x}_t$, compute (update) Eq. (3) (the prior of the new cluster) and Eq. (4) (the cluster distribution).

3. Re-assign $x_j$ to clusters: For each $x_j$, update its cluster membership as

$$t^* = \arg\min_t KL(p(Y|x_j)\,\|\,p(Y|\hat{x}_t)). \tag{9}$$

4. Repeat steps 2 and 3 until the change in the objective function is small (e.g., $10^{-4}$).

Table 1. The procedure of partition over X.

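$$\begin{aligned}
L_{inf}(D,\Pi) &= \sum_i \sum_{j,\, x_j \in \hat{x}_t} p(x_j)\, p(y_i|x_j) \log \frac{p(y_i|x_j)}{p(y_i|\hat{x}_t)} \\
&= \sum_t \sum_{x_j \in \hat{x}_t} p(x_j) \cdot KL(p(Y|x_j)\,\|\,p(Y|\hat{x}_t)),
\end{aligned} \tag{6}$$

where $\Pi = \{p(\hat{x}_t|x_j)\}$. If the distribution $p(x_j)$ is assumed to be uniform, i.e., $p(x_j) = 1/N$ where $N = |X|$, the loss of mutual information is measured, up to the constant factor $1/N$, by the sum of the divergences between the distributions $p(Y|x_j)$ and their corresponding cluster distributions $p(Y|\hat{x}_t)$ (with $x_j \in \hat{x}_t$). As a result, our objective function is

$$\Pi^* = \arg\min_{\Pi} \sum_t \sum_{x_j \in \hat{x}_t} KL(p(Y|x_j)\,\|\,p(Y|\hat{x}_t)), \tag{7}$$

which means finding a partition $\Pi^*$ such that the total divergence of the instances from their cluster distributions is minimized.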
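As a numerical sanity check of Eq. (6), the following minimal Python sketch compares the loss of mutual information, computed directly from Eq. (1) and Eq. (2), with the prior-weighted sum of KL divergences. The toy sizes, the randomly drawn conditionals, the fixed partition, and all function names are illustrative assumptions and do not come from the report.

```python
import numpy as np

def mutual_information(p_x, p_y_given_x):
    """MI between X and Y, given a prior p(x) and rows p(Y|x) (Eq. 1 / Eq. 2)."""
    p_y = p_x @ p_y_given_x                        # marginal p(y_i)
    log_ratio = np.log(p_y_given_x / p_y)          # log p(y_i|x) / p(y_i)
    return float(np.sum(p_x[:, None] * p_y_given_x * log_ratio))

def cluster_distributions(p_x, p_y_given_x, labels, n_clusters):
    """Cluster priors (Eq. 3) and cluster conditionals (Eq. 4)."""
    p_c = np.zeros(n_clusters)
    p_y_given_c = np.zeros((n_clusters, p_y_given_x.shape[1]))
    for t in range(n_clusters):
        members = labels == t
        p_c[t] = p_x[members].sum()
        weighted = p_x[members][:, None] * p_y_given_x[members]
        p_y_given_c[t] = weighted.sum(axis=0) / p_c[t]
    return p_c, p_y_given_c

def weighted_kl_loss(p_x, p_y_given_x, labels, p_y_given_c):
    """Right-hand side of Eq. (6): sum_j p(x_j) * KL(p(Y|x_j) || p(Y|x_hat_t))."""
    kl = np.sum(p_y_given_x * np.log(p_y_given_x / p_y_given_c[labels]), axis=1)
    return float(np.sum(p_x * kl))

# Toy problem (assumed values): 6 instances, 4 labels, 2 clusters.
rng = np.random.default_rng(0)
p_x = np.full(6, 1.0 / 6)                          # uniform p(x_j) = 1/N
p_y_given_x = rng.dirichlet(np.ones(4), size=6)    # rows are p(Y|x_j)
labels = np.array([0, 0, 0, 1, 1, 1])              # an arbitrary fixed partition

p_c, p_y_given_c = cluster_distributions(p_x, p_y_given_x, labels, 2)
mi_loss = mutual_information(p_x, p_y_given_x) - mutual_information(p_c, p_y_given_c)
kl_loss = weighted_kl_loss(p_x, p_y_given_x, labels, p_y_given_c)
print(mi_loss, kl_loss)  # the two numbers should agree up to rounding
```

The agreement holds for any partition, not only an optimal one, which is why Eq. (6) can serve directly as a clustering objective.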

2. FOR LINE 466: Iterative descent algorithm for optimal partition of X

Given a partition $C_1, \ldots, C_T$ of X with corresponding cluster distributions $\pi_1, \ldots, \pi_T$ (which are estimated by $p(Y|\hat{x}_t)$), our objective function is minimized by the following assignments:

$$C_t \equiv \{x_j : KL(p(Y|x_j)\,\|\,\pi_t) \le KL(p(Y|x_j)\,\|\,\pi_k),\ \forall k \ne t\}. \tag{8}$$

Therefore, the optimal partition can be achieved by an iterative descent algorithm in which, at every iteration, each instance $x_j$ is assigned to the cluster $C_t$ with the smallest divergence $KL(p(Y|x_j)\,\|\,\pi_t)$, and each cluster distribution $\pi_t$ is updated by averaging all instances in $\hat{x}_t$. This descent algorithm is similar to the well-known Euclidean k-means algorithm. We summarize the procedure in Table 1.
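The procedure of Table 1 can be sketched in Python as follows, under the uniform-prior assumption $p(x_j) = 1/N$, so that the cluster update is a plain average. This is only an illustrative sketch: the function names, the `init_labels` argument, the clipping constant `eps`, the iteration cap, and the handling of empty clusters are assumptions that are not specified in the report.

```python
import numpy as np

def kl_rows(p, q, eps=1e-12):
    """KL(p_j || q_t) for every instance/cluster pair; returns shape (N, T)."""
    p = np.clip(p, eps, None)                      # avoid log(0); eps is an assumption
    q = np.clip(q, eps, None)
    return np.sum(p[:, None, :] * (np.log(p[:, None, :]) - np.log(q[None, :, :])), axis=2)

def partition(p_y_given_x, n_clusters, init_labels=None, tol=1e-4, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n, m = p_y_given_x.shape
    # Step 1: random initialization unless initial labels are supplied.
    labels = rng.integers(n_clusters, size=n) if init_labels is None else np.asarray(init_labels)
    centers = np.empty((n_clusters, m))
    prev_loss = np.inf
    for _ in range(max_iter):
        # Step 2: update cluster distributions by averaging member instances.
        for t in range(n_clusters):
            members = labels == t
            if members.any():
                centers[t] = p_y_given_x[members].mean(axis=0)
            else:
                # Empty cluster: re-seed from a random instance (assumption).
                centers[t] = p_y_given_x[rng.integers(n)]
        # Step 3: re-assign each instance to the cluster with the smallest KL (Eq. 9).
        div = kl_rows(p_y_given_x, centers)
        labels = div.argmin(axis=1)
        loss = float(div[np.arange(n), labels].sum())
        # Step 4: stop once the objective changes by less than tol.
        if prev_loss - loss < tol:
            break
        prev_loss = loss
    return labels, centers, loss

# Toy data (assumed): two well-separated groups of label distributions.
P = np.array([[0.90, 0.05, 0.05], [0.80, 0.10, 0.10], [0.85, 0.10, 0.05],
              [0.05, 0.05, 0.90], [0.10, 0.10, 0.80], [0.10, 0.05, 0.85]])
labels, centers, loss = partition(P, n_clusters=2, init_labels=[0, 0, 1, 1, 1, 1])
print(labels, loss)
```

Like k-means, such a procedure only guarantees a non-increasing objective and may stop at a local minimum, so in practice several random initializations are commonly tried.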


Actions (columns): A = hand-clap, B = crawl, C = jump forward, D = jump from sit-up, E = jumping_jacks, F = pushing_up, G = raise-1-hand, H = run, I = sit-2-standing, J = standing-2-sitting, K = stretch-out, L = turn, M = walking, N = waving.

A B C D E F G H I J K L M N  Attribute
1 0 0 1 1 0 1 0 1 1 1 1 0 1  1. arm only motion
1 0 0 1 1 0 1 0 1 1 1 1 0 1  2. standing with arm motion
0 0 1 0 0 0 0 1 0 0 0 0 1 0  3. translation with arm motion
0 0 1 1 1 0 0 1 0 0 0 0 0 0  4. Jumping motion
1 0 0 0 1 0 1 0 0 1 1 0 0 1  5. raise arms/put down
0 0 1 0 0 0 0 1 0 0 0 0 1 0  6. Arm motion lower shoulder
1 0 0 0 1 0 1 0 0 0 1 0 0 1  7. Arm motion over shoulder
0 0 1 0 0 0 0 1 0 0 0 0 1 0  8. arm-hand: move-back-forward
0 0 1 1 1 0 0 1 0 0 0 0 0 0  9. arm: intense motion
0 1 1 1 1 1 0 0 1 1 1 1 1 0  10. arm straight
0 0 1 0 0 0 0 1 0 0 0 0 1 0  11. leg: alternative-move-forward
0 0 1 0 0 0 0 1 0 0 0 0 0 0  12. leg: two-leg synchronized motion
0 0 0 1 0 1 0 0 1 1 0 1 0 0  13. leg: fold/unfold motion
0 0 1 0 0 0 0 1 0 0 0 0 0 0  14. leg: up-forward motion
0 0 1 0 1 0 0 1 0 0 0 0 0 0  15. leg: intense motion
0 0 0 1 0 1 0 0 1 1 0 1 0 0  16. leg local motion (up-down)
0 0 0 1 0 1 0 0 1 1 0 1 0 0  17. torso vertical-shape up/down motion
0 0 1 0 0 0 0 1 0 0 0 0 1 0  18. torso vertical-shape up-forward motion
0 0 1 0 0 0 0 1 0 0 0 0 1 0  19. torso vertical-shape down-forward motion
0 0 1 0 0 0 0 1 0 0 0 0 1 0  20. translation motion
1 1 1 0 1 1 0 1 0 0 0 0 1 0  21. cyclic motion
0 0 1 1 1 0 0 1 0 0 0 0 0 0  22. intense motion

Figure 1. The action attribute definitions for the UIUC action dataset.

3. FOR LINE 558: Action attribute definitions for the UIUC action dataset and the MIXED action dataset

A. Fig. 1 shows the action attribute definitions for the UIUC action dataset.
B. Fig. 2 shows the action attribute definitions for the MIXED action dataset.

4. FOR LINE 737: Action attribute definitions for the Olympic Sports dataset

Fig. 3 shows the action attribute definitions for the Olympic Sports dataset.
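To illustrate how such a binary attribute definition can be handled in code, the sketch below stores three rows of Fig. 1 (attributes 3, 4 and 21) as a boolean matrix and queries it in both directions. The column order is the one read from the figure's rotated headers; the variable and function names and the choice of NumPy are assumptions made only for this example.

```python
import numpy as np

# Action columns of Fig. 1, in the order read from the figure headers.
ACTIONS = ["hand-clap", "crawl", "jump forward", "jump from sit-up",
           "jumping_jacks", "pushing_up", "raise-1-hand", "run",
           "sit-2-standing", "standing-2-sitting", "stretch-out",
           "turn", "walking", "waving"]

# A three-row excerpt of the attribute table (rows 3, 4 and 21 of Fig. 1).
ATTRIBUTES = ["translation with arm motion", "Jumping motion", "cyclic motion"]
DEFINITION = np.array([
    [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0],
], dtype=bool)

def attributes_of(action):
    """Attributes (from the excerpt) that are defined as positive for an action."""
    col = ACTIONS.index(action)
    return [name for name, on in zip(ATTRIBUTES, DEFINITION[:, col]) if on]

def actions_with(attribute):
    """Actions for which the given attribute is defined as positive."""
    row = ATTRIBUTES.index(attribute)
    return [name for name, on in zip(ACTIONS, DEFINITION[row]) if on]

print(attributes_of("walking"))
print(actions_with("Jumping motion"))
```

Each action is thus described by one column of the matrix, i.e., a binary attribute vector.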


Actions (columns): A = bend, B = box, C = clap, D = crawl, E = jack, F = jog, G = jump forward, H = jump from sit-up, I = pjump, J = push up, K = raise-1-hand, L = run, M = side, N = sit-2-stand, O = skip, P = stand-2-sit, Q = stretch-out, R = turn, S = walk, T = wave1, U = wave2.

A B C D E F G H I J K L M N O P Q R S T U  Attribute
1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0  one-arm motion
0 1 1 1 1 1 1 1 0 1 0 1 0 1 0 0 1 0 1 0 1  two-arms motion
0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0  raise hand motion
0 1 1 1 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0  chest-level arm motion
0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  arm-hand: open-close
0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0  arm-hand: alternate-move-forward
0 0 0 1 0 1 1 0 0 0 0 1 0 0 1 0 0 0 1 0 0  arm-hand: swing-move back-forward motion
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0  arm-hand: hang-down swing back-forward
0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1  arm: side-open up-down motion
0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1  arm: small swing motion left-right/up-down
0 0 0 0 1 0 1 0 0 0 1 0 0 0 1 0 1 0 0 1 1  arm: up motion over shoulder
0 0 0 0 1 0 1 1 0 1 0 0 0 0 1 0 0 0 0 0 0  arm: synchronized arm motion
0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0  arm: intense motion
0 1 1 0 0 1 1 0 0 1 0 1 0 0 1 0 0 0 0 1 1  Arm-shape: fold
1 0 1 1 1 0 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1  Arm-shape: straight
0 0 0 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0  leg: alternate-move-forward
0 0 0 0 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0  leg: two-leg synchronized motion
0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0  leg: fold/unfold motion
0 0 0 0 0 1 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0  leg: up-forward motion
0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0  leg: side-stretch motion
0 0 0 0 1 0 1 1 1 0 0 1 1 0 1 0 0 0 0 0 0  leg: intense motion
0 0 0 1 1 1 1 1 1 0 0 1 1 1 1 1 0 1 0 0 0  leg motion
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0  leg: feet small moving motion
0 0 0 0 1 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0  torso vertical-shape up motion
0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0  torso vertical-shape down motion
0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0  torso vertical-shape up-forward motion
0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0  torso vertical-shape down-forward motion
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  torso bend motion
0 0 0 1 0 1 1 0 0 0 0 1 1 0 1 0 0 0 1 0 0  translation motion
0 1 1 1 1 1 1 0 1 1 0 1 1 0 1 0 0 1 1 1 1  cyclic motion
0 0 0 0 1 0 1 1 1 0 0 1 1 0 1 0 0 0 0 0 0  intense motion
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0  Small wave motion (up-down)
0 0 0 0 0 1 1 0 0 0 0 1 1 0 1 0 0 0 0 0 0  Huge wave motion (up-down)
0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0  body horizontal orientation

Figure 2. The action attribute definitions for the MIXED action dataset.

5. Demonstrations on Attribute Detection

Fig. 4 and Fig. 5 show selected detected (positive) attributes of some representative actions for the MIXED action dataset and the Olympic Sports dataset, respectively.


Actions (columns): A = basketball-layup, B = bowl, C = Clean-jerk, D = Discus-throw, E = Diving-platform-10m, F = Diving-Spring-3m, G = Hammer-throw, H = High-jump, I = Javelin-throw, J = Long-jump, K = Pole-vault, L = Shot-put, M = snatch, N = Tennis-serve, O = Triple-jump, P = vault.

A B C D E F G H I J K L M N O P  Attribute
1 0 0 0 0 0 0 1 1 1 1 0 0 0 1 1  Run
1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0  Slow-run
0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 1  Fast run
1 1 1 0 1 1 0 0 0 0 0 0 1 0 0 1  Indoor
1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1  outdoor
1 1 0 0 0 0 1 0 0 0 0 1 0 1 0 0  Ball
0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0  small ball
1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0  big ball
1 0 0 0 1 1 0 1 0 1 1 0 0 0 1 1  Jump
1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0  Small local Jump
1 0 0 0 0 1 0 1 0 0 1 0 0 1 0 0  Local jump up
0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0  Jump Forward
0 1 0 0 0 0 0 0 0 1 1 0 0 0 1 1  Track
0 1 1 0 0 0 0 0 0 0 0 1 1 1 0 0  Bend
0 0 1 0 0 0 0 0 0 1 1 0 1 0 1 0  StandUp
0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0  Lift something
0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1  Raise Arms
0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0  One Arm Open
0 0 0 1 0 0 1 0 0 0 0 1 0 1 0 0  Turn Around
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  Throw Up
0 1 0 1 0 0 1 0 0 0 0 1 0 0 0 0  Throw away
0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0  water
0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0  Down Motion in Air
0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1  Up Motion in Air
0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0  Up Down Motion Local
0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1  Somersault in Air
0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0  With Pole
0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0  Two hand holding pole
0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0  One hand holding pole
0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1  Spring Platform
0 0 0 0 1 1 0 1 0 1 1 0 0 0 1 1  Motion in the air
0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0  one arm swing
0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0  Crouch
0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0  Two Arms Open
0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0  Two Arms Swing overhead
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0  Turn around with two arms open
0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0  Run in Air
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0  Big Step
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0  Open Arm Lift
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0  With Pat

Figure 3. The action attribute definitions for the Olympic Sports dataset.



[Figure 4 panels: (a) Hand-clapping, (b) Running, (c) Bending, (d) Crawling, (e) Jumping from sit-up, (f) Raising 1 hand, (g) Jumping & Jacking, (h) Jumping forward, (i) p-jumping, (j) Stretching out. Each panel lists the attributes detected for that action.]

Figure 4. Some selected detected attributes for some representative actions of the MIXED action dataset. At most five positive attributes are listed. The attributes highlighted in red are false positives.


[Figure 5 panels: (a) Hammer-throwing, (b) High-Jumping, (c) Diving-platform-10m, (d) Triple-Jumping, (e) Tennis-serving, (f) Basketball-layup, (g) Pole-vaulting, (h) Vaulting. Each panel lists the attributes detected for that action.]

Figure 5. Some selected detected attributes for some representative actions of the Olympic Sports action dataset. At most five positive attributes are listed. The attributes highlighted in red are false positives.