Technical Report: Recognizing Human Actions by Attributes
Jingen Liu, Benjamin Kuipers, Silvio Savarese
Department of EECS, University of Michigan
{liujg,kuipers,silvio}@umich.edu
This is the supplemental material for the main paper. In this document, we include the following items:
1. The mathematical derivation of Equation (6) (LINE 416).
2. Optimal partition of X (LINE 466).
3. Action attribute definitions for the UIUC action dataset and the MIXED action dataset (LINE 558).
4. Action attribute definitions for the Olympic Sports dataset (LINE 737).
5. Visual demonstration on attribute detection.
1. FOR LINE 416: The mathematical derivation of Equation (6)
From the definition of mutual information (i.e., Equation (4), lines 396-399), we have

$$\mathrm{MI}(X;Y) = \sum_i \sum_j p(x_j)\,p(y_i|x_j)\,\log\frac{p(y_i|x_j)}{p(y_i)}, \tag{1}$$

$$\mathrm{MI}(\hat{X};Y) = \sum_i \sum_t p(\hat{x}_t)\,p(y_i|\hat{x}_t)\,\log\frac{p(y_i|\hat{x}_t)}{p(y_i)}, \tag{2}$$
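where $\hat{X}$ represents the clusters of $X$, and $\hat{x}_t$ is a cluster of instances $x_j$. Then we have the following facts:

$$p(\hat{x}_t) = \sum_{x_j \in \hat{x}_t} p(x_j), \tag{3}$$

$$p(y_i|\hat{x}_t) = \sum_{x_j \in \hat{x}_t} \frac{p(x_j)}{p(\hat{x}_t)}\,p(y_i|x_j). \tag{4}$$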
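Substituting Eq. (3) and Eq. (4) into Eq. (2), we obtain

$$\begin{aligned}
\mathrm{MI}(\hat{X};Y) &= \sum_i \sum_t p(\hat{x}_t) \cdot \sum_{x_j \in \hat{x}_t} \frac{p(x_j)}{p(\hat{x}_t)}\,p(y_i|x_j) \cdot \log p(y_i|\hat{x}_t) - A \\
&= \sum_i \sum_t \Big\{ \sum_{x_j \in \hat{x}_t} p(x_j)\,p(y_i|x_j) \Big\} \cdot \log p(y_i|\hat{x}_t) - A \\
&= \sum_i \sum_t \sum_{x_j \in \hat{x}_t} p(x_j)\,p(y_i|x_j)\,\log p(y_i|\hat{x}_t) - A,
\end{aligned} \tag{5}$$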
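where

$$A = \sum_i \sum_j p(x_j)\,p(y_i|x_j)\,\log p(y_i) = \sum_i \sum_t p(\hat{x}_t)\,p(y_i|\hat{x}_t)\,\log p(y_i).$$

The two forms of $A$ agree because both marginalize to $\sum_i p(y_i)\log p(y_i)$. Eq. (1) can be split in the same way as $\sum_i \sum_j p(x_j)\,p(y_i|x_j)\log p(y_i|x_j) - A$, so subtracting Eq. (5) from Eq. (1) cancels $A$, and the loss of mutual information, $\mathrm{MI}(X;Y) - \mathrm{MI}(\hat{X};Y)$, becomes

$$\begin{aligned}
L_{\mathrm{inf}}(D,\Pi) &= \sum_i \sum_{j,\,x_j \in \hat{x}_t} p(x_j)\,p(y_i|x_j)\,\log\frac{p(y_i|x_j)}{p(y_i|\hat{x}_t)} \\
&= \sum_t \sum_{x_j \in \hat{x}_t} p(x_j) \cdot \mathrm{KL}\big(p(Y|x_j)\,\|\,p(Y|\hat{x}_t)\big),
\end{aligned} \tag{6}$$

where $\Pi = \{p(\hat{x}_t|x_j)\}$. If the distribution $p(x_j)$ is taken to be uniform, i.e., $p(x_j) = 1/N$ with $N = |X|$, the loss of mutual information is measured by the sum of the divergences between the distributions $p(Y|x_j)$ and their corresponding cluster distributions $p(Y|\hat{x}_t)$ ($x_j \in \hat{x}_t$). As a result, our objective function is

$$\Pi^* = \arg\min_{\Pi} \sum_t \sum_{x_j \in \hat{x}_t} \mathrm{KL}\big(p(Y|x_j)\,\|\,p(Y|\hat{x}_t)\big), \tag{7}$$

which means finding a partition $\Pi^*$ such that the total divergence of the instances from their cluster distributions is minimized.

Objective: Given a training dataset $D$, find an optimal partition $\Pi^*$ ($\hat{X}$) over $X$ such that $L_{\mathrm{inf}}$ is minimized.
1. Initialization: randomly assign each $x_j$ to a cluster $\hat{x}_t$.
2. Update cluster distributions: for each cluster $\hat{x}_t$, compute (update) Eq. (3) (the prior of the new cluster) and Eq. (4) (the cluster distribution).
3. Re-assign $x_j$ to clusters: for each $x_j$, update its cluster membership as
$$t^* = \arg\min_t \mathrm{KL}\big(p(Y|x_j)\,\|\,p(Y|\hat{x}_t)\big). \tag{9}$$
4. Stop iterating when the change in the objective function is small (e.g., $10^{-4}$).
Table 1. The procedure of partition over $X$.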
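As a sanity check on the derivation, the identity in Eq. (6) can be verified numerically on a toy problem. The sketch below is illustrative only: the sizes `N`, `M`, `T`, the random conditional distributions, and the hard partition `assign` are made-up assumptions, and NumPy is used purely for convenience. It computes the loss once as MI(X;Y) - MI(X̂;Y) using Eqs. (1)-(4), and once as the weighted KL sum on the right-hand side of Eq. (6).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumptions): N instances x_j, M labels y_i, T clusters.
N, M, T = 6, 4, 2
p_x = np.full(N, 1.0 / N)                         # uniform p(x_j) = 1/N
p_y_given_x = rng.dirichlet(np.ones(M), size=N)   # row j is p(Y | x_j)
assign = np.array([0, 0, 0, 1, 1, 1])             # hypothetical hard partition


def mutual_information(p_a, p_y_given_a):
    """MI(A;Y) in the form of Eq. (1)/(2); row a of p_y_given_a is p(Y | a)."""
    p_y = p_a @ p_y_given_a                       # marginal p(y_i)
    return float(np.sum(p_a[:, None] * p_y_given_a * np.log(p_y_given_a / p_y)))


def kl(p, q):
    """KL(p || q) for strictly positive distributions."""
    return float(np.sum(p * np.log(p / q)))


# Eq. (3): cluster priors; Eq. (4): cluster distributions.
p_c = np.array([p_x[assign == t].sum() for t in range(T)])
p_y_given_c = np.stack([
    (p_x[assign == t] / p_c[t]) @ p_y_given_x[assign == t] for t in range(T)
])

# Left-hand side: MI(X;Y) - MI(X_hat;Y). Right-hand side: Eq. (6).
loss_from_mi = mutual_information(p_x, p_y_given_x) - mutual_information(p_c, p_y_given_c)
loss_from_kl = sum(p_x[j] * kl(p_y_given_x[j], p_y_given_c[assign[j]]) for j in range(N))

print(loss_from_mi, loss_from_kl)
```

Under these assumptions the two printed values should agree up to floating-point error, which is exactly the statement of Eq. (6).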
2. FOR LINE 466: Iterative descent algorithm for optimal partition of X
Given a partition $C_1,\dots,C_T$ of $X$ with corresponding cluster distributions $\pi_1,\dots,\pi_T$ (which are estimated by $p(Y|\hat{x}_t)$), our objective function is minimized by the following assignments:

$$C_t \equiv \{x_j : \mathrm{KL}(p(Y|x_j)\,\|\,\pi_t) \le \mathrm{KL}(p(Y|x_j)\,\|\,\pi_k),\ \forall k \ne t\}. \tag{8}$$

Therefore, the optimal partition can be obtained by an iterative descent algorithm: in every iteration, each instance $x_j$ is assigned to the cluster $C_t$ with the smallest divergence $\mathrm{KL}(p(Y|x_j)\,\|\,\pi_t)$, and each cluster distribution $\pi_t$ is updated by averaging over all instances in $\hat{x}_t$. This descent algorithm is similar to the well-known Euclidean k-means algorithm. We summarize the procedure in Table 1.
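The steps in Table 1 can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes a uniform $p(x_j)$, so that the update in Eq. (4) reduces to a plain average over cluster members, and the names `partition` and `kl_rows`, the `eps` clipping, the `max_iter` cap, and the re-seeding of empty clusters are all choices made here for the example.

```python
import numpy as np


def kl_rows(P, q, eps=1e-12):
    """KL(P[j] || q) for every row of P; eps guards against log(0) (assumption)."""
    P = np.clip(P, eps, None)
    q = np.clip(q, eps, None)
    return np.sum(P * np.log(P / q), axis=1)


def partition(p_y_given_x, T, tol=1e-4, max_iter=100, seed=0):
    """Iterative descent over hard partitions, following the steps of Table 1."""
    rng = np.random.default_rng(seed)
    N = p_y_given_x.shape[0]
    assign = rng.integers(T, size=N)              # step 1: random initialization
    prev = np.inf
    for _ in range(max_iter):
        # Step 2: with uniform p(x_j), Eq. (4) is the mean of the member rows.
        centers = np.stack([
            p_y_given_x[assign == t].mean(axis=0) if np.any(assign == t)
            else p_y_given_x[rng.integers(N)]     # re-seed an empty cluster (assumption)
            for t in range(T)
        ])
        # Step 3: reassign each x_j to the cluster with the smallest KL, Eq. (9).
        div = np.stack([kl_rows(p_y_given_x, c) for c in centers], axis=1)  # shape (N, T)
        assign = div.argmin(axis=1)
        objective = float(div[np.arange(N), assign].sum())
        # Step 4: stop when the objective no longer decreases by more than tol.
        if prev - objective < tol:
            break
        prev = objective
    return assign, centers, objective


# Toy usage: four made-up label distributions forming two obvious groups.
toy = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
assign, centers, objective = partition(toy, T=2)
print(assign, objective)
```

As with Euclidean k-means, such a procedure only reaches a local minimum of the objective, so in practice one might run it from several random initializations and keep the best result.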
Columns (actions), left to right: 1 hand-clap, 2 crawl, 3 jump forward, 4 jump from sit up, 5 jumping_jacks, 6 pushing_up, 7 rais-1-hand, 8 run, 9 sit-2-standing, 10 standing-2-sitting, 11 stretch-out, 12 turn, 13 walking, 14 waving.

 #  Attributes                                1  2  3  4  5  6  7  8  9 10 11 12 13 14
 1  arm only motion                           1  0  0  1  1  0  1  0  1  1  1  1  0  1
 2  standing with arm motion                  1  0  0  1  1  0  1  0  1  1  1  1  0  1
 3  translation with arm motion               0  0  1  0  0  0  0  1  0  0  0  0  1  0
 4  Jumping motion                            0  0  1  1  1  0  0  1  0  0  0  0  0  0
 5  raise arms/put down                       1  0  0  0  1  0  1  0  0  1  1  0  0  1
 6  Arm motion lower shoulder                 0  0  1  0  0  0  0  1  0  0  0  0  1  0
 7  Arm motion over shoulder                  1  0  0  0  1  0  1  0  0  0  1  0  0  1
 8  arm-hand: move-back-forward               0  0  1  0  0  0  0  1  0  0  0  0  1  0
 9  arm: intense motion                       0  0  1  1  1  0  0  1  0  0  0  0  0  0
10  arm straight                              0  1  1  1  1  1  0  0  1  1  1  1  1  0
11  leg: alternative-move-forward             0  0  1  0  0  0  0  1  0  0  0  0  1  0
12  leg: two-leg synchronized motion          0  0  1  0  0  0  0  1  0  0  0  0  0  0
13  leg: fold/unfold motion                   0  0  0  1  0  1  0  0  1  1  0  1  0  0
14  leg: up-forward motion                    0  0  1  0  0  0  0  1  0  0  0  0  0  0
15  leg: intense motion                       0  0  1  0  1  0  0  1  0  0  0  0  0  0
16  leg local motion (up-down)                0  0  0  1  0  1  0  0  1  1  0  1  0  0
17  torso vertical-shape up/down motion       0  0  0  1  0  1  0  0  1  1  0  1  0  0
18  torso vertical-shape up-forward motion    0  0  1  0  0  0  0  1  0  0  0  0  1  0
19  torso vertical-shape down-forward motion  0  0  1  0  0  0  0  1  0  0  0  0  1  0
20  translation motion                        0  0  1  0  0  0  0  1  0  0  0  0  1  0
21  cyclic motion                             1  1  1  0  1  1  0  1  0  0  0  0  1  0
22  intense motion                            0  0  1  1  1  0  0  1  0  0  0  0  0  0

Figure 1. The action attribute definitions for the UIUC action dataset.
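A binary attribute definition such as Figure 1 maps naturally onto a small array lookup. The sketch below is only an illustration of that data structure: it copies three rows of Figure 1 (attributes 4, 20, and 21), assumes the columns follow the left-to-right order of the action labels in the figure, and the helper names `attributes_of` and `actions_with` are hypothetical.

```python
import numpy as np

# Action labels in the assumed column order of Figure 1.
actions = ["hand-clap", "crawl", "jump forward", "jump from sit up", "jumping_jacks",
           "pushing_up", "rais-1-hand", "run", "sit-2-standing", "standing-2-sitting",
           "stretch-out", "turn", "walking", "waving"]

# Excerpt of Figure 1: attribute name -> one 0/1 flag per action.
attributes = {
    "jumping motion":     [0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    "translation motion": [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0],
    "cyclic motion":      [1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0],
}
names = list(attributes)
A = np.array(list(attributes.values()), dtype=bool)   # shape (attributes, actions)


def attributes_of(action):
    """Names of the (excerpted) attributes that are set for one action."""
    col = actions.index(action)
    return [names[r] for r in range(len(names)) if A[r, col]]


def actions_with(attribute):
    """Actions for which the given attribute is set."""
    row = names.index(attribute)
    return [a for a, flag in zip(actions, A[row]) if flag]


print(attributes_of("walking"))
print(actions_with("translation motion"))
```

Storing the definitions as an attributes-by-actions matrix keeps each action's attribute signature available as a single column, which is convenient when comparing actions by the attributes they share.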
3. FOR LINE 558: Action attribute definitions for the UIUC action dataset and the MIXED action dataset
A. Fig. 1 shows the action attribute definitions for the UIUC action dataset.
B. Fig. 2 shows the action attribute definitions for the MIXED action dataset.
4. FOR LINE 737: Action attribute definitions for the Olympic Sports dataset
Fig. 3 shows the action attribute definitions for the Olympic Sports dataset.
Columns (actions), left to right: 1 bend, 2 box, 3 clap, 4 crawl, 5 jack, 6 jog, 7 jump forward, 8 jump from sit up, 9 pjump, 10 push up, 11 rais-1-hand, 12 run, 13 side, 14 sit-2-stand, 15 skip, 16 stand-2-sit, 17 Stretch-out, 18 turn, 19 walk, 20 wave1, 21 wave2.

Attributes                                  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21
one-arm motion                              1  0  0  0  0  0  0  0  0  0  1  0  0  0  1  0  0  0  0  1  0
two-arms motion                             0  1  1  1  1  1  1  1  0  1  0  1  0  1  0  0  1  0  1  0  1
raise hand motion                           0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  1  0  0  0  0
chest-level arm motion                      0  1  1  1  0  1  0  0  0  1  0  1  0  0  0  0  0  0  0  0  0
arm-hand: open-close                        0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
arm-hand: alternate-move-forward            0  0  0  1  0  1  0  0  0  0  0  1  0  0  0  0  0  0  1  0  0
arm-hand: swing-move back-forward motion    0  0  0  1  0  1  1  0  0  0  0  1  0  0  1  0  0  0  1  0  0
arm-hand: hang-down swing back-forward      0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0
arm: side-open up-down motion               0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1
arm: small swing motion left-right/up-down  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  1
arm: up motion over shoulder                0  0  0  0  1  0  1  0  0  0  1  0  0  0  1  0  1  0  0  1  1
arm: synchronized arm motion                0  0  0  0  1  0  1  1  0  1  0  0  0  0  1  0  0  0  0  0  0
arm: intense motion                         0  0  0  0  1  0  1  1  0  0  0  0  0  0  0  0  0  0  0  0  0
Arm-shape: fold                             0  1  1  0  0  1  1  0  0  1  0  1  0  0  1  0  0  0  0  1  1
Arm-shape: straight                         1  0  1  1  1  0  1  1  1  1  1  0  1  1  0  1  1  1  1  1  1
leg: alternate-move-forward                 0  0  0  1  0  1  0  0  0  0  0  1  1  0  0  0  0  0  1  0  0
leg: two-leg synchronized motion            0  0  0  0  1  0  1  1  1  0  0  0  0  0  0  0  0  0  0  0  0
leg: fold/unfold motion                     0  0  0  0  0  0  0  1  0  0  0  0  0  1  0  1  0  0  0  0  0
leg: up-forward motion                      0  0  0  0  0  1  1  0  0  0  0  1  0  0  1  0  0  0  0  0  0
leg: side-stretch motion                    0  0  0  0  1  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0
leg: intense motion                         0  0  0  0  1  0  1  1  1  0  0  1  1  0  1  0  0  0  0  0  0
leg motion                                  0  0  0  1  1  1  1  1  1  0  0  1  1  1  1  1  0  1  0  0  0
leg: feet small moving motion               0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  1  0  0
torso vertical-shape up motion              0  0  0  0  1  0  0  1  1  0  0  0  0  1  0  0  0  0  0  0  0
torso vertical-shape down motion            0  0  0  0  1  0  0  1  1  0  0  0  0  0  0  1  0  0  0  0  0
torso vertical-shape up-forward motion      0  0  0  0  0  0  1  0  0  0  0  1  0  0  1  0  0  0  0  0  0
torso vertical-shape down-forward motion    0  0  0  0  0  0  1  0  0  0  0  1  0  0  1  0  0  0  0  0  0
torso bend motion                           1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
translation motion                          0  0  0  1  0  1  1  0  0  0  0  1  1  0  1  0  0  0  1  0  0
cyclic motion                               0  1  1  1  1  1  1  0  1  1  0  1  1  0  1  0  0  1  1  1  1
intense motion                              0  0  0  0  1  0  1  1  1  0  0  1  1  0  1  0  0  0  0  0  0
Small wave motion (up-down)                 0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0
Huge wave motion (up-down)                  0  0  0  0  0  1  1  0  0  0  0  1  1  0  1  0  0  0  0  0  0
body horizontal orientation                 0  0  0  1  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0

Figure 2. The action attribute definitions for the MIXED action dataset.
5. Demonstrations on Attribute Detection
Fig. 4 and Fig. 5 show selected detected (positive) attributes for representative actions from the MIXED action dataset and the Olympic Sports dataset, respectively.
Columns (actions), left to right: 1 basketball-layup, 2 bowl, 3 Clean-jerk, 4 Discus-throw, 5 Diving-platform-10m, 6 Diving-Spring-3m, 7 Hammer-throw, 8 High-jump, 9 Javelin-throw, 10 Long-jump, 11 Pole-vault, 12 Shot-put, 13 snatch, 14 Tennis-serve, 15 Triple-jump, 16 vault.

Attributes                      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
Run                             1  0  0  0  0  0  0  1  1  1  1  0  0  0  1  1
Slow-run                        1  0  0  0  0  0  0  1  1  0  0  0  0  0  0  0
Fast run                        0  0  0  0  0  0  0  0  0  1  1  0  0  0  1  1
Indoor                          1  1  1  0  1  1  0  0  0  0  0  0  1  0  0  1
outdoor                         1  1  0  1  1  1  1  1  1  1  1  1  0  1  1  1
Ball                            1  1  0  0  0  0  1  0  0  0  0  1  0  1  0  0
small ball                      0  0  0  0  0  0  1  0  0  0  0  1  0  1  0  0
big ball                        1  1  0  0  0  0  0  0  0  0  0  1  0  0  0  0
Jump                            1  0  0  0  1  1  0  1  0  1  1  0  0  0  1  1
Small local Jump                1  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0
Local jump up                   1  0  0  0  0  1  0  1  0  0  1  0  0  1  0  0
Jump Forward                    0  0  0  0  0  0  0  0  0  1  0  0  0  0  1  0
Track                           0  1  0  0  0  0  0  0  0  1  1  0  0  0  1  1
Bend                            0  1  1  0  0  0  0  0  0  0  0  1  1  1  0  0
StandUp                         0  0  1  0  0  0  0  0  0  1  1  0  1  0  1  0
Lift something                  0  0  1  0  0  0  0  0  0  0  0  0  1  0  0  0
Raise Arms                      0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  1
One Arm Open                    0  1  0  0  0  0  0  0  0  0  0  1  0  0  0  0
Turn Around                     0  0  0  1  0  0  1  0  0  0  0  1  0  1  0  0
Throw Up                        1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
Throw away                      0  1  0  1  0  0  1  0  0  0  0  1  0  0  0  0
water                           0  0  0  0  1  1  0  0  0  0  0  0  0  0  0  0
Down Motion in Air              0  0  0  0  1  1  0  0  0  0  1  0  0  0  0  0
Up Motion in Air                0  0  0  0  0  0  0  1  0  0  1  0  0  0  0  1
Up Down Motion Local            0  0  1  0  0  0  0  0  0  0  0  0  1  0  0  0
Somersault in Air               0  0  0  0  1  1  0  0  0  0  0  0  0  0  0  1
With Pole                       0  0  0  0  0  0  0  0  1  0  1  0  0  0  0  0
Two hand holding pole           0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0
One hand holding pole           0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0
Spring Platform                 0  0  0  0  0  0  0  1  0  0  1  0  0  0  0  1
Motion in the air               0  0  0  0  1  1  0  1  0  1  1  0  0  0  1  1
one arm swing                   0  1  0  0  0  0  0  0  0  0  0  1  0  1  0  0
Crouch                          0  0  1  0  1  0  0  0  0  0  0  0  1  0  0  0
Two Arms Open                   0  0  0  1  0  0  1  0  0  0  0  1  0  0  0  0
Two Arms Swing overhead         0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0
Turn around with two arms open  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0
Run in Air                      0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0
Big Step                        0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0
Open Arm Lift                   0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0
With Pat                        0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0

Figure 3. The action attribute definitions for the Olympic Sports dataset.
Panels: (a) Hand-clapping, (b) Running, (c) Bending, (d) Crawling, (e) Jumping from sit-up, (f) Raising 1 hand, (g) Jumping & Jacking, (h) Jumping forward, (i) p-jumping, (j) Stretching out.
Figure 4. Some selected detected attributes for some representative actions of the MIXED action dataset. At most five positive attributes are listed. The attributes highlighted in red are false positives.
Panels: (a) Hammer-throwing, (b) High-Jumping, (c) Diving-platform-10m, (d) Triple-Jumping, (e) Tennis-serving, (f) Basketball-layup, (g) Pole-vaulting, (h) Vaulting.
Figure 5. Some selected detected attributes for some representative actions of the Olympic Sports action dataset. At most five positive attributes are listed. The attributes highlighted in red are false positives.