TRANSCRIPT

Page 1: HON4D (O. Oreifej et al., CVPR2013)

HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences

Presenter: Mitsuru NAKAZAWA@Osaka Univ., JPN

Omar Oreifej† and Zicheng Liu‡
†University of Central Florida  ‡Microsoft Research

CVPR2013 paper introduction


Page 2: HON4D (O. Oreifej et al., CVPR2013)

• This slide deck is unofficial because I am the presenter, NOT an author of HON4D
– Presenter: Mitsuru Nakazawa@Osaka Univ., JPN
– nakazawa[at]am.sanken.osaka-u.ac.jp

• HON4D
– http://www.cs.ucf.edu/~oreifej/HON4D.html

• Required knowledge: HOG (Histogram of Oriented Gradients)


Page 3: HON4D (O. Oreifej et al., CVPR2013)

University of Bonn - Institute of Computer Science III - Computer Vision Group

Histogram of 4D Surface Normals

• Surface normals
• Quantization according to “projectors” p_i
• Add additional discriminative “projectors”

[O. Oreifej and Z. Liu. HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences. CVPR 2013. Available at http://www.cs.ucf.edu/~oreifej/HON4D.html]

[URL] Accessed Sept. 25, 2013

(Figure: bins of the histogram)

Page 4: HON4D (O. Oreifej et al., CVPR2013)

Introduction

• Compared with conventional color images, depth maps provide several advantages
– Depth maps reflect pure geometry and shape cues

• It seems natural to employ depth data in many computer vision problems like action recognition

Would conventional RGB-based methods also perform well on depth sequences?

Page 5: HON4D (O. Oreifej et al., CVPR2013)

Related work 1: Local interest point-based methods originally developed for color sequences

• It is difficult to apply these methods to depth sequences because
– Detectors such as STIP and Dollar are not reliable in depth sequences
– Standard methods for automatically acquiring motion trajectories in color images are also not reliable in depth sequences

STIP (Laptev et al. 2005) [10], Dollar (Dollar et al. 2005) [5]

Page 6: HON4D (O. Oreifej et al., CVPR2013)

Related work 2: Holistic approaches

• Instead of using local points, a global feature is obtained for the entire sequence
• Examples: Yang et al. 2012 [26], Vieira et al. 2012 [21]

We demonstrate that our method (HON4D) captures the complex and articulated structure and motion within the sequence using a richer and more discriminative descriptor than [26] and [21]

Page 7: HON4D (O. Oreifej et al., CVPR2013)

Contributions

1. We propose a novel descriptor for activity recognition from depth sequences, in which we encode the distribution of the surface normal orientation in the 4D space of depth, time, and spatial coordinates.

2. We demonstrate how to quantize the 4D space using the vertices of a polychoron, and then refine the quantization to become more discriminative.


Page 8: HON4D (O. Oreifej et al., CVPR2013)

4D Surface Normal

• The 4th dimension encodes the magnitude of the gradient
• The normal is then normalized to unit length
• HON4D can therefore assign surfaces with the same gradient orientation but different gradient magnitude to different bins
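As a sketch in my own notation (up to a sign convention that may differ from the paper): treating the depth sequence as a surface z = f(x, y, t) in the 4D space (x, y, t, z), the normal and its unit-length version are

\[
\mathbf{n} = \left(\frac{\partial z}{\partial x},\; \frac{\partial z}{\partial y},\; \frac{\partial z}{\partial t},\; -1\right)^{\top},
\qquad
\hat{\mathbf{n}} = \frac{\mathbf{n}}{\lVert \mathbf{n} \rVert}.
\]

Because the last component is fixed before normalization, a steep and a shallow surface with the same gradient orientation end up with clearly different unit normals.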


Page 9: HON4D (O. Oreifej et al., CVPR2013)

HOG vs. HON

• Gradient orientation is similar for both example surfaces
• HOG: similar distributions for the two surfaces
• HON: discriminable distributions for the two surfaces
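A toy numerical illustration of this point (my own example in Python, not from the slides): two ramps with the same gradient direction but different steepness are indistinguishable by gradient orientation alone, yet their unit normals, built with the (dz/dx, -1) convention sketched on the previous page, fall into clearly different bins.

import numpy as np

def unit_normal(dz_dx):
    # 2D analogue of the 4D normal: (dz/dx, -1), scaled to unit length
    n = np.array([dz_dx, -1.0])
    return n / np.linalg.norm(n)

# Both slopes point in the same direction, so an orientation-only
# (HOG-style) histogram treats the two surfaces alike ...
print(np.sign(0.2), np.sign(2.0))   # 1.0 1.0

# ... but the unit normals differ substantially.
print(unit_normal(0.2))             # approx. [ 0.196, -0.981]
print(unit_normal(2.0))             # approx. [ 0.894, -0.447]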

Page 10: HON4D (O. Oreifej et al., CVPR2013)

Histogram of 4D Normals for each cell

• The component of each normal in each projector direction is computed
• Normalization using the sum across all projectors
• Projectors are obtained from the 600-cell (one of the polychorons)
– The 600-cell divides the 4D space uniformly with its 120 vertices
– The 4D space is quantized using these 120 vertices as projectors

(600-cell rendering: Stella: Polyhedron Navigator, http://www.software3d.com/Stella.php)
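A minimal sketch of how one cell's histogram could be computed under this scheme (my own simplification in Python/NumPy, not the authors' code; `depth` is an assumed (T, H, W) array for one spatiotemporal cell, `projectors` an assumed (120, 4) array of unit 4D directions such as the 600-cell vertices, and the full descriptor would concatenate such histograms over a grid of cells):

import numpy as np

def hon4d_cell_histogram(depth, projectors, eps=1e-8):
    # 4D surface normal components via finite differences of z = f(x, y, t)
    dz_dt, dz_dy, dz_dx = np.gradient(depth.astype(np.float64))
    last = -np.ones_like(dz_dx)                      # fixed last component
    n = np.stack([dz_dx, dz_dy, dz_dt, last], axis=-1)
    n /= np.linalg.norm(n, axis=-1, keepdims=True)   # unit-length normals

    # Component of each normal in each projector direction
    # (negative components are clipped: a normal only votes for nearby bins).
    comp = np.maximum(n.reshape(-1, 4) @ projectors.T, 0.0)

    # Normalize each normal's votes by the sum across all projectors,
    # then accumulate into the cell histogram.
    comp /= comp.sum(axis=1, keepdims=True) + eps
    hist = comp.sum(axis=0)
    return hist / (hist.sum() + eps)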

Page 11: HON4D (O. Oreifej et al., CVPR2013)

Projector refinement

• When two different classes of activities are quite close in the feature space, their samples mostly fall into similar bins
• Is uniform space quantization optimal?
• We set the weighting coefficients for the projectors using an SVM trained on the HON4D descriptors
(Figure labels: training HON4D descriptor, support vector, weight corresponding to the support vector)
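A rough sketch of how such projector weights could be derived from a trained SVM (my reading of the slide, not necessarily the paper's exact formulation; scikit-learn's SVC is used purely for illustration, and `train_descriptors` / `train_labels` are assumed inputs): each histogram bin is scored by how strongly the support vectors, weighted by the magnitude of their SVM coefficients, activate it, and highly scored bins are where additional projectors would be placed.

import numpy as np
from sklearn.svm import SVC

def projector_density(train_descriptors, train_labels):
    # Linear SVM over HON4D descriptors; its support vectors and their
    # coefficients indicate which bins carry the discriminative signal.
    svm = SVC(kernel="linear").fit(train_descriptors, train_labels)
    alphas = np.abs(svm.dual_coef_)      # |coefficients| of the support vectors
    svs = svm.support_vectors_           # the support-vector descriptors
    density = (alphas @ svs).sum(axis=0) # weighted sum over support vectors
    return density / density.sum()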

Page 12: HON4D (O. Oreifej et al., CVPR2013)

Experiments using existing databases

• MSR Action 3D [12], MSR Gesture 3D [23]

HON4D does not use a skeleton tracker, and yet we outperform the skeleton-based method [24]

Page 13: HON4D (O. Oreifej et al., CVPR2013)

New database: 3D Action Pairs Dataset

Although the two actions of each pair are similar in motion and shape, the motion-shape relation is different

(Compared methods in the results chart: Skeleton + LOP, Skeleton + LOP + Pyramid, HON4D, HON4D + discriminative density)

Page 14: HON4D (O. Oreifej et al., CVPR2013)

Local HON4D

• For the case where actors significantly change their spatial locations and the temporal extents of the activities vary significantly
• Local HON4D: Histogram of 4D Normals computed on spatiotemporal patches centered at skeleton joints (a rough sketch follows below)

Experiment using MSR Daily Activity 3D [24]
• Local HON4D: 80.00%
• Local occupancy pattern feature (LOP, Wang et al. 2012): 67.50%

HON4D is also superior for significantly non-aligned sequences
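A small sketch of the patch-extraction idea (my own simplification in Python/NumPy; `depth` is an assumed (T, H, W) sequence, `joints` an assumed list of integer (t, y, x) joint positions, and `describe_patch` any per-patch descriptor, e.g. the hon4d_cell_histogram sketch shown earlier):

import numpy as np

def local_hon4d(depth, joints, describe_patch, half=(8, 16, 16)):
    # Cut a spatiotemporal patch around every skeleton joint, describe it,
    # then concatenate the per-patch histograms into one descriptor.
    ht, hy, hx = half
    parts = []
    for t, y, x in joints:
        patch = depth[max(t - ht, 0):t + ht + 1,
                      max(y - hy, 0):y + hy + 1,
                      max(x - hx, 0):x + hx + 1]
        parts.append(describe_patch(patch))
    return np.concatenate(parts)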

Page 15: HON4D (O. Oreifej et al., CVPR2013)

Conclusion

• We presented a novel, simple, and easily implementable descriptor for activity recognition from depth sequences.
– We initially quantize the 4D space using the vertices of a 600-cell polychoron, and use that to compute the distribution of the 4D normal orientation for each depth sequence.
– We estimate the discriminative density at each vertex of the polychoron, and induce further vertices accordingly, thus placing more emphasis on the discriminative bins of the histogram.
• We showed by experiments that the proposed method outperforms all previous approaches on all relevant benchmark datasets.