Post on 18-Dec-2015

Page 1: Local Descriptors for Spatio-Temporal Recognition Computational Vision and Active Perception Laboratory (CVAP) Dept of Numerical Analysis and Computer

Local Descriptors for Spatio-Temporal Recognition

Computational Vision and Active Perception Laboratory (CVAP)

Dept of Numerical Analysis and Computer Science

KTH (Royal Institute of Technology)

SE-100 44 Stockholm, Sweden

Ivan Laptev and Tony Lindeberg

Motivation

Area: Interpretation of non-rigid motion

Non-rigid motion results in visual events such as:

Occlusions, disocclusions

Appearance, disappearance

Unifications, splits

Velocity discontinuities

Events are often characterized by non-constant motion and complex spatio-temporal appearance.

Events provide a compact way to capture important aspects of spatio-temporal structure.

Local Motion Events

Idea: look for spatio-temporal neighborhoods that maximize the local variation of image values over space and time

Interest points

Spatial domain (Harris and Stephens, 1988):

Select maxima over (x,y) of the Harris corner function

Analogy in space-time (Laptev and Lindeberg, ICCV’03):

Select space-time maxima of the corresponding space-time function, i.e. points with high variation of image values over space and time.
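For reference, the two operators named on this slide can be written as below. This is a reconstruction from the cited papers (Harris and Stephens, 1988; Laptev and Lindeberg, ICCV’03), not a copy of the slide; the symbols \mu, k, g, \sigma_i and \tau_i follow those papers, and L denotes the Gaussian-smoothed image or image sequence.

```latex
% Spatial domain: second-moment matrix and Harris function
\mu = g(\cdot;\sigma_i^2) *
\begin{pmatrix}
  L_x^2   & L_x L_y \\
  L_x L_y & L_y^2
\end{pmatrix},
\qquad
H = \det(\mu) - k\,\operatorname{trace}^2(\mu)

% Space-time: 3x3 second-moment matrix and its Harris-like function
\mu = g(\cdot;\sigma_i^2,\tau_i^2) *
\begin{pmatrix}
  L_x^2   & L_x L_y & L_x L_t \\
  L_x L_y & L_y^2   & L_y L_t \\
  L_x L_t & L_y L_t & L_t^2
\end{pmatrix},
\qquad
H = \det(\mu) - k\,\operatorname{trace}^3(\mu)
```

In both cases interest points are local maxima of H: over (x, y) in the spatial case, over (x, y, t) in the space-time case.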

Synthetic examples

Velocity discontinuity (spatio-temporal "corner")

Unification and split

Image transformations

Spatial scale: p → p’

Temporal scale: p → p’

Galilean transformation: p → p’

Estimate these transformations locally to obtain invariance to them (Laptev and Lindeberg, ICCV’03, ICPR’04)
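The three transformations can be written out for a space-time point p = (x, y, t)^T as in the sketch below. The formulas on the slide are not preserved, so the factor names s_x (spatial scaling), s_t (temporal scaling) and the velocity (v_x, v_y) are notation assumed here for illustration.

```latex
% Spatial scaling by an assumed factor s_x
p' = (s_x\,x,\; s_x\,y,\; t)^T

% Temporal scaling by an assumed factor s_t
p' = (x,\; y,\; s_t\,t)^T

% Galilean transformation with assumed image velocity (v_x, v_y)
p' = G\,p, \qquad
G = \begin{pmatrix}
  1 & 0 & v_x \\
  0 & 1 & v_y \\
  0 & 0 & 1
\end{pmatrix}
```

Under this reading, selecting spatial scale, temporal scale and velocity locally corresponds to estimating s_x, s_t and (v_x, v_y) at each feature.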

Feature detection: Selection of spatial scale

Invariance with respect to size changes

Feature detection: Velocity adaptation

Stationary camera

Stabilized camera

Feature detection: Selection of temporal scale

Selection of temporal scales captures the temporal extent of events

Features from human actions

Why local features in space-time?

Make a sparse and informative representation of complex motion patterns;

Obtain robustness w.r.t. missing data (occlusions) and outliers (complex, dynamic backgrounds, multiple motions);

Match similar events in image sequences;

Recognize image patterns of non-rigid motion.

Do not rely on tracking or spatial segmentation prior to motion recognition.

Space-time neighborhoods

boxing

walking

hand waving

Local space-time descriptors

Describe image structures in the neighborhoods of detected features, each defined by its position and covariance matrix

A well-founded choice of local descriptor is the local jet (Koenderink and van Doorn, 1987), computed from spatio-temporal Gaussian derivatives (here at the interest points p_i)

Use of descriptors: Clustering

[Figure: clusters c1, c2, c3, c4; stages: clustering, classification]

Group similar points in the space of image descriptors using K-means clustering

Select significant clusters
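The two steps above can be sketched as plain K-means followed by a filter on cluster size. This is a minimal illustration and not the authors' implementation: the function names, the fixed iteration count, the random initialisation, and the use of cluster size as the "significance" criterion are all assumptions made here.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(descriptors, k, iterations=20, seed=0):
    """Group descriptors into k clusters; returns (centers, labels)."""
    rng = random.Random(seed)
    # Initialise the centers with k distinct descriptors (an assumed choice).
    centers = [list(d) for d in rng.sample(descriptors, k)]
    labels = [0] * len(descriptors)
    for _ in range(iterations):
        # Assignment step: each descriptor goes to its nearest center.
        labels = [
            min(range(k), key=lambda c: squared_distance(d, centers[c]))
            for d in descriptors
        ]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [d for d, lab in zip(descriptors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def significant_clusters(labels, k, min_size):
    """Keep clusters with at least min_size members (assumed criterion)."""
    return [c for c in range(k) if labels.count(c) >= min_size]
```

With descriptors given as a list of equal-length tuples, `kmeans(descriptors, 4)` would correspond to the four clusters c1 to c4 on the slide, and `significant_clusters` discards sparsely populated ones.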

Use of descriptors: Matching

Find similar events in pairs of video sequences

Are other descriptors better?

Consider the following choices, each computed over the spatio-temporal neighborhood of a feature:

Multi-scale spatio-temporal derivatives

Projections to orthogonal bases obtained with PCA

Histogram-based descriptors

Multi-scale derivative filters

Derivatives up to order 2 or 4; 3 spatial scales; 3 temporal scales: 9 x 3 x 3 = 81 or 34 x 3 x 3 = 306 dimensional descriptors
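The counts 9 and 34 follow from enumerating all partial derivatives in (x, y, t) of total order 1 up to 2 or 4. The short sketch below reproduces the arithmetic; the function names are chosen here for illustration.

```python
from itertools import product


def derivative_orders(max_order, dims=3):
    """All multi-indices (dx, dy, dt) with 1 <= dx + dy + dt <= max_order."""
    return [
        idx
        for idx in product(range(max_order + 1), repeat=dims)
        if 1 <= sum(idx) <= max_order
    ]


def descriptor_length(max_order, n_spatial_scales=3, n_temporal_scales=3):
    """Number of derivative responses over all scale combinations."""
    return len(derivative_orders(max_order)) * n_spatial_scales * n_temporal_scales


print(len(derivative_orders(2)), descriptor_length(2))  # 9 81
print(len(derivative_orders(4)), descriptor_length(4))  # 34 306
```

For order 2 the multi-indices are the 3 first-order and 6 second-order derivatives; order 4 adds 10 third-order and 15 fourth-order derivatives.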

PCA descriptors

Compute normal flow or optic flow in locally adapted spatio-temporal neighborhoods of features

Subsample the flow fields to a resolution of 9x9x9 pixels

Learn PCA basis vectors (separately for each flow) from features in training sequences

Project flow fields of the new features onto the 100 most significant eigen-flow-vectors

Position-dependent histograms

...

Divide the neighborhood of each point p_i into M^3 subneighborhoods, here M = 1, 2, 3

Compute space-time gradients (Lx, Ly, Lt)^T or optic flow (vx, vy)^T at combinations of 3 temporal and 3 spatial scales, chosen relative to the locally adapted detection scales

Compute separable histograms over all subneighborhoods, derivatives/velocities and scales
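A minimal sketch of a position-dependent histogram for one scale combination is given below. It assumes that sample positions are already normalised to the unit cube of the neighborhood, and that the bin count and value range are free parameters; none of these choices, nor the function name, come from the slides.

```python
def position_dependent_histogram(samples, m, n_bins=4, value_range=(-1.0, 1.0)):
    """Concatenate per-cell, per-component histograms.

    samples: list of ((x, y, t), values), with x, y, t in [0, 1) relative to
             the neighborhood and values e.g. (Lx, Ly, Lt) or (vx, vy).
    m:       the neighborhood is split into m * m * m subneighborhoods.
    """
    samples = list(samples)
    n_components = len(samples[0][1])
    lo, hi = value_range
    # One separate (marginal) histogram per subneighborhood and component.
    hist = [[[0] * n_bins for _ in range(n_components)] for _ in range(m ** 3)]
    for (x, y, t), values in samples:
        # Index of the subneighborhood containing the sample.
        cx, cy, ct = (min(int(c * m), m - 1) for c in (x, y, t))
        cell = (ct * m + cy) * m + cx
        for comp, v in enumerate(values):
            # Clamp out-of-range values into the first or last bin.
            b = int((v - lo) / (hi - lo) * n_bins)
            b = min(max(b, 0), n_bins - 1)
            hist[cell][comp][b] += 1
    # Flatten to a single descriptor vector.
    return [count for cell in hist for comp in cell for count in comp]
```

The result has m^3 x (number of components) x n_bins entries. A full descriptor in the sense of the slide would repeat this for each of the spatial and temporal scale combinations and concatenate the results.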

Evaluation: Action Recognition

Database: walking, running, jogging, handwaving, handclapping, boxing

Initially, recognition with a Nearest Neighbor Classifier (NNC):

Take sequences of X subjects for training (S_train)

For each test sequence s_test, find the closest training sequence s_train,i by minimizing the distance between them

The action of s_test is regarded as recognized if class(s_test) = class(s_train,i)
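The evaluation protocol can be sketched as follows. The distance between two sequences used on the slide is not preserved, so this sketch assumes, purely for illustration, that each sequence is summarised by a single non-zero descriptor vector; the two distance functions and all names are likewise assumptions.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def scalar_product_distance(a, b):
    """One minus the normalised scalar product (assumed form, non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def classify_nearest_neighbor(test_descriptor, training_set,
                              distance=euclidean_distance):
    """Return the label of the closest training item.

    training_set: list of (descriptor, label) pairs.
    """
    _, label = min(training_set,
                   key=lambda item: distance(test_descriptor, item[0]))
    return label


def recognition_rate(test_set, training_set, distance=euclidean_distance):
    """Fraction of test items whose nearest neighbor has the same label."""
    correct = sum(
        1 for descriptor, label in test_set
        if classify_nearest_neighbor(descriptor, training_set, distance) == label
    )
    return correct / len(test_set)
```

Passing `distance=scalar_product_distance` switches the measure without changing the classifier, which is one way to compare the two.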

Results: Recognition rates (all)

Scale-adapted features

Scale and velocity adapted features

Results: Recognition rates (Hist)

Scale-adapted features

Scale and velocity adapted features

Results: Recognition rates (Jets)

Scale-adapted features

Scale and velocity adapted features

Results: Comparison

Global-STG-HIST: Zelnik-Manor and Irani CVPR’01

Spatial-4Jets: Spatial interest points (Harris and Stephens, 1988)

Confusion matrices

Position-dependent histograms for space-time interest points

Local jets at spatial interest points

Confusion matrices

STG-PCA, ED

STG-PD2HIST, ED

Related work

Mikolajczyk and Schmid, CVPR’03, ECCV’02

Lowe, ICCV’99

Zelnik-Manor and Irani, CVPR’01

Fablet, Bouthemy and Pérez, PAMI’02

Laptev and Lindeberg, ICCV’03, IVC 2004, ICPR’04

Efros et al., ICCV’03

Harris and Stephens, Alvey’88

Koenderink and van Doorn, PAMI 1992

Lindeberg, IJCV 1998

Summary

Descriptors of local spatio-temporal features enable classification and matching of motion events in video

Position-dependent histograms of space-time gradients and optical flow give high recognition performance. Results consistent with findings for SIFT descriptor (Lowe, 1999) in the spatial domain.

Future:

Include spatial and temporal consistency of local features

Multiple actions in the scene

Information in between events

walking running jogging handwaving handclapping boxing

Results: Recognition Rates

Scalar product distance

Euclidean distance

Walking model

Represent the gait pattern using classified spatio-temporal points corresponding to one gait cycle

Define the state X of the model at the moment t0 by the position, the size, the phase and the velocity of a person

Associate each phase with a silhouette of a person extracted from the original sequence

Sequence alignment

Given a data sequence with the current moment t0, detect and classify interest points in the time window of length tw: (t0 - tw, t0)

Transform the model features according to X and, for each model feature f_m,i = (x_m,i, y_m,i, t_m,i, σ_m,i, τ_m,i, c_m,i), compute its distance d_i to the closest data feature f_d,j of the same class, c_d,j = c_m,i

Define the "fit function" D of model configuration X as a sum of the distances of all features, weighted w.r.t. their "age" (t0 - t_m) so that recent features have more influence on the matching
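One possible reading of the fit function is sketched below. The exact distance and weighting used by the authors are not preserved in the transcript, so the Euclidean feature distance, the exponential age weight with its `decay` parameter, and the choice to skip model features that have no data feature of the same class are all assumptions. The model features are taken to be already transformed according to X.

```python
import math


def feature_distance(fm, fd):
    """Euclidean distance over (x, y, t, sigma, tau); an assumed measure."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fm[:5], fd[:5])))


def fit_function(model_features, data_features, t0, decay=10.0):
    """Age-weighted sum of distances from model features to data features.

    Each feature is a tuple (x, y, t, sigma, tau, c), where c is the class.
    """
    total = 0.0
    for fm in model_features:
        # Only data features of the same class are candidate matches.
        candidates = [fd for fd in data_features if fd[5] == fm[5]]
        if not candidates:
            continue  # assumed handling of unmatched model features
        d_i = min(feature_distance(fm, fd) for fd in candidates)
        # Recent features (small age t0 - t_m) receive larger weights.
        weight = math.exp(-(t0 - fm[2]) / decay)
        total += weight * d_i
    return total
```

Minimizing this value over the state X then amounts to re-transforming the model features for each candidate X and re-evaluating the sum.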

Sequence alignment

data features

model features

At each moment t0, minimize D with respect to X using the standard Gauss-Newton minimization method

Experiments
