DVMM Lab, Columbia University Video Event Recognition

Video Event Recognition: Multilevel Pyramid Matching

Dong Xu and Shih-Fu Chang

Digital Video and Multimedia Lab
Department of Electrical Engineering
Columbia University

http://www.ntu.edu.sg/home/[email protected]

*Courtesy of Eric Zavesky, who prepared the slides


Video Event Recognition: Problem

• Online video search and video indexing

• Events characterized by an evolution of scenes, objects and actions over time

• 56 events are defined in LSCOM

Airplane Flying Car Exiting


Video Event Recognition: Challenges

• Geometric and photometric variances

• Cluttered background

• Complex camera motion and object motion


Event Recognition: Object Tracking

• Detect the object of interest, track it over time, and model its spatio-temporal dynamics

• Hard to detect events without explicit object motion, such as Riot

[Figure: Object Detection & Localization, Tracking, Inference; output: "Airplane Landing"?]


Event Recognition: Key-Frame based Matching

• Only the key-frame is used for matching.

• Extract low-level features, compare them to other frames, and make an overall matching decision

[Figure: keyframe features and their similarity scores (15%, 18%, 50%)]


Event Recognition: Multi-level Pyramid Matching

[Figure: feature extraction, concept detectors, EMD distance, multi-level pyramid matching]


Content Representation: Low-level Features

• Edge direction histogram

• Grid color moment

• Gabor texture
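As a rough illustration of one of these features, the sketch below computes grid color moments with NumPy. The slides only name the feature, so everything here is an assumption: the hypothetical function name `grid_color_moments`, the 5x5 grid, and the choice of mean, standard deviation, and the cube root of the third central moment per channel.

```python
import numpy as np

def grid_color_moments(image, grid=(5, 5)):
    """Per-cell color moments of an (H, W, C) image, concatenated into one vector.

    Assumed layout (not taken from the slides): for each grid cell, the
    per-channel mean, standard deviation, and cube root of the third
    central moment, in that order.
    """
    image = np.asarray(image, dtype=float)
    h, w, c = image.shape
    rows, cols = grid
    feats = []
    for r in range(rows):
        for q in range(cols):
            # Integer cell boundaries; cells cover the whole image.
            cell = image[r * h // rows:(r + 1) * h // rows,
                         q * w // cols:(q + 1) * w // cols]
            pixels = cell.reshape(-1, c)
            mean = pixels.mean(axis=0)
            std = pixels.std(axis=0)
            skew = np.cbrt(((pixels - mean) ** 3).mean(axis=0))
            feats.extend([mean, std, skew])
    return np.concatenate(feats)
```

With a 5x5 grid and a 3-channel image this gives a vector of length 5 * 5 * 3 * 3 = 225.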


Content Representation: Mid-level Semantic Concept Scores

• Train detectors on low-level features

• Mid-level semantic concept feature is more robust

• Developed and released 374 semantic concept detectors

[Figure: Image Database (+/-), Concept Detectors]


Earth Mover’s Distance (EMD): Approach

• Supplier P has a given amount of goods

• Receiver Q has a given limited capacity

• dij: distance between element i of P and element j of Q

• Weights: solved by linear programming

• Temporal shift: a frame at the beginning of P can be mapped to a frame at the end of Q

• Scale variations: a frame from P can be mapped to multiple frames in Q

[Figure: example with weights 1, 1, 1/2, 1/2, 1/2, 1/2]
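The transportation problem described above can be sketched with `scipy.optimize.linprog`. This is a minimal sketch, not the authors' implementation: the function name `emd`, the Euclidean distance between frame features, and the uniform frame weights 1/m and 1/n are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def emd(P, Q):
    """EMD between two clips given as (m, d) and (n, d) arrays of frame features.

    Assumptions (not from the slides): Euclidean distance d_ij between frames,
    and uniform weights 1/m for frames of P and 1/n for frames of Q.
    Returns (distance, flow matrix of shape (m, n)).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    m, n = len(P), len(Q)
    # d_ij: distance between frame i of P and frame j of Q.
    D = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=2)
    w_p = np.full(m, 1.0 / m)
    w_q = np.full(n, 1.0 / n)
    # Both weight vectors sum to 1, so all goods are shipped:
    # each row of the flow sums to w_p[i], each column to w_q[j].
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):
        A_eq[i, i * n:(i + 1) * n] = 1.0
    for j in range(n):
        A_eq[m + j, j::n] = 1.0
    b_eq = np.concatenate([w_p, w_q])
    res = linprog(D.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    flow = res.x.reshape(m, n)
    return float((flow * D).sum() / flow.sum()), flow
```

For example, `emd([[0.0], [1.0]], [[1.0], [0.0]])` gives a distance of zero because the first frame of P is matched to the last frame of Q (temporal shift), and `emd([[0.0]], [[0.0], [2.0]])` splits the single frame of P across both frames of Q (scale variation).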


Multi-level Pyramid Matching: Motivations

• One Clip = several subclips (stages of event evolution)

• No prior knowledge about the number of stages in an event

• Videos of the same event may include only a subset of stages

Solution: Multi-level pyramid matching in the temporal domain


Multi-level Pyramid Matching: Algorithm

• Fusion of information from different levels.

• Alignment of different subclips (Level-1 as an example)

• Temporally Constrained Hierarchical Agglomerative Clustering

[Figure: two clips with Smoke and Fire stages, split into Level-0, Level-1, and Level-2 subclips; EMD Distance Matrix between Sub-clips; Integer-value Alignment]
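The three steps on this slide can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the clustering merges the two temporally adjacent segments with the closest mean features (the linkage criterion is not given on the slide), the alignment is found by brute force over one-to-one matchings, and the fusion is an equally weighted sum unless weights are supplied.

```python
from itertools import permutations

def temporal_hac(frames, k):
    """Merge temporally adjacent segments until k subclips remain.

    frames: list of feature vectors in temporal order.
    Returns a list of (start, end) frame index pairs, end exclusive.
    """
    segs = [(i, i + 1) for i in range(len(frames))]
    dim = len(frames[0])

    def mean(seg):
        s, e = seg
        return [sum(frames[t][d] for t in range(s, e)) / (e - s) for d in range(dim)]

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(mean(a), mean(b))) ** 0.5

    while len(segs) > k:
        # Temporal constraint: only neighbouring segments may be merged.
        i = min(range(len(segs) - 1), key=lambda i: dist(segs[i], segs[i + 1]))
        segs[i:i + 2] = [(segs[i][0], segs[i + 1][1])]
    return segs

def align_subclips(D):
    """Integer-valued alignment: each subclip is matched to exactly one subclip.

    D[i][j] is the distance (e.g. EMD) between subclip i of one clip and
    subclip j of the other. Brute force is fine for 2 or 4 subclips.
    Returns (mapping, mean distance over matched pairs).
    """
    n = len(D)
    best = min(permutations(range(n)),
               key=lambda p: sum(D[i][p[i]] for i in range(n)))
    return list(best), sum(D[i][best[i]] for i in range(n)) / n

def fuse_levels(level_distances, weights=None):
    """Weighted sum of per-level distances (equal weights assumed by default)."""
    if weights is None:
        weights = [1.0 / len(level_distances)] * len(level_distances)
    return sum(w * d for w, d in zip(weights, level_distances))
```

A typical use would split each clip with `temporal_hac(frames, 2)` for Level-1, fill `D` with distances between the resulting subclips, align them with `align_subclips(D)`, and combine the Level-0 and Level-1 distances with `fuse_levels`.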


Pyramid Matching: Projected Illustration

First stage of shot 1

Second stage of shot 1

First stage of shot 2

Second stage of shot 2

Negative shots


Experiments: Keyframe based feature performance

Dataset: TRECVID 2005

Evaluation Metric: Average Precision
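For reference, non-interpolated average precision over a ranked list can be computed as in the sketch below. The function name is hypothetical, and the sketch assumes the ranked list contains every relevant shot; an official evaluation may truncate the list or normalize differently.

```python
def average_precision(ranked_labels):
    """Average precision of a ranked list of 0/1 relevance labels.

    Precision is evaluated at the rank of each relevant item and averaged
    over the relevant items found in the list.
    """
    hits = 0
    total = 0.0
    for rank, relevant in enumerate(ranked_labels, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0
```

For the ranking `[1, 0, 1, 0]`, precision is 1/1 at the first relevant shot and 2/3 at the second, so the average precision is 5/6.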


Experiments: EMD concept performance


Experiments: Benefits of multi-level pyramid fusion


Video Event Recognition: Conclusions

• Single-level EMD outperforms key-frame based method. Multi-level Pyramid Matching further improves event detection accuracy.

• First systematic study of diverse visual event recognition in the unconstrained broadcast news domain.
