DVMM Lab, Columbia University Video Event Recognition
Video Event Recognition: Multilevel Pyramid Matching
Dong Xu and Shih-Fu Chang
Digital Video and Multimedia Lab
Department of Electrical Engineering
Columbia University
http://www.ntu.edu.sg/home/[email protected]
*Thanks to Eric Zavesky for preparing the slides
Video Event Recognition: Problem
• Online video search and video indexing
• Events characterized by an evolution of scenes, objects and actions over time
• 56 events are defined in LSCOM (Large-Scale Concept Ontology for Multimedia)
Examples: Airplane Flying, Car Exiting
Video Event Recognition: Challenges
• Geometric and photometric variations
• Cluttered background
• Complex camera motion and object motion
Event Recognition: Object Tracking
• Detect the object of interest, track it over time, and model its spatio-temporal dynamics
• Hard to detect events without explicit object motion, such as Riot
Figure: Object Detection & Localization → Tracking → Inference → “Airplane Landing”?
Event Recognition: Key-Frame based Matching
• Only the key frame is used for matching.
• Extract low-level features, compare with other frames, and make an overall matching decision
Figure: key-frame features compared by similarity (example scores: 15%, 18%, 50%)
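As a rough illustration of key-frame-based matching, the sketch below reduces each clip to a single key-frame feature vector and scores it against other key frames with an RBF similarity. The similarity function, the `gamma` bandwidth, and the names `rbf_similarity` and `keyframe_match` are assumptions for illustration; the slides do not specify them.

```python
import math


def rbf_similarity(x, y, gamma=1.0):
    """Similarity in (0, 1] from squared Euclidean distance; gamma is illustrative."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def keyframe_match(query_kf, reference_kfs, gamma=1.0):
    """Return (index, similarity) of the reference key frame closest to the query."""
    scores = [rbf_similarity(query_kf, ref, gamma) for ref in reference_kfs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical 3-dimensional key-frame features.
query = [0.2, 0.5, 0.1]
references = [[0.9, 0.1, 0.4], [0.25, 0.45, 0.1], [0.0, 0.0, 1.0]]
print(keyframe_match(query, references))
```

Every frame of a clip other than its key frame is ignored here; later slides address this by matching whole sets of frames with EMD.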
Event Recognition: Multi-level Pyramid Matching
Figure: two video clips → feature extraction → concept detectors → EMD distance → multi-level pyramid matching
Content Representation: Low-level Features
• Edge direction histogram
• Grid color moment
• Gabor texture
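To illustrate one of these features, here is a minimal sketch of a grid color moment descriptor: the image is split into a grid and, for each cell and channel, the mean, standard deviation and cube root of the third central moment are stacked. The 5×5 grid, the input color space, and the function name `grid_color_moments` are assumptions; the slides name the feature but not its parameters.

```python
import numpy as np


def grid_color_moments(image, grid=(5, 5)):
    """Mean, std and cube-rooted third central moment per channel per grid cell.

    image: H x W x C float array. The grid size and color space are assumptions.
    """
    h, w, c = image.shape
    rows, cols = grid
    feats = []
    for i in range(rows):
        for j in range(cols):
            cell = image[i * h // rows:(i + 1) * h // rows,
                         j * w // cols:(j + 1) * w // cols].reshape(-1, c)
            mean = cell.mean(axis=0)
            std = cell.std(axis=0)
            # Cube root of the third central moment keeps its sign and the channel's scale.
            third = np.cbrt(((cell - mean) ** 3).mean(axis=0))
            feats.extend([mean, std, third])
    return np.concatenate(feats)


rng = np.random.default_rng(0)
frame = rng.random((120, 160, 3))  # stand-in for a color-converted key frame
print(grid_color_moments(frame).shape)  # (225,)
```

With a 5×5 grid and three channels, the descriptor has 25 × 3 × 3 = 225 dimensions.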
Content Representation: Mid-level Semantic Concept Scores
• Train concept detectors on low-level features
• Mid-level semantic concept features are more robust
• Developed and released 374 semantic concept detectors
Figure: image database (positive/negative examples) → concept detectors
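To make the mid-level representation concrete, here is a minimal sketch in which each concept detector is a hypothetical linear model whose margin is squashed to (0, 1); a frame is then represented by its vector of concept scores. The detector form, the toy concepts, and the function name `concept_scores` are assumptions; the slides state only that detectors are trained on low-level features.

```python
import math


def concept_scores(feature, detectors):
    """Map one low-level feature vector to concept scores in (0, 1).

    detectors: list of (concept_name, weights, bias) tuples for hypothetical
    linear detectors; the output keeps the same concept order.
    """
    scores = []
    for _name, weights, bias in detectors:
        margin = sum(w * x for w, x in zip(weights, feature)) + bias
        # Numerically stable logistic function of the detector margin.
        if margin >= 0:
            scores.append(1.0 / (1.0 + math.exp(-margin)))
        else:
            e = math.exp(margin)
            scores.append(e / (1.0 + e))
    return scores


# Two toy concepts stand in for the 374 released detectors.
toy_detectors = [("Smoke", [1.0, 0.0], 0.0), ("Fire", [0.0, 1.0], -1.0)]
print(concept_scores([2.0, 1.0], toy_detectors))
```

In the full system each frame would map to one score per released detector, and these score vectors serve as the frame features that EMD compares.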
Earth Mover’s Distance (EMD): Approach
• $d_{ij}$: ground distance between frame i of clip P and frame j of clip Q
• Supplier P has a given amount of goods
• Receiver Q has a given limited capacity
• Flow weights $f_{ij}$: solved by linear programming
• Temporal shift: a frame at the beginning of P can be mapped to a frame at the end of Q
• Scale variations: a frame from P can be mapped to multiple frames in Q
Figure: example flows between the frames of P and Q (flow weights 1 and 1/2)
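The transportation problem above can be sketched with a generic LP solver. The flows f_ij minimize Σ_i Σ_j f_ij d_ij subject to f_ij ≥ 0, each frame of P shipping at most its weight, each frame of Q receiving at most its capacity, and total flow equal to the smaller total weight; the distance is then EMD(P, Q) = Σ_i Σ_j f_ij d_ij / Σ_i Σ_j f_ij. This sketch assumes uniform frame weights (1/m and 1/n) and a Euclidean ground distance, neither of which the slide states, and uses SciPy's `linprog` rather than any solver the authors may have used.

```python
import numpy as np
from scipy.optimize import linprog


def emd(P, Q):
    """Earth Mover's Distance between clip P (m x d) and clip Q (n x d).

    Assumes uniform frame weights 1/m and 1/n and Euclidean ground distance d_ij.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    m, n = len(P), len(Q)
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=2)  # d[i, j] = ||p_i - q_j||
    wp = np.full(m, 1.0 / m)  # supply of each frame in P
    wq = np.full(n, 1.0 / n)  # capacity of each frame in Q

    # Flow f_ij is LP variable i * n + j.
    A_ub = np.zeros((m + n, m * n))
    for i in range(m):
        A_ub[i, i * n:(i + 1) * n] = 1.0  # sum_j f_ij <= wp_i
    for j in range(n):
        A_ub[m + j, j::n] = 1.0  # sum_i f_ij <= wq_j
    b_ub = np.concatenate([wp, wq])
    A_eq = np.ones((1, m * n))  # total flow = min(total supply, total capacity)
    b_eq = [min(wp.sum(), wq.sum())]

    res = linprog(d.ravel(), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    flow = res.x
    return float(flow @ d.ravel() / flow.sum())


# Temporal shift: the same two frames in reverse order cost nothing.
print(emd([[0.0], [1.0]], [[1.0], [0.0]]))
# Scale variation: each frame of P spreads over two frames of Q.
print(emd([[0.0], [1.0]], [[0.0], [0.0], [1.0], [1.0]]))
```

Both usage calls return (numerically) zero, mirroring the temporal-shift and scale-variation properties listed on the slide.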
Multi-level Pyramid Matching: Motivations
• One clip = several sub-clips (stages of event evolution)
• No prior knowledge about the number of stages in an event
• Videos of the same event may include only a subset of stages
Solution: Multi-level pyramid matching in the temporal domain
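The sub-clips (stages) described above can be produced by temporally constrained hierarchical agglomerative clustering, the technique listed on the algorithm slide. This minimal sketch starts with one cluster per frame and repeatedly merges the pair of temporally adjacent clusters whose mean features are closest. The mean-feature linkage, the caller-chosen number of sub-clips, and the name `temporal_hac` are assumptions; the slides do not give these details.

```python
import numpy as np


def temporal_hac(frames, n_subclips):
    """Split a clip into n_subclips contiguous sub-clips.

    Agglomerative clustering in which only temporally adjacent clusters may merge.
    Linkage (distance between cluster means) is an assumption of this sketch.
    Returns a list of [start, end) frame-index ranges.
    """
    if n_subclips < 1:
        raise ValueError("n_subclips must be at least 1")
    frames = np.asarray(frames, dtype=float)
    segments = [(t, t + 1) for t in range(len(frames))]  # one cluster per frame
    while len(segments) > n_subclips:
        means = [frames[s:e].mean(axis=0) for s, e in segments]
        costs = [np.linalg.norm(means[k] - means[k + 1]) for k in range(len(segments) - 1)]
        k = int(np.argmin(costs))  # cheapest pair of neighboring clusters
        segments[k:k + 2] = [(segments[k][0], segments[k + 1][1])]
    return segments


# Hypothetical 1-D frame features with two clear stages (e.g., "Smoke" then "Fire").
clip = [[0.0], [0.1], [0.05], [5.0], [5.1], [5.2]]
print(temporal_hac(clip, 2))  # [(0, 3), (3, 6)]
```

Because only neighbors can merge, every sub-clip stays a contiguous run of frames, so the resulting stages respect temporal order.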
Multi-level Pyramid Matching: Algorithm
• Temporally Constrained Hierarchical Agglomerative Clustering
• Alignment of different sub-clips (Level-1 as an example): EMD distance matrix between sub-clips → integer-value alignment
• Fusion of information from different levels
Figure: Level-0, Level-1 and Level-2 decompositions of two clips, with example sub-clips labeled Smoke and Fire
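The alignment and fusion steps can be sketched as follows, assuming both clips have the same number of equally weighted sub-clips at a level. In that case an optimal EMD flow over the sub-clip distance matrix can be taken to be a one-to-one matching, so the sketch uses SciPy's assignment solver to obtain the integer-valued alignment. The weighted-average fusion, its weights, and the names `level_distance` and `fused_distance` are illustrative assumptions; the slides say only "fusion of information from different levels".

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def level_distance(subclip_dist):
    """Distance between two clips at one pyramid level.

    subclip_dist[r, c]: EMD between sub-clip r of clip 1 and sub-clip c of clip 2.
    Each sub-clip is matched to exactly one sub-clip of the other clip (an
    integer-valued alignment), choosing the matching with the lowest total cost.
    """
    D = np.asarray(subclip_dist, dtype=float)
    rows, cols = linear_sum_assignment(D)
    return float(D[rows, cols].mean())


def fused_distance(level_dists, level_weights):
    """Weighted average of per-level distances; the weights are illustrative."""
    w = np.asarray(level_weights, dtype=float)
    return float(np.dot(w, level_dists) / w.sum())


# Level-1 example: the two stages appear in swapped order in the second clip.
level1 = level_distance([[5.0, 1.0], [0.5, 4.0]])
print(level1)  # 0.75
print(fused_distance([0.9, level1], [1.0, 1.0]))
```

Taking the mean of the matched distances matches the EMD normalization when every sub-clip carries weight 1/k, since the total flow is then 1.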
Pyramid Matching: Projected Illustration
Figure legend: first and second stages of shot 1; first and second stages of shot 2; negative shots
Experiments: Keyframe based feature performance
Dataset: TRECVID 2005
Evaluation Metric: Average Precision
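For reference, non-interpolated average precision averages the precision measured at each rank where a relevant shot appears. This is a minimal sketch of that standard definition; TRECVID's exact protocol (for example, the result-list depth and how the number of relevant shots is counted) is not given in the slides, and the function name `average_precision` is hypothetical.

```python
def average_precision(ranked_relevance, total_relevant=None):
    """Non-interpolated average precision of one ranked list.

    ranked_relevance: 0/1 relevance flags, best-ranked first.
    total_relevant: relevant items in the whole collection; defaults to those retrieved.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant item's rank
    denom = hits if total_relevant is None else total_relevant
    return precision_sum / denom if denom else 0.0


print(average_precision([1, 0, 1, 0]))  # (1/1 + 2/3) / 2 = 0.8333...
```

Passing `total_relevant` penalizes relevant shots that never appear in the ranked list.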
Experiments: EMD concept performance
Experiments: Benefits of multi-level pyramid fusion
Video Event Recognition: Conclusions
• Single-level EMD outperforms the key-frame-based method.
• Multi-level pyramid matching further improves event detection accuracy.
• First systematic study of diverse visual event recognition in the unconstrained broadcast news domain.