Semantic Human Activity Detection in Videos
by
Hirantha Pradeep Weerarathna
Dr. Anuja Dharmaratne
University of Colombo School of Computing
Definitions
• Basic Human Action
– One simple motion of a human body part
• Human Activity
– Combination of basic actions in a row
• Semantic Human Activity
– More meaningful human activities
• Human action detection is a well-recognized problem in computer vision
• It is a very hard problem due to:
– Variations in recording settings
– Interpersonal differences
– Variations in how actions are performed
• Many solutions have been proposed in the past for human action detection
• A significant observation is that almost all of these solutions address basic human actions; almost no attention has been paid to human activities
• Most solutions are based on action pattern analysis
• Laptev et al. proposed a method based on a space-time classifier and keyframe priming for the action ‘drinking’
• Ming Yang et al. proposed an efficient detection method based on motion history images
• Qingshan Luo et al. proposed a novel action representation called the local motion histogram and a Gentle AdaBoost based feature selection technique
• Ke et al. proposed a solution to detect smooth human actions
Previous Human Action Detection Solutions
• These solutions are based on action pattern analysis
• They fail to detect human activities because:
1. Some activities do not have any inherent pattern or structure
2. Some activities are too complex to identify using basic action detection techniques
3. For some human activities an action template could be created, but the pattern is not preserved when the actions are actually performed
Problem Statement
How can such semantic human activities be identified?
Our Solution
• Identifying human activities based on Context Specific Information (CSI)
• We propose a solution prototype for the activity ‘smoking’
Context Specific Information
• The set of information directly associated with a particular human activity
• The best description of the activity
• Has the strength to discriminate the activity from thousands of similar activity classes
CSI Examples
• Fighting
– rapid hand and leg movements
– collision of two or more human silhouettes
• Delivering a speech
– changing facial expressions
– continuous hand movements
• Riding a bicycle
– continuous leg movements
– bent hands and body
– rapid movement through space
Smoking
• Well-known human activity
• Causes fatal diseases
• CSI set associated with smoking
– Property I: Repeating motion from hand/mouth to mouth/hand
– Property II: Appearance of Main Frame
– Property III: Appearance of smoke
Solution Architecture
Input Video
Frame Grabber
Human Detector
Motion Analyzer
Main Frame Detector
Smoke Detector
Frame Collector
Output Video
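The component chain above can be read as a cascade in which each detector acts as a gate on the frame. The following is a minimal sketch of that reading, not the authors' implementation. It assumes that the stages run strictly in the listed order and that a frame reaches the Frame Collector only when every stage reports evidence. The function `collect_frames`, the helper `flag`, and the dictionary "frames" are hypothetical stand-ins for real video frames and detectors.

```python
from typing import Callable, Dict, Iterable, List, Sequence

Frame = Dict[str, object]          # toy frame: a dict of precomputed flags
Stage = Callable[[Frame], bool]    # a detector that reports evidence or not


def collect_frames(frames: Iterable[Frame], stages: Sequence[Stage]) -> List[Frame]:
    """Keep the frames for which every stage reports evidence.

    Stages are evaluated in order, and all() stops at the first stage
    that returns False, so later stages run only when earlier ones fire.
    """
    return [frame for frame in frames if all(stage(frame) for stage in stages)]


def flag(name: str) -> Stage:
    """Build a toy detector that simply reads a boolean flag from the frame."""
    return lambda frame: bool(frame[name])


# Assumed order: human detector, motion analyzer, main frame detector, smoke detector.
stages = [flag("face"), flag("hand_to_mouth"), flag("main_frame"), flag("smoke")]

frames = [
    {"id": 0, "face": True, "hand_to_mouth": True, "main_frame": True, "smoke": True},
    {"id": 1, "face": False, "hand_to_mouth": True, "main_frame": True, "smoke": True},
    {"id": 2, "face": True, "hand_to_mouth": True, "main_frame": False, "smoke": False},
]

smoking_ids = [frame["id"] for frame in collect_frames(frames, stages)]
```

Ordering the stages this way means a frame without a detected face is rejected before the later detectors are consulted.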
Human Detector
• Detects and localizes humans using a face detection technique
• Deploys two classifiers to detect the frontal view and the profile view of the face
• Haar cascades are used for detection
• If the detector finds no face, the frame cannot contain a smoking scene
face frontal view
face profile view
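One way to combine the two face classifiers is to take the union of their detections and drop near-duplicate boxes. The sketch below illustrates that idea only. The slides do not say how the frontal and profile results are merged, so the intersection-over-union (IoU) rule, the 0.5 threshold, and the names `iou` and `combine_detections` are all assumptions.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def combine_detections(frontal: List[Box], profile: List[Box],
                       overlap: float = 0.5) -> List[Box]:
    """Union of both detectors' boxes, skipping profile boxes that
    overlap an already kept box by at least `overlap` IoU."""
    combined = list(frontal)
    for box in profile:
        if all(iou(box, kept) < overlap for kept in combined):
            combined.append(box)
    return combined


# Hypothetical detections: the first profile box duplicates the frontal one.
frontal_boxes = [(10, 10, 40, 40)]
profile_boxes = [(12, 12, 40, 40), (100, 20, 30, 30)]
faces = combine_detections(frontal_boxes, profile_boxes)
```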
Motion Analyzer
• Associated with smoking property I
• Creates a motion history image to accumulate motion information
• Alerts the Main Frame Detector (MFD) when there is a motion from hand/mouth to mouth/hand
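A motion history image (MHI) stores, for each pixel, how recently motion was seen there. The sketch below shows the standard update rule on grayscale frames held as nested lists: a pixel that changed is set to `tau`, and every other pixel decays by one toward zero. It assumes simple frame differencing as the motion test. The values of `tau` and `threshold` and the function name `update_mhi` are illustrative choices, not taken from the slides.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel values


def update_mhi(mhi: Image, prev: Image, curr: Image,
               tau: int = 5, threshold: int = 30) -> Image:
    """Return the motion history image after observing frame `curr`.

    A pixel whose intensity changed by at least `threshold` since `prev`
    is set to `tau`; every other pixel decays by 1, never below 0.
    """
    updated = []
    for mhi_row, prev_row, curr_row in zip(mhi, prev, curr):
        row = []
        for history, before, now in zip(mhi_row, prev_row, curr_row):
            moved = abs(now - before) >= threshold
            row.append(tau if moved else max(0, history - 1))
        updated.append(row)
    return updated


# Toy 1x3 sequence: a bright blob moves left to right, one pixel per frame.
sequence = [[[0, 0, 0]], [[255, 0, 0]], [[0, 255, 0]], [[0, 0, 255]]]
mhi = [[0, 0, 0]]
for prev_frame, curr_frame in zip(sequence, sequence[1:]):
    mhi = update_mhi(mhi, prev_frame, curr_frame)
```

Recent motion carries the highest values, so the direction of a movement such as hand to mouth can be read from how the values grade across the image.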
Main Frame Detector
• Associated with smoking property II
• Detects main frames
• Uses object detection techniques to detect actions
• Deploys a HOG feature based SVM for detection
Smoke Detector
• Associated with smoking property III
• Detects the appearance of smoke in the video sequence
• Uses a modified version of the work of Phillips III et al.
• Accumulates n frames to capture smoke properties
• Uses properties of smoke: special color distribution and rapid motion
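The two smoke cues named above, color and rapid motion, can be illustrated with a toy per-pixel test over the n accumulated frames. This is not the method of Phillips III et al. or the authors' modification of it. It is only a sketch that assumes smoke looks grayish (small spread between the R, G and B channels) and that its intensity fluctuates over the window. Both thresholds and the names `is_grayish` and `smoke_candidate` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B)


def is_grayish(pixel: Pixel, max_spread: int = 30) -> bool:
    """True when the three color channels are close to each other."""
    return max(pixel) - min(pixel) <= max_spread


def smoke_candidate(history: List[Pixel], max_spread: int = 30,
                    min_change: float = 40.0) -> bool:
    """Test one pixel location across the n accumulated frames.

    The location is a candidate when it is grayish in every frame
    (color cue) and its mean intensity varies by at least `min_change`
    over the window (motion cue).
    """
    if not history:
        return False
    intensities = [sum(pixel) / 3 for pixel in history]
    grayish = all(is_grayish(pixel, max_spread) for pixel in history)
    fluctuating = max(intensities) - min(intensities) >= min_change
    return grayish and fluctuating


drifting_smoke = [(120, 120, 125), (180, 182, 185), (140, 138, 142)]
static_wall = [(200, 200, 200)] * 3
moving_red_shirt = [(200, 30, 30), (90, 20, 20), (220, 40, 40)]
```

Under these assumptions the color cue disappears on grayscale input, where every pixel passes `is_grayish`, so only the motion cue remains to separate smoke from other moving objects.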
Dataset
• No public dataset available with smoking videos or main frames
• We used the movies ‘Coffee and Cigarettes’ and ‘Sea and Love’
• Downloaded samples from the WWW
• Training data does not overlap with testing data
Results Evaluation
• Results of face frontal view detector
Dataset   Recall   Precision
Movie     88%      98%
Global    78%      92%
• Results of face profile view detector
Dataset   Recall   Precision
Movie     53%      88%
Global    57%      86%
• Results of combined face detector
Dataset   Recall   Precision
Movie     92%      90%
Global    84%      88%
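For reference, recall and precision follow the usual definitions: recall is the fraction of true faces that were detected, and precision is the fraction of detections that were true faces. The sketch below computes both from raw counts. The counts are made up for illustration and are not the counts behind the figures reported above.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Compute (precision, recall) from detection counts.

    tp: correct detections, fp: false detections, fn: missed faces.
    Either value is reported as 0.0 when its denominator is zero.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical counts: 45 faces found, 5 false alarms, 15 faces missed.
precision, recall = precision_recall(tp=45, fp=5, fn=15)
```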
Results Evaluation
• Results of main frame detector
[Figure: Detection Rate (y-axis, 0 to 1) versus False Positive Rate (x-axis, 0 to 1) for the main frame detector]
Results Evaluation
• Results of smoke detector
Dataset     Detection Rate   FP Rate
Colored     92%              5%
Grayscale   60%              40%
Strengths of CSI Based Solutions
• Can be designed as an evidence-collecting approach
• Robust to variations in how actions are performed
• Robust to dynamic and cluttered backgrounds
Future Work
• This work introduces the significance of using CSI for activity detection. We expect open discussion and more accurate solutions based on this concept.
• Classifier training using more samples
• Analyze the importance of the sound information associated with a particular activity as a source of context specific information
Conclusion
• Action pattern recognition is sufficient for identifying basic human actions
• But it is not sufficient to detect human activities
• CSI can be used to detect such human activities
• The CSI set used to detect one activity class cannot be used to detect another activity class
• Selection of the CSI set for a particular activity should be done carefully