Semantic Human Activity Detection in Videos

by

Hirantha Pradeep Weerarathna

Dr. Anuja Dharmaratne

University of Colombo School of Computing

Definitions

• Basic Human Action
– One simple motion of a human body part

• Human Activity
– A combination of basic actions in sequence

• Semantic Human Activity
– A more meaningful human activity

• Human action detection is a well-recognized problem in computer vision

• It is a very hard problem due to:
– Variations in recording settings
– Interpersonal differences
– Variations in how actions are performed

• Many solutions have been proposed in the past for human action detection

• A significant observation is that almost all of these solutions address basic human actions; no attention has been paid to human activities

• Most solutions are based on action pattern analysis

• Laptev et al. proposed a method for the action ‘drinking’ based on a space-time classifier and keyframe priming

• Ming Yang et al. proposed an efficient detection method based on motion history images

• Qingshan Luo et al. proposed a novel action representation called the local motion histogram and a feature selection technique based on Gentle AdaBoost

• Ke et al. proposed a solution to detect smooth human actions

Previous Human Action Detection Solutions

• These solutions are based on action pattern analysis

• They fail to detect human activities because:

1. Some activities do not have any internal pattern or structure

2. Some activities are too complex to identify using basic action detection techniques

3. For some human activities an action template could be created, but the pattern is not preserved when the activity is actually performed

Problem Statement

How can such semantic human activities be identified?

Our Solution

• Identifying human activities based on Context Specific Information.

• We propose a solution prototype for the activity ‘smoking’

Context Specific Information

• An information set directly associated with a particular human activity

• The best description of the activity

• Strong enough to discriminate the activity from thousands of similar activity classes

CSI Examples

• Fighting
– Rapid hand and leg movements
– Collision of two or more human silhouettes

• Delivering a speech
– Changing facial expressions
– Continuous hand movements

• Riding a bicycle
– Continuous leg movements
– Bent hands and body
– Rapid movement through space

Smoking

• Well-known human activity

• Causes fatal diseases

• CSI set associated with smoking:
– Property I: Repeating motion from hand to mouth and from mouth to hand
– Property II: Appearance of the Main Frame
– Property III: Appearance of smoke

Solution Architecture

[Block diagram components: Input Video, Frame Grabber, Human Detector, Motion Analyzer, Main Frame Detector, Smoke Detector, Frame Collector, Output Video]
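One way to read the block diagram is as an evidence-collecting pipeline: each frame is checked for a person, then for the three smoking properties, and frames with enough evidence go to the Frame Collector. The sketch below only illustrates that reading. The `FrameEvidence` class, the `collect_frames` function, and the rule that all three properties must hold are assumptions; the slides do not state how the module outputs are combined.

```python
from dataclasses import dataclass


@dataclass
class FrameEvidence:
    """Per-frame outputs of the four modules (hypothetical container)."""
    index: int
    human_found: bool        # Human Detector
    hand_mouth_motion: bool  # Motion Analyzer (Property I)
    main_frame: bool         # Main Frame Detector (Property II)
    smoke: bool              # Smoke Detector (Property III)


def collect_frames(evidence):
    """Return indices of frames that would be passed to the Frame Collector.

    Assumed rule: a frame is kept only if a human is present and all
    three smoking properties are observed.
    """
    kept = []
    for e in evidence:
        if not e.human_found:
            continue  # no person, so no smoking scene is possible
        if e.hand_mouth_motion and e.main_frame and e.smoke:
            kept.append(e.index)
    return kept
```

A softer rule, such as keeping frames where two of the three properties hold, would fit the same structure by changing only the final condition.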

Human Detector

• Detects and localizes humans using face detection

• Deploys two classifiers, one for the frontal view of the face and one for the profile view

• Haar cascades are used for detection

• If the detector finds no face, the frame cannot contain a smoking scene

[Figures: face frontal view; face profile view]
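Because two classifiers run on the same frame, the same face may be reported twice. The sketch below shows one possible way to combine the two outputs by dropping profile boxes that overlap a box already kept. The slides do not describe how the detections are merged, so the intersection-over-union test, the 0.5 threshold, and the function names are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def merge_detections(frontal, profile, threshold=0.5):
    """Combine frontal and profile face boxes, dropping profile boxes
    that overlap an already kept box by more than `threshold` IoU."""
    merged = list(frontal)
    for box in profile:
        if all(iou(box, kept) <= threshold for kept in merged):
            merged.append(box)
    return merged
```

The (x, y, w, h) box form matches what OpenCV's `cv2.CascadeClassifier.detectMultiScale` returns, so boxes from two Haar cascades could be passed in directly.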

Motion Analyzer

• Associated with smoking Property I

• Creates a motion history image to accumulate motion information

• Alerts the Main Frame Detector (MFD) when there is motion from hand to mouth or from mouth to hand
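A motion history image records, for each pixel, how recently motion occurred there. The sketch below uses a common update rule: pixels that changed between consecutive frames are set to a maximum value, and all others decay by one. The slides do not give the exact rule, so the frame-differencing test, the `tau` and `diff_threshold` parameters, and the nested-list image format are assumptions.

```python
def update_mhi(mhi, prev_frame, frame, tau=10, diff_threshold=30):
    """Return an updated motion history image.

    Frames are grayscale images as nested lists of ints (0-255).
    Pixels whose intensity changed by more than `diff_threshold` are set
    to `tau`; all other pixels decay by 1 toward 0.
    """
    new_mhi = []
    for mhi_row, prev_row, cur_row in zip(mhi, prev_frame, frame):
        new_row = []
        for h, p, c in zip(mhi_row, prev_row, cur_row):
            if abs(c - p) > diff_threshold:
                new_row.append(tau)            # fresh motion
            else:
                new_row.append(max(0, h - 1))  # older motion fades
        new_mhi.append(new_row)
    return new_mhi
```

After several updates, larger values mark more recent motion, so the direction in which values increase indicates the direction of movement, for example from the hand region toward the mouth.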

Main Frame Detector

• Associated with smoking Property II

• Detects main frames

• Uses object detection techniques to detect actions

• Deploys a HOG-feature-based SVM for detection
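HOG features summarize the local gradient directions of an image window. The sketch below computes the orientation histogram for a single cell. A full HOG descriptor would also normalize over blocks of cells and concatenate the histograms into the vector that the SVM scores. The choice of 9 unsigned bins, central differences, hard binning without interpolation, and the nested-list input are assumptions made for brevity, not details from the slides.

```python
import math


def cell_orientation_histogram(cell, bins=9):
    """Gradient-orientation histogram for one grayscale cell.

    `cell` is a nested list of intensities. Gradients use central
    differences on interior pixels; each pixel votes its gradient
    magnitude into one of `bins` unsigned orientation bins (0-180 deg).
    """
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, len(cell) - 1):
        for x in range(1, len(cell[0]) - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist
```

For a cell whose intensity increases only from left to right, all votes fall into the first bin, which is the kind of regularity a linear SVM can pick up on.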

Smoke Detector

• Associated with smoking Property III

• Detects the appearance of smoke in the video sequence

• Uses a modified version of the work of Phillips III et al.

• Accumulates n frames to capture smoke properties

• Uses two properties of smoke: its distinctive color distribution and rapid motion
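The two cues named above, color distribution and motion over n accumulated frames, can be illustrated with a toy score. This is not the modified Phillips III et al. method, whose details are not given in the slides. The grayish-color test, both thresholds, and the function names are assumptions chosen only to show how the two cues might be combined.

```python
def is_grayish(pixel, max_spread=20, min_value=80):
    """Assumed smoke-color test: R, G, B nearly equal and not too dark."""
    r, g, b = pixel
    return max(r, g, b) - min(r, g, b) <= max_spread and min(r, g, b) >= min_value


def smoke_fraction(frames, change_threshold=15):
    """Fraction of pixel positions that look like smoke over `frames`.

    `frames` is a list of n RGB frames (nested lists of (r, g, b) tuples).
    A position counts as smoke if it is grayish in the last frame and its
    brightness changed by more than `change_threshold` between at least
    one pair of consecutive frames.
    """
    height, width = len(frames[0]), len(frames[0][0])
    hits = 0
    for y in range(height):
        for x in range(width):
            values = [sum(f[y][x]) / 3 for f in frames]
            moved = any(abs(b - a) > change_threshold
                        for a, b in zip(values, values[1:]))
            if moved and is_grayish(frames[-1][y][x]):
                hits += 1
    return hits / (height * width)
```

On grayscale input every pixel passes the spread part of the color test, so this sketch would rely mostly on motion there.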

Dataset

• No public dataset with smoking videos or main frames is available

• We used the movies ‘Coffee and Cigarettes’ and ‘Sea and Love’

• Samples were also downloaded from the WWW

• The training data does not overlap with the testing data

Results Evaluation

• Results of face frontal view detector

Dataset   Recall   Precision
Movie     88%      98%
Global    78%      92%

• Results of face profile view detector

Dataset   Recall   Precision
Movie     53%      88%
Global    57%      86%

• Results of combined face detector

Dataset   Recall   Precision
Movie     92%      90%
Global    84%      88%
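Assuming the tables use the standard definitions, recall is the share of actual faces that were detected and precision is the share of detections that were actual faces. The helper below is a minimal sketch of that computation; the function name and the handling of empty counts are assumptions.

```python
def recall_precision(true_pos, false_pos, false_neg):
    """Return (recall, precision) from detection counts."""
    detected = true_pos + false_pos
    actual = true_pos + false_neg
    recall = true_pos / actual if actual else 0.0
    precision = true_pos / detected if detected else 0.0
    return recall, precision
```

A detector can trade one measure for the other: accepting weaker detections raises recall but usually lowers precision.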

Results Evaluation

• Results of main frame detector

[Figure: detection rate (y-axis) against false positive rate (x-axis), both from 0 to 1]
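A curve of detection rate against false positive rate is typically produced by sweeping the classifier's decision threshold and recording both rates at each setting. The sketch below shows that procedure. The function name is an assumption, the scores stand in for classifier outputs such as SVM decision values, and the input must contain at least one positive and one negative label.

```python
def roc_points(scores, labels):
    """Return (false_positive_rate, detection_rate) pairs, one per threshold.

    `scores` are classifier outputs (higher means more likely positive)
    and `labels` are the true classes (1 = main frame, 0 = not).
    Each distinct score is used once as the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points
```

Plotting the returned pairs in order traces the curve from the strictest threshold to the most permissive one.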

Results Evaluation

• Results of smoke detector

Dataset     Detection Rate   FP Rate
Colored     92%              5%
Grayscale   60%              40%

Strengths of CSI Based Solutions

• Can be designed as an evidence-collecting approach

• Robust to variations in how actions are performed

• Robust to dynamic and cluttered backgrounds

Future Work

• This work introduces the significance of using CSI for activity detection. We expect an open discussion and more accurate solutions based on our concept.

• Train the classifiers using more samples

• Analyze the importance of sound information associated with a particular activity as a source of context specific information

Conclusion

• Action pattern recognition is sufficient for identifying basic human actions

• But it is not sufficient to detect human activities

• CSI can be used to detect such human activities

• A CSI set used to detect one activity class cannot be used to detect another activity class

• The CSI set for a particular activity should be selected carefully
