Human Activity Analysis
By: Ryan Wendel

What is Human Activity Analysis?
It is an ongoing area of analysis in which videos are analyzed frame by frame.
Most of the video recognition is pulled from 3-D graphics engines.

What HAA covers
“HAA” stands for Human Activity Analysis. It covers applications such as:
◦ Surveillance systems
◦ Patient monitoring systems
◦ Human-computer interfaces

What we will cover
We are going to take a look at methodologies that have been developed for simple human actions and for high-level activities.

Basic Human Activities
Gestures
Actions
Interactions
Group activities

Gestures
Basic movements of a person's body parts. For example:
◦ Raising an arm
◦ Lifting a leg

Actions
A single person's activities, which may be composed of multiple gestures. For example:
◦ Walking
◦ Waving
◦ Shaking the body

Interactions
Activities that involve two or more people or objects. For example:
◦ Two people fighting

Group Activities
Activities performed by multiple people. For example:
◦ A group running
◦ A group walking
◦ A group fighting

Activity Recognition Methodologies
Can be separated into two categories:
◦ Single-layered approaches: recognize human activities directly from a video feed (frame by frame).
◦ Hierarchical approaches: describe high-level activities in terms of simpler activities.

Single-layered approaches
The main objective is to analyze simple sequences of human movements.
They fall into two categories:
◦ Space-time approach: treats an input video as a 3-D volume
◦ Sequential approach: interprets an input video as a sequence of observations

Space-time approach
Divided into three subsections based on the features used:
◦ Space-time volume
◦ Space-time trajectories
◦ Space-time features

Space-Time Volume
Captures human activities by analyzing a video as a volume built up frame by frame.
Recognizes an activity by measuring the similarity between two space-time volumes.

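
A minimal sketch of comparing two space-time volumes. The slide does not say which similarity measure is used, so the Jaccard overlap below is an assumption, as are the names `volume_similarity`, `move_right`, and `move_left`. Each volume is taken to be a stack of binary foreground masks, one per frame, that some earlier step has already produced.

```python
from typing import List

Mask = List[List[int]]   # binary foreground mask for one frame: mask[y][x] in {0, 1}
Volume = List[Mask]      # masks stacked along time: volume[t][y][x]


def volume_similarity(a: Volume, b: Volume) -> float:
    """Jaccard overlap between two binary space-time volumes of equal shape."""
    if len(a) != len(b):
        raise ValueError("volumes must span the same number of frames")
    inter = union = 0
    for mask_a, mask_b in zip(a, b):
        for row_a, row_b in zip(mask_a, mask_b):
            for va, vb in zip(row_a, row_b):
                inter += va & vb   # voxel occupied in both volumes
                union += va | vb   # voxel occupied in at least one volume
    # Two empty volumes are treated as identical.
    return inter / union if union else 1.0


# Toy 1x4 frames (hypothetical data): a blob moving right vs. a blob moving left.
move_right = [[[1, 0, 0, 0]], [[0, 1, 0, 0]], [[0, 0, 1, 0]]]
move_left = [[[0, 0, 1, 0]], [[0, 1, 0, 0]], [[1, 0, 0, 0]]]

print(volume_similarity(move_right, move_right))  # 1.0
print(volume_similarity(move_right, move_left))   # 0.2
```

The two motions occupy the same pixel only in the middle frame, which is why their overlap is low even though each single frame looks similar.
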
Space-Time Trajectories
Uses stick-figure modeling to extract the joint positions of a person in each frame.

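
A rough sketch of how per-frame joint positions could be turned into trajectories. It assumes the stick-figure step has already produced 2-D joint coordinates for each frame; the function names, the joint names, and the mean-distance comparison are illustrative choices, not something the slides specify.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Trajectory = List[Tuple[float, float, int]]  # (x, y, t) samples for one joint


def joint_trajectories(frames: List[Dict[str, Point]]) -> Dict[str, Trajectory]:
    """Turn per-frame joint positions into one (x, y, t) trajectory per joint."""
    trajectories: Dict[str, Trajectory] = {}
    for t, joints in enumerate(frames):
        for name, (x, y) in joints.items():
            trajectories.setdefault(name, []).append((x, y, t))
    return trajectories


def trajectory_distance(a: Trajectory, b: Trajectory) -> float:
    """Mean Euclidean distance between two trajectories of equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("trajectories must be non-empty and equally long")
    total = sum(math.hypot(p[0] - q[0], p[1] - q[1]) for p, q in zip(a, b))
    return total / len(a)


# Hypothetical two-frame clip of an arm being raised.
clip = [
    {"hand": (0.0, 0.0), "elbow": (0.0, 1.0)},
    {"hand": (0.0, 2.0), "elbow": (0.0, 1.0)},
]
tracks = joint_trajectories(clip)
print(tracks["hand"])  # [(0.0, 0.0, 0), (0.0, 2.0, 1)]
```
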
Space-Time features
Does not extract features frame by frame.
Extracts features only where there is an appearance or shape change in the 3-D space-time volume.

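
The idea of extracting features only where something changes can be sketched as follows. This is a deliberately simplified stand-in: it flags a point whenever a pixel's intensity jumps between consecutive frames, whereas real space-time interest point detectors also consider spatial structure. The function name and the threshold value are assumptions.

```python
from typing import List, Tuple

Frame = List[List[int]]  # grayscale frame: frame[y][x] is a pixel intensity


def interest_points(frames: List[Frame], threshold: int = 50) -> List[Tuple[int, int, int]]:
    """Return (x, y, t) wherever a pixel changes sharply between frames t-1 and t."""
    points = []
    for t in range(1, len(frames)):
        for y, (prev_row, curr_row) in enumerate(zip(frames[t - 1], frames[t])):
            for x, (prev, curr) in enumerate(zip(prev_row, curr_row)):
                if abs(curr - prev) > threshold:
                    points.append((x, y, t))
    return points


# Hypothetical 2x2 clip: nothing happens until one pixel lights up in the last frame.
clip = [
    [[0, 0], [0, 0]],
    [[0, 0], [0, 0]],
    [[0, 200], [0, 0]],
]
print(interest_points(clip))  # [(1, 0, 2)]
```

Static frames contribute no points at all, which is what makes the resulting representation sparse.
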
Disadvantages of Space-time approach
Space-time volume
◦ Hard to differentiate between multiple people in the same scene.
Space-time trajectories
◦ 3-D body-part detection and tracking is still an unsolved problem, and it requires a strong low-level component that can estimate 3-D joint locations.
Space-time features
◦ Not suitable for modeling complex activities.

Sequential approach
Divided into two subsections based on how an activity is represented:
◦ Exemplar-based
◦ State model-based

Exemplar-based
Review
◦ Sequential approach: interprets an input video as a sequence of observations
Exemplar-based
◦ Represents human activities with a set of sample sequences of action executions

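
One common way to compare an observed sequence against stored exemplars is dynamic time warping (DTW), which tolerates differences in execution speed. The slide does not name a matching method, so DTW is an assumption here, and the one-number-per-frame features, function names, and activity labels are all hypothetical.

```python
from typing import Dict, List


def dtw_distance(seq_a: List[float], seq_b: List[float]) -> float:
    """Dynamic time warping cost between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = best cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(seq_a[i - 1] - seq_b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify(observation: List[float], exemplars: Dict[str, List[List[float]]]) -> str:
    """Return the label whose closest exemplar has the lowest DTW cost."""
    return min(
        exemplars,
        key=lambda label: min(dtw_distance(observation, ex) for ex in exemplars[label]),
    )


# Hypothetical exemplars: one feature value (e.g. hand height) per frame.
exemplars = {"wave": [[0, 1, 0, 1, 0]], "raise": [[0, 1, 2, 3]]}
slow_raise = [0, 0, 1, 2, 2, 3]   # the same motion, executed more slowly
print(classify(slow_raise, exemplars))  # raise
```

Adding another sample sequence for an activity only means appending to its list, which is one way to read the flexibility attributed to exemplar-based methods.
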
State Model-Based
Represents a human activity as a model composed of a set of states, against which observed sequences are evaluated.

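
State models of this kind are often realized as hidden Markov models (HMMs). The slide does not name a specific model, so the HMM below is an assumption, and every state name and probability in it is made up for illustration. The forward algorithm scores how likely an observation sequence is under the model.

```python
from typing import Dict, List

Probs = Dict[str, float]


def forward_likelihood(
    obs: List[str],
    start: Probs,
    trans: Dict[str, Probs],
    emit: Dict[str, Probs],
) -> float:
    """P(obs | model), computed with the forward algorithm."""
    states = list(start)
    # alpha[s] = probability of the observations so far, ending in state s
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: emit[s][symbol] * sum(alpha[r] * trans[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical "raise arm" model with two states and two observable symbols.
start = {"down": 1.0, "up": 0.0}
trans = {"down": {"down": 0.5, "up": 0.5}, "up": {"down": 0.0, "up": 1.0}}
emit = {"down": {"low": 0.9, "high": 0.1}, "up": {"low": 0.1, "high": 0.9}}

raising = forward_likelihood(["low", "high"], start, trans, emit)
lowering = forward_likelihood(["high", "low"], start, trans, emit)
print(raising > lowering)  # True
```

With one such model per activity, recognition amounts to picking the model that assigns the observed sequence the highest likelihood.
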
Exemplar vs State Model
Exemplar-based approaches are more flexible when comparing multiple sample sequences,
whereas state model-based approaches handle probabilistic analysis of an activity better.

Space-time vs Sequential approach
The sequential approach is able to handle and detect more complex activities,
whereas the space-time approach handles simpler activities.
Both approaches operate on a sequence of images.

Hierarchical Approaches
Allow the recognition of high-level activities based on the recognition results of other, simpler activities.
Advantages of the hierarchical approach:
◦ Can recognize high-level activities with more complex structures
◦ Requires significantly less data to recognize an activity than a single-layered approach
◦ Makes it easier to incorporate human knowledge

Three main subgroups of Hierarchical approach
Statistical approach
Syntactic approach
Description-based approach

Statistical approach
Statistical approaches use state-based models to recognize activities.
Stacking multiple layers of state-based models lets the separate models recognize activities with sequential structures.

Syntactic approach
Human activities are modeled as a string of symbols.
Each activity is represented as a set of production rules that generate a string of actions.

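
To make the production-rule idea concrete, here is a small sketch that checks whether a string of atomic actions can be generated by a grammar. The "handshake" grammar, its symbol names, and the choice of the CYK parsing algorithm (which needs rules in Chomsky normal form) are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical grammar in Chomsky normal form.
TERMINAL_RULES: List[Tuple[str, str]] = [
    ("Approach", "approach"),
    ("Extend", "extend_arm"),
    ("Shake", "shake"),
    ("Withdraw", "withdraw_arm"),
]
BINARY_RULES: List[Tuple[str, str, str]] = [
    ("Handshake", "Approach", "Greeting"),
    ("Greeting", "Extend", "Release"),
    ("Release", "Shake", "Withdraw"),
    ("Shake", "Shake", "Shake"),   # allows any number of repeated shakes
]


def cyk_accepts(tokens: List[str], start: str = "Handshake") -> bool:
    """Return True if the grammar can generate the given string of actions."""
    n = len(tokens)
    if n == 0:
        return False
    # derives[i][length] = nonterminals that derive tokens[i:i + length]
    derives = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, token in enumerate(tokens):
        derives[i][1] = {lhs for lhs, terminal in TERMINAL_RULES if terminal == token}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = derives[i][split]
                right = derives[i + split][length - split]
                for lhs, first, second in BINARY_RULES:
                    if first in left and second in right:
                        derives[i][length].add(lhs)
    return start in derives[0][n]


print(cyk_accepts(["approach", "extend_arm", "shake", "shake", "withdraw_arm"]))  # True
print(cyk_accepts(["extend_arm", "approach", "shake", "withdraw_arm"]))           # False
```

The second string contains the right actions in the wrong order, so no sequence of production rules generates it.
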
Description-based approach
Recognizes human activities that have complex spatio-temporal structures
◦ A spatio-temporal structure is a detector used for recognizing human actions
Uses context-free grammars (CFGs) to represent activities
◦ CFGs are used to recognize high-level activities
◦ The detection extracts space-time points and local periodic motions to obtain a sparse distribution of interest points in a video

Image Understanding (IU)
Probability theory
Fuzzy logic
Bayesian network:
◦ Used for recognition of an activity, based on a representation of the activity's temporal structure
◦ Uses a large network with over 10,000 nodes

Group Activities
A group of persons marching
◦ The images are recognized as the overall motion of an entire group
A group of people fighting
◦ Multiple videos are used to recognize the activity that a “group is fighting”

Interactions between humans and Objects
Recognizing interactions between humans and objects requires multiple components.
Much human-object interaction work ignores the interplay between object recognition and motion estimation.
Object dependencies, motions, and human activities can also be factored in to determine the activities involved.

References
J.K. Aggarwal and M.S. Ryoo. 2011. Human activity analysis: A review. ACM Comput. Surv. 43, 3, Article 16 (April 2011), 43 pages. DOI=10.1145/1922649.1922653 http://doi.acm.org/10.1145/1922649.1922653
Christopher O. Jaynes. 1996. Computer vision and artificial intelligence. Crossroads 3, 1 (September 1996), 7-10. DOI=10.1145/332148.332152 http://doi.acm.org/10.1145/332148.332152
Zhu Li, Yun Fu, Thomas Huang, and Shuicheng Yan. 2008. Real-time human action recognition by luminance field trajectory analysis. In Proceedings of the 16th ACM international conference on Multimedia (MM '08). ACM, New York, NY, USA, 671-676. DOI=10.1145/1459359.1459456 http://doi.acm.org/10.1145/1459359.1459456
Paul Scovanner, Saad Ali, and Mubarak Shah. 2007. A 3-dimensional sift descriptor and its application to action recognition. In Proceedings of the 15th international conference on Multimedia (MULTIMEDIA '07). ACM, New York, NY, USA, 357-360. DOI=10.1145/1291233.1291311 http://doi.acm.org/10.1145/1291233.1291311

Questions?