
Activity Analysis in Video

Spring 2005 Computational Intelligence Seminar Series

Partial Review of the Paper “Discovery and Segmentation of Activities in Video”

By Matthew Brand (MERL)

Presented by Derek Anderson

Topics

1. TigerPlace Project
2. Monitoring Silhouette Activity
3. Monitoring Object Activity
4. Monitoring Both (Separately or Combined)
5. Hidden Markov Models (Brief Introduction)
6. Evolutionary Computing for Structure Discovery
7. Matthew Brand’s Approach to Activity Recognition

Context for this Presentation

TigerPlace Project

One component of our system will involve analyzing video (in real time) and recognizing an important set of “short-term” activities.

Figure: TigerPlace system overview, panels (a), (b), and (c). Components: bed sensors, sensor mat, motion sensor, stove temperature sensor, gait monitor, video sensor network, Data Manager, Activity Analysis, Alarm Filter, Behavior Reasoning, and caregivers and residents. Labeled data flows: sensor event, video, anonymized video, video activity descriptor, physical activity descriptor, and alerts.

Sensor and Video Networks

We are doing the research for the video sensor network: iPAQ hx4700 series PDAs with HP PhotoSmart digital cameras.

The results from the video network can be combined with other sources of information from the sensor network (gait monitor, bed sensors, …) to reduce false alarm rates and help increase the overall confidence that the activities occurred.

Is this going to be handled inside the behavior reasoning component of the system … (fuzzy rules)? Fuzzy integrals?

Fuzzy Integral: use each of the sources of information in the sensor and video networks, taking into account how reliable each one is individually (possibly differently for different kinds of tasks), and assess our confidence in a particular hypothesis, i.e., an individual activity?
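The slide asks whether a fuzzy integral could perform this fusion. One possible sketch, assuming the Sugeno fuzzy integral with a Sugeno λ-measure (the project has not chosen a fusion operator, so this is illustrative only): each source i has a reliability density g_i in (0, 1) and reports a support h_i in [0, 1] for the hypothesized activity. The function names and example numbers are hypothetical.

```python
from typing import Sequence


def sugeno_lambda(densities: Sequence[float], tol: float = 1e-12) -> float:
    """Solve prod(1 + lam * g_i) = 1 + lam for the Sugeno lambda-measure (lam > -1)."""
    def f(lam: float) -> float:
        prod = 1.0
        for g in densities:
            prod *= 1.0 + lam * g
        return prod - (1.0 + lam)

    total = sum(densities)
    if abs(total - 1.0) < 1e-12:
        return 0.0  # densities are already additive
    if total > 1.0:
        lo, hi = -1.0 + 1e-9, -1e-9  # root lies in (-1, 0)
    else:
        lo, hi = 1e-9, 1.0  # root lies in (0, inf): grow the bracket until f changes sign
        while f(hi) < 0.0:
            hi *= 2.0
    while hi - lo > tol:  # plain bisection on the sign change
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0.0) == (f(lo) > 0.0):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sugeno_integral(support: Sequence[float], densities: Sequence[float]) -> float:
    """Sugeno fuzzy integral of the supports h_i w.r.t. the lambda-measure built from g_i."""
    lam = sugeno_lambda(densities)
    # Sort sources by decreasing support h_i
    pairs = sorted(zip(support, densities), key=lambda p: p[0], reverse=True)
    g_set = 0.0  # measure of the set of the i most confident sources, g(A_i)
    best = 0.0
    for h, g in pairs:
        g_set = g + g_set + lam * g * g_set
        best = max(best, min(h, g_set))
    return best


# Hypothetical supports/reliabilities for video, bed sensor, and gait monitor
support = [0.9, 0.6, 0.3]
reliability = [0.3, 0.4, 0.2]
confidence = sugeno_integral(support, reliability)
```

Because the measure of the full set is 1, the fused confidence never drops below the smallest support or rises above the largest; a reliable source with high support pulls the result up.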

Important Elderly Activities

What kinds of activities should we recognize? Presently, we are deciding on an initial set to study.

A few possibilities include:

Total body motion: falling down (and not being able to get up); someone entering and leaving their bed; sitting down on and getting up from a chair

Partial body motion: taking their medicine; drinking

Monitoring while Ensuring Privacy

What features should the video system use? Common approach: silhouettes.

A silhouette is an image-based representation of an individual with nearly all personal and distinguishing information removed.

Features from silhouettes will be used to monitor an individual’s activity.

These silhouettes will initially be extracted through image subtraction against a known, stationary background (cleaned up with binary morphology and a reconstruction operator).
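A minimal numpy-only sketch of this extraction, assuming a grayscale or RGB frame, an absolute-difference threshold (the value 30 is a hypothetical placeholder), and a 3x3 structuring element. It returns both a conventional opening and an opening by reconstruction so the two clean-up options can be compared.

```python
import numpy as np


def dilate(mask: np.ndarray) -> np.ndarray:
    """Binary dilation with a 3x3 square structuring element."""
    h, w = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(3):
        for dx in range(3):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def erode(mask: np.ndarray) -> np.ndarray:
    """Binary erosion with a 3x3 square element (outside the image counts as background)."""
    h, w = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    out = np.ones_like(mask)
    for dy in range(3):
        for dx in range(3):
            out &= padded[dy:dy + h, dx:dx + w]
    return out


def reconstruct(marker: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Morphological reconstruction by dilation: grow marker inside mask until stable."""
    current = marker & mask
    while True:
        grown = dilate(current) & mask
        if np.array_equal(grown, current):
            return current
        current = grown


def extract_silhouette(frame, background, threshold=30.0):
    """Background subtraction followed by two alternative morphological clean-ups."""
    diff = np.abs(frame.astype(float) - background.astype(float))
    if diff.ndim == 3:  # color frame: use the largest per-channel difference
        diff = diff.max(axis=2)
    raw = diff > threshold
    opened = dilate(erode(raw))               # conventional opening
    rebuilt = reconstruct(erode(raw), raw)    # opening by reconstruction
    return opened, rebuilt
```

Both versions drop isolated noise pixels, but reconstruction restores thin parts of the silhouette (such as an arm) exactly as they appeared in the raw mask, as long as they stay connected to a region that survived the erosion; the plain opening erases them.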

What the silhouettes really look like

(still a very ideal setting)

Conventional Morphological Opening of Extracted Silhouette (Left); Morphological Reconstruction Operation on Extracted Silhouette (Right)

Silhouette motion over time (identification of activity regions)

Consecutive Silhouette Subtraction (left) and after additional Erosion Operation (right)
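Reading “consecutive silhouette subtraction” as a pixelwise XOR of two binary silhouettes (our assumption), the activity-region step can be sketched as follows; the number of erosions is a hypothetical parameter. The extra erosion removes thin change bands and isolated flicker pixels.

```python
import numpy as np


def erode(mask):
    """Binary erosion with a 3x3 square element (outside the image counts as background)."""
    h, w = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    out = np.ones_like(mask)
    for dy in range(3):
        for dx in range(3):
            out &= padded[dy:dy + h, dx:dx + w]
    return out


def motion_regions(prev_sil, curr_sil, erosions=1):
    """Pixels whose silhouette membership changed between frames, then eroded."""
    changed = np.logical_xor(prev_sil, curr_sil)
    for _ in range(erosions):
        changed = erode(changed)
    return changed
```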

New Application?

Do not necessarily focus on the silhouettes, but rather on the objects in the environment (or the co-interaction of the two)

Object or interesting landmark identification

SIFT (Scale-Invariant Feature Transform): Is there interesting enough texture on everything? Where are the cameras placed? Is it too complex to apply at first? Will it run in real time (present equation, Bob = NO)?

Low-level, simple image processing techniques: we have to see what the resolution and quality of the images are; use simpler image processing techniques to recognize particular objects

How to deal with some occlusion (why co-interaction might be helpful): used the YUV color space to help identify skin regions, which helped in dealing with occlusion of objects the individual would interact with (tracked the hands)
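A sketch of YUV skin detection, assuming the BT.601 RGB-to-YUV conversion. The U/V ranges below are hypothetical placeholders, not the values used during the fellowship, and would need tuning on the actual camera images.

```python
import numpy as np


def rgb_to_yuv(rgb):
    """Convert an (H, W, 3) RGB array to analog YUV using BT.601 weights."""
    rgb = rgb.astype(float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return np.stack([y, u, v], axis=-1)


def skin_mask(rgb, u_range=(-40.0, -5.0), v_range=(10.0, 60.0)):
    """Mark pixels whose chrominance falls in a (hypothetical) skin box; luminance Y is ignored."""
    yuv = rgb_to_yuv(rgb)
    u, v = yuv[..., 1], yuv[..., 2]
    return (u >= u_range[0]) & (u <= u_range[1]) & (v >= v_range[0]) & (v <= v_range[1])
```

Thresholding only U and V makes the test less sensitive to lighting changes than thresholding RGB directly, which is the usual motivation for working in a luminance/chrominance space.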

NLM Short-Term Fellowship (Summer 2004)

At the end of the summer, I used Bob’s SIFT implementation to identify key points on a pill bottle (using the minimum spanning tree and a density measure)

This helped reduce some of the false alarms (in the pill-taking activity)

Activity Recognition

I don’t think that we have decided on the exact approach to use yet.

Some form of HMM looks like as good a place as any to start:

Simple: DOHMMs, COHMMs, or MDCOHMMs (discrete observation, continuous observation, or mixture density continuous observation HMMs)

HHMMs (Hierarchical): Learning Hierarchical Hidden Markov Models for Video Structure Discovery

Entropic HMMs (Structure discovery): Discovery and Segmentation of Activities in Video

Temporal Pattern Recognition

Hidden Markov Models (HMMs) are statistical methods (stochastic networks) that model sequential patterns arising from a set of observation sequences believed to have come from the process of interest.

HMMs are known for their application in areas such as speech recognition and word and symbol recognition.

An HMM is a doubly embedded stochastic process: the underlying process is not observable (hidden) and can only be observed through another set of stochastic processes that produce the sequence of observations.

Mixture Density Continuous Observation HMM

Figure: HMM trellis with hidden states 1, 2, …, K unrolled over time, emitting observations x1, x2, x3, …, xK
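For reference, the standard mixture-density form of the observation density of state j is shown below. The symbols M, c_jm, μ_jm, and Σ_jm are conventional notation rather than taken from the slides; the densities b_j together play the role of B in m = (A, B, p).

```latex
b_j(\mathbf{o}_t) = \sum_{m=1}^{M} c_{jm}\,\mathcal{N}\!\left(\mathbf{o}_t;\ \boldsymbol{\mu}_{jm},\ \boldsymbol{\Sigma}_{jm}\right),
\qquad c_{jm} \ge 0, \quad \sum_{m=1}^{M} c_{jm} = 1
```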

HMM Problems

1) Given the observation sequence O = O1 O2 O3 … Ot and a model m = (A, B, p), how do we efficiently compute P(O | m)?

2) Given the observation sequence O and a model m, how do we choose a corresponding state sequence Q = q1 q2 q3 … qt that is optimal in some meaningful sense?

3) How do we adjust the model parameters to maximize P(O | m)?
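Problems 1 and 2 have standard dynamic-programming answers: the forward algorithm and the Viterbi algorithm. A minimal sketch for a discrete-observation HMM, reading p in m = (A, B, p) as the initial-state distribution (written `pi` here); problem 3 (Baum-Welch) is omitted for brevity, and the toy parameters are hypothetical.

```python
import numpy as np


def forward(A, B, pi, obs):
    """Problem 1: P(O | m) via the forward algorithm (unscaled, so only for short sequences)."""
    alpha = pi * B[:, obs[0]]  # alpha_1(i) = pi_i * b_i(O_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # alpha_{t+1}(j) = sum_i alpha_t(i) a_ij * b_j(O_{t+1})
    return float(alpha.sum())


def viterbi(A, B, pi, obs):
    """Problem 2: single most likely state sequence, computed in log space."""
    log_a, log_b = np.log(A), np.log(B)
    delta = np.log(pi) + log_b[:, obs[0]]
    backptrs = []
    for o in obs[1:]:
        scores = delta[:, None] + log_a  # scores[i, j]: best path ending in i, then i -> j
        backptrs.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) + log_b[:, o]
    path = [int(delta.argmax())]
    for ptr in reversed(backptrs):  # follow back-pointers from the last step
        path.append(int(ptr[path[-1]]))
    return path[::-1]


# Toy 2-state, 3-symbol model (hypothetical numbers)
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])
obs = [0, 1, 2]
```

Both algorithms cost O(N²T) for N states and T observations instead of enumerating all N^T state sequences; for long video sequences the forward pass needs per-step scaling to avoid underflow.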

Structure Discovery

A serious problem related to the deployment of HMMs is how to specify or learn the HMM model structure.

Matthew Brand has proposed a method based on entropy to learn an “optimal” model structure.

We might look at identifying a general way to learn the model structure in a simpler fashion, independent of the HMM type, since this will be used not just in a “lab” setting.

I am presently looking into using Evolutionary Computing (EC) techniques to evolve and learn the HMM structure automatically.

The difference would be related to the “compression” aspect and the small number of observation samples with which Brand claims the method works.

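EP Overview

Figure: one generation of evolutionary programming over HMM structures. Each parent P1, P2, P3 in generation t is an HMM with a set of states (among S1 to S4) and is scored by its fitness F(Pi). Mutation produces offspring O1, O2, O3, scored by F(Oi). Selection over {P1, P2, P3, O1, O2, O3} forms generation t+1.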

Walk before we start running

Initially:

Test how well the procedure works on a fully connected DOHMM when we only mutate the states (add and remove operators)

Test a few different measures of complexity (the different fitness functions)

Each chromosome in a generation acts as a seed to the next iteration’s Baum-Welch algorithm

Later:

Consider a more complicated MDCOHMM model

Try to derive a series of equations and mutation operators that can take an initial population estimated by Baum-Welch and evolve what was found (I believe that this would be a completely new technique)
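A skeleton of the planned first experiment, under several assumptions not fixed by the slides: (μ+μ) selection with μ = 3 to match P1 to P3 and O1 to O3 on the EP overview slide, a BIC-style penalty as one candidate complexity measure, and the Baum-Welch seeding step left as a comment. Without that step the loop only compares randomly parameterized structures, so this shows the control flow rather than a working learner. All function names are hypothetical.

```python
import numpy as np


def normalize(x):
    """Rescale the last axis to sum to 1."""
    return x / x.sum(axis=-1, keepdims=True)


def random_dohmm(n_states, n_symbols, rng):
    """Fully connected discrete-observation HMM (A, B, pi) with random parameters."""
    A = rng.dirichlet(np.ones(n_states), size=n_states)
    B = rng.dirichlet(np.ones(n_symbols), size=n_states)
    pi = rng.dirichlet(np.ones(n_states))
    return A, B, pi


def log_likelihood(hmm, obs):
    """log P(O | m) via the scaled forward algorithm."""
    A, B, pi = hmm
    alpha = pi * B[:, obs[0]]
    ll = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = (alpha @ A) * B[:, obs[t]]
        scale = alpha.sum()
        ll += np.log(scale)
        alpha = alpha / scale
    return ll


def fitness(hmm, obs):
    """Hypothetical complexity measure: log-likelihood minus a BIC-style parameter penalty."""
    n, m = hmm[1].shape
    n_params = n * (n - 1) + n * (m - 1) + (n - 1)
    return log_likelihood(hmm, obs) - 0.5 * n_params * np.log(len(obs))


def mutate(hmm, rng, max_states=8):
    """Add or remove one state (the only operators in the initial experiment)."""
    A, B, pi = hmm
    n, m = B.shape
    if n > 2 and (n >= max_states or rng.random() < 0.5):
        keep = np.arange(n) != rng.integers(n)  # remove one state, then renormalize
        return normalize(A[keep][:, keep]), B[keep], normalize(pi[keep])
    A2 = np.full((n + 1, n + 1), 1.0 / (n + 1))  # small uniform mass into the new state
    A2[:n, :n] = A
    A2[n] = rng.dirichlet(np.ones(n + 1))
    B2 = np.vstack([B, rng.dirichlet(np.ones(m))])
    pi2 = np.append(pi, 1.0 / (n + 1))
    return normalize(A2), B2, normalize(pi2)


def evolve(obs, n_symbols, mu=3, generations=20, seed=0):
    rng = np.random.default_rng(seed)
    parents = [random_dohmm(int(rng.integers(2, 6)), n_symbols, rng) for _ in range(mu)]
    for _ in range(generations):
        # Planned: each chromosome would seed a Baum-Welch run here (omitted).
        offspring = [mutate(p, rng) for p in parents]
        pool = parents + offspring  # {P1, ..., P_mu, O1, ..., O_mu}
        pool.sort(key=lambda h: fitness(h, obs), reverse=True)
        parents = pool[:mu]  # keep the fittest mu
    return parents[0]
```

Swapping in a different complexity measure only requires changing `fitness`, which matches the plan to test several fitness functions.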

Matthew Brand’s Approach

The principle of maximum likelihood is not valid for small data sets: the training data is rarely enough to wash out the sampling artifacts (i.e., noise).

He also leaves out the obvious question of whether we have enough observations to estimate all the different parameters in the network (the degrees of freedom).

We may only have a small number of observations, with only a few “reflective” sub-observation sequences.

He advocates replacing the Baum-Welch formulae with parameter estimators that minimize entropy.

The claim is that this exploits the duality between learning and compression.
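The slides do not show the estimator. As we understand Brand’s related work on entropic priors (a hedged summary, not necessarily the paper’s exact notation), the prior favors low-entropy parameters, and the MAP re-estimate of a multinomial θ with expected counts ω_i from the usual E-step solves:

```latex
P_e(\boldsymbol{\theta}) \propto e^{-H(\boldsymbol{\theta})} = \prod_i \theta_i^{\theta_i},
\qquad
\hat{\boldsymbol{\theta}} = \arg\max_{\boldsymbol{\theta}}
\Big[\sum_i \omega_i \log\theta_i + \sum_i \theta_i \log\theta_i\Big]
\quad \text{s.t.} \quad \sum_i \theta_i = 1

\frac{\omega_i}{\theta_i} + \log\theta_i + 1 + \lambda = 0
\quad\Longrightarrow\quad
\theta_i = \frac{-\omega_i}{W\!\left(-\omega_i\, e^{1+\lambda}\right)}
```

Here W is the Lambert W function and λ is the Lagrange multiplier, found numerically so that the θ_i sum to 1. Compared with the Baum-Welch estimate θ_i ∝ ω_i, this pushes weakly supported parameters toward zero, which is what allows the model structure to be trimmed.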

Entropy Minimization

First Setup

Variety of activities, from picking up the phone (a few seconds) to activities such as writing (could take up to hours)

Used a “blob” representation consisting of ellipse parameters fitting the single largest connected set of active pixels

Background subtraction by identifying a statistical model of the background and an adaptive Gaussian color/location model (pixels that have changed and others due to motion)

Cleaned up the “blob” through dilation (he makes reference to using a seed from the previous frame)

Observation vector uses high-level geometric features, calculated from the mean and eigenvectors of a 2D Gaussian fitted to the foreground pixels

30 minutes of data taken at random; frames with no one in the video were removed, leaving roughly 21 minutes
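A sketch of the geometric blob features described above, assuming the mask already holds only the largest connected blob. Returning centroid, axis lengths, and orientation is our reading of “ellipse parameters”; the exact feature vector in the paper may differ, and the axis-length scale is a convention chosen here.

```python
import numpy as np


def blob_features(mask):
    """Ellipse parameters of a 2D Gaussian fitted to the foreground pixels of a binary mask."""
    ys, xs = np.nonzero(mask)
    coords = np.stack([xs, ys]).astype(float)  # 2 x N: row 0 is x, row 1 is y
    mean = coords.mean(axis=1)
    cov = np.cov(coords, bias=True)  # maximum-likelihood covariance
    evals, evecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    major_vec = evecs[:, 1]
    angle = np.arctan2(major_vec[1], major_vec[0]) % np.pi  # orientation in [0, pi)
    # Axis lengths = 2 * sqrt(eigenvalue), i.e. +/- 1 standard deviation along each axis
    major, minor = 2.0 * np.sqrt(evals[1]), 2.0 * np.sqrt(evals[0])
    return np.array([mean[0], mean[1], major, minor, angle])
```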

Training

Only three sequences used for training

Varied from 100 to 1,900 frames in length

Number of states: {12, 16, 20, 25, 30}

Procedure 1: Model Activity

Procedure 2: Monitoring Traffic

Monitoring Simultaneous Processes

HMMs are traditionally used to model a single hidden process

Brand modified HMMs to take a varying number of observations per time step (he claims this is novel; I don’t know whether he is the first)

The new image representation is a variable-length list of flow vectors between two subsequent images

Flow vectors smaller than some predefined threshold are disregarded

The model learns the typical locations and directions of the moving pixels, and the dynamic changes of these patterns
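A sketch of turning a dense flow field into this variable-length representation. The slides do not say how flow is computed, so `flow_x` and `flow_y` are assumed to come from some optical-flow routine run elsewhere; the threshold value is a hypothetical placeholder.

```python
import numpy as np


def flow_list(flow_x, flow_y, threshold=1.0):
    """Variable-length list of (x, y, dx, dy) rows, dropping vectors below the threshold."""
    magnitude = np.hypot(flow_x, flow_y)
    ys, xs = np.nonzero(magnitude >= threshold)
    return np.column_stack([xs, ys, flow_x[ys, xs], flow_y[ys, xs]]).astype(float)
```

The number of rows changes from frame to frame, which is exactly why a standard one-observation-per-step HMM does not fit this representation.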

Internals

Brand uses a modified version of a multivariate Gaussian mixture model

He deals with multiple observations per time step by treating each frame’s flow list as an observation sequence for a mixture model at one time step

Multi-observation-mixture+counter (MOMC) HMM

The first term is a distribution on the observation count

The mixture Gaussians are 4D, observing flow vectors in (x, y, dx, dy) space

The mixture components model motion in particular directions and locations

The counter variable essentially models the combined surface area of the moving objects
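A sketch of how one state’s emission score could be computed for a single frame under this model. Assumptions not in the slides: the count distribution is taken to be Poisson, the 4D Gaussians have diagonal covariance, and the flow vectors within a frame are scored independently by the state’s mixture. All names are hypothetical.

```python
import math

import numpy as np


def log_gauss_diag(x, mean, var):
    """log N(x; mean, diag(var)) for every row of x (shape K x D)."""
    d = x.shape[1]
    return -0.5 * (d * math.log(2.0 * math.pi) + np.log(var).sum()
                   + (((x - mean) ** 2) / var).sum(axis=1))


def momc_log_emission(flows, weights, means, variances, count_rate):
    """Log emission of one state for one frame's variable-length flow list.

    flows: K x 4 rows of (x, y, dx, dy); weights: M mixture weights;
    means/variances: M x 4 diagonal Gaussians; count_rate: assumed Poisson rate.
    """
    k = flows.shape[0]
    # Count term: ASSUMED Poisson distribution on the number of flow vectors
    log_count = k * math.log(count_rate) - count_rate - math.lgamma(k + 1)
    if k == 0:
        return log_count
    # Mixture term: each flow vector is scored by the state's 4D Gaussian mixture
    comp = np.stack([math.log(w) + log_gauss_diag(flows, mu, var)
                     for w, mu, var in zip(weights, means, variances)], axis=1)  # K x M
    top = comp.max(axis=1, keepdims=True)
    per_vector = top[:, 0] + np.log(np.exp(comp - top).sum(axis=1))  # log-sum-exp
    return log_count + float(per_vector.sum())
```

The count term grows with the amount of motion in the frame, which is how the counter can stand in for the combined surface area of the moving objects.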

Any Questions?

HMM Links

Hidden Markov Models (General Introductions)
http://uirvli.ai.uiuc.edu/dugad/hmm_tut.html
http://www.cse.ucsc.edu/research/compbio/html_format_papers/hughkrogh96/cabios.html

Baum-Welch algorithm and EM (simpler math derivation) (Bilmes)
http://citeseer.ist.psu.edu/bilmes98gentle.html

Entropic Hidden Markov Models (Matthew Brand)
Discovery and Segmentation of Activities in Video (IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 8, Aug. 2000)

Fuzzy Hidden Markov Models (Gader and Mohamed)
Generalized Hidden Markov Models – Part I: Theoretical Frameworks (IEEE Transactions on Fuzzy Systems, Vol. 8, No. 1, Feb. 2000)
