
Activity Recognition Using RGB-Depth Sensors

Issue date: 03.03.2016

Student(s):

• Nazli Temur

Supervisor(s):

• Francois Bremond

• Carlos Fernando Crispim Junior

M1 Student Project 2015-2016

Final Report


Abstract

Activity recognition deals with representing real-world activities in a way that machines can recognize. Just as modeling human vision led to artificial devices (cameras) capable of capturing and reproducing a view, modeling the activity portion of reality should, once event modeling is achieved, enable real-time recognition of real-world events through artificial systems. The current state of the art contains several methods for intelligent tracking. Some are machine-learning oriented: activities are learned from the context in an unsupervised or semi-supervised manner. Another family of methods is description-based event recognition, whose core is the description of scenarios with respect to activities. Such descriptions require a language. There are mathematical languages in which logic is used to represent activities and their relations; graphical languages such as hidden Markov models, state machines, and state charts are also used, and several textual languages have been proposed. In this study I analyze a constraint-satisfaction, RGB-D-based recognition algorithm with the goal of improving it, working with the unconstrained environment of a nursing home. The existing algorithm includes detection, tracking, and recognition. To improve recognition, I focused on its description-based nature by investigating the defined event models and constraints. I found it useful to extend the existing event scope by modeling existential qualifications: a route-target relation, appear and disappear events, and increase and decrease properties. The algorithm improvement targets the recognition of multiple-actor scenes. The newly proposed event models are evaluated after evaluating the existing performance of the algorithm. I am currently not able to claim a strong improvement in the algorithm because of an open computational question in the event recognition. This report presents the challenges I overcame and the lessons learned.


Table of Contents

1. General Project Description
   Context
   Motivations
   Challenges
   Goals
2. State of the Art
3. Work Done So Far: Progress Achieved During the Period March-May
4. Work Progress of Second Half
5. Conclusion
6. Bibliography


1. General Project Description

Context

The domain of the project is automatic video sequence interpretation. The SUP ("Scene Understanding Platform") platform developed in STARS detects mobile objects, tracks their trajectories, and recognizes related behaviors predefined by experts.

This platform contains several techniques for the detection of people and for the recognition of human postures and gestures of one person using conventional cameras. Classification into more than 8 classes (e.g. Person, Group, Train), based on 2D and 3D descriptors (position, 3D ratio height/width, ...), can be seen below.

[Figure: Generic platform for activity understanding]

[Figure: Example of 4 classes: Person, Group, Noise, Unknown]

[Figure: People detection using local descriptors]


Moreover, RGB-D sensors have been released that improve people detection. This kind of sensor is well adapted for applications that monitor persons (e.g. monitoring Alzheimer patients in a hospital), because the people are in a predefined area and near the camera. Depth cameras have two main advantages: first, the output images contain depth information, and second, the sensor is independent of light changes (IR sensor). In this work, we propose to use an RGB-D sensor to acquire 3D images, detect people, and recognize interesting activities. The SUP library is able to perform some processing (e.g. detection of people) and to provide a true 3D map of the scene in the referential of the RGB-D sensor.

Motivations

This project course consists in improving the activity recognition process using RGB-Depth sensors, in order to improve monitoring systems for older adults. Many techniques have already been proposed for detecting people in specific environments (e.g. with a low level of occlusion) over short durations (e.g. a few minutes). However, activity recognition is still brittle over the long term (e.g. a few weeks) and often depends on the scene conditions (e.g. number of people, occlusion, and people interactions). This work aims at relaxing these hypotheses in order to conceive a general algorithm enabling the recognition of the activities of people living in an unconstrained environment over long periods, observed by a limited number of cameras (e.g. 2 RGB-D sensors). The goal is to review the literature, evaluate existing libraries, and propose more robust algorithms and more accurate activity models.

To validate the project course, the proposed approach will be assessed on homecare videos from the Nice EHPAD (nursing home), with the aim of evaluating technologies that keep older adults functioning at higher levels and living independently.

Challenges

There are scientific challenges in people detection when dealing with real-world scenes with apathetic patients: cluttered scenes, handling wrong and incomplete person segmentation, handling static and dynamic occlusions, low-contrast objects, moving contextual objects (e.g. chairs), and so on.

Moreover, even with people detection and tracking connected, recognizing the behavior of a person still remains a challenge. Event modeling is a way of structuring behavior so that it becomes recognizable. The non-deterministic nature of the real world naturally adds to this challenge by increasing the complexity of events.


Briefly, automatic recognition of events, in particular complex events, is challenging. Creating a real-time algorithm that achieves complex event recognition is strongly desired, and using an RGB-Depth camera in a multi-sensor combination is one way to improve it.

Goals

The main purpose of the project is to improve the existing recognition algorithm and its application-level solution. To achieve this, the recognition performance of the existing algorithm first has to be evaluated. Afterwards, possible problems that negatively impact the algorithm's efficiency need to be identified. Then, with respect to the identified problems, improvements have to be proposed according to their applicability within a restricted time frame and their feasibility. Briefly, the goals are:

1st month: ● Study the limitations of existing solutions and improve the current tool to visualize the recognized events.

2nd month: ● Propose more accurate activity models.

3rd month: ● Evaluate and improve the current algorithm for recognizing activities.

4th month: ● Optimize the proposed algorithm for recognizing activities and write the report.


2. State of the Art

Video surveillance systems automatically recognize complex events involving several actors. This recognition process is named in the literature as video understanding, video activity recognition, event recognition, or scene understanding. Video understanding has been one of the most trending topics of the last decade, and many recent approaches have been proposed. Any happening that can be tracked via video cameras is in the scope of video understanding. Recently, composed sensory information has become popular to support video understanding alongside video cameras. Briefly, video understanding is about making intelligent observations from video in an offline or online manner. To achieve this, it is necessary to capture the scene with the appropriate level of detail. Recently, RGB-Depth cameras have become popular because they are advantageous for capturing the scene with 3D information. Some methods require wearables to be carried throughout the recognition process; some do not. Howell and Buxton proposed an approach to recognize a video event based on a neural network, but this method remained inefficient for recognizing complex events. Complex events are events composed of multiple events with respect to the number of actors, time, and other constraints such as simultaneous occurrence or re-occurrence [4].

Gerber defined a method to recognize a video event based on fuzzy temporal logic. Pinhanez and Bobick used Allen's interval algebra to represent video events. Shet recognized activities based on Prolog rules. Rota and Thonnat used a constraint resolution technique to recognize video events. Another approach is to use a symbolic network which stores partially recognized video events. A chronicle is a set of temporal constraints on time-stamped events, and Ghallab used this chronicle terminology to express video events. Vu combined the previous approaches to optimize the temporal constraint resolution by ordering the video events to be recognized in time [4].

Lavee et al. categorize computer vision approaches for event recognition into the following categories: state models, pattern recognition methods, and semantic models.

Multi-sensor approaches for event recognition generally perform fusion at the input data or feature level using state models or pattern recognition approaches. However, those methods are generally too complex to be applied to real-life scenarios, and they suffer from semantic weaknesses [2].

To cope with this difficulty, the existing event recognition algorithm, by Vu, uses a hierarchical model-based framework. In other words, a generic ontology is described, with the help of multi-sensor data, as event models. Basically, the framework is composed of two main components: a generic ontology and a temporal event recognition algorithm inspired by Ghallab. The algorithm takes the pre-defined event models according to their temporal priority and returns whether their constraints are satisfied or not [3].

3. Work Done So Far: Progress Achieved During the Period March-May

Briefly, a new algorithm that will contribute to improving the recognition of the existing algorithm has been proposed. The current progress was achieved in four steps. Firstly, the reference documents and the application of the existing algorithm were investigated.

Secondly, the source videos and their ground truths were verified to be complete enough to start the algorithm performance evaluation. For that, the necessary installations were done.


Thirdly, the algorithm was evaluated.


For the evaluation, TP, FP, and FN measures are taken into account to get an idea of the accuracy, precision, and F-measure of the algorithm.
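To make the relation between these counts and the reported metrics explicit, the short Java sketch below (illustrative only, not part of the SUP or ViSEvAl tooling; the counts are hypothetical) derives precision, recall, and F-measure from TP/FP/FN:

// Minimal sketch: deriving precision, recall and F-measure from TP/FP/FN counts.
// Illustrative only; the counts below are hypothetical, not project results.
public class EvaluationMetrics {
    public static void main(String[] args) {
        int tp = 12, fp = 3, fn = 5; // hypothetical counts for one event type

        double precision = tp / (double) (tp + fp); // fraction of recognized events that are correct
        double recall    = tp / (double) (tp + fn); // fraction of ground-truth events that are found
        double fMeasure  = 2 * precision * recall / (precision + recall);

        System.out.printf("precision=%.2f recall=%.2f F1=%.2f%n", precision, recall, fMeasure);
    }
}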

Lastly, with respect to the evaluation results, the event models were investigated and the findings were discussed with the supervisor. Then, the implementation of a demo was started, with a current size of 1.5 KLOC.

Those four steps were the basis of the proposed methods of improvement. Recently, I identified that the existing event models are defined to work with a single actor rather than multiple actors. So, as an improvement, recognition involving multiple actors is targeted, and there is therefore a need for new event model definitions. These new definitions involve benefiting from existential qualifications, and the Thinth language syntax will be extended for this purpose. We aim to newly define increase and decrease events that are triggered via newly defined appear and disappear events. I formed a reachability graph in which multiple actors can move through certain zones. Similarly, certain zones are designated as IO zones where people can first appear or disappear. To be able to define event models, we have to follow the Thinth language rules syntactically; then we will test the strength of the event models with the HomeCare videos.

The other improvement for multiple-actor scenes is to benefit from the trajectory of each actor. The idea comes from the investigation of the existing event models. Those are short-duration events defined over at most 2 seconds, so their occurrence does not provide a predictive sign of the semantic coherence of events. If we can somehow anticipate a coherence between events, or the occurrence of a certain event, we can make stronger decisions to prevent noise and misinterpretation. To achieve this, instead of relying only on "Enter" and "Exit" events, we are going to model approaching events and, with the help of the past trajectory of each actor, distinguish sets of activities on a per-actor basis.

The basic idea is to model a route-target relation, so that, as in real life, the person who follows the route reaches the target. In other words, a route becomes a constraint to be satisfied for the occurrence of the event "reaching the target". Primarily, we observed that some of the recognition problems arise either from misclassified positioning (direction) of the person or from the constant presence of noise, which invalidates single-actor scenarios.

To understand how this route-target idea can help improve the recognition, I started to implement a demo in which the 3D point positions of each actor (the detection algorithm outputs an XML file) are parsed. With the help of the 3D point coordinates, I extract the trajectory of each actor automatically. Then, after localizing the actor, I estimate the actor's direction by triangulating the 3D temporal point positions.
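As a rough idea of this first step of the demo, the Java sketch below parses a tracking output file and groups 3D positions per actor. The element and attribute names ("object", "id", "x", "y", "z") are assumptions made for illustration; the actual format of the SUP detection output may differ.

// Minimal sketch of the demo's first step: parsing the tracker's XML output and
// grouping 3D positions per actor. The element/attribute names are assumptions
// for illustration; the real SUP output format may differ.
import java.io.File;
import java.util.*;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.*;

public class TrajectoryParser {
    public static Map<Integer, List<double[]>> parse(File xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        Map<Integer, List<double[]>> trajectories = new HashMap<>();
        NodeList objects = doc.getElementsByTagName("object");   // one node per detected actor per frame (assumed)
        for (int i = 0; i < objects.getLength(); i++) {
            Element obj = (Element) objects.item(i);
            int id = Integer.parseInt(obj.getAttribute("id"));   // tracker identity of the actor (assumed attribute)
            double[] p = {
                Double.parseDouble(obj.getAttribute("x")),
                Double.parseDouble(obj.getAttribute("y")),
                Double.parseDouble(obj.getAttribute("z"))
            };
            trajectories.computeIfAbsent(id, k -> new ArrayList<>()).add(p);
        }
        return trajectories;                                     // actor id -> ordered list of 3D points
    }
}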


After obtaining the direction, I apply a stronger matching technique by growing circles from the 3D point coordinates. The details of that algorithm are given below; for now, we are able to validate or invalidate whether the trajectory of a person matches a certain route. For now I concentrated on the ENTER_RESTROOM event. Later, the possible route will be extracted as an event signature.

While doing the trajectory analysis, the strategy was to retrieve the last 10 points [t, t-10] of the trajectory. We performed an analysis on the sample data in order to observe a pattern.

The details of the algorithm are given below. Afterwards, I will try to come up with an integer value that represents a certain path. For now, I am evaluating the angle along the trajectory as a curve descriptor. If this does not work, I will improve my proposal.

Proposed route algorithm: with the help of the 3D position coordinates of the person, we try to extract the trajectory of the person and estimate whether their target is the restroom or not. This will help us generate a "person is approaching" event. The recognition of ENTER_RESTROOM can be strengthened with the help of the trajectory. We read the 3D coordinates of the person from a file and write them into the "vect_list" linked list. Then we compute the distances between every three sequential points. Let us assume the points below are our 3D point set. Step 1: obtain points.


Step 2: compute the distances for the triangulation, so that, with the help of the law of cosines, we can obtain the angle at the second point, which captures the amount of rotation. (Any new point forms a triangle with the previous two points.)

Basically, if the angle is 180°, the person goes straight; otherwise, the person draws a curve. Equivalently, if the centroid of the three points is equal to p2, the trajectory of the 3 points is a straight line.
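The following Java sketch illustrates steps 1 and 2: for three consecutive 3D points it computes the angle at the middle point with the law of cosines, so that 180° corresponds to walking in a straight line. It is a minimal illustration of the computation described above, not the SUP/C++ implementation itself.

// Minimal sketch of step 2: the turning angle at p2 for three consecutive 3D points
// (p1, p2, p3), obtained from the law of cosines. 180 degrees means the person walks
// straight; smaller values mean a curve. Illustrative only.
public class TurnAngle {
    static double distance(double[] a, double[] b) {
        double dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

    /** Angle at p2 (in degrees) of the triangle formed by p1, p2, p3. */
    static double angleAt(double[] p1, double[] p2, double[] p3) {
        double a = distance(p2, p3);   // side opposite p1
        double b = distance(p1, p3);   // side opposite p2
        double c = distance(p1, p2);   // side opposite p3
        // Law of cosines: b^2 = a^2 + c^2 - 2ac*cos(theta), theta being the angle at p2.
        double cos = (a * a + c * c - b * b) / (2 * a * c);
        cos = Math.max(-1.0, Math.min(1.0, cos));   // guard against rounding errors
        return Math.toDegrees(Math.acos(cos));
    }
}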


Also, based on the direction in which the centroid position changes, we can understand the global direction; in other words, even though there is local left-right movement (zigzagging), the global direction could still be to the left.

To locate the movement, I introduced 4 directions (target positions). "Left" and "right" are not generic: they are references that state, relative to the previous position, in which direction the next position lies. To be able to use this reference, we should define the current direction (positioning). I did not take the y coordinate into account, because the object's height is assumed to stay the same while the position changes.

D1 is defined with the help of two sequential points: if, with respect to point p1, both the z and x coordinates of p2 increase, the person moves through D1. Likewise, for two sequential points at times (t0, t1), if both the z and x coordinates of p2 decrease with respect to p1, the person moves through D4, and so on for the other directions. The amount of increase or decrease is not taken into account; only the sign matters.
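The sketch below illustrates this 4-direction labelling. D1 and D4 follow the definitions above; the report leaves the assignment of D2 and D3 implicit, so the mapping used here (D2: x increases while z decreases, D3: the opposite) is an assumption made purely for illustration.

// Minimal sketch of the 4-direction labelling from two sequential points.
// D1 and D4 follow the text (both x and z increase / both decrease); the
// assignment of D2 and D3 is an assumption, since the text leaves it implicit.
// The y coordinate is ignored, as in the text.
public class DirectionLabel {
    /** Returns "D1".."D4" for the step from p1 to p2 (points given as {x, y, z}). */
    static String direction(double[] p1, double[] p2) {
        boolean xUp = p2[0] > p1[0];
        boolean zUp = p2[2] > p1[2];
        if (xUp && zUp)   return "D1";  // both increase
        if (!xUp && !zUp) return "D4";  // both decrease
        if (xUp)          return "D2";  // assumed: x increases, z decreases
        return "D3";                    // assumed: x decreases, z increases
    }
}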

LEFT and RIGHT are determined from the previous positioning: any coordinate change that follows a subsequence of D2 => D1 => D3 => D4 is a movement to the right, and the reverse order is a movement to the left. But knowing the direction is not enough to determine the target, so we also benefit from the gradient of the angle theta. I did two computations over the angles of the trajectory. The first is the cumulative average of theta over time, i.e. (theta1+theta2)/2, (theta1+theta2+theta3)/3, (theta1+theta2+theta3+theta4)/4, and so on, but this is not really informative. All we can observe is whether the person follows a straight line (the average of all observed angles stays around 179-180°); otherwise it decreases.


Then I computed the gradient of the angle between each pair of consecutive angles (an alternative is to compute the gradient of the angles with respect to the centroids, which preserves the 3-point angles): theta2-theta1, theta3-theta2, ..., theta(n)-theta(n-1). This is informative: whenever the value is below 10°, we have a regular curve, and when the value is between 30° and 90°, the person makes a sharp curve (a turn).
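A minimal sketch of this second computation is given below, using the 10° and 30-90° thresholds quoted above; it is illustrative only.

// Minimal sketch of the angle-gradient analysis: successive differences of the
// turning angles, and a simple flag distinguishing a regular curve from a sharp
// turn, with the thresholds quoted in the text. Illustrative only.
import java.util.ArrayList;
import java.util.List;

public class AngleGradient {
    /** theta[i] - theta[i-1] for each consecutive pair of turning angles (degrees). */
    static List<Double> gradients(List<Double> thetas) {
        List<Double> grads = new ArrayList<>();
        for (int i = 1; i < thetas.size(); i++) {
            grads.add(thetas.get(i) - thetas.get(i - 1));
        }
        return grads;
    }

    /** Classifies a single gradient value following the thresholds in the text. */
    static String classify(double grad) {
        double g = Math.abs(grad);
        if (g < 10)             return "REGULAR_CURVE_OR_STRAIGHT";
        if (g >= 30 && g <= 90) return "SHARP_TURN";
        return "UNCLASSIFIED";  // values between 10 and 30 are not discussed in the text
    }
}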

To understand how useful the gradient of theta is, we obtained a sample set of ENTER_RESTROOM event trajectories and tried to compare their gradients with the base path over the last 10 points. But it is clear that we need more points to reach a conclusion.


While doing the trajectory analysis, we investigated the last 10 points [t, t-10] of the trajectory. With the help of the gradient of the angle, or the cumulative average of the angle, we tried to find a signature integer for the ENTER_RESTROOM event. We performed an analysis on the sample data in order to observe a pattern. Unfortunately, we were not able to reach a conclusion from the trajectory analysis.

Additionally, I had the idea of growing circles around each 3D point of the trajectory to obtain a possible base path. The radius can be set by the distance between two consecutive points, so that, depending on speed, the set of points that we will work with can be estimated.

The problem is to determine whether the trajectory of an actor follows the base path. A path can basically be represented as a 2D polygon of 3D point correspondences, or as curvatures. So, in the algorithm, we generate a list L1 to store the base path, which is a collection of 3D point coordinates. Then we grow the points into circles. To define a circle, we compute a radius r from the distance between two consecutive points and take the 3D point coordinate as the circle's center c. Any point that lies within distance r of a center c is in the AREA, which validates the path. Additionally, the distance between two consecutive points reflects the speed of the actor; with a larger radius, we obviously obtain fewer points than in the base path. If a point is not within the range of the AREA, we traverse the base path to find the point of the base path closest to the external (test) point. This lets us measure the distance of the point to the base path, and, if the direction of each point is towards the path, then under a constant-velocity assumption we can estimate whether a possible future position will be on the path or not. This method is useful because each point set is considered with respect to the past trajectory of its own single actor. The advantage is that, when multiple actors are present in the scene, it is not possible to confuse their trajectories with those of other actors.
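A minimal Java sketch of this matching test is given below. It assumes the base path and the test point are already available as 3D coordinates, and it omits the velocity-based prediction of future positions.

// Minimal sketch of the base-path matching idea: each base-path point is grown
// into a circle (a sphere, in 3D) whose radius is the distance to its neighbouring
// base-path point; a test point validates the path if it falls inside at least one
// circle, otherwise its distance to the closest base-path point can be measured.
// Illustrative only; the velocity-based prediction step is not included.
import java.util.List;

public class BasePathMatcher {
    static double distance(double[] a, double[] b) {
        double dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

    /** True if the test point lies within the grown circle of any base-path point. */
    static boolean onPath(List<double[]> basePath, double[] testPoint) {
        if (basePath.size() < 2) return false;  // need at least two points to define a radius
        for (int i = 0; i < basePath.size(); i++) {
            // radius r = distance to the neighbouring base-path point
            int neighbour = (i + 1 < basePath.size()) ? i + 1 : i - 1;
            double r = distance(basePath.get(i), basePath.get(neighbour));
            if (distance(basePath.get(i), testPoint) <= r) {
                return true;
            }
        }
        return false;
    }

    /** Distance from the test point to the closest base-path point (used when onPath is false). */
    static double distanceToPath(List<double[]> basePath, double[] testPoint) {
        double best = Double.MAX_VALUE;
        for (double[] c : basePath) {
            best = Math.min(best, distance(c, testPoint));
        }
        return best;
    }
}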

The code of the project and the other reports are currently available on request, with the reminder that they are still being updated.


4. Work Progress of Second Half

For the second half of the project, the main goal was to complete the implementation of both the event models and the trajectory solution in order to showcase the improvements; if there was not enough improvement, the plan was to address the problematic parts through cause analysis. The Java implementation of the solution is almost done, so that the approach can be tested before integrating the solution into the SUP platform. But we were still missing a decision on how to benefit from the angle of the point positions along the trajectory: defining and fitting a curve is not generic enough to apply. Even though, in the Java application, while parsing the tracking output XML file, we were able to consolidate the information per defined actor, in cases where detection is lost and the person is occluded for a while we were missing some of the information. Therefore, we did not implement the complete trajectory proposal in SUP; we only implemented the C++ code in which we triangulate the points and compute the angle at each point position. Then, with respect to the gradient of the points, a flag is returned to the event models, indicating whether the person moves with a constant angle or performs slight turns through increases and decreases of the angle. This slight-movement information could have been combined with the fact that, if the person moves with the constant angle of a straight line, then, with the help of the CHANGE_ZONE_TO_X_from_K primitive events, we could have eliminated some patterns such as entering the restroom. But the reasoning is still not complete, and there is a serious need to review the state of the art on trajectory-related solutions.

Afterwards, I started to focus on event modeling to improve the activity recognition. In my project, the activity recognition method is description-based: real-world activities are modeled so that exactly what has been modeled is recognized. The advantage of these models is that they let you easily control the recognition scope: it is very easy to define very complex event models that include parallelism, dependencies on strict constraints, temporal order, multiple actors, and so on. In the existing method, activities are defined in a modular manner, which means activities are composable. The method uses Allen's interval algebra. An Allen interval can be thought of as an interval graph in which adjacent (touching) lines represent a time window from past to present, so that temporal relations such as after, before, just before, just after, meet, and so on can be used in chronicles. Additionally, an event hierarchy is defined in which composite events are compositions of primitive events or primitive states. While modeling events, at most 2 components can be defined to be validated. The previously defined activity models are shown below. From my observation, the previously defined event models are not suitable when multiple actors are present in the scene: the existence of multiple actors invalidates the event model to be recognized.


To expand the expressivity and functionality of the existing event recognition algorithm, I have proposed to use the change in the number of mobile actors as a signature of appear and disappear. In this way, while doing recognition in an occluded scene, we can relate the invisibility of a person to the true event, such as ENTER_RESTROOM or EXIT_AREA_IN_BEDROOM. The same holds for visibility / appear-reappear.

One of the newly defined event models is shown below.

For defining an exit from the restroom there is another possible approach: binding the EXIT_RESTROOM event to the ENTER_RESTROOM event, so that we could distinguish an appear from a re-appear. But currently the algorithm does not perform well for the recognition of ENTER_RESTROOM, so whenever we miss an ENTER_RESTROOM event we would also miss the EXIT_RESTROOM, since they are coupled (bound together). Hence a suggestion for users of real-time/reactive systems: the description language of the event recognition necessarily needs to support modular definition of activities. After ensuring that each module of an event performs well, the modules can be associated into more complex events or event sets. What I have defined is modular; in other words, the ENTER and EXIT composite events have no dependency on each other.


CompositeEvent(ENTER_RESTROOM,
  PhysicalObjects((p1 : Person), (z1 : Zone), (sc : Scene))
  Components((c1 : CompositeEvent Disappear(p1, sc))
             (c2 : PrimitiveEvent IN_AREA_IN_RESTROOM(p1, z1)))
  Constraints((c2 justBefore c1)
              (duration(c2) > 0.5))
  Alarm((Level : URGENT))
)

CompositeEvent(Disappear,
  PhysicalObjects((p1 : Person), (sc : Scene))
  Components((c1 : PrimitiveState Person_exists(p1))
             (c2 : PrimitiveState Negative_Change_Actor_Number(sc)))
  Constraints((c1 justBefore c2)
              (!(Exist(p1))))
  Alarm((Level : URGENT))
)

PrimitiveState(Negative_Change_Actor_Number,
  PhysicalObjects((sc : Scene))
  Constraints((AttributeChange(sc->NumberOfActors) = 2))
  Alarm((Level : URGENT))
)


Additionally, I have defined Appear / Re-Appear / ChangeZONE_X_to_ZONE_y / ENTER_R_Fast events, in order to observe some optimal properties, such as the optimal duration that allows events to be recognized for both quick and slow actors without misses. However, evaluating each change in the event models takes on average 5 hours to output a result, so I concentrated on the verification of the multiple-actor scene via ENTER_RESTROOM and EXIT_RESTROOM.

For the evaluation I have used 2 kinds of metrics: one with respect to frame number (metric 4.3) and one with respect to events (metric 4.1). After manually validating the metric performance, we concluded that 4.3 represents the real output performance much more strongly.

As part of the first evaluation, we were not able to get any result from the 4.3 metric for the updated event models, so we performed an analysis with the 4.1 metric; the results are shown below.


In the table, the first result belongs to the existing event models, i.e. the base case against which improvements are compared. STEP1 includes the new models and new zones; STEP2 includes the newly extended models and the same zones as STEP1; STEP3 includes the models of STEP2 with a further update of the zone definitions.

As can be seen from the table, we got rid of almost all false positive events with the help of the new models. The TP number decreased by 2, and when we checked manually we saw that the main problem was the decrease of the zone area. Since the ground truth annotation was done with respect to the previous zones, when we shrink a zone we have to update the ground truth file. We therefore defined the new zones wide enough and did a second evaluation (STEP3).

STEP1 and STEP2 have different event models. After defining the event models for ENTER/EXIT RESTROOM, we observed cases where EXIT_RESTROOM was recognized instead of ENTER_RESTROOM. After investigation, we found that the Disappear event is sometimes triggered later than IN_ZONE_RESTROOM and sometimes before it. An investigation of the people tracking is needed to have a clear answer about these outputs.

But, recalling that the 4.1 metric does not really represent reality, we turned to the 4.3 frame metric instead. This metric is very sensitive: each second is composed of on average 6-7 frames, and the evaluation is done with respect to exact frame numbers, which means we are considering matching at the level of milliseconds.
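To illustrate how such a frame-level comparison differs from the event-level metric, the sketch below scores one ground-truth interval against one recognized interval frame by frame. It is only an illustration of the idea, not the ViSEvAl 4.3 implementation.

// Minimal sketch of a frame-level comparison for one event: every frame is scored
// individually, so even a small temporal shift between the ground-truth interval
// and the recognized interval produces FP and FN frames. Illustrative only.
public class FrameMetric {
    /** Returns {TP, FP, FN} in frames for one ground-truth and one recognized interval. */
    static int[] compare(int gtStart, int gtEnd, int recStart, int recEnd) {
        int tp = 0, fp = 0, fn = 0;
        int start = Math.min(gtStart, recStart);
        int end = Math.max(gtEnd, recEnd);
        for (int frame = start; frame <= end; frame++) {
            boolean inGt = frame >= gtStart && frame <= gtEnd;
            boolean inRec = frame >= recStart && frame <= recEnd;
            if (inGt && inRec) tp++;   // event present and recognized at this frame
            else if (inRec)    fp++;   // recognized but not annotated
            else if (inGt)     fn++;   // annotated but missed
        }
        return new int[] {tp, fp, fn};
    }
}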

Below you can see the results of the 4.3 frame metric execution. Because of a bug in the ViSEvAl tool, we were not able to output any value for 4.3 in the new evaluation.


After fixing the bug, we repeated the evaluations. The result is shown below.

After getting this result, we investigated the possible cause behind it. What we observe is that EXIT_RESTROOM is triggered each time ENTER_RESTROOM occurs. When we compared the event models, we saw that the recognition of the Disappear event sometimes occurs before the IN_ZONE constraint and sometimes after it. To overcome this, the event duration constraints can be experimented with to find an optimal setting. The tracking algorithm can also be checked to figure out whether re-identification causes this problem or not.


5. Conclusion

In this project, description-based video activity recognition and its evaluation were studied.

Improvement proposals were made on the topics of event modeling and trajectory analysis.

Thanks to the newly proposed event models, it is possible to recognize events in a scene where multiple actors and noise are present.

For the evaluation process, we benefited from several tools such as the SUP event recognition platform, ViSEvAl, ViPER, and KreateTool.

While evaluating, we used true positive, false positive, and false negative measurements [TP, FP, FN], together with the precision, recall, and F1-score metrics.

The evaluation of the previously defined event models is complete.

Currently, the latest assessment of the newly defined zones and event models has not resulted in a significant improvement. The necessary analysis has been done, and further investigation is needed to explain why a significant improvement was not obtained. As a result of the newly defined event models, we can say that, at least for the ENTER_RESTROOM event, the newly defined models are sufficient to replace EMPTY_SCENE and thereby overcome the single-actor scene assumption.


6. Bibliography

[1] C. Crispim-Junior, K. Avgerinakis, V. Buso, G. Meditskos, A. Briassouli, J. Benois-Pineau, Y. Kompatsiaris and F. Bremond. Semantic Event Fusion of Different Visual Modality Concepts for Activity Recognition, Transactions on Pattern Analysis and Machine Intelligence - PAMI to appear, 2016.

[2] C. Crispim-Junior, V. Bathrinarayanan, B. Fosty, A. Konig, R. Romdhane, M. Thonnat and F. Bremond. Evaluation of a Monitoring System for Event Recognition of Older People. In the 10th IEEE International Conference on Advanced Video and Signal-Based Surveillance 2013, AVSS 2013, Krakow, Poland on August 27-30, 2013.

[3] C. Crispim-Junior, B. Fosty, R. Romdhane, F. Bremond and M. Thonnat. Combining Multiple Sensors for Event Recognition of Older People. In the 1st ACM International Workshop on Multimedia Indexing and Information Retrieval for Healthcare, MIIRH 2013, Copyright 2013 ACM 978-1-4503-2398-7/13/10, http://dx.doi.org/10.1145/2505323.2505329, Barcelona, October 22, 2013.

[4] Alberto Avanzi, Francois Bremond, Christophe Tornieri and Monique Thonnat, Design and Assessment of an Intelligent Activity Monitoring Platform, in EURASIP Journal on Applied Signal Processing, special issue in "Advances in Intelligent Vision Systems: Methods and Applications", 2005.

[5] E. Corvee and F. Bremond. Haar like and LBP based features for face, head and people detection in video sequences. In the International Workshop on Behaviour Analysis, Behave 2011, Sophia Antipolis, France on the 23rd of September 2011.

[6] C. Crispim-Junior and F. Bremond. Uncertainty Modeling Framework for Constraint-based Elementary Scenario Detection in Vision System. In the First International Workshop on Computer vision + ONTology Applied Cross-disciplinary Technologies in conjunction with ECCV 2014, CONTACT-2014, Zurich, Switzerland, September 7th, 2014.

[7] A. König, C. Crispim, A. Covella, F. Bremond, A. Derreumaux, G. Bensadoum, R. David, F. Verhey, P. Aalten and P.H. Robert. Ecological Assessment of Autonomy in Instrumental Activities of Daily Living in Dementia Patients by the means of an Automatic Video Monitoring System, Frontiers in Aging Neuroscience - open access publication - http://dx.doi.org/10.3389/fnagi.2015.00098, 02 June 2015
