
Deep Neural Networks for Activity Recognition with Multi-Sensor Data in a Smart Home

Jiho Park, Department of Computer Science, Yonsei University, Seoul, Korea, [email protected]

Kiyoung Jang, Department of Computer Science, Yonsei University, Seoul, Korea, [email protected]

Sung-Bong Yang, Department of Computer Science, Yonsei University, Seoul, Korea, [email protected]

Abstract— Multi-sensor based human activity recognition is one of the challenges in ambient intelligent environments such as the smart home and the smart city. Ordinary people usually share a similar and repetitive daily life pattern, also known as a life cycle. A smart home environment and its multiple sensors can assist humans by collecting sequences of human activities and predicting the desired actions. Our goal is to analyze the sequence of activities recorded for a specific resident using deep learning on multiple sensor data. In this paper, we train several deep neural networks on the multi-sensor data collected in a smart home. Because the multi-sensor data of a smart home forms a sequence in time, it matches the characteristics of the Recurrent Neural Network (RNN) structure. To support this assumption, we propose a Residual-RNN architecture to predict future activities of a resident. Furthermore, we utilize an attention module to filter out meaningless data, which yields more effective results than the model without it. To verify the proposed idea, we use real resident activity data recorded in a smart home, the Massachusetts Institute of Technology (MIT) dataset. In our experiments, the proposed model with the attention mechanism outperforms the Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) models in predicting the desired activities of a smart home resident.

Keywords— Recurrent neural networks, Attention, Deep learning, Human activity recognition, Activity of daily life (ADL), Smart home

I. INTRODUCTION

Most Internet of Things (IoT) devices contain sensors, and these sensors are important objects in pattern recognition and perception. A smart industry is difficult to cultivate without sensors, and the relationships among multiple sensor data will play a significant role in the future. In particular, smart living environments, known as smart homes, are directly linked to the convenience of our daily lives. Therefore, pattern recognition using multi-sensor data is an area where rapid technological development is needed. It also involves collecting data that helps control the domestic living environment, with the goal of supporting and improving the daily routine and quality of life. Certain services would be useful if operated automatically based on people's repetitive life patterns, and a smart home management system with artificial intelligence is a great step toward that goal. A real smart home system needs the ability to take necessary actions by analyzing a resident's activity patterns without any manual operation [1]. Furthermore, the analysis of smart home sensors can also be part of an energy management strategy in a smart grid system, because it uses the sensor resources to reduce the energy consumption of a smart environment.

In order to develop a real smart home system, activity recognition is key to handling activities of daily living (ADL) in sensor-based smart homes [2]. Since activity recognition is not a new topic, many applications have already been developed [3]. However, they only target individual systems, not the entire environment. The goal of a smart home management system is to control every single device in the environment. In essence, instead of having multiple secretaries each doing their own job individually, we hire one secretary who can handle all the tasks together. To do so, multiple sensors need to be used. More generally, the difference between a multi-sensor system and other sensor systems is that a multi-sensor system is capable of recognizing different behaviors over time. The data collected from multiple sensors are sequential because the sensors collect data over time. Therefore, we assume that it is important to store past activities and analyze the relationships among the sensors' data to improve the overall performance. The aim of multi-sensor based activity recognition is to predict the variety of different activities of a resident. For example, by learning the "Toileting" activity, the smart home can prepare hot water in the sink, or it can automatically open the curtains in the morning by learning the "Waking up" activity based on the prediction of the resident's lifestyle patterns. Thus, the process and method of learning activities play important roles in a multi-sensor based activity recognition system.

Hence, deep neural networks can play a very good role in activity recognition, and they offer a natural way to treat multiple sensors as features. Since the multi-sensor data has sequential characteristics, we apply a Residual-RNN model for its promise of performance and ease of use [9]. Furthermore, we adapt an attention module and apply it for better results.


First, the RNN model that we apply is the LSTM model, one of the most popular RNN variants, which is able to capture long-range dependencies. In particular, we use the bi-directional LSTM model, which is well known to outperform other models in sequential data processing. The second model is the GRU model, whose performance is similar to that of the LSTM model [4]. The only difference is the number of gates used (two gates for the GRU; three or four gates for the LSTM). Our Residual-RNN structure can yield higher performance than a plain RNN because it uses a shortcut path during processing. In particular, since the MIT dataset contains some meaningless data, residual vectors in the hidden layers can capture the correct use of sensors. However, because this model incurs needlessly high computation, we exploit the attention module. After the attention module is adopted, simulation results show improvements obtained by filtering out irrelevant or meaningless sensor data. In this paper, we use multiple sensor data to develop an activity recognition system and investigate the error loss rate and the accuracy of several deep learning models [8]. To gain reliability, a real dataset from the MIT laboratory is used.

This paper is organized as follows. In Section 2, we describe related work on activity recognition in smart environments. In Section 3, we give a description of the dataset linking activities and multiple sensor data. In Section 4, we present a brief review of the RNN structure, our proposed Residual-RNN model, and an explanation of our attention module step by step. In Section 5, we discuss the experimental results in detail. Lastly, we conclude in Section 6.

II. RELATED WORK

In the field of activity recognition, the use of sensors is extremely diverse. In fact, activity recognition is used in wearable devices and in smart industrial fields such as homes, farms, and factories. Among the various activity recognition fields, we focus on research based on smart homes. In particular, activity recognition with multi-sensor data in smart environments has become a hot topic in the field of industrial IoT. In this section, we review some recent studies on this topic based on deep learning.

Mehr, Polat, and Cetin [5] present artificial neural network based activity recognition in smart homes. They use the same MIT dataset as ours to evaluate accuracy using Quick Propagation (QP), Levenberg-Marquardt (LM), and Batch Back Propagation (BBP). However, the main difference in how they utilize the MIT dataset is that they only compared activities and did not use the sensors.

Fan, Zhang, Leung, and Miao [6] present recurrent neural network based activity recognition in a home-like environment. They use a multi-sensor system to identify activities performed by the inhabitants. To validate their proposed idea, they use deep learning techniques such as LSTM, GRU, and meta-layer networks.

Wang, Chen, Shang, Zhang, and Liu [7] present artificial neural networks for activity recognition. They adopt a stacked auto-encoder to extract features and integrate feature learning and training into a unified framework. To evaluate their method, they use three benchmark datasets from two different sensor types.

III. MULTI-SENSOR PATTERN ANALYSIS

To validate our proposed idea, we exploit a dataset provided by the MIT laboratory to obtain our experimental results [1]. The provided data consist of a variety of sensors and activities in a smart home.

A. Multiple Sensor Data Collection

The data were collected from 77 sensors equipped with reed switches that were installed in an apartment with one resident for two weeks. To understand the life pattern of the subject, sensors were installed in every piece of furniture in the apartment, such as refrigerators, doors, cabinets, and tables. The sensor suite includes pressure sensors, light sensors, temperature sensors, gas sensors, and other sensing devices. Once these sensors are in use, the main control server counts their activations over time. When an activation event is measured at a given time, it is marked as 1, otherwise 0. The distribution of activities is shown in Table 1. It lists the number of occurrences of each behavior over the two weeks and indicates the number of sensor activations associated with each activity. The raw data include overlapping sensor activations for some activities. Our dataset contains 295 activities in total, 76 sensors, and around 2,823 detected sensor activations.
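
As an illustration of this encoding (not part of the original pipeline; the field names and window length are our own), the following sketch turns a list of activation events into a binary time-by-sensor matrix:

```python
import numpy as np

# Hypothetical activation events: (sensor_id, time_slot) pairs, e.g. one slot per minute.
events = [("toilet_flush", 3), ("light_switch", 3), ("cabinet", 7)]
sensor_ids = ["light_switch", "toilet_flush", "cabinet"]   # 76 sensors in the real dataset
num_slots = 10                                             # length of the observation window

sensor_index = {s: i for i, s in enumerate(sensor_ids)}

# Binary time-by-sensor matrix: 1 if the sensor was activated in that slot, else 0.
activation = np.zeros((num_slots, len(sensor_ids)), dtype=np.int8)
for sensor, slot in events:
    activation[slot, sensor_index[sensor]] = 1

print(activation.sum())  # total number of recorded activations in the window
```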

TABLE I. DISTRIBUTION OF ACTIVITIES

Activities         Number of activities   Number of sensor activations
Watching TV        3                      47
Toileting          84                     404
Preparing lunch    17                     505
···
Grooming           37                     366
Bathing            18                     254

B. Analysis of the Relationship between Sensors and Activities

Ultimately, the aim of this paper is to conserve energy by predicting sensor usage based on the relationship between the sensors and the flow of time. Fig. 1 illustrates how to reduce energy consumption through the use of multiple sensors. First, in case (1), the sensor on the air conditioner is repeatedly used while the subject is watching TV in the living room. However, in terms of energy saving, it is not good to have a device consistently turned on and off recursively. Thus, if the control center understands this pattern, it can save energy by not turning off a device that is going to be used again. Second, when analyzing the use of the actual sensors, a way to deal with meaningless data is needed. For example, in case (2), the medicine cabinet sensor reading is meaningless data because it does not help the system recognize that the subject is currently preparing a meal. For the third case, (3), if a repeated behavior uses the same sensors repeatedly, the system can provide the service in advance; for example, preparing hot water for the sink when the subject turns on a light switch.


Hence, we address these cases to minimize sensor usage in conjunction with the RNN.

Fig. 1. Sensor usages according to various activities

IV. ACTIVITY RECOGNITION USING DEEP NEURAL NETWORKS

In this section, we provide details of the proposed model using deep neural networks based on a Residual-RNN structure. First, we apply the LSTM model to activity recognition, describe the use of multiple sensors as input parameters, and analyze the relationships among the sensors. In particular, we adapt the RNN structure because the multiple sensor input is a time series. The Residual-RNN structure is known for highly efficient analysis of sequential data with a minimal number of network layers, because residual vectors can reduce computation. Second, the GRU model is applied in the same way. Third, we describe the attention mechanism that filters out meaningless data to improve the overall performance.

A. Applying the Bi-directional RNN Model

Our proposed deep learning model exploits Residual-LSTM and Residual-GRU, two models that are widely used within the RNN family. These models are suitable for inputs with a time-series parameter, because in sensor-based activity recognition the output at each step depends continuously on the previous computation results. Fig. 2 illustrates the general bi-directional RNN structure, with the t-th cell in the hidden layers used to generate sensor embeddings. Initially, we set the learning rate and batch size for learning the input data. The learning rate should be set differently depending on the number of input samples. The batch size is determined by separating the input data into several smaller groups when the total data size is large, because a smaller batch size offers faster learning time and lower memory allocation. Thus, it is important to set a proper learning rate and batch size for the RNN. In Fig. 2, there are three layer sections: a sensor embedding layer, hidden layers, and feature vectors; bi-directional RNN cells are employed in the hidden layers. Our model works in the following order.

First, in the sensor embedding layer, we exploit the sensor ID to characterize each individual sensor. A set of multiple sensor data is synchronized in time and inserted into each t-th cell in the hidden layers. For example, one activity involves several sensors, which provide a set of input values; these are entered sequentially from i1 to it-1.
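
A minimal sketch of such an embedding lookup, assuming integer sensor IDs and an illustrative embedding size (neither is specified in the paper):

```python
import tensorflow as tf

num_sensors = 76          # number of distinct sensors in the MIT dataset
embedding_dim = 32        # illustrative embedding size (our assumption)

# Maps each sensor ID at each time step to a dense vector i_1 ... i_{t-1}.
sensor_embedding = tf.keras.layers.Embedding(input_dim=num_sensors, output_dim=embedding_dim)

sensor_id_sequence = tf.constant([[3, 17, 42, 5]])          # one activity as a sequence of sensor IDs
embedded_inputs = sensor_embedding(sensor_id_sequence)      # shape: (1, 4, 32)
print(embedded_inputs.shape)
```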

Second, in the hidden layers, suppose the model has K layers, each with cells indexed by t:

ht(f) = RNN(f)(It, ct-1)    (1)

ht(b) = RNN(b)(It, ct-1)    (2)

where ht(f) and ht(b) represent the hidden states at time t from the forward and backward RNNs, respectively, and It and ct-1 are the input value and the cell state at time t. The bi-directional RNN cell is shaped into a memory architecture. Since ht retains less information from the former cells, each cell is constructed with three gates (input gate, output gate, and forget gate) that regulate the memory information with weights between 0 and 1. At each time step, by adding the two hidden states, we construct a set of feature vectors ut = ht(f) + ht(b), where ut encodes the vector from the t-th corresponding sensor data. Note that we use LSTM and GRU cells as the hidden state model.
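
A minimal sketch of this step with the TensorFlow library mentioned in Section V, where merge_mode="sum" realizes ut = ht(f) + ht(b); the layer size follows our implementation settings, while the input shapes are illustrative assumptions:

```python
import tensorflow as tf

hidden_units = 256   # nodes per direction, matching the implementation settings in Section V

# Bi-directional LSTM over the embedded sensor sequence; the forward and backward
# hidden states at each time step are added to form the feature vectors u_t.
bi_lstm = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(hidden_units, return_sequences=True),
    merge_mode="sum",
)

embedded_inputs = tf.random.normal((1, 4, 32))   # (batch, time steps, embedding dim), assumed shapes
feature_vectors = bi_lstm(embedded_inputs)       # shape: (1, 4, 256), one u_t per time step
print(feature_vectors.shape)
```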

Third, for the feature vectors, we use a sensor-time label, which is a matrix over sensors and time steps. The matrix consists of binary units, where "1" means the sensor is used in a given time step and "0" means it is not. For example, even for a single activity such as "Toileting", multiple matrices are produced, because the sequence of sensors used differs between occurrences of the activity. However, these matrices are more similar to one another than to those of different activities. The output vectors are constructed by adding the contributions of every cell in the hidden layers, and this sequence of binary units makes activity recognition possible.

Fig. 2. Our Bi-directional RNN architecture for activity recognition


We describe the basic LSTM cell of the two cell structures, because a GRU cell, which uses only two gates, differs little from the LSTM cell structure. Fig. 3 shows how a single LSTM cell works. The LSTM cell essentially has three or four gate structures, which help control the flow of information into and out of its memory. In our proposed model, we utilize four gates: the input gate, output gate, forget gate, and input modulation gate. The specific equations are as follows:

i = ReLU(Wi It + Ei Ut-1 + bi)    (3)

o = ReLU(Wo It + Eo Ut-1 + bo)    (4)

f = ReLU(Wf It + Ef Ut-1 + bf)    (5)

m = tanh(Wm It + Em Ut-1 + bm)    (6)

ct = f • ct-1 + i • m    (7)

ut = o • tanh(ct)    (8)

where i, o, f, and m represent the input gate, output gate, forget gate, and input modulation gate, respectively, ct and ut are the cell state and feature vector, and W and E with the corresponding subscripts are the weight matrices of each gate in the cell.
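
A minimal NumPy sketch of one cell step following equations (3)-(8) as written (note that the gates use ReLU here, whereas a conventional LSTM uses the sigmoid); all shapes and the random weights are illustrative assumptions:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def lstm_cell_step(I_t, u_prev, c_prev, W, E, b):
    """One cell step following Eqs. (3)-(8): gates i, o, f and modulation m."""
    i = relu(W["i"] @ I_t + E["i"] @ u_prev + b["i"])     # input gate, Eq. (3)
    o = relu(W["o"] @ I_t + E["o"] @ u_prev + b["o"])     # output gate, Eq. (4)
    f = relu(W["f"] @ I_t + E["f"] @ u_prev + b["f"])     # forget gate, Eq. (5)
    m = np.tanh(W["m"] @ I_t + E["m"] @ u_prev + b["m"])  # input modulation gate, Eq. (6)
    c_t = f * c_prev + i * m                              # cell state, Eq. (7)
    u_t = o * np.tanh(c_t)                                # feature vector, Eq. (8)
    return u_t, c_t

dim_in, dim_h = 32, 256                                   # assumed input and hidden dimensions
rng = np.random.default_rng(0)
W = {g: rng.normal(scale=0.1, size=(dim_h, dim_in)) for g in "iofm"}
E = {g: rng.normal(scale=0.1, size=(dim_h, dim_h)) for g in "iofm"}
b = {g: np.zeros(dim_h) for g in "iofm"}

u_t, c_t = lstm_cell_step(rng.normal(size=dim_in), np.zeros(dim_h), np.zeros(dim_h), W, E, b)
print(u_t.shape, c_t.shape)
```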

Fig. 3. The detailed structure of a single LSTM cell

B. Semantic Data Analysis Using the Residual-RNN Architecture

In this section, in order to analyze the semantic data, we add a spatial domain with a shortcut path to deal with the vanishing gradient problem flexibly. The Residual-RNN performs better in analyzing the input sensor data than plain RNN models. For example, a general LSTM structure can keep a temporal gradient flow without attenuation by maintaining the forget gate. However, even when this gradient flow is unrelated to the relevant input data, it leaks into the next layer without any condition. In other words, bundling related input data is inefficient in a general LSTM structure. Fig. 4 represents our proposed Residual-RNN structure for activity recognition. To create the residual vectors that will be used in the next layers, the previous input and the processed output data are added together.

Fig. 4. Our proposed Residual-RNN structure for activity recognition

Fig. 5 shows a detailed residual LSTM cell. A residual vector can be obtained through a shortcut path that is added to the projection output ct. The shortcut path is designed to add the result of the previous input data and the processed output data directly. In addition to the equations of a single LSTM cell, the following equations are added:

zt = tanh(ct)    (9)

dt = Wp • zt    (10)

rt = ct • (dt + It-1)    (11)

where Wp is a projection matrix that can be replaced by an identity matrix when the input and output layers have the same dimension. Then rt is the residual vector, which is calculated by combining the previous input data It-1 and the dimension-matched vector dt with the projection output ct through the shortcut path.
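
A minimal sketch of the residual step following equations (9)-(11), reading • as element-wise multiplication; the identity Wp and the dimension of It-1 are illustrative assumptions:

```python
import numpy as np

def residual_step(c_t, I_prev, W_p):
    """Residual vector following Eqs. (9)-(11)."""
    z_t = np.tanh(c_t)                # Eq. (9)
    d_t = W_p @ z_t                   # Eq. (10): project to the required dimension if needed
    r_t = c_t * (d_t + I_prev)        # Eq. (11): combine shortcut input with the projection output
    return r_t

dim_h = 256
rng = np.random.default_rng(1)
c_t = rng.normal(size=dim_h)          # projection output from the LSTM cell
I_prev = rng.normal(size=dim_h)       # previous input data I_{t-1} (assumed same dimension)
W_p = np.eye(dim_h)                   # identity when input and output dimensions match

r_t = residual_step(c_t, I_prev, W_p)
print(r_t.shape)
```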

Fig. 5. The detailed structure of a single Residual-LSTM cell

C. Reducing Meaningless Data Using the Attention Methodology

To refine the results, attention mechanisms help neural network models selectively focus on the input data. We use soft attention, which has several advantageous properties. Since this method relies on back-propagation, the whole model can be trained at once in the end-to-end case. Thus, it is easy to compute, and it can approximate a hard attention function by selecting only a single fact. In Fig. 6, our attention module is applied to the output of the last hidden layer. Assume the list of outputs is {r'1, r'2, …, r't-1}, where each r'k is an element of Rm (m is the output dimension of hidden layer L-1).

cvt = ∑k=1…t-1 (r'k ⊗ it)    (12)

where the concatenated vector cvt initially concatenates the residual vectors r'k with the input data it, and ⊗ denotes concatenation. Ultimately, this increases the weight between the relevant multiple sensor data. We then adopt a dropout process to reduce the computation volume in a fully-connected layer. In this way, we can recognize human activities based on the multiple sensors used.
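
A minimal sketch of this attention step, concatenating each r'k with the input it and summing over k as in equation (12), followed by dropout and a fully-connected layer; the dropout rate follows our settings, while the scoring head over the sensors and all shapes are illustrative assumptions:

```python
import tensorflow as tf

def attention_context(residual_outputs, input_t, num_sensors=76, drop_rate=0.5):
    """Concatenate each r'_k with the current input i_t and sum over k, as in Eq. (12),
    then apply dropout and a fully-connected layer that scores the sensors."""
    t_minus_1 = residual_outputs.shape[1]                                   # number of residual outputs
    tiled_input = tf.tile(tf.expand_dims(input_t, 1), [1, t_minus_1, 1])    # repeat i_t for every k
    concatenated = tf.concat([residual_outputs, tiled_input], axis=-1)      # r'_k concatenated with i_t
    cv_t = tf.reduce_sum(concatenated, axis=1)                              # sum over k = 1 .. t-1
    cv_t = tf.keras.layers.Dropout(drop_rate)(cv_t, training=True)          # drop rate 0.5 from our settings
    return tf.keras.layers.Dense(num_sensors, activation="softmax")(cv_t)   # assumed scoring head over sensors

residual_outputs = tf.random.normal((1, 9, 256))    # r'_1 ... r'_{t-1} from the last hidden layer
input_t = tf.random.normal((1, 32))                 # current input i_t (embedding dimension assumed)
print(attention_context(residual_outputs, input_t).shape)                   # (1, 76)
```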

Fig. 6. Sensor Attention module

V. PERFORMANCE EVALUATION

A. Dataset and Methodology

Datasets

To evaluate our proposed model, we train on the multiple sensor and activity data from the MIT laboratory [1], which were collected from one resident in an apartment. All sensors were installed in everyday objects to record the resident's behavior over two weeks. A sensor was activated when the corresponding appliance worked or when movement was detected at the sensor. This dataset contains about 295 activities, 76 sensors, and 2,823 detected sensor activations. To create the refined dataset, we use the activity name, date, start time of the activity, end time of the activity, and the sensors. Table 2 shows an example of the refined dataset, which is formed to determine the relationship between the sensors and the activities over time. Thus, for each activity, the sequence of sensors is the important point.

TABLE II. REFINED DATASET FROM MULTIPLE SENSOR DATA

Activity name   Toileting       Preparing dinner   Doing laundry
Date            2003-03-28      2003-03-28         2003-03-29
Start time      12:30:56        19:44:07           15:43:11
End time        12:31:20        19:53:47           15:51:18
Sensor 1        Light switch    Cabinet            Door
Sensor 2        Toilet flush    Refrigerator       Washing machine
···
Sensor n        Cabinet         Oven               Exhaust fan
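
For illustration only, one refined record of the kind shown in Table 2 could be represented as follows (the field names are ours; the values come from the table):

```python
# One refined record from Table 2: an activity with its time window and ordered sensor sequence.
refined_record = {
    "activity_name": "Toileting",
    "date": "2003-03-28",
    "start_time": "12:30:56",
    "end_time": "12:31:20",
    "sensors": ["Light switch", "Toilet flush", "Cabinet"],  # ordered by activation time
}

print(refined_record["activity_name"], len(refined_record["sensors"]))
```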

Competitors

We compared our proposed Residual-LSTM/GRU models with the following baselines.

• ANN [5]: An Artificial Neural Network (ANN) is a standard comparison model in deep learning; the model in [5] uses only 10 neurons in its hidden layer.

• LSTM / GRU [6]: These typical RNN structures, the LSTM and GRU models with 2 hidden layers, are appropriate comparisons to our methods.

Implementation Details

We implemented the Residual-RNN structure in Python with the TensorFlow library. For the detailed Residual-RNN structure, we used the following settings: 1) our model consists of two bi-directional hidden layers, each with 256 nodes; 2) in the two hidden layers, we used a learning rate of 0.01 to account for the varying sensor volume, and a batch size of 3; 3) for the attention module, we used a dropout rate of 0.5 in the fully-connected layer for efficient computation.
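
A minimal configuration sketch under these settings (two bi-directional layers of 256 nodes, learning rate 0.01, batch size 3, dropout 0.5); the sequence length, class count, embedding size, and output head are illustrative assumptions, and the sketch uses standard bi-directional LSTM layers rather than the exact Residual-RNN cells:

```python
import tensorflow as tf

NUM_SENSORS, SEQ_LEN, NUM_ACTIVITIES = 76, 20, 9   # sequence length and class count are assumptions

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(SEQ_LEN,)),
    tf.keras.layers.Embedding(input_dim=NUM_SENSORS, output_dim=32),        # sensor-id embedding
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(256, return_sequences=True), merge_mode="sum"),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(256), merge_mode="sum"),
    tf.keras.layers.Dropout(0.5),                                            # drop rate from the attention/FC stage
    tf.keras.layers.Dense(NUM_ACTIVITIES, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),   # learning rate 0.01 as in the settings
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(x_train, y_train, batch_size=3, epochs=..., validation_data=(x_val, y_val))
```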

Evaluation Protocol

To evaluate the overall performance of each model on the real-world dataset, we randomly split the dataset into a training set of 150 samples (50% of the 295 activities), a validation set (25%), and a test set containing the remaining 25% of the data, which was used for recognition. As the evaluation measure, we used the Root Mean Square Error (RMSE), which is directly related to the objective function of a conventional rating prediction model. We repeated this evaluation procedure as 5-fold cross-validation over 20 random data splits and report the mean test errors.
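
A minimal sketch of the 50/25/25 split and the RMSE measure (the helper names and fixed seed are ours):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Square Error used as the evaluation measure."""
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def split_50_25_25(num_samples, seed=0):
    """Random 50% train / 25% validation / 25% test split over the activity samples."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(num_samples)
    n_train = num_samples // 2
    n_val = num_samples // 4
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_50_25_25(295)
print(len(train_idx), len(val_idx), len(test_idx))   # 147 73 75 (about the 150-sample training set used)

# Repeated evaluation: average the test RMSE over several random splits.
print(rmse([1, 0, 1, 1], [0.9, 0.2, 0.7, 1.0]))
```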

B. Quantitative Results of the Comparison Techniques

Table 3 shows the overall rating prediction error of ANN [5], LSTM/GRU [6], and Residual-LSTM/GRU. Compared with the two baselines, Residual-LSTM/GRU achieves significant improvements on all comparisons. In particular, a higher ratio of training data is more likely to result in higher accuracy in terms of the test RMSE. Owing to the small volume of the MIT dataset, the ANN architecture shows almost no difference from the LSTM/GRU structure. However, we show that we improve the performance by proposing the Residual-LSTM/GRU structure.

TABLE III. TEST RMSE OVER VARIOUS SPARSENESS OF TRAINING DATA

                  Ratio of training set to the entire dataset
Model             10%      20%      30%      40%      50%      60%      70%      80%      90%
ANN [5]           0.6802   0.6713   0.6708   0.6714   0.6738   0.7028   0.7076   0.7132   0.7121
LSTM [6]          0.6803   0.6847   0.6890   0.6937   0.6982   0.7030   0.7075   0.7121   0.7120
GRU [6]           0.6799   0.6842   0.6887   0.6932   0.6976   0.7023   0.7069   0.7115   0.7117
Residual-LSTM     0.8720   0.8777   0.8827   0.8880   0.8930   0.8971   0.9013   0.9049   0.9085
Residual-GRU      0.8652   0.8216   0.8516   0.8652   0.8652   0.8885   0.8812   0.9000   0.8952

C. Comparative Results of the LSTM and GRU Models

In Fig. 7, we compare the Residual-LSTM and Residual-GRU models for activity recognition in terms of the error loss rate. The two models tend to behave similarly. Although the Residual-LSTM shows slightly better performance, the Residual-GRU computes slightly faster, because the LSTM structure has one more gate than the GRU structure. Consequently, we confirmed that there is little difference between the two models.

Fig. 7. Error loss rate comparison between the Residual-LSTM and Residual-GRU models

D. Comparative Analysis of the Attention Mechanism

We obtained results by comparing the performance of the model with attention and the model without it.

Fig. 8. Error loss rate comparison over iteration steps with and without attention

In Fig. 8, we can see the difference between the two models. In particular, there is a clear distinction: the error loss rate is high without attention at the beginning. Although the gap between the two models gradually narrows as the iteration step increases, the model without attention still retains higher errors than the other. With attention, the similarity between sensor data has a significant effect on the output results, because the attention mechanism uses the learned outputs of the multiple sensor data to filter out irrelevant contributions to some degree.

VI. CONCLUSION AND FUTURE WORK

In this work, we developed a novel activity recognition approach for a smart home, a Residual-RNN with an attention module, which achieves higher performance than other methods. By recognizing the activities of a resident using a real sensor dataset, we aimed at efficient use of sensors that could help reduce overall energy consumption. Furthermore, applying the attention module eliminates meaningless data and produces better performance. Extensive results demonstrate that the Residual-RNN significantly outperforms the state-of-the-art competitors. In particular, our models achieve a higher accuracy rate than the others, although they tend to take a little more time. Also, with a larger dataset, we expect to obtain better results.

ACKNOWLEDGMENT

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science, and Technology (2016R1A2B4010142).

REFERENCES

[1] Tapia, E. M., Intille, S. S., and Larson, K., "Activity recognition in the home using simple and ubiquitous sensors," in International Conference on Pervasive Computing, pp. 158-175, April 2004.

[2] Chen, L., Nugent, C. D., and Wang, H., "A knowledge-driven approach to activity recognition in smart homes," IEEE Transactions on Knowledge and Data Engineering, 24(6), pp. 961-974, 2012.

[3] Cook, D. J., Crandall, A. S., Thomas, B. L., and Krishnan, N. C., "CASAS: A smart home in a box," Computer, 46(7), pp. 62-69, 2013.

[4] Chung, J., Gulcehre, C., Cho, K., and Bengio, Y., "Empirical evaluation of gated recurrent neural networks on sequence modeling," arXiv preprint arXiv:1412.3555, 2014.

[5] Mehr, H. D., Polat, H., and Cetin, A., "Resident activity recognition in smart homes by using artificial neural networks," in 4th International Istanbul Smart Grid Congress and Fair (ICSG), pp. 1-5, April 2016.

[6] Fan, X., Zhang, H., Leung, C., and Miao, C., "Comparative study of machine learning algorithms for activity recognition with data sequence in home-like environment," in Proc. IEEE MFI, 2016.

[7] Wang, A., Chen, G., Shang, C., Zhang, M., and Liu, L., "Human activity recognition in a smart home environment with stacked denoising autoencoders," in International Conference on Web-Age Information Management, pp. 29-40, June 2016.

[8] Van Kasteren, T., Noulas, A., Englebienne, G., and Kröse, B., "Accurate activity recognition in a home setting," in Proceedings of the 10th International Conference on Ubiquitous Computing, pp. 1-9, September 2008.

[9] Kim, J., El-Khamy, M., and Lee, J., "Residual LSTM: Design of a deep recurrent architecture for distant speech recognition," arXiv preprint arXiv:1701.03360, 2017.
