
Dynamic Hand Gesture Detection and Recognition with WiFi Signal Based on 1D-CNN

Xu Pan, Ting Jiang, Xudong Li, Xue Ding, Yangyang Wang, Yanan Li
Key Laboratory of Universal Wireless Communication, Beijing University of Posts and Telecommunications, Beijing, China

Abstract—Due to the rapid development of Internet of Things (IoT) technology and artificial intelligence, there is an urgent need for human-computer interaction (HCI) applications, in which dynamic hand gesture recognition based on WiFi signals plays an important role. However, although gesture recognition systems using Channel State Information (CSI) have made great progress, we observe that in current research most commercial network cards cannot directly extract such signals, while the easily acquired received signal strength (RSS) can only recognize simple gestures. Therefore, in this paper, we present a universal framework to achieve dynamic hand gesture detection and recognition with RSS. We use RSS from multiple independent WiFi nodes to raise the upper limit of recognition capability, enabling the system to recognize seven complex dynamic hand gestures. The false trigger detection algorithm effectively eliminates false triggers, and the detection accuracy reaches 91.38%. The system uses a state machine and a linear scale algorithm to accommodate different hand gesture speeds, with durations ranging from 0.9s to 5.4s. Furthermore, we analyze the errors of the detection algorithm and propose a recognition architecture based on a One-Dimensional Convolutional Neural Network (1D-CNN) together with two data collection strategies: gesture extending and gesture shifting. The proposed 1D-CNN effectively overcomes the error caused by the hand gesture detection algorithm, and the recognition accuracy reaches 86.91%. Combined with the gesture shifting strategy, the recognition accuracy is further improved to 93.03%.

Keywords—dynamic hand gesture, detection, recognition, WiFi, 1D-CNN

I. INTRODUCTION

With the development of science and technology, human-computer interaction (HCI) technology is evolving. In recent years, with the rise of the Internet of Things (IoT) and artificial intelligence, HCI based on human gesture recognition has become a hot topic in both academia and industry. Traditional gesture recognition technologies are mainly based on wearable sensors [1], mobile devices [2] and cameras [3,4], and they have been a great success. However, wearing sensors or using a mobile device sometimes affects the user experience, while a camera can only work in line-of-sight (LOS) situations. To overcome these limitations, target recognition based on wireless signals has become a research hotspot in academia. WiTrack [5], WiTrack2.0 [6] and RF-Capture [7] use frequency modulated continuous wave (FMCW) technology to realize human body detection. AllSee [8] can recognize gestures with radio frequency identification (RFID) tags. However, FMCW is not common in daily life, and RFID has only small signal coverage. In contrast, WiFi exists ubiquitously in almost every corner of daily life and has larger signal coverage. It is therefore extremely valuable to use WiFi signals effectively for gesture recognition.

In the research of dynamic gesture recognition based on WiFi signals, there are approaches based on received signal strength indication (RSSI), channel state information (CSI) and special signal processing based on software defined radio (SDR) device. Most systems based on RSSI can only be used for simple gesture recognition [9-12]. Although CSI-based systems can recognize fine-grained gestures, the extraction of CSI requires special network cards and drivers [13-19]. The special signal processing approach based on SDR will modify the processing flow of the signal [20,21]. Therefore, CSI-based and SDR-based methods are not universal.

In this paper, we present a universal framework to achieve dynamic hand gesture detection and recognition based on WiFi signals. To overcome the different types of RSSI output by different network cards, we define the instantaneous received signal strength (IRSS) to describe the general RSS. We use IRSS extracted from multiple nodes to enhance recognition capability and realize the recognition of complex dynamic hand gestures. We conduct a detailed study on hand gesture detection and segmentation, analyze the error caused by the detection algorithm, and propose a false trigger detection algorithm so that the detection accuracy reaches 91.38% without false triggers. The system uses a state machine and a linear scale algorithm to accommodate different hand gesture speeds, with durations ranging from 0.9s to 5.4s. The proposed 1D-CNN effectively overcomes the error caused by the hand gesture detection algorithm, and the recognition accuracy reaches 86.91%. Combined with the gesture shifting strategy, the recognition accuracy is further improved to 93.03%.

II. RELATED WORK

Human gesture detection technology can generally be divided into several categories, such as wearable-device-based methods [1,2], ultrasound-based methods [22], computer-vision-based methods [3,4] and wireless-signal-based methods. The wireless-signal-based approaches can be further divided into RF-based [5-8], UWB-based [23,24] and WiFi-based approaches.

Among the WiFi-based methods, there are RSS-based, CSI-based and special signal processing methods [20,21]. For RSS-based methods, RADAR [9] uses the RSSI of multiple nodes to track the target with fingerprint recognition. The work in [10] studies human body position tracking by analyzing the influence of body position on the RSSI of multiple nodes. RASID [11] combines different modules for statistical anomaly

978-1-7281-2373-8/19/$31.00 ©2019 IEEE

detection while adapting to changes in the environment to provide accurate, robust and low-overhead detection of human activities. WiGest [12] uses RSSI to acquire basic actions such as rising, falling and stopping through signal processing without training, and can recognize gestures composed of these basic actions. For CSI-based approaches, E-eyes [13] implements fine-grained gesture recognition using CSI. PADS [14] suppresses the influence of equipment and environment on the original CSI data, extracts more accurate phase change information and improves the recognition accuracy. CARM [15] proposes a model-based method for human gesture recognition using CSI through the combination of a CSI-speed model and a CSI-activity model. In addition, there are many other studies based on the CSI approach, including finger gesture recognition [16], gait recognition [17], walking direction recognition [18], smoking detection [19] and so on.

Different from all the researches above, in this paper, we present a universal framework to achieve dynamic hand gesture detection and recognition with RSS.

III. SCENARIO SETUP

The established scenario and the detailed hand gestures are depicted in Fig.1. The scenario consists of one single-antenna WiFi transmitter placed on the left of a table and two single-antenna WiFi receivers placed on the right. The transmitter is a common IoT WiFi device, CC3200, which transmits standard WiFi frames with a period of 30ms; the receivers are two universal software radio peripherals (USRP) which extract the received WiFi signals. The central frequency of the WiFi signal is set as 2.412GHz and the protocol is standard IEEE 802.11g [25]. A volunteer is asked to perform different hand gestures in the recognition area. The designed hand gestures are drawing the 7 characters "ABCDEFG". A hand gesture is marked as started after the user moves his hand from chest to desktop; the user then draws the character with his hand touching the table within the recognition area, and the gesture is marked as ended when the user lifts his arm and his hand leaves the recognition area.

IV. METHODOLOGY

The system structure is shown in Fig.2; it consists of several modules that realize data extraction, synchronization, denoising, detection and recognition.

Fig.1. The established scenario and the designed hand gestures “ABCDEFG” (Bold point indicates the start point)

Fig.2. Structure overview of the universal framework for dynamic hand gesture detection and recognition

A. Information Pre-Processing

1) IRSS Extraction
The characteristics of the received WiFi signal are determined by the channel it passes through. When an object moves close to the LOS path, the received WiFi signal is mainly affected by the moving object; in other words, there is a very strong relationship between the movement of the object and the change of the received signal. By processing and analyzing the received signal, the system can learn this relationship and classify the movements accordingly. In this paper, the IRSS is extracted from the long training sequences of the preamble of the WiFi frame [25], which consist of 128 points denoted by (P1, P2, …, P128), where Pi represents the normalized value, so IRSS represents the normalized amplitude:

$IRSS = \frac{1}{128}\sum_{i=1}^{128}|P_i|$ (1)
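Equation (1) is simply the mean absolute amplitude of the 128 long-training-sequence points of one frame preamble. A minimal sketch (the function name and the use of plain Python lists are illustrative assumptions):

```python
def irss(preamble):
    """IRSS of one WiFi frame: mean absolute amplitude of the 128
    normalized long-training-sequence points P_1..P_128 (Eq. 1)."""
    assert len(preamble) == 128
    return sum(abs(p) for p in preamble) / 128.0
```

Since `abs()` also handles complex numbers, the same sketch works whether the receiver outputs real or complex baseband points.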

In this system, the two receivers work independently, so they may detect the same WiFi frame at different times. To solve this problem, the system uses an external clock reference source to record each IRSS together with its timestamp. The synced IRSS stream 𝑰𝑹𝑺𝑺 is then generated by a resample module with a period of 30ms.

$\boldsymbol{IRSS} = \begin{bmatrix} IRSS_1^1 & IRSS_1^2 & \cdots & IRSS_1^n & \cdots \\ IRSS_2^1 & IRSS_2^2 & \cdots & IRSS_2^n & \cdots \end{bmatrix}$ (2)
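Because the two receivers timestamp frames independently, building the synced matrix of (2) amounts to resampling each timestamped stream onto a common 30ms grid. A sketch assuming a zero-order hold at each grid tick (the paper does not specify the resampling rule):

```python
def resample(stream, t0, t1, period_ms=30):
    """Resample one timestamped IRSS stream -- a time-sorted list of
    (timestamp_ms, value) pairs -- onto the uniform grid t0, t0+30, ..., t1,
    holding the most recent sample at each grid tick (zero-order hold)."""
    out, j, t = [], 0, t0
    while t <= t1:
        # advance to the last sample whose timestamp is <= the current tick
        while j + 1 < len(stream) and stream[j + 1][0] <= t:
            j += 1
        out.append(stream[j][1])
        t += period_ms
    return out

def synced_irss(stream1, stream2, t0, t1):
    """The synced matrix IRSS of Eq. (2): one resampled row per receiver."""
    return [resample(stream1, t0, t1), resample(stream2, t0, t1)]
```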

2) Denoising Module
It is necessary to eliminate the environmental noise after obtaining the synced and resampled signal. By implementing Mallat's algorithm [26], the system decomposes the signal into three vectors cD1, cD2, cD3 corresponding to the high-frequency components and one vector cA3 corresponding to the low-frequency component. Since the noise appears mostly as very small values in the high-frequency vectors, the denoised signal can be reconstructed by removing those values and recomposing the modified vectors. Instead of setting a static threshold to identify the noise components in the high-frequency vectors, an adaptive-threshold noise elimination algorithm is implemented so that this module performs better. The threshold thi is calculated from each input vector cDi as below, with the scale factor α set as 0.1. The structure overview of the module is depicted in Fig.3.

$th_i = \frac{\alpha}{length(cD_i)} \sum_{v \in cD_i} |v|, \quad i \in [1,3]$ (3)
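The whole denoising step can be sketched with a three-level Mallat decomposition (the HPF/LPF-and-downsample cascade of Fig.3) plus the adaptive threshold of (3). The Haar filter pair is an illustrative assumption, since the paper does not name the wavelet, and the signal length is assumed divisible by 8:

```python
import math

R2 = math.sqrt(2.0)

def dwt_step(x):
    # One Mallat analysis step with Haar filters: LPF/HPF, then downsample by 2.
    a = [(x[2 * i] + x[2 * i + 1]) / R2 for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) / R2 for i in range(len(x) // 2)]
    return a, d

def idwt_step(a, d):
    # One Mallat synthesis step (inverse of dwt_step for the Haar pair).
    x = []
    for ai, di in zip(a, d):
        x += [(ai + di) / R2, (ai - di) / R2]
    return x

def denoise(signal, alpha=0.1, levels=3):
    """Adaptive-threshold wavelet denoising: decompose into cD1..cD3 and cA3,
    zero every detail coefficient smaller than th_i of Eq. (3), reconstruct."""
    a, details = list(signal), []
    for _ in range(levels):                # analysis: collect cD1, cD2, cD3
        a, d = dwt_step(a)
        details.append(d)
    for d in details:                      # zero coefficients below th_i
        th = alpha * sum(abs(v) for v in d) / len(d)
        for i, v in enumerate(d):
            if abs(v) < th:
                d[i] = 0.0
    for d in reversed(details):            # synthesis from cA3 back up
        a = idwt_step(a, d)
    return a
```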

Page 3: Dynamic Hand Gesture Detection and Recognition with WiFi ...static.tongtianta.site/paper_pdf/b8d9d0aa-ad4a-11e... · (IoT) technology and artificial intelligence, there is an urgent

Fig.3. Structure overview of denoise module

B. Hand Gesture Detection Algorithm

1) Basic Idea
When the user is performing a hand gesture in the recognition area, the channel changes a lot, so the extracted IRSS stream becomes unstable. On the contrary, when there is no hand gesture within the recognition area, the channel is relatively stable, so the variance of the extracted IRSS stream is small. By analyzing the variance of the data stream, the system can identify the start time and the end time of a hand gesture and generate the corresponding segment.

2) Algorithm
The structure of the detection module is depicted in Fig.4 and the detection algorithm is summarized as Algorithm 1. First, a rolling window computes the zero mean ZM of the IRSS to eliminate the variance caused by environmental noise. The absolute value of ZM then represents the amplitude of the waveform fluctuation, and by taking the minimum absolute value of ZM at each time point, the output MA represents the common variance caused by the hand gesture. Next, a smooth filter with a "smooth → zero mean → smooth → amplify" structure extracts the main variance SF of MA, so that every peak in SF represents a possible hand gesture. After a difference operation, the variance trend of SF is calculated as DF and stored in a buffer. By implementing a state machine, the indices of the start and end points can be calculated and the segment Frag extracted from the FIFO. A validation module drops any Frag shorter than 30 points or longer than 180 points, which means the valid duration range is from 0.9s to 5.4s. The validation module also implements a kNN-based algorithm to identify Frags caused by environmental change, realizing false trigger detection. Finally, a linear scale module scales Frag into FragN of unified length.

The detail of the state machine with adaptive thresholds in Algorithm 1 is as follows. As mentioned above, each peak in SF represents a hand gesture, as shown in Fig.5, so the algorithm can find the start and end points by setting a threshold on its slope. Fig.6 compares SF and DF and shows the three states of DF crossing the start threshold th_s, zero, and the end threshold th_e; the shaded part in Fig.6 indicates the time range of the hand gesture. The corresponding state machine is shown in Fig.7. The adaptive thresholds are calculated as below, where the scale factor 𝛾 is set as 0.35; th_s_min and th_e_max improve the robustness of the system when no hand gesture happens in the recognition area and are set as 7×10⁻⁵ and −7×10⁻⁵.

$th\_s = \max(\max(\boldsymbol{DF}) \cdot \gamma,\ th\_s\_min)$ (4)

$th\_e = \min(\min(\boldsymbol{DF}) \cdot \gamma,\ th\_e\_max)$ (5)

Algorithm 1 Hand Gesture Detection Algorithm

Input: denoised IRSS stream 𝑰𝑹𝑺𝑺

Output: data segmentation of unified length FragN

1. Calculate the zero mean ZM over a (2k+1)-point rolling window, with k set as 3:

$ZM_1^i = IRSS_1^i - \frac{1}{2k+1}\sum_{j=-k}^{k} IRSS_1^{i+j}$

$ZM_2^i = IRSS_2^i - \frac{1}{2k+1}\sum_{j=-k}^{k} IRSS_2^{i+j}$

$\boldsymbol{ZM} = \begin{bmatrix} ZM_1^1 & ZM_1^2 & \cdots & ZM_1^n & \cdots \\ ZM_2^1 & ZM_2^2 & \cdots & ZM_2^n & \cdots \end{bmatrix}$

2. Calculate the minimum absolute value MA of ZM:

$MA_i = \min(|ZM_1^i|, |ZM_2^i|)$, $\boldsymbol{MA} = [MA_1, MA_2, \ldots, MA_n, \ldots]$

3. Implement a smooth filter on MA:

$\boldsymbol{SF} = SF(\boldsymbol{MA})$

4. Implement a difference operation on SF to get DF:

$DF_i = SF_{i+1} - SF_i$, $\boldsymbol{DF} = [DF_1, DF_2, \ldots, DF_n]$

5. Implement a state machine with adaptive thresholds on DF to get the indices of the start point $i_{start}$ and end point $i_{end}$ of the hand gesture, and extract the segment from the FIFO:

$\boldsymbol{Frag} = \begin{bmatrix} IRSS_1^{i_{start}} & \cdots & IRSS_1^{i_{end}} \\ IRSS_2^{i_{start}} & \cdots & IRSS_2^{i_{end}} \end{bmatrix}$

6. Validate Frag with a false trigger detection algorithm.

7. Linearly scale Frag into FragN of unified length.
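Steps 1–5 above can be sketched as follows. The paper's smooth filter is a "smooth → zero mean → smooth → amplify" chain; here it is simplified to a single moving average, and the state machine is a simplified reading of Fig.7, so exact indices will differ from the real system:

```python
def zero_mean(x, k=3):
    """Step 1: subtract a (2k+1)-point rolling mean (window clipped at edges)."""
    out = []
    for i in range(len(x)):
        w = x[max(0, i - k):i + k + 1]
        out.append(x[i] - sum(w) / len(w))
    return out

def min_abs(zm1, zm2):
    """Step 2: per-sample minimum absolute value over the two streams."""
    return [min(abs(a), abs(b)) for a, b in zip(zm1, zm2)]

def smooth(x, k=3):
    """Step 3, simplified: one moving average stands in for the paper's
    smooth -> zero mean -> smooth -> amplify chain (an assumption)."""
    return [sum(x[max(0, i - k):i + k + 1]) / len(x[max(0, i - k):i + k + 1])
            for i in range(len(x))]

def diff(x):
    """Step 4: first difference DF_i = SF_{i+1} - SF_i."""
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]

def find_gesture(df, gamma=0.35, th_s_min=7e-5, th_e_max=-7e-5):
    """Step 5: state machine with the adaptive thresholds of Eqs. (4)-(5).
    Transition logic is a simplified reading of Fig.7: start when DF crosses
    th_s upward, end when DF dips below th_e and then recovers."""
    th_s = max(max(df) * gamma, th_s_min)
    th_e = min(min(df) * gamma, th_e_max)
    state, i_start = 0, None
    for i, v in enumerate(df):
        if state == 0 and v > th_s:
            state, i_start = 1, i
        elif state == 1 and v < th_e:
            state = 2
        elif state == 2 and v > th_e:
            return i_start, i
    return None                            # no gesture detected
```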

Fig.4. Structure overview of hand gesture detection module

Fig.5. The comparison of MA and SF

Fig.6. The three states for the detection algorithm in DF

Page 4: Dynamic Hand Gesture Detection and Recognition with WiFi ...static.tongtianta.site/paper_pdf/b8d9d0aa-ad4a-11e... · (IoT) technology and artificial intelligence, there is an urgent

Fig.7. The detail of the state machine. (For state transition conditions, the dashed line indicates the threshold and the arrow indicates the direction in which the DF crosses the threshold.)

Fig.8. The structure of the 1D-CNN

Fig.9. Recognition accuracy when kernel_length and 𝑁𝑓𝑖𝑙𝑡𝑒𝑟(filter_number) take different values

C. Hand Gesture Recognition

1) Basic Idea
Since a smooth filter is used in the detection algorithm, the extracted start and end points of the hand gesture are not absolutely accurate: there may be a slight offset between the extracted hand gesture segment and the best estimated one. Therefore, the recognition algorithm should be able to overcome this problem. Since convolutional neural networks (CNN) are also called shift-invariant neural networks, a CNN is suitable for this scenario. In addition, it is also possible to mitigate this problem by extending the hand gesture extraction window or shifting the extraction window to collect more samples.

2) 1D-CNN
Instead of the common two-dimensional convolutional layer used in image processing, the convolution kernel of a 1D-CNN is convolved with the input data moving along only one dimension. The structure of the 1D-CNN is depicted in Fig.8. The length N of FragN is set as 120. Among the 120 points there are some redundant data points, so fewer than 120 points correspond to the actual hand gesture. Therefore, the kernel length kernel_length is set to a value less than 120, the stride is set as 1, and the filter number is set as 𝑁𝑓𝑖𝑙𝑡𝑒𝑟. With this design, the convolution kernel scans each contiguous kernel_length-point slice of the input data to extract features sequentially. The length 𝑁𝑜𝑢𝑡 of the convolutional layer output is calculated as below.

$N_{out} = \left\lfloor \frac{N - kernel\_length}{stride} \right\rfloor + 1$ (6)

The length of the pooling layer is also set as 𝑁𝑜𝑢𝑡, which means the sensing area of the pooling layer covers all scanning positions of the convolution kernel. The pooling layer is a maxpooling layer that selects the best features from the entire sensing area. Therefore, the combination of the convolutional layer and the maxpooling layer acts as a shift-invariant feature selector, which is insensitive to the absolute position of the actual hand gesture data within the input layer. Through the scanning of the convolutional kernel, features for each step are available to the maxpooling layer, which finally chooses the best one as the output. This mechanism can therefore always find the best estimated position of the actual hand gesture within the input segment.
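The shift-invariance argument can be illustrated with a single toy filter (one kernel, no bias or activation, global max pooling over all N_out positions; all values below are illustrative):

```python
def conv1d_maxpool(x, kernel):
    """One 1D convolution filter scanned with stride 1 over all
    N_out = len(x) - len(kernel) + 1 positions (Eq. 6), followed by a
    maxpooling layer whose sensing area covers every position."""
    klen = len(kernel)
    responses = [sum(kernel[j] * x[i + j] for j in range(klen))
                 for i in range(len(x) - klen + 1)]
    return max(responses)

# The same gesture-like pattern placed anywhere in a 120-point segment
# yields the same pooled feature:
pattern = [0.0, 1.0, 2.0, 1.0, 0.0]
seg_a = [0.0] * 10 + pattern + [0.0] * 105   # pattern near the start
seg_b = [0.0] * 80 + pattern + [0.0] * 35    # pattern near the end
```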

In the 1D-CNN network proposed in this paper, the parameters kernel_length and 𝑁𝑓𝑖𝑙𝑡𝑒𝑟 are very important. Fig.9 shows the recognition rate of the system when the two parameters take different values; there are three local best points in Fig.9. To balance the complexity and recognition accuracy of the system, the kernel length is set as 60 and 𝑁𝑓𝑖𝑙𝑡𝑒𝑟 is set as 75.

3) Data Collection Strategies
Two data collection strategies are implemented in this paper: gesture extending and gesture shifting. In order to always include the best estimated start and end points of the actual hand gesture, the main idea of the gesture extending strategy is to take more points before 𝑖𝑠𝑡𝑎𝑟𝑡 and after 𝑖𝑒𝑛𝑑: the extraction range of Frag is extended, with 𝑖𝑠𝑡𝑎𝑟𝑡 updated to (𝑖𝑠𝑡𝑎𝑟𝑡 − 𝛽𝐿) and 𝑖𝑒𝑛𝑑 updated to (𝑖𝑒𝑛𝑑 + 𝛽𝐿), where 𝛽 is set as the empirical value 10%. For the gesture shifting strategy, the main idea is to shift the extraction window to collect more samples. The shifting distance k is set as 5 points; whenever the system detects a new hand gesture, the extraction window moves k points to the left step by step to extract k hand gesture samples at different positions, and the same method also applies to moving to the right.
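The gesture shifting strategy can be sketched as window extraction at several offsets around the detected segment. The per-step shift of 1 point and the boundary check are assumptions of this sketch; the paper only fixes the shifting distance k at 5 points:

```python
def shifted_samples(stream1, stream2, i_start, i_end, k=5):
    """Gesture shifting: besides the detected window [i_start, i_end],
    extract k extra samples with the window moved left and k more moved
    right, yielding up to 2k+1 training samples per detected gesture."""
    samples = []
    for off in range(-k, k + 1):
        s, e = i_start + off, i_end + off
        if 0 <= s and e < len(stream1):    # keep shifted windows in range
            samples.append([stream1[s:e + 1], stream2[s:e + 1]])
    return samples
```

Gesture extending is the same idea applied once: widen the window itself by βL points on each side instead of sliding it.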

V. EVALUATION

A. Dataset collection

To evaluate the performance of the proposed framework, five volunteers were invited to perform hand gestures to collect the dataset, and one more volunteer was asked to manually record the time stamp of each hand gesture. The manually recorded time stamps can be used to identify the false triggers and miss detections of the hand gesture detection algorithm. As shown in Fig.10, the vertical dashed lines indicate the time stamps recorded by the volunteer. By comparison, the 61st and 63rd detected hand gestures are false triggers, and a hand gesture is missed between the 60th and 62nd detections. The experimental results are summarized in Table I. In addition, a falsely triggered dataset was collected independently to evaluate the performance of the false trigger detection algorithm.

The durations of the collected hand gestures were also measured, with the result shown in Fig.11. It can be seen that it is reasonable for the system to limit the valid hand gesture duration to the range from 0.9s to 5.4s.

B. 1D-CNN performance evaluation

We compare the 1D-CNN with traditional machine learning algorithms including support vector machine (SVM), logistic regression (LR), decision tree (DT), naive Bayes (NB), k nearest neighbors (kNN), random forest (RF), gradient boosting (GB) and a fully connected neural network (NN). Traditional classifiers cannot extract features automatically, so we tested their performance on both raw features and special features, where the special features consist of statistical features and features from [24]. The results without any additional strategy are shown in Fig.12: the accuracy of the 1D-CNN is significantly higher than that of the traditional algorithms. The performance comparison under different data collection strategies for the 1D-CNN is shown in Fig.13; both strategies effectively improve the performance of the system.

C. Hand Gesture detection algorithm performance evaluation

The quality of the detection module depends on the accuracy and the false trigger rate. Once the detection module detects an invalid hand gesture, the recognition module will assume it is valid data and make a prediction, so false triggers affect the performance of the system. Since false triggers are often caused by sudden changes in the environment, there should be a large difference in data distribution between falsely triggered data and valid data. Using the kNN algorithm and obtaining the minimum distance among the k nearest neighbors, the system can verify the validity of the input data by setting an appropriate threshold. The distributions of this minimum distance for falsely triggered data and valid data are depicted in Fig.14. According to these results, we set the threshold as 20 to avoid false triggers, and the system finally achieves a detection accuracy of 91.38%.
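The false trigger check can be sketched as a nearest-neighbor distance test against the valid training samples. The Euclidean metric and flattened feature vectors are assumptions of this sketch; the threshold of 20 comes from Fig.14:

```python
import math

def min_knn_distance(sample, training_set):
    """Minimum distance among a candidate segment's k nearest valid training
    samples -- which reduces to the distance to the single nearest neighbor."""
    return min(math.dist(sample, t) for t in training_set)

def is_false_trigger(sample, training_set, threshold=20.0):
    # Falsely triggered segments lie far from every valid gesture sample.
    return min_knn_distance(sample, training_set) > threshold
```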

Fig.10. False triggers and miss detections of the detection algorithm

TABLE I. EXPERIMENTAL RESULTS FOR THE DATASET COLLECTION

Label  Number  Detected  False Triggers  Miss Detections
A      93      95        7               5
B      94      96        8               6
C      92      80        0               12
D      93      81        0               12
E      92      86        0               6
F      92      89        1               4
G      94      85        2               11

Fig.11. The dynamic hand gesture time range of the collected data (solid triangle represents the mean values)

Fig.12. Performance comparison between 1D-CNN and traditional machine learning algorithm with no additional strategy

Fig.13. Performance comparison under different data collection strategies for 1D-CNN

Fig.14. The data distribution of the minimum distance in the k-nearest neighbor for the falsely triggered data and the valid data

Fig.15. Comparison between one receiver and multiple receivers

D. Comparison between one receiver and multiple receivers

We compare the performance of using only one receiver with that of using multiple receivers. The results are shown in Fig.15: when the system uses two receivers, the recognition rate for complex gestures is about 10% higher than when using only one receiver.

VI. CONCLUSION

In this paper, we propose a dynamic hand gesture detection and recognition system based on the RSS of WiFi signals. The proposed system extracts RSS from multiple independent WiFi receivers to improve its recognition capability for complex dynamic hand gestures. With the false trigger detection algorithm, the system avoids false triggers caused by environmental changes; with the linear scale module, it accommodates different hand gesture speeds. We also analyzed the possible errors caused by the detection algorithm; the proposed 1D-CNN effectively overcomes these errors, and the recognition accuracy reaches 86.91%. Combined with the gesture shifting strategy, the recognition accuracy is further improved to 93.03%.

ACKNOWLEDGMENT

This work is supported by the National Natural Science Foundation of China (NSFC) (No.61671075) and the Major Program of the National Natural Science Foundation of China (No.61631003).

REFERENCES

[1] Liang R H, Ouhyoung M. A real-time continuous gesture recognition system for sign language[C]// Proc. IEEE Int. Conf. on Automatic Face and Gesture Recognition, 1998: 558.

[2] Agrawal S, Constandache I, Gaonkar S, et al. Using mobile phones to write in air[C]// Proceedings of the 9th International Conference on Mobile Systems, Applications, and Services (MobiSys '11). ACM, 2011: 15.

[3] Biswas K K, Basu S K. Gesture recognition using Microsoft Kinect®[C]// International Conference on Automation. IEEE, 2012.

[4] Potter L , Araullo J , Carter L . The Leap Motion controller: a view on sign language[C]// Australian Computer-human Interaction Conference: Augmentation. ACM, 2013.

[5] Adib F , Kabelac Z , Katabi D , et al. 3D Tracking via Body Radio Reflections[C]// Usenix Conference on Networked Systems Design & Implementation. USENIX Association, 2013.

[6] Adib F , Kabelac Z E , Katabi D . Multi-person localization via RF body reflections[C]// Usenix Conference on Networked Systems Design & Implementation. USENIX Association, 2015.

[7] Adib F , Hsu C Y , Mao H , et al. Capturing the human figure through a wall[J]. Acm Transactions on Graphics, 2017, 34(6):1-13.

[8] Kellogg B , Talla V , Gollakota S . Bringing gesture recognition to all devices[C]// Usenix Conference on Networked Systems Design & Implementation. USENIX Association, 2014.

[9] Bahl P, Padmanabhan V N. RADAR: An in-building RF-based user location and tracking system[C]// Proc. IEEE INFOCOM 2000, 2000, 2: 775-784.

[10] Youssef M. The Horus WLAN location determination system challenges: Device-free passive localization for wireless environments[J]. 2007.

[11] Kosba A E , Saeed A , Youssef M . RASID: A Robust WLAN Device-free Passive Motion Detection System[C]// IEEE International Conference on Pervasive Computing and Communications (Percom 2012). IEEE, 2012.

[12] Abdelnasser H, Youssef M, Harras K A. WiGest: A ubiquitous WiFi-based gesture recognition system[C]// 2015 IEEE Conference on Computer Communications (INFOCOM). IEEE, 2015: 1472-1480.

[13] Wang Y, Liu J, Chen Y, et al. E-eyes: Device-free location-oriented activity identification using fine-grained WiFi signatures[C]// International Conference on Mobile Computing and Networking (MobiCom). ACM, 2014.

[14] Qian K , Wu C , Yang Z , et al. PADS: Passive detection of moving targets with dynamic speed using PHY layer information[C]// 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS). IEEE Computer Society, 2014.

[15] Shahzad M , Lu S , Wang W , et al. Understanding and Modeling of WiFi Signal Based Human Activity Recognition[C]// International Conference on Mobile Computing & Networking. ACM, 2015.

[16] Tan S , Yang J . WiFinger: leveraging commodity WiFi for fine-grained finger gesture recognition[C]// the 17th ACM International Symposium. ACM, 2016.

[17] Wang W , Liu A X , Shahzad M . Gait recognition using WiFi signals[C]// Acm International Joint Conference on Pervasive & Ubiquitous Computing. ACM, 2016.

[18] Wu D , Zhang D , Xu C , et al. WiDir: walking direction estimation using wireless signals[C]// the 2016 ACM International Joint Conference. ACM, 2016.

[19] Zheng X , Wang J , Shangguan L , et al. Smokey: Ubiquitous smoking detection with commercial WiFi infrastructures.[C]// IEEE Infocom -the IEEE International Conference on Computer Communications. IEEE, 2016.

[20] Pu Q , Jiang S , Gollakota S . Whole-Home Gesture Recognition Using Wireless Signals (Demo)[C]// Acm Sigcomm Conference on Sigcomm. ACM, 2013.

[21] Adib F , Katabi D . See through walls with WiFi![J]. Acm Sigcomm Computer Communication Review, 2013, 43(4):75-86.

[22] Gupta S, Morris D, Patel S, et al. SoundWave: Using the Doppler effect to sense gestures[C]// Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI). ACM, 2012.

[23] Hui M A , Zhiwen Y U , Xiangchao F , et al. Human activity recognition based on UWB location system[J]. Computer Engineering & Applications, 2012.

[24] Yan M , Jiang T , Liu Y , et al. QGA-based feature selection of target recognition by UWB communication signal in foliage environment[C]// IEEE International Conference on Communication Workshop. IEEE, 2015.

[25] IEEE. IEEE Standard for Telecommunications and Information Exchange Between Systems - LAN/MAN Specific Requirements - Part 11: Wireless Medium Access Control (MAC) and physical layer (PHY) specifications: High Speed Physical Layer in the 5 GHz band[C]// IEEE Std 80211a. IEEE, 2002.

[26] Shensa M J. The discrete wavelet transform: wedding the à trous and Mallat algorithms[J]. IEEE Transactions on Signal Processing, 1992, 40(10): 2464-2482.