
Low Power Embedded Gesture Recognition Using Novel Short-Range Radar Sensors

Michele Magno, Emanuel Eggimann, Jonas Erb, Philipp Mayer, Luca Benini

Integrated Systems Laboratory, ETH Zurich

Gesture Recognition Based on Short-Range Radar

- Increasing research on radar for gesture recognition [1,2,3,4]
- Google developed a micro-radar for gesture recognition
- Good results on difficult hand gestures: 90% accuracy on 11 gestures and 10 people


Conclusion

This work presented a high-accuracy, low-power hand-gesture recognition system based on short-range radar. Two large datasets with 11 challenging hand gestures performed by 26 different people, containing a total of 20,210 gesture instances, were recorded; on them the final algorithm reaches an accuracy of up to 92%. The model size is only 91 kB, and the GAP8 implementation shows that live prediction is feasible with a prediction-network power consumption of only 21 mW.

[1] Soli: Ubiquitous Gesture Sensing with Millimeter Wave Radar, 2016
[2] Interacting with Soli: Exploring Fine-Grained Dynamic Gesture Recognition in the Radio-Frequency Spectrum, 2016
[3] Sparsity-based Dynamic Hand Gesture Recognition Using Micro-Doppler Signatures, 2017
[4] A Hand Gesture Recognition Sensor Using Reflected Impulses, 2017

Contribution of This Work

- Implement radar-based hand-gesture recognition in an embedded system
- Create a dataset with fine-grained hand gestures: at least 1000 samples per class and 20 users
- Design an algorithm suitable for embedded systems: less than 1 MB, at least 700x smaller than Interacting with Soli
- Achieve accuracy similar to Interacting with Soli: 85% (single user), 87% (10 people) on 11 gestures [1]
- Implement the algorithm on the GAP8 PULP processor and experimentally evaluate its efficiency (power, run time)

Technical Background

Short-Range Radar
- Classical airplane-detection radar: 1-12 GHz (wavelength = 2.5-30 cm)
- Acconeer radar: 60 GHz (wavelength = 5 mm)
- Data from the sensor:
  - Sweeps at 160 Hz, each over the range dimension
  - Time- and range-discrete signal S[t, r]

Machine-learning features (sketches of these follow below):
- Range-frequency Doppler map
- Signal energy
- Signal variation
- Center of mass of the envelope

[Figure: raw radar data as sweeps over time and range; range resolution 0.483 mm]
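The poster lists the scalar features above without giving their formulas. The following NumPy sketch shows one plausible reading of them for a sweep window S of shape (n_sweeps, n_range_bins); the function name, the exact definitions, and the use of the 0.483 mm range resolution as a scale factor are illustrative assumptions, not taken from the poster.

```python
import numpy as np

def handcrafted_features(S, range_res=0.483e-3):
    """Illustrative (assumed) definitions of the poster's scalar features
    for a sweep window S of shape (n_sweeps, n_range_bins)."""
    env = np.abs(S)                                          # envelope per sweep and range bin
    energy = float(np.sum(env ** 2))                         # signal energy
    variation = float(np.sum(np.abs(np.diff(env, axis=0))))  # sweep-to-sweep signal variation
    r = np.arange(S.shape[1]) * range_res                    # range of each bin in metres
    centre_of_mass = float((env.sum(axis=0) * r).sum() / env.sum())
    return energy, variation, centre_of_mass

# Example on synthetic complex sweep data (e.g. 16 sweeps x 492 range bins).
S = np.random.randn(16, 492) + 1j * np.random.randn(16, 492)
print(handcrafted_features(S))
```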

Range-Frequency Doppler Map
- Radar output: range- and time-discrete signal $S[t, r]$
- Doppler map, i.e. the FFT of $S[t, r]$ over the sweep (time) axis (a code sketch follows after the footnotes below):
  $\hat{S}[f, r] = \sum_{t=0}^{L_{\mathrm{doppler}}} S[t, r]\, e^{-2\pi i f t / L_{\mathrm{doppler}}} = \mathrm{FFT}(S[t, r])$
- Strong feature according to prior research [1,2]

[1] Interacting with Soli: Exploring Fine-Grained Dynamic Gesture Recognition in the Radio-Frequency Spectrum, 2016
[2] Short-Range FMCW Monopulse Radar for Hand-Gesture Sensing, 2015
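As a concrete illustration of the formula above, here is a minimal NumPy sketch of the range-frequency Doppler map. It assumes a window of L_doppler sweeps stored as a complex array S of shape (L_doppler, n_range_bins); the 32x492 example shape mirrors the poster's input map size (the third dimension of 2 in 32x492x2 is not modeled here).

```python
import numpy as np

def range_doppler_map(S):
    """Range-frequency Doppler map: an FFT over the sweep (time) axis,
    computed independently for every range bin, i.e.
    S_hat[f, r] = sum_t S[t, r] * exp(-2*pi*1j*f*t / L_doppler)."""
    return np.fft.fft(S, axis=0)

# Example: a window of 32 sweeps x 492 range bins of synthetic complex data.
S = np.random.randn(32, 492) + 1j * np.random.randn(32, 492)
doppler = np.abs(range_doppler_map(S))   # magnitude map, shape (32, 492)
```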

Embedded and Energy-Efficient Algorithm: CNN + TCN

Idea: combine the information of multiple time steps, using a 2D CNN per time step followed by a TCN, reaching up to 98.9% accuracy for 5 gestures and 2 sensors.

GAP8 vs ARM Cortex-M7 Implementation: Run-Time, Power Consumption, FFT

Execution length @ 100 MHz (cycles):
  Network + FFT             5.877 million
  2D CNN                    5.079 million
  TCN                       0.458 million
  Fully connected layers    0.086 million
  FFT                       0.177 million

Power consumption:
- 5 Hz prediction rate: only 21 mW
- Large margin: up to 50 Hz is possible

Comparison with a Cortex-M7 (STM32F746ZG):
- The same computation consumes 147-588 mW
- Running at its limit (@ 216 MHz, it needs 850 ms for 5 predictions)

On GAP8 @ 100 MHz with 8 cores running, the total energy per frame is 4.2 mJ; at 5 frames/s this averages 4.2 mJ x 5 /s = 21 mW.
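The snippet below is just a back-of-the-envelope check of the figures above, with all numbers copied from the poster; the per-prediction runtime additionally assumes the reported cycle count is executed sequentially at 100 MHz.

```python
# Unit-conversion sanity check of the reported GAP8 figures.
cycles = 5.877e6          # cycles per prediction (network + FFT), from the table above
f_clk = 100e6             # clock frequency in Hz
e_frame = 4.2e-3          # energy per frame in joules (8 cores @ 100 MHz)
rate = 5                  # predictions per second

runtime_s = cycles / f_clk     # ~0.059 s if the cycle count is sequential at 100 MHz
avg_power_w = e_frame * rate   # 4.2 mJ * 5 1/s = 0.021 W = 21 mW, matching the poster
print(f"{runtime_s * 1e3:.1f} ms per prediction, {avg_power_w * 1e3:.0f} mW average")
```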

In-Field Evaluation and Comparison with Google Soli

Per-frame accuracy on the 11-gesture datasets:
  Evaluation                                                  Soli      This work
  Single-user leave-one-out (session) cross-validation        85.75 %   92 %
  Multiple users, randomly shuffled                           87.17 %   81.52 %
  Multiple users, leave-one-out (person) cross-validation     79.06 %   73.66 %

Other properties:
  Property                                Soli      This work
  Model size                              689 MB    91 kB
  Dataset: total instances per gesture    500       1610
  Dataset: people                         10        26
  Embedded implementation                 No        Yes
  Network power consumption               -         21 mW
  Sensor power consumption                300 mW    <190 mW


Abstract

This work proposes a low-power, high-accuracy embedded hand-gesture recognition system targeting battery-operated wearable devices, using low-power short-range radar sensors. A 2D Convolutional Neural Network (CNN) operating on range-frequency Doppler features is combined with a Temporal Convolutional Neural Network (TCN) for time-sequence prediction. The final algorithm has a model size of only 45,723 parameters, yielding a memory footprint of only 91 kB. Two datasets containing 11 challenging hand gestures performed by 26 different people have been recorded, containing a total of 20,210 gesture instances. On the 11 hand gestures, an accuracy of 87% (26 users) and 92% (single user) has been achieved. Furthermore, reducing the gestures to 5, we achieved up to 98.9% accuracy. Finally, the prediction algorithm has been implemented on the GAP8 Parallel Ultra-Low-Power processor by GreenWaves Technologies, showing that live prediction is feasible with only 21 mW of power consumption for the full gesture-prediction neural network, excluding the sensor's consumption.

[Figure: model architecture. Each time step's sweep window passes through an FFT and a 2D CNN; the resulting feature vectors feed a TCN that produces the gesture prediction.]

1) A 2D Convolutional Neural Network (CNN) scales the range-frequency Doppler input map of size 32x492x2 down to a representation vector of length 384.

2) A Temporal Convolutional Neural Network (TCN) takes as input a time sequence of stacked representation vectors of length 384, leveraging temporal information for more accurate predictions. The output representations of length 32 produced by the TCN are then fed into three fully connected layers, which produce the probability distribution used to classify the observed gesture (an illustrative code sketch of this pipeline follows below).

* From Palm-Hold gesture.
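To make the two-stage model concrete, here is a hypothetical PyTorch sketch of the pipeline. Only the 32x492x2 input size, the 384-length representation, the 32-length TCN output, the three fully connected layers, and the 11 classes come from the poster; everything else (channel counts, kernel sizes, dilations, pooling, the 5-step sequence length, and taking the last TCN step) is an assumption for illustration, not the authors' actual network.

```python
import torch
import torch.nn as nn

class FrameCNN(nn.Module):
    """2D CNN: one 2x32x492 range-frequency Doppler map -> 384-d feature vector."""
    def __init__(self, feat_dim=384):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 8)),                  # -> (32, 4, 8)
        )
        self.proj = nn.Linear(32 * 4 * 8, feat_dim)

    def forward(self, x):                                  # x: (B, 2, 32, 492)
        return self.proj(self.features(x).flatten(1))

class GestureNet(nn.Module):
    """Per-step CNN, dilated temporal convolutions over the step sequence
    (a simplified stand-in for a full TCN), then three fully connected layers."""
    def __init__(self, feat_dim=384, tcn_dim=32, n_classes=11):
        super().__init__()
        self.cnn = FrameCNN(feat_dim)
        self.tcn = nn.Sequential(
            nn.Conv1d(feat_dim, 64, kernel_size=3, dilation=1, padding=1), nn.ReLU(),
            nn.Conv1d(64, tcn_dim, kernel_size=3, dilation=2, padding=2), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Linear(tcn_dim, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, n_classes),                      # logits over the 11 gestures
        )

    def forward(self, x):                                  # x: (B, T, 2, 32, 492)
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1)).view(b, t, -1)   # (B, T, 384)
        h = self.tcn(feats.transpose(1, 2))                # (B, 32, T)
        return self.head(h[:, :, -1])                      # classify from the last step

logits = GestureNet()(torch.randn(1, 5, 2, 32, 492))       # -> shape (1, 11)
```

The deployed network is far smaller (45,723 parameters, 91 kB) and runs on the GAP8 processor; the sketch above only illustrates the data flow.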