
Agenda
•  Vision
•  Current State
•  Our Approach - towards three main areas
   o  Dynamic Gesture Recognition
      •  Using Machine Learning
   o  Modeling 3D objects
      •  Building the environment
      •  Modeling styles
      •  Other features
   o  Improving Accuracy and Interface
      •  IMU/Kinect integration
      •  Mobile interface
•  Conclusion
•  Demo
•  Future Directions
•  References

Vision

"Devise newer interfaces and techniques which will provide a totally immersive experience for modeling 3D objects in real time."

Current State
•  Wacom & AutoCAD
   o  The industry standard is to create 3D objects in AutoCAD using a Wacom pen tablet.
   o  The problem is that we think and draw in 3D, but the tablet's interface is 2D. We can do better by moving towards natural 3D gestures, which take the artist's imagination to the next level.
   o  No commercial application exists that solves this problem; it is still in research. The major problems are accuracy, precision and control.

Approach

Fig. 1: Training. Samples from the sensor pass through the application into a sample library, and the library's data is used to train the ML model.

Fig. 2: Classification. Live samples from the sensor pass through the application to the trained ML model, which outputs the data classification.

Basic

Fig. 3: Training of a new gesture. The Kinect depth sensor produces depth frames, OpenNI converts them to 3D coordinates, the application keeps the last k frames as a sample in a library, and the supervised samples are used to train an artificial neural network.
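A minimal sketch of the "last k frames" sample buffer described above, assuming hand coordinates arrive one frame at a time from OpenNI; the class and method names are illustrative, not the project's code:

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Keeps the last k hand positions and flattens them into the 3*k ANN input vector. */
public class GestureWindow {
    private final int k;
    private final Deque<double[]> frames = new ArrayDeque<>();

    public GestureWindow(int k) { this.k = k; }

    /** Called once per depth frame with the hand's 3D coordinates from OpenNI. */
    public void push(double x, double y, double z) {
        if (frames.size() == k) {
            frames.removeFirst();                 // drop the oldest frame
        }
        frames.addLast(new double[] { x, y, z });
    }

    public boolean isFull() { return frames.size() == k; }

    /** Flattens the window, oldest frame first, into the 3*k input vector. */
    public double[] toInputVector() {
        double[] input = new double[3 * k];
        int i = 0;
        for (double[] p : frames) {
            input[i++] = p[0];
            input[i++] = p[1];
            input[i++] = p[2];
        }
        return input;
    }
}
```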

Fig. 4: Classification of a gesture. The Kinect depth sensor produces depth frames, OpenNI converts them to 3D coordinates, and the application feeds them to the trained artificial neural network, which outputs the classification.

ANN

Fig. 5: ANN net. The (x, y, z) coordinates at times t, t-1, ..., t-k form the 3 x k inputs; these feed a hidden layer of neurons h1 ... hn, which feeds an output layer of p gesture nodes G1 ... Gp.

•  k: the discrete time constant (number of frames) over which a gesture is performed.
•  p: the number of gestures.
•  n: the number of neurons in the hidden layer.
•  The graph between the input and hidden layers, and between the hidden and output layers, is complete (fully connected).
•  A tangent sigmoidal function f(x) is used at the neurons.
•  The network learns using back propagation of errors (a training sketch follows below).
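As an illustration only, the following sketch builds such a network with the Neuroph library listed under Tools; the values of k, n and p, and the training-set loading, are placeholder assumptions rather than the original code:

```java
import org.neuroph.core.data.DataSet;
import org.neuroph.nnet.MultiLayerPerceptron;
import org.neuroph.util.TransferFunctionType;

public class GestureNet {
    public static void main(String[] args) {
        int k = 25;   // frames per gesture (assumed value)
        int n = 40;   // hidden neurons (assumed value)
        int p = 6;    // number of gestures (assumed value)

        // Fully connected 3*k -> n -> p network with tanh activations,
        // trained by back propagation (Neuroph's default rule for this class).
        MultiLayerPerceptron net =
                new MultiLayerPerceptron(TransferFunctionType.TANH, 3 * k, n, p);

        // One row per recorded sample: 3*k inputs, p outputs (one-hot gesture label).
        DataSet trainingSet = new DataSet(3 * k, p);
        // trainingSet.addRow(inputVector, oneHotLabel);  // filled from the sample library

        net.learn(trainingSet);

        // Classification: feed the current window and take the strongest output.
        // net.setInput(inputVector); net.calculate(); double[] out = net.getOutput();
    }
}
```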

ANN-Features

The tangent sigmoidal activation applied at each neuron is

$$f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

ANN-Limitations

•  The net must be rebuilt whenever a new gesture is added to the library.
•  It needs comparatively large data sets for training.
•  The detection rate drops when both hands are considered.

Results

Fig. 6: Detection rate (%) versus the number of data sets for a single gesture. The detection rate approaches 79% with no false positives.

Approach

"The vision behind this work is to devise newer interfaces and techniques which will provide a totally immersive experience for modeling 3D objects in real time."

Create
•  Model
•  Change color
•  Change texture
•  Stop & Save
•  Pause
•  Load object
•  Place object

Visualize
•  6 DOF
•  Rotate
•  Zoom
•  Fork and create
•  Stop
•  Load object

Share
•  PDF
•  DXF
•  Save in real time

Application

The application combines OpenNI/NITE, OpenGL and voice recognition (Voce, built on Sphinx4):
•  OpenNI/NITE detects the hand location and plots spline points for the sculpture.
•  Voce (Sphinx4) provides voice control: set perspective, background color, etc.
•  OpenGL renders the scene, draws balls, spheres, etc., and handles display / save.
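A minimal sketch of the voice-control loop described above, assuming the Voce library listed under Tools is used roughly as in its published examples; the grammar paths and command names are illustrative assumptions:

```java
import voce.SpeechInterface;

public class VoiceControl {
    public static void main(String[] args) throws InterruptedException {
        // Initialise recognition only (no synthesis); library and grammar paths are placeholders.
        SpeechInterface.init("./lib", false, true, "./grammar", "commands");

        boolean running = true;
        while (running) {
            // Poll recognised phrases and map them to application actions.
            while (SpeechInterface.getRecognizerQueueSize() > 0) {
                String cmd = SpeechInterface.popRecognizedString();
                if (cmd.contains("stop")) {
                    running = false;                  // e.g. "stop and save"
                } else if (cmd.contains("color")) {
                    // change background / brush color in the OpenGL scene
                } else if (cmd.contains("perspective")) {
                    // adjust the camera perspective
                }
            }
            Thread.sleep(50);
        }
        SpeechInterface.destroy();
    }
}
```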

Results

Approach

"The vision behind this work is to devise newer interfaces and techniques which will provide a totally immersive experience for modeling 3D objects in real time."

Limitations of our recognizer

•  There is a trade-off between accuracy and latency.
•  Recognition using the Kinect sensor runs at about 20 pps.
•  Because a large amount of visual data is processed in real time (pre-processing, feature extraction and classification), the latency of such a system is high.

Solution

•  Analogy with a GPS/IMU system used in airplane navigation:
   o  GPS: the Kinect depth sensor.
   o  Inertial Measurement Unit (IMU): the smart phone sensors.
   o  Dead reckoning.
•  We apply data fusion to the Kinect and smart phone sensor data. The two streams are complementary, which helps us better estimate the current state.
•  The user holds a smart phone during hand recognition.

IMU / Smart phone sensor: Position Estimation

•  Gyroscope: integrate the angular rates to obtain orientation.
•  Accelerometers: rotate the readings into the local-level navigation frame, remove the effect of gravity, then double integrate to obtain position.

With gravity removed, velocity and position follow from

$$v(t) = \int_0^{t} a(\tau)\, d\tau, \qquad p(t) = \int_0^{t} v(\tau)\, d\tau$$
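Purely as an illustration of the double-integration step above, a minimal dead-reckoning sketch; the fixed-step update and variable names are assumptions, not the project's code:

```java
/** Dead reckoning by double integration of gravity-compensated acceleration. */
public class DeadReckoning {
    private double vx, vy, vz;   // velocity estimate (m/s)
    private double px, py, pz;   // position estimate (m)

    /** ax, ay, az: acceleration in the navigation frame with gravity removed (m/s^2);
     *  dt: time since the previous sample (s). */
    public void update(double ax, double ay, double az, double dt) {
        // First integration: acceleration -> velocity
        vx += ax * dt;  vy += ay * dt;  vz += az * dt;
        // Second integration: velocity -> position
        px += vx * dt;  py += vy * dt;  pz += vz * dt;
    }

    public double[] position() { return new double[] { px, py, pz }; }
}
```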

Limitations of an IMU

•  The major problem is drift, which is worse in low-cost sensors (such as those in a smart phone).
•  If one of the accelerometers has a bias error of just 0.001 g, the reported position diverges from the true position with an acceleration of about 0.0098 m/s²; after a mere 30 seconds the estimate has drifted by almost 4.5 meters (the arithmetic is sketched below)!
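For reference, a sketch of the arithmetic behind this claim, assuming a constant bias and zero initial velocity:

$$\Delta p = \tfrac{1}{2}\, a_{\text{bias}}\, t^{2} = \tfrac{1}{2}\,(0.0098\ \mathrm{m/s^{2}})\,(30\ \mathrm{s})^{2} \approx 4.4\ \mathrm{m}$$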

Fusion

The visual tracker (Kinect) and the IMU feed a Kalman filter, whose error estimates yield the corrected position.

Time update:
$$\hat{x}_k^- = A \hat{x}_{k-1} + B u_{k-1}, \qquad P_k^- = A P_{k-1} A^{T} + Q$$

Measurement update:
$$K_k = P_k^- H^{T} \left( H P_k^- H^{T} + R \right)^{-1}$$
$$\hat{x}_k = \hat{x}_k^- + K_k \left( z_k - H \hat{x}_k^- \right), \qquad P_k = \left( I - K_k H \right) P_k^-$$
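For illustration, the update equations above map onto the Apache Commons Math library listed under Tools roughly as follows; the 1-D constant-velocity model and all numeric values are assumptions:

```java
import org.apache.commons.math3.filter.DefaultMeasurementModel;
import org.apache.commons.math3.filter.DefaultProcessModel;
import org.apache.commons.math3.filter.KalmanFilter;

public class PositionFusion {
    public static void main(String[] args) {
        double dt = 0.05;                               // assumed 20 Hz update rate

        // State [position, velocity]; A, B, Q follow the time-update equations.
        double[][] A = { { 1, dt }, { 0, 1 } };
        double[][] B = { { 0.5 * dt * dt }, { dt } };   // control = IMU acceleration
        double[][] Q = { { 1e-4, 0 }, { 0, 1e-4 } };    // process noise (assumed)
        double[][] H = { { 1, 0 } };                    // Kinect measures position only
        double[][] R = { { 1e-2 } };                    // measurement noise (assumed)
        double[] x0 = { 0, 0 };
        double[][] P0 = { { 1, 0 }, { 0, 1 } };

        KalmanFilter kf = new KalmanFilter(
                new DefaultProcessModel(A, B, Q, x0, P0),
                new DefaultMeasurementModel(H, R));

        double imuAcceleration = 0.1;   // placeholder IMU reading (m/s^2)
        double kinectPosition = 0.02;   // placeholder Kinect reading (m)

        kf.predict(new double[] { imuAcceleration });   // time update
        kf.correct(new double[] { kinectPosition });    // measurement update

        double fusedPosition = kf.getStateEstimation()[0];
        System.out.println("fused position = " + fusedPosition);
    }
}
```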

 

Fusion contd.

The Android phone sensors and the Kinect both feed the fusion module; the application uses the fused estimate for display and interaction.

Results

Fig.: A quick loop simulation, using only the Kinect (left) and using the Kinect and the IMU (right).

Results contd.
•  The Kinect's position was fed at a rate of 10 Hz and checked against the original 20 Hz.
•  Compared with a standard linear interpolation, our IMU-assisted system improved the location estimates by a factor of 1.37.
•  Further tuning of the initial parameters for the application can decrease the errors.

Conclusion

•  ML: An architecture was presented to recognize dynamic gestures from a depth camera using neural networks, which expanded the ways of interacting with 3D objects.
•  HCI: We also developed a new set of gestures (e.g. a pottery style) for 3D modeling. It resulted in structures of actual significance, which can be imported and used in other applications.
•  Data Fusion: Finally, to improve location estimates, we showed how to integrate data from inertial sensors and the Kinect to obtain high-quality results.

Short Video Demo

Tools used

•  Hardware
   o  PC (4 GB RAM, at least 1 GB free space for the environment, 2.3 GHz dual core)
   o  Microsoft Kinect sensor
   o  Android phone (with accelerometers, gyroscope, OS version >= 2.3)
•  Software
   o  OpenNI/NITE
   o  Point Cloud Library (PCL)
   o  OpenCV, OpenGL
   o  Processing, Eclipse IDE
   o  Voce voice recognition
   o  Neuroph
   o  Apache Commons Math library (for the Kalman filter)

Future Directions

Our system is still far from its original vision, but in its current state it can be used for initial abstract designs.

•  More 3D interaction techniques
   o  The survey on interaction by Chris Hand [2] can be used as a starting point to develop more interaction techniques for a non-ambiguous set of gestures that helps users create sculptures in 3D.
•  Deep learning
   o  Deep learning has proved to have higher classification accuracy than traditional ANN techniques, but it requires large data sets and hence computing power. Once trained, however, it performs far better.
•  Others
   o  Many other features are possible, such as network-based multi-user interaction, enhanced brush and color support, integration of a physics engine, and increased interactivity with the user. We could also let multiple users draw at the same instance, using robust distributed computing.

References

1)  Roope Raisamo, "Multimodal Human-Computer Interaction: a constructive and empirical study", Academic Dissertation, University of Tampere, 1999.
2)  Chris Hand, "A Survey of 3D Interaction Techniques", Volume 16 (1997), Number 5, pp. 269-281, Wiley, 1997.
3)  Christoph Arndt and Otmar Loffeld, "Information gained by data fusion", SPIE Conference Volume 2784, 1996.
4)  Dipen Dave, Ashirwad Chowriappa and Thenkurussi Kesavadas, "Gesture Interface for 3D CAD Modeling using Kinect", Computer-Aided Design & Applications, 9(a), 2012.
5)  Gabrielle Odowichuk, Shawn Trail, Peter Driessen, Wendy Nie, "Sensor Fusion: Towards a Fully Expressive 3D Music Control Interface", University of Victoria, 2011.
6)  Matthew Tang, "Recognizing Hand Gestures with Microsoft's Kinect", Stanford University, 2011.
7)  Gabrielle Odowichuk, Shawn Trail, Peter Driessen, Wendy Nie, "Sensor Fusion: Towards a Fully Expressive 3D Music Control Interface", University of Victoria, 2012.
8)  Rufeng Meng, Jason Isenhower, Chuan Qin, Srihari Nelakuditi, "Can Smartphone Sensors Enhance Kinect Experience?", MobiHoc '12, June 11-14, 2012.

Thank you!

Questions?

End
