
Agenda
•  Vision
•  Current State
•  Our Approach - towards three main areas
   o  Dynamic Gesture Recognition
      •  Using Machine Learning
   o  Modeling 3D objects
      •  Building the environment
      •  Modeling styles
      •  Other features
   o  Improving Accuracy and Interface
      •  IMU/Kinect integration
      •  Mobile interface
•  Conclusion
•  Demo
•  Future Directions
•  References

Vision

"Devise newer interfaces and techniques which will provide a totally immersive experience for modeling 3D objects in real time."

Current State
•  Wacom & AutoCAD
   o  The industry standard is to create 3D objects in AutoCAD using a Wacom pen tablet.
   o  The problem is that we think and draw in 3D, but the tablet's interface is 2D. We can do better by moving towards natural 3D gestures, which take the artist's imagination to the next level.
   o  No commercial application exists that solves this problem; it is still in research. The major problems are accuracy, precision and control.

Approach

Fig. 1: Training. Samples from the sensor pass through the application into a sample library, and the library's data is used to train the ML model.

Fig. 2: Classification. Live samples from the sensor pass through the application to the trained ML model, which outputs the data classification.

Basic

Fig. 3: Training of a new gesture. The Kinect depth sensor produces depth frames, OpenNI converts them to 3D coordinates, the application keeps the last k frames as a sample in a library, and the supervised samples are used to train an artificial neural network.
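A minimal sketch of the "last k frames" sample buffer described above, assuming hand coordinates arrive one frame at a time from OpenNI; the class and method names are illustrative, not the project's code:

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Keeps the last k hand positions and flattens them into the 3*k ANN input vector. */
public class GestureWindow {
    private final int k;
    private final Deque<double[]> frames = new ArrayDeque<>();

    public GestureWindow(int k) { this.k = k; }

    /** Called once per depth frame with the hand's 3D coordinates from OpenNI. */
    public void push(double x, double y, double z) {
        if (frames.size() == k) {
            frames.removeFirst();                 // drop the oldest frame
        }
        frames.addLast(new double[] { x, y, z });
    }

    public boolean isFull() { return frames.size() == k; }

    /** Flattens the window, oldest frame first, into the 3*k input vector. */
    public double[] toInputVector() {
        double[] input = new double[3 * k];
        int i = 0;
        for (double[] p : frames) {
            input[i++] = p[0];
            input[i++] = p[1];
            input[i++] = p[2];
        }
        return input;
    }
}
```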

Fig. 4: Classification of a gesture. The Kinect depth sensor produces depth frames, OpenNI converts them to 3D coordinates, and the application feeds them to the trained artificial neural network, which outputs the classification.

ANN

Fig. 5: ANN net. The (x, y, z) coordinates at times t, t-1, ..., t-k form the 3 x k inputs; these feed a hidden layer of neurons h1 ... hn, which feeds an output layer of p gesture nodes G1 ... Gp.

•  k: the discrete time constant (number of frames) over which a gesture is performed.
•  p: the number of gestures.
•  n: the number of neurons in the hidden layer.
•  The graph between the input and hidden layers, and between the hidden and output layers, is complete (fully connected).
•  A tangent sigmoidal function f(x) is used at the neurons.
•  The network learns using back propagation of errors (a training sketch follows below).
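As an illustration only, the following sketch builds such a network with the Neuroph library listed under Tools; the values of k, n and p, and the training-set loading, are placeholder assumptions rather than the original code:

```java
import org.neuroph.core.data.DataSet;
import org.neuroph.nnet.MultiLayerPerceptron;
import org.neuroph.util.TransferFunctionType;

public class GestureNet {
    public static void main(String[] args) {
        int k = 25;   // frames per gesture (assumed value)
        int n = 40;   // hidden neurons (assumed value)
        int p = 6;    // number of gestures (assumed value)

        // Fully connected 3*k -> n -> p network with tanh activations,
        // trained by back propagation (Neuroph's default rule for this class).
        MultiLayerPerceptron net =
                new MultiLayerPerceptron(TransferFunctionType.TANH, 3 * k, n, p);

        // One row per recorded sample: 3*k inputs, p outputs (one-hot gesture label).
        DataSet trainingSet = new DataSet(3 * k, p);
        // trainingSet.addRow(inputVector, oneHotLabel);  // filled from the sample library

        net.learn(trainingSet);

        // Classification: feed the current window and take the strongest output.
        // net.setInput(inputVector); net.calculate(); double[] out = net.getOutput();
    }
}
```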

ANN-Features

The tangent sigmoidal activation applied at each neuron is

$$f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

ANN-Limitations

•  The net must be rebuilt whenever a new gesture is added to the library.
•  It needs comparatively large data sets for training.
•  The detection rate drops when both hands are considered.

Results

Fig. 6: Detection rate (%) versus the number of data sets for a single gesture. The detection rate approaches 79% with no false positives.

Approach

"The vision behind this work is to devise newer interfaces and techniques which will provide a totally immersive experience for modeling 3D objects in real time."

Create
•  Model
•  Change color
•  Change texture
•  Stop & Save
•  Pause
•  Load object
•  Place object

Visualize
•  6 DOF
•  Rotate
•  Zoom
•  Fork and create
•  Stop
•  Load object

Share
•  PDF
•  DXF
•  Save in real time

Application

The application combines OpenNI/NITE, OpenGL and voice recognition (Voce, built on Sphinx4):
•  OpenNI/NITE detects the hand location and plots spline points for the sculpture.
•  Voce (Sphinx4) provides voice control: set perspective, background color, etc.
•  OpenGL renders the scene, draws balls, spheres, etc., and handles display / save.
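A minimal sketch of the voice-control loop described above, assuming the Voce library listed under Tools is used roughly as in its published examples; the grammar paths and command names are illustrative assumptions:

```java
import voce.SpeechInterface;

public class VoiceControl {
    public static void main(String[] args) throws InterruptedException {
        // Initialise recognition only (no synthesis); library and grammar paths are placeholders.
        SpeechInterface.init("./lib", false, true, "./grammar", "commands");

        boolean running = true;
        while (running) {
            // Poll recognised phrases and map them to application actions.
            while (SpeechInterface.getRecognizerQueueSize() > 0) {
                String cmd = SpeechInterface.popRecognizedString();
                if (cmd.contains("stop")) {
                    running = false;                  // e.g. "stop and save"
                } else if (cmd.contains("color")) {
                    // change background / brush color in the OpenGL scene
                } else if (cmd.contains("perspective")) {
                    // adjust the camera perspective
                }
            }
            Thread.sleep(50);
        }
        SpeechInterface.destroy();
    }
}
```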

Results

Approach

"The vision behind this work is to devise newer interfaces and techniques which will provide a totally immersive experience for modeling 3D objects in real time."

Limitations of our recognizer

•  There is a trade-off between accuracy and latency.
•  Recognition using the Kinect sensor runs at about 20 pps.
•  Because a large amount of visual data is processed in real time (pre-processing, feature extraction and classification), the latency of such a system is high.

Solution

•  Analogy with a GPS/IMU system used in airplane navigation:
   o  GPS: the Kinect depth sensor.
   o  Inertial Measurement Unit (IMU): the smart phone sensors.
   o  Dead reckoning.
•  We apply data fusion to the Kinect and smart phone sensor data. The two streams are complementary, which helps us better estimate the current state.
•  The user holds a smart phone during hand recognition.

IMU / Smart phone sensor: Position Estimation

•  Gyroscope: integrate the angular rates to obtain orientation.
•  Accelerometers: rotate the readings into the local-level navigation frame, remove the effect of gravity, then double integrate to obtain position.

With gravity removed, velocity and position follow from

$$v(t) = \int_0^{t} a(\tau)\, d\tau, \qquad p(t) = \int_0^{t} v(\tau)\, d\tau$$
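Purely as an illustration of the double-integration step above, a minimal dead-reckoning sketch; the fixed-step update and variable names are assumptions, not the project's code:

```java
/** Dead reckoning by double integration of gravity-compensated acceleration. */
public class DeadReckoning {
    private double vx, vy, vz;   // velocity estimate (m/s)
    private double px, py, pz;   // position estimate (m)

    /** ax, ay, az: acceleration in the navigation frame with gravity removed (m/s^2);
     *  dt: time since the previous sample (s). */
    public void update(double ax, double ay, double az, double dt) {
        // First integration: acceleration -> velocity
        vx += ax * dt;  vy += ay * dt;  vz += az * dt;
        // Second integration: velocity -> position
        px += vx * dt;  py += vy * dt;  pz += vz * dt;
    }

    public double[] position() { return new double[] { px, py, pz }; }
}
```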

Limitations of an IMU

•  The major problem is drift, which is worse in low-cost sensors (such as those in a smart phone).
•  If one of the accelerometers has a bias error of just 0.001 g, the reported position diverges from the true position with an acceleration of about 0.0098 m/s²; after a mere 30 seconds the estimate has drifted by almost 4.5 meters (the arithmetic is sketched below)!
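For reference, a sketch of the arithmetic behind this claim, assuming a constant bias and zero initial velocity:

$$\Delta p = \tfrac{1}{2}\, a_{\text{bias}}\, t^{2} = \tfrac{1}{2}\,(0.0098\ \mathrm{m/s^{2}})\,(30\ \mathrm{s})^{2} \approx 4.4\ \mathrm{m}$$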

Fusion

The visual tracker (Kinect) and the IMU feed a Kalman filter, whose error estimates yield the corrected position.

Time update:
$$\hat{x}_k^- = A \hat{x}_{k-1} + B u_{k-1}, \qquad P_k^- = A P_{k-1} A^{T} + Q$$

Measurement update:
$$K_k = P_k^- H^{T} \left( H P_k^- H^{T} + R \right)^{-1}$$
$$\hat{x}_k = \hat{x}_k^- + K_k \left( z_k - H \hat{x}_k^- \right), \qquad P_k = \left( I - K_k H \right) P_k^-$$
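For illustration, the update equations above map onto the Apache Commons Math library listed under Tools roughly as follows; the 1-D constant-velocity model and all numeric values are assumptions:

```java
import org.apache.commons.math3.filter.DefaultMeasurementModel;
import org.apache.commons.math3.filter.DefaultProcessModel;
import org.apache.commons.math3.filter.KalmanFilter;

public class PositionFusion {
    public static void main(String[] args) {
        double dt = 0.05;                               // assumed 20 Hz update rate

        // State [position, velocity]; A, B, Q follow the time-update equations.
        double[][] A = { { 1, dt }, { 0, 1 } };
        double[][] B = { { 0.5 * dt * dt }, { dt } };   // control = IMU acceleration
        double[][] Q = { { 1e-4, 0 }, { 0, 1e-4 } };    // process noise (assumed)
        double[][] H = { { 1, 0 } };                    // Kinect measures position only
        double[][] R = { { 1e-2 } };                    // measurement noise (assumed)
        double[] x0 = { 0, 0 };
        double[][] P0 = { { 1, 0 }, { 0, 1 } };

        KalmanFilter kf = new KalmanFilter(
                new DefaultProcessModel(A, B, Q, x0, P0),
                new DefaultMeasurementModel(H, R));

        double imuAcceleration = 0.1;   // placeholder IMU reading (m/s^2)
        double kinectPosition = 0.02;   // placeholder Kinect reading (m)

        kf.predict(new double[] { imuAcceleration });   // time update
        kf.correct(new double[] { kinectPosition });    // measurement update

        double fusedPosition = kf.getStateEstimation()[0];
        System.out.println("fused position = " + fusedPosition);
    }
}
```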

 

Fusion contd.

The Android phone sensors and the Kinect both feed the fusion module; the application uses the fused estimate for display and interaction.

Results

Fig.: A quick loop simulation, using only the Kinect (left) and using the Kinect and the IMU (right).

Results contd.
•  The Kinect's position was fed at a rate of 10 Hz and checked against the original 20 Hz.
•  Compared with a standard linear interpolation, our IMU-assisted system improved the location estimates by a factor of 1.37.
•  Further tuning of the initial parameters for the application can decrease the errors.

Conclusion

•  ML: An architecture was presented to recognize dynamic gestures from a depth camera using neural networks, which expanded the ways of interacting with 3D objects.
•  HCI: We also developed a new set of gestures (e.g. a pottery style) for 3D modeling. It resulted in structures of actual significance, which can be imported and used in other applications.
•  Data Fusion: Finally, to improve location estimates, we showed how to integrate data from inertial sensors and the Kinect to obtain high-quality results.

Short Video Demo

Tools used

•  Hardware
   o  PC (4 GB RAM, at least 1 GB free space for the environment, 2.3 GHz dual core)
   o  Microsoft Kinect sensor
   o  Android phone (with accelerometers, gyroscope, OS version >= 2.3)
•  Software
   o  OpenNI/NITE
   o  Point Cloud Library (PCL)
   o  OpenCV, OpenGL
   o  Processing, Eclipse IDE
   o  Voce voice recognition
   o  Neuroph
   o  Apache Commons Math library (for the Kalman filter)

Future Directions

Our system is still far from its original vision, but in its current state it can be used for initial abstract designs.

•  More 3D interaction techniques
   o  The survey on interaction by Chris Hand [2] can be used as a starting point to develop more interaction techniques for a non-ambiguous set of gestures that helps users create sculptures in 3D.
•  Deep learning
   o  Deep learning has proved to have higher classification accuracy than traditional ANN techniques, but it requires large data sets and hence computing power. Once trained, however, it performs far better.
•  Others
   o  Many other features are possible, such as network-based multi-user interaction, enhanced brush and color support, integration of a physics engine, and increased interactivity with the user. We could also let multiple users draw at the same instance, using robust distributed computing.

References

1)  Roope Raisamo, "Multimodal Human-Computer Interaction: a constructive and empirical study", Academic Dissertation, University of Tampere, 1999.
2)  Chris Hand, "A Survey of 3D Interaction Techniques", Volume 16 (1997), Number 5, pp. 269-281, Wiley, 1997.
3)  Christoph Arndt and Otmar Loffeld, "Information gained by data fusion", SPIE Conference Volume 2784, 1996.
4)  Dipen Dave, Ashirwad Chowriappa and Thenkurussi Kesavadas, "Gesture Interface for 3D CAD Modeling using Kinect", Computer-Aided Design & Applications, 9(a), 2012.
5)  Gabrielle Odowichuk, Shawn Trail, Peter Driessen, Wendy Nie, "Sensor Fusion: Towards a Fully Expressive 3D Music Control Interface", University of Victoria, 2011.
6)  Matthew Tang, "Recognizing Hand Gestures with Microsoft's Kinect", Stanford University, 2011.
7)  Gabrielle Odowichuk, Shawn Trail, Peter Driessen, Wendy Nie, "Sensor Fusion: Towards a Fully Expressive 3D Music Control Interface", University of Victoria, 2012.
8)  Rufeng Meng, Jason Isenhower, Chuan Qin, Srihari Nelakuditi, "Can Smartphone Sensors Enhance Kinect Experience?", MobiHoc '12, June 11-14, 2012.

Thank you!

Questions?

End
