
JN 9/10/2002

Knowledge Systems Lab

Computer Vision: Gesture Recognition from Images

Joshua R. New

Knowledge Systems Laboratory

Jacksonville State University


Outline

• Terminology

• Current Research and Uses

• Kjeldsen’s PhD Thesis

• Implementation Overview

• Implementation Analysis

• Future Directions


Terminology

Image Processing - Computer manipulation of images. Some of the many algorithms used in image processing include convolution (on which many others are based), edge detection, and contrast enhancement.

Computer Vision - A branch of artificial intelligence and image processing concerned with computer processing of images from the real world. Computer vision typically requires a combination of low-level image processing to enhance the image quality (e.g. remove noise, increase contrast) and higher-level pattern recognition and image understanding to recognize features present in the image.


Current Research

• Capture images from a camera
• Process images to extract features
• Use those features to train a learning system to recognize the gesture
• Use the gesture as a meaningful input into a system

More information: http://www.cybernet.com/~ccohen/


Current Research Example

• Starner and Pentland
  • 2 hands segmented
  • Hand shape from a bounding ellipse
  • Eight-element feature vector
  • Recognition using Hidden Markov Models


Current Uses

• Sign Stream (released demo for MacOS)
  • Database tool for analysis of linguistic data captured on video
  • Developed at Boston University with funding from ASL Linguistic Research Project and NSF
  • http://www.bu.edu/asllrp/SignStream/


Current Uses

• Recursive Models of Human Motion (Smart Desk, MIT)
  • Models the constraints by which we move
  • Visually-guided gestural interaction, animation, and face recognition
  • Stereoscopic vision for 3D modeling
  • http://vismod.www.media.mit.edu/vismod/demos/smartdesk/


Kjeldsen’s PhD thesis

• Application
  • Gesture recognition as a system interface to augment that of the mouse
  • Menu selection, window move, and resize
• Input: 200x300 image
• Calibration of user’s hand


Kjeldsen’s PhD thesis

•Image split into HSI channels (I = Intensity, Lightness, Value)

• Segmentation with largest connected component
• Eroded to get rid of edges
• Gray-scale values sent to learning system


Kjeldsen’s PhD thesis

• Learning System – Backprop network
  • 1014 input nodes (one for each pixel)
  • 20 hidden nodes
  • 1 output node for each classification
  • 40 images of each pose
• Results:
  • Correct classification 90-96% of the time on images
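The network shape described above can be sketched as a single forward pass. This is only an illustration under stated assumptions, not Kjeldsen's implementation: the sigmoid activation, the random initial weights, the helper names, and the number of poses (`NUM_POSES = 4`) are all hypothetical. Only the 1014 inputs, the 20 hidden nodes, and the one-output-per-class layout come from the slide. Training by backpropagation is not shown.

```python
import math
import random

NUM_INPUTS = 1014   # one input node per pixel (from the slide)
NUM_HIDDEN = 20     # hidden nodes (from the slide)
NUM_POSES = 4       # hypothetical: one output node per pose class


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_layer(n_in, n_out, rng):
    """One fully connected layer; the last entry of each row is the bias weight."""
    return [[rng.uniform(-0.05, 0.05) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def forward_layer(weights, inputs):
    extended = inputs + [1.0]  # constant 1.0 feeds the bias weight
    return [sigmoid(sum(w * v for w, v in zip(row, extended)))
            for row in weights]


def classify(hidden_w, output_w, pixels):
    """Return (index of the strongest output node, all output activations)."""
    hidden = forward_layer(hidden_w, pixels)
    outputs = forward_layer(output_w, hidden)
    return outputs.index(max(outputs)), outputs


rng = random.Random(0)
hidden_w = make_layer(NUM_INPUTS, NUM_HIDDEN, rng)
output_w = make_layer(NUM_HIDDEN, NUM_POSES, rng)
pixels = [rng.random() for _ in range(NUM_INPUTS)]  # stand-in gray-scale values
pose, scores = classify(hidden_w, output_w, pixels)
```

With untrained weights the predicted `pose` is meaningless; the sketch only shows how 1014 pixel values flow through 20 hidden nodes to one score per class.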


Implementation Overview

• System:
  • 1.33 GHz AMD Athlon
  • OpenCV and IPL libraries (from Intel)

• Input:
  • Two 640x480 images, saturation channel
  • Maximum hand size in x and y, in pixels

• Output:
  • Rough estimate of movement
  • Refined estimate of movement
  • Number of fingers being held up
  • Rough orientation


Implementation Overview

Chronological order of system:

1) Saturation channel extraction
2) Threshold saturation channel
3) Calculate Center of Mass (CoM)
4) Reduce noise
5) Remove arm from hand
6) Calculate refined-CoM
7) Calculate orientation
8) Count the number of fingers


Implementation Analysis

1. Saturation channel extraction:
a) Images taken with a digital camera and saved as JPGs
b) JPGs converted to 640x480 PPMs
c) Saturation channels extracted into PGMs

[Figure: original image and its hue, lightness, and saturation channels]
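Saturation extraction can be sketched with Python's standard `colorsys` module. This is an assumption-laden illustration: the slides do not say which saturation formula was used, so the HLS definition is chosen here only because the figure labels are hue, lightness, and saturation. The function name and the nested-list image layout are hypothetical, and file I/O (PPM in, PGM out) is omitted.

```python
import colorsys


def saturation_channel(rgb_image):
    """Return the HLS saturation of every pixel, scaled to 0..255.

    rgb_image is a list of rows; each pixel is an (r, g, b) tuple
    with components in 0..255.
    """
    out = []
    for row in rgb_image:
        out_row = []
        for r, g, b in row:
            _h, _l, s = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
            out_row.append(round(s * 255))
        out.append(out_row)
    return out


image = [
    [(255, 0, 0), (128, 128, 128)],   # pure red, mid gray
    [(0, 0, 0), (200, 150, 120)],     # black, a skin-like tone
]
sat = saturation_channel(image)
```

Gray and black pixels have zero saturation while colored pixels do not, which is what makes this channel useful for separating a hand from a neutral background.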


Implementation Analysis

2. Threshold saturation channel:
a) Threshold value: 50 (values range from 0 to 255)
b) For every pixel: PixelValue = (PixelValue ≥ 50) ? 128 : 0
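The thresholding rule above translates directly into code. In this minimal sketch, the threshold of 50 and the foreground value of 128 come from the slide; the function name and the nested-list image layout are assumptions made for illustration.

```python
THRESHOLD = 50     # from the slide
FOREGROUND = 128   # value assigned to pixels at or above the threshold


def threshold(sat, t=THRESHOLD):
    """Map each saturation value to FOREGROUND if it is >= t, else to 0."""
    return [[FOREGROUND if v >= t else 0 for v in row] for row in sat]


sat = [[10, 49, 50],
       [51, 255, 0]]
binary = threshold(sat)
```

Using 128 rather than 255 for foreground leaves room for the later steps to relabel pixels (to 192, 254, and 255) without a second image buffer.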


Implementation Analysis

3. Calculate Center of Mass (CoM):
a) Count number of 128-valued pixels
b) Sum x-values and y-values of those pixels
c) Divide each sum by the number of pixels

Expressed with image moments:
a) 0th moment of an image: $M_{00} = \sum_x \sum_y I(x, y)$
b) 1st moment for x and y of an image, respectively: $M_{10} = \sum_x \sum_y x\,I(x, y)$ and $M_{01} = \sum_x \sum_y y\,I(x, y)$
c) Center of Mass (location of centroid): $(x_c, y_c)$, where $x_c = \frac{M_{10}}{M_{00}}$ and $y_c = \frac{M_{01}}{M_{00}}$
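The center-of-mass calculation can be sketched as follows, treating I(x, y) as 1 for 128-valued pixels and 0 otherwise, which makes the moment formulas equal to the count-and-sum description. The function names and the nested-list image layout are assumptions for illustration.

```python
def moments(image, value=128):
    """Return (M00, M10, M01), counting only pixels equal to `value`."""
    m00 = m10 = m01 = 0
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            if v == value:
                m00 += 1   # 0th moment: number of foreground pixels
                m10 += x   # 1st moment in x
                m01 += y   # 1st moment in y
    return m00, m10, m01


def center_of_mass(image, value=128):
    """Return (xc, yc), or None if the image has no foreground pixels."""
    m00, m10, m01 = moments(image, value)
    if m00 == 0:
        return None
    return m10 / m00, m01 / m00


image = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 128, 128, 0],
    [0, 128, 128, 0],
]
com = center_of_mass(image)
```

For the 2x2 block above, the centroid falls between pixels, at x = 1.5 and y = 2.5.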


Implementation Analysis

4. Reduce Noise:
FloodFill at the computed CoM (128-valued pixels connected to the CoM become 192; noise blobs not connected to it stay at 128)
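The noise-reduction step can be illustrated with a small breadth-first flood fill. The slides use a FloodFill routine; the version below is a stand-in written for illustration, and its 4-connectivity, function name, and nested-list image layout are assumptions. Note that if the CoM happens to land on a background pixel, this sketch fills nothing.

```python
from collections import deque


def flood_fill(image, seed, old, new):
    """Relabel the 4-connected region of `old` pixels containing seed (x, y).

    Modifies `image` in place and returns the number of pixels relabeled.
    """
    height, width = len(image), len(image[0])
    sx, sy = seed
    if not (0 <= sx < width and 0 <= sy < height):
        return 0
    if image[sy][sx] != old or old == new:
        return 0
    image[sy][sx] = new
    queue = deque([(sx, sy)])
    filled = 1
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < width and 0 <= ny < height and image[ny][nx] == old:
                image[ny][nx] = new
                filled += 1
                queue.append((nx, ny))
    return filled


image = [
    [128, 0, 0, 0, 0],      # isolated noise pixel at (0, 0)
    [0, 0, 128, 128, 0],
    [0, 0, 128, 128, 0],
    [0, 0, 0, 0, 0],
]
# The CoM of all 128-valued pixels is (2.0, 1.2); rounded, the seed is (2, 1).
count = flood_fill(image, (2, 1), 128, 192)
```

After the fill, the hand region is labeled 192 and the stray pixel keeps the value 128, so later steps that look only at 192-valued pixels ignore it.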


Implementation Analysis

5. Remove arm from hand:
a) Find top left of bounding box
b) Apply border for bounding box from calibration measure
c) FloodFill, 192 to 254


Implementation Analysis

6. Calculate refined-CoM (rCoM):
a) Threshold, 254 to 255
b) Compute CoM as before
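Steps 5 and 6 can be sketched together. This is one possible reading of the slides, not the author's code: it assumes the hand sits at the top-left of the 192-valued region with the arm extending beyond a box of the calibrated hand size, and it replaces the bordered FloodFill with a direct relabeling of 192-valued pixels inside that box. All function and parameter names are hypothetical.

```python
def bounding_box_top_left(image, value=192):
    """Return (min x, min y) over pixels equal to `value`, or None."""
    xs = [x for row in image for x, v in enumerate(row) if v == value]
    ys = [y for y, row in enumerate(image) if value in row]
    if not xs:
        return None
    return min(xs), min(ys)


def remove_arm(image, hand_w, hand_h, hand=192, kept=254):
    """Relabel hand pixels inside the calibrated box; the arm stays at `hand`."""
    top_left = bounding_box_top_left(image, hand)
    if top_left is None:
        return None
    x0, y0 = top_left
    for y in range(y0, min(y0 + hand_h, len(image))):
        for x in range(x0, min(x0 + hand_w, len(image[0]))):
            if image[y][x] == hand:
                image[y][x] = kept
    return top_left


def refined_center_of_mass(image, kept=254, final=255):
    """Threshold `kept` pixels to `final` and return their centroid."""
    count = sum_x = sum_y = 0
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            if v == kept:
                row[x] = final
                count += 1
                sum_x += x
                sum_y += y
    return (sum_x / count, sum_y / count) if count else None


image = [
    [0, 192, 192, 0],   # hand: rows 0..2
    [0, 192, 192, 0],
    [0, 192, 192, 0],
    [0, 192, 0, 0],     # arm: rows 3..4
    [0, 192, 0, 0],
]
corner = remove_arm(image, hand_w=2, hand_h=3)
rcom = refined_center_of_mass(image)
```

Because the arm pixels keep the value 192, they no longer pull the centroid toward the wrist, which is the point of computing a refined CoM.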


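Implementation Analysis

7. Orientation:
a) 0th moment of an image: $M_{00} = \sum_x \sum_y I(x, y)$
b) 1st moment for x and y of an image, respectively: $M_{10} = \sum_x \sum_y x\,I(x, y)$ and $M_{01} = \sum_x \sum_y y\,I(x, y)$
c) 2nd moment for x and y of an image, respectively: $M_{20} = \sum_x \sum_y x^2\,I(x, y)$ and $M_{02} = \sum_x \sum_y y^2\,I(x, y)$
d) Orientation of image major axis:

$$\theta = \frac{1}{2} \arctan\left( \frac{2\left(\frac{M_{11}}{M_{00}} - x_c y_c\right)}{\left(\frac{M_{20}}{M_{00}} - x_c^2\right) - \left(\frac{M_{02}}{M_{00}} - y_c^2\right)} \right)$$

where $M_{11} = \sum_x \sum_y x y\,I(x, y)$ is the mixed second moment and $(x_c, y_c) = \left(\frac{M_{10}}{M_{00}}, \frac{M_{01}}{M_{00}}\right)$ is the center of mass.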
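The orientation formula can be sketched in code as below. Two choices are assumptions rather than something the slides state: I(x, y) is taken as 1 for 255-valued pixels (the label after the refined-CoM step) and 0 otherwise, and `math.atan2` is used in place of a plain arctangent so that a zero denominator (equal spread in x and y) does not raise an error. The function name is hypothetical.

```python
import math


def orientation(image, value=255):
    """Return the major-axis angle in radians, or None for an empty image.

    The angle is measured from the x axis in image coordinates,
    where y increases downward.
    """
    m00 = m10 = m01 = m11 = m20 = m02 = 0
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            if v == value:
                m00 += 1
                m10 += x
                m01 += y
                m11 += x * y
                m20 += x * x
                m02 += y * y
    if m00 == 0:
        return None
    xc, yc = m10 / m00, m01 / m00
    a = m20 / m00 - xc * xc          # spread along x
    b = 2 * (m11 / m00 - xc * yc)    # twice the x-y covariance
    c = m02 / m00 - yc * yc          # spread along y
    return 0.5 * math.atan2(b, a - c)


horizontal = [[255, 255, 255, 255]]
diagonal = [
    [255, 0, 0],
    [0, 255, 0],
    [0, 0, 255],
]
theta_h = orientation(horizontal)
theta_d = orientation(diagonal)
```

A horizontal bar yields an angle of 0, and pixels along the line x = y yield pi/4.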


Implementation Analysis

8. Count the number of fingers (via FingerCountGivenX)

Function inputs:
a) Pointer to image data
b) rCoM
c) Radius = 0.17*HandSizeX + 0.17*HandSizeY
d) Starting location (x or y; call the appropriate function)
e) Ending location (x or y; call the appropriate function)
f) White pixel counter
g) Black pixel counter
h) Finger counter


Implementation Analysis

8. Count the number of fingers:
• 2 similar functions, differing in whether the start/end location is given in x or y
• After all previous steps, the finger-finding function sweeps out an arc, counting the number of white and black pixels as it progresses
• A finger in the current system is defined to be any run of 10+ white pixels separated by 3+ black pixels (salt/pepper tolerance); 1 is subtracted from the count for the hand itself
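The counting rule can be sketched on a one-dimensional sequence of pixel values, such as those met while sweeping the arc. This is an interpretation, not the author's `FingerCountGivenX`: it assumes black gaps shorter than 3 pixels are bridged, white runs shorter than 10 pixels are ignored, and white pixels have the value 255. The arc traversal itself is left out, and all names below are hypothetical apart from the radius formula, which comes from the slides.

```python
def arc_radius(hand_size_x, hand_size_y):
    """Radius of the sweep arc, from the slide: 0.17*X + 0.17*Y."""
    return 0.17 * hand_size_x + 0.17 * hand_size_y


def count_white_runs(samples, min_white=10, min_black=3, white=255):
    """Count white runs of at least min_white pixels.

    A run ends only after min_black consecutive black pixels, so
    shorter black gaps (pepper noise) do not split a run.
    """
    runs = 0
    white_len = 0   # white pixels seen in the current run
    black_len = 0   # consecutive black pixels just seen
    for v in samples:
        if v == white:
            white_len += 1
            black_len = 0
        else:
            black_len += 1
            if black_len >= min_black:
                if white_len >= min_white:
                    runs += 1
                white_len = 0
    if white_len >= min_white:
        runs += 1
    return runs


def count_fingers(samples):
    """Subtract 1 from the run count for the hand itself."""
    return max(count_white_runs(samples) - 1, 0)


samples = (
    [0] * 5 + [255] * 12 + [0] * 4          # first wide white run
    + [255] * 11 + [0] * 1 + [255] * 3      # run with one pepper pixel
    + [0] * 6 + [255] * 2 + [0] * 5         # 2-pixel salt blip, ignored
    + [255] * 15                            # final wide white run
)
fingers = count_fingers(samples)
```

The sample sequence contains three qualifying white runs, so the sketch reports two fingers after subtracting one for the hand.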


Implementation Analysis

8. Count the number of fingers:
• [Figure: illustration of noise tolerance]


Implementation Analysis

System Input and System Output:
[Figures: two example input images, each shown with the corresponding system output]


Implementation Analysis

System Runtime:
• Real time requires 30 fps
• Current time: 16.5 ms for one frame (without reading or writing)
• Current processing capability on 1.33 GHz Athlon: 60 fps

Time per process step (ms):

Process Step                 Athlon MP 1500 (1.33 GHz)   Pentium 850 MHz
1) Reading Image             ?                           ?
2) Reading Image             208                         340
3) Threshold                 0.5                         6.5
4) Center of Mass            3.5                         18.5
5) Flood Fill                1.5                         27
6) Bounding Box Top-Left     3.5                         5.5
7) Arm Removal               2                           34.5
8) Refined CoM               4                           19
9) Finger Counting           0.5                         1
10) Write Image              233                         324
Time w/o R&W                 16.5                        112
Time w/o Write               224.5                       452
Total Time                   457.5                       776.5
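The frame-rate figures follow from the per-frame times by simple arithmetic, sketched below. The 16.5 ms and 112 ms values are the "Time w/o R&W" entries from the table and the 30 fps target is from the slide; the function and variable names are assumptions.

```python
REAL_TIME_FPS = 30  # real-time requirement stated on the slide


def frames_per_second(ms_per_frame):
    """Convert a per-frame processing time in milliseconds to frames per second."""
    return 1000.0 / ms_per_frame


athlon_fps = frames_per_second(16.5)    # processing only, no read/write
pentium_fps = frames_per_second(112.0)  # processing only, no read/write

athlon_is_real_time = athlon_fps >= REAL_TIME_FPS
pentium_is_real_time = pentium_fps >= REAL_TIME_FPS
```

On these figures the Athlon exceeds the 30 fps target when image reading and writing are excluded, while the Pentium does not.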


Future Directions

• Optimization

• Orientation for Hand Registration

• New Finger Counting Approach

• Learning System

For additional information, please visit http://ksl.jsu.edu.