EEC-693/793 Applied Computer Vision with Depth Cameras
Lecture 8
Wenbing Zhao
[email protected]



TRANSCRIPT

Page 1: EEC-693/793 Applied Computer Vision with Depth Cameras Lecture 8 Wenbing Zhao

EEC-693/793 Applied Computer Vision with Depth Cameras
Lecture 8
Wenbing Zhao
[email protected]

Page 2: Outline

Human skeleton tracking

Page 3: Skeleton Tracking

Real-Time Human Pose Recognition in Parts from Single Depth Images, by J. Shotton et al. at Microsoft Research Cambridge & Xbox Incubation: http://research.microsoft.com/apps/pubs/default.aspx?id=145347

Real-time human pose recognition is difficult and challenging because of the wide variation in body poses, sizes, clothing, and heights

Kinect uses a rendering pipeline that matches the incoming data (raw depth data from the Kinect) against trained sample data

The machine-learned data is collected from base characters with different poses, hair types, and clothing, in different rotations and views

The machine-learned data is labeled with individual body parts and is matched against the incoming depth data to identify which part of the body each pixel belongs to

The rendering pipeline processes the data in several steps to track human body parts from the depth data

Page 4: The Rendering Pipeline Processes

From the depth image, we can easily identify the human body object

In the absence of any other logic, the sensor will not know whether this is a human body or something else

To start recognizing a human body, Kinect matches each individual pixel of the incoming depth data against the data the machine has learned

The data the machine has learned is labeled and has associated values that are matched against the incoming data

The matching is based on the probability that the incoming data matches the data the machine has learned

Page 5: The Rendering Pipeline Processes

The next step is to label the body parts by creating segments

Kinect uses a trained tree structure (known as a decision tree) to match the data for a specific type of human body

Eventually, every single pixel of data passes through this tree to be matched with a body part (see the sketch below)

Once the different body parts are identified, the sensor positions the joint points on the most probable matched data

With the joint points identified and their movement followed over time, the sensor can track the movement of the complete body
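To make the per-pixel idea concrete, the following is an illustrative sketch of our own (heavily simplified, not Kinect's actual code) of how a trained decision tree can assign a body-part label to a depth pixel; the real pipeline uses randomized decision forests with depth-comparison features, as described in the Shotton et al. paper.

// Illustrative sketch only -- our simplification, not the Kinect implementation.
class TreeNode
{
    public bool IsLeaf;
    public int BodyPartLabel;      // valid only at a leaf (e.g., head, left hand, ...)
    public double Threshold;       // learned split threshold
    public int OffsetU, OffsetV;   // pixel offsets used by the depth-comparison feature
    public TreeNode Left, Right;
}

static int ClassifyPixel(double[] depth, int pixel, TreeNode node)
{
    while (!node.IsLeaf)
    {
        // Depth-comparison feature: difference between two depth probes near the pixel.
        double feature = depth[pixel + node.OffsetU] - depth[pixel + node.OffsetV];
        node = feature > node.Threshold ? node.Right : node.Left;
    }
    return node.BodyPartLabel;
}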

Page 6: The Rendering Pipeline Processes

The joint positions are measured by three coordinates (x, y, z)

X and Y define the position of the joint

Z represents the distance from the sensor

To get the proper coordinates, the sensor calculates three views of the same image: the front, left, and top views, which together define the 3D body proposal
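As a small illustration (ours, not from the slides), each joint of a tracked skeleton exposes these skeleton-space coordinates through the SDK; the skeleton variable below is assumed to be a tracked Skeleton object obtained as shown later in the lecture.

// Read the (x, y, z) skeleton-space position of the right hand, in meters.
Joint rightHand = skeleton.Joints[JointType.HandRight];
SkeletonPoint p = rightHand.Position;
Console.WriteLine("x = {0:F2}, y = {1:F2}, z = {2:F2} (distance from sensor)", p.X, p.Y, p.Z);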

Page 7: Skeleton Tracking

The Kinect for Windows SDK provides us with a set of APIs that allow easy access to the skeleton joints

The SDK supports the tracking of up to 20 joint points

Tracking state: Tracked, Not Tracked, or Position Only (see the snippet after this list)

Tracking modes: default and seated

Default mode: detects the user based on the distance of the subject from the background

Seated mode: uses movement to detect the user and distinguish him or her from the background, such as a couch or chair
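As a quick illustration (our sketch, not from the slides) of how these states appear in the API: a Skeleton carries a SkeletonTrackingState, and each of its joints carries its own JointTrackingState (Tracked, Inferred, or Not Tracked).

// Check the skeleton-level state, then inspect one joint's state.
if (skeleton.TrackingState == SkeletonTrackingState.Tracked)
{
    Joint head = skeleton.Joints[JointType.Head];
    Console.WriteLine("Head joint state: {0}", head.TrackingState);
}
else if (skeleton.TrackingState == SkeletonTrackingState.PositionOnly)
{
    Console.WriteLine("Only the overall position is known: {0:F2} m away", skeleton.Position.Z);
}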

Page 8: Skeleton Tracking

Kinect can fully track up to two users

It can detect up to 6 users (4 of them with position only)
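If an application wants to decide which two of the detected users get full tracking, the SkeletonStream lets the application choose them explicitly; this is our addition, and trackingId1/trackingId2 are placeholders for the TrackingId values of the desired players.

// Optional (not from the slides): take over skeleton selection so the application
// decides which two detected players are fully tracked.
this.sensor.SkeletonStream.AppChoosesSkeletons = true;
this.sensor.SkeletonStream.ChooseSkeletons(trackingId1, trackingId2);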

Page 9: Skeleton Tracking

Seated skeleton: up to 10 joints

The seated pipeline provides a different segmentation mask than the default pipeline:

Continuity of the segmentation mask is not guaranteed outside of the arms, head, and shoulder areas

The seated segmentation mask doesn't correspond exactly to the player outline the way the standing (full-body) mask does

The seated pipeline environment has less data, with more noise and variability, than the standing environment

The seated mode uses more resources than the default pipeline and yields a lower throughput (in frames per second) on the same scene

kinect.SkeletonStream.TrackingMode = SkeletonTrackingMode.Seated;

Page 10: Capturing and Processing Skeleton Data

Enable the skeleton stream channel with the type of depth image format (an optional smoothing variant is sketched after the code below)

Attach the event handler to the skeleton stream channel

Process the incoming skeleton frames

Render a joint on UI

this.sensor = KinectSensor.KinectSensors[0];
this.sensor.SkeletonStream.Enable();

this.sensor.SkeletonFrameReady += skeletonFrameReady;

void skeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e){}
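If the rendered joint appears jittery, the skeleton stream can instead be enabled with smoothing; the Enable overload and the TransformSmoothParameters type are part of the SDK, but the values below are only illustrative, not taken from the slides.

// Optional: enable the skeleton stream with smoothing applied to the joint data.
this.sensor.SkeletonStream.Enable(new TransformSmoothParameters
{
    Smoothing = 0.5f,            // higher = smoother, but more latency
    Correction = 0.5f,
    Prediction = 0.5f,
    JitterRadius = 0.05f,        // meters
    MaxDeviationRadius = 0.04f   // meters
});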

Page 11: Processing Skeleton Data

void skeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
{
    using (SkeletonFrame skeletonFrame = e.OpenSkeletonFrame())
    {
        if (skeletonFrame == null)
        {
            return;
        }
        skeletonFrame.CopySkeletonDataTo(totalSkeleton);
        Skeleton firstSkeleton = (from trackskeleton in totalSkeleton
                                  where trackskeleton.TrackingState == SkeletonTrackingState.Tracked
                                  select trackskeleton).FirstOrDefault();
        if (firstSkeleton == null)
        {
            return;
        }
        if (firstSkeleton.Joints[JointType.HandRight].TrackingState == JointTrackingState.Tracked)
        {
            this.MapJointsWithUIElement(firstSkeleton);
        }
    }
}

Skeleton[] totalSkeleton = new Skeleton[6];

Page 12: Render the Right-Hand Joint on UI

We have to map the coordinates from skeleton space to the regular image space

Page 13: Render the Right-Hand Joint on UI

The returned depthPoint contains the X and Y coordinates corresponding to the skeleton joint point

private Point ScalePosition(SkeletonPoint skeletonPoint)
{
    DepthImagePoint depthPoint = this.sensor.CoordinateMapper.MapSkeletonPointToDepthPoint(
        skeletonPoint, DepthImageFormat.Resolution640x480Fps30);
    return new Point(depthPoint.X, depthPoint.Y);
}

private void MapJointsWithUIElement(Skeleton skeleton)
{
    Point mappedPoint = ScalePosition(skeleton.Joints[JointType.HandRight].Position);
    Canvas.SetLeft(righthand, mappedPoint.X);
    Canvas.SetTop(righthand, mappedPoint.Y);
}
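If the joint should be overlaid on the color (RGB) stream rather than the depth image, the coordinate mapper provides an analogous mapping; this variant is our addition and assumes the color stream is enabled at 640x480.

// Alternative (not from the slides): map a skeleton point into color-image space.
private Point ScaleToColorPosition(SkeletonPoint skeletonPoint)
{
    ColorImagePoint colorPoint = this.sensor.CoordinateMapper.MapSkeletonPointToColorPoint(
        skeletonPoint, ColorImageFormat.RgbResolution640x480Fps30);
    return new Point(colorPoint.X, colorPoint.Y);
}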

Page 14: Build TrackingHand App

Create a new C# WPF project named TrackingHand

Add the Microsoft.Kinect reference

Design the GUI

Add the WindowLoaded() method and hook it up in the XAML file

Add the code

Page 15: GUI Design

Add a Canvas control, then add an Ellipse control inside the Canvas
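For reference, the same UI elements could also be created from code-behind instead of the XAML designer; this is only a sketch of ours, and drawingCanvas is assumed to be the Canvas placed on the window.

// Create the ellipse that will follow the right hand (equivalent to the designer step).
Ellipse righthand = new Ellipse
{
    Width = 30,
    Height = 30,
    Fill = Brushes.Red
};
drawingCanvas.Children.Add(righthand);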

Page 16: Adding Code

Add the member variables:

KinectSensor sensor;
Skeleton[] totalSkeleton = new Skeleton[6];

WindowLoaded() method (WindowClosing() is the same as before; a minimal sketch of it follows the code below):

private void WindowLoaded(object sender, RoutedEventArgs e)
{
    this.sensor = KinectSensor.KinectSensors[0];
    this.sensor.SkeletonStream.TrackingMode = SkeletonTrackingMode.Seated;
    this.sensor.SkeletonStream.Enable();
    this.sensor.SkeletonFrameReady += skeletonFrameReady;
    // start the sensor
    this.sensor.Start();
}
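The minimal WindowClosing() sketch referred to above (ours, assuming it only needs to stop the sensor cleanly when the window closes):

private void WindowClosing(object sender, System.ComponentModel.CancelEventArgs e)
{
    // Stop the sensor cleanly when the window closes.
    if (this.sensor != null && this.sensor.IsRunning)
    {
        this.sensor.Stop();
    }
}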

Page 17: Adding Code

Event handler for the skeleton frames:

void skeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
{
    using (SkeletonFrame skeletonFrame = e.OpenSkeletonFrame())
    {
        if (skeletonFrame == null)
        {
            return;
        }
        skeletonFrame.CopySkeletonDataTo(totalSkeleton);
        Skeleton firstSkeleton = (from trackskeleton in totalSkeleton
                                  where trackskeleton.TrackingState == SkeletonTrackingState.Tracked
                                  select trackskeleton).FirstOrDefault();
        if (firstSkeleton == null)
        {
            return;
        }
        if (firstSkeleton.Joints[JointType.HandRight].TrackingState == JointTrackingState.Tracked)
        {
            this.MapJointsWithUIElement(firstSkeleton);
        }
    }
}

Page 18: Adding Code

For the UI display:

private void MapJointsWithUIElement(Skeleton skeleton)
{
    Point mappedPoint = ScalePosition(skeleton.Joints[JointType.HandRight].Position);
    Canvas.SetLeft(righthand, mappedPoint.X);
    Canvas.SetTop(righthand, mappedPoint.Y);
    // this.textBox1.Text = "x=" + mappedPoint.X + ", y=" + mappedPoint.Y;
}

private Point ScalePosition(SkeletonPoint skeletonPoint)
{
    DepthImagePoint depthPoint = this.sensor.CoordinateMapper.MapSkeletonPointToDepthPoint(
        skeletonPoint, DepthImageFormat.Resolution640x480Fps30);
    return new Point(depthPoint.X, depthPoint.Y);
}

Page 19: Challenge Task

For advanced students: modify the project to make it a drawing app (one possible approach is sketched below)

Show all traces of the hand movement

Add a button to clear the traces and start a new drawing

Add a small palette chooser to change the color of the drawing point (an Ellipse)
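One possible direction (our sketch, not a provided solution): instead of moving a single Ellipse, add a new small dot to the Canvas every time the right hand is tracked, and clear the Canvas from a button's Click handler; currentBrush, drawingCanvas, and the handler names below are assumptions.

// Leave a trace: add one small dot per processed frame at the mapped hand position.
private Brush currentBrush = Brushes.Red;   // could be set from a small palette chooser

private void MapJointsWithUIElement(Skeleton skeleton)
{
    Point mappedPoint = ScalePosition(skeleton.Joints[JointType.HandRight].Position);
    Ellipse dot = new Ellipse { Width = 10, Height = 10, Fill = currentBrush };
    Canvas.SetLeft(dot, mappedPoint.X);
    Canvas.SetTop(dot, mappedPoint.Y);
    drawingCanvas.Children.Add(dot);
}

// Click handler for the "Clear" button.
private void ClearButton_Click(object sender, RoutedEventArgs e)
{
    drawingCanvas.Children.Clear();
}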
