
Page 1


A High Performance Robot Vision Algorithm Implemented in Python

S. Chris Colbert¹, Gregor Franz², Konrad Wöllhaf², Redwan Alqasemi¹, Rajiv Dubey¹

¹Department of Mechanical Engineering, University of South Florida
²University of Applied Sciences Ravensburg-Weingarten, Germany

SciPy 2010

Page 2

Why the Need for Autonomous Robot Vision?

It has broad applicability.

Industrial automation

Nuclear waste handling

Assistive and service-oriented robots

According to a 2006 US Census Bureau report, 51.2 million Americans suffer from some form of disability, and 10.7 million of them are unable to independently perform activities of daily living (ADL).

Assistive robots can help with this.

Page 3

The State of the Art

Autonomous object recognition algorithms come in two forms:

A priori knowledge based

Novelty based

Page 4

The State of the Art

How do the a priori knowledge based systems work?

The system starts with one or more models.

3D models, images, features, ...

Match the object against the database.

Retrieve information.

Example: Schlemmer et al.

1 Store shape and appearance.

2 Find shape in range data.

3 Match appearance data.

4 Grasp via visual servoing and matched SIFT points.

Page 5

The State of the Art

How do novelty based systems work?

With lots and lots of data.

Stereo, shape-from-silhouettes, laser rangers.

Full access to the object is typically required.

Long computation times.

Example: Yamazaki et al.

1 Drive robot around object.

2 Capture more than 130 images.

3 Perform dense disparity reconstruction.

4 Wait around 100 s for the computed results.

Page 6

Objectives

Reconstruct the shape and pose of a novel object to a degree of accuracy sufficient to permit grasp and manipulation planning.

Require no a priori knowledge of the object, beyond the assumption that a given object is the object of interest.

Require only a minimal number of images for reconstruction; significantly fewer than the status quo.

Operate efficiently, such that the computation time is negligible in comparison to image capture times.

Page 7

Algorithm Overview

The algorithm has three main phases:

1 Capture three images of the object and generate a silhouette of the object for each image.

2 Use the silhouettes to generate a point cloud that approximates the surface of the object.

3 Improve the approximation by fitting a parametrized shape to the points. The parameters of this shape serve as the model of the object.

Page 8

Image Capture

Three images are captured from disparate locations.

Two frontal, one overhead.

Mutually orthogonal viewing directions are preferred.

But this can be relaxed due to kinematic constraints.

Store the reprojection matrix ${}^{C}_{W}T$ for each image.

Page 9

Pre-Processing: Image Undistortion

Image distortion is corrected according to the equations

$$x_p = \left(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\right) x_d + \left(2 p_1 x_d y_d + p_2 (r^2 + 2 x_d^2)\right)$$

$$y_p = \left(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\right) y_d + \left(2 p_2 x_d y_d + p_1 (r^2 + 2 y_d^2)\right)$$

where $(x_d, y_d)$ are the distorted image points and $(k_1, k_2, k_3, p_1, p_2)$ are the five distortion coefficients.
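As a concrete reading of the equations above, here is a minimal NumPy sketch (the function name is mine, and normalized image coordinates with $r^2 = x_d^2 + y_d^2$ are assumed; in practice a library routine such as OpenCV's undistortion would typically be used):

```python
import numpy as np

def undistort_points(xd, yd, k1, k2, k3, p1, p2):
    """Correct distorted points (xd, yd) per the equations above.

    Assumes normalized image coordinates with r^2 = xd^2 + yd^2.
    """
    r2 = xd ** 2 + yd ** 2
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    xp = radial * xd + (2.0 * p1 * xd * yd + p2 * (r2 + 2.0 * xd ** 2))
    yp = radial * yd + (2.0 * p2 * xd * yd + p1 * (r2 + 2.0 * yd ** 2))
    return xp, yp
```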

Page 10

Pre-Processing: Silhouette Generation

Color-based segmentation is used to generate the silhouette of the object in each image.
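The talk does not show the segmentation code. As an illustrative sketch only (the function and the HSV thresholds are hypothetical, chosen for the red test objects described later; the talk itself used the scikits.image OpenCV bindings):

```python
import numpy as np
import cv2

def red_silhouette(bgr):
    """Binary silhouette of a red object via HSV thresholding (sketch)."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    # red hue wraps around 180 in OpenCV, so two ranges are needed
    lo = cv2.inRange(hsv, np.array([0, 120, 70]), np.array([10, 255, 255]))
    hi = cv2.inRange(hsv, np.array([170, 120, 70]), np.array([180, 255, 255]))
    return cv2.bitwise_or(lo, hi)
```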

Page 11

Surface Approximation

Once the images have been captured and preprocessed, the 3D surface of the object is approximated in the form of a point cloud. This comprises two major steps:

1 Create a sphere of points that completely bounds the object.

Requires the calculation of a centroid and radius.

2 Modify the position of each point such that the projection of the point intersects or lies on the edge of the silhouette.

Page 12

Bounding Sphere Construction: Finding the Centroid

1 For each silhouette, project a ray from the camera center through the imaged silhouette centroid.

2 Find the single point which minimizes the sum of squared distances to each ray (see the sketch below).

3 This point is the approximate centroid of the object and is used as the centroid of the sphere.
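The least-squares intersection in step 2 has a small closed form; a minimal sketch (names are mine), assuming each ray is given by a camera center and a direction vector:

```python
import numpy as np

def intersect_rays(origins, directions):
    """Point minimizing the sum of squared distances to all rays.

    origins, directions: (n, 3) arrays, one ray per silhouette.
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for c, d in zip(origins, directions):
        d = d / np.linalg.norm(d)
        M = np.eye(3) - np.outer(d, d)  # projector onto the plane normal to d
        A += M
        b += M @ c
    return np.linalg.solve(A, b)  # the approximate object centroid
```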

Page 13

Bounding Sphere Construction: Finding the Radius

[Figure: a silhouette with r_max labeled]

1 For each silhouette, find r_max as shown in the figure.

2 Select the silhouette with the largest r_max for further processing.

Page 14

Bounding Sphere Construction: Finding the Radius

[Figure: the camera center, the centroid ray, and the ray through r_max, with points p_1 through p_4 labeled]

$$p_4 = p_3 + (p_3 - p_1)\,t, \qquad t = \frac{-(p_1 - p_2) \cdot (p_3 - p_2)}{(p_1 - p_2) \cdot (p_3 - p_1)}$$

1 Project two rays from the camera center: one through the centroid, and one through r_max.

2 Construct a plane at the centroid that is perpendicular to the centroid ray.

3 Find the point that lies on this plane and on the ray containing r_max (see the sketch below).
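The formula above is a direct line-plane intersection. A minimal sketch, under my reading of the figure (p1 the camera center, p2 the centroid, p3 a point on the ray through r_max):

```python
import numpy as np

def radius_point(p1, p2, p3):
    """Intersect the r_max ray with the plane through the centroid.

    The plane passes through p2 with normal (p1 - p2); the line passes
    through p3 with direction (p3 - p1). Returns p4 in the slide's notation.
    """
    n = p1 - p2
    t = -np.dot(n, p3 - p2) / np.dot(n, p3 - p1)
    return p3 + (p3 - p1) * t
```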

Page 15

Bounding Sphere Construction: Example

Example: A simulated cylinder bounded by the computed sphere.

Points are generated using a simple routine based on the golden ratio (one such routine is sketched below).

The radius of the sphere is generally increased by a factor to ensure complete bounding of the object.
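The slide names only "a simple routine based on the golden ratio"; one common such routine (this exact variant is an assumption, not necessarily the talk's) is the Fibonacci sphere:

```python
import numpy as np

def fibonacci_sphere(n, center, radius):
    """n roughly uniform points on a sphere, spun by the golden angle."""
    i = np.arange(n)
    golden_angle = np.pi * (3.0 - np.sqrt(5.0))
    z = 1.0 - 2.0 * (i + 0.5) / n  # evenly spaced heights in [-1, 1]
    r = np.sqrt(1.0 - z * z)
    pts = np.stack([r * np.cos(i * golden_angle),
                    r * np.sin(i * golden_angle),
                    z], axis=1)
    return np.asarray(center) + radius * pts
```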

Page 16

Point Cloud Manipulation

[Figure: a bounding-sphere point x_i, its projection p, and the nearest silhouette pixel p′]

x_0 = sphere center; c_0 = camera center; x_i = sphere point; x_i^new = modified x_i

1 Project x_i into the silhouette image to get x′_i.

2 If x′_i intersects the silhouette, do nothing.

3 Otherwise, find the silhouette pixel p′ nearest the projection.

4 Let the line c_0 p′ be L1.

5 Let the line x_0 x_i be L2.

6 Let x_i^new be the point of intersection of lines L1 and L2 (see the sketch below).
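Steps 4-6 amount to intersecting two 3D lines. Since back-projected rays rarely meet exactly, a sketch can take the point on L2 closest to L1 (this tie-break is my assumption):

```python
import numpy as np

def closest_point_on_l2(c0, u, x0, v):
    """Point on L2 (x0 + t*v) closest to L1 (c0 + s*u).

    L1 is the camera ray through p'; L2 runs from the sphere center
    through xi. Returns the modified point xi_new.
    """
    w = x0 - c0
    uu, vv, uv = u @ u, v @ v, u @ v
    t = (uv * (u @ w) - uu * (v @ w)) / (uu * vv - uv * uv)
    return x0 + t * v
```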

Page 17

Point Cloud Manipulation: Results

The procedure is applied to each point once in each silhouette image.

This is a significant improvement over other algorithms.

The result is a rough approximation of the object's surface.

Given an infinite number of images, the approximation would converge to the visual hull.

Rather than capturing more images, we improve the approximation by fitting a superquadric to the points.

Page 18

Shape Fitting: Overview

The final phase of the algorithm is to find a shape that best fits the point cloud. We use superquadrics as our modeling tool for a variety of reasons:

They have a convenient parametrized form which can be used directly for grasp planning.

Their closed-form expression provides a nice basis for non-linear minimization.

Their nature makes them robust to small sources of error.

They are capable of accurately approximating the shape of many objects used in activities of daily living.

Page 19

Superquadrics: Some Possible Shapes

[Figure: a gallery of example superquadric shapes]

Page 20

Superquadrics: Standard Equation

Implicit Superquadric Equation

$$F(x_w, y_w, z_w) = \left( \left( \frac{n_x x_w + n_y y_w + n_z z_w - p_x n_x - p_y n_y - p_z n_z}{a_1} \right)^{\frac{2}{\varepsilon_2}} + \left( \frac{o_x x_w + o_y y_w + o_z z_w - p_x o_x - p_y o_y - p_z o_z}{a_2} \right)^{\frac{2}{\varepsilon_2}} \right)^{\frac{\varepsilon_2}{\varepsilon_1}} + \left( \frac{a_x x_w + a_y y_w + a_z z_w - p_x a_x - p_y a_y - p_z a_z}{a_3} \right)^{\frac{2}{\varepsilon_1}}$$

Evaluates to 1 if a point $(x_w, y_w, z_w)$ lies on the superquadric.

$F(x_w, y_w, z_w)$ is also called the inside-outside function.

17 parameters at first glance; 6 are redundant.
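A direct NumPy transcription of the inside-outside function (function and argument names are mine; the absolute values, standard in superquadric fitting, keep the fractional powers real):

```python
import numpy as np

def inside_outside(pts, n, o, a_dir, p, a1, a2, a3, e1, e2):
    """Evaluate F for an (m, 3) array of world points; F == 1 on the surface."""
    d = pts - p  # note n.(x - p) equals the slide's n.x - p.n, etc.
    x = np.abs(d @ n) / a1
    y = np.abs(d @ o) / a2
    z = np.abs(d @ a_dir) / a3
    return (x ** (2 / e2) + y ** (2 / e2)) ** (e2 / e1) + z ** (2 / e1)
```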

Page 21

Superquadrics

The parameters $(n_x, n_y, n_z, o_x, o_y, o_z, a_x, a_y, a_z, p_x, p_y, p_z)$ make up the 4x4 transformation matrix that relates the superquadric-centered coordinate system to the world coordinate system.

$${}^{W}_{Q}T = \begin{bmatrix} n_x & o_x & a_x & p_x \\ n_y & o_y & a_y & p_y \\ n_z & o_z & a_z & p_z \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

The 3x3 rotation portion is orthonormal and can be decomposed into the ZYZ-Euler angles $(\phi, \theta, \psi)$. Thus, the superquadric is parametrized by 11 parameters:

$$\Lambda = (\lambda_1, \lambda_2, \ldots, \lambda_{11}) = (a_1, a_2, a_3, \varepsilon_1, \varepsilon_2, \phi, \theta, \psi, p_x, p_y, p_z)$$
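For reference, a sketch of recovering the ZYZ-Euler angles from the rotation block (assumes sin θ ≠ 0; the degenerate case needs separate handling):

```python
import numpy as np

def zyz_euler(R):
    """ZYZ angles from R = Rz(phi) @ Ry(theta) @ Rz(psi)."""
    theta = np.arccos(np.clip(R[2, 2], -1.0, 1.0))  # R[2, 2] = cos(theta)
    phi = np.arctan2(R[1, 2], R[0, 2])
    psi = np.arctan2(R[2, 1], -R[2, 0])
    return phi, theta, psi
```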

Page 22

Superquadrics

What do the 11 parameters represent?

$(a_1, a_2, a_3)$ are the dimensions of the superquadric in the $x$, $y$, and $z$ directions.

$(\varepsilon_1, \varepsilon_2)$ are the shape exponents.

$(\phi, \theta, \psi)$ are the ZYZ-Euler angles, which define the orientation.

$(p_x, p_y, p_z)$ are the $(x, y, z)$ world coordinates of the centroid of the superquadric.

Page 23

Classical Cost Function

Inside-Outside function

$F = F(x_w, y_w, z_w, \lambda_1, \lambda_2, \ldots, \lambda_{11})$

Cost function

$$\min_{\Lambda} \sum_{i=1}^{n} \left( \sqrt{\lambda_1 \lambda_2 \lambda_3}\, \left( F^{\varepsilon_1} - 1 \right) \right)^2$$

Standard cost function as derived by Jaklič, Leonardis, and Solina.

$\sqrt{\lambda_1 \lambda_2 \lambda_3}$ recovers the smallest superquadric.

The $\varepsilon_1$ exponent promotes rapid and robust convergence.

A non-linear gradient descent algorithm is used to find $\Lambda$.

Limits are placed on certain $\lambda_i$ to restrict the range of recoverable shapes.
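Putting the pieces together, a hedged sketch of the fit. The talk says only that SciPy's non-linear minimization was used; the solver choice here and the starting point lam0 and bounds (both user-supplied) are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def rotation_zyz(phi, theta, psi):
    """ZYZ-Euler rotation; its columns are the axes n, o, a."""
    cph, sph = np.cos(phi), np.sin(phi)
    cth, sth = np.cos(theta), np.sin(theta)
    cps, sps = np.cos(psi), np.sin(psi)
    Rz1 = np.array([[cph, -sph, 0], [sph, cph, 0], [0, 0, 1]])
    Ry = np.array([[cth, 0, sth], [0, 1, 0], [-sth, 0, cth]])
    Rz2 = np.array([[cps, -sps, 0], [sps, cps, 0], [0, 0, 1]])
    return Rz1 @ Ry @ Rz2

def cost(lam, pts):
    """Standard superquadric recovery cost for an (m, 3) point cloud."""
    a1, a2, a3, e1, e2, phi, theta, psi, px, py, pz = lam
    R = rotation_zyz(phi, theta, psi)
    d = (pts - np.array([px, py, pz])) @ R  # points in the superquadric frame
    x, y, z = np.abs(d.T) / np.array([a1, a2, a3])[:, None]
    F = (x ** (2 / e2) + y ** (2 / e2)) ** (e2 / e1) + z ** (2 / e1)
    return np.sum((np.sqrt(a1 * a2 * a3) * (F ** e1 - 1.0)) ** 2)

# result = minimize(cost, lam0, args=(pts,), bounds=bounds)
```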

Page 24

Error-Rejecting Cost Function

Modified Cost Function

$$\min_{\Lambda} \left[ w \sum_{i=1}^{n} \left( \sqrt{\lambda_1 \lambda_2 \lambda_3}\, \left( F^{\varepsilon_1} - 1 \right) \right)^2 + (1 - w) \sum_{i \,:\, F^{\varepsilon_1} < 1} \left( \sqrt{\lambda_1 \lambda_2 \lambda_3}\, \left( F^{\varepsilon_1} - 1 \right) \right)^2 \right]$$

Penalizes points that lie inside the superquadric.

Forces the superquadric to reject perspective projection errors.

The superquadric will be as large as possible, without exceeding the visual hull.

Empirically determined w = 0.2.
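The modification is essentially a reweighting of the same residuals. A sketch, reusing the inside-outside values F of all cloud points from the previous sketch:

```python
import numpy as np

def modified_cost(F, a1, a2, a3, e1, w=0.2):
    """Error-rejecting cost: extra weight on points inside the superquadric."""
    r = np.sqrt(a1 * a2 * a3) * (F ** e1 - 1.0)
    inside = F ** e1 < 1.0  # points strictly inside the recovered shape
    return w * np.sum(r ** 2) + (1.0 - w) * np.sum(r[inside] ** 2)
```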

Page 25

Cost Function Comparison

[Figure: the surface approximation (left) fit with the standard cost function (center) and the modified cost function (right); the modified cost function yields 20% greater accuracy]

Page 26

Example Reconstruction

Page 27

Example Reconstruction

Page 28

Simulation Trials: Overview

Tested the algorithm against a simulated sphere, cylinder, prism, and cube.

These shapes represent a range of common convex shapes and can be modeled accurately by a superquadric.

The results are reported by comparing the recovered superquadric parameters against the known ground truth.

The volume of the superquadric is also compared against the volume of the object in the form of a fraction (see the sketch below).

The volume fraction vf is a quick and intuitive measure of accuracy.
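The superquadric volume has a closed form (given, e.g., by Jaklič et al.), which makes the volume fraction cheap to compute; a sketch:

```python
from scipy.special import beta

def superquadric_volume(a1, a2, a3, e1, e2):
    """Closed-form superquadric volume.

    Sanity check: a1 = a2 = a3 = r with e1 = e2 = 1 gives 4*pi*r**3 / 3.
    """
    return (2.0 * a1 * a2 * a3 * e1 * e2
            * beta(e1 / 2.0 + 1.0, e1) * beta(e2 / 2.0, e2 / 2.0))

# vf = superquadric_volume(*recovered_params) / true_volume
```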

Page 29

Simulation Trials: Results

[Figure: the four reconstructions, with volume fractions vf = 1.087, 1.088, 1.077, and 1.092]

Page 30

Hardware Setup: Robot

Kuka KR 6/2

Six-axis, low-payload industrial manipulator.

High repeatability: ±0.1 mm

Page 31

Test Setup: Viewing Positions

Due to kinematic limitations of the robot, the viewing directions were not perfectly orthogonal, but they approached such a condition.

Page 32

Test Setup: Test Objects

Four test objects, all red in color, which represent a range of frequently encountered shapes.

Page 33

Experimental Trials: Sources of Error

Several sources of error are introduced in the hardware implementation that are not present in the simulation environment:

Imprecise camera calibration: intrinsics and extrinsics

Robot kinematic uncertainty

Imperfect segmentation

Ground truth measurement uncertainty (must be measured with the robot).

Page 34

Experimental Trials: Battery Box

vf = 1.18

Page 35

Experimental Trials: Cup Stack

vf = 1.13

Page 36

Experimental Trials: Yarn Ball

vf = 1.14

Page 37

Experimental Trials: Cardinal Statue

vf = N/A

Page 38

Performance Evaluation

With respect to the stated objectives:

Most parameters of the reconstruction differ from the ground truth by no more than a few percent. This should be well within the margin of error for most household retrieval tasks.

Contrast this with an error of 10% in the work by Yamazaki et al.

On an Intel QX-9300 at 2.53 GHz, the algorithm executes in ~0.3 seconds on average. This time depends largely on the time required for the non-linear minimization routine to converge.

The work by Yamazaki et al. required in excess of 100 seconds to converge.

Page 39

Python Implementation: Core Algorithm

NumPy

Basic image data structure

Linear algebra

Cython

Image processing and superquadric gradient

Scikits.Image

OpenCV bindings

SciPy

Non-linear minimization

Page 40

Python Implementation: Simulation

Mayavi

Simulator engine

3D renderings

Traits UI

Simulator UI

Page 41

Python Implementation: Hardware Networking

OpenOPC

Robot control and communication

xmlrpclib

Exposes the core algorithm to the network

PIL

Camera JPEG -> NumPy conversion

Page 42

Summary

This work has presented an algorithm for the shape and pose reconstruction of novel objects using just three images.

The algorithm requires fewer images than other algorithms in the published literature, and provides sufficient accuracy for grasp and manipulation planning.

The algorithm provides higher performance than the other algorithms in the published literature.

The algorithm is implemented entirely in Python, using libraries such as NumPy, SciPy, Cython, Mayavi, Traits, OpenOPC, and PIL.
