# A Multi-Camera Active-Vision System for Deformable-Object-Motion Capture

Post on 23-Dec-2016




J Intell Robot Syst, DOI 10.1007/s10846-013-9961-0


David S. Schacter, Mario Donnici, Evgeny Nuger, Matthew Mackay, Beno Benhabib

Received: 12 September 2013 / Accepted: 14 September 2013 / © Springer Science+Business Media Dordrecht 2013

Abstract A novel methodology is proposed to select the on-line, near-optimal positions and orientations of a set of dynamic cameras for a reconfigurable multi-camera active-vision system that captures the motion of a deformable object. The active-vision system accounts for the deformation of the object-of-interest by fusing tracked vertices on its surface with those triangulated from features detected in each camera's view, in order to predict the shape of the object at subsequent demand instants. It then selects a system configuration that minimizes the error in the recovered position of each of these features. The tangible benefits of using a reconfigurable system, particularly one with translational cameras, versus systems with static cameras in a fixed configuration, are demonstrated through simulations and experiments in both obstacle-free and obstacle-laden environments.

D. S. Schacter · E. Nuger · M. Mackay · B. Benhabib (✉)
Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON, Canada
e-mail: benhabib@mie.utoronto.ca

D. S. Schacter
e-mail: david.schacter@utoronto.ca

M. Donnici
Department of Mechanical and Industrial Engineering, University of Calabria, Rende, Italy
e-mail: mario.donnici@gmail.com

Keywords Active vision · Camera reconfiguration · Deformable objects · Motion capture · Robot vision systems · View planning

## 1 Introduction

The motion capture of deformable objects (those whose shape variance is distributed relatively uniformly over their surfaces) is a complex research problem, owing to the inherent difficulties in recovering dynamic surface forms in noisy environments [1–5]. Motion-capture methods for such objects often rely on breaking the object's surface into a mesh of vertices whose motion is then tracked [1, 6], as will be the focus herein. Nevertheless, deformable-object-motion capture has been attempted in a variety of applications, such as performance capture [7, 8], expression recognition [9], tele-immersion [10], and soft-tissue deformation tracking [11, 12]. However, for many proposed systems, the underlying assumption has been that the positions and orientations (poses) of the cameras are fixed in space, and that clear views of the object-of-interest (OoI) are available. These constraints limit the applicability of such systems, especially in the presence of obstacles occluding views of the deformable OoI, and in the event of self-occlusion [13].

In order to address the abovementioned challenges, researchers have suggested the use of active-vision systems, wherein cameras are reconfigured in response to changes in the environment [14–17]. Namely, the poses of the cameras may be varied dynamically in order to maximize the number of un-occluded features and/or to minimize the error in recovering the positions of these features [18].

Recovering a dynamic surface form is a challenging problem, and methods that have been developed for the on-line selection of optimal camera poses to improve an OoI's 3D-shape recovery are, typically, limited to cases where the object is rigid (e.g., [19–22]). When dealing with deformable objects, these methods suffer from a number of shortcomings. First, by assuming a unique and static representation of the OoI, they are unable to recognize or track the OoI if its shape or appearance changes. Second, they assume that the entire OoI's shape can be known given the positions of a small number of reference points on it, since its rigid shape is known a priori. Accordingly, they do not necessarily ensure that all desired features of interest are observable by the cameras at all times, thereby allowing unacceptable errors in the recovered shape of the OoI. Third, when faced with self-occlusions, they do not reconfigure to views that avoid increasing the errors in the occluded features' estimated positions.

A number of research teams have been focusing on overcoming these shortcomings. The active multi-camera system proposed in [23], for example, was designed for volumetric reconstruction of a deformable object for 3D video. This system reconstructs the OoI's shape by calculating the volume intersection of silhouettes projected from the cameras' image planes onto multiple parallel planes, although no control scheme for selecting camera viewpoints is mentioned.

In [24], for a given image sequence, factorization is performed on detected feature points in order to determine the relative camera poses. Although this technique was designed for 3D deformable-object-motion capture, it does not prescribe explicitly how the camera poses should be chosen optimally. The active multi-camera system proposed in [25] does address the issue of optimal camera-viewpoint selection, even in the presence of occluding obstacles. This system uses an agent-based approach to maximize a visibility criterion over bounding ellipsoids encircling the object's articulated joints. However, only the recognition of articulated (multi-rigid-link) objects is addressed.

The camera-assignment method discussed in [26] maximizes the visibility of an unknown deformable OoI. The proposed windowing scheme controls the orientation of pan/tilt cameras, although no provision is made for predicting future deformation. A stochastic quality metric was proposed in [27] for the optimal control of an active-camera network for deformable-object tracking and reconstruction; however, as in [20], no configuration-management component or deformation prediction was presented. An active pan-tilt-zoom camera system that does use a tracking algorithm to keep a moving person centered in each camera's view was proposed in [28]. This system, however, does not explicitly optimize for each target's visibility, allowing the possibility of occlusion.

From the abovementioned works, one may conclude that none of the existing systems adjusts the cameras' positions to better observe deforming objects. Allowing cameras in an active multi-camera network to translate may be particularly beneficial in two cases. First, in mixed-scale environments (where the scale of the OoI's deformations is significantly smaller than the scale of the OoI's motion within a workspace, which is usually the case with mobile deformable objects), a system must simultaneously capture the deformation, which requires high-resolution imagery suitable for resolving small details, and increase the working volume in which motion capture is possible [29]. Translation allows even a limited number of cameras with limited zoom to do just that: to approach an OoI and recover its fine deformations, while allowing a wide region of a workspace to be covered as needed. Second, in cluttered environments, a system must be able to capture the motion of an OoI even when it passes behind an obstacle. In these cases, translation allows a system with even a limited number of cameras to recover the OoI's deformation by repositioning them to viewpoints from which the OoI is not occluded by obstacles.

Herein, a reconfigurable active-vision system that minimizes the error in capturing the motion of a known deformable object with detectable features is proposed. Such a system would be beneficial in improving the effectiveness of methods such as those presented in [1, 6], which capture the motion of clothing with a known color pattern, and could be used to capture the motion of the flowing dress of a dancer moving around a cluttered set. This system is novel in that it accounts for the deformation of the OoI in an on-line manner, and allows for pan/tilt as well as translation of the cameras. Moreover, an algorithm is provided for the efficient determination of near-optimal camera poses. Simulations demonstrate the system's ability to reduce the error in deformation recovery and to account for error in deformation prediction. Additionally, experiments empirically validate the effectiveness of the proposed implementation with application to motion capture, and show the benefit of allowing reconfigurable cameras to translate.

This paper is, thus, organized as follows. The notation used throughout the paper is provided in Section 2, and the problem addressed herein is then defined in Section 3, in terms of the objective to be achieved and the underlying tasks required to achieve it. Section 4 describes the proposed system for solving this problem and the theory behind it. This system is evaluated in three distinct simulations in Section 5 and in two experimental scenarios in Section 6; each of these two sections describes its test set-up, procedure, results, and discussion. Section 7 concludes the paper.

## 2 Notation

| Symbol | Description |
|---|---|
| $T$ | Set of all DIs (demand instants) |
| $J$ | Total number of DIs |
| $t_j$ | $j$th DI |
| $j$ | Index of current DI |
| $V_j$ | Set of all vertex positions at $t_j$ (set of [3×1]) |
| $n$ | Number of vertices |
| $x_{i,j}$ | Vertex $i$'s position in world coordinates [3×1] |
| $\hat{x}_{i,j}$ | Estimate of Vertex $i$'s position in world coordinates at $t_j$ [3×1] |
| $C$ | Set of all cameras |
| $c_k$ | Camera $k$ |
| $R_{c,j}$ | Pose of camera $c$ at $t_j$ |
| $E$ | Average error metric over all DIs |
| $E_j$ | Error metric at $t_j$ |
| $m$ | Number of degrees of freedom for each camera |
| $\lambda$ | Index of a camera's degree of freedom |
| | Ellipsoidal uncertainty region around Vertex $i$ at $t_j$ |
| $W_{i,j}$ | Scaled covariance matrix of Vertex $i$'s predicted position at $t_j$ [3×3] |
| | Number of variances defined by user to be enclosed in uncertainty ellipse |
| | Projection of uncertainty ellipsoid into camera $c$'s pixel coordinates |
| $v^c$ | Arbitrary pixel in camera $c$'s pixel coordinates [2×1] |
| | Mapping of world coordinates to pixel coordinates for camera $c$ [2×1] |
| $J^c_{ij}$ | Jacobian matrix of camera $c$'s world-to-pixel mapping, evaluated for $x_{i,j}$ [2×3] |
| $P^c$ | Camera $c$'s projection matrix [3×4] |
| $x^c$ | Pixel coordinates of $x$ in camera $c$'s image [2×1] |
| $s_{i,j}$ | State of vertex $i$ at DI $j$ in terms of position and velocity [6×1] |
| $\hat{s}_{i,j}$ | Estimate of state $s_{i,j}$ [6×1] |
| | Noise in vertex dynamics [6×1] |
| $Q$ | Covariance of vertex dynamics [6×6] |
| | Measurement noise for vertex $i$ in DI $j$ [3×1] |
| $R_{i,j}$ | Covariance of measurement noise for vertex $i$ in DI $j$ [3×3] |
| | Covariance variable for Vertex $i$'s measurement at DI $j$ |
| | Standard deviation of vertex position detection |
| $o^{c}_{i,j}$ | Boolean occlusion value for a given vertex at a given DI, from the perspective of camera $c$ |
| $\tilde{x}_{i,j}$ | Predicted world coordinates of Vertex $i$ in DI $j$ [3×1] |
| $X_{i,j}$ | Covariance matrix of Vertex $i$'s predicted position at DI $j$ [3×3] |
| $G_q$ | Objective function reflecting expected error for a given camera configuration at $t_q$ |
| $q$ | Index of DI after $t_j$ |
| $U_q$ | Expected uncertainty in vertex position estimates for a given camera configuration at $t_q$ |
| | Non-uniform uncertainty unaccounted for in $U_q$ |
| | Error due to Camera $c$ calibration |
| | Triangulation error based on baseline separation between camera pairs |
| $b^{pc}$ | Baseline between camera pair |
| $z^{pc}$ | Distance between baseline and vertex centroid (depth) |
| $w_a$ | Weighting for camera calibration inaccuracy errors |
| $w_z$ | Weighting for camera triangulation errors |
| $pc$ | Index of camera pair |

## 3 Problem Formulation

The problem of multi-camera reconfiguration for deformable-object motion capture is defined below for the case of a known object with detectable features.

### 3.1 Objective

Let us suppose that a single deformable object, the OoI, is present in a known workspace, and that the shape of its dynamically deforming surface during the time period of interest is a priori unknown and is to be determined over a set of $J$ discrete points in time given at the outset, $T = \{t_j \mid j = 1, \ldots, J\}$, called demand instants (DIs). The OoI's identity is known a priori, and there exist a number of detectable feature points on its surface such that the shape of its surface at the $j$th DI, $t_j$, can be modeled in 3D by a mesh of $n$ vertices, one at each feature's location, $V_j = \{x_{i,j} \mid i = 1, \ldots, n\}$, connected by a set of edges. Vertex trajectories are, therefore, not known deterministically over $T$.

The objective is, thus, to define a system that, given a set of $k$ calibrated cameras, $C$, attached to actuators with known kinematic capability (mobility, constraints, and maximum speed and acceleration), will locate the cameras at each DI, $t_j$, in such a configuration, $R_{C,j}$, so as to determine the positions of the $n$ vertices with the minimum error over all DIs. One may note that this problem differs from traditional placement problems in that the object is deforming in time, and so $V_j$ is not constant across $T$ up to some Euclidean transformation.

If the system determines the position of the $i$th vertex at DI $t_j$, in world coordinates, to be $\hat{x}_{i,j}$, the accuracy of this estimate can be defined by its difference from the vertex's true position, $x_{i,j}$. An overall error metric, $E$, is defined as the optimality criterion representing this error across all vertices and DIs according to:

$$E = \frac{1}{J}\sum_{j=1}^{J} E_j, \tag{1}$$

where the error metric at each DI, $E_j$, is defined as

$$E_j = \frac{1}{n}\sum_{i=1}^{n} \left\lVert x_{i,j} - \hat{x}_{i,j} \right\rVert. \tag{2}$$

Thus, given that the estimated positions of the vertices, $\hat{V}_j$, are implicitly determined by the poses of the cameras with which they are observed, an ideal system should be capable of minimizing $E$ over $T$ according to:

$$\min_{R_{C;T}} \; E\left(R_{c=1,\ldots,k;\; j=1,\ldots,J}\right) \quad \text{s.t.} \quad R^{\lambda}_{c,j,\min} < R^{\lambda}_{c,j} < R^{\lambda}_{c,j,\max}; \quad c = 1,\ldots,k;\; \lambda = 1,\ldots,m;\; j = 1,\ldots,J, \tag{3}$$

where, for Camera $c$, $R_{c,j} = \left(R^{1}_{c,j}, \ldots, R^{m}_{c,j}\right)$ is its pose, and $R^{\lambda}_{c,j,\min}$ and $R^{\lambda}_{c,j,\max}$ define the limits of its achievable range of motion in the $\lambda$th degree of freedom (dof) at DI $t_j$.
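To make the structure of Eq. (3) concrete, the Python sketch below performs an exhaustive search over a discretized set of poses for a single DI, keeping every degree of freedom strictly within its limits. It is only an illustration under stated assumptions: the cost function is a hypothetical stand-in for the expected error (the objective actually used by the system is not defined in this section), the per-DI limits are assumed to already encode each actuator's reachable range, and the names `select_configuration`, `toy_cost`, and `TARGET` are hypothetical. Because exhaustive search grows exponentially with the number of cameras and dofs, it should not be read as the efficient near-optimal algorithm mentioned in Section 1.

```python
import itertools

import numpy as np


def select_configuration(cost, r_min, r_max, steps=5):
    """Exhaustively search admissible camera poses at one DI (illustrative only).

    cost:         callable mapping a (k, m) pose array to a scalar expected error
                  (hypothetical stand-in for the system's objective).
    r_min, r_max: (k, m) arrays of per-dof motion limits at this DI.
    steps:        number of candidate values tried strictly inside each dof's range.
    """
    r_min = np.asarray(r_min, dtype=float)
    r_max = np.asarray(r_max, dtype=float)
    k, m = r_min.shape
    # Evenly spaced candidates that exclude the endpoints, honouring the
    # strict inequalities in Eq. (3).
    axes = [np.linspace(lo, hi, steps + 2)[1:-1]
            for lo, hi in zip(r_min.ravel(), r_max.ravel())]
    best_pose, best_cost = None, np.inf
    # Enumerate every combination of candidate values across all k*m dofs.
    for combo in itertools.product(*axes):
        pose = np.array(combo).reshape(k, m)
        value = cost(pose)
        if value < best_cost:
            best_pose, best_cost = pose, value
    return best_pose, best_cost


# Usage: two cameras, each with one translational dof on a rail in (0, 1).
TARGET = np.array([0.3, 0.7])  # hypothetical preferred rail positions


def toy_cost(pose):
    # Hypothetical cost: squared distance from the preferred positions.
    return float(np.sum((pose.ravel() - TARGET) ** 2))


pose, err = select_configuration(toy_cost, [[0.0], [0.0]], [[1.0], [1.0]])
print(pose.ravel())  # → [0.33333333 0.66666667]
```

With five candidates per dof, the search picks the grid values closest to the hypothetical targets; a finer grid or a local optimizer would be needed for anything beyond a toy problem.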

The above error metric assumes that the true positions of the OoI's vertices are known; in practice, however, they can only be measured to a certain level of accuracy. Thus, when this error metric is evaluated herein, results for simulations reflect the actual error, whereas results for experiments are accurate only to within the aforementioned measurement error.
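A minimal sketch of how $E$ and $E_j$ in Eqs. (1) and (2) might be computed in simulation, where the true vertex positions are available. It assumes the estimated and true positions are stored as NumPy arrays of shape (J, n, 3) and that the norm in Eq. (2) is Euclidean; the function name `error_metric` is hypothetical.

```python
import numpy as np


def error_metric(x_hat, x_true):
    """Return (E, E_j) following Eqs. (1)-(2).

    x_hat, x_true: arrays of shape (J, n, 3) with estimated and true world
    coordinates of n vertices at each of J demand instants.
    Assumes the norm in Eq. (2) is the Euclidean norm.
    """
    x_hat = np.asarray(x_hat, dtype=float)
    x_true = np.asarray(x_true, dtype=float)
    # Euclidean error of every vertex at every DI, shape (J, n).
    per_vertex = np.linalg.norm(x_hat - x_true, axis=-1)
    # Eq. (2): E_j averages the error over the n vertices at DI t_j.
    e_j = per_vertex.mean(axis=1)
    # Eq. (1): E averages E_j over the J DIs.
    return e_j.mean(), e_j


# Usage: J = 2 DIs, n = 2 vertices, true positions at the origin.
x_true = np.zeros((2, 2, 3))
x_hat = np.array([[[3.0, 4.0, 0.0], [0.0, 0.0, 0.0]],
                  [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]])
E, E_j = error_metric(x_hat, x_true)
print(E, E_j)  # → 2.0 [2.5 1.5]
```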

### 3.2 Required Tasks

In order to achieve its objective, the proposed system must be able to fulfill a number of different functions in an on-line manner, each subject to a number of challenges:

- **Detection**: Given a set of images acquired at a certain DI, $t_j$, the system must detect as many of the OoI's features as possible. It must do so efficiently, in the face of significant uncertainty in their expected locations in each image, and from multiple changing viewpoints. Moreover, the system must account for the fact that textural features may themselves be deforming in time.
- **Association**: Given a set of detected features in each image at a DI, $t_j$, the system must be able to correctly associate each one with its corresponding vertex, despite uncertainty in its predicted location, a limited capability to uniquely differentiate each one, and the close proximity of similar features due to many densely distributed vertices.
- **Recovery**: Given a set of corresponding features in images acquired by multiple cameras with known poses, the system must recover the world coordinates of the associated ver...
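The recovery task rests on triangulation from cameras with known poses. As one possible sketch (not necessarily the recovery method used by the authors, whose description is cut off here), the standard linear (DLT) triangulation below recovers a vertex's world coordinates from its pixel coordinates $x^c$ in two or more cameras with known 3×4 projection matrices $P^c$. The helper names `project` and `triangulate` and the two-camera set-up in the usage example are hypothetical.

```python
import numpy as np


def project(P, X):
    """Project world point X (length 3) to pixels with a 3x4 projection matrix P."""
    x = np.asarray(P, dtype=float) @ np.append(np.asarray(X, dtype=float), 1.0)
    return x[:2] / x[2]


def triangulate(projections, pixels):
    """Linear (DLT) triangulation of one vertex seen by two or more cameras.

    projections: list of 3x4 projection matrices P^c (known camera poses).
    pixels:      list of matching pixel coordinates (u, v), one per camera.
    Returns the least-squares world point as a length-3 array.
    """
    rows = []
    for P, (u, v) in zip(projections, pixels):
        P = np.asarray(P, dtype=float)
        # Each view gives two linear constraints on the homogeneous point X:
        # u * (P[2] @ X) - P[0] @ X = 0 and v * (P[2] @ X) - P[1] @ X = 0.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.vstack(rows)
    # The right singular vector for the smallest singular value minimizes
    # ||A X|| subject to ||X|| = 1.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    # Dehomogenize (this also removes the arbitrary sign from the SVD).
    return X[:3] / X[3]


# Usage: two cameras with identity intrinsics, the second shifted 1 unit along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
vertex = np.array([0.5, 0.2, 4.0])  # hypothetical vertex position
pixels = [project(P1, vertex), project(P2, vertex)]
print(triangulate([P1, P2], pixels))  # recovers approximately [0.5, 0.2, 4.0]
```

With noise-free pixels the recovered point matches the true vertex; with noisy detections, the same least-squares solution degrades as the baseline shrinks relative to the depth, which is why the baseline and depth terms appear in the notation above.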
