
Pennochio - Mapping Motion Capture onto a Humanoid Robot in Real Time

Dept. of CIS - Senior Design 2010-2011

H. Anthony Arena
[email protected]

Univ. of Pennsylvania
Philadelphia, PA

C.J. Taylor
[email protected]
Univ. of Pennsylvania

Philadelphia, PA

ABSTRACT

The objective of this project is to use motion capture to track the motions of a human subject above the waist and then map those motions onto a humanoid robot in real time, essentially making the robot a puppet and the human the puppeteer. The puppeteer controls the robot while wearing a virtual reality rig that lets the puppeteer see what the robot sees. The robot used is the PR2 from Willow Garage [3], and the motion capture system was initially planned to be the VICON system [9]. However, given the difficulty of building a useful system around such expensive hardware with such limited access, thought was given to developing the system (either exclusively or in parallel) for more commercial motion capture systems like the Sony PlayStation Move and the Microsoft Xbox Kinect. Over the course of the project the Kinect revealed itself to be the more logical choice and has replaced the VICON system as the method for motion capture. The PR2 can raise and lower its height, turn and tilt its head, and has arms that fully mimic a human arm.

Because most motion capture relies on manual correction after the data is gathered, the most difficult task will be forming a motion capture algorithm or method that avoids ambiguity among the capture points and can also be easily mapped onto the robot in real time without significant delay. Although the main objective is to achieve an effective mapping from the user's head, torso and arms onto the robot, it is also hoped that an effective system can be designed to control the robot's motion in the (x,y) plane and, eventually, a system using a mechanical device to grasp and interact with objects.

1. INTRODUCTION

The field of humanoid robotics is still actively growing and evolving. There are many different types of humanoid robots, ranging from the less stable bipedal types to the more stable and easier-to-use wheeled designs such as the PR2. However, most humanoid robots possess the same basic features: a head with two degrees of freedom and arm(s) with seven degrees of freedom.

Virtual reality and motion capture have been used most heavily in the fields of graphics and character and scenario simulation. By leveraging the knowledge of motion re-targeting (motion adjustment when transferring to a different medium) and error correction gained from the plethora of research in these fields, mostly within the realm of graphics, the main software development task becomes building a new system for real-time adjustment of the motion capture points. Overall, the main objective of this project is to capture the motions of a human torso, arms and head, and possibly movement in the (x,y) plane, and then map these movements, with as little delay as possible, onto the robot in real time.

2. RELATED WORK

As stated before, most related work involves motion capture for graphics rendering and general robotics separately; there has been some overlap between the fields, but there seems to be little documentation on transferring motion capture to humanoid robots. In graphics systems, one of the most common tasks is to render a human into a manufactured 3D environment in order to develop simulations of human interactions in a variety of specialized environments. There are a few different ways of doing such projects, and each one gives insight into how it can be adapted for use in robotics.

The first subsection details preliminary research done to provide the background on which design decisions could be made with an educated and informed opinion. The second subsection details research done on ROS in order to use it effectively and efficiently to achieve the ends of the project. Also helpful was additional research into robot modeling principles that are not abstracted away by ROS's built-in commands.

2.1 Preliminary Research

Advanced techniques can apply image processing, using an assortment of conventional cameras rather than a motion capture system, to track the movements of a human subject in a contained area and then render the result, with correction code, into a highly accurate 3D software model of the person. Although this is outside the scope of this project, it nonetheless provides insight into the limits of how well human motion can be re-targeted and captured in software. The most useful element this work contributes to the mapping of human motion onto a robot is speed. Because the mapping must happen in real time, speed is essential. If such a system can use advanced image processing of video from conventional cameras to generate a graphical human model, similar design techniques can be used in motion capture processing to generate safe and accurate kinematics to transfer to a robot [5].


Imitation learning is also a useful area of research when trying to determine the best way to design this puppet system. There has been research into the best ways for robots to mimic human motions effectively and usefully. One method uses motion capture to have the robot perform the same task over and over, using continuous feedback and correction to modify the behavior appropriately. The incorporation of impedance sensors is also useful because it allows the robot to modify its behavior so that it does not behave more dangerously than it should [4]. Such sensors are built into the PR2, but it is unclear if and how they could be incorporated into the error correction and avoidance software proposed for this project.

There has also been some investigation directly corresponding to the mapping of human motion onto a robot in real time. One method that has been explored involves performing a series of previously defined actions and then using well-known algorithms (c-means, k-means and/or expectation maximization) to give the robot a way to determine when the user is trying to execute a motion it has previously encountered. Then, when the user performs the motion, the robot determines which of the motions it knows most closely matches and performs that action quickly, so that there is no lag in the motion [1]. The one caveat of this technique is that it is somewhat at odds with the objective of this project, which is to mimic the user's motions exactly. However, it may make sense to incorporate these design principles as an aid or basis for the error correction algorithm.

Another important aspect of motion capture and translation is the concept of motion re-targeting. This most closely correlates to the field of graphics but, again, contains concepts that are very much applicable to a robotics implementation. However, it is mostly concerned with bipedal gait. This is somewhat troublesome because the PR2 rolls on a wheeled base, but the same design principles can be applied nonetheless. Instead of statistically trying to reconstruct and predict the nature of human bipedal gait, the same principles can be used to develop a system that controls the speed and direction of the PR2 given the confined state of the user. Because a stark and rigid system for movement would be very user-unfriendly, designing some type of feedback loop to actively gauge the user's intentions would be helpful [6]. It is not impossible to use motion capture to track walking movement and direction, but it may be outside the scope of this project to develop an effective system for translational movement using the motion capture system alone. It may be necessary to simply incorporate a joystick mechanism (or the Sony DualShock 3 controller provided with the PR2) for controlling translational motion, although this would be a last resort.

The other reach goal of the Pennochio project is to develop a useful grasping system. The hardest part of this design is the need for a grasping system sensitive enough to grab and release things smoothly and without danger. There has been a great deal of research into replicating the full capabilities of the human hand, but according to this research such a mechanism is too complex and underdeveloped to be a plausible option [2]. The grasping device on the PR2 is a simplified two-pronged mechanism that can be equipped with sensitivity devices. Using this provided technology, with additional intelligent software to make the device more responsive, a workable system can be designed for the PR2.

The field of humanoid robotics is very diverse, with specialized robots available for all types of purposes. Although a good deal of current research is devoted to improving bipedal robots, there is still a good niche, and room for growth, for simpler robots like the PR2. Because the PR2 is easier to work with, it can more readily be adapted for useful tasks like acting as an avatar for a human user under motion capture [11].

2.2 Practical Research

Because most of this project relies on familiarizing ourselves with the Robot Operating System (ROS) and adapting it to the needs of the project, there has been little need for extensive research to this point. However, in deciding how to incorporate more complicated elements like motion re-targeting and an end-effector gripper mechanism, more research will need to be done in the coming months.

To date, the majority of information has been gleaned from the ROS wiki site. Using its extensive tutorials, and often referring to the public code repository, we have developed familiarity with a few key features of programming in the ROS environment: ROS's systems of topics and messages, servers and clients, and publishers and subscribers. Learning to navigate the ROS system was nontrivial and, together with defining the overall modules and approach and writing a wrapper for communication with the major parts of the PR2, constituted the majority of the work of this first semester.

ROS can be run on a standard Linux machine and also runs on the PR2 (and other robots) itself. Setting up a machine to run ROS is fairly simple: after downloading the system, all that remains is to set up a local directory for development, using the ROS commands as a guide. New code should be developed in packages, each of which has a manifest.xml file listing all dependencies of the modules developed in that package. These packages can be created with the roscreate-pkg command, which accepts a list of dependent packages as parameters and records them in the manifest. Any executable binary or Python file that follows this packaging system can then be invoked from anywhere on the machine using the rosrun command.

The basic building blocks in ROS are nodes: code modules that each perform a computation. Many different nodes are graphed, or networked, together in ROS to execute a non-trivial task. Nodes can be developed in Python or C++; in this project, development will be done in Python, in the hope of keeping the code simple and uncluttered and allowing it to be easily extended in future work. In addition, the launch files required for launching ROS nodes can be invoked from anywhere on the machine using the roslaunch command.

In addition to these standard commands for navigating and executing within the ROS file system, there are other built-in commands, such as rostopic, rosnode and rosmsg, which list, find and keep track of the current capabilities of the ROS system running on a machine. Using these commands is crucial for tracking which topics, services, etc. are currently running in ROS, and therefore which can be communicated with and controlled.


Specifically important are ROS topics. Topics are buses through which ROS nodes exchange messages. Topics are set up so that publishers and subscribers are anonymous to each other, which simplifies communication. In essence, if there is information one node should share with another, one defines a new topic as the venue for the communication and a message type to carry it. Then, any time this (now defined) message is to be sent, all that is needed is a publisher node; to receive messages, a subscriber node is implemented.
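As a concrete illustration, here is a minimal rospy publisher/subscriber pair. The topic name, message type and rate are illustrative choices for this sketch, not code from the Pennochio system, and the two halves would normally run as separate nodes.

#!/usr/bin/env python
# Minimal rospy publisher/subscriber sketch. The 'chatter' topic and
# std_msgs/String message are illustrative, not project code.
import rospy
from std_msgs.msg import String

def talker():
    rospy.init_node('talker')
    pub = rospy.Publisher('chatter', String)
    rate = rospy.Rate(10)  # publish ten messages per second
    while not rospy.is_shutdown():
        pub.publish(String('hello'))
        rate.sleep()

def callback(msg):
    rospy.loginfo('heard: %s' % msg.data)

def listener():
    rospy.init_node('listener')
    rospy.Subscriber('chatter', String, callback)
    rospy.spin()  # hand control to rospy until shutdown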

A final means of communication between nodes is the service interaction, a very familiar pattern in computer science because it acts as a server-client relationship. A service is defined by a pair of messages: one carries the request and the other the response. This is helpful because it allows two-way communication, which is immensely useful for this project: when a command is sent over ROS to move a link of the PR2 to a certain position, an acknowledgment from the recipient service that the command was received and executed prevents overloading the service with too many commands and also prevents the robot from losing track of its position.
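The shape of the interaction is easiest to see in code. This sketch assumes the AddTwoInts example service type that ships with the rospy_tutorials package; the Pennochio services would instead carry joint commands and acknowledgments. Server and client would each run in their own node.

#!/usr/bin/env python
# Server/client sketch of a ROS service, assuming the AddTwoInts
# example type from the rospy_tutorials package.
import rospy
from rospy_tutorials.srv import AddTwoInts

def handle_add(req):
    return req.a + req.b  # returned value fills the response message

def server():
    rospy.init_node('add_two_ints_server')
    rospy.Service('add_two_ints', AddTwoInts, handle_add)
    rospy.spin()

def client(a, b):
    rospy.wait_for_service('add_two_ints')  # block until a server exists
    add = rospy.ServiceProxy('add_two_ints', AddTwoInts)
    return add(a, b).sum  # the call blocks until the response arrives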

In addition to gaining familiarity with basic design features and paradigms in the ROS environment, a basic understanding of robot kinematics was also essential for interfacing with the PR2 through ROS and its built-in functions. Through navigation of the ROS wiki and the ROS core packages, built-in nodes were found that generate a position vector using forward kinematics, given a series of joint angles for the PR2's arms, and a set of joint angles using inverse kinematics, given an (x,y,z) position vector and an (x,y,z,w) orientation vector. It would be ineffective to develop new modules for calculating either type of kinematics, both because the math is complicated and liable to error and because exact measurements are needed for all links of the robot and none are provided [7].

But a working knowledge of how to calculate the kinematics was necessary in order to use these services effectively. Forward kinematics is traditionally calculated using the Denavit-Hartenberg convention. In this convention, given a standard base position for all links in a robotic manipulator, each joint is assigned a base coordinate frame. From these coordinate frames, a matrix is developed for each joint, based on its offsets from the next joint and on a set of rules governing the orientation of the current joint's z-axis relative to the previous link's z-axis. To develop the transformation that takes a set of joint angles to an (x,y,z) position for the end link, the calculated matrices are multiplied in sequential order.
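In the standard statement of the convention (following its presentation in Spong et al. [8]), each joint i with parameters (\theta_i, d_i, a_i, \alpha_i) contributes one homogeneous matrix, and the pose of the end link is the ordered product:

A_i = \mathrm{Rot}_{z,\theta_i}\,\mathrm{Trans}_{z,d_i}\,\mathrm{Trans}_{x,a_i}\,\mathrm{Rot}_{x,\alpha_i}
    = \begin{pmatrix}
      \cos\theta_i & -\sin\theta_i\cos\alpha_i & \sin\theta_i\sin\alpha_i  & a_i\cos\theta_i \\
      \sin\theta_i & \cos\theta_i\cos\alpha_i  & -\cos\theta_i\sin\alpha_i & a_i\sin\theta_i \\
      0            & \sin\alpha_i              & \cos\alpha_i              & d_i \\
      0            & 0                         & 0                         & 1
      \end{pmatrix},
\qquad
T^0_n = A_1 A_2 \cdots A_n .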

It is more difficult to develop a closed-form solution to the inverse kinematics problem. The most effective way to do this for a robot arm with seven degrees of freedom and a spherical wrist (which is characteristic of the PR2's arms) is kinematic decoupling. This technique relies on finding the intersection of the wrist axes, called the wrist center, and then finding the orientation of the wrist. A geometric approach, using trigonometric functions, develops equations for the joint angles of the first three degrees of freedom (the shoulder and elbow joints). This presents two possible solutions for the elbow joint angle (elbow up or elbow down), and the inverse kinematics service in ROS takes seeding values as input to decide which solution is correct. In practice, the seeding values will probably be the PR2 arm's previous position. For the most part, this portion of the calculation is abstracted from view by the ROS kinematics service.

After that, the remaining three joint angles are determined using Euler angles. The Euler angles used here involve a rotation about the z-axis, followed by a rotation about the y-axis, followed by a rotation about the z-axis again; this captures any possible rotation of a coordinate frame. Using the transformation matrix found by multiplying the three rotation matrices corresponding to the Euler angles, values corresponding to the final three joint angles of the PR2 arm can be extracted. Although most of this is also hidden by abstraction, it is crucial for this project to understand the inverse kinematics derivation, because the ultimate objective is to specify a location for the end effector (not the wrist center) together with a predetermined orientation. It makes sense to derive this orientation (which must be provided as a quaternion) by building a transformation matrix from the three Euler angle rotation matrices and then reverse-engineering the corresponding quaternion from this matrix [8].
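A minimal sketch of that conversion, using only numpy (the helper names are our own, not from ROS):

# Build a z-y-z Euler rotation and recover the (x, y, z, w) quaternion
# the IK service expects. Assumes the scalar part w is not near zero;
# a robust version would branch on the largest quaternion component.
import numpy as np

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def quat_from_matrix(R):
    w = 0.5 * np.sqrt(1.0 + R[0, 0] + R[1, 1] + R[2, 2])
    x = (R[2, 1] - R[1, 2]) / (4.0 * w)
    y = (R[0, 2] - R[2, 0]) / (4.0 * w)
    z = (R[1, 0] - R[0, 1]) / (4.0 * w)
    return (x, y, z, w)

# z-y-z composition for Euler angles phi, theta, psi:
# R = rot_z(phi).dot(rot_y(theta)).dot(rot_z(psi))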

2.3 Overlapping Research

A similar project has been executed by a researcher at MIT, Garratt Gallagher. His project, kinect joy, involved teleoperation of the PR2 using the openni tracker node in ROS and the Kinect, covering the base, torso and grippers. The focus of his project seemed to be more on hand and finger tracking, and it appears to be mostly self-contained. To distinguish this project from his work, more focus will be placed on head tracking and on designing a system that is more easily adaptable to other scenarios and can be extended to other motion capture systems like VICON. Also, instead of expending time and energy on designing a finger-tracking add-on to openni, this project anticipates designing an interface to a hardware device held in the user's hand, which would ultimately use a force-feedback system for gripping and, ideally, a gyroscope to track the orientation of the hand for more robust data. This would replace the finger-tracking system Gallagher designed for the Kinect, which, although potentially very useful, seems too unwieldy and cumbersome for a natural-feeling motion capture system.

3. SYSTEM DESIGN

The system design for the Pennochio software system is organic. Because the final scope of the system is still unknown, many of the prospective design decisions may be modified or dropped from the final product. Where possible, the primary course of action is indicated along with a description of fallback or alternate methods.

3.1 Initial System Model

A software system can interface with the robot through the ROS operating system installed on the robot and on the machine communicating with it remotely [7]. The first step is to create a program that can communicate with all relevant aspects of the PR2.


This includes developing a system for sending pan and tilt angles to the robot's head; giving a position and orientation for the robot's end-effector and generating and sending the appropriate joint angles to each of the PR2's arms; sending a height parameter to the torso; giving position and effort indicators to the PR2's grippers; and sending a direction and speed to the base to determine motion.

After this, design will shift focus to the motion capture segment of the project. First, a group of capture targets will need to be designed. For each capture target on the person, a unique, rigid arrangement of capture points must be designed so that both the position of the target in space and its orientation can be determined. This is especially true for the targets placed on the capture subject's hands. Because of the detailed nature of this capture design, it seemed most likely that the VICON system would be used for motion capture. However, if a less demanding, more elegant method for capturing positions were discovered, a simpler motion capture system such as the Microsoft Xbox Kinect would ideally be used, making the Pennochio system more portable and accessible.

Regardless of the motion capture hardware used, the next step is to develop a robust motion capture system in software that outputs the positions of specified capture targets on the human subject in a standard base frame. It is then necessary to construct software that can process this data at small time intervals, determine which body part each capture point corresponds to, and translate these positions and orientations into the PR2's base-link frame using the already implemented PR2 controller interface node.

The next step is to program the main loop, which gathers data from motion capture, translates it into data usable by the PR2 controller node, and iterates safely, with a robust system for avoiding errors in the first place. Errors will no doubt arise in a complex system like this, so a system also needs to be developed for finding, mitigating and correcting any errors that occur, whether from dropped signals, from the user under motion capture (due to speed of movement or other environmental factors), or from other unforeseen circumstances.

There are also stretch goals for this project. The first is to develop a system for motion re-targeting that is not too clumsy to be used effectively by an individual under motion capture. Because the workspace of a motion capture system is limited and the desired range of the PR2 being controlled is much larger, this is a non-trivial issue. Another stretch goal is the development of a hardware device, held in each hand by the user under motion capture, that can signal the PR2's grippers to open and close. Ideally this device would have force feedback and pressure sensitivity for optimal usability.

The most important challenges for the overall system are speed and accuracy. The system has to be fast enough to keep up with the user's motions as it maps them onto the robot, so there is as little lag as possible, but not so fast that it overloads the PR2 with more commands than it can keep up with. If the system is too slow, the delay between what the robot is doing and what the user is doing would make it difficult for the user to control the motion effectively. The user sees what the robot sees through a virtual reality rig, so if the robot's movements are out of sync with the user's current position, it would be extremely difficult to use the system for any extended period. Accuracy is also imperative: if the system confuses two capture points, this could lead to unexpected and possibly dangerous behavior from the robot.

If this system can be produced effectively, it would help robots replace humans in a variety of dangerous situations, such as bomb defusal or similar circumstances.

The Pennochio project is to be implemented almost entirely in software. The main software components, as they will be designed, can be seen in Figure 1; this is a slightly more realistic representation of the code organization than the original description, shown in Figure 2. The code will be written in Python on a dedicated machine, which will obtain data from the motion capture system, perform the necessary operations in software, and then send the information to the PR2 in the form of ROS commands on the PR2's joints. This layout is shown in Figure 3. The challenge this setup presents is the need to make communication among these three systems as fast as possible, so that the time needed to process data in software is not exacerbated by delays in relaying data across the network of components.

[Block diagram omitted. Its components: the VICON/motion capture translator (not implemented), the PR2 communication engine (implemented), and the coordinate translation, error correction and main code loop (not implemented).]

Figure 1: Revised Block Diagram of Software Components to be Designed

Figure 2: Original Block Diagram of Software Components to be Designed

3.2 System Implementation Prior to Design Change

The majority of the time this semester was spent understanding and exploring ROS and, using the knowledge gained from this exploration, implementing an interface that uses many of ROS's built-in components to communicate with the PR2 (in simulation). Specifically, an interface was developed for the PR2's controllers for its head, arms, grippers, torso and base. While developing this system, thought was also given to how future elements of the project should be implemented in order to work optimally with the current system.

Development of the PR2 communication interface was nontrivial. Most of the information on the ROS wiki is presented only in theory or in basic examples programmed in C++.


[Diagram omitted: the Vicon motion capture machine and the PR2.]

Figure 3: Physical Diagram of the Interface of All System Components

Because the decision had already been made to develop the system in Python, learning the particulars of developing for ROS in Python was the first step. Rospy is the API provided in ROS for developing nodes in Python. Although the basic methods for developing with this library are given in the ROS wiki, nuances of calling specific types of nodes and services had to be gleaned from other sources, such as the publicly accessible ROS code repository.

After seeing a few different styles of implementation for interfacing with the PR2, a hybrid method of communication was ultimately used. To send signals to the arms, a service for inverse kinematics (and forward kinematics, for error checking) is used to determine joint angles. These joint angles are then communicated to the robot over the joint trajectory action topic corresponding to the appropriate arm.
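A sketch of this pipeline is below. The service and action names and the message fields follow common PR2/ROS conventions of the time but are assumptions here, not excerpts from the project's code.

#!/usr/bin/env python
# Sketch: ask the IK service for joint angles, then send them to the
# arm's joint trajectory action. Names and fields are assumed.
import rospy
import actionlib
from kinematics_msgs.srv import GetPositionIK, GetPositionIKRequest
from pr2_controllers_msgs.msg import JointTrajectoryAction, JointTrajectoryGoal
from trajectory_msgs.msg import JointTrajectoryPoint

rospy.init_node('arm_commander')
rospy.wait_for_service('pr2_right_arm_kinematics/get_ik')
get_ik = rospy.ServiceProxy('pr2_right_arm_kinematics/get_ik', GetPositionIK)

req = GetPositionIKRequest()
req.timeout = rospy.Duration(5.0)
req.ik_request.ik_link_name = 'r_wrist_roll_link'
# ... fill req.ik_request.pose_stamped with the target position and
# orientation, and req.ik_request.ik_seed_state with the arm's current
# joint angles, so the elbow-up/elbow-down ambiguity resolves toward
# the previous configuration ...
resp = get_ik(req)

client = actionlib.SimpleActionClient(
    'r_arm_controller/joint_trajectory_action', JointTrajectoryAction)
client.wait_for_server()
goal = JointTrajectoryGoal()
goal.trajectory.joint_names = resp.solution.joint_state.name
point = JointTrajectoryPoint()
point.positions = resp.solution.joint_state.position
point.time_from_start = rospy.Duration(0.5)  # reach the pose in 0.5 s
goal.trajectory.points = [point]
client.send_goal(goal)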

Communication with the torso (which requires only one position along the z-axis of the base link) and with the grippers was implemented using similar action client topics. Communication with the base was implemented using a publisher of a command message to the base controller. Communication with the head was implemented two ways: publishing the actual joint angles to the head controller with a command message, and using an action client to point the head at a location in the space of the base link.

For references to the links and joints of the PR2, please see Figures 5 and 4 respectively [10].

3.3 Previously Anticipated Work

Implementation of the remaining work begins with figuring out how to arrange the markers for the motion capture system. Tentatively, rigid, unique groups of markers are to be placed on the head, the torso and each hand. Because the objective is to make the system as easy to use and set up as possible, this process will not require putting on an entire motion capture suit, but simply gluing or attaching each marker to the appropriate body part of the user. The exact method will be worked out in tandem with people from the MEAM department and/or the GRASP lab.

The next step is understanding the data received from the motion capture system. Once this data is received, it must be parsed into a useful format that can be operated on to determine which capture points correspond to which values. Specifically, the data has to be translated into positions and orientations in well-defined spaces relative to the person under motion capture, but also easily translatable into the space of the PR2's base link.

Before moving on, it is necessary to address possible ambiguities among capture points. The current approach dictates that, for each capture location, instead of using a single marker to track its position, a unique arrangement of capture markers be used to disambiguate each capture location from the others. However, there could be issues with the size needed for these configurations and with the noise they could introduce in how they appear to the motion capture system. The second option is to use single capture points to track the target positions on the person's body. However, this would necessitate machine learning or some clever error correction algorithm that actively applies, in real time, the corrections that current motion capture processes in graphics apply offline. Most existing systems rely on gathering data during a trial-and-error period and then constructing rules based on error trends found in the data, or simply enforce manual corrections to marker confusion after capture; it is unknown how effectively these techniques can be adapted to a real-time system. This aspect of the system would therefore require extensive research, modeling and testing. The initial approach is to use the unique capture groups, because that design inherently provides the most robust and reliable data, should it prove viable.

The next step is to determine a frame of reference so that a method can be developed for translating values and sending them to the robot. This will require some testing on the actual robot, but currently the best approach seems to be to use the base link as the reference frame and map all other movements from that point of reference. Although the Pennochio team has experience working with a single robotic arm, it will not be simple to coordinate the motions of all the separate limbs so that they appropriately mimic the actions of the human user in the right reference frame.

It may also be necessary to write in safety features that prevent the robot from trying to reach places outside its reachable or usable workspace that a human user may be able to reach. This will require testing the robot, or possibly finding a definition of these workspaces in its documentation, and then writing rules that prevent the motion capture data from commanding the unreachable, while also making sure that the robot does not lose track of the positions of its limbs in the process. Currently, the PR2's extensive built-in safety features seem to provide enough protection from unwanted behavior, so safety software will be written only if issues arise during testing.

After creating a workable system, the next step would be to devise an effective system for movement in the (x,y) plane. Because the motion capture system requires the user to stay in a small area, there is no obvious way to do this. It could involve using some sort of joystick controller to determine movement, but this seems counterproductive to the motion capture basis of the system. A more appropriate idea, and the method that will be used initially, is to use the human user's displacement from the center of the capture space to determine speed in whichever direction they are displaced or moving from the center. The issues that would need to be addressed here are oversensitivity to movement and ease of use.


It may be difficult for the user to always be aware of their position in the capture space, and potentially unintuitive to control movement in such an unnatural way. Any method chosen for this aspect of the system will require testing with human subjects to gauge how usable the system is for an actual person under observation.

Another ambitious task would be to incorporate gripping devices used by the human user to drive the grippers of the PR2. These devices would ideally be controlled by a pressure-sensitive gripping mechanism handled by the user. It would also be helpful if the device on the robot were able to send a signal to the user's device indicating the pressure generated by the robot's grip on the object in question; in other words, a force-feedback system. Another approach would be to use motion capture on a finger and thumb of the user's hand, although this approach seems implausible and will only be attempted if time permits. The fallback approach for this design would be a simple binary button interface with which the user could indicate whether the gripper should be opened or closed. For any implementation, work would need to be coordinated with a contact in the MEAM department.

It would be helpful to incorporate an error-correcting feedback loop into all aspects of the system. This would involve data coming from the robot describing its current state, which could then be compared with its intended state in order to constantly improve the system used to determine the values transmitted to the robot. The PR2 has topics that publish the states of many of its positions, so this is not an infeasible addition. However, it is not part of the current approach and will only be attempted if time permits.

Figure 4: Illustration of the PR2, Its Joints and Base Link Reference Frame [10]

3.4 Final Work and Deviation from Original Plan

Having finished designing the interface to the PR2 in Python using ROS, the next step was to begin implementing the system for gathering motion capture data from the human subject: pan and tilt angles of the head; location and orientation of the hands and arms; and location, orientation and velocity of the base and torso. As previously stated, the intention was to use the VICON system to track the head, each hand and the torso, each as a separate, uniquely defined rigid body.

Figure 5: Illustration of the PR2 and Its Links [10]

However, before moving forward with this implementation, it was prudent to explore other methods of gathering capture data, for the sake of completeness and to help ensure the overall reliability and usability of the completed system.

Of these other options, one system seemed most appropriate: the Xbox Kinect from Microsoft. The Kinect is a fairly new technology, and it had initially been discarded as a viable option due to concerns about a potential lack of functionality or accuracy. However, after researching various aspects of the device, it proved preferable to the VICON system in several ways.

First, the Kinect is a cheaper and more manageable system, which suits the project's motivation: because the intent is to design a system accessible to and usable by a wide population, it makes more sense to use a device that does not require the exorbitant cost of acquisition and setup that the VICON does. In addition, because the Kinect is relatively small, its portability is an advantage; being able to use the device in a variety of scenarios is one of the primary objectives of the system.

In addition, the Kinect already has built-in functionality in ROS. The openni tracker node, under the openni kinect ROS stack, broadcasts tf rosmsgs at a rate of about 30 Hz (a reasonable rate for rendering the system usable) that contain data about the position and orientation of a person's head, arms and torso. More than one user can be tracked at a time, but for the purposes of this project only a single user is of interest. The messages broadcast by this node contain a string indicating the body part being tracked, which can be the head, neck, torso, left shoulder, left elbow, left hand, right shoulder, right elbow, right hand, left hip, left knee, left foot, right hip, right knee or right foot. For simplicity, only the head, neck, torso, left hand, left elbow, right hand and right elbow were used in this system. Each message also includes a timestamp, a translation vector with x, y and z values, and a rotation quaternion with x, y, z and w values.
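Consuming these broadcasts amounts to polling tf, roughly as follows; the frame names ('openni_depth_frame', 'left_hand_1') follow the tracker's usual convention of suffixing a user number, but are assumptions here rather than values taken from the project's code.

#!/usr/bin/env python
# Sketch: poll one openni_tracker transform with a tf listener.
import rospy
import tf

rospy.init_node('skeleton_reader')
listener = tf.TransformListener()
rate = rospy.Rate(30)  # roughly the tracker's broadcast rate
while not rospy.is_shutdown():
    try:
        trans, rot = listener.lookupTransform(
            '/openni_depth_frame', '/left_hand_1', rospy.Time(0))
        rospy.loginfo('hand at %s, orientation %s' % (trans, rot))
    except (tf.LookupException, tf.ConnectivityException,
            tf.ExtrapolationException):
        pass  # frames not available yet; keep polling
    rate.sleep()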

However, these values are not immediately useful, for a number of reasons. The first has to do with coordinate frames. The coordinate frames of the messages are given as transforms from the position and orientation of the specified body part to the coordinate frame of the Kinect device itself. The Kinect's coordinate frame has the x-axis pointing to the left, the z-axis pointing out and the y-axis pointing up, as seen in Figure 6.

Page 7: Pennochio - Mapping Motion Capture onto a Humanoid Robot ...cse400/CSE400_2010... · map these motions onto a humanoid robot in real-time, es-sentially making the robot a puppet and

This is clearly different from the coordinate frame of the PR2, which has the x-axis pointing outward, the z-axis pointing up and the y-axis pointing to the PR2's left. Both, however, are right-handed coordinate frames, so a simple library with definitions for the vectors in the tf rosmsg could be designed to transform these values. This library also defines a homogeneous transformation that combines the quaternion and translation vector into a single 4x4 matrix, which can then be used with simple matrix algebra to transform between coordinate frames.
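The core of such a library reduces to a few lines of numpy. The helper below is our own sketch, not the project's code, and assumes unit quaternions.

# Fold a unit quaternion (x, y, z, w) and a translation vector into a
# single 4x4 homogeneous matrix so frame changes become matrix products.
import numpy as np

def homogeneous(q, t):
    x, y, z, w = q
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Chaining frames is then one product, e.g.:
# T_pr2_from_part = T_pr2_from_kinect.dot(T_kinect_from_part)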

A second complication with the data was a difference in scales. The way the positions of the PR2's limbs were scaled was based on the assumption that the PR2's fully extended arm (including gripper) has length one. The Kinect, however, bases its measurements in meters, so some scaling has to be applied to these values in order to use them to calculate the inverse kinematics of the PR2's arms. Because the system should allow for different users under motion capture, it was not viable to hard-code this scaling factor into the software. Instead, a technique was devised that relies on the design of the third-party openni tracker node.

In order to recognize an individual to track, the openni tracker demands that the user stand still at a range of roughly 3 meters from the Kinect and arrange themselves in what is deemed the psi-pose, shown in Figure 8. The node is not able to track, and therefore does not send tf messages over ROS, until the user has executed this pose and the node has parsed it, determined the area of the person's body, mapped a skeleton onto it and begun tracking movement. The system therefore assumes that the first tf message broadcast by the openni tracker node corresponds to the user in the psi-pose, from which a scaling constant between the units of the PR2 and the Kinect can be determined. Using this scaling constant and the transformations library, the frames determined from the Kinect can be transformed into the frame of the PR2.
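A sketch of the calibration idea follows; the helper and its argument names are hypothetical, not taken from the project's code. At the psi-pose frame, the user's arm length in meters is measured from the skeleton and mapped onto the PR2's arm length of 1.0 in the project's scaled units.

import numpy as np

def arm_scale(shoulder_xyz, elbow_xyz, hand_xyz):
    # User's arm length in meters, from the first (psi-pose) skeleton.
    upper = np.linalg.norm(np.subtract(elbow_xyz, shoulder_xyz))
    fore = np.linalg.norm(np.subtract(hand_xyz, elbow_xyz))
    # The PR2's fully extended arm is 1.0 unit in the project's scaling,
    # so multiplying Kinect positions by this constant converts them.
    return 1.0 / (upper + fore)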

Although these calculations are enough to transform the torso and head measurements into the space of the PR2's torso (and base), the arms require additional transformations before mapping from the space of the human to the PR2. Because each openni tracker tf message relates the specified body part to the Kinect, an additional step is required to get the transformation from the torso to the specified body part; this is necessary because, when giving values to the PR2's controllers, all data is defined with the torso link as the reference frame.

Despite all these useful features, a drawback of the openni tracker node is its lack of tracking for the pan and tilt angles of the subject's head. This is a major concern, because a basic system requirement relies on tracking the user's head so that, when they see what the robot sees (wearing the specialized glasses), they are fully immersed in the robot's environment. Two options are currently being explored as possible workarounds to this issue.

The first involves using existing libraries in ROS that can track a predetermined image or object for its orientation in front of a camera connected to the ROS system. The most readily available library tracks a black-and-white checkerboard pattern for its orientation. This checkerboard would be mounted on the same rig used to display the robot's camera vision to the user, so, although it adds something ungainly looking to the user under motion capture, it would not be too much of an inconvenience, since the user already needs to wear a piece of headgear.

The second option is more ideal, although it involves too much overhead to implement currently. This method would use the coordinate frame of the head given by openni tracker (which is little more than the coordinate frame of the torso translated along the positive y-axis) to determine the head's position in the raw camera image from the Kinect. Using this data, it should be possible to determine at least a specific 2D skeleton of the user's head, which could then be tracked for its orientation and therefore used to determine pan and tilt angles. However, this method is outside the scope of the project due to time constraints.

Due to these complications with head tracking, more focus was placed on designing a comfortable system for motion re-targeting. Even though the Kinect and openni tracker generate data for the user's legs, using this data to determine motion would be too unwieldy. Therefore, a system was designed that uses the user's initial torso x and z values (in the Kinect coordinate frame) and orientation as a baseline for determining the magnitude and direction of motion. The direction of motion is determined directly from the x and z values, regardless of torso orientation (which comes from the openni tracker's torso tf message). The magnitude of the motion is then a scaled function of the length of the vector from the origin point, with threshold values in software to keep the system from being too sensitive to the user's ability to remain exactly at the origin.
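A sketch of this rule, with illustrative (not tuned) constants and hypothetical names:

import numpy as np

DEAD_ZONE = 0.15  # meters the torso may drift before the base moves
GAIN = 0.5        # meters of displacement to meters/second of speed

def base_velocity(torso_xz, origin_xz):
    offset = np.subtract(torso_xz, origin_xz)
    dist = np.linalg.norm(offset)
    if dist < DEAD_ZONE:
        return (0.0, 0.0)  # inside the threshold: stand still
    direction = offset / dist
    speed = GAIN * (dist - DEAD_ZONE)
    return tuple(direction * speed)  # velocity in the Kinect x-z plane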

Figure 6: Data flow from devices with coordinate frames shown

Figure 7: Final layout of the system (tools used but not implemented in gray)

3.5 System Performance

As mentioned before, system performance is something that will be kept in mind throughout system implementation.


Figure 8: Image of the openni NITE visualization of the Kinect image of a user in the psi-pose

To date, no major performance issues have been encountered. However, later in the project, performance issues may arise in a few key areas. First, the use of a high-level language like Python may slow execution of the software. This will most likely be a negligible issue, given the relatively small amount of code that will ultimately be used in the system.

Another possible issue is the delay required to perform error checking and reference frame translation when converting motion capture data into data usable by the PR2. The severity of this issue depends on the efficiency (and magnitude) of the error checking and data translation required at this step. Because the system for motion capture has not yet been implemented, it is difficult to gauge how much of an issue this will be. Before settling on an error correction scheme, it will certainly be necessary to gauge its performance through a series of trial-and-error experiments.

Finally, performance may be an issue given that the system relies on sending data to and receiving data from the robot over a (most likely wireless) network. Although this should not be a problem, because the system will have safeguards against dropped data and other network-related issues, testing and experimentation will be necessary to ensure that this safety system does not introduce so much lag that the system becomes unusable.

3.6 Evaluation Criteria

Testing the system in a concrete way presents its own challenges. The best approach would be to run a test in which the positions of the human user at certain points in time are tracked and stored, along with all the movements of the robot. This data could then be scaled appropriately and compared for accuracy. This seems to be the most objective way to evaluate the effectiveness of the system, but other tests could include simple applications that evaluate the robot's ability to do certain tasks, anything from finding a variety of objects in various locations in a room to performing a dance like the YMCA. Although these tests would be more subjective, they would also be a good tool for assessing whether the robot is truly useful.

4. ITEMIZED AREAS FOR FUTURE WORK

As previously mentioned, the system was designed to allow for many future improvements. Because the system is still very raw, it has been designed to be easily amended and improved by future engineers.

• Head Tracking: Because a rather basic, inelegant system has been designed for head tracking, using a checkerboard attached to the user's head, this is a clear area for improvement. Going into the raw data of the Kinect camera to sketch a rough two-dimensional skeleton of the user's head, and then determining pan and tilt angles from it, would be the most elegant solution, although it would require a fair amount of experimentation to make it reliable.

• Gripping: It seems too unwieldy to use Kinect data to determine the state of a user's hand for gripping. Therefore, the ideal implementation would incorporate a hardware device in each hand, with a force-feedback system for a natural gripping mechanism and a gyroscope to give a clear readout of the hand's orientation, which is not provided by the openni tracker.

• Motion Capture: Although the Kinect is useful due to its portability and its incorporation into ROS, it may make sense to allow this system to also be used with a system like VICON if more precision is desired, even at the expense of portability and cost.

5. REFERENCES

[1] D. K. Arvind and M. M. Bartosik. Motion capture and classification for real-time interaction with a bipedal robot using on-body, fully wireless, motion capture specknets. In The 18th IEEE International Symposium on Robot and Human Interactive Communication, pages 1087–1092, September 2009.

[2] Paolo Dario, Cecilia Laschi, Maria Chiara Carrozza, Eugenio Guglielmelli, Giancarlo Teti, Bruno Massa, Massimiliano Zecca, Davide Taddeucci, and Fabio Leoni. An integrated approach for the design and development of a grasping and manipulation system in humanoid robotics. In International Conference on Intelligent Robots and Systems, 2000.

[3] Willow Garage. PR2. http://www.willowgarage.com/pages/pr2/overview.

[4] Dongheui Lee, Christian Ott, and Yoshihiko Nakamura. Mimetic communication with impedance control for physical human-robot interaction. In IEEE International Conference on Robotics and Automation, pages 1535–1542, May 2009.

[5] Ta Huynh Duy Nguyen, Tran Cong Thien Qui, Adrian David Cheok, Ke Xu, Sze Lee Teo, ZhiYing Zhou, Asitha Mallawaarachchi, Shang Ping Lee, Wei Liu, Hui Siang Teo, Le Nam Thang, Yu Li, and Hirokazu Kato. Real-time 3D human capture system for mixed-reality art and entertainment. IEEE Transactions on Visualization and Computer Graphics, 11(6):706–721, November 2005.

[6] Alexander Savenko and Gordon Clapworthy. Using motion analysis techniques for motion retargetting. In Sixth International Conference on Information Visualisation, 2002.

[7] Open Source. ROS wiki. http://www.ros.org/wiki/.


[8] Mark W. Spong, Seth Hutchinson, and M. Vidyasagar. Robot Modeling and Control. John Wiley and Sons, Inc., 1st edition, 2006.

[9] VICON. Vicon. http://www.vicon.com/.

[10] Willow Garage. PR2 user manual, August 2010.

[11] Kazuhito Yokoi. Humanoid robotics. In International Conference on Control, Automation and Systems, 2007.