
CENTER FOR MACHINE PERCEPTION
CZECH TECHNICAL UNIVERSITY

MASTER THESIS

Articulated 3D human model and its animation for testing and learning algorithms of multi-camera systems

Ondřej Mazaný

{mazano1,svoboda}@cmp.felk.cvut.cz

CTU–CMP–2007–02

January 15, 2007

Available at ftp://cmp.felk.cvut.cz/pub/cmp/articles/svoboda/Mazany-TR-2007-02.pdf

Thesis Advisor: Tomáš Svoboda

The work has been supported by the Czech Academy of Sciences under Project 1ET101210407. Tomáš Svoboda acknowledges support of the Czech Ministry of Education under Project 1M0567.

Research Reports of CMP, Czech Technical University in Prague, No. 2, 2007

Published by
Center for Machine Perception, Department of Cybernetics
Faculty of Electrical Engineering, Czech Technical University
Technická 2, 166 27 Prague 6, Czech Republic
fax +420 2 2435 7385, phone +420 2 2435 7637, www: http://cmp.felk.cvut.cz


Declaration

I declare that I have completed this diploma thesis independently and that I have used only the sources (literature, projects, software, etc.) listed in the attached list.

In Prague, on .......................

................................
signature


Acknowledgments

I thank my advisers Tomáš Svoboda and Petr Doubek for their mentoring and the time spent helping me with my work. I thank my wife Eva and my parents for their support during my studies. Finally, I give thanks to Jesus Christ, for the faith that gives me hope and purpose to study.


Abstract

This document describes a software package for creating realistic simple human animations. The package uses the open source 3D modeling software Blender and the scripting language Python. The animations, generated for several cameras, are meant for testing and learning of tracking algorithms in multi-camera systems where ground-truth data are needed. We created a human skeletal model and designed a way to animate it by scripts using motion capture data. The texture of the 3D human model is obtained from captured images. Transformations between computer vision and computer graphics are discussed in detail. We designed our own algorithm for automatically rigging the mesh model with the bones of the skeletal model. The steps of the design are covered in this thesis together with the usage of the software package.

Abstrakt

This document describes a software package for creating realistic human animations. The software package is based on the open source 3D modeling tool Blender and the programming language Python. The created animations, seen from the viewpoints of several cameras, are intended for use in a multi-camera system for learning and testing computer vision algorithms. We created a skeleton model and designed a way of animating it with the help of captured motion data. The textures of the three-dimensional human model are obtained from camera images. The transformations between computer vision and computer graphics are described in detail. We designed our own algorithm for automatically attaching the human mesh model to the skeleton model. The individual design steps are covered in this diploma thesis together with the usage of the software package.


Contents

1 Introduction
2 Articulated graphic model
3 Blender and Python overview
3.1 Blender coordinate system
3.2 Blender scene and objects
3.3 Blender materials and textures
3.4 Blender camera
4 Skeleton model definition
5 Rigging the mesh with bones
6 Texturing
6.1 Used approach
6.2 The problems of our method
6.3 Determining the visibility of faces
6.4 The best visibility algorithm
6.5 Counting the visible pixels
6.6 Conclusions (for texturizing approach)
7 Animation
7.1 Motion capture data
7.2 Usage of motion captured data
8 Description of the software package
8.1 Mesh data format
8.2 Animation XML file
8.3 Camera configuration files
8.4 Exported animation
8.5 Run parameters
9 Used mathematics and transformations
9.1 Blender's math
9.2 Blender Euler rotations
9.3 Blender camera model
9.4 Using projection matrices with OpenGL
9.5 Bone transformations
9.6 Vertex deformations by bones
9.7 Configuring the bones
9.8 Idealizing the calibrated camera
10 The package usage
10.1 Importing mesh and attaching to the skeleton
10.2 Fitting the model into views and texturing
10.3 Loading animation, rendering and animation data export
11 Results and conclusions
11.1 Conclusions


Chapter 1

Introduction

Tracking in computer vision is still a difficult problem, and in general it remains largely unsolved. Monocular tracking using only one camera is possible with knowledge of the tracked object model, as described in [14]. The 3D model of the tracked object can also be a side product of the tracking algorithms when learning the model that is going to be tracked. Having prior knowledge of the model can simplify the learning process and make the tracking more robust. The priors of a human model can sufficiently constrain the problem [13].

The 3D estimation of a human pose from monocular video is often poorly constrained, and prior knowledge can resolve the ambiguities. Dimitrijevic, Lepetit and Fua used generated human postures to obtain template human silhouettes for specific human motions [3]. They used these templates to detect silhouettes in images in both indoor and outdoor environments where background subtraction is impossible. This approach is robust with respect to the camera position and allows detecting a human body without any knowledge of the camera parameters. Since the detected templates are projections of posed 3D models, it is easy to estimate the full 3D pose of the detected body from the 2D data.

Constraints on hierarchical joint structures tracked by existing motion tracking algorithms solve the problem of falsely classifying poses that are unreachable for a human body, as shown by Herda, Urtasun and Fua [6]. The dependencies between the joints also help to determine valid poses. Due to the lack of data on joint ranges, the authors tracked all the possible configurations of the hand joints and applied the acquired data to constrain the motion tracker.

Another use is in testing computer vision algorithms. The noise in real data makes detecting silhouettes and events in the scene harder, and therefore it is convenient to have noise-free ground truth data for testing the algorithms. It is easier to define one's own scene with one's own model, generate an animation, and then verify the information obtained by the algorithms.


Developing a simple system for creating realistic animations may speed up the process of testing multi-camera systems. Modern computer animation allows us to generate realistic scenes, and therefore we can work with synthetic data as well as with real data. This saves the time needed for capturing new data in a real environment. For safety reasons it is preferable to simulate dangerous scene scenarios, as is often done in movies nowadays. But creating a realistic 3D animation is often hand work that starts with modeling a 3D mesh and finishes with its animation. This is the general process of creating animations used by computer artists, which we tried to simplify and automate. For testing and learning the algorithms we need several human models with the same animations and poses. It is important for us to be able to change the model easily. It is true that major changes can be achieved by changing the textures, but sometimes a change of the whole model may be needed.

Our task was to design a realistic articulated human model and ways of animating it. This human model is viewed from different angles by different cameras. The idea was to use existing software for generating photo-realistic scenes, such as Blender or POV-Ray. Blender is mainly a 3D modeling and animation tool for creating realistic scenes, but it has more features. Blender supports scripting in the Python language, which allows modifying camera settings or objects in the scene easily. We chose Blender [1] and Python [10] for their features and availability. Python is built into Blender for Blender's internal scripting, and Blender can also use an external Python installation with external Python modules. We took advantage of Blender's skeletal animation. Blender's GUI was used during preparation of the scenes and the human model. We also used Matlab for some computations and testing.


Chapter 2

Articulated graphic model

Articulated graphic models are widely used in computer graphics, and therefore many approaches have been developed in this branch. These models are often called avatars, virtual humans, humanoids, virtual actors, or animated characters. This sort of animation is often called character animation. Character animation is used in computer games, modeling software, VRML (Virtual Reality Markup Language) etc. Most 3D modeling and animation software tools are supplied with several models prepared for animation and also with character animation tools.

Blender has less sophisticated character animation support than commercial software, but it allows similar functions with good results. Blender's character animation is meant to be usable in the game engine, so the focus is on speed rather than on visual quality. However, Blender's functions give us everything we need to build an animated human character with sufficient results. Building the model is similar to other commercial 3D modeling software tools.

Articulated character models usually consist of:

1. Mesh (wire-frame, polygon-based) model.

2. Material, textures.

3. Virtual bones.

Some software also allows simulating muscles. In our work the Blender articulated model consists of these three parts only.

We can imagine the mesh model as the skin of the model, see Figure 2.1. It describes the surface of the model in the model's rest position (see Figure 5.5), when no animation is applied. The mesh model is a net of connected triangles or polygons and can contain separate parts. These polygons are also called the faces of the mesh. Each face is defined by 3, 4 or more vertices. A vertex is a point in 3D space. Besides the coordinates x, y, z, a vertex can also have a normal as a parameter. Each face also has to have a normal, so that it is possible to determine which way the face is facing.
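As an illustration of the last point, a face normal can be computed from three of its vertices with a cross product. The sketch below uses plain Python tuples and is not tied to Blender's data structures:

from math import sqrt

def face_normal(a, b, c):
    # unit normal of the triangle (a, b, c); vertices are (x, y, z) tuples
    u = (b[0] - a[0], b[1] - a[1], b[2] - a[2])   # edge a -> b
    v = (c[0] - a[0], c[1] - a[1], c[2] - a[2])   # edge a -> c
    n = (u[1] * v[2] - u[2] * v[1],               # cross product u x v
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    length = sqrt(n[0] ** 2 + n[1] ** 2 + n[2] ** 2)
    return (n[0] / length, n[1] / length, n[2] / length)

# Example: a face lying in the z = 0 plane faces along +z.
print(face_normal((0, 0, 0), (1, 0, 0), (0, 1, 0)))   # (0.0, 0.0, 1.0)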


Figure 2.1: Example of a mesh model of a head. The lines between vertices show the edges of the polygons.

The material specifies how the model will be visualized: how it will reflect lights during rendering, how it will cast shadows, the color of the mesh polygons, and the translucency. A texture can be used simultaneously with the material. The texture is often a bitmap which is mapped onto the mesh model. Mapping the texture onto the mesh can be done in several ways. The most commonly used way is UV mapping, which is also used in our work. UV stands for 2D coordinates in the texture space. Usually each face vertex has its UV coordinates. This defines the correspondence between the texture pixels and the face surface, see Figure 2.2.

Figure 2.2: Example of a textured cube mesh (on the left) and the texture (on the right) with unwrapped UV coordinates.

Virtual bones (further only bones) are used to deform the mesh, see Figure 2.3. It is possible to animate each vertex separately, but this is unusable in the case of thousands of vertices. Instead, you attach the vertices to the bones, and then any change of a bone position will affect the vertices. This is a common way of character animation, and it imitates real body movement as well. Attaching the vertices to the bones is called rigging. The most difficult problem is to find the correspondences between vertices and bones, and this has not been fully solved yet. Artists create human models according to their own knowledge about the model and body deformations. One way, still often used especially when precise results are needed, is to do it manually: at the beginning, the animator uses some automatic tool and then corrects the mistakes. This is unusable if you need to change the mesh. Blender allows attaching vertices to the closest bone, but we found this function not working properly in our case. The other possibility is to use envelopes. The envelope of a bone defines the area affected by the bone. This may be useful in the case of very simple meshes like robot models. We wrote our own rigging function, which we describe later in Chapter 5. It finds the correspondences between the vertices and the bones, and creates vertex groups using these correspondences. Each vertex group belongs to a different bone. It is possible to attach one vertex to several bones; the final deformation is then composed of the deformations of all these bones. We achieved better results this way, especially for vertices close to two bones.

Figure 2.3: The bones in Blender and the effect on the mesh in a new pose.

To get an articulated model we need to define a mesh, textures and bones. Blender is not supplied with such models as some other software packages are, but Blender offers tools for creating, animating and rendering a model. Graphic artists often create a realistic model by hand, which is very time consuming. The artists get good looking models because they work on all the details. For our purposes we need only an approximate model. We need to be able to change both the model and the animation easily and quickly. We automated the whole process using available software and methods.


The MakeHuman project [7] is an open source project which offers a tool for creating realistic humanoid models for computer graphics. In the current release, the project allows creating only a mesh model. It is possible to export the model into the Wavefront .obj format and reuse it in Blender. The maintainers of the MakeHuman project are currently working on a muscle engine, which will simulate the musculature. This project does not provide bones or textures but offers a wide variety of humanoid mesh models. Modeling a human mesh is much easier using this tool than with Blender, so we used MakeHuman to create the mesh models. The advantages of using MakeHuman include a convenient pose for rigging the mesh and the possibility of defining a generic skeleton of virtual bones for the meshes generated by MakeHuman.

The textures have the biggest influence on the appearance of the final rendered images. For example, computer games work with simple models, but the final result looks realistic; the key is a proper texture. In our work we need to be able to change the textures of the model. We set up a multi-camera system for capturing images of a real human in order to texture our model. More is described in Chapter 6. The used algorithm is based on determining the visible and hidden pixels in each camera and chooses the best view for texturing each particular mesh face.

Creating a skeleton for a character usually depends on the level of articulation and the purpose of the animation. The skeleton defines the virtual bone connections. There exists a standard, H-Anim [4], for humanoid animations in VRML, which defines joints and some feature points on a human body. The feature points are derived from the CAESAR project, a database of measured human data. We used the joints and the feature points from H-Anim as a template when designing the mesh skeleton. Using MakeHuman allows generating a skeleton that matches most of the created meshes. The H-Anim joint nomenclature was used for naming the bones. The final skeleton was defined so as to match the MakeHuman created mesh and to be easy to animate with BVH files (more in Chapter 7).

The articulation is done through the bones. It is possible to change the length of the bones or their position during animation, but this makes no sense in the case of a human body with constant bone lengths. However, it may be useful when fitting the model into captured images, as described in Chapter 6. The final articulation should consist of rotations of the bones only. Blender has an inverse kinematics solver, so the pose of the character may be changed quickly by grabbing the hands or feet instead of setting bone rotations.


Chapter 3

Blender and Python overview

Blender [1] is open source software for 3D modeling and animation. We used Blender version 2.42a and Python version 2.4. The choice of the Blender version is important, because Blender is under massive development, undergoing changes of philosophy and data structures, and unfortunately of the Blender Python API as well. For example, we could not reuse the bone setup from our previous work [8]. We used Blender as the rendering and animation engine. Blender supports inverse kinematics, tools for character animation and more. Blender uses Python for both internal and external scripting: it is possible to write a Python expression in a Blender text box or run an external Python script directly from Blender. A separate Python installation is not needed, because Blender has Python built in, but Blender can use an external Python installation to extend the available functions and libraries. We do not describe all the Blender features; we offer only an overview. More detailed information may be found in the Blender Manual [1] and in the Blender Python API [1]. Blender's user interface and rendering are based on OpenGL [11]. The Blender Python API provides a wrapper of the OpenGL functions and allows using OpenGL for creating user-defined interfaces in Python scripts. We used OpenGL in our work for rendering in the texturing algorithm, as will be described in Chapter 6.

We advise reading the Blender manual [2] to learn how to use Blender. The BlenderQuickStart.pdf file, which is shipped with Blender and can be found in the Blender installation folder, contains a brief tour of Blender usage. Some functions are accessible only through hot-keys or mouse clicking and may be found neither in the menus nor in the Blender Python API.

Blender supports external raytracers as well as its internal renderer with support for radiosity and raytracing. Blender has a panorama rendering mode that can simulate fish-eye cameras. Blender also has anti-aliasing (called oversampling), motion blur, and a Gaussian filter. The whole animation, composed of several frames, can be rendered into one file (in the avi or quicktime format) or into single images.

3.1 Blender coordinate system

Blender uses a right-handed coordinate system as shown in Figure 3.1. Rotations in Blender's GUI are entered in degrees; for the sake of completeness, we note that Blender internally calculates in radians. Blender expresses rotations as Euler angles, quaternions, and matrices. Quaternions are used, for example, for posing the bones; Blender uses Euler rotations for general objects in the Python API. More about rotations and transformations in Blender will be described in Chapter 9. The units in Blender are abstract. The coordinates can be absolute (in the world space) or relative to parent objects or object origins. In our work we treat Blender units as metric units for easier expression in the real world.
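For illustration, the relation between degrees and radians and between an Euler rotation and a rotation matrix can be sketched in plain Python. The XYZ composition order below is only one common convention; the exact conventions used by Blender are discussed in Chapter 9.

from math import radians, cos, sin

def euler_xyz_to_matrix(rx_deg, ry_deg, rz_deg):
    # rotation matrix (row-major 3x3) for Euler angles given in degrees;
    # the composition order R = Rz * Ry * Rx is an assumption, not Blender's definition
    rx, ry, rz = radians(rx_deg), radians(ry_deg), radians(rz_deg)
    cx, sx = cos(rx), sin(rx)
    cy, sy = cos(ry), sin(ry)
    cz, sz = cos(rz), sin(rz)
    return [
        [cy * cz, cz * sx * sy - cx * sz, cx * cz * sy + sx * sz],
        [cy * sz, cx * cz + sx * sy * sz, cx * sy * sz - cz * sx],
        [-sy,     cy * sx,                cx * cy],
    ]

# 90 degrees about x maps the y axis onto the z axis.
print(euler_xyz_to_matrix(90, 0, 0))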

Figure 3.1: Right-handed coordinate system.

3.2 Blender scene and objects

The scene in Blender consists of objects. Each object contains data for its own specification. The data associated with an object can be a Mesh, Lamp, Camera, or Armature. A Mesh is a wire-frame model built of polygons (Blender calls them faces). The basic parameters of the objects include location and rotation (defined by Euler angles). These parameters can be absolute (in the world space) or relative. Objects can be parents or children of other objects. For example, our mesh model is composed of two objects: an object with Mesh data and an object with Armature data. Armature objects hold bones in Blender (they are equivalent to the skeleton). The Armature object is the parent of the Mesh object and controls the deformation of the mesh vertices. An example of scene contents is shown in Figure 3.2, where the eye symbol stands for a camera, the bone symbol for bones, axes for general objects, the symbol of a man for an armature object, and the spheres stand for materials. The vertex groups, shown as small balls, will be described in Chapter 5. They split the mesh vertices into several groups that are later used for rigging with the bones. They have the same names as the bones.

Figure 3.2: The example of a scene structure in Blender.

The bones in Armature objects define the rest pose of the skeleton. Three parameters define a bone: the head of the bone, the tail of the bone, and the roll parameter which rotates the bone around its y axis. Bones can be children of other bones and may inherit the transformations of their parent. If a bone is connected to its parent, the bone head automatically copies the location of the parent's bone tail, as shown in Figure 3.3.

3.3 Blender materials and textures

Blender materials have many features, but we use only a few of them in our work. The materials are used for binding the textures; in Blender, textures must be used together with materials. The materials can also be used for simulating human skin, which does not behave as a general surface. The MakeHuman project has a Blender plugin for simulating the skin; we do not use it in our work. Materials in our work are set to be shadeless. This means that the material is insensitive to lights and shadows and the texture is mapped without any intensity transformation. The final textured object will appear the same from different views even if the illumination differs. This makes recognizing learned patterns of an object easier. The materials in Blender also specify the mapping of the texture onto an object. The method which we use is UV mapping. This material option must be set explicitly, because it is not the default texture mapping in Blender; we set this option automatically in our scripts. No other changes are needed in the default materials. An important thing for usage of the Blender Python API is the linking of materials. Materials can be linked to both a general Blender Object and a Mesh object; we link materials to the Mesh object. Up to 16 materials can be linked to an object. This also limits the number of textures for an object with uniform materials.

Figure 3.3: Bones in Blender; a child bone connected to its parent bone copies the parent's transformation.

The textures which we use are normal images loaded into Blender. It is recommended to align the color intensities between images before mapping the textures. Obviously, different cameras have different color sensitivity, and the light sources in the scene also affect the illumination in the captured image. An object textured with images of varied illumination will appear inconsistent in the final rendering. Another possible texture usage is displacement maps.

3.4 Blender camera

The camera in Blender is similar to a pinhole camera, i.e. a perspective projection. The Blender camera can be switched to orthographic. The view direction is the camera's negative z axis. The camera parameter lens sets the viewing angle; the value 16 corresponds to a 90 degree angle, as shown in Figure 3.4. The drawback of the Blender camera lies in simulating a real camera projection: the Blender camera always has an ideal perspective projection with the principal point in the image center. The camera is configured like other objects in the Blender scene; the main parameters are location and rotation in the world space. The inverse of the camera world transformation matrix can be used to get the transformation from the world space to the camera local space. This inverse matrix corresponds to the R[I | −C] expression used in computer vision; this is described in more detail in Chapter 9. Other parameters which define the projection are independent of camera objects and can be set up in the Blender render parameters: the image width, image height, and aspect ratio. More about the camera model and related transformations will be described in Chapter 9.

Figure 3.4: Definition of lens parameter of Blender camera.
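The statement that lens = 16 corresponds to a 90 degree viewing angle suggests the relation fov = 2·atan(16/lens). The small check below assumes this relation; the exact camera model is derived in Chapter 9.

from math import atan, tan, degrees, radians

def lens_to_fov(lens):
    # viewing angle in degrees for a Blender 'lens' value, assuming fov = 2*atan(16/lens)
    return degrees(2.0 * atan(16.0 / lens))

def fov_to_lens(fov_deg):
    # inverse relation: the 'lens' value that gives the requested viewing angle
    return 16.0 / tan(radians(fov_deg) / 2.0)

print(lens_to_fov(16.0))   # 90.0
print(fov_to_lens(90.0))   # 16.0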


Chapter 4

Skeleton model definition

We use the MakeHuman project [7] for obtaining the 3D human mesh model. The advantage is an easy change of the model. The pose and the proportions of most models created in MakeHuman are approximately constant, which allows us to define a general skeleton for most of the models. Using the MakeHuman project we can generate a mesh using several targets for the legs, hands, face features etc. These models are intended primarily for computer graphics and may lack anatomical correctness. The models can be exported into the .obj file format, which Blender can import. They only describe the surface of the model; they come without bones or materials. These models can be imported into Blender, where we attach bones for model articulation.

The skeleton definition is ambiguous. The H-Anim standard [4] defines levels of articulation, bones (joints), dimensions, and feature points of a human body for virtual humans (humanoids). But this standard is hardly applicable to other data structures, where the philosophy of character animation is different from VRML (Virtual Reality Markup Language). However, the basic ideas and definitions of this standard can be adapted for usage in Blender. We follow the H-Anim recommendations and use them as a template for the skeleton model. We adjusted the H-Anim definitions for easier use with the motion capture format BVH. The names of our bones correspond to the joint points of the H-Anim standard. We defined a general skeleton, which we generate by a script, so that it fits most of the MakeHuman meshes. The advantage is having accurate lengths and positions of the bones; this can be used, for example, when the final hand position in a new pose has to be computed. The locations are defined in Table 4.1 and the skeleton visualization is in Figure 4.1. The bones are defined by pairs of bone head and bone tail, see Figure 3.3. Bones also have a roll parameter, which can be specified to rotate the bone space around the bone y axis. The bone head is the joint location where the transformation applies; it is the origin of the bone space. The bone head together with the tail defines the bone's y axis. The bones can be connected together (see the parent bone column of Table 4.1). The joint locations are dimensionless but can be perceived as meter units.

Bone name                   Head (x, y, z)            Tail (x, y, z)            Parent bone
HumanoidRoot                 0.000  0.824   0.0277     0.000  0.921  -0.080     -
sacroiliac                   0.000  0.921  -0.080      0.000  1.057  -0.034     HumanoidRoot
vl5                          0.000  1.057  -0.034      0.000  1.4583 -0.057     sacroiliac
vt3                          0.000  1.4583 -0.057      0.000  1.7504  0.000     vl5
HumanoidRoot_to_l_hip        0.000  0.921  -0.080      0.096  0.843  -0.029     HumanoidRoot
l_hip                        0.096  0.843  -0.029      0.065  0.493  -0.011     HumanoidRoot_to_l_hip
l_knee                       0.065  0.493  -0.011      0.069  0.091  -0.054     l_hip
l_ankle                      0.069  0.091  -0.054      0.042  0.012   0.180     l_knee
HumanoidRoot_to_r_hip        0.000  0.921  -0.080     -0.096  0.843  -0.029     HumanoidRoot
r_hip                       -0.096  0.843  -0.029     -0.065  0.493  -0.011     HumanoidRoot_to_r_hip
r_knee                      -0.065  0.493  -0.011     -0.069  0.091  -0.054     r_hip
r_ankle                     -0.069  0.091  -0.054     -0.042  0.012   0.180     r_knee
vl5_to_l_sternoclavicular    0.000  1.4583 -0.057      0.082  1.4488 -0.0353    vl5
l_sternoclavicular           0.0820 1.4488 -0.0353     0.194  1.434  -0.032     vl5_to_l_sternoclavicular
l_shoulder                   0.194  1.434  -0.032      0.410  1.379  -0.062     vl5_to_l_sternoclavicular
l_elbow                      0.410  1.379  -0.062      0.659  1.393  -0.052     l_shoulder
l_wrist                      0.659  1.393  -0.052      0.840  1.391  -0.042     l_elbow
vl5_to_r_sternoclavicular    0.000  1.4583 -0.057     -0.0694 1.460  -0.033     vl5
r_sternoclavicular          -0.0694 1.4600 -0.0330    -0.194  1.434  -0.032     vl5_to_r_sternoclavicular
r_shoulder                  -0.194  1.434  -0.032     -0.410  1.379  -0.062     vl5_to_r_sternoclavicular
r_elbow                     -0.410  1.379  -0.062     -0.659  1.393  -0.052     r_shoulder
r_wrist                     -0.659  1.393  -0.052     -0.840  1.391  -0.042     r_elbow

Table 4.1: The bone locations in the rest pose. The units are abstract, but can be perceived as meter units.
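A script can create bones like those in Table 4.1 directly. The fragment below is a minimal sketch assuming the Blender 2.4-series Python API (the Armature and Mathutils modules) and shows only two of the bones; it is an illustration, not the generator used in the package.

import Blender
from Blender import Armature
from Blender.Mathutils import Vector

arm = Armature.New('humanoid')            # new armature datablock
arm.makeEditable()                        # switch to edit mode to add bones

root = Armature.Editbone()
root.head = Vector(0.000, 0.824, 0.0277)  # values taken from Table 4.1
root.tail = Vector(0.000, 0.921, -0.080)
arm.bones['HumanoidRoot'] = root

sac = Armature.Editbone()
sac.head = Vector(0.000, 0.921, -0.080)
sac.tail = Vector(0.000, 1.057, -0.034)
sac.parent = arm.bones['HumanoidRoot']    # parenting as in the table
arm.bones['sacroiliac'] = sac

arm.update()                              # leave edit mode and apply the changes
# (the armature datablock still has to be linked to a scene object to be visible)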

The imported mesh is scaled and rotated so that it fits the generated skeleton. Blender has an import script for .obj files, but it allows rotating the mesh during import. We advise using the wrapper function written in our scripts to avoid improper functionality; our function uses the same script supplied with Blender. The final non-textured articulated model is finished by attaching (parenting) the bones to the mesh.


Figure 4.1: The generic skeleton model as shown in Blender’s top view.


Chapter 5

Rigging the mesh with bones

In this chapter we describe the problems of rigging the mesh with the bones, and also our own algorithm, which we found suitable for rigging MakeHuman meshes with our general skeleton. The need for a new rigging algorithm arose when the Blender tools did not work well: they may attach mesh vertices to an improper bone (when using the envelopes option) or they may not attach all vertices (when using the closest bones). The rigging process (skinning the skeleton) is attaching the vertices to the bones. The vertices are attached in the armature (Blender's skeleton) rest pose. The rest pose is the default pose without rotations or other bone transformations; it is defined only by the bone locations, lengths and the roll parameter. No transformations are applied to the vertices in the rest pose.

The rigging is done by parenting an armature object (Blender's skeleton object) to a mesh object. Usually the armature object consists of several bones. With the new Blender version it is possible to use the armature object as a modifier instead of as a mesh parent object; modifiers are applied to a mesh in order to modify it in several ways. We did not use this option. The armature deform options are: using envelopes and using vertex groups. These options can be mixed, and one vertex can be deformed by several bones. An envelope defines an area around the bone; all the vertices in this area are affected by the bone transformations. The vertex groups define the correspondences between the vertices and the bones directly. A vertex group must have the same name as the corresponding bone. The advantage of envelopes is quick and simple usage: you can place the bone into the desired area and start using it. The problem is the envelope shape; often vertices which you do not want to attach are assigned to a bone, as shown in Figure 5.1. The better way is using the vertex groups. You can add or remove vertices from a vertex group and directly control which vertices will be affected by the bone. The problem is that the vertices must be added to the vertex groups explicitly. Blender can group the vertices automatically according to the closest bones, but this function can miss some vertices, as shown in Figure 5.1, so that they do not correspond to any bone and stay unaffected. Therefore we designed an algorithm which is suitable for our general skeleton. In order to obtain a correct new mesh pose, the proper correspondences of vertices with bones must be found. The vertices can have a weight parameter which defines the influence of the bone transformation. This is useful when a vertex is transformed by two bones: vertices at the border of two bones are more naturally deformed if they are attached to both bones with different weights, see Figure 5.2 for comparison.

Figure 5.1: Bad deformation in the new pose. Using the envelopes option on the left and the vertex groups created by the closest bones function on the right.

In our algorithm we use the vertex groups, but we select the group vertices using our own classification function. Vertices with a higher angle to the bone's y axis are less affected, by setting a scaled weight. We also take advantage of the mesh symmetry in our algorithm. The algorithm in pseudo-code looks as follows:

for bone in armature.bones:
    create_vertex_group(bone.name)

for vertex in mesh.vertices:
    J_max = 0.0
    vertex_group = bone.name
    weight = 1.0
    for bone in armature.bones:
        v1 = bone.head - vertex.location
        v2 = bone.tail - vertex.location
        vb = bone.tail - bone.head
        J = AngleBetweenVectors(v1, v2)
        if bone is not on the same side as the vertex:
            J = 0.0
        if J_max < J:
            J_max = J
            vertex_group = bone.name
            weight = AngleBetweenVectors(-v1, vb) / 60.0
    assign_vertex_to_group(vertex, vertex_group, weight)
    if has_parent(bone) and (weight < 1.0):
        assign_vertex_to_group(vertex, parent(bone).name, 1.0 - weight)

Figure 5.2: The vertex deformations with each vertex attached to one bone on the left. On the right, vertices at the bone borders are attached to both bones with different weights.

See Figure 5.3 for better understanding. We use the Blender Python API function AngleBetweenVectors; it returns the absolute angle in degrees. The angle can also be computed as the arc cosine of the normalized dot product of the two vectors, arccos(a·b / (‖a‖‖b‖)). The main idea of the vertex classification is finding the biggest angle subtended between the bone's head and tail, as shown in Figure 5.3. If the angle α is bigger than the angle β, then the vertex belongs to the parent bone, and vice versa. A simulation of the algorithm is shown in Figure 5.4. We test whether each vertex lies on the right (positive x) or the left (negative x) side of the y axis, and we do not allow vertices on the left side to be attached to bones with joints on the right side. The weight factor is decreased for vertices with a larger angle γ between the vb vector and the −v1 vector. If the weight is less than 1.0 and the bone has a parent bone, then the vertex is also added to the parent bone's group with the remaining weight (the weight is clamped to the range <0, 1>). How the vertices are deformed is described in Chapter 9.
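A self-contained Python version of the classification for a single vertex could look like the sketch below. It uses plain coordinate tuples instead of Blender's data structures; the side test and the 60-degree scaling mirror the pseudo-code above, so it is an illustration of the idea rather than the package implementation.

from math import acos, degrees

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def angle_between(u, v):
    # absolute angle in degrees: arccos of the normalized dot product
    dot = sum(x * y for x, y in zip(u, v))
    norm = (sum(x * x for x in u) ** 0.5) * (sum(x * x for x in v) ** 0.5)
    return degrees(acos(max(-1.0, min(1.0, dot / norm))))

def same_side(vertex, head, tail):
    # a vertex on one side of the y axis (sign of x) may not attach to a bone with a joint on the other side
    if vertex[0] >= 0.0:
        return head[0] >= 0.0 and tail[0] >= 0.0
    return head[0] <= 0.0 and tail[0] <= 0.0

def classify_vertex(vertex, bones):
    # bones: dict of name -> (head, tail); returns the chosen group name and its weight
    best_name, best_angle, weight = None, 0.0, 1.0
    for name, (head, tail) in bones.items():
        if not same_side(vertex, head, tail):
            continue
        angle = angle_between(sub(head, vertex), sub(tail, vertex))
        if angle > best_angle:
            best_angle, best_name = angle, name
            # weight as in the pseudo-code above, clamped to the range <0, 1>
            raw = angle_between(sub(vertex, head), sub(tail, head)) / 60.0
            weight = max(0.0, min(1.0, raw))
    return best_name, weight

# The remaining 1.0 - weight would then be assigned to the parent bone's group.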


Figure 5.3: The rigging algorithm is based on finding the maximum angle between the bone's head, tail and the vertex.

This algorithm was tested with MakeHuman meshes and our skeleton model. Classifying only by angle gives good results with our bone configuration. The final rigging results are shown in Figure 5.5, with rest poses and new poses for MakeHuman generated meshes.

Figure 5.4: The simulation of the vertex classification for two different bone configurations. In each plot, red shows the border for attaching the vertices to either the parent bone or the child bone.


Figure 5.5: The rest pose and the new pose for the meshes rigged with the skeleton.


Chapter 6

Texturing

Following the chapters above, we can create an articulated mesh model. This model looks like a figurine because it has no texture or special material attached. The texture differentiates the model from others and provides more detailed information. In order to have a realistic human model we use digitally captured images of people as textures. We use a multi-camera system to capture human images synchronously from different angles. The mesh model must be posed and fitted into the captured images before texturing. We use manual posing and fitting through the Blender interface, see Figure 6.1; this task is hard to solve automatically. The cameras in the system are calibrated and synchronized, but they do not have ideal parameters: the real cameras suffer from radial distortion and skew. We cannot simulate a real camera in Blender, so before fitting the model into the images we must adapt the images and adjust the camera parameters. First the camera parameters must be computed and converted to Blender camera parameters. Then the radial distortion and skew must be removed from the images. Because we fit the object in the view of Blender cameras, we also must shift the image origin to match the Blender camera projection. The transformations are discussed in Chapter 9.

6.1 Used approach

Mapping a texture onto a model is a well-studied task in computer graphics, and most of the literature about 3D graphics covers it. We use the UV mapping method for mapping images onto the mesh faces, and we use several images from different angles to cover the whole mesh. UV mapping means that each vertex of a face has its UV coordinates in the texture space; the texture is interpolated during the mapping. Each face has an associated texture. The texture mapping is an affine transformation defined by the vertex UV coordinates.
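As an illustration of that affine mapping, a point inside a triangular face with barycentric coordinates (a, b, c) samples the texture at the same combination of the vertex UV coordinates; a minimal sketch, independent of Blender:

def interpolate_uv(bary, uv0, uv1, uv2):
    # texture coordinates for a point given by barycentric coordinates (a, b, c) over a triangle
    a, b, c = bary
    return (a * uv0[0] + b * uv1[0] + c * uv2[0],
            a * uv0[1] + b * uv1[1] + c * uv2[1])

# The face center (1/3, 1/3, 1/3) maps to the average of the three vertex UVs.
print(interpolate_uv((1 / 3.0, 1 / 3.0, 1 / 3.0), (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)))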

Our problem is the different illumination in the images and the image noise. Another problem is that visibility of the entire body surface is not possible (for example, in the standing pose the bottoms of the feet are hidden). Distortion at the boundaries arises due to limited accuracy and the problems mentioned above. The next problem is occlusion of mesh faces in a camera view. An algorithm with good results covering all these problems was developed by Niem and Broszio [9]. They use a sequence of images from a stationary calibrated camera. They apply a maximal texture resolution approach for classifying the best camera view of a face, and they extended the method by regrouping the faces in order to avoid boundary distortion. They synthesize a new texture for hidden parts. Another algorithm was presented by Starck and Hilton [12]: in their model-based reconstruction of people they compose the captured images into a single texture. We build on the same foundations as these two works.

Figure 6.1: The mesh model manually positioned and posed into the images using Blender's user interface. Four cameras were used in this example.

There are two possibilities for using the captured images as textures. The first is using one bitmap as the texture for the whole mesh. This imposes static unwrapped vertex UV coordinates (but if a new model is to be used, the old UV coordinates become invalid). All images are then composed into a single bitmap. This allows easier manipulation with the texture and easier filtering in the texture space. The invalidation is not an issue for MakeHuman meshes, because they are built of a constant number of vertices with constant ordering. The drawback of a single-bitmap texture is a more difficult implementation.

The second option, which we use, is using several textures for one mesh.


This method is also often used for manual texturing. It allows specifying different materials for skin, clothes, hair etc. We use this approach, but we do not use different materials: we use each captured image as a texture, so the final model contains several textures. This makes the coding easier, without any manipulation of the images, but it has many drawbacks. First, the different illumination in the images causes steps and distortion at the boundaries; from one camera the same object may look darker or brighter than from another. This can be partly solved by color intensity alignment. Objects viewed from different angles often have occluded parts. The camera calibration is never perfect, so the object projection from one camera will not match the projection from another camera exactly.

We did not solve these problems; they need to be addressed in future development. It must be considered which philosophy of the texture model would be better: the model with one material and one texture, or the model with several materials and textures.

6.2 The problems of our method

The biggest challenges in texturing the model are estimating the best visibility of a face in the cameras and texturing the hidden parts. A simple method for testing the visibility is to render the object and read the pixels of the result; the rendered pixels must contain information about the faces. This has some drawbacks: small polygons can be reported as occluded even if they are visible in a camera.

The hidden parts can be as large as limbs or as small as fingers. We can only guess the texture for these parts. For most real human bodies we can expect only small changes of texture within a small area. The texture for hidden parts could be synthesized from the closest visible parts, but for easier coding we pretend that those polygons are visible from the same view as their neighbor polygons. We expect that occlusions are caused by the same or similar body parts. This imposes poses where the hands are raised and do not occlude the body, at least from one view; the back side of a limb is then occluded by its front side. The probability that the back side has the same surface as the front side is high for human subjects. Of course there are exceptions, such as the head or colorful miscellaneous clothes. An appropriate camera arrangement is important.

6.3 Determining the visibility of faces

The problem of visibility is well known in computer graphics; the rendering process must solve this task properly. Many algorithms were developed for the visibility test: some use a z-buffer, some test and cull the viewing frustum, some test polygon normals. These algorithms are now often implemented directly in hardware graphics accelerators in order to speed up the rendering process. Instead of writing our own functions, we use the OpenGL rendering capabilities, which take advantage of hardware accelerators; Python script interpretation is too slow for a rendering algorithm.

Our approach is similar to Starck and Hilton [12]. We use the OpenGL [11] features, i.e. hardware accelerated rendering, to determine face visibility. OpenGL is switched to the simplest rendering mode, with z-buffer testing and without shading. The face index is encoded into the RGB polygon color value; all the OpenGL features which change the face color during rendering (like shading) must be turned off. The final rendered bitmap then contains the face indices of visible faces coded as colors. This has some drawbacks. We have only pixel accuracy for the testing, and small polygons can overlap each other; polygon edges can be overlapped by neighboring polygons. We count the number of visible pixels in order to measure the visibility, and to determine the amount of occlusion we count the hidden (occluded) pixels.

Some small faces may not be classified as visible due to the limited pixel accuracy; other faces can be hidden or occluded. It is better to have these faces textured, for better visual quality. The texture could be synthesized for the occluded faces, but the implementation would be complicated because we use several bitmaps instead of a single bitmap. Instead, we pretend that the faces are visible: in our algorithm we simply search the neighboring faces for the camera with the best visibility for most of them. The problem is searching for the neighboring faces, because the data accessed through the Blender Python API has no such relation as a neighbor face; we must test the face vertices for the same location.
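Since the API offers no direct neighbor-face relation, one way to find neighbors is to index the faces by their (rounded) vertex locations. The sketch below is our own illustration of that idea, not the exact implementation in the package:

def build_vertex_face_map(faces, digits=6):
    # faces: list of faces, each a list of (x, y, z) vertex locations;
    # returns a dict mapping a rounded vertex location to the indices of faces that use it
    vmap = {}
    for fi, face in enumerate(faces):
        for vx, vy, vz in face:
            key = (round(vx, digits), round(vy, digits), round(vz, digits))
            vmap.setdefault(key, set()).add(fi)
    return vmap

def neighbor_faces(face_index, faces, vmap, digits=6):
    # faces sharing at least one vertex location with the given face
    result = set()
    for vx, vy, vz in faces[face_index]:
        key = (round(vx, digits), round(vy, digits), round(vz, digits))
        result |= vmap.get(key, set())
    result.discard(face_index)
    return result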

6.4 The best visibility algorithm

The algorithm for estimating the best face visibility in the camera views is the following (pseudo-code, simplified for reading):

% CLASSIFICATION OF THE FACES USING VISIBILITY
% DEFINED BY VISIBLE PIXELS
visible_faces = []
hidden_faces = []
for face in mesh.faces:
    selected_image = None
    best_visibility = visibility = 0
    for image in images:
        [visible_pixs, hidden_pixs] = image.pixels_of(face)
        if hidden_pixs > 0:
            visibility = visible_pixs / hidden_pixs
        else:
            visibility = visible_pixs
        if visibility > best_visibility and (visibility > MIN_VIS):
            best_visibility = visibility
            selected_image = image
    if selected_image:
        set_Texture(selected_image)
        UV = selected_image.get_UV_coordinates(face)
        set_Face_UV_coordinates(face, UV)
        visible_faces.append(face)
    else:
        hidden_faces.append(face)

% FINDING THE BEST VIEW FOR UNCLASSIFIED FACES
loop_count = MAX_OF_LOOPS
while hidden_faces and loop_count:
    new_visible_faces = []
    for face in hidden_faces:
        neighbors = get_Neighbors(visible_faces, face)
        if neighbors:
            selected_image = get_Most_Used(neighbors)
            set_Texture(selected_image)
            UV = selected_image.get_UV_coordinates(face)
            set_Face_UV_coordinates(face, UV)
            new_visible_faces.append(face)
            hidden_faces.remove(face)
            new_visible_faces.extend(neighbors)
    loop_count = loop_count - 1
    visible_faces = new_visible_faces

The part of the algorithm that finds a proper texture for unclassified faces must be constrained to avoid infinite loops, because some faces may have no neighboring faces (the mesh can contain separate parts); another reason is the slow speed of the algorithm. In our results we use the value 10 for the constant MAX_OF_LOOPS. We also use the constant MIN_VIS to avoid texturing primarily from views with occluded faces; the value of MIN_VIS was set to 0.5 in our work. The slowest part is searching for the neighboring faces. This could be done once, and more effectively than we do it: because we do not have any relation between mesh vertices and mesh faces, we need to test the vertex locations of each face against the vertices of all the other faces. We search only in the set of newly retrieved visible faces in order to eliminate visible faces which do not have hidden neighbor faces, thus the search set is reduced. This algorithm is neither sophisticated nor quick, but it shows a possible approach to the problem. Figure 6.2 shows the images used for textures and the final textured model. Texture distortion at the boundaries and distortion caused by different illumination are present in the final result. This is caused by capturing the source images from angles with different illumination and by different camera types. An additional error is that the manually fitted model does not match the real object in the scene exactly, as shown in Figure 6.1.


6.5 Counting the visible pixels

Using the calibrated cameras we can render the mesh model into images and estimate the vertex UV coordinates. As mentioned above, we use OpenGL to speed up the process. The use of the camera projection matrix for controlling the OpenGL viewport transformations will be described in detail in Chapter 9. The visible pixels are obtained by the following process:

1. Render the model in OpenGL render mode with the depth (z-buffer) test and without shading. Set the projection transformations using the camera's projection matrix. Render each mesh face with a different color (the face index is encoded in the color RGB values).

2. Using the OpenGL feedback mode, process the model and obtain the image coordinates for each face. The OpenGL feedback mode returns the data of the OpenGL pipeline right before rasterizing. This is suitable for obtaining correct face window coordinates which correspond to the previously rendered data.

3. Process every face of the model in render mode with depth testing turned off. Turning the depth testing off causes the faces to be fully rendered wherever they fall inside the view. Compare the newly rendered area with the data from the first step. Count the pixels with equal color values as visible and the others as hidden.
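The face index can be packed into an RGB color and recovered from the rendered pixels. The encoding below (8 bits per channel, indices assumed to start at 1 so that 0 can stand for the background) is a plausible sketch of the idea, not necessarily the exact packing used in the package:

def face_index_to_rgb(index):
    # pack a face index into an (r, g, b) triple with 8 bits per channel
    return ((index >> 16) & 0xFF, (index >> 8) & 0xFF, index & 0xFF)

def rgb_to_face_index(r, g, b):
    # inverse of face_index_to_rgb
    return (r << 16) | (g << 8) | b

def count_pixels(buffer_with_depth, buffer_without_depth, face_index):
    # compare two rendered buffers (lists of (r, g, b) pixels): a pixel drawn for the face
    # without depth testing is 'visible' if the depth-tested render shows the same face there,
    # otherwise it is counted as hidden (occluded)
    visible = hidden = 0
    for with_depth, without_depth in zip(buffer_with_depth, buffer_without_depth):
        if rgb_to_face_index(*without_depth) != face_index:
            continue                      # pixel was not rasterized for this face
        if with_depth == without_depth:
            visible += 1
        else:
            hidden += 1
    return visible, hidden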

6.6 Conclusions (for texturizing approach)

This part of our work was the most complicated to code effectively in Python. For speed reasons we used the built-in support for OpenGL and took advantage of its fast rendering, which rapidly sped up the whole process. The drawbacks are in the pixel accuracy and rasterizing. Very small faces rendered as a single pixel can be occluded by their neighborhood. The polygon edges are discretized and can therefore overlap each other. This means that a face fully visible in a view can be counted as partly visible or hidden. We did not investigate the problems with texture distortion at edges or distortion caused by different illumination. We expect a well posed object when selecting the source texture for unclassified faces; the camera locations and orientations are also important. Our approach can be extended in future development; the implementation in Blender shows the possible usage. In the future the algorithm presented by Niem and Broszio [9] can be fully implemented. This will require synthesizing the texture for hidden parts and would allow using a sequence of images from a single camera. The model fitting can be done semi-automatically as presented by Starck and Hilton [12].


Figure 6.2: The four source images for texturing and the final rendered model in a new pose.


Chapter 7

Animation

The animation itself could be studied separately because of its complexity, and Blender offers many more animation tools than we used (for example non-linear animation and actions). We will focus only on issues which we applied in our work. As mentioned in previous chapters, we use a skeleton for animation. Besides that, it is possible to animate vertices separately and directly by changing their positions. The textures can also be animated, which can be useful, for example, for animating face gestures. We prefer to use motion capture data for a real representation of human movements. The proper interpretation of the data is not easy. We use the BVH format because Blender has an import script for BVH files. It must be noted that this script does not work correctly in our version of Blender with our BVH data, even though the data are correct: the script may omit some characters in joint names during import. It is always better to check the joint names after import.

The object which holds the bones in Blender is called an armature. An armature contains several bones used for animation. Bones have pose channels which define the changes with respect to the rest pose. A pose channel is an interpolated curve which allows smooth animation: the curve is defined at desired frames and the rest of the curve is interpolated. The channels can describe a change in location, size or rotation. The bone rotation is expressed in quaternions; for a bone's pose there is a channel for each quaternion element. The channel interpolated curves can be viewed and edited in Blender's IPO curve editor window (where IPO stands for interpolated curve). The number of frames and the frame rate can be set in Blender's Buttons window.

7.1 Motion capture data

For motion capture data we use the BVH format. This has several advantages: the format is widely used by animation software and can be imported into Blender, and BVH files can be obtained from the Internet without the need to capture our own data.


The BVH file format was originally developed by Biovision, a motion capture services company; the name BVH stands for Biovision hierarchical data. Its disadvantage is the lack of a full definition of the rest pose (the format has only translational offsets of child segments from their parent, no rotational offset is defined). The BVH format consists of two parts, the header section with the joint definitions and the captured data section. See the example:

HIERARCHY
ROOT Hips
{
  OFFSET 0.00 0.00 0.00
  CHANNELS 6 Xposition Yposition Zposition Zrotation Xrotation Yrotation
  JOINT Chest
  {
    OFFSET 0.00 5.21 0.00
    CHANNELS 3 Zrotation Xrotation Yrotation
    JOINT Neck
    {
      OFFSET 0.00 18.65 0.00
      CHANNELS 3 Zrotation Xrotation Yrotation
      JOINT Head
      {
        OFFSET 0.00 5.45 0.00
        CHANNELS 3 Zrotation Xrotation Yrotation
        End Site
        {
          OFFSET 0.00 3.87 0.00
        }
      }
    }
  }
}
MOTION
Frames: 2
Frame Time: 0.033333
8.03 35.01 88.36 ...
7.81 35.10 86.47 ...

The header section begins with the keyword HIERARCHY. The following line starts with the keyword ROOT followed by the name of the root segment. The hierarchy is defined by curly braces. The offset is specified by the keyword OFFSET followed by the X, Y and Z offset of the segment from its parent. Note that the order of the rotation channels appears a bit odd: it goes Z rotation, followed by the X rotation and finally the Y rotation. The BVH format uses this rotation data order. The world space is defined as a right-handed coordinate system with the Y axis as the world up vector. Thus the BVH skeletal segments are obviously aligned along the Y axis (this is the same as in our skeleton model).

The motion section begins with the keyword MOTION, followed by a line indicating the number of frames (the Frames: keyword) and the frame time (the Frame Time: keyword). The rest of the file contains the motion data. Each line contains one frame, and the tabulated values correspond to the channels defined in the header section.
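A minimal reader for the MOTION section (frame count, frame time, and one list of channel values per line) can be sketched as follows; joint-name handling and the channel-to-bone mapping are left out:

def read_bvh_motion(path):
    # return (frame_time, frames) from a BVH file; each frame is a list of floats
    with open(path) as f:
        lines = [line.strip() for line in f if line.strip()]
    start = lines.index('MOTION')
    n_frames = int(lines[start + 1].split(':')[1])        # "Frames: 2"
    frame_time = float(lines[start + 2].split(':')[1])    # "Frame Time: 0.033333"
    frames = [[float(v) for v in line.split()]
              for line in lines[start + 3:start + 3 + n_frames]]
    return frame_time, frames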

7.2 Usage of motion captured data

We import the BVH data using Blender's import script for BVH files. It creates empty objects in Blender's scene; the empty objects copy the hierarchy of the BVH file (but, as noted above, some characters of the joint names may be missing after import). The animation data are imported and animation channels for the objects are created. There is no standard for the joint names or the hierarchy in the BVH format, but the data that we use have the same hierarchy and joint names. Our data correspond to our skeleton model, but the joint names differ, so we use a dictionary of correspondences between our bones and the captured data joints. Because we set only rotations for the bone poses (this makes sense for a human skeleton), it is not a problem if the imported data are in a different scale. However, the different scale between the data and our skeleton model causes a problem when we move the whole skeleton object. Therefore we compute a scale factor from the known height of our skeleton and the expected height of the captured subject: we measure the expected height from the ankle and head end-site joints, and the change in location is then scaled by this factor. After import, we go through the dictionary of bones and joints and configure our bones to be parallel with the corresponding joint links in all frames. We set the bones so that they have the same direction as the axis connecting the corresponding parent joint with the child joint. For the joints whose rest pose in the motion capture hierarchy differs from our rest pose (in our case the ankles), we set only the difference rotation from the rest pose. For some bones of our skeleton we do not have any corresponding joints, and we leave them unaffected. The computation of the needed rotation of the bones is described in Chapter 9.
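The idea of making a bone parallel with the corresponding joint link can be expressed as a rotation about the axis perpendicular to both directions. The sketch below computes such an axis-angle rotation as a quaternion; it only illustrates the idea, the exact formulation used by the package is the subject of Chapter 9.

from math import acos, sin, cos

def normalize(v):
    n = (v[0] ** 2 + v[1] ** 2 + v[2] ** 2) ** 0.5
    return (v[0] / n, v[1] / n, v[2] / n)

def align_quaternion(bone_dir, joint_dir):
    # quaternion (w, x, y, z) rotating the rest-pose bone direction onto the captured joint direction
    a, b = normalize(bone_dir), normalize(joint_dir)
    dot = max(-1.0, min(1.0, a[0] * b[0] + a[1] * b[1] + a[2] * b[2]))
    axis = (a[1] * b[2] - a[2] * b[1],    # rotation axis = a x b
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])
    norm = (axis[0] ** 2 + axis[1] ** 2 + axis[2] ** 2) ** 0.5
    if norm < 1e-8:
        # directions nearly parallel, no rotation needed
        # (the anti-parallel case would need a separate 180-degree rotation)
        return (1.0, 0.0, 0.0, 0.0)
    angle = acos(dot)
    s = sin(angle / 2.0) / norm
    return (cos(angle / 2.0), axis[0] * s, axis[1] * s, axis[2] * s)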

We use the following dictionary (listed as Python code); the # in the code denotes a comment. A simplified retargeting sketch follows the listing.

skelDict={
    "sacroiliac":                ["Hips", "Chest"],
    "vl5":                       ["Chest", "Neck"],
    "vl5_to_l_sternoclavicular": ["Neck", "LeftCollar"],
    "l_sternoclavicular":        ["LeftCollar", "LeftShoulder"],
    "l_shoulder":                ["LeftShoulder", "LeftElbow"],
    "l_elbow":                   ["LeftElbow", "LeftWrist"],
    "l_wrist":                   ["LeftWrist", "LeftWrist_end"],
    "vl5_to_r_sternoclavicular": ["Neck", "RightCollar"],
    "r_sternoclavicular":        ["RightCollar", "RightShoulder"],
    "r_shoulder":                ["RightShoulder", "RightElbow"],
    "r_elbow":                   ["RightElbow", "RightWrist"],
    "r_wrist":                   ["RightWrist", "RightWrist_end"],
    "HumanoidRoot_to_l_hip":     ["Hips", "LeftHip"],
    "l_hip":                     ["LeftHip", "LeftKnee"],
    "l_knee":                    ["LeftKnee", "LeftAnkle"],
    "l_ankle":                   ["LeftAnkle", "LeftAnkle_end"],
    "HumanoidRoot_to_r_hip":     ["Hips", "RightHip"],
    "r_hip":                     ["RightHip", "RightKnee"],
    "r_knee":                    ["RightKnee", "RightAnkle"],
    "r_ankle":                   ["RightAnkle", "RightAnkle_end"],
    "vt3":                       ["Head", "Head_end"]}

# the bones that will change relatively
onlyDif=[#"sacroiliac","vl5","vt3",
    "l_ankle","r_ankle",
    #"l_knee","r_knee",
    #"l_hip","r_hip",
    "HumanoidRoot_to_l_hip","HumanoidRoot_to_r_hip"]

Our interpretation of the motion data may be imprecise. We do not know where the measured joints are exactly located on the human body, so we can make only an approximate reconstruction of the movement. Despite this, we are able to create a realistic animation of human movement. Errors with limb self-occlusion may occur during the movement; this is caused by rest pose joint locations in the data differing from those in our model. However, we are able to create a short animation quickly from captured data without manual positioning. You can see the results in Figure 7.1.


Figure 7.1: Images from an animation sequence generated with BVH motion captured data. The model is without a texture.


Chapter 8

Description of the software package

We focused on using open source software during the development. We also used Matlab to speed up the process: Matlab was used instead of a more difficult Python implementation of some math functions, such as matrix decompositions, which would require additional Python packages. The whole package structure is shown in Figure 8.1. In our work we used free third-party data for the mesh and the motion. We designed our own formats for storing our own data, such as camera configurations. We use Blender scene files as templates for Blender. Motion data are stored in the BVH format as mentioned before, and the mesh is exported from the MakeHuman application into the Maya .obj simple text format.

Figure 8.1: The structure of the software package.


8.1 Mesh data format

The data exported from MakeHuman into a *.obj file are simple and easily readable. You can see an example below (shortened printout):

# MakeHuman OBJ File
# http://www.dedalo-3d.com/
o human_mesh
v -8.162358 0.737658 4.963252
v -8.179651 0.704912 4.979013
v -8.178391 0.704847 4.960827
v -8.180817 0.704855 4.918237
v -8.162949 0.671900 4.963268
v -8.161950 0.737917 4.945174
...
f 5740// 5749// 5761// 5758//
f 5199// 5132// 5749// 5740//
f 5206// 5748// 5749// 5132//
f 5761// 5749// 5748// 5751//
...

It starts with the object's name human_mesh and continues with vertex definitions (lines starting with the v character). The last chapter contains face definitions (lines starting with the f character). The faces are defined by indices of vertices. Faces can be triangles or quads (three- or four-vertex polygons).
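A minimal parsing sketch for this format follows; it assumes only the v and f records shown in the printout above and ignores any other OBJ keywords.

    # minimal sketch: read vertices and faces from the MakeHuman *.obj export
    def read_obj(path):
        vertices, faces = [], []
        with open(path) as f:
            for line in f:
                parts = line.split()
                if not parts or parts[0].startswith("#"):
                    continue
                if parts[0] == "v":
                    vertices.append(tuple(float(x) for x in parts[1:4]))
                elif parts[0] == "f":
                    # face indices are 1-based; "5740//" carries no texture/normal index
                    faces.append(tuple(int(p.split("/")[0]) - 1 for p in parts[1:]))
        return vertices, faces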

8.2 Animation XML file

We used an XML file in our previous work for the animation definition. As Blender changed, some features are no longer available and this file format is deprecated. Instead, BVH files with motion data must be used in the case of bone-driven animation. We show it here only for completeness; this approach can still be used, even though it is not expected. The file contains the animation description and is parsed by Python scripts, see Figure 8.2 for an example. The XML tags must correspond to Blender's data structure. The root tag must be animation. The recognized tags are:

startframe The first frame to be rendered. Parameters:

n the frame number

endframe The last frame to be rendered. Parameters:

n the frame number

frame This tag tells which frame is going to be configured. All these settings will be done in this frame. Blender automatically linearly interpolates object parameters between frames. Tag parameters:


n the frame number

object Which object will be set up. The size of an object depends on its default size. Setting sx="2.0" will produce an object two times bigger along the x axis than the default, so an object that is 2 units wide along the x axis by default will be 4 units wide. Depending on the type of object the following parameters can be passed, but the name parameter is mandatory.

name object's name in Blender's data structure

px object's position along the x axis

py object's position along the y axis

pz object's position along the z axis

rx object's rotation around the x axis in degrees

ry object's rotation around the y axis in degrees

rz object's rotation around the z axis in degrees

sx object's size along the x axis

sy object's size along the y axis

sz object's size along the z axis

8.3 Camera configuration files

The camera configuration files (with the .cam extension) are used for storing the configuration of cameras in Blender. Note that a camera in Blender captures along its negative z axis. A camera configuration file is a plain text file with one parameter per line (a parsing sketch is given after the list of output formats below):

C=3.0,0.0,0.0 is the camera centre in x,y,z axis

f=1 is focal plane distance

R=80.0,0.0,80.0 is camera rotation around its x,y,z axis

k=4:2 is the aspect ratio u:v

size=600x800 is the output image resolution in pixels

format=png is the output image format. Possible formats are:

aviraw Uncompressed AVI files. AVI is a commonly used format onWindows platforms

avijpeg AVI movie w/ Jpeg images

avicodec AVI using win32 codec

quicktime Quicktime movie (if enabled)


<?xml version="1.0"?>
<animation>
We start with first frame.
<startframe n="1"/>
And the 30th frame will be last.
<endframe n="30"/>
Now we define the first frame.
<frame n="1">
The order of setting in frame tag doesn't matter.
Here we say, that we want object mySkeleton
to be a 1.8 unit height.
The mySkeleton object is the name of armature object
in Blender data.
<object name="mySkeleton" sx="1.8" sy="1.8" sz="1.8" />
Here we set the position at (0,0,0)
<object name="mySkeleton" px="0.0" py="0.0" pz="0.0">
</object>
</frame>
<frame n="15">
<object name="mySkeleton" rz="0.0">
</object>
</frame>
<frame n="30">
The final position of object should be at (0,-1,0)
and rotated around z axis by 45 degrees.
<object name="mySkeleton" rz="45" py="-1.0">
</object>
</frame>
</animation>

Figure 8.2: Example XML animation file. (Deprecated)

tga Targa files

rawtga Raw Targa files

png Png files

bmp Bitmap files

jpg Jpeg files

hamx Hamx files

iris Iris files

iriz Iris + z-buffer files

ftype Ftype file
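As promised above, the following is a sketch of reading a *.cam file into a Python dictionary, assuming exactly the key=value lines listed in Section 8.3; it is not the loader used by our scripts.

    # sketch: parse a *.cam camera configuration file
    def read_cam(path):
        cfg = {}
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line or "=" not in line:
                    continue
                key, value = line.split("=", 1)
                if key in ("C", "R"):
                    cfg[key] = [float(x) for x in value.split(",")]   # centre / rotation
                elif key == "f":
                    cfg[key] = float(value)                           # focal plane distance
                elif key == "k":
                    cfg[key] = [float(x) for x in value.split(":")]   # aspect ratio u:v
                elif key == "size":
                    cfg[key] = [int(x) for x in value.split("x")]     # image resolution
                else:
                    cfg[key] = value                                  # e.g. format=png
        return cfg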

8.4 Exported animation

We also export the vertex locations and the bone poses for each frame for testing purposes. We use a simple text file format. The files with vertex coordinates have the extension .verts and contain coordinates in x, y, z order, with each vertex on a separate line:

-0.2762540280819, 1.4156879186630, -0.5783573985100;


-0.2774430513382, 1.4150956869125, -0.5788146853447;
-0.2793728411198, 1.4186201095581, -0.5797730088234;
-0.2778972089291, 1.4197340011597, -0.5795810222626;
-0.2761918604374, 1.4200913906097, -0.5792297720909;
...

These data can be used for validation of human body detection. For the bones, we export the head and tail locations in the same x, y, z order:

l_knee.head=[0.0485088936985, 1.4538201093674, -0.0491226166487]
l_knee.tail=[0.0510297790170, 1.0506756305695, -0.0797354504466]
l_elbow.head=[0.1631074249744, 2.1514077186584, -0.1573766618967]
l_elbow.tail=[0.1602035462856, 1.9171544313431, -0.0712721124291]
vl5.head=[-0.0243827812374, 2.0036427974701, -0.2223045825958]
vl5.tail=[-0.0497045181692, 2.4030768871307, -0.2594771981239]
r_ankle.head=[-0.1345274895430, 1.1150679588318, 0.0820896327496]
r_ankle.tail=[-0.1498874127865, 1.0803805589676, 0.3276234567165]
l_wrist.head=[0.1602035462856, 1.9171544313431, -0.0712721124291]
l_wrist.tail=[0.1632092595100, 1.7440707683563, -0.0174388140440]
...

The bone pose data can be used for testing a detected pose. The locations are absolute (in world space) for both bones and vertices. Data are exported for each frame to a separate file.
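Reading these files back, for example to compare against a detector output, can be sketched as follows (assuming exactly the two layouts printed above):

    # sketch: read the exported per-frame vertex and bone pose files
    def read_verts(path):
        with open(path) as f:
            return [tuple(float(x) for x in line.strip().rstrip(";").split(","))
                    for line in f if line.strip()]

    def read_bones(path):
        bones = {}
        with open(path) as f:
            for line in f:
                if "=" not in line:
                    continue
                name, coords = line.strip().split("=", 1)
                bones[name] = [float(x) for x in coords.strip("[]").split(",")]
        return bones   # keys like "l_knee.head", "l_knee.tail"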

8.5 Run parameters

The syntax to run Blender with our Python script is:

% Windows
SET bpypath=drive:\path_to_python_scripts
blender_dir\blender.exe template.blend -P %bpypath%\rahat\run.py

% Linux
bpypath=scripts
blender_dir/blender template.blend -P $bpypath/rahat/run.py

This starts Blender in interactive mode with the template.blend file opened and runs the main script with a simple GUI. A few unexpected events may occur, like the script window being hidden or the default window arrangement being changed. This is caused by different settings in the template.blend file. It is easy to switch to the script window by icons. When you run another Blender script while the main script is running, you may need to switch back to the main script by choosing the active script for the script window. This can be done by the scroll button on the script window panel. We use Blender's GUI because some changes must be done manually and the whole package is still under testing. The functions from the GUI can easily be rewritten into another script to automate the whole process of creating animations.


Chapter 9

Used mathematics and transformations

For row vectors we use a bold math font (v) and for matrices a true type math font (M). The functions of the Blender Python API (PoseBone.poseMatrix) are written with a true type font. Definitions:

• O(bone) is the bone's 4 × 4 pose matrix. It is a transformation from the bone space to the armature space. It is obtained by accessing PoseBone.poseMatrix.

• M(object) is the object's 4 × 4 transformation matrix to the world space. It is obtained by calling the Python function Object.getMatrix().

• B(bone) is the bone's 3 × 3 transformation matrix. It describes the orientation of the bone in the rest pose. It is obtained by accessing the attribute Bone.matrix['BONESPACE'].

• A(bone) is the bone's 4 × 4 transformation matrix to the armature space in the rest pose. It is obtained by accessing the attribute Bone.matrix['ARMATURESPACE'].

• P is the 3 × 4 projection matrix.

• P̃ is the projection matrix extended by a row to 4 × 4 shape (for row vectors).

9.1 Blender’s math

Blender is based on OpenGL and thus also accepts OpenGL's data format and transformations. The matrices are stored in the OpenGL column-major order, which differs from the standard C programming language interpretation of arrays. As a consequence the matrices are transposed and the multiplication order is changed; the vectors are transposed as well (column vectors become row vectors). The matrix parameter array $[a_1, a_2, \ldots]$


represents the following matrix:
$$ \mathtt{M} = \begin{bmatrix} a_1 & a_5 & a_9 & a_{13} \\ a_2 & a_6 & a_{10} & a_{14} \\ a_3 & a_7 & a_{11} & a_{15} \\ a_4 & a_8 & a_{12} & a_{16} \end{bmatrix} . \quad (9.1) $$

This means that a translation matrix in Blender has the following shape:
$$ \mathtt{T} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ t_x & t_y & t_z & 1 \end{bmatrix} . \quad (9.2) $$

The following order must be used to transform a vector by a Blender matrix:
$$ \mathbf{v}'^\top = \mathbf{v}^\top \cdot \mathtt{M} . \quad (9.3) $$

This could be transposed, but Blender's Python API uses this matrix array representation and vectors in Blender are row vectors by default (Blender math functions work improperly with column vectors). Therefore we follow this convention in the text whenever Blender-related transformations appear. Elsewhere we use the common notation of computer vision books like [5].

The coordinates in Blender can be relative to parent objects or absolute. The absolute coordinates are in the world space. The local coordinates are relative to the object's origin or to its parent; this space is called the local space. The space type must be specified, for example, when the Blender Python API functions Object.getLocation(space) or Object.getEuler(space) are used.

Besides that, Blender also recognizes the armature space, the bone space and the pose space. The coordinates in the armature space are relative to the armature object's origin. The coordinates in the bone space are relative to the bone heads. Blender denotes the bone joints as the bone head and the bone tail. The bone space is defined by the bone configuration. The armature space is used to define the rest bone positions. The pose space is used for vertex transformations from the armature space to the pose space. While the armature space vertex locations define the rest pose, the pose space vertex locations define a new pose.

9.2 Blender Euler rotations

Blender uses Euler rotations to express an object rotation in the world space. The Blender Euler angles are in degrees (this corresponds to the Blender Python Euler object), but the rotation angles used in object parameters are in radians (this corresponds to the Blender Python Object.Rot* properties). We describe here how to transform Blender Euler angles to a rotation matrix. The Euler angles suffer from drawbacks such as gimbal lock. The computation of the rotation matrix is important mainly for computing the camera projection matrix. Assume that we have already converted the angles to radians. The α, β, γ angles are rotations around the x, y, z axes. The final rotation matrix can be composed as:

$$ \mathtt{R}(\alpha, \beta, \gamma) = \mathtt{R}_z(\gamma) \cdot \mathtt{R}_y(\beta) \cdot \mathtt{R}_x(\alpha) . \quad (9.4) $$

The rotation matrices around the individual axes are simple:
$$ \mathtt{R}_x = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{bmatrix} , \quad (9.5) $$
$$ \mathtt{R}_y = \begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix} , \quad (9.6) $$
$$ \mathtt{R}_z = \begin{bmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{bmatrix} . \quad (9.7) $$

The final matrix is:
$$ \mathtt{R}(\alpha,\beta,\gamma) = \begin{bmatrix} \cos\gamma\cos\beta & -\sin\gamma\cos\alpha + \cos\gamma\sin\beta\sin\alpha & \sin\gamma\sin\alpha + \cos\gamma\sin\beta\cos\alpha \\ \sin\gamma\cos\beta & \cos\gamma\cos\alpha + \sin\gamma\sin\beta\sin\alpha & -\cos\gamma\sin\alpha + \sin\gamma\sin\beta\cos\alpha \\ -\sin\beta & \cos\beta\sin\alpha & \cos\beta\cos\alpha \end{bmatrix} . \quad (9.8) $$

The backward decomposition is more complicated and is not unique. Let R be a general rotation matrix with the following parameters:
$$ \mathtt{R}(\alpha,\beta,\gamma) = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} . \quad (9.9) $$

We can write, using the result from (9.8), that:
$$ \sin\beta = -g, \qquad \cos\beta = -\sqrt{1 - g^2} . \quad (9.10) $$
The sign of cos β could also be chosen positive, but we search for only one solution. When we express sin α, cos α we get:
$$ \sin\alpha = -\frac{h}{\sqrt{1 - g^2}}, \qquad \cos\alpha = -\frac{i}{\sqrt{1 - g^2}} . \quad (9.11) $$


Substituting sin α, cos α, sin β, cos β back into the rotation matrix gives the identity:
$$ \begin{bmatrix} -\cos\gamma\sqrt{1-g^2} & \ldots & \ldots \\ -\sin\gamma\sqrt{1-g^2} & \ldots & \ldots \\ g & h & i \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} . \quad (9.12) $$
So sin γ, cos γ can be expressed as:
$$ \sin\gamma = -\frac{d}{\sqrt{1 - g^2}}, \qquad \cos\gamma = -\frac{a}{\sqrt{1 - g^2}} . \quad (9.13) $$

This is correct if g ≠ ±1, otherwise the rotation matrix is degenerate:
$$ \mathtt{R}(\alpha,\beta,\gamma) = \begin{bmatrix} 0 & x & y \\ 0 & y & \pm x \\ \pm 1 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & -\sin\gamma\cos\alpha \mp \cos\gamma\sin\alpha & \sin\gamma\sin\alpha \mp \cos\gamma\cos\alpha \\ 0 & \cos\gamma\cos\alpha \mp \sin\gamma\sin\alpha & -\cos\gamma\sin\alpha \mp \sin\gamma\cos\alpha \\ \pm 1 & 0 & 0 \end{bmatrix} \quad (9.14) $$
and we get 2 equations for 4 variables. We can choose, for example, α such that:
$$ \sin\alpha = 0, \qquad \cos\alpha = 1 \quad (9.15) $$
and express γ as:
$$ \sin\gamma = -b, \qquad \cos\gamma = e . \quad (9.16) $$

The lines above describe how we compute and decompose the rotation matrix composed of Blender Euler angles. This composition corresponds to the Blender source code of the rotation matrix. Note that Blender uses transposed matrices, so a matrix obtained from Blender Python functions is transposed with respect to the matrix (9.8).
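The composition (9.4)–(9.8) and the decomposition (9.10)–(9.16) can be sketched in Python as follows; numpy is assumed here (it is not part of our package), and the degenerate branch picks the same particular solution as chosen above.

    import numpy as np

    def euler_to_matrix(alpha, beta, gamma):
        # R = Rz(gamma) * Ry(beta) * Rx(alpha), equations (9.4)-(9.8)
        ca, sa = np.cos(alpha), np.sin(alpha)
        cb, sb = np.cos(beta), np.sin(beta)
        cg, sg = np.cos(gamma), np.sin(gamma)
        Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
        Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
        Rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
        return Rz @ Ry @ Rx

    def matrix_to_euler(R):
        # one solution of the decomposition, following (9.10)-(9.16)
        a, b, e = R[0, 0], R[0, 1], R[1, 1]
        d, g, h, i = R[1, 0], R[2, 0], R[2, 1], R[2, 2]
        s = np.sqrt(max(1.0 - g * g, 0.0))
        if s > 1e-8:
            beta = np.arctan2(-g, -s)            # sin(beta) = -g, cos(beta) = -s
            alpha = np.arctan2(-h, -i)           # (9.11), common factor 1/s cancels
            gamma = np.arctan2(-d, -a)           # (9.13)
        else:                                    # degenerate case (9.14)-(9.16)
            beta = np.arctan2(-g, 0.0)
            alpha = 0.0                          # choose sin(alpha)=0, cos(alpha)=1
            gamma = np.arctan2(-b, e)
        return alpha, beta, gamma

The recovered angles are one of the possible solutions; composing them again with euler_to_matrix reproduces the original matrix.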

9.3 Blender camera model

The camera in Blender can be orthographic or perspective. We need only the perspective camera for our purposes because it is the closest approximation to a real pinhole camera. Projections in Blender are based on OpenGL, so all the transformations are similar to OpenGL viewing transformations. To be able to describe the camera and to compute the projection matrix, we must understand the differences between the OpenGL and the computer vision camera models. More about camera projections can be found in the book [5] by Hartley and Zisserman.


The basic camera parameters in Blender are location and rotation. All these parameters are related to the world space (as for all objects in a Blender scene). These parameters do not transform world coordinates to camera coordinates but the reverse; the inverse transformation must be used to transform world coordinates to the camera space. The biggest difference between Blender and computer vision is that the Blender camera views along its negative z axis (as is common for OpenGL projections), see Figure 9.1, where C is the camera center. Cameras in Blender also have a specified clipping range for the closest and the farthest visible objects. This does not influence the camera model, except that objects outside this clipping range are not rendered.

Figure 9.1: A common OpenGL camera model used for the perspective camera in Blender.

To avoid possible confusion, we define the used notation and conventions. First, we expect a pinhole camera model as shown in Figure 9.2. Note that the camera axes differ from the OpenGL model shown in Figure 9.1.

The objects captured by this camera are projected into the image plane (x, y axes), see Figure 9.3. The image is rasterized to pixels (u, v axes). Here we use the coordinates as for indexing the matrices of stored images: the row is the first and the column is the second index. The origin is at the top left corner. The origin of the image plane is shifted by the offset u0, v0 (the principal point) from the origin of the image. The camera optical axis passes through the center of the image plane.

The f parameter in Figure 9.2 is the camera focal length; this corresponds to the lens parameter of the Blender camera in Figure 3.4. We can write the transformation between the parameters as:
$$ f = \frac{\mathrm{lens}}{16.0} . \quad (9.17) $$


Figure 9.2: A pinhole camera model.

Figure 9.3: Image plane (x, y axes) and the indexing of the image pixels (u, v).

We need to estimate the projection matrix which is often used in computer vision. The projection matrix describes the projection from the world space to the image pixels. The first step is transforming the world coordinates into the camera space. Because we know the camera location C and the camera rotation in the world space, we can transform any point into the camera space. We know the Blender camera object's Euler angles α, β, γ, so we can compute the rotation matrix. The rotation to the camera space is the inverse. There is also another rotation of the axes between the Blender camera model and the pinhole model, see Figure 9.1 and Figure 9.2; this is caused by the different axes orientation. This rotation can be written as:

$$ \mathtt{T} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & -1 \end{bmatrix} . \quad (9.18) $$

The rotation from the world space to the camera space is:
$$ \mathtt{R} = \mathtt{T} \cdot \mathtt{R}(\alpha, \beta, \gamma)^\top . \quad (9.19) $$


C is the vector of the camera location in the world space. The whole transformation of any homogeneous vector $\mathbf{x}_w = [x_w, y_w, z_w, 1]^\top$ to the camera space (that is, $\mathbf{x}_c = \mathtt{R}(\tilde{\mathbf{x}}_w - \mathbf{C})$) can be written as:
$$ \mathbf{x}_c = [\mathtt{R} \mid -\mathtt{R}\mathbf{C}]\, \mathbf{x}_w , \quad (9.20) $$
where $\mathbf{x}_c = [x_c, y_c, z_c]^\top$. This vector is projected onto the image plane as:
$$ \begin{bmatrix} \lambda x \\ \lambda y \\ \lambda \end{bmatrix} = \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix} \mathbf{x}_c , \quad (9.21) $$

where f is the focal length. The transformation from normalized coordinates to pixels is the following:
$$ \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} -m_u & 0 & u_0 \\ 0 & m_v & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \quad (9.22) $$

and so we can write the Blender camera projection matrix as:
$$ \mathtt{P} = \mathtt{K} [\mathtt{R} \mid -\mathtt{R}\mathbf{C}] , \quad (9.23) $$
where K is known as the calibration matrix and can be written as:
$$ \mathtt{K} = \begin{bmatrix} -m_u \cdot f & 0 & u_0 \\ 0 & m_v \cdot f & v_0 \\ 0 & 0 & 1 \end{bmatrix} . \quad (9.24) $$

The parameters of the K matrix are computed as follows:
$$ u_0 = \frac{\mathrm{height}}{2}, \qquad v_0 = \frac{\mathrm{width}}{2} . \quad (9.25) $$

Blender chooses the axis with maximum resolution and the other axis is scaled if the aspect ratio is not 1 : 1. The coefficients $m_u, m_v$ can be estimated as:
$$ m = \max(\mathrm{width} \cdot k_v,\ \mathrm{height} \cdot k_u), \qquad m_u = \frac{m}{2 k_u}, \qquad m_v = \frac{m}{2 k_v} . \quad (9.26) $$

We use this code in our function:


    if width * kv > height * ku:
        mv = width / 2
        mu = mv * kv / ku
    else:
        mu = height / 2
        mv = mu * ku / kv

where ku : kv is the aspect ratio height : width defined in the Blender render parameters.

The projection matrices are then exported into simple text files and can be used by computer vision algorithms.
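Putting the pieces of this section together, a sketch of assembling the Blender camera projection matrix from the .cam parameters could look as follows; numpy is assumed, euler_to_matrix is the composition sketched in Section 9.2, and the camera-space convention used is $\mathbf{x}_c = \mathtt{R}(\tilde{\mathbf{x}}_w - \mathbf{C})$ as in (9.20).

    import numpy as np

    def blender_projection_matrix(C, euler_deg, lens, width, height, ku, kv):
        f = lens / 16.0                                     # equation (9.17)
        alpha, beta, gamma = np.radians(euler_deg)
        T = np.array([[0, 1, 0], [1, 0, 0], [0, 0, -1]])    # axis change (9.18)
        R = T @ euler_to_matrix(alpha, beta, gamma).T       # world -> camera (9.19)
        # pixel scaling (9.26): the axis with maximum resolution wins
        if width * kv > height * ku:
            mv = width / 2.0
            mu = mv * kv / ku
        else:
            mu = height / 2.0
            mv = mu * ku / kv
        K = np.array([[-mu * f, 0.0, height / 2.0],
                      [0.0, mv * f, width / 2.0],
                      [0.0, 0.0, 1.0]])                     # calibration matrix (9.24)-(9.25)
        C = np.asarray(C, dtype=float)
        return K @ np.hstack([R, -(R @ C).reshape(3, 1)])   # P = K [R | -R C], (9.23)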

9.4 Using projection matrices with OpenGL

In the previous section we derived the projection matrix of the Blender perspective camera. We also need to use the projection matrix with OpenGL for rendering into images captured by real cameras. Instead of using the OpenGL utility functions, we must set our own OpenGL view transformations in order to get the real camera projection transformations. Before describing the transformations, we give a quick overview of OpenGL.

OpenGL works almost always with homogeneous vector coordinates and with 4 × 4 matrices. We use the OpenGL terminology here. Object coordinates of any point are transformed by:

1. the MODELVIEW matrix to eye coordinates, which in the computer vision analogy we can understand as coordinates in the camera space. This corresponds to the Blender transformations from the object's local space to the world space and then to the camera local space.

2. the PROJECTION matrix, which transforms eye coordinates to clip coordinates. These coordinates are clipped to the viewing frustum.

3. The clip coordinates are transformed (divided) by the perspective division to normalized device coordinates. These coordinates are clamped to the interval ⟨−1, 1⟩.

4. Finally, the VIEWPORT transformation is applied to transform the normalized device coordinates to window (pixel) coordinates.

All the matrices can be set explicitly, but it is recommended to use the OpenGL utility functions. We will write $\mathtt{M}_v$ for the MODELVIEW matrix and $\mathtt{P}_j$ for the PROJECTION matrix. The coordinates $[X_o, Y_o, Z_o]$ of any point (the object coordinates) are transformed to eye coordinates as:
$$ \begin{bmatrix} X_e \\ Y_e \\ Z_e \\ W_e \end{bmatrix} = \mathtt{M}_v \begin{bmatrix} X_o \\ Y_o \\ Z_o \\ 1 \end{bmatrix} , \quad (9.27) $$

then to clip coordinates:
$$ \begin{bmatrix} X_c \\ Y_c \\ Z_c \\ W_c \end{bmatrix} = \mathtt{P}_j \begin{bmatrix} X_e \\ Y_e \\ Z_e \\ W_e \end{bmatrix} . \quad (9.28) $$

Finally, the normalized device coordinates are:
$$ \begin{bmatrix} X_n \\ Y_n \\ Z_n \end{bmatrix} = \begin{bmatrix} X_c / W_c \\ Y_c / W_c \\ Z_c / W_c \end{bmatrix} . \quad (9.29) $$

Note that these coordinates are in the interval ⟨−1, 1⟩. We set the default VIEWPORT transformation by the OpenGL function

    glViewPort(0,0,width,height)

We then get the window coordinates as follows:
$$ \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} \mathrm{width}/2 \cdot X_n + \mathrm{width}/2 \\ \mathrm{height}/2 \cdot Y_n + \mathrm{height}/2 \\ Z_n/2 + 1/2 \end{bmatrix} . \quad (9.30) $$

The OpenGL window (image) coordinates x, y have the origin in the bottom left corner (this differs from our indexing of the image as shown in Figure 9.3). During rasterization all three coordinates x, y, z are used. The last coordinate z is stored in the z-buffer for the visibility test. Figure 9.4 shows how the normalized device coordinates are displayed in the final window (we can understand the OpenGL window as the image, because we usually render into images). Figure 9.5 shows the image coordinates used by our pinhole camera model (u, v) and by OpenGL (x, y). We want to set the OpenGL transformations so that the projected coordinates x, y in the window image correspond to the image pixel coordinates u, v. We want to define the same projection as the real camera has, so we use the real camera's projection matrix. A point with world coordinates [Xw, Yw, Zw] must have the same location in the OpenGL window (image) as in the captured image (see Figure 9.5). The equation is:
$$ \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} v \\ \mathrm{height} - u \end{bmatrix} . \quad (9.31) $$


Figure 9.4: The normalized device coordinates of OpenGL and their rasterization in the window (image).

To be able to use a projection matrix P of size 3 × 4 with OpenGL, we can extend it by the row vector [0, 0, 0, 1] to size 4 × 4:
$$ \tilde{\mathtt{P}} = \begin{bmatrix} \mathtt{P} \\ 0\ \ 0\ \ 0\ \ 1 \end{bmatrix} , \quad (9.32) $$

to get the transformation:
$$ \begin{bmatrix} u \\ v \\ w \\ 1 \end{bmatrix} = \tilde{\mathtt{P}} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} . \quad (9.33) $$

We need to find a matrix T which multiplies the P̃ matrix in order to get the eye coordinates. The eye coordinates must finally project to the same location in the image as if the point were captured by the real camera:
$$ \begin{bmatrix} X_e \\ Y_e \\ Z_e \\ 1 \end{bmatrix} = \mathtt{T} \begin{bmatrix} u \\ v \\ w \\ 1 \end{bmatrix} = \begin{bmatrix} a_x & b_x & c_x & d_x \\ a_y & b_y & c_y & d_y \\ a_z & b_z & c_z & d_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} u \\ v \\ w \\ 1 \end{bmatrix} , \quad (9.34) $$


Figure 9.5: Pixel coordinates of the pinhole camera model are in the left image, pixel coordinates of the OpenGL window (image) are in the right.

which gives a set of equations:
$$ \begin{aligned} X_e &= a_x \cdot u + b_x \cdot v + c_x \cdot w + d_x \\ Y_e &= a_y \cdot u + b_y \cdot v + c_y \cdot w + d_y \\ Z_e &= a_z \cdot u + b_z \cdot v + c_z \cdot w + d_z . \end{aligned} \quad (9.35) $$

Before we continue, we define the PROJECTION and VIEWPORT transformations. For the VIEWPORT transformation, we will use the transformation defined in (9.30); it is a common way of setting the OpenGL viewport. We will set the PROJECTION transformation matrix Pj to this shape:

$$ \mathtt{P}_j = \begin{bmatrix} \frac{2}{\mathrm{width}} & 0 & 0 & 0 \\ 0 & \frac{2}{\mathrm{height}} & 0 & 0 \\ 0 & 0 & \frac{2}{f-n} & -\frac{f+n}{f-n} \\ 0 & 0 & 1 & 0 \end{bmatrix} , \quad (9.36) $$

where n and f are respectively the nearest and the farthest value of the viewing distance. Note that the eye coordinates $[X_e, Y_e, Z_e, 1]^\top$ will be mapped to the clip coordinates
$$ \left[ X_e \frac{2}{\mathrm{width}},\ Y_e \frac{2}{\mathrm{height}},\ \frac{2 Z_e - f - n}{f - n},\ Z_e \right]^\top . \quad (9.37) $$

After the perspective division we get the normalized coordinates:
$$ \left[ \frac{X_e}{Z_e} \frac{2}{\mathrm{width}},\ \frac{Y_e}{Z_e} \frac{2}{\mathrm{height}},\ \frac{2 Z_e - f - n}{Z_e (f - n)} \right]^\top = \left[ \frac{X_e}{Z_e} \frac{2}{\mathrm{width}},\ \frac{Y_e}{Z_e} \frac{2}{\mathrm{height}},\ \frac{2(Z_e - n)}{f - n} - 1 \right]^\top . \quad (9.38) $$


If Ze = n, then the window coordinate z is z = −1; if Ze = f, then z = 1. In the x, y window coordinates we get the eye coordinates divided by the distance Ze and shifted, so the origin is in the middle of the window (image). We get these equations:

$$ \begin{aligned} x &= \frac{X_e}{Z_e} + \frac{\mathrm{width}}{2} = \frac{v}{w} , \\ y &= \frac{Y_e}{Z_e} + \frac{\mathrm{height}}{2} = \mathrm{height} - \frac{u}{w} , \end{aligned} \quad (9.39) $$

and further we can form a set of equations using (9.35):
$$ \begin{aligned} \frac{a_x \cdot u + b_x \cdot v + c_x \cdot w + d_x}{a_z \cdot u + b_z \cdot v + c_z \cdot w + d_z} &= \frac{v}{w} - \frac{\mathrm{width}}{2} , \\ \frac{a_y \cdot u + b_y \cdot v + c_y \cdot w + d_y}{a_z \cdot u + b_z \cdot v + c_z \cdot w + d_z} &= \frac{\mathrm{height}}{2} - \frac{u}{w} . \end{aligned} \quad (9.40) $$

One solution is:
$$ \begin{aligned} a_x &= 0, & b_x &= 1, & c_x &= -\frac{\mathrm{width}}{2}, & d_x &= 0 , \\ a_y &= -1, & b_y &= 0, & c_y &= \frac{\mathrm{height}}{2}, & d_y &= 0 , \\ a_z &= 0, & b_z &= 0, & c_z &= 1, & d_z &= 0 , \end{aligned} \quad (9.41) $$

and we can write the T matrix as:
$$ \mathtt{T} = \begin{bmatrix} 0 & 1 & -\mathrm{width}/2 & 0 \\ -1 & 0 & \mathrm{height}/2 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} . \quad (9.42) $$

This applies if we use the transformations (9.36) and (9.30). The sequence of proper OpenGL commands for setting the described transformations looks as follows:

    glViewPort(0,0,width,height)
    glMatrixMode(GL_PROJECTION)
    glLoadIdentity()
    glMultMatrixd(Pj)
    glMatrixMode(GL_MODELVIEW)
    glLoadIdentity()
    glMultMatrixd(T)
    glMultMatrixd(P̃)

Note that matrices in OpenGL are stored in a different format than some programming languages use. We use the P̃ matrix multiplied by our T matrix (9.42) as the OpenGL MODELVIEW matrix, and we use our Pj matrix (9.36) to define the OpenGL PROJECTION matrix.
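As an illustration only, the same sequence issued through PyOpenGL (an assumption; our package does not depend on it) could look as follows; numpy matrices are row-major, so they are transposed before flattening to obtain the column-major order OpenGL expects.

    from OpenGL.GL import (glViewport, glMatrixMode, glLoadIdentity,
                           glMultMatrixd, GL_PROJECTION, GL_MODELVIEW)

    def setup_camera(P_ext, T, Pj, width, height):
        """P_ext is the extended 4x4 projection matrix, T and Pj as in (9.42), (9.36)."""
        glViewport(0, 0, width, height)
        glMatrixMode(GL_PROJECTION)
        glLoadIdentity()
        glMultMatrixd(Pj.T.flatten())      # column-major order
        glMatrixMode(GL_MODELVIEW)
        glLoadIdentity()
        glMultMatrixd(T.T.flatten())
        glMultMatrixd(P_ext.T.flatten())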


9.5 Bone transformations

Many developers working with Blender were disappointed by changes in the Blender 2.4 armature source code, and therefore the Blender developers published a schema to explain how the armature deformations work in Blender; you can see it in Figure 9.6. It describes the code and data structures, but it does not say anything about how this relates to the Blender Python API and how to compose the matrices. We describe how we compute the matrices shown in the schema in Figure 9.6. The vectors bone.head, bone.tail used in this chapter are row vectors, the locations [x, y, z] of the bone head and tail. We follow the Blender Python notation here, so the equations correspond to the Python code.

First we need to describe the computation of the quaternions. We can compute the normal vector n of two normalized vectors u, v as
$$ \mathbf{n} = \mathbf{u} \times \mathbf{v} \quad (9.43) $$
and the angle θ between the vectors as:
$$ \theta = \arccos(\mathbf{u} \cdot \mathbf{v}) . \quad (9.44) $$
Then the quaternion of the rotation from u to v is
$$ \mathbf{q} = \left( \cos\left(\tfrac{\theta}{2}\right),\ \mathbf{n} \sin\left(\tfrac{\theta}{2}\right) \right) . \quad (9.45) $$

We then use the Blender Python functions to convert the quaternions to rotation matrices. Here we will write
$$ \mathtt{R}_q(\mathrm{quaternion}) , \quad (9.46) $$
by which we mean a rotation matrix composed from a quaternion.

The bone matrix in the bone space is 3 × 3. It describes the orientation of the bone. It is defined by the locations of the bone's head and tail and by the roll parameter, which rotates the bone around its local y axis. We compute the bone matrix in the bone space as
$$ \mathtt{B}(\mathrm{bone}) = \mathtt{R}(0, \mathrm{bone.roll}, 0)^\top \cdot \mathtt{R}_q(\mathbf{q}) , \quad (9.47) $$

where R is the matrix from equation (9.8) and q is the quaternion composed as:
$$ \begin{aligned} \mathbf{v} &= \frac{\mathrm{bone.tail} - \mathrm{bone.head}}{\lVert \mathrm{bone.tail} - \mathrm{bone.head} \rVert} \\ \mathbf{n} &= [0, 1, 0]^\top \times \mathbf{v} \\ \theta &= \arccos(\mathbf{u} \cdot \mathbf{v}) \\ \mathbf{q} &= \left( \cos\left(\tfrac{\theta}{2}\right),\ \mathbf{n} \sin\left(\tfrac{\theta}{2}\right) \right) \end{aligned} \quad (9.48) $$


We write R(0, bone.roll, 0)⊤ because the Blender Python API works with transposed matrices and we use the Blender functions for the matrix operations.

In order to get the bone matrix in the armature space, we must extend the bone local space matrix to 4 × 4 and include the parent bone transformations. If the bone has no parent bone, we can express the bone matrix in the armature space as:
$$ \mathtt{A}(\mathrm{bone}) = \begin{bmatrix} \mathtt{B}(\mathrm{bone}) & \mathbf{0} \\ \mathbf{0}^\top & 1 \end{bmatrix} \begin{bmatrix} \mathtt{I} & \mathbf{0} \\ \mathrm{bone.head} & 1 \end{bmatrix} , \quad (9.49) $$

where I is the identity matrix 3×3 and 0 is the zero vector. If the bone hasparent the estimation of the bone matrix in the armature space is different:

A(bone) =

[B(bone) 0

0 0 0 1

] [I 0

bone.head 1

]. . .

. . .

[I 0

0 parent.length 0 1

]A(parent) . (9.50)

The parent.length is the length of the parent bone and can be computed as ‖parent.tail − parent.head‖.

Finally, the bone matrix in the pose space differs from the matrix in the armature space by the quaternion pose.rot and by the translation row vector pose.loc, which set the bone to the new pose. The quaternions are used to set the bone rotations. If the bone does not have a parent, we can express its pose matrix as:
$$ \mathtt{O}(\mathrm{bone}) = \mathtt{R}_q(\mathrm{pose.rot}) \begin{bmatrix} \mathtt{I} & \mathbf{0} \\ \mathrm{pose.loc} & 1 \end{bmatrix} \mathtt{A}(\mathrm{bone}) . \quad (9.51) $$

If the bone has a parent, then the evaluation of the matrix is:
$$ \mathtt{O}(\mathrm{bone}) = \mathtt{R}_q(\mathrm{pose.rot}) \begin{bmatrix} \mathtt{I} & \mathbf{0} \\ \mathrm{pose.loc} & 1 \end{bmatrix} \begin{bmatrix} \mathtt{B}(\mathrm{bone}) & \mathbf{0} \\ \mathbf{0}^\top & 1 \end{bmatrix} \begin{bmatrix} \mathtt{I} & \mathbf{0} \\ \mathrm{bone.head} & 1 \end{bmatrix} \begin{bmatrix} \mathtt{I} & \mathbf{0} \\ [0,\ \mathrm{parent.length},\ 0] & 1 \end{bmatrix} \mathtt{O}(\mathrm{parent}) . \quad (9.52) $$

The matrices described above are used to compute the bone locations and poses. We only revised the Blender documentation, which is less specific on how the matrices are computed. The description that we gave here refers to Figure 9.6, but describes only the basic deformations that we used in our work. This is enough for our purposes.

9.6 Vertex deformations by bones

We describe here the vertex transformations of any mesh object by armature bones. We describe only the deformation caused by bones using vertex groups


and weights for the vertices. The deformation by envelopes or a combination of vertex groups with envelopes is more complicated. The armature object is a parent of a mesh or, in the newer Blender philosophy, it is a mesh modifier. In our work, each vertex has its own weight that defines the influence of a bone. The vertex can be a member of several vertex groups (we set a limit of 2 groups). A vertex group only defines the correspondence with a bone. Each group is transformed only by this bone (note that the vertex groups in Blender must have the same names as the bones to which they belong), but a vertex can be a member of several groups. The final local space vertex location can also be obtained using the NMesh.GetRawFromObject(name) Blender Python function.

Assume we have a vertex local space location as the vector:
$$ \mathbf{v}_l^\top = [x, y, z, 1] . \quad (9.53) $$

The world space vertex location is
$$ \mathbf{v}_w^\top = \mathbf{v}_l^\top \cdot \mathtt{M}(\mathrm{object}) , \quad (9.54) $$

where M(object) is the object transformation matrix (the vertex is a member of this mesh object). The vertex location in the parent's armature space is
$$ \mathbf{v}_a^\top = \mathbf{v}_w^\top \cdot \mathtt{M}^{-1}(\mathrm{armature}) . \quad (9.55) $$

Now, for each vertex group of which the vertex is a member, we compute the weighted vector of the relative change to the vertex rest position. The vertex location in the bone space is:
$$ \mathbf{v}_{b_i}^\top = \mathbf{v}_a^\top \cdot \mathtt{A}^{-1}(\mathrm{bone}_i) . \quad (9.56) $$

The vertex location in the armature space after applying the pose transformation is
$$ \mathbf{v}_{p_i}^\top = \mathbf{v}_{b_i}^\top \cdot \mathtt{O}(\mathrm{bone}_i) . \quad (9.57) $$

The weighted difference of the new location is (using only the x, y, z components)
$$ \mathbf{d}_i^{1..3} = (\mathbf{v}_{p_i}^{1..3} - \mathbf{v}_a^{1..3}) \cdot \mathrm{weight}_i . \quad (9.58) $$

Note that the weight is in the interval ⟨0, 1⟩. The final vertex location in the armature space is
$$ \mathbf{v}_a = \mathbf{v}_a + \frac{\sum_i \mathbf{d}_i}{\sum_i \mathrm{weight}_i} . \quad (9.59) $$

The vertex location for the new pose is
$$ \mathbf{v}_l^\top = \mathbf{v}_a^\top \cdot \mathtt{M}(\mathrm{armature}) \cdot \mathtt{M}^{-1}(\mathrm{object}) ; \quad (9.60) $$
this is the final vertex local space location after the deformations. We found these transformations by studying the Blender source code. This is not a complete description of the Blender bone transformations for vertices, but it completely describes the transformations that we use.
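The chain (9.53)–(9.60) can be sketched as follows; numpy is assumed, the matrices follow the row-vector convention used above, and bone_data stands for the per-vertex list of rest matrices, pose matrices and weights obtained from the Blender API attributes listed at the start of this chapter.

    import numpy as np

    def pose_vertex(v_local, M_object, M_armature, bone_data):
        """bone_data: list of (A_rest, O_pose, weight) for the groups the vertex is in."""
        v_l = np.append(np.asarray(v_local, dtype=float), 1.0)   # homogeneous row vector
        v_w = v_l @ M_object                                     # (9.54) to world space
        v_a = v_w @ np.linalg.inv(M_armature)                    # (9.55) to armature space
        diff = np.zeros(3)
        total_weight = 0.0
        for A_rest, O_pose, weight in bone_data:
            v_b = v_a @ np.linalg.inv(A_rest)                    # (9.56) to bone space
            v_p = v_b @ O_pose                                   # (9.57) posed location
            diff += (v_p[:3] - v_a[:3]) * weight                 # (9.58) weighted difference
            total_weight += weight
        if total_weight > 0.0:
            v_a = v_a.copy()
            v_a[:3] += diff / total_weight                       # (9.59)
        return (v_a @ M_armature @ np.linalg.inv(M_object))[:3]  # (9.60) back to local space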


9.7 Configuring the bones

When we animate our model, we do not change the bone joint locations or the bone sizes. We only rotate the bones to get the desired pose. As described in Chapter 7, we use the correspondences between the empty objects (created by importing the BVH animation data) and our bones. We must estimate the rotation (expressed as a quaternion) which rotates the bone from its rest pose to the new pose. In the new pose, the bone's y axis must be parallel to the axis connecting the corresponding empty objects.

We can compute the bone's head and tail locations in the rest pose as
$$ \begin{aligned}[] [\mathrm{bone.head}, 1] &= [0, 0, 0, 1]\, \mathtt{O}_{\mathrm{rest}}(\mathrm{bone}) \\ [\mathrm{bone.tail}, 1] &= [0, \mathrm{bone.length}, 0, 1]\, \mathtt{O}_{\mathrm{rest}}(\mathrm{bone}) . \end{aligned} \quad (9.61) $$

We know the locations of the corresponding empty objects; we will denote them empty.head, empty.tail. We need a transformation Q (a rotation) which transforms the head–tail vector to be parallel to the vector of the empty objects:
$$ \begin{aligned}[] [\mathrm{bone.head}_q, 1] &= [0, 0, 0, 1]\, \mathtt{Q} \cdot \mathtt{O}_{\mathrm{rest}}(\mathrm{bone}) \\ [\mathrm{bone.tail}_q, 1] &= [0, \mathrm{bone.length}, 0, 1]\, \mathtt{Q} \cdot \mathtt{O}_{\mathrm{rest}}(\mathrm{bone}) . \end{aligned} \quad (9.62) $$

The following equality must hold:
$$ \frac{\mathrm{bone.tail}_q - \mathrm{bone.head}_q}{\lVert \mathrm{bone.tail}_q - \mathrm{bone.head}_q \rVert} = \frac{\mathrm{empty.tail} - \mathrm{empty.head}}{\lVert \mathrm{empty.tail} - \mathrm{empty.head} \rVert} . \quad (9.63) $$

Because the sought transformation is only a rotation, we can omit the magnitudes of the vectors and write:
$$ [0, \mathrm{bone.length}, 0, 1]\, \mathtt{Q} = (\mathrm{empty.tail} - \mathrm{empty.head}) \cdot \mathtt{O}_{\mathrm{rest}}^{-1}(\mathrm{bone}) . \quad (9.64) $$

We compute the vector x
$$ \mathbf{x} = (\mathrm{empty.tail} - \mathrm{empty.head}) \cdot \mathtt{O}_{\mathrm{rest}}^{-1}(\mathrm{bone}) \quad (9.65) $$
and compute the quaternion q which defines the transformation Q:
$$ \begin{aligned} \mathbf{v} &= \frac{\mathbf{x}}{\lVert \mathbf{x} \rVert} \\ \mathbf{n} &= [0, 1, 0]^\top \times \mathbf{v} \\ \theta &= \arccos(\mathbf{u} \cdot \mathbf{v}) \\ \mathbf{q} &= \left( \cos\left(\tfrac{\theta}{2}\right),\ \mathbf{n} \sin\left(\tfrac{\theta}{2}\right) \right) . \end{aligned} \quad (9.66) $$


We use this quaternion in the bone object's parameter to define the rotation. This quaternion object can be used directly with bones, without any transformation to a rotation matrix. If the rotation should be relative to the empty object locations, we then use:
$$ \begin{aligned} \mathbf{v} &= \frac{\mathrm{empty.tail} - \mathrm{empty.head}}{\lVert \mathrm{empty.tail} - \mathrm{empty.head} \rVert} \\ \mathbf{u} &= \frac{\mathrm{empty.tail}_{\mathrm{rest}} - \mathrm{empty.head}_{\mathrm{rest}}}{\lVert \mathrm{empty.tail}_{\mathrm{rest}} - \mathrm{empty.head}_{\mathrm{rest}} \rVert} \\ \mathbf{n} &= \mathbf{u} \times \mathbf{v} , \end{aligned} \quad (9.67) $$
where empty.tail_rest, empty.head_rest are the locations of the empty objects in the animation's first frame. We refer to the locations in the first frame as the default locations.

9.8 Idealizing the calibrated camera

Because we did not write our own visualization for fitting the model into the camera views, we must use the Blender camera views. The problem is that the Blender camera intrinsic parameters are ideal and a real camera cannot be simulated in Blender. We also need to separate the extrinsic parameters of the calibrated camera to be able to configure the Blender camera to the same view.

We can decompose the first three columns of the camera projection matrix using the RQ decomposition as:
$$ \mathtt{P}_{1..3} = \mathtt{K} \cdot \mathtt{R} \quad (9.68) $$
and get the translation vector by multiplying the inverse calibration matrix by the last column of the projection matrix:
$$ \mathbf{t} = -\mathtt{K}^{-1} \cdot \mathtt{P}_4 . \quad (9.69) $$
The camera center in the world coordinates is:
$$ \mathbf{C} = \mathtt{R}^\top \cdot \mathbf{t} . \quad (9.70) $$

The camera center and the rotation matrix are enough to define the Blender camera location and orientation, but the intrinsic parameters of the camera cannot be obtained so easily. The general calibration matrix is:
$$ \mathtt{K} = \begin{bmatrix} * & * & * \\ 0 & * & * \\ 0 & 0 & * \end{bmatrix} . \quad (9.71) $$

This differs from the matrix in equation (9.24). We omit the skew in the calibration matrix and estimate the parameters of the Blender camera so that the final transformation fits the real camera parameters. We can compute the aspect ratio
$$ \frac{k_u}{k_v} = \frac{\mathtt{K}(1,1)}{\mathtt{K}(2,2)} \quad (9.72) $$

and the focal length
$$ f = \frac{\mathtt{K}(1,1)}{2 \max(\mathrm{width}, \mathrm{height})} . \quad (9.73) $$

We cannot set the image pixel coordinate origin offset in the calibration matrix of the Blender camera, but we can shift the image origin to match the Blender camera calibration matrix. We compute the needed offset as:
$$ \mathrm{shift} = \left[ \frac{\mathrm{width}}{2},\ \frac{\mathrm{height}}{2} \right]^\top - \mathtt{K}^{3}_{1..2} . \quad (9.74) $$

The computed shift was up to 50 pixels in our data. It is apparent that without idealizing and shifting the images, it is impossible to fit the object into the views using the Blender camera views.
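A sketch of this decomposition is given below; scipy.linalg.rq is assumed for the RQ factorization (our package uses Matlab for such decompositions), and the sign normalization of the factors is omitted for brevity.

    import numpy as np
    from scipy.linalg import rq   # assumption: any RQ routine works here

    def idealize_camera(P, width, height):
        K, R = rq(P[:, :3])                           # (9.68) P[:, :3] = K R
        K = K / K[2, 2]                               # normalize so K[2, 2] = 1
        t = -np.linalg.inv(K) @ P[:, 3]               # (9.69)
        C = R.T @ t                                   # (9.70) camera centre in world space
        aspect = K[0, 0] / K[1, 1]                    # (9.72) ku : kv
        f = K[0, 0] / (2.0 * max(width, height))      # (9.73) Blender lens-style focal length
        shift = np.array([width / 2.0, height / 2.0]) - K[:2, 2]   # (9.74) principal point offset
        return C, R, f, aspect, shift

The returned shift is the offset by which the captured images must be translated so that the principal point sits in the image center, as required by the Blender camera model.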


Figure 9.6: The schema of the Blender bones structure and transformations, describing how the armatures in Blender work. This figure was obtained from the Blender web site [1].


Chapter 10

The package usage

In this chapter we show how to use the software package and the possibilities in creating an animation. We use Blender in interactive mode with our own user interface due to the need for debugging and testing the Python code. This also helps to understand the procedure of creating an animation. Unfortunately, this requires at least a minimal knowledge of Blender usage. The objects in the Blender scene must be selected by a right mouse button click before an operation is chosen. The operations with scene objects can be attaching the mesh to the skeleton or copying a BVH animation. We give here a quick tour of creating an animation. We expect these inputs to be already obtained:

• The MakeHuman mesh exported into a *.obj file.

• Captured images from calibrated video together with decomposed and idealized camera configurations in *.cam files. For this see Section 9.8 in Chapter 9. For the *.cam file format see Section 8.3 in Chapter 8.

• The BVH motion captured data. Note that some BVH files may notbe compatible with our scripts.

If you run Blender with the parameters given in Section 8.5 in Chapter 8, you will get the screen shown in Figure 10.1. It is possible to access the functions from our scripts through the buttons in the right panel. It is easy to write your own Python scripts to automate the whole process instead of using the user interface and clicking on buttons, but using the buttons is more instructive for learning and understanding the usage and functions of our package.


Figure 10.1: The Blender window after the start of the run.py script. Buttons with our functions are on the right (import and export, configuration and export of cameras, mesh import and animation, etc.).

10.1 Importing mesh and attaching to the skeleton

It is easy to import any mesh. You just choose the Import MH Mesh button and select the proper file to import. The window may need to be redrawn after the import; you can resize the window or zoom in order to redraw it. The imported mesh should appear. The mesh should also be selected (highlighted); if not, it must be selected by clicking with the right mouse button. Right after the import, the mesh is bigger and in a different orientation than the skeleton we generate. The button Fit to Skeleton must be used in order to align the mesh with the skeleton. The window may need to be redrawn again. The skeleton can now be generated using the Generate Skeleton button. Now the mesh can be attached to the skeleton. Both the mesh and the skeleton object have to be selected. This can be done by holding the Shift key and clicking with the right mouse button on the objects. The subwindow should look like Figure 10.2. Then the Attach Skeleton to Mesh button can be used; the mesh is now attached to the skeleton. To test the articulation, press Ctrl+Tab and switch to the pose mode. Select a bone with the right mouse button and press the R key; the bone will start rotating. Switching back to the object mode is done again by Ctrl+Tab. After that, the mesh model is prepared for articulation and can be fitted into the camera


Figure 10.2: The Blender window with skeleton and mesh object selected.

views in order to cover the model with textures.

10.2 Fitting the model into views and texturing

We must admit it is practically impossible to get precise results by manually fitting the model into a specific pose in the camera views; nevertheless, we did not write any function for automatic fitting. We can roughly fit the model to the desired object in the camera views using the Blender interface. We can move, scale and rotate our model, as well as the bones, in order to match the images.

The fitting process is the following.

1. Load the images into Blender.

2. Switch the 3D View window to the UV/Image editor window by selecting an icon from the menu (the first icon in the list).

3. Through the menu choose Image and Open to load the image into Blender.

After loading all the images we can set the cameras in the scene.

1. We switch the window back to 3D View.

2. From the script window choose the button Load and Setup Camera and select the proper *.cam file.

3. Then from the 3D View window panel choose View and Camera to get the view from the current camera.


4. Enable the background image in the camera view from the panel by choosing View and Background image. Enable it and select the image corresponding to the current camera.

5. Split the window by clicking the middle mouse button on the window border and selecting Split area.

6. In one of the new windows, unlock the camera view by clicking on the icon with the lock (the icon must be unlocked) and in the second window lock the view with the same icon.

7. Load the next camera and select it with the Get next camera button. Continue from the beginning to set the view to the camera view and to set the background image.

All the cameras can be processed by the procedure listed above. After configuring the cameras in Blender, we can fit our model into the views using rotation (the R key), moving (the G key) and scaling (the S key) in both the object and the pose mode. You can see the results in Figure 6.1 on page 26.

Before attaching images as textures, we need to export the camera projection matrices by Export Camera Projection Matrix. These matrices are used to define our Blender camera projections to the images. We then select an image in the UV/Image editor window and assign the projection matrix by the Assign P matrix to Image button. After that, we attach the images as textures by the Attach Images As Texture button. The model is then texturized from the Blender camera views by the loaded images.

10.3 Loading animation, rendering and animation data export

We load the BVH files using Blender's import script for BVH motion captured data. It can be accessed through the Blender menu by selecting File, Import, Motion Capture (.bvh) (we recommend setting the script's scale parameter to 0.1 for better visualization). This script imports the BVH data and creates empty objects connected in the same hierarchy as the BVH data joints. This script sometimes imports the joint names badly. We advise checking the empty object names and correcting them if needed (this can be done by pressing the N key and rewriting the name). Running the other script swaps the active script in the script window and our script can disappear; it can be recovered by choosing our script from the scripts list on the script window's panel. If the armature object is selected, the Copy BVH movement button can be pressed to copy the imported animation. It is important to set the start and the end frame in the Buttons window in the anim scene panel.


To render the animation for each camera, the Render for each camera button can be used. The Export animation button can be used for exporting the vertex locations through the frames and for exporting the bone poses as well. All the other Blender features like lights, diffuse rendering, shadow casting etc. can also be used to get better results. These features are well documented in the Blender Manual [2].


Chapter 11

Results and conclusions

We extended the Blender functionality for the needs of computer vision. We can generate a wide variety of realistic human animations from motion captured data. We can use the outputs of a multi-camera system to adapt our models to real observations.

We created a general skeleton model which fits the mesh models created by the MakeHuman software. We followed the H-Anim standard and we adapted the recommendations for use with BVH motion captured data. We defined the joint locations and the level of articulation. We use 22 virtual bones, but only 19 of them are intended for articulation. The skeleton model is shown in Figure 4.1 (page 18).

Together with the MakeHuman mesh model and our general skeleton model we created the articulated model. We automatically attached the mesh vertices to the bones using our own algorithm. Articulated models in new poses are shown in Figure 5.5 (on page 24).

We set up a calibrated multi-camera system for acquiring the images for textures. We manually fitted the articulated model into the camera views using the Blender interface. The acquired images were idealized to match the Blender camera model, which has the principal axis in the center of the image. We then mapped the textures onto the mesh faces, taking into account the face visibility in the camera views. The textured articulated model is shown in Figure 6.2 (on page 31).

For animation we used motion captured data stored in the BVH format. We imported the data into Blender and copied the animation using a dictionary of correspondences between our skeleton model bones and the motion data joints. An animation example is shown in Figure 7.1 (page 36).

11.1 Conclusions

We developed a software package that can be used for quickly creating animations. Most of the process tasks can be done automatically or can be fully automated by scripting. We used open source software which is free to use. Using the modeling software Blender, we can extend the animated scene with more objects. We developed a platform which can be extended and improved by more advanced algorithms for texturing or model creation.

The main goal was to create a tool with which we can extend the robustness of our tracking algorithms. The outputs can be used for learning human silhouettes from different views, fitting the articulated model into observations or just testing the algorithms on ground-truth data. Our models cannot compete with virtual actors used in movies, but they are easy to create and animate.

The steps that we used to create an animation of the articulated model are:

• Obtaining model representation (mesh model)

• Defining the virtual skeleton

• Setting the textures

• Animation using motion captured data

Reusing motion captured data with our model is not easy. We did not use any constraints for the bone articulation which could help to keep the bones in valid poses. The model limbs can penetrate each other during the animation. This error is caused by an improper interpretation of the motion data. However, the motion data allow us to create animations with realistic human motions.

We can recommend the following for further development:

• The mesh model should be adapted to the real object captured by the cameras, so that the shape of the model matches the object's shape. The texturing of the model can then be done with less distortion caused by over- or under-fitting the model to the object.

• The H-Anim standard corresponds with the human anatomy. The human body feature points defined by H-Anim could be used for automatic model fitting into the views.

• It would be better to use our own captured motion data with exactly defined joint locations on the human body. Constraints for joint rotations should also be used.

We covered the steps of creating the articulated model animations. We described the needed transformations between computer vision and computer graphics. The developed software can be used anywhere simple realistic human animations are needed.


Bibliography

[1] Blender. Open source 3D graphics modelling, animation and creation software. http://www.blender.org.

[2] Blender documentation. Blender online wiki documentation. http://mediawiki.blender.org.

[3] M. Dimitrijevic, V. Lepetit, and P. Fua. Human body pose recognition using spatio-temporal templates. In ICCV Workshop on Modeling People and Human Interaction, Beijing, China, October 2005.

[4] Humanoid animation (H-Anim) standard. ISO/IEC FCD 19774:200x. http://www.h-anim.org.

[5] Richard Hartley and Andrew Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, Cambridge, 2nd edition, 2003.

[6] L. Herda, R. Urtasun, and P. Fua. Hierarchical implicit surface joint limits for human body tracking. Computer Vision and Image Understanding, 99(2):189–209, 2005.

[7] MakeHuman. The parametric human model modeling application. http://www.makehuman.org.

[8] Ondrej Mazany and Tomas Svoboda. Design of realistic articulated human model and its animation. Research Report K333–21/06, CTU–CMP–2006–02, Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University, Prague, Czech Republic, January 2006.

[9] W. Niem and H. Broszio. Mapping texture from multiple camera views onto 3D-object models for computer animation. In Proceedings of the International Workshop on Stereoscopic and Three Dimensional Imaging, 1995.

[10] Python. Interpreted, object-oriented programming language. http://www.python.org.


[11] Mark Segal, Kurt Akeley, and Jon Leach (ed.). The OpenGL Graphics System: A Specification. SGI, 2004.

[12] Jonathan Starck and Adrian Hilton. Model-based multiple view reconstruction of people. In International Conference on Computer Vision, ICCV, volume 02, pages 915–922, Los Alamitos, CA, USA, 2003. IEEE Computer Society.

[13] R. Urtasun, D. J. Fleet, A. Hertzmann, and P. Fua. Priors for people tracking from small training sets. In International Conference on Computer Vision, pages 403–410, October 2005.

[14] Karel Zimmermann, Tomas Svoboda, and Jiri Matas. Multiview 3D tracking with an incrementally constructed 3D model. In Third International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT), Piscataway, USA, June 2006. University of North Carolina, IEEE Computer Society.