

Signal Processing: Image Communication 17 (2002) 717–741

Insights into low-level avatar animation and MPEG-4 standardization

Marius Preda*, Françoise Preteux

Unité de projets ARTEMIS, Institut National des Télécommunications, 9 rue Charles Fourier, 91011 Evry Cedex, France

Abstract

Referring to the new functionality of video access and coding, the survey presented here lies within the scope of MPEG-4 activities related to virtual character (VC) animation. We first describe how Amendment 1 of the MPEG-4 standard offers an appropriate framework for virtual human animation, gesture synthesis and compression/transmission. Specifically, face and body representation and animation are described in detail in terms of node syntax and animation stream encoding methods. Then, we discuss how this framework is extended within the ongoing standardization efforts by (1) allowing the animation of any kind of articulated model, and (2) addressing advanced modeling and animation concepts such as the "skin and bones"-based approach. The new syntax for node definition and animation streams is presented and discussed in terms of genericity and additional functionalities. The biomechanical properties, modeled by means of the character skeleton that defines the bone influence on the skin region, as well as the local spatial deformations simulating muscles, are supported by specific nodes. Animating the VC consists in instantiating bone transformations and muscle control curves. Interpolation techniques, inverse kinematics, discrete cosine transform and arithmetic encoding techniques make it possible to provide a highly compressed animation stream. The new Animation Framework eXtension tools are finally evaluated in terms of realism, complexity and transmission bandwidth within a sign language communication system.

© 2002 Elsevier Science B.V. All rights reserved.

Keywords: Virtual character (VC) animation; MPEG-4 standard; Face and body animation (FBA); Skin&Bones (SB); Bone-based animation (BBA); Low-bit-rate compression; Animation Framework eXtension (AFX)

1. Introduction

The continuous development of multimedia software and hardware technologies, together with the explosive growth of the Internet, motivates an increasing interest in effective compression tools for audio-visual content in order to drastically reduce the cost of data transmission in multimedia environments. The Moving Picture Experts Group (MPEG) aims at providing standardized core technologies allowing efficient storage, transmission and manipulation of audio/video/3D data. MPEG-4, an international standard since December 1999 [29], is specifically intended to cope with the requirements of multimedia applications, allowing new functionalities like video manipulation, scalable video encoding, synthetic and natural hybrid coding, 3D object compression, and face and body animation (FBA) and coding.

*Corresponding author.

E-mail address: [email protected] (M. Preda).

0923-5965/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.

PII: S0923-5965(02)00077-2


Parametric model-based video coding provides very low bit-rate compression, making possible applications like video conferencing and videotelephony, mobile communications and mobile multimedia related applications.

Character animation is a well-known issue in 3D related applications, especially because of the desire to use human beings as avatars in virtual or hybrid environments. As part of Amendment 1 of the MPEG-4 standard, the Synthetic and Natural Hybrid Coding group [29] addresses the animation of human avatars, the so-called face and body animation (FBA), by specifying (1) the avatar definition data representation, and (2) the animation data representation and compression. With the same objective of low-bit-rate streamed animation, the Animation Framework eXtension (AFX) group has recently adopted the specifications of the so-called bone-based animation (BBA) [52], allowing a more realistic animation of any kind of articulated character. AFX, which will be part of Amendment 4 of the MPEG-4 standard at the end of 2002, extends the 3D related technologies of MPEG-4 Amendment 1 and addresses new ones such as subdivision surfaces, volume representation, view-dependent texture and mesh transmission.

In the last decade of the 20th century, networked graphics applications using virtual characters (VC) were mostly prototype systems [7,13] demonstrating the effectiveness of the technology. At the beginning of the 21st century, the related commercial systems are booming, mainly thanks to technical developments in the area of networked games [2,19]. In this context, current efforts for providing real applications within a unified and interoperable framework are materialized by 3D graphics interchange standards such as VRML [55] and MPEG-4, AFX claiming to become the future 3D interchange reference.

Animating a 3D VC involves continuously changing its shape. However, different animation models can be applied. They can be structured hierarchically according to the animation control type, namely geometric, kinematic, physical, behavioral and cognitive, as proposed in [24] and illustrated in Fig. 1.

When dealing with geometric models, the animation controls act directly at the VC vertex level. Kinematic models make it possible to group vertices into subsets governed by specific kinematic rules. More complex animation controllers use the VC physical properties to generate motion through dynamic simulation. Behavioral modeling allows self-animating characters that react to environmental stimuli. Recent research activities on self-learning VCs have been proposed as cognitive animation models. Let us note that the animation control precision (in terms of resolution) decreases from the low levels of the pyramid to the higher ones. Moreover, in order to achieve the final VC animation and thus to deform the VC shape, the high-level controllers have to provide the low-level parameters.

In order to represent animation independently of the controllers used, our goal, within a standardization process, is to offer a generic and, at the same time, compact representation of the low-level animation model. Thus, we address in this paper aspects related to the first two levels of the hierarchy: the geometric and kinematic levels.

A comprehensive view of the evolution of 3D VC-related techniques is presented in Section 2. This survey is structured according to the three major components of an animation system, (1) modeling, (2) animation, and (3) motion capture, and is based on relevant techniques reported in the literature or used by commercial products. Special attention is paid to communication systems for 3D animation data (networked virtual environments).

Fig. 1. CG animation modeling hierarchy.


Section 3 introduces the basic concepts related to the virtual human animation tools as defined in Amendment 1 of the MPEG-4 standard. The MPEG-4 virtual human body object is described in terms of (1) definition parameters, specifying the model properties (topology, geometry, texture and color), and (2) animation parameters, defining the 3D pose of the avatar.

Section 4 shows how the FBA framework is extended within the ongoing standardization process into the BBA framework. Aiming at animating any kind of articulated model, the AFX is extensively described. The advanced modeling concepts addressed rely on the skin-and-bones (SB) representation and on curve-based deformations. The new syntax for node definition is analyzed in detail for the SBBone, SBSegment, SBSite, SBMuscle, and SBSkinnedModel nodes, respectively. Then, the syntax of the animation stream is discussed in terms of rotation representation, interpolation methods, and animation mask and value parameters.

In Section 5, the FBA and BBA frameworks are comparatively evaluated within a specific application related to a sign language communication system. Realistic animation capabilities, compression performances, and compatibility with existing broadcast technologies are the key criteria discussed experimentally.

Finally, concluding remarks and perspectives for future work are reported in the last section.

2. 3D VC animation in a standalone and a networked virtual environment

The first 3D virtual human model was designed and animated by means of the computer in the late 70s. Since then, VC models have become more and more popular, enabling a growing population to impact the everyday real world. From the simple and easy-to-control models used in commercial games [19,2], to more complex virtual assistants for commercial [33] or informational web sites [28], to the new stars of virtual cinema [25], television [56] and advertising [5], the 3D character model industry is currently booming. Moreover, the steady improvements within the distributed network area in terms of bandwidth capabilities, bit-rate performances and advanced communication protocols have promoted the emergence of 3D communities [8] and immersion experiences [36] in distributed 3D virtual environments.

Here, we present a comprehensive view of the evolution of 3D VC related techniques. This brief overview is structured according to the three major components of an animation system, namely (1) modeling, (2) animation, and (3) motion capture, which are strongly interconnected and application dependent, as we shall show in the sequel. Special attention will be paid to communication systems for 3D animation data, in other words the networked virtual environments.

2.1. VC modeling

VC modeling consists in specifying the geometry and the appearance properties (color, material, texture) of a model.

Designing a VC can be achieved either according to a segment-based approach or within a seamless framework. A so-called segmented character is defined as a hierarchical collection of rigid 3D objects, referred to as segments. A so-called seamless VC is geometrically defined as a single, continuous mesh.

The most relevant techniques used in character animation software packages for specifying the model geometry refer to surface-based representations, and more specifically to (1) polygonal meshes, (2) parametric equations, and (3) implicit representations.

The polygonal surface-based technique consists of explicitly specifying the set of planar polygons which compose the 3D object surface and of connecting them together [23]. Two types of information have to be kept about an object, namely purely geometric information (coordinates of vertices) and topological information (which details how the geometric entities relate to each other). Historically, the polygonal surface-based technique is the first method introduced in computer graphics and remains the basic tool used at the rendering stage by any other surface


representation technique. Its main advantage is the capability to define any complex surface, the smoothness depending on the number of polygons used. However, increasing the polygon number may significantly degrade the animation performance. The strategy developed in this case consists of deriving a lower-resolution version of the VC, animating/deforming this version, and recovering the full-resolution version by applying a subdivision surface method [15,34,62].

Non-planar parametric representations of a surface are widely used in computer graphics because of their easy computation and simple mathematical manipulation. A non-planar surface patch, i.e. an elementary curved surface entity, is defined as the surface traced out as the two parameters (u, v) ∈ [0, 1]^2 vary, in a two-parameter representation,

    P(u, v) = [x(u, v), y(u, v), z(u, v)].    (1)

According to the curve degree, the patch surface can be of linear, cardinal, B-spline or Bézier type. The patch-based modeling method aims at generating patches with respect to curve profile(s) and stitching them together in order to build complex surfaces. Specific constraints have to be introduced in order to properly build a patch-based VC. In particular, it is recommended to (1) completely cover a joint with a unique patch surface, and (2) ensure the continuity of the mesh at the patch borders, by identically deforming adjacent frontiers. Mathematically, the patch-based surface representation is the limit surface resulting from the convergence of an iterative surface subdivision procedure.

A NURBS-based surface [43] is an extension of the B-spline patch obtained by weighting the influence of each control point. A NURBS surface of degree (p, q) is defined as

    P(u, v) = [ Σ_{i=0}^{m} Σ_{j=0}^{n} N_{i,p}(u) N_{j,q}(v) w_{i,j} P_{i,j} ] / [ Σ_{i=0}^{m} Σ_{j=0}^{n} N_{i,p}(u) N_{j,q}(v) w_{i,j} ],    (2)

where N_{i,p} and N_{j,q} are the B-spline basis functions, P_{i,j} are the control points and w_{i,j} is the weight of P_{i,j}. A non-uniform weighting mechanism increases the flexibility of the representation, which becomes suitable for modeling a wide range of shapes with very different curvature characteristics, while minimizing the number of control points.
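For illustration, a minimal numerical sketch of Eq. (2) is given below; the Cox-de Boor recursion, the knot vectors and the array layout are assumptions made for the example and are not part of any MPEG-4 syntax (evaluate for u, v in [0, 1) with clamped knots).

```python
import numpy as np

def bspline_basis(i, p, u, knots):
    """Cox-de Boor recursion for the B-spline basis function N_{i,p}(u)."""
    if p == 0:
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + p] != knots[i]:
        left = (u - knots[i]) / (knots[i + p] - knots[i]) * bspline_basis(i, p - 1, u, knots)
    if knots[i + p + 1] != knots[i + 1]:
        right = (knots[i + p + 1] - u) / (knots[i + p + 1] - knots[i + 1]) \
                * bspline_basis(i + 1, p - 1, u, knots)
    return left + right

def nurbs_point(u, v, ctrl, w, p, q, U, V):
    """Evaluate Eq. (2): the rational, weighted combination of control points.
    ctrl has shape (m+1, n+1, 3) and w has shape (m+1, n+1)."""
    m, n = ctrl.shape[0] - 1, ctrl.shape[1] - 1
    num, den = np.zeros(3), 0.0
    for i in range(m + 1):
        Nu = bspline_basis(i, p, u, U)
        if Nu == 0.0:
            continue
        for j in range(n + 1):
            c = Nu * bspline_basis(j, q, v, V) * w[i, j]
            num += c * ctrl[i, j]
            den += c
    return num / den if den > 0.0 else num
```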

Metaballs [10,61], also known as blobby objects, are a type of implicit modeling technique. A metaball can be interpreted as a particle surrounded by a density field. The density assigned to the particle (its influence) decreases with the distance to the particle location. A surface is implied by taking an isosurface through this density field: the higher the isosurface value, the nearer it will be to the particle. The key to using metaballs is the equation specifying the influence of an arbitrary particle on an arbitrary point. Blinn [10] used exponentially decaying fields for each particle,

    C(r) = b exp(-a r),    (3)

and Wyvill et al. [61] defined a cubic polynomial based on the radius of influence R of a particle and the distance r from the center of the particle to the field location considered,

    C(r) = 2 r^3 / R^3 - 3 r^2 / R^2 + 1.    (4)

The powerful aspect of metaballs lies in the way they can be combined. By simply summing the influences of each metaball at a given point, we get very smooth blendings of the spherical influence fields (Fig. 2), allowing realistic, organic-looking shape representation.
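As an illustration of this blending, a minimal sketch summing Wyvill-style fields of Eq. (4) is given below; the particle positions, radii and the isosurface threshold are arbitrary values chosen for the example.

```python
import numpy as np

def wyvill_field(r, R):
    """Cubic fall-off of Eq. (4); zero outside the radius of influence R."""
    if r >= R:
        return 0.0
    return 2.0 * r**3 / R**3 - 3.0 * r**2 / R**2 + 1.0

def blended_density(x, particles):
    """Sum the influences of all metaballs (center c, radius R) at point x."""
    return sum(wyvill_field(np.linalg.norm(x - c), R) for c, R in particles)

# Two overlapping metaballs: points where the summed density exceeds the
# chosen isosurface threshold belong to the smoothly blended shape.
particles = [(np.array([0.0, 0.0, 0.0]), 1.0), (np.array([0.8, 0.0, 0.0]), 1.0)]
print(blended_density(np.array([0.4, 0.0, 0.0]), particles) > 0.5)
```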

However, achieving real-time implementations of implicit surface rendering remains difficult, limiting the use of metaballs to off-line productions, as in the cinema industry [16]. Moreover, mapping textures onto such a geometry is a difficult task. Usually, the models designed with this technique have their colors specified at the vertex level or are covered by uniform (solid) colors.

Fig. 2. Metaballs blending.


In practice, designing a VC highly depends on the animation requirements. For example, a character involved in a gaming environment is usually modeled with simple polygons, while in the case of an artistic movie, characters may either be modeled as complex polygonal surfaces obtained by applying subdivision surface techniques, or be directly defined as NURBS surfaces.

Let us now see how the animation of a 3D VC strongly depends on the modeling type.

2.2. VC animation

Animating a segmented character consists in applying affine transforms to each segment. The main advantages of such a simple technique are its real-time capabilities on basic hardware configurations and its intuitive motion control. However, such an animation results in seams at the joint level between two segments (Figs. 3(a) and (b)). This undesirable effect can be more or less overcome by ad hoc methods, such as introducing spheres at the joint level, or 3D objects masking the joints. This is the case when dealing with cartoon-like animation. Nevertheless, more realistic animation requires handling local deformations.

Animating a seamless character consists in applying deformations at the skin level. The major 3D mesh deformation approaches can be classified into five categories, referred to as:

1. Lattice-based: A lattice is a set of control points, forming a 3D grid, that the user positions to control a 3D deformation. Points falling inside the grid are mapped from the unmodified lattice to the modified one using smooth interpolation.

2. Cluster-based: Grouping some vertices of the skin into clusters enables their displacements to be controlled by using the same parameters.

3. Spline-based: Spline and, in general, curve-based deformation allows a mesh to be deformed with respect to the deformation of the curve. Further details are reported in Section 4.

4. Morphing-based: The morphing technique consists in smoothly changing one shape into another. Let us mention that such a technique is very popular for animating virtual human faces (a minimal sketch is given after this list).

5. Skeleton-based.
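A minimal sketch of the morphing category mentioned in item 4: two key shapes with identical topology are blended linearly; the vertex arrays and the blending weight below are illustrative values only.

```python
import numpy as np

def morph(source, target, alpha):
    """Linearly blend two vertex arrays of identical topology; alpha in [0, 1]."""
    return (1.0 - alpha) * source + alpha * target

# Example: a neutral key shape morphed 30% of the way towards a second key shape.
neutral = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.5, 1.0, 0.0]])
smile   = np.array([[0.0, 0.1, 0.0], [1.0, 0.1, 0.0], [0.5, 0.8, 0.2]])
print(morph(neutral, smile, 0.3))
```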

The first four categories are used in specific application cases and are more or less supported by the main animation software packages. The last category, more and more often adopted in VC animation systems, involves the concept of skeleton. The skeleton of a VC is a set of semantic information composed in a hierarchical structure of elementary entities called bones. The seamless character is deformed by applying rigid transforms to the skeleton bones. These transforms induce displacements of a subset of the skin vertices. This technique avoids seams at the joint level (Figs. 3(c) and (d)). The skeleton-based animation concept will be described in detail in Section 4.

To design the VC skeleton, an initialization stage is required: the designer has to specify the influence region of each bone of the skeleton as well as a measure of influence. This stage is mostly interactive and is recursively repeated until the desired animation effects are reached. Deforming a seamless character remains a complex and time-consuming task, requiring dedicated hardware.

Fig. 3. Virtual humanoid animation.


However, the increase in the performance capabilities of computer graphics hardware contributes to promoting the skeleton-based animation technique as the "standard" in 3D applications [1,37,31]. The current trend is in favor of subdividing the VC into seamless subparts and deforming each one independently of the others. This partitioning strategy helps to fulfill real-time requirements.

Within the low-level animation context, the question that arises is to identify the nature (namely, kinematic or dynamic) and the minimal number of the critical parameters which have to be specified. The kinematic-type parameters are naturally involved within a forward or inverse kinematics (IK) framework. Within a forward kinematics (FK) approach, the critical parameters correspond to the geometric transform applied to each bone of the skeleton. Within an IK approach, the critical parameters correspond to the geometric location of an end effector. If, in the past, real-time animation based on IK was not possible because of the complexity of solving the IK equations, today almost all animation packages support both animation methods.
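To make the FK case concrete, a minimal sketch is given below: each bone contributes a rigid transform (here a rotation about Z followed by a translation along the bone), and composing these transforms down the chain yields the end-effector position. The two-bone planar chain and the matrix convention are assumptions made for the example.

```python
import numpy as np

def bone_transform(angle_z, length):
    """4x4 transform of one bone: rotate about Z, then translate along the bone."""
    c, s = np.cos(angle_z), np.sin(angle_z)
    R = np.eye(4)
    R[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T = np.eye(4)
    T[0, 3] = length
    return R @ T

def forward_kinematics(joint_angles, bone_lengths):
    """Compose the per-bone transforms and return the end-effector position."""
    M = np.eye(4)
    for angle, length in zip(joint_angles, bone_lengths):
        M = M @ bone_transform(angle, length)
    return M[:3, 3]

# Two-bone planar chain (e.g. upper arm + forearm), joint angles in radians.
print(forward_kinematics([np.pi / 4, np.pi / 6], [0.3, 0.25]))
```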

Dynamic parameters refer to physical properties of the 3D virtual object, such as mass or inertia, and to how external or internal forces interact with the object. Such physics-based attributes have been introduced since 1985 in the case of virtual human-like models [3,59]. Extensive studies [6,11] on human-like virtual actor dynamics and control models for dedicated motions (walking, running, jumping, etc.) [40,41,60] have been carried out. Recently, Faloutsos et al. [21] proposed a framework making it possible to exchange controllers, sets of parameters which drive a dynamic simulation of the character and which are evolved using the goals of the animation as an objective function, resulting in physically plausible motion. Even if some positive steps have been achieved for dedicated motions, dynamically simulating articulated characters able to perform a wide range of motor skills is still a challenging issue, far from a standardized solution.

However, addressing skeleton-based animation within a kinematic animation parameter representation is currently the state-of-the-art technology for VC modeling/animation, and is mature enough to be considered within an international standardization process.

2.3. Motion capture systems

Computer animation of VCs and interactive virtual environments require that a motion generating source be available. The first animation technique was imported from cartoon animation studios: a well-trained animator roughly draws the character in the key poses and other animators "interpolate" in-between. When adopting such an animation scheme in the case of a 3D VC, the role of the "less-trained" person is played by the computer. Indeed, the key-frame-based interpolation technique is a widespread method in 3D applications, perfectly mastered by the computer. The main difficulty, however, is to find the well-trained animator. To overcome this constraint, motion capture systems were developed in the late 70s.

The characteristics of an efficient motion capture system are the accuracy of the captured movements, the ability to capture unconstrained movement, and the robustness and speed of the calibration procedure.

Motion capture technologies are generally classified into active and passive sensor-based capture, according to the nature of the sensors used. Within an active sensor-based system, the signals to be measured are transmitted by the sensors, while within a passive sensor-based system, the signals to be measured are obtained by light reflections on the sensors.

One of the earliest methods, using mechanical sensors [22] as active sensors, was a prosthetic system consisting of a set of armatures attached all over the performer's body and connected by a series of rotational and linear encoders. Reading the status of all the encoders allows the performer's poses to be analyzed and retrieved.

The so-called acoustic method [49] is based on a set of sound transmitters pasted on the performer's body. They are sequentially triggered to output a signal, and the distances between transmitters and receivers are computed as a function of the time needed for the sound to reach the receivers. The 3D position of the transmitter, and implicitly of the performer's segment, is then computed by using a triangulation procedure or phase information.


The most popular method for motion capture is based on magnetic fields [4,44]. Such a system is made of one transmitter and several magnetic-sensitive receivers pasted on the performer's body. The magnetic field intensity is measured at the receivers' side, and the location and orientation of each receiver are computed accordingly. Let us note that such a system is very sensitive to metallic environments and special care has to be taken.

More complex active sensors are based on fiber optics. The principle consists in measuring the light intensity passing through the flexed fiber optics. Such a system is usually used to equip devices like data-gloves [58]. The last class of active sensors is based on accelerometers, small devices which measure the acceleration of the body part they are attached to.

When using active sensors, the real actor must be equipped with a lot of cables, limiting motion freedom. However, recent motion capture systems based on wireless communication are very promising [4,44].

The second class of motion capture techniques uses passive sensors. One camera, coupled with a system of properly oriented mirrors, or several cameras, makes possible the 3D object reconstruction from multiple 2D views. To reduce the complexity of the analyzer, markers (light reflective or LEDs) are attached to the performer's body. The markers are detected on each camera view and the 3D position of each marker is computed. However, occlusions due to the performer's motions may occur. Additional cameras are generally integrated in order to reduce loss of information and ambiguities. Since 1995, computer vision-based motion capture has become an increasingly challenging issue when dealing with tracking, pose estimation and gesture recognition oriented human motion capture. Several analysis methods have been reported [38], but the limitations imposed do not allow us to consider computer vision-based motion capture as a mature and accurate motion capture technique.

2.4. Network virtual environment

Core technologies being available for modeling and animating characters, real advances can be made towards creating communities populated by 3D citizens and immersion experiences, as described in [36]: "Networking coupled with highly interactive technology of virtual worlds will dominate the world of computers and information technology".

The first complete network virtual environment (NVE), called the virtual environment operating shell [12], was developed by the University of Washington. Since then, an increasing number of NVEs have been proposed within special application fields or according to special architecture constraints, for example: dVS [17], created for interacting with virtual prototypes of CAD products; DIVE [14], a system which uses peer-to-peer communication; NPSNET [35], which simulates a battlefield; and MASSIVE [26], which combines graphics, audio and text interfaces. The advanced functionality offered by NVEs is the capability of immersion into the virtual world. VLNET [13,42] is one of the first NVEs offering realistic human representations. NVEs cover a wide range of applications: operations in dangerous environments, scientific visualization, medicine, rehabilitation and help to disabled people, psychiatry, architectural visualization, education, training, and entertainment. NVEs most often use proprietary solutions for scene graph definition and animation. Current standardization efforts within VRML and MPEG-4 (BIFS 3D, AFX, and Multi-user World) propose a unified framework, ensuring interoperability. Moreover, MPEG-4 offers low-bit-rate compression methods for the scene graph components: geometry, appearance and animation.

For applications dealing with populated 3D NVEs, the FBA and, more recently, the AFX working groups have created a 3D character representation and animation framework which:

* is generic enough to accept any character, from simple to complex;

* ensures realistic rendering as well as cartoon-specific effects;

* can easily be plugged into existing motion capture techniques;

* supports streaming animation compressed at a low bit-rate.


The following two sections present and discuss in detail the FBA and AFX specifications.

3. MPEG-4 face and body animation

The first efforts to standardize the animation of a VC within MPEG-4 were finalized at the beginning of 1999 and dealt with specifications for defining and animating a human VC. The first version of the standard addresses the animation of a virtual human face, while Amendment 1 contains specifications related to virtual human body animation. In order to define and animate a virtual actor, MPEG-4 introduces the FBA object. Conceptually, the FBA object consists of a collection of nodes in a scene graph which are animated by the FBA object bitstream. The shape and the appearance of the face are controlled by the bitstream instantiating the facial definition parameter node. The facial expressions and animation are controlled by the bitstream instantiating the facial animation parameter node. The virtual body geometry and color attributes are controlled by the body definition parameters (BDPs), and the avatar motions by the body animation parameters (BAPs). Within the MPEG-4 framework, face animation can be performed at a high level, by using a standardized set of expressions and visemes, as well as at a low level, the standard defining a number of feature points on the virtual human face. Face animation within MPEG-4 has already been presented and discussed in detail [18,20,32] and is not the purpose of the present paper. Animation of the human body within the MPEG-4 standard is, however, less reported in the literature [47,51] and is addressed below.

The MPEG-4 body object is a hierarchical graph consisting of nodes associated with anatomical segments and edges defining subpart relationships. Each segment is individually specified and animated by means of two distinct bitstreams, referred to as BDPs and BAPs. BDPs control the intrinsic properties of a segment, namely its local surface topology, geometry and texture. BAPs define the extrinsic properties of a segment, i.e. its 3D pose with respect to a reference frame attached to the parent segment. BDPs are actor specific; hence, the overall morphology of an actor can be readily altered by overriding the current BDPs. Contrary to BDPs, BAPs are meant to be generic. If correctly interpreted, a given set of BAPs will produce perceptually reasonably similar results, in terms of motion, when applied to different actor models specified by their own BDPs.

Let us show how (1) starting from a non-articulated and static VRML humanoid model, a complete segmentation into anatomical subparts is performed, (2) a hierarchical graph specifies the body as an articulated model, and (3) the BAP generation issue is addressed. The model segmentation procedure presented in Section 3.1, as well as the BAP production in Section 3.2, are not standardized by MPEG-4, but are proposed by the authors as part of an authoring tool.

3.1. 3D virtual human body modeling

The MPEG-4 body object is a segmented character as defined in Section 2. The character definition, strongly based on the H-Anim V2.0 specifications [27], uses scene graph nodes, namely the Humanoid node, Joint node, Segment node, and Site node. In addition, the set of human body joints (names and hierarchical topological graph) is standardized and available.

Decomposing a VRML model into anatomical subparts is performed by a supervised polygonal mesh propagation algorithm. The principle is to create vertex and face adjacency lists of the submeshes associated with anatomical segments. The algorithm involves the following three main steps. The first one, the initialization step, aims at interactively creating an ordered list of mesh vertices, the so-called joint location control vertices (JLCVs). The designer has to select between 3 and 7 vertices on the mesh, at each joint level (Figs. 4(a,b) and 5(a,b)). The second step automatically generates the related joint vertices from the JLCVs, constituting the so-called joint contour (JC). A 3D propagation-based method including angular constraints is applied to any couple of two successive JLCVs, (v_i, v_j) (the last JLCV is coupled with the first one). For each vertex v belonging to the one-order neighborhood of v_i, the angle (v v_i v_j) is computed. The vertex v_min which minimizes the


angular measure is selected. The procedure is iterated by replacing v_i by v_min and stops as soon as there exists v (a one-order neighbor of the current v_i) such that v = v_j. The set of selected vertices forms the connection line (CL) between the initial v_i and v_j. By concatenating all the CLs, we obtain the JC associated with a joint (Figs. 4(d) and 5(d)).

The last stage constructs the anatomical segment related to two adjacent JCs. The same propagation procedure is applied to an arbitrary pair of vertices v_a and v_b, each one belonging to a JC (Fig. 6(a)). An arbitrary vertex v_k belonging to the CL obtained with respect to v_a and v_b is selected (Fig. 6(b)). A 3D geodesic reconstruction (iterative elementary geodesic dilation) is applied to v_k with respect to the mesh surface limited by the two JCs (Fig. 6(c)). Finally, the anatomical segment is the submesh made of the two JCs and the component reconstructed from v_k (Fig. 6(d)).
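A minimal sketch of the connection-line propagation described above is given below. The mesh is represented here as an array of vertex coordinates plus a one-ring adjacency map (both illustrative assumptions), the angle (v v_i v_j) is measured at the current vertex v_i, and no safeguard against degenerate meshes is included.

```python
import numpy as np

def angle_at(vi, v, vj, coords):
    """Angle (v, v_i, v_j), measured at v_i, between the directions to v and to v_j."""
    a = coords[v] - coords[vi]
    b = coords[vj] - coords[vi]
    cosang = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cosang, -1.0, 1.0))

def connection_line(vi, vj, coords, neighbors):
    """Greedy propagation from v_i towards v_j: at each step, pick the one-ring
    neighbor minimizing the angle (v, v_i, v_j); stop when v_j becomes a neighbor."""
    line = [vi]
    current = vi
    while vj not in neighbors[current]:
        current = min(neighbors[current],
                      key=lambda v: angle_at(current, v, vj, coords))
        line.append(current)
    line.append(vj)
    return line
```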

Subpart relationships between the resulting anatomical segments are represented in a hierarchical graph and follow the H-Anim specifications [27]. The tree structure of the graph defines, for each component, one parent node and possibly several child nodes.

How does one represent the animation parameters of such an MPEG-4 compliant human body in a compact way?

3.2. 3D virtual human body animation

The key concept for BAP representation is the orientation of any anatomical segment as the composition of elementary rotations, namely twisting, abduction and flexion. A total of 296 angular joint values is enough to specify any 3D pose of a virtual avatar. Angular values are specified with respect to the local 3D coordinate system of the

Fig. 4. Segmenting upper arm of the 3D model: selecting first JC.

Fig. 5. Segmenting upper arm of the 3D model: selecting second JC.


anatomical segment. The origin of the local coordinate system is defined as the gravity center of the JC common to the considered anatomical segment and its parent. The rotation planes are specified (Fig. 7) and the anatomical segment rotation axes are standardized (Fig. 8).

In order to facilitate BAP manipulation, we have developed the so-called ARTEMIS avatar animation interface (3AI) (Fig. 9). The 3AI is a user-friendly C++ interface, available on X11/Motif and Windows platforms, which offers the following functionalities: (1) BAP editing, including basic and advanced instantiation techniques such as linear, spline and spherical interpolation and IK; (2) 3D compositing of objects such as images, video sequences, human body models or anatomical part models (hand, arm), and 3D scenes; (3) calibration of the 3D body model according to the anthropometric characteristics of the actor in the video sequence (dimensions of the palm, length of the fingers, etc.); (4) interactive extraction of BAPs specifying any gesture posture or corresponding to the posture shown in the video sequence; and (5) animation of the virtual human model according to a local resource (BAP file) or a remote resource (through UDP-based communication). The 3AI has been used to generate the MPEG-4 BAP data set for the alphabet letters used in American Sign Language (ASL) [48].

Fig. 6. Segmenting upper arm of the 3D model: geodesic dilation.

Fig. 7. Arm rotations with respect to standardized rotation planes.

Fig. 8. (a), (b) Standardized rotation axes attached to shoulder, elbow, wrist and fingers.


In order to provide realistic 3D deformations like muscle contraction or clothing folds and adjustments, as well as to avoid seams at the joint level (effects induced by the animation), MPEG-4 FBA includes a deformation modeling tool achieved by instantiating body deformation tables (BDTs). BDTs address non-rigid motion by specifying a list of vertices of the 3D model as well as their local displacements as functions of BAPs. Let us demonstrate the BDT concept for deforming a finger shape:

BodyDefTable {
  bodySceneGraphNodeName "l_ring_proximal"
  bapIDs [143]
  vertexIds [40, 50, 51, 52, 53, 54]
  bapCombinations [100, 200, 300, 450, 500]
  displacements [
    -0.11 0.12 0,  0.1 0 0,   0.1 0 0,   0.1 0 0,   0.1 0 0,   0.1 0 0,
    -0.25 0.25 0,  0.22 0 0,  0.22 0 0,  0.22 0 0,  0.22 0 0,  0.22 0 0,
    -0.35 0.37 0,  0.3 0 0,   0.3 0 0,   0.3 0 0,   0.3 0 0,   0.3 0 0,
    -0.4 0.47 0,   0.41 0 0,  0.41 0 0,  0.41 0 0,  0.41 0 0,  0.41 0 0,
    -0.52 0.6 0,   0.48 0 0,  0.48 0 0,  0.48 0 0,  0.48 0 0,  0.48 0 0 ]
}
BodyDefTable {
  bodySceneGraphNodeName "l_ring_middle"
  bapIDs [143]
  vertexIds [0, 1, 2, 3]
  bapCombinations [100, 200, 300, 450, 500]
  displacements [
    -0.1 0 0,   -0.09 0 0,  -0.08 0 0,  -0.1 0 0,
    -0.22 0 0,  -0.2 0 0,   -0.21 0 0,  -0.22 0 0,
    -0.29 0 0,  -0.28 0 0,  -0.3 0 0,   -0.3 0 0,
    -0.35 0 0,  -0.37 0 0,  -0.39 0 0,  -0.4 0 0,
    -0.4 0 0,   -0.4 0 0,   -0.45 0 0,  -0.45 0 0 ]
}

In this case, the BDTs refer to the deformation of the segments l_ring_proximal and l_ring_middle as functions of BAP #143, l_ring_flexion2. The vertices with indices 50, 51, 52, 53 and 54 on the surface l_ring_proximal are deformed. The displacements for vertex 50 are (-0.11 0.12 0), (-0.25 0.25 0), (-0.35 0.37 0), (-0.4 0.47 0) and (-0.52 0.6 0) for the l_ring_flexion2 values 100, 200, 300, 450 and 500, respectively. By controlling the shape at the vertex level, muscle-like deformations can be achieved. However, realistic deformations are possible only if the deformation tables are big enough and involve a large number of vertices (usually up to 30).

For compactness purposes in the deformation field specification, the standard allows a BDT interpolation method exploiting the reference BDTs associated with key frames. Fig. 10 shows the results of the BDT-based interpolation technique in the case of a simple finger flexion movement. A movie presenting the BDT deformation mechanism for the finger flexion is available at www-artemis.int-evry.fr/~preda/2002ICJ.
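For illustration, a minimal sketch of applying one BDT with linear interpolation of the displacements between the tabulated bapCombinations key values is given below. The array layout (one displacement per listed vertex and per key value) mirrors the listing above, while the clamping and the linear interpolation rule are assumptions made for the example.

```python
import numpy as np

def bdt_displacements(bap_value, bap_keys, table):
    """Interpolate the per-vertex displacements for a given BAP value.
    bap_keys: e.g. [100, 200, 300, 450, 500]; table: shape (n_keys, n_vertices, 3)."""
    if bap_value <= bap_keys[0]:      # below the first key value: clamp (simplification)
        return table[0]
    if bap_value >= bap_keys[-1]:     # beyond the last key value: clamp
        return table[-1]
    k = int(np.searchsorted(bap_keys, bap_value)) - 1
    t = (bap_value - bap_keys[k]) / (bap_keys[k + 1] - bap_keys[k])
    return (1.0 - t) * table[k] + t * table[k + 1]

def apply_bdt(vertices, vertex_ids, bap_value, bap_keys, table):
    """Add the interpolated local displacements to the listed segment vertices."""
    deformed = vertices.copy()
    deformed[vertex_ids] += bdt_displacements(bap_value, np.asarray(bap_keys), table)
    return deformed
```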

3.3. 3D virtual body animation parameter coding

On the one hand, the independence between a generic 3D model and the BAP description makes it possible to avoid transmitting the 3D model during animation. On the other hand, BAP encoding ensures a very low bit-rate transmission. Two encoding methods (predictive and DCT-based) are included in the standard.

In the first method, the BAPs are quantized and coded by a predictive coding scheme. For each parameter to be coded in frame n, the decoded value of this parameter in frame n - 1 is used as a prediction. The prediction error is then encoded by arithmetic coding. This scheme prevents encoding error accumulation.

Fig. 9. The ARTEMIS avatar animation interface and its main functionalities (BAP editing, interpolation and UDP-based communication).


Since BAPs can be assigned different precision requirements, different quantization step sizes are applied. They consist of a local (BAP-specific) step size and a global one (used for bit-rate control). The quantized values are passed to the adaptive arithmetic encoder. The coding efficiency is increased by providing the encoder with range estimates for each BAP.
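A minimal sketch of the predictive scheme is given below; the single quantization step is illustrative, and the adaptive arithmetic coder is omitted: the residuals it would encode are simply collected.

```python
import numpy as np

def encode_predictive(bap_frames, step):
    """Quantize each BAP and emit the residual with respect to the previously
    decoded (quantized) value, so that quantization errors do not accumulate."""
    residuals = []
    prev = np.zeros(bap_frames.shape[1], dtype=int)
    for frame in bap_frames:
        q = np.round(frame / step).astype(int)   # quantized BAP values
        residuals.append(q - prev)               # residual, normally arithmetic-coded
        prev = q                                 # the decoder reconstructs the same value
    return residuals

def decode_predictive(residuals, step):
    """Accumulate the residuals and de-quantize to recover the BAP frames."""
    frames, prev = [], np.zeros_like(residuals[0])
    for r in residuals:
        prev = prev + r
        frames.append(prev * step)
    return np.array(frames, dtype=float)
```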

We have tested the MPEG-4 animation parameter encoding schemes on BAP data corresponding to the ASL alphabet. Regarding motion complexity, we note that all the signs are performed with one hand and that the avatar body position and orientation do not change.

The DCT-based coding method splits the BAP time sequences into BAP segments made of 16 consecutive BAP frames. Encoding a BAP segment includes three steps, achieved for all BAPs: (1) determining the 16 coefficient values by using the discrete cosine transform (DCT), (2) quantizing and coding the AC coefficients, and (3) quantizing and differentially coding the DC coefficients. The global quantization step Q for the DC coefficients can be controlled, and the global quantization step for the AC coefficients is Q/3. The DC (continuous component) coefficient of an intra-coded segment is encoded as is; for an inter-coded segment, the DC coefficient of the previous segment is used as a prediction of the current DC coefficient. The prediction error and the AC (alternative component) coefficients, for both inter- and intra-coded segments, are coded by using Huffman tables.
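A minimal sketch of the DCT-based scheme for a single BAP track is given below; SciPy's DCT is used for brevity, the Huffman stage is omitted (only the quantized coefficients are produced), and intra coding of the first segment is assumed.

```python
import numpy as np
from scipy.fft import dct, idct

SEGMENT = 16  # a BAP segment is made of 16 consecutive frames

def encode_dct(track, Q):
    """DCT each 16-frame segment, quantize DC with step Q and AC with step Q/3,
    and differentially code the DC coefficients between segments."""
    coded, prev_dc = [], 0
    for k in range(0, len(track) - len(track) % SEGMENT, SEGMENT):
        coeffs = dct(track[k:k + SEGMENT], norm='ortho')
        dc = int(round(coeffs[0] / Q))
        ac = np.round(coeffs[1:] / (Q / 3.0)).astype(int)
        coded.append((dc - prev_dc, ac))          # DC prediction error + AC coefficients
        prev_dc = dc
    return coded

def decode_dct(coded, Q):
    """Invert the quantization and the segment-wise DCT."""
    frames, prev_dc = [], 0
    for dc_err, ac in coded:
        prev_dc += dc_err
        coeffs = np.concatenate(([prev_dc * Q], ac * (Q / 3.0)))
        frames.extend(idct(coeffs, norm='ortho'))
    return np.array(frames)
```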

Table 1 shows the results obtained by applying both encoding methods, predictive-based and DCT-based, to the BAP files associated with the signs "A" to "L". In our experiments, the animation frame rate is 10 frames per second.

Table 1
Bit-rates for the predictive-based (P) and DCT-based (DCT) coding schemes

Sign    Q = 1          Q = 2          Q = 4          Q = 8          Q = 16         Q = 31
        P     DCT      P     DCT      P     DCT      P     DCT      P     DCT      P     DCT
A       1.32  2.80     1.30  2.42     1.25  1.83     1.22  1.54     1.18  1.14     1.14  0.86
B       1.41  2.61     1.35  2.22     1.34  1.64     1.32  1.36     1.32  1.08     1.29  0.81
C       1.80  3.65     1.78  3.15     1.71  2.24     1.68  1.87     1.63  1.41     1.58  1.10
D       1.45  2.66     1.42  2.30     1.38  1.69     1.35  1.40     1.32  1.01     1.31  0.74
E       1.75  3.23     1.71  2.85     1.65  2.13     1.62  1.74     1.57  1.30     1.53  1.00
F       1.37  2.04     1.33  1.83     1.31  1.36     1.28  1.12     1.26  0.88     1.25  0.67
G       1.81  2.83     1.75  2.43     1.72  1.76     1.67  1.47     1.63  1.12     1.57  0.84
H       1.78  2.83     1.74  2.39     1.70  1.70     1.65  1.40     1.63  1.08     1.58  0.76
I       1.53  2.70     1.49  2.39     1.46  1.73     1.41  1.45     1.39  1.10     1.38  0.81
J       1.37  2.20     1.33  1.89     1.31  1.34     1.27  1.11     1.24  0.82     1.21  0.59

Results for signs "A" to "L". Q denotes the global quantization value.

Fig. 10. BDT interpolation. Key frames (a) and (d); interpolated frames (b) and (c).


In order to objectively compare the two coding schemes, we introduce a distortion measure between the original and the decoded sequences, defined as the mean square error of the BAP vectors:

    D = (1 / N_frames) Σ_{i=0}^{N_frames} || BAP_i^(o) - BAP_i^(d) ||^2,    (5)

where BAP_i^(o) and BAP_i^(d) denote, respectively, the original and the decoded BAP vectors of frame i.

Let us analyze the performances and behaviors of the compression methods by studying the bit-rate as a function of the distortion. Both encoding methods allow the bit-rate to be controlled by modifying the global quantization step (Fig. 11(II)). Computing, for each quantization step, the distortion as expressed in Eq. (5) enables us to plot Fig. 11(I). Fig. 11(III) results from the combination of Figs. 11(I) and (II).
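A short sketch of evaluating Eq. (5) on an original and a decoded BAP sequence; the arrays are assumed to have shape (number of frames, number of BAPs).

```python
import numpy as np

def bap_distortion(original, decoded):
    """Mean square error of the BAP vectors, as in Eq. (5)."""
    diff = np.asarray(original) - np.asarray(decoded)
    return float(np.mean(np.sum(diff * diff, axis=1)))
```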

Let us observe that both curves in Fig. 11(III) are decreasing; the slope of the predictive-based curve is significantly smaller than that of the DCT-based curve. When dealing with a wide range of bit-rates, the DCT-based method is more appropriate. For applications requiring near-lossless compression, the predictive-based method is recommended.

In summary, the FBA specifications related to virtual human characters offer a basic framework for animation, allowing the representation of the avatar as a segmented character. Moreover, the motion parameterization is very compact and makes it possible, in compressed form, to achieve streaming animation in very low-bit-rate network environments. However, the FBA framework is limited to human-like VC animation. Therefore, to animate any articulated figure, MPEG-4 proposes the BBA specifications.

4. MPEG-4 bone-based animation

4.1. Context and objectives

An articulated figure, also called a kinematics linkage, consists of a series of rigid links that are connected at joints.

Fig. 11. Experimental results for sign "B": predictive (bullet) vs. DCT-based BAP coding schemes. Distortion versus Q (I), bit-rate versus Q (II) and distortion versus bit-rate (III).


In order to define a static 3D pose of an articulated figure, including its geometry, color and texture attributes, the functionality addressed consists in considering the entire figure as one single 3D mesh, referred to as a global seamless mesh. In this case, the bone skeleton is provided together with a field of weighting vectors specifying, for each vertex of the mesh, the related influence of the bones directly altering the 3D position of the vertex. Moreover, the weighting factors can be specified implicitly and more compactly by means of two influence regions defined through a set of cross sections attached to the bone.

The generic method consisting in animating a VC from its skeleton is called Skin&Bones. Applications using this concept deal with the realistic animation of any kind of articulated model. The AFX Skin&Bones related nodes allow the definition of any kind of skeleton hierarchy, with any kind of attached geometry (based on IndexedFaceSet or higher-order geometry nodes like NURBS and subdivision surfaces), and any appearance attributes. The Skin&Bones animation bitstream, simply called BBA, provides a compact representation of the motion parameters.

The seamless mesh-based representation overcomes the current limitations of MPEG-4 Amendment 1 and is able to provide a realistic figure animation without specifying deformation information. The AFX specifications also make it possible to define seamless parts of an articulated model and to group them together in order to achieve real-time animation. Two aspects are addressed by the Skin&Bones framework. The first one deals with the definition of the skinned model as a static model by means of its geometry and appearance; a hierarchical skeleton is also semantically attached as part of the model definition. The second aspect deals with the animation of articulated models, and more specifically with the compressed representation of the animation parameters.

4.2. Skinned model definition parameters

4.2.1. Semantics

Defining a skinned model involves specifying its static attributes as well as its animation behavior. From a geometric point of view, a skinned model is such that the set of vertices which belong to the "skin" of the model is defined as a unique list. All the shapes which form the skin share this same list of vertices. This representation type avoids seams at the skin level during the animation stage. To ensure the possibility of defining various appearances at different levels of the skinned model, the skin is defined as a collection of shapes. For each one, it is possible to define its own set of color, texture and material attributes, and each one includes a geometry node field which refers to the skinned model vertex list.

The animation behavior of a skinned model is defined by means of a skeleton and its properties. The skeleton is a hierarchical structure constructed from bones. Three types of information are associated with a bone:

1. the relative geometrical transformation of the bone with respect to its parent in the skeleton hierarchy;

2. the influence of the bone movement on the surface of the articulated model;

3. IK related data, specific to the considered bone.

Each bone has an influence on the skin surface. Thus, by changing the position or orientation of one bone, some vertices of the model skin will be affected by translation components, specified for each vertex. Here, defining the skinned model consists in specifying, for each skeleton bone, an influence region, i.e. the subset of affected skin vertices and the related measure of affectedness (a set of weighting coefficients). The influence region can be directly specified by the designer or can be computed just before performing the animation. In the first case, the list of affected vertices and the weighting coefficients are part of the bone definition. In the second case, the key concept relies on a family (φ_d)_d of bone-related weighting functions defined on arbitrary planes at a distance d from the bone center and perpendicular to the bone. The support of the planar weighting function φ_d is partitioned into three specific zones (Z_int, Z_mid and Z_ext) by two concentric circles. These zones are defined by two parameters, namely the inner (r_d) and outer (R_d) radius (Fig. 12).


The planar weighting function φ_d(r_d, R_d) is defined as follows:

    φ_d(r_d, R_d)(x) = 1,                                for all x ∈ Z_int,
    φ_d(r_d, R_d)(x) = f( d(x, Z_ext) / (R_d - r_d) ),   for all x ∈ Z_mid,    (6)
    φ_d(r_d, R_d)(x) = 0,                                for all x ∈ Z_ext,

where d(x, Z_ext) denotes the Euclidean distance from x to Z_ext and f(·) is a user-specified fall-off function to be chosen among the following standardized functions: x^3, x^2, x, sin(x), x^(1/2), x^(1/3).

The number and position (distance from the bone center) of the planes are specified by the designer. Fig. 13 shows an example of a skeleton and the surface mesh representing the skin. Here, two planes have been considered relative to the first phalange of the index finger.
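For illustration, a minimal sketch of Eq. (6) for a point lying on one of the planes: the zones are interpreted as concentric circles around the intersection of the bone with the plane (cf. Fig. 12), so the distance from the point to Z_ext is R_d minus the point's radial distance; the fall-off codes follow the falloff field values given in Section 4.2.2.1.

```python
import numpy as np

FALLOFF = {               # standardized fall-off functions f(x), indexed by falloff code
    -1: lambda x: x**3,
     0: lambda x: x**2,
     1: lambda x: x,
     2: lambda x: np.sin(x),
     3: lambda x: np.sqrt(x),
     4: lambda x: x**(1.0 / 3.0),
}

def planar_weight(radial_dist, r_d, R_d, falloff=1):
    """Weight of Eq. (6) as a function of the point's radial distance to the bone:
    1 in the inner zone, a fall-off of the normalized distance to Z_ext in the
    middle zone, and 0 in the outer zone."""
    if radial_dist <= r_d:                     # x in Z_int
        return 1.0
    if radial_dist >= R_d:                     # x in Z_ext
        return 0.0
    d_ext = R_d - radial_dist                  # Euclidean distance from x to Z_ext
    return float(FALLOFF[falloff](d_ext / (R_d - r_d)))   # x in Z_mid
```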

The bone influence zone being defined, animating the VC consists in translating the mesh vertices with respect to the bone transforms. For any skin vertex v_i = (t_x^i, t_y^i, t_z^i)^T, the new position induced by the geometrical transformation of bone b_j is computed in three steps as follows:

(1) calculate the transform matrix [55] M_j:

    M_j = T_{b_j} C_{b_j} R_{b_j} SR_{b_j} S_{b_j} (SR_{b_j})^{-1} (C_{b_j})^{-1},    (7)

    where T_b, R_b, S_b, SR_b and C_b are the bone translation, rotation, scale, scaleOrientation and center matrices, respectively;

(2) compute the displacement vector:

    d_i^j = M_j v_i w_i^j,    (8)

    where w_i^j is the weighting coefficient related to bone b_j and associated with v_i;

(3) compute the new position of v_i:

    v_i ← v_i + d_i^j.    (9)
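For illustration, a minimal linear-blend-skinning sketch in the spirit of Eqs. (7)-(9): each 4x4 matrix M_j is assumed to be the bone transform relative to the rest pose, and the deformed position is accumulated as a weighted sum of bone-transformed positions (a common simplification of the displacement formulation above; the array shapes are assumptions made for the example).

```python
import numpy as np

def skin_vertices(rest_vertices, bone_matrices, weights):
    """rest_vertices: (n, 3) float rest-pose positions; bone_matrices: list of 4x4
    transforms M_j; weights: (n, n_bones) array whose rows sum to 1 (the
    skinCoordWeight constraint). Returns the deformed vertex positions."""
    n = rest_vertices.shape[0]
    homogeneous = np.hstack([rest_vertices, np.ones((n, 1))])   # (n, 4)
    deformed = np.zeros(rest_vertices.shape)
    for j, M in enumerate(bone_matrices):
        transformed = (homogeneous @ M.T)[:, :3]                # M_j applied to every vertex
        deformed += weights[:, j:j + 1] * transformed           # weighted accumulation
    return deformed
```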

Let us note that a bone movement is always specified by a geometric transformation with respect to the initial registration of the VC's static position.

A set of adjacent bones forms a kinematics chain. For kinematics chains with a large number of elements, it is more appropriate to animate them by using IK techniques rather than by directly specifying the transformation of each element. To support this kind of animation, the skinned model definition is enriched with IK related data. Because of the rapid evolution in hardware capabilities, it is not appropriate for the standard to impose a specific IK solver. Supporting IK is reduced to defining specific rotation and translation constraints at the bone level. Since a bone is just a semantic entity, the end effector is usually defined as a 3D point on the skinned model surface.

In many cases, especially for complicated skinned models, it is possible to identify regions of the skin which are influenced by only a small subset of the skeleton bones. In such cases, to address optimized animation, the model has to be considered as a collection of skinned models. Thus, the bones from one skinned model do not deform

Fig. 12. The support partitioning of the planar weighting function φ_d(r_d, R_d).

Fig. 13. Hand shape, skeleton and bone planes.


the skin of the others. A typical example is to consider the fingers, the hand and a part of the forearm as a 3D object segmented from the rest of a humanoid character. The upper part of the forearm and the rest of the body induce changes in the position and the orientation of this object in the scene, but do not induce deformation effects on the segment skin. With this optimization in mind, the definition of a skinned model within AFX makes it possible to add other 3D objects at any level of the skeleton, including other skinned models.

The bones which compose the skeleton are rigid objects: while their transformation is possible, their deformation is not. To address realistic skinned model animation, the "muscle" concept is introduced. A "muscle" is a curve with an influence region on the model skin, and it can be deformed. To ensure a generic representation of the "muscle" form, a curve representation based on NURBS is used.

4.2.2. Node specification

Within a seamless-mesh-based representation, the descriptive structure relies on node definitions. The MPEG-4 standard architecture, based on VRML, allows new tools to be added by defining the related nodes and the associated stream. Within this requirement, our contribution [45,46] to the standard consists in specifying the node interfaces in order to support the concepts described in the previous subsection. Let us detail the node structures by describing each field.

The SBBone node specifies data related to a bone of the skeleton, while the SBSkinnedModel node is related to the character skin properties. The SBSegment node enables any kind of standalone 3D object to be added into the skeleton hierarchy. To address the IK issue or to define semantic points on the model surface, the SBSite node is introduced. In order to take into account local deformation effects based on curve deformation, the SBMuscle node is defined.

Let us note that a similar approach was recently adopted by the H-Anim 2001 specifications [27], but exclusively within a humanoid animation framework.

4.2.2.1. SBBone node. The syntax of the SBBone node is the following:

SBBone {
  eventIn      MFNode     addChildren
  eventIn      MFNode     removeChildren
  exposedField SFInt32    boneID           0
  exposedField MFInt32    skinCoordIndex   [ ]
  exposedField MFFloat    skinCoordWeight  [ ]
  exposedField SFVec3f    endpoint         0 0 1
  exposedField SFInt32    falloff          1
  exposedField MFFloat    sectionPosition  [ ]
  exposedField MFFloat    sectionInner     [ ]
  exposedField MFFloat    sectionOuter     [ ]
  exposedField SFInt32    rotationOrder    0
  exposedField MFNode     children         [ ]
  exposedField SFVec3f    center           0 0 0
  exposedField SFRotation rotation         0 0 1 0
  exposedField SFVec3f    translation      0 0 0
  exposedField SFVec3f    scale            0 0 0
  exposedField SFRotation scaleOrientation 0 0 1 0
  exposedField SFInt32    IkchainPosition  0
  exposedField MFFloat    IkyawLimit       [ ]
  exposedField MFFloat    IkpitchLimit     [ ]
  exposedField MFFloat    IkrollLimit      [ ]
  exposedField MFFloat    IKTxLimit        [ ]
  exposedField MFFloat    IKTyLimit        [ ]
  exposedField MFFloat    IKTzLimit        [ ]
}

The SBBone node specifies four kinds of information: semantic data, bone motion, bone-skin influence and bone IK constraints.

The boneID field is a unique identifier which allows the bone to be addressed at run-time.

The complete rigid transform of the bone, as well as a scale factor with respect to an arbitrary direction, are specified by means of the center, translation and rotation fields and the scale and scaleOrientation fields, respectively. Thus, the possible geometric 3D transformation consists of (in order): (1) possibly a non-uniform scale about an arbitrary point, (2) a rotation about an arbitrary point and axis, and (3) a translation. The rotationOrder field specifies the rotation order when the rotation is decomposed with respect to the coordinate system axes.

Two ways of specifying the skin influence region of the bone are possible:

1. directly "painting" the bone influence on skin vertices by instantiating the skinCoordIndex and skinCoordWeight fields. The skinCoordIndex field enumerates the indices of all the skin vertices affected by the current bone. Most often, the skin influence region is made of vertices belonging to the 3D neighborhood of the bone; however, special influence configurations can also be addressed. The skinCoordWeight field is a list of weights (one per vertex listed in skinCoordIndex) that measures the contribution of the current bone to the vertex under consideration. The sum of all the skinCoordWeight values related to a given vertex must be 1.

2. computing the influence as a measure of the distance between the skin vertices and the bone within several planes (cf. Section 4.2). The sectionInner field (respectively, sectionOuter field) is a list of inner (respectively, outer) influence region radii for different planes. The sectionPosition field is a list of the plane orientations defined by the designer. The falloff field specifies the function relating the affectedness amplitude to the distance: -1 for x^3, 0 for x^2, 1 for x, 2 for sin(x), 3 for x^{1/2} and 4 for x^{1/3}. In order to compute the influence region as explained in Section 4.2, the localization of the bone is specified by the center and endpoint fields. A small sketch of this distance-based weighting is given after this list.
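As an informal illustration of this second mechanism, the following Python sketch evaluates such a distance-based weight. The helper influence_weight, the full-influence behaviour inside the inner radius, the zero influence beyond the outer radius and the normalization of the distance are assumptions made for illustration; only the falloff codes themselves are fixed by the node syntax.

import math

# Falloff codes as listed above:
# -1 -> x^3, 0 -> x^2, 1 -> x, 2 -> sin(x), 3 -> x^(1/2), 4 -> x^(1/3)
FALLOFF = {
    -1: lambda x: x ** 3,
     0: lambda x: x ** 2,
     1: lambda x: x,
     2: lambda x: math.sin(x),
     3: lambda x: math.sqrt(x),
     4: lambda x: x ** (1.0 / 3.0),
}

def influence_weight(dist, inner, outer, falloff_code):
    """Hypothetical weighting: full influence inside the inner radius,
    no influence beyond the outer radius, and a falloff-shaped decay in
    between (the exact blending is not fixed by the node syntax)."""
    if dist <= inner:
        return 1.0
    if dist >= outer:
        return 0.0
    # normalized distance to the outer boundary, in [0, 1]
    x = (outer - dist) / (outer - inner)
    return FALLOFF[falloff_code](x)

# Example: a vertex 0.6 units from the bone, inner radius 0.2,
# outer radius 1.0, quadratic falloff (code 0)
print(influence_weight(0.6, 0.2, 1.0, 0))   # 0.25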

The two schemes can be used independently or in combination. In the latter case, the individual vertex weights take precedence.

The IK-related information of a bone deals with positioning the bone within a kinematics chain and defining the possible motion constraints of the bone. If the bone is the root of an IK chain, then IkchainPosition = 1; in this case, when applying the IK scheme, only the orientation of the bone is changed. If the bone is the last element in the kinematics chain, IkchainPosition = 2; in this case, the animation stream has to include the desired position of the bone (X, Y and Z in world coordinates). If IkchainPosition = 3, the bone belongs to the IK chain but is neither the first nor the last one in the chain; in this case, the position and orientation of the bone are computed by the IK procedure. Finally, if the bone does not belong to any IK chain (IkchainPosition = 0), it is necessary to transmit the bone local transformation in order to animate the bone. If an animation stream contains motion information about a bone which has IkchainPosition = 1, this information will be ignored. If an animation stream contains motion information about a bone which has IkchainPosition = 3, this means that the animation producer wants to enforce the orientation of the bone, and the IK solver will use this value as a constraint.

The IK constraints of a bone are related to orientation and translation information. The IkyawLimit (respectively, IkpitchLimit and IkrollLimit) field consists of a pair of min/max values which limit the bone rotation with respect to the X (respectively, Y and Z) axis. The IKTxLimit (respectively, IKTyLimit and IKTzLimit) field consists of a pair of min/max values which limit the bone translation in the X (respectively, Y and Z) direction.

The SBBone node is used as a building block to describe the hierarchy of the articulated model by attaching one or more child objects. The children field has the same semantics as in VRML. Moreover, to support dynamic changes of the skeleton structure, the node contains the addChildren and removeChildren event fields. The absolute geometric transformation of any child of a bone is obtained by composition with the bone-parent transformation.
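The following Python sketch makes this composition explicit by accumulating 4 x 4 transforms down a small bone hierarchy. The Bone class, the matrix helpers and the two-bone example are illustrative assumptions; a real SBBone additionally supports scaling and a rotation/scaling center, omitted here for brevity.

import numpy as np

def translation_matrix(t):
    m = np.eye(4)
    m[:3, 3] = t
    return m

def rotation_z(angle):
    c, s = np.cos(angle), np.sin(angle)
    m = np.eye(4)
    m[:2, :2] = [[c, -s], [s, c]]
    return m

class Bone:
    """Hypothetical stand-in for an SBBone: a local transform plus children."""
    def __init__(self, name, local, children=()):
        self.name, self.local, self.children = name, local, list(children)

def absolute_transforms(bone, parent=np.eye(4), out=None):
    """Accumulate parent * local down the skeleton, as the SBBone
    children semantics imply."""
    out = {} if out is None else out
    world = parent @ bone.local
    out[bone.name] = world
    for child in bone.children:
        absolute_transforms(child, world, out)
    return out

# A two-bone chain: the forearm inherits the upper-arm transform.
upper = Bone("upper_arm", rotation_z(np.pi / 2))
fore  = Bone("forearm", translation_matrix([0.3, 0, 0]))
upper.children.append(fore)
print(absolute_transforms(upper)["forearm"][:3, 3])  # forearm origin in world space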

4.2.2.2. SBSegment node. The SBSegment nodesyntax is the following:

SBSegment {
  exposedField SFString name             ""
  exposedField SFVec3f  centerOfMass     0 0 0
  exposedField SFVec3f  momentsOfInertia [ 0 0 0 0 0 0 0 0 0 ]
  exposedField SFFloat  mass             0
  exposedField MFNode   children         [ ]
  eventIn      MFNode   addChildren
  eventIn      MFNode   removeChildren
}

The name field must be present, so that the SBSegment can be identified at run time.

The physical properties of a segment are defined by the mass (the total mass of the segment), centerOfMass (the location of the center of mass within the segment) and momentsOfInertia (the moment of inertia matrix) fields.

The children field can contain any object attached to the skeleton at this level, including an SBSkinnedModel.

An SBSegment node is a grouping node especially introduced to address two issues:

1. the requirement to separate the skinned model into deformation-independent parts. Between two deformation-independent parts, the geometrical transformation of one of them does not imply skin deformations on the other. This is essential for run-time animation optimization. The SBSegment node may contain an SBSkinnedModel node as a child. Portions of the model which are not part of the seamless mesh can be attached to the skeleton hierarchy by using an SBSegment node;
2. the requirement to attach standalone 3D objects to different parts of the skeleton hierarchy. For example, a ring can be attached to a finger; the ring geometry and attributes are defined outside of the skinned model, but the ring will have the same local geometrical transformation as the bone it is attached to.

4.2.2.3. SBSite node. The syntax of the SBSitenode is expressed as follows:

SBSite {
  exposedField SFVec3f    center           0 0 0
  exposedField MFNode     children         [ ]
  exposedField SFString   name             ""
  exposedField SFRotation rotation         0 0 1 0
  exposedField SFVec3f    scale            1 1 1
  exposedField SFRotation scaleOrientation 0 0 1 0
  exposedField SFVec3f    translation      0 0 0
  eventIn      MFNode     addChildren
  eventIn      MFNode     removeChildren
}

The SBSite node indicates a precise 3D point, which may or may not belong to the skin, usually used to localize an end effector. The 3D point is obtained by applying, on top of the current transformation, the one obtained from the center, rotation, scale, scaleOrientation and translation fields. The children field is used to store any object that can be attached to the SBSegment node.

The SBSite node can be used in three cases: (1) to define an "end effector", i.e. a location which can be used by an IK solver, (2) to introduce an attachment point for accessories such as clothing, and (3) to specify a location for a virtual camera in the reference frame of an SBSegment node.

SBSite nodes are stored within the children field of an SBSegment node. The SBSite node is a specialized grouping node that defines a coordinate system for the nodes in its children field relative to the coordinate system of its parent node. The reason an SBSite node is considered a specialized grouping node is that it can only be defined as a child of an SBSegment node.

4.2.2.4. SBMuscle node. The syntax of theSBMuscle node is defined as follows:

SBMuscle {
  exposedField MFInt32 skinCoordIndex  [ ]
  exposedField MFFloat skinCoordWeight [ ]
  exposedField SFNode  muscleCurve
  exposedField SFInt32 radius          1
  exposedField SFInt32 falloff         1
}

The SBMuscle node makes it possible to add local deformations simulating muscle action at the skin level. A muscle is defined by a curve and by the area of influence of this curve. In general, a muscle is a child of an SBBone node. The muscle influence can be defined according to two mechanisms:

1. direct "painting", by listing the affected vertices of the skin (skinCoordIndex) and the affectedness measure (skinCoordWeight).

2. computation of the influence region from the radius and falloff fields. The radius field specifies the maximum distance at which the "muscle" affects the skin. The falloff field specifies the function relating the affectedness amplitude to the distance: -1 for x^3, 0 for x^2, 1 for x, 2 for sin(x), 3 for x^{1/2} and 4 for x^{1/3}.

The muscle curve representation is based on the NurbsCurve representation defined in [9]:

NurbsCurve {
  field        MFFloat knot         [ ]
  field        SFInt32 order        3
  exposedField MFVec3f controlPoint [ ]
  exposedField MFFloat weight       [ ]
  exposedField SFInt32 tessellation
}

Performing a deformation consists in changing the form of the muscle curve by affecting (1) the position of the control points of the curve, (2) the weights of the control points and/or (3) the knot sequence. Depending on the producer, the animation stream contains one of these animation mechanisms or a combination of (1), (2) and (3).

At the modeling stage, each affected vertex v_i of the skin is assigned to the closest point v_ci of the curve. During the animation, the translation of v_ci, obtained from the updated values of the controlPoint, weight and/or knot fields, induces a translation of v_i:

1. if the skinCoordWeight field is specified for vertex v_i, then

T_{v_i} = skinCoordWeight[k] · T_{v_ci},   (10)

where k is the index of vertex v_i in the model vertex index list;

2. if the radius field is specified, then

T_{v_i} = f(d(v_i, v_ci)/radius) · T_{v_ci},   (11)

with f(·) the function specified by the falloff field.
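A minimal Python sketch of this propagation, applying Eqs. (10) and (11) exactly as written, is given below; the function name and its arguments are assumptions, and note that the text leaves implicit how the normalized distance is oriented (i.e. whether vertices closer to the curve receive larger or smaller weights), so the falloff is passed in as a plain function.

def muscle_vertex_translation(t_curve_point, skin_weight=None,
                              dist=None, radius=None, falloff=lambda x: x):
    """Propagate the translation of the closest muscle-curve point to a
    skin vertex, following Eqs. (10) and (11): either an explicit
    per-vertex weight or a radius/falloff-based attenuation scales it."""
    tx, ty, tz = t_curve_point
    if skin_weight is not None:                       # Eq. (10)
        w = skin_weight
    else:                                             # Eq. (11)
        w = falloff(dist / radius)
    return (w * tx, w * ty, w * tz)

# The curve point moved by (0, 0.1, 0); the vertex sits at half the
# influence radius with a quadratic falloff.
print(muscle_vertex_translation((0.0, 0.1, 0.0), dist=0.5, radius=1.0,
                                falloff=lambda x: x ** 2))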

4.2.2.5. SBSkinnedModel node. The SBSkinnedModel node syntax is the following:

SBSkinnedModel {
  exposedField SFString   name                       ""
  exposedField SFVec3f    center                     0 0 0
  exposedField SFRotation rotation                   0 0 1 0
  exposedField SFVec3f    translation                0 0 0
  exposedField SFVec3f    scale                      1 1 1
  exposedField SFRotation scaleOrientation           0 0 1 0
  exposedField MFNode     skin                       [ ]
  exposedField SFNode     skinCoord                  NULL
  exposedField SFNode     skinNormal                 NULL
  exposedField MFNode     skeleton                   [ ]
  exposedField MFNode     bones                      [ ]
  exposedField MFNode     muscles                    [ ]
  exposedField MFNode     segments                   [ ]
  exposedField MFNode     sites                      [ ]
  exposedField SFNode     weighsComputationSkinCoord NULL
}

The SBSkinnedModel node is at the top of the hierarchy of Skin and Bones related nodes and contains the definition parameters for the entire seamless model or for a seamless part of the model. Mainly, the node contains:

1. a geometrical transformation which poses the character in the scene when no animation is performed. The transformation is generic and specified by means of the following fields: center, translation, rotation, scale and scaleOrientation;
2. a list of all the vertices of the skin. The skinCoord field contains the 3D coordinates of all the vertices of the seamless model;
3. a list of shapes which build the character skin. The skin field consists of a collection of shapes that share the same skinCoord. This mechanism makes it possible to consider the model as a continuous mesh and, at the same time, to attach different attributes (such as color and texture) to different parts of the model;
4. a skeleton hierarchy. The skeleton field specifies the root of the bone hierarchy;
5. a list of all the bones, segments, sites and muscles which belong to the character. The bones, segments, sites and muscles fields consist of the lists of all previously defined SBBone, SBSegment, SBSite and SBMuscle nodes, respectively;
6. the weighsComputationSkinCoord field, which describes a specific static position of the skinned model. In many cases, the static position of the articulated model defined by the skinCoord and skin fields is not appropriate for computing the influence region of a bone. In this case, the weighsComputationSkinCoord field makes it possible to specify the skinned model vertices in a more appropriate static posture. This posture is used only during the initialization stage and ignored during the animation. All the skeleton transformations are related to the posture defined by the skinCoord field;
7. the name field, which specifies the name of the skinned model, allowing easy identification at animation run time.

4.3. Skinned model animation parameters

4.3.1. Animation principle

Animating a 3D articulated model requires knowledge of the position of each model vertex at each key frame. Specifying such data is an enormous and expensive task. For this reason, the AFX animation system uses bone-based modeling of articulated models, which effectively attaches the model vertices to a bone hierarchy (skeleton). This technique avoids the need to specify the position of each vertex; only the local transformation of each bone in the skeleton is considered. The local bone transformation components (translation, rotation, scale, scaleOrientation and center) are specified at each frame and, at the vertex level, the transformation is obtained by using the bone-vertex influence region.
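Although the exact blending formula is not reproduced in this excerpt, the bone-vertex influence region implies a weighted blending of the bone transforms, in the spirit of classical linear blend skinning. The following Python sketch is an illustration of that idea under this assumption; skin_vertex and its arguments are hypothetical names.

import numpy as np

def skin_vertex(rest_position, bone_matrices, weights):
    """Blend the bone transforms acting on one skin vertex, weighted by
    the bone-vertex influence (weights are assumed to sum to 1, as
    required for skinCoordWeight)."""
    v = np.append(rest_position, 1.0)            # homogeneous coordinates
    blended = sum(w * (m @ v) for m, w in zip(bone_matrices, weights))
    return blended[:3]

# Two bones influence the vertex equally: one leaves it in place,
# the other translates it by 0.2 along X.
identity = np.eye(4)
shifted = np.eye(4)
shifted[0, 3] = 0.2
print(skin_vertex(np.array([1.0, 0.0, 0.0]), [identity, shifted], [0.5, 0.5]))
# -> [1.1, 0.0, 0.0]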

To address streamed animation, the animation data is considered separately (independently of the model definition) and is specified for each key frame.

Animating a skinned model is achieved by updating the skeleton geometric transformation components and by transforming the bones and/or deforming the muscle curve.

A general transformation of a bone, as defined by the SBBone node, involves a translation in any direction, a rotation with respect to any rotation axis, and a scaling with respect to any direction and axis.

However, many motion editing systems decompose the orientation according to Euler angles. In practice, when fewer than three angles are sufficient to describe a joint transformation, an Euler angle-based representation is more appropriate. To compact the animation stream, a rotation is therefore represented as Euler angles in the stream. To ensure the bijectivity [30] of the transformation between the Euler angle notation and the rotation matrix (or quaternion representation), the rotationOrder field was introduced in the SBBone node. A triplet of Euler angles [θ1, θ2, θ3] describes how a coordinate system r rotates with respect to a static coordinate system s, here how a bone coordinate system rotates with respect to its parent coordinate system. The triplet is interpreted as a rotation by θ1 around an axis A1, followed by a rotation by θ2 around an axis A2, and by a rotation by θ3 around an axis A3, with A2 different from both A1 and A3. The rotation axes are restricted to the coordinate axes X, Y and Z, resulting in 12 order possibilities [53]. By considering the axes either in the bone coordinate system r or in its parent coordinate system s, there are 24 possible values for rotationOrder.
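A small Python sketch of this decomposition is given below: it composes the three axis rotations in the (A1, A2, A3) order given by a rotationOrder-like string. Only the 12 fixed-axis orders are illustrated; whether successive rotations pre- or post-multiply (the r versus s frame distinction that doubles the count to 24) is left out, and the helper names are assumptions.

import numpy as np

def axis_rotation(axis, angle):
    """Rotation matrix about a coordinate axis 'X', 'Y' or 'Z'."""
    c, s = np.cos(angle), np.sin(angle)
    if axis == 'X':
        return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])
    if axis == 'Y':
        return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def euler_to_matrix(angles, order="ZYX"):
    """Compose the three axis rotations in the given order (A1, A2, A3);
    the middle axis must differ from its neighbours, e.g. 'ZYX' or 'XYX'."""
    r = np.eye(3)
    for axis, angle in zip(order, angles):
        r = r @ axis_rotation(axis, angle)
    return r

# 90-degree rotation about Z, no rotation about the other two axes.
print(euler_to_matrix([np.pi / 2, 0.0, 0.0], order="ZYX").round(3))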

model is performed by updating the SBBone

M. Preda, F. Preteux / Signal Processing: Image Communication 17 (2002) 717–741736

Page 21: Insights into low-level avatar animation and MPEG-4 standardization

transformation fields (translation, rotation, scale,center and scaleOrientation) and/or by updatingthe SBMuscle curve control points position,weight or knot sequence. The BBA stream con-tains all the animation frames or just the data atthe temporal key frames. In the last case thedecoder will compute the intermediate frames bytemporal interpolation.Linear interpolation is used for translation and

scale components,

linear(t0, t1, t) = t0 (1 − t) + t1 t,   (12)

where t0 is the translation in the first frame, t1 is the translation in the last frame, and t ∈ [0, 1]. Spherical linear quaternion interpolation is used for the rotation and scaleOrientation components,

Slerp(q0, q1, t) = [q0 sin((1 − t) Ω) + q1 sin(t Ω)] / sin(Ω),   (13)

where q0 is the rotation quaternion in the first frame, q1 is the rotation quaternion in the last frame, cos(Ω) = q0 · q1 and t ∈ [0, 1].
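A decoder could fill in the intermediate frames with direct implementations of Eqs. (12) and (13), as in the following Python sketch; the shorter-arc and near-parallel guards in slerp are practical additions that are not part of Eq. (13).

import numpy as np

def lerp(t0, t1, t):
    """Eq. (12): linear interpolation for translation and scale."""
    return (1.0 - t) * np.asarray(t0) + t * np.asarray(t1)

def slerp(q0, q1, t):
    """Eq. (13): spherical linear interpolation between unit quaternions."""
    q0, q1 = np.asarray(q0, float), np.asarray(q1, float)
    cos_omega = np.dot(q0, q1)
    if cos_omega < 0.0:            # take the shorter arc
        q1, cos_omega = -q1, -cos_omega
    if cos_omega > 0.9995:         # nearly parallel: fall back to lerp
        q = (1.0 - t) * q0 + t * q1
        return q / np.linalg.norm(q)
    omega = np.arccos(cos_omega)
    return (np.sin((1.0 - t) * omega) * q0 + np.sin(t * omega) * q1) / np.sin(omega)

# Half-way between no rotation and a 90-degree rotation about Z
# (quaternions in (x, y, z, w) order).
print(slerp([0, 0, 0, 1], [0, 0, np.sin(np.pi / 4), np.cos(np.pi / 4)], 0.5))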

Each key frame contains:

1. a natural integer KeyFrameIndex, which indicates to the decoder the number of frames that have to be obtained by interpolation. If it is zero, the decoder interprets the received frame as a normal frame and sends it to the animation engine. Otherwise, the decoder computes KeyFrameIndex intermediate frames and sends them to the animation engine, together with the content of the received key frame.

2. the boneID as well as the animation mask for each animated bone (cf. the description of the SBBone animation mask).

3. the muscleID as well as the animation mask for each animated muscle (cf. the description of the SBMuscle animation mask).

4. the new values of each bone transformation component which needs to be updated.

5. the new values of each muscle control point position and/or the weight and knot values which need to be updated.

Data related to items 1–3 represent the frame animation mask, while data related to items 4 and 5 yield the frame animation values.

A BBA stream contains the information related to a maximum of 1024 SBBone and 1024 SBMuscle nodes belonging to one or more skinned models. The identifier fields boneID and muscleID must be unique in the scene graph and must belong to the interval [0, ..., 1023].

4.3.2. SBBone animation mask and animation values

To address high compression efficiency, the generic bone transformation is represented within an animation bitstream as an animation vector of the elementary transform components, as presented in Table 2.

The size of the animation mask of a bone can vary from 2 bits, if there is motion on one single component, to 21 bits, if all the components of the local transformation change with respect to the previous key frame. When dealing with a complex rotation (on three axes), the animation mask has the form shown in Table 3.

Table 2
Geometric transformation components in the SBBone node definition and the corresponding animation vector in the bitstream

Node representation            Bitstream representation
SFVec3f translation            int Tx, Ty, Tz
SFRotation rotation            int RotAngleOnAxis1, int RotAngleOnAxis2, int RotAngleOnAxis3
SFVec3f scale                  int Sx, Sy, Sz
SFRotation scaleOrientation    int Ax1, Ax2, Ax3, RotVal
SFVec3f center                 int Cx, Cy, Cz

Table 3
Example of a bone animation mask for a generic rotation of the bone

Binary mask vector    Semantics
0                     isTranslation
1                     isRotation
1                     isRotationOnAxis1
1                     isRotationOnAxis2
1                     isRotationOnAxis3
0                     isScale
0                     isScaleOrientation
0                     isCenter


The animation values of a bone form a vector which contains the new values of all the affected components. For the above example, the animation value vector is [rotOnAxis1, rotOnAxis2, rotOnAxis3].
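The following Python sketch illustrates this flag-then-values principle for a single bone, reproducing the Table 3 example. The function and its argument names are assumptions, and the real bitstream syntax is more compact (the 2- to 21-bit variable-length mask described above) than this simplified list-of-flags view.

def bone_animation_mask(rotation=None, translation=None, scale=None,
                        scale_orientation=None, center=None):
    """Build a per-bone animation mask in the Table 3 layout together with
    the matching animation value vector. 'rotation' maps an axis index
    (1, 2, 3) to the new rotation angle on that axis."""
    mask, values = [], []

    def flag(component):
        mask.append(1 if component else 0)
        if component:
            values.extend(component)

    flag(translation)                                   # isTranslation (Tx, Ty, Tz)
    mask.append(1 if rotation else 0)                   # isRotation
    if rotation:
        for axis in (1, 2, 3):                          # isRotationOnAxis1..3
            mask.append(1 if axis in rotation else 0)
            if axis in rotation:
                values.append(rotation[axis])
    flag(scale)                                         # isScale (Sx, Sy, Sz)
    flag(scale_orientation)                             # isScaleOrientation
    flag(center)                                        # isCenter (Cx, Cy, Cz)
    return mask, values

# Rotation on all three axes and nothing else changed (the Table 3 example):
# mask == [0, 1, 1, 1, 1, 0, 0, 0], values == the three rotation angles.
print(bone_animation_mask(rotation={1: 0.1, 2: 0.2, 3: 0.3}))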

4.3.3. SBMuscle animation mask and animation values

An SBMuscle node can be animated by updating the control point values of the NURBS curve, the point weights and/or the knot sequence. The number of control points and the number of elements of the knot sequence are integers between 0 and 63 and are encoded for each frame after the muscleID. The animation values of a muscle consist of a vector which contains the new values of the changed data. The vector is ordered according to the animation mask.

4.3.4. Frame mask and frame values representation

A global animation frame is able to animate a subset of the SBBone and/or SBMuscle nodes of the scene graph and refers to an animation mask field and an animation values field, which are obtained by concatenating the bone and muscle animation masks and animation values, respectively. The bone and muscle IDs are also part of the animation mask.

The frame animation values contain the data changed in the current frame for all the bones and muscles of the animation mask. The compression algorithms used in the case of FBA and presented in Section 3.3, namely the predictive-based and DCT-based schemes, are also retained as part of the standard for encoding the BBA animation values.

5. FBA versus BBA

Fig. 14 shows a comparative analysis of the FBA and BBA frameworks.

Fig. 14. FBA versus BBA comparison.

While FBA is dedicated to the representation of the avatar as a segmented character, and is a more appropriate framework for cartoon-like applications, BBA offers a higher degree of realism, based on the well-known concept of skeleton-based animation. FBA standardizes a fixed number of animation parameters (296), by attaching one, two or three rotation angles to each anatomical segment. BBA animation data involves an unbounded number of parameters, which are attached to the bones of the VC skeleton. Moreover, the animation parameters can be rotations, translations and/or scaling factors. Both frameworks address streaming animation by offering low-bit-rate compression schemes. The two compression methods, frame-based and DCT-based, are adopted for both FBA and BBA. Moreover, BBA natively supports advanced animation techniques such as frame interpolation and IK. For both frameworks, the compression bit-rate depends on the movement complexity (the number of segments/joints involved in the motion) and is in the range of 5–30 kbps (for a frame rate of 25 fps). Within FBA, the animation stream contains information on one single human VC, whereas, within BBA, it is possible to animate several characters in a scene by using one or more streams. Muscle-like deformations are present in both frameworks: FBA standardizes, in the case of facial deformation, a number of control points, while BBA allows the designer to add curve-based deformers at any level of the skin. FBA standardizes the location and the number of control points, whereas BBA gives this freedom to the designer, making it possible to achieve muscle-like deformation on any part of the VC skin and thus realistic animation.

From the point of view of transporting FBA and BBA data over networks (webcast or broadcast), the same mechanism, based on the so-called MPEG-4 BIFS-Anim stream, is used. A BIFS-Anim stream is designed to animate objects in an MPEG-4 scene. Two approaches are supported: directly encoding the update values for the fields, or encapsulating dedicated animation streams. Whatever the MPEG-4 VC animation framework (FBA or BBA), compression performance is guaranteed by the encapsulation-based approach; for this reason, both the FBA and BBA frameworks use the second method within the BIFS-Anim stream. This encapsulation mechanism makes it possible to deliver VC animation data within the MPEG-4 transport layer. Recent specifications related to the transport of MPEG-4 data on an MPEG-2 transport channel allow avatars and generic VCs, respectively, to be integrated within real applications such as digital video broadcasting.

Regarding terminal capabilities, the BBA framework is more complex than FBA; in order to achieve realistic animation, dedicated 3D hardware is well suited.

A specific application, a sign language communication system [39,50,54] using VCs developed within the ViSiCAST project [57], allows us to comparatively evaluate the FBA and BBA frameworks.

realistic virtual humans (avatars), generatingEuropean deaf sign languages. By building appli-cations for the signing system in television, multi-media, web and face-to-face transactions,ViSiCAST aims to improve the position ofEurope’s deaf citizens, their access to publicservices and entertainments and enable them todevelop and consume their own multimedia

content for communication, leisure and learningthrough:

1. systems for the generation, storage and transmission of virtual signing.

2. user-friendly methods to capture signs (where appropriate).

3. a machine-readable system to describe sign-language gestures (hand, face and body), which can be used to retrieve stored gestures or to build them from low-level motion components. ViSiCAST will use this descriptive language to develop translation tools from speech and text to sign.

Fig. 15 shows the functional architecture of the ViSiCAST broadcast signing.

Fig. 16 shows the equipment for encoding and transmitting the VC animation data within an MPEG-2 transport layer, as well as a PC-based terminal able to receive, decode and render video and VC animation.

Fig. 15. ViSiCAST functional architecture.

Fig. 16. ViSiCAST broadcasting system.


Our specific application, a VC-based communication system for the deaf community, allows us to observe that the MPEG-4 VC animation tools (FBA and BBA) provide, at a low transmission bit-rate, a complete and scalable VC animation solution in terms of realistic effects and terminal complexity.

6. Conclusion

In this paper, we have presented the MPEG-4 activities related to VC animation, with a sign language communication system as the target application. We first described how Amendment 1 of the MPEG-4 standard offers an appropriate framework for virtual human animation, gesture synthesis and compression/transmission. Then, we discussed how this framework is extended within the ongoing standardization efforts by (1) allowing the animation of any kind of articulated model, and (2) addressing advanced modeling and animation concepts such as the "skin and bones"-based approach. Techniques such as frame interpolation, IK, the discrete cosine transform and arithmetic encoding, which make it possible to provide a highly compressed animation stream, have been presented and discussed. Finally, the two frameworks, FBA and BBA, have been evaluated in terms of realism, accuracy and transmission bandwidth within a sign language communication system.

References

[1] 3D Studio Max, Version 3.1, Autodesk, San Francisco, CA, 1999.
[2] Activision, www.activision.com.
[3] W.W. Armstrong, M. Green, The dynamics of articulated rigid bodies for purposes of animation, in: Proceedings of Graphics Interface '85, 1985, pp. 407–415.
[4] Ascension Technology MotionStar, http://www.ascension-tech.com/products/motionstar/.
[5] Attitude Studio, Eve Solal, www.evesolal.com.
[6] N. Badler, D. Metaxas, B. Webber, M. Steedman, The center for human modeling and simulation, Presence 4 (1) (1995) 81–96.
[7] N. Badler, C. Phillips, B. Webber, Simulating Humans: Computer Graphics, Animation, and Control, Oxford University Press, Oxford, 1993.
[8] blaxxun Community, VRML-3D-Avatars—Multi-User Interaction, http://www.blaxxun.com/vrml/home/ccpro.htm.
[9] blaxxun interactive, NURBS Extension for VRML97, 2/2001.
[10] J. Blinn, A generalization of algebraic surface drawing, ACM Trans. Graph. 1 (3) (July 1982) 235–256.
[11] Boston Dynamics Inc., The digital biomechanics laboratory, www.bdi.com, 1998.
[12] W. Bricken, G. Coco, The VEOS Project, Presence, Vol. 3 (2), MIT Press, Cambridge, MA, Spring 1994, pp. 111–129.
[13] T. Capin, I. Pandzic, N. Magnenat Thalmann, D. Thalmann, Virtual human representation and communication in the VLNET networked virtual environments, IEEE Comput. Graph. Appl. 17 (2) (1997) 42–53.
[14] C. Carlsson, O. Hagsand, DIVE – a multi-user virtual reality system, in: Proceedings of the IEEE Virtual Reality Annual International Symposium (VRAIS), Seattle, WA, 1993, pp. 394–400.
[15] E. Catmull, J. Clark, Recursively generated B-spline surfaces on arbitrary topological meshes, Comput. Aided Design 10 (September 1978) 350–355.
[16] Disney Animation Studios, Flubber (1997), http://disney.go.com/disneyvideos/liveaction/flubber/.
[17] Division Ltd., dVS Technical Overview, Version 2.0.4, 1st Edition, 1993.
[18] P.K. Doenges, T.K. Capin, F. Lavagetto, J. Ostermann, I.S. Pandzic, E. Petajan, MPEG-4: Audio/video and synthetic graphics/audio for mixed media, Signal Processing: Image Communication 9 (4) (1997) 433–463.
[19] Electronic Arts, www.ea.com.
[20] M. Escher, I. Pandzic, N. Magnenat-Thalmann, Facial deformations for MPEG-4, in: Computer Animation 98, Philadelphia, USA, IEEE Computer Society Press, 1998, pp. 138–145.
[21] P. Faloutsos, M. van de Panne, D. Terzopoulos, Composable controllers for physics-based character animation, in: Computer Graphics Proceedings, Annual Conference Series, ACM SIGGRAPH, 2001.
[22] Faro Technologies, www.farotechnologies.com.
[23] J.D. Foley, A. van Dam, S.K. Feiner, J.F. Hughes, Computer Graphics—Principles and Practice, 2nd Edition, Addison-Wesley, Reading, MA, 1992.
[24] J.D. Funge, AI for Computer Games and Animation: A Cognitive Modeling Approach, AK Peters, USA, August 1999.
[25] Geri's game, Toy Story (1995), A Bug's Life (1998), Toy Story 2 (1999) and Monsters, Inc. (2001), Walt Disney Pictures and Pixar.
[26] C. Greenhalgh, S. Benford, MASSIVE: a distributed virtual reality system incorporating spatial trading, in: 15th International Conference on Distributed Computing Systems (DCS'95), Vancouver, Canada, 30 May–2 June, IEEE Computer Society Press, Silver Spring, MD, 1995.
[27] Humanoid Animation Specification, www.h-anim.org.
[28] Introducing Seonaid, our online news presenter, Scotland government web page, http://www.scotland.gov.uk/pages/news/junior/introducing seonaid.aspx.
[29] ISO/IEC JTC1/SC29/WG11, ISO/IEC 14496:1999, Coding of audio, picture, multimedia and hypermedia information, N3056, Maui, December 1999.
[30] C. John Jr., Introduction to Robotics: Mechanics and Control, 2nd Edition, Addison-Wesley, Reading, MA, 1989.
[31] P. Kalra, N. Magnenat-Thalmann, L. Moccozet, G. Sannier, A. Aubel, D. Thalmann, Real-time animation of realistic virtual humans, IEEE Comput. Graph. Appl. 18 (5) (1998) 42–57.
[32] F. Lavagetto, R. Pockaj, The facial animation engine: toward a high-level interface for the design of MPEG-4 compliant animated faces, IEEE Trans. Circuits Systems Video Technol. 9 (2) (March 1999) 277–289.
[33] Living actor technology, http://www.living-actor.com/.
[34] C. Loop, Smooth subdivision surfaces based on triangles, Master's Thesis, Department of Mathematics, University of Utah, August 1987.
[35] M.R. Macedonia, M.J. Zyda, D.R. Pratt, P.T. Barham, S. Zeswitz, NPSNET: A network software architecture for large scale virtual environments, Presence 3 (4) (1994) 265–287.
[36] N. Magnenat-Thalmann, D. Thalmann, Virtual Reality Software and Technology, Encyclopedia of Computer Science and Technology, Vol. 41, Marcel Dekker, New York, 1999.
[37] Maya Infinity, Alias/Wavefront Inc., 1999.
[38] T.B. Moeslund, E. Granum, A survey of computer vision-based human motion capture, Comput. Vision Image Understanding 81 (3) (2001) 231–268.
[39] G. Mozelle, F. Prêteux, J.E. Viallet, Tele-sign: A compression framework for sign language distant communication, in: Proceedings of the SPIE Conference on Mathematical Modeling and Estimation Techniques in Computer Vision, San Diego, CA, July 1998, Vol. 3457, pp. 94–110.
[40] M.G. Pandy, F.C. Anderson, Three-dimensional computer simulation of jumping and walking using the same model, in: Proceedings of the VIIth International Symposium on Computer Simulation in Biomechanics, August 1999.
[41] M.G. Pandy, F.E. Zajac, E. Sim, W.S. Levine, An optimal control model for maximum-height human jumping, J. Biomech. 23 (12) (1990) 1185–1198.
[42] I. Pandzic, N. Magnenat-Thalmann, T. Capin, D. Thalmann, Virtual LifeNetwork: A Body-Centered Networked Virtual Environment, Presence, Vol. 6 (6), MIT Press, Cambridge, MA, 1997, pp. 676–686.
[43] L. Piegl, W. Tiller, The NURBS Book, 2nd Edition, Springer, Berlin, 1997.
[44] Polhemus STAR TRACK motion capture system, http://www.polhemus.com.
[45] M. Preda, F. Preteux, Streamed animation within Animation Framework eXtension (AFX), ISO/IEC JTC1/SC29/WG11, MPEG01/M7588, Pattaya, Thailand, December 2001.
[46] M. Preda, F. Preteux, Deformable Bones or Muscle-based Deformation?, ISO/IEC JTC1/SC29/WG11, MPEG01/M7757, Pattaya, Thailand, December 2001.
[47] M. Preda, T. Zaharia, F. Preteux, 3D body animation and coding within a MPEG-4 compliant framework, in: Proceedings of the International Workshop on Synthetic-Natural Hybrid Coding and Three Dimensional Imaging (IWSNHC3DI'99), Santorini, Greece, 15–17 September 1999, pp. 74–78.
[48] F. Prêteux, M. Preda, G. Mozelle, Donation to ISO of Hand Animation Software, ISO/IEC JTC1/SC29/WG11, M3590, Dublin, July 1998.
[49] S20 Sonic Digitizers, Science Accessories Corporation.
[50] Seamless Solutions Inc. demos, http://www.seamless-solutions.com.
[51] H. Seo, F. Cordier, L. Philippon, N. Magnenat-Thalmann, Interactive modelling of MPEG-4 deformable human body models, DEFORM'2000 Workshop, Geneva, 29–30 November 2000, pp. 120–131.
[52] M.B. Sévenier, et al. (Eds.), PDAM of ISO/IEC 14496-1/AMD4, ISO/IEC JTC1/SC29/WG11, N4415, Pattaya, December 2001.
[53] K. Shoemake, Euler Angle Conversion, Graphics Gems IV, Academic Press Professional, Toronto, 1994.
[54] Sign Language Web Site at University Lumière Lyon 2, http://signserver.univ-lyon2.fr.
[55] The virtual reality modeling language, International Standard ISO/IEC 14772-1:1997, www.vrml.org.
[56] Vandrea news presenter, Channel 5, British Broadcasting Television.
[57] ViSiCAST, Virtual Signer Communication, Animation, Storage and Transmission, IST European Project 1999–2002, www.visicast.org.
[58] VPL Research Inc., Dataglove Model 2 Operation Manual, January 1989.
[59] J. Wilhelms, B.A. Barsky, Using dynamic analysis to animate articulated bodies such as humans and robots, in: Graphics Interface '85, May 1985, pp. 97–104.
[60] W. Wooten, Simulation of leaping, tumbling, landing, and balancing humans, Ph.D. Thesis, Georgia Institute of Technology, March 1998.
[61] G. Wyvill, C. McPheeters, B. Wyvill, Data structure for soft objects, The Visual Comput. 2 (1986) 227–234.
[62] D. Zorin, P. Schröder, et al., Subdivision for modeling and animation, SIGGRAPH'00 Conference Course Notes, July 2000.
