Virtual Character Within MPEG-4 Animation Framework eXtension
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 14, NO. 7, JULY 2004 975
Marius Preda and Françoise Preteux
Abstract: Enriched multimedia applications and services aim at combining images, sounds, videos, and synthetic objects into hybrid and interactive scenes. The core technologies discussed here deal with the representation and integration, within such complex scenes, of a specific kind of synthetic data, namely virtual character animation. This paper analyzes how an integrated and standardized framework is currently emerging in order to ensure application interoperability, universal content access, and user interactivity. We first compare how virtual character animation has been addressed within the virtual reality modeling language (VRML) and MPEG-4 synthetic and natural hybrid coding standardization processes. A comparative synthesis of the objectives and the capabilities of each framework is presented, specifically MPEG-4 Face and Body Animation (FBA) versus H-Anim 1.1 (Humanoid Animation Working Group, WEB3D Consortium, h-anim.org), and MPEG-4 Bone-Based Animation (BBA) versus H-Anim 2001. The Animation Framework eXtension (AFX) specifications, which are part of MPEG-4 Systems Part 16, include the BBA framework. The BBA animation concepts, based on a generic skeleton representation and curve-based deformations, are introduced. The definition of the related nodes and how they address the BBA concepts are also discussed. Some comments are made on the rotation representation, the interpolation methods, and the animation mask and value parameters of the animation stream. This set of specifications provides an efficient framework that is appropriate for real-time and realistic animation applications within networked environments.
Index Terms: Animation Framework eXtension (AFX), bone-based animation (BBA), face and body animation (FBA), H-Anim (1.1, 2001), low-bitrate animation compression, MPEG-4 synthetic and natural hybrid coding (SNHC), skeleton, muscle, and skin (SMS) model definition and animation, virtual character animation, virtual reality modeling language (VRML).
Manuscript received July 23, 2003; revised December 10, 2003, and March 17, 2004.
The authors are with the Unité de Projets ARTEMIS, GET/Institut National des Télécommunications, 91011 Evry, France (e-mail: Marius.Preda@int-evry.fr; email@example.com).
Digital Object Identifier 10.1109/TCSVT.2004.830661
I. INTRODUCTION
Everybody is currently witnessing the explosion of multimedia content consumption. Broadcast productions flood real life with images and sounds on thousands of channels throughout the world. Furthermore, a new age started with the development of the Internet and mobile communications. Nowadays, the Internet is not only a source of written information, but a huge multimedia network where the distinction between user and content creator is not clear-cut. Multimedia presentations are no longer linear: the user is now able to interact with the content, to change the course of the presentation, and to become immersed in it. As an active part of the presentation, the user needs a representation of his or her presence. Moreover, when multimedia presentations are multi-user, one needs to know where the other users are, what they are doing, and how to interact with them. Hence, a virtual representation of a human being is required.
In the late 1970s, the first three-dimensional (3-D) virtual human models were designed and animated by computer. In the last decade of the 20th century, networked graphics applications using virtual characters were mostly prototype systems demonstrating the effectiveness of the technology. At the beginning of the 21st century, commercial systems invaded the market, mainly thanks to technical developments in the area of networked games. Starting from simple and easy-to-control models used in games, evolving to more complex virtual assistants for commercial or informational web sites, and going toward new stars of virtual cinema, television, and advertising, the 3-D virtual character industry is currently booming.
This paper addresses the state of the art of the open standards related to virtual character definition and animation. Section II briefly presents the current related standardization frameworks, namely MPEG-4 Synthetic and Natural Hybrid Coding (SNHC) and H-Anim. A comparative synthesis of the objectives and the capabilities of each framework is proposed. Specifically, MPEG-4 Face and Body Animation (FBA) versus H-Anim 1.1, and MPEG-4 Bone-Based Animation (BBA) versus H-Anim 2001 are discussed.
Section III deals with the BBA framework, part of the new Animation Framework eXtension (AFX) specifications developed by the MPEG community in MPEG-4 Systems Part 16. The BBA animation concepts, based on a generic skeleton representation and curve-based deformations, are introduced. The definitions of the principal nodes (SBBone, SBMuscle, SBSkinnedModel, and SBVCAnimation) are discussed, and we describe in detail how they address the BBA animation concepts. Comments are made on the animation stream with respect to the rotation representation, the interpolation methods for animation parameters, and the representation based on animation mask and value parameters.
By briefly describing a series of integrated applications, the last section motivates the need for an open multimedia standard able to take virtual characters into account. Remarks and perspectives conclude the paper.
II. STANDARDIZING VIRTUAL CHARACTER ANIMATION
Creating, animating, and especially sharing virtual characters require unified data formats. Although some animation industry leaders try, and sometimes succeed, to impose their own formats, the alternative of an open standard is always welcomed by the community. A standard is even more expected and needed as soon as applications and services require the support of actors from various fields of activity, such as content creators, developers, service providers [e.g., broadcasters and Internet service providers (ISPs)], terminal manufacturers, intellectual property rights (IPR) holders, and so on.
1051-8215/04$20.00 © 2004 IEEE
Current efforts to provide real applications within a unified and interoperable framework are materialized by 3-D graphics interchange standards such as VRML and MPEG-4. Each of them addresses, more or less in liaison with the other, the virtual character animation issue. In the VRML community, the H-Anim group released three versions of its specifications (1.0, 1.1, and 2001), while the SNHC subgroup of MPEG released two versions (FBA and BBA). Section II-A analyzes the main similarities and differences between these two frameworks.
A. MPEG-4 SNHC and VRML H-Anim
The MPEG-4 standard, unlike the previous MPEG standards, does not deal only with highly efficient audio and video compression schemes, but introduces the fundamental concept of media objects such as audio, visual, and two-dimensional (2-D)/3-D natural and synthetic objects composing a multimedia scene. Temporal and/or spatial behavior can be associated with an object. The main functionalities proposed by the standard are related to the compression of each type of media object, hybrid encoding of the natural and synthetic objects, universal content accessibility over various networks, and interactivity at the user terminal. In order to specify the spatio-temporal localization of an object in the scene, MPEG-4 defines a dedicated language called Binary Format for Scenes (BIFS). BIFS inherits from VRML the representation of the scene, described as a hierarchical graph, and some dedicated tools such as animation based on interpolators, events routed to the nodes, and sensor-based interactivity. However, BIFS introduces some new and advanced mechanisms such as compression schemes for encoding the scene, streamed animation, integration of 2-D objects, and advanced temporal control.
In terms of functionalities, both standards, VRML and MPEG-4, define a set of nodes in the scene graph allowing the representation of an avatar. However, only the MPEG-4 SNHC specifications enable streamed virtual character animation. A major difference between MPEG-4 SNHC and H-Anim is the following: an MPEG-4 compliant virtual character can coexist within a hybrid environment, and its animation can be synchronized with other types of media objects, while the H-Anim avatar must exist within a VRML world and must be animated by VRML generic animation tools.
To structure the comparative study, we consider that each framework has developed a first suite of basic animation tools (based on the segmented model approach), corresponding to H-Anim 1.1 and MPEG-4 FBA, and some more advanced tools (based on the seamless model approach), corresponding to H-Anim 2001 and MPEG-4 BBA. These successive versions are discussed in the following sections.
1) MPEG-4 FBA and H-Anim 1.1: The H-Anim 1.1 specifications define five nodes (Joint, Segment, Site, Humanoid, and Displacer) that can model a segmented avatar. The framework deals mainly with the animation of the human body. However, by using the Displacer node, which allows local deformations to be represented, it is also possible to address face animation. The animation tools are inherited from VRML. An H-Anim avatar is usually animated using interpolators.
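As an illustration of the interpolator-based animation mentioned above, the following minimal sketch mimics the piecewise-linear behavior of a VRML PositionInterpolator, which maps a fraction in [0, 1] to a value interpolated between key values. The function name `interpolate` and the sample keys are hypothetical and chosen for illustration only; joint rotations would normally use an orientation interpolator with spherical interpolation, which is not covered here.

```python
from bisect import bisect_right

def interpolate(keys, key_values, fraction):
    """Piecewise-linear interpolation in the style of a VRML PositionInterpolator.

    keys       -- increasing list of fractions, e.g. [0.0, 0.5, 1.0]
    key_values -- one 3-D value per key
    fraction   -- current animation fraction (the set_fraction input)
    """
    # Clamp outside the key range, as VRML interpolators do.
    if fraction <= keys[0]:
        return tuple(float(c) for c in key_values[0])
    if fraction >= keys[-1]:
        return tuple(float(c) for c in key_values[-1])
    # Find the key interval that contains the fraction.
    i = bisect_right(keys, fraction) - 1
    t = (fraction - keys[i]) / (keys[i + 1] - keys[i])
    a, b = key_values[i], key_values[i + 1]
    return tuple(x + t * (y - x) for x, y in zip(a, b))

# Hypothetical translation track: rest, raised, rest.
keys = [0.0, 0.5, 1.0]
values = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 0.0)]
print(interpolate(keys, values, 0.25))
```

Because every key value is stored explicitly in the scene, the size of such a description grows with the length of the animation, which is one motivation for the dedicated compressed streams discussed next.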
An MPEG-4 FBA compliant avatar is defined by means of two MPEG-4 specific top nodes, namely the face and body nodes, and animated with a dedicated compressed stream. The face node is completely defined in MPEG-4 in terms of geometry, appearance, deformation behavior, and animation parameters. The body node used for representing the avatar refers to the H-Anim nodes for defining the geometry and the appearance, and introduces new information concerning the deformation behavior and the animation parameters. The compact representation of the animation parameters, together with the two compression methods retained by the MPEG-4 standard, allows streaming avatar animation at very low bit rates (about 2 kb/s for the face and 10 to 35 kb/s for the body).
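To give a feeling for what these bit rates imply, the short calculation below converts them into an average bit budget per animation frame. The frame rate of 25 frames per second is an assumption made only for this illustration (the text above does not state one), and `bits_per_frame` is a hypothetical helper name.

```python
def bits_per_frame(bitrate_kbps: float, frame_rate_hz: float) -> float:
    """Average number of bits available per animation frame."""
    return bitrate_kbps * 1000.0 / frame_rate_hz

# Assumption: 25 frames per second; the paper does not give a frame rate here.
ASSUMED_FRAME_RATE_HZ = 25.0

for label, kbps in [("face", 2.0), ("body, low", 10.0), ("body, high", 35.0)]:
    budget = bits_per_frame(kbps, ASSUMED_FRAME_RATE_HZ)
    print(f"{label}: {budget:.0f} bits per frame")
```

Under this assumption, the face stream has on the order of 80 bits per frame on average, which suggests why a compact parameter representation combined with compression is needed.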
2) MPEG-4 BBA and H-Anim 2001: The rapid development of 3-D hardware capabilities now makes it possible to use advanced animation techniques on popular and low-cost platforms. Generic and powerful techniques such as seamless skin model definition and skeleton-based animation are widely used in current animation packages. To support this trend, H-Anim and MPEG-4 SNHC introduced new mechanisms in the latest versions of their specifications. H-Anim redesigned the Joint and Humanoid nodes, and MPEG-4 SNHC defined new scene graph nodes and a new compressed stream. While H-Anim still addresses only human-like avatar representation, MPEG-4 SNHC extends the specifications to support the representation of any kind of articulated object. The concept of skin deformation induced by the skeleton is present in both frameworks, which therefore both support specifying how bone motion influences skin deformation. During the MPEG-4 standardization process, special attention was paid to enabling easy mapping or conversion between H-Anim and MPEG-4 SNHC. MPEG-4 SNHC, however, also allows the definition of a muscle layer based on curve deformation and provides high-level and compactly represented tools for computing bone-skin and muscle-skin influence.
While within H-Anim 2001 the avatar animation relies on VRML generic mechanisms such as interpolators, within MPEG-4 SNHC such generic tools are supported by means of BIFS mechanisms and, in addition, efficient compression schemes have been introduced.
The next sections describe in detail the key aspects of this new MPEG-4 animation framework.
III. GENERIC SKELETON-, MUSCLE-, AND SKIN-BASED MODEL DEFINITION AND ANIMATION
A. General Principle
An articulated virtual character, also called a kinematic linkage, is made of a series of rigid links that are connected at joints. In the case of a segmented virtual character, a link corresponds to the envelope of an anatomical segment. In the case of a seamless virtual character, a link is associated with each bone of the skeleton. In order to define the static 3-D pose of a seamless virtual character, including geometry, color, and texture attributes, the following approach is adopted: 1) to consider the entire virtual character as one single 3-D mesh, referred to as a global seamless mesh; 2) to specify for each bone a so-called weighting measure affecting the skin vertices; and 3) to apply an additivity principle in order to define, for each vertex of the global mesh, the influence of all the bones directly altering its 3-D position (i.e., summation of all the weighting measures for the given vertex). An appropriate set of weighting vectors converts any rigid movement of the bones into a smooth deformation of the skin. Such a bone-skin motion mapping can be computed by parameterizing the volume around the bone. However, some errors may occur depending on local configurations of the character geometry. In this case, the virtual character designer must specify the bone-skin motion mapping directly.
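The three steps above can be sketched as a weighted blend of rigid bone transforms applied to each skin vertex. This is a minimal illustration and not the normative AFX computation: the helper names `rotation_z` and `skin_vertex` are hypothetical, bones are reduced to rotations about the z-axis, and the normalization of the weights by their sum is an assumption made here so that the weights need not add up to one.

```python
import math

def rotation_z(angle_rad, pivot):
    """Return a rigid transform: rotation about the z-axis through `pivot`."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    px, py, pz = pivot

    def apply(v):
        x, y, z = v[0] - px, v[1] - py, v[2] - pz
        return (c * x - s * y + px, s * x + c * y + py, z + pz)

    return apply

def skin_vertex(rest_position, influences):
    """Blend the rigid motions of all bones that influence one vertex.

    influences -- list of (transform, weight) pairs, one per influencing bone
    """
    total = sum(weight for _, weight in influences)  # assumed normalization
    out = [0.0, 0.0, 0.0]
    for transform, weight in influences:
        moved = transform(rest_position)  # where this bone alone would put the vertex
        for k in range(3):
            out[k] += (weight / total) * moved[k]
    return tuple(out)

# A vertex near an elbow-like joint, influenced equally by two bones:
# the upper bone stays still, the lower bone rotates by 90 degrees.
upper = rotation_z(0.0, (0.0, 0.0, 0.0))
lower = rotation_z(math.pi / 2, (0.0, 0.0, 0.0))
print(skin_vertex((1.0, 0.0, 0.0), [(upper, 0.5), (lower, 0.5)]))
```

A vertex influenced by a single bone follows that bone rigidly, while a vertex shared between two bones lands between the two rigid positions, which is what produces a smooth skin around the joint.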
During the animation stage, the bones can only be affected by rigid transforms and cannot be deformed. Sometimes, realistic animation requires local deformations of the skin in order to simulate the effects of muscular activity. The AFX specifications fulfill this requirement by attaching curve-based entities, called muscles, at any level of the skeleton. A muscle is characterized by an influence region defined at the skin level. The deformation of the muscle curve induces a local deformation of the skin.
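The idea of a curve whose deformation drives nearby skin vertices can be sketched as follows. Everything specific in this example is an assumption made for illustration: the muscle is modeled as a quadratic Bezier curve, the influence region as a fixed radius around the curve, and the falloff as linear in the distance; the names `bezier` and `deform_vertex` are hypothetical.

```python
import math

def bezier(p0, p1, p2, t):
    """Point on a quadratic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    return tuple(u * u * a + 2.0 * u * t * b + t * t * c
                 for a, b, c in zip(p0, p1, p2))

def deform_vertex(vertex, rest_curve, posed_curve, radius, samples=32):
    """Displace a skin vertex by the motion of its nearest muscle-curve point.

    rest_curve, posed_curve -- (p0, p1, p2) control points before and after deformation
    radius                  -- assumed size of the influence region around the curve
    """
    # Attach the vertex to the closest sampled point of the rest curve.
    best_t, best_d = 0.0, float("inf")
    for i in range(samples + 1):
        t = i / samples
        d = math.dist(vertex, bezier(*rest_curve, t))
        if d < best_d:
            best_t, best_d = t, d
    if best_d >= radius:
        return tuple(vertex)  # outside the influence region: unchanged
    weight = 1.0 - best_d / radius  # assumed linear falloff
    before = bezier(*rest_curve, best_t)
    after = bezier(*posed_curve, best_t)
    return tuple(v + weight * (a - b) for v, a, b in zip(vertex, after, before))

rest = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0))
bulged = ((0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 0.0))
print(deform_vertex((1.0, 0.5, 0.0), rest, bulged, radius=1.0))
print(deform_vertex((1.0, 5.0, 0.0), rest, bulged, radius=1.0))
```

In this sketch only the control points of the curve are animated, so a small number of parameters can produce a local bulge of the skin, while vertices outside the influence region are left untouched.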
The generic approach consisting of animating a virtual character from its skeleton and its muscles is referred to as the Skeleton, Muscle, and Skin (SMS) method. SMS-based applications can achieve a realistic animation of any kind of articulated virtual character. The AFX SMS-related nodes allow the definition of any kind of skeleton hierarchy, with any type of attached geometry [based on IndexedFaceSet or higher order geometry nodes like nonuniform rational B-splines (NURBS) and subdivision surfaces] and any appearance attributes. The skeleton hierarchy is not predefined, as it was in the previous MPEG-4 FBA framework; instead, the designer is free to choose the level of detail at which to build the character. In order to animate the character, an animation resource (uncompressed file or compressed stream) can be attached to a model or a group of models. The representation of the animation parameters makes it easy to edit animations and/or to extract them from motion capture systems. The format of the animation data stream ensures transmission and animation scalability. Without imposing specification...