

AN INTRODUCTION TO MPEG-4 ANIMATION FRAMEWORK EXTENSION (AFX)

Mikaël Bourges-Sévenier

Mindego Inc., 100 Buckingham drive, Suite 238, Santa Clara, CA 95051, USA

ABSTRACT

This document presents MPEG-4 Animation Framework extension (AFX), a recent amendment to the MPEG-4 Systems specification. AFX defines high-level geometry, texture, volume, and animation components for enhanced interactive multimedia applications.

1 INTRODUCTION

Computer graphics standards such as the Virtual Reality Modeling Language [2] (VRML) or MPEG-4's Binary Format for Scene [1] (BIFS) are based on common industry practice and favor interoperability among players. These standards are made of tools, or components, organized in a scene graph. The scene graph is a tree structure in which each node is a component and branches are its properties. How is a component defined? Originally, a component follows the famous motto "one tool, one functionality". Hence, the first components that appeared in VRML were very low-level, in the sense that they were very close to graphics APIs such as OpenGL. However, higher-level components were needed.

BIFS is a binary superset of VRML 2.0 and supports all its features. While VRML 2.0 follows a download-and-play philosophy, BIFS was designed for streaming together with other media. Since its first release at the end of 1998 [1], low-level components have been added to BIFS but few high-level ones. In November 2000, the AFX group started to look at high-level components and a framework to support them, motivated by the following observation: in VRML/BIFS content with 2D/3D animated objects, 80% or more of the file often consists of animation and geometry data. Because the tools used are so low-level, a lot of information is needed to describe realistic animations and 2D/3D objects. On the other hand, many higher-level tools have been developed for industries such as medical, CAD/CAM, and games.

Higher-level components can be defined as providing a compact representation of functionality in a more abstract manner. Typically, this abstraction leads to mathematical models that need few parameters. These models cannot be rendered directly by a graphics card: internally, they are converted to low-level primitives a graphics card can render. Besides a more compact representation, this abstraction often provides other functionalities. For example, a subdivision surface can be subdivided based on the area viewed by the user. This provides four functionalities: compact representation, view-dependent subdivision, automatic level-of-detail, and progressive local refinements. Enabling all these functionalities may require a lot of computation, and an implementation may provide crude capabilities on a limited-resource terminal but full support on a desktop. Obviously, the rendering will not have the same quality, but the content will be the same. Thus, another benefit of such representations is scalability.

The organization of this document is as follows. First, we present MPEG-4 with an emphasis on synthetic scenes where AFX components are used. Then, we present AFX and the components present in the specification. Finally, we discuss perspectives and challenges of future AFX work.
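To make the scene-graph idea above concrete, the following minimal sketch (with purely illustrative node and field names, not the normative VRML/BIFS node set) models a node as an object whose fields are its properties and whose children form the tree a player walks:

```python
# Minimal scene-graph sketch (illustrative node/field names, not normative BIFS/VRML).
class Node:
    def __init__(self, name, **fields):
        self.name = name          # node type, e.g. "Transform" or "Shape"
        self.fields = fields      # properties ("branches") of the node
        self.children = []        # child nodes forming the tree

    def add(self, child):
        self.children.append(child)
        return child

    def traverse(self, depth=0):
        """Depth-first traversal, as a compositor would walk the tree."""
        print("  " * depth + f"{self.name} {self.fields}")
        for c in self.children:
            c.traverse(depth + 1)

# A tiny scene: a transform holding a shape with a geometry field.
root = Node("Group")
xf = root.add(Node("Transform", translation=(0.0, 1.0, 0.0)))
xf.add(Node("Shape", geometry="IndexedFaceSet"))
root.traverse()
```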

2 MPEG-4 OVERVIEW

The MPEG-4 toolbox [1] contains many tools for audio, video, 2D and 3D graphics, animation, interactivity, stream synchronization, and so on; everything one can expect to build a multimedia platform. Figure 1 shows the internals of an MPEG-4 player. Going from left to right, an incoming stream is received by an abstract interface called the Delivery Multimedia Integration Framework (DMIF). DMIF handles the network connections to retrieve the content. Each stream in the content is then demultiplexed and fed into its corresponding decoder. In contrast with other media decoders, the BIFS (Binary Format for Scene) decoder outputs a tree of objects instead of an array of data as with audio and video. The scene composes all streams together in the compositor in order to render the content. Intellectual Property Management and Protection (IPMP) systems may protect each input and output of these tools. BIFS, the Binary Format for Scene, is a binary representation of a VRML scene graph [5], enriched with the capabilities of the MPEG-4 streaming architecture. AFX, described in the remainder of this document, extends BIFS features [4]. In contrast with BIFS nodes, AFX nodes may have more efficient dedicated encodings [4].
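The data flow just described can be summarized by the following schematic sketch; the function and decoder names are illustrative only and do not correspond to the normative MPEG-4 Systems or DMIF interfaces:

```python
# Schematic sketch of the MPEG-4 player data flow described above
# (illustrative names only, not the normative MPEG-4 Systems interfaces).

def demultiplex(multiplexed):
    """Split multiplexed content into per-stream lists of access units."""
    streams = {}
    for stream_id, access_unit in multiplexed:
        streams.setdefault(stream_id, []).append(access_unit)
    return streams

# One decoder per stream type: the BIFS decoder yields a tree of objects,
# while the media decoders yield arrays of samples or frames.
decoders = {
    "bifs":  lambda units: {"root": {"children": list(units)}},
    "audio": lambda units: b"".join(units),
    "video": lambda units: list(units),
}

def compose(decoded):
    """The compositor combines the scene tree with the decoded media streams."""
    scene = decoded.get("bifs")
    media = {name: data for name, data in decoded.items() if name != "bifs"}
    return {"scene": scene, "media": media}

# Toy content: (stream_id, access_unit) pairs as delivered through DMIF.
content = [("bifs", "Group"), ("video", "frame0"), ("audio", b"\x00\x01")]
decoded = {sid: decoders[sid](units) for sid, units in demultiplex(content).items()}
print(compose(decoded))
```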



Figure 1 - MPEG-4 Systems Architecture and AFX streams.

3 THE ANIMATION FRAMEWORK EXTENSION (AFX)

3.1 AFX concepts

The AFX specification [4] contains components for rendering geometry, textures, volumes, and animation, organized around the AFX conceptual organization of models for computer games and animation [11] (Figure 2). To understand this organization, let us take an example. Suppose one wants to build an avatar. The avatar consists of geometry elements that describe its legs, arms, head, and so on. Simple geometric elements can be used and deformed to produce more physically realistic geometry. Then skin, hair, and clothes are added. These may be physics-based models attached to the geometry. Whenever the geometry is deformed, these models deform and, thanks to their physics, they may produce wrinkles. Biomechanical models are used for motion, collision response, and so on. Finally, our avatar may exhibit special behaviors when it encounters objects in its world. It might also learn from experience: for example, if it touches a hot surface, it hurts; next time, it will avoid touching such a surface. This hierarchy also works in a top-to-bottom manner: if the avatar touches a hot surface, its behavior may be to retract its hand. Retracting the hand follows a biomechanical pattern. The speed of the movement is based on the physical properties of the hand linked to the rest of the body, which in turn modify the geometric properties that define the hand. AFX does not define models for the last two categories, as they are heavily application-dependent and standard techniques are often customized for each application. Animation of the models is possible at any stage of the pyramid, except the last two stages.


Figure 2 - AFX conceptual organization of models.

3.2 The AFX models

AFX defines six categories of models, following [11]:

1. Geometric models. They capture the form and appearance of an object. Many characters in animations and games can be quite efficiently controlled at this low level. Due to the predictable nature of motion, building higher-level models for characters that are controlled at the geometric level is generally much simpler.

2. Modeling models. They are an extension of geometric models and provide linear and non-linear deformations of the geometry they control.

3. Physical models. They capture additional aspects of the world such as an object's mass and inertia, and how it responds to forces such as gravity. The use of physical models allows many motions to be created automatically and with unparalleled realism.

4. Biomechanical models. Real animals have muscles that they use to exert forces and torques on their own bodies.

5. Behavioral models. A character exhibits a reactive behavior when its behavior is solely based on its perception of the current situation. Goal-directed behaviors can be used to define a cognitive character's goals. They can also be used to model flocking behaviors.

6. Cognitive models. If the character is able to learn from stimuli from the world, it may be able to adapt its behavior. These models are related to artificial intelligence techniques.

3.3 AFX components

3.3.1 Shaping objects

VRML objects consist of polygonal meshes, typically described in IndexedFaceSet nodes. A polygonal mesh represents a sampled version of a smooth surface, and small faces approximate curvature. AFX proposes curved surfaces such as NURBS [14], [12] and subdivision surfaces [3]. Subdivision surfaces support the well-known Loop [13] and Catmull-Clark [8] algorithms as well as extensions such as normal control and edge sharpness [3]. An extended Loop algorithm enables quadrangulated meshes to be subdivided smoothly while retaining smooth color transitions during the subdivision process. Hierarchical subdivision surfaces enable progressive detail additions at each level of the subdivision process; wavelets are used to compress the detail signal.
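As an illustration of how a subdivision surface refines a coarse mesh, the following sketch performs one step of the classical Loop scheme [13] on a closed triangle mesh; it is a simplified illustration that omits boundaries, sharpness tags, normal control, and the hierarchical/wavelet extensions mentioned above:

```python
# One step of classical Loop subdivision [13] on a closed triangle mesh
# (sketch only: assumes a closed mesh, no boundaries or sharpness tags).
import math

def loop_subdivide(verts, faces):
    # Adjacency: neighbours of each vertex and the faces incident to each edge.
    neighbours = {i: set() for i in range(len(verts))}
    edge_faces = {}
    for f in faces:
        for a, b in ((f[0], f[1]), (f[1], f[2]), (f[2], f[0])):
            neighbours[a].add(b)
            neighbours[b].add(a)
            edge_faces.setdefault(frozenset((a, b)), []).append(f)

    # New edge points: 3/8 of the edge endpoints + 1/8 of the two opposite vertices.
    new_verts = list(verts)
    edge_point = {}
    for edge, adjacent in edge_faces.items():
        a, b = tuple(edge)
        opposite = [v for tri in adjacent for v in tri if v not in edge]
        p = tuple(0.375 * (verts[a][k] + verts[b][k]) +
                  0.125 * sum(verts[o][k] for o in opposite) for k in range(3))
        edge_point[edge] = len(new_verts)
        new_verts.append(p)

    # Reposition original vertices with the Loop vertex rule.
    for i, nbrs in neighbours.items():
        n = len(nbrs)
        beta = (0.625 - (0.375 + 0.25 * math.cos(2 * math.pi / n)) ** 2) / n
        new_verts[i] = tuple((1 - n * beta) * verts[i][k] +
                             beta * sum(verts[j][k] for j in nbrs) for k in range(3))

    # Each triangle splits into four.
    new_faces = []
    for a, b, c in faces:
        ab, bc, ca = (edge_point[frozenset(e)] for e in ((a, b), (b, c), (c, a)))
        new_faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
    return new_verts, new_faces

# Example: subdivide a regular octahedron once.
octa_v = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
octa_f = [(0, 2, 4), (2, 1, 4), (1, 3, 4), (3, 0, 4),
          (2, 0, 5), (1, 2, 5), (3, 1, 5), (0, 3, 5)]
v, f = loop_subdivide(octa_v, octa_f)
print(len(v), "vertices,", len(f), "triangles")   # 18 vertices, 32 triangles
```

Each application quadruples the face count and moves the mesh toward the smooth limit surface.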

Figure 3 - Subdivision surfaces using the extended Loop algorithm: starting from quads (left), triangulation creates invisible edges (center). The extended Loop algorithm preserves curvatures during the subdivision process (right).

Figure 4 - Subdivision surfaces with normal control.

Figure 5 - Hierarchical Subdivision Surfaces.

AFX introduces solid modeling [15], which enables content authors to create complex volumes using exact geometry, and an extension to constructive solid geometry (Figure 6).

Figure 6 - Solid modeling operations with density. From left to right: two separate spheres, intersecting spheres, and overlapping spheres. The center line is a cross-section of the left sphere showing the densities.
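The density-based boolean operations illustrated in Figure 6 can be sketched with implicit (density) functions; the min/max formulation below is one common convention and is illustrative only, not the normative AFX solid-modeling tools:

```python
# Toy constructive solid geometry on density/implicit functions
# (one common min/max convention; illustrative, not the normative AFX tools).

def sphere(cx, cy, cz, r):
    """Density > 0 inside, 0 on the surface, < 0 outside."""
    return lambda x, y, z: r * r - ((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2)

def union(a, b):        return lambda x, y, z: max(a(x, y, z), b(x, y, z))
def intersection(a, b): return lambda x, y, z: min(a(x, y, z), b(x, y, z))
def difference(a, b):   return lambda x, y, z: min(a(x, y, z), -b(x, y, z))

# Two unit spheres whose centres are one unit apart, as in Figure 6.
s1 = sphere(0.0, 0.0, 0.0, 1.0)
s2 = sphere(1.0, 0.0, 0.0, 1.0)
lens = intersection(s1, s2)

print(lens(0.5, 0.0, 0.0) > 0)   # True: inside the overlapping region
print(lens(-0.5, 0.0, 0.0) > 0)  # False: inside s1 only
```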

3.3.2 Texturing objects

AFX proposes new tools for creating textures: from procedural textures, to light-field mapping [9], to image-based rendering [10], and to photorealistic synthetic images [7]. Light-field mapping offers a compelling solution for efficient interactive visualization of the photorealistic reflectance properties of both real and synthetic objects and complete environments. Image-based rendering uses images with depth information to represent objects and environments.

Figure 7 - Light-field mapping objects and environment (top). A troll represented using depth image-based rendering (bottom).
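The core step behind depth image-based rendering, back-projecting a depth image into a 3D point cloud through a pinhole camera model, can be sketched as follows (the intrinsic parameters fx, fy, cx, cy are hypothetical and not an AFX-defined interface):

```python
# Back-projecting a depth image into 3D points with a pinhole camera model
# (sketch of the core step behind depth image-based rendering; the intrinsics
# fx, fy, cx, cy are hypothetical, not an AFX-defined interface).

def backproject(depth, fx, fy, cx, cy):
    """depth[v][u] is the distance along the viewing axis; returns (x, y, z) points."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:            # zero (or negative) marks "no sample" pixels
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points

# A 2x2 toy depth image, principal point at the image centre.
depth = [[1.0, 2.0],
         [0.0, 1.5]]
print(backproject(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5))
```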

Photorealistic synthetic textures can be achieved using the SynthesizedTexture framework: the color information of an image is represented with various vector graphics tools.

Figure 8 - The SynthesizedTexture framework describes images using scene tools in a photorealistic manner. Left: original image. Right: close-up showing the scene elements.

3.3.3 AFX animation tools

As shown in Figure 2, the first four levels of the pyramid can be animated and, as a rule of thumb, the higher the level in the pyramid, the less data is needed to convey animation. MPEG-4 BIFS provides animation tools using piecewise-linear interpolators. Such tools often require a lot of data to approximate the curvature of real paths and assume a linear timeline. In contrast, AFX proposes the Animator node, based on NURBS curve geometry for both the animation path and the timeline (Figure 9).
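The contrast between the two approaches can be sketched as follows: a classic piecewise-linear interpolator on one hand, and sampling a clamped cubic B-spline (a NURBS curve with unit weights) with de Boor's algorithm on the other. This is a sketch only; the Animator node itself and its encoding are not modelled here:

```python
# Piecewise-linear keyframe interpolation vs. sampling a clamped cubic B-spline
# (a NURBS curve with unit weights) via de Boor's algorithm.  Sketch only.
from bisect import bisect_right

def linear_interp(keys, values, t):
    """Classic piecewise-linear interpolator over (key, value) pairs."""
    if t <= keys[0]:  return list(values[0])
    if t >= keys[-1]: return list(values[-1])
    i = bisect_right(keys, t) - 1
    a = (t - keys[i]) / (keys[i + 1] - keys[i])
    return [(1 - a) * values[i][k] + a * values[i + 1][k] for k in range(len(values[i]))]

def de_boor(knots, ctrl, degree, x):
    """Evaluate a B-spline curve at parameter x (de Boor's algorithm)."""
    k = bisect_right(knots, x) - 1
    k = min(max(k, degree), len(ctrl) - 1)          # clamp to a valid knot span
    d = [list(ctrl[j + k - degree]) for j in range(degree + 1)]
    for r in range(1, degree + 1):
        for j in range(degree, r - 1, -1):
            denom = knots[j + 1 + k - r] - knots[j + k - degree]
            a = 0.0 if denom == 0 else (x - knots[j + k - degree]) / denom
            d[j] = [(1 - a) * d[j - 1][m] + a * d[j][m] for m in range(len(d[j]))]
    return d[degree]

# The same four control positions, animated two ways.
ctrl = [(0, 0), (1, 2), (3, 2), (4, 0)]
keys = [0.0, 1 / 3, 2 / 3, 1.0]
knots = [0, 0, 0, 0, 1, 1, 1, 1]   # clamped cubic knot vector
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(t, linear_interp(keys, ctrl, t), de_boor(knots, ctrl, 3, t))
```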


Figure 9 - Left: the same path is traveled with different timelines. Right: example of an animation path made of two NURBS segments.

AFX also proposes the Bone-Based Animation (BBA) tool that enables skeleton animation. A skeleton is composed of bones. Bones are typically connected to a skin mesh model such that when a bone moves, the skin is deformed accordingly. BBA is a biomechanical tool and can be used for any type of skeleton, not just human-like avatars. The skin models can be simple meshes or more complex models such as subdivision surfaces.

Figure 10 - Skeleton definition (left), skin using subdivision surfaces (middle), and refinement (right). As the skeleton is animated, the skin is deformed.
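A minimal sketch of the skinning idea behind bone-based animation is shown below: each skin vertex follows a weighted blend of its bones' transforms (linear blend skinning). This is illustrative only; the BBA nodes, their skinning rules, and their compressed animation streams are not modelled here:

```python
# Minimal linear-blend-skinning sketch: each skin vertex follows a weighted
# blend of its bones' transforms (illustrative only; not the BBA node syntax).
import math

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def trans(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def mat_vec(m, v):
    x, y, z = v
    return tuple(m[i][0] * x + m[i][1] * y + m[i][2] * z + m[i][3] for i in range(3))

def skin(vertices, weights, bones):
    """weights[i] maps bone index -> influence of that bone on vertex i (sums to 1)."""
    out = []
    for v, w in zip(vertices, weights):
        p = [0.0, 0.0, 0.0]
        for bone, wb in w.items():
            q = mat_vec(bones[bone], v)
            p = [p[k] + wb * q[k] for k in range(3)]
        out.append(tuple(p))
    return out

# Bone 0: identity.  Bone 1: bend 45 degrees about an "elbow" joint at x = 1.
identity = trans(0, 0, 0)
elbow = mat_mul(trans(1, 0, 0), mat_mul(rot_z(math.pi / 4), trans(-1, 0, 0)))
bones = [identity, elbow]

verts   = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
weights = [{0: 1.0}, {0: 0.5, 1: 0.5}, {1: 1.0}]   # smooth blend at the joint
print(skin(verts, weights, bones))
```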


4 FUTURE WORK AND PERSPECTIVE

AFX components provide higher-level representations of geometry, animation, and texturing models than BIFS and VRML. The abstraction provided by these components can be extended to streaming and the AFX group is already working on new areas such as view-dependent streaming and scene partitioning.

5 REFERENCES

[1] ISO/IEC 14496-1, Coding of Audio-Visual Objects: Systems, January 2001.

[2] ISO/IEC 14772-1, The Virtual Reality Modeling Language (VRML), 1997.

[3] H. Biermann, A. Levin, D. Zorin, "Piecewise-smooth subdivision surfaces", SIGGRAPH 2000 Conference Proceedings, New Orleans, Louisiana, July 23-28, 2000, pp. 113-120.

[4] M. Bourges-Sévenier et al., Study of ISO/IEC 14496-1:2001/PDAM4, Animation Framework extension and Multi-User Worlds, document N4852, May 2002.

[5] M. Bourges-Sévenier, A. Walsh, MPEG-4 Jumpstart, Prentice Hall, December 2001.

[6] M. Bourges-Sévenier, A. Walsh, Core Web3D, Prentice Hall, September 2000.

[7] M. Briskin, Y. Elichai, Y. Yomdin, "How can Singularity Theory help in Image Processing?", Pattern Formation in Biology, Vision and Dynamics, A. Carbone, M. Gromov and P. Prusinkiewicz, Eds., World Scientific Publishers, pp. 392-423, 1999.

[8] E. Catmull, J. Clark, "Recursively generated B-spline surfaces on arbitrary topological meshes", Computer-Aided Design, 10:350-355, September 1978.

[9] W.-C. Chen, R. Grzeszczuk, J.-Y. Bouguet, "Light Field Mapping: Hardware-Accelerated Visualization of Surface Light Fields", part of "Acquisition and Visualization of Surface Light Fields", SIGGRAPH 2001 Course Notes for Course #46.

[10] P. Debevec, "Introduction to Image-Based Modeling, Rendering, and Lighting", SIGGRAPH '99 Courses, 1999.

[11] J. D. Funge, AI for Computer Games and Animation: A Cognitive Modeling Approach, A K Peters Ltd, August 1999.

[12] H. Grahn, "NURBS extension for VRML97", Blaxxun Interactive, 2000.

[13] C. Loop, Smooth Subdivision Surfaces Based on Triangles, Master's thesis, Department of Mathematics, University of Utah, August 1987.

[14] L. Piegl, W. Tiller, The NURBS Book, Springer-Verlag, 1997.

[15] J.-F. Rotgé, "Principles of solid geometry design logic", Proceedings of the CSG 96 Conference, Winchester, UK, pp. 233-254, 17-19 April 1996.