
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 14, NO. 7, JULY 2004 989

Interpolator Data Compression for MPEG-4 Animation

Euee S. Jang, Associate Member, IEEE, James D. K. Kim, Seok Yoon Jung, Member, IEEE, Mahn-Jin Han, Member, IEEE, Sang Oak Woo, and Shin-Jun Lee

Abstract—Interpolator representation in key-frame animation is now the most popular method for computer animation. The interpolator data consist of key and key value pairs, where a key is a time stamp and a key value is the corresponding value to the key. In this paper, we propose a set of new technologies to compress the interpolator data. The performance of the proposed technique is compared with the existing MPEG-4 generic compression tool. Throughout the core experiments in MPEG-4, the proposed technique showed its superiority over the existing tool, becoming a part of the MPEG-4 standard within the Animation Framework eXtension framework.

Index Terms—Animation, interpolator compression, MPEG-4, three-dimensional (3-D) mesh compression, virtual reality modeling language (VRML).

I. INTRODUCTION

COMPUTER animation is one of the fastest growing areas both in academia and in the multimedia industry. It has evolved from the primitive animation powered by mainframe computers in the 1970s to the high-quality animation abundant in today's games and animated movies. Basically, a computer animation content can be decomposed into two parts: object definition and object animation. The object definition describes how objects are modeled with respect to their geometry and appearance and how objects are related to one another. The object animation contains the description of where objects are placed and how objects change over time. For example, a flight simulator program would have the three-dimensional (3-D) shape information of airplanes, terrains, and buildings (object definition) as well as the flight paths of airplanes, the paths of cameras, and climate changes (object animation).

A simple form of object animation is a motion path. There are quite a few techniques to define the motion of a static object [1]. The most common practice in object animation is key-frame-based animation, a conventional method that originated from cel animation. The main concept of this method is to produce interpolated images from the given first and last images (or key frames). If the time frame between the first and last frames is very short (less than a second), the motion between the frames can be assumed to be linear. Any number of frames in between the key frames can then be generated in a linear fashion. This technique, also called linear interpolation, is currently supported in the virtual reality modeling language (VRML) [2], MPEG-4, and other well-known graphics formats.

Manuscript received July 23, 2003; revised December 10, 2003, and March 17, 2004.

E. S. Jang is with the Information and Communications Department, Hanyang University, Seoul 133-791, South Korea (e-mail: [email protected]).

J. D. K. Kim, S. Y. Jung, M.-J. Han, S. O. Woo, and S.-J. Lee are with Samsung Advanced Institute of Technology, Kihung, Kyungki 440-600, South Korea (e-mail: [email protected]).

Digital Object Identifier 10.1109/TCSVT.2004.830670

One of the advantages of interpolator representation is that it decouples the static object definition and its animation. In this way, different objects may share a single flying path in a game.

Compression of graphical representations of objects is not new. In recent years, a great deal of work has been dedicated to the area called geometry compression [3], [4]. Compression of static polygon meshes has been one of the major themes in geometry compression, while the animation of objects was treated as part of it. One can view image and video compression in the same way: the former is about static objects and the latter is about moving objects.

Tremendous efforts have been made on the compression of the IndexedFaceSet representation (polygon mesh) in MPEG-4. The result, called 3-D mesh coding (3DMC), has now reached the international standard stage within MPEG-4 [9], [10]. However, it should be noted that the compression of animated objects has received less attention than geometry compression thus far. Among previous research dealing with the compression of animated objects, one method proposes to translate the original VRML ASCII files into other ASCII files that are well suited for Gzip compression [6]. This textual representation may be worthwhile, since it is far easier to comprehend than its binary counterpart, but it has its limits in compression.

Some research results are more dedicated to the compression of animated objects (both definition and animation) [7], [8]. Both definition and animation were combined for efficiency. One interesting approach was to apply principal component analysis (PCA) to this combined structure to yield lossless or lossy compression of animated objects [7]. Although a significant amount of compression can be obtained depending on the number of base components, it should be noted that quantifying the fidelity of the lossy compression remains an open task.

MPEG-4 has begun a new extension called the animation framework extension (AFX) to support advanced graphics features [11]. Interpolator compression is one of the major topics in AFX, since a compact interpolator representation is crucial to maintaining a reasonable bitstream size.

The lack of in-depth research in interpolator data compression posed a great research challenge, because the compression ratio required in AFX was very high, comparable to that of geometry compression. In this paper, we present a summary of the core experiments on interpolator compression conducted through MPEG-4 AFX. The proposed interpolator compression technology is now a part of the MPEG-4 standard.

1051-8215/04$20.00 © 2004 IEEE

Fig. 1. Example of a position interpolator. Key values of three coordinates (X, Y, and Z) are depicted over time (key). Typically, the value of key ranges from 0 to 1. The number of keys is 151.

This paper is organized as follows. Basic concepts of key-frame animation and linear interpolation are explained in Section II. Some details are given on MPEG-4 BIFS coding technology in Section III. The proposed coding scheme encompasses dedicated coding techniques for key and key value data. In Section IV, the general coding scheme is introduced. The interpolator data analysis, as the first step of the coding process, is described in Section V. In Section VI, the details of the key data-coding scheme are provided. The key value data-coding scheme is addressed in Section VII. In Section VIII, distortion measures for performance evaluation are addressed. Experimental results are discussed in Section IX. We conclude the paper in Section X.

II. BACKGROUND

A. Terminology

The interpolator data is a set of key frames that approximates the original motion path. The key frames represent the approximated path; however, in this paper, the original path is used interchangeably with the approximated path. A key frame is comprised of a key and a key value. The terms key frame and key are used interchangeably in this paper, since the key represents the time stamp of the key frame. A key is a floating-point value, which indicates a relative time stamp between negative infinity and positive one. A key value is also a floating-point scalar or vector for the given key and is basically not bounded.

B. Linear Interpolation

A key-frame-based system produces continuous animation by generating interpolated in-between frames out of two key frames. The simplest interpolation method is linear interpolation, which requires only a linear equation based on two key frames. One disadvantage of linear interpolation is that it may require a tremendous number of key frames to represent a very complex motion. Curve interpolation may be an alternative to linear interpolation, but it is more expensive in computation, and editing the motion trajectory is more difficult. Linear interpolation seems to be the better choice, but there is a good possibility that the key frames are oversampled. This is partly due to the fact that many existing authoring tools provide uniformly sampled key-framing.

TABLE I
DIFFERENT INTERPOLATORS IN VRML

In this paper, the linear interpolation is based on the VRML and MPEG-4 binary format for scenes (BIFS) specifications [2], [12]. As shown in Table I, there are six types of interpolators depending on usage.
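As a concrete sketch of the linear interpolation described above, the following Python function (our own illustration; the function name and signature are not from the paper or any MPEG-4 reference software) evaluates a scalar interpolator at an arbitrary time:

```python
import bisect

def interpolate(keys, key_values, t):
    """Linearly interpolate a scalar key value at time t.

    keys must be monotonically nondecreasing, as assumed in the paper;
    times outside the key range are clamped to the end values.
    """
    if t <= keys[0]:
        return key_values[0]
    if t >= keys[-1]:
        return key_values[-1]
    i = bisect.bisect_right(keys, t) - 1          # segment containing t
    w = (t - keys[i]) / (keys[i + 1] - keys[i])   # position within the segment
    return key_values[i] + w * (key_values[i + 1] - key_values[i])
```

The same per-component computation extends to vector-valued interpolators (position, color, and so on) by applying it to each coordinate.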

C. Compression Philosophy

Fig. 1 is obtained from data captured in a real-world animation written in VRML. It is easy to see that each curve at each axis (X, Y, or Z) can be represented with fewer line segments than its original 150 segments with 151 keys. In other words, each curve reveals noticeable correlation between consecutive key values at each axis. In order to exploit this high correlation for compression, various coding techniques (such as DPCM, DCT, wavelet-based transform coding, or vector quantization) may be applied. When choosing the proper method, there is one notable requirement: lossless compression. This is to ensure that a compression system would allow both lossless and lossy compression. Our choice was differential coding, since this technique supports both lossless and lossy compression homogeneously.

Fig. 2. Principle of key analysis. (a) Initial (scalar) interpolator. (b) Select both ends as initial keys. (c) For unselected keys, calculate the distortion. (d) Find the next key. (e) Search the next key to yield a better fit. (f)–(h) Find the next key.

Many lossless compression systems have adopted differential coding. In previous MPEG-4 versions, there is a differential coding technique called Predictive MFField (PMF) coding [12]. Nevertheless, the compression gain of differential coding over transform-based coding is not as promising as in other coding standards. So, differential coding by itself could not easily meet the high compression requirement.

Another observation on the motion path in Fig. 1 is that each curve may need a different number of key frames than the other curves; the curves in Fig. 1 clearly differ in complexity from one axis to another. In order to exploit this property, we have devised a way to select the proper number of key frames out of the given key frames. In other words, the point here is that, in order to ensure higher compression, one has to select the most needed keys out of the key frames.

Removing the redundant or less meaningful keys from the original keys is an optimization problem. The best situation for handling this problem is when one has the original motion path. Unfortunately, this is often not the case because of the sampling in key-frame animation. Therefore, the optimization problem becomes how to find the shape closest to the motion path represented by the original key frames. The lossless path is then simply the case in which no key frames are omitted from the original key frames.

D. Analysis of Key Frames

If there are N key frames, the motion path is governed by those N key frames. In order to minimize the number of key frames needed to represent the given motion path, our approach is to choose the best M (M ≤ N) key frames that yield the closest approximation. One approach to minimizing the number of keys is depicted in Fig. 2. Among the N key frames, the first and the last key frames are assumed to remain intact during the analysis. Hence, one can assume that the minimal number of keys has to be at least two. Usually, as shown in Fig. 2(b), two keys are not going to be enough to represent the entire path.

Fig. 3. Structure of PMF: key and key value coder and decoder.

The next task is to find the most appropriate key among the unselected keys. The selection is based on the distortion between the original path and the approximated path with a new key [see Fig. 2(c)]. Once a key that minimizes the distortion is selected, three keys are selected and N − 3 keys remain unselected. The selection process continues until the approximated path is close enough to the original path. By definition, the selection process can continue until all keys are selected, which is the lossless selection.

Fig. 2 shows the case of ScalarInterpolator analysis, which is easily applicable to other types of interpolators by assuming that the same selection process can be done in each axis (or in another domain in the case of OrientationInterpolator). As an alternative to this approach, one can start with all N key frames; the selection process then finds the most redundant key and deletes it from the key-frame list.

The aforementioned key-frame analysis cannot guarantee that the obtained solution is optimal, since there are N!/(M!(N − M)!) possible cases if the problem is to select M keys out of N keys. As the number of keys increases, the optimal solution becomes very hard to find. The problem is further complicated if the newly selected keys do not have to be a subset of the original keys: new keys can be generated between the first and last key frames. Therefore, much room is left for further improvement in key-frame analysis.
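The greedy selection described in this subsection can be sketched as follows. This is a simplified scalar version with hypothetical names; the distortion here is the maximum absolute error measured at the original key frames, which stands in for the distortion measures the paper discusses later:

```python
def greedy_select_keys(keys, values, max_error):
    """Greedy key-frame selection sketch: start from both end keys and
    repeatedly add the original key that best reduces the approximation
    error, until the error drops below max_error."""
    n = len(keys)
    selected = {0, n - 1}

    def error_with(sel):
        # Max absolute error of the piecewise-linear approximation through
        # the selected keys, measured at every original key frame.
        sel = sorted(sel)
        worst = 0.0
        for a, b in zip(sel, sel[1:]):
            for i in range(a + 1, b):
                w = (keys[i] - keys[a]) / (keys[b] - keys[a])
                approx = values[a] + w * (values[b] - values[a])
                worst = max(worst, abs(values[i] - approx))
        return worst

    while error_with(selected) > max_error and len(selected) < n:
        best = min((k for k in range(n) if k not in selected),
                   key=lambda k: error_with(selected | {k}))
        selected.add(best)
    return sorted(selected)
```

As the surrounding text notes, this greedy procedure is not optimal: it commits to one key at a time and never revisits earlier choices.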

E. Differential Compression

Once key frames are selected by the key-frame analysis, differential coding can be applied. Although some redundant keys are removed during the key-frame analysis, strong correlation still remains. The differential coding is applied to keys and key values, respectively. Entropy coding of the differential values is an indispensable part of the differential compression.
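A minimal example of the differential coding idea, assuming first-order prediction only (the function names are our own; the paper's actual scheme adds higher-order predictors and further compaction steps described in Section VI):

```python
def dpcm_encode(samples):
    """First-order differential (DPCM) coding: transmit the first sample
    and then only the difference from the previous sample."""
    residuals = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        residuals.append(cur - prev)
    return residuals

def dpcm_decode(residuals):
    """Invert dpcm_encode by accumulating the residuals."""
    samples = [residuals[0]]
    for r in residuals[1:]:
        samples.append(samples[-1] + r)
    return samples
```

For correlated data the residuals cluster near zero, which is what makes the subsequent entropy coding effective.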

III. OTHER MPEG-4 BIFS CODING TECHNOLOGIES

Before the proposed technology was adopted in MPEG-4, there were other coding technologies to efficiently encode the scene elements in BIFS [12]. In evaluating the proposed technology, these two other BIFS coding technologies are extensively used for comparison. Although some researchers used Gzip for comparison, the BIFS coding technologies were selected because of their superior compression performance over Gzip.

A. BIFS Quantization Method

Called BIFS-Q for short, this method quantizes the data with a given number of bits. Originally, the input data may be represented by 32-bit floating-point values. Quantizing the data with fewer bits allows a simple compression, but may not be sufficiently efficient because BIFS-Q is not equipped with any type of entropy coding. Hence, this technology is quite useful for compact scenes, but limited in the compression of heavy scenes with many complex objects.

B. PMF Coding Method

A simple, yet effective cure for the limitation of BIFS-Q is the PMF coding technique. As can be seen from its full name, PMF was proposed to efficiently encode multiple field data by differential coding with adaptive arithmetic coding. Fig. 3 shows a block diagram of the PMF key and key value codec.

The PMF technology looks similar to the proposed technology in the sense that both techniques are differential coding combined with arithmetic coding. There exist, however, a few clear differences between the two.

1) PMF exploits only intrafield correlation. For instance, the incremental value of the x coordinate does not affect the coding of the y- or z-coordinate values.

2) There are six different interpolators with different purposes, and PMF provides a single operational mode, which does not exploit the correlation based on the different types of interpolators.

3) For orientation interpolation, which is spherical linear interpolation, PMF cannot handle the characteristics of spherical linear interpolation correctly. It still uses linear differential coding in its coding block.

Fig. 4. Interpolator compression system structure: (a) encoder and (b) decoder.

4) The notion of reducing the key frames through key-frame analysis was absent when the PMF technology was developed. Hence, PMF was not in a position to compress the interpolator data most efficiently, even though the interpolator data fall into the category of multiple field data.

IV. INTERPOLATOR CODING SCHEME

The proposed interpolator compression system has the structure depicted in Fig. 4. As explained in the previous section, the original interpolator data are reduced through the interpolator analyzer. Even when lossless compression is desired, this analysis process is mandatory, since there is a good possibility that some keys can be removed after the analysis while keeping the motion path unaltered.

Once the important keys are selected, the key–key value pair data are separated into the key data set and the key value data set, with dedicated encoders for each. Although one can think of the key data as a one-dimensional (1-D) array of floating-point values and of the key value data as a multidimensional array of floating-point values, there are clear differences between the two. The key data is an array whose values are monotonically nondecreasing and usually bounded between 0 and 1. There are also some commonly shared data, such as the number of keys, quantization parameters, and other encoder-specific information; these data are encoded with the key and key value header encoder.

The decoder structure mirrors the encoding process. This is true for the decoders of the header information, the key data, and the key value data. The interpolator synthesizer collects these data and reconstructs the decoded motion path. Further details on the different modules of the coding system are given in the following sections.

V. INTERPOLATOR ANALYSIS

The interpolator analysis is a data compaction process. In this paper, two methods are introduced: data resampling and key-frame analysis. One can choose either method or both, which is an encoder-specific issue.

A. Interpolator Data Resampling

Resampling analyzes the original input interpolator and makes the keys of the interpolator evenly spaced. A given number of new key and key value pairs is generated during resampling. The new keys are generated first and evenly spaced; the new key values are then generated by linearly interpolating the original key values. With the original key frames denoted $(k_i, v_i)$, $i = 0, \ldots, n-1$, and the resampled key frames $(\hat{k}_j, \hat{v}_j)$, $j = 0, \ldots, m-1$, the resampling can be written as

$$\hat{k}_j = k_0 + \frac{j}{m-1}\,(k_{n-1} - k_0), \qquad \hat{v}_j = v_i + \frac{\hat{k}_j - k_i}{k_{i+1} - k_i}\,(v_{i+1} - v_i) \quad (1)$$

where $k_i \le \hat{k}_j \le k_{i+1}$, $k_i$ and $v_i$ are the original key and key value at the $i$th position, $\hat{k}_j$ and $\hat{v}_j$ are the resampled key and key value at the $j$th position, and $n$ and $m$ are the numbers of original and resampled key frames.
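The resampling step can be sketched as follows, following the notation above (the function name is our own; original key values are linearly interpolated at evenly spaced new keys):

```python
import bisect

def resample(keys, values, m):
    """Resample an interpolator to m evenly spaced key frames.

    New keys span the original key range; new key values are linear
    interpolations of the original (key, key value) pairs.
    """
    k0, k1 = keys[0], keys[-1]
    new_keys, new_values = [], []
    for j in range(m):
        t = k0 + j * (k1 - k0) / (m - 1)                       # evenly spaced key
        i = min(bisect.bisect_right(keys, t) - 1, len(keys) - 2)
        w = (t - keys[i]) / (keys[i + 1] - keys[i])
        new_keys.append(t)
        new_values.append(values[i] + w * (values[i + 1] - values[i]))
    return new_keys, new_values
```

Note that, unlike key-frame analysis, this transformation is generally lossy: the original key positions are discarded.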

B. Key-Frame Analysis

The major concepts of key-frame analysis were presented in the previous section. The key-frame analysis process is the core of the proposed compression system because it allows reducing the number of key frames. In the key-frame analysis of orientation interpolator data, one should keep in mind that the angular velocity of the rotated object is as important as the motion path. The aforementioned key-frame analysis is one simple method that achieves this goal. Other approaches to optimally reducing the key frames are left for future research.

994 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 14, NO. 7, JULY 2004

The two presented methods for reducing key-frame data have distinct features. The resampling method cannot be used in the key-preserving mode, since it may change the number of key frames. It also cannot guarantee lossless compression. Empirically, resampling works better than key-frame analysis in very lossy (or low-bit-rate) environments. Intuitively, it is obvious that representing the keys by a few parameters, instead of encoding tens (or even more) of key data, yields substantial bit savings in very low-bit-rate applications.

VI. KEY CODING

A. Key Header Coding

The key frames selected by the interpolator analysis may be represented with different numbers of keys and key values. The number of keys is greater than or equal to the number of key values. This information depends on the compression mode decision, which is explained below. The initial header information, such as the minimum and maximum values of the given key and key value data, often consumes a considerable amount of space, which is critical in low-bit-rate applications. Hence, a compact representation of the header information is as important as the coding of the key and key value data.

1) Compression Mode Decision: For some interpolator-based animations, the editing capability of the interpolator data is considered as important as the fidelity of the reconstructed motion path. In such a case, the keys and/or the key values are to be changed by interactive requests. If a compression system changes the number of key frames, it may make the interactive animation vulnerable, since the system can no longer guarantee random data access (for example, motion synchronization between multiple objects). This is called the key-preserving scenario, where the number of original key frames is preserved after compression.

If it is not the key-preserving scenario, the most important thing is to represent the motion path as accurately and compactly as possible. This case is called the path-preserving scenario.

Table II shows an example of how these two modes differ in representation and compression. The number of original key frames is 9. If the key-preserving mode is selected, the number of keys has to be preserved. At the same time, the efficiency gained through key-frame analysis has to be kept as well. The information field called the key selection flag indicates which key frames are encoded from the original key frames. This allows encoding fewer key values than keys. In the path-preserving mode, the key selection flag is not necessary, since the number of original key frames is no longer preserved. The path-preserving mode is not applied to coordinate interpolator coding.

2) Key Header Information Coding: Some key-coding-specific information is kept in the header information bitstream instead of the key data bitstream, since it is used for decoding the key data bitstream. Hence, the description of the key header information is addressed first. As shown in Table III, the key header information includes quantization parameters.

TABLE II
EXAMPLE OF KEY-PRESERVING AND PATH-PRESERVING MODES

TABLE III
KEY HEADER INFORMATION

The first quantization parameter indicates how many bits are allocated to each key value. The number of keys is encoded as an unsigned integer value. Represented in textual format, the input interpolator key data is very likely to have a unique number of significant digits, which is kept for lossless compression.

The unit-step key region flag (UR) indicates whether a subregion of the key data can be represented by the first value, the last value, and the number of keys in between. The interpolator data representation in VRML/MPEG-4 BIFS is usually produced by well-known authoring tools. When an authoring tool converts a motion path in its own format into the VRML interpolator data format, the key frames are very likely to be sampled evenly.

One problem regarding this conversion process is the resolution. For instance, if a scalar value changes from 0 to 1 with four key frames, we have four keys at 0, 1/3, 2/3, and 1. When these keys are saved, each number is stored in textual form: 0, 0.333333, 0.666667, and 1.0. If the keys are instead represented by the first and last keys and the number of in-between keys, there are two advantages: 1) no need to compress the keys additionally and 2) no precision error due to the textual representation. If a unit-step key region is found, that region of keys is represented by the three values.
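The check for evenly spaced keys could be sketched as follows (our own illustration; the tolerance value is an assumption, not a constant taken from the standard):

```python
def unit_step_region(keys, tol=1e-6):
    """If the keys are (approximately) evenly spaced, return the triple
    (first key, last key, number of keys) that represents them losslessly,
    avoiding textual rounding such as 0.333333 for 1/3; otherwise None."""
    n = len(keys)
    if n < 2:
        return None
    step = (keys[-1] - keys[0]) / (n - 1)
    for j, k in enumerate(keys):
        if abs(k - (keys[0] + j * step)) > tol:
            return None
    return keys[0], keys[-1], n
```

With the tolerance absorbing the textual rounding, the four keys from the example above collapse to the triple (0.0, 1.0, 4).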

Usually, a key is a value between 0 and 1. If the range of the keys goes beyond 0 and 1, the key range flag (OR) is set and the minimum and maximum keys are encoded additionally.

3) Floating-Point Numbers: Quite a few floating-point values appear throughout the header information of the key and key value coding. Although these data are preferably represented losslessly, it is often the case that these floating-point header values consume a significant portion of the entire bitstream. In order to minimize the impact of the floating-point values in the header, a simple but compact representation is adopted: the use of a mantissa and an exponent for the given floating-point value. The binary size of the mantissa is determined by the original number of significant digits. A three-digit value can be represented with 10 bits, for example. The exponent can be represented with 6 bits and an additional sign bit.

Fig. 5. Structure of key coding: (a) key encoder and (b) key decoder.
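A sketch of such a decimal mantissa–exponent representation driven by the textual precision (a hypothetical helper of our own; the exact bit-level layout of the standard is not reproduced here):

```python
def to_mantissa_exponent(text):
    """Represent a decimal-text float as (mantissa, exponent) such that
    value = mantissa * 10**exponent, preserving the textual precision.
    A three-digit mantissa fits in 10 bits (2**10 = 1024 > 999)."""
    sign = -1 if text.startswith("-") else 1
    digits = text.lstrip("+-")
    if "." in digits:
        intpart, frac = digits.split(".")
        exponent = -len(frac)      # exponent recovers the decimal point
        digits = intpart + frac
    else:
        exponent = 0
    return sign * int(digits), exponent
```

Because the mantissa is an integer, it can be written with exactly as many bits as the source text's precision requires, which is the point of the scheme described above.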

B. Key Coding

The coder and decoder structures for the key data are shown in Fig. 5. We begin with a discussion of the quantization process.

1) Quantization: The original key data are floating-point values represented in textual format. Quantization is a process that reduces the resolution of the key data for compression. Both the key and key value data have minimum and maximum values. Hence, quantization represents the data range between the minimum and maximum values with a given number of bits. In the case of a key, the data range is 0 to 1, except for the case where the minimum and maximum values are provided additionally.

The proposed quantization process is uniform quantization, where the step size of each bin is uniform. Uniform quantization is popular in MPEG, being cheap and simple. It should be noted that the first and last quantized values represent the minimum and maximum values, respectively.
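A minimal sketch of such a uniform quantizer (our own helper names; bitstream-level details of the standard are omitted). The index range is chosen so that the first and last indices land exactly on the minimum and maximum, as required above:

```python
def quantize(x, lo, hi, nbits):
    """Uniformly quantize x in [lo, hi] to an nbits-wide integer index.
    Index 0 maps back exactly to lo and the last index exactly to hi."""
    levels = (1 << nbits) - 1
    q = round((x - lo) / (hi - lo) * levels)
    return max(0, min(levels, q))

def dequantize(q, lo, hi, nbits):
    """Reconstruct the representative value for a quantization index."""
    levels = (1 << nbits) - 1
    return lo + q * (hi - lo) / levels
```

For keys the default range is lo = 0, hi = 1 unless explicit minimum and maximum values are transmitted.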

2) Differential Coding of Key Data: Based on the assumption that the keys are equally spaced in time, one can utilize high-order differential coding. Since the key data are nondecreasing, we have devised a special way of encoding key data called the divide and divide (DND) method. DND is a variation of high-order differential coding that minimizes the resulting differential values of the predictor. For practicality, the DND coder can select among the first-, second-, and third-order predictors. Since different predictors yield different sets of differential values, the optimal selection of the predictor becomes a crucial issue for better compression. In this paper, a simple deviation rule is applied to the selection process: the chosen predictor is the one that yields a smaller deviation than the others.

Once the prediction is done, the differential values go through further compaction processes: the shift, fold, and divide operations. The differential values may or may not be centered at zero, depending on the order of prediction. If the most frequent differential value is not zero, it is subtracted from all the differential values and is sent additionally. This process, which centers the differential values at zero, is called the shift operation.

The differential values after the shift operation are represented by all nonnegative numbers through the fold operation as follows:

$$\tilde{v}_i = \begin{cases} 2v_i, & \text{if } v_i \ge 0 \\ -2v_i - 1, & \text{otherwise.} \end{cases} \tag{2}$$

The shift-and-fold operations are needed for further data compaction through the following divide operations. Because of the sampling nature of keys, the differential values tend to be uniform except in a few cases. This results in a good prediction with a few irregularly large differential values. Thus, the resulting differential values are likely to be concentrated near the center, with a few values close to the maximum.

After the fold operation, which makes all the differential values nonnegative, the divide operation is applied. It wraps the larger half of the data to the negative side as follows:

$$v'_i = \begin{cases} \tilde{v}_i - (v_{\max} + 1), & \text{if } \tilde{v}_i > v_{\max}/2 \\ \tilde{v}_i, & \text{otherwise} \end{cases} \tag{3}$$

where $v_{\max}$ is the maximum of the folded values.

This reduces the occupied range of the data, which is now bounded by the maximum value of the lower half of the data in the positive region and the minimum value of the data in the negative region.
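The shift, fold, and divide steps can be sketched as below. The piecewise forms are our reading of the description above (the normative definitions in the MPEG-4 specification may differ in detail), and the side information (`mode`, `vmax`) would have to be transmitted for the decoder to invert each step:

```python
from collections import Counter

def shift(vals):
    """Subtract the most frequent value so the data centers at zero."""
    mode = Counter(vals).most_common(1)[0][0]
    return [v - mode for v in vals], mode

def fold(vals):
    """Map signed values to nonnegative ones (zigzag-style interleaving)."""
    return [2 * v if v >= 0 else -2 * v - 1 for v in vals]

def divide(vals):
    """Wrap the larger half of [0, vmax] onto the negative side."""
    vmax = max(vals)
    return [v - (vmax + 1) if v > vmax / 2 else v for v in vals], vmax
```

For example, shifting `[1, 1, 1, 5]` subtracts the mode 1, folding `[0, -1, 1, 4]` gives `[0, 1, 2, 8]`, and dividing `[0, 1, 2, 7]` wraps the large value 7 down to −1.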

After the divide operation, further divide-type operations can be applied recursively, up to seven times, until the smallest data range is reached. During these operations, either the divide-up or the divide-down operation is used. If the minimum value (after the divide operation) is greater than the maximum value in absolute size, the divide-up operation is selected; otherwise, the divide-down operation is selected. The divide-up and divide-down operations are shown, respectively, as follows:

$$v''_i = \begin{cases} v'_i + (v_{\max} - v_{\min} + 1), & \text{if } v'_i < v_{\min}/2 \\ v'_i, & \text{otherwise} \end{cases} \tag{4}$$

$$v''_i = \begin{cases} v'_i - (v_{\max} - v_{\min} + 1), & \text{if } v'_i > v_{\max}/2 \\ v'_i, & \text{otherwise} \end{cases} \tag{5}$$

where $v_{\min}$ and $v_{\max}$ are the current minimum and maximum values.

The encoding process may be stopped after the shift operation, after the fold operation, or after the divide operations. This decision is made based on the quantization space remaining after each process.

When the divide-up and divide-down operations are selected, the standard deviation can be used for reducing the range of keys. An additional shift-up operation can be applied based on the standard deviation; in this case, the shifted value is the maximum value, as follows:

$$v'''_i = \begin{cases} v''_i + (v_{\max} + 1), & \text{if } v''_i < 0 \\ v''_i, & \text{otherwise.} \end{cases} \tag{6}$$

This rather unusual process may look complex, and it is fair to say that it is no less complex than other conventional differential coding methods. The DND method, however, proved useful in very-low-bit-rate applications, whereas at high bit rates many bits must be assigned to the key value data in order to keep the quality of the animation path as good as possible.

In Table IV, a simple example of the DND processes is given. As the example shows, finding the optimal number of divide-up and divide-down operations is not an easy task. A simple approach, used in our experiments, is to choose the number of divide-up and divide-down operations that requires the minimal actual quantization bits for the given differential values.

TABLE IV
EXAMPLE OF DND OPERATIONS

VII. KEY VALUE DATA CODING

As mentioned earlier in this paper, there are six different interpolators with six different key value sets. Depending on the number of data fields, the coding of three interpolator types (position, orientation, and coordinate) can also support the other types. Hence, we present three coding techniques for key value data.

A. Position Interpolator Coding

The position interpolator represents 3-D coordinate values changing over time; it has a pair of key and key value (with three coordinates) at each key frame. Although the position interpolator consists of 3-D coordinates over time, it represents the motion of an entire object, not of individual vertices. For the animation of individual vertices, the coordinate interpolator is used.

Coding of position interpolators can also be applied to the scalar, color, and normal interpolators. The scalar interpolator is a one-dimensional form of the position interpolator. Color and normal interpolators represent the changes of color and normal attributes over time, but these interpolators also have three components per key value.

The encoding and decoding structure is shown in Fig. 6. The coding scheme is also based on differential coding: the encoding process consists of quantization, circular differential prediction, and entropy coding [16].

1) Normalization and Quantization: Given the key value data, normalization and quantization are performed. In the normalization process, the maximum range among the three components ($x$, $y$, and $z$) is calculated. The given key values are then normalized by this maximum range, and the normalized key values are quantized. The normalizer makes the quantized key values redundant, because the two components whose ranges are not the maximum are quantized with respect to the maximum range instead of their individual maximum values. In the example of Fig. 6, one component has the maximum range of the three; when another component is quantized with this maximum range, its quantized values become similar or identical, and these redundant values are entropy-encoded efficiently.


Fig. 6. Encoder and decoder structures of position interpolator: (a) encoder and (b) decoder.

As for quantization, uniform quantization [15] is performed, the same as in key coding. The only difference is that this time the first and last values are not kept as is. The normalization and quantization process is

$$\tilde{v} = \left\lfloor \frac{v - v_{\min}}{v_{\max} - v_{\min}} \left(2^{n} - 1\right) + 0.5 \right\rfloor \tag{7}$$

where $v$ and $\tilde{v}$ represent the original and quantized values, respectively, and $n$ indicates the bit size for the quantization. The floor function $\lfloor x \rfloor$ takes a floating-point number $x$ and returns the largest integer that is less than or equal to $x$.

In this process, adjusting the maximum and minimum values can minimize the quantization error. For example, assume that the minimum and maximum values are 0 and 3, respectively, and the quantization bit size is 2. If the original values are 0, 0.91, and 1.82, then the quantized values are 0, 1, and 2, and the inverse-quantized values are 0, 1, and 2. The errors between the original and inverse-quantized values are 0, 0.09, and 0.18. In such a case, if we change only the maximum value to 2.7, leaving the others unchanged, the quantized values remain the same, but the inverse-quantized values become 0, 0.9, and 1.8. Thus, the errors are reduced to 0, 0.01, and 0.02.
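The worked example above can be reproduced with a small script. The quantizer follows the floor-of-scaled-value rule of (7), and the helper names are ours:

```python
def quantize(v, vmin, vmax, nbits):
    """Uniform quantization per (7): scale into [0, 2**nbits - 1] and floor."""
    levels = (1 << nbits) - 1
    return int((v - vmin) / (vmax - vmin) * levels + 0.5)

def dequantize(q, vmin, vmax, nbits):
    levels = (1 << nbits) - 1
    return vmin + q / levels * (vmax - vmin)

samples = [0.0, 0.91, 1.82]

# Original range [0, 3] with 2 bits
q1 = [quantize(v, 0.0, 3.0, 2) for v in samples]
e1 = [abs(v - dequantize(q, 0.0, 3.0, 2)) for v, q in zip(samples, q1)]

# Adjusted maximum 2.7: same quantized indices, smaller reconstruction error
q2 = [quantize(v, 0.0, 2.7, 2) for v in samples]
e2 = [abs(v - dequantize(q, 0.0, 2.7, 2)) for v, q in zip(samples, q2)]
```

Running this confirms that both ranges produce the codes 0, 1, 2, while the errors drop from about 0, 0.09, 0.18 to about 0, 0.01, 0.02.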

As the quantized values are not changed, the coding efficiency of the entropy coder is not changed either. Therefore, with this quantization error minimization process, the distortion between the original path and the decoded path can be reduced without any loss in compression ratio.

Since the quantization error is large in low-bit-rate compression, this quantization error minimization process is especially well suited to low-bit-rate compression.

2) Circular Differential Prediction: First- or second-order prediction is used in key value coding. The decision is encoder specific, but the sum of absolute differences (SAD) is used in this paper: if the SAD from the first-order prediction is less than that from the second-order prediction, the first order is used. The differential value of the first-order prediction is the difference between the current and previous values. In the case of the second-order prediction, the $i$th prediction value is the sum of the $(i-1)$th value and the difference between the $(i-1)$th and $(i-2)$th values. In the case that the prediction value is over the maximum value or below the minimum value of the quantization range, the prediction value is clipped to the maximum or minimum value, respectively.

Another important concept in the differential prediction of key values is circular differential prediction. The differential value between the current and predicted values may be negative or positive; if positive, the current value is higher than the prediction value. If the prediction value is the maximum value, the differential value is always nonpositive; if the prediction value is the minimum value, the differential value is always nonnegative. Since the sign bit is encoded separately, it is important to keep the differential value as small as possible.

Circular prediction is performed on the assumption that the maximum and minimum values of the quantization range are circularly connected to each other. Accordingly, if the differential data, i.e., the result of performing the prediction on two consecutive quantized data, are greater than half of the maximum value of the quantization range, their magnitude can be decreased by subtracting the maximum value from the differential data.

When $D_i$ represents the differential value between two successive quantized values at times $t_{i-1}$ and $t_i$, and the maximum and minimum values are $(2^{n} - 1)$ and 0, respectively, a circular prediction operation is defined as

$$\tilde{D}_i = \begin{cases} D_i - (2^{n} - 1), & \text{if } D_i > 0 \\ D_i + (2^{n} - 1), & \text{otherwise} \end{cases}$$

$$\text{Circular differential value} = \operatorname*{arg\,min}_{d \in \{D_i,\, \tilde{D}_i\}} |d| \tag{8}$$

where $n$ represents a predetermined number of quantization bits. According to the above equation, circular prediction selects whichever of the original differential value and the circularly differentiated value has the smaller absolute value.

For example, if the prediction, current, minimum, and maximum values are 13, 1, 0, and 15, respectively, then the differential value is $-12$ and the circular differential value is 3. Regardless of sign, the value smaller in magnitude is selected. The selected value never exceeds half of the range between the minimum and maximum values (in this case, 7.5). Therefore, the range of the circularly differentiated values is reduced to half the range of the plain differential values; in other words, the bit size of each circularly differentiated value (for entropy encoding) is reduced by 1 b. The differential values are then entropy coded with adaptive arithmetic coding, which is explained in a later section.
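A sketch of circular differential prediction and its inverse, assuming the range wraps by the maximum value $2^n - 1$ as in the example above (helper names are ours):

```python
def circular_residual(current, predicted, nbits):
    """Choose the smaller-magnitude of the plain and the circular difference,
    assuming the range [0, 2**nbits - 1] wraps by the maximum value."""
    vmax = (1 << nbits) - 1
    d = current - predicted
    circ = d - vmax if d > 0 else d + vmax
    return d if abs(d) <= abs(circ) else circ

def circular_reconstruct(residual, predicted, nbits):
    """Invert the prediction, wrapping back into [0, 2**nbits - 1]."""
    vmax = (1 << nbits) - 1
    v = predicted + residual
    if v < 0:
        v += vmax
    elif v > vmax:
        v -= vmax
    return v
```

With prediction 13, current value 1, and a 4-b range [0, 15], the residual is 3 rather than −12, and the decoder wraps 13 + 3 = 16 back to 1.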

B. Orientation Interpolator Coding

Orientation (or rotation) interpolators represent objects rotating over time. In order to represent a rotation, one needs four components: the rotation axis in the 3-D coordinate system and the rotation angle in radians. Key frames of the orientation interpolator are interpolated through spherical linear interpolation [14].

The notation with the rotation axis and the rotation angle is called the quaternion representation. More precisely, a quaternion has four components

$$Q = (q_0, q_1, q_2, q_3) = \left(\cos\frac{\theta}{2},\; n_x \sin\frac{\theta}{2},\; n_y \sin\frac{\theta}{2},\; n_z \sin\frac{\theta}{2}\right)$$

where $\theta$ and $(n_x, n_y, n_z)$ denote the rotation angle and the unit normal vector of the rotation axis, respectively. By definition, the squared sum of all quaternion components is one. This implies that one component need not be sent if the other three components are transmitted. In the compression of quaternions, the important point is that one should preserve the displacement of the angular velocity, not the linear displacement between key frames, because linear displacement cannot reflect the rotational differences between key frames.

Fig. 7 depicts the structure of key value coding of the orientation interpolator. Differently from the key value coding of the other interpolators, the key value coding of the orientation interpolator applies rotational differential prediction before quantization. After quantization, first-order circular differential prediction is conducted to produce the final differential values for the entropy coder.

1) Rotational Differential Prediction: A rotation can be thought of as a transformation of a given point into a new position. A position with 3-D coordinates can be represented in quaternion form as well: $p = (0, x, y, z)$. The key frames of the orientation interpolator can then be viewed as a set of transformations with different rotations. If a point $p$ is transformed by a quaternion rotation $Q$ into $p'$, the transformation is $p' = Q\,p\,Q^{*}$, where $Q^{*}$ is the quaternion conjugate of $Q$ and the products are quaternion multiplications [14].

A 3-D point $p$ under the key frame quaternion transformations is transformed from $p_{i-1}$ to $p_i$, where $p_{i-1} = Q_{i-1}\,p\,Q_{i-1}^{*}$ and $p_i = Q_i\,p\,Q_i^{*}$. This can be rewritten as

$$p_i = R_i\,p_{i-1}\,R_i^{*} \tag{9}$$

where $R_i = Q_i\,Q_{i-1}^{*}$ is the difference quaternion. While (9) defines the true difference quaternion, we should not adopt this equation as is in real encoding. Since the term $Q_{i-1}$ cannot be calculated at the decoder side, we replace it with the reconstructed quaternion $\tilde{Q}_{i-1}$ at the encoder side:

$$\tilde{R}_i = Q_i\,\tilde{Q}_{i-1}^{*}. \tag{10}$$

In order to reduce the number of bits to be encoded, only three components of the difference quaternion are encoded; the first component is omitted. The decoder then restores the first component from the three encoded components using

$$\tilde{r}_0 = \sqrt{1 - \left(\tilde{r}_1^2 + \tilde{r}_2^2 + \tilde{r}_3^2\right)} \tag{11}$$

where $\tilde{r}_0$ is the first component of $\tilde{R}_i$, and $\tilde{r}_1$, $\tilde{r}_2$, and $\tilde{r}_3$ are the second, third, and fourth components of $\tilde{R}_i$, respectively.

In this case, it is important that the first component of a difference quaternion always be positive. In quaternion space, the quaternions $Q$ and $-Q$ are identical when applied to the rotational transformation of an object in 3-D space. Using this characteristic, one can guarantee a positive first component of the difference quaternion by negating all four components whenever the first component is negative.

Fig. 7. Encoder and decoder structures of key value coding of orientation interpolator: (a) encoder and (b) decoder.
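The difference-quaternion computation, the sign normalization exploiting $Q \equiv -Q$, and the recovery of the omitted first component can be sketched as follows (the Hamilton product convention is assumed, and the helper names are ours):

```python
import math

def qmul(a, b):
    """Hamilton product of quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (w1*w2 - x1*x2 - y1*y2 - z1*z2,
            w1*x2 + x1*w2 + y1*z2 - z1*y2,
            w1*y2 - x1*z2 + y1*w2 + z1*x2,
            w1*z2 + x1*y2 - y1*x2 + z1*w2)

def qconj(q):
    """Quaternion conjugate: negate the vector part."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def difference_quaternion(q_cur, q_prev):
    """Rotational difference q_cur * conj(q_prev), negated when needed so
    that the first component is nonnegative (q and -q are the same rotation)."""
    d = qmul(q_cur, qconj(q_prev))
    return tuple(-c for c in d) if d[0] < 0 else d

def restore_first(x, y, z):
    """Recover the omitted first component from the unit-norm constraint."""
    return math.sqrt(max(0.0, 1.0 - (x*x + y*y + z*z)))
```

For a 90° rotation about the $z$ axis following the identity rotation, the difference quaternion equals the current quaternion, and its first component is recovered exactly from the remaining three.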

2) Quantization: Nonlinear quantization has been adopted for the difference quaternion values, since these values are concentrated around zero. This calls for more accurate quantization in the area close to zero, for which nonlinear quantization is better suited. Here, the proposed quantization uses an arc-tangent curve as the nonlinear scale factor.

Fig. 8. Scaling function for nonlinear quantization.

As shown in Fig. 8, the arc-tangent curve gives finer resolution to input values located in the lower range, while giving moderate resolution in the higher range. The equation for the proposed quantization using this nonlinear function is

$$\tilde{v} = \operatorname{sign}(v) \left\lfloor \frac{4}{\pi} \tan^{-1}|v| \left(2^{n-1} - 1\right) + 0.5 \right\rfloor \tag{12}$$

where $v$ is the scaled value of the corresponding difference quaternion component, $n$ is the number of bits used for quantization, and the function $\operatorname{sign}(\cdot)$ determines the sign of the quantized value $\tilde{v}$.
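One plausible realization of the arc-tangent scaling followed by uniform quantization is sketched below; the exact constants and bit allocation in the standard may differ, and the names are ours:

```python
import math

def nonlinear_quantize(v, nbits):
    """Scale |v| with (4/pi)*atan(|v|), then quantize uniformly;
    this gives finer resolution near zero and coarser resolution above."""
    levels = (1 << (nbits - 1)) - 1          # one bit reserved for the sign
    s = (4.0 / math.pi) * math.atan(abs(v))  # maps [0, 1] onto [0, 1]
    q = int(s * levels + 0.5)
    return -q if v < 0 else q

def nonlinear_dequantize(q, nbits):
    """Invert the arc-tangent scaling with the tangent function."""
    levels = (1 << (nbits - 1)) - 1
    v = math.tan(abs(q) / levels * math.pi / 4.0)
    return -v if q < 0 else v
```

A small value such as 0.05 survives an 8-b round trip with an error well under 0.01, illustrating the finer resolution near zero.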

Quantization of the difference quaternion may result in a change of rotation direction due to quantization error. For example, an original 179° rotation may be reconstructed as a 181° rotation at the decoder side, since the rotation value is reconstructed by continuously accumulating inversely quantized differential values. For this reason, the quantized value of the difference quaternion has to be modified appropriately to correct this problem.

Also, encoding only three of the four quaternion components may be problematic, since the three components of the difference quaternion with quantization noise may generate an imaginary number in (11). To avoid this problem at the encoder side, the proposed encoding method forces the quantizer to modify the three quantized components appropriately, so that the decoder obtains a nonnegative real value for the first component of the restored difference quaternion with minimum error.

C. CoordinateInterpolator Coding

The CoordinateInterpolator data, differently from the other interpolator types, have a unique structure. They are used to give a motion to each and every vertex of a given 3-D object (e.g., a 3-D character). If a 3-D object has a hundred vertices in its mesh representation, the CoordinateInterpolator data with 10 key frames would have a thousand key values in total, or 100 key values at each key frame. Hence, one can view the data as a 2-D matrix indexed by the key frames and the vertices.

The CoordinateInterpolator coding scheme has the encoder and decoder structures shown in Fig. 9. It is similar to the other key value coding schemes in that it is basically a differential coder. The coding of the differential values, however, is unique among the schemes presented here; it is explained later in this section.

1) Key Value Header Information Coding: bTranspose is the flag for transpose mode or vertex mode [16]. The coding order of the key value data can be chosen either temporally or spatially. If the temporal scanning order is selected, the differential prediction of the current vertex at time $t$ is performed from the same vertex at time $t-1$. If the spatial scanning order is selected, on the other hand, the differential prediction of the current vertex at time $t$ is performed from a previously coded neighboring vertex at time $t$. This process resembles video coding, which uses intra-frame coding for spatially correlated data and inter-frame coding for temporally correlated data. The scanning order is sent in the header information.

Once the scanning order is chosen for $n$ key frames and $m$ vertices, we have either $m$ vertex sets (with $n$ elements in each set) or $n$ key frame sets (with $m$ elements in each set). For simplicity, we will use the term "vertex set" interchangeably with "key frame set" for the rest of this paper. Each vertex set has three subsets corresponding to the three coordinates ($x$, $y$, and $z$). Each subset will be called the vertex subset or the vertex coordinate subset.

The number of vertices has to be sent as header information. In order to maximize the coding efficiency, not only the minimum and maximum values of the given key value data are encoded, but also the minimum and maximum values of each vertex set. This way, the differential values can be further shrunk in size. The other header information data are similar in principle to those defined in the key header information.

2) Quantization: The quantization process in the coordinate interpolator is similar to that in the position interpolator. Once quantization is finished, the coding operation is performed on each vertex set, starting from the first vertex set of the given mesh and finishing at the last. As with the position interpolator, the quantization error minimization process can be used for the coordinate interpolator.

Fig. 9. Codec structure of coordinate interpolator: (a) encoder and (b) decoder.

Fig. 10. 2-D matrix structure of key values of coordinate interpolator.

3) Differential Prediction: As the coordinate interpolator has a 2-D matrix structure (Fig. 10), the prediction of the current key value can be done in three different ways: spatial, temporal, and spatio-temporal prediction, given by the following equations:

$$D_{i,j} = V_{i,j} - V_{i,k} \tag{13}$$

$$D_{i,j} = V_{i,j} - V_{i-1,j} \tag{14}$$

$$D_{i,j} = V_{i,j} - \left(V_{i,k} + V_{i-1,j} - V_{i-1,k}\right) \tag{15}$$

where $D_{i,j}$ and $V_{i,j}$ represent the differential value and the key value at the $i$th key frame and the $j$th vertex. Likewise, $V_{i-1,j}$ and $V_{i,k}$ represent the key value at the $(i-1)$th key frame and the $j$th vertex and the key value at the $i$th key frame and the $k$th vertex, respectively, where $k < j$. The reference key value in temporal prediction is taken from the key frame right before the current key frame. For spatial prediction, the reference vertex does not have to be the neighboring vertex, because the vertices may be ordered in the description regardless of their correlation. This is a clear distinction from pixel-based image coding.

The best prediction mode could be selected by comparing the results of encoding with every prediction mode, which is very costly. Thus, an entropy calculation was used for the selection, for simplicity. During the differential coding stage, circular prediction is applied, similarly to the differential coding of the other interpolators.
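The entropy-based selection among the three prediction modes can be sketched as follows; the residual layouts and the empirical-entropy criterion are our simplification of the scheme described above:

```python
import math
from collections import Counter

def entropy(residuals):
    """Empirical entropy (bits/symbol) of a residual list."""
    n = len(residuals)
    return -sum(c / n * math.log2(c / n) for c in Counter(residuals).values())

def residuals(V, mode):
    """Residuals of the key value matrix V[key frame][vertex] under one mode."""
    out = []
    for i in range(1, len(V)):
        for j in range(1, len(V[0])):
            if mode == "spatial":
                out.append(V[i][j] - V[i][j - 1])
            elif mode == "temporal":
                out.append(V[i][j] - V[i - 1][j])
            else:  # spatio-temporal (parallelogram-style) prediction
                out.append(V[i][j] - (V[i][j - 1] + V[i - 1][j] - V[i - 1][j - 1]))
    return out

def choose_mode(V):
    """Pick the mode whose residuals have the lowest empirical entropy."""
    return min(("spatial", "temporal", "spatio-temporal"),
               key=lambda m: entropy(residuals(V, m)))
```

For a matrix whose rows change by an almost constant offset over time, the temporal residuals are nearly constant and that mode is chosen.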

4) Dictionary Coding: Unlike the other key value coding techniques presented in this paper, the key value coding of the coordinate interpolator uses dictionary-based coding. Normally, the number of key frames is no more than a few hundred, and after key frame analysis the number of distinct values can decrease to a few tens. For example, if the key values are encoded with 10 b, only a few tens of symbols will be used instead of the $2^{10} = 1024$ possible symbols. Moreover, quite a few symbols appear more than once. Dictionary-based coding was introduced to exploit these facts for coding efficiency.

In dictionary-based coding, two types of information have to be conveyed to the decoder: dictionary data and index data. The proposed dictionary-based coding has two modes that differ in how the dictionary is represented: individual mode and collective mode.

Dictionary coding in individual mode is a process that collects the symbols in scanning order, as illustrated in Table V. In the example, there are only four symbols among 13 input values. Dictionary coding starts from the first symbol of the dictionary, which is 3 in the example; the first input value is, by definition, the first symbol in the dictionary. Once a symbol is given, the remaining values are searched to see whether any of them equals the symbol. If so, the corresponding bit for that value is set; otherwise, it is unset. If none of the remaining values matches the given symbol, a one-bit flag is set to indicate that the current symbol is unique; this flag is unset otherwise and is sent right after the symbol representation (a bold 0 or 1 in the example). The process continues until all the found symbols have been searched. As shown in Table V, the number of searched values decreases, since any value matched with a previous symbol is not checked again. This process generates a mixture of dictionary and index representation, where the dictionary symbols are represented by integers and the indices by binary values. Dictionary coding in collective mode is used when most of the symbols in the dictionary appear in the differential value list, which happens when a low quantization parameter is used.
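A sketch of the individual-mode symbol and occurrence-bit generation described above (the output layout is simplified, and the names are ours):

```python
def individual_mode(values):
    """Emit (symbol, unique_flag, occurrence_bits) triples in scanning order.
    occurrence_bits marks which still-unmatched later values equal the symbol;
    matched values are removed from further searches."""
    pending = list(range(len(values)))
    out = []
    while pending:
        first = pending.pop(0)
        symbol = values[first]
        bits = [1 if values[k] == symbol else 0 for k in pending]
        unique = 1 if not any(bits) else 0
        out.append((symbol, unique, bits))
        pending = [k for k, b in zip(pending, bits) if not b]
    return out
```

For the input `[3, 7, 3, 5, 7]`, the dictionary symbols come out in scanning order as 3, 7, 5, each accompanied by a shrinking occurrence bitmap, and the last symbol is flagged unique.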

For example, if the differential values are in the range of $-3$ to 3, there are seven possible symbols. Instead of representing each symbol by 3 b, it is cheaper to indicate whether each symbol is selected or not. This process is explained in Table VI. It differs from the individual mode in two ways.

1) The dictionary symbols are not encoded directly, but their indices are, and these indices are sent first in the output representation.

2) The flag indicating that the current symbol is unique is not used. In the dictionary, the symbols are listed not in the scanning order, but in ascending order of magnitude while alternating the sign, as shown in Table VI.

TABLE V
INDIVIDUAL MODE DICTIONARY CODING

TABLE VI
COLLECTIVE MODE DICTIONARY CODING

Hence, the resulting representation is all binary values. It should be noted that the index data for the last symbol are not encoded, because the last symbol is already known through the indices of the dictionary symbols.

Since dictionary coding supports two modes, the best mode must be selected. A simple method is to compare the sizes of the output representations of both modes and select the mode that yields the smaller size.

As one can see from Tables V and VI, most of the data in dictionary coding are represented by binary numbers: ones and zeros. In order to maximize the compression efficiency of the entropy coder, the binary representation at every symbol scanning has an additional flag that indicates whether the current representation is inverted or not. If zeros are in the majority, the INVERSE flag is unset. The converted representations for both modes are shown at the end of Tables V and VI, where the INVERSE flag appears right at the beginning of the binary representation of every symbol scanning.

5) Prediction Mode Coding: A vertex set has three coordinate subsets: the $x$, $y$, and $z$ coordinates. Each coordinate subset is encoded independently from the other subsets. Hence, each subset has its own prediction mode: temporal, spatial, or spatio-temporal prediction.

Instead of indicating the prediction modes in each vertex set, the prediction mode data over the entire coordinate interpolator are encoded independently. This prediction mode list precedes the differential coding of the vertex sets in the bit stream, so that the decoder can determine the exact prediction mode before decoding each vertex set.

TABLE VII
COLLECTIVE MODE DICTIONARY CODING OF PREDICTION MODES

A vertex set may have 27 different combinations of prediction modes. For the given number of vertices, the prediction modes in each vertex set are represented by a number between 0 and 26 and encoded with the same concept as collective mode dictionary coding. It should be noted that the index data for the last symbol are not encoded, for the same reason as in collective mode dictionary coding. Table VII presents an example of dictionary coding of prediction modes. Note that the INVERSE flag is not used in prediction mode coding.

D. Entropy Coding

The encoded bit stream of interpolator data compression is either variable-length coded or arithmetic coded. Most of the bitstream of key and key value coding is produced with adaptive arithmetic coding. For signed and unsigned input values, the coding method is the same as that used in MPEG-4 3DMC [13]. The differing coding schemes are described as follows.

1) Quasi-Adaptive Arithmetic Coding (QuasiAAC): The QuasiAAC encodes signed input values with a fixed context. After the first nonzero bit of the magnitude is encoded, all other bits of the magnitude are encoded using only a fixed context. The integer symbols of the individual mode dictionary are encoded by QuasiAAC in the CoordinateInterpolator.

2) Unary Adaptive Arithmetic Coding (UnaryAAC): The UnaryAAC converts the input symbol into a sequence of 0's followed by a single 1. The number of 0's is determined by the magnitude of the symbol, and the 1 serves as the ending flag. Finally, the resulting sequence ends with a 1-b sign value. This makes the entropy encoder see many 0's and only a small number of 1's. For example, an input symbol of 255 is converted to 255 0's, one 1, and one 0 for the sign value; this is unary coding. The higher the probability of 0's, the smaller the size of the encoded bit stream. UnaryAAC is used for coding the key values of position and orientation interpolators.
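The unary conversion can be sketched in a few lines (the subsequent adaptive arithmetic coding of the resulting bits is omitted; the function name is ours):

```python
def unary_symbol(value):
    """Magnitude as a run of 0's, a terminating 1, then a sign bit
    (0 for nonnegative, 1 for negative)."""
    return [0] * abs(value) + [1] + [1 if value < 0 else 0]
```

As in the example above, the symbol 255 becomes 255 zeros, one 1, and a 0 sign bit, i.e., 257 bits dominated by zeros.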

Fig. 11. An example of SQAAC: (a) the organization of bits for a symbol and (b) the encoding process.

3) Successive Quantization Adaptive Arithmetic Coding (SQAAC): SQAAC encodes the value of each symbol by successively refining the quantization range. Fig. 11(a) shows how a sequence of bits is organized from an input symbol. Assume that 1 is the input symbol and the maximum value of the quantization range is 9. In the first step, 1 is located in the lower half of the quantization range [0, 9]; therefore, a "0" is issued and the maximum value of the range is updated to 4. In the next step, 1 is located in the lower half of the range [0, 4]; therefore, a "0" is issued and the maximum value of the range is updated to 1. In the final step, 1 is located in the upper half of the range [0, 1]; a "1" is issued and the minimum value of the range is updated to 1, which pins down the symbol.

Fig. 11(b) shows the encoding process of SQAAC when the sequence of input symbols is, for example, 0, 1, 2, 3, 4, 9 and the maximum value is 9.

As shown in Fig. 11(b), not all values need the same number of quantization bits, unlike in conventional quantization, because the number of bits may vary (or be reduced) according to the current maximum value of the quantization range. For example, the input symbols 0, 1, and 2 need just 3 b each for encoding (not 4 b). This is especially useful when most of the input symbols are located in the relatively lower part of the whole quantization range and only a few symbols are in the higher part. SQAAC is used for coding the key values of position and orientation interpolators.
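The successive range-refinement bits of SQAAC can be sketched as follows; the midpoint convention is chosen to reproduce the bit decisions of the example above, and the arithmetic-coding stage is omitted:

```python
def sq_bits(symbol, vmax):
    """Emit one bit per halving of the range [0, vmax] until the
    range collapses onto the symbol; small symbols need fewer bits."""
    lo, hi = 0, vmax
    bits = []
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if symbol < mid:
            bits.append(0)   # symbol lies in the lower half
            hi = mid - 1
        else:
            bits.append(1)   # symbol lies in the upper half
            lo = mid
    return bits
```

With a maximum of 9, the symbol 1 produces the bits 0, 0, 1 as in the example, and the symbols 0 through 2 all need only 3 b, while a decoder tracking the same midpoints can invert the process.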

VIII. DISTORTION MEASURES

The proposed interpolator data compression method provides not only lossless but also lossy compression. In order to assess the fidelity of the compressed interpolator with respect to the original, we introduce the distortion measures that were used in the core experiments of MPEG-4 AFX [16]. Since the interpolator types have different characteristics, different distortion measures are needed to compare the visual quality of the animation produced by the original and the restored interpolator data sets. Here, we explain each distortion measure according to the type of interpolator.

A. Objective Measure for Position Interpolator

The position interpolator represents a 3-D animation curve in time. This four-dimensional (4-D) curve can be analyzed as three 2-D curves: the $x$ axis versus time, the $y$ axis versus time, and the $z$ axis versus time. By measuring the area difference between the original and approximated paths in each 2-D curve, one can measure the distortion objectively. The objective measure is called the area difference, measuring the area between the original and approximated paths on each axis.

As shown in Fig. 12, the original and approximated paths can be divided into sets of trapezoids or twisted trapezoids.

The distortion is then measured as the total area of these trapezoids or twisted trapezoids. Measuring on the three axes yields three distortions, and their mean gives the average area difference.

B. Objective Measure for Orientation Interpolator

Fig. 12. Measuring the area difference.

Fig. 13. An example of subintervals for estimating error.

An error can be defined as the difference of rotation angle between two positions obtained by rotationally transforming a point with the original quaternion rotation $Q$ and with the restored quaternion rotation $\tilde{Q}$. If $p$ is the quaternion form of an arbitrary point in 3-D space and $p'$ and $p''$ are the quaternion forms rotationally transformed by $Q$ and $\tilde{Q}$, respectively, the following equations can be derived:

$$p' = Q\,p\,Q^{*}, \qquad p'' = \tilde{Q}\,p\,\tilde{Q}^{*} = Q''\,p'\,Q''^{*}$$

where $Q''$ indicates the rotational relationship between $p'$ and $p''$ and is defined as

$$Q'' = \tilde{Q}\,Q^{*}. \tag{16}$$

If $\theta$ denotes the difference of rotation angle between $p'$ and $p''$, then $\theta$ can be obtained using (16) as

$$\theta = 2\cos^{-1}\bigl(Q \cdot \tilde{Q}\bigr) \tag{17}$$

where $\cdot$ indicates the inner product operation. Equation (17) can be expressed as an instantaneous quantization error at a specific time $t$ as follows:

$$e(t) = 2\cos^{-1}\bigl(Q(t) \cdot \tilde{Q}(t)\bigr). \tag{18}$$
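The instantaneous angular error of (17)–(18) can be computed directly from the two unit quaternions; the absolute value below guards against the $Q \equiv -Q$ ambiguity, and the function name is ours:

```python
import math

def rotation_angle_error(q, q_tilde):
    """Instantaneous angular error between two unit quaternions:
    2*arccos(|q . q~|); the absolute value is taken because q and -q
    encode the same rotation, and the dot product is clamped for safety."""
    dot = abs(sum(a * b for a, b in zip(q, q_tilde)))
    return 2.0 * math.acos(min(1.0, dot))
```

For the identity rotation compared against a 90° rotation about the $z$ axis, the error is exactly 90° (i.e., $\pi/2$ radians).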

Using (18), the rms and maximum errors over all intervals can be obtained as follows; a partial sum is first computed for each key interval. Let us assume that the decoded keys corresponding to the original keys $k_i$ and $k_{i+1}$ are $\tilde{k}_i$ and $\tilde{k}_{i+1}$, and that the decoded key values corresponding to the original key values $Q_i$ and $Q_{i+1}$ are $\tilde{Q}_i$ and $\tilde{Q}_{i+1}$. Due to the key coding noise, we cannot directly evaluate the animation path error between the $Q$'s and the $\tilde{Q}$'s, as shown in Fig. 13.

At first, the key interval is divided into three subin-tervals: , , and . Also, we can get the


Fig. 14. Different test results of objective evaluation.

intermediate quaternion values at times t_1 and t_2 using the following equations:

Q(t) = SLERP(Q_i, Q_{i+1}, (t − k_i)/(k_{i+1} − k_i))
Q̂(t) = SLERP(Q̂_i, Q̂_{i+1}, (t − k̂_i)/(k̂_{i+1} − k̂_i)),  t ∈ {t_1, t_2}

where the SLERP function is spherical linear interpolation.

Also, due to the twisted animation path, as shown in Fig. 13, the distortion measurement in the subinterval [t_1, t_2] should deal with [t_1, t′] and [t′, t_2] independently. It is assumed that the distance between the two animation paths (the Q's and the Q̂'s) is minimum at time t′ in the interval [t_1, t_2]. At times t_1 and t_2, the instantaneous errors are e_1 = e(t_1) and e_2 = e(t_2). We also assume that the position of t′ in the interval is proportional to the ratio of these errors:

t′ = t_1 + (t_2 − t_1) · e_1 / (e_1 + e_2).

The intermediate quaternion values at time t′, and hence the instantaneous error e(t′), are obtained with SLERP in the same manner.

It is very difficult to directly evaluate e(t) at an arbitrary time t. Therefore, let us introduce the linear approximation of e(t), which interpolates linearly between the instantaneous errors at the subinterval boundaries.

We can obtain the partial sum S_i of the rms error and the peak error P_i of the interval [k_i, k_{i+1}], as shown at the bottom of the page.
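Under the linear approximation of e(t), one subinterval's contribution to the rms and peak errors has a closed form: the integral of a squared linear function. The sketch below assumes this reading of the partial sums; the names are illustrative.

```python
def partial_sums(e_start, e_end, dt):
    """Partial sums for a subinterval of length dt where the instantaneous
    error varies linearly from e_start to e_end:
      S = integral of e(t)^2 dt = dt * (e1^2 + e1*e2 + e2^2) / 3
      P = the larger endpoint error."""
    s = dt * (e_start * e_start + e_start * e_end + e_end * e_end) / 3.0
    p = max(e_start, e_end)
    return s, p
```

Summing S over the subintervals of a key interval gives its partial sum S_i; the peaks P aggregate by taking the maximum.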

Finally, the rms (D_rms) and peak (D_p) errors over the whole interval are

D_rms = ( Σ_i S_i / (k_N − k_0) )^{1/2}    (19)

D_p = max_i P_i.    (20)

Therefore, the distortion can be measured based on (19) and (20).
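The SLERP function used in the derivation above can be sketched as follows (quaternions as (w, x, y, z) tuples; the shorter-arc sign flip is a common convention and an assumption here, not taken from the paper):

```python
import math

def slerp(q0, q1, s):
    """Spherical linear interpolation between unit quaternions, s in [0, 1]."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:                      # q and -q are the same rotation:
        q1 = tuple(-c for c in q1)     # take the shorter arc
        dot = -dot
    theta = math.acos(min(1.0, dot))
    if theta < 1e-9:                   # nearly identical: linear fallback
        return tuple(a + s * (b - a) for a, b in zip(q0, q1))
    w0 = math.sin((1.0 - s) * theta) / math.sin(theta)
    w1 = math.sin(s * theta) / math.sin(theta)
    return tuple(w0 * a + w1 * b for a, b in zip(q0, q1))
```

Halfway between the identity and a 90-degree z-rotation, for instance, this yields the 45-degree z-rotation.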

C. Objective Measure for Coordinate Interpolator

The coordinate interpolator represents a 3-D animation curve for each vertex in the time sequence. Each 3-D animation curve can be thought of as one position interpolator. Therefore, its distortion can be measured using the same error measure as for the position interpolator, D_a.

Measuring the average area difference (D̄_a) is a bit different, because the normalization has to use the minimum and maximum values over all the vertices, as follows:


Fig. 15. Results of subjective evaluation: (a)-(f) position and orientation interpolator results and (g)-(i) coordinate interpolator results; (a), (d), and (g) VRML original test data; (b), (e), and (h) results of the proposed technique; and (c), (f), and (i) results of the PMF at the same bit rate as the proposed technique.

• Average area difference for the ith vertex

• Average distortion error

where N is the number of keys and V is the number of vertices.

IX. EXPERIMENTAL RESULTS

In this section, we present the experimental results obtained through the core experiments for interpolator data compression

in MPEG-4 AFX. The tests were conducted objectively as well as subjectively. The objective test results are measured with the distortion measures given in the previous section. The subjective results are provided with some visual results.

A. Objective Test Results

The rate-distortion curves were produced for the proposed method and the PMF on the normalized area difference, as described above. The evaluation was performed on 159 test data sets for position, 390 for orientation, and 131 for coordinate interpolators.

For each interpolator, the results are divided into five classes. "Significantly better" is the case where the proposed method has, on average, more than a 30% gain over PMF. "Better" is the case where the proposed method has a 10% to 30% gain over


TABLE VIII. SUMMARY OF OBJECTIVE EVALUATION (%)

PMF. "Similar" is the case where the gain is between +10% and −10%. "Worse" is −10% to −30%, and "significantly worse" is below −30%.
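The five classes can be expressed as a simple threshold mapping on the average gain (a sketch; the function name and inclusive boundary handling are assumptions):

```python
def classify_gain(gain_percent):
    """Map the average rate-distortion gain over PMF (in %) to the five
    evaluation classes summarized in Table VIII."""
    if gain_percent > 30:
        return "significantly better"
    if gain_percent > 10:
        return "better"
    if gain_percent >= -10:
        return "similar"
    if gain_percent >= -30:
        return "worse"
    return "significantly worse"
```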

Fig. 14 shows different cases of the evaluation results. We present the summary of the test results in Table VIII. As shown in Table VIII, the proposed technique substantially outperformed the existing BIFS coding technology.

B. Subjective Test Results

The subjective tests were conducted by viewers in the MPEG SNHC subgroup. The subjective test results also showed the superiority of the proposed method. The comparison was made with the original VRML file and PMF. The snapshot images are shown in Fig. 15.

X. CONCLUSION

Interpolator data compression is a crucial part of animation compression for future multimedia applications. The proposed technology was developed as a part of the MPEG-4 Core Experiments for the development of MPEG-4 AFX. Since the standard specifies only the bitstream syntax and the decoding process, we tried to give more insight into the encoder side. Some of the encoding decisions were made rather simply, instead of searching for an optimal solution. Building a smarter encoder is one of the future work items.

There are new types of animation mechanisms being proposed to MPEG-4, including curve interpolators. A further extension to these new types of interpolators is a foreseeable research topic in the near future.

ACKNOWLEDGMENT

The authors thank J.-H. Ahn, M. Preda, M. Bourges-Sevenier, Y. Yodmin, and G. Jang for their valuable comments on this paper. They would also like to thank C. Kang and H. Lee for collecting and creating quite a few test animation data sets.

REFERENCES

[1] M. O'Rourke, Principles of Three-Dimensional Computer Animation: Modeling, Rendering, and Animating with 3D Computer Graphics. New York: Norton, 1998.

[2] The Virtual Reality Modeling Language, ISO/IEC 14772-1, 1997.

[3] M. Deering, "Geometry compression," in Proc. SIGGRAPH, 1995, pp. 13–20.

[4] G. Taubin and J. Rossignac, "Course on 3D geometry compression," in Proc. SIGGRAPH, Los Angeles, CA, 1999.

[5] Z. Liu and M. F. Cohen, "Keyframe motion optimization by relaxing speed and timing," in Proc. Eurographics Workshop Animation, Maastricht, The Netherlands, 1995.

[6] M. Isenburg and J. Snoeyink, "Coding with ASCII: Compact, yet text-based 3D content," in Proc. 3DPVT, Padova, Italy, 2002, pp. 609–616.

[7] M. Alexa and W. Müller, "Representing animations by principal components," Proc. Eurographics, vol. 19, no. 3, pp. 411–418, 2000.

[8] G. Leach and J. Gilbert, "VRML molecular dynamics trajectories," in Proc. VRML, Paderborn, Germany, 1999, pp. 71–78.

[9] MPEG SNHC Homepage [Online]. Available: http://www.sait.samsung.co.kr/snhc

[10] E. S. Jang, T. Capin, and J. Ostermann, "Visual SNHC tools," in The MPEG-4 Book. Upper Saddle River, NJ: Prentice-Hall, 2002, ch. 9.

[11] Information Technology—Coding of Audio-Visual Objects—Part 16: Animation Framework eXtension (AFX), ISO/IEC 14496-16:2003.

[12] Coding of Audio-Visual Objects—Part 1: Systems, ISO/IEC 14496-1:2001, 2001.

[13] Coding of Audio-Visual Objects—Part 2: Visual, ISO/IEC 14496-2:2001, 2001.

[14] A. Watt, 3D Computer Graphics, 3rd ed. Reading, MA: Addison-Wesley, 2000, pp. 483–488.

[15] A. K. Jain, Fundamentals of Digital Image Processing, 1st ed. Upper Saddle River, NJ: Prentice-Hall, 1998.

[16] Information Technology—Coding of Audio-Visual Objects—Part 11: SL Extension and Multi-User Worlds, ISO/IEC 14496-11:2003/Amd.1:2003(E).

Euee S. Jang (S'93–A'96) was born in Chonju, Korea, in 1968. He received the B.S. degree in computer engineering from Chonbuk National University, Chonju, Korea, in 1991 and the M.S. and Ph.D. degrees in electrical and computer engineering from the State University of New York, Buffalo, in 1994 and 1996, respectively. His Ph.D. dissertation concerned robust image communications.

He was a Research Associate with the U.S. Army Research Laboratory, Adelphi, MD, in 1995. After completing his Ph.D., he joined Samsung Advanced Institute of Technology, Kyungki, Korea, where he spent six years involved with the research and development of various MPEG-4 visual technologies. He was a Project Editor of the MPEG-4 Visual Standard in the MPEG committee (ISO/IEC JTC1/SC29/WG11) from 1997 to 2000. He also served as the Chair of the Synthetic Natural Hybrid Coding (SNHC) Subgroup in MPEG from 1999 to 2002. He has coinvented many MPEG-4 technologies, including shape coding, three-dimensional mesh compression, and interpolator compression. He is also a forefather of MPEG-4 Animation Framework eXtension (AFX) standardization. Since 2002, he has served as an Assistant Professor with the College of Information and Communications, Hanyang University, Seoul, Korea. He has authored more than 70 MPEG contribution papers, 12 journal or conference papers, 35 pending or awarded patents, and two book chapters. His current research interests include computer graphics, animation, image and video coding, and lossless multimedia data compression.

Dr. Jang is a member of the Association for Computing Machinery. He served as a Technical Committee Member for ICME. He was the recipient of two ISO/IEC Certificates of Appreciation for his contributions to MPEG-4 development, in 1999 and 2000, respectively.

James D. K. Kim received the B.S. and M.S. degrees in electronics engineering from Yonsei University, Seoul, Korea, in 1993 and 1995, respectively.

Since 1995, he has been a Member of the Research Staff with the Multimedia Laboratory, Samsung Advanced Institute of Technology, Kyungki, Korea, where he is involved with research and development in the areas of video coding, stereoscopic video processing, three-dimensional object modeling, animation, rendering, and coding. He has been actively involved in MPEG-4 since 1999, focusing in particular on systems and synthetic/natural hybrid coding. His recent research relates to animation data coding tools that compress key-frame-based animation in the MPEG-4 version 5 specifications.


Seok Yoon Jung (S'92–M'99) received the B.S. and M.S. degrees in control and instrumentation engineering and the Ph.D. degree in electrical engineering from Seoul National University, Seoul, Korea, in 1987, 1989, and 1998, respectively.

Since February 1989, he has been a Member of the Research Staff of Samsung Advanced Institute of Technology, Kyungki, Korea. His current research interests include image processing, image and video data compression, and three-dimensional graphics modeling and representation.

Mahn Jin Han (M'02) received the B.Sc. degree in computer science and the M.Sc. degree in multimedia from Yonsei University, Seoul, Korea, in 1994 and 1996, respectively.

Currently, he is a Member of the Technical Staff with the Multimedia Laboratory, Samsung Advanced Institute of Technology, Kyungki, Korea, focusing on the synthetic/natural hybrid coding (SNHC) research of MPEG-4. He has been involved in the standardization of three-dimensional model coding in SNHC and is now focusing on the Animation Framework eXtension (AFX) effort. He is the main contributor to the depth image-based representation in AFX and is also actively involved in the research of interpolator compression. His research interests include computer graphics, data compression, image-based rendering, and scientific visualization.

Sang Oak Woo received the B.S. degree in computer science and engineering from Hanyang University, Ansan, Korea, in 1998 and the M.S. degree in computer engineering from Yonsei University, Seoul, Korea, in 2000.

Currently, he is a Member of the Research Staff with the Multimedia Laboratory, Samsung Advanced Institute of Technology, Kyungki, Korea. His current research interests include MPEG-4, multimedia compression and streaming, and three-dimensional animation compression and representation.

Shin-Jun Lee received the B.S. degree in information engineering from Jeju University, Jeju, Korea, in 1997 and the M.S. degree in computer engineering from Yonsei University, Seoul, Korea, in 1999.

He has been a Member of the Research Staff with the Multimedia Laboratory, Samsung Advanced Institute of Technology, Kyungki, Korea, since 2000. His research interests are MPEG-4 SNHC, three-dimensional (3-D) model compression, and 3-D animation data compression.