
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 5, NO. 3, JUNE 1995 207

An Evolutionary Architecture for Motion-Compensated 100 Hz Television

Gerard de Haan, Paul W. A. C. Biezen, and Olukayode Anthony Ojo

Abstract- In this paper, recently developed algorithms for high quality motion-compensated up-conversion are combined in a new architecture closely resembling that of current 100 Hz consumer television sets. By merging the motion estimation and the motion compensation part, the estimated silicon area could be reduced to a level where the entire functionality can be realized with one processing chip replacing the currently used 100 Hz processing chip. This enables a simple evolution towards motion compensated 100 Hz TV, considered to be very attractive. The architectural choice, and the wish to share expensive memories, requires some modifications in the motion estimator and up-convertor design which are discussed. The specific case of movie programs is dealt with, and it is also shown how the evolutionary architecture can achieve a significantly improved motion portrayal for this movie material.

I. INTRODUCTION

THE SCANNING format of a television system largely determines the maximum spatial and dynamic resolution¹ of the picture. These, in turn, have a major effect on the perceived image quality [1]. One particular parameter of the scanning format, the picture update frequency, further determines how well the movement of objects in the scene can be represented on the screen (motion portrayal) and, at least for cathode ray tube displays (CRT's), also how seriously the picture quality is deteriorated by so-called flickering artifacts [2]. These flickering effects in particular are growing more and more irritating on modern television sets due to the increasing size and brightness of present displays. It has been known, e.g., [3], that adequate motion portrayal is viable at much lower update rates than the rates required for flicker-free image reproduction. Therefore, the compatible increase of the picture update rate, i.e., without additional information transmission, has been investigated by many researchers, e.g., [4]-[9]. Straightforward repetition of every picture on the screen, to double the update rate in a similar way as flicker problems are tackled in the cinema when displaying film, is an example of this line of thinking. A closer study, however, reveals that this type of processing, as well as other linear interpolation techniques, affects the dynamic resolution and the motion portrayal [10].

The field rate conversion problem has arisen much earlier for interfacing 50 and 60 Hz broadcasting systems. Initially film was applied as an intermediate format; later, electronic solutions were developed [11]. Recently, more advanced methods ([12]-[14], or [15]) apply motion information to improve the quality of the interpolated images. Motion vectors are used in a wide range of image processing applications such as coding, noise reduction, and scan rate conversion. The demands on the motion estimator for scan rate conversion, however, are more severe [16] and different from the requirements in the field of coding, for which an appropriate estimation method can be selected from the various available algorithms [17]. Moreover, consumer applications ask for a cost effective approach where functions are merged if possible to share expensive components. Only recently has a motion estimation algorithm been developed which is believed to be feasible for consumer television sets [18], [19]. Furthermore, the same research team presented a new up-conversion algorithm that provides an effective graceful degradation for picture parts with unreliable motion vectors [20].

In Section II of this paper, it is shown that an architecture exists that allows a smooth transition from the current 100 Hz TV designs to a motion-compensated design. Section III indicates the modifications to the above mentioned motion estimation algorithm, which are necessary to make it fit in such an architecture. The architectural choice also leads to additional constraints affecting the robust up-conversion algorithm, dealt with in Section IV. The availability of motion vectors obviously helps to prevent the blurring of fast moving objects in regular studio broadcast material, but can also potentially improve the motion portrayal of television movies. The adaptations of the motion estimation and up-conversion algorithms required for processing of movie material, and the detection thereof, are discussed in Section V. The evaluation of the motion estimation and up-conversion algorithms can be found in earlier publications and is not repeated in this paper. The realization of the concept in a single processing chip is considered to be attractive for future application in flicker-free television receivers on the 50 Hz market. The feasibility of such a one-chip approach is discussed in Section VI. Section VII summarizes the conclusions of the paper.

Manuscript received April 8, 1994; revised April 13, 1995. This paper was recommended by Associate Editor Y. Neuvo.

The authors are with the Television Systems Group, Philips Research Laboratories, 5656 AA Eindhoven, The Netherlands.

IEEE Log Number 9413020.

¹Spatial resolution is also referred to as definition and is measured on stationary images. Dynamic resolution is related to the perceived sharpness of moving objects.

1051-8215/95$04.00 © 1995 IEEE

II. THE EVOLUTIONARY ARCHITECTURE

The motivation for standards conversion in a 50 Hz consumer television set is the removal of flickering, particularly annoying with cathode ray tubes. This is a luxury feature, not an attribute indispensable to program watching. Hence, only a modest additional price can be calculated. The feature price in a consumer product is determined mainly by the chip count and size, and less by the complexity (irregularity) of the design, as quantities are large compared to most professional applications of field rate conversion. An important implication is that, in contrast to studio equipment, in a television receiver field memories have to be considered as expensive components as they cannot (yet) be included on a processing chip. Amongst the available field memories, the FIFO's are currently the most attractive since they come in low pin-count packages that include all address logic.

Fig. 1. Architecture of a nonmotion compensated “digital scan” 100 Hz television receiver introduced on the consumer market in 1993.

These constraints are not unique for motion-compensated 100 Hz television, but have been recognized before for nonmotion compensated designs. Fig. 1 shows a part of the architecture of a recent high-end 100 Hz “digital scan” television set that was introduced on the 50 Hz market in 1993. In this architecture two FIFO field memories are used, and a nonmotion compensated median filtering up-conversion algorithm.

For a description of the algorithm we introduce the input luminance function $F(\mathbf{x}, n)$ that defines the luminance at the spatial position $\mathbf{x} = (x, y)^T$ on the screen, with $T$ indicating transposition, while $n$ is the field number. Referring to Fig. 1, the relation between the up-converted luminance function $F_u(\mathbf{x}, n)$ at the output of the first field memory and the input signal $F(\mathbf{x}, n)$ is

where $\mathrm{Int}(\cdot)$ is a discretization function defined as

This equals the output signal of a field repetition algorithm, sometimes indicated as an “AABB” algorithm [5]. The signal at the output of the second field memory is a delayed version of $F_u(\mathbf{x}, n)$ over the input field period, i.e., a delay of two output fields. This is possible by writing every second field period from the first memory into the second and reading it twice from this second memory.

The relation of the output signal to the input depends on the output field number modulo 4. The first and the last field in the four field output cycle are a copy of $F_u(\mathbf{x}, n)$:

$$F_{out}(\mathbf{x}, n) = F_u(\mathbf{x}, n), \quad (n \bmod 4 = 1 \,\vee\, n \bmod 4 = 0). \qquad (3)$$

Fig. 2. Timing of the signals as available in Fig. 1. The horizontal axis is time, expressed as the output field number.

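Each second and third output field are median filtered

$$F_{out}(\mathbf{x}, n) = \mathrm{median}\left\{ F_u\left[\mathbf{x} + \binom{0}{-1},\, n-2\right],\; F_u\left[\mathbf{x} + \binom{0}{+1},\, n-2\right],\; F_u(\mathbf{x}, n) \right\}, \quad (n \bmod 4 = 2 \,\vee\, n \bmod 4 = 3). \qquad (4)$$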

The processing IC is connected to the second field memory and contains the above described median filtering to correct for the interlace in the second and third fields [21]. A multiplexer switches between original and median filtered lines. Fig. 2 shows the timing of the signals in this architecture.
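The four-field output cycle of (3) and (4) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the receiver's implementation: fields are modelled as full-height 2-D lists (the interlaced line structure is ignored), `f_u` is a hypothetical callable that returns the up-converted field for an output field number, and lines outside the picture are clamped to the nearest line. The function names are illustrative only.

```python
from statistics import median


def vt_median_pixel(cur, prev, y, x):
    """Median of two vertical neighbours in field n-2 and the pixel in field n, as in (4)."""
    h = len(prev)
    up = prev[max(y - 1, 0)][x]        # clamping at the border is an assumption
    down = prev[min(y + 1, h - 1)][x]
    return median([up, down, cur[y][x]])


def output_field(f_u, n):
    """Return output field n of the non-motion-compensated four-field cycle."""
    cur = f_u(n)
    if n % 4 in (1, 0):
        # first and last field of the cycle: plain copy, as in (3)
        return [row[:] for row in cur]
    prev = f_u(n - 2)
    # second and third field: vertical-temporal median, as in (4)
    return [[vt_median_pixel(cur, prev, y, x) for x in range(len(cur[0]))]
            for y in range(len(cur))]
```

Because the median picks the middle of three samples, an isolated outlier in either field is rejected, which is why this filter is cheaper and more robust than a linear vertical-temporal average.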

This 100 Hz architecture (that offers simultaneous access to two successive input fields with only two FIFO field memories) seems to yield the lowest possible cost, as the field memory required for the delay function (access to the previous field) cannot be combined with the one necessary for field-time compression. The FIFO applied for the delay function further enables recursive temporal noise filtering, which is an advantage over a two field memory approach in which the two memories field-alternatingly compress the data [5]. A similar architecture would therefore be attractive for a motion compensated 100 Hz television design. The main additional requirement is access to an environment of pixels in the two successive fields that are already available. Such could be achieved using a “cache” memory with a high output bandwidth for both fields included on the processing chip.

Fig. 3 shows the resulting “evolutionary architecture,” in which the cache, or shifter, memories are realized with line and pixel delays with an output switch matrix to allow the high output bandwidth. The field delay and the vector shifting module, required for motion estimation as well as for up-conversion, are shared by these functions. This is not an obvious possibility as, in this architecture, motion vectors from the estimator cannot be available in time for the up-convertor that has access to the same picture part, due to the processing delay of the estimator. This delay will amount to several TV-lines (with a block-matching type of estimator at least the height of a block). Furthermore, the convertor requires adaptation for movies because of their lower temporal sampling frequency. The feasibility thereof, within the constraints of this evolutionary architecture, remains to be proven. Finally, the standards conversion from a 50 Hz, 2:1 interlaced to a 100 Hz, 2:1 interlaced signal requires two different types of interpolation as shown in Fig. 4: a first temporal interpolation, and a second vertical interpolation that generates an interlace phase one (odd) field from an interlace phase two (even) input field and vice versa (interlace conversion). Motion vectors help to improve both interpolation types, as is known from the literature. However, motion vectors are valid at one temporal instance only.

Fig. 3. Basic diagram of an efficient motion compensated 100 Hz television architecture.

It shall be shown in the following sections how the algorithms of motion estimation [19] and up-conversion [20] can be adapted such that the problems mentioned above can be solved and an evolutionary two field memory motion compensated 100 Hz architecture, with shared shifter memories for estimator and up-convertor, becomes feasible.

III. THE MOTION ESTIMATION ALGORITHM

As mentioned in the introduction, the proposed motion estimator in this 100 Hz television concept is the 3-D Recursive Search (3-D RS) block-matcher. The background of this choice is not only its cost effectiveness in terms of operations count, but also the suitability of the algorithm for field rate conversion because of its highly consistent “true-motion” vectors. We will neither evaluate nor defend the choices made in this algorithm, as this was done in earlier work [19]. However, we will summarize the concept of [19] and discuss the modifications required by the evolutionary architecture resulting from the previous section.

Candidate vectors $\mathbf{C} = (C_x, C_y)^T$ of the motion estimator in [19] are limited to $CS^{max}$:

$$CS^{max} = \{\mathbf{C} \mid -N \le C_x \le +N,\; -M \le C_y \le +M\} \qquad (5)$$

where N = 16, and M = 9 for the current design. The algorithm of [19] consists of two estimators. Each estimator (index a and b) yields a displacement vector $\mathbf{D}_{a,b}(\mathbf{X}, n)$ chosen from the candidate set $CS_{a,b}(\mathbf{X}, n)$, such that it minimizes the matching error $e(\mathbf{C}, \mathbf{X}, n)$ for the block $B(\mathbf{X})$ centered at $\mathbf{X}$. In [19] a block in a previous field n - 1 was shifted to optimally match a block in a reference field n. According to the suggested architecture of Fig. 3, different pairs of blocks, from the available two up-converted luminance fields $F_u(\mathbf{x}, n)$ and $F_u(\mathbf{x}, n-2)$ symmetrically about a reference temporal instance, are matched (Fig. 5) in such a way that the resulting vector $\mathbf{D}(\mathbf{X}, n)$ will be valid for the interpolation of an intermediate output field $F_{out}(\mathbf{x}, n-1)$. The error function on which the vector selection is based, is

therefore modified to

$$e(\mathbf{C}, \mathbf{X}, n) = \sum_{\mathbf{x} \in B(\mathbf{X})} \left| F_u[\mathbf{x} - \mathbf{C}_p(\mathbf{X}, n),\, n-2] - F_u[\mathbf{x} + \mathbf{C}_n(\mathbf{X}, n),\, n] \right| + \alpha * \|\mathbf{U}(\mathbf{X}, n)\|, \quad [C_y(\mathbf{X}, n) \bmod 2 = 1] \qquad (6)$$

where $\|\mathbf{U}\|$ is the length of the update vector (defined later), $\alpha$ is a constant, and * indicates scalar multiplication. This update penalty $\alpha * \|\mathbf{U}\|$, introduced in [19], improves the vector smoothness. The relation between the original element $\mathbf{C}(\mathbf{X}, n)$ of the candidate sets $CS_a(\mathbf{X})$ and $CS_b(\mathbf{X})$ [19], on the one hand, and $\mathbf{C}_n(\mathbf{X}, n)$ pointing to the next field, and $\mathbf{C}_p(\mathbf{X}, n)$ pointing to the previous field respectively, on the other hand, is given by

$$\mathbf{C}_p(\mathbf{X}, n) = \left\{ \mathrm{Int}\left[\tfrac{1}{2} * C_x(\mathbf{X}, n)\right],\; \mathrm{Int}\left[\tfrac{1}{2} * C_y(\mathbf{X}, n)\right] \right\}^T, \qquad \mathbf{C}_n(\mathbf{X}, n) = \mathbf{C}(\mathbf{X}, n) - \mathbf{C}_p(\mathbf{X}, n). \qquad (7)$$

Fig. 4. Up-conversion from 50 to 100 Hz on interlaced video signals. Only every fourth output field is an original. The figure distinguishes original, V-interpolated, and T-interpolated fields along the time axis for input and output fields; o and e indicate odd and even fields.

The match error is summed over a block $B(\mathbf{X})$, at position $\mathbf{X} = (X_x, X_y)^T$ of the block grid, with a width X and height Y (X = Y = 8), defined as²

$$B(\mathbf{X}) = \{\mathbf{x} \mid X_x - X/2 \le x \le X_x + X/2,\; X_y - Y/2 \le y \le X_y + Y/2\}. \qquad (8)$$

The best of the two vectors resulting from estimators a and b is selected and assigned to all pixels at positions $\mathbf{x} = (x, y)^T$ on the pixel grid within the block $B(\mathbf{X})$.
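The symmetrical match error of (6)-(8) can be illustrated with a short Python sketch. Several points are assumptions rather than statements from the text: Int[.] is taken to truncate toward zero, the vector length is taken as the Euclidean norm, the block is a half-open 8 x 8 window, out-of-picture positions are clamped, interlace is ignored, and the sign convention (previous field at x - C_p, next field at x + C_n) follows the up-conversion formula (20). The function names are hypothetical.

```python
def split_vector(c):
    """Split a candidate C into (C_p, C_n) as in (7).

    ASSUMPTION: Int[.] truncates toward zero; the source's definition is not shown here.
    """
    cx, cy = c
    c_p = (int(cx / 2), int(cy / 2))
    c_n = (cx - c_p[0], cy - c_p[1])
    return c_p, c_n


def match_error(f_prev, f_next, block_centre, c, update, alpha=1.0, size=8):
    """Summed absolute difference over one block plus the update penalty, as in (6)."""
    (c_px, c_py), (c_nx, c_ny) = split_vector(c)
    bx, by = block_centre
    h, w = len(f_prev), len(f_prev[0])

    def clamp(v, hi):
        return min(max(v, 0), hi - 1)

    sad = 0
    for y in range(by - size // 2, by + size // 2):
        for x in range(bx - size // 2, bx + size // 2):
            p = f_prev[clamp(y - c_py, h)][clamp(x - c_px, w)]  # field n-2
            q = f_next[clamp(y + c_ny, h)][clamp(x + c_nx, w)]  # field n
            sad += abs(p - q)
    # update penalty alpha * ||U||; the Euclidean length is an assumption
    penalty = alpha * (update[0] ** 2 + update[1] ** 2) ** 0.5
    return sad + penalty
```

The candidate with the lowest returned value would be selected per block; because C_p and C_n sum to C, an odd component simply puts the extra pixel of displacement on the next-field side.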

Candidate vectors, in each block, for the first (a) of the two (a and b) parallel processing estimators are taken from the candidate set $CS_a(\mathbf{X}, n)$:

$$CS_a(\mathbf{X}, n) = \ldots \qquad (9)$$

²Some details are omitted for clarity of the text; in the actual design, match errors are summed over subsampled blocks (pixel subsampling), and vectors are calculated for half the number of blocks only (block subsampling). These features have been earlier described in [18].

Fig. 5. Symmetrical block-matching on interlaced signals. The figure shows how the y-components 0, 6, and 9 of a displacement vector are split in two (a p and an n) parts.

while the update vectors of estimator a, $\mathbf{U}_a(\mathbf{X}, n)$, are found as

where $N_{bl}$ is the output of a block counter, and lut is a look-up table function that generates an update vector for each value of its argument, which is cyclic in $p$. Preferably, $p$ is not a factor of the number of blocks in a picture to prevent a coupling between update vector and spatial position on the screen.

Similarly, for the second estimator b the candidate set $CS_b(\mathbf{X}, n)$ is found as:

while, possibly from the same Look-Up-Table (LUT), updates $\mathbf{U}_b(\mathbf{X}, n)$ are generated as

$$\mathbf{U}_b(\mathbf{X}, n) = \mathrm{lut}\{[N_{bl}(\mathbf{X}, n) + \mathrm{offset}] \bmod p\}. \qquad (12)$$

Good results are reported in [19] using an estimator where the LUT contained the following updates:

To accomplish a symmetrical distribution around $\mathbf{0}$ with $p$ updates, a number of $\mathbf{0}$ updates and symmetrical pairs of the other updates can be added at will to arrive at a value of $p$ that is not a factor of the number of blocks in a field [19].

All this ((8)-(13)) is identical to [19]. A small modification is that the estimator yields a result vector $\mathbf{D}(\mathbf{x}, n)$ which, similar to the candidate vectors, is split into two parts:

$$\mathbf{D}_p(\mathbf{x}, n) = \left\{ \mathrm{Int}\left[\tfrac{1}{2} * D_x(\mathbf{x}, n)\right],\; \mathrm{Int}\left[\tfrac{1}{2} * D_y(\mathbf{x}, n)\right] \right\}^T, \qquad \mathbf{D}_n(\mathbf{x}, n) = \mathbf{D}(\mathbf{x}, n) - \mathbf{D}_p(\mathbf{x}, n) \qquad (14)$$

which are applied in the up-convertor. It is of significance that the vector $\mathbf{D}(\mathbf{x}, n)$ is now valid when calculating an output field at time n - 1. (Note that the argument n indicates the current output field number when $\mathbf{D}(\mathbf{x}, n)$ is calculated, not the position in time for which it is valid.) The definition of the candidate vector parts $\mathbf{C}_p(\mathbf{X}, n)$ and $\mathbf{C}_n(\mathbf{X}, n)$ enables the same (integer) resolution of result vectors as in the design of [19]. Subpixel interpolation for odd candidate vectors is prevented, accepting a slight asymmetry in the exact position of the reference in time for which the vectors are valid, as indicated in Fig. 5. This figure shows, for various y-components of the displacement vector, which lines of the two successive fields are matched in the motion estimator.

Due to the interlaced scanning commonly used in television, even with integer motion vectors vertical interpolation in the motion estimator is required. We suggest applying a motion compensated spatio-temporal median filter, as this nicely fits in the architecture of Section II. Consequently, (6) is valid for odd vertical candidate displacement vectors, whereas for even values it is adapted to either

$$e(\mathbf{C}, \mathbf{X}, n) = \sum_{\mathbf{x} \in B(\mathbf{X})} \left| F_i[\mathbf{x} - \mathbf{C}_p(\mathbf{X}, n),\, n-2] - F_u[\mathbf{x} + \mathbf{C}_n(\mathbf{X}, n),\, n] \right| + \alpha * \|\mathbf{U}(\mathbf{X}, n)\|$$

in case the position in the previous field has to be interpolated, or to

in case the position in the next field has to be interpolated.
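The cyclic update look-up of (12) can be sketched as below. Only the indexing scheme (block counter plus offset, modulo p) is taken from the text. The table contents, the offset values, and the helper names are hypothetical placeholders, since the actual update set is the one given in [19].

```python
# HYPOTHETICAL update table; the real entries are those reported in [19].
LUT = [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0), (0, 0), (2, 0), (-2, 0), (0, 0)]


def update_vector(block_count, offset=0, lut=LUT):
    """Return the update for this block: lut[(N_bl + offset) mod p], with p = len(lut)."""
    return lut[(block_count + offset) % len(lut)]


def couples_to_position(num_blocks, p):
    """True if p is a factor of the block count, so every block would
    receive the same update in every picture (the coupling to avoid)."""
    return num_blocks % p == 0
```

A usage check such as `couples_to_position(num_blocks, len(LUT))` at design time would show whether zero updates or symmetrical pairs still need to be added to the table to change p.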

Fig. 6. Architecture for motion compensated 100 Hz up-conversion with motion estimation and compensation sharing the costly line and field memories.

IV. THE UP-CONVERSION ALGORITHM

Although motion compensated field rate conversion as such is straightforward, the limitations of the chosen architecture of Section II and the interlace of the video signal yield some nontrivial problems. Furthermore, motion vectors cannot be expected to describe all temporal changes accurately. The artifacts under these circumstances, however, should be acceptable. To this end, we depart from an earlier described up-conversion algorithm applying order statistical filtering [20] that realizes a “graceful degradation” in areas with unreliable motion vectors. As in the previous section on motion estimation, we will neither evaluate nor defend the up-conversion algorithm, as this was the topic of earlier work [20], but only detail the modifications forced by the architecture of Section II.

The first field in the four field output cycle is a copy of $F_u(\mathbf{x}, n-2)$, additionally delayed over some lines (the delay of the vector shifters, which corresponds to vector $\mathbf{0}$). This fixed delay will be neglected further, and the first field of the cycle is written as

$$F_{out}(\mathbf{x}, n) = F_u(\mathbf{x}, n-2), \quad (n \bmod 4 = 1). \qquad (19)$$

Each second and fourth output field is motion compensated using the simple variant of the up-conversion algorithm of [20]. This algorithm applies median filtering to realize a graceful degradation:³

$$F_{out}(\mathbf{x}, n) = \mathrm{median}\left\{ F_u[\mathbf{x} + \mathbf{D}_n(\mathbf{x}, n-1),\, n],\; F_u[\mathbf{x} - \mathbf{D}_p(\mathbf{x}, n-1),\, n-2],\; \tfrac{1}{2} * [F_u(\mathbf{x}, n-2) + F_u(\mathbf{x}, n)] \right\}, \quad (n \bmod 2 = 0) \qquad (20)$$

but the displacement vector used, as shown in (20), is apparently taken as the previous estimate (at output field number n - 1). Fig. 6 shows that, as an implication, the motion vector is taken from the (temporal) prediction memory of the estimator. Finally, the third field in each cycle is found as a motion compensated vertical-temporal median interpolation result:

$$F_{out}(\mathbf{x}, n) = \mathrm{median}\left\{ F_u\left[\mathbf{x} + \binom{0}{-1},\, n-2\right],\; F_u\left[\mathbf{x} + \binom{0}{+1},\, n-2\right],\; \ldots \right\}, \quad (n \bmod 4 = 3). \qquad (21)$$

³Interlace is neglected for clarity. In practice the symmetrical motion compensated median as discussed in Section III can be added.

Fig. 7. Signals as present in the motion compensated 100 Hz television architecture shown in Fig. 6.

For the vertical interpolation, in contrast to the temporal interpolation, an asymmetrical compensation is required [(21)], and therefore only one shifter memory is used. The vector range for motion compensated median filtering in the third field of the sequence, therefore, is halved when compared with the range for temporal interpolation. Consequently, in (21) a modified displacement vector $\mathbf{D}'(\mathbf{x}, n)$ appears which is defined as

$$\mathbf{D}'(\mathbf{x}, n) = \begin{pmatrix} \mathrm{limit}[D_x(\mathbf{x}, n);\, N/2] \\ \mathrm{limit}[D_y(\mathbf{x}, n);\, M/2] \end{pmatrix} \qquad (22)$$

with

The main advantage of the proposed architecture of Fig. 6 is revealed by (20) and (21). The displacement vectors are selected as the estimates from the previous compressed field. These vectors can be found in the prediction memory of the motion estimator which stores the “convergence accelerators” (shifted temporal predictions), and are valid for the same field pair as can be seen in Fig. 7. Therefore it is possible to have them available in time, without compensating delays in the video. The processing delay of the estimator, including block erosion,⁴ can be subtracted from the field delay, and still a feasible (positive) delay remains. This architecture consequently enables sharing of the expensive line memories in the shifters for the estimator and up-convertor part. Furthermore, motion estimation is performed twice in one input field period, which (slightly) improves the convergence in difficult sequences.
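The temporal interpolation of (20) can be sketched for a single pixel as follows. This is an illustrative sketch only: interlace is neglected (as in the text), Int[.] in the vector split is assumed to truncate toward zero, border positions are clamped, and the function names are hypothetical.

```python
from statistics import median


def split_vector(d):
    """Split D into (D_p, D_n) as in (14). ASSUMPTION: Int[.] truncates toward zero."""
    dx, dy = d
    d_p = (int(dx / 2), int(dy / 2))
    return d_p, (dx - d_p[0], dy - d_p[1])


def mc_median_pixel(f_prev, f_next, x, y, d):
    """Motion compensated median of (20) at pixel (x, y).

    f_prev is F_u(., n-2), f_next is F_u(., n); d is the vector estimated at n-1.
    """
    (dpx, dpy), (dnx, dny) = split_vector(d)
    h, w = len(f_prev), len(f_prev[0])

    def clamp(v, hi):
        return min(max(v, 0), hi - 1)

    a = f_next[clamp(y + dny, h)][clamp(x + dnx, w)]  # compensated sample from field n
    b = f_prev[clamp(y - dpy, h)][clamp(x - dpx, w)]  # compensated sample from field n-2
    avg = (f_prev[y][x] + f_next[y][x]) / 2            # non-compensated average
    return median([a, b, avg])
```

When the vector is correct, the two compensated samples agree and the median returns their value. When they disagree, the output cannot leave the range they span and may fall back to the plain average, which is one way to read the "graceful degradation" described above.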

⁴Block erosion is a postprocessing operation on the vector field first introduced in [18]. It prevents fixed block structure from becoming visible in the interpolated image, as after the postprocessing a vector is assigned to every pixel. It is omitted from the current text for clarity.


The motion vectors resulting from the 3-D RS block-matcher, as shown in (20) and (21), are applied for the temporal [(n mod 2) = 0] as well as for the vertical [(n mod 4) = 3] interpolation, even though their validity strictly speaking is limited to one temporal instance only. Motion compensation of the vector field would have been an obvious solution to optimally serve both purposes. This option, however, is not without its problems (double assignments and holes in the resulting vector field), and it also complicates the design. We preferred to neglect this option and optimize the estimator for the most critical one of the interpolation processes, i.e., the temporal interpolation, and accept suboptimal motion vectors for the sequential scan interpolation.

The signals at various points of the economical design are illustrated in Fig. 7. In this figure, A*B, BC*, etc., are again motion compensated fields and B*, D*, etc. are motion compensated vertical-temporal median-filtered fields. The control of the field-memories is identical to that of Fig. 1.

V. MODIFIED PROCESSING IN MOVIE MODE

A practical complication for a motion compensated 100 Hz convertor is that a television receiver must be able to cope with pictures originating from film as these frequently occur in regular broadcast. Film is shot with 24 pictures per second, and is not directly suitable for television at a field rate of 50 Hz. This problem is usually tackled by running the film at a slightly increased speed of 25 pictures per second. The information from two successive fields can be scanned from one picture. The result obviously still yields 25 “movement phases” per second as opposed to 50 obtained with pictures shot with a video camera. The consequences from this distinction for the 3-D RS estimator as well as for the up-convertor will be discussed in the following subsections.

A. Movie Mode Motion Estimation

Neglecting interlace, the luminance functions $F_u(\mathbf{x}, n)$ of the present field and $F_u(\mathbf{x}, n-2)$ of the previous field exhibit an unusual relation in case of film originated sequences:

$$F_u(\mathbf{x}, n) \approx \begin{cases} F_u(\mathbf{x} - \mathbf{0},\, n-2), & (n/2 \bmod 2 = 0) \\ F_u(\mathbf{x} - \mathbf{D},\, n-2), & (n/2 \bmod 2 = 1). \end{cases} \qquad (24)$$

The lower part of (24) is the common approximation on which motion estimation algorithms are based. The upper part however reveals that, with film scanner signals, between every other input field pair no displacements occur. Due to interlace, this is only true with a good vertical interpolation. The median filtering may yield some mismatch in the presence of strong vertical detail, but usually the vector $\mathbf{0}$ between every second field pair will be obtained from the estimator. Apart from the up-conversion problem for this input material, the 3-D RS block-matcher will suffer from the poor quality of the convergence accelerators, i.e., the candidate vectors in $CS_a(\mathbf{X}, n)$ [(9)] and $CS_b(\mathbf{X}, n)$ [(11)] taken from the previous field n - 1.

Obviously, the convergence look-ahead function of the convergence accelerators (see also [18] and [19]) is lost during film transmission, as no relation exists any longer between vectors in two consecutive input fields. This is even more dramatic, since the displacements of objects measured between the other field pairs, for the same speed, are doubled compared to pictures originated on a video camera. To cope with this problem, for movie material $CS_a(\mathbf{X}, n)$ and $CS_b(\mathbf{X}, n)$ are adapted according to

and

with

$$k = \begin{cases} 1, & n/2 \bmod 2 = 0 \\ 3, & n/2 \bmod 2 = 1. \end{cases} \qquad (27)$$

This may seem to require an increased capacity of the prediction memory in Fig. 6. However, the storage of the predictable zero vector obtained every other input field can be omitted; thus it is possible to handle the increased delay with the same memory capacity as required for nonfilm sequences. What remains is to reliably detect film being transmitted and use that information to modify the delay of the convergence accelerators in the estimator, and to adapt the up-conversion algorithm. These are the topics of the next subsections.

B. Movie and Phase Detection

The unique property of video originating from film is the irregular motion portrayal or movement judder, shown in (24). With a motion estimator available, it seems obvious to measure the ratio of the summed vector lengths of consecutive fields to obtain a signal indicating whether or not the pictures have originated from film. This yields a film identification signal $FI(n)$ according to

$$FI(n) = \frac{\sum_{\mathbf{X} \in \mathrm{field}(n)} \|\mathbf{D}(\mathbf{X}, n)\|}{\sum_{\mathbf{X} \in \mathrm{field}(n-2)} \|\mathbf{D}(\mathbf{X}, n-2)\| + \Delta} \qquad (28)$$

where $\mathbf{X}$ runs through all the centers of blocks X * Y within the indicated current n or previous n - 2 field. The term $\Delta$ in the denominator is a small constant to prevent division by zero. For “regular” video, $FI(n)$ is expected to be close to 1, whereas for film-originated video, either a large ($\gg 1$) or a small ($\ll 1$) value is expected depending on the film-phase and the input field number. As the change between two film pictures can occur, either between a next odd and a previous even field, or between a next even and a previous odd field, two different film-phases are possible.

The first option is standardized for broadcasting, but the other was encountered fairly often in regular broadcast programs, where in contrast to (24) it was found that

$$F_u(\mathbf{x}, n) \approx \begin{cases} F_u(\mathbf{x} - \mathbf{0},\, n-2), & (n/2 \bmod 2 = 1) \\ F_u(\mathbf{x} - \mathbf{D},\, n-2), & (n/2 \bmod 2 = 0). \end{cases} \qquad (29)$$

Hence, the $FI$ reveals whether a signal is from film and if so, in which phase it arrives at the motion estimator:

$$\begin{aligned} 1 - Th \le FI(n) \le 1 + Th &\;\Rightarrow\; \text{video} \\ FI(n) < 1 - Th \,\wedge\, n/2 \bmod 2 = 0 &\;\Rightarrow\; \text{film, phase} = 1 \\ FI(n) > 1 + Th \,\wedge\, n/2 \bmod 2 = 0 &\;\Rightarrow\; \text{film, phase} = 0 \end{aligned} \qquad (30)$$

where phase = 0 indicates nonstandard movie pictures and phase = 1 will be found for standard movie broadcast material. It is possible to choose film or video and film-phase, without multiplications or actually calculating the ratio of expression (28), by comparing

$$\sum_{\mathbf{X} \in \mathrm{field}(n)} \|\mathbf{D}(\mathbf{X}, n)\| \qquad (31)$$

with

$$(1 \pm Th) * \sum_{\mathbf{X} \in \mathrm{field}(n-2)} \|\mathbf{D}(\mathbf{X}, n-2)\| \qquad (32)$$

and $Th$ is selected as

$$Th = \frac{1}{2^k}, \quad (k \in \mathbb{N}). \qquad (33)$$
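The multiplication-free comparison of (31) and (32) can be illustrated in Python. With Th = 2^-k, the product Th * S reduces to a right shift of S, so only shifts, adds, and compares remain. The sketch assumes the summed vector lengths are available as non-negative integers, reads "n/2 mod 2" as an integer division on an even output field number, and uses a hypothetical function name and return convention.

```python
def classify(sum_cur, sum_prev, n, k=2):
    """Apply the decision rules of (30) using the shift-based comparison of (31)-(33).

    sum_cur  : summed vector lengths of field n      (integer, assumed)
    sum_prev : summed vector lengths of field n - 2  (integer, assumed)
    Returns ("video", None), ("film", phase), or None when (30) lists no rule.
    """
    delta = sum_prev >> k                # Th * sum_prev with Th = 2**-k
    lower = sum_prev - delta             # (1 - Th) * sum_prev
    upper = sum_prev + delta             # (1 + Th) * sum_prev
    if lower <= sum_cur <= upper:
        return ("video", None)
    if (n // 2) % 2 != 0:
        return None                      # film rules in (30) are given for n/2 mod 2 = 0 only
    if sum_cur < lower:
        return ("film", 1)               # standard film phase
    return ("film", 0)                   # nonstandard film phase
```

Note that the small constant in the denominator of the ratio is not needed in this form, because no division takes place.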

In this case only simple bit shifts, comparators, and adders are required. Experiments with this identification signal however show an uncertainty when little movement occurs in the sequence, which could be expected as both the numerator and the denominator of (28) will approach zero. Although in this situation no strong artifacts in the up-converted image are expected, an improvement turns out to be simple. In the improved proposal a flip-flop is set to video if

$$1 - Th \le FI(n) \le 1 + Th \,\wedge\, A(n) > Th_a \,\wedge\, n/2 \bmod 2 = 0 \qquad (34)$$

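where the vector activity $A(n)$ is defined as:

$$A(n) = \sum_{\mathbf{X} \in \mathrm{field}(n)} \|\mathbf{D}(\mathbf{X}, n)\| + \sum_{\mathbf{X} \in \mathrm{field}(n-2)} \|\mathbf{D}(\mathbf{X}, n-2)\| \qquad (35)$$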

where $\mathbf{X}$ runs through all the centers of blocks X * Y in the present and the previous field. The flip-flop is set to film and a second flip-flop to standard film-phase, if

$$FI(n) < 1 - Th \,\wedge\, A(n) > Th_a \,\wedge\, n/2 \bmod 2 = 0. \qquad (36)$$

Finally the film bit is set, while the film-phase is set to nonstandard, if

$$FI(n) > 1 + Th \,\wedge\, A(n) > Th_a \,\wedge\, n/2 \bmod 2 = 0. \qquad (37)$$

Fig. 8. The 50 Hz motion portrayal in the case of a movie sequence and the ideal motion compensated 100 Hz output. Compared to nonmovie material, temporal interpolation over a larger interval is required.

The set-reset construction eliminates the hesitation, as decisions are only made when sufficient motion is available in the sequence [$A(n) > Th_a$]. A reliable detection was verified with a laboratory prototype of the motion compensated up-convertor.
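The set-reset behaviour of (34)-(37) can be sketched as a small state holder. The class name, the default thresholds, and the initial state are hypothetical; integer vector-length sums and the shift-based threshold Th = 2^-k are assumptions carried over from the comparison of (31)-(33).

```python
class FilmDetector:
    """Holds the film and film-phase flip-flops; updates them only with enough motion."""

    def __init__(self, k=2, th_activity=64):
        self.k = k                          # Th = 2**-k
        self.th_activity = th_activity      # activity threshold (hypothetical value)
        self.film = False                   # initial state is an assumption
        self.standard_phase = True

    def update(self, sum_cur, sum_prev, n):
        activity = sum_cur + sum_prev       # A(n), as in (35)
        if activity <= self.th_activity or (n // 2) % 2 != 0:
            return self.film, self.standard_phase   # no decision: keep state
        delta = sum_prev >> self.k
        if sum_cur < sum_prev - delta:      # condition (36)
            self.film, self.standard_phase = True, True
        elif sum_cur > sum_prev + delta:    # condition (37)
            self.film, self.standard_phase = True, False
        else:                               # condition (34)
            self.film = False
        return self.film, self.standard_phase
```

Because the state is only rewritten when the activity exceeds its threshold, a nearly static scene leaves the previous decision in place instead of toggling on noise.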

C. Movie-Mode Up-Conversion

The up-conversion to 100 Hz of video originating from movie input, due to the particular motion portrayal of objects in a movie, is more complicated than for video camera pictures. Fig. 8 illustrates this task. As can be seen, three out of every four output fields, as opposed to only one for the video situation, have to be interpolated temporally.

We could propose for the first output field in the four-field output cycle:

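$$F_{out}(\mathbf{x}, n) = F_u(\mathbf{x}, n-4), \quad (n \bmod 4 = 1). \qquad (38)$$

The third field is motion compensated, applying the vectors estimated between the third and the fourth field according to

$$F_{out}(\mathbf{x}, n) = \mathrm{median}\left\{ F_u[\mathbf{x} + \mathbf{D}_n(\mathbf{x}, n-1),\, n],\; F_u[\mathbf{x} - \mathbf{D}_p(\mathbf{x}, n-1),\, n-4],\; \tfrac{1}{2} * [F_u(\mathbf{x}, n-4) + F_u(\mathbf{x}, n)] \right\}, \quad (n \bmod 4 = 3). \qquad (39)$$

The second field is found as

$$F_{out}(\mathbf{x}, n) = \mathrm{median}\left\{ F_u[\mathbf{x} + \tfrac{3}{2} * \mathbf{D}_n(\mathbf{x}, n-1),\, n],\; F_u[\mathbf{x} - \tfrac{1}{2} * \mathbf{D}_p(\mathbf{x}, n-1),\, n-4],\; \tfrac{1}{2} * [F_u(\mathbf{x}, n) + F_u(\mathbf{x}, n-4)] \right\}, \quad (n \bmod 4 = 2) \qquad (40)$$

and finally the fourth field as:

$$F_{out}(\mathbf{x}, n) = \mathrm{median}\left\{ F_u[\mathbf{x} + \tfrac{1}{2} * \mathbf{D}_n(\mathbf{x}, n-1),\, n],\; F_u[\mathbf{x} - \tfrac{3}{2} * \mathbf{D}_p(\mathbf{x}, n-1),\, n-4],\; \tfrac{1}{2} * [F_u(\mathbf{x}, n) + F_u(\mathbf{x}, n-4)] \right\}, \quad (n \bmod 4 = 0). \qquad (41)$$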

As can be seen from (40) and (41), the information for two of the three fields requires shifting over a temporal interval of three output field periods. This is three times more than the shifts for the same object velocities in nonmovie sequences. The artifacts therefore, in the case of erroneous motion vectors, are also stronger. Furthermore, three quarters of all fields, rather than one half, are temporally interpolated. Finally, (38)-(41) reveal that an additional field memory is required to have simultaneous access to $F_u(\mathbf{x}, n)$ and $F_u(\mathbf{x}, n-4)$, i.e., the algorithm would not fit in the evolutionary architecture of Section II.

Subsequently, alternative movie-mode up-conversion algorithms were studied. Before describing the most attractive alternative obtained, however, this seems a good moment to conclude from (1) and (24) or (29) that, in the case of movie pictures, for $F_u(\mathbf{x}, n)$ it holds:

$$F_u(\mathbf{x}, n) = F_u(\mathbf{x}, n + k), \quad (n \bmod 4 = 1,\; k = 1, 2, 3). \qquad (42)$$

This means that a particular output of the estimator or the compensator can be realized on various field pairs. This choice can be used to map the algorithm on the architecture defined in Section II. An attractive alternative movie-mode processing results when the first field in the four-field output cycle is defined as

$$F_{out}(\underline{x}, n) = F_u(\underline{x}, n-2), \quad (n \bmod 4 = 1) \tag{43}$$

whereas the second field has the correct interlace phase if

$$F_{out}(\underline{x}, n) = F_u(\underline{x}, n), \quad (n \bmod 4 = 2). \tag{44}$$

The third and the fourth field are motion compensated temporally interpolated fields, yielding a movement phase in between two film images. Field four is found similar to nonmovie processing:⁵

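$$F_{out}(\underline{x}, n) = \mathrm{median}\left\{F_u[\underline{x} + \underline{D}_n(\underline{x}, n-1), n],\; F_u[\underline{x} - \underline{D}_p(\underline{x}, n-1), n-2],\; \tfrac{1}{2}[F_u(\underline{x}, n) + F_u(\underline{x}, n-2)]\right\}, \quad (n \bmod 4 = 0) \tag{45}$$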

but the third field requires an adaptation:

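$$F_{out}(\underline{x}, n) = \mathrm{median}\left\{F_u[\underline{x} + \underline{D}_n(\underline{x}, n-3), n],\; F_u[\underline{x} - \underline{D}_p(\underline{x}, n-3), n-2],\; \tfrac{1}{2}[F_u(\underline{x}, n) + F_u(\underline{x}, n-2)]\right\}, \quad (n \bmod 4 = 3). \tag{46}$$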

The background of this adaptation is illustrated in Fig. 10. The required displacement vector seems not to be available in the prediction memory, as the previously estimated field pair originated from the same movie picture and therefore yields the vector $\underline{0}$. As discussed in Section V-A, however, this vector $\underline{0}$ is not stored in the prediction memory, which enables reading of the vector $\underline{D}(\underline{x}, n-3)$, indicated in (46). This vector is estimated between a temporally incorrect field pair, which after a subjective evaluation was considered an acceptable drawback.
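The four-phase processing of (43)-(46) can be sketched in a few lines. The following is a minimal 1-D illustration, not the authors' implementation: interlace is ignored, vectors are rounded to whole pixels, and the function names, argument layout, and toy data are all assumptions made for this example.

```python
import numpy as np

def median3(a, b, c):
    """Element-wise median of three equally shaped arrays."""
    return np.maximum(np.minimum(a, b), np.minimum(np.maximum(a, b), c))

def fetch(field, shift):
    """Sample a 1-D field at x + shift(x); positions are rounded and clamped."""
    x = np.arange(field.size)
    idx = np.clip(np.rint(x + shift).astype(int), 0, field.size - 1)
    return field[idx]

def alt_movie_mode_field(phase, f_cur, f_del, vec_recent, vec_older):
    """One output field of the alternative movie mode, following (43)-(46).

    phase      : n mod 4
    f_cur      : F_u(., n)
    f_del      : F_u(., n - 2)
    vec_recent : (D_n, D_p) read for field n - 1, used by (45)
    vec_older  : (D_n, D_p) read for field n - 3, used by (46)
    """
    if phase == 1:
        return f_del.copy()   # (43): delayed field, no interpolation
    if phase == 2:
        return f_cur.copy()   # (44): current field, no interpolation
    d_n, d_p = vec_recent if phase == 0 else vec_older
    return median3(fetch(f_cur, d_n), fetch(f_del, -d_p), 0.5 * (f_cur + f_del))

# Toy data (hypothetical): an impulse that moves 8 pixels between the two fields.
f_del = np.zeros(32)
f_del[10] = 100.0
f_cur = np.zeros(32)
f_cur[18] = 100.0
d = np.full(32, 4.0)
zero = np.zeros(32)

# Third field: the most recent estimate is the zero vector, so the older one is used.
third = alt_movie_mode_field(3, f_cur, f_del, (zero, zero), (d, d))
fourth = alt_movie_mode_field(0, f_cur, f_del, (d, d), (zero, zero))
```

With these toy inputs both interpolated fields place the impulse halfway between its two source positions, which matches the single intermediate movement phase described in the text.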

⁵ Again interlace is neglected for clarity. In practice, the symmetrical motion-compensated median as discussed in Section III can be added.

The resulting motion portrayal for a simple moving object is illustrated in Fig. 9(b). The price paid in this approach is a motion portrayal equivalent to that of nonfilm signals up-converted with the field repetition (“AABB”) algorithm. Compared to film, as shown on 50 Hz sets (Fig. 9(a)), this is a major improvement already. The further improvement possible by sophistication to the level illustrated in Fig. 8 is much smaller, and more dangerous. It is also smaller than the improvement of motion compensated up-conversion of 50 Hz video camera signals, as object velocities in movies are usually kept smaller by the art-director who knows the limitations of his medium.
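The difference between the three portrayals can be made concrete by listing the position at which a uniformly moving object is displayed in each 100 Hz output field. The sketch below is illustrative only; it assumes one new film picture per four output fields, and the helper name and the velocity value are hypothetical.

```python
def displayed_positions(num_fields, velocity, hold):
    """Object position shown in each output field when the movement phase
    is refreshed only once every `hold` output fields."""
    return [velocity * hold * (k // hold) for k in range(num_fields)]

v = 2  # pixels per output field period (hypothetical value)

unprocessed = displayed_positions(8, v, hold=4)  # one movement phase per film picture
partial_mc = displayed_positions(8, v, hold=2)   # two fields per movement phase ("AABB"-like)
full_mc = displayed_positions(8, v, hold=1)      # a new movement phase in every output field
```

The partial scheme halves the position jump between successive movement phases relative to the unprocessed case, while the full scheme would halve it once more.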

VI. FEASIBILITY OF A ONE-CHIP DESIGN

Looking at Fig. 6, there are two blocks that require further detailing to enable a feasibility judgment: the motion estimator block and the motion compensated interpolator part. Of these two parts, the up-convertor is the simplest and the necessary details are shown in Fig. 11. It requires a double access of two times 27 MHz (assuming 13.5 MHz input signals to the architecture) to both shifter memories via the switch matrices of Fig. 6 and little processing. The first access corresponds to the vector $\underline{0}$ and the second to $\underline{D}(\underline{x}, t - T/2)$. The processing elements required are one 8-b adder stage, a median calculation circuit, as follows directly from (20), and an output multiplexer controlled by the movie detection circuit. As shown in Fig. 11, the median calculation involves three comparators, a multiplexer, and a small (3 × 2 b) look-up table, all operating at the 27 MHz data rate. The spatio-temporal median filtering required in every third field [(21)] can apply the same circuit as the median filter of (20). For clarity of the figure, the multiplexers required for sharing the median filter have been left out of the block diagram.
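A median of three samples built from three comparators, a table, and a multiplexer can be modeled as follows. This is a sketch under assumptions: the table encoding (eight addresses, each holding a 2-bit select value) is chosen here for illustration and is not necessarily how the 3 × 2 b table of Fig. 11 is organized.

```python
from itertools import product

# Hypothetical encoding: the three comparator outputs (a > b, b > c, a > c)
# address a table whose 2-bit entry names the input that is the median
# (0 = a, 1 = b, 2 = c). Two addresses can never occur and are don't-cares.
MEDIAN_SELECT = {
    (0, 0, 0): 1,  # a <= b <= c
    (0, 0, 1): 1,  # impossible combination (don't-care)
    (0, 1, 0): 2,  # a <= c < b
    (0, 1, 1): 0,  # c < a <= b
    (1, 0, 0): 0,  # b < a <= c
    (1, 0, 1): 2,  # b <= c < a
    (1, 1, 0): 1,  # impossible combination (don't-care)
    (1, 1, 1): 1,  # c < b < a
}

def median3_lut(a, b, c):
    """Median of three samples via three comparisons, a table, and a 3-way select."""
    address = (int(a > b), int(b > c), int(a > c))
    return (a, b, c)[MEDIAN_SELECT[address]]

# Exhaustive check on small values, ties included.
all_ok = all(
    median3_lut(a, b, c) == sorted((a, b, c))[1]
    for a, b, c in product(range(4), repeat=3)
)
```

No sorting or arithmetic is needed; the selected input is passed through unchanged, which is why such a circuit is cheap at the full data rate.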

The movie detector, as can be concluded from Section V-B, consists of a vector-length accumulator operating at the block-frequency and some minor once-per-field arithmetic. Consequently, it hardly adds to the silicon area. A more detailed analysis, considered to be outside the scope of this paper, shows that the up-convertor will cost approximately 5 mm² in a 1 μm process (excluding the memories in the vector shifters).

Fig. 9. Partial motion compensated movie-mode. Not the ultimate, yet a major improvement over the present situation of movie on (a) 50 Hz sets is realized with (b) 50 movement phases. [Axes: position versus output field number; legend: transmitted information, vertically interpolated, motion compensated.]

Fig. 10. Timing of the signals in the architecture of Fig. 6 for the movie-mode processing.

Fig. 11. Block diagram of the motion compensated interpolator circuit. The multiplexers are controlled by the movie detection circuit.

The heart of the motion estimator is formed by the block-error summation circuit shown in Fig. 12. Only two of these circuits are required in the total design: one for the a and one for the b estimator (see Section III). Assuming a data rate of 27 MHz, the summation of the errors for the four candidate vectors of each estimator can be handled by this single circuit due to the pixel subsampling with a factor of 4 [18]. From Fig. 12 it is clear that the main part of this circuit is the memory, which has a capacity of 4 × 45 (block subsampling, see [18]) 12-b words to store the Sum of Absolute Differences for 4 candidate vectors. Read and write access are at 27 MHz. In terms of silicon area, the motion estimator has few further requirements apart from the prediction memory. This memory has a capacity of 45 × 72 blocks of 10-b vectors, which makes it fairly large on a processing chip, but since in the proposed architecture it eliminates the need for even more expensive compensating delays in the input signals $F_u(\underline{x}, n)$ and $F_u(\underline{x}, n-2)$ to the up-convertor, it makes the eventual solution very efficient. A more detailed analysis (again considered to be outside the scope of this paper) shows that the estimator including the prediction memory will cost less than 20 mm² in a 1 μm process (excluding the memories in the vector shifters).
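The memory capacities quoted above translate into bit counts as follows. The short calculation is only a restatement of the figures in the text; the variable names are chosen for this example, and any addressing or control overhead is ignored.

```python
# Capacities quoted in the text, expressed in bits.
SAD_WORDS = 4 * 45       # words per error summation memory
SAD_WORD_BITS = 12       # word width of the summed absolute differences
PRED_BLOCKS = 45 * 72    # vectors held in the prediction memory
VECTOR_BITS = 10         # bits per stored vector

sad_bits_per_circuit = SAD_WORDS * SAD_WORD_BITS   # one summation circuit
sad_bits_total = 2 * sad_bits_per_circuit          # two circuits (a and b estimator)
prediction_bits = PRED_BLOCKS * VECTOR_BITS        # prediction memory
```

The prediction memory therefore holds several times more bits than the two error summation memories together, consistent with the remark that it is fairly large for a processing chip.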

Fig. 12. The heart of the motion estimator is the error summation circuit. Two of these circuits are required for the 8 candidate vectors of the 3-D RS block-matcher. [Diagram labels: inputs from vector shifters; multiplexed match errors out.]

In conclusion, it is believed that, apart from the field memories, the total processing for motion compensated field rate conversion can be realized as a one-chip design. An important part of this chip is occupied by the line-memories required in the vector shifters, which will cost up to 50% of the total silicon area. The next most costly parts of the design are the vector prediction memory, the switch matrices, and the error summation memories in the motion estimator. With the indicated number of line-memories in Fig. 6, a one-chip design of certainly less than 100 mm² is feasible, applying a 1 micron technology. This chip then allows a vertical vector component as large as ±9 pixels/field period, which yields a good performance on television broadcast material, as was shown with a laboratory prototype of the motion compensated up-convertor.

Two 8-b inputs are required to the chip, for $F_u(\underline{x}, n)$ and $F_u(\underline{x}, n-2)$, respectively, at 27 MHz data rate, assuming input material sampled according to CCIR Recommendation 601. Finally, one 8-b output $F_{out}(\underline{x}, n)$ is necessary at the same data rate. These I/O demands enable an acceptable pin count, which leads to the conclusion, in combination with the chip area estimates, that a one-chip design for motion compensated field rate conversion including motion estimation is feasible.

VII. CONCLUSION

In this paper a motion compensated 100 Hz television architecture for consumer applications has been proposed. This architecture enables a smooth evolution from the current 100 Hz designs towards the use of motion compensation, as just a single processing chip has to be exchanged.

The attractiveness of such an approach is very large from a TV manufacturer's point of view. It was shown that the additional constraints of this choice did not prevent the use of very sophisticated algorithms for motion estimation and up-conversion, although some minor modifications were required. The algorithms used have been evaluated against well known alternatives before, and from there it is known that their performance is very good. The modifications merely improved the performance of the motion estimator further as the number of iterations on a field pair increased.

By focusing the motion estimator on the application of temporal interpolation rather than on de-interlacing, the symmetrical motion estimation was a logical choice. Through this modification in the estimator, a similarity occurred between the most expensive part of the estimator and the up-convertor: the vector shifters. Realizing the estimation as well as the up-conversion in the output rate domain made it possible to exploit this similarity by sharing the costly memory elements of these functions. The extremely low operation count of the 3-D RS block-matcher made the doubled processing rate affordable.

It was shown in Section V that the same architecture could cope with movie pictures as well, albeit with a slight adaptation of the processing. The detection algorithm to control this adaptation was described. A substantial improvement of the motion portrayal with movie material could be achieved, which gives the presented 100 Hz concept, in addition to flicker elimination, a very strong advantage over the present 50 Hz television sets. Although it was reported that informal tests were very positive, a proper subjective evaluation of this part of the specification could not yet be presented.

A first indication of the estimated VLSI consequences, in terms of silicon area and pin count, led to the expectation that a one-chip design is feasible, which is regarded as a successful outcome of the investigation.

REFERENCES

[1] J. H. D. M. Westerink, “Perceived sharpness in static and moving images,” Ph.D. dissertation, Dep. Phil. and Social Sci., Tech. Univ. Eindhoven, 1991.

[2] J. A. J. Roufs, “Dynamic properties of vision I. Experimental relationships between flicker and flash thresholds,” in Vision Research. New York: Pergamon, 1972, vol. 12, pp. 261-278.

[3] R. D. Kell, A. V. Bedford, and M. A. Trainer, “Scanning sequence and repetition rate on television images,” in Proc. IRE, 1936, vol. 24, pp. 559-576.

[4] U. E. Kraus, “Prevention of large-area flicker in consumer television sets,” Rundfunktechnische Mitteilungen, vol. 25, pp. 2 6 2 6 9, 1981 (in German).

[5] R. N. Jackson and M. J. J. C. Annegarn, “Compatible systems for high-quality television,” SMPTE J., pp. 719-723, July 1983.

[6] E. J. Berkhoff, U. E. Kraus, and J. G. Raven, “Application of picture memories in television receivers,” IEEE Trans. Consumer Electron., vol. CE-29, no. 3, pp. 251-258, Aug. 1983.

[7] M. Achiha, K. Ishikura, and T. Fukinuki, “A motion adaptive high-definition converter for NTSC color TV signals,” SMPTE J., vol. 93, pt. 1, no. 5, pp. 470-476, 1984.

[8] C. Hentschel, “Television with increased image quality; Flicker reduction applying an increased vertical deflection frequency in the receiver,” Ph.D. dissertation, Institut für Nachrichtentechnik, Tech. Univ. Braunschweig, Drei-R-Verlag, Berlin, Feb. 1990 (in German).

[9] G. Huerkamp, “On flicker reduction in television receivers,” Ph.D. dissertation, Dep. Elect. Eng., Univ. Dortmund, 1990 (in German).

[10] G. J. Tonge, “Television motion portrayal,” in Les Assises des Jeunes Chercheurs. Rennes, Sept. 1985.

[11] C. K. P. Clarke and N. E. Tanton, “Digital standards conversion: Interpolation theory and aperture synthesis,” BBC Res. Dep. Rep., Dec. 1984.

[12] M. Sugimoto, T. Fujio, and Y. Ninomiya, “Second generation HDTV standards convertor,” in Proc. 14th Int. Symp. Extended TV HDTV Syst., Montreux, June 1985.

[13] Y. Tanaka, T. Ohmura, K. Okada, Y. Ohtsuka, T. Kurita, S. Goshi, Y. Ninomiya, and T. Nishizawa, “HDTV-PAL standards convertor,” NHK Lab. Note, no. 326, Jan. 1986.

[14] D. Pelé, P. Siohan, and B. Choquet, “Field-rate conversion by motion estimation/compensation,” in Signal Processing of HDTV, L. Chiariglione, Ed. Amsterdam, The Netherlands: Elsevier, 1988, pp. 319-328.

[15] P. Robert, M. Lamnabhi, and J. J. Lhuillier, “Advanced high definition 50 to 60 Hz standard conversion,” SMPTE J., pp. 420-424, June 1989.

[16] G. A. Thomas, “Television motion measurement for DATV and other applications,” BBC Res. Rep., no. BBC RD 1987/11.

[17] H. G. Musmann, P. Pirsch, and H. J. Grallert, “Advances in picture coding,” Proc. IEEE, vol. 73, no. 4, pp. 523-548, Apr. 1985.

[18] G. de Haan and H. Huijgen, “Motion estimation for TV picture enhancement,” in Signal Processing of HDTV, H. Yasuda and L. Chiariglione, Eds. Amsterdam, The Netherlands: Elsevier, 1992, vol. III, pp. 241-248.

[19] G. de Haan, P. W. A. C. Biezen, H. Huijgen, and O. A. Ojo, “True motion estimation with 3-D recursive search block-matching,” IEEE Trans. Circuits Syst. Video Technol., vol. 3, pp. 368-388, Oct. 1993.

[20] G. de Haan, P. W. A. C. Biezen, H. Huijgen, and O. A. Ojo, “Graceful degradation in motion compensated field-rate conversion,” in Signal Processing of HDTV, V. L. Stenger, L. Chiariglione, and M. Akgun, Eds. Amsterdam, The Netherlands: Elsevier, 1994, pp. 249-256.

[21] H. J. Dreier, “Line flicker reduction by adaptive signal processing,” in Signal Processing of HDTV, L. Chiariglione, Ed. Amsterdam, The Netherlands: Elsevier, 1990, vol. II, pp. 695-701.

Gerard de Haan was born in Leeuwarden, The Netherlands, on April 4, 1956. He received the B.Sc., the M.Sc. (with honors), and the Ph.D. degrees from Delft University of Technology in 1977, 1979, and 1992, respectively.

In 1979 he joined the Philips Research Laboratories in Eindhoven. He has led various projects related to HDTV and IDTV that concerned algorithm and VLSI architecture design, as well as hardware prototyping. He participated in the EUREKA 95/05 project and in the EUREKA 95/06 Working Party on up-conversion. He has coached students from various universities, and teaches for the Philips Centre for Technical Training. From September 1991 to September 1992, he was a Visiting Researcher in the Information Theory Group of Delft University. At present, he is with Philips Research as a Senior Scientist in the Television Systems Group and has a particular interest in algorithms for motion estimation, scan rate conversion, and image enhancement.

Dr. de Haan’s work in the areas mentioned above has resulted in about 30 patents and patent applications.

Paul W. A. C. Biezen was born in Goirle, The Netherlands, on March 22, 1965. He completed his study in electrical engineering at the Technische Hogeschool Rijswijk, in 1990.

In August 1990, he joined Philips Research Laboratories in Eindhoven as a Research Assistant in the Television Systems Group. He contributed to the Video Signal Processor chip, worked on HDMAC for a short period, and designed hardware for real time video signal processing. At present Mr. Biezen contributes to the research in the area of motion estimation, motion compensation, and television scan rate conversion.


Olukayode Anthony Ojo was born in Ijaka-Oke, Nigeria, on August 15, 1965. He completed the Bachelor's degree in electrical engineering at the University of Ibadan, Nigeria, in 1987. In 1990, he received the Master's degree in electronic engineering at the Philips International Institute, Eindhoven, The Netherlands.

After graduating, he joined the Television Systems Group of the Philips Research Laboratories in Eindhoven as a Research Scientist. He has since then been active in algorithmic and VLSI architectural design for motion estimation and field rate conversion. He has also been involved in image noise reduction for video applications, in algorithmic research as well as in VLSI specification, architectural design, and (hardware) prototyping. He is currently involved in the area of IC specification for video algorithms.
