motion estimation and motion compensated (video) coding
TRANSCRIPT
MOTION ESTIMATION AND
MOTION COMPENSATED
(VIDEO) CODING
1. Imaging space
2. Image sequence vs. Video
3. Interframe correlation and motion compensated (MC) coding
4. Three major techniques in motion analysis (MA) for video coding:
• Block matching
• Pel-recursion
• Optical flow
• Digital Image Processing:
1960s ⇒ active research field due to the advent of two events
• Digital Image Sequence Processing:
§ Early 1980s ⇒ active research area
§ More information available
§ More computation involved
§ More memory space required
§ Technically feasible (VLSI, DSP, etc.) ⇒ Image and video sequences:
Indispensable elements of our modern life
Some Applications:
• CD-ROM, VCD, DVD
• Digital video camera
• Digital TV
• Videophony, video-conferencing
• Video over internet, wireless
• Video-on-demand
• Interactive multimedia, databases, etc.
• Temporal image sequence: g(x, y, t)
• Stereo image (pair) sequence
• Consider a generalization:
A sensor – a rigid body in 3-D space
Translation (3 parameters), Rotation (2 parameters)
⇒ Many images of the scene
⇔ Infinitely many sensors, at all possible positions and orientations, at a moment take many images
Figure 10.1 Two sensors' positions: s⃗ = (0, 0, 0, 0, 0) and s⃗ = (x̃, ỹ, z̃, β, γ). (The first sensor has axes X, Y, Z and image plane coordinates x, y at origin 0; the second has axes X′, Y′, Z′ and image plane coordinates x′, y′ at origin 0′.)
⇒ A spatial image sequence g(x, y, s⃗)
s⃗: the sensor's position in 3-D space, s⃗ = (x̃, ỹ, z̃, β, γ)
• An imaging space:
§ A collection of all spatial image sequences (over the temporal domain): g(x, y, t, s⃗)
Alternatively,
§ A collection of all temporal image sequences (over the spatial domain)
• Meaning: Unification of all the image frames and image sequences [shu & shi 1991]
Figure 10.2 A hierarchical structure:
Top level: Imaging space g(x, y, t, s⃗)
Intermediate: Temporal image sequence g(x, y, t, s⃗ = a specific vector); Spatial image sequence g(x, y, t = a specific moment, s⃗)
Bottom level: Individual images g(x, y, t = a specific moment, s⃗ = a specific vector)
2. Image Sequence vs. Video
• Image frames and image sequences:
defined above
• Video: refers to a single video frame or to video sequences associated with the visible frequency band (visual information)
• Scope of image frames and sequences is wider.
e.g. Infrared image and image sequences: Outside the visible band: Not video
• Video is more pertinent to multimedia
engineering
• The two are equal when the visible band is concerned.
• Video sometimes ⇒ video sequences exclusively
• Image compression
⇒ still image compression
• Video compression
⇒ video sequence coding
3. Interframe Correlation and Motion Compensated Compression
• High interframe correlation exists in
video sequence:
♦ A video sequence in videophone service: fewer than 10% of pixels change their
intensity by more than 1% of the peak signal between frames [mounts 1969]
♦ A video sequence in TV broadcast:
(More motion is involved.) Even if there is no scene change between frames, the scene is still scanned constantly
• How to utilize the interframe correlation?
♦ Frame replenishment (FR) [mounts 1969]
§ Compare the present frame with the previous frame pixel-by-pixel
§ If the difference is large enough,
⇒ difference and position: coded & transmitted
§ Otherwise, repeat the pixel's gray scale at the receiver
§ More efficient than intraframe coding techniques such as DPCM
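The replenishment loop above can be sketched in a few lines (a toy 1-D illustration; the function and variable names are hypothetical and the threshold is arbitrary):

```python
# A minimal sketch of conditional (frame) replenishment on 1-D "frames"
# (lists of gray levels). Names and values are illustrative.

def replenish(prev_recon, curr, threshold):
    """Transmit only pixels whose change exceeds `threshold`;
    the receiver repeats the previous gray level elsewhere."""
    recon = list(prev_recon)
    transmitted = []                      # (position, new value) pairs to code
    for i, (old, new) in enumerate(zip(prev_recon, curr)):
        if abs(new - old) > threshold:
            transmitted.append((i, new))  # difference large enough: update
            recon[i] = new
    return recon, transmitted

prev = [10, 10, 10, 10]
curr = [10, 12, 90, 10]                   # only one pixel changes significantly
recon, sent = replenish(prev, curr, threshold=5)
print(sent)    # [(2, 90)]
print(recon)   # [10, 10, 90, 10]
```

The small change at position 1 falls below the threshold and is simply repeated at the receiver, which is exactly where the dirty window effect discussed below comes from.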
Interframe redundancy:
(a) Frame 24 (b) Frame 25
Figure 10.2 Two frames of the "Miss America" sequence.
Problem with frame replenishment:
Figure 10.3 Dirty window effect
Main drawback of the frame replenishment technique: it is difficult to handle frame sequences containing rapid changes.
When changes are more rapid, the number of pixels whose intensity values need to be updated increases. To maintain the transmission bit rate at a steady and proper level, the threshold has to be raised, so that many slow changes cannot show up at the receiver.
This poorer reconstruction at the receiver is somewhat analogous to viewing a scene through a dirty window, and is referred to as the dirty window effect. The result of one experiment on the dirty window effect is displayed in Figure 10.3.
From frame 22 to frame 25 of the “Miss America” sequence, there are 2166 pixels (less than 10% of the total pixels) which change their gray level values by more than 1% of the peak signal. When we only update the gray level values for 25% (randomly chosen) of these changing pixels, we can clearly see the dirty window effect.
When rapid scene changes exceed a certain level, buffer saturation will result, causing picture breakup [mounts, 1969]. Motion compensated coding, which is discussed below, has been proved to be able to provide better performance than the replenishment technique in situations with rapid changes.
♦ Motion estimation and compensation
§ Find displacement vector of a pixel or a set of pixels between frames
§ Via the displacement vector, predict its counterpart in the present frame
§ Prediction error, positions, and motion vectors: coded & transmitted
Figure 10.7 Block diagram of motion compensated coding: motion analysis → prediction and differentiation → encoding
• MC is much more efficient and complicated than FR
(MC ⇒ smaller dynamic range of prediction error) (both MC & FR are predictive coding) (need correspondence – “motion trajectory”)
Figure 10.6 Two consecutive frames of a video sequence
• The first motion compensated coding
algorithm [netravali & robbins 1979] achieved 22-50% lower bit rate than the frame replenishment algorithm
(a) frame at t_(n−1) (b) frame at t_n
• ⇒ High coding efficiency is obtained with
high computational complexity
• This strategy is: § Technically feasible (VLSI, DSP, etc.) § Economically desired
∵ cost of DSP drops much faster than that of transmission [dubois 1981]
§ Continue to be the case
• Motion compensated video coding ⇒ A major development in coding since 1980s [musmann 1985, zhang 1995, kunt 1995]
Motion Estimation (ME)
• Biological Vision Perspective
§ For understanding the human visual system (HVS)
§ Studied by biomedical scientists
§ Goal: to shed light on technology development (machine vision and others)
§ Measurement and interpretation: two steps [singh 91]
• Computer Vision Perspective
§ For interpretation of 3-D motion from image sequences
§ Requirements: accuracy and processing speed
§ A dominant subject in vision [thompson 1989]
§ Correspondence and optical flow: two approaches
§ Rigid vs. deformable motions
§ Intermediate-variable vs. direct methods
• Signal Processing Perspective
§ For MC coding (main purpose) & other MC video processing tasks
§ Ultimate goal:
Ø Higher coding efficiency, i.e., the same quality at a lower bit rate, or better quality at the same bit rate
Ø Quite different from the goal in Computer Vision
§ Real-time requirement ⇒ simple 2-D displacement model
4.1 Block Matching
§ Description: region-based, matching
§ Representative: [jain 1981], diagram
§ The most popular of the three techniques
Figure 11.1 Block matching
Matching Criteria • Instead of finding maximum similarity • An equivalent, more computationally
efficient way of block matching: finding minimum dissimilarity
• The dissimilarity (error, distortion, or distance) D(s, t) between two frames t_n and t_(n−1) is defined as:
(Figure 11.1: (a) the t_n frame, with an original p×q block at (x_0, y_0); (b) the t_(n−1) frame, with the search window, the correlation window, the best matching block at (x_i, y_i), and the displacement vector.)
D(s, t) = (1 / (p·q)) Σ_(j=1..p) Σ_(k=1..q) M( f_n(j, k), f_(n−1)(j + s, k + t) )

M(u, v): a dissimilarity metric between u and v.
• Several types of matching criteria:
§ mean-square error (MSE) [jain 1981]: M(u, v) = (u − v)²
§ sum of squared difference (SSD) [anandan 1987]
§ sum of squared error (SSE) [chan 1990]
§ mean absolute difference (MAD) [koga 1981]: M(u, v) = |u − v|
§ mean absolute error (MAE) [nogaki 1972].
• A study based on experimental works reported that the matching criterion does not significantly affect the search [srinivasan 1984]. The MAD is hence preferred due to its simplicity in implementation [musmann 1985].
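The dissimilarity D(s, t) under the MAD criterion can be sketched as follows (toy frames stored as nested lists; all names are illustrative):

```python
# Sketch of the block dissimilarity D(s, t) with the MAD criterion
# M(u, v) = |u - v|; frames are nested lists of gray levels.

def mad(frame_n, frame_prev, x0, y0, p, q, s, t):
    """MAD between the p x q block of frame n anchored at (x0, y0)
    and the block displaced by (s, t) in frame n-1."""
    total = 0
    for j in range(p):
        for k in range(q):
            total += abs(frame_n[x0 + j][y0 + k]
                         - frame_prev[x0 + j + s][y0 + k + t])
    return total / (p * q)

# A 2x2 block of 9s at (1, 1) in frame n sits at (2, 2) in frame n-1:
frame_n = [[0, 0, 0, 0],
           [0, 9, 9, 0],
           [0, 9, 9, 0],
           [0, 0, 0, 0]]
frame_prev = [[0, 0, 0, 0],
              [0, 0, 0, 0],
              [0, 0, 9, 9],
              [0, 0, 9, 9]]
print(mad(frame_n, frame_prev, 1, 1, 2, 2, 1, 1))  # 0.0: (s, t) = (1, 1) matches exactly
print(mad(frame_n, frame_prev, 1, 1, 2, 2, 0, 0))  # 6.75
```

A search procedure then evaluates this dissimilarity over candidate displacements (s, t) and keeps the minimum.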
Searching Procedures
• Full search
§ Brute force in nature
§ Good accuracy
§ Large amount of computation
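A minimal full search sketch, assuming a MAD criterion and a ±R search range (the toy frames and names are illustrative):

```python
# Brute-force full search: evaluate MAD at every displacement in a
# +/-R range and keep the minimum. Names and frames are illustrative.

def mad(frame_n, frame_prev, x0, y0, p, q, s, t):
    total = 0
    for j in range(p):
        for k in range(q):
            total += abs(frame_n[x0 + j][y0 + k]
                         - frame_prev[x0 + j + s][y0 + k + t])
    return total / (p * q)

def full_search(frame_n, frame_prev, x0, y0, p, q, R):
    h, w = len(frame_prev), len(frame_prev[0])
    best = None
    for s in range(-R, R + 1):
        for t in range(-R, R + 1):
            # keep the displaced block inside frame n-1
            if 0 <= x0 + s and x0 + s + p <= h and 0 <= y0 + t and y0 + t + q <= w:
                d = mad(frame_n, frame_prev, x0, y0, p, q, s, t)
                if best is None or d < best[0]:
                    best = (d, (s, t))
    return best   # (minimum dissimilarity, displacement vector)

frame_n = [[0] * 6 for _ in range(6)]
frame_prev = [[0] * 6 for _ in range(6)]
for r in (1, 2):
    for c in (1, 2):
        frame_n[r][c] = 9          # block in frame n at (1, 1)
for r in (3, 4):
    for c in (3, 4):
        frame_prev[r][c] = 9       # same block at (3, 3) in frame n-1

best = full_search(frame_n, frame_prev, 1, 1, 2, 2, 2)
print(best)   # (0.0, (2, 2))
```

The (2R+1)² evaluations per block are what the fast procedures below try to avoid.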
• Several fast searching procedures
§ 2-D logarithm search [jain 1981]
§ Three-step search [koga 1981]
§ Conjugate direction search [srinivasan 1984]
Figure 11.1 (a) 2-D logarithm search procedure: Points at (j, k+2), (j+2, k+2), (j+2, k+4), and (j+1, k+4) are found to give the minimum dissimilarity in steps 1, 2, 3 and 4, respectively.
Figure 11.2 Three-step search procedure: Points (j+4, k-4), (j+4, k-6), and (j+5, k-7) give the minimum dissimilarity in steps 1, 2, and 3, respectively.
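The three-step procedure can be sketched as follows (step sizes 4, 2, 1 for a ±7 range; MAD criterion assumed; toy frames and names are illustrative):

```python
# Three-step search: test nine points around the current center with
# step sizes 4, 2, 1, re-centering on the best point after each step.

def mad(frame_n, frame_prev, x0, y0, p, q, s, t):
    total = 0
    for j in range(p):
        for k in range(q):
            total += abs(frame_n[x0 + j][y0 + k]
                         - frame_prev[x0 + j + s][y0 + k + t])
    return total / (p * q)

def three_step(frame_n, frame_prev, x0, y0, p, q):
    h, w = len(frame_prev), len(frame_prev[0])
    cs, ct = 0, 0                      # current center of the search
    for step in (4, 2, 1):             # nine points per step, halving the step
        best = None
        for ds in (-step, 0, step):
            for dt in (-step, 0, step):
                s, t = cs + ds, ct + dt
                if 0 <= x0 + s and x0 + s + p <= h and 0 <= y0 + t and y0 + t + q <= w:
                    d = mad(frame_n, frame_prev, x0, y0, p, q, s, t)
                    if best is None or d < best[0]:
                        best = (d, s, t)
        _, cs, ct = best               # re-center on the best point
    return cs, ct

frame_n = [[0] * 16 for _ in range(16)]
frame_prev = [[0] * 16 for _ in range(16)]
for r in (8, 9):
    for c in (8, 9):
        frame_n[r][c] = 9              # block in frame n at (8, 8)
for r in (13, 14):
    for c in (5, 6):
        frame_prev[r][c] = 9           # same block displaced by (5, -3)

print(three_step(frame_n, frame_prev, 8, 8, 2, 2))   # (5, -3)
```

Three steps of 9 points each (25 distinct evaluations) replace the 225 evaluations a ±7 full search would need.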
Thresholding Multiresolution Block Matching [shi and xia 1997]
Figure 11.3 A Gaussian pyramid structure (level 0 at the bottom has full resolution; resolution decreases as the level increases).
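Pyramid construction by repeated low-pass filtering and subsampling can be sketched as follows (a 2×2 box average stands in for the Gaussian kernel; this is a simplification, and all names are illustrative):

```python
# Sketch of pyramid construction by low-pass filtering and subsampling.
# A 2x2 box average is used here instead of a true Gaussian kernel.

def downsample(frame):
    h, w = len(frame), len(frame[0])
    return [[(frame[2 * r][2 * c] + frame[2 * r][2 * c + 1] +
              frame[2 * r + 1][2 * c] + frame[2 * r + 1][2 * c + 1]) / 4
             for c in range(w // 2)] for r in range(h // 2)]

def pyramid(frame, levels):
    """Level 0 = full resolution; each higher level halves the resolution."""
    levs = [frame]
    for _ in range(levels - 1):
        levs.append(downsample(levs[-1]))
    return levs

frames = pyramid([[1] * 8 for _ in range(8)], 3)
print([(len(f), len(f[0])) for f in frames])   # [(8, 8), (4, 4), (2, 2)]
```

Block matching then starts on the small top-level frames, where the search range is cheap, and refines downward only when the threshold test fails.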
(Figure 11.4: frames n−1 and n are repeatedly low-pass filtered and subsampled into pyramids; block matching is performed level by level, and blocks satisfying the threshold stop early (Y), otherwise (N) processing continues at the next level, finally yielding the motion field.)
Figure 11.4 Block diagram for a three-level threshold multiresolution block matching.
Figure 11.5 The thresholding process.
(Figure 11.5: at level L of pyramids n−1 and n, the motion vector of a block is estimated, the block and its estimated motion vector are projected, and the MAD of the block at level L is calculated.)
Table 11.1 Parameters used in experiments

                                     Low resolution level   Full resolution level
"Miss America"  Search range         3×3                    1×1
                Block size           4×4                    8×8
                Thresholding value   2                      None (not applicable)
"Train"         Search range         4×4                    1×1
                Block size           4×4                    8×8
                Thresholding value   3                      None (not applicable)
"Football"      Search range         4×4                    1×1
                Block size           4×4                    8×8
                Thresholding value   4                      None (not applicable)
Figure 11.6 The 20th frame of the “Train” sequence.
Figure 11.7 The 20th frame in the “Football” sequence.
Table 11.2 Experimental results (I)

                            PSNR    Error image entropy   Vector entropy   Blocks stopped at   Processing time
                            (dB)    (bits/pixel)          (bits/vector)    top level/Total     (no. of additions, 10^6)
"Miss America" sequence
Method 1 [tzoraras 1994]    38.91   3.311                 6.02             0/1280              10.02
New method (TH=2)           38.79   3.319                 5.65             487/1280            8.02
New method (TH=3)           38.43   3.340                 5.45             679/1280            6.17
"Train" sequence
Method 1 [tzoraras 1994]    27.37   4.692                 6.04             0/2560              22.58
New method (TH=3)           27.27   4.788                 5.65             1333/2560           18.68
"Football" sequence
Method 1 [tzoraras 1994]    24.26   5.379                 7.68             0/3840              30.06
New method (TH=4)           24.18   5.483                 7.58             1464/3840           25.90
New method (TH=3)           24.21   5.483                 7.57             1128/3840           27.10
4.2 Pel-recursion
§ Description:
Ø region-based, matching
Ø DFD: displaced frame difference
Ø minimization of a nonlinear function of the DFD
§ Representative: [netravali & robbins 1979], the 1st MC algorithm
§ The earliest of the three, but the least accurate (backward MA)
• In [netravali 1979], a dissimilarity measure called displaced frame difference (DFD) was defined as follows.
DFD(x, y; d_x, d_y) = f_n(x, y) − f_(n−1)(x − d_x, y − d_y)

n, n−1: two successive frames
x, y: coordinates in the image plane
d_x, d_y: the two components of the displacement vector d⃗
• Obviously, if no error in the estimation, then
DFD will be zero.
• A nonlinear function of the dissimilarity measure, DFD², was proposed in [netravali 1979].
• Displacement estimation is then converted into a minimization problem.
• A typical nonlinear programming problem
♦ In [netravali 1979] :
d⃗^(k+1) = d⃗^k − α · DFD(x, y; d⃗^k) · ∇_(x,y) f_(n−1)(x − d_x, y − d_y)

∇_(x,y): gradient operator with respect to x and y
α = 1/1024
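The steepest-descent recursion can be illustrated on a smooth 1-D signal (a deliberate simplification of the 2-D scheme; the signal, true displacement, and step size alpha below are illustrative, not from [netravali 1979]):

```python
# 1-D sketch of the pel-recursive update
#   d^(k+1) = d^k - alpha * DFD * gradient,
# i.e., steepest descent on DFD^2. All values are illustrative.

def f_prev(x):                         # frame n-1: a smooth intensity profile
    return (x - 5.0) ** 2

D_TRUE = 1.5                           # true displacement between the frames
def f_n(x):                            # frame n: the profile shifted by D_TRUE
    return f_prev(x - D_TRUE)

def dfd(x, d):                         # displaced frame difference
    return f_n(x) - f_prev(x - d)

def grad_prev(x, h=1e-4):              # spatial gradient of f_{n-1}
    return (f_prev(x + h) - f_prev(x - h)) / (2 * h)

def estimate(x, d=0.0, alpha=0.01, iters=500):
    for _ in range(iters):
        d -= alpha * dfd(x, d) * grad_prev(x - d)
    return d

print(round(estimate(3.0), 3))   # converges to 1.5
```

When the DFD is zero the update stops, which matches the observation above that an error-free estimate gives DFD = 0.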
♦ Inclusion of a neighborhood area
To make displacement estimation more robust, Netravali and Robbins assume the displacement vector is constant within a small neighborhood Ω.
d⃗^(k+1) = d⃗^k − (1/2) α Σ_(i∈Ω) w_i ∇_d⃗ DFD²(x_i, y_i; d⃗^k)

i: an index for the i-th pixel (x, y) in Ω
w_i: weights, with w_i ≥ 0 and Σ_(i∈Ω) w_i = 1
• The algorithm can be applied to a pixel once or iteratively applied several times for displacement estimation.
Then the algorithm moves to the next pixel.
The estimated displacement vector of a pixel can be used as an initial estimate for the next pixel.
• This recursion can be carried out horizontally, vertically, or temporally.
Figure 12.1 Three types of recursions: (a) from (x, y) to (x, y+1); (b) from (x, y) to (x+1, y); (c) from (x, y, t_(n−1)) to (x, y, t_n)

4.3 Optical Flow
§ Description: a velocity vector for each pixel
Ø correlation-based (similar to block matching)
Figure 13.1 Correlation-based approach to optical flow determination
Ø differential (gradient) method
Ø spatiotemporal energy-based
Ø phase-based
§ Representative: [horn and schunck 1981], the differential, gradient-based classical solution
(Figure 13.1: for the pixel in f(x, y, t) to which optical flow needs to be determined, a correlation window around (x, y) is matched within a search window in f(x, y, t−1); the best match gives the optical flow vector.)
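Gradient-based methods such as [horn and schunck 1981] rest on the brightness constancy constraint f_x·u + f_y·v + f_t = 0, where (u, v) is the flow vector. A sketch verifying the constraint on a translating linear ramp (all names and values are illustrative):

```python
# Brightness constancy check: f_x * u + f_y * v + f_t = 0 for a pattern
# translating at (u, v) = (1, 2) pixels/frame. Values are illustrative.

def f(x, y, t):                      # intensity: a ramp translating by (1, 2)
    return 3.0 * (x - 1.0 * t) + 4.0 * (y - 2.0 * t)

h = 1e-4                             # step for central-difference gradients
def fx(x, y, t): return (f(x + h, y, t) - f(x - h, y, t)) / (2 * h)
def fy(x, y, t): return (f(x, y + h, t) - f(x, y - h, t)) / (2 * h)
def ft(x, y, t): return (f(x, y, t + h) - f(x, y, t - h)) / (2 * h)

u, v = 1.0, 2.0
residual = fx(5, 5, 0) * u + fy(5, 5, 0) * v + ft(5, 5, 0)
print(abs(residual) < 1e-6)   # True: the constraint holds for this motion
```

One equation in two unknowns per pixel is under-determined, which is why Horn and Schunck add a smoothness term and solve for the whole flow field iteratively.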
§ Mainly used in computer vision
§ Less used in video coding
Ø though accurate, requires more side information & computation
Ø (recall the ultimate goal of MA in video coding)
⇒ Need some innovative work
5. Classification
• Region-based vs. gradient-based
Table 14.1 Region-based vs. Gradient-based

                 Block matching   Pel recursive   Optical flow (correlation-based method | gradient-based method)
Region-based     √                √               √ (correlation-based method)
Gradient-based                                    √ (gradient-based method)
• Forward vs. Backward Motion Estimation
Figure 14.1 Forward motion estimation and compensation. T: transform, Q: quantizer, FB: frame buffer, MCP: motion compensated predictor, ME: motion estimator, e: prediction error, f: input video frame, f_p: predicted video frame, f_r: reconstructed video frame, q: quantized transform coefficients, v: motion vector
(Diagram: the input video frame f minus the prediction f_p gives the prediction error e, which passes through T and Q to yield q; Q⁻¹ and T⁻¹ reconstruct the error, which is added back to f_p in the prediction loop.)
Figure 14.2 Backward motion estimation and compensation. T: transform, Q: quantizer, FB: frame buffer, MCP: motion compensated predictor, ME: motion estimator, e: prediction error, f: input video frame, f_p: predicted video frame, f_r1: reconstructed video frame, f_r2: reconstructed previous video frame, q: quantized transform coefficients
Summary
In general, block matching achieves the best performance in terms of higher estimation accuracy and less computational complexity.
It has been used in all international video coding standards, such as H.261, H.263, H.26L, MPEG-1, 2, and 4.
The pel recursive technique has the relatively lowest motion estimation accuracy, and is hence less used for motion estimation and motion compensation.
Optical flow achieves high motion estimation accuracy, but produces too many motion vectors to deal with. It is mainly used in the Computer Vision field.
One attempt [shi et al 1998]:
a. Flow vectors are highly correlated
b. They can be modeled by a first-order AR model
c. DCT can be applied to the flow vectors
d. Good results have been achieved:
i. The bit rate is comparable to that achieved by H.263
ii. The visual quality is better: no block artifacts
(a) (b)
(c)
Figure 13.2 (a) The 21st original frame of the “Miss America” sequence, (b) The reconstructed 21st frame with H.263, (c) The reconstructed 21st frame with the proposed technique.
6. References [aggarwal 1988] J. K. Aggarwal and N. Nandhakumar, ``On the computation of motion from sequences of images -- a review,’’ Proceedings of the IEEE, vol. 76, no. 8, pp. 917-935, 1988.
[barron 1994] J. L. Barron, D. J. Fleet and S. S. Beauchemin, “Systems and experiment performance of optical flow techniques,” International Journal of Computer Vision, vol. 12, no. 1, pp. 43-77, 1994.
[boroczky 1991] “Pel-recursive motion estimation for image coding,” Ph.D. dissertation, Delft University of Technology, The Netherlands, 1991.
[dubois 1981] E. Dubois, B. Prasada and M. S. Sabri, ``Image Sequence Coding,’’ chapter 3, in T. S. Huang, Ed., Image Sequence Analysis, Springer-Verlag, 1981
[dufaux 1995] F. Dufaux and F. Moscheni, “Motion estimation techniques for digital TV: A review and a new contribution,” Proceedings of the IEEE , vol. 83, no. 6, pp. 858-876, 1995.
[horn 1981] B. K. P. Horn and B. G. Schunck, “Determining optical flow,” Artificial Intelligence, vol. 17, pp. 185-203, 1981.
[jain 1981] J. R. Jain and A. K. Jain, “Displacement measurement and its application in interframe image coding,” IEEE Transactions on Communications, vol. COM-29, no. 12, pp. 1799-1808, December 1981.
[koga 1981] T. Koga, K. Iinuma, A. Hirano, Y. Iijima and T. Ishiguro, “Motion-compensated interframe coding for video conferencing,” Proceedings of NTC'81, pp. G5.3.1-G5.3.5, New Orleans, LA, Dec. 1981.
[kunt 1995] M. Kunt Ed., Special Issue on Digital Television Part 1: Technologies, Proceedings of The IEEE, vol. 83, no. 6, June 1995.
[mounts 1969] F. W. Mounts, “A video encoding system with conditional picture-element replenishment,” The Bell System Technical Journal, vol. 48, no. 7, pp. 2545-2554, September 1969.
[musmann 1985] H. G. Musmann, P. Pirsch, and H. J. Grallert, “Advances in picture coding,’’ Proceedings of the IEEE, vol. 73, no. 4, pp. 523-548, 1985.
[netravali 1979] A. N. Netravali and J. D. Robbins, “Motion-compensated television coding: Part I,” The Bell System Technical Journal, vol. 58, no. 3, pp. 631-670, March 1979.
[srinivasan 1984] R. Srinivasan and K. R. Rao, “Predictive coding based on efficient motion estimation,” Proceedings of ICC, pp. 521-526, May 1984.
[shu 1991] C. Q. Shu and Y. Q. Shi, “On unified optical flow field,’’ Pattern Recognition, vol. 24, no. 6, pp. 579-586, 1991.
[thompson 1989] W. B. Thompson, “Introduction to special issue on visual motion,’’ IEEE Trans. on Pattern Analysis and Machine Intelligence , vol. 11, no. 5, pp. 449-450, 1989.
[zhang 1995] Y.-Q. Zhang, W. Li and M. L. Liou, Ed., Special Issue on Advances in Image and