
Fast Mode Decision for Inter Mode Selection in H.264/AVC Video Coding, by Amruta Kulkarni, under the guidance of Dr. K. R. Rao

Contents
Need for video compression
Motivation
Video coding standards, video formats and quality
Overview of H.264
Complexity reduction algorithm for inter mode selection
Experimental results
Conclusions
References

Need for Video Compression Compression reduces both storage and bandwidth demands. Resources are insufficient to handle uncompressed video. It is a better proposition to send high-resolution compressed video than a low-resolution, uncompressed stream over a high bit-rate transmission channel.

Motivation [2] Video compression works by removing redundancy in a video clip: only a small percentage of any particular frame is new information. Encoding is a highly complex process, so the aim is to reduce the overall complexity to a level suitable for handheld devices.

Timeline of Video Development [10] Standards provide inter-operability between encoders and decoders from different manufacturers. They also make it possible to build a video platform that interacts with video codecs, audio codecs, transport protocols, security and rights management in well-defined and consistent ways.

OVERVIEW OF H.264 / AVC STANDARD Built on the concepts of earlier standards such as MPEG-2 and MPEG-4 Visual. Achieves substantially higher video compression and has a network-friendly video representation. 50% reduction in bit-rate over MPEG-2. Error resilience tools. Supports various interactive (video telephony) and non-interactive (broadcast, streaming, storage, video on demand) applications.

H.264/MPEG-4 Part 10 or AVC [2, 5] H.264 is an advanced video compression standard developed by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG). It is a widely used video codec in mobile applications, the internet (YouTube, Flash players), set-top boxes, DTV, etc. An H.264 encoder converts the video into a compressed format (.264), and a decoder converts the compressed video back into an uncompressed format.

How does an H.264 codec work? An H.264 video encoder carries out prediction, transform and encoding processes to produce a compressed H.264 bit stream. The block diagram of the H.264 video encoder is shown in Fig. 1. A decoder carries out the complementary processes of decoding, inverse transform and reconstruction to output a decoded video sequence. The block diagram of the H.264 video decoder is shown in Fig. 2.

H.264 encoder block diagram Fig. 1 H.264 Encoder block diagram[7]

H.264 decoder block diagram Fig. 2 H.264 decoder block diagram [2] (blocks shown between the bitstream input and the video output: entropy decoding, inverse quantization and inverse transform, intra prediction, motion compensation, intra/inter mode selection, deblocking filter, picture buffering)

Slice Types [3] An I (intra) slice contains references only to itself. A P (predictive) slice uses one or more recently decoded slices as a reference (or prediction) for picture construction. A B (bi-predictive) slice works similarly to a P slice, except that both former and future I or P slices may be used as reference pictures. SI and SP (switching) slices may be used for transitions between two different H.264 video streams.

Profiles in H.264 The H.264 standard defines sets of capabilities, also referred to as profiles, targeting specific classes of applications (Fig. 3). Different features are supported in different profiles depending on the application. Table 1 lists some profiles and their applications.
Profile    Applications
Baseline   Video conferencing, videophone
Main       Digital storage media, television broadcasting
Extended   Streaming video
High       Content distribution, post processing
Table 1. List of H.264 profiles and applications [2]

Profiles in H.264[9] Fig. 3 Profiles in H. 264[9]

Intra Prediction I pictures usually have a large amount of information present in the frame. Intra prediction exploits the spatial correlation between adjacent macro blocks in a given frame. H.264 offers nine modes for intra prediction of 4x4 luminance blocks and four modes for intra prediction of 16x16 luminance blocks. For 8x8 chrominance blocks, H.264 supports four modes similar to those of the 16x16 luminance blocks.

Intra prediction Fig. 4 16x16 intra prediction modes [11] Fig. 5 4x4 intra prediction modes [11]

Inter Prediction [5] Takes advantage of the temporal redundancies that exist among successive frames. Temporal prediction in P frames involves predicting from one or more past frames known as reference frames.

Motion Estimation/Compensation Inter prediction includes motion estimation (ME) and motion compensation (MC), which together perform the prediction. A predicted version of a rectangular block of pixels is generated by choosing another, similarly sized rectangular block of pixels from a previously decoded reference picture. The reference block is translated to the position of the current rectangular block; this displacement is the motion vector. Luma block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4 pixels.
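The block matching described above can be sketched as a full search that minimises the sum of absolute differences (SAD). This is only an illustration under stated assumptions: frames are plain lists of rows of integer luma samples, the search is integer-pel only, and the names `sad`, `full_search` and `search_range` are hypothetical, not taken from the slides or from JM.

```python
def sad(cur, ref, cx, cy, rx, ry, w, h):
    """Sum of absolute differences between the w x h block of `cur` at (cx, cy)
    and the w x h block of `ref` at (rx, ry)."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(h)
        for i in range(w)
    )


def full_search(cur, ref, cx, cy, w, h, search_range):
    """Return ((dx, dy), cost) of the best integer-pel match in `ref`
    for the block of `cur` whose top-left corner is (cx, cy)."""
    height, width = len(ref), len(ref[0])
    # Start from the zero motion vector (collocated block).
    best_mv, best_cost = (0, 0), sad(cur, ref, cx, cy, cx, cy, w, h)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference picture.
            if rx < 0 or ry < 0 or rx + w > width or ry + h > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, w, h)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

A real encoder repeats this for every partition size and adds sub-pel refinement, which is the main reason inter mode selection is so expensive.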

Inter prediction Fig. 6 Partitioning of a MB for motion compensation [5]

Integer Transform and Quantization Transform: the prediction error block is expressed in the form of transform coefficients. H.264 employs a purely integer spatial transform, which is a rough approximation of the DCT. Quantization: this is where a significant portion of the data compression takes place. Fifty-two different quantization step sizes can be chosen. Step sizes increase at a compounding rate of approximately 12.5% per QP step.
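As a rough illustration of the 52 step sizes, the sketch below assumes the commonly tabulated base values for QP 0 to 5 and the rule that the step size doubles for every increase of 6 in QP. The table and the function name `qstep` are not given in the slides and are assumptions here.

```python
# Commonly tabulated step sizes for QP = 0..5 (assumed, not from the slides).
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for a QP in 0..51, doubling every 6 QP steps."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return QSTEP_BASE[qp % 6] * 2 ** (qp // 6)


if __name__ == "__main__":
    for qp in (22, 27, 32, 37):
        print(qp, qstep(qp))
```

Doubling every six steps corresponds to a growth factor of 2^(1/6) per QP step, which is the compounding rate of roughly 12% quoted above.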

De-blocking Filter and Entropy Coding De-blocking filter: an in-loop filter that removes the blocking artifacts caused by the block-based encoding pattern. Entropy coding: assigns shorter code-words to symbols with higher probabilities of occurrence and longer code-words to symbols that occur less frequently. H.264 provides two entropy coders, CAVLC (context-adaptive variable length coding) and CABAC (context-adaptive binary arithmetic coding).
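CAVLC and CABAC are too involved for a short example, so the sketch below illustrates the "shorter code-words for more probable symbols" idea with the unsigned Exp-Golomb code, which H.264 uses for many header syntax elements. The function name `exp_golomb_ue` is hypothetical.

```python
def exp_golomb_ue(code_num):
    """Unsigned Exp-Golomb code-word for a non-negative integer, as a bit string.

    The code-word is the binary form of code_num + 1, preceded by as many
    zeros as there are bits after its leading one.
    """
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    binary = bin(code_num + 1)[2:]
    return "0" * (len(binary) - 1) + binary


if __name__ == "__main__":
    for value in range(7):
        print(value, exp_golomb_ue(value))
```

Small values, which are assumed to be the most frequent, get the shortest code-words: 0 maps to a single bit, 1 and 2 to three bits, 3 to 6 to five bits.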

FAT (Fast Adaptive Termination) for Mode Selection [9] The proposed fast adaptive mode selection algorithm includes the following: Fast mode prediction Adaptive rate distortion threshold Homogeneity detection Early Skip mode detection

Fast mode prediction In H.264/AVC, video coding is performed on each frame by dividing the frame into macro blocks, which are processed from the top-left to the bottom-right of the frame. Spatially neighboring macro blocks in the same frame generally have similar characteristics, such as motion and level of detail. For example, if most of the neighboring macro blocks use skip mode, the current macro block is more likely to use the same mode. Temporal similarity also exists with the collocated macro block in the previously encoded frame.

Fast mode prediction Fig. 7 shows the spatial macro blocks: the current macro block X has characteristics similar to those of its neighboring macro blocks A through H. Fig. 8 shows the temporal similarity between the current macro block and the collocated macro block PX in the previous frame and its neighbors. Fig. 7 Spatial neighboring blocks [8] Fig. 8 Temporal neighboring blocks [8]

Fast mode prediction A mode histogram is obtained from the spatial and temporal neighboring macro blocks, and the best mode is selected as the index corresponding to the maximum value in the histogram. The average rate-distortion cost of the neighboring macro blocks that use this best mode is then taken as the prediction cost for the current macro block.
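A minimal sketch of this prediction step, assuming each neighboring macro block is represented as a `(mode, rd_cost)` pair. The representation, the function name `predict_mode` and the tie-breaking rule (first mode seen wins) are assumptions made for the example.

```python
from collections import Counter


def predict_mode(neighbors):
    """Predict the mode and RD cost of the current macro block.

    neighbors: list of (mode, rd_cost) pairs gathered from the spatial
    and temporal neighboring macro blocks.
    Returns (predicted_mode, predicted_rd_cost).
    """
    if not neighbors:
        raise ValueError("at least one neighbor is required")
    # Mode histogram over all neighbors.
    histogram = Counter(mode for mode, _ in neighbors)
    # The most frequent mode is the prediction.
    best_mode, _ = histogram.most_common(1)[0]
    # Average the RD cost of the neighbors that used that mode.
    costs = [cost for mode, cost in neighbors if mode == best_mode]
    return best_mode, sum(costs) / len(costs)


if __name__ == "__main__":
    sample = [("SKIP", 100), ("P16x16", 300), ("SKIP", 140),
              ("P8x8", 900), ("SKIP", 120)]
    print(predict_mode(sample))
```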

Rate Distortion Optimization
Rate-distortion optimization (RDO) is a method of improving video quality in video compression. The name refers to optimizing the amount of distortion (loss of video quality) against the amount of data required to encode the video, the rate.
Macro block parameters: QP (quantization parameter) and Lagrange multiplier ($\lambda_{MODE}$).
Calculate: $\lambda_{MODE} = 0.85 \times 2^{(QP-12)/3}$
Then calculate the cost, which determines the best mode: $RD_{cost} = D + \lambda_{MODE} \times R$
where D is the distortion, R is the bit rate with the given QP and $\lambda_{MODE}$ is the Lagrange multiplier.
Distortion (D) is obtained by the SAD (Sum of Absolute Differences) between the original macro block and its reconstructed block.
Bit rate (R) includes the bits for the mode information and the transform coefficients of the macro block.
The quantization parameter (QP) can vary from 0 to 51.
The Lagrange multiplier ($\lambda_{MODE}$) is a value representing the relationship between bit cost and quality.
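The cost computation above can be sketched as follows. The multiplier and cost formulas follow the slide; the function names and the representation of candidates as a dictionary from mode name to a `(distortion, rate_bits)` pair are assumptions for illustration.

```python
def lagrange_multiplier(qp):
    """Mode-decision Lagrange multiplier: 0.85 * 2^((QP - 12) / 3)."""
    return 0.85 * 2 ** ((qp - 12) / 3)


def rd_cost(distortion, rate_bits, qp):
    """Rate-distortion cost D + lambda_MODE * R."""
    return distortion + lagrange_multiplier(qp) * rate_bits


def best_mode(candidates, qp):
    """Pick the mode with the lowest RD cost.

    candidates: dict mapping a mode name to (distortion, rate_bits).
    """
    return min(candidates, key=lambda mode: rd_cost(*candidates[mode], qp))


if __name__ == "__main__":
    modes = {"P16x16": (1200, 20), "P8x8": (900, 60)}
    for qp in (12, 27):
        print(qp, best_mode(modes, qp))
```

With these made-up numbers, the low-distortion but bit-hungry P8x8 wins at QP 12, while at QP 27 the larger multiplier penalises its rate and P16x16 wins.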

Adaptive Rate Distortion Threshold The threshold for early termination, $RD_{thres}$, depends on $RD_{pred}$, which is computed according to spatial and temporal correlations, and on the value of a modulator $\beta$. The rate distortion threshold is given by $RD_{thres} = (1 + \beta) \times RD_{pred}$. The modulator $\beta$ provides a trade-off between computational efficiency and accuracy.

Threshold selection
Adaptive Threshold I: $RD_{thres} = RD_{pred} \times (1 - 8\beta)$
Adaptive Threshold II: $RD_{thres} = RD_{pred} \times (1 + 10\beta)$
The threshold is adaptive because it depends on the predicted rate distortion cost derived from spatial and temporal correlations. Here $\beta$ is the modulation coefficient, which depends on two factors: the quantization step (Qstep) and the block size (N and M).
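The two thresholds can be computed directly from the predicted RD cost. In this sketch the modulation coefficient is passed in as `beta` because its dependence on Qstep and block size is not spelled out in the slides; the function name and the sample values are hypothetical.

```python
def adaptive_thresholds(rd_pred, beta):
    """Return (threshold_I, threshold_II) for a predicted RD cost.

    beta is the modulation coefficient; how it is derived from Qstep and
    the block size is not covered here, so it is taken as an input.
    """
    threshold_1 = rd_pred * (1 - 8 * beta)   # tighter, first-round check
    threshold_2 = rd_pred * (1 + 10 * beta)  # looser, second-round check
    return threshold_1, threshold_2


if __name__ == "__main__":
    print(adaptive_thresholds(1000.0, 0.01))
```

For a positive coefficient, Threshold I lies below the predicted cost and Threshold II above it, so the first round terminates only on clearly good modes.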

Homogeneity Detection Smaller block sizes such as P4x8, P8x4 and P4x4 often correspond to detailed regions and thus require much more computation than larger block sizes. So, before checking smaller block sizes, it is necessary to check whether a P8x8 block is homogeneous. The method adopted to detect homogeneity is based on edge detection. An edge map is created for each frame using the Sobel operator [27].

Homogeneity Detection
For each pixel $p_{m,n}$, an edge vector $D_{m,n} = (dx_{m,n}, dy_{m,n})$ is obtained:
$dx_{m,n} = p_{m-1,n+1} + 2 p_{m,n+1} + p_{m+1,n+1} - p_{m-1,n-1} - 2 p_{m,n-1} - p_{m+1,n-1}$ (1)
$dy_{m,n} = p_{m+1,n-1} + 2 p_{m+1,n} + p_{m+1,n+1} - p_{m-1,n-1} - 2 p_{m-1,n} - p_{m-1,n+1}$ (2)
Here $dx_{m,n}$ and $dy_{m,n}$ represent the differences in the vertical and horizontal directions respectively. The amplitude of the edge vector is given by
$Amp(D_{m,n}) = |dx_{m,n}| + |dy_{m,n}|$ (3)
A homogeneous region is detected by comparing the sum of the amplitudes of the edge vectors over one region with predefined threshold values [30]. In the proposed algorithm, these thresholds are made adaptive, depending on the amplitudes of the left and up blocks and on mode information.
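Equations (1) to (3) can be sketched as below, assuming a frame stored as a list of rows of luma samples indexed as `p[m][n]`. The threshold is passed in as a plain number because the adaptive rule is not reproduced here, and skipping border pixels is a simplification chosen for the example; the function names are hypothetical.

```python
def edge_amplitude(p, m, n):
    """Amplitude |dx| + |dy| of the Sobel edge vector at pixel (m, n)."""
    dx = (p[m - 1][n + 1] + 2 * p[m][n + 1] + p[m + 1][n + 1]
          - p[m - 1][n - 1] - 2 * p[m][n - 1] - p[m + 1][n - 1])
    dy = (p[m + 1][n - 1] + 2 * p[m + 1][n] + p[m + 1][n + 1]
          - p[m - 1][n - 1] - 2 * p[m - 1][n] - p[m - 1][n + 1])
    return abs(dx) + abs(dy)


def is_homogeneous(p, top, left, size, threshold):
    """True if the summed edge amplitude of a size x size block is below
    `threshold`. Pixels on the frame border are skipped (assumption)."""
    rows, cols = len(p), len(p[0])
    total = 0
    for m in range(top, top + size):
        for n in range(left, left + size):
            if 1 <= m < rows - 1 and 1 <= n < cols - 1:
                total += edge_amplitude(p, m, n)
    return total < threshold


if __name__ == "__main__":
    flat = [[50] * 10 for _ in range(10)]
    edge = [[0 if n < 5 else 100 for n in range(10)] for _ in range(10)]
    print(is_homogeneous(flat, 1, 1, 8, 1000))  # flat region
    print(is_homogeneous(edge, 1, 1, 8, 1000))  # region with a sharp edge
```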

Homogeneity Detection The adaptive threshold is determined as per the following four cases: Case 1: If the left block and the up block are both P8x8 Case 2: If the left block is P8x8 and up block is not P8x8 Threshold =

Homogeneity Detection Case 3: If the left block is not P8x8 and up block is P8x8 Threshold = Case 4: If the left block is not P8x8 and up block is not P8x8

FAT Algorithm [8] Fig. 9 FAT algorithm [8]

FAT Algorithm
Step 1: If the current macro block belongs to an I slice, check for intra prediction using I4x4 or I16x16 and go to step 10; otherwise go to step 2.
Step 2: If the current macro block is the first macro block in a P slice, check the inter and intra prediction modes and go to step 10; otherwise go to step 3.
Step 3: Compute the mode histogram from the neighboring spatial and temporal macro blocks, then go to step 4.
Step 4: Select the prediction mode as the index corresponding to the maximum in the mode histogram and obtain the values of Adaptive Threshold I and Adaptive Threshold II, then go to step 5.
Step 5: Always check the P16x16 mode and check the conditions of the skip mode. If the skip mode conditions are satisfied, go to step 10; otherwise go to step 6.

FAT Algorithm
Step 6: If the left, up, up-left and up-right macro blocks all have skip mode, check the skip mode against Adaptive Threshold I. If the rate distortion cost is less than Adaptive Threshold I, the current macro block is labeled as skip mode; go to step 10. Otherwise go to step 7.
Step 7: First round check over the predicted mode. If the predicted mode is P8x8, go to step 8. Otherwise, check the rate distortion cost of the predicted mode against Adaptive Threshold I: if the RD cost is less than Adaptive Threshold I, go to step 10; otherwise go to step 9.
Step 8: If the current P8x8 block is homogeneous, no further partitioning is required. Otherwise, further partitioning into the smaller blocks 8x4, 4x8 and 4x4 is performed. If the RD cost of P8x8 is less than Adaptive Threshold I, go to step 10; otherwise go to step 9.

FAT Algorithm
Step 9: Second round check over the remaining modes against Adaptive Threshold II. If the rate distortion cost is less than Adaptive Threshold II, go to step 10; otherwise continue checking all the remaining modes, then go to step 10.
Step 10: Save the best mode and its rate distortion cost.
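Steps 5 to 10 can be sketched as an early-termination loop for one macro block of a P slice. This is a simplified reading of the steps, not the reference implementation: `rd_cost` is a hypothetical callback returning the RD cost of a mode (with the P8x8 homogeneity check and sub-partitioning of step 8 assumed to happen inside it), `skip_ok` stands for the skip conditions of step 5, only inter modes are listed, and the two thresholds are assumed to come from step 4.

```python
INTER_MODES = ("SKIP", "P16x16", "P16x8", "P8x16", "P8x8")


def fat_select(rd_cost, predicted_mode, thr1, thr2, skip_ok, neighbors_all_skip):
    """Return (best_mode, best_cost, modes_checked) for one macro block."""
    checked = {}

    def check(mode):
        # Evaluate each mode at most once and remember its cost.
        if mode not in checked:
            checked[mode] = rd_cost(mode)
        return checked[mode]

    def finish():
        # Step 10: keep the cheapest mode evaluated so far.
        best = min(checked, key=checked.get)
        return best, checked[best], list(checked)

    # Step 5: always check P16x16, then test the skip conditions.
    check("P16x16")
    if skip_ok:
        return "SKIP", check("SKIP"), list(checked)

    # Step 6: all four neighbors skipped and SKIP is cheap enough.
    if neighbors_all_skip and check("SKIP") < thr1:
        return "SKIP", checked["SKIP"], list(checked)

    # Steps 7 and 8: first round, predicted mode against Threshold I.
    if check(predicted_mode) < thr1:
        return finish()

    # Step 9: second round, remaining modes against Threshold II.
    for mode in INTER_MODES:
        if mode in checked:
            continue
        if check(mode) < thr2:
            return finish()
    return finish()


if __name__ == "__main__":
    costs = {"SKIP": 500, "P16x16": 400, "P16x8": 300, "P8x16": 350, "P8x8": 600}
    print(fat_select(costs.get, "P16x8", 320, 450, False, False))
```

The length of `modes_checked` relative to the full mode list gives a rough feel for how much work the early exits save.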

CIF and QCIF sequences CIF (Common Intermediate Format) is a format used to standardize the horizontal and vertical resolutions, in pixels, of the Y, Cb, Cr sequences in video signals; it is commonly used in video teleconferencing systems. QCIF means "Quarter CIF": the frame has one fourth of the area, which implies that both the height and the width are halved. The differences between the Y, Cb, Cr resolutions of CIF and QCIF are shown in Fig. 10. [16] Fig. 10 CIF and QCIF resolutions (Y, Cb, Cr).
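For reference, the plane sizes can be worked out as below, assuming the standard luma resolutions of 352x288 for CIF and 176x144 for QCIF, 4:2:0 chroma subsampling and 8 bits per sample. The helper names are hypothetical.

```python
CIF = (352, 288)    # luma width, height
QCIF = (176, 144)   # half the width and half the height of CIF


def plane_sizes(width, height):
    """Y, Cb and Cr plane sizes for 4:2:0 sampling, where each chroma
    plane has half the luma width and half the luma height."""
    chroma = (width // 2, height // 2)
    return {"Y": (width, height), "Cb": chroma, "Cr": chroma}


def frame_bytes(width, height):
    """Bytes per uncompressed 4:2:0 frame at 8 bits per sample."""
    return sum(w * h for w, h in plane_sizes(width, height).values())


if __name__ == "__main__":
    for name, size in (("CIF", CIF), ("QCIF", QCIF)):
        print(name, plane_sizes(*size), frame_bytes(*size))
```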

Results The following QCIF and CIF sequences were used to test the complexity reduction algorithm [10]: Akiyo, Foreman, Car phone, Hall monitor, Silent, News, Container, Coastguard.

Test Sequences News, Foreman, Akiyo, Coastguard, Car phone, Container

Test Sequences Hall monitor, Silent

Experimental Results Baseline profile, IPPP structure. QP values of 22, 27, 32 and 37. 30 frames for each QCIF sequence and 30 frames for each CIF sequence. The results were compared with the exhaustive search of JM in terms of the change in PSNR, bit-rate, SSIM, compression ratio and encoding time. Test platform: Intel Pentium Dual Core processor at 2.10 GHz with 4 GB of memory.

Experimental Results Three deltas are reported relative to the exhaustive search of JM. Computational efficiency is measured by the amount of reduction in encoding time. Delta Bit rate is measured by the amount of reduction in bit rate. Delta PSNR (Peak Signal to Noise Ratio) is measured by the amount of reduction in PSNR.
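One conventional way to express such deltas, assumed here rather than taken from the slides, is the percentage change of the fast algorithm relative to the JM reference. The function name and the sample timings are made up for illustration.

```python
def percent_change(proposed, reference):
    """Percentage change of `proposed` relative to `reference`.

    Negative values mean a reduction (for example, less encoding time).
    """
    if reference == 0:
        raise ValueError("reference value must be non-zero")
    return (proposed - reference) / reference * 100.0


if __name__ == "__main__":
    # Made-up timings: 57 s with early termination, 100 s exhaustive.
    print(percent_change(57.0, 100.0))
```

The same helper applies to encoding time, bit rate and PSNR, provided both runs use the same sequence and QP.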

Quality
Quality measures are needed to specify, evaluate and compare. Visual quality is inherently subjective. Two types of quality measures are used: objective quality measures (PSNR, MSE) and a structural quality measure (SSIM) [29].
PSNR is the most widely used objective quality measurement:
$PSNR_{dB} = 10 \log_{10}\left((2^n - 1)^2 / MSE\right)$
where n = number of bits per pixel and MSE = mean square error.
SSIM emphasizes that the human visual system is highly adapted to extract structural information from visual scenes. Therefore, structural similarity measurement should provide a good approximation to perceptual image quality.
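The PSNR formula can be sketched as follows, assuming both pictures are given as flat sequences of samples of equal length. The function names are hypothetical, and returning infinity for identical inputs is a convention chosen for the example.

```python
import math


def mse(original, decoded):
    """Mean square error between two equally long sample sequences."""
    if len(original) != len(decoded) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)


def psnr(original, decoded, bits=8):
    """PSNR in dB: 10 * log10((2^n - 1)^2 / MSE), with n bits per sample."""
    error = mse(original, decoded)
    if error == 0:
        return math.inf  # identical pictures
    peak = (1 << bits) - 1
    return 10 * math.log10(peak * peak / error)


if __name__ == "__main__":
    print(psnr([100, 100, 100, 100], [101, 99, 101, 99]))
```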

Results

Conclusions
To reduce the time complexity of inter prediction, a fast adaptive termination mode selection algorithm, named FAT [8], has been used. Experimental results on different video sequences, compared with the open source reference software (JM17.2), indicate that the algorithm achieves faster encoding with a negligible loss in video quality. The numbers are as follows:
Encoding time: ~43% reduction for QCIF and ~40% reduction for CIF
PSNR: ~0.15% reduction for QCIF and ~0.26% reduction for CIF
Bit rate: ~6% reduction for QCIF and ~9.5% reduction for CIF
SSIM: ~0.077% reduction for QCIF and ~0.073% reduction for CIF
These results show that the FAT algorithm achieves a considerable reduction in encoding time without noticeably degrading the video quality.

References:
1. Open source article, Intra frame coding: http://www.cs.cf.ac.uk/Dave/Multimedia/node248.html
2. Open source article, MPEG 4 new compression techniques: http://www.meabi.com/wp-content/uploads/2010/11/21.jpg
3. Open source article, H.264/MPEG-4 AVC, Wikipedia Foundation, http://en.wikipedia.org/wiki/H.264/MPEG-4_AVC
4. I. E. Richardson, The H.264 advanced video compression standard, 2nd Edition, Wiley, 2010.
5. R. Schafer and T. Sikora, Digital video coding standards and their role in video communications, Proceedings of the IEEE, vol. 83, pp. 907-923, Jan. 1995.
6. G. Escribano et al, Video encoding and transcoding using machine learning, MDM/KDD'08, August 24, 2008, Las Vegas, NV, USA.
7. D. Marpe, T. Wiegand and S. Gordon, H.264/MPEG4-AVC Fidelity Range Extensions: Tools, Profiles, Performance, and Application Areas, Proceedings of the IEEE International Conference on Image Processing 2005, vol. 1, pp. 593-596, Sept. 2005.
8. ITU-T Recommendation H.264, Advanced Video Coding for Generic Audio-Visual Services.
9. S. Kwon, A. Tamhankar and K. R. Rao, Overview of H.264 / MPEG-4 Part 10, J. Visual Communication and Image Representation, vol. 17, pp. 186-216, April 2006.
10. A. Puri et al, Video coding using the H.264/MPEG-4 AVC compression standard, Signal Processing: Image Communication, vol. 19, pp. 793-849, Oct. 2004.
11. G. Sullivan, P. Topiwala and A. Luthra, The H.264/AVC Advanced Video Coding Standard: Overview and Introduction to the Fidelity Range Extensions, SPIE conference on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.
12. K. R. Rao and P. C. Yip, The transform and data compression handbook, Boca Raton, FL: CRC Press, 2001.
13. T. Wiegand and G. J. Sullivan, The H.264 video coding standard, IEEE Signal Processing Magazine, vol. 24, pp. 148-153, March 2007.
14. I. E. Richardson, H.264/MPEG-4 Part 10 White Paper: Inter Prediction, www.vcodex.com, March 2003.
16. JM reference software, http://iphome.hhi.de/suehring/tml/
17. G. Raja and M. Mirza, In-loop de-blocking filter for H.264/AVC Video, Proceedings of the IEEE International Conference on Communication and Signal Processing 2006, Marrakech, Morocco, Mar. 2006.
18. M. Wien, Variable block size transforms for H.264/AVC, IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, pp. 604-613, July 2003.
19. A. Luthra, G. Sullivan and T. Wiegand, Introduction to the special issue on the H.264/AVC video coding standard, IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, issue 7, pp. 557-559, July 2003.
