an efficient intra mode selection algorithm for h.264 based on fast edge classification

ARTICLE IN PRESS

Contents lists available at ScienceDirect

Signal Processing: Image Communication

Signal Processing: Image Communication 23 (2008) 699–710

0923-59

doi:10.1

$ Thi

Kong Fo� Cor

E-m

knngan

journal homepage: www.elsevier.com/locate/image

An efficient intra-mode selection algorithm for H.264 based on edgeclassification and rate-distortion estimation$

Zhenyu Wei �, King N. Ngan, Hongliang Li

The Chinese University of Hong Kong, Hong Kong, China

a r t i c l e i n f o

Article history:

Received 31 October 2006

Received in revised form

5 August 2008

Accepted 12 August 2008

Keywords:

H.264

Mode decision

Edge classification

RDO

65/$ - see front matter & 2008 Elsevier B.V. A

016/j.image.2008.08.002

s work was supported in part by the Chinese

cused Investment Scheme (Project 1903003)

responding author. Tel.: +852 95091994.

ail addresses: [email protected] (Z. Wei)

@ee.cuhk.edu.hk (K.N. Ngan), [email protected]

a b s t r a c t

The H.264/AVC is the newest video coding standard recommended by ITU-T and MPEG.

Compared with all existing video coding standards, H.264 can achieve superior

performance by using many advanced techniques. Intra mode selection is an important

feature in H.264 standard and can reduce the spatial redundancy in intra-frame

significantly. An efficient rate distortion optimization (RDO) technique is employed in

H.264 to choose the best mode for each MB, but the computational cost increases

drastically. In this paper, a fast intra-mode selection algorithm is introduced. By using a

fast edge detection method which is based on non-normalized Haar transform (NHT),

edge for each sub-block can be extracted. Based on the local edge information, only few

intra-modes are chosen as mode candidates. A fast RDO algorithm is also proposed in

this paper which is based on accurate rate-distortion estimation model and the fast

intra-mode RDO method. By combining these two methods, computational load is

reduced remarkably. Experimental results show that this fast intra-mode selection

scheme can lessen the encoding time significantly with little loss of bit-rate and visual

quality.

& 2008 Elsevier B.V. All rights reserved.

1. Introduction

H.264 is the newest standard for moving picturecoding [10]. In order to achieve outstanding codingperformance, many advanced techniques are used, suchas: intra-mode decision, variable block size motionestimation, 1

4 resolution motion estimation, multiplereference frames, deblocking filter, integer cosine trans-form (ICT), CABAC and CAVLC entropy coding, etc. It isshown that these techniques can provide nearly 50% bitrate reduction compared with other previous standards.However, as the computational complexity of the encoderincreases, reducing the complexity of some of its algo-rithms becomes an important and challenging task.

ll rights reserved.

University of Hong

.

,

u.hk (H. Li).

Among the new techniques introduced by H.264, intra-mode plays a vital role because it can reduce spatialredundancy substantially. In contrast to previous videostandards, such as MPEG-4 [15], where intra-prediction isperformed in transform domain, in H.264 intra-mode,prediction block is formed based on neighboring recon-structed pixels and is subtracted from current block beforeencoding. In luma component, intra-prediction is appliedfor each 4� 4 block and for a 16� 16 macroblock, andalso for each 8� 8 block in high profile. There are ninemodes for 4� 4 luma block, four modes for 16� 16 lumablock and four modes for 8� 8 chroma block. In order toattain the best coding performance, a very time-consum-ing technique named RDO (rate distortion optimization) isused. It computes the real bit-rate and distortion betweenoriginal and reconstructed frames for each mode. Then therate distortion cost (RDcost) based on Lagrangianrate distortion formula is calculated. The mode whichhas the minimum RDcost will be chosen as the finalcoding mode. Therefore, the computational load of this

www.sciencedirect.com/science/journal/image

www.elsevier.com/locate/image

dx.doi.org/10.1016/j.image.2008.08.002

mailto:[email protected]



ARTICLE IN PRESS

Z. Wei et al. / Signal Processing: Image Communication 23 (2008) 699–710700

kind of exhaustive searching algorithm is not acceptablefor real-time applications. So, it is very important todesign a fast intra-mode decision method to reduce thecomplexity of encoder.

In recent years, several fast intra-prediction modeselection algorithms have been proposed [16,17,14,19,12,21,4]. From the definition of intra-modes, it could havebeen found that they have very strong directionality. Eachmode produces prediction block along correspondingdirection. If the block edge can be detected and the pixelsare predicted along the direction of block edge, a goodprediction result can be obtained. Pan and Lin [17] presentan edge detection method to speed up the intra-modedecision by using Sobel operator. However, since Sobeloperator should be operated on each pixel and the edgedirection histograms also need to be calculated, the extracomputation load is still high. Li and Ngan [13] introduce afast and efficient edge detection method which is basedon the properties of coefficients in non-normalized Haartransform (NHT). Since only a small number of additionand subtraction operations are involved and there are upto 42 models in this method, block edge can be detectedvery quickly and accurately. Based on the block edgeinformation, only few modes need to be checked in intra-prediction.

Because RDO needs to perform transform and entropycoding for each mode to get the real distortion and bitrate, lots of computational costs are introduced. Therefore,if a good method is designed to predict the bit rate anddistortion accurately, a plenty of unnecessary computa-tion will be reduced, such as entropy coding, inversetransform, etc. In order to solve this problem, severalmethods can be used. Because of conversation of energy,the distortion can be calculated in transform domain. Butfor bit-rate estimation, since entropy coding in H.264/AVCis more complex than that in previous standards, it is noteasy to get accurate bit-rate by using lookup tables forruns and levels. Some previous literatures have alreadystudied the bit-estimation problem. Chiang and Zhang [5]and Corbera and Lei [6] proposed rate-distortion modelsbased on quantizer-domain. The r-domain rate model in[7–9] investigates the linear relationship between rateand non-zero quantized DCT coefficients. But unfortu-nately, these methods were designed for rate controloriginally and more suitable for bit-estimation in framelevel, but not in macroblock level. Chen and He [3]proposed a bit-rate estimation model, but since theparameters of this model are fixed and not self-adaptive,

M A B C D

I a b c d

J e f g h

K i j k l

L m n o p

E F G H

Mod

Fig. 1. (a) a 4� 4 block with pixels (a–p) which are predicted by neig

the estimation accuracy is limited. The method in [18]mainly focuses on the inter-mode decision, but in H.264/AVC, intra-mode decision occupies the main part of RDOcomputation, so the speed of this method is not veryremarkable.

By studying the entropy coding method in H.264/AVC,an accurate bit-rate estimation model is proposed in thispaper. The parameters in this model are self-adaptive toguarantee the precision of estimation. The distortion isalso calculated in transform domain to avoid the MBreconstruction process. By using these methods, a plentyof RDO computation is saved while keeping goodperformance.

In this paper, an efficient intra-mode selection algo-rithm for H.264 is proposed which is based on above twotechniques: edge classification and rate-distortion estima-tion. There are two main contributions in this paper:

1

e 3

M

hbo

An accurate bit-rate estimation model is proposed.Based on this model, a fast RDO method is proposedand lots of entropy coding computation is saved.

2
A fast intra-mode RDO scheme is designed based onupper two techniques.
Experimental results show that our methods can greatlyspeed up the intra-prediction process with nearly nodegradation of coding performance.

The rest of the paper is organized as follows: Section 2gives a brief introduction of intra-prediction and RDO inH.264. The proposed edge classification method isintroduced in Section 3. In Section 4, a fast intra-modedecision algorithm based on edge classification is pre-sented. Section 5 describes our fast RDO algorithm, whichis based on precise bit-rate estimation model. Experi-mental results are shown in Section 6 and conclusions aredrawn in Section 7.

2. Overview of intra-prediction and RDO in H.264

There are two kinds of intra-prediction for luma blockin H.264 baseline profile: I4MB and I16MB. In I4MB type,nine different modes are defined as illustrated in Fig. 1(b).Each 4� 4 block is predicted from the spatial adjacentsamples which have already been encoded and recon-structed. The 16 samples of the prediction block which arelabeled as a–p are predicted by the neighboring pixelslabeled as A–M as shown in Fig. 1(a). For example, mode 0

Mode 8

Mode 1

Mode 6

Mode 4

Mode 5Mode 0Mode 7

ode 2: DC

ring pixels (A–M). (b) Nine modes in 4� 4 intra-prediction.

ARTICLE IN PRESS

Z. Wei et al. / Signal Processing: Image Communication 23 (2008) 699–710 701

uses A, B, C and D to extrapolate the prediction blockvertically. The other modes follow the similar methodalong their corresponding orientations. Mode 2 (DC mode)is a directionless mode in which all pixels are predicted by(A+B+C+D+I+J+K+L)/8. For the I16MB type, 33 referencepixels are used to generate a 16� 16 prediction block,four modes are supported: vertical, horizontal, DC and theplane modes.

To choose the best macroblock mode, H.264/AVCencoder computes the RDcost for each possible modeand choose the mode which has minimum RDcost as thebest mode. In information theory, the RDO process can bedescribed as looking for the minimal required rate R toachieve a given distortion D. RDO uses Lagrange Multipliermethod to get the optimal solution. The cost function isshown as follows.

RDcost ¼ SSEþ lmode � R (1)

where SSE is the sum of squared error between theoriginal block and the reconstructed block, lmode is theLagrange multiplier, R represents the bit number con-sumed for coding this block. As shown in Fig. 2, in orderto compute the RDcost for each mode, the block needsto be encoded and decoded to get the bit-rate andthe distortion, so a number of operations of forward/inverse transform and entropy coding are repeatedlyperformed. Hence the computational cost of RDO is veryhigh.

In H.264/AVC, RDOoff option is also supported. The costfunction of this mode is

RDcost ¼ SADþ lmode � R (2)

where SAD is the sum of absolute difference betweenthe original block and the predicted block, R stands forthe bits for coding mode type, motion vector andother side information. The entropy coding processis not needed in this mode, so the computation complex-ity is much lower than RDOon mode, but at the same time,the coding performance is also much worse than RDOon

mode.From (1) and (2), it can be known that choosing I16MB

means that fewer bits are used to signal the mode typeand other information, but the residue may contain higherenergy, especially in sequences with more details. On theother hand, choosing I4MB may give a lower energyresidual after intra-prediction but it needs to use morebits to signal the mode types and other information.

Integer transform & quantization

R

Residue data

Fig. 2. RDO com

So, how to select the best mode is very important for theperformance of the H.264 codec.

3. Fast and efficient edge classification algorithm

In [13], a fast and efficient method was proposedto classify the edge blocks in terms of the NHT coefficients.Unlike the DCT, the NHT not only employs the fastmulti-resolution structure, but also the Walsh-like trans-form. Because only addition and subtraction operationsare involved in this transform, the computational com-plexity is greatly reduced when computing the NHTcoefficients.

Let xðnÞ, n ¼ 0; . . . ;N � 1, with N even, denote asequence of integers. The NHT can be represented bytwo sequences, the approximate lðnÞ and detailed coeffi-cients h(n), defined as follows:

lðnÞ ¼ xð2nÞ þ xð2nþ 1Þ; n ¼ 0; . . . ;N=2� 1

hðnÞ ¼ xð2nÞ � xð2nþ 1Þ; n ¼ 0; . . . ;N=2� 1 (3)

From (3), it can be found that NHT has the most efficientcomputational efficiency compared with other discretetransforms since the whole computation only requires asmall number of additions and subtractions due to itsfunction coefficients with only þ1 and �1.

The 2-D NHT is performed by applying the transforma-tion (3) sequentially to the rows and columns of theimage. The corresponding hierarchical pyramid structureis shown in Fig. 3(a). It is noted that the sametransformations are only applied to the reduced resolu-tion LL sub-band to form the hierarchical pyramid.The coefficients at the lth level correspond to the2l� 2l pixel blocks due to the constant decomposi-

tion structure given in (3). Furthermore, for differentsubbands, i.e., LL, LH, HL, and HH, difference informa-tion corresponding to the original image can be observed.As shown in Fig. 3(b), the same subbands at differentlevels, such as LH1, LH2, LH3,. . ., will represent the graylevel variation in similar way. The coefficients in LHsubband denote the gray level change in the verticaldirection, while the HL and HH represent the change inhorizontal and diagonal directions respectively. LL is theDC component.

Let C denote a 2l� 2l block. B0, B1, B2 and B3 denote

intensity sum of four non-overlap sub-block in a givenblock C, which are shown in Fig. 4. Then, the four NHT

Entropy codingInverse Integer

transform & inverse quantization

Rdcost calculaiton

Distortionate

putation.

ARTICLE IN PRESS

HL1

HH1LH1

HL2

HH2LH2

HL3

HH3LH3

LL3 HL LH HHLevel

1

2

3

4 x 4

2 x 2

8 x 8

Fig. 3. (a) The multi-resolution representation of 2-D NHT. (b) Block features at different levels.

B0 B1

B3B2

θ

2λ-1

2λ

Fig. 4. Possible edge orientation in the given block.


coefficients at the lth level can be written as

LL ¼ B0 þ B1 þ B2 þ B3

LH ¼ B0 þ B1 � B2 � B3

HL ¼ B0 � B1 þ B2 � B3

HH ¼ B0 � B1 � B2 þ B3 (4)

Based on three NHT AC coefficients and some simplegeometry knowledge, about 42 block edge models can bederived, which is shown in Fig. 5. In other words,according to some criteria described in [13], accurateedge information for a given block is obtained by onlyusing three NHT AC coefficients at the highest level.Detailed classification process is described in [13].

4. Proposed fast intra-mode decision algorithm

As mentioned in the previous section, the edgemodel for each image block is identified by using theedge orientation feature. However, blocks with verysmall intensity changes should be classified as homo-geneous blocks because of the weak edge strength. Toidentify blocks with small intensity change, the contrast

information is used in the decision function, which isdefined as

F ¼ f LH þ f HL þ f HH

f LH ¼ jLH=ðLHþ LLÞj

f HL ¼ jHL=ðHLþ LLÞj

f HH ¼ jHH=ðHHþ LLÞj (5)

The block is classified as a homogeneous block when thedecision function F is smaller than a certain threshold Th.In addition, each NHT coefficient is also set as zero if lessthan 0.6 Th.

Based on a lot of experiments, a threshold within theinterval of 0.05 to 0.1 can provide a good classificationresult. Based on the classification results, correspondingintra-modes are chosen as the candidates modes accord-ing to orientation of edge, then a plenty of unnecessarycomputation will be saved.

4.1. I4MB prediction modes

For I4MB prediction modes, the three AC NHTcoefficients LH, HL and HH are computed for a given4� 4 block firstly, then the edge of the block is identified.If the block is homogeneous, mode 2(DC mode) is chosenas the candidate mode. If there exits a vertical orhorizontal edge, mode 0 or mode 1 will be chosen forthe RDO calculation, respectively. If the block contains aedge belonging to other edge modes according to the edgedirection, corresponding modes can be selected ascandidate modes. It should be noted that only nineintra-modes are defined in H.264 for a given 4� 4 lumablock, which are much less than 42 edge modes defined inour edge classification method. Table 1 provides themapping between the nine prediction modes and the 42edge modes. The subscript of each element of the tableindicates the edge mode in Fig. 5. For example, EHM-IAx;2

indicates the three edge modes in the 2nd column of Fig.5. From this table, it could be found that at most threeintra-prediction modes are chosen for one detectededge. For example, if an edge belonging to EHM-ICmodel is detected for given block as shown in Fig. 4(edge orientation angle y is within the range ðp=4p=2Þ),

ARTICLE IN PRESS

EM

EHM

EHM-I EHM-II

ESM EVM

EVM-I EVM-II

A B C A B C A B C A B C

Fig. 5. The framework of the edge classification method.

Table 1The relationship between the prediction modes and the edge modes for 4� 4 luma block

Prediction modes 0 1 8, 3, 7 5, 4, 6 0, 5, 4 4, 6, 1 3, 7, 0 1, 8, 3

Edge model ESM-2 ESM-1 EHM-IA1;1 EVM-IA1;1 EVM-IA3;1 EVM-IA2;1 EHM-IA2;1 EHM-IA3;1

EHM-IAx;2 EVM-IAx;2 EVM-IC EVM-IB EHM-IC EHM-IB

EHM-IIA1;1 EVM-IIA1;1 EVM-IIA3;1 EVM-IIA2;1 EHM-IIA2;1 EHM-IIA3;1

EHM-IIAx;2 EVM-IIAx;2 EVM-IIC EVM-IIB EHM-IIC EHM-IIB


mode 0, mode 3 and mode 7 are chosen as the candidatesaccording to Table 1.

An additional mode named mostProbableMode alsoneeds to be checked, which is described in detail in H.264specifications [10] and produced by the spatially adjacentblocks. There are two reasons to choose this mode. First, ifthe correlation between adjacent blocks is very high, thismode has higher probability to be the best mode.Secondly, if the best mode for given block equals to themostProbableMode, less bits are used to encode the modeindex, so this mode is apt to be chosen as the best mode.

In summary, at least for one mode, and at most fourmodes (three are from edge detection, one is frommostProbableMode) are chosen as the mode candidatesto perform the RDO computation, instead of the originalnine modes.


I16MB type selects a single mode for the wholemacroblock. It is more suited to encode the smooth areaof the picture. On the other hand, I4MB type can get betterresult for coding the area with more details. Based on thisobservation, some criteria can be set to skip theunnecessary I16MB computation.

The proposed edge classification method is performedfor the given 16� 16 macroblock to check whether it ishomogeneous. If the macroblock is classified as homo-geneous, I16MB type will be evaluated. Otherwise, this

type will be skipped and computational cost will be saved.The procedure for checking the homogeneity of themacroblock is described as follows. First the three ACNHT coefficients are computed for the given macroblock.If any one of the following three conditions is satisfied,this macroblock will be considered as homogeneous.

(a)
HL, LH, HH are all zeros. It means the block is totallyhomogeneous.
(b)
HL and HH are zeros, but LH is not zero. It means thatthe block is homogeneous horizontally.
(c)
LH and HH are zeros, but HL is not zero. It means theblock is vertically homogeneous.
Experimental results show that there is a very highprobability that I16MB type is skipped. Especially for thesequence with many details, this ratio is more than 80%.


In H.264 baseline profile, I8MB is the type of intra-prediction modes for chroma component. Totally fourmodes are supported, they are: DC, vertical, horizontaland plane mode. Same as the previous method, firstlythe edge direction is obtained from the U and V

components based on the NHT coefficients. Then themode candidates are decided according to the edgedirection. If the edge directions from the two componentsare same, only one mode candidate is chosen; otherwise,

ARTICLE IN PRESS

Table 2Number of candidate modes

Block size Original # Min # for our

method

Max # for our

method

4� 4 9 1 4

16� 16 4 0 4

8� 8 4 2 3


two modes candidates need to be checked. The DC modealso needs to be checked in order to maintain the codingperformance. Thus, two or three modes will be checkedinstead of the original four modes. Table 2 shows thenumber of the candidate modes for each block size.

5. Proposed fast RDO algorithm

Although RDO improves the coding performancegreatly, the computation complexity is too high to beacceptable for real-time applications. It is necessaryto design a fast RDO algorithm to reduce the computa-tional cost of this module. In this section, a fast RDOapproach is proposed, which is based on the followingthree techniques.

5.1. Precise bit-rate estimation model

In order to predict the bit-rate accurately, the compo-nents of bits for coding one block should be analyzed.Total bits for coding one block can be expressed by thefollowing formula.

Rtotal ¼ Rcoef þ Rheader þ Rmotion (6)

where Rheader stands for the bits used for coding headerinformation, such as mode type, coded block pattern(CBP). Rmotion is the bit number of motion information,including motion vector, reference frame index, etc. Thebit number of these two parts can be obtained throughlook-up tables easily.

Rcoef represents the bits used for coding quantizedcoefficients. It is not practical to get the bits of this part byusing look-up table, since the entropy coding method inH.264/AVC is much more complex. Through our studyingthe entropy coding algorithm in H.264/AVC, it is observedthat the consumed bits for coding quantized coefficientsare related to three factors: N (the number of nonzeroquantized transform coefficients), Z (summation of run-before), and E (summation of absolute value of totalquantized coefficients).

In order to investigate the relationship between bit-rate and the quantized coefficients, some models wasreported in recent literatures. He et al. [7,9,8] proposed ar-domain rate model, which described the linear relation-ship between rate and the percentage of zero quantizedDCT coefficients. This model achieves very good results insome previous compression standards, such as H.263,MPEG2. However, the entropy coding method adopted inH.264 is much more complex. As discussed before, it couldbe found that the r-domain model is only related to the

factor N (the number of nonzero quantized transformcoefficients), but ignores the other two factors Z and E,which also play the important roles in entropy coding.Therefore, r-domain model could not predict the bitsencoded by H.264 accurately. Tu et al. [18] improved He’smodel. In this model, predicted bits equals to the linearcombination of N and E. Thus, the better results can beobtained. In order to get a more accurate model toestimate the bit-rate, all upper three factors are consid-ered in our model and the bits for coding quantizedcoefficients can be predicted by the following equation:

Rcoef ¼ a� N þ b� Z þ g� E (7)

where a, b and g are three parameters of this function. Inorder to be adaptive to the features of different videosequences, or different frames in video, parameters a, band g can be computed and updated by using linearregression method.

For the ðnþ 1Þth macroblock, a, b and g is obtained bysolving following equation arrays:

~Rn ¼ a� ~Nn þ b� ~Zn þ g� ~En (8)

where vector ~Rn contains n elements which are actual bitsfor encoding previous n macroblocks. ~Nn, ~Zn and ~En alsorecord the actual values of N; E and Z in previous n

macroblocks, respectively. It could be observed that thereare three unknowns and n equations, so least squares (LS)method is applied to solve it. Since the derivation andresult are very tedious and complex, the abbreviatedexpression of the result is shown as (9):

a ¼ Ta=F; b ¼ Tb=F; g ¼ Tg=F (9)

where:

Ta ¼ SnrðSzzSee � S2zeÞ � SzrðSeeSnz � SneSzeÞ þ SerðSnzSze � SzzSneÞ

Tb ¼ SnrðSneSze � SeeSnzÞ � SzrðSnnSee � S2neÞ þ SerðSnzSne � SnnSzeÞ

Tg ¼ SnrðSnzSze � SzzSneÞ � SzrðSneSze � SnnSzeÞ þ SerðSnnSzz � S2nzÞ

F ¼ SnnSzzSee � SnnS2ze � SzzS2

ne � SeeS2nz þ 2SnzSneSze (10)

In (10):

Snr ¼Xn

k¼1

NkRk; Szr ¼Xn

k¼1

ZkRk; Ser ¼Xn

k¼1

EkRk

Snn ¼Xn

k¼1

NkNk; Szz ¼Xn

k¼1

ZkZk; See ¼Xn

k¼1

EkEk

Snz ¼Xn

k¼1

NkZk; Sze ¼Xn

k¼1

ZkEk; Sne ¼Xn

k¼1

NkEk (11)

where n stands for the number of encoded macroblocks inpast, Nk Zk Ek are the values of the three correspondingcomponents of the kth MB, Rk is the real consumed bits forcoding the quantized coefficients of the kth MB.

Although the above regression function looks verycomplex, in fact, the kþ 1th group of the parametersða ;b ; gÞ can be obtained easily by the kth group of theparameters, so little extra computation cost is introducedin this regression process.

In Fig. 6 (test sequence is Foreman.QCIF) x-axis is theactual bits, and y-axis is the estimated bits. The beeline is45� line. It can be observed that the most of the dots lie onthe narrow range around the fitting line. It means that our

ARTICLE IN PRESS


bits estimation model can predict real bits accurately,then lots of entropy coding computation is saved.

5.2. Fast intra-mode RDO method

H.264 allows intra-modes to be used in the inter-frames. Therefore, when performing macroblock modeselection, the rate and distortion of both inter- and intra-modes need to be estimated in order to select the bestmacroblock coding mode. The 4� 4 block can be regardedas the counter unit of RDO operation because transforma-tion, quantization and entropy coding are all use 4� 4 asbasic unit. Since there are 16 4� 4 blocks for agiven macroblock and eight inter-modes defined inH.264 as illustrated in Fig. 7, it is found that a totalof 128 RDO operations are needed for inter-modedecision [2].

For intra-mode decision, H.264/AVC combines lumaand chroma components together to perform RDO opera-tion since a symbol named CBP is related to not onlychroma intra-mode type, but also luma intra-mode type.In order to obtain the accurate RDcost where bits for

-100

0

100

200

300

400

500

600

700

Pre

dict

ed B

its

Actual Bits7006005004003002001000

Fig. 6. Correlation between actual bits and predicted bits.

SKIP 16x16 8x16

16 + 16 + 16

P8x8 8x8 4x8 8x4

4 + 4 + 4

Fig. 7. The total number of RDO opera

coding CBP is considered, the following procedureis performed in RDO: under a fixed chroma mode,all the luma modes are checked to calculate the RDcost.Then chroma mode is changed and all luma modesare checked again till the best combination of chromaand luma mode is obtained which makes the RDcostminimum. Because there are four chroma modes,as shown in Fig. 8, the total number of RDO opera-tions for intra-mode decision is 640 [2], which canbe calculated by (12) and is much higher than thatof inter-mode decision. Because RDO is a kind of verytime-consuming operation and is more complex thanmotion estimation/compensation (ME/MC) in inter-framecoding when fast ME algorithm is applied, for onemacroblock, the complexity of intra-mode selection ishigher than that in inter-mode decision when RDO isturned on. It is necessary to reduce the intra-mode RDOcomputation.

NRDO ¼ M8� ð16�M4þ 16Þ (12)

In (12), NRDO is the total number of RDO operations. M8and M4 are 4 and 9, respectively, which stand for themode number of chroma and luma I4MB. In H.264 codecJM [11], mode selection for I16MB type is performed byfollowing steps. First, Hardamard transform is performedon the residue signals of four I16MB modes and chose themode with minimum summation of Hardamard coeffi-cients as the best I16MB mode. Then RDcost is calculatedfor the best I16MB mode by using (1) to compared withI4MB modes and inter-modes, so only 16 units of RDOoperation are performed for I16MB mode selection in theJM codec.

Since the bits for encoding CBP is very few, lumaand chroma components can be considered approxima-tively independent when performing intra-mode selec-tion. By trading off accuracy vs. complexity in intra-modedecision, RDO computation for chroma and luma compo-nents can be performed separately to choose the bestchroma mode and luma mode. In this case the totalnumber of computed RDcost is given by (13), and only164 RDO operations are needed instead of original 640

16x8

+ 16

4x4

4+ = 16

x

4 = 64

= 64

= 128

tions in inter mode decision [2].

ARTICLE IN PRESS

Intra 4x4:9 modes

Intra 16x16:4 modes

9x16 = 144

16

x 4 = 640(Chrmoa mode)

Fig. 8. The total number of RDO operations in intra-mode decision [2].


RDO operations.

NRDO ¼ M8þ ð16�M4þ 16Þ (13)

Based on Table 2, in the best scenario, our algorithmneeds to perform 2þ ð16� 1þ 0Þ ¼ 18 RDO operations. Inthe worst case, this number is 3þ ð16� 4þ 16Þ ¼ 83.Compared with 640 RDO operations in the original H.264intra-prediction method, the proposed algorithm canreduce the computational complexity significantly. Ex-perimental results show that the reduction of RDOoperations has limited impact of coding performance. Itneeds to be clarified that this fast intra-RDO methoddiffers from other traditional fast intra-mode selectionalgorithms because it does not reduce the mode typeswhich will be checked. So, this fast intra-mode RDOscheme can be integrated with other fast mode selectionalgorithms to achieve much faster encoding speed.

5.3. Distortion computation in transform domain

In original H.264 algorithm, the reconstructed frameneed to be generated in order to calculate the distortionbetween the original frame and the reconstructed frame.Because the integer transform in H.264/AVC is anorthogonal transform and distortion is measured by SSE(sum of squared error) in RDO, according to conversationof energy, the quantization error in transform domainmust be equal to the distortion in spatial domain. Thequantization error can be calculated in transform domain.Then, de-quantization and inverse ICT computation can beavoided. This idea was also described in literatures [3,18].

Since ICT uses left and right shift operations to replacemultiplication and division, the complexity of ICT is verylow compared with DCT. This is also the reason that ICT isadopted in H.264. The ICT only occupies a very smallproportion in RDO operations, so distortion computationin transform domain contributes our fast RDO algorithmabout 10–20%. The speedup ratio of proposed algorithm ismainly from the former two techniques.

6. Experimental results

The proposed algorithms were integrated within theH.264 software JM10.1 [11]. Here two our methods weretested. Proposed method I only adopted the fast intra-mode decision algorithm (as discussed in Section 4).Proposed method II showed the hybrid effect of theproposed fast mode decision and the fast RDO algorithms.Another fast intra-decision method proposed by Pan[16,17] was also implemented. All of the above methodswere compared with the full modes search method inJM10.1 with RDO on. In order to evaluate proposed fastRDO algorithm, JM10.1 with RDO off was also compared.The system hardware was a PC with 2.8 GHz Intel P4 CPUand 1 Gb memory. Several test sequences with 150 frameswere chosen and covered high, mild and low motion, aswell as QCIF and CIF resolution. Quantization parameterswere set as 28, 32, 36 and 40, entropy coding adoptedCAVLC.

In order to evaluate coding efficiency, the codingperformance is measured in term of DTtotal, DTRDO, DPSNRand DBR. DTtotal is the saving ratio of total encoding timeand is defined in (14), where RDO is turned on. DTRDO

stands for the coding time saving ratio of the RDO moduleand is calculated by (15). The values of DTtotal and DTRDO

shown in the following tables are average time savingratios in the tests with upper four QP values. DPSNR andDBR denote the difference of PSNR and bit-rate which arecalculated by using the RD-curves fitting method in [1].

DTtotal ¼Tproposed � T Jm

T Jm� 100% (14)

DTRDO ¼ �Tproposed � TRDOon

TRDOoff� TRDOon

� 100% (15)

6.1. All intra-frames mode

In this experiment, all 150 frames were coded as intra-frames. Table 3 shows the results of the proposed methodI. It could be observed that the performance of our methodis better than Pan’s method and the speed is also faster.The reason is that our proposed edge detection algorithmis based on Haar transform and only a small number ofaddition and subtraction operations are involved. There-fore, the extra computational cost is very low. However, InPan’s method, Sobel operator is operated on each pixeland the edge direction histograms also need to becalculated, so the complexity is a little higher.

Table 4 tabulates the results of proposed method II,which is the combination of the fast intra-mode decisionmethod and the fast RDO algorithm. When RDO is turnedoff in JM10.1, speed is much faster, but the codingperformance is degraded greatly. The PSNR decreasesabout 0.46 dB on average and bit-rate increases about7.25%. Our method can save over 80% coding time andabout 96% computational cost of the RDO part. Comparedwith JM10.1 and Pan’s method, our algorithm is muchfaster. At the same time average increment of bitrate is

ARTICLE IN PRESS

Table 4Coding performance compared with JM10.1 RDOOn(All intra-frame)

Sequence RDO Off Pan’s method Proposed method II

DPSNR DBR DTtotal DPSNR DBR DTtotal DPSNR DBR DTtotal DTRDO

(dB) (%) (%) (dB) (%) (%) (dB) (%) (%) (%)

QCIF

Foreman �0.41 6.61 �87.05 �0.15 2.47 �51.68 �0.17 2.77 �83.56 �95.99

Container �0.54 8.46 �86.85 �0.20 2.97 �50.42 �0.21 3.22 �83.32 �95.94

Carphone �0.40 5.73 �86.68 �0.22 3.16 �50.22 �0.39 5.39 �83.14 �95.92

MissA �0.37 6.77 �85.10 �0.35 6.18 �47.56 �0.22 3.63 �80.79 �94.94

CIF

Bus �0.49 7.42 �88.77 �0.22 3.45 �54.42 �0.24 3.60 �85.81 �96.67

Paris �0.44 5.87 �89.14 �0.20 2.64 �57.09 �0.21 2.82 �85.57 �96.00

Hallmonitor �0.56 8.92 �86.60 �0.30 4.64 �51.28 -0.21 3.22 �83.38 �96.28

Mobile �0.51 6.19 �89.29 �0.23 2.81 �53.63 �0.24 2.91 �85.24 �95.46

MotherDr �0.46 8.99 �85.81 �0.33 6.27 �51.45 �0.28 5.21 �82.81 �96.50

News �0.48 6.77 �86.75 �0.29 3.97 �53.09 �0.32 4.28 �83.25 �95.97

Football �0.40 7.94 �86.58 �0.16 3.02 �50.66 �0.21 4.08 �83.11 �95.99

Coastguard �0.41 7.34 �87.78 �0.16 3.02 �51.85 �0.19 3.42 �84.77 �96.57

Avg. �0.46 7.25 �87.2 �0.23 3.72 �51.95 �0.24 3.71 �83.73 �96.02

Table 3Coding performance compared with JM10.1 RDOOn (all intra-frame)

Sequence Pan’s method Proposed method I

DPSNR DBR DTtotal DPSNR DBR DTtotal

(dB) (%) (%) (dB) (%) (%)

QCIF

Foreman �0.15 2.47 �51.68 �0.15 2.48 �59.54

Container �0.20 2.97 �50.42 �0.21 3.16 �60.45

Carphone �0.22 3.16 �50.22 �0.38 5.24 �60.31

MissA �0.35 6.18 �47.56 �0.22 3.80 �58.35

CIF

Bus �0.22 3.45 �54.42 �0.21 3.23 �60.86

Paris �0.20 2.64 �57.09 �0.20 2.73 �62.31

Hallmonitor �0.30 4.64 �51.28 �0.16 2.51 �60.43

Mobile �0.23 2.81 �53.63 �0.23 2.86 �60.52

MotherDr �0.33 6.27 �51.45 �0.21 3.17 �59.28

News �0.29 3.97 �53.09 �0.28 3.53 �60.25

Football �0.16 3.02 �50.66 �0.15 2.88 �59.97

Coastguard �0.16 3.02 �51.85 �0.17 2.56 �60.24

Avg. �0.23 3.72 �51.95 �0.21 3.18 �60.21


about 3.7% or equivalently the loss of PSNR is about0.24 dB. The loss is very slight and can almost be ignored.Fig. 9 plots the R–D curves of several sequences. Theperformance difference between our method and JM10.1is negligible.

6.2. IPPP...mode

In H.264 inter-frame coding, not only inter-modes, butalso intra-modes have to be checked. For a given macro-block, the mode with the minimum cost will be chosen asbest mode to encode the macroblock. Although most ofthe macroblocks are encoded by inter-modes finally, theintra-mode selection is still a very necessary step in inter-frame coding. As discussed in Section 5.2, when RDO isturned on, the RDO operations in intra-mode selection

will consume more computational costs than those ininter-mode decision, so our fast intra-decision and fastRDO algorithms are also effective when coding inter-frames. By applying our proposed algorithm, much fewerintra-modes will be calculated and a number of unneces-sary RDO operations are reduced. In this experiment, for atotal of 150 frames, the first frame was I-frame and allother following frames were coded as P-frame. Thereference frame number was 1. The Simplified UMHexa-gonS [20] was chosen as the fast motion estimationalgorithm.

The experimental results are shown in Tables 5 and 6.The speed and performance of our proposed method I arebetter than those of Pan’s method. Our method II canreduce 60% of the total computation time comparedwith JM10.1 on average with almost same bitrate andPSNR. The results are also consistent with our previous

ARTICLE IN PRESS

37

36

35

34

33

32

31

30

29

28200 300 400 500 600 700 800

Bit rate (Kb/s) Bit rate (Kb/s)

PS

NR

(dB

)P

SN

R (d

B)

PS

NR

(dB

)P

SN

R (d

B)

JM10.1RDO offPan

Proposed IIProposed I

JM10.1RDO offPan


JM10.1RDO offPan


JM10.1RDO offPan


39

38

37

36

35

34

33

32

31

30

29200 250 300 350 400 450 500 550 650600

36

35

34

33

32

31

30

29

28

271000 1500 2000 2500 3000 3500 4000 4500

Bit Rate (Kb/s) Bit Rate (Kb/s)

39

38

37

36

35

34

33

32

31

30600 800 1000 1200 1400 1600 1800 22002000

Fig. 9. Rate–distortion curves (for all I-frames): (a) Foreman (QCIF). (b) Carphone (QCIF). (c) Bus (CIF). (d) News (CIF).

Table 5Coding performance compared with JM10.1 RDOOn (IPPPP....mode)

Sequence Pan’s method Proposed method I

DPSNR DBR DTtotal DPSNR DBR DTtotal

(dB) (%) (%) (dB) (%) (%)

QCIF

Foreman 0.01 �0.12 �40.48 �0.00 0.07 �46.83

Container �0.03 0.79 �42.87 �0.02 0.55 �48.76

Carphone �0.01 0.16 �40.65 0.03 �0.77 �45.13

MissA �0.03 1.06 �39.15 �0.08 1.55 �44.83

CIF

Bus �0.00 �0.01 �41.06 �0.01 0.34 �48.04

Paris 0.01 �0.16 �44.23 �0.01 0.17 �50.93

Hallmonitor �0.01 0.30 �42.7 �0.00 �0.07 �50.72

Mobile 0.01 �0.16 �42.93 �0.00 0.05 �49.68

MotherDr �0.00 0.11 �41.17 �0.01 0.29 �49.22

News �0.07 1.46 �43.56 �0.11 2.25 �50.22

Football �0.04 0.95 �38.01 �0.04 1.06 �44.23

Coastguard 0.00 �0.05 �40.34 �0.02 0.79 �47.25

Avg. �0.02 0.36 �41.43 �0.02 0.52 �47.99


ARTICLE IN PRESS

Table 6Coding performance compared with JM10.1 RDOOn (IPPPP....mode)

Sequence RDO Off Pan’s method Proposed method II

DPSNR DBR DTtotal DPSNR DBR DTtotal DPSNR DBR DTtotal DTRDO

(dB) (%) (%) (dB) (%) (%) (dB) (%) (%) (%)

QCIF

Foreman �0.39 9.12 �77.69 0.01 �0.12 �40.48 �0.02 0.37 �62.53 80.49

Container �0.67 16.55 �80.55 �0.03 0.79 �42.87 �0.14 2.95 �64.57 80.17

Carphone �0.11 2.59 �78.58 �0.01 0.16 �40.65 �0.03 0.71 �62.51 79.55

MissA �0.89 19.78 �79.14 �0.03 1.06 �39.15 �0.16 3.49 �60.94 77.00

CIF

Bus �0.34 7.82 �76.01 �0.00 �0.01 �41.06 �0.04 0.84 �63.97 84.16

Paris �0.39 8.66 �80.18 0.01 �0.16 �44.23 �0.08 1.82 �66.52 82.96

Hallmonitor �0.54 16.66 �79.61 �0.01 0.30 �42.70 �0.09 2.66 �65.10 81.77

Mobile �0.24 6.29 �79.23 0.01 �0.16 �42.93 �0.00 0.07 �66.71 84.20

MotherDr �0.25 6.85 �78.24 �0.00 0.11 �41.17 �0.12 3.00 �62.72 80.16

News �0.44 9.08 �79.21 �0.07 1.46 �43.56 �0.17 3.30 �64.73 81.72

Football �0.50 11.81 �74.20 �0.04 0.95 �38.01 �0.17 3.76 �60.01 80.88

Coastguard �0.33 10.51 �75.46 0.00 �0.05 �40.34 �0.03 0.89 �63.20 83.75

Avg. �0.42 10.48 �78.18 �0.02 0.36 �41.43 �0.08 1.98 �63.63 81.39

36

35

34

33

32

31

30

29

28

27

PS

NR

(dB

)

35

34

33

32

31

30

35

39

38

37

36

34

33

32

31

30

29

29

28

26

25

27

PS

NR

(dB

)

37

36

35

34

33

32

31

30

29

28

PS

NR

(dB

)P

SN

R (d

B)

20 40 60 80 100 120 140Bir Rate (Kb/s)

Bir Rate (Kb/s)

20 40 60 80 100 120Bir Rate (Kb/s)

Bir Rate (Kb/s)

0

0 200 400 600 800 1000 1200 1400 0 50 100 150 200 250

JM10.1RDO offPan


JM10.1RDO offPan


JM10.1RDO offPan


JM10.1RDO offPan


Fig. 10. Rate–distortion curves (IPPP... Mode): (a) Foreman (QCIF). (b) Carphone (QCIF). (c) Bus (CIF). (d) News (CIF).


ARTICLE IN PRESS


analysis. Fig. 10 shows the R–D curves of the four testsequences. It could be observed that the curves of oursand JM10.1 method almost overlap each other. Thisimplies that the performance of proposed fast algorithmsis nearly the same as that of JM10.1.

7. Conclusion

This paper proposes a fast intra-mode decision algo-rithm based on fast edge detection method and rate-distortion estimation. There are two main contributions inthis paper: The first one is to propose an accurate bit-rateestimation model, and based on this model, a plenty ofentropy coding computation is saved. The second one is todesign a fast intra-mode RDO scheme based on the bit-rate estimation model and the edge classification intro-duced in [13]. Verified by the fast, mild, slow motionsequences, our method could reduce the computationalcomplexity by choosing the best mode judiciously.Average time reduction is over 80% in all I-framesequences and over 60% in IPPP sequences. Moreover,our algorithm can maintain the video quality withoutsignificant bit-rate loss. The fast algorithm can be appliedto real-time implementation of H.264 encoder in low-power applications of video coding.

References

[1] G. Bjontegaard, Calculation of average PSNR differences betweenRD-curves, 13th VCEG-M33 Meeting, April 2001.

[2] J. Byeungwoo, Fast mode decision to JVT, Joint Video Team(JVT)Docs. JVT-J033, December 2003.

[3] Q. Chen, Y. He, A fast bits estimation method for rate-distortionoptimization in H.264/AVC, in: Proceedings of the Picture CodingSymposium (PCS), 2004.

[4] C.-C. Cheng, T.-S. Chang, Fast three step intra-prediction algorithmfor 4� 4 blocks in H.264, in: Proceedings of the IEEE InternationalSymposium on Circuits and Systems (ISCAS), 2005.

[5] T. Chiang, Y.Q. Zhang, A new rate control scheme using quadraticrate distortion model, IEEE Trans. Circuits Syst. Video Technol. 7(February 1997) 246–250.

[6] J.-R. Corbera, S. Lei, A new rate control scheme using quadratic ratedistortion model, IEEE Trans. Circuits Syst. Video Technol. 9(February 1999) 172–185.

[7] Z. He, Y.K. Kim, S.K. Mitra, Low-delay rate control for dct videocoding via domain source modeling, IEEE Trans. Circuits Syst. VideoTechnol. 11 (August 2001) 928–940.

[8] Z. He, Y.K. Kim, S.K. Mitra, Optimum bit allocation and accurate ratecontrol for video coding via r-domain source modeling, IEEE Trans.Circuits Syst. Video Technol. 12 (October 2002) 840–849.

[9] Z. He, S.K. Mitra, A linear source model and a unified rate controlalgorithm for DCT video coding, IEEE Trans. Circuits Syst. VideoTechnol. 12 (November 2002) 970–982.

[10] H.264 Standard, ITU-T recommendation H.264 advanced videocoding for generic audiovisual services, 2005.

[11] JVT model JM10.1, downloaded from hhttp://iphome.hhi.de/suehring/tml/download/old_jm/i.

[12] C. Kim, H.-H. Shih, C.-C. Jay Kuo, Fast H.264 intra-prediction modeselection using joint spatial and transform domain features, J. Vis.Comm. Image Representation 17 (April 2006) 291–310.

[13] H. Li, K.N. Ngan, Fast and efficient method for block edgeclassification, in: Proceedings of the ACM IWCMC2006 Multimediaover Wireless, 2006.

[14] B. Meng, O.C. Au, Efficient intra-prediction mode selection for 4� 4blocks in H.264, in: Proceedings of the IEEE International Con-ference on Multimedia and Exposure (ICME), 2003.

[15] MPEG-4 Standard, Information Technology-Generic Coding ofAudio-Visual Objects Part2: Visual (in ISO/IEC, ISO/IEC 14496-2(MPEG-4 video)), 1999.

[16] F. Pan, X. Lin, Fast intra-mode decision algorithm for H264AVCvideo coding, in: Proceedings of the IEEE International Conferenceon Image Processing (ICIP), 2004.

[17] F. Pan, X. Lin, Fast mode decision algorithm for intraprediction inH.264/AVC video coding, IEEE Trans. Circuits Syst. Video Technol. 15(July 2005) 813–822.

[18] Y.K. Tu, J.F. Yang, M.T. Sun, Efficient rate–distortion estimationfor H264AVC coders, IEEE Trans. Circuits Syst. Video Technol. 16(May 2006) 600–611.

[19] C.-L. Yang, L.-M. Po, A fast H.264 intra-prediction algorithm usingmacroblock properties, in: Proceedings of the IEEE InternationalConference on Image Processing (ICIP), 2004.

[20] X. Yi, J. Zhang, Improved and simplified fast motion estimation forJM, Joint Video Team (JVT) Doc. JVT-J033, June 2005.

[21] A.C. Yu, K.N. Ngan, G.R. Martin, Efficient intra- and inter-modeselection algorithms for H.264/AVC, J. Vis. Comm. Image Represen-tation 17 (April 2006) 322–344.

http://iphome.hhi.de/suehring/tml/download/old_jm/

http://iphome.hhi.de/suehring/tml/download/old_jm/

an efficient intra mode selection algorithm for h.264 based on fast edge classification

Documents