synthesized views distortion model based rate control in ......model for h.264 [3] and urq model...

Synthesized Views Distortion Model Based Rate Control

in 3D-HEVC

Songchao Tan1, Siwei Ma2, Shanshe Wang2, Wen Gao2

1 School of Computer Science and Technology, Dalian University of Technology, Dalian

116024, China 2 Institute of Digital Media, Peking University, Beijing 100871, China

Abstract. In this paper, we propose a synthesized views distortion model based

rate control algorithm for the high efficiency video coding (HEVC) based 3D

video compression standard. The major contributions of the paper include the

following two aspects. Firstly, we investigate the distortion dependency be-

tween the synthesized views and the coded views including texture video and

depth maps. Then we propose a synthesized views distortion model for 3D-

HEVC, and based on the distortion model an efficient joint bit allocation

scheme is proposed. Experimental results show that the proposed rate control

algorithm achieves better performance on both the coded texture views and syn-

thesized views. The maximum overall (including all coded texture views and all

synthesized views) performance improvement can be up to 14.4% and the aver-

age BD-rate gain is 6.9%. Moreover, it can accurately control the bitrate to sat-

isfy the total bitrate constraint.

Keywords: rate control, bit allocation, 3D video coding

1 Introduction.

The 3D extension of High Efficiency Video Coding [1] (3D-HEVC) was developed

by the Joint Collaborative Team on 3D Video Coding Extension Development (JCT-

3V) led by ISO/IEC MPEG and ITU-T VCEG. 3DV video coding aims at coding the

visual information of a 3D scene that usually contains multi-view texture data and its

corresponding depth information [2]. In 3D-HEVC, one view is selected as a base

view which is coded independently of the other views to provide backward compati-

bility to HEVC decoders. Other views (termed as dependent views) are coded with

inter-view prediction using the base view or other reference views to reduce the re-

dundancy between views.

As an important module of video encoder, rate control (RC) is employed to regu-

late the bit rate meanwhile to guarantee a good video quality. For each video coding

standard, rate control is always a hot research topic and many different rate control

schemes for different video coding standards have been proposed, such as quadratic

model for H.264 [3] and URQ model [4], R-lambda [5] and rate-GOP [6] for HEVC.

However for 3D-HEVC, rate control becomes more complicated. Different from the

traditional 2D video coding standards, 3D-HEVC utilizes the depth map to generate

Fig. 1. General framework of the 3D-HEVC [1] system

the synthesized virtual views. The coding quality of synthesized views depends on the

quality of texture video and depth maps [7]. As such, it is important to balance the bit

allocation for texture video and depth maps to get better quality of synthesized views.

And many other techniques are adopted to improve the coding performance. All these

techniques bring great challenges to establish an accurate rate distortion (R-D) model

and bit allocation scheme in rate control for 3D-HEVC.

In the literature, several rate control schemes for 3D-HEVC are proposed. Two

representative rate control methods, URQ and R-lambda have been proposed for 3D-

HEVC in [8] and [9], which are the extension of the methods in HEVC. In order to

get better performance, depth maps-based inter-views MAD prediction is proposed to

improve the prediction accuracy of the to-be-generated bits for the current unit. How-

ever, there is still large room for R-D performance improvement. In our previous

work [10], we proposed an adaptive rate control scheme for 3D-HEVC. The algo-

rithm performance including bit rate mismatch and R-D performance is significantly

improved compared to the above two algorithms.

In this paper, we further propose a novel rate control scheme based on the synthe-

sized views distortion model in 3D-HEVC. Firstly, we investigate the distortion de-

pendency between the synthesized views and the input texture video and depth maps,

and formulate a distortion model for synthesized views. Secondly, based on this mod-

el, the bit allocation scheme for texture video and depth maps is formulated as an

optimization problem.

The rest of this paper is organized as follows. In Section II, the R-D characteristics

of both the coded views and synthesized views are investigated. In Section II-A, the

R-D model for coded texture views is proposed. A view synthesis distortion model to

characterize the distortion dependency of the texture video and the depth maps on the

synthesized virtual views is investigated in Section II-B. In Section II-C, an effective

joint bit allocation based rate control scheme is designed for 3D-HEVC. In Section

III, the experimental results are given to demonstrate the efficiency of the proposed

RC algorithm. Finally, Section IV concludes this paper.

Fig. 2. The relationship between distortion and bit rate of coded texture views, ‘Newspaper_CC’.

2 Rate and Distortion Analysis in 3D-HEVC

As illustrated in Fig. 1, the texture videos are captured by synchronizing the multiple

camera arrays. The associated depth maps are also generated for virtual view synthe-

sis. At the encoder, texture video and depth maps are encoded using 3D-HEVC. At

the client side, the arbitrary virtual views are synthesized from the decoded texture

video and depth maps. Then the decoded texture video and synthesized views would

be presented for viewing at receiver side. Therefore, the quality of coded texture

views and the virtual synthesized views needs to be optimized as follows

min(𝐷𝑣 + 𝐷𝑐),

𝑠. 𝑡. 𝑅𝑑 + 𝑅𝑡 ≤ 𝑅𝑐 , (1)

where Rt and Rd are the bit rate of texture video and depth maps, respectively. Dc and

Dv are the distortion of texture video and synthesized views respectively.

In order to model the expression in (1), we need to investigate the rate and distor-

tion (R-D) relationship for coded texture views and synthesized views, respectively.

2.1 R-D Model for the coded texture views

To obtain the R-D characteristics of the coded views, we encode the original texture

video with 4 quantization parameters (QP) (25, 30, 35, and 40). As an example, the R-

D curve of the texture distortion and the bit rate of test sequence ‘Newspaper_CC’ are

illustrated in Fig. 2.

It can be observed that power functions can be used to fit the R-D points of texture

video well.

𝐷𝑡(𝑅𝑡) ≅ 𝛼𝑐𝑅𝑡−𝛽𝑐 , (2)

Where 𝐷𝑡 is the distortion of the coded texture views. 𝑅𝑡 is the bit rate of texture

views. 𝛼𝑐 and 𝛽𝑐 are model parameters.

10

20

30

40

0 500 1000 1500 2000

R-D curve

乘幂(R-D curve)

线性 (R-D curve)

多项式 (R-D curve)

MS

E

Bit rate (kpbs)

Power

Linear

Quadratic

R vs. D

Fig. 3. The R-D surface of synthesized views distortion and bit rate. ‘kendo’.

2.2 R-D Model for synthesized views

To find the best bit budget between the texture and depth, we also need to establish

the relationship of bit rate and the synthesized views distortion. We investigate the

synthesized views quality influence on the bit rate of texture video (Rt) and the bit rate

of depth map (Rd).

In Fig. 3, the quality influence of texture and depth on synthesized views is inves-

tigated by changing the texture quantization parameter QT from 5 to 45 meanwhile

fixing the depth quantization parameter QD at 24, 34, 39, 44 and 49 respectively. The

quality of synthesized views (Ds) is measured in term of MSE. As shown in Fig. 3,

once Rt / Rd is determined, the Ds - Rt / Ds - Rd relationship can be approximated as

power expression.

The Ds - Rt relationship as

𝐷𝑠(𝑅𝑡) = 𝛼𝑡𝑅𝑡−𝛽𝑡 . (3)

And the Ds - Rd relationship as

𝐷𝑠(𝑅𝑑) = 𝛼𝑑𝑅𝑑−𝛽𝑑 . (4)

Therefore from (3) to (4), we get the distortion model for the synthesized views as

follows,

𝐷𝑠 = 𝛼𝑡𝑅𝑡−𝛽𝑡 + 𝛼𝑑𝑅𝑑

−𝛽𝑑 , (5)

where 𝐷𝑠 is the distortion of synthesized views. 𝑅𝑡 and 𝑅𝑑 are the bits for texture and

depth. 𝛼𝑡, 𝛽𝑡, 𝛼𝑑 and 𝛽𝑑 are the model parameters.

2.3 A Joint Bit Allocation based RC Scheme for 3D-HEVC

Rate control for 3D-HEVC needs to solve the bit allocation on texture/depth level,

view level and frame level. The optimum bit allocation problem is to effectively dis-

MS

E

R Rt

d

tribute the bit budget between texture and depth so that the minimum views synthesis

and coded views distortion are achieved. Based on the proposed coded texture views

and synthesized views R-D model (5), we formulate the overall quality based opti-

mum bit allocation as

(𝑅𝑡𝑜𝑝𝑡

, 𝑅𝑑𝑜𝑝𝑡

) = arg min (𝛼𝑡𝑅𝑡−𝛽𝑡 + 𝛼𝑐𝑅𝑡

−𝛽𝑐 + 𝛼𝑑𝑅𝑑−𝛽𝑑),

s. 𝑡. 𝑅𝑡 + 𝑅𝑑 ≤ 𝑅𝑐 . (6)

ζ is used to represent the proportional relationship between Rt and Rd, defined as

ζ = 𝑅𝑡

𝑜𝑝𝑡

𝑅𝑑𝑜𝑝𝑡 . (7)

Therefore, from (6) and (7), we get the objective optimization function with only

one variable ζ, as shown below,

(ζ) = arg min (𝛼𝑡(ζ

1+ζ𝑅𝑐)−𝛽𝑡 + 𝛼𝑐(

ζ

1+ζ𝑅𝑐)−𝛽𝑐 + 𝛼𝑑(

1

1+ζ𝑅𝑐)−𝛽𝑑). (8)

Many optimization methods can be used to find the optimal solution of (8). In this

paper, Newton iterative method is used to get the approximate optimal value. The

target bit rate for the texture and depth can be expressed as follows

𝑅𝑡 = 𝑅𝑐 ∙ζ

1+ζ , (9)

𝑅𝑑 = 𝑅𝑐 ∙1

1+ζ . (10)

In order to estimate these parameters, we first encode the frames in the first GOP.

Then the model parameters are calculated by the least square error method.

Based on the optimal target bit rate for the texture and depth, the bit rate ratio be-

tween the different views can be further determined by the statistical analysis. In this

paper, we use anchor’s bits ratio between the base view and the dependent views to

allocate the bits for different views.

After allocating the target bit rate for texture/depth level and view level, the target

bit rate needs to be allocated for the different frames. The frame level bit allocation is

proposed in our previous work [7] as follows

, ,1

remain

n

n i remain

n i

R I frameR

R w others

(11)

1 1

,remain remain actual G

n n n n rest

bit rateR N R R N

framerate (12)

Fig. 4. The R-D curves for all views of the proposed RC scheme compared with anchor and R-lambda

‘Undo_Dancer’ and ‘GT_Fly’

where 𝑅𝑛,𝑖 is the target bits for ith frame in nth GOP. 𝑅𝑛𝑟𝑒𝑚𝑎𝑖𝑛 and 𝑅𝑛

𝑎𝑐𝑡𝑢𝑎𝑙 are the target

and actual bits in nth GOP. Nn is the numbers of n th GOP’s frames. 𝑁𝑟𝑒𝑠𝑡𝐺 is the num-

ber of the rest GOP which is not coded. 𝜙 is a proportion of the I frame in a GOP

which is recommended to be 0.4 and 0.25 respectively for the first and the rest GOPs

based on experiments. wi is the weight of the frames in RA hierarchical structure get-

ting from experience.

0.07 ( %8 0)

0.056 ( %8 4),

0.0454 ( % 4 2)

0.035

i

if POC

if POC

if POC

else

w

(13)

where POC denotes Picture Order Count and represents an output order of the pic-

tures in the video stream.

When overflow or underflow occurs, the difference between the target bits and the

actual bits in a GOP will be distributed to the rest GOPs averagely.

The trade-off between the output bit rate (R) and the quality (D) of the compressed

video are determined by the quantization step size (Qs), which is indexed by quantiza-

tion parameter (Q). The R-Qs and D-Qs model have been studied extensively for the

previous video coding standards such as H.264/AVC and HEVC. Here we use a linear

model which is proposed in our previous work [7] as follows

𝑅 = 𝛼 × 𝑋 𝑄𝑃⁄ , (14)

where α is the model parameter. R is the coding rate. QP is the quantization parame-

ter. X is the complexity estimation for the current picture which is computed as fol-

lowing.

11

1 1

0 0

( ) / ( ) ,n n

i i i i n n

i i

X w SAD w SAD R QP

(15)

29

31

33

35

37

39

41

0 2000 4000 6000 8000

Anchor

Proposed

R_lambda

PS

NR

(d

B)

Bit rate (kpbs)

32

34

36

38

40

42

0 1500 3000 4500 6000

Anchor

Proposed

R_lambda

PS

NR

(d

B)

Bit rate (kpbs)

Table 1. Proposed Algorithm R-D Performance Compared With Anchor And R-lambda

Anchor VS Proposed (BD-Rate) R-lambda VS Proposed (BD-Rate) Bits Error (%)

Texture

PSNR

/

Texture bits

Syn views

PSNR

/

Total bits

All views

PSNR

/

Total bits

Texture

PSNR

/

Texture bits

Syn views

PSNR

/

Total bits

All views

PSNR

/

Total bits

R-lambda Proposed

Balloons 2.2% 2.1% 2.0% -3.8% -5.3% -5.2% 1.47% 0.28%

Kendo 2.5% 2.3% 2.2% -4.3% -5.7% -5.6% 0.58% 0.37%

Newspaper_CC 3.8% 3.0% 3.0% -2.6% -3.4% -3.3% 2.60% 0.54%

GT_Fly 0.1% -0.8% -0.5% -8.4% -9.5% -9.8% 1.45% 1.46%

Pozan_Hall2 4.7% 3.3% 3.5% -12.0% -14.0% -14.4% 2.88% 0.62%

Poznan_Street 5.0% 3.3% 3.6% -4.1% -4.6% -4.8% 6.01% 3.88%

Undo_Dancer 4.9% 3.4% 3.6% -6.4% -7.7% -7.8% 0.98% 0.96%

Shark 3.5% 2.1% 2.5% -3.7% -4.7% -4.6% 1.43% 1.63%

1024*768 2.8% 2.5% 2.4% -3.6% -4.8% -4.7% 1.55% 0.40%

1920*1088 3.6% 2.3% 2.5% -6.9% -8.1% -8.3% 2.55% 1.71%

average 3.3% 2.4% 2.5% -5.7% -6.9% -6.9% 2.18% 1.22%

where n is the current frame number. QPn-1 is the quantization parameter of the (n-1)

th frame. Rn-1 is the actual bits of the (n-1) th frame. wi is defined as:

𝑤𝑖 = 0.5𝑛−𝑖 ∑ 0.5𝑛−𝑖𝑛𝑖=0⁄ (16)

3 Experimental Results

To evaluate the proposed 3D-HEVC rate control algorithm, the proposed algorithm is

integrated into the reference software HTM10.0. In order to evaluate the performance

of the proposed RC algorithm and R-lambda algorithm is utilized for comparison. We

have tested our algorithm on all of eight sequences defined in the CTCs (1024×768

and 1920×1088). Each sequence is composed of three views: the left, the center (cod-

ed first) and the right view. After coding, six synthesized views were rendered.

3.1 Control Accuracy

To evaluate the accuracy of the bit rate control, the following measurement is adopt-

ed.

𝐸𝑟𝑟𝑜𝑟 =|𝑅𝑎𝑐𝑡𝑢𝑎𝑙−𝑅𝑡𝑎𝑟𝑔𝑒𝑡|

𝑅𝑡𝑎𝑟𝑔𝑒𝑡× 100%, (17)

where Error is the bits error. Rtarget and Ractual are the number of target bits and the

actual output bits, respectively.

As illustrated in Table.1, it can be seen that the proposed RC algorithm achieves

smaller mismatch between target bits and actual output bits. That is because the frame

level bit allocation proposed in our previous work [10] is designed more suitable for

I-SLICE instead of relying on overflow/underflow handling strategy.

3.2 R-D Performance

In order to objectively evaluate the performance of the proposed RC algorithm, R-

lambda algorithm proposed in [9] is utilized for comparison. In [9], the target bit rate

of each texture video is set as corresponding bit rate in HTM anchor and depth maps

are coded with fixed QP as the same as anchor. In the proposed algorithm, the target

bit rate for all coded views’ bit rate (including three texture videos and three depth

maps) is set as anchor’s total bit rate.

As illustrated in Table. 1 and Fig. 4, we can see that the proposed algorithm shows

much better R-D performance than R-lambda for both coded texture views and syn-

thesized views. Based on the proposed synthesized views distortion model, the opti-

mal bit allocation for the texture and depth is achieved. The maximum performance

improvement for all views (including coded texture views and synthesized views) can

be up to 14.4% and the average BD-rate gain is 6.9%.

Furthermore, two R-D curves are shown in Fig. 4. It can be observed the proposed

RC algorithm shows much better R-D performance than R-lambda model for both

high bit rate and low bit rate.

4 Conclusions

This paper has presented a synthesized views distortion model based joint bit alloca-

tion and rate control method to achieve the best overall quality for 3D-HEVC. The

distortion dependency is investigated between the coded views and the synthesized

views. The proposed bit allocation method is classified into three levels, namely tex-

ture/depth level, view level and frame level. Experimental are conducted on different

video sequences and the results show that the proposed method can achieve much

better R-D performance than other algorithms for 3D-HEVC.

5 References

1. B. Bross, W.-J Han, G.J. Sullivan, J.-R. Ohm, T. Wiegand, High efficiency video coding

(HEVC) text specification draft 8, JCTVC-J1003, Stockholm, July2012.

2. P. Kauff, N. Atzpadin, C. Fehn, M. Müller, O. Schreer, A. Smolic, and R. Tanger, “Depth

Map Creation and Image Based Rendering for Advanced 3DTV Services Providing In-

teroperability and Scalability,” Signal Processing: Image Communication, Special Issue on

3DTV, pp. 217-234, Feb.

3. S. Ma, W. Gao, and Y. Lu, “Rate-distortion analysis for H.264/AVC video coding and its

application to rate control,” IEEE Trans. Cir-cuits Syst. Video Technol., vol. 15, no. 12,

pp. 1533–1544, Dec.2005.

4. H. Choi, J. Nam, J. Yoo, D. Sim, and I. V. Bajić, “Rate control based on unified RQ model

for HEVC,” JCT-VC of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, JCT-VC

H0213 (m23088), San José, CA, USA, Feb. 2012.

5. B. Li, H. Li, L. Li and J. Zhang, “lambda Domain Rate Control Algorithm for High Effi-

ciency Video Coding” IEEE Trans. on Image Processing, vol.23, no.9, pp.3841-3854,

Sep.2014

6. S. Wang, S. Ma, S. Wang, D. Zhao, and W. Gao, “Rate-GOP Based Rate Control for High

Efficiency Video Coding”, IEEE Journal of Selected Topics in Signal Processing, vol.7,

no.6, pp.1101-1111, Dec.2013.

7. S. Ma, S. Wang, and W. Gao, “Low Complexity Adaptive View Synthesis Optimization in

HEVC Based 3D Video Coding,” IEEE Transactions on Multimedia, vol.16, no.1, pp.266

– 271, Jan. 2014.

8. W. Lim, D. Sim and I. V. Bajić, “JCT3V – Improvement of the rate control for 3D multi-

view video coding,” ISO/IEC JTC1/SC29/WG11, JCT3V-C0090, Geneva, Switzerland,

Jan. 2013.

9. W. Lim, D. Sim and I. V. Bajić, “JCT3V –The rate control schemes for 3D multi-view

video coding,” ISO/IEC JTC1/SC29/WG11, JCT3V-D0111, Incheon, KR, Apr. 2013.

10. S. Tan, J. Si, S. Ma, S. Wang and W. Gao, “Adaptive Frame Level Rate Control in 3D-

HEVC”, Visual Communication and Image Processing, Malta, Dec.2014.

synthesized views distortion model based rate control in ......model for h.264 [3] and urq model...

Documents