Synthesized Views Distortion Model Based Rate Control
in 3D-HEVC
Songchao Tan1, Siwei Ma2, Shanshe Wang2, Wen Gao2
1 School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
2 Institute of Digital Media, Peking University, Beijing 100871, China
Abstract. In this paper, we propose a synthesized views distortion model based
rate control algorithm for the high efficiency video coding (HEVC) based 3D
video compression standard. The paper makes two major contributions. Firstly,
we investigate the distortion dependency between the synthesized views and the
coded views, including texture video and depth maps. Secondly, we propose a
synthesized views distortion model for 3D-HEVC and, based on this model, an
efficient joint bit allocation scheme. Experimental results show that the
proposed rate control algorithm achieves better performance on both the coded
texture views and the synthesized views. The maximum overall performance
improvement (including all coded texture views and all synthesized views) is
14.4% and the average BD-rate gain is 6.9%. Moreover, the algorithm accurately
controls the bit rate to satisfy the total bit rate constraint.
Keywords: rate control, bit allocation, 3D video coding
1 Introduction
The 3D extension of High Efficiency Video Coding [1] (3D-HEVC) was developed
by the Joint Collaborative Team on 3D Video Coding Extension Development
(JCT-3V) led by ISO/IEC MPEG and ITU-T VCEG. 3D video coding aims at coding the
visual information of a 3D scene, which usually contains multi-view texture data
and its corresponding depth information [2]. In 3D-HEVC, one view is selected as
the base view, which is coded independently of the other views to provide
backward compatibility with HEVC decoders. The other views (termed dependent
views) are coded with inter-view prediction from the base view or other
reference views to reduce the redundancy between views.
As an important module of a video encoder, rate control (RC) is employed to
regulate the bit rate while guaranteeing good video quality. Rate control has
been an active research topic for every video coding standard, and many schemes
have been proposed, such as the quadratic model for H.264 [3], and the URQ model
[4], R-lambda [5] and rate-GOP [6] for HEVC. For 3D-HEVC, however, rate control
becomes more complicated. Unlike traditional 2D video coding standards, 3D-HEVC
uses the depth maps to generate the synthesized virtual views. The coding
quality of the synthesized views depends on the quality of both texture video
and depth maps [7]. It is therefore important to balance the bit allocation
between texture video and depth maps to obtain better quality of the synthesized
views. Many other techniques are also adopted to improve the coding performance.
All these techniques make it challenging to establish an accurate
rate-distortion (R-D) model and bit allocation scheme for rate control in
3D-HEVC.
Fig. 1. General framework of the 3D-HEVC [1] system
In the literature, several rate control schemes for 3D-HEVC have been proposed.
Two representative rate control methods, URQ and R-lambda, have been proposed
for 3D-HEVC in [8] and [9] as extensions of the corresponding HEVC methods. To
get better performance, depth map based inter-view MAD prediction is used to
improve the prediction accuracy of the bits to be generated for the current
unit. However, there is still considerable room for R-D performance improvement.
In our previous work [10], we proposed an adaptive rate control scheme for
3D-HEVC, which significantly improves both the bit rate mismatch and the R-D
performance compared with the above two algorithms.
In this paper, we further propose a novel rate control scheme based on a
synthesized views distortion model for 3D-HEVC. Firstly, we investigate the
distortion dependency between the synthesized views and the input texture video
and depth maps, and formulate a distortion model for the synthesized views.
Secondly, based on this model, the bit allocation between texture video and
depth maps is formulated as an optimization problem.
The rest of this paper is organized as follows. In Section 2, the R-D
characteristics of both the coded views and the synthesized views are
investigated. In Section 2.1, the R-D model for the coded texture views is
proposed. A view synthesis distortion model that characterizes how the
distortion of the synthesized virtual views depends on the texture video and the
depth maps is investigated in Section 2.2. In Section 2.3, an effective joint
bit allocation based rate control scheme is designed for 3D-HEVC. In Section 3,
experimental results are given to demonstrate the efficiency of the proposed RC
algorithm. Finally, Section 4 concludes the paper.
Fig. 2. The relationship between distortion and bit rate of coded texture views, "Newspaper_CC".
2 Rate and Distortion Analysis in 3D-HEVC
As illustrated in Fig. 1, the texture videos are captured by synchronized
multi-camera arrays. The associated depth maps are also generated for virtual
view synthesis. At the encoder, texture video and depth maps are encoded using
3D-HEVC. At the client side, arbitrary virtual views are synthesized from the
decoded texture video and depth maps. The decoded texture video and the
synthesized views are then presented for viewing at the receiver side.
Therefore, the quality of the coded texture views and the virtual synthesized
views needs to be optimized as follows
$$\min (D_v + D_c), \quad \text{s.t.} \quad R_d + R_t \le R_c, \tag{1}$$
where $R_t$ and $R_d$ are the bit rates of texture video and depth maps,
respectively, $R_c$ is the total bit rate constraint, and $D_c$ and $D_v$ are
the distortions of texture video and synthesized views, respectively.
In order to model the expression in (1), we need to investigate the
rate-distortion (R-D) relationship for the coded texture views and the
synthesized views, respectively.
2.1 R-D Model for the coded texture views
To obtain the R-D characteristics of the coded views, we encode the original
texture video with four quantization parameters (QP): 25, 30, 35, and 40. As an
example, the R-D curve of the texture distortion versus the bit rate for the
test sequence "Newspaper_CC" is illustrated in Fig. 2.
It can be observed that a power function fits the R-D points of texture video
well:
$$D_t(R_t) \approx \alpha_c R_t^{-\beta_c}, \tag{2}$$
where $D_t$ is the distortion of the coded texture views (the term $D_c$ in
(1)), $R_t$ is the bit rate of the texture views, and $\alpha_c$ and $\beta_c$
are model parameters.
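As an illustration of how the parameters of a power-law R-D model such as (2) can be estimated, the following minimal sketch fits alpha and beta by ordinary least squares in the log-log domain, where the model becomes a straight line. The function name `fit_power_law` and the sample points are hypothetical; the points are generated from chosen parameters and are not measurements from the paper. Fitting in the log domain is one common choice and is an assumption here, since the paper only states that a least squares method is used.

```python
import math


def fit_power_law(rates, distortions):
    """Fit D = alpha * R**(-beta) by linear least squares on (log R, log D)."""
    xs = [math.log(r) for r in rates]
    ys = [math.log(d) for d in distortions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals -beta
    intercept = mean_y - slope * mean_x  # equals log(alpha)
    return math.exp(intercept), -slope


# Synthetic R-D points generated from alpha = 5000, beta = 0.9 (illustrative only).
rates = [200.0, 500.0, 1000.0, 2000.0]
distortions = [5000.0 * r ** -0.9 for r in rates]
alpha, beta = fit_power_law(rates, distortions)
```

With four operating points, as in the four-QP experiment described above, the fit has two degrees of freedom left, which gives a rough check on how well the power law describes the data.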
[Fig. 2 plot: R-D curve (R vs. D) with power, linear, and quadratic trend lines. Horizontal axis: bit rate (kbps), 0 to 2000. Vertical axis: MSE, 10 to 40.]
Fig. 3. The R-D surface of synthesized views distortion and bit rate, "Kendo".
2.2 R-D Model for synthesized views
To find the best split of the bit budget between texture and depth, we also need
to establish the relationship between the bit rate and the synthesized views
distortion. We investigate how the quality of the synthesized views depends on
the bit rate of texture video (Rt) and the bit rate of depth maps (Rd).
In Fig. 3, the influence of texture and depth quality on the synthesized views
is investigated by varying the texture quantization parameter QT from 5 to 45
while fixing the depth quantization parameter QD at 24, 34, 39, 44 and 49,
respectively. The quality of the synthesized views (Ds) is measured in terms of
MSE. As shown in Fig. 3, once one of Rt and Rd is fixed, the relationship
between Ds and the other rate can be approximated by a power function.
The Ds - Rt relationship is
$$D_s(R_t) = \alpha_t R_t^{-\beta_t}, \tag{3}$$
and the Ds - Rd relationship is
$$D_s(R_d) = \alpha_d R_d^{-\beta_d}. \tag{4}$$
Therefore, from (3) and (4), we get the distortion model for the synthesized
views as follows,
$$D_s = \alpha_t R_t^{-\beta_t} + \alpha_d R_d^{-\beta_d}, \tag{5}$$
where $D_s$ is the distortion of the synthesized views, $R_t$ and $R_d$ are the
bits for texture and depth, and $\alpha_t$, $\beta_t$, $\alpha_d$ and $\beta_d$
are the model parameters.
2.3 A Joint Bit Allocation based RC Scheme for 3D-HEVC
[Fig. 3 plot axes: MSE versus Rt and Rd.]
Rate control for 3D-HEVC needs to solve the bit allocation at the texture/depth
level, the view level and the frame level. The optimum bit allocation problem is
to distribute the bit budget between texture and depth so that the minimum total
distortion of the synthesized views and the coded views is achieved. Based on
the proposed R-D models for the coded texture views (2) and the synthesized
views (5), we formulate the overall quality based optimum bit allocation as
$$(R_t^{opt}, R_d^{opt}) = \arg\min \left( \alpha_t R_t^{-\beta_t} + \alpha_c R_t^{-\beta_c} + \alpha_d R_d^{-\beta_d} \right), \quad \text{s.t.} \quad R_t + R_d \le R_c. \tag{6}$$
ζ is used to represent the proportional relationship between Rt and Rd, defined
as
$$\zeta = \frac{R_t^{opt}}{R_d^{opt}}. \tag{7}$$
Therefore, from (6) and (7), we get the objective function with only one
variable ζ, as shown below,
$$\zeta = \arg\min_{\zeta} \left( \alpha_t \left( \frac{\zeta}{1+\zeta} R_c \right)^{-\beta_t} + \alpha_c \left( \frac{\zeta}{1+\zeta} R_c \right)^{-\beta_c} + \alpha_d \left( \frac{1}{1+\zeta} R_c \right)^{-\beta_d} \right). \tag{8}$$
Many optimization methods can be used to find the optimal solution of (8). In
this paper, Newton's iterative method is used to get an approximate optimal
value. Given the total target bit rate $R_c$, the target bit rates for texture
and depth can then be expressed as follows
$$R_t = R_c \cdot \frac{\zeta}{1+\zeta}, \tag{9}$$
$$R_d = R_c \cdot \frac{1}{1+\zeta}. \tag{10}$$
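The one-variable minimization in (8) and the split in (9) and (10) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it replaces the Newton iteration mentioned above with a golden-section search over the texture share p = ζ/(1+ζ), which is valid because each power-law term with positive parameters is convex in p. The names `total_distortion` and `allocate_bits`, the symbol names for the model parameters, and the parameter values in the example call are all hypothetical.

```python
import math


def total_distortion(p, r_c, a_t, b_t, a_c, b_c, a_d, b_d):
    """Objective of (8), written in terms of the texture share p = zeta / (1 + zeta)."""
    r_t = p * r_c          # bits for texture, as in (9)
    r_d = (1.0 - p) * r_c  # bits for depth, as in (10)
    return a_t * r_t ** -b_t + a_c * r_t ** -b_c + a_d * r_d ** -b_d


def allocate_bits(r_c, a_t, b_t, a_c, b_c, a_d, b_d, tol=1e-10):
    """Golden-section search for the texture share that minimizes the objective."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    params = (r_c, a_t, b_t, a_c, b_c, a_d, b_d)
    lo, hi = 1e-6, 1.0 - 1e-6
    x1 = hi - inv_phi * (hi - lo)
    x2 = lo + inv_phi * (hi - lo)
    f1 = total_distortion(x1, *params)
    f2 = total_distortion(x2, *params)
    while hi - lo > tol:
        if f1 < f2:  # the minimum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - inv_phi * (hi - lo)
            f1 = total_distortion(x1, *params)
        else:        # the minimum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + inv_phi * (hi - lo)
            f2 = total_distortion(x2, *params)
    p = 0.5 * (lo + hi)
    zeta = p / (1.0 - p)
    return zeta, r_c * zeta / (1.0 + zeta), r_c / (1.0 + zeta)


# Hypothetical parameters chosen so that texture and depth terms are symmetric.
zeta, r_t, r_d = allocate_bits(3000.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0)
```

Searching over p rather than ζ keeps the variable in a bounded interval, so no initial guess is needed. A Newton iteration on the derivative of the same objective would typically converge in fewer steps when started close to the optimum.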
To estimate the model parameters, we first encode the frames in the first GOP.
The model parameters are then calculated by the least squares method.
Based on the optimal target bit rates for texture and depth, the bit rate ratio
between the different views can be further determined by statistical analysis.
In this paper, we use the anchor's bit ratio between the base view and the
dependent views to allocate the bits among views.
Fig. 4. The R-D curves for all views of the proposed RC scheme compared with anchor and R-lambda, "Undo_Dancer" and "GT_Fly".
After the target bit rate has been allocated at the texture/depth level and the
view level, it needs to be allocated to the individual frames. The frame level
bit allocation proposed in our previous work [10] is as follows
$$R_{n,i} = \begin{cases} R_n^{remain} \times \eta, & \text{I frame} \\ R_n^{remain} \times w_i, & \text{others} \end{cases} \tag{11}$$
$$R_n^{remain} = \frac{bitrate}{framerate} \times N_n + \frac{R_{n-1}^{remain} - R_{n-1}^{actual}}{N_{rest}^{G}}, \tag{12}$$
where $R_{n,i}$ is the target bits for the i-th frame in the n-th GOP,
$R_n^{remain}$ and $R_n^{actual}$ are the target and actual bits of the n-th
GOP, $N_n$ is the number of frames in the n-th GOP, and $N_{rest}^{G}$ is the
number of remaining GOPs that have not been coded. $\eta$ is the proportion of
the I frame in a GOP, which is recommended to be 0.4 for the first GOP and 0.25
for the remaining GOPs based on experiments. $w_i$ is the weight of a frame in
the RA hierarchical structure, obtained empirically:
$$w_i = \begin{cases} 0.07, & \text{if } POC \,\%\, 8 = 0 \\ 0.056, & \text{if } POC \,\%\, 8 = 4 \\ 0.0454, & \text{if } POC \,\%\, 4 = 2 \\ 0.035, & \text{else} \end{cases} \tag{13}$$
where POC denotes the Picture Order Count, which represents the output order of
the pictures in the video stream.
When overflow or underflow occurs, the difference between the target bits and
the actual bits of a GOP is distributed evenly over the remaining GOPs.
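The POC-dependent weights in (13) amount to a small lookup over the random access hierarchy. The sketch below only encodes that table and sums the weights over one GOP of eight pictures. The function name `frame_weight` and the choice of POC 1 to 8 as the example GOP are assumptions made for illustration.

```python
def frame_weight(poc):
    """Empirical frame weight w_i from (13), selected by Picture Order Count."""
    if poc % 8 == 0:
        return 0.07    # lowest temporal layer
    if poc % 8 == 4:
        return 0.056
    if poc % 4 == 2:
        return 0.0454
    return 0.035       # odd POCs, highest temporal layer


# Weights for one hypothetical GOP covering POC 1..8.
gop_weights = [frame_weight(poc) for poc in range(1, 9)]
gop_weight_sum = sum(gop_weights)
```

The order of the tests matters: POC values that are multiples of 8 must be caught before the modulo-4 test, which is why the checks run from the coarsest layer to the finest.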
The trade-off between the output bit rate (R) and the quality (D) of the
compressed video is determined by the quantization step size (Qs), which is
indexed by the quantization parameter (QP). The R-Qs and D-Qs models have been
studied extensively for previous video coding standards such as H.264/AVC and
HEVC. Here we use a linear model which is proposed in our previous work [7] as
follows
$$R = \alpha \times X / QP, \tag{14}$$
where α is the model parameter, R is the coding rate, QP is the quantization
parameter, and X is the complexity estimate for the current picture, which is
computed as follows.
$$X = \left( \sum_{i=0}^{n} w_i \times SAD_i \right) / \left( \sum_{i=0}^{n-1} w_i \times SAD_i \right) \times R_{n-1} \times QP_{n-1}, \tag{15}$$
[Fig. 4 plots, two panels: PSNR (dB) versus bit rate (kbps) for Anchor, Proposed, and R_lambda. Horizontal axes: 0 to 8000 kbps and 0 to 6000 kbps.]
Table 1. Proposed Algorithm R-D Performance Compared With Anchor And R-lambda
BD-rate columns: Texture = texture PSNR / texture bits; Syn = synthesized views PSNR / total bits; All = all views PSNR / total bits.

Sequence        Anchor vs. Proposed (BD-rate)    R-lambda vs. Proposed (BD-rate)  Bits error (%)
                Texture    Syn        All        Texture    Syn        All        R-lambda   Proposed
Balloons        2.2%       2.1%       2.0%       -3.8%      -5.3%      -5.2%      1.47%      0.28%
Kendo           2.5%       2.3%       2.2%       -4.3%      -5.7%      -5.6%      0.58%      0.37%
Newspaper_CC    3.8%       3.0%       3.0%       -2.6%      -3.4%      -3.3%      2.60%      0.54%
GT_Fly          0.1%       -0.8%      -0.5%      -8.4%      -9.5%      -9.8%      1.45%      1.46%
Poznan_Hall2    4.7%       3.3%       3.5%       -12.0%     -14.0%     -14.4%     2.88%      0.62%
Poznan_Street   5.0%       3.3%       3.6%       -4.1%      -4.6%      -4.8%      6.01%      3.88%
Undo_Dancer     4.9%       3.4%       3.6%       -6.4%      -7.7%      -7.8%      0.98%      0.96%
Shark           3.5%       2.1%       2.5%       -3.7%      -4.7%      -4.6%      1.43%      1.63%
1024×768        2.8%       2.5%       2.4%       -3.6%      -4.8%      -4.7%      1.55%      0.40%
1920×1088       3.6%       2.3%       2.5%       -6.9%      -8.1%      -8.3%      2.55%      1.71%
Average         3.3%       2.4%       2.5%       -5.7%      -6.9%      -6.9%      2.18%      1.22%
In (15), n is the current frame number, $QP_{n-1}$ is the quantization parameter
of the (n-1)-th frame, $R_{n-1}$ is the actual bits of the (n-1)-th frame, and
$w_i$ is defined as
$$w_i = 0.5^{n-i} \Big/ \sum_{j=0}^{n} 0.5^{n-j}. \tag{16}$$
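The weights in (16) form a normalized, exponentially decaying window over the frames coded so far, with the most recent frame weighted most heavily. A minimal sketch, where the function name `sad_weights` is hypothetical:

```python
def sad_weights(n):
    """Weights w_0..w_n from (16): 0.5**(n - i), normalized to sum to one."""
    raw = [0.5 ** (n - i) for i in range(n + 1)]
    total = sum(raw)
    return [value / total for value in raw]


# Weights when the current frame number is n = 2.
weights = sad_weights(2)
```

For n = 2 the unnormalized values are 0.25, 0.5 and 1, so the weights are 1/7, 2/7 and 4/7: each older frame counts half as much as the one after it.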
3 Experimental Results
To evaluate the proposed 3D-HEVC rate control algorithm, it is integrated into
the reference software HTM10.0, and the R-lambda algorithm is used for
comparison. We have tested our algorithm on all eight sequences defined in the
common test conditions (CTC), at resolutions 1024×768 and 1920×1088. Each
sequence is composed of three views: the left, the center (coded first) and the
right view. After coding, six synthesized views are rendered.
3.1 Control Accuracy
To evaluate the accuracy of the bit rate control, the following measure is
adopted,
$$Error = \frac{|R_{actual} - R_{target}|}{R_{target}} \times 100\%, \tag{17}$$
where Error is the bits error, and $R_{target}$ and $R_{actual}$ are the number
of target bits and actual output bits, respectively.
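The measure in (17) is a relative absolute deviation expressed in percent. A minimal sketch, where the function name `bits_error` and the sample bit counts are hypothetical and not results from the paper:

```python
def bits_error(r_actual, r_target):
    """Bits error of (17): relative mismatch between actual and target bits, in percent."""
    return abs(r_actual - r_target) / r_target * 100.0


# Hypothetical example: 990 kbits produced against a 1000 kbit target.
error_percent = bits_error(990.0, 1000.0)
```

Because of the absolute value, overshoot and undershoot of the same size give the same error, so the measure reports accuracy but not the direction of the mismatch.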
As shown in Table 1, the proposed RC algorithm achieves a smaller mismatch
between target bits and actual output bits: the average bits error is 1.22%,
compared with 2.18% for R-lambda. This is because the frame level bit allocation
proposed in our previous work [10] is designed to suit I slices better, instead
of relying on an overflow/underflow handling strategy.
3.2 R-D Performance
In order to objectively evaluate the performance of the proposed RC algorithm,
the R-lambda algorithm proposed in [9] is used for comparison. In [9], the
target bit rate of each texture video is set to the corresponding bit rate of
the HTM anchor, and the depth maps are coded with the same fixed QP as the
anchor. In the proposed algorithm, the target bit rate for all coded views
(including three texture videos and three depth maps) is set to the anchor's
total bit rate.
As illustrated in Table 1 and Fig. 4, the proposed algorithm shows better R-D
performance than R-lambda for both the coded texture views and the synthesized
views. Based on the proposed synthesized views distortion model, the optimal bit
allocation between texture and depth is achieved. Compared with R-lambda, the
maximum BD-rate gain for all views (including coded texture views and
synthesized views) is 14.4% and the average BD-rate gain is 6.9%.
Furthermore, the two R-D curves in Fig. 4 show that the proposed RC algorithm
outperforms the R-lambda model at both high and low bit rates.
4 Conclusions
This paper has presented a synthesized views distortion model based joint bit
allocation and rate control method that aims at the best overall quality for
3D-HEVC. The distortion dependency between the coded views and the synthesized
views is investigated. The proposed bit allocation operates at three levels,
namely the texture/depth level, the view level and the frame level. Experiments
are conducted on different video sequences, and the results show that the
proposed method achieves better R-D performance than R-lambda for 3D-HEVC, with
an average BD-rate gain of 6.9% over all views.
5 References
1. B. Bross, W.-J. Han, G. J. Sullivan, J.-R. Ohm, T. Wiegand, High efficiency video coding
(HEVC) text specification draft 8, JCTVC-J1003, Stockholm, July 2012.
2. P. Kauff, N. Atzpadin, C. Fehn, M. Müller, O. Schreer, A. Smolic, and R. Tanger, "Depth
Map Creation and Image Based Rendering for Advanced 3DTV Services Providing
Interoperability and Scalability," Signal Processing: Image Communication, Special Issue on
3DTV, pp. 217-234, Feb.
3. S. Ma, W. Gao, and Y. Lu, "Rate-distortion analysis for H.264/AVC video coding and its
application to rate control," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 12,
pp. 1533-1544, Dec. 2005.
4. H. Choi, J. Nam, J. Yoo, D. Sim, and I. V. Bajić, "Rate control based on unified RQ model
for HEVC," JCT-VC of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, JCT-VC
H0213 (m23088), San José, CA, USA, Feb. 2012.
5. B. Li, H. Li, L. Li, and J. Zhang, "lambda Domain Rate Control Algorithm for High
Efficiency Video Coding," IEEE Trans. on Image Processing, vol. 23, no. 9, pp. 3841-3854,
Sep. 2014.
6. S. Wang, S. Ma, S. Wang, D. Zhao, and W. Gao, "Rate-GOP Based Rate Control for High
Efficiency Video Coding," IEEE Journal of Selected Topics in Signal Processing, vol. 7,
no. 6, pp. 1101-1111, Dec. 2013.
7. S. Ma, S. Wang, and W. Gao, "Low Complexity Adaptive View Synthesis Optimization in
HEVC Based 3D Video Coding," IEEE Transactions on Multimedia, vol. 16, no. 1,
pp. 266-271, Jan. 2014.
8. W. Lim, D. Sim, and I. V. Bajić, "JCT3V - Improvement of the rate control for 3D
multi-view video coding," ISO/IEC JTC1/SC29/WG11, JCT3V-C0090, Geneva, Switzerland,
Jan. 2013.
9. W. Lim, D. Sim, and I. V. Bajić, "JCT3V - The rate control schemes for 3D multi-view
video coding," ISO/IEC JTC1/SC29/WG11, JCT3V-D0111, Incheon, KR, Apr. 2013.
10. S. Tan, J. Si, S. Ma, S. Wang, and W. Gao, "Adaptive Frame Level Rate Control in
3D-HEVC," Visual Communication and Image Processing, Malta, Dec. 2014.