Nayana Parashar, Multimedia Processing Lab, University of Texas at Arlington. Supervising Professor: Dr. K.R. Rao. November 25th, 2013. IMPLEMENTATION OF AN OUT-OF-THE-LOOP POST-PROCESSING TECHNIQUE FOR HEVC DECODED DEPTH-MAPS


Page 1: Title slide

Nayana Parashar
Multimedia Processing Lab, University of Texas at Arlington
Supervising Professor: Dr. K.R. Rao
November 25th, 2013

IMPLEMENTATION OF AN OUT-OF-THE-LOOP POST-PROCESSING TECHNIQUE FOR HEVC DECODED DEPTH-MAPS

Page 2: CONTENTS

1. BASIC CONCEPTS
2. VIDEO COMPRESSION
3. 3D VIDEO COMPRESSION
4. THESIS-WORK
5. RESULTS
6. CONCLUSIONS
7. FUTURE-WORK
8. REFERENCES

Page 3: THESIS IN A NUTSHELL

Normal procedure:
3D VIDEO ENCODING (color sequence & corresponding depth-map) → 3D VIDEO DECODING (color sequence & corresponding depth-map) → VIEW RENDERING for DISPLAY (stereoscopic or multi-view)

Thesis:
3D VIDEO ENCODING (color sequence & corresponding depth-map) → 3D VIDEO DECODING (color sequence & corresponding depth-map) → post-processing of the decoded depth-map → VIEW RENDERING for DISPLAY (stereoscopic or multi-view)

Motivation: compression artifact removal and better perceptual quality of rendered frames.

Page 4: BASIC CONCEPTS

Page 5: Image and video

Images and video make up visual media.
An image is characterized by pixels (pels), the smallest addressable elements in a display device.
Properties of an image: number of pixels (height and width), and the color and brightness of each pixel.
Video is composed of a sequence of pictures (frames) taken at regular time (temporal) intervals.

Figure 1: 2D image with spatial samples (L) and video with N frames (R) [1]

Page 6: 3D video – multi-view video plus depth format

The multi-view video plus depth (MVD) format [2][3] is the most promising format for enhanced 3D visual experiences.
This representation provides, for each viewpoint, a texture (image sequence) and an associated depth-map sequence (fig. 2).

Figure 2: Color video frame (L) and associated depth map frame (R) [4]

Page 7: Depth-maps

Depth maps represent the per-pixel depth of a corresponding color image and signal the disparity information needed by the virtual (novel) view rendering system.
For storage and transmission they are represented as a gray-scale image sequence.
In a depth map, each pixel conveys the relative distance from the camera to the object in 3D space.
Efficient compression and transmission of depth maps to the decoder is important for view generation.
Depth maps are never actually displayed; they are used for view generation purposes only.

Page 8: Depth Image Based Rendering (DIBR) [5]

DIBR is the process of synthesizing "virtual" views of a scene from still or moving images and associated per-pixel depth information.
It is a two-step process:
1. The original image points are reprojected into the 3D world, using the respective depth data.
2. The 3D space points are projected into the image plane of a "virtual" camera located at the required viewing position.

Stereoscopic view generation: two (left and right) views are generated.
Multiple view generation: more than two views are generated, each corresponding to the scene viewed from a different angle.
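The two-step process above can be sketched for the simple case of a pinhole camera whose virtual counterpart is translated along the baseline. The function name, the forward-warping strategy, and the camera parameters (f, cx, cy, tx) are illustrative assumptions, not the thesis implementation:

```python
import numpy as np

def dibr_warp(image, depth_m, f, cx, cy, tx):
    """Warp `image` to a virtual camera translated by `tx` metres along
    the baseline, using metric per-pixel depth `depth_m` (pinhole model).
    Step 1: reproject each pixel into 3D; step 2: project into the
    virtual camera. Forward warping; holes are left as zeros."""
    h, w = depth_m.shape
    out = np.zeros_like(image)
    ys, xs = np.mgrid[0:h, 0:w]
    # Step 1: back-project pixels to 3D world coordinates.
    Z = depth_m
    X = (xs - cx) * Z / f
    Y = (ys - cy) * Z / f   # kept for completeness of the 3D point
    # Step 2: project into the virtual camera's image plane.
    x_v = np.round(f * (X - tx) / Z + cx).astype(int)
    valid = (x_v >= 0) & (x_v < w)
    out[ys[valid], x_v[valid]] = image[ys[valid], xs[valid]]
    return out
```

With constant depth, every pixel shifts by the same amount (f·tx/Z), which is the degenerate case; real depth maps produce depth-dependent parallax.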

Page 9: Stereoscopic view rendering

A color image and its per-pixel depth map can be used to generate virtual stereoscopic views, as shown in fig. 3.
In this process, the original image point at location (x, y) is transferred to new locations (xL, y) and (xR, y) for the left and right view respectively.

Figure 3: Virtual view generation in the Depth Image Based Rendering (DIBR) process [6]
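In code, the horizontal relocation of each point (x, y) to (xL, y) and (xR, y) might look as follows. The linear depth-to-disparity mapping and the d_near/d_far disparity limits (in pixels) are simplifying assumptions; the exact mapping follows from xB, D, knear, kfar and Npix as in [5][6], and no hole filling is done here:

```python
import numpy as np

def render_stereo(color, depth8, d_near=10.0, d_far=-2.0):
    """Generate a left/right view pair by shifting each pixel of `color`
    horizontally according to its 8-bit depth value. The linear mapping
    from depth to pixel disparity is an illustrative assumption."""
    h, w = depth8.shape
    disparity = depth8 / 255.0 * (d_near - d_far) + d_far   # pixels
    left = np.zeros_like(color)
    right = np.zeros_like(color)
    ys, xs = np.mgrid[0:h, 0:w]
    xl = np.clip(np.round(xs + disparity / 2).astype(int), 0, w - 1)
    xr = np.clip(np.round(xs - disparity / 2).astype(int), 0, w - 1)
    left[ys, xl] = color[ys, xs]    # original point (x, y) -> (xL, y)
    right[ys, xr] = color[ys, xs]   # original point (x, y) -> (xR, y)
    return left, right
```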

Page 10: VIDEO COMPRESSION

Page 11: Introduction

Data compression: the science of representing information in a compact format.
Common image/video compression techniques reduce the number of bits required to represent an image or video sequence (lossy or lossless).
Video compression strategies: spatial, temporal and bit-stream redundancies are exploited, and high-frequency components are removed.
Many organizations have produced a number of video compression codecs over the years [1].
High Efficiency Video Coding (HEVC) is the most recent video compression standard.

Page 12: HEVC overview [13][14]

Successor to the H.264/AVC video compression standard.
Multiple goals:
- improved coding efficiency
- ease of transport system integration
- data loss resilience
- implementability on parallel processing architectures

The complexity of some key modules such as transforms, intra prediction, and motion compensation is higher in HEVC than in H.264/AVC; the complexity of modules such as entropy coding and deblocking is lower in HEVC than in H.264/AVC [15].

Page 13: HEVC encoder – block diagram

LEGEND:
- High-frequency content removal
- Spatial redundancy exploitation
- Temporal redundancy exploitation
- Bit-stream redundancy exploitation
- Sharp edge smoothing

Figure 4: HEVC encoder block-diagram [13]

Page 14: 3D VIDEO COMPRESSION

Page 15: The depth-map dilemma

Compression of depth-maps is a challenge.
The quantization process eliminates high spatial frequencies in individual frames.
The resulting compression artifacts have adverse consequences for the quality of the rendered views.
It is highly important to preserve the sharp depth discontinuities present in depth maps for high-quality virtual view generation.

Two solutions exist to this dilemma.

Page 16: The two approaches to 3D compression

Approach one: use novel video compression techniques designed for 3D video, with special features added to overcome the depth-map dilemma. E.g. 3D video coding in H.264/AVC [16], the 3D video extension of HEVC [17][18][19].
Advantages: features specific to 3D video are exploited (inter-view prediction); dedicated blocks for depth-map compression in the codec.
Disadvantages: very complex, both in general codec structure and in encoding time.

Approach two: use already existing codecs to encode and decode the sequences, then apply image denoising techniques [20] to the decoded depth-maps to remove compression artifacts.
Advantages: not as complicated and complex as approach one; uses existing video codecs without any modification.
Disadvantages: there is never one right denoising solution.

Page 17: THESIS-WORK

Page 18: Scope and premises

This thesis falls under the second approach to 3D video compression described above.
Little research has been done on applying image denoising techniques to HEVC decoded depth-maps.
A post-processing framework based on analysis of how compression artifacts affect the generation of virtual views is used.
The framework applies a spatial filtering technique, specifically depth discontinuity analysis followed by an edge-adaptive joint trilateral filter (EA-JTF) [6], to reduce compression artifacts.
It effectively reduces the compression artifacts in HEVC decoded depth-maps.
It improves the perceptual quality of rendered views without using a depth-map-specific video codec.

Page 19: Algorithm – block diagram

Original depth map → Encoder/Decoder → Compressed depth map
(a) Depth discontinuity analysis (inputs: compressed depth map, corresponding color image) → Binary mask
(b) Edge-adaptive joint trilateral filter (inputs: compressed depth map, corresponding color image, binary mask) → Reconstructed depth map

Figure 5: Block-diagram of the algorithm used for depth-map enhancement

Page 20: Step (a): Depth discontinuity analysis [6]

The purpose is twofold:
1) Identify the areas that have aligned edges in the color image and the corresponding depth map. The filter kernels of the EA-JTF are adaptively selected based on this information.
2) Identify all depth discontinuities that are significant in terms of rendering.

Sub-steps:
The depth map is convolved with a vertical Sobel filter to obtain Gx.
An edge mask Ed is derived using Eq. (1.1), which marks pixel locations of significant depth discontinuities:

Ed(x, y) = 1 if |Gx(x, y)| ≥ Δd,th, and 0 otherwise    (1.1)

where Δd,th is a theoretical threshold obtained by studying the effect of compression artifacts on view rendering. It depends on:
- xB: distance between the left and right virtual cameras, i.e. eye separation (assumed to be 6 cm)
- D: viewing distance (assumed to be 250 cm)
- knear and kfar: range of the depth information respectively behind and in front of the picture, relative to the screen width
- Npix: screen width measured in pixels
- 8-bit images are considered (hence the constant 255)
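A sketch of these sub-steps, with SciPy standing in for the MATLAB implementation; the numeric `threshold` argument is a stand-in for the rendering-derived Δd,th of Eq. (1.1):

```python
import numpy as np
from scipy.ndimage import convolve

def depth_edge_mask(depth, threshold):
    """Convolve the depth map with a vertical (x-gradient) Sobel kernel
    to obtain Gx, then mark pixels whose |Gx| reaches `threshold` as
    significant depth discontinuities (the binary mask Ed)."""
    sobel_x = np.array([[-1, 0, 1],
                        [-2, 0, 2],
                        [-1, 0, 1]], dtype=float)  # detects vertical edges
    gx = convolve(depth.astype(float), sobel_x, mode='nearest')
    return (np.abs(gx) >= threshold).astype(np.uint8)
```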

Page 21: Step (a): (contd.)

To identify the regions in which the color edges and depth discontinuities are aligned, an edge mask Ec of the color image is generated by the Canny edge detection algorithm. Using Ed and Ec, the binary mask Es signifying the aligned edge areas is obtained as:

Es = (Ed ⊕ S1) ∧ (Ec ⊕ S2)    (1.2)

where ⊕ represents morphological dilation and S1 and S2 represent flat square structuring elements of size 2 and 7 respectively.

The different stages of step (a) are shown in figure 6.
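Step (a) can then be completed with standard morphological operations; here scipy's binary_dilation stands in for the MATLAB code, and the pairing of the size-2 and size-7 structuring elements with Ed and Ec respectively is an assumption about [6]:

```python
import numpy as np
from scipy.ndimage import binary_dilation

def aligned_edge_mask(ed, ec, s1=2, s2=7):
    """Dilate the depth-edge mask Ed with a flat square structuring
    element S1 and the Canny color-edge mask Ec with S2, then intersect
    the results to get Es, the aligned-edge mask of Eq. (1.2)."""
    d1 = binary_dilation(ed.astype(bool), structure=np.ones((s1, s1)))
    d2 = binary_dilation(ec.astype(bool), structure=np.ones((s2, s2)))
    return (d1 & d2).astype(np.uint8)  # Es = 1 where edges are aligned
```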

Page 22:

Figure 6: Illustration of depth discontinuity analysis

Page 23: Step (b): Edge-adaptive joint trilateral filter

The edge-adaptive joint trilateral filter [6] is based on the bilateral filter and the joint trilateral filter [7][8][9][10][11][12].

For a pixel position p, the filtered result F is given by Eq. (2.1):

F_p = ( Σ_q w_pq · I_q ) / ( Σ_q w_pq )    (2.1)

where I_q is the value at pixel position q in the kernel neighborhood. The filter weight w_pq at pixel position q is calculated as:

w_pq = c(p, q) · s_t(I_p, I_q)    (2.2)

The similarity filter kernel s_t of the joint trilateral filter is adaptively selected as given in Eq. (2.3). For the areas where the edges between the color image and the corresponding depth map are aligned (i.e. Es from Eq. (1.2) = 1), two similarity filter kernels are used, one derived from the compressed depth map (s) and one from the color image (s_j). For the remaining area, only the similarity filter kernel derived from the compressed depth map is used:

s_t = s · s_j if Es(p) = 1, and s otherwise    (2.3)

Both c and s are popularly implemented as Gaussians centered at p and I_p (the value at pixel position p), with standard deviations σc and σs respectively:

c(p, q) = exp(−‖p − q‖² / (2σc²))    (2.4)

s(I_p, I_q) = exp(−(I_p − I_q)² / (2σs²))    (2.5)
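Eqs. (2.1)-(2.5) translate into a direct (and deliberately slow) per-pixel sketch. Inputs are assumed normalized to [0, 1], and the σ values follow the parameter table later in the deck; this is an illustrative re-implementation, not the thesis MATLAB code:

```python
import numpy as np

def ea_jtf(depth, color, es, size=15, sigma_c=45.0, sigma_d=0.036, sigma_col=0.025):
    """Edge-adaptive joint trilateral filter (Eqs. 2.1-2.5), brute force.
    depth/color are float images in [0, 1]; es is the binary mask of
    Eq. (1.2). Where es = 1 the similarity kernel is the product of the
    depth-derived and color-derived kernels; elsewhere only the
    depth-derived kernel is used."""
    h, w = depth.shape
    r = size // 2
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    closeness = np.exp(-(xs**2 + ys**2) / (2 * sigma_c**2))    # c(p,q), Eq. (2.4)
    dpad = np.pad(depth, r, mode='edge')
    cpad = np.pad(color, r, mode='edge')
    out = np.empty_like(depth)
    for y in range(h):
        for x in range(w):
            dwin = dpad[y:y + size, x:x + size]
            s = np.exp(-(dwin - depth[y, x])**2 / (2 * sigma_d**2))   # s, Eq. (2.5)
            if es[y, x]:                                              # Eq. (2.3)
                cwin = cpad[y:y + size, x:x + size]
                s = s * np.exp(-(cwin - color[y, x])**2 / (2 * sigma_col**2))  # s_j
            wgt = closeness * s                                       # w_pq, Eq. (2.2)
            out[y, x] = (wgt * dwin).sum() / wgt.sum()                # F_p, Eq. (2.1)
    return out
```

Note the sanity property: on a perfectly flat depth map the weighted average reproduces the input, since the normalization in Eq. (2.1) makes the weights sum to one.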

Page 24: Step (c): Stereoscopic view rendering

The reconstructed depth-map from step (b) is used to generate left-side and right-side views using the stereoscopic view rendering process [21][22][27].

Finally, the frames obtained using the uncompressed depth-map, the HEVC decoded depth-map, and the HEVC decoded depth-map after post-processing are compared using the metrics PSNR, SSIM [24] and an approximation of the Mean Opinion Score [25] for image quality.

Page 25: RESULTS

Page 26: Results: Experimental set-up

To evaluate the performance of the EA-JTF [6] on HEVC decoded depth maps, color sequences along with the corresponding depth maps were compressed using the HEVC reference software HM 9.2 [26]. MATLAB R2013a (student version) was used for filtering and rendering.

For all sequences other than Ballet, a single-frame result is obtained at QP = 32. For Ballet, a 15-frame sequence at a frame rate of 3 frames/sec is used.

Three different rendered images are obtained:
1) Original image and the corresponding depth map (original)
2) HEVC decoded image and the corresponding decoded depth-map (compressed)
3) HEVC decoded image and the depth-map after post-processing (post-processed)

PSNR, SSIM [24] and an approximate Mean Opinion Score (MOS) [25] were used to evaluate the perceptual quality of the rendered views.
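Of the three metrics, PSNR is simple enough to sketch inline (SSIM [24] and the MOS procedure [25] need more machinery); this helper is an illustrative stand-in for the MATLAB computation:

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference frame and a
    rendered test frame, assuming an 8-bit pixel range by default."""
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```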

Page 27: Results: Input parameters

Parameter | Value
Viewing distance (D) | 250 cm (assumed)
Eye separation (xB) | 6 cm (assumed)
Screen width in pixels (Npix) | 1366 (for the laptop used for experimentation)
knear, kfar | knear = 44.00, kfar = 120.00 (BreakDancer); knear = 42.00, kfar = 130.00 (Ballet); knear = 448.25, kfar = 11206.28 (Balloons); knear = 448.25, kfar = 11206.28 (Kendo)
Resolution of the video sequences | 1024 x 768
EA-JTF | Kernel size: 15 x 15 pixels; standard deviation for the color similarity filter (σs) = 0.025 (normalized range 0-1); standard deviation for the depth similarity filter (σj) = 0.036 (normalized range 0-1); standard deviation for the closeness filter (σc) = 45

Page 28: Results: Break-dancer sequence

Original sequence obtained from Microsoft Research [23].
An increase in both PSNR and SSIM is seen.
High-quality rendering, as the original depth-maps were generated using computer vision algorithms.
A grayscale version of the sequence was used for the approximate MOS calculation. Here too, the post-processed method had better ratings than the compressed one.

Metric | Decoded (left view) | Processed (left view) | Decoded (right view) | Processed (right view)
PSNR (dB) | 41.9401 | 41.9804 | 41.9401 | 41.9804
SSIM | 0.9133 | 0.9139 | 0.9133 | 0.9139

Image | MOS rating (max = 3)
Original | 2.6
Decoded | 1.5
Processed | 1.9

Page 29: Results: Ballet sequence

Original sequence obtained from Microsoft Research [23].
An increase in both PSNR and SSIM is seen.
High-quality rendering, as the original depth-maps were generated using computer vision algorithms.
Sequence not used for MOS calculation.

Metric | Decoded (left view) | Processed (left view) | Decoded (right view) | Processed (right view)
PSNR (dB) | 42.7317 | 42.787 | 42.7317 | 42.787
SSIM | 0.9413 | 0.9444 | 0.9413 | 0.9444

Page 30: Results: Kendo sequence

Original sequence obtained from [4].
A very interesting sequence: there is not much edge information, so the original, post-processed and compressed views are all extremely similar perceptually.
However, there is a slight decrease in PSNR, while SSIM turned out to be exactly equal.
On the other hand, in the MOS calculation the post-processed frame performed better than the compressed frame.

Metric | Decoded (left view) | Processed (left view) | Decoded (right view) | Processed (right view)
PSNR (dB) | 45.7213 | 45.0551 | 45.7213 | 45.0551
SSIM | 0.9887 | 0.9887 | 0.9887 | 0.9887

Image | MOS rating (max = 3)
Original | 2.2
Decoded | 1.7
Processed | 2.1

Page 31: Results: Balloons sequence

Original sequence obtained from [4].
The compressed views have better PSNR and SSIM than the processed ones.
This can be attributed to the fact that the views rendered from the original sequence are themselves not optimal, due to noise in the original depth.
The proposed solution improves the perceptual quality to a great extent.
In the MOS calculation, the post-processed frame performed better than the compressed frame.

Metric | Decoded (left view) | Processed (left view) | Decoded (right view) | Processed (right view)
PSNR (dB) | 44.2039 | 43.209 | 44.2039 | 43.209
SSIM | 0.981 | 0.9798 | 0.981 | 0.9798

Image | MOS rating (max = 3)
Original | 2.4
Decoded | 1.0
Processed | 2.5

Page 32: CONCLUSIONS

Page 33: Conclusions

The quality of rendered views (stereoscopic rendering) generated using HEVC decoded depth-maps was improved.
Four multi-view plus depth sequences were used to carry out experiments.
There was an improvement in PSNR as well as SSIM for two sequences, Break-dancer and Ballet. The Break-dancer sequence saw an improvement of 0.04 dB in PSNR and 0.0006 in SSIM; Ballet saw an improvement of 0.055 dB in PSNR and 0.0031 in SSIM. For the Kendo sequence PSNR decreased slightly while SSIM remained constant (not much edge information); for the Balloons sequence there was no improvement in either PSNR or SSIM.
However, the main improvement brought about by this method was in the perceptual quality of the rendered views. An approximate MOS survey suggested that the views rendered after post-processing were always perceptually better than the ones rendered without post-processing. In this regard, all four test sequences showed improvement in perceptual quality.

Page 34: FUTURE-WORK

Page 35: Future work

Improve the filter design to provide more significant gains.
Move beyond stereoscopic rendering to multi-view rendering.
The method could be made in-loop and merged into the HEVC compression codec.
The current work used SSIM and an approximation of the Mean Opinion Score to assess perceptual quality; more research into perceptual quality assessment for depth-maps and rendered views would be useful.

Page 36: REFERENCES

Page 37: References

1. K.R. Rao, D.N. Kim and J.J. Hwang, "Video coding standards: AVS China, H.264/MPEG4-Part 10, HEVC, VP6, DIRAC and VC-1", Springer, 2014.
2. D.K. Shah, et al., "Evaluating multi-view plus depth coding solutions for 3D video scenarios", 3DTV-Conference: The True Vision – Capture, Transmission and Display of 3D Video (3DTV-CON), pp. 1-4, 15-17 Oct. 2012.
3. Fraunhofer HHI, 3D video coding information: http://www.hhi.fraunhofer.de/fields-of-competence/image-processing/research-groups/image-video-coding/3d-hevc-extension.html
4. Balloons and Kendo test sequences: http://www.tanimoto.nuee.nagoya-u.ac.jp/~fukushima/mpegftv/
5. C. Fehn, "A 3D-TV system based on video plus depth information", Conference Record of the Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, vol. 2, pp. 1529-1533, 9-12 Nov. 2003.
6. D.V.S. De Silva, et al., "A depth map post-processing framework for 3D-TV systems based on compression artifact analysis", IEEE Journal of Selected Topics in Signal Processing, vol. PP, no. 99, pp. 1-30, 2011.
7. C. Tomasi and R. Manduchi, "Bilateral filtering for gray and color images", IEEE International Conference on Computer Vision, Washington DC, USA, pp. 839-846, 1998.
8. E. Eisemann and F. Durand, "Flash photography enhancement via intrinsic relighting", ACM Transactions on Graphics (TOG), vol. 23, no. 3, pp. 673-678, 2004.
9. G. Petschnigg, et al., "Digital photography with flash and no-flash image pairs", ACM Transactions on Graphics (TOG), vol. 23, no. 3, pp. 664-672, 2004.
10. B. Zhang and J. Allebach, "Adaptive bilateral filter for sharpness enhancement and noise removal", IEEE Transactions on Image Processing, vol. 17, no. 5, pp. 664-678, 2008.
11. P. Choudhury and J. Tumblin, "The trilateral filter for high contrast images and meshes", ACM SIGGRAPH 2005 Courses, ACM, 2005.
12. S. Liu, P. Lai, D. Tian, C. Gomila and C.W. Chen, "Joint trilateral filtering for depth map compression", Huangshan, China, 2010, paper 77440F.
13. G.J. Sullivan, J. Ohm, W.-J. Han and T. Wiegand, "Overview of the High Efficiency Video Coding (HEVC) standard", IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649-1668, Dec. 2012.
14. HEVC text specification draft 10: http://phenix.it-sudparis.eu/jct/doc_end_user/current_document.php?id=7243

Page 38: References (contd.)

15. F. Bossen, et al., "HEVC complexity and implementation analysis", IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1685-1696, Dec. 2012.
16. 3DV for H.264: http://mpeg.chiariglione.org/technologies/general/mp-3dv/index.htm
17. Fraunhofer HHI, 3D video coding information: http://www.hhi.fraunhofer.de/fields-of-competence/image-processing/research-groups/image-video-coding/3d-hevc-extension.html
18. P. Merkle, A. Smolic, K. Müller and T. Wiegand, "Multi-view video plus depth data representation and coding", Picture Coding Symposium, 2007.
19. "Test Model under Consideration for HEVC based 3D video coding", ISO/IEC JTC1/SC29/WG11 MPEG2011/N12559, San Jose, CA, USA, Feb. 2012.
20. M.C. Motwani, et al., "A survey of image denoising techniques", Proceedings of GSPx 2004, Santa Clara, CA: http://www.cse.unr.edu/~fredh/papers/conf/034-asoidt/paper.pdf
21. ISO/IEC JTC1/SC29/WG11, "Proposed experimental conditions for EE4 in MPEG 3DAV", WG 11 doc. m9016, Shanghai, Oct. 2002.
22. C. Fehn, "Depth-image-based rendering (DIBR), compression and transmission for a new approach on 3D-TV", Proceedings of the SPIE, vol. 5291, p. 93, 2004.
23. Break-Dancers and Ballet sequences: http://research.microsoft.com/en-us/um/people/sbkang/3dvideodownload/
24. Z. Wang, A.C. Bovik, H.R. Sheikh and E.P. Simoncelli, "Image quality assessment: from error visibility to structural similarity", IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004.
25. L. Ma, et al., "Image retargeting quality assessment: a study of subjective scores and objective metrics", IEEE Journal of Selected Topics in Signal Processing, vol. 6, no. 6, pp. 626-639, Oct. 2012.
26. HEVC reference software (HM 9.2): https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/branches/HM-9.2-dev/
27. MATLAB code for stereoscopic view rendering: http://www.mathworks.com/matlabcentral/fileexchange/27538-depth-image-based-stereoscopic-view-rendering

Page 39: THANK YOU!

QUESTIONS?