Implementation of H.264 using JM and Intel IPP
Interim Report for EE 5359 - Multimedia Processing, Spring 2011.
Instructor: Dr. K.R. Rao
Id:1000659642
Abstract:
The main goal of this project is to implement the different profiles of H.264 [1] and [3] using the
JM reference software and Intel IPP. The implementation is run on various test sequences in
different formats such as CIF (Common Intermediate Format), QCIF (Quarter Common Intermediate
Format) and SD/HD. Comparison is based on metrics such as MSE (Mean Square Error), PSNR (Peak
Signal-to-Noise Ratio), SSIM (Structural Similarity Index Metric), encoding time, decoding
time and the compression ratio achieved by the encoded H.264 file.
Overview of H.264:
H.264/AVC is the newest international video coding standard. It is also known as MPEG-4 Part
10, or MPEG-4 AVC (advanced video coding) [10]. It is the latest block-oriented motion-
compensation-based video standard developed by the ITU-T video coding experts group
(VCEG) together with the ISO/IEC moving picture experts group (MPEG), and it was the
product of a partnership effort known as the joint video team (JVT) [7].
Encoder and decoder block diagrams are shown in Figures 1 and 2 respectively [3]. The H.264/AVC
standard was developed to provide good video quality at substantially lower bit rates than previous
standards such as MPEG-2, H.263 or MPEG-4 Part 2, without an impractical increase in design
complexity [8] and [11]. The standard provides integrated support for transmission or storage,
including a packetized compressed format and features that help to minimize the effect of
transmission errors.
Different Profiles in H.264:
H.264 standard defines numerous profiles, as listed below.
Constrained Baseline Profile (CBP): Primarily for low-cost applications, this profile is used
widely in videoconferencing and mobile applications. It corresponds to the subset of features that
are common between the Baseline, Main, and High Profiles.
Baseline Profile (BP): Primarily for low-cost applications that require additional error
robustness, this profile is rarely used in videoconferencing and mobile applications. It adds
error resilience tools to the Constrained Baseline Profile. The importance of the
Baseline Profile has faded since the Constrained Baseline Profile was defined.
Figure 1. H.264/AVC encoder block diagram for a macroblock [3].
Figure 2. H.264/MPEG-4 AVC decoder block diagram [3].
Main Profile (MP): This was originally intended as the mainstream consumer profile for
broadcast and storage applications. The importance of this profile faded when the High profile
was developed for these applications.
Extended Profile (XP): This was intended as the streaming video profile. This profile has
relatively high compression capability. It has some extra tricks for robustness to data losses and
server stream switching.
High Profile (HiP): This is the primary profile for broadcast and disc storage applications,
particularly for high-definition television applications. This is the profile adopted into HD DVD
and Blu-ray Disc. There are four High Profiles (Fidelity range extensions). They are:
High 10 Profile (Hi10P): Going beyond today's mainstream consumer product capabilities,
this profile builds on top of the High Profile, adding support for up to 10 bits per sample of
decoded picture precision.
High 4:2:2 Profile (Hi422P): This profile primarily targets professional applications that use
interlaced video. It builds on top of the High 10 Profile, adding support for the 4:2:2 chroma
subsampling format while using up to 10 bits per sample of decoded picture precision.
High 4:4:4 Predictive Profile (Hi444PP): This profile builds on top of the High 4:2:2
Profile, supporting up to 4:4:4 chroma sampling, up to 14 bits per sample, and additionally
supporting efficient lossless region coding and the coding of each picture as three separate
color planes.
CAVLC 4:4:4 intra profile: The High 4:4:4 Profile constrained to all-Intra use and to
CAVLC entropy coding (i.e., not supporting CABAC).
As a result of the Scalable Video Coding (SVC) extension, the standard contains three additional
scalable profiles, which are defined as a combination of an H.264/AVC profile for the base layer
and tools that achieve the scalable extension [15]:
Scalable baseline profile: Primarily targeting video conferencing, mobile, and
surveillance applications, this profile builds on top of a constrained version of the
H.264/AVC Baseline profile to which the base layer (a subset of the bitstream) must
conform. For the scalability tools, a subset of the available tools is enabled.
Scalable high profile: Primarily targeting broadcast and streaming applications, this
profile builds on top of the H.264/AVC High Profile to which the base layer must
conform.
Scalable high intra profile: Primarily targeting production applications, this profile is
the Scalable High Profile constrained to all-Intra use.
As a result of the Multiview Video Coding (MVC) extension, the standard contains two multiview
profiles:
Stereo high profile: This profile targets two-view stereoscopic 3D video and combines
the tools of the High profile with the inter-view prediction capabilities of the MVC
extension.
Multiview high profile: This profile supports two or more views using both inter-picture
(temporal) and MVC inter-view prediction, but does not support field pictures and
macroblock-adaptive frame-field coding.
Feature                              CBP    BP     XP     MP     HiP    Hi10P    Hi422P       Hi444PP
B slices                             No     No     Yes    Yes    Yes    Yes      Yes          Yes
SI and SP slices                     No     No     Yes    No     No     No       No           No
Flexible macroblock ordering (FMO)   No     Yes    Yes    No     No     No       No           No
Arbitrary slice ordering (ASO)       No     Yes    Yes    No     No     No       No           No
Redundant slices (RS)                No     Yes    Yes    No     No     No       No           No
Data partitioning                    No     No     Yes    No     No     No       No           No
Interlaced coding (PicAFF, MBAFF)    No     No     Yes    Yes    Yes    Yes      Yes          Yes
CABAC entropy coding                 No     No     No     Yes    Yes    Yes      Yes          Yes
8x8 vs. 4x4 transform adaptivity     No     No     No     No     Yes    Yes      Yes          Yes
Quantization scaling matrices        No     No     No     No     Yes    Yes      Yes          Yes
Separate Cb and Cr QP control        No     No     No     No     Yes    Yes      Yes          Yes
Monochrome (4:0:0)                   No     No     No     No     Yes    Yes      Yes          Yes
Chroma formats                       4:2:0  4:2:0  4:2:0  4:2:0  4:2:0  4:2:0    4:2:0/4:2:2  4:2:0/4:2:2/4:4:4
Sample depths (bits)                 8      8      8      8      8      8 to 10  8 to 10      8 to 14
Separate color plane coding          No     No     No     No     No     No       No           Yes
Predictive lossless coding           No     No     No     No     No     No       No           Yes
Table 1: Feature support in various profiles of H.264 [15].
A comparison of the different profiles is shown in Figure 3, and Table 1 gives a brief description
of the feature support in each profile.
Figure 3. Comparison of H.264 baseline, main, extended and high profiles [5].
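Table 1 can also be queried programmatically. The following is a minimal sketch that encodes a few rows of Table 1 as a Python dictionary. It is not part of the JM or Intel IPP software, and the names PROFILES, FEATURE_SUPPORT, profiles_supporting and features_of are hypothetical, chosen only for illustration.

```python
# Column order of Table 1.
PROFILES = ["CBP", "BP", "XP", "MP", "HiP", "Hi10P", "Hi422P", "Hi444PP"]

# A few rows of Table 1: one flag per profile, in PROFILES order.
FEATURE_SUPPORT = {
    "B slices":          [False, False, True, True, True, True, True, True],
    "SI and SP slices":  [False, False, True, False, False, False, False, False],
    "FMO":               [False, True, True, False, False, False, False, False],
    "Data partitioning": [False, False, True, False, False, False, False, False],
    "CABAC":             [False, False, False, True, True, True, True, True],
    "8x8 transform":     [False, False, False, False, True, True, True, True],
}


def profiles_supporting(feature):
    """Return the profiles that support the given feature, per Table 1."""
    flags = FEATURE_SUPPORT[feature]
    return [profile for profile, ok in zip(PROFILES, flags) if ok]


def features_of(profile):
    """Return the encoded features that the given profile supports."""
    idx = PROFILES.index(profile)
    return [name for name, flags in FEATURE_SUPPORT.items() if flags[idx]]


print(profiles_supporting("CABAC"))
print(features_of("XP"))
```

Only six of the sixteen rows are encoded here; the remaining rows could be added in the same way.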
Inter prediction:
This block includes both motion estimation (ME) and motion compensation (MC). The ME/MC
process performs prediction: it generates a predicted version of a rectangular array of pixels by
choosing a similarly sized rectangular array of pixels from a previously decoded reference
picture and translating the reference array to the position of the current rectangular array. The
translation from other positions of the array in the reference picture is specified with
quarter-pixel precision. Figure 4 shows the various block sizes for motion estimation and compensation.
Figure 4: Variable block sizes for motion estimation and motion compensation [3].
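The ME/MC idea can be illustrated with a small block-matching sketch. This is a simplified illustration and not the search used by JM or Intel IPP: it assumes the sum of absolute differences (SAD) as the matching cost, a fixed 4x4 block, a full search over a small window, and integer-pixel displacements only (the quarter-pixel refinement requires interpolation and is omitted). Frames are plain lists of rows, and all function names are hypothetical.

```python
def sad(cur, ref, cx, cy, rx, ry, bw, bh):
    """SAD between the bw x bh block of cur at (cx, cy) and of ref at (rx, ry)."""
    total = 0
    for j in range(bh):
        for i in range(bw):
            total += abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
    return total


def full_search(cur, ref, cx, cy, bw=4, bh=4, search_range=2):
    """Motion estimation: return ((dx, dy), cost) of the best integer-pel match."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference picture.
            if rx < 0 or ry < 0 or rx + bw > width or ry + bh > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, bw, bh)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


def motion_compensate(ref, cx, cy, mv, bw=4, bh=4):
    """Motion compensation: copy the reference block displaced by mv."""
    dx, dy = mv
    return [[ref[cy + dy + j][cx + dx + i] for i in range(bw)] for j in range(bh)]


# Synthetic 8x8 reference picture with unique pixel values.
ref = [[10 * y + x for x in range(8)] for y in range(8)]
# Current picture: the reference content moved one pixel left and one pixel down.
cur = [[ref[y - 1][x + 1] if 0 <= y - 1 < 8 and x + 1 < 8 else 0
        for x in range(8)] for y in range(8)]

mv, cost = full_search(cur, ref, cx=2, cy=2)
prediction = motion_compensate(ref, 2, 2, mv)
print(mv, cost)
```

The nested loops over every candidate displacement show why exhaustive motion estimation dominates encoding time: the cost grows with the square of the search range and is repeated for every block and block size.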
Intra Prediction:
Intra prediction exploits spatial redundancy between adjacent macroblocks in a frame. It predicts
the pixel values as linear interpolation of pixels from adjacent edges of neighbouring
macroblocks that are decoded before the current macroblock. The interpolations are directional
in nature, with multiple modes, each implying a spatial direction of prediction as shown in
Figure 5. There are 9 prediction modes defined for a 4x4 block and 4 prediction modes defined
for a 16x16 block.
The combination of all mode evaluations, cost comparisons and the exhaustive search inside motion
estimation (ME) accounts for much of the time spent by the encoder. Complex and exhaustive ME
evaluation is key to the good performance achieved by H.264, but the cost is paid in encoding
time.
Figure 5: Intra 4x4 prediction modes and prediction directions [4].
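As an illustration of directional intra prediction, the sketch below implements three of the nine 4x4 modes (vertical, horizontal and DC) and picks the one with the lowest cost. It is a simplified sketch under stated assumptions: the six diagonal modes are omitted, both the row above and the column to the left are assumed to be available, and SAD is used as the selection cost, which is not necessarily the criterion used by JM or Intel IPP. All function names are hypothetical.

```python
MODES = ("vertical", "horizontal", "dc")


def intra4x4_predict(above, left, mode):
    """Predict a 4x4 block from the 4 pixels above and the 4 pixels to the left."""
    if mode == "vertical":
        # Each column repeats the pixel directly above it.
        return [list(above) for _ in range(4)]
    if mode == "horizontal":
        # Each row repeats the pixel directly to its left.
        return [[left[j]] * 4 for j in range(4)]
    if mode == "dc":
        # Rounded mean of the 8 neighbouring pixels.
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("unsupported mode: " + mode)


def block_sad(a, b):
    """Sum of absolute differences between two 4x4 blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def best_intra_mode(block, above, left):
    """Return the mode (among MODES) whose prediction has the lowest SAD."""
    costs = {m: block_sad(block, intra4x4_predict(above, left, m)) for m in MODES}
    return min(costs, key=costs.get)


above = [10, 20, 30, 40]
left = [5, 5, 5, 5]
block = [[10, 20, 30, 40] for _ in range(4)]  # vertical stripes
print(best_intra_mode(block, above, left))
```

Evaluating every mode for every block in this way is one of the mode evaluations that contribute to the encoding time discussed above.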
Applications:
H.264 offers greater flexibility in terms of compression options and transmission support. An
H.264 encoder can select from a wide variety of compression tools, making it suitable for
applications ranging from low-bitrate, low-delay mobile transmission through high-definition
consumer TV to professional television production. The standard provides integrated support for
transmission or storage, including a packetized compressed format and features that help to
minimize the effect of transmission errors.
H.264/AVC is being adopted for an increasing range of applications, including [2] and [6]:
High Definition DVDs (HD-DVD and Blu-Ray formats)
High Definition TV broadcasting in Europe
Apple products including iTunes video downloads, iPod video and Mac OS
NATO and US DoD video applications
Mobile TV broadcasting
Internet video
Videoconferencing
Video Formats:
The Common Intermediate Format (CIF) is the basis for a popular set of formats. It is common
to capture or convert to one of a set of intermediate formats prior to compression and
transmission. Table 2 shows the luma component of a video frame sampled at a range of
resolutions, from 4CIF down to sub-QCIF.
Format               Luminance resolution (horiz. x vert.)   Bits per frame (4:2:0, 8 bits per sample)
Sub-QCIF             128 x 96                                147456
Quarter CIF (QCIF)   176 x 144                               304128
CIF                  352 x 288                               1216512
4CIF                 704 x 576                               4866048
SD                   720 x 480                               4147200
Table 2: Luminance resolution and Bits per frame for each format [10].
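The bits-per-frame column of Table 2 follows directly from the luminance resolution and the chroma sampling format. The sketch below reproduces the calculation for the CIF family; the function name and the CHROMA_HALVES lookup are hypothetical helpers introduced only for illustration.

```python
# Total chroma samples (Cb + Cr) per frame, expressed in halves of the luma
# sample count: 4:2:0 carries half as many chroma samples as luma samples.
CHROMA_HALVES = {"4:0:0": 0, "4:2:0": 1, "4:2:2": 2, "4:4:4": 4}


def bits_per_frame(width, height, bit_depth=8, chroma="4:2:0"):
    """Uncompressed size of one frame in bits."""
    luma_samples = width * height
    chroma_samples = luma_samples * CHROMA_HALVES[chroma] // 2
    return (luma_samples + chroma_samples) * bit_depth


for name, (w, h) in {"Sub-QCIF": (128, 96), "QCIF": (176, 144),
                     "CIF": (352, 288), "4CIF": (704, 576)}.items():
    print(name, bits_per_frame(w, h))
```

For 4:2:0 video with 8 bits per sample this reduces to width x height x 1.5 x 8 bits.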
The choice of frame resolution depends on the application and available storage or transmission
capacity. Table 3 lists the range of applications.
Format   Application
SQCIF    Mobile multimedia applications where the display resolution and the bit rate are limited.
QCIF     Video conferencing and mobile multimedia applications.
CIF      Video conferencing applications.
4CIF     Standard-definition television and DVD-video.
Table 3: Range of applications for each video format [10].
Different profiles are therefore implemented, and metrics such as MSE, SSIM, PSNR, encoding time,
decoding time and the compression ratio of the H.264 file are compared using the JM software [12]
and the Intel IPP software [14].
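The compression ratio is taken here as the uncompressed size divided by the encoded size. The sketch below is a minimal illustration under stated assumptions: the source is 4:2:0 video with 8 bits per sample, the bitrate is expressed in units of 1000 bits per second, and the frame rate (which this report does not state) is passed in as a parameter. The function names and the example numbers are hypothetical and are not measured results.

```python
def raw_yuv420_bytes(width, height, frames):
    """Size in bytes of an uncompressed 8-bit 4:2:0 sequence."""
    return width * height * 3 // 2 * frames


def encoded_bytes_from_bitrate(bitrate_kbps, frames, fps):
    """Approximate encoded size in bytes from an average bitrate (assumes 1 kbps = 1000 bit/s)."""
    duration_sec = frames / fps
    return bitrate_kbps * 1000 / 8 * duration_sec


def compression_ratio(raw_bytes, encoded_bytes):
    """Uncompressed size divided by encoded size."""
    if encoded_bytes <= 0:
        raise ValueError("encoded size must be positive")
    return raw_bytes / encoded_bytes


# Hypothetical example: 90 QCIF frames and an encoded file one tenth the size.
raw = raw_yuv420_bytes(176, 144, 90)
print(raw, compression_ratio(raw, raw / 10))
```

When the encoded file is available on disk, its actual size in bytes can be used directly in place of the bitrate-based estimate.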
Quality Metrics:
The quality of the test sequences encoded with the JM software is measured using PSNR
(peak signal-to-noise ratio), SSIM (structural similarity index) [16] and [17], and MSE
(mean square error). Figures 7, 8 and 9 show the variation in these metrics for the Baseline
profile on a QCIF test sequence. Figures 14, 15 and 16 show the variation in these metrics for the
Baseline profile on a CIF test sequence.
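For reference, the three quality metrics can be sketched in a few lines. This is an illustrative sketch, not the code used by JM or Intel IPP. It assumes 8-bit samples (peak value 255) and frames stored as lists of rows. The SSIM function is a simplified single-window version computed over the whole frame with the commonly used constants K1 = 0.01 and K2 = 0.03; practical SSIM implementations compute the index over local windows and average the results. All function names are hypothetical.

```python
import math


def mse(ref, test):
    """Mean square error between two equally sized frames."""
    total, count = 0, 0
    for ref_row, test_row in zip(ref, test):
        for r, t in zip(ref_row, test_row):
            total += (r - t) ** 2
            count += 1
    return total / count


def psnr(ref, test, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical frames."""
    err = mse(ref, test)
    if err == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / err)


def global_ssim(ref, test, peak=255, k1=0.01, k2=0.03):
    """Simplified SSIM using statistics of the whole frame (single window)."""
    x = [p for row in ref for p in row]
    y = [p for row in test for p in row]
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((p - mu_x) ** 2 for p in x) / n
    var_y = sum((p - mu_y) ** 2 for p in y) / n
    cov = sum((p - mu_x) * (q - mu_y) for p, q in zip(x, y)) / n
    c1, c2 = (k1 * peak) ** 2, (k2 * peak) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))


original = [[0, 0], [0, 0]]
decoded = [[0, 0], [0, 2]]
print(mse(original, decoded), psnr(original, decoded))
```

PSNR is a logarithmic function of MSE, so a lower MSE always corresponds to a higher PSNR for the same peak value.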
QCIF Test Sequence:
Number of frames: 90
Source width : 176
Source Height: 144
QP: 0,10,20,30,40,50
Profile IDC : 66 (Baseline profile)
QP   Bitrate (kbps)   PSNR (dB)   Encoding time (sec)   Decoding time (sec)   ME time (sec)
0    3903.22          69.374      151.320               3.279                 129.382
10   1650.93          51.471      137.195               2.866                 118.930
20   360.73           43.758      134.012               1.589                 117.079
30   80.96            36.045      146.272               1.616                 129.304
40   22.88            29.424      164.762               0.855                 144.614
50   6.69             23.057      136.406               0.613                 116.407
Table 4: Results obtained using JM 17.2 for Carphone QCIF test sequence.
Figure 6: Supporting picture format for 4:2:0 chroma sampling for QCIF test sequence.
A. PSNR:
Figure 7: PSNR vs Bitrate using JM 17.2 for Carphone QCIF test sequence.
B. SSIM
Figure 8: SSIM vs Bitrate using JM 17.2 for Carphone QCIF test sequence.
C. MSE
Figure 9: MSE vs Bitrate using JM 17.2 for Carphone QCIF test sequence.
Foreman Test Sequence:
Number of frames : 90
Source width : 176
Source Height: 144
QP: 0,10,20,30,40,50
Profile IDC : 66 (Baseline profile)
QP   Bitrate (Kbps)   PSNR (dB)   Encoding time (sec)   Decoding time (sec)   ME time (sec)
0    3729.47          69.164      122.871               4.757                 97.103
10   1654.47          51.514      136.672               3.733                 110.691
20   436.80           42.816      120.038               2.442                 98.679
30   115.49           35.230      106.573               1.260                 89.764
40   36.94            28.506      99.976                1.029                 84.153
50   10.87            21.704      91.541                0.590                 74.347
Table 5: Results obtained using JM 17.2 for Foreman QCIF test sequence.
A. PSNR
Figure 10: PSNR (dB) vs Bitrate (Kbps) using JM 17.2 for Foreman QCIF test sequence.
B. SSIM
Figure 11: SSIM vs Bitrate (Kbps) using JM 17.2 for Foreman QCIF test sequence.
C. MSE
Figure 12: MSE vs Bitrate (Kbps) using JM 17.2 for Foreman QCIF test sequence.
CIF Test Sequence:
Figure 13: Supporting picture format for 4:2:0 chroma sampling for CIF test sequence.
Number of frames: 90
Source width : 352
Source Height: 288
QP: 0,10,20,30,40,50
Profile IDC : 66 (Baseline profile)
QP   Bitrate (kbps)   PSNR (dB)   Encoding time (sec)   Decoding time (sec)   ME time (sec)
0    16737.42         69.785      742.471               15.379                655.998
10   8026.28          51.579      713.142               12.740                637.021
20   1642.46          42.597      616.887               7.113                 553.565
30   338.15           35.869      633.715               4.412                 573.552
40   112.81           30.114      609.422               3.072                 550.785
50   43.41            23.909      576.105               2.345                 505.838
Table 6: Results obtained using JM 17.2 for Foreman CIF test sequence.
A. PSNR
Figure 14: PSNR Vs Bit rate using JM 17.2 for Foreman CIF test sequence.
B. SSIM
Figure 15: SSIM Vs Bit rate using JM 17.2 for Foreman CIF test sequence.
C. MSE
Figure 16: MSE Vs Bit rate using JM 17.2 for Foreman CIF test sequence.
Number of frames: 90
Source width : 352
Source Height: 288
QP: 0,10,20,30,40,50
Profile IDC : 66 (Baseline profile)
QP   Bitrate (Kbps)   PSNR (dB)   Encoding time (sec)   Decoding time (sec)   ME time (sec)
0    18235.53         69.752      768.570               16.668                681.242
10   9575.72          51.649      959.500               11.069                859.159
20   3365.02          43.707      927.807               10.618                835.050
30   1011.22          36.424      801.816               7.427                 729.127
40   277.15           30.659      702.272               5.237                 630.341
50   104.79           25.780      725.502               3.176                 624.922
Table 7: Results obtained using JM 17.2 for Football CIF test sequence.
A. PSNR
Figure 16: PSNR (dB) vs Bitrate (Kbps) using JM 17.2 for Football CIF test sequence.
B. SSIM
Figure 17: SSIM vs Bitrate (Kbps) using JM 17.2 for Football CIF test sequence.
C. MSE
Figure 18: MSE vs Bitrate (Kbps) using JM 17.2 for Football CIF test sequence.
HD High Profile:
Number of frames: 90
Source width : 720
Source Height: 1080
QP: 0,10,20,30,40,50
Profile IDC : 100 (High Profile)
QP   Bitrate (Kbps)   PSNR (dB)   Encoding time (sec)   Decoding time (sec)   ME time (sec)
0    13044.45         67.56       17002.19              26.668                1181.242
10   7998.45          63.117      14044.44              21.069                859.159
20   1799.99          52.86       7989.98               17.618                835.050
30   959.54           47.424      2001.91               7.427                 629.127
40   476.25           37.834      476.98                5.237                 630.341
50   109.25           10.716      109.25                3.176                 624.922
Table 8: Results obtained for HD test sequence using JM 17.2.
A. PSNR
Figure 19: PSNR (dB) vs Bitrate (Kbps) using JM 17.2 for Sintel HD test sequence.
B. SSIM
Figure 20: SSIM vs Bitrate (Kbps) using JM 17.2 for Sintel HD test sequence.
Intel IPP 6.1:
QP   Bitrate (Kbps)   PSNR (dB)   Encoding time (sec)   Decoding time (sec)   ME time (sec)
0    3729.57          69.26       3.41                  0.1257                2.51
10   1654.48          51.54       3.31                  0.1519                2.58
20   436.87           42.84       3.93                  0.1061                3.07
30   115.56           35.33       3.74                  0.1232                2.88
40   36.96            37.84       3.77                  0.1386                2.94
50   10.97            21.77       4.13                  0.1577                3.25
Table 9: Results obtained for Foreman QCIF sequence using Intel IPP 6.1.
A. PSNR
Figure 21: PSNR Vs Bitrate using Intel IPP 6.1 for Foreman QCIF test sequence.
B. SSIM
Figure 22: SSIM Vs Bitrate using Intel IPP 6.1 for Foreman QCIF test sequence.
C. MSE
Figure 23: MSE Vs Bitrate using Intel IPP 6.1 for Foreman QCIF test sequence.
QP   Bitrate (Kbps)   PSNR (dB)   Encoding time (sec)   Decoding time (sec)   ME time (sec)
0    18235.53         69.70       25.86                 0.0898                2.53
10   9575.72          52.64       22.93                 0.0628                2.47
20   3365.02          44.40       23.71                 0.0629                2.38
30   1011.22          37.42       28.58                 0.0642                2.25
40   277.15           30.75       29.27                 0.0567                1.96
50   104.79           25.68       18.77                 0.0483                1.47
Table 10: Results obtained for Football CIF sequence using Intel IPP 6.1.
A. PSNR
Figure 24: PSNR (dB) vs Bitrate (Kbps) using Intel IPP 6.1 for Football CIF test sequence.
B. SSIM
Figure 25: SSIM vs Bitrate (Kbps) using Intel IPP 6.1 for Football CIF test sequence.
C. MSE
Figure 26: MSE vs Bitrate (Kbps) using Intel IPP 6.1 for Football CIF test sequence.
Number of frames: 90
Source width : 720
Source Height: 1080
QP: 0,10,20,30,40,50
Profile IDC: 100 (High profile)
QP   Bitrate (Kbps)   PSNR (dB)   Encoding time (sec)   Decoding time (sec)   ME time (sec)
0    13044.45         61.755      15.7                  0.2357                3.53
10   7998.45          55.8701     11.71                 0.2619                3.47
20   1799.99          42.128      9.85                  0.2261                3.38
30   959.54           35.188      4.76                  0.2332                3.25
40   476.25           30.659      4.2                   0.2486                2.96
50   109.25           25.780      4.17                  0.2677                2.47
Table 11: Results obtained for Sintel HD sequence using Intel IPP 6.1.
A. PSNR
Figure 27: PSNR (dB) vs Bitrate (Kbps) using Intel IPP 6.1 for Sintel HD test sequence.
B. SSIM
Figure 28: SSIM vs Bitrate (Kbps) using Intel IPP 6.1 for Sintel HD test sequence.
Intel IPP 6.1 Vs JM 17.2
A. PSNR
Figure 29: PSNR (dB) vs Bitrate (Kbps) for Intel IPP 6.1 vs JM 17.2 using CIF Football test sequence.
B. SSIM
Figure 30: SSIM vs Bitrate (Kbps) for Intel IPP 6.1 vs JM 17.2 using CIF Football test sequence.
C. MSE
Figure 31: MSE vs Bitrate (Kbps) for Intel IPP 6.1 vs JM 17.2 using CIF Football test sequence.
D. Encoding Time
Figure 32: Encoding time (sec) vs Bitrate (Kbps) for JM 17.2 vs Intel IPP using CIF Football test sequence.
Conclusions:
Metrics                           Performance
SSIM                              Intel IPP 6.1 offers better results than JM 17.2
MSE                               Intel IPP 6.1 offers better results than JM 17.2
PSNR                              Intel IPP 6.1 offers better results than JM 17.2
Encoding time and decoding time   Intel IPP 6.1 is faster than JM 17.2
Table 12: Performance analysis between JM 17.2 and Intel IPP 6.1 using different metrics.
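The encoding-time comparison can be expressed as a speed-up factor. The sketch below uses the encoding times reported in Table 7 (JM 17.2) and Table 10 (Intel IPP 6.1) for the Football CIF sequence; the variable and function names are hypothetical, and the factors apply only to the machine and settings used for these measurements.

```python
QPS = [0, 10, 20, 30, 40, 50]
# Encoding times in seconds for the Football CIF sequence.
JM_ENCODE_SEC = [768.570, 959.500, 927.807, 801.816, 702.272, 725.502]  # Table 7
IPP_ENCODE_SEC = [25.86, 22.93, 23.71, 28.58, 29.27, 18.77]             # Table 10


def speedup(slow_times, fast_times):
    """Element-wise ratio of the slower times to the faster times."""
    return [slow / fast for slow, fast in zip(slow_times, fast_times)]


factors = speedup(JM_ENCODE_SEC, IPP_ENCODE_SEC)
for qp, factor in zip(QPS, factors):
    print(f"QP {qp:2d}: Intel IPP 6.1 encodes {factor:.1f}x faster than JM 17.2")
```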
References:
[1]. K. R. Rao and D. N. Kim, “Current Video Coding Standards: H.264/AVC, Dirac, AVS
China and VC-1”, IEEE 42nd Southeastern symposium on system theory (SSST), pp. 1-8, Mar.
2010.
[2]. J. Zhang, A. Perkis and N. D. Georganas, “H.264/AVC and Transcoding for Multimedia
Adaptation”, Proc. of the 6th COST, 2006.
[3]. T. Wiegand et al, “Overview of the H.264/AVC Video Coding Standard”, IEEE Trans. on
Circuits and Systems for Video Technology, Vol. 13, No. 7, pp. 560-576, July 2003.
[4]. Soon-kak Kwon, A. Tamhankar and K.R. Rao, “Overview of H.264 / MPEG-4 Part 10”, J.
Visual Communication and Image Representation, vol. 17, pp.186-216, April 2006.
[5]. A. Puri, X. Chen and A. Luthra, “Video coding using H.264/MPEG-4 AVC compression
standard”, Science Direct. Signal processing: Image communication, vol. 19, pp. 793-849, Oct.
2004.
[6]. D. Marpe, T. Wiegand and G. J. Sullivan, “The H.264/MPEG-4 AVC standard and its
applications”, IEEE Communications Magazine, vol. 44, pp. 134-143, Aug. 2006.
[7]. R. Schäfer, T. Wiegand and H. Schwarz, “The emerging H.264/AVC standard”, EBU
Technical Review, Jan. 2003.
[8]. P.Carrillo, H.Kalva, and T.Pin, “Low complexity H.264 video encoding”, Applications of
Digital Image Processing. Proc. of SPIE, vol. 7443, 74430A, Sept.2009.
[9]. H. Kalva, “ The H.264 Video Coding Standard”, IEEE Conference on Multimedia, Vol.
13, pp. 86-90, 2006.
[10]. I.E.G. Richardson, “H.264 and MPEG-4 video compression: video coding for
next-generation multimedia”, 2nd edition, Wiley, 2010.
[11]. V. Roden and T. Praktische, “H.261 and MPEG1- A comparison”, Conference Proceedings
of the 1996 IEEE Fifteenth Annual International Phoenix Conference on Computers and
Communications, pp.65-71, Mar 1996.
[12]. JM 17.2 software : http://iphome.hhi.de/suehring/tml/
[13]. CIF and QCIF formats: http://www.birds-eye.net/definition/c/cif-common_intermediate_format.shtml/
[14]. Intel IPP software: http://software.intel.com/en-us/
[15]. H.264 profiles: http://www.innocodec.com/H.html
[16]. C. Li and A. C. Bovik, “Content-weighted video quality assessment using a
three-component image model”, Journal of Electronic Imaging, Vol. 19, pp. 65-71, Mar 2010.
[17]. Z. Wang, E. P. Simoncelli and A. C. Bovik, “Multi-scale structural similarity for image
quality assessment”, Proc. of 37th IEEE Asilomar Conference on Signals, Systems and Computers,
Nov. 9-12, 2003.
LIST OF ACRONYMS AND ABBREVIATIONS
AVC – Advanced video coding
BP – Baseline profile
B slice – Bi-predictive slice
CBP – Constrained baseline profile
CAVLC – Context adaptive variable length coding
CABAC – Context adaptive binary arithmetic coding
CIF – Common intermediate format
DVD – Digital video disc
Fps – Frames per sec
HD – High definition
HiP – High profile
Hi10P – High 10 profile
Hi422P – High 4:2:2 profile
Hi444PP – High 4:4:4 predictive profile
I slice – Intra slice
JM – Joint model
MB – Macroblock
MC – Motion compensation
ME – Motion estimation
MP – Main profile
MPEG – Moving picture experts group
MSE – Mean square error
MVC – Multiview video coding
NAL – Network abstraction layer
P slice – Predictive slice
P – Predicted macroblock
PSNR – Peak signal to noise ratio
QCIF – Quarter common intermediate format
SI slice – Switching I slice
SP slice – Switching P slice
SSIM – Structural similarity index metric
VCEG – Video coding experts group
XP – Extended profile