Error Resilience and Performance Evaluation of H.264/AVC Video Streams in a Lossy Wireless Environment
Muhammad Saleem Koul, SM IEEE
EE Dept., UT Arlington
EE 5359: Multimedia Processing
Final Project
© M.S.Koul, Dept. of Electrical Engineering
Synopsis
Multimedia over wireless has become a reality with broadband 3G/4G cellular technologies.
The inherent nature of the transmission medium makes packet loss and delay variation (jitter) more severe in wireless/cellular networks.
Several error concealment (EC) algorithms for H.264 are compared, and their advantages and disadvantages are analyzed. A new EC algorithm is also proposed.
A Video Quality Assessment (VQA) methodology is introduced that helps analyze and quantify the effects of these impairments on the quality of the received video.
Typical 3G/4G Network Applications
Figure: network diagram with access links labeled CDMA 2000 1X: 144 kbps, EVDO Rev A: 500-700 kbps and WiMAX: >3.1 Mbps, and the remaining links labeled ~10 Mbps.
Correlation in Video Sequences
Figure 3.2 Spatial and temporal correlation in a video sequence
H.264 Performance
A video frame compressed at the same bitrate (150 kbps) using MPEG-2 (left), MPEG-4 Visual (center) and H.264 (right). Courtesy: Vcodex white paper, http://www.vcodex.com
H.264 offers better image quality at the same compressed bitrate, or a lower compressed bitrate for the same image quality.
Typical Stream Setup (courtesy: streamingmedia.org)
H.264/AVC Structure
Figure: Basic coding structure of H.264/AVC for a macroblock. Blocks shown: Input Video Signal, Split into Macroblocks (16x16 pixels), Coder Control, Transform/Scaling/Quantization, Scaling & Inverse Transform, Intra-frame Prediction, Motion Compensation, Motion Estimation, Deblocking Filter, Entropy Coding, Decoder, Output Video Signal. Signals shown: Control Data, Quantized Transform Coefficients, Motion Data, Intra/Inter.
Error Concealment vs Error Resilience
Error Resilience
Error Concealment – Frame missing
Temporal Replacement
Replaces the missed frame/MB using a zero motion vector (0,0)
Copies the MB/frame from the previously reconstructed reference slice at the exact same position
Temporal Replacement (contd.)
Frames #115, #116 and #117 of the original sequence
Frame #115 was decoded successfully and frame #116 was lost. Frame #116 was reconstructed by frame copy. Frame #117 is degraded.
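Temporal replacement as described on these slides can be sketched in a few lines. This is only an illustrative sketch, not the JM decoder's code: the frame layout (a list of rows of luma samples), the (column, row) macroblock indexing and the function names are assumptions made for the example.

```python
def conceal_by_frame_copy(prev_frame, damaged_frame, lost_mbs, mb_size=16):
    """Return a copy of damaged_frame with each lost MB filled from prev_frame.

    Frames are lists of rows of luma samples; lost_mbs holds (mb_col, mb_row)
    indices. The layout and the names are assumptions for this sketch.
    """
    height = len(damaged_frame)
    concealed = [row[:] for row in damaged_frame]
    for mb_col, mb_row in lost_mbs:
        y0, x0 = mb_row * mb_size, mb_col * mb_size
        for y in range(y0, min(y0 + mb_size, height)):
            # Zero motion vector: take the same (x, y) position in the
            # previously reconstructed frame.
            concealed[y][x0:x0 + mb_size] = prev_frame[y][x0:x0 + mb_size]
    return concealed


def conceal_lost_frame(prev_frame):
    """A completely lost frame is replaced by a copy of the previous frame."""
    return [row[:] for row in prev_frame]
```

Because the copy ignores any motion between the two frames, the concealed frame is only a good match for static content, which is consistent with the degradation seen in the frame that follows the lost one.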
Error Concealment – Frame missing (contd.)
Multi-frame Motion Vector Averaging (MVA)
Exploits the MVs of a few past frames
Estimates the MV of each pixel in the last successfully decoded frame
Projects the last frame onto an estimate of the missing frame
Sometimes performs worse than temporal replacement
How many past frames should be used? A farther reference frame may not contain helpful MVs
Error Concealment – Frame missing (contd.)
Motion Vector Extrapolation (MVE)
Compensates the missed MB by extrapolating each MV that is stored in the previously decoded frame
8x8 sub-block based process
The MV of the extrapolated MB with the largest overlap is selected for the sub-block
If there is no overlap, the zero MV is used
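The MVE selection rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the project: the names `overlap_area` and `extrapolate_mvs` are hypothetical, each MB of the previous frame is assumed to carry a single MV, and an MV (dx, dy) is assumed to point from a block to its source in the preceding frame, so the block is extrapolated to position (x - dx, y - dy) in the missing frame.

```python
MB = 16   # macroblock size in pixels
SUB = 8   # sub-block size in pixels


def overlap_area(ax, ay, a_size, bx, by, b_size):
    """Area (in pixels) shared by two axis-aligned squares."""
    w = min(ax + a_size, bx + b_size) - max(ax, bx)
    h = min(ay + a_size, by + b_size) - max(ay, by)
    return max(w, 0) * max(h, 0)


def extrapolate_mvs(prev_mvs, width, height):
    """Pick one MV per 8x8 sub-block of the missing frame.

    prev_mvs maps the top-left pixel (x, y) of each MB in the previously
    decoded frame to its MV (dx, dy). Assumed sign convention: the MV points
    to the source block in the older frame, so the MB continues to
    (x - dx, y - dy) in the missing frame.
    """
    chosen = {}
    for sy in range(0, height, SUB):
        for sx in range(0, width, SUB):
            best_area, best_mv = 0, (0, 0)   # zero MV when nothing overlaps
            for (mx, my), (dx, dy) in prev_mvs.items():
                ex, ey = mx - dx, my - dy    # extrapolated MB position
                area = overlap_area(sx, sy, SUB, ex, ey, MB)
                if area > best_area:
                    best_area, best_mv = area, (dx, dy)
            chosen[(sx, sy)] = best_mv
    return chosen
```

Each selected MV would then be used to motion-compensate its sub-block from the previous frame; that step is omitted here to keep the sketch short.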
Different Error Concealment Techniques
Figure panels: Original; Error; Weighted average; Decode I-frame without residuals; Copy-paste; Block matching; Boundary matching; Decode without residuals.
Ref: I. C. Todoli, “Performance of Error Concealment Methods for Wireless Video”, Diploma Thesis, Vienna University of Technology, 2007 [1]
Deformable Surface Morphing [10,11]
Image morphing using deformable surfaces is one of the simplest morphing methods.
We devised a mapping from the H.264 motion vectors to a deformable surface morph.
Once the morphing surface matrix is obtained, we apply it to the previous frame to obtain the next frame.
We show that a geometric warp transform inherently smooths out the motion vector field when motion vectors and residual information are lost.
Conversely, the same geometric warp could be applied at the encoder to reduce the size of the residual, thus decreasing the overall bandwidth overhead.
Motion Vector to Deformable Surface Morphing
Figure: plot on a grid spanning 0 to 12 horizontally and 0 to 9 vertically.
Different Error Concealment Techniques (QCIF)
a) Error (no error concealment): MSE: 2498, PSNR: 14.15 dB, SSIM: 0.7340
b) Weighted averaging: MSE: 891.24, PSNR: 18.63 dB, SSIM: 0.7522
c) Frame copy: MSE: 123.8, PSNR: 27.20 dB, SSIM: 0.8598
Different Error Concealment Techniques (QCIF)
d) Decoded without residual: MSE: 46.10, PSNR: 31.49 dB, SSIM: 0.925
e) Motion vector copy without residual: MSE: 52.50, PSNR: 30.93 dB, SSIM: 0.913
f) Geometric warping after MV copy: MSE: 42.51, PSNR: 31.85 dB, SSIM: 0.928
Different Error Concealment Techniques (CIF)
a) Original frames 36 and 37. Frame 37 suffers packet loss, resulting in several lost MBs.
b) Error (no error concealment): MSE: 207, PSNR: 25 dB, SSIM: 0.925
c) Weighted averaging: MSE: 97.56, PSNR: 28.24 dB, SSIM: 0.946
Different Error Concealment Techniques (CIF)
d) Frame copy: MSE: 23.41, PSNR: 34.44 dB, SSIM: 0.9781
e) Motion vector copy: MSE: 7.74, PSNR: 39.25 dB, SSIM: 0.9903
f) Geometric warping after MV copy: MSE: 6.72, PSNR: 39.90 dB, SSIM: 0.9915
Error Propagation due to I or P Frame Damage/Loss
A Packet Loss Model (courtesy: Feamster, Balakrishnan)
This plot shows the relationship between packet loss rate and frame rate (or perceived video quality) for videos of different original PSNR.
Packet Loss Model (contd.)
Effects of packet loss on the observed frame rate at the receiver
Using selective reliability for specific frames can improve the overall received video quality
The graph shows that as the packet loss rate p increases, the frame rate/quality degrades roughly as [14]:
Let P(F|f_i) be the conditional probability that a frame was successfully decoded at the receiver. Defining it as a Bernoulli random variable:
Successful decoding of a P frame depends on all I and P frames that precede it in the GOP:
where
p = packet loss rate
S_I and S_P are the average number of packets in an I and P frame, respectively
N_P = number of P frames in a GOP
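The dependency argument of this model can be illustrated numerically. The sketch below is not a reproduction of the expressions in [14]; it assumes independent packet losses with rate p, a GOP made of one I frame followed by N_P P frames (no B frames), and that a frame is decodable only if every packet of that frame and of all frames it depends on arrives. The function names are hypothetical.

```python
def p_decode_i(p, s_i):
    """Probability that all s_i packets of the I frame arrive."""
    return (1.0 - p) ** s_i


def p_decode_p(p, s_i, s_p, k):
    """Probability that the k-th P frame (k = 1..N_P) is decodable.

    It needs the I frame plus P frames 1..k, i.e. s_i + k * s_p packets,
    under the independent-loss assumption stated above.
    """
    return (1.0 - p) ** (s_i + k * s_p)


def decodable_fraction(p, s_i, s_p, n_p):
    """Expected fraction of frames in one GOP that can be decoded."""
    total = p_decode_i(p, s_i)
    total += sum(p_decode_p(p, s_i, s_p, k) for k in range(1, n_p + 1))
    return total / (n_p + 1)


if __name__ == "__main__":
    # Illustrative values only: 10 packets per I frame, 3 per P frame.
    for loss in (0.001, 0.01, 0.05):
        print(loss, round(decodable_fraction(loss, 10, 3, 14), 3))
```

Under these assumptions the I-frame term appears in every product, which is why protecting I frames (selective reliability) raises the decoding probability of every P frame in the GOP.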
Probability of frame loss of I-frames and P-frames
The following plots show the dependence of the probability of frame loss for typical predicted (P) frames.
The model shows that I-frames carry the highest number of dependencies and the highest probability of error propagation.
Decreasing the probability of degradation/loss of I-frames also decreases the probability of degradation/loss of P-frames.
Errors in JM13.2 EC implementation
The Error Concealment implementation in the latest JM decoder is far from perfect at this stage.
Several open issues are already under investigation, and bugs have been reported to the JVT-JM team.
The current implementation of Motion Vector Copy does not work properly in JM 13.2.
Ref URL: https://ipbt.hhi.de/mantis/search.php?project_id=1&search=error+concealment&sticky_issues=off&sortby=last_updated&dir=DESC&hide_status_id=-2
(Objective) Video Quality Analysis of the Received Sequences
In this section we detail several video quality analysis schemes applied to the transmitted and received video sequences, and analyze their results in light of the conclusion of the VQEG final report [15]: ‘No one objective model outperforms the other in all cases’.
We compare results from RMSE, PSNR, DVQ and SSIM.
The VQA results need to be mapped to a human-perceivable score, the Mean Opinion Score (MOS). The importance of MOS mapping is discussed.
Root-Mean-Square Error (RMSE)
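RMSE measures the “difference” between two images. It can be applied to digital video by averaging the per-frame results.
For an MxN image, RMSE can be calculated as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left[x(i,j)-\hat{x}(i,j)\right]^{2}}$$
where x(i,j) and \hat{x}(i,j) are the pixel values of the original and the received image at position (i,j).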
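A minimal sketch of the RMSE computation follows, assuming each image is a list of M rows with N pixel values each; the function names are chosen for this example only.

```python
import math


def rmse(reference, received):
    """Root-mean-square error between two equally sized M x N images."""
    m, n = len(reference), len(reference[0])
    squared_error = sum(
        (reference[i][j] - received[i][j]) ** 2
        for i in range(m)
        for j in range(n)
    )
    return math.sqrt(squared_error / (m * n))


def sequence_rmse(reference_frames, received_frames):
    """Video-level RMSE: the average of the per-frame RMSE values."""
    per_frame = [rmse(a, b) for a, b in zip(reference_frames, received_frames)]
    return sum(per_frame) / len(per_frame)
```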
PSNR (Peak Signal to Noise Ratio)
The most commonly used objective quality metric is the Peak Signal to Noise Ratio (PSNR). For a video sequence, the PSNR (dB) of each frame of N*M pixels can be calculated as:
$$\mathrm{PSNR} = 20\log_{10}\frac{255}{\mathrm{RMSE}}$$
where RMSE is the root-mean-square error between the original and the received frame, and 255 is the maximum pixel value in the N*M pixel image (8-bit PCM).
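The per-frame PSNR can be sketched as below, assuming 8-bit frames stored as lists of rows; the function names are illustrative. Identical frames have zero error, so the sketch returns infinity in that case.

```python
import math


def mse(reference, received):
    """Mean squared error between two equally sized frames."""
    rows, cols = len(reference), len(reference[0])
    total = sum(
        (reference[i][j] - received[i][j]) ** 2
        for i in range(rows)
        for j in range(cols)
    )
    return total / (rows * cols)


def psnr_from_mse(mse_value, peak=255):
    """PSNR in dB for a given MSE; peak is 255 for 8-bit samples."""
    if mse_value == 0:
        return math.inf
    return 20 * math.log10(peak / math.sqrt(mse_value))


def psnr(reference, received, peak=255):
    """PSNR in dB between an original frame and a received frame."""
    return psnr_from_mse(mse(reference, received), peak)
```

As a check against the figures reported on the concealment slides, `psnr_from_mse(123.8)` evaluates to about 27.2 dB, in line with the QCIF frame-copy result.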
DCT based VQ Evaluation
The conventional video metrics (RMSE and PSNR) do not take into account the spatial and temporal properties of human visual perception.
The DCT-based VQ metric proposed by Xiao [8] is based on Watson’s work [7].
The 8x8 block-based distortion is the ‘atom’ of all current compression-based video processing. Such block-based distortions are prominent hallmarks of all decoded video sequences. Hence a metric that performs the evaluation in the DCT domain on 8x8 blocks yields significant results:
Structural Similarity Approach
This approach emphasizes that the Human Visual System (HVS) is highly adapted to extract structural information from visual scenes. Therefore, a measurement of structural similarity (or difference) should provide a good approximation to perceptual image quality.
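The SSIM index is defined as a product of luminance, contrast and structural comparison functions [9]:
$$\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$
$$\mathrm{MSSIM}(X,Y) = \frac{1}{M}\sum_{j=1}^{M}\mathrm{SSIM}(x_j, y_j)$$
where μ is the mean intensity, σ is the standard deviation (a rough estimate of the signal contrast) and σ_xy is the covariance of x and y. C1 and C2 are constants. M is the number of samples in the quality map.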
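The sketch below evaluates the SSIM expression over a single window covering the whole image, which is a simplification: the reference implementation [9] slides a weighted local window over the image and averages the resulting quality map. The constants C1 = (0.01·255)² and C2 = (0.03·255)² are the commonly used defaults for 8-bit images and are treated here as assumptions; the function names are illustrative.

```python
def ssim_single_window(x, y, peak=255, k1=0.01, k2=0.03):
    """SSIM computed once over two equally sized images (lists of rows)."""
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    # Unbiased estimates of the variances and of the covariance.
    var_x = sum((v - mu_x) ** 2 for v in xs) / (n - 1)
    var_y = sum((v - mu_y) ** 2 for v in ys) / (n - 1)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(xs, ys)) / (n - 1)
    c1 = (k1 * peak) ** 2
    c2 = (k2 * peak) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def mean_ssim(windows_x, windows_y):
    """Average SSIM over M corresponding windows (the quality map)."""
    scores = [ssim_single_window(a, b) for a, b in zip(windows_x, windows_y)]
    return sum(scores) / len(scores)
```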
MOS Mapping
In multimedia, the Mean Opinion Score (MOS) provides a numerical indication of the perceived quality of received media after compression and/or transmission. The MOS is expressed as a single number in the range 1 to 5, where 1 is the lowest perceived quality and 5 is the highest.
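One simple way to map an objective score onto the MOS scale is a threshold table on PSNR. The sketch below uses the PSNR ranges commonly quoted from the EvalVid framework [4] (above 37 dB, 31 to 37 dB, 25 to 31 dB, 20 to 25 dB, below 20 dB); these thresholds, the handling of values exactly on a boundary and the function name are assumptions for illustration, not the mapping used to produce the results in this presentation.

```python
def psnr_to_mos(psnr_db):
    """Map a PSNR value in dB to a MOS between 1 (bad) and 5 (excellent).

    Thresholds are the ranges commonly quoted from EvalVid and are an
    assumption here; boundary values fall into the lower class.
    """
    if psnr_db > 37:
        return 5  # excellent
    if psnr_db > 31:
        return 4  # good
    if psnr_db > 25:
        return 3  # fair
    if psnr_db > 20:
        return 2  # poor
    return 1      # bad
```

With this table, the QCIF results reported earlier would map as follows: no concealment (14.15 dB) to MOS 1, frame copy (27.20 dB) to MOS 3, and geometric warping after MV copy (31.85 dB) to MOS 4.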
Mobile to Mobile QCIF Clips
Figure: test setup with the labels Sender trace, NETWORK and Receiver trace; metrics: MSE, PSNR, DVQ, SSIM.
Server to Mobile Device CIF Clips
Figure: test setup with the labels Sender trace, NETWORK and Receiver trace; metrics: MSE, PSNR, DVQ, SSIM.
Applications and Future Work
A framework for evaluating the quality of standard video transmissions over a wireless infrastructure (system/subsystem).
Implement several novel error concealment algorithms using the JM 13.2 standard software [13].
A test bed to test novel VQA approaches. It can also be used for new encoding schemes, compression algorithms, motion estimation methods etc.
References
1) I. C. Todoli, “Performance of Error Concealment Methods for Wireless Video”, Diploma Thesis, Vienna University of Technology, 2007.
2) T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC Video Coding Standard”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 7, pp. 560-576, July 2003.
3) I. Richardson, “H.264 and MPEG-4 Video Compression”, John Wiley & Sons, 2003.
4) J. Klaue, B. Rathke, and A. Wolisz, “EvalVid - A Framework for Video Transmission and Quality Evaluation”, Proc. 13th Intl. Conf. on Modeling, Techniques and Tools for Computer Performance Evaluation, Urbana, IL, 2003.
5) P. Seeling et al., “Video Traces for Network Performance Evaluation: A Comprehensive Overview and Guide on Video Traces and Their Utilization in Networking Research”, Springer, 2007. http://www.springer.com/engineering/signals/book/978-1-4020-5565-2
6) Video Trace research group at ASU, “YUV Video Sequences”, http://trace.eas.asu.edu/yuv/index.html.
7) A. B. Watson, “Toward a Perceptual Video Quality Metric”, Human Vision, Visual Processing, and Digital Display VIII, 3299, pp. 139-147, 1998.
8) F. Xiao, “DCT-based Video Quality Evaluation”, Final Project for EE392J, Stanford Univ., 2000. http://compression.ru/video/quality_measure/vqm.pdf
9) Z. Wang, “The SSIM Index for Image Quality Assessment”, http://www.cns.nyu.edu/zwang/files/research/ssim/.
References (contd.)
10) D. Pröfrock, M. Schlauweg, E. Müller, “Content-Based Watermarking by Geometric Warping and Feature-Based Image Segmentation”, IEEE/ACM Proceedings of International Conference on Signal-Image Technology & Internet-Based Systems, 17 - 21. 2006, Hammamet, Tunisia.
11) S. Y. Lee, K. Y. Chwa, S. Y. Shin, “Image Morphing Using Deformable Surfaces”, Proc. Computer Animation (1994), vol 200, pp. 31-39.
12) Joint Video Team (JVT), “ITU-T Recommendation H.264: ISO/IEC 14496-10:2005”, ITU-T, 2005.
13) Joint Model (JM) - H.264/AVC Reference Software. http://iphome.hhi.de/suehring/tml/download/.
14) N. Feamster and H. Balakrishnan, “Packet Loss Recovery for Streaming Video”, 12th International Packet Video Workshop, Pittsburgh, PA, April 2002.
15) VQEG, “Final Report from the Video Quality Experts Group on the Validation of Objective Models of Video Quality Assessment”, Mar. 2000. http://www.vqeg.org/.