
Performance of the Objective Video Quality Metrics with Perceptual Weighting Considering First and Second Order Differential Operators

Carlos D. M. Regis
Federal Institute of Education, Science and Technology of Paraíba – Campus Campina Grande
Institute for Advanced Studies in Communications – Iecom
Campina Grande, Brazil

José V. M. Cardoso
Federal University of Campina Grande – UFCG
Institute for Advanced Studies in Communications – Iecom
Campina Grande, Brazil

Ítalo P. Oliveira
Federal Institute of Education, Science and Technology of Paraíba – Campus Campina Grande
Campina Grande, Brazil

Marcelo S. Alencar
Federal University of Campina Grande – UFCG
Institute for Advanced Studies in Communications – Iecom
Campina Grande, Brazil

ABSTRACT

Objective video quality assessment is important to validate the performance of applications in video communications systems. Research on visual attention and gradient-inspired metrics improves the performance of video quality evaluation. This paper investigates the use of first and second order differential operators as visual attention estimators; the edge information computed by differential operators is an important factor of visual attention. The performance of the proposed models is compared with that of the objective metrics PSNR, SSIM and GSSIM by means of the Pearson correlation coefficient, considering blurring, blocking, salt and pepper and Gaussian noise distortions. The results suggest that, for blurring, blocking and salt and pepper, the combination of objective metrics with first order operators provides a significant improvement, especially for the Sobel operator.

Categories and Subject Descriptors

D.2.8 [Software Engineering]: Metrics

General Terms

Algorithms, Measurement.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
WebMedia'12, October 15–18, 2012, São Paulo/SP, Brazil.
Copyright 2012 ACM 978-1-4503-1706-1/12/10 ...$15.00.

Keywords

Video Quality Assessment, Differential Operators, Structural Similarity, Human Visual System

1. INTRODUCTION

Video quality assessment (VQA) is an important process to validate communications systems that provide video services, including video on demand, video streaming, IPTV and Web TV. Subjective and objective methodologies are used to evaluate video quality. The subjective methods use psycho-visual experiments with end-users, who evaluate the video quality according to personal criteria. Although this is the correct approach to assess video quality [9], it is resource intensive, expensive and time-consuming, and it is impractical for real-time video quality assessment.

The objective metrics, on the other hand, evaluate the video quality by means of algorithms, which makes it possible to obtain results rapidly and is ideal for real-time applications. Classical objective metrics such as the MSE (Mean Squared Error) and the PSNR (Peak Signal-to-Noise Ratio) present an unsatisfactory correlation with the subjective perception of quality, undermining confidence in the results, because they do not incorporate features of the human visual system (HVS).

In 2004, Wang et al. [9] proposed the Structural SIMilarity (SSIM) index for image quality assessment, based on the assumption that the HVS is highly adapted to recognize structural information in the visual environment and, therefore, that a measure of the change in this structural information provides a good approximation of the quality perceived by the HVS.

A great deal of effort has been put into the design and improvement of image and video quality metrics. Gu et al. [3] proposed a novel structural similarity index that uses adaptive weights based on gradient vectors (first order derivatives) to measure image quality, considering the following characteristics: edge, smooth and texture regions. Chen et al. [2] presented a gradient-based structural similarity (GSSIM) index based on the assumption that edge and contour information are the features to which the human visual system (HVS) is most sensitive.

Regis et al. [8] proposed a technique called Perceptual Weighting (PW), which introduces the local Spatial Perceptual Information (SI) as a visual attention estimator for the SSIM, since experiments indicate that the quality perceived by the HVS is more sensitive in areas of intense visual attention [6]. The SI is basically computed by the Sobel differential operator, which estimates the magnitude of the gradient vectors of the video.

This paper presents a comparative study of first and second order differential operators used to estimate visual attention, applied with the PW technique to the objective video quality metrics SSIM and GSSIM. The remainder of this paper is organized as follows. Section 2 reviews the SSIM. Section 3 describes the proposed methods. Section 4 describes the subjective experiment. Section 5 presents the simulation results and their impact, and Section 6 presents the conclusions.

2. STRUCTURAL SIMILARITY INDEX REVIEW

Let f = {f_i | i = 1, 2, 3, ..., P} be the original video and h = {h_i | i = 1, 2, 3, ..., P} be the processed video. The SSIM is computed as a set of three measures over the luminance plane: the luminance comparison l(f, h), the contrast comparison c(f, h) and the structural comparison s(f, h),

l = \frac{2\mu_f \mu_h + C_1}{\mu_f^2 + \mu_h^2 + C_1}, \quad c = \frac{2\sigma_f \sigma_h + C_2}{\sigma_f^2 + \sigma_h^2 + C_2}, \quad s = \frac{\sigma_{fh} + C_3}{\sigma_f \sigma_h + C_3},    (1)

in which \mu is the average, \sigma is the standard deviation, \sigma_{fh} is the covariance, C_1 = (0.01 \cdot 255)^2, C_2 = 2 \cdot C_3 and C_3 = (0.03 \cdot 255)^2.

The structural similarity index is described as

\mathrm{SSIM}(f, h) = [l(f, h)]^{\alpha} \cdot [c(f, h)]^{\beta} \cdot [s(f, h)]^{\gamma},    (2)

in which almost always \alpha = \beta = \gamma = 1.

In practice, the SSIM is computed over an 8 × 8 sliding square window or an 11 × 11 Gaussian-circular window; the first approach is used in this paper. Then, for two videos subdivided into D blocks, the SSIM is computed as

\mathrm{SSIM}(f, h) = \frac{1}{D} \sum_{j=1}^{D} \mathrm{SSIM}(f_j, h_j).    (3)
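As a concrete illustration of Equations (1)-(3), the following minimal Python/numpy sketch computes the SSIM of a pair of luminance frames over non-overlapping 8 × 8 blocks. It is an illustrative reimplementation under our own naming, not the authors' C code, and it omits the sliding-window variant.

import numpy as np

C1 = (0.01 * 255) ** 2
C3 = (0.03 * 255) ** 2
C2 = 2 * C3  # constants as defined above

def ssim_block(f, h):
    # Equations (1)-(2) for one block pair, with alpha = beta = gamma = 1.
    mu_f, mu_h = f.mean(), h.mean()
    sig_f, sig_h = f.std(ddof=1), h.std(ddof=1)
    sig_fh = ((f - mu_f) * (h - mu_h)).sum() / (f.size - 1)
    l = (2 * mu_f * mu_h + C1) / (mu_f ** 2 + mu_h ** 2 + C1)
    c = (2 * sig_f * sig_h + C2) / (sig_f ** 2 + sig_h ** 2 + C2)
    s = (sig_fh + C3) / (sig_f * sig_h + C3)
    return l * c * s

def ssim_frame(f, h, bs=8):
    # Equation (3): average of the block SSIM values over the D blocks.
    rows, cols = (f.shape[0] // bs) * bs, (f.shape[1] // bs) * bs
    scores = [ssim_block(f[i:i + bs, j:j + bs].astype(float),
                         h[i:i + bs, j:j + bs].astype(float))
              for i in range(0, rows, bs) for j in range(0, cols, bs)]
    return float(np.mean(scores))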

3. DIFFERENTIAL OPERATORS APPLIED TO VQA ALGORITHMS

Let T = {\tau_i | \tau_i \in Z[0, 255] and i = 0, 1, 2, ..., P} be a video signal with 2^8 luminance levels. The gradient vector of T, \nabla T, estimates the rate of change of the luminance of the pixels along the horizontal and vertical directions. Its magnitude is computed as

|\nabla T| = \left[ \left( \frac{\partial T}{\partial x} \right)^{2} + \left( \frac{\partial T}{\partial y} \right)^{2} \right]^{1/2},    (4)

in which x and y represent the horizontal and vertical directions, respectively.

Digitally, the magnitude of the gradient is computed as

|\nabla T| = \left[ (O_1 * T)^{2} + (O_2 * T)^{2} \right]^{1/2},    (5)

in which * denotes the linear filtering operation and the pair of matrices (O_1, O_2) is called a differential operator. The first order differential operators considered in this work are Sobel (6), Prewitt (7) and Roberts cross (8). Their matrices are, respectively:

\begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix}, \quad \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \end{bmatrix};    (6)

\begin{bmatrix} -1 & 0 & +1 \\ -1 & 0 & +1 \\ -1 & 0 & +1 \end{bmatrix}, \quad \begin{bmatrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ +1 & +1 & +1 \end{bmatrix};    (7)

\begin{bmatrix} +1 & 0 \\ 0 & -1 \end{bmatrix}, \quad \begin{bmatrix} 0 & +1 \\ -1 & 0 \end{bmatrix}.    (8)

For the Sobel and Prewitt operators, the rate of change of the luminance is calculated in a coordinate system with 0 and π/2 rad as orthogonal directions. For the Roberts cross operator, the coordinate system is rotated by π/4 rad [5].
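Equation (5) amounts to two 2-D convolutions, one per matrix of the chosen operator pair, followed by a pixel-wise quadrature sum. A brief Python/scipy sketch (illustrative only; the function name and the symmetric boundary handling are our assumptions):

import numpy as np
from scipy.signal import convolve2d

# Operator pairs (O1, O2) from Equations (6)-(8).
SOBEL = (np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]),
         np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]]))
PREWITT = (np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]),
           np.array([[-1, -1, -1], [0, 0, 0], [1, 1, 1]]))
ROBERTS = (np.array([[1, 0], [0, -1]]),
           np.array([[0, 1], [-1, 0]]))

def gradient_magnitude(frame, operator=SOBEL):
    # Equation (5): filter the luminance frame with O1 and O2, combine in quadrature.
    o1, o2 = operator
    gx = convolve2d(frame.astype(float), o1, mode='same', boundary='symm')
    gy = convolve2d(frame.astype(float), o2, mode='same', boundary='symm')
    return np.sqrt(gx ** 2 + gy ** 2)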

The magnitude of the second order differential operator (Laplacian) is defined as [5]

|\nabla^{2} T| = \left[ \left( \frac{\partial^{2} T}{\partial x^{2}} \right)^{2} + \left( \frac{\partial^{2} T}{\partial y^{2}} \right)^{2} \right]^{1/2}.    (9)

In image processing, the Laplacian is an isotropic operator, i.e., its value is independent of the direction of the edge. Its convolution matrix is

\begin{bmatrix} 0 & +1 & 0 \\ +1 & -4 & +1 \\ 0 & +1 & 0 \end{bmatrix}.    (10)
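With the isotropic mask of Equation (10), the Laplacian response reduces to a single convolution; taking its absolute value gives a per-pixel edge-strength map. A short sketch under that reading (our interpretation of the digital computation, in the same style as the gradient helper above):

import numpy as np
from scipy.signal import convolve2d

LAPLACE = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]])  # mask of Equation (10)

def laplacian_magnitude(frame):
    # Filter with the isotropic Laplacian mask and take the magnitude of the response.
    resp = convolve2d(frame.astype(float), LAPLACE, mode='same', boundary='symm')
    return np.abs(resp)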

The gradient and the Laplacian are applied to highlight the edges of the objects, i.e., zones of transition of the pixel luminance. While a high gradient magnitude indicates the presence of an edge, the Laplacian produces a zero-crossing that indicates the existence of an edge [7], i.e., where the first derivative is at a maximum, the second derivative is zero [5], as shown in Figure 1.

Figure 1: First and second order derivatives across a smooth edge, used to identify edge information.

Results of research on perceptual vision indicate that the HVS weights differently the regions it perceives as edges, texture and smooth areas, i.e., these regions are structurally more important in the visual environment [3], [2].

In many gradient-based image and video quality metrics, the Sobel operators are used to estimate the edge information and to identify regions of higher visual interest. In the algorithm proposed in [3], the Sobel operators are used to classify the pixels into four regions: strong edge, weak edge, texture and smooth. From this division, adaptive weights are assigned to each region, thereby obtaining a weighted structural similarity index.

Similarly, the algorithm presented in [2], called the Gradient-based Structural Similarity Index (GSSIM), uses the Sobel operator and is computed as

\mathrm{GSSIM}(f_j, h_j) = [l(f_j, h_j)]^{\alpha} \cdot [c_g(f_j, h_j)]^{\beta} \cdot [s_g(f_j, h_j)]^{\gamma},    (11)

\mathrm{GSSIM}(f, h) = \frac{1}{D} \sum_{j=1}^{D} \mathrm{GSSIM}(f_j, h_j),    (12)

in which c_g(f_j, h_j) and s_g(f_j, h_j) are the contrast and structure comparisons of Equation 1 computed over the gradient vector maps.

To evaluate the PW technique with first and second order differential operators, a weight w_j is defined for each square region of the frame,

w_j = \left( \frac{1}{n - 1} \sum_{i=1}^{n} (\nabla f_i - \mu_{f'})^{2} \right)^{1/2},    (13)

in which \mu_{f'} represents the average magnitude of the gradient vectors and n is the total number of gradient vectors in the j-th block. When the frames are partitioned uniformly into 8 × 8 squares, n = 64. For the Laplacian, w_j is computed with \nabla^2 in place of \nabla.
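Equation (13) is simply the sample standard deviation of the gradient (or Laplacian) magnitudes inside a block. A minimal sketch, with an illustrative function name:

import numpy as np

def block_weight(grad_block):
    # Equation (13): sample standard deviation of the n gradient magnitudes
    # in the block (n = 64 for an 8 x 8 partition).
    g = np.asarray(grad_block, dtype=float).ravel()
    return float(np.sqrt(((g - g.mean()) ** 2).sum() / (g.size - 1)))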

To merge the weights w_j with the considered metrics, an approach similar to that used in the works of Akamine and Farias [1] and Liu and Heynderickx [6] was adopted: a weighted average of the VQA scores is computed, in which the weighting coefficients are the w_j, i.e.,

\text{M-PW-VQA} = \frac{\sum_{j=1}^{D} \mathrm{VQA}(f_j, h_j) \cdot w_j}{\sum_{j=1}^{D} w_j},    (14)

in which M ∈ {S, R, P, L} denotes the differential operator used to compute w_j, namely S (Sobel), R (Roberts), P (Prewitt) and L (Laplace), and likewise VQA ∈ {SSIM, GSSIM}.
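Equation (14) is a weighted pooling of the per-block quality scores. A short sketch, assuming the per-block VQA scores and the block weights of Equation (13) have already been computed (names are illustrative):

import numpy as np

def pw_pool(block_scores, block_weights):
    # Equation (14): weighted average of the per-block VQA scores,
    # with the w_j values as weighting coefficients.
    q = np.asarray(block_scores, dtype=float)
    w = np.asarray(block_weights, dtype=float)
    return float((q * w).sum() / w.sum())

Pooling SSIM block scores with Sobel-based weights yields S-PW-SSIM; using Laplacian-based weights yields L-PW-SSIM, and likewise for the other combinations.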

4. VALIDATION PROCESS

To validate and compare the presented approach, a subjective evaluation was performed, in which real users watched video sequences and assigned a score, according to their judgment, corresponding to the perceived quality. The average of those scores, for each video sequence, is called the Mean Opinion Score (MOS).

There are two important points to consider when implementing a subjective experiment: the selection of the video sequences and the method. For the first, the ITU-T Recommendation P.910 suggests observing a variety of Spatial Perceptual Information (SI) and Temporal Perceptual Information (TI) values among the chosen video sequences, so that the subjective experiment does not become monotonous. Accordingly, the chosen video sequences were "Foreman", "Mobile", "Glasgow" and "Mother and Daughter", in QCIF format (ideal for mobile receivers). Their SI and TI values are presented in Figure 3.

Figure 3: Spatio-temporal information (SI versus TI) of the selected video sequences Foreman, Glasgow, Mother and Mobile.
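For reference, the SI and TI plotted in Figure 3 are defined in ITU-T Rec. P.910 [4] as temporal maxima of spatial standard deviations, which can be sketched as follows (the Sobel magnitude helper from Section 3 is assumed; frames are grayscale arrays):

import numpy as np

def spatial_information(frames, sobel_magnitude):
    # SI per ITU-T Rec. P.910: maximum over time of the spatial standard
    # deviation of each Sobel-filtered luminance frame.
    return max(sobel_magnitude(f).std() for f in frames)

def temporal_information(frames):
    # TI per ITU-T Rec. P.910: maximum over time of the spatial standard
    # deviation of successive luminance frame differences.
    return max((np.asarray(b, dtype=float) - np.asarray(a, dtype=float)).std()
               for a, b in zip(frames[:-1], frames[1:]))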

For the second, the literature provides several methods [4], including the Absolute Category Rating (ACR), used in this work, categorized as a Single Stimulus Method (SSM), i.e., a category in which the processed video sequences are evaluated independently according to a discrete judgment scale. The main advantages of an SSM method are its easy implementation and the fast acquisition of results [4].

The selected videos were subjected to four types of distortion: Gaussian noise, blocking, blurring and salt & pepper noise, each with two levels of intensity, which resulted in a total of 32 synthetic videos, evaluated by 40 people for salt & pepper, blocking and blurring, and by 24 people for Gaussian noise. These types of distortion often occur in video processing systems [10]; therefore, it is important that they be simulated in controlled environments. The parameters used to simulate the distortions are shown in Table 1.

Table 1: Distortion parameters used in the distortion simulator.

Distortion                  Intensity   Parameters
Salt & Pepper Noise (SP)    1           Probability 1%
                            2           Probability 3%
Blurring (BR)               1           Mean filter with 3 × 3 mask (two applications)
                            2           Mean filter with 3 × 3 mask (four applications)
Blocking (BK)               1           Probability 1%
                            2           Probability 3%
Gaussian Noise (GN)         1           σ = 0.0002
                            2           σ = 0.003
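As an illustration of the Table 1 settings, the salt & pepper and blurring distortions can be simulated in a few lines (a sketch under the stated parameters, not the authors' simulator; the even split of corrupted pixels between 0 and 255 is our assumption, and the blocking and Gaussian noise generators are omitted):

import numpy as np
from scipy.signal import convolve2d

def salt_and_pepper(frame, p=0.01, rng=None):
    # Corrupt each pixel with probability p (Table 1: 1% or 3%),
    # half of the corrupted pixels going to 0 and half to 255.
    rng = np.random.default_rng() if rng is None else rng
    out = frame.copy()
    u = rng.random(frame.shape)
    out[u < p / 2] = 0
    out[(u >= p / 2) & (u < p)] = 255
    return out

def blur(frame, applications=2):
    # Repeatedly filter with a 3 x 3 mean mask (Table 1: two or four applications).
    mask = np.full((3, 3), 1.0 / 9.0)
    out = frame.astype(float)
    for _ in range(applications):
        out = convolve2d(out, mask, mode='same', boundary='symm')
    return out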

5. SIMULATION RESULTS

To compare the performance of the VQA algorithms described in the previous section, the Pearson Linear Correlation Coefficient (PCC) between the objective measures and the MOS was taken into account. The PCC quantifies the accuracy of an objective metric: the closer it is to one, the better the metric.
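The PCC between the vector of objective scores and the vector of MOS values can be computed directly (a minimal sketch; np.corrcoef or scipy.stats.pearsonr would serve equally well):

import numpy as np

def pearson_correlation(objective_scores, mos):
    # Pearson linear correlation coefficient between objective scores and MOS.
    x = np.asarray(objective_scores, dtype=float)
    y = np.asarray(mos, dtype=float)
    x, y = x - x.mean(), y - y.mean()
    return float((x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum()))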

Table 2 presents the PCC between the MOS and the predicted measures, considering the distortions of Table 1. In general, a significant improvement is noted for the metrics that embed the Sobel first order differential operator.


Figure 2: Differential operators: (a) Original, (b) Sobel, (c) Prewitt, (d) Roberts, (e) Laplace.

Specifically, the S-PW-SSIM for videos subjected to salt and pepper noise, and the S-PW-GSSIM for videos presenting blurring and blocking degradations, also presented high correlations.

The Laplacian second order operator presents the highest value for the blocking scenario. It is also important to observe that none of the proposed models showed an improvement in the correlation for Gaussian noise, which can be explained by the low correlation between Gaussian noise and the SI [8].

Table 2: Pearson Linear Correlation Coefficients.

Model         SP      BR      BK      GN
PSNR          0.828   0.607   0.697   0.858
SSIM          0.902   0.776   0.792   0.931
S-PW-SSIM     0.920   0.866   0.834   0.918
R-PW-SSIM     0.883   0.870   0.831   0.903
P-PW-SSIM     0.914   0.874   0.833   0.912
L-PW-SSIM     0.907   0.882   0.840   0.915
GSSIM         0.864   0.972   0.867   0.822
S-PW-GSSIM    0.914   0.984   0.876   0.913
R-PW-GSSIM    0.865   0.983   0.872   0.905
P-PW-GSSIM    0.911   0.984   0.875   0.911
L-PW-GSSIM    0.894   0.977   0.876   0.917

With respect to the computational cost, the Sobel and Prewitt first order operators present practically the same increment in execution time: for the combination with SSIM (S-PW-SSIM and P-PW-SSIM) the increment was 450 ms, and for GSSIM (S-PW-GSSIM and P-PW-GSSIM) it was 500 ms. The Roberts and Laplacian operators present shorter processing times: relative to SSIM (R-PW-SSIM and L-PW-SSIM) the time differences were 331 ms and 210 ms, respectively, and relative to GSSIM (R-PW-GSSIM and L-PW-GSSIM) they were 370 ms and 286 ms, respectively.

6. CONCLUSIONS

This paper presented an investigation of how to improve some objective metrics with the introduction of first and second order differential operators to identify the visual attention in a scene. The proposed technique combines the local magnitude of the first and second order differential operators, which provides an estimate of the visual attention, with objective metrics.

The experiments considered blurring, blocking, salt & pepper and Gaussian noise distortions. The performance of the proposed metrics was compared with the PSNR, GSSIM and SSIM metrics. The results suggest that the Sobel first order differential operator is the best one to estimate the visual attention when combined with objective video quality metrics. The Laplacian second order differential operator provided a small improvement in the correlation coefficient, but it is fast to compute. The simulations were performed using the C programming language on a notebook with an Intel(R) Core(TM) i5-2410M CPU at 2.30 GHz running the GNU/Linux Ubuntu 10.10 operating system.

7. ACKNOWLEDGMENTS

The authors would like to thank IFPB, CNPq/PIBITI, Iecom and COPELE/UFCG for providing research support.

8. REFERENCES

[1] W. Y. L. Akamine and M. C. Q. Farias. Incorporating visual attention models into image quality metrics. In Proceedings of the Sixth International Workshop on Video Processing and Quality Metrics for Consumer Electronics (VPQM), 2012.

[2] G.-H. Chen, C.-L. Yang, and S.-L. Xie. Gradient-based structural similarity for image quality assessment. In IEEE International Conference on Image Processing, pages 2929–2932, Oct. 2006.

[3] S. Gu, F. Shao, G. Jiang, and M. Yu. A new four-component gradient-based structural similarity metric using adaptive weights. In Eighth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), volume 2, pages 970–973, July 2011.

[4] ITU-T. ITU-T Recommendation P.910: Subjective video quality assessment methods for multimedia applications. Technical report, ITU-T, April 2008.

[5] M. Juneja and P. S. Sandhu. Performance evaluation of edge detection techniques for images in spatial domain. International Journal of Computer Theory and Engineering, (5), Dec. 2009.

[6] H. Liu and I. Heynderickx. Studying the added value of visual attention in objective image quality metrics based on eye movement data. In 16th IEEE International Conference on Image Processing, pages 3097–3100, Nov. 2009.

[7] E. Nadernejad, S. Sharifzadeh, and H. Hassanpour. Edge detection techniques: Evaluations and comparisons. Applied Mathematical Sciences, (31):1507–1520, 2008.

[8] C. D. M. Regis, J. V. M. Cardoso, and M. S. Alencar. Video quality assessment based on the effect of the estimation of the spatial perceptual information. In Anais do XXX Simpósio Brasileiro de Telecomunicações (SBrT'12), 2012.

[9] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, April 2004.

[10] S. Winkler. Digital Video Quality: Vision Models and Metrics. Wiley, 2005.
