
A VISUAL ATTENTION BASED REFERENCE FREE PERCEPTUAL QUALITY METRIC

Ali Shariq Imran, Fahad Fazal Elahi Guraya, Faouzi Alaya Cheikh

Gjøvik University College, P.O. Box 191, N-2802 Gjøvik, Norway

phone: +47 96822072, email: [email protected]

ABSTRACT

In this paper we study image distortions and impairments that affect the perceived quality of blackboard lecture images. We also propose a novel reference free image quality evaluation metric that correlates well with the perceived image quality. The perceived quality of images of blackboard lecture contents is mostly affected by the presence of noise, blur and compression artifacts. Therefore, the importance of these impairments is estimated and used in the proposed quality metric. In this context there is no reference, distortion free, image; thus we propose to evaluate the perceived image quality based on features extracted from its content. The proposed objective metric estimates the blockiness and blur artifacts in the salient regions of the lecture images. The use of a visual saliency model allows the metric to focus only on the distortions in perceptually important regions of the images, hence mimicking the human visual system in its perception of image quality. The experimental results show a very good correlation between the objective quality scores obtained by our metric and the mean opinion scores obtained via psychophysical experiments. The obtained objective scores are also compared to those of the PSNR.

Index Terms: Reference Free, Quality Evaluation, Perceptual Quality Metric, Lecture Images, Text Saliency

1. INTRODUCTION

Recent advances in e-learning technologies coupled with significant internet growth have led to the widespread use and availability of digital lecture videos [1]. Most of these images and lecture videos are still based on the use of a traditional blackboard with handwritten text. Apart from e-learning, these images and videos are widely used for the creation, storage and retrieval of multimedia learning objects [1], for optical character recognition of handwritten text, and for lecture video summarization and indexing. However, the quality evaluation of blackboard images and lecture videos is still an untouched area.

Image quality can be measured objectively (by an algorithm) or subjectively (via viewing sessions where viewers are asked for their opinion score on the image quality). Objective quality metrics can be divided into three categories, namely full reference, reduced-reference and reference free metrics. In full reference quality estimation, the distorted image is usually subtracted from the original image. Peak Signal to Noise Ratio (PSNR), Mean Absolute Error (MAE) and Mean Squared Error (MSE) are still the most widely used full reference quality metrics. In reduced-reference estimation, features are extracted from both the reference and distorted images and later used to quantify the degradation in the distorted image. Both of these categories of objective metrics make use of the original image as a reference. This, however, is not always possible, since in most cases the original image is not available. Therefore, reference free objective quality metrics are desirable. These metrics use a virtual reference, i.e. they are application domain dependent and rely on detecting known coding artifacts.

In our application, excess chalk dust can create noise in the board images during repetitive writing and erasing of text. The quality of the content is more severely degraded by the compression artifacts introduced by image or video encoders. Work has been done in recent years to develop systems for enhancing the visual quality of whiteboard images [2] and documents acquired with portable digital cameras [3]. Most of these systems rely on applying various enhancement techniques directly on the image to enhance the text [4, 5] without evaluating the quality of the media. Blocking and blurring are the prominent compression artifacts affecting the readability of the blackboard content. The initial work on estimating blockiness was carried out by Wang and Bovik [6, 7, 8]. Various techniques have since been adopted to estimate these artifacts based on edge sharpness level [9, 10], contrast similarity [11], geometric moments [12], and other spatial features [13, 14].

In this paper, we propose a non-reference quality metric for blackboard images based on visual attention analysis. Visual attention plays an important role in determining the perceived image quality: it is widely accepted that under normal viewing conditions, the human eye tends to follow visually salient regions [15]. Here we propose a quality metric that gives higher weights to salient regions and lower weights to non-salient regions of a blackboard image. Visual attention models for saliency have previously been combined with perceptual quality assessment by Barland and Saadane [16] and by You et al. [17].

The rest of this paper is organized as follows: Section 2 presents our quality metric, which computes the quality score of a blackboard image after computing the blurriness and blockiness in the salient regions of the image. Section 3 describes the experimental setup and the psychophysical experiments. In Section 4, the results obtained from the psychophysical experiments are compared with existing full reference metrics. In the last section, conclusions and future directions are explored.

2. PROPOSED MODEL

The simplified process flow chart of our proposed attention based reference free quality metric model is shown in Figure 1. The system takes lecture images as input. It then computes the text saliency, estimates the blocking and blurring artifacts in the salient regions, and outputs an objective quality value. Psychophysical experiments are conducted to obtain the mean opinion score; the same sets of images are used for the subjective experiments. Lastly, the correlation between the objective quality score obtained from the artifact estimation and the mean opinion score is computed.

Fig. 1. Process flow for the proposed model.

2.1. Saliency Detection

In our proposed quality estimation metric, we first have to detect the salient regions of the image, which requires a saliency detection model. Traditional bottom-up saliency detection models [18, 19] make use of features such as color, intensity and orientation to compute the salient regions in an image. However, these techniques fail when applied to blackboard images with handwritten text because of the minor variations in color and intensity. Therefore, to detect the salient regions of blackboard images, i.e. the regions with text, we propose a simple technique. First we detect the horizontal and vertical edges in the image using a Sobel edge detector, followed by noise removal using a median filter. The resulting edge image is considered the saliency map of the original image. The original image is shown in Figure 2(a) (top left) and the saliency map in Figure 2(b) (top right). Here we assume that the images contain only the inner part of the blackboard with text, so that only the blackboard text regions are detected as salient. False salient regions could be detected if the image contained the blackboard boundary edges or edges due to the presence of other objects. Figures 2(c) and 2(d) show the 8 × 8 salient blocks. The blockiness and blurriness estimation is then carried out in these salient block regions.

Fig. 2. 8 × 8 saliency block representation for text detection: (a) original image, (b) saliency map, (c) block representation, (d) salient blocks.
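For illustration, here is a minimal sketch of this saliency step in Python, assuming a grayscale image given as a NumPy array; the binarization threshold and the block-activity criterion are our own assumptions, as the paper does not specify them:

```python
import numpy as np
from scipy import ndimage

def text_saliency_blocks(gray, block=8, thresh=0.1):
    """Sketch of Sec. 2.1: Sobel edges + median filter -> salient 8x8 blocks.

    The threshold `thresh` and the mean-edge-energy criterion are
    assumptions; the paper only states the Sobel/median-filter steps.
    """
    img = gray.astype(float)
    # Horizontal and vertical edges via Sobel operators
    gx = ndimage.sobel(img, axis=1)
    gy = ndimage.sobel(img, axis=0)
    edges = np.hypot(gx, gy)
    # Median filter removes isolated noise such as chalk-dust specks
    edges = ndimage.median_filter(edges, size=3)
    # Normalized edge map serves as the saliency map
    edges /= edges.max() + 1e-12
    # Mark 8x8 blocks with sufficient edge energy as (text-)salient
    h, w = edges.shape
    salient = np.zeros((h // block, w // block), dtype=bool)
    for j in range(h // block):
        for i in range(w // block):
            patch = edges[j*block:(j+1)*block, i*block:(i+1)*block]
            salient[j, i] = patch.mean() > thresh
    return edges, salient
```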

2.2. Blockiness Estimation

Compression standards like JPEG and MPEG use blocks of 8 × 8 pixels for the DCT transform and quantization operations. This creates artifacts in the compressed images at the edges of these blocks. Here we propose an algorithm that detects edges at the 8 × 8 block boundaries of the image and uses them to compute the degree of block processing in the image.


The extrapolated difference between adjacent blocks constituting the salient regions is calculated to estimate the blockiness effect on the perceived image quality.

Let $B_{ij}$ be an 8 × 8 block of pixels from the salient regions of the image.

The value of the blocking artifact $Blc_v$ across two horizontally adjacent blocks $B_{11}$ and $B_{12}$, as illustrated in Figure 3, represents a measure of the discontinuity at the vertical boundary between the two blocks. This value is computed as follows: first, the vertical discontinuity is evaluated for each line across the two blocks. This vertical discontinuity is computed as the absolute difference of the two extrapolated values, $E_l$ and $E_r$, across the boundaries of the two adjacent blocks. $E_l$ and $E_r$ are calculated using a first order extrapolator given as:

$$E_l = \frac{3}{2} x_1 - \frac{1}{2} x_2 \qquad (1)$$

$$E_r = \frac{3}{2} y_1 - \frac{1}{2} y_2 \qquad (2)$$

where $x_1$, $x_2$ and $y_1$, $y_2$ are the pixel values at the boundary of the blocks, as illustrated in Figure 3.

The vertical artifact value is then the mean of the eight discontinuities within a single block, where $(E_r)_j$ and $(E_l)_j$ are the extrapolated values for the $j$th row:

$$Blc_v = \frac{1}{8} \sum_{j=0}^{7} \left| (E_r)_j - (E_l)_j \right| \qquad (3)$$

Fig. 3. Representation of pixel values for blocking artifacts.

The values for the horizontal artifacts, $Blc_h$, can be calculated in a similar fashion. A blockiness score is estimated by summing the vertical and horizontal blockiness artifacts:

$$BS = Blc_v + Blc_h \qquad (4)$$

Figure 4 shows the increase in the measured blocking artifacts as blockiness impairments (due to compression) are added to an image.

Fig. 4. Estimation of blocking artifacts in an image.
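A minimal sketch of the vertical blockiness measure of Eqs. (1)-(3) for one pair of horizontally adjacent 8 × 8 blocks; variable names are ours, and the aggregation over all salient block pairs into Eq. (4) is only outlined in the closing comment:

```python
import numpy as np

def vertical_blockiness(left, right):
    """Blc_v for two horizontally adjacent 8x8 blocks, Eqs. (1)-(3).

    `left` and `right` are 8x8 pixel arrays; x1/x2 (y1/y2) are the two
    pixel columns nearest the shared boundary, as in Figure 3.
    """
    x1 = left[:, -1].astype(float)   # boundary column of the left block
    x2 = left[:, -2].astype(float)   # next column inward
    y1 = right[:, 0].astype(float)   # boundary column of the right block
    y2 = right[:, 1].astype(float)   # next column inward
    # First-order extrapolation to the boundary, Eqs. (1) and (2)
    el = 1.5 * x1 - 0.5 * x2
    er = 1.5 * y1 - 0.5 * y2
    # Mean absolute discontinuity over the 8 rows, Eq. (3)
    return np.mean(np.abs(er - el))

# Blc_h follows analogously over vertically adjacent blocks, and the
# blockiness score is BS = Blc_v + Blc_h as in Eq. (4).
```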

2.3. Blur Estimation

Blur is usually caused by the quantization process or by the de-blocking filter. The detail of an image is represented by its high frequency components, and the quantization process removes such high frequency information, which results in a blurred image. Blur is hard to estimate accurately without a reference image. One way of finding the amount of blur is to monitor the changes in the activity signal [6]. Blur is calculated across the horizontal and vertical boundaries of the 8 × 8 adjacent blocks. As we are only interested in the blocks that are text salient, the inaccurate results that occur in regions with too complicated or too plain a texture are ignored, as in [20]. The local variance is used to estimate the blurriness in the image regions constituting the salient blocks. First the variance is calculated across the horizontal blocks and then across the vertical ones. The local variance at the first row of blocks $B_{11}$ and $B_{12}$ is given by:

$$\sigma_{11} = \sqrt{\frac{\sum_{i=1}^{n} |x_i - y_i|}{n-1}} \qquad (5)$$

where $n = 2$ in the example in Figure 3. Next we compute the average of these local variances along row $j$ of the image, as follows:

$$\Delta\sigma_j = \operatorname{mean}\{\sigma_{ji} - \sigma_{j(i+1)} \mid i \in \{1, 2, \ldots, K\}\} \qquad (6)$$

where $K$ is the number of 8 × 8 blocks in the horizontal direction of the image. The total blur in the vertical direction is given by:

$$Blr_v = \sum_{j=1}^{N} \Delta\sigma_j \qquad (7)$$


where $N$ is the total number of rows in the image. The value for the horizontal blur, $Blr_h$, is calculated in a similar fashion.
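A minimal sketch of Eqs. (5)-(7); the indexing of the local variances in Eq. (6) is our reading of the paper's notation:

```python
import numpy as np

def local_sigma(x, y):
    """Local variance across a block boundary, Eq. (5).

    `x` and `y` hold the n boundary pixel values on either side of
    the boundary (n = 2 in the Figure 3 example).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    return np.sqrt(np.sum(np.abs(x - y)) / (n - 1))

def vertical_blur(sigma):
    """Blr_v from an N x K array of local variances, Eqs. (6)-(7).

    sigma[j, i] is the local variance at the i-th block boundary in
    block row j; this layout is an assumption on our part.
    """
    # Eq. (6): mean difference between neighbouring variances per row
    delta = np.mean(sigma[:, :-1] - sigma[:, 1:], axis=1)
    # Eq. (7): sum over the rows
    return float(np.sum(delta))
```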

2.4. Quality Prediction Score

An overall image quality value is obtained by combining the measured blockiness and blurriness. First, the average blocking and blurring values are obtained by combining the vertical and horizontal artifacts:

$$Blc = \frac{Blc_v + Blc_h}{2} \qquad (8)$$

$$Blr = \frac{Blr_v + Blr_h}{2} \qquad (9)$$

Then the following prediction model is used to combine the artifacts:

$$QPM = 10 \times \left( \alpha + \delta \times Blc^{a} \times Blr^{b} \right) \times T^{c} \qquad (10)$$

The values $a = -0.24$, $b = -0.16$, and $c = 0.06$ are estimated, using a non-linear regression routine, from the image dataset used to train the algorithm. The parameters $\alpha = 0.5$ and $\delta = 2.356$ are adjusted based on the opinion scores on the training dataset, while $T = 2$ is a perceptual threshold obtained via the subjective test questionnaire; it represents the acceptable amount of blocking and blurring artifacts in an image and is used to fine-tune the parameters $\alpha$ and $\delta$.
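As a worked sketch, Eq. (10) with the reported parameter values (the example input values are hypothetical):

```python
def qpm(blc, blr, a=-0.24, b=-0.16, c=0.06,
        alpha=0.5, delta=2.356, t=2.0):
    """Quality prediction score, Eq. (10).

    `blc` and `blr` are the averaged blockiness and blur of
    Eqs. (8)-(9); they must be positive, since a and b are
    negative exponents.
    """
    return 10.0 * (alpha + delta * blc**a * blr**b) * t**c

# Hypothetical example inputs: larger artifact values lower the score
print(qpm(blc=1.0, blr=1.0))   # weaker artifacts  -> higher score
print(qpm(blc=8.0, blr=4.0))   # stronger artifacts -> lower score
```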

3. EXPERIMENTAL SETUP

The psychophysical experiments were conducted on a Dell 2407 wide flat panel with the monitor white point set to D65, a light intensity of 120 cd/m² (376.8 lux), and a resolution of 1920 × 1200 pixels with 32-bit color. The ambient light intensity was set to 200 lux. The images were shown at their original sizes, with an average resolution of 786 × 560 pixels per image. Figure 5 shows samples of the images used.

Three different categories of images were created from 7 original images consisting of 3 green board, 2 whiteboard and 2 blackboard images. The first two categories contain images with blocking and blurring artifacts, respectively; each of these categories consists of 5 datasets of 10 images each. The third category consists of 3 datasets of 10 images each. Table 1 shows the dataset classification.

Table 1. Classification of artifacts into categories.

         Artifact type   Sets   Images per set   Total images
Cat. 1   blocking        5      10               50
Cat. 2   blurring        5      10               50
Cat. 3   block/blur      3      10               30

Fig. 5. Samples of images from different categories in the dataset.

During the psychophysical experiment, a total of 130 images were shown in random order to 17 subjects, without the reference images. The viewers were asked to rate the quality of each image based on the ease of readability of the text, on a scale of 1 to 5, where 1 means the image is barely readable (highly degraded) and 5 corresponds to an easily readable one. Figure 6 shows the quality rating scale.

Fig. 6. Quality rating scale.

4. EXPERIMENTAL RESULTS

A high correlation value of 0.92 is obtained over the total of 130 images from the three different categories when comparing the quality prediction model (QPM) score with the Mean Opinion Score (MOS). The results were obtained from 17 non-expert subjects. The prediction model was trained on 65 images from all categories of the dataset. There is also a significant increase in correlation, from 0.81 to 0.92, when comparing MOS with PSNR and with QPM respectively, as seen in Figures 7 and 8.
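The paper does not state which correlation coefficient was used; assuming the common Pearson linear correlation, the comparison can be reproduced along these lines (the score arrays are hypothetical placeholders):

```python
import numpy as np

def pearson(scores, mos):
    """Pearson linear correlation between objective scores and MOS."""
    return float(np.corrcoef(np.asarray(scores, dtype=float),
                             np.asarray(mos, dtype=float))[0, 1])

# e.g. pearson(qpm_scores, mos)  -> reported as 0.92 in the paper
#      pearson(psnr_scores, mos) -> reported as 0.81 in the paper
```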

Fig. 7. Scatter plot of QPM score vs. MOS (correlation 0.927).

Fig. 8. Scatter plot of PSNR vs. MOS (correlation 0.815).

Table 2 shows the average correlation per artifact for datasets containing images impaired with blocking, with blurring, and with both blocking and blurring artifacts in each image. The blocking artifact is introduced in the images using the imwrite routine in MATLAB.
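For readers without MATLAB, an equivalent sketch in Python using Pillow; the quality level is our own choice, as the paper does not report the settings passed to imwrite:

```python
from PIL import Image

def add_blocking(src_path, dst_path, quality=10):
    """Introduce blocking artifacts by re-encoding at low JPEG quality.

    Low quality factors coarsen the 8x8 DCT quantization and make the
    block boundaries visible, mimicking the paper's MATLAB imwrite step.
    """
    Image.open(src_path).convert("RGB").save(
        dst_path, format="JPEG", quality=quality)
```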

It is difficult to accurately estimate the amount of blurring artifact without a reference image. For this reason, PSNR shows slightly better results on some images having only blurring artifacts. For the rest of the images in each category, QPM shows very high correlation; the results are shown in Table 3. In images with both artifacts, the presence of one artifact masks the other. This is the case for the category 3 dataset images, which were first impaired with blurring artifacts and then with blocking. This results in the masking of the blurring artifact by the blocking, i.e. the blurring artifacts tend to fade out. For these kinds of images as well, QPM shows a significantly higher correlation with MOS than PSNR.

Table 2. Average correlation per artifact.

       Category 1   Category 2   Category 3
       (blocking)   (blurring)   (block/blur)
PSNR   0.837        0.752        0.853
QPM    0.972        0.842        0.932

Table 3. Individual dataset correlation coefficients.

Blocking
Dataset No.   1      2      3      4      5
PSNR          0.77   0.63   0.89   0.54   0.76
QPM           0.94   0.95   0.98   0.91   0.98

Blurring
PSNR          0.67   0.61   0.81   0.59   0.71
QPM           0.65   0.59   0.80   0.87   0.77

Blocking/Blurring
PSNR          0.85   0.82   0.87   -      -
QPM           0.93   0.93   0.94   -      -

5. CONCLUSION AND FUTURE WORK

In this paper we studied different image distortions and impairments that affect the perceived quality of blackboard lecture images, and conducted psychophysical tests where users were asked to rate the perceived quality of the impaired images. The test images were extracted from lecture videos stored in compressed format and thus suffered from different types of impairments introduced by the compression process. The MOS was calculated for the different types of impairments in the training dataset and used to develop a reference free objective quality metric. This metric was shown to correlate well with the subjective scores obtained for the test dataset. From the experiments it was observed that people mostly focus on the text rather than the background. To exploit this, the QPM relies on text salient regions around which it estimates the degrees of blur and blockiness, which makes it correlate better with the human visual system. Comparison is made with the PSNR, as it is still a widely accepted quality metric where video compression is involved.

Even though the initial results obtained in this work are very encouraging, adopting a metric for lecture images obtained from lecture videos in general is not easy. This is due to the sensitivity of the overall quality of a blackboard image to the text size, orientation and handwriting style. Moreover, quality degradations in some image regions may be less noticeable due to a uniform background with little text. Therefore, further studies are needed to improve the stability of the proposed quality metric and to cover different types of videos.


6. REFERENCES

[1] Ali Shariq Imran, "Interactive media learning object in distance and blended education," in MM '09: Proceedings of the 17th ACM International Conference on Multimedia, New York, NY, USA, 2009, pp. 1139–1140, ACM.

[2] Zhengyou Zhang and Li-wei He, "Note-taking with a camera: whiteboard scanning and image enhancement," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '04), 2004, vol. 3, pp. 533–536.

[3] R. Dueire Lins, G. Pereira e Silva, and A.R. Gomes e Silva, "Assessing and improving the quality of document images acquired with portable digital cameras," in Proc. Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), 2007, vol. 2, pp. 569–573.

[4] Konstantinos Ntirogiannis, B. Gatos, and I. Pratikakis, "An objective evaluation methodology for document image binarization techniques," in Proc. Eighth IAPR International Workshop on Document Analysis Systems (DAS '08), 2008, pp. 217–224.

[5] Lijun Tang and J.R. Kender, "Educational video understanding: mapping handwritten text to textbook chapters," in Proc. Eighth International Conference on Document Analysis and Recognition (ICDAR 2005), 2005, vol. 2, pp. 919–923.

[6] Zhou Wang, A.C. Bovik, and B.L. Evans, "Blind measurement of blocking artifacts in images," in Proc. 2000 International Conference on Image Processing (ICIP 2000), 2000, vol. 3, pp. 981–984.

[7] A.C. Bovik and Shizhong Liu, "DCT-domain blind measurement of blocking artifacts in DCT-coded images," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '01), 2001, vol. 3, pp. 1725–1728.

[8] Zhou Wang, H.R. Sheikh, and A.C. Bovik, "No-reference perceptual quality assessment of JPEG compressed images," in Proc. 2002 International Conference on Image Processing (ICIP 2002), 2002, vol. 1, pp. 477–480.

[9] Xin Li, "Blind image quality assessment," in Proc. 2002 International Conference on Image Processing (ICIP 2002), 2002, vol. 1, pp. 449–452.

[10] Xin Wang, Baofeng Tian, Chao Liang, and Dongcheng Shi, "Blind image quality assessment for measuring image blur," in Proc. 2008 Congress on Image and Signal Processing (CISP '08), 2008, vol. 1, pp. 467–470.

[11] Wei Fu, Xiaodong Gu, and Yuanyuan Wang, "Image quality assessment using edge and contrast similarity," in Proc. IEEE International Joint Conference on Neural Networks (IJCNN 2008), 2008, pp. 852–855.

[12] Chong-Yaw Wee, R. Paramesran, and R. Mukundan, "Quality assessment of Gaussian blurred images using symmetric geometric moments," in Proc. 14th International Conference on Image Analysis and Processing (ICIAP 2007), 2007, pp. 807–812.

[13] Z.M.P. Sazzad, Y. Kawayoke, and Y. Horita, "Spatial features based no reference image quality assessment for JPEG2000," in Proc. IEEE International Conference on Image Processing (ICIP 2007), 2007, vol. 3, pp. 517–520.

[14] Jingchao Zhou, Baihua Xiao, and Qiudan Li, "A no reference image quality assessment method for JPEG2000," in Proc. IEEE International Joint Conference on Neural Networks (IJCNN 2008), 2008, pp. 863–868.

[15] Timothee Jost, Nabil Ouerhani, Roman von Wartburg, Rene Muri, and Heinz Hugli, "Assessing the contribution of color in visual attention," Computer Vision and Image Understanding, vol. 100, no. 1-2, pp. 107–123, 2005.

[16] R. Barland and A. Saadane, "Blind quality metric using a perceptual importance map for JPEG-2000 compressed images," in Proc. IEEE International Conference on Image Processing (ICIP 2006), 2006, pp. 2941–2944.

[17] Junyong You, Andrew Perkis, Miska M. Hannuksela, and Moncef Gabbouj, "Perceptual quality assessment based on visual attention analysis," in MM '09: Proceedings of the 17th ACM International Conference on Multimedia, New York, NY, USA, 2009, pp. 561–564, ACM.

[18] L. Itti and C. Koch, "Computational modelling of visual attention," Nature Reviews Neuroscience, vol. 2, no. 3, pp. 194–203, Mar 2001.

[19] Wolfgang Einhäuser, T. Nathan Mundhenk, Pierre Baldi, Christof Koch, and Laurent Itti, "A bottom-up model of spatial attention predicts human error patterns in rapid scene recognition," Journal of Vision, vol. 7, no. 10, 2007.

[20] Liu Debing, Chen Zhibo, Ma Huadong, Xu Feng, and Gu Xiaodong, "No reference block based blur detection," in Proc. International Workshop on Quality of Multimedia Experience (QoMEx 2009), 2009, pp. 75–80.
