
Computer Vision and Image Understanding 109 (2008) 56–68

A feature-based metric for the quantitative evaluation of pixel-level image fusion

Zheng Liu a,*, David S. Forsyth a, Robert Laganière b

a Institute for Aerospace Research, National Research Council Canada, 1200 Montreal Road, Ottawa, Ont., Canada K1A 0R6
b School of Information Technology and Engineering, University of Ottawa, Canada

Received 14 February 2005; accepted 25 April 2007
Available online 24 May 2007

Abstract

Pixel-level image fusion has been investigated in various applications and a number of algorithms have been developed and proposed. However, few authors have addressed the problem of how to assess the performance of those algorithms and evaluate the resulting fused images objectively and quantitatively. In this study, two new fusion quality indexes are proposed and implemented using the phase congruency measurement of the input images. The feature-based measurements can therefore provide a blind evaluation of the image fusion result, i.e. no reference image is needed. These metrics take advantage of the phase congruency measurement, which provides a dimensionless, contrast- and brightness-invariant representation of image features. The fusion quality indexes are compared with recently developed blind evaluation metrics. The validity of the new metrics is demonstrated by tests on the fusion results achieved by a number of multiresolution pixel-level fusion algorithms.
Crown copyright © 2007 Published by Elsevier Inc. All rights reserved.

Keywords: Feature measurement; Image fusion; Image quality; Phase congruency; Cross-correlation

1. Introduction

The analysis of multisensor images benefits from the technique known as image fusion. The applications include, but are not limited to, military target detection, medical imaging, remote sensing, nondestructive inspection, and intelligent surveillance [1]. The study of fusing multiple images mainly focuses on how to optimize the use of the redundant and complementary information provided by heterogeneous image sensors. Depending on the purpose, image fusion algorithms can be classified as either combination or classification fusion. In the first case the fusion algorithm combines the complementary features from multiple input images, while in the second case the redundant information is mainly used for making a decision through modeling. The output of the first type of image fusion is still an image, but one comprised of the most salient features captured from the different sensors. Usually such a fusion operation is implemented with four major steps at the pixel level, as illustrated in Fig. 1. The success of a specific post-processing or analysis step relies largely on the performance of the fusion algorithm. The goal of classification fusion is to derive a thematic map that indicates certain homogeneous characteristics of pixel regions in the image. This process needs a higher-level operation like feature- or decision-level fusion. In order to determine the performance of different fusion algorithms, an objective metric that can quantitatively measure the quality of the results should be introduced. However, such a metric would largely depend on the requirements of the specific application. A straightforward way to implement the evaluation is through comparison with a reference image, which is assumed to be perfect. The metrics for image comparison are often employed for the quantitative assessment of image fusion.


Fig. 1. The procedure for pixel-level image fusion.


In the case of classification fusion, the resulting thematic map is compared with the ground truth data. The classification results are then used to generate a confusion matrix. Alternatively, the classification errors for each of the classes, and for various thresholds, can also be represented by a receiver operating characteristic (ROC) curve. The reference results can be prepared with the help of experts' experience. Unfortunately, in practice the reference image is not always perfect or even available, thus raising the need for the quantitative and blind evaluation of fused images.

A typical example of pixel-level image fusion is the fusion of multi-focus images from a digital camera [2,3]. In such a case, a cut-and-paste operation is applied to obtain the full-focus image that serves as a reference for evaluating the fusion results. However, such an operation does not guarantee a perfect reference image. In some applications, the ground truth reference can be generated from a more precise measurement. For example, in nondestructive testing the thickness of a multilayer-structured specimen from an aircraft is estimated from eddy current inspection or ultrasonic testing. The actual thickness map of the specimen can be obtained through a post-teardown inspection with X-ray, a destructive process that is used to verify the accuracy of the estimation [4]. Such a comparison can only be applied after the acquired images are fully registered, i.e. converted to the same resolution, size, and format. The evaluation metric should be oriented to image features: a pixel-by-pixel comparison does not meet this requirement, because pixels in the original image are closely related to their neighbours. Moreover, it would be better if the quantitative evaluation could still be achieved without a reference image, which is the case in most practical applications. The evaluation metric should provide a measurement of how well the information of the inputs is integrated into the output.

In this paper, the study is limited to the evaluation of pixel-level image fusion. We propose a measurement of fusion performance based on the phase congruency calculation suggested by Kovesi [5–7]. The phase congruency measurement provides an absolute, dimensionless value ranging from 0 to 1, where larger values correspond to more salient features. Two methods are proposed to identify the availability and quality of input features in the fused image. The first one is based on a modified structural similarity measurement, where phase congruency is employed as the structural component. A similarity map with the fused image is generated for each input image; then, the larger value at each location is retained for the overall assessment. The second one is implemented by computing the local cross-correlation of the phase congruency maps between the fused and input images. The index value is obtained by averaging the similarity or cross-correlation value over each predefined region. The proposed schemes achieve a no-reference evaluation of the fused image. Experiments are carried out on a group of fused images obtained by various multiresolution fusion algorithms. The validity of the proposed methods is demonstrated.

The rest of the paper is organized as follows. An overview of the metrics used for image fusion is presented in Section 2. In Section 3, the concept and implementation of the feature-based evaluation are described. Experiments with the proposed approach and comparison with other existing methods can be found in Section 4. The experimental results obtained from both the reference-based assessment and the blind evaluation are presented, validating the proposed algorithms. Section 5 presents the discussion and Section 6 draws the conclusion of the paper.

2. Evaluation of image fusion algorithms

A straightforward approach for fusion performance assessment is to compare the fused image with a reference image. The commonly used methods include the root mean square error (RMSE), normalized least square error (NLSE), peak signal-to-noise ratio (PSNR), correlation (CORR), difference entropy (DE), and mutual information (MI) [8]. The expressions that correspond to these measures are listed below. The meaning of the symbols used in the following equations is given in Table 1.

Root mean square error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{m=1}^{M}\sum_{n=1}^{N}\left[I(m,n)-F(m,n)\right]^{2}}{MN}} \qquad (1)$$

Normalized least square error (NLSE):

$$\mathrm{NLSE} = \sqrt{\frac{\sum_{m=1}^{M}\sum_{n=1}^{N}\left[I(m,n)-F(m,n)\right]^{2}}{\sum_{m=1}^{M}\sum_{n=1}^{N}\left[I(m,n)\right]^{2}}} \qquad (2)$$

Peak signal-to-noise ratio (PSNR):

$$\mathrm{PSNR} = 10\log_{10}\!\left(\frac{L^{2}}{\frac{1}{MN}\sum_{m=1}^{M}\sum_{n=1}^{N}\left[I(m,n)-F(m,n)\right]^{2}}\right) \qquad (3)$$

Correlation (CORR):

$$\mathrm{CORR} = \frac{2\sum_{m=1}^{M}\sum_{n=1}^{N} I(m,n)\,F(m,n)}{\sum_{m=1}^{M}\sum_{n=1}^{N} I(m,n)^{2} + \sum_{m=1}^{M}\sum_{n=1}^{N} F(m,n)^{2}} \qquad (4)$$

Difference entropy (DE):

$$\mathrm{DE} = \left|\sum_{g=0}^{L-1} P_{I}(g)\log_{2}P_{I}(g) - \sum_{g=0}^{L-1} P_{F}(g)\log_{2}P_{F}(g)\right| \qquad (5)$$

Mutual information (MI):

$$\mathrm{MI} = \sum_{i=1}^{L}\sum_{j=1}^{L} h_{IF}(i,j)\log_{2}\frac{h_{IF}(i,j)}{h_{I}(i)\,h_{F}(j)} \qquad (6)$$

Table 1
The notation for Eqs. (1)–(6)

I(m,n)      Reference image
F(m,n)      Fused image
L           Maximum pixel value
P_I(g)      Probability of value g in the reference image
P_F(g)      Probability of value g in the fused image
h_IF(i,j)   Normalized joint histogram of the reference image and the fused image
h_I(i)      Normalized marginal histogram of the reference image
h_F(j)      Normalized marginal histogram of the fused image
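For concreteness, the reference-based metrics of Eqs. (1)–(6) can be computed along the following lines. This is a minimal NumPy sketch assuming two equally sized 8-bit grayscale images; the function name and parameterization are ours, not from the paper.

```python
import numpy as np

def reference_metrics(ref, fused, levels=256):
    """Reference-based metrics of Eqs. (1)-(6) for two equally sized
    8-bit grayscale images given as NumPy arrays (hypothetical helper)."""
    I = ref.astype(np.float64)
    F = fused.astype(np.float64)
    diff2 = (I - F) ** 2
    peak = levels - 1                                             # maximum pixel value L

    rmse = np.sqrt(diff2.mean())                                  # Eq. (1)
    nlse = np.sqrt(diff2.sum() / (I ** 2).sum())                  # Eq. (2)
    psnr = 10 * np.log10(peak ** 2 / diff2.mean())                # Eq. (3)
    corr = 2 * (I * F).sum() / ((I ** 2).sum() + (F ** 2).sum())  # Eq. (4)

    # Grey-level probabilities for the difference entropy of Eq. (5).
    pI, _ = np.histogram(I, bins=levels, range=(0, levels))
    pF, _ = np.histogram(F, bins=levels, range=(0, levels))
    pI, pF = pI / pI.sum(), pF / pF.sum()

    def entropy(p):
        p = p[p > 0]
        return -(p * np.log2(p)).sum()

    de = abs(entropy(pI) - entropy(pF))                           # Eq. (5)

    # Normalized joint and marginal histograms for the MI of Eq. (6).
    hIF, _, _ = np.histogram2d(I.ravel(), F.ravel(), bins=levels,
                               range=((0, levels), (0, levels)))
    hIF = hIF / hIF.sum()
    hI = hIF.sum(axis=1, keepdims=True)
    hF = hIF.sum(axis=0, keepdims=True)
    nz = hIF > 0
    mi = (hIF[nz] * np.log2(hIF[nz] / (hI @ hF)[nz])).sum()       # Eq. (6)

    return dict(RMSE=rmse, NLSE=nlse, PSNR=psnr, CORR=corr, DE=de, MI=mi)
```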

The difference between the reference image and the fused one is calculated and serves as a measurement of the quality of the fused image. The difficulty of the comparison-based approach is that the reference is not perfect, or a reference may not always be available. Moreover, images with a similar RMSE value may exhibit quite different quality [9]. Some other methods that consider the human visual system and attempt to incorporate perceptual quality measurement do not show any clear advantage over simple measurements like RMSE and PSNR under image distortion environments [9,10].

Thus, it would be better if the assessment could be accomplished without any reference image. In this way, the fused image only needs to be referred to the input images to evaluate itself. Qu et al. considered MI and simply used the summation of the MI between the fused image (F) and the inputs (A and B) to represent the fusion quality. The expression of the MI-based fusion performance measure $M_F^{AB}$ is [11]:

$$M_F^{AB} = \sum_{i,j} h_{AF}(i,j)\log_{2}\frac{h_{AF}(i,j)}{h_{A}(i)\,h_{F}(j)} + \sum_{i,j} h_{BF}(i,j)\log_{2}\frac{h_{BF}(i,j)}{h_{B}(i)\,h_{F}(j)} \qquad (7)$$

where $h_{AF}(i,j)$ indicates the normalized joint grey-level histogram of images A and F, and $h_K$ (K = A, B, F) is the normalized marginal histogram of image A, B, or F. However, the MI metric still needs a reference value to compare with. Furthermore, the MI-based approach is insensitive to impulsive noise and is subject to great change in the presence of additive Gaussian noise.
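A sketch of Eq. (7), again assuming 8-bit grayscale NumPy arrays; the helper names are illustrative only.

```python
import numpy as np

def mutual_information(x, y, levels=256):
    """Mutual information of two 8-bit images from their normalized joint
    grey-level histogram (the building block of Eq. (7))."""
    h, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=levels,
                             range=((0, levels), (0, levels)))
    h = h / h.sum()
    hx = h.sum(axis=1, keepdims=True)   # marginal histogram of x
    hy = h.sum(axis=0, keepdims=True)   # marginal histogram of y
    nz = h > 0
    return (h[nz] * np.log2(h[nz] / (hx @ hy)[nz])).sum()

def qu_mi_metric(a, b, f, levels=256):
    """Qu et al.'s M_F^AB of Eq. (7): sum of the mutual information between
    the fused image f and each input a and b (function name is ours)."""
    return mutual_information(a, f, levels) + mutual_information(b, f, levels)
```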

Another strategy of blind assessment without a reference was proposed by Xydeas and Petrovic [12,13]. Their method aims at measuring the amount of visual information transferred from the input images to the fused image. With the assumption that edge information is closely related to visual information, the metric is defined in terms of edge strength and orientation. The Sobel edge operator is used in the implementation to extract the strength and orientation information for each pixel. Unfortunately, Sobel edge detection, which is based on the measurement of the intensity gradient, depends on image contrast and spatial magnification, and hence one does not know in advance what level of edge strength corresponds to a significant feature [5,6].
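The per-pixel edge strength and orientation that this measure starts from could be extracted as in the following sketch; only the Sobel feature-extraction step is shown, not the Xydeas–Petrovic metric itself, and the function name is ours.

```python
import numpy as np
from scipy.ndimage import sobel

def edge_strength_orientation(img):
    """Per-pixel Sobel edge strength and orientation, the quantities on which
    the Xydeas-Petrovic measure is built; only this feature-extraction step
    is shown, not their full metric."""
    g = img.astype(np.float64)
    gx = sobel(g, axis=1)                 # horizontal gradient
    gy = sobel(g, axis=0)                 # vertical gradient
    strength = np.hypot(gx, gy)           # edge strength
    orientation = np.arctan2(gy, gx)      # edge orientation in radians
    return strength, orientation
```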

Recently, Piella and Heijmans defined a fusion quality index based on Wang and Bovik's work on the so-called universal image quality index (UIQI) [14]. The universal image quality index is based on the evidence that the human visual system is highly adapted to structural information, and a measurement of the loss of structural information can provide a good approximation of the perceived image distortion. The definition of the UIQI is [9]:

$$Q = \frac{4\sigma_{xy}\,\bar{x}\,\bar{y}}{(\sigma_x^{2}+\sigma_y^{2})(\bar{x}^{2}+\bar{y}^{2})} = \frac{\sigma_{xy}}{\sigma_x\sigma_y}\cdot\frac{2\,\bar{x}\,\bar{y}}{\bar{x}^{2}+\bar{y}^{2}}\cdot\frac{2\,\sigma_x\sigma_y}{\sigma_x^{2}+\sigma_y^{2}} \qquad (8)$$

where $\bar{x}$ and $\bar{y}$ are the mean values, $\sigma_x$ and $\sigma_y$ the standard deviations, and $\sigma_{xy}$ the covariance of the two images x and y.

In the above equation, the three components measure the degree of linear correlation between images x and y, how close the mean luminance is between x and y, and how similar the contrasts of the images are. This equation has been modified to produce the structural similarity index measure (SSIM), which is better adapted to more general conditions [10]. The quality measurement is applied to local regions using a sliding window of size 8 × 8 moved from the top left to the bottom right of the image. The overall quality is given by the average $\bar{Q} = \frac{1}{M}\sum_{b=1}^{M} Q_b$, where M is the total number of image blocks. The formula can be rewritten in another form as:

$$Q_0(a,b) = \frac{1}{|W|}\sum_{w\in W} Q_0(a,b\,|\,w) \qquad (9)$$

where a and b are the two images under comparison and w stands for the sliding window; W is the family of all windows and |W| is the cardinality of W. Piella and Heijmans defined three fusion quality indexes based on the UIQI concept [14,15]:

$$Q(a,b,f) = \frac{1}{|W|}\sum_{w\in W}\left[\lambda(w)\,Q_0(a,f\,|\,w) + (1-\lambda(w))\,Q_0(b,f\,|\,w)\right] \qquad (10)$$

$$Q_W(a,b,f) = \sum_{w\in W} c(w)\left[\lambda(w)\,Q_0(a,f\,|\,w) + (1-\lambda(w))\,Q_0(b,f\,|\,w)\right] \qquad (11)$$

$$Q_E(a,b,f) = Q_W(a,b,f)\cdot Q_W(a',b',f')^{\alpha} \qquad (12)$$

where f stands for the fused image of a and b. The variance measure $\lambda(w)$ can be expressed as:

$$\lambda(w) = \frac{s(a\,|\,w)}{s(a\,|\,w) + s(b\,|\,w)} \qquad (13)$$

and $C(w) = \max(s(a\,|\,w), s(b\,|\,w))$, $c(w) = C(w)/\sum_{w'\in W} C(w')$. Herein, $s(a\,|\,w)$ and $s(b\,|\,w)$ are the local saliences of images a and b, respectively; one choice is the variance of image a and b within the window w of size 8 × 8. Eq. (10) gives the general definition of the fusion quality index: the value $Q_0$ of Eq. (9) measures the difference between each input and the fused image, weighted by the variance measurement. In Eq. (12), a', b', and f' are the corresponding edge maps of images a, b, and f, respectively. The weighted fusion quality index actually carries out a maximum-selection operation to emphasize the overall importance of each block, while the edge-dependent fusion quality index tries to include the effect of edges, their contribution being controlled by the parameter α. Piella and Heijmans also implemented a weighted sum or a scaled weighted sum of the similarity measures UIQI or SSIM [14,15]. The success of Piella's metrics therefore depends on the performance of SSIM.
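Eq. (10) with the variance-based weight of Eq. (13) might be implemented along these lines; the sketch uses non-overlapping 8 × 8 blocks instead of a dense sliding window for brevity, and the function names are ours.

```python
import numpy as np

def uiqi(x, y, eps=1e-12):
    """Universal image quality index Q0 of Eq. (8) on a single window."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()
    return 4 * cxy * mx * my / ((vx + vy) * (mx ** 2 + my ** 2) + eps)

def piella_q(a, b, f, block=8):
    """Fusion quality index of Eq. (10): local UIQI of each input against the
    fused image, weighted by the variance-based lambda(w) of Eq. (13) and
    averaged over the windows (non-overlapping blocks here, for brevity)."""
    h, w = f.shape
    scores = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            wa = a[i:i + block, j:j + block]
            wb = b[i:i + block, j:j + block]
            wf = f[i:i + block, j:j + block]
            sa, sb = wa.var(), wb.var()           # local salience s(a|w), s(b|w)
            lam = sa / (sa + sb) if (sa + sb) > 0 else 0.5
            scores.append(lam * uiqi(wa, wf) + (1 - lam) * uiqi(wb, wf))
    return float(np.mean(scores))
```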

3. A strategy for the feature-based evaluation

This section describes our approaches for the evaluation of fused images. The proposed feature-based strategy proceeds in two steps: first extracting image features and then measuring how those features are integrated in the fused image. Phase congruency is employed to provide an absolute measurement of image features. This measurement is either incorporated into the SSIM, or a local cross-correlation is performed, to determine whether the features from the inputs are available in the fused image. An overall evaluation is obtained by averaging those local measurements.

3.1. Image feature extraction

3.1.1. The concept of phase congruency

Gradient-based image feature detection and extraction approaches are sensitive to variations in illumination, blurring, and magnification, and the threshold applied needs to be modified accordingly. A model of feature perception named local energy was investigated by Morrone and Owens [16]. This model postulates that features are perceived at points in an image where the Fourier components are maximally in phase, and a wide range of feature types give rise to points of high phase congruency. Since points of maximum phase congruency can be calculated equivalently by searching for peaks in the local energy function, the relation between phase congruency and local energy is established as [5,7]:

$$PC(x) = \frac{E(x)}{\sum_{n} A_n(x) + \varepsilon} \qquad (14)$$

$$E(x) = \sqrt{F^{2}(x) + H^{2}(x)} \qquad (15)$$

where PC(x) is the phase congruency at some location x and E(x) is the local energy function. $A_n$ represents the amplitude of the nth component in the Fourier series expansion. A very small positive constant ε is added to the denominator in case of small Fourier amplitudes. In the expression of the local energy, F(x) is the signal with its DC component removed and H(x) is the Hilbert transform of F(x).
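A minimal 1-D illustration of Eqs. (14) and (15) is sketched below, using an FFT-based Hilbert transform; this is the direct Fourier formulation rather than Kovesi's wavelet-based algorithm described in Section 3.1.2, and the normalization of the amplitude sum is an approximation.

```python
import numpy as np
from scipy.signal import hilbert

def phase_congruency_1d(signal, eps=1e-4):
    """Direct 1-D illustration of Eqs. (14) and (15): local energy from the
    DC-removed signal and its Hilbert transform, normalized by the sum of the
    Fourier component amplitudes.  Kovesi's practical algorithm replaces the
    Fourier expansion with a log-Gabor wavelet bank (Section 3.1.2)."""
    s = np.asarray(signal, dtype=np.float64)
    F = s - s.mean()                    # signal with its DC component removed
    H = np.imag(hilbert(F))             # Hilbert transform of F
    E = np.sqrt(F ** 2 + H ** 2)        # Eq. (15): local energy

    # Sum of the amplitudes A_n of the sinusoidal components of F
    # (a constant here; in the wavelet formulation it varies with x).
    amplitudes = 2 * np.abs(np.fft.rfft(F)) / len(F)
    A_sum = amplitudes[1:].sum()

    return E / (A_sum + eps)            # Eq. (14)
```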

3.1.2. Implementation of the phase congruency algorithm with the logarithmic Gabor filter

Kovesi proposed a scheme to calculate phase congruency using logarithmic Gabor wavelets [5,6,17], which allow arbitrarily large bandwidth filters to be constructed while still maintaining a zero DC component in the even-symmetric filter. However, when there are few high-frequency components in the signal, the frequency spread is reduced; the phase congruency would be one everywhere if the signal were a pure sine wave. To counter this problem, a weighting function W(x) is constructed to devalue phase congruency at locations where the spread of the filter responses is narrow. To further enhance the calculation, Kovesi introduced a more sensitive phase deviation ΔΦ(x) to define the phase congruency. The new measure becomes [5,7]:

$$PC(x) = \frac{\sum_{n} W(x)\,\lfloor A_n(x)\,\Delta\Phi_n(x) - T\rfloor}{\sum_{n} A_n(x) + \varepsilon} \qquad (16)$$

where ⌊·⌋ denotes that the enclosed quantity is not permitted to be negative. The detailed implementation of the algorithm is shown as a flowchart in Fig. 2. The convolution results of the input image I(x) with quadrature pairs of filters at scale n, $e_n(x) = I(x) * M_n^e$ and $o_n(x) = I(x) * M_n^o$, form the basic components for calculating PC(x). $M_n^e$ and $M_n^o$ denote the even-symmetric and odd-symmetric wavelet at this scale, respectively. T is the compensation for the influence of noise and is estimated empirically for the implementation of Eq. (16). In the equation of the weighting function W(x) in the flowchart, parameter c is the cut-off value of the filter response and γ is a gain factor that controls the sharpness of the cut-off [5,7].

To extend the algorithm to images, the one-dimensional analysis is applied to several orientations and the results are combined in different ways. The two-dimensional phase congruency can be expressed as [5,7]:

$$PC(x) = \frac{\sum_{o}\sum_{n} W_o(x)\,\lfloor A_{no}(x)\,\Delta\Phi_{no}(x) - T_o\rfloor}{\sum_{o}\sum_{n} A_{no}(x) + \varepsilon} \qquad (17)$$

where o denotes the index over orientations. The noise compensation $T_o$ is performed in each orientation independently. By simply applying a Gaussian spreading function across the filter perpendicular to its orientation, the one-dimensional Gabor filter can be extended into two dimensions. The orientation space can be quantized using a step size of π/6, which results in six different orientations. For an extensive discussion of the underlying theory, readers are referred to Refs. [5,7].

Fig. 2. The implementation of the phase congruency algorithm. $M_n^e$ and $M_n^o$ denote the even-symmetric and odd-symmetric wavelet at this scale, respectively.
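A sketch of how one frequency-domain log-Gabor filter (one scale, one orientation) can be constructed with a Gaussian angular spread, as described above; the parameter values are illustrative defaults, not those used in the paper, and the filter-bank bookkeeping (scales, noise estimate T, weighting W) is omitted.

```python
import numpy as np

def log_gabor_filter(shape, wavelength, angle,
                     sigma_on_f=0.55, theta_sigma=np.pi / 12):
    """Frequency-domain 2-D log-Gabor filter at one scale and orientation:
    a log-Gaussian radial profile (zero DC by construction) multiplied by a
    Gaussian angular spread around 'angle'.  Parameter values are
    illustrative defaults, not those used in the paper."""
    rows, cols = shape
    fy = np.fft.fftfreq(rows)[:, None]
    fx = np.fft.fftfreq(cols)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    radius[0, 0] = 1.0                        # avoid log(0) at the DC term
    theta = np.arctan2(-fy, fx)

    f0 = 1.0 / wavelength                     # centre frequency of this scale
    radial = np.exp(-(np.log(radius / f0)) ** 2 / (2 * np.log(sigma_on_f) ** 2))
    radial[0, 0] = 0.0                        # enforce the zero DC component

    dtheta = np.arctan2(np.sin(theta - angle), np.cos(theta - angle))
    angular = np.exp(-dtheta ** 2 / (2 * theta_sigma ** 2))
    return radial * angular

# Quadrature (even/odd) responses at one scale and orientation, e.g.:
#   img_fft = np.fft.fft2(img)
#   resp = np.fft.ifft2(img_fft * log_gabor_filter(img.shape, 6.0, 0.0))
#   e_n, o_n = resp.real, resp.imag   # the convolution results used in Eq. (17)
```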

3.2. Image comparison based on the phase congruency feature measure

3.2.1. A modified SSIM metric

The comparison of images can be carried out by comparing their corresponding phase congruency features. It is appropriate to evaluate the space-variant features locally and then combine the local results [9,10]. The first approach is to incorporate the phase congruency measurement into the structural similarity framework proposed by Wang et al. [10].

Similar to Eq. (8), the SSIM is defined as [10]:

$$SSIM(i,j) = [l(i,j)]^{\alpha}\,[c(i,j)]^{\beta}\,[s(i,j)]^{\gamma} \qquad (18)$$

where l(i,j), c(i,j), and s(i,j) are the luminance, contrast, and structure components, respectively (see Eq. (8)). The parameters α, β, and γ are used to adjust the relative importance of the three components. A major problem with SSIM is that it fails to measure severely blurred images [18]. Chen et al. proposed an edge-based structure measurement to replace the s(i,j) component in Eq. (18), using the edge direction histogram to compare the edge information [18]. Instead of the edge direction vector, we use the phase congruency measurement. The feature-based SSIM (FSSIM) becomes:

$$FSSIM(i,j) = [l(i,j)]^{\alpha}\,[c(i,j)]^{\beta}\,[f(i,j)]^{\gamma} \qquad (19)$$

Herein, the feature component is:

$$f(i,j) = \frac{\sigma'_{ab} + \varepsilon}{\sigma'_{a}\,\sigma'_{b} + \varepsilon} \qquad (20)$$

where a small constant ε is added to avoid the denominator being zero. Here $\sigma'_{ab}$, $\sigma'_{a}$, and $\sigma'_{b}$ are the covariance and standard deviations of the two phase congruency maps a and b. Eq. (20) is actually an expression for calculating their correlation. We treat the three factors equally, and the parameters α, β, and γ are all set to one in the following experiments.
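The FSSIM of Eqs. (19) and (20) on a single window might look like the following sketch. The SSIM-style stabilizing constants C1 and C2 are our assumption (taken from Wang et al.'s SSIM formulation), and the phase congruency maps are assumed to be computed elsewhere, e.g. with Kovesi's algorithm.

```python
import numpy as np

def fssim_window(x, y, pc_x, pc_y, L=255, eps=1e-12):
    """FSSIM of Eq. (19) on one window: the SSIM luminance and contrast terms
    combined with the phase congruency correlation f(i,j) of Eq. (20).
    x and y are the image windows, pc_x and pc_y their phase congruency maps
    (computed elsewhere); C1 and C2 follow the usual SSIM choice, which is
    our assumption, and alpha = beta = gamma = 1 as in the experiments."""
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mx, my = x.mean(), y.mean()
    sx, sy = x.std(), y.std()

    luminance = (2 * mx * my + C1) / (mx ** 2 + my ** 2 + C1)
    contrast = (2 * sx * sy + C2) / (sx ** 2 + sy ** 2 + C2)

    # Eq. (20): correlation of the two phase congruency maps.
    cov = ((pc_x - pc_x.mean()) * (pc_y - pc_y.mean())).mean()
    feature = (cov + eps) / (pc_x.std() * pc_y.std() + eps)

    return luminance * contrast * feature
```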

3.2.2. Phase congruency based comparison

Another solution is to compare images through their phase congruency maps using the cross-correlation directly [19]. However, in a phase congruency map the sub-block window could be blank, i.e. all the feature points are zero at locations without any features. The result cannot be obtained immediately because of the zero in the denominator of the cross-correlation expression. Fig. 3 shows an alternative procedure to calculate the local cross-correlation.

First, the pixels in the sub-block windows from the two images are summed, respectively, to get the results A and B. The summation and product of these two values are C and D. If the summation C is zero, both A and B are zero and the two image blocks are totally matched; the corresponding cross-correlation value is therefore set to 1. When the blocks differ, i.e. C ≠ 0 and D = 0, the cross-correlation value is set to 0. Otherwise, the value is computed by the zero-mean normalized cross-correlation (ZNCC) [19] of Eq. (21):

$$ZNCC = \frac{\sum_{i=1}^{M}\sum_{j=1}^{N}\left(I_1(i,j)-\bar{I}_1\right)\left(I_2(i,j)-\bar{I}_2\right)}{\sqrt{\sum_{i=1}^{M}\sum_{j=1}^{N}\left(I_1(i,j)-\bar{I}_1\right)^{2}\cdot\sum_{i=1}^{M}\sum_{j=1}^{N}\left(I_2(i,j)-\bar{I}_2\right)^{2}}} \qquad (21)$$

where $\bar{I}_1$ and $\bar{I}_2$ are the average values of the two image blocks $I_1(i,j)$ and $I_2(i,j)$, respectively.
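The Pref procedure of Fig. 3 (the ZNCC of Eq. (21) plus the blank-block rules) can be sketched as follows, again over non-overlapping blocks for brevity; the function names are ours.

```python
import numpy as np

def zncc(block1, block2, eps=1e-12):
    """Zero-mean normalized cross-correlation of Eq. (21)."""
    d1 = block1 - block1.mean()
    d2 = block2 - block2.mean()
    return (d1 * d2).sum() / (np.sqrt((d1 ** 2).sum() * (d2 ** 2).sum()) + eps)

def p_ref(pc_ref, pc_fused, block=8):
    """Pref (Fig. 3): local cross-correlation of the phase congruency maps of
    the reference and fused images with the blank-block rules -- two empty
    blocks count as a perfect match (1), exactly one empty block as a
    mismatch (0); otherwise the ZNCC of Eq. (21) is used."""
    h, w = pc_ref.shape
    scores = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            b1 = pc_ref[i:i + block, j:j + block]
            b2 = pc_fused[i:i + block, j:j + block]
            A, B = b1.sum(), b2.sum()
            if A + B == 0:            # C = 0: both blocks featureless
                scores.append(1.0)
            elif A * B == 0:          # C != 0 and D = 0: only one block empty
                scores.append(0.0)
            else:
                scores.append(zncc(b1, b2))
    return float(np.mean(scores))
```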

Fig. 3. The Pref metric for reference-based evaluation.

We tested the phase congruency based approach with a set of images prepared by Wang and Bovik [9], shown in Fig. 4 together with the corresponding phase congruency feature maps on the right. The results obtained by applying the metrics described in Section 2 are given in Table 2. Although the test images are prepared to have an identical root mean square error, their quality is completely different. It should be noted that the mean shift operation does not degrade the image quality at all, as we obtain 1 for the assessment. However, if the mean-shifted image is displayed or manipulated in a different way, the conclusion may be different. The first way is to map the maximum and minimum values to 0 and 255, which does not change the appearance of the original image. The second is to set a pixel value to 255 if it exceeds 255. If the image is processed in the latter way, the operation is no longer a mean shift and the image appears degraded, as in Wang and Bovik's publication [9]. To avoid such confusion, we show the mean-shifted image using the first method, so the image appears exactly the same as its original. From the comparison, we can see that the phase congruency can be used to measure the quality of an image against a reference; thus we can derive quality metrics to blindly evaluate image fusion results in the next section.

3.3. Quality metric for image fusion evaluation

As stated previously, a blind evaluation of the fused image is preferred for practical applications, because a ground truth or perfect reference is not always available for comparison. Pixel-level fusion integrates image features such as edges, lines, and region boundaries into one composite image. The success of the fusion algorithm is therefore assessed by measuring the image features available in the fused image against those from the multiple input sources. Phase congruency is used as the basis for the feature extraction and measurement.

3.3.1. The Fblind metric

Based on the FSSIM, we first propose an Fblind metric to assess the fused image without a reference. Mathematically, Fblind can be expressed as:

$$F_{blind} = \frac{1}{M}\sum_{m=1}^{M}\max\left(FSSIM(f,a),\,FSSIM(f,b)\right) \qquad (22)$$

where f stands for the fused image while a and b are the inputs. The FSSIM between the fused image and each input image is computed over every window. The larger of the two values means that a stronger feature from the corresponding input image is detected in the fused image; this value is retained for the final summation and average.
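A sketch of Eq. (22); fssim_window is the per-window FSSIM routine (for example the one sketched in Section 3.2.1), passed in explicitly so the sketch stays self-contained, and the block size and function names are ours.

```python
import numpy as np

def f_blind(a, b, f, pc_a, pc_b, pc_f, fssim_window, block=8):
    """Fblind of Eq. (22): for every window keep the larger of FSSIM(f, a)
    and FSSIM(f, b), then average over all windows.  'fssim_window' is the
    per-window FSSIM routine (for instance the sketch in Section 3.2.1);
    pc_a, pc_b, pc_f are the phase congruency maps of a, b and f."""
    h, w = f.shape
    scores = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            win = (slice(i, i + block), slice(j, j + block))
            q_a = fssim_window(f[win], a[win], pc_f[win], pc_a[win])
            q_b = fssim_window(f[win], b[win], pc_f[win], pc_b[win])
            scores.append(max(q_a, q_b))
    return float(np.mean(scores))
```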

3.3.2. The Pblind metric

The flowchart in Fig. 5 presents the procedure to calculate the second blind evaluation metric Pblind. The phase congruency maps of the input and fused images are first calculated. A third feature map $M_{pc}$ is derived by point-by-point maximum selection of the two input maps $A_{pc}$ and $B_{pc}$, i.e. retaining the larger feature points: $M_{pc}(i,j) = \max(A_{pc}(i,j), B_{pc}(i,j))$. The feature map of the fused image, $F_{pc}$, is then compared to $A_{pc}$, $B_{pc}$, and $M_{pc}$ locally. At each sub-block, the cross-correlation values between these maps are computed, and the evaluation index Pblind is the average over all the blocks. The flowchart is controlled by three values, $A_{pc}$, $B_{pc}$, and $F_{pc}$, with $D_{pc}(k) = A_{pc}(k) + B_{pc}(k)$.

Again, when the local cross-correlation of the phase congruency map is considered, the denominator in Eq. (21) might be zero. However, this does not mean a low correlation. The summation of each block is therefore considered in the flowchart of Fig. 5: (1) when the input blocks are zero ($D_{pc}(k) = 0$), the correlation result is one if the fused block is zero too ($F_{pc}(k) = 0$); otherwise the cross-correlation value is set to zero; (2) when the fused block is not zero, the cross-correlation value is computed with whichever input block is not zero; (3) when both the input and fused blocks are not zero ($D_{pc}(k) \neq 0$), the maximum of the cross-correlations of the fused block with $A_{pc}$, $B_{pc}$, and $M_{pc}$ is selected as the result. The overall cross-correlation value is $P = \frac{1}{K}\sum_{k=1}^{K} P(k)$, where K is the total number of blocks.

Fig. 4. Image "Lena" and its corresponding phase congruency map (on the right): (a) original "Lena" image; (b) salt–pepper noise contaminated image; (c) Gaussian noise contaminated image; (d) speckle noise contaminated image; (e) mean-shifted image; (f) contrast stretched image; (g) blurred image; (h) JPEG compressed image.

Table 2
Experimental results of image quality assessment

Images   Lena (b)   Lena (c)   Lena (d)   Lena (e)   Lena (f)   Lena (g)   Lena (h)
RMSE     15.0123    15.006     14.9916    15.00      15.0031    14.9713    14.6668
NLSE     0.1141     0.1141     0.1140     0.1140     0.1140     0.1138     0.1115
PSNR     22.7157    22.8299    23.4438    40.7125    27.4375    22.1800    22.5848
CORR     0.9531     0.9542     0.9543     1.00       1.00       0.9510     0.9537
DE       -0.0163    -0.1953    -0.1412    0          0.0067     0.1402     3.1991
MI       3.3484     1.1160     1.4042     3.4120     3.5453     1.4740     1.2560
UIQI     0.6494     0.3891     0.4408     0.9894     0.9372     0.3461     0.2876
SSIM     0.7227     0.4508     0.5009     0.9890     0.9494     0.6880     0.6709
FSSIM    0.5357     0.3109     0.3552     0.989      0.9463     0.0742     0.1721
Pref     0.4734     0.3781     0.3868     1          0.9904     0.2928     0.2815

Herein, we use the maximum-selected feature map as part of the reference, because the features can be compared directly through the dimensionless measure. Unlike pixels, which are closely related in the original images, the points in the phase congruency map indicate the salience of image features. Therefore, the selection of feature points is not equivalent to selecting the pixels with larger values in the original images and then computing the phase congruency map of the result. Selecting the larger feature points provides a reference for comparison, although this arrangement is not always optimal. For combinative fusion, especially when heterogeneous sensors are involved, the feature in the fused image may come from either input image or from a combination of them, as shown in Fig. 6. That is why we need both the inputs and the maximum-selected map for the similarity measure and comparison.

Fig. 5. The blind evaluation algorithm using the phase congruency map.

Fig. 6. Four cases in a combinative fusion. For a small local region in the fused image, the local feature may come from the corresponding block of the input image A or B, or a combination of them.
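The block-wise logic described above might be coded as follows; since the full Fig. 5 flowchart is not reproduced here, the branch for an empty fused block against non-empty inputs is our assumption, and the function names are illustrative.

```python
import numpy as np

def zncc(b1, b2, eps=1e-12):
    """Zero-mean normalized cross-correlation of Eq. (21)."""
    d1, d2 = b1 - b1.mean(), b2 - b2.mean()
    return (d1 * d2).sum() / (np.sqrt((d1 ** 2).sum() * (d2 ** 2).sum()) + eps)

def p_blind(pc_a, pc_b, pc_f, block=8):
    """Pblind (Fig. 5): compare the fused feature map Fpc against the input
    maps Apc, Bpc and their point-wise maximum Mpc block by block, applying
    the zero-block rules of Section 3.3.2, and average the results."""
    pc_m = np.maximum(pc_a, pc_b)              # Mpc(i,j) = max(Apc, Bpc)
    h, w = pc_f.shape
    scores = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            win = (slice(i, i + block), slice(j, j + block))
            A, B, F, M = pc_a[win], pc_b[win], pc_f[win], pc_m[win]
            d = A.sum() + B.sum()              # Dpc(k) = Apc(k) + Bpc(k)
            if d == 0:
                # (1) featureless inputs: 1 if the fused block is empty too
                scores.append(1.0 if F.sum() == 0 else 0.0)
            elif F.sum() == 0:
                # empty fused block against non-empty inputs: treated as a
                # mismatch (assumption -- not spelled out in the text)
                scores.append(0.0)
            elif A.sum() == 0 or B.sum() == 0:
                # (2) only one input block carries features: correlate with it
                scores.append(zncc(F, A if A.sum() > 0 else B))
            else:
                # (3) both inputs carry features: best match among Apc, Bpc, Mpc
                scores.append(max(zncc(F, A), zncc(F, B), zncc(F, M)))
    return float(np.mean(scores))
```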

4. Experimental results

The major differences between the existing fusion algorithms reside in two aspects: the multiresolution strategy used and the fusion rule applied in the transform domain. Various multiresolution representations have been investigated for image fusion applications, and the efficiency of the fusion rules largely depends on the application, i.e. the characteristics of the images in the tests. The group of images considered here consists of multi-focus and simulated multi-focus images from a digital camera and the corresponding full-focus reference images, as shown in Fig. 7. The multi-focus images are fused using the multiresolution algorithms. In the experiment, we chose the following algorithms for comparison: Laplacian pyramid, gradient pyramid, ratio-of-lowpass (RoLP) pyramid, Daubechies wavelet four, spatially invariant discrete wavelet transform (SIDWT), and Simoncelli's steerable pyramid. The detailed implementation of these algorithms can be found in Refs. [20–24]. The basic fusion rule applied is averaging the low-frequency components while selecting the coefficients with the larger absolute value in the other frequency bands. The decomposition was carried out to level four, and four orientational frequency bands were employed in the steerable pyramid implementation. The fused images were first evaluated against the reference image by the criteria RMSE, NLSE, PSNR, CORR, DE, MI, SSIM, FSSIM, and the Pref metric. Discarding the reference image, we also employed Qu's mutual information, Xydeas' objective performance measure, Piella's fusion quality indexes (Q, Qw, and QE), and our feature-based metrics Fblind and Pblind to assess the fusion results. In the implementation of Piella's quality indexes, the SSIM algorithm was employed.

The results are given in Tables 3–7, respectively. Each table consists of two parts: one is the result from the reference-based assessment; the other is from the metrics of blind assessment.

Fig. 7. The multi-focus images used for the test. From top to bottom: laboratory, books, Japanese food, Pepsi, and object. From left to right: full-focus image, left-focus image, and right-focus image.

Table 3
Evaluation of the fusion result of multi-focus image "laboratory"

Assessment   Laplacian   Gradient   Ratio-of-lowpass   Daubechies     SIDWT    Steerable
metric       pyramid     pyramid    pyramid            wavelet four   (Haar)   pyramid
RMSE         3.9202      7.3346     12.587             4.4012         4.386    3.8849
NLSE         0.0296      0.0555     0.0952             0.0333         0.0332   0.0294
PSNR         24.721      28.41      20.784             24.384         25.97    23.553
CORR         0.9996      0.9984     0.9956             0.9994         0.9995   0.9996
DE           0.0566      0.1659     0.1053             0.1497         0.0398   0.0870
MI           2.4652      2.0920     1.9992             2.3567         2.4270   2.4341
SSIM         0.9809      0.9683     0.9253             0.9762         0.9782   0.9855
FSSIM        0.8504      0.8250     0.6706             0.8104         0.8307   0.8559
Pref         0.8297      0.8199     0.6832             0.8000         0.8212   0.8488
MI (Qu)      4.0969      3.8541     3.9855             3.9999         4.1275   4.1095
Xydeas       0.7585      0.7107     0.6040             0.7407         0.7581   0.7646
Q            0.9566      0.9490     0.9178             0.9480         0.9627   0.9574
Qw           0.9325      0.8964     0.8039             0.9281         0.9364   0.9334
QE           0.8730      0.7636     0.5560             0.8641         0.8667   0.8771
Fblind       0.8949      0.8658     0.7487             0.8618         0.8782   0.8952
Pblind       0.8505      0.8293     0.7331             0.8186         0.8466   0.8585


Table 4
Evaluation of the fusion result of multi-focus image "books"

Assessment   Laplacian   Gradient   Ratio-of-lowpass   Daubechies     SIDWT    Steerable
metric       pyramid     pyramid    pyramid            wavelet four   (Haar)   pyramid
RMSE         5.4013      8.5444     18.3360            5.5587         5.5313   4.5888
NLSE         0.0511      0.0830     0.1638             0.0527         0.0524   0.0435
PSNR         23.2540     24.0330    22.1090            19.9820        28.0010  23.9520
CORR         0.9987      0.9966     0.9857             0.9986         0.9986   0.9991
DE           0.1473      0.0579     0.0081             0.3191         0.0966   0.1673
MI           2.5355      2.0977     2.2000             2.4839         2.5165   2.7075
SSIM         0.9556      0.9485     0.9064             0.9503         0.9538   0.9661
FSSIM        0.7558      0.7359     0.6394             0.7138         0.7573   0.7751
Pref         0.7419      0.7344     0.6355             0.7120         0.7423   0.7691
MI (Qu)      4.4115      3.9617     4.6247             4.3229         4.4521   4.6372
Xydeas       0.7348      0.6718     0.6256             0.7154         0.7427   0.7380
Q            0.9474      0.9400     0.9253             0.9365         0.9569   0.9489
Qw           0.9332      0.8942     0.7992             0.9283         0.9363   0.9347
QE           0.8823      0.7686     0.5785             0.8682         0.8778   0.8859
Fblind       0.8670      0.8417     0.7517             0.8212         0.8697   0.8741
Pblind       0.8159      0.7967     0.7094             0.7738         0.8236   0.8268

Table 5
Evaluation of the fusion result of multi-focus image "Japanese food"

Assessment   Laplacian   Gradient   Ratio-of-lowpass   Daubechies     SIDWT    Steerable
metric       pyramid     pyramid    pyramid            wavelet four   (Haar)   pyramid
RMSE         9.0354      17.0190    9.4889             9.1864         9.3845   9.1162
NLSE         0.0517      0.0994     0.0538             0.0527         0.0538   0.0523
PSNR         32.4400     27.9130    29.3440            32.7300        29.6740  32.8100
CORR         0.9987      0.9954     0.9986             0.9987         0.9986   0.9987
DE           0.0219      0.1964     0.0833             0.0160         0.0087   0.0162
MI           2.5227      1.7691     2.2717             2.5912         2.5198   2.6265
SSIM         0.9808      0.9370     0.9591             0.9820         0.9800   0.9833
FSSIM        0.9035      0.8573     0.8601             0.9082         0.9011   0.9140
Pref         0.8956      0.9007     0.8554             0.9004         0.8944   0.9076
MI (Qu)      4.4781      3.3620     4.5679             4.4824         4.4813   4.5301
Xydeas       0.8910      0.8362     0.8873             0.8889         0.8984   0.8936
Q            0.9803      0.9506     0.9771             0.9794         0.9829   0.9807
Qw           0.9706      0.9389     0.9655             0.9706         0.9744   0.9710
QE           0.9193      0.8419     0.8963             0.9188         0.9271   0.9203
Fblind       0.9683      0.9175     0.9402             0.9629         0.9671   0.9707
Pblind       0.9098      0.8857     0.8916             0.9045         0.9098   0.9124

Table 6
Evaluation of the fusion result of multi-focus image "Pepsi"

Assessment   Laplacian   Gradient   Ratio-of-lowpass   Daubechies     SIDWT    Steerable
metric       pyramid     pyramid    pyramid            wavelet four   (Haar)   pyramid
RMSE         3.4475      5.9806     10.8900            3.9439         4.6410   3.4466
NLSE         0.0320      0.0561     0.0971             0.0367         0.0431   0.0320
PSNR         25.4720     25.6130    25.6870            26.7070        29.1220  30.9460
CORR         0.9995      0.9984     0.9951             0.9993         0.9991   0.9995
DE           0.3254      0.3469     0.4090             0.2644         0.3651   0.3474
MI           2.3328      2.0024     2.0062             2.1990         2.2171   2.2983
SSIM         0.9519      0.9429     0.9177             0.9427         0.9464   0.9502
FSSIM        0.6276      0.5907     0.5367             0.5879         0.6016   0.6293
Pref         0.4745      0.4536     0.4189             0.4670         0.4597   0.4851
MI (Qu)      4.3611      4.0215     4.3146             4.1514         4.2510   4.3199
Xydeas       0.8180      0.7880     0.6639             0.7997         0.8143   0.8275
Q            0.9667      0.9627     0.9342             0.9607         0.9711   0.9664
Qw           0.9604      0.9250     0.7910             0.9569         0.9602   0.9603
QE           0.9240      0.8133     0.4854             0.9153         0.9186   0.9252
Fblind       0.8805      0.8543     0.7869             0.8440         0.8811   0.9053
Pblind       0.7612      0.7342     0.6691             0.7332         0.7662   0.7936


Table 7
Evaluation of the fusion result of multi-focus image "objects"

Assessment   Laplacian   Gradient   Ratio-of-lowpass   Daubechies     SIDWT    Steerable
metric       pyramid     pyramid    pyramid            wavelet four   (Haar)   pyramid
RMSE         4.7977      9.0774     10.2290            4.5204         5.3637   4.0017
NLSE         0.0597      0.1187     0.1217             0.0566         0.0670   0.0501
PSNR         31.6830     24.0500    29.7230            28.3720        29.0960  30.2070
CORR         0.9982      0.9931     0.9920             0.9984         0.9977   0.9987
DE           0.0585      0.1489     0.0886             0.1318         0.0279   0.0791
MI           2.0762      1.6046     1.9335             2.0273         2.0884   2.1842
SSIM         0.9651      0.9437     0.9382             0.9617         0.9658   0.9720
FSSIM        0.7908      0.7646     0.7209             0.7756         0.7838   0.8144
Pref         0.7428      0.7447     0.6803             0.7301         0.7333   0.7695
MI (Qu)      3.9174      3.1582     4.2260             3.7720         4.0264   4.1063
Xydeas       0.7833      0.7111     0.7593             0.7524         0.7938   0.7853
Q            0.9629      0.9450     0.9559             0.9542         0.9693   0.9637
Qw           0.9436      0.9118     0.9055             0.9389         0.9497   0.9457
QE           0.8671      0.7694     0.7311             0.8553         0.8710   0.8745
Fblind       0.9110      0.8631     0.8559             0.8745         0.9012   0.9163
Pblind       0.8534      0.8218     0.8051             0.7986         0.8356   0.8471


The reference-based assessment is carried out by comparing with the "perfect" reference, i.e. the full-focus image. This comparison indicates which fusion algorithm performs best; with that knowledge, we can further validate the blind metrics Fblind and Pblind.

In the case of the assessment with a reference image (the full-focus image), we can see that in the five cases the metrics favour the steerable pyramid based approach most. The constituents of the votes of each assessment metric for the different algorithms are shown in Fig. 8.

In the case of image fusion without a reference, only the blind evaluation can be carried out. It is not surprising to see that the fusion algorithms show different performance on the various images. The steerable pyramid is selected as the best by the two proposed metrics Fblind and Pblind. The metric QE gives a similar result except for the image "Japanese food".

Fig. 8. The constituents of the votes of each assessment for the different fusion algorithms.

As far as the reference-based assessment is concerned, our proposed approach (Pref) is in accordance with the SSIM metric; the only disagreement occurs for the image "Pepsi". For the blind assessment, our feature-based evaluation is consistent with the reference-based results, whereas some of the other metrics do not show this consistency.

When an image is compared with a reference, the SSIM metric is believed to have an advantage over the other approaches [10]. Our proposed metrics FSSIM and Pref achieve similar results to the UIQI and SSIM metrics in the case of the seven degraded Lena images. In the multi-focus imaging application, we count the votes from the assessment metrics: over the five pairs of images, 58% of the votes are in favour of the steerable pyramid algorithm (see Fig. 9). Both the reference-based and blind metrics indicate that the steerable pyramid achieved a fused image of higher quality, and the proposed metrics are consistent with the reference-based assessment.

Fig. 9. The votes of the evaluation metrics for the multiresolution fusion algorithms in the multi-focus imaging application.

5. Discussion

Image fusion is an application-dependent operation. In other words, the process depends on the type of the images or their formats. Because the images acquired by heterogeneous sensors possess different intensity maps, there is no "one size fits all" solution for the evaluation process. Therefore, one fusion algorithm may not necessarily achieve the same performance on distinct images under a given evaluation metric. One purpose of this study is to identify the feasibility and validity of the evaluation algorithms, i.e. how these metrics work for different images, rather than to rank these methods in a general sense. For a particular application there should be an optimal solution to the pixel-level fusion process; however, a benchmark must be set up for such a comparison in a predefined situation. To our knowledge, most work on pixel-level image fusion employs multiple metrics for assessing the fused results rather than relying on one metric only.

The fusion quality metrics provide a scale to assess the result and to guide the choice or the structure of fusion algorithms. This can be generalized as a two-step procedure: (1) make clear what is expected in the fusion result and select one or multiple evaluation metrics for this purpose; (2) test the fusion algorithms with the selected evaluation metrics and choose the appropriate one. In addition, the requirements of post-processing provide a direct test of the quality of the fused image, even though this procedure may not always provide a quantitative evaluation of the fused results.

The fusion quality metrics we propose in this paper, i.e. Fblind and Pblind, are feature-oriented approaches. These metrics can successfully identify the image quality based on the feature measurement obtained with the phase congruency method. Gradient-based algorithms for feature detection are inadequate for edges composed of combinations of steps, peaks, and roofs [5,6]. Invariant quantities in images are very important for evaluating wide classes of images, which provide a very dynamic and unstructured environment for the algorithms applied [5,7]. Phase congruency allows edges, lines, and other features to be detected reliably [5–7], and the match between the fused and input images can be detected by using the local correlation of the phase congruency, as in the proposed metrics. In other words, when the target of the fusion is to combine features like step edges, lines, and Mach bands from multiple input images, the Fblind and Pblind metrics provide an effective way to assess the suitability of the potential algorithms. In the implementation of the Pblind metric, the cross-correlation is employed to measure the similarity of image features; other similarity measures, such as those presented in [25], will also be considered in future work.

It should be mentioned that the proposed approach is affected by the presence of noise introduced during image acquisition; the evaluation metrics cannot identify and remove the noise automatically. In multiresolution-based pixel-level fusion, the choice of significance relies on the absolute value of the coefficients in the transform domain. If no denoising process is applied, salient noise will also be fused into the image as a complementary feature by the multiresolution algorithm with the fusion rule described in the experiment. This is an inherent disadvantage of multiresolution-based pixel-level fusion with most of the currently available fusion rules. Thus, appropriate noise removal processes should be applied to each of the input images.

6. Conclusion

In this paper, two new feature-based metrics for image fusion performance are presented. The two metrics are based on a modified SSIM scheme and on the local cross-correlations between the feature maps of the fused and input images. The image features are represented by a dimensionless quantity ranging from zero to one, namely phase congruency, which is invariant to changes in image illumination and contrast. These metrics provide an objective quality measure of the fused image in the absence of a reference image. Although image fusion is an application-dependent process, the metrics proposed in this study can be applied wherever features like step edges, lines, and Mach bands are to be integrated from multiple images. The effectiveness of the approach can be seen from the comparison with other solutions. It is worth pointing out that when we talk about fusion performance, we need to identify what fusion algorithm is employed, what types of images are used, and which evaluation metric is applied. The evaluation metric must be chosen carefully based on the sensor type, the image format, and the requirements of the particular application; there is no uniform standard for all image fusion applications, and in a complicated case multiple metrics should be considered. Another interesting aspect of this work is how the metrics can be utilized to optimize the fusion process at the algorithm development stage; we will explore this in our future study.

Acknowledgments

Dr. Andy Goshtasby at Image Fusion Systems Research, Dr. Zhou Wang at the University of Texas (Austin), the SPCR Lab at Lehigh University, and Ginfuku at Kyoto are acknowledged for providing the images used in this work. Some of the images are available from http://www.imagefusion.org and http://www.cns.nyu.edu/~zwang/.

References

[1] R.S. Blum, Z. Xue, Z. Zhang, An overview of image fusion, in: R.S. Blum, Z. Liu (Eds.), Multi-Sensor Image Fusion and Its Applications, Taylor and Francis, 2005, pp. 1–35 (ch. 1).
[2] H. Li, B.S. Manjunath, S.K. Mitra, Multisensor image fusion using the wavelet transform, Graphical Models and Image Processing 57 (3) (1995) 235–245.
[3] Z. Zhang, R.S. Blum, Image fusion for a digital camera application, in: Proceedings of the 32nd Asilomar Conference on Signals, Systems, and Computers, Monterey, CA, 1998, pp. 603–607.
[4] D.S. Forsyth, J.P. Komorowski, NDT data fusion for improved corrosion detection, in: X.E. Gros (Ed.), Applications of NDT Data Fusion, Kluwer Academic Publishers, 2001, pp. 205–225.
[5] P. Kovesi, Image features from phase congruency, Videre: A Journal of Computer Vision Research 1 (3) (1999).
[6] P. Kovesi, Image features from phase congruency, University of Western Australia, Tech. Rep., 1995.
[7] P. Kovesi, Invariant measures of image features from phase information, Ph.D. dissertation, University of Western Australia, May 1996.
[8] Y. Wang, B. Lohmann, Multisensor image fusion: concept, method and applications, Institut für Automatisierungstechnik, Universität Bremen, Tech. Rep., 2000.
[9] Z. Wang, A.C. Bovik, A universal image quality index, IEEE Signal Processing Letters 9 (8) (2002) 81–84.
[10] Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli, Image quality assessment: from error measurement to structural similarity, IEEE Transactions on Image Processing 13 (1) (2004).
[11] G. Qu, D. Zhang, P. Yan, Information measure for performance of image fusion, Electronics Letters 38 (7) (2002) 313–315.
[12] C.S. Xydeas, V. Petrovic, Objective image fusion performance measure, Electronics Letters 36 (4) (2000) 308–309.
[13] C.S. Xydeas, V. Petrovic, Objective pixel-level image fusion performance measure, in: Proceedings of SPIE, vol. 4051, 2000, pp. 89–98.
[14] G. Piella, H. Heijmans, A new quality metric for image fusion, in: Proceedings of the International Conference on Image Processing, Barcelona, 2003.
[15] G. Piella, New quality measures for image fusion, in: Proceedings of the International Conference on Information Fusion, Stockholm, Sweden, 2004.
[16] M.C. Morrone, R.A. Owens, Feature detection from local energy, Pattern Recognition Letters 6 (1987) 303–313.
[17] P. Kovesi, Invariant measures of image features from phase information. Available from: http://www.csse.uwa.edu.au/pk/research/research.html.
[18] G. Chen, C. Yang, L. Po, S. Xie, Edge-based structural similarity for image quality assessment, in: Proceedings of ICASSP, vol. II, 2006, pp. 993–996.
[19] J. Martin, J.L. Crowley, Experimental comparison of correlation techniques, in: Proceedings of the International Conference on Intelligent Autonomous Systems, 1995.
[20] E.H. Adelson, C.H. Anderson, J.R. Bergen, P.J. Burt, J.M. Ogden, Pyramid methods in image processing, RCA Engineer 29 (6) (1984) 33–41.
[21] P.J. Burt, R.J. Kolczynski, Enhanced image capture through fusion, in: Proceedings of the Fourth International Conference on Image Processing, 1993, pp. 248–251.
[22] A. Toet, Image fusion by a ratio of low-pass pyramid, Pattern Recognition Letters 9 (1989) 245–253.
[23] O. Rockinger, Image sequence fusion using a shift-invariant wavelet transform, in: Proceedings of the International Conference on Image Processing, vol. 3, 1997, pp. 288–301.
[24] Z. Liu, K. Tsukada, K. Hanasaki, Y.K. Ho, Y.P. Dai, Image fusion by using steerable pyramid, Pattern Recognition Letters 22 (2001) 929–939.
[25] D. Van der Weken, M. Nachtegael, E.E. Kerre, Using similarity measures and homogeneity for the comparison of images, Image and Vision Computing 22 (2004) 695–702.