
362 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 50, NO. 2, FEBRUARY 2012

An FPGA-Based Hardware Implementation of Configurable Pixel-Level Color Image Fusion

Dimitrios Besiris, Vassilis Tsagaris, Member, IEEE, Nikolaos Fragoulis, Member, IEEE, and Christos Theoharatos, Member, IEEE

Abstract—Image fusion has attracted a lot of interest in recent years. As a result, different fusion methods have been proposed, mainly in the fields of remote sensing and computer (e.g., night) vision, while hardware implementations have also been presented to tackle real-time processing in different application domains. In this paper, a linear pixel-level fusion method is employed and implemented on a field-programmable-gate-array-based hardware system that is suitable for remotely sensed data. Our work incorporates a fusion technique (called VTVA) that is a linear transformation based on the Cholesky decomposition of the covariance matrix of the source data. The circuit is composed of different modules, including covariance estimation, Cholesky decomposition, and transformation ones. The resulting compact hardware design can be characterized as a linear configurable implementation, since the color properties of the final fused color image can be selected by the user as a means of controlling the resulting correlation between color components.

Index Terms—Color representation, field-programmable gate arrays (FPGAs), hardware implementation, image fusion.

I. INTRODUCTION

THE TERM “image fusion” often refers to the process of combining information from different imaging modalities of a scene in a single composite image representation that is more informative and appropriate for visual perception or further processing [1]. Early work on image fusion can be traced back to the mid-1980s. A large number of different image fusion methods have been proposed, mainly due to the different available data types and various applications. A comprehensive survey of image fusion methods is available in [2], while a collection of papers was edited by Blum and Liu in [3]. For a dedicated review article on pixel-based image fusion in remote sensing, the interested reader is referred to [4], where related techniques for Earth observation satellite data are presented as a contribution to multisensory-integration-oriented data processing.

Image fusion methods are mainly categorized into pixel (low), feature (mid), or symbolic (high) level.

Manuscript received December 9, 2010; revised April 20, 2011 and July 6, 2011; accepted July 17, 2011. Date of publication September 15, 2011; date of current version January 20, 2012. This work was supported by the Greek Ministry of Education, under the Corallia Clusters Initiative, with the project “SYNTHESIS,” under Contract MIKRO-I/18.

D. Besiris is with the Electronics Laboratory, Department of Physics, University of Patras, 26504 Rio, Greece, and also with IRIDA Labs, 26504 Rio, Greece (e-mail: [email protected]).

V. Tsagaris, N. Fragoulis, and C. Theoharatos are with IRIDA Labs, 26504 Rio, Greece (e-mail: [email protected]; [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TGRS.2011.2163723

Pixel-level techniques that work in the spatial domain [5]–[7] have gained significant interest, mainly due to their simplicity and linearity. Multiresolution analysis is another popular approach for pixel-level image fusion [8]–[10], using filters of increasing spatial scale in order to produce a pyramid sequence of images at different resolutions. In most of these approaches, at each position of the transformed image, the value in the pyramid that corresponds to the highest saliency is used. Finally, an inverse transformation of the composite image is employed in order to derive the fused image. In the field of remote sensing, fusion of multiband images that lie in different spectral bands and corresponding areas of the electromagnetic spectrum is one of the key areas of research. The main target of these techniques is to produce an effective representation of the combined multispectral image data, i.e., an application-oriented visualization in a reduced data set [11]–[16]. Useful theoretical analysis and comparative evaluation (both visual and objective) of several image fusion methods are presented in [17] and [18], using experimental data sets from different sensors (IKONOS, Quickbird, and simulated Pleiades).

The requirements of modern society have lately paved the way not only for providing effective solutions to the image fusion problem but also for making fast implementations in order to boost related techniques and, therefore, allow them to be successfully used in a plethora of computer vision applications. Real-time implementation of image fusion systems is very demanding, since it employs algorithms with a relatively high runtime complexity. Lately, hardware implementations have emerged as a means to achieve real-time performance in image processing systems. These have mainly been adapted to multisensor platforms for video processing applications, such as military, security, and safety deployments. The complementary nature of visible and nonvisible sensors makes it possible to obtain useful video under varying conditions. In these types of applications, image fusion blends multiple video streams and produces enhanced visibility of the scene, increasing the effectiveness of the video information. The most widespread enabling technology for these kinds of implementations is the field-programmable gate array (FPGA). Modern versions of these devices offer a number of critical characteristics, such as a large number of logic elements to allow the implementation of complex algorithms, very large scale integration to occupy minimum space, low power consumption, and very high speed grades. Therefore, system implementations can be real time, mobile, robust, and low power.

In this paper, a hardware implementation of a real-time fusion system is proposed. The system is based on an Altera Cyclone II FPGA and implements a configurable linear pixel-level algorithm that produces color fused images.



The design is described in the VHSIC hardware description language (VHDL). The overall architecture is based on a control module, a covariance estimation module, a Cholesky decomposition module, and a transformation module. A detailed description of the Cholesky decomposition is also provided.

This paper is organized as follows. In Section II, a review of related work on hardware implementation of image fusion is presented. The necessary background for the pixel-level fusion method that is implemented in this paper is provided in Section III. Section IV presents the generic block-based system architecture, while subsystem hardware implementation details for the FPGA approach are given in Section V. Additional facts regarding operation performance on the Altera Cyclone II EP2C35F672 FPGA, including occupied resources, image size, input channels, bit length, and throughput of the individual units of the overall hardware implementation, are included in Section VI. Finally, the experimental verification is provided in Section VII, while conclusions are drawn in Section VIII.

II. RELATED WORK

A few attempts have been introduced in the literature for designing FPGA-based systems that implement image fusion algorithms. Jasiunas et al. [19] presented an image fusion system based on wavelet decomposition for unmanned airborne vehicles (UAVs). This is probably the first implementation developed on a reconfigurable platform alone, as well as the first investigation of adaptive image fusion that makes use of dynamic reconfiguration to change the fusion algorithm as the UAV approaches an object of interest. The authors chose an FPGA implementation mainly because it requires little physical space on a UAV platform. Results showed an achieved latency of 3.81 ms/frame for visible and infrared 8-bit images of 512 × 512 pixel resolution. Sims and Irvine presented in [20] an FPGA implementation using pyramidal decomposition and subsequent fusion of dual video streams. This realization enabled a design that can fuse dual video streams in grayscale video graphics array (VGA) resolution, at 30 frames/s, in real time. Both hardware designs showed a dramatic improvement in speed compared to the respective algorithms running on a typical PC, giving strong indications of the advantages of hardware over software implementations.

In [21], a real-time image processing system was presented for combining the video outputs of an uncooled infrared imaging system and a low-level-light TV system. Both images were 384 × 288 in size, with 8-bit resolution. The hardware implementation was based on a simple weighted pixel average and provided poor results regarding the contrast of the fused images. Aiming to provide enhanced results in both visual effect and image quality, Song et al. [22] proposed an alternative image fusion implementation based on Laplacian pyramid decomposition of two-channel VGA video, using parallel and pipelined architectures. In their work, a three-level Laplacian pyramid image fusion algorithm was implemented in VHDL according to the designed methods (including controlling, decomposing, fusion, and reconstruction modules). The design was verified on a real-time dual-channel image fusion system based on a Virtex-4 SX35 FPGA, giving a fusion frame rate of 25 frames/s (real-time video).

Furthermore, attempts have been made toward the development of real-time image fusion systems for use in practical applications. For example, Li et al. [23] proposed an FPGA system for multisensor image fusion and enhancement that improves visibility and can be used to help drivers driving at night or under bad weather conditions. Their design included wavelet-decomposition-based image fusion, as well as image registration and enhancement, in order to improve the visibility of roads in extremely low lighting conditions. A throughput of 67 megapixels/s was achieved, using 1024 × 1024 resolution for fusion of charge-coupled device (CCD) and long-wavelength infrared imagery, which is very suitable for application to a driver visibility improvement system. Other commercial products have also been proposed in the literature [24]–[26], with very limited details regarding the fusion methodology.

III. IMAGE FUSION METHOD BACKGROUND

In this section, the necessary background for a vector representation of multidimensional remotely sensed data is provided. Moreover, the basic principles of the pixel-level fusion method proposed in [6] are provided in order to facilitate the FPGA hardware implementation described in the later sections of this paper.

A. Vector Representation in a Multidimensional Space—Dimensionality Reduction

The statistical properties of a multispectral data set with M · N pixels per channel and K different channels can be explored if each pixel is described by a vector whose components are the individual spectral responses to each multispectral channel

X = [X_1, X_2, \ldots, X_K]^T    (1)

with a mean vector given by \bar{X} = E\{X\} = \frac{1}{M \cdot N} \sum_{i=1}^{M \cdot N} X_i. While the mean vector is used to define the average or expected position of the pixels in the vector space, the covariance matrix describes their scatter

C_x = \frac{1}{M \cdot N} \sum_{i=1}^{M \cdot N} X_i X_i^T - \bar{X}\bar{X}^T.    (2)

The covariance matrix can be used to quantify the correlation between the multispectral bands. In the case of a high degree of correlation, the corresponding off-diagonal elements in the covariance matrix will be large. The correlation between the different multispectral components can also be described by means of the correlation coefficient. The correlation coefficient r is related to the corresponding covariance matrix element, since it is the covariance matrix element divided by the product of the standard deviations of the corresponding multispectral components (r_{ij} = c_{ij}/(\sigma_i \sigma_j)). The correlation coefficient matrix R_x has as elements the correlation coefficients between the ith and jth multispectral components. Accordingly, all the diagonal elements are one, and the matrix is symmetric.

In the literature, several different linear transforms can be found that are based on the statistical properties of the vector representation. An important case is the Karhunen–Loeve transform, also known as principal component analysis (PCA). For this transformation, the matrix C_x is real and symmetric, so finding a set of orthonormal eigenvectors is always possible.


Let e_i and \lambda_i, i = 1, 2, \ldots, K, be the eigenvectors and the corresponding eigenvalues of C_x, arranged in descending order. Furthermore, let A be a matrix whose rows are formed by the eigenvectors of C_x, ordered so that the first row of A is the eigenvector corresponding to the largest eigenvalue and the last row is the eigenvector corresponding to the smallest one. The matrix A is the transformation matrix that maps vector X into Y

Y = A(X - \bar{X}).    (3)

The mean of Y resulting from that transformation is zero, and the covariance matrix C_y is given by

C_y = A C_x A^T.    (4)

The resulting covariance matrix C_y will be diagonal, and the elements along its main diagonal are the eigenvalues of C_x. The off-diagonal elements of the covariance matrix are zero, denoting that the elements of the vector population Y are uncorrelated. This transformation establishes a new coordinate system whose origin is at the centroid of the population and whose axes are in the directions of the eigenvectors of C_x. This coordinate system clearly shows that the transformation in (3) is a rotation transformation that aligns the eigenvectors with the data, and this alignment is exactly the mechanism that decorrelates the data.
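As a minimal numerical illustration of (3) and (4), the following NumPy sketch (ours, not part of the original paper; the synthetic data and names are illustrative only) builds A from the eigenvectors of C_x and verifies that the transformed covariance is diagonal:

```python
import numpy as np

# Synthetic population of K-dimensional pixel vectors (columns = pixels).
rng = np.random.default_rng(0)
K, n_pixels = 4, 10000
X = rng.normal(size=(K, n_pixels)) * np.array([[3.0], [2.0], [1.0], [0.5]])
X[1] += 0.8 * X[0]                       # introduce correlation between bands

X_mean = X.mean(axis=1, keepdims=True)
Cx = np.cov(X, bias=True)                # covariance matrix of the population

# Rows of A are the eigenvectors of Cx, ordered by descending eigenvalue.
eigvals, eigvecs = np.linalg.eigh(Cx)
order = np.argsort(eigvals)[::-1]
A = eigvecs[:, order].T

Y = A @ (X - X_mean)                     # Hotelling (PCA) transform, Eq. (3)
Cy = np.cov(Y, bias=True)                # Eq. (4): diagonal, eigenvalues of Cx
print(np.round(Cy, 3))                   # off-diagonal terms are ~0
```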

The PCA transform is optimal in the sense that the first principal component has the highest contrast, and it can be displayed as a grayscale image carrying the largest percentage of the total variance and, thus, the largest percentage of the visual information. The aforementioned property does not hold in the case of a color image. If the three principal components are used to establish a red–green–blue (RGB) image (the first component as red, the second as green, and the third as blue), the result is not optimal for the human visual system. The first principal component (red) will exhibit a high degree of contrast, the second (green) will display only a limited range of brightness values, and the third one (blue) will demonstrate an even smaller range. In addition, the three components displayed as R, G, and B are totally uncorrelated, and this is an assumption that does not hold for natural images [27], [28]. Therefore, a color image having as RGB channels the first three principal components produced by the PCA transformation of the source multispectral channels possesses, in most cases, unnatural correlation properties as opposed to natural color images.

B. VTVA Fusion Method

A different approach for RGB image formation using multispectral data is not to totally decorrelate the data but to control the correlation between the color components of the final image. This is achieved by means of the covariance matrix. The proposed transformation distributes the energy of the source multispectral bands so that the correlation between the RGB components of the final image may be selected by the user/visual expert or adjusted to be similar to that of natural color images. For example, one could calculate the mean correlation between the red–green, red–blue, and green–blue channels over a database of a large number of natural images (like the Corel or Imagine Macmillan database), as in [6]. In this way, no additional transformation is needed, and direct representation on any RGB display can be applied. This can be achieved using a linear transformation of the form

Y = A^T X    (5)

where X and Y are the population vectors of the source and the final images, respectively. The relation between the covariance matrices is

C_y = A^T C_x A    (6)

where C_x is the covariance of the vector population X and C_y is the covariance of the resulting vector population Y. The required values for the elements of the resulting covariance matrix C_y are based on the study of natural color images [6]. The selection of a covariance matrix based on the statistical properties of natural color images guarantees that the resulting color image will be pleasing to the human eye. The RGB correlation coefficients depend on the scenes depicted in the images. However, since a large variety of images with different scenes, perceptually pleasing for the observer, have been chosen from the database, the mean value of the correlation coefficients is not affected by the selection of the scenes. The matrices C_x and C_y are of the same dimension, and if they are known, the transformation matrix A can be evaluated using the Cholesky factorization method. Accordingly, a symmetric positive definite matrix S can be decomposed by means of an upper triangular matrix Q so that

S = Q^T \cdot Q.    (7)

Using the aforementioned factorization, the matrices C_x and C_y can be written as

C_x = Q_x^T Q_x
C_y = Q_y^T Q_y    (8)

and (6) becomes

Q_y^T Q_y = A^T Q_x^T Q_x A = (Q_x A)^T Q_x A.    (9)

Thus

Q_y = Q_x A    (10)

and the transformation matrix A is

A = Q_x^{-1} Q_y.    (11)

The final form of the transformation matrix A implies that the proposed transformation depends on the statistical properties of the original multispectral data set. Additionally, in the design of the transformation, the statistical properties of natural color images are taken into account. The resulting population vector Y is of the same order as the original population vector X, but only three of the components of Y will be used for color representation.
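A small numerical check of (7)–(11), assuming NumPy, can make the derivation concrete; note that numpy.linalg.cholesky returns the lower triangular factor, so the upper triangular Q_x and Q_y of (8) are obtained by transposition (the matrices below are arbitrary illustrative values, not data from the paper):

```python
import numpy as np

def upper_cholesky(C):
    """Upper triangular Q such that C = Q.T @ Q, as in Eq. (8)."""
    return np.linalg.cholesky(C).T        # NumPy returns the lower factor L, with C = L @ L.T

# Example source and target covariance matrices (symmetric positive definite).
Cx = np.array([[4.0, 1.2, 0.8],
               [1.2, 3.0, 0.5],
               [0.8, 0.5, 2.0]])
Cy = np.array([[2.5, 1.0, 0.9],
               [1.0, 2.5, 0.8],
               [0.9, 0.8, 2.5]])

Qx, Qy = upper_cholesky(Cx), upper_cholesky(Cy)
A = np.linalg.inv(Qx) @ Qy                # Eq. (11): A = Qx^{-1} Qy

# Verification of Eq. (6): the transformed covariance equals the target Cy.
print(np.allclose(A.T @ Cx @ A, Cy))      # True
```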

The evaluation of the desired covariance matrix C_y for the transformed vector is based on the statistical properties of natural color images, discussed in [6], and on requirements imposed by the user or the visual expert.


Fig. 1. Flowchart for the proposed system.

Fig. 2. Block-based architecture of the VTVA fusion system.

The relation between the covariance matrix C_y and the correlation coefficient matrix R_y is given by

C_y = \Sigma R_y \Sigma^T    (12)

where

\Sigma = \begin{bmatrix}
\sigma_{y1} & 0 & 0 & \cdots & 0 \\
0 & \sigma_{y2} & 0 & \cdots & 0 \\
0 & 0 & \sigma_{y3} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \sigma_{yK}
\end{bmatrix}    (13)

is the diagonal matrix with the standard deviations of the new vectors on the main diagonal and

R_y = \begin{bmatrix}
1 & r_{R,G} & r_{R,B} & \cdots & 0 \\
r_{R,G} & 1 & r_{G,B} & \cdots & 0 \\
r_{R,B} & r_{G,B} & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{bmatrix}    (14)

is the desired correlation coefficient matrix.

The necessary steps for the method implementation are shown in Fig. 1 and can be summarized as follows.

1) Estimate the covariance matrix C_x of the population vectors X.
2) Compute the covariance matrix C_y of the population vectors Y, using the correlation coefficient matrix R_y and the diagonal matrix \Sigma.
3) Decompose the covariance matrices C_x and C_y using the Cholesky factorization method in (8), by means of the upper triangular matrices Q_x and Q_y, respectively.
4) Compute the inverse of the upper triangular matrix Q_x, namely, Q_x^{-1}.
5) Compute the transformation matrix A in (11).
6) Compute the transformed population vectors Y using (5).
7) Scale the mapped images to the range [0, 255] in order to produce the RGB representation.

For high visual quality, the final color image produced by the transformation must have a high degree of contrast. In other words, the energy of the original data must be sustained and equally distributed among the RGB components of the final color image. This requirement is expressed as follows:

\sum_{i=1}^{K} \sigma_{x_i}^2 = \sum_{i=1}^{3} \sigma_{y_i}^2    (15)

with \sigma_{y1} = \sigma_{y2} = \sigma_{y3} approximately. The remaining bands should have negligible energy (contrast) and will not be used in forming the final color image. Their variances can be adjusted to small values, for example, \sigma_{y_i} = 10^{-4}\sigma_{y1}, for i = 4, \ldots, K.
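The whole procedure of Fig. 1 can be summarized in software form. The sketch below is an illustrative double-precision reference model only (in the spirit of the MATLAB model used for verification in Section VI), not the fixed-point hardware design; the function name, the default correlation coefficients, and κ are placeholder choices of ours:

```python
import numpy as np

def vtva_fuse(bands, r_rg=0.6, r_rb=0.4, r_gb=0.5, kappa=1e-4):
    """Illustrative VTVA reference model for K = 4 input bands.

    bands: array of shape (4, H, W); returns an (H, W, 3) uint8 RGB image.
    r_rg, r_rb, r_gb: desired RGB correlation coefficients (user-selected).
    kappa: small factor controlling the energy of the unused fourth band.
    """
    K, H, W = bands.shape
    X = bands.reshape(K, -1).astype(np.float64)

    # Step 1: covariance matrix Cx of the source population, Eq. (2).
    Cx = np.cov(X, bias=True)

    # Step 2: desired covariance Cy from Ry and the variances, Eqs. (12)-(15), (21).
    sigma_y = np.sqrt(np.trace(Cx) / 3.0)        # equal split of the total energy
    Ry = np.array([[1.0, r_rg, r_rb, 0.0],
                   [r_rg, 1.0, r_gb, 0.0],
                   [r_rb, r_gb, 1.0, 0.0],
                   [0.0, 0.0, 0.0, 1.0]])
    S = np.diag([sigma_y, sigma_y, sigma_y, kappa * sigma_y])
    Cy = S @ Ry @ S.T

    # Steps 3-5: Cholesky factors and transformation matrix, Eqs. (8)-(11).
    Qx = np.linalg.cholesky(Cx).T
    Qy = np.linalg.cholesky(Cy).T
    A = np.linalg.inv(Qx) @ Qy

    # Step 6: linear transformation Y = A^T X, Eq. (5).
    Y = A.T @ X

    # Step 7: scale each of the first three components to [0, 255], Eq. (40).
    rgb = np.empty((3, H * W))
    for k in range(3):
        y = Y[k]
        rgb[k] = 255.0 * (y - y.min()) / (y.max() - y.min())
    return rgb.reshape(3, H, W).transpose(1, 2, 0).astype(np.uint8)
```

A call such as vtva_fuse(np.stack([b1, b2, b3, nir])) would then yield a displayable RGB composite; only the first three transformed components are kept, in line with the requirement that the remaining bands carry negligible energy.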

IV. SYSTEM ARCHITECTURE

In order to efficiently implement the VTVA fusion method in FPGA-based hardware, we derived the block-based system architecture shown in Fig. 2. The circuit is composed of seven blocks (labeled in blue), where the Cholesky decomposition, the inverse matrix, and the transformation matrix are integrated into one subsystem, as will be explained in Section V. The overall architecture has four data inputs and one data output. In addition, a controlling signal is required, as shown in Table I.


TABLE I
VTVA FUSION SYSTEM I/O SIGNALS

V. SUBSYSTEM IMPLEMENTATION DETAILS

This section provides the detailed implementation of two of the subsystems that comprise the hardware architecture. For each subsystem, the mathematical formulas and the simplifications that are necessary for an efficient hardware implementation are provided. This enables the realization in a low-level hardware description language such as VHDL, which is the dominant language in FPGA-based hardware implementation.

A. Covariance Matrix

The covariance matrix C_x for the source population vectors X is estimated using the following equation:

C_{x_{kj}} = \frac{1}{M \cdot N} \sum_{i=1}^{M \cdot N} \left[ (X_{ki} - \bar{X}_k) \cdot (X_{ji} - \bar{X}_j) \right]    (16)

where C_{x_{kj}} refers to the covariance element at the kth row and jth column, and \bar{X}_k and \bar{X}_j are the mean values of the input vectors X_k and X_j, respectively, as estimated in the following:

\bar{X}_k = \frac{1}{M \cdot N} \sum_{i=1}^{M \cdot N} X_{ki}    (17)

\bar{X}_j = \frac{1}{M \cdot N} \sum_{i=1}^{M \cdot N} X_{ji}.    (18)

By substituting these expressions for the mean values into (16), each element of the covariance matrix C_{x_{kj}} can be estimated using the following analytical expression:

C_{x_{kj}} = \frac{1}{M \cdot N} \sum_{i=1}^{M \cdot N} \left[ \left( X_{ki} - \frac{1}{M \cdot N} \sum_{i=1}^{M \cdot N} X_{ki} \right) \cdot \left( X_{ji} - \frac{1}{M \cdot N} \sum_{i=1}^{M \cdot N} X_{ji} \right) \right]
= \frac{1}{M \cdot N} \sum_{i=1}^{M \cdot N} X_{ki} \cdot X_{ji} - \frac{1}{(M \cdot N)^2} \sum_{i=1}^{M \cdot N} X_{ki} \cdot \sum_{i=1}^{M \cdot N} X_{ji}.    (19)

A detailed architecture of the covariance matrix estimation subsystem, composed of six blocks, is shown in Fig. 3. The subsystem has four data inputs and one data output. The procedure is controlled by four signals, as shown in Table II, and is explained in detail using the pseudocode in Fig. 4.
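Equation (19) is what makes a streaming hardware implementation possible: each covariance element requires only running sums of X_k · X_j and of each individual channel, accumulated in a single pass over the pixels and combined at the end. The following sketch (our NumPy illustration, not the pseudocode of Fig. 4) mimics that accumulation scheme:

```python
import numpy as np

def covariance_single_pass(bands):
    """Estimate Cx for K input channels in one pass, following Eq. (19).

    bands: K equally sized channel arrays (e.g., shape 4 x H x W).
    Accumulates sum(Xk*Xj) and sum(Xk); no per-pixel mean subtraction is needed.
    """
    X = np.asarray(bands, dtype=np.float64)
    K = X.shape[0]
    n = X[0].size                                  # M*N pixels per channel
    sum_x = np.zeros(K)                            # running sums of each channel
    sum_xx = np.zeros((K, K))                      # running sums of channel products

    # In hardware these sums are updated pixel by pixel as data stream in;
    # here each channel is flattened and accumulated in a pixel-serial loop.
    flat = X.reshape(K, -1)
    for i in range(flat.shape[1]):
        x = flat[:, i]
        sum_x += x
        sum_xx += np.outer(x, x)

    # Combine the accumulators exactly as in Eq. (19).
    return sum_xx / n - np.outer(sum_x, sum_x) / (n * n)

# Sanity check against the textbook definition in Eq. (16).
rng = np.random.default_rng(1)
test = rng.integers(0, 256, size=(4, 32, 32)).astype(np.float64)
assert np.allclose(covariance_single_pass(test), np.cov(test.reshape(4, -1), bias=True))
```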

B. Cholesky Decomposition and Transformation Subsystem

This subsystem integrates in a single unit the processes of Cholesky decomposition, matrix inversion, and calculation of the transformation matrix. The covariance matrix C_y in (12) is computed by means of the correlation coefficient matrix R_y in (14) and the diagonal matrix \Sigma in (13), resulting in the following equation:

C_y = \sigma_y \cdot \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & \kappa
\end{bmatrix} \cdot \begin{bmatrix}
1 & r_{R,G} & r_{R,B} & 0 \\
r_{R,G} & 1 & r_{G,B} & 0 \\
r_{R,B} & r_{G,B} & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix} \cdot \sigma_y \cdot \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & \kappa
\end{bmatrix}
= \sigma_y^2 \cdot \begin{bmatrix}
1 & r_{R,G} & r_{R,B} & 0 \\
r_{R,G} & 1 & r_{G,B} & 0 \\
r_{R,B} & r_{G,B} & 1 & 0 \\
0 & 0 & 0 & \kappa^2
\end{bmatrix}    (20)

where \kappa (\ll 1) is the adjustable parameter that controls the contribution of the fourth variance value \sigma_{y4}. The variance of the transformed population vectors, \sigma_y, is computed under the requirement in (15), from the sum of the covariance matrix elements C_{x_{ii}} on the main diagonal

\sigma_y = \sqrt{\frac{1}{3} \sum_{i=1}^{4} \sigma_{x_i}^2} = \sqrt{\frac{1}{3} \sum_{i=1}^{4} C_{x_{ii}}}.    (21)

The covariance matrix of the source population vectors X is decomposed by means of the upper triangular matrix Q_x, using Cholesky's factorization, as given in (22) below. Using (22), the covariance matrix C_y is decomposed into the upper triangular matrix

Q_y = \sigma_y \begin{bmatrix}
1 & r_{R,G} & r_{R,B} & 0 \\
0 & \alpha & \beta & 0 \\
0 & 0 & \gamma & 0 \\
0 & 0 & 0 & \kappa
\end{bmatrix}    (23)

where \alpha, \beta, and \gamma are three parameters introduced during the computation of (23) and are given by the following equations:

\alpha = \left(1 - r_{R,G}^2\right)^{\frac{1}{2}}
\beta = \frac{1}{\alpha}\left(r_{G,B} - r_{R,G} r_{R,B}\right)
\gamma = \left(1 - \left(r_{R,B}^2 + \beta^2\right)\right)^{\frac{1}{2}}.    (24)
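The closed-form entries \alpha, \beta, and \gamma in (24) are the explicit Cholesky factorization of the correlation block of R_y. A quick consistency check, assuming NumPy and using placeholder coefficient values of ours:

```python
import numpy as np

r_rg, r_rb, r_gb, sigma_y, kappa = 0.6, 0.4, 0.5, 1.0, 1e-4

# Eq. (24): closed-form entries of the upper triangular factor Qy.
alpha = np.sqrt(1.0 - r_rg**2)
beta = (r_gb - r_rg * r_rb) / alpha
gamma = np.sqrt(1.0 - (r_rb**2 + beta**2))

Qy = sigma_y * np.array([[1.0, r_rg, r_rb, 0.0],
                         [0.0, alpha, beta, 0.0],
                         [0.0, 0.0, gamma, 0.0],
                         [0.0, 0.0, 0.0, kappa]])    # Eq. (23)

# Cy from Eq. (20); its upper Cholesky factor must reproduce Qy.
Cy = sigma_y**2 * np.array([[1.0, r_rg, r_rb, 0.0],
                            [r_rg, 1.0, r_gb, 0.0],
                            [r_rb, r_gb, 1.0, 0.0],
                            [0.0, 0.0, 0.0, kappa**2]])
print(np.allclose(np.linalg.cholesky(Cy).T, Qy))     # True
```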


Fig. 3. Detailed block-based architecture of the covariance matrix estimation subsystem.

The inverse matrix of the upper triangular matrix Q_x is also an upper triangular matrix

Q_x^{-1} = \frac{1}{|\det(Q_x)|} \cdot \mathrm{adj}(Q_x) = \begin{bmatrix}
Q^{-1}_{x11} & Q^{-1}_{x12} & Q^{-1}_{x13} & Q^{-1}_{x14} \\
0 & Q^{-1}_{x22} & Q^{-1}_{x23} & Q^{-1}_{x24} \\
0 & 0 & Q^{-1}_{x33} & Q^{-1}_{x34} \\
0 & 0 & 0 & Q^{-1}_{x44}
\end{bmatrix}    (25)

where \det(Q_x) is the determinant and \mathrm{adj}(Q_x) is the corresponding adjugate matrix. The resulting inverse matrix Q_x^{-1} is given in (26) below.

The transformation matrix A in (11) is calculated by the multiplication of the inverse matrix Q_x^{-1} in (26) and the upper triangular matrix Q_y in (23), resulting in (27) below.

Q_{x_{ij}} = \begin{cases}
\left( C_{x_{ii}} - \sum_{k=1}^{i-1} Q_{x_{ki}}^2 \right)^{\frac{1}{2}}, & \text{for } i = j = 1, 2, 3, 4 \\
\frac{1}{Q_{x_{ii}}} \left( C_{x_{ji}} - \sum_{k=1}^{i-1} Q_{x_{ki}} Q_{x_{kj}} \right), & \text{for } j = i+1, i+2, \ldots, 4
\end{cases}    (22)

A = \begin{bmatrix}
Q^{-1}_{x11} & Q^{-1}_{x12} & Q^{-1}_{x13} & Q^{-1}_{x14} \\
0 & Q^{-1}_{x22} & Q^{-1}_{x23} & Q^{-1}_{x24} \\
0 & 0 & Q^{-1}_{x33} & Q^{-1}_{x34} \\
0 & 0 & 0 & Q^{-1}_{x44}
\end{bmatrix} \cdot \sigma_y \begin{bmatrix}
1 & r_{R,G} & r_{R,B} & 0 \\
0 & \alpha & \beta & 0 \\
0 & 0 & \gamma & 0 \\
0 & 0 & 0 & \kappa
\end{bmatrix}
= \sigma_y \cdot \begin{bmatrix}
Q^{-1}_{x11} & r_{R,G} Q^{-1}_{x11} + \alpha Q^{-1}_{x12} & r_{R,B} Q^{-1}_{x11} + \beta Q^{-1}_{x12} + \gamma Q^{-1}_{x13} & \kappa Q^{-1}_{x14} \\
0 & \alpha Q^{-1}_{x22} & \beta Q^{-1}_{x22} + \gamma Q^{-1}_{x23} & \kappa Q^{-1}_{x24} \\
0 & 0 & \gamma Q^{-1}_{x33} & \kappa Q^{-1}_{x34} \\
0 & 0 & 0 & \kappa Q^{-1}_{x44}
\end{bmatrix}    (27)

Q_x^{-1} = \begin{bmatrix}
\frac{1}{Q_{x11}} & \frac{-Q_{x12}}{Q_{x11} Q_{x22}} & \frac{Q_{x12} Q_{x23} - Q_{x13} Q_{x22}}{Q_{x11} Q_{x22} Q_{x33}} & \frac{Q_{x33}(Q_{x12} Q_{x24} - Q_{x14} Q_{x22}) - Q_{x34}(Q_{x12} Q_{x23} - Q_{x13} Q_{x22})}{Q_{x11} Q_{x22} Q_{x33} Q_{x44}} \\
0 & \frac{1}{Q_{x22}} & \frac{-Q_{x23}}{Q_{x22} Q_{x33}} & \frac{Q_{x23} Q_{x34} - Q_{x24} Q_{x33}}{Q_{x22} Q_{x33} Q_{x44}} \\
0 & 0 & \frac{1}{Q_{x33}} & \frac{-Q_{x34}}{Q_{x33} Q_{x44}} \\
0 & 0 & 0 & \frac{1}{Q_{x44}}
\end{bmatrix}    (26)


TABLE II
COVARIANCE MATRIX SUBSYSTEM I/O AND CONTROL SIGNALS

Fig. 4. Pseudocode of the covariance matrix procedure.

Using the elements of the inverse matrix in (27), the elements of the transformation matrix A are computed in the following equations:

A_{11} = \sigma_y Q^{-1}_{x11} = \sigma_y \frac{1}{Q_{x11}}    (28)
A_{22} = \alpha \sigma_y Q^{-1}_{x22} = \alpha \sigma_y \frac{1}{Q_{x22}}    (29)
A_{33} = \gamma \sigma_y Q^{-1}_{x33} = \gamma \sigma_y \frac{1}{Q_{x33}}    (30)
A_{44} = \kappa \sigma_y Q^{-1}_{x44} = \kappa \sigma_y \frac{1}{Q_{x44}}    (31)
A_{12} = \sigma_y \left( r_{R,G} Q^{-1}_{x11} + \alpha Q^{-1}_{x12} \right) = \left( \sigma_y r_{R,G} - A_{22} Q_{x12} \right) \frac{1}{Q_{x11}}    (32)
A_{23} = \sigma_y \left( \beta Q^{-1}_{x22} + \gamma Q^{-1}_{x23} \right) = \left( \sigma_y \beta - A_{33} Q_{x23} \right) \frac{1}{Q_{x22}}    (33)
A_{34} = \kappa \sigma_y Q^{-1}_{x34} = -A_{44} Q_{x34} \frac{1}{Q_{x33}}    (34)
A_{13} = \sigma_y \left( r_{R,B} Q^{-1}_{x11} + \beta Q^{-1}_{x12} + \gamma Q^{-1}_{x13} \right) = \left( \sigma_y r_{R,B} - A_{23} Q_{x12} - A_{33} Q_{x13} \right) \frac{1}{Q_{x11}}    (35)
A_{24} = \kappa \sigma_y Q^{-1}_{x24} = -\left( A_{34} Q_{x23} + A_{44} Q_{x24} \right) \frac{1}{Q_{x22}}    (36)
A_{14} = \kappa \sigma_y Q^{-1}_{x14} = -\left( A_{24} Q_{x12} + A_{34} Q_{x13} + A_{44} Q_{x14} \right) \frac{1}{Q_{x11}}.    (37)

This set of equations can be replaced by one unified equation (maintaining the order of computations), as shown in (38):

A_{ij} = \begin{cases}
\frac{1}{Q_{x_{ii}}} \left( c_{ij}\, \sigma_y - \sum_{k=i+1}^{4} A_{kj} Q_{x_{ik}} \right), & \text{for } j = i, i+1, \ldots, 4 \\
0, & \text{otherwise}
\end{cases}    (38)

where c is the set of coefficients, i.e., c = \{1, \alpha, \gamma, \kappa, r_{R,G}, r_{R,B}, \beta, 0\}, used in (28)–(37). Equations (22) and (38) have the same mathematical form. As a result, both the upper triangular matrix Q_x and the transformation matrix A can be computed by the same circuit block, which is shown in Fig. 5. The circuit comprises six blocks and has four data inputs and one data output. The process is controlled by three signals, as shown in Table III.

1) Multiplication: Inputs AQi and Qj are multiplied.
2) Addition: The multiplication result MAQij is summed. The addition operation is initialized to Sij = 0 by selecting SLS = 1.
3) Subtraction: The sum \Sigma is subtracted from the input CCij, and the result SCSij is stored by selecting WRS = 1.
4) Square root: The subtraction result SCSij is square rooted.
5) Division: The subtraction result SCSij is divided by the input Qii.
6) Select operation: The square root (SLOP = 1) and division (SLOP = 0) results are multiplexed, providing the output AQij.

In more detail, the inputs to the Cholesky decomposition and transformation matrix register file blocks are given in Table IV.
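Because (22) and (38) combine the same multiply, accumulate, subtract, square-root, and divide operations, one routine can serve both computations, which is what the shared circuit block exploits. The sketch below is our software illustration of that idea (function names and test values are ours; it does not model the actual datapath, register file, or control signals):

```python
import numpy as np

def pe(value, products, divisor=None):
    """Shared processing element: multiply-accumulate, subtract, then either
    square root (divisor is None) or division, mirroring operations 1)-6)."""
    acc = 0.0
    for a, b in products:           # multiplication + addition
        acc += a * b
    diff = value - acc              # subtraction
    return np.sqrt(diff) if divisor is None else diff / divisor

def cholesky_upper(C):
    """Eq. (22): upper triangular Qx with C = Qx.T @ Qx."""
    n = C.shape[0]
    Q = np.zeros_like(C, dtype=np.float64)
    for i in range(n):
        Q[i, i] = pe(C[i, i], [(Q[k, i], Q[k, i]) for k in range(i)])
        for j in range(i + 1, n):
            Q[i, j] = pe(C[j, i], [(Q[k, i], Q[k, j]) for k in range(i)], divisor=Q[i, i])
    return Q

def transform_matrix(Qx, Qy):
    """Eq. (38): A with Qx @ A = Qy, computed bottom-up by back substitution."""
    n = Qx.shape[0]
    A = np.zeros_like(Qx, dtype=np.float64)
    for i in range(n - 1, -1, -1):                    # the order of computations matters
        for j in range(i, n):
            A[i, j] = pe(Qy[i, j], [(A[k, j], Qx[i, k]) for k in range(i + 1, n)],
                         divisor=Qx[i, i])
    return A

# Sanity check: A from the recurrence equals Qx^{-1} Qy from Eq. (11).
rng = np.random.default_rng(2)
M = rng.normal(size=(4, 4))
Cx = M @ M.T + 4 * np.eye(4)                           # a positive definite test matrix
Qx = cholesky_upper(Cx)
Qy = np.triu(rng.normal(size=(4, 4))) + 4 * np.eye(4)  # an arbitrary upper triangular Qy
assert np.allclose(Qx.T @ Qx, Cx)
assert np.allclose(transform_matrix(Qx, Qy), np.linalg.inv(Qx) @ Qy)
```

Here transform_matrix effectively performs back substitution on Q_x A = Q_y, which is why it proceeds from the last row upward, matching the note in (38) about maintaining the order of computations.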

The linear transformation of the source population vectors X that results in vector Y, based on the transformation matrix A in (38), is calculated as follows:

\begin{bmatrix} Y_{1i} \\ Y_{2i} \\ Y_{3i} \\ Y_{4i} \end{bmatrix} = \begin{bmatrix}
A_{11} X_{1i} \\
A_{12} X_{1i} + A_{22} X_{2i} \\
A_{13} X_{1i} + A_{23} X_{2i} + A_{33} X_{3i} \\
A_{14} X_{1i} + A_{24} X_{2i} + A_{34} X_{3i} + A_{44} X_{4i}
\end{bmatrix}.    (39)


Fig. 5. Subsystem block-based architecture of the Cholesky decomposition and transformation matrix.

TABLE III
CHOLESKY DECOMPOSITION AND TRANSFORMATION MATRIX I/O AND CONTROL SIGNALS

The transformed vectors Y must be scaled to the range [0, 255] in order to produce an RGB representation

Y^{T}_{ki} = 255 \cdot \frac{Y_{ki} - \min(Y_{ki})}{\max(Y_{ki}) - \min(Y_{ki})}    (40)

TABLE IV
CHOLESKY DECOMPOSITION AND TRANSFORMATION MATRIX I/O SELECTION

where \min(Y_{ki}) and \max(Y_{ki}) are the minimum and maximum values of the transformed vector Y_k, respectively. The procedure is explained in detail using the pseudocode in Fig. 6.
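A compact software rendering of the per-pixel transformation in (39) together with the final scaling of (40) could look as follows (illustrative NumPy code of ours; note that the min and max in (40) are only known once all pixels have been transformed, so the scaling is applied as a second pass here):

```python
import numpy as np

def transform_and_scale(A, X):
    """Apply Eq. (39) per pixel and then the [0, 255] scaling of Eq. (40).

    A: 4x4 upper triangular transformation matrix.
    X: 4 x (M*N) array of source pixel vectors.
    Returns the scaled 4 x (M*N) array; only the first three rows form the RGB image.
    """
    K, n = X.shape
    Y = np.zeros_like(X, dtype=np.float64)
    for i in range(n):                       # pixel-serial, as suggested by Eq. (39)
        x = X[:, i]
        # Eq. (39): Y_k = sum_{m<=k} A_{mk} * X_m  (A is upper triangular).
        for k in range(K):
            Y[k, i] = np.dot(A[: k + 1, k], x[: k + 1])
    # Eq. (40): per-component min/max scaling to [0, 255].
    y_min = Y.min(axis=1, keepdims=True)
    y_max = Y.max(axis=1, keepdims=True)
    return 255.0 * (Y - y_min) / (y_max - y_min)
```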

VI. HARDWARE SYSTEM IMPLEMENTATION

The proposed system has been implemented using VHDL on an Altera Cyclone II EP2C35F672 FPGA. The datapath bit length for this particular implementation has been chosen to be 16 bits.


Fig. 6. Pseudocode of the Cholesky decomposition and transformation matrix procedure.

The choice of the datapath bit length provides satisfactory accuracy at the output of the individual system blocks, relative to a double-precision model. Objective results measuring this accuracy were calculated using the signal-to-noise ratio (SNR) of the fused image at the output. The SNR is calculated as the ratio of the rms value of a reference, “noise-free” image produced in MATLAB using a double-precision model, to the rms value of the noise, calculated as the difference between the aforementioned reference image and the image produced by a real fixed-point implementation of a certain bit length

\mathrm{SNR} = 10 \log_{10} \frac{\mathrm{RMS}\{\mathrm{ref\_im}\}}{\mathrm{RMS}\{\mathrm{ref\_im} - \mathrm{real\_impl\_im(bit\_length)}\}}.    (41)

The results are given in Table V. For the proposed implementation, a bit length of 16 has been chosen, corresponding to an SNR value of 98.15 dB.
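For completeness, the SNR figure of merit in (41) is straightforward to evaluate once the double-precision reference image and the fixed-point result are available; a minimal helper of ours, assuming NumPy and implementing (41) exactly as written:

```python
import numpy as np

def rms(x):
    x = np.asarray(x, dtype=np.float64)
    return np.sqrt(np.mean(x * x))

def output_snr_db(ref_im, impl_im):
    """SNR of a fixed-point result against the double-precision reference, Eq. (41)."""
    ref = np.asarray(ref_im, dtype=np.float64)
    err = ref - np.asarray(impl_im, dtype=np.float64)
    return 10.0 * np.log10(rms(ref) / rms(err))
```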

The occupied resources are shown in Table VI. The maximum clock frequency of the system is dictated by the most demanding subsystem, which is the Cholesky and transformation unit. This unit also defines the critical path of the system. For this particular FPGA implementation, the maximum clock frequency is limited to 81.47 MHz. The achieved frame processing time is 7.55 ms, a time period that allows real-time processing of video sequences at up to 132 frames/s. Simulations have shown that throughput can be significantly improved when faster FPGAs (e.g., the Altera Stratix II family) are considered.

TABLE V
SNR VALUES IN DECIBELS VERSUS BIT LENGTH OF THE PROPOSED IMPLEMENTATION

TABLE VI
OCCUPIED RESOURCES OF THE PROPOSED IMPLEMENTATION ON AN ALTERA CYCLONE II EP2C35F672 FPGA

Apparently, the system size depends on the number of input channels, the size of the input images, and the chosen bit length. The dependence on image size, number of channels, and bit length is summarized in Tables VII–IX, respectively. As is evident from these tables, an increase in image size imposes only a small increase in the amount of occupied resources. On the other hand, an increase in the number of channels causes a significant increase in the amount of occupied resources, and the same holds for an increase in bit length. Therefore, during design, a careful choice should be made in order to achieve a balanced tradeoff between accuracy and system size.

The overall latency of the system is computed as

T = 2NM + 1225 (42)

where N and M correspond to the size of the image. The extra term of 1225 in (42) corresponds to the sum of the delays of the individual blocks of the system. In the current implementation (N = 480 and M = 640) and for the selected clock rate, this latency equals 7.56 ms. The throughput of each individual functional unit in the proposed implementation is indicated in Table X.
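As a quick check of (42) against the reported figures: for N = 480 and M = 640, T = 2 \cdot 480 \cdot 640 + 1225 = 615\,625 clock cycles, which at the maximum clock frequency of 81.47 MHz corresponds to 615\,625 / (81.47 \times 10^6) \approx 7.56 ms, i.e., roughly 132 frames/s, in agreement with the values quoted in this section.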


TABLE VII
OCCUPIED RESOURCES IN RELATION TO IMAGE SIZE ON AN ALTERA CYCLONE II EP2C35F672 FPGA (N = 480, M = 640; BOLD IS THE CURRENT IMPLEMENTATION)

TABLE VIII
OCCUPIED RESOURCES IN RELATION TO INPUT CHANNELS ON AN ALTERA CYCLONE II EP2C35F672 FPGA (BOLD IS THE CURRENT IMPLEMENTATION)

The throughput of blocks 1 and 5 is expressed in frames per second, since block 1 has image frame pixels as input and block 5 outputs image frame pixels. However, blocks 2, 3, and 4 process internal data of a different nature; thus, it is meaningful to express the throughput of these units in blocks per clock. The term block here stands for a block of data corresponding to a matrix of some size. As is evident, there are large throughput variations between the different units, a fact that could be exploited to perform resource sharing, thereby increasing the efficiency of the system.

VII. EXPERIMENTAL VERIFICATION

The experimental verification of the proposed architecture has been carried out using a test bench based on a MATLAB/Simulink model and an Altera DSP Builder blockset, interfacing with a Terasic DE2 development board that encompasses an Altera Cyclone II EP2C35F672 FPGA.

TABLE IX
OCCUPIED RESOURCES IN RELATION TO BIT LENGTH ON AN ALTERA CYCLONE II EP2C35F672 FPGA (BOLD IS THE CURRENT IMPLEMENTATION)

TABLE X
THROUGHPUT OF THE INDIVIDUAL UNITS OF THE PROPOSED IMPLEMENTATION

The Simulink model reads the image data from the hard disk and transmits them through a Universal Serial Bus Joint Test Action Group (JTAG) interface to the development board, on which the fusion algorithm is implemented. The outcome of the processing is returned to the Simulink environment, which stores it on the hard disk. The resulting data are then available to be presented as images.

Two different data sets are employed for verifying the proposed fusion implementation. The first data set is composed of four multispectral bands; it is available from Space Imaging and was acquired by the IKONOS-2 sensor. The radiometric resolution of the source in each band is 11 bits/pixel, and the size is 2001 × 2001 pixels. In the implementation described in this paper, we employ the IKONOS data with 8 bits/pixel and VGA resolution format, which is the case for standard night vision applications.


Fig. 7. IKONOS data set. (a) Natural color composite of the first three bands, (b) the corresponding NIR channel, and (c) the fusion result.

The ground resolution provided by IKONOS-2 for the multispectral imagery is 4 m, and the area covered in this multispectral image is an urban area with a road network, a forest, a stadium, a park, etc. The natural color composite image, along with the near-infrared (NIR) channel and the final fused image, can be found in Fig. 7.

The second data set also comprises electro-optical and thermal infrared data, but this time the data come from the field of night vision. It consists of a color image of a scene depicting a lakeside and a bench, along with a midwave infrared image (3–5 μm) in which a person is crouching next to the bench. These data were provided by the TNO Human Factors Institute, and a more detailed description of the data acquisition procedure can be found in [29]. In Fig. 8, the color scene, the infrared image, and the fused color image are shown.

VIII. CONCLUSION

In this paper, the hardware implementation of a fusion method that is suitable for remote sensing data has been presented. The VTVA fusion method is configurable, since it allows the user to control the correlation properties of the final fused color image. The hardware realization, which is based on FPGA technology, provides a fast, compact, and low-power solution for image fusion. The dedicated sections provide a detailed description of the methodology for turning the VTVA fusion method into a hardware-realizable process. Future work in this field includes the extension to other types of image modalities and the objective evaluation of image fusion methods in real time.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their helpful comments and credible review.

Fig. 8. Night vision data set. (a) Natural color scene, (b) the corresponding infrared image, and (c) the fusion result.

REFERENCES

[1] A. Goshtaby and S. Nikolov, "Image fusion: Advances in the state of the art," Inf. Fusion, vol. 8, no. 2, pp. 114–118, Apr. 2007.
[2] T. Stathaki, Image Fusion: Algorithms and Applications. New York: Academic, 2008.
[3] R. S. Blum and Z. Liu, Eds., Multi-Sensor Image Fusion and Its Applications (Special Series on Signal Processing and Communications). New York: Taylor & Francis, 2006.
[4] C. Pohl and J. L. van Genderen, "Multisensor image fusion in remote sensing: Concepts, methods and applications," Int. J. Remote Sens., vol. 19, no. 5, pp. 823–854, 1998.
[5] V. Tsagaris, V. Anastassopoulos, and G. Lampropoulos, "Fusion of hyperspectral data using segmented PCT for enhanced color representation," IEEE Trans. Geosci. Remote Sens., vol. 43, no. 10, pp. 2365–2375, Oct. 2005.
[6] V. Tsagaris and V. Anastassopoulos, "Multispectral image fusion for improved RGB representation based on perceptual attributes," Int. J. Remote Sens., vol. 26, no. 15, pp. 3241–3254, Aug. 2005.
[7] N. Jacobson, M. Gupta, and J. Cole, "Linear fusion of image sets for display," IEEE Trans. Geosci. Remote Sens., vol. 45, no. 10, pp. 3277–3288, Oct. 2007.
[8] K. Nagarajan, C. Krekeler, K. C. Slatton, and W. D. Graham, "A scalable approach to fusing spatiotemporal data to estimate streamflow via a Bayesian network," IEEE Trans. Geosci. Remote Sens., vol. 48, no. 10, pp. 3720–3732, Oct. 2010.
[9] G. Piella, "A general framework for multiresolution image fusion: From pixels to regions," Inf. Fusion, vol. 4, no. 4, pp. 259–280, Dec. 2003.
[10] C. Thomas, T. Ranchin, L. Wald, and J. Chanussot, "Synthesis of multispectral images to high spatial resolution: A critical review of fusion methods based on remote sensing physics," IEEE Trans. Geosci. Remote Sens., vol. 46, no. 5, pp. 1301–1312, May 2008.
[11] J. Tyo, A. Konsolakis, D. Diersen, and R. C. Olsen, "Principal-components-based display strategy for spectral imagery," IEEE Trans. Geosci. Remote Sens., vol. 41, no. 3, pp. 708–718, Mar. 2003.
[12] W. Zhang and J. Kang, "QuickBird panchromatic and multi-spectral image fusion using wavelet packet transform," in Lecture Notes in Control and Information Sciences, vol. 344. Berlin, Germany: Springer-Verlag, 2006, pp. 976–981.
[13] V. Shah, N. Younan, and R. King, "An efficient pan-sharpening method via a combined adaptive PCA approach and contourlets," IEEE Trans. Geosci. Remote Sens., vol. 46, no. 5, pp. 1323–1335, May 2008.
[14] K. Kotwal and S. Chaudhuri, "Visualization of hyperspectral images using bilateral filtering," IEEE Trans. Geosci. Remote Sens., vol. 48, no. 5, pp. 2308–2316, May 2010.
[15] Q. Du, N. Raksuntorn, S. Cai, and R. J. Moorhead, "Color display for hyperspectral imagery," IEEE Trans. Geosci. Remote Sens., vol. 46, no. 6, pp. 1858–1866, Jun. 2008.
[16] S. Cai, Q. Du, and R. J. Moorhead, "Feature-driven multilayer visualization for remotely sensed hyperspectral imagery," IEEE Trans. Geosci. Remote Sens., vol. 48, no. 9, pp. 3471–3481, Sep. 2010.
[17] L. Alparone, L. Wald, J. Chanussot, C. Thomas, P. Gamba, and L. Bruce, "Comparison of pansharpening algorithms: Outcome of the 2006 GRS-S data-fusion contest," IEEE Trans. Geosci. Remote Sens., vol. 45, no. 10, pp. 3012–3021, Oct. 2007.
[18] Z. Wang, D. Ziou, C. Armenakis, D. Li, and Q. Li, "A comparative analysis of image fusion methods," IEEE Trans. Geosci. Remote Sens., vol. 43, no. 6, pp. 1391–1402, Jun. 2005.
[19] M. D. Jasiunas, D. A. Kearney, J. Hopf, and G. B. Wigley, "Image fusion for uninhabited airborne vehicles," in Proc. Int. Conf. FPT, Dec. 16–18, 2002, pp. 348–351.
[20] O. Sims and J. Irvine, "An FPGA implementation of pattern-selective pyramidal image fusion," in Proc. Int. Conf. FPL, Aug. 28–30, 2006, pp. 1–4.
[21] Q. Yunsheng, Z. Junju, T. Shi, C. Qian, Z. Zixiang, and C. Benkang, "The real-time processing system of infrared and LLL image fusion," in Proc. Int. Symp. Photoelectron. Detection Imag.: Image Process., Mar. 2008, pp. 66231Y-1–66231Y-9.
[22] Y. Song, K. Gao, G. Ni, and R. Lu, "Implementation of real-time Laplacian pyramid image fusion processing based on FPGA," in Proc. SPIE, 2007, vol. 6833, pp. 683316–683318.
[23] T. Li, N. Hau, Z. Ming, A. Livingston, and V. Asari, "A multisensor image fusion and enhancement system for assisting drivers in poor lighting conditions," in Proc. 34th Appl. Imag. Pattern Recognit. Workshop, 2005, pp. 106–113.
[24] L. Wolff, D. Socolinsky, and C. Eveland, "Versatile low power multi spectral video fusion hardware," in Proc. SPIE, vol. 6206, Infrared Technology and Applications XXXII, 2006, p. 620624.
[25] L. Wolff, D. Socolinsky, and C. Eveland, "Advances in low power visible/thermal IR video image fusion hardware," in Proc. SPIE, vol. 5782, Thermosense, 2005, pp. 54–58.
[26] M. I. Smith, A. N. Ball, and D. Hooper, "Real time image fusion: A vision aid for helicopter pilotage," in Proc. SPIE, vol. 4713, Real-Time Imaging VI, 2002, pp. 83–94.
[27] M. Petrou and C. Petrou, Image Processing: The Fundamentals. Hoboken, NJ: Wiley, 2010.
[28] D. Forsyth and J. Ponce, Computer Vision: A Modern Approach. Englewood Cliffs, NJ: Prentice-Hall, 2002.
[29] A. Toet, "Natural color mapping for multiband nightvision imagery," Inf. Fusion, vol. 4, no. 3, pp. 155–166, Sep. 2003.

Dimitrios Besiris was born in Agrinio, Greece, in 1978. He received the B.Sc. degree in physics and the M.Sc. degree in electronics from the Electronics Laboratory, University of Patras, Rio, Greece, in 2002 and 2005, respectively, where he is currently working toward the Ph.D. degree in the field of image and video processing.

He has been an Image Processing Engineer and a Senior Verilog Hardware Description Language Developer in industry during the past seven years. He is also currently with IRIDA Labs, Rio. His main research interests include image and video processing (browsing, retrieval, summarization, and video object tracking) and graph theory.

Vassilis Tsagaris (M’00) was born in Athens, Greece, in 1974. He received the B.Sc. degree in physics in 1997, the M.Sc. degree in electronics and computer science in 2000, and the Ph.D. degree in data fusion and remote sensing from the Electronics Laboratory (ELLAB), University of Patras (UoP), Rio, Greece.

He has been a Researcher, a Postdoctoral Researcher, or a Project Manager for about ten European and national R&D projects for the academic and company sectors. Part of his former working experience was as a Research Fellow with the ELLAB, UoP. He is one of the three co-founders of IRIDA Labs, acting as Chief Executive Officer (CEO) and Business Development Manager. The fields of activity covered in projects like ADHOCSYS (FP6), MEO (ESA), SESAMO (EDA), THETIS (national project), and SYNTHESIS (national project) are image (data) and decision fusion, pattern recognition, remote sensing, and information technology. As a result, he has published more than 25 journal and conference papers. His main research interests include pattern recognition, information processing and fusion, computer vision, and applications in embedded systems.

Nikolaos Fragoulis (M’99) received the B.Sc. degree in physics, the M.Sc. degree in electronics, and the Ph.D. degree in microelectronics from the University of Patras (UoP), Rio, Greece, in 1995, 1998, and 2005, respectively.

During the past ten years, he has been a Software Engineer and an Embedded Systems Engineer in the private sector. He was also a Research Fellow with the Electronics Laboratory, UoP, where he worked on the development of image and decision fusion techniques, as well as remote sensing technologies using synthetic aperture radar, through participation in several related projects (e.g., THETIS: Ship Vessel Traffic Monitoring Using Satellite SAR Data, funded by the Greek Government/General Secretariat for Research and Technology). He is one of the three co-founders of IRIDA Labs, acting as Chief Technical Officer (CTO) and Product Development Manager. He is the Technical Manager for the project SYNTHESIS, involved in field-programmable-gate-array-based hardware implementation of computer vision algorithms. He has published more than 30 journal and conference papers and three book chapters, and he has participated in several national and European funded research and technology development projects. His main fields of expertise include microelectronics, embedded systems design, and computer vision.

Christos Theoharatos (M’00) was born in Athens, Greece, in 1973. He received the B.Sc. degree in physics, the M.Sc. degree in electronics and computer science, and the Ph.D. degree in image processing and multimedia retrieval from the Electronics Laboratory (ELLAB), University of Patras (UoP), Rio, Greece, in 1998, 2001, and 2006, respectively.

After his Ph.D. degree, he was a Postdoctoral Researcher with the Signal, Image, and Multimedia Group, ELLAB, Department of Physics, UoP. During this activity, he was a Research Engineer in a number of European and national projects in the fields of signal and image processing, multimedia services, and information technology. He is one of the three co-founders of IRIDA Labs, acting as R&D Project Manager. He is also currently the Project Manager for the project “SYNTHESIS—Development of an Information Fusion Chipset,” funded by the General Secretariat for Research and Technology, as well as the Technical Manager on behalf of IRIDA Labs for the project “NICE—Nonlinear Innovative Control Designs and Evaluations,” funded by the European Defence Agency. He has published more than 30 journal and conference papers and four book chapters in the fields of his expertise. His main research interests include pattern recognition, image processing and computer vision, data mining, and graph theory.