

Method of Image Fusion and Enhancement Using Mask Pyramid

David C. Zhang, Sek Chai and Gooitzen Van der Wal

Vision Systems, SRI International Sarnoff

Princeton, NJ 08540, U.S.A. [email protected], [email protected], [email protected]

Abstract - Image fusion is an important visualization technique for integrating coherent spatial and temporal information into a compact form. Laplacian fusion is a process that combines regions of images from different sources into a single fused image based on a salience selection rule for each region. In this paper, we propose an algorithmic approach using a mask pyramid to better localize the selection process. A mask pyramid operates at different scales of the image to improve the fused image quality beyond a global selection rule. Several examples of this mask pyramid method are provided to demonstrate its performance in a variety of applications. A new embedded system architecture that builds upon the Acadia® II Vision Processor is proposed.

Keywords: image fusion, exposure fusion, high dynamic range imaging, focus fusion, image blending, noise reduction, image enhancement.

1 Introduction

Since Burt and Adelson published their first paper on Laplacian pyramid-based image processing in 1983 [1-4], the multispectral and multiscale representation of images has been used extensively in image compression, stabilization and fusion. The pyramid is essentially a data structure consisting of a series of band-pass filtered copies of an image, each representing pattern information on a different scale. Image fusion, operating on the Laplacian pyramid of each source image, extracts local salience information from each source image at multiple pyramid levels ranging from coarse to fine, and then reverses the pyramidal operation to form a fused image. The extraction of salient features is based on a specific selection rule. One common selection rule compares the local pixel energy of each pyramid level and selects the pyramid level with the maximum absolute value for each location. Other rules include globally weighted averaging of the Laplacian images of a pyramid level [5]. The Laplacian fusion scheme has mainly been used for image integration of mixed-modality sensors [6]. It is also used in image stitching and multi-focus fusion for mono-modal sensors [7].

Sarnoff’s Acadia® I vision processor was the first embedded solution for real-time fusion, capable of aligning and fusing two NTSC/PAL images in real time [8]. The second-generation Acadia® II vision processor can align and fuse three 1280x1024 images in real time, and is also capable of globally and locally enhancing the source images. This local enhancement is done per pyramid level on the Laplacian images to normalize local contrast. Additionally, Waterfall Solutions Ltd. has put forward an FPGA-based real-time image fusion hardware device. The hardware supports simple weighted averaging as well as high-performance multi-resolution Laplacian fusion [9].

The use of image fusion as a primary vision processing function has gained widespread acceptance over the past several decades. Presently, both hardware and software solutions exist to fuse individual pixels in order to provide an integrated view of a scene captured via multiple sensors or sensor conditions (e.g. focus, exposure). The fusion function can provide increased informational content to the viewer when a priori information about the source images is made available. Thus the purpose of this paper is not to discuss higher-level fusion schemes, but rather to present methods of extending current fusion capabilities and applications for real-time embedded systems. Present fusion schemes typically apply only a global selection rule to all image pixels. In many applications, it would be preferable to use a localized rule. Therefore, we propose the generation of a mask pyramid in the hardware architecture to localize pixel selection. We present examples of this approach throughout the paper, along with a generic architectural implementation of the fusion system designed to support it. If any intelligent analysis of the source imagery needs to be used in fusion, it can be programmed into this mask pyramid. We also demonstrate that with this new capability, the fusion application extends easily to image enhancement, high dynamic range compression and image blending, among other applications.

The paper is organized as follows: the basic and advanced Laplacian fusion models are discussed in Section II. Section III presents examples of how the mask pyramid can be used to extend the fusion function to perform image enhancement, depth-of-field extension, high dynamic range compression and image blending. Section IV addresses the system architecture and Section V concludes the discussion.

14th International Conference on Information Fusion, Chicago, Illinois, USA, July 5-8, 2011

978-0-9824438-3-5 ©2011 ISIF


2 Laplacian Fusion Model

2.1 Laplacian pyramid transform

The FSD (filter-subtract-decimate) Laplacian pyramid transform used here was described in [5]. The notation used in this paper can be summarized briefly. Let $I(i,j)$ be the source image. Let $G_k(i,j)$ be the kth level of a Gaussian pyramid based on $I(i,j)$, with k = 0 to K. Let $L_k(i,j)$ be the corresponding Laplacian pyramid, for k = 0 to K-1. The Gaussian pyramid is generated through a recursive filter (F) and subsample process. Each level of the Laplacian pyramid is obtained by applying a band-pass filter to the corresponding Gaussian level. Therefore, $G_0$ can be decomposed into a multispectral and multiscale representation, or a Laplacian pyramid representation, where each subband of the pyramid contains a certain frequency band of the image. The original image can then be recovered from its Laplacian transform by reversing these steps. This begins with the lowest resolution level of the Gaussian, $G_K$, then uses the Laplacian pyramid levels to recursively recover the higher resolution Gaussian levels and the original image.
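As a concrete illustration, the following is a minimal NumPy sketch of the filter-subtract-decimate construction and its inverse. The 5-tap binomial kernel is an assumption for illustration; the paper does not specify the filter coefficients. Note that expand-and-add reconstruction is exact only for the reduce-expand Laplacian; for the FSD pyramid it is an approximation.

```python
import numpy as np
from scipy.ndimage import convolve

# 5-tap binomial kernel, a common pyramid filter (assumed; the paper does not
# specify its coefficients).
w = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
F = np.outer(w, w)

def fsd_pyramid(image, levels):
    """Filter-subtract-decimate: L_k = G_k - F*G_k, G_{k+1} = (F*G_k) decimated."""
    G = image.astype(np.float64)
    L = []
    for _ in range(levels):
        blurred = convolve(G, F, mode='nearest')  # filter
        L.append(G - blurred)                     # subtract -> band-pass L_k
        G = blurred[::2, ::2]                     # decimate by 2
    return L, G                                   # L_0..L_{K-1} and G_K

def reconstruct(L, G_top):
    """Approximate inverse for the FSD pyramid: expand-and-add, coarse to fine."""
    G = G_top
    for Lk in reversed(L):
        up = np.zeros_like(Lk)
        up[::2, ::2] = G                          # zero-insert upsampling
        G = 4.0 * convolve(up, F, mode='nearest') + Lk  # interpolate, add L_k
    return G
```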

2.2 The global selection rule in fusion

The fused image is assembled from the selected component patterns of the source images. In the Laplacian pyramid implementation, the Laplacian images serve as the component patterns. The fused output can be represented as follows:

$$G_0^f = L_0^f \oplus L_1^f \oplus \cdots \oplus L_k^f \oplus \cdots \oplus L_{K-1}^f \oplus G_K^f, \quad (1)$$

where $L_k^f(i,j)$ is the result of the selection among all $L_k^n(i,j)$ for $0 \le n < N$, and N is the number of source images. The selection is based on a measure of salience at time t:

$$\chi_k^n(i,j,t) = \begin{cases} \eta(i,j,t) & \text{if } S_k^n(i,j,t) - S_k^p(i,j,t) \ge T, \;\; \forall p < N, \; p \ne n \\ 0 & \text{otherwise,} \end{cases} \quad (2)$$

where n < N, and T is a constant threshold with default value 0. The selection coefficient $\eta(i,j,t)$ is by default a binary value 1. The global selection rule indicates that if the salience $S_k^n(i,j,t)$ at position (i,j) of pyramid level k is the maximum among all sources at time t, the pixel value of source n is given the full weight. If the selection is weighted by the mask image, then the selection coefficient can be the mask value at (i,j). The fused Laplacian image is simply the summation of all the coefficient-weighted Laplacians:

$$L_k^f(i,j,t) = \sum_n \chi_k^n(i,j,t) \cdot L_k^n(i,j,t). \quad (3)$$

The salience $S_k^n$ is defined as the local average of $|L_k^n(t)|$ or $L_k^n(t)^2$ for the nth source image at level k, and is referred to as the local energy of the pixel (i,j) at time t.
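A minimal sketch of the global rule at one pyramid level follows, assuming T = 0, salience as the box-window average of |L|, and an arbitrary window size of 5; none of these specifics are fixed by the paper.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fuse_level_global(laplacians, window=5):
    """Global rule (Eqs. 2-3) with T = 0: per pixel, keep the Laplacian whose
    local energy (average of |L| over a window) is the maximum."""
    Lstack = np.stack(laplacians)                     # shape (N, H, W)
    S = np.stack([uniform_filter(np.abs(L), size=window) for L in laplacians])
    winner = np.argmax(S, axis=0)                     # index n of max salience
    chi = (np.arange(len(laplacians))[:, None, None] == winner)
    return np.sum(chi * Lstack, axis=0)               # Eq. (3)
```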

2.3 The local selection rule in fusion

Pattern-selective fusion with the global selection rule can select a pattern that plays a role in representing important information in a scene. However, it cannot, for example, select the pattern that contains the best signal-to-noise ratio. In order to do that, we need to modify the selection rule:

$$S_k^n(M_k^n) - S_k^p(M_k^p) \ge \phi_k(n,p). \quad (4)$$

Notice that salience is now a function of the mask M, and the threshold $\phi$ is not a constant. Here $M_k^n$ contains local information pertaining to the source image n at level k, and $\phi_k$ is the local threshold that is associated with the source images at level k. $\langle M_k, \phi_k \rangle$ is a local weight mask that provides guidance in selecting a salience for fusion. The local mask can be an in-focus mask, a region of interest, a motion detection mask, or a selection from the previous frame. If $\phi_k$ is separable, then we can define $\phi_k(n,p) = \phi_k^n - \phi_k^p$. Thus Eq. (4) can be reformatted as:

$$S_k^n(M_k^n) - \phi_k^n \ge S_k^p(M_k^p) - \phi_k^p. \quad (5)$$

To be consistent with the conventional definition, we merge the local mask into the salience computation and still call it salience. Therefore the selection criteria remain the same as Eq. (2), and the Laplacian fusion is formed in the same way by Eq. (3). However, the meaning of salience is no longer limited to the local energy; it extends to the role of best representing the information of all the source images based on local criteria. Since the local mask is applied to each pyramid level, there is a mask pyramid for each source image. The programming of the mask pyramid leads to different feature selections from the source images and thus provides extensions to other applications.

3 Examples

Because we use the mask pyramid to extend the selection rule, the proposed approach offers a generic methodology for applications in image enhancement, high dynamic range compression, depth-of-field extension, and image blending. The mask pyramid can also be encoded for intelligent analysis of source imagery. In this section, several examples of this mask pyramid method are provided to demonstrate its performance. A new embedded system architecture based upon the Acadia® II Vision Processor is proposed in the next section.

3.1 Hysteresis

Flicker is a flashing effect that often occurs in image fusion. This distracting, displeasing artifact is often seen in the fusion of LWIR and SWIR or visible images, when both image sources contain reverse intensities at the same pixel locations. The reverse intensity regions are shown in the ROI windows in Figure 1. The fusion selection rule selects the highest energy among same-level Laplacian images, but ignores the signs of the values. When two Laplacian images are equal in magnitude but opposite in sign, small differences due to noise will flip the selection temporally, and thus introduce an artificial flicker in real time.

To improve temporal coherence and reduce flicker artifacts, a hysteresis mask is defined, providing a local feedback mechanism in fusion that biases the selection at time t based on time t-1. Let $S_k(i,j)$ be the salience associated with the source image pyramid $L_k(i,j)$ at level k, and $M_k(i,j)$ be the hysteresis mask. The salience at pixel (i,j) can be expressed in terms of the mask at t-1:

$$S_k^n(t) = \left(1 + h\,M_k^n(t-1)\right) \cdot \hat{L}_k^n(t), \quad (6)$$

where $\hat{L}_k^n(t)$ is the local average of $|L_k^n(t)|$ or $L_k^n(t)^2$ for the nth source image, and h is the hysteresis factor in the range of 0 to 1. The hysteresis mask at (i,j) is 1 if the salience of image n is the largest; otherwise it is set to 0. The fused Laplacian image is based on Eq. (3).

In order to reduce aliasing artifacts, the hysteresis masks $M_k^n(t)$ can be blurred before they are multiplied with the Laplacian images.

Figure 1. (a) Intensified TV image. (b) LWIR image. The rectangle shows the region that contains reverse intensities in the two source images.

Figure 2. Diagram of hysteresis in fusing two images. $M_k$ is the hysteresis mask. $\hat{L}_k$ is the local energy image of the Laplacian at level k. The feature selection is based on Eq. (2). The linear functional block is based on Eq. (6). The hysteresis mask at time t will be used for the time t+1 input.

Figure 3 plots the effect of hysteresis applied to the video sequence. The vertical axis is the local average of pixels around the bottom-left corner of the selected ROI window in Figure 1 in the fused image (not shown). The horizontal axis represents the frame number in the sequence. The dashed line in the figure represents the pixel variations without hysteresis, and the circles represent the result with hysteresis added. The flashing artifacts are greatly reduced with hysteresis in fusion (h=0.5).

Figure 3. The dashed line represents the local average of pixels without hysteresis in fusion. The circles are the results with hysteresis added. Clearly the magnitude of the variation without hysteresis is much greater than that with hysteresis added.
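A minimal per-level sketch of the hysteresis rule of Eq. (6) is given below; h=0.5 follows the experiment in the text, while the window sizes are assumed.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fuse_level_hysteresis(laplacians, prev_masks, h=0.5, window=5):
    """Eq. (6): the source that won at t-1 gets its salience boosted by (1+h),
    so small noise-scale differences no longer flip the selection per frame."""
    Lhat = np.stack([uniform_filter(np.abs(L), size=window) for L in laplacians])
    S = (1.0 + h * np.stack(prev_masks)) * Lhat       # Eq. (6)
    winner = np.argmax(S, axis=0)
    masks = [(winner == n).astype(np.float64) for n in range(len(laplacians))]
    blurred = [uniform_filter(m, size=3) for m in masks]  # anti-alias blur
    fused = sum(b * L for b, L in zip(blurred, laplacians))
    return fused, masks                               # masks feed frame t+1
```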

3.2 Fusion with improved SNR

In many fusion applications using low-light cameras (e.g. LWIR, SWIR, or image intensified), noise in the video severely degrades the fused image. Common noise reduction strategies include median filtering, coring the noise in the source images, and temporal filtering of aligned images. By using the proposed architecture, we are able to design a local selection rule that picks the Laplacian coefficient representing the highest signal-to-noise ratio present.

3.2.1 Additive Gaussian Noise

Let $M_k(i,j)$ be the noise mask. The salience at pixel (i,j) can be expressed in terms of the mask:

$$S_k^n(t) = \hat{L}_k^n(t) - M_k^n(t), \quad \text{with} \quad M_k^n(t) = N_k^n, \quad (7)$$

where $\hat{L}_k^n(t)$ is the local average of $|L_k^n(t)|$, $M_k^n(t)$ is the noise mask, and $N_k^n$ is the noise at level k. The window size for the local average is selected as double the filter size used in pyramid generation in order to reduce ghosting artifacts. The noise at each level is also determined by the filter size. For example, if the filter is a boxcar of size 3, then the noise at level 1 is reduced by a factor of 3, at level 2 by a factor of 9, and so on. The feature selection rule is defined in Eq. (2) and the fused Laplacian image is obtained by multiplying the selection mask with the Laplacian images. In order to reduce aliasing artifacts, the selection masks can be blurred before they are multiplied with the Laplacian images.
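A sketch of Eq. (7) at one level follows; the per-source noise estimates and the window size (here a placeholder of 7) are inputs the designer must supply.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fuse_level_noise(laplacians, noise_levels, window=7):
    """Eq. (7): salience is the local energy minus the expected noise N_k^n at
    this level; noise_levels holds one scalar per source. Per the text the
    window should be twice the pyramid filter size (7 is a placeholder)."""
    S = np.stack([uniform_filter(np.abs(L), size=window) - N
                  for L, N in zip(laplacians, noise_levels)])
    winner = np.argmax(S, axis=0)
    masks = [uniform_filter((winner == n).astype(np.float64), size=3)
             for n in range(len(laplacians))]          # blurred to cut aliasing
    return sum(m * L for m, L in zip(masks, laplacians))
```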


Figure 4. (a) Visible image. (b) IR image with Gaussian noise. (c) Fusion without noise mask. (d) Fusion with noise mask. Note the increased contrast and speckle-free sky in image (d).

In Figure 4, a visible image and an LWIR image are used in fusion. The LWIR image has the additive noise. Figure 4(c) is the fusion without noise reduction. In Figure 4(d), after the noise reduction, the background is much cleaner.

3.2.2 Readout Noise

The readout noise is proportional to the square-root of the signal $I(i,j)$. Therefore, the signal-to-noise ratio is proportional to $\sqrt{I}$. If we want to consider the Gaussian noise as well, then the noise mask is encoded with both the scale factor $\alpha_k^n(i,j)$ and $N_k^n$. The salience at pixel (i,j) can be expressed in terms of the mask:

$$S_k^n(t) = \hat{M}_k^n(t) \cdot \hat{L}_k^n(t) - \tilde{M}_k^n(t), \quad (8)$$

$$\hat{M}_k^n(t) = 1 - \alpha_k^n = 1 - \frac{C}{\sqrt{I_k^n(t)}}, \quad \text{and} \quad \tilde{M}_k^n(t) = N_k^n, \quad (9)$$

where $M_k^n = \langle \hat{M}_k^n, \tilde{M}_k^n \rangle$, and C is a constant. $I_k^n(t)$ is the Gaussian pyramid generated from the source images. The same filter is used for the Laplacian pyramid. The algorithm is summarized in the block diagram in Fig. 5.
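A sketch of the salience of Eqs. (8)-(9) follows. The constant C is sensor-dependent and the value below is a placeholder; the epsilon guard against division by zero is an implementation detail not in the paper.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def readout_noise_salience(L_k, G_k, N_k, C=1.0, window=7, eps=1e-6):
    """Eqs. (8)-(9): a gain term M_hat = 1 - C/sqrt(I) computed from the
    Gaussian level G_k, and an offset term M_tilde = N_k for the additive
    part. C is a sensor-dependent constant (placeholder value here)."""
    Lhat = uniform_filter(np.abs(L_k), size=window)    # local energy
    M_hat = 1.0 - C / np.sqrt(np.maximum(G_k, eps))    # Eq. (9), gain part
    return M_hat * Lhat - N_k                          # Eq. (8)
```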

Figure 5. Diagram of readout noise reduction in two-image fusion. $M_k$ is the readout noise mask. $\hat{L}_k$ is the local energy image of the Laplacian at level k. The mask image contains both the gain and offset noise.

3.3 Depth of Field Extension

Depth of field (DoF) extension using standard image sensors is achieved through the fusion of images of the same scene taken with different focal points into a single image, where each scene within the image has maximum contrast and optimal focus. Regardless of wavelet or pyramid representation, common extended DoF approaches decompose an image into multiple subbands and use a selection rule at each subband to form a subband image at all scales. The subband sets of images are then recombined into a single image. The assumption that an in-focus feature has the maximum contrast is supported naturally by pattern-selective fusion, as the feature saliency computation guarantees the in-focus feature is selected over those out-of-focus. Therefore, by simply applying the maximum-selection rule among all the image pyramids, the extended DoF image is generated.

However, the above assumption fails when 1) the aperture size of the camera is large (insufficient optical DoF to begin with), 2) the focal length is comparable to the distance from the objects (e.g. extreme near-field case), and/or 3) the scene has a range much larger than the depth of field. For these cases, the feature salience at each subband does not guarantee that the maximum contrast is from the in-focus scene feature. Therefore, each source image may need a mask to help define its in-focus features.

Let $M_k(i,j)$ be the in-focus mask. The salience at pixel (i,j) can be expressed in terms of the mask:

$$S_k^n(t) = M_k^n(t) \cdot \hat{L}_k^n(t), \quad (10)$$

the feature selection rule is defined in Eq. (2), and the fused Laplacian image is obtained by Eq. (3).

There are several ways to compute the in-focus masks. Our implementation used the unsigned Laplacian images as the energy map, taking advantage of the pyramid representation of the source images, as summarized below (a code sketch follows at the end of this subsection):

1. For each source image $I^n$, compute $L_m^n$, where m is a pyramid level of image n.

2. Blur $L_m^n$ with a fixed Gaussian filter: $E_m^n = F \otimes L_m^n$.


3. Generate a binary mask $M_m^n$ for each source at level m. The mask $M_m^n(i,j,t)$ is 1 if $E_m^n(i,j,t)$ is the largest among the energy images at level m; otherwise, it is set to 0.

4. Create the mask pyramid $\{M_m^n\}$ by upscaling the mask $M_m^n$ from level m to level 0, and subsampling it from level m to the top pyramid level K.

Figure 6. (a) Four source images with different depth of field. The images were taken such that there was no parallax between them. (b) The fused image using in-focus masks. The color components are weight-averaged using the in-focus masks. (c) The top image shows the fusion without in-focus masks; the bottom image is the result after the masking operation.

Figure 7. Diagram of fusing two images taken with different lens focus settings. $M_k$ is the in-focus mask. $\hat{L}_k$ is the local energy image of the Laplacian at level k. The feature selection is based on Eq. (2).

Figure 6(a) shows four 1280x1024 images, each with a different depth of field. To avoid parallax, instead of adjusting the position of the lens with respect to the object, only the sensor-to-lens distance was adjusted. Therefore, the alignment among these DoF images is just a scale factor, which is only a function of the distance between the sensor and the lens. Figure 6(b) shows the result of the fusion using the in-focus masks. In this example, the fusion was processed only on the luminance images. The color components of these images (UV) were weight-averaged by the in-focus masks at level 0. The final image is the combination of the averaged UV and the fused luminance, converted to RGB for display. If no in-focus mask is used, the reference image color component may be combined with the fused result to form the final image. One could also fuse each individual RGB band, but since the fused $L_k^f(i,j)$ value for each color band may not come from the same source image, color distortion may result. The weighted average of the color components using in-focus masks thus reduces the aliasing and distortion in the result image. Figure 6(c) shows the fusion for a small region of interest in the image. The upper image is the result without using in-focus masks and the one below uses the masks. Clearly the result using the masks has higher contrast and less color distortion.
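The sketch promised above implements steps 1-4 of the in-focus mask recipe. The Gaussian blur width and the bilinear resampling used to propagate the mask across levels are assumptions; the paper only specifies "a fixed Gaussian filter" and an upscale/subsample step.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def in_focus_masks(laplacians_at_m, sigma=2.0):
    """Steps 1-3: blur the unsigned Laplacians at a chosen level m into energy
    maps E_m^n, then mark each pixel's winning source with a binary mask."""
    E = np.stack([gaussian_filter(np.abs(L), sigma) for L in laplacians_at_m])
    winner = np.argmax(E, axis=0)
    return [(winner == n).astype(np.float64)
            for n in range(len(laplacians_at_m))]

def mask_pyramid(mask_m, m, K):
    """Step 4: propagate the level-m mask to levels 0..K-1 by resampling
    (a simple stand-in for the paper's upscale/subsample procedure)."""
    return [zoom(mask_m, 2.0 ** (m - k), order=1) for k in range(K)]
```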

3.4 Image Fusion with Blending

When multiple cameras are not co-aligned or have different fields of view, the resultant fused image may show the borders of the source images that align to the reference image. In order to remove the borders, we apply a blending technique that has been used in image mosaicking and wavelet fusion. The blended region of the aligned images can be determined by the alignment parameters. To reduce aliasing artifacts, the selection masks may be blurred before they are multiplied with the Laplacian images and the blend masks.


Figure 8. The transition of the values in the blend mask image is at the ROI edge. The transition functions in the horizontal and vertical directions are illustrated at the bottom and the right side of the image respectively.

Figure 9. Diagram of blending two images. $M_k$ is generated from the ROI pyramid. $\hat{L}_k$ is the local energy image of the Laplacian at level k. The feature selection is based on Eq. (2) such that the output is either the masking coefficient or 0.

In Figure 8, the aligned image is represented by the gray area. The blend mask ROI is defined as a rectangle inside the gray area. The closest distance from the ROI to the gray area boundary is a function of the blending function and the number of pyramid levels. On the two sides of the mask ROI edge, the mask values vary from 0 to 1. The size of the transition function is fixed at all levels.

Once the region of interest is known, we can define a mask image at level 0 with 1 representing the valid pixel region and 0 the invalid pixels. The mask pyramid is generated by smoothing with a filter F and decimating by half:

$$M_{i+1}^n = \left(F \otimes M_i^n\right)\downarrow_2. \quad (11)$$

$M_i^n(t)$ is the blend mask at level i. F is a normalized Gaussian filter. The size of the filter determines the softness of the transitions at the blended boundary. The salience at pixel (i,j) can be expressed in terms of the mask:

$$S_k^n(t) = M_k^n \cdot \hat{L}_k^n(t), \quad (12)$$

the feature selection rule is defined in Eq. (2), and the fused Laplacian image is obtained by Eq. (3).
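A sketch of the blend-mask pyramid of Eq. (11) follows; the Gaussian width sigma is an assumed parameter standing in for the filter size that, per the text, controls the seam softness.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blend_mask_pyramid(roi_mask, levels, sigma=2.0):
    """Eq. (11): smooth with a normalized Gaussian F and decimate by half.
    The filter width (sigma, an assumed value) sets the seam softness."""
    masks = [roi_mask.astype(np.float64)]   # level 0: 1 inside ROI, 0 outside
    for _ in range(levels - 1):
        smoothed = gaussian_filter(masks[-1], sigma)
        masks.append(smoothed[::2, ::2])    # decimate by 2
    return masks
```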

Figure 10 is another example, showing the blend of a color image with a SWIR image. The fusion without blending is displayed in Figure 10(a), and the edge-blended result is given in Figure 10(b).

Figure 10. (a) The fused image with no blending. (b) The fused image with the blending technique.

3.5 High Dynamic Range Compression

Many techniques have been developed to compress high dynamic range (HDR) images. One common approach is called 'tone mapping'. Tone mapping merges a set of low dynamic range (LDR) images captured with different exposure times into a single HDR image. Each LDR image needs to be converted based on the camera response curve so that the complete set of images can be linearly averaged based on exposure. The fidelity of the pixel value in each LDR image is also weighted to avoid using underflow and overflow values. The HDR image is then contrast normalized (i.e. tone mapped) to compress the overall dynamic range while preserving contrast, enabling presentation of the final LDR image on traditional display media. Shortcomings of many tone mapping implementations include the introduction of halo artifacts and the tendency of standard image compression techniques to make the resulting image look "unreal".

Figure 11. HDRC example. (a) T=1/250s. (b) T=1/100s. (c) T=1/60s. (d) Fusion result. The first three LDR images are taken at different exposure times. The last image is the fused result using weight maps and contrast normalization at all pyramid levels.

The compression of the HDR image aims to preserve the local contrast details from the LDR images. Since this is a fundamental characteristic of Laplacian fusion, it is clearly well suited to this task. Further, the creation of an intermediate HDR image is not needed, and thus no tone mapping step is required. High Dynamic Range Compression (HDRC) is thus far more efficient and easier to implement in hardware than previous techniques. Since each component LDR image contains underflow and overflow pixels, mask operations are needed to exclude their influence. Further, if the contrast in any single LDR image needs to be enhanced, the pyramid based contrast normalization method can be performed during the fusion process. We define $M_k(i,j)$ as the weight map, with underflow and overflow regions receiving less weight and saturated color regions receiving more (easily determined if images are first converted to the HSV [Hue, Saturation and Value] color space). The weight map is a function of the saturation and the camera exposure curve:

$$M_k^n(i,j,t) = M\left(I_k^n(i,j,t), S_a, E_p\right), \quad (13)$$

where $S_a$ is the average saturation around (i,j), and $E_p$ is associated with the exposure time and weight for source image n. The salience at pixel (i,j) can be expressed in terms of the normalized local average of the absolute Laplacian, $\tilde{L}_k^n(i,j,t)$:

$$S_k^n(i,j,t) = \tilde{L}_k^n(i,j,t) \cdot M_k^n(i,j,t). \quad (14)$$

The fused Laplacian image is then obtained by Eq. (3), in which $\chi_k^n(i,j,t)$ is the normalized $S_k^n(i,j,t)$.

Figure 11 shows an HDRC example. Three LDR images are taken at different integration times as indicated. The fusion is combined with contrast normalization at all pyramid levels to enhance fine features.

Figure 12. Diagram of HDRC. The feature selection is bypassed. The normalized saliency is the selection weight.

In the fusion process, the fused Laplacian image at level k is the average of the Laplacian images weighted by the weight map. The fused Gaussian image at the top level is also shaped to contain less dynamic range. The DC (i.e. offset) level of the Gaussian is adjusted to be near the middle of the dynamic range. The reconstructed fusion result is shown in the last image in Figure 11.
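A per-level sketch of the HDRC weighting follows. It assumes the weight maps have already been computed from saturation and exposure per Eq. (13), and it approximates the "normalized local average of the absolute Laplacian" with a plain box-window average; the epsilon guard is an implementation detail.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def hdrc_fuse_level(laplacians, weight_maps, window=5, eps=1e-6):
    """Eqs. (14) and (3) with selection bypassed: the normalized salience
    itself is the blending weight, so sources are averaged, not switched."""
    S = np.stack([uniform_filter(np.abs(L), size=window) * W
                  for L, W in zip(laplacians, weight_maps)])
    chi = S / (S.sum(axis=0) + eps)                # normalized salience
    return np.sum(chi * np.stack(laplacians), axis=0)
```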

4 Hardware Implementation

The functional block diagrams for the different application examples in Section III using mask pyramids can be assembled into the canonical implementation illustrated in Figure 13. This architecture is an example implementation that supports the selection and usage of the pyramid masks for image fusion and enhancement. The generation of the mask pyramid can be done elsewhere depending on the application, and can be tuned as needed by this architecture.

Figure 13. Canonical Block Diagram of Mask Pyramid implementation for image fusion.

The hardware architecture is best described by following the data flow of the image in the block diagram. Inputs to the hardware block include the image $L_k$ and the optionally pre-stored mask pyramid $M_k$ at a pyramid level. The center portion of the block diagram describes how the mask pyramid is selected, used, and optionally generated for the next cycle. At point c, a filtering operation is used to determine either the absolute value or the square of the input data and apply m-tap filtering. This operation computes the local energy of the image. At point d, the local weight is generated based on inputs $L_k$ and $M_k$, using a Func block consisting of an arithmetic logic unit (ALU) for table lookup, multiplications, subtractions, additions, and linear operations. At point f, the local weight coefficient is applied to the input image, which contrast-enhances the original image. The Feature Select hardware selects either a pre-stored mask pyramid from external memory or locally calculated binary values. The output of the Feature Select block is an optional mask image that can be saved to external memory for the next image frame, e.g. in hysteresis processing at point g, or used at point e as a filter to reduce aliasing when applied as the weight for the final averaging of the enhanced Laplacian images. At point h, the individual Laplacians are locally weighted before they are summed to form the locally enhanced fused Laplacian at point j. Finally, the images at different pyramid levels are summed to produce an output image.
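As an illustrative software analogue (not the Acadia II hardware itself), the per-level datapath can be read as the following sketch, with the programmable Func block passed in as a callable:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def canonical_level(L_inputs, M_inputs, func, window=5):
    """Software analogue of the Figure 13 per-level datapath (a sketch only):
    local-energy filtering (point c), weight generation by a programmable
    Func block (point d), feature selection, anti-alias filtering of the
    winners (point e), and weighted summation of the Laplacians (h to j)."""
    Lhat = [uniform_filter(np.abs(L), size=window) for L in L_inputs]   # c
    S = np.stack([func(lh, M) for lh, M in zip(Lhat, M_inputs)])        # d
    winner = np.argmax(S, axis=0)
    chi = [uniform_filter((winner == n).astype(np.float64), size=3)    # e
           for n in range(len(L_inputs))]
    return sum(c * L for c, L in zip(chi, L_inputs))                    # h, j
```

For instance, `func = lambda lh, m: (1 + 0.5 * m) * lh` reproduces the hysteresis rule of Section 3.1, while `func = lambda lh, m: lh - m` reproduces the additive-noise rule of Section 3.2.1.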

It is worth mentioning that the functional block in Figure 13 may be configured to perform single-image enhancement. Except for the input channel, the mask images for all other channels may be set to 0. Other input channels may optionally be disabled to save power as needed. Point d provides the local weight coefficients applied to the input Laplacian. The mapping function may be programmed as a local normalized contrast map [11], a histogram enhanced map, or a noise coring map, among others.

For Acadia II System-on-Chip (SoC) applications [12], the processing for the canonical Mask Pyramid implementation may be accelerated by the use of its Video Processing Modules, connected on the Crossbar Switch (Fig. 14). These modules are hardware accelerators optimized for different pyramid image processing functions, including a three-channel image fusion module and an ALU device. Referring to Figure 13, image convolution can be performed using the FILTER device for image data (represented by points c and e). The Mask Pyramid can be implemented using the FILTER and FSP (Frame Store Port) hardware connected in a pipeline, as shown in Figure 14. There are a total of 24 FSPs in the Acadia II. When reading or writing to the FSPs, a raster-scan-ordered data stream is transferred between the Crossbar Switch and external memory. Up- and down-sampling may be performed by the FSP. The Feature Select functionality may be implemented using the FSPs and the ALU, or the FUSE device. This mapping of the processing to the Acadia II is only an example of a possible implementation, as alternative mappings exist on the Acadia II SoC.

Figure 14. Acadia II SoC detailed block diagram.

5 Discussion and Conclusion

This paper has discussed the use of the mask pyramid to extend the conventional pyramid based fusion architecture to other image processing applications. The innovation comes from the use of local information to determine salience and thus influence the feature selection map. The canonical implementation is configured not just to execute single applications (as discussed in Section III), but is flexible enough to combine multiple applications in a comprehensible way. For example, the hardware can be configured to simultaneously perform both DoF and HDRC processing while locally enhancing the source image [13].

The hierarchical local-information-encoded masking technique presented herein is not limited to pyramid fusion, but may apply to any wavelet fusion. The input data source can be multi-dimensional, and is not limited to vision data. The feature selection rule as described can output local weights when the register is enabled, and is thus not confined to binary selection.

References

[1] P. J. Burt and E. H. Adelson, "The Laplacian pyramid as a compact image code", IEEE Trans. Commun., COM-31, 532-540, 1983.

[2] E. H. Adelson, C. H. Anderson, J. R. Bergen, P. J. Burt and J. M. Ogden, "Pyramid methods in image processing", RCA Engineer, 29, 33-41, 1984.

[3] P. J. Burt, "The pyramid as a structure for efficient computation", Multi-resolution Image Processing and Analysis, A. Rosenfeld, ed., Springer-Verlag, Berlin, 1984.

[4] J. M. Ogden, E. H. Adelson, J. R. Bergen and P. J. Burt, "Pyramid-based computer graphics", RCA Engineer, 30-5, 1985.

[5] P. J. Burt and R. J. Kolczynski, "Enhanced image capture through fusion", Proc. International Conference on Computer Vision, Berlin, pp. 173-182, 1993.

[6] M. I. Smith and J. P. Heather, "Review of image fusion technology in 2005", Thermosense XXVII, Proc. SPIE, 5782, pp. 29-45, 2005.

[7] "Multi-Image Fusion Makes Panoramic Stitching Super Easy", http://www.tested.com/news/multi-image-fusion-makes-panoramic-stitching-super-easy/253/.

[8] Acadia Video Processors, http://www.sarnoff.com/products/acadia-video-processors/acadia-II, and http://www.sarnoff.com/products/acadia-video-processors/acadia-pci.

[9] Fuzer, http://www.waterfallsolutions.co.uk/downloads/products.html#VORTEX.

[10] G. S. van der Wal and J. O. Sinniger, "Real time pyramid transform architecture", Proc. Intelligent Robots and Comp. Vision, Boston, pp. 300-305, 1985.

[11] P. Burt, C. Zhang and G. van der Wal, "Image Enhancement through Contrast Normalization", Military Sensing Symposium, February 6, 2007.

[12] G. S. van der Wal, "Technical Overview of the Sarnoff Acadia II Vision Processor", SPIE Defense, Security, and Sensing Conference, Multisensor, Multisource Information Fusion: Architectures, Algorithms, and Applications, Orlando, April 2010, Proc. SPIE 7710, 2010.

[13] A. Sufi, D. C. Zhang and G. Van der Wal, "A Single Algorithm Combining Exposure and Focus Fusion", ICIP, Brussels, Belgium, September 2011.
