Method of Image Fusion and Enhancement Using Mask Pyramid
David C. Zhang, Sek Chai and Gooitzen Van der Wal
Vision Systems SRI International Sarnoff
Princeton, NJ 08540, U.S.A. [email protected], [email protected], [email protected]
Abstract - Image fusion is an important visualization technique for integrating coherent spatial and temporal information into a compact form. Laplacian fusion is a process that combines regions of images from different sources into a single fused image based on a salience selection rule for each region. In this paper, we propose an algorithmic approach using a mask pyramid to better localize the selection process. A mask pyramid operates at different scales of the image to improve the fused image quality beyond a global selection rule. Several examples of this mask pyramid method are provided to demonstrate its performance in a variety of applications. A new embedded system architecture that builds upon the Acadia® II Vision Processor is proposed.

Keywords: image fusion, exposure fusion, high dynamic range imaging, focus fusion, image blending, noise reduction, image enhancement.
1 Introduction

Since Burt and Adelson published their first paper on Laplacian pyramid-based image processing in 1983 [1-4], the multispectral and multiscale representation of an image has been used extensively in image compression, stabilization and fusion. The pyramid is essentially a data structure consisting of a series of band-pass filtered copies of an image, each representing pattern information on a different scale. Image fusion based on the Laplacian pyramid of each source image extracts local salience information from each source image at multiple pyramid levels ranging from coarse to fine, and then reverses the pyramidal operation to form a fused image. The extraction of salient features is based on a specific selection rule. One common selection rule compares the local pixel energy at each pyramid level across sources and selects, for each location, the source with the maximum absolute value. Other rules include globally weighted averaging of the Laplacian images of a pyramid level [5]. The Laplacian fusion scheme has mainly been used for image integration of mixed-modality sensors [6]. It is also used in image stitching and multi-focus fusion for mono-modal sensors [7].
Sarnoff’s Acadia® I vision processor was the first embedded solution for real-time fusion, capable of aligning and fusing two NTSC/PAL images in real time [8]. The second-generation Acadia® II vision processor can align and fuse three 1280x1024 images in real time, and is also capable of globally and locally enhancing the source images. This local enhancement is done per pyramid level in the Laplacian images to normalize local contrast. Additionally, Waterfall Solutions Ltd. has put forward an FPGA-based real-time image fusion hardware device. The hardware supports simple weighted averaging as well as high-performance multi-resolution Laplacian fusion [9].
The use of image fusion as a primary vision processing function has gained widespread acceptance over the past several decades. Presently, both hardware and software solutions exist to fuse individual pixels in order to provide an integrated view of a scene captured via multiple sensors or sensor conditions (e.g. focus, exposure). The fusion function can provide increased informational content to the viewer when a priori information about the source images is made available. Thus the purpose of this paper is not to discuss higher-level fusion schemes, but rather to present methods of extending current fusion capabilities and applications for real-time embedded systems. Present fusion schemes typically apply only a global selection rule to all image pixels. In many applications, it would be preferable to use a localized rule. Therefore, we propose the generation of a mask pyramid in the hardware architecture to localize pixel selection. We present examples of this in this paper, along with a generic architectural implementation for the fusion system. If any intelligent analysis from the source imagery needs to be used in fusion, it can be programmed into this mask pyramid. We will also demonstrate that with this new capability, we can easily extend the fusion application to image enhancement, high dynamic range compression and image blending, among other applications.
The paper is organized as follows: the basic and advanced Laplacian fusion model are discussed in Section II. Section III presents examples of how the mask pyramid can be used to extend the fusion function to perform image enhancement, depth-of-field extension, high dynamic range compression and image blending. Section IV addresses the system architecture and Section V concludes the discussion.
14th International Conference on Information Fusion, Chicago, Illinois, USA, July 5-8, 2011
978-0-9824438-3-5 ©2011 ISIF
2 Laplacian Fusion Model

2.1 Laplacian pyramid transform
The FSD (filter-subtract-decimate) Laplacian pyramid transform used here was described in [5]. Some notation used in this paper can be summarized briefly. Let I(i,j) be the source image. Let $G_k(i,j)$ be the kth level Gaussian pyramid based on I(i,j), with k = 0 to K. Let $L_k(i,j)$ be the corresponding Laplacian pyramid, for k = 0 to K−1. The Gaussian is generated through a recursive filter (F) and subsample process. Each level of the Laplacian pyramid is obtained by applying a band-pass filter to the corresponding Gaussian. Therefore, $G_0$ can be decomposed into a multispectral and multiscale representation, or a Laplacian pyramid representation, where each subband of the pyramid contains a certain frequency band of the image. The original image can then be recovered from its Laplacian transform by reversing these steps. This begins with the lowest resolution level of the Gaussian, $G_K$, then uses the Laplacian pyramid levels to recursively recover the higher resolution Gaussian levels and the original image.
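The filter-subtract-decimate construction and its approximate inverse can be sketched in a few lines of NumPy. This is an illustrative sketch, not the Acadia implementation: the 5-tap binomial kernel, the edge-padded border handling, and the nearest-neighbor upsampling in the reconstruction are our assumptions, and the FSD reconstruction is inherently approximate since the subtracted filter is not inverted exactly.

```python
import numpy as np

def blur(img):
    """Separable 5-tap binomial filter [1,4,6,4,1]/16 (an assumed pyramid kernel)."""
    k = np.array([1, 4, 6, 4, 1], dtype=float) / 16.0
    tmp = np.apply_along_axis(lambda r: np.convolve(np.pad(r, 2, mode='edge'), k, 'valid'), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(np.pad(c, 2, mode='edge'), k, 'valid'), 0, tmp)

def fsd_pyramid(img, levels):
    """Filter-subtract-decimate: L_k = G_k - F(x)G_k, G_{k+1} = (F(x)G_k) decimated by 2."""
    gauss, lap = [np.asarray(img, float)], []
    for _ in range(levels):
        f = blur(gauss[-1])
        lap.append(gauss[-1] - f)   # band-pass residual at this scale
        gauss.append(f[::2, ::2])   # next, coarser Gaussian level
    return gauss, lap

def reconstruct(gauss_top, lap):
    """Approximate inverse: upsample the coarsest Gaussian and add back each Laplacian."""
    g = gauss_top
    for L in reversed(lap):
        up = np.repeat(np.repeat(g, 2, axis=0), 2, axis=1)[:L.shape[0], :L.shape[1]]
        g = blur(up) + L            # interpolate, then restore the detail band
    return g
```

Each Laplacian level keeps the detail removed by the blur, so the recursion in `reconstruct` mirrors the recovery procedure described above.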
2.2 The global selection rule in fusion

The fused image is assembled from the selected component patterns of the source images. In the Laplacian pyramid implementation, the Laplacian images serve as the component patterns. The fused output can be represented as follows:

$G^f = L_0^f \oplus L_1^f \oplus \cdots \oplus L_k^f \oplus \cdots \oplus L_{K-1}^f \oplus G_K^f$, (1)

where $L_k^f(i,j)$ is the result of the selection among all $L_k^n(i,j)$ for $0 \le n < N$, and N is the number of source images. The selection is based on a measure of salience at time t:

$\chi_k^n(i,j,t) = \begin{cases} \eta(i,j,t), & \text{if } S_k^n(i,j,t) - S_k^p(i,j,t) \ge T, \ \forall p < N, p \ne n \\ 0, & \text{otherwise,} \end{cases}$ (2)

where n < N, and T is a constant threshold with default value 0. The selection coefficient $\eta(i,j,t)$ is by default a binary value 1. The global selection rule indicates that if the salience $S_k^n(i,j,t)$ at position (i,j) of pyramid level k is the maximum among all sources at time t, the pixel value of source n is given the full weight. If the selection is weighted by the mask image, then the selection coefficient can be the mask value at (i,j). The fused Laplacian image is simply the summation over all the selection coefficients:

$L_k^f(i,j,t) = \sum_n \chi_k^n(i,j,t) \cdot L_k^n(i,j,t)$. (3)

The salience $S_k^n$ is defined as the local average of $|L_k^n(t)|$ or $(L_k^n(t))^2$ for the nth source image at level k, and is referred to as the local energy of the pixel (i,j) at time t.
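The per-level selection of Eqs. (2)-(3) with T = 0 and a binary selection coefficient (winner-take-all) can be sketched as follows; the box-window size for the local energy is our assumption:

```python
import numpy as np

def local_energy(lap, win=5):
    """Salience S_k^n: local (box) average of |L_k^n|, per Section 2.2."""
    k = np.ones(win) / win
    p = win // 2
    a = np.abs(lap)
    tmp = np.apply_along_axis(lambda r: np.convolve(np.pad(r, p, mode='edge'), k, 'valid'), 1, a)
    return np.apply_along_axis(lambda c: np.convolve(np.pad(c, p, mode='edge'), k, 'valid'), 0, tmp)

def fuse_level(laps):
    """Eqs. (2)-(3) with T=0 and eta=1: per-pixel winner-take-all among the
    N same-level Laplacians, then a chi-weighted sum (here a hard selection)."""
    S = np.stack([local_energy(L) for L in laps])   # N x H x W saliences
    winner = np.argmax(S, axis=0)                   # index of the max salience
    chi = np.stack([(winner == n).astype(float) for n in range(len(laps))])
    return np.sum(chi * np.stack(laps), axis=0)     # Eq. (3)
```

Applying `fuse_level` to each pyramid level and reconstructing per Eq. (1) yields the fused image.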
2.3 The local selection rule in fusion

The pattern-selective fusion with the global selection rule can select a pattern that plays a role in representing important information in a scene. However, it cannot, for example, select a pattern that contains the best signal-to-noise ratio. In order to do that, we need to modify the selection rule:

$S_k^n(M_k^n) - S_k^p(M_k^p) \ge \phi_k(n,p)$. (4)

Notice that the salience is now a function of the mask M, and the threshold $\phi$ is not a constant. Here $M_k^n$ contains local information pertaining to the source image n at level k, and $\phi_k$ is the local threshold associated with the source images at level k. $\langle M_k, \phi_k \rangle$ is a local weight mask that provides guidance in selecting a salience for fusion. The local mask can be an in-focus mask, a region of interest, a motion detection mask, or a selection from the previous frame. If $\phi_k$ is separable, then we can define $\phi_k(n,p) = \phi_k^n - \phi_k^p$. Thus Eq. (4) can be reformatted as:

$S_k^n(M_k^n) - \phi_k^n \ge S_k^p(M_k^p) - \phi_k^p$. (5)
To be consistent with the conventional definition, we merge the local mask into the salience computation and still call it salience. Therefore the selection criteria remain the same as in Eq. (2), and the Laplacian fusion is formed in the same way by Eq. (3). However, the meaning of salience is no longer limited to the local energy; it extends to the role of best representing the information of all the source images based on local criteria. Since the local mask is applied to each pyramid level, there is a mask pyramid for each source image. The programming of the mask pyramid leads to different feature selections from the source images and thus provides extensions to other applications.
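Folding the mask into the salience leaves the selection machinery of Eqs. (2)-(3) unchanged. A hedged sketch follows; here the mask enters multiplicatively, as it does in the later Eqs. (10) and (12), with an optional additive threshold standing in for the separable $\phi_k$:

```python
import numpy as np

def masked_salience(energy, mask, phi=0.0):
    """Merge the local mask into the salience (Eq. 5): here multiplicatively,
    as in the later Eqs. (10)/(12), with an optional additive threshold phi."""
    return energy * mask - phi

def fuse_level_local(laps, energies, masks, phis):
    """Reuse the Eq. (2) winner-take-all on the mask-adjusted saliences."""
    S = np.stack([masked_salience(e, m, p) for e, m, p in zip(energies, masks, phis)])
    winner = np.argmax(S, axis=0)
    chi = np.stack([(winner == n).astype(float) for n in range(len(laps))])
    return np.sum(chi * np.stack(laps), axis=0)
```

A zeroed mask vetoes a source locally regardless of its raw energy, which is exactly the behavior the in-focus, motion, and ROI masks of Section 3 exploit.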
3 Examples

Because we use the mask pyramid to extend the selection rule, the proposed approach offers a generic methodology for applications in image enhancement, high dynamic range compression, depth of field extension, and image blending. The mask pyramid can also be encoded for intelligent analysis of source imagery. In this section, several examples of this mask pyramid method are provided to demonstrate its performance. A new embedded system architecture based upon the Acadia® II Vision Processor is proposed in the next section.
3.1 Hysteresis

Flicker is a flashing effect that often occurs in image fusion. This distracting, displeasing artifact is often seen in the fusion of LWIR and SWIR or visible images, when both image sources contain reverse intensities at the same pixel locations. The reverse intensity regions are shown in the ROI windows in Figure 1. The fusion selection rule selects the highest energy among same-level Laplacian images, but ignores the signs of the values. When two Laplacian images are equal in magnitude but opposite in sign, small differences due to noise will flip the selection temporally, and thus introduce this artificial flicker in real time.
To improve temporal coherence and reduce flicker artifacts, a hysteresis mask is defined, providing a local feedback mechanism in fusion that biases the selection in
time t based on time t−1. Let $S_k(i,j)$ be the salience associated with the source image pyramid $L_k(i,j)$ at level k, and $M_k(i,j)$ be the hysteresis mask. The salience at pixel (i,j) can be expressed in terms of the mask at t−1:

$S_k^n(t) = (1 + h \cdot M_k^n(t-1)) \cdot \hat{L}_k^n(t)$, (6)

where $\hat{L}_k^n(t)$ is the local average of $|L_k^n(t)|$ or $(L_k^n(t))^2$ for the nth source image, and h is the hysteresis factor in the range of 0 to 1. The hysteresis mask at (i,j) is 1 if the salience of image n is the largest; otherwise it is set to 0. The fused Laplacian image is based on Eq. (3).

In order to reduce the aliasing artifacts, the hysteresis masks $M_k^n(t)$ can be blurred before they are multiplied with the Laplacian images.
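Eq. (6) and the mask update can be sketched as follows; the function names are ours, and h = 0.5 matches the setting used in the Figure 3 example:

```python
import numpy as np

def hysteresis_salience(energy, prev_mask, h=0.5):
    """Eq. (6): S(t) = (1 + h*M(t-1)) * L_hat(t). Last frame's winner gets a
    boost of factor (1+h), so near-ties no longer flip frame to frame."""
    return (1.0 + h * prev_mask) * energy

def update_masks(saliences):
    """New hysteresis masks for time t: 1 where a source has the largest salience."""
    winner = np.argmax(np.stack(saliences), axis=0)
    return [(winner == n).astype(float) for n in range(len(saliences))]
```

With h = 0.5, a competing source must exceed last frame's winner by 50% in local energy before the selection switches, which suppresses the noise-driven flips described above.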
Figure 1. (a) Intensified TV image. (b) LWIR image. The rectangle shows the region that contains reverse intensities in the two source images.
Figure 2. Diagram of hysteresis in fusing two images. $M_k$ is the hysteresis mask. $\hat{L}_k$ is the local energy image of the Laplacian at level k. The feature selection is based on Eq. (2). The linear functional block is based on Eq. (6). The hysteresis mask at time t will be used for the time t+1 input.
Figure 3 plots the effect of hysteresis applied to the video sequence. The vertical axis is the local average of pixels around the bottom-left corner of the selected ROI window in Figure 1 in the fused image (not shown). The horizontal axis represents the frame number in the sequence. The dashed line in the figure represents the pixel variations without hysteresis, and the circles represent the result with hysteresis added. The flashing artifacts are greatly reduced with hysteresis in fusion (h=0.5).
Figure 3. The dashed line represents the local average of pixels without hysteresis in fusion. The circles are the results with the hysteresis added. Clearly the magnitude of the variation without hysteresis is much greater than that with hysteresis added.
3.2 Fusion with improved SNR

In many fusion applications using low-light cameras (e.g. LWIR, SWIR, or image intensified), noise in the video severely degrades the fused image. Common noise reduction strategies include median filtering, coring the noise in the source images, and temporal filtering of aligned images. By using the proposed architecture, we are able to design a local selection rule to pick the Laplacian coefficient representing the highest signal-to-noise ratio present.
3.2.1 Additive Gaussian Noise

Let $M_k(i,j)$ be the noise mask. The salience at pixel (i,j) can be expressed in terms of the mask:

$S_k^n(t) = \hat{L}_k^n(t) - M_k^n(t)$, and $M_k^n(t) = N_k^n$, (7)

where $\hat{L}_k^n(t)$ is the local average of $|L_k^n(t)|$, $M_k^n(t)$ is the noise mask, and $N_k^n$ is the noise at level k. The window size for the local average is selected as double the filter size used in pyramid generation in order to reduce ghosting artifacts. The noise at each level is also determined by the filter size. For example, if the filter is a boxcar of size 3, then the noise at level 1 is reduced by 3, at level 2 by 9, and so on. The feature selection rule is defined in Eq. (2), and the fused Laplacian image is obtained by multiplying the selection mask with the Laplacian images. In order to reduce the aliasing artifacts, the selection masks can be blurred before they are multiplied with the Laplacian images.
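The Eq. (7) noise-compensated salience and the per-level noise estimate can be sketched in two small helpers; the linear per-level noise-reduction factor is taken directly from the boxcar example in the text:

```python
import numpy as np

def level_noise(sigma0, level, box=3):
    """Per-level noise N_k for a boxcar pyramid filter: the text states the
    noise drops by the filter size per level (3, 9, 27, ... for box=3)."""
    return sigma0 / (box ** level)

def noise_salience(energy, noise):
    """Eq. (7): S = L_hat - M, with the noise mask M set to the level noise N_k,
    so sources whose local detail is mostly noise lose the selection."""
    return energy - noise
```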
Figure 4. (a) Visible image. (b) IR image with Gaussian noise. (c) Fusion without noise mask. (d) Fusion with noise mask. Note the increased contrast and speckle-free sky in image (d).

In Figure 4, a visible image and an LWIR image are used in fusion. The LWIR image has the additive noise. Figure 4(c) is the fusion without noise reduction. In Figure 4(d), after the noise reduction, the background is much cleaner.

3.2.2 Readout Noise

The readout noise is proportional to the square root of the signal I(i,j). Therefore, the signal-to-noise ratio is proportional to $\sqrt{I}$. If we want to consider the Gaussian noise as well, then the noise mask is encoded with both the scale factor $\alpha_k^n(i,j)$ and $N_k^n$. The salience at pixel (i,j) can be expressed in terms of the mask:

$S_k^n(t) = \hat{M}_k^n(t) \cdot \hat{L}_k^n(t) - \tilde{M}_k^n(t)$, (8)

$\hat{M}_k^n(t) = 1 - \alpha_k^n = 1 - C \cdot \frac{1}{\sqrt{I_k^n(t)}}$, and $\tilde{M}_k^n(t) = N_k^n$, (9)

where $M_k^n = \langle \hat{M}_k^n, \tilde{M}_k^n \rangle$, and C is a constant. $I_k^n(t)$ is the Gaussian pyramid generated from the source images. The same filter is used for the Laplacian pyramid. The algorithm is summarized in the block diagram in Fig. 5.

Figure 5. Diagram of readout noise reduction in two-image fusion. $M_k$ is the readout noise mask. $\hat{L}_k$ is the local energy image of the Laplacian at level k. The mask image contains both the gain and offset noise.

3.3 Depth of Field Extension

Depth of field (DoF) extension using standard image sensors is achieved through the fusion of images of the same scene taken with different focal points into a single image, where each scene within the image has maximum contrast and optimal focus. Regardless of wavelet or pyramid representation, common extended DoF approaches decompose an image into multiple subbands and use a selection rule at each subband to form a subband image at all scales. The subband sets of images are then recombined into a single image. The assumption that an in-focus feature has the maximum contrast is supported naturally by pattern-selective fusion, as the feature saliency computation guarantees the in-focus feature is selected over those out of focus. Therefore, by simply applying the maximum-selection rule among all the image pyramids, the extended DoF image is generated.

However, the above assumption fails when 1) the aperture size of the camera is large (insufficient optical DoF to begin with), 2) the focal length is comparable to the distance from the objects (e.g. extreme near-field case), and/or 3) the scene has a range much larger than the depth of field. For these cases, the feature salience at each subband does not guarantee that the maximum contrast is from the in-focus scene feature. Therefore, each source image may need a mask to help define its in-focus features.

Let $M_k(i,j)$ be the in-focus mask. The salience at pixel (i,j) can be expressed in terms of the mask:

$S_k^n(t) = M_k^n(t) \cdot \hat{L}_k^n(t)$, (10)

the feature selection rule is defined in Eq. (2), and the fused Laplacian image is obtained by Eq. (3).

There are several ways to compute the in-focus masks. Our implementation used the unsigned Laplacian images as the energy map, taking advantage of the pyramid representation of the source images, as summarized below:

1. For each source image $I^n$, compute $L_m^n$, where m is a pyramid level of image n.

2. Blur $L_m^n$ with a fixed Gaussian filter: $E_m^n = F \otimes L_m^n$.
3. Generate a binary mask $M_m^n$ for each source at level m. The mask $M_m^n(i,j,t)$ is 1 if $E_m^n(i,j,t)$ is the largest among the energy images at level m; otherwise, it is set to 0.
Figure 6. (a) Four source images with different depth of field. The images were taken such that there was no parallax between them. (b) The fused image using in-focus masks. The color components are weight-averaged using the in-focus masks. (c) The top image shows the fusion without in-focus masks; the bottom image is the result after the masking operation.
4. Create the mask pyramid $\{M_m^n\}$ by upscaling the mask $M_m^n$ from level m to level 0, and subsampling it from level m to the top pyramid level K.
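Steps 1-4 above can be sketched as follows; the blur kernel and the nearest-neighbor up/down scaling in step 4 are our assumptions:

```python
import numpy as np

_K5 = np.array([1, 4, 6, 4, 1], dtype=float) / 16.0  # assumed fixed blur kernel

def _blur(img):
    tmp = np.apply_along_axis(lambda r: np.convolve(np.pad(r, 2, mode='edge'), _K5, 'valid'), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(np.pad(c, 2, mode='edge'), _K5, 'valid'), 0, tmp)

def in_focus_masks(laps_at_m):
    """Steps 1-3: blur each unsigned Laplacian at level m into an energy map
    E_m^n = F (x) |L_m^n|, then take a per-pixel winner-take-all binary mask."""
    E = np.stack([_blur(np.abs(L)) for L in laps_at_m])
    winner = np.argmax(E, axis=0)
    return [(winner == n).astype(float) for n in range(len(laps_at_m))]

def mask_pyramid(mask_m, m, K):
    """Step 4: replicate the level-m mask to all levels 0..K by 2x scaling."""
    masks = {m: np.asarray(mask_m, float)}
    for lev in range(m - 1, -1, -1):                  # upsample toward level 0
        masks[lev] = np.repeat(np.repeat(masks[lev + 1], 2, axis=0), 2, axis=1)
    for lev in range(m + 1, K + 1):                   # subsample toward level K
        masks[lev] = masks[lev - 1][::2, ::2]
    return masks
```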
Figure 7. Diagram of fusing two images taken with different lens focus settings. $M_k$ is the in-focus mask. $\hat{L}_k$ is the local energy image of the Laplacian at level k. The feature selection is based on Eq. (2).

Figure 6(a) shows four 1280x1024 images, each with different depth of field. To avoid parallax, instead of adjusting the position of the lens with respect to the object, only the sensor-to-lens distance was adjusted. Therefore, the alignment among these DoF images is just a scale factor, which is only a function of the distance between the sensor and the lens. Figure 6(b) shows the result of the fusion using the in-focus masks. In this example, the fusion was processed only on the luminance images. The color components of these images (UV) were weight-averaged by the in-focus masks at level 0. The final image is the combination of the averaged UV and the fused luminance, converted to RGB for display. If no in-focus mask is used, the reference image color component may be combined with the fused result to form the final image. One could also fuse each individual RGB band, but since the fused $L_k^f(i,j,c)$ value from each color band may not come from the same source image, color distortion may result. The weighted average of the color components using in-focus masks thus reduces the aliasing and distortion in the result image. Figure 6(c) shows the fusion from a small region of interest in the image. The upper image is the result without using in-focus masks and the one below is using the masks. Clearly the result using the masks has higher contrast and less color distortion.
3.4 Image Fusion with Blending

When multiple cameras are not co-aligned or have different fields of view, the resultant fused image may show the borders of the source images that align to the reference image. In order to remove the borders, we apply a blending technique that has been used in image mosaicking and wavelet fusion. The blended region of the aligned images can be determined by the alignment parameters. To reduce the aliasing artifacts, the selection masks may be blurred before they are multiplied with the Laplacian images and the blend masks.
Figure 8. The transition of the values in the blend mask image is at the ROI edge. The transition functions in the horizontal and vertical directions are illustrated at the bottom and the right side of the image respectively.
Figure 9. Diagram of blending two images. $M_k$ is generated from the ROI pyramid. $\hat{L}_k$ is the local energy image of the Laplacian at level k. The feature selection is based on Eq. (2) such that the output is either the masking coefficient or 0.
In Figure 8, the aligned image is represented by the gray area. The blend mask ROI is defined as a rectangle inside the gray area. The closest distance from the ROI to the gray area boundary is a function of the blending function and the number of pyramid levels. On the two sides of the mask ROI edge, the mask values vary from 0 to 1. The size of the transition function is fixed at all levels.
Once the region of interest is known, we can define a mask image at level 0 with 1 representing the valid pixel region and 0 the invalid pixels. The mask pyramid is generated by smoothing with a filter F and decimating by half:

$M_{i+1}^n = (F \otimes M_i^n) \downarrow 2$. (11)

$M_i^n(t)$ is the blend mask at level i. F is a normalized Gaussian filter. The size of the filter determines the softness of the transitions at the blended boundary. The salience at pixel (i,j) can be expressed in terms of the mask:

$S_k^n(t) = M_k^n \cdot \hat{L}_k^n(t)$, (12)

the feature selection rule is defined in Eq. (2), and the fused Laplacian image is obtained by Eq. (3).
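Eq. (11) can be sketched as a repeated smooth-and-decimate of the binary ROI mask; the normalized 5-tap kernel standing in for the Gaussian F is our assumption:

```python
import numpy as np

def blend_mask_pyramid(roi_mask, levels):
    """Eq. (11): M_{i+1} = (F (x) M_i) downsampled by 2. The normalized kernel
    standing in for the Gaussian F softens the 0/1 ROI boundary; a wider
    filter gives a softer transition."""
    k = np.array([1, 4, 6, 4, 1], dtype=float) / 16.0
    def blur(img):
        tmp = np.apply_along_axis(lambda r: np.convolve(np.pad(r, 2, mode='edge'), k, 'valid'), 1, img)
        return np.apply_along_axis(lambda c: np.convolve(np.pad(c, 2, mode='edge'), k, 'valid'), 0, tmp)
    masks = [np.asarray(roi_mask, float)]
    for _ in range(levels):
        masks.append(blur(masks[-1])[::2, ::2])  # smooth, then decimate by half
    return masks
```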
Figure 10 is another example, showing the blend of a color image with a SWIR image. The fusion without blending is displayed in Figure 10(a), and the edge blended result is given in Figure 10(b).
Figure 10. (a) The fused image with no blending. (b) The fused image with the blending technique applied.
3.5 High Dynamic Range Compression
Many techniques have been developed to compress high dynamic range (HDR) images. One common approach is called ‘tone mapping’. Tone mapping merges a set of low dynamic range (LDR) images captured with different exposure times into a single HDR image. Each LDR image needs to be converted based on the camera response curve so that the complete set of images can be linearly averaged based on exposure. The fidelity of the pixel value in each LDR image is also weighted to avoid using underflow and overflow values. The HDR image is then contrast normalized (i.e. tone mapped) to compress the overall dynamic range while preserving contrast, enabling presentation of the final LDR image on traditional display media. Shortcomings in many tone mapping implementations include the introduction of halo artifacts and the tendency of standard image compression techniques to make the resulting image look “unreal”.
Figure 11. HDRC example. (a) T=1/250s. (b) T=1/100s. (c) T=1/60s. (d) Fusion result. The first three LDR images are taken at different exposure times. The last image is the fused result using weight maps and contrast normalization at all pyramid levels.
The compression of the HDR image aims to preserve the local contrast details from the LDR images. Since this is a fundamental characteristic of Laplacian fusion, it is clearly well suited to this task. Further, the creation of an intermediate HDR image is not needed, and thus no tone mapping step is required. High Dynamic Range Compression (HDRC) is thus far more efficient and easier
to implement in the hardware than previous techniques. Since each component LDR image contains underflow and overflow pixels, mask operations are needed to exclude their influence. Further, if the contrast in any single LDR image needs to be enhanced, the pyramid-based contrast normalization method can be performed during the fusion process. We define $M_k(i,j)$ as the weight map, with the underflow and overflow regions receiving less weight and saturated color regions receiving more (easily determined if images are first converted to the HSV [Hue, Saturation and Value] color space). The weight map is a function of the saturation and camera exposure curve:

$M_k^n(i,j,t) = M(I_k^n(i,j,t), S_a, E_p)$, (13)

where $S_a$ is the average saturation around (i,j), and $E_p$ is associated with the exposure time and weight for source image n. The salience at pixel (i,j) can be expressed in terms of the normalized local average of the absolute Laplacian $\tilde{L}_k^n(i,j,t)$:

$S_k^n(i,j,t) = \tilde{L}_k^n(i,j,t) \cdot M_k^n(i,j,t)$. (14)

The fused Laplacian image is then obtained by Eq. (3), in which $\chi_k^n(i,j,t)$ is the normalized $S_k^n(i,j,t)$.
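A hedged sketch of an HDRC weight map and the bypassed-selection averaging of Eqs. (13)-(14): the hat-shaped down-weighting of under/overflow pixels is our stand-in for the paper's saturation- and exposure-dependent map M, and intensities are assumed normalized to [0, 1]:

```python
import numpy as np

def exposure_weight(gauss, lo=0.05, hi=0.95):
    """Stand-in for the Eq. (13) weight map: a hat function over [0, 1] that
    ramps underflow (< lo) and overflow (> hi) pixel weights toward zero."""
    rise = np.clip((gauss - lo) / (0.5 - lo), 0.0, 1.0)
    fall = np.clip((hi - gauss) / (hi - 0.5), 0.0, 1.0)
    return rise * fall

def hdrc_fuse_level(laps, weights, eps=1e-8):
    """Eq. (14)/(3) with feature selection bypassed: chi is the normalized
    weight, so the fused Laplacian is a weighted average of the sources."""
    W = np.stack(weights)
    chi = W / (W.sum(axis=0) + eps)   # normalized selection weights
    return np.sum(chi * np.stack(laps), axis=0)
```

Because the weights are normalized per pixel rather than thresholded, the HDRC path averages detail from all exposures instead of hard-selecting one, matching the "feature selection is bypassed" note in Figure 12.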
Figure 11 shows an HDRC example. Three LDR images are taken at different integration times as indicated. The fusion is combined with contrast normalization at all pyramid levels to enhance fine features.
Figure 12. Diagram of HDRC. The feature selection is bypassed. The normalized saliency is the selection weight.
In the fusion process, the fused Laplacian image at level k is the average of the Laplacian images weighted by the weight map. The fused Gaussian image at the top level is also shaped to contain less dynamic range. The DC (i.e. offset) level of the Gaussian is adjusted to be near the middle of the dynamic range. The reconstructed fusion result is shown in the last image in Figure 11.
4 Hardware Implementation

The functional block diagrams for the different application examples in Section III using mask pyramids can be assembled into the canonical implementation illustrated in Figure 13. This architecture is an example implementation that supports the selection and usage of the pyramid masks for image fusion and enhancement. The generation of the mask pyramid can be done elsewhere depending on the application, and can be tuned as needed by this architecture.
Figure 13. Canonical Block Diagram of Mask Pyramid implementation for image fusion.
The hardware architecture is best described by following the data flow of the image in the block diagram. Inputs to the hardware block include the image Lk and the optionally pre-stored mask pyramid Mk at a pyramid level. The center portion of the block diagram describes how the mask pyramid is selected, used, and optionally generated for the next cycle. At point c, a filtering operation is used to determine either the absolute value or the square of the input data and apply m-tap filtering. This operation computes the local energy of the image. At point d, the local weight is generated based on inputs Lk and Mk, using a Func block consisting of an arithmetic logic unit (ALU) for table lookup, multiplications, subtractions, additions, and linear operations. At point f, the local weight coefficient is applied to the input image, which contrast-enhances the original image. The Feature Select hardware selects either a pre-stored mask pyramid from external memory or locally calculated binary values. The output of the Feature Select block is an optional mask image that can be saved to external memory for the next image frame, e.g. in hysteresis processing at point g, or used at point e as a filter to reduce aliasing when used as the weight for final averaging of the enhanced Laplacian images. At point h, individual Laplacians are locally weighted before they are summed to form the locally enhanced fused Laplacian at point j. Finally, the images at different pyramid levels are summed to produce an output image.
It is worth mentioning that the functional block in Figure 13 may be configured to perform single-image enhancement. Except for the input channel, the mask images for all other channels may be set to 0. Other input channels may be optionally disabled to save power as needed. Point d provides the local weight coefficients applied to the input Laplacian. The mapping function may be programmed as a local normalized contrast map [11], a histogram enhanced map, or a noise coring map, among others.
For Acadia II System-on-Chip (SoC) applications [12], the processing for the canonical Mask Pyramid implementation may be accelerated by the use of its Video Processing Modules as connected modules on the Crossbar Switch (Fig. 14). These modules are hardware accelerators optimized for different pyramid image processing functions, including a three-channel image fusion module and an ALU device. Referring to Figure 13, image
convolution can be performed using the FILTER device for image data (represented by points c and e). The Mask Pyramid can be implemented using the FILTER and FSP (Frame Store Ports) hardware connected in a pipeline as shown in Figure 14. There are a total of 24 FSPs in Acadia II. When reading or writing to the FSPs, a raster-scan-ordered data stream is transferred between the Crossbar Switch and external memory. Up- and down-sampling may be performed by the FSP. The Feature Select functionality may be implemented using the FSPs and the ALU, or the FUSE device. This mapping of the processing to the Acadia II is only an example of a possible implementation, as alternative mappings exist on the Acadia II SoC.
Figure 14. Acadia II SoC detailed block diagram.
5 Discussion and Conclusion

This paper has discussed the use of the mask pyramid to extend the conventional pyramid-based fusion architecture to other image processing applications. The innovation comes from the use of local information to determine salience and thus influence the feature selection map. The canonical implementation is configured not just to execute single applications (as discussed in Section III), but is flexible enough to combine multiple applications in a comprehensible way. For example, the hardware can be configured to simultaneously perform both DoF and HDRC processing while locally enhancing the source image [13].
The hierarchical local-information-encoded masking technique presented herein is not limited to pyramid fusion, but may apply to any wavelet fusion. The input data source can be multi-dimensional, and is not limited to vision data. The feature selection rule as described can output local weights when the register is enabled, and is thus not confined to binary.
References [1] P. J. Burt and E. H. Adelson, “The Laplacian pyramid as a compact image code”, IEEE Trans. Commun. COM-31, 532-540, 1983. [2] E. Adelson, C.H. Anderson, J.R. Bergen, P.J. Burt and J.M. Ogden, “Pyramid methods in image processing”, RCA Engineer 29, 33-41, 1984.
[3] P. J. Burt, “The pyramid as a structure for efficient computation”, Multi-resolution Image Processing and Analysis, A. Rosenfeld, ed., Springer-Verlag, Berlin, 1984. [4] J.M. Ogden, E.H. Adelson, J.R. Bergen and P.J. Burt, “Pyramid-based computer graphics”, RCA Engineer, 30-5, 1985. [5] P. J. Burt and R. J. Kolczynski, “Enhanced image capture through fusion”, Proc. International Conference on Computer Vision, Berlin, pp. 173-182, 1993. [6] Moira I. Smith, Jamie P. Heather, “Review of image fusion technology in 2005”, Thermosense XXVII, Proceedings of the SPIE, 5782, pp. 29-45, 2005. [7] “Multi-Image Fusion Makes Panoramic Stitching Super Easy”, http://www.tested.com/news/multi-image-fusion-makes-panoramic-stitching-super-easy/253/. [8] Acadia Video Processors, http://www.sarnoff.com/products/acadia-video-processors/acadia-II, and http://www.sarnoff.com/products/acadia-video-processors/acadia-pci. [9] Fuzer, http://www.waterfallsolutions.co.uk/downloads/products.html#VORTEX. [10] G. S. van der Wal and J. O. Sinniger, “Real time pyramid transform architecture”, Proc. Intelligent Robots and Comp. Vision, Boston, pp. 300-305, 1985.
[11] P. Burt, C. Zhang, G. van der Wal, “Image Enhancement through Contrast Normalization,” Military Sensing Symposium, February 6, 2007.
[12] G. S. van der Wal, “Technical Overview of the Sarnoff Acadia II Vision Processor,” SPIE Defense, Security, and Sensing Conference, Subconference 7710: Multisensor, Multisource Information Fusion: Architectures, Algorithms, and Applications, Orlando, April 2010, Proc. SPIE 7710, (2010). [13] A. Sufi, D.C. Zhang and Gooitzen Van der Wal, “A Single Algorithm Combining Exposure and Focus Fusion,” ICIP, Brussels, Belgium, September, 2011.