
Region-of-Interest Extraction Based on Frequency Domain Analysis and Salient Region Detection for Remote Sensing Image

Libao Zhang and Kaina Yang

Abstract—Traditional approaches for detecting visually salient regions or targets in remote sensing images are inaccurate and prohibitively computationally complex. In this letter, a fast, efficient region-of-interest extraction method based on frequency domain analysis and salient region detection (FDA-SRD) is proposed. First, the HSI transform is used to preprocess the remote sensing image from RGB space to HSI space. Second, a frequency domain analysis strategy based on the quaternion Fourier transform is employed to rapidly generate the saliency map. Finally, the salient regions are described by an adaptive threshold segmentation algorithm based on Gaussian pyramids. Compared with existing models, the new algorithm is computationally more efficient and provides more visually accurate detection results.

Index Terms—Frequency domain analysis (FDA), quaternion Fourier transform, region of interest (ROI), remote sensing image processing.

I. INTRODUCTION

Region-of-interest (ROI) detection technology, which is represented by the visual attention mechanism, has been introduced into the remote sensing image analysis field, and it has become an important technical approach for reducing the time required and improving the accuracy of mass-data image processing [1]. After a potential ROI is provided, the viewer can search for specific objects in that region, and computing resources can be reasonably allocated to enhance the operating efficiency of an image processing system.

A region that draws attention is defined as a focus of attention (FOA), which is considered an ROI or a target. Several computational models have been developed to simulate the human visual system (HVS) [1]–[3]. Itti [2] constructed a model using a biologically plausible architecture, which was proposed by Koch and Ullman [3] and is the basis for visual attention. Dai et al. [4] presented a method that introduces visual attention into satellite image classification. A faster, more efficient ROI detection algorithm based on an adaptive spatial subsampling visual attention model was proposed by Zhang et al. [5].

Manuscript received May 6, 2013; revised July 17, 2013 and August 17, 2013; accepted September 5, 2013. Date of publication November 19, 2013; date of current version December 11, 2013. This work was supported in part by the National Natural Science Foundation of China under Grant 61071103 and by the Fundamental Research Funds for the Central Universities under Grant 2012LYB50.

The authors are with the College of Information Science and Technology, Beijing Normal University, Beijing 100875, China (e-mail: libaozhang@bnu.edu.cn).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/LGRS.2013.2281827

The above models attempt to simulate the visual attention mechanism based on the biological construction of the HVS.

In addition to the above biological models, certain other methods have been proposed. Achanta et al. [6] presented a frequency-tuned approach that computes saliency in images using low-level color and luminance features and generates full-resolution saliency maps. By analyzing the log spectrum of an input image, Hou et al. [7] extracted the spectral residual of the image in the spectral domain and proposed a fast method for constructing the corresponding saliency map in the spatial domain. A bottom-up visual saliency model, graph-based visual saliency (GBVS), was proposed by Harel et al. [8]. This method uses a novel application of ideas from graph theory to form activation maps from raw features and to concentrate mass on those activation maps. In addition to such models, visual saliency models have also been applied to video compression [9].

Remote sensing images comprise large amounts of data. The biological models can simulate the HVS well, but they often lead to prohibitive computational complexity and do not consider characteristics in the frequency domain. Further, human visual attention does not necessarily reflect the actual regions of concern in a remote sensing image. Researchers in other disciplines have proposed methods that calculate ROIs quickly, but these methods consider only the features of the image itself and therefore easily cause false or missed detections. To overcome the weaknesses of existing visual attention models and make them more suitable for processing remote sensing images, we focus on two aspects: accuracy and low computational cost. The salient regions should be both detected and well described. Thus, we propose the FDA-SRD model, which improves the computational efficiency and accuracy of ROI detection in remote sensing images. After the HSI transform, a novel frequency domain strategy based on the quaternion Fourier transform is used to generate a saliency map, which is time-saving and efficient. In addition, an adaptive threshold segmentation algorithm based on Gaussian pyramids is employed to obtain more accurate shape information for the ROIs. Experimental results show that the proposed model is time-efficient and accurate.

The remainder of this letter is organized as follows. The FDA-SRD method is illustrated in Section II. Section III focuses on the research findings, while Section IV provides conclusions.

II. FDA-SRD METHOD

In the FDA-SRD model, the input image is subsampled by a factor of 2 twice to reduce the amount of data and is preprocessed using the HSI transform. A novel frequency domain strategy is then employed to generate a saliency map. Finally, the detected regions are formed through an adaptive threshold segmentation algorithm based on Gaussian pyramids to improve accuracy in target detection. Fig. 1 illustrates the framework of the FDA-SRD model.

Fig. 1. Framework of the FDA-SRD model.

A. HSI Transform

There are two frequently used color models in image processing applications: RGB (red, green, blue) and HSI (hue, saturation, intensity). Because the HSI color space is more consistent with human color perception than the RGB color space, remote sensing images are often transformed from RGB space to HSI space. HSI-based methods are often used because of their simple computation, high spatial resolution, and efficiency [10]. Given an image in the RGB color format, the hue component of each RGB pixel is calculated using the following equation:

H = \begin{cases} \theta, & B \le G \\ 360^{\circ} - \theta, & B > G \end{cases}    (1)

with \theta = \arccos\left\{ \dfrac{\frac{1}{2}\left[(R-G) + (R-B)\right]}{\left[(R-G)^2 + (R-B)(G-B)\right]^{1/2}} \right\}.

The saturation component is given by the following:

S = 1 - \frac{3}{R+G+B}\,\min(R, G, B).    (2)

Finally, the intensity component is given by the following:

I = \frac{1}{3}(R+G+B).    (3)

It is assumed that the RGB values have been normalized to the range [0, 1]; the angle θ is measured with respect to the red axis in the HSI space.
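As a concrete illustration of (1)–(3), the following is a minimal NumPy sketch of the RGB-to-HSI conversion; the function name rgb_to_hsi, the use of degrees for the hue, and the small epsilon guarding against division by zero are our own choices and are not specified in the letter.

```python
import numpy as np

def rgb_to_hsi(rgb: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Convert an RGB image (floats in [0, 1], shape HxWx3) to HSI per (1)-(3).

    Hue is returned in degrees [0, 360); saturation and intensity lie in [0, 1].
    """
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]

    # Hue, (1): theta is measured from the red axis.
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + eps
    theta = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
    h = np.where(b <= g, theta, 360.0 - theta)

    # Saturation, (2), and intensity, (3).
    s = 1.0 - 3.0 * np.minimum(np.minimum(r, g), b) / (r + g + b + eps)
    i = (r + g + b) / 3.0

    return np.stack([h, s, i], axis=-1)
```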

B. Quaternion Fourier Transform for FDA-SRD Method

Quaternions were first described by the Irish mathematician William Rowan Hamilton and applied to mechanics in 3-D space. The first definition of a quaternion Fourier transform was given by Ell [11], and the first application of a quaternion Fourier transform to color images was reported by Sangwine [12] using a discrete version of Ell's transform.

First of all, the remote sensing image is transformed to HSI space by applying the HSI transform method. Next, using pure quaternions, a remote sensing image can be represented in quaternion form as follows:

f(n,m) = H(n,m)\mu_1 + S(n,m)\mu_2 + I(n,m)\mu_3    (4)

where (n,m) is the location of each pixel, and H(n,m), S(n,m), and I(n,m) are the hue, saturation, and intensity components of the pixel, respectively. A generalized complex operator is employed in the quaternion Fourier transform; the choice of μ is arbitrary, but it has consequences, with μ3 = μ1μ2, μ1 ⊥ μ2, μ2 ⊥ μ3, and μ3 ⊥ μ1. f(n,m) is represented in symplectic form as follows:

\begin{cases}
f(n,m) = f_1(n,m) + f_2(n,m)\mu_2 \\
f_1(n,m) = H(n,m)\mu_1 \\
f_2(n,m) = S(n,m) + I(n,m)\mu_1.
\end{cases}    (5)

The quaternion Fourier transform is calculated as follows:

F(u,v) = F_1(u,v) + F_2(u,v)\mu_2    (6)

F_i(u,v) = \frac{1}{\sqrt{MN}} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} e^{-\mu_1 2\pi\left(\frac{mv}{M} + \frac{nu}{N}\right)} f_i(n,m).    (7)

For i ∈ {1, 2}, f_i(n,m) is calculated by (5). The size of the image is M × N. The frequency domain representation of f(n,m) is F(u,v). F(u,v) can be represented in polar coordinates as follows:

F(u,v) = |F(u,v)|\, e^{\mu \Phi(u,v)}    (8)

where Φ(u,v) is the phase spectrum and |F(u,v)| is the amplitude of F(u,v).
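Because the exponential kernel in (6) and (7) involves only μ1, each F_i can be computed with an ordinary complex 2-D FFT once μ1 is identified with the imaginary unit. The sketch below illustrates this symplectic implementation under that reading; the function names and the orthonormal FFT scaling (matching the 1/√MN factor) are our own assumptions, not the authors' code.

```python
import numpy as np

def quaternion_fft(h: np.ndarray, s: np.ndarray, i: np.ndarray):
    """Symplectic quaternion FFT of f = H*mu1 + S*mu2 + I*mu3, see (4)-(7).

    f is split as f1 + f2*mu2 with f1 = H*mu1 and f2 = S + I*mu1; since the
    kernel exp(-mu1*2*pi*(...)) involves only mu1, each part reduces to a
    standard complex 2-D DFT with mu1 identified with 1j.
    """
    f1 = 1j * h              # f1(n, m) = H(n, m) * mu1
    f2 = s + 1j * i          # f2(n, m) = S(n, m) + I(n, m) * mu1
    F1 = np.fft.fft2(f1, norm="ortho")   # "ortho" gives the 1/sqrt(MN) scaling of (7)
    F2 = np.fft.fft2(f2, norm="ortho")
    return F1, F2

def quaternion_amplitude(F1: np.ndarray, F2: np.ndarray) -> np.ndarray:
    """Quaternion modulus |F(u, v)| of (8): sqrt(|F1|^2 + |F2|^2)."""
    return np.sqrt(np.abs(F1) ** 2 + np.abs(F2) ** 2)
```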


Fig. 2. ROI extraction.

The phase spectrum preserves many of the important features of an image, such as edge information [13]. However, its ability to highlight features is limited, so the amplitude spectrum is boosted further using a Butterworth high-pass filter (BHPF) with high-frequency emphasis to strengthen the high-frequency information. A 2-D BHPF of order n and cutoff frequency D_0 is defined as

H(u,v) = \frac{1}{1 + \left[D_0/D(u,v)\right]^{2n}}    (9)

where D(u,v) = \left[(u - M/2)^2 + (v - N/2)^2\right]^{1/2} is the distance between a point (u,v) in the frequency domain and the center of the frequency rectangle. In our model, D_0 = 4% of the image size and n = 1. The formulation of high-frequency-emphasis filtering is defined as

H_e(u,v) = \delta H(u,v)    (10)

where \delta = K \Big/ \left( \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \left(S_{\mathrm{MAP}}(n,m)\right)^2 \big/ (M \times N) \right). Through testing, a suitable value of K is 0.16. S_MAP(n,m) is the initial saliency map obtained with δ = 1. The inverse Fourier transform is the following:

f_i(n,m) = \frac{1}{\sqrt{MN}} \sum_{v=0}^{M-1} \sum_{u=0}^{N-1} e^{\mu_1 2\pi\left(\frac{mv}{M} + \frac{nu}{N}\right)} F_i(u,v).    (11)

The output image can then be constructed in the spatial domain as

f'(n,m) = x(n,m)\mu_1 + y(n,m)\mu_2 + z(n,m)\mu_3.    (12)

f'(n,m) is converted to grayscale to become the initial saliency map.
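The following sketch ties (8)–(12) together under our reading of the method: the amplitude spectrum is re-weighted by the BHPF of (9) scaled by δ as in (10), the phase spectrum is left untouched, and the inverse transform of (11) gives the spatial output whose magnitude becomes the initial saliency map. The parameter values follow the letter (D_0 = 4% of the image size, n = 1); the helper names and the exact way the emphasis is applied to the spectrum are assumptions.

```python
import numpy as np

def butterworth_highpass(M: int, N: int, d0: float, n: int = 1) -> np.ndarray:
    """BHPF of (9), centered on the frequency rectangle."""
    u = np.arange(M) - M / 2.0
    v = np.arange(N) - N / 2.0
    D = np.sqrt(u[:, None] ** 2 + v[None, :] ** 2)
    D = np.maximum(D, 1e-8)          # avoid a divide-by-zero warning at the DC term
    return 1.0 / (1.0 + (d0 / D) ** (2 * n))

def initial_saliency_map(F1, F2, delta: float = 1.0, d0_ratio: float = 0.04, n: int = 1):
    """High-frequency emphasis (10) followed by the inverse transform (11)-(12).

    Multiplying F_i by the real, non-negative filter He = delta * H rescales the
    amplitude |F(u, v)| while leaving the phase spectrum unchanged.
    """
    M, N = F1.shape
    # "4% of the image size" is interpreted here as d0 = 0.04 * max(M, N).
    He = delta * np.fft.ifftshift(butterworth_highpass(M, N, d0_ratio * max(M, N), n))
    f1 = np.fft.ifft2(F1 * He, norm="ortho")
    f2 = np.fft.ifft2(F2 * He, norm="ortho")
    # Grayscale initial saliency map: magnitude of the reconstructed quaternion image.
    return np.sqrt(np.abs(f1) ** 2 + np.abs(f2) ** 2)
```

Following (10), one would first call initial_saliency_map with delta = 1 to obtain S_MAP, set delta = 0.16 / mean(S_MAP**2), and then call it again to obtain the emphasized map.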

C. ROI Extraction

Fig. 2 shows the steps of ROI extraction. A Gaussian pyramid is used to generate the final saliency map. Gaussian pyramid decomposition generates a series of images in which each image is a low-pass-filtered copy of its predecessor. The low-pass filtering is performed via convolution with a Gaussian filter kernel followed by a down-sampling operator [14]. Let the original image be g_0, which comprises C columns and R rows of pixels. The Gaussian pyramid algorithm can be described as follows:

g_k = \mathrm{Re}(g_{k-1}).    (13)

For levels 0 < k < N and nodes i, j, with 0 ≤ i < C_k and 0 ≤ j < R_k, N is the number of levels in the pyramid, while C_k and R_k are the dimensions of the kth level. The kth level image within a 5-by-5 window can be identified as follows:

g_k(i,j) = \sum_{m=-2}^{2} \sum_{n=-2}^{2} w(m,n)\, g_{k-1}(2i+m,\, 2j+n)    (14)

where the generating kernel w(m,n) satisfies w(m,n) = w(m)w(n), and w must meet the normalization, symmetry, and equal-contribution constraints. The equivalent weighting functions of a generating kernel are similar to Gaussian functions. Due to the reduced sample density, g_k is smaller than g_{k-1} by a scale factor of 1/2. This decomposition algorithm generates the final saliency map. Our model uses a Gaussian filter kernel with standard deviation 3.5 and size 10.
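A minimal sketch of one REDUCE step of (13) and (14) is given below. For illustration it uses the classic separable 5-tap generating kernel of Burt and Adelson [14] (with a = 0.4, which approximates a Gaussian) rather than the size-10, σ = 3.5 Gaussian kernel reported for the model; the function names are our own.

```python
import numpy as np
from scipy.ndimage import convolve

def reduce_level(g_prev: np.ndarray, a: float = 0.4) -> np.ndarray:
    """One REDUCE step of (13)-(14): separable low-pass filtering followed by
    subsampling by 2 in each dimension."""
    # 5-tap generating kernel w(m); it satisfies the normalization, symmetry,
    # and equal-contribution constraints and is close to a Gaussian for a = 0.4.
    w1d = np.array([0.25 - a / 2.0, 0.25, a, 0.25, 0.25 - a / 2.0])
    w = np.outer(w1d, w1d)                        # w(m, n) = w(m) * w(n)
    blurred = convolve(g_prev, w, mode="nearest")
    # g_k(i, j) = sum_{m,n} w(m, n) g_{k-1}(2i + m, 2j + n)
    return blurred[::2, ::2]

def gaussian_pyramid(g0: np.ndarray, levels: int) -> list:
    """Build g_0, ..., g_{levels-1}; coarser levels smooth the saliency map."""
    pyramid = [g0]
    for _ in range(1, levels):
        pyramid.append(reduce_level(pyramid[-1]))
    return pyramid
```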

After applying the Gaussian pyramid, Otsu's method is used to identify the threshold [15]. An image with N pixels in total can be represented in gray levels [1, 2, . . . , L]. The number of pixels at level i is denoted by n_i. The probability of level i is as follows:

\begin{cases}
p_i = n_i / N, & i = 1, 2, \ldots, L \\
\sum_{i=1}^{L} p_i = 1.
\end{cases}    (15)

The pixels are divided into two classes, A and B (background and objects, or vice versa), by a threshold at level k. A comprises pixels at levels [1, . . . , k], and B comprises pixels at levels [k + 1, . . . , L]. The probabilities for class occurrence are given by the following:

\begin{cases}
\omega_A = \sum_{i=1}^{k} p_i = \omega(k) \\
\omega_B = \sum_{i=k+1}^{L} p_i = 1 - \omega(k).
\end{cases}    (16)

The class mean levels are calculated by the following:

\begin{cases}
\lambda_A = \sum_{i=1}^{k} i\,p_i / \omega_A = \dfrac{\lambda(k)}{\omega(k)} \\
\lambda_B = \sum_{i=k+1}^{L} i\,p_i / \omega_B = \dfrac{\lambda_T - \lambda(k)}{1 - \omega(k)}
\end{cases}    (17)

where \lambda(k) = \sum_{i=1}^{k} i\,p_i and \lambda_T = \sum_{i=1}^{L} i\,p_i. \lambda_T is the average of the gray values for the entire image, and \lambda_T = \omega_A\lambda_A + \omega_B\lambda_B.

The variance between the two classes is as follows:

\sigma^2(k) = \frac{\left[\lambda_T\,\omega(k) - \lambda(k)\right]^2}{\omega(k)\left[1 - \omega(k)\right]}.    (18)

Using maximum variance theory, the optimal threshold k^* is

k^* = \arg\max_{1 \le k < L} \sigma^2(k).    (19)

Then, this threshold transforms the saliency map into a binary mask, where ones indicate the ROIs. The final detection result (the ROIs) is generated by multiplying the mask with the original image.
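A compact sketch of (15)–(19) and of the final masking step is given below; the normalization of the saliency map to [0, 1], the use of 256 gray levels, and the function names are our own choices.

```python
import numpy as np

def otsu_threshold(saliency: np.ndarray, levels: int = 256) -> float:
    """Otsu threshold per (15)-(19): maximize the between-class variance sigma^2(k)."""
    smap = saliency - saliency.min()
    smap = smap / (smap.max() + 1e-12)                               # normalize to [0, 1]
    hist, bin_edges = np.histogram(smap, bins=levels, range=(0.0, 1.0))
    p = hist.astype(float) / smap.size                               # p_i in (15)
    i = np.arange(1, levels + 1)

    omega = np.cumsum(p)                      # omega(k) in (16)
    lam = np.cumsum(i * p)                    # lambda(k) in (17)
    lam_T = lam[-1]                           # mean gray level of the whole image

    denom = omega * (1.0 - omega)
    denom[denom == 0] = np.inf                # ignore degenerate splits
    sigma2 = (lam_T * omega - lam) ** 2 / denom                      # (18)
    k_star = int(np.argmax(sigma2))                                  # (19)
    return bin_edges[k_star + 1]

def extract_roi(image: np.ndarray, saliency: np.ndarray) -> np.ndarray:
    """Binary mask from the final saliency map, multiplied with the original image."""
    smap = saliency - saliency.min()
    smap = smap / (smap.max() + 1e-12)
    mask = smap > otsu_threshold(saliency)
    return image * (mask[..., None] if image.ndim == 3 else mask)
```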

Region growing is a procedure that groups pixels or subregions into larger regions based on predefined criteria for growth. However, it consumes a lot of time and cannot detect some effective salient areas. The larger the proportion of ROIs, the longer region growing takes. Although region growing provides a good description of the ROIs, it becomes the most time-consuming part of such models.

To establish an accurate description, an adaptive threshold segmentation algorithm based on Gaussian pyramids is used in this letter. It solves the “fragments” and “intra-regional black spots” problems of the traditional single-threshold method, and it is highly efficient. The time required by the different methods is shown in Table I.

TABLE I
TIME REQUIRED TO DESCRIBE ROIs USING DIFFERENT METHODS (s)

Fig. 3. Saliency map and ROI description of each model for BJ1.

III. EXPERIMENTAL RESULTS AND DISCUSSION

To evaluate the performance of the proposed model, several experiments were conducted using selected remote sensing images. The experiments were designed to compare the overall performance of the Itti model, the Achanta model, the SR model, the GBVS model, and the FDA-SRD model, including qualitative and quantitative evaluations.

Our method is mainly applied to high-resolution remote sensing images. BJ1 and BJ2 were taken by the SPOT5 satellite, which offers a resolution of 2.5 m. BJ3 was taken by GeoEye-1, which provides a resolution of 1 m. These images came from rural areas of Beijing. In these images, the rural residential regions are defined as the ROIs and should be detected primarily. These regions typically exhibit one or more of the following characteristics: rich edge and texture features, high brightness, and prominent colors. The tests were conducted on a Windows platform. The PC was equipped with an Intel Pentium G630 CPU and 8 GB of memory. All the images were 2048 × 2048 pixels.

The implementations of the other methods, which generate their saliency maps, can be downloaded from [16]–[18]. We reproduce the ROI extraction process of each method following the approaches the authors describe in their papers.

A. Qualitative Experiment

The saliency map comparisons and the ROI description comparisons of each model for BJ1, BJ2, and BJ3 are shown in Figs. 3–5, respectively. BJ1 and BJ2 are high-contrast images, whereas BJ3 has low contrast in the intensity component. In Figs. 3 and 4, each method can basically detect the ROIs under high intensity contrast, but the FDA-SRD method presents a more accurate description of the ROIs. In Fig. 5, although the intensity contrast of BJ3 is low, the detection results of the FDA-SRD model are consistent with the ROIs in the original image and include little background information. The model of Achanta et al., the SR model, and the GBVS model fail to detect the salient regions fully, and the regions detected by the model of Itti et al. contain too much background information. The experimental results show that our method can be used not only for high-contrast images but also for low-contrast images; the strength of the FDA-SRD model is readily apparent.

Fig. 4. Saliency map and ROI description of each model for BJ2.

Fig. 5. Saliency map and ROI description of each model for BJ3.

Fig. 6. Time trends for computing the saliency map with different models.

B. Quantitative Experiment

The processing times of the different models for different remote sensing image sizes are shown in Fig. 6 and Table II. Fig. 6 plots the time trends for computing the saliency map with the different models, and Table II details the saliency-map computation time for different image sizes. Compared with the traditional models, the time used by FDA-SRD is shorter than that of the models of Itti et al., Achanta et al., and GBVS. In the SR model, the log spectrum L(f) is computed from an image down-sampled to a height (or width) of 64 pixels; thus, its runtime is short and changes little with image size. Overall, our method is very fast at dealing with mass-data remote sensing images.

TABLE II
TIME COMPARISON OF COMPUTING THE SALIENCY MAP FOR DIFFERENT IMAGE SIZES AMONG DIFFERENT MODELS (s)

Fig. 7. ROC curve.

Fig. 8. Precision (P), recall (R), and F-measure (F).

In addition, the receiver operating characteristic (ROC) curve and the precision (P), recall (R), and F-measure (F) are used to compare the performance of the five models quantitatively; the results are shown in Figs. 7 and 8, respectively. Twenty randomly selected image fragments of 2048 × 2048 pixels are used as the image database. For each image, a manually segmented map is generated as the ground truth.

The ROC curves are generated by classifying the locations in a saliency map into salient and non-salient regions under varying quantization thresholds. It can be seen that the FDA-SRD model has the best performance among the five models. The precision and recall measures are particularly meaningful in the context of boundary detection when we consider applications that make use of boundary maps, such as stereo or object recognition. It is reasonable to characterize higher level processing in terms of how much true signal is required to succeed, R (recall), and how much noise can be tolerated, P (precision) [19]. The F-measure (F) is calculated as

F = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.    (20)

From these results, we find that the FDA-SRD model performs better than the other models.
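A small sketch of how (20) and the ROC points can be computed for one image is given below, assuming a binary ROI mask and a manually segmented ground-truth mask; the function names and the threshold sweep granularity are our own.

```python
import numpy as np

def precision_recall_f(pred_mask: np.ndarray, gt_mask: np.ndarray, eps: float = 1e-12):
    """Precision, recall, and the F-measure of (20) for a binary ROI mask."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f = 2.0 * precision * recall / (precision + recall + eps)        # (20)
    return precision, recall, f

def roc_points(saliency: np.ndarray, gt_mask: np.ndarray, num_thresholds: int = 100):
    """(FPR, TPR) pairs from sweeping quantization thresholds over the saliency map."""
    smap = saliency - saliency.min()
    smap = smap / (smap.max() + 1e-12)
    gt = gt_mask.astype(bool)
    points = []
    for t in np.linspace(0.0, 1.0, num_thresholds):
        pred = smap >= t
        tpr = np.logical_and(pred, gt).sum() / (gt.sum() + 1e-12)
        fpr = np.logical_and(pred, ~gt).sum() / ((~gt).sum() + 1e-12)
        points.append((fpr, tpr))
    return points
```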

IV. CONCLUSION

The FDA-SRD model is proposed and validated in this letter. The input image is subsampled and then transformed to the HSI color space to increase the efficiency of further processing. To compute the saliency map, a strategy based on the quaternion Fourier transform is proposed, so frequency domain information is taken into account. The FDA-SRD model uses an adaptive threshold segmentation algorithm based on Gaussian pyramids to generate the detected ROIs, which is much more accurate. Generally speaking, the detection results of the FDA-SRD model are visually and statistically satisfactory. The model addresses the problem of computational efficiency in remote sensing image processing and meets the time requirements.

REFERENCES

[1] Z. Li and L. Itti, “Saliency and gist features for target detection in satellite images,” IEEE Trans. Image Process., vol. 20, pp. 2017–2029, Jul. 2011.

[2] L. Itti, C. Koch, and E. Niebur, “A model of saliency-based visual attention for rapid scene analysis,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 11, pp. 1254–1259, Nov. 1998.

[3] C. Koch and S. Ullman, “Shifts in selective visual attention: Towards the underlying neural circuitry,” Human Neurobiol., vol. 4, no. 4, p. 219, 1985.

[4] D. Dai and W. Yang, “Satellite image classification via two-layer sparse coding with biased image representation,” IEEE Geosci. Remote Sens. Lett., vol. 8, no. 1, pp. 173–176, Jan. 2011.

[5] L. Zhang, H. Li, P. Wang, and X. Yu, “Detection of regions of interest in a high-spatial-resolution remote sensing image based on an adaptive spatial subsampling visual attention model,” GISci. Remote Sens., vol. 50, no. 1, pp. 112–132, 2013.

[6] R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk, “Frequency-tuned salient region detection,” in Proc. IEEE Conf. CVPR, 2009, pp. 1597–1604.

[7] X. Hou and L. Zhang, “Saliency detection: A spectral residual approach,” in Proc. IEEE Conf. CVPR, 2007, pp. 1–8.

[8] J. Harel, C. Koch, and P. Perona, “Graph-based visual saliency,” in Proc. Adv. Neural Inf. Process. Syst., 2007, pp. 545–552.

[9] C. Guo and L. Zhang, “A novel multiresolution spatiotemporal saliency detection model and its applications in image and video compression,” IEEE Trans. Image Process., vol. 19, no. 1, pp. 185–198, Jan. 2010.

[10] S. Rahmani, M. Strait, D. Merkurjev, M. Moeller, and T. Wittman, “An adaptive IHS pan-sharpening method,” IEEE Geosci. Remote Sens. Lett., vol. 7, no. 4, pp. 746–750, Oct. 2010.

[11] T. A. Ell, “Quaternion-Fourier transforms for analysis of two-dimensional linear time-invariant partial differential systems,” in Proc. 32nd IEEE Conf. Decision Control, 1993, pp. 1830–1841.

[12] S. J. Sangwine, “Fourier transforms of colour images using quaternion or hypercomplex numbers,” Electron. Lett., vol. 32, no. 21, pp. 1979–1980, Oct. 1996.

[13] A. V. Oppenheim and J. S. Lim, “The importance of phase in signals,” Proc. IEEE, vol. 69, no. 5, pp. 529–541, May 1981.

[14] P. Burt and E. Adelson, “The Laplacian pyramid as a compact image code,” IEEE Trans. Commun., vol. COM-31, pp. 532–540, Apr. 1983.

[15] N. Otsu, “A threshold selection method from gray-level histograms,” IEEE Trans. Syst., Man, Cybern., vol. SMC-9, no. 1, pp. 62–66, Jan. 1979.

[16] [Online]. Available: http://www.klab.caltech.edu/~xhou/projects/spectralResidual/spectral-residual.html

[17] [Online]. Available: http://ivrgwww.epfl.ch/supplementary_material/RK_CVPR09/index.html

[18] [Online]. Available: http://www.klab.caltech.edu/~harel/share/gbvs.php

[19] D. R. Martin, C. C. Fowlkes, and J. Malik, “Learning to detect natural image boundaries using local brightness, color, and texture cues,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 5, pp. 530–549, May 2004.