
IEIE Transactions on Smart Processing and Computing, vol. 3, no. 2, April 2014 http://dx.doi.org/10.5573/IEIESPC.2014.3.2.41


Minimum Statistics-Based Noise Power Estimation for Parametric Image Restoration

Yoonjong Yoo1, Jeongho Shin2, and Joonki Paik3

1 Image Processing and Intelligent Systems Laboratory, Department of Advanced Imaging, Graduate School of Advanced Imaging Science, Multimedia, and Film, Chung-Ang University / Seoul, Korea [email protected]

2 Department of Web Information Engineering, Hankyong University / Gyeonggi, Korea [email protected]

3 Image Processing and Intelligent Systems Laboratory, Department of Advanced Imaging, Graduate School of Advanced Imaging Science, Multimedia, and Film, Chung-Ang University / Seoul, Korea [email protected]

* Corresponding Author: Joonki Paik

Received November 20, 2013; Revised December 27, 2013; Accepted February 12, 2014; Published April 30, 2014

* Regular Paper

Abstract: This paper describes a method to estimate noise power using the minimum statistics approach, which was originally proposed for audio processing. The proposed minimum statistics-based method separates a noisy image into multiple frequency bands using the three-level discrete wavelet transform. By assuming that the output of the high-pass filter contains both signal detail and noise, the proposed algorithm extracts the region of pure noise from the high-frequency band using an appropriate threshold. The region of pure noise, which is free from both the signal detail and the DC component, satisfies the minimum statistics condition, so the noise power can be extracted easily. The proposed algorithm significantly reduces the computational load through a simple, non-iterative processing architecture, with an estimation accuracy greater than 90% for strong noise at 0 to 40 dB SNR of the input image. Furthermore, a well-restored image can be obtained by feeding the estimated noise power into parametric image restoration algorithms, such as the classical parametric Wiener or ForWaRD restoration filters. The experimental results show that the proposed algorithm estimates the noise power accurately, and is particularly suitable for fast, low-cost image restoration or enhancement applications.

Keywords: Power Estimation, Noise Estimation, Restoration

1. Introduction

Image restoration estimates the original undistorted image from an observed image using an inverse operation of various image degradation factors in an imaging system, such as out-of-focus blur, motion blur, and atmospheric turbulence. In particular, based on the image degradation model, the image restoration process is considered a deconvolution of the point spread function (PSF) or a statistical inverse problem (SIP). Because image restoration is almost always an ill-posed problem, its solution either does not exist or is not unique. This results in a set of feasible solutions rather than a unique one. A priori information about the original image is most widely used to select the best solution from the set. This type of image restoration approach includes the Wiener filter, the constrained least squares (CLS) filter, and iterative regularization, all of which fall into the category of regularized image restoration [1].

A parametric Wiener filter approximates the noise-to-signal power ratio (NSPR) with a constant for simplified, efficient realization. On the other hand, the CLS filter and iterative regularization incorporate a priori information as a regularization constraint in the solution. The relative strength of the regularization constraint is controlled by a regularization parameter. The abovementioned image restoration methods are called parametric image restoration because they use a single parameter to control the amount of regularization. Among the various sophisticated image restoration approaches available, parametric image restoration is particularly suitable for efficient, realistic applications, such as digital auto-focusing and motion blur removal using an embedded processor or a system on chip (SoC). For the successful deployment of parametric image restoration in a real imaging system, an accurate estimation of the noise power is of paramount importance because the regularization parameter is directly related to the noise power.

Although many image denoising methods for image enhancement have been proposed, their application to parametric image restoration shows limited performance because they mainly extract the original signal instead of estimating the noise power [2, 3]. On the other hand, blind deconvolution is considered the most fundamental approach to estimating the original image without information on the image degradation model and noise characteristics. A blind deconvolution method models the observed image as an auto-regressive moving-average (ARMA) process. In this method, the modeling error of the auto-regressive (AR) process is considered noise, and the parameters of the moving-average (MA) process are considered image degradation factors [4-6]. Most blind deconvolution methods, however, cannot guarantee consistent performance over various images and noise characteristics.

A practical approach that estimates the regularization parameter, instead of directly obtaining the noise power, uses either the discrepancy principle [7] or generalized cross-validation [8]. These methods require a high computational load due to the nature of the iterative estimation, and cannot be considered a solution for practical, real-time image restoration. Another method to estimate the optimal regularization parameter uses the L-curve. Each point on the L-curve in the two-dimensional (2D) coordinate system represents the residual error energy and the signal energy calculated using the corresponding regularization parameter. Because this curve generally has an "L" shape, the regularization parameter corresponding to the point of highest curvature is considered the optimal value [9]. The U-curve method is a variant of the L-curve method [10]. Both the L-curve and U-curve methods require a large number of image restoration runs with various regularization parameters, and are unsuitable for practical applications.

Modeling or analyzing noise is a fundamental problem in the signal processing area [11]. In speech signal processing, the minimum statistics (MS) method is widely used to estimate ambient noise power under the assumption that there are pauses between words in human speech [12]. In other words, a speech-absent period contains only noise components, and its signal power is estimated using a windowed Fourier transform to obtain the noise power.

To apply the MS approach to noise estimation in an image, it is important to note that there is no signal-absent region in a general image, but that a flat region can play the role of an absent period once the direct current (DC) component is removed. In a flat region, the low-frequency component contains the DC component of the signal, and the high-frequency component contains only noise. In this context, the proposed noise power estimation method uses a discrete wavelet transform to decompose the image into different frequency bands while preserving spatial locations. The multi-resolution wavelet transform can extract the flat regions, and the noise power is estimated by calculating the signal energy of the corresponding high-frequency component.

This paper is organized as follows. Section 2 explains the image degradation model, including the point spread function (PSF) and noise, and briefly presents the basic terminology for the multiresolution discrete wavelet transform. Section 3 presents the proposed MS-based noise power estimation method for parametric image restoration. Section 4 summarizes the experimental results of noise power estimation and parametric image restoration, and section 5 concludes the paper.

2. Theoretical Background

2.1 Image Degradation Model

In digital image processing, the general discrete model for linear degradation caused by blurring and additive noise can be expressed as the following superposition summation:

    y(i, j) = Σ_{k=1}^{M} Σ_{l=1}^{N} h(i, j; k, l) · f(k, l) + n(i, j),    (1)

where f(i, j) represents an original M × N image, h(i, j; k, l) is the 2D PSF of the imaging system, and n(i, j) is additive noise, which is normally modeled as a white Gaussian process. In this paper, the PSF is assumed to be linear space-invariant (LSI), so that (1) becomes

    y(i, j) = Σ_{k=1}^{M} Σ_{l=1}^{N} h(i, j; k, l) · f(k, l) + n(i, j)
            = h(i, j) ** f(i, j) + n(i, j),    (2)

where ** indicates 2D convolution. In the LSI image degradation model in (2), the PSF does not change over the entire image. A major advantage of the LSI model is that it significantly reduces the computational load through frequency-domain processing.
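The LSI model of (2) can be simulated directly. The following sketch is an illustration, not the authors' code; the choice of a 5×5 uniform PSF and the use of circular (FFT-based) convolution are assumptions made here for simplicity:

```python
import numpy as np

def degrade(f, psf, noise_var, rng=None):
    """Simulate y = h ** f + n of Eq. (2), using circular convolution
    via the FFT (the frequency-domain processing mentioned in the text)."""
    rng = np.random.default_rng() if rng is None else rng
    H = np.fft.fft2(psf, s=f.shape)                 # PSF frequency response
    blurred = np.real(np.fft.ifft2(H * np.fft.fft2(f)))
    noise = rng.normal(0.0, np.sqrt(noise_var), f.shape)  # white Gaussian n(i, j)
    return blurred + noise

# 5x5 uniform (out-of-focus-like) blur, similar to the experiments below
psf = np.ones((5, 5)) / 25.0
```

Because the blur is normalized to unit sum, the degradation preserves the mean intensity of the image up to the added noise.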

2.2 Discrete Wavelet Transform

The discrete wavelet transform passes the input signal through a series of filters. A signal can be decomposed into a set of band-limited components, called sub-bands, which can be reassembled to reconstruct the original signal without error. As shown in Fig. 1, both sub-bands, which are the outputs of h_0(n) and h_1(n), can be downsampled without any loss of information because their bandwidth is smaller than that of the original signal. h_0(n) and h_1(n) are called analysis filters, and g_0(n) and g_1(n) are called synthesis filters. The output of the low-pass filter h_0(n) represents an approximation of x(n), and the output of the high-pass filter h_1(n) represents the detail part of x(n).

The reconstructed signal x̂(n) is obtained by adding the upsampled and filtered versions of y_0(n) and y_1(n). For


error-free reconstruction, x̂(n) = x(n), the following conditions must be satisfied:

    H_0(−z)G_0(z) + H_1(−z)G_1(z) = 0,
    H_0(z)G_0(z) + H_1(z)G_1(z) = 2,    (3)

where H_i(z) and G_i(z), i ∈ {0, 1}, respectively represent the z-transforms of h_i(n) and g_i(n). (3) is called the condition for perfect reconstruction.

After some algebraic steps, the conditions in (3) can be expressed as the following biorthogonality constraint:

    ⟨h_i(2n − k), g_j(k)⟩ = δ(i − j) δ(n),  i, j ∈ {0, 1},    (4)

which is imposed on the analysis and synthesis filter impulse responses of all two-band, real-coefficient, perfect reconstruction filter banks.

If the filters are further constrained to be orthonormal, such that

    ⟨g_i(n), g_j(n + 2m)⟩ = δ(i − j) δ(m),  i, j ∈ {0, 1},    (5)

then, given a synthesis low-pass filter g_0, the impulse responses h_0, h_1, and g_1 can be determined as follows:

    g_1(n) = (−1)^n g_0(2K − 1 − n),
    h_i(n) = g_i(2K − 1 − n),  i ∈ {0, 1},    (6)

where 2K is the length of each filter.
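The relations in (6) can be checked numerically. A minimal sketch (assuming the orthonormal Haar synthesis low-pass filter, i.e. K = 1, which is not prescribed by the text) derives h_0, h_1, and g_1 from g_0 and lets the orthonormality condition (5) be verified for m = 0:

```python
import numpy as np

def derive_filters(g0):
    """Given an orthonormal synthesis low-pass filter g0 of length 2K,
    derive g1, h0, and h1 via Eq. (6)."""
    L = len(g0)                      # L = 2K
    n = np.arange(L)
    g1 = ((-1.0) ** n) * g0[L - 1 - n]   # g1(n) = (-1)^n g0(2K-1-n)
    h0 = g0[L - 1 - n]                   # h_i(n) = g_i(2K-1-n)
    h1 = g1[L - 1 - n]
    return h0, h1, g1

# Haar example: g0 = [1/sqrt(2), 1/sqrt(2)]
g0 = np.array([1.0, 1.0]) / np.sqrt(2.0)
h0, h1, g1 = derive_filters(g0)
```

For the Haar case, g_1 = [1/√2, −1/√2], and the inner products g_0·g_0 = g_1·g_1 = 1 and g_0·g_1 = 0 confirm (5) at m = 0.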

3. Multiresolution Analysis for Noise Power Estimation

As stated in section 2, the DWT can effectively search for flat areas within an image. On the other hand, the coefficients in each high-pass sub-band contain both signal detail and noise. After the decomposition, they can be classified into signal detail and pure noise parts. To this end, the input image in (2) is expressed as a sum of low-pass and high-pass filtered signals,

    y(i, j) = h(i, j) ** f_low(i, j) + h(i, j) ** f_high(i, j) + n(i, j),    (7)

where f_high and f_low represent the high-frequency and low-frequency parts, respectively. By applying the DWT, (7) can be decomposed as

    y_low(i, j) = h(i, j) ** f_low(i, j), and
    y_high(i, j) = h(i, j) ** f_high(i, j) + n(i, j).    (8)

Because the noise is a random signal, most of n(i, j) is included in the high-frequency sub-bands of the DWT. Using the 3-level DWT, the location of the detail edges, h(i, j) ** f_high(i, j), is detected, and the noise power can be estimated by excluding the edge region.

As shown in Fig. 2, the proposed noise power estimation algorithm is composed of three steps: the 3-level DWT, edge-map generation, and signal power estimation.

3.1 DWT Decomposition

The proposed algorithm first performs the 3-level DWT, as shown in Fig. 3. Multiresolution analysis becomes possible by alternating wavelet transforms along each direction. In each subsequent decomposition, the approximation sub-band, which is the sub-image located at the upper-left corner of the previous decomposition, becomes the input for the next-level DWT. Each decomposition produces four quarter-size output images that are arranged as shown in Fig. 3 and substituted for the input from which they were derived. Based on both theoretical and experimental observations, the 2nd and 3rd level DWTs can separate the pure noise region from the meaningful entities of the original image. In particular, the 1st level diagonal detail component contains both meaningful edges and noise,

Fig. 1. Two-band filter bank overview.

Fig. 2. Block diagram of the proposed algorithm.

(a) (b)

Fig. 3. 3-level DWT: (a) the pyramid of DWT sub-bands, (b) corresponding example using the Lena image.
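The pyramid decomposition of Fig. 3 can be sketched in a few lines. This is a simplified stand-in, not the paper's implementation; the separable orthonormal Haar filters used here are an assumption, since this section does not fix the wavelet basis:

```python
import numpy as np

def haar_dwt2(x):
    """One level of the separable 2-D Haar DWT: returns (LL, LH, HL, HH)."""
    # Rows: low-pass / high-pass followed by downsampling by 2
    a = (x[0::2, :] + x[1::2, :]) / np.sqrt(2.0)
    d = (x[0::2, :] - x[1::2, :]) / np.sqrt(2.0)
    # Columns: same filtering applied to each row output
    LL = (a[:, 0::2] + a[:, 1::2]) / np.sqrt(2.0)   # approximation
    LH = (a[:, 0::2] - a[:, 1::2]) / np.sqrt(2.0)   # detail bands
    HL = (d[:, 0::2] + d[:, 1::2]) / np.sqrt(2.0)
    HH = (d[:, 0::2] - d[:, 1::2]) / np.sqrt(2.0)
    return LL, LH, HL, HH

def dwt2_pyramid(x, levels=3):
    """3-level pyramid: the approximation band is fed back in at each level."""
    bands = []
    for _ in range(levels):
        x, LH, HL, HH = haar_dwt2(x)
        bands.append((LH, HL, HH))
    return x, bands
```

Because the Haar filters are orthonormal, each level preserves the total signal energy, and each band is a quarter the size of its input, matching the arrangement in Fig. 3(a).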


among which only the edge regions are removed using the higher-level detail components.

3.2 Edge-map Generation in the DWT Space

This subsection describes how the edge map is generated to distinguish the flat regions in an image. In this study, flat region detection is motivated by the detection of speech-absent periods in speech processing. In general, it is difficult to detect edges using only the 1st level DWT when the image contains a substantial amount of noise. Therefore, the 2nd and 3rd level DWTs are also needed to extract the edges. The edge-maps in the 2nd and 3rd level detail coefficients are defined using the binarization function as follows:

    e_i[m, n] = 1, if x_i[m, n] > t,
              = 0, otherwise,        for i ∈ {V, H, D},    (9)

where x_V, x_H, and x_D respectively represent the absolute values of the vertical, horizontal, and diagonal detail coefficients in the DWT. The threshold values for each level are chosen experimentally as

    t_level-2 = (5 × max + 4 × min) / 9,
    t_level-3 = (5 × max + 3 × min) / 8,    (10)

where max and min represent the largest and smallest wavelet coefficients, respectively.

After the edge map values e_i are computed, they are integrated as follows:

    e_l(m, n) = e_V(m, n) ∪ e_H(m, n) ∪ e_D(m, n),  for l ∈ {2, 3}.    (11)

The wavelet transform reorganizes the image content into a low-resolution approximation and a set of details of different orientations and different scales. The sizes of these edge-maps are therefore smaller than the original image because of the down-sampling by 2 at each level, so they must be rescaled before they can be used to relocate the edges in the 1st level DWT. Nearest-neighbor interpolation is used to register the differently scaled edge maps.

Because the 3rd level DWT coefficients are generated by low-pass filtering the signal twice, the corresponding detail components contain only significant edges without random noise. Therefore, the resulting edge map can be expressed as

    e(m, n) = [e_2(m, n)]↑2 ∩ [e_3(m, n)]↑4,    (12)

where [e_2(m, n)]↑2 represents the 2nd level edge map upsampled by 2, and [e_3(m, n)]↑4 is the 3rd level edge map upsampled by 4.

A morphological dilation operation is applied near the edges to avoid the case where detail coefficients near sharp edges are misclassified as flat components. The dilation operation is expressed as

    e_M(m ± 1, n ± 1) = 1, if e(m, n) = 1,
                      = 0, otherwise.    (13)

The dilated edge-map provides a better chance to accurately locate the edges in the original image. Therefore, the pixels along the resulting edges are not used, and the remaining pixels in the flat region serve as the feasible region for noise power estimation.
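Steps (9) through (13) can be sketched as follows. This is a simplified illustration under the assumption that max and min in (10) are taken per detail band; the helper names are illustrative, not from the paper:

```python
import numpy as np

def band_threshold(coeffs, level):
    """Experimentally chosen thresholds of Eq. (10)."""
    hi, lo = np.abs(coeffs).max(), np.abs(coeffs).min()
    return (5 * hi + 4 * lo) / 9 if level == 2 else (5 * hi + 3 * lo) / 8

def edge_map(details, level):
    """Binarize (Eq. (9)) and union (Eq. (11)) the V/H/D detail bands."""
    e = np.zeros(details[0].shape, dtype=bool)
    for x in details:                 # (vertical, horizontal, diagonal)
        e |= np.abs(x) > band_threshold(x, level)
    return e

def upsample_nn(e, factor):
    """Nearest-neighbor upsampling to register maps of different scales."""
    return np.repeat(np.repeat(e, factor, axis=0), factor, axis=1)

def dilate(e):
    """3x3 morphological dilation of Eq. (13)."""
    p = np.pad(e, 1)
    out = np.zeros_like(e)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            out |= p[1 + di:1 + di + e.shape[0], 1 + dj:1 + dj + e.shape[1]]
    return out
```

The combined map of Eq. (12) is then `upsample_nn(e2, 2) & upsample_nn(e3, 4)`, and its dilation marks the pixels excluded from the flat region.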

3.3 Estimating the Noise Power

As mentioned in section 2, the noise is defined as a zero-mean Gaussian random signal with an autocorrelation given by

    r_NN(τ) = E[N(t) N(t + τ)] = σ² δ(τ),    (14)

where σ² represents the variance, E is the expectation operator, and δ(τ) is the unit impulse. The power spectrum of the noise is given by the Fourier integral as follows:

    P_NN(f) = ∫_{−∞}^{∞} r_NN(t) e^(−j2πft) dt = σ².    (15)

The power spectrum expression in (15) shows that the variance of a signal-absent region is equal to the constant noise power spectrum. To measure the noise variance, the flat region in the 1st level diagonal detail band is taken as follows:

    N(m, n) = x_D(m, n), if e_M(m, n) = 0,
            = 0,         otherwise,    (16)

where x_D represents the 1st level diagonal detail band, and N is the estimated set of noise samples in the signal-absent region. Finally, the noise power is computed as follows:

    σ² = (1 / TN) Σ_{m=1}^{I} Σ_{n=1}^{J} ( N(m, n) − E[N(m, n)] )²,
    TN = I × J − (total number of e_M(m, n) = 1),    (17)

where I and J respectively represent the vertical and horizontal sizes of the diagonal detail band in the 1st level DWT, and TN is the number of samples in the flat region.
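The estimator of (16) and (17) amounts to a masked sample variance over the diagonal detail band. A minimal sketch (the test signal in the usage is an assumption, chosen only to exercise the estimator):

```python
import numpy as np

def estimate_noise_power(xD, eM):
    """Eqs. (16)-(17): sample variance of the 1st-level diagonal detail
    band xD over the flat region, given the dilated edge map eM."""
    flat = xD[~eM]                 # noise samples N(m, n) where eM == 0
    TN = flat.size                 # TN = I*J - (number of edge pixels)
    return ((flat - flat.mean()) ** 2).sum() / TN
```

For pure Gaussian noise with standard deviation 2 and an empty edge map, the estimate converges to the true variance of 4 as the band grows.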

4. Experimental Results

This section presents the simulation results of the proposed noise power estimation algorithm and its application to parametric image restoration algorithms. A set of standard 256 × 256 8-bit grayscale images was tested, including Lena, Cameraman, Barbara, Boat, and Goldhill. These test images were degraded by simulating 3 to 11 pixel uniform blurs with additive white Gaussian noise of 0 to 70 dB SNR. The proposed method was then used to estimate the noise power for each image to evaluate the performance of the estimation method. Table 1 compares the real and estimated noise powers.

As shown in Table 1, the proposed estimation method gives accurate noise power values for SNRs from 0 to 30 dB, which is illustrated in Fig. 4 for the Lena image. The vertical axis of the graph in Fig. 4 represents the noise power on a log scale, and the horizontal axis indicates the SNR. For images with SNRs from 0 to 30 dB, the estimated noise powers were close to the real noise power regardless of the blur size. For images with an SNR of 40 dB or higher, however, the number of detected noise samples decreased significantly and the estimation accuracy degraded. Larger blur sizes provide a more accurate estimation at SNRs of 40 dB or higher because they suppress more edges in the original image.

Table 2 shows the image restoration results of the Wiener filter using the estimated noise power. Given the power spectrum of the original image, the Wiener filter theoretically minimizes the mean squared error between the original and restored images, with the frequency response

    W(u, v) = H*(u, v) / ( |H(u, v)|² + S_ηη(u, v) / S_xx(u, v) ),    (18)
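A minimal frequency-domain sketch of (18), assuming periodic (circular) boundary handling via the FFT and a scalar or array-valued NSPR; this is an illustration, not the authors' implementation:

```python
import numpy as np

def wiener_restore(y, psf, nspr):
    """Restore y with the Wiener filter of Eq. (18); nspr is the
    noise-to-signal power ratio S_nn/S_xx (a constant or an array)."""
    H = np.fft.fft2(psf, s=y.shape)              # PSF frequency response
    W = np.conj(H) / (np.abs(H) ** 2 + nspr)     # Eq. (18)
    return np.real(np.fft.ifft2(W * np.fft.fft2(y)))
```

With an identity PSF and zero NSPR the filter reduces to the identity, while a positive NSPR uniformly attenuates the result, which is the regularizing effect discussed above.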

Table 1. Results of noise estimation (variance of noise).

Image | Blur size | SNR 0 dB | 10 dB | 20 dB | 30 dB | 40 dB | 50 dB | 60 dB | 70 dB
Lena | Real value | 3239.667 | 323.9667 | 32.3967 | 3.2397 | 0.324 | 0.0324 | 0.0032 | 0.0003
Lena | 3x3 | 3210.103 | 328.7249 | 33.5885 | 4.4679 | 1.5785 | 1.2934 | 1.2687 | 1.2671
Lena | 5x5 | 3221.812 | 330.5618 | 32.8476 | 3.4774 | 0.5635 | 0.2797 | 0.2532 | 0.25
Lena | 7x7 | 3209.968 | 328.8707 | 32.4743 | 3.3669 | 0.4416 | 0.1513 | 0.1229 | 0.1203
Lena | 9x9 | 3226.263 | 326.8065 | 32.7087 | 3.2984 | 0.3854 | 0.0964 | 0.0685 | 0.066
Lena | 11x11 | 3254.237 | 324.4781 | 32.698 | 3.2888 | 0.3587 | 0.0671 | 0.0386 | 0.036
Cameraman | Real value | 4210.47 | 421.047 | 42.1047 | 4.2105 | 0.421 | 0.0421 | 0.0042 | 0.0004
Cameraman | 3x3 | 4289.119 | 427.3593 | 44.7465 | 6.6499 | 2.8844 | 2.4891 | 2.4563 | 2.4542
Cameraman | 5x5 | 4275.86 | 425.0564 | 43.0084 | 4.7718 | 0.9294 | 0.5445 | 0.5052 | 0.501
Cameraman | 7x7 | 4285.852 | 424.2836 | 42.5722 | 4.4455 | 0.6188 | 0.2322 | 0.1939 | 0.1895
Cameraman | 9x9 | 4263.994 | 425.6784 | 42.3067 | 4.3027 | 0.5046 | 0.1252 | 0.0872 | 0.0836
Cameraman | 11x11 | 4249.764 | 424.5425 | 42.2741 | 4.2588 | 0.4648 | 0.0845 | 0.0464 | 0.042
Barbara | Real value | 2792.166 | 279.2166 | 27.9217 | 2.7922 | 0.2792 | 0.0279 | 0.0028 | 0.0003
Barbara | 3x3 | 2898.218 | 286.6985 | 33.5393 | 7.9205 | 5.5188 | 5.0846 | 5.046 | 5.0325
Barbara | 5x5 | 2822.948 | 286.0339 | 29.2028 | 3.661 | 1.0558 | 0.7771 | 0.7441 | 0.7401
Barbara | 7x7 | 2816.222 | 286.8649 | 28.8068 | 3.113 | 0.5272 | 0.2626 | 0.2335 | 0.23
Barbara | 9x9 | 2846.788 | 287.5493 | 28.9163 | 2.9441 | 0.3874 | 0.1252 | 0.0977 | 0.0946
Barbara | 11x11 | 2821.618 | 287.5812 | 28.3155 | 2.9071 | 0.3304 | 0.0729 | 0.0467 | 0.044
Boat | Real value | 2707.32 | 270.732 | 27.0732 | 2.7073 | 0.2707 | 0.0271 | 0.0027 | 0.0003
Boat | 3x3 | 2751.544 | 277.1431 | 29.4363 | 4.6828 | 2.2253 | 1.984 | 1.962 | 1.9604
Boat | 5x5 | 2771.077 | 276.8141 | 28.0699 | 3.1505 | 0.6464 | 0.3948 | 0.3692 | 0.3664
Boat | 7x7 | 2758.934 | 276.0354 | 27.7257 | 2.8803 | 0.3901 | 0.1426 | 0.1189 | 0.1165
Boat | 9x9 | 2728.934 | 274.6365 | 27.607 | 2.8073 | 0.3229 | 0.0739 | 0.0489 | 0.0464
Boat | 11x11 | 2742.365 | 274.4742 | 27.4036 | 2.7858 | 0.2992 | 0.0495 | 0.0246 | 0.0222
Goldhill | Real value | 2133.844 | 213.3844 | 21.3384 | 2.1338 | 0.2134 | 0.0213 | 0.0021 | 0.0002
Goldhill | 3x3 | 2173.891 | 217.3611 | 22.407 | 2.9837 | 1.0581 | 0.8544 | 0.8344 | 0.8321
Goldhill | 5x5 | 2137.604 | 216.0309 | 21.7738 | 2.3268 | 0.3832 | 0.1885 | 0.168 | 0.1658
Goldhill | 7x7 | 2106.107 | 215.7399 | 21.8678 | 2.2283 | 0.2727 | 0.0776 | 0.0573 | 0.0554
Goldhill | 9x9 | 2140.056 | 216.8333 | 21.5465 | 2.1786 | 0.2404 | 0.046 | 0.0267 | 0.0248
Goldhill | 11x11 | 2171.393 | 214.9948 | 21.1543 | 2.1046 | 0.2216 | 0.0349 | 0.0164 | 0.0147

Fig. 4. Result of noise estimation from the Lena image.


where S_xx and S_ηη represent the power spectra of the original image and the noise, respectively.

In most image restoration problems, the power spectrum of the original image is unavailable. The original Wiener filter, however, provides a good reference for comparing a range of image restoration algorithms. For the experiment, the periodogram of the original image was used as an approximation of S_xx. The MSE for the PSNR of the restored image was calculated as

    MSE = (1 / (m × n)) Σ_{i=0}^{m−1} Σ_{j=0}^{n−1} ( f(i, j) − f̂(i, j) )²,    (19)

Table 2. Results of image restoration using the Wiener filter with the real and the estimated noise power (PSNR, dB).

Image | Blur size | Noise power | SNR 0 dB | 10 dB | 20 dB | 30 dB | 40 dB | 50 dB | 60 dB | 70 dB
Lena | 3x3 | Real | 23.8846 | 26.9816 | 29.8579 | 32.6848 | 35.8671 | 39.4885 | 43.4946 | 47.7143
Lena | 3x3 | Estimated | 23.4047 | 26.5427 | 29.3919 | 32.1093 | 34.2927 | 34.9086 | 34.9831 | 34.9906
Lena | 5x5 | Real | 23.0575 | 25.2821 | 27.3222 | 29.4897 | 32.0639 | 35.221 | 39.0044 | 43.1626
Lena | 5x5 | Estimated | 22.675 | 24.9369 | 26.9202 | 29.0259 | 31.4048 | 32.9659 | 33.2785 | 33.311
Lena | 7x7 | Real | 22.3753 | 24.1837 | 25.8135 | 27.6967 | 30.06 | 33.0215 | 36.5146 | 40.5006
Lena | 7x7 | Estimated | 22.065 | 23.856 | 25.4184 | 27.2461 | 29.5475 | 31.5776 | 32.173 | 32.2479
Lena | 9x9 | Real | 21.759 | 23.3895 | 24.899 | 26.6741 | 28.8624 | 31.4747 | 34.6416 | 38.339
Lena | 9x9 | Estimated | 21.4581 | 23.0627 | 24.4949 | 26.2692 | 28.3943 | 30.5291 | 31.3855 | 31.5106
Lena | 11x11 | Real | 21.1039 | 22.6819 | 24.2392 | 25.9698 | 28.0801 | 30.6281 | 33.7327 | 37.367
Lena | 11x11 | Estimated | 20.868 | 22.3391 | 23.864 | 25.5693 | 27.6278 | 29.8919 | 31.114 | 31.3166
Cameraman | 3x3 | Real | 22.3852 | 25.1677 | 27.9766 | 30.9808 | 34.2485 | 37.7331 | 41.5757 | 45.8096
Cameraman | 3x3 | Estimated | 21.9164 | 24.6894 | 27.4516 | 30.3447 | 32.3317 | 32.8001 | 32.8573 | 32.8652
Cameraman | 5x5 | Real | 21.5558 | 23.5775 | 25.5864 | 27.9729 | 30.7363 | 33.8736 | 37.467 | 41.2927
Cameraman | 5x5 | Estimated | 21.2098 | 23.1896 | 25.1141 | 27.4553 | 29.9248 | 31.1618 | 31.3714 | 31.3946
Cameraman | 7x7 | Real | 20.9223 | 22.5638 | 24.3757 | 26.5466 | 29.0746 | 31.9082 | 35.2263 | 39.1707
Cameraman | 7x7 | Estimated | 20.6341 | 22.2235 | 23.9819 | 26.0993 | 28.4755 | 30.3097 | 30.7895 | 30.851
Cameraman | 9x9 | Real | 20.449 | 21.8927 | 23.4982 | 25.4301 | 27.6668 | 30.2777 | 33.4247 | 37.0752
Cameraman | 9x9 | Estimated | 20.1916 | 21.5898 | 23.1314 | 24.9955 | 27.163 | 29.2284 | 30.065 | 30.1846
Cameraman | 11x11 | Real | 20.0748 | 21.4623 | 22.9824 | 24.7365 | 26.7858 | 29.2677 | 32.2804 | 35.957
Cameraman | 11x11 | Estimated | 19.8498 | 21.154 | 22.648 | 24.3485 | 26.3084 | 28.5152 | 29.794 | 30.0443
Barbara | 3x3 | Real | 22.1229 | 23.7675 | 25.514 | 27.8749 | 30.9784 | 34.6997 | 38.7562 | 43.0572
Barbara | 3x3 | Estimated | 21.8203 | 23.4125 | 25.0313 | 26.9306 | 27.7802 | 27.9562 | 27.9747 | 27.9797
Barbara | 5x5 | Real | 21.628 | 22.8247 | 23.8684 | 25.397 | 27.8021 | 31.0406 | 34.8908 | 39.3204
Barbara | 5x5 | Estimated | 21.4152 | 22.6156 | 23.5383 | 24.8888 | 26.5741 | 27.2407 | 27.34 | 27.3524
Barbara | 7x7 | Real | 21.2283 | 22.2636 | 23.1719 | 24.3456 | 26.097 | 28.8169 | 32.1934 | 35.9993
Barbara | 7x7 | Estimated | 21.0226 | 22.0707 | 22.9196 | 23.9757 | 25.446 | 26.5676 | 26.8225 | 26.8576
Barbara | 9x9 | Real | 20.8446 | 21.8374 | 22.7304 | 23.8295 | 25.4918 | 27.9728 | 31.213 | 34.937
Barbara | 9x9 | Estimated | 20.6408 | 21.6444 | 22.506 | 23.4944 | 24.9746 | 26.5974 | 27.1552 | 27.2373
Barbara | 11x11 | Real | 20.5045 | 21.5016 | 22.3495 | 23.2996 | 24.6436 | 26.761 | 29.7923 | 33.4748
Barbara | 11x11 | Estimated | 20.2965 | 21.2833 | 22.1469 | 23.0199 | 24.1927 | 25.8614 | 26.7647 | 26.9174
Boat | 3x3 | Real | 24.0632 | 27.1509 | 30.2964 | 33.6468 | 37.273 | 41.2964 | 45.6637 | 50.1293
Boat | 3x3 | Estimated | 23.5584 | 26.6655 | 29.8074 | 32.9608 | 35.0155 | 35.4524 | 35.5005 | 35.504
Boat | 5x5 | Real | 23.1033 | 25.2132 | 27.4282 | 30.0571 | 33.1536 | 36.719 | 40.7213 | 44.9659
Boat | 5x5 | Estimated | 22.7406 | 24.7927 | 26.9735 | 29.5745 | 32.2789 | 33.6228 | 33.8489 | 33.8718
Boat | 7x7 | Real | 22.4804 | 24.1381 | 26.1268 | 28.5302 | 31.3814 | 34.6481 | 38.3193 | 42.4117
Boat | 7x7 | Estimated | 22.1738 | 23.7767 | 25.6666 | 28.0361 | 30.8162 | 33.0273 | 33.6076 | 33.6758
Boat | 9x9 | Real | 22.0587 | 23.6051 | 25.3004 | 27.3715 | 29.9533 | 32.897 | 36.2837 | 40.2263
Boat | 9x9 | Estimated | 21.782 | 23.2519 | 24.8722 | 26.8895 | 29.459 | 31.951 | 33.018 | 33.1789
Boat | 11x11 | Real | 21.5606 | 22.9538 | 24.623 | 26.6416 | 29.0169 | 31.766 | 34.9755 | 38.7398
Boat | 11x11 | Estimated | 21.3145 | 22.6123 | 24.2556 | 26.1988 | 28.4963 | 31.0803 | 32.7101 | 33.0407
Goldhill | 3x3 | Real | 26.7398 | 30.2543 | 33.746 | 37.3324 | 41.0328 | 44.8544 | 48.9565 | 53.1223
Goldhill | 3x3 | Estimated | 26.255 | 29.7393 | 33.2195 | 36.7474 | 39.407 | 40.1518 | 40.2293 | 40.2368
Goldhill | 5x5 | Real | 25.8456 | 28.4774 | 31.0497 | 33.9961 | 37.2998 | 40.8006 | 44.5499 | 48.6226
Goldhill | 5x5 | Estimated | 25.4779 | 28.035 | 30.5548 | 33.4822 | 36.5811 | 38.4764 | 38.8736 | 38.9279
Goldhill | 7x7 | Real | 25.1182 | 27.3499 | 29.6478 | 32.1777 | 35.0686 | 38.1718 | 41.6039 | 45.5468
Goldhill | 7x7 | Estimated | 24.766 | 26.961 | 29.179 | 31.7038 | 34.5376 | 37.0044 | 37.8739 | 37.9951
Goldhill | 9x9 | Real | 24.4436 | 26.45 | 28.5452 | 31.056 | 33.8238 | 36.7732 | 40.1019 | 43.819
Goldhill | 9x9 | Estimated | 24.0947 | 26.0663 | 28.1484 | 30.5624 | 33.2791 | 35.9845 | 37.3805 | 37.6098
Goldhill | 11x11 | Real | 23.8268 | 25.5836 | 27.5896 | 29.9754 | 32.5945 | 35.4005 | 38.5675 | 42.2403
Goldhill | 11x11 | Estimated | 23.5211 | 25.2339 | 27.1688 | 29.4912 | 32.0828 | 34.8182 | 36.5704 | 36.9412


where m and n respectively represent the vertical and horizontal sizes of the image, and f and f̂ respectively represent the original and restored images.

Finally, the PSNR of the restored image was computed as

    PSNR = 10 × log₁₀( 255² / MSE ).    (20)

Fig. 5 shows the PSNR values of the Lena image restored by the Wiener filter using both the real and the estimated noise power.
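Eqs. (19) and (20) combine into a few lines; the following sketch for 8-bit images (peak value 255) is an illustration of the metric, not the evaluation code used in the experiments:

```python
import numpy as np

def psnr(f, f_hat):
    """Eqs. (19)-(20): MSE and PSNR for 8-bit images (peak = 255)."""
    mse = np.mean((np.asarray(f, float) - np.asarray(f_hat, float)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```

For example, a uniform error of one gray level gives MSE = 1 and hence PSNR = 20 log₁₀(255) ≈ 48.13 dB, the ceiling often quoted for 8-bit comparisons.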

Table 3. Results of image restoration using the parametric Wiener filter with K_est = (α σ_η^γ)^(1/2) (PSNR, dB).

Image | Blur size | Type of NSPR | SNR 0 dB | 10 dB | 20 dB | 30 dB | 40 dB | 50 dB | 60 dB | 70 dB
Lena | 3x3 | K | 15.9318 | 21.1556 | 26.3624 | 30.8127 | 34.4406 | 38.1502 | 42.2265 | 46.6949
Lena | 3x3 | Estimated | 15.5051 | 20.0607 | 24.8053 | 29.8786 | 34.261 | 35.7159 | 35.9104 | 35.9207
Lena | 5x5 | K | 17.4361 | 21.7755 | 25.3427 | 28.0514 | 30.5762 | 33.5892 | 37.2297 | 41.5856
Lena | 5x5 | Estimated | 17.1252 | 21.1244 | 24.4503 | 27.5974 | 30.3498 | 31.7327 | 31.9689 | 31.9935
Lena | 7x7 | K | 18.2425 | 21.796 | 24.3586 | 26.35 | 28.4543 | 30.7008 | 34.8828 | 39.0884
Lena | 7x7 | Estimated | 17.7177 | 21.4231 | 23.9424 | 26.1339 | 28.0718 | 29.2616 | 29.5343 | 29.5684
Lena | 9x9 | K | 18.6639 | 21.6007 | 23.6512 | 25.4092 | 27.3613 | 29.8073 | 32.9877 | 36.867
Lena | 9x9 | Estimated | 17.8185 | 21.2934 | 23.4131 | 25.172 | 26.7803 | 27.9153 | 28.2355 | 28.2727
Lena | 11x11 | K | 18.8018 | 21.2345 | 22.9586 | 24.6046 | 26.5019 | 28.887 | 31.9835 | 35.8382
Lena | 11x11 | Estimated | 17.6566 | 20.8506 | 22.6883 | 24.2065 | 25.6444 | 26.8167 | 27.2505 | 27.3095
Cameraman | 3x3 | K | 14.83 | 20.0254 | 24.9753 | 29.1312 | 32.7159 | 36.3606 | 40.3133 | 44.7693
Cameraman | 3x3 | Estimated | 14.5677 | 19.3279 | 24.0353 | 28.7647 | 32.1538 | 33.031 | 33.1424 | 33.1552
Cameraman | 5x5 | K | 16.2889 | 20.4608 | 23.7696 | 26.4517 | 29.1533 | 32.2267 | 35.7827 | 39.984
Cameraman | 5x5 | Estimated | 15.9452 | 20.0983 | 23.324 | 26.258 | 28.6586 | 29.6334 | 29.7835 | 29.8008
Cameraman | 7x7 | K | 17.0884 | 20.5188 | 23.0076 | 25.1551 | 27.4511 | 30.2898 | 33.7674 | 37.763
Cameraman | 7x7 | Estimated | 16.3905 | 20.2704 | 22.7929 | 24.9561 | 26.8537 | 27.9151 | 28.133 | 28.1587
Cameraman | 9x9 | K | 17.4939 | 20.3244 | 22.2864 | 24.1143 | 26.148 | 28.6795 | 31.841 | 35.7081
Cameraman | 9x9 | Estimated | 16.4573 | 20.0455 | 22.1054 | 23.815 | 25.3987 | 26.4971 | 26.8065 | 26.8474
Cameraman | 11x11 | K | 17.7483 | 20.1703 | 21.864 | 23.5494 | 25.3692 | 27.7015 | 30.7469 | 34.5304
Cameraman | 11x11 | Estimated | 16.417 | 19.8031 | 21.6284 | 23.1215 | 24.5419 | 25.6682 | 26.0991 | 26.1714
Barbara | 3x3 | K | 15.6221 | 20.2254 | 23.8989 | 26.6357 | 29.857 | 33.7201 | 37.7904 | 42.2838
Barbara | 3x3 | Estimated | 15.1585 | 19.3323 | 23.2726 | 26.3977 | 27.5048 | 27.6864 | 27.7061 | 27.7081
Barbara | 5x5 | K | 17.0036 | 20.596 | 22.9081 | 24.4345 | 26.6259 | 29.8918 | 33.7617 | 38.1397
Barbara | 5x5 | Estimated | 16.6052 | 20.0261 | 22.4073 | 24.2798 | 25.473 | 25.8018 | 25.846 | 25.8513
Barbara | 7x7 | K | 17.7924 | 20.7061 | 22.4257 | 23.616 | 25.1575 | 27.701 | 31.1878 | 34.9679
Barbara | 7x7 | Estimated | 17.2068 | 20.3588 | 22.1394 | 23.4625 | 24.4797 | 24.93 | 25.0101 | 25.023
Barbara | 9x9 | K | 18.25 | 20.6639 | 22.0591 | 23.1151 | 24.5087 | 26.9901 | 30.3902 | 34.1428
Barbara | 9x9 | Estimated | 17.3746 | 20.3733 | 21.8667 | 22.932 | 23.8154 | 24.361 | 24.4941 | 24.5111
Barbara | 11x11 | K | 18.4585 | 20.5093 | 21.7195 | 22.6843 | 23.7905 | 25.7951 | 28.9341 | 32.7194
Barbara | 11x11 | Estimated | 17.3491 | 20.1864 | 21.5347 | 22.4634 | 23.2328 | 23.7866 | 23.9701 | 23.9954
Boat | 3x3 | K | 15.4863 | 20.9674 | 26.437 | 31.2731 | 35.182 | 39.3307 | 43.6906 | 48.6787
Boat | 3x3 | Estimated | 15.1861 | 20.1418 | 25.1831 | 30.609 | 34.9407 | 36.149 | 36.2982 | 36.3096
Boat | 5x5 | K | 17.0373 | 21.5494 | 25.3366 | 28.3519 | 31.2236 | 34.2817 | 38.1526 | 42.7778
Boat | 5x5 | Estimated | 16.6487 | 21.0308 | 24.6194 | 27.9757 | 30.7804 | 31.9434 | 32.124 | 32.1461
Boat | 7x7 | K | 17.9772 | 21.7346 | 24.5518 | 26.8864 | 29.2883 | 32.326 | 36.1975 | 40.5773
Boat | 7x7 | Estimated | 17.2375 | 21.3906 | 24.2071 | 26.6575 | 28.8158 | 30.0834 | 30.3524 | 30.381
Boat | 9x9 | K | 18.545 | 21.7533 | 24.0697 | 26.0827 | 28.2278 | 30.8949 | 34.2007 | 38.2447
Boat | 9x9 | Estimated | 17.4206 | 21.3988 | 23.8148 | 25.8097 | 27.6224 | 28.9701 | 29.4022 | 29.4615
Boat | 11x11 | K | 18.8042 | 21.496 | 23.4107 | 25.2646 | 27.3417 | 29.8694 | 33.028 | 36.8845
Boat | 11x11 | Estimated | 17.3771 | 21.0831 | 23.1355 | 24.832 | 26.4608 | 27.8568 | 28.4567 | 28.5564
Goldhill | 3x3 | K | 17.292 | 22.7483 | 28.4208 | 33.684 | 38.0951 | 42.0161 | 46.0533 | 50.8886
Goldhill | 3x3 | Estimated | 16.6058 | 21.1164 | 26.1581 | 31.871 | 37.8767 | 40.7316 | 41.1952 | 41.2456
Goldhill | 5x5 | K | 18.9349 | 23.6931 | 27.9453 | 31.4699 | 34.6221 | 37.9221 | 41.6176 | 46.2508
Goldhill | 5x5 | Estimated | 18.5681 | 22.5305 | 26.3129 | 30.295 | 34.341 | 36.966 | 37.5577 | 37.6266
Goldhill | 7x7 | K | 19.921 | 24.0243 | 27.3415 | 30.13 | 32.7592 | 35.5151 | 39.0593 | 43.1829
Goldhill | 7x7 | Estimated | 19.5879 | 23.3945 | 26.527 | 29.6315 | 32.547 | 34.6976 | 35.3421 | 35.4257
Goldhill | 9x9 | K | 20.4684 | 23.9485 | 26.6062 | 29.0018 | 31.4507 | 34.2092 | 37.4553 | 41.3001
Goldhill | 9x9 | Estimated | 19.9582 | 23.5935 | 26.2212 | 28.7512 | 31.1578 | 33.1109 | 33.8879 | 33.9952
Goldhill | 11x11 | K | 20.7242 | 23.6782 | 25.8681 | 27.9547 | 30.2072 | 32.8143 | 36.0581 | 39.7016
Goldhill | 11x11 | Estimated | 19.9897 | 23.3871 | 25.6156 | 27.69 | 29.717 | 31.4794 | 32.3138 | 32.4576


Yoo et al.: Minimum Statistics-Based Noise Power Estimation for Parametric Image Restoration


Similar experiments were performed using a parametric Wiener filter with a constant noise-to-signal power ratio (NSPR) as follows:

K_{est} = \frac{S_{\eta\eta}}{S_{xx}} \cong \alpha \left( \left( \sigma_{\eta}^{2} \right)^{1/2} \right)^{\gamma},    (21)

where the experimentally chosen parameters α = 0.000015446 and γ = 2 were used. The near-optimal value of K_est was obtained by an interactive simulation for the optimal PSNR, where K_est lies in the range [1.0×10^-8, 1.0]. The corresponding PSNR values are given in Table 3. Other experimental results using the manually selected constant for the NSPR are given in Table 4.
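In (21), the frequency-dependent noise-to-signal power ratio of the Wiener filter is replaced by the single constant K_est. The frequency-domain restoration step itself can be sketched in a few lines of NumPy; this is an illustrative sketch under a circular-convolution blur model, not the authors' implementation, and the function and variable names are assumptions:

```python
import numpy as np

def parametric_wiener(g, psf, K):
    """Parametric Wiener restoration: F_hat = conj(H) * G / (|H|^2 + K),
    where the constant K plays the role of the NSPR K_est in Eq. (21)."""
    H = np.fft.fft2(psf, s=g.shape)   # blur transfer function (zero-padded PSF)
    G = np.fft.fft2(g)                # spectrum of the degraded image
    F_hat = np.conj(H) * G / (np.abs(H) ** 2 + K)
    return np.real(np.fft.ifft2(F_hat))

# 7x7 uniform blur, one of the degradations used in the experiments
uniform_psf = np.ones((7, 7)) / 49.0
```

Sweeping K over [1.0e-8, 1.0] and keeping the value that maximizes the PSNR mimics the kind of search used to obtain K_est.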

Table 5. Results of image restoration using the ForWaRD algorithm in PSNR [dB].

Image      Blur   Noise power |  SNR 0    10      20      30      40      50      60      70
Lena       3x3    Real        | 12.0817 23.6813 29.2859 31.9092 34.4559 36.8064 39.6032 43.3297
                  Estimated   | 12.0868 23.6758 29.3248 32.1796 34.144  34.6094 34.6619 34.668
           5x5    Real        | 11.7382 22.4076 26.9339 29.2023 31.5026 33.9945 36.6331 39.3912
                  Estimated   | 11.8118 22.3772 26.9357 29.2575 31.5324 32.6363 32.8197 32.8464
           7x7    Real        | 11.4096 21.3742 25.424  27.5061 29.7853 32.403  35.0823 37.8321
                  Estimated   | 11.3242 21.3668 25.4212 27.5169 29.7972 31.4571 31.8569 31.9074
           9x9    Real        | 11.1004 20.4556 24.3463 26.4596 28.6033 31.0552 33.6474 36.4111
                  Estimated   | 11.1417 20.4478 24.3411 26.4575 28.616  30.551  31.2091 31.3036
           11x11  Real        | 10.8212 19.6835 23.4171 25.5444 27.7366 30.2433 32.9371 35.797
                  Estimated   | 10.8649 19.6606 23.4118 25.5407 27.7343 30.0629 31.2289 31.4529
Cameraman  3x3    Real        | 11.0517 22.4179 27.873  30.6476 33.3845 36.1039 39.1418 42.8819
                  Estimated   | 10.9414 22.3433 27.8843 30.6871 32.2059 32.5164 32.554  32.556
           5x5    Real        | 10.7765 21.0941 25.4287 28.0118 30.4362 32.9708 35.8083 38.8658
                  Estimated   | 10.6758 21.0289 25.419  28.0188 30.3005 31.3138 31.4702 31.4753
           7x7    Real        | 10.561  20.291  24.1672 26.5228 28.9035 31.4477 34.1255 37.1174
                  Estimated   | 10.4758 20.2319 24.1577 26.524  28.8491 30.433  30.8025 30.8504
           9x9    Real        | 10.3691 19.5755 23.0402 25.3061 27.5775 30.0886 32.8474 35.723
                  Estimated   | 10.3303 19.5142 23.0302 25.2988 27.5672 29.5916 30.3182 30.4176
           11x11  Real        | 10.2108 19.0828 22.4008 24.5842 26.7784 29.2419 31.9406 34.9117
                  Estimated   | 10.2051 19.0236 22.3881 24.5751 26.7639 29.0231 30.1878 30.3834
Barbara    3x3    Real        | 11.0449 21.2378 24.7526 26.9164 30.1828 33.8312 37.5026 41.9574
                  Estimated   | 10.8148 21.087  24.6716 26.0643 26.5323 26.6119 26.6176 26.6209
           5x5    Real        | 10.7637 20.5683 23.4534 24.5723 26.8274 30.3315 34.0291 37.8115
                  Estimated   | 10.5676 20.4787 23.4474 24.4801 25.4027 25.7003 25.735  25.7418
           7x7    Real        | 10.4977 19.9653 22.8228 23.8457 25.2706 27.9346 31.623  35.3334
                  Estimated   | 10.3799 19.8755 22.8167 23.827  24.9221 25.5157 25.6219 25.6313
           9x9    Real        | 10.2513 19.4173 22.3375 23.3361 24.6415 27.048  30.586  34.2884
                  Estimated   | 10.1233 19.3608 22.3348 23.3324 24.4798 25.4812 25.758  25.7993
           11x11  Real        | 10.0229 18.9017 21.8941 22.9549 23.9286 25.8029 29.1708 32.9918
                  Estimated   | 9.9002  18.8099 21.8931 22.954  23.8844 24.8263 25.1891 25.2358
Boat       3x3    Real        | 10.3301 22.6785 29.4999 32.4216 35.2734 37.9196 41.0248 44.868
                  Estimated   | 10.0789 22.5    29.5178 32.6431 34.7408 35.1895 35.2668 35.2639
           5x5    Real        | 10.0144 21.4427 26.8712 29.4584 32.0061 34.7688 37.731  40.7402
                  Estimated   | 9.7854  21.227  26.8496 29.4914 32.0153 33.3866 33.6243 33.6423
           7x7    Real        | 9.7684  20.6427 25.465  27.9147 30.4422 33.1262 35.8268 38.9086
                  Estimated   | 9.6516  20.5446 25.4509 27.9136 30.4533 32.4932 33.0274 33.0795
           9x9    Real        | 9.553   20.0144 24.6253 26.9061 29.1889 31.7675 34.4731 37.347
                  Estimated   | 9.5524  20.0018 24.6092 26.9015 29.2019 31.5362 32.5181 32.662
           11x11  Real        | 9.3585  19.4501 23.7401 26.0097 28.2808 30.8416 33.5767 36.5561
                  Estimated   | 9.2723  19.3932 23.7187 25.9944 28.2818 30.7293 32.0985 32.3484
Goldhill   3x3    Real        | 13.1989 25.4969 32.3482 35.2669 37.8552 40.285  43.2054 46.8622
                  Estimated   | 13.2117 25.3896 32.3882 35.6726 38.1347 38.8696 38.9647 38.9783
           5x5    Real        | 12.9351 24.466  30.034  32.7458 35.0247 37.3128 39.9765 42.7068
                  Estimated   | 12.8569 24.4952 30.0358 32.8443 35.4271 36.8648 37.1323 37.1505
           7x7    Real        | 12.6919 23.5989 28.7017 31.1889 33.3095 35.7235 38.2176 40.9962
                  Estimated   | 12.6546 23.6454 28.7035 31.2045 33.5262 35.3785 35.8772 35.9481
           9x9    Real        | 12.4738 22.8065 27.493  30.0713 32.2931 34.6044 36.9353 39.5736
                  Estimated   | 12.4534 22.8189 27.4957 30.0712 32.3863 34.669  35.5987 35.7388
           11x11  Real        | 12.2787 22.1495 26.4667 29.0588 31.3209 33.6124 35.9986 38.7593
                  Estimated   | 12.2625 22.1802 26.4651 29.0606 31.357  33.8235 35.3351 35.623


IEIE Transactions on Smart Processing and Computing, vol. 3, no. 2, April 2014


In addition to the original and parametric Wiener filters, this study tested the Fourier-wavelet regularized deconvolution (ForWaRD) algorithm, which effectively combines and balances scalar Fourier shrinkage and wavelet shrinkage [13]. Table 5 summarizes the restoration results of the ForWaRD algorithm under two conditions, one with the real noise power and the other with the estimated noise power, by comparing their PSNRs. For images with a high noise level, such as SNR 0 to 40 dB, the ForWaRD algorithm gives similar restoration performance with both the real and estimated noise power.
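All of the comparisons in the tables are reported in PSNR, computed from the mean squared error against the original image; for 8-bit images the peak value is 255. A small helper implementing the standard definition (not code from the paper):

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    mse = np.mean((np.asarray(reference, float) - np.asarray(test, float)) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```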

Fig. 7 shows the restoration results using the ForWaRD algorithm with real and estimated noise power for the Lena image.

Fig. 8 shows the restored Lena images for the 7×7 uniform blur and 20 dB additive noise.

The results in Fig. 8(c)-(f) exhibit some artifacts caused by incomplete reconstruction of the entire frequency components, whereas the ForWaRD algorithm reduces such artifacts, as shown in Fig. 8(g) and (h). In addition, there is no noticeable difference between the results obtained using the real and estimated noise power.

Fig. 5. PSNR values of the Wiener filter versus various noise levels.

Fig. 6. PSNR values of the restored results using the parametric Wiener filter for the Lena image versus various levels of noise.

Fig. 7. PSNR values of the restored results using the ForWaRD algorithm for the Lena image versus various levels of noise.


Fig. 8. Comparison of the restored images: (a) original image, (b) observed image degraded by two-dimensional 7×7 uniform blur with 20 dB white Gaussian noise, (c) restored image using the original Wiener filter with the real noise power, (d) restored image using the original Wiener filter with the estimated noise power, (e) restored image using the parametric Wiener filter with the manually chosen constant NSPR shown in Table 4, (f) restored image using the parametric Wiener filter with the estimated noise power, (g) restored image using the ForWaRD algorithm with the real noise power, (h) restored image using the ForWaRD algorithm with the estimated noise power.


Fig. 9. Magnified images of Fig. 8.


5. Conclusion

This paper proposed a novel noise power estimation algorithm based on DWT-based minimum statistics. The estimated noise power can be used in any parametric image restoration filter. Based on the experimental results, the accuracy of the proposed estimation method was more than 90% at relatively high noise levels (SNR 0 to 40 dB). As the SNR increased beyond 50 dB, the estimation accuracy decreased because the total number of noise samples approaches zero. Such inaccuracy, however, is not a serious problem because the final restored results maintain sufficiently high quality. The proposed algorithm can estimate the noise power far more efficiently than the existing algorithms. In addition, there were almost no differences between the restored results using the real and estimated noise power. Because the proposed algorithm estimates the noise power in the DWT domain, it can fully utilize the multi-resolution characteristics of the noise distribution, and as a result, its estimation results can be applied successfully to any kind of parametric image restoration filter.
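As a point of reference for the wavelet-domain estimation summarized above, a classical single-band estimator can be sketched in a few lines. The sketch below uses Donoho's well-known median rule on the diagonal Haar detail band, sigma = median(|HH|)/0.6745; it is a stand-in for comparison, not the proposed minimum-statistics procedure, which instead thresholds the high-band coefficients of a three-level DWT:

```python
import numpy as np

def estimate_noise_sigma(img):
    """Estimate the additive-noise standard deviation from the diagonal
    (HH) Haar detail band using Donoho's median rule. For an orthonormal
    Haar transform, the HH coefficients of i.i.d. noise keep its sigma."""
    img = np.asarray(img, dtype=float)
    h = img.shape[0] // 2 * 2
    w = img.shape[1] // 2 * 2
    a = img[0:h:2, 0:w:2]  # top-left pixel of each 2x2 block
    b = img[0:h:2, 1:w:2]
    c = img[1:h:2, 0:w:2]
    d = img[1:h:2, 1:w:2]
    hh = (a - b - c + d) / 2.0  # orthonormal Haar HH coefficients
    return np.median(np.abs(hh)) / 0.6745
```

Smooth image structure contributes little to the HH band, which is why such high-band estimators remain accurate until almost no noise samples are left, mirroring the accuracy loss above 50 dB SNR noted above.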

Acknowledgement

This study was supported in part by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2009-0081059, 2013R1A1A2061847).

References

[1] M. Banham and A. Katsaggelos, "Digital image restoration," IEEE Signal Processing Magazine, vol. 14, no. 2, pp. 24-41, Mar. 1997. Article (CrossRef Link)
[2] S. Chang, B. Yu, and M. Vetterli, "Adaptive wavelet thresholding for image denoising and compression," IEEE Trans. Image Processing, vol. 9, no. 9, pp. 1532-1546, Sep. 2000. Article (CrossRef Link)
[3] C. Liu, R. Szeliski, S. Kang, C. Zitnick, and W. Freeman, "Automatic estimation and removal of noise from a single image," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 30, no. 2, pp. 299-314, Feb. 2008. Article (CrossRef Link)
[4] A. Tekalp, H. Kaufman, and J. Woods, "Identification of image and blur parameters for the restoration of noncausal blurs," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 34, no. 4, pp. 963-972, Aug. 1986. Article (CrossRef Link)
[5] R. Lagendijk, A. Tekalp, and J. Biemond, "Maximum likelihood image and blur identification: a unifying approach," Optical Engineering, vol. 29, no. 5, pp. 422-435, May 1990. Article (CrossRef Link)
[6] A. Tekalp and H. Kaufman, "On statistical identification of a class of linear space-invariant blurs using nonminimum-phase ARMA models," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 36, no. 8, pp. 1360-1363, Aug. 1988. Article (CrossRef Link)
[7] H. Engl, "Discrepancy principles for Tikhonov regularization of ill-posed problems leading to optimal convergence rates," Journal of Optimization Theory and Applications, vol. 52, no. 2, pp. 209-215, 1987. Article (CrossRef Link)
[8] N. Weyrich and G. Warhola, "Wavelet shrinkage and generalized cross validation for image denoising," IEEE Trans. Image Processing, vol. 7, no. 1, pp. 82-90, Jan. 1998. Article (CrossRef Link)
[9] P. Hansen, "Analysis of discrete ill-posed problems by means of the L-curve," SIAM Review, vol. 34, no. 4, pp. 561-580, 1992. Article (CrossRef Link)
[10] D. Krawczyk-Stando and M. Rudnicki, "Regularization parameter selection in discrete ill-posed problems: the use of the U-curve," International Journal of Applied Mathematics and Computer Science, vol. 17, no. 2, pp. 157-164, 2007. Article (CrossRef Link)
[11] S. Kay and S. Marple, "Spectrum analysis: a modern perspective," Proceedings of the IEEE, vol. 69, no. 11, pp. 1380-1419, Nov. 1981. Article (CrossRef Link)
[12] R. Martin, "Noise power spectral density estimation based on optimal smoothing and minimum statistics," IEEE Trans. Speech and Audio Processing, vol. 9, no. 5, pp. 504-512, July 2001. Article (CrossRef Link)
[13] R. Neelamani, H. Choi, and R. Baraniuk, "ForWaRD: Fourier-wavelet regularized deconvolution for ill-conditioned systems," IEEE Trans. Signal Processing, vol. 52, no. 2, pp. 418-433, Feb. 2004. Article (CrossRef Link)

Yoonjong Yoo was born in Seoul, Korea, in 1981. He received his B.S. degree in electronic engineering from Chung-Ang University, Seoul, Korea, in 2005, and his M.S. degree in image engineering from Chung-Ang University in 2007. From 2009 to 2013, he was with Nextchip, where he designed the auto exposure, auto white balance, and wide dynamic range functions for surveillance cameras. Currently, he is pursuing a Ph.D. degree in image processing at Chung-Ang University. His research interests include image enhancement and restoration for display processing, video compression standards, and surveillance video applications.


Jeongho Shin received his B.S. and M.S. degrees in electronic engineering from Chung-Ang University, Seoul, Korea, in 1994 and 1998, respectively, and his Ph.D. degree in image engineering from Chung-Ang University in 2001. From 2003 to 2006, he was a research professor at the Department of Image Engineering, Chung-Ang University, Seoul, Korea. Currently, he is an assistant professor at the Department of Web Information Engineering, Hankyong National University, Gyeonggi, Korea. His current research interests include the enhancement and restoration of image and video, object tracking, and data fusion.

Joonki Paik was born in Seoul, Korea, in 1960. He received his B.S. degree in control and instrumentation engineering from Seoul National University in 1984, and his M.S. and Ph.D. degrees in electrical engineering and computer science from Northwestern University in 1987 and 1990, respectively. From 1990 to 1993, he was with Samsung Electronics, where he designed image stabilization chip sets for consumer camcorders. Since 1993, he has been with the faculty of Chung-Ang University, Seoul, Korea, where he is currently a professor in the Graduate School of Advanced Imaging Science, Multimedia, and Film. From 1999 to 2002, he was a visiting professor at the Department of Electrical and Computer Engineering, University of Tennessee, Knoxville. Dr. Paik was a recipient of the Chester Sall Award from the IEEE Consumer Electronics Society, the Academic Award from the Institute of Electronic Engineers of Korea, and the Best Research Professor Award from Chung-Ang University. He has served the IEEE Consumer Electronics Society as a member of the editorial board. Since 2005, he has been the head of the National Research Laboratory in the field of image processing and intelligent systems. In 2008, he worked as a full-time technical consultant for the System LSI Division of Samsung Electronics, where he developed various computational photographic techniques, including an extended depth of field (EDoF) system. From 2005 to 2007, he served as dean of the Graduate School of Advanced Imaging Science, Multimedia, and Film, and as director of the Seoul Future Contents Convergence (SFCC) Cluster established by the Seoul Research and Business Development (R&BD) Program. Dr. Paik is currently serving as a member of the Presidential Advisory Board for Scientific/Technical Policy of the Korean Government and as a technical consultant to the Korean Supreme Prosecutors' Office for computational forensics.

Copyrights © 2014 The Institute of Electronics and Information Engineers


IEIE Transactions on Smart Processing and Computing, vol. 3, no. 2, April 2014 http://dx.doi.org/10.5573/IEIESPC.2014.3.2.52


IEIE Transactions on Smart Processing and Computing

Exact Histogram Specification Considering the Just Noticeable Difference

Seung-Won Jung

Department of Multimedia Engineering, Dongguk University / Seoul, South Korea [email protected] * Corresponding Author: Seung-Won Jung

Received November 20, 2013; Revised December 27, 2013; Accepted February 12, 2014; Published April 30, 2014

* Short Paper

Abstract: Exact histogram specification (EHS) transforms the histogram of an input image into the specified histogram. In the conventional EHS techniques, the pixels are first sorted according to their graylevels, and the pixels that have the same graylevel are further differentiated according to the local average of the pixel values and the edge strength. The strictly ordered pixels are then mapped to the desired histogram. However, since the conventional sorting method is inherently dependent on the initial graylevel-based sorting, the contrast enhancement capability of the conventional EHS algorithms is restricted. We propose a modified EHS algorithm considering the just noticeable difference. In the proposed algorithm, the edge pixels are pre-processed such that the output edge pixels obtained by the modified EHS can result in the local contrast enhancement. Moreover, we introduce a new sorting method for the pixels that have the same graylevel. Experimental results show that the proposed algorithm provides better image enhancement performance compared to the conventional EHS algorithms.

Keywords: Contrast enhancement, Image enhancement, Histogram equalization, Histogram specification, Human visual system

1. Introduction

Histogram-based image processing plays a crucial role in image contrast enhancement due to its conceptual simplicity and reliable performance. One of the most widely used histogram-based techniques is histogram equalization (HE), which transforms a narrow input histogram into a wide and uniform target histogram. However, HE can produce annoying visible artifacts since the uniformly distributed target histogram tends to excessively stretch the dynamic range of the input image. To address this, many improvements on HE have been proposed in the literature.

Histogram specification (HS) is a generalized version of HE that can change the input histogram into any desired histogram. Thus, HE is equivalent to HS when the desired histogram is the uniform distribution. In HS, HE serves as a link connecting the input and desired histograms [1]. However, since the graylevels of a digital image are discrete, the exact solution, i.e., the uniform distribution, is generally not achieved by HE. Since HS relies on HE, only a crude approximation of the desired histogram is therefore obtained. In order to more closely approximate the desired histogram, graylevel grouping [2] and graph theory [3] were utilized. However, the exact specification of the desired histogram, the so-called exact HS (EHS), is not accomplished due to the inherent ill-posed characteristics of the problem [4].

In order to exactly specify the uniform histogram, the pixels of the same graylevel are separated randomly or distinguished according to the local average pixel values. The exact histogram equalization is generalized into the EHS in [1, 4]. In [1], the pixels of the same graylevel are sorted by comparing the local average pixel values. If the local averages are the same, the enlarged neighboring filter masks are employed to further discriminate pixels. In [4], the pixel ordering is performed in the wavelet transform domain. By comparing the absolute values of the wavelet coefficients, the local edge information can be examined as well as the local average information.

However, the wavelet-based method does not provide further visual quality improvement compared to [1]. This is because the pixel value is the first condition of the ordering methods in both [1] and [4]. In other words, regardless of the local perceptual characteristics of the human visual system (HVS), the pixels are first sorted solely based on the graylevel. This restriction on the pixel ordering tends to constrain the contrast enhancement capability of the EHS. Therefore, a new ordering method utilizing the characteristics of the HVS is required to successfully perform the contrast enhancement.

There are many useful characteristics of the HVS for contrast enhancement. First, the HVS is sensitive to contrast rather than the absolute graylevel. Therefore, histogram mapping should consider contrast changes instead of pixel value changes. In particular, the just noticeable difference (JND) can explain the perceptual contrast changes of the image [6-9]. Second, changes at or near edges have profound impacts on the HVS. Consequently, histogram mapping should process the edge regions of the image differently. Even though the conventional EHS techniques [1, 4] partially utilize the above HVS characteristics, the perceptual contrast enhancement is limited due to the strong dependency of the pixel ordering on the pixel value.

In this paper, an improved EHS (IEHS) algorithm based on the HVS is proposed. In order to effectively enhance the contrast of the image, the edge pixels are pre-processed so that the output edge pixels obtained by the IEHS can result in local contrast enhancement. Moreover, a new ordering method for sorting the pixels of the same graylevel is introduced. The rest of this paper is organized as follows. A brief description of the EHS is given in Section II, and the proposed IEHS algorithm is presented in Section III. The experimental results are provided in Section IV, and conclusions are given in Section V.

2. Exact Histogram Specification

In this section, we briefly explain the basic HS and review two conventional EHS techniques [1, 4]. Since HS is a technique that converts an original histogram into a desired one, the desired histogram is assumed to be given. Therefore, finding a suitable desired histogram [5, 11] is not a concern of our work.

Let u and v denote the random variables (RVs) representing the original and desired histograms, respectively. Then, the RVs of the histogram-equalized versions, u_HE and v_HE, are obtained as follows:

u_{HE} = T_u(u), \qquad v_{HE} = T_v(v),    (1)

where T_u and T_v are the cumulative distribution functions (CDFs) of u and v, respectively. Since the equalized RVs should be identically distributed, the mapping from u to v can be represented by

v = T_v^{-1}(v_{HE}) = T_v^{-1}(u_{HE}) = T_v^{-1}(T_u(u)).    (2)

However, for discrete RVs, exact histogram equalization is impossible in general [1]. This is because the pixels of the same graylevel should be mapped to the same output graylevel. Therefore, the equality between u_HE and v_HE does not hold for discrete RVs. Consequently, the specified histogram can only crudely approximate the desired histogram.
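The mapping in (2) is easy to state in code, and doing so makes the discreteness problem concrete: every pixel of one input level is forced to a single output level, so the specified histogram is only approximate. A sketch (the function name is an assumption; integer graylevels in [0, levels) are assumed):

```python
import numpy as np

def histogram_specify(image, desired_hist):
    """Map graylevels via v = T_v^{-1}(T_u(u)), Eq. (2). With discrete
    levels the result only approximates the desired histogram, since all
    pixels of one input level map to one output level."""
    levels = len(desired_hist)
    hist, _ = np.histogram(image, bins=levels, range=(0, levels))
    Tu = np.cumsum(hist) / float(image.size)                    # input CDF
    Tv = np.cumsum(desired_hist) / float(np.sum(desired_hist))  # desired CDF
    # T_v^{-1}: smallest level whose desired CDF reaches the input CDF
    mapping = np.clip(np.searchsorted(Tv, Tu), 0, levels - 1)
    return mapping[image]
```

Passing a uniform desired histogram makes this plain HE; passing any other histogram gives the crude HS approximation discussed above.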

To solve this problem, a strict pixel ordering algorithm was proposed in [1]. The goal of the strict ordering is to prioritize all pixels that have the same graylevel. In other words, the normal ordering based on the pixel value is maintained while the pixels of the same graylevel are sorted based on their neighboring pixel values. The basic rule of the strict ordering is that the pixel surrounded by bright pixels is assumed to be brighter than that surrounded by dark neighboring pixels. This local average based sorting can discriminate the pixels of the same graylevel. Since it is possible that the local average values are still the same, the number of pixels to be averaged is increased in such a case. In [1], the local average filter masks of different sizes were defined as follows:

\phi_1 = [1], \quad
\phi_2 = \frac{1}{5}\begin{bmatrix} 0 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 0 \end{bmatrix}, \quad
\phi_3 = \frac{1}{9}\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix},

\phi_4 = \frac{1}{13}\begin{bmatrix} 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{bmatrix}, \quad
\phi_5 = \frac{1}{21}\begin{bmatrix} 0 & 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 & 0 \end{bmatrix}.    (3)

The other masks, φ_6, φ_7, and so on, are constructed in a similar manner. The above masks are sequentially applied until different local average values are obtained. The experiments in [1] revealed that the strict ordering is satisfied before examining φ_7 in most cases. After the strict ordering, the EHS is achieved by assigning the sorted pixels to the desired pixel values.
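The sequential examination of the masks in (3) amounts to a lexicographic sort whose keys are the pixel value followed by the successive local averages. The sketch below uses only φ_1, φ_2, and φ_3, with replicate padding and raster-order tie-breaking as simplifying assumptions (the method of [1] continues with φ_4, φ_5, and so on):

```python
import numpy as np

def local_average(img, mask):
    """Correlate img with a small averaging mask (replicate padding)."""
    mh, mw = mask.shape
    ph, pw = mh // 2, mw // 2
    padded = np.pad(np.asarray(img, dtype=float), ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(mh):
        for dx in range(mw):
            out += mask[dy, dx] * padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def strict_order(img):
    """Rank pixels by graylevel, then by the phi_2 and phi_3 averages of
    Eq. (3); ties that survive all keys keep raster order."""
    phi2 = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]]) / 5.0
    phi3 = np.ones((3, 3)) / 9.0
    k1 = np.asarray(img, dtype=float).ravel()
    k2 = local_average(img, phi2).ravel()
    k3 = local_average(img, phi3).ravel()
    # np.lexsort treats the LAST key as the primary sort key
    return np.lexsort((k3, k2, k1))
```

The returned array lists the flat pixel indices from darkest to brightest, with same-graylevel pixels separated by their neighborhood averages, exactly the discrimination principle described above.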

In [4], the strict ordering is performed in the wavelet domain. For the pixels of the same graylevel, the sorting is performed according to the absolute values of the wavelet coefficients, since the wavelet coefficients contain the local edge information as well as the local average information. When the absolute values are the same at a certain subband, a coarser subband is examined until different values are found. However, the visual quality of the output image from the wavelet-based EHS is not noticeably improved compared to the EHS algorithm in [1]. Only for highly compressed images does the wavelet-domain technique slightly improve the visual quality.

In summary, the EHS can be accomplished by utilizing the neighboring pixels. From the viewpoint of the HVS, the wavelet domain technique is more appropriate. However, we found that the initial ordering based on the pixel value restricts the image contrast enhancement capability. In order to improve the performance, the strong dependency of the pixel ordering on the pixel value needs to be relaxed.


Jung: Exact Histogram Specification Considering the Just Noticeable Difference


3. The Proposed Algorithm

We assume that the goal of the EHS is image contrast enhancement. From this viewpoint, in this section, we first propose a preprocessing algorithm that facilitates the image contrast enhancement. Then, we present a modified pixel ordering method that can further improve the image contrast.

3.1 HVS Based Preprocessing

In the classical HE and HS algorithms, the pixels of the same graylevel are always mapped to the same output graylevel. Therefore, this mapping principle can reduce the local contrast of the output image. Also, since the HVS is sensitive to the contrast rather than the graylevel, pixel sorting based on the graylevel is not suitable for the HVS. In order to improve the conventional algorithms, the HVS characteristics should be more extensively exploited.

The HVS can only perceive differences above the JND. Due to the HVS characteristics, the JND depends on the background luminance and the spatial activity. In [8], the JND is modeled by the combination of two threshold values as follows:

jnd(x, y) = Th_l(x, y) + \lambda \frac{Th_t(x, y)}{Th_l(x, y)},    (4)

where (x, y) denotes the pixel coordinates, λ = 0.5, Th_l is the threshold for the luminance adaptation, and Th_t is the threshold for the activity masking. Specifically, the piecewise linear approximation in Fig. 1 is used for Th_l, and Th_t is estimated from the maximum pixel difference in the 5×5 spatial neighborhood. The parameters in Fig. 1, f, g, and h, are defined in [8]. Then, with the aid of the JND in (4), the noticeable local contrast (NLC), c_o, is defined as

c_o(x, y) = \begin{cases} 0, & \text{if } |o(x, y) - \bar{o}(x, y)| \le jnd(x, y) \\ |o(x, y) - \bar{o}(x, y)| / jnd(x, y), & \text{otherwise} \end{cases}    (5)

where o represents the original image and ō(x, y) is the average pixel value within the 5×5 mask centered at (x, y) [8].
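Eqs. (4) and (5) can be sketched as follows. The piecewise luminance-adaptation curve of Fig. 1 is defined in [8] and is replaced here by a constant Th_l, and the 5×5 maximum-minus-minimum range stands in for the activity-masking threshold Th_t; both simplifications, and the function name, are assumptions of this sketch:

```python
import numpy as np

def nlc(o, th_l=8.0, lam=0.5):
    """Noticeable local contrast, Eqs. (4)-(5). A constant th_l stands in
    for the luminance-adaptation curve of Fig. 1, and the 5x5 max-min
    range stands in for the activity-masking threshold Th_t."""
    o = np.asarray(o, dtype=float)
    pad = np.pad(o, 2, mode="edge")
    win = np.lib.stride_tricks.sliding_window_view(pad, (5, 5))
    o_bar = win.mean(axis=(2, 3))                       # 5x5 local average
    th_t = win.max(axis=(2, 3)) - win.min(axis=(2, 3))  # local activity
    jnd = th_l + lam * th_t / th_l                      # Eq. (4)
    diff = np.abs(o - o_bar)
    return np.where(diff <= jnd, 0.0, diff / jnd)       # Eq. (5)
```

On a flat region the NLC is zero, while a pixel that differs from its neighborhood by more than the JND receives a positive contrast value.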

Our remaining problem is how to effectively enhance this NLC by the EHS. In general, the increase of the NLC is advantageous only for the edge pixels. This is because the increase in the non-edge pixels tends to produce annoying artifacts. From this viewpoint, we propose an algorithm that preprocesses the edge pixels so that the NLC can be successfully enhanced by the newly defined IEHS process, whose flowchart is shown in Fig. 2.

First, the Sobel edge mask is applied to the input image o. Then, the binary edge map, Ω_o, is obtained by

\Omega_o(x, y) = \begin{cases} 1, & \text{if } G(x, y) > Th_e \\ 0, & \text{otherwise} \end{cases}    (6)

where G is the Sobel-filtered result and Th_e is set to 30 [8].

While the edge map is constructed, the original HS is performed using o and the given desired histogram h. After HS, the specified image, d, and the HS mapping function, F_h, are obtained. Specifically, F_h maps the graylevels between o and d. Here, the objective of applying the original HS is to approximately estimate the change of the NLC. If the NLC at the edge pixels is not increased by the original HS, it is beneficial to modify the original pixel value such that sufficient NLC enhancement can be achieved by the following EHS process.

For each edge pixel, therefore, the NLCs of o and d, c_o and c_d, are compared. c_o is defined in (5), and c_d is obtained simply by replacing o with d. Then, at an edge position (x, y), if c_d(x, y) ≥ c_o(x, y), no modification is required since the NLC is already increased. Otherwise, the original pixel value needs to be updated. In this case, in order to find a proper input pixel value, the desired output pixel value is estimated. Our assumption on the desired HS result is that the NLC should be increased, or at least maintained, for the edge pixels. Therefore, if c_d(x, y) < c_o(x, y), d(x, y) is updated to d̃(x, y) in such a way that c_d(x, y) is mapped into c_o(x, y), i.e.,

\tilde{d}(x, y) = \begin{cases} \bar{d}(x, y) + c_o(x, y) \cdot jnd(x, y), & \text{if } d(x, y) > \bar{d}(x, y) \\ \bar{d}(x, y) - c_o(x, y) \cdot jnd(x, y), & \text{otherwise} \end{cases}    (7)

where d̄(x, y) is the average pixel value of d inside the 5×5 mask centered at (x, y). In (7), the local contrast of d is emphasized by increasing the difference between the current pixel value and its local average. Consequently, a possible loss of the NLC at the edge pixels can be alleviated. This can be checked in (5) by replacing o with d̃. Up to this step, the desired specified image is approximated by preventing the decrease of the NLC at the edge pixels. The remaining problem is to convert the original image in such a way that the resultant EHS image resembles the updated image, d̃. To this end, the input pixel value is modified to produce a pixel value close to the updated value. This is simply done by

Fig. 1. Visibility threshold against background luminance [8].


\tilde{o}(x, y) = \arg\min_k \left| F_h(k) - \tilde{d}(x, y) \right|.    (8)

By modifying the input pixel value in advance, the undesired loss of the NLC can be prevented. Note that the original HS is used to estimate the output pixel values of the EHS. This simple HS can be replaced by the conventional EHS algorithms [1, 4] at the expense of computational overhead. However, since our objective is only to approximate the output, the complicated EHS process is not necessary in the preprocessing stage. Also, the proposed preprocessing differs from the conventional contrast enhancement algorithms [12-14] in that the pixel values are controlled by the desired histogram. Since the input image is tuned with consideration of the specified histogram, the following EHS can produce a perceptually enhanced image without a loss of the NLC.
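The core of the preprocessing is the push in (7): the HS output d is moved away from its 5×5 local mean by c_o times jnd so that the NLC survives the subsequent EHS. A sketch of that single step (the helper name is an assumption; it is applied to every pixel here for brevity, whereas the algorithm above updates only the edge pixels whose NLC decreased, and a real implementation would clip the result to the valid graylevel range):

```python
import numpy as np

def update_desired(d, c_o, jnd):
    """Eq. (7): push the HS output d away from its 5x5 local mean d_bar by
    c_o * jnd, so the noticeable local contrast is preserved."""
    d = np.asarray(d, dtype=float)
    pad = np.pad(d, 2, mode="edge")
    win = np.lib.stride_tricks.sliding_window_view(pad, (5, 5))
    d_bar = win.mean(axis=(2, 3))
    step = np.asarray(c_o, dtype=float) * np.asarray(jnd, dtype=float)
    return np.where(d > d_bar, d_bar + step, d_bar - step)
```

Pixels brighter than their local mean are pushed further up, and darker pixels further down, which is exactly the contrast-emphasizing behavior described after (7).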

3.2 Pixel Ordering Method

After applying the proposed preprocessing algorithm, the strict pixel ordering should be performed for the EHS. As described in Section II, the neighboring pixels are necessary to discriminate the pixels of the same graylevel. In [1], the filter masks in (3) are sequentially examined until all the pixels are prioritized. The basic intuition behind this approach is that the local average pixel intensities can be viewed as low-resolution information. Since the HVS perceives a local region, not a certain pixel, it is reasonable to consider the low-resolution pixel values. In order to further utilize the HVS characteristics, the strict ordering is performed in the wavelet domain [4]. In the wavelet-domain sorting, the horizontal, vertical, and diagonal frequency bands are additionally compared since the significant image information, such as edges, is contained in those bands.

In the proposed EHS, a pixel ordering similar to [4] is used. It is evident that the HVS is sensitive to the high-frequency bands. However, for the pixel-ordering purpose, it is not necessary to differentiate the high-frequency components. Thus, only the difference between the current and neighboring pixels is considered in the proposed pixel ordering. To this end, the filter masks are designed as follows:

φ1 = [1],

φ2^l = (1/5) [0 1 0; 1 1 1; 0 1 0],   φ2^h = [0 −1 0; −1 4 −1; 0 −1 0],

φ3^l = (1/9) [1 1 1; 1 1 1; 1 1 1],   φ3^h = [−1 −1 −1; −1 8 −1; −1 −1 −1].  (9)

The larger masks are similarly defined by enlarging the filter support while keeping the symmetry. It can be seen that only the masks φi^h are additionally included compared to (3). In the proposed pixel ordering, the pixels are first ordered based on the pixel value, i.e., the result using φ1. Then, for the pixels of the same gray level, the remaining masks are sequentially examined until all pixels are sorted. Note that φi^h measures the local deviation of the current pixel from the average of the neighboring pixels. When the local average values are the same, i.e., the results of φi^l are equivalent, the results of φi^h are compared to enhance the local contrast. For instance, if the current pixel is darker than the neighboring pixels, it is advantageous to map that pixel to an even darker value to increase the local contrast. This can be accomplished by sorting the outputs of φi^h. In addition, the ordering failure problem in [1], which occurs when all the local averages are exhausted without successfully ordering the pixels, can be effectively alleviated.
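The ordering above is a lexicographic sort: gray level first, then low-pass responses, then high-pass responses as tie-breakers. A minimal sketch using only φ1, φ2^l, and φ2^h (the full method continues with larger masks until the order is strict; function names are hypothetical):

```python
import numpy as np

def conv3(img, kernel):
    """3x3 'same' convolution with edge replication (helper)."""
    p = np.pad(img.astype(float), 1, mode='edge')
    h, w = img.shape
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * p[dy:dy + h, dx:dx + w]
    return out

def strict_order(img):
    """Return a permutation of pixel indices: primary key is the gray
    level (phi_1); ties are broken by the low-pass response phi_2^l,
    then the high-pass response phi_2^h."""
    phi2_l = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]]) / 5.0
    phi2_h = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]])
    keys = np.stack([img.ravel().astype(float),
                     conv3(img, phi2_l).ravel(),
                     conv3(img, phi2_h).ravel()])
    # np.lexsort treats the LAST row as the primary key, so reverse rows.
    return np.lexsort(keys[::-1])
```

Both kernels are symmetric, so convolution and correlation coincide here; remaining ties after φ2^h would be resolved by the larger masks in (9).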

Notice that the proposed pixel ordering method also depends on the pixel value. Without the use of the

Fig. 2. Flow chart of the proposed preprocessing algorithm.


Jung: Exact Histogram Specification Considering the Just Noticeable Difference

56

preprocessing technique, therefore, the proposed IEHS may not provide a significant difference compared to the conventional EHS algorithms. Together with the preprocessing, however, a possible loss of the NLC can be avoided and the local contrast can be further enhanced by the EHS.

4. Experimental Results

In order to evaluate the performance of the proposed IEHS algorithm, four grayscale images, “Cameraman”, “Clock”, “Leopard”, and “Candy”, in Fig. 3 are tested. The histograms of these four images are shown in Fig. 4. Then, the uniform histogram is assumed as the desired histogram. Since the IEHS is based on the strict ordering, the resultant images can have the exact uniform histogram. For the performance evaluation, the visual quality of the resultant images is compared. In [4], it was shown that Wan et al.’s algorithm slightly outperforms Coltuc et al.’s algorithm [1]. Thus, for the visual quality evaluation, we compare the proposed IEHS only to Wan et al.’s algorithm. Hereafter, the conventional algorithm represents Wan et al.’s

algorithm unless otherwise mentioned. Fig. 5 shows the results of the conventional and

proposed algorithms for the uniform specified histogram. Since the proposed technique consists of the two steps, the resultant images after the preprocessing step are shown in the first column. Although the resultant histograms for each target histogram are the same for both algorithms, the proposed algorithm provides superior visual quality compared to the conventional algorithm. As can be seen, the loss of the local contrast is noticeable in the results of the conventional algorithm. Since the proposed preprocessing can prevent the loss of the NLC and the proposed pixel ordering can further emphasize the local contrast, the resultant images obtained by the IEHS have sharper image details.

In order to evaluate the performance objectively, the original images in Fig. 3 are degraded by reducing the contrast and inducing image blur. To this end, the pixel values are multiplied by k, and Gaussian blur with a variance of σ² is applied. In Table 1, the peak signal-to-noise ratio (PSNR) results are provided for different k and σ² values. Since the original images are given in this simulation, the histograms of the original images are set as the desired histograms. By comparing the EHS results on the degraded images with the original images, the image restoration capability can be assessed. The objective performance comparison with respect to the PSNR shows that the proposed IEHS outperforms the conventional technique. Note that the effect of image noise is not considered, even though histogram-based image processing techniques are inherently sensitive to image noise. Thus, when dealing with noisy images, denoising algorithms should be employed before applying the EHS technique.
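The degradation used in this simulation (contrast scaling by k followed by Gaussian blur) and the PSNR measure can be sketched as follows. Note the paper specifies the blur variance σ², so the standard deviation passed in is √(σ²); function names are illustrative, not the authors' code:

```python
import numpy as np

def gaussian_kernel1d(sigma):
    """Normalized 1D Gaussian kernel with radius ~3*sigma."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x * x / (2.0 * sigma * sigma))
    return k / k.sum()

def degrade(img, k, sigma):
    """Reduce contrast by factor k, then apply separable Gaussian blur."""
    g = gaussian_kernel1d(sigma)
    out = img.astype(float) * k
    out = np.apply_along_axis(lambda r: np.convolve(r, g, mode='same'), 1, out)
    out = np.apply_along_axis(lambda c: np.convolve(c, g, mode='same'), 0, out)
    return np.clip(out, 0.0, 255.0)

def psnr(ref, test):
    """Peak signal-to-noise ratio in dB for 8-bit images."""
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)
```

For a Table-1-style entry, one would degrade with, e.g., k = 0.5 and sigma = 1.0 (σ² = 1), apply the EHS with the original histogram as the target, and compute the PSNR against the original.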

Table 2 compares the ratio of the computational

Fig. 3. Test images of 256×256 size.

Fig. 4. The histograms of the original images.

Table 1. PSNR (dB) results for the degraded images.

Algorithm  (k, σ²)    Cameraman  Clock  Leopard  Candy
Wan's      (0.5, 1)   26.33      29.08  25.62    37.30
           (0.5, 2)   24.66      27.34  23.86    35.04
           (0.5, 4)   23.78      26.40  22.90    33.87
           (0.25, 1)  26.21      28.93  25.51    36.70
           (0.25, 2)  24.58      27.21  23.77    34.66
           (0.25, 4)  23.72      26.31  22.81    33.57
Proposed   (0.5, 1)   26.62      29.44  25.94    37.85
           (0.5, 2)   24.85      27.59  24.09    35.46
           (0.5, 4)   23.89      26.57  23.07    34.20
           (0.25, 1)  26.71      29.53  26.06    37.23
           (0.25, 2)  24.90      27.65  24.18    35.17
           (0.25, 4)  23.90      26.60  23.15    33.97

Table 2. The comparison of the computational complexity (ratio).

          Cameraman  Clock  Leopard  Candy
uniform   1.01       1.02   1.08     0.93


complexity between the proposed and Wan et al.'s methods, where the processing time of the proposed method is divided by that of Wan et al.'s. We see that, even though the processing time depends on the image characteristics, the computational complexity of the proposed IEHS is comparable to that of Wan et al.'s method. This is because the computational overhead required for preprocessing is mainly compensated by the proposed pixel ordering. In the ordering stage, we empirically found that the pixel ordering algorithm in [1] utilizes the filter masks up to φ6 or φ7 for the smooth regions. However, in the proposed method, the pixel ordering is frequently finished at φ2^h or φ3^h in such a case. Thus, the proposed preprocessing algorithm does not deteriorate the total computational complexity.

5. Conclusion

In this paper, we have presented an IEHS algorithm consisting of preprocessing and strict pixel ordering. In the preprocessing, the pixel values are modified to prevent the loss of the NLC by alleviating the dependency of the EHS result on the pixel value. Then, the pixel ordering concerning local contrast enhancement is applied to exactly specify the desired histogram. Compared to the conventional EHS algorithm, the proposed IEHS algorithm provides better image enhancement performance.

The proposed algorithm is applicable to a wide variety of multimedia devices. For instance, a user may specify a desirable histogram, or the multimedia device can provide a suitable histogram by a certain algorithm. In such a case, the proposed algorithm can not only exactly specify the desired histogram but also provide a perceptually pleasant image. Fast implementation and speedup issues remain as future work.

References

[1] D. Coltuc, P. Bolon, and J.-M. Chassery, "Exact histogram specification," IEEE Trans. Image Process., vol. 15, no. 5, pp. 1143–1152, May 2006.

Fig. 5. The visual quality comparison for the uniform target histogram. First column: resultant images after preprocessing; second column: specified images by the conventional algorithm; third column: magnified regions of the second column; fourth column: specified images by the proposed algorithm; fifth column: magnified regions of the fourth column. The images are best viewed in the electronic version.

[2] Y. J. Zhang, "Improving the accuracy of direct histogram specification," Electron. Lett., vol. 28, no. 3, pp. 213–214, Jan. 1992.

[3] S. Kundu, "A solution to histogram-equalization and other related problems by shortest path methods," Pattern Recognit., vol. 31, no. 3, pp. 231–234, Jun. 1998.
[4] Y. Wan and D. Shi, "Joint exact histogram specification and image enhancement through the wavelet transform," IEEE Trans. Image Process., vol. 16, no. 9, pp. 2245–2250, Sep. 2007.
[5] T. Arici, S. Dikbas, and Y. Altunbasak, "A histogram modification framework and its application for image contrast enhancement," IEEE Trans. Image Process., vol. 18, no. 9, pp. 1921–1935, Sep. 2009.
[6] T.-L. Ji, M. K. Sundareshan, and H. Roehrig, "Adaptive image contrast enhancement based on human visual properties," IEEE Trans. Med. Imag., vol. 13, no. 4, pp. 573–586, Dec. 1994.
[7] A. B. Watson, J. Hu, and J. F. McGowan III, "DVQ: a digital video quality metric based on human vision," J. Electron. Imaging, vol. 10, no. 1, pp. 20–29, 2001.
[8] W. Lin, L. Dong, and P. Xue, "Visual distortion gauge based on discrimination of noticeable contrast changes," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 7, pp. 900–909, Jul. 2005.
[9] I. Hontsch and L. Karam, "Adaptive image coding with perceptual distortion control," IEEE Trans. Image Process., vol. 11, no. 3, pp. 213–222, Mar. 2002.
[10] M. P. Eckert and A. P. Bradley, "Perceptual quality metrics applied to still image compression," Signal Process., vol. 70, pp. 177–200, 1998.
[11] C.-C. Sun, S.-J. Ruan, M.-C. Shie, and T.-W. Pai, "Dynamic contrast enhancement based on histogram specification," IEEE Trans. Consum. Electron., vol. 51, no. 4, pp. 1300–1305, Nov. 2005.
[12] D.-C. Chang and W.-R. Wu, "Image contrast enhancement based on a histogram transformation of local standard deviation," IEEE Trans. Med. Imag., vol. 17, no. 4, pp. 518–531, Aug. 1998.
[13] A. Beghdadi and A. Le Négrate, "Contrast enhancement technique based on local detection of edges," Computer Vision, Graphics, and Image Processing, vol. 46, no. 2, pp. 162–174, May 1989.
[14] A. Polesel, G. Ramponi, and V. Mathews, "Image enhancement via adaptive unsharp masking," IEEE Trans. Image Process., vol. 9, no. 3, pp. 505–510, Mar. 2000.

Seung-Won Jung received the B.S. and Ph.D. degrees in electrical engineering from Korea University, Seoul, Korea, in 2005 and 2011, respectively. He was a Research Professor with the Research Institute of Information and Communication Technology, Korea University, from 2011 to 2012. He was a Research Scientist with the Samsung Advanced Institute of Technology, Yongin-si, Korea, from 2012 to 2014. He is currently an Assistant Professor at the Department of Multimedia Engineering, Dongguk University, Seoul, Korea. He has published over 30 peer-reviewed articles in international journals. His current research interests include image enhancement, image restoration, video compression, and computer vision.

Copyrights © 2014 The Institute of Electronics and Information Engineers


IEIE Transactions on Smart Processing and Computing, vol. 3, no. 2, April 2014 http://dx.doi.org/10.5573/IEIESPC.2014.3.2.59

59

IEIE Transactions on Smart Processing and Computing

Bounding volume estimation algorithm for image-based 3D object reconstruction

Tae Young Jang1, Sung Soo Hwang1, Hee-Dong Kim1, and Seong Dae Kim1

1 Department of Electrical Engineering, KAIST / Daejeon 305-701 South Korea {tyjang1020, dreamerjoe, khdong98, sdkim}@kaist.ac.kr

* Corresponding Author: Seong Dae Kim

Received November 20, 2013; Revised December 27, 2013; Accepted February 12, 2014; Published April 30, 2014

* Short Paper

Abstract: This paper presents a method for estimating the bounding volume for image-based 3D object reconstruction. The bounding volume of an object is a three-dimensional space where the object is expected to exist, and the size of the bounding volume strongly affects the resolution of the reconstructed geometry. Therefore, the size of a bounding volume should be as small as possible while it encloses the actual object. To this end, the proposed method uses a set of silhouettes of an object and generates a point cloud, which is then refined using a point filter. A bounding volume is determined as the minimum sphere that encloses the point cloud. The experimental results show that the proposed method generates a bounding volume that encloses an actual object while remaining as small as possible.

Keywords: Bounding volume, Point consistency, Object reconstruction

1. Introduction

A visual hull is a three-dimensional (3D) entity computed from multi-view silhouettes [1]. Each silhouette of an object defines a generalized cone that encloses the object when back-projected to 3D space, and a visual hull is generated by the intersection of these cones. A visual hull has been widely used for image-based reconstruction since it reflects the overall structure of an object and its computation is relatively simple.

Since A. Laurentini proposed the concept of a visual hull, a variety of algorithms have been proposed to compute a visual hull efficiently [2-5]. In most of these algorithms, computing a bounding volume precedes the actual geometry reconstruction. A bounding volume of an object is a 3D space where the actual object is expected to exist [2, 3]. The accuracy of a visual hull increases with an increasing number of voxels or segments in a bounding volume, since the voxels or segments in a bounding volume are used to compute the visual hull. In other words, when the number of voxels or segments is fixed, the more compact a bounding volume is, the more accurate the visual hull is. Hence, to compute a precise visual hull, it is important to determine a compact bounding volume.

Although the size of a bounding volume affects the accuracy of a visual hull, few studies have been conducted

on the issue. Most studies on visual hull computation handle the bounding volume estimation using simple methods [2, 3, 6-9]. Some studies estimated a bounding volume as a 3D space that is visible from all cameras, i.e., a bounding volume is estimated according to the location of the cameras [6-9] (Fig. 1). However, the computed visual hull can be inaccurate if an object is much smaller than the estimated bounding volume, since the number of segments or voxels is insufficient to represent the visual hull. Another study [3] estimated a bounding volume using silhouettes. First, it finds rectangles that enclose the silhouettes. Thereafter, it estimates a bounding volume as an inscribed cylinder of the intersection of the back-projected rectangles. Although this method is simple, the estimated bounding volume does not enclose the intersection of the back-projected rectangles. Hence, a dead zone exists. If an object exists in a dead zone, the estimated bounding volume might not enclose the actual object (Fig. 2). Moreover, a bounding volume can be unnecessarily large if the silhouettes, which are extracted from the images by segmentation, are noisy.

In this paper, we propose a method that estimates a compact bounding volume which encloses an object. The proposed method generates an initial 3D point cloud using 2D polygons that enclose the silhouettes. Valid 3D points, which represent the volume of an object, are then extracted using a point filter. The point filter verifies the validity of


Jang et al.: Bounding volumeestimation algorithm for image-based 3D object reconstruction

60

each 3D point by projecting it into the image planes. Subsequently, it generates a bounding volume that encloses the valid 3D points. The proposed method estimates a more compact bounding volume than previous works. Unlike [3], the proposed method is robust to noisy silhouettes since it extracts the valid points from an initial point cloud with the additional help of the point filter.

This paper is organized as follows: Section 2 explains the proposed method, Section 3 presents the experimental results, and Section 4 presents the conclusions.

2. Proposed method

The proposed method consists of three steps, as shown in Fig. 3. In the first step, an initial 3D point cloud is generated using bounding 2D polygons of silhouettes. The validity of each 3D point is then evaluated by the point filter. In the last step, we construct a bounding volume of an object using the valid 3D points.

2.1 Generation of an initial 3D point cloud

To estimate the valid 3D points that represent the volume of an object, the best method might be to find the intersection volume of the back-projected silhouettes and estimate a bounding volume that encloses that intersection volume. However, this method requires complex computation when the shape of an object is complicated. Therefore, to avoid this problem, the silhouettes are approximated by polygons that enclose them.

After the polygons are found, the edges of the polygons are back-projected into the 3D space to generate planes (Fig. 4). Then, three planes out of all planes are selected without ordering. After selecting the three planes, the proposed method calculates a 3D point as follows:

M x = [π1, π2, π3]^T x = 0,  (1)

where π1, π2, and π3 denote the plane parameter vectors, and x is a three-dimensional point in homogeneous coordinates. Note that rank(M) should be three, since π1, π2, and π3 cannot produce a 3D point if these planes are dependent. This process is conducted for every selection of three planes out of all planes, and produces a set of 3D points V, which is an initial point cloud (Fig. 5). Note that V includes a set, V′, which consists of the valid 3D points that represent the volume of an object.
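The three-plane intersection can be computed by solving a small linear system in inhomogeneous form. A minimal sketch (function names are illustrative; each plane is given as a 4-vector (a, b, c, d) with ax + by + cz + d = 0):

```python
import numpy as np

def plane_intersection(p1, p2, p3):
    """Intersect three planes given as 4-vectors (a, b, c, d) with
    a*x + b*y + c*z + d = 0. Returns the 3D point, or None when the
    planes are dependent (rank(M) < 3)."""
    M = np.array([p1[:3], p2[:3], p3[:3]], dtype=float)
    d = -np.array([p1[3], p2[3], p3[3]], dtype=float)
    if np.linalg.matrix_rank(M) < 3:
        return None            # dependent planes: no unique point
    return np.linalg.solve(M, d)
```

Iterating over all triples of planes (e.g., with itertools.combinations) and discarding the None results yields the initial point cloud V.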

Fig. 1. Bounding volume estimated in [6-9]. If an actual object is much smaller than the estimated bounding volume, the computed visual hull can be inaccurate, since the number of segments or voxels representing a visual hull might be insufficient.

Fig. 2. Bounding volume estimated in [3]. It may not enclose the three-dimensional object.

Fig. 3. Overview of the proposed method.


2.2 Evaluation of an initial point cloud using a point filter

Subsequently, to find the set of valid points V′, the proposed method applies a point filter to the initial point cloud. The point filter projects each 3D point of V onto all image planes. If at least one projection is located outside the corresponding polygon, that element of the initial point cloud is an invalid point (Fig. 6). V′ can be found after conducting this process on all elements of the initial point cloud. The volume of an object can be estimated from the distribution of the elements of the set V′. Through the point filter, the estimated volume of an object is robust to a noisy silhouette if there is at least one noiseless silhouette.
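The point filter described above can be sketched as follows, assuming 3×4 projection matrices and axis-aligned rectangles as the enclosing polygons (as in the experiments); names and the rectangle representation are illustrative:

```python
import numpy as np

def point_filter(points, cameras, rects):
    """Keep only 3D points whose projection lies inside the bounding
    rectangle (xmin, ymin, xmax, ymax) in EVERY view.
    cameras: list of 3x4 projection matrices, one per view."""
    valid = []
    for X in points:
        Xh = np.append(np.asarray(X, dtype=float), 1.0)  # homogeneous
        ok = True
        for P, (xmin, ymin, xmax, ymax) in zip(cameras, rects):
            u, v, w = P @ Xh
            if w <= 0:                       # behind the camera
                ok = False
                break
            x, y = u / w, v / w
            if not (xmin <= x <= xmax and ymin <= y <= ymax):
                ok = False                   # outside the polygon
                break
        if ok:
            valid.append(np.asarray(X, dtype=float))
    return np.array(valid)
```

A general (non-rectangular) polygon would only change the inside-test; the projection and all-views logic stay the same.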

2.3 Estimation of a bounding volume

After estimating the volume of an object, the proposed method estimates a bounding volume. In this process, a variety of methods can be exploited. The simplest method is estimating the minimum cube enclosing the valid points. Another approach is estimating the minimum enclosing sphere [10-12]. This approach manipulates a bounding volume easily since there are only four parameters. Hence, a bounding volume was estimated as a sphere, and Welzl's method was used to calculate the minimum enclosing sphere [12]. This method uses a linear programming algorithm and can estimate the minimum sphere enclosing a set of points in expected O(n) time.
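Welzl's method can be sketched as follows (a direct recursive version for illustration; the expected-O(n) behavior assumes the input points are randomly shuffled beforehand, and production code would typically use the iterative move-to-front variant):

```python
import numpy as np

def sphere_from(R):
    """Smallest sphere with all points of R on its boundary (|R| <= 4).
    The center lies in the affine hull of R and is found by solving
    (p_i - p_0) . c = |p_i - p_0|^2 / 2 within that hull."""
    if len(R) == 0:
        return np.zeros(3), -1.0          # empty: contains nothing
    p = np.array(R, dtype=float)
    if len(p) == 1:
        return p[0], 0.0
    A = p[1:] - p[0]                      # (k-1) x 3 difference matrix
    b = 0.5 * np.einsum('ij,ij->i', A, A)
    lam = np.linalg.lstsq(A @ A.T, b, rcond=None)[0]
    center = p[0] + A.T @ lam
    return center, float(np.linalg.norm(center - p[0]))

def welzl(P, R=()):
    """Welzl's recursive minimum enclosing sphere over point list P,
    with boundary set R (at most 4 points in 3D)."""
    if len(P) == 0 or len(R) == 4:
        return sphere_from(R)
    q, rest = P[0], P[1:]
    c, r = welzl(rest, R)
    if np.linalg.norm(np.asarray(q, dtype=float) - c) <= r + 1e-9:
        return c, r                       # q already enclosed
    return welzl(rest, R + (tuple(q),))   # q must lie on the boundary
```

The returned sphere is the four-parameter bounding volume (center and radius) used in place of a bounding cube.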

3. Experimental Results

Several experiments were carried out to evaluate the performance of the proposed method. We compared our method with the previous methods [3] and [6] using publicly available datasets and our own dataset, i.e., standing man (Fig. 7). The chroma-keying algorithm proposed in [13] was used to acquire silhouettes for our dataset. Silhouettes of the publicly available datasets were provided along with the color images. Table 1 summarizes the datasets. In addition, we defined the number of points on a visual hull as the resolution of a visual hull, and the number of voxels or segments in a bounding volume as the resolution of a bounding volume. In the process of generating a point cloud, a rectangle was used as the polygon. To verify the validity of using a rectangle as a polygon, the lengths of the x-axis, y-axis, and z-axis of the volume of an object estimated using a rectangle and using a silhouette were compared (Table 2). From Table 2, we recognize that a rectangle is a proper approximation of a silhouette.

Tables 3 and 4 show the radius of the bounding volume that is estimated by using the methods in [3, 6] and the

Fig. 4. Planes generated by back-projecting the edges of polygons.

Fig. 5. Initial point cloud generated by three planes out of all planes.

Fig. 6. Evaluation of an initial point cloud using a point filter.

Fig. 7. Standing man (left), Dancer (middle), Capoeira (right). (a) represents the texture, (b) the silhouette.


proposed method with ground-truth silhouettes and noisy silhouettes. In the case of noisy silhouettes, the provided and acquired silhouettes were used. Table 3 shows that the proposed method estimates a smaller bounding volume than those generated using the methods in [3] and [6]. In Table 4, we can see that the proposed method is robust to noisy silhouettes, since the radius of the bounding volume estimated from noisy silhouettes is similar to that estimated from the ground-truth silhouettes. These results show that the proposed method is robust to noisy silhouettes and estimates a compact bounding volume.

Fig. 8 compares the resolution of a visual hull generated using the proposed method, the method in [3], and the method in [6] for each dataset. As the results show, the proposed method generated a high-resolution visual hull for any resolution of a bounding volume. For example, in the dancer dataset, the resolution of the visual hull generated using each method is as follows: the proposed method was 1.30×10^5, the method in [3] was 3.93×10^4, and the

Table 1. Data used for the evaluation.

              Resolution  Views
Standing man  1024x768    11
Dancer data   780x582     8
Capoeira      1004x1004   8

Table 2. Validity of using a rectangle as a polygon.

                                Rectangle  Silhouette
Standing man  Length of x-axis  0.53       0.45
              Length of y-axis  0.71       0.63
              Length of z-axis  1.97       1.86
Dancer        Length of x-axis  1.32       1.29
              Length of y-axis  0.67       0.58
              Length of z-axis  1.84       1.77
Capoeira      Length of x-axis  0.82       0.78
              Length of y-axis  1.97       1.86
              Length of z-axis  0.80       0.57

Table 3. Radius of the bounding volume estimated using ground-truth silhouettes.

                 Standing man  Dancer  Capoeira
Method [6]       2220.16       3.26    2.40
Method [3]       1030.36       1.20    1.12
Proposed method  922.16        0.95    0.97

Table 4. Radius of the bounding volume estimated using noisy silhouettes.

                 Standing man  Dancer  Capoeira
Method [6]       2220.16       3.26    2.40
Method [3]       1298.88       1.72    1.31
Proposed method  922.56        0.95    0.98

Fig. 8. Experimental results for the three datasets. The visual hull computation using the proposed method generates a visual hull with a high resolution. In terms of the resolution of a bounding volume, visual hull computation using the proposed method represents the geometry component of a visual hull despite having a low resolution.


method in [6] was 2.20×10^4 in the case when the resolution of a bounding volume is 5.00×10^8. In the case of the standing man dataset, the resolution of the visual hull was 1.11×10^5, 8.90×10^4, and 4.51×10^4, respectively, and 2.66×10^5, 2.04×10^5, and 9.30×10^4 in the case of the capoeira data, respectively, at a bounding volume resolution of 5.00×10^8. In particular, in the result of the dancer data, the proposed method achieved approximately three to four times higher visual-hull resolution than the previous methods. From these results, the proposed method helped to generate a high-resolution visual hull.

4. Conclusion

In this paper, we proposed a method that estimates a bounding volume enclosing an object using a point filter. To this end, the silhouettes of an object were used. Through the point filter, the volume of an object was estimated even when the silhouettes were noisy. In addition, a sphere was used as the bounding volume since it is represented by only four parameters. Through the proposed method, a visual hull can have a high resolution despite the low resolution of a bounding volume. The experimental results show that the proposed method can be a useful tool for image-based object reconstruction.

Acknowledgement

This work was supported by the Mid-career Researcher Program through the National Research Foundation of Korea (NRF) grant funded by the Ministry of Education, Science and Technology (MEST) (2011-0016298).

The captured performance data were provided courtesy of the research group three-dimensional Video and Vision-based Graphics of the Max-Planck-Center for Visual Computing and Communication (MPI Informatik / Stanford).

The dancer data used in this project was obtained from http://grimage.inrialpes.fr.

References

[1] A. Laurentini, "The visual hull concept for silhouette-based image understanding," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, no. 2, pp. 150-162, February 1994.
[2] Kutulakos et al., "A theory of shape by space carving," International Journal of Computer Vision, vol. 38, no. 3, pp. 199-218, 2000.
[3] J.-S. Lee et al., "Efficient three-dimensional object representation and reconstruction using depth and texture maps," Optical Engineering, vol. 47, no. 1, pp. 017204-1-017204-8, January 2008.
[4] S. Kim et al., "Fast computation of a visual hull," in Proc. of Computer Vision - ACCV 2010, Springer Berlin Heidelberg, pp. 1-10, 2011.
[5] W. Matusik et al., "Image-based visual hulls," in Proc. of the 27th Annual Conference on Computer Graphics and Interactive Techniques, ACM Press/Addison-Wesley, pp. 369-374, 2000.
[6] T. Matsuyama et al., "Real-time three-dimensional shape reconstruction, dynamic three-dimensional mesh deformation, and high fidelity visualization for three-dimensional video," Computer Vision and Image Understanding, vol. 96, no. 3, pp. 393-434, 2004.
[7] O. Grau et al., "A combined studio production system for 3-D capturing of live action and immersive actor feedback," IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 3, pp. 370-380, 2004.
[8] J. Starck et al., "Surface capture for performance-based animation," IEEE Computer Graphics and Applications, vol. 27, no. 3, pp. 21-31, 2007.
[9] C. Theobalt et al., "High-quality reconstruction from multiview video streams," IEEE Signal Processing Magazine, vol. 24, no. 6, pp. 45-57, 2007.
[10] N. Megiddo, "Linear-time algorithms for linear programming in R3 and related problems," in Proc. of the 23rd Annual Symposium on Foundations of Computer Science, 1982.
[11] S. Skyum, "A simple algorithm for computing the smallest enclosing circle," Information Processing Letters, vol. 37, no. 3, pp. 121-125, 1991.
[12] E. Welzl, "Smallest enclosing disks (balls and ellipsoids)," Springer Berlin Heidelberg, 1991.
[13] S. S. Hwang et al., "High-resolution 3D object reconstruction using multiple cameras," Journal of The Institute of Electronics Engineers of Korea, vol. 50, no. 10, pp. 2602-2613, Oct. 2013.

Tae Young Jang received his B.S. degree in electrical engineering from Sejong University, Korea, in 2013. Currently, he is a graduate student of electrical engineering at the Korea Advanced Institute of Science and Technology (KAIST) in Korea. He is interested in 3D object reconstruction.


Sung Soo Hwang received his B.S. degree in electrical engineering from Handong University, Korea, in 2008, and the M.S. degree in electrical engineering from KAIST, Korea, in 2010. Currently, he is a Ph.D. candidate in electrical engineering at KAIST, Korea. His current research interests include computer vision and 3D object compression.

Hee-Dong Kim received his B.S. degree in electrical engineering from Busan University, Korea, in 2005, and the M.S. degree in electrical engineering from KAIST, Korea, in 2007. Currently, he is a Ph.D. candidate in electrical engineering at KAIST, Korea. His research interests include image processing and computer vision.

Seong Dae Kim received his B.S. degree in electrical engineering from Seoul National University, Korea, in 1977, his M.S. degree in electrical engineering from KAIST, Korea, in 1979, and his Dr. Ing. degree in electrical engineering from ENSEEIHT, France, in 1983. Since 1984, he has been a Professor at the Department of Electrical Engineering, KAIST. He is the President emeritus of the Institute of Electronics and Information Engineers (IEIE). His research interests include image processing and computer vision.

Copyrights © 2014 The Institute of Electronics and Information Engineers


IEIE Transactions on Smart Processing and Computing, vol. 3, no. 2, April 2014 http://dx.doi.org/10.5573/IEIESPC.2014.3.2.65 65

IEIE Transactions on Smart Processing and Computing

Design and Road Map of the Next Generation Convergence Security Framework for Advanced Persistent Threat Attacks

Moongoo Lee

Department of Smart IT, Kimpo College / Seoul, South Korea [email protected]

* Corresponding Author: Moongoo Lee

Received November 20, 2013; Revised December 27, 2013; Accepted February 12, 2014; Published April 30, 2014

* Regular Paper

Abstract: An overall security-centered framework is required for responding to infringement accidents, failures, and cyber threats. On the other hand, the correspondence structures of existing administrative, technical, and physical security are weak when responding to complex attacks, because each step is performed independently. This study recognizes all internal and external users as potentially threatening elements and, to perform connectivity analysis of their actions, suggests an intelligent convergence security framework and road map. The suggested convergence security framework was constructed to be independent of automatic frameworks, such as the conventional single solution used as the primary defense against APT, the latest attack type, which repeats its attacks continuously until it achieves its goals. This study suggests a next generation convergence security framework capable of preemptive responses against an APT attack, consisting of the following five hierarchical layers: domain security, domain connection, action visibility, action control, and convergence correspondence. The domain connection layer suggests security instructions and directions for the administrative, physical and technical security domains. The domain security layer maintains consistency of status information among the security domains. The visibility layer for intelligent attack actions consists of a data gathering, comparison and decision cycle. The action control layer controls the visualized actions. Finally, the convergence correspondence layer suggests a corresponding system for before and after an APT attack. The administrative security domain has a security design based on organization, rules, processes, and paper information. The physical security domain is designed to separate the control layer and the facility layer according to whether threats are controllable or uncontrollable.
Each domain action executes visibility and control steps, and is designed to be flexible regarding changes in the security environment. The framework and load map proposed in this study to address APT attacks will be used as an infrastructure for next generation security correspondence.

Keywords: Infringement, Failure, Cyber threat, Administrative, Technical, Physical, Convergence security

1. Introduction

Understanding an Advanced Persistent Threat (APT) attack, a new type of cyber-attack, requires considering the security of the converged correspondence of administrative, physical and technological factors. Without depending on automatic tools and a unitary solution, this study defines the subject data to be protected, and suggests the overall

corresponding structures as follows: strengthening access control, reinforcing end point security, cyber security, encryption of important information, analyzing the strategy of abnormal activities, security education, and the necessity of an exclusive organization taking charge. This study established a security strategy for each domain based on the factors that need to be reinforced for security. Each domain separates the security strategy and methods into defense, analysis, and control. A subdivided strategy designs a convergence security framework based on connectivity and efficiency, and suggests a road map for it.

2. Features of an APT Attack and the Necessity of Improving the Defense Strategy

2.1 Features of an APT Attack

In the process of investigation, entry and control, the attacker defines the target and plans the attacking strategy. In the penetration step, the attacker connects to an individual or a group, and plants malignant code. In the collection step, inside information is collected by operating a remote-control bot. In the leakage step, inside information is leaked through a command and control server. An APT attack is a new type of cyber-attack that repeats continuously until it achieves an objective, using a range of intelligent attacking methods. The detection of an APT threat is difficult, whether before a large scale of losses and damage occurs or after an accident, and even a systemized organization can be damaged. This shows that fragmentary detection alone cannot penetrate the final attacking objective. The previous 8-level defense (DDoS defense, spam or virus, invasion prevention, invasion blocking, DDoS shelter, web firewall, server security, DB access security) and 4-level analysis (analyzing harmful traffic, vulnerability, malignant code and integrated security) process lacks connectivity, and there is a gap. Therefore, it is necessary to penetrate the attacking characteristics and technological defects and consider the converged correspondence of administrative, physical and security factors [1, 2].

Advanced Threat: The entire APT attacking scenario is "intelligent" in that it is organized through considerable time and effort. Therefore, its final objective cannot be penetrated with penetration detection alone.

Persistent Threat: To achieve an objective over a great distance of time, six-step cycles repeat in a spiral life cycle, silently and confidentially, inside or outside the target organization. As the life cycle repeats for 6 months to 1 year, the attacking skills become elaborate and the scale of damage expands.

2.2 APT vs. Mass-market Attack Comparison

An APT attack does not depend on an automatic attacking tool; it is not standardized, and it proceeds over a long time. Most damage is not recognized before the damage is visualized. In some cases, the damage cannot be recognized even after it has occurred [3, 4].

The current converged log analysis system for an APT attack requires an improvement in availability because it only performs simple collection rather than connectivity analysis between the security USB and the server security. To design a converged security framework against an APT attack, the following requirements need to be met [5, 6].

2.3 Correspondence Strategy for the Next Generation Convergence Security Framework of APT Attacks

A conventional multi security layer for the latest attacks has an independent executing process and performs simple collection, except for server security and connectivity analysis of the security USB. Hence, it is a process of simple collection, as shown in Fig. 2. Therefore, although the various controls and monitoring of server security in the management infrastructure are consistent, the efficiency of jobs decreases and the process becomes unsuitable for responding to an intelligent, persistent attack such as APT, because it performs integrated analysis processing only according to integrated log analysis. Therefore, to design a convergence security framework against an APT attack, the following requirements should be satisfied.

Fig. 1. Corresponding Method Gap of Integrated Security for the APT Attack Cycle.

Table 1. Comparison of a Mass-market Attack and an APT Attack.

Division | APT Attack | Mass-market Attack
Target | Particular target | Many and unspecified persons
Purpose | Definite purpose | Indefinite purpose
Malicious code malformation | Presence | Presence
Zero-day exploit | General use | Nonuse
Investigation | Continuous penetration | Minimized
Penetration method | Primarily attack a client PC and approach the final target system | Mostly access a vulnerable system directly
Method of purpose achievement | After attaining authority, act systematically and confidentially | Immediately after attaining authority, act


1. Appropriate domain composition

To defend against an APT attack effectively, the framework should not depend on a unitary solution and automatic tools, and should prepare a corresponding strategy that is information- and human-oriented. It should include all 3 domains (administrative, physical and technological), and convergence security should be added against a composite threat [7].

2. Connectivity of domains

The connectivity of domains should be constructed so that the 3 independent domains are mutually correlated in a convergence aspect. To reinforce correlation analysis, the system should analyze the mutual correlation of connected logs occurring simultaneously and distinguish normal activities. The system should set an application plan for the integrated log analysis results, and suggest targets, relations, exploration, results and possible intrusion types by standardizing the integrated log analysis result templates.
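The mutual correlation of connected logs occurring simultaneously across domains can be illustrated with a short sketch. The log schema, field names and events here are assumptions for illustration only; the paper does not specify a concrete format.

```python
from datetime import datetime, timedelta

# Hypothetical integrated log records from the three security domains.
logs = [
    {"domain": "physical",       "event": "door_entry",    "actor": "u17", "time": datetime(2014, 4, 1, 2, 14)},
    {"domain": "technical",      "event": "db_login",      "actor": "u17", "time": datetime(2014, 4, 1, 2, 15)},
    {"domain": "administrative", "event": "policy_change", "actor": "u03", "time": datetime(2014, 4, 1, 9, 0)},
]

def correlate(logs, window=timedelta(minutes=5)):
    """Pair events by the same actor in different domains within a short
    window -- a toy version of the correlation analysis described above."""
    hits = []
    for i, a in enumerate(logs):
        for b in logs[i + 1:]:
            if (a["actor"] == b["actor"] and a["domain"] != b["domain"]
                    and abs(a["time"] - b["time"]) <= window):
                hits.append((a["actor"], a["event"], b["event"]))
    return hits

# The physical entry and the database login by u17 occur one minute apart,
# so they are flagged as a cross-domain correlated pair.
print(correlate(logs))
```

In a real deployment, the correlated pairs would then be matched against the standardized integrated log analysis result templates the text mentions.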

3. Possibility of preemptive defense

The framework should convert to a preemptive defense posture against an APT attack, going beyond consequence management levels and the passive control of external attacks. Such a posture is needed to make every defense possible against an attack that is becoming increasingly intelligent.

4. Flexibility of environment change

Security threats change and develop continuously. The framework should be changed and developed flexibly, corresponding to changes in the security environment.

3. Design of Convergence Security Frameworks for APT Attack Correspondence

To defend against an APT attack effectively, it should not depend on a unitary solution and automatic tools. In addition, it should prepare a counterstrategy that is oriented to information and humans [8].

3.1 Design of an Administrative, Physical, Technical Security Domain

1. The design of Administrative Security

Administrative security is a composite extension based on governance, and includes physical and technological security to protect intelligence assets. It also suggests guidelines and directions corresponding to compliance and international standards. Administrative security is designed to have 4 criteria (rules, organizations, processes and information), as well as a plan-do-check-act management cycle. Rules are composed based on Standard Compliance and ISMS, ISO-20000 and BS-25999 certification, and the organization is composed of an Information Security Officer, Security Manager, Security Screening Committee and a Related Organization Coordination Scheme.

Fig. 2. A Simple Log Analysis Process of a Conventional Multi Security Layer.

Fig. 3. Example of Process Linkage.

Fig. 4. Design of Administrative Security.

Processes include personal security, emergency countermeasures, backup and recovery, information system introduction, information security education, and security audit. Information includes an Information Protection Policy, an inventory of assets, an Asset Status Report, Infringement Accident Report, Hindrance Report, Audit Report, and Corrective Action Report.

Considering that an insider can act as a medium intentionally or unintentionally, the administrative security corresponding method should be reinforced in terms of social engineering attacks. The personal security (insider or outsider) process from the 4 criteria should be monitored, and corrective actions performed periodically, to address social engineering attack types.

2. The design of Physical Security

Physical security should be designed with 2 criteria, facility and control, as shown in Fig. 5, to protect information assets. The control criterion is based on security equipment and measures, such as a security agent badge, patrol, business continuity, CCTV intrusion monitoring, entrance and exit control, protection area definition, assistance memory management, output and copy control, and clean desk and cable lock. The facility criterion is designed to address both uncontrollable and controllable threats.

To respond to uncontrollable threats, measures such as fences, strengthened building outer walls, strengthened windows, earthquake-resistant design, vibration isolation devices, UPS, generators, and disaster recovery construction are generalized. The controllable part suggests plans for reacting against intrusion, insiders' illegal acts, facility disabilities and fires. This can be entrance and exit control of the data center and computer room against persons without permission. Another control is that even a permitted person cannot be given access to a specific protective area, using a metal detector, X-ray scanner and detecting devices. In addition, it is composed of CCTV, speed gate, metal detector, gate control, ID card, fingerprint, intrusion detection sensor, and retention area.

3. Design of Technical Security

Technical security is planned to protect information assets from DDoS (Distributed Denial of Service) attacks through the logical approach, hacking attempts, and insiders' attack attempts, which makes omni-directional safeguarding possible against external and internal attacks through 8 factors of defense, 10 levels of analysis, and 17 elements of control in technical security systems. Therefore, the strategies of technical security can protect against inside and outside attacks.

The defense criterion starts from attacks from the external internet, and is in charge of the defense of each section, i.e., the network, server, and database. The criterion operates DDoS response equipment, a spam/virus filtering device, a firewall, an intrusion prevention system, a DDoS cyber shelter, a web firewall, a server security solution, a DB access security solution, and internal and external network separation. At the analysis level, it performs secondary analysis and correspondence for events occurring at the defense level. Harmful traffic analysis, integrated security analysis, packet analysis, malignant code analysis, vulnerability analysis, invasion attempt traces, honeynets, and emergency responses are examples of analysis level responses.

The control domain allows only authorized access to information assets. To cut off illegal acts, such as information spills, with technological methods, it mostly concentrates on preventing intentional or unintentional attacks by users working in computer rooms, developers, system operators, and managers. For this, harmful website blocking, radio blocking, remote access control, integrated authentication (single sign-on), an integrated approach console, account management, security USB, PC security, PC data encryption, an integrated file server, and information leakage prevention are used. Recently, as cloud computing spreads, security under a cloud environment has become significant. Even in a cloud computing environment, the security solutions suggested in the control domain are similar.

Fig. 5. Design of Physical Security.


3.2 A Design of Domain Connection Layer

The connection layer of each domain of the next generation convergence security framework represents the intersecting connections of the administrative, physical, and technological security domains, as shown in Fig. 7.

This designs security at 4 intersections. One is the intersection of administrative, physical and technical security in a convergence aspect. The intersection between administrative and technological security, the intersection between administrative and physical security, and the intersection between technological and physical security are the other intersecting connections. Historic information of each security domain should be stored in a database, and the system should be realized to perform security tasks using this information.

3.3 Diagram of an Action Visibility Layer

A feature of an APT attack is that it makes tracking impossible by deleting traces, through long-term elaborate advance preparation, a phased approach toward an objective, and an attempt to destroy the information assets after achieving the objective. To address this, in each section, i.e., the administrative, physical, and technical security responses, processes like 'Business Process Management', 'Entrance & Exit Traceability Management', 'Information Asset Traceability Management', 'Control Terminal (Zero-Client)', and 'Cloud Security' should be visualized by repeating collection, comparison, decision, and notification.

Fig. 6. The Design of Technical Security.

Fig. 7. Diagram of the Domain Connection Layer.

Fig. 8. Block Diagram of the Action Visibility Layer.

Therefore, the action visibility layer, as shown in Fig. 8, should analyze the accumulated data of the connection layer and provide transparency. The target data of action visibility are classified as follows.

Data based on business process management

BPM provides visibility for all task processes performed in data centers or computer rooms. Important task processes represent 4 domains, which are IT planning and systemization, information system introduction and construction, service operation and support, and monitoring and evaluation, with 34 substructure control processes (defined in COBIT 5.0).

Data of access history management

This maintains and manages all the information on who enters and exits the data center or computer center; these records cover residing workers, transient workers, and visitors.

Data of Zero-Client

In order to directly control user terminals, such as desktops, PCs, laptops and tablet PCs, in place of access to the data center or computer room from various terminal units outside the company with a VPN, the organization institutes a Virtual Desktop Infrastructure (VDI), standardizes the terminal environment of users, and simplifies the access routes.

Data of cloud security

Because of efficient resource management and user convenience, the introduction of cloud computing infrastructure is increasing. On the other hand, compared to a conventional computing environment, there are risks from the structural features of the cloud environment, such as the operating system, network, processes, hypervisors and managers. The system visualizes administrative activity records, such as cloud computing environment construction based on the encryption skills of TPM (Trusted Platform Module), access control through hypervisors, authorized booting, and safe storage setup.
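The collection, comparison, decision, and notification cycle applied to these visibility data can be sketched minimally. The baseline set, event names and callback here are invented for illustration; the actual visibility layer would operate on the domain databases described above.

```python
def visibility_cycle(collect, baseline, notify):
    """One pass of the collection -> comparison -> decision -> notification
    cycle for the action visibility layer (illustrative only)."""
    observed = collect()                       # collection
    deviations = [e for e in observed          # comparison against baseline
                  if e not in baseline]
    if deviations:                             # decision
        notify(deviations)                     # notification
    return deviations

# Hypothetical data: known-normal actions vs. one observed cycle of events.
baseline = {"badge_in", "bpm_task", "vdi_login"}
alerts = []
deviations = visibility_cycle(
    collect=lambda: ["badge_in", "usb_mount", "vdi_login"],
    baseline=baseline,
    notify=alerts.append,
)
print(deviations)   # the unlisted 'usb_mount' event is surfaced for review
```

Each of the four data categories (BPM, access history, zero-client, cloud security) would supply its own `collect` source and baseline.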

3.4 A Diagram of the Action Control Layer

The action control layer, shown in Fig. 9, defines abnormal activities using a Business Rule Engine (BRE) to control the visualized activities, and performs performance monitoring by business activity monitoring.

The action control layer consistently detects abnormal actions and makes judgments. It is a process that separates the control terminal from the access routes of the virtual desktop infra-environment, performing a control process by blocking the abnormal activities.

The action control layer performs processes such as 'unusual/aberrant action definition', 'monitoring action definition', 'abnormal action decision', 'abnormal action disconnection', and 'reporting and statistics'.
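The BRE-style decision step can be sketched as a set of predicate rules over an action record: matching any rule classifies the action as abnormal and triggers blocking. The rule contents and field names are invented examples, not rules from the paper.

```python
# A minimal business-rule-engine sketch for the action control layer.
# Each rule is a predicate; any match means the action is abnormal.
RULES = [
    # Hypothetical rule: unauthorized file copy to removable media.
    lambda a: a.get("type") == "file_copy" and a.get("dest") == "usb"
              and not a.get("authorized", False),
    # Hypothetical rule: login outside working hours (07:00-21:59).
    lambda a: a.get("type") == "login" and a.get("hour", 12) not in range(7, 22),
]

def control(action):
    """Return 'block' for actions matching an abnormality rule, else 'allow'."""
    return "block" if any(rule(action) for rule in RULES) else "allow"

print(control({"type": "file_copy", "dest": "usb", "authorized": False}))
print(control({"type": "login", "hour": 10}))
```

The 'reporting and statistics' process would then log every `block` decision for the convergence correspondence layer.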

3.5 Diagram of the Convergence Correspondence Layer

To address intelligent and convergent APT attacks, the system constitutes the visibility and action control layers and, ultimately, convergence security layers that can block abnormal activities before or after an APT attack. The convergence correspondence layer, as shown in Fig. 10, defines the scenario of an APT attack based on known APT intrusion attack cases.

Through simulation processes according to the characteristics of the information assets, it extracts attack-specialized scenarios. A range of attack scenarios are possible, and it defines the scenario for the detection of the whole steps of the scenario and the middle stages as well. For large data analysis, it introduces high-capacity distributed file systems and composes an SDW (security data warehouse). Based on this, it performs typical or atypical data analysis. In addition, based on the SDW, it performs data mining, as well as periodic extraction of APT attack scenarios, and performs the management of changes in convergence security by instituting the convergence correspondence layer.
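Detecting the whole steps of a scenario as well as its middle stages can be sketched as ordered-subsequence matching over an event stream. The stage names follow the attack steps described in Section 2.1; the matching logic is an illustrative simplification, not the paper's algorithm.

```python
# An APT scenario as an ordered list of stages (names mirror Section 2.1).
SCENARIO = ["investigation", "penetration", "collection", "leakage"]

def matched_stages(events, scenario=SCENARIO):
    """Return how many leading stages of the scenario appear, in order,
    in the event stream. A partial count signals a mid-scenario attack."""
    i = 0
    for e in events:
        if i < len(scenario) and e == scenario[i]:
            i += 1
    return i

stream = ["investigation", "login", "penetration", "collection"]
# Three of four stages matched: the attack is detected before leakage.
print(matched_stages(stream))
```

An SDW-backed implementation would run such matches periodically over mined event sequences rather than an in-memory list.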

3.6 Final Design of Convergence Security Framework

Against an APT attack, administrative, physical and technical security realization occurs through the connectivity between administrative and technical security, the connection of administrative and physical security, and the technical and physical security connection, which interconnect each security process of each section; the visibility layer is the first step of the process. The second step blocks abnormal actions by controlling the visualized actions based on the first step.

Finally, the correspondence layer is designed to perform 'APT attack scenario definition', 'big data analysis', 'APT pre-diagnosis and disconnection', 'APT paragnosis and action', and 'reporting and sharing with cooperating institutions', as shown in Fig. 11.

Fig. 9. Block Diagram of the Action Control Layer.

Fig. 10. Block Diagram of the Convergence Correspondence Layer.

4. Load Map of Convergence Security Framework for the APT Attack Security Correspondence

Against an APT attack, the load map of the convergence security framework is constructed as shown in Fig. 12.

The first step of the implementation load map is 'connectivity and visualization'. First, for security area connectivity, the administrative, technical and physical security performance should be connected, and the history information connectivity for events and the log data connection should be continued. Secondly, for 'activity visualization configuration', business process management, exit and entrance information management, information asset access history and event management should be performed, and for the visualized monitor composition, collection, comparison, decision and reporting processes are performed. Thirdly, for 'activity configuration standardization', control terminals such as zero-client, cloud security, desktop virtualization, VDI (virtual desktop infrastructure), SBC (server based computing), SaaS, PaaS, IaaS, etc., are applied.

The second step is 'action control activity'. First of all, abnormal and normal activities are defined. Normal activities perform the business procedures of business process management that are authorized for the users. Abnormal activities are unauthorized activities that reach critical amounts. Secondly, action monitoring is constructed. At this point, the performance processes and event occurrences are monitored based on the BRE. Thirdly, when the access activity is an abnormal or deviated action, the confirmation process should be performed. Fourthly, abnormal performance blocking should be carried out by separating the abnormal activity from the control terminal and VDI route.
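The notion of unauthorized activities "reaching critical amounts" can be sketched as a sliding-window counter per actor that switches from monitoring to blocking at a threshold. The limit, window length and actor IDs are illustrative assumptions.

```python
from collections import deque

class CriticalAmountMonitor:
    """Counts unauthorized activities per actor inside a sliding time window
    and signals blocking once a critical amount is reached (illustrative)."""

    def __init__(self, limit=3, window=60):
        self.limit, self.window = limit, window
        self.events = {}                        # actor -> deque of timestamps

    def record(self, actor, t):
        q = self.events.setdefault(actor, deque())
        q.append(t)
        while q and t - q[0] > self.window:     # drop events outside window
            q.popleft()
        return "block" if len(q) >= self.limit else "monitor"

m = CriticalAmountMonitor(limit=3, window=60)
# Three unauthorized events within 60 seconds cross the critical amount.
print(m.record("u17", 0), m.record("u17", 10), m.record("u17", 20))
```

A `block` result would feed the fourth sub-step, separating the actor's activity from the control terminal and VDI route.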

In the third step of the load map, the 'convergence security step', APT attack scenarios should be defined by collecting the known scenarios and detecting specialized attack scenarios. Secondly, with a high-capacity distributed file system composition and SDW composition, typical or atypical data analysis and large data analysis by data mining should be executed. Thirdly, by setting up SBI (Security Business Intelligence), processed APT detection and blocking functions and APT pre-diagnosis or blocking are composed. Fourthly, the detection of an APT attack that has completely invaded, and APT paragnosis and measures, are continued. Finally, a reporting or sharing system with cooperating institutions is realized.

Fig. 11. Convergence Security Framework for an APT Security Strategy.


The fourth step is the 'operating' step. Firstly, system education for activity control and convergence security, as well as user education and training, is performed. Secondly, convergence control is performed, moving from integrated control security to convergence security. Thirdly, activity control and convergence security performance are carried out, and result evaluation and improvement follow.

5. Conclusion

This paper proposed a next generation convergence security framework and load map for corresponding to APT attacks. The system is designed to have several layers, including the connectivity of each domain and each security layer, the action visibility layer, the action control layer and convergence correspondence, and a load map for each step was suggested. The introduced convergence security framework is designed to manage the changes of converting from an integrated security framework to a convergence security framework by considering connectivity, sections, and steps. For the suggested APT attack correspondence, the framework and load map are used as the infrastructure of a convergence security correspondence strategy for the next generation security response system.

Acknowledgement

This study was supported by KIMPO College’s Research Fund.

References

[1] Command Five Pty Ltd., "Advanced Persistent Threat: A Decade in Review," 2011.

[2] AhnLab, "A Whole New Approach in Combating Advanced Persistent Threats," 2012.

[3] Beth E. Binde, Russ McRee, and Terrence J. O'Conner, "Assessing Outbound Traffic to Uncover Advanced Persistent Threat," 2011.

[4] Blue Coat Labs, "Report: Advanced Persistent Threats," BlueCoat, 2011.

[5] Woo Bong Cheon, Won Hyung Park, and Tai Myoung Chung, "Design and Implementation of APT (Advanced Persistent Threat) Attack Tool Using HTTP Get Flooding Technology," The Journal of Korean Association of Computer Education, vol. 14, no. 6, 2011, pp. 65-73.

[6] Segyun Park, "A Study on Effective APT Attack Defense at Endpoint Level," The Korea Institute of Information Scientists and Engineers 2013 Conference, June 2013, pp. 732-734.

[7] Frankie Li, ran2, "A Detailed Analysis of an Advanced Persistent Threat Malware," SANS Institute InfoSec Reading Room, Oct. 2011.

[8] Moongoo Lee and Chunsock Bae, "Next Generation Convergence Security Framework for Advanced Persistent Threat," Journal of the Institute of Electronics Engineers of Korea, vol. 50, no. 9, September 2013, pp. 92-99.

Fig. 12. Load Map of the Convergence Security Framework.


Moongoo Lee is a Professor in the Dept. of Smart IT at Kimpo College, Seoul, Korea. She is the director of the Computer Society at the IEIE. She received a B.S. in Computer Science from Soongsil University in 1984, an M.S. in Computer Education from Ewha Womans University in 1993, and a Ph.D. in Computer Science from Soongsil University in 2000. Her research interests include Information Security, Algorithm Design, and other fields of Computer Science.



IEIE Transactions on Smart Processing and Computing, vol. 3, no. 2, April 2014 http://dx.doi.org/10.5573/IEIESPC.2014.3.2.74


IEIE Transactions on Smart Processing and Computing

Design of Tourism Application Based on RFID Technology

JiHyun Lee1, JoonGoo Lee1, and SeonWook Kim1

1School of Electrical and Computer Engineering, Korea University, Republic of Korea {chaz, nextia9, seon}@korea.ac.kr * Corresponding Author: SeonWook Kim

Received November 20, 2013; Revised December 27, 2013; Accepted February 12, 2014; Published April 30, 2014

* Regular Paper

Abstract: Automatic identification is pervasive in many areas, and its applicable areas are increasing gradually. 2D bar-code, NFC, and RFID technologies are representative examples of automatic identification. This paper explains the implementation of mobile tourism application software based on RFID technology. The mobile application provides location and navigation information by combining the tag inventory and a web database. The interactions among the user, application and database server are described in detail. This paper proposes a simple way of minimizing the effort to build the entire system by storing URLs for the tags and accessing existing tourism information services through the URLs.

Keywords: RFID, Tourism, Automatic Identification, Mobile, NFC

1. Introduction

Automatic identification technologies, such as barcodes, RFID, etc., have become prevalent in inventory and tracking systems. The use of those technologies has many advantages and attractions, as follows. 1) The technologies link the physical world and the virtual world easily by attaching identification codes to the surfaces of any object. 2) The process of object identification becomes simpler, more consistent, and more intuitive. 3) The technologies are easy to integrate with pre-existing network infrastructures, particularly the Internet. 4) Each technology has evolved to store and retrieve not only a simple identification code, but also useful data and information. 5) Data aggregation and manipulation can be automated easily. Consequently, the technologies are expected to extend their applications to many fields [1, 2].

A two-dimensional barcode [3] was developed based on the one-dimensional barcode to overcome its small data storage capacity. QR Code [4], one of the two-dimensional barcodes well known to the public, has a data storage capacity of up to thousands of characters. The QR code was initially intended to track automotive parts, but the scope of its applicability has been extended to other fields, such as transportation tickets, sushi dishes for freshness control, tags for jewelry certification, etc. Some people print a QR Code on their business cards linked to their homepages or social network service pages.

Another major automatic identification technology is radio frequency identification (RFID) [5]. RFID technology has the potential to improve management efficiency in terms of cost, time and accuracy of information interchange over other technologies. An anti-theft system [6] based on RFID technology has already been incorporated in many stores with an inventory management system [7]. In addition, there is NFC technology, which is derived from RFID technology and works at a considerably short range. NFC has been popularly used in payments [8, 9], automatic pairing of devices for wireless connectivity, such as Bluetooth and Wi-Fi [10, 11], etc. These technologies have different characteristics and are adapted for use in a wide variety of applications ranging from industrial to personal purposes.

Many studies have applied automatic identification technologies to the tourism area [12-14] because they are beneficial for tourism applications in many aspects. Examples include "NFC Internal" [15] for indoor navigation and the Smart Poster [13] for outdoor location and navigation in urban areas or commercial districts. On the other hand, despite these studies and implementations, there is a lack of satisfactory tourism applications using those technologies.

IEIE Transactions on Smart Processing and Computing, vol. 3, no. 2, April 2014

When a tourism application is constructed, it is important to consider the following issues. First, the users are typically unspecified individuals. They require different and various types of information depending on their personal experience and interests. The implementation of a system that satisfies all the users' needs is difficult to achieve and requires enormous effort. Fortunately, a range of tour-related services are already available in the form of web services or mobile applications. Rather than building from the ground up, as in the case study reported by Borrego-Jaraba et al. [13], integrating with pre-existing tour services appears to be a better fit when a tourism application is constructed. The second issue is the cost of deploying and managing the entire system. If a system covers a vast area and many points of interest, the cost of deployment and management increases considerably. Because each automatic identification technology has its pros and cons over the others, it is important to carefully choose the appropriate technology to reduce the cost.

This paper reports a method for resolving the above issues when applying RFID technology to the tourism area. Although many tourism services are available, tourists may feel uncomfortable and find it difficult to obtain useful information due to unfamiliar surroundings. Therefore, a bridge was constructed between tourists and existing tourism services, which makes the process that users undergo to obtain information easier using automatic identification, particularly RFID technology. Instead of accessing the information from each website individually, a tourist can use RFID technology to identify the location and access all the related information from several websites at one time from an RFID tag. To verify this concept, a mobile application prototype was implemented in this study. The application works on an Android-based smartphone and requires an RFID reader dongle. Note that this dongle can read passive 860~960 MHz ultra-high frequency band RFID tags.

The front-end mobile application scans nearby RFID tags, accesses a back-end web server, receives the result of the database search, and shows the requested information. In the application, tourists can see the tag history, i.e., their movement history. Many technologies are available for automatic identification, i.e., the front-end, but the back-end system might be more important because of device independence. In the current approach, the back-end system requires no extra program and consists of only two parts: a web database and server-side script receiver files. The web database stores and manages the data that is provided to a tourist. The receiver script files catch a request from the application, check the tag ID attached to the request, and then return the result of the database search query using a server processor. Because the back-end system is made up of the DBMS and the server-side receiver, it is simple and easy to manage the data. Currently, the database stores the URLs corresponding to a tag ID and the titles of those URLs.

This paper reports two major contributions. First, in the process of implementation, minimal effort was needed to build a mobile tourism application; the user's location identification was simplified by introducing RFID technology. Second, the existing tourism-related information services were integrated into a single application to reduce the redundant searching process, which improved the user experience. As a result, the barrier to entry for developers and users was lowered by suggesting a simple way to utilize RFID technology and to obtain useful information, respectively.

This paper is structured as follows. Section 2 explains the pros and cons of automatic identification technologies and compares them in terms of the tag-reader and front-end/back-end relationships. Section 3 reports the architecture of the proposal and its implementation. Finally, Section 4 presents the conclusions.

2. Background of the Automatic Object Identification System

2.1 Comparisons of Identification Technologies

Many wireless identification technologies exist, such as Radio Frequency IDentification (RFID), Near Field Communication (NFC), and 2D barcodes such as the Quick Response Code (QR code). Note that in this paper, RFID means the passive 860~960 MHz ultra-high frequency (UHF) band RFID technology, and NFC means the 13.56 MHz high-frequency technology. Using QR codes, RFID tags, and NFC tags can be helpful for many applications, such as logistics, augmented data, supply chains, etc. [1, 4, 16]. Table 1 lists the pros and cons of each technology.

Among these three useful technologies, RFID technology was selected. The reasons are as follows. The 2D barcode was excluded first. A 2D barcode is based on a visual sensor and image processing, and its innate line-of-sight characteristics would require more effort to use. In addition, a printed barcode is read-only, which makes it harder to replace with new data than the other technologies; thus, the management cost may increase with long-term use. It is also vulnerable to sabotage, such as scribbling. Finally, a 2D barcode requires more space to embed large data and cannot be superimposed.

Although NFC is already pervasive and supported by many smartphone devices, NFC was not the best choice. First, its reading range of only a couple of centimeters could be difficult to use in tourism scenarios. In contrast, RFID offers a better user experience than NFC because of its longer reading range and multi-tag recognition: RFID does not require the users to move their device close to the tag. Second, users can turn off NFC in the configuration of their devices after using it. Otherwise, the device will search constantly, even if there is no other NFC device nearby, which leads to power wastage. The users must then manually turn NFC on again when they want to use it. To some people, adjusting the configuration of their device might be troublesome. This process was replaced with a plug-and-unplug mechanism using an RFID reader dongle: the users just need to plug in the RFID reader dongle, scan tags, and unplug the dongle when they wish to discontinue use. The application is executed automatically, and there is no further power consumption after unplugging.

Lee et al.: Design of Tourism Application Based on RFID Technology

2.2 Mobility of Readers and Tags

RFID/NFC systems can be classified into three types in terms of the mobility of readers and tags. This paper focuses on the second type.
• 1) Fixed readers and mobile tags: Readers exist at a pre-defined fixed position. They recognize the nearby tags, gather and manipulate the necessary data, and then provide useful information using the data. This type of system can be used in access management systems, toll collection systems, etc.
• 2) Fixed tags and mobile readers: Tags are located at pre-defined fixed positions, such as walls, surfaces, and billboards. Nomadic users with a mobile reader device can read the data of the tags and then use it for special purposes. An information provider system at any point of interest, such as a Smart Poster, could be an example.
• 3) Mobile tags and mobile readers: RFID/NFC tags are attached to objects, which may be mobile. Users with a mobile reader can read the tags' data and/or write/modify the data. An augmented memory system [18] and the Internet of Things are two examples.

2.3 Differences in implementation types

In practice, an application system consists of a front-end system and a back-end system. The front-end system is composed of RFID/NFC tags and an RFID/NFC reader equipped on smart devices. The back-end system comprises a network infrastructure and data servers that are connected to the Internet. The implementation can be divided into three types, as follows.

2.3.1 Front-end centric implementation

RFID/NFC tags keep the most valuable information in their non-volatile memory. A reader extracts the information from the tags and then uses it. This approach makes the least use of the back-end system.
• Pros: An application can be independent of the infrastructure, and there is less need for an Internet connection.
• Cons: It is difficult to update. The size of the front-end might be too big to cover a large area because the system needs to contain almost all the data. The deployment and management cost may increase significantly.
• Possible applications: Underground, mountainous regions, wetlands, and rural areas where the Internet connection is poor.

2.3.2 Back-end centric implementation

Each RFID/NFC tag has only a unique identification code. A reader reads the code and uses it to query useful information via back-end systems. The most valuable information and data are stored in back-end storage, such as databases. The back-end system manages and manipulates the information and data.
• Pros: The front-end can be light, in contrast to the front-end centric implementation.
• Cons: It is difficult to update. The size of the front-end might be too big to cover a large area because the system needs to contain almost all the data. The deployment and management cost may increase significantly.
• Possible applications: Urban areas and tourist spots where a cellular network or Wi-Fi connection is available.

Table 1. Comparison of automatic object identification technologies.

QR code
• Pros: Near zero cost for the deployment of QR codes, and QR codes are easy to make. Large data capacity: up to 2,953 bytes for binary data and 4,296 characters for alphanumeric data.
• Cons: The line-of-sight problem can occur because of vision processing, which can lead to falsification. Read-only after being produced once, and therefore difficult to replace with new data or information. Affected by glare, shadows, and defacement.

NFC
• Pros: Many smartphones and smart pads are already equipped with an NFC function. Mobile payment: more secure than the others. Situation-based profiles: context-aware applications may be provided. Plenty of re-writable non-volatile storage, typically up to a couple of KB. Device-to-device communication is possible with a peer-to-peer mode.
• Cons: Short reading/writing range, typically less than 10 cm and practically less than 1 cm. Because of the polling loop, which is time-multiplexed switching between many modes (reader mode and card emulation mode for many different protocols), it may consume more power. Relatively high cost for tag deployment. A device may perform unintended actions [17].

RFID
• Pros: Longer read/write operation range than NFC, and non-line-of-sight operation. Multi-tag recognition. RFID tags are less expensive than NFC tags. Not a lot, but still affordable, re-writable memory, typically up to a couple of Kbits. Higher data transmission rate.
• Cons: Few mobile readers are available. Hard to use with metal or water because of reflection or blocking. Interference problems may occur because of reader collision.

2.3.3 Hybrid implementation

RFID/NFC tags keep a unique identification code and minimal basic information. A reader reads the code and information, and then provides the basic information to a user without a back-end system. The user may request the relevant information with the code from the back-end system if necessary.
• Pros: The application can offer primary data without an Internet connection. With the Internet, it can obtain more detailed information.
• Cons: Balancing between the front-end and back-end can be difficult.
• Possible applications: From rural areas to urban areas, including museums, palaces, temples, etc. Almost all places can utilize it.

In this paper, the proposed system was implemented using the hybrid concept.

3. Architecture and Implementation of Our System

Fig. 1 shows the organization of the system, which includes 1) an Android smartphone, 2) a mobile RFID tag reader, 3) a Smart Poster embedding RFID tags, and 4) a back-end server that receives and processes the requests from the smartphone.

The process to obtain the information is as follows. 1) A user accesses a Smart Poster using the RFID reader, and the application starts automatically if it has already been installed. 2) On the start-up screen, the user can run an inventory just by touching a button until the tag information is displayed on the screen. In response to this touch, the application enables the mobile RFID tag reader to scan its surroundings; the reader returns the basic information, including the tag ID, from any RFID tags it finds nearby. 3) and 4) The application receives the value and sends it to the back-end server through the Internet to request the information stored in that server's database. 5) The back-end server processes the request and searches for the related record in the database. Normally, the record consists of information such as where the tag is (the same as where the user is), what the place is for, with a simple description, what the surrounding area is like, or where to go among the points of interest. This information is essential for tourism. 6) The application displays the received information on the screen in the form of a list. 7) The user can select one item from the list, and then 8) and 9) the user can see the detailed information from its linked URL.
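The request/response exchange in steps 3) to 6) can be sketched as follows. The endpoint name, query parameter, and JSON payload shape are assumptions for illustration only; the paper does not specify the actual wire format of the receiver script.

```python
import json
from urllib.parse import urlencode

# Hypothetical receiver-script endpoint; the real URL is not given in the paper.
BACKEND = "http://example.com/receiver.php"

def build_lookup_request(tag_id):
    """Build the request the application sends after scanning a tag (steps 3-4)."""
    return BACKEND + "?" + urlencode({"tag_id": tag_id})

def parse_lookup_response(body):
    """Parse the server's answer (steps 5-6) into (title, full URL) pairs."""
    record = json.loads(body)
    return [(item["title"], item["url"]) for item in record["urls"]]

# Canned response standing in for the database search result:
sample = ('{"tag_id": "E200-1234", "urls": '
          '[{"title": "Korea University", "url": "http://www.korea.ac.kr/"}]}')
print(build_lookup_request("E200-1234"))
print(parse_lookup_response(sample))
```

The application then only has to render the returned (title, URL) pairs as the list of step 6).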

Fig. 1. Organization of the proposed system.

Using this solution, the user will easily obtain the necessary information regarding where the person is and where to go among the points of interest, without using a GPS or tiresome web surfing. Therefore, the data in the database should contain the tag ID and URLs at a minimum. The tag ID is from the RFID tag that is embedded in the Smart Poster. This is used to identify where the RFID tag is, and therefore where the user is. The URLs attached to the tag ID are used for location and navigation along the recommended route. Table 2 lists the proposed fields in the database. Currently, only the tag ID, the URLs, and the titles of each URL are stored in the database. The titles of the URLs are input manually. Detailed information, such as pictures, comments from other users, the number of recommendations, and others, might be added to upgrade the front-end.

Each URL is stored in two forms in the database: one is the full URL, and the other is a shortened form of the full URL. This is because many URL shortener services (such as goo.gl¹ or Bitly²) provide statistics on past clicks, referrers, browsers, operating systems, IP-based region information, etc. This may give information providers insightful metadata for tourism services. For example, the number of clicks can help determine which places are popular. However, extra programming and stored data for statistics incur another cost. Therefore, this study used short URLs for statistics. On the other hand, there is one problem: a short URL's destination is not transparent [19]. A long URL (full URL) carries much information regarding its destination. For example, consider 'http://www.korea.ac.kr' without actually accessing it. At least three things can be inferred. 1) The top-level domain name '.kr' means that the host country is Korea³. 2) The second-level domain name '.ac' indicates that this URL is for a college or other academic institution⁴. 3) The third-level domain name 'korea' may refer to the name of the institution. Hence, the URL's destination can be assumed to be an academic institution located in the Republic of Korea; it is in fact the full domain name of Korea University. In contrast, shortened URLs are normally made of a randomly generated, meaningless string⁵. They are known to be vulnerable to cyber-attacks because short URLs work by redirecting visitors to the original long URL [20]. The destination may be a spam site or a site with malicious script code. Therefore, to prevent abuse of the short URLs, it was decided to use both forms: the long URL is shown to users as an indicator of where to go, and the short URL, not shown to the users, is the true URL used for statistics.

¹ http://goo.gl/
² https://bitly.com/
³ http://www.iana.org/domains/root/db
⁴ http://krnic.kisa.or.kr/jsp/domain/domainInfo/krDomainInfo.jsp
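The domain inference described above can be reproduced with a short stdlib-only sketch; the function name and returned dictionary are ours, not part of the paper's system.

```python
from urllib.parse import urlparse

def domain_hints(url):
    """Read a URL's host labels right to left, as in the korea.ac.kr example."""
    labels = urlparse(url).hostname.split(".")
    return {
        "top_level": labels[-1],       # 'kr'  -> host country is Korea
        "second_level": labels[-2],    # 'ac'  -> academic institution
        "third_level": labels[-3] if len(labels) >= 3 else None,  # 'korea'
    }

print(domain_hints("http://www.korea.ac.kr"))
# {'top_level': 'kr', 'second_level': 'ac', 'third_level': 'korea'}
```

A shortened URL's host (e.g., goo.gl) offers no such hints about the final destination, which is exactly the transparency problem noted above.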

3.1 Interaction between the User and Application

If the user connects the mobile RFID reader to a smartphone on which the application is pre-installed, the application is activated automatically. If not, a pop-up box will appear on the screen for installation the first time the application is used. When the application starts, it initially shows an inventory screen. Fig. 2(a) shows the start-up screen. On this screen, the inventory can be run by continuously touching the button. As shown in Fig. 2(a), there is a button for running the inventory that allows the user to obtain the information. While the user keeps touching the button, the application keeps searching for and reading the tag IDs of nearby RFID tag(s) through the mobile RFID reader (Fig. 2(d)). The user can easily notice that the application is scanning RFID tags by visual changes in the button. This RFID tag scanning continues until the user removes his/her finger from the button. Therefore, even when multiple RFID tags are around the user, these RFID tags can be handled sequentially until all the RFID tags that the user wants to read are scanned. The user can stop searching at any time when he/she does not want to scan RFID tag(s) anymore.
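The press-and-hold inventory behavior can be sketched as a simple polling loop. The reader API (`inventory()`) and the button-state callback are hypothetical stand-ins; the actual dongle interface is not described in the paper.

```python
import time

def scan_while_pressed(reader, button_pressed, poll_interval=0.1):
    """Collect distinct tag IDs while the user keeps touching the button.

    Multiple tags near the user are handled sequentially: each newly seen
    tag ID is recorded once, and scanning stops when the button is released.
    """
    seen = []
    while button_pressed():
        for tag_id in reader.inventory():  # one pass may return several tags
            if tag_id not in seen:
                seen.append(tag_id)
        time.sleep(poll_interval)
    return seen

# Tiny demonstration with fake reader/button objects:
class FakeReader:
    def inventory(self):
        return ["tagA", "tagB", "tagA"]

presses = iter([True, True, False])  # button held for two polls, then released
print(scan_while_pressed(FakeReader(), lambda: next(presses), poll_interval=0))
```

The deduplication step is what lets the user sweep over several tags in one press without producing duplicate history entries.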

3.2 Interaction between Application and Server

Fig. 3 shows the process flow on the server side. As soon as the mobile RFID reader reads the tag ID from a tag during a running inventory, the application immediately accesses the web database to obtain the URLs corresponding to the acquired tag ID. The database resides on the specified back-end server, which includes the DBMS.

Under the condition of a constant connection to the Internet, the application starts interacting with the back-end server to process the result of tag scanning. On the back-end server, a server-side script receiver file receives the request from the application and sends the search query to the database. After the server processes the request, the server answers the application. If there is a matching tag ID in the database, the matched tag ID, the URLs, and the titles of the URLs are sent to the smartphone. Finally, the application displays a list of the URLs on the screen while simultaneously storing the URL scanning history. The URL scanning history, as shown in Fig. 2(b), is shown in the 'List' menu.

Table 2. Proposed data structure in this paper.

Essential:
• key id (used only as the key value in the database; not shown to the user)
• Tag ID (unique ID value of the RFID tag)
• URLs (URLs corresponding to a Tag ID; each URL has two forms: the full URL and a shortened version of the full URL)

Optional:
• Title (or something else, such as a simple description)

Extended (not yet implemented, but might be required):
• Picture, text description, comments, number of recommendations, and others
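The tag-ID lookup that the receiver script performs over the data structure of Table 2 might look like the following sketch. SQLite and the table/column names are illustrative assumptions; the paper does not name the actual DBMS or schema.

```python
import sqlite3

# In-memory stand-in for the web database described in Table 2.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tag_urls (
        key_id    INTEGER PRIMARY KEY,  -- database key only, never shown to users
        tag_id    TEXT NOT NULL,        -- unique ID of the RFID tag
        full_url  TEXT NOT NULL,        -- long URL, shown to the user
        short_url TEXT NOT NULL,        -- shortened URL, used for statistics
        title     TEXT                  -- optional title of the URL
    )
""")
conn.execute(
    "INSERT INTO tag_urls (tag_id, full_url, short_url, title) VALUES (?, ?, ?, ?)",
    ("E200-1234", "http://www.korea.ac.kr/", "http://goo.gl/abc123",
     "Korea University"),  # sample row for illustration
)

# The receiver script's query: scanned tag ID in, titles and full URLs out.
rows = conn.execute(
    "SELECT title, full_url FROM tag_urls WHERE tag_id = ?", ("E200-1234",)
).fetchall()
print(rows)
```

Because the database stores only pointers (URLs) rather than the tourism content itself, registering or repairing a tag reduces to inserting or updating rows in this one table.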

The configuration menu of the application in Fig. 2(c) is for tourism information providers. First, using the sign-in function, some special users, such as service providers, can be authorized as administrators. The indication of a normal user or an authorized user is at the bottom right of the screen: a normal user sees the Sign-in button and 'Normal mode' text, while a signed-in user sees the Sign-out button and 'Manager mode' text. The account data of the authorized users is also stored in a separate table of the database. Second, only authorized users can handle unregistered RFID tags to register the proper URLs. If any RFID tag in a Smart Poster is damaged for some reason, changing the physical tag and registering the new virtual ID can be performed simultaneously. This reduces the managerial steps and cost.

In addition, authorized service providers can add a new URL/title to a tag ID, edit data, or delete data in the application at the scanning-history screen. A short touch activates the embedded browser, as on the tag scanning screen, but a long touch invokes a pop-up menu for management (see Fig. 4). Each of the three managerial operations is similar to registering a new tag.

Fig. 5 shows all the processes of the proposed system.

3.3 Interaction between the User and Application Again

Fig. 6(a) shows the information received from the server. Each URL is a link to a web page that contains the location or navigation information. For example, consider the URL https://www.airbnb.co.kr/s/koreauniversity?source=bb. This URL is linked to the Airbnb service, showing search results for accommodation near Korea University.

The geographic information and a descriptive web page can be attached to the RFID tag. Another example is the URL http://www.korea.ac.kr/, which is connected to the official web page of Korea University; from it, the user will be able to learn what the location is for. In this way, the application identifies the RFID tag IDs and provides a range of information for the user's satisfaction.

Fig. 2. Running application screenshots and picture: (a) Start-up screen, (b) Showing history screen, (c) Configuration screen, (d) Scanning.

Fig. 3. Server-side processing.

Fig. 4. Managing data in application.

Fig. 5. Flowchart of our system.

Fig. 6. Running application: (a) Start-up screen, (b) Showing history screen, (c) Configuration screen.

In this scenario, as shown in Fig. 6(b), the user selects an item among the proposed web page URLs, and then the embedded web browser accesses the URL automatically. The result of selecting an item is shown in Fig. 6(c).

4. Conclusion

A mobile location and navigation system was constructed with minimal effort using existing methods. The location identification was simplified by introducing RFID technology. The navigation and information about points of interest based on the location identification were extracted from the existing tourism services. The extracted information was then integrated into a single application to reduce the redundant searching process, which was expected to improve the user experience. In managerial aspects, a change in content does not require modifying the system, because the actual information comes from each tourism information service, such as Airbnb, Foursquare, or Yelp, not from our database; the database just has URLs as pointers indicating which web page to go to. As a result, the barrier to entry for users and developers was lowered by suggesting a simple way to obtain useful information for their trip and to utilize RFID technology in the tourism area, respectively.

Acknowledgement

This study was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. NRF-2011-0020128).

References

[1] X. Zhu, S. K. Mukhopadhyay, and H. Kurata, "A Review of RFID Technology and Its Managerial Applications in Different Industries," Journal of Engineering and Technology Management, vol. 29, no. 1, pp. 152-167, 2012.
[2] T.-W. Kan, C.-H. Teng, and M. Y. Chen, "QR Code Based Augmented Reality Applications," in Handbook of Augmented Reality, Springer, pp. 339-354, 2011.
[3] T. Sun and D. Zhou, "Automatic Identification Technology - Application of Two-dimensional Code," in Automation and Logistics (ICAL), 2011 IEEE International Conference on, IEEE, pp. 164-168, 2011.
[4] T. J. Soon, "QR Code," Synthesis Journal, pp. 59-78, 2008.
[5] R. Want, "An Introduction to RFID Technology," IEEE Pervasive Computing, vol. 5, no. 1, pp. 25-33, 2006.
[6] C. M. Roberts, "Radio Frequency Identification (RFID)," Computers & Security, vol. 25, no. 1, pp. 18-26, 2006.
[7] B. Nath, F. Reynolds, and R. Want, "RFID Technology and Applications," IEEE Pervasive Computing, vol. 5, no. 1, pp. 22-24, 2006.
[8] J. Ondrus and Y. Pigneur, "An Assessment of NFC for Future Mobile Payment Systems," in Management of Mobile Business (ICMB 2007), International Conference on, IEEE, pp. 43-43, 2007.
[9] M. Fisher, "Conducting an Online Payment Transaction Using an NFC Enabled Mobile Communication Device," US Patent 8,352,323, Jan. 8, 2013.
[10] G. Madlmayr, J. Langer, C. Kantner, and J. Scharinger, "NFC Devices: Security and Privacy," in Availability, Reliability and Security (ARES 08), Third International Conference on, IEEE, pp. 642-647, 2008.
[11] J. Haartsen, "Bluetooth - The Universal Radio Interface for Ad Hoc, Wireless Connectivity," Ericsson Review, vol. 3, no. 1, pp. 110-117, 1998.
[12] K. Hozak, "Managerial Guidance for Applying RFID in the Tourism Industry," Interdisciplinary Journal of Contemporary Research in Business, vol. 4, no. 2, pp. 18-30, 2012.
[13] F. Borrego-Jaraba, I. Luque Ruiz, and M. A. Gómez-Nieto, "A NFC-based Pervasive Solution for City Touristic Surfing," Personal and Ubiquitous Computing, vol. 15, no. 7, pp. 731-742, Oct. 2011.
[14] J. Pesonen and E. Horster, "Near Field Communication Technology in Tourism," Tourism Management Perspectives, vol. 4, pp. 11-18, 2012.
[15] B. Ozdenizci, K. Ok, V. Coskun, and M. Aydin, "Development of an Indoor Navigation System Using NFC Technology," in Information and Computing (ICIC), 2011 Fourth International Conference on, pp. 11-14, 2011.
[16] B. Benyo, A. Vilmos, K. Kovacs, and L. Kutor, "NFC Applications and Business Model of the Ecosystem," in Mobile and Wireless Communications Summit, 2007, 16th IST, IEEE, pp. 1-5, 2007.
[17] J. J. Gummeson, B. Priyantha, D. Ganesan, D. Thrasher, and P. Zhang, "Engarde: Protecting the Mobile Phone from Malicious NFC Interactions," in Proceedings of the 11th Annual International Conference on Mobile Systems, Applications, and Services (MobiSys '13), ACM, New York, NY, USA, pp. 445-458, 2013.
[18] R. Barthel, K. Leder Mackley, A. Hudson-Smith, A. Karpovich, M. Jode, and C. Speed, "An Internet of Old Things as an Augmented Memory System," Personal and Ubiquitous Computing, vol. 17, no. 2, pp. 321-333, Feb. 2013.
[19] A. Neumann, J. Barnickel, and U. Meyer, "Security and Privacy Implications of URL Shortening Services," in Proceedings of the Workshop on Web, vol. 2, 2011.
[20] F. Klien and M. Strohmaier, "Short Links Under Attack: Geographical Analysis of Spam in a URL Shortener Network," in Proceedings of the 23rd ACM Conference on Hypertext and Social Media, ACM, pp. 83-88, 2012.
[21] D. Guinard, C. Floerkemeier, and S. Sarma, "Cloud Computing, REST and Mashups to Simplify RFID Application Development and Deployment," in Proceedings of the Second International Workshop on Web of Things, ACM, p. 9, 2011.

Jihyun Lee received her BS degree in Electrical Engineering from Korea University, Seoul, Korea, in 2014. Currently, she is a graduate student in the School of Electrical Engineering at Korea University, Korea. Her current research interests include mobile applications and mobile web browser optimization.

Joon Goo Lee received his BS and MS degrees in electrical engineering from Korea University, Seoul, Korea, in 2005 and 2008, respectively, and is pursuing a Ph.D. degree in electrical engineering at Korea University. His research interests include passive RFID systems and their new applications, ultra low-power wireless communication systems, efficient protocols, and machine-to-machine interfaces.

Seon Wook Kim received his BS in Electronics and Computer Engineering from Korea University, Seoul, Republic of Korea, in 1988. He received his MS in Electrical Engineering from Ohio State University, Columbus, Ohio, USA, in 1990, and his Ph.D. in Electrical and Computer Engineering from Purdue University, West Lafayette, Indiana, USA, in 2001. He was a senior researcher at the Agency for Defense Development from 1990 to 1995, and a staff software engineer at Inter/KSL from 2001 to 2002. Currently, he is a professor in the School of Electrical and Computer Engineering at Korea University and Associate Dean for Research at the College of Engineering. His research interests include compiler construction, microarchitecture, and SoC design. He is a senior member of ACM and IEEE.

Copyrights © 2014 The Institute of Electronics and Information Engineers


IEIE Transactions on Smart Processing and Computing, vol. 3, no. 2, April 2014 http://dx.doi.org/10.5573/IEIESPC.2014.3.2.83


Energy-Efficient Opportunistic Interference Alignment With MMSE Receiver

Won-Yong Shin1 and Jangho Yoon2

1 Department of Computer Science and Engineering, Dankook University / Yongin, Republic of Korea [email protected] 2 Department of Electrical Engineering, KAIST / Daejeon, Republic of Korea [email protected] * Corresponding Author: Won-Yong Shin

Received November 20, 2013; Revised December 27, 2013; Accepted February 12, 2014; Published April 30, 2014 * Short Paper

Abstract: This paper introduces a refined opportunistic interference alignment (OIA) technique that uses minimum mean square error (MMSE) detection at the receivers in multiple-input multiple-output multi-cell uplink networks. In the OIA scheme under consideration, each user performs optimal transmit beamforming and power control to minimize the level of interference generated toward the other-cell base stations, as in the conventional energy-efficient OIA. The results show that, owing to the enhanced receiver structure, the proposed OIA scheme achieves much higher sum-rates than the conventional OIA with zero-forcing detection for all signal-to-noise ratio regions.

Keywords: Energy efficiency, Minimum mean square error (MMSE) detection, Multi-cell uplink network, Opportunistic interference alignment (OIA)

1. Introduction

Interference management is considered a crucial problem in wireless communication systems, where multiple users share the same resources. Considerable research has been carried out to characterize the asymptotic capacity of interference channels using the simple notion of degrees-of-freedom (DoF). Recently, interference alignment (IA) was proposed as a novel approach to fundamentally solve the interference problem when there are two communication pairs [1]. Cadambe and Jafar [2] reported that the IA scheme can achieve the optimal DoF, equal to $K/2$, in the $K$-user interference channel with time-varying channel coefficients. The underlying idea in references [1, 2] has led to interference management schemes based on IA in a range of wireless network environments, such as multiple-input multiple-output (MIMO) interference networks [3, 4], X networks [5], and cellular networks [6].

This paper considers the interfering multiple-access channel (IMAC), which is referred to as a practical multi-cell uplink network. In addition to the IA scheme reported by Suh and Tse [6], the concept of an opportunistic IA (OIA) [7, 8] was introduced recently in the single-input multiple-output IMAC with time-invariant channel coefficients. The OIA scheme intelligently combines user

scheduling into the classical IA framework in signal vector space by selecting opportunistically certain users in each cell to align the inter-cell interference at a pre-defined interference space. One study [8] reported that a full DoF can be achieved asymptotically provided that the number of users scales at least to a certain value. Their work [8] was extended to the MIMO IMAC model using various types of pre-processing methods [9, 10]. Moreover, to take energy efficiency into account [11, 12], which is a growing issue in the telecommunication community, a new energy-efficient OIA protocol [13] for MIMO multi-cell uplink networks, i.e., MIMO IMACs, was proposed based on the existing OIA framework. In an energy-efficient OIA, the optimal transmit beamforming vector design and power control strategy were performed jointly at each user to minimize the amount of interference generated to other-cell base stations (BSs) in a distributed manner, which leads to significant performance enhancement on the sum-rates even with reduced transmit power consumption, resulting in higher energy efficiency compared to the conventional OIA scheme [9].

In previous work [6-10, 13], the OIA framework was designed using a simple zero-forcing (ZF) decoder based on the received intra-cell channel links because ZF detection at the receivers is sufficient to guarantee the optimal DoF in the IMAC model. On the other hand, the


system performance, in terms of the sum-rates, may be improved further compared to the ZF detection case by using an enhanced receiver structure. This paper introduces a refined energy-efficient OIA technique using minimum mean square error (MMSE) detection at the receivers in the MIMO IMAC model with time-invariant channel coefficients. In the OIA scheme under consideration, each user jointly performs optimal transmit beamforming and power control, as in the conventional energy-efficient OIA case [13]. The results suggest that, owing to the enhanced receiver structure, the proposed OIA scheme achieves much higher sum-rates than the conventional OIA with ZF detection for all signal-to-noise ratio (SNR) regions.

The remainder of this paper is organized as follows. Section 2 describes the system and channel models. In Section 3, the new energy-efficient OIA protocol is specified. Section 4 presents the numerical results of the OIA scheme. Finally, Section 5 summarizes the paper with some concluding remarks.

2. System and Channel Models

This study considered the MIMO IMAC model [9] to describe one of the realistic cellular uplink networks. Suppose that there are K cells, each of which has N users. Also assume that each user is equipped with L transmit antennas and each cell is covered by one BS with M receive antennas. Under the model, each BS in a cell is interested only in the traffic demands of the users in its cell.

The term $\mathbf{H}^g_{c,a} \in \mathbb{C}^{M \times L}$ indicates the channel matrix between BS $g$ and the $a$-th user in cell $c$. A block fading channel model is assumed, where the channel is constant during a transmission block and changes independently for every transmission block. The channel matrices are assumed to be Rayleigh, whose elements follow an independent complex Gaussian distribution. Each selected user is assumed to transmit a single data stream at one time. Let $\phi_g(1), \ldots, \phi_g(S)$ denote the $S$ users who obtain transmission opportunities among the $N$ users in cell $g \in \{1, \ldots, K\}$, where $S \in \{1, \ldots, M\}$. When $S$ symbols per cell are transmitted using the transmit beamforming vectors $\mathbf{v}_{g,\phi_g(1)}, \ldots, \mathbf{v}_{g,\phi_g(S)} \in \mathbb{C}^{L \times 1}$, the signal $\mathbf{r}_g \in \mathbb{C}^{M \times 1}$ received at BS $g$ is given by

$$\mathbf{r}_g = \sum_{c=1}^{K} \sum_{s=1}^{S} \mathbf{H}^g_{c,\phi_c(s)} \mathbf{v}_{c,\phi_c(s)} m_{c,\phi_c(s)} + \mathbf{n}_g,$$

where $m_{g,\phi_g(s)} \in \mathbb{C}$ is the transmit message at user $\phi_g(s)$ in cell $g$, and $\mathbf{n}_g \in \mathbb{C}^{M \times 1}$ denotes the independent and identically distributed (i.i.d.) circularly symmetric complex additive white Gaussian noise (AWGN), whose elements have a zero mean and unit variance. Each transmit beamforming vector is assumed to be normalized to $\|\mathbf{v}_{g,a}\|^2 = 1$, and each user has an average transmit power constraint $p_{g,a} = \mathbb{E}\left[|m_{g,a}|^2\right] \le \eta$.

3. OIA With MMSE Detection

This section describes the overall energy-efficient OIA protocol and then specifies the MMSE detection structure.

3.1 Protocol Description

The overall procedure of the proposed energy-efficient OIA is essentially the same as that reported elsewhere [13]. Therefore, the material in this subsection is taken in part from reference [13].

The procedure is based on the channel reciprocity of time-division duplexing systems. Each BS broadcasts its receive subspace $\mathbf{W}_g = [\mathbf{w}_g(1), \ldots, \mathbf{w}_g(S)]$, where $\mathbf{w}_g(s) \in \mathbb{C}^{M \times 1}$ is an orthonormal basis vector of $\mathbf{W}_g$. Each user can then align its signal to the space orthogonal to the receive subspaces of the other cells (i.e., the interference subspace). Thereafter, the $a$-th user in cell $g$ calculates the total amount of generated interference affecting the receive subspaces of the other cells, termed the leakage of interference (LIF), which is given by the following:

$$\mathrm{LIF}_{g,a} = p_{g,a} \sum_{c=1, c \neq g}^{K} \left\| \mathbf{W}_c^H \mathbf{H}^c_{g,a} \mathbf{v}_{g,a} \right\|^2.$$

The MIMO IMAC model considers a power control strategy that can reduce the LIF further compared to the OIA with no power control, while maintaining the pre-defined signal quality. The OIA protocol, including both the transmit beamforming vector design and the power control performed at each user, basically consists of the following four steps:

□ Step 1: Each BS randomly chooses and broadcasts an $S$-dimensional receive subspace.

□ Step 2: Suppose that the required received power level of the desired signal, $\rho$, is known a priori at each user. Each user finds the transmit power and beamforming vector $\{p_{g,a}, \mathbf{v}_{g,a}\}$ such that its LIF is minimized while maintaining the desired signal quality in a distributed manner. The optimization problem at MS $a$ belonging to cell $g$ is given by the following:

$$\min_{p_{g,a},\, \mathbf{v}_{g,a}} \ \mathrm{LIF}_{g,a} \qquad \text{(1a)}$$

subject to

$$p_{g,a} \left\| \mathbf{W}_g^H \mathbf{H}^g_{g,a} \mathbf{v}_{g,a} \right\|^2 = \rho, \qquad \text{(1b)}$$
$$0 < p_{g,a} \le \eta. \qquad \text{(1c)}$$


Note that given constraint (1b), the received SNR at BS $g$ is expressed as $\rho$.

□ Step 3: The users who have a feasible solution set $\{\hat{p}_{g,a}, \hat{\mathbf{v}}_{g,a}\}$ send the computed LIF to their home-cell BSs. The users who do not satisfy the two constraints, (1b) and (1c), do not feed their LIFs back to their home-cell BSs.

□ Step 4: Each BS selects its $S$ home-cell users yielding LIF values up to the $S$-th smallest one.

Finally, the selected users in each cell begin to send their data packets using the optimized pre-processor (i.e., the transmit beamforming and power terms). Each BS decodes the users' signals by treating all interference as noise.
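The LIF computation and the Step 4 selection rule above can be sketched as follows. This is an illustrative sketch only: it uses real-valued matrices in place of the paper's complex-valued channel model, and all names (`leakageOfInterference`, `selectUsers`, `channelsTo`, `subspaces`) are hypothetical, not from the paper or reference [13].

```javascript
// 2-norm squared of a vector
const norm2 = (v) => v.reduce((s, x) => s + x * x, 0);

// y = M x, where M is an array of rows
const matVec = (M, x) =>
  M.map((row) => row.reduce((s, m, j) => s + m * x[j], 0));

// LIF of user a in cell g: transmit power p times the total interference
// power leaked into the receive subspaces W_c of all other-cell BSs.
// channelsTo[c] stands in for the channel matrix toward BS c, and
// subspaces[c] for the rows of W_c (real-valued simplification).
function leakageOfInterference(g, p, v, channelsTo, subspaces) {
  let lif = 0;
  for (let c = 0; c < subspaces.length; c++) {
    if (c === g) continue; // only other-cell BSs contribute
    const interference = matVec(channelsTo[c], v); // H v
    const projected = matVec(subspaces[c], interference); // W_c^T (H v)
    lif += norm2(projected);
  }
  return p * lif;
}

// Step 4: each BS keeps the S home-cell users with the smallest reported LIF.
function selectUsers(reports, S) {
  return reports
    .slice()
    .sort((a, b) => a.lif - b.lif)
    .slice(0, S)
    .map((r) => r.user);
}
```

For example, `selectUsers([{user: 0, lif: 0.9}, {user: 1, lif: 0.1}], 1)` returns `[1]`: the user generating the least leakage wins the transmission opportunity.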

3.2 MMSE Receiver Design

This subsection describes the receiver structure using MMSE filtering under the energy-efficient OIA framework. A BS employing ZF detection does not need to acquire the channel links between the BS itself and other-cell users. The scheme using ZF detection, however, has practical challenges when the number of per-cell users, $N$, is not sufficiently large. Because the interference is not aligned perfectly for a finite $N$, the remaining interference in each receive subspace may reduce the signal-to-interference-and-noise ratio, resulting in performance degradation on the sum-rates.

If it is assumed that the inter-cell interference (i.e., the channel links between the BS and other-cell users) is known to the receivers (BSs), each BS can effectively suppress the remaining interference level in each receive subspace while decoding the home-cell users' signals. The MMSE and receive subspace filtering were designed based on this additional information indicating the channel links between the BS and other-cell users. Let $\mathbf{u}_{g,\phi_g(s)} \in \mathbb{C}^{M \times 1}$ denote the post-processing vector at BS $g$ used to decode the signal sent from user $\phi_g(s)$ in cell $g$. Assuming that the signal sent from user $\phi_g(s)$ in cell $g$ is decoded ($s \in \{1, \ldots, S\}$), the MMSE-based receive vector can then be expressed as

$$\mathbf{u}_{g,\phi_g(s)} = e_M\!\left( \mathbf{R}^{-1}_{g,s}\, \mathbf{h}_{g,s} \mathbf{h}^H_{g,s} \right), \qquad (2)$$

where $\mathbf{h}_{g,s} = \mathbf{H}^g_{g,\phi_g(s)} \hat{\mathbf{v}}_{g,\phi_g(s)}$ denotes the effective desired channel, $\mathbf{R}_{g,s}$ denotes the covariance matrix of the remaining interference plus noise, and $e_M(\cdot)$ denotes the normalized eigenvector corresponding to the maximum eigenvalue of a matrix.

Some trade-offs between ZF and MMSE detection exist in multi-cell uplink networks. Because the ZF-based receiver operates based only on the received intra-cell channel links, each BS does not need to acquire the inter-cell interference, resulting in a slight overhead reduction compared to the MMSE-based receiver, which requires knowledge of the inter-cell interfering links. On the other hand, the OIA scheme using MMSE detection ensures much higher sum-rates than those of the ZF detection case for all SNR regions, which will be verified by computer simulations.
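The operator $e_M(\cdot)$, the normalized eigenvector corresponding to the maximum eigenvalue of a matrix, can be computed, for example, by power iteration. The sketch below is a minimal real-symmetric illustration under the assumption that the iteration converges (i.e., the starting vector is not orthogonal to the dominant eigenvector); the name `maxEigenvector` is hypothetical.

```javascript
// Power-iteration sketch of e_M(.): repeatedly multiply by the matrix and
// renormalize, converging to the eigenvector of the largest-magnitude
// eigenvalue. Real symmetric matrices only, for brevity; the complex
// Hermitian case of Eq. (2) would follow the same pattern.
function maxEigenvector(M, iters = 200) {
  // arbitrary nonzero starting vector
  let v = M.map((_, i) => (i === 0 ? 1 : 0.5));
  for (let k = 0; k < iters; k++) {
    const w = M.map((row) => row.reduce((s, m, j) => s + m * v[j], 0));
    const n = Math.sqrt(w.reduce((s, x) => s + x * x, 0));
    v = w.map((x) => x / n); // renormalize each iteration
  }
  return v;
}
```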

4. Numerical Evaluation

This section examines the performance of the proposed OIA scheme with MMSE detection through computer simulations of the sum-rates. For comparison, the performance of the OIA algorithm based on ZF detection [13] was also evaluated. The simulation environment is given by M = 2, L = 2, S = 1. The transmit power

Fig. 1. Sum-rates with respect to the SNR (achievable sum-rates [bps/Hz] versus SNR [dB]; MMSE and ZF detection for K = 3, 4, 5). The system with M = 2, L = 2, S = 1, and N = 10 is considered.

Fig. 2. Sum-rates with respect to the SNR (achievable sum-rates [bps/Hz] versus SNR [dB]; MMSE and ZF detection for K = 3, 4, 5). The system with M = 2, L = 2, S = 1, and N = 50 is considered.


constraint, $\eta$, and the required received power level of the desired signal, $\rho$, were set to $\eta = \rho = \mathrm{SNR}$ under the unit noise variance assumption.

The sum-rates of both OIA schemes with 1) ZF and 2) MMSE detection were evaluated for K = 3, 4, 5 according to the received SNRs (in dB scale). As shown in Fig. 1, the sum-rates were obtained when N = 10. This confirms that the proposed OIA scheme with MMSE detection outperforms the conventional one with ZF detection for almost all SNR regions. A performance gain of up to 25% compared to the conventional scheme was obtained. Figs. 2 and 3 show the sum-rates when N = 50 and 100, respectively. The sum-rate curves in Figs. 2 and 3 show trends similar to those in Fig. 1. To determine the robustness of the proposed scheme against the estimation error of the inter-cell channels (i.e., the channel links between the BS and other-cell users), which are difficult to acquire precisely in practice, the sum-rates were also evaluated in the presence of inter-cell channel estimation errors. Fig. 4 shows the sum-rates according to the estimation error normalized to the true channel gain when N = 50 and SNR = 30 dB. Note that users with a higher LIF value were selected due to the inaccurate other-cell channel estimates at the transmitters, resulting in performance degradation for the OIA schemes using both ZF and MMSE detection. Moreover, MMSE filtering can cause additional performance degradation because of the inaccurate inter-cell channel estimates at the receivers. On the other hand, the proposed OIA scheme with MMSE detection still outperformed that with ZF detection, even when the estimation error was increased to $10^{-2}$.

5. Conclusion

This paper introduced a refined energy-efficient OIA scheme with MMSE detection in the MIMO IMAC, which operates in a distributed manner at the cost of a slightly increased amount of feedback/feedforward overhead and receiver complexity compared to the conventional energy-efficient OIA with ZF detection [13]. The MMSE receiver structure was specified based on the OIA framework. The proposed OIA scheme showed much higher sum-rates than those of the conventional OIA for the given practical system parameters.

Acknowledgement

This study was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (MSIP) (2012R1A1A1044151).

References

[1] M. A. Maddah-Ali, A. S. Motahari, and A. K. Khandani, “Communication over MIMO X channels: Interference alignment, decomposition, and performance analysis,” IEEE Trans. Inf. Theory, vol. 54, no. 8, pp. 3457-3470, Aug. 2008. Article (CrossRef Link)

[2] V. R. Cadambe and S. A. Jafar, “Interference alignment and degrees of freedom of the K-user interference channel,” IEEE Trans. Inf. Theory, vol. 54, no. 8, pp. 3425-3441, Aug. 2008. Article (CrossRef Link)

[3] T. Gou and S. A. Jafar, “Degrees of freedom of the K-user M × N MIMO interference channel,” IEEE Trans. Inf. Theory, vol. 56, no. 12, pp. 6040-6057, Dec. 2010. Article (CrossRef Link)

[4] K. Gomadam, V. R. Cadambe, and S. A. Jafar, “A distributed numerical approach to interference alignment and applications to wireless interference networks,” IEEE Trans. Inf. Theory, vol. 57, no. 6, pp. 3309-3322, June 2011. Article (CrossRef Link)

[5] S. A. Jafar and S. Shamai (Shitz), “Degrees of

Fig. 3. Sum-rates with respect to the SNR (achievable sum-rates [bps/Hz] versus SNR [dB]; MMSE and ZF detection for K = 3, 4, 5). The system with M = 2, L = 2, S = 1, and N = 100 is considered.

Fig. 4. Sum-rates with respect to the normalized channel estimation error (achievable sum-rates [bps/Hz] versus normalized channel estimation error; MMSE and ZF detection for K = 3, 4, 5). The system with M = 2, L = 2, S = 1, N = 50, and SNR = 30 dB is considered.


freedom region of the MIMO X channel,” IEEE Trans. Inf. Theory, vol. 54, no. 1, pp. 151-170, Jan. 2008. Article (CrossRef Link)

[6] C. Suh and D. Tse, “Interference alignment for cellular networks” in Proc. 46th Allerton Conf. Commun. Control, and Comput., Monticello, IL, Sept. 2008. Article (CrossRef Link)

[7] B. C. Jung and W.-Y. Shin, “Opportunistic interference alignment for interference-limited cellular TDD uplink,” IEEE Commun. Lett., vol. 15, no. 2, pp. 148-150, Feb. 2011. Article (CrossRef Link)

[8] B. C. Jung, D. Park, and W.-Y. Shin, “Opportunistic interference mitigation achieves optimal degrees-of-freedom in wireless multi-cell uplink networks,” IEEE Trans. Commun., vol. 60, no. 7, pp. 1935-1944, July 2012. Article (CrossRef Link)

[9] H. J. Yang, W.-Y. Shin, B. C. Jung, and A. Paulraj, “Opportunistic interference alignment for MIMO interfering multiple-access channels,” IEEE Trans. Wireless Commun., vol. 12, no. 5, pp. 2180-2192, May 2013. Article (CrossRef Link)

[10] H. J. Yang, B. C. Jung, W.-Y. Shin, and A. Paulraj, “Codebook-based opportunistic interference alignment,” IEEE Trans. Sig. Process., to appear. Article (CrossRef Link)

[11] Y. Chen, S. Zhang, and S. Xu, “Characterizing energy efficiency and deployment efficiency relations for green architecture design,” in Proc. IEEE Int. Conf. Commun. (ICC), Cape Town, South Africa, May 2010, pp. 1-5. Article (CrossRef Link)

[12] W.-Y. Shin, H. Yi, and V. Tarokh, “Energy-efficient base-station topologies for green cellular networks,” in Proc. IEEE Consumer Commun. Netw. Conf. (CCNC), Las Vegas, NV, Jan. 2013, pp. 91-96. Article (CrossRef Link)

[13] J. Yoon, W.-Y. Shin, H. S. Lee, “Energy-efficient opportunistic interference alignment,” IEEE Commun. Lett., vol. 18, no. 1, pp. 30-33, Jan. 2014. Article (CrossRef Link)

Won-Yong Shin received his B.S. degree in electrical engineering from Yonsei University, Seoul, Korea, in 2002. He received his M.S. and Ph.D. degrees in electrical engineering and computer science from Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea,

in 2004 and 2008, respectively. From February 2008 to April 2008, he was a Visiting Scholar in the School of Engineering and Applied Sciences, Harvard University, Cambridge, MA. From September 2008 to April 2009, he was with the Brain Korea Institute and CHiPS at KAIST as a Postdoctoral Fellow. From August 2008 to April 2009, he was with Lumicomm Inc., Daejeon, Korea, as a Visiting Researcher. In May 2009, he joined Harvard University as a Postdoctoral Fellow and was promoted to a Research Associate in October 2011. Since March 2012, he has been with the Department of Computer Science and Engineering, Dankook University, Yongin, Korea, where he is currently an Assistant Professor. His research interests include information theory, communications, signal processing, and their applications to multiuser networking issues.

Jangho Yoon received his B.S. and M.S. degrees in electrical engineering and computer science from Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, in 2006 and 2008, respectively. He is now pursuing a Ph.D. degree in electrical engineering at KAIST. His

research interests include wireless communications, interference management, signal processing, and their applications to multiuser networking issues.



IEIE Transactions on Smart Processing and Computing, vol. 3, no. 2, April 2014 http://dx.doi.org/10.5573/IEIESPC.2014.3.2.88


Photo-based Desktop Virtual Reality System Implemented on a Web-browser

Masaya Ohta1, Hiroki Otani1, and Katsumi Yamashita1

1 Graduate School of Engineering, Osaka Prefecture University, 1-1 Gakuen-cho Nakaku Sakai Osaka, 599-8531, Japan [email protected]

* Corresponding Author: Masaya Ohta

Received November 20, 2013; Revised December 27, 2013; Accepted February 12, 2014; Published April 30, 2014

* Extended from a conference: Preliminary results of this paper were presented at ICEIC 2014. The present paper has been accepted by the editorial board through the regular reviewing process, which confirms its original contribution.

* Regular Paper

Abstract: This paper proposes a novel desktop virtual reality system. Based on the position of the user’s face, the proposed system selects the most appropriate image of an object from a set of photographs taken at various angles, and simply “pastes” it onto the display at the appropriate location and scale. Using this system, the users can intuitively feel the presence of the object.

Keywords: Virtual reality, Desktop VR, Photo AR, E-Commerce, Digital signage, HTML5/JavaScript

1. Introduction

Virtual Reality (VR) has been evolving constantly since its early days, and is now a fundamental technology in different application areas, including scientific and medical visualization, entertainment, video games, education, and training. One of the aims of this study was to apply VR technology to the display of products on electronic commerce (EC) sites and digital signage.

Object VR [1] is one of the notable methods that use the photographs of an object taken at various angles and allows the users to observe it from the angle they desire. On the other hand, Object VR requires dragging the mouse of a computer to change the angle. Therefore, the user does not feel the presence of the object in the display during mouse operation.

Previously, a photo-based augmented reality (Photo AR) system that uses the photo images of an object captured at various angles instead of rendering a 3D model corresponding to the object during runtime was proposed [2]. With this system, an appropriate photo image for the position and orientation of the user's camera was selected from previously captured and stored images, and then adjusted and simply “pasted” into the camera view. Because Photo AR displays the object as if it exists right in front of the user, it allows the user to intuitively recognize the object better than Object VR. On the other hand, the

user needs to move the camera or marker when he/she wants to check the object at a different angle. Although this operation is more intuitive than dragging the mouse, it is not effective enough.

Desktop VR, which is commonly referred to as fish tank VR, is a technology that involves the use of a display of a desktop computer coupled with a head tracker that estimates the user’s head position and updates a 3D projection matrix in real-time [3-5]. This system allows the users to observe virtual images through the display as if they were looking into an actual fish tank. The user feels the presence of a rendered object in the display because no mouse operation is needed. Although desktop VR does not provide strong immersion, it is suitable for visualization systems because of its ease of use and ability to present high-quality images.

The main obstacle encountered when applying desktop VR to EC sites is that 3D models of all viewable products must be prepared in advance. A typical desktop VR system uses three dimensional computer graphics (3DCG) to render the objects required for a given user's viewpoint. These objects are typically authored using 3D modeling programs. These tools are highly expressive but they also tend to be complex and time-consuming. If the number of viewable products is large, the cost of creating these models will in most cases be prohibitive. 3D reconstruction methods have been proposed [6-8] as an alternative to


author-driven modeling. These methods attempt to reconstruct a 3D model, automatically or semi-automatically, based on the photo images of a target object. On the other hand, accurate modeling may be impossible in some cases when a target object has a complicated shape. This technology is inappropriate because the product images displayed on EC sites must be accurate and appealing.

This paper proposes a photo-based desktop VR system that uses the photo images of an object captured at various angles, instead of rendering a 3D model of the object during runtime. In this system, the appropriate photo image for the position of the user's face is selected from the images captured and stored previously, which is then adjusted and simply “pasted” onto the display. By using this system, the users can strongly feel the presence of the rendered object. This system does not use 3DCG, and can display the products accurately. This approach can reduce the number of rendering calculations significantly because it does not require 3DCG rendering. Moreover, any complicated image can be processed at the same processing load because photo images are just pasted. This also spares the developers from the efforts of creating a 3D model for every viewable object. In addition, this helps to minimize the costs of rendering on mobile devices with limited battery capacity, such as smart phones. The system is implemented using HTML5 and JavaScript; hence, it has little dependence on the OS or browser, which can reduce the development costs.

The remainder of this paper is organized as follows. Section 2 provides an outline of the system and Section 3 presents an evaluation of the proposed methods. The final section summarizes this study.

2. Proposed System

2.1 Configuration of the Proposed System

The desktop VR system proposed in this study consists of a face tracker, a photo renderer, and a virtual showcase renderer. Fig. 1 presents the configuration of the proposed system. The face tracker recognizes the user’s face in the image captured by a camera and estimates its position relative to the display.

The photo renderer receives the results of the face tracker, selects a suitable photo image from a set of images captured and stored previously, and draws an image on the display. In addition, the virtual showcase renderer receives the results of the face tracker, and draws the virtual showcase such that the user feels the target object being displayed in the showcase.

The face tracker, the photo renderer, and the showcase renderer were implemented using HTML5 and JavaScript and run on a web browser with an HTML rendering engine.

2.2 Photo Images

This section describes the photography method implemented to capture the photographs of the products used in the system. Fig. 2 shows a simple imaging system. First, the target object is photographed by the imaging system, which is the same as that for Object VR. In this system, the object is placed on a rotating table and photographed from a number of preset angles using a camera. These initial images are stored in video format. To remove the background from the images, blue-screen panels are placed around the object so that their color can be deleted from the resulting images using video editing software. The final images are stored as png files with an alpha channel.

Because the proposed system requires a transparent background, the photo images are stored in png format, as mentioned above. The png format has an alpha channel in addition to RGB, and is suitable for images with a transparent background. For this purpose, however, it must handle a large volume of data. On the other hand, the data volume of the jpeg format is small, but it cannot store images with a transparent background due to the lack of an alpha channel. Therefore, to reduce the amount of data, the png images are converted to jpeg by the system, and the transparent background is recreated when drawing the

Fig. 1. Configuration of the proposed system.

Fig. 2. Imaging system.


images. In particular, the information on the transparent areas is stored separately as data similar to run-length coding, in addition to the jpeg images. The system uses this information to draw only the pixels of the non-transparent areas onto a background image, creating images with a transparent background.
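A minimal sketch of how such a run-length transparency mask might be applied to decoded jpeg pixel data follows. The paper does not specify the exact data layout, so the format assumed here (per image row, alternating transparent/opaque run lengths, starting with a transparent run) and the function name `applyAlphaRuns` are hypothetical.

```javascript
// Zero the alpha byte of transparent runs in RGBA pixel data, such as the
// array produced by canvasContext.getImageData() after drawing the decoded
// JPEG. runsPerRow[y] holds alternating run lengths for row y, starting
// with a transparent run (hypothetical layout).
function applyAlphaRuns(rgba, width, height, runsPerRow) {
  for (let y = 0; y < height; y++) {
    let x = 0;
    let transparent = true; // each row starts with a transparent run
    for (const run of runsPerRow[y]) {
      for (let k = 0; k < run; k++, x++) {
        if (transparent) rgba[(y * width + x) * 4 + 3] = 0; // alpha := 0
      }
      transparent = !transparent;
    }
  }
  return rgba;
}
```

In a browser, the modified pixels would be written back with `putImageData()`, so only the non-transparent product pixels are composited over the page background.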

2.3 Face Tracker

The face tracking system used in this work is the open-source system presented in reference [9]. This system was implemented in HTML5/JavaScript and is executable in browsers. The face tracking system consists of two phases. First, recognition of a face captured by a camera is attempted; this is called the recognition phase. Second, the face is followed using a tracking method; this is referred to as the tracking phase. If the tracking phase fails, the system returns to the recognition phase. The recognition and tracking results are calculated in the display coordinate system, as shown in Fig. 3.
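The two-phase recognition/tracking loop described above can be sketched as a small state machine. Here `recognize()` and `track()` are hypothetical stand-ins for the detector and tracker of the open-source library in reference [9]; the factory name `makeFaceTracker` is also an assumption for illustration.

```javascript
// State machine for the two-phase face tracker: try recognition until a
// face is found, then track it frame by frame, falling back to the
// recognition phase whenever tracking fails.
function makeFaceTracker(recognize, track) {
  let phase = "recognition";
  let face = null;
  return function onFrame(frame) {
    if (phase === "recognition") {
      face = recognize(frame); // attempt detection on this frame
      if (face) phase = "tracking";
    } else {
      face = track(frame, face); // follow the previously found face
      if (!face) phase = "recognition"; // tracking failed: start over
    }
    return { phase, face };
  };
}
```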

2.4 Photo Renderer

Suppose the coordinates of the user's face calculated by the face tracker are $(X, Y, Z)$. The photo renderer calculates two angles: the depression $\varphi$, formed by the line-of-sight from the center of the display to the face center and the $Y$ axis of the display coordinate system, and the azimuth $\psi$, formed by the projection of this line-of-sight onto the $ZX$ plane and the $Z$ axis (see Fig. 3). $\varphi$ and $\psi$ are calculated as follows:

$$\varphi = \tan^{-1}\!\left( \sqrt{X^2 + Z^2} \,/\, Y \right) \qquad (1)$$
$$\psi = \tan^{-1}\!\left( -X / Z \right) \qquad (2)$$

In this system, the image shot from the position closest to both $\varphi$ and $\psi$ is selected. The image is then pasted at the proper position on the display.
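In JavaScript, the angle computation of Eqs. (1) and (2) and the nearest-image selection might look as follows. The squared-angle distance metric and the names `viewAngles`, `nearestShot`, and `shots` are illustrative assumptions, since the paper does not specify how "closest to both φ and ψ" is measured.

```javascript
// Depression (phi) and azimuth (psi) of the face position (X, Y, Z) in the
// display coordinate system, following Eqs. (1) and (2).
function viewAngles(X, Y, Z) {
  const phi = Math.atan(Math.sqrt(X * X + Z * Z) / Y); // Eq. (1)
  const psi = Math.atan(-X / Z);                       // Eq. (2)
  return { phi, psi };
}

// Pick the stored photograph whose shooting angles are closest to (phi, psi).
// "shots" is a hypothetical list of { phi, psi, url } entries describing the
// angle at which each photograph was taken.
function nearestShot(shots, phi, psi) {
  let best = null;
  let bestDist = Infinity;
  for (const s of shots) {
    const d = (s.phi - phi) ** 2 + (s.psi - psi) ** 2;
    if (d < bestDist) { bestDist = d; best = s; }
  }
  return best;
}
```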

2.5 Showcase Renderer To indicate a showcase that is located at the back of the

display and that contains a product, a virtual box is drawn in the background of the product. Fig. 4 shows the drawing method of the showcase. When the depth of the virtual showcase is D , the intersection point ( , )i ix y between the display and a straight line connecting ( , , )i iX Y D− and ( , , )X Y Z can be calculated using the following formula:

xᵢ = X − ( Z / (D + Z) ) (X − Xᵢ)  (3)

yᵢ = Y − ( Z / (D + Z) ) (Y − Yᵢ)  (4)

The lines connecting the intersection points and the four corners of the display are drawn as a virtual showcase.

3. Evaluation

3.1 Performance of Face Tracker

The performance of the Face tracker was evaluated experimentally. The desktop PC and tablets (platforms) shown in Table 1 were used for the experiments.

First, to evaluate the performance of the recognition phase alone, the system was modified to skip the tracking phase every time, and the number of recognition operations executed per second was measured. During the measurement, the face was kept still. Moreover, a partition was placed behind the user to create a completely white background, so the recognition experiment could be performed on every platform under the same conditions.

Fig. 5 shows the images captured during the measurement, and Fig. 6 shows the measurement results.


Fig. 3. Display coordinate system.

Fig. 4. Rendering of a virtual showcase.

Fig. 5. Captured images for face tracking (left: desktop PC, right: tablet).


IEIE Transactions on Smart Processing and Computing, vol. 3, no. 2, April 2014


The dark blue bars show the number of successful recognitions, and the pale blue bars show the number of failed recognitions. These graphs indicate that more time was required for recognition as the resolution of the images captured by the camera increased. For the following experiments, the image resolution with the shortest recognition time was therefore used.

Subsequently, the tracking phase performance was evaluated. The number of face-tracking operations executed per second was measured for two cases: a static face, and a face swinging right and left (approx. 30 cm/sec).

Fig. 7 shows the measurement results. Compared to the static face, tracking the swinging face took at most 1.1 times longer. In addition, compared to Fig. 6, tracking was at least 4.5 times faster than recognition. No tracking failure occurred when the face was swung at around 30 cm/sec.

3.2 System Performance

The proposed system was implemented with HTML5/JavaScript, and its performance was evaluated. First, an operation test was conducted with the devices

Fig. 6. Performance of face recognition.

Fig. 7. Face tracking performance.

Table 1. Specifications of the desktop PC and tablets for the experiments.

Platform   | Desktop PC                | Tablet L                 | Tablet K                   | Tablet X                   | Tablet N
Maker/Type | Custom PC                 | Lenovo / ideaPad Miix2 8 | Amazon / Kindle Fire HDX 7 | Sony / Xperia Z SGP412JP/W | Google / Nexus 7 (2013)
CPU        | Intel Core i5-3470 3.2GHz | Intel Atom Z3740         | Qualcomm Krait 400x4       | Qualcomm Krait 400x4       | Qualcomm Quad Krait
RAM        | 8GB                       | 2GB                      | 2GB                        | 2GB                        | 2GB
GPU        | NVIDIA GeForce GTX 650    | Intel HD Graphics        | Adreno 330                 | Adreno 330                 | Adreno 320
Monitor    | 24 inch 1,920x1,080       | 8 inch 1,280x800         | 7 inch 1,920x1,200         | 6.4 inch 1,920x1,080       | 7 inch 1,920x1,200
Camera     | 3MPixel                   | 2MPixel                  | 1.2MPixel                  | 2.2MPixel                  | 1.2MPixel
OS         | Windows 8.1               | Windows 8.1              | Fire OS 3.0                | Android 4.2                | Android 4.3
Browser    | Google Chrome 33.0.1750.154m | Google Chrome 33.0.1750.154m | Google Chrome 33.0.1750.136 | Google Chrome 33.0.1750.166 | Google Chrome 33.0.1750.166


Ohta et al.: Photo-based Desktop Virtual Reality System Implemented on a Web-browser


shown in Table 1. One hundred and twenty pictures, photographed every 3° over the full 360° around a product, were prepared. The resolution of each photo image was 720x720 pixels. Because a face that rarely moves up and down was considered in this experiment, the angle of depression (ϕ) was fixed at 60°.

All photo images were saved in the desktop PC used for the experiment. When this PC was used for the experiments, the browser on the PC read these images directly. When using a tablet for the experiment, the desktop PC was used as a Web server, and the system was prepared so the tablet could download the photo images from the server as needed.

A system using 3DCG instead of photo images was also prepared and compared with the proposed system that used the photo images. The open source library "Three.js" was used for the 3DCG rendering [10].

Figs. 8 and 9 show the operation checking process. As shown in the figures, images could be rendered smoothly on all platforms, and the test users felt as if the product really existed in the showcase installed in the display. Fig. 10 shows a display of the system using 3DCG. A comparison of Figs. 8(a) and (b) with Fig. 10 showed that the proposed system using the photo images could render images more realistically than the 3DCG system, and it was suitable for displaying products.

Subsequently, the processing time of the Photo renderer was measured as the face swung right and left. The size and the number of photo images were the same as those of the previous experiment.

Fig. 8. Display of the proposed system (desktop PC).


Fig. 11 shows the measurement results. The vertical axis of the figure shows the inverse of the measured time, i.e., how many times per second the Photo renderer could render. As shown in the figure, on every platform, the calculation load of the Photo renderer was smaller than that of the recognition processing by the Face tracker.

Next, the processing times of the conventional 3DCG renderer (the canvas renderer of Three.js) and the proposed Photo renderer were compared. The 3D model "Apple" consisted of 208 vertices and 228 faces. The 3D model "Plant" consisted of 10,116 vertices and 9,952 faces. As in the previous experiment, the face was swung right and left during the measurements.

Fig. 12 shows the measurement results. For the model "Apple", the conventional 3DCG renderer was faster than the Photo renderer because the model is very small. On the other hand, for the model "Plant", the processing time could not be measured on any tablet due to the large size of the model. In contrast, the Photo renderer operated at almost the same speed regardless of the complexity of the model.

Finally, the resolution of the photo images was set to 1080x1080 pixels, and the number of photos was increased from 120 pictures (every 3°) to 500 pictures (every 0.72°) so that the product could be displayed smoothly according to the movement of the user's face, and the processing time of the Photo renderer was measured. The number of different photo images displayed per second was also measured.

Fig. 9. Display of the proposed system (Tablets).

Fig. 10. Display of the system using 3DCG (desktop PC).


Fig. 13 shows the measurement results. Although the resolution and the number of photo images were increased, the system on the desktop PC and Tablet L could draw images at 20 fps or higher. Fig. 14 shows the number of photo images displayed per second. The figure shows that smoother images were displayed as the number of different displayed photo images increased. When checking the operation visually, no switching of the photos could be noticed, and very natural images were observed.

4. Conclusion

This paper presented a novel desktop VR system. The proposed system was implemented and evaluated on a desktop PC and tablets. The evaluation verified that the system operated smoothly on both the PC and the tablets, and could give users a strong sense of the product's existence.

Recently, LCD displays have become increasingly large, and their resolutions have increased proportionally. These devices are suitable for displaying products accurately and beautifully. We believe that this system will be able to display products even more attractively when incorporated into such devices.

References

[1] T. Monroe, QuickTime Toolkit: Advanced Movie Playback and Media Types, Morgan Kaufmann, 2004, pp. 283-287.
[2] M. Ohta, R. Yokomichi, M. Motokurumada, and K. Yamashita, "A Photo-Based Augmented Reality System with HTML5/JavaScript," Proc. of 1st IEEE Global Conf. on Consumer Electronics, pp. 430-431, 2012.
[3] K. W. Arthur, K. S. Booth, and C. Ware, "Evaluating 3D task performance for fish tank virtual worlds," ACM Transactions on Information Systems, vol. 11, no. 3, pp. 239-265, 1993.
[4] J. Rekimoto, "A vision-based head tracker for fish tank virtual reality," Proc. of Virtual Reality Annual International Symposium, pp. 94-100, 1995.
[5] J. C. Lee, "Hacking the Nintendo Wii Remote," IEEE Pervasive Computing, vol. 7, no. 3, pp. 39-45, 2008.
[6] M. Brown, T. Drummond, and R. Cipolla, "3D Model Acquisition by Tracking 2D Wireframes," Proc. of the 11th British Machine Vision Conference, 2000.
[7] A. van den Hengel, A. Dick, T. Thormahlen, B. Ward, and P. H. S. Torr, "VideoTrace: Rapid Interactive Scene Modelling from Video," ACM Transactions on Graphics, vol. 26, no. 3, 2007.
[8] Q. Pan, G. Reitmayr, and T. Drummond, "ProFORMA: Probabilistic Feature-based On-line Rapid Model Acquisition," Proc. of the 20th British Machine Vision Conference, 2009.
[9] headtrackr.
[10] Three.js.

Masaya Ohta graduated from Osaka Prefecture University in 1991, completed his doctoral studies in 1996, and became an assistant professor at Osaka Electro-Communication University, then an assistant professor with the Graduate School of Engineering at Osaka Prefecture University in 2002. He has been an associate professor since 2012. He is primarily pursuing research related to augmented reality and communications systems. He holds a D.Eng. degree, and is a member of IEEE, IEICE, IEEJ, and IPSJ.

Hiroki Otani graduated from Osaka Prefecture University in 2012, and enrolled in the first half of the doctoral program there. He is primarily pursuing research related to augmented reality and is a member of IPSJ.

Katsumi Yamashita completed his doctoral studies at Osaka Prefecture University in 1980. He became a lecturer on the Faculty of Engineering at the University of the Ryukyus in 1982, an associate professor in 1988, and a professor in 1991. Since 2000 he has been a professor in the Graduate School of Engineering at Osaka Prefecture University. He has primarily been pursuing research on communications and adaptive signal processing. He holds a D.Eng. degree, and is a member of IEEE and IEEJ.

Copyrights © 2014 The Institute of Electronics and Information Engineers


IEIE Transactions on Smart Processing and Computing, vol. 3, no. 2, April 2014 http://dx.doi.org/10.5573/IEIESPC.2014.3.2.96


Performance Comparison between LLVM and GCC Compilers for the AE32000 Embedded Processor

Chanhyun Park, Miseon Han, Hokyoon Lee, Myeongjin Cho, and Seon Wook Kim

Compiler and Microarchitecture Laboratory, School of Electrical and Computer Engineering, College of Engineering, Korea University, Seoul, Korea {yasutaxi, mesunyyam, hokyoon79, linux, seon}@korea.ac.kr * Corresponding Author: Seon Wook Kim

Received November 20, 2013; Revised December 27, 2013; Accepted February 12, 2014; Published April 30, 2014

* Extended from a conference: Preliminary results of this paper were presented at ICEIC 2014. The present paper has been accepted by the editorial board through the regular reviewing process, which confirms the original contribution.

* Short Paper

Abstract: The embedded processor market has grown rapidly and consistently with the appearance of mobile devices. In an embedded system, the power consumption and execution time are important factors affecting the performance. The system performance is determined by both hardware and software. Even if the hardware architecture is high-end, the software may run slowly due to low-quality code. This study compared the performance of two major compilers, LLVM and GCC, on a 32-bit EISC embedded processor. The dynamic instruction counts and static code sizes produced by these compilers were evaluated with the EEMBC benchmarks. LLVM generally performed better in the ALU-intensive benchmarks, whereas GCC produced better register allocation and jump optimization. The dynamic instruction count and static code size of GCC were on average 8% and 7% lower than those of LLVM, respectively.

Keywords: GCC, LLVM, Optimization, EISC, Performance, Code quality

1. Introduction

Nowadays, the embedded processor market is becoming increasingly large with the appearance of mobile devices, such as smartphones and tablets. In an embedded system, the power consumption and execution time are important performance factors due to resource limitations and mobility characteristics. Even if the hardware specifications become tremendously high, the hardware cannot be used efficiently without proper assistance from a compiler. Therefore, higher performance on the same hardware can be achieved when a program is compiled with a better compiler.

Traditionally, GCC [1] has been used and developed over a long period, and it provides many useful techniques for achieving higher performance, such as loop unrolling, constant propagation, and aggressive register allocation. LLVM [2] has become available more recently, and it is popularly used because of its versatility, flexibility, and reusability. Despite this, the performance of the two compilers has not been compared sufficiently.

The aim of this study was to evaluate the code quality of the two compilers and better understand their optimization techniques. For this purpose, the EEMBC benchmark [3], which represents a wide variety of workload for embedded systems, was used. The number of dynamic instructions and code size have been used as the basic performance metrics.

The remainder of this paper is organized as follows. Section 2 introduces the target embedded processor, AE32000 [4], along with its features and instruction set architecture. Section 3 describes the frameworks of the LLVM and GCC compilers. Section 4 presents the experimental setup and results, and Section 5 reviews related work. Finally, the conclusion is reported in Section 6.

2. Target Processor and ISA

This section describes the features of the target processor (AE32000) and the characteristics of its instruction set, called the EISC (Extendable Instruction Set Computer).


2.1 AE32000

Fig. 1 presents an overview of the architecture of AE32000. AE32000 is a 32-bit embedded microprocessor aiming at high code density and high performance. The processor uses a 16-bit fixed-length instruction set, and has a typical 5-stage pipeline, 16x32-bit GPRs (General Purpose Registers), and 9x32-bit SPRs (Special Purpose Registers). The target supports the AMBA AHB/AXI bus and the SIMD architecture. In addition, it supports up to 3 coprocessors.

2.2 EISC (Extendable Instruction Set Computer)

EISC is an instruction set architecture that has been developed by ADChips [6]. The EISC ISA combines the hardware simplicity of RISC (Reduced Instruction Set Computer) and the small code size of CISC (Complex Instruction Set Computer) into one. Table 1 lists the characteristics of RISC, CISC, and EISC.

EISC uses the Extension Register (ER) and Extension Flag (E) to express variable immediate operands. The ER stores an extendable operand value, and the E-flag shows whether the ER holds an immediate value. A LERI instruction is used to set the ER.

Fig. 2 shows how the LERI instruction cooperates with the MOV instruction to move a large immediate value. When the LERI instruction is executed, the value in its immediate field is copied into the ER register. The following MOV instruction then checks the E-flag. If the E-flag is set, the value in the ER is shifted to the left by 8 and concatenated with the immediate value in the MOV instruction. As a result, a 22-bit-wide value can be moved at once. Two LERI instructions followed by a MOV can express an immediate value covering bits [31:0] in the same manner.

3. Code Generation

GCC and LLVM have similar structures in that the front-ends read source code written in various languages, such as C, C++, Fortran, and Java, and generate an IR (Intermediate Representation). Subsequently, the IR is optimized by target-independent optimizers. The compiler back-ends translate the optimized IR into the target processor assembly code, as described in Fig. 3. This strategy allows the optimizers to be reused across various high-level languages and targets. An AE32000 back-end was added to each compiler to port LLVM and GCC to the AE32000 processor.

3.1 GCC Back-end

To generate the target assembly code, a compiler uses hardware-specific information, such as registers, instruction costs, and assembly mnemonics. For this purpose, the GCC back-end uses the machine description files. The machine description files consist of two parts: instruction patterns (.md file) and C code that contains the code generation rules.

The .md file contains IR patterns as RTL templates and the corresponding assembly mnemonics. The GCC back-end matches the generated IR against the RTL templates. If the IR matches successfully, GCC emits the assembly code for the target processor. Otherwise, GCC reconstructs the IR.

The other part of the machine description is written in C code. The C code describes the code generation rules in detail, such as caller-callee convention, register allocation priority, prologue/epilogue expansion, etc.

3.2 LLVM Back-end

The back-end of LLVM is similar to that of GCC.

Fig. 1. AE32000 architecture [5].

Table 1. Characteristics of ISAs [7].

                   RISC    CISC     EISC
ISA structure      Simple  Complex  Moderate
Program code size  Large   Small    Moderate

Fig. 2. LERI and LDI instructions to load the variable immediate values [7].

Fig. 3. Common compilation flow.

Page 60: IEIE Transactions on Smart Processing and Computing, vol ... › upload › jnl › IEEKSPC_2014_3_2_ALL.pdffunction (PSF) and noise, and briefly presents the basic terminologies for

Park et al.: Performance Comparison between LLVM and GCC Compilers for the AE32000 Embedded Processor

98

LLVM IRs are translated into the target instructions based on the target description (.td) files and cpp files. The target description files describe the register information, instruction information, and caller-callee information for a target processor. Special routines, such as prologue/epilogue expansion and target-specific instructions, are described in the cpp files.

3.3 LERI Instruction Support

The LERI is a special instruction for large immediate values used by the EISC processor. The LERI instruction is inserted immediately before an instruction that requires a large immediate value. The AE32000 assembler is in charge of this process. Each source file is compiled into an assembly file (.s file) separately by LLVM or GCC, and the assembler and linker then generate binary files for AE32000. Fig. 4 shows the total compilation flow.

4. Performance Evaluation

The EEMBC benchmark on the AE32000 simulator was used to evaluate the performance of LLVM and GCC. The EEMBC benchmark was compiled by the LLVM 3.1 and GCC 4.7.1 compilers with the option '-O2'. In addition, the AE32000 assembler was used for both LLVM and GCC.

Fig. 7 shows the dynamic instruction count and the static code size of LLVM normalized to those of GCC. The results show that the dynamic instruction count and the static code size of GCC were on average 8% and 7% lower than those of LLVM, respectively. This section analyzes some representative benchmarks in detail.

4.1 iirflt

This benchmark shows a significant difference in performance between LLVM and GCC. As shown in Fig. 5, iirflt contains a modulo operation and a division, each with a single constant and a single variable operand. Because AE32000 does not support modulo and division functional units in the hardware, the compiler back-ends generate a library call or an alternative instruction sequence for these unsupported operations.

As shown in Fig. 6, GCC generates a library call (__divsi3), whereas LLVM generates an inline code sequence for the division operator. The GCC approach has two disadvantages. First, the library call incurs call linkage overheads. Second, there are many branches in the GCC library routine: the library checks whether the divisor is zero and calls another library routine to obtain the required value.

Fig. 4. Compilation flow of GCC and LLVM.

signalOutLow1 = (varsize)(signal_in * (*coefficient1++));
if ((signalOutLow1 % COEF_SCALE) > (COEF_SCALE / 2)) {
    signalOutLow1 += COEF_SCALE / 2;
}
signalOutLow1 /= COEF_SCALE; // COEF_SCALE is constant.

Fig. 5. Code snippet of the iirflt benchmark.

C code: signalOutLow1 /= COEF_SCALE; // COEF_SCALE=50

(a)
    jal ___divsi3

(b)
    ldi 1374389535, %r9
    mul %r9, %r8
    mfmh %r8
    mov %r8, %r9
    lsr 31, %r9
    asr 4, %r8
    add %r9, %r8

Fig. 6. Assembly code generated by (a) GCC and (b) LLVM.

Fig. 7. Dynamic instruction count and static code size of LLVM normalized to GCC [8].


Therefore, LLVM could achieve better performance than GCC.

The strength reductions in GCC and LLVM are slightly different. GCC inserts a 64-bit multiplication during this optimization, but AE32000 does not support it, which results in a library call. On the other hand, LLVM inserts other instructions with the same effect, which allows LLVM to apply the optimization.

4.2 pntrch

To compare the performance of LLVM and GCC in detail, the EISC instructions were categorized into five groups: Memory, Branch, Arithmetic, Move, and Others. Fig. 8 shows the number of total dynamic instructions in these groups. The number of memory instructions executed by LLVM was 4.78 times higher than that of GCC. In contrast, the number of move instructions executed by LLVM was 0.8% of that of GCC.

The main reasons for the difference are the memory optimizations in loops. First, in the LLVM code, a loop induction variable is spilled. Second, GCC performs basic block reordering. As a result, GCC uses more register-to-register instructions, reducing the LD/ST instructions, as shown in Fig. 8.

4.3 idctrn

The hotspot region of idctrn is a two-dimensional array multiplication (matrix multiplication) with three nested loops, as shown in Fig. 9.

LLVM unrolls the loop by a factor of 8, precisely the same as the constant COLS. This loop unrolling [9] removes the innermost loop: its 8 comparison instructions, 8 jump instructions, and the arithmetic instructions of the loop induction variable. As a result, LLVM can reduce the dynamic instructions significantly, but it increases the static code size. On the other hand, GCC does not perform loop unrolling with the -O2 option.

For a fair comparison, the -funroll-loops option was applied when the benchmark was compiled with GCC. The simulation showed that the dynamic instruction count of GCC was then 10% smaller than that of LLVM. On the other hand, the static code size of GCC was 28% larger than that of LLVM. LLVM unrolls only the innermost loop, but GCC unrolls the nested loops. Therefore, LLVM generates smaller code.

Fig. 10 shows the number of total dynamic instructions in the 5 groups in idctrn, where gcc-u denotes GCC with the loop unrolling optimization applied.

4.4 rgbyiq

The benchmark loads a 76816-byte RGB image and converts it to a YIQ-format image. GCC executes many more memory instructions than LLVM for the following reasons.

AE32000 provides a multiplication instruction with an immediate value. However, the GCC-generated code does not use this instruction; therefore, one more instruction is needed to load the immediate value into a register. In addition, in the main function loop, GCC uses 14 registers, whereas LLVM uses 12 registers. The difference is that, in the LLVM case, 2 registers hold values irrelevant to the hot spot. The use of more registers allows GCC to remove some move instructions, so GCC requires fewer move instructions.

Fig. 8. Number of dynamic instructions in the 5 groups in pntrch01.

for (i = 0; i < ROWS; i++)
    for (j = 0; j < COLS; j++)
        for (k = 0; k < COLS; k++)
            F_1[i][j] += f_1[i][k] * cosMatrixA[k][j];

Fig. 9. Hotspot code section of the idctrn benchmark.

Fig. 10. Number of dynamic instructions in the 5 groups in idctrn.

Fig. 11. Number of dynamic instructions in the 5 groups in rgbyiq.


4.5 rgbcmy

The benchmark rgbcmy loads RGB data, converts it to CMYK format, and stores the result in new memory. RGB and CMYK are stored in arrays with strides 3 and 4, respectively. In the source code, the source and destination memory locations are obtained from a base pointer, an offset, and a stride. LLVM calculates the memory address in every loop iteration, whereas GCC performs loop-invariant factoring and strength reduction in the loop. Therefore, GCC generates only add instructions, while LLVM generates add and move instructions to obtain the memory address. As a result, the number of move instructions in GCC is lower than that in LLVM, as shown in Fig. 12.

4.6 routelookup

The hotspot function in the routelookup benchmark consists of 'if-else' statements, as shown in Fig. 13.

As mentioned in Section 4.2, LLVM has a weakness in basic block reordering. LLVM tends to maintain the sequence of the basic blocks, as described in Fig. 13. For example, there are two unconditional jumps and one conditional jump instruction in the assembly code generated by LLVM, whereas there is only one conditional jump instruction with GCC. The control flow in the code generated by LLVM is easy to understand because the basic blocks closely follow the original source code. However, this inefficiency reduces the chance of optimization, leading to increased dynamic instruction counts and static code size.

4.7 bezierfixed

The benchmark takes curves as input and calculates the Bezier curve points. This benchmark consists of nested loops and division operations. LLVM compiles these divisions into inline instruction sequences, as mentioned in Section 4.1. In addition, LLVM unrolls the innermost loop by a factor of 4. As a result, LLVM executes fewer branch and arithmetic instructions than GCC, as shown in Fig. 14.

5. Related Work

Few studies have compared GCC and LLVM. One example is Kim et al., "Comparison of LLVM and GCC on the ARM Platform" (EMC 2010) [10]. In their study, they focused on the inter-procedural optimization performed by 'llvm-ld'. In addition, they showed that GCC performed better in the other optimizations across various true and false flags. They gave '-O2' to both compilers, but they did not mention loop unrolling. LLVM might have improved since then, because their LLVM version was 2.6.

Fig. 12. Number of dynamic instructions in the 5 groups in rgbcmy.

C code:
if (proute->dst_addr == trie_root->key) {
    next_node = next_node->left;
} else {
    next_node = next_node->right;
}

(a)
    cmp %r1, %r2
    jeq label1
    jmp label2
label1:
    mov %r4, %r3
    jmp label3
label2:
    mov %r5, %r3
label3:

(b)
    mov %r4, %r3
    cmp %r1, %r2
    jeq label1
    mov %r5, %r3
label1:

Fig. 13. Code snippet in the routelookup and the corresponding assembly code generated by (a) LLVM and (b) GCC.

Fig. 14. Number of dynamic instructions in the 5 groups in bezierfixed.

6. Conclusion

This study analyzed and compared the performance of the code generated by LLVM and GCC in terms of the dynamic instruction counts and static code sizes. For the performance evaluation, two back-end compilers for the AE32000 processor were implemented, and the EEMBC benchmarks were used.

GCC performed better in optimizing memory accesses and basic block ordering, whereas LLVM did better in optimizing arithmetic instructions. In addition, LLVM sometimes optimized quite aggressively in loop unrolling, whereas GCC was good at register allocation, saving instructions by using more registers. Moreover, when there is an opportunity for loop optimization, the program performance can be enhanced by adding '-funroll-loops' with GCC.

In the ALU-intensive EEMBC benchmarks, LLVM showed better performance, whereas GCC showed better performance in the memory-intensive benchmarks. Overall, the dynamic instruction counts and the static code sizes of the benchmarks compiled by GCC were on average 8% and 7% lower than those of LLVM, respectively.

Acknowledgement

This study was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (No. 2011-0010262).

References

[1] GCC, the GNU Compiler Collection.
[2] The LLVM Compiler Infrastructure.
[3] J. Poovey, T. Conte, M. Levy, and S. Gal-On, "A benchmark characterization of the EEMBC benchmark suite," IEEE Micro, vol. 29, no. 5, pp. 18-29, 2009.
[4] Hyun-Gyu Kim, Dae-Young Jung, Hyun-Sup Jung, Young-Min Chio, Jung-Su Han, Byung-Gueon Min, and Hyeong-Cheol Oh, "AE32000B: a Fully Synthesizable 32-Bit Embedded Microprocessor Core," ETRI Journal, vol. 25, no. 5, pp. 337-344, Oct. 2003.
[5] AE32000-Lucida.
[6] Advanced Digital Chips Inc.
[7] EISC.
[8] Chanhyun Park, Miseon Han, Hokyoon Lee, and Seon Wook Kim, "Performance Comparison of GCC and LLVM on the EISC Processor," International Conference on Electronics, Information and Communication (ICEIC), Kota Kinabalu, Malaysia, 2014.
[9] J. C. Huang and T. Leng, "Generalized loop-unrolling: a method for program speedup," IEEE Symposium on Application-Specific Systems and Software Engineering and Technology (ASSET), Richardson, Texas, pp. 244-248, 1999.
[10] Jae-Jin Kim, Seok-Young Lee, Soo-Mook Moon, and Suhyun Kim, "Comparison of LLVM and GCC on the ARM Platform," Proc. of the 5th International Conference on Embedded and Multimedia Computing (EMC), pp. 1-6, Aug. 2010.

Chanhyun Park received his B.S. degree in Electrical Engineering from Korea University, Seoul, Republic of Korea, in 2013. Currently, he is a Ph.D. student at Korea University. His research interests include performance analysis in the Android mobile system. He is a student member of IEEE.

Miseon Han received her B.S. degree in Electrical Engineering from Korea University, Seoul, Republic of Korea, in 2012. Currently, she is a Ph.D. student at Korea University. Her research interests include compiler support, microarchitecture, and memory designs (particularly DRAM and NAND Flash memory). She is a student member of IEEE.

Hokyoon Lee received his B.S. degree in Electrical Engineering from Korea University, Seoul, Republic of Korea, in 2013. Currently, he is a Ph.D. student at Korea University. His research interests include compilers, embedded systems, and microarchitecture.

Myeongjin Cho received his B.S., M.S., and Ph.D. from the School of Electrical Engineering of Korea University, Seoul, Republic of Korea, in 2006, 2008, and 2013, respectively. Currently, he is a research professor at Korea University. His research interests include Android performance analysis and optimization, high-performance computing, and microarchitecture. He is a member of IEEE and ACM.


Park et al.: Performance Comparison between LLVM and GCC Compilers for the AE32000 Embedded Processor


Seon Wook Kim received his B.S. in Electronics and Computer Engineering from Korea University, Seoul, Republic of Korea, in 1988. He received his M.S. in Electrical Engineering from Ohio State University, Columbus, Ohio, USA, in 1990, and his Ph.D. in Electrical and Computer Engineering from Purdue University, West Lafayette, Indiana, USA, in 2001. He was a senior researcher at the Agency for Defense Development from 1990 to 1995, and a staff software engineer at Intel/KSL from 2001 to 2002. Currently, he is a professor with the School of Electrical and Computer Engineering of Korea University and Associate Dean for Research at the College of Engineering. His research interests include compiler construction, microarchitecture, and SoC design. He is a senior member of ACM and IEEE.

Copyrights © 2014 The Institute of Electronics and Information Engineers
