Image Denoising using Wavelet Thresholding and Model Selection

Shi Zhong

Dept. of ECE, Univ. of Texas at Austin

[email protected]

Vladimir Cherkassky

Dept. of ECE, Univ. of Minnesota

[email protected]

ABSTRACT

This paper describes wavelet thresholding for image denoising under the framework provided by Statistical Learning Theory, also known as Vapnik-Chervonenkis (VC) theory. Under the framework of VC-theory, wavelet thresholding amounts to ordering the wavelet coefficients according to their relevance to accurate function estimation, followed by discarding insignificant coefficients. Existing wavelet thresholding methods specify an ordering based on the coefficient magnitude, and use threshold(s) derived under a Gaussian noise assumption and asymptotic settings. In contrast, the proposed approach uses orderings that better reflect the statistical properties of natural images, and VC-based thresholding developed for finite sample settings under very general noise assumptions. A tree structure is proposed to order the wavelet coefficients based on their magnitude, scale and spatial location. The choice of a threshold is based on the general VC method for model complexity control. Empirical results show that the proposed method outperforms Donoho's level-dependent thresholding techniques, and that the advantages become more significant under finite sample and non-Gaussian noise settings.

1. INTRODUCTION

In many applications, image denoising is used to produce good estimates of the original image from noisy observations. The restored image should contain less noise than the observations while still preserving sharp transitions (i.e., edges).

The wavelet transform, due to its excellent localization property, has rapidly become an indispensable signal and image processing tool for a variety of applications, including compression [7,8] and denoising [1,2,4,5,6,9]. Wavelet thresholding (first proposed by Donoho [4,5,6]) is a signal estimation technique that exploits the capabilities of the wavelet transform for signal denoising and has recently received extensive research attention. It removes noise by killing coefficients that are insignificant relative to some threshold, and turns out to be simple and effective. The wavelet thresholding solution given by Donoho has also been proven to be asymptotically optimal in a minimax MSE (mean squared error) sense over a variety of smoothness spaces [2,6]. It should be pointed out, however, that all the proofs were conducted under additive Gaussian noise assumptions.

In this paper, we interpret image denoising as a special case of the signal estimation problem and propose a model selection based denoising method under the framework of VC theory, which was developed for estimating data dependencies from finite samples. The methodology is presented in the next section, followed by empirical results. Finally, we present the conclusions.

2. METHODOLOGY

2.1 VC-theory

VC-theory has recently emerged as a general theory for estimating data dependencies from finite samples. It provides a framework for model selection called structural risk minimization (SRM). Under SRM, a set of possible models (each model may consist of one or more basis functions) is ordered according to complexity. The set, called a structure in SRM, consists of a group of nested subsets $S_k$ such that

$$S_1 \subset S_2 \subset \cdots \subset S_k \subset \cdots \qquad (1)$$

where each element $S_k$ has finite VC-dimension $h_k$ (the complexity measure in VC-theory). A structure is designed to provide an ordering of its elements according to their complexity. Model selection can be done by choosing the element with the minimal analytic upper bound (VC-bound) on the prediction risk provided for each element by SRM. For a detailed formulation and explanation of the VC-bound see [10]. A simplified formula [3] derived for signal estimation (regression) is

$$R_{pred} \le R_{emp} \left( 1 - \sqrt{p - p \ln p + \frac{\ln n}{2n}} \right)^{-1} \qquad (2)$$

where $R_{emp}$ is the empirical risk, $R_{pred}$ is the estimated prediction risk, $n$ is the number of signal samples, and $p = h_k / n$ is a complexity parameter. This inequality holds with probability $1 - 1/\sqrt{n}$. A straightforward implementation of SRM is to construct each element $S_k$ in the structure as a linear combination of $n_k$ basis functions, in which case the complexity of each element $S_k$ is simply $h_k = n_k + 1$ [10].
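A minimal sketch of how the bound in (2) could be evaluated. It assumes natural logarithms and treats the bound as vacuous (infinite) whenever the term in parentheses is not positive; the function name `vc_bound` and that convention are illustrative assumptions, not part of the paper.

```python
import math


def vc_bound(r_emp: float, h: int, n: int) -> float:
    """Upper bound on the prediction risk, following Eq. (2).

    r_emp: empirical risk of the model
    h:     VC-dimension of the model (h >= 1)
    n:     number of samples
    Returns math.inf when the bound carries no information.
    """
    p = h / n  # complexity parameter p = h / n
    inner = p - p * math.log(p) + math.log(n) / (2 * n)
    denom = 1.0 - math.sqrt(inner)
    if denom <= 0.0:
        # Assumed convention: the model is too complex for n samples.
        return math.inf
    return r_emp / denom
```

For a fixed empirical risk the bound grows with `h`, which is what penalizes complex models during selection.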

2.2 Wavelet thresholding

Wavelet thresholding for image denoising involves two steps: 1) taking the wavelet transform of an image (i.e., calculating the wavelet coefficients); 2) discarding (setting to zero) the coefficients with relatively small or insignificant magnitudes. By discarding small coefficients one actually discards the wavelet basis functions whose coefficients are below a certain threshold. The denoised signal is obtained via the inverse wavelet transform of the kept coefficients. One global threshold derived by Donoho [5,6] under the Gaussian noise assumption is $T = \sigma \sqrt{2 \log(n)}$, where $n$ is the number of samples and $\sigma$ is the noise standard deviation.
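The global threshold and the discarding step can be sketched as follows. This assumes the natural logarithm and hard thresholding (coefficients below the threshold are set to zero, the rest are kept unchanged); the function names are illustrative.

```python
import numpy as np


def universal_threshold(sigma: float, n: int) -> float:
    """Global threshold T = sigma * sqrt(2 * log(n))."""
    return sigma * np.sqrt(2.0 * np.log(n))


def hard_threshold(coeffs, t: float) -> np.ndarray:
    """Zero every coefficient whose magnitude is below t."""
    coeffs = np.asarray(coeffs, dtype=float)
    return np.where(np.abs(coeffs) >= t, coeffs, 0.0)


# Usage: threshold a handful of coefficients for n = 1024 samples, sigma = 1.
t = universal_threshold(1.0, 1024)
kept = hard_threshold([0.5, -4.0, 3.0, 10.0], t)
```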

Clearly, wavelet thresholding can be viewed as a special case of signal/data estimation from noisy samples, which can be addressed within the framework of VC-theory. Consider the following structure on a set of all discrete wavelet basis functions: each element $S_k$ of the structure has exactly $k$ wavelet basis functions. Note that once the $k$ basis functions in $S_k$ are specified, minimizing the empirical risk is trivial due to the orthogonality of wavelets and amounts to estimating the wavelet coefficients via discrete wavelet decomposition.
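To spell out why the empirical risk is trivial to minimize, here is a short derivation under the assumption of an orthonormal discrete wavelet basis and squared-error loss. The symbols $\psi_i$ (basis functions), $w_i$ (wavelet coefficients of the noisy signal $y$), $c_i$ (free model coefficients) and $I_k$ (the index set of the $k$ basis functions in the chosen element) are notation introduced here, not taken from the paper.

```latex
\begin{aligned}
y &= \sum_{i=1}^{n} w_i \psi_i, \qquad w_i = \langle y, \psi_i \rangle, \\
R_{emp}(c) &= \frac{1}{n} \Big\| y - \sum_{i \in I_k} c_i \psi_i \Big\|^2
            = \frac{1}{n} \Big[ \sum_{i \in I_k} (w_i - c_i)^2
              + \sum_{i \notin I_k} w_i^2 \Big], \\
\min_{c} R_{emp}(c) &= \frac{1}{n} \sum_{i \notin I_k} w_i^2
  \quad \text{attained at } c_i = w_i .
\end{aligned}
```

Under this assumption the empirical risk of an element is just the energy of the discarded coefficients divided by $n$. With biorthogonal filters, which are not orthogonal, the identity does not hold exactly.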

In summary, application of SRM to wavelet thresholding for image denoising involves the following steps:

1) Define a structure by an appropriate importance ordering of all wavelet basis functions. Each element $S_k$ of a structure consists of the first $k$ basis functions. The original wavelet thresholding technique is equivalent to specifying a structure that uses only a magnitude ordering of the wavelet coefficients. Obviously, this is not the best way of ordering the coefficients. A better tree structure is presented in this paper.

2) Estimate the prediction risk for each set of wavelet functions formed in the structure. Since each $S_k$ is a set of linear models, the VC-bound on the prediction risk (2) is easy to compute.
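The steps above, followed by picking the element with the smallest bound, can be sketched in one dimension. This is a hedged illustration only: it uses a hand-written orthonormal Haar transform instead of the biorthogonal filters of the experiments, takes the importance ordering as an argument, and the usage lines pass a plain coarse-to-fine ordering as a placeholder rather than the tree ordering of the paper. All names are hypothetical.

```python
import numpy as np


def haar_1d(x) -> np.ndarray:
    """Orthonormal 1-D Haar transform of a signal of length 2**m.

    Output layout: [scaling coeff, coarsest detail, ..., finest details].
    """
    x = np.asarray(x, dtype=float)
    details = []
    while x.size > 1:
        avg = (x[0::2] + x[1::2]) / np.sqrt(2.0)
        diff = (x[0::2] - x[1::2]) / np.sqrt(2.0)
        details.append(diff)
        x = avg
    details.append(x)  # final scaling coefficient
    return np.concatenate(details[::-1])


def select_by_vc_bound(coeffs, order):
    """Return (k_best, bound), where bound[k-1] is Eq. (2) for S_k.

    order: indices of the coefficients in importance order (step 1).
    S_k keeps the first k coefficients, with VC-dimension h_k = k + 1.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    n = coeffs.size
    energy = coeffs[order] ** 2
    k = np.arange(1, n + 1)
    # Orthonormal basis: empirical risk = discarded energy / n.
    r_emp = np.maximum(energy.sum() - np.cumsum(energy), 0.0) / n
    p = (k + 1) / n
    inner = p - p * np.log(p) + np.log(n) / (2 * n)
    denom = 1.0 - np.sqrt(inner)
    bound = np.full(n, np.inf)  # vacuous where denom <= 0 (assumed convention)
    ok = denom > 0
    bound[ok] = r_emp[ok] / denom[ok]
    return int(k[np.argmin(bound)]), bound


# Usage: a noisy step signal with a placeholder coarse-to-fine ordering.
rng = np.random.default_rng(0)
noisy = np.concatenate([np.zeros(128), np.full(128, 10.0)])
noisy = noisy + rng.normal(0.0, 1.0, size=256)
w = haar_1d(noisy)
k_best, bound = select_by_vc_bound(w, np.arange(w.size))
```

Because the ordering is an input, the same routine can be reused to compare different structures on the same coefficients.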

2.3 Level dependent thresholding and importance ordering

Level-dependent thresholding has been proposed to improve the performance of the wavelet thresholding method. Instead of using a global threshold, level-dependent thresholding uses a group of thresholds, one for each scale level. One popular level-dependent thresholding scheme [4] sets the threshold as:

$$t_{n,j} = \sigma \sqrt{2 \log(n)} \cdot 2^{(j-J)/2}, \quad j = 0, \ldots, J \qquad (3)$$

where $n$ is the total number of signal samples, $J$ is the number of decomposition levels, $\sigma$ is the noise standard deviation (to be estimated) and $j$ is the scale level. This scheme uses a larger threshold at finer scale levels. It can be interpreted as:

1) Order the wavelet coefficients with respect to their magnitudes adjusted by scale level, i.e., multiplied by $2^{-j/2}$, where $j$ is the scale level associated with each coefficient.

2) Apply the global threshold $t_n = \sigma \sqrt{2 \log(n)} \cdot 2^{-J/2}$.

This suggests that level-dependent thresholding can be viewed as a special case of a more sophisticated importance ordering in the model selection based denoising method.
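The equivalence between per-level thresholds and a global threshold on scale-adjusted magnitudes can be checked numerically. The sketch below assumes that the scale index j runs from 0 (coarsest) to J (finest), so that the exponent in (3) is (j - J)/2 and the threshold grows toward finer scales; this indexing convention, the natural logarithm, and all function names are assumptions made for illustration.

```python
import numpy as np


def level_thresholds(sigma: float, n: int, J: int) -> np.ndarray:
    """Thresholds t_{n,j} for j = 0..J, assuming larger j means finer scale."""
    j = np.arange(J + 1)
    return sigma * np.sqrt(2.0 * np.log(n)) * 2.0 ** ((j - J) / 2.0)


def keep_by_level_threshold(w, level, sigma, n, J) -> np.ndarray:
    """Mask of coefficients kept when each level has its own threshold."""
    t = level_thresholds(sigma, n, J)
    return np.abs(w) >= t[level]


def keep_by_scaled_global(w, level, sigma, n, J) -> np.ndarray:
    """Same decision, written as scale-adjusted magnitude vs. one threshold."""
    scaled = np.abs(w) * 2.0 ** (-level / 2.0)
    t_global = sigma * np.sqrt(2.0 * np.log(n)) * 2.0 ** (-J / 2.0)
    return scaled >= t_global
```

Both masks agree because dividing the inequality |w| >= t_{n,j} by 2^{j/2} moves the level dependence from the threshold to the coefficient, which is exactly the reordering interpretation given in the text.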

A number of different structures (ordering schemes) can be specified on the same set of basis functions. The choice of a structure can be critical for the success of image denoising. A good ordering should reflect the prior knowledge about the signal/data being estimated. For example, it is not sensible to order a set of polynomial basis functions starting from the highest order term, or to order the Fourier basis functions from the highest frequency down (because such orderings contradict the basic assumptions about signal smoothness). Similarly, 2-D image signal estimation with the VC approach may require a more complicated ordering scheme.

Motivated by tree structures used in wavelet-based image compression [7,8], an improved tree-based ordering structure is proposed in this paper. The basic idea is to simultaneously exploit the magnitude, scale and spatial location contribution of each wavelet coefficient using a tree structure. This ordering scheme includes the following steps:

1) Set the initial threshold $t = 2^{\lfloor \log_2 ( \max_{i,j} \{ |WY(i,j)| \} ) \rfloor}$ ($\lfloor \cdot \rfloor$ denotes the closest smaller integer), set the final threshold $t_f$ (usually 1) and set the initial ordered coefficient list to an empty list;

2) Scan all the coefficients in order from low scale to high scale. Within each scale, choose (in a certain order) those selected coefficients that are equal to or larger than the threshold $t$ and append them to the list (due to space limits, we refer readers to [11] for details on which coefficients are selected);

3) Set those coefficients selected in step 2) to N/A (not available in the next iteration) and halve the threshold $t$;

4) If $t \ge t_f$, then repeat step 2) and step 3); otherwise, append all the remaining coefficients to the list in a certain scanning order.
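The loop structure of these steps can be sketched as follows. This is a deliberate simplification: the tree-based rule that decides which coefficients are eligible in step 2) is deferred to [11] and is not reproduced here, so the sketch picks every available coefficient that reaches the current threshold. The flat coefficient array, the integer `scale` labels (smaller means scanned earlier), and the within-scale scan by position are all assumptions.

```python
import numpy as np


def bitplane_order(w, scale, t_final: float = 1.0) -> np.ndarray:
    """Order coefficient indices by halving thresholds, then by scan order.

    w:     flat array of wavelet coefficients
    scale: integer scale label per coefficient (scanned in ascending order)
    """
    w = np.asarray(w, dtype=float)
    scale = np.asarray(scale)
    mag = np.abs(w)
    # Scan order: ascending scale, ties broken by position in the array.
    scan = np.lexsort((np.arange(w.size), scale))
    available = np.ones(w.size, dtype=bool)
    order = []
    if mag.max() > 0:
        # Step 1: largest power of two not exceeding the peak magnitude.
        t = 2.0 ** np.floor(np.log2(mag.max()))
        while t >= t_final:  # step 4: loop condition
            # Step 2 (simplified): take every available coefficient >= t.
            picked = [i for i in scan if available[i] and mag[i] >= t]
            order.extend(picked)
            if picked:
                available[picked] = False  # step 3: mark as N/A
            t /= 2.0  # step 3: halve the threshold
    # Step 4: append whatever is left in scan order.
    order.extend(i for i in scan if available[i])
    return np.array(order, dtype=int)
```

The resulting index array can serve as the importance ordering that defines the nested elements of the structure.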

3. EMPIRICAL RESULTS

We compared the following three denoising methods:

1) WaveThresh: Donoho's level-dependent thresholding method using (3). The noise standard deviation is calculated using Donoho's estimate MAD/0.6745 [4], where MAD is the median of the magnitudes of all the coefficients at the finest decomposition scale.

2) WaveVC: Order the wavelet coefficients using the tree structure proposed in the previous section and use the VC-bound to choose the optimal number of coefficients (minimizing the bound).

3) Wiener2: Wiener2 in Matlab is a spatial version of the Wiener filtering algorithm.
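The noise estimate used by WaveThresh is a one-liner. The sketch below assumes the finest-scale detail coefficients are already available as an array; the function name is illustrative.

```python
import numpy as np


def mad_sigma(finest_detail) -> float:
    """Noise standard deviation estimate MAD / 0.6745.

    MAD is the median of the coefficient magnitudes at the finest scale.
    """
    finest_detail = np.asarray(finest_detail, dtype=float)
    return float(np.median(np.abs(finest_detail)) / 0.6745)
```

The constant 0.6745 is the median of the absolute value of a standard normal variable, so the estimate is calibrated for Gaussian noise.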

Approaches 1) and 2) use biorthogonal wavelet filters. The window size, a parameter in Wiener2, is set to 3 × 3 in our experiments. Different image sizes are tested. We mainly compare the methods on two measures: the Signal-to-Noise Ratio (SNR) of the denoised image and the model complexity of the approximation. SNR is defined as:

$$SNR = 10 \log_{10} \left( \frac{var(Y)}{mse(Y, \hat{Y})} \right) \qquad (4)$$

where $Y$ is the original clean image and $\hat{Y}$ is the denoised image. The model complexity of WaveVC and WaveThresh is just the VC dimension of the model. Wiener2 can be viewed as a local K-mean method doing some local averaging over the noisy image. Its model complexity can be approximated by the VC dimension of the K-mean method, which is n/k [3], with n the number of samples and k the size of the averaging window. For example, for a 512 × 512 image with a 3 × 3 window, the model complexity is 512 × 512 / (3 × 3) ≈ 29127.
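Both measures are easy to compute. The sketch below assumes images are given as arrays of equal shape and that the variance in (4) is the population variance of the clean image; the function names are illustrative.

```python
import numpy as np


def snr_db(clean, denoised) -> float:
    """SNR in dB as in Eq. (4): 10 * log10(var(Y) / mse(Y, Y_hat)).

    Undefined (division by zero) if the two images are identical.
    """
    clean = np.asarray(clean, dtype=float)
    denoised = np.asarray(denoised, dtype=float)
    mse = np.mean((clean - denoised) ** 2)
    return float(10.0 * np.log10(np.var(clean) / mse))


def local_average_complexity(height: int, width: int, window: int) -> float:
    """Approximate complexity n / k of averaging over a window x window area."""
    return height * width / (window * window)


# Usage: the 512 x 512 image with a 3 x 3 window from the text.
complexity = local_average_complexity(512, 512, 3)
```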

Due to space limits, we only show results on the 8-bit Lenna image in this paper. Fig. 1 and Fig. 2 show comparable denoising results on 512 × 512 Lenna images corrupted by Gaussian white noise (σ = 15), using WaveThresh and WaveVC, respectively. Fig. 3 compares the SNR values and the model complexities of the three approaches on 512 × 512 Lenna images at a variety of noise levels. The results on 128 × 128 images and 32 × 32 images are shown in Fig. 4 and Fig. 5, respectively. In these results, multiplicative speckle noise is used to show the advantages of our proposed method under non-Gaussian settings. We have similar but less dramatic results for Gaussian noise settings (which can be found in [11]).

WaveVC performs approximately the same as or better than WaveThresh for 512 × 512 Lenna images and begins to outperform WaveThresh for smaller (128 × 128) images. The relative performance of WaveVC increases further for 32 × 32 images. The results can be explained as follows:

1) VC theory was designed for finite samples, whereas Donoho's threshold was derived under asymptotic assumptions. As the image size gets smaller, the asymptotic assumptions begin to fail.

2) The noise assumption used in Donoho's derivation fails when images are not contaminated by additive Gaussian noise. In contrast, the VC-based approach is more general in this sense.

As a global trend, WaveVC tends to use a large number of coefficients for reconstructing the image when the true noise standard deviation σ is small and fewer when σ is large. This is true for different image sizes. So when the noise standard deviation is fairly small, meaning the image is fairly clean, the VC approach tends to keep a large number of coefficients, which makes sense. WaveThresh does not show such a clear trend.

4. CONCLUSIONS

The image denoising problem can be cast as a 2-D signal estimation problem. In this paper, a VC-based model selection method is integrated with a variation of the wavelet thresholding method and performs well on this problem. An importance ordering structure (the tree structure), which reflects the prior knowledge about the data and the basis functions used, turns out to characterize the importance of noisy wavelet coefficients successfully. However, there may exist better ordering schemes for this wavelet-based denoising problem.

Wiener filtering is an optimal linear MSE estimator and Donoho has proven his methods to be minimax optimal under certain assumptions. However, both methods are based on a white noise model and their optimality holds only in an asymptotic sense. In contrast, the model selection based denoising method is more general and does not need any noise assumption. Compared to Wiener filtering, thresholding uses a sparse structure to approximate the original signal and so provides a compressed representation of it (only a small number of coefficients need to be kept). The method therefore has many other potential applications.

5. ACKNOWLEDGEMENT

This work was supported, in part, by a grant from the Minnesota Department of Transportation.

6. REFERENCES

[1] S. G. Chang and M. Vetterli, "Spatial Adaptive Wavelet Thresholding for Image Denoising", Proc. of IEEE Int. Conf. on Image Processing, 1997

[2] A. Chambolle, R. A. DeVore, N-Y Lee and B. J. Lucier, "Nonlinear wavelet image processing: variational problems, compression and noise removal through wavelet shrinkage", IEEE Trans. Image Processing, vol. 7, pp. 319-335, 1998

[3] V. Cherkassky and F. Mulier, Learning from Data: Concepts, Theory and Methods, Wiley Interscience, 1998

[4] D. L. Donoho, "Wavelet Thresholding and W.V.D.: A 10-minute Tour", Int. Conf. on Wavelets and Applications, Toulouse, France, June 1992

[5] D. L. Donoho and I. M. Johnstone, "Ideal spatial adaptation via wavelet thresholding", Biometrika, vol. 81, pp. 425-455, 1994

[6] D. L. Donoho, "De-Noising by Soft-Thresholding", IEEE Trans. Information Theory, vol. 41, No. 3, May 1995

[7] A. Said and W. A. Pearlman, "A New Fast and Efficient Image Codec Based on Set Partitioning in Hierarchical Trees", IEEE Trans. Circ. and Syst. Video Tech., vol. 6, June 1996

[8] J. M. Shapiro, "Embedded Image Coding using Zerotrees of Wavelet coefficients", IEEE Trans. Signal Processing, vol. 41, pp. 3445-3462, Dec. 1993

[9] X. Shao and V. Cherkassky, "Model Selection for Wavelet-based Signal Estimation", Proc. IEEE Int. Joint Conf. on Neural Networks, Anchorage, Alaska, 1998

[10] V. Vapnik, The Nature of Statistical Learning Theory, Springer, 1995

[11] S. Zhong and V. Cherkassky, "Image Denoising using Wavelet Thresholding and Statistical Learning Theory", submitted to IEEE Trans. Image Processing, Feb. 2000

Fig. 1 Denoised image by WaveThresh (SNR = 24.99 dB)

Fig. 2 Denoised image by WaveVC (SNR = 25.26 dB)

Fig. 3 Denoising results for multiplicative speckle noise on 512 by 512 Lenna image



[Fig. 3 panels: SNR (dB) versus noise standard deviation, and model complexity (VC-dimension) versus noise standard deviation. Legend: . - Wiener2, + - WaveThresh, o - WaveVC. Two further pairs of panels with the same axes and legend follow without captions.]
