

Astronomy & Astrophysics manuscript no. DL2012_double — © ESO 2012, December 20, 2012

Astronomical Image Denoising Using Dictionary Learning

S. Beckouche^1, J.L. Starck^1, and J. Fadili^2

^1 Laboratoire AIM, UMR CEA-CNRS-Paris 7, Irfu, SAp/SEDI, Service d'Astrophysique, CEA Saclay, F-91191 Gif-sur-Yvette Cedex, France
^2 GREYC CNRS-ENSICAEN-Université de Caen, 6, Bd du Maréchal Juin, 14050 Caen Cedex, France

December 20, 2012

Abstract

Astronomical images suffer from the constant presence of multiple defects caused by the intrinsic properties of the acquisition equipment and by atmospheric conditions. One of the most frequent defects in astronomical imaging is additive noise, which makes a denoising step mandatory before processing the data. During the last decade, a particular modeling scheme, based on sparse representations, has drawn the attention of an ever-growing community of researchers. Sparse representations offer a promising framework for many image and signal processing tasks, especially denoising and restoration applications. At first, harmonic and wavelet bases and similar overcomplete representations were considered as candidate domains in which to seek the sparsest representation. A new generation of algorithms, based on data-driven dictionaries, evolved rapidly and now competes with the off-the-shelf fixed dictionaries. While designing a dictionary beforehand relies on a guess of the most appropriate elementary forms and functions, the dictionary learning framework constructs the dictionary from the data themselves, which provides a more flexible setup for sparse modeling and allows building more sophisticated dictionaries. In this paper, we introduce the Centered Dictionary Learning (CDL) method and study its performance for astronomical image denoising. We show how CDL outperforms wavelet and classic dictionary learning denoising techniques on astronomical images, and we compare the effect of these different algorithms on the photometry of the denoised images.

Key words. Methods: Data Analysis, Methods: Statistical

1. Introduction

1.1. Overview of sparsity in astronomy

The wavelet transform (WT) has been extensively used in astronomical data analysis during the last ten years, and this holds for all astrophysical domains, from the study of the Sun to Cosmic Microwave Background (CMB) analysis (Starck & Murtagh 2006). X-ray and gamma-ray source catalogs are generally based on wavelets (Pacaud et al. 2006; Nolan et al. 2012). Using multiscale approaches such as the wavelet transform, an image can be decomposed into components at different scales, and the wavelet transform is therefore well adapted to the study of astronomical data (Starck & Murtagh 2006). Furthermore, since noise in the physical sciences is not always Gaussian, modeling in wavelet space of many kinds of noise, such as Poisson noise, has been a key motivation for the use of wavelets in astrophysics (Schmitt et al. 2010). While wavelets represent isotropic features well, they are far from optimal for analyzing anisotropic objects such as filaments, jets, etc. This has motivated the construction of collections of basis functions generating possibly overcomplete dictionaries, e.g. cosines, wavelets, curvelets (Starck et al. 2003). More generally, we assume that the data $X$ is a superposition of atoms from a dictionary $D$ such that $X = D\alpha$, where $\alpha$ are the synthesis coefficients of $X$ in $D$. The best data decomposition is the one that leads to the sparsest representation, i.e. few coefficients have a large magnitude while most of them are close to zero (Starck et al. 2010b). Hence, for some astronomical data sets containing edges (planetary images, cosmic strings, etc.), curvelets should be preferred to wavelets. But for a signal composed of a sine, the Fourier dictionary is optimal from a sparsity standpoint, since all the information is contained in a single coefficient. Hence, the representation space that we use in our analysis can be seen as a prior on our observations. The larger the dictionary is, the better the data analysis will be, but also the larger the computation time needed to derive the coefficients $\alpha$ in the dictionary. For some specific dictionaries limited to a given set of functions (Fourier, wavelet, etc.) we however have very fast implicit operators allowing us to compute the coefficients with a complexity of


$O(N \log N)$, which makes these dictionaries very attractive. But what can we do if our data are not well represented by these fixed existing dictionaries? Or if we do not know the morphology of the features contained in our data? Is there a way to optimize our data analysis by constructing a dedicated dictionary? To answer these questions, a new field has recently emerged, called Dictionary Learning (DL). Dictionary learning techniques learn an adaptive dictionary $D$ directly from the data (or from a set of exemplars that we believe represent the data well). DL is at the interface of machine learning, optimization, and harmonic analysis.

1.2. Contributions

In this paper, we show how classic dictionary learning for denoising behaves on astronomical images. We introduce a new variant, Centered Dictionary Learning (CDL), developed to process more efficiently the point-like features that are extremely common in astronomical images. Finally, we perform a study comparing how wavelet and dictionary learning denoising methods behave with regard to source photometry, showing that dictionary learning is better at preserving the sources' flux.

1.3. Paper organization

This paper is organized as follows. Section 2 presents the sparsity regularization problem, where we introduce notation and the paradigm of dictionaries for sparse coding. In Section 3 we present the methods for denoising by dictionary learning and introduce the CDL technique. We give in Section 4 our results on astronomical images and conclude with some perspectives in Section 5.

2. Sparsity regularization

2.1. Notations

We use the following notation. Uppercase letters denote matrices and lowercase letters denote vectors. Matrices are written column-wise, $D = [d_1, \ldots, d_m] \in \mathbb{R}^{n \times m}$. If $D$ is a dictionary matrix, the columns $d_i \in \mathbb{R}^n$ represent the atoms of the dictionary. We define the $\ell_p$ pseudo-norm ($p > 0$) of a vector $x \in \mathbb{R}^n$ as $\|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}$. As an extension, the $\ell_\infty$ norm is defined as $\|x\|_\infty = \max_{1 \le i \le n} |x_i|$, and the $\ell_0$ pseudo-norm stands for the number of non-zero entries of a vector: $\|x\|_0 = \#\{i \mid x_i \ne 0\}$.

Given an image $Y$ of $Q \times Q$ pixels, a patch size $n = \tau \times \tau$ and an overlap factor $\Delta \in [1, \ldots, n]$, we denote by $R_{(i_1,i_2)}(Y)$ the operator that extracts the patch of $Y$ centered at position $i = (i_1, i_2) \in [0, \ldots, Q/\Delta]^2$ and converts it into a vector of $\mathbb{R}^n$, such that

$$\forall j_1, j_2 \in [-\tau/2, \ldots, \tau/2], \quad Y(i_1 \Delta + j_1, i_2 \Delta + j_2) = R_i(Y)[\tau j_1 + j_2], \tag{1}$$

which corresponds to reading the extracted square patch column by column. Given a patch $R_{i,j}(Y) \in \mathbb{R}^n$, we define the centering operator $C_{i,j}$ as the translation operator

$$C_{i,j} R_{i,j}[l] = \begin{cases} R_{i,j}[l + \delta_{i,j}] & \text{if } 1 \le l \le n - \delta_{i,j} \\ R_{i,j}[l + \delta_{i,j} - n] & \text{if } n - \delta_{i,j} < l \le n \end{cases} \tag{2}$$

where $\delta_{i,j}$ is the smallest index verifying

$$\begin{cases} C_{i,j} R_{i,j}[n/2] = \max_l R_{i,j}[l] & \text{if } n \text{ is even} \\ C_{i,j} R_{i,j}[(n-1)/2] = \max_l R_{i,j}[l] & \text{if } n \text{ is odd.} \end{cases} \tag{3}$$

The centering operator translates the original vector values so as to place the maximum value at the central index position. If the original vector has more than one entry reaching its maximum value, the smallest index with this value is placed at the center of the translated vector. Finally, to compare two images $M_1, M_2$, we use the peak signal-to-noise ratio

$$\mathrm{PSNR} = 10 \log_{10} \left( \frac{\max(M_1, M_2)^2}{\mathrm{MSE}(M_1, M_2)} \right),$$

where $\mathrm{MSE}(M_1, M_2)$ is the mean square error between the two images and $\max(M_1, M_2)$ is the highest value contained in $M_1$ and $M_2$.
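To make the notation concrete, here is a minimal NumPy sketch of the centering operator of Eqs. (2)-(3) and of the PSNR criterion. We use 0-based indexing with `n // 2` as the central index, and the function names are ours, not the authors':

```python
import numpy as np

def center(patch):
    """Cyclically shift a flattened patch so that its maximum lands at the
    central index (cf. Eqs. (2)-(3)); np.argmax picks the smallest index
    when the maximum is reached more than once, as the paper requires."""
    n = len(patch)
    c = n // 2                     # central index, our 0-based convention
    m = int(np.argmax(patch))      # smallest index of the maximum
    return np.roll(patch, c - m)   # cyclic translation

def psnr(m1, m2):
    """Peak signal-to-noise ratio between two images m1, m2."""
    mse = np.mean((m1 - m2) ** 2)
    peak = max(m1.max(), m2.max())
    return 10.0 * np.log10(peak ** 2 / mse)
```

For example, centering `[5, 0, 1, 2, 3]` moves the value 5 to index 2 while preserving the cyclic order of the other entries.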

2.2. Sparse recovery

A signal $\alpha = [\alpha_1, \ldots, \alpha_n]$ is said to be sparse when most of its entries $\alpha_i$ are equal to zero. When the observations do not satisfy the sparsity prior in the direct domain, computing their representation coefficients in a given dictionary might yield a sparser representation of the data. Overcomplete dictionaries, which contain more atoms than their dimension and thus are redundant, have been shown over the last decade, when coupled with a sparse coding framework, to lead to more significant and expressive representations, which help to better interpret and understand the observations (Starck & Fadili 2009; Starck et al. 2010a). Sparse coding concentrates on two main axes: finding the appropriate dictionary, and computing the encodings given this dictionary.

Sparse decomposition requires the summation of the relevant atoms with their appropriate weights. However, unlike a transform coder that comes with an inverse transform, finding such sparse codes within overcomplete dictionaries is non-trivial, in particular because the decomposition of a signal on an overcomplete dictionary is not unique. The combination of a dictionary representation with sparse modeling was first introduced in the pioneering work of Mallat & Zhang (1993), where the traditional wavelet transforms were replaced by the more generic concept of a dictionary for the first time.


We use in this paper the sparse synthesis approach. Given an observation $x \in \mathbb{R}^n$ and a sparsifying dictionary $D \in \mathbb{R}^{n \times k}$, sparse decomposition refers to finding an encoding $\alpha \in \mathbb{R}^k$ that represents the signal $x$ in the domain spanned by the dictionary $D$, while minimizing the number of elementary atoms involved in the so-identified code:

$$\alpha \in \arg\min_\alpha \|\alpha\|_0 \quad \text{s.t.} \quad x = D\alpha. \tag{4}$$

When the original signal is to be reconstructed only approximately, the equality constraint is replaced by an $\ell_2$ norm inequality constraint:

$$\alpha \in \arg\min_\alpha \|\alpha\|_0 \quad \text{s.t.} \quad \|x - D\alpha\|_2 \le \varepsilon \tag{5}$$

where $\varepsilon$ is a threshold controlling the misfit between the observation $x$ and the recovered signal $\hat{x} = D\alpha$.

The sparse prior can also be used from an analysis point of view (Elad et al. 2007). In this case, the computation of the signal coefficients is simply obtained by applying the sparsifying dictionary, and the problem becomes

$$y \in \arg\min_y \|D^* y\|_0 \quad \text{s.t.} \quad y = x \tag{6}$$

or

$$y \in \arg\min_y \|D^* y\|_0 \quad \text{s.t.} \quad \|x - y\|_2 \le \varepsilon \tag{7}$$

depending on whether the signal $x$ is contaminated by noise or not. This approach has been explored more recently than the synthesis model and has thus far yielded promising results (Rubinstein et al. 2012). We chose to use the synthesis model for our work because it offers more guarantees, having been proved to be an efficient model in many different contexts.

Solving (5) proves to be NP-hard and numerically intractable. Nonetheless, heuristic methods called greedy algorithms have been developed to approximate the sparse solution of the $\ell_0$ problem, while considerably reducing the resource requirements. The process of seeking a solution can be divided into two effective parts: finding the support of the solution and estimating the values of the entries over the selected support (Mallat & Zhang 1993). Once the support of the solution is found, estimating the signal coefficients becomes a straightforward and easier problem, since a simple least-squares fit can often provide the optimal solution on the selected support. This class of algorithms includes matching pursuit (MP), orthogonal matching pursuit (OMP), gradient pursuit (GP) and their variants.

A popular alternative to problem (5) is to use the $\ell_1$ norm instead of the $\ell_0$ to promote a sparse solution. Using the $\ell_1$ norm as a sparsity prior results in a convex optimization problem (Basis Pursuit De-Noising, or Lasso) that is computationally tractable, finding

$$\alpha \in \arg\min_\alpha \|\alpha\|_1 \quad \text{s.t.} \quad \|x - D\alpha\|_2 \le \varepsilon. \tag{8}$$

The optimization problem (8) can also be written in its unconstrained form:

$$\alpha \in \arg\min_\alpha \|x - D\alpha\|_2^2 + \lambda \|\alpha\|_1 \tag{9}$$

where $\lambda$ is a Lagrange multiplier controlling the sparsity of the solution (Chen et al. 1998). The larger $\lambda$ is, the sparser the solution becomes. Many frameworks have been proposed in this perspective, leading to multiple basis pursuit schemes. Readers interested in an in-depth study of sparse decomposition algorithms are referred to Starck et al. (2010a) and Elad (2010).
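One standard solver for the unconstrained form (9) is iterative soft thresholding (Daubechies et al. 2004), mentioned again in Section 3.2.1. The NumPy sketch below uses the common $\frac{1}{2}$-scaled data term and a fixed iteration count instead of a convergence test; these conventions and the function names are ours:

```python
import numpy as np

def soft(v, t):
    """Soft thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(x, D, lam, n_iter=200):
    """Iterative soft thresholding for the lasso form of Eq. (9),
    min_a 0.5 * ||x - D a||_2^2 + lam * ||a||_1."""
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        # gradient step on the data term, then shrinkage on the l1 term
        a = soft(a + (D.T @ (x - D @ a)) / L, lam / L)
    return a
```

With $D$ the identity, `ista` reduces to plain soft thresholding of $x$, which makes the shrinkage effect of $\lambda$ easy to check.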

2.3. Fixed dictionaries

A data set can be decomposed in many dictionaries, but the best dictionary for solving (5) is the one with the sparsest (most economical) representation of the signal. In practice, it is convenient to use dictionaries with a fast implicit transform (such as the Fourier transform, the wavelet transform, etc.), which allow us to obtain the coefficients directly and to reconstruct the signal from these coefficients using fast algorithms running in linear or almost linear time (unlike matrix-vector multiplications). The Fourier, wavelet, and discrete cosine transforms certainly provide the most well known dictionaries.

Most of these dictionaries are designed to handle specific content, and are restricted to signals and images of a certain type. For instance, Fourier represents stationary and periodic signals well, wavelets are good for analyzing isotropic objects of different scales, curvelets are designed for elongated features, etc. They cannot guarantee sparse representations of new classes of signals of interest that present more complex patterns and features. Thus, finding new approaches to design these sparsifying dictionaries becomes of the utmost importance. Recent works have shown that designing adaptive dictionaries and learning them from the data themselves, instead of using predesigned selections of analytically-driven atoms, leads to state-of-the-art performance in various tasks such as image denoising (Elad & Aharon 2006), inpainting (Mairal et al. 2010), source separation (Bobin et al. 2008, 2012), and so forth.

2.4. Learned dictionaries

The problem of dictionary learning, in its non-overcomplete form (that is, when the number of atoms in the dictionary is smaller than or equal to the dimension of the signal to decompose), has been studied in depth and can be approached using


many viable techniques, such as principal component analysis (PCA) and its variants, which are based on algorithms minimizing the reconstruction error over a training set of samples, while representing them as linear combinations of the dictionary elements (Bishop 2007). Inspired by an analogy to the learning mechanism of the simple cells in the visual cortex, Olshausen & Field (1996) proposed a minimization process based on a cost function that balances a misfit term and a sparsity-inducing term. The optimization is performed by alternating the minimization with respect to the sparse encodings and with respect to the dictionary functions. Most overcomplete dictionary learning methods are based on a similar alternating optimization scheme, while using specific techniques to induce the sparsity prior and update the dictionary elements. This problem shares many similarities with the Blind Source Separation (BSS) problem (Zibulevsky & Pearlmutter 1999), although in BSS the sources are assumed to be sparse in a fixed dictionary and the learning is performed on the mixing matrix.

A popular approach is to learn patch-sized atoms instead of a dictionary of image-sized atoms. This allows faster processing and makes the learning possible even with a single image to train on, as many patch exemplars can be extracted from a single training image. Section 3 gives more details about the variational problem of patch learning and denoising. This patch-based approach has led to different learning algorithms such as MOD (Engan et al. 1999), projected gradient descent methods (Lin 2007), or K-SVD (Aharon et al. 2006), which have proven efficient for image processing (Elad & Aharon 2006; Mairal et al. 2010; Peyre et al. 2010).

3. Denoising by centered dictionary learning

3.1. General variational problem

The goal of denoising with dictionary learning is to build a suitable $n \times k$ dictionary $D$, a collection of atoms $[d_i]_{i=1,\ldots,k}$ with $d_i \in \mathbb{R}^n$, that offers a sparse representation of the estimated denoised image. As it is not numerically tractable to process the whole image as a large vector, Elad & Aharon (2006), Mairal et al. (2010), and Peyre et al. (2010) propose to break the image down into smaller patches and learn a dictionary of patch-sized atoms. When simultaneously learning a dictionary and denoising an image $Y$, the learning process finds

$$(\hat{X}, \hat{A}, \hat{D}) \in \arg\min_{X, A, D} \mathcal{E}(X, A, D) \tag{10}$$

where

$$\mathcal{E}(X, A, D) = \frac{\lambda}{2} \|Y - X\|_2^2 + \sum_{i,j} \left( \frac{\mu_{i,j}}{2} \|C_{i,j} R_{i,j}(X) - D\alpha_{i,j}\|_2^2 + \|\alpha_{i,j}\|_1 \right) \tag{11}$$

such that the learned dictionary $D$ is in $\mathcal{D}$, the set of dictionaries with bounded atoms:

$$\forall j \in [1, \ldots, k], \quad \|d_j\|_2^2 = \sum_{i=1}^{n} |d_j[i]|^2 \le 1. \tag{12}$$

Here, $Y$ is the noisy image, $X$ the estimated denoised image, $A = (\alpha_{i,j})_{i,j}$ is the sparse encoding matrix such that $\alpha_{i,j}$ is a sparse encoding of $R_{i,j}(X)$ in $D$, and $C_{i,j}$ is a centering operator defined by (2). The parameters $\lambda$ and $(\mu_{i,j})_{i,j}$ balance the energy between the sparsity prior, the data fidelity between the learned dictionary and the training set, and the denoising. The dictionary is constrained to verify (12) to avoid degenerate solutions. Indeed, if $(X, A, D)$ is a minimizer of $\mathcal{E}$, then $(X, \nu A, \frac{1}{\nu} D)$, $0 < \nu < 1$, is also a solution and can yield an arbitrarily small sparsity error term, and a degenerate dictionary. Having bounded atoms allows us to avoid this problem. The energy (11) is not minimized with respect to the translation operators $(C_{i,j})_{i,j}$. We chose to use fixed operators that translate the maximum of a patch to its center, in order to increase the sensitivity of the algorithm to point-like features such as stars, which are very common in astronomical images. We show in Section 4 how introducing the centering operators allows significantly better results for astronomical image processing.

It is possible to learn a dictionary without simultaneously denoising the image, thus minimizing

$$\sum_{i,j} \left( \frac{1}{2} \|R_i(X) - D\alpha_i\|_2^2 + \lambda \|\alpha_i\|_1 \right) \tag{13}$$

with respect to $D$ and $A$. This allows learning a dictionary from a noiseless training set, or learning from a small noisy training set extracted from a large noisy image when it is numerically intractable to process the whole image directly. Once the dictionary is learned, an image can be denoised by solving (5), as we show in Section 4. The classical scheme of dictionary learning for denoising does not include the centering operators and has proven to be an efficient approach (Elad & Aharon 2006; Peyre et al. 2010).

An efficient way to find a minimizer of (11) is to use an alternating minimization scheme. The dictionary $D$, the sparse coding coefficient matrix $A$, and the denoised image $X$ are updated alternately, one at a time, with the others kept fixed. We give more details about each step and how we tuned the parameters below.

3.2. Alternating minimization

3.2.1. Sparse coding

We consider here that the estimated image $X$ and the dictionary $D$ are fixed, and we minimize $\mathcal{E}$ with respect to $A$. Estimating the sparse encoding


matrix $A$ comes down to solving (9), which can be done using iterative soft thresholding (Daubechies et al. 2004) or an interior point solver (Chen et al. 1998). We chose to use the Orthogonal Matching Pursuit (OMP) algorithm (Pati et al. 1993), a greedy algorithm that finds an approximate solution of (5). OMP yields satisfying results while being very fast, and its parameters are simple to tune. When learning on a noisy image, we let OMP find the sparsest representation of a vector up to an error threshold set according to the noise level. When learning on a noiseless image, we stop OMP after reconstructing a fixed number of components.
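A minimal NumPy sketch of OMP with the error-threshold stopping rule of (5) follows. It assumes unit-norm atoms, and the function name and conventions are ours, not a reference implementation:

```python
import numpy as np

def omp(x, D, eps):
    """Greedy OMP: grow the support with the atom most correlated with the
    residual, re-fit the coefficients by least squares on that support, and
    stop once ||x - D a||_2 <= eps (cf. Eq. (5)). Assumes unit-norm atoms."""
    n, k = D.shape
    support, a = [], np.zeros(k)
    residual = x.astype(float).copy()
    while np.linalg.norm(residual) > eps and len(support) < n:
        j = int(np.argmax(np.abs(D.T @ residual)))   # best-matching atom
        if j in support:
            break            # residual orthogonal to D: nothing left to add
        support.append(j)
        coeffs, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        a[:] = 0.0
        a[support] = coeffs  # least-squares fit on the selected support
        residual = x - D @ a
    return a
```

With an orthonormal dictionary, OMP recovers the exact coefficients of a sparse signal in as many iterations as it has non-zero entries.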

3.2.2. Dictionary update

We consider here that the encoding matrix $A$ and the training image $Y$ are fixed, and we explain how the dictionary $D$ can be updated. The dictionary update consists in finding

$$D \in \arg\min_{D \in \mathcal{D}} \sum_{i,j} \frac{\mu_{i,j}}{2} \|C_{i,j} R_{i,j}(X) - D\alpha_{i,j}\|_2^2, \tag{14}$$

which can be rewritten in matrix form as

$$D \in \arg\min_{D \in \mathcal{D}} \|P - DA\|_F^2 \tag{15}$$

where each column of $P$ contains a patch $C_{i,j} R_{i,j}(X)$. We chose to use the Method of Optimal Directions (MOD), introduced in Engan et al. (1999), which minimizes the mean square error of the residuals. The MOD algorithm uses a single conjugate gradient step and gives the following dictionary update:

$$D = p_{\mathcal{D}} \left( P A^T \left( A A^T \right)^{-1} \right) \tag{16}$$

where $p_{\mathcal{D}}$ is the projection on $\mathcal{D}$ such that for $D_2 = p_{\mathcal{D}}(D_1)$, $d_{2i} = d_{1i} / \|d_{1i}\|_2$ for each atom $d_{2i}$ of $D_2$. The MOD algorithm is simple to implement and fast. An exact minimization is possible with an iterative projected gradient descent (Peyre et al. 2010), but the process is slower and requires precise parameter tuning. Another successful approach, the K-SVD algorithm, updates the atoms of the dictionary one by one, using for the update of a given atom only the patches that significantly use this atom in their sparse decomposition (Aharon et al. 2006).
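The MOD update (16) followed by the projection $p_{\mathcal{D}}$ amounts to one least-squares solve plus an atom-wise renormalization. A sketch, where the pseudo-inverse guard against a singular $A A^T$ is our addition:

```python
import numpy as np

def mod_update(P, A, eps=1e-12):
    """One MOD step, Eq. (16): least-squares fit of the dictionary to the
    centered patches P (one patch per column) given the codes A, followed
    by the projection p_D that renormalizes each atom to unit l2 norm."""
    D = P @ A.T @ np.linalg.pinv(A @ A.T)   # pinv guards against singular A A^T
    norms = np.maximum(np.linalg.norm(D, axis=0), eps)
    return D / norms                        # column-wise (atom-wise) normalization
```

The renormalization enforces the bounded-atom constraint (12) and so prevents the degenerate rescaling $(\nu A, \frac{1}{\nu} D)$ discussed above.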

3.2.3. Image update

When $D$ and $A$ are fixed, the energy (11) is a quadratic function of $X$, minimized by the closed-form solution

$$X = \left( \sum_{i,j} \mu_{i,j} R^*_{i,j} R_{i,j} + \lambda \, \mathrm{Id} \right)^{-1} \left( \sum_{i,j} \mu_{i,j} R^*_{i,j} C^*_{i,j} D \alpha_{i,j} + \lambda Y \right). \tag{17}$$

Updating $X$ with (17) simply consists in applying to each patch the "de-centering" operator $C^*_{i,j}$ and reconstructing the image by averaging the overlapping patches.
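Under simplifying assumptions (uniform weights $\mu_{i,j} = \mu$, patches indexed by their top-left corner, centering already undone), the closed form (17) reduces to pasting the patch estimates back and dividing each pixel by its total weight. This sketch and its conventions are ours:

```python
import numpy as np

def image_update(Y, patches, positions, tau, lam=0.1, mu=1.0):
    """Simplified Eq. (17): accumulate mu-weighted de-centered patch
    estimates plus lam*Y, and divide per pixel by (mu * coverage + lam).
    'patches' are tau*tau vectors flattened column-wise as in Eq. (1);
    'positions' are top-left corners (our convention for this sketch)."""
    num = lam * Y.astype(float)          # lam * Y term of Eq. (17)
    den = lam * np.ones_like(num)        # lam * Id term
    for p, (i, j) in zip(patches, positions):
        num[i:i+tau, j:j+tau] += mu * p.reshape(tau, tau, order='F')
        den[i:i+tau, j:j+tau] += mu      # R*_{i,j} R_{i,j} counts coverage
    return num / den
```

Because $\sum_{i,j} R^*_{i,j} R_{i,j}$ is diagonal (it just counts how many patches cover each pixel), the matrix inverse in (17) is a cheap per-pixel division.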

3.2.4. Algorithm summary

The centered dictionary learning denoising algorithm is summarized in Algorithm 1. It takes as input a noisy image to denoise and an initial dictionary, and iterates the three steps previously described to yield a denoised image, a dictionary, and an encoding matrix.

Algorithm 1 Alternating scheme for centered dic-tionary learning and denoising

Input: noisy image Y ∈ R^{Q×Q}, number of iterations K, assumed noise level σ
Output: sparse coding matrix A, sparsifying dictionary D, denoised image X

Initialize D ∈ R^{n×p} with patches randomly extracted from Y; set α_{i,j} = 0 for all i, j; set X = Y; compute the centering operators (C_{i,j})_{i,j} by locating the maximum pixel of each patch (R_{i,j}(X))_{i,j}.
for k = 1 to K do
    Step 1: Sparse coding — compute the sparse encoding matrix A of (R_{i,j}(X))_{i,j} in D by solving (5) or (8)
    Step 2: Dictionary update — update the dictionary D by solving (15)
    Step 3: Image update — update the denoised image X using (17)
end for

3.3. Parameters

Algorithm 1 requires several parameters. All images are 512 × 512 in our experiments.

Patch size and overlap: We use n = 9 × 9 patches in our experiments and take an overlap of 8 pixels between two consecutive patches. An odd number of pixels is more convenient for patch centering, and this patch size has proven to be a good trade-off between speed and robustness. A high overlap parameter reduces block artifacts.

Dictionary size: We learn a dictionary of p = 2n = 162 atoms. A ratio of 2 between the number of atoms and their dimension makes the dictionary redundant and allows capturing different morphologies, without inducing an unreasonable computational complexity.


Training set size: We extract 80n training patches when learning patches of n pixels. Extracting more training samples allows better capture of the image morphology; while it leads to very similar dictionaries, it allows a slightly sparser representation and slightly better denoising. Reducing the size of the training set might lead to missing some features of the image to learn from, depending on the diversity of the morphologies it contains.

Sparse coding stop criterion: We stop OMP when the sparse coding $x_s$ of a vector $x$ verifies

$$\|x_s - x\|_2 \le C \sigma \sqrt{n} \tag{18}$$

and we use $C = 1.15$ as the gain parameter, similarly to Elad & Aharon (2006). When learning on noiseless images, we stop OMP after it finds the first 3 components of $x_s$.

Training set selection: We do not use every patch available in $Y$, as that would be too computationally costly; instead we select a random subset of patch positions to extract from $Y$. After learning, we perform a single sparse coding step with the learned dictionary on every noisy patch of $Y$, and the results are then averaged using (17).
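The random training-subset selection described above can be sketched as follows; the function name and the top-left-corner convention are ours:

```python
import numpy as np

def extract_training_patches(Y, tau, n_samples, rng=None):
    """Draw a random subset of tau*tau patches from image Y to form the
    training set (the paper uses 80*n samples for n = tau*tau pixels).
    Patches are flattened column by column as in Eq. (1)."""
    rng = np.random.default_rng(rng)
    Q = Y.shape[0]
    rows = rng.integers(0, Q - tau + 1, size=n_samples)  # top-left corners
    cols = rng.integers(0, Q - tau + 1, size=n_samples)
    return np.stack([Y[r:r + tau, c:c + tau].flatten(order='F')
                     for r, c in zip(rows, cols)], axis=1)  # one patch per column
```

The returned matrix has one patch per column, matching the layout of $P$ in Eq. (15).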

4. Application to astronomical imaging

In this section, we report the main results of the experiments we conducted to study the performance of the presented strategy of centered dictionary learning and image denoising in the case of astronomical observations. We performed our tests on several Hubble images and on cosmic string simulations (see Figure 1). Cosmic string maps are not standard astronomical images, but they have the interest of presenting a complex texture and being extremely difficult to detect. Wavelet filtering has been proposed to detect them (Hammond et al. 2009), and it is interesting to investigate whether DL could eventually be an alternative to wavelets for this purpose. It should however be clear that the noise levels we use here are not realistic, and this experiment has to be seen as a toy-model example rather than a scientific cosmic string study, which would require considering the CMB as well as more realistic cosmic string simulations. The three Hubble images are the Pandora's Galaxy Cluster Abell 2744, an ACS image of the 47 Tucanae star field, and a WFC3 UVIS Full Field nebula image. These images contain a mixture of isotropic and linear features, which makes them difficult to process with the classical wavelet- or curvelet-based algorithms.

We study two different cases: one where we perform dictionary learning and image denoising at the same time, and one where the dictionary is learned on a noiseless image and used afterward to denoise a noisy image. We show for these two cases how DL is able to capture the natural features contained in the image, even in the presence of noise, and how it outperforms wavelet-based denoising techniques.

4.1. Joint learning and denoising

We give several examples of astronomical images denoised with the method presented above. For all experiments, we show the noisy image, the learned dictionary, and the denoised images, processed respectively with the wavelet shrinkage and the dictionary learning algorithms. We add white Gaussian noise to a noiseless image, then denoise it using Algorithm 1 and a wavelet shrinkage algorithm, and compare their performance in terms of PSNR. Figure 2 shows the processing of a Hubble image of the Pandora galaxy cluster, Figure 3 shows our results on a star cluster image, and Figure 4 shows our results on a nebula image. CDL proves to be superior to the wavelet-based denoising algorithm in each example. The dictionary learning method is able to capture the morphology of each kind of image and manages to give a good representation of point-like features.

4.2. Separate learning and denoising

We now apply the presented method to cosmic string simulations. We use a second image, similar to the cosmic string simulation from Figure 1, to learn a noiseless dictionary, shown in Figure 5. We add high-level white Gaussian noise to the cosmic string simulation from Figure 1 and compare in Figure 6 how classic DL and wavelet shrinkage denoising perform. We chose not to use CDL because the cosmic string images do not contain stars but rather more textured features. We give in Figure 7 a benchmark of the same process repeated for different noise levels. The PSNR between the denoised and source images is displayed as a function of the PSNR between the noisy and original source images. The reconstruction score is higher for the dictionary learning denoising algorithm than for the wavelet shrinkage algorithm, at any noise level. This shows that the atoms computed during the learning are more sensitive than wavelets to the features contained in the noisy image. The learned dictionary was able to capture the morphology of the training image, which is similar to the morphology of the image to denoise. Hence, the coefficients of the noisy image's decomposition in the learned dictionary are more significant than its coefficients in the wavelet space, which leads to a better denoising.
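To make the separate-learning setting concrete, the sketch below shows how a single patch could be sparse-coded over a pre-learned dictionary with greedy OMP, stopping once the squared residual drops below a noise-dependent error margin. This is a hedged illustration only: the dictionary, patch, and threshold are synthetic stand-ins, not the dictionaries or parameters used in the paper.

```python
import numpy as np

def omp(D, y, err2, max_atoms=None):
    """Greedy orthogonal matching pursuit: sparse-code y over dictionary D
    (unit-norm columns) until the squared residual falls below err2."""
    n_atoms = D.shape[1]
    if max_atoms is None:
        max_atoms = n_atoms
    residual = y.astype(float).copy()
    support = []
    coeffs = np.zeros(0)
    while residual @ residual > err2 and len(support) < max_atoms:
        j = int(np.argmax(np.abs(D.T @ residual)))        # most correlated atom
        if j in support:
            break
        support.append(j)
        sub = D[:, support]
        coeffs, *_ = np.linalg.lstsq(sub, y, rcond=None)  # re-fit on the current support
        residual = y - sub @ coeffs
    x = np.zeros(n_atoms)
    x[support] = coeffs
    return x

# Toy check: with an orthonormal dictionary, a sparse patch is recovered exactly.
D = np.eye(4)                           # placeholder dictionary with unit-norm atoms
patch = np.array([0.0, 3.0, 0.0, 0.5])
code = omp(D, patch, err2=1e-12)
denoised_patch = D @ code
```

In the full pipeline this coding step is applied to every overlapping patch of the noisy image, and the coded patches are averaged back into the image.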


Astronomical Image Denoising Using Dictionary Learning (jstarck.free.fr/DL2012_double.pdf)


Figure 1. Hubble images used for numerical experiments. Figure (a) is Pandora's Cluster Abell 2744, Figure (b) is an ACS image of 47 Tucanae, Figure (c) is an image of the WFC3 UVIS Full Field, and Figure (d) is a cosmic string simulation.

We now show how DL behaves when learning on real, noiseless astronomical images, that is, images that present an extremely low level of noise or that have been denoised by a prior process and can thus be considered noiseless. We give several benchmarks to show how centered dictionary learning is able to outperform the classic approach. We denoise two previously presented images and two additional images shown in Figure 8. We perform the learning step on similar noiseless images, see Figure 9. The benchmark results are presented in Figures 10, 11, 12 and 13. Figure 13 illustrates a particular case where classical dictionary learning becomes less efficient than the wavelet-based denoising algorithm, while using centered learning and denoising yields better results at any noise level. For each benchmark, we added white Gaussian noise with a varying standard deviation to one image and learned a centered dictionary and a non-centered dictionary on a second, similar noiseless image. We use the same set of parameters for both learnings. CDL performs better than the classic DL method and wavelet-based denoising. A consequence of the better sparsifying



Figure 2. Results of denoising on the galaxy cluster image. Figure (a) shows the noisy image, with a PSNR of 26.52 dB. The learned dictionary is shown in Figure (b). Figure (c) shows the result of the wavelet shrinkage algorithm, which reaches a PSNR of 38.92 dB, and Figure (d) shows the result of denoising using the dictionary learned on the noisy image, with a PSNR of 39.35 dB.

capability of the centered dictionary is a faster computation during the sparse coding step. The noiseless dictionaries prove to be efficient at any level of noise.
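The centering operation defined earlier in the paper aims at an approximate translation invariance inside the patches. Purely as an illustration of that idea (one simple reading, not the paper's exact operator), a patch can be circularly shifted so that its brightest pixel sits at the patch center before learning or coding, with the shift undone after reconstruction:

```python
import numpy as np

def center_on_peak(patch):
    """Shift a square patch so its brightest pixel sits at the center;
    return the shifted patch and the applied (row, col) shift."""
    n = patch.shape[0]
    r, c = np.unravel_index(np.argmax(patch), patch.shape)
    shift = (n // 2 - r, n // 2 - c)
    return np.roll(patch, shift, axis=(0, 1)), shift

def undo_centering(patch, shift):
    """Invert the circular shift applied by center_on_peak."""
    return np.roll(patch, (-shift[0], -shift[1]), axis=(0, 1))

patch = np.zeros((8, 8))
patch[1, 6] = 1.0                        # a point source away from the patch center
centered, shift = center_on_peak(patch)  # peak moved to (4, 4)
restored = undo_centering(centered, shift)
```

With all training patches aligned this way, atoms representing point-like sources need not be learned at every possible offset, which is one plausible explanation for the faster sparse coding noted above.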

4.3. Photometry and source detection

Although the final photometry is generally done on the raw data (Pacaud et al. 2006; Nolan et al. 2012), it is important that the denoising does not introduce a strong bias on the flux of the different sources, because that would damp their amplitude and reduce the number of detected sources.

We provide in this section a photometric comparison of the wavelet and dictionary learning denoising algorithms. We use the top left quarter of the nebula image from Figure 4. We run SExtractor (Bertin & Arnouts 1996) using a 3σ detection threshold on the noiseless image, and we store the detected sources with their respective fluxes. We then add white Gaussian noise with a standard deviation of 0.07 to the image, which has a standard deviation of



Figure 3. Results of denoising on the star cluster image. Figure (a) shows the noisy image, with a PSNR of 27.42 dB. The learned dictionary is shown in Figure (b). Figure (c) shows the result of the wavelet shrinkage algorithm, which reaches a PSNR of 37.28 dB, and Figure (d) shows the result of denoising using the dictionary learned on the noisy image, with a PSNR of 37.87 dB.

0.0853 (SNR = 10.43 dB), and use the different algorithms to denoise it. We then run SExtractor on the denoised images, using the source locations stored from the clean image analysis. We show two curves in Figure 14. The first one is the number of sources in the image with a flux above a varying threshold, for the original, wavelet-denoised and CDL-denoised images. The second curve shows how the flux is dampened by the different denoising methods. We also show in Figures 15, 16 and 17 several features after denoising the galaxy cluster

images using the different methods. It appears that centered dictionary learning denoising restores objects with better contrast and less blur, and is more sensitive to small sources. We finally give several benchmarks to show how centered dictionary learning is able to outperform the classic approach.
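The first curve of Figure 14 is straightforward to reproduce once SExtractor has produced a flux catalog. The sketch below counts sources above a varying flux threshold; the fluxes and the uniform 20% flux damping are hypothetical stand-ins for real catalog measurements.

```python
import numpy as np

def sources_above_threshold(fluxes, thresholds):
    """For each threshold, count the detected sources whose flux exceeds it."""
    fluxes = np.asarray(fluxes, dtype=float)
    return np.array([int((fluxes > t).sum()) for t in thresholds])

flux_original = np.array([5.0, 12.0, 40.0, 150.0])  # hypothetical catalog fluxes
flux_denoised = 0.8 * flux_original                 # hypothetical uniform 20% damping
thresholds = np.array([1.0, 10.0, 100.0])
counts_original = sources_above_threshold(flux_original, thresholds)
counts_denoised = sources_above_threshold(flux_denoised, thresholds)
```

A denoising method that damps fluxes shifts this curve downward: sources near a threshold drop below it and are lost at that detection level.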

The learned-dictionary-based techniques show a much better behavior in terms of flux comparison. This is consistent with the aspect of the features shown in Figures 15, 16 and 17. The CDL method



Figure 4. Results of denoising on the nebula image. Figure (a) shows the image used both for learning a noisy dictionary and for denoising, with a PSNR of 26.67 dB. The learned dictionary is shown in Figure (b). Figure (c) shows the result of the wavelet shrinkage algorithm, which reaches a PSNR of 33.61 dB, and Figure (d) shows the result of denoising using the dictionary learned on the noisy image, with a PSNR of 35.24 dB.

induces less blurring of the sources and is more sensitive to point-like features.

5. Software

We provide the MATLAB functions and scripts related to our numerical experiments at the URL http://www.cosmostat.org/software.html.

6. Conclusion

We introduce a new variant of dictionary learning, the centered dictionary learning method, for denoising astronomical observations. Centering the training patches yields an approximate translation invariance inside the patches and leads to significant improvements in terms of global quality as well as photometry and feature restoration. We conduct a comparative study of different dictionary learning and denoising schemes, as well as comparing the


Figure 5. Figure (a) shows a simulated cosmic string map (1"x1"), and Figure (b) shows the learned dictionary.

performance of the adaptive setting to the state of the art in this matter. Dictionary learning appears as a promising paradigm that can be exploited for many tasks. We showed its efficiency in astronomical image denoising and how it surpasses the performance of state-of-the-art denoising algorithms that use non-adaptive dictionaries. The use of dictionary learning requires choosing several parameters, such as the patch size, the number of atoms in the dictionary, and the sparsity imposed during the learning process. These parameters can have a significant impact on the quality of the denoising and on the computational cost of the processing. The patch-based framework also brings additional difficulties, as one has to adapt it to the problem at hand. Some tasks require a more global processing of the image and might call for a more subtle use of the patches than the sliding window used for denoising.

References

Aharon, M., Elad, M., & Bruckstein, A. 2006, IEEE Transactions on Signal Processing, 54, 4311
Bertin, E. & Arnouts, S. 1996, A&AS, 117, 393
Bishop, C. M. 2007, Pattern Recognition and Machine Learning (Information Science and Statistics) (Springer)
Bobin, J., Moudden, Y., Starck, J. L., Fadili, M., & Aghanim, N. 2008, Statistical Methodology, 5, 307-317
Bobin, J., Starck, J.-L., Sureau, F., & Basak, S. 2012, ArXiv e-prints
Chen, S. S., Donoho, D. L., & Saunders, M. A. 1998, SIAM Journal on Scientific Computing, 20, 33
Daubechies, I., Defrise, M., & De Mol, C. 2004, Comm. Pure Appl. Math., 57, 1413
Elad, M. 2010, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing (Springer)
Elad, M. & Aharon, M. 2006, IEEE Transactions on Image Processing, 15, 3736
Elad, M., Milanfar, P., & Rubinstein, R. 2007, Inverse Problems, 23, 947
Engan, K., Aase, S. O., & Hakon Husoy, J. 1999, 2443
Hammond, D. K., Wiaux, Y., & Vandergheynst, P. 2009, MNRAS, 398, 1317
Lin, C. 2007, Neural Computation, 19, 2756
Mairal, J., Bach, F., Ponce, J., & Sapiro, G. 2010, The Journal of Machine Learning Research, 11, 19
Mallat, S. & Zhang, Z. 1993, IEEE Transactions on Signal Processing, 41, 3397
Nolan, P. L., Abdo, A. A., Ackermann, M., et al. 2012, ApJS, 199, 31
Olshausen, B. & Field, D. 1996, Nature, 381, 607
Pacaud, F., Pierre, M., Refregier, A., et al. 2006, MNRAS, 372, 578
Pati, Y. C., Rezaiifar, R., & Krishnaprasad, P. S. 1993, in Proceedings of the 27th Annual Asilomar Conference on Signals, Systems, and Computers, 40-44
Peyre, G., Fadili, J., & Starck, J. L. 2010, SIAM Journal on Imaging Sciences, 3, 646
Rubinstein, R., Peleg, T., & Elad, M. 2012, in ICASSP 2012, Kyoto, Japan
Schmitt, J., Starck, J. L., Casandjian, J. M., Fadili, J., & Grenier, I. 2010, A&A, 517, A26
Starck, J., Murtagh, F., & Fadili, J. 2010a, Sparse Image & Signal Processing: Wavelets, Curvelets, Morphological Diversity (Cambridge University Press)
Starck, J.-L., Candes, E., & Donoho, D. 2003, A&A, 398, 785-800
Starck, J.-L. & Fadili, M. J. 2009, An Overview of Inverse Problem Regularization Using Sparsity
Starck, J.-L. & Murtagh, F. 2006, Astronomical Image and Data Analysis, 2nd edn. (Springer)
Starck, J.-L., Murtagh, F., & Fadili, M. 2010b, Sparse Image and Signal Processing (Cambridge University Press)
Zibulevsky, M. & Pearlmutter, B. A. 1999, Neural Computation, 165



Figure 6. Example of cosmic string simulation denoising at a high noise level, using the learned dictionary from Figure 5 and the wavelet algorithm. Figure (a) is the source image, Figure (b) shows the noisy image with a PSNR of 17.34 dB, Figure (c) shows the wavelet-denoised version with a PSNR of 30.19 dB, and Figure (d) shows the learned-dictionary-denoised version with a PSNR of 31.04 dB.


Figure 7. Benchmark comparing the wavelet shrinkage algorithm to the dictionary learning denoising algorithm at various noise levels, using the dictionary from Figure 5. Each experiment is repeated 100 times and the results are averaged. We use the maximum value of the patch overlapping parameter. The sparse coding uses OMP and is set to reach an error margin of (Cσ)², where σ is the noise standard deviation and C is a gain factor set to 1.15. The wavelet algorithm uses 5 scales of undecimated bi-orthogonal wavelets, with three bands per scale. The red and blue curves correspond respectively to wavelet and learned-dictionary denoising. The horizontal axis is the PSNR between the noisy and source images, and the vertical axis is the PSNR between the denoised and source images.

Figure 8. Images used in the CDL benchmarks. Figure (a) is a panoramic view of a turbulent star-making region, and Figure (b) is an ACS/WFC image of Abell 1689.



Figure 9. Hubble images used for noiseless dictionary learning. Figure (a) is Pandora's Cluster Abell 2744, Figure (b) is a galaxy cluster, Figure (c) is a region in the Large Magellanic Cloud, and Figure (d) is a second image of Pandora's Cluster.


Figure 10. Benchmark for the nebula image from Figure 1, comparing CDL to DL and wavelet denoising methods. (a) shows an extract of the centered dictionary learned on a second, noiseless image and used for denoising. (b) shows the PSNR curves for the three methods: centered dictionary learning in green, classic dictionary learning in blue, and the wavelet-based method in red. The horizontal axis represents the PSNR (dB) between the image before and after adding noise. For denoising, we use OMP with a stopping criterion fixed depending on the level of noise that was added. Each experiment was repeated 100 times per noise value.

Figure 11. Benchmark for the galaxy cluster image from Figure 1, comparing CDL to DL and wavelet denoising methods. (a) shows an extract of the centered dictionary learned on a second, noiseless image and used for denoising. (b) shows the PSNR curves for the three methods: centered dictionary learning in green, classic dictionary learning in blue, and the wavelet-based method in red. The horizontal axis represents the PSNR (dB) between the image before and after adding noise. For denoising, we use OMP with a stopping criterion fixed depending on the level of noise that was added. Each experiment was repeated 100 times per noise value.


Figure 12. Benchmark for the star-making region image from Figure 8, comparing CDL to DL and wavelet denoising methods. (a) shows an extract of the centered dictionary learned on a second, noiseless image and used for denoising. (b) shows the PSNR curves for the three methods: centered dictionary learning in green, classic dictionary learning in blue, and the wavelet-based method in red. The horizontal axis represents the PSNR (dB) between the image before and after adding noise. For denoising, we use OMP with a stopping criterion fixed depending on the level of noise that was added. Each experiment was repeated 100 times per noise value.

Figure 13. Benchmark for the Abell 1689 image from Figure 8, comparing CDL to DL and wavelet denoising methods. (a) shows an extract of the centered dictionary learned on a second, noiseless image and used for denoising. (b) shows the PSNR curves for the three methods: centered dictionary learning in green, classic dictionary learning in blue, and the wavelet-based method in red. The horizontal axis represents the PSNR (dB) between the image before and after adding noise. For denoising, we use OMP with a stopping criterion fixed depending on the level of noise that was added. Each experiment was repeated 100 times per noise value.


Figure 14. Source photometry comparison between CDL and wavelet denoising, for the original, wavelet-denoised (WT) and CDL-denoised images. Figure (a) shows how many sources have a flux above a varying threshold after denoising. Figure (b) shows how the flux is dampened by denoising, plotting the source flux after denoising as a function of the source flux before denoising.


Figure 15. Zoomed features extracted from a galaxy cluster image. (a) shows the full source image before adding noise, (b) shows the noiseless source, (c) shows the noisy version, and (d), (e) and (f) respectively show the denoised feature using wavelet denoising, classic dictionary learning, and centered dictionary learning.



Figure 16. Zoomed features extracted from the previously shown nebula image. (a) shows the full source image before adding noise, (b) shows the noiseless source, (c) shows the noisy version, and (d), (e) and (f) respectively show the denoised feature using wavelet denoising, classic dictionary learning, and centered dictionary learning.



Figure 17. Zoomed features extracted from the previously shown nebula image. (a) shows the full source image before adding noise, (b) shows the noiseless source, (c) shows the noisy version, and (d), (e) and (f) respectively show the denoised feature using wavelet denoising, classic dictionary learning, and centered dictionary learning.
