DATA-DRIVEN TIGHT FRAMES OF KRONECKER TYPE AND
APPLICATIONS TO SEISMIC DATA INTERPOLATION AND
DENOISING∗
LINA LIU†, GERLIND PLONKA‡, AND JIANWEI MA§
Abstract. Seismic data interpolation and denoising plays a key role in seismic data processing. These problems can be understood as sparse inverse problems, where the desired data are assumed to be sparsely representable within a suitable dictionary. In this paper, we present a new data-driven tight frame of Kronecker type (KronTF) method that avoids the vectorization step and considers the multidimensional structure of data in a tensor-product way. It takes advantage of the structure contained in all different modes (dimensions) simultaneously. In order to overcome the limitations of a usual tensor-product approach, we also incorporate data-driven directionality (KronTFD). The complete method is formulated as a sparsity-promoting minimization problem. It includes two main steps. In the first step, a hard thresholding algorithm is used to update the frame coefficients of the data in the dictionary. In the second step, an iterative alternating method is used to update the tight frame (dictionary) in each different mode. The dictionary that is learned in this way contains the principal components in each mode. Furthermore, we apply the proposed tight frames of Kronecker type to seismic interpolation and denoising. Examples with synthetic and real seismic data show that the proposed method achieves better results in comparison with the traditional projection onto convex sets (POCS) method based on the Fourier transform and the previous vectorized data-driven tight frame (DDTF) methods. In particular, the simple structure of the new frame construction makes it essentially more efficient.
Key words. Seismic data, Interpolation and denoising, Dictionary learning, Tight frame
AMS subject classifications. 65T60, 68U10, 86A22
1. Introduction. Seismic data processing is the essential bridge connecting seismic data acquisition and interpretation. Many processes benefit from fully sampled seismic volumes; examples are multiple suppression, migration, amplitude versus offset analysis, and shear wave splitting analysis. However, sampling is often limited by the high cost of acquisition and by obstacles in the field, so that the sampled data contain missing traces. Interpolating 3D data on a regular grid, but with an irregular pattern of missing traces, is particularly delicate: in that case undesired artifacts can arise from improper wave field sampling. Because of its practical importance, interpolation and denoising of irregularly missing seismic data have become an important topic for the seismic data-processing community. Various interpolation methods have been proposed to handle the arising aliasing effects. We especially refer to the f-x domain prediction method by Spitz [24] that is based on predictability of linear events without a priori knowledge of the directions of lateral coherence of the events. This approach can also be extended to 3D regular data.
Seismic data interpolation and data denoising can also be regarded as an inverse problem where we employ sparsity constraints. Using the special structure of seismic data, we want to exploit that it can be represented sparsely in a suitable basis or frame, i.e., the original signal can be well approximated using only a small number of significant frame coefficients. In this paper, we restrict ourselves to finite-dimensional
∗Submitted to the editors July 12, 2016.
†Department of Mathematics, Harbin Institute of Technology, Harbin, China; Institute for Numerical and Applied Mathematics, University of Göttingen, Göttingen, Germany (liulina [email protected])
‡Institute for Numerical and Applied Mathematics, University of Göttingen, Göttingen, Germany ([email protected])
§Department of Mathematics, Harbin Institute of Technology, Harbin, China ([email protected])
This manuscript is for review purposes only.
tight frames. Generalizing the idea of orthogonal transforms, a frame transform is determined by a transform matrix D ∈ C^{M1×N1} with M1 ≥ N1, such that for given data y ∈ C^{N1} the vector u = Dy ∈ C^{M1} contains the frame coefficients. For tight frames, D possesses a left inverse of the form (1/c)D^* = (1/c)conj(D)^T ∈ C^{N1×M1}, i.e., D^*D = cI, where I is the identity operator. The constant c is the frame bound; it can be set to 1 by suitable normalization, and in this case the tight frame is called a Parseval frame.
The idea to employ sparsity of the data in a certain basis or frame has already been used in seismic data processing. A fundamental question here is the choice of the transform D. Transforms can be mainly divided into two categories: fixed (basis or frame) transforms and adaptive learned transforms. Transforms in the first category are data independent and include e.g. the Radon transform [27, 14], Fourier transform [22, 19, 30, 1, 26], curvelet transform [12, 23, 13, 21], shearlet transform [11], and others. In recent years, adaptive learning transform methods came up for data processing. The corresponding transforms are data dependent and therefore usually more expensive. On the other hand, they often achieve essentially better results.
Transforms of adaptive learning type include principal component analysis (PCA) [29] and generalized PCA [28], the method of optimal directions (MOD) [8, 4], the K-SVD method (K-means singular value decomposition) [2], and others. In particular, K-SVD is a very popular tool for training the dictionary. It trains a dictionary from patches of the input image. When all of the patches of the input image are used, K-SVD is actually an adaptive frame learning method in the image space, where elements of the learned adaptive frame are translations of atoms in the patch dictionary. In this case, the adaptive frame transform can be understood as a filtering with filters being the atoms of the learned patch dictionary, or as a concatenation of block Toeplitz matrices in matrix form. However, the filters in the K-SVD frame are highly correlated and highly redundant, and in each step each filter is updated by one SVD, which requires a large computational effort. In order to reduce the computational complexity, [6] proposed a new method, named data-driven tight frame (DDTF), which learns a tight frame in the image space with a set of orthogonal filters. Due to the orthogonality, the DDTF method is able to update the complete set of orthonormal filters by one SVD in one iteration step, in contrast to K-SVD, which employs an SVD to update each filter consecutively. Therefore, the DDTF method is significantly faster than K-SVD, while achieving similar results in terms of image quality. The DDTF method has been applied to interpolation and denoising of two-dimensional and higher-dimensional seismic data [18, 31].
Fixed dictionary methods enjoy high efficiency, while adaptive learning dictionary methods can make use of the information of the data itself. By combining the seislets [9] and DDTF, a double sparsity method was proposed for seismic data processing [7], which can benefit from both fixed and learned dictionary methods. However, the vast majority of existing dictionary learning methods vectorize the patches and deal with vectors, and therefore do not fully exploit the data structure, i.e., the spatial correlation of the data pixels. To overcome this problem, Kronecker-based DDTFs have been applied to dynamic computed tomography [25, 32], treating the patches as tensors instead of vectors and applying learned filters with tensor structure. Tensor decompositions have also been used for seismic interpolation [16].
Generalizing these ideas, in this paper we present a data-driven tight frame construction of Kronecker type (KronTF) that inherits the simple tensor-product structure of the filters to ensure fast computation also for 3D tensors. This approach is
further generalized by incorporating adaptive directionality (KronTFD). The separability of the learned filters makes the resulting dictionaries highly scalable. The orthonormality among the filters leads to very efficient sparse coding computation, as each sub-problem encountered during the iterations for solving the resulting optimization problem has a simple closed-form solution. These two characteristics, i.e., computational efficiency and scalability, make the proposed method very suitable for processing tensor data. The main contributions of our work include the following aspects: 1) To avoid the vectorization step of the previous DDTF method, we unfold the seismic data in each mode and use one SVD of small size per iteration to learn the filters of the tight frame. Therefore, the learned tight frame contains structures of the data sets in the different modes. 2) Further, we introduce a simple method to incorporate significant directions of the data structure into the filters. 3) Finally, the proposed new data-driven dictionaries KronTF and KronTFD are applied to seismic data interpolation and denoising.
This paper is organized as follows. We first introduce the notation borrowed from tensor algebra and the definitions pertaining to tensors that are used throughout our paper. In Section 2, we shortly recall the basic idea of constructing a data-driven frame and present an iterative scheme that is based on alternating updates of the frame coefficients for sparse representation of the given training data on the one hand and of the data-dependent filters of the tight frame on the other hand. Section 3 is devoted to the construction of a new data-driven frame that is based on the tensor-product structure, and is therefore essentially cheaper to learn. The employed idea can be simply transferred to the case of 3-tensors. In order to make the construction more flexible, we extend the data-driven Kronecker frame construction by also incorporating directionality in Section 4. The two data-driven tight frames are applied to data interpolation/reconstruction and to denoising in Section 5, and the corresponding numerical tests for synthetic 2D and 3D data and real 2D and 3D data are presented in Section 6. The paper finishes with a short conclusion on the achievements in this paper and further open problems.
1.1. Notations. In this paper, we denote vectors by lowercase letters and matrices by uppercase letters. For x = (x_i)_{i=1}^N ∈ C^N let ‖x‖_2 := (∑_{i=1}^N |x_i|^2)^{1/2} be the Euclidean norm, and for X = (x_{i,j})_{i=1,j=1}^{N1,N2} ∈ C^{N1×N2} we denote by ‖X‖_F = (∑_{i=1}^{N1} ∑_{j=1}^{N2} |x_{i,j}|^2)^{1/2} the Frobenius norm.
For a third order tensor X = (x_{i,j,k})_{i=1,j=1,k=1}^{N1,N2,N3} ∈ C^{N1×N2×N3}, we introduce the three matrix unfoldings that involve the tensor dimensions N1, N2 and N3 in a cyclic way,

X_(1) := ( (x_{i,j,1})_{i=1,j=1}^{N1,N2}, (x_{i,j,2})_{i=1,j=1}^{N1,N2}, ..., (x_{i,j,N3})_{i=1,j=1}^{N1,N2} ) ∈ C^{N1×N2N3},
X_(2) := ( (x_{1,j,k})_{j=1,k=1}^{N2,N3}, (x_{2,j,k})_{j=1,k=1}^{N2,N3}, ..., (x_{N1,j,k})_{j=1,k=1}^{N2,N3} ) ∈ C^{N2×N1N3},
X_(3) := ( (x_{i,1,k})_{k=1,i=1}^{N3,N1}, (x_{i,2,k})_{k=1,i=1}^{N3,N1}, ..., (x_{i,N2,k})_{k=1,i=1}^{N3,N1} ) ∈ C^{N3×N1N2}.
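The three cyclic unfoldings above can be sketched in NumPy as follows; the helper `unfold` is our own name, and the index bookkeeping follows the slice-concatenation convention of the displayed definitions:

```python
import numpy as np

# Cyclic matrix unfoldings X_(1), X_(2), X_(3) of a third order tensor,
# built by placing the corresponding slices side by side.
def unfold(X, mode):
    N1, N2, N3 = X.shape
    if mode == 1:   # frontal slices (x_{i,j,k})_{i,j}, k fixed -> N1 x (N2 N3)
        return X.transpose(0, 2, 1).reshape(N1, N3 * N2)
    if mode == 2:   # horizontal slices (x_{i,j,k})_{j,k}, i fixed -> N2 x (N1 N3)
        return X.transpose(1, 0, 2).reshape(N2, N1 * N3)
    if mode == 3:   # lateral slices (x_{i,j,k})_{k,i}, j fixed -> N3 x (N1 N2)
        return X.transpose(2, 1, 0).reshape(N3, N2 * N1)
    raise ValueError("mode must be 1, 2, or 3")

X = np.arange(2 * 3 * 4).reshape(2, 3, 4)
assert unfold(X, 1).shape == (2, 12)
assert unfold(X, 2).shape == (3, 8)
assert unfold(X, 3).shape == (4, 6)
assert unfold(X, 1)[1, 2 * 3 + 0] == X[1, 0, 2]   # block k=2, entry (i=1, j=0)
```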
The multiplication of a tensor X with a matrix is defined by employing the ν-mode product, ν = 1, 2, 3, as defined in [17]. For X ∈ C^{N1×N2×N3} and matrices V^(1) = (v^(1)_{i',i})_{i'=1,i=1}^{M1,N1} ∈ C^{M1×N1}, V^(2) = (v^(2)_{j',j})_{j'=1,j=1}^{M2,N2} ∈ C^{M2×N2}, and V^(3) = (v^(3)_{k',k})_{k'=1,k=1}^{M3,N3} ∈ C^{M3×N3} we define

X ×_1 V^(1) := ( ∑_{i=1}^{N1} x_{i,j,k} v^(1)_{i',i} )_{i'=1,j=1,k=1}^{M1,N2,N3} ∈ C^{M1×N2×N3},
X ×_2 V^(2) := ( ∑_{j=1}^{N2} x_{i,j,k} v^(2)_{j',j} )_{i=1,j'=1,k=1}^{N1,M2,N3} ∈ C^{N1×M2×N3},
X ×_3 V^(3) := ( ∑_{k=1}^{N3} x_{i,j,k} v^(3)_{k',k} )_{i=1,j=1,k'=1}^{N1,N2,M3} ∈ C^{N1×N2×M3}.
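The ν-mode products displayed above are plain index contractions and can be written compactly with `np.einsum`; `mode_product` is our own helper name:

```python
import numpy as np

# nu-mode product X x_nu V: contract the nu-th index of X against the
# second index of V, exactly as in the defining sums above.
def mode_product(X, V, mode):
    if mode == 1:
        return np.einsum('ijk,ai->ajk', X, V)   # sum over i
    if mode == 2:
        return np.einsum('ijk,bj->ibk', X, V)   # sum over j
    if mode == 3:
        return np.einsum('ijk,ck->ijc', X, V)   # sum over k
    raise ValueError("mode must be 1, 2, or 3")

X = np.random.default_rng(0).standard_normal((4, 5, 6))
V1 = np.random.default_rng(1).standard_normal((3, 4))   # V^(1) in C^{M1 x N1}
Y = mode_product(X, V1, 1)
assert Y.shape == (3, 5, 6)
# entrywise check against the defining sum
assert np.isclose(Y[2, 1, 4], sum(X[i, 1, 4] * V1[2, i] for i in range(4)))
```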
Generalizing the usual singular value decomposition for matrices, it has been shown in [17] that every (N1 × N2 × N3)-tensor X can be written as a product

X = S ×_1 V^(1) ×_2 V^(2) ×_3 V^(3)

with unitary matrices V^(1) ∈ C^{N1×N1}, V^(2) ∈ C^{N2×N2}, and V^(3) ∈ C^{N3×N3} and with a sparse core tensor S ∈ C^{N1×N2×N3}. By unfolding, this factorization can be represented as

(1)  X_(1) = V^(1) S_(1) (V^(3) ⊗ V^(2))^T,
     X_(2) = V^(2) S_(2) (V^(1) ⊗ V^(3))^T,
     X_(3) = V^(3) S_(3) (V^(2) ⊗ V^(1))^T,

where the S_(ν) denote the corresponding ν-mode unfoldings of the tensor S. Further, ⊗ denotes the Kronecker product, see [15], which for two matrices A = (a_{i,j})_{i=1,j=1}^{M,N} ∈ C^{M×N} and B ∈ C^{L×K} is defined as

A ⊗ B = (a_{i,j} B)_{i=1,j=1}^{M,N} ∈ C^{ML×NK}.

In order to compute a generalized SVD for the tensor X one may first consider the SVDs of the unfolded matrices X_(1), X_(2) and X_(3). Interpreting the equations in (1) as SVD decompositions, we obtain the ν-mode orthogonal matrices V^(ν), ν = 1, 2, 3, directly from the left singular matrices of these SVDs, see [17].
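This recipe (factor matrices from the left singular vectors of the unfoldings, core tensor by mode products with the adjoints) can be sketched as follows; the helper names are ours:

```python
import numpy as np

# Higher-order SVD sketch: V^(nu) = left singular vectors of the mode-nu
# unfolding; core S = X x_1 V1^* x_2 V2^* x_3 V3^*, and X is recovered
# as S x_1 V1 x_2 V2 x_3 V3.
def unfold(X, mode):
    perm = {1: (0, 2, 1), 2: (1, 0, 2), 3: (2, 1, 0)}[mode]
    Xp = X.transpose(perm)
    return Xp.reshape(Xp.shape[0], -1)

rng = np.random.default_rng(7)
X = rng.standard_normal((3, 4, 5))
V = [np.linalg.svd(unfold(X, nu), full_matrices=True)[0] for nu in (1, 2, 3)]

# core tensor S = X x_1 V1^* x_2 V2^* x_3 V3^*
S = np.einsum('ijk,ia,jb,kc->abc', X, V[0].conj(), V[1].conj(), V[2].conj())
# reconstruction X = S x_1 V1 x_2 V2 x_3 V3
X_rec = np.einsum('abc,ia,jb,kc->ijk', S, V[0], V[1], V[2])
assert np.allclose(X, X_rec)
```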
2. Data-driven tight frames. In this section, we describe the construction of DDTF for two-dimensional data or images, as proposed in [6, 31].
Generalizing the concept of unitary matrices, we say that a matrix D ∈ C^{M1×N1} represents a (Parseval) tight frame of C^{N1} if D^*D = I_{N1}, i.e., D^* = conj(D)^T is the Moore-Penrose inverse of D. Obviously, each vector a ∈ C^{N1} can be represented as a linear combination of the M1 columns of D^*, i.e., there exists a vector c ∈ C^{M1} with a = D^*c, and for M1 > N1 the coefficient vector is usually no longer uniquely defined. One canonical choice for c is c = Da, since obviously D^*c = D^*Da = a is true. In frame theory, D^* is called the synthesis operator while D is the analysis operator. In recent years, many frame transforms have been developed for signal and image processing, e.g. wavelet framelets, curvelet frames, shearlet frames etc. In [6] a DDTF construction has been proposed for image denoising, where the analysis operator D is composed of filtering with a set of orthogonal filters.
Let X be an input image. We now aim at finding a tight frame transform D under which X has a sparse representation. It is impossible to achieve this goal without imposing additional structure on D, as we have only one input image for training. In [6], D is assumed to be a filtering by a set of 2-D filters D_i for i = 1, ..., K. It is proved in [6] that D is a tight frame if the filters D_i for i = 1, ..., K are orthogonal and K = S^2, where the supports of the filters are all S × S. Then, the DDTF can be constructed by solving

(1)  min_{D_i ∈ C^{S×S}, C ∈ C^{M1×K}} ∑_{i=1}^K ‖D_i * X − C_i‖_F^2 + λ ∑_{i=1}^K ‖C_i‖_?  s.t. ⟨D_i, D_j⟩ = δ_{ij}/K, ∀ 1 ≤ i, j ≤ K,

where ‖C_i‖_? denotes either ‖C_i‖_0, the number of nonzero entries of C_i, or ‖C_i‖_1, the sum of all absolute values of entries of C_i. The parameter λ > 0 balances the approximation term and the sparsity term.
The DDTF construction (1) can be reformulated as orthogonal dictionary learning in the patch space [3]. Let Y_1, ..., Y_K be patches of X that are vectorized to y_1, ..., y_K ∈ C^{N1}. Define D = [d_1, d_2, ..., d_K], where d_i is the vectorization of D_i. Then (1) is equivalent to

(2)  min_{D ∈ C^{M1×N1}, C ∈ C^{M1×K}} ‖DY − C‖_F^2 + λ‖C‖_?  s.t. D^*D = I_{N1}.

Observe that in the above model not only C, the set of sparse vectors c_k representing y_k in the dictionary domain, but also the filter matrix D has to be learned, where the side condition D^*D = I_{N1} ensures that D is a Parseval tight frame. To solve this optimization problem, we employ an alternating iterative scheme that in turn updates the coefficient matrix C and the frame matrix D, where we start with a suitable initial frame D = D_0.
Step 1. For a fixed frame D ∈ C^{M1×N1} we have to solve the problem

(3)  min_{C ∈ C^{M1×K}} ‖DY − C‖_F^2 + λ‖C‖_?.

For ‖·‖_1 we obtain the solution C = S_{λ/2}(DY), applied componentwise, with the soft threshold operator

(4)  S_{λ/2}(x) = { (sign x)(|x| − λ/2)  for |x| > λ/2,
                    0                     for |x| ≤ λ/2.

For ‖·‖_0 we obtain C = T_γ(DY), applied componentwise, with the hard threshold operator

T_γ(x) = { x  for |x| > γ,
           0  for |x| ≤ γ,

where the threshold parameter γ depends on λ and on the data DY.
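The two componentwise threshold operators of Step 1 can be sketched directly (the function names are our own):

```python
import numpy as np

# Soft threshold S_{lam/2} and hard threshold T_gamma, applied componentwise.
# Written as magnitude shrinkage so the same code covers complex input,
# where sign(x) generalizes to x/|x|.
def soft_threshold(x, lam):
    mag = np.abs(x)
    scale = np.where(mag > lam / 2, 1 - (lam / 2) / np.maximum(mag, 1e-300), 0.0)
    return scale * x

def hard_threshold(x, gamma):
    return np.where(np.abs(x) > gamma, x, 0.0)

x = np.array([-2.0, -0.3, 0.0, 0.8, 3.0])
assert np.allclose(soft_threshold(x, 1.0), [-1.5, 0.0, 0.0, 0.3, 2.5])
assert np.allclose(hard_threshold(x, 1.0), [-2.0, 0.0, 0.0, 0.0, 3.0])
```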
Step 2. For given C we now want to update the frame matrix D by solving the optimization problem

(5)  min_{D ∈ C^{M1×N1}} ‖DY − C‖_F^2  s.t. D^*D = I_{N1}.

Similarly as in [33] we can show
Theorem 1. The optimization problem (5) is equivalent to the problem

max_{D ∈ C^{M1×N1}} Re(tr(DYC^*))  s.t. D^*D = I_{N1},

where tr denotes the trace of a matrix, i.e., the sum of its diagonal entries, and Re the real part of a complex number. If YC^* has full rank N1, this optimization problem is solved by D_opt = V_{N1} U^*, where UΛV^* denotes the singular value decomposition of YC^* ∈ C^{N1×M1} with unitary matrices U ∈ C^{N1×N1}, V ∈ C^{M1×M1}, and V_{N1} ∈ C^{M1×N1} is the restriction of V to its first N1 columns.
Proof. We give the short proof for the complex case that is not contained in [33]. The objective function in (5) can be rewritten as

‖DY − C‖_F^2 = tr(((DY)^* − C^*)(DY − C))
             = tr(Y^*D^*DY + C^*C − C^*DY − Y^*D^*C)
             = tr(Y^*Y) + tr(C^*C) − 2 Re(tr(DYC^*)),

where we have used that tr(DYC^*) = tr(C^*DY) and tr(Y^*D^*C) = conj(tr(C^*DY)). Thus (5) is equivalent to

max_{D ∈ C^{M1×N1}} Re(tr(DYC^*))  s.t. D^*D = I_{N1}.

Let now the singular value decomposition of YC^* be given by

YC^* = UΛV^*

with unitary matrices U ∈ C^{N1×N1}, V ∈ C^{M1×M1}, and a diagonal matrix of (real nonnegative) singular values Λ = (diag(λ_1, ..., λ_{N1}), 0) ∈ R^{N1×M1}. Choosing D_opt := V_{N1} U^*, where V_{N1} is the restriction of V to its first N1 columns, we simply observe that

tr(D_opt YC^*) = tr(V_{N1} U^* UΛV^*) = tr(V_{N1} ΛV^*) = tr(V^* V_{N1} Λ) = tr Λ_{N1} ∈ R,

where Λ_{N1} = diag(λ_1, ..., λ_{N1}) is the restriction of Λ to its first N1 columns. Moreover,

(D_opt)^* D_opt = U V_{N1}^* V_{N1} U^* = I_{N1}.

We show now that the value tr Λ_{N1} is indeed maximal. Let D ∈ C^{M1×N1} be an arbitrary matrix satisfying D^*D = I_{N1}. Let U again be the unitary matrix from the SVD YC^* = UΛV^*, and let W_{N1} := DU. Then the constraint implies that W_{N1}^* W_{N1} = U^* D^* D U = I_{N1}. Thus

tr(DYC^*) = tr(W_{N1} U^* UΛV^*) = tr(W_{N1} ΛV^*) = tr(V^* W_{N1} Λ).

Since V is unitary and since W_{N1} contains N1 orthonormal columns of length M1, also V^* W_{N1} consists of N1 orthonormal columns. In particular, each diagonal entry of V^* W_{N1} has modulus smaller than or equal to 1. Thus,

Re(tr(V^* W_{N1} Λ)) ≤ tr Λ_{N1},

and equality only occurs if W_{N1} = V_{N1}, i.e., if D = D_opt.
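The closed-form update of Theorem 1 is straightforward to verify numerically; the following sketch (with our own random test data) builds D_opt from the SVD of YC^* and checks both feasibility and optimality against another feasible candidate:

```python
import numpy as np

# Frame update of Theorem 1: D_opt = V_{N1} U^* from the SVD Y C^* = U Lambda V^*.
rng = np.random.default_rng(3)
N1, M1, K = 5, 8, 40
Y = rng.standard_normal((N1, K))
C = rng.standard_normal((M1, K))

U, _, Vh = np.linalg.svd(Y @ C.conj().T, full_matrices=True)  # Y C^* = U Lam V^*
D_opt = Vh.conj().T[:, :N1] @ U.conj().T                      # V_{N1} U^*
assert np.allclose(D_opt.conj().T @ D_opt, np.eye(N1))        # D^* D = I_{N1}

def objective(D):                                             # ||D Y - C||_F^2
    return np.linalg.norm(D @ Y - C) ** 2

# any other feasible D (here: random orthonormal columns) does no better
Q, _ = np.linalg.qr(rng.standard_normal((M1, N1)))
assert objective(D_opt) <= objective(Q) + 1e-10
```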
3. Data-driven Kronecker tight frame. In the patch space, the DDTF transforms the patches in C^{n1×n2} to vectors of length N1 = n1·n2. However, the vectorization ignores the spatial correlation of the patch pixels and hence does not fully exploit the data structure. Moreover, the number of filters becomes huge as the dimension increases. To overcome these problems, we present the data-driven Kronecker tight frame, where each 2-D filter is a Kronecker product (tensor product) of 1-D vectors.
Similarly to the DDTF construction, which is equivalent to finding an orthogonal transform in the patch space, the data-driven Kronecker tight frame construction can be reformulated as learning a tensor-product structured orthogonal transform from the patches. We omit the details. A usual (tensor-product) two-dimensional orthogonal transform for a given patch Y ∈ C^{n1×n2} is of the form

(6)  C = D_1 Y D_2^T,
where D_1 ∈ C^{m1×n1}, D_2 ∈ C^{m2×n2} are orthogonal matrices. Many existing orthogonal transforms have this structure, such as the DCT, DFT, and discrete wavelet transform. Employing the properties of the Kronecker tensor product, we obtain by vectorization of (6)

c := vec(C) = vec(D_1 Y D_2^T) = (D_2 ⊗ D_1) vec(Y) = (D_2 ⊗ D_1) y,

where the operator vec reshapes the matrix into a vector, and y := vec(Y).
Without vectorizing the patches, and by imposing the tensor-product structure of the orthogonal transform, Eq. (2) in Section 2 is revised to

(7)  min_{D_1, D_2, C_1, ..., C_K} ∑_{k=1}^K ( ‖D_1 Y_k D_2^T − C_k‖_F^2 + λ‖C_k‖_? )  s.t. D_ν^* D_ν = I_{nν}, ν = 1, 2,

where Y_1, ..., Y_K ∈ C^{n1×n2} are patches of the input image X, and C_1, ..., C_K ∈ C^{m1×m2} are the corresponding coefficient matrices.
Equivalently, using the notations y_k := vec(Y_k) ∈ C^{n1·n2}, c_k := vec(C_k) ∈ C^{m1·m2} as well as Y := (y_1, ..., y_K) ∈ C^{n1n2×K} and C := (c_1, ..., c_K) ∈ C^{m1m2×K}, we can write the model in the form

(8)  min_{D_1, D_2, C} ‖(D_2 ⊗ D_1)Y − C‖_F^2 + λ‖C‖_?  s.t. D_ν^* D_ν = I_{nν}, ν = 1, 2.
Here, as before, ‖·‖_? denotes either ‖·‖_0 or ‖·‖_1. Due to the orthogonality, Proposition 3 in [6] implies that the filtering with filters being columns of D_2 ⊗ D_1 forms a tight frame transform in the image space. Observe that, compared to (2), the matrix (D_2 ⊗ D_1) ∈ C^{m1m2×n1n2} of filters now has a Kronecker structure that we did not impose before. Thus, learning the filter matrix (D_2 ⊗ D_1) requires determining only n1m1 + n2m2 instead of n1n2m1m2 components. Of course, we still need to capture the important structures of the images in a sparse manner.
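Both the vec identity behind (8) and the parameter-count comparison can be checked numerically. Note that the paper's vec stacks matrix columns, i.e. column-major ('F') order in NumPy:

```python
import numpy as np

# Check of vec(D1 Y D2^T) = (D2 (x) D1) vec(Y), with vec = column stacking.
rng = np.random.default_rng(5)
n1, n2, m1, m2 = 4, 3, 6, 5
Y = rng.standard_normal((n1, n2))
D1 = rng.standard_normal((m1, n1))
D2 = rng.standard_normal((m2, n2))

lhs = (D1 @ Y @ D2.T).flatten(order='F')
rhs = np.kron(D2, D1) @ Y.flatten(order='F')
assert np.allclose(lhs, rhs)

# the Kronecker factor pair needs only n1*m1 + n2*m2 parameters,
# versus n1*n2*m1*m2 for an unstructured filter matrix
assert D1.size + D2.size == n1 * m1 + n2 * m2
assert np.kron(D2, D1).size == n1 * n2 * m1 * m2
```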
In order to solve the minimization problem (7) resp. (8), we adopt the alternating minimization scheme proposed in the last section. Now, each iteration step consists of three independent steps:
Step 1. First, for fixed matrices D_1, D_2, we minimize only with respect to C by applying either a hard or a soft threshold, analogously as in Step 1 for the model (2).
Step 2. We fix C and D_2, and minimize (7) resp. (8) with respect to D_1. For this purpose, we rewrite (7) as

(9)  min_{D_1 ∈ C^{m1×n1}} ∑_{k=1}^K ‖D_1 Y_k D_2^T − C_k‖_F^2  s.t. D_1^* D_1 = I_{n1}.
As in (5), this problem is equivalent to

max_{D_1 ∈ C^{m1×n1}} Re( tr( ∑_{k=1}^K D_1 (Y_k D_2^T) C_k^* ) )  s.t. D_1^* D_1 = I_{n1}.

We take the singular value decomposition

∑_{k=1}^K Y_k D_2^T C_k^* = U_1 Λ_1 V_1^*

and obtain unitary matrices U_1 ∈ C^{n1×n1}, V_1 ∈ C^{m1×m1} and a diagonal matrix of singular values Λ_1 = (diag(λ^(1)_1, ..., λ^(1)_{n1}), 0) ∈ R^{n1×m1}. Now, similarly as shown in Theorem 1, the optimal dictionary matrix is obtained by D_{1,opt} = V_{1,n1} U_1^*, where V_{1,n1} denotes the restriction of V_1 to its first n1 columns.
Since ∑_{k=1}^K Y_k D_2^T C_k^* is a matrix of size n1 × m1, we only need to apply a singular value decomposition of this size here to obtain an update for D_1.
Step 3. Analogously, in the third step we fix C and D_1 and minimize (7) resp. (8) with respect to D_2. Here, we observe that

min_{D_2 ∈ C^{m2×n2}} ∑_{k=1}^K ‖D_1 Y_k D_2^T − C_k‖_F^2  s.t. D_2^* D_2 = I_{n2}

is equivalent to

min_{D_2 ∈ C^{m2×n2}} ∑_{k=1}^K ‖D_2 Y_k^T D_1^T − C_k^T‖_F^2  s.t. D_2^* D_2 = I_{n2}.

The SVD

∑_{k=1}^K Y_k^T D_1^T C_k = U_2 Λ_2 V_2^*

with unitary matrices U_2 ∈ C^{n2×n2}, V_2 ∈ C^{m2×m2}, and Λ_2 = (diag(λ^(2)_1, ..., λ^(2)_{n2}), 0) ∈ R^{n2×m2} yields the update

D_2 = V_{2,n2} U_2^*,

where V_{2,n2} again denotes the restriction of V_2 to its first n2 columns.
We outline the pseudo code for learning the tight frame with Kronecker structure (KronTF) in the following Algorithm 1.
Algorithm 1 KronTF Algorithm
Input: Image X, number of iterations T
Take patches of X, denoted by Y_1, Y_2, ..., Y_K ∈ C^{n1×n2}.
Initialize the filter matrices D_1 ∈ C^{m1×n1} and D_2 ∈ C^{m2×n2} with D_1^* D_1 = I_{n1}, D_2^* D_2 = I_{n2}.
for t = 1, 2, ..., T do
  Use hard/soft thresholding to update the coefficient matrices C = (C_1, ..., C_K) as given in Step 1
  for ν = 1 to 2 do
    Use the SVD method to update the filter matrix D_ν as given in Steps 2 and 3
  end for
end for
return D_1, D_2

This approach to construct a data-driven tight frame can now easily be extended to third order tensors. Using tensor notation (for 2-tensors), the product in (6) reads

C = Y ×_1 D_1 ×_2 D_2.

Generalizing the concept above to 3-tensors, we want to find matrices D_ν ∈ C^{mν×nν}, ν = 1, 2, 3, with m_ν ≥ n_ν and D_ν^* D_ν = I_{nν}, ν = 1, 2, 3, such that for the tensor patches Y_k ∈ C^{n1×n2×n3}, k = 1, ..., K, of a given big tensor X, the core tensors

S_k = Y_k ×_1 D_1 ×_2 D_2 ×_3 D_3 ∈ C^{m1×m2×m3}
are simultaneously sparse. This is done by solving the minimization problem

(10)  min_{D_1, D_2, D_3, S_1, ..., S_K} ∑_{k=1}^K ( ‖Y_k ×_1 D_1 ×_2 D_2 ×_3 D_3 − S_k‖_F^2 + λ‖S_k‖_? )  s.t. D_ν^* D_ν = I_{nν}, ν = 1, 2, 3.

Here, the Frobenius norm of X is defined by ‖X‖_F := (∑_{i=1}^{n1} ∑_{j=1}^{n2} ∑_{k=1}^{n3} |x_{i,j,k}|^2)^{1/2}, and ‖S_k‖_? denotes in the case ? = 0 the number of nonzero entries of S_k, and for ? = 1 the sum of the moduli of all entries of S_k. In the space of the input tensor X, we are training a tight frame whose filters are tensor products of three sets of orthogonal 1-D filters.
The minimization problem (10) can be solved in four steps at each iteration level. In step 1, for fixed D_1, D_2, D_3, one minimizes with respect to S_k, k = 1, ..., K, by applying a threshold procedure as before. In step 2, for fixed D_2, D_3 and S_k, k = 1, ..., K, we use the unfolding

(S_k)_(1) = D_1 (Y_k)_(1) (D_3 ⊗ D_2)^T

and have to solve

min_{D_1 ∈ C^{m1×n1}} ∑_{k=1}^K ‖(S_k)_(1) − D_1 (Y_k)_(1) (D_3 ⊗ D_2)^T‖_F^2  s.t. D_1^* D_1 = I_{n1}.

This problem has exactly the same structure as (9) and is solved by choosing D_1 = V_{1,n1} U_1^*, where U_1 Λ_1 V_1^* is the singular value decomposition of the (n1 × m1)-matrix

∑_{k=1}^K (Y_k)_(1) (D_3 ⊗ D_2)^T (S_k)_(1)^*.

Analogously, we find in step 3 and step 4 the updates D_2 = V_{2,n2} U_2^* and D_3 = V_{3,n3} U_3^* from the singular value decompositions

∑_{k=1}^K (Y_k)_(2) (D_1 ⊗ D_3)^T (S_k)_(2)^* = U_2 Λ_2 V_2^*
resp.

∑_{k=1}^K (Y_k)_(3) (D_2 ⊗ D_1)^T (S_k)_(3)^* = U_3 Λ_3 V_3^*.
Remark 3.1. 1. This dictionary learning approach requires three SVDs at each iteration step, but only for matrices of moderate size n_ν × m_ν, ν = 1, 2, 3.
2. One may also connect the two approaches considered in Section 2 and in Section 3 by e.g. enforcing a tensor product structure of the filters only in the third direction while employing a more general filter for the first and second direction. In this case we can use the unfolding

(S_k)_(3) = D_3 (Y_k)_(3) D^T,

where the dictionary (D_2 ⊗ D_1) is replaced by a matrix D ∈ C^{m1m2×n1n2} that does not necessarily have the Kronecker product structure. The dictionary learning procedure is then applied as for two-dimensional tensors.
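The 2-D alternating scheme of Algorithm 1 can be sketched compactly as follows. This is a simplified sketch under our own choices (hard thresholding at a fixed level γ, random orthonormal initialization, patches stored as a 3-D array), not the exact implementation of the paper:

```python
import numpy as np

def hard(x, gamma):
    # hard threshold T_gamma, applied componentwise
    return np.where(np.abs(x) > gamma, x, 0.0)

def krontf(patches, m1, m2, gamma=0.5, iters=10, seed=0):
    K, n1, n2 = patches.shape
    rng = np.random.default_rng(seed)
    D1 = np.linalg.qr(rng.standard_normal((m1, n1)))[0]   # D1^* D1 = I_{n1}
    D2 = np.linalg.qr(rng.standard_normal((m2, n2)))[0]   # D2^* D2 = I_{n2}
    for _ in range(iters):
        # Step 1: threshold the coefficients C_k = D1 Y_k D2^T
        C = np.stack([hard(D1 @ Y @ D2.T, gamma) for Y in patches])
        # Step 2: update D1 from the SVD of sum_k Y_k D2^T C_k^*
        M = sum(Y @ D2.T @ Ck.conj().T for Y, Ck in zip(patches, C))
        U, _, Vh = np.linalg.svd(M, full_matrices=True)
        D1 = Vh.conj().T[:, :n1] @ U.conj().T             # V_{1,n1} U_1^*
        # Step 3: update D2 from the SVD of sum_k Y_k^T D1^T C_k
        M = sum(Y.T @ D1.T @ Ck for Y, Ck in zip(patches, C))
        U, _, Vh = np.linalg.svd(M, full_matrices=True)
        D2 = Vh.conj().T[:, :n2] @ U.conj().T             # V_{2,n2} U_2^*
    return D1, D2

patches = np.random.default_rng(1).standard_normal((20, 4, 4))
D1, D2 = krontf(patches, m1=6, m2=6)
assert np.allclose(D1.conj().T @ D1, np.eye(4))
assert np.allclose(D2.conj().T @ D2, np.eye(4))
```

Each SVD here is of size n_ν × m_ν only, which is the computational advantage over the vectorized DDTF update.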
4. Directional DDTF with Kronecker structure. Though the data-driven Kronecker tight frame considered in Section 3 has a simple structure and is therefore fast to implement, the Kronecker structure is limited in representing directional features, as is usual for tensor-product approaches. Therefore, we now propose a data-driven frame construction whose filters contain both a Kronecker structure and a directional structure. We first focus on the 2-D case.
It is well-known that tensor-product filters are especially well suited for representing vertical or horizontal features in an image. Our idea is now to incorporate other favorite directions by mimicking rotation of the image. While an exact rotation is not possible if we want to stay on the original grid, we apply the following simple procedure.
Let V : C^{n1} → C^{n1} be the cyclic shift operator, i.e., for x = (x_j)_{j=0}^{n1−1} we have

V x := (x_{(j+1) mod n1})_{j=0}^{n1−1}.

For a given image X = (x_0, ..., x_{n2−1}) ∈ C^{n1×n2} with columns x_0, ..., x_{n2−1} in C^{n1}, we consider e.g. the new image

X_0 = (x_0, V x_1, V^2 x_2, ..., V^{n2−1} x_{n2−1}),

where the j-th column is cyclically shifted by j steps. This procedure for example yields

X = ( 0 1 2 2        X_0 = ( 0 1 1 1
      3 1 1 2               3 3 3 2
      3 3 1 1               3 3 2 2
      2 3 3 1 ),            2 1 1 1 ),

such that diagonal structures of X turn into horizontal structures of X_0. Vectorizing the image X into x = vec(X), it follows that

x_0 = vec X_0 = diag(V^j)_{j=0}^{n2−1} x = blockdiag(I, V, V^2, ..., V^{n2−1}) x,
Fig. 1. Illustration of different angles in the range [0, π/4].
where diag(V^j)_{j=0}^{n2−1} ∈ C^{n1n2×n1n2} denotes the block diagonal matrix with blocks V^j. Other rotations of X can be mimicked, for example, by multiplying x = vec X with

diag(V^⌊jℓ/n1⌋)_{j=0}^{n2−1},  ℓ = 1, ..., n1,

for capturing the directions in [−π/4, 0], and by

diag(V^{−⌊jℓ/n1⌋})_{j=0}^{n2−1},  ℓ = −n1 + 1, −n1 + 2, ..., 0,

for capturing directions in [0, π/4]. This range is sufficient in order to bring the essential edges into horizontal or vertical form. If a priori information about favorite directions (or other structures) in the images is available, we may suitably adapt the method to transfer this structure into a linear vertical or horizontal structure. Possible column shifts in the data matrices mimicking directions are illustrated in Figure 1.
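The column-shift "pseudo-rotation" above is a one-liner with `np.roll`; the following sketch (with our own helper name `shear`) reproduces the 4 × 4 example X → X_0 given earlier:

```python
import numpy as np

# Pseudo-rotation by cyclic column shifts: column j of X is shifted
# cyclically by floor(j * alpha) positions, so that for alpha = 1 the
# diagonal structures of X become horizontal.
def shear(X, alpha=1.0):
    cols = [np.roll(X[:, j], -int(np.floor(j * alpha))) for j in range(X.shape[1])]
    return np.stack(cols, axis=1)

X = np.array([[0, 1, 2, 2],
              [3, 1, 1, 2],
              [3, 3, 1, 1],
              [2, 3, 3, 1]])
X0 = np.array([[0, 1, 1, 1],
               [3, 3, 3, 2],
               [3, 3, 2, 2],
               [2, 1, 1, 1]])
assert np.array_equal(shear(X, 1.0), X0)   # matches the worked example
```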
Using this idea to incorporate directionality, we generalize the model (7) resp. (8) for filter learning as follows. Instead of considering only one set of orthogonal Kronecker filters D_2 ⊗ D_1, we employ R sets of orthogonal Kronecker filters

(11)  ( (D_2 ⊗ D_1) diag(V^⌊jα_1⌋)_{j=0}^{n2−1}
        ...
        (D_2 ⊗ D_1) diag(V^⌊jα_R⌋)_{j=0}^{n2−1} ) ∈ C^{Rm1m2×n1n2}

with R constants α_1, ..., α_R ∈ {−1 + 1/n1, −1 + 2/n1, ..., (n1−1)/n1, 1} capturing R favorite directions. Then the model reads

(12)  min_{D_1, D_2, C_1, ..., C_R, α_1, ..., α_R} ∑_{r=1}^R ( ‖(D_2 ⊗ D_1) diag(V^⌊jα_r⌋)_{j=0}^{n2−1} Y − C_r‖_F^2 + λ‖C_r‖_? )  s.t. D_ν^* D_ν = I_{nν}, ν = 1, 2,
where C_r ∈ C^{m1m2×K} contains the block of transform coefficients for the r-th direction, and where ‖·‖_? denotes either the 1-norm or the 0-subnorm as before. Similarly to [6, Proposition 3], one can prove that the filtering with filters being rows of (11) forms a tight frame transform for the input image. In other words, through (12), we are finding a data-driven tight frame with filters containing both Kronecker and directional
structures. Compared to (8), the filter matrix is now composed of a Kronecker matrix (D_2 ⊗ D_1) ∈ C^{m1m2×n1n2} and block diagonal matrices diag(V^⌊jα_r⌋)_{j=0}^{n2−1}, r = 1, ..., R, capturing different directions of the image features. In particular, learning the filter matrix (11) requires determining only the n1m1 + n2m2 components of D_1, D_2 and R constants α_r to fix the directions.
The minimization problem in (12) can again be solved by alternating minimization, where we determine the favorite directions already in a preprocessing step.
Preprocessing step. For fixed matrices D_1 and D_2 we solve the problem

min_{C_{αr}} ‖(D_2 ⊗ D_1) diag(V^⌊jα_r⌋)_{j=0}^{n2−1} Y − C_{αr}‖_F^2 + λ‖C_{αr}‖_?

for each direction α_r from a predetermined set of possible directions by applying either a hard or a soft threshold operator to the transformed patches Y = (y_1, ..., y_K). We emphasize that computing diag(V^⌊jα⌋)_{j=0}^{n2−1} Y for some fixed α ∈ (−1, 1] is very cheap; it just means a cyclic shifting of columns in the matrices Y_k, such that (D_2 ⊗ D_1) diag(V^⌊jα⌋)_{j=0}^{n2−1} Y corresponds to applying the two-dimensional dictionary transform to the images that are obtained from Y_k, k = 1, ..., K, by taking cyclic shifts.
We a priori fix the number R of considered directions (in practice often just R = 1 or R = 2) and find the R favorite directions, e.g. by comparing the energies of C_{αr} for fixed thresholds or by comparing the PSNR values of the reconstructions of Y after thresholding.
Once the directions α_1, ..., α_R are fixed, we start the iteration process as before.
Step 1. At each iteration level, we first minimize with respect to C_{αr}, r = 1, ..., R, while the complete set of filters (directions and D_1, D_2) is fixed. This is done by applying the thresholding procedure. This step is not necessary at the first level, since we can use the C_{αr} obtained in the preprocessing step.
Step 2. For fixed directions α1, …, αR, corresponding Cαr, r = 1, …, R, and fixed D2, we consider the minimization problem for D1. We recall that Cαr consists of K columns, where the k-th column is the vector of transform coefficients of Yk. We reshape

Cαr ∈ C^{m1m2×K} ⇒ (Cαr,1, …, Cαr,K) ∈ C^{m1×m2K}

and

(diag V^⌊jαr⌋)_{j=0}^{n2−1} Y ∈ C^{n1n2×K} ⇒ (Yαr,1, …, Yαr,K) ∈ C^{n1×n2K},

where the k-th column of Cαr of length m1m2 is reshaped back to an (m1 × m2)-matrix Cαr,k by inverting the vec operator, and analogously, (diag V^⌊jαr⌋)_{j=0}^{n2−1} yk ∈ C^{n1n2}, the k-th column of the transformed data, is reshaped to a matrix Yαr,k ∈ C^{n1×n2}. Now we have to solve

min_{D1 ∈ C^{m1×n1}} Σ_{r=1}^{R} Σ_{k=1}^{K} ‖D1 Yαr,k D2^T − Cαr,k‖_F^2   s.t.   D1* D1 = I_{n1}

with a similar structure as in (9). As before, we apply the singular value decomposition

Σ_{r=1}^{R} Σ_{k=1}^{K} Yαr,k D2^T Cαr,k* = U1 Λ1 V1*

with unitary matrices U1 ∈ C^{n1×n1}, V1 ∈ C^{m1×m1} and a diagonal matrix of singular values Λ1 = (diag(λ1^{(1)}, …, λ_{n1}^{(1)}), 0) ∈ R^{n1×m1}. Now, as shown in Theorem 1, the optimal dictionary matrix is obtained by D1,opt = V1,n1 U1*, where V1,n1 denotes the restriction of V1 to its first n1 columns.
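The Step 2 update can be sketched as follows, assuming the Procrustes-type solution of Theorem 1; all matrix sizes and the random training data are illustrative:

```python
import numpy as np

# Sketch of the Step 2 filter update: D1 = V_{n1} U* from the SVD of
# sum_{k} Y_k D2^T C_k^*. Sizes and data are illustrative assumptions.
rng = np.random.default_rng(3)
m1, n1, m2, n2, K = 6, 4, 5, 3, 10
D2 = np.linalg.qr(rng.standard_normal((m2, n2)))[0]      # D2* D2 = I_{n2}
Ys = [rng.standard_normal((n1, n2)) for _ in range(K)]    # (sheared) patches
Cs = [rng.standard_normal((m1, m2)) for _ in range(K)]    # thresholded coefficients

M = sum(Y @ D2.T @ C.conj().T for Y, C in zip(Ys, Cs))    # (n1 x m1) matrix
U, s, Vh = np.linalg.svd(M)                               # M = U diag(s) Vh
V_n1 = Vh.conj().T[:, :n1]                                # first n1 columns of V
D1 = V_n1 @ U.conj().T                                    # optimal (m1 x n1) filter matrix
# The update preserves the orthonormality constraint D1* D1 = I_{n1}.
assert np.allclose(D1.conj().T @ D1, np.eye(n1), atol=1e-8)
```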
Step 3. For fixed directions α1, …, αR, corresponding Cαr, r = 1, …, R, and fixed D1, we consider the minimization problem for D2. With the same notations as above, we can write

min_{D2 ∈ C^{m2×n2}} Σ_{r=1}^{R} Σ_{k=1}^{K} ‖D2 Yαr,k^T D1^T − Cαr,k^T‖_F^2   s.t.   D2* D2 = I_{n2}

and obtain the optimal solution D2,opt = V2,n2 U2* from the singular value decomposition

Σ_{r=1}^{R} Σ_{k=1}^{K} Yαr,k^T D1^T (Cαr,k^T)* = U2 Λ2 V2*,
where V2,n2 denotes the restriction of V2 to its first n2 columns. We outline the pseudocode for learning the tight frame with Kronecker structure and one optimal direction in Algorithm 2.
Algorithm 2 KronTFD Algorithm
Input: Training set of data, number of iterations T
Let Y1, Y2, …, YK ∈ C^{n1×n2} be patches of X.
Initialize the filter matrices D1 ∈ C^{m1×n1} and D2 ∈ C^{m2×n2} with D1* D1 = I_{n1}, D2* D2 = I_{n2}.
First: find the optimal angle direction of the patches.
for k = 1, 2, …, K do
  for angle = −45, −40, …, 45 do
    Adjust Yk by cyclic shifting of columns by the angle.
    Apply soft thresholding to update the coefficient matrix C = (C1, …, CK).
    Reconstruct the image by the tight frame synthesis operator and record the achieved SNR value.
  end for
  The largest SNR value yields the optimal angle direction of the training data.
  Adjust Yk by cyclic shifting of columns by the optimal angle.
end for
Second: learn the filters.
for t = 1, 2, …, T do
  Apply hard/soft thresholding to update the coefficient matrix C = (C1, …, CK).
  for n = 1 to 2 do
    Use the SVD to update the filter matrix Dn as given in Steps 2 and 3.
  end for
end for
return D1, D2, optimal angle
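The second stage of Algorithm 2 can be sketched compactly as follows; the direction search is omitted, and the threshold value, patch sizes, and the choice of hard thresholding are illustrative assumptions:

```python
import numpy as np

# Compact sketch of the KronTFD filter-learning loop (second stage of
# Algorithm 2). Threshold, sizes, and random data are illustrative.
def hard_threshold(C, lam):
    return np.where(np.abs(C) > lam, C, 0.0)

def procrustes_factor(M, n):
    """Solve min ||D A - B||_F over D with D* D = I_n via the SVD of M = A B*."""
    U, _, Vh = np.linalg.svd(M)
    return Vh.conj().T[:, :n] @ U.conj().T

rng = np.random.default_rng(4)
m1 = n1 = m2 = n2 = 4                                    # square filters for simplicity
K, T, lam = 20, 5, 0.1
Ys = [rng.standard_normal((n1, n2)) for _ in range(K)]   # (sheared) training patches
D1 = np.linalg.qr(rng.standard_normal((m1, n1)))[0]
D2 = np.linalg.qr(rng.standard_normal((m2, n2)))[0]

for _ in range(T):
    # Step 1: update coefficients by thresholding the analysis transform.
    Cs = [hard_threshold(D1 @ Y @ D2.T, lam) for Y in Ys]
    # Step 2: update D1 for fixed D2 and coefficients.
    D1 = procrustes_factor(sum(Y @ D2.T @ C.conj().T for Y, C in zip(Ys, Cs)), n1)
    # Step 3: update D2 for fixed D1 and coefficients.
    D2 = procrustes_factor(sum(Y.T @ D1.T @ C.conj() for Y, C in zip(Ys, Cs)), n2)

assert np.allclose(D1.conj().T @ D1, np.eye(n1), atol=1e-8)
assert np.allclose(D2.conj().T @ D2, np.eye(n2), atol=1e-8)
```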
5. Application to seismic data reconstruction and denoising. We want to apply the new data-driven tight frame constructions to 2D and 3D data reconstruction (resp. data interpolation) and to data denoising. Let X denote the complete correct data, and let Y be the observed data. We assume that these data are connected by the following relation

(13) Y = A ◦ X + γξ.

Here A ◦ X denotes the pointwise product (Hadamard product) of the two matrices A and X, ξ denotes an array of normalized Gaussian noise with expectation 0, and γ ≥ 0 determines the noise level. The matrix A contains only the entries 1 or 0 and is called the trace sampling operator. If γ = 0, then the above relation models a seismic interpolation problem, and the task is to reconstruct the missing data. If A = I1, where I1 is the matrix containing only ones, and γ > 0, it models a denoising problem. The two problems can be solved by a sparsity-promoting minimization method, see e.g. [6].
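The observation model (13) can be sketched as follows; mask density, data sizes, and the choice γ = 0 (pure interpolation) are illustrative:

```python
import numpy as np

# Sketch of observation model (13): Y = A o X + gamma*xi, where A is a
# 0/1 trace-sampling mask and xi is white Gaussian noise.
rng = np.random.default_rng(5)
M, N, gamma = 64, 32, 0.0                      # gamma = 0: interpolation setting
X = rng.standard_normal((M, N))                # "complete" data
A = np.zeros((M, N))
kept = rng.choice(N, size=N // 2, replace=False)
A[:, kept] = 1.0                               # keep 50% of the traces (columns)
xi = rng.standard_normal((M, N))
Y = A * X + gamma * xi                         # Hadamard product + noise

assert np.allclose(Y[:, kept], X[:, kept])     # observed traces are untouched
assert np.allclose(Y[A == 0], 0.0)             # missing traces are zeroed
```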
We assume that our desired data X can be sparsely represented by the data-driven tight frame that has been learned beforehand, either by KronTF with Kronecker product filters presented in Section 3 or by KronTFD with directional Kronecker product filters presented in Section 4. We use D to denote the analysis operator of the learned tight frame.
In the geophysical community, alternating projection algorithms such as POCS (projection onto convex sets) are very popular. The approach in [1] for Fourier bases (instead of frames) can be interpreted as follows. We may formulate the interpolation problem as a feasibility problem. We look for a solution X of (13) that on the one hand satisfies the interpolation condition A ◦ X = Y, i.e., is contained in the set of all data

M := {Z : A ◦ Z = Y}

possessing the observed data Y. This constraint can be enforced by applying the projection operator onto M,

PM X = (I1 − A) ◦ X + Y,

which leaves the unobserved data unchanged and projects the observed traces to Y.
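A minimal sketch of the projector PM onto the set M (mask and sizes are illustrative):

```python
import numpy as np

# Projector onto M = {Z : A o Z = Y}: overwrite the observed traces with Y
# and keep the current estimate on the unobserved entries.
def project_M(X, A, Y):
    return (1.0 - A) * X + Y           # I1 - A selects the unobserved entries

rng = np.random.default_rng(6)
M, N = 16, 8
X_true = rng.standard_normal((M, N))
A = (rng.random((M, N)) < 0.5).astype(float)   # random 0/1 sampling mask
Y = A * X_true                                  # noiseless observations
X_est = rng.standard_normal((M, N))             # arbitrary current iterate

P = project_M(X_est, A, Y)
assert np.allclose(A * P, Y)                    # interpolation constraint holds
assert np.allclose(P[A == 0], X_est[A == 0])    # unobserved entries unchanged
```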
On the other hand, we want to ensure that the solution X has a sparse representation in the constructed data-driven frame. The sparsity constraint is enforced by applying a (soft) thresholding to the transformed data, i.e., we compute

PDλ X := D* Sλ D X,

where D denotes the tight frame analysis operator, and Sλ is the soft threshold operator as in (4). We observe, however, that PDλ is no longer a projector, and therefore this approach already generalizes the usual alternating projection method.
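A minimal sketch of PDλ, where the learned dictionary is replaced, as an assumption, by a generic tight frame built from two orthonormal bases:

```python
import numpy as np

# Sketch of P_{D,lambda} X = D* S_lambda(D X) for a tight frame analysis
# operator with D* D = I. As an assumption, D stacks two orthonormal bases
# scaled by 1/sqrt(2), which yields a (non-orthonormal) tight frame.
def soft(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

rng = np.random.default_rng(7)
n = 16
Q1 = np.linalg.qr(rng.standard_normal((n, n)))[0]
Q2 = np.linalg.qr(rng.standard_normal((n, n)))[0]
D = np.vstack([Q1.T, Q2.T]) / np.sqrt(2.0)      # analysis operator, D.T @ D = I
assert np.allclose(D.T @ D, np.eye(n), atol=1e-8)

X = rng.standard_normal((n, 5))
PX = D.T @ soft(D @ X, 0.2)                     # threshold in the frame domain
assert PX.shape == X.shape
```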
The complete iteration scheme is obtained by alternating application of PM and PDλk,

(14) Xk+1 = PM(PDλk Xk) = (I1 − A) ◦ (PDλk Xk) + Y,

where λk is a threshold value that can vary at each iteration step. We recall that all elements of the matrix I1 are one. To show convergence of this scheme in the finite-dimensional setting, one can transfer ideas from [20], where an iterative projection
scheme had been applied to a phase retrieval problem incorporating sparsity in a shearlet frame.
In the context of image inpainting, (14) was proposed in [5] with the framelet transform, and convergence was also proved there.
To improve the convergence of this iteration scheme in numerical experiments, we adopt the following exponentially decreasing thresholding parameters, see [10],

λk = λmax e^{b(k−1)}, k = 1, 2, …, iter,

where λ1 = λmax represents the maximum parameter, λiter = λmin the minimum parameter, and b is chosen as b = (−1/(iter − 1)) ln(λmax/λmin), where iter is the fixed number of iterations in the scheme.
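The iteration (14) with this threshold schedule can be sketched as follows; a generic orthonormal transform stands in for the learned KronTF dictionary, and all sizes are illustrative:

```python
import numpy as np

# Sketch of iteration (14) with the exponentially decreasing thresholds
# of [10]; an orthonormal "frame" D stands in for the learned dictionary.
def soft(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

rng = np.random.default_rng(8)
n, iters, lam_max, lam_min = 16, 30, 1.0, 1e-3
b = -np.log(lam_max / lam_min) / (iters - 1)
lams = lam_max * np.exp(b * np.arange(iters))   # lam_1 = lam_max, ..., lam_iter = lam_min
assert np.isclose(lams[0], lam_max) and np.isclose(lams[-1], lam_min)

D = np.linalg.qr(rng.standard_normal((n, n)))[0].T   # analysis operator
X_true = rng.standard_normal((n, n))
A = (rng.random((n, n)) < 0.5).astype(float)
Y = A * X_true
X = Y.copy()
for lam in lams:
    X = (1.0 - A) * (D.T @ soft(D @ X, lam)) + Y     # X_{k+1} = P_M(P_{D,lam_k} X_k)

# By construction every iterate satisfies the interpolation constraint.
assert np.allclose(A * X, Y)
```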
For data denoising, we only apply the simple one-step (non-iterative) soft thresholding procedure given by PDλ X, where λ is the threshold parameter related to the noise level γ.
6. Numerical Simulations. In a first illustration, we show the results of the dictionary learning method, where we have used (64 × 64) blocks of the seismic image in Figure 5(a). In Figure 2(b) we present the learned dictionary using the DDTF method. In comparison, the dictionaries learned by KronTF are shown in Figures 2(c) and (d). For the DDTF method, a spline wavelet is applied as the initial filter, and the filter size is set to 7 × 7. The KronTF method uses the Fourier basis of size 64 × 64 as the initial filter.
Figures 2(e) and (f) show how well the data can be sparsified by the learned filters compared to the initial filters. DDTF performs better here than KronTF, but requires a much higher computational effort for dictionary learning. The comparison of time costs is illustrated in Figure 3. Here, we also show the comparison to K-SVD, which is even more expensive since it incorporates many SVDs, see Remark 2.3.
We use the new data-driven tight frames of Kronecker type (KronTF) and the data-driven directional frames of Kronecker type (KronTFD) for interpolation and denoising of 2D and 3D seismic data, and compare the performance with the results of the POCS algorithm based on the Fourier transform [1], curvelet frames, and the data-driven tight frame (DDTF) method [18].
For the POCS method, seismic data are cropped into overlapping patches to suppress the artifacts resulting from the global Fourier transform. For the DDTF method, again a spline wavelet is applied as the initial dictionary, and the filter size is set to 7 × 7, see [6].
The quality of the reconstructed images is compared using the PSNR (peak signal-to-noise ratio) value, given by

(15) PSNR = 10 log10( (max X − min X)^2 / ( (1/(MN)) Σ_{i,j} (X_{i,j} − X̂_{i,j})^2 ) ),

where X ∈ C^{M×N} denotes the original seismic data and X̂ ∈ C^{M×N} is the recovered seismic data.
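A sketch of this PSNR measure, assuming a base-10 logarithm and the squared dynamic range (max X − min X)^2 in the numerator (the standard convention):

```python
import numpy as np

# Sketch of PSNR (15): dynamic range of the reference data as peak value,
# mean squared error in the denominator, base-10 logarithm assumed.
def psnr(X, X_rec):
    peak = X.max() - X.min()
    mse = np.mean(np.abs(X - X_rec) ** 2)
    return 10.0 * np.log10(peak**2 / mse)

X = np.array([[0.0, 1.0], [1.0, 0.0]])
X_rec = X + 0.1                 # constant error of 0.1, so MSE = 0.01
# peak = 1, so PSNR = 10*log10(1/0.01) = 20 dB.
assert np.isclose(psnr(X, X_rec), 20.0)
```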
In a first test we consider synthetic data, see Figure 4. Figures 4(a) and (b) show the original data and the sub-sampled data with 50% of the traces randomly missing. We compare the reconstructions (interpolation) using the DDTF method in Figure 4(c), the KronTF method in Figure 4(d), and the KronTFD method in Figure 4(e).
In the next example, we consider real seismic data. In Figures 5 and 6 we present the interpolation results using different reconstruction methods. Figures 5(c)-(f) show the interpolation results by the POCS method and the curvelet transform as
well as the difference images. In Figure 6, we show the corresponding interpolation results for DDTF, KronTF, and KronTFD, together with the errors between the recovered and original data.
For a further comparison of the recovery results, we compare a single trace in Figure 7. In Table 1, we list the reconstruction results obtained from incomplete data with different sampling ratios.
Table 1
PSNR comparison of five methods for different sampling ratios.

Sampling  0.2      0.3      0.4      0.5      0.6      0.7      0.8
KronTFD   35.1327  37.3572  38.8862  41.2564  43.0243  44.0132  44.8996
KronTF    34.6198  36.8759  38.3167  40.6312  42.4675  43.8262  44.5342
DDTF      33.2942  35.1486  36.9543  39.0385  40.1064  42.3052  42.5668
Curvelet  31.1674  34.6478  35.6347  37.6377  39.4190  40.8930  41.5727
POCS      27.5122  31.8804  35.5560  38.2300  39.9878  41.0084  41.6816
Fig. 2. (a) Initial filters. (b) Learned filters via DDTF. (c) The matrix D1 learned via KronTF. (d) The matrix D2 learned via KronTF. (e), (f) Absolute values of frame coefficients using KronTF and DDTF, respectively. Solid lines denote absolute values of frame coefficients using the initial filters.
This manuscript is for review purposes only.
18 L. LIU, G. PLONKA, AND J. MA
[Figure 3: (a) training time (s) vs. data size for DDTF and KronTF; (b) training time (s) vs. data size for KSVD and DDTF.]
Fig. 3. Comparison of time costs for filter learning.
Figures 8 and 9 show denoising results for the data in Figure 8(b), where the noise level is 20. We compare the results of POCS, the curvelet transform, DDTF, and our new frames KronTF and KronTFD, respectively. With our new data-driven frames, we achieve better results than with the other methods, because the dictionary learning methods capture the information contained in the special seismic data. For better comparison, we also display single trace comparisons in Figure 10. In Table 2, we list the achieved PSNR values for different noise levels. KronTF and KronTFD achieve competitive results both for interpolation and denoising.
Furthermore, we test the interpolation of 3D data with 50% randomly missing traces. In Figure 11, we applied the KronTF and KronTFD tensor techniques to synthetic 3D seismic data (of size 178 × 178 × 128). Here, 8 × 8 × 8 patches are used for training. The results of the DDTF method for 3D data are shown in Figure 11(c). In Figures 11(d) and (e) we present the results obtained by the KronTF and KronTFD methods, respectively. Single trace comparisons are shown in Figure 11(f).
Finally, Figures 12 and 13 show an interpolation comparison of the DDTF method and the KronTF method for real 3D marine data.
Table 2
PSNR comparison of five methods for different noise levels.

Noise     5        10       15       20       25       30       35
KronTFD   42.4425  39.2846  37.0246  36.1475  35.2537  34.5327  33.8631
KronTF    41.5282  38.5876  36.4967  35.7451  34.7496  33.9365  33.2156
DDTF      41.4918  38.4231  36.2578  35.0778  33.7533  32.5935  31.5993
Curvelet  40.7478  38.0749  35.8603  34.4654  33.2531  32.0124  30.6583
POCS      40.6547  36.7751  34.1781  32.0267  30.8375  28.6220  27.9847
7. Conclusion. This paper aims at exploiting sparse representations of seismic data for interpolation and denoising. We have proposed a new method to construct data-driven tensor-product tight frames, which learns separable dictionaries. In order to enlarge the flexibility of the filters, we have also proposed a simple method to incorporate preferred local directions that are learned from the data. In the numerical experiments, we employed the new dictionaries to improve both seismic data interpolation and denoising.
The main advantage of the proposed tight frame construction is its computational efficiency, which is due to the imposed tensor-product structure. The tight frame learning method is applied here to rather small blocks of data, such that local features are well recovered. However, the tensor-based method has its limitations: the algorithm needs to store each unfolded-mode matrix used to learn the dictionary, so more storage space is required. In the future, we will extend the idea to learn more global features (e.g., by applying a multiscale approach) and incorporate other direction and slope detection methods for the KronTFD. Furthermore, we are interested in applications to five-dimensional seismic data, where the efficiency of the method is of even greater importance.
Fig. 4. Interpolation results using DDTF and the new KronTF methods for synthetic data with an irregular sampling ratio of 0.5. (a) Original seismic data. (b) Seismic data with 50% of traces missing. (c) Interpolation by the DDTF method. (d) Interpolation by the KronTF method. (e) Interpolation by the KronTFD method. (f) Single trace comparisons of the reconstructions with the original data.
Fig. 5. Interpolation results on real data with an irregular sampling ratio of 0.5. (a) Original seismic data. (b) Seismic data with 50% of traces missing. (c) Interpolation using the POCS method. (d) Difference between (c) and (a). (e) Interpolation using the curvelet method. (f) Difference between (e) and (a).
Fig. 6. Interpolation results (continued) for Figure 5(b). (a) Interpolation using DDTF. (b) Difference between (a) and Figure 5(a). (c) Interpolation using KronTF. (d) Difference between (c) and Figure 5(a). (e) Interpolation using KronTFD. (f) Difference between (e) and Figure 5(a).
Fig. 7. (a)-(e) Single trace comparisons of the five interpolation results in Figures 5 and 6 with the original data as shown in Figure 5(a).
Fig. 8. Denoising results on real data with a noise level of 20. (a) Original seismic data. (b) Noisy data. (c) Denoising result by the POCS method. (d) Difference between (c) and (a). (e) Denoising result by the curvelet method. (f) Difference between (e) and (a).
Fig. 9. Denoising results for the data in Figure 8(b). (a) Denoising by the DDTF method. (b) Difference between (a) and 8(a). (c) Denoising by the KronTF method. (d) Difference between (c) and 8(a). (e) Denoising by the KronTFD method. (f) Difference between (e) and 8(a).
Fig. 10. Single trace comparisons for the recovery results in Figures 8 and 9 with the original data in Figure 8(a). Here we have (a) the POCS method, (b) the curvelet method, (c)-(e) the data-driven frames DDTF, KronTF, and KronTFD.
Fig. 11. Interpolation of synthetic 3D seismic data of size 178 × 178 × 128. (a) Original data. (b) Data with 50% randomly missing traces. (c)-(e) Interpolation by DDTF, KronTF, and KronTFD. (f) Single trace comparisons of the reconstructions with the original data.
Fig. 12. Interpolation of real 3D marine data of size 251 × 401 × 50. (a) Original data. (b) Data with 50% randomly missing traces. (c)-(g) Interpolation by the POCS method, the curvelet method, DDTF, KronTF, and KronTFD.
Fig. 13. Single trace comparisons for the recovery results in Figure 12 with the original data in Figure 12(a). Here we have (a) the POCS method, (b) the curvelet method, (c)-(e) the data-driven frames DDTF, KronTF, and KronTFD.
Acknowledgments. The authors thank Jian-Feng Cai from the Hong Kong University of Science and Technology for insightful technical discussions and helpful edits on DDTF. This work is supported by NSFC (grant numbers 91330108, 41374121, 61327013) and the Fundamental Research Funds for the Central Universities (grant number HIT.PIRS.A201501).
REFERENCES

[1] R. Abma and N. Kabir, 3D interpolation of irregular data with a POCS algorithm, Geophysics, 71 (2006), pp. E91–E97, doi:10.1190/1.2148139.
[2] M. Aharon, M. Elad, and A. Bruckstein, The K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation, IEEE Transactions on Signal Processing, 54 (2006), pp. 4311–4322, doi:10.1109/TSP.2006.881199.
[3] C. Bao, J.-F. Cai, and H. Ji, Fast sparsity-based orthogonal dictionary learning for image restoration, in Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 3384–3391.
[4] S. Beckouche and J. Ma, Simultaneous dictionary learning and denoising for seismic data, Geophysics, 79 (2014), pp. A27–A31, doi:10.1190/geo2013-0382.1.
[5] J.-F. Cai, R. H. Chan, and Z. Shen, A framelet-based image inpainting algorithm, Applied and Computational Harmonic Analysis, 24 (2008), pp. 131–149.
[6] J.-F. Cai, H. Ji, Z. Shen, and G.-B. Ye, Data-driven tight frame construction and image denoising, Applied and Computational Harmonic Analysis, 37 (2014), pp. 89–105, doi:10.1016/j.acha.2013.10.001.
[7] Y. Chen, J. Ma, and S. Fomel, Double sparsity dictionary for seismic noise attenuation, Geophysics, 81 (2016), pp. V103–V116, doi:10.3997/2214-4609.201413416.
[8] K. Engan, S. O. Aase, and J. H. Husøy, Method of optimal directions for frame design, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 5 (1999), pp. 2443–2446, doi:10.1109/ICASSP.1999.760624.
[9] S. Fomel and Y. Liu, Seislet transform and seislet frame, Geophysics, 75 (2010), pp. V25–V38, doi:10.1190/1.3380591.
[10] J. J. Gao, X. H. Chen, J. Y. Li, G. C. Liu, and J. Ma, Irregular seismic data reconstruction based on exponential threshold model of POCS method, Applied Geophysics, 7 (2010), pp. 229–238, doi:10.1007/s11770-010-0246-5.
[11] S. Hauser and J. Ma, Seismic data reconstruction via shearlet-regularized directional inpainting, 2012.
[12] G. Hennenfent and F. J. Herrmann, Simply denoise: Wavefield reconstruction via jittered undersampling, Geophysics, 73 (2008), pp. V19–V28, doi:10.1190/1.2841038.
[13] F. J. Herrmann and G. Hennenfent, Non-parametric seismic data recovery with curvelet frames, Geophysical Journal International, 173 (2008), pp. 233–248, doi:10.1111/j.1365-246X.2007.03698.x.
[14] M. M. N. Kabir and D. J. Verschuur, Restoration of missing offsets by parabolic Radon transform, Geophysical Prospecting, 43 (1995), pp. 347–368, doi:10.1111/j.1365-2478.1995.tb00257.x.
[15] T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Review, 51 (2009), pp. 455–500, doi:10.1137/07070111X.
[16] N. Kreimer and M. D. Sacchi, A tensor higher-order singular value decomposition for prestack seismic data noise reduction and interpolation, Geophysics, 77 (2012), pp. V113–V122, doi:10.1190/geo2011-0399.1.
[17] L. De Lathauwer, B. De Moor, and J. Vandewalle, A multilinear singular value decomposition, SIAM Journal on Matrix Analysis and Applications, 21 (2000), pp. 1253–1278, doi:10.1137/S0895479896305696.
[18] J. Liang, J. Ma, and X. Zhang, Seismic data restoration via data-driven tight frame, Geophysics, 79 (2014), pp. V65–V74, doi:10.1190/geo2013-0252.1.
[19] B. Liu and M. D. Sacchi, Minimum weighted norm interpolation of seismic records, Geophysics, 69 (2004), pp. 1560–1568, doi:10.1190/1.1836829.
[20] S. Loock and G. Plonka, Phase retrieval for Fresnel measurements using a shearlet sparsity constraint, Inverse Problems, 30 (2014), p. 055005.
[21] M. Naghizadeh and M. D. Sacchi, Beyond alias hierarchical scale curvelet interpolation of regularly and irregularly sampled seismic data, Geophysics, 75 (2010), pp. WB189–WB202, doi:10.1190/1.3509468.
[22] M. D. Sacchi, T. J. Ulrych, and C. J. Walker, Interpolation and extrapolation using a high-resolution discrete Fourier transform, IEEE Transactions on Signal Processing, 46 (1998), pp. 31–38, doi:10.1109/78.651165.
[23] R. Shahidi, G. Tang, J. Ma, and F. J. Herrmann, Applications of randomized sampling schemes to curvelet-based sparsity-promoting seismic data recovery, Geophysical Prospecting, 61 (2013), pp. 973–997, doi:10.1111/1365-2478.12050.
[24] S. Spitz, Seismic trace interpolation in the f-x domain, Geophysics, 56 (1991), pp. 785–794, doi:10.1190/1.3509468.
[25] S. Tan, Y. Zhang, G. Wang, X. Mou, G. Cao, Z. Wu, and H. Yu, Tensor-based dictionary learning for dynamic tomographic reconstruction, Physics in Medicine and Biology, 60 (2015), pp. 2803–2818.
[26] D. Trad, Five-dimensional interpolation: Recovering from acquisition constraints, Geophysics, 74 (2009), pp. V123–V132, doi:10.1190/1.3245216.
[27] D. O. Trad, T. J. Ulrych, and M. D. Sacchi, Accurate interpolation with high-resolution time-variant Radon transforms, Geophysics, 67 (2002), pp. 644–656, doi:10.1190/1.1468626.
[28] R. Vidal, Y. Ma, and S. Sastry, Generalized principal component analysis (GPCA), IEEE Transactions on Pattern Analysis and Machine Intelligence, 27 (2005), pp. 1945–1959, doi:10.1109/CVPR.2003.1211411.
[29] S. Wold, K. Esbensen, and P. Geladi, Principal component analysis, Chemometrics and Intelligent Laboratory Systems, doi:10.1016/0169-7439(87)80084-9.
[30] S. Xu, Y. Zhang, D. Pham, and G. Lambaré, Antileakage Fourier transform for seismic data regularization, Geophysics, 70 (2005), pp. V87–V95, doi:10.1190/1.1993713.
[31] S. Yu, J. Ma, X. Zhang, and M. D. Sacchi, Interpolation and denoising of high-dimensional seismic data by learning a tight frame, Geophysics, 80 (2015), pp. V119–V132, doi:10.1190/geo2014-0396.1.
[32] W. Zhou, J.-F. Cai, and H. Gao, Adaptive tight frame based medical image reconstruction: a proof-of-concept study in computed tomography, Inverse Problems, 29 (2013), pp. 125006:1–18, doi:10.1088/0266-5611/29/12/125006.
[33] H. Zou, T. Hastie, and R. Tibshirani, Sparse principal component analysis, Journal of Computational and Graphical Statistics, 15 (2006), pp. 265–286, doi:10.1198/106186006X113430.