
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 128

Fourier, Block, and Lapped Transforms

TIL AACH

Institute for Signal Processing, University of Lübeck, Ratzeburger Allee 160, D-23538 Lübeck, Germany

I. Introduction: Why Transform Signals Anyway? . . . 1
II. Linear System Theory and Fourier Transforms . . . 3
   A. Continuous-Time Signals and Systems . . . 3
   B. Discrete-Time Signals and Systems . . . 6
   C. The Discrete Fourier Transform and Block Transforms . . . 8
III. Transform Coding . . . 13
   A. The Role of Transforms: Constrained Source Coding . . . 13
   B. Transform Efficiency . . . 14
   C. Transform Coding Performance . . . 23
IV. Two-Dimensional Transforms . . . 25
V. Lapped Transforms . . . 28
   A. Block Diagonal Transforms . . . 28
   B. Extension to Lapped Transforms . . . 29
   C. The Lapped Orthogonal Transform . . . 30
   D. The Modulated Lapped Transform . . . 33
   E. Extensions . . . 36
VI. Image Restoration and Enhancement . . . 39
VII. Discussion . . . 41
Acknowledgments . . . 42
Appendix A . . . 42
Appendix B . . . 43
Appendix C . . . 45
Appendix D . . . 47
References . . . 48

I. INTRODUCTION: WHY TRANSFORM SIGNALS ANYWAY?

The Fourier transform and its related discrete transforms are of key importance in both theory and practice of signal and image processing. In the theory of continuous-time systems and signals, the Fourier transform allows one to describe both signal and system properties and the relation between system input and output signals in the frequency domain (Ziemer et al., 1989; Lüke, 1999). Fourier-optical systems based on the diffraction of coherent light are a direct practical realization of the two-dimensional continuous Fourier transform (Papoulis, 1968; Bamler, 1989). The discrete-time Fourier transform (DTFT) describes properties of discrete-time signals and systems. While the DTFT assigns frequency-continuous and periodic

Copyright © 2003 Elsevier Inc. All rights reserved.

1076-5670/2003 $35.00


spectra to discrete-time signals, the discrete Fourier transform (DFT) represents a discrete-time signal of finite length by a finite number of discrete-frequency coefficients (Oppenheim and Schafer, 1998; Proakis and Manolakis, 1996; Lüke, 1999). The DFT thus permits one to compute spectral representations numerically. The DFT and other discrete transforms related to it, like the discrete cosine transform (DCT), are also of great practical importance for the implementation of signal and image processing systems, since efficient algorithms for their computation exist, e.g., in the form of the fast Fourier transform (FFT).

However, while continuous-time Fourier analysis generally considers the entire time axis from minus infinity to plus infinity, the DFT is only defined for signals of finite duration. Conceptually, the finite-duration signals are formed by taking single periods from originally periodic signals. Consequently, enhancement and transform coding of, for instance, speech are based on the spectral analysis of short time intervals of the speech waveform (Lim and Oppenheim, 1979; Ephraim and Malah, 1984; van Compernolle, 1992; Cappé, 1994; Aach and Kunz, 1998). The length of the time intervals depends on the nature of the signals, viz. short-time stationarity. Similarly, transform coding (Clarke, 1985) or frequency-domain enhancement (Lim, 1980; Aach and Kunz, 1996a,b, 2000) of images requires spectral analysis of rectangular blocks of finite extent in order to take into account short-space stationarity. Such processing by block transforms often generates audible or visible artifacts at block boundaries. While in some applications these artifacts may be mitigated using overlapping blocks (Lim and Oppenheim, 1979; Lim, 1980; Ephraim and Malah, 1984; Cappé, 1994; van Compernolle, 1992; Aach and Kunz, 1996a,b, 1998; Aach, 2000), this is not practical in applications like transform coding, where overlapping blocks would inflate the data volume. Transform coders therefore punch out adjacent blocks from the incoming continuous data stream, and encode these individually. To illustrate the block artifacts, Figure 1 shows an image reconstructed after encoding by the JPEG algorithm, which uses a blockwise DCT (Rabbani and Jones, 1991). Lapped transforms aim at reducing or even eliminating block artifacts by the use of overlapping basis functions, which extend over more than one block.

The purpose of this chapter is to provide a self-contained introduction to lapped transforms. Our approach is to develop lapped transforms from standard block transforms as a starting point. To introduce the topic of signal transforms, we first summarize the development from the Fourier transform of continuous-time signals to the DFT. An in-depth treatment can be found in many texts on digital signal processing and system theory (e.g., Ziemer et al., 1989; Oppenheim and Schafer, 1998; Lfike, 1999). In Section III, we discuss the relevance of orthogonal block transforms for


FIGURE 1. Left: Portion of size 361 x 390 pixels of the "Marcel" image, 8 bits per pixel. Right: Reconstruction after JPEG compression at about 0.2 bits per pixel.

transform coding, which depends on the covariance structure of the signals. Section IV deals with two-dimensional block transforms. Orthogonal block transforms map a given number of signal samples contained in each block into an identical number of transform coefficients. Each signal block can hence be perfectly reconstructed from its transform coefficients by an inverse transform. In contrast to block transforms, the basis functions of lapped transforms discussed in Section V extend into neighboring blocks. The number of transform coefficients generated is then lower than the number of signal samples covered by the basis functions. Signal blocks can therefore not be perfectly reconstructed from their individual transform coefficients. However, if the transform meets a set of extended orthogonality conditions, the original signal is perfectly reconstructed by superimposing the overlapping, imperfectly reconstructed signal blocks. Two types of lapped transforms will be considered, the lapped orthogonal transform (LOT) and the modulated lapped transform (MLT). We then discuss extensions of these transforms before concluding with some examples comparing the use of block and lapped transforms in image restoration and enhancement.

II. LINEAR SYSTEM THEORY AND FOURIER TRANSFORMS

A. Continuous-Time Signals and Systems

Let s(t) denote a real signal, with t being the independent continuous-time variable. Our aim is to describe the transmission of signals through one or more systems, where a system is regarded as a black box which maps an input signal s(t) into the output signal g(t) by a mapping M, i.e., g(t) = M(s(t)). Restricting ourselves here to the class of linear time-invariant (LTI) systems, we require the systems to comply with the following conditions. (i) Linearity: A linear system reacts to any weighted combination of K input signals s_i(t), i = 1, …, K, with the same weighted combination of output signals g_i(t) = M(s_i(t)):

M( ∑_{i=1}^{K} a_i s_i(t) ) = ∑_{i=1}^{K} a_i M(s_i(t)) = ∑_{i=1}^{K} a_i g_i(t),   (1)

where a_i, i = 1, …, K, denote the weighting factors. (ii) Time invariance: A time-invariant system reacts to an arbitrary delay of the input signal with a correspondingly delayed, but otherwise unchanged output signal:

M(s(t)) = g(t) ⇒ M(s(t − τ)) = g(t − τ),   (2)

where τ is the delay. An LTI system is completely characterized by the response to the Dirac delta impulse δ(t). Denoting the so-called impulse response by h(t), we have h(t) = M(δ(t)). The Dirac impulse δ(t) is a distribution defined by the integral equation

s(t) = ∫_{−∞}^{∞} s(τ) δ(t − τ) dτ,   (3)

which essentially represents a signal s(t) by an infinite series of Dirac impulses delayed by τ and weighted by s(τ). Since an LTI system reacts to the signal s(t) by the same weighted combination of delayed impulse responses h(t), it suffices to replace δ(t − τ) in Equation (3) by h(t − τ) to obtain the output g(t):

g(t) = ∫_{−∞}^{∞} s(τ) h(t − τ) dτ.   (4)

This relationship is known as the convolution, and abbreviated by g(t) = s(t) * h(t). Since the convolution is commutative, we may interchange input signal and impulse response, and equally write g(t) = h(t) * s(t).

Let us now consider the system reaction to the complex exponential s_eig(t) of frequency f (or radian frequency ω = 2πf) given by

s_eig(t) = e^{j2πft},   (5)


where j = √(−1). From g(t) = h(t) * s_eig(t), we obtain

g(t) = e^{j2πft} · ∫_{−∞}^{∞} h(τ) e^{−j2πfτ} dτ = s_eig(t) · H(f),   (6)

where¹

H(f) = ∫_{−∞}^{∞} h(t) e^{−j2πft} dt.   (7)

Hence, the input signal is only weighted by the generally complex weighting factor H(f), but otherwise reproduced unchanged; s_eig(t) is therefore called an eigenfunction of LTI systems. The relationship between h(t) and H(f) is the Fourier transform, denoted by h(t) ∘—∘ H(f). If known for all frequencies, H(f) is called the spectrum of the signal h(t), or the transfer function of the LTI system. Equation (7) essentially is an inner product or correlation between h(t) and the complex exponential of frequency f. The signal h(t) can be recovered from its spectrum H(f) by the inverse Fourier transform

h(t) = ∫_{−∞}^{∞} H(f) e^{j2πft} df,   (8)

which is a weighted superposition of complex exponentials. (This integral reconstructs discontinuities of h(t) by the average between left and right limit.) Evidently, an LTI system can also be fully described by its transfer function H(f). When applied to a signal s(t), the Fourier transform S(f) is called the spectrum of s(t). It specifies the weights and phases of the complex exponentials contributing to s(t) in the inverse Fourier transform according to

s(t) ∘—∘ S(f) ⇒ s(t) = ∫_{−∞}^{∞} S(f) e^{j2πft} df.   (9)

The Fourier transform allows one to describe the transfer of a signal s(t) over an LTI system in the frequency domain. According to Equation (6), the LTI system reacts to e^{j2πft} by H(f) · e^{j2πft}. Equation (9) represents the system input s(t) as a weighted superposition of complex exponentials. Because of linearity, the output signal g(t) is given by an identical weighted superposition of system reactions H(f) · e^{j2πft}:

g(t) = ∫_{−∞}^{∞} S(f) H(f) e^{j2πft} df.   (10)

¹In the following, we assume the Fourier integrals to exist. For h(t) piecewise continuous, a sufficient condition is ∫_{−∞}^{∞} |h(t)| dt < ∞.


Denoting the spectrum of g(t) by G(f), the inverse Fourier transform yields

g(t) ∘—∘ G(f) ⇒ g(t) = ∫_{−∞}^{∞} G(f) e^{j2πft} df.   (11)

Comparing Equations (10) and (11), we obtain G(f) = H(f) S(f), i.e., the spectrum of the output signal is given by the product of the spectrum of the input signal and the transfer function of the LTI system.

The Fourier transform as given by Equations (7) and (8) thus provides insight into the frequency content of signals, and transfer properties of LTI systems. Relating a continuous-time signal to a spectrum that is a function of a continuous frequency variable, this version of the Fourier transform is, however, not suited for numerical evaluation by computer or digital signal processing systems. Still, realization of a continuous Fourier analyzer is possible, for instance by optical systems (Papoulis, 1968; Bamler, 1989).

B. Discrete-Time Signals and Systems

Let us now consider a discrete-time signal s(n), where the independent variable may only take integer values, i.e., n = 0, ±1, ±2, …. Essentially, s(n) is an ordered sequence of numbers stored, for example, in the memory of a computer, or coming from an A/D converter. A discrete-time system maps the input signal s(n) into the output signal g(n) by the mapping g(n) = M(s(n)). As in the continuous-time case, we regard only linear time-invariant systems obeying the following conditions: (i) Linearity:

M( ∑_{i=1}^{K} a_i s_i(n) ) = ∑_{i=1}^{K} a_i M(s_i(n)) = ∑_{i=1}^{K} a_i g_i(n),   (12)

for arbitrary input signals s_i(n) and weighting factors a_i, i = 1, …, K. (ii) Time invariance:

M(s(n)) = g(n) ⇒ M(s(n − m)) = g(n − m),   (13)

where m is an integer delay. In the discrete-time case, the Dirac delta impulse is replaced by the unit impulse δ(n), which is defined by

δ(n) = { 1 for n = 0, 0 otherwise }.   (14)


A discrete-time signal s(n) can then be composed of a sum of weighted and shifted unit impulses according to

s(n) = ∑_{m=−∞}^{∞} s(m) δ(n − m).   (15)

To determine the system response g(n), it then suffices to know its impulse response h(n) = M(δ(n)). Because of linearity and time invariance, the output signal is given by the following superposition of weighted and shifted impulse responses:

g(n) = ∑_{m=−∞}^{∞} s(m) h(n − m).   (16)

This operation is called the discrete-time convolution, and is denoted by g(n) = s(n) * h(n). Like its continuous-time counterpart, the discrete-time convolution is commutative.
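The discrete-time convolution of Equation (16) is simple to implement directly; the following sketch (plain Python, with finite-support sequences assumed to be zero outside their stored samples) also confirms commutativity:

```python
def convolve(s, h):
    # Discrete-time convolution g(n) = sum_m s(m) h(n - m)
    # for finite-length sequences s and h (zero outside their support).
    g = [0.0] * (len(s) + len(h) - 1)
    for n in range(len(g)):
        for m in range(len(s)):
            if 0 <= n - m < len(h):
                g[n] += s[m] * h[n - m]
    return g

s = [1.0, 2.0, 3.0]
h = [1.0, -1.0]
print(convolve(s, h))   # → [1.0, 1.0, 1.0, -3.0]
print(convolve(h, s))   # commutative: same result
```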

The eigenfunctions of discrete-time LTI systems are discrete-time complex exponentials given by

s_eig(n) = e^{j2πfn}.   (17)

Note that the frequency variable f is still continuous. Passing s_eig(n) through our LTI system yields the output signal

g(n) = e^{j2πfn} · ∑_{m=−∞}^{∞} h(m) e^{−j2πfm} = s_eig(n) · H_DT(f),   (18)

where

H_DT(f) = ∑_{n=−∞}^{∞} h(n) e^{−j2πfn}   (19)

is the DTFT of h(n), which can be regarded as the transfer function of the LTI system, or the spectrum of the signal h(n). We denote this relation by h(n) ∘—∘ H_DT(f). Clearly, the spectrum of a discrete-time signal is periodic over f. Indeed, h(n) can be regarded as the Fourier series representation of H_DT(f). To reconstruct h(n) from its spectrum, it therefore suffices to consider a single period of H_DT(f):

h(n) ∘—∘ H_DT(f) ⇒ h(n) = ∫_{−1/2}^{1/2} H_DT(f) e^{j2πfn} df.   (20)
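For a finite-length impulse response, the DTFT sum of Equation (19) can be evaluated at any frequency f; a small sketch (plain Python, with a hypothetical three-tap h(n)) also confirms that the spectrum is periodic with period 1:

```python
import cmath

def dtft(h, f):
    # H_DT(f) = sum_n h(n) exp(-j 2 pi f n), h given for n = 0, ..., len(h)-1
    return sum(h[n] * cmath.exp(-2j * cmath.pi * f * n) for n in range(len(h)))

h = [1.0, 2.0, 1.0]   # illustrative FIR impulse response (our own example)
print(dtft(h, 0.0))   # → (4+0j): H_DT(0) is the sum of all samples
print(abs(dtft(h, 0.25) - dtft(h, 1.25)) < 1e-9)   # → True: period-1 periodicity
```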


As in the continuous-time case, it is straightforward to show that the spectrum of an output signal of an LTI system is the product of the spectrum of the input signal and the transfer function of the LTI system:

g(n) = s(n) * h(n) ∘—∘ G_DT(f) = S_DT(f) · H_DT(f).   (21)

While the discrete-time convolution in Equation (16) can be implemented on digital signal processing (DSP) systems, the spectral-domain relations are of less practical value, since they depend on a continuous frequency variable.

C. The Discrete Fourier Transform and Block Transforms

Let us now consider a finite-duration signal s(n), n = 0, …, N−1, comprising N samples. Seeking a spectral-domain representation of s(n) by N frequency coefficients S_DFT(k), k = 0, …, N−1, we start from its DTFT S_DT(f), which is a sum over N components. S_DT(f) is periodic with period 1, and therefore fully specified by one period, for instance 0 ≤ f < 1. Seeking to represent the N-sample sequence s(n) by N discrete frequency coefficients, we take N equally spaced samples from one period of S_DT(f), 0 ≤ f < 1, thus obtaining the DFT of s(n) by

S_DFT(k) = S_DT(k/N) = ∑_{n=0}^{N−1} s(n) e^{−j(2π/N)kn},  k = 0, …, N−1.   (22)

The finite-duration signal s(n) can be recovered from its DFT S_DFT(k) by the inverse DFT (see Appendix A)

s(n) ∘—∘ S_DFT(k) ⇒ s(n) = (1/N) ∑_{k=0}^{N−1} S_DFT(k) e^{j(2π/N)kn},  n = 0, …, N−1.   (23)

The DFT hence represents a finite-duration discrete-time signal s(n) of N coefficients by N discrete spectral coefficients S_DFT(k), and is therefore perfectly suited for numerical implementation. In general, the frequency coefficients are complex, offering 2N degrees of freedom. However, for real s(n), the DFT obeys the symmetry condition S_DFT(k) = S*_DFT(N − k), which reduces the degrees of freedom to N.
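Both the inverse relation of Equation (23) and the conjugate symmetry for real signals are easy to check numerically; a sketch using NumPy's FFT routines, which implement the DFT conventions of Equations (22) and (23):

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.standard_normal(8)      # real finite-duration signal, N = 8
S = np.fft.fft(s)               # DFT as in Equation (22)
N = len(s)

# Conjugate symmetry for real s(n): S_DFT(k) = S*_DFT(N - k)
for k in range(1, N):
    assert abs(S[k] - np.conj(S[N - k])) < 1e-12

# The inverse DFT of Equation (23) recovers s(n)
assert np.allclose(np.fft.ifft(S).real, s)
```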

Since the DFT applies to finite-length signals, samples of signals of long duration must be collected into successive segments or blocks of finite length, which are then subjected to the DFT. Transforms like the DFT are therefore termed block transforms. The block length is limited by practical considerations, like available memory and performance of the digital signal


processing system. More important, however, is the influence of statistical signal properties on the block length: the notion of power spectrum, for instance, is meaningful only for (wide-sense) stationary random signals. Real data, like speech or images, are stationary only over short time intervals and blocks of rather small extent, respectively. Applications of spectral analysis, like power spectrum estimation by block transforms or block transform coding, therefore only make sense when applied to reasonably short and small segments. In the JPEG still image compression standard, images are processed in blocks of 8 × 8 pixels. Speech can be considered stationary for intervals of the order of 10-50 ms. When sampled at 8 kHz, this translates into blocks with 64-256 samples.

Linear block transforms are conveniently expressed as matrix operations. Grouping the signal samples s(n), n = 0, …, N−1, into a column vector s = [s(0), s(1), …, s(N−1)]^T, and the frequency coefficients into a vector S = [S_DFT(0), S_DFT(1), …, S_DFT(N−1)]^T, Equation (22) can be written as

S = W · s,  with W = [W]_{kn},   (24)

where W is the square N × N transform matrix with entry

W_{kn} = e^{−j(2π/N)kn}   (25)

in the (k+1)th row and (n+1)th column. The inverse transform in Equation (23) can be expressed by

s = (1/N) (W*)^T · S.   (26)

Thus,

(W*)^T · W = N · I,   (27)

where I is the identity matrix. The DFT transform matrix is hence unitary up to a factor N, or, in other words, the DFT basis functions are orthogonal.
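These matrix relations are easy to verify; a short NumPy sketch building W from Equation (25) and checking Equations (24), (26), and (27):

```python
import numpy as np

N = 8
k = np.arange(N).reshape(-1, 1)
n = np.arange(N).reshape(1, -1)
W = np.exp(-2j * np.pi * k * n / N)          # W_kn, Equation (25)

# (W*)^T W = N I, Equation (27): unitary up to the factor N
assert np.allclose(W.conj().T @ W, N * np.eye(N))

# Forward transform, Equation (24), and inverse, Equation (26)
s = np.arange(N, dtype=float)
S = W @ s
assert np.allclose((W.conj().T @ S) / N, s)
```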

We have derived the DFT by sampling the first period of the DTFT of a signal of length N with a sampling period 1/N. What are the consequences of this frequency-domain sampling operation? Comparing the Fourier transform S(f) of a continuous-time signal s(t) according to Equation (7) to the DTFT in Equation (19), we see that replacing a continuous-time signal by discrete equally spaced samples with sampling period one leads to a periodic spectrum with period one in the DTFT of Equation (19). Also, the Fourier transform and its inverse in Equation (8) are almost identical in structure.


Apart from a sign change in the exponent, the signal s(t) and its spectrum S(f) are simply interchanged, as are the time and frequency variables t and f. Therefore, just as time-domain sampling leads to a periodic spectrum, frequency-domain sampling of the periodic spectrum leads to a periodic discrete signal s_p(n), obtained by periodically repeating s(n) with a period that is the inverse of the sampling period in the frequency domain. Since the frequency-domain sampling period in Equation (22) is 1/N, the periodic signal s_p(n) is given by

s_p(n) = ∑_{r=−∞}^{∞} s(n + rN).   (28)

Hence, the DFT represents one period of a periodic discrete-time signal by one period of its discrete-frequency periodic spectrum. Both the "actually" transformed signal and its spectrum are therefore periodic. This implicit, underlying periodicity must not be overlooked when applying the DFT. One consequence is the occurrence of spurious high-frequency artifacts in the DFT spectrum, which are generated when the block-end signal coefficients s(0) and s(N−1) differ strongly. The periodically repeated signal s_p(n) then exhibits abrupt transitions, which "leak" spectral energy into high-frequency spectral coefficients. To illustrate this effect, Figure 2 shows the signal s(n) = cos(4πn/64), n = 0, …, 63, and its DFT spectrum. Since two periods of the cosine fit perfectly into the analysis interval, periodic repetition of s(n) generates a smooth signal. As expected, the DFT spectrum exhibits two "clean" peaks at k = 2 and k = 62. This is vastly different in Figure 3: the frequency of the cosine is slightly increased such that now 2.5 periods fit into the data interval, with s(n) = cos(5πn/64), n = 0, …, 63. Periodic repetition generates transitions from almost −1 to 1 between block-end samples. The effect of these transitions is evident in the DFT spectrum, which is now spread over all frequency coefficients.
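The two cosine examples can be reproduced numerically; the sketch below (NumPy, our own illustration) measures how much of the signal energy the two largest-modulus DFT coefficients capture in each case:

```python
import numpy as np

n = np.arange(64)
s_clean = np.cos(4 * np.pi * n / 64)   # two full periods fit the 64-sample block
s_leaky = np.cos(5 * np.pi * n / 64)   # 2.5 periods: jump when periodically repeated

for s in (s_clean, s_leaky):
    energy = np.abs(np.fft.fft(s)) ** 2
    top2 = np.sort(energy)[-2:].sum() / energy.sum()
    print(round(top2, 4))
# clean block: 1.0 (all energy in k = 2 and k = 62); leaky block: well below 1
```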

This example also illustrates one important application of block transforms: as Figure 2 shows, a block transform may concentrate the signal energy into only a small number of spectral coefficients. This property is essential for data compression by transform coding. From Figure 3, however, it becomes clear that the DFT is probably not the optimal transform for this purpose, due to problems caused by discontinuities at the block ends. In the next section we examine transform coding in more detail. We will see that, although better transforms than the DFT exist for transform coders, artifacts caused by block boundaries persist. This is the main motivation for the development and use of lapped transforms.


FIGURE 2. Top: Source signal s(n) = cos(4πn/64) for n = 0, …, 63. Bottom: Modulus of the DFT spectrum S_DFT(k) of s(n) for k = 0, …, 63. Since periodic repetition of s(n) does not create discontinuities, the DFT spectrum exhibits the expected two peaks.

Various fast and highly efficient algorithms are available for the computation of the DFT and its inverse ("fast Fourier transform," FFT). These are widely used in applications like power spectrum estimation, fast convolution, adaptive filtering, noise reduction, and signal enhancement, as


FIGURE 3. Top: Source signal s(n) = cos(5πn/64) for n = 0, …, 63. Bottom: Modulus of the DFT spectrum S_DFT(k) of s(n) for k = 0, …, 63. Periodic repetition of s(n) results in strong discontinuities between the periods, causing the spreading of signal energy over all frequency coefficients.

well as in many others. Some of these applications require the use of overlapping segments, others the use of segments that are subjected to a smooth window function such that discontinuities at block ends are reduced or eliminated (Oppenheim and Schafer, 1998; Ziemer et al., 1989, Chap. 11).


III. TRANSFORM CODING

A. The Role of Transforms: Constrained Source Coding

The aim of source coding or data compression is to represent discrete signals s(n) with only a small expected number of bits per sample (the so-called bit rate), with either no distortion (lossless compression), or as low a distortion as possible for a given rate (lossy compression). Since we try to optimize the trade-off between distortion and rate on the average, we regard signals as random and describe them by their statistical properties. The essential step in source coding is quantization (Goyal, 2001, p. 12). A straightforward approach is so-called pulse code modulation (PCM), where each sample is quantized individually at a fixed number of bits, e.g., eight bits for gray-level images. Most signals representing meaningful information, however, exhibit strong statistical dependencies between signal samples. In images, for instance, the gray levels of neighboring pixels tend to be similar. To take such dependencies into account, possibly large sets of adjacent samples should be quantized together. Unfortunately, this unconstrained approach leads to practical problems even for relatively small groups of samples (Goyal, 2001).

In transform coding, the signals or images are first decomposed into adjacent blocks or vectors of N input samples each. Each block is then individually transformed such that the statistical dependencies between the samples are reduced, or even eliminated (Clarke, 1985; Zelinski and Noll, 1977; Goyal, 2001). Also, the signal energy, which generally is evenly distributed over all signal samples s(n), should be repacked into only a few transform coefficients. The transform coefficients S(k) can then be quantized individually (scalar quantization). Each quantizer output consists of an index i(k) of the quantization interval into which the corresponding transform coefficient falls. These indices are then coded, e.g., by a fixed-length code or an entropy code. The decoder first reconverts the incoming bitstream into the quantization indices, and then replaces the quantization index i(k) for each transform coefficient S(k) by the centroid V(i(k)) of the indexed quantization interval, which serves as an approximation, or better, estimate, Ŝ(k) = V(i(k)), of S(k). The relation between the indices i(k) and the centroids V(i(k)) is stored in a look-up table called a codebook. An inverse transform then calculates the reconstructed signal ŝ. The principle of a transform coder and decoder (codec) is shown in Figure 4. Clearly, due to quantization, the compression technique is lossy. The distortion caused by uniform scalar quantization is discussed in Appendix B. Optimizing a transform codec needs to address choosing an


FIGURE 4. Block diagram of a transform coder and decoder. The signal vector s is first transformed into the transform coefficient vector S = A·s. The transform coefficients are quantized. The quantization indices i(k) are encoded into codewords and multiplexed into the bitstream, which is transmitted over the channel. The decoder first demultiplexes the bitstream into the codewords, which are then reconverted into the quantization indices i(k). The decoded quantization indices are used to access the codebooks, yielding the quantized transform coefficient values Ŝ(k) = V(i(k)). These are subjected to an inverse transform to obtain the reconstructed signal vector ŝ = A⁻¹Ŝ.

optimal transform and optimal scalar quantization of the transform coefficients. Since the optimization is thus constrained by the architecture outlined in Figure 4, we speak of constrained source coding.

Practical transform codecs employ linear unitary or orthogonal transforms. Linear transforms explicitly influence linear statistical dependencies, that is, correlations. In the next section we therefore first discuss unitary transforms subject to the criteria of decorrelation and energy concentration. We then show that the optimal transform with respect to these criteria is also optimal with respect to the reconstruction errors incurred at given rates.
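The constrained architecture of Figure 4 can be sketched end to end. The toy codec below is our own illustration, not the text's: it uses an orthonormal DCT-II as the transform A (a common transform-coding choice) and a plain uniform quantizer in place of optimized codebooks:

```python
import numpy as np

def dct_matrix(N):
    # Orthonormal DCT-II matrix (illustrative choice of transform A)
    A = np.cos(np.pi * np.outer(np.arange(N), np.arange(N) + 0.5) / N)
    A *= np.sqrt(2.0 / N)
    A[0] /= np.sqrt(2.0)
    return A

def codec(block, step):
    A = dct_matrix(len(block))
    S = A @ block                        # transform: S = A s
    i = np.round(S / step).astype(int)   # quantization indices i(k)
    S_hat = i * step                     # dequantization: V(i(k))
    return A.T @ S_hat                   # inverse transform; A is orthogonal

rng = np.random.default_rng(1)
s = np.cumsum(rng.standard_normal(8))    # correlated toy signal block, N = 8
s_hat = codec(s, step=0.5)
mse = np.mean((s - s_hat) ** 2)          # reconstruction error due to quantization only
print(mse <= 0.5 ** 2 / 4)               # → True: bounded by the quantizer step size
```

Because the transform is orthonormal, the reconstruction MSE equals the quantization MSE of the coefficients, here at most (step/2)² per coefficient.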

B. Transform Efficiency

Modeling the signal s(n) as wide-sense stationary over n = 0, …, N−1, the mean value is constant for all samples. Without loss of generality, we assume that the mean is zero, if necessary by having first subtracted a potential nonzero mean from the data. The autocovariance function (ACF) is then given by c_s(n) = E(s(m)s(m+n)), where E denotes expectation, and the (constant) variance σ_s² of s(n) is σ_s² = c_s(0). The ACF can be normalized by c_s(n) = σ_s² · ρ_s(n), with ρ_s(0) = 1 and |ρ_s(n)| ≤ 1. Alternatively, covariances can be expressed by the covariance matrix C_s (Fukunaga, 1972; Therrien, 1989), which is an N × N matrix defined by

C_s = E[s s^T] = σ_s² ·
⎡ 1          ρ_s(1)     ρ_s(2)    …  ρ_s(N−1) ⎤
⎢ ρ_s(1)     1          ρ_s(1)    …  ρ_s(N−2) ⎥
⎢ ⋮                                   ⋮       ⎥
⎣ ρ_s(N−1)   ρ_s(N−2)   ρ_s(N−3)  …  1        ⎦      (29)

The entry in the (n+1)th row and (k+1)th column of C_s is thus given by c_s(|n − k|). The covariance matrix of a wide-sense stationary signal vector is evidently a positive semidefinite and symmetric Toeplitz matrix (Therrien, 1989; Makhoul, 1981; Akansu and Haddad, 2001); indeed, C_s is symmetric about both main diagonals (persymmetric) (Unser, 1984). We transform the signal vector s into the coefficient vector S = A · s by a linear, unitary transform. The transform is described by an N × N matrix A, with A⁻¹ = A^H, where the superscript H denotes conjugate transpose (cf. Equation (27)). For instance, A could be a unitary DFT defined by A = (1/√N) · W, with W given by Equation (25). A unitary transform preserves Euclidean lengths:

$$\|S\|_2^2 = S^H S = s^T A^H A s = s^T I s = \|s\|_2^2, \qquad (30)$$

where s^H = s^T, since s is real. The covariance matrix C_S of the transform coefficients can then be derived as

$$C_S = E[SS^H] = A E[ss^T] A^H = A C_s A^H, \qquad (31)$$

and also det(C_S) = det(C_s). Furthermore, the sums of the variances of the signal and of the transform coefficients are identical:

$$N \sigma_s^2 = \mathrm{tr}(C_s) = \mathrm{tr}(C_S) = \sum_{k=0}^{N-1} \sigma_S^2(k), \qquad (32)$$

where tr(C) is the trace of matrix C. In general, the nondiagonal entries of C_s are nonzero, reflecting correlations between the signal samples s(n).

We now seek a unitary transform matrix A which decorrelates the input data as much as possible. Hence, we seek a transform matrix such that the covariance matrix C_S of the transform coefficients is diagonal or nearly diagonal (Fukunaga, 1972; Therrien, 1989; Clarke, 1985; Goyal, 2001). At the same time, we seek to concentrate optimally the signal energy into only a few dominant transform coefficients. The decorrelation efficiency η_d can be measured by comparing the sums of absolute nondiagonal matrix entries before and after transformation by (Akansu and Haddad, 2001, p. 33)

$$\eta_d = 1 - \frac{\sum_{k,l,\,k \ne l} |[C_S]_{kl}|}{\sum_{m,n,\,m \ne n} |[C_s]_{mn}|} \qquad (33)$$
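As a numerical sketch (not part of the original text; it assumes NumPy, and the variable names are illustrative), the decorrelation efficiency of Equation (33) can be evaluated for the unitary DCT applied to the AR(1) covariance model with ρ = 0.91 and N = 8 used throughout this section:

```python
import numpy as np

N, rho = 8, 0.91

# AR(1) covariance matrix of Equation (55), with unit variance sigma_s^2 = 1.
n = np.arange(N)
Cs = rho ** np.abs(n[:, None] - n[None, :])

# Unitary DCT matrix; rows are the basis vectors a_k of Equation (56).
A = np.sqrt(2.0 / N) * np.cos((2 * n[None, :] + 1) * n[:, None] * np.pi / (2 * N))
A[0, :] = np.sqrt(1.0 / N)

CS = A @ Cs @ A.T  # coefficient covariance matrix, Equation (31)

def offdiag_abs_sum(C):
    """Sum of the absolute values of the nondiagonal entries."""
    return np.abs(C).sum() - np.abs(np.diag(C)).sum()

eta_d = 1.0 - offdiag_abs_sum(CS) / offdiag_abs_sum(Cs)
print(f"decorrelation efficiency eta_d = {eta_d:.4f}")
```

The result should reproduce the value of about 98% quoted for this model at the end of Section III.B.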

Energy concentration can be evaluated by the relative energy contribution of the L < N transform coefficients with lowest energy to the total energy. Ordering the variances of the transform coefficients by rank as σ_S²(0) ≥ σ_S²(1) ≥ ... ≥ σ_S²(N−1), such a measure is

$$\frac{\sum_{k=L}^{N-1} \sigma_S^2(k)}{\mathrm{tr}(C_S)}, \qquad (34)$$

which is sometimes referred to as the basis restriction error (Jain, 1979; Unser, 1984; Akansu and Haddad, 2001). Denoting the rows of A by N-component row vectors a_k^T, k = 0, ..., N−1, we obtain the variances σ_S²(k) by evaluating Equation (31) for the entries along the main diagonal:

$$\sigma_S^2(k) = a_k^T C_s a_k^*. \qquad (35)$$

Minimizing the basis restriction error subject to the constraint a_k^T a_l^* = δ(k−l) is equivalent to minimizing the functional

$$J = \sum_{k=L}^{N-1} \left[ a_k^T C_s a_k^* - \lambda_k \left( a_k^T a_k^* - 1 \right) \right], \qquad (36)$$

with Lagrange multipliers λ_k, and where we have taken into account that the denominator in Equation (34) is invariant under a unitary transform. It can straightforwardly be shown that J is minimized by the normalized eigenvectors u_k, k = 0, ..., N−1, of the data covariance matrix C_s (Therrien, 1992, pp. 50, 694; Therrien, 1989; Akansu and Haddad, 2001). The eigenvectors fulfill

$$C_s u_k = \lambda_k u_k. \qquad (37)$$

Since C_s is symmetric and positive semidefinite, its eigenvalues λ_k are real and nonnegative. Its eigenvectors are orthogonal, and, since the eigenvalues are real, the eigenvectors can always be found such that their elements are real (C_s also has complex eigenvectors, obtained by multiplying the real eigenvectors by a nonzero complex factor). The unitary transform matrix A is given by

$$A = \begin{pmatrix} u_0^T \\ \vdots \\ u_{N-1}^T \end{pmatrix}. \qquad (38)$$

This transform is called the Karhunen-Loève transform (KLT) (Fukunaga, 1972; Therrien, 1989). The variances of the transform coefficients are given by the eigenvalues λ_k, since from Equation (35)

$$u_k^T C_s u_k = u_k^T \lambda_k u_k = \lambda_k, \qquad (39)$$

where we have considered only real eigenvectors. Also, since the eigenvectors are orthogonal, we have for the nondiagonal entries of the covariance matrix C_S

$$[C_S]_{kl} = u_k^T C_s u_l = u_k^T \lambda_l u_l = 0 \quad \text{for } k \ne l. \qquad (40)$$

Hence, C_S is a diagonal matrix, and the transform coefficients are perfectly decorrelated. We constrain the eigenvectors to be real, and order them in Equation (38) by rank of their eigenvalues. Up to the sign of the eigenvectors, the KLT then is the unique unitary transform which minimizes the basis restriction error and perfectly diagonalizes the covariance matrix if the eigenvalues are all distinct. Also, invoking Hadamard's inequality, which states that the determinant of any symmetric, positive semidefinite matrix is less than or equal to the product of its diagonal elements, we obtain an additional measure for energy concentration: the determinant of a covariance matrix is always less than or equal to the product over all variances, i.e.,

$$\det[C_s] = \det[C_S] \le \prod_{k=0}^{N-1} \sigma_S^2(k). \qquad (41)$$

If C_S was obtained by the KLT, we have equality:

$$\det[C_s] = \det[C_S] = \prod_{k=0}^{N-1} \lambda_k = \prod_{k=0}^{N-1} \sigma_S^2(k). \qquad (42)$$

Hence, the KLT minimizes the geometric mean of the variances (Zelinski and Noll, 1977; Goyal, 2001),

$$\sigma_{GM}^2 = \left( \prod_{k=0}^{N-1} \sigma_S^2(k) \right)^{1/N}. \qquad (43)$$

As we will see later on, this measure is directly related to the distortion of a transform coder as a function of the rate.
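The KLT construction of Equations (37)-(40) can be checked numerically. The sketch below (assuming NumPy; not from the original text) computes the KLT of the AR(1) covariance matrix, verifies that the coefficient covariance matrix is diagonal, and confirms that the geometric mean of the KLT coefficient variances is no larger than that obtained with the DCT:

```python
import numpy as np

N, rho = 8, 0.91
n = np.arange(N)
Cs = rho ** np.abs(n[:, None] - n[None, :])   # AR(1) model, Equation (55)

# Eigen-decomposition; order eigenvectors by descending eigenvalue.
lam, U = np.linalg.eigh(Cs)
order = np.argsort(lam)[::-1]
lam, U = lam[order], U[:, order]
A_klt = U.T                                    # rows are u_k^T, Equation (38)

CS = A_klt @ Cs @ A_klt.T                      # diagonal, Equations (39)-(40)

# Geometric mean of the coefficient variances, Equation (43).
gm_klt = np.exp(np.mean(np.log(lam)))

A_dct = np.sqrt(2.0 / N) * np.cos((2 * n[None, :] + 1) * n[:, None] * np.pi / (2 * N))
A_dct[0, :] = np.sqrt(1.0 / N)
var_dct = np.diag(A_dct @ Cs @ A_dct.T)
gm_dct = np.exp(np.mean(np.log(var_dct)))
print(f"geometric means: KLT {gm_klt:.4f}, DCT {gm_dct:.4f}")
```

By Hadamard's inequality, gm_klt equals det(C_s)^(1/N), so no orthonormal transform, the DCT included, can produce a smaller geometric mean.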

Although thus optimal in theory, the KLT has two drawbacks. First, it depends on the covariance structure of the data. Second, there is no general fast algorithm for computation of the KLT. Fortunately, as we will see subsequently, the KLT is in practice well approximated by sinusoidal transforms like the DCT and lapped transforms. Let us first examine how the DFT is related to the KLT. Rewriting the covariance matrix in Equation (29) as

$$C_s = \begin{pmatrix} c(0) & c(1) & c(2) & \cdots & c(N-1) \\ c(1) & c(0) & c(1) & \cdots & c(N-2) \\ \vdots & & \ddots & & \vdots \\ c(N-1) & c(N-2) & c(N-3) & \cdots & c(0) \end{pmatrix} = \mathrm{toeplitz}[c_0, c_1, \ldots, c_{N-2}, c_{N-1}], \qquad (44)$$

we form another symmetric Toeplitz matrix:

$$D_s = \mathrm{toeplitz}[c_0, c_{N-1}, c_{N-2}, \ldots, c_1] = \begin{pmatrix} c(0) & c(N-1) & c(N-2) & \cdots & c(1) \\ c(N-1) & c(0) & c(N-1) & \cdots & c(2) \\ \vdots & & \ddots & & \vdots \\ c(1) & c(2) & c(3) & \cdots & c(0) \end{pmatrix}. \qquad (45)$$

Similar to the decomposition of a signal s(n) into the sum s(n) = s_e(n) + s_o(n) of an even signal s_e(n) = 0.5[s(n) + s(−n)] and an odd signal s_o(n) = 0.5[s(n) − s(−n)], we can decompose the covariance matrix C_s into the sum of a circulant and a skew circulant matrix (Unser, 1984). The circulant matrix is calculated by

$$E = \frac{1}{2} [C_s + D_s] = \mathrm{toeplitz}[e_0, e_1, \ldots, e_{N-1}], \qquad (46)$$

and the skew circulant by

$$O = \frac{1}{2} [C_s - D_s] = \mathrm{toeplitz}[o_0, o_1, \ldots, o_{N-1}]. \qquad (47)$$

Evidently, e_0 = c_0 and o_0 = 0. The entries e_i and o_i, i = 1, ..., N−1, are related to c_i by

$$e_i = \frac{1}{2} [c_i + c_{N-i}] = e_{N-i} \quad \text{and} \quad o_i = \frac{1}{2} [c_i - c_{N-i}] = -o_{N-i}, \qquad (48)$$

and the covariance matrix C_s is the sum

$$C_s = E + O. \qquad (49)$$

As shown in Unser (1984, Sect. 4), Therrien (1992, Sect. 4.7.2), and Akansu and Haddad (2001, p. 43), the basis functions of the unitary DFT form complex eigenvectors u_k of the circulant matrix E. Denoting the elements of u_k by u_k(n), we thus have

$$u_k(n) = \frac{1}{\sqrt{N}} \exp\left( +j \frac{2\pi}{N} kn \right). \qquad (50)$$

Similarly, the basis vectors of a related transform called the discrete odd Fourier transform are eigenvectors of O. The eigenvalues of E are then given by the DFT of its first row:

$$\lambda_k = \sum_{n=0}^{N-1} e_n \exp\left( \pm j \frac{2\pi}{N} kn \right), \quad k = 0, \ldots, N-1. \qquad (51)$$

Because of the symmetry e_n = e_{N−n}, n = 1, ..., N−1, the DFT is real, that is, λ_k = λ_{N−k}, k = 1, ..., N−1. Therefore, eigenvectors with real elements can also be found for E, like the real or imaginary parts of Equation (50). The DFT can be simplified to

$$\lambda_k = \sum_{n=0}^{N-1} e_n \cos\left( \frac{2\pi}{N} kn \right). \qquad (52)$$
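A small numerical check (assuming NumPy; not part of the original text, with illustrative names) that the DFT basis vectors of Equation (50) are eigenvectors of the circulant part E, with eigenvalues given by the DFT of its first row as in Equations (51)-(52):

```python
import numpy as np

N, rho = 8, 0.91
c = rho ** np.arange(N)                   # AR(1) ACF samples c_0, ..., c_{N-1}

e = np.empty(N)
e[0] = c[0]
for i in range(1, N):
    e[i] = 0.5 * (c[i] + c[N - i])        # Equation (48): e_i = (c_i + c_{N-i})/2

# Circulant matrix E = toeplitz[e_0, ..., e_{N-1}]; it is circulant because
# e_i = e_{N-i}, so E[i, j] = e[(j - i) mod N].
E = np.array([[e[(j - i) % N] for j in range(N)] for i in range(N)])

# Eigenvalues per Equation (51): the DFT of the first row (real, by symmetry).
lam = np.fft.fft(e)

# Each DFT basis vector of Equation (50) is an eigenvector of E.
for k in range(N):
    u = np.exp(2j * np.pi * k * np.arange(N) / N) / np.sqrt(N)
    assert np.allclose(E @ u, lam[k].real * u, atol=1e-10)
```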

Recalling from Equation (29) that the elements of a covariance matrix are given by the samples of the ACF, and regarding E as a valid covariance matrix, the eigenvalues λ_k can also be interpreted as power spectral coefficients.

Although we have thus found fast KLTs for circulant and skew circulant matrices, this does not generally solve for the KLT of the sum. We therefore analyze now a specific parametric covariance model, which is often used as an elementary approximation of the short-time behavior of s(n). Let w(n) denote zero-mean white noise with variance σ_w², which is stationary by definition. Its ACF is c_w(n) = σ_w²·δ(n), and its covariance matrix is the N × N diagonal matrix C_w = diag[σ_w², σ_w², ..., σ_w²]. We model s(n) as the output of a first-order recursive LTI system with input w(n); s(n) then is also stationary and obeys s(n) = ρ·s(n−1) + w(n), with |ρ| < 1. The transfer function H_DT(f) and impulse response h(n) of the LTI system are

$$H_{DT}(f) = \frac{1}{1 - \rho e^{-j 2\pi f}} \quad \longleftrightarrow \quad h(n) = \epsilon(n)\, \rho^n, \qquad (53)$$

where ε(n) is the unit step sequence, i.e., ε(n) = 1 for n ≥ 0, and zero otherwise. The ACF of this first-order autoregressive (AR(1)) or Markov-I process is

$$c_s(n) = \sigma_s^2 \rho^{|n|}, \quad \text{with} \quad \sigma_s^2 = \frac{\sigma_w^2}{1 - \rho^2}. \qquad (54)$$

The covariance matrix C_s then is

$$C_s = \sigma_s^2 \cdot \mathrm{toeplitz}[1, \rho, \rho^2, \ldots, \rho^{N-1}]. \qquad (55)$$

The correlation between samples of s(n) decays exponentially with their distance, and ρ is the correlation between directly adjacent samples. Practically, approximation of the short-time and short-space behavior of speech and image signals, respectively, leads to ρ positive and close to one (Ahmed et al., 1974; Clarke and Tech, 1981; Clarke, 1985; Malvar, 1992b; Goyal, 2001; Akansu and Haddad, 2001). The eigenvectors of the covariance matrix are sinusoids (Ray and Driver, 1970; see also Clarke and Tech, 1981; Akansu and Haddad, 2001, p. 36), the frequencies of which are not equally spaced on the unit circle. No fast algorithm for computing this KLT exists. Fortunately, as shown numerically in Ahmed et al. (1974), the KLT for an AR(1) process with ρ sufficiently large is well approximated by the DCT. Element n of basis vector k of the DCT is defined as

$$a_k(n) = \begin{cases} \sqrt{\dfrac{1}{N}} & \text{for } k = 0 \\[2mm] \sqrt{\dfrac{2}{N}} \cos\left( \dfrac{(2n+1) k \pi}{2N} \right) & \text{for } k = 1, \ldots, N-1 \end{cases} \qquad n = 0, \ldots, N-1. \qquad (56)$$

For a visual comparison, Figures 5 and 6 depict the KLT basis functions for ρ = 0.91, N = 8 and the DCT basis vectors. Clarke proved analytically that the KLT of an AR(1) process approaches the DCT as ρ approaches one (Clarke and Tech, 1981). Moreover, the DCT of an N-point signal vector can be regarded as the 2N-point DFT of the concatenation of s(n) and the mirrored signal s(2N−1−n) (Clarke and Tech, 1981; Lim, 1990, p. 148). Periodic repetition of the concatenated signal is not afflicted with discontinuities between the periods, thus avoiding the spreading of spectral


FIGURE 5. Numerically computed KLT basis vectors of an AR(1) process for ρ = 0.91 and N = 8.


FIGURE 6. Basis vectors of the unitary DCT for N = 8. Up to a sign, the similarity to the KLT basis vectors in Figure 5 is evident.

energy caused by the DFT leakage artifacts (Lim, 1990, p. 645). More details are given in Appendix C.
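The mirror construction can be sketched numerically as follows (assuming NumPy; the phase factor is an assumption that follows from expanding the 2N-point DFT sum, not a formula stated in the text):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
s = rng.standard_normal(N)

y = np.concatenate([s, s[::-1]])          # s(n) followed by its mirror s(2N-1-n)
Y = np.fft.fft(y)                          # 2N-point DFT

k = np.arange(N)
# Up to a factor 2 and the phase exp(-j*pi*k/(2N)), Y_k equals the
# unnormalized DCT sum of Equation (56).
dct_from_dft = 0.5 * np.real(np.exp(-1j * np.pi * k / (2 * N)) * Y[:N])

n = np.arange(N)
dct_direct = np.array([np.sum(s * np.cos((2 * n + 1) * kk * np.pi / (2 * N)))
                       for kk in k])
assert np.allclose(dct_from_dft, dct_direct, atol=1e-10)
```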

Figure 5 also illustrates symmetry properties of the KLT: evidently, half of the eigenvectors are invariant to reversing the order of their elements; they are called (even) symmetric. For these vectors, we have u_i = J·u_i, where J denotes the N × N counter identity matrix (or reverse operator), with ones along the second diagonal and zeros elsewhere. For the other half, we have u_i = −J·u_i; these vectors are skew symmetric. In fact, for persymmetric matrices C with distinct eigenvalues and N even, half of the eigenvectors are symmetric, while the other half are skew symmetric (Cantoni and Butler, 1976; Makhoul, 1981; Unser, 1984; Akansu and Haddad, 2001). The same symmetry properties hold for the DCT basis vectors, half of which are symmetric, while the other half are skew symmetric. We will need this property for the construction of lapped transforms.

Let us summarize the results of this section:

• The covariance matrix of a wide-sense stationary random signal is a persymmetric Toeplitz matrix.

• The orthogonal linear transform generating perfectly decorrelated transform coefficients from a wide-sense stationary signal is the KLT, which is unique except for a sign in the eigenvectors if the eigenvectors are constrained to have only real elements. For an even number N of samples, half of the eigenvectors are symmetric, while the other half are skew symmetric. Also, the KLT maximizes energy concentration as measured by the basis restriction error, and minimizes the geometric mean of the transform coefficient variances.

• The covariance matrix of a wide-sense stationary process can be decomposed into the sum of a circulant and a skew circulant matrix. A KLT of the circulant matrix is the DFT.

• Real data can often be regarded as a first-order autoregressive (AR(1) or Markov-I) process with relatively high adjacent-sample correlation. A KLT for this model is well approximated by the DCT. As the adjacent-sample correlation approaches one, this KLT approaches the DCT.

For the AR(1) process with ρ = 0.91 and N = 8, the decorrelation efficiency of the DCT is η_d = 98.05% (for the KLT, η_d = 100% by design). The basis restriction errors are given in Table 1.

C. Transform Coding Performance

In this section we show how to distribute optimally an allowable maximum bit rate to the transform coefficients in Figure 4 such that the average distortion is minimized, and quantify the distortion. Since a unitary transform preserves Euclidean length, it is straightforward to show that the distortion introduced by quantization in the transform domain is the same as the mean square error of the reconstructed signal (Huang and Schultheiss, 1963; Zelinski and Noll, 1977). Denoting the quantized transform coefficient vector by Ŝ, and the reconstructed signal vector by ŝ, the average distortion is (cf. Equation (30))

$$D = \frac{1}{N} E[(S - \hat{S})^H (S - \hat{S})] = \frac{1}{N} E[(s - \hat{s})^T (s - \hat{s})]. \qquad (57)$$

TABLE 1
BASIS RESTRICTION ERROR (%) FOR KLT AND DCT

L      0     1     2     3     4     5     6     7
KLT  100  20.5   8.9   5.2   3.3   2.1   1.3  0.61
DCT  100  20.7   9.1   5.2   3.3   2.2   1.3  0.61
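The DCT row of Table 1 can be reproduced numerically; the sketch below (assuming NumPy; not from the original text) evaluates the basis restriction error of Equation (34) for the AR(1) model with ρ = 0.91 and N = 8:

```python
import numpy as np

N, rho = 8, 0.91
n = np.arange(N)
Cs = rho ** np.abs(n[:, None] - n[None, :])

# Unitary DCT matrix, Equation (56).
A = np.sqrt(2.0 / N) * np.cos((2 * n[None, :] + 1) * n[:, None] * np.pi / (2 * N))
A[0, :] = np.sqrt(1.0 / N)

# Rank-ordered coefficient variances, then Equation (34) in percent.
var = np.sort(np.diag(A @ Cs @ A.T))[::-1]
err = [100.0 * var[L:].sum() / var.sum() for L in range(N)]
print([f"{e:.1f}" for e in err])   # compare with the DCT row of Table 1
```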


For sufficiently fine quantization, it is shown in Appendix B, Equation (119), that the distortion D(k) of the k-th transform coefficient depends on the allocated bit rate R(k) by

$$D(k) = \gamma(k) \cdot \sigma_S^2(k) \cdot 2^{-2R(k)}. \qquad (58)$$

The required bit rate for a given maximum distortion then is

$$R(k) = \frac{1}{2} \log_2[\gamma(k)] + \frac{1}{2} \log_2\left[ \frac{\sigma_S^2(k)}{D(k)} \right]. \qquad (59)$$

The parameters γ(k) depend on the distribution of the coefficients and the type of quantization. Assuming a Gaussian signal, the transform coefficients are also Gaussian. (Transform coefficients perfectly decorrelated by a KLT are then also statistically independent.) The γ(k) are then all identical, γ(k) = γ, and the rate simplifies to

$$R(k) = \frac{1}{2} \log_2[\gamma] + \frac{1}{2} \log_2\left[ \frac{\sigma_S^2(k)}{D(k)} \right]. \qquad (60)$$

Minimizing the average distortion

$$\bar{D} = \frac{1}{N} \sum_{k=0}^{N-1} D(k) \qquad (61)$$

subject to a fixed average rate

$$\bar{R} = \frac{1}{N} \sum_{k=0}^{N-1} R(k) \qquad (62)$$

yields that all transform coefficients have to be quantized with the same distortion D(k) = D̄, k = 0, ..., N−1. The optimum bit rate for the kth transform coefficient is

$$R(k) = \bar{R} + \frac{1}{2} \log_2\left[ \frac{\sigma_S^2(k)}{\sigma_{GM}^2} \right], \qquad (63)$$

where σ_GM² is the geometric mean of the transform coefficient variances introduced in Equation (43). (Potential negative rates for low-variance coefficients may be clipped; see, e.g., Zelinski and Noll, 1977; Goyal, 2001.)
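The allocation rule of Equation (63) can be sketched as follows (assuming NumPy; not from the original text, with an illustrative average rate of 2 bits). Rates are left unclipped here, so by construction they average exactly to R̄:

```python
import numpy as np

N, rho, R_bar = 8, 0.91, 2.0
n = np.arange(N)
Cs = rho ** np.abs(n[:, None] - n[None, :])

# DCT coefficient variances for the AR(1) model.
A = np.sqrt(2.0 / N) * np.cos((2 * n[None, :] + 1) * n[:, None] * np.pi / (2 * N))
A[0, :] = np.sqrt(1.0 / N)
var = np.diag(A @ Cs @ A.T)

gm = np.exp(np.mean(np.log(var)))        # geometric mean, Equation (43)
R = R_bar + 0.5 * np.log2(var / gm)      # optimal rates, Equation (63)
print(np.round(R, 2))
```

High-variance (low-index) coefficients receive more than the average rate, low-variance coefficients less; the geometric-mean normalization guarantees that the budget R̄ is met exactly.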


Inserting this result into Equation (58), and with D(k) = D̄, we obtain for the distortion as a function of rate given optimal bit allocation

$$\bar{D} = \gamma \cdot 2^{-2\bar{R}} \cdot \sigma_{GM}^2. \qquad (64)$$

As we saw above, σ_GM² is minimized by the KLT; hence, the KLT is the transform minimizing the distortion under optimal bit allocation. To quantify the performance of a transform coder, the optimal transform coding distortion is compared to the distortion D_PCM of PCM. In the latter, the transform matrix can formally be set to the identity matrix. Then, the transform coefficients are identical to the signal samples, and the coefficient variances are identical to the signal variance σ_s². We thus obtain for the transform coding gain

$$G_{TC} = \frac{\sigma_s^2}{\sigma_{GM}^2} = \frac{(1/N) \sum_{k=0}^{N-1} \sigma_S^2(k)}{\left( \prod_{k=0}^{N-1} \sigma_S^2(k) \right)^{1/N}}, \qquad (65)$$

where the rightmost identity follows from the energy preservation property of unitary transforms. For an AR(1) process with ρ = 0.91 and N = 8, the transform coding gains of DCT and KLT are 4.6334 (6.66 dB) and 4.668 (6.69 dB), respectively. Evidently, the DCT is a very good approximation to the KLT. Experiments show that this result holds also for covariance matrices estimated from real speech or image data (Zelinski and Noll, 1977; Malvar, 1992b; Clarke, 1985; Akansu and Haddad, 2001).
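A sketch (assuming NumPy; not from the original text) of the coding gain computation of Equation (65) for DCT and KLT on this model; the gains quoted above, about 4.63 and 4.67, should be recovered:

```python
import numpy as np

N, rho = 8, 0.91
n = np.arange(N)
Cs = rho ** np.abs(n[:, None] - n[None, :])

def coding_gain(A):
    """Equation (65): arithmetic over geometric mean of coefficient variances."""
    var = np.diag(A @ Cs @ A.T)
    return var.mean() / np.exp(np.mean(np.log(var)))

A_dct = np.sqrt(2.0 / N) * np.cos((2 * n[None, :] + 1) * n[:, None] * np.pi / (2 * N))
A_dct[0, :] = np.sqrt(1.0 / N)

lam, U = np.linalg.eigh(Cs)
A_klt = U.T                               # KLT rows: eigenvectors of C_s

g_dct, g_klt = coding_gain(A_dct), coding_gain(A_klt)
print(f"G_TC(DCT) = {g_dct:.3f}, G_TC(KLT) = {g_klt:.3f}")
```

Since the KLT minimizes the geometric mean of the variances, its gain upper-bounds that of any other orthonormal transform.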

IV. TWO-DIMENSIONAL TRANSFORMS

So far we have considered only 1D signals and their transformations. In this section we generalize to 2D signals. Let s(m, n) denote a real signal defined over the 2D block m, n = 0, ..., N−1, and S(k, l) the transform coefficients for k, l = 0, ..., N−1. (Without loss of generality, the restriction to square blocks simplifies notation.) Signal samples and transform coefficients can be regarded as N × N matrices s and S, respectively. The basis vectors a_k = [a_k(0), ..., a_k(N−1)]^T, k = 0, ..., N−1, are then replaced by basis matrices b_kl = [b_kl(m, n)]. The transform coefficients are calculated by

$$S(k,l) = \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} s(m,n)\, b_{kl}(m,n). \qquad (66)$$


With a 4D transform tensor T, this can be expressed as (Malvar, 1992b, p. 22)

$$S = \mathcal{T} s, \quad \text{with} \quad [\mathcal{T}]_{klmn} = b_{kl}(m,n). \qquad (67)$$

Alternatively, we can order the signal samples row by row into an N²-dimensional column vector s_v as

$$s_v = [s(0,0), s(0,1), \ldots, s(0,N-1), s(1,0), \ldots, s(N-1,N-1)]^T. \qquad (68)$$

Similarly, a transform coefficient vector S_v can be formed. Ordering the entries b_kl(m, n) in an appropriate order in an N² × N² matrix B, we can express the 2D transform as a product of a matrix with a vector as

$$S_v = B s_v. \qquad (69)$$

Clearly, for real signals and transforms, this product requires O(N⁴) multiplications and additions.

In practice, however, so-called separable 2D transforms are used almost exclusively. The entries b_kl(m, n) of the (k+1), (l+1)th basis matrix of a separable transform are calculated from 1D basis vector entries by b_kl(m, n) = a_k(m)·a_l(n). For the unitary 2D DFT, this yields

$$b_{kl}(m,n) = \frac{1}{N} e^{-j(2\pi/N)(km + ln)}, \qquad (70)$$

and for the 2D DCT, we obtain

$$b_{kl}(m,n) = \begin{cases} \dfrac{1}{N} & \text{for } k = l = 0 \\[2mm] \dfrac{\sqrt{2}}{N} \cos\left( \dfrac{(2m+1) k \pi}{2N} \right) & \text{for } l = 0,\; k = 1, \ldots, N-1 \\[2mm] \dfrac{\sqrt{2}}{N} \cos\left( \dfrac{(2n+1) l \pi}{2N} \right) & \text{for } k = 0,\; l = 1, \ldots, N-1 \\[2mm] \dfrac{2}{N} \cos\left( \dfrac{(2m+1) k \pi}{2N} \right) \cos\left( \dfrac{(2n+1) l \pi}{2N} \right) & \text{for } k, l = 1, \ldots, N-1 \end{cases}$$

$$m, n = 0, \ldots, N-1. \qquad (71)$$


The matrix B in Equation (69) can then be written as the Kronecker product of the N x N transform matrix A for a 1D signal of length N with itself:

$$B = A \otimes A = \begin{pmatrix} a_{00} A & a_{01} A & \cdots & a_{0,N-1} A \\ \vdots & & & \vdots \\ a_{N-1,0} A & a_{N-1,1} A & \cdots & a_{N-1,N-1} A \end{pmatrix}. \qquad (72)$$

In the tensor notation of Equation (67), the transform simplifies to the product of three N × N matrices as

$$S = A s A^T, \qquad (73)$$

where the multiplication from the right by A^T is a transform of the rows of s, while the multiplication from the left by A transforms the columns. The 2D transform can hence be realized by a 1D transform along each row of the signal block followed by a 1D transform along each column of the result, or vice versa. Evidently, the number of multiplications and additions needed by Equation (73) is O(N³), down from O(N⁴) for the nonseparable transforms, if no fast algorithms are used.
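The equivalence of the separable form (73) with the matrix-vector form (69) can be verified numerically (a sketch assuming NumPy, not from the original text; np.kron realizes the Kronecker product of Equation (72)):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8
n = np.arange(N)

# Unitary 1D DCT matrix, Equation (56).
A = np.sqrt(2.0 / N) * np.cos((2 * n[None, :] + 1) * n[:, None] * np.pi / (2 * N))
A[0, :] = np.sqrt(1.0 / N)

s = rng.standard_normal((N, N))            # a random N x N signal block

S_sep = A @ s @ A.T                        # separable O(N^3) form, Eq. (73)
B = np.kron(A, A)                          # N^2 x N^2 matrix, Eq. (72)
S_v = B @ s.reshape(-1)                    # direct O(N^4) form, Eq. (69)

# Row-by-row stacking (Eq. (68)) makes both orderings agree.
assert np.allclose(S_sep.reshape(-1), S_v, atol=1e-10)
```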

As an illustration, Figure 7 depicts the real part of a basis matrix for the DFT computed from Equation (70) and a basis matrix for the DCT according to Equation (71). A comparison shows that in the case of a real transform, separability comes at a price: while the DFT basis matrix exhibits an unambiguous orientation, this is not the case for the DCT, which consists of two cosine waves with different orientations. The separable 2D DFT is

FIGURE 7. Left: Real part of a basis matrix of the 2D DFT for N = 16, k = l = 2. Right: 2D DCT basis matrix for N = 16, k = l = 4.


therefore unambiguously orientation selective, while the separable 2D DCT basis matrices are sensitive to two different orientations. Unambiguous orientation selectivity is desired in applications like adaptive enhancement of oriented structures, such as lines and edges. More on this topic can be found in Section V.E, and in Kunz and Aach (1999) and Aach and Kunz (1996a, 2000).

In the following, we will consider only separable transforms. Since these can always be implemented as a sequence of 1D transforms, we will return to the 1D notation for the remainder of this chapter.

V. LAPPED TRANSFORMS

A. Block Diagonal Transforms

In the preceding discussion it was sufficient to express the transform operations with respect to single blocks. The development of lapped transforms will require the joint consideration of several neighboring blocks. Denoting the (m+1)th block by s_m = [s(mN), s(mN+1), ..., s(mN+N−1)]^T, a signal s_t consisting of M blocks can be written as s_t = [s_0^T, s_1^T, ..., s_{M−1}^T]^T. Similarly, with S_m = A·s_m being the transform coefficients for the (m+1)th block and stacking these, we obtain

$$S_t = \begin{pmatrix} S_0 \\ S_1 \\ \vdots \\ S_{M-1} \end{pmatrix} = \begin{pmatrix} A & & & 0 \\ & A & & \\ & & \ddots & \\ 0 & & & A \end{pmatrix} \begin{pmatrix} s_0 \\ s_1 \\ \vdots \\ s_{M-1} \end{pmatrix} = T \cdot s_t, \qquad (74)$$

where the matrix T = diag(A, ..., A) is block diagonal. The inverse transform is given by

$$s_t = T^T S_t, \quad \text{with} \quad T^T = \mathrm{diag}(A^T, \ldots, A^T), \qquad (75)$$

where we have assumed a real transform. Evidently, orthogonality of the blockwise transform can also be expressed as orthogonality of the transform matrix T.


B. Extension to Lapped Transforms

As already shown in Figure 1, independent block processing may create artifacts at the block boundaries. These are caused by the discontinuous transitions to zero at the ends of the transform basis functions (Malvar, 1992b; Aach and Kunz, 2000). Block artifacts could hence be avoided by using basis functions that decay smoothly to zero. Perfect reconstruction by an inverse transform then requires that the basis functions of neighboring blocks overlap, as otherwise "holes" would appear in the reconstructed signal. The basis functions would thus have lengths L > N, while the number of transform coefficients per block must, of course, not exceed N. The square matrix A is then replaced by a nonsquare matrix P of size N × L. We consider now L = 2N. The basis functions for calculating S_m then extend over the blocks s_m and s_{m+1}, i.e., over the samples [s(mN), s(mN+1), ..., s((m+2)N−1)]. The N-dimensional vector S_m of transform coefficients is then given by

$$S_m = P \begin{pmatrix} s_m \\ s_{m+1} \end{pmatrix}. \qquad (76)$$

The next block is taken over the samples [s((m+1)N), ..., s((m+3)N−1)], and so on. This procedure is illustrated in Figure 8. Such a transform is called a lapped transform.

Since P is not a square matrix, we cannot invert Equation (76) to obtain s_m and s_{m+1} from S_m. We therefore formulate the transform with respect to the entire signal (or image). Writing the N × 2N matrix P as the concatenation P = [A B] of two N × N matrices, Equation (74) becomes for a lapped transform

$$S_t = \begin{pmatrix} S_0 \\ S_1 \\ \vdots \\ S_{M-1} \end{pmatrix} = \begin{pmatrix} A & B & 0 & \cdots & 0 \\ 0 & A & B & & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ 0 & & & A & B \\ B & 0 & \cdots & 0 & A \end{pmatrix} \begin{pmatrix} s_0 \\ s_1 \\ \vdots \\ s_{M-1} \end{pmatrix} = T \cdot s_t, \qquad (77)$$

where the wrap-around in the last row corresponds to a periodic repetition of the signal. As in block transforms, T is a square matrix, which we


FIGURE 8. Formation of signal blocks s_m and transform vectors S_m in a lapped transform with basis functions of length L = 2N.

require to be orthogonal, i.e., T·T^T = I. The original image can thus be reconstructed by

$$s_t = T^T S_t = \begin{pmatrix} A^T & 0 & \cdots & 0 & B^T \\ B^T & A^T & 0 & & 0 \\ 0 & B^T & A^T & & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & B^T & A^T \end{pmatrix} \cdot S_t. \qquad (78)$$

This relation shows that the inverse transform consists of two steps. First, each N-dimensional transform vector S_m is multiplied by the 2N × N matrix P^T, yielding a 2N-dimensional signal vector. Neighboring signal vectors overlap by N samples, and are added in a second step to obtain the reconstructed image. Alternatively, Equation (78) may be regarded as another lapped transform applied to the data S_t, yielding

$$s_m = [B^T\ A^T] \begin{pmatrix} S_{m-1} \\ S_m \end{pmatrix} = B^T S_{m-1} + A^T S_m, \qquad (79)$$

which is of the same structure as Equation (76).

C. The Lapped Orthogonal Transform

The matrix product T·T^T yields a block tridiagonal matrix, with entries P·P^T along the main diagonal, entries A·B^T along the diagonal


immediately to the left, and entries B·A^T along the diagonal immediately to the right. From the orthogonality condition T·T^T = I, the necessary and sufficient conditions on P = [A B] therefore are

$$P \cdot P^T = A A^T + B B^T = I \quad \text{and} \quad A B^T = B A^T = 0. \qquad (80)$$

For T·T^T = I, we can equivalently write T^T·T = I, from which an alternative formulation of the necessary and sufficient conditions can be derived:

$$A^T A + B^T B = I \quad \text{and} \quad A^T B = B^T A = 0. \qquad (81)$$

We may also approach the orthogonality conditions by rewriting Equation (76) to

$$S_m = P \begin{pmatrix} s_m \\ s_{m+1} \end{pmatrix} = [A\ B] \begin{pmatrix} s_m \\ s_{m+1} \end{pmatrix} = A s_m + B s_{m+1}. \qquad (82)$$

Inserting this into Equation (79), we obtain

$$s_m = B^T A s_{m-1} + A^T B s_{m+1} + (A^T A + B^T B) s_m. \qquad (83)$$

This equality holds with condition (81). The first condition in Equation (80) states that the rows of P, i.e., the transform basis functions, must be orthogonal, while the second condition requires the overlapping parts of the basis functions to be orthogonal as well. A transform complying with Equation (80) is called a lapped orthogonal transform (LOT). Invoking the shift matrix V defined as

$$V = \begin{pmatrix} 0 & I \\ 0 & 0 \end{pmatrix}, \qquad (84)$$

where 0 and I are of size N × N, conditions (80) can be more compactly written as

$$P V^m P^T = \delta(m)\, I, \quad m = 0, 1. \qquad (85)$$

Extending the above considerations towards lapped transforms with basis functions of lengths L = KN, K = 2, 3, ..., the matrix P then has size N × L, and condition (85) becomes (Malvar and Staelin, 1989; Malvar, 1992b)

$$P V^m P^T = \delta(m)\, I, \quad m = 0, 1, 2, \ldots, K-1, \qquad (86)$$


where the identity matrix in the shift matrix V is now of order (K-1)N. Of course, for K = 1 this notation includes traditional nonoverlapping block transforms as a special case.

If P_0 is a valid LOT matrix, it can be used to generate more valid LOT matrices P by P = Z·P_0, where Z is an orthogonal N × N matrix. P will then also comply with condition (86), since

$$P V^m P^T = Z P_0 V^m P_0^T Z^T = Z\, \delta(m)\, I\, Z^T = \delta(m)\, I. \qquad (87)$$

In the following, we construct a valid LOT of order N with basis functions of length L = 2N. To obtain a transform which can be realized by a fast algorithm, the initial matrix P_0 is constructed from the unitary DCT basis functions of length N. As we have seen in Section III.B, half of the DCT basis functions are even symmetric, while the other half are odd. Stacking the even basis functions rowwise into the (N/2) × N matrix D_e, and the odd ones into the matrix D_o, a valid LOT matrix is (Malvar, 1992a; Akansu and Wadas, 1992; Akansu and Haddad, 2001)

$$P_0 = \frac{1}{2} \begin{pmatrix} D_e - D_o & (D_e - D_o) J \\ D_e - D_o & -(D_e - D_o) J \end{pmatrix}, \qquad (88)$$

where J is the counter identity matrix (or reverse operator) already used in Section III.B. The matrix P_0 is of size N × 2N, where, similar to KLT and DCT, the basis functions in the first N/2 rows are even, while the other N/2 basis functions are odd. It satisfies condition (86), but it will not optimize the transform coding gain of, for example, an AR(1) process. Hence, for a given covariance model C_s, the orthogonal square matrix Z is determined such that its rows are identical to the eigenvectors of the covariance matrix P_0 C_s P_0^T. The LOT Z·P_0 thus consists of two steps: a transform by P_0 followed by another transform by Z. The covariance matrix C_S = Z P_0 C_s P_0^T Z^T then is diagonal. Note, however, that the LOT does not preserve the determinant of the covariance matrix. Figure 9 shows the basis functions for N = 8 and L = 16 as computed for an AR(1) process with ρ = 0.91. The coding gain of this LOT is 5.06 (7.05 dB).

The fast implementation of this transform reflects its two-step structure (Malvar, 1992b): the matrix P_0 is realized using an N-point DCT, which is followed by a series of plane rotations used to approximate Z (Akansu and Haddad, 2001; Akansu and Wadas, 1992). The numerical values of the basis functions of the approximate LOT can be found for ρ = 0.95 in Malvar (1992b, p. 171).
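The construction of Equation (88) and the LOT conditions of Equation (80) can be checked numerically (a sketch assuming NumPy; not from the original text, with illustrative names):

```python
import numpy as np

N = 8
n = np.arange(N)

# Unitary DCT matrix, Equation (56); even k gives even-symmetric rows.
D = np.sqrt(2.0 / N) * np.cos((2 * n[None, :] + 1) * n[:, None] * np.pi / (2 * N))
D[0, :] = np.sqrt(1.0 / N)

De, Do = D[0::2, :], D[1::2, :]           # even- and odd-symmetric basis rows
J = np.fliplr(np.eye(N))                  # counter identity (reverse operator)

Dd = De - Do
P0 = 0.5 * np.block([[Dd, Dd @ J],
                     [Dd, -(Dd @ J)]])    # N x 2N initial LOT matrix, Eq. (88)

A, B = P0[:, :N], P0[:, N:]               # P0 = [A B]
assert np.allclose(P0 @ P0.T, np.eye(N), atol=1e-10)       # orthogonal rows
assert np.allclose(A @ B.T, np.zeros((N, N)), atol=1e-10)  # overlap condition
```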



FIGURE 9. Basis functions of the LOT for N = 8 and L = 16. The computation of the basis functions is based on an AR(1) signal model with ρ = 0.91. The functions are sorted from left to right and top to bottom in descending order of the eigenvalues of P_0 C_s P_0^T.

D. The Modulated Lapped Transform

The above LOT was derived by an eigenvector analysis, leading to basis functions with even or odd symmetry. An alternative approach is motivated by the close relationship between maximally decimated filter banks on the one hand and block and lapped transforms on the other (Akansu and Haddad, 2001, p. 4; Malvar, 1992b). In filter banks, the filters are often realized by a low-pass prototype shifted to N different frequency channels by modulation. In the context of lapped transforms, this leads to the so-called modulated lapped transform (MLT) if the filter length L is equal to 2N. For longer filters (or basis functions), this transform is referred to as the extended lapped transform (ELT).

For L = 2N, the basis functions are formed by a cosine modulated window function h(n), leading to the N × 2N transform matrix P with entries

$$[P]_{kn} = h(n) \sqrt{\frac{2}{N}} \cos\left[ \left( n + \frac{N+1}{2} \right) \left( k + \frac{1}{2} \right) \frac{\pi}{N} \right], \qquad (89)$$



FIGURE 10. Basis functions of the MLT for N = 8 and L = 16. The frequency index k increases from left to right and top to bottom.

for k = 0, ..., N−1, and n = 0, ..., 2N−1. The window h(n) obeys

$$h(n) = \sin\left[ \left( n + \frac{1}{2} \right) \frac{\pi}{2N} \right]. \qquad (90)$$

These basis functions are shown in Figure 10. Evidently, they are not symmetric any more. Still, the half-sine window ensures a continuous transition towards zero at the ends of the basis functions. In the following, we will show that this choice of basis functions complies with the orthogonality conditions (81).

The window function obeys the conditions

$$h^2(n) + h^2(n + N) = 1 \qquad (91)$$

and

$$h(n) = h(2N - 1 - n). \qquad (92)$$


Arranging the window samples into two diagonal N × N matrices H_0 and H_1, we obtain

$$H_0 = \mathrm{diag}[h(0), h(1), \ldots, h(N-1)]$$

$$H_1 = \mathrm{diag}[h(N), h(N+1), \ldots, h(2N-1)] = \mathrm{diag}[h(N-1), h(N-2), \ldots, h(0)] = J H_0 J, \qquad (93)$$

where J H_0 J reverses both rows and columns of H_0. The modulating cosines are arranged into the N × N matrices Q_0 and Q_1, yielding

[Q0]_kn = √(2/N) cos[(π/N)(n + (N + 1)/2)(k + 1/2)],  k, n = 0, ..., N − 1  (94)

and

[Q1]_kn = √(2/N) cos[(π/N)(n + N + (N + 1)/2)(k + 1/2)],  k, n = 0, ..., N − 1.  (95)

Expressing the transformation matrix P as the concatenation P = [A B], we obtain

A = Q0H0  and  B = Q1H1.  (96)

For Qo and Q1, the conditions

Q0^T Q1 = Q1^T Q0 = 0,  (97)

Q0^T Q0 = Q0 Q0^T = I − J,  (98)

and

Q1^T Q1 = Q1 Q1^T = I + J  (99)

hold (see Appendix D). Inserting these into condition (81), we obtain

A^T B = H0 Q0^T Q1 H1 = 0  (100)


and

A^T A + B^T B = H0 Q0^T Q0 H0 + H1 Q1^T Q1 H1

= H0[I − J]H0 + H1[I + J]H1

= H0² + H1² − H0JH0 + H1JH1 = I  (101)

since H0² + H1² = I and H0JH0 = H1JH1. This shows that the MLT complies with the orthogonality conditions for the LOT. For the AR(1) model with ρ = 0.91, the MLT coding gain is 5.15 (7.12 dB).
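As a quick numerical check, the matrix P of Equations (89) and (90) can be built and tested against the orthogonality conditions (81) and the window condition (91). This is an illustrative sketch assuming NumPy; N = 8 is an example value.

```python
import numpy as np

# Build the MLT matrix P = [A B] for N = 8, L = 2N = 16, per Eqs. (89), (90),
# and check A^T B = 0 and A^T A + B^T B = I (conditions (81)).
N = 8
n = np.arange(2 * N)
h = np.sin((n + 0.5) * np.pi / (2 * N))           # half-sine window, Eq. (90)

k = np.arange(N).reshape(-1, 1)                   # frequency index, one row per k
P = h * np.sqrt(2.0 / N) * np.cos(
    (np.pi / N) * (n + (N + 1) / 2) * (k + 0.5))  # Eq. (89), shape N x 2N

A, B = P[:, :N], P[:, N:]
print(np.allclose(A.T @ B, 0))                    # orthogonality of the halves
print(np.allclose(A.T @ A + B.T @ B, np.eye(N)))  # overlap condition
print(np.allclose(h[:N]**2 + h[N:]**2, 1.0))      # window condition, Eq. (91)
```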

E. Extensions

In this section we discuss three extensions of the MLT and the LOT by introducing additional basis functions which are in a certain sense complementary to the already existing ones.

In the MLT, reconstruction from the transform vector S_m = P[s_m^T s_{m+1}^T]^T only leads to

[ŝ_m; ŝ_{m+1}] = P^T S_m = [A^T; B^T][A B][s_m; s_{m+1}]  (102)

With Equations (96), (98), and (99), we obtain

P^T P = [A^T; B^T][A B] = [A^T A, A^T B; B^T A, B^T B] = [H0(I − J)H0, 0; 0, H1(I + J)H1]  (103)

where 0 is of size N × N. This matrix is evidently not diagonal, thus mixing coefficients from s_m with different time indices into one coefficient of the reconstructed vector ŝ_m, and similarly for ŝ_{m+1}. By analogy to frequency-domain aliasing, where higher frequencies are mapped back onto lower ones during downsampling, this phenomenon is called time-domain aliasing. Since the MLT perfectly reconstructs the entire signal s_t by adding the overlapping signals obtained by individual inverse transforms, time-domain aliasing in the reconstruction from S_m is hence canceled by the reconstructions from S_{m−1} and S_{m+1} (time-domain aliasing cancellation, TDAC). This observation holds if the transform vectors S_i are left unchanged. Frequency-domain processing of the transform coefficients unbalances the time-domain aliasing components contained in the ŝ_i, thus resulting in uncanceled time-domain aliasing in the reconstructed signal. In general, the more strongly the transform coefficients are changed during processing, the larger these uncanceled aliasing components become. Keeping uncanceled


aliasing below an acceptable threshold hence restricts how strongly the transform coefficients can be processed. As an example, acoustic echo cancellation using the MLT is mentioned in Malvar (1999), where the occurrence of uncanceled aliasing limits the maximum echo reduction to no more than 10 dB.
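The TDAC mechanism described above can be illustrated numerically: with unmodified coefficients, overlap-add of the individually (imperfectly) inverse-transformed blocks recovers all interior samples exactly. A minimal sketch assuming NumPy; the block size and test signal are arbitrary example choices.

```python
import numpy as np

# Transform overlapping 2N-sample segments with the MLT, invert each block
# with P^T, and overlap-add. Each individual block is reconstructed only up
# to time-domain aliasing; the aliasing of adjacent blocks cancels (TDAC).
N = 8
n = np.arange(2 * N)
h = np.sin((n + 0.5) * np.pi / (2 * N))
k = np.arange(N).reshape(-1, 1)
P = h * np.sqrt(2.0 / N) * np.cos((np.pi / N) * (n + (N + 1) / 2) * (k + 0.5))

rng = np.random.default_rng(0)
s = rng.standard_normal(6 * N)

rec = np.zeros_like(s)
for m in range(0, len(s) - N, N):                 # hop size N, 2:1 overlap
    S_m = P @ s[m:m + 2 * N]                      # forward MLT of one segment
    rec[m:m + 2 * N] += P.T @ S_m                 # inverse transform + overlap-add

# Interior samples are exact; the first and last N samples lack an
# overlapping partner block and are therefore not reconstructed perfectly.
print(np.allclose(rec[N:-N], s[N:-N]))
```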

In Malvar (1999) the MLT is therefore extended by replacing the real basis functions by complex ones defined as

[P]_kn = √(1/N) h(n) exp{−j(π/N)(n + (N + 1)/2)(k + 1/2)}.  (104)

The resulting transform is called the modulated complex lapped transform (MCLT). The inverse transform is carried out by the Hermitian transpose P^H, yielding for Equation (102)

[ŝ_m; ŝ_{m+1}] = P^H S_m = P^H P [s_m; s_{m+1}]  (105)

with (Malvar, 1999)

P^H P = diag[h²(n)] = [H0², 0; 0, H1²]  (106)

which is a diagonal matrix. Time-domain aliasing therefore does not occur, which allows a stronger degree of processing. Superposition of the reconstructed signal vectors only compensates for the effects of the window h(n). In Malvar (1999) the MCLT permits one to reduce echo by 20 dB, compared to only 5 dB with the MLT. The price to pay is a redundancy by a factor of two, since the MCLT transforms N real signal samples into N complex transform coefficients.
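A numerical sketch of Equation (106): with the normalization √(1/N) used in Equation (104) and taking the real part of P^H P (the reconstructed signal is real), the product is exactly diag[h²(n)], so no time-domain aliasing terms appear. The normalization is our choice for this sketch; Malvar (1999) fixes the exact scaling. Assumes NumPy; N = 8 is an example value.

```python
import numpy as np

# Build the complex MCLT basis of Eq. (104) and check that the real part of
# P^H P is the diagonal matrix diag[h^2(n)] of Eq. (106).
N = 8
n = np.arange(2 * N)
h = np.sin((n + 0.5) * np.pi / (2 * N))
k = np.arange(N).reshape(-1, 1)
P = h * np.sqrt(1.0 / N) * np.exp(
    -1j * (np.pi / N) * (n + (N + 1) / 2) * (k + 0.5))  # Eq. (104)

G = (P.conj().T @ P).real        # real part, since the signal is real
print(np.allclose(G, np.diag(h**2)))
```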

In Young and Kingsbury (1993) a similar extension, termed the complex lapped transform (CLT), is proposed for the 2D LOT. The objective is to estimate motion in image sequences by phase correlation between blocks. Since the use of a lapped transform implies smoothly windowed overlapping blocks, smoother motion fields are expected in comparison to motion estimation techniques using nonoverlapping blocks. The transform generates a redundancy by a factor of two in each dimension, resulting in a total redundancy of four.

We finally discuss an extension of the 2D MLT which makes the transform unambiguously orientation sensitive. As we have seen in Section IV, the basis functions of real separable 2D transforms are sensitive to two different orientations. For image enhancement and restoration, however, unambiguous detection of oriented structures is often desired. This can be achieved by complementing the cosine-shaped basis functions by sine-shaped ones.

The basis functions of the separable, 2D MLT are given by

[P]_klmn = (2/N) h(m)h(n) cos[(π/N)(m + (N + 1)/2)(k + 1/2)] · cos[(π/N)(n + (N + 1)/2)(l + 1/2)]  (107)

where k, l = 0, ..., N − 1 and m, n = 0, ..., 2N − 1. Replacing the cosine functions by sine functions leads to the complementary basis functions

[P′]_klmn = (2/N) h(m)h(n) sin[(π/N)(m + (N + 1)/2)(k + 1/2)] · sin[(π/N)(n + (N + 1)/2)(l + 1/2)]  (108)

The basis functions of the new, orientation-selective transform are formed by

[P^{L+}]_klmn = [P]_klmn + [P′]_klmn

= (2/N) h(m)h(n) cos[(π/N){(m + (N + 1)/2)(k + 1/2) − (n + (N + 1)/2)(l + 1/2)}]  (109)

which is an unambiguously oriented windowed cosine wave. Since the [P^{L+}]_klmn cover only half the possible orientations, we additionally form the basis functions

[P^{L−}]_klmn = [P]_klmn − [P′]_klmn

= (2/N) h(m)h(n) cos[(π/N){(m + (N + 1)/2)(k + 1/2) + (n + (N + 1)/2)(l + 1/2)}]  (110)

This transform is termed the lapped directional transform (LDT) (Kunz and Aach, 1999; Aach and Kunz, 2000). The relation between the MLT and LDT basis functions is illustrated in Figure 11. The LDT is real-valued, but not separable. However, both forward and inverse LDT can be computed from the separable fast MLTs in Equations (107) and (108). The LDT generates a redundancy by a factor of only two, and was successfully used for anisotropic image restoration and enhancement in Kunz and Aach (1999) and Aach and Kunz (2000). In a combined image restoration, enhancement, and compression framework, the processed LDT coefficients can be reconverted into the coefficients of the MLT and the complementary MLT by a simple butterfly. Using only the MLT coefficients for compression eliminates the redundancy problem (Aach and Kunz, 2000).

FIGURE 11. Example 2D MLT and LDT basis functions for N = 8, and k = 3, l = 2: (a) MLT, (b) MLT′, (c) LDT (sum), (d) LDT (difference).
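The sum in Equation (109) can be checked numerically via the identity cos a cos b + sin a sin b = cos(a − b). A sketch assuming NumPy; N, k, l are example values.

```python
import numpy as np

# Build the separable cosine-cosine MLT basis function (107), its sine-sine
# complement (108), and verify that their sum is the single oriented cosine
# wave of Eq. (109).
N, k, l = 8, 3, 2
m = np.arange(2 * N).reshape(-1, 1)               # vertical sample index
n = np.arange(2 * N).reshape(1, -1)               # horizontal sample index
h = np.sin((np.arange(2 * N) + 0.5) * np.pi / (2 * N))
w = np.outer(h, h)                                # separable 2D window h(m)h(n)

a = (np.pi / N) * (m + (N + 1) / 2) * (k + 0.5)
b = (np.pi / N) * (n + (N + 1) / 2) * (l + 0.5)
P_cc = (2.0 / N) * w * np.cos(a) * np.cos(b)      # Eq. (107)
P_ss = (2.0 / N) * w * np.sin(a) * np.sin(b)      # Eq. (108)
P_sum = (2.0 / N) * w * np.cos(a - b)             # Eq. (109)

print(np.allclose(P_cc + P_ss, P_sum))
```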

VI. IMAGE RESTORATION AND ENHANCEMENT

In this section we compare the block FFT and the LDT within a framework for anisotropic noise reduction by a nonlinear spectral-domain filter. The noisy input image is first decomposed into blocks of size 32 × 32 pixels, which are then transformed by the FFT or the LDT. The observed noisy transform coefficients are then attenuated depending on their observed signal-to-noise ratio: the more the magnitude of a coefficient exceeds a corresponding noise estimate, the less it is attenuated. Since directional image information leads to spectral energy concentration, which can unambiguously be detected in both FFT and LDT (but not in real separable transforms, like the DCT, LOT, and MLT), coefficients contributing to oriented lines and edges can be identified and treated more carefully than others. These algorithms are discussed in detail elsewhere (Aach and Kunz, 1996a, 1998, 2000; Aach, 2000; Kunz and Aach, 1999). Figure 12 shows an original image and its noisy version (white Gaussian noise, peak signal-to-noise ratio 20.2 dB). The processed images are shown in Figure 13. Evidently, processing by the FFT without block overlap reduces the noise level visibly, but the rather strong processing causes the block raster to appear. (In Aach and Kunz (1996a,b) the authors therefore used overlapping blocks, inflating the processed data volume by a factor of four.) The LDT-based processing result reduces noise approximately as much as the FFT-based approach, i.e., by about 6 dB, without causing block artifacts. Enlargements of both processing results are shown in Figure 14.

FIGURE 13. Left: Processing result for the noisy "Marcel" image using the block FFT with no overlap. The blocking effect is evident. Right: Processing result for the noisy "Marcel" image using the LDT. The noise reduction performance is almost identical to that of the FFT-based algorithm, but the blocking effect has disappeared.

FIGURE 14. Enlarged versions of the FFT-processed (left) and LDT-processed (right) noisy "Marcel" image.
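For illustration only, the following sketch applies a simplified Wiener-like attenuation rule to a single FFT block; it is a stand-in for, not a reproduction of, the published spectral magnitude estimation filters, and all names, the gain rule, and the test image are our own choices.

```python
import numpy as np

# Illustrative sketch: attenuate FFT coefficients of a 32 x 32 block the
# less, the more their observed power exceeds the expected noise power.
def denoise_block(block, noise_var):
    S = np.fft.fft2(block)
    power = np.abs(S) ** 2
    n_pix = block.size
    # With numpy's unnormalized FFT, white noise of variance noise_var has
    # expected power n_pix * noise_var in every frequency bin.
    gain = np.maximum(0.0, 1.0 - n_pix * noise_var / np.maximum(power, 1e-12))
    return np.real(np.fft.ifft2(gain * S))

rng = np.random.default_rng(1)
clean = np.outer(np.sin(np.arange(32) / 3.0), np.ones(32))  # oriented stripes
noisy = clean + 0.5 * rng.standard_normal((32, 32))
out = denoise_block(noisy, 0.25)

# The oriented structure concentrates its energy in few strong coefficients,
# which pass nearly unattenuated, while noise-only coefficients are suppressed.
print(np.mean((out - clean) ** 2) < np.mean((noisy - clean) ** 2))
```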

VII. DISCUSSION

In this chapter we have summarized the development of lapped transforms. We started with the continuous-time and discrete-time Fourier transforms of time-dependent signals with infinite duration. These transforms were viewed as a decomposition of the signals into frequency-selective basis functions, or eigenfunctions of LTI systems. With the discrete Fourier transform, which decomposes a finite-length signal block into a set of orthogonal basis functions, a transform could be expressed as a multiplication of the signal vector by a unitary matrix, i.e., viewed as a rotation of coordinate axes. We then analyzed the effects of unitary transforms on the covariance structure of random signals, and found optimal transforms with respect to decorrelation and energy concentration. While these optimal transforms are signal dependent and cannot be computed by fast algorithms, we showed that Fourier-like fixed transforms, in particular the DCT, are good practical approximations to the optimal transforms. The disadvantage of blockwise processing is the blocking artifacts introduced by independent spectral-domain processing of the blocks. To alleviate the blocking effects, we then turned to finite-length transforms with overlapping basis functions. The transform matrix for a single block then is no longer square; inverse transforms of single blocks therefore do not exist. Under extended orthogonality conditions, however, it was shown that the original signal can be reconstructed from nonperfectly reconstructed individual blocks by overlapping and adding. Two types of lapped transforms were discussed, the lapped orthogonal transform and the modulated lapped transform, where we focused on a 2:1 overlap. For the LOT, a feasible rectangular matrix obeying the extended orthogonality conditions was first constructed using DCT basis functions. This matrix was then optimized by multiplication with an orthogonal square matrix derived from an eigenvector analysis. The MLT did not need an eigenvector analysis; rather, it was based on modulated filter banks. We did not delve deeper into the relation between block transforms and filter banks. Suffice it to mention that a block transform can be viewed as a uniform critically sampled filter bank, where the filter length is equal to the number of subbands. Similarly, a lapped transform can be regarded as a uniform and critically sampled filter bank with filter length equal to, for example, twice the number of subbands.

We then discussed extensions of both the LOT and the MLT in speech and image processing. These extensions are based on the additional use of complementary basis functions, thus introducing redundancy. We concluded with an exemplary comparison of block and lapped transforms in image processing.

ACKNOWLEDGMENTS

The author is grateful to Cicero Mota, formerly with the University of Amazonas, Brazil, and now with the University of Lübeck, and to Dietmar Kunz, Cologne University of Applied Sciences, for fruitful discussions.

APPENDIX A

To prove that Equation (23) indeed recovers s(n) from its frequency coefficients, we multiply both sides of Equation (22) by e^{j(2π/N)kr}, sum over all frequency coefficients, and normalize by N, yielding

(1/N) Σ_{k=0}^{N−1} S_DFT(k) e^{j(2π/N)kr} = Σ_{n=0}^{N−1} s(n) · (1/N) Σ_{k=0}^{N−1} e^{j(2π/N)(r−n)k}  for r = 0, ..., N − 1,  (111)


where we have interchanged the order of summations on the right-hand side. The orthogonality of complex sinusoids

(1/N) Σ_{k=0}^{N−1} e^{j(2π/N)(r−n)k} = { 1 for r − n = 0, N, 2N, ...; 0 otherwise } = δ(n − (r − mN))  (112)

yields

(1/N) Σ_{k=0}^{N−1} S_DFT(k) e^{j(2π/N)kr} = Σ_{n=0}^{N−1} s(n) δ(n − (r − mN)) = s(r),  (113)

which concludes the proof.
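The orthogonality relation (112) and the inversion (113) are easily checked numerically. A sketch assuming NumPy; N = 16 is an example value.

```python
import numpy as np

# Numerical check of the orthogonality of the complex sinusoids and the
# resulting DFT inversion, for an arbitrary signal.
N = 16
rng = np.random.default_rng(0)
s = rng.standard_normal(N)

n = np.arange(N)
W = np.exp(-2j * np.pi * np.outer(n, n) / N)      # DFT matrix
S = W @ s                                         # forward DFT

# Orthogonality: (1/N) sum_k exp(j 2 pi (r - n) k / N) = delta(r - n)
ortho = (W.conj() @ W) / N
print(np.allclose(ortho, np.eye(N)))

# Inversion: s(r) = (1/N) sum_k S_DFT(k) exp(j 2 pi k r / N)
print(np.allclose((W.conj() @ S) / N, s))
```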

APPENDIX B

Figure 15 shows a scalar uniform quantizer with quantization interval or step size Δ. A transform coefficient S(k) is quantized to multiples V(i(k)) = i(k)Δ (Goyal, 2001, p. 13; Gray and Neuhoff, 1998). The output of the quantizer hence is an index i(k) = round(S(k)/Δ), where round(x) rounds to the nearest integer. The decoder calculates the quantized transform coefficient values by Ŝ(k) = i(k)Δ = V(i(k)). Assuming sufficiently fine quantization, the error d(k) = S(k) − Ŝ(k) can be assumed to be uniformly distributed between −Δ/2 and Δ/2. Defining the distortion D(k) as D(k) = E[d²(k)], where E denotes the expectation, we obtain

D = Δ²/12.  (114)

FIGURE 15. Uniform quantization into multiples of the step size Δ.

Consider S(k) uniformly distributed over [−S_max, S_max). Its energy σ²(k) then is

σ²(k) = (2S_max)²/12.  (115)

Dividing the dynamic range [−S_max, S_max) into steps of size Δ yields 2S_max/Δ quantization steps. Assuming the number of steps to be a power of two, a fixed-length code needs

R = log₂(2S_max/Δ)  (116)

bits per transform coefficient. The distortion then depends on the rate R according to

D = σ²(k) · 2^{−2R}  (117)

and the signal-to-distortion ratio is

σ²(k)/D = 2^{2R}  ⟹  10 log₁₀(σ²(k)/D) dB = R · 6 dB.  (118)

Each additional bit hence improves this ratio by 6 dB (Lüke, 1999, p. 204; Proakis and Manolakis, 1996, Sect. 9.2.3). In fact, it can be shown that optimal quantizers perform in accordance with (Goyal, 2001; Gray and Neuhoff, 1998)

D = γ σ²(k) 2^{−2R},  i.e.,  10 log₁₀(σ²(k)/D) dB = R · 6 dB − 10 log₁₀ γ dB,  (119)

where γ is a factor depending on the distribution of the input signal and on the encoding method. For instance, for a Gaussian source and fixed-length encoding of i(k), we have γ = √3π/2 ≈ 2.7. Using an entropy code yields γ = πe/6 ≈ 1.42; this improves the signal-to-distortion ratio by about 2.8 dB over the fixed-length code (Goyal, 2001, p. 14; Jayant and Noll, 1984).
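The Δ²/12 distortion of Equation (114) and the 6 dB-per-bit rule of Equation (118) can be reproduced by a small Monte Carlo sketch (assuming NumPy; the sample count and rates are example choices).

```python
import numpy as np

# Uniform quantization of a uniformly distributed coefficient: check that
# the measured signal-to-distortion ratio is close to R * 6.02 dB.
rng = np.random.default_rng(0)
s_max = 1.0
S = rng.uniform(-s_max, s_max, 1_000_000)

for R in (4, 6, 8):
    delta = 2 * s_max / 2**R              # from Eq. (116): R = log2(2 s_max / delta)
    i = np.round(S / delta)               # encoder index i(k)
    D = np.mean((S - i * delta) ** 2)     # distortion, close to delta^2 / 12
    sdr_db = 10 * np.log10(np.var(S) / D)
    print(R, round(sdr_db, 2))            # roughly 6 dB per bit
```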

APPENDIX C

To eliminate the potential discontinuities in the periodic repetition of s(n), n = 0, ..., N − 1, we form the concatenated signal of length 2N

g(n) = { s(n) for n = 0, ..., N − 1;  s(2N − 1 − n) for n = N, ..., 2N − 1 }.  (120)

FIGURE 16. Concatenation g(n) of the cosine wave in Figure 3 and its mirrored version according to Equation (120). Note that always g(N − 1) = g(N).

Figure 16 shows the concatenated signal g(n) for the cosine wave in Figure 3. Note that the last coefficient of s(n), i.e., s(N − 1) = g(N − 1), is repeated as g(N); the concatenation therefore is not a perfect cosine wave. Periodic repetition of g(n) will not exhibit unwanted discontinuities, so the DFT of g(n) should not be afflicted by leakage artifacts. Also, if s(n) is of even length, so is g(n), which is convenient when one wants to use fast FFT-like implementations. Moreover, g(n) is symmetric with respect to n = N − 0.5. The DFT G(k), k = 0, ..., 2N − 1, of g(n) should therefore be real apart from a complex linear phase factor e^{jπk/2N}, and even symmetric (recall that s(n) is assumed to be real). Indeed, we have for G(k)

g(n) ∘—• G(k) = Σ_{n=0}^{N−1} s(n) e^{−j(πk/N)n} + Σ_{m=N}^{2N−1} s(2N − 1 − m) e^{−j(πk/N)m},  (121)

which, after substituting n = 2N − 1 − m in the second sum, yields

G(k) = Σ_{n=0}^{N−1} s(n)[e^{−j(πk/N)n} + e^{j(πk/N)(n+1)}].  (122)

Factoring out the complex linear phase factor caused by the (N − 0.5)-point circular shift, we obtain

G(k) = e^{jπk/2N} · 2 Σ_{n=0}^{N−1} s(n) cos(πk(n + 1/2)/N),  k = 0, ..., 2N − 1.  (123)

Leaving off the complex exponential factor (this corresponds to a reverse circular shift of g(n) by N − 0.5 points) and normalizing to achieve a unitary transform leads to the DCT as defined in Equation (56). Because of the symmetry of the DFT coefficients, the coefficients for k = 0, ..., N − 1 suffice. Figure 17 shows |G(k)| for the extended cosine wave in Figure 16; this is proportional to the modulus DCT spectrum of the signal in Figure 3. When comparing the spectra in Figures 3 and 17, the reduction of leakage is immediately evident.

The DCT can hence be regarded as a DFT after modifying the signal so that discontinuities do not occur in the periodic extension. Another consequence is that the DCT can be computed efficiently using FFT algorithms where, from Equation (56), no complex number operations are needed any more. Exploiting the symmetry of the concatenated signal g(n), the 2N-point DFT G(k) can actually be computed by an N-point DFT (Lim, 1990, p. 153). The above observations also hold for the inverse DCT.
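The mirror-and-DFT construction of Equations (120) through (123) can be verified numerically: after stripping the linear phase factor, the 2N-point DFT of g(n) is real and equals the cosine sum. A sketch assuming NumPy; N and the signal are example choices.

```python
import numpy as np

# Build g(n) by mirroring s(n) as in Eq. (120), take its 2N-point DFT,
# and strip the linear phase factor exp(j pi k / 2N) of Eq. (123).
N = 16
rng = np.random.default_rng(0)
s = rng.standard_normal(N)

g = np.concatenate([s, s[::-1]])                  # Eq. (120)
G = np.fft.fft(g)                                 # 2N-point DFT, Eq. (121)

k = np.arange(2 * N)
C = G * np.exp(-1j * np.pi * k / (2 * N))         # remove the phase factor

n = np.arange(N)
direct = 2 * np.cos(np.pi * np.outer(k, n + 0.5) / N) @ s   # cosine sum of Eq. (123)

print(np.allclose(C.imag, 0, atol=1e-9))          # phase-corrected DFT is real
print(np.allclose(C.real, direct))
```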


FIGURE 17. Modulus DFT of the extended cosine in Figure 16, for k = 0, ..., N − 1 = 63, which is proportional to the DCT of the cosine wave in Figure 3. Note the improved concentration of spectral energy with respect to the DFT spectrum in Figure 3.

APPENDIX D

With the notation

α(k, m) = (π/N)(m + (N + 1)/2)(k + 1/2)  (124)

and

β(k, n) = (π/N)(n + N + (N + 1)/2)(k + 1/2),  (125)

we have for the (m + 1, n + 1)th element of Q0^T Q1

[Q0^T Q1]_mn = (2/N) Σ_{k=0}^{N−1} cos α(k, m) cos β(k, n)

= (1/N) Σ_{k=0}^{N−1} {cos[α(k, m) + β(k, n)] + cos[α(k, m) − β(k, n)]}.

With

γ(k) = α(k, m) + β(k, n) = (π/N)(m + n + 1)(k + 1/2) + (2k + 1)π  (126)


and 0 < m + n + 1 < 2N, the sum Σ_{k=0}^{N−1} cos γ(k) extends over i periods if m + n + 1 = 2i is an even number, and is thus zero. For m + n + 1 = 2i + 1 odd, the sequence cos γ(k), k = 0, ..., N − 1, is an odd sequence, and again sums to zero. Similarly, the sum over cos[α(k, m) − β(k, n)] is zero, which proves Equation (97).

The entries of Q0^T Q0 are

[Q0^T Q0]_mn = (2/N) Σ_{k=0}^{N−1} cos α(k, m) cos α(k, n)

= (1/N) Σ_{k=0}^{N−1} {cos[α(k, m) − α(k, n)] + cos[α(k, m) + α(k, n)]},  (127)

where

Σ_{k=0}^{N−1} cos[α(k, m) − α(k, n)] = { 0 for m ≠ n; N for m = n }  (128)

and

Σ_{k=0}^{N−1} cos[α(k, m) + α(k, n)] = { 0 for m + n ≠ N − 1; −N for m + n = N − 1 },  (129)

from which Equation (98) follows. The proof of Equation (99) is similar.
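Equations (97) through (99) can also be confirmed numerically. A sketch assuming NumPy; N = 8 is an example value.

```python
import numpy as np

# Build the modulation matrices Q0, Q1 of Eqs. (94), (95) and check the
# orthogonality relations (97)-(99); J is the counter-identity.
N = 8
k = np.arange(N).reshape(-1, 1)
m = np.arange(N)

Q0 = np.sqrt(2.0 / N) * np.cos((np.pi / N) * (m + (N + 1) / 2) * (k + 0.5))
Q1 = np.sqrt(2.0 / N) * np.cos((np.pi / N) * (m + N + (N + 1) / 2) * (k + 0.5))

I = np.eye(N)
J = np.fliplr(I)                                  # counter-identity
print(np.allclose(Q0.T @ Q1, 0))                  # Eq. (97)
print(np.allclose(Q0.T @ Q0, I - J))              # Eq. (98)
print(np.allclose(Q1.T @ Q1, I + J))              # Eq. (99)
```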

REFERENCES

Aach, T. (2000). Transform-based denoising and enhancement in medical x-ray imaging. European Signal Processing Conference, EURASIP, Tampere, Finland, edited by M. Gabbouj, and P. Kuosmanen, pp. 1085-1088.

Aach, T., and Kunz, D. (1996a). Anisotropic spectral magnitude estimation filters for noise reduction and image enhancement. Proc. ICIP-96, Lausanne, Switzerland, pp. 335-338.

Aach, T., and Kunz, D. (1996b). Spectral estimation filters for noise reduction in x-ray fluoroscopy imaging. Proc. EUSIPCO-96, Trieste, Italy, edited by G. Ramponi, G. L. Sicuranza, S. Carrato, and S. Marsi, pp. 571-574.

Aach, T., and Kunz, D. (1998). Spectral amplitude estimation-based x-ray image restoration: An extension of a speech enhancement approach, in Proc. EUSIPCO-98, Patras, edited by S. Theodoridis, I. Pitas, A. Stouraitis, and N. Kalouptsidis, pp. 323-326.

Aach, T., and Kunz, D. (2000). A lapped directional transform for spectral image analysis and its application to restoration and enhancement. Signal Processing 80(11), 2347-2364.

Ahmed, N., Natarajan, T., and Rao, K. R. (1974). Discrete cosine transform. IEEE Trans. Computers 23, 90-93.


Akansu, A. N., and Haddad, R. A. (2001). Multiresolution Signal Decomposition. Boston: Academic Press.

Akansu, A. N., and Wadas, F. E. (1992). On lapped orthogonal transforms. IEEE Trans. Signal Processing 40(2), 439-443.

Bamler, R. (1989). Mehrdimensionale lineare Systeme. Berlin: Springer Verlag.

Cantoni, A., and Butler, P. (1976). Properties of the eigenvectors of persymmetric matrices with applications to communication theory. IEEE Trans. Communications 24(8), 804-809.

Cappé, O. (1994). Elimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor. IEEE Trans. Speech and Audio Processing 2(2), 345-349.

Clarke, R. J. (1985). Transform Coding of Images. London: Academic Press.

Clarke, R. J., and Tech, B. (1981). Relation between the Karhunen-Loève and cosine transforms. IEE Proc. 128(6), 359-360.

Ephraim, Y., and Malah, D. (1984). Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Trans. Acoustics, Speech, and Signal Processing 32(6), 1109-1121.

Fukunaga, K. (1972). Introduction to Statistical Pattern Recognition. New York: Academic Press.

Goyal, V. K. (2001). Theoretical foundations of transform coding. IEEE Signal Processing Magazine September, 9-21.

Gray, R. M., and Neuhoff, D. L. (1998). Quantization. IEEE Trans. Information Theory 44, 2325-2383.

Huang, J., and Schultheiss, P. (1963). Block quantization of correlated Gaussian random variables. IEEE Trans. Communication Systems 11, 289-296.

Jain, A. K. (1979). A sinusoidal family of unitary transforms. IEEE Trans. Pattern Analysis and Machine Intelligence 1(4), 356-365.

Jayant, N. S., and Noll, P. (1984). Digital Coding of Waveforms. Englewood Cliffs, NJ: Prentice Hall.

Kunz, D., and Aach, T. (1999). Lapped directional transform: A new transform for spectral image analysis. Proc. ICASSP-99, Phoenix, AZ, pp. 3433-3436.

Lim, J. S. (1980). Image restoration by short space spectral subtraction. IEEE Trans. Acoustics, Speech, and Signal Processing 28(2), 191-197.

Lim, J. S. (1990). Two-Dimensional Signal and Image Processing. Englewood Cliffs, NJ: Prentice-Hall.

Lim, J. S., and Oppenheim, A. V. (1979). Enhancement and bandwidth compression of noisy speech. Proc. IEEE 67(12), 1586-1604.

Lüke, H. D. (1999). Signalübertragung. Berlin, Heidelberg, New York: Springer Verlag.

Makhoul, J. (1981). On the eigenvectors of symmetric Toeplitz matrices. IEEE Trans. Acoustics, Speech, and Signal Processing 29(4), 868-872.

Malvar, H. (1999). A modulated complex lapped transform and its application to audio processing. Proc. ICASSP-99, Phoenix, AZ, pp. 1421-1424.

Malvar, H. S. (1992a). Extended lapped transforms: Properties, applications, and fast algorithms. IEEE Trans. Signal Processing 40(11), 2703-2714.

Malvar, H. S. (1992b). Signal Processing with Lapped Transforms. Norwood, MA: Artech House.

Malvar, H. S., and Staelin, D. H. (1989). The LOT: Transform coding without blocking effects. IEEE Trans. Acoustics, Speech, and Signal Processing 37(4), 553-559.

Oppenheim, A. V., and Schafer, R. W. (1998). Discrete-Time Signal Processing. Englewood Cliffs, NJ: Prentice Hall.

Papoulis, A. (1968). Systems and Transforms with Applications in Optics. New York: McGraw Hill.


Proakis, J. G., and Manolakis, D. G. (1996). Digital Signal Processing. Upper Saddle River, NJ: Prentice Hall.

Rabbani, M., and Jones, P. W. (1991). Digital Image Compression Techniques. Bellingham: SPIE Optical Engineering Press.

Ray, W. D., and Driver, R. M. (1970). Further decomposition of the Karhunen-Loève series representation of a stationary random process. IEEE Trans. Information Theory 16(4), 845-850.

Therrien, C. W. (1989). Decision, Estimation, and Classification. New York: Wiley.

Therrien, C. W. (1992). Discrete Random Signals and Statistical Signal Processing. Englewood Cliffs, NJ: Prentice-Hall.

Unser, M. (1984). On the approximation of the discrete Karhunen-Loève transform for stationary processes. Signal Processing 7, 231-249.

van Compernolle, D. (1992). DSP techniques for speech enhancement. Proc. Speech Processing in Adverse Conditions, Cannes-Mandelieu, pp. 21-30.

Young, R. W., and Kingsbury, N. G. (1993). Frequency domain motion estimation using a complex lapped transform. IEEE Trans. Image Processing 2(1), 2-17.

Zelinski, R., and Noll, P. (1977). Adaptive transform coding of speech signals. IEEE Trans. Acoustics, Speech, and Signal Processing ASSP-25(4), 299-309.

Ziemer, R. E., Tranter, W. H., and Fannin, D. R. (1989). Signals and Systems: Continuous and Discrete. New York: Macmillan.