Wavelets Shane Scott Kansas State University Honors Program 2012


  • Wavelets

    Shane Scott
    Kansas State University Honors Program

    2012

  • Abstract

    Fourier spectral analysis provides methods of signal compression and approximation, but presents some limitations to practical implementations. The Fourier transform of a signal may be thought of as its representation over a (pseudo) basis of complex exponentials, which, while localized in frequency space, are delocalized in time. In contrast with the Fourier basis, a wavelet basis can be both frequency and time localized. Such a basis allows transformations between representations to be calculated locally, and partial representations can often be calculated even when only a portion of the signal is known. We demonstrate and compare low-loss signal compression using both methods. Through a discussion of the advantages of wavelet representations we motivate a theoretical discussion of alternative bases and give proofs of their construction in the space of square summable complex sequences and the space of square Lebesgue integrable functions.

  • Contents

    1 Introduction: Wavelet Compression of a Digital Signal

    2 Discrete Wavelets

    3 Continuous Wavelets

    4 Appendix
        4.1 Lebesgue Measure and Integration
        4.2 Hilbert Spaces
        4.3 $L^2(\mathbb{R})$ as a Hilbert Space

  • List of Figures

    1.1 A Walken Signal $z$
    1.2 The Walken Spectrum $|\hat{z}|$
    1.3 The 90% Compressed Walken Spectrum $|\hat{z}|$
    1.4 The 90% Compressed Walken Signal $z$
    1.5 The 95% Compressed Walken Spectrum $|\hat{z}|$
    1.6 The 95% Compressed Walken Signal $z$
    1.7 Christopher Walken on SNL
    1.8 The Mexican Hat wavelet in standard space
    1.9 The Mexican Hat wavelet in frequency space
    1.10 The D4 wavelet in space
    1.11 The D4 wavelet in frequency space
    1.12 1st Stage Wavelet Representation $z^{(1)}$
    1.13 2nd Stage Wavelet Representation $z^{(2)}$
    1.14 3rd Stage Wavelet Representation $z^{(3)}$
    1.15 4th Stage Wavelet Representation $z^{(4)}$
    1.16 90% Compressed 4th Stage Wavelet Representation $z^{(4)}$
    1.17 90% Wavelet Compressed $z$
    1.18 95% Compressed 5th Stage Wavelet Representation $z^{(5)}$
    1.19 95% Compressed $z^{(5)}$
    1.20 Comparison % Error by Compression Ratio

    Acknowledgement

    This work was supported by a scholarship from the Center for the Integration of Undergraduate, Graduate, and Postdoctoral Research of the Mathematics Department at Kansas State University under the direction of Dr. Virginia Naibo and in collaboration with Brian Moore and Vincent Pigno.

  • Notation

    $\mathbb{Z}$  the set of integers
    $\mathbb{Z}^+$  the set of positive integers
    $\mathbb{N}$  the set of non-negative integers
    $\mathbb{R}$  the set of real numbers
    $\mathbb{R}^+$  the set of positive real numbers
    $\mathbb{R}^+_0$  the set of non-negative real numbers
    $\mathbb{C}$  the set of complex numbers
    $\mathcal{P}(A)$  the set of all subsets of $A$
    $A^c$  the set complement of $A$
    $\ell^1(\mathbb{Z})$  the set of absolutely summable sequences
    $\ell^2(\mathbb{Z})$  the set of square summable sequences
    $\ell^2(\mathbb{Z}_N)$  $N$ length complex vectors
    $L^2(A)$  the set of square integrable complex valued functions on $A$
    $\mathrm{clos}_d A$  the closure of $A$ in the topology induced by the metric $d$
    $\sum_{a \in A} z(a)$  the summation of $z$ over the set $A$
    $\prod_{a \in A} z(a)$  the product of $z$ over the set $A$
    $\int_A f(x)\,dx$  the Lebesgue integral of $f$ over $A$
    $\langle v | w \rangle$  inner product of $v$ and $w$
    $v * w$  convolution of $v$ and $w$
    $v \perp w$  $v$ orthogonal to $w$
    $V \oplus W$  orthogonal direct sum of $V$ and $W$
    $|z|$  modulus of $z$
    $\|z\|$  norm of $z$
    $\bar{z}$  complex conjugate of $z$
    $\tilde{z}$  conjugate reflection of $z$
    $R_n z$  $n$th translation of $z$
    $\hat{z}$  Fourier or Discrete Fourier transform of $z$
    $\check{f}$  inverse Fourier transform of $f$
    $f : A \to B$  $f$ is a set function from $A$ to $B$
    $f : a \mapsto b$  $f$ maps $a$ to $b$
    $\delta$  Dirac delta sequence
    $I_n$  $n \times n$ identity matrix

  • Chapter 1

    Introduction: Wavelet Compression of a Digital Signal

    Informally, wavelets are vectors in a space whose translations and dilations span the space. Often wavelets may be chosen which are localized in both their temporal and frequency representations. Wavelets are of interest both to theory and application, so it may be best to demonstrate by an example. The following informal presentation of wavelet applications to data compression techniques may help to motivate the theoretical discussion of their existence and construction that follows.

    In an increasingly data-filled age, few are unfamiliar with the concepts of streaming or data transfer, and more and more people are routinely introduced to them merely through internet exposure, even while the underlying mathematics goes unrecognized. The efficiency of such data transmission is of great concern with the current proliferation of sites streaming music, video, and photos. Signals (such as electrical and optical signals speeding through cables and antennas across the country) are best modeled as vectors in the space:

    $$L^2(\mathbb{R}) = \Big\{ v : \mathbb{R} \to \mathbb{C} \;\Big|\; \int_{\mathbb{R}} |v(t)|^2 \, dt < \infty \Big\}$$ […]

  • Figure 1.1: A Walken Signal $z$ (horizontal axis: Time, in samples, scale $\times 10^5$; vertical axis: Signal)

    These are simply $N$ length vectors of complex values. We will develop some of the theory of wavelets in these spaces. But for now, let's consider the signal $z \in \ell^2(\mathbb{Z}_N)$ shown in Figure 1.1.

    The signal $z$ represents a digital sound signal of Christopher Walken performing a line from an April 8, 2000, Saturday Night Live sketch. (Press the play button in Figure 1.1 to listen to the clip.) Suppose we are faced with the problem of storing a great many such signals in a database, or transmitting the signal as fast as possible. Lossy compression techniques for such problems look for ways of representing signals (or at least an approximation of them) with as little information as possible. A fairly simple method of digital compression is Fourier compression.

    The discrete Fourier transform of $z$ is $\hat{z}$, also an $N$ length vector, given by the formula

    $$\hat{z}(m) = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} z(n)\, e^{-2\pi i mn/N}. \tag{1.1}$$

    This transformation is reversed by the inverse transformation

    $$z(n) = \frac{1}{\sqrt{N}} \sum_{m=0}^{N-1} \hat{z}(m)\, e^{2\pi i mn/N} = \left\langle \frac{e^{-i2\pi mn/N}}{\sqrt{N}} \,\middle|\, \hat{z} \right\rangle, \tag{1.2}$$

    where the inner product is taken over the index $m$, and we can then think of $\hat{z}$ as the representation of $z$ over a basis of complex exponentials (essentially sine waves):

    $$z(n) = \sum_{m=0}^{N-1} \left\langle \frac{e^{i2\pi mn/N}}{\sqrt{N}} \,\middle|\, z \right\rangle \frac{e^{i2\pi mn/N}}{\sqrt{N}},$$

    while $z$ is its own representation over the standard basis. In this sense $z$ is the temporal or spatial representation while $\hat{z}$ is the frequency space representation of $z$. The component-wise absolute value of $\hat{z}$ is called the spectrum of $z$. The spectrum of Mr. Walken's line is shown in Figure 1.2.

    Figure 1.2: The Walken Spectrum $|\hat{z}|$ (horizontal axis: Frequency, scale $\times 10^5$; vertical axis: Signal Fourier Transform)
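    The transform pair (1.1) and (1.2) can be checked numerically. The following is a minimal sketch, assuming the symmetric $1/\sqrt{N}$ normalization on both the forward and inverse transform; the function names `dft_unitary` and `idft_unitary` and the random test vector are illustrative and do not come from the original text.

```python
import numpy as np

def dft_unitary(z):
    """Evaluate Eq. 1.1 directly, with the symmetric 1/sqrt(N) factor."""
    z = np.asarray(z, dtype=complex)
    N = z.size
    n = np.arange(N)
    # E[m, n] = exp(-2*pi*i*m*n/N) / sqrt(N)
    E = np.exp(-2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)
    return E @ z

def idft_unitary(z_hat):
    """Evaluate Eq. 1.2 directly (the conjugate kernel, same normalization)."""
    z_hat = np.asarray(z_hat, dtype=complex)
    N = z_hat.size
    n = np.arange(N)
    E_inv = np.exp(2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)
    return E_inv @ z_hat

# Illustrative test vector (an assumption, not the Walken signal).
rng = np.random.default_rng(0)
z = rng.standard_normal(64)
z_hat = dft_unitary(z)
z_back = idft_unitary(z_hat)
```

    With this normalization the transform preserves the norm of $z$, and it should agree with NumPy's `np.fft.fft(z, norm="ortho")`, which is far faster for long signals.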

    Notice that a small number of positions of $\hat{z}$ have a large value, while many have values close to zero. If we discard many of these small positions and replace them with zero, we can still perform the inverse Fourier transform to obtain a very close approximation of $z$. By transmitting only the positions of $\hat{z}$ above some threshold value, we have much less data to transmit, but can still reconstruct a very close approximation to $z$. The Fourier compression algorithm at this point goes as follows. To compress $z$ by $p$ percent and send the result:

    1. Calculate $\hat{z}$.

    2. Order the values of $\hat{z}$ by their moduli and calculate $a$, their $p$th percentile.

    3. If a component of $\hat{z}$ is less than $a$ in absolute value, discard it and replace it with 0.

    4. Transmit the positions and values of the non-zero positions of $\hat{z}$.

    5. Reconstruct the approximation by taking the inverse Fourier transform.

    Figure 1.3: The 90% Compressed Walken Spectrum $|\hat{z}|$ (panel title: 90% Compressed Spectrum; horizontal axis: Frequency; vertical axis: Signal Fourier Transform)
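    The five steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming NumPy's unitary FFT as the transform and a synthetic two-tone test signal in place of the Walken clip; the names `fourier_compress` and `fourier_reconstruct` are hypothetical.

```python
import numpy as np

def fourier_compress(z, p):
    """Keep only DFT coefficients at or above the p-th percentile in modulus."""
    z_hat = np.fft.fft(z, norm="ortho")        # step 1: calculate the transform
    a = np.percentile(np.abs(z_hat), p)        # step 2: p-th percentile of the moduli
    keep = np.abs(z_hat) >= a                  # step 3: everything below a is discarded
    positions = np.flatnonzero(keep)           # step 4: positions and values to transmit
    return positions, z_hat[positions]

def fourier_reconstruct(positions, values, N):
    """Step 5: rebuild the sparse spectrum and invert it."""
    z_hat = np.zeros(N, dtype=complex)
    z_hat[positions] = values
    return np.fft.ifft(z_hat, norm="ortho")

# Synthetic stand-in signal (an assumption): two tones plus weak noise.
rng = np.random.default_rng(0)
N = 1024
t = np.arange(N)
z = (np.sin(2 * np.pi * 5 * t / N)
     + 0.5 * np.sin(2 * np.pi * 40 * t / N)
     + 0.01 * rng.standard_normal(N))

positions, values = fourier_compress(z, 90)
z_approx = fourier_reconstruct(positions, values, N).real
rel_error = np.linalg.norm(z - z_approx) / np.linalg.norm(z)
```

    For this test signal roughly a tenth of the coefficients survive, and because the two tones dominate the spectrum the relative error stays small. A real audio clip will generally behave less favorably.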

    Figures 1.3 and 1.5 show the 90% and 95% compressed spectrum of $z$ in blue while the original is shown for comparison in red. (The thickness of the graph lines in the figures obscures the sparsity of the non-zero coefficients post-compression, but only 10% and 5% remain, respectively.) Figures 1.4 and 1.6 show the reconstructed approximation to $z$ in blue with the original $z$ in red. Press the Play button by each graph to listen. Thus we can reduce by 95% or more the amount of data describing $z$ and still have a very close approximation to Mr. Walken's line.

    This technique is good, but still has a number of drawbacks. Note first that in the compressed versions of $z$, as we increase the compression ratio the signal to background ratio gets much worse; that is, most of the information loss occurs on the most important parts of the signal. But for modern applications like web streaming, this method has an even bigger problem. Note that in the inversion formula for $z$ from $\hat{z}$ (Eq. 1.2) each component of $z$ must be calculated using every component of $\hat{z}$, so the inversion calculation cannot begin until all of $\hat{z}$ is transmitted, which is decidedly not how web streaming operates. We call this problem the time delocalization of the complex exponential basis. Without going into detail, we can say compression of video signals works much the same way. Take a peek at this video clip of Mr. Walken delivering his line (Figure 1.7).

    Although this example of Mr. Walken is two dimensional, the two dimensional case is very similar to the one dimensional case; a Fourier transform exists and the coefficients depend on every spatial pixel, that is, they are not spatially localized. Observe that the background behind Mr. Walken changes very little throughout the clip. A better algorithm might avoid sending redundant information to update the background, but because the values of $\hat{z}$ depend on all components of $z$ (see Eq. 1.1), we cannot avoid sending some of this information. That is, the standard basis which expresses the time representation $z$ is delocalized in the frequency space. But a modification of the compression algorithm can overcome these drawbacks.

    Figure 1.4: The 90% Compressed Walken Signal $z$ (panel title: 90% Compressed Walken Signal; horizontal axis: Time; vertical axis: Signal)

    Figure 1.5: The 95% Compressed Walken Spectrum $|\hat{z}|$ (panel title: 95% Compressed Spectrum; horizontal axis: Frequency; vertical axis: Signal Fourier Transform)

    Figure 1.6: The 95% Compressed Walken Signal $z$ (panel title: 95% Compressed Walken Signal; horizontal axis: Time; vertical axis: Signal)

    The compression algorithm we have expressed depends only on having a basis over which the representation of $z$ has as many zeros or small values as possible, so instead we can hope to find a basis which is localized both in the standard space and in the frequency space. Qualitatively this means that we will find a basis of vectors which have only a small section of non-zero components and whose Fourier transforms have only a small section of non-zero coefficients. One example is the Mexican Hat wavelet, whose time and frequency representations are shown in Figures 1.8 and 1.9. A basis may be formed via its spatial translations, which will avoid the localization objection raised to the complex exponential basis. A detailed technical discussion of this may be found in the succeeding chapters.

    Figure 1.7: Christopher Walken on SNL

    Figure 1.8: The Mexican Hat wavelet in standard space

    Figure 1.9: The Mexican Hat wavelet in frequency space

    For discrete finite length vectors we form a first stage wavelet basis by taking the even spatial translations of two vectors $u$ and $v$. We will choose $u$ such that only low frequency components are non-zero, while $v$ will have high frequency components non-zero. Figure 1.10 shows one such example, Daubechies's D4 wavelet filter pair.

    Figure 1.10: The D4 wavelet in space. (Left) High frequency filter. (Right) Low frequency filter.

    Figure 1.11: The D4 wavelet in frequency space. (Left) High frequency filter spectrum. (Right) Low frequency filter spectrum.

    Then the components of $z$ represented over the translations of $u$ will contain an $N/2$ length approximation of $z$, while the components of $z$ represented over the translations of $v$ will contain the details of $z$. Figure 1.12 shows the representation of $z$ in the wavelet basis. The first half is an approximation of $z$, while the second contains the details needed to reconstruct $z$. We can then revise the compression algorithm to discard the detail components of the wavelet representation of $z$. This will (most of the time) maintain the original signal to noise ratio of $z$, which addresses the first objection we raised to the Fourier compression algorithm. But by discarding only details, we are limited to 50% compression. By decomposing the $N/2$ length wavelet approximation over an $N/2$ length wavelet basis, and iterating this process, we can continue to produce more detail components which can be discarded to compress the signal. Our new algorithm to compress $z$ by $p$ percent and transmit is as follows.

    Let $\ell$ be the minimal integer satisfying $p \le 1 - 2^{-\ell}$ and assume $N$, the length of $z$, satisfies $2^{\ell} \mid N$. Let $z^{(0)} = z$. Let $p' = p / (1 - 2^{-\ell})$.

    1. While $1 \le k \le \ell$:

       a. Form $z^{(k)}$ from $z^{(k-1)}$ by replacing the length $2^{-(k-1)}N$ low frequency approximation of $z^{(k-1)}$ (its first $2^{-(k-1)}N$ components) by its representation over a $2^{-(k-1)}N$ length first stage wavelet basis.

    2. Order the detail components of $z^{(\ell)}$ (the last $N(1 - 2^{-\ell})$ components) by their moduli and calculate $a$, their $p'$th percentile.

    3. If a detail component of $z^{(\ell)}$ is less than $a$ in modulus, discard it and replace it with 0.

    4. Transmit the positions and values of the non-zero positions of $z^{(\ell)}$.

    5. Reconstruct the $N$ length approximation of $z$ by taking the inverse wavelet transformations.
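    The steps above can be sketched as follows. This is a minimal illustration under several assumptions that are not in the original text: the D4 low frequency filter uses the standard published Daubechies coefficients, the filters are wrapped periodically onto $\mathbb{Z}_M$, the high frequency filter is the sister $v(k) = (-1)^{k-1}\overline{u(1-k)}$, $p$ is passed as a fraction, and each stage is applied as a dense matrix for clarity rather than speed. All function names are hypothetical.

```python
import numpy as np

SQ3 = np.sqrt(3.0)
# Standard Daubechies D4 low frequency filter coefficients.
D4 = np.array([1 + SQ3, 3 + SQ3, 3 - SQ3, 1 - SQ3]) / (4 * np.sqrt(2.0))

def first_stage_matrix(M, u=D4):
    """Rows: even translates R_{2k}u, then R_{2k}v, as vectors on Z_M (M even)."""
    idx = np.arange(M)
    u_per = np.zeros(M)
    for k, c in enumerate(u):
        u_per[k % M] += c                        # wrap the filter onto Z_M
    sign = np.where(idx % 2 == 0, -1.0, 1.0)     # (-1)^(k-1)
    v_per = sign * u_per[(1 - idx) % M]          # sister filter (u is real)
    rows_u = [np.roll(u_per, 2 * k) for k in range(M // 2)]
    rows_v = [np.roll(v_per, 2 * k) for k in range(M // 2)]
    return np.array(rows_u + rows_v)

def wavelet_transform(z, stages):
    """Step 1: repeatedly re-expand the leading approximation block."""
    w = np.array(z, dtype=float)
    M = w.size
    for _ in range(stages):
        w[:M] = first_stage_matrix(M) @ w[:M]    # approximation first, details second
        M //= 2
    return w

def inverse_wavelet_transform(w, stages):
    """Step 5: undo the stages in reverse order (the matrices are orthogonal)."""
    z = np.array(w, dtype=float)
    M = z.size >> (stages - 1)
    for _ in range(stages):
        z[:M] = first_stage_matrix(M).T @ z[:M]
        M *= 2
    return z

def wavelet_compress(z, p):
    """Discard a fraction p of all coefficients, taken from the details only."""
    N = len(z)
    stages = 1
    while p > 1 - 2.0 ** (-stages):              # minimal number of stages
        stages += 1
    assert N % (2 ** stages) == 0
    w = wavelet_transform(z, stages)
    details = w[N >> stages:]                    # a view into w
    p_prime = p / (1 - 2.0 ** (-stages))
    a = np.percentile(np.abs(details), 100 * p_prime)   # step 2
    details[np.abs(details) < a] = 0.0           # step 3, modifies w in place
    return w, stages

# Synthetic stand-in signal (an assumption, not the Walken clip).
rng = np.random.default_rng(0)
N = 256
t = np.arange(N)
z = np.sin(2 * np.pi * 3 * t / N) + 0.05 * rng.standard_normal(N)
w, stages = wavelet_compress(z, 0.90)
z_approx = inverse_wavelet_transform(w, stages)
```

    Here 90% compression requires four stages, since $1 - 2^{-4} = 0.9375$ is the first such value at or above $0.9$. A practical implementation would replace the dense matrices with filtering and downsampling.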

    (In fact, an $\ell$th stage wavelet basis can be calculated directly so that the calculation of $z^{(\ell)}$ need not be performed iteratively.) Because this new algorithm retains the low frequency approximation coefficients, it retains a signal to noise ratio in the compressed signal better than the Fourier algorithm. The localization of the wavelet basis means we may begin reconstruction while only a few components have been transmitted. Figures 1.12 through 1.19 show the iterative calculation of $z^{(\ell)}$ using Daubechies's D4 wavelet, a 90% and a 95% compression, and their reconstructed approximations of $z$. Again the compressed signals are shown in blue over the original in red. Listen to the signal by pressing the play button and compare it with the Fourier method. While the same amount of compression is achieved with both, the wavelet compressed signal has discarded much of the background noise, while the Fourier method sounds like it has amplified it. Figure 1.20 shows a comparison of the error between the Fourier and wavelet compression techniques. (The percent error for a compressed $\tilde{z}$ from signal $z$ is given by $\mathrm{error}(\tilde{z}, z) = \|z - \tilde{z}\| / \|z\|$, where $\|\cdot\|$ is the vector space norm. See the later discussion.) Observe that in this example, the Fourier method has better error values for lower levels of compression, although for compression over 80%, the wavelet method produces lower error. Compressing with a higher stage basis also appears to continue to improve the error of the compression. Bear in mind that because the wavelet compression method discards more noise information, a portion of its error is irrelevant to our purpose, and in some contexts is even a desirable denoising. In this example, then, a wavelet compression approach out-performs the Fourier compression in terms of high-compression error, localization, and signal-to-noise constancy.
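    The error measure used in this comparison is a ratio of norms, which is short to write down. The sketch below assumes the Euclidean norm on $\ell^2(\mathbb{Z}_N)$ and multiplies the ratio by 100 to express it as a percentage; both the scaling and the name `percent_error` are illustrative assumptions.

```python
import numpy as np

def percent_error(z_approx, z):
    """100 * ||z - z_approx|| / ||z||, using the Euclidean norm."""
    z = np.asarray(z)
    z_approx = np.asarray(z_approx)
    return 100.0 * np.linalg.norm(z - z_approx) / np.linalg.norm(z)

# Tiny worked case: dropping the second component of (3, 4)
# removes a vector of norm 4 from a vector of norm 5.
err = percent_error([3.0, 0.0], [3.0, 4.0])
```

    In this worked case the error is 80 percent, since $4/5 = 0.8$.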

    The iterative levels that are naturally constructed by wavelet bases lend themselves to many other applications. Because the compression separates a rough approximation from detail information, it enables algorithms which first examine rough approximations of signals and then examine the details further if something of interest is revealed, as might be of interest in remote sensing or image recognition applications. Many uses of wavelets are found in the literature. In the discussion that follows we will look instead towards the theory surrounding their existence and construction.

  • Figure 1.12: 1st Stage Wavelet Representation $z^{(1)}$

    Figure 1.13: 2nd Stage Wavelet Representation $z^{(2)}$

    Figure 1.14: 3rd Stage Wavelet Representation $z^{(3)}$

    Figure 1.15: 4th Stage Wavelet Representation $z^{(4)}$

    Figure 1.16: 90% Compressed 4th Stage Wavelet Representation $z^{(4)}$

    Figure 1.17: 90% Wavelet Compressed $z$

    Figure 1.18: 95% Compressed 5th Stage Wavelet Representation $z^{(5)}$

    Figure 1.19: 95% Compressed $z^{(5)}$

    Figure 1.20: Comparison % Error by Compression Ratio (horizontal axis: % Compression, 0 to 100; vertical axis: % Error, semilog scale; curves: Fourier Method, 1st Stage Wavelet Method, 2nd Stage Wavelet Method, 5th Stage Wavelet Method, 10th Stage Wavelet Method)

  • Chapter 2

    Discrete Wavelets

    Wavelets over the spaces $\ell^2(\mathbb{Z})$ and $\ell^2(\mathbb{Z}_N)$ are widely used in applications, and we shall build our theory of continuous wavelets on the discrete foundation. We begin by defining our notation.

    For a set $A$, we write its power set as $\mathcal{P}(A) = \{U \mid U \subseteq A\}$. We will write $\mathbb{Z}$ to refer to the integers, and $\mathbb{R}$ to refer to the field of real numbers. $\mathbb{Z}^+$ and $\mathbb{R}^+$ will refer to the positive integers and positive real numbers, respectively. We write $\mathbb{C}$ for the field of complex numbers. For a complex number $z = x + iy$ with $x, y \in \mathbb{R}$ and $i = \sqrt{-1}$ we write the complex conjugate of $z$ as $\bar{z} = x - iy$.

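    An infinite, discrete signal is usually thought of as a vector in

    $$\ell^2(\mathbb{Z}) = \Big\{ v : \mathbb{Z} \to \mathbb{C} \;\Big|\; \sum_{n \in \mathbb{Z}} |v(n)|^2 < \infty \Big\}$$ […]

  • We distinguish one particular sequence called the Dirac delta $\delta \in \ell^2(\mathbb{Z})$, where $\delta(n) = 1$ if $n = 0$ and $\delta(n) = 0$ if $n \neq 0$. We call $\ell^1(\mathbb{Z})$ the space defined by

    $$\ell^1(\mathbb{Z}) = \Big\{ v : \mathbb{Z} \to \mathbb{C} \;\Big|\; \sum_{n \in \mathbb{Z}} |v(n)| < \infty \Big\}$$ […]

    […] $N$ […] $\sum_{k=-n}^{n} |v(k)|^2 = \sum_{k=-N}^{N} |v(k)|^2 +$ […]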

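  • the sister of the sister of $u$ is $u$ itself and

    $$\begin{aligned}
    \langle v | R_{2m} v \rangle &= \sum_{k \in \mathbb{Z}} \overline{v(k)}\, v(k - 2m) \\
    &= \sum_{k \in \mathbb{Z}} (-1)^{k-1} u(1 - k)\, (-1)^{k-2m-1}\, \overline{u(1 - k + 2m)} \\
    &= \sum_{k \in \mathbb{Z}} \overline{u(1 - k + 2m)}\, u(1 - k) \\
    &= \sum_{k \in \mathbb{Z}} \overline{u(k)}\, u(k - 2m) \\
    &= \langle u | R_{2m} u \rangle.
    \end{aligned}$$

    We call a basis $B$ of $\ell^2(\mathbb{Z})$ a first-stage wavelet basis if there is some $u \in B$ such that $u$ is even translate orthonormal with sister wavelet $v$ and

    $$B = \{R_{2k} u \mid k \in \mathbb{Z}\} \cup \{R_{2k} v \mid k \in \mathbb{Z}\}.$$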

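    We write $L^2[-\pi, \pi)$ for the space of square integrable functions on $[-\pi, \pi)$:

    $$L^2[-\pi, \pi) = \Big\{ f : [-\pi, \pi) \to \mathbb{C} \;\Big|\; \int_{-\pi}^{\pi} |f(t)|^2 \, dt < \infty \Big\}.$$

    We define the Fourier transform on $\ell^2(\mathbb{Z})$ as the map $\mathcal{F} : \ell^2(\mathbb{Z}) \to L^2[-\pi, \pi)$ such that $\mathcal{F}(z) = \hat{z}$ defined by

    $$\hat{z}(\theta) = \frac{1}{\sqrt{2\pi}} \sum_{n \in \mathbb{Z}} z(n)\, e^{-in\theta}.$$

    It can be shown that the Fourier transform has inverse $\mathcal{F}^{-1} : L^2[-\pi, \pi) \to \ell^2(\mathbb{Z})$ where $\mathcal{F}^{-1} : f \mapsto \check{f}$ such that

    $$\check{f}(n) = \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} f(\theta)\, e^{in\theta} \, d\theta,$$

    and we have $\check{\hat{z}} = z$ and $\hat{\check{f}} = f$. If we consider that $L^2[-\pi, \pi)$ itself is a Hilbert space (see appendix) equipped with its own inner product

    $$\langle f | g \rangle = \int_{-\pi}^{\pi} \overline{f(\theta)}\, g(\theta) \, d\theta,$$

    then we can write $\hat{z}(\theta) = \left\langle \frac{e^{in\theta}}{\sqrt{2\pi}} \,\middle|\, z \right\rangle$ where the inner product is in $\ell^2(\mathbb{Z})$ and $\check{f}(n) = \left\langle \frac{e^{-in\theta}}{\sqrt{2\pi}} \,\middle|\, f \right\rangle$ where the inner product is that of $L^2[-\pi, \pi)$. The symmetry between the inner products of these two spaces is even greater, and given by Parseval's formula:

    $$\langle z | w \rangle = \langle \hat{z} | \hat{w} \rangle$$

    where again the first inner product is over $\ell^2(\mathbb{Z})$ while the second is that of $L^2[-\pi, \pi)$. We note a few helpful Fourier relations:

    $$\sqrt{2\pi}\, \hat{\delta}(\theta) = \sum_{n \in \mathbb{Z}} \delta(n)\, e^{-in\theta} = 1,$$

    $$\widehat{(-1)^n z(n)}(\theta) = \sum_{n \in \mathbb{Z}} (-1)^n z(n) \frac{e^{-in\theta}}{\sqrt{2\pi}} = \sum_{n \in \mathbb{Z}} z(n) \frac{e^{-in(\theta + \pi)}}{\sqrt{2\pi}} = \hat{z}(\theta + \pi),$$

    $$\widehat{R_k z}(\theta) = \sum_{n \in \mathbb{Z}} z(n - k) \frac{e^{-in\theta}}{\sqrt{2\pi}} = \sum_{n \in \mathbb{Z}} z(n) \frac{e^{-in\theta}}{\sqrt{2\pi}}\, e^{-ik\theta} = \hat{z}(\theta)\, e^{-ik\theta}.$$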

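    The operation of convolution is a binary operation $* : \ell^1(\mathbb{Z}) \times \ell^1(\mathbb{Z}) \to \ell^2(\mathbb{Z})$ where

    $$\begin{aligned}
    z * w(n) &= \sum_{m \in \mathbb{Z}} z(n - m)\, w(m) \\
    &= \sum_{m \in \mathbb{Z}} \overline{R_n \tilde{z}(m)}\, w(m) \\
    &= \langle R_n \tilde{z} | w \rangle.
    \end{aligned}$$

    The convolution has the useful property that $\widehat{z * w} = \sqrt{2\pi}\, \hat{z}\, \hat{w}$.

    Theorem 1. Let $u \in \ell^1(\mathbb{Z})$ be even translate orthonormal, and define the sister wavelet $v$ by

    $$v(k) = (-1)^{k-1}\, \overline{u(1 - k)}.$$

    Then $B = \{R_{2k} u \mid k \in \mathbb{Z}\} \cup \{R_{2k} v \mid k \in \mathbb{Z}\}$ is an orthonormal basis for $\ell^2(\mathbb{Z})$, and we call $B$ a first stage wavelet basis.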

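    Proof. Observe that if $u$ is even translate orthonormal, then $\langle R_{2m} u | u \rangle = \delta(m)$, and we have by Parseval's formula that

    $$\begin{aligned}
    \langle u | R_{2m} u \rangle &= \langle \hat{u} | \widehat{R_{2m} u} \rangle \\
    &= \int_{-\pi}^{\pi} \overline{\hat{u}(\theta)}\, \hat{u}(\theta)\, e^{-i2m\theta} \, d\theta \\
    &= \int_{-\pi}^{0} |\hat{u}(\theta)|^2 e^{-i2m\theta} \, d\theta + \int_{0}^{\pi} |\hat{u}(\theta)|^2 e^{-i2m\theta} \, d\theta \\
    &= \int_{0}^{\pi} |\hat{u}(\theta + \pi)|^2 e^{-i2m\theta} \, d\theta + \int_{0}^{\pi} |\hat{u}(\theta)|^2 e^{-i2m\theta} \, d\theta \\
    &= \int_{0}^{\pi} \left( |\hat{u}(\theta)|^2 + |\hat{u}(\theta + \pi)|^2 \right) e^{-i2m\theta} \, d\theta.
    \end{aligned}$$

    Then, substituting $\theta \mapsto \theta/2$ and using the $2\pi$-periodicity of the resulting integrand,

    $$\delta(m) = \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} \frac{\sqrt{2\pi}}{2} \left( \left|\hat{u}\!\left(\tfrac{\theta}{2}\right)\right|^2 + \left|\hat{u}\!\left(\tfrac{\theta}{2} + \pi\right)\right|^2 \right) e^{-im\theta} \, d\theta.$$

    Since $\delta(-m) = \delta(m)$, by taking the Fourier transform on both sides we have for $\theta \in [-\pi, \pi)$

    $$\frac{1}{\sqrt{2\pi}} = \frac{\sqrt{2\pi}}{2} \left|\hat{u}\!\left(\tfrac{\theta}{2}\right)\right|^2 + \frac{\sqrt{2\pi}}{2} \left|\hat{u}\!\left(\tfrac{\theta}{2} + \pi\right)\right|^2,$$

    or for $\theta \in [0, \pi)$

    $$\frac{1}{\pi} = |\hat{u}(\theta)|^2 + |\hat{u}(\theta + \pi)|^2. \tag{2.1}$$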

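    The same argument applied to $v$ yields that

    $$\frac{1}{\pi} = |\hat{v}(\theta)|^2 + |\hat{v}(\theta + \pi)|^2 \tag{2.2}$$

    is equivalent to $v$ being even translate orthonormal. Consider the Fourier transform of $v$ calculated by its definition through $u$:

    $$\begin{aligned}
    \hat{v}(\theta) &= \sum_{k \in \mathbb{Z}} (-1)^{k-1}\, \overline{u(1 - k)}\, \frac{e^{-ik\theta}}{\sqrt{2\pi}} \\
    &= \sum_{k \in \mathbb{Z}} (-1)^{k}\, \overline{u(k)}\, \frac{e^{-i(1-k)\theta}}{\sqrt{2\pi}} \\
    &= e^{-i\theta} \sum_{k \in \mathbb{Z}} \overline{u(k)}\, \frac{e^{ik(\theta + \pi)}}{\sqrt{2\pi}} \\
    &= e^{-i\theta}\, \overline{\hat{u}(\theta + \pi)}.
    \end{aligned}$$

    Then $\hat{v}(\theta + \pi) = -e^{-i\theta}\, \overline{\hat{u}(\theta)}$ and hence $v$ satisfies equation (2.2) if and only if $u$ satisfies equation (2.1). We also have that

    $$\hat{v}(\theta)\, \overline{\hat{u}(\theta)} + \hat{v}(\theta + \pi)\, \overline{\hat{u}(\theta + \pi)} = 0 \tag{2.3}$$

    is equivalent to $\langle u | R_{2m} v \rangle = 0$ for every $m \in \mathbb{Z}$, by a similar argument as above. Observe that equations (2.1), (2.2), and (2.3) are equivalent to the matrix equation:

    $$\begin{bmatrix} \hat{u}(\theta) & \hat{u}(\theta + \pi) \\ \hat{v}(\theta) & \hat{v}(\theta + \pi) \end{bmatrix} \begin{bmatrix} \overline{\hat{u}(\theta)} & \overline{\hat{v}(\theta)} \\ \overline{\hat{u}(\theta + \pi)} & \overline{\hat{v}(\theta + \pi)} \end{bmatrix} = \frac{1}{\pi} I_2,$$

    but then these matrices commute, so

    $$\begin{bmatrix} \overline{\hat{u}(\theta)} & \overline{\hat{v}(\theta)} \\ \overline{\hat{u}(\theta + \pi)} & \overline{\hat{v}(\theta + \pi)} \end{bmatrix} \begin{bmatrix} \hat{u}(\theta) & \hat{u}(\theta + \pi) \\ \hat{v}(\theta) & \hat{v}(\theta + \pi) \end{bmatrix} = \frac{1}{\pi} I_2. \tag{2.4}$$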

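    Fix some $z \in \ell^2(\mathbb{Z})$ and let $w \in \ell^2(\mathbb{Z})$ be given by

    $$w = \sum_{k \in \mathbb{Z}} \langle R_{2k} v | z \rangle R_{2k} v + \sum_{k \in \mathbb{Z}} \langle R_{2k} u | z \rangle R_{2k} u.$$

    We take the Fourier transform of $w$. Because the Fourier transform is a continuous and linear operator

    $$\begin{aligned}
    \hat{w}(\theta) &= \sum_{k \in \mathbb{Z}} \langle R_{2k} v | z \rangle\, \widehat{R_{2k} v}(\theta) + \sum_{k \in \mathbb{Z}} \langle R_{2k} u | z \rangle\, \widehat{R_{2k} u}(\theta) \\
    &= \sum_{k \in \mathbb{Z}} \langle R_{2k} v | z \rangle\, \hat{v}(\theta)\, e^{-i2k\theta} + \sum_{k \in \mathbb{Z}} \langle R_{2k} u | z \rangle\, \hat{u}(\theta)\, e^{-i2k\theta} \\
    &= \hat{v}(\theta) \sum_{k \in \mathbb{Z}} \langle R_{2k} v | z \rangle\, e^{-i2k\theta} + \hat{u}(\theta) \sum_{k \in \mathbb{Z}} \langle R_{2k} u | z \rangle\, e^{-i2k\theta}.
    \end{aligned} \tag{2.5}$$

    Observe that as $\langle R_{2k} v | z \rangle = \tilde{v} * z(2k)$ we can take

    $$\begin{aligned}
    \sum_{k \in \mathbb{Z}} \langle R_{2k} v | z \rangle\, e^{-i2k\theta} &= \sum_{k \in \mathbb{Z}} \tilde{v} * z(2k)\, e^{-i2k\theta} \\
    &= \sum_{k \in \mathbb{Z}} \frac{1}{2} \left( \tilde{v} * z(k) + (-1)^k\, \tilde{v} * z(k) \right) e^{-ik\theta} \\
    &= \frac{\sqrt{2\pi}}{2} \sum_{k \in \mathbb{Z}} \tilde{v} * z(k)\, \frac{e^{-ik\theta}}{\sqrt{2\pi}} + \frac{\sqrt{2\pi}}{2} \sum_{k \in \mathbb{Z}} \tilde{v} * z(k)\, \frac{e^{-ik(\theta + \pi)}}{\sqrt{2\pi}} \\
    &= \frac{\sqrt{2\pi}}{2}\, \widehat{\tilde{v} * z}(\theta) + \frac{\sqrt{2\pi}}{2}\, \widehat{\tilde{v} * z}(\theta + \pi) \\
    &= \pi \left( \overline{\hat{v}(\theta)}\, \hat{z}(\theta) + \overline{\hat{v}(\theta + \pi)}\, \hat{z}(\theta + \pi) \right),
    \end{aligned} \tag{2.6}$$

    where the last step uses $\widehat{\tilde{v} * z} = \sqrt{2\pi}\, \hat{\tilde{v}}\, \hat{z}$ and $\hat{\tilde{v}} = \overline{\hat{v}}$. An identical derivation with $u$ yields

    $$\sum_{k \in \mathbb{Z}} \langle R_{2k} u | z \rangle\, e^{-i2k\theta} = \pi \left( \overline{\hat{u}(\theta)}\, \hat{z}(\theta) + \overline{\hat{u}(\theta + \pi)}\, \hat{z}(\theta + \pi) \right). \tag{2.7}$$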

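  • Substituting equations (2.6) and (2.7) into (2.5) we find

    $$\begin{aligned}
    \hat{w}(\theta) &= \pi\, \hat{v}(\theta) \left( \overline{\hat{v}(\theta)}\, \hat{z}(\theta) + \overline{\hat{v}(\theta + \pi)}\, \hat{z}(\theta + \pi) \right) + \pi\, \hat{u}(\theta) \left( \overline{\hat{u}(\theta)}\, \hat{z}(\theta) + \overline{\hat{u}(\theta + \pi)}\, \hat{z}(\theta + \pi) \right) \\
    &= \pi\, \hat{z}(\theta) \left( |\hat{u}(\theta)|^2 + |\hat{v}(\theta)|^2 \right) + \pi\, \hat{z}(\theta + \pi) \left( \hat{v}(\theta)\, \overline{\hat{v}(\theta + \pi)} + \hat{u}(\theta)\, \overline{\hat{u}(\theta + \pi)} \right).
    \end{aligned}$$

    But equation (2.4) says that

    $$|\hat{u}(\theta)|^2 + |\hat{v}(\theta)|^2 = \frac{1}{\pi}, \qquad \hat{v}(\theta)\, \overline{\hat{v}(\theta + \pi)} + \hat{u}(\theta)\, \overline{\hat{u}(\theta + \pi)} = 0,$$

    so we have $\hat{w} = \hat{z}$ and by the Fourier inversion

    $$z = \sum_{k \in \mathbb{Z}} \langle R_{2k} v | z \rangle R_{2k} v + \sum_{k \in \mathbb{Z}} \langle R_{2k} u | z \rangle R_{2k} u$$

    so $\ell^2(\mathbb{Z}) \subseteq \mathrm{span}\, B$, that is $\{R_{2k} u \mid k \in \mathbb{Z}\} \cup \{R_{2k} v \mid k \in \mathbb{Z}\}$ is an orthonormal basis of $\ell^2(\mathbb{Z})$.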
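    The orthogonality relations behind Theorem 1 can be spot-checked numerically for a concrete filter. The sketch below assumes the standard published Daubechies D4 coefficients for $u$, stores finitely supported sequences on $\mathbb{Z}$ as dictionaries from index to value, and builds the sister wavelet from $v(k) = (-1)^{k-1}\overline{u(1-k)}$; the helper names `sister` and `inner` are hypothetical.

```python
import numpy as np

SQ3 = np.sqrt(3.0)
coeffs = np.array([1 + SQ3, 3 + SQ3, 3 - SQ3, 1 - SQ3]) / (4 * np.sqrt(2.0))
# u is supported on {0, 1, 2, 3}; indices outside the dict are zero.
u = {k: c for k, c in enumerate(coeffs)}

def sister(u):
    """v(k) = (-1)^(k-1) * conj(u(1-k)); with j = 1-k the sign is (-1)^j."""
    return {1 - j: (-1 if j % 2 else 1) * np.conj(c) for j, c in u.items()}

def inner(a, b, shift_a=0, shift_b=0):
    """<R_sa a | R_sb b> = sum over n of conj(a(n - sa)) * b(n - sb)."""
    total = 0.0
    for ka, ca in a.items():
        n = ka + shift_a                  # position where R_sa a takes the value ca
        total += np.conj(ca) * b.get(n - shift_b, 0.0)
    return total

v = sister(u)
shifts = range(-3, 4)
uu = [inner(u, u, 0, 2 * m) for m in shifts]   # expected: delta(m)
vv = [inner(v, v, 0, 2 * m) for m in shifts]   # expected: delta(m)
uv = [inner(u, v, 0, 2 * m) for m in shifts]   # expected: 0
```

    The lists `uu` and `vv` should equal 1 at $m = 0$ and 0 elsewhere, and `uv` should vanish, which are the time-domain statements that $u$ and $v$ are even translate orthonormal and mutually orthogonal. This checks the hypotheses only; completeness is what the proof supplies.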

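    During the proof of Theorem 1 we saw that if the even translations of $u$ and $v$ form a first stage wavelet basis then their Fourier transforms satisfy:

    $$|\hat{u}(\theta)|^2 + |\hat{u}(\theta + \pi)|^2 = \frac{1}{\pi},$$

    $$|\hat{v}(\theta)|^2 + |\hat{v}(\theta + \pi)|^2 = \frac{1}{\pi}.$$

    It is convenient if we choose $\hat{u}(0) = \pi^{-1/2}$, so that $\hat{u}(\pi) = 0$, $|\hat{v}(\pi)| = \pi^{-1/2}$, and $\hat{v}(0) = 0$. Then we will call $u$ the low frequency filter and $v$ the high frequency filter. It is this choice which enables the simultaneous time-frequency localization of the wavelet basis. For any $z \in \ell^2(\mathbb{Z})$ we have

    $$z = \sum_{k \in \mathbb{Z}} \langle R_{2k} v | z \rangle R_{2k} v + \sum_{k \in \mathbb{Z}} \langle R_{2k} u | z \rangle R_{2k} u.$$

    The first term contains the high frequency adjustments to $z$ and represents details of the signal, while the second term contains a low frequency approximation of the signal. Note that because any $z$ can be written in this form, we can in particular write

    $$R_m \delta = \sum_{k \in \mathbb{Z}} \langle R_{2k} v | R_m \delta \rangle R_{2k} v + \sum_{k \in \mathbb{Z}} \langle R_{2k} u | R_m \delta \rangle R_{2k} u.$$

    We can reduce with $\langle R_{2k} v | R_m \delta \rangle = \overline{v(m - 2k)}$ and $\langle R_{2k} u | R_m \delta \rangle = \overline{u(m - 2k)}$, so

    $$\delta(n - m) = \sum_{k \in \mathbb{Z}} \left( \overline{v(m - 2k)}\, v(n - 2k) + \overline{u(m - 2k)}\, u(n - 2k) \right), \tag{2.8}$$

    which is a useful equation we will utilize in proving the construction of wavelets in the continuous case.

    We have seen that a first stage wavelet basis can be constructed any time we can find an even translate orthonormal sequence in $\ell^1(\mathbb{Z})$. In the next chapter we shall see that continuous wavelets can also be constructed from such an $\ell^1(\mathbb{Z})$ sequence.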

  • Chapter 3

    Continuous Wavelets

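    Just as in the discrete case, we are interested in forming bases of $L^2(\mathbb{R})$ generated by simple transformations of a single function. $L^2(\mathbb{R})$ is the vector space of square Lebesgue integrable functions

    $$L^2(\mathbb{R}) = \Big\{ f : \mathbb{R} \to \mathbb{C} \;\Big|\; \int_{t \in \mathbb{R}} |f(t)|^2 \, dt < \infty \Big\}$$

    with inner product

    $$\langle f | g \rangle = \int_{t \in \mathbb{R}} \overline{f(t)}\, g(t) \, dt$$ […]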

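  • 1. Space: $V_j$ is a subspace of $L^2(\mathbb{R})$.

    2. Non-decreasing: If $j < k$ then $V_j \subseteq V_k$.

    3. Density: The set $\bigcup_{j \in \mathbb{Z}} V_j$ is dense in $L^2(\mathbb{R})$.

    4. Trivial Intersection: $\bigcap_{j \in \mathbb{Z}} V_j = \{0\}$.

    5. Dilation: $f \in V_0$ if and only if $f_{j,0} \in V_j$.

    6. Scaling: There is some $\varphi \in V_0$ called the scaling function such that $\{\varphi_{0,k} \mid k \in \mathbb{Z}\}$ is an orthonormal basis for $V_0$.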

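    The multi-resolution analysis has a number of useful properties, and we will eventually use them to construct wavelets over $L^2(\mathbb{R})$. Observe first that the dilation and scaling properties of a multi-resolution analysis $V$ guarantee that the set of translations of the $j$th dilate of $\varphi$, the scaling function of $V$, is an orthonormal basis for $V_j$. By the dilation property of $V$ we have that $\varphi_{0,k} \in V_0$ implies $\varphi_{j,k} \in V_j$. Naturally

    $$\begin{aligned}
    \langle \varphi_{j,k} | \varphi_{j,k'} \rangle &= \int_{\mathbb{R}} 2^{j/2}\, \overline{\varphi(2^j t - k)}\, 2^{j/2}\, \varphi(2^j t - k') \, dt \\
    &= \int_{\mathbb{R}} \overline{\varphi(t)}\, \varphi(t - (k' - k)) \, dt \\
    &= \langle \varphi | \varphi_{0,k'-k} \rangle,
    \end{aligned}$$

    so it follows that $\{\varphi_{j,k} \mid k \in \mathbb{Z}\}$ is orthonormal. And if $f \in V_j$ then the dilation property of $V$ asserts that $f = g_{j,0}$ for some $g \in V_0$. But because the translations of $\varphi$ form an orthonormal basis of $V_0$ we have:

    $$g = \sum_{k \in \mathbb{Z}} \langle \varphi_{0,k} | g \rangle\, \varphi_{0,k}$$

    so that by taking the dilate of $g$ we obtain for almost every $t \in \mathbb{R}$:

    $$\begin{aligned}
    f(t) &= 2^{j/2}\, g(2^j t) \\
    &= 2^{j/2} \sum_{k \in \mathbb{Z}} \langle \varphi_{0,k} | g \rangle\, \varphi_{0,k}(2^j t) \\
    &= \sum_{k \in \mathbb{Z}} \langle \varphi_{0,k} | g \rangle\, \varphi_{j,k}(t).
    \end{aligned}$$

    Hence $f$ is in the span of $\{\varphi_{j,k} \mid k \in \mathbb{Z}\}$, which must constitute an orthonormal basis of $V_j$. Observe that as $V$ is non-decreasing we have $\varphi \in V_0 \subseteq V_1$ and so we have

    $$\varphi = \sum_{k \in \mathbb{Z}} \langle \varphi_{1,k} | \varphi \rangle\, \varphi_{1,k}.$$

    We will define $u \in \ell^2(\mathbb{Z})$, called the scaling sequence of $V$, satisfying $u(k) = \langle \varphi_{1,k} | \varphi \rangle$. That is

    $$\varphi = \sum_{k \in \mathbb{Z}} u(k)\, \varphi_{1,k},$$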

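  • and we have

    $$\begin{aligned}
    \varphi_{0,m}(t) &= \sum_{k \in \mathbb{Z}} u(k)\, \varphi_{1,k}(t - m) \\
    &= \sum_{k \in \mathbb{Z}} u(k)\, 2^{1/2}\, \varphi(2t - 2m - k) \\
    &= \sum_{k \in \mathbb{Z}} u(k - 2m)\, 2^{1/2}\, \varphi(2t - k) \\
    &= \sum_{k \in \mathbb{Z}} R_{2m} u(k)\, \varphi_{1,k}(t).
    \end{aligned}$$

    One can see that $u$ is even translate orthonormal in $\ell^2(\mathbb{Z})$. Indeed, observe that

    $$\langle \varphi | \varphi_{0,m} \rangle = \Big\langle \sum_{k \in \mathbb{Z}} u(k)\, \varphi_{1,k} \,\Big|\, \sum_{k' \in \mathbb{Z}} R_{2m} u(k')\, \varphi_{1,k'} \Big\rangle.$$

    If we utilize the continuity and linearity of the inner product

    $$\begin{aligned}
    \langle \varphi | \varphi_{0,m} \rangle &= \sum_{k \in \mathbb{Z}} \sum_{k' \in \mathbb{Z}} \overline{u(k)}\, R_{2m} u(k')\, \langle \varphi_{1,k} | \varphi_{1,k'} \rangle \\
    &= \sum_{k \in \mathbb{Z}} \sum_{k' \in \mathbb{Z}} \overline{u(k)}\, R_{2m} u(k')\, \delta(k - k') \\
    &= \sum_{k \in \mathbb{Z}} \overline{u(k)}\, R_{2m} u(k) \\
    &= \langle u | R_{2m} u \rangle,
    \end{aligned}$$

    where this last inner product is over $\ell^2(\mathbb{Z})$. So because the translations of $\varphi$ are orthonormal, it follows that the even translations of $u$ are orthonormal in $\ell^2(\mathbb{Z})$. Remember that according to Theorem 1, if $u \in \ell^1(\mathbb{Z})$ then its sister wavelet $v$ defined by

    $$v(k) = (-1)^{k-1}\, \overline{u(1 - k)}$$

    is in $\ell^1(\mathbb{Z})$ and the even translations of $u$ and $v$ form a first stage wavelet basis in $\ell^2(\mathbb{Z})$. The first stage wavelet basis generated by $u$ and $v$ will allow us to construct a wavelet basis in $L^2(\mathbb{R})$ itself.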

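    Theorem 3 (Mallat). Let $V : \mathbb{Z} \to \mathcal{P}(L^2(\mathbb{R}))$ be a multi-resolution analysis with scaling function $\varphi$ and scaling sequence $u$ with sister wavelet $v$. Define

    $$\psi = \sum_{k \in \mathbb{Z}} v(k)\, \varphi_{1,k}.$$

    Then $\psi$ is a wavelet for $L^2(\mathbb{R})$.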
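    As a concrete instance of the construction in Theorem 3, consider the Haar case. This worked example is an illustration that is not part of the original text; it assumes the convention $f_{j,k}(t) = 2^{j/2} f(2^j t - k)$ used in the computations of this chapter, and takes as the scaling function the indicator of $[0, 1)$.

```latex
% Haar scaling function and its scaling sequence u(k) = <phi_{1,k} | phi>
\varphi = \mathbf{1}_{[0,1)}, \qquad
u(k) = \int_{\mathbb{R}} \sqrt{2}\,\overline{\varphi(2t - k)}\,\varphi(t)\,dt
     = \begin{cases} 1/\sqrt{2}, & k \in \{0, 1\}, \\ 0, & \text{otherwise.} \end{cases}

% Sister wavelet v(k) = (-1)^{k-1} \overline{u(1-k)}
v(0) = (-1)^{-1}\,\overline{u(1)} = -\tfrac{1}{\sqrt{2}}, \qquad
v(1) = (-1)^{0}\,\overline{u(0)} = \tfrac{1}{\sqrt{2}}.

% Resulting wavelet psi = sum_k v(k) phi_{1,k}
\psi(t) = -\varphi(2t) + \varphi(2t - 1)
        = \begin{cases} -1, & 0 \le t < \tfrac12, \\ \phantom{-}1, & \tfrac12 \le t < 1, \\ \phantom{-}0, & \text{otherwise.} \end{cases}
```

    Up to an overall sign this is the familiar Haar wavelet, and one can verify directly that $\int \psi = 0$ and $\|\psi\| = 1$.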


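  • Proof. $\psi$ is a wavelet for $L^2(\mathbb{R})$ if $\{\psi_{j,k} \mid j, k \in \mathbb{Z}\}$ is an orthonormal basis for $L^2(\mathbb{R})$. We first show that for a fixed $j$, the set $\{\psi_{j,k} \mid k \in \mathbb{Z}\}$ is orthonormal. Observe that

    $$\begin{aligned}
    \langle \psi_{j,k} | \psi_{j,k'} \rangle &= \int_{\mathbb{R}} 2^{j/2}\, \overline{\psi(2^j t - k)}\, 2^{j/2}\, \psi(2^j t - k') \, dt \\
    &= \int_{\mathbb{R}} \overline{\psi(t)}\, \psi(t - (k' - k)) \, dt \\
    &= \langle \psi | \psi_{0,k'-k} \rangle.
    \end{aligned}$$

    It will then suffice to show that $\langle \psi | \psi_{0,m} \rangle = \delta(m)$. If we expand $\psi$ over the same $\varphi_{1,k}$ which define $\psi$, then we can write $\psi_{j,m}$ in terms of the basis of $V_{j+1}$:

    $$\begin{aligned}
    \psi_{j,m}(t) &= 2^{j/2}\, \psi(2^j t - m) \\
    &= 2^{j/2} \sum_{k \in \mathbb{Z}} v(k)\, \varphi_{1,k}(2^j t - m) \\
    &= 2^{j/2} \sum_{k \in \mathbb{Z}} v(k)\, 2^{1/2}\, \varphi(2^{j+1} t - 2m - k) \\
    &= \sum_{k \in \mathbb{Z}} v(k - 2m)\, 2^{(j+1)/2}\, \varphi(2^{j+1} t - k) \\
    &= \sum_{k \in \mathbb{Z}} R_{2m} v(k)\, \varphi_{j+1,k}(t).
    \end{aligned}$$

    The continuity of the inner product allows us to take

    $$\begin{aligned}
    \langle \psi | \psi_{0,m} \rangle &= \Big\langle \sum_{k \in \mathbb{Z}} v(k)\, \varphi_{1,k} \,\Big|\, \sum_{k' \in \mathbb{Z}} R_{2m} v(k')\, \varphi_{1,k'} \Big\rangle \\
    &= \sum_{k \in \mathbb{Z}} \sum_{k' \in \mathbb{Z}} \overline{v(k)}\, R_{2m} v(k')\, \langle \varphi_{1,k} | \varphi_{1,k'} \rangle,
    \end{aligned}$$

    but we know that $\langle \varphi_{1,k} | \varphi_{1,k'} \rangle = \langle \varphi_{0,k} | \varphi_{0,k'} \rangle = \delta(k - k')$ follows from the definition of the scaling function. Then

    $$\begin{aligned}
    \langle \psi | \psi_{0,m} \rangle &= \sum_{k \in \mathbb{Z}} \sum_{k' \in \mathbb{Z}} \overline{v(k)}\, R_{2m} v(k')\, \delta(k - k') \\
    &= \sum_{k \in \mathbb{Z}} \overline{v(k)}\, R_{2m} v(k) \\
    &= \langle v | R_{2m} v \rangle,
    \end{aligned}$$

    where the final inner product is in $\ell^2(\mathbb{Z})$. But we saw in Theorem 1 that $v$ is even translate orthonormal if and only if $u$ is even translate orthonormal. Hence the orthonormality of $\{\psi_{j,k} \mid k \in \mathbb{Z}\}$ reduces to the orthonormality of the even translates of $u$, which in turn follows from the orthonormality of $\{\varphi_{0,k} \mid k \in \mathbb{Z}\}$.

    We may then define $W_j = \mathrm{span}\{\psi_{j,k} \mid k \in \mathbb{Z}\}$, which is a subspace of $L^2(\mathbb{R})$. We see that for fixed $j \in \mathbb{Z}$, by following the same style of argument as above,

    $$\langle \varphi_{j,k} | \psi_{j,k'} \rangle = \langle R_{2k} u | R_{2k'} v \rangle = 0.$$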

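  • Then $W_j \perp V_j$. We will next aim to show that $V_j \oplus W_j = V_{j+1}$. Because $V$ is a multi-resolution analysis, $V_j \subseteq V_{j+1}$, and by the expansion $\psi_{j,m} = \sum_{k \in \mathbb{Z}} R_{2m} v(k)\, \varphi_{j+1,k}$ we have that $\psi_{j,k} \in V_{j+1}$ for all $k \in \mathbb{Z}$. Then because $\{\psi_{j,k} \mid k \in \mathbb{Z}\} \subseteq V_{j+1}$ we have that $W_j = \mathrm{span}\{\psi_{j,k} \mid k \in \mathbb{Z}\} \subseteq V_{j+1}$, so $V_j \oplus W_j \subseteq V_{j+1}$. Let $g \in V_j \oplus W_j$ be given by

    $$\begin{aligned}
    g &= \sum_{m \in \mathbb{Z}} \overline{R_{2m} u(k)}\, \varphi_{j,m} + \sum_{m \in \mathbb{Z}} \overline{R_{2m} v(k)}\, \psi_{j,m} \\
    &= \sum_{m \in \mathbb{Z}} \overline{R_{2m} u(k)} \sum_{n \in \mathbb{Z}} R_{2m} u(n)\, \varphi_{j+1,n} + \sum_{m \in \mathbb{Z}} \overline{R_{2m} v(k)} \sum_{n \in \mathbb{Z}} R_{2m} v(n)\, \varphi_{j+1,n} \\
    &= \sum_{n \in \mathbb{Z}} \Big( \sum_{m \in \mathbb{Z}} \big( \overline{u(k - 2m)}\, u(n - 2m) + \overline{v(k - 2m)}\, v(n - 2m) \big) \Big) \varphi_{j+1,n}.
    \end{aligned}$$

    The inner summation is a Dirac delta by equation (2.8). We are left with

    $$g = \sum_{n \in \mathbb{Z}} \delta(k - n)\, \varphi_{j+1,n} = \varphi_{j+1,k}.$$

    Then as each element of the defining basis of $V_{j+1}$ is the sum of terms in $V_j$ and $W_j$, we have that $V_{j+1} = V_j \oplus W_j$. For any $m < j$ we have $W_m \subseteq V_{m+1} \subseteq \dots \subseteq V_j$, so $\psi_{m,k'} \in V_j$, and because $W_j \perp V_j$ we see that $\psi_{j,k} \perp \psi_{m,k'}$ for all $k, k' \in \mathbb{Z}$. It follows that $\{\psi_{j,k} \mid j, k \in \mathbb{Z}\}$ is orthonormal in $L^2(\mathbb{R})$.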

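    All that remains is to show that $\{\psi_{j,k} \mid j, k \in \mathbb{Z}\}$ is complete. Assume that there is some $f \in L^2(\mathbb{R})$ such that $\langle f | \psi_{j,k} \rangle = 0$ for every $j, k \in \mathbb{Z}$. Let $P_j(f)$ be the projection of $f$ onto $V_j$ defined by

    $$P_j(f) = \sum_{k \in \mathbb{Z}} \langle \varphi_{j,k} | f \rangle\, \varphi_{j,k}.$$

    Observe that $f - P_j(f) \perp V_j$, so for every $\ell < j$ we have $f - P_j(f) \perp W_\ell$, and as by assumption $f \perp W_\ell$ we see $P_j(f) \perp W_\ell$. Because $V_j = V_{j-1} \oplus W_{j-1}$ and $P_j(f) \perp W_{j-1}$ we see that $P_j(f) \in V_{j-1}$, and by induction $P_j(f) \in V_\ell$ for every $\ell < j$. As $V$ is non-decreasing, $P_j(f) \in \bigcap_{j \in \mathbb{Z}} V_j$. Then the trivial intersection property of $V$ forces $P_j(f) = 0$. Then

    $$\|f\| = \|f - P_j(f)\| = \inf\{\|f - v\| \mid v \in V_j\}$$

    for every $j \in \mathbb{Z}$. By the density property of $V$ there is some sequence $g : \mathbb{Z}^+ \to \bigcup_{j \in \mathbb{Z}} V_j$ such that $g_n$ converges to $f$ in the $L^2(\mathbb{R})$ metric. But of course as $g_n \in V_j$ for some $j \in \mathbb{Z}$ and $\|f\| = \inf\{\|f - v\| \mid v \in V_j\}$ we see that

    $$\|f\| \le \|f - g_n\|$$

    for any $n \in \mathbb{Z}^+$. As the right-hand side tends to zero, $f = 0$, and it must be that $\{\psi_{j,k} \mid j, k \in \mathbb{Z}\}$ is a basis of $L^2(\mathbb{R})$.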
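The role of the projections $P_j$ and of the density property can be made concrete with a small numerical sketch. It assumes the Haar multi-resolution analysis, in which $V_j$ consists of functions constant on dyadic intervals of length $2^{-j}$, and it works with $2^J$ samples of a function on $[0, 1)$ rather than with $L^2(\mathbb{R})$ itself; both choices, and the helper names, are assumptions made for illustration only.

```python
import math

def haar_projection(samples, j):
    """Project onto the (assumed) Haar space V_j: average over blocks.

    `samples` holds 2**J values of a function on [0, 1); the result is
    constant on each dyadic interval of length 2**-j, for 0 <= j <= J.
    """
    size = len(samples)
    block = size >> j          # samples per dyadic interval of length 2**-j
    out = []
    for start in range(0, size, block):
        mean = sum(samples[start:start + block]) / block
        out.extend([mean] * block)
    return out

def l2_error(a, b):
    """Discrete analogue of the L2([0,1)) distance between two signals."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

J = 8
signal = [math.sin(2 * math.pi * i / 2 ** J) + (i / 2 ** J) ** 2
          for i in range(2 ** J)]
errors = [l2_error(signal, haar_projection(signal, j)) for j in range(J + 1)]
for j, err in enumerate(errors):
    print(j, round(err, 6))
```

The printed distances from the signal to $V_j$ do not increase with $j$ and reach zero at the sampling resolution, mirroring how the nested spaces approximate any $f$ arbitrarily well.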


  • In contrast with the discrete case, wavelets in $L^2(\mathbb{R})$ have a bi-infinite dimension of dilations, and there is no clear base level from which to form an approximation. But if we distinguish some $\ell \in \mathbb{Z}$ then in fact the set $B$ given by

    $$B = \{\varphi_{\ell,k} \mid k \in \mathbb{Z}\} \cup \{\psi_{j,k} \mid k \in \mathbb{Z},\ j \ge \ell\}$$

    is an orthonormal basis for $L^2(\mathbb{R})$. That $B$ is orthonormal should be clear from the above discussion of the orthonormal translates of $\varphi$ and the orthonormal translates and dilates of $\psi$. Suppose there is some $g \in L^2(\mathbb{R})$ such that $g \perp B$. By Theorem 3 we can write $g$ as

    g =j

  • Chapter 4

    Appendix

    4.1 Lebesgue Measure and Integration

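    For $A \subset \mathbb{R}$ we define

    $$\lambda^*(A) = \inf\left\{ \sum_{k=0}^{\infty} (b_k - a_k) \;\middle|\; A \subset \bigcup_{k=0}^{\infty} (a_k, b_k) \right\}$$

    as the outer measure of $A$. We say that $A$ is measurable if

    $$\forall B \subset \mathbb{R} : \lambda^*(B) = \lambda^*(B \cap A) + \lambda^*(B \cap A^c) \tag{4.1}$$

    where $A^c = \mathbb{R} \setminus A$ is the complement of $A$. We will write $\mathcal{M}$ for the set of measurable real sets and define the Lebesgue measure $\lambda : \mathcal{M} \to \mathbb{R}^+_0 \cup \{\infty\}$ as the restriction of $\lambda^*$ to $\mathcal{M}$.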
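As a small illustration of the outer-measure definition, a countable set can be covered by open intervals of total length below any prescribed bound, so its outer measure is zero. The sketch below does this for finitely many rationals in $[0, 1]$ using exact arithmetic; the function names and the choice of interval lengths $\varepsilon / 2^{k+1}$ are illustrative assumptions, not notation from the text.

```python
from fractions import Fraction

def rationals_in_unit_interval(max_denominator):
    """List the distinct rationals p/q in [0, 1] with q <= max_denominator."""
    seen = set()
    ordered = []
    for q in range(1, max_denominator + 1):
        for p in range(0, q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                ordered.append(r)
    return ordered

def shrinking_cover(points, eps):
    """Cover the k-th point by an open interval of length eps / 2**(k+1)."""
    cover = []
    for k, x in enumerate(points):
        half = eps / 2 ** (k + 2)
        cover.append((x - half, x + half))
    return cover

eps = Fraction(1, 100)
points = rationals_in_unit_interval(12)
cover = shrinking_cover(points, eps)
total_length = sum(b - a for a, b in cover)
print(len(points), float(total_length))
```

However many points are listed, the total length stays below `eps`, because the interval lengths form a geometric series.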

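    A $\sigma$-algebra is a non-empty set of sets closed under complementation, countable unions, and countable intersections. We observe that $\mathcal{M}$ is a $\sigma$-algebra.

    Theorem 4. Let $A : \mathbb{Z}^+ \to \mathcal{M}$ be a sequence of sets in $\mathcal{M}$. Then:

    (a) $A_0^c \in \mathcal{M}$

    (b) $\bigcap_{n \in \mathbb{Z}^+} A_n \in \mathcal{M}$

    (c) $\bigcup_{n \in \mathbb{Z}^+} A_n \in \mathcal{M}$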

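    Proof. (a) If $A_0 \in \mathcal{M}$ then it follows immediately from (4.1) that $A_0^c \in \mathcal{M}$.

    (b) We first demonstrate sub-additivity of $\lambda^*$. Let $A : \mathbb{Z}^+ \to \mathcal{M}$ be a sequence of sets in $\mathcal{M}$. For any $n \in \mathbb{Z}^+$ we have

    $$\lambda^*(A_n) = \inf\left\{ \sum_{m \in \mathbb{Z}^+} (b_m - a_m) \;\middle|\; A_n \subset \bigcup_{m \in \mathbb{Z}^+} (a_m, b_m) \right\}.$$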


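  • Observe that for any $\varepsilon > 0$ we can find some collection of open intervals $\{(a_{m_n}, b_{m_n}) \mid m_n \in \mathbb{Z}^+\}$ covering $A_n$ such that

    $$\lambda^*(A_n) + \frac{\varepsilon}{2^n} > \sum_{m_n \in \mathbb{Z}^+} (b_{m_n} - a_{m_n}).$$

    Then by taking a sum over $n$ we have

    $$\sum_{n \in \mathbb{Z}^+} \lambda^*(A_n) + \varepsilon > \sum_{n \in \mathbb{Z}^+} \sum_{m_n \in \mathbb{Z}^+} (b_{m_n} - a_{m_n}).$$

    But because $\bigcup_{n \in \mathbb{Z}^+} \bigcup_{m_n \in \mathbb{Z}^+} (a_{m_n}, b_{m_n})$ covers $\bigcup_{n \in \mathbb{Z}^+} A_n$ we have

    $$\sum_{n \in \mathbb{Z}^+} \lambda^*(A_n) + \varepsilon > \inf\left\{ \sum_{m \in \mathbb{Z}^+} (b_m - a_m) \;\middle|\; \bigcup_{n \in \mathbb{Z}^+} A_n \subset \bigcup_{m \in \mathbb{Z}^+} (a_m, b_m) \right\} = \lambda^*\Big( \bigcup_{n \in \mathbb{Z}^+} A_n \Big).$$

    But as this $\varepsilon > 0$ is arbitrary we have

    $$\lambda^*\Big( \bigcup_{n \in \mathbb{Z}^+} A_n \Big) \le \sum_{n \in \mathbb{Z}^+} \lambda^*(A_n). \tag{4.2}$$

    This is referred to as the sub-additivity of $\lambda^*$. Note then by the sub-additivity that for any $A, B \subset \mathbb{R}$, whether or not $A$ is measurable,

    $$\lambda^*(B) = \lambda^*((B \cap A) \cup (B \cap A^c)) \tag{4.3}$$

    $$\phantom{\lambda^*(B)} \le \lambda^*(B \cap A) + \lambda^*(B \cap A^c). \tag{4.4}$$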

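    Note also that it follows immediately from the definition that $\lambda^*$ is monotone, that is if $A_0 \subset A_1$ then $\lambda^*(A_0) \le \lambda^*(A_1)$. Suppose that $A_i, A_j \in \mathcal{M}$. Then for any $B \subset \mathbb{R}$ with $\lambda^*(B) < \infty$ we have, by applying (4.1) first to $A_i$ with test set $B$ and then to $A_j$ with test set $B \cap A_i$:

    $$\lambda^*(B \cap A_i \cap A_j) = \lambda^*(B) - \lambda^*(B \cap A_i^c) - \lambda^*(B \cap A_i \cap A_j^c)$$

    and by set considerations we see that

    $$\lambda^*(B \cap (A_i \cap A_j)^c) = \lambda^*((B \cap A_i^c) \cup (B \cap A_i \cap A_j^c)) \le \lambda^*(B \cap A_i^c) + \lambda^*(B \cap A_i \cap A_j^c).$$

    Adding these results, the last two terms cancel and we obtain

    $$\lambda^*(B \cap A_i \cap A_j) + \lambda^*(B \cap (A_i \cap A_j)^c) \le \lambda^*(B),$$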


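  • which with result (4.4), we have that $A_i \cap A_j$ meets condition (4.1) and so $A_i \cap A_j \in \mathcal{M}$. It follows by induction that $\mathcal{M}$ is closed under finite intersection. It remains to be shown that $\mathcal{M}$ is closed under countable intersection. Suppose now that $A : \mathbb{Z}^+ \to \mathcal{M}$ is a sequence of sets in $\mathcal{M}$. Let $S_0 = \mathbb{R}$, $S_n = \bigcap_{m=0}^{n} A_m$ for $n \ge 1$, and $S = \bigcap_{m \in \mathbb{Z}^+} A_m$. We have that each $S_n$ is measurable by the above argument, so for any $B \subset \mathbb{R}$

    $$\lambda^*(B) = \lambda^*(B \cap S_n) + \lambda^*(B \cap S_n^c),$$

    but as $S_n$ is a decreasing sequence and as $\lambda^*$ is monotone we see $\lambda^*(B \cap S_n) \ge \lambda^*(B \cap S)$, so it will suffice to show that $\lambda^*(B \cap S^c) \le \lim_{n \to \infty} \lambda^*(B \cap S_n^c)$. Observe that we can write $B \cap S^c$ as the union

    $$B \cap S^c = \bigcup_{n=0}^{\infty} B \cap \left( S_{n+1}^c \setminus S_n^c \right)$$

    so utilizing the sub-additivity we have

    $$\lambda^*(B \cap S^c) \le \sum_{n=0}^{\infty} \lambda^*(B \cap (S_{n+1}^c \setminus S_n^c)) = \sum_{n=0}^{\infty} \left[ \lambda^*(B \cap S_{n+1}^c) - \lambda^*(B \cap S_{n+1}^c \cap S_n^c) \right]$$

    by the measurability of $S_n$. But as $S_n^c \subset S_{n+1}^c$

    $$\lambda^*(B \cap S^c) \le \sum_{n=0}^{\infty} \left[ \lambda^*(B \cap S_{n+1}^c) - \lambda^*(B \cap S_n^c) \right] = \lim_{n \to \infty} \lambda^*(B \cap S_{n+1}^c) - \lambda^*(B \cap S_0^c),$$

    but $S_0^c = \emptyset$ which, as is easy to verify, has $\lambda^*(\emptyset) = 0$. It follows that

    $$\lambda^*(B) \ge \lambda^*(B \cap S) + \lambda^*(B \cap S^c)$$

    for any $B \subset \mathbb{R}$, so with equation (4.4) we find that $S \in \mathcal{M}$.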

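    (c) If $A : \mathbb{Z}^+ \to \mathcal{M}$ is a sequence in $\mathcal{M}$ then by De Morgan's law

    $$\bigcup_{n=0}^{\infty} A_n = \Big( \bigcap_{n=0}^{\infty} A_n^c \Big)^c \in \mathcal{M}.$$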

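    We call a set 0 measure if it is measurable and has Lebesgue measure of 0. The Lebesgue measure is complete in the sense that every subset of a 0 measure set is also measurable with 0 measure. To see this let $Z \in \mathcal{M}$ have $\lambda(Z) = 0$ and $Y \subset Z$. For any $B \subset \mathbb{R}$ we have $B \cap Y \subset Z$, so $\lambda^*(B \cap Y) \le \lambda(Z) = 0$. Then

    $$\lambda^*(B) \ge \lambda^*(B \cap Y^c) = \lambda^*(B \cap Y) + \lambda^*(B \cap Y^c)$$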


  • which with the sub-additivity of $\lambda^*$ is sufficient to show that $Y \in \mathcal{M}$. And $\lambda(Y) \le \lambda(Z) = 0$.

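    We say that $f : \mathbb{R} \to \mathbb{R}$ is measurable if the set $\{x \in \mathbb{R} \mid f(x) > a\}$ is measurable for every $a \in \mathbb{R}$. We say $f$ is simple if $f$ is measurable and its range is finite. For a measurable $f$ such that $f \ge 0$ define the Lebesgue integral of $f$ on a measurable $A \subset \mathbb{R}$ as

    $$\int_A f(x)\,dx = \sup\left\{ \sum_{n=1}^{k} c_n\, \lambda(A \cap s^{-1}(c_n)) \;\middle|\; s \text{ simple},\ 0 \le s \le f \right\}$$

    where $\operatorname{range} s = \{c_1, \ldots, c_k\}$. We say a non-negative measurable function is integrable on $A$ if its integral is finite. If $f$ takes negative values then let $f^+ = \max(f, 0)$ and $f^- = -\min(f, 0)$, and define the integral as

    $$\int_A f(x)\,dx = \int_A f^+(x)\,dx - \int_A f^-(x)\,dx,$$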

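    provided both $f^+$ and $f^-$ are integrable, in which case we call $f$ integrable. For $f : \mathbb{R} \to \mathbb{C}$ we say $f$ is measurable when its real and imaginary parts are measurable. For real measurable $f$ and $a \ge 0$ we have $\{x \in \mathbb{R} \mid f(x) > a\} \cup \{x \in \mathbb{R} \mid f(x) < -a\} = \{x \in \mathbb{R} \mid |f(x)| > a\}$. So $|f|$ is also measurable. It follows immediately that $|f|$ is integrable whenever $f$ is, as

    $$\int_A |f(x)|\,dx = \int_A f^+(x)\,dx + \int_A f^-(x)\,dx.$$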

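    Continuous functions are measurable. Continuous functions are integrable over compact sets. Sums of integrable functions are integrable, as is the product of an integrable function with a bounded measurable function; the product of two integrable functions need not be integrable in general.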
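The supremum over simple functions in the definition of the integral can be approached from below by an explicit sequence of simple functions. As a hedged illustration, the sketch below takes $f(x) = x^2$ on $A = [0, 1]$ and the simple functions $s = \lfloor 2^n f \rfloor / 2^n$; this particular choice of $f$ and of the approximating sequence is an assumption made for the example, and the measure of each level set is computed by hand as an interval length.

```python
import math

def lower_simple_sum(level):
    """Integral of the simple function s = floor(2**level * f) / 2**level.

    Here f(x) = x**2 on A = [0, 1].  The value c = j / 2**level is taken
    on the interval [sqrt(c), sqrt(c + 2**-level)), whose Lebesgue measure
    is just its length.
    """
    steps = 2 ** level
    total = 0.0
    for j in range(steps):
        c = j / steps
        measure = math.sqrt((j + 1) / steps) - math.sqrt(c)
        total += c * measure
    return total

sums = [lower_simple_sum(level) for level in range(1, 13)]
for level, value in enumerate(sums, start=1):
    print(level, value)
```

The printed values increase with the level and stay below $1/3$, the value of the integral, which they approach to within $2^{-\text{level}}$.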

    4.2 Hilbert Spaces

    A Hilbert space $H$ is a vector space over the scalar field $\mathbb{C}$, equipped with a conjugate symmetric, linear, and positive-definite inner product $\langle \cdot | \cdot \rangle : H \times H \to$


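  • $\mathbb{C}$ such that $\|\cdot\| : H \to \mathbb{R}^+_0$ with $\|v\| = \langle v | v \rangle^{\frac{1}{2}}$ is a norm, and that this norm defines a metric $d(v, w) = \|v - w\|$ under which $H$ is Cauchy complete. We call a set $A \subset H$ orthonormal if we have that for any pair of vectors $v, w \in A$

    $$\langle v | w \rangle = \begin{cases} 1 & \text{if } v = w \\ 0 & \text{if } v \ne w. \end{cases}$$

    We say that $v, w \in H$ are orthogonal and write $v \perp w$ whenever $\langle v | w \rangle = 0$. Observe that $\|v + w\|^2 = \|v\|^2 + \|w\|^2$ whenever $v \perp w$.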

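    The span of a set $A$ is the closure in $H$ under the metric topology of all finite linear combinations of elements of $A$. That is

    $$\operatorname{span} A = \operatorname{clos}_d \left\{ \sum_{n=0}^{N} \alpha_n a_n \;\middle|\; N \in \mathbb{Z}^+,\ \alpha : \mathbb{Z}^+ \to \mathbb{C},\ a : \mathbb{Z}^+ \to A \right\}.$$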

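    We say that a set $A \subset H$ is linearly independent if for every $a \in A$, $a \notin \operatorname{span}(A \setminus \{a\})$. A linearly independent set which spans $H$ is called a basis. An orthonormal basis is an especially useful mathematical tool. We will state a number of useful facts of countable orthonormal bases without proof or asserting their existence in general. Suppose that $A = \{a_n \mid a : J_A \subset \mathbb{Z} \to H\}$ is an orthonormal basis of $H$ and $B = \{b_n \mid b : J_B \subset \mathbb{Z} \to H\}$ is an orthonormal set. Then:

    1. If $z \in \ell^2(\mathbb{Z})$ then $\sum_{n \in J_B} z(n) b_n \in \operatorname{span} B$.

    2. If $v \in H$ has $\langle a_n | v \rangle = 0$ for all $n \in J_A$ then $v = 0$.

    3. If $v \in H$ and $z : \mathbb{Z} \to \mathbb{C}$ has $z(n) = \langle a_n | v \rangle$ if $n \in J_A$ and $z(n) = 0$ otherwise, then $z \in \ell^2(\mathbb{Z})$.

    4. If $v \in H$ then $v = \sum_{n \in J_A} \langle a_n | v \rangle a_n$.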
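Facts 3 and 4 can be seen at work in a finite-dimensional example. The sketch below uses $\mathbb{C}^4$ with the normalised discrete Fourier vectors as orthonormal basis; the choice of space, basis and test vector is an assumption made purely for illustration, and the inner product is taken conjugate-linear in its first slot to match the $\langle a_n | v \rangle$ notation.

```python
import cmath

def inner(a, v):
    """<a|v>, conjugate-linear in the first slot, linear in the second."""
    return sum(x.conjugate() * y for x, y in zip(a, v))

N = 4
# Normalised discrete Fourier vectors: one convenient orthonormal basis of C^4.
basis = [[cmath.exp(2j * cmath.pi * n * k / N) / N ** 0.5 for k in range(N)]
         for n in range(N)]

v = [1 + 2j, -0.5j, 3.0 + 0j, 2 - 1j]
coeffs = [inner(a, v) for a in basis]            # z(n) = <a_n|v>
rebuilt = [sum(coeffs[n] * basis[n][k] for n in range(N)) for k in range(N)]

energy_v = inner(v, v).real                      # ||v||^2
energy_z = sum(abs(z) ** 2 for z in coeffs)      # ||z||^2 in l^2
print(max(abs(x - y) for x, y in zip(rebuilt, v)), energy_v, energy_z)
```

The reconstruction error is at rounding level and the two energies agree, which is the finite-dimensional shadow of the coefficient sequence $z$ lying in $\ell^2(\mathbb{Z})$ and of the expansion in fact 4.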

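    One important result of a general Hilbert space is the Cauchy-Schwarz inequality, which asserts that $|\langle v | w \rangle| \le \|v\|\|w\|$. The case $w = 0$ is trivial, so let $w \ne 0$. To see this observe that $f : \mathbb{C} \to \mathbb{R}^+_0$ with $f(t) = \|v - tw\|^2 \ge 0$. Then $f(t) = \|v\|^2 - t\langle v | w \rangle - \bar{t}\langle w | v \rangle + |t|^2\|w\|^2$. Then

    $$f\left( \frac{\langle w | v \rangle}{\|w\|^2} \right) = \|v\|^2 - \frac{|\langle w | v \rangle|^2}{\|w\|^2} \ge 0$$

    and the result follows.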
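The argument above can be traced numerically. The following sketch, which assumes random vectors in $\mathbb{C}^5$ as a stand-in for a general Hilbert space, evaluates $f$ at the value of $t$ used in the text and compares the two sides of the inequality; all names are hypothetical.

```python
import random

def inner(a, b):
    """<a|b>, conjugate-linear in the first slot, linear in the second."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def f(t, v, w):
    """f(t) = ||v - t w||^2, which is never negative."""
    diff = [x - t * y for x, y in zip(v, w)]
    return inner(diff, diff).real

rng = random.Random(0)

def random_vector(n):
    """A vector in C^n with real and imaginary parts drawn from [-1, 1]."""
    return [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]

v, w = random_vector(5), random_vector(5)
t_star = inner(w, v) / inner(w, w).real        # the value of t used in the text
gap = inner(v, v).real - abs(inner(w, v)) ** 2 / inner(w, w).real

print(f(t_star, v, w), gap)
print(abs(inner(v, w)), (inner(v, v).real * inner(w, w).real) ** 0.5)
```

The first line prints the same non-negative number twice, and on the second line the modulus of the inner product does not exceed the product of the norms.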

    4.3 $L^2(\mathbb{R})$ as a Hilbert Space

    In fact

    L2(A) =

    {f : A C

    A

    |f(t)|2dt

  • Theorem 5. $L^2(\mathbb{R})$ is a Hilbert space.

    Lemma 6. $L^2(\mathbb{R})$ is a vector space over the scalar field of $\mathbb{C}$.

    Proof. Here we take the obvious addition and scalar multiplication $(+) : L^2(\mathbb{R}) \times L^2(\mathbb{R}) \to L^2(\mathbb{R})$ and $(\cdot) : \mathbb{C} \times L^2(\mathbb{R}) \to L^2(\mathbb{R})$ where $(f + g)(x) = f(x) + g(x)$ and $(\alpha \cdot f)(x) = \alpha f(x)$.

    We first show that these operations are well defined with respect to the equivalence classes of $\sim$. Let $f_1 \sim f_2$ and $g_1 \sim g_2$ be functions of $L^2(\mathbb{R})$ equal except on measure zero sets $Z_f$ and $Z_g$ respectively. Clearly $\alpha f_1(x) = \alpha f_2(x)$ except on $Z_f$, so $\alpha f_1 \sim \alpha f_2$. We have $f_1(x) + g_1(x) = f_2(x) + g_2(x)$ except possibly on $Z_f \cup Z_g$, which is also of measure zero. Then $f_1 + g_1 \sim f_2 + g_2$.

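    We next demonstrate that these binary operations are closed on $L^2(\mathbb{R})$. The closure of $L^2(\mathbb{R})$ under $(\cdot)$ is implied by the linearity of the integral. We next observe that if $f, g \in L^2(\mathbb{R})$ then

    $$0 \le \int_{\mathbb{R}} \big( |f(x)| - |g(x)| \big)^2 dx = \int_{\mathbb{R}} |f(x)|^2 dx - 2\int_{\mathbb{R}} |f(x)g(x)|\,dx + \int_{\mathbb{R}} |g(x)|^2 dx$$

    implies

    $$0 \le \int_{\mathbb{R}} |f(x)g(x)|\,dx \le \frac{1}{2}\int_{\mathbb{R}} |f(x)|^2 dx + \frac{1}{2}\int_{\mathbb{R}} |g(x)|^2 dx$$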

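  • Naturally $\|f\| = 0$ if and only if $f(x) = 0$ a.e. in $\mathbb{R}$. The linearity of the integral implies that for any $\alpha \in \mathbb{C}$ we have $\|\alpha f\| = |\alpha|\|f\|$. Observe further that

    $$\begin{aligned}
    0 &\le \int_{\mathbb{R}} \int_{\mathbb{R}} |f(x)g(y) - f(y)g(x)|^2 \,dx\,dy \\
      &= \int_{\mathbb{R}} \int_{\mathbb{R}} |f(x)|^2|g(y)|^2 + |f(y)|^2|g(x)|^2 - f(x)\overline{g(x)}\,\overline{f(y)}g(y) - f(y)\overline{g(y)}\,\overline{f(x)}g(x) \,dx\,dy \\
      &= 2\|f\|^2\|g\|^2 - 2\langle f | g \rangle \langle g | f \rangle \\
      &= 2\|f\|^2\|g\|^2 - 2|\langle f | g \rangle|^2,
    \end{aligned}$$

    with the final result showing the frequently useful Cauchy-Schwarz inequality:

    $$|\langle f | g \rangle| \le \|f\|\|g\|.$$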

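    The final property of the norm we are interested in is the triangle inequality. Observe that

    $$\begin{aligned}
    \|f + g\| &= \left( \int_{\mathbb{R}} |f(x)|^2 + |g(x)|^2 + \overline{f(x)}g(x) + f(x)\overline{g(x)} \,dx \right)^{\frac{1}{2}} \\
              &= \left( \|f\|^2 + \|g\|^2 + \langle f | g \rangle + \langle g | f \rangle \right)^{\frac{1}{2}} \\
              &\le \left( \|f\|^2 + \|g\|^2 + 2\|f\|\|g\| \right)^{\frac{1}{2}} \\
              &= \|f\| + \|g\|.
    \end{aligned}$$

    Note that the triangle inequality, the obvious symmetry, and the positive values of $d(f, g) = \|f - g\|$ make it a metric on $L^2(\mathbb{R})$, where $d(f, g) = 0$ if and only if $f \sim g$.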

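    We will require the following helpful result to show that this metric is Cauchy complete on $L^2(\mathbb{R})$.

    Lemma 7 (Monotone Convergence Theorem). Let $(f_n : \mathbb{R} \to \mathbb{R}^+_0 \cup \{\infty\} \mid n \in \mathbb{Z}^+)$ be a sequence of non-negative, measurable functions such that if $i \le j$ then $f_i(x) \le f_j(x)$ almost everywhere in $\mathbb{R}$. Then $f : \mathbb{R} \to \mathbb{R}^+_0 \cup \{\infty\}$ defined almost everywhere by $f(x) = \lim_{n \to \infty} f_n(x)$ is measurable and

    $$\lim_{n \to \infty} \int_{\mathbb{R}} f_n(x)\,dx = \int_{\mathbb{R}} f(x)\,dx.$$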

    Proof. We first claim that $f$ is measurable. Let $Z \subset \mathbb{R}$ be the set such that $x \in Z$ if $(f_n(x) \mid n \in \mathbb{Z}^+)$ is not monotone. By hypothesis, $Z$ has 0 measure, and $f_n$ is measurable for every $n \in \mathbb{Z}^+$. This means that for every $a \in \mathbb{R}$ the set $\{x \in \mathbb{R} \mid f_n(x) > a\}$ is measurable. Because by Theorem 4 the $\sigma$-algebra of measurable sets is closed under complements, intersections, and unions, we see that $\{x \in \mathbb{R} \mid f_n(x) > a\} \cap Z^c$ is measurable. In $Z^c$ we see $f(x) \ge f_n(x)$ for every $n$, so

    $$\{x \in Z^c \mid f(x) > a\} \supset \bigcup_{n \in \mathbb{Z}^+} \{x \in Z^c \mid f_n(x) > a\}.$$


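  • We will see the reverse inclusion is also true. We have in $Z^c$ that $f(x) = \sup\{f_n(x) \mid n \in \mathbb{Z}^+\}$. Then if $f(x) > a$ there must be some $n$ with $f_n(x) > a$, or $a$ would be an upper bound of $\{f_n(x) \mid n \in \mathbb{Z}^+\}$. Hence

    $$\{x \in Z^c \mid f(x) > a\} = \bigcup_{n \in \mathbb{Z}^+} \{x \in Z^c \mid f_n(x) > a\}$$

    so $\{x \in Z^c \mid f(x) > a\}$ is measurable for every $a \in \mathbb{R}$. As $\{x \in Z \mid f(x) > a\} \subset Z$, and every subset of a 0 measure set is measurable, we have that $\{x \in Z \mid f(x) > a\}$ is measurable, hence

    $$\{x \in \mathbb{R} \mid f(x) > a\} = \{x \in Z^c \mid f(x) > a\} \cup \{x \in Z \mid f(x) > a\}$$

    is measurable for every $a \in \mathbb{R}$. Then $f$ is a measurable function.

    Observe that as $\left( \int_{\mathbb{R}} f_n(x)\,dx \mid n \in \mathbb{Z}^+ \right)$ is a monotone sequence (possibly infinite) bounded above by $\int_{\mathbb{R}} f(x)\,dx$ (possibly infinite), $\left( \int_{\mathbb{R}} f_n(x)\,dx \mid n \in \mathbb{Z}^+ \right)$ must converge to some value $\alpha \in \mathbb{R} \cup \{\infty\}$:

    $$\alpha = \lim_{n \to \infty} \int_{\mathbb{R}} f_n(x)\,dx \le \int_{\mathbb{R}} f(x)\,dx.$$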

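    If $\alpha = \infty$ or $\int_{\mathbb{R}} f_n(x)\,dx = \infty$ for any $n \in \mathbb{Z}^+$, then $\lim_{n \to \infty} \int_{\mathbb{R}} f_n(x)\,dx = \int_{\mathbb{R}} f(x)\,dx$. Otherwise, let $s$ be any simple, measurable function with $0 \le s(x) \le f(x)$ in $Z^c$, and fix some $\varepsilon \in (0, 1)$. Let $A_n = \{x \in Z^c \mid \varepsilon s(x) \le f_n(x)\}$. Because $(f_n(x) \mid n \in \mathbb{Z}^+)$ is monotone in $Z^c$, it follows that $A_n \subset A_{n+1}$. For any $x \in Z^c$, either $f(x) = f_1(x) = s(x) = 0$ and $x \in A_1$, or $f(x) > \varepsilon s(x)$ and there is some $n \in \mathbb{Z}^+$ such that $f_n(x) \ge \varepsilon s(x)$. So $Z^c = \bigcup_{n \in \mathbb{Z}^+} A_n$. We see that $\varepsilon s(x) \le f_n(x)$ on $A_n$ implies

    $$\int_{A_n} \varepsilon s(x)\,dx \le \int_{A_n} f_n(x)\,dx \le \int_{\mathbb{R}} f_n(x)\,dx. \tag{4.5}$$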

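    We take the limit as $n \to \infty$ on both sides of equation (4.5). Because the integral is a countably additive set function and $(A_n \mid n \in \mathbb{Z}^+)$ is non-decreasing we have that

    $$\lim_{n \to \infty} \int_{A_n} \varepsilon s(x)\,dx = \int_{\bigcup_{n \in \mathbb{Z}^+} A_n} \varepsilon s(x)\,dx.$$

    And as $\mathbb{R}$ and $\bigcup_{n \in \mathbb{Z}^+} A_n$ differ only by $Z$ we have

    $$\int_{\mathbb{R}} \varepsilon s(x)\,dx = \int_{\bigcup_{n \in \mathbb{Z}^+} A_n} \varepsilon s(x)\,dx.$$

    Recall that the integral is linear, and equation (4.5) becomes

    $$\varepsilon \int_{\mathbb{R}} s(x)\,dx \le \lim_{n \to \infty} \int_{\mathbb{R}} f_n(x)\,dx.$$

    As $\varepsilon$ is arbitrary we may take $\varepsilon \to 1$ to obtain

    $$\int_{\mathbb{R}} s(x)\,dx \le \lim_{n \to \infty} \int_{\mathbb{R}} f_n(x)\,dx.$$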


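  • By taking the supremum over all non-negative simple functions less than $f$ almost everywhere, we arrive at the definition of the integral on the left, so

    $$\int_{\mathbb{R}} f(x)\,dx \le \lim_{n \to \infty} \int_{\mathbb{R}} f_n(x)\,dx.$$

    Together with the reverse inequality obtained earlier in the proof, this gives equality.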

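    Theorem 8. $L^2(\mathbb{R})$ is Cauchy complete.

    Proof. Let $(f_n)$ be a Cauchy sequence of functions in $L^2(\mathbb{R})$. Because the sequence is Cauchy we can find a subsequence $(f_{n_k})$ such that $\|f_{n_k} - f_{n_{k+1}}\| < \frac{1}{2^k}$. Then for any $g \in L^2(\mathbb{R})$ we have by the Cauchy-Schwarz inequality that

    $$\begin{aligned}
    \|g\| \ge \sum_{k=1}^{\infty} \|g\|\,\|f_{n_k} - f_{n_{k+1}}\| &\ge \sum_{k=1}^{\infty} \langle |g| \,|\, |f_{n_k} - f_{n_{k+1}}| \rangle \\
      &= \sum_{k=1}^{\infty} \int_{\mathbb{R}} |g(x)||f_{n_k}(x) - f_{n_{k+1}}(x)|\,dx \\
      &= \int_{\mathbb{R}} |g(x)| \sum_{k=1}^{\infty} |f_{n_k}(x) - f_{n_{k+1}}(x)|\,dx
    \end{aligned}$$

    by the Monotone Convergence Theorem. But this holds for arbitrary $g \in L^2(\mathbb{R})$. Let $[a, b] \subset \mathbb{R}$ and let $g$ be the indicator on the closed interval $[a, b]$. As $\|g\| = (b - a)^{\frac{1}{2}} \ge \int_{[a,b]} \sum_{k=1}^{\infty} |f_{n_k}(x) - f_{n_{k+1}}(x)|\,dx$ it must be that $\sum_{k=1}^{\infty} |f_{n_k}(x) - f_{n_{k+1}}(x)|$ is finite almost everywhere on $[a, b]$. As $[a, b]$ is arbitrary, it must be that $\sum_{k=1}^{\infty} |f_{n_k}(x) - f_{n_{k+1}}(x)|$ is finite almost everywhere on $\mathbb{R}$. If $\sum_{k=1}^{\infty} |f_{n_k}(x) - f_{n_{k+1}}(x)|$ converges, then for any $\varepsilon > 0$ there is some $N \in \mathbb{Z}^+$ such that

    $$\sum_{k=N}^{\infty} |f_{n_k}(x) - f_{n_{k+1}}(x)| < \varepsilon,$$

    so if $j \ge i \ge N$ we have

    $$|f_{n_i}(x) - f_{n_j}(x)| \le \sum_{k=i}^{j-1} |f_{n_k}(x) - f_{n_{k+1}}(x)| < \varepsilon.$$

    So for almost every $x \in \mathbb{R}$, $(f_{n_k}(x))_{k \in \mathbb{Z}^+}$ is a Cauchy sequence in $\mathbb{C}$. Then by the Cauchy completeness of $\mathbb{C}$ we can define $f(x) = \lim_{k \to \infty} f_{n_k}(x)$ wherever it converges (pointwise) and $f(x) = 0$ everywhere else (which is at most some measure zero set). We claim that $f_n \to f$ in $L^2(\mathbb{R})$.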


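  • Fix some $\varepsilon > 0$. There is some $N \in \mathbb{Z}^+$ with the property that if $n, m \ge N$ then $\|f_m - f_n\| < \varepsilon$. Let $n_k > N$. Consider $g_j(x) = \inf\{|f_{n_i}(x) - f_{n_k}(x)|^2 \mid i \ge j\}$. Then

    $$\|f - f_{n_k}\| = \left( \int_{\mathbb{R}} |f(x) - f_{n_k}(x)|^2 dx \right)^{\frac{1}{2}} = \left( \int_{\mathbb{R}} \lim_{j \to \infty} g_j(x)\,dx \right)^{\frac{1}{2}}.$$

    We have for almost every $x$ in $\mathbb{R}$ that $0 \le g_1(x) \le g_2(x) \le \ldots \le |f(x) - f_{n_k}(x)|^2$ and $g_j(x) \to |f(x) - f_{n_k}(x)|^2$, so by the Monotone Convergence Theorem we have:

    $$\begin{aligned}
    \|f - f_{n_k}\| &= \left( \lim_{j \to \infty} \int_{\mathbb{R}} g_j(x)\,dx \right)^{\frac{1}{2}} \\
      &= \left( \lim_{j \to \infty} \int_{\mathbb{R}} \inf\{|f_{n_i}(x) - f_{n_k}(x)|^2 \mid i \ge j\}\,dx \right)^{\frac{1}{2}} \\
      &\le \left( \lim_{j \to \infty} \inf_{i \ge j} \int_{\mathbb{R}} |f_{n_i}(x) - f_{n_k}(x)|^2 dx \right)^{\frac{1}{2}}.
    \end{aligned}$$

    Since $\|f_{n_i} - f_{n_k}\| < \varepsilon$ whenever $n_i, n_k > N$, the right-hand side is at most $\varepsilon$. Then $f - f_{n_k} \in L^2(\mathbb{R})$, and as $f_{n_k} \in L^2(\mathbb{R})$ we have by the additive closure of $L^2(\mathbb{R})$ that $f \in L^2(\mathbb{R})$. We also see that for any $n > N$ there is some $n_k > n$ as defined above such that $\|f - f_n\| \le \|f - f_{n_k}\| + \|f_{n_k} - f_n\| < 2\varepsilon$. As $\varepsilon$ is arbitrary it must be that $f_n \to f$.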
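The first step of the proof, passing to a subsequence whose consecutive terms are closer than $2^{-k}$, can be illustrated on an explicit Cauchy sequence. The sketch below assumes the sequence $f_n = \sum_{j \le n} e_j / j$ for some orthonormal sequence $(e_j)$, so that all distances reduce to sums of $1/j^2$; this example and the function names are assumptions chosen for illustration, not taken from the text.

```python
import math

def distance(n, m):
    """||f_m - f_n|| for n < m, where f_n = sum_{j<=n} e_j / j."""
    return math.sqrt(sum(1.0 / j ** 2 for j in range(n + 1, m + 1)))

def rapid_subsequence(levels):
    """Pick indices n_k with sup over m > n_k of ||f_m - f_{n_k}|| < 2**-k."""
    total = math.pi ** 2 / 6          # sum of 1/j**2 over all j >= 1
    indices, n, partial = [], 0, 0.0
    for k in range(1, levels + 1):
        # Advance n until the whole tail beyond n is shorter than 2**-k.
        while math.sqrt(max(total - partial, 0.0)) >= 2.0 ** -k:
            n += 1
            partial += 1.0 / n ** 2
        indices.append(n)
    return indices

indices = rapid_subsequence(6)
for k, (a, b) in enumerate(zip(indices, indices[1:]), start=1):
    print(k, a, b, distance(a, b), 2.0 ** -k)
```

Each printed distance between consecutive chosen terms lies below the bound $2^{-k}$ in the last column, so the sum of these distances is finite, as the proof requires.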


  • Bibliography

    [1] Charles K. Chui. An introduction to wavelets. Academic Press Professional, Inc., San Diego, CA, USA, 1992.

    [2] Michael Frazier. An introduction to wavelets through linear algebra. Springer, 1999.

    [3] Walter Rudin. Principles of mathematical analysis. McGraw-Hill Book Co., New York, third edition, 1976. International Series in Pure and Applied Mathematics.

    [4] Terence Tao. An epsilon of room. I: Real analysis. Pages from year three of a mathematical blog. Graduate Studies in Mathematics 117. Providence, RI: American Mathematical Society (AMS), 2010.
