functional analysis i autumn term 2008 - warwickmasdh/fa1_2008.pdffunctional analysis i autumn term...

Functional Analysis I

Autumn Term 2008

James C. Robinson

Introduction

I hope that these notes will be useful. They are, of course, much more wordy

than the notes you will have taken in lectures, but the maths itself is usually

done in a little more detail and should generally be ‘tighter’. You may find

that the order in which the material is presented is a little different to the

lectures, but this should make things more coherent.

Solutions to the examples sheets will follow separately.

I hope that there are relatively few mistakes, but if you find yourself

staring at something thinking that it must be wrong then it most likely is,

so do email me at [email protected]. I will post a list of errata

as and when people find them on my webpage for the course,

www.maths.warwick.ac.uk/ jcr/FAI.

These notes will form the basis of the first part of a textbook on functional

analysis, so any general comments would also be welcome.

iii

Contents

1 Vector spaces and bases page 1

2 Norms and normed spaces 8

2.1 Norms and normed spaces 8

2.2 Convergence 16

3 Compactness and equivalence of norms 20

3.1 Compactness 20

4 Completeness 26

4.1 The completion of a normed space 31

5 Lebesgue integration 36

5.1 Integrals of functions in Lstep(R) 36

5.2 Increasing sequences of functions in Lstep(R): Linc(R) 39

5.3 The space L1(R) of integrable functions 43

5.4 The Lebesgue spaces Lp 46

6 Inner product spaces 48

6.1 Inner products and norms 49

iv

Contents v

6.2 The Cauchy-Schwarz inequality 49

6.3 The relationship between inner products and their norms 51

7 Orthonormal bases in Hilbert spaces 54

7.1 Orthonormal sets 54

7.2 Convergence and orthonormality in Hilbert spaces 56

7.3 Orthonormal bases in Hilbert spaces 62

8 Closest points and approximation 65

8.1 Closest points in convex subsets 65

8.2 Linear subspaces and orthogonal complements 66

8.3 Best approximations 69

9 Separable Hilbert spaces and ℓ2 73

10 Linear maps between Banach spaces 78

10.1 Bounded linear maps 78

10.2 Kernel and range 83

11 The Riesz representation theorem and the adjoint operator 85

11.1 Linear operators from H into H 91

12 Spectral Theory I: General theory 93

12.1 Spectrum and point spectrum 93

13 Spectral theory II: compact self-adjoint operators 99

13.1 Complexification and real eigenvalues 99

13.2 Compact operators 101

14 Sturm-Liouville problems 110

1

Vector spaces and bases

In all that follows we use K to denote R or C, although one can

define vector spaces over arbitrary fields.

Rn is the simplest and most natural example of a vector space. We give a

formal definition, but it is the closure property inherent in the definitions,

f + λg ∈ V, f, g ∈ V, λ ∈ K = R or C,

that one usually has to check.

Definition 1.1 A vector space V over K is a set V with operations + :

V × V → V and ∗ : K × V → V such that

• additive and multiplicative identities exist: there exists a zero element

0 ∈ V such that x + 0 = x for all x ∈ V ; and 1 ∈ K is the identity

for scalar multiplication, 1 ∗ x = x for all x ∈ V ;

• there are additive inverses: for every x ∈ V there exists an element

−x ∈ V such that x+ (−x) = 0;

• addition is commutative and associative, x+y = y+x and x+(y+z) =

(x+ y) + z for all x, y, z ∈ V ;

• multiplication is associative

α ∗ (β ∗ x) = (αβ) ∗ x for all α, β ∈ K, x ∈ V

and distributive

α ∗ (x+ y) = α ∗ x+ α ∗ y and (α+ β) ∗ x = α ∗ x+ β ∗ x

for all α, β ∈ K, x, y ∈ V .

1

2 Vector spaces and bases

The multiplication operator ∗ can usually be understood and so we gen-

erally drop this notation.

As remarked above we will only consider K = R or C here, and will refer to

real or complex vector spaces respectively. Generally we will omit the word

‘real’ or ‘complex’ unless wishing to make an explicit distinction between

real and complex vector spaces.

Examples: Rn is a real vector space over R; it is not a vector space over

C (since i ∗x /∈ Rn for any x ∈ Rn); Cn is a vector space over both R and C

(so if we take K = R the space Cn can be thought of, somewhat unnaturally,

as a ‘real vector space’).

Example 1.2 For 1 ≤ p < ∞ define the space ℓp(K) of all pth power

summable sequences with elements in K (recall that K = R or C):

ℓp(K) = x = (x1, x2, . . .) : xj ∈ K,

∞∑

j=1

|xj |p < +∞.

For p = ∞, ℓ∞(K) is the space of all bounded sequences in K.

For x, y ∈ ℓp(K) set

x+ y = (x1 + y1, x2 + y2, . . .),

and for α ∈ K, x ∈ ℓp, define

αx = (αx1, αx2, . . .).

With these definitions ℓp(K) is a vector space. The only issue is whether

x+ y is still in ℓp(K); for 1 ≤ p <∞ this follows since

n∑

j=1

|xj + yj|p ≤n∑

j=1

2p(|xj |p + |yj |p) ≤ 2p∞∑

j=1

|xj |p + 2p∞∑

j=1

|yj |p < +∞,

and for p = ∞ it is clear.

Sometimes we will simply write ℓp for ℓp(R).

Example 1.3 The space C0([0, 1]) of all real-valued continuous functions

on the interval [0, 1] is a vector space with the obvious definitions of addition

and scalar multiplication, which we give here for the one and only time: for

Vector spaces and bases 3

f, g ∈ C0([0, 1]) and α ∈ R, we denote by f + g the function whose values

are given by

(f + g)(x) = f(x) + g(x), x ∈ [0, 1],

and by αf the function whose values are

(αf)(x) = α f(x), x ∈ [0, 1].

Example 1.4 Denote by L1(0, 1) the set of all real-valued continuous func-

tions on (0, 1) for which∫ 1

0|f(x)|2 dx < +∞.

Then L1(0, 1) is a vector space (with the obvious definitions of addition and

scalar multiplication).

The only thing to check here is that f + λg ∈ L1(0, 1) whenever f, g ∈L1(0, 1) and λ ∈ R. Clearly f + λg ∈ C0(0, 1), and we have

∫ 1

0|(f + λg)(x)|2 dx =

∫ 1

0|f(x)| + |λ||g(x)|dx

≤(∫ 1

0|f(x)|dx+ |λ|

∫ 1

0|g(x)|dx

)

< +∞.

Note that if f ∈ C0([0, 1]) then, since it is a continuous function on a

closed bounded interval, it is bounded and attains its bounds. It follows

that for some M ≥ 0, |f(x)| ≤M for all x ∈ [0, 1], and so∫ 1

0|f(x)|dx ≤M <∞,

i.e. f ∈ L1(0, 1).

But while the function f(x) = x−1/2 is not continuous on [0, 1], it is

continuous on (0, 1) and∫ 1

0|x−1/2|dx =

[

2x1/2]1

0= 2 <∞,

so f ∈ L1(0, 1). These two examples show that C0([0, 1]) is a strict subset

of L1(0, 1).

We now discuss spanning sets, linear independence, and bases. Note


that the definitions – and the following arguments – also apply to infinite-

dimensional spaces. In particular the result of Lemma 1.9 is valid for infinite-

dimensional spaces.

Definition 1.5 The linear span of a subset E of a vector space V is the

collection of all finite linear combinations of elements of E:

Span(E) = v ∈ V : v =n∑

j=1

αjej , n ∈ N, αj ∈ K, ej ∈ E.

We say that E spans V if V = Span(E), i.e. every element of v can be

written as a finite linear combination of elements of E.

Note that this definition requires v to be expressed as a finite linear com-

bination of elements of E. When discussing bases for abstract vector spaces

with no additional structure the only option is to take finite linear combina-

tions, since these are defined using only the axioms for a vector space (scalar

multiplication and addition of vector space elements). In order to take infi-

nite linear combinations we require some way to discuss convergence, which

is not available in a general vector space.

Definition 1.6 A set E is linearly independent if any finite collection of

elements of E is linearly independent

n∑

j=1

αjej = 0 ⇒ α1 = · · · = αn = 0

for any choice of n ∈ N, αj ∈ K, and ej ∈ E.

Definition 1.7 A Hamel basis for V is an linearly independent spanning

set.

Expansions in terms of basis elements are unique:

Lemma 1.8 If E is a Hamel basis for V then any element of V can be

written uniquely in the form

v =

n∑

j=1

αjej

for some n ∈ N, αj ∈ K, and ej ∈ E.


For a proof see Linear Algebra.

If E is a linearly independent set that spans V then it is not possible to

find an element of V that can be added to E to obtain a larger linearly

independent set (otherwise E would not span V ). We now show that this

can be reversed.

Lemma 1.9 If E ⊂ V is maximal linearly independent set, i.e. a linearly

independent set E such that E ∪ v is not linearly independent for any

v ∈ V \ E. Then E is a Hamel basis for V .

Proof Suppose that E does not span V : in particular take v ∈ V that

cannot be written as any finite linear combination of elements of E. To

obtain a contradiction, choose n ∈ N and ejnj=1 with ej ∈ E, and suppose

thatn∑

j=1

αjej + αn+1v = 0.

Since v cannot be written as a sum of any finite collection of the ej, we

must have αn+1 = 0, which leaves∑n

j=1 αjej = 0. However, ej is a finite

subset of E and is thus linearly independent by assumption, and so we must

have αj = 0 for all j = 1, . . . , n + 1. But this says that E ∪ v is linearly

independent, a contradiction. So E spans V .

We recall here the following fundamental theorem:

Theorem 1.10 Suppose that V has a basis consisting of a finite number of

elements. Then every basis of V contains the same number of elements.

This allows us to make the following definition:

Definition 1.11 If V has a basis consisting of a finite number of elements

then the dimension of V is the number of elements in this basis. If V has

no finite basis then V is infinite-dimensional.

Since a Hamel basis is a maximal linearly independent set (Lemma 1.9),

it follows that a space is infinite-dimensional iff for every n ∈ N one can find

a set of n linearly independent elements of V .


Example 1.12 For any n ∈ N the n elements in ℓ2(K) given by

(1, 0, 0, 0, . . .), (0, 1, 0, 0, . . .), (0, 0, ..., 0, 1, 0, . . .),

i.e. elements e(j) with e(j)j = 1 and e

(j)i = 0 if i 6= j, are linearly independent.

It follows that ℓ2(K) is an infinite-dimensional vector space.

Example 1.13 Consider the functions fn ∈ C0([0, 1]), where fn is zero for

x /∈ In = [2−n−2−(n+2), 2−n+2−(n+2)] and interpolates linearly between the

values

fn(2−n − 2−(n+2)) = 0 fn(2

−n) = 1 fn(2−n + 2−(n+2)) = 0.

The intervals In where fn 6= 0 are disjoint, but f(2−n) = 1. It follows that

for any n the fjnj=1 are linearly independent, and so C0([0, 1]) is infinite-

dimensional.

We end this section with the following powerful-looking theorem:

Theorem 1.14 Every vector space has a Hamel basis.

The proof makes use of Zorn’s Lemma. In order to state this result (which

in fact is more of an axiom, since it is equivalent to the axiom of choice) we

need to introduce some auxiliary concepts.

A partial order on a set P is a binary relation on P such that

(i) a a for all a ∈ P ,

(ii) a b and b a implies that a = b, and

(iii) a b and b c implies that a c.

The order is ‘partial’ because two arbitrary elements of P need not be or-

dered: consider for example, the case when P consists of all subsets of R

and X Y if X ⊆ Y ; one cannot order [0, 1] and [1, 2].

A subset C of P in which all elements can be ordered is called a chain

(i.e. for all a, b ∈ C, either a b or b a (or both, in which case a = b)).

An element b ∈ P is an upper bound for a subset S of P if s b for all

s ∈ P . An element m of P is maximal if m a for some a ∈ P implies that

a = m.


Lemma 1.15 (Zorn’s Lemma) If every chain in a non-empty partially

ordered set P has an upper bound, then P has at least one maximal element.

We can now give the proof of Theorem 1.14.

Proof If V is finite-dimensional then V has a finite-dimensional basis, by

definition.

So assume that V is infinite-dimensional. Let P be the collection of all

linearly independent subsets of V . Let E1 E2 if E1 ⊆ E2. Let C =

Eii∈I be a chain in P , where I is an index set, and let

E∗ =⋃

i∈I

Ei.

Note that E∗ is linearly independent, since any finite collection of elements

of E∗ must be contained in one of the Ei (which is linearly independent).

Clearly Ei E∗ for all i ∈ I , so E∗ is an upper bound for C.

It follows from Zorn’s Lemma that P has a maximal element, i.e. a max-

imal linearly independent set. By Lemma 1.9 this is a basis for V .

Exercise 1.16 Show that the space ℓf consisting of all sequences that contain

only finitely many non-zero terms is a vector space. Show that

ej = (0, . . . , 0, 1, 0, . . .)

(all zeros except for a single 1 in the jth position) is a Hamel basis for ℓf .

This is a very artificial example. No Banach space (a particularly nice

kind of vector space, see later) can have a countable Hamel basis. We will

show that no Hilbert space can have a countable Hamel basis in Chapter *.

2

Norms and normed spaces

2.1 Norms and normed spaces

Definition 2.1 A norm on a vector space V is a map ‖ · ‖ : V → R such

that for all x, y ∈ V and α ∈ K

(i) ‖x‖ ≥ 0 with equality iff x = 0;

(ii) ‖αx‖ = |α|‖x‖; and

(iii) ‖x+ y‖ ≤ ‖x‖ + ‖y‖ (‘the triangle inequality’).

A vector space equipped with a norm is called a normed space.

Strictly a normed space should be written (V, ‖ · ‖V ) where ‖ · ‖V is the

particular norm on V . However, many normed spaces have standard norms,

and so often the norm is not specified. E.g. unless otherwise stated, Rn is

equipped with the standard norm

‖x‖ =

n∑

j=1

|xj |2

1/2

. (2.1)

However, others are possible, such as

‖x‖∞ = maxi=1,...,n

|xi| and ‖x‖1 =∑

i

|xi|.

Exercise 2.2 Show that ‖ · ‖, ‖ · ‖∞, and ‖ · ‖1 are all norms on Rn.

8


Example 2.3 For 1 ≤ p <∞ the standard norm on ℓp(K) is

‖x‖ℓp =

∞∑

j=1

|xj |p

1/p

,

where x = (x1, x2, x3, . . .). For p = ∞ we set

‖x‖ℓ∞ = supj

|xj |.

Note that when p = 2 and K = R this is the natural extension of the standard

Euclidean norm to a countable collection of real numbers.

We now show that this really does define a norm. It is clear that ‖x‖ℓp ≥ 0,

and that if the norm is zero that x = 0. It is also clear that

‖λx‖ℓp =

∞∑

j=1

|λxj|p

1/p

=

∞∑

j=1

|λ|p|xj|p

1/p

= |λ|

∞∑

j=1

|xj |p

1/p

= |λ|‖x‖ℓp .(This requirement explains why we have to take the pth root of the sum of

pth powers.)

It is the triangle inequality that requires some work. Although the ar-

gument is a little long, on the way we will prove two auxiliary – and very

useful – inequalities. We say that (p, q) with 1 < p, q <∞ are conjugate if

1

p+

1

q= 1; (2.2)

we extend to definition to cover the case p = 1, q = ∞.

Lemma 2.4 (Young’s inequality) Let a, b > 0 and (p, q) conjugate with

1 < p, q <∞. Then

ab ≤ ap

p+bq

q. (2.3)

Proof Consider the function

f(t) =tp

p+

1

q− t.

Thendf

dt= tp−1 − 1,


Note that for p = q = 2 we obtain the Cauchy-Schwarz inequality for

x, y ∈ ℓ2,

∞∑

j=1

|xjyj| ≤

∞∑

j=1

|xj|2

1/2

∞∑

j=1

|yj|2

1/2

.

Choosing x = (x1, . . . , xn, 0, 0, · · · ) and y = (y1, . . . , yn, 0, 0, · · · ) we recover

the Cauchy-Schwarz inequality in Rn,

n∑

j=1

|xjyj| ≤

n∑

j=1

|xj|2

1/2

n∑

j=1

|yj|2

1/2

, (2.5)

which is just the familiar inequality for dot products |x · y| ≤ |x||y|.

Finally we can prove the triangle inequality for the ℓp norm:

Lemma 2.6 (Minkowski’s inequality) Let x, y ∈ ℓp, 1 ≤ p ≤ ∞. Then

x+ y ∈ ℓp, and

‖x+ y‖ℓp ≤ ‖x‖ℓp + ‖y‖ℓp . (2.6)

Proof For p = ∞ the triangle inequality is clear, since

nmaxj=1

|xj + yj| ≤n

maxj=1

|xj | +n

maxj=1

|yj |,

and the proof follows by taking limits. Similarly for p = 1,

n∑

j=1

|xj + yj| ≤n∑

j=1

|xj | +n∑

j=1

|yj |,

and again one takes the limit as n→ ∞.

For 1 ≤ p < ∞, we have already noted that x + y ∈ ℓp. Now let q be

the conjugate exponent to p, i.e. q = p/(p − 1) (so that 1/p + 1/q = 1). It

follows that (p − 1)q = p, and therefore, since x + y ∈ ℓp, the sequence z


with zj = |xj + yj |p−1 ∈ ℓq. So we can write

n∑

j=1

|xj + yj|p ≤n∑

j=1

|xj + yj|p−1(|xj | + |yj |)

=

n∑

j=1

|xj + yj|p−1|xj | +n∑

j=1

|xj + yj|p−1|yj |

≤∞∑

j=1

|xj + y + j|p−1|xj | +n∑

j=1

|xj + y + j|p−1|yj |

≤

∞∑

j=1

|xj + y +j |(p−1)q

1/q

(‖x‖ℓp + ‖y‖ℓp)

=

n∑

j=1

|xj + yj|p

1/q

(‖x‖ℓp + ‖y‖ℓp),

i.e.

n∑

j=1

|xj + yj |p

1−1/q

≤ ‖x‖ℓp + ‖y‖ℓp .

Since 1 − (1/q) = 1/p one can now let n→ ∞ to obtain (2.6).

One of our main concerns in what follows will be normed spaces that

consist of functions. For example, the following are norms on C0([0, 1]), the

space of all continuous functions on [0, 1]: the ‘sup(remum) norm’,

‖f‖∞ = supx∈[0,1]

|f(x)|

(convergence in this norm is equivalent to uniform convergence) and the L1

and L2 norms

‖f‖L1 =

∫ 1

0|f(x)| dx and ‖f‖L2 =

(∫ 1

0|f(x)|2 dx

)1/2

.

Note that of the three candidates here the L2 norm looks most like the

expression (2.1) for the familiar norm in Rn.

Lemma 2.7 ‖ · ‖L1 is a norm on C0([0, 1]).


Proof The only part that requires much thought is (i), to make sure that

‖f‖L1 = 0 iff f = 0. So suppose that f 6= 0. Then |f(y)| = δ > 0 for some

y ∈ (0, 1) (if f(0) 6= 0 or f(1) 6= 0 it follows from continuity that f(y) 6= 0

for some y ∈ (0, 1)). Since f is continuous, there exists an ǫ > 0 such that

for any x ∈ (0, 1) with |x− y| < ǫ we have

|f(x) − f(y)| < δ/2.

If necessary, reduce ǫ so that [y − ǫ, y + ǫ] ∈ (0, 1). Then∫ 1

0|f(x)|2 dx ≥

∫ y+ǫ

y−ǫ|f(x)|2 dx ≥

∫ y+ǫ

y−ǫ

δ

2dx = ǫδ > 0.

(ii) and (iii) are clear since

‖αf‖L1 =

∫

|αf(x)|dx = |α|∫

|f(x)|dx = |α|‖f‖L1

and

‖f + g‖L1 =

∫

|f(x) + g(x)|dx ≤∫

|f(x)| + |g(x)|dx ≤ ‖f‖L1 + ‖g‖L1 .

For ‖ · ‖L2 (i) and (ii) are the same as above; we will see (iii) below as a

consequence of the Cauchy-Schwarz inequality for inner products.

Definition 2.8 Two norms ‖·‖1 and ‖·‖2 on a vector space V are equivalent

– we write ‖ · ‖1 ∼ ‖ · ‖2 – if there exist constants 0 < c1 ≤ c2 such that

c1‖x‖1 ≤ ‖x‖2 ≤ c2‖x‖1 for all x ∈ V.

It is clear that the above notion of ‘equivalence’ is reflexive. It is also

transitive:

Lemma 2.9 Suppose that ‖ · ‖1, ‖ · ‖2, and ‖ · ‖3 are all norms on a vector

space V , and that ‖ · ‖2 and ‖ · ‖3 are both equivalent to ‖ · ‖1. Then ‖ · ‖2

and ‖ · ‖3 are equivalent.

Proof There exist constants 0 < α1 ≤ α2 and 0 < β1 ≤ β2 such that

α1‖x‖2 ≤ ‖x‖1 ≤ α2‖x‖2 and β1‖x‖3 ≤ ‖x‖1 ≤ β2‖x‖3,

and so

‖x‖2 ≥ α−12 ‖x‖1 ≥ β1α

−12 ‖x‖3


and

‖x‖2 ≤ α−11 ‖x‖1 ≤ β2α

−11 ‖x‖3,

i.e. β1α−12 ‖x‖3 ≤ ‖x‖2 ≤ β2α

−11 ‖x‖3 and ‖ · ‖2 and ‖ · ‖3 are equivalent.

Exercise 2.10 Show that the norms ‖ · ‖, ‖ · ‖1, and ‖ · ‖∞ on Rn are all

equivalent.

This is a particular case of the general result that all norms on a finite-

dimensional vector space are equivalent, which we will prove in the following

chapter. As part of this proof, the following proposition – which shows that

one can always find a norm on a finite-dimensional vector space – will be

useful.

Proposition 2.11 Let V be an n-dimensional vector space, and E = ejnj=1

a basis for V . Define a map ‖ · ‖E : V → [0,∞) by

∥

∥

∥

∥

∥

∥

n∑

j=1

αjej

∥

∥

∥

∥

∥

∥

E

=

n∑

j=1

|αj |2

1/2

(taking the positive square root). Then ‖ · ‖E is a norm on V .

Proof First, note that any v ∈ V can be written uniquely as v =∑

j αjej ,

so the map v 7→ ‖v‖E is well-defined. We check that ‖ ·‖E satisfies the three

requirements of a norm:

(i) clearly ‖v‖E ≥ 0, and if ‖v‖E = 0 then v =∑

αjej with∑ |αj |2 = 0¡

i.e. αj = 0 for j = 1, . . . , n, and so v = 0.

(ii) If v =∑

j αjej then λv =∑

j(λαj)ej , and so

‖λv‖2E =

∑

j

|λαj |2 = |λ|2∑

j

|αj |2 = |λ|2‖v‖2E .

(iii) For the triangle inequality, if u =∑

j αjej and v =∑

j βjej then,


using the Cauchy-Schwarz inequality (2.5)

‖u+ v‖2E =

∥

∥

∥

∥

∥

∥

∑

j

(αj + βj)ej

∥

∥

∥

∥

∥

∥

2

=∑

j

|αj + βj |2

=∑

j

|αj |2 + αjβj + αjβj + |βj |2

= ‖u‖2E +

∑

j

αjβj +∑

j

αjβj + ‖v‖2E

≤ ‖u‖2E +

∑

j

|αj |2

1/2

∑

j

|βj |2

1/2

+ ‖v‖2E

= ‖u‖2E + 2‖u‖E‖v‖E + ‖v‖2

E

= (‖u‖E + ‖v‖E)2,

i.e. ‖u+ v‖E ≤ ‖u‖E + ‖v‖E .

We now want to show that with ‖·‖E norm, any finite-dimensional normed

space is ‘the same’ as Rn. For two objects to be ‘the same’ we generally

require an isomorphism that also preserves the essentially structures of the

objects. Here we want to say that two linear spaces, along with their norms,

are ‘the same’. So we will need the isomorphism to be linear (so that ϕ(x)+

ϕ(y) = ϕ(x+y)) and we will also want to preserve the norm (‘an isometry’).

Definition 2.12 Two normed spaces (X, ‖ · ‖X) and (Y, ‖ · ‖Y ) are isomet-

rically isomorphic, or simply isometric, if there exists a linear isomorphism

ϕ : X → Y that is also an isometry, i.e.

‖ϕ(x)‖Y = ‖x‖X for all x ∈ X.

Corollary 2.13 With the norm defined in Proposition 2.11, (V, ‖ · ‖E) is

isometrically isomorphic to (Rn, | · |).


Proof For v ∈ (V, ‖ · ‖E) with v =∑n

j=1 αjej , define

ϕ(

n∑

j=1

αjej) = (α1, . . . , αn),

which has a well-defined inverse

ϕ−1(α1, . . . , αn) =

n∑

j=1

αjej.

It is clear that ϕ is one-to-one and onto, that ϕ (and its inverse) are linear,

and it follows directly from the definition of ‖ · ‖E that |ϕ(x)| = ‖x‖E for all

x ∈ V .

2.2 Convergence

In a normed space we can measure the distance between x and y using

‖x− y‖. So we can define notions of convergence and continuity using this

idea of distance:

Definition 2.14 A sequence xk∞k=1 in a normed space X converges to a

limit x ∈ X if for any ǫ > 0 there exists an N such that

‖xk − x‖ < ǫ for all n ≥ N.

This definition is sensible in that limits are unique:

Exercise 2.15 Show that the limit of a convergent sequence is unique.

The following result shows that if xn → x then the norm of xn converges

to the norm of x. This will turn out to be a very useful observation.

Lemma 2.16 If xn → x in (X, ‖ · ‖) then ‖xn‖ → ‖x‖.

Proof The triangle inequality gives

‖xn‖ ≤ ‖x‖ + ‖xn − x‖ and ‖x‖ ≤ ‖xn‖ + ‖x− xn‖which implies that

∣

∣

∣‖xn‖ − ‖x‖

∣

∣

∣≤ ‖xn − x‖.

2.2 Convergence 17

Two equivalent norms give rise to the same notion of convergence:

Lemma 2.17 Suppose that ‖ · ‖1 and ‖ · ‖2 are two equivalent norms on a

space X. Then

‖xn − x‖1 → 0 iff ‖xn − x‖2 → 0,

i.e. convergence in one norm is equivalent to convergence in the other, with

the same limit.

The proof of this lemma is immediate from the definition of the equiva-

lence of norms, since there exist constants 0 < c1 ≤ c2 such that

c1‖xn − x‖1 ≤ ‖xn − x‖2 ≤ c2‖xn − x‖1;

Using convergence we can also define continuity:

Definition 2.18 A map f : (X, ‖ · ‖X) → (Y, ‖ · ‖Y ) is continuous if

xn → x in X ⇒ f(xn) → f(x) in Y,

i.e. if

‖xn − x‖X → 0 ⇒ ‖f(xn) − f(x)‖Y → 0.

Exercise 2.19 Show that this is equivalent to the ǫ–δ definition of continu-

ity: for each x ∈ X, for every ǫ > 0 there exists a δ > 0 such that

‖y − x‖X < ǫ ⇒ ‖f(y) − f(x)‖Y < δ.

Lemma 2.17 has an immediate implication for continuity:

Corollary 2.20 Suppose that ‖ · ‖X,1 ∼ ‖ · ‖X,2 are two equivalent norms

on a space X, and ‖ · ‖Y,1 ∼ ‖ · ‖Y,2 are two equivalent norms on a space

Y . Then a function f : (X, ‖ · ‖X,1) → (Y, ‖ · ‖Y,1) is continuous iff it is

continuous as a map from (X, ‖ · ‖X,2) into (Y, ‖ · ‖Y,2).

We remarked above that all norms on a finite-dimensional space are equiv-

alent, which means that there is essentially only one notion of ‘convergence’

and of ‘continuity’. But in infinite-dimensional spaces there are distinct


1

fk

12

12− 1

k12

+ 1k

k2 − 1

1k

1k+1

1k−1

gk

Fig. 2.1. (a) definition of fk and (b) definition of gk.

norms, and the different notions of convergence implied by the norms we

have introduced for continuous functions are not equivalent, as we now show.

First we note that convergence of fk ∈ C0([0, 1]) to f in the supremum

norm, i.e.

supx∈[0,1]

|fk(x) − f(x)| → 0 (2.7)

implies that fk → f in the L1 norm, since clearly

∫ 1

0|fk(y)−f(y)|dy ≤

∫ 1

0

(

supx∈[0,1]

|fk(x) − f(x)|)

dy = supx∈[0,1]

|fk(x)−f(x)|.

This inequality should make very clear the advantage of the shorthand norm

notation, since it just says

‖fk − f‖L1 ≤ ‖fk − f‖∞.It is also clear that if (2.7) holds then fk(x) → f(x) for each x ∈ [0, 1], which

is ‘pointwise convergence’. However, neither pointwise convergence nor L1

convergence imply uniform convergence:

Example 2.21 Consider the sequence of functions fk as illustrated in

Figure 2.2(a). Then fk → 0 in the L1 norm, since

‖fk − 0‖L1 = ‖fk‖L1 =1

k.

However, fk 6→ 0 pointwise, since fk(12) = 1 for all k.

2.2 Convergence 19

Example 2.22 Consider the sequence of functions gk as illustrated in

Figure 2.2(b). Then fk → 0 pointwise, but

‖fk‖L1 = 12(k2 − 1)

[(

1

k − 1− 1

k

)

+

(

1

k− 1

k + 1

)]

= 1,

so fk 6→ 0 in the L1 norm.

3

Compactness and equivalence of norms

3.1 Compactness

One fundamental property of the real numbers is expressed by the Bolzano-

Weierstrass Theorem:

Theorem 3.1 (Bolzano-Weierstrass) A bounded sequence of real num-

bers has a convergent subsequence.

This can easily be generalised to sequences in Rn:

Corollary 3.2 (Bolzano-Weierstrass in Rn) A bounded sequence in Rn

has a convergent subsequence.

Proof Let x(k) = (x(k)1 , . . . , x

(k)n be a bounded sequence in Rn. Since

x(k)1 is a bounded sequence in R, there is a subsequence x(k1,j) for which

x(k1,j)1 converges. Since x(k1,j) is again a bounded sequence in Rn, x

(k1,j)2 is

a bounded sequence in R. We can therefore find a subsequence x(k2,j) of

x(k1,j ) such that x(k2,j)2 converges. Since x(k2,j) is a subsequence of x(k1,j),

x(k2,j)1 still converges. We can continue this process inductively to obtain a

subsequence x(kn,j) such that all the x(kn,j)i for i = 1, . . . , n converge.

We now make two definitions:

20

3.1 Compactness 21

Definition 3.3 A subset X of a normed space (V, ‖ · ‖) is bounded if there

exists an M > 0 such that

‖x‖ ≤M for all x ∈ X.

Note that if ‖·‖1 ∼ ‖·‖2 are two equivalent norms on V then X is bounded

wrt ‖ · ‖1 iff it is bounded wrt ‖ · ‖2.

Definition 3.4 A subset X of a normed space (V, ‖·‖) is closed if whenever

a sequence xn with xn ∈ X converges to some x, we must have x ∈ X.

Note that if ‖ · ‖1 ∼ ‖ · ‖2 are two equivalent norms on V then X is closed

wrt ‖ · ‖1 iff it is closed wrt ‖ · ‖2.

Example: any closed interval in R is closed in this sense. Any product of

closed intervals in closed in Rn.

Exercise 3.5 Show that if (X, ‖ · ‖) is a normed space then the unit ball

BX([0, 1]) = x ∈ X : ‖x‖ ≤ 1and the unit sphere

SX = x ∈ X : ‖x‖ = 1are both closed.

Definition 3.6 A subset K of a normed space (V, ‖ · ‖) is compact if any

sequence xn with xn ∈ K has a convergent subsequence xnj→ x∗ with

x∗ ∈ K.

Note that if ‖·‖1 ∼ ‖·‖2 are two equivalent norms on V then X is compact

wrt ‖ · ‖1 iff it is compact wrt ‖ · ‖2.

Two properties of compact sets are easy to prove:

Theorem 3.7 A compact set is closed and bounded.

Proof Let K be a compact set in (V, ‖ · ‖) and xn → x with xn ∈ K. Since

K is compact xn has a convergent subsequence; its limit must also be x,

and from the definition of compactness x ∈ K, and so K is closed.

Suppose that K is not bounded. Then for each n ∈ N there exists an

22 Compactness and equivalence of norms

xn ∈ K such that ‖xn‖ ≥ n. But xn must have a convergent subsequence,

and any convergent sequence is bounded, which yields a contradiction.

It follows from the Bolzano-Weierstrass theorem that any closed bounded

set K in Rn is compact: A sequence in a bounded subset K of Rn has a

convergent subsequence by Corollary 3.2; since K is closed by definition this

subsequence converges to an element of K. So K is compact. We have

therefore shown:

Theorem 3.8 A subset of Rn is compact iff it is closed and bounded.

We will see later that this characterisation does not hold in infinite-

dimensional spaces (and this is one way to characterise such spaces).

We now prove two fundamental results about continuous functions on

compact sets:

Theorem 3.9 Suppose that K ⊂ (X, ‖ · ‖X) is compact and that f is a

continuous map from (X, ‖ · ‖X) into (Y, ‖ · ‖Y ). Then f(K) is a compact

subset of (Y, ‖ · ‖Y ).

Proof Let yn ∈ f(K). Then yn = f(xn) for some xn ∈ K. Since

xn ∈ K, and K is compact there is a subsequence of xn that converges,

xnj→ x∗ ∈ K. Since f is continuous it follows that as j → ∞

ynj= f(xnj

) → f(x∗) = y∗ ∈ f(K),

i.e. the subsequence ynjconverges to some y∗ ∈ f(K), and so f(K) is

compact.

From which follows:

Proposition 3.10 Let K be a compact subset of (X, ‖·‖). Then any contin-

uous function f : K → R is bounded and attains its bounds, i.e. there exists

an M > 0 such that |f(x)| ≤ M for all x ∈ K, and there exist x, x ∈ K

such that

f(x) = infx∈K

f(x) and f(x) = supx∈K

f(x).

3.1 Compactness 23

Proof Since f is continuous and K is compact, f(K) is a compact subset

of R, i.e. f(K) is closed and bounded. It follows that

f = supy∈f(K)

y ∈ f(K),

and so f = f(x) for some x ∈ K. [That sup(S) ∈ S for any closed S is clear,

since for each n there exists an sn ∈ S such that sn > sup(S) − 1/n. Since

sn ≤ sup(S) by definition, sn → sup(S), and it follows from the fact that S

is closed that sup(S) ∈ S.] The argument for x is identical.

This allows one to prove the equivalence of all norms on a finite-dimensional

space.

Theorem 3.11 Let V be a finite-dimensional vector space. Then all norms

on V are equivalent.

Proof Let E = ejnj=1 be a basis for V , and let ‖ · ‖E be the norm on V

defined in Proposition 2.11. Let ‖ · ‖ be another norm on V . We will show

that ‖ · ‖ is equivalent to ‖ · ‖E . Since equivalence of norms is an equivalence

relation, this will imply that all norms on V are equivalent.

Now, if u =∑

j αjej then

‖u‖ =

∥

∥

∥

∥

∥

∥

∑

j

αjej

∥

∥

∥

∥

∥

∥

≤∑

j

|αj |‖ej‖ (using the triangle inequality)

≤

∑

j

|αj |2

1/2

∑

j

‖ej‖2

1/2

(using the Cauchy-Schwarz inequality)

= CE‖u‖E ,

where C2E =

∑

j ‖ej‖2, i.e. CE is a constant that does not depend on u.

Now, observe that this estimate ‖u‖ ≤ CE‖u‖E implies for u, v ∈ V ,

‖u− v‖ ≤ CE‖u− v‖E ,

and so the map u 7→ ‖u‖ is continuous from (V, ‖ · ‖E) into R.

Now, note that set

SV = u ∈ V : ‖u‖E = 1

24 Compactness and equivalence of norms

is the image of Sn = α ∈ Rn : |α| = 1 under the map α 7→ ∑nj=1 αjej .

Since by definition∥

∥

∥

∥

∥

∥

n∑

j=1

αjej

∥

∥

∥

∥

∥

∥

E

= |α|,

this map is continuous. Since Sn is closed and bounded, it is a compact

subset of Rn; since SV is the image of Sn under a continuous map, it is also

compact.

Therefore the map v 7→ ‖v‖ is bounded on SV , and attains its bounds. In

particular, there exists an a ≥ 0 such that

‖v‖ ≥ a for every v ∈ V with ‖v‖E = 1.

Since the bound is attained, there exists a v ∈ SV such that ‖v‖ = a. If

a = 0 then ‖v‖ = 0, i.e. v = 0. But since v ∈ SV we have ‖v‖E = 1, and

so v cannot be zero. It follows that a > 0. Then for an arbitrary v ∈ V , we

have v/‖v‖E ∈ SV , and so∥

∥

∥

∥

v

‖v‖E

∥

∥

∥

∥

≥ a ⇒ ‖v‖ ≥ a‖v‖E .

Combining this with ‖u‖ ≤ CE‖u‖E shows that ‖·‖ and ‖·‖E are equivalent.

Corollary 3.12 A subset of a finite-dimensional normed space V is compact

iff it is closed and bounded.

Proof Let K be a closed bounded subset of (V, ‖ · ‖). Then K is also a

closed bounded subset of (V, ‖ · ‖E), since ‖ · ‖ ∼ ‖ · ‖E .

The map ϕ : (V, ‖ · ‖E) → Rn defined by

ϕ

∑

j

αjej

= (α1, . . . , αn)

is an isometry: ϕ and ϕ−1 are continuous, and |ϕ(x)| = ‖x‖E .

Since |ϕ(x)| = ‖x‖E and K is bounded, it follows that ϕ(K) is bounded.

Since ϕ−1 is continuous andK is closed, it follows that ϕ(K) is closed. To see

this, take yn ∈ ϕ(K), and suppose that yn → y∗. Since ϕ−1 is continuous,

it follows that

ϕ−1(yn) → ϕ−1(y∗),

3.1 Compactness 25

and since ϕ−1(yn) ∈ K and K is closed it follows that ϕ−1(y∗) ∈ K. In

particular, therefore, y∗ = ϕ(ϕ−1(y∗)) ∈ ϕ(K) and ϕ(K) must be closed.

So ϕ(K) is a closed bounded subset of Rn, and hence compact. ϕ−1 is

continuous, so K = ϕ−1(ϕ(K)) is the continuous image of a compact set,

and hence a compact subset of (V, ‖ · ‖E). Since ‖ · ‖ ∼ ‖ · ‖E , it follows that

K is a compact subset of (V, ‖ · ‖) as required.

4

Completeness

In the treatment of convergent sequences of real numbers, one natural ques-

tion is whether there is a way to characterise which sequences converge

without knowing their limit. The answer, of course, is yes, and is given by

the notion of a Cauchy sequence.

Theorem 4.1 A sequence of real numbers xn∞n=1 converges if and only if

it is a Cauchy sequence, i.e. given any ǫ > 0 there exists an N such that

|xn − xm| < ǫ for all n,m ≥ N.

Note that the proof makes use of the Bolzano-Weierstrass Theorem, so

is in some way entangled with compactness properties of closed bounded

subsets of R.

A sequence in a normed space (X, ‖ · ‖) is Cauchy if given any ǫ > 0 there

exists an N such that

‖xn − xm‖ < ǫ for all n,m ≥ N.

Lemma 4.2 Any Cauchy sequence is bounded.

Proof There exists an N such that

‖xn − xm‖ < 1 for all n,m ≥ N.

It follows that in particular ‖xn‖ ≤ ‖xN‖+ 1 for all n ≥ N , and hence ‖xn‖is bounded.

26

Completeness 27

Definition 4.3 A normed space X is complete if any Cauchy sequence in

X, converges to some x ∈ X. A complete normed space is called a Banach

space.

Theorem 4.1 states that R with its standard norm is complete (‘R is a

Banach space’). It follows fairly straightforwardly that the same is true for

any finite-dimensional normed space.

Theorem 4.4 Every finite-dimensional normed space (V, ‖ · ‖) (over R or

C) is complete.

Proof Choose a basis E = (e1, . . . , en) of V , and define another norm ‖ · ‖Eon V by

∥

∥

∥

∥

∥

∥

n∑

j=1

xjej

∥

∥

∥

∥

∥

∥

E

=

n∑

j=1

|xj|2

1/2

.

Since all norms on V are equivalent (Theorem 3.11), a sequence xk that

is Cauchy in ‖ · ‖ is Cauchy in ‖ · ‖E .

Writing xk =∑n

j=1 xkj ej it follows that given any ǫ > 0 there exists an Nǫ

such that for k, l ≥ Nǫ

‖xk − xl‖2E =

n∑

j=1

|xkj − xlj |2 < ǫ2. (4.1)

In particular xkj is a Cauchy sequence of real numbers for each fixed j =

1, . . . , n. It follows that for each j = 1, . . . , n we have xnj → x∗j for some x∗j .

Set x∗ =∑n

j=1 x∗jej .

Letting l → ∞ in (4.1) shows that

‖xk − x∗‖2E =

n∑

j=1

|xkj − x∗j |2 ≤ ǫ2 for all n ≥ Nǫ,

i.e. xn → x∗ wrt ‖ · ‖E . It follows that xn → x∗ wrt ‖ · ‖, and clearly x∗ ∈ V ,

and so V is complete.

Note that in particular Rn is complete.

The completeness of ℓp is a little more delicate, but only in the final steps.

28 Completeness

Proposition 4.5 (Completeness of ℓp) The sequence space ℓp(K) (equipped

with its standard norm) is complete.

Proof Suppose that xk = (xk1 , xk2 , · · · ) is a Cauchy sequence in ℓp(K). Then

for every ǫ > 0 there exists an Nǫ such that

‖xn − xm‖pℓp =∞∑

j=1

|xnj − xmj |p < ǫp for all n,m ≥ Nǫ. (4.2)

In particular xkj ∞k=1 is a Cauchy sequence in K for every fixed j. Since K

is complete (recall K = R or C) it follows that for each k ∈ N

xkj → ak

for some ak ∈ R.

Set a = (a1, a2, · · · ). We want to show that a ∈ ℓp and that ‖xk−a‖ℓp → 0

as k → ∞. First, since xk is Cauchy we have from (4.2) that ‖xn−xm‖ℓp <ǫ for all n,m ≥ Nǫ, and so in particular for any N ∈ N

N∑

j=1

|xnj − xmj |p ≤∞∑

j=1

|xnj − xmj |p ≤ ǫp.

Letting m→ ∞ we obtain

N∑

j=1

|xnj − aj|p ≤ ǫp,

and since this holds for all N it follows that∞∑

j=1

|xnj − aj|p ≤ ǫp,

and so xk − a ∈ ℓp. But since ℓp is a vector space and xk ∈ ℓp, this implies

that a ∈ ℓp and ‖xk − a‖ℓp ≤ ǫ.

Since the norm on ℓ2 is the natural generalisation of the norm on Rn,

and since it is complete, it is tempting to think that ℓ2 will behave just like

Rn. However, it does not have the ‘Bolzano-Weierstrass property’ (bounded

sequences have a convergent subsequence) as we can see easily by considering

the sequence ej∞j=1, where ej consists entirely of zeros apart from a 1 in

the jth position. Then clearly ‖ej‖ℓ2 = 1 for all j; but if i 6= j then

‖ei − ej‖2ℓ2 = 2,

Completeness 29

i.e. any two elements of the sequence are always√

2 away from each other.

It follows that no subsequence of the ej can form a Cauchy sequence, and

so there cannot be a convergent subsequence.

This is really the first time we have seen a significant difference between Rn

and the abstract normed vector spaces that we have been considering. The

failure of the Bolzano-Weierstrass property is in fact a defining characteristic

of infinite-dimensional spaces.

Theorem 4.6 C0([0, 1]) equipped with the sup norm ‖ · ‖∞ is complete.

Proof Let fk be a Cauchy sequence in C0([0, 1]): so given any ǫ > 0 there


supx∈[0,1]

|fn(x) − fm(x)| < ǫ for all n,m ≥ N. (4.3)

In particular fk(x) is a Cauchy sequence for each fixed x, so fk(x) con-

verges for each fixed x ∈ [0, 1]: define

f(x) = limk→∞

fk(x).

We need to show that in fact fk → f uniformly. But this follows since for

every x ∈ [0, 1] we have from (4.3)

|fn(x) − fm(x)| < ǫ for all n,m ≥ N,

where N does not depend on x. Letting m→ ∞ in this expression we obtain

|fn(x) − f(x)| < ǫ for all n ≥ N,

where again N does not depend on x. It follows that

supx∈[0,1]

|fn(x) − f(x)| < ǫ for all n ≥ N,

i.e. fn converges uniformly to f on [0, 1]. Completeness of C0([0, 1]) then

follows from the fact that the uniform limit of a sequence of continuous

functions is still continuous.

For this reason the supremum norm is the ‘standard norm’ on C0([0, 1]);

if no norm is mentioned this is the norm that is intended.

Example 4.7 C0([0, 1]) equipped with the L1 norm is not complete.

30 Completeness

1

1

12

12− 1

k

0

fk

Fig. 4.1. Definition of fk.

Consider the sequence of functions fk defined by

fk(x) =

0 0 ≤ x ≤ 12 − 1

k

k[

x−(

12 − 1

k

)]

12 − 1

k < x < 12

1 12 ≤ x ≤ 1,

see Figure 4.

This sequence is Cauchy in the L1 norm, since for n,m ≥ N we have

fn(x) = fm(x) for all x < 12 − 1

N, x > 1

2 ,

and so

‖fn − fm‖L1 =

∫ 1

0|fn(x) − fm(x)|dx =

∫12

12−

1

N

|fn(x) − fm(x)|dx ≤ 1

N,

(4.4)

since |fn(x) − fm(x)| ≤ 1 for all x ∈ [0, 1] and all n,m ∈ N.

But what is the limit, f(x), as n→ ∞? Clearly one would expect f to be

the pointwise limit,

f(x) =

0 0 ≤ x < 12

1 12 ≤ x ≤ 1,


but this is certainly not continuous. It is not a priori obvious that fn → f

(in the L1 norm), but this is easy to check,

‖fk − f‖L1 =

∫ 1

0|fk(x) − f(x)|dx

=

∫12

12−

1

k

|k(x− 12 + 1

k )|dx

=1

2k→ 0 as k → ∞.

So this sequence converges in the L1 norm but not the sup norm.

4.1 The completion of a normed space

However, every normed space has a completion, i.e. a minimal set V such

that V ⊃ V and (V , ‖ · ‖) is a Banach space. Essentially V consists of all

limit points of Cauchy sequences in V (and in particular, therefore, contains

a copy of V via the constant sequence vn = v ∈ V ).

This implies that for any v ∈ V there exist vn ∈ V such that vn → v in

the norm ‖ · ‖; we say that V is dense in V .

Definition 4.8 Let (V, ‖ · ‖) be a normed space. Then X is dense in V if

for any v ∈ V and any ǫ > 0 there is an x ∈ X such that

‖x− v‖ < ǫ.

This is equivalent to the fact that given any v ∈ V there exists a sequence

xn ∈ X such that

‖xn − v‖ → 0 as n→ ∞.

This is the particularly useful form of ‘density’: if X is dense in V one can

often deduce properties of V by approximating them with elements of X.

Example 4.9 R is the completion of Q in the standard norm on R.

Exercise 4.10 Recall that we defined ℓf to be the set of all sequences in

which only a finite numbers of terms are non-zero. Show that ℓf is dense in

ℓ2.

32 Completeness

The description of the completion of (V, ‖ · ‖) above is not strictly cor-

rect. Clearly it must be missing some subtleties, since we are ‘adding’ to V

elements that are not in V and hence, in the setting of a general abstract

normed space (V, ‖ · ‖V ), are not defined.

A completion of (V, ‖ · ‖V ) is a Banach space (X, ‖ · ‖X) that contains an

isometric image of (V, ‖ · ‖V ) that is dense in X. One can show that there

is ‘only one’ completion in that any two candidates must be isometric:

Theorem 4.11 Let (X, ‖ · ‖X) be a normed space. Then there exists a

complete normed space (X , ‖ · ‖X ) and a linear map i : (X, ‖ · ‖X) →(X , ‖ · ‖X ) that is an isometry between X and its image, such that i(X)

is a dense subspace of X . Furthermore X is unique up to isometry; if

(Y , ‖ · ‖Y ) is a complete normed space and j : (X, ‖ · ‖X) → (Y , ‖ · ‖Y ) is

an isometry between X and its image, such that j(X) is a dense subspace

of Y , then X and Y are isometric.

Proof We consider Cauchy sequences in X, writing

x = (x1, x2, . . .) xj ∈ X

for a sequence in X. We say that two Cauchy sequences x and y are equiv-

alent, x ∼ y, if

limn→∞

‖xn − yn‖X = 0.

We let X be the space of equivalence classes of Cauchy sequences in X

(i.e. X = X/ ∼). It is clear that X is a vector space, since the sum of

two Cauchy sequences in X is again a Cauchy sequence in X. We define a

candidate for our norm on X : if η ∈ X then

‖η‖X = limn→∞

‖xn‖X , (4.5)

for any x ∈ η (recall that η is an equivalence class of Cauchy sequences).

Note that (i) if y is a Cauchy sequence in X, then ‖yn‖ forms a Cauchy

sequence in R, so for a particular choice of y ∈ η the right-hand side of (4.5)

exists, and (ii) if x, y ∈ η then∣

∣

∣ limn→∞

‖xn‖ − limn→∞

‖yn‖∣

∣

∣ =∣

∣

∣ limn→∞

‖xn‖ − ‖yn‖∣

∣

∣

= limn→∞

∣

∣

∣‖xn‖ − ‖yn‖

∣

∣

∣

≤ limn→∞

‖xn − yn‖ = 0


since x ∼ y. So the map in (4.5) is well-defined, and it is easy to check that

it satisfies the three requirements of a norm.

Now we define a map i : X → X , by setting

i(x) = [(x, x, x, x, x, x, . . .)].

Clearly i is linear, and an isometry between X and its image. We want to

show that i(X) is a dense subset of X .

For any given η ∈ X , choose some x ∈ η. Since x is Cauchy, for any given

ǫ > 0 there exists an N such that

‖xn − xm‖X < ǫ for all n,m ≥ N.

In particular, ‖xn − xN‖X < ǫ for all n ≥ N , and so

‖η − i(xN )‖X = limn→∞

‖xn − xN‖X < ǫ,

which shows that i(X) is dense in X .

Finally, we have to show that X is complete, i.e. that any Cauchy se-

quence in X converges to another element of X . A Cauchy sequence in X

is a Cauchy sequence of equivalence classes of Cauchy sequences in X! Take

such a Cauchy sequence, η(k)∞k=1. For each k, find xk ∈ X such that

‖i(xk) − η(k)‖X < 1/k,

using the density of i(X) in X . Now let

x = (x1, x2, x3, . . .).

We will show (i) that x is a Cauchy sequence, and so [x] ∈ X , and (ii) that

η(k) converges to [x]. This will show that X is complete.

(i) To show that x is Cauchy, observe that

‖xn − xm‖X = ‖i(xn) − i(xm)‖X

= ‖i(xn) − η(n) + η(n) − η(m) + η(m) − i(xm)‖X

≤ ‖i(xn) − η(n)‖X + ‖η(n) − η(m)‖X + ‖η(m) − i(xm)‖X

≤ 1

n+ ‖η(n) − η(m)‖X +

1

m.

So now given ǫ > 0, choose N such that ‖η(n)−η(m)‖X < ǫ/3 for n,m ≥ N .

If N ′ = max(N, 3/ǫ), it follows that

‖xn − xm‖X < ǫ for all n,m ≥ N ′,

i.e. x is Cauchy. So [x] ∈ X .

34 Completeness

(ii) To show that η(k) → [x], simply observe that

‖[x] − η(k)‖X ≤ ‖[x] − i(xk)‖X + ‖i(xk) − η(k)‖X .

Given ǫ > 0, choose N large enough that ‖xn−xm‖X < ǫ/2 for all n,m ≥ N ,

and then set N ′ = max(N, 2/ǫ). It follows that for k ≥ N ′,

‖[x] − i(xk)‖X = limn→∞

‖xn − xk‖ < ǫ/2

and ‖i(xk) − η(k)‖X < ǫ/2, i.e.

‖[x] − η(k)‖X < ǫ,

and so η(k) → [x].

We will not prove the uniqueness of X here.

The space X in the above theorem is a very abstract one, and we are

fortunate that in most situations there is a more concrete description of the

completion of ‘interesting’ normed spaces.

Definition 4.12 The space L1(0, 1) is the completion of C0([0, 1]) with re-

spect to the L1 norm.

Note that with this definition it is immediate that L1(0, 1) is complete,

and that C0([0, 1]) is dense in L1(0, 1).

What is this space L1(0, 1)? There are a number of possible answers:

• Heuristically, L1(0, 1) consists of all functions that can arise as the limit

(with respect to the L1 norm) of sequences fn ∈ C0([0, 1]).

However, how do we characterise these limits? Certainly L1(0, 1) is larger

than C0([0, 1]) (and larger than C0(0, 1)). We saw above that it contains

functions that are not continuous, and even functions whose values at indi-

vidual points (e.g. x = 12) are not defined.

• Formally, L1(0, 1) is isometrically isomorphic to the equivalence class of

sequences in C0([0, 1]) that are Cauchy in the L1 norm, where fk ∼ gkif

∫ 1

0|fk(x) − gk(x)|dx→ 0 as k → ∞.


This is hardly helpful.

• The space L1(0, 1) consists of all real-valued functions f such that∫ 1

0|f(x)|dx

is finite, where the integral is understood in the sense of Lebesgue integra-

tion. We say that f = g in L1(0, 1) (the functions are essentially ‘the same’

if∫ 1

0|f(x) − g(x)|dx = 0.

This is the most intrinsic definition, and some ways the most ‘useful’.

But note that given this definition it is certainly not obvious that L1(0, 1) is

complete, nor that C0([0, 1]) is dense in L1(0, 1). We will assume these prop-

erties in what follows, but at the risk of over-emphasis: if we use Definition

4.12 to define L1 these properties come for free. If we use the ‘useful’ defi-

nition above there is actually some work to do to check these (which would

be part of a proper development of the Lebesgue integral and corresponding

‘Lebesgue spaces’).

Although we cannot discuss the theory of Lebesgue integration in detail

here, we can give a quick overview of its fundamental features and give

a rigorous definition of the notion of ‘almost everywhere’. Essentially the

Lebesgue integral extends more elementary definitions of the integral in a

mathematically consistent way.

5

Lebesgue integration

We follow the presentation in Priestley (1997), and start the construction of

the Lebesgue integral by defining the integral of simple functions for which

there can be no argument as to the correct definition.

We define the measure (or length) |I| of an interval I = [a, b], (a, b), (a, b],

or [a, b) to be

|I| = b− a.

5.1 Integrals of functions in Lstep(R)

The class Lstep(R) of step functions on R consists of all those functions s(x)

that are piecewise constant on a finite number of intervals, i.e.

s(x) =

n∑

j=1

cjχ[Ij ](x), (5.1)

where cj ∈ R, each Ij is an interval, and χ[A] denotes the characteristic

function of the set A,

χ[A](x) =

1 x ∈ A

0 x /∈ A.

We define the integral of s(x) by

∫

s =

n∑

j=1

cj |Ij |. (5.2)

36

5.1 Integrals of functions in Lstep(R) 37

Even though this definition appears entirely reasonable, note that we have

not specified the nature of the intervals Ij, so the functions

χ(0,1) and χ[0,1]

have the same integral (which is 1); by extension we can change the value

of s at a finite number of points and leave the value of∫

s unchanged.

Most of the results we state below about the integrals of functions in

Lstep(R) rely on the following two observations:

(1) If φ ∈ Lstep(R) then φ can be written as

φ =

n∑

j=1

cjχKj(5.3)

where the intervals Kj are disjoint.

(2) If φ,ψ ∈ Lstep(R) then one can write

φ =n∑

j=1

cjχIj and ψ =n∑

j=1

djχIj ,

i.e. where φ and ψ are expressed using the same choice of intervals (in this

case some of cjs and djs may be zero).

The following properties of step functions are easily proved:

(A) If φ ∈ Lstep(R) then |φ| ∈ Lstep(R)

(B) If φ ∈ Lstep(R) then φ+ = max(φ, 0) and φ− = min(φ, 0) are both in

Lstep(R).

It is tedious but fairly elementary to check that this integral is well-defined

on Lstep(R), so that if s(x) is given by two possible expressions (5.1) then

the integrals in (5.2) agree: to do this one uses the disjoint form in (5.3).

It is also relatively simple to check that this integral satisfies the following

three fundamental properties:

(L) Linearity: if φ,ψ ∈ Lstep(R) and λ ∈ R then φ+ λψ ∈ Lstep(R) and∫

(φ+ λψ) =

∫

φ+ λ

∫

ψ.

38 Lebesgue integration

Proof Use (2) above. Then

φ+ λψ =n∑

j=1

(cj + λdj)χIj ,

and so∫

(φ+ λψ) =

n∑

j=1

(cj + λdj)|Ij | =

n∑

j=1

cj |Ij | + λ

n∑

j=1

dj |Ij | =

∫

φ+ λ

∫

ψ.

(P) Positivity: If φ ∈ Lstep(R) with φ ≥ 0 then∫

φ ≥ 0.

Proof Use (1). Then

φ =

n∑

j=1

cjχKj

is positive iff cj ≥ 0 for each j. It is immediate that∫

φ ≥ 0.

(M) Modulus property: If φ ∈ Lstep(R) then |φ| ∈ Lstep(R) and |∫

φ| ≤∫

|φ|.

Proof Clearly |φ| ± φ ≥ 0, and so, since φ ∈ Lstep(R) implies (see *) that

|φ| ∈ Lstep(R), we can use properties (L) and (P) to give∫

|φ| ±∫

φ =

∫

|φ| ± φ ≥ 0,

from whence

∓∫

φ ≤∫

|φ|,

as required.

(T) Translation invariance: Take φ ∈ Lstep(R). For t ∈ R define φh(x) =

φ(h + x). Then φh ∈ Lstep(R) and∫

φh =∫

φ.

Proof Clearly if

φ =

n∑

j=1

cjϕIj

where Ij = 〈aj , bj〉, then φd is given by

φh =

n∑

j=1

cjϕIj ,


where Ij = 〈aj − h, bj − h〉. It is clear that |Ij | = |Ij |, and so∫

φh =∫

φ.

Note that combining positivity and linearity gives the comparison result

which will be critical in what follows: if φ,ψ ∈ Lstep(R) and φ ≥ ψ then∫

φ ≥∫

ψ.

5.2 Increasing sequences of functions in Lstep(R): Linc(R)

Now, if sn(x) is a monotonically increasing sequence of functions in Lstep(R)

(sn+1(x) ≥ sn(x) for each x ∈ R), then it follows from the comparison

property above that the sequence∫

sn (5.4)

is also monotonically increasing. Provided that the integrals in (5.4) are

uniformly bounded in n,

limn→∞

∫

sn

exists.

We would like to deduce that the functions sn(x) must converge to some

limit, but we do not know that sn(x) is bounded, only that the integral∫

sn is bounded. Nevertheless, we can show that sn(x) converges ‘almost

everywhere’, an idea that we now introduce.

First, we will say that a set A ⊂ R has “measure zero” (‘zero length’) if,

given any ǫ > 0, one can find a (possibly countably infinite) set of intervals

[aj, bj ] that cover A but whose total length is less than ǫ:

A ⊂∞⋃

j=1

[aj , bj ] and∞∑

j=1

(bj − aj) < ǫ.

Exercise 5.1 Show that if Aj has measure zero for all j = 1, . . . then

∞⋃

j=1

Aj

also has measure zero. [Hint:∑∞

j=n+1 2−j = 2−n.]


A property is said to hold ‘almost everywhere’ or ‘for almost every x’ if

the set of points at which it does not hold has measure zero.

Exercise 5.2 Show that if each property Pj , j = 1, 2, . . ., holds almost

everywhere in an interval I then all the Pj hold simultaneously at almost

every point in I.

What we will now show that each monotonic sequence sn(x) with (5.4)

uniformly bounded tends pointwise to a function f(x) almost everywhere,

i.e. except on a set of measure zero.

Theorem 5.3 Let φn be an increasing sequence of step functions (φn+1(x) ≥φn(x)) such that

∫

φn ≤ K for all n. Then φn(x) converges for almost every

x.

Proof First, replace φn by φn := φn−φ1. Then φn satisfies the conditions

of the theorem with K replaced by some K ′, but now φn ≥ 0.

If we can show that φn(x) converges for almost every x then the same

clearly follows for φn(x) = φn(x) + φ1(x).

We want to show that

E = x : φn(x) → ∞has measure zero. Note that, since φn(x) is non-decreasing for each x, this

is precisely the set of points where φn(x) does not converge.

Fix m > 0, and set

En = x : φn(x) ≥ K ′m.Then En can be covered by a finite collection In of disjoint intervals of

total length ≤ 1/m: indeed, using (1) and writing φn =∑

j cjχKjwith Kj

disjoint, let In be the collection of those Kj with corresponding cj ≥ K ′m.

Then

K ≥∫

φn =∑

j

cj |Kj | ≥∑

j′∈In

ci|Ki| ≥ K ′m∑

i∈In

|Ki|,

and so∑

i∈In

|Ki| ≤1

m.

Now, note that En+1 ⊇ En (since φn+1 ≥ φn); by splitting up the intervals

that occur in In+1 if need be one can ensure that In+1 ⊇ In.


One can therefore list the intervals occurring in the In, J1, J2, J3, . . . –

first list those in I1, then the additional intervals in I2, and so on.

Now, suppose that x ∈ E. Then for n sufficiently large, x ∈ En. So

E ⊆⋃

i

Ei ⊆∞⋃

k=1

Jk. (5.5)

Now, for any fixed N , ∪Ni=1Ji ⊆ Ir for some r, and so∑N

i=1 |Ji| ≤ 1/m.

Since this does not depend on N ,

∞∑

i=1

|Ji| ≤ 1/m. (5.6)

Since m is arbitrary, it follows from (5.5) and (5.6) that E has zero

measure.

We denote the set of all functions that can be arrived at as the almost-

everywhere limit of an increasing sequence of step functions as Linc(R). For

such a function (f = limn→∞ sn), we define∫

f = limn→∞

∫

sn.

Again, we have to check that this definition does not depend on exactly

which sequence sn we have chosen.

The proof uses the following technical lemma whose proof, which appeals

to the Heine-Borel Theorem (compactness of [0,1]) we omit.

Lemma 5.4 Let θn be a decreasing sequence of non-negative step functions

such that θn → 0 almost everywhere. Then∫

θn → 0 as n→ ∞.

Lemma 5.5 If f, g ∈ Linc(R) with f ≥ g almost everywhere. If φn and ψnare increasing sequences in Lstep(R) tending to f, g, respectively, then

limn→∞

∫

φn ≥ limn→∞

∫

ψn.

Proof Since f ≥ g almost everywhere, there exists a set E≥ of zero measure

such that f(x) ≥ g(x) for all x /∈ E≥. Also φn(x) ↑ f(x) for x /∈ Eφ (Eφwith zero measure) and ψn(x) ↑ g(x) for x /∈ Eψ (Eψ with zero measure).

Let E = E≥ ∪ Eφ ∪Eψ; then E has zero measure.


For each fixed k, the sequence ψk−φn is decreasing in n, with limit ψk−ffor x /∈ E. For x ∈ E, ψk ≤ g, and so

ψk − f ≤ g − f ≤ 0.

Thus ψk−φn converges to a non-positive limit for x ∈ E, and so the sequence

(ψk−φn)+ converges to zero almost everywhere. It follows from Lemma 5.4

that for each fixed k,∫

(ψk − φn)+ → 0

as n→ ∞. Since∫

ψk −∫

φn =

∫

(ψk − φn) ≤∫

(ψk − φn)+,

letting n→ ∞ it follows that∫

ψk ≤ limn→∞

∫

φn,

and now letting k → ∞

limk→∞

∫

ψk ≤ limn→∞

∫

φn.

It follows from this lemma that∫

f is well-defined for f ∈ Linc(R), since

one can deduce that if f = limφn = limψn then lim∫

φn = lim∫

ψn.

We can prove the following properties for the integral of functions in

Linc(R):

Proposition 5.6

(i) If f, g ∈ Linc(R) and f ≥ g then∫

f ≥∫

g.

(ii) If f, g ∈ Linc(R) then f + g ∈ Linc(R) and∫

(f + g) =∫

f +∫

g.

(iii) If f ∈ Linc(R) and λ ≥ 0 then∫

λf = λ∫

f .

(iv) If f, g ∈ Linc(R) then max(f, g),min(f, g) ∈ Linc(R); in particular,

f+ = max(f, 0) ∈ Linc(R).

The proofs are obvious.


5.3 The space L1(R) of integrable functions

Finally, we define the space of integrable functions on R, written L1(R), to

be all functions of the form f(x) = g(x) − h(x) with g, h ∈ Linc(R), and set∫

f =

∫

g −∫

h.

This definition is consistent: if f = g1−h1 = g2−h2 then g1+h2 = g2+h1,

and so by the addition property for functions in Linc(R),∫

g1 + h2 =

∫

g1 +

∫

h2 =

∫

g2 +

∫

h1 =

∫

g2 + h1,

from whence∫

g1 −∫

h1 =

∫

g2 −∫

h2.

We can now show that the integral of L1 functions has the same properties

as the integral on Lstep(R):

(L) If f1, f2 ∈ L1(R) and λ ∈ R then f1 + λf2 ∈ L1(R) and∫

(f1 + λf2) =

∫

f1 + λ

∫

f2.

Proof With fi = gi − hi, gi, hi ∈ Linc(R), consider λ ≥ 0. Then

f1 + λf2 = (g1 + λg2) − (h1 + λh2)

where both the bracketed terms are in Linc(R). So f1 + λf2 ∈ L1(R), and∫

f1 + λf2 =

∫

(g1 + λg2) −∫

(h1 + λh2)

=

∫

(g1 − h1) + λ

∫

(g2 − h2) =

∫

f1 + λ

∫

f2.

Finally if f ∈ L1(R) with f = g − h (g, h ∈ Linc(R)), then −f = h − g.

So −f ∈ L1(R), and clearly∫

−f = −∫

f .

(P) If f ∈ L1(R) and f ≥ 0 a.e. then∫

f ≥ 0.

Proof If f = g − h, g, h ∈ Linc(R), then f ≥ 0 implies that g ≥ h.


Since integration respects order for functions in Linc(R),∫

g ≥∫

h, and

therefore∫

f =

∫

g −∫

h ≥ 0.

Note that it is a consequence of this property that if f = 0 a.e. then∫

f = 0.

(M) If f ∈ L1(R) then |f | ∈ L1(R) and∣

∣

∣

∣

∫

f

∣

∣

∣

∣

≤∫

|f |.

Proof Write f = g − h with g, h ∈ Linc(R). One can easily check that

|f | = max(g, h) − min(g, h),

and so |f | ∈ L1(R) using Proposition 5.6. We need to show that∫

g −∫

h =

∫

f ≤∫

|f | =

∫

max(g, h) − min(g, h)

and∫

h−∫

g = −∫

f ≤∫

|f | =

∫

max(g, h) − min(g, h).

By symmetry we need only check one of these, and for this it is sufficient

to show that

g(x) + min(g(x), h(x)) ≤ h(x) + max(g(x), h(x)).

If g < h then the LHS is 2g(x) and the RHS is 2h(x), so the inequality

holds; while if g ≥ h the LHS is g + h and the RHS is h + g, so the

inequality holds once more.

This result will imply that there are some functions that can be integrated

to nevertheless are not in L1(R), see the Examples Sheet.

(T) If f ∈ L1(R) and fd(x) = f(x+ d) then fd ∈ L1(R) and∫

fd =∫

f .

Proof Follows immediately from the same property for functions in Linc(R).

We have now defined the space L1(R) of integrable functions, and shown

that the integral as we have defined it has the four properties we start with

for the integral on Lstep(R).


If we want to talk about integrals on intervals, we say that f ∈ L1(a, b) –

‘f is integrable on (a, b)’ – if fχ(a,b) ∈ L1(R).

There are three fundamental theorems for the Lebesgue integral. The first

is the Monotone Convergence Theorem, which looks like the construction

of the Lebesgue integral, but with a monotone sequence of step functions

replaced by a monotone sequence of integrable functions.

Theorem 5.7 (Monotone Convergence Theorem) Suppose that fn ∈L1(R), fn(x) ≤ fn+1(x) almost everywhere, and

∫

fn ≤ K for some K

independent of n. Then there exists a g ∈ L1(R) such that fn → g almost

everywhere, and∫

g = lim

∫

fn.

Corollary 5.8 Suppose that f ∈ L1(R) and∫

|f | = 0. Then f = 0 almost

everywhere.

Proof Exercise.

Theorem 5.9 (Dominated Convergence Theorem) Suppose that fn ∈L1(R) and that fn → f almost everywhere. If there exists a function g ∈L1(R) such that |fn(x)| ≤ g(x) for almost every x, and for every n, then

f ∈ L1(R) and∫

f = lim

∫

fn.

To define an integral of a function of two variables one would naturally

proceed by analogy with the construction above: take ‘step functions’ that

are constant on rectangles, construct an integral on Linc(R2) by taking lim-

its of monotonic sequences, and the construct L1(R2) as the limits of differ-

ences. But this does not related ‘double integrals’ to single integrals. This

is achieved by the Fubini and Tonelli theorems. We give a less-than-rigorous

formulation:

Theorem 5.10 (Fubini-Tonelli) If f : R2 → R is such that either∫ (∫

|f(x, y)|dx)

dy <∞ or

∫ (∫

|f(x, y)|dy)

dx <∞


then f ∈ L1(R2) and

∫(∫

f(x, y) dx

)

dy =

∫(∫

f(x, y) dy

)

dx.

(The less-than-rigorous nature is that the conditions require integrability

properties of |f(x, y)|: first that for almost every y ∈ R, |f(·, y)| ∈ L1(R),

and then that the resulting function g(y) =∫

|f(x, y)| dy is again in L1(R).)

5.4 The Lebesgue spaces Lp

We denote by Lp(R) [Lp(I)] the space of functions that are pth power inte-

grable [on an interval I]. It is natural to try to use the Lp norm

‖f‖Lp =

(∫

|f(x)|p)1/p

on Lp(I). However, although this is a norm on Lp(I) [recall this was con-

tinuous functions with finite Lp norm], it in facts fails to satisfy property (i)

in the definition of a norm – if ‖f‖Lp = 0 then we only have f = 0 almost

everywhere.

So strictly speaking if we want to consider Lp as a normed space we in

fact have to consider Lp/ ∼, where f ∼ g if f = g almost everywhere. This

identification is usually done tacitly, rather than by being explicit about the

quotienting operation.

We have now gone almost full circle. We started developing these Lp

spaces because C0 is not complete in the L1 norm. Using the abstract

definition of completion, we defined L1 as the completion of C0 in the L1

norm. But now we have defined a space, L1(R), which a priori has nothing

to do with C0 and its completion.

In fact one can show that C0 is dense in L1 fairly easily (see Exercises).

We now show that L1 as defined is complete (and so is the completion of

C0 that we were after).

We start with a general lemma of independent interest.

5.4 The Lebesgue spaces Lp 47

Lemma 5.11 Suppose that (X, ‖ · ‖) is a normed space in which

∞∑

j=1

‖yj‖ <∞

implies that limn→∞∑n

j=1 yj exists. Then (X, ‖ · ‖) is complete.

Theorem 5.12 The Lebesgue space L1(R) is complete.

In fact we prove the following:

Proposition 5.13 Let fk ∈ L1(R). Then

(i) If∑∞

k=1 ‖fk‖L1 < ∞ then∑∞

k=1 |fk| converges to an element of

L1(R).

(ii) If∑∞

k=1 |fk| ∈ L1(R) then∑∞

k=1 fk ∈ L1(R).

Proof

(i) Suppose that fk ∈ L1(R) and

K =

∞∑

j=1

∫

|fk| <∞.

We apply the MCT to gn =∑n

k=1 |fk|, since gn+1 ≥ gn and∫

gn ≤ K

for every k.

(ii) We now apply the DCT to hn =∑n

k=1 fk. Each hn ∈ L1(R), and by

the triangle inequality

|hn| ≤n∑

k=1

|fk| ≤∞∑

k=1

|fk|,

and by assumption the right-hand side is in L1(R). Since∑

k |fk|converges almost everywhere, so does

∑

k fk (if ak ∈ R and∑ |ak|

converges then so does∑

ak), and the DCT implies that∑∞

k=1 fk ∈L1(R).

For completeness of L2 (which follows similar lines) see the Examples

sheet.

6

Inner product spaces

If x = (x1, . . . , xn) and y = (y1, . . . , yn) are two elements of Rn then we

define their dot product as

x · y = x1y1 + · · · + xnyn. (6.1)

This is one concrete example of an inner product on a vector space:

Definition 6.1 An inner product (·, ·) on a vector space V is a map (·, ·) :

V × V → K such that for all x, y, z ∈ V and for all α ∈ K,

(i) (x, x) ≥ 0 with equality iff x = 0,

(ii) (x+ y, z) = (x, z) + (y, z),

(iii) (αx, y) = α(x, y), and

(iv) (x, y) = (y, x).

Note that

• in a real vector space the complex conjugate in (iv) is unnecessary;

• in the complex case the restriction that (y, x) = (x, y) implies in particular

that (x, x) = (x, x), i.e. that (x, x) is real, and so the requirement that

(x, x) ≥ 0 makes sense; and

• (iii) and (iv) imply that the inner product is conjugate linear in its second

element, i.e. (x, αy) = α(x, y).

A vector space equipped with an inner product is known as an inner

product space.

48

6.1 Inner products and norms 49

Example 6.2 In the space ℓ2(K) of square summable sequences, for x =

(x1, x2, . . .) and y = (y1, y2, . . .) one can define an inner product

(x, y) =

∞∑

j=1

xj yj.

This is well-defined since∑

j |xj yj| ≤ 12(∑

j |xj |2 + |yj|2).

Example 6.3 The expression

(f, g) =

∫ b

af(x)g(x) dx

defines an inner product on the space L2(a, b).

6.1 Inner products and norms

Given an inner product we can define ‖v‖ by setting

‖v‖2 = (v, v). (6.2)

We will soon shows that ‖ · ‖ defines a norm; we say that it is the norm

induced by the inner product (·, ·).

6.2 The Cauchy-Schwarz inequality

Lemma 6.4 (Cauchy-Schwarz inequality) Any inner product satisfies

the inequality

|(x, y)| ≤ ‖x‖‖y‖ for all x, y ∈ V, (6.3)

where ‖ · ‖ is defined in (6.2).

Proof If x = 0 or y = 0 then (6.3) is clear; so suppose that x 6= 0 and y 6= 0.

For any λ ∈ K we have

(x− λy, x− λy) = (x, x) − λ(y, x) − λ(x, y) + |λ|2(y, y) ≥ 0.

50 Inner product spaces

Setting λ = (x, y)/‖y‖2 we obtain

0 ≤ ‖x‖2 − 2|(x, y)|2‖y‖2

+|(x, y)|2‖y‖2

= ‖x‖2 − |(x, y)|2‖y‖2

,

which implies (6.3).

The Cauchy-Schwarz inequality allows us to show easily that the map

x 7→ ‖x‖ is a norm on V . Property (i) is clear, since ‖x‖ ≥ 0 and if

‖x‖2 = (x, x) = 0 then x = 0. Property (ii) is also clear, since

‖αx‖2 = (αx,αx) = αα(x, x) = |α|2‖x‖2.

Property (iii), the triangle inequality, follows from the Cauchy-Schwarz in-

equality (6.3), since

‖x+ y‖2 = (x+ y, x+ y)

= ‖x‖2 + (x, y) + (y, x) + ‖y‖2

≤ ‖x‖2 + 2‖x‖‖y‖ + ‖y‖2

= (‖x‖ + ‖y‖)2,

i.e. ‖x+ y‖ ≤ ‖x‖ + ‖y‖.

As an example of the Cauchy-Schwarz inequality, consider the standard

inner product on Rn. As we would expect, the norm derived from this inner

product is just

‖x‖ =

n∑

j=1

|xj|2

1/2

.

The Cauchy-Schwarz inequality says that

|(x, y)|2 =

∣

∣

∣

∣

∣

∣

n∑

j=1

xjyj

∣

∣

∣

∣

∣

∣

2

≤

n∑

j=1

|xj|2

n∑

j=1

|yj|2

, (6.4)

or just |x · y| ≤ |x||y|.


Exercise 6.5 The norm on the sequence space ℓ2 derived from the inner

product (x, y) =∑

xj yj is

‖x‖ℓ2 =

∞∑

j=1

|xj|2

1/2

.

Obtain the Cauchy-Schwarz inequality for ℓ2 using (6.4) and a limiting ar-

gument rather than Lemma 6.4.

The Cauchy-Schwarz inequality in L2(a, b) gives the very useful:

∣

∣

∣

∣

∫ b

af(x)g(x) dx

∣

∣

∣

∣

≤(∫ b

a|f(x)|2 dx

)1/2(∫ b

a|g(x)|2 dx

)1/2

,

for f, g ∈ L2(a, b). (This shows in particular that if f, g ∈ L2(a, b) then

fg ∈ L1(a, b).)

6.3 The relationship between inner products and their norms

Norms derived from inner products have one key property in addition to

(i)–(iii) of Definition 2.1:

Lemma 6.6 (Parallelogram law) Let V be an inner product space with

induced norm ‖ · ‖. Then

‖x+ y‖2 + ‖x− y‖2 = 2(‖x‖2 + ‖y‖2) for all x, y ∈ V. (6.5)

Proof Simply expand the inner products:

‖x+ y‖2 + ‖x− y‖2 = (x+ y, x+ y) + (x− y, x− y)

= ‖x‖2 + (y, x) + (x, y) + ‖y‖2

+‖x‖2 − (y, x) − (x, y) + ‖y‖2

= 2(‖x‖2 + ‖y‖2).

52 Inner product spaces

Exercise 6.7 Show that there is no inner product on C0([0, 1]) which induces

the sup or L1 norms,

‖f‖∞ = supx∈[0,1]

|f(x)| or ‖f‖L1 =

∫ 1

0|f(x)|dx.

Given a norm that is derived from an inner product, one can reconstruct

the inner product as follows:

Lemma 6.8 (Polarisation identity) Let V be an inner product space with

induced norm ‖ · ‖. Then if V is real

4(x, y) = ‖x+ y‖2 − ‖x− y‖2, (6.6)

while if V is complex

4(x, y) = ‖x+ y‖2 − ‖x− y‖2 + i‖x+ iy‖2 − i‖x− iy‖2. (6.7)

Proof Once again, rewrite the right-hand sides as inner products, multiply

out, and simplify.

If V is an real/complex inner product space and ‖ · ‖ is a norm on V that

satisfies the parallelogram law then (6.6) or (6.7) defines an inner product

on V . In other words, the parallelogram law characterises those norms that

can be derived from inner products. (This argument is non-trivial.)

Lemma 6.9 If V is an inner product space with inner product (·, ·) and

derived norm ‖ · ‖, then xn → x and yn → y implies that

(xn, yn) → (x, y).

Proof Since xn and yn converge, ‖xn‖ and ‖yn‖ are bounded (the proof is

a simple exercise). Then

|(xn, yn) − (x, y)| = |(xn − x, yn) + (x, yn − y)|≤ ‖xn − x‖‖yn‖ + ‖x‖‖yn − y‖

implies that (xn, yn) → (x, y).


This lemma is extremely useful: that we can swap limits and inner prod-

ucts means that ifn∑

j=1

xj

converges (so that∑n

j=1 xj → x =∑∞

j=1 xj) then

∞∑

j=1

xj , y

=∞∑

j=1

(xj , y),

i.e. we can swap inner products and sums.

Definition 6.10 A Hilbert space is a complete inner product space.

Examples: Rn with inner product and norm

(

(x1, . . . , xn), (y1, . . . , yn))

=n∑

j=1

xjyj ‖(x1, . . . , xn)‖ =

n∑

j=1

|xj |2

1/2

;

Cn with inner product and norm

(

(w1, . . . , wn), (z1, . . . , zn))

=

n∑

j=1

wj zj ‖(w1, . . . , wn)‖ =

n∑

j=1

|wj |2

1/2

;

ℓ2(K) with inner product and norm

(x, y) =

∞∑

j=1

xj yj ‖x‖ =

∞∑

j=1

|xj |2

1/2

(the complex conjugate is redundant if K = R); and L2(I) with inner prod-

uct and norm

(f, g) =

∫

If(x)g(x) dx ‖f‖L2 =

(∫

I|f(x)|2 dx

)1/2

.

From now on we will assume unless explicitly stated that all the above

spaces are equipped by their standard inner product (and corresponding

norm).

7

Orthonormal bases in Hilbert spaces

From now on we will denote by H an arbitrary Hilbert space, with inner

product (·, ·) and norm ‖ · ‖; we take K = C, since the case K = R is

simplified only by removing the complex conjugates.

Our aim in this chapter is to discuss orthonormal bases for Hilbert spaces.

In contrast to the Hamel basis we considered earlier, we are now going to

allow infinite linear combinations of basis elements (called a Schauder basis).

7.1 Orthonormal sets

Definition 7.1 Two elements x and y of an inner product space are said

to be orthogonal if (x, y) = 0. (We sometimes write x ⊥ y.)

Clearly if (x, y) = 0 then

‖x+y‖2 = (x+y, x+y) = ‖x‖2 +(x, y)+(y, x)+‖y‖2 = ‖x‖2 +‖y‖2 (7.1)

(Pythagoras). Sums of orthogonal vectors are therefore very useful in cal-

culations, since all the cross terms in their norm vanish.

Definition 7.2 A set E is orthonormal if ‖e‖ = 1 for all e ∈ E and

(e1, e2) = 0 for any e1, e2 ∈ E with e1 6= e2.

Note that this definition does not require E to be countable. Note also

54

7.1 Orthonormal sets 55

that any orthonormal set must be linearly independent, since if

n∑

j=1

αjej = 0 n ∈ N, αj ∈ K, ej ∈ E,

one can take the inner product with each ej in turn to show that αj = 0

for j = 1, . . . , n.

Example 7.3 The set ej∞j=1, where

ej = (0, 0, . . . , 1, . . . , 0, . . .)

(with the 1 in the jth position), is an orthonormal set in ℓ2.

Example 7.4 Consider the space L2(−π, π) and the set

E =1√2π,

1√π

cos t,1√π

sin t,1√π

sin 2t,1√π

cos 2t, . . . .

Then E is orthonormal, since∫ π

−πcos2 nt dt =

∫ π

−πsin2 nt dt = π;

for any n,m∫ π

−πcosnt dt =

∫ π

−πsinnt dt =

∫ π

−πsinnt cosmt dt = 0;

and for any n 6= m∫ π

−πcosnt cosmt dt =

∫ π

−πsinnt sinmt dt =

∫ π

−πsinnt cosmt dt = 0.

Expansions involving orthonormal elements of an inner product space are

easy to treat, essentially due to the following simple lemma.

Lemma 7.5 Let e1, . . . , en be an orthonormal set in an inner product space

V . Then for any αj ∈ K.

∥

∥

∥

∥

∥

∥

n∑

j=1

αjej

∥

∥

∥

∥

∥

∥

2

=

n∑

j=1

|αj |2.

56 Orthonormal bases in Hilbert spaces

Proof Use induction and the Pythagorean property (7.1), noting that

n−1∑

j=1

αjej, αnen

= 0.

The following lemma, proved using the Gram-Schmidt orthonormalisa-

tion process (which we will revisit later), guarantees the existence of an

orthonormal basis in any finite-dimensional inner product space.

Lemma 7.6 Let (·, ·) be any inner product on a vector space V of dimension

n. Then there exists an orthonormal basis ejnj=1 of V .

It follows that in some sense the dot product (6.1) is the canonical in-

ner product on a finite-dimensional space. Indeed, with respect to any the

orthonormal basis ej the inner product (·, ·) has the form (6.1), i.e.

n∑

j=1

xjej ,

n∑

k=1

ykek

=

n∑

i,j=1

xj yk(ej , ek) = x1y1 + · · · + xnyn.

7.2 Convergence and orthonormality in Hilbert spaces

In an infinite-dimensional Hilbert space we cannot hope to find a finite basis,

since then the space would by definition be finite-dimensional. The best that

we can hope for is to find a countable basis ej∞j=1, in terms of which to

expand any x ∈ H as potentially infinite series,

x =∞∑

j=1

αjej .

We make the obvious definition of what this equality means.

Definition 7.7 Let (X, ‖ · ‖X) be a normed space. Then

∞∑

j=1

αjej = x


iff the partial sums converge to x in the norm of X, i.e.∥

∥

∥

∥

∥

∥

n∑

j=1

αnen

− x

∥

∥

∥

∥

∥

∥

X

→ 0 as n→ ∞.

We now formalise our notion of a basis for a Hilbert space. Note that at

present we do not require E to be countable.

Definition 7.8 A set E is a basis for H if every x can be written uniquely†in the form

x =

∞∑

j=1

αjej (7.2)

for some αj ∈ K, ej ∈ E.

(Note that if E is a basis in the sense of Definition 7.8, i.e. the expansion

in terms of the ej is unique then E is linearly independent, since if

0 =

n∑

j=1

αjej

there is a unique expansion for zero and so we must have αj = 0 for all

j = 1, . . . , n.)

† There is a subtlety here. If E is countable then one can assume that E = ej∞j=1, i.e. that the

elements of E are specified in a particular order. In this case the ‘uniqueness’ is clear. But ifE is uncountable ‘uniqueness’ means that one does not care about the order of the summationin (7.2), and so certainly this order should not affect the value of the sum itself. This requiresproof: Suppose that wj is a rearrangement of the ej. Set αn = (x, en), βm = (x, wm),

x1 =∞∑

n=1

αnen and x2 =∞∑

m=1

βmwm.

Then (by Lemma 6.9) it follows that (x1, en) = (x, en) and (x2, wm) = (x,wm). Since en = wm

for some m = m(n), it follows that

(x1 − x2, en) = (x1, en) − (x2, wm) = (x, en) − (x, wm) = 0

for every n, and similarly (x1 − x2, wm) = 0 for every m. Thus

‖x1 − x2‖2 =

(

x1 − x2,∞∑

n=1

αnen −∞∑

m=1

βmwm

)

=∞∑

n=1

αn(x1 − x2, en) −∞∑

m=1

βm(x1 − x2, wm) = 0,

and so x1 = x2, as required.


If E is a basis and is orthonormal, we refer to it as an orthonormal basis.

Note that an orthonormal set is a basis provided that every x can be written

in the form (7.2), and the uniqueness follows from the orthonormality.

Indeed, suppose that

x =∞∑

j=1

αjej and x =∞∑

j=1

βkfk

with αj, βk ∈ K non-zero, and ej, fk ∈ E.

Write E1 = ej∞j=1 and E2 = fj∞j=1. Set U = E1 ∩ E2, E′1 = E1 \ U ,

and E′2 = E2\U . In other words, U consists of those basis elements common

to ej and fj, E′1 is those occurring only in ej, and E′

2 those occurring

only in fj. All these sets are at most countable.

Then with some abuse of notation one can write

x =∑

u∈U

αuu+∑

e∈E′

1

αee and y =∑

u∈U

βuu+∑

f∈E′

2

βff.

It follows that

∑

u∈U

(αu − βu)u+∑

e∈E′

1

αee−∑

f∈E′

2

βff = 0.

Taking the inner product of this with any e ∈ E′1 shows that αe = 0, while

the inner product with any f ∈ E′2 implies that βf = 0. So in fact E1 = E2,

and the inner product with any u ∈ U then shows that αu = βu, and so the

expansion is unique. (In each of these steps we use Lemma 6.9 to swap the

order of the inner product and summation.)

Rather than discuss general bases, we concentrate on orthonormal bases,

with a particularly emphasis on countable bases E = ej∞j=1. Neglecting

for the moment the question of convergence, and of conditions to guarantee

that ej really is a basis, suppose that the equality (7.2) holds for some

x ∈ H, i.e. that

x =

∞∑

j=1

αjej .


To find the coefficients αj , simply take the inner product with some ek to

give

(x, ek) =

∞∑

j=1

αjej , ek

=

∞∑

j=1

αj(ej , ek) = αk,

and so we would expect αk = (x, ek). (Note that if x =∑

αjej then this

manipulation is rigorous, using Lemma 6.9 to change the order of the inner

product and the sum.) So if E is an orthonormal basis we would expect to

obtain the expansion

x =

∞∑

j=1

(x, ej)ej .

Assuming that the Pythagoras result of Lemma 7.5 holds for infinite sums,

we would expect that∞∑

j=1

|(x, ej)|2 = ‖x‖2.

In some ways this says that ‘the ej capture all directions in H’. Presum-

ably if ej do not form an orthonormal basis we should be able to find an

x such that∞∑

j=1

|(x, ej)|2 < ‖x‖2.

We have not proved any of this yet, since we are assuming that (7.2) holds;

but it motivates the following lemma, whose result is known as Bessel’s

inequality.

Lemma 7.9 (Bessel’s inequality) Let V be an inner product space and

en∞n=1 an orthonormal sequence. Then for any x ∈ V we have

∞∑

n=1

|(x, en)|2 ≤ ‖x‖2

and in particular the left-hand side converges.

Proof Let us denote by xk the partial sum

xk =

k∑

j=1

(x, ej)ej .


Clearly

‖xk‖2 =

k∑

j=1

|(x, ej)|2

and so we have

‖x− xk‖2 = (x− xk, x− xk)

= ‖x‖2 − (xk, x) − (x, xk) + ‖xk‖2

= ‖x‖2 −k∑

j=1

(x, ej)(ej , x) −k∑

j=1

(x, ej)(x, ej) + ‖xk‖2

= ‖x‖2 − ‖xk‖2.

It follows that

k∑

j=1

|(x, ej)|2 = ‖xk‖2 ≤ ‖x‖2 − ‖x− xk‖2 ≤ ‖x‖2.

We can give an interesting corollary about the the coefficients (x, e) when

E is an uncountable set.

Corollary 7.10 Let E be an uncountable orthonormal set in an inner prod-

uct space V . Then for each x ∈ V ,

e ∈ E : (x, e) 6= 0

is at most a countable set.

Proof For each fixed m ∈ N, consider the set

Em = e ∈ E : |(x, e)| ≥ 1/m.

Then this set can have no more than m2‖x‖2 elements. Indeed, if Em has

N > m2‖x‖2 elements, one can select N elements e1, . . . , eN from Em,

and then

N∑

j=1

|(x, ej)|2 ≥ N × 1

m2> ‖x‖2.


But this contradicts Bessel’s inequality. Thus each Em contains only a finite

number of elements, and hence

∞⋃

m=1

Em = e ∈ E : (x, e) 6= 0

contains at most a countable number of elements.

We now use Bessel’s inequality to give a simple criterion for the conver-

gence of a sum∑∞

j=1 αjej when the ej are orthonormal.

Lemma 7.11 Let H be a Hilbert space and en an orthonormal sequence

in H. The series∑∞

n=1 αnen converges iff

∞∑

n=1

|αn|2 < +∞

and then∥

∥

∥

∥

∥

∞∑

n=1

αnen

∥

∥

∥

∥

∥

2

=

∞∑

n=1

|αn|2. (7.3)

We could rephrase this as∑∞

n=1 αnen converges iff α = (α1, α2, . . .) ∈ ℓ2.

Proof Suppose that∑n

j=1 αjej converges to x as n→ ∞; then

∥

∥

∥

∥

∥

∥

n∑

j=1

αjej

∥

∥

∥

∥

∥

∥

2

=

n∑

j=1

|αj |2

converges to ‖x‖2 as n→ ∞ (see Lemma 2.16).

Conversely, if∑∞

j=1 |αj |2 < +∞ then ∑nj=1 |αj |2 is a Cauchy sequence.

Setting xn =∑n

j=1 αjej we have, taking wlog m > n,

‖xn − xm‖2 =

∥

∥

∥

∥

∥

∥

m∑

j=n+1

αjej

∥

∥

∥

∥

∥

∥

2

=

m∑

j=n+1

|αj |2,

and so xn is a Cauchy sequence and therefore converges to some x ∈ H,

since H is complete. The equality in (7.3) follows as above.

By combining this lemma with Bessel’s inequality we obtain:


Corollary 7.12 Let H be a Hilbert space and en∞n=1 an orthonormal

sequence in H. Then for any x the sequence

∞∑

n=1

(x, en)en

converges.

In fact one can use Corollary 7.10 to deduce that for any orthonormal set

E,∑

e∈E

(x, e)e

converges (we have already seen that this is independent of the order of

summation).

7.3 Orthonormal bases in Hilbert spaces

We now show that E = en∞n=1 forms a basis for H iff

‖x‖2 =

∞∑

j=1

|(x, ej)|2 for all x ∈ H.

The same results hold for a general orthonormal set E, but we stick to the

countable case for simplicity of presentation.

Proposition 7.13 Let E = ej∞j=1 be an orthonormal set in a Hilbert

space H. Then the following are equivalent to the statement that E is an

orthonormal basis for H:

(a) x =∑∞

n=1(x, en)en for all x ∈ H;

(b) ‖x‖2 =∑∞

n=1 |(x, en)|2 for all x ∈ H; and

(c) (x, en) = 0 for all n implies that x = 0.

(d) the linear span of E = en∞n=1 is dense in H, i.e. for any x ∈ H

and any ǫ > 0 there exists an n ∈ N, fj ∈ E and αj ∈ K such that

∣

∣

∣

∣

∣

∣

x−n∑

j=1

αjfj

∣

∣

∣

∣

∣

∣

< ǫ.

7.3 Orthonormal bases in Hilbert spaces 63

Proof If E is an orthonormal basis for H then we can write

x =

∞∑

j=1

αjej , i.e. x = limj→∞

n∑

j=1

αjej .

Clearly if k ≤ n we have

(n∑

j=1

αjej , ek) = αk,

and using the properties of the inner product of limits we obtain αk = (x, ek)

and hence (a) holds. The same argument shows that if we assume (a) then

this expansion is unique and so E is a basis.

We first show that (a) ⇒ (b) ⇒ (c) ⇒ (a), and then that (a) ⇒ (d) and

(d) ⇒ (c).

(a) ⇒ (b) is immediate from (2.16).

(b) ⇒ (c) is immediate since ‖x‖ = 0 implies that x = 0.

(c) ⇒ (a) Take x ∈ H and let

y = x−∞∑

j=1

(x, ej)ej .

For each m ∈ N we have, using Lemma 6.9 (continuity of the inner product),

(y, em) = (x, em) − limn→∞

n∑

j=1

(x, ej)ej , em

= 0

since eventually n ≥ m. It follows from (c) that y = 0, i.e. that

x =

∞∑

j=1

(x, ej)ej

as required.

(a) ⇒ (d) is clear, since given any x and ǫ > 0 there exists an n such that∣

∣

∣

∣

∣

∣

n∑

j=1

(x, ej)ej − x

∣

∣

∣

∣

∣

∣

< ǫ.

(d) ⇒ (c) Take x ∈ H such that (x, ej) = 0 for every j. Choose xncontained in the linear space of E such that xn → x. Then

‖x‖2 = (x, x) = ( limn→∞

xn, x) = limn→∞

(xn, x) = 0,


since xn is a (finite) linear combination of the ej . So x = 0.

Example 7.14 The set ej∞j=1, where

ej = (0, 0, . . . , 1, . . . , 0, . . .)

(with the 1 in the jth position), is an orthonormal basis for ℓ2, since it clear

that if (x, ej) = xj = 0 for all j then x = 0.

Example 7.15 The sine and cosine functions given in example 7.4 are an

o-n basis for L2(−π, π).

Lemma 7.16 Any infinite-dimensional Hilbert H space contains a countably

infinite orthonormal sequence.

Proof Suppose that H contains an orthonormal set Ek = ejkj=1. Then Ekdoes not form a basis for H, since H is infinite-dimensional. It follows that

there exists a non-zero uk ∈ H such that

(uk, ej) = 0 for all j = 1, . . . , k

(for otherwise by characterisation (c), Ek would be a basis). Define ek+1 =

uk/‖uk‖ to obtain an orthonormal set Ek+1 = e1, . . . , ek.The result follows by induction, start with e1 = x/‖x‖ for any non-zero

x ∈ H.

Theorem 7.17 A Hilbert space is finite-dimensional iff its unit ball is com-

pact.

Proof The unit ball is closed and bounded. If H is finite-dimensional this

is equivalent to compactness by Corollary 3.12. If H is infinite-dimensional

then it contains a countable orthonormal set ej∞j=1, and for i 6= j

‖ei − ej‖2 = 2.

The ej form a sequence in the unit ball that can have no convergent

subsequence.

8

Closest points and approximation

8.1 Closest points in convex subsets

We start with a general result about closest points.

Definition 8.1 A subset A of a vector space V is said to be convex if for

every x, y ∈ A and λ ∈ [0, 1], λx+ (1 − λ)y ∈ A.

Lemma 8.2 Let A be a non-empty closed convex subset of a Hilbert space

H and let x ∈ H. Then there exists a unique a ∈ A such that

‖x− a‖ = inf‖x− a‖ : a ∈ A.

Proof Set δ = inf‖x− a‖ : a ∈ A and find a sequence an ∈ A such that

‖x− an‖2 ≤ δ2 +1

n. (8.1)

We will show that an is a Cauchy sequence. To this end, we use the

parallelogram law:

‖(x− an)+ (x− am)‖2 + ‖(x− an)− (x− am)‖2 = 2[‖x− an‖2 + ‖x− am‖2].

Which gives

‖2x− (an + am)‖2 + ‖an − am‖2 < 4δ2 +2

m+

2

n

or

‖an − am‖2 ≤ 4δ2 +2

m+

2

n− 4‖x− 1

2(an + am)‖2.

65

66 Closest points and approximation

Since A is convex, an +am ∈ A, and so ‖x− 12(an +am)‖2 ≥ δ2, which gives

‖an − am‖2 ≤ 2

m+

2

n.

It follows that an is Cauchy, and so an → a. Since A is closed, a ∈ A.

To show that a is unique, suppose that ‖u − a∗‖ = δ with a∗ 6= a. Then

‖u− 12(a∗ + a)‖ ≥ δ since A is convex, and so, using the parallelogram law

again,

‖a∗ − a‖2 ≤ 4γ2 − 4γ2 = 0,

i.e. a∗ = a and a is unique.

8.2 Linear subspaces and orthogonal complements

In an infinite-dimensional space, linear subspaces need not be closed. For

example, the space ℓf (R) of all real sequences with only a finite number of

non-zero terms is a linear subspace of ℓ2(R), but is not closed (consider the

sequence x(n) = (1, 12 ,

13 , . . . ,

1n , 0, . . .)).

If X is a subset of H then the orthogonal complement of X in H is

X⊥ = u ∈ H : (u, x) = 0 for all x ∈ X.Clearly if Y ⊆ X then X⊥ ⊆ Y ⊥.

Lemma 8.3 If X is a subset of H then X⊥ is a closed linear subspace of

H.

Proof It is clear that X⊥ is a linear subspace of H: if u, v ∈ X⊥ and α ∈ K

then

(u+ v, x) = (u, x) + (v, x) = 0 and (αu, x) = α(u, x) = 0

for every x ∈ X. To show that X⊥ is closed, suppose that un ∈ X⊥ and

un → u then

(u, x) = limn→∞

(un, x) = 0,

and so X⊥ is closed.

Note that in general (X⊥)⊥ ⊇ X; one has equality if X is a closed linear

subspace (see examples sheet).

8.2 Linear subspaces and orthogonal complements 67

Note that Proposition 7.13 shows that E is a basis for H iff E⊥ = 0(since this is just a rephrasing of (c): (u, ej) = 0 for all j implies that u = 0).

Note also that if Span(E) denotes the linear span of E, i.e.

Span(E) = u ∈ H : u =

n∑

j=1

αjej n ∈ N, αj ∈ K, ej ∈ E

then E⊥ = (Span(E))⊥. Since E ⊆ Span(E) one immediately has the

inclusion (Span(E))⊥ ⊆ E⊥, and if y ∈ E⊥, then for any u ∈ Span(E),

i.e. for any

u =

n∑

j=1

αjej n ∈ N, αj ∈ K, ej ∈ E,

one has

(y, u) = (y,

n∑

j=1

αjej) =

n∑

j=1

αj(y, ej) = 0

since y ∈ E⊥. So y ∈ (Span(E))⊥, which shows that E⊥ ⊆ (Span(E))⊥ and

hence the claimed equality.

In fact one also has E⊥ = (clin(E))⊥. Recall that the closed linear span

of E, clin(E) is given by

clin(E) = u ∈ H : ∀ ǫ > 0 there exists an x ∈ Span(E) with ‖u−x‖ < ǫ.

Since Span(E) ⊆ clin(E), we have (Span(E))⊥ ⊇ (clin(E))⊥. To show

equality, take y ∈ (Span(E))⊥ and u ∈ clin(E): we want to show that

(y, u) = 0 so that u ∈ (clin(E))⊥. Now, since u ∈ clin(E), there exists a

sequence xn ∈ Span(E) such that xn → 0. Therefore

(y, u) = (y, limn→∞

xn) = limn→∞

(y, xn) = 0,

as required.

Proposition 8.4 If U is a closed linear subspace of a Hilbert space H then

any x ∈ H can be written uniquely as

x = u+ v with u ∈ U, v ∈ U⊥,

i.e. H = U ⊕ U⊥. The map PU : H → U defined by

PUx = u


is called the orthogonal projection of x onto U , and satisfies

P 2Ux = PUx and ‖PUx‖ ≤ ‖x‖ for all x ∈ H.

Proof If U is a closed linear subspace then U is closed and convex, so the

above result shows that given x ∈ H there is a unique closest point u ∈ U .

It is now simple to show that x− u ∈ U⊥ and then such a decomposition is

unique.

Indeed, consider v = x− u; the claim is that v ∈ U⊥, i.e. that

(v, y) = 0 for all y ∈ U.

Consider ‖x− (u− ty)‖ = ‖v + ty‖; then

∆(t) = ‖v + ty‖2 = (v + ty, v + ty)

= ‖v‖2 + (ty, v) + (v, ty)) + ‖y‖2

= ‖v‖2 + t(y, v) + t(y, v) + |t|2‖y‖2

= ‖v‖2 + 2Ret(y, v) + |t|2‖y‖2.

We know from the construction of u that ‖v+ ty‖ is minimal when t = 0. If

t is real then this implies that d∆/dt(0) = 2Re(y, v) = 0. If t = is, with

s real, then d∆(is)/ds = −2 Im(y, v) = 0. So (y, v) = 0 for any y ∈ U ,

i.e. v ∈ U⊥ is claimed.

Finally, the uniqueness follows easily: if x = u1 + v1 = u2 + v2, then

u1 − u2 = v2 − v1, and so

|v1 − v2|2 = (v1 − v2, v1 − v2) = (v1 − v2, u2 − u1) = 0,

since u1 − u2 ∈ U and v2 − v1 ∈ U⊥.

If PUx denotes the closest point to x in U then clearly P 2U = PU , and it

follows from the definition of u that

‖x‖2 = ‖u‖2 + ‖x− u‖2,

thus ensuring that

‖PUx‖ ≤ ‖x‖,

i.e. the projection can only decrease the norm.

We will find an explicit expression for PU in Theorem 8.5


8.3 Best approximations

We now investigate the best approximation of elements of H using the closed

linear span of an orthonormal set E. Of course, if E is a basis then there is

no approximation involved.

Theorem 8.5 Let E be an orthonormal set E = ejj∈J , where J =

(1, 2, . . . , n) or N. Then for any u ∈ H, the closest point to x in clin(E) is

given by

y =∑

j∈J

(x, ej)ej .

In particular the orthogonal projection of X onto clin(E) is given by

Pclin(E)x =∑

j∈J

(x, ej)ej .

Proof Consider x−∑j αjej . Then

∥

∥

∥

∥

∥

∥

x−∑

j

αjej

∥

∥

∥

∥

∥

∥

2

= ‖x‖2 −∑

j

(x, αjej) −∑

j

(αjej , x) +∑

j

|αj |2

= ‖x‖2 −∑

j

αj(x, ej) −∑

j

αj(x, ej) +∑

j

|αj |2

= ‖x‖2 −∑

j

|(x, ej)|2

+∑

j

[

|(x, ej)|2 − αj(x, ej) − αj(x, ej) + |αj |2]

= ‖x‖2 −∑

j

|(x, ej)|2 +∑

j

|(x, ej) − αj |2,

and so the minimum occurs when αj = (x, ej) for all j.

Example 8.6 The best approximation of an element x ∈ ℓ2 in terms of

ejnj=1 (elements of the standard basis) is simply

n∑

j=1

(x, ej)ej = (x1, x2, . . . , xn, 0, 0, . . .).


Example 8.7 If E = ej∞j=1 is an orthonormal basis in H then the best

approximation of an element of H in terms of ejnj=1 is just given by the

partial sumn∑

j=1

(x, ej)ej .

Now suppose that E is a finite or countable set that is not orthonormal.

We can still find the best approximation to any u ∈ H that lies in clin(E)

by using the Gram-Schmidt orthonormalisation process:

Proposition 8.8 (Gram-Schmidt orthonormalisation) Given a set

E = ejj∈J with J = N or J = (1, . . . , n) there exists an orthonor-

mal set E = ejj∈J such that

Span(e1, . . . , ek) = Span(e1, . . . , ek)

for every k ∈ N.

Proof First omit all elements of en which can be written as a linear

combination of the preceding ones.

Now suppose that we already have an orthonormal set (e1, . . . , en) whose

span is the same as (e1, . . . , en). Then we can define en+1 by setting

e′n+1 = en+1 −n∑

i=1

(en+1, ei)ei and en+1 =e′n+1

‖e′n+1‖.

The span of (e1, . . . , en+1) is clearly the same as the span of (e1, . . . , en, en+1),

which is the same as the span of (e1, . . . , en, en+1) using the induction hy-

pothesis. Clearly ‖en+1‖ = 1 and for m ≤ n we have

(en+1, em) =1

‖e′n+1‖(

(en+1, em) −n∑

i=1

(en+1, ei)(ei, em))

= 0

since (e1, . . . , en) are orthonormal. Setting e1 = e1/‖e1‖ starts the induction.

Example 8.9 Consider approximation of functions in L2(−1, 1) with poly-

nomials of degree up to n. We can start with the set 1, x, x2, . . . , xn,

and then use the Gram-Schmidt process to construct polynomials that are

orthonormal w.r.t. the L2(−1, 1) inner product.


We begin with e1 = 1/√

2, and then consider

e′2 = x−(

x,1√2

)

1√2

= x− 1

2

∫ 1

−1t dt = x

so

‖e′2‖2 =

∫ 1

−1t2 dt =

2

3; e2 =

√

3

2x.

Then

e′3 = x2 −(

x2,

√

3

2x

)

√

3

2x−

(

x2,1√2

)

1√2

= x2 − 3x

2

∫ 1

−1

3t3

2dt− 1

2

∫ 1

−1t2 dt

= x2 − 1

3,

so

‖e′3‖2 =

∫ 1

−1

(

t2 − 1

3

)2

dt =

[

t5

5− 2t3

9+t

9

]1

−1

=8

45

which gives

e3 =

√

5

8

(

3x2 − 1)

.

Exercise 8.10 Show that e4 =√

78 (5x3 − 3x), and check that this is or-

thogonal to e1, e2, and e3.

Using these orthonormal functions we can find the best approximation of

any function f ∈ L2(−1, 1) by a degree three polynomial:(

7

8

∫ 1

−1f(t)(5t3 − 3t) dt

)

(5x3 − 3x) +

(

5

8

∫ 1

−1f(t)(3t2 − 1) dt

)

(3x2 − 1)

+

(

3

2

∫ 1

−1f(t)t dt

)

x+1

2

∫ 1

−1f(t) dt.

Example 8.11 The best approximation of f(x) = |x| by a third degree

polynomial is

f3(x) =

(

5

4

∫ 1

03t3 − t dt

)

(3x2−1)+

∫ 1

0t dt =

5

16(3x2−1)+

1

2=

15x2 + 3

16.


We have (after some tedious integration)

‖f − f3‖2 =2

162

∫ 1

0(15x2 − 16x+ 3)2 dx =

3

16

Of course, the meaning of the ‘best approximation’ is that this choice

minimises the L2 norm of the difference. It is not the ‘best approximation’

in terms of the supremum norm: at x = 0 the value of f3(x) = 3/16, while

supx∈[0,1]

|x− (x2 +1

8)| =

1

8.

Exercise 8.12 Find the best approximation (w.r.t. L2(−1, 1) norm) of sinx

by a third degree polynomial.

Exercise 8.13 Find the first four polynomials that are orthogonal on L2(0, 1)

with respect to the usual L2 inner product.

9

Separable Hilbert spaces and ℓ2

We start with a definition.

Definition 9.1 A normed space is separable if it contains a countable dense

subset.

This is an approximation property: one can find a countable set xn∞n=1

such that given any u ∈ H and ǫ > 0, there exists an xj such that

‖xj − u‖ < ǫ.

Example 9.2 R is separable, since Q is a countable dense subset. So is Rn,

since Qn is countable and dense. C is separable since complex numbers of

the form q1 + iq2 with q1, q2 ∈ Q is countable and dense.

Example 9.3 ℓ2 is separable, since sequences of the form

x = (x1, . . . , xn, 0, 0, 0, . . .)

with x1, . . . , xn ∈ Q are dense.

We now show that C0([0, 1]) is separable, by proving the Weierstrass ap-

proximation theorem: every continuous function can be approximated arbi-

trarily closely (in the supremum norm) by a polynomial.

73

74 Separable Hilbert spaces and ℓ2

Theorem 9.4 Let f(x) be a real-valued continuous function on [0, 1]. Then

the sequence of polynomials

Pn(x) =

n∑

p=0

(

n

p

)

f(p/n)xp(1 − x)n−p

converges uniformly to f(x) on [0, 1].

Proof Start with the identity

(x+ y)n =

n∑

p=0

(

n

p

)

xpyn−p.

Differentiate with respect to x and multiply by x to give

nx(x+ y)n−1 =n∑

p=0

p

(

n

p

)

xpyn−p;

differentiate twice with respect to x and multiply by x2 to give

n(n− 1)x2(x+ y)n−2 =

n∑

p=0

p(p− 1)

(

n

p

)

xpyn−p.

It follows that if we write

rp(x) =

(

n

p

)

xp(1 − x)n−p

we haven∑

p=0

rp(x) = 1,

n∑

p=0

prp(x) = nx, and

n∑

p=0

p(p− 1)rp(x) = n(n− 1)x2.

Thereforen∑

p=0

(p− nx)2rp(x) = n2x2n∑

p=0

rp(x) − 2nx

n∑

p=0

prp(x) +

n∑

p=0

p2rp(x)

= n2x2 − 2nx · nx+ (nx+ n(n− 1)x2)

= nx(1 − x).

Since f is continuous on the closed bounded interval it is bounded, |f(x)| ≤M for some M > 0. It also follows that f is uniformly continuous on [0, 1],

so for any ǫ > 0 there exists a δ > 0 such that

|x− y| < δ ⇒ |f(x) − f(y)| < ǫ.

Separable Hilbert spaces and ℓ2 75

Since∑n

p=0 rp(x) = 1 we have

∣

∣

∣

∣

∣

∣

f(x) −n∑

p=0

f(p/n)rp(x)

∣

∣

∣

∣

∣

∣

=

∣

∣

∣

∣

∣

∣

n∑

p=0

(f(x) − f(p/n))rp(x)

∣

∣

∣

∣

∣

∣

≤

∣

∣

∣

∣

∣

∣

∑

|(p/n)−x|≤δ


∣

∣

∣

∣

∣

∣

+

∣

∣

∣

∣

∣

∣

∑

|(p/n)−x|>δ


∣

∣

∣

∣

∣

∣

.

For the first term on the right-hand side we have∣

∣

∣

∣

∣

∣

∑

|(p/n)−x|≤δ


∣

∣

∣

∣

∣

∣

≤ ǫ∑

rp(x) = ǫ,

and for the second term on the right-hand side∣

∣

∣

∣

∣

∣

∑

|(p/n)−x|>δ


∣

∣

∣

∣

∣

∣

≤ 2M∑

|(p/n)−x|>δ

rp(x)

≤ 2M

n2δ2

n∑

p=0

(p− nx)2rp(x)

=2Mx(1 − x)

nδ2≤ 2M

nδ2

which tends to zero as n→ ∞.

One could also state this as: the set of polynomials is dense in C0([0, 1])

equipped with the supremum norm.

Proposition 9.5 C0([0, 1]) is separable.

Proof Given any f ∈ C0([0, 1]), it can be approximated to within ǫ by some

polynomial, i.e.∥

∥

∥

∥

∥

∥

f −

N∑

j=1

anxn

∥

∥

∥

∥

∥

∥

∞

< ǫ/2.

76 Separable Hilbert spaces and ℓ2

While the set of all polynomials in not countable, the set of all polynomials

with rational coefficients is. Since∥

∥

∥

∥

∥

(

N∑

n=1

anxn

)

−(

N∑

n=1

bnxn

)∥

∥

∥

∥

∥

∞

≤N∑

n=1

|an − bn|,

one can choose bn ∈ Q such that |an − bn| < ǫ/2N , and then∥

∥

∥

∥

∥

∥

f −

N∑

j=n

bnxn

∥

∥

∥

∥

∥

∥

∞

< ǫ.

If we now use the fact that C0([0, 1]) is dense in L2(0, 1), it follows that:

Proposition 9.6 L2(0, 1) is separable.

Proof Take f ∈ L2(0, 1). Given ǫ > 0 there exists a g ∈ C0([0, 1]) such that

‖f−g‖L2 < ǫ/2. We know from above that there exists a polynomial h with

rational coefficients such that

‖g − h‖∞ < ǫ/2.

Since

‖g − h‖2L2 =

∫ 1

0|g(x) − h(x)|2 dx ≤

∫ 1

0‖g − h‖2

∞ dx = ‖g − h‖2∞,

it follows that

‖f − h‖L2 ≤ ‖f − g‖L2 + ‖g − h‖∞ < ǫ.

The property of separability seems very strong, but it is a simple conse-

quence of the existence of a countable orthonormal basis, as we now show.

Proposition 9.7 An infinite-dimensional Hilbert space is separable iff it

has a countable orthonormal basis.

Note that this shows immediately that the unit ball in a separable Hilbert

space is not compact.

Separable Hilbert spaces and ℓ2 77

Proof If a Hilbert space has a countable basis then we can construct a

countable dense set by taking finite combinations of the basis elements with

rational coefficients, and so it is separable.

If H is separable, let E = xn be a countable dense subset. In particular,

the closed linear span of E is the whole of H. The Gram-Schmidt process

now provides a countable orthonormal set whose closed linear span is all of

H, i.e. a countable orthonormal basis.

Note that there are Hilbert spaces that are not separable. For example, if

Γ is uncountable then the space ℓ2(Γ) consisting of all functions f : Γ → R

such that∑

γ∈Γ

|f(γ)|2 <∞

is a Hilbert space but is not separable.

By using these basis elements we can construct an isomorphism between

any separable Hilbert space and ℓ2, so that in some sense ℓ2 is “the only”

separable infinite-dimensional Hilbert space:

Theorem 9.8 Any infinite-dimensional separable Hilbert space H is iso-

metric to ℓ2(K).

Proof Since H is separable it has a countable orthonormal basis ej. Define

ϕ : H → ℓ2 by the map

u 7→(

(u, e1), (u, e2), . . . , (u, en), . . .)

;

clearly the inverse map is given by

α 7→∞∑

j=1

αjej where α = (α1, α2, . . . , αn, . . .).

By the result of Lemma 7.11, u ∈ H ⇒ ϕ(u) ∈ ℓ2 and α ∈ ℓ2 ⇒ ϕ−1(α) ∈ H,

while the characterisation of a basis in Proposition 7.13 shows that ‖u‖H =

‖ϕ(u)‖ℓ2 .

10

Linear maps between Banach spaces

We now consider linear maps between Banach spaces.

10.1 Bounded linear maps

Definition 10.1 If U and V are vector spaces over K then an operator A

from U into V is linear if

A(αx+ βy) = αAx+ βAy for all α, β ∈ K, x, y ∈ U.

The collection L(U, V ) of all linear operators from U into V is a vector

space.

Definition 10.2 A linear operator A from a normed space (X, ‖ · ‖X) into

another normed space (Y, ‖ · ‖Y ) is bounded if there exists a constant M

such that

‖Ax‖Y ≤M‖x‖X for all x ∈ X. (10.1)

Linear operators on infinite-dimensional spaces need not be bounded.

Lemma 10.3 A linear operator T : X → Y is continuous iff it is bounded.

Proof Suppose that T is bounded; then for some M > 0

‖Txn − Tx‖Y = ‖T (xn − x)‖Y ≤M‖xn − x‖X ,

78


and so T is continuous. Now suppose that T is continuous; then in particular

it is continuous at zero, and so, taking ǫ = 1 in the definition of continuity,

there exists a δ > 0 such that

‖Tx‖ ≤ 1 for all ‖x‖ ≤ δ.

It follows that

‖Tz‖ =

∥

∥

∥

∥

T

(‖z‖δ

δz

‖z‖

)∥

∥

∥

∥

=‖z‖δ

∥

∥

∥

∥

T

(

δz

‖z‖

)∥

∥

∥

∥

≤ 1

δ‖z‖,

and so T is bounded.

The space of all bounded linear operators from X into Y is denoted by

B(X,Y ).

Definition 10.4 The operator norm of an operator A (from X into Y ) is

the smallest value of M such that (10.1) holds,

‖A‖B(X,Y ) = infM : (10.1) holds. (10.2)

Note that – and this is the key point – it follows that

‖Ax‖Y ≤ ‖A‖B(X,Y )‖x‖X for all x ∈ X.

(Since for each x ∈ X, ‖Ax‖Y ≤M‖x‖X for every M > ‖A‖B(X,Y )).

Lemma 10.5 The following is an equivalent definition of the operator norm:

‖A‖B(X,Y ) = sup‖x‖X=1

‖Ax‖Y . (10.3)

Proof Let us denote by ‖A‖1 the value defined in (10.2), and by ‖A‖2 the

value defined in (10.3). Then given x 6= 0 we have∥

∥

∥

∥

Ax

‖x‖X

∥

∥

∥

∥

Y

≤ ‖A‖2 i.e. ‖Ax‖Y ≤ ‖A‖2‖x‖X ,

and so ‖A‖1 ≤ ‖A‖2. It is also clear that if ‖x‖X = 1 then

‖Ax‖Y ≤ ‖A‖1‖x‖X = ‖A‖1,

and so ‖A‖2 ≤ ‖A‖1. It follows that ‖A‖1 = ‖A‖2.

80 Linear maps between Banach spaces

Exercise 10.6 Show that also

‖A‖B(X,Y ) = sup‖x‖X≤1

‖Ax‖Y

and

‖A‖B(X,Y ) = supx 6=0

‖Ax‖Y‖x‖X

. (10.4)

When there is no room for confusion we will omit the B(X,Y ) subscript

on the norm, sometimes adding the subscript “op” (for “operator”) to make

things clearer (‖ · ‖op).

If T : X → Y then in order to find ‖T‖op one can try the following: first

show that

‖Tx‖Y ≤M‖x‖X (10.5)

for someM > 0, i.e. show that T is bounded. It is then clear that ‖T‖op ≤M

(since ‖T‖op is the infimum of all M such that (10.5) holds). Then, in order

to show that in fact ‖T‖op = M , find an example of a particular z ∈ X such

that

‖Tz‖Y = M‖z‖X .

This shows from the definition in (10.4) that ‖T‖op ≥M and hence that in

fact ‖T‖op = M .

Example 10.7 Consider the right and left shift operators on ℓ2, σr and σl

given by

σr(x) = (0, x1, x2, . . .) and σl(x) = (x2, x3, x4, . . .).

Both operators are clearly linear. We have

‖σr(x)‖2ℓ2 =

∞∑

i=1

|xi|2 = ‖x‖2ℓ2 ,

so that ‖σr‖op = 1, and

‖σl(x)‖2ℓ2 =

∞∑

i=2

|xi|2 ≤ ‖x‖2ℓ2 ,

so that ‖σl‖op ≤ 1. However, if we choose an x with

x = (0, x2, x3, . . .)


then we have

‖σl(x)‖2ℓ2 =

∞∑

j=2

|xj|2 = ‖x‖2ℓ2 ,

and so we must have ‖σl‖op = 1.

In slightly more involved examples one might have to be a little more

crafty; for example, given the bound ‖Tx‖Y ≤ M‖x‖X find a sequence

zn ∈ X such that

‖Tzn‖Y‖zn‖X

→M

as n→ ∞, which again shows using (10.4) that ‖T‖op ≥M and hence that

‖T‖op = M .

Example 10.8 Consider the space L2(a, b) with −∞ < a < b < +∞ and

the multiplication operator T from L2(a, b) into itself given by

Tx(t) = f(t)x(t) t ∈ [a, b]

where f ∈ C0([a, b]). Then clearly T is linear and

‖Tx‖2 =

∫ b

a|f(t)x(t)|2 dt

=

∫ b

a|f(t)|2|x(t)|2 dt

≤ maxa≤t≤b

|f(t)|2∫ b

a|x(t)|2 dt,

and so

‖Tx‖L2 ≤ ‖f‖∞‖x‖L2 ,

i.e. ‖T‖op ≤ ‖f‖∞.

Now let s be a point at which |f | is maximum. Assume for simplicity that

s ∈ (a, b), and for each ǫ > 0 consider

xǫ(t) =

1 |t− s| < ǫ

0 otherwise,

then

‖Txǫ‖2

‖xǫ‖2=

1

2ǫ

∫ s+ǫ

s−ǫ|f(t)|2 dt→ |f(s)|2 as ǫ→ 0


since f is continuous. Therefore in fact

‖T‖op = ‖f‖∞.

If s = a then we can replace |t−s| < ǫ in the definition of xǫ by a ≤ t < a+ǫ,

and if s = b we replace it by b−ǫ < t ≤ b; the rest of the argument is identical.

We now give a very important particular example.

Example 10.9 Consider the map from L2(a, b) into itself given by the in-

tegral

(Tx)(t) =

∫ b

aK(t, s)x(s) ds for all t ∈ [a, b]

where∫ b

a

∫ b

a|K(t, s)|2 ds dt < +∞.

Then T is clearly linear, and

‖Tx‖2 =

∫ b

a

∣

∣

∣

∣

∫ b

aK(t, s)x(s) ds

∣

∣

∣

∣

2

dt

≤∫ b

a

[∫ b

a|K(t, s)|2 ds

∫ b

a|x(s)|2 ds

]

dt by Cauchy-Schwarz

=

(∫ b

a

∫ b

a|K(t, s)|2 ds dt

)

‖x‖2,

and so

‖T‖2op ≤

∫ b

a

∫ b

a|K(t, s)|2 ds dt.

Note that this upper bound on the operator norm can be strict, see ex-

amples.

The space B(X,Y ) is a Banach space whenever Y is a Banach space.

Remarkably this does not depend on whether the space X is complete or

not.

Theorem 10.10 Let X be a normed space and Y a Banach space. Then

B(X,Y ) is a Banach space.

10.2 Kernel and range 83

Proof Let An be a Cauchy sequence in B(X,Y ). We need to show that

An → A for some A ∈ L(X,Y ). Since An is Cauchy, given ǫ > 0 there


‖An −Am‖op ≤ ǫ for all n,m ≥ N. (10.6)

We now show that for every fixed x ∈ X the sequence Anx is Cauchy in

Y . This follows since

‖Anx−Amx‖Y = ‖(An −Am)x‖Y ≤ ‖An −Am‖op ‖x‖X , (10.7)

and An is Cauchy in B(X,Y ). Since Y is complete, it follows that

Anx→ y,

where y depends on x. We can therefore define a mapping A : X → Y

by Ax = y. We still need to show, however, that A ∈ B(X,Y ) and that

An → A in the operator norm.

First, A is linear since

A(x+ λy) = limn→∞

An(x+ λy) = limn→∞

Anx+ λ limn→∞

Any = Ax+ λAy.

To show that A is bounded take n,m ≥ N (from (10.6)) in (10.7), and let

m→ ∞. Since Amx→ Ax this shows that

‖Anx−Ax‖Y ≤ ǫ‖x‖X . (10.8)

Since (10.8) holds for every x it follows that

‖An −A‖op ≤ ǫ, (10.9)

and so An − A ∈ B(X,Y ). Since B(X,Y ) is a vector space and An ∈B(X,Y ), it follows that A ∈ B(X,Y ), and (10.9) shows that An → A in

B(X,Y ).

10.2 Kernel and range

Given a linear operator, we define its kernel

KerT = x ∈ X : Tx = 0

and its range

RangeT = y ∈ Y : y = Tx for some x ∈ X.


Corollary 10.11 If T ∈ B(X,Y ) then KerT is a closed linear subspace of

X.

Proof It is easy to show that Ker(T ) is a linear subspace, since if x, y ∈Ker(T ) then

T (αx+ βy) = αTx+ βTy = 0.

Furthermore if xn → x and Txn = 0 then since T is continuous Tx =

limn→∞ Txn = 0, so Ker(T ) is closed.

Note that the range is not necessarily closed, see examples.

11

The Riesz representation theorem and the adjointoperator

If U is a normed space over K then a linear map from U into K is called a

linear functional on U .

Since K = R or C is complete then by Theorem 10.10 the collection of all

linear functionals on U , B(U,K), is a Banach space. This space is termed

the dual space of U , and is denoted by U∗.

For any f ∈ U∗,

‖f‖U∗ = sup‖u‖=1

|f(u)|.

Example 11.1 Take U = C0([a, b]), and consider δx defined for x ∈ [a, b]

by

δx(f) = f(x) for all f ∈ U.

Then

|δx(f)| = |f(x)| ≤ ‖f‖∞,

so that δx ∈ U∗ with ‖δx‖ ≤ 1. Choosing a function f ∈ C0([a, b]) such that

|f(x)| = ‖f‖∞ shows that in fact ‖δx‖ = 1.

(Note: this shows that – at least for this particular choice of U – knowledge

of T (f) for all T ∈ U∗ determines f ∈ U . This result is in fact true in

general.)

Example 11.2 Let U be the real vector space L2(a, b), and take φ ∈ C0([a, b]).

85

86 The Riesz representation theorem and the adjoint operator

Consider

f(u) =

∫ b

aφ(t)u(t) dt.

Then

|f(u)| =

∣

∣

∣

∣

∫ b

aφ(t)u(t) dt

∣

∣

∣

∣

= |(φ, u)L2 |≤ ‖φ‖L2‖u‖L2 using the Cauchy-Schwarz inequality,

and so f ∈ U∗ with

‖f‖ ≤ ‖φ‖L2 .

If we choose u = φ‖φ‖

L2

then ‖u‖L2 = 1 and

|f(u)| =

∫ b

a

|φ(t)|2‖φ‖L2

dt = ‖φ‖L2

and so ‖f‖ = ‖φ‖.

Exercise 11.3 Let U be C0([a, b]) and for some φ ∈ U consider fφ defined

as

fφ(u) =

∫ b

aφ(t)u(t) dt for all u ∈ U.

Show that fφ ∈ U∗ with ‖fφ‖ ≤∫ ba |φ(t)|dt. Show that this is in fact an

equality by choosing an appropriate sequence of functions un ∈ U for which

|fφ(un)|/‖un‖∞ →∫ ba |φ(t)|dt.

Example 11.4 Let U be a Hilbert space. Given any y ∈ H, define

ly(x) = (x, y). (11.1)

Then ly is clearly linear, and

|ly(x)| = |(x, y)| ≤ ‖x‖‖y‖

using the Cauchy-Schwarz inequality. It follows that ly ∈ H∗ with ‖ly‖ ≤‖y‖. Choosing x = y in (11.1) shows that

|ly(y)| = (y, y) = ‖y‖2

and hence ‖ly‖ = ‖y‖.

The Riesz representation theorem and the adjoint operator 87

The Riesz Representation Theorem shows that this example can be ‘re-

versed’, i.e. every linear functional on H corresponds to some inner product:

Theorem 11.5 (Riesz Representation Theorem) For every bounded

linear functional f on a Hilbert space H there exists a unique element y ∈ H

such that

f(x) = (x, y) for all x ∈ H (11.2)

and ‖y‖H = ‖f‖H∗ .

Proof Let K = Ker f , which is a closed linear subspace of H.

First we show that K⊥ is a one-dimensional linear subspace of H. Indeed,

given u, v ∈ K⊥ we have

f( f(u)v − f(v)u ) = 0. (11.3)

Since u, v ∈ K⊥ it follows that f(u)v−f(v)u ∈ K⊥, while (11.3) shows that

f(u)v − f(v)u ∈ K. It follows† that

f(u)v − f(v)u = 0,

and so u and v are proportional.

Therefore we can choose z ∈ K such that ‖z‖ = 1, and decompose any

x ∈ H as

x = (x, z)z + w with w ∈ K.

Therefore

f(x) = (x, z)f(z) = (x, f(z)z).

Setting y = f(z)z we obtain (11.2).

To show that this choice of y is unique, suppose that

(x, y) = (x, y) for all x ∈ H.

Then (x, y − y) = 0 for all x ∈ H, i.e. y − y ∈ H⊥ = 0.Finally, the calculation in Example 11.4 shows the equality of the norms

of y and f .

We now use this to define the adjoint of an operator.

† We always have U ∩ U⊥ = 0: if x ∈ U and x ∈ U⊥ then ‖x‖2 = (x, x) = 0.


Theorem 11.6 Let H and K be Hilbert spaces and T ∈ B(H,K). Then

there exists a unique operator T ∗ ∈ B(K,H), the adjoint of T , such that

(Tx, y)K = (x, T ∗y)H

for all x ∈ H, y ∈ K. In particular, ‖T ∗‖B(K,H) ≤ ‖T‖B(H,K).

Proof Let y ∈ K and consider the function f : H → K defined by f(x) =

(Tx, y)K . Then clearly f is linear and

|f(x)| = |(Tx, y)K |≤ ‖Tx‖K‖y‖K≤ ‖T‖‖x‖H‖y‖K .

It follows that f ∈ H∗, and so by the Riesz representation theorem there

exists a unique z ∈ H such that

(Tx, y)K = (x, z)H for all x ∈ H.

Define T ∗y = z. Then by definition

(Tx, y)K = (x, T ∗y) for all x ∈ H, y ∈ K.

However, it remains to show that T ∗ ∈ B(K,H). First, T ∗ is linear since

(x, T ∗(αy1 + βy2))H = (Tx, αy1 + βy2)K

= α(Tx, y1)K + β(Tx, y2)K

= α(x, T ∗y1)H + β(x, T ∗y2)H

= (x, αT ∗y1 + βT ∗y2)H ,

i.e.

T ∗(αy1 + βy2) = αT ∗y1 + βT ∗y2.

To show that T ∗ is bounded, we have

‖T ∗y‖2 = (T ∗y, T ∗y)H

= (TT ∗y, y)K

≤ ‖TT ∗y‖K‖y‖K≤ ‖T‖‖T ∗y‖H‖y‖K .

If ‖T ∗y‖H = 0 then clearly ‖T ∗y‖H ≤ ‖T‖‖y‖K , otherwise we can divide

both side by ‖T ∗y‖H to obtain the same conclusion (that T ∗ is bounded

from K into H). So ‖T ∗‖ ≤ ‖T‖.Finally suppose that (x, T ∗y)H = (x, T y)H for all x ∈ H, y ∈ K. Then for

The Riesz representation theorem and the adjoint operator 89

each y ∈ H, (x, (T ∗−T )y)H = 0 for all x ∈ H: this shows that (T ∗−T )y = 0

for all y ∈ H, i.e. that T = T ∗.

Example 11.7 Let H = K = Cn with its standard inner product. Then

given a matrix A = (aij) ∈ Cn×n we have

(Ax, y) =

n∑

i=1

n∑

j=1

aijxj

yi

=

n∑

j=1

xj

n∑

i=1

(aijyi)

= (x,A∗y),

where A∗ is the Hermitian conjugate of A, i.e. A∗ = AT

.

Example 11.8 Consider H = K = L2(0, 1) and the integral operator

(Tx)(t) =

∫ 1

0K(t, s)x(s) ds.

Then for x, y ∈ L2(0, 1) we have

(Tx, y)H =

∫ 1

0

∫ 1

0K(t, s)x(s) dsy(t) dt

=

∫ 1

0

∫ 1

0K(t, s)x(s)y(t) ds dt

=

∫ 1

0x(s)

(∫ 1

0K(t, s)y(t) dt

)

ds

= (x, T ∗y)H ,

where

T ∗y(s) =

∫ 1

0K(t, s)y(t) dt.

Exercise 11.9 Show that the adjoint of the integral operator T : L2(0, 1) →L2(0, 1) defined as

(Tx)(t) =

∫ t

0K(t, s)x(s) ds

is given by

(T ∗y)(t) =

∫ 1

tK(s, t)y(s) ds.


Example 11.10 Let H = K = ℓ2 and consider σrx = (0, x1, x2, . . .) then

(σrx, y) = x1y2 + x2y3 + x3y4 + . . .

= (x, σ∗r y),

where σ∗r = σly = (y2, y3, y4, . . .). Similarly σ∗l = σr.

The following lemma gives some elementary properties of the adjoint:

Lemma 11.11 Let H, K, and J be Hilbert spaces, R,S ∈ B(H,K) and

T ∈ B(K,J), then

(a) (αR + βS)∗ = αR∗ + βS∗ and

(b) (TR)∗ = R∗T ∗.

Proof

(a) Exercise

(b) Clearly

(x, (TR)∗y)H = (TRx, y)J

= (Rx, T ∗y)K = (x,R∗T ∗y)H .

Less trivially we have the following:

Theorem 11.12 Let H and K be Hilbert spaces and T ∈ B(H,K). Then

(a) (T ∗)∗ = T ,

(b) ‖T ∗‖ = ‖T‖, and

(c) ‖T ∗T‖ = ‖T‖2.

Proof

(a) Since T ∗ ∈ B(K,H), (T ∗)∗ ∈ B(H,K). For all x ∈ K, y ∈ H we

have

(x, (T ∗)∗y)K = (T ∗x, y)H

= (y, T ∗x)H

= (Ty, x)K

= (x, Ty)K ,

i.e. (T ∗)∗y = Ty for all y ∈ H, i.e. (T ∗)∗ = T .

11.1 Linear operators from H into H 91

(b) We have already shown in the proof of Theorem 11.6 that ‖T ∗‖ ≤‖T‖. Applying this inequality to T ∗ rather than to T we have

‖(T ∗)∗‖ = ‖T‖ ≤ ‖T ∗‖, and so ‖T ∗‖ = ‖T‖.(c) Since ‖T‖ = ‖T ∗‖ we have

‖T ∗T‖ ≤ ‖T ∗‖‖T‖ = ‖T‖2.

But also we have

‖Tx‖2 = (Tx, Tx) = (x, T ∗Tx)

≤ ‖x‖‖T ∗Tx‖ ≤ ‖T ∗T‖‖x‖2,

i.e. ‖T‖2 ≤ ‖T ∗T‖.

11.1 Linear operators from H into H

Definition 11.13 If H is a Hilbert space and T ∈ B(H,H) then T is normal

if

TT ∗ = T ∗T

and self-adjoint if T = T ∗.

Note that if T is normal then TT ∗ is self-adjoint, since (TT ∗)∗ = (T ∗)∗T ∗ =

TT ∗ using (b) of Lemma 11.11 ((TR)∗ = R∗T ∗) and (a) of Theorem 11.12

((T ∗)∗ = T ).

Equivalently T ∈ B(H,H) is self-adjoint iff

(x, Ty) = (Tx, y) for all x, y ∈ H.

Example 11.14 Let H = Rn and A ∈ Rn×n. Then A is self-adjoint if

A = AT , i.e. if A is symmetric.

Example 11.15 Let H = Cn and A ∈ Cn×n. Then A is self-adjoint if

A = AT

, i.e. if A is Hermitian.

Example 11.16 Consider the right-shift operator on ℓ2, for which σ∗r = σl.

Then σr is not normal, since

σrσlx = σr(x2, x3, . . .) = (0, x2, x3, . . .)


and

σlσrx = σl(0, x1, x2, . . .) = (x1, x2, x3, . . .).

Example 11.17 The integral operator T : L2 → L2 given by

Tf(t) =

∫ 1

0K(t, s)f(s) ds

is self-adjoint if K(t, s) = K(s, t), i.e. if K is symmetric.

It would be nice, of course, to have yet another expression for ‖T‖, and

we can do this when T is self-adjoint.

Theorem 11.18 Let H be a Hilbert space and T ∈ B(H,H) a self-adjoint

operator. Then

(a) (Tx, x) is real for all x ∈ H and

(b) ‖T‖ = sup|(Tx, x)| : x ∈ H, ‖x‖ = 1.

Proof For (a) we have

(Tx, x) = (x, Tx) = (Tx, x),

and so (Tx, x) is real. Now let M = sup|(Tx, x)| : x ∈ H, ‖x‖ = 1.Clearly

|(Tx, x)| ≤ ‖Tx‖‖x‖ ≤ ‖T‖‖x‖2 = ‖T‖when ‖x‖ = 1, and so M ≤ ‖T‖.

For any u, v ∈ H we have

4Re(Tu, v) = (T (u+ v), u+ v) − (T (u− v), u− v)

≤ M(‖u+ v‖2 + ‖u− v‖2)

≤ 2M(‖u‖2 + ‖v‖2)

using the parallelogram law. If Tu 6= 0 choose

v =‖u‖‖Tu‖ Tu

to obtain, since ‖v‖ = ‖u‖, that

4‖u‖‖Tu‖ ≤ 4M‖u‖2,

i.e. ‖Tu‖ ≤M‖u‖. This also holds if Tu = 0. It follows that ‖T‖ ≤M and

therefore that ‖T‖ = M .

12

Spectral Theory I: General theory

12.1 Spectrum and point spectrum

LetH be a complex Hilbert space and T ∈ B(H,H), then the point spectrum

of T consists of the set of all eigenvalues,

σp(T ) = λ ∈ C : Tx = λx for some non-zero x ∈ H.

Clearly |λ| ≤ ‖T‖op for any λ ∈ σp(T ), since if Tx = λx then

|λ|‖x‖ = |λx| = ‖Tx‖ ≤ ‖T‖op‖x‖.

Example 12.1 Consider the right shift operator σr on ℓ2. This operator

has no eigenvalues, since σrx = λx implies that

(0, x1, x2, . . .) = λ(x1, x2, x3, . . .)

and so

λx1 = 0, λx2 = x1, λx3 = x2, . . . .

If λ 6= 0 then this implies that x1 = 0, and then x2 = x3 = x4 = . . . = 0,

and so λ is not an eigenvalue. If λ = 0 then we also obtain x = 0, and so

there are no eigenvalues, i.e. σp(σr) = ∅.

Example 12.2 Consider the left shift operator σl on ℓ2; λ ∈ C is an eigen-

value if σlx = λx, i.e. if

(x2, x3, x4 . . .) = λ(x1, x2, x3, . . .),

93

94 Spectral Theory I: General theory

i.e. if

x2 = λx1, x3 = λx2, x4 = λx3.

Given λ 6= 0 this gives a candidate ‘eigenfunction’

x = (1, λ, λ2, λ3, . . .),

which is an element of ℓ2 provided that

∞∑

n=1

|λ|2n =1

1 − |λ|2 <∞

which is the case for any λ with |λ| < 1. It follows that

λ ∈ C : |λ| < 1 ⊆ σp(σl).

If A is a linear operator on a finite-dimensional space V then λ ∈ C is an

eigenvalue of A if

Ax = λx for some non-zero x ∈ V.

In this case λ is an eigenvalue if and only if A−λI is not invertible (recall that

you can find the eigenvalues of an n×n matrix by solving det(A−λI) = 0).

However, this is no longer true in infinite-dimensional spaces. Before we

define the spectrum, we briefly discuss the definition of the inverse of a linear

operator.

As with the theory of matrices, the concept of the inverse of a general

linear operator is extremely useful. We say that A is injective if the equation

Ax = y

has a unique solution for every y ∈ range(A). We say that A is invertible if

it is injective and the range of A is equal to H. In this case we define the

inverse of A, A−1, by A−1y = x. It is easy to check that

AA−1u = u for all u ∈ range(A) and A−1Au = u for all u ∈ H.

Exercise 12.3 Show that if A is linear and A−1 exists then it is linear too.

The injectivity of A is equivalent to the triviality of its kernel.

Lemma 12.4 A is injective iff Ker(A) = 0.


Proof Suppose that A is invertible. Then the equation Ax = y has a unique

solution for any y ∈ Range(A). However, if Ker(A) contain some non-zero

element z then A(x+z) = y also, so Ker(A) must be 0. Conversely, if A is

not invertible then for some y ∈ Range(A) there are two distinct solutions,

x1 and x2, of Ax = y, and so A(x1 − x2) = 0, giving a non-zero element of

Ker(A).

We can now make the following definition:

Definition 12.5 The resolvent set of T , R(T ), is

R(T ) = λ ∈ C : (T − λI)−1 ∈ B(H,H).

The spectrum σ(T ) of T ∈ B(H,H) is the complement of R(T ),

σ(T ) = C \R(T ),

i.e. the spectrum of T is the set of all complex λ for which T − λI does not

have a bounded inverse defined on all of H.

Clearly σp(T ) ⊆ σ(T ), since if there is a non-zero z with Tz = λz then

Ker(T−λI) 6= 0 and so (T−λI) is not invertible (using Lemma 12.4). But

the spectrum can be much larger than the point spectrum; a nice example

will be a consequence of the fact that σ(T ∗) = σ(T ).

Lemma 12.6 σ(T ∗) = λ : λ ∈ σ(T ).

Proof If λ /∈ σ(T ) then T − λI has a bounded inverse,

(T − λI)(T − λI)−1 = I = (T − λI)−1(T − λI).

Taking adjoints we obtain

[(T − λI)−1]∗(T ∗ − λI) = I = (T ∗ − λI)[(T − λI)−1]∗,

and so T ∗ − λI has a bounded inverse, i.e. λ /∈ σ(T ∗). Starting instead with

T ∗ we deduce that λ /∈ σ(T ∗) ⇒ λ /∈ σ(T ), which completes the proof.

Example 12.7 Let σr be the right-shift operator on ℓ2. We saw above that

σr has no eigenvalues, but that for σ∗r = σl the interior of the unit disc is

contained in the point spectrum. It follows that

λ ∈ C : |λ| < 1 ⊆ σ(σr)


even though σp(σr) = ∅.

We have already seen that any eigenvalue λ of T must satisfy |λ| ≤ ‖T‖op.

We now show that this also holds for any λ ∈ σ(T ); the argument is more

subtle, and based on considering how to solve the linear equation (I−T )x =

y.

Theorem 12.8 Suppose that T ∈ B(H,H) with ‖T‖ < 1. Then (I−T )−1 ∈B(H,H) and

(I − T )−1 = I + T + T 2 + · · ·with

‖(I − T )‖−1 ≤ (1 − ‖T‖)−1.

Proof Since

‖T nx‖ ≤ ‖T‖‖T n−1x‖it follows that ‖T n‖ ≤ ‖T‖n. Therefore if we consider

Vn = I + T + · · · + T n

we have (for n > m)

‖Vn − Vm‖ = ‖Tm+1 + · · · + T n−1 + T n‖≤ ‖Tm+1‖ + · · · + ‖T n−1‖ + ‖T n‖≤ ‖T‖m+1 + · · · + ‖T‖n−1 + ‖T‖n

≤ ‖T‖m+1 1

1 − ‖T‖ .

It follows that Vn is Cauchy in the operator norm, and so converges to

some V ∈ B(H,H) with

‖V ‖ ≤ 1 + ‖T‖ + ‖T‖2 + · · · = [1 − ‖T‖]−1.

Clearly

V (I−T ) = (I+T+T 2+ · · · )(I−T ) = (I+T+T 2+ · · · )−(T +T 2+T 3) = I

and similarly (I − T )V = I.

As promised:

Corollary 12.9 The spectrum of T , σ(T ) ⊆ λ ∈ C : |λ| ≤ ‖T‖op.


Proof We have T − λI = λ( 1λT − I). So if I − 1

λT is invertible, λ /∈ σ(T ).

But for |λ| > ‖T‖op we have ‖ 1λT‖op < 1, and the above theorem shows that

T − λI is invertible and the result follows.

We now show that the spectrum must also be closed, by showing that it

complement (the resolvent set) is open. To this end, we prove the following

theorem, which shows that the set of bounded linear operators with bounded

inverses defined on all of H is open, i.e. that this property is stable under

perturbation.

Theorem 12.10 Let H be a Hilbert space and T ∈ B(H,H) such that

T−1 ∈ B(H,H). Then for any U ∈ B(H,H) with

‖U‖ < ‖T−1‖−1

the operator T + U is invertible with

‖(T + U)−1‖ ≤ ‖T−1‖1 − ‖U‖‖T−1‖ . (12.1)

Proof Let P = T−1(T + U) = I + T−1U . Then since by assumption

‖T−1‖‖U‖ < 1 it follows from Theorem 12.8 that P is invertible with

‖P‖−1 ≤ 1

1 − ‖T−1‖‖U‖ .

Using the definition of P we have

T−1(T + U)P−1 = P−1T−1(T + U) = I;

from the first of these identities we have

(T + U)P−1T−1 = I

and so

(T + U)−1 = P−1T−1

and (12.1) follows.

Corollary 12.11 If T ∈ B(H,H) then the spectrum of T is closed.

Proof We show that the resolvent set R(T ), the complement of σ(T ), is


open. Indeed, if λ ∈ R(T ) then T − λI is invertible. Theorem 12.10 show

that (T − λI) − δI is invertible for all

|δ| < ‖(T − λI)−1‖−1,

i.e. R(T ) is open.

Lemma 12.12 The spectrum of σl and of σr are both equal to the unit disc

in the complex plane:

σ(σ·) = λ ∈ C : |λ| ≤ 1.

Proof We showed earlier that for the shift operators σr and σl on ℓ2,

σ(σr) = σ(σl) ⊇ λ ∈ C : |λ| < 1.We have already shown that ‖σ·‖op = 1, so we know that at most

σ(σ·) ⊇ λ ∈ C : |λ| ≤ 1,but since it follows from Corollary 12.11 that

σ(σ·) ⊆ λ ∈ C : |λ| ≤ 1.It follows that in fact

σ(σ·) = λ ∈ C : |λ| ≤ 1.

A question on the final examples sheet shows that if T is self-adjoint then

σ(T ) ⊆ R.

13

Spectral theory II: compact self-adjoint operators

We now consider eigenvalues of compact self-adjoint linear operators on a

Hilbert space. It is convenient to restrict attention to Hilbert spaces over C,

but this is no restriction, since we can always consider the ‘complexification’

of a Hilbert space over R.

13.1 Complexification and real eigenvalues

Exercise 13.1 Let H be a Hilbert space over R, and define its complexifi-

cation HC as the vector space

HC = x+ iy : x, y ∈ H,

equipped with operations + and ∗ defined via

(x+ iy) + (w + iz) = (x+ w) + i(y + z), x, y, w, z ∈ V

and

(a+ ib) ∗ (x+ iy) = (ax− by) + i(bx+ ay) a, b ∈ R, x, y ∈ V.

Show that equipped with the inner product

(x+ iy,w + iz)HC= (x,w) + i(y,w) − i(x, z) + (y, z)

HC is a Hilbert space.

It follows that

‖x+ iy‖2HC

= ‖x‖2 + ‖y‖2.

99

100 Spectral theory II: compact self-adjoint operators

Just as we can complexify a Hilbert spaceH to give HC, we can complexify

a linear operator T that acts on H to a linear operator TC that acts on HC:

Lemma 13.2 Let H be a real Hilbert space and HC its complexification.

Given T ∈ B(H,H), extend T to a linear operator T : HC → HC via the

definition

T (x+ iy) = Tx+ iTy x, y ∈ H.

Then T ∈ B(HC,HC), any eigenvalue of T is an eigenvalue of T , and that

any real eigenvalue of T is an eigenvalue of T .

Proof Clearly

‖T (x+iy)‖2HC

= ‖Tx‖2+‖Ty‖2 ≤ ‖T‖2B(H,H)(‖x‖2+‖y‖2) = ‖T‖2

B(H,H)‖x+iy‖2HC,

so ‖T‖ ≤ ‖T‖. But also ‖Tx‖ = ‖T (x+ i0)‖, and so ‖T‖ ≤ ‖T‖. So in fact

‖T‖ = ‖T‖.If λ is an eigenvalue of T with eigenvector x then

T (x+ i0) = Tx = λx = λ(x+ i0);

while if T has eigenvalue λ ∈ R with eigenvector x+ iy (with either x or y

non-zero), it follows that

Tx+ iTy = T (x+ iy) = λ(x+ iy) = λx+ iλy,

and so Tx = λx and Ty = λy, so since x or y is non-zero, λ is an eigenvalue

of T .

Lemma 13.3 Let H be a real Hilbert space and T ∈ B(H,H) a self-adjoint

operator. Then T as defined above is a self-adjoint operator on HC.

Proof For ξ, η ∈ HC, ξ = x+ iy, η = u+ iv, x, y, u, v ∈ H,

(T ξ, η) = (T (x+ iy), u+ iv)HC

= (Tx+ iTy, u+ iv)

= (Tx, u) − i(Tx, v) + i(Ty, u) + (Ty, v)

= (x, Tu) − i(x, Tv) + i(y, Tu) + (y, Tv)

= (x+ iy, Tu+ iTv)

= (ξ, T η).


If T is self-adjoint then all its eigenvalues are real:

Theorem 13.4 Let T be a self-adjoint operator on a Hilbert space H. Then

all the eigenvalues of T are real and the eigenvectors corresponding to dis-

tinct eigenvalues are orthogonal.

Proof Suppose that Tx = λx with x 6= 0. Then

λ‖x‖2 = (λx, x) = (Tx, x) = (x, T ∗x) = (x, Tx) = (x, λx) = λ‖x‖2,

i.e. λ = λ.

Now if λ and µ are distinct eigenvalues with Tx = λx and Ty = µy then

0 = (Tx, y) − (x, Ty) = (λx, y) − (x, µy) = (λ− µ)(x, y),

and so (x, y) = 0.

Corollary 13.5 If T is a self-adjoint operator on a real Hilbert space H,

and T is its complexification defined above, σp(T ) = σp(T ).

13.2 Compact operators

We will develop our spectral theory for operators that are self-adjoint and

‘compact’ according to the following definition:

Definition 13.6 Let X and Y be normed spaces. Then a linear operator

T : X → Y is compact if for any bounded sequence xn ∈ X, the sequence

Txn ∈ Y has a convergent subsequence (whose limit lies in Y ).

Note that a compact operator must be bounded, since otherwise there

exists a sequence in X with ‖xn‖ = 1 but ‖Txn‖ → ∞, and clearly Txncannot have a convergent subsequence.

Example 13.7 Take T ∈ B(X,Y ) with finite-dimensional range. Then T

is compact, since any bounded sequence in a finite-dimensional space has a

convergent subsequence.


Theorem 13.8 Suppose that X is a normed space and Y is a Banach space.

If Kn is a sequence of compact (linear) operators in B(X,Y ) converging

to some K ∈ B(X,Y ) in the operator norm, i.e.

sup‖x‖X=1

‖Knx−Kx‖Y → 0

as n→ ∞, then K is compact.

Proof Let xn be a bounded sequence in X. Then since K1 is compact,

K1(xn) has a convergent subsequence, K1(xn1j). Since xn1j

is bounded,

K2(xn1j) has a convergent subsequence, K2(xn2j

). Repeat this process to

get a family of nested subsequences, xnkj, with Kl(xnkj

) convergent for all

l ≤ k.

Now consider the diagonal sequence yj = xnjj. Since yj is a subsequence

of xni for j ≥ n, it follows that Kn(yj) converges (as j → ∞) for every n.

We now show that K(yj) is Cauchy, and hence convergent, to complete

the proof. Choose ǫ > 0, and use the triangle inequality to write

‖K(yi) −K(yj)‖Y≤ ‖K(yi) −Kn(yi)‖Y + ‖Kn(yi) −Kn(yj)‖Y + ‖Kn(yj) −K(yj)‖Y .

Since yj is bounded and Kn → K in the operator norm, pick n large

enough that

‖K(yj) −Kn(yj)‖Y ≤ ǫ/3

for all yj in the sequence. For such an n, the sequence Kn(yj) is Cauchy,

and so there exists an N such that for i, j ≥ N we can guarantee

‖Kn(yi) −Kn(yj)‖Y ≤ ǫ/3.

So now

‖K(yi) −K(yj)‖Y ≤ ǫ for all i, j ≥ N,

and K(yn) is a Cauchy sequence.

We now use this theorem to show the following:

Proposition 13.9 The integral operator T : L2(a, b) → L2(a, b) given by

[Tu](x) =

∫ b

aK(x, y)u(y) dy


with∫ b

a

∫ b

a|K(x, y)|2 dxdy <∞

is compact.

Proof Let φj be an orthonormal basis for L2(a, b). It follows that φi(x)φj(y)is an orthonormal basis for L2((a, b) × (a, b)). If we write K(x, y) in terms

of this basis we have

K(x, y) =∞∑

j,k=1

kijφi(x)φj(y),

where the coefficients kij are given by

kij =

∫ b

a

∫ b

aK(x, y)φi(x)φj(y) dxdy,

and the sum converges in L2((a, b) × (a, b)). Since φi(x)φj(y) is a basis

we have

‖K‖2L2((a,b)×(a,b)) =

∫ b

a

∫ b

a|K(x, y)|2 dxdy =

∞∑

i,j=1

|kij |2. (13.1)

We now approximate T by operators derived from finite truncations of

the expansion of K(x, y). We set

Kn(x, y) =

n∑

i,j=1

kijφi(x)φj(y),

and

[Tnu](x) =

∫ b

aKn(x, y)u(y) dy.

If u ∈ L2(Ω) is given by u =∑∞

l=1 clφl, then

(Tnu)(x) =

n∑

i,j=1

∞∑

l=1

∫ b

akijφi(x)φj(y)clφl(y) dy

=n∑

i,j=1

kijcjφi(x).

Since Tnu is therefore a linear combination of φini=1, the range of Tn has

dimension n. It follows that n is compact for each n; if we can show that


Tn → T in the operator norm then we can use theorem 13.8 to show that T

is compact.

This is straightforward, since

‖(T − Tn)u‖2 =

∫ b

a

∫ b

a|K(x, y)u(y) −Kn(x, y)u(y)|2 dxdy

≤(∫ b

a

∫ b

a|K(x, y) −Kn(x, y)|2 dxdy

)∫ b

a|u(y)|2 dy,

i.e.

‖T − Tn‖2 ≤∫ b

a

∫ b

a|K(x, y) −Kn(x, y)|2 dxdy

≤∫ b

a

∫ b

a

∣

∣

∣

∣

∣

∣

∞∑

i,j=n+1

kijφi(x)φj(y)

∣

∣

∣

∣

∣

∣

2

dxdy

=

∞∑

i,j=n+1

|kij |2,

using the expansion of K and Kn. Convergence of Tn to T follows since the

sum in (13.1) is finite.

We now show that any compact self-adjoint operator has at least one

eigenvalue. (Recall that σr is not even normal, so this is no contradiction.)

Theorem 13.10 Let H be a Hilbert space and T ∈ B(H,H) a compact

self-adjoint operator. Then at least one of ±‖T‖op is an eigenvalue of T .

Proof We assume that T 6= 0, otherwise the result is trivial. From Theorem

11.18,

‖T‖op = sup‖x‖=1

|(Tx, x)|.

Thus there exists a sequence xn, of unit vectors, such that

(Txn, xn) → ±‖T‖op = α.

Since T is compact there is a subsequence xnjsuch that Txnj

is convergent

to some y. Relabel xnjas xn again.

Now consider

‖Txn − αxn‖2 = ‖Txn‖2 + α2 − 2α(Txn, xn)

≤ 2α2 − 2α(Txn, xn);


by the choice of xn, the right-hand side tends to zero as n→ ∞. It follows,

since Txn → y, that

αxn → y,

and since α 6= 0 is fixed we must have xn → x for some x ∈ H. Therefore

Txn → Tx = αx. It follows that

Tx = αx

and clearly x 6= 0, since ‖y‖ = |α|‖x‖ = ‖T‖op 6= 0.

Note that since any eigenvalue must satisfy |λ| ≤ ‖T‖op, since if Tx = λx

we have

λ‖x‖2 = (λx, x) = (Tx, x) ≤ ‖Tx‖‖x‖ ≤ ‖T‖op‖x‖2,

it follows that ‖T‖op = supλ : λ ∈ σp(T ).

Proposition 13.11 Let T be a compact self-adjoint operator on a Hilbert

space H. Then σp(T ) is either finite or consists of a countable sequence

tending to zero. Furthermore every distinct non-zero eigenvalue corresponds

to a finite number of linearly independent eigenvectors.

Proof Suppose that T has infinitely many eigenvalues that do not form a

sequence tending to zero. Then for some ǫ > 0 there exists a sequence of

distinct eigenvalues with |λn| > ǫ. Let xn be a corresponding sequence of

eigenvectors with ‖xn‖ = 1; then

‖Txn − Txm‖2 = (Txn − Txm, Txn − Txm) = |λn|2 + |λm|2 ≥ 2ǫ2

since (xn, xm) = 0. It follows that Txn can have no convergent subse-

quence, which contradicts the compactness of T .

Now suppose that for some eigenvalue λ there exists an infinite number of

linearly independent eigenvectors en∞n=1. Using the Gram-Schmidt process

we can find a countably infinite orthonormal set of eigenvectors, since any

linear combination of the ej is still an eigenvector:

T (

n∑

j=1

αjej) =

n∑

j=1

αjTej = λ(

n∑

j=1

αjej).

Now, we have

‖Ten − Tem‖ = ‖λen − λem‖ = |λ|‖en − em‖ =√

2|λ|.


It follows that Ten can have no convergent subsequence, again contradict-

ing the compactness of T . (Note that this second part does not in fact use

the fact that T is self-adjoint.)

Lemma 13.12 Let T ∈ B(H,H) and let S be a closed linear subspace of H

such that TS ⊆ S. Then T ∗S⊥ ⊆ S⊥.

Proof Let x ∈ S⊥ and y ∈ S. Then Ty ∈ S and so (Ty, x) = (y, T ∗x) = 0

for all y ∈ S, i.e. T ∗x ∈ S⊥.

Since we will apply this lemma when T is self-adjoint, for such a T we

have

TS ⊆ S ⇒ TS⊥ ⊆ S⊥

for any closed linear subspace S of H.

Theorem 13.13 (Hilbert-Schmidt Theorem). Let H be a Hilbert space and

T ∈ B(H,H) be a compact self-adjoint operator. Then there exists a finite

or countably infinite orthonormal sequence wn of eigenvectors of T with

corresponding non-zero real eigenvalues λn such that for all x ∈ H

Tx =∑

j

λj(x,wj)wj . (13.2)

Proof By Theorem 13.10 there exists a w1 such that ‖w1‖ = 1 and Tw1 =

±‖T‖w1. Consider the subspace of H perpendicular to w1,

H2 = w⊥1 .

Then since T is self-adjoint, Lemma 13.12 shows that T leaves H2 invariant.

If we consider T2 = T |H2then we have T2 ∈ B(H2,H2) with T2 compact;

this operator is still self-adjoint, since for all x, y ∈ H2

(x, T2y) = (x, Ty) = (T ∗x, y) = (Tx, y) = (T2x, y).

Now apply Theorem 13.10 to the operator T2 on the Hilbert space H2

find an eigenvalue λ2 = ±‖T2‖ and an eigenvector w2 ∈ H2 with ‖w2‖ = 1.

Continue this process as long as Tn 6= 0.

If Tn = 0 for some n then for any x ∈ H we have

y := x−n−1∑

j=1

(x,wj)wj ∈ Hn.


Then

0 = Tny = Ty = Tx−n−1∑

j=1

(x,wj)Twj = Tx−n−1∑

j=1

λj(x,wj)wj

which is (13.2).

If Tn is never zero then consider

yn := x−n−1∑

j=1

(x,wj)wj ∈ Hn.

Then we have

‖x‖2 = ‖yn‖2 +

n−1∑

j=1

|(x,wj)|2,

and so ‖yn‖ ≤ ‖x‖. It follows that∥

∥

∥

∥

∥

∥

Tx−n−1∑

j=1

λj(x,wj)wj

∥

∥

∥

∥

∥

∥

= ‖Tyn‖ ≤ ‖Tn‖‖yn‖ = |λn|‖x‖,

and since |λn| → 0 as n→ ∞ we have (13.2).

There is a partial converse to this theorem on the last examples sheet: if

H is a Hilbert space, ej is an orthonormal set in H, and T ∈ B(H,H) is

such that

Tu =

∞∑

j=1

λj(u, ej)ej with λj ∈ R and λj → 0 as j → ∞

then T is compact and self-adjoint.

The orthonormal sequence constructed in this theorem is only a basis for

the range of T ; however, we do have:

Corollary 13.14 Let H be an infinite-dimensional Hilbert space and T ∈B(H,H) a compact self-adjoint operator. Then there exists an orthonormal

basis E of H consisting of eigenvectorsof T , and for any x ∈ H

Tx =

∞∑

e∈E

λe(x, e)e.

where Te = λee.


Proof From Theorem 13.13 we have a finite or countable sequence wn of

eigenvectors of T such that

Tx =∞∑

j=1

λj(x,wj)wj . (13.3)

Now let F be an orthonormal basis for KerT (this exists since Ker(T ) is

a Hilbert space in its own right, and every Hilbert space has an orthonormal

basis); each f ∈ F is an eigenvector of T with eigenvalue zero, and since

Tf = 0 but Twj = λjwj with λj 6= 0, we know that (f, ek) = 0 for all f ∈ F ,

k ∈ N. So F ∪ wj is an orthonormal set in H. orthonormal set in H.

Now, (13.3) implies that

T

x−∞∑

j=1

(x,wj)wj

= 0,

i.e. that x−∑∞j=1(x,wj)wj ∈ KerT . It follows that wj∪F is an orthonor-

mal basis for H.

Exercise 13.15 Show that if T is invertible and satisfies the conditions

of Theorem 13.13 then there is an orthonormal basis of H consisting of

eigenvectors corresponding to non-zero eigenvalues of T .

We end this chapter with a corollary of Corollary 13.14 (!) that shows that

the eigenvalues are essentially all of the spectrum of a compact self-adjoint

operator.

Theorem 13.16 Let T be a compact self-adjoint operator on a Hilbert space.

Then σ(T ) = σp(T ).

Note that this means that either σ(T ) = σp(T ) or σ(T ) = σp(T ) ∪ 0,since σp(T ) has no limit points except perhaps zero. So σ(T ) = σp(T )

unless there are an infinite number of eigenvalues but zero itself is not an

eigenvalue.

Proof By the corollary of the HS Theorem, we have

Tx =∑

e∈E

λe(x, e)e

for some orthonormal basis E of H.


Now take µ /∈ σp(T ). For such µ, it follows that there exists a δ > 0 such

that

supj

|µ− λ| ≥ δ > 0 for all λ ∈ σp(T )

(otherwise µ ∈ σp(T )). We use this to show that T − µI is invertible with

bounded inverse.

Now,

(T − µI)x = y ⇔∑

e∈E

(λe − µ)(x, e)e =∑

e∈E

(y, e)e.

Taking the inner product of both sides with a particular f ∈ E, we have

(T − µI)x = y ⇔ (λf − µ)(x, f) = (y, f).

So we must have

(x, f) =(y, f)

λk − µ.

Since∑ |(y, f)|2 <∞ and |λ− µ| ≥ δ, it follows that

x =∑

f∈E

(x, f)f

converges, and that ‖x‖ ≤ δ−1‖y‖. So (T − µI)−1 exists and is bounded.

14

Sturm-Liouville problems

We consider the Sturm-Liouville problem

− d

dx

(

p(x)du

dx

)

+ q(x)u = λu with u(a) = u(b) = 0. (14.1)

As a shorthand, we write L[u] for the left-hand side of (14.1), i.e.

L[u] = −(p(x)u′)′ + q(x)u

We will assume that p(x) > 0 on [a, b] and that q(x) ≥ 0 on [a, b].

It was one of the major concerns of applied mathematics throughout the

nineteenth century to show that the solutions un(x) of (14.1) form a com-

plete basis for some appropriate space of functions (generalising the use of

Fourier series as a basis for L2). We can do this easily using the theory

developed in the last section.

However, first we have to turn the differential equation (14.1) into an

integral equation. We do this as follows:

Lemma 14.1 Let u1(x) and u2(x) be two linearly independent non-zero

solutions of

− d

dx

(

p(x)du

dx

)

+ q(x)u = 0.

Then

Wp(u1, u2)(x) := p(x)[u′1(x)u2(x) − u′2(x)u1(x)]

is a constant.

110

Sturm-Liouville problems 111

Proof First we show that Wp is constant. Differentiate Wp with respect to

x, then use the fact that L[u1] = L[u2] = 0 to substitute for pu′′ = qu− p′u′

to give:

W ′p = p′u′1u2 + pu′′1u

′2 + pu′1u

′2 − p′u1u

′2 − pu′1u

′2 − pu1u

′′2

= p′(u′1u2 − u′2u1) + p(u′′1u2 − u′′2u1)

= p′(u′1u2 − u′2u1) + u2(qu1 − p′u′1) − u1(qu2 − p′u′2)

= 0.

Now, if Wp ≡ 0 then, since p 6= 0, we have

u′1u2 − u′2u1 = 0 ⇒ u′1u1

=u′2u2,

which can be integrated to give lnu1 = lnu2 + c, i.e. u1 = ecu2, which

implies that u1 and u2 are proportional, contradicting the fact that they are

linearly independent.

Theorem 14.2 Suppose that u1(x) and u2(x) are two linearly independent

non-zero solutions of

− d

dx

(

p(x)dy

dx

)

+ q(x)y = 0,

with u1(a) = 0 and u2(b) = 0. Set C = Wp(u1, u2)−1 and define

G(x, y) =

Cu1(x)u2(y) a ≤ x < y

Cu2(x)u1(y) y ≤ x ≤ b;(14.2)

then the solution of L[u] = f is given by

u(x) =

∫ b

aG(x, y)f(y) dy. (14.3)

Proof Writing (14.3) out in full we have

u(x) = Cu2(x)

∫ x

au1(y)f(y) dy + Cu1(x)

∫ b

xu2(y)f(y) dy.

112 Sturm-Liouville problems

Now,

u′(x) = Cu2(x)u1(x)f(x) + Cu′2(x)

∫ x

au1(y)f(y) dy − Cu1(x)u2(x)f(x)

+Cu′1(x)

∫ b

xu2(y)f(y) dy

= Cu′2(x)

∫ x

au1(y)f(y) dy + Cu′1(x)

∫ b

xu2(y)f(y) dy,

and then since CWp(u1, u2) = 1,

u′′(x) = Cu′2(x)u1(x)f(x) + Cu′′2(x)

∫ x

au1(y)f(y) dy − Cu′1(x)u2(x)f(x)

+Cu′′1(x)

∫ b

xu2(y)f(y) dy

= −f(x)

p(x)+ Cu′′2(x)

∫ x

au1(y)f(y) dy + Cu′′1(x)

∫ b

xu2(y)f(y) dy.

Now we have L[u] = −pu′′ − p′u′ + qu, and since L is linear with L[u1] =

L[u2] = 0 it follows that

L[u] = f(x)

as claimed.

We can now define a linear operator on L2(a, b) by the right-hand side of

(14.3):

Tf(x) =

∫ b

aG(x, y)f(y) dy.

Since G is symmetric (see (14.2)), it follows that T is a self-adjoint (see

Example 11.17), and we have proved that such a T is compact in Proposition

13.9.

If L[u] = λu, then u = T (λu). Since T is linear,

L[u] = λu ⇔ u = λTu.

If we can show that λ, 1/λ 6= 0 then the eigenvectors (or ‘eigenfunctions’) of

the ODE boundary value problem L[u] = λu (which is just (14.1)) will be

exactly those of the operator T (for which Tu = 1λu). Since the eigenvectors

of T form an orthonormal basis for L2(a, b) (we will see that Ker(T ) = 0),the same is true of the eigenfunctions of the Sturm-Liouville problem.

Sturm-Liouville problems 113

Theorem 14.3 The eigenfunctions of the problem (14.1) form a complete

orthonormal basis for L2(a, b).

Proof We show first that λ = 0 is not an eigenvalue of (14.1), i.e. there is

no non-zero u for which L[u] = 0. Indeed, if L[u] = then we have

0 = (L[u], u) =

∫ b

a−(pu′)′u+ q|u|2 dx

=

∫ b

ap|u′|2 + q|u|2 dx−

[b

ap(x)u′(x)u(x)]

=

∫ b

ap|u′|2 + q|u|2 dx.

Since p > 0 on [a, b], it follows that u′ = 0 on [a, b], and so u must be

constant on [a, b]. Since u(a) = 0, it follows that u ≡ 0.

We now use show that Ker(T ) = 0. Indeed, Tf is the solution of L[u] = f ,

i.e. f = L[Tf ]. So if Tf = 0, it follows that f = 0.

So φ is an eigenfunction of the SL problem iff it is an eigenvector for T :

L[φ] = λφ ⇔ Tφ =1

λφ.

Since G(x, y) is symmetric and bounded, it follows from Examples 10.9

and 11.8 that T is a bounded self-adjoint operator; Proposition 13.9 shows

that T is also compact. It follows from Theorem 13.13 that T has a set of

orthonormal eigenfunctions φj with

Tφj = µjφj ,

and since Ker(T ) = 0 the argument of Corollary 13.14 shows that those

form an orthonormal basis for L2(a, b).

Comparing this with our original problem we obtain an infinite set of

eigenfunctions φj with corresponding eigenvalues λj = µ−1j . Note that

now λj → ∞ as j → ∞. As above, the eigenfunctions φj form an or-

thonormal basis of L2(a, b).

This has immediate applications for Fourier series. Indeed, if we consider

−d2u

dx2= λu u(0) = 0, u(1) = 0,

which is (14.1) with p = 1, q = 0, it follows that the eigenfunctions of this

114 Sturm-Liouville problems

equation will form a basis for L2(0, 1). These are easily found by elementary

methods, and are

sin kπx∞k=1.

It follows that, appropriately normalised, these functions form an orthonor-

mal basis for L2(a, b), i.e. that any f ∈ L2(a, b) can be expanded in the

form

f(x) =

∞∑

k=1

αk sin kπx.

Thus begins the theory of Fourier series...

Exercise 14.4 Show that the solution of −d2u/dx2 = f is given by

u(x) =

∫ 1

0G(x, y)f(y) dy,

where

G(x, y) =

x(1 − y) 0 ≤ x < y

y(1 − x) y ≤ x ≤ 1.

functional analysis i autumn term 2008 - warwickmasdh/fa1_2008.pdffunctional analysis i autumn term...

Documents