functional analysis ii - unistra.frirma.math.unistra.fr/~sabri/lectures_fa.pdf · functional...

81
Cairo University Faculty of Sciences Department of Mathematics Functional Analysis II Mostafa

Upload: others

Post on 15-Jul-2020

19 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

Cairo UniversityFaculty of Sciences

Department of Mathematics

Functional Analysis II

Mostafa SABRI

Page 2: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

ii

Page 3: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

Contents

1 LCS and Distributions 11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.2 Locally convex spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.3 Fréchet Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61.4 The inductive limit topology . . . . . . . . . . . . . . . . . . . . . . . . . . 81.5 Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111.6 PoU and sheaf structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131.7 Distributions with compact support . . . . . . . . . . . . . . . . . . . . . . 151.8 Further results and applications . . . . . . . . . . . . . . . . . . . . . . . . . 161.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2 Fourier analysis 212.1 Convolutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212.2 The Fourier transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242.3 Sobolev spaces and embeddings . . . . . . . . . . . . . . . . . . . . . . . . . 272.4 Paley-Wiener Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322.5 Further results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

3 Some Important Theorems 393.1 Measure theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393.2 Separation of convex sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413.3 Banach-Alaoglu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423.4 Markov-Kakutani and Haar . . . . . . . . . . . . . . . . . . . . . . . . . . . 443.5 Krein-Milman . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463.6 Stone-Weierstrass vs Monotone Class . . . . . . . . . . . . . . . . . . . . . . 473.7 Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

4 Spectral theory 554.1 Compact and Hilbert-Schmidt operators . . . . . . . . . . . . . . . . . . . . 554.2 Spectral theorem for compact operators . . . . . . . . . . . . . . . . . . . . 584.3 Application : Peter-Weyl Theorem . . . . . . . . . . . . . . . . . . . . . . . 604.4 The spectrum of a bounded operator . . . . . . . . . . . . . . . . . . . . . . 624.5 Continuous functional calculus . . . . . . . . . . . . . . . . . . . . . . . . . 664.6 Borel functional calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 674.7 Further results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 694.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

Bibliography 75

Page 4: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions
Page 5: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

Chapter 1

LCS and Distributions

1.1 Introduction

In this chapter we present the basics of the theory of distributions, also known asgeneralized functions. 1 Some motivation to introduce them:

– Differentiation of functions is not well behaved. For instance, there exist continuousfunctions which are nowhere differentiable. Another example : if a sequence (fk)converges to some function f pointwise or uniformly, then (f ′k) does not necessarilyconverge to f ′ in the same sense.

– One would like to generalize the concept of a function to encompass objects likethe “Dirac delta function”, which was defined by Dirac as a function δ satisfyingδ(x) = 0 if x 6= 0, δ(0) = +∞ and

∫R f(x)δ(x) dx = f(0) for sufficiently smooth

f : R→ R. These properties are of course contradictory, because if δ(x) = 0 for allx 6= 0, then f(x)δ(x) = 0 a.e., so

∫R f(x)δ(x) dx = 0. This is why Von Neumann

objected its use during his work on the foundations of quantum mechanics. Butdespite this, physicists continued to use it and obtain quite successful results.

The theory of distributions was founded by Laurent Schwartz in the 1940’s, withsome earlier results by Sergei Sobolev. The idea is that, instead of restricting the useof differentiation to smooth functions, one can instead enlarge the space of functions,so that it contains derivatives of functions which are not differentiable in the classicalsense. For example, f(x) = |x| will be differentiable on R, with derivative f ′(x) = −1if x < 0 and f ′(x) = 1 if x > 0. The way we’ll do this is to first define a convenienttopology on the space of smooth functions with compact support, then let the space of“generalized functions” be its topological dual, and finally define on it calculus conceptslike differentiation through the duality bracket. As a result,

– On this larger space of distributions, any generalized function is differentiable. More-over, if (fk) converges to f (in this dual topology), then ∂αfk converges to ∂αf forall α. Finally, if the function is smooth, then “generalized differentiation” is thesame as classic differentiation.

– The Dirac delta function has a simple interpretation as a distribution, and can indeedbe seen as a limit of well behaved functions getting more and more singular at 0(which is the physicist’s intuition). We call these “delta sequences”.

– As a bonus we get interesting applications in the theory of partial differential equa-tions. It turns out that if P (D) is a partial differential operator and if E is adistribution, then there always exist a solution to the equation P (D)E = δ. This

1. Sections 1-4 simply unzip [15], the rest is taken from [32].

Page 6: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

2 Chapter 1. LCS and Distributions

result is due to Malgrange and Ehrenpreis, and such solutions E are called “fun-damental solutions”. If now we’d like to solve the PDE P (D)u = v, and if v hascompact support, it suffices to choose u = E ∗ v. Then u will be a solution.

– Let us finally mention that the famous “Schwartz kernel (or nuclear) theorem” con-cerning bilinear forms on functions of rapid decay was the starting point of a deepstudy by Grothendieck of spaces in which such theorems hold. His PhD thesis waspublished in [18], and the main results were summarized later in [17]; a paper whichhad a major impact in Banach space theory.

To conclude this introduction, let us mention that the idea of generalizing classicalconcepts by duality is quite fertile. For example, one uses the same idea to define “gen-eralized eigenfunctions” in Spectral Theory. Here, starting from a Hilbert space H , wefirst construct a smaller space H+ →H , define its dual H−, so that H+ →H →H−,and let generalized eigenfunction live in H−.

1.2 Locally convex spacesDefinition 1.1. Let X be a vector space over K = R or C. A function p : X → R iscalled a seminorm if(i) p(x+ y) ≤ p(x) + p(y) for x, y ∈ X (sub-additivity),(ii) p(λx) = |λ| · p(x) for λ ∈ K and x ∈ X (positive homogeneity).

Lemma 1.2. Let p be a seminorm on a vector space X. Then(a) p(0) = 0,(b) p(x) ≥ 0,(c) |p(x)− p(y)| ≤ p(x− y),

Proof. By homogeneity, p(0) = p(0 · x) = 0 · p(x) = 0, which is (a).By sub-additivity, p(x) ≤ p(x − y) + p(y), so p(x − y) ≥ p(x) − p(y). Substituting

x for y we get p(x − y) = | − 1| · p(y − x) ≥ p(y) − p(x), so (c) follows. In particular,p(x) ≥ |p(x)− p(0)| ≥ 0, which is (b).

Definition 1.3. Given a seminorm on a vector space X, we call

By,ε(p) = x ∈ X : p(x− y) < ε

the p-semiball of radius ε > 0 centered at y ∈ X.Given a finite collection of seminorms p1, . . . , pk, we call

By,ε(p1, . . . , pk) = By,ε(p1) ∩ . . . ∩By,ε(pk)

the (p1, . . . , pk)-semiball of radius ε > 0 centered at y ∈ X.

Definition 1.4. Let X be a vector space and let P be a family of seminorms on X. Wedefine a topology τP on X by calling A ⊆ X open if ∀x ∈ A, ∃ε > 0 and p1, . . . , pk ∈ Psuch that Bx,ε(p1, . . . , pk) ⊆ A.

With this topology, we call (X,P) a locally convex space. 2

2. Remark on terminology: a topological vector space in general is called locally convex if there is abase for the topology that consists of convex sets. It can be shown that a space is locally convex in thissense iff its topology is generated by a family of seminorms; see [32]. The reader will prove some results inthis direction in the exercises.

Page 7: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

1.2. Locally convex spaces 3

Lemma 1.5. Let (X,P) be a locally convex space. Then1. τP is a topology on X.2. The collection By,ε(p1, . . . , pk) : y ∈ X, ε > 0 and p1, . . . , pk ∈ P is a base for τP .3. The collection By,ε(p) : y ∈ X, ε > 0 and p ∈ P is a subbase for τP .4. If p1 and p2 are seminorms, then max(p1, p2) is a seminorm.5. If

P = P ∪ maxp1, . . . , pk : p1, . . . , pk ∈ P, k ∈ N ,

then τP = τP .6. The collection By,ε(p) : y ∈ X, ε > 0 and p ∈ P is a base for τP .7. If p ∈ P and p′ is a seminorm satisfying p′(x) ≤ cp(x) for all x ∈ X and some constantc > 0, then τP ′ = τP , where P ′ = P ∪ p′.

The advantage of working with P is that, similar to metric spaces, the semiballs By,ε(p),p ∈ P become a base, and we get rid of finite intersections By,ε(p1, . . . , pk).

The point of the last item is that, if P ′ is given initially, then using this result we canremove redundant seminorms from P ′ to “clean it up” without affecting the topology.

Proof. 1. Clearly, X ∈ τP . Also, ∅ ∈ τP because any element in ∅ (there are none) satisfiesany property. Next, if U, V ∈ τP , then for any x ∈ U∩V , we may find εi > 0 and pi, qi ∈P such thatBx,ε1(p1, . . . , pk) ⊆ U andBx,ε2(q1, . . . , qk′) ⊆ V . Let ε = min(ε1, ε2). ThenBx,ε(p1, . . . , pk, q1, . . . , qk′) = Bx,ε(p1, . . . , pk)∩Bx,ε(q1, . . . , qk′) ⊆ U∩V , so U∩V ∈ τP .Finally, if Uαα∈I is any collection in τP and x ∈ ∪α∈IUα, then x ∈ Uα for some α, sowe may find ε > 0 and pi such that Bx,ε(p1, . . . , pk) ⊆ Uα ⊆ ∪α∈IUα, so the union is inτP . Hence, τP is a topology.

2. Let U ∈ τP . Then ∀x ∈ U ∃εx > 0 and pi,x such that Bx,εx(p1,x, . . . , pk,x) ⊆ U .Hence, U ⊆ ∪x∈UBx,εx(p1,x, . . . , pk,x) ⊆ U , so U = ∪x∈UBx,εx(p1,x, . . . , pk,x). Thus,Bx,ε(p1, . . . , pk)x∈U,ε>0,pi∈P is a base.

3. By item 2, U = ∪x∈U(Bx,εx(p1,x) ∩ . . . ∩ Bx,εx(pk,x)

). Thus, Bx,ε(p)x∈U,ε>0,p∈P is a

subbase.4. Easy, left for the reader.5. If U ∈ τP , then U ∈ τP because P ⊆ P. In detail: if U ∈ τP and x ∈ U , there existε > 0 and pi ∈ P such that Bx,ε(p1, . . . , pk) ⊆ U . In particular, there exist pi ∈ P suchthat Bx,ε(p1, . . . , pk) ⊆ U , so U ∈ τP .If U ∈ τP , ∃ε > 0 and qi ∈ P such that Bx,ε(q1, . . . , qk) ⊆ U . Suppose qi =max(p(1)

i , . . . , p(mi)i ). ThenBx,ε(qi) = Bx,ε(p(1)

i )∩. . .∩Bx,ε(p(mi)i ) = Bx,ε(p(1)

i , . . . , p(mi)i ).

Hence, Bx,ε(p(1)1 , . . . , p

(m1)1 , p

(1)2 , . . . , p

(mk)k ) = Bx,ε(q1, . . . , qk) ⊆ U , so U ∈ τP .

6. By item 2, if U ∈ τP , then U = ∪x∈UBx,ε(p1,x, . . . , pk,x), where pi,x ∈ P. Hence,U = ∪x∈UBx,ε(qx), where qx = max(pi,x) ∈ P. Hence, Bx,ε(q)q∈P is a base.

7. If U ∈ τP , then U ∈ τP ′ because P ⊆ P ′. Conversely, if U ∈ τP ′ , then given x ∈ U ,there exist ε > 0, pi ∈ P ′ such that Bx,ε(p1, . . . , pk) ⊆ U . If all pi ∈ P, for all x ∈ U ,then U ∈ τP . Otherwise, say pi = p′. Then p′(x) ≤ cp(x) implies By,ε(p) ⊆ By,cε(p′).Hence, Bx,ε(p1) ∩ . . . ∩ Bx,ε(pi−1) ∩ Bx,ε/c(p) ∩ . . . ∩ Bx,ε(pk) ⊆ Bx,ε(p1, . . . , pk) ⊆ U .Hence, U ∈ τP (take ε′ = min(ε, ε/c)).

Reminder 1.6. Recall that a sequence xk ⊆ X in a topological space converges if forany open set ω ⊆ X containing x, we have xk ∈ ω for all large k.

Page 8: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

4 Chapter 1. LCS and Distributions

Lemma 1.7. Let (X,P) be a locally convex space. Then xk → x iff p(xk − x)→ 0 for allp ∈ P.

Proof. Suppose xk → x and let p ∈ P. Then given ε > 0, there exists k0 > 0 such thatxk ∈ Bx,ε(p) for all k ≥ k0. Thus, p(x− xk) < ε for all k ≥ k0, so p(x− xk)→ 0.

Conversely, suppose p(x− xk)→ 0 for any p ∈ P and let ω be an open set containingx. Then there exists δ > 0 and pi ∈ P such that Bx,δ(p1, . . . , pk) ⊆ ω. Now we may findki such that pi(xk − x) < δ for all k ≥ ki. Let k0 = max ki. Then pi(x − xk) < δ for allk ≥ k0 and all i. Hence, xk ∈ Bx,δ(p1, . . . , pk) ⊆ ω for all k ≥ k0.

The reader knows from the theory of Banach spaces that a linear operator is continuousiff it is bounded. For locally convex spaces, we have the following.

Theorem 1.8. Let (X,P) and (Y,Q) be locally convex spaces. Then(a) f : X → Y is continuous iff ∀x ∈ X, ε > 0, q ∈ Q, ∃p ∈ P, δ > 0 such that

z ∈ Bx,δ(p) =⇒ q(f(x)− f(z)) < ε . (?)

(b) If moreover f is linear, then f is continuous iff ∀q ∈ Q, ∃p ∈ P, C > 0 such that

q(f(x)) ≤ Cp(x) ∀x ∈ X . (??)

(c) A seminorm q : X → R (not necessarily in P) is continuous on (X,P) iff ∃p ∈ P,C > 0 such that

q(x) ≤ Cp(x) ∀x ∈ X .

Proof. (a) Suppose f is continuous and let x ∈ X, ε > 0 and q ∈ Q. Then Bf(x),ε(q) isopen in Y , so f−1(Bf(x),ε(q)) is open in X. But x ∈ f−1(Bf(x),ε(q)), so ∃δ > 0, p ∈ Psuch that Bx,δ(p) ⊆ f−1(Bf(x),ε(q)). Hence, z ∈ Bx,δ(p) implies f(z) ∈ Bf(x),ε(q)implies q(f(z)− f(x)) < ε.Conversely, suppose (?) is true, let U ⊆ Y be open and let x ∈ f−1(U). Thenf(x) ∈ U , so ∃ε > 0, q ∈ Q such that Bf(x),ε(q) ⊆ U . By (?) we may find δ > 0, p ∈ Psuch that Bx,δ(p) ⊆ f−1(Bf(x),ε(q)) ⊆ f−1(U). Hence, f−1(U) is open.

(b) If (??) is true, by linearity q(f(x)−f(z)) ≤ Cp(x−z). So given ε > 0, taking δ = ε/C,we have z ∈ Bx,δ(p) implies p(x− z) < δ implies q(f(x)− f(z)) ≤ Cδ = ε.Conversely, suppose (?) is true. Then there is δ > 0 and p ∈ P such that z ∈ B0,δ(p)implies q(f(y)) ≤ 1. Let x ∈ X with p(x) 6= 0. Then z = δ

2p(x)x satisfies p(z) = δ2 < δ,

hence 1 ≥ q(f(z)) = δ2p(x)q(f(x)). We thus get (??) with C = 2

δ .For the case p(z) = 0, we must show q(f(z)) = 0. By (?), for any ε > 0 we havez ∈ B0,δ(p) implies q(f(z)) < ε. In particular, p(z) = 0 implies q(f(z)) < ε. As ε > 0is arbitrary, we have q(f(z)) = 0. Thus, (??) remains true in this case.

(c) Part (c) is analogous to (b) and is left to the reader.

Corollary 1.9. Let (X,P) and (Y,Q) be locally convex with X ⊂ Y . Then the followingstatements are equivalent:(i) The embedding X → Y is continuous.(ii) ∀q ∈ Q ∃p ∈ P, C > 0 such that q(x) ≤ Cp(x) for all x ∈ X.(iii) The restriction of every seminorm of (Y,Q) to X is continuous on (X,P).

Page 9: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

1.2. Locally convex spaces 5

Proof. (i) is equivalent to (ii) by Theorem 1.8 (b), taking f : X → Y as the inclusion mapf(x) = x.

(ii) implies (iii) by Theorem 1.8 (c). Next, if (iii) holds and q ∈ Q, then q =max1≤i≤k(qi) for some qi ∈ Q. By Theorem 1.8 (c), ∃pi ∈ P and Ci > 0 such thatqi(x) ≤ Cipi(x). Let C = maxCi and p = max pi. Then q(x) ≤ Cp(x) and (ii) holds.

In view of Lemma 1.7, we make the following definition:

Definition 1.10. Let (X,P) be a locally convex space. We say that– xk is Cauchy if for any p ∈ P, p(xj − xk)→ 0 as j, k →∞.– A ⊆ X is bounded if for any p ∈ P, supx∈A p(x) <∞.

Lemma 1.11. A Cauchy sequence is bounded.

Proof. If xk is Cauchy, then for any p ∈ P we may find m0 such that p(xj − xk) < 1for j, k ≥ m0. Hence, p(xk) ≤ p(xm0) + 1 for all k ≥ m0, so supk p(xk) ≤ C, whereC = max(p(x1), . . . , p(xm0−1), p(xm0) + 1).

Definition 1.12. A family of seminorms P on X is said to be separating or sufficient iffor any x ∈ X \ 0, there exists p ∈ P such that p(x) 6= 0.

Lemma 1.13. Let (X,P) be locally convex with P separating. Then τP is Hausdorff.

Proof. Let x, y ∈ X, x 6= y. Then ∃p ∈ P with δ := p(x− y) > 0. Let U = Bx,δ/2(p) andV = By,δ/2(p). Then U, V are open, satisfy x ∈ U , y ∈ V and U ∩ V = ∅.

Reminder 1.14. Recall that a topological space (X, τ) is said to be metrizable if there isa metric d on X which induces τ . If X is a vector space, we say that a metric is translationinvariant if d(x+ z, y + z) = d(x, y) for all z ∈ X.

Theorem 1.15. Let (X,P) be a locally convex space with P countable and separating.Then (X,P) is metrizable with a translation invariant metric. 3

Proof. Let P = p1, p2, . . . and let αk ⊂ R∗+ with αk → 0. Define

d(x, y) = maxk

αkpk(x− y)1 + pk(x− y) for x, y ∈ X .

The maximum is well-defined because pk1+pk ≤ 1 and αk → 0. Since αk > 0 for all k, we

have d(x, y) = 0 iff pk(x − y) = 0 for all k iff x = y because P is separating. Next, sincef(t) = t

1+t is increasing, we have a ≤ b + c implies a1+a ≤

b+c1+b+c ≤

b1+b + c

1+c and thetriangle inequality follows. Since d is clearly symmetric, it follows that d is a metric.

Now let ε > 0 and let Bx,ε(d) = y ∈ X : d(x, y) < ε be a metric ball. If αk < ε∀k, then d(x, y) < ε for any y, so Bx,ε(d) = X ∈ τP . Otherwise, there are finitely many isuch that αi ≥ ε (since αk → 0), say αi1 , . . . , αim . Then a simple calculation shows thatBx,ε(d) = Bx,δ1(pi1)∩ . . .∩Bx,δm(pim), where δj = ε

αj−ε . Hence, Bx,ε(d) ∈ τP for all ε > 0.But if U is open in the metric topology, then U = ∪x∈UBx,εx(d), hence, U ∈ τP .

3. The proof of many results in this course, including this one, can be simplified if you know aboutnets; see e.g. [28] or [41]. Nets generalize the concept of a sequence. A nice feature about nets is that theycharacterize the topology. For example, suppose we have two topologies τ1 and τ2 on a set X and we wantto show that τ1 = τ2. Then it suffices to show that a net converges in (X, τ1) iff it converges in (X, τ2).This is an easy exercise if you know that x ∈ A iff there exists a net xν ⊂ A converging to x (just showthat consequently, A is closed in (X, τ1) iff it is closed in (X, τ2)).

A related feature is that given a set X, one can define a topology on X by stating which nets converge.See e.g. [42, Problem 11D].

Page 10: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

6 Chapter 1. LCS and Distributions

Conversely, suppose U ∈ τP and let x ∈ U . Then ∃ε > 0, pi ∈ P such thatBx,ε(p1, . . . , pk) ⊆ U . Choose r < ε

1+ε min(α1, . . . , αk) and let y ∈ Bx,r(d). Then

αipi(x− y)1 + pi(x− y) ≤ d(x, y) < r <

αiε

1 + εfor i = 1, . . . k ,

so pi(x−y)1+pi(x−y) <

ε1+ε and thus pi(x− y) < ε. Hence, Bx,r(d) ⊆ Bx,ε(p1, . . . , pk) ⊆ U . Thus,

U is open in the metric topology.

1.3 Fréchet SpacesDefinition 1.16. A Fréchet space is a locally convex space (X,P) such that P is countable,separating, and (X,P) is complete.

By Theorem 1.15, any Fréchet space is metrizable with a translation invariant metric.Actually, the converse is also true: if X is a metrizable locally convex space, then itstopology is generated by a countable separating family of seminorms. These seminormsare constructed using the Minkowski functional (see Exercise 2) and the fact that metricspaces are first countable. We will not pursue this here; see e.g. [28] and [32].

Notation. We denote N0 = 0, 1, 2, . . .. Given α ∈ Nn0 and x ∈ Rn, we denote

xα = xα11 · · ·x

αnn and Dα = ∂α = ∂α1

1 · · · ∂αnn .

Finally, we let |α| = α1 + . . .+ αn.

Lemma 1.17. Let K ⊂ Rn be compact. Then the space C(K) of continuous functions onK, endowed with the norm ‖f‖C(K) = supx∈K |f(x)|, is a Banach space.

Proof. Exercise 8; just adapt the proof you learned for C [a, b] in undergraduate courses.More generally, if A ⊆ Rn and Cb(A) denotes the set of bounded continuous functions onA, then Cb(A) is a Banach space in the supremum norm.

Lemma 1.18. Let K ⊂ Rn be compact and let Cm(K) = f |K : f ∈ Cm(Rd), m ∈ N.Then Cm(K) is a Banach space in the norm ‖f‖Cm(K) = max|α|≤m ‖Dαf‖C(K).

Proof. Exercise 8. If (fk) is Cauchy in Cm(K), then (Dαfk) is Cauchy in C(K) for each|α| ≤ m. Now suppose n = 1, use Lemma 1.17, the fundamental theorem of calculus andinduction. Generalize for arbitrary n using partial derivatives.

Lemma 1.19. Let K ⊂ Rn be compact and let C∞(K) = ∩m∈NCm(K). Then C∞(K),endowed with the seminorms pm(f) = ‖f‖Cm(K), m = 0, 1, 2, . . ., is a Fréchet space.

Proof. Clearly, P = pm is countable and separating. Let (fk) be Cauchy in C∞(K).Then (fk) is Cauchy in Cm(K) for each m, so by Lemma 1.18, we may find gm ∈ Cm(K)such that pm(fk − gm)→ 0 as k →∞. But

p0(g0 − gm) ≤ p0(fk − g0) + p0(fk − gm) ≤ p0(fk − g0) + pm(fk − gm)→ 0

as k → ∞. Hence, g0 = gm ∈ Cm(K) for every m. Hence, g0 ∈ C∞(K) and fk → g0 inC∞(K).

We now turn to Cm(Ω) where Ω ⊆ Rn is an open set. We first give a technical lemma.

Page 11: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

1.3. Fréchet Spaces 7

Lemma 1.20. Let Ω ⊆ Rn be open. Then there is a sequence of compact sets K1,K2, . . .such that Kj is contained in the interior of Kj+1 and ∪jKj = Ω.

Proof. If Ω = Rn, take Kj = Bj = x ∈ Rn : |x| ≤ j, where |x| is the euclidean normon Rn. If Ω 6= Rn, let Aj = x ∈ Ω : dist(x, ∂Ω) ≥ 1

j . Then Aj is closed and lies in theinterior of Aj+1. Take Kj = Aj ∩Bj . Then Kj satisfies the requirements.

Lemma 1.21. Let Ω ⊆ Rn be open and let Kj ⊂ Ω be compact, Kj ⊂ Kj+1 and ∪jKj = Ω.Endow Cm(Ω), m ∈ N, with the seminorms pj,m(f) = ‖f‖Cm(Kj), j = 1, 2, . . .. ThenCm(Ω) is a Fréchet space.

Similarly, C∞(Ω), endowed with the seminorms pj,k(f) = ‖f‖Ck(Kj), j = 1, 2, . . . andk = 0, 1, . . . is a Fréchet space.

Proof. Let (fi) be Cauchy in Cm(Ω). By completeness of Cm(Kj), we may find gj ∈Cm(Kj) such that pj,m(fi−gj)→ 0 as i→∞. But pj,m(g) ≤ pj+1,m(g) for g ∈ Cm(Kj+1),so as in Lemma 1.19, it follows that gj = gj+1|Kj . Hence, we may define g on Ω byg(x) = gj(x) if x ∈ Kj . Then g ∈ Cm(Ω) and pj,m(fi − g)→ 0 as i→∞, for any j.

The case of C∞(Ω) is the same; just consider pj,k for arbitrary k instead of pj,m.

Remark 1.22. We could also consider Cm(Ω) with the larger family pK,m(f) = ‖f‖Cm(K),where K runs over all compact subsets of Ω. But the topology on Cm(Ω) would actuallybe the same. Indeed, if K ⊂ Ω is compact, then K ⊂ Kj for some j, so pK(f) ≤ pKj (f),so we may remove pK from the family without affecting the topology by Lemma 1.5.7.

Lemma 1.23. Let Ω ⊂ Rn be open and for 1 ≤ p ≤ ∞, define

Lploc(Ω) = u : Ω→ C measurable, u|K ∈ Lp(K) for any compact K ⊂ Ω .

Then Lploc(Ω) is a Fréchet space when endowed with the seminorms qj(u) = ‖u‖Lp(Kj)

Proof. Same proof as Lemma 1.21; you can write it as an exercise.

We now turn to the space of smooth functions with compact support. Recall that iff : Ω→ C is continuous, we define the support of f by supp f = x ∈ Ω : f(x) 6= 0.

We should first check that such functions exist, since a nonzero analytic function cannothave compact support, and being analytic doesn’t seem much stronger than being smooth.

Lemma 1.24. Let R > r > 0. There exists φ ∈ C∞(Rn) such that φ(x) = 1 for |x| ≤ r,φ(x) = 0 for |x| ≥ R and 0 ≤ φ ≤ 1 on Rn. 4

Proof. Let f(t) =e−1/t if t > 0,0 if t ≤ 0.

Clearly, f(t) → 0 as t 0. Moreover, f (k)(t) =pk(1/t)e−1/t if t > 0,0 if t < 0,

for some polynomials pk. Since p(1/t)e−1/t → 0 as t 0 for any

polynomial p, it follows that f ∈ C∞(R). Now let R > r > 0, let f1(t) = f(t− r)f(R− t)and let f2(t) =

∫∞t f1(s) ds. Then f2(t) ≥ 0 for all t, equals 0 for t ≥ R and equals

C :=∫ Rr f1(s) ds > 0 for t ≤ r. Thus, φ(x) := 1

C f2(|x|) satisfies the desired properties.

Definition 1.25. Let K ⊂ Rn be compact. We define

DK = ϕ ∈ C∞(Rn) : suppϕ ⊂ K .

and endow it with the family P∞ = pm, where pm(ϕ) = ‖ϕ‖Cm , m = 0, 1, 2, . . .

4. Lemma and picture taken from [19].

Page 12: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

8 Chapter 1. LCS and Distributions

Remarks 1.26. (i) This topology is the one induced by the embedding DK → C∞(K).Note that DK is a bit smaller than C∞(K). For example, ϕ1 ≡ 1 on K can beextended to a smooth function in C∞(Rn) and is thus in C∞(K), but ϕ1 /∈ DKbecause any smooth extension of ϕ1 must be nonzero in a small neighborhood of K.

(ii) If K contains an open ball, then by scaling and translating φ, we can find infinitelymany smooth functions so that their supports are in K but do not intersect eachother. This implies that DK is infinite-dimensional.

(iii) DK is a Fréchet space, as it is a closed subspace of C∞(K).(iv) Note that P∞ = P∞. Indeed, P∞ ⊆ P∞ is clear, and if q ∈ P∞, say q =

max(qi1 , . . . , qik) for some qij ∈ P∞, then qij (ϕ) = ‖ϕ‖Cij

. Takem = max(i1, . . . , ik).Then q(ϕ) = pm(ϕ), so q ∈ P∞.

1.4 The inductive limit topology

Definition 1.27. Let Ω ⊆ Rn be open. We define the space of test functions by

D(Ω) =⋃KbΩ

DK ,

where K b Ω means that K is a compact subset of Ω.

Note that D(Ω) is simply the set of smooth functions of compact support in Ω, usuallydenoted C∞c (Ω). Here we use the symbol D(Ω) because we will now endow it with aspecial topology which is stronger than the C∞ topology.

The topology we would like to have on D(Ω) should fulfill two requirements:1) The embeddings DK → D(Ω) should be continuous. This means by Corollary 1.9 that

the restriction of every seminorm p in (D(Ω),P) to DK must be continuous. Here Pis the hypothetical family inducing a topology on D(Ω). The family P∞ satisfies thisrequirement by Theorem 1.8(c).However, (D(Ω),P∞) is not complete. To see this for Ω = R, take a ϕ ∈ D(R) whosesupport is small and concentrated near 0, and consider the sequence

ϕk(x) = ϕ(x) + 2−1ϕ(x− 1) + . . .+ 2−kϕ(x− k), k = 1, 2, . . .

Then (ϕk) is Cauchy in P∞ : given r < s,

pm(ϕr − ϕs) = pm(2−r−1ϕ(x− r − 1) + . . .+ 2−sϕ(x− s)

)= 2−r−1‖ϕ‖Cm → 0

as r (and hence s) go to∞. However, the support of the limit function is not compact.

Page 13: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

1.4. The inductive limit topology 9

2) This failure indicates that P∞ has not enough seminorms to ensure that Cauchy se-quences remain in a compact set. So we can add more seminorms to P∞ and hope thatthings get better. Having a large family of seminorms will also have the benefit thatit becomes easier for a function f : D(Ω) → Y to be continuous. However, we shouldnot enlarge P∞ too much, since we still need every restriction p|DK to be continuous.These two competing requirements give rise to a unique family P as follows.

Definition 1.28. We endow D(Ω) with the family P defined by p ∈ P ⇐⇒ p|DK iscontinuous for each compact K ⊂ Ω.

The induced topology τP on D(Ω) is called the inductive limit topology.

Remark 1.29. Note that P = P. Indeed, P ⊆ P is clear, and if q ∈ P, say q =max(qi1 , . . . , qik) for some qij ∈ P, then given K ⊂ Ω compact, qij |DK is continuous, so∃cj > 0 and pj ∈ P∞ such that qij (ϕ) ≤ cjpj(ϕ) for ϕ ∈ DK . Let c = max(c1, . . . , ck) andp = max(p1, . . . , pk) ∈ P∞ = P∞. Then q(ϕ) ≤ cp(ϕ) for ϕ ∈ DK , hence q ∈ P.

Lemma 1.30. The topology of DK is exactly the one induced by the embedding DK →D(Ω). In other words, the topology of DK is exactly the subspace topology inherited fromD(Ω).

This shows that the topology of D(Ω) is completely natural; with a different topology,the DK could have obtained two topologies: their own τP∞ topology, and a possiblydifferent one inherited from D(Ω), which is certainly inconvenient.

Proof. Let A ⊆ D(Ω) be open, let K ⊂ Ω be compact and let ψ ∈ A ∩ DK . Then ∃ε > 0,p ∈ P such that Bψ,ε(p) ⊂ A. But p|DK is continuous, so ∃pm with p ≤ cpm on DK , c > 0.Hence, if Bψ,ε(pm;DK) denotes semiballs in DK , we get Bψ,ε/c(pm,DK) ⊆ Bψ,ε(p)∩DK ⊆A ∩ DK . Hence, A ∩ DK is open in DK .

Conversely, since pm ⊂ P, the semiball Bϕ,δ(pm) has a meaning, and any semiballBϕ,δ(pm;DK) = Bϕ,δ(pm) ∩ DK . It follows that any open set in DK can be written as anintersection of an open set in D(Ω) with DK .

We now show that P is large enough to solve the problem we had with P∞:

Theorem 1.31. A set A ⊆ D(Ω) is bounded iff there is a compact K ⊂ Ω such thatA ⊆ DK and A is bounded in DK .

Proof. Suppose A is bounded in DK for some compactK ⊂ Ω and let p ∈ P. By continuityof p|DK , ∃pm such that p ≤ cpm on DK . But ∃M > 0 such that pm(ϕ) ≤ M for ϕ ∈ A.Hence, p(ϕ) ≤ cM on A and A is bounded in D(Ω).

Conversely, suppose A ⊆ D(Ω) is bounded but A * DK for any compact K ⊂ Ω. Thenthere is a sequence ϕm ⊆ A and xm ⊂ Ω such that ϕm(xm) 6= 0 and xm leaves anycompact K for large m. Let

p(ϕ) = supm

m|ϕ(xm)||ϕm(xm)| , ϕ ∈ D(Ω) .

Then p is a seminorm and p ∈ P. Indeed, if K ′ ⊂ Ω is compact, ∃m0 with xm /∈ K ′

for all m ≥ m0, so for any ϕ ∈ DK′ , we have ϕ(xm) = 0 for all m ≥ m0, so p(ϕ) =max1≤m≤m0

m|ϕ(xm)||ϕm(xm)| ≤ C‖ϕ‖C0 . But p(ϕj) ≥ j, so p is not bounded on A. This contra-

diction shows that A ⊆ DK for some compact K.

Corollary 1.32. a) A sequence ϕj is Cauchy in D(Ω) iff ϕj ⊆ DK for some compactK ⊂ Ω and ϕj is Cauchy in DK .

Page 14: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

10 Chapter 1. LCS and Distributions

b) ϕj → 0 in D(Ω) iff ϕj ⊆ DK for some compact K ⊂ Ω and ϕj → 0 in DK .c) D(Ω) is complete.

Proof. a) If ϕj ⊆ DK is Cauchy in DK and p ∈ P, then p|DK is continuous, so ∃pmwith p ≤ cpm. Hence, p(ϕr − ϕs) ≤ cpm(ϕr − ϕs), so ϕj is Cauchy in D(Ω).Conversely, if ϕj ⊆ D(Ω) is Cauchy, then ϕj is bounded, so by Theorem 1.31,ϕj ⊆ DK for some compact K ⊂ Ω. Since pm ⊂ P, we also have pm(ϕr − ϕs)→ 0as r, s→∞ for each pm.

b) Same as a), left as an exercise.c) Let ϕj ⊆ D(Ω) be Cauchy. Then by a), it is Cauchy in some DK . But DK is Fréchet,

so the limit exists in DK . Using b), the sequence also converges in D(Ω).

Theorem 1.33. Let (Y,Q) be locally convex and let f : D(Ω) → Y be linear. Then thefollowing statements are equivalent:

(i) f is continuous.(ii) ϕj → 0 in D(Ω) implies f(ϕj)→ 0 in Y .(iii) For any compact K ⊂ Ω, f : DK → Y is continuous.

Proof. (i) =⇒ (ii). If f is continuous, by Theorem 1.8 for any q ∈ Q there is p ∈ P andC > 0 such that q(f(ϕ)) ≤ Cp(ϕ) for all ϕ ∈ D(Ω). Hence, ϕj → 0 in D(Ω) impliesp(ϕj)→ 0 implies q(f(ϕj))→ 0. Since this holds for all q, we have f(ϕj)→ 0 in Y .

(ii) =⇒ (iii). Let K ⊂ Ω be compact. If (ii) holds, then using Corollary 1.32, if ϕj → 0in DK , then f(ϕj)→ 0 in Y . So by linearity, if ϕj → ϕ in DK , then f(ϕj)→ f(ϕ). Thus,f preserves convergent sequences. But DK is metrizable. Hence, f is continuous. 5

(iii) =⇒ (i) By Theorem 1.8, we must show that ∀q ∈ Q ∃p ∈ P, C > 0 such thatq(f(ϕ)) ≤ Cp(ϕ) on D(Ω). Let q ∈ Q and define

p(ϕ) := q(f(ϕ)), ϕ ∈ D(Ω) .

Then p is a seminorm on D(Ω). Moreover, if K ⊂ Ω is compact, then by continuity of fon DK , we may find c > 0 and pm such that p(ϕ) = q(f(ϕ)) ≤ cpm(ϕ) for all ϕ ∈ DK .Hence, p|DK is continuous, so p ∈ P.

Corollary 1.34. Every differentiation Dα : D(Ω)→ D(Ω) is continuous.

Proof. Let ϕj → 0 in D(Ω). Then ϕj ⊆ DK for some compact K ⊂ Ω and ‖ϕj‖Ck → 0for each k. So Dαϕj ⊂ DK and ‖Dαϕj‖Ck ≤ ‖ϕj‖Ck+|α| → 0 for each k. So Dαϕj → 0in D(Ω), hence Dα is continuous.

Lemma 1.35. D(Ω) is not metrizable.

Using Theorem 1.15, it follows that P is uncountable. Hence, we added a huge numberof seminorms when we enlarged P∞ to make D(Ω) complete.

5. This is general topology: ifX,Y are topological spaces,X is metrizable, then f : X → Y is continuousif it preserves convergent sequences. To see this, suppose on the contrary that f is not continuous. Thenthere is an open U ⊂ Y such that f−1(U) is not open. So ∃x ∈ f−1(U) such that Bx,ε(d) * f−1(U) forany ε > 0. So for ε = 1/j ∃xj ∈ Bx,1/j(d) but f(xj) /∈ U . Since f(x) ∈ U , we thus get xj → x butf(xj) 9 f(x).

Page 15: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

1.5. Distributions 11

Proof. We first note that DK , for K ⊂ Ω, is closed in D(Ω), since it is complete and itstopology is just the subspace topology inherited from D(Ω) by Lemma 1.30.

Next, DK has empty interior, relative to D(Ω). Indeed, given ε > 0, p ∈ P andϕ ∈ DK , we may choose a smooth nonzero ψ with support in Ω \ K (see Lemma 1.24).Let ψ = ψ if p(ψ) = 0 and ψ = 1

p(ψ)ψ otherwise. Then, f := ϕ + ε2 ψ satisfies f ∈ D(Ω)

and p(ϕ − f) < ε. Hence, Bϕ,ε(p) * DK . Since p, ε and ϕ are arbitrary, DK has emptyinterior in D(Ω).

Finally, let Kj be as in Lemma 1.20. Then D(Ω) = ∪jDKj . Since D(Ω) is complete, itfollows from the Baire category theorem that D(Ω) is not metrizable. 6

Remark 1.36. In general, if X1 ⊂ X2 ⊂ . . . are locally convex spaces, then the inductivelimit topology on the union X = ∪jXj is the finest topology that leaves the embeddingsXj → X continuous. If each Xj is a Fréchet space, we call X an LF space. The readercan find more details in [28] and [37].

For example, all the results of this section apply to the space of compactly supportedLp functions, which is defined by Lpcomp(Ω) =

⋃KbΩ L

p(K).

1.5 DistributionsDefinition 1.37. A distribution (or generalized function) on Ω is a continuous linearfunctional on D(Ω). The space of distributions on Ω is denoted by D′(Ω).

Given T ∈ D′(Ω), we often denote

〈T, ϕ〉 := T (ϕ), ϕ ∈ D(Ω) .

Lemma 1.38. A linear functional T : D(Ω) → C is in D′(Ω) iff any of the followingconditions holds:a) ϕj → 0 in D(Ω) implies T (ϕj)→ 0.b) For any compact K ⊂ Ω, there exist m, C > 0 such that

|T (ϕ)| ≤ C ‖ϕ‖Cm for ϕ ∈ DK . (†)

Proof. Follows directly from Theorem 1.33.

Definition 1.39. Let T ∈ D′(Ω). If (†) holds with the same m for all compact K ⊂ Ω(but C possibly depending on K), then the smallest such m is called the order of T . If nom will do for all K, we say that T has infinite order.

Example 1.40. 1. Define

δ(ϕ) := ϕ(0) for ϕ ∈ D(Rn) .

Then δ is a distribution of order 0 since |δ(ϕ)| ≤ ‖ϕ‖C0 .2. Given u ∈ L1

loc(Ω), let Tu(ϕ) :=∫uϕ. Then Tu is a distribution of order 0 since

|Tu(ϕ)| ≤ ‖u‖L1(K)‖ϕ‖C0 for ϕ ∈ DK .

In particular, each u ∈ C(Ω) defines a distribution Tu. We show in the next chapterthat u 7→ Tu embeds L1

loc(Ω) in D′(Ω).

6. The reader can find a different proof in [15] using sequences, which works if Ω = R and more generallyif Ω = Rn.

Page 16: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

12 Chapter 1. LCS and Distributions

We now define the derivative of a distribution. To motivate this, note that if u ∈ C∞(R)and ϕ ∈ D(R), then integrating by parts we have∫

u′(x)ϕ(x) dx = −∫u(x)ϕ′(x) dx

so that Tu′(ϕ) = −Tu(ϕ′). So if we think of Tu′ as the derivative of Tu, then the followingdefinition is quite natural.

Definition 1.41 (Differentiation). Given T ∈ D′(Ω) and α ∈ Nn0 , we define

(DαT )(ϕ) := (−1)|α|T (Dαϕ) for ϕ ∈ D(Ω) .

Lemma 1.42. Let T ∈ D′(Ω) and α, β ∈ Nn0 . Then1. DαT ∈ D′(Ω).2. DαDβT = Dα+βT = DβDαT .

Proof. Let T ∈ D′(Ω) and K ⊂ Ω compact, say |T (ψ)| ≤ C‖ψ‖Cm for ϕ ∈ DK . Then

|(DαT )(ϕ)| = |T (Dαϕ)| ≤ C‖Dαϕ‖Cm ≤ C‖ϕ‖Cm+|α| .

Hence, DαT ∈ D′(Ω). Next,

(DαDβT )(ϕ) = (−1)|α|(DβT )(Dαϕ) = (−1)|α|+|β|T (DβDαϕ)= (−1)|α+β|T (Dα+βϕ) = (Dα+βT )(ϕ) .

Example 1.43. δ′(ϕ) = −δ(ϕ′) = −ϕ′(0).

Definition 1.44 (Multiplication by functions). Let T ∈ D′(Ω) and f ∈ C∞(Ω). Wedefine

(fT )(ϕ) := T (fϕ) for ϕ ∈ D(Ω) .

Lemma 1.45. If T ∈ D′(Ω) and f ∈ C∞(Ω), then fT ∈ D′(Ω).

Proof. Let K ⊂ Ω be compact and assume ∃C,m such that |T (ϕ)| ≤ C‖ϕ‖Cm on DK .Then |(fT )(ϕ)| = |T (fϕ)| ≤ C‖fϕ‖Cm . ButDα(fϕ) =

∑β≤α cαβ(Dα−βf)(Dβϕ) for some

cαβ (Leibniz formula). Hence, ‖fϕ‖Cm ≤ Cf‖ϕ‖Cm on DK . Thus, |(fT )(ϕ)| ≤ CCf‖ϕ‖Cmon DK and fT ∈ D′(Ω).

Definition 1.46 (Sequences of distributions). We endow D′(Ω) with the weak*-topology.Hence, we say that Tj → T in D′(Ω) if Tj(ϕ)→ T (ϕ) for all ϕ ∈ D(Ω).

Example 1.47. Let uj(x) = sin(jx). Then uj → 0 in D′(R) (where we abused notationand wrote uj instead of Tuj ). Indeed, if ϕ ∈ D(R), then∫

sin(jx)ϕ(x) dx = 1j

∫cos(jx)ϕ′(x) dx→ 0 as j →∞ .

Theorem 1.48. Suppose Tj ⊆ D′(Ω) and limj→∞ Tj(ϕ) exists (as a complex number)for every ϕ ∈ D(Ω). Define T (ϕ) := limj→∞ Tj(ϕ). Then T ∈ D′(Ω), and for everyα ∈ Nn0 ,

DαTj → DαT in D′(Ω)

Page 17: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

1.6. PoU and sheaf structure 13

Proof. Let K ⊂ Ω be compact. Then limj→∞ Tj(ϕ) exists for every ϕ ∈ DK . Since DKis Fréchet, by a well-known application of the uniform boundedness principle (which alsoholds in Fréchet spaces; see [32]), we know that T |DK is continuous 7. Hence, T ∈ D′(Ω).Moreover, if ϕ ∈ D(Ω),

limj→∞

(DαTj)(ϕ) = (−1)|α| limj→∞

Tj(Dαϕ) = (−1)|α|T (Dαϕ) = (DαT )(ϕ) .

Remark 1.49. Let ϕ ∈ D(Ω) and suppose gj → g in C∞(Ω). Then ϕgj → ϕg in D(Ω).Indeed, by hypothesis, pr,s(gj − g) → 0 for all r, s. Let K = suppϕ. Then given p ∈ P,∃c, pm with p ≤ cpm, so p(ϕgj − ϕg) ≤ c‖ϕ(gj − g)‖Cm(K) ≤ c‖ϕ(gj − g)‖Cm(Kr) for anyKr ⊇ K. Using Leibniz, we thus get p(ϕgj−ϕg) ≤ ccϕ‖gj−g‖Cm(Kr) = ccϕpr,m(gj−g)→0. As p ∈ P is arbitrary, we get ϕgj → ϕg.

Theorem 1.50. If Tj → T in D′(Ω) and gj → g in C∞(Ω), then gjTj → gT in D′(Ω).

Proof. Fix ϕ ∈ D(Ω) and define a bilinear functional Bϕ on C∞(Ω)×D′(Ω) by

Bϕ(g, T ) := (gT )(ϕ) = T (gϕ) .

Then Bϕ is separately continuous (use Remark 1.49). By another application of theuniform boundedness principle (see [32] or [28] in the Banach case), it follows that B isjointly continuous, i.e. Bϕ(gj , Tj)→ Bϕ(g, T ). Hence, (gjTj)(ϕ)→ (gT )(ϕ).

1.6 PoU and sheaf structureDefinition 1.51. Let T ∈ D′(Ω) and let ω ⊂ Ω be open. We define T |ω : D(ω) → C byT |ω(ϕ) := T (ϕ) for ϕ ∈ D(ω).

Lemma 1.52. If T ∈ D′(Ω) and ω ⊂ Ω, then T |ω ∈ D′(ω).

Proof. Let ϕj → 0 in D(ω). Then ϕj ⊂ K for some compact K ⊂ ω and ϕj → 0 in DK .Hence, ϕj → 0 in D(Ω). Hence, T |ω(ϕj) = T (ϕj)→ 0. Thus, T |ω ∈ D′(ω).

Example 1.53. Let f ∈ L1loc(Ω). Then Tf |ω = 0 iff f(x) = 0 for a.e. x ∈ ω. 8

Restrictions allow us to discuss distributions locally. The aim of this section is to showthat, given an open cover ωα of Ω, together with a set of “compatible” distributionsTα ∈ D′(ωα), we can define a distribution globally by gluing the Tα together. This is awidely used idea in modern mathematics, e.g. algebraic and differential geometry. Westart with the following theorem.

Theorem 1.54 (Partition of unity). Let Γ = ωα be a collection of open sets withΩ = ∪αωα. Then there exists ψi ⊆ D(Ω) with ψi ≥ 0 such that(a) each ψi has its support in some ωα ∈ Γ,(b)

∑∞i=1 ψi(x) = 1 for every x ∈ Ω,

(c) ∀K b Ω ∃m ∈ N and W open such that W ⊃ K and

ψ1(x) + . . .+ ψm(x) = 1 for all x ∈W .

7. This can also be found in [22, Lemma 4.9-5] in the Banach case.8. Here we used the nontrivial fact that if f, g ∈ L1

loc(ω), then Tf = Tg iff f = g a.e. on ω. Thisgeneralizes the du Bois-Reymond Lemma. We shall prove this next chapter.

Page 18: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

14 Chapter 1. LCS and Distributions

We call ψi a locally finite partition of unity in Ω subordinate to Γ. We say it islocally finite because, by (b) and (c), each x ∈ Ω has a neighborhood which intersects thesupports of only finitely many ψi (take K = x, then ∀ i > m, ψi(y) = 0 in W ).

Proof. Let S be a countable dense subset of Ω. Let C1, C2, . . . be the set of all closedballs Ci with center pi ∈ S, radius ri ∈ Q, such that Ci ⊂ ωα for some ωα ∈ Γ. Let Vi ⊂ Cibe the open ball with center pi and radius ri/2. By density of S, we have Ω = ∪Vi.

Using Lemma 1.24, by scaling and translating, we can find φi ∈ D(Ω) such that0 ≤ φi ≤ 1, φ1 = 1 on Vi and φi = 0 outside Ci. Define ψ1 = φ1, and inductively,

ψi+1 = (1− φ1) · · · (1− φi)φi+1 (i ≥ 1) . (‡)

Clearly, ψi = 0 outside Ci, so suppψi ⊂ Ci ⊂ ωα and (a) is true. Next, we have

ψ1 + . . .+ ψi = 1− (1− φ1) · · · (1− φi) . (‡‡)

Indeed, (‡‡) is trivial for i = 1. If it holds at i, then adding (‡) and (‡‡), we obtain (‡‡)for i+ 1 in place of i. Hence, (‡‡) holds for each i. Since φi = 1 in Vi, it follows that

ψ1(x) + . . .+ ψm(x) = 1 x ∈ V1 ∪ . . . ∪ Vm .

This gives (b). Finally, if K is compact, K ⊂ V1 ∪ . . .∪ Vm for some m, so (c) follows.

Definition 1.55. Let X be a topological space. A presheaf of sets F on X is a rule whichassigns to each open U ⊂ X a set F(U), and to each inclusion V ⊂ U a map ρUV : F(U)→F(V ) such that ρUU = idF(U), and whenever W ⊂ V ⊂ U , we have ρUW = ρVW ρUV .

The elements of F(U) are called sections of F over U . We denote s|V := ρUV (s) ifs ∈ F(U) and V ⊂ U .

A sheaf of sets F on X is a presheaf of sets with the following additional property:given an open cover U = ∪αUα and some sections sα ∈ F(Uα), if ∀α, β we have

sα|Uα∩Uβ = sβ|Uα∩Uβ ,

then there exists a unique section s ∈ F(U) such that s|Uα = sα for each α.

Example 1.56. Let X,Y be topological spaces. Given an open U ⊂ X, let F(U) :=C(U, Y ) be the set of continuous functions from U to Y , with the usual restriction opera-tion. Then F is a sheaf. Indeed, ρUU (f) = f |U = f if f ∈ C(U, Y ), we have f |W = (f |V )|Wwhenever W ⊂ V ⊂ U , so F is a presheaf. Suppose U = ∪Uα is an open cover, andsome fα ∈ C(Uα, Y ) satisfy for each α, β that fα|Uα∩Uβ = fβ|Uα∩Uβ . Then we definef : U → Y by f(u) := fα(u) if u ∈ Uα. This is well defined by our assumptions. More-over, f |Uα = fα by definition and each fα is continuous, so f is continuous. Finally, foruniqueness, suppose f also satisfies the gluing axiom. Then given x ∈ U , say x ∈ Uα, wehave f(x) = fα(x) = f(x).

Theorem 1.57. Given an open Ω ⊆ Rn, let F(Ω) := D′(Ω). Then F is a sheaf.

Proof. Again, F is clearly a presheaf. Now let Γ = ωα be an open cover of Ω and assumethere is for each α some Tα ∈ D′(ωα), such that Tα|ωα∩ωβ = Tβ|ωα∩ωβ .

Let ψi be a partition of unity subordinate to Γ, and associate to each i a set ωαi ∈ Γsuch that ωαi ⊃ suppψi.

Page 19: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

1.7. Distributions with compact support 15

If ϕ ∈ D(Ω), then ϕ =∑ψiϕ. Since ϕ has compact support, only finitely many terms

in the sum are nonzero. Define

T (ϕ) :=∞∑i=1

Tαi(ψiϕ) .

To see that T ∈ D′(Ω), let ϕj → 0 in D(Ω). Then ϕj ⊆ DK for some compact K ⊂ Ω.Choose m as in Theorem 1.54(c). Then T (ϕj) =

∑mi=1 Tαi(ψiϕj). But ψiϕj → 0 in D(ωαi)

as j →∞ by Leibniz rule. Hence, T (ϕj)→ 0, so T ∈ D′(Ω).Next, let ϕ ∈ D(ωα). Then ψiϕ ∈ D(ωαi ∩ ωα), so by hypothesis Tαi(ψiϕ) = Tα(ψiϕ).

Hence, T (ϕ) =∑Tα(ψiϕ) = Tα(

∑ψiϕ) = Tα(ϕ). Thus, T |ωα = Tα.

Thus, T exists. For uniqueness, if T also satisfies the requirements, then given ϕ ∈D(Ω) we have T (ϕ) = T (

∑ψiϕ) =

∑T (ψiϕ) =

∑Tαi(ψiϕ) = T (ϕ).

1.7 Distributions with compact supportDefinition 1.58. The support of a distribution T ∈ D′(Ω) is the set

suppT := Ω \(∪ ω ⊆ Ω open : T |ω = 0

).

Lemma 1.59. Let T ∈ D′(Ω). Then T |Ω\suppT = 0.

Proof. Let W = Ω \ suppT , Γ = ω ⊆ Ω open : T |ω = 0 and ψi a partition of unity inW subordinate to Γ. Given ϕ ∈ D(W ), we have ϕ =

∑ϕψi, and only finitely many terms

in the sum are nonzero. Hence, T (ϕ) =∑T (ψiϕ) = 0, since each ψi has its support in

some ω ∈ Γ.

Example 1.60. Given x ∈ Ω, define δx ∈ D′(Ω) by δx(ϕ) = ϕ(x). Then supp δx = x.One can show that conversely, if suppT = x, then T = P (D)δx for some differential

operator P (D); see e.g. [32].

Definition 1.61. Let D′c(Ω) be the subset of D′(Ω) consisting of distributions whosesupport is a compact subset of Ω.

Lemma 1.62. For any T ∈ D′(Rn) there exists Tj ⊆ D′c(Rn) with Tj → T .

Proof. Use Lemma 1.24 to find φ ∈ C∞(Rn) with φ(x) = 1 if |x| < 12 and φ(x) = 0 if

|x| > 1. Let φj(x) = φ(x/j) and Tj = φjT . Then suppTj ⊂ Bj = x ∈ Rn : |x| ≤ j, andgiven ψ ∈ D(Rn), Tj(ψ) = T (φjψ)→ T (ψ).

In the following theorem, E(Ω) is the space C∞(Ω) endowed with the seminorms pj,kfrom Lemma 1.21.

Theorem 1.63. (1) Any T ∈ D′c(Ω) has finite order.(2) E ′(Ω) = D′c(Ω).

Proof. (1) Let ψ ∈ D(Ω) such that ψ = 1 in a neighborhood of suppT . Such ψ can beobtained e.g. using Theorem 1.54. Then T (ϕ) = T (ψϕ) for any ϕ ∈ D(Ω). Indeed,T (ϕ) = T (ψϕ+ (1− ψ)ϕ) = T (ψϕ) + T ((1− ψ)ϕ) = T (ψϕ) using Lemma 1.59.Let K = suppψ. Then by Lemma 1.38 ∃m,C > 0 such that |T (ϕ)| ≤ c1‖ϕ‖Cm for allϕ ∈ DK . Hence, for any ϕ ∈ D(Ω), we have

|T (ϕ)| = |T (ψϕ)| ≤ c1‖ψϕ‖Cm ≤ c1c2‖ϕ‖Cm , (∗)

where we used the Leibniz formula in the last step. In particular, T has order ≤ m.

Page 20: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

16 Chapter 1. LCS and Distributions

(2) If T ∈ E ′(Ω), then by Theorem 1.8 ∃pj,m and C > 0 such that |T (f)| ≤ Cpj,m(f)for f ∈ E(Ω). Thus, if supp f ⊂ Ω \ Kj , then T (f) = 0. Hence, suppT ⊂ Kj , soT ∈ D′c(Ω).Conversely, let T ∈ D′c(Ω) and let ψ ∈ D(Ω) as in (1). We define T (f) := T (ψf)for f ∈ E(Ω). Since T (ϕ) = T (ψϕ) for ϕ ∈ D(Ω), this definition extends T fromD(Ω) to E(Ω). Moreover, by (∗), ∃m,C > 0 such that |T (f)| ≤ C‖ψf‖Cm . LetK = suppψ. By the Leibniz formula ‖ψf‖Cm = ‖ψf‖Cm(K) ≤ c‖f‖Cm(K) ≤ cpj,m(f)for any Kj ⊃ K. Hence, |T (f)| ≤ cpj,m(f), so T ∈ E ′(Ω) by Theorem 1.8.

1.8 Further results and applicationsWe end this chapter with a collection of facts about distributions, given without proof.We first mention that there are some representation theorems for distributions. For ex-

ample, any T ∈ D′c(Ω) takes the form T =∑|α|≤kD

α(µα) for some complex measures µα.Here µα(ϕ) :=

∫ϕ dµα. The proof of this fact uses the Riesz representation theorem (the

version which says that the dual of C(Ω) is a set of finite measures). For details, see [44].We also mention a representation theorem for “tempered distributions” in Exercise 14,and there is still another such theorem for general T ∈ D′(Ω) in [32].

One deficiency in distribution theory is that we cannot multiply two arbitrary dis-tributions. The problem is deep; in fact Schwartz proved it is impossible to extendthe product of continuous functions in a straightforward way [6]. Still, we can definea meaningful product in various special cases. One such case is when the distributionsdepend on different sets of variables. Thus, if Tj ∈ D′(Xj), we define the tensor productT1 ⊗ T2 ∈ D′(X1 × X2). In case Tj = uj are functions, this product gives the function(u1 ⊗ u2)(x1, x2) = u1(x1)u2(x2).

Any function K ∈ C(X1 × X2) defines an integral operator K : Cc(X2) → C(X1)given by (Kf)(x1) =

∫K(x1, x2)f(x2) dx2. This can be extended as follows: any K ∈

D′(X1 ×X2) defines a continuous operator K : C∞c (X2) → D′(X1) given by (Kϕ)(ψ) :=K(ψ⊗ϕ), where (ψ⊗ϕ)(x1, x2) = ψ(x1)ϕ(x2). The well known Schwartz kernel theoremsays that conversely, to any continuous map K : C∞c (X2)→ D′(X1), there exists a uniquedistribution K ∈ D′(X1 × X2) such that (Kϕ)(ψ) = K(ψ ⊗ ϕ). See [21] for a proof.Another version concerning bilinear maps on the Schwartz space S(Rd) of functions ofrapid decay may be found in [28].

The main applications of distribution theory lie in the analysis of partial differentialequations. We already mentioned the existence of fundamental solutions in Section 1.1and their use in solving PDEs. Another important use is in the study of elliptic regularity.Here is an example of the results one gets: let P (D) be an elliptic differential operator(e.g. −∆). If v ∈ C∞(Ω) and u ∈ D′(Ω) is a distributional solution to P (D)u = v, thenu ∈ C∞(Ω). Thus, u has much more smoothness than initially assumed. The reader canfind the basic applications of distributions to PDE in [32]. For much more results, see [21]and [11].

1.9 Exercises1. Let p be a seminorm on a vector space X and let M = x ∈ X : p(x) ≤ 1. Show that

1. M is convex : if x, y ∈M and 0 ≤ t ≤ 1, then tx+ (1− t)y ∈M ,2. M is balanced : if x ∈M and |α| ≤ 1, then αx ∈M ,3. M is absorbing : for any x ∈ X, there exists α > 0 such that α−1x ∈M ,

Page 21: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

1.9. Exercises 17

4. p(x) = infα > 0 : α−1x ∈M.2. Let M be a convex, balanced, absorbing subset of a vector space X. Define the

Minkowski functional of M , ρM : X → R+ by ρM (x) = infα > 0 : α−1x ∈ M.Show that ρM is a seminorm on X.Hint: given x, y ∈ X and ε > 0, let t = ρM (x) + ε and s = ρM (y) + ε. Then x/t andy/s ∈M . Using convexity, show that (x+y)/(s+ t) ∈M and hence ρM (x+y) ≤ s+ t.

3. Show that the functions pi : Rn → R defined by pi(x) = |xi| for x = (x1, . . . , xn) ∈ Rnare seminorms.

4. Show that the topologies given by finite separating families of seminorms are actuallygiven by norms.That is, suppose P = pi : i = 1, . . . , n is a separating family of seminorms on X.Show that ‖x‖ := max1≤i≤n pi(x) is a norm on X that induces the same topology asthe family P.More generally, show that for any norm N on Rn, the map ‖x‖N := N(p1(x), . . . , pn(x))is a norm on X which induces the same topology as P.

5. Let X be a Hausdorff space and endow C(X) with the family of seminorms px : x ∈X, where px(f) = |f(x)|. Check that fn → f is equivalent to pointwise convergence.It can be shown that this topology is not metrizable in general 9. In particular, onecannot find a norm on C[0, 1] which induces the topology of pointwise convergence.

6. Show that the topology induced by a family of seminorms P on a vector space X isthe weakest topology in which every p ∈ P is continuous, and in which the operationof addition (x, y) 7→ x + y from X ×X → X is continuous. This means that for anyneighborhood V of x + y, we can find neighborhoods V1 of x and V2 of y such thatV1 + V2 ⊆ V . Hence, if x′ is near x and y′ is near y, then x′ + y′ will be near x+ y.In general, if X is a set and F is a family of mappings f : X → Yf , where Yf is atopological space, the family F induces a topology on X by saying that a set is openin X if it is a union of finite intersections of sets of the form f−1(V ), with f ∈ F andV open in Yf . We call this the F-weak topology on X. One checks that this is theweakest topology on X which makes every f ∈ F continuous.In our construction of locally convex spaces, we took X a vector space, F = P afamily of seminorms and Yf = R for all f . Here is another example : let Xαα∈Ibe a collection of topological spaces, and let X = ×α∈IXα. Define πα : X → Xα byπα : (xβ) 7→ xα. Then the product topology on X is the weak topology generated bythe projections πα, i.e. it is the weakest topology that makes every πα continuous.

7. Show that one can convert a seminorm into a norm by factoring out the zero directions.More precisely, suppose p is a seminorm on a vector space E. Show that ker(p) = x ∈E : p(x) = 0 is a linear subspace of E. Show that the quotient space E/ ker(p) is anormed space, with the norm defined by ‖[x]‖ := p(x) for x ∈ E. 10

8. Let K ⊂ Rn be a compact set. Let Cm(K) = f |K : f ∈ Cm(Rn).1. Show that C(K) endowed with the norm ‖f‖0 = supx∈K |f(x)| is a Banach space.2. Show that Cm(K) endowed with the norm ‖f‖m = max|α|≤m ‖Dαf‖0 is a Banach

space.

9. If X is countable, px is countable and separating, so C(X) is metrizable. For uncountable sets like[0, 1], it won’t be metrizable.10. You used this procedure in your undergraduate courses to construct the normed space Lp(X,µ) from

the seminormed space Lp(X,µ) of all measurable f : X → C satisfying (∫X|f |pdµ)1/p <∞, 1 ≤ p <∞.

Page 22: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

18 Chapter 1. LCS and Distributions

9. Let

S(Rn) = ϕ ∈ C∞(Rn) | pα,β(ϕ) := supx∈Rn

|xαDβϕ(x)| <∞ for all α, β ∈ Nn0

We call S(Rn) the Schwartz space of rapidly decreasing functions.Show that S(Rn) with the topology induced by the family P = pα,β is a Fréchetspace.

10. Let X be a vector space endowed with a topology τ . We say that X is normable ifthere exists a norm on X which induces τ .1. Show that if X is normable, then 0 has a bounded neighborhood.2. Let Ω be a nonempty open subset of Rn. Let K1 ⊂ K2 ⊂ . . . ⊂ Ω be compact

sets such that ∪jKj = Ω. Endow C(Ω) with the family of seminorms pi(f) =supx∈Ki |f(x)|.Show that the semiballs B0,ε(pi) = f ∈ C(Ω) : pi(f) < ε are not bounded.

3. Conclude that C(Ω) is not normable. 11

11. The topological dual of S(Rn) (cf. Exercise 9), denoted S ′(Rn), is called the space oftempered distributions.1. Given k ∈ N0, define |||ϕ|||k = max|α+β|≤k pα,β(ϕ). Show that T ∈ S ′(Rn) iff there

exists k such that |T (ϕ)| ≤ C|||ϕ|||k for all ϕ ∈ S(Rn).2. Show that if T ∈ S ′(Rn), then T |D(Rn) ∈ D′(Rn). Hence, any tempered distribution

is a distribution.Hint: you can show for example that ϕj → 0 in D(Rn) implies ϕj → 0 in S(Rn).

12. 1. Show that S(Rn) ⊂⋂

1≤p≤∞ Lp(Rn), and ‖ϕ‖Lp ≤ (2π)n|||ϕ|||2n for any ϕ ∈ S(Rn). 12

Hint: Recall that∫R

11+x2 dx = π.

2. Show that for any u ∈ Lp(Rn) and ϕ ∈ S(Rn), one has uϕ ∈ L1(Rn), and ‖uϕ‖L1 ≤(2π)n‖u‖Lp |||ϕ|||2n.

3. Given u ∈ Lp(Rn), define Tu(ϕ) =∫u(x)ϕ(x) dx for ϕ ∈ S(Rn). Show that Tu ∈

S ′(Rn).13. Define PV ( 1

x) : S(R) → C by PV ( 1x) : ϕ 7→ limε↓0

∫|x|≥ε

1xϕ(x) dx. We call PV ( 1

x)the Cauchy principal value. Show that PV ( 1

x) is well defined on S(R), and that|PV ( 1

x)(ϕ)| ≤ 2(p0,1(ϕ) + p1,0(ϕ)). Conclude that PV ( 1x) ∈ S ′(R).

A well-known formula says that limε↓01

x−x0+iε = PV ( 1x−x0

) − iπδ(x − x0). It can beshown that this holds in the sense of distributions.

14. Let g(x) =x if x ≥ 0,0 if x ≤ 0.

for x ∈ R. Clearly, g is continuous but not everywhere

differentiable in the classical sense.Let Tg(ϕ) =

∫g(x)ϕ(x) dx for ϕ ∈ S(R).

1. Show that Tg ∈ S ′(R), and that its distributional derivative T ′g = TH , where H is

the Heaviside function H(x) =

1 if x ≥ 0,0 if x ≤ 0.

.

11. We can also show that C∞(Ω) is not normable. The idea is as follows: recall that by the Arzelà-Ascolitheorem, a subset of C(K) is compact iff it is closed, bounded and equicontinuous. Using Arzelà-Ascoli,one can show that any closed bounded subset of C∞(Ω) is compact. Recalling that the unit ball of aninfinite-dimensional normed space is never compact (theorem), we conclude that C∞(Ω) is not normable.The same conclusion holds for DK .12. This shows in particular that the embedding S(Rn) → Lp(Rn) is continuous.

Page 23: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

1.9. Exercises 19

2. Show that T ′H = δ.Thus, δ = T ′′g , and if we identify Tg with g, we see that the nonfunction δ is thesecond derivative of a continuous function. This is typical of tempered distributions:any T ∈ S ′(Rd) takes the form T = DβTg for some polynomially bounded continuousfunction g and some β ∈ Nd0. A related, more complicated result holds for T ∈ D′(Rd),see [32].

15. Let (Tn) be a sequence in S ′(Rd). We say that Tn → T ∈ S ′(Rd) if Tn(ϕ) → T (ϕ) forevery ϕ ∈ S(Rd). This is just convergence in the weak*-topology.Define Tn(ϕ) = n

2∫ 1/n−1/n ϕ(x) dx on S(R). Show that Tn → δ. This is an example of a

delta sequence.

Page 24: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions
Page 25: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

Chapter 2

Fourier analysis

2.1 Convolutions

In this chapter we shall discuss the Fourier transform and some of its applications.We begin with some facts from measure theory, then we define convolutions and give

their basic properties. Let us first take care of differentiation under integral sign. 1

Lemma 2.1. Let (X,µ) be a measure space, I ⊆ R open and f : X × I → C such thatf(·, t) ∈ L1(µ) for each t ∈ I. Let F (t) =

∫X f(x, t) dµ(x).

(a) If f(x, ·) ∈ C(I) for each x and there exists g ∈ L1(µ) such that |f(x, t)| ≤ g(x) forall x, t, then F ∈ C(I).

(b) If ∂tf exists and there is a g ∈ L1(µ) such that |∂tf(x, t)| ≤ g(x) for all x, t, then Fis differentiable and F ′(t) =

∫∂tf(x, t) dµ(x).

Proof. Fix t0 ∈ I, let tn ⊂ I such that tn → t0. Then fn(x) = f(x, tn) → f(x, t0),so by the dominated convergence theorem, F (tn) → F (t0), which gives (a). Moreover, ifhn(x) = f(x,tn)−f(x,t0)

tn−t0 , then hn(x) → ∂tf(x, t0), so ∂tf is measurable in x. Furthermore,by the mean value theorem, |hn(x)| ≤ supt∈I |∂tf(x, t)| ≤ g(x), so applying dominatedconvergence again, we get

F ′(t0) = limn→∞

F (tn)− F (t0)tn − t0

= limn→∞

∫hn(x) dµ(x) =

∫∂tf(x, t) dµ(x) .

The classical Minkowski inequality says that the Lp norm of a sum is bounded by thesum of Lp norms. The following is a generalization from sums to arbitrary integrals. Theproof is very similar.

Lemma 2.2 (Generalized Minkowski inequality). Let (X,µ), (Y, ν) be σ-finite measurespaces and suppose f : X × Y → C is measurable.

If 1 ≤ p ≤ ∞, f(·, y) ∈ Lp(µ) for a.e. y, and the function y 7→ ‖f(·, y)‖p is in L1(ν),then the map x 7→

∫f(x, y)dν(y) is in Lp(µ), and∥∥∥ ∫ f(·, y) dν(y)

∥∥∥p≤∫‖f(·, y)‖p dν(y) .

Proof. The map x 7→∫f(x, y)dν(y) is measurable by classical theorems, so it suffices to

prove the inequality.

1. This chapter follows [44] and [14].

Page 26: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

22 Chapter 2. Fourier analysis

For p = 1, this follows from Fubini. For p =∞, fix x ∈ X. Then |f(x, y)| ≤ ‖f(·, y)‖∞for any y ∈ Y , so

∫|f(x, y)|dν(y) ≤

∫‖f(·, y)‖∞ dν(y). Since x is arbitrary, the assertion

follows. So suppose 1 < p <∞, let q be its conjugate exponent (i.e. 1p + 1

q = 1) and defineF (x) =

∫Y f(x, y) dν(y). Then for any x,

|F (x)|p = |F (x)|p−1|F (x)| ≤ |F (x)|p−1∫Y|f(x, y)| dν(y) ,

so by Fubini,

‖F‖pp =∫X|F (x)|p dµ(x) ≤

∫X

(|F (x)|p−1

∫Y|f(x, y)|dν(y)

)dµ(x)

=∫Y

( ∫X|F (x)|p−1|f(x, y)|dµ(x)

)dν(y) .

Now using Hölder and noting that (p− 1)q = p, we get

‖F‖pp ≤∫Y

( ∫X|F (x)|(p−1)q dµ(x)

)1/q( ∫X|f(x, y)|p dµ(x)

)1/pdν(y)

=∫Y‖F‖p/qp ‖f(·, y)‖p dν(y) .

If ‖F‖p = 0, the lemma is trivially true. If ‖F‖p 6= 0, divide by ‖F‖p/qp to get ‖F‖p =‖F‖p(1−1/q)

p ≤∫Y ‖f(·, y)‖p dν(y).

The last measure theoretic result we need before moving on, is that translations arecontinuous in the Lp norm. That is, suppose f ∈ Lp(Rn), y ∈ Rn and define

τyf(x) = f(x− y) .

Clearly, ‖τyf‖p = ‖f‖p for 1 ≤ p ≤ ∞. Moreover,

Lemma 2.3. If 1 ≤ p < ∞, then translation is continuous in the Lp norm. That is, iff ∈ Lp(Rn) and z ∈ Rn, then limy→0 ‖τy+zf − τzf‖p = 0.

Proof. Since τy+z = τyτz, by replacing f by τzf , we may assume z = 0. Suppose firstthat g ∈ Cc(Rn). Then for |y| ≤ 1, all τyg are supported on a common compact set K,so∫|τyg − g|p ≤ maxx∈K |g(x− y)− g(x)|p vol(K) → 0 as y → 0, because g is uniformly

continuous on K.Next, if f ∈ Lp(Rn), let ε > 0 and choose g ∈ Cc(Rn) with ‖f − g‖p < ε/3. Then

‖τyf − f‖p ≤ ‖τy(f − g)‖p + ‖τyg − g‖p + ‖g − f‖p < 2ε3 + ‖τyg − g‖p and we know

‖τyg − g‖p < ε/3 for y small enough.

Definition 2.4. If f , g are measurable on Rn, we define their convolution f ∗ g by

(f ∗ g)(x) :=∫Rnf(y)g(x− y) dy

whenever the integral exists.

To understand convolutions, suppose ρ : Rn → R is measurable with ρ ≥ 0 and∫ρ = 1. Then ρ(y) dy is a probability measure on Rn, and so is ρ(y − x) dy for any

x ∈ Rn. Moreover, the measure ρ(y − x) dy is just a translation of ρ(y) dy by x. Givena function f , the map fρ(x) =

∫f(y) ρ(y − x) dy, when defined, gives the average of f

with respect to the measure ρ(y − x) dy. Intuitively, by taking averages, the function fρ

Page 27: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

2.1. Convolutions 23

becomes more regular than f , because the singularities of f are averaged upon. Moreover,if ρ has a very small support, say an ε-ball around zero, fρ will be a good approximationof f . The convolution product f ∗ ρ is defined by

∫f(y) ρ(x− y) dy, i.e. fρ instead of fρ,

where ρ(x) := ρ(−x). This is just for convenience. For example, one advantage is that weget

ρ ∗ f = f ∗ ρ

using the change of variables z = x− y.

Lemma 2.5. (i) (Young’s Inequality). If f ∈ L1(Rn) and g ∈ Lp(Rn), then 2

f ∗ g ∈ Lp(Rn) and ‖f ∗ g‖p ≤ ‖f‖1‖g‖p .

(ii) If f ∈ L1(Rn), g ∈ Ck(Rn) and Dαg is bounded for |α| ≤ k, then

f ∗ g ∈ Ck(Rn) and Dα(f ∗ g) = f ∗ (Dαg) .

(iii) If ρ ∈ L1(Rn), f ∈ Lp(Rn), h ∈ Lq(Rn), where 1 ≤ p ≤ ∞ and 1p + 1

q = 1, then

(ρ ∗ f, h) = (f, ρ∗ ∗ h) ,

where (u, v) =∫uv and ρ∗(x) = ρ(−x).

Proof. (i) Let h(x, y) = f(y)g(x− y). Then (f ∗ g)(x) =∫h(x, y) dy, so by Lemma 2.2,

f ∗g ∈ Lp(Rn) with ‖f ∗g‖p = ‖∫h(·, y) dy‖p ≤

∫‖h(·, y)‖p dy =

∫|f(y)| ‖τyg‖p dy =

‖f‖1‖g‖p.(ii) By hypothesis ∃M > 0 with |f(y)Dαg(x−y)| ≤Mf(y) ∈ L1(Rn). So by Lemma 2.1,

∂xj (f ∗ g)(x) = ∂xj∫f(y)g(x − y) dy =

∫f(y)∂xjg(x − y) dy = [f ∗ (∂xjg)](x). The

result now follows by induction.(iii) By Fubini, (ρ ∗ f, h) = (f ∗ ρ, h) = (f, h ∗ ρ∗) = (f, ρ∗ ∗ h).

Remark 2.6. Note that if K is compact and B0,ε = t : |t| ≤ ε, then(supp f ⊆ K and supp gε ⊆ B0,ε

)=⇒ supp gε ∗ f ⊆ Kε ,

where Kε = x ∈ Rn : dist(x,K) ≤ ε = x ∈ Rn : |x − y| ≤ ε for some y ∈ K. Indeed,if |x − y| > ε for all y ∈ K, then gε(x − y) vanishes on K, so (gε ∗ f)(x) = (f ∗ gε)(x) =∫K f(y)gε(x− y) dy = 0.

Definition 2.7. For any function ρ on Rn, we denote ρε(x) := ε−nρ(x/ε).If ρ ∈ L1(Rn) and

∫ρ(x) dx = 1, we call ρε an approximate identity.

If ρ ∈ C∞c (Rn),∫ρ(x) dx = 1, ρ ≥ 0, ρ(x) = ρ(−x) and ρ(x) = 0 if |x| ≥ 1, we call ρε

a mollifier. We constructed such ρ in Chapter 1.

Note that if ρ ∈ L1, then∫ρε is independent of ε, since

∫ρε(x) dx =

∫ρ(ε−1x)ε−n dx =∫

ρ(y) dy. The following lemma justifies the name “approximate identity”.

Lemma 2.8. Suppose ρ ∈ L1(Rn) and∫ρ(x) dx = a. If f ∈ Lp(Rn) with 1 ≤ p < ∞,

then ρε ∗ f → af in Lp(Rn) as ε→ 0.

2. This shows that L1(Rn) is a Banach algebra, if the product is taken to be the convolution.

Page 28: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

24 Chapter 2. Fourier analysis

Proof. Setting y = εz, we have

ρε ∗ f(x)− af(x) =∫ρε(y)

(f(x− y)− f(x)

)dy

=∫ρ(z)

(f(x− εz)− f(x)

)dz =

∫ρ(z)

(τεzf(x)− f(x)

)dz .

Hence, by Lemma 2.2, ‖ρε ∗f −af‖p ≤∫|ρ(z)| ‖τεzf −f‖p dz. But for each z, ‖τεzf −f‖p

is bounded by 2‖f‖p, and tends to 0 as ε→ 0 by Lemma 2.3. So the assertion follows bydominated convergence.

Corollary 2.9. For any 1 ≤ p <∞ and any open Ω ⊆ Rn, C∞c (Ω) is dense in Lp(Ω).

Proof. Since Cc(Ω) is dense in Lp(Ω), it suffices to show that C∞c (Ω) ⊇ Cc(Ω). Letf ∈ Cc(Ω) and let d = dist(supp f, ∂Ω) (with d :=∞ if Ω = Rn). Let ρε be a mollifier. Ifε < d/2, we have ρε ∗ f ∈ Cc(Ω) by Remark 2.6. Moreover, ρε ∗ f = f ∗ ρε ∈ C∞(Rn) byLemma 2.5 (ii). Finally, ρε ∗ f → f in Lp(Ω) by Lemma 2.8.

Corollary 2.10. Let f, g ∈ L1loc(Ω). Then Tf = Tg iff f = g a.e. on Ω. In other words,

L1loc(Ω) can be embedded in D′(Ω), i.e. the map f 7→ Tf is injective.

Proof. Let h ∈ L1loc(Ω) and suppose Th = 0. Then (h, ϕ) = 0 for all ϕ ∈ C∞c (Ω). If h

was in L2(Ω), we would deduce that (h, h) = 0 by density of C∞c (Ω) and thus h = 0.Unfortunately this may not be the case, so we shall approximate h by a smooth function.

Let U ⊂ Ω be open with U ⊂ Ω and U compact. Then ψ := h|U ∈ L1(U).Let V ⊂ U be open with V ⊂ U . Let ρε be a mollifier. If ϕ ∈ C∞c (V ) and ε is

sufficiently small, then ρε ∗ ϕ ∈ C∞c (U). So (ψ, ρε ∗ ϕ) = 0 and thus (ρε ∗ ψ,ϕ) = 0by Lemma 2.5. But ρε ∗ ψ is smooth, so (ρε ∗ ψ)|V ∈ L2(V ). Since C∞c (V ) is dense inL2(V ), there exists ϕj ⊂ C∞c (V ) with ϕj → (ρε ∗ψ)|V in L2(V ). Hence, ‖ρε ∗ψ‖2L2(V ) =limj(ρε ∗ ψ,ϕj) = 0 and thus (ρε ∗ ψ)|V = 0 a.e. But as ε → 0 we have ρε ∗ ψ → ψ inL1(Rn), so ψ|V = 0 a.e. Since V is arbitrary, ψ = 0 a.e. Since U is arbitrary, h = 0 a.e.on Ω.

2.2 The Fourier transform

Notation. Let dx be the Lebesgue measure on Rn. We denote

dm(x) := (2π)−n/2 dx .

This normalization is motivated by the inversion theorem we give later on. For x, ξ ∈ Rnwe set x · ξ =

∑ni=1 xiξi and |ξ|2 =

∑ni=1 ξ

2i . If N0 = 0, 1, 2, . . . and α ∈ Nn0 , we denote

xα = xα11 · · ·x

αnn , Dα = ∂α1

x1 · · · ∂αnxn and |α| = α1 + . . .+ αn .

If f ∈ L1(Rn), we define the Fourier transform of f by

(Ff)(ξ) = f(ξ) =∫Rnf(x)e−ix·ξ dm(x) .

Given z ∈ Rn and a ∈ R∗, we define the function ez and the scaling operator Sa by

ez(y) := eiy·z and (Saf)(y) = f(y/a) .

Page 29: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

2.2. The Fourier transform 25

Lemma 2.11. Suppose f, g ∈ L1(Rn) and z ∈ Rn and a > 0. Then

(a) τzf = e−z f , (b) ezf = τz f ,

(c) Saf = anS1/af , (d) f ∗ g = (2π)n/2f g.

Proof. We leave (a), (b) and (c) as exercises to the reader. For (d), note that f∗g ∈ L1(Rn)by Lemma 2.5(i). By the absolute convergence, we may use Fubini and get

(2π)−n/2(f ∗ g)(ξ) = (2π)−n/2∫ ( ∫

f(y)g(x− y) dy)e−ix·ξ dm(x)

=∫f(y)e−iy·ξ

( ∫g(x− y)e−i(x−y)·ξ dm(x)

)dm(y)

=( ∫

f(y)e−iy·ξ dm(y))( ∫

g(z)e−iz·ξ dm(z))

= f(ξ)g(ξ) .

Definition 2.12. Let Cb(Rn) be the space of bounded continuous functions on Rn. Wedefine the space C0(Rn) ⊂ Cb(Rn) of continuous functions vanishing at infinity by

C0(Rn) = h ∈ C(Rn) : h(ξ)→ 0 as |ξ| → ∞ .

Theorem 2.13. (a) If f ∈ L1(Rn), then f ∈ Cb(Rn) and ‖f‖∞ ≤ ‖f‖1.(b) If xαf ∈ L1(Rn) for |α| ≤ k, then f ∈ Ck(Rn), and Dαf = F [(−ix)αf ].(c) If f ∈ Ckc (Rn), then for |α| ≤ k, (Dαf)(ξ) = (iξ)αf(ξ).(d) If f ∈ C∞c (Rn), then f ∈ C0(Rn) ∩ Lp(Rn) for every p ≥ 1.(e) (Riemann-Lebesgue Lemma). If f ∈ L1(Rn), then f ∈ C0(Rn).

Proof. If f ∈ L1(Rn), then |f(ξ)| ≤∫|f | = ‖f‖1, so ‖f‖∞ ≤ ‖f‖1. Moreover, if ξn ⊂ Rn

with ξn → ξ, then e−ix·ξn → e−ix·ξ, so f(ξn)→ f(ξ) by dominated convergence. Hence, fis continuous, so f ∈ Cb(Rn)

The case k = 0 of (b) is given in (a). For k > 0, we have by Lemma 2.1 and inductionon |α| ≤ k that f ∈ Ck(Rn), with

Dαf(ξ) = Dαξ

∫f(x)e−ix·ξ dm(x) =

∫f(x)(−ix)αe−ix·ξ dm(x) ,

so (b) holds. For (c), integrating by parts,

∂xjf(ξ) =∫∂xjf(x)e−ix·ξ dm(x) = −

∫f(x)(−iξj)e−ix·ξ dm(x) = iξj f(ξ) ,

so (c) follows by induction. For (d), let f ∈ C∞c (Rn). Then by (c), (Dαf)(ξ) = (iξ)αf(ξ)for all α, so using (a), ξαf(ξ) is bounded all α. Hence,

[∏n1 (1 + ξ2

j )]f(ξ) is bounded. In

particular, f(ξ)→ 0 as |ξ| → ∞, i.e. f ∈ C0(Rn), and f ∈ L1(Rn) because∫R

11+ξ2

jdξj = π

converges. But f ∈ C0(Rn) implies ‖f‖pp ≤ supξ |f(ξ)|p−1‖f‖1 <∞, so f ∈ Lp(Rn).For (e), we know by (a) that f ∈ Cb(Rn). Choose ϕ ∈ C∞c (Rn) such that ‖f−ϕ‖1 < ε/2

(use Corollary 2.9). Then |f(ξ)| ≤ |(f − ϕ)(ξ)|+ |ϕ(ξ)| ≤ ‖f −ϕ‖1 + |ϕ(ξ)| < ε/2 + |ϕ(ξ)|.But ϕ(ξ)→ 0 as |ξ| → ∞ by (d). So |f(ξ)| < ε for |ξ| sufficiently large.

We now calculate the Fourier transform of a Gaussian.

Lemma 2.14. Let φ(x) = e−|x|2/2 for x ∈ Rn. Then φ = φ.

Page 30: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

26 Chapter 2. Fourier analysis

Proof. First suppose n = 1. Since φ′(x) = −xφ(x), we have by Theorem 2.13(b),(c) thatddξ φ(ξ) = (−ixφ)(ξ) = (iφ′)(ξ) = i(iξ)φ(ξ) = −ξφ(ξ). Thus, both φ and φ solve theequation y′ = −xy, so φ = cφ. But φ(0) =

∫R e−x2/2dm(x) = 1. 3 Thus cφ(0) = 1, i.e.

c = 1, so φ = φ.For general n, we have by Fubini, denoting dmi = (2π)−1/2 dxi,

ϕ(ξ) =∫Rne−|x|

2/2e−ix·ξ dm(x) =∫Re−x

21/2e−ix1ξ1 dm1 · · ·

∫Re−x

2n/2e−ixnξndmn

= e−ξ21/2 · · · e−ξ2

n/2 = e−|ξ|2/2 .

Definition 2.15. Given f ∈ L1(Rn), we define

(F ∗f)(x) = f

(x) =∫Rnf(ξ)eix·ξ dm(ξ) = f(−x).

The notation F ∗ is motivated by the following lemma.

Lemma 2.16. If f, g ∈ L1(Rn), then∫fg =

∫fg and (f , g) = (f, g

).

Proof. By Fubini,∫f(ξ)g(ξ) =

∫ ∫f(x)e−ix·ξg(ξ) =

∫f(x)

∫g(ξ)e−ix·ξ =

∫f(x)g(x) ,

(f , g) =∫f(ξ)g(ξ) =

∫ ∫f(x)e−ix·ξg(ξ) =

∫f(x)

∫g(ξ)eix·ξ = (f, g

) .

We claim that F ∗Ff = f . A simple appeal to Fubini’s theorem fails because theintegrand in (F ∗Ff)(x) =

∫ ∫f(y)e−iy·ξeix·ξdm(y)dm(ξ) is not in L1(Rn × Rn). The

trick is to introduce a convergence factor as follows.

Theorem 2.17 (Fourier Inversion Theorem). If f ∈ L1(Rn) and f ∈ L1(Rn), then fagrees a.e. with a continuous function fc, and F ∗Ff = FF ∗f = fc.

Proof. Let φ(x) = e−|x|2/2. Given ε > 0 and x ∈ Rn, we have by Lemma 2.11(a),∫

e−ε2|ξ|2/2eix·ξ f(ξ) dξ =

∫φ(εξ)ex(ξ)f(ξ) dξ =

∫Sε−1φ(ξ)τ−xf(ξ) dξ ,

so using Lemma 2.16 and Lemma 2.11(c),∫e−ε

2|ξ|2/2eix·ξ f(ξ) dξ =∫Sε−1φ(ξ)τ−xf(ξ) dξ = ε−n

∫Sεφ(ξ)f(x+ ξ) dξ .

But φ = φ by Lemma 2.14. So∫e−ε

2|ξ|2/2eix·ξ f(ξ) dξ = ε−n∫Sεφ(y)f(x+ y) dy =

∫φε(y)f(x− y) dy = φε ∗ f(x) ,

where we changed the variable y to −y and used the fact that φ(−y) = φ(y). Now∫φ(x) dx = (2π)n/2, so by Lemma 2.8, φε ∗ f → (2π)n/2f in L1 as ε → 0. But since f ∈

L1, then by dominated convergence, the LHS tends to∫eix·ξ f(ξ) dξ = (2π)n/2(F ∗f)(x).

Hence, f = F ∗Ff a.e. 4 Similarly, FF ∗f = f a.e. Since F ∗Ff and FF ∗f arecontinuous, being the Fourier transforms of L1 functions, the proof is complete.

3. Because(∫∞−∞ e

−x2/2 dx)2

=∫R2 e−(x2+y2)/2 dxdy =

∫ 2π0 dθ

∫∞0 e−r

2/2r dr = 2π.4. If fm → f in L1 and fm → g pointwise, then f = g a.e. Indeed, if fm → f in L1, then some

subsequence fmk converges pointwise to f (see e.g. [14]), so f = g by uniqueness of the pointwise limit.

Page 31: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

2.3. Sobolev spaces and embeddings 27

We finally give the important

Theorem 2.18 (Plancherel). The map F : L1(Rn)∩L2(Rn)→ C0(Rn) extends uniquelyto a unitary isomorphism F : L2(Rn)→ L2(Rn).

In other words, the unique extension of F to L2(Rn) is linear, bijective, and satisfies

∀f, g ∈ L2(Rn) : (Ff,Fg) = (f, g) (Parseval’s identity).

Proof. We first note that if f, g ∈ C∞c (Rn), then f , g ∈ L1(Rn) by Lemma 2.13(c), so byLemma 2.16 and Theorem 2.17,

(Ff,Fg) = (f,F ∗Fg) = (f, g) .

Hence, ‖Ff‖2 = ‖f‖2 for every f ∈ C∞c (Rn). So F |C∞c (Rn) extends uniquely to anisometry F : L2(Rn) → L2(Rn). Moreover, F (f) = f on L1(Rn) ∩ L2(Rn). Indeed,given f ∈ L1(Rn) ∩ L2(Rn), use Corollary 2.9 (and its proof) to find fj ∈ C∞c (Rn) suchthat fj → f in both L1(Rn) and L2(Rn). Then ‖Ffj − Ff‖2 = ‖fj − f‖2 → 0 and‖fj − f‖∞ ≤ ‖fj − f‖1 → 0 by Lemma 2.13(a). But Ffj = fj . Hence, Ff = f .

It remains to show that F is surjective. Let h ∈ C∞c (Rn). Then F ∗h(x) = h(−x) ∈L1(Rn) ∩ L2(Rn) by Lemma 2.13(c), so F (F ∗h) = F ∗h = h by Theorem 2.17. Hence,h ∈ Ran F , so C∞c (Rn) ⊆ Ran F . Since F is an isometry, then Ran F is closed. 5 Hence,by Corollary 2.9, L2(Rn) = C∞c (Rn) ⊆ Ran F , so F is surjective.

2.3 Sobolev spaces and embeddingsDefinition 2.19. Let f, h ∈ L1

loc(Ω). We say that h is the weak α-th derivative of f , anddenote h = Dα

wf , if Th = DαTf , i.e. if (h, ϕ) = (−1)|α|(f,Dαϕ) for all ϕ ∈ C∞c (Ω).

Remark 2.20. By Corollary 2.10, if a weak α-th derivative exists, it is unique (up to setsof measure 0). In particular, if f ∈ C∞(Ω), then Dαf = Dα

wf for all α.

Definition 2.21. Let 1 ≤ p <∞. We define the Sobolev space W k,p(Ω) by

W k,p(Ω) = f ∈ Lp(Ω) : Dαwf exists for all |α| ≤ k and Dα

wf ∈ Lp(Ω)

and endow it with the norm

‖f‖k,p =( ∑|α|≤k

‖Dαwf‖pp

)1/p.

Sometimes we endow W k,p(Ω) with the equivalent norm Nk,p(f) =∑|α|≤k ‖Dα

wf‖p.

Remark 2.22. If f, fj ∈W k,p(Ω), then

fj → f in W k,p(Ω) ⇐⇒ Dαwfj → Dα

wf in Lp(Ω) for each |α| ≤ k .

Indeed, if fj → f inW k,p, then ‖Dαwfj−Dα

wf‖p ≤ ‖fj−f‖k,p → 0, so Dαwfj → Dα

wf in Lp.Conversely, if each Dα

wfj → Dαwf in Lp, then ‖fj−f‖k,p =

(∑|α|≤k ‖Dα

wfj−Dαwf‖pp

)1/p →0. Similarly, (fj) is Cauchy in W k,p iff each (Dα

wfj) is Cauchy in Lp.

5. If Ffj → g, then ‖fj − fk‖2 = ‖Ffj − Ffk‖2 → 0 as j, k → ∞, so by completeness of L2, fj → h

for some h ∈ L2, so Ffj → Fh, so g = Fh and the range is closed.

Page 32: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

28 Chapter 2. Fourier analysis

Lemma 2.23. W k,p(Ω) is a Banach space. W k,2(Ω) is a Hilbert space.

Proof. Let (fj) be Cauchy inW k,p(Ω). Then (Dαwfj) is Cauchy in Lp(Ω) for every |α| ≤ k,

so by completeness of Lp(Ω), there exists fα ∈ Lp(Ω) with Dαwfj → fα. Let f := f0 and

ϕ ∈ C∞c (Ω). Then (f,Dαϕ) = limj(fj , Dαϕ) = limj(−1)|α|(Dαwfj , ϕ) = (−1)|α|(fα, ϕ).

Hence, Dαwf = fα. We thus proved that Dα

wfj → Dαwf in Lp(Ω) for each |α| ≤ k, so

fj → f in W k,p(Ω).When p = 2, (f, g)k :=

∑|α|≤k(Dα

wf,Dαwg) defines an inner product on W k,2(Ω) which

induces the norm ‖ ‖k,2, so W k,2(Ω) is a Hilbert space.

Lemma 2.24. (a) If f ∈W k,p(Ω) and |α|+ |β| ≤ k, then Dαw(Dβ

wf) = Dα+βw f .

(b) If f ∈ W k,p(Ω) and ϕ ∈ C∞c (Ω), then fϕ ∈ W k,p(Ω) and ∂wj (fϕ) = (∂wj f)ϕ + f∂jϕ,where ∂wj is the weak j-th partial derivative.

(c) If ρ ∈ C∞c (Rn) and f ∈W k,p(Rn), then ρ ∗ f ∈ C∞(Rn) ∩W k,p(Rn) and Dα(ρ ∗ f) =ρ ∗Dα

wf for all |α| ≤ k.

Proof. (a) Exercise.(b) Let ψ ∈ C∞c (Ω). Then

(fϕ, ∂jψ) = (f, ϕ∂jψ) = (f, ∂j(ϕψ))− (f, ψ∂jϕ)= −(∂wj f, ϕψ)− (f∂jϕ,ψ) = −((∂wj f)ϕ+ f∂jϕ,ψ) .

Hence, ∂wj (fϕ) = (∂wj f)ϕ+ f∂jϕ ∈ Lp(Ω), and (b) follows by induction.(c) We have ρ ∗ f = f ∗ ρ ∈ C∞(Rn) by Lemma 2.5(ii), and ρ ∗ Dα

wf ∈ Lp(Rn) byLemma 2.5(i). So it suffices to see that Dα(ρ ∗ f) = ρ ∗Dα

wf . Let ϕ ∈ C∞c (Rn). Then

(ρ ∗ f,Dαϕ) = (f, ρ∗ ∗Dαϕ) (Lemma 2.5(iii))= (f,Dα(ρ∗ ∗ ϕ)) (Lemma 2.5(ii))= (−1)|α|(Dα

wf, ρ∗ ∗ ϕ) (since ρ∗ ∗ ϕ ∈ C∞c (Rn))

= (−1)|α|(ρ ∗Dαwf, ϕ) .

Definition 2.25. If Ω ⊆ Rn is open, we define W k,p0 (Ω) as the closure of C∞c (Ω) in

W k,p(Ω). Thus, f ∈W k,p0 (Ω) iff there exists ϕj ⊂ C∞c (Ω) such that ‖f − ϕj‖k,p → 0.

Roughly speaking, W k,p0 (Ω) consists of functions in W k,p(Ω) which vanish on ∂Ω.

This can be made precise using traces, but we will not discuss this here. Note thatW 0,p

0 (Ω) = Lp(Ω) by Corollary 2.9.

Lemma 2.26. W k,p0 (Rn) = W k,p(Rn). In other words, C∞c (Rn) is dense in W k,p(Rn).

Since C∞c (Rn) ⊂ C∞(Rn)∩W k,p(Rn), then C∞(Rn)∩W k,p(Rn) is in particular densein W k,p(Rn). Actually C∞(Ω) ∩W k,p(Ω) is dense in W k,p(Ω) for any open set Ω ⊆ Rn.This is known as the Meyers-Serrin theorem. For a proof, see e.g. [23, Theorem 10.15].

Proof. Let ψ ∈ C∞c (Rn) such that ψ(x) = 1 if |x| ≤ 1 and ψ(x) = 0 if |x| ≥ 2. Givenf ∈ W k,p(Rn), let fR(x) = f(x)ψ( xR). Then fR(x) = f(x) if |x| ≤ R, fR(x) = 0 if|x| ≥ 2R, and fR ∈ W k,p(Rn) by Lemma 2.24(b). Assuming |Dγψ(x)| ≤ M for |γ| ≤ k,we get |Dγψ( xR)| ≤ M

R|γ|≤M for R ≥ 1, hence |Dα

wfR(x)| ≤M ′∑β≤α |Dβ

wf(x)| by Leibniz.Thus, ‖Dα

wfR −Dαwf‖p =

( ∫|x|>R |Dα

wfR −Dαwf |p

)1/p ≤M ′′∑β≤α( ∫|x|>R |Dβ

wf |p)1/p → 0

as R → ∞, because f ∈ W k,p(Rn). Let ρδ be a mollifier. Then ρδ ∗ fR ∈ C∞c (Rn)

Page 33: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

2.3. Sobolev spaces and embeddings 29

by Lemma 2.24(c) and Remark 2.6. Moreover, Dα(ρδ ∗ fR) = ρδ ∗ (DαwfR) → Dα

wfR inLp(Rn) for each |α| ≤ k as δ → 0. Thus, given ε > 0, we may choose δ small enoughso that ‖ρδ ∗ fR − fR‖k,p < ε/2 and R so large that ‖fR − f‖k,p < ε/2, which yields‖ρδ ∗ fR − f‖k,p < ε. Hence, C∞c (Rn) is dense in W k,p(Rn).

Corollary 2.27. If f ∈W k,2(Rn), then for |α| ≤ k we have

(Dαwf)(ξ) = (iξ)αf(ξ) .

In particular, ξαf ∈ L2(Rn) for all |α| ≤ k.

Proof. Use Lemma 2.26 to find ϕj ⊂ C∞c (Rn) with ϕj → f inW k,2(Rn). Then Dαϕj →Dαwf in L2(Rn). Since F is unitary, hence continuous on L2, we get by Theorem 2.13(c),

Dαwf = lim Dαϕj = lim(iξ)αϕj = (iξ)αf .

We are finally ready to illustrate the power of the Fourier transform in understandingthe structure of Sobolev spaces.

Definition 2.28. We define

Ck0 (Rn) = f ∈ Ck(Rn) : Dαf ∈ C0(Rn) for |α| ≤ k .

Then Ck0 (Rn) is a Banach space when endowed with ‖f‖ =∑|α|≤k ‖Dαf‖C0 .

Theorem 2.29 (Sobolev embedding theorem). Suppose k > r + n2 .

(1) If f ∈ W k,2(Rn), then for |α| ≤ r, we have Dαwf ∈ L1(Rn) with ‖Dα

wf‖1 ≤ C ‖f‖k,2for some C independent of f .

(2) Every f ∈W k,2(Rn) agrees a.e. with an fc ∈ Cr0(Rn), and the embedding W k,2(Rn) →Cr0(Rn) which maps f 7→ fc is continuous.

We usually express (2) less precisely by saying that W k,2(Rn) ⊂ Cr0(Rn) and that theinclusion map is bounded.

Proof. By Corollary 2.27, ‖Dαwf‖1 = ‖ξαf‖1. Let h1 = (1+ |ξ|k)f and h2 = (1+ |ξ|k)−1ξα.

Then ξαf = h1h2. We first show that h1, h2 ∈ L2(Rn), then apply Cauchy-Schwarz.For h1, we have ‖h1‖2 ≤ ‖f‖2 + ‖|ξ|kf‖2. For the first term, we have by Plancherel,

‖f‖2 = ‖f‖2 ≤ ‖f‖k,2. Next, note that |ξ|2k =(∑n

i=1 ξ2i

)k =∑|β|≤k cβξ

2β for some cβ ≥ 0.So using Corollary 2.27 and Plancherel again,

‖|ξ|kf‖22 =∫|ξ|2k|f(ξ)|2 dξ =

∑cβ

∫|ξβ f(ξ)|2 =

∑cβ‖ξβ f‖22

=∑

cβ‖Dβwf‖22 =

∑cβ‖Dβ

wf‖22 ≤ c ‖f‖2k,2 .

Hence,‖h1‖2 ≤ ‖f‖2 + ‖|ξ|kf‖2 ≤ c′ ‖f‖k,2 .

For h2, we have ‖h2‖22 =∫|ξ|≤1 |h2(ξ)|2 dξ +

∫|ξ|≥1 |h2(ξ)|2 dξ. Since h2 is continuous, the

first integral is finite. For the second integral, since |ξj | ≤ |ξ|, we have |ξα| ≤ |ξ||α|. But for|ξ| ≥ 1 we have |ξ||α| ≤ |ξ|r if |α| ≤ r. Thus,

∫|ξ|≥1 |h2(ξ)|2 dξ ≤

∫|ξ|≥1(1 + |ξ|k)−2|ξ|2r dξ =∫

R≥1(1 +Rk)−2R2rRn−1 dR ≤∫R≥1R

−2kR2r+n−1 dR <∞ because 2r+ n− 1− 2k < −1,since r + n

2 < k.We thus showed that ‖Dα

wf‖1 = ‖ξαf‖1 = ‖h1h2‖1 ≤ ‖h1‖2‖h2‖2 ≤ C‖f‖k,2.

Page 34: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

30 Chapter 2. Fourier analysis

For (2), we know from (1) and Corollary 2.27 that for each |α| ≤ r, we have (iξ)αf =Dαwf ∈ L1(Rn), so by Theorem 2.13(b),(e), we have F (f) ∈ Cr0(Rn). Hence, F ∗Ff(ξ) =

F (f)(−ξ) ∈ Cr0(Rn). So by the inversion formula, f agrees a.e. with an fc ∈ Cr0(Rn).Moreover, by Theorem 2.13(b), ‖Dαfc‖∞ = ‖DαF (f)‖∞ = ‖F [(−iξ)αf ]‖∞, and by The-orem 2.13(a), ‖F [(−iξ)αf ]‖∞ ≤ ‖(−iξ)αf‖1. But by Corollary 2.27 and (1), ‖(−iξ)αf‖1 =‖Dα

wf‖1 ≤ C ‖f‖k,2. Hence, ‖fc‖Cr0 =∑|α|≤r ‖Dαfc‖∞ ≤ C ′ ‖f‖k,2, so the embedding is

bounded, hence continuous.

Let us partly extend the previous theorem to open subsets.

Lemma 2.30. Let f ∈ W k,p0 (Ω) and let f be the extension of f by zero to Rn, i.e.

f(x) = f(x) if x ∈ Ω and f(x) = 0 if x /∈ Ω. Then f ∈ W k,p(Rn), Dαwf = Dα

wf , and themap f 7→ f is isometric.

Proof. Let φj ⊂ C∞c (Ω) converge to f in W k,p0 (Ω). Then Dαφj → Dα

wf in Lp(Ω) for all|α| ≤ k. Denote (f, g)E =

∫E fg. Then given ψ ∈ C∞c (Rn), we have

(f , Dαψ)Rn = (f,Dαψ)Ω = limj→∞

(φj , Dαψ)Ω = limj→∞

(−1)|α|(Dαφj , ψ)Ω

= (−1)|α|(Dαwf, ψ)Ω = (−1)|α|(Dα

wf, ψ)Rn .

Thus, Dαwf = Dα

wf ∈ Lp(Rn). So f ∈W k,p(Rn) and ‖f‖Wk,p(Rn) = ‖f‖Wk,p

0 (Ω).

Corollary 2.31. Let Ω ⊆ Rn be open and k > r + n2 . Then

(a) W k,2(Ω) → Cr(Ω).

(b) W k,20 (Ω) → Crb (Ω) and the embedding is continuous.

The continuity of the embedding W k,2(Ω) → Cr(Ω) is delicate for general Ω, anddepends on the nature of ∂Ω.

Proof. For (a), let U ⊂ Ω, with U ⊂ Ω and U compact. Let φ ∈ C∞c (Ω) with φ = 1on U . Then for f ∈ W k,2(Ω), we have φf ∈ W k,2(Rn) by Lemma 2.24, so we can applyTheorem 2.29 to deduce that φf agrees a.e. with some gU ∈ Cr(Ω). Hence, f |U agreesa.e. with gU , and since U is arbitrary, f can be identified with some fc ∈ Cr(Ω).

For (b), the maps f 7→ f 7→ fc 7→ fc|Ω fromW k,20 (Ω)→W k,2(Rn) → Cr0(Rn)→ Crb (Ω)

are all bounded by Theorem 2.29 and Lemma 2.30, so the embedding is continuous.

We now prove the Rellich theorem, which is of great importance in applications.

Definition 2.32. Let X,Y be normed spaces. We say that T : X → Y is compact if T (B)is compact in Y for every bounded set B ⊆ X.

Equivalently, T is compact if any bounded sequence (xn) in X has a subsequence (xnk)such that (Txnk) converges (Exercise).

For example, if T has a finite rank, i.e. dim Ran(T ) <∞, then T is compact.

Theorem 2.33 (Rellich embedding theorem). Let Ω ⊂ Rn be a bounded open set.

(a) For any k ≥ 1, the inclusion W k,20 (Ω) →W k−1,2

0 (Ω) is a compact operator.

(b) If k > r + n2 + 1, then the embedding W k,2

0 (Ω) → Crb (Ω) is a compact operator.

Page 35: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

2.3. Sobolev spaces and embeddings 31

In particular, the identity map W 1,20 (Ω) → L2(Ω) is compact (recall that W 0,2

0 (Ω) =L2(Ω)). It is also true that the identity map W 1,2(Ω)→ L2(Ω) is compact, but ∂Ω shouldbe continuous in this case (see e.g. [27]). In fact the following proof works without changefor W 1,2(Ω) if Ω is an extension domain, i.e. there exists a continuous linear operatorE : W 1,2(Ω) → W 1,2(Rn). Domains with regular boundaries (e.g. Lipschitz continuous)are extension domains; see e.g. [23].

Proof. (b) follows from (a) and Corollary 2.31, so we only need to prove (a).First assume k = 1 and let (fj) be bounded inW 1,2

0 (Ω). By Lemma 2.30, identifying fjwith fj , we may regard (fj) as a bounded sequence in W 1,2(Rn). So applying Plancherel,

‖fp − fq‖22 = ‖fp − fq‖22 =∫|ξ|≤R

|fp − fq|2 +∫|ξ|≥R

|fp − fq|2

for any R > 0. Since for each i, (∂wi fj) is bounded in L2(Rn), the sequence (ξifj) isbounded in L2(Rn) by Plancherel and Corollary 2.27. Hence, (|ξ|fj) is bounded, say‖|ξ|fj‖2 ≤ C for all j. Hence,∫

|ξ|≥R|fp − fq|2 dξ ≤

∫|ξ|≥R

|ξ|2

R2 |fp − fq|2 dξ ≤ 4C2

R2 .

Given ε > 0, we may thus choose R so that∫|ξ|≥R |fp − fq|2 < ε/2 for all p, q. We now

turn to the integral over |ξ| ≤ R. Since (fj) is bounded in L2(Ω), by passing to asubsequence, we may assume (fj) is weakly convergent. 6 Since Ω is bounded, we haveeξ ∈ L2(Ω), so fj(ξ) = (2π)−n/2(fj , eξ) is convergent and in particular Cauchy. That is,fp(ξ) − fq(ξ) → 0 pointwise as p, q → ∞. But since Ω is bounded and (fj) is boundedin L2(Ω), then (fj) is bounded in L1(Ω), because ‖fj‖1 = ‖fj · 1‖1 ≤ ‖fj‖2 · vol(Ω)1/2

by Cauchy-Schwarz. So by Theorem 2.13(a), (fj) is uniformly bounded in Cb(Rn), say‖fj‖∞ ≤ M . Since M is integrable on |ξ| ≤ R, then by dominated convergence, weget

∫|ξ|≤R |fp(ξ) − fq(ξ)|2 → 0 as p, q → ∞. We thus showed that for p, q large enough,

‖fp − fq‖22 < ε, so (fj) in Cauchy in L2(Ω), hence convergent. This proves (a) for k = 1.Now let (fj) be bounded inW k,2

0 (Ω), k > 1. Then (fj) is bounded inW 1,20 (Ω), so by the

case k = 1, (fj) has a convergent subsequence in L2(Ω), say (fϕ0(j)), where ϕ0 : N → Nis strictly increasing. Next, (∂x1fϕ0(j)) is bounded in W 1,2

0 (Ω), so it has a convergentsubsequence in L2(Ω), say (∂x1fϕ0ϕ1(j)). Note that (fϕ0ϕ1(j)) also converges, since it isa subsequence of (fϕ0(j)). By induction, we find a subsequence (fϕ0···ϕn(j)) such that(Dαfϕ0···ϕn(j)) converges for all |α| ≤ 1. Take ψ1 = ϕ0 · · · ϕn. By induction, we find asubsequence (fψ1···ψk−1(j)) such that (Dαfψ1···ψk−1(j)) converges for all |α| ≤ k−1. Thus,taking χ = ψ1· · ·ψk−1, it follows that the subsequence (fχ(j)) converges inW

k−1,20 (Ω).

Here is an important application of the Rellich embedding. One can prove that theLaplace operator −∆ : C∞c (Ω)→ L2(Ω) has a self-adjoint extension in L2(Ω), the DirichletLaplacian. One can prove that its resolvent Rλ = (−∆ − λ)−1 is a bounded map fromL2(Ω)→W 1,2

0 (Ω). Since the embedding W 1,20 (Ω) → L2(Ω) is compact, then the resolvent

Rλ : L2(Ω)→ L2(Ω) is a compact operator. It follows that −∆Ω has a discrete spectrum,with eigenvalues tending to ∞. This generalizes the fact that the Dirichlet eigenvaluesof −∆ on ]−a, a[ are En = (nπ2a )2, and is very convenient because we can add a boundedpotential V without any effort. Hopefully, we’ll prove the details later in Chapter 4.

6. Any bounded sequence in a Hilbert space has a weakly convergent subsequence. This can be proveddirectly (see e.g. [30, Chapter 16]), or using the Banach-Alaoglu theorem we shall prove next chapter.

Page 36: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

32 Chapter 2. Fourier analysis

2.4 Paley-Wiener TheoremThe Paley-Wiener theorems relate the decay properties of functions with the analyticity

of their Fourier transform. There are several versions of this theorem, some apply to squareintegrable functions [31], others to distributions [32]. In this section we present one forsmooth functions, following [40]. We denote Br = x ∈ Rn : |x| ≤ r.Theorem 2.34 (Paley-Wiener). A function f on Rn is the Fourier transform of a functionin C∞c (Rn) with support in Br iff f can be extended to Cn as an entire function, and thereare constants CN such that

|f(ζ)| ≤ CNer | Im ζ|

(1 + |ζ|)N for all ζ ∈ Cn and N = 0, 1, 2, . . . . (?)

Proof. Suppose f(λ) = ϕ(λ) for some ϕ ∈ C∞c (Rn) with suppϕ ⊆ Br. Define f(ζ) =∫Rn ϕ(x)e−ix·ζ dm(x) for ζ ∈ Cn. This is well-defined because |e−ix·ζ | ≤ er | Im ζ| for x ∈ Br,so e−ζϕ ∈ L1(Rn) for each ζ ∈ Cn. Moreover, the power series of e−ix·ζ converges uniformlyon Br, so f(ζ) =

∫Brϕ(x)

∑α∈Nn0

(−ix)αζαα! dm(x) =

∑α∈Nn0

( ∫Brϕ(x)(−ix)α dm(x) ζ

α

α!).

Since |∫Brϕ(x)(−ix)α dm(x)| ≤ r|α|‖ϕ‖1, the series of f(ζ) converges absolutely for

all ζ ∈ Cn. Hence, f is entire. To see (?), integrate by parts to get (iζ)αf(ζ) =∫ϕ(x)(iζ)αe−iζ·x dm(x) =

∫(Dαϕ)(x)e−iζ·x dm(x) and thus |ζα| |f(ζ)| ≤ ‖Dαϕ‖1er | Im ζ|.

Hence, (1 + |ζ|)N |f(ζ)| ≤ (1 + |ζ1|+ . . .+ |ζn|)N |f(ζ)| ≤ CNer | Im ζ| for some CN .Conversely, suppose f is entire and satisfies (?). Then λ 7→ (1 + |λ|)Nf(λ) is bounded

on Rn for every N , so we may define

ϕ(x) =∫Rnf(λ)eix·λ dm(λ) = f

(x) .

Then ϕ ∈ C∞(Rn) by Theorem 2.13(b) (or Lemma 2.1(b)). We claim that suppϕ ⊆ Br.To prove this, first note that for any η ∈ Rn, we have

ϕ(x) =∫Rnf(λ+ iη)eix·(λ+iη) dm(λ) . (†)

Indeed, fix ηk for k > 1 and consider

I(η1) =∫Rf(λ1 + iη1, . . . , λn + iηn)eix·[(λ1+iη1)+...+(λn+iηn)] dλ1 .

Let Γ be a rectangular contour in C with vertices at ±R and ±R+ iη1, where R > 0, anddenote ζ = λ + iη. Then by Cauchy’s theorem,

∫Γ f(ζ)eix·ζ dζ1 = 0, since the integrand

is analytic in ζ1. For points ζ = (±R + is, ζ2, . . . , ζn) along the vertical sides of Γ, where|s| ≤ |η1|, we have by (?)

|f(ζ)eix·ζ | ≤ CNer | Im ζ|e−x·Im ζ

(1 + |ζ|)N → 0

as R → ∞. Hence, the part of the contour integral along the vertical sides tends to 0 asR → ∞, so 0 = limR→∞

∫Γ f(ζ)eix·ζ dζ1 = I(0) − I(η1). Hence, I(η1) = I(0). Repeating

this for each ηj and using Fubini, we get (†). Hence, using (?),

|ϕ(x)| ≤∫

CNer |η|

(1 + |λ+ iη|)N e−x·η dm(λ) ≤ er |η|−x·η

∫CN

(1 + |λ|)N dm(λ) ,

where N is chosen large enough so that the integral in the RHS converges. Set η = λxwith λ > 0. Then er |η|−x·η = e−λ|x|(|x|−r), so taking λ → ∞, we conclude that |ϕ(x)| = 0if |x| > r. Hence, suppϕ ⊆ Br.

It follows from the inversion formula that f(λ) = ϕ(λ) for λ ∈ Rn.

Page 37: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

2.5. Further results 33

2.5 Further results

Many parts of this chapter can be generalized quite easily to distributions. For ex-ample, if ϕ ∈ D(Rn) and T ∈ D′(Rn), we define the convolution (T ∗ ϕ)(x) := T (τxϕ),where ϕ(x) = ϕ(−x). Note that for distributions Tf we have Tf (τxϕ) =

∫f(y)τxϕ(y) dy =∫

f(y)ϕ(x−y) dy, so this generalizes the convolution of functions. One can prove expectedresults; for example T ∗ ϕ ∈ C∞(Rn), with Dα(T ∗ ϕ) = T ∗ (Dαϕ). Convolutions cansimilarly be defined for T ∈ E ′(Rn) and ϕ ∈ E(Rn). The Fourier transform can also beextended to tempered distributions. The reader will find out how in Exercise 12. Fordetails on all these generalizations, see e.g. [32].

Sobolev spaces are widely used in spectral theory, because they arise in the domainsof self-adjointness of differential operators. There are many embedding theorems for thesespaces, and many important inequalities which are used on a regular basis. For example,the Sobolev embedding can be generalized to W k,p(Ω) → Crb (Ω), when Ω is an open setwith a sufficiently regular boundary (for example Lipschitz continuous). The conditionbecomes k > r+ n

p . Similarly, the Rellich embedding generalizes to the Rellich-Kondrachovtheorem. A classic reference here is [1], but the basic results can be found in many books,e.g. [27], [4], [12] or [24].

One generally learns as an undergraduate that the Laplace transform can be helpfulin solving ordinary differential equations. For example, to solve f ′′(t) + 4f(t) = sin(2t)with initial conditions f(0) = f ′(0) = 0, take the Laplace transform of both sides. Thisgives s2L f(t) − sf(0)− f ′(0) + 4L f(t) = L sin(2t. Since L sin(2t) = 2

s2+4 , wededuce that L f(t) = 2

(s2+4)2 , so taking the inverse Laplace transform we get f(t) =18 sin(2t) − t

4 cos(2t). The bottom line is that the differential equation was transformedinto an algebraic one, which was easy to solve, and we deduced the differential solutionfrom the algebraic one using the inverse Laplace. This idea is also exploited in partialdifferential equations, using the Fourier transform. For example, if we have the PDEP (−i∂x,−i∂t)u(x, t) = 0 and we take the Fourier transform, we get P (ξ,−i∂t)u(ξ, t) = 0.This is a PDE in t only; which is a lot easier to solve. We then get the solution of theoriginal PDE using the inverse Fourier transform. For details, see e.g. [11].

There are many directions to pursue from where we stopped. One can for example con-tinue with harmonic analysis; see [8] for an elementary treatment. Another route is the the-ory of pseudo-differential operators and microlocal/semiclassical analysis. Let us mentionan example of such operators. As we know, if f ∈ C∞c (Rn), then (Dαf)(ξ) = (iξ)αf(ξ).So denoting Dα = (i)−|α|Dα, we get (Dαf)(ξ) = ξαf(ξ). Using the inversion formula,this reads (Dαf)(x) =

∫ξαf(ξ)eix·ξ dm(ξ). More generally, given a partial differen-

tial operator P (x,D) =∑cα(x)Dα, we get [P (x,D)f ](x) =

∫p(x, ξ)f(ξ)eix·ξ dm(ξ),

where p(x, ξ) :=∑cα(x)ξα is called the symbol of P . However, this last operator has

a meaning even if p(x, ξ) is not a polynomial in ξ. We obtain pseudo-differential op-erators by using different p(x, ξ). For example, if λs(ξ) = (1 + |ξ|2)s/2, we may define[λs(D)f ](x) =

∫λs(ξ)eix·ξ f(ξ) dm(ξ) for arbitrary s > 0. Differential operators are thus

thought of as functions p(x, ξ) on the phase space Rnx × Rnξ . To work microlocally meansto work on the phase space rather than on Rn. This leads to a more coherent theory. Aclassic treatise here is the 4 volumes [21]. Shorter accounts can be found in [11], [43], [33]and [45].

Page 38: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

34 Chapter 2. Fourier analysis

2.6 Exercises1. Compute the following limit and justify the calculation:

limt→0

∫ ∞0

sin(xt)t· 1x(1 + x2) dx .

2. Let F (t) =∫ 5

0ext√x

dx for t ∈ R. Compute F ′(0) and justify your answer.3. Let (X,µ) be a σ-finite measure space and K ∈ Lp(X ×X,µ× µ), 1 ≤ p ≤ ∞. Define

(TKf)(x) =∫X f(y)K(x, y) dµ(y) for f ∈ Lq(X), where 1

p + 1q = 1. Show that TK is a

bounded linear operator from Lq(X)→ Lp(X), with ‖TK‖ ≤ ‖K‖Lp(X×X).4. Assuming all integrals in question exist, prove that

(a) f ∗ g = g ∗ f .(b) (f ∗ g) ∗ h = f ∗ (g ∗ h).(c) For z ∈ Rn, τz(f ∗ g) = (τzf) ∗ g = f ∗ (τzg).(d) supp(f ∗ g) ⊆ supp(f) + supp(g), where A+B = a+ b : a ∈ A, b ∈ B.

5. Let f(x) = e−x2 and g(x) = x on R. Compute (f ∗ g)(x).

6. Let p, q be conjugate exponents, 1 ≤ p ≤ ∞. Let f ∈ Lp(Rn) and g ∈ Lq(Rn).(1) Show that ‖f ∗ g‖∞ ≤ ‖f‖p‖g‖q.(2) Show that f ∗ g is uniformly continuous.

Hint: you must show that limy→0 ‖τy(f ∗ g) − f ∗ g‖∞ = 0. For this, use (1),Exercise 4(c) and continuity of translations in Lp norm.

(3) Show that if 1 < p <∞, so that 1 < q <∞, then f ∗ g ∈ C0(Rn).Hint: choose fn, gn in Cc(Rn) such that ‖fn − f‖p → 0 and ‖gn − g‖q → 0.Show that fn ∗ gn ∈ Cc(Rn), ‖fn ∗ gn − f ∗ g‖∞ → 0 and deduce the result.

7. Let ρ ∈ L1(Rn) with∫ρ(x) dx = a. If f is bounded and uniformly continuous, show

that ρε ∗ f → af uniformly as ε→ 0.8. Suppose f, g ∈ L1(Rn) and z ∈ Rn and a > 0. Prove that

(a) τzf = e−z f , (b) ezf = τz f , (c) Saf = anS1/af .

9. Using a change of variables, show that

f(ξ) =∫

(τ πξ

|ξ|2f)(x)e−ix·ξeπi dm(x) .

Deduce that2f(ξ) =

∫ (f − τ πξ

|ξ|2f)(x)e−ix·ξ dm(x) .

Conclude that |f(ξ)| → 0 as |ξ| → ∞ using continuity of translations in the L1 norm.This gives another proof of the Riemann-Lebesgue lemma.

10. Given a < b, let g = χ[a,b] on R. Compute g(ξ).11. Recall the Schwartz space studied in the exercises of Chapter 1:

S(Rn) = ϕ ∈ C∞(Rn) | pα,β(ϕ) := supx∈Rn

|xαDβϕ(x)| <∞ for all α, β ∈ Nn0 .

The reader proved in particular that S(Rn) is a Fréchet space when endowed with P =pα,β, and that S(Rn) ⊂

⋂1≤p≤∞ L

p(Rn), with ‖ϕ‖Lp ≤ (2π)n|||ϕ|||2n for ϕ ∈ S(Rn).Here |||ϕ|||k = max

|α+β|≤kpα,β(ϕ).

Page 39: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

2.6. Exercises 35

(a) (Continuity of differentiation). Let α ∈ Nn0 and ϕ ∈ S(Rn). Show that Dαϕ ∈S(Rn) with |||Dαϕ|||k ≤ |||ϕ|||k+|α| for all k ∈ N0.

(b) (Continuity of multiplication). Let ψ ∈ C∞(Rn) have polynomial growth. Showthat there are Ck, Nk such that ϕ ∈ S(Rn) =⇒ ψϕ ∈ S(Rn), with |||ψϕ|||k ≤Ck|||ϕ|||k+2Nk . In particular, if ψ(x) = xα, show that |||xαϕ|||k ≤ 2k(α!)|||ϕ|||k+|α|.

(c) (Continuity of Fourier transform). Show that if ϕ ∈ S(Rn), then ϕ ∈ S(Rn), and|||ϕ|||k ≤ (8π)n(k + 1)!|||ϕ|||2n+k.

(d) Show that F : S(Rn)→ S(Rn) is an isomorphism.

12. Recall that a tempered distribution is a continuous linear functional on S(Rn). IfT ∈ S ′(Rn), we define T : S(Rn)→ C by T (ϕ) = T (ϕ).

(a) Check that T ∈ S ′(Rn).(b) Check that if g ∈ L1(Rn), then Tg = Tg.

(c) Compute δb and δ′b for b ∈ Rn.(d) Check that F is continuous on S ′(Rn), in the sense that Tn → T in S ′(Rn) implies

Tn → T in S ′(Rn).

(e) Show that (ix)αT = DαT and DβT = F [(−ix)βT ].

13. Show that if f, g ∈ S(Rn), then f ∗ g ∈ S(Rn).Hint: show that f ∗ g = (2π)n/2F−1(f g).

14. Show that ‖f‖k,p :=(∑|α|≤k ‖Dα

wf‖pp)1/p and Nk,p(f) :=

∑|α|≤k ‖Dα

wf‖p are equivalentnorms on W k,p(Rn).

15. Given s > 0, let λs(ξ) = (1 + |ξ|2)s/2. We define 7

Hs(Rn) = u ∈ L2(Rn) : λsu ∈ L2(Rn)

and endow it with ‖u‖2(s) =∫|λs(ξ)u(ξ)|2 dm(ξ).

(1) Show that |λs+1(ξ)u(ξ)|2 = |λs(ξ)u(ξ)|2 +∑nj=1 |λs(ξ)∂wj u(ξ)|2.

(2) Deduce that u ∈ Hs+1(Rn) ⇐⇒ u, ∂w1 u, . . . , ∂wn u ∈ Hs(Rn), with

‖u‖2(s+1) = ‖u‖2(s) +∑|α|=1

‖Dαwu‖2(s) .

(3) Deduce that Hk(Rn) = W k,2(Rn) for k ∈ N.

16. (Heisenberg uncertainty principle). Let f ∈ C1c (R).

1. Show that ‖f‖2L2 = −2 Re∫R xff

′.

2. Deduce that ‖f‖2L2 ≤ 2‖xf‖L2‖ξf‖L2 .

3. Deduce that for any x0, ξ0 ∈ R, ‖f‖2L2 ≤ 2‖(x− x0)f‖L2‖(ξ − ξ0)f‖L2 .Hint: consider g(x) = e−iξ0xf(x+ x0).

Thus, f and f cannot both be sharply localized about single points x0, ξ0. Thisinequality can be generalized to all f ∈ L2(R) (but the RHS may be infinite).

7. Actually, Hs(Rn) is defined for all s ∈ R, not just s > 0, but in this case, we consider u ∈ S ′(Rn).

Page 40: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

36 Chapter 2. Fourier analysis

17. (Fourier series). Let T = R/(2πZ) be the one-dimensional torus, which is obtainedfrom R by identifying points which differ by 2πk, for some k ∈ Z. We denote Lp(T) =Lp[0, 2π] and regard functions in Lp(T) as maps defined on the circle T.It is well known that if ek(x) = 1√

2πeikx on [0, 2π], then ekk∈Z is an orthonormal

basis of L2(T). It follows from Hilbert space theory that any f ∈ L2(T) can be writtenas f =

∑k∈Z〈f, ek〉ek, and that 〈f, g〉 =

∑k∈Z〈f, ek〉〈g, ek〉 for any f, g ∈ L2(T). Now

if for f ∈ L2(T) and k ∈ Z we define

(Ff)k = fk = 1√2π

∫ 2π

0f(x)e−ikx dx ,

we have 〈f, ek〉 = fk. Hence, f =∑k∈Z fkek. Moreover, since

∑k∈Z |fk|2 converges to

‖f‖2L2(T), it follows that F : L2(T)→ `2(Z), and

〈f, g〉L2(T) = 〈f , g〉`2(Z) (Parseval’s identity).

1. Let C1(T) = f ∈ C1[0, 2π] : f(0) = f(2π). Show that for f ∈ C1(T), we have(f ′)k = ikfk.

2. Let λs(k) = (1 + k2)s/2 for k ∈ Z. Given s > 0, we define the Sobolev space

Hs(T) =f ∈ L2(T) :

∑k∈Z

λ2s(k)|fk|2 <∞

and endow it with the norm ‖f‖2(s) =∑k∈Z λ

2s(k)|fk|2. Note that ‖f‖(0) = ‖f‖L2(T)by Parseval.We say that h ∈ L2(T) is a weak derivative of f ∈ L2(T) if 〈h, ϕ〉 = −〈f, ϕ′〉 for allϕ ∈ C1(T). In this case, we denote h = f ′w. Show that

H1(T) = f ∈ L2(T) : f ′w exists and f ′w ∈ L2(T) ,

and that ‖f‖2(1) = ‖f‖2L2(T) + ‖f ′w‖2L2(T).

3. Let SN =∑Nk=−N fkek. Show that if s > 1/2, there exists Cs such that for all

f ∈ Hs(T), we have ‖SN − f‖∞ ≤ CsNs−1/2 ‖f‖(s).

Hence, in this case, the Fourier series SN converges not only in L2(T) (which isknown by Hilbert space theory), but also uniformly.(Hint: ‖SN − f‖∞ ≤ 1√

2π∑|k|>N |fk| ≤ 1√

2π‖f‖(s)( ∫∞

N r−2s dr)1/2.)

4. Deduce the following Sobolev embedding: if s > 1/2, then Hs(T) → C(T).(Hint: SN ∈ C(T) for each N .)

5. Let f ∈ C1(R) and suppose there exist C, ε > 0 such that

|(1 + x2)1/2+εf(x)| ≤ C and |(1 + x2)1/2+εf ′(x)| ≤ C

for all x ∈ R. Show that the Poisson summation formula holds:∑k∈Z

f(x+ 2πk) =∑k∈Z

f(k)ek(x) .

Here f is the Fourier transform of L1(R) functions, i.e. f(k) = 1√2π∫R f(x)e−ixk dx.

In particular, for x = 0 we get∑k∈Z f(2πk) = 1√

2π∑k∈Z f(k).

(Hint: let g(x) =∑k∈Z f(x + 2πk) for x ∈ [0, 2π]. The hypotheses imply that this

series converge uniformly, so g ∈ C1(T) ⊂ H1(T). Hence, g =∑gkek. By (3),

the Fourier series of g converges uniformly, so we may calculate gk by integratingtermwise.

Page 41: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

2.6. Exercises 37

6. Deduce that coth(πa) = 1 + e−2πa

1− e−2πa = 1π

∑k∈Z

a

a2 + k2 .

(Hint: Apply Poisson to f(x) = e−a|x|.)

Page 42: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions
Page 43: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

Chapter 3

Some Important Theorems

This chapter is a potpourri of classical results which are regularly used in analysis, andgenerally assumed to be well known. From now on, X is a vector space over K = R orC, and if X is locally convex, we denote its topological dual by X∗ instead of X ′. Thus,X∗ = λ : X → K | λ is linear and continuous .

3.1 Measure theory

We start by giving some fundamental results of measure theory, which are not alwayscovered in undergraduate courses. Here we follow [31].

Definition 3.1. A positive measure is a countably additive function defined on a σ-algebra, with range in [0,∞]. A complex measure is a complex-valued countably additivefunction defined on a σ-algebra.

Hence, a positive measure is just an ordinary measure with ∞ as an admissible value.On the other hand, when we talk about complex measures, µ(E) is always a complexnumber. So a positive measure is not necessarily a complex measure.

Definition 3.2. Let µ be a complex measure on a σ-algebra M. We define the totalvariation of µ as the function |µ| : M→ [0,∞] given by |µ|(E) := sup

∑∞i=1 |µ(Ei)|, where

the supremum is taken over all measurable partitions Ei∞i=1 of E.

Lemma 3.3. |µ| is a positive measure on M, and |µ|(X) <∞.

Proof. See the Appendix, Section 3.7.

Definition 3.4. Let µ be a positive measure on a σ-algebra M and let ν be a positiveor complex measure on M. We say that ν is absolutely continuous with respect to µ, anddenote ν µ, if for each E ∈M : µ(E) = 0 implies ν(E) = 0.

We say that ν is concentrated on a set A ∈M if ν(E) = ν(A ∩ E) for every E ∈M.We say that two measures ν1 and ν2 on M are mutually singular, and denote ν1 ⊥ ν2,

if there are disjoint A,B such that ν1 is concentrated on A and ν2 is concentrated on B.

The following result is “probably the most important theorem in measure theory”,according to [31].

Theorem 3.5 (Lebesgue-Radon-Nikodym). Let (X,M) be a measurable space. Let µ bea σ-finite positive measure on M and ν a complex measure on M.

Page 44: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

40 Chapter 3. Some Important Theorems

(a) There is a unique pair of complex measures νa and νs on M such that ν = νa + νs,νa µ and νs ⊥ µ.

(b) There is a unique h ∈ L1(µ) such that νa(E) =∫E h dµ for every E ∈M.

Proof. See the Appendix, Section 3.7.

This theorem has many applications. For instance, any complex measure gets a polardecomposition dµ = hd|µ| for some measurable h with |h(x)| = 1. Consequently. we candefine integration w.r.t µ by

∫f dµ :=

∫fhd|µ|. One also obtains the Hahn decomposition

for signed measures µ. Namely, X can be written as a disjoint union of two measurablesets A,B such that µ is positive on A and negative on B. Finally, we obtain the followingimportant consequence:

Theorem 3.6. Let (X,M, µ) be a measure space with µ σ-finite. Let 1 ≤ p < ∞ and qits conjugate exponent (i.e. 1

p + 1q = 1). Define Φ : Lq(µ) → Lp(µ)∗ by g 7→ λg, where

λg(f) =∫fg. Then Φ is an isometric isomorphism.

In particular, the dual of `p is `q (take X = N with the counting measure).

Proof. See the Appendix, Section 3.7.

We conclude this section by discussing the dual of C(X).

Definition 3.7. Let X be a locally compact Hausdorff space. We say that µ is a Borelmeasure on X if µ is defined on the σ-algebra of Borel sets.

If µ is positive, we say it is regular if for every Borel set E ⊆ X, we have

µ(E) = infV⊇E, V open

µ(V ) = supK⊆E, K compact

µ(K) .

If µ is complex, we say it is regular if |µ| is regular. We denote

M (X) = regular complex Borel measures on X ,M+,1(X) = regular Borel probability measures on X ,

and endow M (X) and M+,1(X) with the total variation norm ‖µ‖ = |µ|(X).

Theorem 3.8 (Riesz-Markov). Let X be a locally compact Hausdorff space and defineΦ : M (X) → C0(X)∗ by µ 7→ λµ, where λµ(f) =

∫f dµ. Then Φ is an isometric

isomorphism.If X is compact, the map µ 7→ λµ gives an isometric isomorphism of M+,1(X) with

C(X)∗+,1 = λ ∈ C(X)∗ : ‖λ‖ = 1 and λ(f) ≥ 0 whenever f ≥ 0.

This result is also known as the “Riesz representation theorem”.

Proof. See the Appendix, Section 3.7 for a sketch.

Before moving on, note that by Theorem 3.6, if p > 1, then Lp is reflexive, i.e. (Lp)∗∗ =(Lq)∗ = Lp. However, (L1)∗ = L∞ but (L∞)∗ 6= L1. It turns out that L∞(µ)∗ is thespace of bounded finitely additive signed measures ν which are absolutely continuous withrespect to µ. This is known as the Kantorovitch representation theorem, and is somehowreminiscent of Riesz-Markov. For a proof of this theorem, see [30, Section 19.3].

Page 45: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

3.2. Separation of convex sets 41

3.2 Separation of convex sets

In this section we follow [32]. Let us first recall the Hahn-Banach theorem:

Theorem 3.9 (Hahn-Banach). Let X be a real vector space and suppose p : X → Rsatisfies p(x + y) ≤ p(x) + p(y) and p(tx) = tp(x) for x, y ∈ X and t ≥ 0. Let Y be asubspace of X. If λ : Y → R is linear and λ(x) ≤ p(x) on Y , then λ has a linear extensionL : X → R such that L(x) = λ(x) on Y and L(x) ≤ p(x) on X.

Although this theorem is sometimes stated in the special case of normed spaces, theproof of the above statement is exactly the same. Let us also recall some properties of theMinkowski functional defined in the Exercise 2 of Chapter 1.

If C is a convex absorbing set in a vector space X, we define the Minkowski functionalρC : X → R by ρC(x) = inft > 0 : t−1x ∈ C. The reader was asked to prove thatρC(x + y) ≤ ρC(x) + ρC(y) and ρC(tx) = tρC(x) if t ≥ 0. If C is moreover open, thenC = x ∈ X : ρC(x) < 1. Indeed, if x ∈ C, then (1 + ε)x ∈ C for ε > 0 small enough,so ρC(x) ≤ 1

1+ε < 1. Conversely, if ρC(x) < 1, then ∃ t ∈ (0, 1) with t−1x ∈ C, sox = t(t−1x) + (1− t)0 ∈ C.

We now prove the following geometric version of the Hahn-Banach theorem.

Theorem 3.10. Suppose X is a locally convex space defined by a separating family ofseminorms. Let A,B ⊂ X be disjoint nonempty convex sets.(a) If A is open, there exists λ ∈ X∗ and γ ∈ R such that Reλ(a) < γ ≤ Reλ(b) for every

a ∈ A and b ∈ B.(b) If A is compact and B is closed, there exists λ ∈ X∗ and γ1, γ2 ∈ R such that Reλ(a) <

γ1 < γ2 < Reλ(b) for every a ∈ A and b ∈ B.

This is sometimes called the separating hyperplane theorem. That’s because, if wedefine Hγ = y ∈ X : λ(y) = γ, then this theorem says that A can be separated from Bby the hyperplane Hγ . This is certainly not true if A or B is not convex, even in R2.

Proof. It suffices to prove the theorem for real scalars. Indeed, if the field is C and thereal case is proved, then there is a continuous real-linear λ1 on X that gives the requiredseparation. Define λ(x) = λ1(x)− iλ1(ix). Then λ is complex-linear and continuous, i.e.λ ∈ X∗, and Reλ = λ1. Hence, we assume the field is R.

For (a), let −x0 ∈ A − B and let C = A − B + x0. Then 0 ∈ C, C is open, thusabsorbing, and convex. Let ρC be the Minkowski functional of C. Since A ∩B = ∅, thenx0 /∈ C, so ρC(x0) ≥ 1. Define λ on Y = tx : t ∈ R by λ(tx0) = t. Then for t ≥ 0,we have λ(tx0) = t ≤ tρC(x0) = ρC(tx0), and for t < 0, λ(tx0) < 0 ≤ ρC(tx0). Thus,λ(y) ≤ ρC(y) on Y . So by Hahn-Banach, we may extend λ linearly to all X and haveλ(x) ≤ ρC(x) on X.

Now if y ∈ C ∩ (−C), then ρC(±y) ≤ 1, so λ(y) ≤ 1 and −λ(y) = λ(−y) ≤ 1. Thus,|λ(y)| ≤ 1 on V = C ∩ (−C), so λ is continuous. Indeed, since V is open and 0 ∈ V , thengiven ε > 0, we may choose δ > 0 and p ∈ P such that B0,δ(p) ⊂ εV . Then z ∈ B0,δ(p)implies z = εy, y ∈ V , implies |λ(z)| = ε|λ(y)| ≤ ε, so λ is continuous at 0, so by linearity,λ is continuous on X.

Finally, if a ∈ A and b ∈ B, then λ(a)−λ(b) + 1 = λ(a− b+ x0) ≤ ρC(a− b+ x0) < 1,since λ(x0) = 1, a−b+x0 ∈ C and C is open. Thus, λ(a) < λ(b). It follows that λ(A) andλ(B) are disjoint convex subsets of R. Moreover, λ(A) is open. Indeed, if λ(x) ∈ λ(A),there is ε > 0 with Bx,ε(p) ⊆ A for some p ∈ P, since A is open. Taking a small ε′ > 0

Page 46: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

42 Chapter 3. Some Important Theorems

we have x ± ε′x0 ∈ Bx,ε(p). Hence, λ(x ± ε′x0) ∈ λ(A), i.e. λ(x) ± ε′ ∈ λ(A), so λ(A) isopen. Hence, if we let γ = supa∈A λ(a), we get (a). 1

For (b), let V be a convex neighborhood of 0 in X such that (A+ V )∩B = ∅. 2 Thenapplying (a) to A + V in place of A, there exists λ ∈ X∗ such that λ(x) < γ ≤ λ(b) forevery x ∈ A + V and b ∈ B. Let γ0 = supλ(A) = λ(a0), since A is compact. Thenλ(a) ≤ γ0 = λ(a0) < γ ≤ λ(b) for any a ∈ A and b ∈ B, so taking γ1, γ2 such thatγ0 < γ1 < γ2 < γ, we get (b).

Corollary 3.11. If X is a locally convex space defined by a separating family of semi-norms, then X∗ separates points.

Proof. If x1, x2 ∈ X, x1 6= x2, apply Theorem 3.10(b) to A = x1 and B = x2.

3.3 Banach-Alaoglu

Definition 3.12. Let X be a locally convex space defined by a separating family ofseminorms.(a) The weak topology on X is the topology induced by the family pλ : λ ∈ X∗, where

pλ(x) := |λ(x)|. This family of seminorms is separating by Corollary 3.11.(b) The weak* topology on X∗ is the topology on X∗ defined by the family of seminormspx : x ∈ X, where px(λ) := |λ(x)|. This family is separating by definition.

Remarks 3.13. (i) The weak* topology is weaker than the weak topology on X∗, sincethe latter would be induced by pη : η ∈ X∗∗, and any px is in this family, by takingη(λ) := λ(x) for λ ∈ X∗, so that η ∈ X∗∗, and pη(λ) = |η(λ)| = |λ(x)| = px(λ).

(ii) A sequence xn ⊂ X converges weakly to x if pλ(xn− x)→ 0 for every λ ∈ X∗, i.e.if λ(xn)→ λ(x) for every λ ∈ X∗. This is of course weaker than the convergence ofxn to x in norm, since |λ(xn)− λ(x)| ≤ ‖λ‖‖xn − x‖.

(iii) A sequence λn ⊂ X∗ is weak* convergent to λ if px(λn − λ)→ 0 for every x ∈ X,i.e. if λn(x)→ λ(x) for every x ∈ X.

(iv) If X is a Hibert space, than by the Riesz representation theorem, xn convergesweakly to x iff 〈xn − x, z〉 → 0 for each z ∈ X.

Example 3.14. If xn is an orthonormal sequence in an infinite dimensional Hilbertspace, then xn does not converge strongly, since ‖xn − xm‖2 = 2 for n 6= m. However,xn converges weakly to 0, because by Bessel’s inequality,

∑∞n=1 |〈xn, x〉|2 converges for

every x, so 〈xn, x〉 → 0 for every x. This shows that weak convergence is strictly weakerthan convergence in norm.

Similarly, in `p with 1 < p <∞, the sequence e1 = (1, 0, 0, . . .), e2 = (0, 1, 0, 0, . . .), etcconverges weakly to zero, but not in norm. Indeed, if 1

p + 1q = 1, then (`p)∗ = `q via the

identification `q 3 u ↔ fu ∈ (`p)∗, where fu(x) =∑xiui for x ∈ `p. Since

∑|ui|q < ∞

for every u ∈ `q, then fu(ei) = ui → 0, so ei converges weakly to 0.In X = C[a, b], if a sequence fn ⊂ X converges weakly to f ∈ X, then fn converges

pointwise to f . Indeed, λ(fn) → λ(f) for any λ ∈ X∗, so fixing t ∈ [a, b] and takingλt(f) := f(t), we have λt ∈ X∗, so fn(t)→ f(t).

1. Since λ(A) is open, then γ /∈ λ(A), for otherwise, there would exist ε > 0 with γ + ε ∈ λ(A), whichcontradicts that γ = supλ(A). Hence, λ(a) < γ for all a.

2. If X is a normed space, just take a sufficiently small ball. In general, use [32, Theorem 1.10].

Page 47: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

3.3. Banach-Alaoglu 43

We now proceed to the most useful property of the weak* topology. We denote by X∗1the closed unit ball of X∗, i.e. X∗1 = λ ∈ X∗ : ‖λ‖ ≤ 1. The following proof will involvethe product topology. We remind the reader (see Exercise 6 in Chapter 1), that if Yαα∈Iis a collection of topological spaces and Y =

∏α∈I Yα, then the product topology on Y is

the weak topology generated by the projections πα : Y → Yα which map (xβ) 7→ xα. Inother words, it is the weakest topology on Y which makes every πα continuous.

Theorem 3.15 (Banach-Alaoglu). For every normed space X, the ball X∗1 is compact inthe weak* topology.

Proof. Given x ∈ X, let Bx = c ∈ K : |c| ≤ ‖x‖. Then Bx is compact, so K :=∏x∈X Bx

is compact in the product topology, by Tychonoff’s theorem. Define i : X∗1 → K byλ 7→ (λ(x))x∈X . This is well-defined: if λ ∈ X∗1 , then ‖λ‖ ≤ 1 and thus λ(x) ∈ Bx.Clearly, i is injective. Moreover, if we let Kx,y,a,b = κ ∈ K : κax+by = aκx + bκy forx, y ∈ X and a, b ∈ K, then each Kx,y,a,b is closed 3 in K, and i(X∗1 ) =

⋂x,y,a,bKx,y,a,b.

Thus, i(X∗1 ) is closed in K, hence compact. Finally, if Bλ,ε(px1 , . . . , pxk) is a semiball inX∗1 , then i

(Bλ,ε(px1 , . . . , pxk)

)= (η(x))x∈X : |λ(x1)− η(x1)| < ε∧ . . .∧ |λ(xk)− η(xk)| <

ε =⋂ki=1 π

−1xi (λ(xi) − ε, λ(xi) + ε), which is open in K since all the πx are continuous.

Hence, the image of any open set under i is open, so i−1 is continuous on i(X∗1 ). Hence,i−1(i(X∗1 )

)= X∗1 is compact (because i−1 maps compact sets to compact sets).

Theorem 3.16 (Sequential Banach-Alaoglu). For every separable normed space X, theball X∗1 is sequentially compact in the weak* topology.

This theorem follows from Banach-Alaoglu because ifX is separable, thenX∗1 is metriz-able, so compactness is equivalent to sequential compactness. However, we shall give analternative proof which does not rely on Tychonoff, and only uses a “diagonal trick”.

Proof. Let xk∞k=1 be dense in X. The sequence fn(x1)n∈N is bounded in C, since|fn(x1)| ≤ ‖fn‖‖x1‖ ≤ ‖x1‖ for all n. Hence, fn(x1) has a convergent subsequencefϕ1(n)(x1)n∈N by Bolzano-Weierstrass. The sequence fϕ1(n)(x2) is also bounded, soit has a convergent subsequence fϕ1ϕ2(n)(x2)n∈N. And so on. Let ψk = ϕ1 · · · ϕk.Then fψk(n)(xk)n∈N converges for each k.

Let gn = fψn(n). Then for any k ∈ N, the sequence gn(xk)n∈N converges, becausen > k implies that gn(xk) is a subsequence of the convergent sequence fψk(n)(xk)n∈N.Define g(xk) := limn→∞ gn(xk). Then |g(xk)| = limn |gn(xk)| ≤ lim‖gn‖‖xk‖ ≤ ‖xk‖. So gis a bounded linear map on the dense set xk, so g has a unique bounded linear extensiong defined on all X. Let x ∈ X, say x = limk xϕ(k). Then

|g(x)− gn(x)| ≤ |g(x− xϕ(k))|+ |g(xϕ(k))− gn(xϕ(k))|+ |gn(x− xϕ(k))|≤ (‖g‖+ sup

n‖gn‖)‖x− xϕ(k)‖+ |g(xϕ(k))− gn(xϕ(k))| ,

and supn ‖gn‖ ≤ 1. Given ε > 0, choose k large enough so that the first term is smaller thanε/2. Then |g(x)−gn(x)| ≤ ε/2+|g(xϕ(k))−gn(xϕ(k))|. But gn(xϕ(k))→ g(xϕ(k)) = g(xϕ(k))as n → ∞. Hence, gn(x) → g(x). Since x is arbitrary, the subsequence (gn) of (fn) isweak* convergent to g.

Corollary 3.17. Any bounded sequence in a Hilbert space has a weakly convergent subse-quence.

3. as it is the preimage of the weak* closed set 0 under the map κ 7→ κax+by − aκx − bκy, which iscontinuous in the product topology.

Page 48: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

44 Chapter 3. Some Important Theorems

Proof. Let xn be bounded in H and let H0 = sp xn. Then H0 is separable. Assume‖xn‖ ≤M for all n and let fn(y) = 〈y, xnM 〉 for y ∈H0. Then ‖fn‖ = ‖xnM ‖ ≤ 1, so fn is abounded sequence in (H0)∗1, and has a weak* convergent subsequence by Theorem 3.16,say fnk(y) converges for all y ∈H0. Hence, 〈y, xnk〉 converges for all y ∈H0. Let P bethe orthogonal from H →H0 and let Q = I−P . Then given x ∈H , 〈Qx, xnk〉 = 0, sinceQx ⊥ H0. Hence, 〈x, xnk〉 = 〈Px, xnk〉 + 〈Qx, xnk〉 = 〈Px, xnk〉 converges. As x ∈ H isarbitrary, xnk converges weakly.

This result can be strengthened as follows: a Banach space is reflexive iff every boundedsequence has a weakly convergent subsequence. This is known as the Eberlein-Šmuliantheorem. For a proof, see [10, Chapter 9.8].

Example 3.18. Let µ0 ∈ M+,1([0, 1]) be the probability measure supported on 0,and µn ∈ M+,1([0, 1]) be the normalized Lebesgue measure supported on [0, 1/n]. ByTheorem 3.8, we may identify M+,1([0, 1]) with a subspace of C[0, 1]∗. Choose fn ∈ C[0, 1]such that ‖fn‖ = 1,

∫fn dµn = 1/2 and fn(0) = 0. 4 Then

∫fn dµ0 = fn(0) = 0. Hence,

‖µn− µ0‖ = sup‖f‖=1 |∫f d(µn− µ0)| ≥ 1/2, so µn does not converge to µ0 in norm. But

µn converges to µ0 in the weak* topology. Indeed, if f ∈ C[0, 1], then given ε > 0, wemay choose n large enough so that |f(x)− f(0)| < ε on [0, 1/n] (by continuity of f at 0).Then |

∫f dµn −

∫f dµ0| = |

∫(f(x)− f(0)) dµn(x)| < ε.

Corollary 3.19. If X is a compact Hausdorff space, then M1,+(X) is compact in theweak* topology.

Proof. Let C(X)∗1,+ =⋂f≥0λ ∈ C(X)∗1 : λ(f) ≥ 0. For each f , λ : λ(f) ≥ 0 is closed

in the weak* topology. Thus, C(X)∗1,+ is closed in C(X)∗1, hence compact by Banach-Alaoglu. Thus, M1,+(X) is compact by Theorem 3.8.

Our last application of Banach-Alaoglu gives an interesting property of C(K).

Theorem 3.20 (Universality of C(K)). Every normed space X is isometrically embeddedinto C(K) for some compact topological space K.

Proof. Let K = X∗1 , endowed with the weak* topology. Then K is compact by Banach-Alaoglu. Define ϕ : X → C(K) by ϕ : x 7→ Ex, where Ex(f) = f(x). Note that Ex isindeed continuous by definition of the weak* topology. Clearly ϕ is linear, and it is anisometric embedding because ‖Ex‖ = supf∈K |Ex(f)| = supf∈X∗1 |f(x)| = ‖x‖, where weused the Hahn-Banach theorem in the last equality.

Theorem 3.21 (Banach-Mazur). Every separable normed space X is isometrically em-bedded into C[0, 1].

Proof. (Sketch). When X is separable, the weak* topology on X∗1 becomes metrizable, soX can be embedded in C(K) for a compact metric space K, and this essentially finishesthe proof. For the details, see [25, Theorem 3.12] or [5, Corollary 12.14].

3.4 Markov-Kakutani and Haar

Definition 3.22. Let T : X → X be a map on a set X. We say that x ∈ X is a fixedpoint of T if Tx = x.

4. e.g. fn(x) = nx if x ∈ [0, 1/n], fn(x) = 2− nx if x ∈ [1/n, 2/n] and fn(x) = 0 on [2/n, 1].

Page 49: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

3.4. Markov-Kakutani and Haar 45

The reader probably learned the Banach fixed point theorem as an undergraduate.Namely, if X is a complete metric space and T : X → X is a strict contraction (meaningd(Tx, Ty) ≤ kd(x, y) with 0 ≤ k < 1), then T has a unique fixed point.

Here is another fixed point theorem:

Theorem 3.23 (Leray-Schauder-Tychonoff). Let X be a locally convex space defined bya separating family of seminorms. Let K be a nonempty compact convex subset of X andlet T : K → K be continuous. Then T has a fixed point.

This theorem is much more difficult to prove; see [9, Chapter V.10]. It generalizesthe Brouwer fixed point theorem, which says that if B1 is the closed unit ball of Rn andT : B1 → B1 is continuous, then T has a fixed point; a result which is already quite deep.

Following [7, 44], we shall prove a stronger result by more elementary means, but inthe special case where the maps are linear in some sense.

Definition 3.24. Let X, Y be vector spaces, C a convex subset of X. We say thatT : C → Y is affine if T (

∑αjxj) =

∑αjT (xj) whenever xj ∈ C, αj ≥ 0 and

∑αj = 1.

Theorem 3.25 (Markov-Kakutani). Let X be a locally convex space defined by a separat-ing family of seminorms. Let K be a nonempty compact convex subset of X and let F bea family of commuting continuous affine maps of K into itself. Then there is an x0 ∈ Ksuch that T (x0) = x0 for all T ∈ F .

Proof. If T ∈ F and n ≥ 1, define T (n) = 1n

∑n−1k=0 T

k. Then T (n) : K → K, since K isconvex. Let K = T (n)(K) : T ∈ F , n ≥ 1. If T1, . . . , Tp ∈ F and n1, . . . , np ≥ 1, then thecommutativity of F implies that ∅ 6= T

(n1)1 · · ·T (np)

p (K) ⊆ ∩pj=1T(nj)j (K). Thus, K has the

finite intersection property. Since K is a collection of compact sets, it follows that thereis an x0 in

⋂B : B ∈ K. We claim that x0 is the required common fixed point.

Let T ∈ F and n ≥ 1. Then x0 ∈ T (n)(K), so ∃x ∈ K such that x0 = T (n)(x) =1n(x + T (x) + · · · + Tn−1(x)). Hence, T (x0) − x0 = 1

n(T (x) + . . . + Tn(x)) − 1n(x + . . . +

Tn−1(x)) = 1n(Tn(x) − x). Let p be one of the seminorms defining the topology. Then

for each n we have p(T (x0) − x0) ≤ 2Bn , where B = supp(y) : y ∈ K < ∞ since K is

compact. Since this holds for each n, it follows that p(T (x0)−x0) = 0. Since p is arbitraryand P is separating, we thus get T (x0) = x0.

Definition 3.26. We say that G is a compact abelian group if G is a compact topologicalspace which is also an abelian group.

We say that a measure µ on G is invariant if∫f dµ =

∫fg dµ for any g ∈ G and

f ∈ C(G), where fg is the translate of f , namely fg(h) = f(h− g).

Corollary 3.27 (Haar measure). There exists an invariant measure on any compactabelian group G. This measure is unique up to positive scalar multiple. We call it theHaar measure on G.

This corollary can be generalized to locally compact groups. Moreover, the group canbe non-abelian, but in this case one gets left and right Haar measures.

Proof. We only prove existence; uniqueness is proved by different arguments.Let µ ∈ M1,+(G) and define Tgµ by Tgµ(f) = µ(fg). Then Tg maps M1,+(G) →

M1,+(G) and is continuous in the vague (weak*) topology. Indeed, given ε > 0 andf ∈ C(G), we have pf (µ) = |µ(f)|, so |pf (Tgµ− Tgν)| = |pfg(µ− ν)|. Choosing δ = ε, wehave |pfg(µ− ν)| < δ =⇒ |pf (Tgµ− Tgν)| < ε. Hence, Tg is continuous.

Page 50: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

46 Chapter 3. Some Important Theorems

Clearly, M1,+(G) is convex, and M1,+(G) is compact by Corollary 3.19. Finally, thevarious Tg commute since G is abelian, and they are affine. It follows from Markov-Kakutani that there exists a common fixed point µHaar with the desired invariance prop-erty.

Example 3.28. (a) If G = Rn, the Haar measure is the Lebesgue measure.(b) If G = S1 is the unit circle, the Haar measure is arc length. For the torus Tn = (S1)n,

the Haar measure is the product of the arc length measures on the factors.(c) If G is discrete, the Haar measure is the counting measure.

3.5 Krein-MilmanThe Krein-Milman theorem describes compact convex sets in terms of their extreme

points. We shall follow [44]. If X is a vector space and x, y ∈ X, we denote

[x, y] = tx+ (1− t)y : t ∈ [0, 1] and (x, y) = tx+ (1− t)y : t ∈ (0, 1) .

Definition 3.29. Let X be a vector space and let A ⊆ X be convex.If x ∈ A, we say that x is an extreme point of A if

x ∈ [y, z] for some y, z ∈ A =⇒ x = y or x = z .

We let ex(A) be the set of extreme points of A.If ∅ 6= B ⊆ A is convex, we say that B is an extreme set in A if

y, z ∈ A and (y, z) ∩B 6= ∅ =⇒ [y, z] ⊆ B .

Remark 3.30. x is an extreme point of A iff x is an extreme set in A. Indeed, if x is anextreme point of A and x∩ (y, z) 6= ∅, then x ∈ (y, z), so x ∈ [y, z] but x 6= y and x 6= z,a contradiction. Hence, x ∩ (y, z) = ∅ for any y, z ∈ A, so x is trivially an extremeset. Conversely, if x is an extreme set and x ∈ [y, z], then x = y or x = z or y < x < z.The last case is impossible, since it implies x ∩ (y, z) 6= ∅, and hence [y, z] ⊆ x, i.e.y = z = x, a contradiction. Hence, x is an extreme point.

Example 3.31. (a) In Rn, the extreme points of a closed convex polyhedron are just thevertices. Every edge, face, etc. is an extreme set. Half a face is not an extreme set.

(b) The set of extreme points of a closed euclidean ball in Rn is the boundary sphere.(c) If X is a compact Hausdorff space, then ex(M+,1(X)) = δx : x ∈ X, where δx is

the probability measure supported on x. Indeed, if C ⊂ X has 0 < µ(C) < 1 andµC(B) := µ(C)−1µ(B ∩ C), µX\C(B) := µ(X \ C)−1µ(B \ C), then taking θ = µ(C)we get µ = θµC + (1− θ)µX\C , so µ is not an extreme point.If µ has the property that µ(A) is 0 or 1 for each A ⊆ X, and if x 6= y are bothin suppµ, then we can find disjoint open sets B,C with x ∈ B and y ∈ C. Hence,1 = µ(B ∪ C) = µ(B) + µ(C), so µ(B) = 0 or µ(C) = 0 by the 0 -1 hypothesis. Thismeans that x and y cannot be both in suppµ, so µ = δx for some x.Thus, ex(M+,1(X)) ⊆ δx : x ∈ X. Suppose δx = aµ+ bν with a+ b = 1. If x ∈ A,then 1 = δx(A) = aµ(A) + bν(A). Now µ(A) ≤ 1 and ν(A) ≤ 1. If µ(A) < 1 orν(A) < 1, then aµ(A) + bν(A) < a+ b = 1, a contradiction. Hence, µ(A) = ν(A) = 1.Similarly, if x /∈ A, then since µ(A) ≥ 0 and ν(A) ≥ 0, we get µ(A) = ν(A) = 0.Hence, µ = ν = δx, so δx ∈ ex(M+,1(X)). Thus, ex(M+,1(X)) = δx : x ∈ X.

Page 51: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

3.6. Stone-Weierstrass vs Monotone Class 47

Definition 3.32. Let X be locally convex and let A ⊆ X. We define the closed convexhull of A, denoted co(A), the smallest closed convex set containing A. This set exists sinceintersections of convex/closed sets are convex/closed.

Theorem 3.33 (Krein-Milman). Let X be a locally convex space defined by a separatingfamily of seminorms. If A ⊆ X is compact and convex, then A = co(ex(A)).

Proof. Let B be a closed extreme subset of A. We first prove that B contains an extremepoint of A. Let F be the family of closed subsets of B which are extreme in A. Partiallyorder F by F1 ≥ F2 if F1 ⊆ F2. Then every chain Fαα∈I has an upper bound, namely∩α∈IFα. This is closed, nonempty (Cantor’s intersection theorem) and extreme in A: if(y, z) ∩ (∩α∈IFα) 6= ∅, then (y, z) ∩ Fα 6= ∅ for each α, so [y, z] ⊆ Fα for each α, so[y, z] ⊆ ∩α∈IFα. It follows from Zorn’s lemma that F has a maximal element (i.e. aminimal closed subset of B which is extreme in A), call it F . If x, y ∈ F , x 6= y, then byTheorem 3.10(b), there is an f ∈ X∗ with Re f(x) < Re f(y). Let M = maxz∈F Re f(z),which exists since F is compact, and let D = w ∈ F : Re f(w) = M. Then D is extremein A, 5 D ⊂ F and D 6= F since x /∈ D. This contradicts the minimality of F . Thus, F isa singleton x0. We thus showed that B contains an extreme point of A (namely, x0).

Since ex(A) ⊂ A and A is closed and convex, then C := co(ex(A)) ⊆ A. SupposeC 6= A. Then there is an x ∈ A\C. Since C is closed and convex, by Theorem 3.10(b), wemay find λ ∈ X∗ such that Reλ(y) < α < Reλ(x) for all y ∈ C. LetM = maxz∈A Reλ(z)and B = z ∈ A : Reλ(z) = M. Then B is extreme in A, and since M > α, we haveB ∩ C = ∅. But by the preceding paragraph, B contains an extreme point of A, sayx0 ∈ B ∩ C. This contradiction shows that C = A.

The Krein-Milman theorem has many applications in different fields. For example, itcan be combined with Markov-Kakutani to prove the existence of ergodic measures oncompact spaces (see [44, 36]). A stronger Krein-Milman theorem can be found in [32, 36].This states that for A as in Theorem 3.33, any p ∈ A is the barycenter of a probabilitymeasure µ on ex(A), that is, f(p) =

∫z∈ex(A) f(x) dµ(x) for all f ∈ X∗. This gives a

better representation than the convex series of extreme points given by the usual Krein-Milman. Let us mention, however, that the mere existence of extreme points in compactconvex sets is powerful. In fact, this is all we’ll need in the following section to prove theStone-Weierstrass theorem. The Krein-Milman theorem is also used in game theory tosolve matrix games (see [16]). As a final connection with the idea of extreme points, wemention the Banach-Stone theorem: let X and Y be compact Hausdorff spaces such thatC(X) and C(Y ) are linearly isometric. Then X and Y are homeomorphic. For a proof,see [9, Chapter V.8].

3.6 Stone-Weierstrass vs Monotone ClassIn this section we follow [7, 2, 13].The Stone-Weierstrass theorem is an important extension of the Weierstrass approxi-

mation theorem of C[0, 1]. To motivate the result, let us give some of its consequences:

Example 3.34. (a) Suppose X ⊂ Rn is compact. Let P (X) ⊂ C(X) be the restrictionsof polynomial functions on Rn to X. Then P (X) is dense in C(X).

5. Indeed, if y, z ∈ A and (y, z) ∩D 6= ∅, say v = ty + (1− t)z ∈ D, then (y, z) ∩ F 6= ∅, so [y, z] ⊆ F ,so Re f(y) ≤ M and Re f(z) ≤ M . But Re f(v) = M . If Re f(y) < M or Re f(z) < M , we would getRe f(v) = tRe f(y) + (1− t) Re f(z) < M , a contradiction. Hence, Re f(y) = Re f(z) = M . By linearity,if w ∈ [y, z], then Re f(w) = M . Hence, [y, z] ⊆ D.

Page 52: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

48 Chapter 3. Some Important Theorems

(b) Let S1 ⊂ C be the unit circle. Let P (S1) be the set of functions of the form f(z) =∑Nn=−N anz

N , for arbitrary N ∈ N. Then P (S1) is dense in C(S1).(c) Let X be a compact Hausdorff space and let A ⊂ C(X ×X) be the set of all finite

linear combinations of functions of the form (x, y) 7→ f(x)g(y), where f, g ∈ C(X). 6

Then A is dense in C(X ×X).

Theorem 3.35 (Stone-Weierstrass). Let K be a compact Hausdorff space and A ⊆ C(K)a linear subspace. Suppose that(i) A is a subalgebra, i.e. is closed under multiplication.(ii) The constant function 1 ∈ A .(iii) A separates points, i.e. if x, y ∈ K and x 6= y, there is some f ∈ A such that

f(x) 6= f(y).(iv) If K = C, then f ∈ A implies f ∈ A .Then A is dense in C(K).

Proof. (de Branges). Let A ⊥ = µ ∈ C(K)∗ :∫g dµ = 0 ∀ g ∈ A . It suffices to show

that A ⊥ = 0. 7 Suppose A ⊥ 6= 0. By Banach-Alaoglu, A ⊥1 is weak* compact, so byKrein-Milman, there is an extreme point µ ∈ A ⊥1 . Then µ 6= 0, since 0 = 1

2ν + 12(−ν)

for ν ∈ A ⊥ \ 0 shows that 0 is not extreme. Thus, 0 < ‖µ‖ ≤ 1. If ‖µ‖ < 1, thenµ = ‖µ‖(µ/‖µ‖) + (1−‖µ‖)0 implies µ = µ/‖µ‖ = 0, a contradiction. Hence, ‖µ‖ = 1. Inparticular, if C = suppµ, then C 6= ∅. Fix x0 ∈ C. We show that C = x0.

Let x ∈ C, x 6= x0. By (iii), there is an f1 ∈ A with f1(x0) 6= f1(x) =: β. By (ii),the constant function β ∈ A . Hence, f2 = f1 − β ∈ A , f2(x0) 6= 0 = f2(x). By (iv),f3 = |f2|2 = f2f2 ∈ A . Also, f3(x) = 0 < f3(x0) and f3 ≥ 0. Put f = (‖f3‖ + 1)−1f3.Then f ∈ A , f(x) = 0, f(x0) > 0 and 0 ≤ f < 1. By (i), gf and g(1− f) ∈ A for everyg ∈ A . Since µ ∈ A ⊥, we get 0 =

∫gf dµ =

∫g(1 − f) dµ for every g ∈ A . Hence, fµ

and (1− f)µ ∈ A ⊥. (Here hµ denotes the measure E 7→∫E h dµ).

Put α = ‖fµ‖ =∫f d|µ|. Since f(x0) > 0, there is an open neighborhood U of x0 and

an ε > 0 such that f(y) > ε for y ∈ U . Thus, α =∫f d|µ| ≥

∫U f d|µ| ≥ ε|µ|(U) > 0, since

U ∩ C 6= ∅. Similarly, 1 − f > 0 (as f < 1) implies α < 1. Hence, 0 < α < 1. Moreover,1−α =

∫(1−f) d|µ| = ‖(1−f)µ‖. Hence, µ = α(fµ/‖fµ‖)+(1−α)[(1−f)µ/‖(1−f)µ‖],

which implies µ = fµ/‖fµ‖. But the only way the measures µ and α−1fµ = fµ/‖fµ‖can be equal, is that α−1f = 1 µ-a.e. Since f is continuous, this implies f ≡ α on C.In particular, f(x0) = f(x) = α. But f(x) = 0 6= α. This contradiction shows thatC = x0. Hence, µ = γδx0 for some |γ| = 1. But µ ∈ A ⊥ and 1 ∈ A , so 0 =

∫1 dµ = γ,

a contradiction. Thus, A ⊥ = 0, so A = C(K).

Using Banach-Alaoglu and Krein-Milman to prove Stone-Weierstrass is perhaps overkill.A more standard proof can be found in [28]. The previous theorem can be extended toC0(X) with X locally compact without difficulty; see [7, Corollary 8.3].

We now give an analog of the Stone-Weierstrass theorem which holds in the measurableworld. This is frequently used in probability theory.

Theorem 3.36 (Functional Monotone Class). Let V be a family of bounded real valued-functions on a set Ω such that(i) V is a real vector space.

6. This is just the algebraic tensor product C(X)⊗ C(X).7. If A 6= C(K), take f ∈ C(K) \A . By Hahn-Banach [22, Lemma 4.6-7], there is a λ ∈ C(K)∗ with

λ(f) = 1 and λ(g) = 0 for all g ∈ A . Hence, λ ∈ A ⊥ is nonzero, i.e. A ⊥ 6= 0.

Page 53: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

3.6. Stone-Weierstrass vs Monotone Class 49

(ii) The constant function 1 ∈ V .(iii) If fn ∈ V are nonnegative, increasing and supn ‖fn‖∞ <∞, then f := limn fn ∈ V .Suppose C is a subset of V which is closed under multiplication. Then V contains allbounded σ(C)-measurable functions.

Proof. We first prove that V is closed under uniform convergence. So let fn ⊂ V with‖fn − f‖∞ → 0. By passing to a subsequence, we may assume ‖fn+1 − fn‖∞ ≤ 2−n. Letgn = fn − f1 − 21−n + 1. Then gn ∈ V since V is a vector space containing the constantfunctions. Moreover, ‖gn‖∞ ≤ ‖fn‖∞ + ‖f1‖∞ + 2, so gn is uniformly bounded. Finally,gn+1−gn = fn+1−fn+2−n ≥ 0 for n ≥ 1 and g1 = 0. It follows from (iii) that lim gn ∈ V .Hence, f = (lim gn) + f1 − 1 ∈ V .

Now let L = D ∈ σ(C) : 1D ∈ V . By (i), (ii), (iii), L is a λ-system 8 on Ω. Wewill show that L contains a π-system P generating σ(C). By the Dynkin π-λ theorem,it will follow that L ⊇ σ(P) = σ(C), hence L = σ(C). Consequently, V will contain theindicator function 1D of every D ∈ σ(C), and this will complete the proof since everybounded σ(C)-measurable function is the pointwise limit of an increasing sequence ofsimple functions, and V is closed under such limits by (iii).

Let f1, . . . , fn ∈ C, let M = max1≤i≤n ‖fi‖∞ and let Φ : Cn → C be continuous. Thenby Stone-Weierstrass, Φ is the uniform limit on K = |x| ≤ M of a sequence (Rm) ofpolynomial functions in n variables. By hypothesis on C and V , Rm (f1, . . . , fn) belongsto V . Take Φ(x1, . . . , xn) =

∏ni=1 min

(1, r(xi − ai)+

)for r > 0 and a1, . . . , an ∈ R. The

(increasing) limit of Φ(x1, . . . , xn) as r →∞ is∏ni=1 1(xi>ai). Hence, if f1, . . . , fn ∈ C, the

function∏ni=1 1(fi>ai) belongs to V . Hence, B = ∩ni=1(fi > ai) ∈ L for all a1, . . . , an ∈ R.

It follows that L contains the π-system P consisting of finite intersections of sets of theform f−1(I), where f ∈ C and I ⊂ R is an open halfline. Thus, σ(C) = σ(P) ⊆ L.

Example 3.37. 1. Let (Ω,P) be a probability space, let X,Y be random variables andsuppose E[f(X)g(Y )] = E[f(X)g(X)] for all real f, g ∈ C∞c (R). Then P[X = Y ] = 1.Indeed, let V be the set of all bounded real B(R2)-measurable h : R2 → R such thatE[h(X,Y )] = E[h(X,X)]. Then V satisfies (i)-(iii). Let C be the set of functions(x, y) 7→ f(x)g(y) with f, g ∈ C∞c (R). By hypothesis, C ⊂ V , so by monotone class,V contains all bounded σ(C)-measurable functions. But B(R2) is generated by pro-jections π1(x, y) = x and π2(x, y) = y, which are pointwise limits of fn(x)gn(y) andgn(x)fn(y), respectively, where fn, gn are smooth, fn(t) = t on [−n, n], gn = 1 on[−n, n], fn = gn = 0 outside [−n − 1, n + 1]. Since fn, gn are σ(C)-measurable, thenπ1, π2 are σ(C)-measurable. Hence, σ(C) contains B(R2). Thus, V contains all Borelfunctions. Choosing h = 1∆, where ∆ = (x, x) : x ∈ R, we get P[X = Y ] = 1.

2. Let µωω∈Ω be a family of measures on R. The following statements are equivalent:(a) For all t ∈ R, the map ω 7→

∫χ(−∞,t](x) dµω(x) is measurable.

(b) For all bounded Borel f : R→ C, the map ω 7→∫f(x) dµω(x) is measurable.

Indeed, (b) trivially implies (a). For the converse, let V be the set of all boundedmeasurable f : R → R with ω 7→

∫f(x) dµω(x) measurable. Then V satisfies (i)-(iii).

Let C = χ(−∞,t] : t ∈ R. By hypothesis, C ⊂ V . By monotone class, V contains allσ(C)-measurable functions, i.e. all Borel measurable functions f : R→ R. By linearity,for all Borel functions f : R→ C, the map ω 7→

∫f(x) dµω(x) is measurable.

8. Recall that a π-system P on a set Ω is a collection of subsets of Ω which is is closed under finiteintersections (e.g. Ω = R and P = (a, b] : −∞ ≤ a < b < ∞). A λ-system on Ω is a collection L ofsubsets of Ω such that (i) Ω ∈ L, (ii) if A ∈ L, then Ac ∈ L, (iii) if An ∈ L, An ∩Am = ∅ for n 6= m, then∪An ∈ L. The Dynkin π-λ theorem says that if L is a λ-system containing a π-system P, then σ(P) ⊆ L.For a proof, see [3, Theorem 1.1].

Page 54: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

50 Chapter 3. Some Important Theorems

3. Endow Ω with a σ-algebra F , let µωω∈Ω be a family of measures on R and supposeω 7→ µω(B) is F -measurable for each B ∈ B(R). If W ⊆ Ω × R is F ⊗ B(R)-measurable, define the ω-section Wω = t ∈ R : (ω, t) ∈W for each fixed ω ∈ Ω. Thenthe map ω 7→ µω(Wω) is F -measurable. Indeed, if W = A × B for some A ∈ F andB ∈ B(R), then µω(Wω) = 1A(ω)µω(B) and the measurability of this mapping is clear.The general case now follows from a monotone class argument.

3.7 Appendix

In this section we prove the results of Section 3.1.

Lemma 3.38. |µ| is a positive measure on M, and |µ|(X) <∞.

Proof. Clearly, |µ|(B) ≥ 0. If B = ∪∞1 Bn with Bn∞1 ⊂M pairwise disjoint, then givenε > 0, choose a measurable partition Ei∞1 of B such that |µ|(B)−ε <

∑∞1 |µ(Ei)|. Then

|µ|(B) − ε ≤∑∞i=1 |

∑∞n=1 µ(Ei ∩ Bn)| ≤

∑∞n=1

∑∞i=1 |µ(Ei ∩ Bn)| (we can always change

the order of summation when the terms are positive). But Ei ∩ Bn∞i=1 is a partition ofBn. Hence, |µ|(B)− ε ≤

∑∞1 |µ|(Bn). Thus, |µ|(B) ≤

∑∞1 |µ|(Bn). If |µ|(B) =∞, we get∑∞

1 |µ|(Bn) = ∞, so |µ|(B) =∑∞

1 |µ|(Bn). If |µ|(B) < ∞, then |µ|(Bn) < ∞ for eachn. Given ε > 0, let E(n)

i ∞i=1 be a partition of Bn such that∑i |µ(E(n)

i )| > |µ|(Bn)− ε2n .

Then∑Nn=1 |µ|(Bn) <

∑Nn=1

(∑i |µ(E(n)

i )| + ε2n)≤ ε +

∑Nn=1

∑i |µ(E(n)

i )| ≤ ε + |µ|(B).Letting N →∞ and ε→ 0 gives

∑∞1 |µ|(Bn) ≤ |µ|(B). Thus, |µ|(B) =

∑∞1 |µ|(Bn).

To prove that |µ| is finite, we use a somehow reversed triangle inequality:If z1, . . . , zN ∈ C, then ∃ S ⊆ 1, . . . , N such that |

∑k∈S zk| ≥ 1

π

∑Nk=1 |zk|.

To see this, write zk = |zk|eiαk . For any S ⊆ 1, . . . , N and θ ∈ [−π, π], we have|∑S zk| = |

∑S e−iθzk| ≥ Re

∑S e−iθzk =

∑S |zk| cos(αk − θ). Let Sθ = 1 ≤ k ≤ N :

cos(αk− θ) > 0. Then |∑S(θ) zk| ≥

∑S(θ) |zk| cos(αk− θ) =

∑Nk=1 |zk| cos+(αk− θ). Now∫ π

−π cos+(α−θ) dθ = 2 for any α ∈ [0, 2π[. Indeed,∫ π−π =

∫ α−π−π +

∫ πα−π, and

∫ α−π−π cos+(α−

θ) =∫ α−π−π cos+(α − θ − 2π) =

∫ π+απ cos+(α − θ). Hence,

∫ α−π−π +

∫ πα−π =

∫ π+απ +

∫ πα−π =∫ α+π

α−π . Thus,∫ π−π cos+(α− θ) =

∫ α+πα−π cos+(α− θ) =

∫ π−π cos+ θ =

∫ π/2−π/2 cos θ = 2. We thus

showed that 2∑Nk=1 |zk| =

∫ π−π∑Nk=1 |zk| cos+(αk − θ) ≤

∫ π−π |

∑S(θ) zk| ≤ 2π|

∑S(θ0) zk|,

where |∑S(θ0) zk| = max−π≤θ≤π |

∑S(θ) zk| is attained since the function is continuous.

Choosing S = Sθ0 gives the claim.Back to the lemma. Suppose first that some E ∈ M has |µ|(E) = ∞. Put t =

π(1 + |µ(E)|). Since |µ|(E) > t, there is a partition Ei of E such that∑N

1 |µ(Ei)| > tfor some N . Using the preceding paragraph with zi = µ(Ei), we find a set A ⊆ E (a unionof some Ei) such that |µ(A)| > t

π > 1. Set B = E \ A. Then |µ(B)| = |µ(E) − µ(A)| ≥|µ(A)| − |µ(E)| > t

π − |µ(E)| = 1. We have thus split E into disjoint sets A and B suchthat |µ(A)| > 1 and |µ(B)| > 1. Since |µ|(E) = ∞, either |µ|(A) or |µ|(B) is infinite(because we showed that |µ| is a measure).

Now suppose |µ|(X) = ∞. Split X into A1 and B1 as above with |µ(A1)| > 1 and|µ|(B1) = ∞. Split B1 into A2 and B2 with |µ(A2)| > 1 and |µ|(B2) = ∞. Continuingthis way, we get a countably infinite collection Ai with |µ(Ai)| > 1 for each i. Since µis countably additive, we have

∑i µ(Ai) = µ(∪iAi) ∈ C. But the series cannot converge

since µ(Ai) does not tend to 0 as i→∞. This contradiction shows that |µ|(X) <∞.

We will need some technical lemmas before we move on.

Lemma 3.39. Let (X,M, µ) be a measure space, E ∈M and f : E → [0,∞] measurable.

Page 55: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

3.7. Appendix 51

(i) (Markov inequality). For any α > 0, µx ∈ E : f(x) ≥ α ≤ 1α

∫E f dµ.

(ii) If∫E f dµ = 0, then f = 0 for µ-a.e. x ∈ E.

(iii) If f > 1 on E and µ(E) 6= 0, then∫E f dµ > µ(E).

Proof. (i) Since f ≥ αχf≥α, we have∫E f dµ ≥

∫E αχf≥α dµ = αµx ∈ E : f(x) ≥ α.

For (ii), let F = x ∈ E : f(x) > 0, Fn = x ∈ E : f(x) ≥ 1n and note that F = ∪∞1 Fn.

By (i), µ(Fn) ≤ n∫E f dµ = 0. Hence, µ(F ) ≤

∑µ(Fn) = 0. Finally, if f − 1 > 0 on

E and µ(E) 6= 0, then by contraposition of (ii), we must have∫E(f − 1) dµ 6= 0. But∫

E(f − 1) dµ ≥ 0 since f > 1. Hence,∫E(f − 1) dµ > 0, i.e.

∫E fdµ > µ(E).

Lemma 3.40. Let (X,M, µ) be a measure space with µ positive and finite. Let f ∈ L1(µ),M+ = E ∈M : µ(E) > 0 and define the average AE(f) = 1

µ(E)∫E f dµ for E ∈M+. If

S ⊆ C is closed and AE(f) ∈ S for all E ∈M+, then f(x) ∈ S for µ-a.e. x ∈ X.

Proof. Let ∆ be a disc in Sc with center α and radius r > 0 and put E = x : f(x) ∈ ∆.Since Sc is a countable union of discs, it suffices to prove that µ(E) = 0. Suppose µ(E) > 0.Then |AE(f) − α| = 1

µ(E) |∫E(f − α) dµ| ≤ 1

µ(E)∫E |f − α| dµ ≤ r. But this is impossible

since AE(f) ∈ S. Hence, µ(E) = 0.

Lemma 3.41. Let (X,M, µ) be a measure space with µ positive and σ-finite. Then thereexists w ∈ L1(µ) such that 0 < w(x) ≤ 1 for every x ∈ X.

We will use this lemma to replace µ by the finite measure µ = wµ given by µ(E) =∫E w dµ, which has exactly the same sets of measure 0 as µ by Lemma 3.39.

Proof. Let X = ∪∞1 En with En ∈ M and µ(En) finite. Put wn(x) = 2−n/(1 + µ(En))if x ∈ En and wn(x) = 0 if x ∈ Ecn. Let w =

∑∞1 wn. If x ∈ X, say x ∈ En, then

0 < wn(x) ≤ w(x) ≤∑

2−n = 1 and∑N

1 wn ↑ w, so by monotone convergence, ‖w‖1 =∫w = limN

∫ ∑N1 wn = limN

∑N1∫wn = limN

∑N1(2−n µ(En)

1+µ(En))≤ 1.

Theorem 3.42 (Lebesgue-Radon-Nikodym). Let (X,M) be a measurable space. Let µ bea σ-finite positive measure on M and ν a complex measure on M.(a) There is a unique pair of complex measures νa and νs on M such that ν = νa + νs,

νa µ and νs ⊥ µ.(b) There is a unique h ∈ L1(µ) such that νa(E) =

∫E h dµ for every E ∈M.

Proof. (von Neumann). First assume ν is positive and finite. Associate w to µ as inLemma 3.41 and let λ = wµ + ν. Consider the functional F (f) =

∫f dν on L2(λ).

By Cauchy-Schwarz, |F (f)| ≤∫X |f |dν ≤

∫X |f |dλ ≤ ‖f‖ · (λ(X))1/2, where ‖f‖ =

(∫|f |2 dλ)1/2. Thus, F ∈ L2(λ)∗. Applying Riesz’s representation to the Hilbert space

L2(λ), there exists g ∈ L2(λ) such that F (f) =∫X fg dλ. Thus,

∫X f dν =

∫X fg dλ. In

particular, for f = 1B we get

ν(B) =∫Bg dλ for B ∈M . (†)

But 0 ≤ ν ≤ λ, so 0 ≤ 1λ(B)

∫B g dλ = ν(B)

λ(B) ≤ 1 for every B ∈ M+. Hence, 0 ≤ g ≤ 1for λ-a.e. x by Lemma 3.40. Modifying g on the λ-null set does not affect (†), so we mayassume 0 ≤ g ≤ 1 on X. Let S1 = x : 0 ≤ g(x) < 1 and S2 = Sc1 = x : g(x) = 1 and

Page 56: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

52 Chapter 3. Some Important Theorems

define the measures ν1(B) = ν(B ∩ S1), ν2(B) = ν(B ∩ S2) for B ∈ M. Then using (†)and λ = wµ+ ν, we get∫

B(1− g) dν =

∫Bgw dµ for B ∈M . (‡)

For B = S2, the LHS is zero, while the RHS is∫S2w dµ. Since w > 0, it follows by

Lemma 3.39(ii) that µ(S2) = 0. Since ν2(Sc2) = 0 by definition, we get ν2 ⊥ µ. Next, since∫B(1− g) =

∫B∩S1

(1− g) and 1− g > 0 on S1, we have

µ(B) =⇒∫Bgw dµ = 0 =⇒

∫B∩S1

(1− g) dν = 0 =⇒ ν(B ∩ S1) = 0 =⇒ ν1(B) = 0 .

Hence, ν1 µ. Thus, we have a Lebesgue decomposition ν = νa + νs with νa = ν1 andνs = ν2. Multiplying (‡) by 1, g, g2, . . . , gn and adding, we get∫

B(1− gn+1) dν =

∫B

(g + g2 + . . .+ gn+1)w dµ for B ∈M .

Since 1− gn+1 ↑ 1 on S1, denoting by h the increasing limit of (g + . . .+ gn+1)w, we getνa(B) = ν1(B) = ν(B ∩ S1) = limn

∫B∩S1

(1− gn+1) dν = limn∫B(1− gn+1) dν =

∫B hdµ,

which completes the proof of existence for finite positive ν. We now prove uniqueness.Suppose ν = ν ′a + ν ′s is another Lebesgue decomposition. Then ν ′a − νa = νs − ν ′s. But

ν ′a − νa µ and νs − ν ′s ⊥ µ. Hence, ν ′a − νa = νs − ν ′s = 0. 9

Next, if∫B h1 dµ =

∫B h2 dµ for all B ∈M, then

∫B(h1 − h2) dµ = 0 for all B ∈M. In

particular,∫h1>h2

(h1 − h2) dµ = 0 and∫h1≤h2

(h2 − h1) dµ = 0, so∫|h1 − h2| dµ = 0 and

thus h1 = h2. This concludes uniqueness.Finally, if ν is complex, then ν = ν1 + iν2 with ν1, ν2 real. Moreover, νj = ν+

j − ν−j ,

where ν+j = 1

2(|νj | + νj) and ν−j = 12(|νj | − νj) are positive. Hence, we can apply the

preceding case to each ν±j .

Theorem 3.43. Let (X,M, µ) be a measure space with µ σ-finite. Let 1 ≤ p < ∞ andq its conjugate exponent (i.e. 1

p + 1q = 1). Define Φ : Lq(µ) → Lp(µ)∗ by g 7→ λg, where

λg(f) =∫fg. Then Φ is an isometric isomorphism.

Proof. We first assume µ(X) < ∞. Clearly, Φ is linear. To see surjectivity, given λ ∈Lp(µ)∗, let ν(E) = λ(χE) for E ⊆ X measurable. If E = ∪∞1 Ei with Ei disjoint,then χE =

∑∞1 χEi in Lp, since ‖χE −

∑n1 χEi‖p = ‖

∑∞n+1 χEi‖p = µ

(∪∞n+1 Ei

)1/p =(∑∞n+1 µ(Ei)

)1/p → 0 (we used that p < ∞). Since λ is linear and continuous, we getν(E) =

∑∞1 λ(χEi) =

∑∞1 ν(Ei), so ν is a complex measure. Also, if µ(E) = 0, then

‖χE‖p = 0 and thus ν(E) = 0. Hence, ν µ, so by Radon-Nikodym, there existsg ∈ L1(µ) such that λ(χE) = ν(E) =

∫E g dµ for all measurable E. By linearity, we get

λ(f) =∫fg dµ (?)

for all simple f , and thus for all f ∈ L∞(µ), since every f ∈ L∞(µ) is a uniform limit ofsimple functions (fi), and ‖fi − f‖p ≤ ‖fi − f‖∞µ(X)1/p → 0, so that λ(fi)→ λ(f).

9. We have used the following facts: (i) if ν1 µ and ν2 µ, then ν1 + ν2 µ, (ii) if ν1 ⊥ µ andν2 ⊥ µ, then ν1 + ν2 ⊥ µ, (iii) if ν µ and ν ⊥ µ, then ν = 0. (i) is obvious. For (ii), if there areA1 ∩ B1 = ∅ such that ν1 is concentrated on A1, µ on B1, and A2 ∩ B2 = ∅ such that ν2 is concentratedon A2, µ on B2, then ν1 + ν2 is concentrated on A = A1 ∪ A2, µ on B = B1 ∩ B2, and A ∩ B = ∅. Thus,ν1 + ν2 ⊥ µ. Finally, for (iii), if ν ⊥ µ, then ν is concentrated on a set A such that µ(A) = 0. But ν µ,so ν(E) = 0 for every E ⊆ A. Hence, ν is also concentrated on Ac. Thus, ν = 0.

Page 57: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

3.7. Appendix 53

We now show that g ∈ Lq(µ). If p = 1, then by (?) we have |∫E g dµ| ≤ ‖λ‖‖χE‖1 =

‖λ‖µ(E) for all E ∈M. Hence, |g(x)| ≤ ‖λ‖ a.e. by Lemma 3.40, so ‖g‖∞ ≤ ‖λ‖.If 1 < p < ∞, let En = x : |g(x)| ≤ n and write |g| = αg, where α is measurable

and |α| = 1. 10 Put f = χEn |g|q−1α. Then f ∈ L∞(µ), and |f |p = |g|q = fg on En, sincepq−p = q. So by (?),

∫En|g|q dµ =

∫X fg dµ = λ(f) ≤ ‖λ‖‖f‖p = ‖λ‖

( ∫En|g|q

)1/p. Thus,( ∫X χEn |g|q dµ

)1/q ≤ ‖λ‖. Applying monotone convergence, we get ‖g‖q ≤ ‖λ‖.We thus showed in both cases that g ∈ Lq(µ), with ‖g‖q ≤ ‖λ‖. By (?), λ and λg

coincide on L∞(µ), which is dense in Lp(µ), hence λ = λg. But by Hölder, |λg(f)| =|∫fg dµ| ≤ ‖f‖p‖g‖q, so ‖λ‖ = ‖λg‖ ≤ ‖g‖q. Thus, Φ is an isometry. In particular, Φ is

injective, and we just proved it is surjective. This completes the proof if µ(X) <∞.If µ is σ-finite, choose w ∈ L1(µ) as in Lemma 3.41 and let µ = wµ. Define Θ :

Lp(µ) → Lp(µ) by F 7→ w1/pF . Then Θ is a bijective isometry since 0 < w ≤ 1. Hence,if λ ∈ Lp(µ)∗, then Ψ := λ Θ ∈ Lp(µ)∗ and ‖Ψ‖ = ‖λ‖. By the finite case, ∃G ∈ Lq(µ)such that Ψ(F ) =

∫FGdµ and ‖Ψ‖ = ‖G‖q. Put g = w1/qG if p > 1 and g = G if p = 1.

Then∫|g|q dµ =

∫|G|q dµ = ‖Ψ‖q = ‖λ‖q if p > 1, while ‖g‖∞ = ‖G‖∞ = ‖Ψ‖ = ‖λ‖

if p = 1. In any case, ‖g‖q = ‖λ‖. Moreover, since Gµ = w1/pgµ, we finally get λ(f) =Ψ(w−1/pf) =

∫w−1/pfGdµ =

∫fg dµ for every f ∈ Lp(µ).

Theorem 3.44 (Riesz-Markov). Let X be a locally compact Hausdorff space and defineΦ : M (X) → C0(X)∗ by µ 7→ λµ, where λµ(f) =

∫f dµ. Then Φ is an isometric

isomorphism.If X is compact, the map µ 7→ λµ gives an isometric isomorphism of M+,1(X) with

C(X)∗+,1 = λ ∈ C(X)∗ : ‖λ‖ = 1 and λ(f) ≥ 0 whenever f ≥ 0.

Proof. The proof is really long, so we will only outline the main steps. We encouragethe serious student to read the whole proof in [31, Theorem 6.19]. As in Theorem 3.6,the difficulty is to prove that Φ is surjective. Before we begin, let us mention that theproof is much easier in the (important) special case of C[0, 1]. In this case, one simplyargues as follows: given λ ∈ C[0, 1]∗, use Hahn-Banach to find an extension L of λ tothe space of all bounded functions on [0, 1]. The advantage of L over λ is that it is well-defined on characteristic functions. So imitating the proof of Theorem 3.6, we may putw(x) = L(χ[0,x]). It can be shown that w has a bounded variation. Now given f ∈ C[0, 1],let fn =

∑ni=1 f( in)χ[ i−1

n, in

]. Then fn converges uniformly to f , so L(fn) converges toL(f). But L(χ[ i−1

n, in

]) = L(χ[0, in

]) − L(χ[0, i−1n

]) = w( in) − w( i−1n ), so L(fn) =

∫fn dw by

definition of the Riemann-Stieltjes integral. Hence, L(f) =∫f dw. For details, see [28,

page 354] or [22, Section 4.4]. We now proceed to the general case:1. Let λ ∈ C0(X)∗. We first assume λ is positive, i.e. λ(f) ≥ 0 whenever f ≥ 0. Given V

open in X, define

µ(V ) = supλ(φ) : φ ∈ Cc(X), 0 ≤ φ ≤ 1 and suppφ ⊆ V .

Clearly, V1 ⊆ V2 implies µ(V1) ≤ µ(V2), so

µ(E) = infV⊇E,V open

µ(V ) (∗)

for any open set E. We now define µ(E) for any set E ⊆ X by (∗). Let MF be theclass of all E ⊆ X such that µ(E) < ∞ and µ(E) = supK⊆E,K compact µ(K). And let

10. Let α(x) = g(x)+χE(x)|g(x)+χE(x)| , where E = x : g(x) = 0. Then α(x) = 1 on E, α(x) = g(x)/|g(x)| on Ec,

and α is measurable since E is measurable and β(z) = z/|z| is continuous.

Page 58: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

54 Chapter 3. Some Important Theorems

M be the class of all E ⊆ X such that E ∩K ∈ MF for every compact K. Then oneproves that M is a σ-algebra, and µ is a measure on M (much effort here).

2. Show that λ(f) =∫X f dµ for every f ∈ Cc(X). The idea is as follows: let K = supp f

and divide K into pieces Ei on which f is almost constant. Now use a partition ofunity (hi) subordinate to Ei to write f =

∑hif . This writes f as something close

to a step function, so it is more or less sufficient to describe µ(Ei) and µ(K) in termsof λ(hi), and one does this by working from the definition of µ(V ) with V open.

3. Now generalize the previous construction to all λ ∈ C0(X)∗, not just the positive ones.This is done by associating a positive functional to λ, namely λ(f) = sup|λ(g)| : g ∈C0(X) and |g| ≤ f. One can prove that λ is indeed linear, and ‖λ‖ = ‖λ‖.

These are the major steps, one then ties everything together and concludes with someadditional arguments.

Page 59: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

Chapter 4

Spectral theory

This chapter is taken from [44], [28] and [38], with added details. The lettersX,Y, Z,Hdenote vector spaces over C which are never 0.

4.1 Compact and Hilbert-Schmidt operators

Definition 4.1. Let X,Y be Banach spaces. We say that T : X → Y is compact if T (B)is compact in Y whenever B ⊂ X is bounded.

Lemma 4.2. The following assertions are equivalent:

(i) T is compact.(ii) T (X1) is compact. Here X1 is the unit ball of X.(iii) Any bounded sequence (xn) in X has a subsequence (xnk) such that (Txnk) converges.

Proof. Exercise 1.

Example 4.3. If T : X → Y has a finite rank, i.e. dim Ran(T ) <∞, then T is compact.Indeed, if B ⊂ X is bounded, then T (B) is a closed bounded subset in a finite dimensionalnormed space, hence compact by Heine-Borel.

The set of compact operators from X to Y is denoted by K(X,Y ). The set of boundedoperators is denoted byB(X,Y ). We also denoteK(X) = K(X,X) andB(X) = B(X,X).

Lemma 4.4. Let X,Y be Banach spaces.

(a) K(X,Y ) is a closed subspace of B(X,Y ). In other words, αT+βS is compact wheneverT, S are compact, α, β ∈ K, and if Tn is a sequence of compact operators convergingin norm to T , then T is compact.

(b) If Z is a Banach space, T ∈ K(X,Y ), S1 ∈ B(Y,Z) and S2 ∈ B(Z,X), then TS2 ∈K(Z, Y ) and S1T ∈ K(X,Z). In particular, K(X) is an ideal in B(X).

(c) If T ∈ K(X,Y ), then the adjoint T ∗ ∈ K(Y ∗, X∗).If X,Y are Hilbert spaces, then the Hilbert adjoint T ∗ ∈ K(Y,X).

Proof. (a) Let (xn) be bounded in X. There is a subsequence (xϕ(n)) such that (Txϕ(n))converges. But (xϕ(n)) is bounded, so it has a subsequence (xϕψ(n)) such that(Sxϕψ(n)) converges. Hence, ((αT + βS)xϕψ(n)) converges, so αT + βS is compact.

Page 60: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

56 Chapter 4. Spectral theory

Suppose Tn ∈ K(X,Y ) and ‖Tn−T‖ → 0. To see that T is compact, it suffices to seethat T (X1) is totally bounded. 1 Let ε > 0. Given x, y ∈ X1, we have

‖Tx− Ty‖ ≤ ‖Tx− Tnx‖+ ‖Tnx− Tny‖+ ‖Tny − Ty‖ ≤ 2‖T − Tn‖+ ‖Tnx− Tny‖ .

Fix N large enough such that ‖TN − T‖ ≤ ε/4. Choose a finite ε/2-net for TN (X1),say TN (x1), . . . , TN (xr). Then T (x1), . . . , T (xr) is an ε-net for T (X1). 2

(b) Let (zn) be bounded in Z. Then (S2zn) is bounded inX (since S2 is bounded), so thereis a subsequence (S2zϕ(n)) such that (TS2zϕ(n)) converges, so TS2 is compact. Next,if (xn) is bounded in X, there is a subsequence (xϕ(n)) such that (Txϕ(n)) converges,so (S1Txϕ(n)) converges (since S1 is continuous), so S1T is compact.

(c) We only prove (c) for Hilbert spaces; the reader can find the Banach case in [22,Theorem 8.2.5]. So suppose T is compact and let (yn) be bounded in Y , say ‖yn‖ ≤M .Since TT ∗ is compact by (b), then (yn) has a subsequence (yϕ(n)) such that (TT ∗yϕ(n))converges. Hence,

‖T ∗yϕ(n) − T ∗yϕ(m)‖2 = 〈TT ∗(yϕ(n) − yϕ(m)), yϕ(n) − yϕ(m)〉≤ ‖TT ∗(yϕ(n) − yϕ(m))‖ · ‖yϕ(n) − yϕ(m)‖≤ 2M‖TT ∗yϕ(n) − TT ∗yϕ(m)‖ → 0

as n,m→∞. Hence, (T ∗yϕ(n)) is Cauchy, hence convergent, so T ∗ is compact.

Example 4.5. Suppose H is a Hilbert space with orthonormal basis ei∞i=1 and letT ∈ B(H ) be given by the diagonal matrix Ti,j = λiδi,j , i.e. Tei = λiei. Then T iscompact iff λi → 0 as i→∞.

To see this, first suppose λi → 0. By definition, if x =∑αiei ∈ H , then Tx =∑

αiλiei. For each n, let Tnei = Tei if i ≤ n and Tnei = 0 if i > n. Then Tn hasa finite rank (RanTn ⊂ sp e1, . . . , en), so Tn is compact. Moreover, ‖(T − Tn)x‖2 =‖∑i>n αiλiei‖2 =

∑i>n |αiλi|2 ≤ supi>n |λi|2

∑∞i=n+1 |αiλi|2 ≤ supi>n |λi|2‖x‖2. Hence,

‖Tn − T‖ ≤ supi>n |λi| → 0 as i→∞, so T is compact by Lemma 4.4(a).Conversely, suppose λi 6→ 0. Then there exists ε0 such that J = i : |λi| > ε0 is

infinite. 3 For each i, j ∈ J , i 6= j, we have ‖Tei − Tej‖2 = |λi|2 + |λj |2 ≥ 2ε20. Hence,

(Tei) has no convergent subsequence, so T is not compact.

Compact operators enjoy many interesting properties:

Lemma 4.6. (1) If X,Y are Banach spaces and T ∈ K(X,Y ), then T transforms weakconvergence into strong convergence. That is, if `(xn− x)→ 0 for every ` ∈ X∗, then‖Txn − Tx‖ → 0.

(2) If H is a separable Hilbert space and T ∈ K(H ), then T transforms strong conver-gence into uniform convergence. That is, if ‖Snx − Sx‖ → 0 for each x ∈ H , then‖SnT − ST‖ → 0 and ‖TSn − TS‖ → 0.

1. Recall that if K is a metric space, then E ⊆ K is an ε-net for K if ∀ p ∈ K ∃ q ∈ E such thatd(p, q) < ε. K is totally bounded if for each ε > 0 there exists a finite ε-net for K. A classic result is thatK is compact iff it is totally bounded and complete. Since Y is Banach, T (X1) is always complete.

2. Longer proof: suppose (xn) is bounded in X. There is a subsequence (x1,n) such that (T1x1,n)converges. Also, (x1,n) has a subsequence (x2,n) such that (T2x2,n) converges, and so on. Let yn = xn,n.Then (Tnym)m is Cauchy. Using an ε/3 argument, it follows that (Tym) is Cauchy, hence convergent.

3. Details: if for all ε > 0 the set J = i : |λi| > ε is finite, then ∀ ε > 0 ∃N ∈ N (namely N = max J)such that |λi| ≤ ε if i > N , so λi → 0.

Page 61: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

4.1. Compact and Hilbert-Schmidt operators 57

(3) If H is a separable Hilbert space, then every T ∈ K(H ) is the norm limit of asequence of operators of finite rank.

We will not need this lemma, but we prove (1) and (3) for the curious reader. Letus mention that (3) is not true for general Banach spaces. We say that a Banach spacehas the approximation property if (3) is true. The first separable Banach space withoutthis property was constructed by Enflo in 1972. He was awarded a live goose for hisachievement, which was promised by Mazur in 1936. Enflo was 28 years old.

Proof. For (1), suppose xn converges weakly to x. By the uniform boundedness principle,the sequence (‖xn‖) is bounded (see [22]). Let yn = Txn and ` ∈ Y ∗. Then |`∗(yn) −`∗(y)| = |(T ∗`)(xn − x)| → 0 since T ∗` ∈ X∗. Hence, yn converges weakly to y. Supposeyn does not converge to y in norm. Then there exists ε0 and a subsequence (yϕ(n)) of(yn) such that ‖yϕ(n) − y‖ ≥ ε0. But (xϕ(n)) is bounded and T is compact, so yϕ(n) =Txϕ(n) has a subsequence (yϕψ(n)) that converges (in norm) to some y. Now y 6= y since‖y − y‖ = lim ‖yϕψ(n) − y‖ ≥ ε0. But this is impossible since (yϕψ(n)) converges weaklyto y and (yn) converges weakly to y. This contradiction shows that ‖yn − y‖ → 0.

We omit the proof of (2). Sketch: we may assume S = 0. First assume T has a finiterank, then generalize using (3).

For (3), let ei be an orthonormal basis of H and let Tnx =∑n

1 〈x, ej〉Tej . ThenTx − Tnx = Tyn, where yn =

∑∞n+1〈x, ej〉ej ∈ e1, . . . , en⊥ and ‖yn‖ ≤ ‖x‖. Hence,

‖T − Tn‖ = sup‖x‖≤1 ‖Tx − Tnx‖ ≤ supy∈e1,...,en⊥, ‖y‖≤1 ‖Ty‖ =: αn. Clearly, αn ismonotone decreasing, so it converges to a limit α ≥ 0. Choose yn ∈ e1, . . . , en⊥ with‖yn‖ ≤ 1 such that ‖Tyn‖ ≥ 3

4αn. Then ‖Tyn‖ ≥ 12α if n is large enough. Since yn

converges weakly to 0, then ‖Tyn‖ → 0 by (1). Thus, α = 0. Hence, ‖T − Tn‖ → 0.

Definition 4.7. Let H be a Hilbert space with orthonormal basis ej and let T ∈ B(H ).We say that T is a Hilbert-Schmidt operator if

∑j ‖Tej‖2 <∞.

Equivalently,∑i,j |Tij |2 =

∑i,j |〈Tej , ei〉|2 <∞.

This seems to depend on the basis, but it does not:

Lemma 4.8.∑i,j |〈Tej , ei〉|2 =

∑i,j |〈Tfj , fi〉|2 for any orthonormal bases ei and fj.

Proof. We have∑i,j

|〈Tej , ei〉|2 =∑j

‖Tej‖2 =∑i,j

|〈Tej , fi〉|2 =∑i,j

|〈ej , T ∗fi〉|2 =∑j

‖T ∗fi‖2 .

The RHS is independent of ei, and hence so is the LHS.

Definition 4.9. If T is Hilbert-Schmidt, we define the Hilbert-Schmidt norm of T by‖T‖2 =

(∑j ‖Tej‖2

)1/2, where ej is any orthonormal basis of H .

Lemma 4.10. Let T be a Hilbert-Schmidt operator. Then

(i) T ∗ is Hilbert-Schmidt with ‖T ∗‖2 = ‖T‖2,

(ii) ‖T‖ ≤ ‖T‖2,

(iii) T is compact.

Page 62: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

58 Chapter 4. Spectral theory

Proof. For (i), see the proof of Lemma 4.8. For (ii), let ej be an orthonormal basis ofH and x =

∑αjej ∈H . Then by Cauchy-Schwarz,

‖Tx‖2 =∑i

|〈Tx, ei〉|2 =∑i

∣∣∣∑j

αj〈Tej , ei〉∣∣∣2

≤∑i

(∑j

|αj |2)(∑

j

|〈Tej , ei〉|2)

= ‖x‖2‖T‖22 .

For (iii), for each n, define Tn ∈ B(H ) by Tnei = Tei if i ≤ n and Tnei = 0 if i > n. ThenTn has finite rank. Furthermore, T−Tn is Hilbert-Schmidt, with ‖T−Tn‖22 =

∑j>n ‖Tej‖2.

Since∑j ‖Tej‖2 < ∞, we have ‖T − Tn‖2 → 0, so by (ii), ‖T − Tn‖ → 0. Hence, T is

compact by Lemma 4.4(a).

Example 4.11. Let (X,µ) be a σ-finite measure space and let K ∈ L2(X ×X). Definethe integral operator TK : L2(µ)→ L2(µ) by (TKf)(x) =

∫X K(x, y)f(y) dµ(y). Then TK

is Hilbert-Schmidt, and ‖TK‖2 = ‖K‖L2(X×X).To see this, fix x and let Kx(y) = K(x, y). Then Kx ∈ L2(µ) for µ-a.e. x by Fubini’s

theorem. Let ei be an orthonormal basis of L2(µ). Then by monotone convergence,∑‖TKei‖2 =

∑∫|(TKei)(x)|2 =

∑∫|〈Kx, ei〉|2

=∫ ∑

|〈Kx, ei〉|2 =∫‖Kx‖2L2(X) = ‖K‖2L2(X×X).

The space of Hilbert-Schmidt operators I2 is a Hilbert space when endowed with theinner product (T, S) =

∑i,j TijSi,j .

We showed above that I2 ⊂ I∞ ⊂ B(H ), where I∞ are the compact operators,and we showed that I∞ is a closed ideal. There are analogous classes of operators Ip

for every 1 ≤ p < ∞. The most important one is p = 1, the set of trace class operators.They are defined as follows: call an operator T positive if 〈Tx, x〉 ≥ 0 for all x ∈ H . IfT is positive, define the trace of T by tr(T ) =

∑∞1 〈Tei, ei〉. If T is a matrix, the trace is

just the sum of the diagonal elements, and also the sum of the eigenvalues. We will givea meaning later on to the operator |T | =

√T ∗T (it is the unique operator B satisfying

B ≥ 0 and B2 = T ∗T ). We say that a bounded operator A is trace class if tr |A| <∞. Itcan be shown that any trace class operator A takes the form A = BC, where B,C ∈ I2.The classes I1 and I2 also form ideals.

4.2 Spectral theorem for compact operatorsIt is well-known that a Hermitian matrix on Cn (i.e. ai,j = aj,i) has an orthonormal

basis of eigenvectors. In this section, we generalize this result to compact self-adjoint(more generally, normal) operators on a separable Hilbert space.

Definition 4.12. Let H be a Hilbert space and T ∈ B(H ). Given λ ∈ C, we defineHλ = x ∈H : Tx = λx. Then Hλ is a closed subspace of H . λ is called an eigenvalueif Hλ 6= 0, and x ∈Hλ is called an eigenvector.

Example 4.13. In contrast to finite dimensions, a self-adjoint operator need not haveany eigenvalue. For example, define T : L2[a, b]→ L2[a, b] by (Tf)(x) = xf(x). Then T isself-adjoint, T is bounded since ‖Tf‖L2[a,b] ≤ ‖x‖L∞[a,b]‖f‖L2[a,b], but T has no eigenvalue,since Tf = λf implies (x− λ)f(x) = 0 a.e. and hence f = 0 in L2[a, b].

Page 63: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

4.2. Spectral theorem for compact operators 59

Lemma 4.14. Let H be a Hilbert space and T ∈ B(H ) be self-adjoint (i.e. T = T ∗).(i) If W is T -invariant (i.e. T (W ) ⊆W ), then W⊥ is also T -invariant.(ii) For every x ∈H , 〈Tx, x〉 ∈ R. In particular, all eigenvalues are real.(iii) ‖T‖ = sup‖x‖≤1 |〈Tx, x〉|.(iv) If λ 6= β, then Hλ ⊥Hβ.

Proof. (i) If y ∈W⊥ and x ∈W , 〈x, Ty〉 = 〈Tx, y〉 = 0 since Tx ∈W . Hence, Ty ∈W⊥.(ii) Note that 〈Tx, x〉 = 〈x, Tx〉 = 〈T ∗x, x〉 = 〈Tx, x〉, so 〈Tx, x〉 ∈ R.If Tx = λx, x 6= 0, then λ‖x‖2 = 〈Tx, x〉 ∈ R, so λ ∈ R.(iii) Let α = sup‖x‖≤1 |〈Tx, x〉|. By Cauchy-Schwarz, α ≤ ‖T‖. To prove that ‖T‖ ≤ α,

we first show that |〈Tx, y〉| ≤ α‖x‖‖y‖. Since this inequality is unchanged if we multiplyy by a complex number of modulus 1, we may assume 〈Tx, y〉 ∈ R. Then

4〈Tx, y〉 = 〈T (x+ y), x+ y〉 − 〈T (x− y), x− y〉 ,

so using the definition of α and the parallelogram identity,

|〈Tx, y〉| ≤ α

4 (‖x+ y‖2 + ‖x− y‖2) = α

2 (‖x‖2 + ‖y‖2) .

Apply this inequality to√ax and 1√

ay for any a > 0. We get

|〈Tx, y〉| ≤ α

2(a‖x‖2 + 1

a‖y‖2

).

If x = 0, then trivially |〈Tx, y〉| ≤ α‖x‖‖y‖. Otherwise, choose a = ‖y‖/‖x‖ to get|〈Tx, y〉| ≤ α‖x‖‖y‖. To finally complete the proof of (iii), choose x such that Tx 6= 0,and apply this inequality to y = Tx/‖Tx‖ (the case T = 0 is trivial). Then we get‖Tx‖ = |〈Tx, y〉| ≤ α‖x‖ · 1, i.e. ‖T‖ ≤ α.

(iv) If Tx = λx and Ty = βy, then 〈Tx, y〉 = λ〈x, y〉, and 〈Tx, y〉 = 〈x, Ty〉 = β〈x, y〉(β ∈ R by (ii)). Hence, (λ− β)〈x, y〉 = 0. If λ 6= β, we must have 〈x, y〉 = 0.

Theorem 4.15 (Spectral theorem I). Let T be a compact self-adjoint operator on a sepa-rable Hilbert space H . Then H has an orthonormal basis en consisting of eigenvectorsof T , i.e. Ten = λnen. If H is infinite-dimensional, then λn → 0.

This is known as the Hilbert-Schmidt theorem. Note that since λn → 0, then eachnonzero eigenvalue has a finite multiplicity. So here, the spectrum we will soon define is adiscrete set, with no accumulation point except perhaps λ = 0.

Proof. We first claim that we can find at least one non-zero eigenvector. If T = 0, anyvector is an eigenvector, so we may assume T 6= 0. By Lemma 4.14(iii), we can choosexn ∈ H with ‖xn‖ ≤ 1 such that |〈Txn, xn〉| → ‖T‖. By passing to a subsequence, wemay assume 〈Txn, xn〉 → λ, where |λ| = ‖T‖. Since (xn) is bounded and T is compact,by passing to a subsequence we may assume Txn → y for some y ∈ H . Since ‖T‖ 6= 0and ‖T‖ = lim |〈Txn, xn〉|, then y 6= 0. Now using Lemma 4.14(ii),

‖Txn − λxn‖2 = ‖Txn‖2 − 2λ〈Txn, xn〉+ λ2‖xn‖2 ≤ 2‖T‖2 − 2λ〈Txn, xn〉 .

As n → ∞, we get ‖Txn − λxn‖ → 0. Since Txn → y, we thus have λxn → y. As T iscontinuous, we get λTxn → Ty. But since Txn → y, we also have λTxn → λy. Hence,Ty = λy, so y is an eigenvector.

Page 64: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

60 Chapter 4. Spectral theory

Using Zorn’s lemma, we can now choose an orthonormal set of eigenvectors E for Twhich is maximal among all orthonormal sets of eigenvectors. 4 Let W = spE be thespan of these vectors. It suffices to see that W = H , i.e. W⊥ = 0. We clearly haveTW ⊆ W , so by Lemma 4.14(i), TW⊥ ⊆ W⊥. Hence, T |W⊥ ∈ B(W⊥) is self-adjoint,and if W⊥ 6= 0, then by the preceding paragraph, there is an eigenvector for T in W⊥.This contradicts the maximality of E and proves the first assertion of the theorem. Thesecond assertion follows from Example 4.5.

We now generalize this result to normal operators. We first note the following.

Corollary 4.16. Let S, T be compact self-adjoint operators on a separable Hilbert spaceH such that ST = TS. Then H has an orthonormal basis en such that en is aneigenvector for both S and T .

Proof. Let Hλ be the eigenspace for T as in Definition 4.12. If x ∈ Hλ, then T (Sx) =STx = S(λx) = λ(Sx). Hence, Sx ∈ Hλ, so S : Hλ → Hλ. Since S|Hλ

is compact andself-adjoint we may choose and orthonormal basis en,λn of Hλ consisting of eigenvectorsof S. Of course, en,λn are also eigenvectors of T since en,λ ∈Hλ. Taking en = ∪λen,λwe get the basis.

Definition 4.17. Let H be a Hilbert space. We say that T ∈ B(H ) is normal if itcommutes with its adjoint, i.e. T ∗T = TT ∗.

For example, self-adjoint operators are normal. Unitary operators are also normal(recall that U is unitary if U is bijective and U∗ = U−1). On the other hand, T = 2iI isnormal, but neither self-adjoint nor unitary.

Theorem 4.18 (Spectral Theorem II). Let T be a compact normal operator on a separableHilbert space H . Then H has an orthonormal basis en consisting of eigenvectors of T ,i.e. Ten = γnen. If H is infinite-dimensional, then γn → 0.

Proof. Let T1 = T+T ∗2 and T2 = T−T ∗

2i . Then T = T1 + iT2 and Tj are self-adjoint.Moreover, T1 and T2 commute since T is normal. So by Corollary 4.16, there is anorthonormal basis en for H such that Ten = T1en+iT2en = λnen+iµnen = (λn+iµn)en.So we may take γn = λn + iµn. Finally, γn → 0 by Example 4.5.

4.3 Application: Peter-Weyl TheoremThe Peter-Weyl theorem is an important result in harmonic analysis which can be

regarded as a generalization of the completeness of the Fourier series. Let us start withtwo examples to better understand the theorem.

Example 4.19. Consider the unit circle T ⊂ C and let ek(z) = zk on T. Then the Fourierbasis ekk∈Z is an orthogonal basis of L2(T). The fact that they are orthogonal is a simplecalculation (here

∫T z

kzj dz =∫ 2π

0 eikxe−ijx dx). Their span is dense in L2(T) because byStone-Weierstrass, their span is (uniformly) dense in C(T), which is dense in L2(T).

Hence, if we let Hk = C ·ek = αek : α ∈ C, then L2(T) = ⊕k∈Z Hk. Here ⊕ denotesthe closure of the direct sum. Moreover, the Hk are mutually orthogonal. Also, if wedefine (πaf)(z) = f(az) for a ∈ T and f ∈ L2(T), then (πaek)(z) = ek(az) = akzk, i.e.πaek = akek. Hence, Hk is invariant under each πa.

4. As usual, consider the class of all orthonormal sets of eigenvectors for T , ordered by set inclusion.This class is nonempty since it contains v, where v = y/‖y‖ is the eigenvector we constructed. Any chainEα has an upper bound, namely ∪Eα, which is clearly orthonormal. So we can apply Zorn’s lemma.

Page 65: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

4.3. Application: Peter-Weyl Theorem 61

Example 4.20. Let G ⊂ GL(n,R) be a compact subgroup. For each r, let Pr be the spaceof polynomial functions on Rn×n of degree at most r, i.e. if p ∈ Pr, then p : Rn×n → R andp : (ai,j)ni,j=1 7→

∑rk=0

∑ni,j=1 ci,j,ka

ki,j . Each g ∈ G acts on Rn×n by matrix multiplication.

Moreover, for a fixed g, this map Rn×n → Rn×n is linear. Hence, if f ∈ Pr and πgf(x) =f(g−1x), then πgf ∈ Pr. Thus, if Vr = f |G : f ∈ Pr, then Vr is πg-invariant. Moreover,Vr ⊂ Vr+1, dimVr < ∞, and ∪iVi is (uniformly) dense in C(G) by Stone-Weierstrass.Hence, ∪iVi is dense in L2(G). Let Hr = x ∈ Vr : x ⊥ Vr−1. Then dim Hr <∞, Hrare mutually orthogonal, L2(G) = ⊕rHr, and Hr is invariant under each πg, g ∈ G,assuming G is equipped with a left invariant (Haar) measure.

Definition 4.21. Let G be a group, V a vector space and GL(V ) the group of invertiblelinear maps on V . A representation of G on V is a homomorphism π : G → GL(V ),π : g 7→ πg. We say that W ⊆ V is invariant under π if πg(W ) ⊆ W for all g ∈ G. Wesay that the representation π is irreducible if it has no closed invariant subspaces otherthan 0 and V . If G is a topological group and H is a Hilbert space, we say that arepresentation π : G → GL(H ) is unitary if each πg is a unitary operator on H , and ifthe map g 7→ πgf is continuous for all f ∈H .

Definition 4.22. Let G be a compact group with a left invariant (Haar) measure µ(see Chapter 3). We define the (left) regular representation of G by π : G → L2(G,µ),(πgf)(x) = f(g−1x).

It can be shown that the regular representation is unitary.

Theorem 4.23 (Peter-Weyl). Let G be a compact group and π the regular representationon L2(G). Then L2(G) = ⊕rHr, where Hr are mutually orthogonal, dim Hr < ∞ andHr is invariant and irreducible for π.

Proof. Given ϕ ∈ C(G), define K(x, y) = ϕ(x−1y). Then the integral operator T = TKis compact by Example 4.11. Hence, T ∗T is compact by Lemma 4.4(b) and clearly self-adjoint. By the Spectral Theorem 4.15, it follows that

L2(G) = ker(T ∗T )⊕ ⊕λ 6=0

Hλ , (†)

where Hλ is the corresponding eigenspace for T ∗T , and for λ 6= 0, dim Hλ <∞ since eachnonzero eigenvalue has a finite multiplicity.

We claim that each Hλ is invariant for π. First note that if g ∈ G, then [T (πgf)](x) =∫K(x, y)f(g−1y) dµ(y). As µ is invariant, we may replace y by gy to get [T (πgf)](x) =∫K(x, gy)f(y) dµ(y) =

∫K(g−1x, y)f(y) dµ(y) = [πg(Tf)](x). Hence, T commutes with

πg. Also, T ∗T commutes with πg. Indeed, first note that (πgf, h) =∫f(g−1x)h(x) =∫

f(x)h(gx) = (f, πg−1h), so π∗g = πg−1 . Hence, T ∗πg = T ∗π∗g−1 = (πg−1T )∗ = (Tπg−1)∗ =πgT

∗ and thus T ∗Tπg = T ∗πgT = πgT∗T . It follows that Hλ is invariant for π. Indeed, if

v ∈Hλ, then T ∗T (πgv) = πg(T ∗Tv) = πg(λv) = λ(πgv), so πgv ∈Hλ as asserted.We can now use Zorn’s lemma to choose a maximal collection Vj of mutually or-

thogonal, finite dimensional subspaces Vj ⊂ L2(G) with Vj invariant and irreducible for π(such Vj exist by the preceding paragraph). Let W = ⊕Vj . We claim that W = L2(G).If not, then we can choose f ∈ W⊥, f 6= 0. Since C(G) is dense in L2(G), we can findϕ ∈ C(G) such that (ϕ, f) 6= 0. Again define K(x, y) = ϕ(x−1y) and T = TK to obtainthe decomposition (†). Now let P : L2(G) → W⊥ be the orthogonal projection. By con-struction, W is invariant for π. So W⊥ is also invariant for π. Indeed, if v ∈ W⊥ and

Page 66: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

62 Chapter 4. Spectral theory

w ∈W , then (πgv, w) = (v, πg−1w) = 0 because πg−1w ∈W , hence πgv ∈W⊥ as asserted.It follows that P commutes with all πg. Indeed, if u ∈ L2(G) and v ∈W⊥, we have

(πgu− πgPu, v) = (u, πg−1v)− (Pu, πg−1v) = (u, πg−1v)− (u, Pπg−1v) .

But Pπg−1v = πg−1v since πg−1v ∈ W⊥. Hence, (πgu − πgPu, v) = 0. Thus, πgu =πgPu + (πgu − πgPu) with πgPu ∈ W⊥ and (πgu − πgPu) ⊥ W⊥. So by definition ofthe orthogonal projection, Pπgu = πgPu, so P commutes with πg. Hence, πg(PHλ) =P (πgHλ) ⊆ P (Hλ), so P (Hλ) ⊆ W⊥ is also invariant for π. Since dimP (Hλ) <∞, thiswill contradict the maximality of Vj if we can show that at least one P (Hλ) 6= 0.

However, if P (Hλ) = 0 for all λ 6= 0, then ⊕Hλ ⊆ W and thus W⊥ = W⊥ ⊆

(⊕Hλ)⊥, i.e. W⊥ ⊆ kerT ∗T . But if T ∗Tv = 0, then 0 = (T ∗Tv, v) = (Tv, Tv), soTv = 0. Thus, we would have W⊥ ⊆ kerT . However, (Tf)(e) =

∫K(e, y)f(y) dµ(y) =∫

ϕ(y)f(y) dµ(y) = (ϕ, f) 6= 0. Moreover, Tf is a continuous function on G. Hence,Tf 6= 0 in L2(G). But f ∈ W⊥, so this contradicts W⊥ ⊆ kerT . Hence, at least oneP (Hλ) 6= 0, which completes the proof.

The previous theorem can be used to prove other versions of the Peter-Weyl theorem,for example the following one.

Theorem 4.24 (Peter-Weyl II). Let G be a compact group and σ a unitary representationof G on a Hilbert space H . Then H = ⊕rHr, where Hr are mutually orthogonal,dim Hr <∞ and Hr is invariant and irreducible for σ.

Proof. See [44, Theorem 3.3.9], or your favorite book in harmonic analysis or representa-tion theory.

4.4 The spectrum of a bounded operatorWe saw in Example 4.13 that a bounded self-adjoint operator may have no eigenvalues.

Hence, if we want to obtain a spectral theorem for such operators, we should weaken thenotion of eigenvalues. This motivates the following.

Definition 4.25. Let X be a Banach space and T ∈ B(X).We say that T is invertible if T−1 exists and T−1 ∈ B(X).We define the spectrum of T by

σ(T ) = λ ∈ C : T − λ is not invertible .

Here T − λ := T − λI; a standard abuse of notation.

Example 4.26. 1. If λ is an eigenvalue, then λ ∈ σ(T ) since Tλ := T −λ is not injective.If dimX < ∞, then λ ∈ σ(T ) iff λ is an eigenvalue. Indeed, if λ is not an eigenvalue,then Tλ is injective. So dimX = dimTλ(X), so Tλ(X) = X, so Tλ is surjective. Hence,T−1λ exists, and T−1

λ ∈ B(X) by the open mapping theorem. Hence, λ /∈ σ(T ).2. Let H be a Hilbert space with orthonormal basis ei. Define T ∈ B(H ) by Tei = λiei

with λi → 0 as i→∞, but λi 6= 0 for all i. Then 0 is not an eigenvalue, but 0 ∈ σ(T ).Indeed, if T was invertible, we would have T−1ei = λ−1

i ei. Since λi → 0, this cannotbe a bounded operator.

Definition 4.27. Let X be a Banach space and T ∈ B(X). We say that T is boundedbelow if there is some k > 0 such that ‖Tx‖ ≥ k · ‖x‖ for all x ∈ X.

Page 67: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

4.4. The spectrum of a bounded operator 63

Lemma 4.28. If T ∈ B(X), then T is invertible iff T is bounded below and T (X) isdense.

Proof. If T is invertible, then T (X) = X and ‖x‖ = ‖T−1Tx‖ ≤ ‖T−1‖‖Tx‖, i.e. Tis bounded below by k = ‖T−1‖−1. Conversely, if T is bounded below by k, then T isinjective since Tx = 0 implies ‖x‖ ≤ k−1‖Tx‖ = 0, i.e. x = 0. If T (X) is moreoverdense, given y ∈ X, choose xn ∈ X such that Txn → y. Then Txn is Cauchy and‖Txn − Txm‖ ≥ k‖xn − xm‖. Hence, xn is also Cauchy, so xn → x for some x since Xis complete. Hence, Txn → Tx, so Tx = y. Thus, T is surjective, hence bijective. Finally,‖T−1x‖ ≤ k−1‖TT−1x‖ = k−1‖x‖, so ‖T−1‖ ≤ k−1. Conclusion: T is invertible.

Example 4.29. Let H = L2(X,µ) and suppose ϕ ∈ L∞(X). Let Mϕ be the multiplica-tion operator on H given by Mϕ(f) = ϕf . Then for any λ ∈ C, Mϕ − λ = Mϕ−λ. Weclaim that σ(Mϕ) = ess range(ϕ), where 5

ess range(ϕ) = λ ∈ C | ∀ε > 0 : µx : |ϕ(x)− λ| < ε > 0 .

Indeed, if λ /∈ ess range(ϕ), then there exists ε > 0 such that, if Aε = x : |ϕ(x)−λ| < ε,then µ(Aε) = 0. Hence, given f ∈H , ‖M1/(ϕ−λ)f‖22 =

∫X\Aε |

f(x)ϕ(x)−λ |

2 ≤ ε−2‖f‖22. Hence,Mϕ−λ is invertible, with inverse M1/(ϕ−λ), so λ /∈ σ(Mϕ). Conversely, if λ ∈ ess range(ϕ),define Sm = x : |ϕ(x) − λ| < 2−m and let χm = χSm . Then µ(Sm) > 0, so χm 6= 0.Moreover, ‖Mϕ−λχm‖22 =

∫Sm|ϕ(x) − λ|2 ≤ 2−2mµ(Sm) = 2−2m‖χm‖22. Hence, Mϕ−λ is

not bounded below, so not invertible. Thus, λ ∈ σ(Mϕ).

Lemma 4.30. Let H be a Hilbert space and T ∈ B(H ). If T and T ∗ are bounded below,then T is invertible.

Proof. By Lemma 4.28, it suffices to show that T (H ) is dense. But T (H )⊥ = kerT ∗.Indeed, y ∈ T (H )⊥ ⇐⇒ 〈y, Tx〉 = 0 for all x ∈ H ⇐⇒ 〈T ∗y, x〉 = 0 for allx ∈ H ⇐⇒ T ∗y = 0. Since T ∗ is bounded below, kerT ∗ = 0. Hence, T (H )⊥ = 0,so T (H ) is dense.

We saw in Lemma 4.14 that the eigenvalues of a bounded self-adjoint operator are real.We now have the following generalization.

Lemma 4.31. Let H be a Hilbert space and T ∈ B(H ).(a) If T is self-adjoint, then σ(T ) ⊆ R.(b) If T is unitary, then σ(T ) ⊆ λ ∈ C : |λ| = 1.

Proof. (a) By Lemma 4.30, it suffices to show that if | Imλ| 6= 0, then T−λ and (T−λ)∗ arebounded below. Let ‖y‖ = 1. Then ‖(T − λ)y‖ ≥ |〈(T − λ)y, y〉| = |〈Ty, y〉 − λ〈y, y〉|.Since 〈Ty, y〉 ∈ R by Lemma 4.14 and ‖y‖ = 1, then ‖(T − λ)y‖ ≥ | Imλ|. Thus,‖(T − λ)x‖ ≥ | Imλ| · ‖x‖ for all x ∈ H . Also, ‖(T − λ)∗x‖ = ‖(T − λ)x‖ ≥| Im λ| · ‖x‖ = | Imλ| · ‖x‖. Hence, T − λ and (T − λ)∗ are bounded below.

5. Let ϕ∗µ be the pushforward measure ϕ∗µ(B) = µ(ϕ−1(B)) for Borel B ⊆ C. Then λ is in the essentialrange of ϕ iff every neighborhood of λ has positive ϕ∗µ-measure. In general, ess range(ϕ) = ∩ψ=ϕ a.e.Ranψ.

If X is a compact subset of Rn, ϕ ∈ C(X) and µ is the Lebesgue measure, then ess range(ϕ) = Ran(ϕ).Indeed, if λ ∈ ess range(ϕ), then for each ε > 0, we have µx : |ϕ(x) − λ| < ε > 0. In particular,x : |ϕ(x) − λ| < ε 6= ∅. Thus, Bλ,ε ∩ Ran(ϕ) 6= ∅. Since ε > 0 is arbitrary, we get λ ∈ Ran(ϕ). ButRan(ϕ) is compact, hence closed, so ess range(ϕ) ⊆ Ran(ϕ). Conversely, if λ ∈ Ran(ϕ), then λ = ϕ(x0)for some x0. Let ε > 0. Since ϕ is continuous, there exists δ > 0 such that |ϕ(x0) − ϕ(x)| < ε whenever|x−x0|2 < δ. Hence, µx : |ϕ(x0)−ϕ(x)| < ε ≥ µx : |x−x0|2 < δ = cnδ

n > 0. Thus, λ ∈ ess range(ϕ).

Page 68: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

64 Chapter 4. Spectral theory

(b) It suffices to show that if |λ| 6= 1, then T − λ and (T − λ)∗ are bounded below.We have ‖(T − λ)x‖ ≥ | ‖Tx‖ − ‖λx‖ | = |1 − |λ| | · ‖x‖, since ‖Tx‖ = ‖x‖. Also,‖(T − λ)∗x‖ = ‖(T−1 − λ)x‖ ≥ |1− |λ| | · ‖x‖, since T−1 is unitary. Hence, T − λ and(T − λ)∗ are bounded below.

The following lemma is a special case of the spectral mapping theorem.

Lemma 4.32. Let X be a Banach space, T ∈ B(X) and p(t) a polynomial. Then

σ(p(T )) = p(σ(T )) .

Proof. If p(t) = c is constant, then p(T ) = cI and σ(cI) = c = p(σ(I)). So suppose p isnot constant. Given λ ∈ C, let µi be the roots of the polynomial p(t) − λ. Then wehave p(T ) − λ = a0

∏ni=1(T − µi), where a0 6= 0. If T − µi is invertible for each i, then

p(T ) − λ is invertible. Conversely, if T − µi0 is not invertible for some i0, then p(T ) − λis not invertible. Indeed, by the open mapping theorem, T − µi0 is either non-injectiveor non-surjective. If it is not injective, then p(T ) − λ = (T − µ1) · · · (T − µn)(T − µi0) isnot injective. If it is not surjective, then p(T )− λ = (T − µi0)(T − µ1) · · · (T − µn) is notsurjective. We thus showed that p(T )− λ is invertible iff each T − µi is invertible.

Hence, λ ∈ σ(p(T )) ⇐⇒ µi ∈ σ(T ) for some i. But µi ∈ σ(T ) implies λ ∈ p(σ(T )),since p(µi)− λ = 0. Conversely, λ ∈ p(σ(T )) implies λ = p(µ) for some µ ∈ σ(T ). So µ isa root of p(t)− λ, so µ = µi for some i. Hence, µi ∈ σ(T ). Conclusion: λ ∈ σ(p(T )) ⇐⇒λ ∈ p(σ(T )).

We now turn to the main result on the spectrum of a bounded operator.

Definition 4.33. Let T ∈ B(X). We definea) The resolvent set of T by ρ(T ) = C \ σ(T ).b) The spectral radius of T by ‖T‖σ = supλ∈σ(T ) |λ|.c) The resolvent operator of T by R(λ) = (T − λ)−1 for λ ∈ ρ(T ).

Let X be a Banach space and xi ∈ X. Recall that∑xi is absolutely convergent if∑

‖xi‖ < ∞. Since ‖∑mi=n xi‖ ≤

∑mi=n ‖xi‖, then absolute convergence implies conver-

gence by the Cauchy criterion.

Theorem 4.34. (i) If T ∈ B(X) and ‖T‖ < 1, then I − T is invertible. Moreover,(I − T )−1 =

∑∞n=0 T

n and ‖(I − T )−1‖ ≤ 11−‖T‖ .

(ii) The set of invertible operators is open in B(X).(iii) If |λ| > ‖T‖, then λ ∈ ρ(T ), R(λ) = −

∑∞n=0 λ

−n−1Tn and ‖R(λ)‖ ≤ 1|λ|−‖T‖ .

(iv) (Resolvent identity). For all λ, µ ∈ ρ(T ) we have R(λ)−R(µ) = (λ− µ)R(λ)R(µ).(v) ρ(T ) is open, and the resolvent R(λ) is an analytic operator-valued function. That

is, any λ ∈ ρ(T ) has a small neighborhood in which R(µ) can be expressed as aconvergent power series R(µ) =

∑∞n=1(µ− λ)n−1R(λ)n.

(vi) σ(T ) is a compact non-empty set with ‖T‖σ ≤ ‖T‖.

Proof. (i) Since ‖T‖ < 1, the series∑Tn is absolutely convergent, hence convergent,

say to S. Let SN =∑N−1n=0 T

n. Then (I − T )SN = SN (I − T ) = I − TN . LettingN →∞ we obtain S = (I − T )−1. Finally, ‖(I − T )−1‖ ≤

∑∞n=0 ‖T‖n = 1

1−‖T‖ .

(ii) If T is invertible and S ∈ B(X) satisfies ‖T − S‖ < ‖T−1‖−1, then ‖ST−1 − I‖ ≤‖S − T‖‖T−1‖ < 1, so ST−1 = I − (I − ST−1) is invertible by (i). Hence, S isinvertible, so the set of invertible operators is open.

Page 69: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

4.4. The spectrum of a bounded operator 65

(iii) T − λ = −λ(I − Tλ ). Since ‖Tλ ‖ < 1, R(λ) exists by (i) and equals −λ−1∑∞

n=0(Tλ )n.The norm is bounded as in (i).

(iv) (λ−µ)R(λ)R(µ) = R(λ)(λ−µ)R(µ) = R(λ)[(T −µ)− (T −λ)]R(µ) = R(λ)−R(µ).(v) Let λ ∈ ρ(T ) and µ ∈ C. Then T −µ = (T −λ)− (µ−λ) = (T −λ)(I− (µ−λ)R(λ)).

So by (i), (T − µ) is invertible if ‖(µ − λ)R(λ)‖ < 1, i.e. |µ − λ| < ‖R(λ)‖−1. Thisshows that ρ(T ) is open. By (i), R(µ) = R(λ)

∑∞n=0(µ− λ)nR(λ)n for such µ.

(vi) By (iii), if λ ∈ σ(T ), then |λ| ≤ ‖T‖. Hence, ‖T‖σ ≤ ‖T‖. In particular, σ(T ) isbounded, and it is closed by (v). Suppose that σ(T ) = ∅. Then R(λ) is analyticon all C by (v). We claim it is also bounded. Indeed, by continuity, it is boundedon the compact set λ : |λ| ≤ 2‖T‖, and if |λ| > 2‖T‖, then by (iii), we have‖R(λ)‖ ≤ 1

|λ|−‖T‖ ≤1‖T‖ . Fix f ∈ B(X)∗. We thus showed that f(R(λ)) is a bounded

entire function, so by Liouville’s theorem, f(R(λ)) is constant. But |f(R(λ))| ≤‖f‖ · ‖R(λ)‖ ≤ ‖f‖

|λ|−‖T‖ → 0 as |λ| → ∞. Hence, f(R(λ)) = 0, since it is constant.Since this holds for all f ∈ B(X)∗, then by Hahn-Banach, R(λ) = 0. This contradictsthe fact that R(λ) is invertible, so σ(T ) 6= ∅.

Theorem 4.35. Let X be a Banach space and T ∈ B(X). Then

‖T‖σ = limn→∞

‖Tn‖1/n = infn∈N‖Tn‖1/n .

Proof. Upper bound. If λ ∈ σ(T ), then λn ∈ σ(Tn) by Lemma 4.32, so |λn| ≤ ‖Tn‖σ ≤‖Tn‖ by Theorem 4.34(v). Hence, |λ| ≤ ‖Tn‖1/n. Thus, ‖T‖σ ≤ infn ‖Tn‖1/n.

Lower bound. Fix f ∈ B(X)∗. By Theorem 4.34(v), f(R(λ)) is analytic on theannulus |λ| > ‖T‖σ. And by Theorem 4.34(iii), f(R(λ)) = −

∑∞n=0 λ

−n−1f(Tn) inthe smaller annulus |λ| > ‖T‖. By the theory of Laurent series from complex anal-ysis, we also have f(R(λ)) = −

∑∞n=0 λ

−n−1f(Tn) in the larger annulus |λ| > ‖T‖σ.In particular, supn |λ−n−1f(Tn)| < ∞. By the uniform boundedness principle, it fol-lows that supn ‖λ−n−1Tn‖ =: K < ∞. Hence, ‖Tn‖1/n ≤ K1/n|λ|1+1/n for all n. Solim ‖Tn‖1/n ≤ |λ|. Since this holds for all λ such that |λ| > ‖T‖σ, we have proved thatlim ‖Tn‖1/n ≤ ‖T‖σ. Putting this together with the upper bound, we get

‖T‖σ ≤ infn‖Tn‖1/n ≤ lim ‖Tn‖1/n ≤ lim ‖Tn‖1/n ≤ ‖T‖σ .

This completes the proof.

Corollary 4.36. If H is a Hilbert space and T ∈ B(H ) is normal, then ‖T‖σ = ‖T‖.More generally, ‖p(T )‖ = maxλ∈σ(T ) |p(λ)| for any polynomial p(t).

Proof. As T ∗T is self-adjoint, ‖T‖2 = sup‖x‖≤1 |〈Tx, Tx〉| = sup‖x‖≤1 |〈T ∗Tx, x〉| = ‖T ∗T‖by Lemma 4.14. But ‖T ∗T‖2 = sup‖x‖≤1 |〈T ∗Tx, T ∗Tx〉| = sup‖x‖≤1 |〈T 2x, T 2x〉| =‖T 2‖2, since TT ∗ = T ∗T . Hence, ‖T‖2 = ‖T 2‖. By induction, ‖T‖n = ‖Tn‖ forn = 2, 22, 23, . . .. Thus, using Theorem 4.35, ‖T‖σ = limn ‖Tn‖1/n = limk ‖T 2k‖1/2k =limk ‖T‖ = ‖T‖.

Next, p(T ) is normal by induction, so using Lemma 4.32, we get ‖p(T )‖ = ‖p(T )‖σ =maxµ∈σ(p(T )) |µ| = maxµ∈p(σ(T )) |µ| = maxλ∈σ(T ) |p(λ)|.

Example 4.37. Let H be the space of 2× 2 complex matrices.

(a) If T =(λ1 00 λ2

)for λi ∈ C, then T is normal. We have σ(T ) = λ1, λ2, so

‖T‖σ = max|λ1|, |λ2| = ‖T‖. Note that Tn =(λn1 00 λn2

), so ‖Tn‖1/n = ‖T‖.

Page 70: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

66 Chapter 4. Spectral theory

(b) Let T =(

1 λ0 1

)with λ 6= 0. Then T is not normal. We have σ(T ) = 1, ‖T‖σ = 1

but ‖T‖ = (1 + |λ|2)1/2. Here Tn =(

1 nλ0 1

), so ‖Tn‖ = (1 + |nλ|2)1/2. Hence,

‖Tn‖1/n → 1, as asserted in Theorem 4.35.

4.5 Continuous functional calculusDefinition 4.38. Let H be a Hilbert space, suppose T ∈ B(H ) is self-adjoint, and letf ∈ C(σ(T )). Since σ(T ) is a compact subset of R, by the Weierstrass approximation,we may find polynomials pn(t) such that pn(t) → f(t) uniformly on σ(T ). We define theoperator f(T ) to be the limit in B(H ) of the operators pn(T ).

Lemma 4.39. f(T ) is well-defined. That is, the sequence pn(T ) converges, and its limitis independent of the choice of the approximating polynomials.

Proof. Using Corollary 4.36, ‖pn(T )−pm(T )‖ = ‖(pn−pm)(T )‖ = maxλ∈σ(T )

|(pn−pm)(λ)| →

0 as n,m → ∞, since pn(t) converges uniformly. Hence, pn(T ) is Cauchy, and con-verges by the completeness of B(H ).

If qn(t) is another approximating sequence of polynomials, then repeating the previousargument we get ‖qn(T )− pn(T )‖ → 0, hence lim qn(T ) = lim pn(T ) = f(T ).

Definition 4.40. Let H be a Hilbert space and T ∈ B(H ). We say that T is positive,and denote T ≥ 0, if 〈Tx, x〉 ≥ 0 for all x ∈H . We say that T ≥ S if T − S ≥ 0.

Theorem 4.41. Let H be a Hilbert space, T ∈ B(H ) self-adjoint and f ∈ C(σ(T )).(a) The map f 7→ f(T ) is an algebraic ∗-homomorphism from C(σ(T )) → B(H ). That

is, (fg)(T ) = f(T )g(T ), (αf)(T ) = αf(T ), 1(T ) = id and f(T ) = f(T )∗.(b) f(T ) is invertible iff f(λ) 6= 0 for all λ ∈ σ(T ).(c) (Spectral mapping theorem). σ(f(T )) = f(σ(T )).(d) ‖f(T )‖ = ‖f‖∞.(e) If Tx = λx, then f(T )x = f(λ)x.(f) If f ≥ 0, then f(T ) ≥ 0.(g) If TS = ST , then f(T )S = Sf(T ).

Proof. (a), (e) and (g) are clear for f, g polynomials, and follow by passing to the limit.For (b), suppose first that f(λ) 6= 0 on σ(T ). Then g = 1/f ∈ C(σ(T )). Moreover,f(t)g(t) = g(t)f(t) = 1, so by (a), f(T )g(T ) = g(T )f(T ) = id, hence f(T ) is invertible.Conversely, suppose f(µ) = 0 for some µ ∈ σ(T ). Let pn be a sequence of polynomialsconverging uniformly to f . We may assume pn(µ) = f(µ) = 0 by considering the sequenceqn(t) = pn(t)−pn(µ). Since µ ∈ σ(T ), then 0 = pn(µ) ∈ σ(pn(T )) by Lemma 4.32. Hence,pn(T ) is not invertible. But the set of non-invertible operators is closed by Theorem 4.34.Hence, f(T ) is not invertible.

For (c), we have µ ∈ σ(f(T )) iff f(T ) − µ is not invertible. Applying (b) for f − µ,this happens iff f − µ has a zero in σ(T ), i.e. iff µ ∈ f(σ(T )).

By (a), f(T ) is normal, so using Corollary 4.36 and (c), we have ‖f(T )‖ = ‖f(T )‖σ =maxµ∈σ(f(T )) |µ| = maxµ∈f(σ(T )) |µ| = maxλ∈σ(T ) |f(λ)| = ‖f‖∞.

For (f), note that f ≥ 0 implies f = g2 for some real-valued g ∈ C(σ(T )), so f(T ) =g(T )2 by (a). Moreover, g(T ) is self-adjoint by (a), so 〈f(T )x, x〉 = ‖g(T )x‖2 ≥ 0.

Page 71: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

4.6. Borel functional calculus 67

Remark 4.42. The previous functional calculus can be generalized to normal operators.For this, one first shows that for polynomials p in z and z ∈ C, we have σ(p(T, T ∗)) =p(z, z) : z ∈ σ(T ), deduce that ‖p(T, T ∗)‖ = maxz∈σ(T ) |p(z, z)|, then define f(T ) andfind its properties as we did. The reason for this detour is that now σ(T ) can be complex,so the polynomials p(z) are no longer dense in C(K). To apply Stone-Weierstrass, wemust consider polynomials p(z, z).

The functional calculus can be used to define the operator√T for any positive self-

adjoint operator T ∈ B(H ). It turns out to be the unique operator satisfying (√T )2 = T .

One can then define the modulus |T | =√T ∗T for any T ∈ B(H ), and it can be shown

that T has a polar decomposition T = U |T |. If T is invertible, then U is unitary.

4.6 Borel functional calculusThe functional calculus admits an important generalization to the set B(σ(T )) of

bounded Borel functions. The great advantage is that one can now consider characteristicfunctions χA, and it turns out that χA(T ) is an orthogonal projection. The proof is a bitmore difficult because Borel functions are not uniform limits of continuous functions.

Theorem 4.43. Let H be a Hilbert space, T ∈ B(H ) self-adjoint and f ∈ B(σ(T )).One can define an operator f(T ) ∈ B(H ) such that(a) The map f 7→ f(T ) if an algebraic ∗-homomorphism from B(σ(T ))→ B(H ).(b) ‖f(T )‖ ≤ ‖f‖∞.(c) If f(t) = t, then f(T ) = T .(d) If fn, f ∈ B(σ(T )) satisfy supn ‖fn‖∞ < ∞, and fn → f pointwise, then fn(T ) →

f(T ) strongly in B(H ).(e) If Tx = λx, then f(T )x = f(λ)x.(f) If f ≥ 0, then f(T ) ≥ 0.(g) If TS = ST , then f(T )S = Sf(T ).

Proof. Fix x, y ∈H . Define a functional Fx,y on C(σ(T )) by Fx,y(f) = 〈f(T )x, y〉. ThenFx,y is bounded: |Fx,y(f)| ≤ ‖f(T )‖‖x‖‖y‖ = ‖f‖∞‖x‖‖y‖. Hence, Fx,y ∈ C(σ(T ))∗ and‖Fx,y‖ ≤ ‖x‖‖y‖. By the Riesz-Markov representation, we may find a unique Borel regularmeasure µx,y on σ(T ) with total variation |µx,y|(σ(T )) ≤ ‖x‖‖y‖, such that Fx,y(f) =∫f dµx,y, i.e. 〈f(T )x, y〉 =

∫σ(T ) f(λ) dµx,y(λ) for any f ∈ C(σ(T )). We extend this to

Borel functions by defining (Bf)(x, y) =∫σ(T ) f(λ) dµx,y(λ) for f ∈ B(σ(T )). Then B is

linear in x, conjugate linear in y, and |(Bf)(x, y)| ≤ ‖f‖∞|µx,y|(σ(T )) ≤ ‖f‖∞‖x‖‖y‖.Thus, B is a bounded sesquilinear form, so by the Riesz representation theorem of Hilbertspaces, there exists a unique operator, call it f(T ), such that (Bf)(x, y) = 〈f(T )x, y〉, and‖f(T )‖ = ‖B‖ ≤ ‖f‖∞. We thus defined f(T ) for all f ∈ B(σ(T )) and verified (b).

If f(t) = t, then 〈f(T )x, y〉 = (Bf)(x, y) =∫σ(T ) f(λ) dµx,y(λ) =

∫σ(T ) λdµx,y(λ) =

〈Tx, y〉. This holds for all x, y, so (c) is proved. Similarly, (αf)(T ) = αf(T ) and 1(T ) = id.Let fn, f as in (d). By dominated convergence, 〈fn(T )x, y〉 =

∫σ(T ) fn(λ) dµx,y(λ) →∫

σ(T ) f(λ) dµx,y(λ) = 〈f(T )x, y〉 for x, y ∈H . Thus, fn(T ) converges weakly to f(T ). Weprove strong convergence later.

We now prove (fg)(T ) = f(T )g(T ). Fix g ∈ C(σ(T )) and let V = f ∈ B(σ(T )) :(fg)(T ) = f(T )g(T ). Then V ⊃ C(σ(T )) by Theorem 4.41. Moreover, V is a vectorspace since [(αf1 + βf2)g](T ) = α(f1g)(T ) + β(f2g)(T ). To see this equality, note that

Page 72: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

68 Chapter 4. Spectral theory

〈[αh1 + βh2](T )x, y〉 =∫

[αh1 + βh2] dµx,y = α∫h1 dµx,y + β

∫h2 dµx,y = 〈[αh1(T ) +

βh2(T )]x, y〉 for any x, y. Clearly, the constant function 1 ∈ V . Finally, suppose fn ∈ Vsatisfy supn ‖fn‖∞ <∞ and fn → f pointwise. Then 〈(fg)(T )x, y〉 = lim〈(fng)(T )x, y〉 =lim〈fn(T )g(T )x, y〉 = 〈f(T )g(T )x, y〉 by weak convergence. This holds for any x, y, so(fg)(T ) = f(T )g(T ). Hence, f ∈ V . By the monotone class theorem, we thus haveV = B(σ(T )). Now fix f ∈ B(σ(T )) and let V ′ = g ∈ B(σ(T )) : (fg)(T ) = f(T )g(T ).Then V ′ is again a vector space, 1 ∈ V ′, and V ′ ⊃ C(σ(T )) by what we just proved. Ifgn ∈ V ′, sup ‖gn‖∞ <∞ and gn → g pointwise, then 〈(fg)(T )x, y〉 = lim〈(fgn)(T )x, y〉 =lim〈f(T )gn(T )x, y〉 = lim〈gn(T )x, f(T )∗y〉 = 〈g(T )x, f(T )∗y〉 = 〈f(T )g(T )x, y〉. Hence,(fg)(T ) = f(T )g(T ). Using monotone class again, this completes the proof. Similarly weprove f(T ) = f(T )∗ and items (e) and (g). Moreover, (a) implies (f) as before.

We finally prove strong convergence for fn, f as in (d). We know |fn−f |2(T ) convergesweakly to 0, so using (a) we get ‖(fn − f)(T )(x)‖2 = 〈(fn − f)(T )x, (fn − f)(T )x〉 =〈|fn − f |2(T )x, x〉 → 0, as asserted.

Definition 4.44. Let T ∈ B(H ) be self-adjoint and let x, y ∈H . We define the spectralmeasure µx,y of T to be the unique regular Borel measure satisfying

〈f(T )x, y〉 =∫σ(T )

f(λ) dµx,y(λ)

for every f ∈ B(σ(T )). As shown in the previous proof, such a measure exists, and has atotal variation |µx,y|(σ(T )) ≤ ‖x‖‖y‖. We denote µx = µx,x.

Example 4.45. Let Mϕ be the multiplication operator of Example 4.29 and let f ∈B(σ(Mϕ)). Then f(Mϕ) = Mfϕ. To see this, let V = f ∈ B(σ(Mϕ)) : f(Mϕ) = Mfϕ.Clearly, V contains polynomials. If f ∈ C(σ(Mϕ)), let pn be polynomials converginguniformly to f . Then ‖Mpnϕψ−Mfϕψ‖L2 = ‖(pn ϕ)ψ−(f ϕ)ψ‖ = ‖[(pn−f)ϕ]ψ‖ ≤‖pn − f‖∞‖ψ‖L2 , so ‖Mpnϕ −Mfϕ‖ ≤ ‖pn − f‖∞ → 0, hence Mfϕ = limMpnϕ =lim pn(Mϕ) = f(Mϕ), where the last equality holds by definition of f(Mϕ). We thusshowed that V ⊃ C(σ(Mϕ)). Clearly, V is a vector space containing the constant function1. If fn ∈ V , fn ≥ 0, supn ‖fn‖∞ < ∞ and fn ↑ f pointwise, then by Theorem 4.43,f(Mϕ)ψ = lim fn(Mϕ)ψ = limMfnϕψ = lim[fn ϕ]ψ = [f ϕ]ψ = Mfϕψ. Here‖[fn ϕ]ψ − [f ϕ]ψ‖L2 → 0 by monotone convergence. As ψ ∈ L2 is arbitrary, we getf(Mϕ) = Mfϕ, i.e. f ∈ V . Using monotone class, we thus get V = B(σ(Mϕ)).

Definition 4.46. Let T be self-adjoint. We define the spectral projection PT : B(σ(T ))→B(H ) by PT (E) = χE(T ), where χE is the characteristic function of E.

Lemma 4.47 (Projection-valued measure). PT (E) is an orthogonal projection for anyE ∈ B(σ(T )). Moreover,(i) PT (∅) = 0, PT (σ(T )) = id,(ii) For any Borel partition σ(T ) = ∪∞1 Ek, we have id =

∑∞1 PT (Ek). Here convergence

is in the strong sense.

Proof. As χE(t)2 = χE(t) and χE(t) is real-valued, then by Theorem 4.43(a), PT (E)2 =PT (E) and PT (E) is self-adjoint. Hence, PT (E) is a projection.

Let x, y ∈ H . Then 〈χ∅(T )x, y〉 =∫χ∅(λ) dµx,y(λ) = 0, so PT (∅) = 0. Similarly,

χσ(T )(T ) = 1(T ) = id. If fn(t) =∑n

1 χEk(t), then ‖fn‖∞ = 1 since the Ek are disjoint,and fn(t)→ χσ(T ) pointwise since ∪Ek = σ(T ). Hence,

∑n1 PT (Ek)→ id strongly.

Page 73: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

4.7. Further results 69

Theorem 4.48 (Spectral Theorem III). Let H be a Hilbert space and let T ∈ B(H ) beself-adjoint. Then for any f ∈ B(R), we have

f(T ) =∫σ(T )

f(λ) dPT (λ) ,

in the sense that 〈f(T )x, x〉 =∫σ(T ) f(λ) d〈PT (λ)x, x〉 for any x ∈H .

If T is a compact self-adjoint operator, then it has an orthonormal basis of eigenvectors.If λk are its eigenvalues and if Pk is the orthogonal projection onto Hk = x : Tx = λkx,then T =

∑k λkPk, as the reader can check. If T is only bounded, the spectrum is not

discrete in general, and this is why the sum gets replaced by an integral.

Proof. We have 〈PT (E)x, x〉 =∫σ(T ) χE(λ) dµx(λ) = µx(E) for any Borel E ⊂ σ(T ). So

the theorem follows from the functional calculus.

4.7 Further resultsEach section of this chapter could be considerably expanded. Compact operators

have a rich theory that extends beyond spectral theory, especially in connection with theoperator ideals Ip we briefly mentioned. See [29] for an elementary treatment and [35]for the advanced theory. The are also many results in the spectral theory of compactoperators which we did not cover. A famous one is the Fredholm alternative, which saysthat if X is a Banach space, T ∈ K(X) and λ ∈ C∗, then exactly one of the followingstatements is true:(i) either the homogeneous equation λx− Tx = 0 has a nontrivial solution,(ii) or the inhomogeneous equation λx−Tx = b has a solution for every b ∈ X, and this

solution is automatically unique.This alternative is very useful for studying integral equations. In this case, the homo-geneous equation is λf(t) −

∫ 10 k(t, s)f(s) ds = 0, while the inhomogeneous equation is

λf(t)−∫ 1

0 k(t, s)f(s) ds = b(t). Quite interestingly, this alternative also implies that if Tis a compact operator on an infinite-dimensional Banach space, then σ(T ) = σp(T )∪ 0,where σp(T ) are the eigenvalues of T . This in turn can be used to prove the spectraltheorem for compact self-adjoint operators. For details, see [38] and [28].

The spectrum can be defined in any Banach algebra, not just the algebra of operators.Many results generalize without difficulty to this setting. To speak about adjoints, oneneeds to add some structure to the Banach algebra (namely, an involution) and specify itsrelation with the norm. This gives rise to C∗-algebras. An important result in this theoryis the Gelfand-Naimark theorem for commutative C∗-algebras. We shall not state it hereto avoid too many definitions, but we mention that it can be used to prove a generalizedspectral theorem, which yields Theorem 4.48 for normal operators as a special case. An-other famous theorem in this theory is the Gelfand-Naimark-Segal (GNS) construction,which is perhaps more algebraic in nature, and states that every C∗-algebra can be re-garded as a C∗-subalgebra of B(H ) for some Hilbert space H . For these results, see [32]and [26].

The spectral theorem for bounded self-adjoint operators is hailed as “one of the mostimportant mathematical achievements of all times” in [20]. It has another version whichgoes as follows: any self-adjoint T ∈ B(H ) is unitarily equivalent to a multiplicationoperator. More precisely, there is a measure space (X,µ) and ϕ ∈ L∞(X) such that T andMϕ are unitarily equivalent. This theorem can be refined to give a complete classification

Page 74: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

70 Chapter 4. Spectral theory

of self-adjoint operators up to unitary equivalence. The reader can find these results in[28] and [20].

The Borel functional calculus is a special feature of normal, in particular self-adjointoperators. However, we can define a functional calculus for any bounded operator, ifwe consider the smaller set of holomorphic functions. For this, see [32]. Finally, theBorel functional calculus and the spectral theorem can be extended to unbounded self-adjoint operators. Such an extension is really important in mathematical physics, sincethe Hamiltonian is an unbounded operator. This generalization is not exactly “provedsimilarly” for many reasons. One of them is that we used the compactness of σ(T ) todefine the continuous functional calculus and then the Borel calculus. For unboundedoperators, σ(T ) is no longer compact, so one proceeds differently. There are two ways todo this: either prove the spectral theorem for unbounded operators from scratch, or proveit using the spectral theorem for bounded operators. For the first approach, see [39], forthe second one see [28] and [34]. Here is the result:

Theorem 4.49. To every self-adjoint operator T on a Hilbert space H , we can find aunique projection-valued measure PT such that T =

∫σ(T ) λdPT (λ).

If µx(E) := 〈PT (E)x, x〉, then we can define an operator f(T ) for any Borel functionf : R→ C by

D(f(T )) = x ∈H : f ∈ L2(R, µx) , f(T ) =∫σ(T )

f(λ) dPT (λ) .

This operator has the following properties:(a) idR(T ) = T .(b) For x ∈ D(f(T )), we have ‖f(T )x‖2 =

∫|f(λ)|2 dµx(λ).

(c) f(T )∗ = f(T ). In particular, f(T ) is a normal operator, it is self-adjoint if f isreal-valued, it is positive if f ≥ 0, and it is unitary if |f | = 1.

(d) If f and fg are bounded, then (fg)(T ) = f(T )g(T ).(e) If λ ∈ C \ R, then 1

T−λ = R(λ).(f) If Tx = λx, then f(T )x = f(λ)x.

4.8 Exercises1. Let X,Y be Banach spaces, T : X → Y . Show that the following assertions are

equivalent:(i) T is compact.(ii) T (X1) is compact. Here X1 is the closed unit ball of X.(iii) Any bounded sequence (xn) in X has a subsequence (xnk) such that (Txnk)

converges.2. Define T : `p → `p by T : (xj)∞1 7→ (xj/j)∞1 . Show that T is compact if

(i) 1 ≤ p <∞,(ii) p =∞.

3. Suppose A,B ∈ B(H ) and AB is compact. Which statements are true?(a) Both A and B are compact.(b) At least A or B is compact.

Page 75: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

4.8. Exercises 71

4. Which of the following statements are true?(a) There exists a compact operator with a closed image.(b) The image of any compact operator is closed.(c) The image of any compact operator is not closed.(d) There exists a compact operator with a non-closed image.(e) There exists a compact operator with a finite dimensional kernel.(f) The kernel of any compact operator is finite dimensional.

5. Which of the following operators T : L2[a, b]→ L2[a, b] has a finite rank ?(a) (Tf)(t) =

∑nj=1 ϕj(t)

∫ ba ψj(s)f(s) ds, where ϕj , ψj ∈ L2[a, b].

(b) (Tf)(t) =∫ ta f(s) ds.

6. Consider the Hilbert space `2 and let T : `2 → `2 be a linear operator defined by amatrix (ai,j)∞i,j=1.(i) Suppose ai,j = bi,jci,j , with supi

∑j |bi,j |2 ≤ C2

1 and supj∑i |ci,j |2 ≤ C2

2 . Showthat T is bounded, with ‖T‖ ≤ C1C2.

(ii) Suppose the ai,j satisfy supi∑j |ai,j | ≤ C2

1 and supj∑i |ai,j | ≤ C2

2 . Show that Tis bounded, with ‖T‖ ≤ C1C2.

(iii) Suppose T is defined by the matrix

T =

a1 a2 a3 · · ·a2 a3 · · ·a3 · · ·· · ·

and suppose

∑∞j=1 |aj | <∞. Show that T is compact.

7. Let H be a separable Hilbert space and T ∈ B(H ) a compact operator.(i) Suppose T is self-adjoint. Show that there exist an orthonormal basis en of H

and a sequence λn ⊂ R such that for each x ∈H , we have x =∑〈x, ek〉ek and

Tx =∑λk〈x, ek〉ek.

(ii) Show that even if T is not self-adjoint, there exist orthonormal sets en and fnand a sequence µn ⊂ R+ such that for each x ∈ H , we have x =

∑〈x, ek〉ek,

and Tx =∑µk〈x, ek〉fk.

Hint: let en correspond to T ∗T as in (i), show that the corresponding λn mustbe positive, then take µn =

√λn.

8. Let (X,µ) be a σ-finite measure space and let K ∈ L2(X × X). Define the inte-gral operator TK : L2(µ) → L2(µ) by (TKf)(x) =

∫X K(x, y)f(y) dµ(y). Show that

(T ∗Kf)(x) =∫X K(y, x)f(y) dµ(y).

9. Let TK : L2[0, 1] → L2[0, 1] be the integral operator with kernel K(x, y) = min(x, y)for 0 ≤ x, y ≤ 1.(a) Show that TK is a compact self-adjoint operator.(b) Suppose TKf = λf , where 0 6= f ∈ L2 and λ ∈ R. Differentiate this equation twice

to obtain a differential equation for f . Obtain the boundary conditions for f andf ′ from the integral equations.

(c) Multiply the differential equation of f by f and integrate from 0 to 1. Deduce thatλ > 0.

Page 76: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

72 Chapter 4. Spectral theory

(d) Find all the eigenvalues of TK .10. Let TK : L2[0, 1] → L2[0, 1] be the integral operator with K(x, y) = χA(x, y), the

characteristic function of A = (x, y) : y ≥ −x+ 1. Find all the eigenvalues of TK .11. Let T : `2 → `2 be the right shift T : (x1, x2, . . .) 7→ (0, x1, x2, . . .). Show that 0 is not

an eigenvalue of T , but 0 ∈ σ(T ).12. Let T : `2 → `2 be defined by T : (xj)∞1 7→ (αjxj)∞1 , where (αj) is dense in [0, 1]. Find

the eigenvalues of T , and σ(T ).13. Let T : `∞ → `∞ be the left shift (x1, x2, . . .) 7→ (x2, x3, . . .).

(a) If |λ| > 1, show that λ ∈ ρ(T ).(b) If |λ| ≤ 1, show that λ is an eigenvalue and find the eigenspace Xλ.

14. Let T : `p → `p be the left shift (x1, x2, . . .) 7→ (x2, x3, . . .). Suppose 1 ≤ p < ∞. If|λ| = 1, is λ an eigenvalue of T ?

15. Let T be a compact self-adjoint operator on a Hilbert space H . Show that σ(T ) =σp(T ), where σp(T ) is the set of eigenvalues of T .

16. For the integral operator of Exercise 9, calculate(i) ‖TK‖ (Hint: calculate ‖TK‖σ).(ii) ‖K‖L2(X×X)

(iii) ‖TK‖2 by two methods: first using (ii), next using the definition of the Hilbert-Schmidt norm, with an orthonormal basis of eigenvectors.

17. Same question as Exercise 16, for the integral operator of Exercise 10.18. Let H be a Hilbert space and let T ∈ B(H ) be self-adjoint. Show that σ(T ) ⊆ [m,M ],

wherem = inf

‖x‖=1〈Tx, x〉 and M = sup

‖x‖=1〈Tx, x〉 .

Hint: Show that if λ = M + c, c > 0, then T − λ is bounded below. Argue similarly ifλ = m− c.

19. Let H be a Hilbert space.(i) Show that if T ≥ 0, then I + T is invertible.(ii) Show that if T ∈ B(H ), then I + T ∗T is invertible.

20. (McCarthy inequalities). Let T be a positive self-adjoint operator on H and x ∈ H .Show that(i) If α ≥ 1, then 〈Tαx, x〉 ≥ 〈Tx, x〉α‖x‖2−2α.(ii) If α ∈ (0, 1], then 〈Tαx, x〉 ≤ 〈Tx, x〉α‖x‖2−2α.

Hint: for (i), apply the Hölder inequality to 〈Tx, x〉 =∫σ(T ) λdµx(λ). For (ii), apply

(i) to Tα and α−1.These inequalities remain true for unbounded operators, if x ∈ D(T ).

21. (Stone’s formula). Let T be a self-adjoint operator. Show that for all ψ ∈H we have

limε↓0

12πi

∫ b

a

[(T − s− iε)−1ψ − (T − s+ iε)−1ψ

]ds = 1

2[χ[a,b](T )ψ + χ(a,b)(T )ψ

].

Hint: let fε(λ) = 12πi∫ ba

( 1λ−s−iε −

1λ−s+iε

)ds. Show that

limε↓0

fε(λ) =

0 if λ /∈ [a, b],12 if λ = a or λ = b,

1 if λ ∈ (a, b)..

Page 77: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

4.8. Exercises 73

Then use the fact that pointwise convergence of fε(λ) implies strong convergence offε(T ).

22. (One-parameter groups). Let X be a Banach space and suppose that to every t ∈ R,there is an associated operator Q(t) such that(i) Q(0) = id,(ii) Q(s+ t) = Q(s)Q(t) for all s, t ∈ R,(iii) limt→0 ‖Q(t)x− x‖ = 0 for every x ∈ X.Then we say that Q(t)t∈R is a strongly continuous one-parameter group.Let A be a self-adjoint operator on a Hilbert space H . Show that the family U(t)t∈Rgiven by U(t) = e−itA is a strongly continuous one-parameter group of unitary opera-tors.Show moreover that the map t 7→ U(t) is strongly differentiable at t = 0, with derivative−iA. That is, show that limt→0 ‖U(t)ψ−ψ

t − (−iAψ)‖ = 0 for any ψ ∈H .Hint: for differentiability, calculate ‖U(t)ψ−ψ

t − (−iAψ)‖2 using µψ and use dominatedconvergence.

Page 78: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions
Page 79: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

Bibliography

[1] Robert A. Adams and John J. F. Fournier. Sobolev spaces, volume 140 of Pure andApplied Mathematics (Amsterdam). Elsevier/Academic Press, Amsterdam, secondedition, 2003.

[2] S. Attal. Lectures on Quantum Noise Theory. author’s homepage, Institut CamilleJordan.

[3] Rabi Bhattacharya and Edward C. Waymire. A basic course in probability theory.Universitext. Springer, New York, 2007.

[4] Haim Brezis. Functional analysis, Sobolev spaces and partial differential equations.Universitext. Springer, New York, 2011.

[5] N. L. Carothers. A short course on Banach space theory, volume 64 of London Math-ematical Society Student Texts. Cambridge University Press, Cambridge, 2005.

[6] J.-F. Colombeau. Multiplication of distributions. Bull. Amer. Math. Soc. (N.S.),23(2):251–268, 1990.

[7] John B. Conway. A course in functional analysis, volume 96 of Graduate Texts inMathematics. Springer-Verlag, New York, second edition, 1990.

[8] Anton Deitmar. A first course in harmonic analysis. Universitext. Springer-Verlag,New York, second edition, 2005.

[9] Nelson Dunford and Jacob T. Schwartz. Linear Operators. I. General Theory. Withthe assistance of W. G. Bade and R. G. Bartle. Pure and Applied Mathematics, Vol.7. Interscience Publishers, Inc., New York; Interscience Publishers, Ltd., London,1958.

[10] Yuli Eidelman, Vitali Milman, and Antonis Tsolomitis. Functional analysis, volume 66of Graduate Studies in Mathematics. American Mathematical Society, Providence,RI, 2004. An introduction.

[11] Gregory Eskin. Lectures on linear partial differential equations, volume 123 of Grad-uate Studies in Mathematics. American Mathematical Society, Providence, RI, 2011.

[12] Lawrence C. Evans. Partial differential equations, volume 19 of Graduate Studies inMathematics. American Mathematical Society, Providence, RI, second edition, 2010.

[13] P. J. Fitzsimmons. Math 280c probability theory. author’s homepage, University ofCalifornia.

[14] Gerald B. Folland. Real analysis. Pure and Applied Mathematics (New York). JohnWiley & Sons, Inc., New York, second edition, 1999. Modern techniques and theirapplications, A Wiley-Interscience Publication.

[15] T. Gantumur. Introduction to Distributions. author’s homepage, McGill University.[16] Julio González-Díaz, Ignacio García-Jurado, and M. Gloria Fiestras-Janeiro. An

introductory course on mathematical game theory, volume 115 of Graduate Studies

Page 80: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

76 Bibliography

in Mathematics. American Mathematical Society, Providence, RI; Real SociedadMatemática Española, Madrid, 2010.

[17] A. Grothendieck. Résumé de la théorie métrique des produits tensoriels topologiques.Bol. Soc. Mat. São Paulo, 8:1–79, 1953.

[18] Alexandre Grothendieck. Produits tensoriels topologiques et espaces nucléaires. Mem.Amer. Math. Soc., 1955(16):140, 1955.

[19] Gerd Grubb. Distributions and operators, volume 252 of Graduate Texts in Mathe-matics. Springer, New York, 2009.

[20] A. Ya. Helemskii. Lectures and exercises on functional analysis, volume 233 of Trans-lations of Mathematical Monographs. American Mathematical Society, Providence,RI, 2006. Translated from the 2004 Russian original by S. Akbarov.

[21] Lars Hörmander. The analysis of linear partial differential operators. I–IV, volume256 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles ofMathematical Sciences]. Springer-Verlag, Berlin, second edition, 1990.

[22] Erwin Kreyszig. Introductory functional analysis with applications. John Wiley &Sons, New York-London-Sydney, 1978.

[23] Giovanni Leoni. A first course in Sobolev spaces, volume 105 of Graduate Studies inMathematics. American Mathematical Society, Providence, RI, 2009.

[24] Elliott H. Lieb and Michael Loss. Analysis, volume 14 of Graduate Studies in Math-ematics. American Mathematical Society, Providence, RI, second edition, 2001.

[25] Terry J. Morrison. Functional analysis. Pure and Applied Mathematics (New York).Wiley-Interscience [John Wiley & Sons], New York, 2001. An introduction to Banachspace theory.

[26] Gerard J. Murphy. C∗-algebras and operator theory. Academic Press, Inc., Boston,MA, 1990.

[27] Jindřich Nečas. Direct methods in the theory of elliptic equations. Springer Mono-graphs in Mathematics. Springer, Heidelberg, 2012. Translated from the 1967 Frenchoriginal by Gerard Tronel and Alois Kufner, Editorial coordination and preface byŠárka Nečasová and a contribution by Christian G. Simader.

[28] Michael Reed and Barry Simon. Methods of modern mathematical physics. I. Aca-demic Press, Inc. [Harcourt Brace Jovanovich, Publishers], New York, second edition,1980. Functional analysis.

[29] J. R. Retherford. Hilbert space: compact operators and the trace theorem, volume 27 ofLondon Mathematical Society Student Texts. Cambridge University Press, Cambridge,1993.

[30] H. L. Royden and P. M. Fitzpatrick. Real analysis. Pearson, fourth edition, 2010.[31] Walter Rudin. Real and complex analysis. McGraw-Hill Book Co., New York, third

edition, 1987.[32] Walter Rudin. Functional analysis. International Series in Pure and Applied Mathe-

matics. McGraw-Hill, Inc., New York, second edition, 1991.[33] Xavier Saint Raymond. Elementary introduction to the theory of pseudodifferential

operators. Studies in Advanced Mathematics. CRC Press, Boca Raton, FL, 1991.[34] Konrad Schmüdgen. Unbounded self-adjoint operators on Hilbert space, volume 265

of Graduate Texts in Mathematics. Springer, Dordrecht, 2012.

Page 81: Functional Analysis II - unistra.frirma.math.unistra.fr/~sabri/Lectures_FA.pdf · Functional Analysis II Mostafa SABRI. ii. Contents 1 LCSandDistributions 1 ... The theory of distributions

Bibliography 77

[35] Barry Simon. Trace ideals and their applications, volume 120 ofMathematical Surveysand Monographs. American Mathematical Society, Providence, RI, second edition,2005.

[36] Barry Simon. Convexity, volume 187 of Cambridge Tracts in Mathematics. CambridgeUniversity Press, Cambridge, 2011. An analytic viewpoint.

[37] François Trèves. Topological vector spaces, distributions and kernels. Academic Press,New York-London, 1967.

[38] I. F. Vershynin. Lectures in Functional Analysis. author’s homepage, University ofMichigan.

[39] Joachim Weidmann. Linear operators in Hilbert spaces, volume 68 of Graduate Textsin Mathematics. Springer-Verlag, New York-Berlin, 1980. Translated from the Ger-man by Joseph Szücs.

[40] I. F. Wilde. Distribution Theory (Generalized Functions) Notes. author’s homepage,King’s College London.

[41] I. F. Wilde. Functional Analysis (Topological Vectors Spaces version). author’shomepage, King’s College London.

[42] Stephen Willard. General topology. Addison-Wesley Publishing Co., Reading, Mass.-London-Don Mills, Ont., 1970.

[43] M. W. Wong. An introduction to pseudo-differential operators. World Scientific Pub-lishing Co., Inc., Teaneck, NJ, 1991.

[44] Robert J. Zimmer. Essential results of functional analysis. Chicago Lectures inMathematics. University of Chicago Press, Chicago, IL, 1990.

[45] Maciej Zworski. Semiclassical analysis, volume 138 of Graduate Studies in Mathe-matics. American Mathematical Society, Providence, RI, 2012.