Introduction to Mathematical Analysis

Igor Kriz
Aleš Pultr

Igor Kriz
Department of Mathematics
University of Michigan
Ann Arbor, MI
USA

Aleš Pultr
Department of Applied Mathematics (KAM)
Faculty of Mathematics and Physics
Charles University
Prague
Czech Republic

ISBN 978-3-0348-0635-0
ISBN 978-3-0348-0636-7 (eBook)
DOI 10.1007/978-3-0348-0636-7
Springer Basel Heidelberg New York Dordrecht London

Library of Congress Control Number: 2013941992

© Springer Basel 2013

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer Basel AG is part of Springer Science+Business Media (www.birkhauser-science.com)

To Sophie

To Jitka

Preface

This book is a result of a long-term project which originated in courses we taught to undergraduate students who specialize in mathematics. These students had δ-ε calculus before, but there did not seem to be a suitable comprehensive textbook for a follow-up course in analysis.

We wanted to write such a textbook based on our courses, but that was not the only goal. Teaching bright students is about introducing them to mathematics. Therefore, we wanted to write a book which the students may want to keep after the course is over, and which could serve them as a bridge to higher mathematics. Such a book would necessarily exceed the scope of their courses.

We start with standard material of second year analysis: multi-variable differential calculus, Lebesgue integration, ordinary differential equations and vector calculus. What makes all this go smoothly is that we introduce some basic concepts of point set topology first. Since our aim is to be completely rigorous and as self-contained as possible, we also include a Preliminaries chapter on the basic topic of one-variable calculus, and two Appendices on the necessary concepts of linear algebra. This pretty much comprises the first part of our book.

With the foundations covered, it is possible to venture much further. The common theme of the second part of our book is the interplay between analysis and geometry. After a second installment of point set topology, we are quickly able to introduce complex analysis, and after some multi-linear algebra, also manifolds, differential forms and the general Stokes Theorem. The methods of manifolds and complex analysis combine in a treatment of Riemann surfaces. Basic methods of the calculus of variations are applied to a theory of geodesics, which in turn leads to basic tensor calculus and Riemannian geometry. Finally, infinite-dimensional spaces, which have already made an appearance in multiple places throughout the text, are treated more systematically in a chapter on the basic concepts of functional analysis, and another on a few of its applications.

The total amount of material in this book cannot be covered in any single year course. An instructor of a course based on this book should probably aim for covering the first part, and take his or her picks in the second part. As already mentioned, we hope to motivate the student to hold on to their textbook, and use it for further study in years to come. They will eventually get to more advanced books in analysis and beyond, but here they can get, relatively quickly, their first glimpse of a big picture.

Because of this, the aim of our book is not limited to undergraduate students. This text may equally well serve a graduate student or a mathematician at any career stage who would like a quick source or reference on basic topics of analysis. A scientist (for example in physics or chemistry) who may have always been using analysis in their work, can use this book to go back and fill in the rigorous details and mathematical foundations. Finally, an instructor of analysis, even if not using this book as a textbook, may want to use it as a reference for those pesky proofs which usually get skipped in most courses: we do quite a few of them.

Ann Arbor, USA    Igor Kriz
Prague 1, Czech Republic    Aleš Pultr

Contents

Part I A Rigorous Approach to Advanced Calculus

1 Preliminaries
  1 Real and complex numbers
  2 Convergent and Cauchy sequences
  3 Continuous functions
  4 Derivatives and the Mean Value Theorem
  5 Uniform convergence
  6 Series. Series of functions
  7 Power series
  8 A few facts about the Riemann integral
  9 Exercises

2 Metric and Topological Spaces I
  1 Basics
  2 Subspaces and products
  3 Some topological concepts
  4 First remarks on topology
  5 Connected spaces
  6 Compact metric spaces
  7 Completeness
  8 Uniform convergence of sequences of functions. Application: Tietze’s Theorems
  9 Exercises

3 Multivariable Differential Calculus
  1 Real and vector functions of several variables
  2 Partial derivatives. Defining the existence of a total differential
  3 Composition of functions and the chain rule
  4 Partial derivatives of higher order. Interchangeability
  5 The Implicit Functions Theorem I: The case of a single equation
  6 The Implicit Functions Theorem II: The case of several equations
  7 An easy application: regular mappings and the Inverse Function Theorem
  8 Taylor’s Theorem, Local Extremes and Extremes with Constraints
  9 Exercises

4 Integration I: Multivariable Riemann Integral and Basic Ideas Toward the Lebesgue Integral
  1 Riemann integral on an n-dimensional interval
  2 Continuous functions are Riemann integrable
  3 Fubini’s Theorem in the continuous case
  4 Uniform convergence and Dini’s Theorem
  5 Preparing for an extension of the Riemann integral
  6 A modest extension
  7 A definition of the Lebesgue integral and an important lemma
  8 Sets of measure zero; the concept of “almost everywhere”
  9 Exercises

5 Integration II: Measurable Functions, Measure and the Techniques of Lebesgue Integration
  1 Lebesgue’s Theorems
  2 The class Λ (measurable functions)
  3 The Lebesgue measure
  4 The integral over a set
  5 Parameters
  6 Fubini’s Theorem
  7 The Substitution Theorem
  8 Hölder’s inequality, Minkowski’s inequality and Lp-spaces
  9 Exercises

6 Systems of Ordinary Differential Equations
  1 The problem
  2 Converting a system of ODE’s to a system of integral equations
  3 The Lipschitz property and a solution of the integral equation
  4 Existence and uniqueness of a solution of an ODE system
  5 Stability of solutions
  6 A few special differential equations
  7 General substitution, symmetry and infinitesimal symmetry of a differential equation
  8 Symmetry and separation of variables
  9 Exercises

7 Systems of Linear Differential Equations
  1 The definition and the existence theorem for a system of linear differential equations
  2 Spaces of solutions
  3 Variation of constants
  4 A linear differential equation of nth order with constant coefficients
  5 Systems of LDE with constant coefficients. An application of Jordan’s Theorem
  6 Exercises

8 Line Integrals and Green’s Theorem
  1 Curves and line integrals
  2 Line integrals of the first kind (= according to length)
  3 Line integrals of the second kind
  4 The complex line integral
  5 Green’s Theorem
  6 Exercises

Part II Analysis and Geometry

9 Metric and Topological Spaces II
  1 Separable and totally bounded metric spaces
  2 More on compact spaces
  3 Baire’s Category Theorem
  4 Completion
  5 More on topological spaces: Separation
  6 The space of continuous functions revisited: The Arzelà-Ascoli Theorem and the Stone-Weierstrass Theorem
  7 Exercises

10 Complex Analysis I: Basic Concepts
  1 The derivative of a complex function. Cauchy-Riemann conditions
  2 From the complex line integral to primitive functions
  3 Cauchy’s formula
  4 Taylor’s formula, power series, and a uniqueness theorem
  5 Applications: Liouville’s Theorem, the Fundamental Theorem of Algebra and a remark on conformal maps
  6 Laurent series, isolated singularities and the Residue Theorem
  7 Exercises

11 Multilinear Algebra
  1 Hom and dual vector spaces
  2 Multilinear maps and the tensor product
  3 The exterior (Grassmann) algebra
  4 Exercises

12 Smooth Manifolds, Differential Forms and Stokes’ Theorem
  1 Smooth manifolds
  2 Tangent vectors, vector fields and differential forms
  3 The exterior derivative and integration of differential forms
  4 Integration of differential forms and Stokes’ Theorem
  5 Exercises

13 Complex Analysis II: Further Topics
  1 The Riemann Mapping Theorem
  2 Holomorphic isomorphisms of disks onto polygons and the Schwartz-Christoffel formula
  3 Riemann surfaces, coverings and complex differential forms
  4 The universal covering and multi-valued functions
  5 Complex analysis beyond holomorphic functions
  6 Exercises

14 Calculus of Variations and the Geodesic Equation
  1 The basic problem of the calculus of variations, and the Euler-Lagrange equations
  2 A few special cases and examples
  3 The geodesic equation
  4 The geometry of geodesics
  5 Exercises

15 Tensor Calculus and Riemannian Geometry
  1 Tensor calculus
  2 Affine connections
  3 Tensors associated with an affine connection: torsion and curvature
  4 Riemann manifolds
  5 Riemann surfaces and surfaces with Riemann metric
  6 Exercises

16 Banach and Hilbert Spaces: Elements of Functional Analysis
  1 Banach and Hilbert spaces
  2 Uniformly convex Banach spaces
  3 Orthogonal complements and continuous linear forms
  4 Infinite sums in a Hilbert space and Hilbert bases
  5 The Hahn-Banach Theorem
  6 Dual Banach spaces and reflexivity
  7 The duality of Lp-spaces
  8 Images of Banach spaces under bounded linear maps
  9 Exercises

17 A Few Applications of Hilbert Spaces
  1 Some preliminaries: Integration by a measure
  2 The spaces Lpμ(X, ℂ) and the Radon-Nikodym Theorem
  3 Application: The Fundamental Theorem of (Lebesgue) Calculus
  4 Fourier series and the discrete Fourier transformation
  5 The continuous Fourier transformation
  6 Exercises

A Linear Algebra I: Vector Spaces
  1 Vector spaces and subspaces
  2 Linear combinations, linear independence
  3 Basis and dimension
  4 Inner products and orthogonality
  5 Linear mappings
  6 Congruences and quotients
  7 Matrices and linear mappings
  8 Exercises

B Linear Algebra II: More about Matrices
  1 Transforming a matrix. Rank
  2 Systems of linear equations
  3 Determinants
  4 More about determinants
  5 The Jordan canonical form of a matrix
  6 Exercises

Bibliography

Index of Symbols

Index

Introduction

The main purpose of this introduction is to tell the reader what to expect while reading this book, and to give advice on how to read it. We assume the reader to be acquainted with the basics of differential and integral calculus in one variable, as traditionally covered in the first year of study. Nevertheless, we include, for the reader’s convenience, in Chapter 1, a few pivotal theoretical points of analysis in one variable: continuity, derivatives, convergence of sequences and series of functions, the Mean Value Theorem, Taylor expansion, and the single-variable Riemann integral. The purpose of including this material is two-fold. First, we would like this text to be as self-contained as possible: we wish to spare the reader a tedious search, in another text, for an elementary fact he or she may have forgotten. The second, and perhaps more important, reason is to focus attention on facts of elementary differential and integral calculus that have deeper aspects, and are fundamental to more advanced topics. In connection with this, we also review in the exercises to Chapter 1 definitions of elementary functions and proofs, from the first principles, of their properties needed later. What we omit at this stage is a proof of the existence of real numbers; the reader probably knows it from elsewhere, but if not, there will be an opportunity to come back and do it as an exercise to Chapter 9.

An entirely different prerequisite is linear algebra. While not a part of mathematical analysis in the narrowest sense, it contains many necessary techniques. In fact, differential calculus (in particular in more than one variable) can be without much exaggeration understood as the study of linear approximations of more general mappings, and a basic knowledge in dealing with the linear case is indispensable. The reader’s skills in these topics (determinants, linear equations, operations with matrices, and others) may determine to a considerable degree his or her success with a large part of this book. Because of this, we feel it is appropriate to include linear algebra in this text as a reference. In order not to slow down the narrative, we do so in two appendices: Appendix A for more theoretical topics such as vector spaces and linear mappings, and Appendix B for more computational questions regarding matrices, culminating with a treatment of the Jordan canonical form.

Let us turn to the main body of this book. It is divided into two parts. One of our main goals is to present a rigorous treatment of the traditional topics of advanced calculus: multivariable differentiation, (Lebesgue) integration, and differential equations. All this is covered by Part I, including basic facts about line integrals and Green’s Theorem.

In Part II we use the techniques developed in Part I to approach phenomena of geometrical nature which the reader may already have encountered without proof, or will certainly encounter in further studies. Using the tools developed, they can be probed into considerable depth without too much further difficulty.

Part I. We think it essential to start rigorous advanced calculus with the basic notions of (at least metric) topology. Concepts such as neighborhood, open set, closure and convergence viewed narrowly just in the context of the Euclidean space ℝⁿ do not give a satisfactory picture of what is going on (and besides, would not be sufficient for what will come later). In Chapter 2, we discuss these concepts first in the context of metric spaces. This generality is, strictly speaking, already sufficient for most of our purposes. Yet, it is useful to learn about the more general topological spaces to be able to distinguish what really depends on metric and what does not. For this reason, our treatment of space in this chapter is an interweaving narrative of metric and topological spaces with the goal of presenting an adequate general outlook. We stick here, however, to the simpler facts and concepts needed in the nearest chapters (the more advanced topics on spaces are postponed to Chapter 9) and the reader will certainly not find this chapter hard.

With the basic knowledge of metric topology we are ready for multivariable differential calculus. This is covered in Chapter 3. We start with the basic notions of partial derivative and total differential, and chain rule. Emphasizing the role of the total differential as a linear map is key to more coordinate-free approaches to analysis, vital when investigating manifolds (later in Chapter 12). Next, we prove from first principles the Implicit Function Theorem. This is the first more complicated analytic proof in our text; the reader is advised to pay detailed attention to this material, and certainly not to skip it, as it is a good model of what a proof in analysis looks and feels like. The chapter is concluded with material related to the multivariable version of Taylor’s Theorem, and to calculating extremes and saddles of multivariable functions.

The following two chapters are devoted to integration. In Chapter 4 we start with the multivariable Riemann integral over a product of intervals. It becomes clear very quickly, however, that a more versatile theory is needed. For example, we want to take integrals of unbounded functions, or integrals over more general types of sets. Or, we would like to know when we can take limits or derivatives behind the integral sign. We would like to understand precisely why “the boundary does not matter” when taking a multivariable integral, and how and why we can change variables in a non-linear way. All this leads to the concept of Lebesgue integral, which is somewhat notorious for being time-consuming because of the abstract concepts it entails. There is, however, a way around that. A method of P.J. Daniell (going back to 1918 and unjustly neglected for decades) allows a straightforward introduction of the Lebesgue integral starting with monotone limits of Riemann integrals of continuous functions with compact support. The necessary technical theorems can be proved very quickly and we present them in the second half of Chapter 4. In Chapter 5, we go on to present the more technical aspects of the Lebesgue integral. We explain how to take limits and derivatives behind the integral sign, prove Fubini’s Theorem, define the Lebesgue measure and prove its basic properties. Further, we introduce Borel sets and prove criteria of measurability. We present a rigorous proof of a multivariable substitution theorem. Finally, we introduce Lp-spaces: while this may seem like an early place, we will have enough integration theory at this point to do so, and to prove their basic properties. This is useful, as the Lp-spaces often occur throughout analysis (for example, in this book, we will use them in Chapters 13 and 15 in proving the existence of a complex structure on an oriented surface with a Riemann metric). We will return to the study of Lp-spaces in Chapter 16, where they provide the most basic examples in functional analysis.

Next, having covered differentiation and integration, we turn to differential equations. We restrict our attention to ordinary differential equations (ODEs), as partial differential equations have quite a different flavor and constitute a vast field of their own, far beyond a general course in analysis (even an advanced one). For a text on partial differential equations, we refer the reader, for example, to [5].

Chapter 6 on (general) ordinary differential equations is in fact independent of Chapters 4 and 5 and uses only the material of Chapters 1, 2 and 3. We introduce the concept of a Lipschitz function and prove the local existence and uniqueness theorem for systems of ODEs (the Picard-Lindelöf Theorem). We also discuss stability of solutions and differentiation with respect to parameters. Further, we discuss the basic method of separation of variables, and finally discuss global and infinitesimal symmetries of systems of ODEs (thus motivating further study of vector fields); also, we explain how the methods of separation of variables discussed earlier are related to symmetries of the system.

Chapter 7 covers some aspects of linear differential equations (LDEs). The global existence theorem is proved, and the affine set of solutions of a linear system is discussed. We show how to use the Wronskian for recognizing a fundamental system (that is, a basis) of the space of solutions of a homogeneous system of LDEs, and how to get solutions of a non-homogeneous system from the homogeneous one using the variation of constants. Also, we present a method of solving systems of LDEs with constant coefficients, easier in the case of a single higher order LDE, and requiring the Jordan canonical form of a matrix from Appendix B in the harder general case.

Chapter 8, concluding Part I, treats parametric curves, line integrals of the first and second kind and the complex line integral. At the end we prove Green’s Theorem, which we will need when dealing with complex derivatives, but which is also an elementary warm-up for the general Stokes’s Theorem.

Part II. Now our perspective changes. The traditional items of advanced calculus have been mostly covered and we turn to topics interesting from the point of view of geometry.

To proceed, perhaps by now not surprisingly, we need another installment of topological foundations. This is done in Chapter 9, presenting more material on topological spaces (separability, compactness, separation axioms and the Urysohn Theorem) as well as on metric spaces (completion, Baire’s Category Theorem). In the last section we prove the Stone-Weierstrass Theorem, providing a remarkably general method to obtain useful dense sets in spaces of functions, and the Arzelà-Ascoli Theorem, which greatly clarifies the meaning of uniform convergence, and will be useful in Chapter 13 when proving the Riemann Mapping Theorem.

Next, in Chapter 10 we introduce the basic methods of complex analysis. The fundamental facts can be derived almost immediately from the complex line integral and Green’s Theorem of Chapter 8. The conclusions, however, are powerful and surprising. Unlike differentiable functions of a real variable, complex functions with a complex derivative (holomorphic functions) are much more rigid. They are determined, for example, by their values on a convergent sequence of points, and the existence of a derivative automatically implies the existence of derivatives of all orders; on the other hand, “geometrically very smooth” functions may not have a complex derivative. Thus, our view of the differential calculus as we know it from the real case is turned upside down. Yet, complex analysis has important real applications, such as, for instance, the explanation of the convergence properties of a Taylor series. Other applications presented are the Fundamental Theorem of Algebra, and an important geometric one, the Jordan Curve Theorem. We then go on to cover other basic methods of complex analysis, such as Laurent series, the classification of isolated singularities, the Residue Theorem and the Argument Principle, which has several interesting applications, including the Open Mapping Theorem.

Next, we also have to upgrade our knowledge of linear algebra; more specifically, we must get acquainted with the techniques of multilinear algebra. This is done in Chapter 11, which includes dual vector spaces, tensor products, and the exterior (Grassmann) algebra. Thus equipped, we can now study calculus on manifolds. This is done in Chapter 12. We define smooth manifolds, tangent vectors, vector fields, and differential forms. Further, we present the exterior derivative, the de Rham complex, and the de Rham cohomology. A general form of Stokes’s Theorem is proved and related to the operators grad, div and curl as introduced in traditional calculus courses.

A combination of the study of manifolds with complex analysis in one variable leads to the concept of Riemann surfaces. Their basic theory is presented in Chapter 13. We begin the chapter with the Riemann Mapping Theorem, showing conformal equivalence (holomorphic isomorphism) of simply connected proper open subsets of $\mathbb{C}$. We also present the Schwarz-Christoffel integrals giving conformal equivalence between open convex polygons and the unit open disk; examples include elliptic integrals. Then we introduce the theory of Riemann surfaces and coverings, and construct universal coverings. We will see that even if we are not interested in the abstraction of manifolds, this formalism will greatly enhance our understanding of complex integration: we will now be able, for example, to integrate a holomorphic function over a homotopy class of continuous paths. We will also be able to understand how to make rigorous the concept of a “multi-valued holomorphic function”, which was strongly suggested by the methods of Chapter 10, yet could not be adequately approached by its methods. Finally, studying complex differential forms on Riemann surfaces will lead us to the basic notions of “$dz$-$d\bar{z}$-calculus”, which is very helpful in complex analysis. To demonstrate, we will apply this to extending some of the methods of complex analysis beyond the case of holomorphic functions.

Chapter 14 is devoted, primarily, to the basic problem of the calculus of variations in one independent variable, and to the Euler-Lagrange equation for critical functions. Here, only the material of Part I is used. In the second half of the chapter we define a Riemann metric on an open subset of $\mathbb{R}^n$ and discuss the geodesic equation in more detail. Also, we prove the local minimality of geodesics. A part of the reason for introducing this material is a motivation of the topics investigated in Chapter 15, where we combine it with the material on manifolds to obtain the basic concepts of Riemannian geometry. We start with tensor calculus and then move on to affine connections, Riemann metrics on manifolds and curvature, and give a local characterization of the Euclidean space as the Riemannian manifold with zero curvature tensor. Using the methods of the last section of Chapter 13, we also show how to construct a complex structure on an oriented Riemann manifold in dimension 2.

Chapter 16 concerns Hilbert and Banach spaces, and introduces the basic concepts of functional analysis. Here we need as prerequisites only the techniques from Part I and Chapter 9. We start with the definition and basic properties of Hilbert spaces. We show that a Hilbert space provides, in a sense, an infinite-dimensional extension of the nice properties of the finite-dimensional vector spaces with inner product. Banach spaces are also introduced; their theory is much harder, but nevertheless we are able to prove a few neat results. Starting with the Hahn-Banach Theorem, we go on to examining duals of Banach spaces, proving, for example, that the dual of $L^p$ is $L^q$ for $1 < p < \infty$, $1/p + 1/q = 1$. We will also prove the Open Mapping Theorem and the Closed Graph Theorem.

In Chapter 17 we present some applications, mainly of Hilbert spaces. (One tends to use Hilbert spaces wherever possible, precisely because they are much easier.) We will prove the Radon-Nikodym Theorem, and use it to prove a version of the Fundamental Theorem of Calculus for the Lebesgue integral, a fairly hard fact. In the framework of Hilbert spaces we also define Fourier series and the (continuous) Fourier transformation. As a fringe benefit, we will introduce Borel measures. The theory of $L^p$-spaces generalizes to this case, and includes some interesting new examples.

When using this book as a textbook for a course, an instructor should aim to cover most of Part I in the first semester. After this, one can work with Part II, basically, on three independent tracks, thus customizing the course as needed, and as time permits.

For Chapters 10 and 14, no additional prerequisites beyond Part I are needed. All the other chapters require Chapter 9; just this added to Part I suffices for the study of Hilbert spaces.

In the remaining group, multilinear algebra of Chapter 11 has to precede manifolds in Chapter 12 and Riemannian geometry in Chapter 15, which also uses the facts from Chapter 14, and its last section uses Chapter 10.

Chapter 13 uses Chapter 10 and Chapter 12.

Using this dependence of chapters, an instructor may decide about the topics for the next semester, possibly assigning some material to the students as independent reading. There is no need to cover entire chapters; there are endless possibilities for mixing and matching topics to create an interesting course.

The student (or reader) is, in any case, most strongly encouraged to keep the book for further study. As already mentioned, we anticipate that graduate students of mathematics, mathematicians and scientists in areas using analysis, as well as instructors of courses in analysis will find this book useful as a reference, and will find their own ways through the topics.

In the Bibliography section, the reader will find suggestions for further reading. In the more advanced sections of this text, we often introduce concepts (such as “Lie group” or “de Rham cohomology”) which arise as a natural culmination of our discussion, but whose systematic development is beyond the scope of this book. These concepts are meant to motivate further study. We would like to emphasize that our list of literature is by no means meant to be complete. The books we do suggest all have a fairly close connection to the present text, and to mathematical analysis. They contain more detailed information, as well as suggestions of further literature.

Finally, we would like to say a few words about sources. The overall conception of the book is original: we designed the logic of the interdependence of topics, and the strategy for their presentation. Many proofs are, in fact, also “original” in the sense that we made up our own arguments to fit best the particular stage of the presentation (the book contains no new mathematical results). Given the scope of the project, however, we did, in some cases, consult lecture notes, other books and occasionally even research papers for particular proofs. All the books used are listed in the Bibliography at the end of the book. In the case of research papers, we give the name of who we believe is the original author of the proof, but do not include explicit journal references, as we feel an effort to be even partially fair would lead to a web of references which would only bewilder a first-time student of the subject. We did want to mention, however, that there are also quite a few proofs which seem to have become “standard” in this field (including, sometimes, particular notation), and whose original author we were not able to track down. We would like to thank all of those who, by inventing those proofs, contributed to this book implicitly. We would also like to thank colleagues and students who read parts of our book, and gave us valuable comments. Last but not least, the authors gratefully acknowledge the support of CE-ITI of Charles University, the Michigan Center for Theoretical Physics and the NSF.

Part I

A Rigorous Approach to Advanced Calculus

1 Preliminaries

The typical reader of this text will have had a rigorous “$\varepsilon$-$\delta$” first year calculus course, using a text such as, for example, [22]. Such a course will have included definitions and basic properties of the standard elementary functions (polynomials, rational functions, exponentials and logarithms, trigonometric and cyclometric functions), the concept of continuity of a real function and the fact that continuity is preserved under standard constructions (sum, product, composition, etc.), and the basic rules of computing derivatives. We review here mainly the more theoretical aspects of these topics. The reasons for reviewing them are two-fold. The first reason is that we would like this text to be as self-contained as possible. The second reason is that some of the basic results have, in fact, substantial depth in them, and the more advanced topics on which this book focuses make heavy use of them. Not reviewing such topics would at times even create a danger of circular arguments.

1 Real and complex numbers

Perhaps it is useful to go over a few basic conventions first. By a map or mapping from a set $S$ to a set $T$ we mean a rule which assigns to each element of $S$ precisely one element of $T$. Two rules are considered the same if they always produce the same value (in $T$) on the same input element (of $S$).

Therefore, technically, a map is a binary relation, i.e. a set $R$ of pairs $(x, y)$, $x \in S$, $y \in T$, such that for each $s \in S$, there is precisely one $(s, y) \in R$. The sets $S$, $T$ are called the domain and codomain, respectively. We will denote a map $f$ from a set $S$ to a set $T$ by $f : S \to T$. For such a map, and a set $X \subseteq S$, we will denote by $f[X]$ the set of all elements $f(x)$ such that $x \in X$. Similarly, for $Y \subseteq T$, we will denote by $f^{-1}[Y]$ the set of all $x \in S$ such that $f(x) \in Y$. The set $f[X]$ is called the image of the set $X$ under the map $f$, and the set $f^{-1}[Y]$ is called the pre-image of the set $Y$ under $f$. This use of the square bracket may perhaps seem unusually pedantic, but will soon pay off in the text below. The image $f[S]$ of the domain is sometimes called the image of the map $f$.

I. Kriz and A. Pultr, Introduction to Mathematical Analysis,DOI 10.1007/978-3-0348-0636-7 1, © Springer Basel 2013

To comment briefly on the use of inclusion symbols, throughout this book, we generally use $\subseteq$ to denote a subset with possible equality; when equality is excluded, we use $\subsetneq$. We generally avoid the somewhat ambiguous symbol $\subset$. When we do use it, it means $\subseteq$ in a context where equality is a priori excluded for an obvious reason not entering the logic of the argument (this may happen, for example, for a finite subset of the real numbers when we are not using the finiteness to conclude that the complement is non-empty).

Returning to the subject of mappings, for a map $f : S \to T$ and $U \subseteq S$, it is often useful to have a special symbol for the map $g : U \to T$ which is defined by $g(x) = f(x)$ when $x \in U$. The map $g$ is called the restriction of $f$ to the subset $U$, and denoted by $f|U$ or $f|_U$.

A map $f : S \to T$ is called onto if $f[S] = T$, and is called one-to-one (briefly 1-1) if for every $s_1, s_2 \in S$, $f(s_1) = f(s_2) \Rightarrow s_1 = s_2$. Onto maps are also called surjective and one-to-one maps are called injective. A bijective map is a map which is both surjective and injective.

The composition of maps $f : S \to T$, $g : T \to U$ will be denoted by $g \circ f$; thus $(g \circ f)(x) = g(f(x))$ for $x \in S$. In fact, the circle is often omitted, and instead of $g \circ f$, one simply writes $gf$. One must, of course, make sure there is no possibility of confusion with multiplication. The identity map $\mathrm{Id}_S$ on a set $S$ is defined simply by $\mathrm{Id}_S(x) = x$ for every $x \in S$. Note that a bijective map $f : S \to T$ has an inverse, i.e. a map $f^{-1} : T \to S$ such that $f^{-1} \circ f = \mathrm{Id}_S$, $f \circ f^{-1} = \mathrm{Id}_T$.
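For maps between finite sets, the notions introduced so far (image, pre-image, onto, one-to-one, composition, inverse) can be checked mechanically. The following is a minimal sketch in Python; representing a map as a dictionary, and the helper names `image`, `preimage`, `is_injective`, `is_surjective`, `compose` and `inverse`, are illustrative assumptions of ours and not notation from the text.

```python
def image(f, X):
    """f[X]: the set of all f(x) with x in X."""
    return {f[x] for x in X}

def preimage(f, Y):
    """f^{-1}[Y]: the set of all x in the domain with f(x) in Y."""
    return {x for x in f if f[x] in Y}

def is_injective(f):
    """One-to-one: distinct arguments give distinct values."""
    return len(set(f.values())) == len(f)

def is_surjective(f, T):
    """Onto: the image of the whole domain equals the codomain T."""
    return image(f, f.keys()) == set(T)

def compose(g, f):
    """The composition g o f, defined by (g o f)(x) = g(f(x))."""
    return {x: g[f[x]] for x in f}

def inverse(f, T):
    """The inverse of a bijective map f with codomain T."""
    if not (is_injective(f) and is_surjective(f, T)):
        raise ValueError("only a bijective map has an inverse")
    return {y: x for x, y in f.items()}

# Hypothetical sample data: S is the domain, T the codomain.
S = {1, 2, 3}
T = {"a", "b", "c"}
f = {1: "a", 2: "b", 3: "c"}   # bijective
g = {1: "a", 2: "a", 3: "b"}   # neither injective nor onto T
```

Note that the codomain has to be passed to `is_surjective` and `inverse` explicitly: a dictionary records only the domain and the values, which mirrors the remark that a map is not determined by its "formula" alone.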

We will use the symbol $\mathbb{N}$ to denote the set of (positive) natural numbers $\{1, 2, \dots\}$. The set of non-negative integers will be denoted by $\mathbb{N}_0$, and the set of all integers by $\mathbb{Z}$. The set of all rational numbers will be denoted by $\mathbb{Q}$. The set $\mathbb{R}$ of real numbers needs more attention.

1.1

Let us summarize the structure of the set $\mathbb{R}$ of real numbers as it will be used in this text. We do not give a rigorous construction of the real numbers at this point. Such a construction, however, will emerge in the context of our discussion of completeness in Chapter 9, where it is reviewed as an exercise.

First, $\mathbb{R}$ is a field, that is, there are binary operations, addition $+$ and multiplication $\cdot$ (which will be often indicated simply by juxtaposition), that are associative (that is, $a + (b + c) = (a + b) + c$ and $a(bc) = (ab)c$) and commutative (that is, $a + b = b + a$ and $ab = ba$) and related by the distributivity law ($a(b + c) = ab + ac$). There are neutral elements, zero $0$ and one (also called unit) $1$, such that $a + 0 = a$ and $a \cdot 1 = a$. With each $a \in \mathbb{R}$ we have associated an element $-a \in \mathbb{R}$ such that $a + (-a) = 0$; almost the same holds for the multiplication, where we have for every non-zero $a$ an element $\frac{1}{a}$ (also denoted by $a^{-1}$) such that $a \cdot \frac{1}{a} = 1$.

Furthermore, there is a linear order $\le$ on $\mathbb{R}$ (a binary relation such that $a \le a$, that $a \le b$ and $b \le a$ implies $a = b$, that $a \le b$ and $b \le c$ implies $a \le c$, and finally that for any $a, b$ either $a \le b$ or $a = b$ or $a \ge b$), and this order is preserved by addition and by multiplication by elements that are $\ge 0$.

Then, we have the absolute value $|a|$, equal to $a$ if $a \ge 0$ and to $-a$ if $a \le 0$. One often views $\mathbb{R}$ as a line with $|a - b|$ representing the distance between $a$ and $b$.

For $M \subseteq \mathbb{R}$, we say that $a$ is an upper (resp. lower) bound of $M$ if $x \le a$ (resp. $x \ge a$) for all $x \in M$. A supremum (resp. infimum), denoted by

$$\sup M \quad (\text{resp. } \inf M),$$

is the least upper bound (resp. greatest lower bound), if it exists. Thus, the supremum $s$ of $M$ is characterized by the properties
(1) $\forall x \in M$, $x \le s$, and
(2) if $x \le a$ for all $x \in M$ then $s \le a$

(similarly for the infimum with $\ge$ instead of $\le$). Condition (2) is often expediently replaced by
(2’) if $a < s$ then there is an $x \in M$ such that $a < x$
(realize that (1)&(2) is indeed equivalent to (1)&(2’)). It is a specific property of the ordered field $\mathbb{R}$ that

each non-empty $M \subseteq \mathbb{R}$ that has an upper bound has a supremum

or, equivalently, that

each non-empty $M \subseteq \mathbb{R}$ that has a lower bound has an infimum.

In mathematical analysis, it is often customary to use the symbols $\infty = +\infty$ and $-\infty$. The supremum (resp. infimum) of the empty set is defined to be $-\infty$ (resp. $\infty$), and the supremum (resp. infimum) of a set with no upper bound (resp. no lower bound) is defined to be $\infty$ (resp. $-\infty$). Accordingly, it is customary to write $-\infty < a < +\infty$ for any real number $a$, and to define $\infty + \infty = \infty - (-\infty) = \infty$, and $(-\infty) + (-\infty) = (-\infty) - \infty = -\infty$, $a \cdot (\pm\infty) = \pm\infty$ resp. $a \cdot (\pm\infty) = \mp\infty$ for $a > 0$ resp. $a < 0$. It is important to keep in mind, however, that the symbols $\infty$, $-\infty$ are not real numbers, and expressions such as $\infty - \infty$ or $0 \cdot \infty$ are undefined (although see Section 6 of Chapter 4 for an exception).
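Floating-point arithmetic offers a loose analogy for these conventions. The sketch below uses Python's `math.inf`; it is only an illustration on our part (floating-point numbers are not the real numbers, and the text does not refer to any programming language). The defined sums and products behave as stated above, while the undefined expressions $\infty - \infty$ and $0 \cdot \infty$ produce the special value NaN ("not a number").

```python
import math

inf = math.inf

# The defined cases behave as in the conventions above.
print(inf + inf)               # infinity plus infinity
print(inf - (-inf))            # infinity minus minus infinity
print(-inf - inf)              # minus infinity minus infinity
print(2.0 * inf, -2.0 * inf)   # a * infinity for a > 0 and for a < 0

# The undefined expressions do not raise an error; they yield NaN.
print(inf - inf)
print(0.0 * inf)

# The convention sup(empty set) = -infinity corresponds to a default value
# supplied for the maximum of an empty collection.
sup_empty = max([], default=-inf)
inf_empty = min([], default=inf)
```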

If $M$ is a subset of $\mathbb{R}$ and $\sup(M) \in M$ (resp. $\inf(M) \in M$), we say that the supremum (resp. infimum) is attained, and speak of a maximum resp. minimum. In this case, we may use the notation

$$\max M, \quad \min M.$$

It is important to keep in mind that, unlike the supremum and infimum, a maximum and/or minimum of a non-empty bounded subset of $\mathbb{R}$ may not exist. A non-empty finite subset of $\mathbb{R}$, however, always has a maximum and a minimum.

Variants of notation associated with suprema and infima (resp. maxima and minima) are often used. For example, instead of $\sup M$, one may write

$$\sup_{x \in M} x,$$

and similarly for the infimum, etc.

Let us fix notations for open and closed intervals in $\mathbb{R}$: As usual, $(a, b)$ means the set of all $x \in \mathbb{R}$ such that $a < x < b$, where $a, b$ are real numbers or $\pm\infty$. We will denote by $\langle a, b\rangle$ the corresponding closed interval, i.e. the set of all $x \in \mathbb{R} \cup \{\pm\infty\}$ such that $a \le x \le b$. The reader can fill in the meaning of the symbols $\langle a, b)$, $(a, b\rangle$.

1.2

The field of complex numbers $\mathbb{C}$ can be represented as $\mathbb{R} \times \mathbb{R}$ with addition $(x_1, x_2) + (y_1, y_2) = (x_1 + y_1, x_2 + y_2)$ and multiplication $(x_1, x_2)(y_1, y_2) = (x_1 y_1 - x_2 y_2, x_1 y_2 + x_2 y_1)$; we have the zero $(0, 0)$ and the unit $(1, 0)$. It is an easy exercise to check that $\mathbb{C}$ has the arithmetic properties of a field (associativity, commutativity, distributivity) and that $-(x_1, x_2) = (-x_1, -x_2)$ and

$$(x_1, x_2)^{-1} = \left( \frac{x_1}{x_1^2 + x_2^2},\ \frac{-x_2}{x_1^2 + x_2^2} \right).$$

The field of complex numbers, however, has no reasonable order.

One introduces the complex conjugate of $x = (x_1, x_2)$ as $\bar{x} = (x_1, -x_2)$. It is easy to see that

$$\overline{x + y} = \bar{x} + \bar{y} \quad \text{and} \quad \overline{x \cdot y} = \bar{x} \cdot \bar{y}. \tag{*}$$

Further, there is the absolute value (also called the modulus) defined by setting

$$|x| = \sqrt{x\bar{x}} = \sqrt{x_1^2 + x_2^2}$$

(thus, $x^{-1} = \dfrac{\bar{x}}{|x|^2}$).

If we view $\mathbb{C}$ as the Euclidean plane (one often speaks of the Gaussian plane) then $|x|$ is the standard distance of $x$ from $(0, 0)$, and $|x - y|$ is the standard Pythagorean distance between $x$ and $y$.

Usually one sets $i = (0, 1)$ and writes

$$x_1 + i x_2 \quad \text{for} \quad (x_1, x_2)$$

(note that the multiplication rule in $\mathbb{C}$ comes from distributivity and the equality $i^2 = -1$). In the other direction, one puts

$$\operatorname{Re}(x_1 + i x_2) = x_1, \quad \operatorname{Im}(x_1 + i x_2) = x_2,$$

and calls these real numbers the real resp. imaginary part of $x_1 + i x_2$.

We have a natural embedding of fields

$$(x \mapsto (x, 0)) : \mathbb{R} \to \mathbb{C}$$

which will be used without further mention; note that this embedding respects the absolute value.
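The pair representation of $\mathbb{C}$ described in this subsection translates directly into code. The following is a minimal sketch in Python, with function names (`add`, `mul`, `conj`, `modulus`, `inv`) chosen by us for illustration; Python's built-in `complex` type is used only in the usage note, as an independent cross-check.

```python
import math

def add(x, y):
    """(x1, x2) + (y1, y2) = (x1 + y1, x2 + y2)."""
    return (x[0] + y[0], x[1] + y[1])

def mul(x, y):
    """(x1, x2)(y1, y2) = (x1*y1 - x2*y2, x1*y2 + x2*y1)."""
    return (x[0] * y[0] - x[1] * y[1], x[0] * y[1] + x[1] * y[0])

def conj(x):
    """Complex conjugate: (x1, x2) -> (x1, -x2)."""
    return (x[0], -x[1])

def modulus(x):
    """Absolute value |x| = sqrt(x1^2 + x2^2)."""
    return math.sqrt(x[0] ** 2 + x[1] ** 2)

def inv(x):
    """Multiplicative inverse of a non-zero x, i.e. conj(x) / |x|^2."""
    n = x[0] ** 2 + x[1] ** 2
    if n == 0:
        raise ZeroDivisionError("(0, 0) has no inverse")
    return (x[0] / n, -x[1] / n)

i = (0.0, 1.0)
x = (3.0, 4.0)    # hypothetical sample values
y = (1.0, -2.0)

print(mul(i, i))        # the pair representing i^2
print(mul(x, inv(x)))   # approximately the unit (1, 0)
print(modulus(x))       # distance of x from (0, 0)
```

As a cross-check, `complex(*mul(x, y))` agrees with `complex(*x) * complex(*y)` for these sample values, and `conj(mul(x, y))` equals `mul(conj(x), conj(y))`, in line with (*).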


1.3 Theorem. For the absolute value of complex numbers one has

$$|x + y| \le |x| + |y|.$$

Proof. Let $x = x_1 + i x_2$ and $y = y_1 + i y_2$. We can assume $y \ne 0$. For any real number $\lambda$ we have $0 \le (x_j + \lambda y_j)^2 = x_j^2 + 2\lambda x_j y_j + \lambda^2 y_j^2$, $j = 1, 2$. Adding these inequalities, we obtain

$$0 \le |x|^2 + 2\lambda(x_1 y_1 + x_2 y_2) + \lambda^2 |y|^2.$$

Setting $\lambda = -\dfrac{x_1 y_1 + x_2 y_2}{|y|^2}$ yields

$$0 \le |x|^2 - 2\,\frac{(x_1 y_1 + x_2 y_2)^2}{|y|^2} + \frac{(x_1 y_1 + x_2 y_2)^2}{|y|^4}\,|y|^2 = |x|^2 - \frac{(x_1 y_1 + x_2 y_2)^2}{|y|^2}$$

and hence $(x_1 y_1 + x_2 y_2)^2 \le |x|^2 |y|^2$. Consequently,

$$|x + y|^2 = (x_1 + y_1)^2 + (x_2 + y_2)^2 = |x|^2 + 2(x_1 y_1 + x_2 y_2) + |y|^2 \le |x|^2 + 2|x||y| + |y|^2 = (|x| + |y|)^2. \qquad \square$$

1.3.1 Corollary. If $x = x_1 + i x_2$ and $y = y_1 + i y_2$ then

$$|x - y| \le |x_1 - y_1| + |x_2 - y_2| \quad \text{and} \quad |x_j - y_j| \le |x - y|.$$

1.3.2 Comment: A function is basically the same thing as a map, although in many texts (including this one), the term function is reserved for a map whose codomain is a set whose elements we perceive as numbers, or at least some closely related generalizations. For example, the codomain may be $\mathbb{R}$, $\mathbb{C}$ or a subset of one of these sets, or it may be, say, $[0, \infty]$. Sometimes, we will allow the codomain to consist even of $n$-tuples of numbers, see for example Chapter 3. While many basic courses define functions simply by formulas without worrying about the domain and codomain, in a rigorous view of the subject, specifying domains and codomains is essential for capturing even the most basic phenomena: Consider, for example, the function

$$f(x) = x^2. \tag{*}$$

If we specify the domain as $\mathbb{R}$, the function certainly cannot have an inverse no matter what the codomain is, since it is not injective. If we do specify the domain, say, as $[0, \infty)$, and the codomain as $\mathbb{R}$, there is still no inverse, since the function is not onto. If, however, (*) is considered as a function

$$f : [0, \infty) \to [0, \infty),$$

then there is an inverse, which is rather useful, namely

$$f^{-1}(x) = \sqrt{x}.$$

1.4 Polynomials and their roots

1.4.1 Recall that a polynomial with coefficients in $\mathbb{R}$ resp. $\mathbb{C}$ is an expression which is either

$$0$$

(the zero polynomial) or is of the form

$$p(x) = a_n x^n + \dots + a_1 x + a_0 \quad \text{with } a_j \in \mathbb{R} \text{ resp. } \mathbb{C} \tag{*}$$

for some $n \in \mathbb{N}_0$, where $a_n \ne 0$. Technically, then, a non-zero polynomial is simply the $(n + 1)$-tuple of real (resp. complex) numbers $(a_0, \dots, a_n)$. (This is the information we would have to specify if we were to store the polynomial, say, on a computer.)

The number $n$ is called the degree of the polynomial $p(x)$. The degree of the zero polynomial is not defined.
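To make the remark about storing a polynomial on a computer concrete, here is a minimal sketch in Python. It keeps the coefficients as a tuple $(a_0, \dots, a_n)$ with $a_n \ne 0$ and, as an assumption of ours, represents the zero polynomial by the empty tuple. The evaluation uses Horner's scheme, a standard technique that the text itself does not discuss.

```python
def degree(coeffs):
    """Degree of the polynomial (a_0, ..., a_n); None for the zero polynomial,
    whose degree is not defined."""
    return len(coeffs) - 1 if coeffs else None

def evaluate(coeffs, x):
    """Value of a_n x^n + ... + a_1 x + a_0 at x, computed by Horner's scheme:
    (...((a_n x + a_{n-1}) x + a_{n-2}) x + ...) x + a_0."""
    result = 0
    for a in reversed(coeffs):   # a_n first, a_0 last
        result = result * x + a
    return result

p = (1, 0, 1)   # hypothetical example: the polynomial x^2 + 1

print(degree(p))
print(evaluate(p, 2))    # value at a real point
print(evaluate(p, 1j))   # value at the complex point i
```

The same tuple `p` is evaluated at a real and at a complex argument here, which mirrors the fact that a polynomial with real coefficients can also be regarded as a polynomial with complex coefficients.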

Of course, the polynomial (*) also determines a function

$$(x \mapsto a_n x^n + \dots + a_1 x + a_0) : \mathbb{R} \to \mathbb{R} \text{ resp. } \mathbb{C} \to \mathbb{C}.$$

The zero polynomial determines a function, too, namely one which is constantly 0. In analysis, it is quite common to identify a polynomial with the function it determines (although note carefully that the domain and codomain of the function corresponding to a polynomial with real coefficients will change if its coefficients are considered as complex numbers). Nevertheless, this identification is permissible, since two different polynomials over $\mathbb{R}$ (resp. $\mathbb{C}$) never correspond to the same function. To see this, note that it suffices to show that a non-zero polynomial does not correspond to the 0 function (by passing to the difference). To this end, simply note that if $|x_0|$ is very large, then

$$|a_n x_0^n| > |a_{n-1}| \cdot |x_0^{n-1}| + \dots + |a_0| \ge |a_{n-1} x_0^{n-1} + \dots + a_0|,$$

and hence $p(x_0) \ne 0$ by the triangle inequality.

In fact, much more is true: a polynomial of degree $n$ can be zero at no more than $n$ different points of $\mathbb{R}$ (or $\mathbb{C}$). Define a complex root of a polynomial $p(x)$ to be a number $c \in \mathbb{C}$ such that $p(c) = 0$. If $c \in \mathbb{R}$, we speak of a real root.


1.4.2 Lemma. If $p(x)$ is a polynomial of degree $n$ with coefficients in $\mathbb{C}$ with root $c \in \mathbb{C}$, then there exists a unique polynomial $q(x)$ with coefficients in $\mathbb{C}$ such that

$$p(x) = q(x)(x - c).$$

Moreover, $q(x)$ has degree $n - 1$. If the coefficients of $p(x)$ and the number $c$ are real, then the coefficients of the polynomial $q(x)$ are real.

Proof. For existence, recall (or observe by chain cancellation) that for $k \in \mathbb{N}$,

$$x^k - c^k = (x - c)(x^{k-1} + x^{k-2} c + \dots + x c^{k-2} + c^{k-1}).$$

Therefore,

$$p(x) - p(c) = a_n(x^n - c^n) + \dots + a_1(x - c)$$

can be written as $x - c$ times another polynomial. If $c$ is a root of $p(x)$, then $p(c) = 0$ by definition, so our statement follows.

For uniqueness, note that for a non-zero polynomial $q(x)$ of degree $k$, the polynomial $q(x)(x - c)$ has degree $k + 1$, and hence is non-zero. Applying this to the difference of two candidates $q_1(x)$, $q_2(x)$ shows that $q_1(x)(x - c) = q_2(x)(x - c)$ forces $q_1 = q_2$. □
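The quotient $q(x)$ of Lemma 1.4.2 can be computed explicitly. The sketch below uses synthetic division (Horner's scheme with the intermediate values kept), which is a different route from the identity for $x^k - c^k$ used in the proof but produces the same $q(x)$. The function name `divide_by_linear` and the coefficient-tuple convention $(a_0, \dots, a_n)$ are assumptions made for this illustration.

```python
def divide_by_linear(coeffs, c):
    """Given p = (a_0, ..., a_n), return (q, r) with p(x) = q(x)(x - c) + r.

    The remainder r equals p(c), so r == 0 exactly when c is a root of p.
    """
    if not coeffs:
        return (), 0
    partial = []
    carry = 0
    for a in reversed(coeffs):      # process a_n, a_{n-1}, ..., a_0
        carry = carry * c + a
        partial.append(carry)
    r = partial.pop()               # the last value is p(c)
    q = tuple(reversed(partial))    # coefficients of q, lowest degree first
    return q, r

# Hypothetical examples.
q1, r1 = divide_by_linear((2, -3, 1), 1)      # x^2 - 3x + 2 divided by x - 1
q2, r2 = divide_by_linear((-1, 0, 0, 1), 1)   # x^3 - 1 divided by x - 1
print(q1, r1)
print(q2, r2)
```

With integer inputs all computations are exact; with floating-point or complex coefficients the remainder at a root is only approximately zero in general.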

We immediately have the following

1.4.3 Corollary. A polynomial $p(x)$ of degree $n$ with coefficients in $\mathbb{R}$ or $\mathbb{C}$ has at most $n$ distinct roots.

1.4.4 Proposition. Let $c$ be a (possibly complex) root of a polynomial $p$ with coefficients in $\mathbb{R}$. Then the complex conjugate $\bar{c}$ is also a root of $p$.

Proof. By 1.2.(*), $p(\bar{c}) = \overline{p(c)}$. □

The Fundamental Theorem of Algebra (which will be proved in Chapter 10, Theorem 5.2) states that

every polynomial of degree $\ge 1$ has a root in $\mathbb{C}$.

By Lemma 1.4.2, we then see that every polynomial of degree $n$ with coefficients in $\mathbb{C}$ can be written uniquely (up to order of factors) as

$$p(x) = a_n (x - c_1) \cdot \dots \cdot (x - c_n)$$

for some complex numbers $c_1, \dots, c_n$. (The uniqueness is proved by induction.) Note that the numbers $c_1, \dots, c_n$ may not be all distinct. When $c = c_i$ for exactly $k > 0$ different values $i \in \{1, \dots, n\}$, we say that the root $c$ has multiplicity $k$.

Applying Proposition 1.4.4 inductively, if a polynomial $p(x)$ has real coefficients, then the multiplicity of the root $c$ is equal to the multiplicity of $\bar{c}$.
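A small numerical illustration of Proposition 1.4.4 may be helpful. The polynomial and the helper name `evaluate` below are chosen by us; the example $x^2 - 2x + 5$ has real coefficients and the non-real root $1 + 2i$, so its conjugate $1 - 2i$ must be a root as well.

```python
def evaluate(coeffs, x):
    """Value of a_n x^n + ... + a_0 at x, for coeffs = (a_0, ..., a_n)."""
    result = 0
    for a in reversed(coeffs):
        result = result * x + a
    return result

p = (5, -2, 1)     # x^2 - 2x + 5, real coefficients
c = 1 + 2j         # a non-real root

print(evaluate(p, c))               # value at c
print(evaluate(p, c.conjugate()))   # value at the conjugate of c

# The identity behind the proof: p(conj(z)) = conj(p(z)) for any complex z.
z = 0.5 - 3j
lhs = evaluate(p, z.conjugate())
rhs = evaluate(p, z).conjugate()
```

In this example $p(x) = (x - c)(x - \bar{c})$, so both roots have multiplicity 1, in agreement with the statement on multiplicities.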

2 Convergent and Cauchy sequences

2.1

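A sequence $(x_n)_n$ in $\mathbb{R}$ or in $\mathbb{C}$ is said to converge to $x$ if

$$\forall \varepsilon > 0 \ \exists n_0 \text{ such that } n \ge n_0 \Rightarrow |x_n - x| < \varepsilon.$$

We write

$$\lim_n x_n = x \quad \text{or simply} \quad \lim x_n = x.$$

The reader is certainly familiar with the easy facts such as $\lim(x_n + y_n) = \lim x_n + \lim y_n$ or $\lim(x_n y_n) = \lim x_n \lim y_n$, etc.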

2.2

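A sequence $(x_n)_n$ in $\mathbb{R}$ or in $\mathbb{C}$ is said to be Cauchy if

$$\forall \varepsilon > 0 \ \exists n_0 \text{ such that } m, n \ge n_0 \Rightarrow |x_m - x_n| < \varepsilon.$$

Observation. Every convergent sequence is Cauchy.

(If we have the implication $n \ge n_0 \Rightarrow |x_n - x| < \varepsilon$ then $m, n \ge n_0 \Rightarrow |x_m - x_n| \le |x_m - x| + |x - x_n| < 2\varepsilon$.)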

2.3 Theorem. If $a \le x_n \le b$ for all $n$, then the sequence $(x_n)_n$ contains a convergent subsequence $(x_{k_n})_n$, and $a \le \lim_n x_{k_n} \le b$.

Proof. Let $a \le x_n \le b$ for all $n$. Set

$$M = \{x \mid \exists \text{ infinitely many } n \text{ such that } x \le x_n\}.$$

This set is non-empty ($a \in M$) and bounded above (no $x > b$ is in $M$) and hence there is a finite $s = \sup M$. By the definition, each

$$K_n = \{k \mid s - \tfrac{1}{n} < x_k < s + \tfrac{1}{n}\}$$

is infinite, and we can choose, first, $x_{k_1}$ such that $s - 1 < x_{k_1} < s + 1$, and if $k_1 < \dots < k_n$ are chosen with $k_j \in K_j$ we can choose a $k_{n+1} \in K_{n+1}$ such that $k_{n+1} > k_n$. Then obviously $\lim_n x_{k_n} = s$, and equally obviously $a \le s \le b$. □

2.4 Theorem. (Bolzano - Cauchy) Every Cauchy sequence of real numbers converges.

Proof. Since for some $m$ and all $n \ge m$, $|x_n - x_m| < 1$, a Cauchy sequence is bounded and hence it contains a subsequence $x_{k_1}, \dots, x_{k_n}, \dots$ converging to an $x$. But then $\lim_n x_n = x$: indeed, choose for an $\varepsilon > 0$ an $n_0$ such that for $m, n \ge n_0$ we have $|x_m - x_n| < \varepsilon$ and $|x - x_{k_n}| < \varepsilon$. Then, since $k_n \ge n$, $|x - x_n| \le |x - x_{k_n}| + |x_{k_n} - x_n| < 2\varepsilon$ for $n \ge n_0$. □

2.5

From 1.3.1, we see that if $(x_n = x_{n1} + i x_{n2})_n$ is a sequence of complex numbers then

$(x_n)_n$ converges if and only if both $(x_{nj})_n$ converge

and

$(x_n)_n$ is Cauchy if and only if both $(x_{nj})_n$ are Cauchy.

Consequently we can infer from Theorem 2.4 the following

Corollary. Every Cauchy sequence of complex numbers converges.

3 Continuous functions

3.1

Recall that a real (resp. complex) function of one real (resp. complex) variable is a mapping

$$f : X \to \mathbb{R} \ (\text{resp. } \to \mathbb{C}) \quad \text{with } X \subseteq \mathbb{R} \ (\text{resp. } \subseteq \mathbb{C}).$$

In the real case $X$ will be most often an interval, that is, a set $J \subseteq \mathbb{R}$ such that $x, y \in J$ and $x \le z \le y$ implies that $z \in J$.

Recall the standard notation from 1.1 for (bounded) open and closed intervals:

$$(a, b) = \{x \mid a < x < b\} \quad \text{and} \quad \langle a, b\rangle = \{x \mid a \le x \le b\}.$$

The intervals $\langle a, b\rangle$ will be often referred to as compact intervals; the reason for this terminology will become apparent in Chapter 2 below. A function $f : X \to \mathbb{R}$ resp. $\mathbb{C}$ is said to be continuous if

$$\forall x \in X \ \forall \varepsilon > 0 \ \exists \delta > 0 \text{ such that } |y - x| < \delta \Rightarrow |f(y) - f(x)| < \varepsilon. \tag{3.1.1}$$

3.2 Proposition. A function $f$ is continuous if and only if for every convergent sequence $(x_n)_n$ in $X$ with $\lim x_n \in X$ one has $f(\lim x_n) = \lim f(x_n)$.

Proof. If $f$ is continuous, if $x = \lim x_n \in X$ and if $\varepsilon > 0$ then first choose a $\delta > 0$ as in (3.1.1) and then an $n_0$ such that $|x_n - x| < \delta$ for $n \ge n_0$. Then $|f(x_n) - f(x)| < \varepsilon$ for $n \ge n_0$.

Now suppose $f$ is not continuous. Then there is an $x \in X$ and an $\varepsilon > 0$ such that for every $\delta > 0$ there is a $y(\delta) \in X$ such that $|y(\delta) - x| < \delta$ and $|f(y(\delta)) - f(x)| \ge \varepsilon$. Set $x_n = y(\frac{1}{n})$. Then $\lim_n x_n = x$ while $f(x_n)$ cannot converge to $f(x)$. □

3.3 Theorem. (The Intermediate Value Theorem) Let $J$ be an interval, let $f : J \to \mathbb{R}$ be a continuous function, and let, for some $u < v$ in $J$, $\min(f(u), f(v)) \le K \le \max(f(u), f(v))$. Then there is an $x \in \langle u, v\rangle$ such that $f(x) = K$.

Proof. Since a restriction of a continuous function is obviously continuous, since $f$ is continuous if and only if $-f$ is, and since if $f$ is continuous then any $x \mapsto f(x) - K$ with $K$ fixed is continuous, it suffices to prove that if $f : \langle a, b\rangle \to \mathbb{R}$ is continuous and $f(a) \le 0 \le f(b)$ then there is a $c \in \langle a, b\rangle$ such that $f(c) = 0$.

Set $c = \sup\{x \in \langle a, b\rangle \mid f(x) \le 0\}$.

Suppose $f(c) > 0$. Then for $\varepsilon = f(c)$ we have a $\delta > 0$ such that for $c - \delta < x \le c$, $f(x) > f(c) - \varepsilon = 0$, while there should exist an $x$ with $c - \delta < x \le c$ such that $f(x) \le 0$. Similarly we cannot have $f(c) < 0$ because for $\varepsilon = -f(c)$ we would have a $\delta > 0$ with $f(x) < f(c) + \varepsilon = 0$ for $c \le x < c + \delta$, contradicting the definition of $c$ again. Thus, $f(c) = 0$. □
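The proof above obtains $c$ as a supremum and is not constructive. A related constructive technique, which the text does not use, is bisection: starting from $f(a) \le 0 \le f(b)$, one repeatedly halves the interval while keeping this sign condition at the endpoints. The following is a minimal sketch in Python; the function name `bisect_root`, the tolerance parameter `tol` and the sample function are assumptions of ours.

```python
def bisect_root(f, a, b, tol=1e-12):
    """Approximate a zero of a continuous f on <a, b>, assuming f(a) <= 0 <= f(b).

    Invariant of the loop: f(a) <= 0 <= f(b), so by the Intermediate Value
    Theorem the current interval always contains a zero of f.
    """
    if not (f(a) <= 0 <= f(b)):
        raise ValueError("need f(a) <= 0 <= f(b)")
    while b - a > tol:
        m = (a + b) / 2
        if f(m) <= 0:
            a = m      # the sign condition holds on <m, b>
        else:
            b = m      # the sign condition holds on <a, m>
    return (a + b) / 2

# Hypothetical example: the positive zero of x^2 - 2 on <0, 2>.
root = bisect_root(lambda x: x * x - 2, 0.0, 2.0)
print(root)
```

The existence of the zero still rests on Theorem 3.3 (ultimately on the existence of suprema in $\mathbb{R}$); the code only produces floating-point approximations of it.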

3.4 Theorem. A continuous function $f : \langle a, b\rangle \to \mathbb{R}$ on a compact interval attains a maximum and a minimum.

Proof for the maximum. Set $M = \{f(x) \mid x \in \langle a, b\rangle\}$. If it is not bounded above, choose $x_n$ with $f(x_n) > n$ and consider a convergent subsequence $x_{k_n}$ with limit $y$ (Theorem 2.3). By Proposition 3.2 we have $f(y) = \lim_n f(x_{k_n})$, which is impossible because $f(x_{k_n}) > k_n \ge n$ for all $n$, so that this sequence is unbounded. Hence $M$ is bounded above and has a finite supremum $s$. Now choose $x_n$ with $s - \frac{1}{n} < f(x_n) \le s$, and a convergent subsequence $x_{k_n}$ with limit $y \in \langle a, b\rangle$ to obtain $f(y) = s$. □

3.5

A function $f$ is said to be uniformly continuous if

$$\forall \varepsilon > 0 \ \exists \delta > 0 \text{ such that } \forall x, y,\ |y - x| < \delta \Rightarrow |f(y) - f(x)| < \varepsilon.$$

3.5.1 Theorem. A continuous function on $\langle a, b\rangle$ is uniformly continuous.

Proof. Suppose not. Then there exists an $\varepsilon > 0$ such that

$$\forall n \ \exists x_n, y_n \text{ such that } |x_n - y_n| < \tfrac{1}{n} \text{ and } |f(x_n) - f(y_n)| \ge \varepsilon.$$

Choose a convergent subsequence $(x_{k_n})_n$ and then a convergent subsequence $(y_{k_{m_n}})_n$ of $(y_{k_n})_n$. Then we have $\lim_n x_{k_{m_n}} = \lim_n y_{k_{m_n}}$, contradicting Proposition 3.2 and the inequality $|\lim_n f(x_{k_{m_n}}) - \lim_n f(y_{k_{m_n}})| \ge \varepsilon$. □
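Compactness of the interval is essential in Theorem 3.5.1. As a standard illustration (an example of our own choosing, not taken from the text), the function $f(x) = 1/x$ is continuous on the open interval $(0, 1)$ but not uniformly continuous there. The sequences below play the role of $x_n$, $y_n$ in the proof above, with $\varepsilon = 1$.

```latex
\[
  x_n = \frac{1}{n+1}, \qquad y_n = \frac{1}{2(n+1)}, \qquad
  |x_n - y_n| = \frac{1}{2(n+1)} < \frac{1}{n},
\]
\[
  |f(x_n) - f(y_n)| = |(n+1) - 2(n+1)| = n + 1 \ge 1 .
\]
```

Hence no $\delta > 0$ works for $\varepsilon = 1$. Both sequences converge to $0$, which lies outside $(0, 1)$, so the contradiction with Proposition 3.2 obtained in the proof is not available here.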

4 Derivatives and the Mean Value Theorem

4.1

Let $f : X \to \mathbb{R}$ be a function, $X \subseteq \mathbb{R}$. We say that $f$ has a limit $A$ at a point $a$ and write

$$\lim_{x \to a} f(x) = A$$

if it is defined on $(u, v) \setminus \{a\}$ for some $u < a < v$ and if

$$\forall \varepsilon > 0 \ \exists \delta > 0 \text{ such that } x \in (a - \delta, a + \delta) \setminus \{a\} \Rightarrow |f(x) - A| < \varepsilon.$$

Note that $f$ does not have to be defined at $a$, and if it is, $\lim_{x \to a} f(x) = A$ does not say anything about the value $f(a)$.

4.2

Let $J$ be an open interval. A function $f : J \to \mathbb{R}$ has a derivative $A$ at a point $x$ if

$$\lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = A$$

(that is, if the limit on the left-hand side exists, and if it is equal to $A$). The reader is certainly familiar with the notation

$$A = f'(x), \quad \text{or} \quad \frac{df(x)}{dx},$$

and with the basic computation rules like $(f + g)' = f' + g'$ or $(fg)' = f'g + fg'$ etc.

4.3 Theorem. A function $f$ has a derivative $A$ at the point $x$ if and only if there is a function $\mu$ defined on some $(-\delta, \delta) \setminus \{0\}$ ($\delta > 0$) such that

$$\lim_{h \to 0} \mu(h) = 0 \quad \text{and} \quad f(x + h) - f(x) = Ah + h\mu(h).$$

Proof. If such a $\mu$ exists we have for $h \in (-\delta, \delta) \setminus \{0\}$,

$$\frac{f(x + h) - f(x)}{h} = A + \mu(h)$$

and hence $\lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = A$. On the other hand, if the derivative exists then we can set $\mu(h) = \frac{f(x + h) - f(x)}{h} - A$. $\square$

4.3.1 Corollary. If $f$ has a non-zero derivative at a point $x$ then $f(x)$ is neither a maximum nor a minimum value of $f$. (A maximum resp. minimum value of a function $f$ is the maximum resp. minimum, if one exists, of the set of values of $f$.)

(Indeed, consider $f(x + h) - f(x) = h(A + \mu(h))$ for $h$ so small that $|\mu(h)| < |A|$: the factor $A + \mu(h)$ then has the sign of $A$, so the difference changes sign with $h$.)

A point at which a function $f$ has zero derivative or the derivative does not exist is called a critical point. Corollary 4.3.1 implies that critical points are the only points at which a function $f$ can have a minimum or a maximum. It is, of course, not guaranteed that a critical point would be an actual minimum or maximum (take the point $x = 0$ for the function $f(x) = x^3$). However, see Theorem 4.7 below for a partial converse of the Corollary.

4.4 The Mean Value Theorem

4.4.1 Theorem. (Rolle) Let $f$ be continuous in $\langle a, b\rangle$ and let it have a derivative in $(a, b)$. Let $f(a) = f(b)$. Then there is a $c \in (a, b)$ such that $f'(c) = 0$.

Proof. If $f$ is constant then $f'(c) = 0$ for all $c$. If not then, as $f(a) = f(b)$, either its maximum or its minimum (recall Theorem 3.4) has to be attained at a $c \in (a, b)$. By 4.3.1, $f'(c) = 0$. $\square$

4.4.2 Theorem. (The Mean Value Theorem, Lagrange's Theorem) Let $f$ be continuous in $\langle a, b\rangle$ and let it have a derivative in $(a, b)$. Then there is a $c \in (a, b)$ such that

$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$

More generally, if, furthermore, $g$ is a function with the same properties and such that $g(b) \ne g(a)$ and $g'(x) \ne 0$ in $(a, b)$, then there is a $c \in (a, b)$ such that

$$\frac{f'(c)}{g'(c)} = \frac{f(b) - f(a)}{g(b) - g(a)}.$$

Proof. Set $F(x) = (f(x) - f(a))(g(b) - g(a)) - (f(b) - f(a))(g(x) - g(a))$. Then $F(a) = F(b) = 0$ and $F'(x) = f'(x)(g(b) - g(a)) - g'(x)(f(b) - f(a))$, so by Rolle's Theorem there is a $c \in (a, b)$ with $F'(c) = 0$, and the second formula follows. For the first one, set $g(x) = x$. $\square$

4.4.3

The Mean Value Theorem is often used in the following form (to be compared with 4.3):

let $x, x + h$ both be in an interval in which $f$ has a derivative. Then

$$f(x + h) - f(x) = f'(x + \theta h) \cdot h \quad \text{for some } \theta \in (0, 1).$$

(Use 4.4.2 for $\langle x, x + h\rangle$ resp. $\langle x + h, x\rangle$.)

4.4.4 Corollary. If $f$ is continuous in $\langle a, b\rangle$ and if it has a positive (resp. negative) derivative in $(a, b)$ then it strictly increases (resp. decreases) (i.e. $x < y \Rightarrow f(x) < f(y)$ resp. $x < y \Rightarrow f(x) > f(y)$) in $\langle a, b\rangle$. If $f' \equiv 0$ in $(a, b)$ then $f$ is constant.

(For, $f(y) - f(x) = f'(c)(y - x)$ for some $c$ between $x$ and $y$.)

4.5 The second derivative, convex and concave functions

Suppose $f$ has a derivative $f'(x)$ at every $x \in J$, where $J$ is an open interval. Thus, we have a new real function $f' : J \to \mathbb{R}$ and this function may have a derivative again. In such a case we speak of the second derivative.

4.5.1

A function $f$ is said to be convex resp. concave on an interval $\langle a, b\rangle$ if for any two $x < y$ in $\langle a, b\rangle$ and any $z = tx + (1 - t)y$, ($0 < t < 1$), between these arguments,

$$f(z) \le tf(x) + (1 - t)f(y) \quad \text{resp.} \quad f(z) \ge tf(x) + (1 - t)f(y)$$

(that is, the points of the graph of $f$ lie below (resp. above) the straight line connecting the points $(x, f(x))$ and $(y, f(y))$).

4.5.2 Proposition. Let $f$ be continuous on $\langle a, b\rangle$ and let $f$ have a non-negative (resp. non-positive) second derivative on $(a, b)$. Then it is convex (resp. concave) on $\langle a, b\rangle$.

Proof. In the notation above we have

$$y - z = y - tx - (1 - t)y = t(y - x), \quad z - x = (1 - t)(y - x).$$

Let the second derivative be non-negative. Then, by the Mean Value Theorem, we have $x < u < z < v < y$ and $u < w < v$ such that

$$\frac{f(y) - f(z)}{y - z} - \frac{f(z) - f(x)}{z - x} = f'(v) - f'(u) = f''(w)(v - u) \ge 0$$

so that

$$\frac{f(y) - f(z)}{t(y - x)} \ge \frac{f(z) - f(x)}{(1 - t)(y - x)},$$

hence $(1 - t)(f(y) - f(z)) \ge t(f(z) - f(x))$ and finally

$$tf(x) + (1 - t)f(y) \ge f(z). \quad \square$$

4.5.3 An application: Young's inequality

We have

Proposition. Let $a, b > 0$ and let $p, q > 1$ be such that $\frac{1}{p} + \frac{1}{q} = 1$. Then

$$ab \le \frac{a^p}{p} + \frac{b^q}{q}.$$

Proof. Since $\ln''(x) = -\frac{1}{x^2} < 0$, $\ln$ is concave; thus if, say, $a^p < b^q$ we have

$$\ln\left(\frac{1}{p}a^p + \frac{1}{q}b^q\right) \ge \frac{1}{p}\ln(a^p) + \frac{1}{q}\ln(b^q) = \ln a + \ln b = \ln(ab)$$

and since $\ln$ increases, the inequality follows. $\square$
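A quick numerical sanity check of Young's inequality can be sketched as follows. The helper name `young_gap`, the sampling ranges and the random seed are illustrative assumptions; the check proves nothing, it only evaluates the two sides on sample data.

```python
import random


def young_gap(a, b, p):
    """Return a^p/p + b^q/q - a*b, where q = p/(p-1) is the conjugate exponent."""
    q = p / (p - 1.0)  # then 1/p + 1/q = 1
    return a ** p / p + b ** q / q - a * b


random.seed(0)  # fixed seed so the sample is reproducible
samples = [
    (random.uniform(0.1, 5.0), random.uniform(0.1, 5.0), random.uniform(1.1, 6.0))
    for _ in range(1000)
]
# By the Proposition, the gap should never be negative (up to rounding).
smallest_gap = min(young_gap(a, b, p) for a, b, p in samples)
```

For $p = q = 2$ the gap equals $\frac{1}{2}(a - b)^2$, which vanishes exactly when $a = b$.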

4.6 Derivatives of higher order and Taylor’s Theorem

Just as we defined the first and second derivative of a function on an open interval $J$, we may iterate the process to define the third, fourth derivative, etc. In general, we speak of the derivative of $n$'th order, and define

$$f^{(0)} = f, \quad f^{(1)} = f' \quad \text{and further} \quad f^{(n+1)} = (f^{(n)})'.$$

(Of course, as before, for a given function, such higher derivatives may or may not exist.)

4.6.1 Theorem. (Taylor) Let $f$ have derivatives up to order $n + 1$ in an open interval containing $a$ and $x$, $a \ne x$. Then there is a $c$ in the open interval between $a$ and $x$ such that

$$f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x - a)^k + \frac{f^{(n+1)}(c)}{(n + 1)!}(x - a)^{n+1}.$$

Proof. Fix $x$ and $a$ and define a function $R(t)$ of one real variable $t$ by setting

$$R(t) = f(x) - \sum_{k=0}^{n} \frac{f^{(k)}(t)}{k!}(x - t)^k.$$

Then we have

$$R'(t) = \frac{dR(t)}{dt} = -\sum_{k=0}^{n} \frac{f^{(k+1)}(t)}{k!}(x - t)^k + \sum_{k=1}^{n} \frac{f^{(k)}(t)}{k!}k(x - t)^{k-1}.$$

Substituting $l = k - 1$ in the second sum we obtain

$$R'(t) = -\sum_{k=0}^{n} \frac{f^{(k+1)}(t)}{k!}(x - t)^k + \sum_{l=0}^{n-1} \frac{f^{(l+1)}(t)}{l!}(x - t)^l = -\frac{f^{(n+1)}(t)}{n!}(x - t)^n.$$

Now define $g(t) = (x - t)^{n+1}$. Then $g'(t) = -(n + 1)(x - t)^n$ and $g(x) = 0$. Since also $R(x) = 0$ we obtain from Theorem 4.4.2,

$$\frac{R(a)}{g(a)} = \frac{R(a) - R(x)}{g(a) - g(x)} = \frac{R'(c)}{g'(c)} = \frac{f^{(n+1)}(c)(x - c)^n}{n!(n + 1)(x - c)^n}$$

and hence

$$R(a) = \frac{f^{(n+1)}(c)}{n!(n + 1)}g(a) = \frac{f^{(n+1)}(c)}{(n + 1)!}(x - a)^{n+1}$$

and the statement follows, since $R(a) = f(x) - \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x - a)^k$, that is, $f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x - a)^k + R(a)$. $\square$
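To see the remainder term at work, here is a small sketch for $f = \exp$. It assumes the fact (Exercise (4) below) that every derivative of $\exp$ is $\exp$ itself; the names `taylor_exp` and `lagrange_bound` are illustrative. Since $c$ lies between $a$ and $x$ and $\exp$ increases, $|f^{(n+1)}(c)| \le e^{\max(a, x)}$, which gives a computable bound for the remainder.

```python
import math


def taylor_exp(x, a, n):
    """Taylor polynomial of exp of order n with center a, evaluated at x."""
    # Assumption: f^(k)(a) = exp(a) for every k, since exp' = exp.
    return sum(math.exp(a) * (x - a) ** k / math.factorial(k) for k in range(n + 1))


def lagrange_bound(x, a, n):
    """Bound for |f^(n+1)(c)| * |x - a|^(n+1) / (n+1)! over all c between a and x."""
    return math.exp(max(a, x)) * abs(x - a) ** (n + 1) / math.factorial(n + 1)


# Hypothetical usage: actual error versus the bound from Theorem 4.6.1 at x = 1, a = 0.
errors = [abs(math.exp(1.0) - taylor_exp(1.0, 0.0, n)) for n in range(10)]
bounds = [lagrange_bound(1.0, 0.0, n) for n in range(10)]
```

Each entry of `errors` should not exceed the corresponding entry of `bounds`, and both shrink rapidly because of the factor $(n + 1)!$ in the denominator.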

4.7 Local extremes

One immediate consequence of Taylor's Theorem is a partial converse of Corollary 4.3.1. Suppose a function $f$ is defined on an open interval containing a point $x_0$. We say that $x_0$ is a local maximum (resp. local minimum) of $f$ if there exists a $\delta > 0$ such that for all $x \in (x_0 - \delta, x_0 + \delta)$ with $x \ne x_0$, $f(x) < f(x_0)$ (resp. $f(x) > f(x_0)$). We have the following

Theorem. Let $f$ be a function such that $f'$ and $f''$ exist and are continuous on an open interval $(a, b)$ containing a point $x_0$. Suppose further that $f'(x_0) = 0$, $f''(x_0) < 0$ (resp. $f''(x_0) > 0$). Then $x_0$ is a local maximum (resp. local minimum) of $f$.

Proof. Let us treat the case of $f''(x_0) = q > 0$; the proof in the other case is analogous. By Taylor's Theorem, for $x \in (a, b)$, $x \ne x_0$, there exists a point $c$ in the open interval between $x_0$ and $x$ such that

$$f(x) = f(x_0) + \frac{f''(c)}{2}(x - x_0)^2. \quad (*)$$

Since $f''$ is continuous, there exists a $\delta > 0$ such that $f''(t) > 0$ for all $t \in (x_0 - \delta, x_0 + \delta)$. If $x$ lies in this interval then so does $c$, hence $f''(c) > 0$. Then it follows immediately from (*) that if $x \in (x_0 - \delta, x_0 + \delta)$, $x \ne x_0$, then $f(x) > f(x_0)$. $\square$

5 Uniform convergence

5.1

Let $f_n$ be real or complex functions defined on a set $X$. We write $\lim_n f_n = f$, or briefly $f_n \to f$, if $\lim_n f_n(x) = f(x)$ for all $x \in X$, and say that $f_n$ converge to $f$ pointwise. This convergence is not very satisfactory: consider $f_n : \langle 0, 1\rangle \to \mathbb{R}$ defined by $f_n(x) = x^n$, an example where all the $f_n$ are continuous while the limit $f$ is not.

We shall need to work with a stronger concept. A sequence of (real or complex) functions $(f_n)_n$ is said to converge to $f$ uniformly if

$$\forall \varepsilon > 0 \ \exists n_0 \text{ such that } \forall n \ge n_0 \ \forall x, \quad |f_n(x) - f(x)| < \varepsilon.$$

This is often indicated by writing $f_n \rightrightarrows f$.
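The example $f_n(x) = x^n$ can be probed numerically. The sketch below approximates $\sup_x |f_n(x) - f(x)|$ on a finite grid; the function name `sup_distance` and the grid size are illustrative assumptions, and a grid only yields a lower bound for the true supremum (which equals 1 for every $n$).

```python
def sup_distance(n, grid_size=10_000):
    """Grid lower bound for sup over [0, 1] of |x^n - f(x)|, f being the pointwise limit."""
    def limit(x):
        # pointwise limit of x^n on [0, 1]: 0 for x < 1 and 1 at x = 1
        return 1.0 if x == 1.0 else 0.0

    xs = [i / grid_size for i in range(grid_size + 1)]
    return max(abs(x ** n - limit(x)) for x in xs)


# Pointwise, the values at a fixed x < 1 tend to 0 ...
value_at_half = 0.5 ** 50
# ... but the distance to the limit, taken over all x at once, stays close to 1.
distances = [sup_distance(n) for n in (1, 10, 100)]
```

The distances do not approach 0, so no $n_0$ can work for, say, $\varepsilon = \frac{1}{2}$: the convergence is pointwise but not uniform.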

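5.2 Theorem. Let $f_n$ be continuous and let $f_n \rightrightarrows f$. Then $f$ is continuous.

Proof. Take an $x_0 \in X$ and an $\varepsilon > 0$. Choose an $n_0$ such that for all $n \ge n_0$ and for all $x$, $|f_n(x) - f(x)| < \frac{\varepsilon}{3}$; fix such an $n$, and then choose a $\delta > 0$ such that $|f_n(x_0) - f_n(x)| < \frac{\varepsilon}{3}$ for $|x_0 - x| < \delta$. Then for $|x_0 - x| < \delta$,

$$|f(x_0) - f(x)| \le |f(x_0) - f_n(x_0)| + |f_n(x_0) - f_n(x)| + |f_n(x) - f(x)| < \varepsilon. \quad \square$$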

5.3 Theorem. Let $f_n$ have derivatives on an open interval $J$, let $f_n \to f$ and let $f_n' \rightrightarrows g$. Then $f$ has a derivative and $f' = g$.

Proof. By the Mean Value Theorem we have for some $0 < \theta < 1$,

$$\begin{aligned}
\left|\frac{f(x + h) - f(x)}{h} - g(x)\right|
&= \left|\frac{f(x + h) - f_n(x + h)}{h} - \frac{f(x) - f_n(x)}{h} + \frac{f_n(x + h) - f_n(x)}{h} - g(x)\right| \\
&= \left|\frac{f(x + h) - f_n(x + h)}{h} - \frac{f(x) - f_n(x)}{h} + f_n'(x + \theta h) - g(x)\right| \\
&\le \frac{1}{|h|}|f(x + h) - f_n(x + h)| + \frac{1}{|h|}|f(x) - f_n(x)| \\
&\quad + |f_n'(x + \theta h) - g(x + \theta h)| + |g(x + \theta h) - g(x)|.
\end{aligned}$$

Fix an $h \ne 0$ such that $|g(x + \theta h) - g(x)| < \frac{\varepsilon}{4}$. Then choose an $n$ such that

(1) $|f(x + h) - f_n(x + h)| < \frac{\varepsilon}{4}|h|$ and $|f(x) - f_n(x)| < \frac{\varepsilon}{4}|h|$, and

(2) $|f_n'(x + \theta h) - g(x + \theta h)| < \frac{\varepsilon}{4}$.

(Inequality (2) is where we need the convergence to be uniform: we do not know the exact position of $x + \theta h$.) Then

$$\left|\frac{f(x + h) - f(x)}{h} - g(x)\right| < \frac{1}{|h|}\frac{\varepsilon}{4}|h| + \frac{1}{|h|}\frac{\varepsilon}{4}|h| + \frac{\varepsilon}{4} + \frac{\varepsilon}{4} = \varepsilon. \quad \square$$

6 Series. Series of functions

6.1

Let $(a_n)_n$ be a sequence of real or complex numbers. The associated series (or sum of a series) $\sum_{n=1}^{\infty} a_n$ (briefly, $\sum a_n$ if there is no danger of confusion) is the limit $\lim_n \sum_{k=1}^{n} a_k$ provided it exists; in such a case we say that $\sum a_n$ converges, and we say that it converges absolutely if $\sum |a_n|$ converges.

6.2 Consequences of Absolute Convergence

6.2.1 Proposition. An absolutely convergent series converges. More generally, if $|a_n| \le b_n$ and $\sum b_n$ converges then $\sum a_n$ converges.

Proof. Set $s_n = \sum_{k=1}^{n} a_k$ and $\bar{s}_n = \sum_{k=1}^{n} b_k$. For $m \ge n$ we have

$$|s_m - s_n| = \left|\sum_{k=n+1}^{m} a_k\right| \le \sum_{k=n+1}^{m} |a_k| \le \sum_{k=n+1}^{m} b_k = |\bar{s}_m - \bar{s}_n|.$$

Thus, if $(\bar{s}_n)_n$ is convergent, hence Cauchy, then $(s_n)_n$ is Cauchy and hence convergent. $\square$

6.2.2 Proposition. The series $\sum a_n$ converges absolutely if and only if for every $\varepsilon > 0$ there is an $n_0$ with $\sum_{k \in K} |a_k| < \varepsilon$ for every finite $K \subseteq \{n \mid n \ge n_0\}$.

Proof. The formula is equivalent to stating that $\sum_{k=n}^{m} |a_k| < \varepsilon$ for $n_0 \le n \le m$. Thus, the condition amounts to stating that the sequence $\left(\sum_{k=1}^{n} |a_k|\right)_n$ is Cauchy. $\square$

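6.2.3 Theorem. Let $\sum a_n$ converge absolutely. Then for all bijections $p$ from the set of natural numbers $\{1, 2, \dots\}$ to itself the sums $\sum_{n=1}^{\infty} a_{p(n)}$ are equal.

Proof. Let $\sum_{n=1}^{\infty} a_{p(n)} = s$ for a bijection $p$, and let $\varepsilon > 0$. Choose $n_1$ sufficiently large such that $\sum_{k \in K} |a_k| < \frac{\varepsilon}{2}$ for every finite $K \subseteq \{n \mid n \ge n_1\}$ and, further, an $n_0$ such that for $n \ge n_0$ we have

$$\left|\sum_{k=1}^{n} a_{p(k)} - s\right| < \frac{\varepsilon}{2} \quad \text{and} \quad \{p(1), \dots, p(n)\} \supseteq \{1, \dots, n_1\}.$$

Now if $n \ge \max\{p(1), \dots, p(n_0)\}$ and we consider $K = \{1, \dots, n\} \setminus \{p(1), \dots, p(n_0)\}$, we obtain

$$\left|\sum_{k=1}^{n} a_k - s\right| = \left|\sum_{k=1}^{n_0} a_{p(k)} + \sum_{k \in K} a_k - s\right| \le \left|\sum_{k=1}^{n_0} a_{p(k)} - s\right| + \sum_{k \in K} |a_k| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$

Hence every rearranged sum $s$ is equal to $\sum_{n=1}^{\infty} a_n$. $\square$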

6.3

It is worth taking this a little further. A set $S$ is called countable if there exists a bijection $\varphi : \{1, 2, \dots\} \to S$. Note that this is the same as ordering $S$ into an infinite sequence $s_1, s_2, \dots$ where the $s_i$ go through all elements of $S$, and each element occurs exactly once. Let us say that $\sum_{s \in S} a_s$ converges absolutely if

$$\sup_{K \subseteq S \text{ finite}} \sum_{s \in K} |a_s|$$

is finite. By Proposition 6.2.2, this is equivalent to $\sum_{n=1}^{\infty} a_{\varphi(n)}$ converging absolutely for one specified bijection $\varphi$ (which can be arbitrary). Theorem 6.2.3 then shows that when this occurs, then $\sum_{s \in S} a_s$ is well-defined. Here is an example where this point of view helps:

6.3.1 Theorem. Let $S_1, S_2, \dots$ be disjoint finite or countable sets, and let $S = \bigcup_i S_i$. Then the set $S$ is finite or countable. Furthermore, if $\sum_{s \in S} a_s$ converges absolutely, then

$$\sum_{i=1}^{\infty} \left(\sum_{s \in S_i} a_s\right) = \sum_{s \in S} a_s, \quad (*)$$

and the left-hand side converges absolutely.

Proof. The case when $S$ is finite is not interesting. Otherwise, we may order the elements of $S$ into an infinite sequence as follows: Assume each of the sets $S_i$ is ordered into a (finite or infinite) sequence. Then let $T_n$ consist of all the $i$'th elements (if any) of $S_j$ such that $1 \le i, j \le n$. Then clearly each $T_n$ is finite, and $T_n \subseteq T_{n+1}$, and $\bigcup T_i = S$. Thus, we can order $S$ by taking all the elements of $T_1$, then all the remaining elements of $T_2$, etc. Thus, $S$ is countable.

Now let us investigate (*). The supremum $\sup \sum_{s \in K} |a_s|$ over finite subsets $K$ of $S_i$ is less than or equal to the analogous supremum over finite subsets $K$ of $S$, which shows that each $\sum_{s \in S_i} a_s$ converges absolutely. Further, for a finite subset $K \subseteq \{1, 2, \dots\}$,

$$\sum_{i \in K} \left|\sum_{s \in S_i} a_s\right| \le \sum_{i \in K} \sum_{s \in S_i} |a_s| \le \sup \sum_{i \in K} \sum_{s \in L_i} |a_s|$$

where the supremum on the right-hand side is over all finite subsets $L_i \subseteq S_i$. We see that the right-hand side is finite by our assumption of absolute convergence over $S$, and therefore the left-hand side of (*) converges absolutely.

Finally, to prove equality in (*), use a variation of the above proof of the fact that $S$ is countable: Let $T_n$ consist of sufficiently many elements of $S_1, \dots, S_n$ such that the sum

$$\sum_{i=1}^{n} \left(\sum_{s \in T_n \cap S_i} a_s\right)$$

differs from

$$\sum_{i=1}^{n} \left(\sum_{s \in S_i} a_s\right)$$

by less than $1/n$. Then the limit of these particular partial sums is the left-hand side of (*), but is also equal to the right-hand side by absolute convergence. $\square$

6.3.2 Corollary. Let $\sum_{m=0}^{\infty} a_m$, $\sum_{n=0}^{\infty} b_n$ be absolutely convergent series. Then

$$\left(\sum_{m=0}^{\infty} a_m\right) \cdot \left(\sum_{n=0}^{\infty} b_n\right) = \sum_{n=0}^{\infty} \left(\sum_{k=0}^{n} a_k b_{n-k}\right), \quad (*)$$

with the right-hand side converging absolutely.

Proof. By the assumption, the supremum of $\sum_{m \in K, n \in L} |a_m| \cdot |b_n|$ over $K$, $L$ finite subsets of $\{0, 1, 2, \dots\}$ is finite, thus proving that $\sum_{S} a_m b_n$ is absolutely convergent, where $S$ is the set of all pairs $(m, n)$ of numbers $0, 1, \dots$. The rest follows from Theorem 6.3.1. $\square$
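The product formula (*) of Corollary 6.3.2 can be checked on two geometric series. This is a sketch under the assumption that $\sum_{m \ge 0} 2^{-m} = 2$ and $\sum_{n \ge 0} 3^{-n} = \frac{3}{2}$ are known; the helper name `cauchy_product_partial` and the cutoff 60 are illustrative.

```python
def cauchy_product_partial(a, b, N):
    """Partial sum over n = 0..N of the inner sums  sum_{k=0}^{n} a(k) * b(n - k)."""
    return sum(
        sum(a(k) * b(n - k) for k in range(n + 1))
        for n in range(N + 1)
    )


def a(m):
    return 0.5 ** m  # geometric series with sum 2


def b(n):
    return (1.0 / 3.0) ** n  # geometric series with sum 3/2


# The right-hand side of (*), truncated at n = 60; it should be close to 2 * 3/2 = 3.
approx_product = cauchy_product_partial(a, b, 60)
```

Grouping the pairs $(m, n)$ by the value of $m + n$ is exactly the decomposition of the index set $S$ into the disjoint finite sets $S_i$ used in Theorem 6.3.1.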

6.4

Let $(f_n)_n$ be a sequence of real or complex functions (defined on an $X \subseteq \mathbb{R}$ resp. $\mathbb{C}$). The series $\sum_{n=1}^{\infty} f_n$ is defined as a function with values $\sum_{n=1}^{\infty} f_n(x)$ whenever the last series converges. If $f(x) = \sum_{n=1}^{\infty} f_n(x)$ converges (resp. converges absolutely) for all $x \in X$ we say that $\sum f_n$ converges (resp. converges absolutely). If $\sum_{k=1}^{n} f_k(x) \rightrightarrows f(x)$ we say that $\sum_{n=1}^{\infty} f_n$ converges uniformly.

6.4.1

Since finite sums of continuous functions are continuous and since $(f_1 + \dots + f_n)' = f_1' + \dots + f_n'$ we obtain from Theorem 5.2, Theorem 5.3 and Proposition 6.2.1

Corollary.

1. Let $f_n$ be continuous and let $\sum f_n$ converge uniformly. Then the resulting function is continuous.
2. Let $f = \sum_n f_n$ converge, let $f_n'$ exist and let $\sum f_n'$ converge uniformly. Then $f'$ exists and is equal to $\sum f_n'$ (that is, the derivative of $\sum f_n$ can be obtained by taking derivatives of the individual summands).
3. The statements 1 (resp. 2) apply to the case of $|f_n(x)| \le a_n$ with $\sum a_n$ convergent; here the convergence is, moreover, absolute.

7 Power series

7.1

A power series with center $c$ is a series $\sum_{n=1}^{\infty} a_n (x - c)^n$. For now we will limit ourselves to the real context; later, in Chapter 10, we will discuss power series in the complex case.

7.2

The limes superior (sometimes also called the upper limit) of a sequence $(a_n)_n$ of real numbers is the number

$$\limsup_n a_n = \inf_n \sup_{k \ge n} a_k.$$

It obviously exists if the sequence $(a_n)_n$ is bounded; if not we set $\limsup_n a_n = +\infty$. It is easy to see that $\limsup_n a_n = \lim a_n$ whenever the latter exists. The limes inferior (or lower limit) is defined analogously with inf and sup switched.

7.2.1 Proposition. Let $\limsup a_n = \inf_n \sup_{k \ge n} a_k = a$ and $\lim_n b_n = b$. Let $a_n, b_n \ge 0$ and let $a, b$ be finite. Then $\limsup a_n b_n = ab$.

Proof. Choose an $\varepsilon > 0$ and a $K > a + b$. Take an $\eta > 0$ such that $K > a + b + \eta$ and $K\eta < \varepsilon$. There is an $n_0$ such that

$$n \ge n_0 \Rightarrow \sup_{k \ge n} a_k < a + \eta \quad \text{and} \quad b - \eta < b_n < b + \eta.$$

That is, for every $n \ge n_0$ there exists a $k(n) \ge n$ such that $a \le a_{k(n)} < a + \eta$ and $b - \eta < b_{k(n)} < b + \eta$ so that

$$a(b - \eta) < a_{k(n)} b_{k(n)} < (a + \eta)(b + \eta) = ab + \eta(a + b + \eta)$$

and since $ab - \varepsilon < ab - \eta K < ab - \eta a$ and $\eta(a + b + \eta) < \eta K < \varepsilon$ we see that

$$ab - \varepsilon < a_{k(n)} b_{k(n)} < ab + \varepsilon$$

and conclude that $\limsup a_n b_n = ab$. $\square$

7.2.2

For a power series $\sum_{n=1}^{\infty} a_n (x - c)^n$ define the radius of convergence

$$\rho = \rho((a_n)_n) = \frac{1}{\limsup \sqrt[n]{|a_n|}}$$

if $\limsup \sqrt[n]{|a_n|} \ne 0$; otherwise set $\rho((a_n)_n) = +\infty$.

Theorem. Let $r < \rho((a_n)_n)$. Then the power series $\sum_{n=1}^{\infty} a_n (x - c)^n$ converges absolutely and uniformly on the set $\{x \mid |x - c| \le r\}$.

On the other hand, if $|x - c| > \rho$ then $\sum_{n=1}^{\infty} a_n (x - c)^n$ does not converge.

Proof. I. Let $|x - c| \le r < \rho$. Choose a $q$ such that

$$r \cdot \inf_n \sup_{k \ge n} \sqrt[k]{|a_k|} < q < 1.$$

Then there is an $n$ such that

$$r \sup_{k \ge n} \sqrt[k]{|a_k|} < q \quad \text{and hence} \quad r \sqrt[k]{|a_k|} < q \text{ for all } k \ge n.$$

Choose a $K \ge 1$ such that $r^k |a_k| \le K q^k$ for $k \le n$. Then

$$|a_k (x - c)^k| \le K q^k \quad \text{for all } k \text{ and } |x - c| \le r$$

and hence by Corollary 6.4.1, the series converges on $\{x \mid |x - c| \le r\}$ absolutely and uniformly.

II. Let $|x - c| > \rho$; then $|x - c| \cdot \inf_n \sup_{k \ge n} \sqrt[k]{|a_k|} > 1$, hence $|x - c| \cdot \sup_{k \ge n} \sqrt[k]{|a_k|} > 1$ for all $n$, and hence for each $n$ there is a $k(n) \ge n$ such that $|x - c| \cdot \sqrt[k(n)]{|a_{k(n)}|} > 1$, and hence $|a_{k(n)} (x - c)^{k(n)}| > 1$ and the series cannot converge: its summands do not even converge to 0. $\square$

7.3

Consider the series

$$\sum_{n=1}^{\infty} n a_n (x - c)^{n-1}. \quad (*)$$

Obviously it converges if and only if $\sum_{n=1}^{\infty} n a_n (x - c)^n$ does and hence its radius of convergence is

$$\frac{1}{\limsup \sqrt[n]{n|a_n|}}.$$

By Proposition 7.2.1, $\limsup \sqrt[n]{n|a_n|} = \limsup \sqrt[n]{n}\sqrt[n]{|a_n|} = \lim \sqrt[n]{n} \cdot \limsup \sqrt[n]{|a_n|} = \limsup \sqrt[n]{|a_n|}$ (since $\lim \sqrt[n]{n} = \lim e^{\frac{1}{n}\ln n} = e^0 = 1$). Thus, the radius of convergence $\rho$ of the series (*) is the same as that of the original $\sum a_n (x - c)^n$ and since $n a_n (x - c)^{n-1}$ is the derivative of $a_n (x - c)^n$ we conclude from 5.3 and 6.4.1 that for $|x - c| < \rho$ the series $\sum a_n (x - c)^n$ has a derivative, and that it is obtained as the sum of the derivatives of the individual summands.

For now, this derivative has to be understood in the real context. In fact, however, the statement is valid for complex power series as well; see Chapter 10.

7.4 Remark

If we proceed to compute the higher derivatives summand-wise, we obtain

$$f^{(k)}(x) = \sum_{n=k}^{\infty} n(n - 1) \cdots (n - k + 1) a_n (x - c)^{n-k}.$$

In particular

$$f^{(k)}(c) = k!\, a_k, \quad \text{and hence} \quad a_k = \frac{f^{(k)}(c)}{k!}.$$

Thus, if a function can be written as a power series with a center $c$ then the coefficients $a_n$ are uniquely determined (they do depend on the $c$, of course).

Compare this with the formula in 4.6.1. It should be noted, though, that in real analysis it can easily happen that a function $f$ has all derivatives without being representable as a power series: the remainder $\frac{f^{(n+1)}(t)}{(n+1)!}(x - c)^{n+1}$ may not converge to zero with increasing $n$ (see Exercise (13)). In fact, it is interesting to note that many important constructions in real analysis, such as the smooth partition of unity which we will need in Chapter 12, depend on the use of such functions.

8 A few facts about the Riemann integral

8.1

A partition of a compact interval $\langle a, b\rangle$ is a sequence

$$D : a = t_0 < t_1 < \cdots < t_n = b.$$

The mesh of the partition is the maximum of the numbers $|t_{i+1} - t_i|$. A partition $D' : a = t'_0 < t'_1 < \cdots < t'_m = b$ refines $D$ if $\{t_j \mid j = 1, \dots, n\} \subseteq \{t'_j \mid j = 1, \dots, m\}$.

Let $f : \langle a, b\rangle \to \mathbb{R}$ be a bounded function (this means that the set of values of $f$ is bounded). Define the lower and upper sum of $f$ in $D$ as

$$s(f, D) = \sum_{j=1}^{n} m_j (t_j - t_{j-1}) \quad \text{and} \quad S(f, D) = \sum_{j=1}^{n} M_j (t_j - t_{j-1})$$

where $m_j = \inf\{f(x) \mid x \in \langle t_{j-1}, t_j\rangle\}$ and $M_j = \sup\{f(x) \mid x \in \langle t_{j-1}, t_j\rangle\}$.

8.1.1 Proposition.

1. If $D'$ refines $D$ then $s(f, D) \le s(f, D')$ and $S(f, D) \ge S(f, D')$.
2. For any two partitions $D_1, D_2$, $s(f, D_1) \le S(f, D_2)$.

Proof. If $t_{k-1} = t'_l < t'_{l+1} < \cdots < t'_{l+r} = t_k$ and $a_j = \sup\{f(x) \mid x \in \langle t'_{l+j-1}, t'_{l+j}\rangle\}$, $A = \sup\{f(x) \mid x \in \langle t_{k-1}, t_k\rangle\}$ then

$$\sum_j a_j (t'_{l+j} - t'_{l+j-1}) \le \sum_j A (t'_{l+j} - t'_{l+j-1}) = A(t_k - t_{k-1})$$

and $S(f, D') \le S(f, D)$ follows. Similarly for the lower sums.

Let $D$ be a common refinement of $D_1$ and $D_2$ (easily obtained, e.g., from the union of the elements of the two partitions). Then

$$s(f, D_1) \le s(f, D) \le S(f, D) \le S(f, D_2). \quad \square$$
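The lower and upper sums are easy to compute when $f$ is increasing, because then $m_j = f(t_{j-1})$ and $M_j = f(t_j)$. The sketch below makes exactly this simplifying assumption (for a general bounded $f$ the infima and suprema are not available from finitely many samples); the helper names are illustrative.

```python
def uniform_partition(a, b, n):
    """The partition a = t_0 < t_1 < ... < t_n = b with equal steps."""
    return [a + (b - a) * j / n for j in range(n + 1)]


def lower_upper_sums(f, partition):
    """Return (s(f, D), S(f, D)) for a function f assumed increasing on [a, b]."""
    lower = upper = 0.0
    for left, right in zip(partition, partition[1:]):
        # For increasing f: m_j = f(t_{j-1}) and M_j = f(t_j).
        lower += f(left) * (right - left)
        upper += f(right) * (right - left)
    return lower, upper


def square(x):
    return x * x  # increasing on [0, 1]


coarse = lower_upper_sums(square, uniform_partition(0.0, 1.0, 4))
fine = lower_upper_sums(square, uniform_partition(0.0, 1.0, 8))  # refines the coarse one
```

In line with Proposition 8.1.1, passing from the coarse partition to its refinement raises the lower sum and lowers the upper sum, and every lower sum stays below every upper sum.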

8.2

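By Proposition 8.1.1, we can define the lower resp. upper Riemann integral of $f$ over $\langle a, b\rangle$ by setting

$$\underline{\int_a^b} f(x)\,dx = \sup_D s(f, D) \quad \text{resp.} \quad \overline{\int_a^b} f(x)\,dx = \inf_D S(f, D).$$

If $\underline{\int_a^b} f(x)\,dx = \overline{\int_a^b} f(x)\,dx$ we denote the common value by

$$\int_a^b f(x)\,dx \quad \text{or briefly} \quad \int_a^b f$$

and call it the Riemann integral of $f$ over $\langle a, b\rangle$.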

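8.2.1 Proposition. $\int_a^b f$ exists if and only if for every $\varepsilon > 0$ there is a partition $D$ such that

$$S(f, D) - s(f, D) < \varepsilon.$$

Proof. I. Let $\int_a^b f$ exist and let $\varepsilon > 0$. There is a partition $D_1$ such that $S(f, D_1) < \int_a^b f + \frac{\varepsilon}{2}$ and a partition $D_2$ such that $s(f, D_2) > \int_a^b f - \frac{\varepsilon}{2}$. Then we have, for the common refinement $D$ of $D_1$ and $D_2$,

$$S(f, D) - s(f, D) < \int_a^b f + \frac{\varepsilon}{2} - \int_a^b f + \frac{\varepsilon}{2} = \varepsilon.$$

II. Let the statement hold. Choose an $\varepsilon > 0$ and a $D$ such that $S(f, D) - s(f, D) < \varepsilon$. Then

$$\overline{\int_a^b} f \le S(f, D) < s(f, D) + \varepsilon \le \underline{\int_a^b} f + \varepsilon.$$

Since $\varepsilon > 0$ was arbitrary, $\overline{\int_a^b} f = \underline{\int_a^b} f$. $\square$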

8.3 Theorem. For every continuous function $f : \langle a, b\rangle \to \mathbb{R}$ the Riemann integral $\int_a^b f$ exists. In fact, more strongly, for every sequence $D_n$ of partitions of $\langle a, b\rangle$ whose mesh approaches 0 with $n \to \infty$, we have

$$\lim_{n \to \infty} s(f, D_n) = \lim_{n \to \infty} S(f, D_n) = \int_a^b f.$$

Proof. Let $\varepsilon > 0$. By 3.5.1, $f$ is uniformly continuous. Hence there exists a $\delta > 0$ such that

$$|x - y| < \delta \Rightarrow |f(x) - f(y)| < \frac{\varepsilon}{b - a}.$$

Choose a partition $D : a = t_0 < t_1 < \cdots < t_n = b$ such that $t_j - t_{j-1} < \delta$ for all $j = 1, \dots, n$. Then $M_j - m_j = \sup\{f(x) \mid x \in \langle t_{j-1}, t_j\rangle\} - \inf\{f(y) \mid y \in \langle t_{j-1}, t_j\rangle\} \le \sup\{|f(x) - f(y)| \mid x, y \in \langle t_{j-1}, t_j\rangle\} \le \frac{\varepsilon}{b - a}$ and hence

$$S(f, D) - s(f, D) = \sum (M_j - m_j)(t_j - t_{j-1}) \le \frac{\varepsilon}{b - a} \sum (t_j - t_{j-1}) = \frac{\varepsilon}{b - a}(b - a) = \varepsilon. \quad \square$$

8.4 Theorem. (The Integral Mean Value Theorem) Let $f$ be a continuous function on $\langle a, b\rangle$, $M = \max\{f(x) \mid x \in \langle a, b\rangle\}$ and $m = \min\{f(x) \mid x \in \langle a, b\rangle\}$ (they exist by 3.4). Then there exists a $c \in \langle a, b\rangle$ such that

$$\int_a^b f(x)\,dx = f(c)(b - a).$$

Proof. From the definition one immediately obtains that

$$m(b - a) \le \int_a^b f(x)\,dx \le M(b - a).$$

Thus there is a $K$, $m \le K \le M$, such that $\int_a^b f(x)\,dx = K(b - a)$. By 3.3, there exists a $c$ such that $K = f(c)$. $\square$

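8.5 Proposition. Let $a < b < c$ and let $f$ be a bounded function defined on $\langle a, c\rangle$. Then

$$\underline{\int_a^b} f + \underline{\int_b^c} f = \underline{\int_a^c} f \quad \text{and} \quad \overline{\int_a^b} f + \overline{\int_b^c} f = \overline{\int_a^c} f.$$

Proof. Denote by $\mathcal{D}(u, v)$ the set of all partitions of $\langle u, v\rangle$. For $D_1 \in \mathcal{D}(a, b)$ and $D_2 \in \mathcal{D}(b, c)$ define $D_1 + D_2 \in \mathcal{D}(a, c)$ as the union of the two sequences. Obviously

$$s(f, D_1 + D_2) = s(f, D_1) + s(f, D_2).$$

We have

$$\begin{aligned}
\underline{\int_a^b} f + \underline{\int_b^c} f
&= \sup_{D_1 \in \mathcal{D}(a, b)} s(f, D_1) + \sup_{D_2 \in \mathcal{D}(b, c)} s(f, D_2) \\
&= \sup\{s(f, D_1) + s(f, D_2) \mid D_1 \in \mathcal{D}(a, b), D_2 \in \mathcal{D}(b, c)\} \\
&= \sup\{s(f, D_1 + D_2) \mid D_1 \in \mathcal{D}(a, b), D_2 \in \mathcal{D}(b, c)\} \\
&= \sup\{s(f, D) \mid D \in \mathcal{D}(a, c)\} = \underline{\int_a^c} f,
\end{aligned}$$

the penultimate equality because each $D \in \mathcal{D}(a, c)$ can be refined to a $D_1 + D_2$ by adding the point $b$. The upper integrals are treated similarly. $\square$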

8.5.1 Convention

For $b < a$ we will write formally $\int_a^b f$ for $-\int_b^a f$. Then we have, for any $a, b, c$,

$$\int_a^b f + \int_b^c f = \int_a^c f.$$

8.6 Theorem. (The Fundamental Theorem of Calculus) Let $f$ be continuous on $\langle a, b\rangle$. For $x \in \langle a, b\rangle$ set

$$F(x) = \int_a^x f(t)\,dt.$$

Then we have $F'(x) = f(x)$ for all $x \in (a, b)$.

Proof. Let $h \ne 0$. By 8.5 and 8.4 we have

$$F(x + h) - F(x) = \int_a^{x+h} f - \int_a^x f = \int_x^{x+h} f = f(x + \theta h)h$$

with some $\theta \in \langle 0, 1\rangle$. Thus,

$$\frac{1}{h}(F(x + h) - F(x)) = f(x + \theta h)$$

and, as $f$ is continuous, $\lim_{h \to 0} f(x + \theta h) = f(x)$. $\square$

8.6.1 Corollary. If $f$ and $G$ are continuous on $\langle a, b\rangle$ and if $G' = f$ in $(a, b)$ then

$$\int_a^b f(x)\,dx = G(b) - G(a).$$

(By 4.4.4, $\int_a^x f(t)\,dt - G(x)$ is constant. Thus, $\int_a^b f = \int_a^b f - \int_a^a f = G(b) + C - (G(a) + C) = G(b) - G(a)$.)
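Corollary 8.6.1 can be illustrated numerically with $f = \cos$ and $G = \sin$ on $\langle 0, 1\rangle$. The sketch assumes $\sin' = \cos$ (Exercise (8) below) and uses a midpoint-sampled sum; since each sample value lies between $m_j$ and $M_j$, such a sum lies between $s(f, D)$ and $S(f, D)$ and, by Theorem 8.3, approaches the integral as the mesh shrinks. The name `riemann_sum` and the number of subintervals are illustrative.

```python
import math


def riemann_sum(f, a, b, n=10_000):
    """Sum of f(midpoint) * width over the uniform partition of [a, b] into n parts."""
    h = (b - a) / n
    return sum(f(a + (j + 0.5) * h) for j in range(n)) * h


# Left-hand side of Corollary 8.6.1, approximated, and its right-hand side G(b) - G(a).
approx_integral = riemann_sum(math.cos, 0.0, 1.0)
antiderivative_difference = math.sin(1.0) - math.sin(0.0)
```

The two numbers agree to many digits, as the Corollary predicts; the small residual is the discretization error of the sum, not a failure of the formula.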

9 Exercises

(1) Assuming the Fundamental Theorem of Algebra, prove that every non-zero polynomial with coefficients in $\mathbb{R}$ is a product of polynomials with coefficients in $\mathbb{R}$ each of which has degree $\le 2$. [Hint: Use 1.4.4.]

(2) Prove that the set $\mathbb{R}$ of all real numbers is not countable (we say it is uncountable). [Hint: Prove that the numbers $\sum_{k=0}^{\infty} a_k 2^{-k}$ are all well-defined and different for all choices $a_k \in \{0, 1\}$. If there were a sequence $\sum_{k=0}^{\infty} a_{k,n} 2^{-k}$, $n \in \mathbb{N}$, of all these numbers, then the number $\sum_{k=0}^{\infty} (1 - a_{k,k}) 2^{-k}$ would be different from all of them - a contradiction.]

(3) (a) Prove directly that the function $e^x = \sum \frac{1}{n!}x^n$ satisfies $e^x e^y = e^{x+y}$. [Hint: Use Corollary 6.3.2.]

(b) Prove that $e^x \ne 0$ for any $x \in \mathbb{R}$. [Hint: Use (a).]

(c) Prove that $e^x$ is a continuous function on $\mathbb{R}$ which takes on only positive values. [Hint: Use Theorem 7.2.2, Theorem 5.2 and Theorem 3.3.]

(4) Using the definition from Exercise (3), prove that $(e^x)' = e^x$. [Hint: Corollary 6.4.1 is relevant.]

(5) (a) Prove that $e^x$ is an increasing function on $\mathbb{R}$. [Hint: Use Exercises (3) and (4).]

(b) Prove that $\lim_{x \to -\infty} e^x = 0$, $\lim_{x \to \infty} e^x = \infty$. [Hint: Use (a) and Exercise (3).]

(6) (a) Prove that there exists a function $\ln(x) : \{x \in \mathbb{R} \mid x > 0\} \to \mathbb{R}$ inverse to $e^x$. [Hint: Use Exercise (5) (b).]

(b) Prove that $(\ln(x))' = 1/x$. [Hint: This follows from the chain rule; a direct proof can also be given using Theorem 4.3.]

(7) For $a \in \mathbb{R}$, $x > 0$, define $x^a = e^{a \ln(x)}$. Using the chain rule, prove that $(x^a)' = a x^{a-1}$.

(8) Define functions $\sin(x)$, $\cos(x)$ by $\sin(x) = \sum_{n=0}^{\infty} (-1)^n x^{2n+1}/(2n + 1)!$, $\cos(x) = \sum_{n=0}^{\infty} (-1)^n x^{2n}/(2n)!$.

(a) Prove that $\cos(-x) = \cos(x)$, $\sin(-x) = -\sin(x)$ (i.e. $\cos(x)$ is even and $\sin(x)$ is odd).

(b) Prove that $(\sin(x))' = \cos(x)$, $(\cos(x))' = -\sin(x)$. [Hint: Corollary 6.4.1 is relevant.]

(9) Prove that there exists a minimum number $a > 0$ such that $\cos(a) = 0$. This number $a$ is called $\pi/2$. Prove that $\cos(x)$ is decreasing in the interval $(0, \pi/2)$. [Hint: By Exercise (8), we have $\cos''(x) = -\cos(x)$, while $\cos(0) = 1$, $\cos'(0) = 0$. This means that $\cos'(x)$ is negative in some interval $(0, \varepsilon)$, $\varepsilon > 0$, and $\cos'(x)$ is decreasing on any interval $(0, a)$ on which $\cos(x) > 0$. Let $\cos'(\varepsilon/2) = -b$, $\cos(\varepsilon/2) = c$, $b, c > 0$. Then $\cos(\varepsilon/2 + t) \le c - bt$ if $\varepsilon/2 < \varepsilon/2 + t < a$ ($a$ as above). From this, it follows we cannot have $a - \varepsilon/2 > c/b$.]

(10) (a) Prove that $\cos(x \pm y) = \cos(x)\cos(y) \mp \sin(x)\sin(y)$, $\sin(x \pm y) = \sin(x)\cos(y) \pm \cos(x)\sin(y)$. [Hint: analogous to Exercise (3).]

(b) Prove that $\sin(\pi/2) = 1$, $\sin(x)$ is increasing on the interval $(0, \pi/2)$, and $\cos(\pi/2 - x) = \sin(x)$. [Hint: Let $\sin(\pi/2) = a$. Apply (a) to show that $\cos(\pi/2 - x) = a\sin(x)$, $\sin(\pi/2 - x) = a\cos(x)$, and therefore $a^2 = 1$. Observe that we must then have $a = 1$ because $\sin(x)$ is increasing on the interval $(0, \pi/2)$ by Exercise (8).]

(11) Prove that $\cos(x)$ and $\sin(x)$ are both periodic with period $2\pi$, their values (on $x$ real) are between $-1$ and $1$, and describe their maxima and minima, and intervals on which they are decreasing resp. increasing. [Hint: Use Exercise (10) and the fact that $\cos(x)$ is even to prove that $\cos(x + \pi) = -\cos(x)$, etc.]

(12) Now consider the definition of $e^x$ from Exercise (3) for a complex number $x$.

(a) Prove that $e^x$ is well-defined (i.e. the series converges) for all $x \in \mathbb{C}$, and that $e^x e^y = e^{x+y}$ for $x, y \in \mathbb{C}$. [Interpret this as separate statements about the real and imaginary parts.]

(b) Prove that for a complex number $\lambda$, the functions $\mathrm{Re}(e^{\lambda x})$, $\mathrm{Im}(e^{\lambda x})$ are continuous and differentiable in the real variable $x$, and that $(e^{\lambda x})' = \lambda e^{\lambda x}$ [this is, again, to be interpreted as equalities of the real and imaginary parts].

(c) Prove the equalities $\cos(x) = \frac{e^{ix} + e^{-ix}}{2}$, $\sin(x) = \frac{e^{ix} - e^{-ix}}{2i}$ for $x \in \mathbb{R}$.

[Remark: The attentive reader surely noticed that something is missing here; we should learn how to differentiate with respect to a complex variable $x$. (!) However, we will have to build up a lot more foundations, and wait until Chapter 10 below, to understand that rigorously.]

(13) Let $f(x) = e^{-1/x}$ for $x > 0$, $f(x) = 0$ for $x \le 0$. Prove that $f^{(n)}(0) = 0$ for all $n \ge 1$.

2 Metric and Topological Spaces I

A key to rigorous multivariable calculus is a basic understanding of point set topology in the framework of metric spaces. Covering these basic concepts is the purpose of this chapter. We will see that studying these concepts in detail will really pay off in the chapters below. While studying metric spaces, we will discover certain concepts which are independent of metric, and seem to beg for a more general context. This is why, in the process, we will introduce topological spaces as well.

1 Basics

1.1

Let $\mathbb{R}_+$ denote the set of all non-negative real numbers together with $+\infty$. A metric space is a set $X$ endowed with a metric (or distance function, briefly distance) $d : X \times X \to \mathbb{R}_+$ such that

(M1) $d(x, y) = 0$ if and only if $x = y$,

(M2) $d(x, y) = d(y, x)$, and

(M3) $d(x, y) + d(y, z) \ge d(x, z)$.

Condition (M3) is called the triangle inequality; the reader will easily guess why. The elements of a metric space are usually referred to as points.

Very often one considers distance functions which take on finite values only, but allowing infinite distances comes in handy sometimes.

1.1.1 Examples

(a) The set $\mathbb{R}$ of real numbers with the distance function $d(x, y) = |x - y|$.

(b) The set (plane) $\mathbb{C}$ of complex numbers, again with the distance $|x - y|$; note, however, that here the fact that it satisfies the triangle inequality is much less trivial than in the previous case (see Theorem 1.3 of Chapter 1).

(c) The Euclidean space $\mathbb{R}_m = \{(x_1, \dots, x_m) \mid x_j \in \mathbb{R}\}$ with

$$d((x_1, \dots, x_m), (y_1, \dots, y_m)) = \sqrt{\sum_j (x_j - y_j)^2}.$$

Comment: In linear algebra, there are good reasons for distinguishing row and column vectors, and equally good reasons why the ordinary Euclidean space $\mathbb{R}^n$ should consist of column vectors. This is the reason why we used the subscript $\mathbb{R}_n$ above for row vectors, which are easier to write down (compare with A.7.3). From the point of view of metric and topological spaces, however, the distinction between row and column vectors has no meaning. Because of that, in this chapter, we will use the symbols $\mathbb{R}^n$ and $\mathbb{R}_n$ interchangeably, not distinguishing between row and column vectors.

(d) $C(\langle a, b\rangle)$, the set of all continuous real functions on the interval $\langle a, b\rangle$, with

$$d(f, g) = \max_x |f(x) - g(x)|.$$

(e) The set $F(X)$ of all bounded real functions on a set $X$ with

$$d(f, g) = \sup_x |f(x) - g(x)|.$$

(f) The unit circle

$$S^1 = \{(x, y) \in \mathbb{R}^2 \mid x^2 + y^2 = 1\}$$

where for two points $P, Q \in S^1$, $d(P, Q)$ is the lesser of the two angles between the lines $\{tP \mid t \in \mathbb{R}\}$ and $\{tQ \mid t \in \mathbb{R}\}$.

(g) Any set $S$ with the metric given by $d(x, y) = 0$ if $x = y \in S$ and $d(x, y) = 1$ if $x \ne y \in S$. This is known as the discrete space.

1.2 Norms

The metrics in Examples 1.1.1 (a)–(e) in fact all come from a more special situation, which plays an especially important role. A norm on a vector space $V$ (over real or complex numbers) is a mapping $\|\cdot\| : V \to \mathbb{R}$ such that

(1) $\|x\| \ge 0$, and $\|x\| = 0$ only if $x = o$,

(2) $\|x + y\| \le \|x\| + \|y\|$, and

(3) $\|\alpha x\| = |\alpha| \cdot \|x\|$.

1.2.1

A normed vector space is a (real or complex) vector space $V$ provided with a norm. (The term normed linear space is also common.) Since we have

$$\|x - z\| = \|x - y + y - z\| \le \|x - y\| + \|y - z\|,$$

the function $\rho(x, y) = \|x - y\|$ is a metric on $V$, called the metric associated with the norm. In this sense, we can always view a normed linear space as a metric space.

1.2.2 Examples

1. Any of the following formulas yields a norm in $\mathbb{R}^n$.

(a) $\|x\| = \max_j |x_j|$,

(b) $\|x\| = \sum |x_j|$,

(c) $\|x\| = \sqrt{\sum x_j^2}$.

Notice that (c) gives the metric space in Example 1.1.1 (c).

2. In the space of bounded real functions on a set $X$ we can consider the norm

$$\|\varphi\| = \sup\{|\varphi(x)| \mid x \in X\}.$$

The associated metric gives rise to Example 1.1.1 (e) above.
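The three norms on $\mathbb{R}^n$ from Example 1 and their associated metrics can be written down directly. In the sketch below vectors are plain Python tuples, and the function names are illustrative, not notation from the text.

```python
import math


def norm_max(x):
    """Norm (a): the largest absolute value of a coordinate."""
    return max(abs(c) for c in x)


def norm_sum(x):
    """Norm (b): the sum of the absolute values of the coordinates."""
    return sum(abs(c) for c in x)


def norm_euclid(x):
    """Norm (c): the square root of the sum of squares."""
    return math.sqrt(sum(c * c for c in x))


def distance(norm, x, y):
    """The metric associated with a norm: the norm of x - y."""
    return norm(tuple(a - b for a, b in zip(x, y)))


# Hypothetical usage: the same pair of points measured in the three metrics.
p, q = (1.0, 2.0), (4.0, -2.0)
distances = [distance(n, p, q) for n in (norm_max, norm_sum, norm_euclid)]
```

For this pair the difference vector is $(-3, 4)$, so the three distances are 4, 7 and 5: the metrics differ, although each of them satisfies (M1)–(M3).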

1.2.3 A particularly important example

Example 1.2.2, 1 (c) is in fact a special case of the following construction: On a (real or complex) vector space with an inner product (see 4.2 of Appendix A), we have a norm

$$\|x\| = \sqrt{xx}.$$

Indeed: (1) of 1.2 is obvious. Further, by the Cauchy-Schwarz inequality (see 4.4 of Appendix A),

$$\begin{aligned}
\|x + y\|^2 = (x + y)(x + y) &= xx + xy + yx + yy \\
&= |xx + xy + yx + yy| \le \|x\|^2 + |xy| + |yx| + \|y\|^2 \\
&\le \|x\|^2 + 2\|x\|\,\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2.
\end{aligned}$$

Finally, $\|\alpha x\| = \sqrt{(\alpha x)(\alpha x)} = \sqrt{\alpha\bar{\alpha}(xx)} = |\alpha| \cdot \|x\|$. $\square$
1.3 Convergence

A sequence $x_1, x_2, \dots$ of points of a metric space converges to a point $x$ whenever for every $\varepsilon > 0$ there exists an $n_0$ such that for all $n \ge n_0$ we have $d(x_n, x) < \varepsilon$. This is expressed by writing

$$\lim_{n\to\infty} x_n = x \quad\text{or}\quad \lim_n x_n = x \quad\text{or just}\quad \lim x_n = x.$$

We then speak of a convergent sequence. Note that obviously
(*) any subsequence $(x_{k_n})_n$ of a convergent sequence converges to the same limit.


1.3.1 Examples
(a) The usual convergence in $\mathbb{R}$ or $\mathbb{C}$.
(b) Consider the examples in 1.1.1 (d) and (e). Realize that the convergence of a sequence of functions $f_1, f_2, \dots$ in these spaces is what one usually calls uniform convergence of functions.

1.4

Two metrics $d_1, d_2$ on the same set $X$ are said to be equivalent if there exist positive real numbers $\alpha, \beta$ such that for every $x, y \in X$,

$$\alpha d_1(x,y) \le d_2(x,y) \le \beta d_1(x,y).$$

Note that we have an obvious

1.4.1 Observation. If $d_1$ and $d_2$ are equivalent then $(x_n)_n$ converges in $(X, d_1)$ if and only if it converges in $(X, d_2)$.

1.5

Let $(X,d)$ and $(Y,d')$ be metric spaces. A map $f : X \to Y$ is said to be continuous if

for every $x \in X$ and every $\varepsilon > 0$ there is a $\delta > 0$ such that, for every $y$ in $X$,

$$d(x,y) < \delta \;\Rightarrow\; d'(f(x), f(y)) < \varepsilon. \tag{ct}$$

Later on we will need a stronger concept: a mapping $f : X \to Y$ is said to be uniformly continuous if

for every $\varepsilon > 0$ there is a $\delta > 0$ such that, for all $x, y$ in $X$,

$$d(x,y) < \delta \;\Rightarrow\; d'(f(x), f(y)) < \varepsilon. \tag{uct}$$

Note the subtle difference between the two concepts. In the former the $\delta$ can depend on $x$, while in the latter it depends on the $\varepsilon$ only. For example,

$$f = (x \mapsto x^2) : \mathbb{R} \to \mathbb{R}$$

is continuous but not uniformly continuous.
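To see numerically why no single $\delta$ works for $x \mapsto x^2$, one can fix $\delta$ and watch $|f(y) - f(x)|$ grow as $x$ moves to the right while $|x - y| = \delta/2 < \delta$ stays fixed. The sketch below is only an illustration; the helper name `gap` and the sample values are assumptions made for this example.

```python
def gap(x, delta):
    """|f(y) - f(x)| for f(t) = t*t and y = x + delta/2, so that |x - y| < delta."""
    y = x + delta / 2
    return abs(y * y - x * x)

delta = 0.01
for x in (1.0, 10.0, 100.0, 1000.0):
    # Algebraically the gap equals delta*x + delta**2/4, which is unbounded in x.
    print(x, gap(x, delta))

# For epsilon = 1, the point x = 1/delta already defeats the given delta.
for delta in (0.1, 0.01):
    assert gap(1 / delta, delta) > 0.99
```

Since the gap equals $\delta x + \delta^2/4$, for every $\delta > 0$ the choice $x = 1/\delta$ gives a pair of points closer than $\delta$ whose images differ by more than $1$, which is exactly the failure of (uct) for $\varepsilon = 1$.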


It is easy to prove

1.5.1 Proposition. A composition $g \circ f$ of continuous (resp. uniformly continuous) maps $f$ and $g$ is continuous (resp. uniformly continuous).

1.5.2

Here is another easy but important

Observation. Let $d, d_1$ be equivalent metrics on $X$ and let $d', d'_1$ be equivalent metrics on $Y$. Then a map $f : X \to Y$ is continuous (resp. uniformly continuous) with respect to $d, d'$ if and only if it is continuous (resp. uniformly continuous) with respect to $d_1, d'_1$.

1.6 Proposition. A map $f : (X,d) \to (Y,d')$ is continuous if and only if for every convergent sequence $(x_n)_n$ in $(X,d)$, the sequence $(f(x_n))_n$ is convergent and

$$f(\lim x_n) = \lim f(x_n).$$

(Compare with Proposition 3.2 of Chapter 1.)

Proof. $\Rightarrow$: Let $\lim x_n = x$. Consider the $\delta > 0$ from (ct) taken for the $x$ and an $\varepsilon > 0$. There is an $n_0$ such that $n \ge n_0$ implies $d(x_n, x) < \delta$. Then for $n \ge n_0$, $d'(f(x_n), f(x)) < \varepsilon$.

$\Leftarrow$: Suppose $f$ is not continuous. Then there is an $x \in X$ and an $\varepsilon_0 > 0$ such that for every $\delta > 0$ there exists an $x(\delta)$ such that $d(x(\delta), x) < \delta$ while $d'(f(x(\delta)), f(x)) \ge \varepsilon_0$. Now set $x_n = x(\frac{1}{n})$; obviously $\lim x_n = x$ and $(f(x_n))_n$ does not converge to $f(x)$. □

2 Subspaces and products

2.1

Let $(X,d)$ be a metric space and let $X' \subseteq X$ be an arbitrary subset. Obviously $(X', d')$, where $d'$ is $d$ restricted to $X' \times X'$, is a metric space again.

Examples.
(a) Intervals in the real line.
(b) More generally, the typical subspaces of the Euclidean space $\mathbb{R}^m$ one usually works with: $n$-dimensional intervals (by which we mean cartesian products of $n$-tuples of intervals), polyhedra, balls, spheres, etc.
(c) The space $C(\langle a,b\rangle)$ from 1.1.1 (d) is a subspace of the $F(\langle a,b\rangle)$ from 1.1.1 (e).

Convention. Unless otherwise stated we will think of subsets of spaces automatically as subspaces.


2.1.1 Observations. 1. Let $(X', d')$ be a subspace of $(X,d)$. Then the embedding map $j = (x \mapsto x) : X' \to X$ is uniformly continuous. Consequently, a restriction $f|X' : (X', d') \to (Y,d)$ of a continuous (resp. uniformly continuous) $f : (X,d) \to (Y,d)$ is continuous (resp. uniformly continuous).

2. Let $f : (X,d) \to (Y,d)$ be a continuous (resp. uniformly continuous) map and let $Y' \subseteq Y$ be a subspace such that $f[X] \subseteq Y'$. Then $f' = (x \mapsto f(x)) : (X,d) \to (Y', d')$ is continuous (resp. uniformly continuous).

Proof. 1. For $\varepsilon > 0$ take $\delta = \varepsilon$. For the consequence recall 1.5.1 and the fact that $f|X' = fj$.

2. For $x$ and $\varepsilon > 0$ use the same $\delta$ as for $f$. □

2.2

Let $(X_i, d_i)$, $i = 1, \dots, m$, be metric spaces. On the cartesian product $\prod_{i=1}^m X_i = X_1 \times \dots \times X_m$ consider the following distances:

$$\rho((x_1,\dots,x_m),(y_1,\dots,y_m)) = \sqrt{\sum_{i=1}^m d_i(x_i,y_i)^2},$$

$$\sigma((x_1,\dots,x_m),(y_1,\dots,y_m)) = \sum_{i=1}^m d_i(x_i,y_i), \quad\text{and}$$

$$d((x_1,\dots,x_m),(y_1,\dots,y_m)) = \max_i d_i(x_i,y_i).$$

($\sigma$ and $d$ satisfy (M1), (M2) and (M3) obviously. The triangle inequality of $\rho$ needs some simple reasoning; one can use, for instance, Theorem 4.4 from Appendix A. In fact, we will rarely use this metric in the context of the topology of multivariable functions. However, note its geometrical significance: it yields the standard Pythagorean metric in the space $\mathbb{R}^m$ viewed as $\mathbb{R} \times \dots \times \mathbb{R}$.)

2.2.1 Proposition. The distance functions $\rho$, $\sigma$ and $d$ are equivalent metrics.

Proof.

$$\rho((x_i)_i,(y_i)_i) \le \sqrt{\sum_{i=1}^m \max_j d_j(x_j,y_j)^2} = \sqrt{m} \cdot d((x_i)_i,(y_i)_i).$$

Obviously $d((x_i)_i,(y_i)_i) \le \rho((x_i)_i,(y_i)_i),\ \sigma((x_i)_i,(y_i)_i)$, and finally

$$\sigma((x_i)_i,(y_i)_i) \le \sum_{i=1}^m \max_j d_j(x_j,y_j) = m \cdot d((x_i)_i,(y_i)_i).$$


2.2.2

The space $\prod X_i$ endowed with any of the metrics $\rho$, $\sigma$, $d$ (typically, with $d$) will be referred to as the product of the spaces $(X_i, d_i)$, $i = 1, \dots, m$.

Theorem. 1. The projections $p_j = ((x_1,\dots,x_m) \mapsto x_j) : \prod_i (X_i,d_i) \to (X_j,d_j)$ are uniformly continuous.

2. A sequence

$$(x_1^1,\dots,x_m^1),\ (x_1^2,\dots,x_m^2),\ (x_1^3,\dots,x_m^3),\ \dots \tag{*}$$

converges in $\prod (X_i,d_i)$ if and only if each of the sequences

$$x_j^1,\ x_j^2,\ x_j^3,\ \dots \tag{**}$$

converges in the respective $(X_j,d_j)$.

3. Let $f_j : (Y,d') \to (X_j,d_j)$ be continuous (resp. uniformly continuous). Then the mapping

$$f = (y \mapsto (f_1(y),\dots,f_m(y))) : (Y,d') \to \prod (X_i,d_i)$$

(the unique mapping such that $p_j f = f_j$ for all $j$) is continuous (resp. uniformly continuous).

Proof. 1. We have $d((x_i)_i,(y_i)_i) \ge d_j(x_j,y_j)$. Thus, it suffices to take $\delta = \varepsilon$.

2. If (*) converges then each (**) converges by 1 and 1.6. Conversely, let each (**) converge to a point $x_j$. For $\varepsilon > 0$ choose $n_j$ such that for $k \ge n_j$, $d_j(x_j^k, x_j) < \varepsilon$, and consider $n_0 = \max_j n_j$. Then for $k \ge n_0$, $d_j(x_j^k, x_j) < \varepsilon$ for all $j$, and hence $\max_j d_j(x_j^k, x_j) < \varepsilon$.

3. immediately follows from 2 and 1.6.

3 Some topological concepts

3.1 Neighborhoods

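First, define the $\varepsilon$-ball with center $x$ as

$$\Omega(x,\varepsilon) = \{y \mid d(x,y) < \varepsilon\}.$$

A subset $U \subseteq X$ is a neighborhood of a point $x \in X$ if there exists an $\varepsilon > 0$ such that

$$\Omega(x,\varepsilon) \subseteq U.$$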


Remark: While the concept of an $\varepsilon$-ball depends on the concrete metric, the concept of neighborhood does not change if we replace a metric by an equivalent one. In fact, we can change the metric even much more radically; see Exercise (5) below.

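3.1.1 Observations. 1. If $U$ is a neighborhood of $x$ and $U \subseteq V$ then $V$ is a neighborhood of $x$.

2. If $U_1, U_2$ are neighborhoods of $x$ then so is $U_1 \cap U_2$.

(1: for $V$ use the same $\Omega(x,\varepsilon)$. 2: if $\Omega(x,\varepsilon_i) \subseteq U_i$ then $\Omega(x,\min(\varepsilon_1,\varepsilon_2)) \subseteq U_1 \cap U_2$.)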

3.2 Open and closed sets

A subset $U \subseteq (X,d)$ is open if it is a neighborhood of each of its points.

A subset $A \subseteq (X,d)$ is closed if for every sequence $(x_n)_n$, $x_n \in A$, convergent in $(X,d)$, the limit $\lim x_n$ is in $A$.

3.2.1 Proposition. 1. $X$ and $\emptyset$ are open. If $U$ and $V$ are open then $U \cap V$ is open, and if $U_i$, $i \in J$, are open ($J$ arbitrary) then $\bigcup_{i \in J} U_i$ is open.

2. $U$ is open if and only if $X \setminus U$ is closed.

3. $X$ and $\emptyset$ are closed. If $A$ and $B$ are closed then $A \cup B$ is closed, and if $A_i$, $i \in J$, are closed then $\bigcap_{i \in J} A_i$ is closed.

Proof. 1 is straightforward (use 3.1.1).

2: Let $U$ be open, $A = X \setminus U$. The limit $x$ of a sequence $(x_n)_n$ that is all in $A$ cannot be in $U$: otherwise there would be an $\varepsilon > 0$ such that $\Omega(x,\varepsilon) \subseteq U$, and the $x_n$ with sufficiently large $n$ would have to be in this $\Omega(x,\varepsilon)$.

On the other hand, if $U$ is not open, then there is an $x \in U$ such that for every $n$, $\Omega(x,\frac{1}{n}) \nsubseteq U$. Therefore, we can choose points $x_n \in \Omega(x,\frac{1}{n}) \cap A$ with $x = \lim x_n \in U = X \setminus A$.

3 follows from 1 and 2 and the formulas relating intersections and unions with complements. □

3.3 Closure

Let $A$ be a general subset of a metric space $X = (X,d)$. For a point $x \in X$, define the distance of $x$ from $A$ by

$$d(x,A) = \inf\{d(x,a) \mid a \in A\}.$$

Note that if $x \in A$ then $d(x,A) = 0$ but $d(x,A)$ can be $0$ even if $x \notin A$.

The closure of a set $A$ in $(X,d)$ is the set

$$\overline{A} = \{x \mid d(x,A) = 0\}.$$

This definition seems to depend heavily on the distance function. But we have

3.3.1 Proposition. 1. The set $\overline{A}$ is closed, and it is the smallest closed set containing $A$. In other words,

$$\overline{A} = \bigcap \{B \text{ closed} \mid A \subseteq B\}.$$

2. A point $x \in X$ is in $\overline{A}$ if and only if for each of its neighborhoods $U$, $U \cap A \ne \emptyset$ (in other words, if and only if for each open $U \ni x$, $U \cap A \ne \emptyset$).

Proof. 1: $U = X \setminus \overline{A}$ is open, since if $x \notin \overline{A}$ there is an $\varepsilon > 0$ such that $\Omega(x,2\varepsilon) \cap A = \emptyset$ and hence by the triangle inequality $\Omega(x,\varepsilon) \cap \overline{A} = \emptyset$.

Let $B$ be closed and $B \supseteq A$. Let $x \in \overline{A}$. For each $n$ choose an $x_n \in A$ (and hence in $B$) such that $d(x,x_n) < \frac{1}{n}$. Then $x = \lim x_n$ is in $B$. The correctness of the formula follows from 3.2.1.

2 is obvious: in yet other words we are speaking about the balls $\Omega(x,\varepsilon)$ intersecting $A$. □

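3.3.2 Proposition. 1. $\overline{\emptyset} = \emptyset$, $A \subseteq \overline{A}$, and $A \subseteq B \Rightarrow \overline{A} \subseteq \overline{B}$,
2. $\overline{A \cup B} = \overline{A} \cup \overline{B}$, and
3. $\overline{\overline{A}} = \overline{A}$.

Proof. 1 is trivial.

2: By 1, $\overline{A \cup B} \supseteq \overline{A} \cup \overline{B}$. Now let $x \in \overline{A \cup B}$; $x$ is or is not in $\overline{A}$. In the latter case, all sufficiently close elements from $A \cup B$ have to be in $B$ and hence $x \in \overline{B}$.

3: By 3.3.1 1, $\overline{A}$ is closed and since it contains $B = \overline{A}$, it also contains $\overline{B} = \overline{\overline{A}}$. □

We also define the interior $\operatorname{Int}(A) = X \setminus \overline{X \setminus A}$. The interior of $A$ is also denoted by $A^\circ$. It immediately follows from Proposition 3.3.1 that the interior is the union of all open sets contained in $A$. The boundary of $A$ is defined as $\partial A = \overline{A} \setminus \operatorname{Int}(A)$.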

3.4

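Continuity can be expressed in terms of the concepts introduced in this section. We have

Theorem. The following statements on a mapping $f : (X,d) \to (Y,d')$ are equivalent.
(1) $f$ is continuous.
(2) For every $x \in X$ and every neighborhood $V$ of $f(x)$ there is a neighborhood $U$ of $x$ such that $f[U] \subseteq V$.
(3) For every $U$ open in $(Y,d')$ the preimage $f^{-1}[U]$ is open in $(X,d)$.
(4) For every $A$ closed in $(Y,d')$ the preimage $f^{-1}[A]$ is closed in $(X,d)$.
(5) For every subset $A \subseteq X$,

$$f[\overline{A}] \subseteq \overline{f[A]}.$$

(6) For every subset $B \subseteq Y$,

$$\overline{f^{-1}[B]} \subseteq f^{-1}[\overline{B}].$$

Proof. (1)$\Rightarrow$(2): Let $V$ be a neighborhood of $f(x)$ with $\Omega(f(x),\varepsilon) \subseteq V$. Choose a $\delta > 0$ as in (ct) for $x$ and $\varepsilon$. Then $f[\Omega(x,\delta)] \subseteq \Omega(f(x),\varepsilon)$, and $\Omega(x,\delta)$ is a neighborhood of $x$.

(2)$\Rightarrow$(3): If $U \subseteq Y$ is open and $x \in f^{-1}[U]$ then $f(x) \in U$ and $U$ is a neighborhood of $f(x)$. Hence there is a neighborhood $V$ of $x$ such that $f[V] \subseteq U$ and we have $x \in V \subseteq f^{-1}[U]$, making $f^{-1}[U]$ a neighborhood of $x$.

(3)$\Leftrightarrow$(4) by 3.2.1 2, since $f^{-1}[-]$ preserves complements.

(4)$\Rightarrow$(5): We have

$$A \subseteq f^{-1}[f[A]] \subseteq f^{-1}[\overline{f[A]}].$$

Since $f^{-1}[\overline{f[A]}]$ is closed, we have by 3.3.1 $\overline{A} \subseteq f^{-1}[\overline{f[A]}]$ and the statement follows.

(5)$\Rightarrow$(6): We have, by (5), $f[\overline{f^{-1}[B]}] \subseteq \overline{f[f^{-1}[B]]} \subseteq \overline{B}$ and hence $\overline{f^{-1}[B]} \subseteq f^{-1}[\overline{B}]$.

(6)$\Rightarrow$(1): Fix $x \in X$ and $\varepsilon > 0$, and put $B = Y \setminus \Omega(f(x),\varepsilon)$. Since $d'(f(x),b) \ge \varepsilon$ for all $b \in B$, we have $f(x) \notin \overline{B}$, that is, $x \notin f^{-1}[\overline{B}]$. Hence, by (6), $x \notin \overline{f^{-1}[B]}$ and there is a $\delta > 0$ such that $\Omega(x,\delta) \cap f^{-1}[B] = \emptyset$. Thus if $d(x,y) < \delta$ then $f(y) \notin B$, that is, $f(y) \in \Omega(f(x),\varepsilon)$. □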

3.5

A continuous mapping $f : (X,d) \to (Y,d')$ is called a homeomorphism if there is a continuous mapping $g : (Y,d') \to (X,d)$ such that

$$fg = \mathrm{id}_Y \quad\text{and}\quad gf = \mathrm{id}_X.$$

If there exists a homeomorphism $f : (X,d) \to (Y,d')$ we say that the spaces $(X,d)$ and $(Y,d')$ are homeomorphic.

Note that if $d$ and $d'$ are equivalent metrics then the identity map $\mathrm{id}_X : (X,d) \to (X,d')$ is a homeomorphism. But $\mathrm{id}_X : (X,d) \to (X,d')$ can be a homeomorphism even when $d$ and $d'$ are far from being equivalent (consider, e.g., the interval $\langle 0, \frac{\pi}{2})$ with the standard metric $d$ and with $d'(x,y) = |\tan x - \tan y|$).
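The failure of equivalence in the tangent example can be made concrete. The sketch below assumes the interval in question is $\langle 0, \frac{\pi}{2})$ (the natural domain on which $\tan$ is continuous and increasing) and compares the two distances on pairs of points approaching $\frac{\pi}{2}$. The names `d`, `d_prime` and `ratio` are illustrative.

```python
import math

def d(x, y):
    """Standard metric on the interval."""
    return abs(x - y)

def d_prime(x, y):
    """The metric |tan x - tan y|."""
    return abs(math.tan(x) - math.tan(y))

def ratio(n):
    """d'/d for x = pi/2 - 1/n and y = pi/2 - 2/n (both in the interval for n >= 2)."""
    x = math.pi / 2 - 1 / n
    y = math.pi / 2 - 2 / n
    return d_prime(x, y) / d(x, y)

for n in (2, 10, 100, 1000):
    # The ratio grows roughly like n*n/2, so no constant beta can satisfy
    # d'(x, y) <= beta * d(x, y) for all x, y.
    print(n, ratio(n))
```

Because the derivative of $\tan$ is at least $1$ on this interval, one still has $d \le d'$; it is only the upper bound $d' \le \beta d$ that fails, and that is enough to rule out equivalence in the sense of 1.4.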

A property of a space or a concept related to spaces is said to be topological if it is preserved under all homeomorphisms. For example, by Theorem 3.4, being a neighborhood of a point, being open or closed, and being the closure of a set are topological concepts. By 1.6, convergence is a topological concept.

Continuity is a topological concept, but uniform continuity is not.

This suggests the possibility of formulating a notion of a space based only on topological properties. We will explore this in the next section.

4 First remarks on topology

Very often, a choice of metric is not really important. We may be interested just in continuity, and a concrete choice of metric may be somehow off the point. For example, note that the "natural" Pythagorean metric would have been a real burden in dealing with the product. Sometimes it even happens that one has a natural notion of continuity, or convergence, without having a metric defined first. It may even happen that there is no reasonable way to define a metric.

This leads to a more general notion of a space, called a topological space. The idea is to describe the structure of interest simply in distinguishing whether a subset $U \subseteq X$ containing $x$ "surrounds" (is a neighborhood of) $x$, or declaring some subsets open resp. closed, or specifying an operator of closure. We will present here three variants of the definition, which turn out to be equivalent.

4.1

We will start with the neighborhood approach, which was historically the first one (introduced by Hausdorff in 1914). It is convenient to denote by $P(X)$ the power set of $X$, which means the set of all subsets of $X$ (including the empty set and $X$). With every $x \in X$, one associates a set $\mathcal{U}(x) \subseteq P(X)$, called the system of the neighborhoods of $x$, satisfying the following axioms:
(1) For each $U \in \mathcal{U}(x)$, $x \in U$,
(2) If $U \in \mathcal{U}(x)$ and $U \subseteq V \subseteq X$ then $V \in \mathcal{U}(x)$,
(3) If $U, V \in \mathcal{U}(x)$ then $U \cap V \in \mathcal{U}(x)$, and
(4) For every $U \in \mathcal{U}(x)$ there is a $V \in \mathcal{U}(x)$ such that $U \in \mathcal{U}(y)$ for every $y \in V$.
One then defines a (possibly empty) subset $U$ of $X$ to be open if $U$ is a neighborhood of each of its points. One defines a subset $A$ of $X$ to be closed if the complement $X \setminus A$ of $A$ is open. The closure of a subset $S$ of $X$ is defined by the formula $\overline{S} = \{x \mid \forall U \in \mathcal{U}(x),\ U \cap S \ne \emptyset\}$.

4.2

Nowadays probably the most common approach to the structure of topology is to define open sets first as a set of subsets of $X$ satisfying certain axioms. It may be perhaps less intuitive, but it turns out to be much simpler technically.

In this approach, a topology on a set $X$ is a subset $\tau \subseteq P(X)$ satisfying
(1) $\emptyset, X \in \tau$,
(2) $U, V \in \tau \Rightarrow U \cap V \in \tau$,
(3) $U_i \in \tau,\ i \in J \Rightarrow \bigcup_{i \in J} U_i \in \tau$.
In other words, we may simply say that a topology is a subset of the set $P(X)$ of all subsets of $X$ which is closed under all unions and all finite intersections. (To include (1), we allow the union of an empty set of subsets of $X$, which is said to be $\emptyset$, and the intersection of an empty set of subsets of $X$, which is said to be $X$.)

One then defines a closed set as a complement of an open set; $U$ is a neighborhood of $x$ if there is an open $V$ such that $x \in V \subseteq U$, and the closure is defined by the formula

$$\overline{A} = \bigcap \{B \mid A \subseteq B,\ B \text{ closed}\}.$$

A subset $A \subseteq X$ is called dense if $\overline{A} = X$.
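On a finite set the open-set axioms and the closure formula can be checked mechanically. The following sketch is an illustration only: the function names `is_topology` and `closure` and the sample families are assumptions made for this example, and subsets are modelled as Python `frozenset` objects.

```python
from itertools import combinations

def is_topology(X, tau):
    """Check axioms (1)-(3) for a family tau of subsets of a finite set X."""
    X = frozenset(X)
    tau = {frozenset(U) for U in tau}
    if frozenset() not in tau or X not in tau:
        return False
    # For a finite family, closure under pairwise unions and intersections
    # already gives closure under all unions and all finite intersections.
    return all(U & V in tau and U | V in tau for U, V in combinations(tau, 2))

def closure(X, tau, A):
    """Intersection of all closed sets (complements of open sets) containing A."""
    X, A = frozenset(X), frozenset(A)
    result = X
    for U in tau:
        B = X - frozenset(U)  # a closed set
        if A <= B:
            result &= B
    return result

X = {1, 2, 3}
tau = [set(), {1}, {1, 2}, {1, 2, 3}]
print(is_topology(X, tau))                         # a chain of sets: a topology
print(is_topology(X, [set(), {1}, {2}, X]))        # {1} | {2} is missing
print(closure(X, tau, {1}) == frozenset(X))        # is {1} dense here?
```

In this small example the only closed set containing $\{1\}$ is $X$ itself, so $\{1\}$ is dense, while $\{3\}$ is already closed.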

Remark: It is possible to start equivalently with closed sets first and then define open sets as their complements; the axioms of closed sets are obtained by expressing the axioms for open sets in terms of their complements (see Exercise (9)).

4.3

Or, one can start with a closure operator $u : P(X) \to P(X)$ satisfying
(1) $u(\emptyset) = \emptyset$ and $A \subseteq u(A)$,
(2) $u(A \cup B) = u(A) \cup u(B)$ and
(3) $u(u(A)) = u(A)$.
$A$ is declared closed if $u(A) = A$, the open sets are complements of the closed ones, and $U$ is a neighborhood of $x$ if $x \notin u(X \setminus U)$.

4.4

In fact one usually thinks of a topological space as a set endowed with all the above mentioned notions simultaneously, and the only question is which of them one considers primitive concepts and which are defined afterwards. The resulting structure is the same. (See the Exercises.)

4.5

A topology is not always obtained from a metric (if it is we speak of a metrizable space). Here are two rather easy examples.

(a) Take an infinite set $X$ and declare $U \subseteq X$ to be open if either it is void or if $X \setminus U$ is finite.

(b) Take a partially ordered set $(X, \le)$ and declare $U$ to be open if $U = \{x \mid \exists y \in U,\ x \ge y\}$. (Note: this topology is metrizable for certain special choices of partial orderings, but certainly not in general.)

Non-metrizable spaces of importance are of course seldom defined as easily as this. But it should be noted that many non-metrizable spaces are of interest today.

4.6

A mapping $f : X \to Y$ between topological spaces is continuous if for every $x \in X$ and every neighborhood $V$ of $f(x)$ there is a neighborhood $U$ of $x$ such that $f[U] \subseteq V$ (cf. (2) in Theorem 3.4). If we replace in 3.4 the metric definition of continuity (1) with the definition we just made, we have the following more general result:

Theorem. Let $X, Y$ be topological spaces. Then the following statements on a mapping $f : X \to Y$ are equivalent.
(1) $f$ is continuous.
(2) For every $U$ open in $Y$ the preimage $f^{-1}[U]$ is open in $X$.
(3) For every $A$ closed in $Y$ the preimage $f^{-1}[A]$ is closed in $X$.
(4) For every subset $A \subseteq X$,

$$f[\overline{A}] \subseteq \overline{f[A]}.$$

(5) For every subset $B \subseteq Y$,

$$\overline{f^{-1}[B]} \subseteq f^{-1}[\overline{B}].$$

Proof. Most of the implications can be proved by the same reasoning as in 3.4. The only one needing a simple adjustment is

(5)$\Rightarrow$(1): Let (5) hold and let $V$ be a neighborhood of $f(x)$. Thus, $f(x) \notin \overline{Y \setminus V}$, that is, $x \notin f^{-1}[\overline{Y \setminus V}]$, and hence, by (5), $x \notin \overline{f^{-1}[Y \setminus V]}$. Hence, $U = X \setminus f^{-1}[Y \setminus V] = f^{-1}[V]$ is a neighborhood of $x$, and $f[U] = ff^{-1}[V] \subseteq V$. □


4.7

The system of open sets $\tau$ constituting a topology is often determined by a so-called basis, which means a subset $\mathcal{B} \subseteq \tau$ such that

$$B_1, B_2 \in \mathcal{B} \Rightarrow B_1 \cap B_2 \in \mathcal{B} \quad\text{and}$$

$$\text{for every } U \in \tau,\quad U = \bigcup \{B \mid B \in \mathcal{B},\ B \subseteq U\}.$$

(For example, the set of all open intervals, or the set of all open intervals with rational endpoints are bases of the standard topology of the real line $\mathbb{R}$.)

One may wish to define a topological space where some particular subsets are open, thus specifying a subset $\mathcal{S} \subseteq P(X)$ of such sets without any a priori properties. One easily sees that the smallest topology containing $\mathcal{S}$ is the set of all unions of finite intersections of elements of $\mathcal{S}$. Then one speaks of $\mathcal{S}$ as of a subbasis of the topology obtained.
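For a finite set, the recipe "all unions of finite intersections" can be carried out literally. The sketch below is a brute-force illustration, exponential in the size of the subbasis and meant only for very small examples; the function name `topology_from_subbasis` is an assumption made for this example.

```python
from itertools import combinations

def topology_from_subbasis(X, S):
    """Smallest topology on a finite set X containing every member of S."""
    X = frozenset(X)
    S = [frozenset(s) for s in S]
    # Step 1: all finite intersections of members of S.
    # The intersection of the empty family is X by convention.
    basis = {X}
    for r in range(1, len(S) + 1):
        for combo in combinations(S, r):
            basis.add(frozenset.intersection(*combo))
    # Step 2: all unions of sets obtained in step 1.
    # The union of the empty family is the empty set.
    basis = list(basis)
    tau = {frozenset()}
    for r in range(1, len(basis) + 1):
        for combo in combinations(basis, r):
            tau.add(frozenset().union(*combo))
    return tau

tau = topology_from_subbasis({1, 2, 3}, [{1, 2}, {2, 3}])
for U in sorted(tau, key=lambda s: (len(s), sorted(s))):
    print(set(U))
```

With the subbasis $\{1,2\}, \{2,3\}$ on $\{1,2,3\}$ the intersections contribute the new set $\{2\}$, and the resulting topology has five open sets.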

The preimages of (finite) intersections are (finite) intersections, and preimages of unions are unions of preimages. Consequently we obtain from 4.6 an important

Observation. A mapping $f : (X,\tau) \to (Y,\theta)$ is continuous if and only if there is a subbasis $\mathcal{S}$ of $\theta$ such that each $f^{-1}[S]$ with $S \in \mathcal{S}$ is open.

(Thus e.g. to make sure a real function $f : X \to \mathbb{R}$ is continuous it suffices to check that all the $f^{-1}[(-\infty, a)]$ and $f^{-1}[(a, +\infty)]$ are open.)

4.8

Let $(X,\tau)$ be a topological space and let $Y \subseteq X$ be a subset. We define the subspace of $(X,\tau)$ carried (or induced) by $Y$ as

$$(Y, \tau|Y) \quad\text{where}\quad \tau|Y = \{U \cap Y \mid U \in \tau\}.$$

Since for the embedding map $j : Y \to X$, $j^{-1}[U] = U \cap Y$, the map $j$ is continuous; furthermore, if $f : (Z,\theta) \to (X,\tau)$ is a continuous map such that $f[Z] \subseteq Y$ then the map $(z \mapsto f(z)) : (Z,\theta) \to (Y,\tau|Y)$ is continuous as well.

Note that this is in accordance with the concept of subspace in the metric case: the metric subspace (cf. 2.1) has the topology just described, obtained from the topology of the larger metric space.

4.8.1 Convention
Unless otherwise stated, the subsets of a topological space will be understood to be endowed with the induced topology, and we will subject the terminology to this convention. Thus we will speak of "connected subsets" or "compact subsets" etc. (see below) or on the other hand of an "open subspace" or "closed subspace", etc.

5 Connected spaces

One of the simplest notions defined for topological spaces is connectedness.

5.1

A topological space $X$ is said to be connected if for any two open sets $U, V \subseteq X$ which satisfy $U \cap V = \emptyset$ and $U \cup V = X$, we have $U = \emptyset$ (and hence $V = X$), or $V = \emptyset$ (and hence $U = X$). It is also common, for a subset $S \subseteq X$, to say that $S$ is connected if $S$ is a connected topological space with respect to the induced topology. Note that this is equivalent to saying that for open sets $U, V \subseteq X$ such that $U \cap V \cap S = \emptyset$ and $U \cup V \supseteq S$, we have $U \supseteq S$ or $V \supseteq S$. The following observations are immediate.

5.1.1 Proposition. Let $X$ be a connected space and $f : X \to Y$ a continuous map which is onto. Then $Y$ is connected.

Proof. Suppose $U, V \subseteq Y$ are open, $U \cap V = \emptyset$, $U \cup V = Y$. Then $f^{-1}[U] \cap f^{-1}[V] = \emptyset$, $f^{-1}[U] \cup f^{-1}[V] = X$, so $f^{-1}[U] = \emptyset$ or $f^{-1}[V] = \emptyset$, which implies $U = \emptyset$ or $V = \emptyset$ since $f$ is onto. □

5.1.2 Proposition. Let $S_i \subseteq X$, $i \in I$, and let each $S_i$ be connected. Suppose further that for every $i, j \in I$, there exist $i_0, \dots, i_k \in I$, $i_0 = i$, $i_k = j$ such that $S_{i_t} \cap S_{i_{t+1}} \ne \emptyset$. Then

$$S = \bigcup_{i \in I} S_i$$

is connected.

Proof. Suppose $U, V$ are open in $X$, $U \cup V \supseteq S$, $U \cap V \cap S = \emptyset$. Suppose further $U \cap S$ is non-empty. Then there exists an $i \in I$ such that $U \cap S_i \ne \emptyset$, and hence $U \supseteq S_i$ since $S_i$ is connected. Now select any $j \in I$ and let $i_0, \dots, i_k$ be as in the statement of the Proposition. By induction on $t$, we see that $U \cap S_{i_t} \ne \emptyset$, and hence $U \supseteq S_{i_t}$ since $S_{i_t}$ is connected. Thus, $U \supseteq S_j$. Since $j \in I$ was arbitrary, $U \supseteq S$. □

5.1.3 Corollary. A product $X \times Y$ of two connected metric spaces $X, Y$ is connected.

Proof. Choose a point $x \in X$ and consider the sets $S_0 = \{x\} \times Y$, $S_y = X \times \{y\}$ for $y \in Y$. Then $S_i$, $i \in Y \sqcup \{0\}$, satisfy the assumptions of Proposition 5.1.2. □

5.1.4 Proposition. The closure of a connected subset $S$ of a topological space is connected.

Proof. If $U, V \subseteq \overline{S}$ satisfy $U \cap V = \emptyset$, $U \cup V = \overline{S}$ and $U, V$ are non-empty open in $\overline{S}$, then $U \cap S$, $V \cap S$ are non-empty and open in $S$, their union is $S$ and their intersection is empty, contradicting the assumption that $S$ is connected. □

5.2 Connectedness of the real numbers

The fact that the set $\mathbb{R}$ of all real numbers is connected is "intuitively obvious", but must be proved with care. Let us start with a preliminary result.

5.2.1 Lemma. Every open set $U \subseteq \mathbb{R}$ is a union of countably (or finitely) many disjoint open intervals.

Proof. We know that $U$ is a union of countably many open intervals $U_i$, $i = 1, 2, \dots$, since open intervals $(q_1, q_2)$, $q_1, q_2 \in \mathbb{Q}$, form a basis of the topology of $\mathbb{R}$. Note also that if $V, W$ are open intervals and $V \cap W \ne \emptyset$, then $V \cup W$ is an open interval, and that an increasing union of open intervals is an open interval. Now consider the equivalence relation $\sim$ on $\{1, 2, \dots\}$ where $i \sim j$ if and only if there exist $i_0, \dots, i_k$ such that $i_0 = i$, $i_k = j$ and $U_{i_t} \cap U_{i_{t+1}} \ne \emptyset$. Then the sets

$$\bigcup_{i \in C} U_i$$

where $C$ are equivalence classes with respect to $\sim$ are disjoint open intervals whose union is $U$. □
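For a finite family of open intervals, the equivalence classes in this proof can be computed by sorting and merging. The sketch below is an illustration under that finiteness assumption; the function name `components` and the representation of an open interval $(a,b)$ as a pair of numbers are choices made for this example.

```python
def components(intervals):
    """Merge finitely many open intervals (a, b) into disjoint open intervals."""
    merged = []
    for a, b in sorted(intervals):
        if a >= b:
            continue  # (a, b) is empty
        if merged and a < merged[-1][1]:
            # (p, q) and (a, b) with p <= a < q intersect, so their union
            # is again an open interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            # a >= q: the intervals are disjoint (note that (0, 1) and (1, 2)
            # do not meet, since the point 1 belongs to neither).
            merged.append((a, b))
    return merged

print(components([(0, 1), (1, 2), (0.5, 0.8), (1.5, 3), (5, 6)]))
```

The strict comparison matters: touching intervals such as $(0,1)$ and $(1,2)$ stay separate, because their union is not an interval.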

5.2.2 Theorem. The connected subsets of $\mathbb{R}$ are precisely (open, closed, half-open, bounded, unbounded, etc.) intervals.

Proof. Let us first prove that intervals are connected. Let $J$ be an interval. Suppose $U, V$ are open in $\mathbb{R}$, $U \cup V \supseteq J$, $U \cap V \cap J = \emptyset$. Suppose $U$ is non-empty. By Lemma 5.2.1, $U$ is a disjoint union of countably many open intervals $U_i$, $i \in I \ne \emptyset$. Without loss of generality, none of the sets $U_i$ is disjoint with $J$. Choose $i \in I$, and suppose $U_i = (a,b)$ does not contain $J$. Then $(a,b) \cup J$ is an interval containing but not equal to $(a,b)$, so $a \in J$ or $b \in J$. Let, without loss of generality, $b \in J$. Then $b \notin V$, $b \notin U_j$, $j \ne i$, since $V$, $U_j$, $j \ne i$ are open and disjoint with $U_i$. Thus, $b \in J \setminus (U \cup V)$, which is a contradiction.

On the other hand, suppose that $S \subseteq \mathbb{R}$ is connected but isn't an interval. Then there exist points $x < z < y$, $x, y \in S$, $z \notin S$. But then $S \subseteq (-\infty, z) \cup (z, \infty)$, which contradicts the assumption that $S$ is connected. □

5.2.3 Corollary. The Euclidean space $\mathbb{R}^n$ is connected.

Proof. This follows from Theorem 5.2.2 and Corollary 5.1.3. □

5.3 Path-connected spaces

A topological space $X$ is called path-connected if for any two points $x, y \in X$, there exists a continuous map $\varphi : \langle 0,1\rangle \to X$ such that $\varphi(0) = x$, $\varphi(1) = y$. By Theorem 5.2.2, Proposition 5.1.1 and Proposition 5.1.2, a path-connected space is connected. See Exercise (14) for an example of a closed subset of $\mathbb{R}^2$ which is connected but not path-connected.

5.3.1 Proposition. Let $U \subseteq \mathbb{R}^n$ be a connected open set (with the induced topology). Then $U$ is path-connected.

Proof. If $U$ is empty, it is clearly path-connected. Suppose $U$ is non-empty. Choose a point $x \in U$. Let $V \subseteq U$ be the set of all points $y \in U$ for which there exists a continuous map $\varphi : \langle 0,1\rangle \to U$ such that $\varphi(0) = x$, $\varphi(1) = y$. We claim that $V$ is open in $U$: this is the same as being open in $\mathbb{R}^n$. If $\varphi$ is as above, $\Omega(y,\varepsilon) \subseteq U$, and $z \in \Omega(y,\varepsilon)$, extend $\varphi$ to a map $\langle 0,2\rangle \to U$ by putting $\varphi(1+t) = tz + (1-t)y$ for $t \in \langle 0,1\rangle$. Clearly $\varphi$ is continuous, and defining $\psi : \langle 0,1\rangle \to U$ by $\psi(t) = \varphi(2t)$ shows $z \in V$.

We also claim, however, that $V$ is closed in $U$: Let $y_n \to y$, $y_n \in V$, $y \in U$. Since $U$ is open, there exists an $\varepsilon > 0$ with $\Omega(y,\varepsilon) \subseteq U$. Then there exists an $n$ such that $y_n \in \Omega(y,\varepsilon)$. Then we proceed the same way as above: Let $\varphi : \langle 0,1\rangle \to U$, $\varphi(0) = x$, $\varphi(1) = y_n$. Extend $\varphi$ to a map $\langle 0,2\rangle \to U$ by putting $\varphi(1+t) = ty + (1-t)y_n$ for $t \in \langle 0,1\rangle$. Putting again $\psi(t) = \varphi(2t)$ shows that $y \in V$.

Since $V \ne \emptyset$ (since $x \in V$), and since $V$ is open and closed in $U$, we must have $V = U$, since $U$ is connected. □

5.4 Connected components

Let $X$ be a topological space. Let $\sim$ be a relation on $X$ where $x \sim y$ if and only if there exists a connected subset $S \subseteq X$ such that $x, y \in S$. Then $\sim$ is an equivalence relation (transitivity follows from Proposition 5.1.2). The equivalence classes of $\sim$ are called the connected components of $X$. Also by Proposition 5.1.2, connected components are connected subsets of $X$.

An immediate consequence of Proposition 5.1.4 is the following:

5.4.1 Lemma. Connected components of $X$ are closed subsets of $X$. □

Connected components may not be open: consider $\mathbb{Q}$ (with the topology induced from $\mathbb{R}$). Then the connected components are single points. We have, however,

5.4.2 Lemma. Let $U \subseteq \mathbb{R}^n$ be an open set. Then the connected components of $U$ are open in $U$ (hence in $\mathbb{R}^n$).

Proof. Let $x \in U$. Then there exists $\varepsilon > 0$ such that $\Omega(x,\varepsilon) \subseteq U$, but $\Omega(x,\varepsilon)$ is homeomorphic to $\mathbb{R}^n$ and hence connected by Corollary 5.2.3, so $\Omega(x,\varepsilon)$ is contained in the connected component of $x$. Since this is true for every point $x$, the connected components are open. □

5.5 A result on bounded closed intervals

The proof of the following result will seem, in nature, related to the proof of the fact that the real numbers are connected. While this is true, it turns out to be mainly due to special properties of the real numbers. The result itself is a reformulation of compactness, a notion which we will discuss in the next section. An understanding of this connection for general metric spaces, however, will have to be postponed until Chapter 9 below.

By an open interval (resp. bounded closed interval) in $\mathbb{R}^n$ we mean a set of the form $\prod_{k=1}^n (a_k, b_k)$ (resp. of the form $\prod_{k=1}^n \langle a_k, b_k\rangle$, $-\infty < a_k, b_k < \infty$).

Theorem. For every bounded closed interval $K$ in $\mathbb{R}^n$ and every set of open intervals $S$ such that $K \subseteq \bigcup_{I \in S} I$, there exists a finite subset $F \subseteq S$ such that $K \subseteq \bigcup_{I \in F} I$.

Proof. Let us first consider the case $n = 1$. Let $\langle a,b\rangle$ be contained in a union of a set $S$ of open intervals. Let $t \in \langle a,b\rangle$ be the supremum of the set $M$ of all $s \in \langle a,b\rangle$ such that $\langle a,s\rangle$ is contained in a union of some finite subset of $S$. We want to prove that $t = b$. Assume, then, that $t < b$. Then there exists a $J \in S$ such that $t \in J$. On the other hand, by the definition of supremum, there exist $s_i \in M$ such that $s_i \nearrow t$. Then, for some $i$, $s_i \in J$. But we also know that there exists a finite subset $F \subseteq S$ whose union contains $\langle a, s_i\rangle$. Then the union of the finite subset $F \cup \{J\}$ contains $\langle a,x\rangle$ for every $x \in J$, contradicting $t = \sup M$.

Now let us consider general $n$. Assume, by induction, that the statement holds with $n$ replaced by $n-1$. Let $K = \langle a_1,b_1\rangle \times \dots \times \langle a_n,b_n\rangle$. Then for every point $x \in \langle a_1,b_1\rangle$, there exists, by the induction hypothesis, a finite subset $F_x \subseteq S$ such that $\{x\} \times \langle a_2,b_2\rangle \times \dots \times \langle a_n,b_n\rangle \subseteq \bigcup F_x$. Let $I_x$ be the intersection of all the (1-dimensional) intervals $I_1$ where $I_1 \times \dots \times I_n \in F_x$. Then $\langle a_1,b_1\rangle$ is contained in the union of the open intervals $I_x$, $x \in \langle a_1,b_1\rangle$, and hence there are finitely many points $x_1, \dots, x_k \in \langle a_1,b_1\rangle$ such that $\langle a_1,b_1\rangle \subseteq \bigcup_{i=1}^k I_{x_i}$. Then $K$ is contained in the union of the open intervals in $F_{x_1} \cup \dots \cup F_{x_k}$. □

Corollary. For every bounded closed interval $K$ in $\mathbb{R}^n$ and every set of open sets $Q$ such that $K \subseteq \bigcup_{I \in Q} I$, there exists a finite subset $F \subseteq Q$ such that $K \subseteq \bigcup_{I \in F} I$.

(Apply the theorem to the set $S$ of all open intervals which are contained in one of the open sets in $Q$.)

6 Compact metric spaces

6.1

A metric space $X$ is said to be compact if each sequence $(x_n)_n$ in $X$ contains a convergent subsequence. Thus, in particular, a bounded closed interval $\langle a,b\rangle$ in $\mathbb{R}$ is compact (recall Theorem 2.3 of Chapter 1).

6.2 Proposition. 1. A subspace of a compact space is compact if and only if it is closed.

2. If $f : X \to Y$ is continuous then the image $f[A]$ of any compact $A \subseteq X$ is compact.

Proof. 1. Let $A$ be a closed subspace of a compact $X$. Let $(x_n)_n$ be a sequence of points of $A$. There is a subsequence $x_{k_1}, x_{k_2}, x_{k_3}, \dots$ converging in $X$. Since $A$ is closed, the limit is in $A$.
Now let $A$ not be closed. Then there is a sequence $(x_n)_n$ of elements of $A$ convergent in $X$, with the limit $x$ in $X \setminus A$; since each subsequence converges to $x$, there is none converging to a point in $A$.

2. Let $(y_n)_n$ be a sequence in $f[A]$. Choose $x_i \in A$ such that $y_i = f(x_i)$. Since $A$ is compact we have a subsequence $x_{k_1}, x_{k_2}, x_{k_3}, \dots$ converging to an $x \in A$. Then by 1.6, $y_{k_1}, y_{k_2}, y_{k_3}, \dots$ converges to $f(x)$. □

6.2.1

Note that from the second part of the proof of the first statement we obtain an immediate

Observation. A compact subspace of any metric space $X$ is closed in $X$.

Remark. Thus we have a slightly surprising consequence: if $X$ is compact, $Y$ is a general metric space and if $f : X \to Y$ is a continuous mapping then, besides preimages of closed sets being closed, also the images of closed sets are closed. We will learn more about this phenomenon in Chapter 9 below. For now, let us record the following

6.2.2 Corollary. Let $f : X \to Y$ be a continuous bijective (i.e. one to one and onto) map of metric spaces where $X$ is compact. Then $f$ is a homeomorphism.

6.3 Proposition. Let $X$ be a compact metric space. Then for each continuous real function $f$ on $X$ there exist $x_1, x_2 \in X$ such that

$$f(x_1) = \min\{f(x) \mid x \in X\} \quad\text{and}\quad f(x_2) = \max\{f(x) \mid x \in X\}.$$

(Compare with 3.4 of Chapter 1.)

Proof. A compact subspace $A$ of $\mathbb{R}$ has a minimal and a maximal point, namely $\inf A$ and $\sup A$, which are obviously limits of sequences in $A$. Apply this to $A = f[X]$, which is compact by 6.2. □

6.4 Proposition. (Finite) products of compact spaces are compact.

Proof. We will begin with the product $X \times Y$ of two compact metric spaces; the extension to a general finite product follows by induction.

Let

$$(x_1, y_1),\ (x_2, y_2),\ (x_3, y_3),\ \dots \tag{*}$$

be a sequence of points of $X \times Y$. In $X$, choose a convergent subsequence $(x_{k_n})_n$ of $(x_n)_n$. Now take the sequence $(y_{k_n})_n$ in $Y$ and choose a convergent subsequence $(y_{k_{r_n}})_n$. Then by 2.2.2.2 (and (1.2.1)),

$$(x_{k_{r_1}}, y_{k_{r_1}}),\ (x_{k_{r_2}}, y_{k_{r_2}}),\ (x_{k_{r_3}}, y_{k_{r_3}}),\ \dots$$

is a convergent subsequence of (*). □

A metric space $(X,d)$ is bounded if there exists a number $K$ such that for all $x, y \in X$, $d(x,y) < K$. From the triangle inequality we immediately see that this is equivalent to any of the following statements:

there is a $K$ such that for every $x$, $X \subseteq \Omega(x,K)$;

for every $x$ there is a $K$ such that $X \subseteq \Omega(x,K)$.

6.5 Theorem. A subspace of the Euclidean space $\mathbb{R}^m$ is compact if and only if it is bounded and closed.

Proof. I. From Theorem 2.3 of Chapter 1, we already know that a bounded closed interval is compact.

II. Now let $X$ be a bounded closed subspace of $\mathbb{R}^m$. Since it is bounded there are intervals $\langle a_i, b_i\rangle$, $i = 1, \dots, m$, such that

$$X \subseteq J = \langle a_1,b_1\rangle \times \dots \times \langle a_m,b_m\rangle.$$

By 6.4 and I, $J$ is compact. The subspace $X$ is closed in $\mathbb{R}^m$, hence in $J$, and hence it is compact by 6.2.

III. Let $X$ not be closed in $\mathbb{R}^m$. Then it is not compact, by 6.2.1.

IV. Let $X$ not be bounded. Choose arbitrarily $x_1$ and then $x_n$ such that $d(x_1, x_n) > n$. A convergent sequence is always bounded (all but finitely many of its elements are in the $\varepsilon$-ball of the limit). Thus, $(x_n)_n$ cannot have a convergent subsequence as it has no bounded one. □

6.6

We have already observed that uniform continuity is a much stronger property than continuity (even the real function $x \mapsto x^2$ is not uniformly continuous). But the situation is different for compact spaces. We have

Theorem. Let $X, Y$ be metric spaces and let $X$ be compact. Then a mapping $f\colon X \to Y$ is uniformly continuous if and only if it is continuous.

(Compare with Theorem 3.5.1 of Chapter 1.)

Proof. Let $f$ be continuous but not uniformly continuous. Negating the definition, there is an $\varepsilon_0 > 0$ such that for every $\delta > 0$ there are $x(\delta), y(\delta)$ such that

$$d(x(\delta), y(\delta)) < \delta \quad\text{while}\quad d'(f(x(\delta)), f(y(\delta))) \ge \varepsilon_0.$$

Consider $x_n = x(\tfrac1n)$ and $y_n = y(\tfrac1n)$. Choose a convergent subsequence $(x_{k_n})_n$ of $(x_n)_n$ and a convergent subsequence $(y_{k_{r_n}})_n$ of $(y_{k_n})_n$, set $\tilde x_n = x_{k_{r_n}}$ and $\tilde y_n = y_{k_{r_n}}$, and finally $x = \lim \tilde x_n$ and $y = \lim \tilde y_n$. As $d(\tilde x_n, \tilde y_n) < \tfrac1n$, $x = y$. This is a contradiction since by continuity $f(x) = \lim f(\tilde x_n)$ and $f(y) = \lim f(\tilde y_n)$ and $d'(f(\tilde x_n), f(\tilde y_n))$ is always at least $\varepsilon_0$. □
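The remark that the squaring function is not uniformly continuous on the whole real line can be checked numerically. The following is only an illustrative sketch (the helper name `gap` and the sample points are our own choices): for a fixed $\delta$, two points at distance $\delta/2$ have images whose distance grows without bound as the points move to the right, so no single $\delta$ can serve all of $\mathbb{R}$.

```python
def gap(x, delta):
    """|f(x + delta/2) - f(x)| for f(x) = x^2; the two points are closer than delta."""
    return abs((x + delta / 2) ** 2 - x ** 2)

delta = 1e-3
# The gap equals x*delta + delta^2/4, so it grows linearly with x.
gaps = [gap(x, delta) for x in (1.0, 1e3, 1e6)]
print(gaps)
```

On a compact interval such as $\langle 0, 1\rangle$ the same gap stays below $\delta + \delta^2/4$, in agreement with the theorem.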


7 Completeness

7.1

A sequence $(x_n)_n$ in a metric space $(X, d)$ is said to be Cauchy if

$$\forall \varepsilon > 0\ \exists n_0 \text{ such that } \forall m, n \ge n_0,\ d(x_m, x_n) < \varepsilon.$$

7.2 Proposition.
1. Every convergent sequence is Cauchy.
2. Let a Cauchy sequence $(x_n)_n$ contain a convergent subsequence; then the whole sequence $(x_n)_n$ converges.
3. Every Cauchy sequence is bounded.

Proof. 1. Let $\lim x_n = x$. For $\varepsilon > 0$ choose an $n_0$ such that $d(x_n, x) < \tfrac{\varepsilon}{2}$ for all $n \ge n_0$. Then for $m, n \ge n_0$,

$$d(x_m, x_n) \le d(x_m, x) + d(x, x_n) < \tfrac{\varepsilon}{2} + \tfrac{\varepsilon}{2} = \varepsilon.$$

2. Let $(x_n)_n$ be Cauchy and let $(x_{k_n})_n$ be a subsequence converging to a point $x$. Choose an $n_1$ such that for $m, n \ge n_1$, $d(x_m, x_n) < \tfrac{\varepsilon}{2}$, and an $n_2$ such that for $n \ge n_2$, $d(x_{k_n}, x) < \tfrac{\varepsilon}{2}$. Set $n_0 = \max(n_1, n_2)$. Since $k_n \ge n$ we have, for $n \ge n_0$,

$$d(x_n, x) \le d(x_n, x_{k_n}) + d(x_{k_n}, x) < \varepsilon.$$

3. Choose $n_0$ such that for $m, n \ge n_0$, $d(x_m, x_n) < 1$. Then for any $n$,

$$d(x_n, x_{n_0}) < 1 + \max_{k \le n_0} d(x_{n_0}, x_k).$$ □

7.3

A metric space $(X, d)$ is said to be complete if every Cauchy sequence in $X$ converges.

7.3.1 Proposition. A subspace $A$ of a complete space $X$ is complete if and only if it is closed.

Proof. Let $A$ be closed. If a sequence is Cauchy in $A$, it is Cauchy in $X$ and hence convergent. Since $A$ is closed, the limit of the sequence has to be in $A$.

If $A$ is not closed there is a sequence $(x_n)_n$ with $x_n \in A$, convergent in $X$ to an $x \in X \setminus A$. Then $(x_n)_n$ is Cauchy in $X$ and hence in $A$ as well; but all of its subsequences converge to $x$ and hence do not converge in $A$. □


7.4 Proposition. A compact metric space is complete.

Proof. Let $(x_n)_n$ be a Cauchy sequence in a compact metric space $X$. Then it has a convergent subsequence, and by 7.2, part 2, it converges. □

7.5 Theorem. The Euclidean space $\mathbb{R}^m$ (in particular, the real line $\mathbb{R}$) is complete. Consequently, a subspace of $\mathbb{R}^m$ is complete if and only if it is closed.

Proof. Let $(x_n)_n$ be a Cauchy sequence in $\mathbb{R}^m$. By 7.2, part 3, it is bounded and hence

$$\{x_n \mid n = 1, 2, \dots\} \subseteq J = \langle a_1, b_1\rangle \times \cdots \times \langle a_m, b_m\rangle$$

for sufficiently large intervals $\langle a_j, b_j\rangle$. The product $J$ is compact by 6.5, so by 7.4 $(x_n)_n$ converges in $J$ and hence it converges in $\mathbb{R}^m$. The statement about subspaces follows from 7.3.1. □

Remark. The special case of the real line is the well-known Bolzano-Cauchy Theorem (Theorem 2.4 of Chapter 1).

7.6

The following is the well-known Banach Fixed Point Theorem. At first sight it may seem that its use will be rather limited: the assumption is very strong. But the reader will perhaps be surprised by the generality of one of the applications in 3.3 of Chapter 6.

Theorem. Let $(X, d)$ be a complete metric space. Let $f\colon X \to X$ be a mapping such that there is a $q < 1$ with

$$d(f(x), f(y)) \le q \cdot d(x, y) \tag{*}$$

for all $x, y \in X$. Then there is precisely one $x \in X$ such that $f(x) = x$.

Proof. Choose any $x_1 \in X$ and then, inductively,

$$x_{n+1} = f(x_n).$$

Set $C = d(x_1, x_2)$. By the assumption we have

$$d(x_2, x_3) \le Cq,\quad d(x_3, x_4) \le Cq^2,\ \dots,\ d(x_n, x_{n+1}) \le Cq^{n-1}.$$

Thus, by the triangle inequality, for $m \ge n + 1$,

$$d(x_n, x_m) \le C(q^{n-1} + q^n + \cdots + q^{m-2}) \le Cq^{n-1}(1 + q + q^2 + \cdots) = \frac{C}{1-q}\, q^{n-1}.$$

Hence, $(x_n)_n$ is a Cauchy sequence and we have a limit $x = \lim x_n$. Now a mapping $f$ satisfying (*) is clearly continuous and hence we have

$$f(x) = f(\lim x_n) = \lim f(x_n) = \lim x_{n+1} = x.$$

Finally, if $f(x) = x$ and $f(y) = y$ then

$$d(x, y) = d(f(x), f(y)) \le q \cdot d(x, y) \quad\text{with } q < 1,$$

which is possible only if $d(x, y) = 0$. □
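The proof is constructive, and the iteration it describes is easy to try out. The sketch below is an illustration under our own assumptions: we take $X = \langle 0, 1\rangle$ (closed in $\mathbb{R}$, hence complete), $f = \cos$, and the contraction constant $q = \sin 1$, which works because $\cos$ maps $\langle 0, 1\rangle$ into itself and $|\cos'| \le \sin 1$ there. The function name `fixed_point` and the stopping rule, based on the bound $\frac{C}{1-q} q^{n-1}$ from the proof, are hypothetical choices.

```python
import math

def fixed_point(f, x1, q, tol=1e-12, max_iter=10_000):
    """Iterate x_{n+1} = f(x_n) until the bound C*q^(n-1)/(1-q) drops below tol."""
    c = abs(f(x1) - x1)           # C = d(x_1, x_2)
    x, n = x1, 1
    while c * q ** (n - 1) / (1 - q) >= tol and n < max_iter:
        x = f(x)                  # x now holds x_{n+1}
        n += 1
    return x, n

# Assumption of this sketch: q = sin(1) is a contraction constant for cos on [0, 1].
q = math.sin(1.0)
x_star, steps = fixed_point(math.cos, 1.0, q)
print(x_star, steps)
```

Letting $m \to \infty$ in the estimate of the proof gives $d(x_n, x) \le \frac{C}{1-q} q^{n-1}$, which is why the loop can stop without knowing the fixed point in advance.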

7.7 An Example: Spaces of continuous functions

Let $X = (X, d)$ be a metric space. Denote by

$$C(X)$$

the space of all bounded continuous real functions $f\colon X \to \mathbb{R}$, endowed with the metric

$$d(f, g) = \sup_{x \in X} |f(x) - g(x)|.$$

(The function $d$ thus defined really is a metric. Obviously $d(f, g) = 0$ implies $f = g$ and $d(f, g) = d(g, f)$. Suppose $d(f, g) + d(g, h) < d(f, h)$; then there is an $x \in X$ such that $d(f, g) + d(g, h) < |f(x) - h(x)|$, but then in particular $|f(x) - g(x)| + |g(x) - h(x)| < |f(x) - h(x)|$, a contradiction.)

Remark. Of course, by 2.4.2, if $X$ is compact then $C(X)$ is the space of all continuous functions on $X$.

7.7.1 Observation. The convergence in $C(X)$ is exactly the uniform convergence defined in 8.1.

(If $d(f, g) < \varepsilon$ then $|f(x) - g(x)| < \varepsilon$ for all $x \in X$; conversely, if $|f(x) - g(x)| < \varepsilon$ for all $x \in X$ then $d(f, g) \le \varepsilon$.)

7.7.2 Proposition. The space $C(X)$ with the metric defined above is complete.

Proof. Let $(f_n)_n$ be a Cauchy sequence in $C(X)$. Then, since $|f_n(x) - f_m(x)| \le d(f_n, f_m)$ for each $x \in X$, every $(f_n(x))_n$ is a Cauchy sequence in $\mathbb{R}$, and hence a convergent one. Set

$$f(x) = \lim_n f_n(x).$$

Claim. The sequence $(f_n)_n$ converges to $f$ uniformly.

Proof of the Claim. Consider an $\varepsilon > 0$. There exists an $n_0$ such that for $m, n \ge n_0$,

$$\forall x,\ |f_n(x) - f_m(x)| < \tfrac{\varepsilon}{2}$$

and hence $\lim_{m\to\infty} |f_n(x) - f_m(x)| = |f_n(x) - \lim_{m\to\infty} f_m(x)| = |f_n(x) - f(x)| \le \tfrac{\varepsilon}{2} < \varepsilon$. Thus, for $n \ge n_0$ and for all $x \in X$, $|f_n(x) - f(x)| < \varepsilon$. □

Proof of the Proposition continued. By the Claim and 8.2, $f$ is continuous. Now there exists an $n_0$ such that for all $n, m \ge n_0$, $d(f_n, f_m) = \sup_x |f_n(x) - f_m(x)| < 1$ and hence, taking the limit, we obtain $|f_n(x) - f(x)| \le 1$ for all $x$. Thus, if $|f_{n_0}(x)| \le K$ for all $x$, we have $|f(x)| \le K + 1$ for all $x$.

Now we know that $f$ is bounded and continuous, hence $f \in C(X)$, and by 7.7.1 and the Claim again, $(f_n)_n$ converges to $f$ in $C(X)$. □

7.7.3

Let $a, b \in \mathbb{R} \cup \{-\infty, +\infty\}$. Put

$$C(X; a, b) = \{f \in C(X) \mid \forall x,\ a \le f(x) \le b\}.$$

Proposition. The subspace $C(X; a, b)$ is closed in $C(X)$. Consequently, it is complete.

Proof. Recall 8.1.1. Since uniform convergence implies pointwise convergence, if $a \le f_n(x) \le b$ and $f_n$ converge to $f$ then $a \le f(x) \le b$ and $f \in C(X; a, b)$.

The consequence follows from 7.3.1. □

8 Uniform convergence of sequences of functions. Application: Tietze's Theorems

On various occasions we have seen that general facts the reader knew about real functions of one real variable held generally, and the proofs did not really need anything but replacing $|x - y|$ by the distance $d(x, y)$. For example, this was the case when studying the relationship between continuity and convergence, or when proving that continuous maps of compact spaces are automatically uniformly continuous; or the fact about maxima and minima of real functions on a compact space (where in fact the general proof was in a way simpler, or more transparent, due to the observation that the image of a compact space is compact).

In this section we will introduce yet another case of such a mechanical extension, namely the behavior of uniformly convergent sequences of mappings, resp. uniformly convergent series of real functions. As an application we will present the rather important Tietze Theorems on extension of continuous maps.

8.1

Let $(X, d)$, $(Y, d')$ be metric spaces. A sequence of mappings

$$f_1, f_2, f_3, \dots \colon X \to Y$$

is said to converge uniformly to $f$ if for every $\varepsilon > 0$ there is an $n_0$ such that for all $n \ge n_0$ and for all $x \in X$,

$$d'(f_n(x), f(x)) < \varepsilon.$$

This is usually indicated

$$f_n \rightrightarrows f.$$

8.1.1 Remarks
1. Note that if $f_n \rightrightarrows f$ then

$$\lim f_n(x) = f(x) \text{ for all } x. \tag{*}$$

The statement (*) alone (called pointwise convergence) is much weaker, and would not suffice as an assumption in 8.2 below.

2. Also note that in the above definition, one uses the metric structure in $(Y, d')$ only. See 8.2.1 below.
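The difference between pointwise and uniform convergence can be seen in a small experiment. The sketch below uses an example of our own choosing, the functions $x \mapsto x^n$ on the interval $\langle 0, 1)$: at every fixed point the values tend to 0, yet the supremum of the distance to the zero function stays close to 1. The grid of sample points is only a finite stand-in for the supremum, so the printed numbers are lower estimates.

```python
def f_n(n, x):
    return x ** n

# Sample points in [0, 1) that crowd towards 1 (a finite stand-in for the supremum).
grid = [1 - 10 ** (-j) for j in range(1, 9)]

def sup_distance_to_zero(n):
    """Lower estimate of sup over [0, 1) of |f_n(x) - 0|, taken over the grid."""
    return max(abs(f_n(n, x)) for x in grid)

pointwise = [f_n(n, 0.5) for n in (10, 100, 1000)]
sup_est = [sup_distance_to_zero(n) for n in (10, 100, 1000)]
print(pointwise)   # values at the fixed point x = 0.5 shrink rapidly
print(sup_est)     # the estimated supremum does not shrink
```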

8.2 Proposition. Let $f_n \rightrightarrows f$ for mappings $(X, d) \to (Y, d')$. Let all the functions $f_n$ be continuous. Then $f$ is continuous.

Proof. Fix $x \in X$. For $\varepsilon > 0$ choose $n$ such that $d'(f_n(z), f(z)) < \tfrac{\varepsilon}{3}$ for all $z \in X$. Since $f_n$ is continuous there is a $\delta > 0$ such that $d(x, y) < \delta$ implies $d'(f_n(x), f_n(y)) < \tfrac{\varepsilon}{3}$. Now we have the implication

$$d(x, y) < \delta \ \Rightarrow\ d'(f(x), f(y)) \le d'(f(x), f_n(x)) + d'(f_n(x), f_n(y)) + d'(f_n(y), f(y)) < \tfrac{\varepsilon}{3} + \tfrac{\varepsilon}{3} + \tfrac{\varepsilon}{3} = \varepsilon.$$ □

8.2.1

Note that an analogous proposition also holds for a topological space $(X, \tau)$ instead of a metric one. In the proof replace the requirement of $\delta$ by a neighborhood $U$ of $x$ such that $f_n[U] \subseteq \Omega(f_n(x), \tfrac{\varepsilon}{3})$ and use for $y \in U$ the triangle inequality as before.

8.3 Corollary. Let $f_n\colon (X, d) \to \mathbb{R}$ be continuous functions, let $\sum a_n$ be a convergent series of real numbers, and let for every $n$ and every $x$, $|f_n(x)| \le a_n$. Then $g_n(x) = \sum_{k=1}^{n} f_k(x)$ uniformly converge to $\sum_{k=1}^{\infty} f_k(x)$ and hence $g = (x \mapsto \sum_{k=1}^{\infty} f_k(x))$ is a continuous function.
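The mechanism behind this corollary is that the tail of the dominating series bounds the difference of partial sums at every point simultaneously. The following sketch checks this numerically for an example of our own choosing, $f_k(x) = \sin(kx)/k^2$ with $a_k = 1/k^2$; the sample grid and the cut-offs `n` and `N` are arbitrary assumptions.

```python
import math

def f(k, x):
    return math.sin(k * x) / k ** 2      # |f_k(x)| <= a_k = 1/k^2

def partial_sum(n, x):
    return sum(f(k, x) for k in range(1, n + 1))

def tail_bound(n, N):
    """sum of a_k for k = n+1..N, which dominates |g_N(x) - g_n(x)| for every x."""
    return sum(1.0 / k ** 2 for k in range(n + 1, N + 1))

xs = [i * 0.05 for i in range(-100, 101)]
n, N = 20, 2000
worst_gap = max(abs(partial_sum(N, x) - partial_sum(n, x)) for x in xs)
print(worst_gap, tail_bound(n, N))
```

The bound on the right does not depend on $x$, which is exactly what uniform convergence requires.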

8.4 Lemma. Let $A, B$ be disjoint closed subsets of a metric space $(X, d)$ and let $\alpha, \beta$ be real numbers. Then there is a continuous function

$$\varphi = \Phi(A, B; \alpha, \beta)\colon X \to \mathbb{R}$$

such that

$$\varphi[A] \subseteq \{\alpha\},\quad \varphi[B] \subseteq \{\beta\} \quad\text{and}\quad \min\{\alpha, \beta\} \le \varphi(x) \le \max\{\alpha, \beta\}. \tag{$\Phi$}$$

Proof. Set

$$\varphi(x) = \alpha + (\beta - \alpha)\frac{d(x, A)}{d(x, A) + d(x, B)}.$$

This definition is correct: $d(x, A) + d(x, B) = 0$ yields $d(x, A) = d(x, B) = 0$ and by closedness $x \in A$ and $x \in B$; but $A$ and $B$ are disjoint.

Furthermore, the function $x \mapsto d(x, C)$ is continuous (by the triangle inequality, $d(y, C) \le d(x, C) + d(x, y)$ and hence $|d(x, C) - d(y, C)| \le d(x, y)$) so that $\varphi$, obtained by arithmetic operations from continuous functions, is continuous as well.

The properties listed in ($\Phi$) are obvious. □
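The formula from the proof can be evaluated directly in a concrete case. The sketch below takes $X = \mathbb{R}$ and two disjoint closed intervals for $A$ and $B$; these sets, the values of $\alpha$ and $\beta$, and the helper names are our own illustrative assumptions.

```python
def dist_to_interval(x, lo, hi):
    """d(x, C) for the closed interval C = [lo, hi] in R."""
    return max(lo - x, 0.0, x - hi)

def phi(x, A, B, alpha, beta):
    dA = dist_to_interval(x, *A)
    dB = dist_to_interval(x, *B)
    # dA + dB > 0 because A and B are disjoint closed sets.
    return alpha + (beta - alpha) * dA / (dA + dB)

A, B = (-2.0, -1.0), (1.0, 3.0)
alpha, beta = -1.0 / 3, 1.0 / 3
values = [phi(i * 0.1, A, B, alpha, beta) for i in range(-50, 51)]
print(phi(-1.5, A, B, alpha, beta), phi(0.0, A, B, alpha, beta), phi(2.0, A, B, alpha, beta))
```

On $A$ the function takes the value $\alpha$, on $B$ the value $\beta$, and in between it interpolates according to the ratio of the two distances.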

8.5 Theorem. (Tietze) Let $A$ be a closed subspace of a metric space $X$ and let $J$ be a compact interval in $\mathbb{R}$. Then each continuous mapping $f\colon A \to J$ can be extended to a continuous $g\colon X \to J$ (that is, there is a continuous $g$ such that $g|A = f$).

Proof. For a degenerate interval $\langle a, a\rangle$ the statement is trivial and all the other compact intervals are homeomorphic; if the statement holds for $J_1$ and if $h\colon J \to J_1$ is a homeomorphism we can, for $f\colon A \to J$, extend $hf$ to a $\bar g\colon X \to J_1$ and then take $g = h^{-1}\bar g$. Thus we can choose the $J$ arbitrarily. For our purposes, $J = \langle -1, 1\rangle$ will be particularly convenient.


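Set $A_1 = f^{-1}[\langle -1, -\tfrac13\rangle]$ and $B_1 = f^{-1}[\langle \tfrac13, 1\rangle]$ and consider

$$\varphi_1 = \Phi(A_1, B_1; -\tfrac13, \tfrac13).$$

We obviously have

$$\forall x \in A,\quad |f(x) - \varphi_1(x)| \le \tfrac23.$$

Set $f_0 = f$ and $f_1 = f - \varphi_1$ (on $A$). Suppose we already have continuous

$$f = f_0, f_1, \dots, f_n\colon A \to \langle -1, 1\rangle \quad\text{and}\quad \varphi_1, \varphi_2, \dots, \varphi_n\colon X \to \langle -1, 1\rangle$$

such that for all $k = 1, \dots, n$,

$$|\varphi_k(x)| \le \frac{2^{k-1}}{3^k},\quad f_k(x) = f_{k-1}(x) - \varphi_k(x) \quad\text{and}\quad |f_k(x)| \le \frac{2^k}{3^k}. \tag{*}$$

Then set

$$A_{n+1} = f_n^{-1}\Bigl[\bigl\langle -\tfrac{2^n}{3^n}, -\tfrac{2^n}{3^{n+1}}\bigr\rangle\Bigr],\quad B_{n+1} = f_n^{-1}\Bigl[\bigl\langle \tfrac{2^n}{3^{n+1}}, \tfrac{2^n}{3^n}\bigr\rangle\Bigr],$$

$$\varphi_{n+1} = \Phi\bigl(A_{n+1}, B_{n+1}; -\tfrac{2^n}{3^{n+1}}, \tfrac{2^n}{3^{n+1}}\bigr) \quad\text{and}\quad f_{n+1} = f_n - \varphi_{n+1}.$$

(The sets $A_{n+1}$, $B_{n+1}$ are closed in $A$ and hence, $A$ being closed, in $X$.) Thus we obtain sequences of continuous functions $\varphi_1, \varphi_2, \dots, \varphi_k, \dots$ and $f = f_0, f_1, \dots, f_k, \dots$ satisfying (*) for all $k$. By 8.3, we have a continuous function $g = (x \mapsto \sum_{k=1}^{\infty} \varphi_k(x))\colon X \to \mathbb{R}$ and since $|g(x)| \le \sum_{k=1}^{\infty} \frac{2^{k-1}}{3^k} = 1$, we can view it as a continuous function

$$g\colon X \to \langle -1, 1\rangle.$$

Now let $x \in A$. We have

$$f(x) = \varphi_1(x) + f_1(x) = \varphi_1(x) + \varphi_2(x) + f_2(x) = \cdots = \varphi_1(x) + \cdots + \varphi_n(x) + f_n(x)$$

and since $\lim_n f_n(x) = 0$ we conclude that $f(x) = g(x)$. □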


8.5.1 Theorem. (Tietze's Real Line Theorem) Let $A$ be a closed subspace of a metric space $X$. Then each continuous mapping $f\colon A \to \mathbb{R}$ can be extended to a continuous $g\colon X \to \mathbb{R}$.

Proof. We can replace $\mathbb{R}$ by any space homeomorphic with $\mathbb{R}$ (recall the first paragraph of the previous proof). We will take the open interval $(-1, 1)$ instead and extend a map $f\colon A \to (-1, 1)$.

By 8.5, $f$ can be extended to a $\bar g\colon X \to \langle -1, 1\rangle$. Such $\bar g$ can, however, reach the values $-1$ or $1$ and hence is not an extension as desired. To remedy the situation, consider $B = \bar g^{-1}[\{-1, 1\}]$, which is a closed set disjoint from $A$, consider the $\varphi = \Phi(A, B; 1, 0)$ from 8.4, and define

$$g(x) = \bar g(x) \cdot \varphi(x).$$

Now we have $f(x) = \bar g(x) = g(x)$ for $x \in A$, and $|g(x)| < 1$ for all $x \in X$: if $\bar g(x) = 1$ or $-1$ then $\varphi(x) = 0$, and otherwise $|\bar g(x)| < 1$ while $0 \le \varphi(x) \le 1$. □

8.5.2

A subspace $R$ of a space $Y$ is said to be a retract of $Y$ if there exists a continuous $r\colon Y \to R$ such that $r(x) = x$ for all $x \in R$.

A metric space $Y$ is injective if for every metric space $X$ and closed $A \subseteq X$, each continuous $f\colon A \to Y$ can be extended to a continuous $g\colon X \to Y$. (Thus, we have learned above that $\mathbb{R}$ and any compact interval are injective spaces.)

Theorem. Every retract of a Euclidean space is injective.

Proof. First we will prove that a Euclidean space itself is injective. Consider it as the product

$$\mathbb{R}^m = \mathbb{R} \times \cdots \times \mathbb{R} \quad (m \text{ times})$$

with the projections $p_j((x_1, \dots, x_m)) = x_j$. Let $f\colon A \to \mathbb{R}^m$ be a continuous mapping. Then we have by 8.5.1 continuous $g_j\colon X \to \mathbb{R}$ such that $g_j|A = p_j f$. By 2.2.2 we have the continuous $g = (x \mapsto (g_1(x), \dots, g_m(x)))\colon X \to \mathbb{R}^m$ and for $x \in A$ we obtain $g(x) = (p_1 f(x), \dots, p_m f(x)) = f(x)$.

Now let $Y$ be a retract of $\mathbb{R}^m$ with a retraction $r\colon \mathbb{R}^m \to Y$ and an inclusion map $j\colon Y \to \mathbb{R}^m$ (thus, $rj = \mathrm{id}$). Now if $f\colon A \to Y$ (or, rather, $jf\colon A \to \mathbb{R}^m$) is extended to $\bar g\colon X \to \mathbb{R}^m$, the desired extension $g$ is $r\bar g$. □


9 Exercises

(1) Prove 1.4.1.

(2) Prove Proposition 1.5.1.

(3) Prove Observation 1.5.2.

(4) Prove that $f\colon (X, d) \to (Y, d')$ is continuous if and only if for each convergent sequence $(x_n)_n$ in $(X, d)$ the sequence $(f(x_n))_n$ is convergent (not specifying the limits).

(5) (a) Consider the set of real numbers $\mathbb{R}$. Prove that the function

$$d'(x, y) = |x^3 - y^3|$$

is a metric which is not equivalent to the metric $d$ given in Example 1.1.1 (a).

(b) Prove that nevertheless, neighborhoods with respect to $d$ are the same as neighborhoods with respect to $d'$.

(6) Each $\Omega(x, \varepsilon)$ is open (use the triangle inequality).

(7) Let $Y$ be a subspace of $(X, d)$. $U$ is open (closed) in $Y$ if and only if there exists an open (closed) $V$ in $X$ such that $U = V \cap Y$. The closure of $A$ in $Y$ is $\overline{A} \cap Y$ where $\overline{A}$ is the closure in $X$ (discuss this from the various aspects of closure as presented in 3.3).

(8) Find an example when uniform continuity is not preserved under homeomorphism.

(9) Write down a definition of topology based on closed subsets of $X$.

(10) Check that the closures as defined in 4.1 and 4.2 satisfy the requirements of 4.3.

(11) Starting with open sets, define neighborhoods, and from them define closure as indicated above. Prove that you get the same as the closure defined from open sets directly.

(12) Start with open sets, define neighborhoods, and then open sets as in 4.1. Prove that the open sets thus defined are precisely the same sets as the original ones (note the role of the somewhat clumsy requirement (4) in 4.1).

(13) Preserving connectedness is not the same as continuity. Give an example of a map $f\colon X \to Y$ such that for every connected $S \subseteq X$, $f[S]$ is connected (with the induced topology from $Y$), but $f$ is not continuous. [Hint: Take $X = \mathbb{Q}$, the rational numbers.]

(14) Let $X \subseteq \mathbb{R}^2$ be the union of the set of all points $(0, y)$, $y \in \langle -1, 1\rangle$ and the set of all points $(x, \sin(1/x))$, $x > 0$, with the induced topology.
(a) Prove that $X \subseteq \mathbb{R}^2$ is a closed subset.
(b) Prove that $X$ is connected but not path-connected.

(15) Let $U \subseteq \mathbb{R}^n$ be a connected open set, and let $x, y \in U$. Prove that there exist $x_0, \dots, x_k \in U$, $x_0 = x$, $x_k = y$, such that the straight line segment connecting $x_t, x_{t+1}$ is contained in $U$. [Hint: mimic the proof of Proposition 5.3.1.]

(16) Path-connected components are defined the same way as connected components in 5.4, with the word "connected" replaced by the word "path-connected". Are path-connected components necessarily closed? Prove or give a counterexample.

(17) Check that convergence in the metric spaces defined in 1.1.1 (d), (e) is precisely uniform convergence.

(18) Prove an analogue of Proposition 8.2 for uniform continuity instead of continuity.

(19) Let $K$ be the set of all real numbers of the form $\sum_{k=1}^{\infty} a_k 3^{-k}$, where $a_k \in \{0, 2\}$. (This is called the Cantor set.) Prove that $K$ is compact. Prove that $K$ contains no compact interval with more than one point.

(20) Prove that a subspace of $\mathbb{R}^m$ is injective if and only if it is a retract.

3 Multivariable Differential Calculus

In this chapter, we will learn multivariable differential calculus. We will develop the multivariable versions of the concept of a derivative, and prove the Implicit Function Theorem. We will also learn how to use derivatives to find extremes of multivariable functions.

To understand Multivariable Differential Calculus, one must be familiar with Linear Algebra. We assume that the typical reader of this book will already have had a course in linear algebra, but for convenience we review the basic concepts in Appendices A and B. We refer periodically to results of these Appendices, and we recommend that the reader who has seen some linear algebra simply start reading the present chapter, and refer to these results in the Appendix as needed. Notationally, the most important are the conventions in Sections 1.3 and 7.3 of Appendix A below: $\mathbb{R}^n$ will be the space of real $n$-dimensional column vectors (matrices of type $n \times 1$). To avoid awkward notation, however, we will usually write rows and decorate them with the superscript $T$, which means transposition (Subsection 7.3 in Appendix A). Row or column vectors will be denoted by bold-faced letters, such as $\mathbf{v}$. The zero vector (origin) will be denoted by $\mathbf{o}$.

1 Real and vector functions of several variables

1.1

We will deal with real functions of several real variables, that is, mappings $f\colon D \to \mathbb{R}$ with a domain $D \subseteq \mathbb{R}^n$. Typically, $D$ will be open. Interchangeably with $f(\mathbf{x})$ where, in accordance with convention 7.3 of Appendix A, $\mathbf{x} = (x_1, \dots, x_n)^T$, we will also write $f(x_1, \dots, x_n)$. When $\mathbf{x} \in \mathbb{R}^m$, $\mathbf{y} \in \mathbb{R}^n$, notations such as $f(\mathbf{x}, \mathbf{y})$, $f(\mathbf{x}, y_1, \dots, y_n)$ will also be allowed for a function $f$ of $m + n$ variables.

Given such a function $f$, we will often be concerned with the associated functions of one variable

$$\phi(t) = f(x_1, \dots, x_{k-1}, t, x_{k+1}, \dots, x_n), \quad x_j\ (j \ne k) \text{ fixed}. \tag{1}$$

It is useful to realize right away that the study of an $f\colon D \to \mathbb{R}$ cannot be reduced to the system of all such functions $\phi$ of one variable. For instance, all of the functions (1) may be continuous while $f$ itself is not. See the following example. Set

$$f(x, y) = \begin{cases} \dfrac{(x - y)^2}{x^2 + y^2} & \text{for } (x, y) \ne (0, 0),\\[1ex] 1 & \text{for } (x, y) = (0, 0). \end{cases} \tag{2}$$

Then each $f(a, -)$ and each $f(-, b)$ is continuous, but $f$ is not: the sequence $(\tfrac1n, \tfrac1n)$ converges to $(0, 0)$ while $\lim f(\tfrac1n, \tfrac1n) = 0 \ne f(0, 0)$.
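The behaviour of this example is easy to observe numerically. The sketch below (sample points chosen by us for illustration) evaluates the function along a coordinate axis, where it is constantly 1, and along the diagonal, where it is constantly 0 away from the origin.

```python
def f(x, y):
    if (x, y) == (0.0, 0.0):
        return 1.0
    return (x - y) ** 2 / (x ** 2 + y ** 2)

# Approaching the origin with x = 0 fixed: the values agree with f(0, 0) = 1.
along_axis = [f(0.0, 1.0 / n) for n in (1, 10, 100, 1000)]
# Approaching the origin along the diagonal: the values are 0, not f(0, 0).
along_diagonal = [f(1.0 / n, 1.0 / n) for n in (1, 10, 100, 1000)]
print(along_axis)
print(along_diagonal, f(0.0, 0.0))
```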

1.2

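Recall again Conventions 1.3 and 7.3 of Appendix A. It is important to note that a vector function

$$\mathbf{f} = (f_1, \dots, f_m)^T\colon D \to \mathbb{R}^m, \quad f_j\colon D \to \mathbb{R},$$

is continuous if and only if all the $f_j$ are continuous (recall Theorem 2.2.2 of Chapter 2).

1.3 Composition

Vector functions $\mathbf{f}\colon D \to \mathbb{R}^m$, $D \subseteq \mathbb{R}^n$, and $\mathbf{g}\colon D' \to \mathbb{R}^k$, $D' \subseteq \mathbb{R}^m$, can be composed whenever $\mathbf{f}[D] \subseteq D'$, and we shall write

$$\mathbf{g} \circ \mathbf{f}\colon D \to \mathbb{R}^k \quad (\text{if there is no danger of confusion, } \mathbf{g}\mathbf{f}\colon D \to \mathbb{R}^k)$$

for the composition (sending $\mathbf{x}$ to $\mathbf{g}(\mathbf{f}(\mathbf{x}))$), without pedantically restricting $\mathbf{f}$ to a map $\mathbf{f}'\colon D \to D'$ first.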

2 Partial derivatives. Defining the existence of a total differential

2.1

Let $f\colon D \to \mathbb{R}$ be a real function of $n$ variables. The partial derivative of $f$ by $x_k$ (or, the $k$-th partial derivative) at the point $(x_1, \dots, x_n)$ is the (ordinary) derivative of the function $\phi$ of 1.1 (1), i.e. the limit

$$\lim_{h \to 0} \frac{f(x_1, \dots, x_{k-1}, x_k + h, x_{k+1}, \dots, x_n) - f(x_1, \dots, x_n)}{h}. \tag{*}$$

The standard notation is

$$\frac{\partial f(x_1, \dots, x_n)}{\partial x_k} \quad\text{or}\quad \frac{\partial f}{\partial x_k}(x_1, \dots, x_n);$$

in case of multiple variables denoted by different letters, say for $f(x, y)$, we write, of course,

$$\frac{\partial f(x, y)}{\partial x} \quad\text{and}\quad \frac{\partial f(x, y)}{\partial y}, \text{ etc.}$$

This notation is slightly inconsistent: the $x_k$ in the "denominator" $\partial x_k$ just indicates focusing on the $k$-th variable while the $x_k$ in the $f(x_1, \dots, x_n)$ in the "numerator" refers to an actual value of the argument. When confusion is possible, one can write more specifically

$$\left.\frac{\partial f(x_1, \dots, x_n)}{\partial x_k}\right|_{(x_1, \dots, x_n) = (a_1, \dots, a_n)}.$$

However, we will use this notation only occasionally.

Example.

$$\frac{\partial(x^2 + e^{xy + \sin(y)})}{\partial x} = 2x + y e^{xy + \sin(y)}, \qquad \frac{\partial(x^2 + e^{xy + \sin(y)})}{\partial y} = (x + \cos(y)) e^{xy + \sin(y)}.$$
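Since a partial derivative is an ordinary derivative in one variable with the others held fixed, the formulas of the example can be checked against difference quotients. The sketch below is a numerical sanity check only; the sample point, the step `h` and the use of a symmetric difference quotient are our own choices.

```python
import math

def f(x, y):
    return x ** 2 + math.exp(x * y + math.sin(y))

def df_dx(x, y):
    return 2 * x + y * math.exp(x * y + math.sin(y))

def df_dy(x, y):
    return (x + math.cos(y)) * math.exp(x * y + math.sin(y))

def central_difference(g, t, h=1e-6):
    """Symmetric difference quotient of a one-variable function g at t."""
    return (g(t + h) - g(t - h)) / (2 * h)

x0, y0 = 0.7, -0.3
num_dx = central_difference(lambda t: f(t, y0), x0)   # y held fixed
num_dy = central_difference(lambda t: f(x0, t), y0)   # x held fixed
print(num_dx, df_dx(x0, y0))
print(num_dy, df_dy(x0, y0))
```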

2.1.1

It can happen (and typically it does) that partial derivatives $\frac{\partial f(x_1, \dots, x_n)}{\partial x_k}$ exist for all $(x_1, \dots, x_n)$ in some domain $D' \subseteq D$. In such case, we obtain a function

$$\frac{\partial f}{\partial x_k}\colon D' \to \mathbb{R}.$$

It is usually obvious from the context whether, speaking of a partial derivative, we have in mind a function or just a number, as in the definition 2.1, (*) above.

2.2

We shall write

$$\|\mathbf{x}\| = \max_i |x_i|$$

for the distance of $\mathbf{x}$ from $\mathbf{o}$ (for our purposes we could have taken any of the equivalent distances (recall Subsection 2.2 of Chapter 2) such as the Euclidean norm $\sqrt{\mathbf{x}\mathbf{x}}$ where $\mathbf{x}\mathbf{x}$ is the dot product (see Appendix A, 4.3); our choice is perhaps the most convenient technically because of its simple behavior with respect to products).

We say that $f(x_1, \dots, x_n)$ has a total differential at a point $\mathbf{a} = (a_1, \dots, a_n)$ if there exists a function $\mu$ continuous in a neighborhood $U$ of $\mathbf{o}$ which satisfies $\mu(\mathbf{o}) = 0$ (in an alternate but equivalent formulation, one requires $\mu$ to be defined in $U \setminus \{\mathbf{o}\}$ and satisfy $\lim_{\mathbf{h} \to \mathbf{o}} \mu(\mathbf{h}) = 0$), and numbers $A_1, \dots, A_n$ such that

$$f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) = \sum_{k=1}^{n} A_k h_k + \|\mathbf{h}\| \mu(\mathbf{h}) \tag{2.2.1}$$

(using the dot product, we may write $f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) = \mathbf{A}\mathbf{h} + \|\mathbf{h}\| \mu(\mathbf{h})$).

2.3 Proposition. Let a function $f$ have a total differential at a point $\mathbf{a}$, as in the definition above. Then
1. $f$ is continuous in $\mathbf{a}$.
2. $f$ has all the partial derivatives in $\mathbf{a}$ and one has

$$\frac{\partial f(\mathbf{a})}{\partial x_k} = A_k.$$

Proof. 1. We have

$$|f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a})| \le |\mathbf{A}\mathbf{h}| + |\mu(\mathbf{h})|\,\|\mathbf{h}\|$$

and the limit of the right-hand side for $\mathbf{h} \to \mathbf{o}$ is clearly 0.

2. We have

$$\frac{1}{h}\bigl(f(a_1, \dots, a_{k-1}, a_k + h, a_{k+1}, \dots, a_n) - f(a_1, \dots, a_n)\bigr) = A_k + \mu((0, \dots, 0, h, 0, \dots, 0))\,\frac{\|(0, \dots, h, \dots, 0)\|}{h},$$

and the limit of the right-hand side is clearly $A_k$. □

2.4 Directional derivatives

It may now seem silly to prefer the basis vectors in $\mathbb{R}^n$ when defining partial derivatives. In effect, for any vector $\mathbf{v} \in \mathbb{R}^n$, one can define a directional derivative of $f$ by $\mathbf{v}$ by

$$\partial_{\mathbf{v}} f(\mathbf{x}) = \lim_{h \to 0} \frac{f(\mathbf{x} + h\mathbf{v}) - f(\mathbf{x})}{h}.$$

(Caution: Some calculus textbooks use a different convention, calling the $\partial_{\mathbf{v}/\|\mathbf{v}\|}$ the directional derivative when $\mathbf{v} \ne \mathbf{o}$, the point being that it only depends on the "direction" of $\mathbf{v}$. The notion as we defined it, without requiring any assumption on $\mathbf{v}$, and moreover linear in $\mathbf{v}$, is much more natural for use in geometry, as we will see later.) In any case, the following fact is proved precisely in the same way as Proposition 2.3:

Proposition. If a function $f$ has a total differential at a point $\mathbf{a}$, and $\mathbf{v} \in \mathbb{R}^n$ is any vector, then the corresponding directional derivative exists and one has

$$\partial_{\mathbf{v}} f(\mathbf{a}) = \sum_{k=1}^{n} A_k v_k.$$
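The formula for the directional derivative, and its linearity in the vector, can be observed numerically. The sketch below uses a smooth function of two variables chosen by us for illustration; the coefficients $A_k$ are its partial derivatives at the point, computed by hand. A symmetric difference quotient replaces the one-sided quotient of the definition, which is a numerical convenience and an assumption of this sketch.

```python
import math

def f(x, y):
    return x ** 2 * y + math.sin(y)

def grad(x, y):
    return (2 * x * y, x ** 2 + math.cos(y))   # (A_1, A_2)

def directional(f, a, v, h=1e-6):
    """Symmetric difference quotient of t -> f(a + t v) at t = 0."""
    forward = f(a[0] + h * v[0], a[1] + h * v[1])
    backward = f(a[0] - h * v[0], a[1] - h * v[1])
    return (forward - backward) / (2 * h)

a, v = (1.0, 2.0), (3.0, -4.0)
A = grad(*a)
numeric = directional(f, a, v)
formula = A[0] * v[0] + A[1] * v[1]
print(numeric, formula)
```

Doubling the vector doubles the result, in line with the remark that the notion is linear in the vector and does not require it to have unit length.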

2.5

The formula

$$f(a_1 + h_1, \dots, a_n + h_n) - f(a_1, \dots, a_n) = f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) = \sum_{k=1}^{n} A_k h_k + \|\mathbf{h}\| \mu(\mathbf{h})$$

may be interpreted as saying that in a small neighborhood of $\mathbf{a}$, the function $f$ is well approximated by the affine function (see Appendix A, 5.9)

$$L(x_1, \dots, x_n) = f(a_1, \dots, a_n) + \sum A_k (x_k - a_k):$$

by the required properties of $\mu$, the error term is much smaller than the difference $\mathbf{x} - \mathbf{a}$.

In case of just one variable, there is no distinction between having a derivative at $a$ and having a total differential at the same point. In case of more than one variable, however, the difference between having all partial derivatives and having a total differential at a point is tremendous.

A function $f$ may have all partial derivatives in an open set without even being continuous there: in the example 1.1 (2), both partial derivatives exist everywhere. If we consider a single point, there are even much simpler examples, say the function $f$ defined by $f(x, 0) = f(0, y) = 0$ for all $x, y$, and $f(x, y) = 1$ otherwise. Then both $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ still exist at the point $(0, 0)$.

What is happening geometrically is this: If we think of a function $f$ as represented by its "graph", the hypersurface

$$S = \{(x_1, \dots, x_n, f(x_1, \dots, x_n)) \mid (x_1, \dots, x_n) \in D\} \subseteq \mathbb{R}^{n+1}, \tag{*}$$

the partial derivatives describe just the tangent lines in the directions of the coordinate axes, while a total differential guarantees the existence of an entire tangent hyperplane.

Possessing continuous partial derivatives is another matter, though.

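2.6 Theorem. Let $f$ have continuous partial derivatives in a neighborhood of a point $\mathbf{a}$. Then $f$ has a total differential at $\mathbf{a}$.

Proof. Let

$$\mathbf{h}^{(0)} = \mathbf{h},\quad \mathbf{h}^{(1)} = (0, h_2, \dots, h_n),\quad \mathbf{h}^{(2)} = (0, 0, h_3, \dots, h_n) \quad\text{etc.}$$

(so that $\mathbf{h}^{(n)} = \mathbf{o}$). Then we have

$$f(\mathbf{a} + \mathbf{h}) - f(\mathbf{a}) = \sum_{k=1}^{n} \bigl(f(\mathbf{a} + \mathbf{h}^{(k-1)}) - f(\mathbf{a} + \mathbf{h}^{(k)})\bigr) =: M.$$

The points $\mathbf{a} + \mathbf{h}^{(k-1)}$ and $\mathbf{a} + \mathbf{h}^{(k)}$ differ in the $k$-th coordinate only. By Lagrange's Theorem, there are $0 \le \theta_k \le 1$ such that

$$f(\mathbf{a} + \mathbf{h}^{(k-1)}) - f(\mathbf{a} + \mathbf{h}^{(k)}) = \frac{\partial f(\mathbf{c}^{(k)})}{\partial x_k} h_k, \quad\text{where } \mathbf{c}^{(k)} = (a_1, \dots, a_{k-1}, a_k + \theta_k h_k, a_{k+1} + h_{k+1}, \dots, a_n + h_n),$$

and hence we can proceed with

$$M = \sum \frac{\partial f(\mathbf{c}^{(k)})}{\partial x_k} h_k = \sum \frac{\partial f(\mathbf{a})}{\partial x_k} h_k + \sum \left(\frac{\partial f(\mathbf{c}^{(k)})}{\partial x_k} - \frac{\partial f(\mathbf{a})}{\partial x_k}\right) h_k = \sum \frac{\partial f(\mathbf{a})}{\partial x_k} h_k + \|\mathbf{h}\| \sum \left(\frac{\partial f(\mathbf{c}^{(k)})}{\partial x_k} - \frac{\partial f(\mathbf{a})}{\partial x_k}\right) \frac{h_k}{\|\mathbf{h}\|}.$$

Set

$$\mu(\mathbf{h}) = \sum \left(\frac{\partial f(\mathbf{c}^{(k)})}{\partial x_k} - \frac{\partial f(\mathbf{a})}{\partial x_k}\right) \frac{h_k}{\|\mathbf{h}\|}.$$

Since $\left|\frac{h_k}{\|\mathbf{h}\|}\right| \le 1$, since $\mathbf{c}^{(k)} \to \mathbf{a}$ for $\mathbf{h} \to \mathbf{o}$, and since the functions $\frac{\partial f}{\partial x_k}$ are continuous, $\lim_{\mathbf{h} \to \mathbf{o}} \mu(\mathbf{h}) = 0$. □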

2.7

Thus, focusing on an open set in the domain of a function, we may write schematically

$$\text{continuous PD} \ \Rightarrow\ \text{TD} \ \Rightarrow\ \text{PD}$$

(where PD stands for all partial derivatives and TD for total differential). Note that neither of the implications can be reversed. We have already discussed the second one; for the first one, recall that for functions of one variable the existence of a derivative at a point coincides with the existence of a total differential there, but a derivative is not necessarily a continuous function even when it exists at every point of an open set.

In the rest of this chapter, simply assuming that partial derivatives exist will almost never be enough. Sometimes the existence of the total differential will suffice, but more often than not we will assume the existence of continuous partial derivatives.

3 Composition of functions and the chain rule

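3.1 Theorem. Let $f(\mathbf{x})$ have a total differential in a point $\mathbf{a}$. Let real functions $g_k(t)$ have derivatives at a point $b$ and let $g_k(b) = a_k$ for all $k = 1, \dots, n$. Put

$$F(t) = f(\mathbf{g}(t)) = f(g_1(t), \dots, g_n(t)).$$

Then $F$ has a derivative in $b$, and

$$F'(b) = \sum_{k=1}^{n} \frac{\partial f(\mathbf{a})}{\partial x_k} \cdot g_k'(b).$$

Proof. Consider the formula 2.2.1. Applying it to our function $f$, we get

$$\begin{aligned} \frac{1}{h}(F(b + h) - F(b)) &= \frac{1}{h}\bigl(f(\mathbf{g}(b + h)) - f(\mathbf{g}(b))\bigr) \\ &= \frac{1}{h}\bigl(f(\mathbf{g}(b) + (\mathbf{g}(b + h) - \mathbf{g}(b))) - f(\mathbf{g}(b))\bigr) \\ &= \sum_{k=1}^{n} A_k \frac{g_k(b + h) - g_k(b)}{h} + \mu(\mathbf{g}(b + h) - \mathbf{g}(b)) \max_k \frac{|g_k(b + h) - g_k(b)|}{h}. \end{aligned}$$

Now $\lim_{h \to 0} \mu(\mathbf{g}(b + h) - \mathbf{g}(b)) = 0$ since the functions $g_k$ are continuous at $b$, and $\max_k \frac{|g_k(b + h) - g_k(b)|}{h}$ is bounded in a sufficiently small neighborhood of 0, since $g_k$ have derivatives. Thus, the limit of the last summand is zero and we have

$$\lim \frac{1}{h}(F(b + h) - F(b)) = \lim \sum_{k=1}^{n} A_k \frac{g_k(b + h) - g_k(b)}{h} = \sum_{k=1}^{n} A_k \lim \frac{g_k(b + h) - g_k(b)}{h} = \sum_{k=1}^{n} \frac{\partial f(\mathbf{a})}{\partial x_k} g_k'(b).$$ □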


3.1.1 Corollary. Let $f(\mathbf{x})$ have a total differential at a point $\mathbf{a}$. Let real functions $g_k(t_1, \dots, t_r)$ have partial derivatives at $\mathbf{b} = (b_1, \dots, b_r)$ and let $g_k(\mathbf{b}) = a_k$ for all $k = 1, \dots, n$. Then

$$(f \circ \mathbf{g})(t_1, \dots, t_r) = f(\mathbf{g}(\mathbf{t})) = f(g_1(\mathbf{t}), \dots, g_n(\mathbf{t}))$$

has all the partial derivatives at $\mathbf{b}$, and

$$\frac{\partial (f \circ \mathbf{g})(\mathbf{b})}{\partial t_j} = \sum_{k=1}^{n} \frac{\partial f(\mathbf{a})}{\partial x_k} \cdot \frac{\partial g_k(\mathbf{b})}{\partial t_j}.$$

3.1.2 Remark

The assumption of the existence of total differential in 3.1 is essential and it is easy to see why. Recall the geometric intuition from 2.5. The $n$-tuple of functions $\mathbf{g} = (g_1, \dots, g_n)$ represents a parametrized curve in $D$, and $f \circ \mathbf{g}$ is then a curve on the hypersurface $S$ of 2.5, (*). The partial derivatives of $f$, or the tangent lines of $S$ in the directions of the coordinate axes, have in general nothing to do with the behaviour on this curve.

3.2 What is the total differential?

The perceptive reader has noticed that in fact, while we defined what it means that a function has a total differential, we have not yet defined the total differential as an object. To remedy this, let us go one step further and consider in 3.1.1 a mapping $\mathbf{f} = (f_1, \dots, f_s)^T\colon D \to \mathbb{R}^s$. Take its composition $\mathbf{f} \circ \mathbf{g}$ with a mapping $\mathbf{g}\colon D' \to \mathbb{R}^n$ (recall the convention in 1.3). Then we get

$$\frac{\partial (f_i \circ \mathbf{g})}{\partial t_j} = \sum_k \frac{\partial f_i}{\partial x_k} \cdot \frac{\partial g_k}{\partial t_j}. \tag{3.2.1}$$

This formula is often referred to as the chain rule. It certainly has not escaped the reader's attention that the right-hand side is the product of matrices

$$\left(\frac{\partial f_i}{\partial x_k}\right)_{i,k} \left(\frac{\partial g_k}{\partial t_j}\right)_{k,j}.$$

Recall that the product of matrices is the matrix of the composition of the linear maps the matrices represent (see Theorem 7.6 of Appendix A).

In view of this, it is natural to define the total differential $D\mathbf{f}_{\mathbf{x}_0}\colon \mathbb{R}^n \to \mathbb{R}^s$ of the map $\mathbf{f}$ at a point $\mathbf{x}_0 \in D$ as the linear map

$$f_A\colon \mathbb{R}^n \to \mathbb{R}^s$$

associated with the matrix

$$A = \left.\left(\frac{\partial f_i(\mathbf{x})}{\partial x_j}\right)_{i,j}\right|_{\mathbf{x}_0}.$$

For the purposes of practical calculation, in fact, the map $D\mathbf{f}_{\mathbf{x}_0}$ and its associated matrix $A$ are often identified.

The chain rule can be then stated in the form

$$D(\mathbf{f} \circ \mathbf{g})_{\mathbf{v}_0} = D(\mathbf{f})_{\mathbf{g}(\mathbf{v}_0)} \circ D(\mathbf{g})_{\mathbf{v}_0}.$$

Compare it with the one variable rule

$$(f \circ g)'(t) = f'(g(t)) g'(t);$$

for $1 \times 1$ matrices we of course have $(a)(b) = (ab)$.

Note that additionally, the total differential at this point can be used to define an affine approximation $\mathbf{f}^{\mathrm{aff}}_{\mathbf{x}_0}$ of the map $\mathbf{f}$ at the point $\mathbf{x}_0$ (an affine map approximating $\mathbf{f}$ near $\mathbf{x}_0$, see Appendix A, 5.9):

$$\mathbf{f}^{\mathrm{aff}}_{\mathbf{x}_0}(\mathbf{x}) = \mathbf{f}(\mathbf{x}_0) + D\mathbf{f}_{\mathbf{x}_0}(\mathbf{x} - \mathbf{x}_0).$$
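The matrix form of the chain rule lends itself to a numerical check: the matrix of partial derivatives of a composition should agree with the product of the matrices of the two maps. The sketch below uses two maps of the plane chosen by us purely for illustration, approximates all partial derivatives by symmetric difference quotients, and multiplies the matrices by hand; the helper names `jacobian` and `matmul` are hypothetical.

```python
import math

def g(t):
    return [t[0] * t[1], t[0] + t[1] ** 2]

def f(x):
    return [x[0] ** 2 + x[1], math.sin(x[0]) * x[1]]

def jacobian(func, p, h=1e-6):
    """Matrix (d func_i / d p_j) by symmetric differences, as a list of rows."""
    m = len(func(p))
    cols = []
    for j in range(len(p)):
        plus = list(p)
        plus[j] += h
        minus = list(p)
        minus[j] -= h
        fp, fm = func(plus), func(minus)
        cols.append([(fp[i] - fm[i]) / (2 * h) for i in range(m)])
    return [[cols[j][i] for j in range(len(p))] for i in range(m)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

t0 = [0.5, -1.2]
lhs = jacobian(lambda t: f(g(t)), t0)                  # matrix of the composition
rhs = matmul(jacobian(f, g(t0)), jacobian(g, t0))      # product of the two matrices
print(lhs)
print(rhs)
```

Note that the matrix of the outer map is evaluated at the image point, as in the statement of the chain rule.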

3.3 Lagrange’s Formula in several variables

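Recall that a subset $D \subseteq \mathbb{R}^n$ is said to be convex if

$$\mathbf{x}, \mathbf{y} \in D \ \Rightarrow\ \forall t,\ 0 \le t \le 1,\ (1 - t)\mathbf{x} + t\mathbf{y} = \mathbf{x} + t(\mathbf{y} - \mathbf{x}) \in D.$$

Proposition. Let a real function $f$ have continuous partial derivatives in a convex open set $D \subseteq \mathbb{R}^n$. Then for any two points $\mathbf{x}, \mathbf{y} \in D$, there exists a $\theta$, $0 \le \theta \le 1$, such that

$$f(\mathbf{y}) - f(\mathbf{x}) = \sum_{j=1}^{n} \frac{\partial f(\mathbf{x} + \theta(\mathbf{y} - \mathbf{x}))}{\partial x_j} (y_j - x_j).$$

Proof. Set $F(t) = f(\mathbf{x} + t(\mathbf{y} - \mathbf{x}))$. Then $F = f \circ \mathbf{g}$ where $\mathbf{g}$ is defined by $g_j(t) = x_j + t(y_j - x_j)$, and

$$F'(t) = \sum_{j=1}^{n} \frac{\partial f(\mathbf{g}(t))}{\partial x_j} g_j'(t) = \sum_{j=1}^{n} \frac{\partial f(\mathbf{g}(t))}{\partial x_j} (y_j - x_j).$$

Hence by Lagrange's formula in one variable,

$$f(\mathbf{y}) - f(\mathbf{x}) = F(1) - F(0) = F'(\theta)$$

which yields the statement of the proposition. □

Remark. The formula is often used in the form

$$f(\mathbf{x} + \mathbf{h}) - f(\mathbf{x}) = \sum_{j=1}^{n} \frac{\partial f(\mathbf{x} + \theta\mathbf{h})}{\partial x_j} h_j.$$

Compare this with the formula for total differential.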

3.4

It may be of interest that the formula for the derivative of a product of single-variable functions is a consequence of the chain rule.

Set $h(u, v) = u \cdot v$ so that $\frac{\partial h}{\partial u} = v$ and $\frac{\partial h}{\partial v} = u$. Then

$$(f(x) g(x))' = \frac{\partial h(f(x), g(x))}{\partial u} f'(x) + \frac{\partial h(f(x), g(x))}{\partial v} g'(x) = g(x) f'(x) + f(x) g'(x).$$

4 Partial derivatives of higher order. Interchangeability

4.1

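Similarly to the second derivative of a function of one variable, we may consider partial derivatives of a partial derivative, i.e. of a function of the form $g(\mathbf{x}) = \frac{\partial f(\mathbf{x})}{\partial x_k}$,

$$\frac{\partial g(\mathbf{x})}{\partial x_l}.$$

The result, if it exists, is then denoted by

$$\frac{\partial^2 f(\mathbf{x})}{\partial x_k \partial x_l}.$$

More generally, we may iterate this process to obtain

$$\frac{\partial^r f(\mathbf{x})}{\partial x_{k_1} \partial x_{k_2} \cdots \partial x_{k_r}}.$$

These functions, when they exist, are called partial derivatives of order $r$. For example,

$$\frac{\partial^3 f(x, y, z)}{\partial x \partial y \partial z} \quad\text{and}\quad \frac{\partial^3 f(x, y, z)}{\partial x \partial x \partial x}$$

are derivatives of third order (even though in the first case, we have taken a partial derivative by each variable only once).

To simplify notation, taking partial derivatives by the same variable more than once consecutively may be indicated by an exponent, e.g.,

$$\frac{\partial^5 f(x, y)}{\partial x^2 \partial y^3} = \frac{\partial^5 f(x, y)}{\partial x \partial x \partial y \partial y \partial y}, \qquad \frac{\partial^5 f(x, y)}{\partial x^2 \partial y^2 \partial x} = \frac{\partial^5 f(x, y)}{\partial x \partial x \partial y \partial y \partial x}.$$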

4.2

Consider the function

\[ f(x, y) = x \sin(y^2 + x). \]

Compute

\[ \frac{\partial f(x, y)}{\partial x} = \sin(y^2 + x) + x\cos(y^2 + x) \quad\text{and}\quad \frac{\partial f(x, y)}{\partial y} = 2xy\cos(y^2 + x). \]

Computing the second-order derivatives, we obtain

\[ \frac{\partial^2 f}{\partial x\, \partial y} = 2y\cos(y^2 + x) - 2xy\sin(y^2 + x) = \frac{\partial^2 f}{\partial y\, \partial x}. \]

Whether it is surprising or not, it suggests a conjecture that higher order partial derivatives do not depend on the order of differentiation. In effect, this is true, provided all the derivatives in question are continuous.

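4.2.1 Proposition. Let $f(x, y)$ be a function such that the partial derivatives $\frac{\partial^2 f}{\partial x\, \partial y}$ and $\frac{\partial^2 f}{\partial y\, \partial x}$ are defined and continuous in a neighborhood of a point $(x, y)$. Then we have

\[ \frac{\partial^2 f(x, y)}{\partial x\, \partial y} = \frac{\partial^2 f(x, y)}{\partial y\, \partial x}. \]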


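Proof. Consider the function of a real variable $h$ defined by the formula

\[ F(h) = \frac{f(x + h, y + h) - f(x, y + h) - f(x + h, y) + f(x, y)}{h^2}. \]

If we set

\[ \varphi_h(y) = f(x + h, y) - f(x, y) \quad\text{and}\quad \psi_k(x) = f(x, y + k) - f(x, y), \]

we have

\[ F(h) = \frac{1}{h^2}\,(\varphi_h(y + h) - \varphi_h(y)) = \frac{1}{h^2}\,(\psi_h(x + h) - \psi_h(x)). \]

Let us compute the first expression. The function $\varphi_h$, which is a function of one variable $y$, has the derivative

\[ \varphi_h'(y) = \frac{\partial f(x + h, y)}{\partial y} - \frac{\partial f(x, y)}{\partial y} \]

and hence by 3.3, we have

\[ F(h) = \frac{1}{h^2}\,(\varphi_h(y + h) - \varphi_h(y)) = \frac{1}{h}\,\varphi_h'(y + \theta_1 h) = \frac{1}{h}\left(\frac{\partial f(x + h, y + \theta_1 h)}{\partial y} - \frac{\partial f(x, y + \theta_1 h)}{\partial y}\right). \]

Using 3.3 again, we obtain

\[ F(h) = \frac{\partial}{\partial x}\left(\frac{\partial f(x + \theta_2 h, y + \theta_1 h)}{\partial y}\right) \tag{*} \]

for some $\theta_1, \theta_2$ between 0 and 1.

Similarly, computing $\frac{1}{h^2}(\psi_h(x + h) - \psi_h(x))$, we obtain

\[ F(h) = \frac{\partial}{\partial y}\left(\frac{\partial f(x + \theta_4 h, y + \theta_3 h)}{\partial x}\right) \tag{**} \]

for some $\theta_3, \theta_4$ between 0 and 1.

Now since both $\frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right)$ and $\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right)$ are continuous at the point $(x, y)$, we can compute $\lim_{h \to 0} F(h)$ from either of the formulas (*) or (**) and obtain

\[ \lim_{h \to 0} F(h) = \frac{\partial^2 f(x, y)}{\partial x\, \partial y} = \frac{\partial^2 f(x, y)}{\partial y\, \partial x}. \qquad \square \]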


Remark. Look what happens: $F(h)$ (and its possible limit at 0) is an attempt to compute the second partial derivative in one step. The continuity of $\frac{\partial^2 f}{\partial x\, \partial y}$ and $\frac{\partial^2 f}{\partial y\, \partial x}$ makes sure that it is, in fact, possible.
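The quotient $F(h)$ from the proof of 4.2.1 can be evaluated numerically. The sketch below does this for the function $f(x, y) = x\sin(y^2 + x)$ of 4.2 and compares the values with the mixed partial derivative computed there; the sample point $(0.7, 0.4)$ and the step sizes are assumptions chosen for illustration.

```python
import math

def f(x, y):
    # The function from 4.2.
    return x * math.sin(y * y + x)

def mixed_partial_exact(x, y):
    # The mixed second-order derivative computed in 4.2.
    return 2.0 * y * math.cos(y * y + x) - 2.0 * x * y * math.sin(y * y + x)

def second_difference(x, y, h):
    # The quotient F(h) from the proof of Proposition 4.2.1.
    return (f(x + h, y + h) - f(x, y + h) - f(x + h, y) + f(x, y)) / (h * h)

x0, y0 = 0.7, 0.4
exact = mixed_partial_exact(x0, y0)
steps = (1e-1, 1e-2, 1e-3)
errors = [abs(second_difference(x0, y0, h) - exact) for h in steps]
for h, err in zip(steps, errors):
    print(h, err)
```

The error shrinks roughly in proportion to $h$, which is consistent with $F(h)$ converging to the common value of the two mixed derivatives as $h \to 0$.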

4.3

Iterating the interchanges allowed by 4.2.1, we easily obtain, as a corollary,

Theorem. Let a function $f$ of $n$ variables possess continuous partial derivatives up to the order $k$. Then the values of these derivatives depend only on the number of times a partial derivative is taken in each of the individual variables $x_1, \dots, x_n$.

4.3.1

Thus, under the assumption of the theorem, we can write a general partial derivative of the order $r \le k$ as

\[ \frac{\partial^r f}{\partial x_1^{r_1}\, \partial x_2^{r_2} \cdots \partial x_n^{r_n}} \quad\text{with}\quad r_1 + r_2 + \cdots + r_n = r \]

where, of course, $r_j = 0$ is allowed and indicates the absence of the symbol $\partial x_j$.

5 The Implicit Functions Theorem I: The case of a single equation

5.1

Suppose we have a function of $n + 1$ variables, which we will write as

\[ F(\mathbf{x}, y), \]

and consider the problem of finding a solution $y = f(\mathbf{x})$ of the equation

\[ F(\mathbf{x}, y) = 0. \tag{5.1.1} \]

Even in very simple cases we can hardly expect a unique solution. Take for example $F(x, y) = x^2 + y^2 - 1$. Then for $|x| > 1$ there is no solution. For $|x_0| < 1$, for some open interval containing $x_0$, we have two solutions

\[ f(x) = \sqrt{1 - x^2} \quad\text{and}\quad g(x) = -\sqrt{1 - x^2}. \]

This is better, but we have two values in each point, contradicting the definition of a function. To achieve uniqueness, we have to restrict not only the values of $x$, but also the values of $y$ to an interval $(y_0 - \Delta, y_0 + \Delta)$ (where $F(x_0, y_0) = 0$). That is, if we have a particular solution $(x_0, y_0)$ we must restrict our attention to a “window”

\[ (x_0 - \delta, x_0 + \delta) \times (y_0 - \Delta, y_0 + \Delta) \]

through which we see a unique solution.

In our example, there is also the case $(x_0, y_0) = (1, 0)$, where there is a unique solution, but no suitable window as above, since in every neighborhood of $(1, 0)$, there are no solutions on the right-hand side of $(1, 0)$, and two solutions to the left. In another example

\[ y^2 - |x| = 0, \]

the solution $(0, 0)$ can be extended indefinitely both ways, but still there is no neighborhood of $(0, 0)$ in which there would be a unique solution.

5.2

Actually, the above examples cover more or less all the exceptions that can occur for “reasonable” functions $F$.

Theorem. Let $F(\mathbf{x}, y)$ be a function of $n + 1$ variables defined in a neighborhood of a point $(\mathbf{x}^0, y_0)$. Let $F$ have continuous partial derivatives up to the order $r \ge 1$ and let

\[ F(\mathbf{x}^0, y_0) = 0 \quad\text{and}\quad \left|\frac{\partial F(\mathbf{x}^0, y_0)}{\partial y}\right| \ne 0. \]

Then there exist $\delta > 0$ and $\Delta > 0$ such that for every $\mathbf{x}$ with $\|\mathbf{x} - \mathbf{x}^0\| < \delta$ there exists precisely one $y$ with $|y - y_0| < \Delta$ such that

\[ F(\mathbf{x}, y) = 0. \]

Furthermore, if we write $y = f(\mathbf{x})$ for this unique value $y$, then the function

\[ f: (x^0_1 - \delta, x^0_1 + \delta) \times \cdots \times (x^0_n - \delta, x^0_n + \delta) \to \mathbb{R} \]

has continuous partial derivatives up to the order $r$.

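Proof. As before, we write $\|\mathbf{x}\| = \max_i |x_i|$. Let

\[ J(<\gamma) = \{\mathbf{x} \mid \|\mathbf{x} - \mathbf{x}^0\| < \gamma\} \quad\text{and}\quad J(\gamma) = \{\mathbf{x} \mid \|\mathbf{x} - \mathbf{x}^0\| \le \gamma\} \]

(thus, the “window” interval we are seeking is $J(<\delta) \times (y_0 - \Delta, y_0 + \Delta)$).

Without loss of generality, let, say,

\[ \frac{\partial F(\mathbf{x}^0, y_0)}{\partial y} > 0. \]

Since the first partial derivatives of $F$ are continuous, there exist $a > 0$, $K$, $\delta_1 > 0$ and $\Delta > 0$ such that for all $(\mathbf{x}, y) \in J(\delta_1) \times \langle y_0 - \Delta, y_0 + \Delta\rangle$, we have

\[ \frac{\partial F(\mathbf{x}, y)}{\partial y} \ge a \quad\text{and}\quad \left|\frac{\partial F(\mathbf{x}, y)}{\partial x_i}\right| \le K \tag{5.2.1} \]

(use Theorem 6.6 of Chapter 2).

I. The function $f$: For fixed $\mathbf{x} \in J(\delta_1)$, we will consider the function of one variable $y \in (y_0 - \Delta, y_0 + \Delta)$ defined by

\[ \varphi_{\mathbf{x}}(y) = F(\mathbf{x}, y). \]

Thus, $\varphi_{\mathbf{x}}'(y) = \frac{\partial F(\mathbf{x}, y)}{\partial y} > 0$ and hence all $\varphi_{\mathbf{x}}(y)$ with $\mathbf{x} \in J(\delta_1)$ are increasing functions of $y$, and $\varphi_{\mathbf{x}^0}(y_0 - \Delta) < \varphi_{\mathbf{x}^0}(y_0) = 0 < \varphi_{\mathbf{x}^0}(y_0 + \Delta)$.

By 2.6 and 2.3, $F$ is continuous, and hence there is a $\delta$, $0 < \delta \le \delta_1$, such that

\[ \forall \mathbf{x} \in J(<\delta), \quad \varphi_{\mathbf{x}}(y_0 - \Delta) < 0 < \varphi_{\mathbf{x}}(y_0 + \Delta). \]

Now by Theorem 3.3 of Chapter 1, there is precisely one $y \in (y_0 - \Delta, y_0 + \Delta)$ ($\varphi_{\mathbf{x}}$ is one-to-one since it is increasing) such that $\varphi_{\mathbf{x}}(y) = 0$, that is, $F(\mathbf{x}, y) = 0$. Define this to be $f(\mathbf{x})$.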

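II. The first derivatives. We will fix an index $j$, abbreviate the $(j-1)$-dimensional vector $x_1, \dots, x_{j-1}$ by $\mathbf{x}_b$ (“the $x_i$’s before”) and the $(n-j)$-dimensional vector $x_{j+1}, \dots, x_n$ by $\mathbf{x}_a$ (“the $x_i$’s after”); thus, we may write

\[ \mathbf{x} = (\mathbf{x}_b, x_j, \mathbf{x}_a). \]

Compute $\frac{\partial f}{\partial x_j}$ as the derivative of $\psi(t) = f(\mathbf{x}_b, t, \mathbf{x}_a)$.

By 3.3, we have

\begin{align*}
0 &= F(\mathbf{x}_b, t + h, \mathbf{x}_a, \psi(t + h)) - F(\mathbf{x}_b, t, \mathbf{x}_a, \psi(t)) \\
&= F(\mathbf{x}_b, t + h, \mathbf{x}_a, \psi(t) + (\psi(t + h) - \psi(t))) - F(\mathbf{x}_b, t, \mathbf{x}_a, \psi(t)) \\
&= \frac{\partial F(\mathbf{x}_b, t + \theta h, \mathbf{x}_a, \psi(t) + \theta(\psi(t + h) - \psi(t)))}{\partial x_j}\, h \\
&\quad + \frac{\partial F(\mathbf{x}_b, t + \theta h, \mathbf{x}_a, \psi(t) + \theta(\psi(t + h) - \psi(t)))}{\partial y}\,(\psi(t + h) - \psi(t))
\end{align*}

and hence

\[ \psi(t + h) - \psi(t) = -h \cdot \frac{\dfrac{\partial F(\mathbf{x}_b, t + \theta h, \mathbf{x}_a, \psi(t) + \theta(\psi(t + h) - \psi(t)))}{\partial x_j}}{\dfrac{\partial F(\mathbf{x}_b, t + \theta h, \mathbf{x}_a, \psi(t) + \theta(\psi(t + h) - \psi(t)))}{\partial y}} \tag{5.2.2} \]

for some $\theta$ between 0 and 1.

Thus by (5.2.1),

\[ |\psi(t + h) - \psi(t)| \le |h| \cdot \left|\frac{K}{a}\right| \]

and $f$ is continuous (note that we have not known that before). Using this fact, we can compute from (5.2.2)

\begin{align*}
\lim_{h \to 0} \frac{\psi(t + h) - \psi(t)}{h} &= -\lim_{h \to 0} \frac{\dfrac{\partial F(\mathbf{x}_b, t + \theta h, \mathbf{x}_a, \psi(t) + \theta(\psi(t + h) - \psi(t)))}{\partial x_j}}{\dfrac{\partial F(\mathbf{x}_b, t + \theta h, \mathbf{x}_a, \psi(t) + \theta(\psi(t + h) - \psi(t)))}{\partial y}} \\
&= -\frac{\dfrac{\partial F(\mathbf{x}_b, t, \mathbf{x}_a, \psi(t))}{\partial x_j}}{\dfrac{\partial F(\mathbf{x}_b, t, \mathbf{x}_a, \psi(t))}{\partial y}}.
\end{align*}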

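III. The higher derivatives. Note that we have not only proved the existence of the first derivative of $f$, but also the formula

\[ \frac{\partial f(\mathbf{x})}{\partial x_j} = -\frac{\partial F(\mathbf{x}, f(\mathbf{x}))}{\partial x_j} \cdot \left(\frac{\partial F(\mathbf{x}, f(\mathbf{x}))}{\partial y}\right)^{-1}. \tag{5.2.3} \]

From this we can inductively compute the higher derivatives of $f$ (using the standard rules of differentiation) as long as the derivatives

\[ \frac{\partial^r F}{\partial x_1^{r_1} \cdots \partial x_n^{r_n}\, \partial y^{r_{n+1}}} \]

exist and are continuous. $\square$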


5.3

We have obtained the formula (5.2.3) while proving that $f$ has a derivative. If we knew beforehand that $f$ has a derivative, we could deduce (5.2.3) immediately from the chain rule. In effect, we have

\[ 0 \equiv F(\mathbf{x}, f(\mathbf{x})); \]

taking a derivative of both sides we obtain

\[ 0 = \frac{\partial F(\mathbf{x}, f(\mathbf{x}))}{\partial x_j} + \frac{\partial F(\mathbf{x}, f(\mathbf{x}))}{\partial y} \cdot \frac{\partial f(\mathbf{x})}{\partial x_j}. \]

Differentiating further, we obtain inductively linear equations from which we can compute the values of all the derivatives guaranteed by the theorem.
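The construction in the proof of 5.2 and the formula (5.2.3) can be tried out numerically. The following sketch uses the example $F(x, y) = x^2 + y^2 - 1$ from 5.1 near the solution $(0.6, 0.8)$: inside a window in $y$ where $F(x, \cdot)$ is increasing, the unique zero is located by bisection, and the derivative of the resulting $f$ is compared with $-\frac{\partial F}{\partial x}\big/\frac{\partial F}{\partial y}$. The base point, the window half-width `Delta = 0.3`, and the helper names are assumptions made for illustration.

```python
import math

def F(x, y):
    # The example from 5.1.
    return x * x + y * y - 1.0

def dF_dx(x, y):
    return 2.0 * x

def dF_dy(x, y):
    return 2.0 * y

def implicit_f(x, y0=0.8, Delta=0.3, iters=80):
    # F(x, .) is increasing on the window because dF/dy = 2y > 0 there,
    # so the zero in (y0 - Delta, y0 + Delta) is unique; find it by bisection.
    lo, hi = y0 - Delta, y0 + Delta
    assert F(x, lo) < 0.0 < F(x, hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if F(x, mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x = 0.65
y = implicit_f(x)
h = 1e-5
numeric_derivative = (implicit_f(x + h) - implicit_f(x - h)) / (2.0 * h)
formula_derivative = -dF_dx(x, y) / dF_dy(x, y)   # formula (5.2.3)
print(y, math.sqrt(1.0 - x * x))
print(numeric_derivative, formula_derivative)
```

The bisection mirrors step I of the proof (one zero of an increasing function on the window); the comparison of the two derivatives mirrors step II.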

5.4 Remark

The solution $f$ in 5.2 has as many derivatives as the initial $F$. But note the restriction $r \ge 1$. One usually thinks of the 0-th derivative as the function itself. The theorem does not guarantee a continuous solution $f$ of an equation $F(\mathbf{x}, f(\mathbf{x})) = 0$ with continuous $F$. Even just for the existence of $f$ we have used the first derivatives.

6 The Implicit Functions Theorem II: The case of several equations

6.1 A warm-up: what happens in the case of two equations

Suppose we try to find a solution $y_i = f_i(\mathbf{x})$, $i = 1, 2$, of a pair of equations

\begin{align*}
F_1(\mathbf{x}, y_1, y_2) &= 0, \\
F_2(\mathbf{x}, y_1, y_2) &= 0
\end{align*}

in a neighborhood of a point $(\mathbf{x}^0, y^0_1, y^0_2)$ (at which the equalities hold). We will apply the “substitution method” based on Theorem 5.2. First we will think of the second equation as an equation for the unknown $y_2$; in a neighborhood of $(\mathbf{x}^0, y^0_1, y^0_2)$ we obtain $y_2$ as a function $\psi(\mathbf{x}, y_1)$. Substitute this into the first equation to obtain

\[ G(\mathbf{x}, y_1) = F_1(\mathbf{x}, y_1, \psi(\mathbf{x}, y_1)); \]

if we find, in a neighborhood of $(\mathbf{x}^0, y^0_1)$, a solution $y_1 = f_1(\mathbf{x})$, we can substitute it into $\psi$ and obtain $y_2 = f_2(\mathbf{x}) = \psi(\mathbf{x}, f_1(\mathbf{x}))$.


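What did we have to assume? First, of course, we have to have the continuous partial derivatives of the functions $F_i$. Then, to be able to obtain $\psi$ by 5.2 the way we did, we need to have

\[ \frac{\partial F_2}{\partial y_2}(\mathbf{x}^0, y^0_1, y^0_2) \ne 0. \tag{6.1.1} \]

Finally, we also need to have

\[ \frac{\partial G}{\partial y_1}(\mathbf{x}^0, y^0_1) \ne 0; \]

by 3.1.1, this is equivalent to

\[ \frac{\partial F_1}{\partial y_1} + \frac{\partial F_1}{\partial y_2}\,\frac{\partial \psi}{\partial y_1} \ne 0. \tag{6.1.2} \]

Now we have (recall (5.2.3), applied to the equation $F_2 = 0$ solved for $y_2$)

\[ \frac{\partial \psi}{\partial y_1} = -\left(\frac{\partial F_2}{\partial y_2}\right)^{-1} \frac{\partial F_2}{\partial y_1} \]

and (6.1.2) becomes

\[ \left(\frac{\partial F_2}{\partial y_2}\right)^{-1} \left(\frac{\partial F_1}{\partial y_1}\,\frac{\partial F_2}{\partial y_2} - \frac{\partial F_1}{\partial y_2}\,\frac{\partial F_2}{\partial y_1}\right) \ne 0, \]

that is,

\[ \frac{\partial F_1}{\partial y_1}\,\frac{\partial F_2}{\partial y_2} - \frac{\partial F_1}{\partial y_2}\,\frac{\partial F_2}{\partial y_1} \ne 0. \]

This formula should be conspicuously familiar. Indeed, it is (see the notation for determinants from Subsection 3.3 of Appendix B)

\[ \begin{vmatrix} \dfrac{\partial F_1}{\partial y_1} & \dfrac{\partial F_1}{\partial y_2} \\[2ex] \dfrac{\partial F_2}{\partial y_1} & \dfrac{\partial F_2}{\partial y_2} \end{vmatrix} = \det\left(\frac{\partial F_i}{\partial y_j}\right)_{i,j} \ne 0. \tag{6.1.3} \]

Note that if we assume that this determinant is non-zero, then at least one of

\[ \frac{\partial F_2}{\partial y_2}(\mathbf{x}^0, y^0_1, y^0_2) \ne 0 \quad\text{and}\quad \frac{\partial F_2}{\partial y_1}(\mathbf{x}^0, y^0_1, y^0_2) \ne 0 \]

holds, so if the latter holds, we can start by solving $F_2(\mathbf{x}, y_1, y_2) = 0$ for $y_1$ instead of $y_2$. Thus the condition (6.1.3) suffices.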
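The substitution method of 6.1 can be followed step by step on a small system. In the sketch below, the system $F_1 = y_1 + y_2 - x$, $F_2 = y_1 y_2 - 1$, the base point $(x^0, y^0_1, y^0_2) = (2.5, 2, 0.5)$, and the windows used for bisection are all assumptions made for illustration: first $F_2 = 0$ is solved for $y_2 = \psi(x, y_1)$, then $G(x, y_1) = F_1(x, y_1, \psi(x, y_1)) = 0$ is solved for $y_1$.

```python
def bisect(g, lo, hi, iters=80):
    # Zero of an increasing function g with g(lo) < 0 < g(hi).
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def F1(x, y1, y2):
    return y1 + y2 - x

def F2(x, y1, y2):
    return y1 * y2 - 1.0

def psi(x, y1):
    # Solve F2 = 0 for y2 in the window (0.2, 0.8); dF2/dy2 = y1 > 0 there.
    return bisect(lambda y2: F2(x, y1, y2), 0.2, 0.8)

def G(x, y1):
    # Substitute psi into the first equation.
    return F1(x, y1, psi(x, y1))

def solve(x):
    # Solve G = 0 for y1 in the window (1.5, 2.5), then recover y2.
    y1 = bisect(lambda t: G(x, t), 1.5, 2.5)
    return y1, psi(x, y1)

def jacobian_det(x, y1, y2):
    # det of (dFi/dyj): rows (1, 1) and (y2, y1), as in (6.1.3).
    return 1.0 * y1 - 1.0 * y2

y1, y2 = solve(2.6)
print(y1, y2, F1(2.6, y1, y2), F2(2.6, y1, y2))
print(jacobian_det(2.5, 2.0, 0.5))
```

At the base point the determinant (6.1.3) equals $1.5 \ne 0$ and $\frac{\partial F_2}{\partial y_2} = 2 \ne 0$, so both steps of the substitution are available there.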

6.2 The Jacobian

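For a system of functions

\[ \mathbf{F}(\mathbf{x}, \mathbf{y}) = (F_1(\mathbf{x}, y_1, \dots, y_m), \dots, F_m(\mathbf{x}, y_1, \dots, y_m)) \]

and variables $y_1, \dots, y_m$, define the Jacobi determinant (briefly, the Jacobian)

\[ \frac{D(\mathbf{F})}{D(\mathbf{y})} = \det\left(\frac{\partial F_i}{\partial y_j}\right)_{i,j = 1, \dots, m}. \]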

6.3

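By extending the substitution procedure indicated in 6.1, we will now prove the general Implicit Function Theorem.

Theorem. Let $F_i(\mathbf{x}, y_1, \dots, y_m)$, $i = 1, \dots, m$, be functions of $n + m$ variables with continuous partial derivatives up to an order $k \ge 1$. Let

\[ \mathbf{F}(\mathbf{x}^0, \mathbf{y}^0) = \mathbf{o} \]

and let

\[ \frac{D(\mathbf{F})}{D(\mathbf{y})}(\mathbf{x}^0, \mathbf{y}^0) \ne 0. \]

Then there exist $\delta > 0$ and $\Delta > 0$ such that for every

\[ \mathbf{x} \in (x^0_1 - \delta, x^0_1 + \delta) \times \cdots \times (x^0_n - \delta, x^0_n + \delta) \]

there exists precisely one

\[ \mathbf{y} \in (y^0_1 - \Delta, y^0_1 + \Delta) \times \cdots \times (y^0_m - \Delta, y^0_m + \Delta) \]

such that

\[ \mathbf{F}(\mathbf{x}, \mathbf{y}) = \mathbf{o}. \]

Furthermore, if we write this $\mathbf{y}$ as a vector function $\mathbf{f}(\mathbf{x}) = (f_1(\mathbf{x}), \dots, f_m(\mathbf{x}))$, then the functions $f_i$ have continuous partial derivatives up to the order $k$.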


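Proof. We proceed by induction. By Theorem 5.2, the statement holds for $m = 1$. Now assume it holds for a given $m$, and let us have a system of equations

\[ F_i(\mathbf{x}, \mathbf{y}) = 0, \quad i = 1, \dots, m + 1 \]

satisfying the assumptions above (i.e. the unknown vector $\mathbf{y}$ is $(m+1)$-dimensional). Then, in particular, in the Jacobian determinant we cannot have a column consisting entirely of zeros, and hence, after possibly renumbering the $F_i$’s, we may assume without loss of generality that

\[ \frac{\partial F_{m+1}}{\partial y_{m+1}}(\mathbf{x}^0, \mathbf{y}^0) \ne 0. \]

If we write $\tilde{\mathbf{y}} = (y_1, \dots, y_m)$, we then have by the induction hypothesis $\delta_1 > 0$ and $\Delta_1 > 0$ such that for

\[ (\mathbf{x}, \tilde{\mathbf{y}}) \in (x^0_1 - \delta_1, x^0_1 + \delta_1) \times \cdots \times (x^0_n - \delta_1, x^0_n + \delta_1) \times (y^0_1 - \delta_1, y^0_1 + \delta_1) \times \cdots \times (y^0_m - \delta_1, y^0_m + \delta_1), \]

there exists precisely one $y_{m+1} = \psi(\mathbf{x}, \tilde{\mathbf{y}})$ satisfying

\[ F_{m+1}(\mathbf{x}, \tilde{\mathbf{y}}, y_{m+1}) = 0 \quad\text{and}\quad |y_{m+1} - y^0_{m+1}| < \Delta_1. \]

This $\psi$ has continuous partial derivatives up to the order $k$ and hence so have the functions

\[ G_i(\mathbf{x}, \tilde{\mathbf{y}}) = F_i(\mathbf{x}, \tilde{\mathbf{y}}, \psi(\mathbf{x}, \tilde{\mathbf{y}})), \quad i = 1, \dots, m + 1 \]

(the last of which, $G_{m+1}$, is identically zero). By 3.1.1, we then have

\[ \frac{\partial G_j}{\partial y_i} = \frac{\partial F_j}{\partial y_i} + \frac{\partial F_j}{\partial y_{m+1}}\,\frac{\partial \psi}{\partial y_i}. \tag{6.3.1} \]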

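Now consider the determinant

\[ \frac{D(\mathbf{F})}{D(\mathbf{y})} = \begin{vmatrix} \dfrac{\partial F_1}{\partial y_1} & \cdots & \dfrac{\partial F_1}{\partial y_m} & \dfrac{\partial F_1}{\partial y_{m+1}} \\ \vdots & & \vdots & \vdots \\ \dfrac{\partial F_m}{\partial y_1} & \cdots & \dfrac{\partial F_m}{\partial y_m} & \dfrac{\partial F_m}{\partial y_{m+1}} \\[2ex] \dfrac{\partial F_{m+1}}{\partial y_1} & \cdots & \dfrac{\partial F_{m+1}}{\partial y_m} & \dfrac{\partial F_{m+1}}{\partial y_{m+1}} \end{vmatrix}. \]

Add to the $i$-th column the product of the last column with the scalar $\frac{\partial \psi}{\partial y_i}$. By (6.3.1), taking into account the fact that $G_{m+1} \equiv 0$ and hence

\[ \frac{\partial G_{m+1}}{\partial y_i} = \frac{\partial F_{m+1}}{\partial y_i} + \frac{\partial F_{m+1}}{\partial y_{m+1}}\,\frac{\partial \psi}{\partial y_i} = 0, \]

we obtain

\[ \frac{D(\mathbf{F})}{D(\mathbf{y})} = \begin{vmatrix} \dfrac{\partial G_1}{\partial y_1} & \cdots & \dfrac{\partial G_1}{\partial y_m} & \dfrac{\partial F_1}{\partial y_{m+1}} \\ \vdots & & \vdots & \vdots \\ \dfrac{\partial G_m}{\partial y_1} & \cdots & \dfrac{\partial G_m}{\partial y_m} & \dfrac{\partial F_m}{\partial y_{m+1}} \\[2ex] 0 & \cdots & 0 & \dfrac{\partial F_{m+1}}{\partial y_{m+1}} \end{vmatrix} = \frac{\partial F_{m+1}}{\partial y_{m+1}} \cdot \frac{D(G_1, \dots, G_m)}{D(y_1, \dots, y_m)}. \]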

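Thus,

\[ \frac{D(G_1, \dots, G_m)}{D(y_1, \dots, y_m)} \ne 0 \]

and hence by the induction hypothesis there are $\delta_2 > 0$, $\Delta_2 > 0$ such that for $|x_i - x^0_i| < \delta_2$ there is a uniquely determined $\tilde{\mathbf{y}}$ with $|y_i - y^0_i| < \Delta_2$ such that

\[ G_i(\mathbf{x}, \tilde{\mathbf{y}}) = 0 \quad\text{for } i = 1, \dots, m \]

and that the resulting $f_i(\mathbf{x})$ have continuous partial derivatives up to the order $k$. If we define, further,

\[ f_{m+1}(\mathbf{x}) = \psi(\mathbf{x}, f_1(\mathbf{x}), \dots, f_m(\mathbf{x})) \]

we obtain a solution $\mathbf{f}$ of the original system of equations $\mathbf{F}(\mathbf{x}, \mathbf{y}) = \mathbf{o}$.

The proof is almost finished but not quite. What about the uniqueness of the solution within the constraints $\|\mathbf{x} - \mathbf{x}^0\| < \delta$ and $\|\mathbf{y} - \mathbf{y}^0\| < \Delta$? Does uniqueness in the two steps of the proof above (solving $F_{m+1}(\mathbf{x}, y_1, \dots, y_m, y_{m+1}) = 0$ for $y_{m+1}$, and then $\mathbf{G}(\mathbf{x}, \tilde{\mathbf{y}}) = \mathbf{o}$ for $y_1, \dots, y_m$) really guarantee that a different solution cannot be found by some other procedure (e.g. reversing the order of variables)? Luckily, in this particular proof, this turns out not to be a serious problem.

Choose $0 < \Delta \le \delta_1, \Delta_1, \Delta_2$ and then $0 < \delta < \delta_1, \delta_2$ and, moreover, sufficiently small so that for $|x_i - x^0_i| < \delta$ we have $|f_j(\mathbf{x}) - f_j(\mathbf{x}^0)| < \Delta$ (the last to make sure that the $\Delta$-interval contains at least one solution). Now let

\[ \mathbf{F}(\mathbf{x}, \mathbf{y}) = \mathbf{o}, \quad \|\mathbf{x} - \mathbf{x}^0\| < \delta \quad\text{and}\quad \|\mathbf{y} - \mathbf{y}^0\| < \Delta. \tag{6.3.2} \]

We have to prove that then necessarily $y_i = f_i(\mathbf{x})$ for all $i$. Since $|x_i - x^0_i| < \delta \le \delta_1$ for $i = 1, \dots, n$, $|y_i - y^0_i| < \Delta \le \delta_1$ for $i = 1, \dots, m$ and $|y_{m+1} - y^0_{m+1}| < \Delta \le \Delta_1$ we have, necessarily, $y_{m+1} = \psi(\mathbf{x}, \tilde{\mathbf{y}})$. Thus, by (6.3.2),

\[ \mathbf{G}(\mathbf{x}, \tilde{\mathbf{y}}) = \mathbf{o} \]

and since $|x_i - x^0_i| < \delta \le \delta_2$ and $|y_i - y^0_i| < \Delta \le \Delta_2$ we have indeed $y_i = f_i(\mathbf{x})$. $\square$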

7 An easy application: regular mappings and the Inverse Function Theorem

7.1

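Let $U \subseteq \mathbb{R}^n$ be an open set. A mapping $\mathbf{f}: U \to \mathbb{R}^n$ is said to be regular if each $f_i$ has continuous partial derivatives $\frac{\partial f_i}{\partial x_j}$ and if for all $\mathbf{x} \in U$, we have

\[ \frac{D(\mathbf{f})}{D(\mathbf{x})}(\mathbf{x}) \ne 0. \]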

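7.2 Proposition. Let $\mathbf{f}: U \to \mathbb{R}^n$ be a regular mapping. Then the image $\mathbf{f}[V]$ of every open $V \subseteq U$ is open.

Proof. Let $\mathbf{f}(\mathbf{x}^0) = \mathbf{y}^0$. Define $\mathbf{F}: V \times \mathbb{R}^n \to \mathbb{R}^n$ by setting

\[ F_i(\mathbf{x}, \mathbf{y}) = f_i(\mathbf{x}) - y_i. \tag{7.2.1} \]

Thus $\mathbf{F}(\mathbf{x}^0, \mathbf{y}^0) = \mathbf{o}$ and $\frac{D(\mathbf{F})}{D(\mathbf{x})} \ne 0$, and hence, by 6.3, there exist $\delta > 0$ and $\Delta > 0$ such that for every $\mathbf{y}$ with $\|\mathbf{y} - \mathbf{y}^0\| < \delta$, there exists (precisely one, but this is not important at this moment) $\mathbf{x}$ with $\|\mathbf{x} - \mathbf{x}^0\| < \Delta$ and $F_i(\mathbf{x}, \mathbf{y}) = f_i(\mathbf{x}) - y_i = 0$. This means that we have $\mathbf{f}(\mathbf{x}) = \mathbf{y}$ (note that the roles of the $x_i$ and the $y_i$ are reversed from the usual convention: here, the $y_i$ are the independent variables). Thus, we have

\[ \Omega(\mathbf{y}^0, \delta) = \{\mathbf{y} \mid \|\mathbf{y} - \mathbf{y}^0\| < \delta\} \subseteq \mathbf{f}[V]. \qquad \square \]

Remark. Compare this fact with the characterization of continuous maps in Theorem 3.4 of Chapter 2: for regular maps, both images and preimages of open sets are open.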

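7.3 Proposition. Let $\mathbf{f}: U \to \mathbb{R}^n$ be a regular mapping. Then for each $\mathbf{x}^0 \in U$ there exists an open neighborhood $V$ such that the restriction $\mathbf{f}|_V$ is one-to-one. Moreover, the mapping $\mathbf{g}: \mathbf{f}[V] \to \mathbb{R}^n$ inverse to $\mathbf{f}|_V$ is regular.

Proof. Consider the $\mathbf{F}$ from (7.2.1) again. We have, for a sufficiently small $\Delta > 0$, precisely one $\mathbf{x} = \mathbf{g}(\mathbf{y})$ such that $\mathbf{F}(\mathbf{x}, \mathbf{y}) = \mathbf{o}$ and $\|\mathbf{x} - \mathbf{x}^0\| < \Delta$. This $\mathbf{g}$ has, furthermore, continuous partial derivatives. We have, by 3.2,

\[ D(\mathrm{id}) = D(\mathbf{f} \circ \mathbf{g}) = D\mathbf{f} \cdot D\mathbf{g}. \]

By the chain rule,

\[ \frac{D(\mathbf{f})}{D(\mathbf{x})} \cdot \frac{D(\mathbf{g})}{D(\mathbf{y})} = \det D\mathbf{f} \cdot \det D\mathbf{g} = 1 \]

and hence for each $\mathbf{y} \in \mathbf{f}[V]$, $\frac{D(\mathbf{g})}{D(\mathbf{y})}(\mathbf{y}) \ne 0$. $\square$

7.3.1 Corollary. If a regular mapping $\mathbf{f}: U \to \mathbb{R}^n$ is one-to-one, then the inverse $\mathbf{g}: \mathbf{f}[U] \to \mathbb{R}^n$ is regular as well.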
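The identity $\det D\mathbf{f} \cdot \det D\mathbf{g} = 1$ from the proof of 7.3 can be checked numerically for a concrete regular map. In the sketch below, the map $\mathbf{f}(x_1, x_2) = (x_1 + x_2^2,\ x_1^2 - x_2)$ near $(1, 1)$ is an assumption chosen for illustration, and the local inverse $\mathbf{g}$ is computed by Newton's method, which is a numerical device and not the construction used in the text.

```python
def f(x1, x2):
    # Sample map; its Jacobian determinant is -1 - 4*x1*x2, non-zero near (1, 1).
    return (x1 + x2 * x2, x1 * x1 - x2)

def jac_f(x1, x2):
    return ((1.0, 2.0 * x2), (2.0 * x1, -1.0))

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def g(y1, y2, start=(1.0, 1.0), iters=50):
    # Local inverse of f near (1, 1): Newton iteration for f(x) = y.
    x1, x2 = start
    for _ in range(iters):
        v1, v2 = f(x1, x2)
        r1, r2 = v1 - y1, v2 - y2
        (a, b), (c, d) = jac_f(x1, x2)
        det = a * d - b * c
        # Solve the 2x2 system J s = r by Cramer's rule.
        s1 = (d * r1 - b * r2) / det
        s2 = (-c * r1 + a * r2) / det
        x1, x2 = x1 - s1, x2 - s2
    return (x1, x2)

def jac_g(y1, y2, h=1e-5):
    # Central differences of the numerically computed inverse.
    p1, m1 = g(y1 + h, y2), g(y1 - h, y2)
    p2, m2 = g(y1, y2 + h), g(y1, y2 - h)
    return (((p1[0] - m1[0]) / (2 * h), (p2[0] - m2[0]) / (2 * h)),
            ((p1[1] - m1[1]) / (2 * h), (p2[1] - m2[1]) / (2 * h)))

y = (2.1, 0.1)            # a point near f(1, 1) = (2, 0)
x = g(*y)
product = det2(jac_f(*x)) * det2(jac_g(*y))
print(x, f(*x), product)
```

The printed product should be close to 1, in agreement with the chain rule computation above.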

8 Taylor’s Theorem, Local Extremes and Extremes with Constraints

8.1 Taylor’s Theorem

A function $f$ defined on an open set of $\mathbb{R}^n$ is called a $C^r$-function if $f$ is continuous and possesses continuous partial derivatives up to (and including) order $r$. A function which is $C^r$ for all $r \in \mathbb{N}$ is called $C^\infty$. $C^\infty$-functions will be also called smooth, while $C^1$-functions will be called continuously differentiable. (Terminology in the literature varies; some texts use the word smooth for $C^1$. We shall never do so in the present text.) Taylor’s Theorem for multivariable functions may look more intimidating, but we will see that it is an easy consequence of the corresponding single variable theorem:

Theorem. (Taylor) Let $f$ be a $C^{r+1}$-function defined on an open convex subset $U \subseteq \mathbb{R}^n$, and let $\mathbf{a} \in U$. Then for every point $\mathbf{x} \in U$, $\mathbf{x} \ne \mathbf{a}$, there exists a point $\mathbf{c}$ on the open line segment connecting $\mathbf{a}$ and $\mathbf{x}$ such that

\begin{align*}
f(\mathbf{x}) = {} & \sum_{k=0}^{r} \ \sum_{k_1 + \cdots + k_n = k,\ k_i \ge 0} \frac{1}{k_1! \cdots k_n!}\, \frac{\partial^k f(\mathbf{a})}{(\partial x_1)^{k_1} \cdots (\partial x_n)^{k_n}}\, (x_1 - a_1)^{k_1} \cdots (x_n - a_n)^{k_n} \\
& + \sum_{k_1 + \cdots + k_n = r+1,\ k_i \ge 0} \frac{1}{k_1! \cdots k_n!}\, \frac{\partial^{r+1} f(\mathbf{c})}{(\partial x_1)^{k_1} \cdots (\partial x_n)^{k_n}}\, (x_1 - a_1)^{k_1} \cdots (x_n - a_n)^{k_n}. \tag{*}
\end{align*}


Proof. Simply use Theorem 4.5.1 for the function

\[ g(t) = f(\mathbf{a} + t(\mathbf{x} - \mathbf{a})). \]

The formula (*) follows immediately from the observation

\[ g^{(k)}(t) = \sum_{k_1 + \cdots + k_n = k,\ k_i \ge 0} \frac{k!}{k_1! \cdots k_n!}\, \left.\frac{\partial^k f(\mathbf{s})}{(\partial s_1)^{k_1} \cdots (\partial s_n)^{k_n}}\right|_{\mathbf{s} = \mathbf{a} + t(\mathbf{x} - \mathbf{a})} (x_1 - a_1)^{k_1} \cdots (x_n - a_n)^{k_n} \tag{**} \]

which follows by applying the chain rule repeatedly. $\square$

It is useful to note that the affine approximation in the sense of 3.2 of the function $f$ at a point $\mathbf{a}$ is simply the sum of the constant and linear terms of its Taylor expansion.
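The sum in (*) can be evaluated directly for a function whose partial derivatives are known in closed form. The sketch below does this for $f(x, y) = e^x \sin y$ at $\mathbf{a} = (0, 0)$; the function, the evaluation point, and the helper names are assumptions made for illustration. For this $f$ every partial derivative has the form $e^x \sin(y + k_2\pi/2)$, so it is bounded by $e^{x}$, which gives a simple bound for the remainder term.

```python
import math

def partial(k1, k2, x, y):
    # (d/dx)^k1 (d/dy)^k2 of f(x, y) = exp(x) * sin(y).
    return math.exp(x) * math.sin(y + k2 * math.pi / 2.0)

def f(x, y):
    return partial(0, 0, x, y)

def taylor_poly(r, a, x):
    # The first sum in (*): all terms with k1 + k2 = k for k = 0, ..., r.
    hx, hy = x[0] - a[0], x[1] - a[1]
    total = 0.0
    for k in range(r + 1):
        for k1 in range(k + 1):
            k2 = k - k1
            coeff = 1.0 / (math.factorial(k1) * math.factorial(k2))
            total += coeff * partial(k1, k2, a[0], a[1]) * hx ** k1 * hy ** k2
    return total

a, x = (0.0, 0.0), (0.1, 0.2)
errors = [abs(f(*x) - taylor_poly(r, a, x)) for r in (1, 2, 3)]
# Remainder bound for r = 2: all third-order partials are bounded by exp(0.1)
# on the segment from a to x, and the multinomial sum equals (|hx| + |hy|)^3 / 3!.
bound_r2 = math.exp(0.1) * (0.1 + 0.2) ** 3 / math.factorial(3)
print(errors, bound_r2)
```

The errors decrease as $r$ grows, and the error for $r = 2$ stays below the bound obtained from the remainder sum in (*).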

8.2 Local extremes and critical points

Let $f$ be a function defined on an open subset $U \subseteq \mathbb{R}^n$ and let $\mathbf{x}^0 \in U$. In analogy with the one-variable case (4.7 of Chapter 1), we say that $f$ has a local minimum (resp. local maximum) at $\mathbf{x}^0$ if there exists a $\delta > 0$ such that for every $\mathbf{x} \in \Omega(\mathbf{x}^0, \delta)$ with $\mathbf{x} \ne \mathbf{x}^0$, we have $f(\mathbf{x}) > f(\mathbf{x}^0)$ (resp. $f(\mathbf{x}) < f(\mathbf{x}^0)$). A local minimum or a local maximum is referred to by the joint term local extreme.

On the other hand, $\mathbf{x}^0$ is called a critical point of $f$ if either $f$ does not have a total differential at $\mathbf{x}^0$, or the total differential is 0. The following is then a direct consequence, for example, of Proposition 2.3 and Corollary 4.3.1 of Chapter 1.

8.2.1 Proposition. A local minimum or local maximum of a function $f: U \to \mathbb{R}$ is a critical point of $f$.

8.3 The Hessian

Just as in 4.7 of Chapter 1, we would like a partial converse of Proposition 8.2.1 based on second derivatives. We will see, however, that in the multivariable case, the geometry is intrinsically more complicated. Suppose a function $f: U \to \mathbb{R}$ is $C^2$ on some open set $U \subseteq \mathbb{R}^n$. One considers the Hessian matrix $H$ of type $n \times n$ whose $(i, j)$’th entry is

\[ \frac{\partial^2 f}{\partial x_i\, \partial x_j}. \]

This is a symmetric matrix by Proposition 4.2.1, and hence has an associated real symmetric bilinear form. If the Hessian is non-degenerate at a critical point $\mathbf{x}^0$, we call $\mathbf{x}^0$ a non-degenerate critical point. We have the following

Theorem. Suppose $f$ is $C^2$ on an open set $U \subseteq \mathbb{R}^n$ containing a non-degenerate critical point $\mathbf{x}^0$. Then the following holds: if the Hessian $H(\mathbf{x}^0)$ is positive-definite (resp. negative-definite) at $\mathbf{x}^0$, then $\mathbf{x}^0$ is a local minimum (resp. local maximum). If the Hessian is indefinite, then $\mathbf{x}^0$ is neither a local minimum nor a local maximum. Such a point $\mathbf{x}^0$ is called a saddle point.

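Proof. By Taylor’s Theorem 8.1, for any $\varepsilon > 0$ for which $\Omega(\mathbf{x}^0, \varepsilon) \subseteq U$, for every $\mathbf{x} \in \Omega(\mathbf{x}^0, \varepsilon)$, $\mathbf{x} \ne \mathbf{x}^0$, there exists a point $\mathbf{c}$ on the open line segment connecting $\mathbf{x}^0$ and $\mathbf{x}$ such that

\[ f(\mathbf{x}) = f(\mathbf{x}^0) + \frac{1}{2}\,(\mathbf{x} - \mathbf{x}^0)^T H(\mathbf{c})\,(\mathbf{x} - \mathbf{x}^0) \tag{8.3.1} \]

(the linear term vanishes because $\mathbf{x}^0$ is a critical point). Then we conclude that if $H(\mathbf{c})$ is positive-definite (resp. negative-definite), we have $f(\mathbf{x}) > f(\mathbf{x}^0)$ (resp. $f(\mathbf{x}) < f(\mathbf{x}^0)$). If $H(\mathbf{c})$ is indefinite, then, by definition, both positive and negative values will occur.

However, in the statement of the theorem, we have $H(\mathbf{x}^0)$, not $H(\mathbf{c})$. To remedy this situation, we proceed as follows: Consider

\[ \Phi(\mathbf{v}, \mathbf{c}) = \mathbf{v}^T H(\mathbf{c})\, \mathbf{v} \]

as a function of $(\mathbf{v}, \mathbf{c}) \in X$ where

\[ X = \{(\mathbf{v}, \mathbf{c}) \mid \mathbf{v} \in \mathbb{R}^n,\ \mathbf{v} \cdot \mathbf{v} = 1,\ \mathbf{c} \in \overline{\Omega(\mathbf{x}^0, \varepsilon/2)}\}. \]

Then by our assumptions, $\Phi$ is continuous. However, $X \subseteq \mathbb{R}^{2n}$ is compact by Theorem 6.5 of Chapter 2, and hence by Theorem 6.6 of Chapter 2, $\Phi$ is uniformly continuous.

Now suppose $H(\mathbf{x}^0)$ is positive-definite. The closed subset $X_0 \subseteq X$ consisting of all $(\mathbf{v}, \mathbf{c})$ where $\mathbf{c} = \mathbf{x}^0$ is compact, and hence $\Phi$ has a minimum value $m$ on $X_0$ by Proposition 6.3 of Chapter 2. Since $H(\mathbf{x}^0)$ is positive-definite, we have $m > 0$. Now by the uniform continuity of $\Phi$, there exists a $\delta > 0$ such that for all $\mathbf{c} \in \Omega(\mathbf{x}^0, \delta)$, $(\mathbf{v}, \mathbf{c}) \in X$, we have $\Phi(\mathbf{v}, \mathbf{c}) > 0$, and hence $H(\mathbf{c})$ is positive-definite also.

The case of $H(\mathbf{x}^0)$ negative-definite is handled analogously.

When $H(\mathbf{x}^0)$ is indefinite non-degenerate, there exist $(\mathbf{v}_1, \mathbf{x}^0) \in X_0$, $(\mathbf{v}_2, \mathbf{x}^0) \in X_0$ such that $\Phi(\mathbf{v}_1, \mathbf{x}^0) > 0$, $\Phi(\mathbf{v}_2, \mathbf{x}^0) < 0$. Since $\Phi$ is continuous, there exists a $\delta > 0$ such that for $\mathbf{c} \in \Omega(\mathbf{x}^0, \delta)$, $\Phi(\mathbf{v}_1, \mathbf{c}) > 0$, $\Phi(\mathbf{v}_2, \mathbf{c}) < 0$, and hence $H(\mathbf{c})$ is indefinite. $\square$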
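For $n = 2$ the three cases of the theorem can be told apart from the entries of the Hessian alone. The sketch below relies on the standard criterion for symmetric $2 \times 2$ matrices (positive-definite if and only if the upper-left entry and the determinant are positive), which is a well-known fact of linear algebra and not something proved in this section; the function name and the sample functions are assumptions made for illustration.

```python
def classify(a, b, c):
    # Hessian [[a, b], [b, c]] at a critical point of a C^2 function of two variables.
    det = a * c - b * b
    if det > 0.0:
        # Definite: the sign of a decides between positive and negative.
        return "local minimum" if a > 0.0 else "local maximum"
    if det < 0.0:
        # Indefinite: the quadratic form takes both signs.
        return "saddle point"
    return "degenerate"

# Hessians at the critical point (0, 0) of three sample functions:
print(classify(2.0, 0.0, 2.0))     # f = x^2 + y^2
print(classify(2.0, 0.0, -2.0))    # f = x^2 - y^2
print(classify(-2.0, -1.0, -2.0))  # f = -x^2 - x*y - y^2
```

The degenerate case (determinant zero) is exactly the one the theorem excludes; there the second derivatives alone do not decide the nature of the critical point.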


8.4 Global extremes

Suppose $f: X \to \mathbb{R}$ is a continuous function on a compact subset $X \subseteq \mathbb{R}^n$. Then by Proposition 6.3 of Chapter 2, $f$ attains a (global) minimum and maximum on $X$ at some points $\mathbf{x}_1, \mathbf{x}_2 \in X$. Can we find these points in practice? This is a classic example of an optimization problem, which, as the reader can imagine, has many applications outside of mathematics.

The first method that comes to mind is computing all critical points, and checking the values to see at which of these points the maximum (resp. minimum) occurs. This is generally an adequate method when $n = 1$. A typical (although not general, see Exercise (19) in Chapter 2 above) example of a set $X$ is a compact interval or a finite union of compact intervals. If it happens (as it often does) that the equation $f'(x) = 0$ has only finitely many solutions, then the only other critical points to check are the finitely many boundary points of the intervals.

One immediately realizes, however, that the method in this form does not work even for a perfectly “reasonable” compact subset $X \subseteq \mathbb{R}^n$ when $n > 1$, such as for example a cube (or, more generally, a region with corners as we introduce it in Chapter 12 below). The point is that the boundary of such sets $X$ will in general be infinite (in fact, uncountable, see Exercise (2) of Chapter 1), and will consist entirely of critical points as defined above, so there is no way of checking all of them.

To see what else we can do, let us consider a simple example. Suppose a function $f(x, y)$ is continuously differentiable on some open set containing the ball $B = \{(x, y) \mid x^2 + y^2 \le 1\}$, and suppose we are to find the global extremes of $f$ on the compact set $B$. In the interior of $B$, we can then solve the equations

\[ \frac{\partial f}{\partial x} = 0, \qquad \frac{\partial f}{\partial y} = 0. \tag{*} \]

On the boundary, the extreme may not satisfy the equations (*), but we note that the boundary is itself the set of solutions of the “nice” equation

\[ x^2 + y^2 = 1. \tag{+} \]

It is certainly worth asking if some generalization of (*) might hold, which would allow us to solve the problem. Note that generically speaking, we expect a single equation in the boundary case, since in addition to it, we still have the equation (“constraint”) (+).

8.5 Local Extremes with constraints. Lagrange multipliers

The problem we encountered at the end of Subsection 8.4 can be formalized as follows: Let $U \subseteq \mathbb{R}^n$ be open, and let $f: U \to \mathbb{R}$ be a real function. Let, further, $g_i: U \to \mathbb{R}$ be real functions, $i = 1, \dots, k$. A point $\mathbf{x}^0 \in U$ is called a local minimum (resp. maximum) subject to the constraints

\[ g_i(\mathbf{x}) = 0, \quad i = 1, \dots, k \tag{*} \]

if $\mathbf{x} = \mathbf{x}^0$ satisfies (*) and there exists a $\delta > 0$ such that for every $\mathbf{x} \in \Omega(\mathbf{x}^0, \delta)$, $\mathbf{x} \ne \mathbf{x}^0$, which satisfies (*) we have $f(\mathbf{x}) > f(\mathbf{x}^0)$ (resp. $f(\mathbf{x}) < f(\mathbf{x}^0)$).

We have the following

Theorem. Let $f, g_1, \dots, g_k$ be real functions defined in an open set $D \subseteq \mathbb{R}^n$, and suppose they are continuously differentiable. Suppose further that the rank of the matrix

\[ M = \begin{pmatrix} \dfrac{\partial g_1}{\partial x_1} & \cdots & \dfrac{\partial g_1}{\partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial g_k}{\partial x_1} & \cdots & \dfrac{\partial g_k}{\partial x_n} \end{pmatrix} \]

is exactly $k$ at each point of $D$. Suppose $f$ has a local extreme subject to the constraints (*) at a point $\mathbf{x} = \mathbf{a} = (a_1, \dots, a_n)$. Then there exist numbers $\lambda_1, \dots, \lambda_k$ (known as Lagrange multipliers) such that for each $i = 1, \dots, n$, we have

\[ \frac{\partial f(\mathbf{a})}{\partial x_i} + \sum_{j=1}^{k} \lambda_j \cdot \frac{\partial g_j(\mathbf{a})}{\partial x_i} = 0. \]

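Proof. See Subsection 2.4 of Appendix B. If the matrix $M$ has rank $k$, then at least one of the $k \times k$ submatrices of $M$ is regular, and hence has a non-zero determinant. Without loss of generality, let us assume that at the extremal point we have, say,

\[ \begin{vmatrix} \dfrac{\partial g_1}{\partial x_1} & \cdots & \dfrac{\partial g_1}{\partial x_k} \\ \vdots & & \vdots \\ \dfrac{\partial g_k}{\partial x_1} & \cdots & \dfrac{\partial g_k}{\partial x_k} \end{vmatrix} \ne 0. \tag{1} \]

If this holds, we have by 6.3 in a neighborhood of the point $\mathbf{a}$ functions

\[ \phi_i(x_{k+1}, \dots, x_n) \]

(let us write $\tilde{\mathbf{x}}$ for $(x_{k+1}, \dots, x_n)$) with continuous partial derivatives such that

\[ g_i(\phi_1(\tilde{\mathbf{x}}), \dots, \phi_k(\tilde{\mathbf{x}}), \tilde{\mathbf{x}}) = 0 \quad\text{for } i = 1, \dots, k. \]

Thus, an extreme (i.e. local maximum or local minimum) of $f(\mathbf{x})$ at $\mathbf{a}$ subject to the given constraints implies the corresponding extreme property (without constraints) of the function

\[ F(\tilde{\mathbf{x}}) = f(\phi_1(\tilde{\mathbf{x}}), \dots, \phi_k(\tilde{\mathbf{x}}), \tilde{\mathbf{x}}) \]

at $\tilde{\mathbf{a}}$, and hence by Proposition 8.2.1,

\[ \frac{\partial F(\tilde{\mathbf{a}})}{\partial x_i} = 0 \quad\text{for } i = k + 1, \dots, n, \]

and this is, by 3.1.1, equivalent to

\[ \sum_{r=1}^{k} \frac{\partial f(\mathbf{a})}{\partial x_r}\, \frac{\partial \phi_r(\tilde{\mathbf{a}})}{\partial x_i} + \frac{\partial f(\mathbf{a})}{\partial x_i} = 0 \quad\text{for } i = k + 1, \dots, n. \tag{2} \]

Taking derivatives of the constant functions $g_j(\phi_1(\tilde{\mathbf{x}}), \dots, \phi_k(\tilde{\mathbf{x}}), \tilde{\mathbf{x}}) = 0$ we obtain for $j = 1, \dots, k$,

\[ \sum_{r=1}^{k} \frac{\partial g_j(\mathbf{a})}{\partial x_r}\, \frac{\partial \phi_r(\tilde{\mathbf{a}})}{\partial x_i} + \frac{\partial g_j(\mathbf{a})}{\partial x_i} = 0 \quad\text{for } i = k + 1, \dots, n. \tag{3} \]

Now we will use (1) again, for another purpose. By Theorem B.2.5.1, the system of linear equations

\[ \frac{\partial f(\mathbf{a})}{\partial x_i} + \sum_{j=1}^{k} \lambda_j \cdot \frac{\partial g_j(\mathbf{a})}{\partial x_i} = 0, \quad i = 1, \dots, k, \]

has a unique solution $\lambda_1, \dots, \lambda_k$. Those are the equalities from the statement, but, so far, for $i \le k$ only. It remains to be shown that the same equalities hold also for $i > k$. In effect, by (2) and (3), for $i > k$ we obtain

\begin{align*}
\frac{\partial f(\mathbf{a})}{\partial x_i} + \sum_{j=1}^{k} \lambda_j \cdot \frac{\partial g_j(\mathbf{a})}{\partial x_i}
&= -\sum_{r=1}^{k} \frac{\partial f(\mathbf{a})}{\partial x_r}\, \frac{\partial \phi_r(\tilde{\mathbf{a}})}{\partial x_i} - \sum_{j=1}^{k} \lambda_j \sum_{r=1}^{k} \frac{\partial g_j(\mathbf{a})}{\partial x_r}\, \frac{\partial \phi_r(\tilde{\mathbf{a}})}{\partial x_i} \\
&= -\sum_{r=1}^{k} \left(\frac{\partial f(\mathbf{a})}{\partial x_r} + \sum_{j=1}^{k} \lambda_j \cdot \frac{\partial g_j(\mathbf{a})}{\partial x_r}\right) \frac{\partial \phi_r(\tilde{\mathbf{a}})}{\partial x_i} \\
&= -\sum_{r=1}^{k} 0 \cdot \frac{\partial \phi_r(\tilde{\mathbf{a}})}{\partial x_i} = 0. \qquad \square
\end{align*}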

8.6 Remarks

1. The functions $f, g_i$ were assumed to be defined in an open $D$ so that we can take derivatives whenever we need them. In particular, this was used in the computation of $\frac{\partial F(\tilde{\mathbf{a}})}{\partial x_i}$, and the resulting equality (3) in Theorem 8.5 above. Take the unit ball $B$ from the end of 8.4 with, as an example, $f(x, y) = x + 2y$. Then the formulas $x + 2y$ and $x^2 + y^2 - 1$ make sense on all of $\mathbb{R}^2$.

2. The force of the statement in 8.5 is in asserting the existence of $\lambda_1, \dots, \lambda_k$ that satisfy more than $k$ equations, thus creating equations for the $\lambda_i$’s. In the above mentioned example, we have $\frac{\partial f}{\partial x} = 1$ and $\frac{\partial f}{\partial y} = 2$, $g(x, y) = x^2 + y^2 - 1$ and hence $\frac{\partial g}{\partial x} = 2x$ and $\frac{\partial g}{\partial y} = 2y$. There is one $\lambda$ that has to satisfy two equations

\[ 1 + \lambda \cdot 2x = 0 \quad\text{and}\quad 2 + \lambda \cdot 2y = 0. \]

This is possible only if $y = 2x$. Hence, as $x^2 + y^2 = 1$ we obtain $5x^2 = 1$ and hence $x = \pm\frac{1}{\sqrt{5}}$; this localizes the extremes to $\left(\frac{1}{\sqrt{5}}, \frac{2}{\sqrt{5}}\right)$ and $\left(\frac{-1}{\sqrt{5}}, \frac{-2}{\sqrt{5}}\right)$.
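The result of Remark 2 can be cross-checked without multipliers by sampling the constraint set directly. The sketch below parametrizes the unit circle as $(\cos t, \sin t)$, evaluates $f(x, y) = x + 2y$ on a fine grid of parameters, and compares the extremes found with the points obtained above; the parametrization and the grid size are assumptions made for illustration.

```python
import math

def f(x, y):
    return x + 2.0 * y

N = 100000
samples = [(f(math.cos(t), math.sin(t)), t)
           for t in (2.0 * math.pi * i / N for i in range(N))]
f_max, t_max = max(samples)
f_min, t_min = min(samples)

# The point found with the Lagrange multiplier, and the multiplier itself.
x_star, y_star = 1.0 / math.sqrt(5.0), 2.0 / math.sqrt(5.0)
lam = -1.0 / (2.0 * x_star)          # from 1 + lambda * 2x = 0

print(f_max, f_min, math.sqrt(5.0))
print(math.cos(t_max), math.sin(t_max), x_star, y_star)
print(2.0 + lam * 2.0 * y_star)      # the second equation, should be close to 0
```

The sampled maximum and minimum agree with $\pm\sqrt{5}$, the values of $f$ at the two points localized in Remark 2.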

8.7

A problem of finding extremes with constraints may not be related to extremes at boundary points. Here is an example of another nature.

Let us ask the question which rectangular parallelepiped of a given surface area has the largest volume. Denoting the lengths of the edges by $x_1, \dots, x_n$, the surface area is

\[ S(x_1, \dots, x_n) = 2 x_1 \cdots x_n \left(\frac{1}{x_1} + \cdots + \frac{1}{x_n}\right) \]

and the volume is

\[ V(x_1, \dots, x_n) = x_1 \cdots x_n. \]

Thus, we have

\[ \frac{\partial V}{\partial x_i} = \frac{1}{x_i} \cdot x_1 \cdots x_n \quad\text{and}\quad \frac{\partial S}{\partial x_i} = \frac{2}{x_i}\,(x_1 \cdots x_n) \left(\frac{1}{x_1} + \cdots + \frac{1}{x_n}\right) - 2 x_1 \cdots x_n\, \frac{1}{x_i^2}. \]

If we write $y_i = \frac{1}{x_i}$ and $s = y_1 + \cdots + y_n$ and divide the equation from the theorem by $x_1 \cdots x_n$, we obtain

\[ 2 y_i (s - y_i) + \lambda y_i = 0, \quad\text{or}\quad y_i = s + \frac{\lambda}{2}. \]

Thus, all the $x_i$ are equal and the unique solution is the cube.
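For $n = 3$ the conclusion can be tested by brute force. The sketch below draws random boxes, rescales each one to surface area $6$ (the surface area of the unit cube), and records its volume; the sample size, the random seed, and the range of edge lengths are assumptions made for illustration, and the experiment only supports the statement, it does not prove it.

```python
import random

def scaled_volume(a, b, c, surface=6.0):
    # Rescale the box (a, b, c) by a factor s so that its surface area
    # 2 * s^2 * (ab + bc + ca) equals `surface`, and return its volume.
    pair_sum = a * b + b * c + c * a
    s = (surface / (2.0 * pair_sum)) ** 0.5
    return (s ** 3) * a * b * c

random.seed(0)
volumes = [scaled_volume(random.uniform(0.1, 5.0),
                         random.uniform(0.1, 5.0),
                         random.uniform(0.1, 5.0))
           for _ in range(10000)]
cube_volume = scaled_volume(1.0, 1.0, 1.0)
print(max(volumes), cube_volume)
```

No sampled box exceeds the volume of the cube with the same surface area, in line with the solution found above.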


9 Exercises

(1) Prove that the function

\[ f(x, y) = \begin{cases} \dfrac{(x^2 - y)^2}{x^4 + y^2} & \text{for } (x, y) \ne (0, 0), \\[2ex] 1 & \text{for } (x, y) = (0, 0) \end{cases} \]

becomes continuous when restricted to any straight line in $\mathbb{R}^2$. Prove, however, that $f$ is not continuous.

(2) Let $f(x, y): \mathbb{R}^2 \to \mathbb{R}$ be the function defined by

\[ f(x, y) = e^{-\frac{x^2}{y^2} - \frac{y^2}{x^2}} \quad\text{for } x, y \ne 0 \]

x for x; y ¤ 0

and by f .x; 0/ D f .0; y/ D 0. Prove that f has partial derivatives of allorders on R2, but is not continuous.[Hint: for x; y ¤ 0, inductively, all partial derivatives (including higher ones)are of the form Q.x; y/f .x; y/ where Q is a rational function. Taking limitsof such functions along vertical or horizontal lines to points of the form .0; y/,.x; 0/, however, the limit is always 0. Therefore, by the mean value theorem,the (possibly higher) partial derivative in question is also 0 at those points.]

(3) Prove Proposition 2.4 in detail.(4) Prove that if in 3.1.1 the functions gk have total differentials in b then f ı g

has one as well.(5) Derive, similarly as in 3.4, a formula for the derivative of f

g.

(6) How many different expressions 4.3.1 are there?(7) Find the first three summands in the Taylor expansion of the solution of (5.2.3).(8) Give a counterexample of the statement of Theorem 5.2 when we drop the

assumption r D 1.(9) Implicit differentiation Let functions F W U ! Rm, U � RnCm be open as

in Theorem 6.3 and let f W V ! Rn, V � Rn be open the map mentioned inTheorem 6.3. Let DxF W Rn ! Rm and DyF W Rm ! Rm be linear maps suchthat DF.x; y/ D DxF.x/C DyF.y/. Using the chain rule, prove that then

Dfjx D �.DyF j.x;f.x///�1.DxF j.x;f.x///:

(10) Prove formula 8.1 (**) in detail.(11) Prove Proposition 8.2.1 in detail.

9 Exercises 95

(12) (a) Find a maximum and minimum of the function f .x; y/ D axCby on theset

B D f.x; y/ j x2 C y2 � 1g � R2

for every choice of values of the constants a; b 2 R.(b) Find a minimum and maximum of the function f .x; y/ D x2 C 2y2 on

the set B .

4 Integration I: Multivariable Riemann Integral and Basic Ideas Toward the Lebesgue Integral

1 Riemann integral on an n-dimensional interval

In the first part of this chapter we will present a simple generalization of the one-dimensional Riemann integral which the reader already knows (see Section 8 of Chapter 1). To start with, we will consider the integral only for functions defined on $n$-dimensional intervals (= “bricks”) and we will be concerned, basically, with continuous functions. Later, the domains and functions to be integrated on will become much more general.

1.1

A compact interval in the $n$-dimensional Euclidean space $\mathbb{R}^n$ is a product

\[ J = \langle a_1, b_1\rangle \times \cdots \times \langle a_n, b_n\rangle \]

where $\langle a_k, b_k\rangle$ are compact intervals in $\mathbb{R}$.

A partition $D$ of such an interval is an $n$-tuple $(D_1, \dots, D_n)$ where the $D_i$ are partitions of the intervals $\langle a_i, b_i\rangle$, that is, sequences

\[ D_i: \ a_i = t_{i1} < t_{i2} < \cdots < t_{i,n_i} = b_i, \tag{*} \]

often also viewed as sequences of intervals

\[ \langle t_{i1}, t_{i2}\rangle, \langle t_{i2}, t_{i3}\rangle, \dots, \langle t_{i,n_i - 1}, t_{i,n_i}\rangle. \]

The partition $D$ above is called a refinement of a partition $D' = (D'_1, \dots, D'_n)$ if the sequences

\[ D'_i: \ a_i = t'_{i1} < t'_{i2} < \cdots < t'_{i,n'_i} = b_i \]

are subsequences of the sequences (*) above, that is, if every division point of $D'$ is also a division point of $D$.

We have the obvious

1.1.1 Observation. Any two partitions have a common refinement.

1.2

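A member of a partition $D = (D_1, \dots, D_n)$ is any of the intervals (bricks) $\langle t_{1,i_1}, t_{1,i_1+1}\rangle \times \cdots \times \langle t_{n,i_n}, t_{n,i_n+1}\rangle$ where the $t_{i,j}$ are as in (*). The set of all members of a partition $D$ will be denoted by $|D|$.

The volume of an interval $J = \langle a_1, b_1\rangle \times \cdots \times \langle a_n, b_n\rangle$ is the number

\[ \operatorname{vol} J = \prod_{i=1}^{n} (b_i - a_i). \]

Let $f$ be a bounded function on an interval $J$ and let $D$ be a partition of $J$. The lower (resp. upper) sum of $f$ in $D$ is the number

\[ s(f, D) = \sum_{K \in |D|} m_K \cdot \operatorname{vol} K \quad\text{resp.}\quad S(f, D) = \sum_{K \in |D|} M_K \cdot \operatorname{vol} K \]

where

\[ m_K = \inf\{f(\mathbf{x}) \mid \mathbf{x} \in K\} \quad\text{and}\quad M_K = \sup\{f(\mathbf{x}) \mid \mathbf{x} \in K\}. \]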
1.2.1

From the definitions of suprema and infima we immediately see that if $D$ refines $D'$ then

\[ s(f, D) \ge s(f, D') \quad \text{and} \quad S(f, D) \le S(f, D'), \tag{*} \]

and taking into account a common refinement we immediately obtain

Observation. For any two partitions $D, D'$ we have

\[ s(f, D) \le S(f, D'). \]

Now we can define the lower and the upper Riemann integral of $f$ over $J$ by setting

\[ \underline{\int}_J f = \sup_D s(f, D) \quad \text{and} \quad \overline{\int}_J f = \inf_D S(f, D), \]

and if these two values coincide we speak of the Riemann integral of $f$ over $J$ and write

\[ \int_J f \]

or, if we wish to emphasize the variables,

\[ \int_J f(x_1, \dots, x_n)\,dx_1 \cdots dx_n \quad \text{or} \quad \int_J f(x)\,dx. \]

We then speak of a Riemann integrable function.
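The definitions of 1.2 can be tried out numerically. The following is a minimal sketch, not part of the original text: the helper name `riemann_sums`, the uniform partition, and the sample function are all assumptions chosen for illustration. It assumes that $f$ is non-decreasing in each variable, so that $m_K$ and $M_K$ are attained at the lower and upper corner of each brick $K$.

```python
from itertools import product


def riemann_sums(f, a, b, n):
    """Return (s, S): lower and upper sums of f over the brick
    <a[0], b[0]> x ... x <a[d-1], b[d-1]> for the uniform partition
    with n subintervals per axis.

    Assumption (not from the text): f is non-decreasing in each variable,
    so on every member K the infimum m_K is attained at the lower corner
    and the supremum M_K at the upper corner.
    """
    d = len(a)
    steps = [(b[i] - a[i]) / n for i in range(d)]
    vol_k = 1.0
    for step in steps:
        vol_k *= step  # all members of a uniform partition have equal volume
    lower = upper = 0.0
    for idx in product(range(n), repeat=d):
        lo_corner = [a[i] + idx[i] * steps[i] for i in range(d)]
        hi_corner = [a[i] + (idx[i] + 1) * steps[i] for i in range(d)]
        lower += f(*lo_corner) * vol_k
        upper += f(*hi_corner) * vol_k
    return lower, upper


def sample_f(x, y):
    # Non-decreasing in x and in y on <0, 1> x <0, 1>.
    return x * x + 2 * y * y


if __name__ == "__main__":
    for n in (4, 16, 64):
        s, S = riemann_sums(sample_f, (0.0, 0.0), (1.0, 1.0), n)
        print(n, s, S, S - s)
```

For this particular sample function the difference $S(f, D) - s(f, D)$ telescopes to $3/n$, so the lower and upper sums squeeze the value of the integral (which is 1) as the partition is refined.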

1.3

The following easy fact can be left to the reader (it can be proved by a literal repetition of the one-variable case; see Exercise (1)).

Proposition. If $f, g$ are Riemann integrable and if $\alpha, \beta$ are real numbers then $\alpha f + \beta g$ is Riemann integrable and we have

\[ \int_J (\alpha f + \beta g) = \alpha \int_J f + \beta \int_J g. \]

1.4 Almost disjoint unions of intervals

An interval $J = \langle a_1, b_1 \rangle \times \cdots \times \langle a_n, b_n \rangle$ is an almost disjoint union of a pair of intervals $J^i = \langle a^i_1, b^i_1 \rangle \times \cdots \times \langle a^i_n, b^i_n \rangle$, $i = 1, 2$, if for some $k$ we have

\[ \forall i \ne k, \quad (a_i, b_i) = (a^1_i, b^1_i) = (a^2_i, b^2_i), \quad \text{and} \]

\[ a_k = a^1_k,\ b^1_k = a^2_k,\ b^2_k = b_k \quad \text{or} \quad a_k = a^2_k,\ b^2_k = a^1_k,\ b^1_k = b_k. \]

An interval $J$ is an almost disjoint union of intervals $J_1, J_2, \dots, J_n$ if it can be produced recursively from $J_1, \dots, J_n$ by taking almost disjoint unions of pairs, using each $J_i$ precisely once.

1.4.1 Proposition. Let $J$ be an almost disjoint union of intervals $J^i$, $i = 1, \dots, n$. Then

\[ \underline{\int}_J f = \sum_{i=1}^{n} \underline{\int}_{J^i} f \quad \text{and} \quad \overline{\int}_J f = \sum_{i=1}^{n} \overline{\int}_{J^i} f. \]

Proof. It suffices to prove the statement for an almost disjoint union of a pair of intervals $J^1, J^2$, and for this case it suffices to realize that each partition of $J$ can be refined into a pair of partitions of the $J^i$'s, and, on the other hand, from any pair of partitions of the $J^i$'s we can obtain, using common refinements, a partition of $J$. □

2 Continuous functions are Riemann integrable

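2.1 Theorem. A function $f$ is Riemann integrable if and only if for every $\varepsilon > 0$ there exists a partition $D$ such that

\[ S(f, D) - s(f, D) < \varepsilon. \]

Proof. If the formula holds, then, for each $\varepsilon > 0$,

\[ \overline{\int}_J f \le S(f, D) < s(f, D) + \varepsilon \le \underline{\int}_J f + \varepsilon \le \overline{\int}_J f + \varepsilon. \]

Since $\varepsilon$ was arbitrary, the lower and upper integrals coincide.

On the other hand, if $\underline{\int}_J f = \overline{\int}_J f$ then by definition there are partitions $D', D''$ such that $S(f, D') - s(f, D'') < \varepsilon$; take a common refinement $D$ of $D', D''$ and use 1.2.1 (*). □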

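Theorem. Every continuous function on an interval $J$ is Riemann integrable.

Proof. By Theorem 6.6 of Chapter 2, $f$ is uniformly continuous. Take an $\varepsilon > 0$ and choose a $\delta > 0$ such that for the distance in $\mathbb{R}^n$ we have

\[ d(x, y) < \delta \ \Rightarrow\ |f(x) - f(y)| < \frac{\varepsilon}{\operatorname{vol} J}. \]

Further, choose a partition $D$ such that

\[ \forall K \in |D|,\ \forall x, y \in K, \quad d(x, y) < \delta. \]

Then we have for $m_K = \inf\{ f(x) \mid x \in K \}$ and $M_K = \sup\{ f(x) \mid x \in K \}$,

\[ M_K - m_K \le \frac{\varepsilon}{\operatorname{vol} J} \]

and since obviously

\[ \sum_{K \in |D|} \operatorname{vol} K = \operatorname{vol} J, \]

we have

\[ S(f, D) - s(f, D) = \sum_{K \in |D|} (M_K - m_K) \operatorname{vol} K \le \frac{\varepsilon}{\operatorname{vol} J} \sum_{K \in |D|} \operatorname{vol} K = \varepsilon. \]

□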

2.2

The following statements are straightforward (they hold more generally, but we will need them so far for continuous functions only).

Proposition. Let $f, g$ be continuous functions. Then
1. $|\int f| \le \int |f|$.
2. If $f \le g$ then $\int f \le \int g$.
3. In particular if $f(x) \le C$ for all $x \in J$ then

\[ \int_J f \le C \cdot \operatorname{vol} J. \]

3 Fubini’s Theorem in the continuous case

3.1 Theorem. Let $J' \subseteq \mathbb{R}^m$, $J'' \subseteq \mathbb{R}^n$ be intervals, $J = J' \times J''$. Let $f$ be a continuous function defined on $J$. Then

\[ \int_J f(x, y)\,d(x, y) = \int_{J'} \Big( \int_{J''} f(x, y)\,dy \Big) dx = \int_{J''} \Big( \int_{J'} f(x, y)\,dx \Big) dy. \]

Proof. We will prove the first equality; the second one is analogous.

Put $F(x) = \int_{J''} f(x, y)\,dy$. We will prove that

\[ \int_J f = \int_{J'} F. \]

This will also include the fact that the latter integral exists; this could be easily shown by proving, using uniform continuity, that $F$ is continuous. But we will get it during the proof for free anyway.

Choose a partition $D$ of $J$ such that

\[ \int_J f - \varepsilon \le s(f, D) \le S(f, D) \le \int_J f + \varepsilon. \]

The partition $D$ (as any partition of $J$) obviously consists of a partition $D'$ of $J'$ and a partition $D''$ of $J''$, and we have

\[ |D| = \{ K' \times K'' \mid K' \in |D'|,\ K'' \in |D''| \} \]

and each member appears as precisely one $K' \times K''$. We have

\[ F(x) \le \sum_{K'' \in |D''|} \max_{y \in K''} f(x, y) \operatorname{vol} K'' \]

and hence

\[ S(F, D') \le \sum_{K' \in |D'|} \max_{x \in K'} \Big( \sum_{K'' \in |D''|} \max_{y \in K''} f(x, y) \cdot \operatorname{vol} K'' \Big) \cdot \operatorname{vol} K' \]

\[ \le \sum_{K' \in |D'|} \sum_{K'' \in |D''|} \max_{(x, y) \in K' \times K''} f(x, y) \cdot \operatorname{vol} K'' \operatorname{vol} K' \]

\[ \le \sum_{K' \times K'' \in |D|} \max_{z \in K' \times K''} f(z) \cdot \operatorname{vol}(K' \times K'') = S(f, D) \]

and similarly

\[ s(f, D) \le s(F, D'). \]

Hence we have

\[ \int_J f - \varepsilon \le s(F, D') \le \underline{\int}_{J'} F \le \overline{\int}_{J'} F \le S(F, D') \le \int_J f + \varepsilon, \]

and therefore $\int_{J'} F$ exists and is equal to $\int_J f$. □
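As a numerical illustration of Theorem 3.1 (not a proof, and not part of the original text), one can approximate both iterated integrals by midpoint sums and compare them with a value computed by hand. The function names, the midpoint rule, and the sample function $f(x, y) = x y^2$ on $\langle 0, 1 \rangle \times \langle 0, 2 \rangle$ are assumptions made for this sketch; the exact value of the integral is $\frac{1}{2} \cdot \frac{8}{3} = \frac{4}{3}$.

```python
def midpoint_sum(g, a, b, n):
    """Midpoint Riemann sum of g over <a, b> with n equal subintervals."""
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h


def integrate_y_then_x(f, jx, jy, n):
    """Approximate the integral over J' of F, where F(x) is the
    integral of f(x, y) over J'' with respect to y."""
    def F(x):
        return midpoint_sum(lambda y: f(x, y), jy[0], jy[1], n)
    return midpoint_sum(F, jx[0], jx[1], n)


def integrate_x_then_y(f, jx, jy, n):
    """The same with the order of integration interchanged."""
    def G(y):
        return midpoint_sum(lambda x: f(x, y), jx[0], jx[1], n)
    return midpoint_sum(G, jy[0], jy[1], n)


def sample_f(x, y):
    # Hypothetical test function; its integral over <0,1> x <0,2> is 4/3.
    return x * y * y


if __name__ == "__main__":
    jx, jy = (0.0, 1.0), (0.0, 2.0)
    print(integrate_y_then_x(sample_f, jx, jy, 100))
    print(integrate_x_then_y(sample_f, jx, jy, 100))
```

On a product grid the two approximations are the same finite sum taken in a different order, which mirrors the role of the product partition $|D| = \{K' \times K''\}$ in the proof.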

4 Uniform convergence and Dini’s Theorem

4.1 Theorem. Let $f_n$ be continuous real functions on a compact interval $J$ and let them converge uniformly to a function $f$. Then

\[ \int_J f = \lim_{n \to \infty} \int_J f_n. \]

Proof. Choose an $\varepsilon > 0$ and an $n_0$ such that, for $n \ge n_0$,

\[ |f_n(x) - f(x)| < \frac{\varepsilon}{\operatorname{vol} J}. \]

The symbols $m_K$ and $M_K$ will be as in 1.2, and the corresponding values for $f_n$ will be denoted by $m^n_K$ and $M^n_K$. Thus we have

\[ |m_K - m^n_K|,\ |M_K - M^n_K| < \frac{\varepsilon}{\operatorname{vol} J} \]

so that

\[ |s(f, D) - s(f_n, D)| \le \sum_{K \in |D|} |m_K - m^n_K| \cdot \operatorname{vol} K < \varepsilon \]

(again we use the fact that $\sum_{K \in |D|} \operatorname{vol} K = \operatorname{vol} J$) and similarly

\[ |S(f, D) - S(f_n, D)| < \varepsilon. \]

Choose a partition $D$ such that

\[ \int_J f - \varepsilon \le s(f, D) \le S(f, D) \le \int_J f + \varepsilon. \]

Then

\[ \int_J f - 2\varepsilon \le s(f, D) - \varepsilon \le s(f_n, D) \le \int_J f_n \le S(f_n, D) \le S(f, D) + \varepsilon \le \int_J f + 2\varepsilon, \]

and we conclude that $\lim \int_J f_n = \int_J f$. □

4.2 Notation

A sequence $(f_n)_n$ of functions is said to be increasing if for all $x$

\[ f_1(x) \le f_2(x) \le \cdots \le f_n(x) \le \cdots \]

(usually this is referred to as non-decreasing, but “increasing” is shorter and there will be no danger of confusion). Similarly we speak of a decreasing sequence.

In the remainder of this chapter, we will allow infinite values, that is, a function will be a mapping $f : \mathbb{R}^m \to \mathbb{R} \cup \{-\infty, +\infty\}$. Consequently, an increasing (resp. decreasing) sequence $(f_n)_n$ always has a limit, namely the supremum resp. infimum. We write

\[ f_n \nearrow f \quad \text{resp.} \quad f_n \searrow f \]

and if there is a danger of confusion (e.g. in double indexing) we emphasize the varying index as in

\[ f_{nk} \nearrow_k f_n, \quad f_{nk} \searrow_k f_n. \]

The notation $a_n \searrow a$, $a_n \nearrow a$ may be used also for monotone sequences of numbers.

The constant zero function will be denoted, hopefully without danger of confusion, simply by 0.

4.3 Theorem. (Dini) Let $f_n$ be continuous real functions on a compact metric space $X$ and let $f_n \searrow 0$. Then $f_n$ converge to 0 uniformly.

Proof. It suffices to prove that $m_n = \max_x f_n(x)$ converges to zero, because then $|f_n(x) - 0| < \varepsilon$ for sufficiently large $n$ independently of the choice of $x \in X$.

Suppose it does not. Reducing, possibly, $f_n$ to a subsequence, we obtain an example with

\[ f_n \searrow 0 \quad \text{and} \quad \forall n,\ m_n > \varepsilon_0 \]

for a fixed $\varepsilon_0 > 0$.

Since $X$ is a compact metric space, there exist $x_n$ such that $f_n(x_n) = m_n$, and we can choose a subsequence of $x_n$ converging to some $x \in X$. After reducing to a subsequence, we may assume without loss of generality that we have

\[ f_n \searrow 0, \quad \forall n\ f_n(x_n) > \varepsilon_0 \quad \text{and} \quad \lim_n x_n = x. \]

Now for $k \ge n$,

\[ f_n(x_k) \ge f_k(x_k) > \varepsilon_0 \]

and hence

\[ f_n(x) = \lim_k f_n(x_k) \ge \varepsilon_0 \quad \text{for all } n. \]

This is a contradiction with $\lim_n f_n(x) = 0$. □
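Dini's Theorem can be observed on a concrete sequence. The sketch below is an illustration only; the functions $f_n(x) = x^n (1 - x)$ and $g_n(x) = x^n$ on $\langle 0, 1 \rangle$ and the grid-based maximum are assumptions introduced here, not taken from the text. The sequence $f_n$ satisfies the hypotheses ($f_n \searrow 0$ with continuous $f_n$), and its maxima $m_n$ tend to zero. The sequence $g_n$ is also decreasing, but its pointwise limit equals 1 at $x = 1$, so the hypothesis $g_n \searrow 0$ fails and the maxima stay equal to 1.

```python
def sup_on_grid(g, m=10_001):
    """Maximum of g over m equally spaced points of <0, 1>
    (a numerical stand-in for the maximum over the compact space X)."""
    return max(g(k / (m - 1)) for k in range(m))


def f(n, x):
    # Continuous on <0, 1>, f(n + 1, x) <= f(n, x), and f(n, x) -> 0 for every x.
    return x ** n * (1.0 - x)


def g(n, x):
    # Also decreasing in n, but the pointwise limit is 1 at x = 1, not 0.
    return x ** n


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 32):
        print(n, sup_on_grid(lambda x: f(n, x)), sup_on_grid(lambda x: g(n, x)))
```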

4.4

From 4.1 and 4.3, we immediately obtain the following

Corollary. Let $f_n$ be continuous real functions on a compact interval $J$ and let $f_n \searrow 0$. Then

\[ \lim_n \int f_n = 0. \]


5 Preparing for an extension of the Riemann integral

5.1

For many purposes, the Riemann integral is not sufficiently general. For example, we may be interested in computing integrals such as

\[ \int_0^1 \frac{dx}{\sqrt{x}} = 2\sqrt{x}\,\Big|_0^1 = 2, \]

which however is incorrect in the setting we considered so far, since the Riemann integral on the left-hand side does not exist. While in this particular case there is a quick fix in the form of “improper Riemann integrals” (which we do not treat here), clearly, a more systematic solution is needed: What about a function $f$ where $f(x)$ is 0 for $x$ rational and 1 for $x$ irrational? (This function is known as the Dirichlet function.) Obviously, $f$ is not Riemann integrable, but should we define

\[ \int_0^1 f(x)\,dx = 1 \]

to express that modifying the value of the function which is constantly equal to 1 on countably many points should not change the value of the integral? More generally, can one define the integral in such a way that we have

\[ \lim \int f_n = \int \lim f_n \tag{*} \]

in a situation more general than the case of a uniform limit? Clearly, it is unreasonable to expect (*) in complete generality: for example, consider functions $f_n$ where $f_n$ is constant $n$ on the interval $(0, 1/n)$ and constant 0 elsewhere. Then $f_n \to 0$, while each of the functions $f_n$ has (Riemann) integral equal to 1.
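The counterexample to (*) can be checked with exact rational arithmetic. This is a small sketch for illustration; the function names are hypothetical, and the integral is computed from the elementary formula (height times length) for a step function rather than by a general integration routine.

```python
from fractions import Fraction


def f_n(n, x):
    """The function from the text: constant n on (0, 1/n), zero elsewhere."""
    return n if 0 < x < Fraction(1, n) else 0


def integral_f_n(n):
    """Riemann integral of f_n: height n times the length 1/n of (0, 1/n)."""
    return n * Fraction(1, n)


if __name__ == "__main__":
    x = Fraction(1, 10)
    # At a fixed point x the values are eventually 0 ...
    print([f_n(n, x) for n in (5, 9, 10, 11, 100)])
    # ... while every integral equals 1.
    print([integral_f_n(n) for n in (1, 10, 100)])
```

For each fixed $x > 0$ we have $f_n(x) = 0$ as soon as $1/n \le x$, so the pointwise limit is 0 and its integral is 0, while the integrals of the $f_n$ remain 1.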

Given all these questions, it is remarkable that there is a satisfactory answer: people more or less agree on one standard extension of the Riemann integral to a much larger class of functions, known as the Lebesgue integral. While there are different approaches to the Lebesgue integral, and the concept is somewhat notorious for taking a long time to cover, we will present here a relatively quick yet rigorous approach of defining the Lebesgue integral simply by starting with certain special cases of (*) as the definition of the value of $\int \lim f_n$, and then showing that this leads to a consistent theory. This approach to the Lebesgue integral is due to P. J. Daniell.


5.2 The class Z

The support of a function $f : \mathbb{R}^n \to \mathbb{R}$ is the closure of the set $\{ x \in \mathbb{R}^n \mid f(x) \ne 0 \}$. The support of $f$ is denoted by

\[ \operatorname{supp}(f). \]

Thus, a function has compact support if and only if it vanishes outside a compact subset $X \subseteq \mathbb{R}^n$. There is obviously the smallest interval $J_0$ containing the set $X$. Any interval $J$ containing $J_0$ is easily represented as an almost disjoint union of a set of intervals containing $J_0$ and such that $f$ is zero on all the other members of the system. Thus by 1.4, the integral $\int_J f$ does not depend on the choice of the interval $J$ containing the support $X$ of $f$. We will denote the common value by

\[ If \]

(we will reserve the standard symbol $\int$ for an extended integral defined later). The set of all continuous functions with compact support in $\mathbb{R}^n$ will be denoted by

\[ Z. \]

Let us summarize the basic facts we will use below: We have a class $Z$ of functions defined on $\mathbb{R}^n$ such that
(Z1) for all $\alpha, \beta \in \mathbb{R}$ and $f, g \in Z$, $\alpha f + \beta g \in Z$,
(Z2) if $f \in Z$ then $|f| \in Z$,
and a mapping $I : Z \to \mathbb{R}$ such that
(I1) if $f \ge 0$ then $If \ge 0$,
(I2) $I$ is a linear map, and
(I3) if $f_n \searrow 0$ then $If_n \searrow 0$
(for (I3), use 4.4, realizing that the support of $f_n$ is contained in the support of $f_1$).

Below, we will consistently use only the facts (Zj) and (Ij) and their consequences. For example, let $\max(f, g)$ (resp. $\min(f, g)$) denote the function whose value at a point $x$ is $\max(f(x), g(x))$ (resp. $\min(f(x), g(x))$), and let $f^+ = \max(f, 0)$, $f^- = -\min(f, 0)$. Note that

\[ \max(f, g) = \tfrac{1}{2}(f + g + |f - g|) \quad \text{and} \quad \min(f, g) = \tfrac{1}{2}(f + g - |f - g|). \]

Thus, we easily deduce that

\[ f \le g \ \Rightarrow\ If \le Ig, \quad \text{and} \]

\[ f, g \in Z \ \Rightarrow\ \max(f, g),\ \min(f, g),\ f^+,\ f^- \in Z. \]
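The identities expressing $\max$ and $\min$ through the absolute value are the reason why (Z1) and (Z2) suffice for closure under lattice operations. The following sketch checks them pointwise; the helper names and the sample functions are hypothetical and serve only as an illustration.

```python
def pointwise_max(f, g):
    """max(f, g) built only from linear combinations and the absolute value."""
    return lambda x: 0.5 * (f(x) + g(x) + abs(f(x) - g(x)))


def pointwise_min(f, g):
    """min(f, g) built only from linear combinations and the absolute value."""
    return lambda x: 0.5 * (f(x) + g(x) - abs(f(x) - g(x)))


def zero(x):
    return 0.0


def pos_part(f):
    # f^+ = max(f, 0)
    return pointwise_max(f, zero)


def neg_part(f):
    # f^- = -min(f, 0)
    m = pointwise_min(f, zero)
    return lambda x: -m(x)


if __name__ == "__main__":
    f = lambda x: x
    g = lambda x: 1.0 - x
    for x in (-1.0, 0.25, 0.5, 2.0):
        print(x, pointwise_max(f, g)(x), pointwise_min(f, g)(x),
              pos_part(f)(x), neg_part(f)(x))
```

In particular $f = f^+ - f^-$ and $|f| = f^+ + f^-$, which is used repeatedly later in the chapter.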


6 A modest extension

6.1

Define

\[ Z^{up} = \{ f : \mathbb{R}^n \to (-\infty, +\infty] \mid \exists f_n \in Z,\ f_n \nearrow f \}, \]
\[ Z^{dn} = \{ f : \mathbb{R}^n \to [-\infty, \infty) \mid \exists f_n \in Z,\ f_n \searrow f \}, \]
\[ Z^* = Z^{up} \cup Z^{dn}. \]

Remark. We choose, of course, the topology on $(-\infty, +\infty]$ where a set is a neighborhood of $+\infty$ if and only if it contains some interval $(K, +\infty]$. This makes $f_n \nearrow f$ well defined: it means that $f_n$ is an increasing sequence of functions in $Z$ such that for each $x \in \mathbb{R}^n$, the sequence $f_n(x)$ converges to $f(x)$ in $(-\infty, +\infty]$. The treatment of $Z^{dn}$ is symmetrical. We will refer to $f$ as a monotone limit of the functions $f_n \nearrow f$ or $f_n \searrow f$. The functions in $Z^*$ are not necessarily continuous, they do not have to have a compact support, and can (obviously) reach infinite values. Also note that $Z \subseteq Z^{up} \cap Z^{dn}$ and this inclusion is not an equality.

6.2 Proposition. Let $f, g \in Z^*$ be monotone limits of sequences of functions $f_n \in Z$ and $g_n \in Z$, respectively. Let $f \le g$. Then

\[ \lim If_n \le \lim Ig_n. \]

Proof. (a) If $f_n \nearrow f$ and $g_n \searrow g$ then $f_n \le f \le g \le g_n$.

(b) Let $f_n \nearrow f$ and $g_n \nearrow g$. For a fixed $k$ set

\[ h_n = \min(g_n, f_k). \]

Then the sequence $(h_n)$ increases and we have

\[ \lim h_n = \min(g, f_k) = f_k, \]

and hence

\[ h_n \nearrow_n f_k, \quad \text{that is,} \quad (f_k - h_n) \searrow_n 0 \]

and we obtain, by (I3), that $\lim_n Ih_n = If_k$. Now $g_n \ge h_n$, hence $Ig_n \ge Ih_n$, and hence

\[ \lim_n Ig_n \ge If_k \]

for each $k$ so that finally $\lim_k If_k \le \lim_n Ig_n$.

(c) If $f_n \searrow f$ and $g_n \searrow g$ use (b) for $-f, -g$.

(d) Let $f_n \searrow f$ and $g_n \nearrow g$. Then $f_n - g_n \le h_n = (f_n - g_n)^+$; since $h_n \searrow 0$ we have $\lim Ih_n = 0$ and finally

\[ \lim If_n - \lim Ig_n = \lim I(f_n - g_n) \le 0. \]

□

6.3 A Corollary and a Definition

For $f \in Z^*$, we can define

\[ If = \lim_n If_n \]

where $f_n$ is an arbitrary monotone sequence of functions in $Z$ converging (pointwise) to $f$.

6.4 A few immediate facts

For the purposes of integration, it is convenient to adopt the convention $0 \cdot \infty = 0 \cdot (-\infty) = 0$. We will use this convention for the remainder of this chapter, and in Chapter 5.
(a) $f \in Z^{up}$ if and only if $-f \in Z^{dn}$.
(b) If $f, g \in Z^{up}$ resp. $Z^{dn}$ then $f + g \in Z^{up}$ resp. $Z^{dn}$ and we have $I(f + g) = If + Ig$.
(c) If $f \in Z^{up}$ and $\alpha \ge 0$ resp. $\alpha \le 0$ then $\alpha f \in Z^{up}$ resp. $Z^{dn}$ and we have $I(\alpha f) = \alpha If$.
(d) If $f, g \in Z^*$ and $f \le g$ then $If \le Ig$.
(e) If $f, g \in Z^{up}$ then $\max(f, g), \min(f, g) \in Z^{up}$.

6.5 Proposition. Let $f_n \in Z^{up}$ and $f_n \nearrow f$. Then $f \in Z^{up}$ and $If_n \nearrow If$. Similarly for $f_n \in Z^{dn}$ and $f_n \searrow f$.

Proof. Choose $f_{nk} \in Z$ such that $f_{nk} \nearrow_k f_n$ and set

\[ g_n = \max\{ f_{ij} \mid 1 \le i, j \le n \}. \]

(The maximum of finitely many functions is defined by applying the definition of 5.2 recursively; alternatively, take the maximum of the values at one point at a time.) Then $g_n \nearrow g$ for some $g$. Since

\[ g_n(x) = f_{ij}(x) \le f_i(x) \quad \text{for some } i, j \le n \]

we have

\[ g_n \le f_n \le f. \tag{1} \]

On the other hand, for $k \ge n$ we have $g_k \ge f_{nk}$ and hence

\[ g \ge f_n. \tag{2} \]

By (1) and (2), $g_n \nearrow f$.

Regarding the value of $If$, by (2), $If = Ig \ge If_n$ and hence $If \ge \lim If_n$; on the other hand, by (1), $If = \lim Ig_n \le \lim If_n$. □

7 A definition of the Lebesgue integral and an important lemma

In this section, we will define the well-known Lebesgue integral by the method of Daniell. This approach differs from the original Lebesgue construction based on defining a measure first. Here we will obtain measure later as a consequence of an already defined integral. We will see in Chapter 5 that the basic properties of measure will follow practically for free.

7.1

For an arbitrary function $f : \mathbb{R}^n \to [-\infty, \infty]$, let

\[ \underline{\int} f = \sup\{ Ig \mid g \le f,\ g \in Z^{dn} \} \quad \text{and} \]

\[ \overline{\int} f = \inf\{ Ig \mid g \ge f,\ g \in Z^{up} \}. \]

$\underline{\int} f$ resp. $\overline{\int} f$ is called the lower resp. upper (Lebesgue) integral of $f$.

Remark. This notation will not interfere with the notation for the lower and upper Riemann integral introduced in 1.2 and used through Section 4. While the meanings of both notations are in fact different, we will not encounter the lower and upper Riemann integral any longer (with the exception of the Exercises).

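7.2 Proposition. (1) $\underline{\int} f = \sup\{ Ig \mid g \le f,\ g \in Z^* \}$ and $\overline{\int} f = \inf\{ Ig \mid g \ge f,\ g \in Z^* \}$.

(2) $\underline{\int} f \le \overline{\int} f$.

(3) If $f \le g$ then $\underline{\int} f \le \underline{\int} g$ and $\overline{\int} f \le \overline{\int} g$.

Proof. (1) Assume that, say, the second equality does not hold. Then there exists a $g \ge f$, $g \in Z^{dn}$ such that $Ig < \overline{\int} f$. Let $g_n \searrow g$ with $g_n \in Z$. Then there has to be a $k$ such that $Ig_k < \overline{\int} f$. This is a contradiction, since $g_k \ge g \ge f$ and $g_k \in Z \subseteq Z^{up}$.

(2) and (3) are trivial. □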


7.3

From 7.2 (1), we immediately obtain the following

Corollary. For $f \in Z^*$ we have $\underline{\int} f = \overline{\int} f = If$.

7.4

Denote by

\[ L \]

the set of all functions $f$ such that $\underline{\int} f = \overline{\int} f$ and such that the common value is finite. Such functions are called (Lebesgue) integrable, the common finite value is called the Lebesgue integral of $f$ and denoted by

\[ \int f. \]

We will keep this notation for a while to distinguish the Lebesgue integral from the types of integral developed earlier. Note, however, that in practice, other notations are also common, for example, if $x_1, \dots, x_n$ are the standard coordinates in $\mathbb{R}^n$, one commonly writes

\[ \int f(x_1, \dots, x_n)\,dx_1 \dots dx_n \quad \text{or} \quad \int f(x)\,dx \]

for the Lebesgue integral also.

Remark. The assumption of finiteness of the common value is essential. Functions with infinite $\underline{\int} f = \overline{\int} f$ can in general misbehave. We will have functions with infinite Lebesgue integral later, but their class will have to be restricted; see 7.9 below.

7.5 Proposition. A function $f : \mathbb{R}^n \to [-\infty, \infty]$ satisfies $f \in L$ if and only if for every $\varepsilon > 0$ there exist $g_1 \in Z^{dn}$ and $g_2 \in Z^{up}$, $g_1 \le f \le g_2$, such that the $Ig_i$ are finite and $Ig_2 - Ig_1 < \varepsilon$.

Proof. The implication $\Rightarrow$ is obvious.

$\Leftarrow$: If the $g_i$ are as assumed in the statement, then

\[ Ig_1 \le \underline{\int} f \le \overline{\int} f \le Ig_2 \le Ig_1 + \varepsilon \]

so that $\overline{\int} f - \underline{\int} f$ is smaller than any $\varepsilon > 0$. □

7.6 Convention

Functions from $L$ can have infinite values. Let us agree that in case of $f(x) = +\infty$ and $g(x) = -\infty$ the value $f(x) + g(x)$ will be chosen arbitrarily. We will see that for our purposes such arbitrariness in the definition of $f + g$ does not matter.

7.7 Proposition. (1) If $f, g \in L$ then $f + g \in L$ and one has

\[ \int (f + g) = \int f + \int g. \]

(2) If $f \in L$ then for any $\alpha \in \mathbb{R}$, $\alpha f \in L$ and one has

\[ \int \alpha f = \alpha \int f. \]

(3) If $f, g \in L$ then $\max(f, g) \in L$ and $\min(f, g) \in L$.
(4) If $f, g \in L$ and $f \le g$ then $\int f \le \int g$.
(5) If $f \in L$ then $f^+, f^- \in L$.
(6) If $f \in L$ then $|f| \in L$ and $|\int f| \le \int |f|$.

Proof. (1) We shall use 7.5. Choose $f_1, g_1 \in Z^{dn}$ and $f_2, g_2 \in Z^{up}$ such that $f_1 \le f \le f_2$, $g_1 \le g \le g_2$ and $If_2 - If_1 < \varepsilon$, $Ig_2 - Ig_1 < \varepsilon$. Then

\[ f_1 + g_1 \le f + g \le f_2 + g_2 \tag{*} \]

and the statement follows (realize that the inequalities hold also at the ambiguous points mentioned in the convention of 7.6: if, say, $f(x) = +\infty$ and $g(x) = -\infty$ then $f_2(x) = +\infty$ and $g_1(x) = -\infty$; $f_1(x)$ has to be smaller than $+\infty$, as a limit of a decreasing sequence of finite numbers, and similarly $g_2(x) > -\infty$, so that the inequalities (*) are satisfied trivially).

(2) follows immediately from 7.5.

(3) Take the $f_i, g_i$ as in (1) to obtain

\[ \max(f_1, g_1) \le \max(f, g) \le \max(f_2, g_2) \quad \text{and} \]

\[ \min(f_1, g_1) \le \min(f, g) \le \min(f_2, g_2) \]

and realize that

\[ \max(f_2, g_2) - \max(f_1, g_1) \le (f_2 - f_1) + (g_2 - g_1). \]

Similarly for the minimum.

(4) is obvious and (5) follows from (3).

(6) $|\int f| = |\int (f^+ - f^-)| = |\int f^+ - \int f^-| \le \int f^+ + \int f^- = \int |f|$. □

7.8 Lemma. If $f_n \in L$ and if $f_n \nearrow f$ then

\[ \lim \int f_n = \overline{\int} f. \]

Remarks before the proof.
1. This lemma is very important and will play a crucial role below.
2. As $\int f_n \le \underline{\int} f$, we have trivially $\lim \int f_n \le \underline{\int} f$. Hence, under the assumptions of the lemma, we have

\[ \lim_n \int f_n = \underline{\int} f = \overline{\int} f. \]

Proof. We obviously have $\lim \int f_n \le \overline{\int} f$, and if $\lim \int f_n = +\infty$ the equality is trivial.

Thus, we can assume that the limit is finite. By the definition of $\overline{\int} f_n$ choose $g_n \in Z^{up}$, $g_n \ge f_n$ such that

\[ \int f_n + \frac{\varepsilon}{2^{n+1}} > Ig_n. \]

Set $h_n = \max\{ g_i \mid i = 1, \dots, n \}$. Then $h_n \in Z^{up}$ and the sequence $h_n$ is increasing so that by 6.5, $h = \lim h_n \in Z^{up}$. Now $h_n \ge g_n \ge f_n$ and hence $h \ge f$, and $Ih \ge \overline{\int} f$.

Here is an important

Claim.

\[ h_n - f_n \le (g_1 - f_1) + (g_2 - f_2) + \cdots + (g_n - f_n). \]

(Indeed, at each point $x$, we have $g_j(x) - f_j(x) = h_n(x) - f_j(x)$ for some $j \le n$. The summands are non-negative, and hence the inequality holds for $j = n$; otherwise the sum is greater than or equal to $h_n(x) - f_j(x) + g_n(x) - f_n(x) = h_n(x) - f_n(x) + g_n(x) - f_j(x) \ge h_n(x) - f_n(x) + g_n(x) - f_n(x) \ge h_n(x) - f_n(x)$.)

Thus we have

\[ Ih_n - \int f_n \le \sum_{i=1}^{n} \frac{\varepsilon}{2^{i+1}} < \varepsilon \]

so that $Ih_n \le \int f_n + \varepsilon$ and finally $\overline{\int} f \le Ih = \lim Ih_n \le \lim \int f_n + \varepsilon$. □

7.9 Some more notation

Set

\[ L^{up} = \{ f \mid \exists f_n \in L,\ f_n \nearrow f \}, \quad L^{dn} = \{ f \mid \exists f_n \in L,\ f_n \searrow f \}, \quad \text{and} \]

\[ L^* = L^{up} \cup L^{dn}. \]

Now we obtain from 7.8 the following

7.9.1 Corollary. For each $f \in L^*$ we have $\underline{\int} f = \overline{\int} f$. Consequently,

\[ L^{up} \cap L^{dn} = L. \]

7.9.2 Convention

For $f \in L^*$ we will use the symbol $\int f$ for the common value of $\underline{\int} f$ and $\overline{\int} f$, even when it is infinite. However, we will not refer to such functions as integrable.

7.9.3 Proposition. If $f \in L^*$ and if the integral $\int f$ from 7.9.2 is finite then $f \in L$ and the integral coincides with the standard integral in $L$.

Proof. Let, say, $f \in L^{up}$, let $f_n \nearrow f$ with $f_n \in L$. Then by Lemma 7.8 and part 2 of the Remarks in 7.8, $\overline{\int} f = \lim \int f_n = \underline{\int} f$, and this common value is finite by assumption. □

8 Sets of measure zero; the concept of “almost everywhere”

8.1

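The characteristic function of a subset $M \subseteq \mathbb{R}^m$ will be denoted by

\[ c_M \]

(that is, $c_M(x) = 1$ if $x \in M$ and $c_M(x) = 0$ otherwise). We have

\[ M \subseteq N \ \text{if and only if}\ c_M \le c_N, \]

\[ c_{M \cup N} = \max(c_M, c_N) \quad \text{and} \quad c_{M \cap N} = \min(c_M, c_N), \]

and if $M_1 \subseteq M_2 \subseteq \cdots \subseteq M_n \subseteq \cdots$, $M = \bigcup_{n=1}^{\infty} M_n$, then

\[ c_{M_n} \nearrow c_M. \]

$M$ is a set of measure zero if $\overline{\int} c_M = 0$ (then, since $c_M \ge 0$, we also have $\underline{\int} c_M = 0$ and hence $c_M \in L$).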

8.2 Proposition. (1) If $M$ is a set of measure zero and $N \subseteq M$ then $N$ is a set of measure zero.

(2) If $M_n$ are sets of measure zero then also $\bigcup_{n=1}^{\infty} M_n$ is a set of measure zero.

Proof. (1) is trivial. For (2), write $M = \bigcup_{n=1}^{\infty} M_n$ and consider $N_n = M_1 \cup \cdots \cup M_n$. Then $c_{N_n} \le c_{M_1} + \cdots + c_{M_n}$ and hence $N_n$ is a set of measure zero by 7.7. Now $c_{N_n} \nearrow c_M$ and hence $\int c_M = 0$ by 7.8. □

8.3

Let $V(x)$ be a statement about points in $\mathbb{R}^m$. We say that

$V$ holds almost everywhere (briefly, a.e.)

if the set

\[ \{ x \mid \text{not } V(x) \} \]

is a set of measure zero.

If $f(x) = g(x)$ almost everywhere, we will write

\[ f \sim g. \]

8.4 Proposition. (1) If $f \in L$ then $f(x)$ is finite almost everywhere.
(2) If $f \in L^{up}$ (resp. $L^{dn}$) then $f(x) > -\infty$ (resp. $< +\infty$) almost everywhere.

Proof. (1) Recall the convention on sums in 7.6, and Proposition 7.7 (1). We may define $f + (-f)$ equally well as 0 or as $c_M$ where $M = \{ x \mid f(x) = \pm\infty \}$ and hence $\int c_M = \int 0 = 0$.

(2) When $f \in L^{up}$, take $f_n \in L$ with $f_n \nearrow f$. Then $\{ x \mid f(x) = -\infty \} \subseteq \{ x \mid f_1(x) = \pm\infty \}$ and the latter set is a set of measure zero by (1). The case of $f \in L^{dn}$ is analogous. □

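8.5 Proposition. If $f \sim g$ then $\underline{\int} f = \underline{\int} g$ and $\overline{\int} f = \overline{\int} g$.

Proof. We will consider the case of $\overline{\int}$ (the other case is analogous). If we do not have $\overline{\int} f = \overline{\int} g = +\infty$ we can assume that $\overline{\int} f < +\infty$. Set $M = \{ x \mid f(x) \ne g(x) \}$ and $r_n = n \cdot c_M$. By 7.8 we have $\int r = 0$ for $r = \lim r_n$.

Choose $h_1, h_2 \in Z^{up}$ such that $h_1 \ge f$, $h_2 \ge r$, $Ih_1 < \overline{\int} f + \varepsilon$ and $Ih_2 < \varepsilon$. Then we have $h_1 + h_2 \in Z^{up}$, $h_1 + h_2 \ge g$, and hence $\overline{\int} g \le Ih_1 + Ih_2 < \overline{\int} f + 2\varepsilon$.

Thus, $\overline{\int} g \le \overline{\int} f$, in particular $\overline{\int} g < +\infty$, and we can repeat the procedure with $f, g$ interchanged. □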

8.6 Corollary. (1) If $f \in L$ and $f \sim g$ then $g \in L$.
(2) If $f \in L^{up}$ resp. $L^{dn}$ and $f \sim g$ then $g \in L^{up}$ resp. $L^{dn}$.

8.7 Proposition. If $f \ge 0$ and $\int f = 0$ then $f \sim 0$.

Proof. Set $M_n = \{ x \mid f(x) \ge \frac{1}{n} \}$. Since $0 \le c_{M_n} \le nf$ we have $\int c_{M_n} = 0$, hence $M_n$ is a set of measure zero, and consequently $\{ x \mid f(x) \ne 0 \} = \bigcup_{n=1}^{\infty} M_n$ is a set of measure zero. □

9 Exercises

(1) Prove Proposition 1.3.

(2) Prove Proposition 2.2.

(3) Prove the second equality in Theorem 3.1.

(4) Prove that the lower Riemann integral of a bounded function on an interval in $\mathbb{R}^n$ is always less than or equal to the lower Lebesgue integral, and that the upper Riemann integral is always greater than or equal to the upper Lebesgue integral. Conclude that a Riemann integrable function on an interval is Lebesgue integrable and that both integrals are equal.

(5) Prove by definition that the Lebesgue integral of the function equal to $x^{-q}$ on $\langle 0, b \rangle$ and 0 elsewhere, where $0 < q < 1$, $b > 0$ are constants, exists, and compute it. [Hint: Consider the functions equal to $x^{-q}$ on $\langle a, b \rangle$ where $0 < a < b$ and 0 elsewhere.]

(6) Prove that the Lebesgue integral of the function $f(x) = \frac{1}{1 + x^2}$ exists and compute it. [Hint: see the hint to Exercise (5).]

(7) Prove that the function which is equal to 1 on every irrational number in $\langle 0, 1 \rangle$ and 0 elsewhere is Lebesgue integrable and calculate its Lebesgue integral.

(8) Prove that the Cantor set of Exercise (19) in Chapter 2 has measure 0. [Hint: Express its characteristic function as an appropriate monotone limit.]

(9) By a generalized Cantor set, we shall mean the intersection $S = \bigcap S_i$ of sets $S_0 \supseteq S_1 \supseteq S_2 \supseteq \dots$ constructed as follows: We put $S_0 = \langle 0, 1 \rangle$. The set $S_n$ is a union of $2^n$ closed intervals $\langle a_i, b_i \rangle$, $i = 1, \dots, 2^n$, and for some number $\varepsilon_n > 0$, $\varepsilon_n < \frac{b_i - a_i}{2}$, we have

\[ S_{n+1} = S_n \setminus \bigcup_{i=1}^{2^n} \Big( \frac{a_i + b_i}{2} - \varepsilon_n,\ \frac{a_i + b_i}{2} + \varepsilon_n \Big). \]

(a) Prove that there exist generalized Cantor sets which are not of measure 0.
(b) Derive a necessary and sufficient condition (in terms of the numbers $\varepsilon_i$) for the set $S$ to be of measure 0.

(10) (a) Prove that for two generalized Cantor sets $S$, $T$, there exists a monotone homeomorphism $\varphi : \langle 0, 1 \rangle \to \langle 0, 1 \rangle$ such that $\varphi[S] = T$. [Hint: Construct such a map with $S$, $T$ replaced by $S_n$, $T_n$ and prove that the sequence of those maps converges uniformly. Use a separate argument to show that the limit is monotone.]
(b) Conclude that for a homeomorphism $\langle 0, 1 \rangle \to \langle 0, 1 \rangle$, a continuous image of a set of measure 0 may not be of measure 0.

(11) Let $f : \mathbb{R} \to \langle 0, 1 \rangle$ be defined as follows: If $x$ is irrational, then $f(x) = 0$. If $x = a/b$ where $a \in \mathbb{Z}$, $b \in \mathbb{N}$ and the greatest common divisor of $a$ and $b$ is 1, then $f(a/b) = 1/b$. Prove that $f$ is continuous almost everywhere. [Hint: Try to guess the set of all points at which $f$ is continuous.]

5 Integration II: Measurable Functions, Measure and the Techniques of Lebesgue Integration

1 Lebesgue’s Theorems

1.1 Theorem. (Lebesgue’s Monotone Convergence Theorem) Let $f_n \in L^{up}$ and let $f_n \nearrow f$ a.e. Then $f \in L^{up}$ and $\int f = \lim \int f_n$. Similarly for $f_n \in L^{dn}$ and $f_n \searrow f$.

Proof. Let us treat the case $f_n \nearrow f$, $f_n \in L^{up}$; the other case is analogous. Choose $f_{nk} \in L$ such that $f_{nk} \nearrow_k f_n$ and set

\[ g_n = \max\{ f_{ij} \mid i, j \le n \}. \]

Now $g_n \nearrow g$ with $g_n \in L$. Since $g_n \le f$ we have $g \le f$. On the other hand, however, $g_p \ge f_{np}$ for $p \ge n$ and hence $g \ge f_n$, and finally $g \ge f$. Thus, $f = g \in L^{up}$.

Now consider the value of $\int f$. If $\lim \int f_n = +\infty$ the equality is trivial; hence we can assume that $\lim \int f_n$ is finite. Then $f_n \in L$ and we can use 7.8 of Chapter 4 to obtain $\lim \int f_n = \overline{\int} f = \int f$. □

Remark. This statement is also known as Levi’s Theorem.

1.2 Theorem. (Lebesgue’s Dominated Convergence Theorem) Let $f_n \in L$. Assume $\lim f_n(x) = f(x)$ a.e., and let there exist a $g \in L$ such that $|f_n(x)| \le g(x)$ a.e. Then $f \in L$ and $\int f = \lim \int f_n$.

Remark. The attentive reader may worry about the seemingly sloppy formulation: does one mean “almost everywhere one has that for all $n$ that $|f_n(x)| \le g(x)$” or “for each $n$ one has that $|f_n(x)| \le g(x)$ almost everywhere”? But it is an easy exercise (Exercise (1)) to show these two statements are equivalent.

Proof. By 8.5 of Chapter 4, we may omit “almost everywhere” from the assumptions.

Set

\[ h_n = \sup\{ f_k \mid k \ge n \}, \quad g_n = \inf\{ f_k \mid k \ge n \}. \]

Since $\max_{j = 0, \dots, p} f_{n+j} \nearrow_p h_n$ we have $h_n \in L^{up}$, and similarly $g_n \in L^{dn}$. But we have, moreover,

\[ -g \le g_n \le f_n \le h_n \le g \]

and hence $\int g_n$ and $\int h_n$ are finite and we have in fact $g_n, h_n \in L$, and consequently $g_n \in L^{up}$ and $h_n \in L^{dn}$ and we can use Lebesgue’s Monotone Convergence Theorem. Now obviously $g_n \nearrow f$ and $h_n \searrow f$, by Lebesgue’s Monotone Convergence Theorem we have $\lim \int g_n = \lim \int h_n = \int f$, and finally since $g_n \le f_n \le h_n$ we conclude that $\int f = \lim \int f_n$. □
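A simple sequence to which the theorem applies, while the uniform convergence theorem (4.1 of Chapter 4) does not, is $f_n(x) = x^n$ on $\langle 0, 1 \rangle$ and 0 elsewhere. This example is an illustration added here, not taken from the text. We have $|f_n| \le c_{\langle 0, 1 \rangle}$, which is integrable, and $f_n(x) \to 0$ for every $x \ne 1$, that is, almost everywhere; the convergence on $\langle 0, 1 \rangle$ is not uniform since the pointwise limit is discontinuous at $x = 1$. The sketch below compares the exact integrals $\frac{1}{n+1}$ with midpoint sums; the function names and the choice of the midpoint rule are assumptions.

```python
from fractions import Fraction


def exact_integral(n):
    """Integral of x**n over <0, 1>, computed from the antiderivative."""
    return Fraction(1, n + 1)


def midpoint_integral(n, m=2000):
    """Midpoint sum of x**n over <0, 1> with m equal subintervals."""
    h = 1.0 / m
    return sum(((k + 0.5) * h) ** n for k in range(m)) * h


if __name__ == "__main__":
    for n in (1, 5, 20, 100):
        print(n, float(exact_integral(n)), midpoint_integral(n))
```

The integrals decrease to 0, which is the integral of the almost everywhere limit, in agreement with the theorem.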

1.3 Proposition. Let $g \in L$, let $f_n \in L^*$, let $f_n \ge g$ a.e. and let $\lim_n f_n(x) = f(x)$ a.e. Then $f \in L^{up}$. Similarly for $f_n \le g$ we obtain $f \in L^{dn}$.

Proof. Since $-\infty < \int g \le \int f_n$, $f_n \in L^{up}$ (if $f_n \in L^{dn}$ it has, hence, a finite integral so that, by 7.9.3 of Chapter 4, $f_n \in L \subseteq L^{up}$ as well). Set $\varphi = \sup_n f_n$. We have $\max_{k \le n} f_k \nearrow_n \varphi$ and hence $\varphi \in L^{up}$ by 1.1, and there exist $\varphi_n \in L$ such that $\varphi_n \nearrow \varphi$. Obviously $\varphi \ge f \ge g$ and we can assume that $\varphi_n \ge g$ (else replace $\varphi_n$ by $\max(\varphi_n, g)$). Set

\[ g_{kn} = \min(\varphi_k, f_n). \]

We have $g \le g_{kn} \le \varphi_k$ and hence $g_{kn} \in L$ and, moreover, we can use Lebesgue’s Dominated Convergence Theorem for $\lim_n g_{kn}$ and obtain

\[ \min(\varphi_k, f) = \lim_n g_{kn} \in L. \]

Now we conclude that $\min(\varphi_k, f) \nearrow_k f$ and hence $f \in L^{up}$. □

2 The class Λ (measurable functions)

2.1

As before, $\lim_n f_n = f$ will be abbreviated by writing $f_n \to f$. Let

\[ \Lambda = \{ f \mid \exists f_n \in L,\ f_n \to f \} \]

(unlike in the definition of $L^{up}$ and $L^{dn}$ there is no assumption on the nature of the convergence). Functions which belong to $\Lambda$ are called (Lebesgue) measurable.

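2.2 Proposition. If $f \sim g$ and $f \in \Lambda$ then $g \in \Lambda$.

Proof. Let $f_n \in L$ and $f_n \to f$. Define $M = \{ x \mid f(x) \ne g(x) \}$ and set

\[ g_n(x) = g(x) \ \text{for } x \in M, \quad g_n(x) = f_n(x) \ \text{otherwise}. \]

Then by 8.6 of Chapter 4, $g_n \in L$, and $g_n \to g$. □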

2.3

From 1.3, we immediately see the following

Corollary. If $f \in \Lambda$ and $f \ge 0$ then $f \in L^{up}$.

2.4

The following is trivial.

Proposition. (a) If $f, g \in \Lambda$ and if $f + g$ makes sense a.e. then $f + g \in \Lambda$.
(b) If $f \in \Lambda$ and $\alpha \in \mathbb{R}$ then $\alpha f \in \Lambda$.
(c) If $f, g \in \Lambda$ then $\max(f, g), \min(f, g) \in \Lambda$.
(d) If $f \in \Lambda$ then $|f| \in \Lambda$.

2.5 Proposition. $f \in \Lambda$ if and only if both $f^+$ and $f^-$ are in $L^{up}$.

Proof. If $f_n$ are in $L$ and $f_n \to f$ then obviously $f_n^+ \to f^+$ and $f_n^- \to f^-$. Use 2.3. The other implication is trivial. □

2.5.1 Corollary. Let $f \in \Lambda$ and let there exist a $g \in L$ such that $|f| \le g$. Then $f \in L$.

2.6 Proposition. If $f_n \in \Lambda$ and if $f_n \to f$ a.e. then $f \in \Lambda$.

Proof. We have $f_n^+, f_n^- \in L^{up}$ and $f_n^+ \to f^+$, $f_n^- \to f^-$. Thus, by 1.3, both $f^+$ and $f^-$ are in $L^{up}$. □

2.7 Proposition. $f \in L^*$ if and only if $f^+$ and $f^-$ are in $L^{up}$ and if the difference $\int f^+ - \int f^-$ makes sense.

Consequently, $f \in \Lambda \setminus L^*$ if and only if $f^+$ and $f^-$ are in $L^{up}$ and $\int f^+ = \int f^- = +\infty$.

Proof. $\Rightarrow$: Let, say, $f \in L^{up}$ and let $f_n \nearrow f$ and $f_n \in L$. As $f_1 = f_1^+ - f_1^- \le f = f^+ - f^-$ we have $f^- \le f_1^- \in L$ and hence the value of $\int f^-$ is finite.

$\Leftarrow$: If $\int f^+ - \int f^-$ makes sense then at least one of the integrals is finite and either $f^+$ or $f^-$ is in $L$. Thus, $f^+ - f^-$ is either in $L^{up}$ or in $L^{dn}$. □

2.8 Remark

Some of the statements proved in this section may be somewhat surprising. It turned out, for example, that for integrability of a limit of integrable functions, the nature of the limiting process is not very important: all one needs is that the positive and negative parts of the limit not both have infinite integrals.

For the value of the integral of the limit, on the other hand, the nature of the convergence obviously matters a great deal.

3 The Lebesgue measure

3.1

A set $A \subseteq \mathbb{R}^m$ is said to be (Lebesgue) measurable if the characteristic function $c_A$ is in $\Lambda$ (then, of course, it is in $L^{up}$, by 2.3). We put

\[ \mu(A) = \int c_A \]

and call $\mu(A)$ the (Lebesgue) measure of $A$.

Note that this terminology is in accordance with 8.4 of Chapter 4 (see Exercise (4)).

3.2 General facts

If $A, B \subseteq \mathbb{R}^m$ are measurable, $A \cup B$ is measurable and

\[ \mu(A \cup B) \le \mu(A) + \mu(B) \]

(by 2.4, we have $c_{A \cup B} = \max(c_A, c_B)$ $(\le c_A + c_B)$ in $\Lambda$) and if $A, B$ are disjoint then

\[ \mu(A \cup B) = \mu(A) + \mu(B) \tag{3.2.1} \]

as then $c_{A \cup B} = c_A + c_B$.

But we have much more: the measure is countably additive ($\sigma$-additive, as this fact is usually referred to). Here are some facts on measurability.

Proposition. (1) Let $A_n$, $n = 1, 2, \dots$, be measurable sets. Then $\bigcup_{n=1}^{\infty} A_n$ is measurable. If for any two distinct $n, k$ the intersection $A_n \cap A_k$ is a set of measure zero then

\[ \mu\Big( \bigcup_{n=1}^{\infty} A_n \Big) = \sum_{n=1}^{\infty} \mu(A_n). \]

(2) The intersection of a countable system of measurable sets is measurable.
(3) If $A, B$ are measurable then the difference $A \setminus B$ is measurable.
(4) $\mu(\emptyset) = 0$ and for measurable sets $A \subseteq B$, $\mu(A) \le \mu(B)$.

Proof. (1) We have

\[ c_{A_1 \cup \cdots \cup A_n} \nearrow_n c_{\bigcup_{n=1}^{\infty} A_n} \]

and hence $c_{\bigcup_{n=1}^{\infty} A_n} \in L^{up}$. In the almost disjoint case we obtain the value from the finite additivity (3.2.1) and from Lebesgue’s Monotone Convergence Theorem.

(3) $c_{A \setminus B} = \max(c_A - c_B, 0)$. (See 2.4.)
(2) From (1), (3).
(4) is trivial. □

3.3 Special sets

Proposition. (1) Every open set in Rm is measurable.(2) Every closed set in Rm is measurable.(3) For the interval J D ha1; b1i � � � ham; bmi, one has

�.J / D .b1 � a1/.b2 � a2/ � � � .bn � an/:

(4) Every countable set is measurable, with measure 0.

Proof. The Euclidean distance in $\mathbb{R}^m$ will be denoted by $\rho(x, y)$.

(1) It suffices to show that bounded open sets are measurable: for a general open $U$ consider the open balls $B_n = \{x \mid \rho(x, (0, \dots, 0)) < n\}$ and use Proposition 3.2 (1) for $U = \bigcup_n (U \cap B_n)$.

Thus, let $U$ be a bounded open set. Set

\[ A_n = \{x \mid \rho(x, \mathbb{R}^m \setminus U) \ge \tfrac{1}{n}\} \]

and define $f_n : \mathbb{R}^m \to \mathbb{R}$ by

\[ f_n(x) = \frac{\rho(x, \mathbb{R}^m \setminus U)}{\rho(x, \mathbb{R}^m \setminus U) + \rho(x, A_n)}. \]

Since $A_n$ and $\mathbb{R}^m \setminus U$ are disjoint closed sets, $f_n$ is a continuous map. Since $f_n(x) = 0$ for $x \notin U$, we have $f_n \in \mathsf{Z} \subseteq \Lambda$. Now if $x \in U$ then $\rho(x, \mathbb{R}^m \setminus U) \ge \frac{1}{n_0}$ for some $n_0$ and hence $x \in A_n$, and $f_n(x) = 1$, for all $n \ge n_0$. Thus,

\[ f_n \to c_U \]

and $c_U \in \Lambda$.

(2) Use (1) and 3.2 (3).

(3) Note that for a bounded closed set $C$ we can use a similar procedure as in (1):

this time set

\[ A_n = \{x \mid \rho(x, C) \ge \tfrac{1}{n}\} \]

and define $f_n : \mathbb{R}^m \to \mathbb{R}$ by

\[ f_n(x) = \frac{\rho(x, A_n)}{\rho(x, A_n) + \rho(x, C)}. \]

Now obviously $f_n(x) = 1$ for $x \in C$ and $f_n(x) = 0$ for $\rho(x, C) \ge \frac{1}{n}$. Furthermore, if $k \ge n$ then $\rho(x, A_k) \le \rho(x, A_n)$, and $f_k(x) \le f_n(x)$. Thus,

\[ f_n \searrow c_C. \]

In particular this holds for the interval $J$. Moreover, $f_n(x) = 0$ outside

\[ \langle a_1 - \tfrac{1}{n}, b_1 + \tfrac{1}{n}\rangle \times \cdots \times \langle a_m - \tfrac{1}{n}, b_m + \tfrac{1}{n}\rangle \]

and $0 \le f_n(x) \le 1$ so that by the standard estimate of Riemann integrals

\[ (b_1 - a_1) \cdots (b_m - a_m) \le \int f_n \le (b_1 - a_1 + \tfrac{2}{n}) \cdots (b_m - a_m + \tfrac{2}{n}) \]

and $\int c_J = (b_1 - a_1)(b_2 - a_2) \cdots (b_m - a_m)$ by Lebesgue's Monotone Convergence Theorem (actually already by Dini's Theorem).

(4) By (3), $\mu(\{x\}) = 0$. Use 3.2 (1). □


3.4 The set B of Borel sets

The smallest class of subsets of $\mathbb{R}^m$ containing all open subsets and closed under
• complements,
• countable unions, and
• countable intersections
(of course, the last follows from the first two) is called the class of Borel sets, and denoted by $\mathcal{B}$.

Thus, all the open and closed sets are Borel. However, we have more complicated sets. For example, an $F_\sigma$ set is a countable union of closed subsets, and a $G_\delta$ set is an intersection of countably many open subsets. Going on, a $G_{\delta\sigma}$ set is a union of countably many $G_\delta$ sets, and an $F_{\sigma\delta}$ set is an intersection of countably many $F_\sigma$ sets, and so on. All sets produced in this way are Borel by definition.

From 3.2 and 3.3 we immediately obtain

3.4.1 Corollary. Every Borel set is measurable.

3.5

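Let us conclude this section with a trivial remark. From 3.2 (1) and 2.2 (1), we immediately obtain the frequently used somewhat paradoxical observation that for every $\varepsilon > 0$, there exists a dense open set $U$ of the unit interval $I$ such that $\mu(U) < \varepsilon$: order all the rationals in $I$ in a sequence $r_1, r_2, \dots, r_n, \dots$ and set

\[ U = \bigcup_{n=1}^{\infty} \Big(r_n - \frac{\varepsilon}{2^{n+2}},\ r_n + \frac{\varepsilon}{2^{n+2}}\Big) \]

(where $(a, b)$ designate open intervals). The lengths of these intervals add up to $\sum_n \varepsilon/2^{n+1} = \varepsilon/2$, which bounds $\mu(U)$.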
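The construction can be tried out on finitely many rationals. The following is only a sketch under our own assumptions: the enumeration by increasing denominator, the helper names, and the radii $\varepsilon/2^{n+2}$ are choices made for illustration. It computes the exact measure of the union of the first 200 intervals with rational arithmetic.

```python
from fractions import Fraction


def rationals_in_unit_interval(count):
    """First `count` rationals of [0, 1], listed by increasing denominator."""
    seen, out = set(), []
    q = 1
    while len(out) < count:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(r)
                if len(out) == count:
                    break
        q += 1
    return out


def measure_of_union(intervals):
    """Lebesgue measure of a finite union of intervals, by merging overlaps."""
    total, cur_lo, cur_hi = Fraction(0), None, None
    for lo, hi in sorted(intervals):
        if cur_hi is None or lo > cur_hi:
            if cur_hi is not None:
                total += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
        else:
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        total += cur_hi - cur_lo
    return total


def small_cover(eps, count):
    """Intervals (r_n - eps/2^(n+2), r_n + eps/2^(n+2)) for n = 1..count."""
    rs = rationals_in_unit_interval(count)
    return [(r - eps / 2 ** (n + 2), r + eps / 2 ** (n + 2))
            for n, r in enumerate(rs, start=1)]


eps = Fraction(1, 10)
cover = small_cover(eps, 200)
mu = measure_of_union(cover)
print(float(mu), "<=", float(eps / 2))
```

However many rationals are included, the measure stays below $\varepsilon/2$, although every rational of $I$ eventually gets covered.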

4 The integral over a set

4.1

Unlike the additivity of the classes $\mathsf{L}$ etc., we do not have similarly well behaved multiplicativity properties. Nevertheless, multiplying by characteristic functions $c_M$ of $M$ measurable does give satisfactory results.

Proposition. Let $M$ be a measurable set and let $f \in \mathsf{L}$. Then $c_M \cdot f \in \mathsf{L}$.

Proof. Put $\varphi_n = \min(n c_M, \max(f, -n c_M))$. Then $\varphi_n \in \Lambda$ and since $|\varphi_n| \le |f|$ we have $c_M f = \lim \varphi_n$ in $\mathsf{L}$ by 2.5.1. □


4.2

By 4.1, we can define for a measurable set $M$ and $f \in \mathsf{L}$,

\[ \int_M f \overset{\mathrm{df}}{=} \int c_M f, \]

the integral of $f$ over $M$.

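4.3 Proposition. Let $M_n$, $n = 1, 2, \dots$ be measurable.

(a) Let for $n \ne k$, $M_n, M_k$ be almost disjoint (i.e. $\mu(M_n \cap M_k) = 0$), $f \in \Lambda$, and assume that for $M = \bigcup M_n$, $\int_M f$ makes sense. Then

\[ \int_M f = \sum_{n=1}^{\infty} \int_{M_n} f. \]

(b) Let $M_1 \subseteq M_2 \subseteq \cdots$, $M = \bigcup M_n$ and assume that $\int_M f$ makes sense. Then

\[ \int_M f = \lim_n \int_{M_n} f. \]

(c) Let $M_1 \supseteq M_2 \supseteq \cdots$, $M = \bigcap M_n$ and assume that $\int_{M_1} f$ makes sense. Then

\[ \int_M f = \lim_n \int_{M_n} f. \]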

Proof. (a) For $f \ge 0$ the statement immediately follows from Lebesgue's Monotone Convergence Theorem and the fact that the sum formula obviously holds for finitely many $M_n$. Thus, we have the equality for $f^+$ and $f^-$. Now if $\int_M f$ makes sense then by 2.7 one of $\int_M f^+$, $\int_M f^-$ is finite, and hence at least one of the series $\sum_{n=1}^{\infty} \int_{M_n} f^+$, $\sum_{n=1}^{\infty} \int_{M_n} f^-$ converges, and since the summands are non-negative, it converges absolutely. Thus,

\[ \int_M f = \int_M f^+ - \int_M f^- = \sum_{n=1}^{\infty} \int_{M_n} f^+ - \sum_{n=1}^{\infty} \int_{M_n} f^- = \sum_{n=1}^{\infty} \int_{M_n} (f^+ - f^-), \]

the last reshuffling being made possible by the absolute convergence of at least one of the series (and the other's being a sum of non-negative numbers).

(b) Apply (a) for $M_1, M_2 \setminus M_1, M_3 \setminus M_2, \dots$.

(c) Set $N_n = M_1 \setminus M_n$. Then $M = M_1 \setminus \bigcup N_n$. Use (b). □


4.3.1 Remark

For the general statement, the assumption that $\int_M f$ make sense is essential. The point is that we could have both $\int_M f^+$ and $\int_M f^-$ infinite.

4.4 Criteria of measurability

For many purposes, we need a criterion by which sets and functions are measurable. Let us begin with the following definition: For a Borel set $X \subseteq \mathbb{R}^m$, a function $f : X \to \langle -\infty, \infty\rangle$ is called Borel measurable if

For every Borel $S \subseteq \langle -\infty, \infty\rangle$, $f^{-1}[S]$ is Borel. (+)

Theorem. A function $f : X \to \langle -\infty, \infty\rangle$ is (Lebesgue) measurable if and only if there exists a Borel measurable function equal to $f$ almost everywhere.

4.4.1 Corollary. A subset $S \subseteq \mathbb{R}^m$ is measurable if and only if there exists a Borel set $B \subseteq \mathbb{R}^m$ such that $S \setminus B$ and $B \setminus S$ are sets of measure 0.

Comment: Note that since the inverse image preserves unions, intersections and complements, we may equivalently replace every Borel set $S$ in (+) by either every interval $\langle -\infty, a)$, $a \in \mathbb{R}$ or every interval $(a, \infty\rangle$, $a \in \mathbb{R}$.

Proof of the Theorem: We begin by considering the easy implication. First, suppose $f$ is Borel measurable. Then so are $f^+$ and $f^-$, so by Proposition 2.5, we may assume $f \ge 0$. Then define

\[ f_n(x) = \frac{k}{2^n} \quad \text{when} \quad \frac{k}{2^n} \le f(x) < \frac{k+1}{2^n}. \tag{*} \]

Then clearly

\[ f_n \nearrow f. \]

Further, each $f_n$ is an increasing limit of a sequence of functions each of which takes on only finitely many values, the inverse images of which are Borel, and hence measurable sets. Therefore $f_n \in \mathsf{L}^{\mathrm{up}}$, and hence $f \in \mathsf{L}^{\mathrm{up}}$.
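The dyadic truncation (*) is easy to experiment with. The sketch below is illustrative only: the sample function $f(x) = x^2$ and the helper name `dyadic_approx` are our own choices. It rounds values down to the grid of multiples of $2^{-n}$.

```python
import math


def dyadic_approx(value, n):
    """f_n(x) = k / 2**n, where k / 2**n <= f(x) < (k + 1) / 2**n and f(x) >= 0."""
    if math.isinf(value):
        # Convention of this sketch only; formula (*) covers finite values.
        return value
    return math.floor(value * 2 ** n) / 2 ** n


def f(x):
    return x * x


xs = [0.1 * i for i in range(31)]
approximations = [[dyadic_approx(f(x), n) for x in xs] for n in range(1, 12)]

for n, row in enumerate(approximations, start=1):
    worst = max(f(x) - v for x, v in zip(xs, row))
    print(n, worst)  # the gap f - f_n stays below 2**-n
```

Each $f_n$ lies below $f$, the sequence is non-decreasing in $n$, and the gap is less than $2^{-n}$ wherever $f$ is finite.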

Now a function equal to $f$ almost everywhere is measurable by Corollary 8.6 of Chapter 4.

To prove the converse implication, we first prove some lemmas.

4.4.2 Lemma. If $f_n \nearrow f$ or $f_n \searrow f$ and the functions $f_n$ are Borel-measurable, so is $f$.

Proof. Consider $f_n \nearrow f$ (the case of $f_n \searrow f$ clearly follows by taking negatives). Note that $f(x) > a$ if and only if there exists an $n$ such that $f_n(x) > a$, so $f^{-1}[(a, \infty\rangle] = \bigcup_n f_n^{-1}[(a, \infty\rangle]$, so our statement follows from the Comment. □


4.4.3 Lemma. If $f \ge 0$, $f$ is Borel measurable, and $\int f = 0$, then $f = 0$ almost everywhere.

Proof. Otherwise, $\mu(f^{-1}[(1/n, \infty\rangle]) > 0$ for some $n = 1, 2, \dots$. But then

\[ \int f \ge \int_{f^{-1}[(1/n, \infty\rangle]} f \ge \frac{1}{n}\, \mu(f^{-1}[(1/n, \infty\rangle]) > 0. \]

A contradiction. □

4.4.4 Lemma. If $f$, $g$ are Borel measurable, so is $-f$ and, if $f, g \ge 0$, also $f + g$.

Proof. The statement for $-f$ is immediate. For $f + g$, note that $(f + g)(x) < a$ if and only if there exist rational numbers $q$, $r$ such that $f(x) < q$, $g(x) < r$ and $q + r < a$ and thus, $(f + g)^{-1}[(-\infty, a)]$ is the (countable) union of the Borel sets $f^{-1}[(-\infty, q)] \cap g^{-1}[(-\infty, r)]$. □

Now let $f$ be measurable. Then by Lemma 4.4 of Chapter 4, and Proposition 2.5, it suffices to prove the statement for $f^+$, $f^-$, and hence, by Lemma 4.4.2, for $f \in \mathsf{L}$.

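When $f \in \mathsf{L}$, by 4.7.5, there exist $g_n \in \mathsf{Z}^{\mathrm{dn}}$ such that

\[ g_n \le g_{n+1} \le f \quad \text{and} \quad \int g_n \nearrow \int f. \]

Similarly, there exist $h_n \in \mathsf{Z}^{\mathrm{up}}$ such that

\[ h_n \ge h_{n+1} \ge f \quad \text{and} \quad \int h_n \searrow \int f. \]

By the Comment, functions in $\mathsf{Z}^{\mathrm{up}}$ and $\mathsf{Z}^{\mathrm{dn}}$ are clearly Borel-measurable, so if we put

\[ g = \lim g_n, \quad h = \lim h_n, \]

$g$ and $h$ are Borel-measurable functions,

\[ g \le f \le h \quad \text{and} \quad \int (h - g) = 0. \]

By Lemma 4.4.4, $h - g$ is Borel measurable. Let

\[ B = \{x \mid (h - g)(x) = 0\}. \]

By Lemma 4.4.3, $X \setminus B$ is a set of measure 0. Therefore, we can take $h$ as the Borel measurable function required by the statement. □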

4.5 Corollary. A function $f : \mathbb{R}^m \to \langle -\infty, \infty\rangle$ is measurable if and only if for every interval $B = (a, \infty\rangle$ (alternately, every interval $B = \langle -\infty, a)$), $f^{-1}[B]$ is measurable.

Proof. If $f$ is measurable then, by Theorem 4.4, it is equal to a Borel-measurable function almost everywhere, and hence clearly satisfies our condition by Proposition 8.2 (1) of Chapter 4.

If, on the other hand, $f$ satisfies our criterion then, as in the Comment above, $f^{-1}[B]$ is Lebesgue measurable for every Borel set $B$. As above, we may pass to the functions $f^+$ and $f^-$, and hence may assume that $f \ge 0$. Now the formula (*) again produces an increasing sequence of measurable functions converging to $f$, and hence $f$ is measurable. □

5 Parameters

5.1 Theorem. Let $T$ be a metric space, $t_0 \in T$, and let $f : T \times \mathbb{R}^m \to \mathbb{R} \cup \{+\infty, -\infty\}$ be a function such that
(1) for almost all $x$, $f(-, x)$ is continuous at the point $t_0$,
(2) there is a neighborhood $U$ of $t_0$ such that the functions $f(t, -)$ belong to $\mathsf{L}$ for all $t \in U \setminus \{t_0\}$, and
(3) there exists a $g \in \mathsf{L}$ and a neighborhood $U$ of $t_0$ such that for almost all $x$ and for all $t \in U \setminus \{t_0\}$ one has $|f(t, x)| \le g(x)$.
Then $f(t_0, -)$ is in $\mathsf{L}$ and we have

\[ \int f(t_0, -) = \lim_{t \to t_0} \int f(t, -). \]

Proof. Choose $t_n \in U \setminus \{t_0\}$ such that $\lim_n t_n = t_0$ and use the Lebesgue Dominated Convergence Theorem. □


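5.2 Theorem. Let $f : \mathbb{R} \times \mathbb{R}^m \to \mathbb{R} \cup \{+\infty, -\infty\}$ be such that in a neighborhood $U$ of $t_0$
(1) there exist partial derivatives $\dfrac{\partial f(t, x)}{\partial t}$ for almost all $x$,
(2) there exists a $g \in \mathsf{L}$ such that for almost all $x$ and for all $t \in U$ one has

\[ \left| \frac{\partial f(t, x)}{\partial t} \right| \le g(x), \]

(3) and for $t \in U$ there exist $\int f(t, -)$.
Then there exists the integral $\displaystyle\int \frac{\partial f(t_0, -)}{\partial t}$ and one has

\[ \int \frac{\partial f(t_0, -)}{\partial t} = \frac{d}{dt} \int f(t_0, -). \]

Proof. We have $\dfrac{\partial f(t_0, x)}{\partial t} = \lim_{h \to 0} \dfrac{1}{h}(f(t_0 + h, x) - f(t_0, x))$. Set $\varphi(h, x) = \dfrac{1}{h}(f(t_0 + h, x) - f(t_0, x))$. By Lagrange's Theorem we have, for some $\theta$ between 0 and 1,

\[ |\varphi(h, x)| = \left| \frac{\partial f(t_0 + \theta h, x)}{\partial t} \right| \le g(x) \]

and hence we can apply Theorem 5.1. □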
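A numerical sanity check of differentiation under the integral sign, as a sketch only: the integrand $f(t, x) = e^{-t x^2}$ restricted to $x \in \langle 0, 1\rangle$, the midpoint rule, and the step sizes are our own illustrative choices. On that interval $|\partial f / \partial t| \le x^2 \le 1$, so a constant dominating function $g$ is available.

```python
import math


def midpoint_integral(func, a, b, steps=2000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


def f(t, x):
    return math.exp(-t * x * x)


def df_dt(t, x):
    # Partial derivative of f with respect to the parameter t.
    return -x * x * math.exp(-t * x * x)


def F(t):
    # F(t) = integral over [0, 1] of f(t, x) dx.
    return midpoint_integral(lambda x: f(t, x), 0.0, 1.0)


t0, h = 1.0, 1e-4
derivative_of_integral = (F(t0 + h) - F(t0 - h)) / (2 * h)
integral_of_derivative = midpoint_integral(lambda x: df_dt(t0, x), 0.0, 1.0)
print(derivative_of_integral, integral_of_derivative)
```

The two printed numbers agree up to the error of the central difference quotient.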

6 Fubini’s Theorem

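In this section we will have to indicate the dimension of the Euclidean space we work in. When working in $\mathbb{R}^m$, we will decorate the symbols $\mathsf{Z}, \mathsf{Z}^{\mathrm{up}}, \mathsf{L}$ etc. with subscripts $\mathsf{Z}_m, \mathsf{Z}^{\mathrm{up}}_m, \mathsf{L}_m$ etc., and for the integral symbols we will use $\int^{(m)}, \overline{\int}^{(m)}, \underline{\int}^{(m)}$ instead of $\int, \overline{\int}, \underline{\int}$.

We will abandon the integral symbol $I$ since we already know that for $f \in \mathsf{Z}^*$ we have $If = \int f$.

Finally, to avoid confusion in the case of two variables we will sometimes use the classical

\[ \int f(x, y)\,dy \ \text{ or } \ \int f(x, y)\,dx \quad \text{for} \quad \int f(x, -) \ \text{ or } \ \int f(-, y). \]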

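6.1 Lemma. For a function $f$ defined on $\mathbb{R}^{m+n}$ define functions $\overline{F}$ and $\underline{F}$ on $\mathbb{R}^m$ by setting

\[ \overline{F}(x) = \overline{\int}^{(m+n)} f(x, y)\,dy \quad (\text{resp. } \underline{F}(x) = \underline{\int}^{(m+n)} f(x, y)\,dy). \]

Then one has

\[ \overline{\int}^{(m+n)} f \ge \overline{\int}^{(m)} \overline{F} \quad (\text{resp. } \underline{\int}^{(m+n)} f \le \underline{\int}^{(m)} \underline{F}). \]

Proof. I. If $f \in \mathsf{Z}_{m+n}$ then we have equalities, by the case of Fubini's Theorem for the Riemann integral of continuous maps on compact intervals. Furthermore, with $F = \overline{F} = \underline{F}$, we have

\[ F \in \mathsf{Z}_m. \]

Indeed, choose a compact interval $J$ carrying the function $f$. The function $F$ obviously has compact support, contained in the projection of $J$ (the values elsewhere are integrals of 0). Further, let $K$ be the volume of $J$. For an $\varepsilon > 0$ there exists a $\delta > 0$ such that for $\rho(x, x') < \delta$, we have $|f(x, y) - f(x', y)| < \frac{\varepsilon}{K}$, independently of $y$. Therefore, we have

\[ |F(x) - F(x')| < \frac{\varepsilon}{K} \cdot K = \varepsilon, \]

and $F$ is continuous.

II. Now let $f_k \in \mathsf{Z}_{m+n}$, $f_k \nearrow_k f$. Then

\[ F_k(x) = \int f_k(x, y)\,dy \nearrow F(x) \quad \text{and also} \quad f_k(x, -) \nearrow f(x, -) \]

for all $x$. Therefore, we still have

\[ \int^{(m+n)} f = \lim_k \int^{(m+n)} f_k = \lim_k \int^{(m)} F_k = \int^{(m)} F. \]

III. Now let $f$ be general and let $g \in \mathsf{Z}^{\mathrm{up}}$ be such that $g \ge f$. Put $G(x) = \int^{(m+n)} g(x, y)\,dy$. Then $G \ge \overline{F}$, and by II we have

\[ \int^{(m+n)} g = \int^{(m)} G \ge \overline{\int}^{(m)} \overline{F} \]

and hence

\[ \overline{\int} f = \inf\Big\{ \int g \ \Big|\ g \in \mathsf{Z}^{\mathrm{up}},\ g \ge f \Big\} \ge \overline{\int}\, \overline{F}. \quad □ \]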


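Theorem. (Fubini) Let $f \in \mathsf{L}^*_{m+n}$. Then for almost all $x$ there exists the integral $\int^{(m+n)} f(x, y)\,dy$. If we denote its value by $F(x)$, and define the values $F(x)$ arbitrarily in the remaining points, we have $F \in \mathsf{L}^*_m$ and

\[ \int^{(m+n)} f = \int^{(m)} F. \]

Proof. Put $\overline{F}(x) = \overline{\int} f(x, y)\,dy$ and $\underline{F}(x) = \underline{\int} f(x, y)\,dy$. By Lemma 6.1, we have

\[ \int f = \underline{\int} f \le \underline{\int}\, \underline{F} \le \left\{ \begin{array}{c} \underline{\int}\, \overline{F} \\[4pt] \overline{\int}\, \underline{F} \end{array} \right\} \le \overline{\int}\, \overline{F} \le \overline{\int} f = \int f. \]

Let $f$ be in $\mathsf{L}_{m+n}$. Then the values are finite and we obtain, first of all, that $\underline{\int}\, \overline{F} = \overline{\int}\, \overline{F}$ is finite and hence $\overline{F} \in \mathsf{L}_m$, and similarly $\underline{F} \in \mathsf{L}_m$. Further, $\int \overline{F} = \int \underline{F}$ and hence $\int (\overline{F} - \underline{F}) = 0$ and hence $\overline{F} - \underline{F} = 0$ almost everywhere, by 4.7. If $f \in \mathsf{L}^*_{m+n}$ use Lebesgue's Monotone Convergence Theorem. □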
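For a continuous integrand on a square, the theorem can be watched numerically. The following sketch rests on our own illustrative choices: the integrand $f(x, y) = x e^{xy}$ on $\langle 0, 1\rangle^2$ and a midpoint rule with 200 nodes per axis. It computes both iterated integrals and compares them with the exact value $e - 2$.

```python
import math


def midpoint_integral(func, a, b, steps=200):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


def f(x, y):
    return x * math.exp(x * y)


def F(x):
    # Inner integral over y for fixed x.
    return midpoint_integral(lambda y: f(x, y), 0.0, 1.0)


def G(y):
    # Inner integral over x for fixed y.
    return midpoint_integral(lambda x: f(x, y), 0.0, 1.0)


iterated_dy_dx = midpoint_integral(F, 0.0, 1.0)
iterated_dx_dy = midpoint_integral(G, 0.0, 1.0)
exact = math.e - 2.0  # integral of (e^x - 1) over [0, 1]
print(iterated_dy_dx, iterated_dx_dy, exact)
```

Both orders of integration give the same value up to rounding, as the theorem predicts for integrable $f$.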

7 The Substitution Theorem

In this section, we will prove a substitution theorem for multivariable integrals. The reader should be aware that a much more general substitution theorem is valid (see [18]). In this text, we would basically be happy with a substitution theorem for the Riemann integral of a continuous bounded function where the coordinate change is a diffeomorphism with bounded partial derivatives (as needed, for example, in the Stokes Theorem in Chapter 12 below). However, we will typically need to integrate over Borel sets, which makes Lebesgue integral relevant. The purpose of this section is to give a rigorous, but otherwise as straightforward as possible, proof of the version of the theorem needed here.

7.1

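Recall the set $\mathcal{B}$ of all Borel sets in $\mathbb{R}^m$. Let $U \subseteq \mathbb{R}^m$ be an open set. Define

\[ \mathcal{B}_U = \{S \in \mathcal{B} \mid S \subseteq U\}. \]

Note that clearly, $\mathcal{B}_U$ is the smallest set of subsets of $U$ closed under complements and countable unions, which contains all open subsets of $U$. Let us also write

\[ \mathcal{I}_U = \{\langle a_1, b_1) \times \cdots \times \langle a_n, b_n) \mid \langle a_1, b_1\rangle \times \cdots \times \langle a_n, b_n\rangle \subseteq U\}. \]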


7.2 Lemma. Let $U \subseteq \mathbb{R}^m$ be open and let $S = S_0 \in \mathcal{I}_U$. Then there exist $S_1, S_2, \dots, S_n, \dots$ such that $i \ne j \Rightarrow S_i \cap S_j = \emptyset$ for $i, j = 0, 1, 2, \dots$ and

\[ U = \bigcup_{i=0}^{\infty} S_i. \]

Proof. Let

\[ S_0 = \langle a_1, b_1) \times \cdots \times \langle a_n, b_n). \]

Assume $S_0$ is non-empty. (If $S_0$ is empty, choose $S_1 \in \mathcal{I}_U$ arbitrary and proceed with $k - 1$ instead.) Let $d_i(k) = (b_i - a_i)/2^k$. Assuming $S_0 \ne \emptyset$, let $T_0 = \{S_0\}$. Suppose we have already defined $T_0, \dots, T_{k-1}$. Let $T_k$ be the set of all

\[ \langle r_1, s_1) \times \cdots \times \langle r_n, s_n) \subseteq U \setminus \bigcup (T_0 \cup \cdots \cup T_{k-1}) \]

where

\[ s_i = r_i + d_i(k), \quad r_i = a_i + \ell_i d_i(k) \]

for some $\ell_i \in \mathbb{Z}$, $i = 1, \dots, n$. Let

\[ \{S_1, S_2, \dots\} = T_1 \cup T_2 \cup \dots. \]

By definition, the $S_i$'s are disjoint and one easily checks that $\bigcup S_i$ is open and closed in $U$. □

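7.3 Lemma. For $S \in \mathcal{I}_U$, there exist open sets $U \supseteq V_1 \supseteq \cdots \supseteq V_k \supseteq \dots$ such that

\[ S = \bigcap_{k=1}^{\infty} V_k. \]

Proof. Using the same notation as in Lemma 7.2, take

\[ V_k = (a_1 - \tfrac{1}{k}, b_1) \times \cdots \times (a_n - \tfrac{1}{k}, b_n). \quad □ \]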

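7.4 Proposition. Let $\mathcal{S}_U$ be the smallest set of subsets of $U$ which satisfies
(1) $\mathcal{I}_U \subseteq \mathcal{S}_U$;
(2) When $S_1, S_2, \dots \in \mathcal{S}_U$ are disjoint, then

\[ \bigcup_{i=1}^{\infty} S_i \in \mathcal{S}_U; \tag{+} \]

(3) When $S \in \mathcal{S}_U$, we have $U \setminus S \in \mathcal{S}_U$.
Then $\mathcal{S}_U = \mathcal{B}_U$.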


Comment: This proposition is a special case of a more abstract theorem known as Dynkin's Lemma. The proof is essentially the same; the greater generality would be of no use to us.

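Proof. Let $S \in \mathcal{S}_U$. Let

\[ \mathcal{S}_U(S) = \{T \in \mathcal{S}_U \mid S \cap T \in \mathcal{S}_U\}. \tag{7.4.1} \]

Step 1: If $S \in \mathcal{S}_U$, then the conditions (2) and (3) above hold with $\mathcal{S}_U$ replaced by $\mathcal{S}_U(S)$.

Proof. (2) is trivial by distributivity. To prove (3), when $S, T, S \cap T \in \mathcal{S}_U$, then

\[ S \cap (U \setminus T) = U \setminus ((U \setminus S) \cup (S \cap T)) \in \mathcal{S}_U, \]

since $(U \setminus S) \cap (S \cap T) = \emptyset$. □

Step 2: When $S \in \mathcal{I}_U$, clearly $\mathcal{I}_U \subseteq \mathcal{S}_U(S)$. Therefore, by Step 1, $\mathcal{S}_U(S) = \mathcal{S}_U$.

Step 3: Now let $S \in \mathcal{S}_U$. By Step 2, $\mathcal{I}_U \subseteq \mathcal{S}_U(S)$. Therefore, by Step 1, $\mathcal{S}_U(S) = \mathcal{S}_U$.

Step 4: By Step 3 (note (7.4.1)) and (2), (+) holds for any $S_1, S_2, \dots \in \mathcal{S}_U$ (without assuming disjointness). By Lemma 7.2 (with $S_0 = \emptyset$ and $U$ replaced by $V$), every open subset $V \subseteq U$ satisfies $V \in \mathcal{S}_U$. Therefore, $\mathcal{B}_U \subseteq \mathcal{S}_U$.

Step 5: Note that by Lemma 7.3, $\mathcal{I}_U \subseteq \mathcal{B}_U$, and hence, by definition, $\mathcal{S}_U \subseteq \mathcal{B}_U$. □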

7.5 Assumption

Assume now $U \subseteq \mathbb{R}^m$ is an open set, and

\[ \mathbf{F} : U \to \mathbb{R}^m \]

is an injective map with continuous first partial derivatives which satisfies

\[ \det(D\mathbf{F}_x) \ne 0 \quad \text{for all } x \in U. \]

(Then $\mathbf{F}$ is regular, and by 7.2, 7.3 of Chapter 3, its image is open and its inverse also satisfies the Assumption.) Recall 3.2 of Chapter 3 for a discussion of $D\mathbf{F}_x$. The attentive reader has noticed that

\[ \det(D\mathbf{F}_x) \]

is a special case of the Jacobian considered in 6.2 of Chapter 3 when the variables $x$ of 6.2 of Chapter 3 are not present and $y$ is labeled as $x$. Many texts, in fact, reserve the term for this special case.


7.6 Lemma. Let $S \in \mathcal{I}_U$. Then

\[ \mu(\mathbf{F}[S]) \le \int_S |\det(D\mathbf{F}_x)|\,dx. \tag{*} \]

Proof. Note first that by Lemma 7.3 and the fact that $\mathbf{F}$ is a homeomorphism onto its image, $\mathbf{F}[S]$ is Borel.

Next, one proves (*) in the case when $\mathbf{F}$ is an affine map (see 5.9 of Appendix A). By the multiplicative property of the determinant with respect to composition, translation-invariance of Lebesgue measure, Fubini's Theorem and Gauss elimination, it then suffices to prove (*) for $n = 1$ (which is obvious) and for the map

\[ \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix}. \tag{+} \]

For the case of (+), since $\mu$ is clearly invariant under translation, it suffices to prove the statement for

\[ S = \langle 0, b_1) \times \langle 0, b_2), \quad b_1, b_2 > 0. \]

Then

\[ \mathbf{F}[S] \subseteq \bigcup_{i=0}^{n-1} \Big\langle \frac{i a b_2}{n} - \frac{|a| b_2}{n},\ \frac{i a b_2}{n} + b_1 + \frac{|a| b_2}{n} \Big) \times \Big\langle \frac{i b_2}{n},\ \frac{(i+1) b_2}{n} \Big). \]

The Lebesgue measure of the right-hand side, with $n = 2^k$, $k \to \infty$, clearly approaches $b_1 b_2$, while $\mathbf{F}[S]$ is an intersection of this decreasing sequence of sets.
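The limit claimed for the covering strips can be written out explicitly. The computation below is our own worked step, not taken from the source: the $n$ strips have pairwise disjoint second factors, each has width $b_1 + 2|a| b_2 / n$ and height $b_2 / n$, so their measures simply add.

```latex
\[
\mu\Big(\bigcup_{i=0}^{n-1}
  \Big\langle \tfrac{i a b_2}{n} - \tfrac{|a| b_2}{n},\;
              \tfrac{i a b_2}{n} + b_1 + \tfrac{|a| b_2}{n}\Big)
  \times \Big\langle \tfrac{i b_2}{n},\; \tfrac{(i+1) b_2}{n}\Big)\Big)
= n \cdot \Big(b_1 + \frac{2 |a| b_2}{n}\Big) \cdot \frac{b_2}{n}
= b_1 b_2 + \frac{2 |a| b_2^2}{n}
\;\xrightarrow[n \to \infty]{}\; b_1 b_2 .
\]
```

Since the determinant of the shear is 1, the limit $b_1 b_2$ is exactly the value of the right-hand side of (*) for this $S$.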

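For the case of $\mathbf{F}$ general satisfying our assumption, by countable additivity, it suffices to consider the case when the $\mathbb{R}^m$-closure $\overline{S}$ of $S$ is contained in $U$. Then since the partial derivatives of $\mathbf{F}$ are continuous on $\overline{S}$, they are uniformly continuous by Theorem 6.6 of Chapter 2. Therefore, for every $\varepsilon > 0$ there exists a $\delta > 0$ such that for $a = (a_1, \dots, a_n) \in \overline{S}$ and

\[ 0 < b_i - a_i < \delta, \tag{7.6.1} \]

we have

\[ \left| \frac{\partial F_i(y)}{\partial x_j} - \frac{\partial F_i(a)}{\partial x_j} \right| < \varepsilon. \]

By the Mean Value Theorem, then, assuming (7.6.1),

\[ \mathbf{F}[\langle a_1, b_1) \times \cdots \times \langle a_n, b_n)] \]

is a subset of

\[ x + D\mathbf{F}_x[\langle -\varepsilon(b_1 - a_1), (b_1 - a_1) + \varepsilon(b_1 - a_1)) \times \cdots \times \langle -\varepsilon(b_n - a_n), (b_n - a_n) + \varepsilon(b_n - a_n))]. \]

From the affine case, we conclude that

\[ \mu(\mathbf{F}[\langle a_1, b_1) \times \cdots \times \langle a_n, b_n)]) \le (1 + 2\varepsilon)^m |\det(D\mathbf{F}_x)|. \]

Since $\varepsilon > 0$ was arbitrary, our statement follows. □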

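7.7 Lemma. Let $S \in \mathcal{I}_U$ and let $f : \mathbf{F}[U] \to \mathbb{R}$ be a non-negative continuous function. Then

\[ \int_{\mathbf{F}[S]} f \le \int_S (f \circ \mathbf{F}) |\det(D\mathbf{F}_x)|\,dx. \tag{*} \]

This also holds with $S$ replaced by an open subset $V \subseteq U$.

Proof. Let

\[ S = \langle a_1, b_1) \times \cdots \times \langle a_n, b_n). \]

Let, for integers $0 \le i_1 < 2^k, \dots, 0 \le i_n < 2^k$, $S_k(i_1, \dots, i_n)$ denote the set

\[ \Big\langle a_1 + \frac{i_1 (b_1 - a_1)}{2^k},\ a_1 + \frac{(i_1 + 1)(b_1 - a_1)}{2^k} \Big) \times \cdots \times \Big\langle a_n + \frac{i_n (b_n - a_n)}{2^k},\ a_n + \frac{(i_n + 1)(b_n - a_n)}{2^k} \Big). \]

Then define "step functions" $f_k$ by

\[ f_k(x) = \inf_{z \in S_k(i_1, \dots, i_n)} f(z) \quad \text{for } x \in S_k(i_1, \dots, i_n). \]

Then $f_k \nearrow f$ and with $f$ replaced by $f_k$, the statement for $S \in \mathcal{I}_U$ holds by Lemma 7.6. For $V$ open, the statement holds by Lemma 7.2 (with $S_0 = \emptyset$, $U$ replaced by $V$). □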

7.8 Proposition. For $V \subseteq U$ open, $f : \mathbf{F}[U] \to \mathbb{R}$ non-negative continuous, we have

\[ \int_{\mathbf{F}[V]} f = \int_V (f \circ \mathbf{F}) |\det(D\mathbf{F}_x)|\,dx. \]

The statement also holds with $V$ replaced by $S \in \mathcal{I}_U$.

Proof. First note that the statement for $S \in \mathcal{I}_U$ follows from the statement for $V$ open by Lemma 7.3. For $V$ open, the $\le$ inequality follows from Lemma 7.7. The $\ge$ inequality follows from Lemma 7.7 with $f$ replaced by $f \circ \mathbf{F}$, $\mathbf{F}$ replaced by $\mathbf{F}^{-1}$, $\mathbf{F}[U]$ replaced by $U$ and $V$ replaced by $\mathbf{F}[V]$ (recall that the set $\mathbf{F}[U]$ is open). □

7.9 Theorem. (The Substitution Theorem) Let $\mathbf{F}$ satisfy Assumption 7.5, and let $f : \mathbf{F}[U] \to \mathbb{R}$ be a continuous function. Let $S \in \mathcal{B}_U$. Then

\[ \int_{\mathbf{F}[S]} f = \int_S (f \circ \mathbf{F}) |\det(D\mathbf{F}_x)|\,dx, \tag{+} \]

provided that the integral on at least one side of the equation exists and is finite.

Proof. By considering $f^+ = \max(f, 0)$, $f^- = -\min(f, 0)$ (recall 5.2 of Chapter 4), we may assume $f \ge 0$. By Proposition 7.8 (for $S \in \mathcal{I}_U$), and by the additivity of the integral, we clearly have (+) for all $S \in \mathcal{S}_U$ and hence our statement follows from Proposition 7.4. □
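The two sides of the substitution formula can be compared numerically. The sketch below uses our own illustrative choices: the polar map $\mathbf{F}(r, t) = (r\cos t, r\sin t)$ on $S = \langle 1, 2) \times \langle 0, \pi/2)$, the integrand $f(x, y) = x^2 + y^2$, and plain midpoint sums. The left-hand side is integrated over the quarter annulus $\mathbf{F}[S]$ by means of its indicator on a grid.

```python
import math


def polar(r, t):
    """The map F(r, t) = (r cos t, r sin t)."""
    return r * math.cos(t), r * math.sin(t)


def jacobian_abs(r, t):
    # |det DF| for the polar map equals r (here r > 0).
    return abs(r)


def f(x, y):
    return x * x + y * y


def right_hand_side(steps=400):
    """Midpoint sum for (f o F)|det DF| over S = [1, 2) x [0, pi/2)."""
    hr, ht = 1.0 / steps, (math.pi / 2) / steps
    total = 0.0
    for i in range(steps):
        r = 1.0 + (i + 0.5) * hr
        for j in range(steps):
            t = (j + 0.5) * ht
            total += f(*polar(r, t)) * jacobian_abs(r, t)
    return total * hr * ht


def left_hand_side(steps=800):
    """Midpoint sum for f over F[S], using the indicator of F[S] on [0, 2]^2."""
    h = 2.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        for j in range(steps):
            y = (j + 0.5) * h
            if 1.0 <= x * x + y * y < 4.0:
                total += f(x, y)
    return total * h * h


rhs = right_hand_side()
lhs = left_hand_side()
print(lhs, rhs, 15 * math.pi / 8)
```

The right-hand side reduces to $\frac{\pi}{2}\int_1^2 r^3\,dr = 15\pi/8$. The grid sum on the left is cruder because of the curved boundary, but it approaches the same number.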

8 Hölder's inequality, Minkowski's inequality and $L^p$-spaces

In this section, we will introduce $L^p$-spaces, $1 \le p \le \infty$, which are a very basic source of examples in analysis. The true significance of those spaces in mathematics will emerge in Chapters 16 and 17 below. However, their definition and basic properties are often used throughout analysis, and thus now is a good place to treat them. In this section, let $B$ be a Borel subset of $\mathbb{R}^n$. Let $f$ be a real measurable function defined on $B$. We will write, for $1 \le p < \infty$ (assuming that the right-hand side is finite),

\[ \|f\|_p = \Big( \int_B |f|^p \Big)^{\frac{1}{p}}. \]

In this section, we will tend to write $\int$ instead of $\int_B$, since the set $B$ will not change.

For $p = \infty$, one defines

\[ \|f\|_\infty = \inf\{M \ge 0 \mid |f(x)| \le M \text{ almost everywhere on } B\}, \]

again, assuming this number is finite.


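8.1 Theorem. (Hölder's inequality) Let $p, q > 1$ and let $\frac{1}{p} + \frac{1}{q} = 1$. We have

\[ \int |fg| \le \|f\|_p \|g\|_q. \]

Proof. Put $\alpha = \|f\|_p$, $\beta = \|g\|_q$. Then

\[ \int \Big| \frac{1}{\alpha} f \Big|^p = \int \Big| \frac{1}{\beta} g \Big|^q = 1. \]

Set $\tilde{f} = \frac{1}{\alpha} f$ and $\tilde{g} = \frac{1}{\beta} g$. By Young's inequality 4.5.3 of Chapter 1 we have

\[ |\tilde{f}(x) \tilde{g}(x)| \le \frac{|\tilde{f}(x)|^p}{p} + \frac{|\tilde{g}(x)|^q}{q}, \]

and hence

\[ \frac{1}{\alpha} \frac{1}{\beta} \int |fg| = \int |\tilde{f} \tilde{g}| \le \frac{1}{p} \int |\tilde{f}|^p + \frac{1}{q} \int |\tilde{g}|^q = \frac{1}{p} + \frac{1}{q} = 1, \]

and finally

\[ \int |fg| \le \alpha\beta = \|f\|_p \|g\|_q. \quad □ \]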
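Hölder's inequality can be checked directly on step functions, whose integrals are finite sums. This is a sketch under our own assumptions: $B = \langle 0, 1)$ is cut into 50 cells of equal measure, the values are pseudo-random with a fixed seed, and the exponent $p = 3$ is arbitrary.

```python
import random


def lp_norm(values, p, width):
    """||f||_p for the step function equal to values[i] on the i-th cell."""
    return (sum(abs(v) ** p for v in values) * width) ** (1.0 / p)


def integral_abs_product(f_values, g_values, width):
    """Integral of |f g| for two step functions on the same cells."""
    return sum(abs(a * b) for a, b in zip(f_values, g_values)) * width


rng = random.Random(0)
cells = 50
width = 1.0 / cells      # B = [0, 1) cut into 50 cells of equal measure
p = 3.0
q = p / (p - 1.0)        # conjugate exponent: 1/p + 1/q = 1
f_values = [rng.uniform(-2, 2) for _ in range(cells)]
g_values = [rng.uniform(-2, 2) for _ in range(cells)]

lhs = integral_abs_product(f_values, g_values, width)
rhs = lp_norm(f_values, p, width) * lp_norm(g_values, q, width)
print(lhs, "<=", rhs)
```

Replacing `g_values` by values with $|g|^q$ proportional to $|f|^p$ makes the two sides agree up to rounding.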

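8.1.1 Observation. If $|f|^p$ and $|g|^q$ are linearly dependent, then

\[ \int |fg| = \|f\|_p \|g\|_q. \]

Remark: The equality holds if and only if the functions are dependent, but we will not need the other implication.

Proof. Let, say, $|g|^q = \alpha |f|^p$. Then

\[ \|g\|_q = \Big( \int |g|^q \Big)^{\frac{1}{q}} = \alpha^{\frac{1}{q}} \Big( \int |f|^p \Big)^{\frac{1}{q}} = \alpha^{\frac{1}{q}} (\|f\|_p)^{\frac{p}{q}} \]

and hence

\[ \|f\|_p \|g\|_q = \alpha^{\frac{1}{q}} (\|f\|_p)^{1 + \frac{p}{q}} = \alpha^{\frac{1}{q}} (\|f\|_p^p)^{\frac{p+q}{pq}} = \alpha^{\frac{1}{q}} \|f\|_p^p. \]

On the other hand we also have

\[ \int |f||g| = \int \big( |f| \alpha^{\frac{1}{q}} |f|^{\frac{p}{q}} \big) = \alpha^{\frac{1}{q}} \int |f|^{\frac{p}{q} + 1} = \alpha^{\frac{1}{q}} \int |f|^{p(\frac{1}{q} + \frac{1}{p})} = \alpha^{\frac{1}{q}} \|f\|_p^p. \quad □ \]

8.2 Theorem. (Minkowski's inequality) We have, for $1 \le p \le \infty$,

\[ \|f + g\|_p \le \|f\|_p + \|g\|_p \]

whenever the right-hand side is defined.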

Proof. The inequality is obvious for $p = 1$ and $p = \infty$, hence we can assume that $\infty > p > 1$.

Recall Proposition 4.5.2 of Chapter 1. For $p \ge 1$ and $x \ge 0$, the function $h(x) = x^p$ is convex (since $h''(x) = p(p-1)x^{p-2} \ge 0$) and hence we have

\[ |f + g|^p \le \big( \tfrac{1}{2} |2f| + \tfrac{1}{2} |2g| \big)^p \le \tfrac{1}{2} |2f|^p + \tfrac{1}{2} |2g|^p = 2^{p-1} |f|^p + 2^{p-1} |g|^p. \]

Thus, first, if the integrals $\int |f|^p$ and $\int |g|^p$ are finite, also the integral of the sum $\int |f + g|^p$ is finite, and $\|f + g\|_p$ makes sense. If it is zero then the inequality holds. Thus suppose it is not zero.

We have

\[ (\|f + g\|_p)^p = \int |f + g|^p \le \int (|f| + |g|) |f + g|^{p-1} = \int |f| |f + g|^{p-1} + \int |g| |f + g|^{p-1}. \]

Proceed, using Hölder's inequality, taking into account that $\frac{1}{q} = 1 - \frac{1}{p} = \frac{p-1}{p}$ and hence $q = \frac{p}{p-1}$,

\[ \cdots \le \Big( \big( \textstyle\int |f|^p \big)^{\frac{1}{p}} + \big( \textstyle\int |g|^p \big)^{\frac{1}{p}} \Big) \Big( \int |f + g|^{(p-1) \frac{p}{p-1}} \Big)^{1 - \frac{1}{p}} = (\|f\|_p + \|g\|_p)(\|f + g\|_p)^{p-1}. \]

Hence

\[ (\|f + g\|_p)^p \le (\|f\|_p + \|g\|_p)(\|f + g\|_p)^{p-1} \]

and Minkowski's inequality follows dividing both sides by $(\|f + g\|_p)^{p-1}$. □

8.3 The definition of $L^p$

Denote by $\mathcal{L}^p(B)$ the set of all measurable functions on $B$ for which

\[ \|f\|_p < \infty. \]

By Theorem 8.2, $\mathcal{L}^p(B)$ is a vector space over $\mathbb{R}$, and it may appear that

\[ \|f - g\|_p \tag{8.3.1} \]

therefore defines a norm on $\mathcal{L}^p(B)$ in the sense of 1.2.1 of Chapter 2. This is, however, not true for the simple reason that two functions $f, g$ which are equal almost everywhere have 0 distance! It is immediately obvious, on the other hand, that the converse is also true, since we have the following fact.

8.3.1 Lemma. If $f : B \to [0, \infty]$ and $\int_B f = 0$, then $f = 0$ almost everywhere on $B$.

Proof. Let, for $\varepsilon > 0$, $E_\varepsilon = \{x \in B \mid f(x) > \varepsilon\}$. Then clearly $\int_B f \ge \varepsilon \mu(E_\varepsilon)$, so $\mu(E_\varepsilon) = 0$. The set $E = E_{1/1} \cup E_{1/2} \cup \cdots \cup E_{1/n} \cup \dots$ therefore satisfies $\mu(E) = 0$, but we have $E = \{x \in B \mid f(x) \ne 0\}$. □

Thus, we see that (8.3.1) gives a well-defined norm on the quotient space

\[ L^p(B) = \mathcal{L}^p(B) / \mathcal{L}_0 \]

where $\mathcal{L}_0$ is the subspace of functions which are 0 almost everywhere. (See Section 6 of Appendix A for the definition of a quotient vector space.) More precisely, the formula (8.3.1) is applied to representatives $f$, $g$ of two equivalence classes constituting the quotient space $L^p(B)$, but does not depend on the choice of representatives. Additionally, by what we just observed, the distance of two equivalence classes which are not equal cannot be 0.

In the context of the normed vector spaces $L^p(B)$, it is common to identify a function $f$ notationally with the coset to which it belongs, i.e. to write $f \in L^p(B)$. This slight imprecision does not tend to cause difficulties.

8.4 A comment on complex functions

Sometimes, we are interested in an analogue of the $L^p$-spaces for complex functions. In this context, the following simple result is useful:

8.4.1 Lemma. Let $f : B \to \mathbb{C}$ be an integrable function. Then

\[ \Big| \int_B f \Big| \le \int_B |f|. \]

Proof. Let $\alpha$ be such that $|\alpha| = 1$ and $\alpha \int_B f = |\int_B f|$. Then

\[ \Big| \int_B f \Big| = \alpha \int_B f = \int_B \alpha f = \int_B \mathrm{Re}(\alpha f) \le \int_B |f|. \]

(The third equality follows from the fact that $\int_B \alpha f$ is real.) □

Therefore, Minkowski's inequality also holds for complex-valued functions by the following argument:

\[ \|f + g\|_p \le \|\, |f| + |g| \,\|_p \le \|\, |f| \,\|_p + \|\, |g| \,\|_p = \|f\|_p + \|g\|_p. \]

The case of $p = \infty$ needs a separate (easy) discussion, see Exercise (17). Note that a complex analogue of Hölder's inequality follows from the real case immediately. Thus, we can define the normed vector spaces $L^p(B, \mathbb{C})$, $1 \le p \le \infty$ completely analogously as the spaces $L^p(B)$, with real functions replaced by complex ones.

8.5 Completeness of the spaces $L^p$

8.5.1 Lemma. (Fatou's Lemma) Let $f_n : B \to [0, \infty]$ be measurable functions. Then

\[ \int_B \big( \liminf_{n \to \infty} f_n \big) \le \liminf_{n \to \infty} \int_B f_n. \]

Proof. Let $g_n = \inf_{m \ge n} f_m$. We have

\[ \int_B g_n \le \inf_{m \ge n} \int_B f_m, \]

while $g_n \nearrow \liminf_{n \to \infty} f_n$, so the statement follows by passing to the limit by the Lebesgue Monotone Convergence Theorem. □
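The inequality in Fatou's Lemma can be strict. A standard illustration, added here as our own example and not part of the source, takes $B = (0, 1)$ and the functions $f_n = n \cdot c_{(0, 1/n)}$: for every fixed $x \in B$ we have $f_n(x) = 0$ as soon as $n > 1/x$, while each $f_n$ has integral 1.

```latex
\[
\int_B \big(\liminf_{n \to \infty} f_n\big) = \int_B 0 = 0
\;<\; 1 = \liminf_{n \to \infty} \, n \cdot \mu\big((0, \tfrac{1}{n})\big)
= \liminf_{n \to \infty} \int_B f_n .
\]
```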

8.5.2 Theorem. The spaces $L^p(B)$ and $L^p(B, \mathbb{C})$, $1 \le p \le \infty$, are complete metric spaces.

Proof. Consider, for example, the complex case (the proof in the real case is the same). Let $f_n : B \to \mathbb{C}$ represent a Cauchy sequence in $L^p$. Then there exist $n_1 < n_2 < \cdots < n_k < \dots$ such that

\[ \sum_{k=1}^{\infty} \|f_{n_k} - f_{n_{k+1}}\|_p < \infty. \]

For $p < \infty$, this means that

\[ \sum_{k=1}^{\infty} |f_{n_k}(x) - f_{n_{k+1}}(x)|^p < \infty \]

almost everywhere, so $(f_{n_k}(x))_k$ is a Cauchy sequence in $\mathbb{C}$ almost everywhere in $x \in B$, so the sequence of functions $f_{n_k}$ converges in a set $S \subseteq B$ such that $\mu(B \setminus S) = 0$. In the case $p = \infty$, the same conclusion also holds, and moreover, in that case, the convergence is uniform (Exercise (18)). Now let $f(x) = \lim f_{n_k}(x)$ for $x \in S$, and $f(x) = 0$ for $x \in B \setminus S$. In the case of $p = \infty$, we are done. For $p < \infty$, by Fatou's Lemma 8.5.1,

\[ \int_B |f_n - f|^p \le \liminf_{k \to \infty} \int_B |f_n - f_{n_k}|^p. \tag{8.5.1} \]

If we choose $n$ such that $\|f_n - f_m\|_p < \varepsilon$ for all $m \ge n$, then the right-hand side of (8.5.1) is $\le \varepsilon^p$. Thus, the right-hand side of (8.5.1) converges to 0 with $n \to \infty$ because the sequence $f_n$ is Cauchy. □

8.6 An inequality between $L^p$ norms

8.6.1 Lemma. Let $1 < p$ and let $B \subseteq \mathbb{R}^n$ be a Borel subset such that $\mu(B) < \infty$. Then

\[ \Big( \frac{1}{\mu(B)} \int_B |f(x)| \Big)^p \le \frac{1}{\mu(B)} \int_B |f(x)|^p. \]

Proof. Put

\[ x_0 = \frac{1}{\mu(B)} \int_B |f(x)|. \]

Since $(x^p)'' > 0$ on $(0, \infty)$, the derivative of $x^p$ is increasing on $(0, \infty)$. Therefore, if we let $a$ be the value of $(x^p)' = p x^{p-1}$ at $x_0$ and let $b = (x_0)^p - a x_0$, we have

\[ a x_0 + b = (x_0)^p \]

and the derivative of $ax + b$ is $\ge (x^p)'$ on $(0, x_0)$ and $\le (x^p)'$ on $(x_0, \infty)$. We conclude that

\[ ax + b \le x^p \quad \text{for all } x \in (0, \infty). \]

Now compute:

\[ \frac{1}{\mu(B)} \int_B |f(x)|^p \ge \frac{1}{\mu(B)} \int_B (a |f(x)| + b) = a x_0 + b = (x_0)^p, \]

as claimed. □

8.6.2 Theorem. Let $\mu(B) < \infty$, $1 \le r \le p \le \infty$. Then, for a measurable function $f$ on $B$,

\[ \|f\|_r \le \mu(B)^{\frac{1}{r} - \frac{1}{p}} \|f\|_p. \]

In particular, $L^p(B)$ is a closed subspace of $L^r(B)$ (and similarly in the complex case).

Proof. Clearly, the case of $p = \infty$ is a direct consequence of the definition. Additionally, it suffices to consider the case $r = 1$ (otherwise, replace $f$ by $|f|^r$ and $p$ by $p/r$). The case of $r = 1$ and $p < \infty$ follows from Lemma 8.6.1. □
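Dividing through by the appropriate power of $\mu(B)$, the comparison of norms says that the averaged quantities $\big(\frac{1}{\mu(B)}\int_B |f|^p\big)^{1/p}$ do not decrease as $p$ grows, in the spirit of Lemma 8.6.1. The sketch below checks this on a step function; the cell widths, values, and exponents are our own arbitrary choices.

```python
import random


def averaged_norm(values, widths, p):
    """((1 / mu(B)) * integral of |f|^p) ** (1 / p) for a step function."""
    mu_b = sum(widths)
    mean = sum(abs(v) ** p * w for v, w in zip(values, widths)) / mu_b
    return mean ** (1.0 / p)


rng = random.Random(1)
widths = [rng.uniform(0.1, 1.0) for _ in range(40)]  # measures of the cells
values = [rng.uniform(-3, 3) for _ in range(40)]     # value of f on each cell
exponents = [1.0, 1.5, 2.0, 4.0, 8.0]
norms = [averaged_norm(values, widths, p) for p in exponents]

for p, value in zip(exponents, norms):
    print(p, value)
```

The printed values increase with the exponent and stay below the largest $|f|$, which plays the role of the case $p = \infty$.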

9 Exercises

(1) Prove the statement contained in Remark 1.2.

(2) Consider a modification of Theorem 1.1 where one replaces $\mathsf{L}^{\mathrm{up}}$, $\mathsf{L}^{\mathrm{dn}}$ by $\mathsf{L}^*$. Is this modified statement true? Prove or disprove.

(3) Prove Proposition 2.4.

(4) Prove that sets of measure 0 as defined in 8.4 of Chapter 4 are precisely Lebesgue measurable sets of measure 0, as defined in 3.1.

(5) (a) Prove that if $A_1 \subseteq A_2 \subseteq \dots$ are measurable sets and $A = \bigcup A_i$, then

\[ \mu(A) = \lim \mu(A_i). \tag{*} \]

(b) Now let $A_1 \supseteq A_2 \supseteq \dots$, $A = \bigcap A_i$. Give an example when (*) does not hold. Formulate a reasonable hypothesis which fixes the problem. [Hint: Finiteness.]

(6) Let $M \subseteq \mathbb{R}^m$ be a measurable set, and let $f : M \to \mathbb{R}$ be a function such that for every Borel set $S \subseteq \mathbb{R}^m$, $f^{-1}[S]$ is measurable. Prove that then the function $\tilde{f}$ defined by

\[ \tilde{f}(x) = \begin{cases} f(x) & \text{for } x \in M, \\ 0 & \text{otherwise} \end{cases} \]

is measurable.

(7) Give an example of a measurable function $f : \mathbb{R}^m \to \mathbb{R}$ such that there exists a measurable set $S \subseteq \mathbb{R}$ where $f^{-1}[S]$ is not measurable.

(8) Prove the following strengthening of Corollary 4.4.1: Let $S$ be a Lebesgue measurable set in $\mathbb{R}^m$. Then there exists a subset $K \subseteq S$ of type $F_\sigma$ (a countable union of compact sets) such that $\mu(S \setminus K) = 0$. [Hint: First note that for a real function $f \in \mathsf{Z}^{\mathrm{dn}}$, $f^{-1}[\langle a, \infty)]$ is closed. Now in the proof of 4.4, we produced a non-decreasing sequence of $\mathsf{Z}^{\mathrm{dn}}$-functions $f_n \le c_S$ such that $f_n \nearrow c_S$ almost everywhere. Let $K$ be the union of $f_n^{-1}[\langle 1/2, \infty)]$.]

(9) Prove that if $S$ is a Lebesgue measurable set in $\mathbb{R}^m$, then there exists a set $U$ of type $G_\delta$ (countable intersection of open sets) containing $S$ such that $\mu(U \setminus S) = 0$.


(10) Prove that a bounded function on a compact interval $\langle a, b\rangle$ is Riemann-integrable if and only if it is continuous almost everywhere. (An analogue in $\mathbb{R}^m$ also holds and can be proved using analogous methods.) [Hint: For necessity, take a sequence of partitions for which both the upper and lower Riemann sums converge to the integral; prove that the function is continuous outside of the union of any set of closed intervals which are neighborhoods of all the points $t_i$ involved in all these partitions - recall 8.1 of Chapter 1. For sufficiency, let $f$ be continuous almost everywhere. Let $F_n$ be the set of all $x_0 \in \langle a, b\rangle$ such that $\limsup_{x \to x_0} |f(x) - f(x_0)| \ge 1/n$. Then $F_n$ is closed, and, by assumption, covered by a set $S$ of countably many open intervals the sum of lengths of which is $< 1/n$. For a point $x_0 \notin F_n$, consider a $\delta_{x_0} > 0$ such that for $x \in (x_0 - \delta_{x_0}, x_0 + \delta_{x_0})$, $|f(x) - f(x_0)| < 1/n$. Then $\langle a, b\rangle$ is contained in the union of the elements of $S$ and all the $(x_0 - \delta_{x_0}/2, x_0 + \delta_{x_0}/2)$, $x_0 \notin F_n$. Hence, by 5.5 of Chapter 2, $\langle a, b\rangle$ is contained in a union of elements of a finite subset $S_n$. Show that the partitions by the boundary points of all the intervals in $S_n$ give upper and lower Riemann sums which converge to the same number with $n \to \infty$.]

(11) Evaluate the integral

\[ \int_0^{\pi/2} \frac{\ln(1 + \cos(a)\cos(x))}{\cos(x)}\,dx \]

for $0 < a < \pi$. [Hint: Find the derivative with respect to $a$ first.]

(12) Compute

\[ \int_E xy \]

where $E$ is the tetrahedron in $\mathbb{R}^3$ with vertices

\[ (0, 0, 0)^T, \ (1, 1, 1)^T, \ (2, 3, 4)^T, \ (3, 6, 7)^T. \]

[Hint: Use linear substitution.]

(13) Spherical coordinates. For $m \ge 2$, consider the map

�m W .0;1/ .�=2; =2/�n�2 .0; 2/ ! Rm

given as follows. If we denote the variables in the target as x1; : : : ; xm and thevariables in the source as r; t1; : : : ; tm�1 then

x1 D r cos.t1/ : : : cos.tm�1/;

xi D r cos.t1/ : : : cos.tm�i / sin.tm�iC1/ i D 2; : : :m:

Prove that


jdetD�mj.t1;:::;tm�1/j D rm.cos.t1//m�2 cos.t2/

m�3 : : : cos.tm�1/1:

[Hint: Express �m D ı � where � is given by the formula

y1 D tm�1; .y2; : : : ; ym/ D �m�1.r; t1; : : : ; tm�2/

and is given by

x1 D y2 cos.y1/; x2 D y2 sin.y1/; xi D yi for 3 � i � m.

Use the chain rule.](14) Using Exercise (13), compute the volume �.Dm/ where

Dm D f.x1; : : : ; xm/jX

x2i � rg � Rm:

(15) Prove that

\[ \int_{\mathbb{R}} e^{-t^2}\,dt = \sqrt{\pi}. \]

[Hint: First compute

\[ \int_{\mathbb{R}^2} e^{-x^2 - y^2} \]

using 2-dimensional spherical (= polar) coordinates. The integral in question is the square root of the result. Why?]

(16) Let $U$ be an open subset of $\mathbb{R}^n$, and let $\mathbf{F} : U \to \mathbb{R}^n$ be a map satisfying Assumption 7.5. Prove that if $U$ is connected, then $\det(D\mathbf{F})$ does not change signs on $U$. [Hint: Recall 5.1.1 of Chapter 2.]

(17) Define in detail the metric space $L^\infty(B, \mathbb{C})$.

(18) Complete the details of the proof of Theorem 8.5.2 for $p = \infty$.

(19) Using the method of Lemma 8.6.1, prove the following Jensen inequality: If $\varphi$ is a convex function on $(0, \infty)$, then

\[ \varphi\Big( \frac{1}{\mu(B)} \int_B |f(x)| \Big) \le \frac{1}{\mu(B)} \int_B \varphi(|f(x)|). \]

(20) ("Baby $L^p$") Define, on $\mathbb{R}^n$ or $\mathbb{C}^n$, $1 \le p < \infty$,

\[ \|(x_1, \dots, x_n)\|_p = (|x_1|^p + \cdots + |x_n|^p)^{1/p} \]

(and similarly for $\mathbb{C}^n$). Prove that this makes $\mathbb{R}^n$, $\mathbb{C}^n$ into normed vector spaces. What is the appropriate definition in the case of $p = \infty$?

6 Systems of Ordinary Differential Equations

1 The problem

1.1

A system of ordinary differential equations (briefly, ODE's) is a problem of finding functions $y_1(x), \dots, y_n(x)$ on some open interval in $\mathbb{R}$ such that

\[ y_k'(x) = f_k(x, y_1(x), \dots, y_n(x)) \quad \text{for } k = 1, \dots, n \tag{1.1.1} \]

where $f_k$ are continuous functions of $n + 1$ real variables. Note that then $y_i$, since they are required to have a derivative, must in particular be continuous, and the derivative is then also continuous by (1.1.1). The expression "ordinary" indicates that there appear only derivatives of functions of one variable, not partial derivatives of functions of several variables.

Using the vector symbols $\mathbf{y}$, $\mathbf{f}$ as in Chapter 3, we can describe the task by writing

\[ \mathbf{y}'(x) = \mathbf{f}(x, \mathbf{y}(x)). \]

1.2

We may encounter systems involving higher derivatives, such as for example

$$y_1^{(4)} = f_1(x, y_1, y_2, y_1', y_2', y_1'', y_2'', y_1''', y_2'''),$$

$$y_2''' = f_2(x, y_1, y_2, y_1', y_2', y_1'', y_2'', y_1''').$$

This appears to call for a generalization of the original problem. But in fact, such systems are easily converted to systems of ODE’s as above: in this particular case, introduce additional variables

$$z_1 = y_1,\quad z_2 = y_2,\quad z_3 = y_1',\quad z_4 = y_2',\quad z_5 = y_1'',\quad z_6 = y_2'' \quad\text{and}\quad z_7 = y_1''',$$

making the two equations into the equivalent system of the form (1.1.1):

$$z_1' = z_3,$$

$$z_2' = z_4,$$

$$z_3' = z_5,$$

$$z_4' = z_6,$$

$$z_5' = z_7,$$

$$z_6' = f_2(x, z_1, \dots, z_7),$$

$$z_7' = f_1(x, z_1, \dots, z_7, f_2(x, z_1, \dots, z_7)).$$

The reader certainly sees how to apply this procedure in a general situation

$$\begin{aligned}
y_1^{(k_1)} &= f_1(x, y_1, \dots, y_1^{(k_1)}, \dots, y_n, \dots, y_n^{(k_n)}),\\
&\;\;\vdots\\
y_n^{(k_n)} &= f_n(x, y_1, \dots, y_1^{(k_1)}, \dots, y_n, \dots, y_n^{(k_n)}).
\end{aligned} \tag{1.2.1}$$

Introduce additional variables for all the derivatives of $y_i$ of order less than the highest order derivative of $y_i$ which occurs in the system, and rewrite the original system in terms of the additional variables, introducing additional equations relating the new variables as derivatives of each other (see Exercises (1), (2)). To be explicit, one sometimes refers to a system of the form (1.1.1) as a system of first-order ODE’s, but we already see that such systems are all we need to consider.
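As a small illustration of this reduction, the following sketch rewrites the second-order equation $y'' = -y$ as a first-order system with $z_1 = y$, $z_2 = y'$ and integrates it numerically. The equation, the function names and the choice of the classical Runge-Kutta scheme are assumptions made only for this example; they do not come from the text.

```python
import math

def rk4_step(f, x, y, h):
    """One classical Runge-Kutta step for the first-order system y' = f(x, y)."""
    k1 = f(x, y)
    k2 = f(x + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(x + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(x + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def first_order_form(x, z):
    """y'' = -y rewritten with z[0] = y and z[1] = y' (illustrative choice)."""
    return [z[1], -z[0]]

def integrate(f, x0, z0, x_end, steps):
    """Integrate z' = f(x, z) from x0 to x_end with a fixed step size."""
    h = (x_end - x0) / steps
    x, z = x0, list(z0)
    for _ in range(steps):
        z = rk4_step(f, x, z, h)
        x += h
    return z

# Initial conditions y(0) = 0, y'(0) = 1, whose exact solution is sin(x).
z_end = integrate(first_order_form, 0.0, [0.0, 1.0], 1.0, 100)
print(z_end[0], math.sin(1.0))
```

The initial conditions for the first-order system are exactly the values of $y$ and $y'$ at the starting point, which matches the remark that only first-order systems need to be considered.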

1.3

We may, in fact, encounter even more general systems, namely a system of equations of the form

$$\begin{aligned}
F_1(x, y_1, \dots, y_1^{(k_1)}, \dots, y_n, \dots, y_n^{(k_n)}) &= 0,\\
&\;\;\vdots\\
F_m(x, y_1, \dots, y_1^{(k_1)}, \dots, y_n, \dots, y_n^{(k_n)}) &= 0.
\end{aligned} \tag{1.3.1}$$

In such a case, we will always assume that $m = n$ and that the Jacobian of the $F_i$’s in the variables corresponding to $y_1^{(k_1)}, \dots, y_n^{(k_n)}$ is non-zero. Then, using the Implicit Function Theorem 6.3 of Chapter 3, the system (1.3.1) can be converted (at least locally) to the system (1.2.1), and hence again, by the method explained there, to a first-order system of the form (1.1.1). If $m \ne n$ or the Jacobian in question is 0, the problem (1.3.1) will be considered ill-posed from our point of view.

Note that whether the problem (1.3.1) is well-posed depends on the values of $x$, the $y_i$’s and their derivatives up to $y_i^{(k_i-1)}$, and the (number) solution of the resulting equations for the $y_i^{(k_i)}$’s. We will see, however, that this is in the spirit of the theory we will develop, as in solving the system (1.2.1), we get to specify $x$, the $y_i$’s and their derivatives up to $y_i^{(k_i-1)}$ as initial conditions. (This is equivalent to specifying $x$ and $y_i$ as initial conditions in the system (1.1.1).)

The translations of 1.2 and 1.3 serve a theoretical purpose. They may often be difficult to carry out in practice. In many cases, different reductions may be more advantageous. (See Exercise (3).)

1.4 Remarks

1. To simplify notation, we write $y' = f(x, y)$ instead of the more correct $y'(x) = f(x, y(x))$, etc. Thus, the symbol $y$ may feature both as a variable in a function $f$ of two variables, and as a name of a function $y(x)$.

2. Differential equations play a fundamental role in various applications. Let us just mention a simple geometric interpretation of the ODE $y' = f(x, y)$: the function $f(x, y)$ determines directions at individual points $(x, y)$ of the plane $\mathbb{R}^2$; the graphs of the desired solutions are curves following the prescribed directions.

2 Converting a system of ODE’s to a system of integral equations

2.1 Theorem. Let $(a, b)$ be an open interval containing a number $x_0$. Let $\eta_1, \dots, \eta_n$ be arbitrary real numbers. Then the functions $y_1(x), \dots, y_n(x)$ constitute a solution of the ODE system

$$y_j'(x) = f_j(x, y_1(x), \dots, y_n(x)), \quad j = 1, \dots, n \tag{2.1.1}$$

in this interval such that, moreover, $y_j(x_0) = \eta_j$ if and only if they satisfy the equations

$$y_j(x) = \int_{x_0}^{x} f_j(t, y_1(t), \dots, y_n(t))\,dt + \eta_j. \tag{2.1.2}$$

Proof. This is an easy consequence of the Fundamental Theorem of Calculus. If (2.1.1) is satisfied then one has

$$y_j(x) = \int_{x_0}^{x} f_j(t, y_1(t), \dots, y_n(t))\,dt + c_j$$

for some constants $c_j$. If, moreover, $y_j(x_0) = \eta_j$ we obtain for $x = x_0$,

$$\eta_j = y_j(x_0) = \int_{x_0}^{x_0} f_j(\dots)\,dt + c_j = 0 + c_j.$$

On the other hand, if the functions $y_j(x)$ satisfy (2.1.2) then by taking the derivative by $x$ we obtain that $y_j'(x) = f_j(x, y_1(x), \dots, y_n(x))$, and setting $x = x_0$ we conclude that $y_j(x_0) = \eta_j$. □

2.2 Remark

This very easy translation of our problem has in fact a quite surprising consequence. Let us illustrate it on the equation $y' = f(x, y)$. Denote by $D$ the operator of taking the derivative, and by $F$ the operator transforming $y(x)$ to $f(x, y(x))$. Further, define an operator $J$ by setting

$$J(y)(x) = \int_{x_0}^{x} f(t, y(t))\,dt.$$

The original task was to solve the equation

$$D(y) = F(y). \tag{*}$$

This looks somewhat scary: for example, if we take the space $X = C((a, b))$ of bounded continuous functions on $(a, b)$ as considered in 7.7 of Chapter 2, the operator $D$ is not even defined on $X$, as not every continuous function has a derivative. It seems that in order to treat the equation by means of spaces of functions, we would have to think hard about what space to work on, and what metric to choose to make both sides of the equation (*) continuous. Such problems do, indeed, arise with some types of differential equations.

However, in the case of our system (1.1.1), Theorem 2.1 gives a way out: after the translation we obtain the equation

$$y = J(y) \tag{**}$$

(with the constant from the initial condition in (2.1.2) absorbed into $J$), where $J$ is (as we will see) continuous. Furthermore, this is a fixed-point problem about which we already know something (see 7.6 of Chapter 2); indeed, the Banach Fixed Point Theorem will be of great help.

3 The Lipschitz property and a solution of the integral equation

3.1

Let $f(x, y_1, \dots, y_n)$ be a function in $n+1$ (real) variables. It is said to be Lipschitz in the variables $y_1, \dots, y_n$ if there exists a number $M$ such that

$$|f(x, y_1, \dots, y_n) - f(x, z_1, \dots, z_n)| \le M \cdot \max_i |y_i - z_i|.$$

We say that $f$ is locally Lipschitz in $y_1, \dots, y_n$ if for each point $u_0 = (x_0, y_1^0, \dots, y_n^0)$ of the domain in question there is an open $U \ni u_0$ such that the restriction $f|U$ is Lipschitz.

3.2 Observation. If a function $f(x, y_1, \dots, y_n)$ has continuous partial derivatives $\frac{\partial f}{\partial y_j}$ then it is locally Lipschitz.

(Indeed take a point $u_0 = (x_0, y_1^0, \dots, y_n^0)$, an open set $U \ni u_0$ and an $M$ such that on $U$

$$\left|\frac{\partial f(x, y_1, \dots, y_n)}{\partial y_j}\right| \le \frac{M}{n}.$$

Then by the Mean Value Theorem, we have for $(x, y_1, \dots, y_n), (x, z_1, \dots, z_n) \in U$,

$$|f(x, y_1, \dots, y_n) - f(x, z_1, \dots, z_n)| = \left|\sum_j \frac{\partial f(\dots)}{\partial y_j} \cdot (y_j - z_j)\right| \le \sum_j \left|\frac{\partial f(\dots)}{\partial y_j}\right| \cdot |y_j - z_j| \le n \cdot \frac{M}{n} \cdot \max_j |y_j - z_j|.)$$

3.3 Theorem. Let $f_j(x, y_1, \dots, y_n)$, $j = 1, \dots, n$ be continuous and Lipschitz in the variables $y_1, \dots, y_n$ in a neighborhood of a point $u = (x_0, \eta_1, \dots, \eta_n)$. Then there is an $a > 0$ such that in the interval $(x_0 - a, x_0 + a)$ the system of equations

$$u_j(x) = \int_{x_0}^{x} f_j(t, u_1(t), \dots, u_n(t))\,dt + \eta_j$$

has precisely one solution $u_1, \dots, u_n$.

Proof. First, choose a neighborhood

$$U = (x_0 - \alpha_0, x_0 + \alpha_0) \times (\eta_1 - \beta_0, \eta_1 + \beta_0) \times \cdots \times (\eta_n - \beta_0, \eta_n + \beta_0)$$

on which the $f_j|U$ are Lipschitz. Now choose $0 < \alpha < \alpha_0$ and $0 < \beta < \beta_0$. We have an $M$ such that

$$|x_0 - x| \le \alpha, \quad |\eta_j - y_j| \le \beta, \quad |\eta_j - z_j| \le \beta$$

implies that

$$|f_j(x, y_1, \dots, y_n) - f_j(x, z_1, \dots, z_n)| \le M \cdot \max_i |y_i - z_i|.$$

Since the $f_j$ are continuous we also have an $A$ such that

$$|f_j(x, y_1, \dots, y_n)| \le A$$

in the compact interval $\langle x_0 - \alpha, x_0 + \alpha\rangle \times \langle \eta_1 - \beta, \eta_1 + \beta\rangle \times \cdots \times \langle \eta_n - \beta, \eta_n + \beta\rangle$ (recall Proposition 6.3 of Chapter 2).

Choose an $a$ such that

(1) $0 < a \le \alpha$,

(2) $a \le \frac{\beta}{A}$, and

(3) $a \le \frac{q}{M}$ for some $q < 1$.

Consider the space of continuous functions

$$C = C((x_0 - a, x_0 + a))$$

(recall 7.7 of Chapter 2) and the subspaces

$$Y_j = \{u \mid u \in C,\ \eta_j - \beta \le u(x) \le \eta_j + \beta\}.$$

All the $Y_j$ are complete metric spaces and hence also the product

$$Y = Y_1 \times Y_2 \times \cdots \times Y_n$$

with, say, the maximum metric

$$\rho(u, v) = \max_j \rho_j(u_j, v_j),$$

where $\rho_j(\varphi, \psi) = \sup_x |\varphi(x) - \psi(x)|$, is complete (7.7.2 and 7.3.1 of Chapter 2). Now define for $u = (u_1, \dots, u_n)$

$$J(u) = (J_1(u), \dots, J_n(u))$$

where

$$J_j(u)(x) = \int_{x_0}^{x} f_j(t, u_1(t), \dots, u_n(t))\,dt + \eta_j.$$

Since

$$|J_j(u)(x) - \eta_j| = \left|\int_{x_0}^{x} f_j(t, u_1(t), \dots, u_n(t))\,dt\right| \le \left|\int_{x_0}^{x} |f_j(t, u_1(t), \dots, u_n(t))|\,dt\right| \le |x_0 - x| \cdot A \le a \cdot A \le \beta,$$

$J$ is a mapping $Y \to Y$, and our problem is to find a fixed point of $J$. We have

$$\begin{aligned}
\rho(J(u), J(v)) &= \max_k \sup_x |J_k(u)(x) - J_k(v)(x)|\\
&= \max_k \sup_x \left|\int_{x_0}^{x} f_k(t, u_1(t), \dots)\,dt - \int_{x_0}^{x} f_k(t, v_1(t), \dots)\,dt\right|\\
&= \max_k \sup_x \left|\int_{x_0}^{x} \big(f_k(t, u_1(t), \dots) - f_k(t, v_1(t), \dots)\big)\,dt\right|\\
&\le \max_k \sup_x \left|\int_{x_0}^{x} |f_k(t, u_1(t), \dots) - f_k(t, v_1(t), \dots)|\,dt\right| = c.
\end{aligned}$$

Since we have

$$|f_k(t, u_1(t), \dots) - f_k(t, v_1(t), \dots)| \le M \cdot \max_j |u_j(t) - v_j(t)| \le M \cdot \max_j \sup_x |u_j(x) - v_j(x)| = M \cdot \rho(u, v)$$

we obtain

$$\rho(J(u), J(v)) \le c \le \sup_x |x - x_0| \cdot M \cdot \rho(u, v) \le a \cdot M \cdot \rho(u, v) \le q \cdot \rho(u, v).$$

Thus, $J : Y \to Y$ satisfies the condition of the Banach Fixed Point Theorem 7.6 of Chapter 2 and we conclude that there is precisely one $u$ such that $J(u) = u$, that is, precisely one solution of our integral equations on the interval $(x_0 - a, x_0 + a)$. □
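The fixed point in this proof can be approximated by iterating $J$ (Picard iteration). The sketch below does this exactly, with rational polynomial coefficients, for the single equation $y' = y$ with $x_0 = 0$; the equation and the names `picard_step` and `picard` are illustrative assumptions, not part of the text.

```python
from fractions import Fraction

def picard_step(coeffs, eta):
    """Apply J to the polynomial u(t) = sum(coeffs[k] * t**k) for y' = y, x0 = 0:
    J(u)(x) = eta + integral from 0 to x of u(t) dt."""
    integrated = [c / (k + 1) for k, c in enumerate(coeffs)]
    return [Fraction(eta)] + integrated

def picard(eta, iterations):
    """Iterate J starting from the constant function u_0 = eta."""
    u = [Fraction(eta)]
    for _ in range(iterations):
        u = picard_step(u, eta)
    return u

# Coefficients of the fifth iterate for the initial value y(0) = 1.
u5 = picard(1, 5)
print(u5)
```

For this particular equation each iterate is a Taylor polynomial of the exponential function, so the convergence promised by the Banach Fixed Point Theorem can be seen directly in the coefficients.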

4 Existence and uniqueness of a solution of an ODE system

4.1

Using 2.1, we immediately infer from Theorem 3.3 the following

Theorem. (The Picard-Lindelöf Theorem) Let $f_j(x, y_1, \dots, y_n)$, $j = 1, \dots, n$ be continuous and let them be Lipschitz with respect to $y_1, \dots, y_n$ in a neighborhood of a point $u = (x_0, \eta_1, \dots, \eta_n)$. Then for a sufficiently small $a > 0$ the system

$$y_j'(x) = f_j(x, y_1(x), \dots, y_n(x)), \quad j = 1, \dots, n$$

has precisely one solution on $(x_0 - a, x_0 + a)$ such that $y_j(x_0) = \eta_j$ for all $j$.

Remark. Thus, unlike the uniqueness in 3.3, the solution is unique with respect to the extra conditions $y_j(x_0) = \eta_j$. These requirements are usually referred to as the initial conditions.

4.2

The solutions in 4.1 are of a local character, that is, they are guaranteed in a small neighborhood of the initial point $x_0$ only. Now we will head to solutions of a more global character, defined as far as possible. To start with, we will speak of a local solution $(u, J)$ defined on an open interval $J$ and we will endeavour to extend the interval $J$.

4.2.1 Lemma. Under the conditions of 4.1, let $J, K$ be open intervals, let $x_0 \in J \cap K$, and let $(u, J)$ and $(v, K)$ be local solutions such that $u(x_0) = v(x_0)$. If $f$ is continuous and Lipschitz with respect to the $y_j$ in the domain in which we consider our system, we have $u|_{J \cap K} = v|_{J \cap K}$.

Proof. By 4.1, if $u$ and $v$ coincide at a point they coincide in some open neighborhood of it. Thus,

$$U = \{x \mid u(x) = v(x),\ x \in J \cap K\}$$

is an open subset of $J \cap K$. From the continuity of $u$, $v$ it follows that $U$ is closed as well. Since $J \cap K$ is an interval, hence connected by 5.2.2 of Chapter 2, and since $U$ is non-empty, $U = K \cap J$. □

4.2.2

Take the union $J$ of all the intervals on which there exists a solution $u$ satisfying $u_j(x_0) = \eta_j$. By Lemma 4.2.1, there exists a solution $(u, J)$ with the domain $J$. Such maximal solutions are called the characteristics of the given ODE system. In this terminology we can summarize the preceding facts in the following

Theorem. Let $U$ be an open subset of $\mathbb{R}^{n+1}$ and let $\mathbf{f}(x, y_1, \dots, y_n) : U \to \mathbb{R}^n$ be continuous and locally Lipschitz in $y_1, \dots, y_n$. Then for each $(x_0, \eta_1, \dots, \eta_n) \in U$ there is a unique characteristic $u$ such that $u_j(x_0) = \eta_j$.

4.3

Consider a differential equation

$$y^{(n)} = f(x, y, y', \dots, y^{(n-1)}). \tag{4.3.1}$$

From the method of 1.2 and from Theorem 4.2.2, we obtain the following

Corollary. Let $U$ be an open subset of $\mathbb{R}^{n+1}$ and let $f(x, y_1, y_2, \dots, y_n)$ be continuous and locally Lipschitz in $y_1, \dots, y_n$. Then for each $(x_0, \eta_1, \dots, \eta_n) \in U$ there exists precisely one solution $y$ of the equation (4.3.1) with maximal interval domain such that

$$y^{(k)}(x_0) = \eta_{k+1}, \quad k = 0, \dots, n-1.$$

4.4 Examples

1. The domain of a characteristic may not be equal to the domain on which a differential equation is defined. For example, the differential equation

$$y' = 1 + y^2$$

has solutions $y = \tan(x + C)$ where $C$ is any constant, as easily verified. Each of them is defined only on an open interval of length $\pi$, although the right-hand side is defined on all of $\mathbb{R}^2$. (See the next section for a more systematic method for finding the solution.)

2. The Lipschitz condition in the assumptions is essential. Consider the equation

$$y' = 3y^{2/3}.$$

We have the solutions $y(x) = (x + c)^3$. The function $f(x, y) = 3y^{2/3}$ is locally Lipschitz in $y$ at all points except the points $(x, 0)$. And indeed through these exceptional points we have the solutions

$$y(x) = \begin{cases} (x - a)^3 & \text{for } x \le a,\\ 0 & \text{for } a \le x \le b,\\ (x - b)^3 & \text{for } x \ge b, \end{cases}$$

all of them satisfying $y(x_0) = 0$ for any $x_0 \in (a, b)$.
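The failure of uniqueness in Example 2 can be checked numerically. The sketch below builds two of the glued solutions for different (arbitrarily chosen) values of $a$ and $b$ and measures how well each satisfies $y' = 3y^{2/3}$ using a central difference; the helper names and the sample grid are assumptions made for this illustration.

```python
def f(x, y):
    """Right-hand side 3*y^(2/3), written as 3*(y^2)^(1/3) so it is real for negative y."""
    return 3.0 * (y * y) ** (1.0 / 3.0)

def glued_solution(a, b):
    """The solution equal to (x-a)^3 left of a, 0 on [a, b], and (x-b)^3 right of b."""
    def y(x):
        if x <= a:
            return (x - a) ** 3
        if x <= b:
            return 0.0
        return (x - b) ** 3
    return y

def max_residual(y, xs, h=1e-5):
    """Largest value of |y'(x) - f(x, y(x))| on the grid, with y' from a central difference."""
    return max(abs((y(x + h) - y(x - h)) / (2 * h) - f(x, y(x))) for x in xs)

xs = [-3 + 0.01 * k for k in range(601)]   # grid on [-3, 3]
y1 = glued_solution(-1.0, 1.0)
y2 = glued_solution(-2.0, 0.5)

print(max_residual(y1, xs), max_residual(y2, xs))
print(y1(0.0), y2(0.0), y1(2.0), y2(2.0))
```

Both functions pass through the point $(0, 0)$ and satisfy the equation up to discretization error, yet they differ at $x = 2$.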

5 Stability of solutions

5.1 The problems of stability

Consider the equations

$$y_j'(x) = f_j(x, y_1(x), \dots, y_n(x)),$$

$$y_j(x_0) = \eta_j, \quad j = 1, \dots, n$$

solved as in 4.1. The solution depends (uniquely) on the $\eta_j$. A question naturally arises whether this dependence is continuous. For example, if it were not continuous, using the solution in practical applications would be rather suspect, as the effect of small errors in initial conditions would be unpredictable.

Furthermore, a practical setting often contains additional parameters, so the system becomes

$$\begin{aligned}
y_j'(x, \alpha_1, \dots, \alpha_k) &= f_j(x, y_1(x, \alpha_1, \dots, \alpha_k), \dots, y_n(x, \alpha_1, \dots, \alpha_k), \alpha_1, \dots, \alpha_k),\\
y_j(x_0, \alpha_1, \dots, \alpha_k) &= \eta_j, \quad j = 1, \dots, n.
\end{aligned} \tag{*}$$

As before, the derivative is taken by $x$ (while technically this is a partial derivative, the convention is to continue using the ordinary derivative symbol to emphasize the fact that we have one system of ordinary differential equations for each value of the parameters). As, again, in practice the parameters are known only approximately, the solution makes practical sense only if it depends continuously on the parameters $\alpha_i$.

The two stability problems can be reduced to one. Fix initial conditions $\eta_j^0$, consider

$$\beta_i = \eta_i - \eta_i^0, \quad z_i = y_i - \beta_i$$

and define

$$g_j(x, z_1, \dots, z_n, \alpha_1, \dots, \alpha_k, \beta_1, \dots, \beta_n) = f_j(x, z_1 + \beta_1, \dots, z_n + \beta_n, \alpha_1, \dots, \alpha_k)$$

which turns the combined task (*) into

$$\begin{aligned}
z_j'(x, \alpha_1, \dots, \alpha_k, \beta_1, \dots, \beta_n) &= g_j(x, z_1(x, \alpha_1, \dots, \beta_n), \dots, z_n(x, \alpha_1, \dots, \beta_n), \alpha_1, \dots, \alpha_k, \beta_1, \dots, \beta_n),\\
z_j(x_0, \alpha_1, \dots, \alpha_k, \beta_1, \dots, \beta_n) &= \eta_j^0, \quad j = 1, \dots, n
\end{aligned}$$

with the initial values $\eta_j^0$ fixed. Thus, it suffices to study the dependence of the system on parameters only, with initial conditions fixed; in the notation (*), this means we will study stability with respect to $\alpha_1, \dots, \alpha_k$, with $\eta_j$ fixed.

5.1.1 Remark

One can also convert the combined stability problem into a problem concerning initial conditions only. But the trick with parameters is more expedient and we will concentrate on that.

5.2 Lemma. (Gronwall’s inequality) Let $F$ be a non-negative continuous real-valued function on an interval $\langle a, b\rangle$ and let there exist positive constants $C, K$ such that for all $x \in \langle a, b\rangle$ we have

$$F(x) \le C + K \int_a^x F(t)\,dt.$$

Then for all $x \in \langle a, b\rangle$,

$$F(x) \le C \cdot e^{K(x-a)}.$$

Proof. Put

$$G(x) = C + K \int_a^x F(t)\,dt.$$

Then we have

$$F(x) \le G(x) \quad\text{and}\quad G'(x) = K \cdot F(x) \le K \cdot G(x).$$

Since $G(x) > 0$, we have

$$\frac{G'(x)}{G(x)} \le K$$

and hence

$$\int_a^x \frac{G'(t)}{G(t)}\,dt \le K \cdot \int_a^x 1 \cdot dt = K(x - a).$$

Substituting $y = G(t)$ in the first integral, we obtain

$$\int_{G(a)}^{G(x)} \frac{dy}{y} = \ln G(x) - \ln G(a) = \ln G(x) - \ln C,$$

so that

$$\ln G(x) \le \ln C + K(x - a), \quad\text{and hence}\quad G(x) \le C \cdot e^{K(x-a)}.$$

Using $F(x) \le G(x)$ again, we obtain the desired inequality. □

5.3

To simplify notation, in the proof of the following theorem we will write $\boldsymbol{\alpha}$ for $\alpha_1, \dots, \alpha_k$ and use the symbol

$$\|\boldsymbol{\alpha}\| \quad\text{for}\quad \max_{j=1,\dots,k} |\alpha_j|.$$

Similarly, for a system $y_1, \dots, y_n$ resp. $y_1(x), \dots, y_n(x)$, we will write

$$\|\mathbf{y}\| = \max_{j=1,\dots,n} |y_j| \quad\text{or}\quad \|\mathbf{y}(x)\| = \max_{j=1,\dots,n} |y_j(x)|.$$

Theorem. Let $f_j(x, y_1, \dots, y_n, \alpha_1, \dots, \alpha_k)$ be functions continuous in all variables and Lipschitz in the variables $y_j$ and $\alpha_j$ in some neighborhood of a point

$$(x_0, \eta_1, \dots, \eta_n, \alpha_1^0, \dots, \alpha_k^0). \tag{5.3.1}$$

Then the solution $y_j(x, \alpha_1, \dots, \alpha_k)$ of the system of equations

$$y_j'(x, \alpha_1, \dots, \alpha_k) = f_j(x, y_1(x, \alpha_1, \dots, \alpha_k), \dots, y_n(x, \alpha_1, \dots, \alpha_k), \alpha_1, \dots, \alpha_k),$$

$$y_j(x_0, \alpha_1, \dots, \alpha_k) = \eta_j, \quad j = 1, \dots, n$$

is continuous in all variables in some neighborhood $U$ of the point (5.3.1). Moreover, if $K$ is a Lipschitz constant for the variables $y_1, \dots, y_n$, $\alpha_1, \dots, \alpha_k$, we have an estimate on $U$:

$$|y_j(x, \alpha_1, \dots, \alpha_k) - y_j(x, \beta_1, \dots, \beta_k)| \le \max_{i=1,\dots,k} |\alpha_i - \beta_i| \, e^{K(x-a)} \tag{5.3.2}$$

for all $j = 1, \dots, n$.

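Proof. Here $a = x_0$, so that $y_j(a, \boldsymbol{\alpha}) = y_j(a, \boldsymbol{\beta}) = \eta_j$, and we consider $x \ge a$. We have

$$\begin{aligned}
|y_j(x, \boldsymbol{\alpha}) - y_j(x, \boldsymbol{\beta})| &= \left|\int_a^x \big(f_j(t, y_1(t, \boldsymbol{\alpha}), \dots, y_n(t, \boldsymbol{\alpha}), \boldsymbol{\alpha}) - f_j(t, y_1(t, \boldsymbol{\beta}), \dots, y_n(t, \boldsymbol{\beta}), \boldsymbol{\beta})\big)\,dt\right|\\
&\le \int_a^x |f_j(t, y_1(t, \boldsymbol{\alpha}), \dots, y_n(t, \boldsymbol{\alpha}), \boldsymbol{\alpha}) - f_j(t, y_1(t, \boldsymbol{\beta}), \dots, y_n(t, \boldsymbol{\beta}), \boldsymbol{\beta})|\,dt\\
&\le \int_a^x \big(|f_j(t, y_1(t, \boldsymbol{\alpha}), \dots, \boldsymbol{\alpha}) - f_j(t, y_1(t, \boldsymbol{\alpha}), \dots, \boldsymbol{\beta})|\\
&\qquad\quad + |f_j(t, y_1(t, \boldsymbol{\alpha}), \dots, \boldsymbol{\beta}) - f_j(t, y_1(t, \boldsymbol{\beta}), \dots, \boldsymbol{\beta})|\big)\,dt\\
&\le \int_a^x \big(K \cdot \|\boldsymbol{\alpha} - \boldsymbol{\beta}\| + K \cdot \|\mathbf{y}(t, \boldsymbol{\alpha}) - \mathbf{y}(t, \boldsymbol{\beta})\|\big)\,dt,
\end{aligned}$$

so

$$\|\mathbf{y}(x, \boldsymbol{\alpha}) - \mathbf{y}(x, \boldsymbol{\beta})\| \le K \int_a^x \big(\|\boldsymbol{\alpha} - \boldsymbol{\beta}\| + \|\mathbf{y}(t, \boldsymbol{\alpha}) - \mathbf{y}(t, \boldsymbol{\beta})\|\big)\,dt.$$

If we set $F(x) = \|\boldsymbol{\alpha} - \boldsymbol{\beta}\| + \|\mathbf{y}(x, \boldsymbol{\alpha}) - \mathbf{y}(x, \boldsymbol{\beta})\|$, we obtain

$$F(x) \le \|\boldsymbol{\alpha} - \boldsymbol{\beta}\| + K \int_a^x F(t)\,dt.$$

By Lemma 5.2, we now have

$$F(x) \le \|\boldsymbol{\alpha} - \boldsymbol{\beta}\| \, e^{K(x-a)}$$

and since $\|\mathbf{y}(x, \boldsymbol{\alpha}) - \mathbf{y}(x, \boldsymbol{\beta})\| \le F(x)$, the estimate (5.3.2) follows. □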
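To see the estimate (5.3.2) at work, the sketch below checks it for the one-parameter equation $y' = \alpha y$, $y(0) = 1$, whose solution $e^{\alpha x}$ is known in closed form. The example equation, the region $0 \le x \le 1$, $0 \le \alpha \le 1$, and the Lipschitz constant $K = e$ (valid there because $|y| \le e$ and $|\alpha| \le 1$) are assumptions chosen for this illustration only.

```python
import math

def solution(x, alpha):
    """Exact solution of y' = alpha * y with y(0) = 1."""
    return math.exp(alpha * x)

# Assumed Lipschitz constant of f(x, y, alpha) = alpha * y on |y| <= e, 0 <= alpha <= 1:
# |alpha*y - beta*z| <= |alpha|*|y - z| + |z|*|alpha - beta| <= e*(|y - z| + |alpha - beta|).
K = math.e

def bound(x, alpha, beta):
    """Right-hand side of the estimate (5.3.2) with a = x0 = 0."""
    return abs(alpha - beta) * math.exp(K * x)

# Sample x, alpha, beta on a grid in [0, 1] and collect any violation of the estimate.
grid = [k / 10 for k in range(11)]
violations = [
    (x, a, b)
    for x in grid for a in grid for b in grid
    if abs(solution(x, a) - solution(x, b)) > bound(x, a, b) + 1e-12
]
print(len(violations))
```

The bound is far from sharp here, which is typical: Gronwall's inequality trades precision for generality.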

5.4 Remark

Recall that the existence and uniqueness in Theorem 4.1 was proved using the Banach Fixed Point Theorem 7.6 of Chapter 2. The reader may naturally ask whether the stability theorem (at least the continuity) is not an easy consequence of a general property of such fixed points. That is, we think of the following problem.

Let us have metric spaces $X, T$ and a mapping

$$f : X \times T \to X$$

such that $d(f(x, t), f(y, t)) \le r_t \cdot d(x, y)$ where the $r_t < 1$ depend on $t \in T$ only. Define $F(t) \in X$ by the equation $f(F(t), t) = F(t)$. How does $F(t)$ depend on $t$?

There are fairly general facts known on this subject, but they do not fit well with our present topic. Due to the special character of our equations it is, luckily enough, easy to show the dependence by an explicit estimate, as we have done.

5.5

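The solution of a system of differential equations is (under reasonable conditions) not only continuously dependent on parameters. In fact we can even take derivatives.

Consider, again, the system of equations

$$\begin{aligned}
y_i'(x, \boldsymbol{\alpha}) &= f_i(x, y_1(x, \boldsymbol{\alpha}), \dots, y_n(x, \boldsymbol{\alpha}), \boldsymbol{\alpha}), \quad i = 1, \dots, n,\\
y_i(x_0, \boldsymbol{\alpha}) &= \eta_i
\end{aligned} \tag{5.5.1}$$

satisfying the conditions from 4.1 (where we write, similarly as before, $y_i'$, not $\frac{\partial y_i}{\partial x}$, for the derivatives by $x$, to keep in mind the fact that we are dealing with an ordinary differential equation).

Theorem. Let $f_i(x, y_1, \dots, y_n, \alpha_1, \dots, \alpha_k)$ be continuous functions defined on an open neighborhood of a point (5.3.1), continuously differentiable with respect to $y_j$ and $\alpha_p$. Then the solutions $y_i(x, \boldsymbol{\alpha})$ of the system (5.5.1), which exist and are unique on some open neighborhood $U$ of (5.3.1), are differentiable with respect to $\alpha_p$, $p = 1, \dots, k$ on $U$, and the functions

$$z_i(x, \boldsymbol{\alpha}) = \frac{\partial y_i}{\partial \alpha_p}(x, \boldsymbol{\alpha})$$

satisfy the system of equations

$$\begin{aligned}
z_i'(x, \boldsymbol{\alpha}) &= \sum_{j=1}^{n} \frac{\partial f_i}{\partial y_j}(x, \mathbf{y}(x, \boldsymbol{\alpha}), \boldsymbol{\alpha}) \cdot z_j + \frac{\partial f_i}{\partial \alpha_p}(x, \mathbf{y}(x, \boldsymbol{\alpha}), \boldsymbol{\alpha}), \quad i = 1, \dots, n,\\
z_i(x_0, \boldsymbol{\alpha}) &= 0,
\end{aligned} \tag{5.5.2}$$

where we write briefly $\mathbf{y}$ for $y_1, \dots, y_n$ and $\boldsymbol{\alpha}$ for $\alpha_1, \dots, \alpha_k$.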

Remarks.

1. The continuous differentiability with respect to $y_j$ and $\alpha_p$ makes, of course, the functions $f_i$ locally Lipschitz with respect to these variables.

2. The system (5.5.1) is viewed as solved and the $y_i(x, \boldsymbol{\alpha})$ constitute the (unique) solution. The equations (5.5.2) contain these functions as already given, not as something dependent on the $z_i$. Thus, the right-hand sides of the equations in (5.5.2) are Lipschitz with respect to $z_j$ and therefore the system has a solution. Our task will be to prove that the individual $z_i$’s are the partial derivatives of the $y_i$ by $\alpha_p$.

3. The reader has certainly not overlooked that the equations for the $z_i$, which we hope to be the $\frac{\partial y_i}{\partial \alpha_p}$, come naturally in the form (5.5.2): if we already knew the $y_i$ to have derivatives, we would obtain the equality by taking derivatives of the equalities in (5.5.1). But this we do not know yet.

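Proof. First of all, note that the problem is immediately reduced to the case $k = 1$: we may treat all parameters but one as constant for the existence of a single partial derivative; once equation (5.5.2) is proved, we can use Theorem 5.3 to prove continuity of the partial derivatives in all the $\alpha_p$’s. Thus, let us assume $k = 1$, and write $\alpha$ for $\alpha_p$.

Let $y_i$ be a solution of the system (5.5.1) and $z$ a solution of the system (5.5.2). Put

$$u_i(x, \alpha, h) = \frac{1}{h}\big(y_i(x, \alpha + h) - y_i(x, \alpha)\big)$$

and

$$v_i(x, \alpha, h) = u_i(x, \alpha, h) - z_i(x, \alpha).$$

Thus,

$$\begin{aligned}
\frac{\partial v_i}{\partial x}(x, \alpha, h) &= \frac{\partial u_i}{\partial x}(x, \alpha, h) - z_i'(x, \alpha)\\
&= \frac{\partial u_i}{\partial x}(x, \alpha, h) - \sum_{j=1}^{n} \frac{\partial f_i}{\partial y_j}(x, \mathbf{y}(x, \alpha), \alpha) \cdot z_j - \frac{\partial f_i}{\partial \alpha}(x, \mathbf{y}(x, \alpha), \alpha).
\end{aligned}$$

Let us compute the derivative $\frac{\partial u_i}{\partial x}(x, \alpha, h)$:

$$\begin{aligned}
\frac{\partial u_i}{\partial x}(x, \alpha, h) &= \frac{1}{h}\big(y_i'(x, \alpha + h) - y_i'(x, \alpha)\big)\\
&= \frac{1}{h}\big(f_i(x, \mathbf{y}(x, \alpha + h), \alpha + h) - f_i(x, \mathbf{y}(x, \alpha), \alpha + h)\big)\\
&\quad + \frac{1}{h}\big(f_i(x, \mathbf{y}(x, \alpha), \alpha + h) - f_i(x, \mathbf{y}(x, \alpha), \alpha)\big).
\end{aligned}$$

By the Mean Value Theorem we may continue, writing $\Delta\mathbf{y}$ for $y_1(x, \alpha + h) - y_1(x, \alpha), \dots, y_n(x, \alpha + h) - y_n(x, \alpha)$, that is, $h u_1(x, \alpha, h), \dots, h u_n(x, \alpha, h)$, and with suitable $0 < \theta_1, \theta_2 < 1$,

$$\cdots = \sum_{j=1}^{n} \frac{\partial f_i}{\partial y_j}(x, \mathbf{y}(x, \alpha) + \theta_1 \Delta\mathbf{y}, \alpha + h) \cdot u_j(x, \alpha, h) + \frac{\partial f_i}{\partial \alpha}(x, \mathbf{y}(x, \alpha), \alpha + \theta_2 h).$$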

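Let us now consider $\frac{\partial v_i}{\partial x}$. Since $u_i(x, \alpha, h) = v_i(x, \alpha, h) + z_i(x, \alpha)$, we obtain

$$\begin{aligned}
\left|\frac{\partial v_i}{\partial x}(x, \alpha, h)\right| &\le \sum_{j=1}^{n} \left|\frac{\partial f_i}{\partial y_j}(x, \mathbf{y}(x, \alpha) + \theta_1 \Delta\mathbf{y}, \alpha + h)\right| \cdot |v_j(x, \alpha, h)|\\
&\quad + \sum_{j=1}^{n} \left|\left(\frac{\partial f_i}{\partial y_j}(x, \mathbf{y}(x, \alpha) + \theta_1 \Delta\mathbf{y}, \alpha + h) - \frac{\partial f_i}{\partial y_j}(x, \mathbf{y}(x, \alpha), \alpha)\right) \cdot z_j(x, \alpha)\right|\\
&\quad + \left|\frac{\partial f_i}{\partial \alpha}(x, \mathbf{y}(x, \alpha), \alpha + \theta_2 h) - \frac{\partial f_i}{\partial \alpha}(x, \mathbf{y}(x, \alpha), \alpha)\right|,
\end{aligned}$$

and further

$$\begin{aligned}
\left|\frac{\partial v_i}{\partial x}(x, \alpha, h)\right| &\le \sum_{j=1}^{n} \left|\frac{\partial f_i}{\partial y_j}(x, \mathbf{y}(x, \alpha) + \theta_1 \Delta\mathbf{y}, \alpha + h)\right| \cdot |v_j(x, \alpha, h)|\\
&\quad + \sum_{j=1}^{n} \left|\left(\frac{\partial f_i}{\partial y_j}(x, \mathbf{y}(x, \alpha) + \theta_1 \Delta\mathbf{y}, \alpha + h) - \frac{\partial f_i}{\partial y_j}(x, \mathbf{y}(x, \alpha), \alpha + h)\right) \cdot z_j(x, \alpha)\right|\\
&\quad + \sum_{j=1}^{n} \left|\left(\frac{\partial f_i}{\partial y_j}(x, \mathbf{y}(x, \alpha), \alpha + h) - \frac{\partial f_i}{\partial y_j}(x, \mathbf{y}(x, \alpha), \alpha)\right) \cdot z_j(x, \alpha)\right|\\
&\quad + \left|\frac{\partial f_i}{\partial \alpha}(x, \mathbf{y}(x, \alpha), \alpha + \theta_2 h) - \frac{\partial f_i}{\partial \alpha}(x, \mathbf{y}(x, \alpha), \alpha)\right|.
\end{aligned}$$

Choose a compact neighbourhood of $(x_0, \mathbf{y}(x_0, \alpha), \alpha)$ and a $K$ sufficiently large to have, in this range,

$$\max_i \sum_{j=1}^{n} \left|\frac{\partial f_i}{\partial y_j}(x, \mathbf{y}(x, \alpha), \alpha)\right| \le K.$$

Now let $\varepsilon > 0$. From the Lipschitz property (5.3.2) and the continuity of the partial derivatives we see that for $h$ sufficiently small we have, for all $x$ sufficiently close to $x_0$ to stay in the aforementioned range,

$$\begin{aligned}
&\sum_{j=1}^{n} \left|\left(\frac{\partial f_i}{\partial y_j}(x, \mathbf{y}(x, \alpha) + \theta_1 \Delta\mathbf{y}, \alpha + h) - \frac{\partial f_i}{\partial y_j}(x, \mathbf{y}(x, \alpha), \alpha + h)\right) \cdot z_j(x, \alpha)\right|\\
&\quad + \sum_{j=1}^{n} \left|\left(\frac{\partial f_i}{\partial y_j}(x, \mathbf{y}(x, \alpha), \alpha + h) - \frac{\partial f_i}{\partial y_j}(x, \mathbf{y}(x, \alpha), \alpha)\right) \cdot z_j(x, \alpha)\right|\\
&\quad + \left|\frac{\partial f_i}{\partial \alpha}(x, \mathbf{y}(x, \alpha), \alpha + \theta_2 h) - \frac{\partial f_i}{\partial \alpha}(x, \mathbf{y}(x, \alpha), \alpha)\right| < \varepsilon
\end{aligned}$$

and hence

$$\left|\frac{\partial v_i}{\partial x}(x, \alpha, h)\right| \le \varepsilon + K \sum_{j=1}^{n} |v_j(x, \alpha, h)|$$

so that

$$\sum_{i=1}^{n} \left|\frac{\partial v_i}{\partial x}(x, \alpha, h)\right| \le n\varepsilon + nK \sum_{j=1}^{n} |v_j(x, \alpha, h)|,$$

and consequently, since $v_i(x_0, \alpha, h) = 0$,

$$\sum_{i=1}^{n} |v_i(x, \alpha, h)| \le \int_{x_0}^{x} \Big(n\varepsilon + nK \sum_{i=1}^{n} |v_i(t, \alpha, h)|\Big)\,dt.$$

Thus, for $F(x) = n\varepsilon + nK \sum_{i=1}^{n} |v_i(x, \alpha, h)|$ we have

$$F(x) \le n\varepsilon + nK \int_{x_0}^{x} F(t)\,dt$$

and can apply Gronwall’s inequality (with the constants $n\varepsilon$ and $nK$) to obtain $F(x) \le n\varepsilon \, e^{nK(x - x_0)}$, hence, for each individual $i$,

$$|v_i(x, \alpha, h)| = \left|\frac{1}{h}\big(y_i(x, \alpha + h) - y_i(x, \alpha)\big) - z_i(x, \alpha)\right| \le \frac{\varepsilon\,(e^{nK(x - x_0)} - 1)}{K},$$

and since $\varepsilon > 0$ was arbitrary we conclude that

$$\lim_{h \to 0} \frac{1}{h}\big(y_i(x, \alpha + h) - y_i(x, \alpha)\big) = z_i(x, \alpha). \qquad □$$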
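The system (5.5.2) can be tried out on a case where everything is known explicitly. For $y' = \alpha y$, $y(0) = 1$ we have $\frac{\partial f}{\partial y} = \alpha$ and $\frac{\partial f}{\partial \alpha} = y$, so (5.5.2) reads $z' = \alpha z + y$, $z(0) = 0$, and the exact derivative is $\frac{\partial}{\partial \alpha} e^{\alpha x} = x e^{\alpha x}$. The sketch below integrates $y$ and $z$ together and compares the result with this formula and with a difference quotient; the example equation, the step counts and the function names are assumptions for illustration.

```python
import math

def rhs(x, w, alpha):
    """Combined right-hand side for y' = alpha*y and the sensitivity z' = alpha*z + y."""
    y, z = w
    return [alpha * y, alpha * z + y]

def integrate(alpha, x_end, steps):
    """Classical Runge-Kutta integration from x = 0 with y(0) = 1, z(0) = 0."""
    h = x_end / steps
    x, w = 0.0, [1.0, 0.0]
    for _ in range(steps):
        k1 = rhs(x, w, alpha)
        k2 = rhs(x + h / 2, [wi + h / 2 * ki for wi, ki in zip(w, k1)], alpha)
        k3 = rhs(x + h / 2, [wi + h / 2 * ki for wi, ki in zip(w, k2)], alpha)
        k4 = rhs(x + h, [wi + h * ki for wi, ki in zip(w, k3)], alpha)
        w = [wi + h / 6 * (a + 2 * b + 2 * c + d)
             for wi, a, b, c, d in zip(w, k1, k2, k3, k4)]
        x += h
    return w

alpha, x_end = 0.7, 1.0
y_num, z_num = integrate(alpha, x_end, 200)

# Exact derivative of e^(alpha*x) with respect to alpha, and a difference quotient in alpha.
z_exact = x_end * math.exp(alpha * x_end)
step = 1e-6
z_fd = (math.exp((alpha + step) * x_end) - math.exp((alpha - step) * x_end)) / (2 * step)
print(z_num, z_exact, z_fd)
```

Solving the sensitivity system alongside the original one avoids having to re-solve the equation for perturbed parameter values.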

6 A few special differential equations

6.1

First of all, let us realize that in the situations where the theorem on the existence and uniqueness is applicable, we do not really have to be concerned about the correctness of the procedure we use (e.g. working with $\frac{dy}{dx}$ as if it were a fraction, failing to check whether there might be a zero in a denominator, etc.). If we obtain a function satisfying the equation (and initial conditions), it has to be the one and only solution we are looking for, by Theorem 4.2.2. This is a perfect example of the importance of theoretical work for calculations.

6.2

We have already encountered a differential equation without knowing it. Namely, looking for a primitive function of a function $f$ is the ODE

$$y' = f(x).$$

In general, to determine a primitive function is by no means an easy task (indeed it is often impossible to obtain a formula in terms of elementary functions). It is, however, customary to think of an ODE as solved if it is reduced to formulas in primitive functions.

6.3 Separation of variables

The equation

$$y' = f(x)g(y)$$

can be treated as follows: rewrite it as

$$\frac{1}{g(y(x))}\,y'(x) = f(x)$$

and compare the primitive functions of both sides (these are indicated by plain $\int$). We obtain

$$\Big(\int \frac{1}{g}\Big)(y(x)) = \Big(\int f\Big)(x) + C.$$

This somewhat clumsy computation can be, more intuitively, modified as follows. Take the equation as

$$\frac{dy}{dx} = f(x)g(y),$$

proceed to

$$\frac{dy}{g(y)} = f(x)\,dx$$

and “integrate”

$$\int \frac{dy}{g(y)} = \int f(x)\,dx + C.$$

Examples.

1. For $y' = y \cdot \sin x$ we obtain

$$\int \frac{dy}{y} = \int \sin x\,dx + C,$$

hence

$$\ln|y| = -\cos x + C$$

yielding

$$|y| = e^{-\cos x + C}, \quad\text{that is,}\quad y = D \cdot e^{-\cos x}.$$

2. Similarly, the equation $y' = 1 + y^2$ of Example 4.4.1 is transformed to

$$\int \frac{dy}{1 + y^2} = \int dx + C,$$

yielding $\arctan y = x + C$ and finally $y = \tan(x + C)$.

3. For

$$y' = -\frac{x}{y}$$

we obtain $\int y\,dy = -\int x\,dx + C$, hence $\frac{1}{2}y^2 = -\frac{1}{2}x^2 + c$ and finally $x^2 + y^2 = r^2$. This is a very intuitive example: What curves are perpendicular in each $(x, y)$ to the vector $(x, y)$? Of course, the circles with their centers at $(0, 0)$.
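In the spirit of 6.1, once a candidate solution has been found it is enough to check that it satisfies the equation. The sketch below does this numerically for the three examples, comparing a central-difference derivative with the right-hand side; the constants $D$, $C$, $r$ and the sample grid are arbitrary choices made for this illustration.

```python
import math

def derivative(y, x, h=1e-6):
    """Central-difference approximation of y'(x)."""
    return (y(x + h) - y(x - h)) / (2 * h)

D, C, r = 2.0, 0.3, 2.0   # arbitrary constants for the three families of solutions

def y1(x):
    """Candidate solution of y' = y * sin(x)."""
    return D * math.exp(-math.cos(x))

def y2(x):
    """Candidate solution of y' = 1 + y^2 (valid while x + C stays away from pi/2)."""
    return math.tan(x + C)

def y3(x):
    """Upper half of the circle x^2 + y^2 = r^2, candidate solution of y' = -x/y."""
    return math.sqrt(r * r - x * x)

xs = [-1.0 + 0.1 * k for k in range(21)]   # grid on [-1, 1]
res1 = max(abs(derivative(y1, x) - y1(x) * math.sin(x)) for x in xs)
res2 = max(abs(derivative(y2, x) - (1 + y2(x) ** 2)) for x in xs)
res3 = max(abs(derivative(y3, x) + x / y3(x)) for x in xs)
print(res1, res2, res3)
```

All three residuals are at the level of the discretization error, which is consistent with the formulas obtained by separation of variables.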


6.4

To solve the equation

$$y' = f(ax + by),$$

substitute $z(x) = ax + by(x)$. Then we have $z' = b \cdot y' + a = b \cdot f(z) + a$, a particularly simple example of the equations from 6.3 where the right-hand side is independent of $x$ (such an equation is known as an autonomous equation).

6.5

(The “homogeneous equation” - not to be confused with homogeneous linear differential equations in Chapter 7 below.) To solve the equation

$$y' = f\Big(\frac{y}{x}\Big)$$

(in other words, $y' = F(x, y)$ where $F$ is such that for any $t \ne 0$, $F(x, y) = F(tx, ty)$), substitute $z = \frac{y}{x}$. Then we obtain

$$z' = \frac{y'x - y}{x^2} = \frac{y' - z}{x} = (f(z) - z) \cdot \frac{1}{x},$$

again an equation with separated variables.

6.6

The equation

$$y' = f\left(\frac{ax + by + c}{\alpha x + \beta y + \gamma}\right) \tag{6.6.1}$$

would be of the type 6.5 if we had $c = \gamma = 0$. If not, let us try to force it. Let $x_0, y_0$ be a solution of the linear (algebraic) equations

$$ax + by + c = 0,$$

$$\alpha x + \beta y + \gamma = 0.$$

Then

$$\frac{ax + by + c}{\alpha x + \beta y + \gamma} = \frac{a(x - x_0) + b(y - y_0)}{\alpha(x - x_0) + \beta(y - y_0)}.$$

If we substitute

$$\xi = x - x_0, \quad z = y - y_0,$$

we obtain $z(\xi) = y(\xi + x_0) - y_0$ and $\frac{dx}{d\xi} = 1$ so that

$$\frac{dz}{d\xi} = y'(\xi + x_0) = f\left(\frac{a\xi + bz}{\alpha\xi + \beta z}\right).$$

The linear algebraic equations above may fail to have a solution: namely we could have had $(a, b) = K \cdot (\alpha, \beta)$ or $K \cdot (a, b) = (\alpha, \beta)$. Then, however, the equation (6.6.1) is already of the form $y' = F(Ax + By)$ as it is, and we can use the procedure from 6.4.

6.7 The linear equation $y' = a(x)y + b(x)$; first encounter with variation of constants

First, solve the equation $y' = a(x)y$. This is a case of separated variables and by the method from 6.3, we obtain a solution

$$u_1(x) = c \cdot e^{\int a(x)\,dx}. \tag{6.7.1}$$

Let us try to find a solution of the original equation in the form

$$y(x) = c(x) \cdot u_1(x),$$

where $u_1$ now denotes the solution (6.7.1) with $c = 1$ (because of replacing the constant $c$ from (6.7.1) by a function in $x$ one speaks of a variation of constant; in a more general setting, it will be used in Chapter 7 below).

Thus, we should have the equality

$$y' = c'u_1 + cu_1'$$

and since $u_1' = au_1$, we have, further,

$$y' = c'u_1 + cau_1 = c'u_1 + ay.$$

Thus, we need a $c(x)$ such that

$$b(x) = c'(x)u_1(x)$$

and this equality is satisfied by

$$c(x) = \int \frac{b(x)}{u_1(x)}\,dx + K.$$
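The recipe can be turned into a short numerical sketch. It assumes that a primitive $A$ of $a$ is available in closed form, normalizes $u_1$ so that $u_1(x_0) = 1$, and evaluates the integral for $c(x)$ with the composite Simpson rule. The test equation $y' = -y + x$ and all function names are assumptions for illustration; its exact solution with $y(0) = y_0$ is $x - 1 + (y_0 + 1)e^{-x}$.

```python
import math

def simpson(g, lo, hi, n=200):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3

def solve_linear(A, b, x0, y0, x):
    """Value at x of the solution of y' = a(x) y + b(x), y(x0) = y0,
    where A is a primitive function of a."""
    def u1(t):
        # Solution of u' = a u normalized by u1(x0) = 1.
        return math.exp(A(t) - A(x0))
    # Variation of constants: c' = b / u1 with c(x0) = y0.
    c = y0 + simpson(lambda t: b(t) / u1(t), x0, x)
    return c * u1(x)

# Example: a(x) = -1 (so A(x) = -x), b(x) = x, y(0) = 2, evaluated at x = 1.5.
y_num = solve_linear(lambda t: -t, lambda t: t, 0.0, 2.0, 1.5)
y_exact = 1.5 - 1 + (2.0 + 1) * math.exp(-1.5)
print(y_num, y_exact)
```

Fixing $u_1(x_0) = 1$ makes the constant $K$ in the formula for $c(x)$ equal to the initial value $y_0$.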


6.8 At least one second-order equation

In physics, we encounter the equation

$$y'' = f(y).$$

Such an equation can be solved as follows. First, multiply both sides by $y'$ to obtain

$$y'y'' = f(y)y',$$

that is,

$$\Big(\frac{1}{2}(y')^2\Big)' = \Big(\Big(\int f\Big) \circ y\Big)' \quad\text{and further}\quad \frac{1}{2}(y')^2 = \Big(\int f\Big) \circ y + C$$

($\circ$ indicates composition of functions) and finally

$$y' = \sqrt{2\Big(\int f\Big) \circ y + C},$$

with a new constant $C$, a case of separated variables.
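The identity $\frac{1}{2}(y')^2 - (\int f) \circ y = C$ says that this quantity is constant along every solution. The sketch below checks this numerically for the pendulum equation $y'' = -\sin y$, where $\int f = \cos y$; the choice of equation, initial values and integrator are assumptions made for this illustration.

```python
import math

def f(y):
    """Right-hand side of y'' = f(y); here the pendulum equation."""
    return -math.sin(y)

def primitive_f(y):
    """A primitive function of f."""
    return math.cos(y)

def first_integral(y, dy):
    """The quantity (1/2) y'^2 - (primitive of f)(y), constant along solutions."""
    return 0.5 * dy * dy - primitive_f(y)

def integrate(y, dy, x_end, steps):
    """Classical Runge-Kutta integration of the system (y, y')' = (y', f(y))."""
    h = x_end / steps
    def rhs(s):
        return (s[1], f(s[0]))
    s = (y, dy)
    for _ in range(steps):
        k1 = rhs(s)
        k2 = rhs((s[0] + h / 2 * k1[0], s[1] + h / 2 * k1[1]))
        k3 = rhs((s[0] + h / 2 * k2[0], s[1] + h / 2 * k2[1]))
        k4 = rhs((s[0] + h * k3[0], s[1] + h * k3[1]))
        s = (s[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             s[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
    return s

c_start = first_integral(1.0, 0.0)
y_end, dy_end = integrate(1.0, 0.0, 5.0, 5000)
c_end = first_integral(y_end, dy_end)
print(c_start, c_end)
```

In physical terms the conserved quantity is the total energy, which is why this trick works for equations of the form $y'' = f(y)$.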

7 General substitution, symmetry and infinitesimal symmetry of a differential equation

7.1

One may ask how, looking at a differential equation, one finds the substitution which allows us to separate variables. Of course, in most cases, it is not possible. When it is, however, there is, in fact, a general strategy for finding the substitution, relating separation of variables to symmetry. To study symmetry, it is convenient to write a system of differential equations in a form in which the right-hand side does not depend explicitly on $x$:

$$y_i' = f_i(y_1, \dots, y_n). \tag{7.1.1}$$

Clearly, this is a special case of the system (1.1.1). On the other hand, a system of the form (1.1.1) can always be reduced to the form (7.1.1) by introducing an additional variable $y_0$:

$$y_0' = 1,$$

$$y_i' = f_i(y_0, y_1, \dots, y_n).$$


7.2

Now assume we have a system of the form (7.1.1). We may write it in vector notation, putting $\mathbf{y} = (y_1, \dots, y_n)^T$, $\mathbf{f} = (f_1, \dots, f_n)^T$ (recall that reconciling the direction of composition of maps with matrix multiplication favors viewing vectors as columns here, see e.g. Appendix A, 7.5):

$$\mathbf{y}' = \mathbf{f}(\mathbf{y}). \tag{7.2.1}$$

Let us point out a geometric interpretation of the system (7.2.1). Denote the independent variable by $t$. A solution $\mathbf{y}(t)$ can be interpreted as a parametric curve with the parameter $t$. Then the equation (7.2.1) says that the tangent (“velocity”) vector of the curve $\mathbf{y}$ at the point $t$ is equal to $\mathbf{f}(\mathbf{y}(t))$. A function $U \to \mathbb{R}^n$ on a subset $U \subseteq \mathbb{R}^n$, when we interpret its values as vectors, is called a vector field. The curves $\mathbf{y}(t)$ are called integral curves of the vector field. One sometimes denotes the solution as

$$\mathbf{y}(t) = \exp(t\mathbf{f})\,\mathbf{y}(0), \tag{7.2.2}$$

although this is somewhat misleading, given the fact that the solution is not an exponential even in the case of $n = 1$ unless $\mathbf{f}$ is constant, and cannot be figured out explicitly in general when $n > 1$.

7.3

Let us now study how a vector field changes when we change variables. By a substitution at $\mathbf{y}^0 \in \mathbb{R}^n$ we shall mean a smooth map $\varphi : U \to \mathbb{R}^n$, where $U$ is an open neighborhood of $\mathbf{y}^0$, whose differential at $\mathbf{y}^0$ is non-singular. Writing $\mathbf{z} = \varphi(\mathbf{y})$, then, by the chain rule, we get from (7.2.1) a system of differential equations for $\mathbf{z}$,

$$\mathbf{z}' = D\varphi|_{\varphi^{-1}(\mathbf{z})} \cdot \mathbf{f}(\varphi^{-1}(\mathbf{z}))$$

(the operation on the right-hand side is matrix multiplication), so from the point of view of differential equations, $\varphi$ transforms the vector field $\mathbf{f}$ to the vector field

$$\mathbf{g}(\mathbf{z}) = D\varphi|_{\varphi^{-1}(\mathbf{z})} \cdot \mathbf{f}(\varphi^{-1}(\mathbf{z}))$$

in an open neighborhood of $\mathbf{z}^0 = \varphi(\mathbf{y}^0)$. In other words, the differential equation (7.2.1), expressed in the variables $\mathbf{z}$, reads

$$\mathbf{z}' = \mathbf{g}(\mathbf{z}). \tag{7.3.1}$$

7.4

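We will call $\varphi$ a symmetry (at $\mathbf{y}^0$) if the differential equations (7.2.1) and (7.3.1) coincide, i.e. we have $\mathbf{g}(\mathbf{z}) = \mathbf{f}(\mathbf{z})$, or

$$\mathbf{f}(\varphi(\mathbf{y})) = D\varphi|_{\mathbf{y}} \cdot \mathbf{f}(\mathbf{y}). \tag{7.4.1}$$

However, we are less interested in a single symmetry than in a (continuous) family of symmetries. By this, we mean a smooth map $\varphi : \mathbb{R}^{n+1} \to \mathbb{R}^n$, which, denoting the first variable as $\varepsilon$, and writing $\varphi(\varepsilon, ?)$ as $\varphi_\varepsilon : \mathbb{R}^n \to \mathbb{R}^n$, has the property that each $\varphi_\varepsilon$ is a symmetry, and

$$\varphi_0 = \mathrm{Id}$$

(in other words, $\varphi_0(\mathbf{y}) = \mathbf{y}$). Given a family of symmetries, what is happening near $\varepsilon = 0$? Let

$$\mathbf{u} = \frac{\partial \varphi_\varepsilon}{\partial \varepsilon}\Big|_{\varepsilon = 0}. \tag{7.4.2}$$

Then considering the condition (7.4.1) for $\varphi = \varphi_\varepsilon$ and differentiating by $\varepsilon$ at $\varepsilon = 0$, we get that

$$\frac{\partial}{\partial \varepsilon}\,\mathbf{f}(\varphi_\varepsilon(\mathbf{y}))\Big|_{\varepsilon = 0} = D\mathbf{f} \cdot \mathbf{u}(\mathbf{y}) = \partial_{\mathbf{u}}\mathbf{f}(\mathbf{y}),$$

$$\frac{\partial}{\partial \varepsilon}\,D\varphi|_{(\varepsilon, \mathbf{y})}\,\mathbf{f}(\mathbf{y})\Big|_{\varepsilon = 0} = D\mathbf{u}|_{\mathbf{y}}\,\mathbf{f}(\mathbf{y}) = \partial_{\mathbf{f}}\mathbf{u}(\mathbf{y})$$

(here on the right-hand side we use the notation $\partial_{\mathbf{u}}\mathbf{f}(\mathbf{y}) = \frac{d}{dt}\mathbf{f}(\mathbf{y} + t\mathbf{u})\big|_{t=0}$, see 2.4 of Chapter 3).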

7.5

For two smooth vector fields $\mathbf{u}$, $\mathbf{f}$, we write

$$[\mathbf{u}, \mathbf{f}] = \partial_{\mathbf{u}}(\mathbf{f}) - \partial_{\mathbf{f}}(\mathbf{u}),$$

and call this the Lie bracket of vector fields. This is, again, a vector field. The derivative of the condition (7.4.1) at $\varepsilon = 0$ then reads

$$[\mathbf{u}, \mathbf{f}] = 0. \tag{7.5.1}$$

A smooth vector field $\mathbf{u}$ defined on an open neighborhood of $\mathbf{y}^0$ which satisfies (7.5.1) will be called an infinitesimal symmetry of the differential equation (7.2.1) at $\mathbf{y}^0$. For technical reasons (dealing with possibly different domains of definition), we will consider two infinitesimal symmetries at $\mathbf{y}^0$ equal when they coincide on an open neighborhood of $\mathbf{y}^0$.
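The condition (7.5.1) is easy to test numerically for concrete planar vector fields. The sketch below approximates the directional derivatives by central differences and evaluates the bracket at a sample point. The fields chosen (a rotation field, the radial field of the equation $\mathbf{y}' = \mathbf{y}$, and a constant field) and all function names are assumptions for illustration.

```python
def directional_derivative(field, y, direction, h=1e-5):
    """Central-difference approximation of d/dt field(y + t*direction) at t = 0."""
    plus = field([yi + h * di for yi, di in zip(y, direction)])
    minus = field([yi - h * di for yi, di in zip(y, direction)])
    return [(p - m) / (2 * h) for p, m in zip(plus, minus)]

def lie_bracket(u, f, y):
    """[u, f](y) = (derivative of f along u(y)) - (derivative of u along f(y))."""
    du_f = directional_derivative(f, y, u(y))
    df_u = directional_derivative(u, y, f(y))
    return [a - b for a, b in zip(du_f, df_u)]

def rotation(y):
    """Generator of rotations about the origin."""
    return [-y[1], y[0]]

def radial(y):
    """Right-hand side of the system y' = y."""
    return [y[0], y[1]]

def constant(y):
    """Right-hand side of the system y1' = 1, y2' = 0."""
    return [1.0, 0.0]

point = [0.3, -1.2]
b_radial = lie_bracket(rotation, radial, point)      # expected to vanish
b_constant = lie_bracket(rotation, constant, point)  # expected to be (0, -1)
print(b_radial, b_constant)
```

The rotation field thus passes the test (7.5.1) for $\mathbf{y}' = \mathbf{y}$ at the sample point, matching the fact that rotating a solution of this system gives another solution, while it fails the test for the constant field.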

7.6

It is worth pointing out two properties of the Lie bracket of vector fields:

Œu; v� D �Œv;u�; (7.6.1)

Œu; Œv;w��C Œv; Œw;u��C Œw; Œu; v�� D 0: (7.6.2)

The equality (7.6.2) is called the Jacobi identity. Generally, a vector space over Ror C with a binary operation Œ‹; ‹� which is linear in each coordinate and satisfiesthe equalities (7.6.1), (7.6.2) is called a Lie algebra. Thus, in particular, smoothvector fields defined on the same open subset of Rn form a Lie algebra, as dosymmetries of the differential equation (7.2.1) at a given point y0 (this follows fromthe Jacobi identity).

7.7 Comment

Several concepts of this and the next section are closely related to Chapter 12 below. After finishing that chapter, the reader may be ready to tie this together with some highly interesting and important geometrical notions which are beyond the scope of this text. For example, the notion of Lie algebra just mentioned leads to the notion of a Lie group. In Chapter 12, we will develop enough techniques to introduce the concept of a Lie group, and will mention it briefly in Exercises (6), (7), (8) of Chapter 12. Lie groups are a major field of mathematical study. We recommend [9, 10] for further reading.

8 Symmetry and separation of variables

8.1

Given a single infinitesimal symmetry $\mathbf{u}$, then

$$\exp(\varepsilon\mathbf{u}) \tag{8.1.1}$$

(used in the sense of the notation (7.2.2)) is a continuous family of symmetries. This is because in the case of $\phi_\varepsilon$ equal to (8.1.1), by definition, the derivative of the condition (7.4.1) by $\varepsilon$ is the same at every point $\varepsilon$, and is equal to $[\mathbf{u}, \mathbf{f}] = 0$. Now given an infinitesimal symmetry $\mathbf{u}$ of the equation (7.2.1) at a point $\mathbf{y}^0$, and assuming

$$\mathbf{u}(\mathbf{y}^0) \neq 0, \tag{8.1.2}$$

then, without loss of generality, we may assume that

$$\mathbf{u},\ f_2(\mathbf{y}^0),\ \dots,\ f_n(\mathbf{y}^0) \ \text{form a basis of } \mathbb{R}^n. \tag{8.1.3}$$

(By Steinitz' Theorem 2.6 of Appendix A, this can always be achieved after permuting the coordinates $f_i$.) Assuming (8.1.3) holds, consider the following smooth map $U \to \mathbb{R}^n$ defined in an open neighborhood $U$ of $\mathbf{y}^0$:

$$\Phi((z_1, \dots, z_n)^T) = \exp((z_1 - y^0_1)\mathbf{u}) \cdot (y^0_1, z_2, \dots, z_n). \tag{8.1.4}$$

We have set things up in such a way that

$$\Phi(\mathbf{y}^0) = \mathbf{y}^0$$

(although obviously that is not important), and by (8.1.3) and the Implicit Function Theorem, the map $\Phi$ has a smooth inverse $\Psi$ in an open neighborhood of $\mathbf{y}^0$. We consider the substitution

$$\mathbf{z} = \Psi(\mathbf{y}). \tag{8.1.5}$$

Because $\mathbf{u}$ is an infinitesimal symmetry, the differential equation expressed in the variables $\mathbf{z}$, i.e. (7.3.1), has a family of symmetries

$$z_1 \mapsto z_1 + \varepsilon, \tag{8.1.6}$$

$$z_i \mapsto z_i \quad \text{for } i = 2, \dots, n. \tag{8.1.7}$$

This means that the function $\mathbf{g}$ does not depend on the variable $z_1$, and thus, we have reduced the number of variables by 1: we have a system of $n - 1$ differential equations in the variables $z_2, \dots, z_n$, and an equation for $z_1'$ in terms of $z_2, \dots, z_n$. For $n = 2$, this implies a complete solution (separation of variables). Of course, to make this method work, we must be able to evaluate (8.1.1), which, a priori, is a system of $n$ differential equations. However, in some cases, symmetries may be more easily visible than direct solutions.

8.2

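It is useful to mention one generalization. By a generalized symmetry of the equation (7.2.1) we shall mean a substitution $\mathbf{z} = \phi(\mathbf{y})$ such that

$$\mathbf{f}(\phi(\mathbf{y})) = \alpha(\mathbf{y})\, D\phi|_{\mathbf{y}} \cdot \mathbf{f}(\mathbf{y}), \tag{8.2.1}$$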


for some function $\alpha : U \to \mathbb{R}$ (i.e., a scalar). The significance of a generalized symmetry is that it preserves the direction, but not the magnitude, of the tangent vectors to the integral curves. Thus, roughly speaking, a generalized symmetry preserves the integral curves as sets, but not their parametrization.

The infinitesimal version of this condition is

$$[\mathbf{u}, \mathbf{f}] = \lambda\mathbf{f} \tag{8.2.2}$$

for another scalar function $\lambda : U \to \mathbb{R}$. Again, generalized infinitesimal symmetries of (7.2.1) at a point form a Lie algebra, a derivative at 0 of a continuous system of generalized symmetries is a generalized infinitesimal symmetry, and conversely, (8.1.1) for a generalized infinitesimal symmetry is a continuous family of generalized symmetries.

In the case of a generalized symmetry, we may still apply the substitution (8.1.4). As a result, (8.1.6), (8.1.7) will be a generalized symmetry of the system (7.3.1). In this case, we know that the function

$$\mathbf{g}(\mathbf{z})/g_1(\mathbf{z})$$

does not depend on $z_1$, so (7.3.1) reduces to a system of $n - 1$ equations

$$\frac{dz_i}{dz_1} = \frac{g_i(\mathbf{z})}{g_1(\mathbf{z})}, \quad i = 2, \dots, n.$$

Note, however, that now unless the factor $\alpha$ of the generalized symmetry has some special form, we still end up with a general first-order differential equation for the variable $z_1$.

8.2.1 Example

Consider the homogeneous differential equation

$$y' = f\left(\frac{y}{x}\right).$$

In symmetric form, this is

$$y' = f\left(\frac{y}{x}\right), \qquad x' = 1.$$

We have an obvious family of generalized symmetries

$$\phi_\lambda(x, y)^T = (\lambda x, \lambda y)^T \tag{*}$$

(to conform with the above notation, $\varepsilon = \lambda - 1$). The corresponding infinitesimal symmetry is

$$\mathbf{u}(x, y)^T = (x, y)^T,$$

which exponentiates to

$$\exp(z\mathbf{u})(x, y)^T = e^z (x, y)^T,$$

so (fixing, say, $x_0 = 1$ and calling the new variables $z, v$), the substitution becomes

$$(x, y)^T = (e^{z-1},\ y_0 e^{z-1} v)^T,$$

or

$$z = 1 + \ln(x), \qquad v = \frac{y}{y_0 x}. \tag{**}$$

Up to scalar multiple, the formula for $v$ is the substitution from the last section. It is worthwhile noting, however, that in the present form, we obtain the autonomous equation

$$\frac{dv}{dz} = \frac{1}{y_0}\bigl(f(y_0 v) - y_0 v\bigr)$$

(which we may not have noticed in the last section). Obviously, the rather simple form of the generalized infinitesimal symmetry allows us to recover $z$ in this case.
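The reduction can be checked numerically on one concrete case. The sketch below assumes the sample right-hand side $f(t) = t + 1/t$ (so that $y' = (x^2+y^2)/(xy)$) and the initial point $(1, y_0)$ with $y_0 = 2$; both are chosen here for illustration only. From $v = y/(y_0 x)$ and $x = e^{z-1}$ one gets $dv/dz = (f(y_0 v) - y_0 v)/y_0$, which is integrated with a hand-written Runge-Kutta step and compared with the closed-form solution $y = x\sqrt{2\ln x + y_0^2}$ obtained by separating variables in $t = y/x$.

```python
import math

def rk4(rhs, t0, y0, t1, steps=1000):
    """Classical fourth-order Runge-Kutta for a scalar equation y' = rhs(t, y)."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        k1 = rhs(t, y)
        k2 = rhs(t + h / 2, y + h * k1 / 2)
        k3 = rhs(t + h / 2, y + h * k2 / 2)
        k4 = rhs(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y

def f(t):
    # Sample homogeneous right-hand side (an assumption of this sketch).
    return t + 1.0 / t

y_init = 2.0   # y_0, the value of y at x_0 = 1
x_end = 3.0

def reduced_rhs(z, v):
    # Autonomous equation in the variables (z, v); z does not appear.
    return (f(y_init * v) - y_init * v) / y_init

# At x = 1 we have z = 1 and v = 1; integrate up to z = 1 + ln(x_end).
v_end = rk4(reduced_rhs, 1.0, 1.0, 1.0 + math.log(x_end))
y_from_substitution = y_init * x_end * v_end

# Closed form for this particular f, through the point (1, y_init).
y_exact = x_end * math.sqrt(2.0 * math.log(x_end) + y_init ** 2)
```

The two values `y_from_substitution` and `y_exact` agree to well within the accuracy of the integrator, which is what one expects if the substitution (**) is carried out correctly.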

8.3 Example

The fact that for $n = 2$ a symmetry leads to separation of variables raises the question whether the separated equation

$$y' = a(x)y \tag{8.3.1}$$

always has an infinitesimal symmetry. In fact, we plainly see that making a substitution in $x$ (independent of $y$) introduces multiplication by a function of $x$, so we should be able to make a substitution in $x$ which would eliminate the factor $a(x)$, and the equation would become autonomous. This suggests an infinitesimal symmetry of the form

$$\mathbf{u} = (k(x), 0)^T. \tag{8.3.2}$$

The condition (7.5.1) becomes

$$k'(x) = k(x)a(x), \tag{8.3.3}$$

which can be solved. In fact, it is the original equation, so this is no simplification, but we have found a symmetry, which, as we will see, is useful. Note also that the fact that the equation (8.3.3) coincides with (8.3.1) has a geometric reason: choosing a non-zero characteristic $C$ of the equation (8.3.1), the vector field the value of which at each $(x, y)^T$ is the vertical vector from $(x, 0)^T$ to the characteristic $C$ is a symmetry, because any other characteristic, considered as a function of $x$, is a constant multiple of the function with graph $C$.
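The step from (7.5.1) to (8.3.3) can be written out as follows. This is a sketch which assumes the coordinates are ordered as $(y, x)$ (the ordering $y_1 = y$, $y_2 = x$ used in 8.4), so that the equation (8.3.1) in symmetric form has $\mathbf{f} = (a(x)y, 1)^T$ and $\mathbf{u} = (k(x), 0)^T$ points in the $y$-direction.

```latex
\partial_{\mathbf{u}}\mathbf{f} = D\mathbf{f}\cdot\mathbf{u}
  = \begin{pmatrix} a(x) & a'(x)\,y \\ 0 & 0 \end{pmatrix}
    \begin{pmatrix} k(x) \\ 0 \end{pmatrix}
  = \begin{pmatrix} a(x)k(x) \\ 0 \end{pmatrix},
\qquad
\partial_{\mathbf{f}}\mathbf{u} = D\mathbf{u}\cdot\mathbf{f}
  = \begin{pmatrix} 0 & k'(x) \\ 0 & 0 \end{pmatrix}
    \begin{pmatrix} a(x)\,y \\ 1 \end{pmatrix}
  = \begin{pmatrix} k'(x) \\ 0 \end{pmatrix},
\]
\[
[\mathbf{u},\mathbf{f}]
  = \partial_{\mathbf{u}}\mathbf{f} - \partial_{\mathbf{f}}\mathbf{u}
  = \begin{pmatrix} a(x)k(x) - k'(x) \\ 0 \end{pmatrix}.
```

Under this ordering the bracket vanishes exactly when $k'(x) = k(x)a(x)$.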

8.4 Example

The symmetry (8.3.2) (subject to the condition (8.3.3)) plainly also is a symmetry of the equation

$$y' = a(x)y + b(x) \tag{8.4.1}$$

(since (8.3.2) has 0 Lie bracket with $(0, b(x))^T$). Thus, we may use this symmetry to solve the equation (8.4.1). The substitution we get by choosing $\mathbf{y}^0 = (0, 0)$, $y_1 = y$, $y_2 = x$, is

$$y = z_1 k(z_2), \qquad x = z_2.$$

Setting $z = z_1$, we get

$$z' = \frac{y'k(x) - yk'(x)}{k(x)^2} = \frac{b(x)}{k(x)},$$

which is solvable by an integral, as desired.
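As an illustration of how the recipe is used, here is a small numerical sketch. It assumes the sample coefficients $a(x) = 2x$ and $b(x) = x$ (chosen here, not taken from the text), for which $k(x) = e^{x^2}$ solves (8.3.3) with $k(0) = 1$. The function $z$ is obtained by integrating $b/k$ with Simpson's rule, and $y = zk$ is compared with the closed form $y = (y(0) + \tfrac12)e^{x^2} - \tfrac12$.

```python
import math

def simpson(g, lo, hi, n=200):
    """Composite Simpson rule on [lo, hi] with an even number n of subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * g(lo + j * h)
    return total * h / 3

def a(x):
    return 2.0 * x           # sample coefficient (assumption)

def b(x):
    return x                 # sample right-hand side (assumption)

def k(x):
    return math.exp(x * x)   # solves k' = a k with k(0) = 1

def solve(x, y_at_0):
    """Value at x of the solution of y' = a y + b with y(0) = y_at_0."""
    z = y_at_0 / k(0.0) + simpson(lambda t: b(t) / k(t), 0.0, x)
    return z * k(x)

y_numeric = solve(1.0, 0.25)
y_closed_form = (0.25 + 0.5) * math.e - 0.5
```

The only place where the differential equation enters is the single integral of $b/k$; everything else is algebra, which is the point of the substitution.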

9 Exercises

(1) Convert the differential equation

$$y''' = \frac{(y'')^2 - x}{\sin(y^2) + \ln(x)}$$

into a system of first-order ODE's.

(2) Convert the system of ordinary differential equations

$$y'' = \frac{z'y'}{z + y + x} + y^3,$$

$$z'' = \ln(z' + \cos(y' + z)) + 3$$

into a system of first-order ODE's.

(3) (a) Using Exercise (9) of Chapter 3, describe a procedure for converting a system of equations of the form (1.3.1) to a system of the form (1.2.1) with $k_n$ raised to $k_n + 1$ without assuming we can find the implicit function explicitly. (Note that this may even be useful in the case $k_0 = \dots = k_n = 0$.)

(b) Using this method, convert the differential equation

$$y' + \sin(x + y + y') = 0$$

into an ordinary (explicit) second-order differential equation.

(4) State and prove an analogue of Corollary 4.3 for the general system (1.2.1) of 1.2.

(5) Solve the differential equation

$$y' = \frac{2x}{e^y}.$$

(6) Solve the differential equation

$$y' = \frac{x^2 + y^2}{xy}.$$

(7) Solve the differential equation

$$y' = \frac{x + y}{x}.$$

(8) Prove the Jacobi identity for vector fields.

(9) Prove that infinitesimal symmetries of a system of differential equations at a point $\mathbf{y}^0$ form a Lie algebra under the operation of Lie bracket of vector fields.

(10) Prove that generalized infinitesimal symmetries of a system of differential equations at a point $\mathbf{y}^0$ form a Lie algebra under the operation of Lie bracket of vector fields.

(11) Prove that a generalized infinitesimal symmetry exponentiates to a generalized symmetry.

(12) Find an infinitesimal symmetry of the equation

$$y' = f(ax + by)$$

and recover the solution.

(13) Find a generalized infinitesimal symmetry of the equation

$$y' = f\left(\frac{ax + by + c}{\alpha x + \beta y + \gamma}\right)$$

and use it to find the solution.

7 Systems of Linear Differential Equations

Systems of linear differential equations have many special properties, the most important of which is that a characteristic is defined on any open interval on which the system is defined (in contrast with general ODE's, see Example 4.4.1 of Chapter 6). In this chapter, we prove this important "no blow-up" theorem, and discuss the linear character of the set of solutions. We also describe a method for solving completely the important class of systems of linear differential equations with constant coefficients.

1 The definition and the existence theorem for a system of linear differential equations

1.1

Let $a_{ij}, b_i$ be continuous functions on an open interval $J$. A system of linear differential equations (briefly, LDE's) is the following special case of the system of ODE's (1.1.1) of Chapter 6:

$$y_i'(x) = \sum_{j=1}^{n} a_{ij}(x)y_j(x) + b_i(x), \quad i = 1, \dots, n. \tag{L}$$

Recall that such systems arise naturally as equations for partial derivatives of solutions of general differential equations by a parameter (see (5.5.2) of Chapter 6).

A linear (differential) equation of order $n$, where $a_i, b$ are continuous on $J$, is

$$y^{(n)}(x) + a_{n-1}(x)y^{(n-1)} + \dots + a_1(x)y' + a_0(x)y = b(x). \tag{$\mathcal{L}$}$$

Again, the equation ($\mathcal{L}$) is easy to translate to a system of the form (L) by the method of 1.2 of Chapter 6. In fact, again, one may call (L) a system of first order LDE's, define systems of higher order LDE's, and then show such systems are equivalent to systems of first order LDE's using the method of 1.2 of Chapter 6. Consequently, it suffices, again, to develop a theory for first-order systems (L). However, in some practical situations, it is advantageous to treat the special case of a single higher order equation ($\mathcal{L}$) separately, as we will see below.

If all the functions $b_i$ are zero (in the case ($\mathcal{L}$), if $b$ is zero), we speak of homogeneous equations resp. equation. The homogeneous counterpart of an (L) resp. ($\mathcal{L}$) will be indicated by (L-hom) resp. ($\mathcal{L}$-hom).

1.2 Lemma. Let $f$ be continuous and bounded on the half-open interval $\langle a, b)$. Define a value of $f$ at $b$ arbitrarily. Then there exists the (Riemann) integral $\int_a^b f(t)\,dt$ and we have

$$\int_a^b f(t)\,dt = \lim_{x \to b-} \int_a^x f(t)\,dt.$$

Comment: We prove this result here directly to make this chapter (and Chapter 6 above) largely self-contained, and independent of the techniques of the Lebesgue integral as introduced in Chapters 4, 5. The attentive reader, however, should see how the present statement follows from a much stronger result in Exercise (10) of Chapter 5, Exercise (4) of Chapter 4, and the Lebesgue Dominated Convergence Theorem.

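Proof. The Riemann integrals $\int_a^x f$ trivially exist (because of the continuity). Let $|f(x)| \le C$ and let $\varepsilon > 0$. We can choose partitions $D(x)$ of $\langle a, x\rangle$ such that

$$\int_a^x f - \frac{\varepsilon}{2} \le s(f|\langle a, x\rangle, D(x)) \le S(f|\langle a, x\rangle, D(x)) \le \int_a^x f + \frac{\varepsilon}{2} \tag{*}$$

(notation from Section 8 of Chapter 1).

Let $x > b - \frac{\varepsilon}{2C}$. Define a partition $D'(x)$ of $\langle a, b\rangle$ by adding the interval $\langle x, b\rangle$ to $D(x)$. Then we have

$$s(f|\langle a, x\rangle, D(x)) - \frac{\varepsilon}{2} \le s(f|\langle a, x\rangle, D(x)) - (b - x)C \le s(f, D'(x)) \le \underline{\int_a^b} f \le \overline{\int_a^b} f \le S(f, D'(x)) \le S(f|\langle a, x\rangle, D(x)) + (b - x)C \le S(f|\langle a, x\rangle, D(x)) + \frac{\varepsilon}{2}. \tag{**}$$

From (*) and (**), we obtain

$$\int_a^x f - \varepsilon \le \underline{\int_a^b} f \le \overline{\int_a^b} f \le \int_a^x f + \varepsilon,$$

hence

$$\left|\int_a^x f - \underline{\int_a^b} f\right| \le \varepsilon \quad\text{and}\quad \left|\int_a^x f - \overline{\int_a^b} f\right| \le \varepsilon$$

and finally

$$\underline{\int_a^b} f = \lim_{x \to b-} \int_a^x f = \overline{\int_a^b} f. \qquad\square$$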

1.3 Theorem. Let $a_{ij}(x), b_i(x)$ be continuous on an interval $J$, let $x_0 \in J$ and let $\eta_j$, $j = 1, \dots, n$, be arbitrary real numbers. Then the LDE system

$$y_i'(x) = \sum_{j=1}^{n} a_{ij}(x)y_j(x) + b_i(x), \quad i = 1, \dots, n$$

has precisely one solution $y_1, \dots, y_n$, defined on the whole of $J$, such that $y_j(x_0) = \eta_j$.

Proof. Uniqueness follows from the general Theorem 4.1 of Chapter 6, from which we also know that there exists a solution defined on a neighborhood of the point $x_0$. We will prove that this solution can be extended to the whole of $J$. We will construct the extension on the part of the interval to the right of $x_0$; the extension to the left is analogous.

Recall 2.1 of Chapter 6 and denote by $M$ the set of all $z \in J$, $z \ge x_0$ such that there is a solution of the equations

$$y_i(x) = \int_{x_0}^{x} \Bigl(\sum_{j=1}^{n} a_{ij}(t)y_j(t) + b_i(t)\Bigr)dt + \eta_i$$

on $\langle x_0, z\rangle$. Set $s = \sup M$. If the set $M$ is not all of $J \cap \langle x_0, +\infty)$, we have
(1) $s$ finite, and
(2) $s \in J \setminus M$.
((1) is obvious; regarding (2), either $s < \sup J$ and there is a solution in one of its neighborhoods, or $s \notin M$ while $s \in J$, since it is the only point at which $J \cap \langle x_0, +\infty)$ can differ from $M$.)

Since $a_{ij}$ and $b_i$ are continuous functions defined on $\langle x_0, s\rangle$, they are bounded on this interval, say

$$|a_{ij}(x)| \le A \quad\text{and}\quad |b_i(x)| \le B.$$

Choose $C$, $\alpha$ sufficiently large to have

$$\alpha > 2nA \quad\text{and}\quad B(s - x_0) + \max_i |\eta_i| < \frac{C}{2}e^{\alpha x_0}$$

and moreover such that the set

$$\widetilde{M} = \{x \mid x \in M \text{ and } |y_i(x)| < Ce^{\alpha x}\}$$

is non-empty. The set $\widetilde{M}$ is obviously open in $M$. The set $M$ is obviously connected. Thus, if we prove that $\widetilde{M}$ is closed we will see (recall Section 5 of Chapter 2) that $\widetilde{M} = M$, and to do that it suffices to show that $\widetilde{M}$ is closed under limits of increasing sequences (if $|y_i(y)| < Ce^{\alpha y}$ holds for $y$'s arbitrarily close above $x$ it obviously holds for $x$ as well).

To this end, consider an increasing sequence $x_n$ of points of $\widetilde{M}$ and let $\lim_n x_n = \xi$. Important: we do not assume $\xi \in M$; this will be a consequence of this part of the proof, and will be used later.

From continuity, we immediately see that $|y_i(x)| \le Ce^{\alpha x}$ on the interval $\langle x_0, \xi)$ and hence by Lemma 1.2 we know that the Riemann integral

$$\int_{x_0}^{\xi} \Bigl(\sum_j a_{ij}(t)y_j(t) + b_i(t)\Bigr)dt$$

exists and is equal to

$$\lim_{x \to \xi-} \int_{x_0}^{x} \Bigl(\sum_j a_{ij}(t)y_j(t) + b_i(t)\Bigr)dt = \lim_{x \to \xi-} y_i(x) - \eta_i.$$

If we define $y_i(\xi)$ as $\lim_{x \to \xi-} y_i(x)$ (if $y_i$ has already been defined at $\xi$, this coincides, by continuity, with the original value) we have extended the solution of our LDE system to $\xi$ (hence, in particular, we have $\xi \in M$). We have, however,

$$\Bigl|\int_{x_0}^{\xi} \Bigl(\sum_j a_{ij}(t)y_j(t) + b_i(t)\Bigr)dt + \eta_i\Bigr| \le \int_{x_0}^{\xi} (nACe^{\alpha t} + B)\,dt + |\eta_i| \le \frac{nAC}{\alpha}e^{\alpha\xi} + B(\xi - x_0) + |\eta_i| < Ce^{\alpha\xi},$$

so $\xi$ is not only in $M$, but in fact in $\widetilde{M}$.

Therefore, $\widetilde{M} = M$. Now we will take advantage of the fact that in our procedure, we did not assume $\xi$ to be in $M$: the supremum point $s$ can be written as a limit of an increasing sequence of elements from $M$ (equal to $\widetilde{M}$), in contradiction with $s \in J \setminus M$, which followed from the assumption that $M \neq J \cap \langle x_0, +\infty)$. $\square$

1.4 Corollary. Let $a_j(x)$ $(j = 0, \dots, n-1)$ and $b(x)$ be continuous on an interval $J$, let $x_0 \in J$ and let $\eta_j$, $j = 0, \dots, n-1$, be arbitrary. Then the equation

$$y^{(n)} + \sum_{j=0}^{n-1} a_j(x)y^{(j)}(x) = b(x)$$

has precisely one solution on the interval $J$ satisfying the conditions $y^{(j)}(x_0) = \eta_j$ for all $j = 0, \dots, n-1$.

2 Spaces of solutions

2.1

In this section, the continuous functions

$$a_{ij}, b_i$$

are defined on an open interval $J$. We denote by $C(J)$ the $\mathbb{R}$-vector space of all continuous functions on $J$. Further, we denote the vector space

$$C(J) \times \dots \times C(J) \quad (n \text{ times})$$

(the $n$-th power of $C(J)$) by

$$C^n(J).$$

2.2 Theorem. The set of all solutions of the LDE system (L) constitutes an affine subset $\mathbf{y}_0 + W$ of $C^n(J)$, and the set of all solutions of the $n$-th order equation ($\mathcal{L}$) constitutes an affine subset $y_0 + W$, where the vector subspaces $W$ are the sets of all solutions of the associated homogeneous equations.

Proof. The proof will be done for (L). Obviously if $\mathbf{y} = (y_1, \dots, y_n)$ and $\mathbf{z} = (z_1, \dots, z_n)$ solve the associated homogeneous system (L-hom) then so does any $\alpha\mathbf{y} + \beta\mathbf{z}$, and the set of all the solutions of (L-hom) is a vector subspace of $C^n(J)$. Now if $\mathbf{y}_0 = (y_{01}, \dots, y_{0n})$ solves (L), that is, if $y_{0i}' = \sum_j a_{ij}y_{0j} + b_i$, and if $\mathbf{y}$ solves (L-hom), that is, $y_i' = \sum_j a_{ij}y_j$, then $y_{0i}' + y_i' = \sum_j a_{ij}(y_{0j} + y_j) + b_i$ and $\mathbf{y}_0 + \mathbf{y}$ solves (L). On the other hand, if $\mathbf{z}$ is an arbitrary solution of (L) then $z_i' - y_{0i}' = \sum_j a_{ij}(z_j - y_{0j})$, so that $\mathbf{z} - \mathbf{y}_0 \in W$ and $\mathbf{z} = \mathbf{y}_0 + (\mathbf{z} - \mathbf{y}_0) \in \mathbf{y}_0 + W$. $\square$

Remark. Of course the principle is the same as in the solution of systems of algebraic linear equations.

Theorem. The dimensions of (both of) the affine sets from the previous theorem are $n$.

Proof. Again, we will prove the statement for the system (L). Let $\mathbf{y}_1, \dots, \mathbf{y}_p$ be solutions of (L-hom) and let $p > n$. Take an $x_0 \in J$. Then the system of algebraic linear equations

$$y_{11}(x_0)\alpha_1 + y_{21}(x_0)\alpha_2 + \dots + y_{p1}(x_0)\alpha_p = 0$$
$$\dots$$
$$y_{1n}(x_0)\alpha_1 + y_{2n}(x_0)\alpha_2 + \dots + y_{pn}(x_0)\alpha_p = 0$$

has a non-trivial solution $\alpha_1, \dots, \alpha_p$ (in fact, the vector space of such solutions has dimension at least $p - n$). Set

$$\mathbf{y} = \sum_{i=1}^{p} \alpha_i\mathbf{y}_i.$$

In particular we have $\mathbf{y}(x_0) = (0, \dots, 0)$. But we already know such a solution, namely zero: $\mathbf{o} = (\mathrm{const}_0, \dots, \mathrm{const}_0)$. From uniqueness, it now follows that $\sum_{i=1}^{p} \alpha_i\mathbf{y}_i = \mathbf{o}$, i.e. that the system $\mathbf{y}_1, \dots, \mathbf{y}_p$ is linearly dependent; hence, the dimension of $W$ is at most $n$. On the other hand consider the solutions $\mathbf{y}_i$, $i = 1, \dots, n$, of (L-hom) such that $y_{ij}(x_0)$ is 1 for $i = j$ and 0 otherwise. Then we obtain a linearly independent system: if $\sum \alpha_i\mathbf{y}_i = \mathbf{o}$ then in particular $\sum \alpha_i\mathbf{y}_i(x_0) = 0$, that is, $\sum_i \alpha_i\delta_{ij} = 0$ for each $j$, and all the $\alpha_i$ are zero. $\square$

2.3 The Wronski determinants (Wronskians)

For solutions $\mathbf{y}_1, \dots, \mathbf{y}_n$ of (L-hom), one introduces the determinant

$$W(\mathbf{y}_1, \dots, \mathbf{y}_n)(x) = \begin{vmatrix} y_{11}(x) & \dots & y_{1n}(x) \\ \vdots & & \vdots \\ y_{n1}(x) & \dots & y_{nn}(x) \end{vmatrix}.$$

For solutions $y_1, \dots, y_n$ of the equation ($\mathcal{L}$-hom), one introduces

$$W(y_1, \dots, y_n)(x) = \begin{vmatrix} y_1(x) & \dots & y_n(x) \\ y_1'(x) & \dots & y_n'(x) \\ \vdots & & \vdots \\ y_1^{(n-1)}(x) & \dots & y_n^{(n-1)}(x) \end{vmatrix}.$$

The functions $W(\mathbf{y}_1, \dots, \mathbf{y}_n)(x)$ resp. $W(y_1, \dots, y_n)(x)$ are called the Wronski determinants of the equations in question.

Remark. Note that the latter is in fact a special case of the former obtained from the standard translation as in 1.2 of Chapter 6.

2.4 Theorem. The following statements are equivalent for a system of solutions $\mathbf{y}_1, \dots, \mathbf{y}_n$ of the system (L-hom) (the interval $J$ is as before):
(1) the solutions $\mathbf{y}_1, \dots, \mathbf{y}_n$ are linearly independent,
(2) $W(\mathbf{y}_1, \dots, \mathbf{y}_n)(x) \neq 0$ at all $x \in J$,
(3) there exists an $x_0 \in J$ such that $W(\mathbf{y}_1, \dots, \mathbf{y}_n)(x_0) \neq 0$.
Similarly for the equation ($\mathcal{L}$-hom). If the conditions hold, the system $\mathbf{y}_1, \dots, \mathbf{y}_n$ is called a fundamental system of solutions.

Proof. We will prove the statement for the case of a single $n$-th order homogeneous equation, just for a change.

(1)$\Rightarrow$(2): Suppose (2) does not hold and we have an $x_0 \in J$ such that

$$W(y_1, \dots, y_n)(x_0) = \begin{vmatrix} y_1(x_0) & \dots & y_n(x_0) \\ y_1'(x_0) & \dots & y_n'(x_0) \\ \vdots & & \vdots \\ y_1^{(n-1)}(x_0) & \dots & y_n^{(n-1)}(x_0) \end{vmatrix} = 0.$$

Then the system of algebraic linear equations

$$y_1(x_0)\alpha_1 + y_2(x_0)\alpha_2 + \dots + y_n(x_0)\alpha_n = 0,$$
$$y_1'(x_0)\alpha_1 + y_2'(x_0)\alpha_2 + \dots + y_n'(x_0)\alpha_n = 0,$$
$$\dots$$
$$y_1^{(n-1)}(x_0)\alpha_1 + y_2^{(n-1)}(x_0)\alpha_2 + \dots + y_n^{(n-1)}(x_0)\alpha_n = 0$$

has a non-trivial solution $\alpha_1, \dots, \alpha_n$. If we set $y = \sum \alpha_iy_i$ we have in particular $y(x_0) = y'(x_0) = \dots = y^{(n-1)}(x_0) = 0$. This holds for the trivial constant zero solution as well and hence, by uniqueness, $\sum \alpha_iy_i = \mathrm{const}_0$ and our solutions are linearly dependent.

The implications (2)$\Rightarrow$(3) and (3)$\Rightarrow$(1) are trivial. $\square$
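The equivalence of the three conditions can be seen on a small case. The sketch below assumes the sample equation $y'' - 3y' + 2y = 0$ (chosen here for illustration), whose solutions $e^x$ and $e^{2x}$ have Wronskian $e^{3x}$, non-zero at every point, while the dependent pair $e^x$, $3e^x$ has Wronskian identically zero. The function name `wronskian_2` is a hypothetical helper for the $2 \times 2$ case only.

```python
import math

def wronskian_2(y1, dy1, y2, dy2, x):
    """Wronskian of two scalar functions, given together with their derivatives."""
    return y1(x) * dy2(x) - y2(x) * dy1(x)

def y2(x):
    return math.exp(2 * x)

def dy2(x):
    return 2 * math.exp(2 * x)

def y3(x):
    return 3 * math.exp(x)    # a constant multiple of exp, hence dependent on it

sample_points = (-1.0, 0.0, 2.0)

# Independent pair exp(x), exp(2x): the Wronskian equals exp(3x).
independent = [wronskian_2(math.exp, math.exp, y2, dy2, x) for x in sample_points]

# Dependent pair exp(x), 3 exp(x): the Wronskian vanishes identically.
dependent = [wronskian_2(math.exp, math.exp, y3, y3, x) for x in sample_points]
```

For the independent pair the values never vanish, in line with (2); for the dependent pair they vanish at every sample point, in line with the failure of (3).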

3 Variation of constants

This is a method which allows us to find a solution of the system (L) (resp. of the equation ($\mathcal{L}$)), provided we know a fundamental system of solutions of the system (L-hom) (resp. ($\mathcal{L}$-hom)). Again, the latter is a special case of the former, but in this case we will present both cases explicitly.

3.1 The system (L)

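Suppose we have a basis $\mathbf{y}_1, \dots, \mathbf{y}_n$ of solutions of (L-hom). We will try to find a solution of (L) in the form

$$\mathbf{y}_0(x) = \sum_{i=1}^{n} c_i(x)\mathbf{y}_i(x)$$

(recall 6.7 of Chapter 6). We have

$$y_{ij}' = \sum_k a_{jk}y_{ik}$$

and hence

$$y_{0j}' = \sum_i c_i'y_{ij} + \sum_i c_iy_{ij}' = \sum_i c_i'y_{ij} + \sum_{i,k} c_ia_{jk}y_{ik} = \sum_i c_i'y_{ij} + \sum_k a_{jk}\sum_i c_iy_{ik} = \sum_i c_i'y_{ij} + \sum_k a_{jk}y_{0k}$$

and hence the problem is in finding functions $c_i(x)$ such that

$$\sum_i c_i'(x)y_{ij}(x) = b_j(x), \quad j = 1, \dots, n.$$

This is easily done using the Cramer rule (Appendix B, 4.2). If we denote by $W_i(x)$ the Wronskian in which we replace the $i$-th column (the one formed by the components of $\mathbf{y}_i$) by the column

$$\begin{pmatrix} b_1(x) \\ \vdots \\ b_n(x) \end{pmatrix}$$

we obtain

$$c_i'(x) = \frac{W_i(x)}{W(\mathbf{y}_1, \dots, \mathbf{y}_n)(x)}$$

with the denominator non-zero by 2.4, and conclude that

$$c_i(x) = \int \frac{W_i(x)}{W(\mathbf{y}_1, \dots, \mathbf{y}_n)(x)}\,dx.$$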

3.2 The equation (L)

Consider a basis $y_1(x), \dots, y_n(x)$ of solutions of the homogeneous equation. Let us look for a solution in the form

$$y(x) = \sum c_i(x)y_i(x).$$

We have $y_i^{(n)}(x) + \sum_{j=0}^{n-1} a_jy_i^{(j)}(x) = 0$. Thus, if we require

$$\sum c_i'(x)y_i^{(k)}(x) = 0 \tag{*}$$

for $k = 0, \dots, n-2$, we will have

$$y'(x) = \sum c_i(x)y_i'(x),$$
$$\dots$$
$$y^{(n-1)}(x) = \sum c_i(x)y_i^{(n-1)}(x).$$

Let us add a further requirement

$$\sum c_i'(x)y_i^{(n-1)}(x) = b(x). \tag{**}$$

Then we have

$$y^{(n)}(x) = \sum c_i(x)y_i^{(n)}(x) + b(x)$$

and conclude that

$$y^{(n)}(x) + \sum_k a_k(x)y^{(k)}(x) = b(x).$$

The requirements (*) and (**) constitute, again, a system of algebraic linear equations solvable using the Cramer rule (again with the non-zero Wronskian in the denominator) to obtain $c_i'(x)$. Finally, take the primitive functions to obtain $c_i(x)$.
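The procedure can be traced on a small example. This sketch assumes the sample equation $y'' + y = b(x)$ (chosen here for illustration), with the basis $y_1 = \cos x$, $y_2 = \sin x$ and Wronskian 1. The requirements (*) and (**) read $c_1'\cos x + c_2'\sin x = 0$ and $-c_1'\sin x + c_2'\cos x = b$, so $c_1' = -b\sin x$ and $c_2' = b\cos x$. The primitives are taken numerically with Simpson's rule, normalized by $c_i(x_0) = 0$; all function names are illustrative.

```python
import math

def simpson(g, lo, hi, n=200):
    """Composite Simpson rule on [lo, hi] with an even number n of subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * g(lo + j * h)
    return total * h / 3

def particular_solution(b, x, x0=0.0):
    """Solution of y'' + y = b(x) built by variation of constants.

    Uses the basis cos, sin (Wronskian 1) and the primitives of
    c1' = -b sin and c2' = b cos that vanish at x0.
    """
    c1 = simpson(lambda t: -b(t) * math.sin(t), x0, x)
    c2 = simpson(lambda t: b(t) * math.cos(t), x0, x)
    return c1 * math.cos(x) + c2 * math.sin(x)

x_test = 1.3
y_numeric = particular_solution(lambda t: t, x_test)

# For b(x) = x the primitives can be found by hand, giving y = x - sin(x).
y_by_hand = x_test - math.sin(x_test)
```

For $b(x) = x$ the result agrees with $y = x - \sin x$, which indeed satisfies $y'' + y = x$; a different choice of primitives would change the answer only by a solution of the homogeneous equation.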

4 A linear differential equation of nth order with constant coefficients

In this and the following section we will consider linear differential equations with constant coefficients $a_i$, resp. $a_{ij}$. In view of the previous section, it suffices to solve the corresponding homogeneous equations. If these are solved, the general case can be computed by variation of constants; note that the right-hand sides $b$ resp. $b_i$ do not have to be constant.

4.1 The Characteristic Polynomial

Consider the problem of finding a function $y$ satisfying

$$y^{(n)} + a_{n-1}y^{(n-1)} + \dots + a_1y' + a_0y = 0 \tag{*}$$

where $a_k$ are real numbers.

We already know that it suffices to find $n$ linearly independent solutions. Let us try

$$y(x) = e^{\lambda x}.$$

We have

$$y^{(k)}(x) = \lambda^ke^{\lambda x},$$

and hence the equation (*) will be satisfied if (and only if)

$$e^{\lambda x}(\lambda^n + a_{n-1}\lambda^{n-1} + \dots + a_1\lambda + a_0) = 0,$$

that is, since $e^{\lambda x} \neq 0$, if and only if

$$p(\lambda) = \lambda^n + a_{n-1}\lambda^{n-1} + \dots + a_1\lambda + a_0 = 0.$$

The polynomial $p$ is called the characteristic polynomial of the equation (*). Thus we see that

if $\lambda$ is a root of the characteristic polynomial of (*) then $y(x) = e^{\lambda x}$ is a solution of this equation.

4.2

If $\lambda_1, \dots, \lambda_n$ are distinct numbers then the functions $e^{\lambda_1x}, \dots, e^{\lambda_nx}$ constitute a linearly independent system. This is easily proved using the Wronski and Vandermonde determinants. For our purposes this would not suffice, though. We will need a stronger

Lemma. Let $\lambda_1, \dots, \lambda_k$ be distinct complex numbers and let $p_1(x), \dots, p_k(x)$ be polynomials. Let

$$\sum_{j=1}^{k} p_j(x)e^{\lambda_jx}$$

be identically zero. Then all the polynomials $p_j$ are zero.

Proof. Suppose not. Then among the counterexamples, choose one such that
(a) the maximum of the degrees of the polynomials $p_j$ is the least possible, and
(b) the number of the polynomials $p_j$ with this maximum degree is the least possible.

Here the degree of a constant non-zero polynomial is defined to be 0, and the degree of the constant zero is defined to be $-1$. Thus, taking the derivative of a non-zero polynomial decreases the degree by one. We have identically

$$\sum_{j=1}^{k} p_j(x)e^{\lambda_jx} = 0. \tag{4.2.1}$$

Taking the derivative we obtain

$$\sum_{j=1}^{k} p_j'(x)e^{\lambda_jx} + \sum_{j=1}^{k} p_j(x)\lambda_je^{\lambda_jx} = 0. \tag{4.2.2}$$

Let, say, $p_1$ have the maximum degree. Subtracting (4.2.1) multiplied by $\lambda_1$ from (4.2.2), we obtain

$$p_1'(x)e^{\lambda_1x} + \sum_{j=2}^{k} ((\lambda_j - \lambda_1)p_j(x) + p_j'(x))e^{\lambda_jx} = 0. \tag{4.2.3}$$

Now the degree of the polynomial at $e^{\lambda_1x}$ has decreased and none of the other degrees has increased. Thus, the formula (4.2.3) cannot be a counterexample to the statement and hence we have to have

$$p_1'(x) \equiv 0, \quad\text{and}$$
$$(\lambda_j - \lambda_1)p_j(x) + p_j'(x) \equiv 0 \quad\text{for } j > 1.$$

From the second equation we immediately see that all the $p_j$ with $j > 1$ are identically zero (since $\lambda_1 \neq \lambda_j$). The first one immediately yields only that $p_1$ has to be a constant, but $Ce^{\lambda_1x}$ is zero only if $C = 0$. $\square$

4.3 Corollary. Let $\lambda_1, \dots, \lambda_k$ be distinct complex numbers. Then the system of functions

$$e^{\lambda_1x}, xe^{\lambda_1x}, \dots, x^{s_1}e^{\lambda_1x},\ e^{\lambda_2x}, xe^{\lambda_2x}, \dots, x^{s_2}e^{\lambda_2x},\ \dots,\ e^{\lambda_kx}, xe^{\lambda_kx}, \dots, x^{s_k}e^{\lambda_kx}$$

with arbitrary non-negative integers $s_j$ is linearly independent.


4.4 The simplest case

If the characteristic polynomial has $n$ distinct real roots $\lambda_1, \dots, \lambda_n$ then we have, by 4.1 and 4.3, the fundamental system of solutions

$$e^{\lambda_1x}, \dots, e^{\lambda_nx}.$$

The problem is, hence, what to do with the complex roots, and how to deal with a possible multiplicity of some of the roots.

4.5 Complex roots

We are dealing with an LDE in real variables. Thus the characteristic polynomial has real coefficients and consequently each of the roots which is not real is accompanied by its complex conjugate as another root. That is, if $\beta_j \neq 0$ in a root

$$\lambda_j = \alpha_j + i\beta_j$$

then there is a $k \neq j$ with

$$\lambda_k = \alpha_j - i\beta_j.$$

The two complex functions $e^{\lambda_jx}, e^{\lambda_kx}$ are then in our basis replaced by

$$e^{\alpha_jx}\cos\beta_jx \quad\text{and}\quad e^{\alpha_jx}\sin\beta_jx. \tag{4.5.1}$$

Replacing $e^{ix}$ and $e^{-ix}$ by linear combinations of $\cos x$ and $\sin x$, and vice versa, in the present context, is justified by Exercise (12) of Chapter 1. We will gain a much better understanding of this in Chapter 10 below.
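Written out, the replacement in (4.5.1) is the following pair of linear combinations and its inverse. This is a sketch that takes the identity $e^{i\theta} = \cos\theta + i\sin\theta$ for granted and writes $\alpha, \beta$ for $\alpha_j, \beta_j$.

```latex
e^{\alpha x}\cos\beta x = \tfrac{1}{2}\left(e^{(\alpha+i\beta)x} + e^{(\alpha-i\beta)x}\right),
\qquad
e^{\alpha x}\sin\beta x = \tfrac{1}{2i}\left(e^{(\alpha+i\beta)x} - e^{(\alpha-i\beta)x}\right),
\]
\[
e^{(\alpha\pm i\beta)x} = e^{\alpha x}\cos\beta x \pm i\,e^{\alpha x}\sin\beta x.
```

Since each pair is obtained from the other by an invertible linear substitution, both pairs span the same two-dimensional space of complex solutions.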

4.6 Multiple roots

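Define an operator

$$L(y) = y^{(n)} + \sum_{j=0}^{n-1} a_jy^{(j)}$$

to be applied to functions $y(x, \lambda)$ of two real variables. Thus we have

$$L(y) = \frac{\partial^ny}{\partial x^n} + \sum_{j=0}^{n-1} a_j\frac{\partial^jy}{\partial x^j}.$$

By 4.2.1 of Chapter 3, we obtain

$$\frac{\partial}{\partial\lambda}L(y) = \frac{\partial}{\partial\lambda}\frac{\partial^ny}{\partial x^n} + \dots = \frac{\partial^n}{\partial x^n}\frac{\partial y}{\partial\lambda} + \dots = L\left(\frac{\partial y}{\partial\lambda}\right)$$

and more generally

$$\frac{\partial^k}{\partial\lambda^k}L(y) = L\left(\frac{\partial^ky}{\partial\lambda^k}\right).$$

In particular for $y(x, \lambda) = e^{\lambda x}$ we have $L(y) = e^{\lambda x}p(\lambda)$ and hence

$$L(x^ke^{\lambda x}) = L\left(\frac{\partial^ky}{\partial\lambda^k}\right) = \frac{\partial^k}{\partial\lambda^k}(e^{\lambda x}p(\lambda)).$$

By induction we easily learn that

$$\frac{\partial^k}{\partial\lambda^k}(e^{\lambda x}p(\lambda)) = \sum_{j=0}^{k} \binom{k}{j}p^{(j)}(\lambda)x^{k-j}e^{\lambda x}.$$

If $\lambda$ is a root of $p$ of multiplicity $k$ we have

$$p(\lambda) = p'(\lambda) = \dots = p^{(k-1)}(\lambda) = 0$$

and hence the equation $L(y) = 0$ is satisfied, besides $e^{\lambda x}$, also by

$$xe^{\lambda x}, x^2e^{\lambda x}, \dots, x^{k-1}e^{\lambda x}.$$

Thus we obtain $k$ solutions, and if we apply this to all the roots we obtain $n$ solutions, independent by 4.3, and hence the fundamental system of solutions we needed.

For a conjugate pair of complex roots $\alpha + i\beta$, $\alpha - i\beta$ we take, of course,

$$e^{\alpha x}\cos\beta x,\ xe^{\alpha x}\cos\beta x,\ \dots,\ x^{k-1}e^{\alpha x}\cos\beta x,$$
$$e^{\alpha x}\sin\beta x,\ xe^{\alpha x}\sin\beta x,\ \dots,\ x^{k-1}e^{\alpha x}\sin\beta x.$$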
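The Leibniz-type formula for $L(x^ke^{\lambda x})$ lends itself to a direct check. The sketch below assumes the sample polynomial $p(\lambda) = (\lambda - 1)^3$, i.e. the equation $y''' - 3y'' + 3y' - y = 0$ (chosen here for illustration), and evaluates $\sum_{j=0}^{k}\binom{k}{j}p^{(j)}(\lambda)x^{k-j}e^{\lambda x}$ with the sum starting at $j = 0$. The helper names are hypothetical.

```python
import math

def poly_derivative(coeffs):
    """Derivative of a polynomial given by ascending coefficients."""
    return [i * c for i, c in enumerate(coeffs)][1:]

def poly_eval(coeffs, lam):
    """Horner evaluation; the empty list represents the zero polynomial."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * lam + c
    return result

def apply_L(coeffs, lam, k, x):
    """Value at x of L(x^k e^(lam x)), where coeffs are those of p, ascending."""
    total = 0.0
    d = list(coeffs)               # d holds the coefficients of p^(j)
    for j in range(k + 1):
        total += math.comb(k, j) * poly_eval(d, lam) * x ** (k - j)
        d = poly_derivative(d)
    return total * math.exp(lam * x)

# p(lam) = (lam - 1)^3 = -1 + 3 lam - 3 lam^2 + lam^3, triple root lam = 1.
coeffs = [-1.0, 3.0, -3.0, 1.0]
residuals = [apply_L(coeffs, 1.0, k, 0.5) for k in range(4)]
```

For $k = 0, 1, 2$ the residual vanishes, so $e^x$, $xe^x$, $x^2e^x$ solve the equation. For $k = 3$ only the term with $p'''(1) = 6$ survives and the residual is $6e^{x}$, so $x^3e^x$ is not a solution, as expected for a root of multiplicity exactly 3.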

5 Systems of LDE with constant coefficients. An application of Jordan's Theorem

5.1

Consider a system of first-order linear differential equations

$$\mathbf{y}' = A\mathbf{y}. \tag{5.1.1}$$

In fact, let us carefully consider two contexts in which (5.1.1) makes sense. The first context is, as above, when $A$ is a constant $n \times n$ matrix over $\mathbb{R}$, and $\mathbf{y} : \mathbb{R} \to \mathbb{R}^n$ is an unknown vector-valued function. However, it also makes sense to consider the case when $A$ is an $n \times n$ matrix over $\mathbb{C}$, and the unknown function is $\mathbf{y} : \mathbb{R} \to \mathbb{C}^n$. This case makes sense since we may identify $\mathbb{C} \cong \mathbb{R}^2$, and such a system of $n$ complex-valued first-order differential equations can therefore be interpreted as a system of $2n$ real-valued first-order linear differential equations. Let us emphasize, however, that in this discussion, the independent variable remains real.

The advantage of considering (5.1.1) over $\mathbb{C}$ is that over $\mathbb{C}$, every matrix is similar to a matrix in Jordan canonical form. Changing basis to the basis in which the matrix is in Jordan form gives a substitution which allows us to solve the system of equations. Even more explicitly, this can be said as follows: consider a $k \times k$ Jordan block of the matrix $A$ with respect to an eigenvalue $\lambda$. This corresponds to $k$ vectors $\mathbf{u}_1, \dots, \mathbf{u}_k \in \mathbb{C}^n$ such that

$$A\mathbf{u}_1 = \lambda\mathbf{u}_1, \qquad A\mathbf{u}_j = \lambda\mathbf{u}_j + \mathbf{u}_{j-1}, \quad j = 2, \dots, k. \tag{5.1.2}$$

Then this data give the following solutions of the system (5.1.1):

$$\mathbf{u}_1e^{\lambda x},$$
$$\mathbf{u}_2e^{\lambda x} + \mathbf{u}_1xe^{\lambda x},$$
$$\dots$$
$$\mathbf{u}_ke^{\lambda x} + \mathbf{u}_{k-1}xe^{\lambda x} + \dots + \mathbf{u}_1\frac{x^{k-1}}{(k-1)!}e^{\lambda x}. \tag{5.1.3}$$

Taking the solutions (5.1.3) for all Jordan blocks gives a fundamental system ofsolutions, which we can see by taking the determinant of their values at 0 (wherewe get the base change matrix from the Jordan basis to the standard basis); recallfrom Theorem 2.4 that a system of n solutions whose values are independent at onepoint is a fundamental system of solutions.

5.2

Let us now consider the case when the system (5.1.1) is over $\mathbb{R}$. Then, the matrix $A$ is a real matrix. This means that for every solution $\mathbf{y}$ over $\mathbb{C}$,

$$\mathrm{Re}(\mathbf{y}), \quad \mathrm{Im}(\mathbf{y}) \tag{5.2.1}$$

are real solutions of (5.1.1). Taking all such solutions for all Jordan blocks gives a system of real solutions which, when considered over $\mathbb{C}$, generate the vector space of all the complex solutions and hence must contain a basis of the space of real solutions (which can be found explicitly by finding a set of columns which form a basis of the matrix of values at 0).

5.2.1 Example

Consider the system (5.1.1) with

$$A = \begin{pmatrix} 0 & -1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 \\ 1 & 0 & 1 & 0 \end{pmatrix}.$$

Then one sees right away that the characteristic polynomial is

$$\chi_A(x) = (x^2 + 1)^2,$$

and the Jordan canonical form is

$$J = \begin{pmatrix} i & 0 & 0 & 0 \\ 1 & i & 0 & 0 \\ 0 & 0 & -i & 0 \\ 0 & 0 & 1 & -i \end{pmatrix}.$$

Let us consider the Jordan block corresponding to the eigenvalue $i$. By solving systems of linear equations, we find that

$$\mathbf{u}_1 = (0, 0, -1, i)^T, \qquad \mathbf{u}_2 = (2i, 2, 0, 1)^T.$$

Note that we could equivalently take a scalar multiple of both vectors by the same non-zero complex number. Thus, (5.1.3) produces solutions

$$(0, 0, -e^{ix}, ie^{ix})^T,$$
$$(2i, 2, 0, 1)^Te^{ix} + (0, 0, -1, i)^Txe^{ix}.$$

Taking real and imaginary parts, we get four real solutions

$$(0, 0, -\cos(x), -\sin(x))^T,$$
$$(0, 0, -\sin(x), \cos(x))^T,$$
$$(-2\sin(x), 2\cos(x), -x\cos(x), \cos(x) - x\sin(x))^T,$$
$$(2\cos(x), 2\sin(x), -x\sin(x), \sin(x) + x\cos(x))^T.$$

Since the data obtained from the other Jordan block can be taken complex conjugate, we know that these solutions span the space of all complex solutions, and hence form a fundamental system of real solutions.
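The data of this example can be verified mechanically. The following sketch uses Python's built-in complex numbers to check the chain conditions (5.1.2) for $\mathbf{u}_1, \mathbf{u}_2$ and then compares the derivative of the second solution from (5.1.3), computed by the product rule, with $A\mathbf{y}$. The function names are illustrative; the matrix and vectors are those of the example.

```python
import cmath

A = [[0, -1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, -1],
     [1, 0, 1, 0]]
lam = 1j
u1 = [0, 0, -1, 1j]
u2 = [2j, 2, 0, 1]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

# Chain conditions (5.1.2): A u1 = lam u1 and A u2 = lam u2 + u1.
chain_ok = (matvec(A, u1) == [lam * c for c in u1]
            and matvec(A, u2) == [lam * b + a for a, b in zip(u1, u2)])

def y(x):
    """Second solution from (5.1.3): (u2 + x u1) e^(lam x)."""
    e = cmath.exp(lam * x)
    return [(b + x * a) * e for a, b in zip(u1, u2)]

def dy(x):
    """Its derivative by the product rule: (u1 + lam (u2 + x u1)) e^(lam x)."""
    e = cmath.exp(lam * x)
    return [(a + lam * (b + x * a)) * e for a, b in zip(u1, u2)]

def residual(x):
    """Largest component of y' - A y at x."""
    return max(abs(d - r) for d, r in zip(dy(x), matvec(A, y(x))))

def real_part_residual(x):
    """The same check for the real solution Re(y), using that A is real."""
    re_y = [c.real for c in y(x)]
    re_dy = [c.real for c in dy(x)]
    return max(abs(d - r) for d, r in zip(re_dy, matvec(A, re_y)))
```

Both residuals are zero up to rounding at any sample point, which illustrates why taking real and imaginary parts of a complex solution of a real system again gives solutions.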


5.3 Remark

As mentioned above, a single differential equation with constant coefficients

$$y^{(n)} + a_1y^{(n-1)} + \dots + a_ny = 0$$

can be converted to a system of first-order linear differential equations (5.1.1) where

$$A = \begin{pmatrix} 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \dots & \dots & \dots & \dots & \dots \\ 0 & 0 & 0 & \dots & 1 \\ -a_n & -a_{n-1} & -a_{n-2} & \dots & -a_1 \end{pmatrix}.$$

We clearly have

$$\chi_A(x) = x^n + a_1x^{n-1} + \dots + a_n. \tag{5.3.1}$$

In fact, we call $A$ the characteristic matrix of the polynomial (5.3.1). We may ask when, conversely, a system of first-order linear differential equations with constant coefficients may be converted by a substitution to a single $n$'th order linear differential equation. Clearly, this is equivalent to asking which square matrices are similar to characteristic matrices. We will solve this question in the exercises.
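A small check of the relation between the matrix $A$ and the polynomial (5.3.1): if $r$ is a root of $x^n + a_1x^{n-1} + \dots + a_n$, then $(1, r, \dots, r^{n-1})^T$ is an eigenvector of $A$ with eigenvalue $r$. The sketch below assumes the sample polynomial $(x-1)(x-2)(x-3) = x^3 - 6x^2 + 11x - 6$ (chosen here for illustration); the function names are hypothetical.

```python
def companion(a):
    """The matrix A of 5.3 for the coefficients a = [a_1, ..., a_n]."""
    n = len(a)
    rows = [[1 if j == i + 1 else 0 for j in range(n)] for i in range(n - 1)]
    rows.append([-c for c in reversed(a)])   # last row: -a_n, ..., -a_1
    return rows

def char_poly(a, x):
    """x^n + a_1 x^(n-1) + ... + a_n, evaluated by Horner's rule."""
    value = 1
    for c in a:
        value = value * x + c
    return value

def matvec(m, v):
    return [sum(p * q for p, q in zip(row, v)) for row in m]

a = [-6, 11, -6]          # x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)
A = companion(a)

checks = []
for root in (1, 2, 3):
    v = [root ** k for k in range(len(a))]   # (1, r, r^2)
    checks.append(char_poly(a, root) == 0
                  and matvec(A, v) == [root * c for c in v])
```

All three checks succeed with exact integer arithmetic. This confirms the eigenvalues for one sample only; the general statement is Exercise (6) below.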

6 Exercises

(1) Prove that the Wronskian $W(x)$ of any $n$ solutions of the system

$$\mathbf{y}' = A(x)\mathbf{y}$$

satisfies the differential equation

$$W(x)' = \mathrm{tr}(A)W(x).$$

(Here for a square matrix $A$, $\mathrm{tr}(A)$ is the sum of its diagonal terms.)

(2) The differential equation

$$y'' + \frac{y'}{x} - \frac{y}{x^2} = 0$$

has solutions

$$y = x, \qquad y = \frac{1}{x}.$$

Find all solutions of the differential equation

$$y'' + \frac{y'}{x} - \frac{y}{x^2} = e^x.$$

(3) Find a fundamental system of solutions of the equation

$$y^{(3)} - y^{(2)} + 8y' + 12y = 0.$$

(4) Find all solutions of the system of LDE's

$$y_1' = y_1 - y_2 + xe^x,$$
$$y_2' = y_1 + 3y_2 + x^2.$$

(5) Find a fundamental system of real solutions of the system of LDE's (5.1.1) with

$$A = \begin{pmatrix} 1 & 1 & 0 & 1 \\ -1 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 1 & 0 & -1 & 1 \end{pmatrix}.$$

(6) Prove that the characteristic polynomial of a characteristic matrix $A_p$ of a polynomial $p$ with highest coefficient 1 is equal to $p$.

(7) A cyclic vector for a linear transformation $f : V \to V$ is a vector $v \in V$ such that the vectors $v, f(v), \dots, f^N(v), \dots$ span the vector space $V$. (As usual, we will identify an $n \times n$ matrix with the linear transformation $\mathbb{R}^n \to \mathbb{R}^n$ it defines by matrix multiplication.) Prove that a matrix is similar to a characteristic matrix if and only if it has a cyclic vector.

(8) Suppose an $n \times n$ matrix $A$ over $\mathbb{C}$ has only one eigenvalue. Prove that $A$ has a cyclic vector if and only if $A$ is equivalent to a Jordan block. [Hint: If $v$ is a cyclic vector, prove that $(\lambda I - A)^jv$ for $j \ge 0$ span $\mathbb{C}^n$.]

(9) Prove that if $f : V \to V$ is a linear transformation, $v \in V$ is a cyclic vector and $W \subseteq V$ is a subspace such that $f(W) \subseteq W$, then the image $v + W$ of $v$ in $V/W$ is a cyclic vector for the induced linear transformation $f/W : V/W \to V/W$.

(10) Using the results of Exercises 8 and 9, prove that a square matrix $A$ over $\mathbb{C}$ has a cyclic vector if and only if $A$ has exactly one Jordan block for each eigenvalue $\lambda$. (Note: such matrices are sometimes called regular, which however may be confusing since this notion has nothing to do with non-singularity.)

(11) Suppose you know a cyclic vector of an $n \times n$ (constant) matrix $A$. Explain how you can use the method of Section 4 (which is simpler than the method of Section 5) for solving the system of LDE's

$$\mathbf{y}' = A\mathbf{y}.$$

8 Line Integrals and Green’s Theorem

In this chapter, we introduce the line integral and prove Green’s Theorem, which relates a line integral over a closed curve (or curves) in $\mathbb{R}^2$ to the ordinary integral of a certain quantity over the region enclosed by the curve(s). Making rigorous sense of what this last concept means is a big part of the work. Much of the material of this chapter is subsumed by the more general treatment of Stokes’ Theorem in manifolds of arbitrary dimension in Chapter 12 below. However, there are two important reasons to present Green’s Theorem first. The first reason is that Green’s Theorem is much more elementary, and does not require the added abstraction and the algebra and topology material needed for Stokes’ Theorem. The other important reason is that Green’s Theorem can, in fact, be used directly to set up the foundations of basic complex analysis, which we do in the next chapter, and which is rather nice to do without having to go into Stokes’ Theorem in a general dimension.

1 Curves and line integrals

1.1

A parametrization of a (piecewise continuously differentiable) curve in $\mathbb{R}^n$ is a continuous map

\[ \boldsymbol{\phi} = (\phi_1, \dots, \phi_n)^T : \langle a, b\rangle \to \mathbb{R}^n \]

(recall our convention 1.1 of Chapter 3 of denoting vector functions by bold-faced letters) such that there exists a partition

\[ a = a_0 < a_1 < \cdots < a_n = b \]

of the interval $\langle a, b\rangle$ for which we have:

(1) On each of the intervals $\langle a_{i-1}, a_i\rangle$, each of the functions $\phi_j$ has a continuous derivative (use one-sided derivatives at the endpoints).

(2) For every $i$ there exists a $j$ such that $\phi_j'(t)$ is positive or negative on all of $\langle a_{i-1}, a_i\rangle$ (again, take one-sided derivatives at the endpoints).

We will sometimes also speak of a parametrized piecewise continuously differentiable curve.

Comment: Instead of condition (2), one may simply require that $\boldsymbol{\phi}'(t) \neq \mathbf{0}$ on $\langle a_{i-1}, a_i\rangle$, as the interval can then be subdivided into finitely many intervals on each of which condition (2) holds.

1.2

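We say that two parametrizations

\[ \boldsymbol{\phi} : \langle a, b\rangle \to \mathbb{R}^n, \qquad \boldsymbol{\psi} : \langle c, d\rangle \to \mathbb{R}^n \]

are weakly equivalent, and write

\[ \boldsymbol{\phi} \sim \boldsymbol{\psi}, \]

if there exists a homeomorphism $\alpha : \langle a, b\rangle \to \langle c, d\rangle$ such that

\[ \boldsymbol{\psi} \circ \alpha = \boldsymbol{\phi}. \]

Note that the relation $\sim$ really is an equivalence relation, i.e. that it is reflexive, symmetric and transitive (see Exercise (2)). Equivalence classes with respect to $\sim$ will be called piecewise continuously differentiable curves.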

1.3 Remark

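Clearly, $\boldsymbol{\phi} \sim \boldsymbol{\psi}$ implies $\boldsymbol{\phi}[\langle a, b\rangle] = \boldsymbol{\psi}[\langle c, d\rangle]$. On the other hand, if $\boldsymbol{\phi}$ and $\boldsymbol{\psi}$ are one-to-one and we have $\boldsymbol{\phi}[\langle a, b\rangle] = \boldsymbol{\psi}[\langle c, d\rangle]$, then we have $\boldsymbol{\phi} \sim \boldsymbol{\psi}$. In effect, consider the maps

\[ \overline{\boldsymbol{\phi}} : \langle a, b\rangle \to \boldsymbol{\phi}[\langle a, b\rangle], \qquad \overline{\boldsymbol{\psi}} : \langle c, d\rangle \to \boldsymbol{\phi}[\langle a, b\rangle] \]

defined by the same formulas as $\boldsymbol{\phi}$, $\boldsymbol{\psi}$. Since the relevant spaces are compact, $\overline{\boldsymbol{\phi}}$, $\overline{\boldsymbol{\psi}}$ are homeomorphisms (see 6.2.2 of Chapter 2). Put $\alpha = \overline{\boldsymbol{\psi}}^{-1} \overline{\boldsymbol{\phi}}$.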


1.4 Proposition. The map $\alpha$ from the definition of weak equivalence is piecewise continuously differentiable.

Proof. Let $a_0, \dots, a_r$ and $c_0, \dots, c_s$ be the partitions figuring in Definition 1.1 for the parametrizations $\boldsymbol{\phi}$, $\boldsymbol{\psi}$. Let $b_0, \dots, b_k$ be a common refinement of $a_0, \dots, a_r$ and $\alpha^{-1}(c_0), \dots, \alpha^{-1}(c_s)$. On the interval $\langle \alpha(b_{i-1}), \alpha(b_i)\rangle$, choose $j$ such that $\psi_j$ is a one-to-one continuously differentiable map with non-zero derivative. Then $\psi_j^{-1}$ has a derivative and we have $\alpha(t) = \psi_j^{-1}(\phi_j(t))$ on $\langle b_{i-1}, b_i\rangle$. □

1.5

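We say that two parametrizations

\[ \boldsymbol{\phi} : \langle a, b\rangle \to \mathbb{R}^n, \qquad \boldsymbol{\psi} : \langle c, d\rangle \to \mathbb{R}^n \]

are equivalent, and write

\[ \boldsymbol{\phi} \approx \boldsymbol{\psi}, \]

if there exists an increasing homeomorphism $\alpha$ such that $\boldsymbol{\psi} \circ \alpha = \boldsymbol{\phi}$. The equivalence classes of $\approx$ are called oriented piecewise continuously differentiable curves.

Proposition. Suppose a parametrization $\boldsymbol{\phi}$ of a piecewise continuously differentiable curve is one-to-one with the possible exception $\boldsymbol{\phi}(a) = \boldsymbol{\phi}(b)$. Then the $\sim$-equivalence class of $\boldsymbol{\phi}$ contains precisely two $\approx$-equivalence classes.

Proof. Since $\boldsymbol{\phi} \approx \boldsymbol{\psi} \Rightarrow \boldsymbol{\phi} \sim \boldsymbol{\psi}$, a $\sim$-equivalence class is a union of $\approx$-equivalence classes. Define $\iota : \langle a, b\rangle \to \langle a, b\rangle$ by

\[ \iota(t) = -t + b + a. \]

Then we have $\boldsymbol{\phi} \sim \boldsymbol{\phi} \circ \iota$ but (by injectivity) not $\boldsymbol{\phi} \approx \boldsymbol{\phi} \circ \iota$. Therefore, there are at least two $\approx$-equivalence classes in each $\sim$-equivalence class. When $\boldsymbol{\phi} \sim \boldsymbol{\psi} \sim \boldsymbol{\chi}$ but the homeomorphisms $\alpha$, $\beta$ from Definition 1.2 for the pairs $\boldsymbol{\phi}, \boldsymbol{\psi}$ and $\boldsymbol{\psi}, \boldsymbol{\chi}$ are not increasing, then $\beta\alpha$ is increasing, and hence $\boldsymbol{\phi} \approx \boldsymbol{\chi}$. □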

1.6 A Remark and a Convention

The geometric idea of a curve is modelled well by the concepts of 1.2 (see 1.3). A parametrization can be interpreted as additional information about the “time” at which we are at a particular point when travelling along the curve. In an oriented curve, we do not care about the precise time at which we are at a particular point, but we do want to keep track of the direction of travel.

We will often speak freely of an (oriented or unoriented) curve

\[ \boldsymbol{\phi} : \langle a, b\rangle \to \mathbb{R}^n. \]

Of course, what we really mean is the corresponding equivalence class of parametrizations.

1.7

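Let $K$, $L$ be oriented curves with parametrizations $\boldsymbol{\phi} : \langle a, b\rangle \to \mathbb{R}^n$, $\boldsymbol{\psi} : \langle a_1, b_1\rangle \to \mathbb{R}^n$ such that $\boldsymbol{\phi}(b) = \boldsymbol{\psi}(a_1)$. Without loss of generality, we may assume $a_1 = b$ (otherwise, we may replace $\boldsymbol{\psi}$, say, by the parametrization $\boldsymbol{\psi} \circ \alpha$ where $\alpha(t) = a_1 + (t - b)$). Let us then write $c = b_1$, and define $\boldsymbol{\phi} * \boldsymbol{\psi} : \langle a, c\rangle \to \mathbb{R}^n$ by

\[ (\boldsymbol{\phi} * \boldsymbol{\psi})(t) = \begin{cases} \boldsymbol{\phi}(t) & \text{for } t \in \langle a, b\rangle, \\ \boldsymbol{\psi}(t) & \text{for } t \in \langle b, c\rangle. \end{cases} \]

Then $\boldsymbol{\phi} * \boldsymbol{\psi}$ is a parametrization of a new oriented curve which we will denote by

\[ L + K. \]

This oriented curve is well-defined in terms of $K$, $L$; moreover, this operation is associative (see Exercises (3), (4), (5) below).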

1.8

Recall the map $\iota$ of 1.5. If $L$ is an oriented curve with parametrization $\boldsymbol{\phi} : \langle a, b\rangle \to \mathbb{R}^n$, define

\[ -L \]

as the oriented curve determined by $\boldsymbol{\phi} \circ \iota$. Thus, $-L$ is “the other oriented curve” which represents the same (unoriented) curve, and accordingly, we shall refer to $-L$ as the oriented curve with opposite orientation. Again, $-L$ does not depend on the particular parametrization of the oriented curve $L$.

1.9 Some terminology

A curve with a one-to-one parametrization is sometimes called a simple arc. A curve with a parametrization $\boldsymbol{\phi}$ such that $\boldsymbol{\phi}(a) = \boldsymbol{\phi}(b)$ but $\boldsymbol{\phi}(x) \neq \boldsymbol{\phi}(y)$ unless $x = y$ or $\{x, y\} = \{a, b\}$ is called a simple closed curve. The word ‘closed’ in this context has, of course, a different meaning than in “closed subset of a topological space”.

2 Line integrals of the first kind (= according to length)

2.1

Recall our definition of the Euclidean norm $\|\mathbf{u}\| = \sqrt{\mathbf{u} \cdot \mathbf{u}}$ where $\cdot$ is the dot product. Then $\|\mathbf{u} - \mathbf{v}\|$ is the ordinary Euclidean distance. While this distance function was unimportant (even awkward) in topological considerations, and could be replaced by any equivalent metric, in the present section it will play a crucial role.

2.2

Recall the definition of the Riemann integral and view it informally as a kind of summation of the function $f$ over the length of an interval. Since an interval is a very special case of a piecewise continuously differentiable curve (where the parametrization is the identity), we may wonder if the Riemann integral could be generalized to a situation where the domain is (the image of) a piecewise continuously differentiable curve. This intuition indeed works. By a partition of a parametrized piecewise continuously differentiable curve $\boldsymbol{\phi} : \langle a, b\rangle \to \mathbb{R}^n$ we will mean a sequence of points

\[ \boldsymbol{\phi}(t_0), \boldsymbol{\phi}(t_1), \dots, \boldsymbol{\phi}(t_k) \tag{*} \]

where $t_0 < t_1 < \cdots < t_k$ is a partition of the interval $\langle a, b\rangle$. The mesh of a partition is the maximum of the numbers $\|\boldsymbol{\phi}(t_{i-1}) - \boldsymbol{\phi}(t_i)\|$.

Note that since $\langle a, b\rangle$ is a compact space, $\boldsymbol{\phi}$ is a uniformly continuous map and hence if the mesh of a sequence of partitions goes to 0, so does the mesh of their $\boldsymbol{\phi}$-images.

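Now consider a continuous real function $f$ defined (at least) on $\boldsymbol{\phi}[\langle a, b\rangle]$. In analogy with the Riemann integral (recall, in particular, Theorem 8.3 of Chapter 1), let us investigate sums of the form

\[ \sum_{i=1}^{k} f(\boldsymbol{\phi}(t_i))\,\|\boldsymbol{\phi}(t_i) - \boldsymbol{\phi}(t_{i-1})\| \]

and let us see if they converge to a particular value when the mesh goes to 0. By the Mean Value Theorem, there are points $\theta_{ij}$ between $t_{i-1}$ and $t_i$ such that

\[ \sum_i f(\boldsymbol{\phi}(t_i)) \sqrt{\sum_{j=1}^{n} (\phi_j(t_i) - \phi_j(t_{i-1}))^2} = \sum_i f(\boldsymbol{\phi}(t_i)) \sqrt{\sum_j \phi_j'(\theta_{ij})^2 (t_i - t_{i-1})^2} \]

\[ = \sum_i f(\boldsymbol{\phi}(t_i)) \sqrt{\sum_j \phi_j'(\theta_{ij})^2}\,(t_i - t_{i-1}), \]

which, by Theorem 8.3 of Chapter 1, when the mesh goes to 0, converges to

\[ \int_a^b f(\boldsymbol{\phi}(t))\,\|\boldsymbol{\phi}'(t)\|\,dt. \]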

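When $\boldsymbol{\phi} : \langle a, b\rangle \to \mathbb{R}^n$ is a parametrization of a piecewise continuously differentiable curve $L$, we call the number

\[ \int_a^b f(\boldsymbol{\phi}(t))\,\|\boldsymbol{\phi}'(t)\|\,dt \tag{**} \]

the line integral of the first kind (or integral according to length) of the function $f$ over the curve $L$, and denote it by

\[ \int_L f \qquad \text{or} \qquad \int_L f(\mathbf{x})\,\|d\mathbf{x}\|. \]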

2.2.1 Comment

The formula (**) makes sense, of course, for any integrable function $f$, in which case the integral (**) exists as a Lebesgue integral. A similar comment will apply to all the types of curve integrals we shall introduce. It is useful to note, however, that in the context of the present chapter, we are not interested in such a level of generality, and are happy to assume that the function $f$ is continuous, in which case the Lebesgue integral is the same as the Riemann integral. Nevertheless, even with that in mind, the Lebesgue integral techniques we developed in Chapter 5 are still needed, for example in arguments such as differentiating behind the integral sign in Proposition 3.7 or the use of multivariable substitution in Section 5 below.

2.3 Proposition. The expression in the definition of the line integral of the first kind is independent of parametrization.

Proof. Let $\boldsymbol{\phi}$ and $\boldsymbol{\psi}$ be as in 1.1, let $\boldsymbol{\phi} = \boldsymbol{\psi} \circ \alpha$. By 1.4, $\alpha$ is piecewise continuously differentiable (except at finitely many points where there are, at most, discontinuities of the first kind, i.e. such that the corresponding one-sided limits exist), and hence we have

\[ \|\boldsymbol{\phi}'(t)\| = \sqrt{\sum_j \phi_j'(t)^2} = \sqrt{\sum_j \psi_j'(\alpha(t))^2 (\alpha'(t))^2} = \sqrt{\sum_j \psi_j'(\alpha(t))^2}\;|\alpha'(t)| = \|\boldsymbol{\psi}'(\alpha(t))\| \cdot |\alpha'(t)| \]

and hence by the Substitution Theorem (for the Riemann integral in one variable), we have

\[ \int_a^b f(\boldsymbol{\phi}(t))\,\|\boldsymbol{\phi}'(t)\|\,dt = \int_a^b f(\boldsymbol{\psi}(\alpha(t)))\,\|\boldsymbol{\psi}'(\alpha(t))\| \cdot |\alpha'(t)|\,dt = \int_c^d f(\boldsymbol{\psi}(\tau))\,\|\boldsymbol{\psi}'(\tau)\|\,d\tau. \]

□

(The attentive reader will recall from the theory of substitution in the single variable Riemann integral that if $\alpha$ is decreasing, the absolute value in $|\alpha'(t)|$ is nevertheless correct because of an interchange of bounds.)

2.4 Remark

The length of a curve $L$ is defined as the integral of the first kind of the function 1 over $L$, i.e.

\[ \int_L 1 = \int_a^b \|\boldsymbol{\phi}'\|. \]
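As an illustration, the partition sums of 2.2 can be evaluated numerically. The following is only a sketch under stated assumptions: the helper name `line_integral_first_kind`, the uniform partition and the test curve (the unit circle) are choices made here for illustration and do not come from the text.

```python
import math

def line_integral_first_kind(f, phi, a, b, k=20000):
    """Partition sum  sum_i f(phi(t_i)) * ||phi(t_i) - phi(t_{i-1})||
    for the uniform partition of <a, b> into k pieces (illustrative helper)."""
    total = 0.0
    prev = phi(a)
    for i in range(1, k + 1):
        t = a + (b - a) * i / k
        cur = phi(t)
        total += f(cur) * math.dist(cur, prev)  # Euclidean distance of consecutive points
        prev = cur
    return total

def circle(t):
    """Parametrization of the unit circle on <0, 2*pi> (assumed example curve)."""
    return (math.cos(t), math.sin(t))

# Length of the curve: first-kind integral of the constant function 1.
length = line_integral_first_kind(lambda p: 1.0, circle, 0.0, 2 * math.pi)
# First-kind integral of f(x, y) = x^2 over the same curve.
mass = line_integral_first_kind(lambda p: p[0] ** 2, circle, 0.0, 2 * math.pi)
print(length, mass)
```

For this curve the two printed values should be close to $2\pi$ and $\pi$ respectively, in line with formula (**), since $\|\boldsymbol{\phi}'(t)\| = 1$ on the unit circle.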

3 Line integrals of the second kind

3.1

Let $\boldsymbol{\phi} : \langle a, b\rangle \to \mathbb{R}^n$ be a parametrization of a piecewise continuously differentiable oriented curve $L$, and let $\mathbf{f} = (f_1, \dots, f_n)^T$ be a vector function defined (at least) on $\boldsymbol{\phi}[\langle a, b\rangle]$. The line integral of the second kind of the function $\mathbf{f}$ over the oriented curve $L$ is the number

\[ \int_L \mathbf{f} = \int_a^b \mathbf{f}(\boldsymbol{\phi}(t)) \cdot \boldsymbol{\phi}'(t)\,dt = \sum_{j=1}^{n} \int_a^b f_j(\boldsymbol{\phi}(t))\,\phi_j'(t)\,dt \]

(note, in the middle expression, the dot product of vectors). When there is a danger of confusion, we will denote line integrals of the first and second kind explicitly by

\[ (\mathrm{I})\int_L, \qquad (\mathrm{II})\int_L. \]

In the literature, the line integral of the second kind is also often denoted by

\[ \int_L (f_1\,dx_1 + \cdots + f_n\,dx_n). \]

This notation, in fact, conforms to the notation of differential forms, which we will see later in Chapter 12. When $\mathbf{x} = (x_1, \dots, x_n)^T$, we will also use the notation

\[ \int_L \mathbf{f}(\mathbf{x}) \cdot d\mathbf{x}. \]

3.2

The “physical” meaning of the line integral of the second kind: We travel along the curve $L$ from the beginning point to the end point. $\int_L \mathbf{F}$ is then the work done when we exert the force $\mathbf{F}$ at each given point of the curve.

3.3 Proposition. The expression in the definition of the line integral of the second kind does not depend on the choice of parametrization of an oriented piecewise continuously differentiable curve.

Proof. Let $\boldsymbol{\phi} = \boldsymbol{\psi} \circ \alpha$. Now, of course, $\alpha'(t) > 0$ (with the possible exception of finitely many points, where $\alpha'$ has, at most, discontinuities of the first kind). We have

\[ \sum_{j=1}^{n} \int_a^b f_j(\boldsymbol{\phi}(t))\,\phi_j'(t)\,dt = \sum_{j=1}^{n} \int_a^b f_j(\boldsymbol{\psi}(\alpha(t)))\,\psi_j'(\alpha(t))\,\alpha'(t)\,dt = \sum_{j=1}^{n} \int_c^d f_j(\boldsymbol{\psi}(\tau))\,\psi_j'(\tau)\,d\tau. \]

□

Observation. $\displaystyle \int_{-L} \mathbf{f} = -\int_L \mathbf{f}$.
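The independence of parametrization and the sign change under reversal of orientation can be observed numerically. The sketch below makes several assumptions that are not in the text: the helper `line_integral_second_kind` (a midpoint rule), the vector field $\mathbf{f}(x, y) = (-y, x)$, the upper unit half-circle as the curve, and the increasing homeomorphism $\alpha(t) = \pi(t + t^2)/2$ of $\langle 0, 1\rangle$ onto $\langle 0, \pi\rangle$.

```python
import math

def line_integral_second_kind(f, phi, dphi, a, b, k=20000):
    """Midpoint-rule approximation of  int_a^b f(phi(t)) . phi'(t) dt."""
    h = (b - a) / k
    total = 0.0
    for i in range(k):
        t = a + (i + 0.5) * h
        total += sum(fj * dj for fj, dj in zip(f(phi(t)), dphi(t))) * h
    return total

# Assumed example field f(x, y) = (-y, x).
f = lambda p: (-p[1], p[0])

# psi: upper unit half-circle on <0, pi>, with its derivative.
psi = lambda t: (math.cos(t), math.sin(t))
dpsi = lambda t: (-math.sin(t), math.cos(t))

# phi = psi o alpha for an increasing homeomorphism alpha of <0,1> onto <0,pi>.
alpha = lambda t: math.pi * (t + t * t) / 2
dalpha = lambda t: math.pi * (1 + 2 * t) / 2
phi = lambda t: psi(alpha(t))
dphi = lambda t: tuple(dalpha(t) * d for d in dpsi(alpha(t)))

# Opposite orientation: psi composed with t -> -t + pi on <0, pi>.
rev = lambda t: psi(math.pi - t)
drev = lambda t: tuple(-d for d in dpsi(math.pi - t))

I_psi = line_integral_second_kind(f, psi, dpsi, 0.0, math.pi)
I_phi = line_integral_second_kind(f, phi, dphi, 0.0, 1.0)
I_rev = line_integral_second_kind(f, rev, drev, 0.0, math.pi)
print(I_psi, I_phi, I_rev)
```

The first two values should agree (both close to $\pi$), while the third should be close to $-\pi$.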

3.4

We immediately see the following

Proposition. Let $K$, $L$ be oriented piecewise continuously differentiable curves such that $K + L$ is defined. Then

\[ \int_{K+L} \mathbf{f} = \int_K \mathbf{f} + \int_L \mathbf{f}. \]

3.5

Now let $f$ be a (scalar) function defined on $\boldsymbol{\phi}[\langle a, b\rangle]$ where $\boldsymbol{\phi}$ is a parametrization of a piecewise continuously differentiable oriented curve. On the same set, define

\[ \mathbf{f}(\boldsymbol{\phi}(t)) = f(\boldsymbol{\phi}(t))\,\frac{\boldsymbol{\phi}'(t)}{\|\boldsymbol{\phi}'(t)\|}. \]

From the definitions, one has immediately

\[ (\mathrm{I})\int_L f = (\mathrm{II})\int_L \mathbf{f}. \]

Thus, the line integral of the first kind can be reduced to the line integral of the second kind.

3.6 Remarks

1. The traditional terms “of the first kind” and “of the second kind” therefore should not be interpreted as expressing the order of importance. The line integral of the second kind is in fact more fundamental, and the integral of the first kind can be reduced to it. Perhaps the reason for the terminology is that the line integral of the first kind is the more naive notion.

2. The function $f$ or $\mathbf{f}$ often is defined on an open set containing $\boldsymbol{\phi}[\langle a, b\rangle]$. This will play a crucial role in the proof of Green’s Theorem.

3.7

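Since continuous functions on a compact set are bounded, we obtain immediately from Theorem 5.2 of Chapter 5 the following

3.8 Proposition. Let $\mathbf{f}(\alpha, \mathbf{x})$ be a continuous vector function defined in an open set $U$ of $\mathbb{R}^n$ such that $\dfrac{\partial f_j(\alpha, \mathbf{x})}{\partial \alpha}$ is continuous on $U$ for each $j$. Then the line integral of the second kind satisfies

\[ \frac{d}{d\alpha} \int_L \mathbf{f}(\alpha, \mathbf{x}) \cdot d\mathbf{x} = \int_L \frac{\partial \mathbf{f}(\alpha, \mathbf{x})}{\partial \alpha} \cdot d\mathbf{x}. \]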


4 The complex line integral

4.1

For a complex function of one real variable, $f(t) = f_1(t) + i f_2(t)$ where $f_1$, $f_2$ are real functions, one introduces the Riemann integral by the formula

\[ \int_a^b f(t)\,dt = \int_a^b f_1(t)\,dt + i \int_a^b f_2(t)\,dt. \]

4.2

Recall that on the field of complex numbers $\mathbb{C}$, we use the distance function $d(x, y) = |x - y|$, which is the same as the Euclidean distance when we identify $\mathbb{C}$ with $\mathbb{R}^2$ by $x + iy \mapsto (x, y)$. We will use this identification freely to define piecewise continuously differentiable functions in $\mathbb{C}$, etc., but now note that the values $\phi(t)$ are elements of the field $\mathbb{C}$ and hence can be subjected to the multiplication in $\mathbb{C}$, which is different from the dot multiplication in $\mathbb{R}^2$ (for example in that the result is again an element of $\mathbb{C}$ rather than $\mathbb{R}$). This distinction, in fact, is the main point of the present section. Because of this, when working with complex-valued functions, we will not use bold-faced letters as we did in the case of vector functions.

4.3

Let $\phi : \langle a, b\rangle \to \mathbb{C}$ be a parametrization of an oriented piecewise continuously differentiable curve $L$ and let $f$ be a (continuous) complex function of one complex variable defined on some set containing $\phi[\langle a, b\rangle]$. The complex line integral

\[ \int_L f(z)\,dz \]

is introduced by the formula

\[ \int_a^b f(\phi(t))\,\phi'(t)\,dt \tag{*} \]

(independence of the (oriented) parametrization will be discussed below in 4.4). Again, note with caution that while the formula (*) is similar to the definition of the line integral of the second kind, it is different and “more mysterious” in that it involves complex multiplication. For example, there is no simple interpretation of the complex line integral similar to the interpretations given in 2.2 or 3.2.


4.4

It is, however, again possible to express the complex line integral in terms of line integrals of the second kind.

Theorem. Let $f$ be a complex function of one complex variable. Let

\[ f(z) = f_1(z) + i f_2(z) \]

where $f_1$, $f_2$ are real functions of one complex variable. Then the complex line integral satisfies

\[ \int_L f(z)\,dz = (\mathrm{II})\int_L (f_1, -f_2)^T + i\,(\mathrm{II})\int_L (f_2, f_1)^T. \]

Proof. Writing $\phi = \phi_1 + i\phi_2$, we have

\[ \int_a^b f(\phi(t))\,\phi'(t)\,dt = \int_a^b \big(f_1(\phi(t)) + i f_2(\phi(t))\big)\big(\phi_1'(t) + i\phi_2'(t)\big)\,dt \]

\[ = \int_a^b \big(f_1(\phi(t))\,\phi_1'(t) + (-f_2(\phi(t)))\,\phi_2'(t)\big)\,dt + i \int_a^b \big(f_2(\phi(t))\,\phi_1'(t) + f_1(\phi(t))\,\phi_2'(t)\big)\,dt \]

\[ = \int_L (f_1, -f_2)^T + i \int_L (f_2, f_1)^T. \]

□

Remark: This theorem also implies that the complex line integral does not depend on the parametrization of an oriented piecewise continuously differentiable curve. (Of course, reversal of orientation results in a reversal of sign.)
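The decomposition in the theorem can be checked numerically. The sketch below is an illustration only: the helper names, the midpoint rule, the integrand $f(z) = z^2$ and the curve (the quarter of the unit circle from $1$ to $i$) are assumptions made here, not part of the text.

```python
import cmath
import math

def complex_line_integral(f, phi, dphi, a, b, k=20000):
    """Midpoint-rule approximation of int_a^b f(phi(t)) * phi'(t) dt (complex product)."""
    h = (b - a) / k
    total = 0j
    for i in range(k):
        t = a + (i + 0.5) * h
        total += f(phi(t)) * dphi(t) * h
    return total

def second_kind(g1, g2, phi, dphi, a, b, k=20000):
    """Midpoint-rule approximation of the second-kind integral of (g1, g2)^T,
    with C identified with R^2 via x + iy -> (x, y)."""
    h = (b - a) / k
    total = 0.0
    for i in range(k):
        t = a + (i + 0.5) * h
        z, dz = phi(t), dphi(t)
        total += (g1(z) * dz.real + g2(z) * dz.imag) * h
    return total

# Assumed example: f(z) = z^2 with real and imaginary parts f1, f2.
f = lambda z: z * z
f1 = lambda z: (z * z).real
f2 = lambda z: (z * z).imag
neg_f2 = lambda z: -f2(z)

# Quarter of the unit circle from 1 to i.
phi = lambda t: cmath.exp(1j * t)
dphi = lambda t: 1j * cmath.exp(1j * t)
a, b = 0.0, math.pi / 2

direct = complex_line_integral(f, phi, dphi, a, b)
via_theorem = complex(second_kind(f1, neg_f2, phi, dphi, a, b),
                      second_kind(f2, f1, phi, dphi, a, b))
print(direct, via_theorem)
```

Both values should agree up to rounding. They should also be close to $(i^3 - 1)/3 = (-1 - i)/3$, the value suggested by the antiderivative $z^3/3$, a fact not yet established at this point of the text.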

4.5

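The estimate in the following statement is not particularly tight. However, it will prove useful in Chapter 10 below.

Lemma. Let $L$ be a piecewise continuously differentiable curve in $\mathbb{C}$ of length $d$ (recall 2.4) with parametrization $\phi : \langle a, b\rangle \to \mathbb{C}$, and assume a complex function $f$ on $\phi[\langle a, b\rangle]$ satisfies $|f(z)| \le A$. Then we have

\[ \left| \int_L f(z)\,dz \right| \le 4Ad. \]

Proof. Abbreviating $f_j = f_j(\phi(t))$, we have

\[ \left| \int_a^b f(\phi(t))\,\phi'(t)\,dt \right| = \left| \int_a^b f_1\phi_1' - \int_a^b f_2\phi_2' + i\int_a^b f_2\phi_1' + i\int_a^b f_1\phi_2' \right| \]

\[ \le \left| \int_a^b f_1\phi_1' \right| + \left| \int_a^b f_2\phi_2' \right| + \left| \int_a^b f_2\phi_1' \right| + \left| \int_a^b f_1\phi_2' \right| \]

\[ \le \int_a^b |f_1| \cdot |\phi_1'| + \int_a^b |f_2| \cdot |\phi_2'| + \int_a^b |f_2| \cdot |\phi_1'| + \int_a^b |f_1| \cdot |\phi_2'| \]

\[ \le 4\int_a^b A\,|\phi'| = 4A \int_a^b |\phi'| = 4Ad. \]

□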

5 Green’s Theorem

5.1 Smooth partition of unity: a “baby version”

Let $Z \subseteq \mathbb{R}^n$ be a compact set, and let $S$ be a set of open subsets of $\mathbb{R}^n$ whose union contains $Z$. A smooth partition of unity subordinate to $S$ is a set of finitely many smooth functions $u_i : \mathbb{R}^n \to \mathbb{R}$, $i = 1, \dots, k$, such that the image of each $u_i$ is contained in $\langle 0, 1\rangle$, the support of each $u_i$ is compact and contained in one of the sets from $S$, and $u = \sum_{i=1}^{k} u_i$ has the property that $u(x) = 1$ for every $x \in Z$.

Lemma. Let $Z \subseteq \mathbb{R}^n$ be a compact set. For every set $S$ of open subsets of $\mathbb{R}^n$ whose union contains $Z$, there exists a smooth partition of unity subordinate to $S$.

Proof. First of all, $Z$ is bounded by 6.5 of Chapter 2, and hence contained in a bounded closed interval $K = \langle A_1, B_1\rangle \times \cdots \times \langle A_n, B_n\rangle$. Consider the set $T$ consisting of all bounded open intervals whose closures are either contained in one of the sets from $S$, or are disjoint from $Z$. By 5.5 of Chapter 2, $K$ is contained in the union of the elements of a finite subset $F$ of $T$. Now consider the function $\lambda(x)$ equal to $e^{-1/x}$ for $x > 0$, and equal to 0 for $x \le 0$ (see Exercise (13) of Chapter 1). For an interval $J = (a_1, b_1) \times \cdots \times (a_n, b_n)$, let

\[ \lambda_J(x) = \prod_{i=1}^{n} \lambda(x_i - a_i)\,\lambda(b_i - x_i). \]

Consider further the functions $\mu_{i,A}(x) = \lambda(x_i - B_i)$, $\mu_{i,B}(x) = \lambda(A_i - x_i)$, and let $\sigma$ be the sum of all the functions $\lambda_J$, $J \in F$, $\mu_{i,A}$ and $\mu_{i,B}$; note that $\sigma > 0$ everywhere. Then the functions $u_J = \lambda_J / \sigma$, taken over those $J \in F$ whose closure is contained in one of the sets from $S$, form a smooth partition of unity: on $Z$, all the other summands of $\sigma$ vanish. □
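The construction in the proof can be tried out in dimension $n = 1$. The following sketch is illustrative only; the concrete data ($Z = \langle 0, 1\rangle$, the cover $S = \{(-0.5, 0.7), (0.3, 1.5)\}$, the interval $K = \langle -1, 2\rangle$, the finite family of intervals) and all Python names are assumptions made here.

```python
import math

def lam(x):
    """The smooth function equal to e^{-1/x} for x > 0 and to 0 for x <= 0."""
    return math.exp(-1.0 / x) if x > 0 else 0.0

def bump(a, b):
    """Function that is positive exactly on the open interval (a, b)."""
    return lambda x: lam(x - a) * lam(b - x)

# Assumed data: Z = <0, 1>, S = {(-0.5, 0.7), (0.3, 1.5)}, K = <A, B> = <-1, 2>.
A, B = -1.0, 2.0
keep = [(-0.4, 0.6), (0.4, 1.4)]    # closures contained in a set from S
drop = [(-1.2, -0.1), (1.1, 2.2)]   # closures disjoint from Z; together they cover K

def sigma(x):
    """Sum of all bump functions and of the two functions living outside K."""
    inside = sum(bump(a, b)(x) for a, b in keep + drop)
    return inside + lam(x - B) + lam(A - x)

# The partition of unity: one quotient for each interval kept.
partition = [lambda x, a=a, b=b: bump(a, b)(x) / sigma(x) for a, b in keep]

samples = [i / 100 for i in range(101)]          # sample points of Z
totals = [sum(u(x) for u in partition) for x in samples]
print(min(totals), max(totals))
```

On the sample points of $Z$ the functions should add up to 1 up to rounding, and each function vanishes outside its interval, so its support is contained in a set from $S$.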


5.2

By a domain we shall mean an open subset $U$ of $\mathbb{R}^2$ (or of $\mathbb{C}$) which has compact closure (which, by 6.5 of Chapter 2, is equivalent to being bounded). This condition may not be very strong, but we will see that it will play an absolutely crucial role in the proof of Green’s Theorem. Let $L_1, \dots, L_k$ be oriented piecewise continuously differentiable simple closed curves in $\mathbb{R}^2$ with disjoint images and with parametrizations $c_1, \dots, c_k$. We will say that $L_1 \sqcup \cdots \sqcup L_k$ is the boundary of a domain $U$ oriented counter-clockwise if the images of the $L_i$ are contained in $\overline{U}$ and for every $x \in \overline{U} \smallsetminus U$ there exists an open neighborhood $V_x$ of $x$ and an injective regular map with bounded partial derivatives

\[ \phi_x : V_x \to \Omega(o, 1) = \{ x \in \mathbb{R}^2 \mid \|x\| < 1 \} \]

with

\[ \det(D\phi_x) > 0, \]

a number $\alpha \in (0, 2\pi)$ and numbers $a > 0$, $b \in \mathbb{R}$ and $i \in \{1, \dots, k\}$ such that

(1) $c_i(b) = x$;

(2) $\phi_x[U \cap V_x] = \{ (r\cos\theta, r\sin\theta) \mid 0 < r < 1,\ 0 < \theta < \alpha \}$;

(3) $\Big( \bigcup_{j=1}^{k} \operatorname{Im}(c_j) \Big) \cap V_x = c_i[(b - a, b + a)]$;

(4) For $s \in \langle -1, 0\rangle$, we have $\phi_x c_i(as + b) = (-s\cos\alpha, -s\sin\alpha)$, and for $s \in \langle 0, 1\rangle$, we have $\phi_x c_i(as + b) = (s, 0)$.

5.3 Comment

Informally, the above definition says simply that the boundary of $U$ is a union of the images of the $L_i$’s and that in a neighborhood of every point of the boundary, $U$ locally looks like a wedge of an open disk (the wedge may also be a half-disk) whose boundary is parametrized linearly by one of the curves $c_i$ in the same direction as the increasing parametrization of $(-1, 1)$ is with respect to the upper half-disk

\[ \{ (x, y) \in \Omega(o, 1) \mid y \ge 0 \}. \]

Note, however, the great generality this allows, for example a disk $D$ with several open disks removed whose disjoint closures are in the interior of $D$, or similarly with polygons, etc. The beauty of the upcoming proof is that it uses no intuitive properties of such situations except the formal properties given in the definition; for example, we do not use any intuitive notion of “interior” or “exterior” of the curves $L_i$, and although the expression “counter-clockwise” matches the intuition, Definition 5.2 is not based on intuition. Another way of putting this is to note that our definition of boundary is purely local in the sense that it is completely described by requirements on neighborhoods of individual points of $\mathbb{C}$.

5.4

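Let

\[ \int_{L_1 \sqcup \cdots \sqcup L_k} \mathbf{f} = \int_{L_1} \mathbf{f} + \cdots + \int_{L_k} \mathbf{f}. \]

Theorem. (Green’s Theorem) Let $U$ be a domain in $\mathbb{R}^2$ and let $L_1, \dots, L_k$ be oriented piecewise continuously differentiable simple closed curves with disjoint images such that $L_1 \sqcup \cdots \sqcup L_k$ is the boundary of $U$ oriented counter-clockwise. Let $M = \overline{U}$ and let $\mathbf{f} : V \to \mathbb{R}^2$ be a function with continuous (first) partial derivatives for some open $V \supseteq M$. Then we have

\[ \int_{L_1 \sqcup \cdots \sqcup L_k} \mathbf{f} = \int_M \left( \frac{\partial f_2}{\partial x_1} - \frac{\partial f_1}{\partial x_2} \right). \tag{5.4.1} \]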

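Proof. First, we note that the theorem is valid for $U = (0, K) \times (0, K)$, $K > 0$, $k = 1$ and

\[ c_1 : \langle 0, 4\rangle \to \mathbb{R}^2 \]

defined by

\[ c_1(t) = \begin{cases} (Kt, 0) & \text{for } 0 \le t \le 1, \\ (K, K(t - 1)) & \text{for } 1 \le t \le 2, \\ (K(3 - t), K) & \text{for } 2 \le t \le 3, \\ (0, K(4 - t)) & \text{for } 3 \le t \le 4. \end{cases} \]

For the purposes of this calculation, denote by $L_1, \dots, L_4$ the four sides of the square, i.e. the oriented curves parametrized by the restrictions of $c_1$ to $\langle 0, 1\rangle, \dots, \langle 3, 4\rangle$. In this case, applying Fubini’s Theorem and the Fundamental Theorem of Calculus in one variable, we get

\[ \int_M \frac{\partial f_2}{\partial x_1} = \int_0^K \left( \int_0^K \frac{\partial f_2(x_1, x_2)}{\partial x_1}\,dx_1 \right) dx_2 = \int_0^K \big( f_2(K, x_2) - f_2(0, x_2) \big)\,dx_2 = \int_{L_2} \mathbf{f} + \int_{L_4} \mathbf{f}. \]

Similarly, we have

\[ \int_M -\frac{\partial f_1}{\partial x_2} = \int_0^K \left( \int_0^K -\frac{\partial f_1(x_1, x_2)}{\partial x_2}\,dx_2 \right) dx_1 = \int_0^K \big( f_1(x_1, 0) - f_1(x_1, K) \big)\,dx_1 = \int_{L_1} \mathbf{f} + \int_{L_3} \mathbf{f}. \]

Adding these two formulas gives the statement in the present case. Amazingly, this is the only concrete case of the theorem we need to prove by direct calculation.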
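The special case of the square can also be checked numerically. The sketch below is an illustration under assumptions made here: the vector field $\mathbf{f}(x_1, x_2) = (x_1 x_2, x_1^2)$, the value $K = 2$, the midpoint rules and all Python names are not from the text, and the second branch of $c_1$ is taken as $(K, K(t - 1))$ so that the four pieces join continuously.

```python
K = 2.0

def c1(t):
    """Boundary of the square (0,K) x (0,K), traversed counter-clockwise on <0,4>."""
    if t <= 1:
        return (K * t, 0.0)
    if t <= 2:
        return (K, K * (t - 1))
    if t <= 3:
        return (K * (3 - t), K)
    return (0.0, K * (4 - t))

def dc1(t):
    """Derivative of c1 away from the corner parameters 1, 2, 3."""
    if t < 1:
        return (K, 0.0)
    if t < 2:
        return (0.0, K)
    if t < 3:
        return (-K, 0.0)
    return (0.0, -K)

# Assumed example field and its "curl"  d f2/d x1 - d f1/d x2 = 2*x1 - x1 = x1.
f = lambda x1, x2: (x1 * x2, x1 * x1)
curl = lambda x1, x2: x1

# Left hand side of (5.4.1): midpoint rule, m subintervals per side.
m = 1000
h = 1.0 / m
boundary = 0.0
for i in range(4 * m):
    t = (i + 0.5) * h                 # never equal to a corner parameter
    (x1, x2), (d1, d2) = c1(t), dc1(t)
    f1, f2 = f(x1, x2)
    boundary += (f1 * d1 + f2 * d2) * h

# Right hand side of (5.4.1): midpoint rule on an N x N grid of the square.
N = 200
cell = K / N
area_integral = sum(curl((i + 0.5) * cell, (j + 0.5) * cell)
                    for i in range(N) for j in range(N)) * cell * cell
print(boundary, area_integral)
```

For this field both sides should be close to $K^3/2 = 4$.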

Now consider the general case. First we need to observe that the statement (5.4.1) doesn’t change if we perform a (2-variable) substitution by a diffeomorphism $\mathbf{F} : V \to V'$ (see Theorem 7.9 of Chapter 5). This is easy to accept, but somewhat harder to do in detail. The reason is that even in two variables, the concepts we set up so far do not transform in the simplest possible way under coordinate change. We will understand this better in Chapter 12 below.

To do the calculation we need, let us write

\[ (x_1, x_2)^T = \mathbf{F}((r_1, r_2)^T), \]

so identifying, at a point, the linear map $D\mathbf{F}$ with its associated matrix, we have

\[ D\mathbf{F} = \begin{pmatrix} \dfrac{\partial x_1}{\partial r_1} & \dfrac{\partial x_1}{\partial r_2} \\[2ex] \dfrac{\partial x_2}{\partial r_1} & \dfrac{\partial x_2}{\partial r_2} \end{pmatrix}. \]

Now consider a vector function $\mathbf{f} : \mathbb{R}^2 \to \mathbb{R}^2$ where we understand the independent variables to be $x_1, x_2$ (i.e. “the $x_1x_2$-plane”). Let $L$ be an oriented piecewise continuously differentiable curve in $\mathbb{R}^2$, which we understand as “the $r_1r_2$-plane”. We will denote, slightly imprecisely, by $\mathbf{F}[L]$ the “$\mathbf{F}$-image of the curve $L$ in the $x_1x_2$-plane”, i.e. the oriented curve obtained by composing the parametrization of $L$ with the map $\mathbf{F}$. The key observation then is that the definition of the line integral of the second kind gives

\[ \int_{\mathbf{F}[L]} \mathbf{f} \cdot d\mathbf{x} = \int_L \big( (D\mathbf{F})^T (\mathbf{f} \circ \mathbf{F}) \big) \cdot d\mathbf{r}. \tag{5.4.2} \]

(Note that if we wrote the integrand of a line integral of the second kind as a row instead of a column vector, the transposition on the right hand side of (5.4.2) would be unnecessary; again, we will understand this better in Chapter 12 below.)


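Denoting the integrand on the right hand side of (5.4.2) by $\mathbf{g}$, we have, in coordinates,

\[ \mathbf{g} = \begin{pmatrix} \dfrac{\partial x_1}{\partial r_1} f_1 + \dfrac{\partial x_2}{\partial r_1} f_2 \\[2ex] \dfrac{\partial x_1}{\partial r_2} f_1 + \dfrac{\partial x_2}{\partial r_2} f_2 \end{pmatrix}. \]

Now compute:

\[ \frac{\partial g_2}{\partial r_1} - \frac{\partial g_1}{\partial r_2} = \frac{\partial x_1}{\partial r_2}\frac{\partial f_1}{\partial r_1} + \frac{\partial^2 x_1}{\partial r_1 \partial r_2} f_1 + \frac{\partial x_2}{\partial r_2}\frac{\partial f_2}{\partial r_1} + \frac{\partial^2 x_2}{\partial r_1 \partial r_2} f_2 - \frac{\partial x_1}{\partial r_1}\frac{\partial f_1}{\partial r_2} - \frac{\partial^2 x_1}{\partial r_1 \partial r_2} f_1 - \frac{\partial x_2}{\partial r_1}\frac{\partial f_2}{\partial r_2} - \frac{\partial^2 x_2}{\partial r_1 \partial r_2} f_2. \tag{5.4.3} \]

We see that the second order terms cancel out, and after applying the chain rule

\[ \frac{\partial f_i}{\partial r_j} = \frac{\partial f_i}{\partial x_1}\frac{\partial x_1}{\partial r_j} + \frac{\partial f_i}{\partial x_2}\frac{\partial x_2}{\partial r_j}, \]

the right hand side of (5.4.3) becomes a sum of eight terms, four of which cancel out, leaving

\[ \left( \frac{\partial f_2}{\partial x_1} - \frac{\partial f_1}{\partial x_2} \right) \left( \frac{\partial x_1}{\partial r_1}\frac{\partial x_2}{\partial r_2} - \frac{\partial x_1}{\partial r_2}\frac{\partial x_2}{\partial r_1} \right) = \left( \frac{\partial f_2}{\partial x_1} - \frac{\partial f_1}{\partial x_2} \right) \det(D\mathbf{F}), \]

which is what we need to transform (5.4.1) from the $x$-coordinates to the $r$-coordinates, provided $\det(D\mathbf{F}) > 0$ (see Theorem 7.9 and Exercise (16) of Chapter 5).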

Now by compactness of $\overline{U}$ (our main assumption!), there exist open sets $V_1, \dots, V_m$ of $\mathbb{R}^2$ such that

\[ V_1 \cup \cdots \cup V_m \supseteq \overline{U} \]

and for each $i$, we have either $V_i \subseteq U$ or $x \in V_i \subseteq V_x$ for some $x \in \overline{U} \smallsetminus U$. Let $u_i$ be a smooth partition of unity subordinate to the cover $(V_i)$. We shall prove the formula (5.4.1) for each of the functions $u_i \mathbf{f}$, $i = 1, \dots, m$. We distinguish four cases, according to the angle $\alpha$ of Definition 5.2:

Case 1: $V_i \subseteq U$. By linear substitution, we may assume $V_i \subseteq (0, K) \times (0, K)$. Thus, the statement for $u_i \mathbf{f}$ follows from the special case already proved (the left hand side of (5.4.1) with $\mathbf{f}$ replaced by $u_i \mathbf{f}$ is 0).

Case 2: $x \in V_i \subseteq V_x$ and $0 < \alpha < \pi$. By $\mathbb{R}^2$-linear substitution, we may assume $\alpha = \pi/2$. In this case, choose $K = 1$ and extend the map $u_i \mathbf{f} \circ (\phi_x)^{-1}$ to an open set containing $\langle 0, K\rangle \times \langle 0, K\rangle$ by 0. Again, the statement reduces to the special case already proved (noting that for this new function, the contributions to the left hand side of (5.4.1) for $1 \le t \le 3$ are 0).

Case 3: $\alpha = \pi$. By the linear substitution

\[ r = \tfrac{1}{2}(x_1 + 1), \qquad s = x_2, \]

applied to the function $u_i \mathbf{f} \circ (\phi_x)^{-1}$, the statement reduces to the special case already proved with $K = 1$. (Note that for this function, the contributions to the left hand side of (5.4.1) with $1 \le t \le 4$ are 0.)

Case 4: $\pi < \alpha < 2\pi$. By $\mathbb{R}^2$-linear substitution applied to the function $u_i \mathbf{f} \circ (\phi_x)^{-1}$, we may assume $\alpha = 3\pi/2$. Then extend the function $u_i \mathbf{f} \circ (\phi_x)^{-1}$ to an open neighborhood in $\mathbb{R}^2$ of the set

\[ Z = (\langle -1, 1\rangle \times \langle -1, 1\rangle) \smallsetminus ((0, 1\rangle \times \langle -1, 0)). \]

Express

\[ Z = Z_1 \cup Z_2 \]

where

\[ Z_1 = \langle -1, 1\rangle \times \langle 0, 1\rangle, \qquad Z_2 = \langle -1, 0\rangle \times \langle -1, 0\rangle. \]

The sets are not disjoint, but the intersection has measure 0. For the sets $Z_1$, $Z_2$ and restrictions of the function $u_i \mathbf{f} \circ (\phi_x)^{-1}$, the statement follows from Cases 3 and 2, respectively. When adding the left hand sides of formula (5.4.1) for these functions, the contributions from the line segment $\langle -1, 0\rangle \times \{0\}$ cancel out. □

6 Exercises

(1) Prove the statement of the comment in 1.1.

(2) Prove that the relation $\sim$ in 1.2 is an equivalence relation.

(3) Prove that in 1.7, $\boldsymbol{\phi} * \boldsymbol{\psi}$ is a piecewise continuously differentiable parametrization of a curve.

(4) Prove that the oriented curve $K + L$ defined in 1.7 depends only on the oriented curves $K$ and $L$, and not on their parametrizations.

(5) Prove that in 1.7, we have $(K + L) + M = K + (L + M)$.

(6) Prove directly that the factor $\dfrac{\boldsymbol{\phi}'(t)}{\|\boldsymbol{\phi}'(t)\|}$ of 3.5 at each point of $\boldsymbol{\phi}[\langle a, b\rangle]$ (where defined) does not depend on the parametrization of the piecewise continuously differentiable oriented curve. Prove also that reversal of orientation of the curve results in multiplication of this factor by $-1$.

(7) Compute the complex line integral

\[ \int_L e^z\,dz \]

where $L$ is the straight line segment in $\mathbb{C}$ from $2 + 3i$ to $1 + i$.

(8) Write out in detail the simplification of the right hand side of (5.4.3) using the chain rule.

(9) Compute

\[ (\mathrm{II})\int_L y^2\,dx + 2xy\,dy \]

where $L$ is the boundary of the upper unit half-disk

\[ \{ (x, y)^T \in \mathbb{R}^2 \mid x^2 + y^2 < 1,\ y > 0 \} \]

oriented counter-clockwise.

(10) Prove that if $L_1 \sqcup \cdots \sqcup L_k$ is the boundary of a domain $U$ oriented counter-clockwise, then the area of $U$ is equal to

\[ \frac{1}{2} \int_{L_1 \sqcup \cdots \sqcup L_k} x\,dy - y\,dx. \]

[Hint: Use Green’s Theorem.]

(11) Using Green’s Theorem and Theorem 4.4, compute the complex line integral

\[ \int_L z^2\,dz \]

where $L$ is the boundary of the square

\[ \{ x + iy \mid 0 < x < 1,\ 0 < y < 1 \} \]

oriented counter-clockwise.

Part II

Analysis and Geometry

9 Metric and Topological Spaces II

For the remaining chapters of this text, we must revisit our foundations. Specifically, it is time to upgrade our knowledge of both metric and topological spaces. For example, in the upcoming discussion of manifolds in Chapter 12, we will need separability. We will need a characterization of compactness by properties of open covers. Also, it is natural to define manifolds as topological and not metric spaces, which prompts the development of separation axioms, with a focus on normality. On the other hand, when discussing Hilbert spaces in Chapters 16 and 17, we will need completion, extension of uniformly continuous maps, and the Stone-Weierstrass Theorem. These are the topics we will discuss in the present chapter.

1 Separable and totally bounded metric spaces

1.1 A few concepts

A subset $M \subseteq X$ of a topological space is said to be dense if $\overline{M} = X$.

A space is separable if it contains an at most countable dense subset. (At most countable means finite or countable.)

A cover of a space $(X, \tau)$ ($\tau$ is the set of all open sets, recall Subsection 4.2 of Chapter 2) is a subset $\mathcal{U} \subseteq \tau$ such that

\[ \bigcup \mathcal{U} = X. \]

Note that we only consider covers by open sets. (In other texts, this requirement is sometimes dropped, in which case our concept would be called an open cover.) A subcover $\mathcal{V}$ of a cover $\mathcal{U}$ is a subset $\mathcal{V} \subseteq \mathcal{U}$ that is itself a cover.

A space $X$ is said to be Lindelöf if every cover of $X$ contains an at most countable subcover.

1.2 Theorem. The following statements about a metric space $X = (X, d)$ are equivalent.

(1) $X$ is separable.

(2) The topology of $X$ has a countable basis.

(3) $X$ is Lindelöf.

Proof. (1)⇒(2): Let $M$ be a countable dense subset of $X$. Put

\[ \mathcal{B} = \{ \Omega(m, r) \mid m \in M,\ r \text{ rational} \}. \]

We will prove that $\mathcal{B}$ is a basis. Take an open $U$, an $x \in U$, and an $\varepsilon > 0$ such that $\Omega(x, \varepsilon) \subseteq U$. Now choose an $m \in M$ such that $d(x, m) < \frac{1}{3}\varepsilon$ and a rational $r$ such that $\frac{1}{3}\varepsilon < r < \frac{2}{3}\varepsilon$. Then $x \in \Omega(m, r) \subseteq \Omega(x, \varepsilon) \subseteq U$: in effect, if $d(m, y) < r$ we have $d(x, y) \le d(x, m) + d(m, y) < (\frac{1}{3} + \frac{2}{3})\varepsilon = \varepsilon$.

(2)⇒(3): Let $\mathcal{B}$ be a countable basis and let $\mathcal{U}$ be an arbitrary open cover. Put $\mathcal{B}_0 = \{ B \in \mathcal{B} \mid \exists U \in \mathcal{U},\ B \subseteq U \}$. Then $\mathcal{B}_0$ is a countable cover, and if we choose for each $B \in \mathcal{B}_0$ a $U_B \in \mathcal{U}$ with $B \subseteq U_B$ then also $\{ U_B \mid B \in \mathcal{B}_0 \}$ is a countable cover.

(3)⇒(1): For every positive natural number $n$, choose a countable subcover of the cover $\{ \Omega(x, \frac{1}{n}) \mid x \in X \}$, say

\[ \Omega(x_{n1}, \tfrac{1}{n}), \dots, \Omega(x_{nk}, \tfrac{1}{n}), \dots. \]

Then the set $\{ x_{nk} \mid n, k = 1, 2, \dots \}$ is dense in $X$. □

1.2.1 Remarks

1. This is a very specific fact concerning metric spaces. In a general topological space one has only the (very easy) implications (2)⇒(3) and (2)⇒(1) and nothing more.

2. In the literature, the existence of a countable basis is often called the second axiom of countability.

1.2.2

Obviously, if $X$ has a countable basis $\mathcal{B}$ then each subspace $Y \subseteq X$ has one, namely $\mathcal{B}|Y = \{ U \cap Y \mid U \in \mathcal{B} \}$. Hence we have

Corollary. A subspace of a separable metric space is separable. Equivalently, a subspace of a Lindelöf metric space is Lindelöf.

The first of these statements hardly comes as a surprise (it is easy to prove it directly, too). But the second one should sound somewhat strange. We will see shortly (in 2.3 below) that the Lindelöf property is very close to compactness, and compactness is (very obviously) not preserved on subspaces. Again, this corollary is characteristic for metric spaces. In a general topological context neither of the two statements holds.


1.3

A metric space $X$ is said to be totally bounded if for each $\varepsilon > 0$ there exists a finite subset $M(\varepsilon)$ of $X$ such that

\[ \text{for every } x \in X, \text{ we have } d(x, M(\varepsilon)) < \varepsilon. \tag{TB} \]

1.3.1A totally bounded space is always bounded but a bounded space is not necessarilytotally bounded: take any infinite set and define d.x; y/ D 1 for x ¤ y. But wehave

Proposition. A subspace X of the Euclidean space Rn is totally bounded if andonly if it is bounded.

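Proof. If $X$ is bounded, then we have

$$X \subseteq \langle -N, N\rangle \times \cdots \times \langle -N, N\rangle$$

for a sufficiently large natural $N$. Choose a natural number $k$ such that $\frac{N}{k} < \frac{\varepsilon}{2}$ and put

$$M = \{ s = (\tfrac{s_1}{k}, \dots, \tfrac{s_n}{k}) \mid s_i \text{ are integers},\ -Nk \le s_i \le Nk \}.$$

For every $x \in X$, there exists an $s \in M$ such that $d(x,s) < \frac{\varepsilon}{2}$. For an $s \in M$, choose an $x(s) \in X$ such that $d(x(s), s) < \frac{\varepsilon}{2}$, if such an $x(s)$ exists, and put

$$M_X = \{ x(s) \mid s \in M \text{ such that } x(s) \text{ exists} \}.$$

Then, by the triangle inequality, we have, for every $x \in X$, $d(x, M_X) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon$. $\square$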
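The grid construction in this proof can be tried out numerically. The following is a minimal sketch, assuming the bounded set is given as a finite list of points in $\mathbb{R}^n$; the function name `grid_net` and the particular choice of the mesh parameter `k` (based on the Euclidean estimate $\sqrt{n}/(2k)$ for the distance to the nearest grid point) are illustrative assumptions and not part of the text.

```python
import math

def grid_net(points, eps):
    """Return a finite eps-net M_X (a subset of `points`) of a bounded set in R^n.

    Hypothetical helper: one representative x(s) is kept for every grid
    point s/k that has a point of the set within eps/2 of it.
    """
    n = len(points[0])
    # Mesh 1/k so fine that every point is within eps/2 of its nearest grid
    # point (that distance is at most sqrt(n) / (2k) in the Euclidean metric).
    k = math.ceil(math.sqrt(n) / eps) + 1
    representatives = {}
    for x in points:
        s = tuple(round(c * k) for c in x)   # nearest grid point is s / k
        representatives.setdefault(s, x)     # choose x(s) once per grid point
    return list(representatives.values())

# Usage: a sample of the square <-1, 1> x <-1, 1>
sample = [(i / 20, j / 20) for i in range(-20, 21) for j in range(-20, 21)]
net = grid_net(sample, 0.3)
```

Two points that share the same nearest grid point are less than `eps` apart by the triangle inequality, which is exactly the estimate used at the end of the proof.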

1.4 Proposition. A metric space $X$ is totally bounded if and only if every sequence in $X$ contains a Cauchy subsequence.

Proof. I. Let $X$ be totally bounded. Consider the sets $M(\tfrac{1}{n})$ from the definition 1.3. Now consider a sequence $(x_i)_{i=1,2,\dots}$ in $X$. If the set $P = \{x_i \mid i = 1, 2, \dots\}$ is finite, then our sequence contains a constant subsequence, which is, of course, Cauchy. Otherwise choose first $m_1 \in M(1)$ so that $P_1 = P \cap \Omega(m_1, 1)$ is infinite, and then $k_1$ with $x_{k_1} \in P_1$. Now assuming we have

$$m_j \in M(\tfrac{1}{j}),\ j = 1, \dots, n-1, \text{ such that } P_j = P_{j-1} \cap \Omega(m_j, \tfrac{1}{j}) \text{ are infinite, and } k_1 < k_2 < \cdots < k_{n-1} \text{ such that } x_{k_j} \in P_j,$$

choose $m_n \in M(\tfrac{1}{n})$ with $P_n = P_{n-1} \cap \Omega(m_n, \tfrac{1}{n})$ infinite, and a $k_n > k_{n-1}$ such that $x_{k_n} \in P_n$. Then the sequence $x_{k_1}, x_{k_2}, x_{k_3}, \dots$ is Cauchy: for $i, j \ge n$ both $x_{k_i}$ and $x_{k_j}$ lie in $\Omega(m_n, \tfrac{1}{n})$, so that $d(x_{k_i}, x_{k_j}) < \tfrac{2}{n}$.

II. Let $X$ not be totally bounded. Then there exists an $\varepsilon > 0$ such that for every finite $M \subseteq X$ there exists an $x \in X$ with $d(x, M) \ge \varepsilon$. Pick an arbitrary point $x_1$; assuming we have chosen $x_1, \dots, x_n$, pick an $x_{n+1} \in X$ so that $d(x_{n+1}, \{x_1, \dots, x_n\}) \ge \varepsilon$. Any two distinct terms of the resulting sequence are at distance at least $\varepsilon$, so it contains no Cauchy subsequence. $\square$
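The greedy choice in part II can be imitated on a finite sample. The sketch below is an illustration only: the name `greedy_separated` and the representation of the space as a Python list with a distance function are assumptions. On a finite set the procedure necessarily stops, and the chosen points then form an $\varepsilon$-net; in a space that is not totally bounded it would go on forever.

```python
def greedy_separated(points, eps, dist):
    """Pick points one by one, each at distance >= eps from all earlier picks."""
    chosen = []
    for x in points:
        if all(dist(x, c) >= eps for c in chosen):
            chosen.append(x)
    return chosen

# Usage: in <0, 1> with the usual metric only a few points are picked ...
interval = [i / 100 for i in range(101)]
net = greedy_separated(interval, 0.25, lambda a, b: abs(a - b))

# ... while in the discrete metric of 1.3.1 (d(x, y) = 1 for x != y)
# every point is picked, however large the set is.
discrete = greedy_separated(list(range(10)), 1, lambda a, b: 0 if a == b else 1)
```

Every point that is skipped lies within `eps` of a chosen one, which is why the chosen points form a net whenever the loop terminates.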

1.5 Proposition. A totally bounded metric space is separable.

Proof. It suffices to take $M = \bigcup_{n=1}^{\infty} M(\tfrac{1}{n})$. $\square$

2 More on compact spaces

2.1

A point $x$ of a space $X$ is said to be an accumulation point of a subset $M \subseteq X$ if for every neighborhood $U$ of $x$ the intersection $U \cap M$ is infinite.

Here is a simple reformulation of the definition of compactness we have used so far (i.e. requiring that every sequence have a convergent subsequence).

2.1.1 Proposition. A metric space $X$ is compact if and only if every infinite set $M \subseteq X$ has an accumulation point.

Proof. I. Assume that every sequence in $X$ has a convergent subsequence. Let $M$ be an infinite subset of $X$. Choose a one-to-one (not necessarily onto) mapping $\varphi : \mathbb{N} \to M$ and a convergent subsequence $(\varphi(k_n))_n$ of $(\varphi(n))_n$. Then $\lim_n \varphi(k_n)$ is an accumulation point of $M$.

II. Assume that every infinite $M \subseteq X$ has an accumulation point. Let $(x_n)_n$ be a sequence in $X$. If $M = \{x_n \mid n = 1, 2, \dots\}$ is finite then $(x_n)_n$ contains a constant, and hence convergent, subsequence. Assume that $M$ is infinite, and let $x$ be one of its accumulation points. Choose $k_1$ arbitrarily, and assuming $x_{k_1}, \dots, x_{k_{n-1}}$, $k_1 < \cdots < k_{n-1}$, are chosen, choose $k_n > k_{n-1}$ so that $x_{k_n} \in \Omega(x, \tfrac{1}{n})$ (such a $k_n$ exists since $x$ is an accumulation point, and on the other hand, only finitely many $j$, namely those with $j \le k_{n-1}$, are disqualified by the definition of a subsequence). Now obviously $\lim_n x_{k_n} = x$. $\square$

2.2 Theorem. A metric space $X$ is compact if and only if it is complete and totally bounded.

Proof. I. If $X$ is compact then it is complete by 7.4 of Chapter 2. If $X$ were not totally bounded, there would exist an $\varepsilon > 0$ such that for every finite subset $M$ there is a point $x$ with $d(x, M) \ge \varepsilon$. Choose $x_1$ arbitrarily and assuming $x_1, \dots, x_n$ are already chosen, pick an $x_{n+1}$ such that $d(x_{n+1}, \{x_1, \dots, x_n\}) \ge \varepsilon$. Then $\{x_n \mid n = 1, 2, \dots\}$ is infinite and obviously has no accumulation point, contradicting 2.1.1.

II. If $(x_n)_n$ is a sequence in a totally bounded complete metric space then by 1.4 it contains a Cauchy subsequence, and by completeness, this subsequence converges. $\square$

2.2.1 Remark

This fact is a generalization of Theorem 6.5 of Chapter 2 stating that a subset $X \subseteq \mathbb{R}^n$ is compact if and only if it is closed and bounded. We know that $\mathbb{R}^n$ is complete (7.5 of Chapter 2), and hence, by 7.5 of Chapter 2 again, $X$ is complete if and only if it is closed; by 1.3.1, for $X \subseteq \mathbb{R}^n$, boundedness and total boundedness are equivalent.

2.3

The following is the famous Heine-Borel Theorem. One can think of it as a generalization of 5.5 of Chapter 2 to arbitrary metric spaces.

Theorem. A metric space is compact if and only if each of its (open) covers contains a finite subcover.

Proof. I. Let $X$ be compact and let $\mathcal{U}'$ be a cover of $X$ which has no finite subcover. By 2.2 and 1.5, $X$ is separable, hence by 1.2 it is Lindelöf, and hence $\mathcal{U}'$ has a countable subcover

$$\mathcal{U} = \{U_1, U_2, \dots, U_n, \dots\}.$$

By assumption, $\mathcal{U}$ has no finite subcover.

Now our strategy is to discard the $U_i$'s which are "redundant" in the given order. More precisely,

let $V_1 = U_j$ for the lowest $j$ for which $U_j \ne \emptyset$,

and assuming $V_j$, $j = 1, \dots, n-1$, are already chosen,

let $V_n = U_j$ for the lowest $j$ such that $U_j \smallsetminus \bigcup_{i=1}^{n-1} V_i \ne \emptyset$

(by assumption, the finite system $\{V_1, \dots, V_{n-1}\}$ cannot be a cover). Choose

$$x_n \in V_n \smallsetminus \bigcup_{i=1}^{n-1} V_i \quad \text{and put} \quad M = \{x_n \mid n = 1, 2, \dots\}.$$

Now

• $x_n \notin \{x_1, \dots, x_{n-1}\}$ (since $\{x_1, \dots, x_{n-1}\} \subseteq \bigcup_{i=1}^{n-1} V_i$) and hence $M$ is infinite,

• $V_1, \dots, V_n, \dots$ is a cover since each discarded $U_j$ is contained in the union of the $V_i$'s, and

• $V_n \cap M \subseteq \{x_1, \dots, x_n\}$ and hence is finite.

This is a contradiction: the set $M$ must have an accumulation point $x$, this $x$ is an element of some $V_n$, but this neighborhood of $x$ meets $M$ in finitely many points only.

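II. Assume each cover of $X$ has a finite subcover and assume $M \subseteq X$ has no accumulation point. Then for every $x \in X$ there exists an open neighborhood $U_x$ such that $U_x \cap M$ is finite. Choose a finite subcover $U_{x_1}, \dots, U_{x_n}$. Then

$$M = \Bigl(\bigcup_{i=1}^{n} U_{x_i}\Bigr) \cap M = \bigcup_{i=1}^{n} (U_{x_i} \cap M)$$

is a finite union of finite sets and hence is finite. Thus every infinite subset of $X$ has an accumulation point, and $X$ is compact by 2.1.1. $\square$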

2.4

Theorem 2.3 suggests the following definition of compactness for general topological spaces, which we will adopt from now on:

A topological space is said to be compact if each of its (open) covers has a finite subcover.

Similarly as in the special case of metric spaces (recall 6.2 of Chapter 2) we have

2.4.1 Proposition. Let $f : X \to Y$ be a continuous map and let $X$ be compact. Then the subspace $f[X]$ of $Y$ is compact.

Proof. Let $U_i$, $i \in J$, be open in $Y$ and let $f[X] \subseteq \bigcup_{i \in J} U_i$. Then $X \subseteq f^{-1}[\bigcup_{i \in J} U_i] = \bigcup_{i \in J} f^{-1}[U_i]$ and hence there exist $i_1, \dots, i_n$ such that

$$X \subseteq \bigcup_{j=1}^{n} f^{-1}[U_{i_j}] = f^{-1}\Bigl[\bigcup_{j=1}^{n} U_{i_j}\Bigr].$$

This is equivalent to $f[X] \subseteq \bigcup_{j=1}^{n} U_{i_j}$. $\square$

From this statement we obtain, again, the following important generalization of Proposition 6.3 of Chapter 2:


2.4.2 Proposition. Let $X$ be a compact topological space. Then every continuous real function $f : X \to \mathbb{R}$ has both a maximum and a minimum.

2.4.3 Proposition. A closed subspace $Y$ of a compact topological space is compact.

Proof. Let $U_i$, $i \in J$, be open sets in $X$ such that $\bigcup U_i \supseteq Y$. Then $\{U_i \mid i \in J\} \cup \{X \smallsetminus Y\}$ is an open cover of $X$ and hence there exists a finite subcover

$$U_{i_1}, \dots, U_{i_n}, X \smallsetminus Y$$

of $X$. Since $Y \cap (X \smallsetminus Y) = \emptyset$ we have $Y \subseteq \bigcup_{j=1}^{n} U_{i_j}$. $\square$

2.4.4 Remark

Unlike the case of metric spaces, a compact subspace of a topological space is not necessarily closed: for example, any subspace of a finite topological space is compact, but not every subset may be closed. This, in fact, is one of the motivations of separation axioms, which can be used to remedy this situation, and which will be discussed in Section 5 below.

3 Baire’s Category Theorem

3.1

A subset $A$ of a topological space $X$ is said to be nowhere dense if $X \smallsetminus \overline{A}$ is dense in $X$, that is, if $\overline{X \smallsetminus \overline{A}} = X$ (recall that $\overline{A}$ denotes the closure of $A$, i.e. the intersection of all closed subsets of $X$ which contain $A$).

In other words, $A$ is nowhere dense if and only if for each non-empty open $U$ the intersection $U \cap (X \smallsetminus \overline{A})$ is non-empty. Consequently we obtain

3.2 Observation. A union of finitely many nowhere dense subsets of $X$ is nowhere dense.

(If $A, B$ are nowhere dense and $U$ is non-empty open then $U \cap (X \smallsetminus \overline{A})$ is non-empty open and hence $U \cap (X \smallsetminus \overline{A}) \cap (X \smallsetminus \overline{B}) = U \cap (X \smallsetminus (\overline{A} \cup \overline{B})) = U \cap (X \smallsetminus \overline{A \cup B})$ is non-empty.)

3.3

A subset $A \subseteq X$ is of the first category (or meager) in $X$ if $A$ is a countable union

$$\bigcup_{n=1}^{\infty} A_n$$

with $A_n$ nowhere dense. From 3.2 we immediately see that

a subset $A \subseteq X$ is of the first category in $X$ if and only if it is a union $\bigcup_{n=1}^{\infty} A_n$ of an increasing sequence $A_1 \subseteq A_2 \subseteq \cdots$ of nowhere dense subsets.

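3.4 Theorem. (Baire's Category Theorem) If $X$ is a complete metric space then $X$ is not of the first category in $X$.

Proof. Let

$$A_1 \subseteq A_2 \subseteq \cdots \subseteq A_n \subseteq \cdots$$

be an increasing sequence of nowhere dense subsets of a complete metric space $X$. We will prove that $X \smallsetminus \bigcup_n A_n \ne \emptyset$.

Since the closure of a nowhere dense set is nowhere dense, we may assume without loss of generality that the sets $A_n$ are closed.

The set $A_1$ is nowhere dense and closed, and hence there exist an $x_1 \in X \smallsetminus A_1$ and an $\varepsilon_1$, $0 < \varepsilon_1 < 1$, such that $\Omega(x_1, 2\varepsilon_1) \cap A_1 = \emptyset$.

Now $\Omega(x_1, \varepsilon_1)$ is a non-empty open set and hence $\Omega(x_1, \varepsilon_1) \cap (X \smallsetminus A_2) \ne \emptyset$, and we have an $x_2$ and an $\varepsilon_2$, $0 < \varepsilon_2 < \tfrac{1}{2}$, such that $\Omega(x_2, 2\varepsilon_2) \subseteq \Omega(x_1, \varepsilon_1) \cap (X \smallsetminus A_2)$, i.e.

$$\Omega(x_2, 2\varepsilon_2) \cap A_2 = \emptyset \quad \text{and} \quad \Omega(x_2, 2\varepsilon_2) \subseteq \Omega(x_1, \varepsilon_1).$$

Now assume we already have $x_1, \dots, x_n$ and $\varepsilon_1, \dots, \varepsilon_n$, $0 < \varepsilon_k < \tfrac{1}{k}$, such that

$$\Omega(x_k, 2\varepsilon_k) \cap A_k = \emptyset \text{ for } k \le n, \quad \text{and}$$

$$\Omega(x_k, 2\varepsilon_k) \subseteq \Omega(x_{k-1}, \varepsilon_{k-1}) \text{ for } 1 < k \le n.$$

Since $\Omega(x_n, \varepsilon_n)$ is a non-empty open set, we have a non-empty open $\Omega(x_n, \varepsilon_n) \cap (X \smallsetminus A_{n+1})$ and hence there is an $x_{n+1}$ and an $\varepsilon_{n+1}$ with $0 < \varepsilon_{n+1} < \tfrac{1}{n+1}$ such that

$$\Omega(x_{n+1}, 2\varepsilon_{n+1}) \cap A_{n+1} = \emptyset \quad \text{and} \quad \Omega(x_{n+1}, 2\varepsilon_{n+1}) \subseteq \Omega(x_n, \varepsilon_n).$$

Since $\overline{\Omega(x, \varepsilon)} \subseteq \Omega(x, 2\varepsilon)$ (if $d(y, \Omega(x, \varepsilon)) = 0$ we can find a $z \in \Omega(x, \varepsilon)$ such that $d(y, z) < \varepsilon$), setting $B_n = \overline{\Omega(x_n, \varepsilon_n)}$ we obtain a sequence

$$\Omega(x_1, 2\varepsilon_1) \supseteq B_1 \supseteq \Omega(x_2, 2\varepsilon_2) \supseteq B_2 \supseteq \Omega(x_3, 2\varepsilon_3) \supseteq B_3 \supseteq \cdots$$

such that $\Omega(x_k, 2\varepsilon_k) \cap A_k = \emptyset$ (and hence $B_k \cap A_k = \emptyset$).

For $k \ge n$ we have $x_k \in \Omega(x_n, 2\varepsilon_n)$ and since $\varepsilon_n < \tfrac{1}{n}$ the sequence $(x_n)_n$ is Cauchy, and by completeness it has a limit $x \in X$. Furthermore, for $k \ge n$ we have $x_k \in B_n$, and since $B_n$ is closed, $x \in B_n$. Thus, $x \in \bigcap B_n$. Since $B_n \cap A_n = \emptyset$ we have $\bigcap B_k \cap A_n = \emptyset$ and finally $\bigcap B_k \cap \bigcup A_n = \emptyset$. Therefore, $x \notin \bigcup A_n$. $\square$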


4 Completion

4.1

Let $X = (X, d)$ be a metric space. On the set of Cauchy sequences $(x_n)_n$ in $X$, introduce an equivalence relation

$$(x_n)_n \sim (x'_n)_n \ \equiv_{\mathrm{df}}\ \lim_n d(x_n, x'_n) = 0$$

($\sim$ is obviously reflexive and symmetric, and the transitivity immediately follows from the triangle inequality).

4.2 Lemma. 1. If $(x_n)_n$ and $(y_n)_n$ are Cauchy sequences in $X$ then $(d(x_n, y_n))_n$ is a Cauchy, and hence convergent, sequence in $\mathbb{R}$.

2. If $(x_n)_n \sim (x'_n)_n$ and $(y_n)_n \sim (y'_n)_n$ then $\lim_n d(x_n, y_n) = \lim_n d(x'_n, y'_n)$.

Proof. 1. From the triangle inequality, we immediately see that

$$|d(x_m, y_m) - d(x_n, y_n)| \le d(x_m, x_n) + d(y_m, y_n).$$

Thus, if $d(x_m, x_n), d(y_m, y_n) < \tfrac{\varepsilon}{2}$, then $|d(x_m, y_m) - d(x_n, y_n)| < \varepsilon$.

2. $d(x_n, y_n) \le d(x_n, x'_n) + d(x'_n, y'_n) + d(y'_n, y_n)$ and hence $\lim d(x_n, y_n) \le \lim d(x'_n, y'_n)$, and by symmetry also $\lim d(x'_n, y'_n) \le \lim d(x_n, y_n)$. $\square$

4.3

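Denote by $\widetilde{X}$ the set of all the $\sim$-equivalence classes of Cauchy sequences in $(X, d)$. For $\xi, \eta \in \widetilde{X}$, define

$$\widetilde{d}(\xi, \eta) = \lim d(x_n, y_n) \quad \text{where } (x_n)_n \in \xi \text{ and } (y_n)_n \in \eta.$$

Observation. $\widetilde{X} = (\widetilde{X}, \widetilde{d})$ is a metric space.

(The definition of $\widetilde{d}$ is correct by 4.2: obviously $\widetilde{d}$ is symmetric and satisfies the triangle inequality, and if $\widetilde{d}(\xi, \eta) = 0$ and $(x_n)_n \in \xi$, $(y_n)_n \in \eta$ then we obtain $(x_n)_n \sim (y_n)_n$ by comparing the definitions of $\sim$ and $\widetilde{d}$.)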

4.4

A bijection (i.e. a one-to-one onto map) $f : (X, d) \to (X', d')$ is called an isometry if

$$\forall x, y \quad d'(f(x), f(y)) = d(x, y). \qquad (*)$$

(Note that (*) implies that $f$ is one-to-one. Thus, to verify that a mapping satisfying this condition is an isometry it suffices to prove that it is onto.)

If such a mapping exists we say that the spaces $(X, d)$ and $(X', d')$ are isometric. A map satisfying the condition (*) without assuming that it is onto will be called an isometric embedding.

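Proposition. Every metric space is isometric to a dense subspace of a complete metric space.

Proof. For $x \in X$ define $\widetilde{x} \in \widetilde{X}$ as the class containing the constant sequence

$$x, x, x, \dots.$$

Obviously the mapping

$$\iota = (x \mapsto \widetilde{x}) : X \to \iota[X] = \{\widetilde{x} \mid x \in X\} \subseteq \widetilde{X}$$

is an isometry.

I. $\iota[X]$ is dense in $\widetilde{X}$. Consider an arbitrary $\varepsilon > 0$. For a $\xi \in \widetilde{X}$, choose a representative $(x_n)_n$ and an $n_0$ such that $d(x_m, x_n) < \varepsilon$ for $m, n \ge n_0$. Then

$$\widetilde{d}(\xi, \widetilde{x}_{n_0}) = \lim_m d(x_m, x_{n_0}) \le \varepsilon.$$

II. $\widetilde{X}$ is complete. Let $(\xi_n)_n$ be a Cauchy sequence in $\widetilde{X}$. Since $\iota[X]$ is a dense subset of $\widetilde{X}$, we can choose an $x_n \in X$ such that

$$\widetilde{d}(\xi_n, \widetilde{x}_n) < \tfrac{1}{n}.$$

For an $\varepsilon > 0$, choose an $n_0$ such that $\widetilde{d}(\xi_m, \xi_n) < \varepsilon$ whenever $m, n \ge n_0$. Then

$$d(x_m, x_n) = \widetilde{d}(\widetilde{x}_m, \widetilde{x}_n) \le \widetilde{d}(\widetilde{x}_m, \xi_m) + \widetilde{d}(\xi_m, \xi_n) + \widetilde{d}(\xi_n, \widetilde{x}_n) < \tfrac{1}{m} + \varepsilon + \tfrac{1}{n}$$

and we see that $(x_n)_n$ is a Cauchy sequence.

Denote by $\xi$ the equivalence class of $(x_n)_n$. We will show that this $\xi$ is a limit, in $\widetilde{X}$, of the sequence $(\xi_n)_n$. Take an arbitrary $\varepsilon > 0$ and an $n_0$ such that $\tfrac{1}{n_0} < \tfrac{\varepsilon}{2}$ and for $n, k \ge n_0$, $d(x_n, x_k) < \tfrac{\varepsilon}{2}$ (this can be done since we already know that $(x_n)_n$ is a Cauchy sequence). Then for $n \ge n_0$, we have

$$\widetilde{d}(\xi_n, \xi) \le \widetilde{d}(\xi_n, \widetilde{x}_n) + \widetilde{d}(\widetilde{x}_n, \xi) < \tfrac{\varepsilon}{2} + \lim_k d(x_n, x_k) \le \tfrac{\varepsilon}{2} + \tfrac{\varepsilon}{2} = \varepsilon. \qquad \square$$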
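For a concrete feeling of how distinct Cauchy sequences represent the same point of the completion, here is a small sketch over the rationals with exact arithmetic. It is only an illustration: the bisection scheme and the name `bisect_sqrt2` are assumptions chosen for this example. The lower and upper bisection bounds for a root of $x^2 = 2$ are both Cauchy sequences in $\mathbb{Q}$, and $d(x_n, x'_n) = 2^{-n} \to 0$, so they lie in the same equivalence class, although that class contains no constant sequence.

```python
from fractions import Fraction

def bisect_sqrt2(n):
    """Return the n-th lower and upper rational bounds (lo, hi) with lo^2 < 2 <= hi^2."""
    lo, hi = Fraction(1), Fraction(2)
    for _ in range(n):
        mid = (lo + hi) / 2
        if mid * mid < 2:
            lo = mid
        else:
            hi = mid
    return lo, hi

def d(a, b):
    """The metric d(a, b) = |a - b| on Q."""
    return abs(a - b)

# The two sequences are equivalent: their mutual distance is exactly 2^(-n).
lo, hi = bisect_sqrt2(20)
gap = d(lo, hi)

# The distance to the class of the constant sequence 3/2 stays bounded away from 0.
to_const = d(lo, Fraction(3, 2))
```

Because all values are `Fraction`s, no real numbers are used anywhere, in the spirit of Exercise (7) below.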


4.5

An isometric embedding of a metric space $X$ into a complete metric space with a dense image is called a completion of $X$.

Proposition. Up to isometry, there exists precisely one completion of a metric space $X$. More precisely, in the notation of 4.4, if $\varphi : X \to Y$ is a completion then there exists an isometry $f : \widetilde{X} \to Y$ such that $f \circ \iota = \varphi$.

Proof. If we denote by $\rho$ the metric on $Y$, we have

$$\forall x, y \in X, \quad \rho(\varphi(x), \varphi(y)) = d(x, y) \quad \text{and} \quad \overline{\varphi[X]} = Y.$$

For a $\xi \in \widetilde{X}$, choose a representative $(x_n)_n$ and put

$$f(\xi) = \lim_n \varphi(x_n) \quad (\text{in } Y)$$

(by the isometric embedding requirement, $(\varphi(x_n))_n$ is Cauchy and hence convergent in $Y$; if $(x_n)_n \sim (y_n)_n$, then again by the isometric embedding requirement,

$$\lim_n \rho(\varphi(x_n), \varphi(y_n)) = \lim d(x_n, y_n) = 0,$$

and hence $\lim_n \varphi(x_n) = \lim_n \varphi(y_n)$ so that the definition does not depend on the choice of a representative).

We have $f(\widetilde{x}) = \varphi(x)$ (the limit of a constant sequence), and since a metric is (obviously) a continuous function, we have

$$\rho(f(\xi), f(\eta)) = \rho(\lim_n \varphi(x_n), \lim_n \varphi(y_n)) = \lim \rho(\varphi(x_n), \varphi(y_n)) = \lim d(x_n, y_n) = \widetilde{d}(\xi, \eta).$$

Thus, $f$ is an isometric embedding, and it remains to show that $f$ is onto. Take a $y \in Y$. Since $\varphi[X]$ is dense, there exists a sequence $(x_n)_n$ in $X$ such that $\lim \varphi(x_n) = y$. Thus in particular $(\varphi(x_n))_n$ is Cauchy, and, since $\varphi$ is an isometric embedding, so is $(x_n)_n$. If we denote by $\xi$ the equivalence class of $(x_n)_n$, we obtain $f(\xi) = \lim \varphi(x_n) = y$. $\square$

4.6 Extension of uniformly continuous maps

When discussing the Fourier transform in Chapter 17, we will need the following important result on extension of uniformly continuous maps to the completion.

Proposition. Let $(X, d)$, $(X', d')$ be metric spaces, let $(X', d')$ be complete and let $Y$ be a dense subspace of $X$. Then each uniformly continuous $f : Y \to X'$ has a unique uniformly continuous extension $g : X \to X'$.

Proof. For an $x \in X$, choose a sequence $(x_n)_n$ in $Y$ such that $\lim x_n = x$, and set $g(x) = \lim_n f(x_n)$. (Clearly, this definition is forced by the assumption of uniform continuity of $g$, which already proves uniqueness.) Let us show that this is a correct definition of a mapping: $(x_n)_n$ is a Cauchy sequence, hence, by the uniform continuity of $f$, $(f(x_n))_n$ is Cauchy and hence convergent; if $(y_n)_n$ is another sequence in $Y$ converging to $x$ we have a Cauchy sequence $f(x_1), f(y_1), f(x_2), f(y_2), \dots, f(x_n), f(y_n), \dots$ converging to both $\lim_n f(x_n)$ and $\lim_n f(y_n)$. Considering the constant sequence, $g(x) = f(x)$ for $x \in Y$.

Now let $\varepsilon > 0$. Choose $\varepsilon_1, \eta > 0$ such that $\varepsilon_1 + 2\eta < \varepsilon$, and a $\delta > 0$ such that $d(u, v) < \delta$ implies $d'(f(u), f(v)) < \varepsilon_1$ for $u, v \in Y$. Let $d(x, y) < \delta$, and let $(x_n)_n$, $(y_n)_n$ be sequences in $Y$ converging to $x$ and $y$ respectively. Choose $n$ sufficiently large such that

$$d(x_n, y_n) < \delta \quad \text{and} \quad d'(f(x_n), g(x)),\ d'(f(y_n), g(y)) < \eta.$$

Then

$$d'(g(x), g(y)) \le d'(g(x), f(x_n)) + d'(f(x_n), f(y_n)) + d'(f(y_n), g(y)) < \eta + \varepsilon_1 + \eta < \varepsilon. \qquad \square$$

5 More on topological spaces: Separation

Topological spaces are seldom used in the generality of Chapter 2, Section 4. For various purposes, extra assumptions are usually added. In analysis, we typically encounter so-called separation axioms (in fact, typically, the stronger ones), which we will briefly introduce in this section. It is worth noting that in this context, separation refers to separation of points or subsets by open sets; it is not related to separability as defined in Section 1 above.

5.1 T0 and T1

A topological space is said to be $T_0$ if for any two distinct points $x, y \in X$ there exists an open set $U$ such that either $x \notin U \ni y$ or $y \notin U \ni x$. This is equivalent to requiring that $\overline{\{x\}} = \overline{\{y\}}$ implies that $x = y$.

A space is said to be $T_1$ if for any two distinct points $x, y \in X$ there is an open set $U$ such that $y \notin U \ni x$. This is equivalent to requiring that every finite set be closed.

It should be noted that while there is not much use for spaces that are not $T_0$, spaces which are not $T_1$ are used a lot (typically, however, in applications outside analysis).

5.2 T2, or the Hausdorff axiom

A space is Hausdorff (or, $T_2$) if for any two distinct points $x, y \in X$ there are disjoint open sets $U, V$ such that $x \in U$ and $y \in V$.

Hausdorff spaces are already "analysis-friendly"; for instance they admit concepts of convergence in which limits are unique. We will not discuss such topics but will present the following fact which has been promised before.

5.2.1 Proposition. In a Hausdorff space every compact subset is closed.

Proof. Let $A \subseteq X$ be a compact subspace. Fix an $x \notin A$. We will prove that there is a neighborhood of $x$ that is disjoint from $A$.

For each $a \in A$ choose disjoint open sets $U_a \ni a$ and $V_a \ni x$. Then $\{U_a \mid a \in A\}$ is a cover of $A$ and hence there is a finite subcover $U_{a_1}, \dots, U_{a_n}$. Set $V = \bigcap_{i=1}^{n} V_{a_i}$. Then $V \cap \bigcup_{i=1}^{n} U_{a_i} = \emptyset$ and hence $V \cap A = \emptyset$. $\square$

From 2.4.1, we obtain the following generalization of 7.2 of Chapter 2.

5.2.2 Corollary. Let $f : X \to Y$ be a continuous map, let $X$ be compact and let $Y$ be Hausdorff. Then for every closed $A \subseteq X$, the image $f[A]$ is closed. Thus in particular such an $f : X \to Y$ that is bijective is a homeomorphism.

5.3 Regularity and complete regularity ($T_3$ and $T_{3\frac{1}{2}}$)

A space $X$ is regular, or $T_3$, if for every $x \in X$ and every closed $A \subseteq X$ such that $x \notin A$, there are open disjoint $U, V$ such that $x \in U$ and $A \subseteq V$.

$X$ is completely regular, or $T_{3\frac{1}{2}}$, if for every $x \in X$ and every closed $A \subseteq X$ such that $x \notin A$ there is a continuous mapping $\varphi : X \to \langle 0, 1\rangle$ such that $\varphi(x) = 0$ and $\varphi[A] \subseteq \{1\}$.

Obviously a completely regular space is regular: take the assumed $\varphi$ and set $U = \varphi^{-1}[\langle 0, \tfrac{1}{2})]$ and $V = \varphi^{-1}[(\tfrac{1}{2}, 1\rangle]$.

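5.3.1 Proposition. A topological space $X$ is regular if and only if for every open $U \subseteq X$,

$$U = \bigcup \{V \mid V \text{ open},\ \overline{V} \subseteq U\}.$$

Proof. I. Let $X$ be regular and let $x \in U$. Then $x \notin X \smallsetminus U$ and there are disjoint open sets $V \ni x$ and $W \supseteq X \smallsetminus U$. Now $V \subseteq X \smallsetminus W \subseteq U$ and since $X \smallsetminus W$ is closed, $\overline{V} \subseteq U$.

II. Let the condition hold, let $A$ be closed, and let $x \notin A$. Then

$$x \in \bigcup \{V \mid V \text{ open},\ \overline{V} \subseteq X \smallsetminus A\}$$

and hence there is an open set $V \ni x$ such that $A \subseteq X \smallsetminus \overline{V}$. The sets $V$ and $X \smallsetminus \overline{V}$ are open and disjoint. $\square$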

5.4 Normality

A space is normal (or $T_4$) if for any two disjoint closed subsets $A, B \subseteq X$, there exist disjoint open sets $U, V$ such that $A \subseteq U$ and $B \subseteq V$.

5.4.1 Remarks

1. After 5.3, the reader may expect an axiom $T_{4\frac{1}{2}}$ requiring a separation of disjoint closed sets by continuous real functions. This, however, already follows from normality as we will see in 5.4.6 below. On the other hand, complete regularity does not follow from regularity.

2. Of course we have $T_2 \Rightarrow T_1 \Rightarrow T_0$ while we do not have such implications for the higher separation axioms ($T_3$ does not imply $T_2$, $T_4$ does not imply $T_3$). The reason is that the higher separation axioms in fact do not require that points be closed. In practice, one usually works with $T_3 \,\&\, T_1$, $T_{3\frac{1}{2}} \,\&\, T_1$ and $T_4 \,\&\, T_1$, and then the expected implications from "higher" to "lower" separation axioms naturally hold.

5.4.2 Proposition. Every metric space .X; d/ is normal.

Proof (Recall 8.4 of Chapter 2). For disjoint closed sets $A, B \subseteq X$ define a mapping

$$\varphi : X \to \langle 0, 1\rangle$$

by setting

$$\varphi(x) = \frac{d(x, A)}{d(x, A) + d(x, B)}.$$

Since $A, B$ are closed and disjoint we cannot have simultaneously $d(x, A) = 0$ and $d(x, B) = 0$, and hence $d(x, A) + d(x, B) > 0$ for all $x$. Thus, $\varphi$ is continuous and we can take $U = \varphi^{-1}[\langle 0, \tfrac{1}{2})]$ and $V = \varphi^{-1}[(\tfrac{1}{2}, 1\rangle]$. $\square$
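The separating function of this proof is easy to evaluate. The following is a minimal sketch, assuming $X = \mathbb{R}$ with the usual metric and finite (hence closed) disjoint sets $A$, $B$ given as Python lists; the names `dist_to_set` and `phi` are illustrative only.

```python
def dist_to_set(x, S):
    """d(x, S) = min over s in S of |x - s|, for a finite non-empty S."""
    return min(abs(x - s) for s in S)

def phi(x, A, B):
    """phi(x) = d(x, A) / (d(x, A) + d(x, B)); A and B must be disjoint."""
    dA, dB = dist_to_set(x, A), dist_to_set(x, B)
    return dA / (dA + dB)

A = [0.0, 1.0]
B = [2.0, 3.5]

# phi vanishes on A and equals 1 on B; U = {phi < 1/2} and V = {phi > 1/2}
# are then disjoint open sets containing A and B respectively.
in_U = phi(1.2, A, B) < 0.5
in_V = phi(1.9, A, B) > 0.5
```

The denominator is never zero here because no point has distance $0$ to both sets, which is the observation the proof relies on.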


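5.4.3 Proposition. Every regular Lindelöf topological space is normal.

Proof. Let $X$ be regular Lindelöf and let $A, B$ be closed and disjoint sets. For $a \in A$, choose open disjoint sets $U_a \ni a$ and $V'_a \supseteq B$. Then $\{U_a \mid a \in A\} \cup \{X \smallsetminus A\}$ is a cover of $X$ and therefore we have a countable subcover

$$X \smallsetminus A, U_1, \dots, U_n, \dots.$$

Thus we have obtained open sets

$$U_1, \dots, U_n, \dots \quad \text{such that} \quad \bigcup_n U_n \supseteq A \text{ and } \overline{U}_n \cap B = \emptyset.$$

Taking, instead, the unions $U_1, U_1 \cup U_2, U_1 \cup U_2 \cup U_3, \dots$ we can assume that

$$U_1 \subseteq U_2 \subseteq \cdots \subseteq U_n \subseteq \cdots.$$

Similarly we can find open sets

$$V_1 \subseteq V_2 \subseteq \cdots \subseteq V_n \subseteq \cdots \quad \text{such that} \quad \bigcup_n V_n \supseteq B \text{ and } \overline{V}_n \cap A = \emptyset.$$

Now set

$$\widetilde{U}_n = U_n \smallsetminus \bigcup_{j=1}^{n} \overline{V}_j, \quad U = \bigcup_n \widetilde{U}_n, \quad \text{and}$$

$$\widetilde{V}_n = V_n \smallsetminus \bigcup_{j=1}^{n} \overline{U}_j, \quad V = \bigcup_n \widetilde{V}_n.$$

We have $A \subseteq U$ (no point of $A$ appears in any of the subtracted $\overline{V}_j$) and $B \subseteq V$, and $U \cap V = \bigcup_{m,n} (\widetilde{U}_m \cap \widetilde{V}_n) = \emptyset$, since in any of the intersections $\widetilde{U}_m \cap \widetilde{V}_n$, we have either $m \le n$ or $m \ge n$. $\square$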

5.4.4 Proposition. Every compact Hausdorff space is normal.

Proof. By 5.4.3, it suffices to prove that the space is regular. Let $A$ be closed and $x \notin A$. For $a \in A$ choose disjoint open sets $U_a \ni a$ and $V_a \ni x$. Then $\{U_a \mid a \in A\}$ is a cover of $A$ and hence (as $A$ is compact by 2.4.3) there is a finite subcover $U_{a_1}, \dots, U_{a_n}$. Set $U = \bigcup_{i=1}^{n} U_{a_i}$ and $V = \bigcap_{i=1}^{n} V_{a_i}$. Then $x \in V$, $A \subseteq U$ and $U \cap V = \emptyset$. $\square$


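5.4.5 Lemma. Let $Q \subseteq \langle 0, 1\rangle$ be a dense subset. Let us have in a topological space $X$ open sets $U_q$, $q \in Q$, such that

$$q < r \ \Rightarrow\ \overline{U}_q \subseteq U_r.$$

Define a mapping $\varphi : X \to \langle 0, 1\rangle$ by setting

$$\varphi(x) = \inf\{q \mid x \in U_q\}.$$

Then $\varphi$ is continuous.

Proof. Set $M(x) = \{q \mid x \in U_q\}$. Since obviously $q \in M(x)$ and $q < r$ imply $r \in M(x)$, we have $q > \varphi(x) \Rightarrow x \in U_q$ and hence

$$x \notin U_q \ \Rightarrow\ \varphi(x) \ge q. \qquad (*)$$

For $q < \varphi(x)$ take an $r$ with $q < r < \varphi(x)$; then $x \notin U_r$ and we see that

$$q < \varphi(x) \ \Rightarrow\ x \notin \overline{U}_q. \qquad (**)$$

Let $\varphi(x) \in (\alpha, \beta)$ (the cases $\varphi(x) = 0$ or $1$ are only simpler and can be left to the reader). Choose $q, r \in Q$ with $\alpha < q < \varphi(x) < r < \beta$. Then by the implications above,

$$x \in U_r \smallsetminus \overline{U}_q \quad \text{and} \quad \forall y \in U_r \smallsetminus \overline{U}_q,\ \varphi(y) \in (\alpha, \beta).$$

Thus, the neighborhood $U_r \smallsetminus \overline{U}_q$ of $x$ is being mapped into $(\alpha, \beta)$ and we see that $\varphi$ is continuous. $\square$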

5.4.6 Proposition. (Urysohn's Theorem) Let $A, B$ be disjoint closed subsets of a normal space $X$. Then there is a continuous mapping $\varphi : X \to \langle 0, 1\rangle$ such that $\varphi[A] \subseteq \{0\}$ and $\varphi[B] \subseteq \{1\}$.

Proof. Let $Q$ be the set of all dyadic rationals between 0 and 1, that is, the numbers

$$\frac{k}{2^n}, \quad n = 1, 2, \dots;\ k = 1, 2, \dots, 2^n - 1.$$

Choose disjoint open $U(\tfrac{1}{2})$, $V$ such that $A \subseteq U(\tfrac{1}{2})$ and $B \subseteq V$ (so that $\overline{U(\tfrac{1}{2})} \subseteq X \smallsetminus B$). Now let $U(\tfrac{k}{2^m})$ be already chosen for $m \le n$ so that

$$q < r \ \Rightarrow\ \overline{U(q)} \subseteq U(r).$$

For $k = 0, \dots, 2^n - 1$, choose disjoint open sets $U(\tfrac{2k+1}{2^{n+1}})$, $V$ such that

$$\overline{U(\tfrac{k}{2^n})} \subseteq U(\tfrac{2k+1}{2^{n+1}}) \quad \text{and} \quad X \smallsetminus U(\tfrac{k+1}{2^n}) \subseteq V \quad \bigl(\text{and hence } \overline{U(\tfrac{2k+1}{2^{n+1}})} \subseteq U(\tfrac{k+1}{2^n})\bigr)$$

where for $k = 0$ we take the set $A$ instead of $\overline{U(0)}$ and for $k = 2^n - 1$ we take $B$ instead of $X \smallsetminus U(1)$.

Thus we obtain inductively a system $U(q)$, $q \in Q$, satisfying the requirements of Lemma 5.4.5, and the statement follows. $\square$

5.4.7 Remarks

1. In particular, every Lindelöf regular space is completely regular. It should be noted that, with the exception of $T_3 \not\Rightarrow T_{3\frac{1}{2}}$, proving that a lower separation axiom does not imply a higher one is easy. This exception, on the contrary, was a hard nut to crack (and had been an open problem for quite some time). Proposition 5.4.6 shows why: the counterexample has to use uncountable reasoning in a substantial way.

2. Lemma 5.4.5 can be used to reformulate complete regularity without referring to the real numbers. Recall 5.3.1. Denote by $\prec$ the relation

$$V \prec U \ \equiv_{\mathrm{df}}\ \overline{V} \subseteq U.$$

It is in general not interpolative (that is, for $V \prec U$ we do not necessarily have a $W$ such that $V \prec W \prec U$). If we denote by $\prec\!\prec$ the largest interpolative subrelation of $\prec$ then completely regular spaces can be characterized as those where each open $U$ is the union $\bigcup \{V \mid V \prec\!\prec U\}$.

6 The space of continuous functions revisited: The Arzelà-Ascoli Theorem and the Stone-Weierstrass Theorem

Certain very strong theorems hold about the space $C(K)$ of (necessarily bounded) continuous real functions on a compact metric space $K$ with the supremum metric considered in 7.7 of Chapter 2. We will prove two such results in this section, and use them in Chapters 10 and 17 below.

6.1 The Arzela-Ascoli Theorem

A sequence of functions $f_n \in C(K)$ is called uniformly bounded if there exists a number $M$ such that $|f_n(x)| < M$ for every $n$ and every $x \in K$. Therefore, being uniformly bounded is the same thing as $f_n \in \Omega(0, M)$ for all $n$, for a fixed $M > 0$, where $0$ is the constant zero function. Additionally, the sequence of functions $(f_n)_n$ is called equicontinuous if for every $\varepsilon > 0$, there exists a $\delta > 0$ such that for every $x, y \in K$ and every $n \in \mathbb{N}$,

$$d(x, y) < \delta \ \Rightarrow\ |f_n(x) - f_n(y)| < \varepsilon.$$

Thus, this means that the functions $f_n$ are all uniformly continuous with the same bound $\delta$ depending on $\varepsilon$, independent of $n$.

6.2 Theorem. (The Arzelà-Ascoli Theorem) Let $K$ be a compact metric space. Then any uniformly bounded equicontinuous sequence of functions $(f_n)_n$ in $C(K)$ has a uniformly convergent subsequence (i.e. a subsequence convergent in $C(K)$).

Proof. By Theorem 2.2, the space $K$ is totally bounded. Therefore, for each $\varepsilon > 0$, there is a finite subset $S_\varepsilon \subseteq K$ such that for every $x \in K$, $d(x, y) < \varepsilon$ for at least one $y \in S_\varepsilon$.

Now let

$$S = \bigcup_k S_{1/k} = \{x_1, x_2, x_3, \dots\}.$$

Since the sequence $(f_n)_n$ is uniformly bounded, the real sequence $(f_n(x_1))_n$ is bounded, so there exists a subsequence $(f_{i^1_n})_n$ of $(f_n)_n$ such that the sequence $(f_{i^1_n}(x_1))_n$ converges. Next, there exists a subsequence $(f_{i^2_n})_n$ of $(f_{i^1_n})_n$ such that $(f_{i^2_n}(x_2))_n$ converges. Repeating this procedure, we may successively pick subsequences $(f_{i^j_n})_n$ such that $(f_{i^j_n}(x_j))_n$ converges. Note however that then, since we picked each sequence as a subsequence of the previous one, the "diagonal" subsequence $(f_{i^n_n})_n$ converges at every point of $S$.

Now let $\varepsilon > 0$. Take $\delta = \delta(\varepsilon/3)$ for $\varepsilon/3$ from the definition of equicontinuity, choose $k$ with $1/k \le \delta$, and let $N$ be such that for $m, n > N$, $|f_{i^m_m}(s) - f_{i^n_n}(s)| < \varepsilon/3$ for every $s \in S_{1/k}$ (this is possible since $S_{1/k}$ is finite). For every $x \in K$ there exists an $s \in S_{1/k}$ with $d(x, s) < 1/k \le \delta$, and hence, by the triangle inequality,

$$|f_{i^m_m}(x) - f_{i^n_n}(x)| \le |f_{i^m_m}(x) - f_{i^m_m}(s)| + |f_{i^m_m}(s) - f_{i^n_n}(s)| + |f_{i^n_n}(s) - f_{i^n_n}(x)| < \varepsilon,$$

showing that the subsequence $(f_{i^n_n})_n$ is Cauchy in $C(K)$. Since however $C(K)$ is complete (by Proposition 7.7.2), this subsequence converges in $C(K)$. $\square$

Sometimes we are interested in working in the space $C(X)$ of bounded real continuous functions on a space $X$ which is not compact. In that case, the assumptions of equicontinuity and uniform boundedness, and consequently the conclusion of uniform convergence, are often too strong. One strategy for getting around this is the following: We say that a topological space $X$ is $\sigma$-compact if it is a union of countably many compact subsets.

6.3 Theorem. Suppose that $X$ is a $\sigma$-compact metric space. Then every sequence $(f_n)_n$ in $C(X)$ which is equicontinuous and bounded on every compact $K \subseteq X$ has a subsequence which is uniformly convergent on every compact $K \subseteq X$.

Proof. We use the "diagonal method" one more time. Let

$$X = \bigcup_{n=1}^{\infty} K_n$$

for $K_n$ compact. Then using Theorem 6.2, choose a subsequence $(f_{i^1_n})_n$ which converges uniformly on $K_1$. Within this subsequence, choose another subsequence $(f_{i^2_n})_n$ which converges uniformly on $K_2$. Proceeding in the same way, keep choosing consecutive subsequences, so that $(f_{i^j_n})_n$ converges uniformly on $K_j$. Then the "diagonal" subsequence $(f_{i^n_n})_n$ satisfies the requirement. $\square$

Another important problem in analysis is approximation, i.e. the problem of finding a convenient subset dense in a given metric space $X$. We will now prove a very strong approximation theorem for the space $C(K)$ of real functions on a compact metric space $K$, for which we will find an application in Chapter 17 below, in our treatment of Fourier series.

6.4 The Stone-Weierstrass Theorem: Assumptions and statement

Notice that the space $C(K)$ has the structure of a vector space over $\mathbb{R}$, and that the operations of addition and multiplication by a scalar are continuous. In addition to this, $C(K)$ also has an operation of product of functions, which is also continuous. We will consider subsets $A \subseteq C(K)$ satisfying the following assumptions:

(1) $A$ is a vector subspace of $C(K)$, contains the constant function $1$ with value $1$, and for $f, g \in A$, we have $f \cdot g \in A$. (We say that $A$ is a unital subalgebra of $C(K)$.)

(2) For any two distinct points $x, y \in K$, there exists a function $f \in A$ such that $f(x) \ne f(y)$ (we say that $A$ separates points).

6.4.1 Theorem. (The Stone-Weierstrass Theorem) Let $A$ be a unital subalgebra of $C(K)$ which separates points. Then $A$ is a dense subset of $C(K)$.

The proof of this theorem will occupy the remainder of this section. However, let us observe one thing right away: since the operations of addition of functions, multiplication of functions and multiplication by a scalar are continuous functions $C(K) \times C(K) \to C(K)$, $\mathbb{R} \times C(K) \to C(K)$, the closure of a unital subalgebra is a unital subalgebra. Therefore, the statement of the theorem will follow if we can prove that every closed unital subalgebra of $C(K)$ which separates points is equal to $C(K)$.

6.5

An important step in the proof of the theorem is the fact that the square root (and hence the absolute value) of a non-negative continuous function on a compact interval is a uniform limit of polynomials. To prove this, we use the Taylor expansion of $\sqrt{1 - x}$.

Lemma. Let $0 < b < 1$. Then the Taylor expansion of $\sqrt{1 - x}$ at the point $x = 0$ converges absolutely and uniformly in the interval $\langle -b, b\rangle$.

While it is possible to prove this fact in an elementary way, a much easier proof will follow from the methods of complex analysis. Because of this, we will skip the proof at this point, and refer the reader to Exercise (8) of Chapter 10 where we define rigorously the function $\sqrt{1 - x}$ for $x \in \mathbb{C}$, $\mathrm{Re}(x) < 1$, and prove that the (complex) radius of convergence of its Taylor series is 1.

Comment: In fact, using a lemma of Abel's, the upper bound of uniform convergence can be extended to 1. However, we do not need that fact.

6.6 Lemma. Let $A \subseteq C(K)$ be a closed unital subalgebra.

(1) If $f \in A$ and $f \ge 0$, then $\sqrt{f} \in A$.

(2) If $f \in A$, then $|f| \in A$.

(3) If $f, g \in A$, then $\max(f, g), \min(f, g) \in A$.

Proof. (1) Without loss of generality, $\max_K |f| < 1$. By Lemma 6.5 (applied with $x = 1 - 1/n - f$), the series

$$\sqrt{f + 1/n} = \sum_{k=0}^{\infty} \binom{1/2}{k} (f + 1/n - 1)^k$$

converges uniformly for $n = 1, 2, \dots$, and hence

$$\sqrt{f + 1/n} \in A. \qquad (6.6.1)$$

The function $\sqrt{x}$ is continuous, and hence, by Theorem 6.6 of Chapter 2, uniformly continuous on $\langle 0, 2\rangle$, which implies that $\sqrt{f + 1/n}$ converges to $\sqrt{f}$ uniformly with $n \to \infty$, and hence $\sqrt{f} \in A$.

(2) This follows from the formula $|f| = \sqrt{f^2}$ and from (1).

(3) This follows from (2) and the fact that

$$\max(f, g) = \tfrac{1}{2}(f + g + |f - g|), \quad \min(f, g) = \tfrac{1}{2}(f + g - |f - g|). \qquad \square$$
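The role of the Taylor series in Lemmas 6.5 and 6.6 can be checked numerically. The sketch below is an illustration under stated assumptions: the function names, the cut-off of 60 terms and the test points are arbitrary choices, and the absolute value is approximated via $|t| = \sqrt{t^2} = \sqrt{1 - (1 - t^2)}$ for $0 < |t| \le 1$, in the spirit of part (2).

```python
import math

def sqrt_one_minus(x, terms=60):
    """Partial sum of the Taylor series of sqrt(1 - x) at 0:
    sum over k < terms of binom(1/2, k) * (-x)^k, intended for |x| <= b < 1."""
    total, coeff, power = 0.0, 1.0, 1.0
    for k in range(terms):
        total += coeff * power
        coeff *= (0.5 - k) / (k + 1)   # binom(1/2, k+1) from binom(1/2, k)
        power *= -x
    return total

def poly_abs(t, terms=60):
    """Polynomial in t approximating |t| = sqrt(1 - (1 - t^2)) for 0 < |t| <= 1."""
    return sqrt_one_minus(1.0 - t * t, terms)

def max_via_abs(a, b):
    """max(a, b) = (a + b + |a - b|) / 2, the identity used in part (3)."""
    return 0.5 * (a + b + abs(a - b))

err_sqrt = abs(sqrt_one_minus(0.5) - math.sqrt(0.5))
err_abs = abs(poly_abs(-0.8) - 0.8)
```

Near $t = 0$ the argument $1 - t^2$ approaches the boundary of the interval of convergence and the partial sums converge slowly, which is one way to see why the proof works with $f + 1/n$ rather than with $f$ directly.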

6.7 Proof of Theorem 6.4.1:

Let $A \subseteq C(K)$ be a closed unital subalgebra which separates points, and let $f \in C(K)$. Given $\varepsilon > 0$, we will construct a $g \in A$ such that for every $x \in K$,

$$|f(x) - g(x)| < \varepsilon. \qquad (*)$$

Since $\varepsilon > 0$ was arbitrary, this will imply that $f$ is a limit of a uniformly convergent sequence of elements of $A$, and hence $f \in A$ since $A$ is closed. Since $f$ was arbitrary, $A = C(K)$, which implies the statement of the theorem.

To construct $g$, consider two points $s \ne t \in K$. Since $A$ separates points, we may choose $h \in A$ such that $h(s) \ne h(t)$. Now define, for $v \in K$,

$$f_{s,t}(v) = f(s) + (f(t) - f(s)) \frac{h(v) - h(s)}{h(t) - h(s)}.$$

Clearly, $f_{s,t} \in A$, and

$$f_{s,t}(s) = f(s), \quad f_{s,t}(t) = f(t).$$

Now fixing $s$, let

$$U_t = \{v \in K \mid f_{s,t}(v) < f(v) + \varepsilon\}.$$

Then

$$U_t = (f_{s,t} - f)^{-1}[(-\infty, \varepsilon)],$$

and since $f_{s,t}, f$ are continuous, $U_t$ is open. On the other hand, $s, t \in U_t$, and hence $(U_t)_{t \ne s}$ is an open cover of $K$. Since $K$ is compact, this open cover has a finite subcover $(U_{t_1}, \dots, U_{t_m})$. Putting

$$h_s = \min(f_{s,t_1}, \dots, f_{s,t_m}),$$

we have $h_s \in A$ by Lemma 6.6, and

$$h_s < f + \varepsilon, \quad h_s(s) = f(s).$$

Now let

$$V_s = \{v \in K \mid h_s(v) > f(v) - \varepsilon\}.$$

Then

$$V_s = (h_s - f)^{-1}[(-\varepsilon, \infty)],$$

and hence $V_s$ is open. Since $s \in V_s$, $(V_s)_{s \in K}$ is an open cover of $K$. Since $K$ is compact, this cover has a finite subcover $(V_{s_1}, \dots, V_{s_p})$. Let

$$g = \max(h_{s_1}, \dots, h_{s_p}).$$

Then $g \in A$ by Lemma 6.6, and

$$f - \varepsilon < g < f + \varepsilon,$$

as desired. $\square$

7 Exercises

(1) Prove directly that a subspace of a separable metric space is separable.

(2) Prove that a subspace of a totally bounded metric space is totally bounded.

(3) Prove that a (finite) product of totally bounded metric spaces is totally bounded.

(4) Using Baire's theorem, prove that an increasing function $f : \langle 0, 1\rangle \to \mathbb{R}$ is Lipschitz on a dense subset of $\langle 0, 1\rangle$.

(5) Prove a modification of Baire's Category Theorem where "complete metric space" is replaced by "compact Hausdorff space".

(6) Prove that an onto isometry of metric spaces is a homeomorphism.

(7) A rigorous construction of real numbers. Note carefully that the field of real numbers $\mathbb{R}$ cannot be constructed as a completion of the metric space $\mathbb{Q}$ of rational numbers directly using 4.1, since the definition of the metric in Lemma 4.2 uses the real numbers, thereby making such an argument circular. Nevertheless, this difficulty can be circumvented, and the approach of 4.1 can be used to define $\mathbb{R}$ after all. Following the logically correct sequence of steps is the point of the present exercise.

    (a) Consider, on $\mathbb{Q}$, the metric $d(a,b) = |a - b|$. Now define $\mathbb{R}$ as the set of equivalence classes of Cauchy sequences with respect to the equivalence relation defined in 4.1. Prove that $\mathbb{R}$ is a field with respect to the operations of addition and multiplication of Cauchy sequences, which contains $\mathbb{Q}$ as the subfield of (equivalence classes of) constant sequences.

    (b) Write, for a Cauchy sequence $x = (x_i)_i$ in $\mathbb{Q}$, $x > 0$ when there exists an $N$ such that $x_i > 0$ for every $i > N$. Prove that if $x \sim y$, then $x > 0$ if and only if $y > 0$. (Caution: note that this fails if we tried to use $\ge$ instead of $>$.)

    (c) Define, for $a \in \mathbb{R}$, $|a| = a$ when $a > 0$ and $|a| = -a$ $(= 0 - a)$ otherwise. Prove that $d(a,b) = |a - b|$ is a metric on $\mathbb{R}$ and that $\mathbb{R}$ is a complete metric space with respect to this metric.

    (d) The material of 4.1 is now rigorous without previously assuming a construction of $\mathbb{R}$. Verify (caution, it is very nearly a tautology) that the metric space $\mathbb{R}$ is indeed the completion of the metric space $\mathbb{Q}$ as defined in 4.1.

(8) Prove that any open set in $\mathbb{R}^n$ is $\sigma$-compact.

(9) Prove the following converse to the Arzela-Ascoli Theorem: If $X$ is a compact metric space and $(f_n)_n$ is a uniformly convergent sequence in $C(X)$, then it is uniformly bounded and equicontinuous.


(10) Prove the following result known as the Weierstrass Approximation Theorem: For a continuous function $f : \langle a, b\rangle \to \mathbb{R}$, there exists a sequence of polynomials (with real coefficients) $p_n(x)$ which, when restricted to $\langle a, b\rangle$, converge to $f$ uniformly.

(11) Prove that the set of all polynomials in the variables $\sin(nx)$, $\cos(nx)$, $n = 0, 1, 2, \dots$ is dense in $C(\langle 0, 2\rangle)$. Is the set of all polynomials in the variables $\sin(nx)$, $n = 0, 1, 2, \dots$ dense in $C(\langle 0, 2\rangle)$? Prove or disprove.

10 Complex Analysis I: Basic Concepts

In this chapter, we will develop the basic principles of the analysis of complex functions of one complex variable. As we will see, using the results of Chapter 8, these developments come almost for free. Yet, the results are of great significance. On the one hand, complex analysis gives a perfect computation of the convergence of a Taylor expansion, which is of use even if we are looking at functions of one real variable (for example, power functions with a real power). On the other hand, the very rigid, almost "algebraic", behavior of holomorphic functions is a striking mathematical phenomenon important for the understanding of areas of higher mathematics such as algebraic geometry ([8]). In this chapter, the reader will also see a proof of the Fundamental Theorem of Algebra and, in Exercise (4), a version of the famous Jordan Theorem on simple curves in the plane.

1 The derivative of a complex function. Cauchy-Riemann conditions

1.1

From 1.2 of Chapter 1, recall the complex conjugate $\bar{z} = x - iy$ of $z = x + iy$ and the absolute value $|z| = \sqrt{z\bar{z}}$, the easy rules

$$\overline{z_1 + z_2} = \bar{z}_1 + \bar{z}_2, \qquad \overline{z_1 z_2} = \bar{z}_1 \bar{z}_2 \qquad \text{and} \qquad |z_1 \cdot z_2| = |z_1| \cdot |z_2|,$$

and the slightly harder triangle inequality

$$|z_1 + z_2| \le |z_1| + |z_2|.$$

Further recall from 4.2 of Chapter 8 that the set of complex numbers $\mathbb{C}$ is identified with the Euclidean plane, with the distance $|z_1 - z_2|$ equal to the Euclidean distance in $\mathbb{R}^2$.



1.2

Let $U \subseteq \mathbb{C}$ be an open subset and let $f : U \to \mathbb{C}$ be a mapping, i.e. a complex function of one variable. We can compute, in the field $\mathbb{C}$, the values $\frac{1}{h}(f(z+h) - f(z))$ for $h \ne 0 = 0 + i0$, and, analogously to the case of real functions of one variable, consider the limit

$$\lim_{h \to 0} \frac{f(z+h) - f(z)}{h}$$

(but this time in the metric space $\mathbb{C}$), if it exists. If the limit exists, we speak (again) of a derivative of $f$ in $z$. More generally, one can introduce, in the obvious way, partial derivatives of functions $f : U_1 \times \dots \times U_n \to \mathbb{C}$ of several complex variables.

One uses the same notation as in the real case:

$$f', \quad f'(z), \quad \frac{df}{dz}, \quad \text{etc.}$$

By precisely the same procedure as in the real case we can prove the formulas

$$(f + g)' = f' + g', \qquad (\alpha f)' = \alpha f', \qquad (f \cdot g)' = f' \cdot g + f \cdot g'$$

(the second of which concerns the multiplication by a complex constant), the composition rule

$$(f \circ g)'(z) = f'(g(z)) \cdot g'(z)$$

and the formula $(z^n)' = n \cdot z^{n-1}$, so we can take derivatives of polynomials exactly as in the real case.

1.3

What we cannot do, however, is adopt the interpretation of a derivative as describing a tangent, or expressing smoothness, as in the real case. The function $f(z) = \bar{z}$ is certainly as smooth as a map can be: geometrically it is just mirroring the plane along the real axis. But we have here

$$\frac{f(z+h) - f(z)}{h} = \frac{\overline{z+h} - \bar{z}}{h} = \frac{\bar{h}}{h},$$

an expression that has no limit for $h$ approaching $0$: on the real axis, i.e. for $h = h_1 + i0$, we have constantly the value $\frac{\bar{h}}{h} = \frac{h_1}{h_1} = 1$, while on the imaginary axis, i.e. for $h = 0 + ih_2$, we have $\frac{\bar{h}}{h} = \frac{-h_2}{h_2} = -1$.
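The direction dependence of this quotient is easy to see numerically. A minimal sketch (the helper name `conj_quotient` is an illustrative choice, not notation from the text):

```python
def conj_quotient(h):
    """Difference quotient (f(z+h) - f(z)) / h for f(z) = conj(z).

    The value does not depend on z; it equals conj(h) / h.
    """
    h = complex(h)
    return h.conjugate() / h

# Approaching 0 along three different directions gives three different values.
along_real = conj_quotient(complex(1e-6, 0.0))       # close to  1
along_imag = conj_quotient(complex(0.0, 1e-6))       # close to -1
along_diagonal = conj_quotient(complex(1e-6, 1e-6))  # close to -i
```

Since the quotient has modulus $1$ but a direction-dependent value for arbitrarily small $h$, no limit can exist.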


In other words, while the condition of existence of a complex derivative does imply the existence of the total differential of the function $f$ considered as a map $\mathbb{R}^2 \to \mathbb{R}^2$ (or $U \to \mathbb{R}^2$ where $U$ is an open set in $\mathbb{R}^2$), the converse is not true: the existence of a complex derivative is a much stronger condition. We will see below in 5.3 that it has a different interpretation, namely of $f$ preserving orientation and angles: smoothness follows.

1.4 Cauchy–Riemann conditions

Writing $z = x + iy$, we can view a complex function $f : U \to \mathbb{C}$ as

$$f(z) = P(x,y) + iQ(x,y)$$

where $P, Q$ are real functions in two real variables. We will now show that the differentiability of $f$ implies certain equations between the partial derivatives of $P$ and $Q$.

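1.4.1 Theorem. Let a complex function $f$ have a (complex) derivative at a point $z = x + iy$. Then the functions $P, Q$ have partial derivatives at $(x,y)$ and we have

$$\frac{\partial P(x,y)}{\partial x} = \frac{\partial Q(x,y)}{\partial y} \qquad \text{and} \qquad \frac{\partial P(x,y)}{\partial y} = -\frac{\partial Q(x,y)}{\partial x}. \tag{CR}$$

The derivative of $f$ is then given by the formulas

$$f'(z) = \frac{\partial P(x,y)}{\partial x} + i\,\frac{\partial Q(x,y)}{\partial x} = \frac{\partial Q(x,y)}{\partial y} - i\,\frac{\partial P(x,y)}{\partial y}.$$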

Remark. The equations (CR) are referred to as the Cauchy-Riemann conditions. The theorem shows that these conditions are necessary for complex differentiability. We will show in Theorem 1.5 below that the conditions are also sufficient when $f$ is continuously differentiable. A theorem of Looman and Menchoff states, more generally, that the conditions are also sufficient assuming only that $f$ is continuous, but we will not need that result here. The conditions (CR) alone, without any additional assumption on $f$, however, do not imply differentiability (see Exercise (2)).

Proof. Put $h = h_1 + ih_2$. We have

$$\frac{1}{h}(f(z+h) - f(z)) = \frac{1}{h_1 + ih_2}\bigl(P(x+h_1, y+h_2) - P(x,y)\bigr) + \frac{i}{h_1 + ih_2}\bigl(Q(x+h_1, y+h_2) - Q(x,y)\bigr). \tag{*}$$

For $h_2 = 0$ (and $h_1 \ne 0$) this yields in particular

$$\frac{1}{h_1}\bigl(P(x+h_1, y) - P(x,y)\bigr) + \frac{i}{h_1}\bigl(Q(x+h_1, y) - Q(x,y)\bigr) \tag{**}$$

while for $h_1 = 0$ (and $h_2 \ne 0$) we obtain

$$\frac{-i}{h_2}\bigl(P(x, y+h_2) - P(x,y)\bigr) + \frac{1}{h_2}\bigl(Q(x, y+h_2) - Q(x,y)\bigr). \tag{***}$$

If the expression (*) has a limit for $h \to 0$, the expression (**) has the same limit for $h_1 \to 0$, namely

$$\frac{\partial P(x,y)}{\partial x} + i\,\frac{\partial Q(x,y)}{\partial x} \quad (= f'(z)),$$

and similarly (***) yields, for $h_2 \to 0$,

$$\frac{\partial Q(x,y)}{\partial y} - i\,\frac{\partial P(x,y)}{\partial y} \quad (= f'(z)).$$

Comparing the real and the imaginary parts, we obtain the desired equations. $\square$
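The conditions (CR) and the formula for $f'(z)$ can be tested numerically with finite differences. The following is a rough sketch under the assumption that central differences with a small step approximate the partial derivatives well enough; the names `partials` and `cr_residuals` are hypothetical helpers.

```python
def partials(f, x, y, eps=1e-6):
    """Central-difference estimates of (P_x, P_y, Q_x, Q_y) for f = P + iQ."""
    fx = (f(complex(x + eps, y)) - f(complex(x - eps, y))) / (2 * eps)
    fy = (f(complex(x, y + eps)) - f(complex(x, y - eps))) / (2 * eps)
    return fx.real, fy.real, fx.imag, fy.imag

def cr_residuals(f, x, y):
    """Return (P_x - Q_y, P_y + Q_x); both vanish when (CR) holds at (x, y)."""
    px, py, qx, qy = partials(f, x, y)
    return px - qy, py + qx

def cube(z):
    return z ** 3

def conjugate(z):
    return z.conjugate()

# For z**3 both residuals are numerically zero; for conj(z) the first one is 2.
res_cube = cr_residuals(cube, 1.0, 2.0)
res_conj = cr_residuals(conjugate, 1.0, 2.0)
```

For `cube`, the combination `P_x + i*Q_x` returned by `partials` approximates $3z^2$, in line with the formula for $f'(z)$ in Theorem 1.4.1.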

1.5 Theorem. Let $P, Q$ be real functions of two variables with continuous partial derivatives, let $f(z) = P(x,y) + iQ(x,y)$ and let the conditions (CR) be satisfied at some point $z = x + iy \in U$. Then $f$ has a derivative in $z$.

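Proof. Write $h = h_1 + ih_2$. We have

$$\begin{aligned}
\frac{1}{h}(f(z+h) - f(z)) &= \frac{1}{h}\bigl(P(x+h_1, y+h_2) - P(x,y) + iQ(x+h_1, y+h_2) - iQ(x,y)\bigr) \\
&= \frac{1}{h}\bigl(P(x+h_1, y+h_2) - P(x+h_1, y) + P(x+h_1, y) - P(x,y) \\
&\qquad + i\,(Q(x+h_1, y+h_2) - Q(x+h_1, y) + Q(x+h_1, y) - Q(x,y))\bigr).
\end{aligned}$$

Denote the right-hand side by $u$. Using the Mean Value Theorem and (CR) (applied at the intermediate points, so we use that (CR) holds in a neighborhood of $z$), we obtain

$$\begin{aligned}
P(x+h_1, y+h_2) &- P(x+h_1, y) + P(x+h_1, y) - P(x,y) \\
&= \frac{\partial P(x+h_1, y+\alpha h_2)}{\partial y}\,h_2 + \frac{\partial P(x+\beta h_1, y)}{\partial x}\,h_1 \\
&= -\frac{\partial Q(x+h_1, y+\alpha h_2)}{\partial x}\,h_2 + \frac{\partial P(x+\beta h_1, y)}{\partial x}\,h_1
\end{aligned}$$

and similarly

$$\begin{aligned}
Q(x+h_1, y+h_2) &- Q(x+h_1, y) + Q(x+h_1, y) - Q(x,y) \\
&= \frac{\partial Q(x+h_1, y+\gamma h_2)}{\partial y}\,h_2 + \frac{\partial Q(x+\delta h_1, y)}{\partial x}\,h_1 \\
&= \frac{\partial P(x+h_1, y+\gamma h_2)}{\partial x}\,h_2 + \frac{\partial Q(x+\delta h_1, y)}{\partial x}\,h_1,
\end{aligned}$$

with some $0 < \alpha, \beta, \gamma, \delta < 1$. Thus

$$\begin{aligned}
u &= \frac{1}{h}\Bigl(\frac{\partial P(x+h_1, y+\gamma h_2)}{\partial x}(h_1 + ih_2) + i\,\frac{\partial Q(x+\delta h_1, y)}{\partial x}(h_1 + ih_2) \\
&\qquad + \Bigl(\frac{\partial P(x+\beta h_1, y)}{\partial x} - \frac{\partial P(x+h_1, y+\gamma h_2)}{\partial x}\Bigr)h_1 - \Bigl(\frac{\partial Q(x+h_1, y+\alpha h_2)}{\partial x} - \frac{\partial Q(x+\delta h_1, y)}{\partial x}\Bigr)h_2\Bigr) \\
&= \frac{\partial P(x+h_1, y+\gamma h_2)}{\partial x} + i\,\frac{\partial Q(x+\delta h_1, y)}{\partial x} + d_1 \cdot \frac{h_1}{h} + d_2 \cdot \frac{h_2}{h},
\end{aligned}$$

where $d_1$ and $d_2$ denote the coefficients of $h_1$ and $h_2$ in the last two terms (the second taken with its sign). Since the partial derivatives are continuous, the differences $d_1, d_2$ tend to $0$ as $h \to 0$, and since $\left|\frac{h_i}{h}\right| \le 1$, the statement follows: $u$ tends to $\frac{\partial P(x,y)}{\partial x} + i\,\frac{\partial Q(x,y)}{\partial x}$. $\square$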

1.6 Holomorphic functions

A complex function $f : U \to \mathbb{C}$ on an open set $U \subseteq \mathbb{C}$ with continuous partial derivatives which satisfies the Cauchy-Riemann conditions is said to be holomorphic. It can be shown that a complex function is holomorphic on $U$ if and only if it has a complex derivative on $U$. (By what we already proved, sufficiency is the non-trivial part.) This is the famous theorem of Goursat which can be found, for example, in [1].

From the chain rule, it is again immediate that for holomorphic functions $f, g$ in an open set $U$, $f + g$, $f - g$, $f \cdot g$ are holomorphic, as is $\frac{f}{g}$ provided that $g$ is non-zero at all points of $U$.

1.7

Recall the complex line integral from Section 4 above. Later we will need the following fact. It is an easy consequence of 3.7 and 4.4 of Chapter 8, but we shall spell things out, mainly to exercise the Cauchy-Riemann conditions.

Theorem. Let $f(\zeta, z)$ be a continuous complex function of two variables which is holomorphic in $\zeta$ in some open set $U \subseteq \mathbb{C}$. Then the complex line integral $\int_L$ satisfies

$$\frac{d}{d\zeta}\int_L f(\zeta, z)\,dz = \int_L \frac{\partial f(\zeta, z)}{\partial \zeta}\,dz.$$

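Proof. Set $F(\zeta) = \int_L f(\zeta, z)\,dz$ and write $f(\zeta, z) = P(\alpha, \beta, x, y) + iQ(\alpha, \beta, x, y)$ where $\zeta = \alpha + i\beta$. From 4.4 of Chapter 8, we see that

$$F(\zeta) = \mathcal{P}(\alpha, \beta) + i\mathcal{Q}(\alpha, \beta)$$

where

$$\begin{aligned}
\mathcal{P}(\alpha, \beta) &= (\mathrm{II})\int_L \bigl(P(\alpha,\beta,x,y), -Q(\alpha,\beta,x,y)\bigr) = \int_L \bigl(P(\alpha,\beta,x,y)\,dx - Q(\alpha,\beta,x,y)\,dy\bigr), \\
\mathcal{Q}(\alpha, \beta) &= (\mathrm{II})\int_L \bigl(Q(\alpha,\beta,x,y), P(\alpha,\beta,x,y)\bigr) = \int_L \bigl(Q(\alpha,\beta,x,y)\,dx + P(\alpha,\beta,x,y)\,dy\bigr).
\end{aligned}$$

Since $f$ is holomorphic in $\zeta$, we have

$$\frac{\partial P}{\partial \alpha} = \frac{\partial Q}{\partial \beta} \qquad \text{and} \qquad \frac{\partial P}{\partial \beta} = -\frac{\partial Q}{\partial \alpha},$$

so that by 3.7 of Chapter 8,

$$\begin{aligned}
\frac{\partial \mathcal{P}}{\partial \alpha} &= (\mathrm{II})\int_L \Bigl(\frac{\partial P}{\partial \alpha}, -\frac{\partial Q}{\partial \alpha}\Bigr) = (\mathrm{II})\int_L \Bigl(\frac{\partial Q}{\partial \beta}, \frac{\partial P}{\partial \beta}\Bigr) = \frac{\partial \mathcal{Q}}{\partial \beta}, \\
\frac{\partial \mathcal{P}}{\partial \beta} &= (\mathrm{II})\int_L \Bigl(\frac{\partial P}{\partial \beta}, -\frac{\partial Q}{\partial \beta}\Bigr) = -(\mathrm{II})\int_L \Bigl(\frac{\partial Q}{\partial \alpha}, \frac{\partial P}{\partial \alpha}\Bigr) = -\frac{\partial \mathcal{Q}}{\partial \alpha},
\end{aligned} \tag{1.7.1}$$

so that $F(\zeta)$ is holomorphic and hence has a derivative. By 1.4, $\frac{\partial f}{\partial \zeta} = \frac{\partial P}{\partial \alpha} + i\,\frac{\partial Q}{\partial \alpha}$, and hence by (1.7.1) and 1.4 again,

$$\int_L \frac{\partial f(\zeta, z)}{\partial \zeta}\,dz = (\mathrm{II})\int_L \Bigl(\frac{\partial P}{\partial \alpha}, -\frac{\partial Q}{\partial \alpha}\Bigr) + i\,(\mathrm{II})\int_L \Bigl(\frac{\partial Q}{\partial \alpha}, \frac{\partial P}{\partial \alpha}\Bigr) = \frac{\partial \mathcal{P}}{\partial \alpha} + i\,\frac{\partial \mathcal{Q}}{\partial \alpha} = \frac{dF}{d\zeta}. \qquad \square$$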


2 From the complex line integral to primitive functions

2.1 Theorem. Let $U$ be a domain in $\mathbb{C}$. Let $L_1, \dots, L_k$ be simple piecewise continuously differentiable closed curves with disjoint images such that $L_1 \amalg \dots \amalg L_k$ is the boundary of $U$ oriented counter-clockwise (see 5.2 of Chapter 8). Let $f$ be a holomorphic function defined on an open set $V$ containing $\overline{U}$. Then the complex line integral of $f$ satisfies

$$\int_{L_1} f(z)\,dz + \dots + \int_{L_k} f(z)\,dz = 0.$$

Proof. Put again $f(z) = P(x,y) + iQ(x,y)$. By 4.4 of Chapter 8, we have

$$\int_{L_i} f = (\mathrm{II})\int_{L_i} (P, -Q) + i\,(\mathrm{II})\int_{L_i} (Q, P)$$

and by Green's formula (5.4.1) of Chapter 8, the sum of these terms over $i = 1, \dots, k$ is equal to

$$\int_U \Bigl(-\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\Bigr) + i\int_U \Bigl(\frac{\partial P}{\partial x} - \frac{\partial Q}{\partial y}\Bigr).$$

By the Cauchy-Riemann conditions, both summands are zero. $\square$

2.2

Consider two oriented simple arcs $P_1, P_2$ expressed by parametrizations $\phi_i : \langle \alpha_i, \beta_i\rangle \to \mathbb{C}$ such that $\phi_1(\alpha_1) = \phi_2(\alpha_2)$ and $\phi_1(\beta_1) = \phi_2(\beta_2)$ and $\phi_1(x) \ne \phi_2(y)$ unless $x = \alpha_1$ and $y = \alpha_2$ or $x = \beta_1$ and $y = \beta_2$. Then $L = -P_2 + P_1$ is a piecewise continuously differentiable simple closed curve. If $L$ is the boundary of a domain $U$ and $f$ is holomorphic on an open subset of $\mathbb{C}$ containing $\overline{U}$, then by 2.1,

$$\int_{P_1} f = \int_{P_2} f.$$

2.3

Let $f$ be holomorphic in a convex open set $U \subseteq \mathbb{C}$. For $a, b \in U$ define

$$\int_a^b f(z)\,dz = \int_{L(a,b)} f(z)\,dz$$

where $L(a,b)$ is parametrized by $\phi : \langle 0, 1\rangle \to \mathbb{C}$, $\phi(t) = a + t(b - a)$.

Fix $a \in U$ and write for $u \in U$,

$$F(u) = \int_a^u f(z)\,dz.$$

Theorem. We have $F'(z) = f(z)$.

Proof. We claim that

$$\int_{L(a,u+h)} f(z)\,dz = \int_{L(a,u)} f(z)\,dz + \int_{L(u,u+h)} f(z)\,dz. \tag{2.3.1}$$

In effect, this is trivial when the points $a$, $u$ and $u + h$ are collinear. Otherwise the piecewise continuously differentiable simple curves $P_1 = L(a, u+h)$ and $P_2 = L(a,u) + L(u, u+h)$, $h \in \langle 0, 1\rangle$, satisfy the assumptions of 2.2 and hence (2.3.1) follows from 3.4 and 4.4 of Chapter 8. Now, by (2.3.1),

$$\begin{aligned}
\frac{1}{h}(F(u+h) - F(u)) &= \frac{1}{h}\int_0^1 f(u + th)\,h\,dt = \int_0^1 f(u + th)\,dt \\
&= \int_0^1 P(u + th)\,dt + i\int_0^1 Q(u + th)\,dt,
\end{aligned}$$

which, as $h \to 0$, approaches $P(u) + iQ(u)$ by the Mean Value Theorem applied to the two real integrals. $\square$

2.4 Comment

By analogy with the theory of real functions, we call $F$ a primitive function of $f$ if $F' = f$. It is easy to observe that the difference between two primitive functions on an open set is locally constant, i.e. constant on each connected component. Indeed, by 1.4, we can reduce this to the fact that a real function with partial derivatives equal to $0$ on an open set is locally constant. In particular, on a convex open set $U$, any two primitive functions differ by a constant.

2.5

It is curious to observe that the proof of Theorem 2.3 can be "transported" (with only minor modifications) by a (real) injective regular map. More precisely, identifying $\mathbb{C}$ with $\mathbb{R}^2$, let $\phi : U \to V$ be a bijective regular map in the sense of Subsection 7.1 of Chapter 3. Then the proof of Theorem 2.3 remains valid with the line segments $L(a,b)$ replaced by their $\phi$-images. We obtain therefore the following

Proposition. If $V$ is an open set in $\mathbb{C}$ such that there exists a bijective (real) regular map $\phi : U \to V$ where $U$ is convex, then every holomorphic function on $V$ has a primitive function.

As it turns out, the converse is also true. In fact, in Section 1 of Chapter 13, we shall prove much more, namely that unless $U = \mathbb{C}$, the map $\phi$ can be chosen to be holomorphic. This is the famous Riemann Mapping Theorem.

3 Cauchy’s formula

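3.1 Lemma. Let $K_r$ be a circle with center in a point $z$ and radius $r > 0$, oriented counter-clockwise. Then we have

$$\int_{K_r} \frac{d\zeta}{\zeta - z} = 2\pi i.$$

Proof. Parametrize $K_r$ by

$$\phi : \langle 0, 2\pi\rangle \to \mathbb{C}, \qquad \phi(t) = z + r(\cos(t) + i\sin(t)).$$

Then we have $\phi'(t) = r(-\sin(t) + i\cos(t))$, and hence

$$\int_{K_r} \frac{d\zeta}{\zeta - z} = \int_0^{2\pi} \frac{r(-\sin(t) + i\cos(t))}{r(\cos(t) + i\sin(t))}\,dt = \int_0^{2\pi} i\,dt = 2\pi i. \qquad \square$$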
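The computation in this proof can be mirrored by a Riemann sum over the same parametrization. A minimal sketch (the function name `circle_integral` and the number of sample points are illustrative assumptions, not part of the text):

```python
import cmath
import math

def circle_integral(g, center, r, n=2000):
    """Riemann sum for the line integral of g over phi(t) = center + r(cos t + i sin t)."""
    total = 0j
    dt = 2 * math.pi / n
    for k in range(n):
        e = cmath.exp(1j * k * dt)   # cos(t) + i sin(t)
        zeta = center + r * e        # point phi(t) on the circle
        dzeta = 1j * r * e           # phi'(t) = r(-sin t + i cos t)
        total += g(zeta) * dzeta * dt
    return total

z = 1 + 2j
# The integrand g(phi(t)) * phi'(t) is constantly i, so the sum is 2*pi*i for every r > 0.
value_small = circle_integral(lambda w: 1 / (w - z), z, 0.5)
value_large = circle_integral(lambda w: 1 / (w - z), z, 3.0)
```

By contrast, `circle_integral(lambda w: 1, z, 1.0)` is numerically zero, as Theorem 2.1 predicts for a function holomorphic on the whole disk.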

3.2

Notice that the integral computed in 3.1 is not required to vanish by Theorem 2.1 because the integrand is not defined (and in fact, goes to infinity) at $\zeta = z$.

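3.3 Theorem. (Cauchy's formula) Let $f$ be holomorphic in an open disk $\Omega(z, R)$ with $R > r > 0$. Then we have

$$\frac{1}{2\pi i}\int_{K_r} \frac{f(\zeta)}{\zeta - z}\,d\zeta = f(z).$$

Proof. We have

$$\int_{K_r} \frac{f(\zeta)}{\zeta - z}\,d\zeta = \int_{K_r} \frac{f(z)}{\zeta - z}\,d\zeta + \int_{K_r} \frac{f(\zeta) - f(z)}{\zeta - z}\,d\zeta.$$

The first summand on the right-hand side is equal to $2\pi i f(z)$ by 3.1. We shall prove that the second summand is $0$. Since

$$f'(z) = \lim_{\zeta \to z} \frac{f(\zeta) - f(z)}{\zeta - z},$$

the quantity $\frac{f(\zeta) - f(z)}{\zeta - z}$ is bounded on the set $U \setminus \{z\}$ for some open neighborhood $U$ of $z$ (and hence, by continuity, on $\Omega(z, r) \setminus \{z\}$). Let

$$\left|\frac{f(\zeta) - f(z)}{\zeta - z}\right| < A \quad \text{in } \Omega(z, r) \setminus \{z\}.$$

By Lemma 4.5 of Chapter 8, for $0 < s < r$, we have

$$\left|\int_{K_s} \frac{f(\zeta) - f(z)}{\zeta - z}\,d\zeta\right| \le 4A \cdot 2\pi s = 8\pi A s.$$

In particular,

$$\lim_{s \to 0} \int_{K_s} \frac{f(\zeta) - f(z)}{\zeta - z}\,d\zeta = 0.$$

Now we will apply 2.1 to

$$U = \Omega(z, r) \setminus \overline{\Omega(z, s)}, \tag{*}$$

with $k = 2$, $L_1 = K_r$, $L_2 = -K_s$. Since the limit above vanishes, we have

$$\int_{K_r} \frac{f(\zeta) - f(z)}{\zeta - z}\,d\zeta = \lim_{s \to 0}\Bigl(\int_{K_r} \frac{f(\zeta) - f(z)}{\zeta - z}\,d\zeta - \int_{K_s} \frac{f(\zeta) - f(z)}{\zeta - z}\,d\zeta\Bigr) = \lim_{s \to 0} 0 = 0,$$

where the middle equality follows from 2.1 applied to (*). $\square$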

3.4 Theorem. A holomorphic complex function on an open set $U$ has complex derivatives of all orders on $U$.

Proof. By 1.7, we may differentiate the argument of the integral in Cauchy's formula repeatedly by $z$, giving

$$f^{(k)}(z) = \frac{k!}{2\pi i}\int_{K_r} \frac{f(\zeta)}{(\zeta - z)^{k+1}}\,d\zeta. \tag{3.4.1}$$

$\square$
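Formula (3.4.1), together with Cauchy's formula as the case $k = 0$, lends itself to a direct numerical check. The sketch below approximates the integral over $K_r$ by an equally spaced Riemann sum; the name `cauchy_derivative` and the default radius and sample count are assumptions made for illustration.

```python
import cmath
import math

def cauchy_derivative(f, z, k=0, r=1.0, n=4000):
    """Approximate f^(k)(z) as k!/(2*pi*i) times the integral of f(zeta)/(zeta - z)^(k+1) over K_r."""
    total = 0j
    dt = 2 * math.pi / n
    for j in range(n):
        e = cmath.exp(1j * j * dt)
        zeta = z + r * e                                   # point on the circle K_r
        total += f(zeta) / (zeta - z) ** (k + 1) * (1j * r * e) * dt
    return math.factorial(k) * total / (2j * math.pi)

z0 = 0.3 + 0.2j
value = cauchy_derivative(cmath.exp, z0, k=0)       # Cauchy's formula: approximates exp(z0)
third = cauchy_derivative(cmath.exp, z0, k=3)       # every derivative of exp is exp
second_of_cube = cauchy_derivative(lambda w: w ** 3, 1.0, k=2)   # (w^3)'' at w = 1 is 6
```

Only values of $f$ on the circle enter the computation, which illustrates how strongly a holomorphic function is determined by its boundary values.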

3.5 Corollary. A continuous complex function $f$ on a convex open set in $\mathbb{C}$ is holomorphic if and only if it has a primitive function.

Proof. If $f$ is holomorphic then it has a primitive function $F$ by Theorem 2.3. If $f$ has a primitive function $F$ then $F$ is holomorphic since $f$ is continuous. Now apply Theorem 3.4 to the function $F$. $\square$

We also get the following

3.6 Theorem. (Weierstrass's Theorem) Suppose that $f_n$ is a sequence of holomorphic functions defined on an open set $U \subseteq \mathbb{C}$ which converge to a function $f(z)$ uniformly on every compact subset of $U$. Then $f$ is a holomorphic function on $U$. Furthermore, $f_n'$ converge to $f'$ uniformly on every compact subset of $U$.

Proof. Using Cauchy's formula (Theorem 3.3) with $f$ replaced by $f_n$, and taking the limit after the integral sign using Lebesgue's Dominated Convergence Theorem implies the same formula for $f$, proving that $f$ is holomorphic. Further, using the same argument on formula (3.4.1) ($k = 1$), we see that $f_n'$ converges to $f'$, and further that the convergence is uniform in a disk with center $z$ and radius $r/2$. A compact set is covered by finitely many such disks by the Heine-Borel Theorem 2.3 of Chapter 9, which implies that the convergence of derivatives is uniform on a compact set. $\square$

The following result will be useful for applying the Arzela-Ascoli Theorem 6.2 to sequences of analytic functions.

3.7 Theorem. A sequence $(f_n)_n$ of holomorphic functions defined on an open set $U \subseteq \mathbb{C}$ which is uniformly bounded on every compact subset $K \subseteq U$ is equicontinuous on every compact subset $K \subseteq U$.

Proof. Let $f$ be holomorphic on $U$, let $z_0 \in U$, and assume $\overline{\Omega(z_0, r)} \subseteq U$. Let $M$ be the boundary of $\Omega(z_0, r)$, oriented counterclockwise. For $z \in \Omega(z_0, r)$, we get

$$f(z) - f(z_0) = \frac{1}{2\pi i}\int_M \Bigl(\frac{1}{\zeta - z} - \frac{1}{\zeta - z_0}\Bigr) f(\zeta)\,d\zeta = \frac{z - z_0}{2\pi i}\int_M \frac{f(\zeta)\,d\zeta}{(\zeta - z)(\zeta - z_0)}. \tag{*}$$

If $|f(t)| < C$ for all $t \in M$, and if $z \in \Omega(z_0, r/2)$, then the absolute value of the right-hand side of (*) is less than or equal to

$$\frac{4C|z - z_0|}{r}. \tag{+}$$

Now let $K \subseteq U$ be a compact subset. We claim that there exists an $r > 0$ and a compact set $L$, $K \subseteq L \subseteq U$, such that every point of distance $\le r$ from some point of $K$ belongs to $L$.

(For every point $x \in K$, there is a number $s(x) > 0$ such that $\Omega(x, s(x)) \subseteq U$. By the Heine-Borel Theorem 2.3 of Chapter 9, $K$ is covered by finitely many of the open disks $\Omega(x_i, s(x_i)/3)$, for some points $x_i$, $i = 1, \dots, k$. Let $s = \min\{s(x_i) \mid i = 1, \dots, k\}$. Then we may put $r = s/3$ and $L = \bigcup_{i=1}^{k} \overline{\Omega(x_i, 2s(x_i)/3)}$.)

Now let $C$ be a uniform bound on $|f_n(z)|$ for $z \in L$. Then in (+) we may always use these values of $C$ and $r$. We see that then at least for $z, t \in K$, $|z - t| < r/2$,

$$|f_n(z) - f_n(t)| < \frac{4C|z - t|}{r},$$

which implies equicontinuity on $K$. $\square$

Note that in the preceding proof, we have proved more than equicontinuity, namely a uniform Lipschitz constant.

3.8 Remarks

1. Note that the statements 3.4 and 3.5 are in sharp contrast with real analysis.

2. We will see that Cauchy's formula in complex analysis plays an analogous role to the Mean Value Theorem in real analysis. It is, however, a much stronger tool, which makes certain concepts (such as the Taylor series) much easier.

3. Realize the role of the integrand $\frac{f(\zeta)}{\zeta - z}$ going to infinity at the point $z$. Note that all the information about the integral in 3.3 is contained in an arbitrarily small neighborhood of $z$.

4. By the same argument, the circle $K_r$ could be replaced by any closed simple curve $L$ which is the boundary of a domain $U$ oriented counter-clockwise and such that $z \in U$.

4 Taylor’s formula, power series, and a uniqueness theorem

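4.1 Theorem. (Taylor's formula) Let $f$ be holomorphic in a neighborhood of a point $c \in \mathbb{C}$. Then, in a sufficiently small neighborhood of $c$, we have

$$f(z) = f(c) + \frac{1}{1!}f'(c)\cdot(z - c) + \frac{1}{2!}f''(c)\cdot(z - c)^2 + \dots + \frac{1}{n!}f^{(n)}(c)\cdot(z - c)^n + \dots.$$

Proof. We have

$$\frac{1}{\zeta - z} = \frac{1}{\zeta - c}\cdot\frac{1}{1 - \dfrac{z - c}{\zeta - c}}. \tag{*}$$

Consider a circle $K_r$ with center $c$ and radius $r$ such that $f$ is holomorphic in $\Omega(c, R)$ for some $R > r$. Let $z$ be such that $|z - c| < r$, so that

$$\left|\frac{z - c}{\zeta - c}\right| < 1$$

for a point $\zeta$ of the circle $K_r$. From (*), we obtain

$$\begin{aligned}
\frac{1}{\zeta - z} &= \frac{1}{\zeta - c}\Bigl(1 + \frac{z - c}{\zeta - c} + \Bigl(\frac{z - c}{\zeta - c}\Bigr)^2 + \dots + \Bigl(\frac{z - c}{\zeta - c}\Bigr)^n + \dots\Bigr) \\
&= \frac{1}{\zeta - c} + (z - c)\cdot\frac{1}{(\zeta - c)^2} + (z - c)^2\cdot\frac{1}{(\zeta - c)^3} + \dots + (z - c)^n\cdot\frac{1}{(\zeta - c)^{n+1}} + \dots.
\end{aligned}$$

Thus, from Cauchy's formula and Lebesgue's Dominated Convergence Theorem (note that we are dealing with continuous functions and therefore all partial sums have a uniform constant bound), we get:

$$\begin{aligned}
f(z) &= \frac{1}{2\pi i}\int_{K_r} \frac{f(\zeta)}{\zeta - z}\,d\zeta \\
&= \frac{1}{2\pi i}\int_{K_r} \frac{f(\zeta)}{\zeta - c}\,d\zeta + (z - c)\,\frac{1}{2\pi i}\int_{K_r} \frac{f(\zeta)}{(\zeta - c)^2}\,d\zeta + \dots + (z - c)^n\,\frac{1}{2\pi i}\int_{K_r} \frac{f(\zeta)}{(\zeta - c)^{n+1}}\,d\zeta + \dots.
\end{aligned}$$

By the formula in the proof of Theorem 3.4, we have

$$\frac{1}{2\pi i}\int_{K_r} \frac{f(\zeta)}{(\zeta - c)^{n+1}}\,d\zeta = \frac{1}{n!}f^{(n)}(c). \qquad \square$$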
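The proof expresses each Taylor coefficient as a contour integral, which suggests a simple numerical experiment. This is a sketch only: the helper names `taylor_coefficients` and `partial_sum` are hypothetical, and the integral over $K_r$ is replaced by an equally spaced Riemann sum.

```python
import cmath
import math

def taylor_coefficients(f, c, count, r=1.0, n=2000):
    """a_m = 1/(2*pi*i) * integral over K_r of f(zeta)/(zeta - c)^(m+1), for m = 0..count-1."""
    dt = 2 * math.pi / n
    coeffs = []
    for m in range(count):
        total = 0j
        for j in range(n):
            e = cmath.exp(1j * j * dt)
            # With zeta - c = r*e and d(zeta) = i*r*e dt, the integrand reduces to f/(r*e)^m * i dt.
            total += f(c + r * e) / (r * e) ** m * 1j * dt
        coeffs.append(total / (2j * math.pi))
    return coeffs

def partial_sum(coeffs, c, z):
    """Evaluate sum_m a_m (z - c)^m for the given finite list of coefficients."""
    return sum(a * (z - c) ** m for m, a in enumerate(coeffs))

def f(w):
    return 1 / (2 - w)   # holomorphic for |w| < 2, with a_m = 2^-(m+1) at c = 0

coeffs = taylor_coefficients(f, 0.0, 40)
approx = partial_sum(coeffs, 0.0, 0.5 + 0.25j)
```

Here the radius $r = 1$ stays inside the disk $|w| < 2$ where $f$ is holomorphic, as the proof requires.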


4.2

Note that repeating verbatim the proofs in Section 7 of Chapter 1, we get the following

Proposition. A (complex) power series

$$\sum_{k=0}^{\infty} a_k (z - c)^k \tag{*}$$

converges absolutely and uniformly in a circle with center $c$ and any radius

$$s < r = \liminf \frac{1}{\sqrt[n]{|a_n|}}$$

and diverges outside of the closed circle with center $c$ and radius $r$. (The number $r$ is called the radius of convergence of the power series (*).)

Moreover, the power series

$$\sum_{k=1}^{\infty} k a_k (z - c)^{k-1}$$

has the same radius of convergence as (*), and the series (*) may be differentiated term by term.

4.3

The power series

$$\begin{aligned}
e^z &= 1 + \frac{z}{1!} + \frac{z^2}{2!} + \dots, \\
\sin(z) &= z - \frac{z^3}{3!} + \frac{z^5}{5!} - \dots, \\
\cos(z) &= 1 - \frac{z^2}{2!} + \frac{z^4}{4!} - \dots
\end{aligned}$$

will now be considered the definitions of the functions $e^z$, $\sin(z)$, $\cos(z)$ for $z$ complex (the radius of convergence of these series is $\infty$). Therefore, we have

$$e^{iz} = \cos(z) + i\sin(z), \qquad e^{-iz} = \cos(z) - i\sin(z),$$

and also

$$\cos(z) = \frac{e^{iz} + e^{-iz}}{2}, \qquad \sin(z) = \frac{e^{iz} - e^{-iz}}{2i}.$$
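These identities can be verified numerically from the series definitions themselves. A minimal sketch, in which the truncation lengths and the function names `exp_series`, `sin_series`, `cos_series` are illustrative assumptions:

```python
import cmath

def exp_series(z, terms=60):
    """Partial sum of 1 + z/1! + z^2/2! + ..."""
    total, term = 0j, 1 + 0j
    for k in range(terms):
        total += term
        term *= z / (k + 1)
    return total

def sin_series(z, terms=30):
    """Partial sum of z - z^3/3! + z^5/5! - ..."""
    total, term = 0j, z + 0j   # term = (-1)^k z^(2k+1) / (2k+1)!
    for k in range(terms):
        total += term
        term *= -z * z / ((2 * k + 2) * (2 * k + 3))
    return total

def cos_series(z, terms=30):
    """Partial sum of 1 - z^2/2! + z^4/4! - ..."""
    total, term = 0j, 1 + 0j   # term = (-1)^k z^(2k) / (2k)!
    for k in range(terms):
        total += term
        term *= -z * z / ((2 * k + 1) * (2 * k + 2))
    return total

z = 0.7 - 1.3j
lhs = exp_series(1j * z)
rhs = cos_series(z) + 1j * sin_series(z)
```

The values `lhs` and `rhs` agree up to rounding, and `exp_series(z)` matches the library function `cmath.exp(z)`.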

4.4 A uniqueness theorem

Lemma. Let $f$, $g$ be holomorphic in an open set $U$, and let $c \in U$, $c = \lim c_n$, $c_n \ne c$. Suppose $f(c_n) = g(c_n)$ for all $n$. Then $f = g$ in some neighborhood of $c$.

Proof. It suffices to prove that if $f(c_n) = 0$ for all $n$, then $f \equiv 0$ in some neighborhood of $c$. By Taylor's formula, we have

$$f(z) = \sum_{k=0}^{\infty} a_k (z - c)^k$$

for some constants $a_k$. It suffices to prove that

$$a_k = 0 \quad \text{for all } k. \tag{*}$$

Assuming (*) does not hold, let $n$ be the smallest number such that $a_n \ne 0$. Then in some neighborhood of $c$,

$$f(z) = (z - c)^n \cdot (a_n + a_{n+1}(z - c) + a_{n+2}(z - c)^2 + \dots).$$

The function in the parentheses on the right-hand side is continuous (it is a uniform limit of continuous functions), and not zero at $c$; thus, it is non-zero in some neighborhood of $c$, and so is $(z - c)^n$ for $z \ne c$, contradicting our assumptions. $\square$

Theorem. Assume $f, g$ are holomorphic on a connected open set $U$, and let $c \in U$, $c = \lim c_n$, $c \ne c_n$, and $f(c_n) = g(c_n)$ for all $n$. Then $f \equiv g$ on $U$.

Proof. Let

$$M = \{z \in U \mid f(u) = g(u) \text{ for all } u \text{ in some neighborhood of } z\}.$$

$M$ is clearly open, and by the lemma, it is also closed and non-empty. Since $U$ is connected, we have $M = U$. $\square$


4.5 The algebra of power series

Note that on two power series of the form 4.2.(*) with the same $c$, we can perform addition, subtraction and multiplication (in the case of multiplication, note that only finitely many terms with the same power $(z - c)^k$ are added). As an inverse operation to this purely algebraic multiplication, note that it is also possible to divide by any power series 4.2.(*) with $a_0 \ne 0$, figuring the coefficients of the ratio by a recursive procedure.

It will be important for us that when these purely algebraic operations are performed on power series with a positive radius of convergence representing Taylor series at $c$ of holomorphic functions $f$, $g$, the power series resulting from an algebraic operation converges and is the Taylor series of $f + g$, $f - g$, $f \cdot g$ or $f/g$ (the division requires $g(c) \ne 0$). All of these statements are more or less obvious with the exception of the division. Here we note that since $g(c) \ne 0$, we have $g(z) \ne 0$ in some disk $\Omega(c, r)$, $r > 0$. Therefore, $f/g$ is a holomorphic function in a disk with center $c$, and hence has a Taylor expansion at $c$. Multiplying this Taylor expansion with the Taylor expansion of $g$ at $c$ algebraically, we then get the Taylor expansion of $f$ at $c$ by uniqueness. This implies that the Taylor expansion of $f/g$ at $c$ is the algebraic ratio of the Taylor expansions of $f$ and $g$ at $c$.
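The recursive procedure for division can be made explicit. If $a = b \cdot q$ as power series, comparing coefficients at $(z - c)^m$ gives $a_m = \sum_{k=0}^{m} b_k q_{m-k}$, which can be solved for $q_m$ because $b_0 \ne 0$. The sketch below works with truncated coefficient lists and exact rational arithmetic; the function names are illustrative.

```python
from fractions import Fraction

def multiply_series(a, b):
    """First len(a) coefficients of the product; assumes len(b) >= len(a)."""
    n = len(a)
    return [sum(Fraction(a[k]) * b[m - k] for k in range(m + 1)) for m in range(n)]

def divide_series(a, b):
    """First len(a) coefficients q of a/b, using a_m = sum_k b_k q_(m-k); needs b[0] != 0."""
    if b[0] == 0:
        raise ValueError("division requires a non-zero constant term")
    q = []
    for m in range(len(a)):
        known = sum(Fraction(b[k]) * q[m - k] for k in range(1, m + 1))
        q.append((Fraction(a[m]) - known) / b[0])
    return q

# Taylor coefficients of sin and cos at 0 up to degree 5.
sin_c = [0, 1, 0, Fraction(-1, 6), 0, Fraction(1, 120)]
cos_c = [1, 0, Fraction(-1, 2), 0, Fraction(1, 24), 0]
tan_c = divide_series(sin_c, cos_c)          # coefficients of tan = sin / cos
geometric = divide_series([1, 0, 0, 0, 0], [1, -1, 0, 0, 0])   # 1 / (1 - z)
```

Multiplying `tan_c` back by `cos_c` with `multiply_series` recovers `sin_c` up to the truncation degree, which is the uniqueness argument of the text in miniature.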

5 Applications: Liouville's Theorem, the Fundamental Theorem of Algebra and a remark on conformal maps

5.1 Theorem. (Liouville's Theorem) Suppose $f$ is holomorphic and bounded in all of $\mathbb{C}$. Then $f$ is constant.

Proof. By the formula from 3.4, for any circle $K_r$ with center $z$ and radius $r$ we have

$$f'(z) = \frac{1!}{2\pi i}\int_{K_r} \frac{f(\zeta)}{(\zeta - z)^2}\,d\zeta.$$

Suppose $|f(z)| \le A$ for all $z \in \mathbb{C}$. For a point $\zeta$ on $K_r$, we then have

$$\left|\frac{f(\zeta)}{(\zeta - z)^2}\right| \le \frac{A}{r^2},$$

and by 4.5 of Chapter 8, we have

$$|f'(z)| \le 4 \cdot \frac{1!}{2\pi} \cdot 2\pi r \cdot \frac{A}{r^2} = \frac{4A}{r}.$$

Since $r > 0$ was arbitrary, we must have $f' \equiv 0$ and hence $f$ must be constant. $\square$

5.2 Theorem. (The Fundamental Theorem of Algebra) Every non-constant polynomial has at least one root in $\mathbb{C}$.

Proof. Suppose a polynomial

$$p(z) = z^n + a_{n-1}z^{n-1} + \dots + a_1 z + a_0, \qquad n \ge 1$$

has no root in $\mathbb{C}$. Then the function

$$f(z) = \frac{1}{p(z)}$$

is defined and holomorphic on all of $\mathbb{C}$. Let

$$R = 2n\max(|a_0|, \dots, |a_n|)$$

(where $a_n = 1$). For $|z| \ge R$, we then have

$$|p(z)| \ge |z|^n - |a_{n-1}z^{n-1} + \dots + a_1 z + a_0| \ge |z|^n - |z|^{n-1}\cdot\frac{R}{2} \ge |z|^{n-1}\cdot\frac{R}{2} \ge \frac{R^n}{2}$$

and hence

$$|f(z)| \le \frac{2}{R^n}.$$

On the other hand, on $\{z \mid |z| \le R\}$, $f$ is bounded because it is continuous. Thus, $f$ is bounded on all of $\mathbb{C}$, and by Liouville's Theorem, it is constant, and hence so is $p(z)$. This is a contradiction since we assumed $n \ge 1$. $\square$

5.3

A conformal map is a regular map $\mathbf{f} : U \to \mathbb{R}^n$ defined on an open set $U \subseteq \mathbb{R}^n$ such that for two vectors $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$, and every point $\mathbf{z} \in U$, the angle between the vectors $D\mathbf{f}_{\mathbf{z}}(\mathbf{u})$, $D\mathbf{f}_{\mathbf{z}}(\mathbf{v})$ is the same as the angle between $\mathbf{u} \ne 0$ and $\mathbf{v} \ne 0$. (Recall that the angle $0 \le \alpha \le \pi$ between non-zero vectors $\mathbf{u}$, $\mathbf{v}$ is defined by $\cos(\alpha) = \mathbf{u} \cdot \mathbf{v} / (\|\mathbf{u}\| \cdot \|\mathbf{v}\|)$.)

Note that for $n = 1$, any regular map is conformal. For $n > 2$, it can be shown that every conformal map is locally a constant multiple of an isometry, a fact which we will not show here. However, for $n = 2$ we have the following result. Identify, again, $\mathbb{C}$ with $\mathbb{R}^2$ by $x + iy \mapsto (x, y)$, and drop the bold-faced letters.

Theorem. For $n = 2$, a regular map $f$ as in 1.1 is conformal if and only if on each connected component of $U$, $f$ is either holomorphic or the complex conjugate of a holomorphic function (such a function is often called antiholomorphic).

Proof: This is really a statement entirely about the $\mathbb{R}$-linear map $Df_z$ for each $z \in U$ (see Exercise (1)), which is a consequence of the following

Lemma. A regular $\mathbb{R}$-linear map $A : \mathbb{C} \to \mathbb{C}$ preserves angles between non-zero pairs of vectors if and only if $A$ is given either by the formula $Az = \lambda z$ or $Az = \lambda \bar{z}$ for some $\lambda \ne 0$ in $\mathbb{C}$.

The proof of the lemma goes as follows: Representing $A$ by a $2 \times 2$ matrix using the basis $1$, $i$, by our assumption, the columns of $A$ must be non-zero and orthogonal. Since multiplication by a non-zero complex number $\lambda$ preserves angles by the geometric interpretation of complex numbers, we may assume (by composing, if necessary, $A$ with multiplication by a suitable non-zero complex number) that the first column of $A$ is $(1, 0)^T$. By orthogonality, the second column is then $(0, a)^T$ for some non-zero (real) $a$. However, the requirement that $A(1,1)^T$, $A(1,-1)^T$ be orthogonal gives $a^2 = 1$. If $a = 1$, $Az = z$ and if $a = -1$, $Az = \bar{z}$. $\square$

6 Laurent series, isolated singularities and the Residue Theorem

6.1 Laurent series

Let $f$ be a holomorphic function defined on an annulus $R_1 < |z - c| < R_2$ for some $c \in \mathbb{C}$. Let $L_r$ be a circle with center $c$ and radius $r$ oriented counterclockwise. Define

$$f_1(z) = \frac{1}{2\pi i}\int_{L_r} \frac{f(\zeta)}{\zeta - z}\,d\zeta$$

for some $|z - c| < r < R_2$ and

$$f_2(z) = -\frac{1}{2\pi i}\int_{L_s} \frac{f(\zeta)}{\zeta - z}\,d\zeta$$

for some $R_1 < s < |z - c|$. The exact choice of $r$ or $s$ does not change the value by Theorem 2.1. Furthermore, we have

$$f(z) = f_1(z) + f_2(z) \quad \text{for } R_1 < |z - c| < R_2. \tag{6.1.1}$$

To see this, consider a circle $K$ with center $z$ and a small radius oriented counterclockwise, and apply Theorem 2.1 to the function

$$\frac{f(\zeta)}{\zeta - z}$$

of the variable $\zeta$ with simple closed curves $L_r$, $-L_s$, $-K$, along with Cauchy's formula (Theorem 3.3).

By differentiating under the integral sign (Theorem 1.7), and the fact that the value does not depend on $r$, we see that the function $f_1(z)$ is holomorphic in the disk $|z - c| < R_2$, and hence has a Taylor expansion. In the case of the function $f_2(z)$, it is convenient to perform the substitution

$$\omega = \frac{1}{\zeta - c}, \qquad \zeta = c + \frac{1}{\omega}, \qquad d\zeta = -\frac{1}{\omega^2}\,d\omega,$$

and similarly

$$t = \frac{1}{z - c}, \qquad z = c + \frac{1}{t},$$

so that

$$\frac{1}{\zeta - z} = \frac{\omega \cdot t}{t - \omega} = -\frac{\omega \cdot t}{\omega - t}.$$

For the function $g(t) = f_2(z(t))$, this gives

$$g(t) = \frac{t}{2\pi i}\int_M \frac{f(c + 1/\omega)}{\omega(\omega - t)}\,d\omega$$

where $M$ is the circle with center $0$ and radius $1/s > |t|$ oriented counterclockwise (note that the substitution reverses orientation, so we have a total of 4 minus signs, which result in a plus). Again by differentiating under the integral sign, we see that $g(t)/t$ is a holomorphic function in the disk $|t| < 1/R_1$, and hence has a Taylor expansion. (Note: when performing the substitution, we implicitly used the fact that when performing substitution in complex line integrals, we may treat differentials the same way as in ordinary single-variable integral substitution; see Exercise (11) below.) Writing the Taylor series of $g(t)$ in the variable $(z - c)$, we obtain an expansion of the form

$$f_2(z) = \sum_{n < 0} a_n (z - c)^n,$$

which leads to the following result:

Theorem. A holomorphic function $f(z)$ in an annulus $R_1 < |z - c| < R_2$ has an expansion

$$f(z) = \sum_{n = -\infty}^{\infty} a_n (z - c)^n \tag{6.1.2}$$

which is absolutely convergent in the annulus $R_1 < |z - c| < R_2$, and the convergence is uniform on every compact subset. Furthermore, the coefficients $a_n$ are uniquely determined by $f$. (This is called the Laurent expansion of the function $f(z)$.)

Proof. The existence of the expansion (6.1.2) follows from the expansions for the functions $f_1$, $f_2$ in the variable $z - c$ discussed above. Moreover, the convergence properties of the series (6.1.2) follow from our already discussed theory of power series. Regarding uniqueness, note that the coefficients $a_n$ can be calculated by Cauchy integrals, which can be performed term by term by the convergence properties of the power series (see Exercise (13) below). $\square$
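The uniqueness argument rests on recovering each coefficient by a Cauchy integral. Integrating (6.1.2) term by term against $(\zeta - c)^{-m-1}$ over a circle $|\zeta - c| = r$ with $R_1 < r < R_2$ and using Lemma 3.1 gives $a_m = \frac{1}{2\pi i}\int \frac{f(\zeta)}{(\zeta - c)^{m+1}}\,d\zeta$. A numerical sketch of this formula follows; the name `laurent_coefficient` and the sample count are assumptions for illustration.

```python
import cmath
import math

def laurent_coefficient(f, c, m, r, n=4000):
    """a_m = 1/(2*pi*i) * integral of f(zeta)/(zeta - c)^(m+1) over the circle |zeta - c| = r."""
    dt = 2 * math.pi / n
    total = 0j
    for j in range(n):
        e = cmath.exp(1j * j * dt)
        # zeta - c = r*e and d(zeta) = i*r*e dt, so the integrand is f(zeta) * (r*e)^(-m) * i dt.
        total += f(c + r * e) * (r * e) ** (-m) * 1j * dt
    return total / (2j * math.pi)

def f(w):
    return 1 / (w * (1 - w))   # equals 1/w + 1 + w + w^2 + ... for 0 < |w| < 1

def g(w):
    return cmath.exp(1 / w)    # equals sum over k >= 0 of w^(-k) / k! for w != 0

a_minus1 = laurent_coefficient(f, 0.0, -1, 0.5)
a_minus2 = laurent_coefficient(f, 0.0, -2, 0.5)
a_plus3 = laurent_coefficient(f, 0.0, 3, 0.5)
b_minus2 = laurent_coefficient(g, 0.0, -2, 1.0)
```

The first function has finitely many negative-index coefficients, the second infinitely many; both are recovered by the same integral.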

6.2 Classification of isolated singularities and the ResidueTheorem

Let U be an open subset of C, and let c 2 U . A holomorphic function f definedon U X fag is said to have an isolated singularity at c. In this case, f has a Laurentexpansion (6.1.2) at a with R1 D 0. Isolated singularities are classified using thisexpansion: If an D 0 for all n < 0, we say that f has a removable singularity at c.Clearly, in that case, one can extend f to U by setting f .c/ D a0. (For a strongerstatement, see Exercise (14).) On the other hand, if the set of all n for which an ¤ 0

is not bounded below, then we say that f has an essential singularity at c. If n > 0is such that a�n ¤ 0 and am D 0 for all m < �n, then we say that f has a poleof order n at c. Symmetrically, if n > 0 and an ¤ 0 while am D 0 for m < n,we say that f has a zero of order n at c, although that is not really a singularity. Ifam D 0 for m < �n, n � 0, we say that f has at most a pole of order n at c, andif am D 0 for m < n > 0, we say that f has a zero of order at least n at c. Wesay that f .z/ has at most a pole at c if it does not have an essential sigularity there.From the uniqueness of the Laurent expansion, it immediately follows that f has atmost a pole of order n at c if and only if f .z/.z � c/n is holomorphic in U , while fhas a zero of order at least n at c if and only if f .z/=.z � c/n is holomorphic in U .

Note that the remarks 4.5 on algebraic operations with power series obviously extend to Laurent series of functions which have at most a pole at the point $c$. When essential singularities are present, however, multiplication and division may involve infinite sums of coefficients at the same power, and hence the purely algebraic operations are undefined. We will also need the following result on essential singularities:


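Proposition. Let $f$ be a holomorphic function on $\Omega(c, r) \setminus \{c\}$ with an essential singularity at $c$ and let $A \in \mathbb{C}$. Then for each $\varepsilon > 0$ and each $\delta > 0$, $f(z) \in \Omega(A, \varepsilon)$ for infinitely many $z \in \Omega(c, \delta)$.

Proof. Assuming the opposite, there exist an $A \in \mathbb{C}$, an $\varepsilon > 0$ and a $\delta > 0$ such that $f(z) \in \Omega(A, \varepsilon)$ for only finitely many $z \in \Omega(c, \delta)$. Making $\delta$ smaller if necessary, we may assume that $|f(z) - A| \geq \varepsilon$, and in particular $f(z) - A \neq 0$, for all $z \in \Omega(c, \delta) \setminus \{c\}$. But then the function

$$\frac{1}{f(z) - A}$$

is bounded near $c$ and therefore has a removable singularity at $c$ (see Exercise (14)), and hence $f(z)$ has at most a pole at $c$. □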

For a function $f$ with an isolated singularity at $c$ as above, we define the residue of $f$ at $c$ as

$$\operatorname{res}_{z=c} f(z) = a_{-1}. \qquad (6.2.1)$$

Since we may integrate the Laurent series term by term, it follows that if $L$ is a circle in $U$ with center $c$ oriented counter-clockwise such that the interior of the circle is also contained in $U$, then

$$\operatorname{res}_{z=c} f(z) = \frac{1}{2\pi i} \int_L f(\zeta)\,d\zeta. \qquad (6.2.2)$$

From this and Theorem 2.1, we then immediately get the following fact:

Theorem. (The Residue Theorem) Let $U$ be a domain in $\mathbb{C}$ and let $L_1, \dots, L_k$ be simple piecewise continuously differentiable closed curves with disjoint images such that $L_1 \amalg \dots \amalg L_k$ is the boundary of $U$ oriented counter-clockwise. Let further $c_1, \dots, c_m$ be finitely many distinct points in $U$, and let $f$ be a holomorphic function on $V \setminus \{c_1, \dots, c_m\}$ where $V \supseteq \overline{U}$ is an open set. Then

$$\int_{L_1} f(z)\,dz + \dots + \int_{L_k} f(z)\,dz = 2\pi i\,\bigl(\operatorname{res}_{z=c_1} f(z) + \dots + \operatorname{res}_{z=c_m} f(z)\bigr).$$

□
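As a numerical illustration of formula (6.2.2) and of the Residue Theorem, the following sketch approximates contour integrals over circles by an equally spaced Riemann sum. The function name `contour_residue`, the sample count and the test functions are illustrative assumptions, not part of the text.

```python
import cmath
import math


def contour_residue(f, center, radius, samples=2000):
    """Approximate (1/(2*pi*i)) * integral of f over the counter-clockwise
    circle |z - center| = radius, using equally spaced sample points."""
    total = 0j
    for j in range(samples):
        theta = 2.0 * math.pi * j / samples
        offset = radius * cmath.exp(1j * theta)  # zeta - center
        # On the circle, d(zeta) = i * offset * d(theta).
        total += f(center + offset) * 1j * offset
    return total * (2.0 * math.pi / samples) / (2j * math.pi)


# e^z / z^2 = 1/z^2 + 1/z + 1/2 + ..., so a_{-1} = 1 (pole of order 2 at 0).
res_pole = contour_residue(lambda z: cmath.exp(z) / z**2, 0.0, 1.0)

# e^{1/z} has an essential singularity at 0; its Laurent series has a_{-1} = 1.
res_essential = contour_residue(lambda z: cmath.exp(1.0 / z), 0.0, 1.0)


# Residue Theorem check: poles at 0 and 0.5, both inside the unit circle.
def h(z):
    return cmath.exp(z) / (z * (z - 0.5))


big_circle = contour_residue(h, 0.0, 1.0)
sum_of_residues = contour_residue(h, 0.0, 0.25) + contour_residue(h, 0.5, 0.25)
```

Here `big_circle` and `sum_of_residues` should agree up to rounding, and both should be close to the exact value $2(e^{1/2} - 1)$ obtained from the two simple poles.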

6.3 Applications: The Argument Principle and Rouché’s Theorem

The Residue Theorem has the following celebrated consequence. We say that a function $f$ is meromorphic in an open set $U \subseteq \mathbb{C}$ if $f$ is holomorphic and non-zero on $U \setminus S$ for a discrete set $S \subseteq U$, and $f$ has at most a pole at each $c \in S$. Then we define the degree of $f$ at $c \in U$ as

$$\deg_c(f) = \begin{cases} n & \text{if } f \text{ has a zero of order } n \text{ at } c, \\ -n & \text{if } f \text{ has a pole of order } n \text{ at } c, \\ 0 & \text{otherwise.} \end{cases}$$


6.3.1 Theorem. (The Argument Principle) Let $U$ be a domain in $\mathbb{C}$ and let $L_1, \dots, L_k$ be simple piecewise continuously differentiable closed curves with disjoint images such that $L_1 \amalg \dots \amalg L_k$ is the boundary of $U$ oriented counter-clockwise. Let $f$ be a meromorphic function on $V \supseteq \overline{U}$ with no zeros or poles on $L_1 \amalg \dots \amalg L_k$. Then

$$\sum_{j=1}^{k} \frac{1}{2\pi i} \int_{L_j} \frac{f'(z)}{f(z)}\,dz = \sum_{c \in U} \deg_c(f).$$

(Note that since $\overline{U}$ is compact, the sum on the right-hand side has only finitely many non-zero terms.)

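Proof. Let $f(z) = (z - c)^n g(z)$ with $n \in \mathbb{Z}$, where $g$ is holomorphic near $c$ and $g(c) \neq 0$. Then

$$f'(z) = n(z - c)^{n-1} g(z) + (z - c)^n g'(z),$$

so that

$$\frac{f'(z)}{f(z)} = \frac{n}{z - c} + \frac{g'(z)}{g(z)},$$

and hence

$$\operatorname{res}_{z=c} \frac{f'(z)}{f(z)} = n.$$

The statement then follows directly from the Residue Theorem (see Exercise (17)). □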
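The Argument Principle can be checked numerically on a concrete meromorphic function. The sketch below is an illustration only: the helper name `degree_sum`, the sample count and the choice $f(z) = (z^3 - 1/8)/z$ are assumptions made for the example. This $f$ has three simple zeros on $|z| = 1/2$ and a simple pole at $0$.

```python
import cmath
import math


def degree_sum(f, f_prime, center, radius, samples=4000):
    """Approximate (1/(2*pi*i)) * integral of f'/f over the counter-clockwise
    circle |z - center| = radius, assuming f has no zero or pole on it."""
    total = 0j
    for j in range(samples):
        theta = 2.0 * math.pi * j / samples
        offset = radius * cmath.exp(1j * theta)
        z = center + offset
        total += f_prime(z) / f(z) * 1j * offset  # d(zeta) = i*offset*d(theta)
    return total * (2.0 * math.pi / samples) / (2j * math.pi)


def f(z):
    return z**2 - 0.125 / z  # equals (z^3 - 1/8) / z


def f_prime(z):
    return 2 * z + 0.125 / z**2


# Unit disk: three simple zeros and one simple pole, so the degrees sum to 2.
count_unit = degree_sum(f, f_prime, 0.0, 1.0)

# Disk of radius 1/4: only the pole at 0 lies inside, so the sum is -1.
count_small = degree_sum(f, f_prime, 0.0, 0.25)
```

Up to rounding, `count_unit` is $3 - 1 = 2$ and `count_small` is $-1$, matching the sum of $\deg_c(f)$ over the respective disks.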

Comment: The argument of a number $z \in \mathbb{C} \setminus \{0\}$ is defined as the angle $\operatorname{Arg}(z) = \alpha$ such that $z = |z| e^{i\alpha}$. Since this $\alpha$ is only defined up to adding an integral multiple of $2\pi$, one usually normalizes by requiring $0 \leq \operatorname{Arg}(z) < 2\pi$ (this is the argument in the narrower, normalized sense). It follows that in a connected open set $U$ where there exists a holomorphic function $\operatorname{Ln}(z)$ which satisfies

$$e^{\operatorname{Ln}(z)} = z,$$

we have

$$\operatorname{Arg}(z) = \operatorname{Im}(\operatorname{Ln}(z)) + 2k\pi \quad \text{for some } k \in \mathbb{Z}.$$

If $U \subseteq \mathbb{C}$ is, say, a convex open set on which $f(z)$ has no zero, then $f'(z)/f(z)$ has a primitive function $\operatorname{Ln}(f(z))$ whose imaginary part differs from $\operatorname{Arg}(f(z))$ by $2k\pi$, $k \in \mathbb{Z}$. The whole point is, however, that by Lemma 3.1, $\operatorname{Ln}(z)$ cannot be well-defined on the whole set $\mathbb{C} \setminus \{0\}$; roughly speaking, when we follow a circle with center 0 once around counter-clockwise, the value of the logarithm will increase by $2\pi i$ (note that its real part won’t change: it is just its imaginary part, the argument, which will increase by $2\pi$). Thus, Theorem 6.3.1 in the case $k = 1$ makes precise the following intuitive assertion: if we follow once around a simple closed curve on which $f(z)$ has no zero and which is the boundary, oriented counter-clockwise, of a domain $U$, then the increase of the argument of $f$ along this curve is equal to $2\pi$ times the number of zeros of $f$ inside $U$.
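The "increase of the argument" in the comment above can be made concrete by adding up small phase increments of $f$ along the curve. The following sketch does this for a circle; the name `argument_increase`, the sample count and the polynomial $f(z) = z^2(z - 0.3)$ are assumptions chosen for illustration.

```python
import cmath
import math


def argument_increase(f, center, radius, samples=2000):
    """Total change of the argument of f(z) as z runs once counter-clockwise
    around the circle |z - center| = radius (f must not vanish on the circle).
    Each step contributes the phase of a ratio of consecutive values, which is
    small when the sampling is fine enough."""
    total = 0.0
    previous = f(center + radius)
    for j in range(1, samples + 1):
        theta = 2.0 * math.pi * j / samples
        current = f(center + radius * cmath.exp(1j * theta))
        total += cmath.phase(current / previous)
        previous = current
    return total


def f(z):
    return z**2 * (z - 0.3)  # double zero at 0, simple zero at 0.3


# Unit circle: three zeros inside (with multiplicity), argument grows by 6*pi.
turns_unit = argument_increase(f, 0.0, 1.0) / (2.0 * math.pi)

# Circle of radius 0.1: only the double zero at 0 is inside.
turns_small = argument_increase(f, 0.0, 0.1) / (2.0 * math.pi)
```

Dividing the total increase by $2\pi$ gives the number of zeros counted with multiplicity, here 3 and 2 respectively.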

Let $f$ be a holomorphic function on $U$ which is non-zero outside of a finite set of points. Then $f$ is meromorphic, and the sum of degrees of $f$ at all the points $a \in U$ (which has only finitely many non-zero summands) is called the number of zeros of the function $f$ in the set $U$. (Thus, this is a count of zeros with “multiplicities”.)

6.3.2 Corollary. (Rouché’s Theorem) Let $U$ be a domain in $\mathbb{C}$ and let $L_1, \dots, L_k$ be simple piecewise continuously differentiable closed curves with disjoint images such that $L_1 \amalg \dots \amalg L_k$ is the boundary of $U$ oriented counter-clockwise. Let $f$, $g$ be holomorphic on $V \supseteq \overline{U}$ and satisfy

$$|f(z) - g(z)| < |f(z)| \quad \text{for } z \in L_1 \amalg \dots \amalg L_k.$$

Then $f$, $g$ have the same number of zeros in $U$. (Note that again, since $\overline{U}$ is compact, by Theorem 4.4, $f$ and $g$ have only finitely many zeros in $U$.)

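Proof. By assumption, we have

$$\left| \frac{g(z)}{f(z)} - 1 \right| < 1 \quad \text{for } z \in L_1 \amalg \dots \amalg L_k.$$

Thus, if we put $F(z) = g(z)/f(z)$, then

$$F[L_1 \amalg \dots \amalg L_k] \subseteq \Omega,$$

where $\Omega = \Omega(1, 1)$ is the open disk with center 1 and radius 1. Then $1/z$ has a primitive function on $\Omega$, which we will denote by $\operatorname{Ln}(z)$. The chain rule then implies

$$(\operatorname{Ln}(F(z)))' = \frac{F'(z)}{F(z)}.$$

Therefore,

$$\sum_{j=1}^{k} \frac{1}{2\pi i} \int_{L_j} \frac{F'(z)}{F(z)}\,dz = 0,$$

and our statement follows from the Argument Principle applied to $F$, since $\deg_c(F) = \deg_c(g) - \deg_c(f)$ for every $c \in U$. □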
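A standard textbook-style use of Rouché’s Theorem compares $g(z) = z^5 + 3z + 1$ with $f(z) = 3z$ on the unit circle, where $|f(z) - g(z)| = |z^5 + 1| \leq 2 < 3 = |f(z)|$. The sketch below checks the inequality on sample points and counts the zeros of $g$ by the Argument Principle; the helper name `zero_count` and the sample counts are illustrative assumptions.

```python
import cmath
import math


def zero_count(f, f_prime, radius, samples=4000):
    """Approximate (1/(2*pi*i)) * integral of f'/f over |z| = radius,
    i.e. the number of zeros of a holomorphic f inside the circle."""
    total = 0j
    for j in range(samples):
        z = radius * cmath.exp(2j * math.pi * j / samples)
        total += f_prime(z) / f(z) * 1j * z  # d(zeta) = i*z*d(theta)
    return total * (2.0 * math.pi / samples) / (2j * math.pi)


def f(z):
    return 3 * z


def g(z):
    return z**5 + 3 * z + 1


def g_prime(z):
    return 5 * z**4 + 3


# Largest value of |f - g| - |f| over sample points of the unit circle;
# a negative value is consistent with the hypothesis of Rouché's Theorem.
boundary = [cmath.exp(2j * math.pi * j / 720) for j in range(720)]
worst_gap = max(abs(f(z) - g(z)) - abs(f(z)) for z in boundary)

zeros_f = zero_count(f, lambda z: 3, 1.0)
zeros_g = zero_count(g, g_prime, 1.0)
```

Since $f$ has exactly one zero in the unit disk, the theorem predicts that $g$ does too, and `zeros_g` should come out as 1 up to rounding.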


6.3.3 Theorem. Suppose a holomorphic function $f(z)$ defined on an open set $U \subseteq \mathbb{C}$ is such that for some $z_0 \in U$, $f(z_0) = w_0$ and the function $f(z) - w_0$ has a zero of order $n$ at $z_0$. Then there exists an $\varepsilon_0 > 0$ such that for $0 < \varepsilon < \varepsilon_0$, there exists a $\delta > 0$ such that for all $c \in \mathbb{C}$ with $|c - w_0| < \delta$, the number of zeros of $f(z) - c$ in $\Omega(z_0, \varepsilon)$ is $n$.

Proof. Let $L_\varepsilon$ be the circle with center $z_0$ and radius $\varepsilon$ oriented counter-clockwise. We will study the integral

$$\frac{1}{2\pi i} \int_{L_\varepsilon} \frac{f'(z)}{f(z) - c}\,dz. \qquad (*)$$

Choose $\varepsilon_0 > 0$ so that $f(z) - w_0 \neq 0$ for $z \in \Omega(z_0, \varepsilon_0) \setminus \{z_0\}$. (Such an $\varepsilon_0 > 0$ must exist or else $f(z)$ is constant in a neighborhood of $z_0$ by Theorem 4.4.) Then if we choose $0 < \varepsilon < \varepsilon_0$, since $L_\varepsilon$ is compact, there exists a $\delta > 0$ such that the denominator of the integrand (*) is non-zero on $L_\varepsilon$ for all $c \in \Omega(w_0, \delta)$. Therefore, the integral (*) is defined and continuous as a function of $c$ in that domain. However, we know by the Argument Principle that (*) is a non-negative integer, namely the number of zeros of the function $f(z) - c$ in the disk $\Omega(z_0, \varepsilon)$. Thus, it must be constant, and hence equal to its value $n$ at $c = w_0$, as claimed. □

6.3.4 Corollary. (The Holomorphic Open Mapping Theorem) A non-constant holomorphic function on a connected open set $U \subseteq \mathbb{C}$ maps open sets onto open sets.

Proof. Note that in particular in the conclusion of Theorem 6.3.3, every element of $\Omega(w_0, \delta)$ is in the image of $f$. □

An immediate consequence is then the following:

6.3.5 Corollary. (The maximum principle) If $f(z)$ is holomorphic and non-constant in a connected open set $U \subseteq \mathbb{C}$, then $|f(z)|$ has no maximum in $U$.

Proof. By Corollary 6.3.4, for any $z \in U$, all points in a neighborhood of $f(z)$ are in the image of $f$, so this will include points of greater absolute value. □

Another consequence of the Argument Principle is the following

6.3.6 Theorem. (Hurwitz’s Theorem) Let $f_n$ be holomorphic functions on a connected open set $U \subseteq \mathbb{C}$ which converge uniformly on every compact subset of $U$ to a function $f : U \to \mathbb{C}$. Assume further that $f_n(z) \neq 0$ for all $n$ and all $z \in U$. Then either $f$ is identically 0 on $U$, or $f(z) \neq 0$ for all $z \in U$.

Proof. We know from Weierstrass’s Theorem (Theorem 3.6) that $f(z)$ is a holomorphic function on $U$. Suppose $f(z)$ is not identically 0. Then by Theorem 4.4, for any point $z_0 \in U$, there exists a number $r > 0$ such that $f(z)$ is defined and not equal to 0 for $0 < |z - z_0| \leq r$. In particular, by Proposition 6.3 of Chapter 2, $|f(z)|$ has a minimum on the circle $K = \{z \in \mathbb{C} \mid |z - z_0| = r\}$, and this minimum is positive. It follows that $1/f_n(z)$ converges uniformly to $1/f(z)$ on $K$, and by Weierstrass’s Theorem 3.6 also $f_n'(z)$ converges uniformly to $f'(z)$ on $K$. By Lebesgue’s Dominated Convergence Theorem, considering $K$ oriented counter-clockwise, we conclude that

$$\lim_{n \to \infty} \frac{1}{2\pi i} \int_K \frac{f_n'(z)}{f_n(z)}\,dz = \frac{1}{2\pi i} \int_K \frac{f'(z)}{f(z)}\,dz. \qquad (*)$$

Now by the Argument Principle, the expression under the limit on the left-hand side of (*) is the number of zeros of $f_n$ inside the circle $K$, which is 0, while the right-hand side is the number of zeros of $f$ inside $K$. In particular, $f(z_0) \neq 0$, and the statement follows, since $z_0$ was arbitrary. □

6.4 Example: The values of the Riemann zeta function at even integers $k \geq 2$

The Riemann zeta function is

$$\zeta(s) = \sum_{m=1}^{\infty} \frac{1}{m^s} \quad \text{for } \operatorname{Re}(s) > 1.$$

A lot can be said about the Riemann zeta function, but here we want to show how the Residue Theorem can be applied to evaluating $\zeta(k)$ for $k \geq 2$ an even integer, which is a typical example of an application of the theorem. (The evaluation of $\zeta(k)$ for odd integers $k > 2$ is still an open problem.) First, note that $e^z - 1$ has a simple (= order 1) zero at $z = 0$, and hence

$$\frac{z}{e^z - 1}$$

has a removable singularity at 0, and hence has a Taylor expansion at $z = 0$:

$$\frac{z}{e^z - 1} = \sum_{j=0}^{\infty} \frac{B_j}{j!} z^j.$$

The numbers $B_j$ are called the Bernoulli numbers. One has

$$B_0 = 1, \quad B_1 = -1/2, \quad B_2 = 1/6, \quad B_3 = 0, \quad B_4 = -1/30, \quad \dots$$


(See Exercise (18).) Now consider the function

$$f(z) = \frac{2\pi i}{z^k (e^{2\pi i z} - 1)}.$$

Then, by definition, for $k \geq 2$ an even integer,

$$\operatorname{res}_{z=0} f(z) = \frac{(2\pi i)^k B_k}{k!}.$$

On the other hand, clearly $f(z)$ has a simple (= order 1) pole at each $m \in \mathbb{Z} \setminus \{0\}$, and using Taylor series at $z = m$, one gets

$$\operatorname{res}_{z=m} f(z) = \frac{1}{m^k}.$$

Also, clearly, $f(z)$ has no other poles. Let $L$ be a rectangle with sides $\pm(n + \tfrac{1}{2}) + ti$, $\pm ni + t$, $t \in \mathbb{R}$ in the appropriate ranges, oriented counterclockwise. By the Residue Theorem (the residues at $m$ and $-m$ coincide because $k$ is even),

$$\int_L f(z)\,dz = 2\pi i \left( \frac{(2\pi i)^k B_k}{k!} + 2 \sum_{m=1}^{n} \frac{1}{m^k} \right). \qquad (+)$$

On the other hand, the left-hand side tends to 0 with $n \to \infty$. In effect, we claim that

$$|e^{2\pi i z} - 1| > C \qquad (*)$$

on $L$, where $C > 0$ is a constant independent of $n$. To see this, note that on the vertical sides of the rectangle, $e^{2\pi i z} - 1$ is a negative real number less than $-1$, while on the horizontal sides, $e^{2\pi i z}$ is a complex number of constant absolute value, which with $n \to \infty$ tends to 0 on the upper side, and to $\infty$ on the lower side. This proves (*), and since we are further dividing by $z^k$, $k \geq 2$, the absolute value of the integrand on the left-hand side of (+) is $< K/n^2$ for a constant $K$ independent of $n$, while the length of $L$ grows only linearly in $n$. This implies that the left-hand side of (+) converges to 0 with $n \to \infty$. We conclude the following

Theorem. For every even integer $k \geq 2$,

$$\zeta(k) = -\frac{(2\pi i)^k B_k}{2\,(k!)}.$$
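The formula can be checked numerically. The sketch below computes the Bernoulli numbers from the standard recurrence $\sum_{j=0}^{m} \binom{m+1}{j} B_j = 0$ for $m \geq 1$ (which is equivalent to the generating function $z/(e^z - 1)$ used above) and compares the closed form with partial sums of the series. The function names `bernoulli_numbers` and `zeta_even` are illustrative assumptions.

```python
from fractions import Fraction
from math import comb, factorial, pi


def bernoulli_numbers(limit):
    """Return [B_0, ..., B_limit] as exact fractions, with the convention
    z / (e^z - 1) = sum B_j z^j / j!, so that B_1 = -1/2."""
    b = [Fraction(1)]
    for m in range(1, limit + 1):
        acc = sum(comb(m + 1, j) * b[j] for j in range(m))
        b.append(-acc / (m + 1))
    return b


def zeta_even(k):
    """Evaluate zeta(k) = -(2*pi*i)^k * B_k / (2 * k!) for an even integer k >= 2."""
    if k < 2 or k % 2 != 0:
        raise ValueError("k must be an even integer >= 2")
    b_k = bernoulli_numbers(k)[k]
    # For even k, (2*pi*i)^k is the real number (-1)^(k/2) * (2*pi)^k.
    two_pi_i_power = (-1) ** (k // 2) * (2 * pi) ** k
    return -two_pi_i_power * float(b_k) / (2 * factorial(k))


first_bernoulli = bernoulli_numbers(4)
zeta_2 = zeta_even(2)
zeta_4 = zeta_even(4)

# Partial sum of the defining series, for comparison with the closed form.
partial_sum_4 = sum(1.0 / m**4 for m in range(1, 10001))
```

For $k = 2$ and $k = 4$ this reproduces the classical values $\pi^2/6$ and $\pi^4/90$.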


7 Exercises

(1) Prove from first principles that for a holomorphic function $f : U \to \mathbb{C}$ where $U$ is open, $f$, thought of as a map from an open set of $\mathbb{R}^2$ to $\mathbb{R}^2$, has a total differential at every point.

(2) Prove that the function of one complex variable

$$f(z) = \begin{cases} e^{-z^{-4}} & \text{if } z \neq 0 \\ 0 & \text{if } z = 0 \end{cases}$$

satisfies the Cauchy-Riemann conditions (CR) everywhere in $\mathbb{C}$, but is not holomorphic. [This example is due to H. Looman.]

(3) Consider the function of one complex variable

$$f(z) = \begin{cases} z^5/|z|^4 & \text{if } z \neq 0 \\ 0 & \text{if } z = 0. \end{cases}$$

Prove that $f$ is continuous everywhere in $\mathbb{C}$, satisfies the Cauchy-Riemann conditions (CR) at $z = 0$, but does not have a complex derivative at $z = 0$.

(4) Jordan’s Theorem. Let $L$ be an oriented closed simple piecewise continuously differentiable curve in $\mathbb{C}$. Assume for simplicity that there exists a parametrization $c : \langle a, b \rangle \to \mathbb{C}$ of $L$ where the partition $a = a_0 < \dots < a_k = b$ mentioned in 1.1 of Chapter 8 satisfies $c'_+(a_i) \neq -\lambda c'_-(a_i)$ for $i = 1, \dots, k - 1$, $c'_+(a) \neq -\lambda c'_-(b)$ for any $\lambda > 0$.

1. Prove that for every $x \in c[\langle a, b \rangle]$, there exists an open neighborhood $V_x$ and a diffeomorphism

$$\phi_x : V_x \to \Omega(0, 1)$$

with

$$\det(D\phi_x) > 0,$$

a number $\alpha \in (0, 2\pi)$ and numbers $a > 0$, $b \in \mathbb{R}$ such that

$$c(b) = x,$$

$$c[\langle a, b \rangle] \cap V_x = c[(b - a, b + a)]$$

and for $s \in \langle -1, 0 \rangle$, we have $\phi_x c(as + b) = (-s\cos(\alpha), -s\sin(\alpha))$ and for $s \in \langle 0, 1 \rangle$, we have $\phi_x c(as + b) = (s, 0)$.

2. Define, for a point $z \in \mathbb{C} \setminus c[\langle a, b \rangle]$,

$$\operatorname{ind}_c(z) = \frac{1}{2\pi i} \int_L \frac{d\zeta}{\zeta - z}.$$

Consider the notation from part 1 of this exercise. Let $t_1 = (q\cos(\beta), q\sin(\beta))$, $t_2 = (q\cos(\gamma), q\sin(\gamma))$ where $0 < q < 1$, $0 < \beta < \alpha < \gamma < 2\pi$. Let $z_i = \phi_x^{-1}(t_i)$, $i = 1, 2$. Prove that

$$\operatorname{ind}_c(z_1) = \operatorname{ind}_c(z_2) + 1.$$

[Hint: Assume without loss of generality $b = 0$, $a = 1$, $\phi_x = \mathrm{Id}$. Let $q < r < 1$. Consider the curve $c_1$ parametrized by the restriction of $c$ to $\langle -r, r \rangle$. Let $c_2 : [0, \alpha] \to \mathbb{C}$ and $c_3 : [\alpha, 2\pi] \to \mathbb{C}$ be defined by $t \mapsto (r\cos(t), r\sin(t))$. Let $L_i$ be the oriented curve parametrized by $c_i$ for $i = 1, 2, 3$. Use Remark 8.6. (4) for the curves $L_1 + L_2$, $L_3 - L_1$, $L_2 + L_3$.]

3. Prove that $\operatorname{ind}_c(z)$ is constant on connected components of $\mathbb{C} \setminus c[\langle a, b \rangle]$.

4. Let $c[\langle a, b \rangle] \subseteq \Omega(0, R)$ and let $|z| \geq R$. Prove that

$$\operatorname{ind}_c(z) = 0.$$

5. Prove that there exists a point $x$ of $c[\langle a, b \rangle]$ for which, in the notation of part 2 of this Exercise, $\operatorname{ind}_c(z_1) = 0$ or $\operatorname{ind}_c(z_2) = 0$. Note that either alternative can arise depending on the orientation of $c$. [Hint: Let $x \in \operatorname{Im}(c)$ be a point with maximal real part.]

6. Let $U_i$ be the connected component of $\mathbb{C} \setminus c[\langle a, b \rangle]$ which contains the point $z_i$, $i = 1, 2$. Prove that $\overline{U_i} \setminus U_i = c[\langle a, b \rangle]$. [Hint: Use part 1 and compactness.]

7. Prove from part 5 that $U_1 \cup U_2 \cup c[\langle a, b \rangle]$ is open, and equal to its closure, hence equal to $\mathbb{C}$. Hence, $\mathbb{C} \setminus c[\langle a, b \rangle]$ has precisely two connected components, namely $U_1$ and $U_2$ (note that, by parts 2 and 3, $U_1 \neq U_2$).

(5) Prove that the set of all $z \in \mathbb{C}$ such that $e^z = 1$ is precisely the set $\{2k\pi i \mid k \in \mathbb{Z}\}$. [Hint: Recall Exercises (12), (11) of Chapter 1].

(6) Prove that if $\operatorname{Re}(t) > 0$, then there exists a unique $z \in \mathbb{C}$ with $-\pi/2 < \operatorname{Im}(z) < \pi/2$ such that $e^z = t$. Denote $z = \ln(t)$. Prove that the complex derivative of $\ln(z)$ is $1/z$.

(7) For $\operatorname{Im}(z) > 0$, $a \in \mathbb{C}$, define $z^a = e^{a\ln(z)}$. Mimic Exercise (7) of Chapter 1 to show that the complex derivative of $z^a$ is $az^{a-1}$.

(8) Define, for $a \in \mathbb{C}$,

$$\binom{a}{0} = 1$$

and for $k \in \{1, 2, \dots\}$,

$$\binom{a}{k} = \frac{a(a - 1) \cdots (a - k + 1)}{k!}.$$

Prove Newton’s formula, which states that for $z \in \mathbb{C}$ with $|z| < 1$, we have

$$(1 + z)^a = \sum_{n=0}^{\infty} \binom{a}{n} z^n.$$

(9) Suppose that $f$ is a holomorphic function on $\mathbb{C}$, and suppose there exist non-zero numbers $a, b \in \mathbb{C}$ such that we do not have $qa = b$ for any $q \in \mathbb{Q}$, and such that $f(z + a) = f(z)$, $f(z + b) = f(z)$ for all $z \in \mathbb{C}$. Prove that then $f$ is constant. [Note that there is more than one case to consider.]

(10) Prove that a non-constant holomorphic function $f : \mathbb{C} \to \mathbb{C}$ satisfies $\overline{f[\mathbb{C}]} = \mathbb{C}$. [Hint: If $a \notin \overline{f[\mathbb{C}]}$, then the function $1/(f(z) - a)$ is holomorphic and bounded.]

(11) Prove that if $L$ is a parametrized oriented piecewise smooth curve in an open set $U \subseteq \mathbb{C}$, $h : U \to \mathbb{C}$ is a holomorphic injective function and $f$ is a holomorphic function on $h[U]$, then

$$\int_{h[L]} f(t)\,dt = \int_L f(h(z)) h'(z)\,dz.$$

(Note that the notation $h[L]$ applied to a parametrized curve is slightly imprecise, but the meaning is clear.)

(12) Prove that if $f(z)$ is a holomorphic function on an annulus $R_1 < \|z - a\| < R_2$ which cannot be holomorphically extended to any annulus $r_1 < \|z - a\| < r_2$ for $r_1 \leq R_1$, $r_2 \geq R_2$ where equality does not arise in both cases, then the Laurent expansion (6.1.2) diverges outside the annulus $R_1 \leq \|z - a\| \leq R_2$.

(13) Prove that

$$\int_K (\zeta - a)^n\,d\zeta = 0$$

where $K$ is a circle with center $a$ oriented counter-clockwise, and $n \in \mathbb{Z}$, $n \neq -1$.

(14) Let $U \subseteq \mathbb{C}$ be an open set, and let $a \in U$. Let $f$ be a holomorphic function on $U \setminus \{a\}$. Suppose that $f$ is bounded in some neighborhood of $a$. Prove that then $f$ has a removable singularity at $a$. [Hint: Consider the function $f_2$ of Subsection 6.1. Then $f_2$ is bounded in a neighborhood of 0 because $f$ is. This means that the function of Subsection 6.1 is holomorphic and bounded in all of $\mathbb{C}$. Apply Liouville’s Theorem.]

(15) Prove that the complex function $f(z) = e^{-1/z}$ for $z \neq 0$ has an essential singularity at $z = 0$. Conclude that the Taylor expansion of $f$ at $a \in \mathbb{C} \setminus \{0\}$ has radius of convergence $\|a\|$. (Compare to Exercise (13) of Chapter 1.)

(16) Prove that a function $f$ as in Subsection 6.2 has a pole of order $n$ at $a$ if and only if $g(z) = f(z)(z - a)^n$ is holomorphic in $U$ and $g(a) \neq 0$. Similarly, prove that $f$ has a zero of order $n$ at $a$ if and only if $h(z) = f(z)/(z - a)^n$ is holomorphic in $U$ and $h(a) \neq 0$.

(17) Let $U$ be a connected open subset of $\mathbb{C}$ and let $f, g$ be meromorphic functions on $U$. Prove that $f \cdot g$, $f/g$ are meromorphic on $U$.

(18) Prove that $B_k = 0$ if $k > 2$ is an odd integer.

11 Multilinear Algebra

Now that we have strengthened our foundations in topology, algebra needs an upgrade as well. We already know the concept of a bilinear map, which we have used, for example, in Chapter 3, Section 8. Of the more general multilinear maps, we encountered one additional example: the determinant. In this chapter, we will study multilinear maps in some depth. This is essential for our treatment of differential forms in Chapter 12 below, as well as for tensor calculus in Chapter 15 below.

In this chapter and the next, we will drop the bold-faced letter convention of 1.2 of Chapter 3, as it is generally not used in this context.

1 Hom and dual vector spaces

1.1

In this Chapter, the symbol $\mathbb{F}$ stands for either the field $\mathbb{R}$ of real numbers or the field $\mathbb{C}$ of complex numbers. Let $V$, $W$ be vector spaces over $\mathbb{F}$. Denote by

$$\operatorname{Hom}_{\mathbb{F}}(V, W)$$

the set of all linear maps (homomorphisms of $\mathbb{F}$-vector spaces)

$$f : V \to W.$$

Observe that $\operatorname{Hom}_{\mathbb{F}}(V, W)$ is again a vector space: for $f, g \in \operatorname{Hom}_{\mathbb{F}}(V, W)$, we have a linear map $f + g \in \operatorname{Hom}_{\mathbb{F}}(V, W)$ defined by

$$(f + g)(x) = f(x) + g(x),$$

and when $\lambda \in \mathbb{F}$, we also have a linear map $\lambda f$ defined by

$$(\lambda f)(x) = \lambda f(x).$$

The required identities are obvious.

A case of special interest is when $W = \mathbb{F}$. Then we write

$$V^* = \operatorname{Hom}_{\mathbb{F}}(V, \mathbb{F}),$$

and call $V^*$ the dual of the vector space $V$ and often refer to elements of $V^*$ as linear forms on $V$.

1.2 Covariance and contravariance

Observe the behavior of $\operatorname{Hom}_{\mathbb{F}}(V, W)$ with respect to linear maps. First, let $\phi : W \to W'$ be a linear map. We can naturally define a map

$$\phi_* : \operatorname{Hom}_{\mathbb{F}}(V, W) \to \operatorname{Hom}_{\mathbb{F}}(V, W')$$

by composition with $\phi$:

$$f \in \operatorname{Hom}_{\mathbb{F}}(V, W) \mapsto \phi \circ f \in \operatorname{Hom}_{\mathbb{F}}(V, W').$$

The map $\phi_*$ is clearly linear. Moreover, this construction clearly preserves the identity, and if we have another linear map $\psi : W' \to W''$, we have

$$\psi_* \circ \phi_* = (\psi \circ \phi)_*.$$

Next, consider a linear map $\phi : V \to V'$. Again, we can define naturally a map on $\operatorname{Hom}_{\mathbb{F}}(?, W)$ by

$$g \mapsto \phi^*(g) = g \circ \phi.$$

This map,

$$\phi^* : \operatorname{Hom}_{\mathbb{F}}(V', W) \to \operatorname{Hom}_{\mathbb{F}}(V, W),$$

however, goes in the opposite direction! Again, $\phi^*$ is clearly linear, and this construction preserves identity. Also, it preserves composition, but this time in the reversed order: If $\psi : V' \to V''$ is a linear map, then

$$(\psi \circ \phi)^* = \phi^* \circ \psi^*.$$

This behavior, i.e. reversing the direction of maps and the order of composition, is referred to as contravariance and the opposite of contravariance, i.e. preserving the direction of maps and order of composition, is then referred to as covariance. Thus, the construction $\phi^*$ is contravariant and the construction $\phi_*$ is covariant.
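In coordinates, linear maps become matrices and composition becomes matrix multiplication, so the two composition rules can be checked directly. The sketch below is an illustration under that assumption; the helper names `matmul`, `pushforward` and `pullback` and the sample matrices are hypothetical.

```python
def matmul(a, b):
    """Product of matrices given as lists of rows (a is m x p, b is p x q)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def pushforward(phi):
    """Covariant construction: f maps to phi o f (multiply on the left)."""
    return lambda f: matmul(phi, f)


def pullback(phi):
    """Contravariant construction: g maps to g o phi (multiply on the right)."""
    return lambda g: matmul(g, phi)


phi = [[1, 2], [0, 1], [3, 0]]  # a map from F^2 to F^3
psi = [[1, 0, 2], [0, 1, 1]]    # a map from F^3 to F^2
psi_phi = matmul(psi, phi)      # the composition psi o phi

f = [[1], [4]]   # a map into the source of phi
h = [[2, -1]]    # a linear form on the target of psi

# Covariance: pushing forward along psi o phi is "first phi, then psi".
push_composed = pushforward(psi_phi)(f)
push_stepwise = pushforward(psi)(pushforward(phi)(f))

# Contravariance: pulling back along psi o phi is "first psi, then phi".
pull_composed = pullback(psi_phi)(h)
pull_stepwise = pullback(phi)(pullback(psi)(h))
```

Both pairs agree, but note the order in which `phi` and `psi` are applied in the stepwise versions: it is reversed for the pullback.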


These are basic concepts of category theory where the constructions $\phi \mapsto \phi_*$ and $\phi \mapsto \phi^*$ are referred to as covariant and contravariant functors, which means that they preserve the identity, and preserve or reverse the order of composition. For our purposes, however, we do not need to investigate these concepts further. The interested reader can look at [12].

At this moment, the most important fact for us is that the dual is contravariant, i.e. for a linear map of vector spaces

$$\phi : V \to W$$

we obtain a linear map

$$\phi^* : W^* \to V^*.$$

1.3 The dual basis

Suppose now that a vector space $V$ is finite-dimensional, and let $(v_1, \dots, v_n)$ be an ordered basis of $V$. Then, by the definition of a basis, there exist linear forms $f_1, \dots, f_n \in V^*$ such that

$$f_i(v_j) = \begin{cases} 1 & \text{when } i = j \\ 0 & \text{else.} \end{cases}$$

Proposition. $(f_1, \dots, f_n)$ is an ordered basis of $V^*$ (and is referred to as the dual (ordered) basis of $(v_1, \dots, v_n)$).

Proof. We know that any linear form $f : V \to \mathbb{F}$ satisfies

$$f(\lambda_1 v_1 + \dots + \lambda_n v_n) = \lambda_1 f(v_1) + \dots + \lambda_n f(v_n).$$

We conclude that

$$f = f(v_1) f_1 + \dots + f(v_n) f_n,$$

so the forms $f_i$ span $V^*$. They are also linearly independent, since evaluating a linear combination $\mu_1 f_1 + \dots + \mu_n f_n$ at $v_j$ gives $\mu_j$. □

Note that finite-dimensionality was used in the displayed formula for $f$, where we would get an undefined infinite sum, were the basis infinite.
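For $V = \mathbb{F}^n$ with a basis given in coordinates, the dual basis can be computed explicitly: if the basis vectors are the columns of a matrix $M$, the dual forms are the rows of $M^{-1}$. The following sketch (exact arithmetic over the rationals; the names `dual_basis` and `evaluate` are illustrative assumptions) carries this out by Gauss-Jordan elimination.

```python
from fractions import Fraction


def dual_basis(basis):
    """Given n basis vectors of F^n, return n coefficient rows f_i such that
    f_i(v_j) = 1 if i == j and 0 otherwise (the rows of the inverse of the
    matrix whose columns are the basis vectors). Raises StopIteration if the
    input vectors are not linearly independent."""
    n = len(basis)
    # Row i holds the i-th coordinates of all basis vectors, then row i of Id.
    rows = [
        [Fraction(basis[j][i]) for j in range(n)]
        + [Fraction(int(i == j)) for j in range(n)]
        for i in range(n)
    ]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        scale = rows[col][col]
        rows[col] = [x / scale for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [row[n:] for row in rows]


def evaluate(form, vector):
    """Value of the linear form with the given coefficient row on a vector."""
    return sum(c * x for c, x in zip(form, vector))


basis = [(1, 1), (0, 2)]
forms = dual_basis(basis)
pairing = [[evaluate(f_i, v_j) for v_j in basis] for f_i in forms]

# Expansion f = f(v_1) f_1 + f(v_2) f_2 for the form f with coefficients (3, 5).
f = (3, 5)
expanded = [
    sum(evaluate(f, basis[i]) * forms[i][k] for i in range(2)) for k in range(2)
]
```

The matrix `pairing` is the identity, and `expanded` recovers the coefficients of `f`, as in the proof of the Proposition.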

1.4 The double dual

Let $V$ be a vector space. Then there is a map

$$\iota : V \to (V^*)^*$$

defined naturally as follows: Let

$$v \in V$$

and

$$(f : V \to \mathbb{F}) \in V^*.$$

Then define

$$(\iota(v))(f) = f(v).$$

Proposition. The map $\iota$ is an isomorphism when $V$ is finite-dimensional.

Proof. Let $V$ have an ordered basis $(v_1, \dots, v_n)$. Let $(f_1, \dots, f_n)$ be the dual ordered basis, and let $(w_1, \dots, w_n)$ be the dual ordered basis of $(f_1, \dots, f_n)$. By definition, we have $\iota(v_i) = w_i$, so $\iota$ maps a basis to a basis. □

1.5 Duals of inner product spaces

For a general finite-dimensional space $V$, there is no “naturally defined” isomorphism $V \to V^*$. This statement can actually be made more precise, but we won’t need that. For a finite-dimensional real vector space $V$ with inner product $\langle u, v \rangle$, however, the situation is different. We can define a linear map

$$\rho : V \to V^* \qquad (*)$$

by

$$(\rho(v))(w) = \langle v, w \rangle. \qquad (**)$$

It is easily seen that this is an isomorphism, since when $(v_1, \dots, v_n)$ is an orthonormal ordered basis, $(\rho(v_1), \dots, \rho(v_n))$ is the dual basis.

When $V$ is a finite-dimensional inner product vector space over $\mathbb{C}$, the situation is somewhat more complicated. If we attempt to define an isomorphism (*) by the formula (**), we find by the properties of the complex inner product that the map we obtain is anti-linear, not linear. It is possible to define the complex conjugate space $\overline{V}$ which is the same as $V$ as a real vector space, and the multiplication by $\lambda \in \mathbb{C}$ on $\overline{V}$ is defined as the multiplication by the complex conjugate $\overline{\lambda}$ on $V$. Then the formula (**) defines an isomorphism

$$\overline{V} \to V^*.$$


2 Multilinear maps and the tensor product

2.1

Let $V_1, \dots, V_n, W$ be vector spaces over the same field $\mathbb{F}$ (again, we assume $\mathbb{F} = \mathbb{R}$ or $\mathbb{F} = \mathbb{C}$). A multilinear map $\phi$ from $V_1 \times \dots \times V_n$ into $W$ is a map of sets which is linear in each coordinate. This means that for fixed $v_i \in V_i$, $i \neq k$ ($i, k = 1, \dots, n$), and for $x, y \in V_k$, $\lambda \in \mathbb{F}$, we have

$$\phi(v_1, \dots, v_{k-1}, x + y, v_{k+1}, \dots, v_n) = \phi(v_1, \dots, v_{k-1}, x, v_{k+1}, \dots, v_n) + \phi(v_1, \dots, v_{k-1}, y, v_{k+1}, \dots, v_n)$$

and

$$\phi(v_1, \dots, v_{k-1}, \lambda x, v_{k+1}, \dots, v_n) = \lambda \phi(v_1, \dots, v_{k-1}, x, v_{k+1}, \dots, v_n).$$

When $n = 2$ resp. $n = 3$, we speak of a bilinear resp. trilinear map, etc.

Beware the standard mistake: Note that a multilinear map is not linear (except in special cases such as when $V_1 = \dots = V_n = 0$). For example, in the bilinear case, the linear additivity formula, applied to $(2x, z + t) = (x, z) + (x, t)$, gives

$$2\phi(x, z + t) = \phi(2x, z + t) = \phi(x, z) + \phi(x, t),$$

while the multilinear additivity formula gives

$$\phi(x, z + t) = \phi(x, z) + \phi(x, t).$$

2.2 The tensor product by universality

Since a multilinear map $\phi$ from $V_1 \times \dots \times V_n$ to $W$ is not linear, it raises the question whether the information contained in a multilinear map could be equivalently replaced by the information contained in a linear map.

In mathematics, a standard approach to such a situation is by looking for a universal object: Is there a vector space $W_0$ and a multilinear map $\iota$ from $V_1 \times \dots \times V_n$ to $W_0$ such that for every multilinear map $\phi$ from $V_1 \times \dots \times V_n$ to any vector space $W$ there exists a unique linear map $\phi_0 : W_0 \to W$ such that $\phi_0 \circ \iota = \phi$? We express this by a diagram (the arrows labelled multi mean multilinear maps):

$$\begin{array}{ccc} V_1 \times \dots \times V_n & \xrightarrow{\ \iota\ (\text{multi})\ } & W_0 \\ & {\scriptstyle \phi\ (\text{multi})} \searrow & \downarrow {\scriptstyle \phi_0} \\ & & W. \end{array}$$

The dotted arrow (here the vertical arrow $\phi_0$) means that the map (in this case linear) exists and is determined by the other data. For given vector spaces $V_1, \dots, V_n$, such a universal vector space $W_0$ indeed exists. It is called the tensor product, and denoted by

$$W_0 = V_1 \otimes \dots \otimes V_n.$$

Of course, the existence is yet to be proved. However, let us observe that just from the universal property, if the tensor product exists, it must be unique up to a preferred (we say canonical) isomorphism: Suppose

$$\iota' : V_1 \times \dots \times V_n \to W_0'$$

is another universal multilinear map. Then by the universality of $W_0$, there exists a linear map $\chi : W_0 \to W_0'$ such that

$$\iota' = \chi\iota.$$

Similarly, by the universality of $W_0'$, there exists a linear map $\psi : W_0' \to W_0$ such that

$$\iota = \psi\iota'.$$

But now, since we have

$$\iota = \psi\chi\iota,$$

by the uniqueness part of the universal property of $\iota$, we must have

$$\psi\chi = \mathrm{Id},$$

and similarly

$$\chi\psi = \mathrm{Id},$$

so $\chi$ and $\psi$ are linear isomorphisms.

2.3 The existence of the tensor product

Proposition. Let $V_1, \dots, V_n$ be vector spaces. Then there exists a vector space $V_1 \otimes \dots \otimes V_n$ (the tensor product) which satisfies the universality property from the last paragraph.

Proof. The construction is not very inspiring. Recall from Appendix A, 5.6 the construction of the free vector space. Now take the free vector space

$$F(V_1 \times \dots \times V_n) \qquad (*)$$

on the (typically infinite) basis $V_1 \times \dots \times V_n$ (forgetting, for the moment, the vector space structure of $V_1, \dots, V_n$ completely).

Then there is a canonical (i.e. obvious) map

$$\iota_0 : V_1 \times \dots \times V_n \to F(V_1 \times \dots \times V_n),$$

namely sending each

$$x = (v_1, \dots, v_n) \in V_1 \times \dots \times V_n$$

to the free generator of the same name. This map $\iota_0$ is just a map of sets; there is no reason even to suspect that it may be multilinear.

Now, however, we apply our technique of factorization. Namely, in (*), take the vector subspace $Z$ generated by all the elements

$$-(v_1, \dots, v_{k-1}, x + y, v_{k+1}, \dots, v_n) + (v_1, \dots, v_{k-1}, x, v_{k+1}, \dots, v_n) + (v_1, \dots, v_{k-1}, y, v_{k+1}, \dots, v_n),$$

and

$$(v_1, \dots, v_{k-1}, \lambda x, v_{k+1}, \dots, v_n) - \lambda (v_1, \dots, v_{k-1}, x, v_{k+1}, \dots, v_n)$$

where $v_i \in V_i$, $i \neq k$, $i, k = 1, \dots, n$, $x, y \in V_k$ and $\lambda \in \mathbb{F}$. Now define $W_0$ as the factor

$$W_0 = F(V_1 \times \dots \times V_n)/Z.$$

Therefore, by definition of the factor space, we have a canonical projection

$$\pi : F(V_1 \times \dots \times V_n) \to W_0.$$

Put

$$\iota = \pi\iota_0.$$

Then we immediately see that $\iota$ is a multilinear map, because the definition of multilinearity is precisely equivalent to asserting that our generators of $Z$ go to 0.

Now by definition of basis, any map of sets

$$\phi : V_1 \times \dots \times V_n \to W$$

determines a unique linear map

$$\Phi : F(V_1 \times \dots \times V_n) \to W$$

such that

$$\Phi(v_1, \dots, v_n) = \phi(v_1, \dots, v_n).$$

If $\phi$ is moreover multilinear, then

$$\Phi[Z] = 0,$$

so by the Homomorphism Theorem, there exists a (necessarily unique) linear map

$$\phi_0 : F(V_1 \times \dots \times V_n)/Z \to W$$

such that

$$\phi_0\pi = \Phi.$$ □

Notation: One usually denotes

$$v_1 \otimes \dots \otimes v_n = \iota(v_1, \dots, v_n) \in V_1 \otimes \dots \otimes V_n.$$

Let us also remark that to be completely precise, we should denote our tensor product by

$$V_1 \otimes_{\mathbb{F}} \dots \otimes_{\mathbb{F}} V_n$$

to distinguish the field. We will, however, typically not use this longer notation unless confusion can arise.

Perhaps the most important convention is that in most of advanced mathematics, a multilinear map

$$\phi : V_1 \times \dots \times V_n \to W$$

is generally identified with the corresponding linear map

$$\phi_0 : V_1 \otimes \dots \otimes V_n \to W,$$

which means that the two concepts are no longer distinguished explicitly, and the linear variant is written in all formulas.

2.4 The tensor product and bases

To avoid excessive indexing, assume here that $n = 2$ and investigate the tensor product $V \otimes W$ of vector spaces $V$, $W$ with ordered bases $(v_1, \dots, v_m)$ and $(w_1, \dots, w_p)$. (See Exercise (2).)

Proposition. The set

$$\{v_i \otimes w_j \mid i = 1, \dots, m;\ j = 1, \dots, p\}$$

is a basis of $V \otimes W$.

Proof. By the uniqueness explained in 2.2, it suffices to exhibit a bilinear map

$$\iota' : V \times W \to F(B)$$

into the free vector space $F(B)$ on the set $B$ of formal symbols $v_i \otimes w_j$, which satisfies the universal property for bilinear maps. Put

$$\iota'\Bigl(\sum_{i=1}^{m} \lambda_i v_i, \sum_{j=1}^{p} \mu_j w_j\Bigr) = \sum_{i,j} \lambda_i \mu_j\, v_i \otimes w_j.$$

Bilinearity is an immediate consequence of associativity and distributivity. For universality, simply note that every bilinear map

$$\phi : V \times W \to U$$

into another vector space $U$ must satisfy

$$\phi\Bigl(\sum_{i=1}^{m} \lambda_i v_i, \sum_{j=1}^{p} \mu_j w_j\Bigr) = \sum_{i,j} \lambda_i \mu_j\, \phi(v_i, w_j),$$

so the map required by the universality property is uniquely given by the formula

$$\phi_0(v_i \otimes w_j) = \phi(v_i, w_j).$$ □
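In coordinates, the Proposition says that a tensor in $V \otimes W$ is an $m \times p$ array of coefficients at the basis elements $v_i \otimes w_j$, and a bilinear map into the scalars is determined by its values on pairs of basis vectors. The sketch below illustrates the factorization of a bilinear map through the tensor product under that assumption; the names `tensor`, `bilinear` and `induced_linear` and the sample data are hypothetical.

```python
def tensor(x, y):
    """Coordinates of x (x) y in the basis v_i (x) w_j: the array lambda_i * mu_j."""
    return [[xi * yj for yj in y] for xi in x]


def bilinear(values, x, y):
    """Bilinear map phi with phi(v_i, w_j) = values[i][j], evaluated at (x, y)."""
    return sum(
        x[i] * values[i][j] * y[j]
        for i in range(len(x))
        for j in range(len(y))
    )


def induced_linear(values, t):
    """Linear map phi_0 on tensors with phi_0(v_i (x) w_j) = values[i][j]."""
    return sum(
        values[i][j] * t[i][j]
        for i in range(len(t))
        for j in range(len(t[0]))
    )


values = [[1, 2, 0], [0, -1, 3]]  # phi(v_i, w_j) for dim V = 2, dim W = 3
x = [2, 5]
y = [1, 0, -1]

direct = bilinear(values, x, y)
through_tensor = induced_linear(values, tensor(x, y))

# phi_0 is additive on tensors, including sums that are not of the form x (x) y.
s = [[1, 0, 0], [0, 1, 0]]  # v_1 (x) w_1 + v_2 (x) w_2
t_sum = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(tensor(x, y), s)]
additive_lhs = induced_linear(values, t_sum)
additive_rhs = induced_linear(values, tensor(x, y)) + induced_linear(values, s)
```

The equality of `direct` and `through_tensor` is the identity $\phi_0(x \otimes y) = \phi(x, y)$ in this example.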

2.5 The tensor product and duals

Let $V$, $W$ be vector spaces over $\mathbb{F}$. Note that we have canonical maps

$$\kappa : V^* \otimes W \to \operatorname{Hom}(V, W),$$

$$\mu : V^* \otimes W^* \to (V \otimes W)^*.$$

Specifically, let $v \in V$, $w \in W$, and let

$$(f : V \to \mathbb{F}) \in V^*,$$

$$(g : W \to \mathbb{F}) \in W^*.$$

Define

$$(\kappa(f \otimes w))(v) = f(v)w,$$

$$\mu(f \otimes g)(v \otimes w) = f(v) \cdot g(w).$$

Proposition. Let $V$, $W$ be finite-dimensional vector spaces. Then the linear maps $\kappa$, $\mu$ defined above are isomorphisms.

Proof. Let $V$, $W$ have ordered bases $(v_1, \dots, v_m)$ and $(w_1, \dots, w_n)$, let the dual ordered bases be $(f_1, \dots, f_m)$, $(g_1, \dots, g_n)$. We already know that the space $\operatorname{Hom}(V, W)$ is isomorphic to the space of $(m \times n)$-matrices by assigning to a linear map $\phi : V \to W$ its matrix with respect to the bases $(v_i)$, $(w_j)$. Denote by

$$\phi_{i,j} \in \operatorname{Hom}(V, W)$$

the linear map whose matrix has 1 in the $j$’th row and $i$’th column and 0’s elsewhere. Then clearly the set of all $\phi_{i,j}$, $i = 1, \dots, m$; $j = 1, \dots, n$ is a basis of $\operatorname{Hom}(V, W)$, and we have

$$\kappa(f_i \otimes w_j) = \phi_{i,j},$$

thus proving the statement for $\kappa$.

Now let, on $(V \otimes W)^*$, $(e_{i,j})$ be the dual basis to the basis $(v_i \otimes w_j)$. Then by definition

$$\mu(f_i \otimes g_j) = e_{i,j},$$

thus proving the statement about $\mu$. □

3 The exterior (Grassmann) algebra

3.1 Alternating (multilinear) maps

Let $V$, $W$ be vector spaces over the field $\mathbb{F}$ (which, again, we assume to be equal to $\mathbb{R}$ or $\mathbb{C}$). Recall that a multilinear map

$$\phi : \underbrace{V \times \dots \times V}_{k \text{ times}} \to W$$

can be identified with a linear map

$$\phi : \underbrace{V \otimes \dots \otimes V}_{k \text{ times}} \to W$$

(the left hand side is, of course, also denoted by $V^{\otimes k}$). The multilinear map $\phi$ is called alternating if for any permutation

$$\sigma : \{1, \dots, k\} \to \{1, \dots, k\},$$

and any vectors $v_i \in V$, we have

$$\phi(v_{\sigma(1)} \otimes \dots \otimes v_{\sigma(k)}) = \operatorname{sgn}(\sigma)\,\phi(v_1 \otimes \dots \otimes v_k).$$

3.2

It is natural to ask if there is a universal object for alternating multilinear maps just as the tensor product was for multilinear maps, i.e. if for every vector space $V$ and every $k = 0, 1, 2, \dots$ there exists a vector space $W_a$ and an alternating map

$$\iota : V^{\otimes k} \to W_a$$

such that for every alternating map

$$\phi : V^{\otimes k} \to W$$

there exists a unique linear map $\phi_a : W_a \to W$ such that

$$\phi = \phi_a\iota,$$

or, expressed by a diagram,

$$\begin{array}{ccc} V^{\otimes k} & \xrightarrow{\ \iota\ (\text{alt})\ } & W_a \\ & {\scriptstyle \phi\ (\text{alt})} \searrow & \downarrow {\scriptstyle \phi_a} \\ & & W. \end{array}$$

The notation alt means an alternating map. Such an object indeed exists, as we shall prove in 3.3. It is called the exterior power, and is denoted by $\Lambda^k(V)$. It is also unique up to canonical isomorphism by the same argument as the tensor product (see Exercise (6)).

3.3 The Existence of the exterior power

Proposition. The vector space $W_a = \Lambda^k(V)$ and the map $\iota : V^{\otimes k} \to \Lambda^k(V)$ with the universal property described in Section 3.2 exist.

Proof. Let $Z \subseteq V^{\otimes k}$ be the vector subspace generated by all elements of the form

$$(v_{\sigma(1)} \otimes \dots \otimes v_{\sigma(k)}) - \operatorname{sgn}(\sigma)\,(v_1 \otimes \dots \otimes v_k)$$

where $\sigma$ is a permutation on $\{1, \dots, k\}$ and $v_i \in V$. Put $\Lambda^k(V) = V^{\otimes k}/Z$. Then by definition, the quotient map

$$\iota : V^{\otimes k} \to \Lambda^k(V)$$

is alternating (since all the generators of $Z$ being 0 is a translation of the definition of an alternating map). Let $\phi : V^{\otimes k} \to W$ be an alternating map. Then, again, by definition,

$$\phi[Z] = 0,$$

so by the Homomorphism Theorem, there exists a unique linear map

$$\phi_a : \Lambda^k(V) = V^{\otimes k}/Z \to W$$

such that $\phi = \phi_a \circ \iota$, as claimed. □

Notation: For $v_1, \dots, v_k \in V$, one writes

$$v_1 \wedge \dots \wedge v_k = \iota(v_1 \otimes \dots \otimes v_k) \in \Lambda^k(V).$$

3.4 Exterior powers and bases

Proposition. Let $V$ be a finite-dimensional vector space with ordered basis $(v_1, \dots, v_n)$. Then the set

$$\{v_{i_1} \wedge \dots \wedge v_{i_k} \mid 1 \leq i_1 < \dots < i_k \leq n\} \qquad (1)$$

is a basis of $\Lambda^k(V)$.

Proof. Again, we will use the uniqueness which follows from the universal property. Let $W_a'$ be the free vector space on the set (1). A linear map on a vector space can be defined by specifying its values on the basis elements. The basis elements of $V^{\otimes k}$ are

$$v_{i_1} \otimes \dots \otimes v_{i_k}, \quad i_1, \dots, i_k \in \{1, \dots, n\}. \qquad (2)$$

Define thus

$$\iota' : V^{\otimes k} \to W_a'$$

by sending the element (2) to 0 if two of the numbers $i_1, \dots, i_k$ are equal, and to

$$\operatorname{sgn}(\sigma) \cdot v_{i_{\sigma(1)}} \wedge \dots \wedge v_{i_{\sigma(k)}}$$

if

$$i_{\sigma(1)} < \dots < i_{\sigma(k)}.$$

Now let $\phi : V^{\otimes k} \to W$ be an alternating map. Define

$$\phi_a' : W_a' \to W$$

by

$$\phi_a'(v_{i_1} \wedge \dots \wedge v_{i_k}) = \phi(v_{i_1} \otimes \dots \otimes v_{i_k}). \qquad (3)$$

Then for a basis element $x$ in (2),

$$\phi(x) = \phi_a'\iota'(x) \qquad (4)$$

follows from the definition of an alternating map (in particular, note that if two coordinates of $x$ coincide, swapping these two coordinates only changes the sign but not $x$ so we get $\phi(x) = -\phi(x)$, implying $\phi(x) = 0$). Note also that the definition (3) is thereby forced by (4), so $\iota'$ has the universal property of 3.2. □

3.5 Remark

Note that if $\dim(V) = n$, then we have

$$\dim \Lambda^k(V) = \binom{n}{k}.$$

In particular, for $k > n$, we have $\Lambda^k(V) = 0$; and we also have

$$\dim(\Lambda^n(V)) = 1.$$

This means that the space of alternating multilinear maps on $V^{\otimes n}$ is 1-dimensional. Specifying an isomorphism

$$\psi : V \to \mathbb{F}^n$$

(where the right-hand side is the space of columns), one such non-zero alternating map is

$$v_1 \otimes \cdots \otimes v_n \mapsto \det(\psi(v_1), \dots, \psi(v_n)).$$

(Here the argument of the determinant is simply the $n \times n$ matrix with the listed columns put in that order.) Thus, we have proved that any alternating multilinear map on $V^{\otimes n}$ is a constant multiple of the determinant!
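As a small numerical illustration of this remark, the following Python sketch evaluates the determinant of a list of columns by the permutation-sum (Leibniz) formula, so that the alternating property can be checked on examples. The function names `sign` and `det` are illustrative choices, and the exponential running time makes this suitable only for small $n$.

```python
from itertools import permutations


def sign(perm):
    """Sign of a permutation given as a tuple of 0..n-1."""
    inversions = sum(
        1
        for a in range(len(perm))
        for b in range(a + 1, len(perm))
        if perm[a] > perm[b]
    )
    return -1 if inversions % 2 else 1


def det(columns):
    """Determinant of the n x n matrix whose j-th column is columns[j]."""
    n = len(columns)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for j in range(n):
            term *= columns[j][perm[j]]  # entry in row perm[j], column j
        total += term
    return total
```

Swapping two columns changes the sign of the result, and a repeated column gives 0, in line with the definition of an alternating map.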

When $\mathbb{F} = \mathbb{R}$, a choice of one of the two connected components of $\Lambda^n(V) \smallsetminus \{0\}$ is called an orientation of the vector space $V$, and the other orientation is called the opposite orientation. A linear isomorphism $f : V \to V$ is said to preserve orientation if the linear isomorphism $\Lambda^n(f)$ (see Exercise (7)) restricted to $\Lambda^n(V) \smallsetminus \{0\}$ preserves the chosen connected component. Otherwise, $f$ is said to reverse orientation.

3.6 The exterior product

Let $V, Z, W$ be vector spaces. Consider two numbers $k, \ell = 0, 1, 2, \dots$. It is useful to study multilinear maps

$$\phi : V^{\otimes k} \otimes Z^{\otimes \ell} \to W$$

which are alternating in the first $k$ coordinates and the last $\ell$ coordinates separately. By this, we mean that

$$\phi(v_{\sigma(1)} \otimes \cdots \otimes v_{\sigma(k+\ell)}) = \operatorname{sgn}(\sigma)\, \phi(v_1 \otimes \cdots \otimes v_{k+\ell})$$

whenever $v_1, \dots, v_k \in V$, $v_{k+1}, \dots, v_{k+\ell} \in Z$, and $\sigma$ is a permutation which satisfies

$$\sigma(\{1, \dots, k\}) = \{1, \dots, k\}$$

(and hence, of course, also $\sigma(\{k+1, \dots, k+\ell\}) = \{k+1, \dots, k+\ell\}$). It turns out that the universal object for multilinear maps alternating in the first $k$ and last $\ell$ coordinates separately is $\Lambda^k(V) \otimes \Lambda^\ell(Z)$. More precisely, we have the following

Proposition. The map

$$\iota_2 = \iota \otimes \iota : V^{\otimes k} \otimes Z^{\otimes \ell} \to \Lambda^k(V) \otimes \Lambda^\ell(Z)$$

(see Exercises (3) and (4)) is alternating in the first $k$ and last $\ell$ coordinates separately. For any vector space $W$ and any linear map

$$\phi : V^{\otimes k} \otimes Z^{\otimes \ell} \to W$$

alternating in the first $k$ and last $\ell$ coordinates separately, there exists a unique linear map

$$\phi_2 : \Lambda^k(V) \otimes \Lambda^\ell(Z) \to W$$

such that $\phi = \phi_2 \iota_2$.

Proof. It is obvious that $\iota_2$ is alternating in the first $k$ and the last $\ell$ coordinates separately. Consider a map $\phi$ as in the statement of the proposition. Then for $w \in Z^{\otimes \ell}$ fixed,

$$v \mapsto \phi(v \otimes w)$$

is an alternating map on $V^{\otimes k}$, which gives us a map

$$\phi_w : \Lambda^k(V) \to W. \tag{*}$$

Fixing now $v \in \Lambda^k(V)$, on the other hand, the value $\phi_w(v)$ of (*) is clearly linear and alternating in $w$, thus giving us a map

$$\phi_v : \Lambda^\ell(Z) \to W. \tag{**}$$

Therefore, (**) specifies a bilinear map

$$\Lambda^k(V) \times \Lambda^\ell(Z) \to W,$$

and hence a linear map

$$\phi_2 : \Lambda^k(V) \otimes \Lambda^\ell(Z) \to W.$$

We have $\phi = \phi_2 \iota_2$ by definition, which in turn uniquely determines $\phi_2$ since $\Lambda^k(V) \otimes \Lambda^\ell(Z)$ is generated by elements of the form $(v_1 \wedge \cdots \wedge v_k) \otimes (v_{k+1} \wedge \cdots \wedge v_{k+\ell})$, $v_1, \dots, v_k \in V$, $v_{k+1}, \dots, v_{k+\ell} \in Z$. □

The whole point of the proposition for our purposes is that for $V = Z$, when a map

$$\phi : V^{\otimes (k+\ell)} \to W$$

is alternating, it is clearly alternating in the first $k$ and last $\ell$ coordinates separately, so the universal property in the proposition (for $W = \Lambda^{k+\ell}(V)$) gives a map

$$\wedge : \Lambda^k(V) \otimes \Lambda^\ell(V) \to \Lambda^{k+\ell}(V).$$

We will think of this map as a kind of product, called the exterior product, i.e. write, for $x \in \Lambda^k(V)$, $y \in \Lambda^\ell(V)$,

$$x \wedge y \in \Lambda^{k+\ell}(V).$$

One has, of course, for $v_1, \dots, v_{k+\ell} \in V$,

$$(v_1 \wedge \cdots \wedge v_k) \wedge (v_{k+1} \wedge \cdots \wedge v_{k+\ell}) = v_1 \wedge \cdots \wedge v_{k+\ell}.$$

In the above notation, we therefore see that

$$x \wedge y = (-1)^{k\ell}\, y \wedge x$$

since $(-1)^{k\ell}$ is the sign of the permutation swapping the first $k$ with the last $\ell$ coordinates (without changing their individual orders). Note that if we put

$$\Lambda(V) = \bigoplus_{k=0}^{\infty} \Lambda^k(V)$$

(let $\Lambda^0(V) = \mathbb{F}$), this defines an actual bilinear product

$$\wedge : \Lambda(V) \otimes \Lambda(V) \to \Lambda(V).$$

This product is associative and unital in the sense that

$$(x \wedge y) \wedge z = x \wedge (y \wedge z), \quad 1 \wedge x = x \wedge 1 = x.$$

One calls $\Lambda(V)$ the exterior algebra (or the Grassmann algebra).
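The rules of the exterior product can be checked mechanically in coordinates. The sketch below is one possible encoding, not taken from the text: an element of the exterior algebra is a Python dictionary mapping increasing index tuples (labels of basis elements) to coefficients, and `wedge` multiplies two such elements. The names `sort_sign` and `wedge` are illustrative.

```python
def sort_sign(indices):
    """Return (sign, sorted tuple) for a tuple of indices, or (0, ()) on a repeat."""
    if len(set(indices)) < len(indices):
        return 0, ()
    inversions = sum(
        1
        for a in range(len(indices))
        for b in range(a + 1, len(indices))
        if indices[a] > indices[b]
    )
    return (-1) ** inversions, tuple(sorted(indices))


def wedge(x, y):
    """Exterior product of x and y, each a dict {increasing index tuple: coefficient}."""
    result = {}
    for I, a in x.items():
        for J, b in y.items():
            s, K = sort_sign(I + J)  # concatenate, then sort with sign
            if s:
                result[K] = result.get(K, 0) + s * a * b
    # Drop terms whose coefficients cancelled.
    return {K: c for K, c in result.items() if c != 0}
```

With `e1 = {(1,): 1}` and `e2 = {(2,): 1}`, the call `wedge(e2, e1)` returns the negative of `wedge(e1, e2)`, and the empty tuple `()` labels the unit $1 \in \Lambda^0(V)$.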

3.7 The exterior product and duality

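Let $V$ be a vector space over $\mathbb{R}$ or $\mathbb{C}$. Define a linear map

$$\kappa : \Lambda^k(V^*) \to (\Lambda^k(V))^*$$

by

$$(\kappa(f_1 \wedge \cdots \wedge f_k))(v_1 \wedge \cdots \wedge v_k) = \sum_{\sigma} \operatorname{sgn}(\sigma) \cdot f_{\sigma(1)}(v_1) \cdots f_{\sigma(k)}(v_k)$$

where the sum is over all permutations $\sigma$ on the set $\{1, \dots, k\}$.

Proposition. If $V$ is a finite-dimensional vector space, then the map $\kappa$ defined above is an isomorphism.

Proof. Let $(v_1, \dots, v_n)$ be an ordered basis of $V$ and let $(f_1, \dots, f_n)$ be the dual ordered basis of $V^*$. Then for $1 \le i_1 < \cdots < i_k \le n$, the functional $\kappa(f_{i_1} \wedge \cdots \wedge f_{i_k})$ takes the value 1 on $v_{i_1} \wedge \cdots \wedge v_{i_k}$ and the value 0 on all the other elements of the basis (1) of 3.4. Hence $\kappa$ maps the basis of $\Lambda^k(V^*)$ given by Proposition 3.4 onto the basis of $(\Lambda^k(V))^*$ dual to the basis (1) of 3.4, and is therefore an isomorphism. □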
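The key computation in this proof is that, for dual bases, the permutation sum defining the map of 3.7 evaluates to 1 when the two increasing index tuples agree and to 0 otherwise. The following Python sketch checks this on index tuples; the name `pairing` is illustrative, and $f_i(v_j)$ is hard-coded as the Kronecker delta, which is exactly the dual-basis assumption.

```python
from itertools import permutations


def sign(perm):
    """Sign of a permutation given as a tuple of 0..k-1."""
    inversions = sum(
        1
        for a in range(len(perm))
        for b in range(a + 1, len(perm))
        if perm[a] > perm[b]
    )
    return -1 if inversions % 2 else 1


def pairing(I, J):
    """Permutation sum sgn(s) * f_{I[s(1)]}(v_{J[1]}) * ... for dual bases."""
    k = len(I)
    total = 0
    for perm in permutations(range(k)):
        term = sign(perm)
        for a in range(k):
            # f_i(v_j) = 1 if i == j else 0 for a dual basis
            term *= 1 if I[perm[a]] == J[a] else 0
        total += term
    return total
```

Evaluating `pairing` on all pairs of increasing tuples for small $n$ and $k$ reproduces the identity matrix, which is the content of the isomorphism claim in coordinates.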

3.8 The Hodge * operator

Now let $V$ be a finite-dimensional real inner product space of dimension $n$. (The complex case can be treated also, but we don’t need it; see e.g. [8].) Then $\Lambda^k(V)$ is naturally an inner product space where the inner product is defined by

$$\langle v_1 \wedge \cdots \wedge v_k, w_1 \wedge \cdots \wedge w_k \rangle = \sum_{\sigma} \operatorname{sgn}(\sigma) \cdot \langle v_{\sigma(1)}, w_1 \rangle \cdots \langle v_{\sigma(k)}, w_k \rangle$$

where the sum is over all permutations $\sigma$ on $\{1, \dots, k\}$. It is useful to note that if $(v_1, \dots, v_n)$ is an ordered orthonormal basis of $V$, then the basis given by Proposition 3.4 is orthonormal.

Now let $V$ be an oriented real finite-dimensional inner product space of dimension $n$. Recall from Remark 3.5 that $\dim(\Lambda^n(V)) = 1$ and note that an orientation specifies a connected component $C$ of $\Lambda^n(V) \smallsetminus \{0\}$. Now there exists a unique $\mu \in C$ with $\langle \mu, \mu \rangle = 1$. There exists a unique linear isomorphism

$$\varepsilon : \Lambda^n(V) \to \mathbb{R}$$

such that

$$\varepsilon(\mu) = 1.$$

Then we have a linear map

$$\gamma : \Lambda^k(V) \to (\Lambda^{n-k}(V))^*$$

defined by

$$(\gamma(v_1 \wedge \cdots \wedge v_k))(v_{k+1} \wedge \cdots \wedge v_n) = \varepsilon(v_1 \wedge \cdots \wedge v_n).$$

Now define the Hodge * operator

$$* : \Lambda^k(V) \to \Lambda^{n-k}(V)$$

as the composition

$$\Lambda^k(V) \xrightarrow{\ \gamma\ } (\Lambda^{n-k}(V))^* \xrightarrow{\ \cong\ } \Lambda^{n-k}(V)$$

where the second map is given by the inner product on $\Lambda^{n-k}(V)$.

Note that when $(v_1, \dots, v_n)$ is an ordered orthonormal basis of $V$, then $v_1 \wedge \cdots \wedge v_n$ is equal either to $\mu$ or $-\mu$. We say that the basis is oriented if

$$v_1 \wedge \cdots \wedge v_n = \mu.$$

Then we see readily that for an oriented ordered orthonormal basis $(v_1, \dots, v_n)$ of $V$, we have

$$*(v_1 \wedge \cdots \wedge v_k) = v_{k+1} \wedge \cdots \wedge v_n.$$
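Applying the last formula to reorderings of a fixed oriented orthonormal basis $(e_1, \dots, e_n)$ gives the Hodge * operator on every basis element: up to sign, $e_{i_1} \wedge \cdots \wedge e_{i_k}$ goes to the wedge of the remaining basis vectors. The Python sketch below computes that sign as the sign of the permutation listing the chosen indices followed by the complementary ones; this is a consequence of the definition under the stated assumptions, and the name `hodge_basis` is illustrative.

```python
def hodge_basis(I, n):
    """Hodge star of e_I for an oriented orthonormal basis e_1, ..., e_n.

    I is an increasing tuple of indices from 1..n. Returns (sign, J) with
    *e_I = sign * e_J, where J is the increasing tuple of the other indices.
    """
    J = tuple(j for j in range(1, n + 1) if j not in I)
    sequence = tuple(I) + J  # the permutation (I, complement of I)
    inversions = sum(
        1
        for a in range(n)
        for b in range(a + 1, n)
        if sequence[a] > sequence[b]
    )
    return (-1) ** inversions, J
```

In $\mathbb{R}^3$ with the standard orientation this reproduces the familiar rules $*e_1 = e_2 \wedge e_3$, $*e_2 = -e_1 \wedge e_3$ and $*e_3 = e_1 \wedge e_2$.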

4 Exercises

(1) Let $V$, $W$ be finite-dimensional vector spaces, and let $\phi : V \to W$ be a linear map. Fix ordered bases $(v_1, \dots, v_n)$ of $V$ and $(w_1, \dots, w_m)$ of $W$. Let $(f_1, \dots, f_n)$ and $(g_1, \dots, g_m)$ be the dual ordered bases. Let $A$ be the matrix of the map $\phi$ with respect to the ordered bases $(v_1, \dots, v_n)$, $(w_1, \dots, w_m)$. Prove that the matrix of the map $\phi^*$ with respect to the dual ordered bases is the transposed matrix $A^T$.

(2) Write down a basis of $V_1 \otimes \cdots \otimes V_n$ in terms of chosen bases of $V_1, \dots, V_n$ for general $n$.

(3) “Functoriality” of the tensor product: For linear maps $f : V \to V'$, $g : W \to W'$, construct a map $f \otimes g : V \otimes W \to V' \otimes W'$ in such a way that $\mathrm{Id} \otimes \mathrm{Id}$ is the identity, and the construction preserves compositions.

(4) Prove commutativity, associativity and unitality of the tensor product, i.e. construct isomorphisms

$$V \otimes W \to W \otimes V,$$

$$V \otimes (W \otimes Z) \to (V \otimes W) \otimes Z,$$

$$\mathbb{F} \otimes V \to V$$

which form commutative diagrams with the linear maps constructed in Exercise (3). (This property is called “naturality”.)

(5) Prove that for any vector spaces $V, W, Z$, there is a canonical isomorphism

$$\operatorname{Hom}(V, \operatorname{Hom}(W, Z)) \cong \operatorname{Hom}(V \otimes W, Z).$$


(6) Prove the uniqueness of $\Lambda^n(V)$ based on its universal property discussed in 3.2.

(7) Prove “functoriality” of $\Lambda^n$, i.e. for a linear map $f : V \to W$, construct a linear map $\Lambda^n(f) : \Lambda^n(V) \to \Lambda^n(W)$ which preserves identity maps and compositions.

(8) Prove that for a finite-dimensional vector space $V$, a linear isomorphism $f : V \to V$ preserves orientation if and only if $\det(f) > 0$.

(9) Prove the associativity and unitality property of the exterior product defined in Section 3.6.

(10) Let $V$ be a real finite-dimensional inner product space. Prove the commutativity of the diagram

$$\begin{array}{ccc}
\Lambda^k(V^*) & \xrightarrow[\ \cong\ ]{\ \kappa\ } & (\Lambda^k(V))^* \\
{\scriptstyle\cong}\big\uparrow & & \big\uparrow{\scriptstyle\cong} \\
\Lambda^k(V) & \xrightarrow{\ \mathrm{Id}\ } & \Lambda^k(V)
\end{array}$$

where the horizontal map in the top row is the map of 3.7 and the vertical maps are given by the inner products in $V$ and $\Lambda^k(V)$.

12 Smooth Manifolds, Differential Forms and Stokes’ Theorem

In this chapter, we will introduce smooth manifolds (“locally Euclidean spaces”). A theory of differential forms, which we will exhibit, allows us to set up a general theory of integration on such spaces, and to generalize Green’s Theorem in Chapter 8 to the general Stokes Theorem in arbitrary dimension.

In the process of introducing these topics, we will touch on the field of algebraic topology. For basic information on this topic, the reader may look at [20]. For a more advanced introduction to algebraic topology from the point of view of differential forms, we recommend [3]. For an introduction to topics which are more abstract, we recommend [13, 14].

1 Smooth manifolds

1.1 Topological manifolds

A topological manifold of dimension $n$ (briefly a topological $n$-manifold) is a metrizable separable topological space $M$ (metrizable means that there exists a metric on $M$ which induces the given topology on $M$) such that for every $x \in M$ there exists an open neighborhood $U_x$ of $x$ and an injective open map

$$h_x : U_x \to \mathbb{R}^n \tag{*}$$

(open map means that the image of every set open in the domain is open in the codomain). The neighborhood $U_x$ is called a coordinate neighborhood, and the function $h_x$ is called a coordinate system, or coordinate system at $x$. The map assigning to each $x \in M$ a coordinate neighborhood and a coordinate system is called an atlas. The coordinate systems of an atlas are also referred to as charts.



Remarks:

1. Note that instead of requiring $h_x$ to be open, we could have equivalently required that $h_x$ be a homeomorphism onto its image (see Exercise (1)).

2. Since we assume $M$ is separable and metrizable, it has a countable basis (see 1.2 of Chapter 9). The reason we don’t actually say $M$ is a metric space is that we do not want to specify the metric: the metric has no geometric significance, and is only a technical tool at this point. While there are metrics on manifolds which do have a geometrical significance, we will only see these when we develop more structure (such as the concept of a Riemann metric in Chapter 15).

3. Note that the pairs $(U_x, h_x)$ may coincide for different points $x$. For example, for $M = \mathbb{R}^n$, the atlas may contain only one coordinate system, namely $\mathbb{R}^n$ with the identity map $\mathrm{Id} : \mathbb{R}^n \to \mathbb{R}^n$, which can be equal to $(U_x, h_x)$ for all $x \in \mathbb{R}^n$. In other interesting cases, the atlas may contain only finitely many coordinate systems (in fact, note that by definition, a compact manifold always has such a finite atlas). The reader may wonder why we don’t simply speak of atlases as open covers $\mathcal{U}$, with coordinate systems on each $U \in \mathcal{U}$. This is merely a technical point: it turns out that being able to denote a coordinate neighborhood of a point by a single symbol simplifies many arguments.

4. Because we required separability, by our definition, an uncountable discrete set is not a manifold. There is an alternative definition, calling a manifold any (possibly uncountable) disjoint union of manifolds in our sense. (In a disjoint union $\coprod_i M_i$, a set $U$ is open if and only if each $U \cap M_i$ is open in $M_i$.)

1.2 Smooth manifolds

A smooth manifold of dimension $n$ (briefly a smooth $n$-manifold) is a topological manifold $M$ with an atlas $(U_x, h_x)$ such that for every $x, y \in M$, the composition

$$h_x[U_x \cap U_y] \xrightarrow{\ (h_x)^{-1}\ } U_x \cap U_y \xrightarrow{\ h_y\ } h_y[U_x \cap U_y] \tag{+}$$

is a smooth map, i.e. a map which is continuous and has partial derivatives of all orders which are also continuous. (Note that the domain and codomain of (+) are open subsets of $\mathbb{R}^n$; also note that the intersection of $U_x$ and $U_y$ may be empty; in that case, the condition (+) is void.)

Remarks:

1. Note that this definition is completely intuitive: it simply says that in a coordinate neighborhood, we can speak of smooth real functions, and that these concepts are compatible when we pass from one coordinate neighborhood to another.

2. Note that the continuity of all higher partial derivatives does not follow from their existence, even on an open set (see Exercise (2) of Chapter 3).


1.3 Differentiable maps

Let $M$ and $N$ be manifolds of dimensions $m$ and $n$, respectively. A map $f : M \to N$ is called a $C^r$-map if $f$ is continuous and for every $x \in M$, the composition

$$h_x[f^{-1}[U_{f(x)}] \cap U_x] \xrightarrow{\ h_x^{-1}\ } f^{-1}[U_{f(x)}] \cap U_x \xrightarrow{\ f\ } U_{f(x)} \xrightarrow{\ h_{f(x)}\ } \mathbb{R}^n$$

has continuous partial derivatives up to order $r \in \mathbb{N}$. Note that the source of the composition is an open subset of a Euclidean space. A $C^\infty$ map is a map which is $C^r$ for all $r$. A $C^r$-diffeomorphism ($r \ge 1$) is a homeomorphism $f : M \to N$ such that both $f$, $f^{-1}$ are $C^r$. A $C^\infty$-diffeomorphism will be referred to simply as a diffeomorphism. Two smooth manifolds $M$, $N$ for which there exists a diffeomorphism $M \to N$ are called diffeomorphic. For a point $x \in M$, a smooth coordinate system at $x$ consists of an open neighborhood $U$ of $x$ and a (smooth) diffeomorphism $h : U \to V$ where $V$ is an open subset of $\mathbb{R}^n$.

Given a smooth manifold $M$ with a given atlas $(U_x, h_x)$, any other atlas $(V_x, k_x)$ on the topological manifold $M$ is considered an atlas on the smooth manifold $M$ if the identity on $M$ is a diffeomorphism from the manifold defined by the atlas $(U_x, h_x)$ to the manifold defined by the atlas $(V_x, k_x)$.

1.4 Examples

(1) Any open subset of a Euclidean space $\mathbb{R}^n$ is, of course, a smooth manifold, and $C^r$-maps between such manifolds are simply maps for which the required partial derivatives (in the old sense) exist and are continuous.

(2) More generally, an open subset $U$ of a smooth manifold $M$ automatically inherits a structure of a smooth manifold.

(3) Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is a $C^\infty$-function. Define

$$M = \{(x, f(x)) \in \mathbb{R}^{n+1} \mid x \in \mathbb{R}^n\}.$$

Then $M$ is a smooth manifold with a single coordinate neighborhood $M$ and the coordinate function

$$(x, f(x)) \mapsto x.$$

The smooth manifold $M$ is known as the graph of the function $f$. The identity embedding $M \subseteq \mathbb{R}^{n+1}$ is a $C^\infty$-map and the projection

$$M \to \mathbb{R}^n$$

given by

$$(x, f(x)) \mapsto x$$

is a diffeomorphism.

(4) The first “non-trivial” example of a smooth manifold is the $n$-sphere

$$S^n = \Big\{(x_0, \dots, x_n) \in \mathbb{R}^{n+1} \;\Big|\; \sum_{i=0}^{n} x_i^2 = 1\Big\}.$$

For every $x = (x_0, \dots, x_n) \in S^n$, there exists a $k \in \{0, \dots, n\}$ such that $x_k \ne 0$. Choose, for each $x$, such a $k$ and an $\varepsilon$ satisfying $0 < \varepsilon < 1 - \sqrt{1 - x_k^2}$ (so that the open ball $\Omega(h_x(x), \varepsilon)$ below lies inside the open unit ball of $\mathbb{R}^n$). Then we can take

$$h_x : (y_0, \dots, y_n) \mapsto (y_0, \dots, y_{k-1}, y_{k+1}, \dots, y_n),$$

$$U_x = \{y \in S^n \mid y_k x_k > 0\} \cap h_x^{-1}[\Omega(h_x(x), \varepsilon)],$$

where the condition $y_k x_k > 0$ restricts to the hemisphere containing $x$, on which $h_x$ is injective.

(5) If $M$, $N$ are smooth manifolds, then $M \times N$ is naturally a smooth manifold where the coordinate neighborhood of a point $(x, y) \in M \times N$ is $U_x \times U_y$ and the coordinate function is $h_{(x,y)}(z, t) = (h_x(z), h_y(t))$. The product projections $M \times N \to M$, $M \times N \to N$ are $C^\infty$-maps.
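The sphere charts of Example (4) are easy to experiment with numerically. The following Python sketch implements the projection that drops the $k$-th coordinate and its inverse on the hemisphere where that coordinate has a prescribed sign. The function names `chart` and `chart_inverse` are illustrative, and coordinates are indexed from 0 as in the example.

```python
import math


def chart(y, k):
    """Coordinate map of Example (4): drop the k-th coordinate of y."""
    return tuple(c for i, c in enumerate(y) if i != k)


def chart_inverse(z, k, sign):
    """Inverse of chart(., k) on the hemisphere where y_k has the given sign (+1 or -1)."""
    rest = sum(c * c for c in z)
    if rest >= 1.0:
        raise ValueError("z must lie in the open unit ball")
    yk = sign * math.sqrt(1.0 - rest)  # recover the dropped coordinate
    return tuple(z[:k]) + (yk,) + tuple(z[k:])
```

Composing `chart_inverse` for one index with `chart` for another gives the transition maps between these charts, which involve only square roots of positive quantities and are therefore smooth.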

1.5 Smooth partition of unity

Let $M$ be a smooth manifold and let $(U_i)_{i \in I}$ be an open cover of $M$. A smooth partition of unity subordinate to the cover $(U_i)$ is a system of smooth functions $u_i : M \to \mathbb{R}$ such that for every $x \in M$, $0 \le u_i(x) \le 1$,

$$\overline{u_i^{-1}[(0, 1]]} \subseteq U_i$$

(i.e. the support of $u_i$ is contained in $U_i$), and for every $x \in M$ there exists an open neighborhood $V_x$ of $x$ and a finite subset $I_x \subseteq I$ such that for all $y \in V_x$, $i \in I \smallsetminus I_x$, we have $u_i(y) = 0$ and

$$\sum_{i \in I} u_i = 1. \tag{1.5.1}$$

(Note that the expression on the left-hand side of (1.5.1) makes sense because on $V_x$, it can be defined as the sum over $I_x$.)

A refinement of an open cover $(U_i)_{i \in I}$ is an open cover $(V_j)_{j \in J}$ such that for every $j \in J$, there exists an $i \in I$ such that $V_j \subseteq U_i$. A cover $(U_i)_{i \in I}$ is called locally finite if for every $x \in M$, there exists an open neighborhood $V_x$ and a finite subset $I_x \subseteq I$ such that for $i \in I \smallsetminus I_x$, $V_x \cap U_i = \emptyset$.

Lemma. For every open cover $(A_i)_{i \in I}$ of a smooth manifold $M$, there exists an atlas $(U_j, h_j)_{j \in J}$ such that $J$ is countable, the cover $(U_j)$ is locally finite and is a refinement of the cover $(A_i)$, we have $h_j[U_j] = \Omega(o, 3)$, and $(h_j^{-1}[\Omega(o, 1)])_{j \in J}$ is also a cover of $M$.

Proof. Since $M$ has a countable basis by Theorem 1.2 of Chapter 9, any open cover has a countable subcover. Since clearly every point of $M$ has a compact neighborhood, there exists a countable cover by open sets whose closures are compact, which is a refinement of $(A_i)$. Assume, without loss of generality, that $(A_i)_{i \in I}$ itself is such a cover, and that, moreover, $I = \{1, 2, \dots\}$. Now define $K_1 = \overline{A_1}$, and assuming $K_1, \dots, K_i$ are defined, let

$$K_{i+1} = \overline{A_1 \cup \cdots \cup A_r}$$

where $r > i$ is the smallest number such that $K_i \subseteq A_1 \cup \cdots \cup A_r$. (Note that such a number exists by compactness.)

Denote by $X^\circ$ the interior of a set $X \subseteq M$, i.e.

$$X^\circ = M \smallsetminus \overline{(M \smallsetminus X)}.$$

Now setting $K_0 = \emptyset$, one has

$$M = \bigcup_{i=1}^{\infty} \big(K_i \smallsetminus K_{i-1}^\circ\big),$$

and

$$K_{i-1} \subseteq K_i^\circ.$$

For each $x \in K_i \smallsetminus K_{i-1}^\circ$, we can find an open neighborhood $U_x \subseteq K_{i+1}^\circ \smallsetminus K_{i-2}$ which is contained in one of the sets $A_j$, and a diffeomorphism $h_x : U_x \to \Omega(o, 3)$ such that $h_x(x) = o$. Furthermore, there are finite sets $S_i \subseteq K_i \smallsetminus K_{i-1}^\circ$ so that the sets $h_x^{-1}[\Omega(o, 1)]$, $x \in S_i$, cover $K_i \smallsetminus K_{i-1}^\circ$.

The system $(U_x, h_x)_{x \in \bigcup S_i}$ is the required atlas. □

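Theorem. For any open cover $(A_i)$ of a smooth manifold $M$ there exists a smooth partition of unity subordinate to $(A_i)$.

Proof. Let $\lambda : \mathbb{R} \to \mathbb{R}$ be a function defined by $\lambda(t) = e^{-1/t}$ for $t > 0$ and $\lambda(t) = 0$ for $t \le 0$. Then $\lambda$ is smooth (see Exercise (3) of Chapter 1). Hence the function $g : \mathbb{R}^n \to \mathbb{R}$ defined by

$$g(x) = \frac{\lambda(2 - \|x\|)}{\lambda(2 - \|x\|) + \lambda(\|x\| - 1)}$$

is smooth, satisfies $0 \le g(x) \le 1$ for every $x \in \mathbb{R}^n$, and

$$g(x) = 0 \text{ for } \|x\| \ge 2,$$

$$g(x) = 1 \text{ for } \|x\| \le 1.$$

Now take the atlas $(U_j, h_j)$ from the statement of the Lemma, let $g_j = g \circ h_j$ on $U_j$ (extended by 0 to the rest of $M$) and define

$$u_j = \frac{g_j}{\sum_{k \in J} g_k} \quad \text{for } j \in J.$$

(Note that the right-hand side is well defined by local finiteness, and that the denominator is nowhere zero because the sets $h_j^{-1}[\Omega(o, 1)]$ cover $M$.) □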
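The construction in this proof can be carried out numerically in the simplest case $M = \mathbb{R}$. The Python sketch below assumes (this is an illustrative choice, not from the text) the atlas of intervals $(c - 3, c + 3)$ centered at integers $c$ with charts $x \mapsto x - c$; the unit intervals around the integers then cover $\mathbb{R}$. The names `lam`, `g` and `partition_values` are hypothetical.

```python
import math


def lam(t):
    """The function exp(-1/t) for t > 0, extended by 0 for t <= 0."""
    return math.exp(-1.0 / t) if t > 0 else 0.0


def g(x):
    """One-dimensional bump: equal to 1 for |x| <= 1 and to 0 for |x| >= 2."""
    r = abs(x)
    return lam(2 - r) / (lam(2 - r) + lam(r - 1))


def partition_values(x, centers):
    """Values u_c(x) of the partition of unity built from the bumps g(x - c)."""
    bumps = [g(x - c) for c in centers]
    total = sum(bumps)  # positive when some center lies within distance 1 of x
    return [b / total for b in bumps]
```

For a point $x$ and a range of integer centers extending at least 2 beyond $x$ on both sides, the returned values lie in $[0, 1]$, only finitely many are non-zero, and they sum to 1.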

2 Tangent vectors, vector fields and differential forms

The notion of a tangent vector to a smooth manifold models the geometric intuition (for example, the instantaneous velocity of a point moving in the manifold). As we learned in the previous section, however, we must model everything in terms of coordinate neighborhoods.

2.1 Tangent vectors

Let $M$ be a smooth $n$-manifold and let $x \in M$. Consider the set $\widetilde{TM}_x$ of all triples $(U, h, v)$ where $U$ is an open neighborhood of $x$, $h : U \to V$ is a diffeomorphism for some $V \subseteq \mathbb{R}^n$ open, and $v \in \mathbb{R}^n$.

Now introduce the following equivalence relation on $\widetilde{TM}_x$: We put

$$(U, h, v) \sim (V, k, w)$$

if there exists an open neighborhood $W$ of $x$ contained in $U \cap V$ such that if we denote by $f$ the composition

$$h[W] \xrightarrow{\ h^{-1}\ } W \xrightarrow{\ k\ } k[W],$$

then

$$Df_{h(x)}(v) = w.$$

(Recall that $D$ denotes the total differential, see 3.2 of Chapter 3.) It is easy to verify that this is indeed an equivalence relation. The set of equivalence classes of $\widetilde{TM}_x$ is denoted by $TM_x$ and its elements are called tangent vectors to $M$ at $x$. A representative of a $\sim$-equivalence class will be called a representative of a tangent vector. The tangent vector represented by a triple $(U, h, v)$ will be sometimes denoted by $[(U, h, v)]$. When this gets too cumbersome, we will also refer to $v$ as the vector $[(U, h, v)]$ in the coordinate system $h : U \to V$.

Lemma. Let $u \in TM_x$. Then for every open neighborhood $U$ of $x$ and every diffeomorphism $h : U \to V$ for $V \subseteq \mathbb{R}^n$ open, there exists a unique representative $(U, h, v)$ of the tangent vector $u$.

Proof. If $(V, k, w)$ is any representative of $u$, put

$$v = \big(D(k \circ h^{-1})_{h(x)}\big)^{-1}(w).$$

(Note that $k \circ h^{-1}$ is defined in a neighborhood of $h(x)$.) By definition, this proves existence. To prove uniqueness, note that by definition, clearly we cannot have $(U, h, v) \sim (U, h, v')$ for $v \ne v' \in \mathbb{R}^n$. □
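To see the equivalence relation at work, take $M$ to be the plane without the origin, $h$ the identity chart and $k$ polar coordinates $(r, \theta)$ near a point. The sketch below converts the representative $v$ in the chart $h$ into the representative $w$ in the chart $k$ by applying the Jacobian of $k \circ h^{-1}$. The choice of charts and the names `polar_jacobian` and `to_polar_representative` are assumptions made for illustration only.

```python
import math


def polar_jacobian(x, y):
    """Jacobian matrix of (x, y) -> (r, theta) at a point other than the origin."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return ((x / r, y / r), (-y / r2, x / r2))


def to_polar_representative(v, x, y):
    """Send the Cartesian representative v at (x, y) to the polar representative w."""
    J = polar_jacobian(x, y)
    return (
        J[0][0] * v[0] + J[0][1] * v[1],
        J[1][0] * v[0] + J[1][1] * v[1],
    )
```

At the point $(2, 0)$, the Cartesian vector $(0, 1)$ corresponds to $(0, 1/2)$ in polar coordinates: the same tangent vector, described by different triples.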

Note that by the lemma it immediately follows that $TM_x$ has a natural structure of an $\mathbb{R}$-vector space, and that moreover, this vector space is $n$-dimensional. In effect, let $U$ be an open neighborhood of $x$ and let $h : U \to V$ be a diffeomorphism onto an open subset of $\mathbb{R}^n$. Let

$$[(U, h, v)] + [(U, h, w)] = [(U, h, v + w)],$$

$$\lambda [(U, h, v)] = [(U, h, \lambda v)]$$

where $\lambda \in \mathbb{R}$. Correctness of the definitions of these operations (i.e. independence of the results of chosen representatives) follows from the linearity of the differential in $\mathbb{R}^n$.

As noted above, a coordinate system $h : U \to V$ at $x \in M$ identifies $TM_x$ with $\mathbb{R}^n$. Putting $h = (h_1, \dots, h_n)$, $h_i : U \to \mathbb{R}$, it is useful to denote the ordered basis of $TM_x$ corresponding to the standard basis of $\mathbb{R}^n$ by

$$\Big(\frac{\partial}{\partial h_1}, \dots, \frac{\partial}{\partial h_n}\Big). \tag{*}$$

The reason for this notation is that if $f : U \to \mathbb{R}$ is a smooth function, in the spirit of the chain rule, it makes sense to write

$$\frac{\partial}{\partial h_i} f(x) = \frac{\partial f(x)}{\partial h_i} = \frac{\partial (f \circ h^{-1})}{\partial x_i}(h(x))$$

where on the right-hand side, $x_i$ denotes the standard $i$’th coordinate of $\mathbb{R}^n$, as used in Chapter 3.

It is also useful to notice that when $U \subseteq \mathbb{R}^n$ is an open subset, $x \in U$, we have a canonical identification

$$\mathbb{R}^n \cong TU_x$$

via

$$v \mapsto [(U, \mathrm{Id}, v)].$$

2.2 The total differential on manifolds

Let $M, N$ be smooth manifolds and let $f : M \to N$ be a $C^1$-map. Let $x \in M$. We define the total differential of $f$ at $x$

$$Df_x : TM_x \to TN_{f(x)}$$

as follows: Let $V$ be an open neighborhood of $f(x)$ and let $k : V \to W$ be a diffeomorphism where $W \subseteq \mathbb{R}^m$ is open. Then, for a representative $(U, h, v)$ of a tangent vector at $x$, define

$$Df_x([(U, h, v)]) = [(V, k, D(k \circ f \circ h^{-1})_{h(x)}(v))].$$

This definition is correct by the chain rule (in Euclidean spaces) and $Df_x$ is linear by linearity of differentials (in Euclidean spaces). Additionally, note that it generalizes the definition of total differential 3.2 of Chapter 3 when we identify the tangent space of an open subset of $\mathbb{R}^n$ at every point with $\mathbb{R}^n$.

If we have a real $C^1$-function $f : U \to \mathbb{R}$ from some $U \subseteq M$ open, we usually write $df(x)$ instead of $Df_x$. From this point of view, $df$ can also be viewed as a $C^1$ 1-form (see 2.3 below). Similar statements, of course, hold for $C^r$ and smooth functions. In particular, it is useful to note that if $h : U \to V$ is a coordinate system at $x \in M$, and $h = (h_1, \dots, h_n)$, then

$$(dh_1, \dots, dh_n)$$

is a basis of $TM_x^*$ dual to the basis 2.1 (*) of $TM_x$.

In preparation for the next subsection, note also that by the properties of duals and exterior products (see Chapter 11), we have canonical linear maps

$$Df_x^* : TN_{f(x)}^* \to TM_x^*,$$

$$\Lambda^k(Df_x^*) : \Lambda^k(TN_{f(x)}^*) \to \Lambda^k(TM_x^*).$$

A smooth map $f : M \to N$ is called an immersion (resp. submersion) if for every $x \in M$, $Df_x$ is injective (resp. onto). An immersion which is a homeomorphism onto its image is also called an embedding or an inclusion of a submanifold. We will then also refer to $f[M]$ as a submanifold of $N$.

2.3 Smooth vector fields and differential forms

Let $M$ be a smooth $n$-manifold. Then a vector field $v$ on $M$ (resp. a $k$-form $\omega$ on $M$) is a map assigning to each $x \in M$ an element $v(x) \in TM_x$ (resp. $\omega(x) \in \Lambda^k(TM_x^*)$). A differential form is a common term for $k$-forms for any $k$. A vector field (resp. a $k$-form) is called $C^r$ if for every $x \in M$ there exists an open neighborhood $U$ and a diffeomorphism $h : U \to V$ for $V \subseteq \mathbb{R}^n$ open such that the map defined for $y \in U$ by

$$y \mapsto Dh_y(v(y)) \in \mathbb{R}^n$$

resp.

$$y \mapsto \Lambda^k\big((Dh_y^*)^{-1}\big)(\omega(y)) \in \Lambda^k((\mathbb{R}^n)^*)$$

is a $C^r$ map, where the right-hand side uses the identification of the tangent spaces of an open subset of $\mathbb{R}^n$ at the end of Section 2.1. $C^\infty$ vector fields and $k$-forms are also called smooth.

It is useful to note that if $h : U \to V$ is a smooth coordinate system at some point $x \in M$, $h = (h_1, \dots, h_n)$, then immediately from the definition, the vector space of all smooth vector fields on $U$ is

$$\Big\{\sum_{i=1}^{n} f_i \cdot \frac{\partial}{\partial h_i} \;\Big|\; f_i : U \to \mathbb{R} \text{ smooth functions}\Big\},$$

and the space of all smooth 1-forms on $U$ is

$$\Big\{\sum_{i=1}^{n} f_i \cdot dh_i \;\Big|\; f_i : U \to \mathbb{R} \text{ smooth functions}\Big\}.$$

Thus, the smooth vector field or 1-form is completely determined by the $n$-tuple of smooth functions $(f_1, \dots, f_n)$, and vice versa, the functions $f_i$ are determined by the vector field (resp. differential form) and the coordinate system.

Using Proposition 3.4 of Chapter 11, we can extend this to $k$-forms. The space of all smooth $k$-forms on $U$ is isomorphic to

$$\Big\{\sum_{1 \le i_1 < \cdots < i_k \le n} f_{i_1, \dots, i_k}\, dh_{i_1} \wedge \cdots \wedge dh_{i_k} \;\Big|\; f_{i_1, \dots, i_k} : U \to \mathbb{R} \text{ smooth}\Big\},$$

and the smooth functions $f_{i_1, \dots, i_k}$ are completely determined by a smooth $k$-form.

It is also useful to realize that if $(U_i, h_i)$ is a smooth atlas of $M$, and we have smooth vector fields $v_i$ resp. smooth $k$-forms $\omega_i$ on each $U_i$ such that

$$v_i|_{U_i \cap U_j} = v_j|_{U_i \cap U_j}$$

resp.

$$\omega_i|_{U_i \cap U_j} = \omega_j|_{U_i \cap U_j},$$

then this uniquely determines a smooth vector field resp. smooth $k$-form on $M$. In other words, smooth vector fields and $k$-forms can be described by a collection of local descriptions in the charts of an atlas.

Analogous statements are, of course, true with “smooth” replaced by $C^r$.

2.4 Products and functoriality

For a smooth vector field $v$ and a smooth $k$-form $\omega$ on $M$, and a smooth function $f : M \to \mathbb{R}$, the product $f \cdot v$ is again a smooth vector field, and $f \cdot \omega$ is a smooth $k$-form (here the products are evaluated point-wise, i.e. $(f \cdot v)(x) = f(x) v(x)$ for all $x \in M$, and similarly for the differential form). Additionally, for a smooth $\ell$-form $\eta$ on $M$, we have a smooth $(k + \ell)$-form $\omega \wedge \eta$ defined using the exterior product 3.6 of Chapter 11:

$$(\omega \wedge \eta)(x) = \omega(x) \wedge \eta(x) \in \Lambda^{k+\ell}(TM_x^*).$$

There are, also, analogous statements for $C^r$.

Now let $f : M \to N$ be a smooth map. Using the maps constructed at the end of 2.2, for a smooth $k$-form $\omega$ on $N$, we obtain a smooth $k$-form $f^*\omega$ on $M$. Explicitly, for $x \in M$,

$$(f^*\omega)(x) = \Lambda^k(Df_x^*)(\omega(f(x))) \in \Lambda^k(TM_x^*).$$

This correspondence, of course, sends the identity to the identity, and $(f \circ g)^*(\omega) = g^*(f^*(\omega))$. Thus, we conclude that differential forms are contravariant in smooth maps (in the sense of 1.2 of Chapter 11). There are, of course, analogous statements for smooth replaced by $C^r$.
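A concrete pullback may help here. Take the map $f : \mathbb{R} \to \mathbb{R}^2$, $f(t) = (\cos t, \sin t)$, and the 1-form $\omega = -y\,dx + x\,dy$ on $\mathbb{R}^2$; both are illustrative choices, not examples from the text. The pullback $f^*\omega$ is a 1-form on $\mathbb{R}$, so it equals $c(t)\,dt$ for a function $c$, and the Python sketch below evaluates $c(t)$ by applying $\omega(f(t))$ to the derivative $f'(t)$.

```python
import math


def omega(x, y):
    """Coefficients (a, b) of the 1-form a dx + b dy = -y dx + x dy at (x, y)."""
    return (-y, x)


def f(t):
    """The map t -> (cos t, sin t) from the line to the plane."""
    return (math.cos(t), math.sin(t))


def df(t):
    """Derivative of f at t, i.e. the image of d/dt under the total differential."""
    return (-math.sin(t), math.cos(t))


def pullback_coefficient(t):
    """The function c with (f^* omega)(t) = c(t) dt."""
    a, b = omega(*f(t))
    dx, dy = df(t)
    return a * dx + b * dy
```

Here $c(t) = \sin^2 t + \cos^2 t = 1$, so in this example $f^*\omega = dt$.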

It may be surprising that vector fields are neither covariant nor contravariant in smooth maps: One can see this by realizing that vectors are covariant, while smooth functions are contravariant. Vector fields can be made, however, covariant in diffeomorphisms: Let $f : M \to N$ be a diffeomorphism and let $v$ be a smooth vector field on $M$. Then we can define a smooth vector field $f_* v$ on $N$ by

$$(f_* v)(x) = Df_{f^{-1}(x)}(v(f^{-1}(x))) \in TN_x.$$


2.4.1 Comment

The meaning of the symbols $f^*$ and $f_*$ here is related to, but not quite the same as in 1 of Chapter 11. Note that, for example, in the current situation, $f$ is not a linear map. Nevertheless, using the same symbol in both situations is quite standard in this case.

2.5 A Slice Theorem

The attentive reader has noticed a similarity of this material with our remarks on substitution in differential equations. In fact, much of what we observed in Section 7 of Chapter 6 can be done coordinate-free. Let us make this concrete in one aspect, which will be instructive as a contrast with what we will do with differential forms:

Proposition. Let $v$ be a smooth vector field on a smooth $n$-manifold $M$, and suppose $v(x) \ne 0$ for all $x \in M$ (we speak of a non-vanishing vector field). Let $x \in M$. Then there exists a coordinate system $h : U \to V$ at $x$ such that the vector field $h_*(v|_U)$ on $V$ is constant and equal to $(1, 0, \dots, 0) \in \mathbb{R}^n$ (using the identification from the end of 2.1).

Proof. Let $k : U_1 \to V_1$ be any coordinate system at $x$ and consider the vector field $k_* v$. We can treat this vector field as a system of differential equations on $V_1$: For a smooth function $f : (a, b) \to V_1$, the equation is

$$f'(t) = \sum_{i=1}^{n} (k_* v)(f(t))_i\, \frac{\partial}{\partial x_i}, \tag{*}$$

where $\partial / \partial x_i$ denotes the $i$’th standard basis vector of $\mathbb{R}^n$; in other words, $f'(t) = (k_* v)(f(t))$.

Now we know that this system has a smooth solution in a neighborhood of a point of $V_1$. Specifically, consider vectors $w_2, \dots, w_n \in \mathbb{R}^n$ such that

$$(k_* v)(k(x)), w_2, \dots, w_n \text{ are linearly independent (hence form a basis of } \mathbb{R}^n\text{)}. \tag{**}$$

Then by Theorem 4.1 of Chapter 6, there exists an open neighborhood $V_2 \subseteq \mathbb{R}^n$ of $o$ and a smooth map $\phi : V_2 \to V_1$ such that

$$\phi(0, a_2, \dots, a_n) = k(x) + \sum_{i=2}^{n} a_i w_i$$

and for any constants $a_2, \dots, a_n$ such that $(t, a_2, \dots, a_n) \in V_2$, the function

$$f(t) = \phi(t, a_2, \dots, a_n)$$

satisfies the equation (*). Additionally, by our assumption (**), the map $\phi$ is regular at $o$, so by the Inverse Function Theorem 7.3 of Chapter 3, there exists an open neighborhood $V$ of $o$ such that the restriction $\phi|_V : V \to \phi[V]$ is a diffeomorphism. Now put

$$U = k^{-1}[\phi[V]]$$

and

$$h = (\phi^{-1} k)|_U. \qquad □$$

3 The exterior derivative and integration of differential forms

3.1 The exterior derivative

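The set of all smooth $k$-forms on a smooth $n$-manifold $M$ is denoted by $\Omega^k(M)$. It is clearly a vector space over $\mathbb{R}$. We will now construct a linear map

$$d : \Omega^k(M) \to \Omega^{k+1}(M). \tag{1}$$

In terms of a smooth coordinate system $h : U \to V$, one writes

$$\begin{aligned}
d\Big(\sum_{1 \le i_1 < \cdots < i_k \le n} f_{i_1, \dots, i_k}\, dh_{i_1} \wedge \cdots \wedge dh_{i_k}\Big)
&= \sum_{1 \le i_1 < \cdots < i_k \le n} df_{i_1, \dots, i_k} \wedge dh_{i_1} \wedge \cdots \wedge dh_{i_k} \\
&= \sum_{1 \le i_1 < \cdots < i_k \le n}\ \sum_{j=1}^{n} \frac{\partial f_{i_1, \dots, i_k}}{\partial h_j}\, dh_j \wedge dh_{i_1} \wedge \cdots \wedge dh_{i_k}.
\end{aligned} \tag{2}$$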

Lemma. The formula (2) does not depend on the choice of coordinate system.

Proof. One first notices that for a smooth function $f$, $df$ is independent of coordinate system by the chain rule, and that for smooth functions $f, g$, one has the Leibniz rule

$$d(fg) = f\,dg + g\,df.$$

Now let $g : U \to W$ be another coordinate system. By the chain rule, we have

$$dh_i = \sum_{j=1}^{n} \frac{\partial h_i}{\partial g_j}\, dg_j.$$

Now differentiating

$$f_{i_1, \dots, i_k}\, dh_{i_1} \wedge \cdots \wedge dh_{i_k} \tag{3}$$

in the $h$-coordinate system and converting to the $g$-coordinate system, we obtain

$$\sum df_{i_1, \dots, i_k} \wedge \Big(\frac{\partial h_{i_1}}{\partial g_{j_1}} \cdots \frac{\partial h_{i_k}}{\partial g_{j_k}}\Big)\, dg_{j_1} \wedge \cdots \wedge dg_{j_k} \tag{4}$$

where the sum is over all possible choices $1 \le j_1, \dots, j_k \le n$ (the numbers $j_p$ do not have to form an increasing sequence in $p$).

Now converting (3) to the $g$-coordinate system first, we obtain

$$\sum f_{i_1, \dots, i_k} \Big(\frac{\partial h_{i_1}}{\partial g_{j_1}} \cdots \frac{\partial h_{i_k}}{\partial g_{j_k}}\Big)\, dg_{j_1} \wedge \cdots \wedge dg_{j_k}. \tag{5}$$

Now differentiating (5) in the $g$-coordinate system, we must form

$$d\Big(f_{i_1, \dots, i_k} \Big(\frac{\partial h_{i_1}}{\partial g_{j_1}} \cdots \frac{\partial h_{i_k}}{\partial g_{j_k}}\Big)\Big) \tag{6}$$

and then multiply by $dg_{j_1} \wedge \cdots \wedge dg_{j_k}$. However, by the Leibniz rule, we may differentiate $f_{i_1, \dots, i_k}$ and the partial derivative factors separately, and the key point is that when we differentiate

$$\frac{\partial h_{i_p}}{\partial g_{j_p}},$$

we get a double partial derivative

$$\frac{\partial^2 h_{i_p}}{\partial g_{j_p}\, \partial g_{j'_p}}.$$

In the resulting sum, however, each such term will appear twice, with the attached $dg_{j_p}$ and $dg_{j'_p}$ terms swapped. Thus, by the rules of computation in the exterior algebra, the two terms in each such pair appear with opposite signs, and hence cancel out. □


3.2 The de Rham complex, de Rham cohomology and Betti numbers

Lemma. We have

$$d \circ d = 0 : \Omega^k(M) \to \Omega^{k+2}(M).$$

Proof. Using formula (2) from 3.1 in a coordinate system $h : U \to V$, the $(k + 2)$-form

$$d\, d\Big(\sum_{1 \le i_1 < \cdots < i_k \le n} f_{i_1, \dots, i_k}\, dh_{i_1} \wedge \cdots \wedge dh_{i_k}\Big)$$

is a sum of expressions of the form

$$\frac{\partial^2 f_{i_1, \dots, i_k}}{\partial h_j\, \partial h_\ell}\, dh_\ell \wedge dh_j \wedge dh_{i_1} \wedge \cdots \wedge dh_{i_k},$$

but each of these terms appears twice with $j$ and $\ell$ in opposite orders, and therefore with opposite signs, and hence the entire expression vanishes (of course, the terms with $j = \ell$ vanish immediately). □
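The cancellation in this proof can be watched on polynomial forms on $\mathbb{R}^n$ in the standard coordinates. In the Python sketch below (an illustrative encoding, not from the text), a polynomial is a dictionary mapping exponent tuples to coefficients, a $k$-form is a dictionary mapping increasing 0-based index tuples to polynomials, and `d` implements formula (2) of 3.1. All names are hypothetical.

```python
def partial(poly, j):
    """Partial derivative in variable j of a polynomial {exponent tuple: coefficient}."""
    out = {}
    for exps, c in poly.items():
        if exps[j] > 0:
            lowered = list(exps)
            lowered[j] -= 1
            key = tuple(lowered)
            out[key] = out.get(key, 0) + c * exps[j]
    return out


def add_scaled(p, q, s):
    """Return the polynomial p + s * q, dropping zero coefficients."""
    out = dict(p)
    for exps, c in q.items():
        value = out.get(exps, 0) + s * c
        if value:
            out[exps] = value
        else:
            out.pop(exps, None)
    return out


def d(form, n):
    """Exterior derivative of a form {increasing index tuple: polynomial} on R^n."""
    out = {}
    for I, poly in form.items():
        for j in range(n):
            if j in I:
                continue  # dx_j ^ dx_j = 0
            dp = partial(poly, j)
            if not dp:
                continue
            # Moving dx_j past the smaller indices of I costs one sign change each.
            position = sum(1 for i in I if i < j)
            K = tuple(sorted(I + (j,)))
            merged = add_scaled(out.get(K, {}), dp, (-1) ** position)
            if merged:
                out[K] = merged
            else:
                out.pop(K, None)
    return out
```

For example, with `f = {(): {(2, 1, 0): 1, (0, 1, 3): 1}}`, which encodes the function $x_0^2 x_1 + x_1 x_2^3$ as a 0-form, the call `d(d(f, 3), 3)` returns the empty dictionary, i.e. the zero form.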

We therefore obtain a sequence of vector spaces and linear maps

$$\Omega^0(M) \xrightarrow{\ d\ } \Omega^1(M) \xrightarrow{\ d\ } \cdots \xrightarrow{\ d\ } \Omega^n(M) \tag{*}$$

(note that we can only have $\Omega^k(M) \ne 0$ for $0 \le k \le n$) such that

$$d \circ d = 0.$$

The sequence (*) is called the de Rham complex of the smooth manifold $M$, and is denoted by $\Omega(M)$. A $k$-form $\omega$ is called closed if

$$d\omega = 0$$

and is called exact if there exists a $(k - 1)$-form $\eta$ such that

$$\omega = d\eta.$$

(We consider the 0-form 0 exact.) Then the set of all closed $k$-forms is a vector subspace of $\Omega^k(M)$ which is denoted by $Z^k(M)$, and the set of all exact $k$-forms is then a vector subspace of $Z^k(M)$ which is denoted by $B^k(M)$.

The quotient $\mathbb{R}$-vector space

$$H^k_{DR}(M) = Z^k(M)/B^k(M)$$

is called the $k$’th de Rham cohomology vector space of $M$. We write

$$b_k(M) = \dim(H^k_{DR}(M))$$

and call this the $k$’th Betti number of $M$ (it can, of course, be infinite, see Exercise (17)).

Betti numbers are fundamental characteristics of manifolds. For example, they are computable in practice, and they turn out to be topological invariants, which means that two homeomorphic manifolds have the same Betti numbers. Also, Betti numbers can be defined for topological manifolds, and in fact, for all topological spaces. This leads to an area of mathematics called algebraic topology (see, for example, [3, 13, 14, 20]). Unfortunately, in this text, a systematic treatment of Betti numbers would take us too far afield, and we will confine ourselves to a few basic exercises (Exercises (11), (12), (13), (14), (15), (16), (17)).

4 Integration of differential forms and Stokes’ Theorem

4.1 Orientation of smooth manifolds

Let $M$ be a smooth $n$-manifold. An orientation of $M$ is a choice of orientation of the space $(TM_x)^*$ for each $x \in M$ such that for each $x \in M$ there exists a coordinate system $h : U \to V$ at $x$ for which the orientation is constant when we use the identification from the end of 2.1. Two orientations of $M$ are considered equal if they are equal at every point $x \in M$. An orientation may not exist (see Exercise (18) below). A smooth manifold for which there exists an orientation is called orientable.

Recall from 3.5 of Chapter 11 that a non-zero element of $\Lambda^n((TM_x)^*)$ determines an orientation of $(TM_x)^*$. Hence a form $\omega \in \Omega^n(M)$ such that $\omega(x) \neq 0$ for all $x \in M$ determines an orientation of $M$.

Lemma. Every orientation of a smooth $n$-manifold $M$ is determined by a form $\omega \in \Omega^n(M)$ such that $\omega(x) \neq 0$ for all $x \in M$ (which is then often called the volume form). Moreover, two such forms $\omega, \eta$ determine the same orientation if and only if there exists a smooth positive (in particular, nowhere vanishing) function $k : M \to \mathbb{R}$ such that $\omega = k \cdot \eta$.

Proof. To prove the first statement (existence), take a smooth atlas $(U_i, h_i)$ such that a form $\omega_i$ as required exists for the restriction of our orientation to $U_i$ (such an atlas exists by the definition of orientation). Now take a smooth partition of unity $u_i$ subordinate to the cover $U_i$, and put

$$\omega = \sum_i u_i \omega_i.$$

To prove the second statement, let $\omega$, $\eta$ determine the same orientation. Choose a smooth atlas $(U_i, h_i)$. Then $\omega|_{U_i} = f_i\, dh_1 \wedge \dots \wedge dh_n$, $\eta|_{U_i} = g_i\, dh_1 \wedge \dots \wedge dh_n$. Define $k(x) = f_i(x)/g_i(x)$ when $x \in U_i$. □

Let $M$, $N$ be oriented $n$-manifolds and let $\omega$ be a volume form on $N$ specifying the orientation. We say that a diffeomorphism $\phi : M \to N$ preserves orientation if the volume form $\phi^*\omega$ specifies the given orientation on $M$.

4.2 Integration

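Let $\omega$ be a smooth $n$-form on $\mathbb{R}^n$. Then we may write

$$\omega = f\, dx_1 \wedge \dots \wedge dx_n$$

for a smooth function $f : \mathbb{R}^n \to \mathbb{R}$. Let $B \subseteq \mathbb{R}^n$ be a Borel set such that $\overline{B}$ is compact. Define

$$\int_B \omega = \int_B f\, dx_1 \dots dx_n. \tag{1}$$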

Now let $M$ be a smooth oriented $n$-manifold and let $\omega$ be a smooth $n$-form on $M$. Let $B \subseteq M$ be a Borel set such that $\overline{B}$ is compact. Then there exists a smooth atlas $(U_i, h_i)_{i \in I}$ of $M$ such that $h_i$ preserves orientation if we take the standard orientation $dx_1 \wedge \dots \wedge dx_n$ on $h_i[U_i]$, and such that there exists a finite subset $F \subseteq I$ where $U_i \cap B = \emptyset$ for $i \notin F$ (take an orientation-preserving atlas, choose a finite subcover containing $\overline{B}$ and intersect the remaining charts with $M \setminus \overline{B}$). Now put

$$\int_B \omega = \sum_{i \in F} \int_{h_i[B \cap U_i]} u_i \cdot (h_i^{-1})^*\omega \tag{2}$$

where $u_i$ is a smooth partition of unity subordinate to the cover $U_i$ (recall 2.4).

Lemma. The number (2) does not depend on the choice of the atlas $(U_i, h_i)$ (subject to the given conditions).

Proof. First note that if $U, V \subseteq \mathbb{R}^n$ are open sets such that $B \subseteq U$, $\phi : U \to V$ is an orientation-preserving diffeomorphism, and $\omega$ is a smooth $n$-form, then

$$\int_B \omega = \int_{\phi[B]} (\phi^{-1})^*\omega \tag{3}$$

as defined by (1), by the Substitution Theorem 7.9 of Chapter 5.

Now let $(U_i, h_i)_{i \in I}$, $(U'_i, h'_i)_{i \in I'}$ be two atlases as in the statement of the lemma. First, note that by the (finite) additivity of the integral, we may assume $I = I'$, $U_i = U'_i$. We may still have $h_i \neq h'_i$, but the invariance of the integral under this choice follows from (3). □

Remark: Note that our notation is slightly inconsistent. In (2), we should display the orientation of the manifold $M$. In (1), on the other hand, we assume the standard orientation of $\mathbb{R}^n$, i.e. the orientation defined by the $n$-form $dx_1 \wedge \dots \wedge dx_n$. A reversal of orientation results, of course, in a reversal of sign.

4.3 Regions with corners

4.3.1 Let $M$ be an oriented smooth $n$-manifold. By a region with corners in $M$ we mean a compact subset $K \subseteq M$ such that for every $x \in K \setminus K^\circ$, there exists an orientation-preserving coordinate system $h : U \to V$ at $x$ in $M$ such that

$$V = (-1,1)^{\times n}$$

and there exists a $k \in \{0, \dots, n\}$ such that

$$h[K \cap U] = \langle 0,1)^{\times k} \times (-1,1)^{\times (n-k)} \tag{1}$$

or

$$h[K \cap U] = (-1,1)^{\times n} \setminus \bigl((-1,0)^{\times k} \times (-1,1)^{\times (n-k)}\bigr). \tag{2}$$

(We use the symbol $S^{\times n}$ for the $n$-th Cartesian power of a set $S$ here to reduce the chance of confusion.) A special case worth pointing out is the case when one always has $k \le 1$. In this case, we call $K$ a compact $n$-dimensional submanifold with boundary. Note that then our coordinate system gives $K \setminus K^\circ$ the structure of an $(n-1)$-dimensional compact submanifold of $M$.

4.3.2 Integrating over the boundary

Now let $\eta$ be a smooth $(n-1)$-form on $M$. Consider an atlas $(U_i, h_i)_{i \in I}$ of $M$ such that there exists a finite subset $F \subseteq I$ where $K \cap U_i = \emptyset$ when $i \notin F$, and $(U_i, h_i)$ satisfy (1) or (2) when $i \in F$. Let $u_i$ be a smooth partition of unity subordinate to the cover $U_i$. Let $F_1$, resp. $F_2$ denote the set of all $i \in F$ for which $h = h_i$ satisfies (1) (resp. (2)). Denote by

$$c_j : \mathbb{R}^{n-1} \to \mathbb{R}^n$$

the map given by

$$(x_1, \dots, x_{n-1}) \mapsto (x_1, \dots, x_{j-1}, 0, x_j, \dots, x_{n-1}).$$

Then define

$$\int_{\partial K} \eta = \sum_{i \in F_1} \sum_{j=1}^{k} (-1)^j \int_{\langle 0,1)^{\times (k-1)} \times (-1,1)^{\times (n-k)}} u_i \cdot (h_i^{-1} c_j)^*\eta + \sum_{i \in F_2} \sum_{j=1}^{k} (-1)^j \int_{(-1,0\rangle^{\times (k-1)} \times (-1,1)^{\times (n-k)}} u_i \cdot (h_i^{-1} c_j)^*\eta. \tag{*}$$

It can be proved that the expression (*) does not depend on the choice of atlas with the properties required above. However, this is a bit tedious and we will omit the proof, as it is not needed for proving Stokes' Theorem. When stating the theorem in the next paragraph, we will simply assume that an atlas as above has been chosen.

It is worth noting, however, that in the special case of a compact $n$-dimensional submanifold with boundary, it follows that the integral defined by (*) coincides with

$$\int_{\partial K} \iota^*\eta$$

where $\iota : \partial K \to M$ is the inclusion of a submanifold, as discussed above. Note however that then one must be careful about orientation. The correct orientation at a point $x \in \partial K$, $x \in U_i$, is given by

$$-(h_i^{-1})^*(dx_2 \wedge \dots \wedge dx_n) \in \Lambda^{n-1}(T(\partial K)_x)^*.$$

(The minus sign comes from the fact that the added first vector of the ordered basis representing the orientation of $TM_x$ should point "outside" from the boundary, which, in our setup, happens to be in the negative direction.)

4.4 Theorem. (Stokes' Theorem) Let $M$ be a smooth $n$-manifold and let $\eta \in \Omega^{n-1}(M)$. Let $K$ be a region with corners in $M$ and let $(U_i, h_i)$ and $u_i$ be chosen as in 4.3. Then

$$\int_{\partial K} \eta = \int_K d\eta. \tag{*}$$

Proof. The statement and the proof are both straightforward generalizations of our treatment of Green's Theorem. (In fact, the part of the proof dealing with substitutions becomes simpler, since Stokes' Theorem is stated in terms of differential forms, which are contravariant.) Again, the key step is to prove the result for the case of a cube: $M = \mathbb{R}^n$,

$$K = \prod_{j=1}^{n} \langle a_j, b_j \rangle,$$

$a_j < b_j$. In this case, suppose without loss of generality that

$$\eta = f \cdot dx_1 \wedge \dots \wedge dx_{j-1} \wedge dx_{j+1} \wedge \dots \wedge dx_n.$$

Then

$$d\eta = (-1)^{j+1} \frac{\partial f}{\partial x_j}\, dx_1 \wedge \dots \wedge dx_n.$$

Then by Fubini's Theorem and the Fundamental Theorem of Calculus in one variable,

$$\int_K d\eta = \int_{\prod_{\ell \neq j} \langle a_\ell, b_\ell \rangle} (-1)^{j+1} \bigl( f(x_1, \dots, x_{j-1}, b_j, x_{j+1}, \dots, x_n) - f(x_1, \dots, x_{j-1}, a_j, x_{j+1}, \dots, x_n) \bigr)\, dx_1 \dots dx_{n-1} = \int_{\partial K} \eta.$$

(Note that on the right-hand side, the summands corresponding to coordinates other than the $j$'th coordinate vanish.)

Now in the general case, one proves the theorem by considering each of the summands 4.3.2 (*) separately, applying the case of the cube to the smooth $(n-1)$-form

$$u_i \cdot (h_i^{-1} c_j)^*\eta.$$

When $i \in F_1$, one uses the cube

$$\langle 0,1 \rangle^{\times k} \times \langle -1,1 \rangle^{\times (n-k)}.$$

When $i \in F_2$, one sums over the cubes

$$\langle -1,0 \rangle^{\times (\ell-1)} \times \langle 0,1 \rangle \times \langle -1,1 \rangle^{\times (n-\ell)}$$

with $\ell = 1, \dots, k$. Again, the summands not relevant to the statement are 0 or appear twice with opposite signs. □
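As an informal numerical sanity check of the cube case (it is not part of the proof), one can compare both sides of Stokes' formula for $n = 2$, where it reduces to Green's Theorem on a rectangle. The 1-form $\eta = P\,dx + Q\,dy$ with $P = -y^3$, $Q = x^3$, the rectangle $\langle 0,1\rangle \times \langle 0,2\rangle$, the counterclockwise traversal of the boundary, and all function names below are illustrative assumptions of this sketch; the integrals are approximated by the midpoint rule.

```python
def midpoint(f, a, b, n=400):
    """Midpoint-rule approximation of the integral of f over <a, b>."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Hypothetical example: eta = P dx + Q dy on K = <0,1> x <0,2>.
def P(x, y):
    return -y ** 3

def Q(x, y):
    return x ** 3

def d_eta(x, y):
    # Coefficient of dx ^ dy in d(eta): dQ/dx - dP/dy.
    return 3 * x ** 2 + 3 * y ** 2

A1, B1, A2, B2 = 0.0, 1.0, 0.0, 2.0

def boundary_integral():
    """Integral of eta over the boundary, traversed counterclockwise."""
    bottom = midpoint(lambda x: P(x, A2), A1, B1)   # y = A2, x increasing
    right = midpoint(lambda y: Q(B1, y), A2, B2)    # x = B1, y increasing
    top = -midpoint(lambda x: P(x, B2), A1, B1)     # y = B2, x decreasing
    left = -midpoint(lambda y: Q(A1, y), A2, B2)    # x = A1, y decreasing
    return bottom + right + top + left

def interior_integral():
    """Iterated (Fubini) integral of d(eta) over K."""
    return midpoint(lambda x: midpoint(lambda y: d_eta(x, y), A2, B2), A1, B1)

lhs = boundary_integral()
rhs = interior_integral()
print(lhs, rhs)
```

For this particular choice both sides can also be computed by hand, and the two printed numbers should agree up to the quadrature error.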


4.5 Three special cases: grad, div and curl

On open submanifolds $U \subseteq \mathbb{R}^n$, smooth 1-forms are identified with $\mathbb{R}^n$-valued functions. The differential

$$d : \Omega^0(U) \to \Omega^1(U)$$

is then identified with a map grad from the space of smooth functions on $U$ to the space of $\mathbb{R}^n$-valued smooth functions (or, equivalently, $n$-tuples of smooth functions) on $U$. The corresponding case of the Stokes Theorem is the "Fundamental Theorem of Line Integrals" which says that for an oriented piecewise smooth curve $L$ represented by $\phi : \langle a, b \rangle \to \mathbb{R}^n$, we have

$$(\mathrm{II})\int_L \operatorname{grad}(f) = f(\phi(b)) - f(\phi(a)). \tag{*}$$

(Note that our current setup is slightly different: to get a special case of Theorem 4.4, we would have to formulate (*) on smooth 1-manifolds rather than piecewise smooth curves, but both statements are equally easy to prove - see Exercise (20).)
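The Fundamental Theorem of Line Integrals can be illustrated numerically. The sketch below is only an example under our own assumptions: the function $f(x,y,z) = xy + z^2$, the helix $\phi(t) = (\cos t, \sin t, t)$ on $\langle 0, \pi \rangle$, and all names in the code are hypothetical choices, and the line integral of the second kind is approximated by the midpoint rule.

```python
import math

def f(x, y, z):
    return x * y + z ** 2

def grad_f(x, y, z):
    # Gradient of f, computed by hand.
    return (y, x, 2 * z)

def phi(t):
    # Parametrization of the curve L (a helix).
    return (math.cos(t), math.sin(t), t)

def dphi(t):
    # Derivative of the parametrization.
    return (-math.sin(t), math.cos(t), 1.0)

def line_integral(a, b, n=2000):
    """Midpoint-rule approximation of the integral of grad(f) along phi."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        g = grad_f(*phi(t))
        v = dphi(t)
        total += sum(gi * vi for gi, vi in zip(g, v)) * h
    return total

a, b = 0.0, math.pi
lhs = line_integral(a, b)
rhs = f(*phi(b)) - f(*phi(a))
print(lhs, rhs)
```

The two printed values should agree up to the quadrature error, since the integral only depends on the values of $f$ at the endpoints of the curve.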

Smooth $(n-1)$-forms can also be identified with smooth 1-forms, and smooth $n$-forms can be identified with smooth functions, using the Hodge $*$-operator. For a function $F : U \to \mathbb{R}^n$, denote by $\eta$ the 1-form $\sum F_i\, dx_i$. Then we put

$$\operatorname{div}(F) = *(d * (\eta)),$$

and for a region with corners $K \subseteq U$, we put

$$\int_{\partial K} F = \int_{\partial K} *\eta.$$

(In this form, this integral is also known as flux.) Then the Stokes Theorem takes the form

$$\int_{\partial K} F = \int_K \operatorname{div}(F).$$

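When $n = 3$, one also denotes by $\operatorname{curl}(F)$ the $\mathbb{R}^3$-valued function associated with the 1-form

$$*(d\eta).$$

In coordinates, we obtain

$$\operatorname{curl}(F) = \left( \frac{\partial F_3}{\partial x_2} - \frac{\partial F_2}{\partial x_3},\ \frac{\partial F_1}{\partial x_3} - \frac{\partial F_3}{\partial x_1},\ \frac{\partial F_2}{\partial x_1} - \frac{\partial F_1}{\partial x_2} \right).$$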


Let $M$ be a 2-dimensional submanifold of $\mathbb{R}^3$, let $K$ be a region with corners in $M$ and let $F$ be an $\mathbb{R}^3$-valued function defined in an open subset of $\mathbb{R}^3$ containing $M$. Then the Stokes Theorem takes on the form

$$\int_K \operatorname{curl}(F) = \int_{\partial K} F.$$

Observe that the right-hand side may be interpreted as a sum of line integrals of the second kind.

5 Exercises

(1) Prove that the definitions of a manifold and a smooth manifold would remainequivalent if we require the coordinate maps hx to be homeomorphisms.

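(2) Prove in detail that the definition given in Example 1.4 (4) really specifies a smooth manifold and that the inclusion $S^n \subseteq \mathbb{R}^{n+1}$ is a $C^\infty$-map.

(3) Prove that the function � used in the proof of Theorem 1.5 is smooth.

(4) Recall the example of the manifold $S^n$ from the last section. For $x \in S^n$, construct an isomorphism of vector spaces

$$\sigma_x : T(S^n)_x \cong \{w \in \mathbb{R}^{n+1} \mid x \cdot w = 0\}$$

such that for every smooth map $f : \mathbb{R}^{n+1} \to \mathbb{R}^{n+1}$ which satisfies $f[S^n] \subseteq S^n$ we have a commutative diagram

$$\begin{array}{ccc} T(S^n)_x & \xrightarrow{\ \sigma_x\ } & \mathbb{R}^{n+1} \\ {\scriptstyle D(f|_{S^n})}\downarrow & & \downarrow{\scriptstyle Df} \\ T(S^n)_{f(x)} & \xrightarrow{\ \sigma_{f(x)}\ } & \mathbb{R}^{n+1}. \end{array}$$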

(5) Recall the notion of Lie bracket of smooth vector fields from 7.5 of Chapter 6. Let us generalize this notion to vector fields on manifolds. In other words, let $u$, $v$ be vector fields which on some open set $U$ with smooth coordinates $h_1, \dots, h_n$ are given by

$$u = \sum_{i=1}^{n} f_i \frac{\partial}{\partial h_i}, \qquad v = \sum_{i=1}^{n} g_i \frac{\partial}{\partial h_i}$$

for smooth functions $f_i$, $g_i$. Define

$$[u, v] = \sum_{i,j=1}^{n} \left( f_i \frac{\partial g_j}{\partial h_i} - g_i \frac{\partial f_j}{\partial h_i} \right) \frac{\partial}{\partial h_j}.$$

Prove that this is a well-defined operation on smooth vector fields on a smooth manifold $M$, and that it satisfies (7.6.1) and (7.6.2) of Chapter 6.

(6) A Lie group is a smooth manifold $G$ which is also a group (see B.3.1), such that the operations of multiplication $G \times G \to G$ and inverse $G \to G$ are smooth maps (see 1.4, (5)). Prove that the groups $GL_n(\mathbb{R})$, $GL_n(\mathbb{C})$ (see Appendix B, Exercise (6)) are open subsets of the real vector spaces of all $n \times n$ real (resp. complex) matrices, and are Lie groups by considering their group structure and the induced smooth manifold structure.

(7) Let $G$ be a Lie group. A vector field $v$ on the manifold $G$ is called left invariant if

$$(DL_g)(v(e)) = v(g)$$

for every $g \in G$ where $e$ is the unit element and $L_g : G \to G$ is the diffeomorphism given by left multiplication by $g$. Prove that the $\mathbb{R}$-vector space of left invariant vector fields on $G$ is isomorphic to $TG_e$ by

$$v \mapsto v(e).$$

(8) Prove that if $G$ is a Lie group, then the vector space $\mathfrak{g}$ of left-invariant smooth vector fields on $G$ forms a sub-algebra of the Lie algebra of all smooth vector fields discussed in Exercise (5) in the sense that the Lie bracket of two left invariant vector fields is left invariant. This $\mathfrak{g}$ is called the Lie algebra associated with the Lie group $G$, and can be shown to encode a large part of the Lie group structure of $G$. (For further reading, see for example [9, 10].)

(9) Find two smooth 1-forms $\omega$, $\eta$ on $\mathbb{R}^2$ such that for every $x \in \mathbb{R}^2$ we have $\omega(x), \eta(x) \neq 0$ and there does not exist any non-empty open set $U \subseteq \mathbb{R}^2$ and $\phi : U \to \mathbb{R}^2$ with $\phi^*\eta = \omega|_U$. Compare with 2.5. [Hint: use the exterior derivative.]

(10) Prove that for a smooth $k$-form $\omega$ and a smooth $\ell$-form $\eta$,

$$d(\omega \wedge \eta) = (d\omega) \wedge \eta + (-1)^k \omega \wedge d\eta.$$

(11) Generalize the proof of Lemma 3.1 to prove that for a smooth map $f : M \to N$ and a smooth $k$-form $\omega \in \Omega^k(N)$, we have

$$d(f^*\omega) = f^*(d\omega).$$

Conclude that we have a canonical linear map

$$f^* : Z^k(N) \to Z^k(M)$$

which restricts to

$$f^* : B^k(N) \to B^k(M)$$

and hence determines a linear map

$$f^* : H^k_{DR}(N) \to H^k_{DR}(M).$$

Hence, de Rham cohomology is contravariant in smooth maps.

(12) Prove that diffeomorphic smooth manifolds have the same Betti numbers [Hint: use Exercise (11)].

(13) Note that a smooth 0-form is the same thing as a smooth function. Prove that a smooth 0-form is closed if and only if it is locally constant. Conclude that $b_0(M)$ is the number of connected components of $M$.

(14) Let $\phi : \mathbb{R} \to S^1$ be the smooth map defined by

$$\phi(t) = (\cos(t), \sin(t)).$$

Prove that a smooth 1-form $f\,dx$ on $\mathbb{R}$ is equal to $\phi^*\omega$ for some $\omega \in \Omega^1(S^1)$ if and only if $f$ is a smooth periodic function with period $2\pi$. Prove that $\omega$ is exact if and only if

$$\int_0^{2\pi} f(x)\,dx = 0.$$

Conclude that $b_1(S^1) = 1$. Conclude also that $b_1(S^1 \times S^1) \neq 0$. [Hint: Consider the smooth map $S^1 \times S^1 \to S^1$ given by $(x, y) \mapsto x$ and the smooth map $S^1 \to S^1 \times S^1$ given by $x \mapsto (x, a)$ for some constant $a$. Use Exercise (11).]

(15) Prove that $b_n(\mathbb{R}^n) = 0$ (this is a special case of the Poincaré lemma, which says that $b_i(\mathbb{R}^n) = 0$ for $i > 0$). [Hint: writing an $\omega \in \Omega^n(\mathbb{R}^n)$ as $f\,dx_1 \wedge \dots \wedge dx_n$, put

$$g(x_1, \dots, x_n) = \int_0^{x_1} f(t, x_2, \dots, x_n)\,dt$$

and consider the form $g\,dx_2 \wedge \dots \wedge dx_n$.]

(16) Let $\omega \in Z^1(S^2)$. Let $U_+ = S^2 \setminus \{(0,0,-1)\}$, $U_- = S^2 \setminus \{(0,0,1)\}$, $U = U_+ \cap U_-$. Then $U_+$ and $U_-$ are diffeomorphic to $\mathbb{R}^2$, so by Exercise (15), there exist smooth 0-forms (i.e. smooth functions) $f : U_+ \to \mathbb{R}$, $g : U_- \to \mathbb{R}$ such that $df = \omega|_{U_+}$, $dg = \omega|_{U_-}$. Additionally, $d(f|_U - g|_U) = 0$ and hence $f|_U - g|_U$ is locally constant and hence constant, since $U$ is connected. Let $c = f|_U - g|_U$. Define a function $h : S^2 \to \mathbb{R}$ by $h(x) = f(x)$ for $x \in U_+$, $h(x) = g(x) + c$ for $x \in U_-$. Prove that $dh = \omega$, and hence $b_1(S^2) = 0$. Conclude (see Exercise (14)) that the smooth manifolds $S^2$ and $S^1 \times S^1$ are not diffeomorphic - an intuitively obvious, but highly non-trivial fact.

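(17) Prove that $b_1(\mathbb{C} \setminus \mathbb{Z}) = \infty$.

(18) The Möbius strip. Consider here $S^1$ as the unit circle in $\mathbb{C}$. Let

$$M = \{(x, z) \in \mathbb{C} \times S^1 \mid x^2/z \in \mathbb{R}\}.$$

Prove that $M$ is not orientable. [Hint: Consider the immersion and submersion $f : \mathbb{R} \times \mathbb{R} \to M$ given by $(x, t) \mapsto (x e^{\pi i t}, e^{2\pi i t})$. Prove that a 2-form $h\,dx\,dy \in \Omega^2(\mathbb{R}^2)$ with $h\,dx\,dy = f^*\omega$ for a 2-form $\omega \in \Omega^2(M)$ must satisfy $h(0,1) = -h(0,0)$ and that therefore $\omega$ cannot be nowhere vanishing.]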

(19) Consider, on $S^{n-1}$, the smooth $(n-1)$-form

$$\omega = \sum_{i=1}^{n} (-1)^{i+1} x_i\, dx_1 \wedge \dots \wedge dx_{i-1} \wedge dx_{i+1} \wedge \dots \wedge dx_n.$$

Prove that

$$\int_{S^{n-1}} \omega \neq 0.$$

Conclude that $b_{n-1}(S^{n-1}) \ge 1$. [Hint: use Stokes' Theorem, the Hodge $*$ operator and spherical coordinates.]

(20) Prove the Fundamental Theorem of Line Integrals, 4.5 (*). [Hint: After composing with the map $\phi$, it becomes essentially a special case of the Fundamental Theorem of Calculus for the Riemann integral, but a little bit of care is needed since $L$ is only piecewise smooth.]

13 Complex Analysis II: Further Topics

There are some extremely important concepts in complex analysis which we did not cover in Chapter 10, and which ultimately lead up to several other areas of mathematics. First of all, quite a bit more can be said about conformal maps. Under very general conditions, one open subset of $\mathbb{C}$ can be mapped holomorphically bijectively onto another. We prove one such result, the famous Riemann Mapping Theorem. In many situations, such maps can even be written down explicitly. Those are the Schwarz-Christoffel formulas, which have applications in cartography, as the basic condition on mappings in cartography is to be conformal (since distortion of distances in a topographical map is generally considered more allowable than distortion of angles). Yet, the Schwarz-Christoffel formulas also lead to elliptic integrals, which are "inverse" to elliptic functions (see for example [11]).

A major topic not covered in Chapter 10 is the question of "multi-valued holomorphic maps" such as, for example, the natural logarithm on $\mathbb{C} \setminus \{0\}$ (or, for that matter, elliptic integrals). What is the appropriate theoretical underpinning for such functions? It turns out that now is the right moment for us to study such questions, since we have already learned about manifolds. In this chapter, we will study complex manifolds of complex dimension 1, which are called Riemann surfaces. It turns out that the right way of thinking about multivalued functions on an open subset $U$ of $\mathbb{C}$ is as functions defined on a certain Riemann surface which is a covering of $U$ (not to be confused with open covers as studied in 1.1 of Chapter 9). In the process of developing this concept, we will also learn a lot more about complex integration (we will develop, for example, integration of holomorphic functions along continuous paths and will show that if two paths are homotopic, i.e. one can be continuously deformed to another, the integrals are the same). At the same time, we will also explore striking ways in which complex differential forms behave on Riemann surfaces, which will greatly enhance our understanding of complex integration.

Finally, we will see that methods of complex analysis extend even to functions which are not holomorphic, generalizing, for example, the Cauchy formula to functions which are continuously differentiable but not holomorphic. These methods will be very useful in Chapter 15 below, where we will construct compatible complex structures on oriented surfaces with Riemann metrics.



As is the case with the concept of manifolds, the study of coverings has a close connection with algebraic topology, which we will not explore here in detail. We will, however, briefly introduce the concept of the fundamental group and give two examples in Exercises (15) and (16). For more information on Riemann surfaces, we recommend the book [6], and for a very concise yet informative study of the fundamental group and coverings in an abstract topological setting, [13]. For very interesting ventures to higher dimensions, [8] may be an excellent source.

1 The Riemann Mapping Theorem

In this section, we will consider bijective holomorphic maps $f : U \to V$ where $U, V$ are open subsets of $\mathbb{C}$. Note that by Theorem 6.3.3 of Chapter 10, we must have $f'(z) \neq 0$ for $z \in U$, and hence $f^{-1} : V \to U$ is also a bijective holomorphic function. Such functions will be called holomorphic isomorphisms, and if $U = V$, holomorphic automorphisms.

1.1 Holomorphic self-maps of C and the unit disk

1.1.1 Proposition. The only injective holomorphic functions on $\mathbb{C}$ are $f(z) = az + b$ (with $a \neq 0$).

Proof. Let us study the singularity of the function $f(1/z)$ at $z = 0$. If this singularity is removable, then $f$ is bounded, and hence constant by Liouville's Theorem 5.1 of Chapter 10, contradicting our assumptions. If $f(1/z)$ has a pole of order $k > 1$ at 0, then for $\varepsilon > 0$ sufficiently small, there exists, by Theorem 6.3.3 of Chapter 10, a $\delta > 0$ such that $1/f(1/z) - a$ has exactly $k$ zeros in $\Omega(0, \varepsilon)$ for every $a \in \Omega(0, \delta)$. Note that these $k$ zeros may include zeros of order $> 1$, but not for $\delta$ sufficiently small, since otherwise the holomorphic function $(1/f(1/z))'$ would have zeros arbitrarily close to 0, and hence would be constantly 0 by Theorem 4.4 of Chapter 10. However, if the $k$ zeros are all different, this contradicts injectivity of $f$. Finally, if $f(1/z)$ has an essential singularity at 0, let $f(0) = A$. Then by the Holomorphic Open Mapping Theorem 6.3.4 of Chapter 10, for every $r > 0$ there exists an $\varepsilon > 0$ such that $f|_{\Omega(0, r)}$ takes on every value in $\Omega(A, \varepsilon)$. On the other hand, applying Proposition 6.2 of Chapter 10 to $f(1/z)$, we see that there are (infinitely many) $z$ with $|z| > r$ such that $f(z) \in \Omega(A, \varepsilon)$ which, again, contradicts injectivity.

We have concluded that $f(1/z)$ has a pole of order 1 at $z = 0$. Then the function $(f(z) - A)/z$, where $A = f(0)$, is holomorphic and bounded on $\mathbb{C}$, and hence is constant by Liouville's Theorem 5.1. □

It is, however, convenient to consider a slightly larger class of maps called Möbius transformations. These maps are formally defined as maps $\mathbb{C} \cup \{\infty\} \to \mathbb{C} \cup \{\infty\}$ by formulas

$$f_A(z) = \frac{az + b}{cz + d}$$

where $A$ is a matrix of complex numbers

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix},$$

and we assume

$$\det(A) \neq 0.$$

One readily verifies that

$$f_A \circ f_B = f_{AB},$$

and thus all Möbius transformations are bijective maps $\mathbb{C} \cup \{\infty\} \to \mathbb{C} \cup \{\infty\}$. We will understand that better in Section 3 below. While in the formalism we introduced, Möbius transformations with $c \neq 0$ are, by definition, not holomorphic functions on $\mathbb{C}$, they can be useful in mapping injectively holomorphically certain open subsets of $\mathbb{C}$ onto one another (see Exercises 1, 2).
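The composition rule for Möbius transformations is easy to check numerically at finite points. In the following sketch, the helper names `mobius` and `matmul`, the two matrices and the sample points are our own illustrative choices; the point $\infty$ and points where a denominator vanishes are deliberately not handled.

```python
def mobius(A):
    """Return the map z -> (a z + b) / (c z + d) for A = ((a, b), (c, d))."""
    (a, b), (c, d) = A
    if a * d - b * c == 0:
        raise ValueError("det(A) must be non-zero")
    def f(z):
        return (a * z + b) / (c * z + d)
    return f

def matmul(A, B):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

# Hypothetical sample matrices with non-zero determinant.
A = ((1, 2j), (3, 4))
B = ((2, -1), (1j, 1))

f_A, f_B, f_AB = mobius(A), mobius(B), mobius(matmul(A, B))

points = [0.3 + 0.2j, -1.5j, 2.0]
errors = [abs(f_A(f_B(z)) - f_AB(z)) for z in points]
print(errors)
```

The printed differences should be at the level of floating-point rounding, reflecting the identity between the composition of maps and the matrix product.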

1.1.2 Lemma. (Schwarz's Lemma) If $f(z)$ is a holomorphic function on $\Omega(0,1)$ which satisfies the conditions $|f(z)| \le 1$ for all $z \in \Omega(0,1)$ and $f(0) = 0$, then $|f(z)| \le |z|$ for $z \in \Omega(0,1)$, and $|f'(0)| \le 1$. If additionally $|f(z_0)| = |z_0|$ for some $z_0 \in \Omega(0,1) \setminus \{0\}$ or $|f'(0)| = 1$, then $f(z) = cz$ for all $z \in \Omega(0,1)$ for some constant $c$ with $|c| = 1$.

Proof. Consider the function

$$g(z) = \begin{cases} f(z)/z & \text{for } z \in \Omega(0,1) \setminus \{0\}, \\ f'(0) & \text{for } z = 0. \end{cases}$$

This function is holomorphic on $\Omega(0,1)$ (see Exercise (14)). By the maximum principle 6.3.5 of Chapter 10, then, the maximum of the function $|g(z)|$ on $\overline{\Omega(0, r)}$ can occur only on the boundary for any $0 < r < 1$. By assumption, $|g(z)| \le 1/r$ for $|z| = r$, and hence also for $|z| \le r$. Passing to the limit $r \to 1$, we get $|g(z)| \le 1$ for $z \in \Omega(0,1)$, which is the first claim. If equality arises for a single point $z_0 \in \Omega(0,1)$, the function $|g|$ has a maximum at that point, and hence $g$ must be constant. In order for the equality to actually arise, however, the constant must have absolute value 1. □
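The inequality in the lemma can be observed numerically on a concrete example. Here we take, purely as an illustration, the function $f(z) = z \cdot (z - a)/(1 - \bar{a} z)$ with the hypothetical parameter $a = 0.5 + 0.3i$; it is holomorphic on the open unit disk, bounded by 1 there, and vanishes at 0. The sample points are an arbitrary polar grid.

```python
import cmath

A = 0.5 + 0.3j  # hypothetical parameter, |A| < 1

def f(z):
    # z times a holomorphic automorphism of the unit disk, so |f(z)| <= |z|.
    return z * (z - A) / (1 - A.conjugate() * z)

# Sample points on circles of several radii inside the unit disk.
samples = [
    r * cmath.exp(1j * k * cmath.pi / 6)
    for r in (0.1, 0.5, 0.9, 0.99)
    for k in range(12)
]

# The function g(z) = f(z)/z from the proof, in absolute value.
ratios = [abs(f(z)) / abs(z) for z in samples]
max_ratio = max(ratios)

# f'(0) = -A by the product rule, so |f'(0)| = |A|.
deriv_at_0 = abs(-A)
print(max_ratio, deriv_at_0)
```

Both printed numbers should be strictly below 1, consistent with the fact that this $f$ is not of the form $cz$.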

1.1.3 Corollary. Let $f$ be a holomorphic automorphism of $\Omega(0,1)$ such that $f(0) = 0$ and $f'(0)$ is a positive real number. Then $f(z) = z$ for all $z \in \Omega(0,1)$.

Proof. First note that by Theorem 6.3.3 of Chapter 10, $f'(z) \neq 0$ for all $z \in \Omega(0,1)$, and thus $f^{-1} : \Omega(0,1) \to \Omega(0,1)$ is also a holomorphic map. Thus, Schwarz's Lemma can be applied to both $f$ and $f^{-1}$. In particular, $|f'(0)| \le 1$ and $1/|f'(0)| = |(f^{-1})'(0)| \le 1$, and hence equality must arise. Therefore, by Schwarz's Lemma again, $f(z) = cz$ with $|c| = 1$, but the assumption that $f'(0)$ is a positive real number then implies $c = 1$. □

1.2 The Riemann Mapping Theorem

An open subset $U \subseteq \mathbb{C}$ is called simply connected if $U$ is connected and every holomorphic function on $U$ has a primitive function.

Lemma. Let $U \subseteq \mathbb{C}$ be a simply connected open set and let $\phi : U \to \mathbb{C} \setminus \{0\}$ be a holomorphic function. Then there exists a holomorphic function $\operatorname{Ln}(\phi(z))$ on $U$ such that

$$e^{\operatorname{Ln}(\phi(z))} = \phi(z)$$

for all $z \in U$.

Proof. Since $U$ is simply connected, the function $\phi'(z)/\phi(z)$ has a primitive function on $U$, which we will denote by $\operatorname{Ln}(\phi(z))$. This function is determined up to an additive constant. But using the chain rule and the product rule, we find that

$$\left( \frac{e^{\operatorname{Ln}(\phi(z))}}{\phi(z)} \right)' = 0,$$

so this function is constant. The additive constant can therefore be chosen in such a way that

$$e^{\operatorname{Ln}(\phi(z))} = \phi(z).$$ □

Theorem. (The Riemann Mapping Theorem) Let $U \subsetneq \mathbb{C}$ be an open simply connected set, and let $z_0 \in U$. Then there exists a unique holomorphic isomorphism $f : U \to \Omega(0,1)$ such that $f(z_0) = 0$ and $f'(z_0)$ is a positive real number.

Proof. First of all, note that uniqueness follows from Corollary 1.1.3, since if $f_1$, $f_2$ were two maps satisfying the conclusion of the Theorem, then $f_2 \circ (f_1)^{-1}$ would be a holomorphic automorphism of $\Omega(0,1)$ which maps 0 to 0 with positive real derivative at 0.

To prove existence, we will first prove that there exists an injective holomorphic map $f : U \to \Omega(0,1)$ with $f(z_0) = 0$ where $f'(z_0)$ is a positive real number. In effect, let $a \notin U$. Apply Lemma 1.2 to the function $\phi(z) = z - a$. Thus, we have a function $\operatorname{Ln}(z - a)$ such that

$$e^{\operatorname{Ln}(z - a)} = z - a.$$


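Now let

$$h(z) = e^{\operatorname{Ln}(z - a)/2}.$$

Then

$$h(z)^2 = z - a \quad \text{on } U,$$

which means that for any distinct $z, t \in U$,

$$h(z) \neq \pm h(t)$$

(and also $h(z) \neq -h(z)$, since $h(z) \neq 0$). By the Holomorphic Open Mapping Theorem 6.3.4 of Chapter 10, there is an $r > 0$ such that

$$\Omega(h(z_0), r) \subseteq h[U].$$

Therefore,

$$h[U] \cap \Omega(-h(z_0), r) = \emptyset.$$

This means that for $z \in U$,

$$|h(z) + h(z_0)| \ge r,$$

and in particular,

$$2|h(z_0)| \ge r.$$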

Now consider the function

$$f_0(z) = \frac{r \cdot |h'(z_0)| \cdot h(z_0)}{4 \cdot |h(z_0)|^2 \cdot h'(z_0)} \cdot \frac{h(z) - h(z_0)}{h(z) + h(z_0)}.$$

First, note that the denominator is non-zero. Clearly, we have $f_0(z_0) = 0$. From the chain rule, in fact,

$$f_0'(z_0) = (r/8) \cdot |h'(z_0)| / |h(z_0)|^2 > 0.$$

Additionally, $f_0$ is a composition of a Möbius transformation with the injective function $h$. Thus, $f_0$ is injective. Finally,

$$\left| \frac{h(z) - h(z_0)}{h(z) + h(z_0)} \right| = |h(z_0)| \cdot \left| \frac{1}{h(z_0)} - \frac{2}{h(z) + h(z_0)} \right| \le \frac{4|h(z_0)|}{r}$$

which shows that $|f_0(z)| \le 1$ for $z \in U$. Of course, strict inequality must hold by the Holomorphic Open Mapping Theorem 6.3.4 of Chapter 10.

Now let $N$ be the supremum of all the values $f'(z_0)$ over the set $S$ of all injective holomorphic functions $f : U \to \Omega(0,1)$ which satisfy $f(z_0) = 0$ and $f'(z_0) > 0$. (Note that a priori, one may have $N = +\infty$.) There exists, however, a sequence $(f_n)_n$ of functions in $S$ such that

$$\lim_{n \to \infty} f_n'(z_0) = N.$$

Clearly, the sequence $f_n$ is uniformly bounded and by Theorem 3.7 of Chapter 10, is also equicontinuous on every compact subset of $U$. By Theorem 6.3 of Chapter 9 (a consequence of the Arzelà-Ascoli Theorem), there exists a subsequence $(f_{i_n})_n$ which converges uniformly on every compact subset $K \subseteq U$. Denote the limit function by $f$. We know by Weierstrass's Theorem 3.6 of Chapter 10 that $f$ is holomorphic, $f'(z_0) = N$ (and thus $N < \infty$), and $|f(z)| \le 1$ for every $z \in U$, but, again, a strict inequality must arise by Theorem 6.3.4 of Chapter 10. We will now show that $f$ is injective. In effect, let $z_1 \in U$. Then the functions $f_{i_n}(z) - f_{i_n}(z_1)$ have no zero in $U \setminus \{z_1\}$, and hence $f(z) - f(z_1)$, which is not constant because $f'(z_0) = N > 0$, has no zero in $U \setminus \{z_1\}$ by Hurwitz's Theorem 6.3.6 of Chapter 10. Since $z_1$ was arbitrary, $f$ is injective as claimed.

We claim that the function $f : U \to \Omega(0,1)$ is onto. Assume, for contradiction, that $w_0 \in \Omega(0,1) \setminus f[U]$. Let

$$\phi(z) = \frac{f(z) - w_0}{1 - \overline{w_0} f(z)}.$$

Note that this function is injective since it is the composition of a Möbius transformation with the injective function $f$, and that it has no zero on $U$ since $w_0 \notin f[U]$. Applying Lemma 1.2 to this function $\phi(z)$, we can find a function $\operatorname{Ln}(\phi(z))$ on $U$ which satisfies

$$e^{\operatorname{Ln}(\phi(z))} = \phi(z).$$

Let, again,

$$g(z) = e^{\operatorname{Ln}(\phi(z))/2},$$

so that

$$\phi(z) = (g(z))^2.$$

Let

$$F(z) = \frac{|g'(z_0)|}{g'(z_0)} \cdot \frac{g(z) - g(z_0)}{1 - \overline{g(z_0)}\, g(z)}.$$

Note that we have $F \in S$. (Compare Exercise (2).)


Note that $f = \psi \circ F$ for a certain holomorphic function $\psi : \Omega(0,1) \to \Omega(0,1)$ which satisfies $\psi(0) = 0$. Concretely, $\psi$ is a composition of a holomorphic automorphism of $\Omega(0,1)$ which maps 0 to $g(z_0)$ followed by squaring, and a holomorphic automorphism of $\Omega(0,1)$ which maps 0 to $w_0$. Since $\psi$ is obviously not a linear map, we have $|\psi'(0)| < 1$ by Schwarz's Lemma 1.1.2, so $F'(z_0) > f'(z_0) = N$ (since both are positive real numbers), which is a contradiction. □

1.3 Comments

1. It is clear why the case $U = \mathbb{C}$ must be excluded in the statement of the Riemann Mapping Theorem: By Liouville's Theorem 5.1 of Chapter 10, any bounded holomorphic map defined on $\mathbb{C}$ is constant.

2. Excluding the case of $U = \mathbb{C}$, as already remarked, Proposition 2.5 of Chapter 10 provides a converse to the Riemann Mapping Theorem when stated for real-regular images of convex open sets. Perhaps much more importantly, however, Proposition 2.5 of Chapter 10 serves as a source of examples for the Theorem. While our definition of a simply connected set above precisely fits the proof of the Theorem, it is not a condition which is easy to verify. On the other hand, constructing real injective regular maps on convex sets, as in Proposition 2.5 of Chapter 10, is easy (for example, see Exercise (4)).

2 Holomorphic isomorphisms of disks onto polygons and the Schwarz-Christoffel formula

2.1 Convex polygons

We will examine holomorphic isomorphisms between $\Omega(0,1)$ and open polygons. We will restrict here to convex polygons, although the restriction is not really necessary (in the sense that the same formula applies to non-convex polygons as well). However, convex polygons are much easier to treat rigorously. By an open half-plane of angle $a\pi$, $0 \le a < 2$, we shall mean a subset of $\mathbb{C}$ consisting of all the numbers $b + z$ where

$$\operatorname{Im}(z e^{-i a \pi}) > 0 \tag{2.1.1}$$

for some constant $b \in \mathbb{C}$. A closed half-plane is the closure of an open half-plane of the same angle.

An open convex polygon $P$ is a bounded non-empty intersection of open half-planes $P_1, \dots, P_k$ of angles $a_1\pi, \dots, a_k\pi$, $0 < a_1 < \dots < a_k \le 2$. We put $a_0 = a_k - 2$. The corresponding closed convex polygon is its closure $\overline{P}$. Consider the points $z_i$, where $\{z_i\} = \partial P_i \cap \partial P_{i-1}$, $i = 1, \dots, k$, and we let $P_0 = P_k$ and $z_0 = z_k$. Let us assume, without loss of generality, that $k$ is the smallest possible for the given $P$. Then the points $z_i$ are called the vertices of $P$ and the number $\beta_i \pi = (a_i - a_{i-1})\pi$ the exterior angle at the vertex $z_i$, $i = 1, \dots, k$. Setting $\alpha_i = 1 - \beta_i$, the angle $\alpha_i \pi$ is called the interior angle at the vertex $z_i$. The boundary $\partial P$ is the union of the closed line segments $L_i$ between the points $z_i$, $z_{i+1}$, $i = 0, \dots, k-1$.

Now let $z_0 \in \mathbb{C}$ and $0 < \alpha < 1$. By Lemma 1.2, the function

$$(z - z_0)^\alpha = e^{\alpha \operatorname{Ln}(z - z_0)} \tag{2.1.2}$$

can then be defined on any open half-plane $P$ whose boundary contains $z_0$, and inspection shows that this function can be extended to a bijective continuous function mapping $\overline{P}$ onto a closed angle of value $\alpha\pi$.

Let $f(w)$ be a holomorphic function defined in an open neighborhood $U$ of a point $w_0 \in \mathbb{C}$, let $f(w_0) = z_0$ and let $f'(w_0) \neq 0$. Define

$$g(w) = (f(w) - z_0)^\alpha.$$

Then we just proved that $g(w)$ is defined on $f^{-1}[P]$ where $P$ is an open half-plane whose boundary contains $z_0$.

2.2 Lemma. The function

$$g(w)(w - w_0)^{-\alpha}$$

extends holomorphically to an open neighborhood of $w_0$, and the extension is non-zero there.

Proof. By our assumption, the Taylor expansion of $f(w) - z_0$ at $w_0$ is of the form

$$(w - w_0) \cdot \sum_{n=0}^{\infty} a_{n+1} (w - w_0)^n$$

where $a_1 \neq 0$. However, since $z^{\alpha}$ can be defined as a holomorphic function in the neighborhood of any non-zero point, we may then write

$$g(w)(w - w_0)^{-\alpha} = \left( \sum_{n=0}^{\infty} a_{n+1} (w - w_0)^n \right)^{\alpha},$$

which is a holomorphic function in a neighborhood of $w_0$, non-zero at $w_0$ since $a_1 \neq 0$. □

2.3 Lemma. Let $f : P \to \Omega(0,1)$ be a holomorphic isomorphism where $P$ is an open polygon. Then $f$ extends to a homeomorphism $\overline{f} : \overline{P} \to \overline{\Omega(0,1)}$. Furthermore, if we denote by $g(w)$ the inverse of $\overline{f}$, assuming that $g(w_i) = z_i$ where $z_1, \dots, z_k$ are the vertices of $P$, then for $w \neq w_1, \dots, w_k$ in the domain of $g$, $g$ can be extended to a holomorphic function with non-zero derivative in a neighborhood of $w$, and additionally

$$(g(w) - z_i) \cdot (w - w_i)^{-\alpha_i} \tag{2.3.1}$$

can be extended to a holomorphic function on an open neighborhood of $w_i$, which is non-zero there.

Proof. Consider a point $z \in \partial P$, $z \neq z_1, \dots, z_k$. Let us use the above notation for $P$. Assume, without loss of generality, that $z \in L_0$, and $L_0 \subseteq \mathbb{R}$. Consider the set $Q = \operatorname{Int}(\overline{P} \cup \{\overline{z} \mid z \in P\})$ (note that $\overline{z}$ means the complex conjugate of $z$ while $\overline{P}$ is the closure), and the set $R = \mathbb{C} \setminus \{z \in \mathbb{R} \mid |z| \ge 1\}$. For $z \in L_0 \setminus \{z_0, z_1\}$, by the Riemann Mapping Theorem 1.2, there exists a unique holomorphic isomorphism $g : Q \to R$ such that $g(z) = 0$, and $g'(z)$ is a positive real number. Since, however, the map $t \mapsto \overline{g(\overline{t})}$ is another solution, it must be equal to $g$ or, in other words, $g(\overline{t}) = \overline{g(t)}$ for $t \in Q$. It then follows from the Intermediate Value Theorem that $g[P]$ is contained in the upper half-plane (the open half-plane with angle 0), and hence must be equal to it. Thus, $g$ restricts to a holomorphic isomorphism from $P$ onto the upper half-plane. Replacing $\Omega(0,1)$ by the upper half-plane (which we may do by a Möbius transformation), we see that $f$ extends holomorphically to an open neighborhood of $z$.

Let us now consider the points $z = z_i$. Assume, without loss of generality, $i = 0$, $L_0 \subseteq \mathbb{R}$, $z_0 = 0$. Then denoting by $\psi$ the continuous extension of the function (2.1.2) for $\alpha = \alpha_0$ to the closed upper half-plane, let $\kappa$ be the restriction of $\psi^{-1}$ to $\overline{P}$. Then we have

$$\kappa[\overline{P}] \cap \mathbb{R} = \kappa[L_{k-1} \cup L_0].$$

Let this image be the interval $\langle s, t \rangle$ where $s < 0 < t$. Now applying the argument of the previous paragraph to the holomorphic isomorphism $\Phi$ from the set

$$\operatorname{Int}(\kappa[\overline{P}] \cup \{\overline{z} \mid z \in \kappa[P]\})$$

to $\mathbb{C} \setminus ((-\infty, s\rangle \cup \langle t, \infty))$ which maps 0 to 0 with a positive real derivative at 0, we again see that the map must be symmetric under complex conjugation, and hence must map $\kappa[P]$ holomorphically bijectively onto the open upper half-plane. Therefore (after composing with a Möbius transformation to pass from the upper half-plane to $\Omega(0,1)$), $\Phi \circ \kappa$ gives a continuous extension of $f$ to a neighborhood of $z_0$ in $\overline{P}$, and, in fact, also a holomorphic extension of $f \circ \kappa^{-1}$ to an open neighborhood of 0.

Now the open neighborhoods of all points $z \in \partial P$ cover $\partial P$, which is compact, and hence by the Heine-Borel Theorem 2.3 of Chapter 9, we may cover $\partial P$ by finitely many such neighborhoods. By the uniqueness theorem for holomorphic functions 4.4 of Chapter 10, the local extensions of $f$ agree on the intersections of the neighborhoods, which proves the existence of the continuous map extending $f$.

Now the statement about the holomorphic extension of the function (2.3.1) follows from Lemma 2.2 (applied to this same function $g$). □

2.4 Theorem. (The Schwarz-Christoffel formula) Let $P$ be a convex polygon as above, and let $f : P \to \Omega(0,1)$ be a holomorphic isomorphism. Let $f(z_i) = w_i$ (see Lemma 2.3). Then the inverse $g$ of the map $f$ is given by the formula

$$g(w) = C \int_0^w \prod_{i=1}^k (u - w_i)^{-\beta_i}\, du + D \tag{2.4.1}$$

for some constants $C, D \in \mathbb{C}$.

Proof. Apply Lemma 2.3. Differentiating, we get that the function

$$h(w) = g'(w) \cdot \prod_{i=1}^k (w - w_i)^{\beta_i} \tag{*}$$

extends holomorphically to an open set $U$ containing $\overline{\Omega(0,1)}$, and the extension has no zero on $U$. We will show that the argument of the function (*) is, in fact, constant on the boundary $\partial\Omega(0,1)$. First, consider the argument (in the sense of Subsection 6.3 of Chapter 10) of the function $g'(w)$ for $w$ in the segment of the unit circle between the points $w_i$ and $w_{i+1}$. But we see that $g(w)$ on that segment is a composition of a linear function with $\operatorname{Ln}(z)$, and using the chain rule,

$$\operatorname{Arg}(g'(w)) = C_i - \operatorname{Arg}(w)$$

where $C_i$ is constant on the circle segment between $w_i$ and $w_{i+1}$. (Comment: In this case, we consider the argument in the broader sense, i.e. determined only up to an integral multiple of $2\pi$.) Now the key point is that the slope of the side of $P$ changes by $\pi\beta_i$ when passing the point $w_i$, which immediately gives

$$C_i - C_{i-1} = \pi\beta_i.$$

On the other hand, by basic geometry of an isosceles triangle, we have

$$\operatorname{Arg}(w - w_i) = \pm\frac{\pi}{2} + \frac{\operatorname{Arg}(w_i) + \operatorname{Arg}(w)}{2}. \tag{**}$$

Therefore, for $w$ on the circle segment between $w_i$ and $w_{i+1}$,

$$\operatorname{Arg}\left(\prod_{j=1}^k (w - w_j)^{\beta_j}\right) = Q_i + \Big(\sum_{j=1}^k \beta_j\Big)\operatorname{Arg}(w)/2 = Q_i + \operatorname{Arg}(w)$$

for some constant $Q_i$. When passing $w_i$ in the clockwise direction, one of the $+$ signs in (**) changes into a $-$, which shows

$$Q_i - Q_{i-1} = -\pi\beta_i.$$

We see then that the argument of the function $h(w)$ of (*) is constant on the unit circle. Thus, the holomorphic function $h(w)$ on $U$ maps the unit circle into a set of the form

$$S = \{t e^{ib} \mid t > 0\}$$

for $b$ constant (a ray). Applying the Maximum Principle 6.3.5 of Chapter 10 to the holomorphic functions

$$e^{h(w)e^{-ib}}, \quad e^{-h(w)e^{-ib}},$$

we then see that $h(w)$ maps the whole set $\Omega(0,1)$ to $S$, and thus, by the Open Mapping Theorem 6.3.4, is constant. Integrating gives the statement of the theorem. □

Comment: The numbers $w_i$ are not determined by Theorem 2.4 or any of the above discussion. They are difficult to determine analytically except in a few very special situations (see Exercises (6), (7)).
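One such special situation can be written out as an illustration (it is a standard example, not taken from the text). Assume $P$ is a square and, as an additional assumption made only for this sketch, that $f$ sends the center of the square to $0$. The fourfold rotational symmetry then forces the $w_i$ to be equally spaced, so after a rotation they are $1, i, -1, -i$; all $\beta_i$ are equal, and since the proof above uses $\sum_j \beta_j = 2$, each $\beta_i = 1/2$. Formula (2.4.1) then becomes an elliptic integral, where the constant $C'$ absorbs the factor $(-1)^{-1/2}$:

```latex
\prod_{i=1}^{4}(u - w_i) = (u-1)(u-i)(u+1)(u+i) = u^4 - 1,
\qquad
g(w) = C' \int_0^w \frac{du}{\sqrt{1-u^4}} + D .
```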

3 Riemann surfaces, coverings and complex differential forms

We will now use what we learned about complex analysis to discuss a partial "complex analog" of some of the material of Chapter 12. While this may seem like an abstract exercise, it actually turns out to be an extremely useful device, which will greatly enhance our understanding of topics already covered, such as Möbius transformations, simply connected open subsets of $\mathbb{C}$, and even primitive functions.

3.1 Riemann Surfaces: the basic definitions

Much of the theory of smooth manifolds of Chapter 12 can be directly translated to form a theory of "complex manifolds" by simply replacing $\mathbb{R}$ with $\mathbb{C}$ and smooth functions by holomorphic functions. However, there are some notable exceptions which require care. First of all, to discuss a theory of complex manifolds in an arbitrary dimension, we would first have to study analysis in several complex variables. While the reader could probably fill in the basic definitions, the theory of several complex variables is a special area of analysis with many subtleties, which exceeds the realm of this book. For a good introduction to that subject, we recommend [15].

Because of this, we will restrict our attention to complex dimension 1. A complex manifold of complex dimension 1 is called a Riemann surface. (It has, of course, topological dimension 2, and 2-dimensional manifolds are often called surfaces.) Thus, we have the following definition:

A Riemann surface is a 2-dimensional topological manifold $\Sigma$ with an atlas $(U_x, h_x)$ (where the coordinate maps are understood as maps into $\mathbb{C}$) such that the compositions (C) of Subsection 1.2 of Chapter 12 are holomorphic maps.

Analogously with Subsection 1.3 of Chapter 12, a map $f : \Sigma_1 \to \Sigma_2$ of Riemann surfaces is called holomorphic if $f$ is continuous, and for every $x \in \Sigma_1$, the composition

$$h_x[f^{-1}[U_{f(x)}] \cap U_x] \xrightarrow{\;h_x^{-1}\;} f^{-1}[U_{f(x)}] \cap U_x \xrightarrow{\;f\;} U_{f(x)} \xrightarrow{\;h_{f(x)}\;} \mathbb{C}$$

is holomorphic. As expected, a holomorphic map $f : \Sigma \to \mathbb{C}$ for a Riemann surface $\Sigma$ will be called a holomorphic function on $\Sigma$.

The treatment of tangent vectors of Riemann surfaces also parallels directly the smooth case, i.e. Subsection 2.1 of Chapter 12. Of course, the tangent space $T\Sigma_x$ to a Riemann surface at a point $x \in \Sigma$ is a complex line, i.e. a vector space over $\mathbb{C}$ of dimension 1.

The first difference between Riemann surfaces and smooth manifolds is that Remark 1 of Subsection 1.1 of Chapter 12 does not apply to Riemann surfaces. In other words, we cannot assume that the coordinate maps are onto $\mathbb{C}$: if we did, then there would not be enough examples. By Proposition 1.1.1, an open subset $U \subsetneq \mathbb{C}$, which we can consider as a Riemann surface where the (single) coordinate map is the inclusion, does not have an atlas whose coordinate systems would be holomorphic maps onto $\mathbb{C}$.

Another substantial difference is the absence of a "holomorphic partition of unity". In other words, the discussion of Subsection 1.5 of Chapter 12 does not have a holomorphic analogue. For example, a holomorphic function on a connected open set which is $0$ outside of a compact subset is necessarily constant $0$ by Theorem 4.4 of Chapter 10.

On the other hand, also in contrast with the case of real manifolds, note again that for a bijective holomorphic map of Riemann surfaces $f : \Sigma_1 \to \Sigma_2$, by Theorem 6.3.3 of Chapter 10, $Df_x \neq 0$ for every $x \in \Sigma_1$, and thus $f^{-1}$ is also a bijective holomorphic map. Again, such maps will be called holomorphic isomorphisms, and holomorphic automorphisms if $\Sigma_1 = \Sigma_2$.
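A quick comparison illustrates the contrast with the real case (a standard example, offered here as an illustration and not taken from the text). The map $x \mapsto x^3$ is a smooth bijection of $\mathbb{R}$ whose derivative vanishes at $0$, so its inverse fails to be differentiable there; its complex counterpart $z \mapsto z^3$ is not injective at all, so it is not a counterexample among holomorphic bijections:

```latex
\varphi : \mathbb{R} \to \mathbb{R},\quad \varphi(x) = x^3,\quad \varphi'(0) = 0,\quad
\varphi^{-1}(y) = \sqrt[3]{y}\ \text{ is not differentiable at } y = 0;
\qquad
1^3 = \left(e^{2\pi i/3}\right)^3 = 1 \ \text{ although } 1 \neq e^{2\pi i/3}.
```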


3.2 The first examples

It turns out that we already have a number of examples of Riemann surfaces. Of course, open subsets of $\mathbb{C}$ are immediate examples. A first "non-trivial" example is the complex projective space $\mathbb{CP}^1$: As a set, it is $\mathbb{C} \cup \{\infty\}$. It is topologized as $S^2$, with $\mathbb{C}$ identified homeomorphically with $S^2 \setminus \{a\}$ for any chosen point $a \in S^2$. Then the atlas has two charts: one is $\mathbb{C}$ and the identity on $\mathbb{C}$, the other is $\mathbb{CP}^1 \setminus \{0\}$ with the chart defined by

$$z \mapsto \begin{cases} 1/z & \text{for } z \neq \infty \\ 0 & \text{for } z = \infty. \end{cases}$$
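To see that the two charts are compatible, one can compute the transition map on the overlap $\mathbb{C} \setminus \{0\}$. In the sketch below, the names $h_1$, $h_2$ for the two charts are labels introduced only for this illustration:

```latex
h_1 = \mathrm{id} : \mathbb{C} \to \mathbb{C}, \qquad
h_2 : \mathbb{CP}^1 \setminus \{0\} \to \mathbb{C},\quad h_2(z) = 1/z,\ \ h_2(\infty) = 0,
\\
h_2 \circ h_1^{-1} = h_1 \circ h_2^{-1} : \mathbb{C} \setminus \{0\} \to \mathbb{C} \setminus \{0\},
\qquad z \mapsto \frac{1}{z},
\qquad \frac{d}{dz}\,\frac{1}{z} = -\frac{1}{z^2}.
```

Both transition maps are $z \mapsto 1/z$, which is holomorphic on $\mathbb{C} \setminus \{0\}$, as the definition of a Riemann surface requires.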

Now it is pretty much obvious from the definition that the Möbius transformations of Subsection 1.1 are holomorphic automorphisms of $\mathbb{CP}^1$, and it is not difficult to check that they are the only ones (see Exercise (10)).

Moreover, for an open set $U \subseteq \mathbb{C}$, note that a meromorphic function on $U$ is precisely the same thing as a holomorphic map $U \to \mathbb{CP}^1$. Because of this, one extends the terminology and calls a holomorphic map $f : \Sigma \to \mathbb{CP}^1$ a meromorphic function on the Riemann surface $\Sigma$.

Here is another example: Let $a, b$ be complex numbers linearly independent over $\mathbb{R}$. Introduce an equivalence relation $\sim$ on $\mathbb{C}$ where $z_1 \sim z_2$ if $z_1 - z_2 = ka + \ell b$ where $k, \ell$ are integers. The set $E$ of equivalence classes with respect to this equivalence relation is called an elliptic curve. (The use of the term "curve" here stems from algebraic geometry, where one develops methods for defining geometric objects, called varieties, over general fields. A 1-dimensional variety is called a curve. A non-singular curve over the field $\mathbb{C}$ is then, in particular, a Riemann surface.)

Denote the equivalence class of $z \in \mathbb{C}$ by $[z]$; an element of an equivalence class is called its representative. Clearly, we have a projection

$$\pi : \mathbb{C} \to E$$

given by

$$\pi(z) = [z].$$

We may define a metric on $E$ by letting the distance of two classes $[z_0]$, $[t_0]$ be

$$\min |z - t|$$

where $z \in [z_0]$, $t \in [t_0]$. The reason the minimum exists is that the subset

$$L = \{ka + \ell b \mid k, \ell \in \mathbb{Z}\}$$

is discrete. The projection $\pi$ is then continuous. There exists, therefore, an $\varepsilon > 0$ such that $\Omega(0, \varepsilon) \cap L = \{0\}$. Then for any $z \in \mathbb{C}$, $\pi|_{\Omega(z,\varepsilon)}$ is a homeomorphism onto $\Omega([z], \varepsilon)$. Thus, the inverses of these restrictions can be taken for an atlas, making $E$ a Riemann surface.
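The metric on $E$ can be evaluated numerically. The following is a minimal sketch, not part of the text: the function names `lattice_coords` and `torus_distance` and the parameter `radius` are hypothetical, and the finite search over nearby lattice points is an assumption that is adequate only when the basis $a, b$ is not too skewed.

```python
def lattice_coords(z, a, b):
    """Return real (s, t) with z = s*a + t*b; a, b must be R-linearly independent."""
    s = (z * b.conjugate()).imag / (a * b.conjugate()).imag
    t = (z * a.conjugate()).imag / (b * a.conjugate()).imag
    return s, t


def torus_distance(z0, t0, a, b, radius=2):
    """Distance between the classes [z0] and [t0] in E = C / (Z*a + Z*b)."""
    a, b = complex(a), complex(b)
    d = complex(z0) - complex(t0)
    s, t = lattice_coords(d, a, b)
    # Shift the difference into the fundamental cell centred at 0.
    d_red = (s - round(s)) * a + (t - round(t)) * b
    # Assumption: searching lattice points with |k|, |l| <= radius finds the minimum.
    return min(
        abs(d_red + k * a + l * b)
        for k in range(-radius, radius + 1)
        for l in range(-radius, radius + 1)
    )
```

For the square lattice $a = 1$, $b = i$, the classes of $0.9$ and $0.1$ are at distance about $0.2$ rather than $0.8$, because $0.9 - 1$ is a closer representative.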

Meromorphic functions on $E$ are the same data as doubly periodic functions on $\mathbb{C}$. Such functions are called elliptic functions. See Exercise (8) for one method by which examples of elliptic functions can be constructed.

3.3 Coverings

3.3.1
Let $\Sigma$ be a Riemann surface. A holomorphic map $\pi : T \to \Sigma$, where $T$ is another Riemann surface, is called a covering if for every $z \in \Sigma$, there exists an open neighborhood $V_z$ such that $\pi^{-1}[V_z]$ is a disjoint union of open subsets $U_i$, $i \in I$, such that for each $i$, the restriction

$$\pi|_{U_i} : U_i \to V_z$$

is a holomorphic isomorphism. We call $V_z$ a fundamental neighborhood. Note that an open subset of a fundamental neighborhood which contains $z$ is also a fundamental neighborhood.

Obviously, a holomorphic isomorphism is a covering. If $E$ is an elliptic curve, the projection $\pi : \mathbb{C} \to E$ discussed in Subsection 3.2 is a covering. For yet another example, see Exercise (14).

3.3.2 Coverings from (local) primitive functions
Another example of a covering, which will be of great significance to us, is obtained as follows: Let $U \subseteq \mathbb{C}$ be an open subset, and let $f : U \to \mathbb{C}$ be a holomorphic function. Let $U_f$ be equal to $U \times \mathbb{C}$ as a set. Denote by

$$\pi : U_f \to U$$

the projection to the first factor: $\pi(z, t) = z$. Introduce, however, a topology on $U_f$ as follows: Let its basis consist of all sets of the form

$$W_{V,F} = \{(z, F(z)) \mid z \in V\} \tag{*}$$

where $V \subseteq U$ is an open subset and $F$ is a primitive function of $f$ on $V$. By Theorem 2.3 of Chapter 10, every point of $U_f$ is contained in one of the sets (*), and in fact $(W_{V,F}, \pi|_{W_{V,F}})$ form an atlas of a Riemann surface $U_f$, and, furthermore, the projection $\pi$ is a covering. (Convex open subsets of $U$ can be taken as fundamental neighborhoods.)
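For a concrete instance (a standard example, stated here as an illustration rather than taken from the text), take $U = \mathbb{C} \setminus \{0\}$ and $f(z) = 1/z$. On a disc $V \subseteq U$, the primitives are the branches of the logarithm on $V$ plus arbitrary constants, and the connected component of $U_f$ through the point $(1, 0)$ is the graph of the "multi-valued logarithm". In the second line, $t_0$ denotes any number with $e^{t_0} = z$:

```latex
U = \mathbb{C} \setminus \{0\},\quad f(z) = \frac{1}{z},\qquad
W_{V,F} = \{(z, F(z)) \mid z \in V\},\quad F'(z) = \frac{1}{z} \text{ on } V,
\\
\text{component of } (1,0) \;=\; \{(z,t) \in U \times \mathbb{C} \mid e^{t} = z\},
\qquad
\text{points of this component over } z:\ \ (z,\; t_0 + 2\pi i k),\ k \in \mathbb{Z}.
```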


3.3.3 Paths and homotopy
We will now briefly investigate topological properties of coverings. By a path in a topological space $X$, one means a continuous map $\omega : \langle 0,1\rangle \to X$. The points $\omega(0)$ and $\omega(1)$ are called the beginning point and end point, respectively. A homotopy of paths $\omega$, $\eta$ with the same beginning point and the same end point is a continuous map $h : \langle 0,1\rangle \times \langle 0,1\rangle \to X$ such that $h(s,0) = \omega(s)$, $h(s,1) = \eta(s)$, $h(0,t) = \omega(0)$, $h(1,t) = \omega(1)$ for all $s, t \in \langle 0,1\rangle$. We write $h : \omega \simeq \eta$. Our main result is the following

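3.3.4 Theorem. Let $\pi : T \to \Sigma$ be a covering, and let $\omega : \langle 0,1\rangle \to \Sigma$ be a path. Let a point $x \in T$ be such that $\pi(x) = \omega(0)$. Then there exists a unique path $\tilde\omega$ in $T$ such that $\tilde\omega(0) = x$ and $\pi(\tilde\omega(t)) = \omega(t)$ for all $t \in \langle 0,1\rangle$. Furthermore, if $\omega \simeq \eta$, then $\tilde\omega \simeq \tilde\eta$ (in particular, $\tilde\omega$ and $\tilde\eta$ have the same end points). One refers to the path $\tilde\omega$ as a lifting of the path $\omega$.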

Proof. For each $t \in \langle 0,1\rangle$, let $A_t$ be an open interval containing $t$ such that $\omega[A_t \cap \langle 0,1\rangle]$ is contained in a fundamental neighborhood. By Theorem 5.5 of Chapter 2, $\langle 0,1\rangle$ is covered by finitely many of the open intervals $A_t$. Denoting their end points by $0 = t_0 < t_1 < \dots < t_k = 1$, each of the images $\omega[\langle t_i, t_{i+1}\rangle]$ is contained in a fundamental neighborhood. We can prove by induction on $i$ that a lift $\tilde\omega_i$ of $\omega|_{\langle 0, t_i\rangle}$ with beginning point $x$ exists and is unique: in fact, assuming that, for a given $i$, $\tilde\omega_i$ exists, let $V$ be a fundamental neighborhood containing $\omega[\langle t_i, t_{i+1}\rangle]$, and let $V_j$ be the open subset given by the definition of a covering which is mapped homeomorphically to $V$ by the restriction $\pi_i$ of the projection $\pi$, and has the property that $\tilde\omega_i(t_i) \in V_j$. Then for $t \in \langle t_i, t_{i+1}\rangle$, define

$$\tilde\omega_{i+1}(t) = \pi_i^{-1}\omega(t).$$

Clearly, this extends $\tilde\omega_i$ to the required $\tilde\omega_{i+1}$, and further this extension is uniquely determined, since $\pi_i$ is a homeomorphism. Now we can put $\tilde\omega = \tilde\omega_k$, and we have both existence and uniqueness.

Regarding the homotopy, let $h : \omega \simeq \eta$. We shall construct a lift of this homotopy to $T$. Note that we already know the lift exists and is uniquely determined by applying the path lifting theorem separately to the path $h(?, a)$ with each fixed $a$. However, we must prove that this lift $\tilde h : \langle 0,1\rangle \times \langle 0,1\rangle \to T$ is continuous. To this end, we must repeat, to some extent, our above argument for paths: The set $\langle 0,1\rangle \times \langle 0,1\rangle$ is compact, and hence is covered by finitely many rectangles $\langle s, s'\rangle \times \langle t, t'\rangle$ the closures of whose images lie in fundamental neighborhoods. Taking the finite sets of all such $s, s'$ and $t, t'$, we obtain partitions $0 = s_0 < s_1 < \dots < s_\ell = 1$, $0 = t_0 < t_1 < \dots < t_m = 1$ where the $h$-image of each rectangle $\langle s_i, s_{i+1}\rangle \times \langle t_j, t_{j+1}\rangle$ is in a fundamental neighborhood $U_{i,j}$. For each $j$, we then prove by induction on $i$ that $\tilde h|_{\langle 0, s_i\rangle \times \langle t_j, t_{j+1}\rangle}$ is continuous; indeed, suppose the statement is true for a given $i$ (and a fixed $j$). Then by the connectedness of intervals and the induction hypothesis, $\tilde h[\{s_i\} \times \langle t_j, t_{j+1}\rangle]$ is contained in one of the disjoint open sets which, by $\pi$, map homeomorphically onto $U_{i,j}$. Inverting the homeomorphism, we obtain the statement for $i + 1$.

To see that $\tilde h(1, t)$ is constant in $t$, note that $\pi^{-1}[\{\omega(1)\}]$ is discrete, and a continuous function from a connected space to a discrete space is constant. □

Remark: It is useful to note that the proof of this theorem was purely topological and did not make any use of the holomorphic structure.
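The inductive construction in the proof can be imitated numerically. The sketch below is an illustration under stated assumptions, not the text's construction: it uses the map $\exp : \mathbb{C} \to \mathbb{C} \setminus \{0\}$, a standard example of a covering that is assumed here rather than taken from this section, and the function name `lift_through_exp` is hypothetical. On each short subinterval, the principal logarithm of the ratio of consecutive points plays the role of the local inverse $\pi_i^{-1}$.

```python
import cmath
import math


def lift_through_exp(path, start_lift, steps=1000):
    """End point of the lift of `path` through exp, starting at `start_lift`.

    `path` maps [0, 1] into C minus {0}; `start_lift` must satisfy
    exp(start_lift) == path(0). Assumes consecutive sample points are close,
    so that their ratio stays away from the negative real axis.
    """
    lifted = complex(start_lift)
    prev = path(0.0)
    for n in range(1, steps + 1):
        cur = path(n / steps)
        # cur / prev is near 1, where the principal log inverts exp locally.
        lifted += cmath.log(cur / prev)
        prev = cur
    return lifted


# A loop winding twice around 0 lifts to a path from 0 to (approximately) 4*pi*i.
end_point = lift_through_exp(lambda t: cmath.exp(4j * math.pi * t), 0j)
```

The lift of a closed loop need not be closed: here the end point of the lift differs from its beginning point, so by the theorem this loop is not homotopic to a constant path in $\mathbb{C} \setminus \{0\}$.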

3.4 Complex and holomorphic differential forms

3.4.1 Integration on Riemann surfaces
Let us begin with a brief discussion of complex line integrals on Riemann surfaces. A Riemann surface $\Sigma$ is certainly a smooth manifold, and by the material of Chapter 12, for a differential 1-form $\omega$ on $\Sigma$ and a continuously differentiable map $L : \langle a, b\rangle \to \Sigma$, we may integrate

$$\int_L \omega = \int_a^b L^*(\omega). \tag{*}$$

This definition extends, as before, in an obvious way to piecewise continuously differentiable curves $L$, and is independent of parametrization in the sense of Chapter 8. Therefore, the key point is specifying the differential 1-form $\omega$.

What one means by complex integral is that using complex multiplication, we can introduce 1-forms with complex coefficients (also called complex-valued differential forms). We obtain those by applying $? \otimes_{\mathbb{R}} \mathbb{C}$ to the spaces $TM_x$, $\Lambda^k(TM_x^*)$. (When tensoring over $\mathbb{R}$ with $\mathbb{C}$, we consider $\mathbb{C}$ as an $\mathbb{R}$-vector space. However, note that $? \otimes_{\mathbb{R}} \mathbb{C}$ covariantly turns $\mathbb{R}$-vector spaces into $\mathbb{C}$-vector spaces by using the multiplication in $\mathbb{C}$.) Thus, a smooth complex-valued $k$-form assigns to each $x \in M$ an element of

$$\Lambda^k(TM_x^*) \otimes_{\mathbb{R}} \mathbb{C}$$

which becomes smooth upon identification of $TM_x$ with $\mathbb{C} \cong \mathbb{R}^2$ when $x \in U$ and $U$ is a coordinate neighborhood. Identifying $\mathbb{C}$ with $\mathbb{R}^2$, a complex-valued $k$-form on a Riemann surface is then precisely the same thing as a pair of real $k$-forms: its real and imaginary part. This construction could, in fact, be done for any (real) smooth manifold.

We remarked in Subsection 4.4 of Chapter 8 that the complex line integral over a piecewise continuously differentiable curve in an open subset $U \subseteq \mathbb{C}$ we used so extensively in Chapter 10 (and the present chapter) can be expressed in terms of the line integral of the second kind. In the more modern context of complex-valued differential forms, this is expressed by the simple but somewhat profound formula

$$dz = dx + i\,dy. \tag{**}$$

In (**), we identify $\mathbb{C}$ with $\mathbb{R}^2$ by

$$z = x + iy.$$

The right-hand side of (**) is then a differential 1-form with complex coefficients, so we may integrate it over piecewise continuously differentiable curves $L$ in $\mathbb{C}$. When integrating the left-hand side of (**) over $L$, we mean, on the other hand, the corresponding complex line integral. This is, then, the same thing as treating $dz$ as a complex-valued 1-form. Using complex multiplication, we then have additional complex 1-forms $\omega = f(z)\,dz$ for a complex continuously real-differentiable function $f(z)$. A line integral of the complex-valued 1-form $\omega$ is then the same thing as the complex line integral as treated earlier, which explains a complex line integral as an integral of a complex-valued 1-form.
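Written out in real and imaginary parts, the identification reads as follows. This is a short worked expansion added for illustration; the names $u$, $v$ for the real and imaginary parts of $f$ are introduced here and do not appear in the text:

```latex
f = u + iv:\qquad
f(z)\,dz = (u + iv)(dx + i\,dy) = (u\,dx - v\,dy) + i\,(v\,dx + u\,dy),
\\
\int_L f(z)\,dz = \int_L (u\,dx - v\,dy) \;+\; i \int_L (v\,dx + u\,dy).
```

Both integrals on the right are real line integrals of the second kind, which is the pair of real 1-forms mentioned above.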

3.4.2 Holomorphic 1-Forms on a Riemann surface
On a general Riemann surface $\Sigma$, we no longer have a preferred form $dz$, but we do have one on a coordinate neighborhood with a holomorphic coordinate system $z$. Using the complex chain rule, we see that if $z = z(t)$ is a holomorphic function of another holomorphic coordinate $t$, then

$$dz = z'(t)\,dt$$

where $z'$ denotes the complex derivative (note that we only need to make sense of this on an open subset of $\mathbb{C}$). This means that 1-forms on a coordinate system, which can be given as

$$f(z)\,dz,$$

where $f$ is a holomorphic function, transform to 1-forms of the same kind upon holomorphic change of coordinates. Such complex-valued 1-forms are called holomorphic 1-forms on the Riemann surface $\Sigma$.

Now in analogy with Subsection 2.4 of Chapter 10, for a holomorphic 1-form $\omega$ on a Riemann surface $\Sigma$, a primitive function (if one exists) is a function $F : \Sigma \to \mathbb{C}$ such that

$$dF = \omega. \tag{*}$$

To see that this is the right generalization, note that on an open set $U \subseteq \mathbb{C}$, indeed, $dF = f(z)\,dz$ is equivalent to $F'(z) = f(z)$, see Exercise (12). Note that therefore, by what we proved in Chapter 10, it immediately follows that a primitive function to a holomorphic 1-form (if one exists) is necessarily holomorphic.


Even if a primitive function does not exist, note that the construction 3.3.2 immediately generalizes to give, for any holomorphic 1-form $\omega$ on a Riemann surface $\Sigma$, a covering

$$\pi : \Sigma_\omega \to \Sigma.$$

Again, $\Sigma_\omega = \Sigma \times \mathbb{C}$ as a set, and the topology has a basis consisting of the sets 3.3.2 (*), where $V \subseteq \Sigma$ is open, and $F$ is a primitive function of $\omega$ on $V$.

3.5 The basis $dz$, $d\bar z$

We are not, however, always interested just in holomorphic 1-forms. It is therefore natural to also introduce the "complex conjugate 1-form"

$$d\bar z = dx - i\,dy$$

on a coordinate neighborhood $U$ of a Riemann surface, where $z$ is the coordinate. Then $dz$, $d\bar z$ clearly form, at each point $x$ of a coordinate neighborhood $U$, a basis of the (complex) dual of the complexified tangent space $T\Sigma_x \otimes_{\mathbb{R}} \mathbb{C}$. Under a holomorphic change of coordinates $z = z(t)$, the 1-form $d\bar z$ transforms by

$$d\bar z = \overline{z'(t)}\,d\bar t.$$

It follows that the $\mathbb{C}$-vector spaces of forms on $U$

$$\{\varphi(z)\,dz \mid \varphi \text{ smooth } \mathbb{C}\text{-valued}\}, \quad \{\varphi(z)\,d\bar z \mid \varphi \text{ smooth } \mathbb{C}\text{-valued}\}$$

are preserved by holomorphic change of coordinates. Such forms are called 1-forms of type $(1,0)$, resp. of type $(0,1)$. In fact, note that if we define, for $\omega = f(z)\,dz + g(z)\,d\bar z$ with $f, g$ smooth on $U$,

$$\bar\omega = \overline{f(z)}\,d\bar z + \overline{g(z)}\,dz,$$

then this "complex conjugation" operator is invariant under holomorphic coordinate change, and switches the spaces of $(1,0)$-forms and $(0,1)$-forms.

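In view of this, it is helpful also to write the basis of complex vector fields dual to $dz$, $d\bar z$ on $U$:

$$\frac{\partial}{\partial z} = \frac{1}{2}\left(\frac{\partial}{\partial x} - i\frac{\partial}{\partial y}\right), \qquad \frac{\partial}{\partial \bar z} = \frac{1}{2}\left(\frac{\partial}{\partial x} + i\frac{\partial}{\partial y}\right).$$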


Note that in this notation, the Cauchy-Riemann equations for a function $f$ can be expressed simply by

$$\frac{\partial f}{\partial \bar z} = 0. \tag{3.5.1}$$

In other words, a continuously differentiable function $f : \Sigma \to \mathbb{C}$ is holomorphic if and only if it satisfies (3.5.1) in holomorphic coordinates. Let us now examine how the new basis behaves with respect to the exterior differential and the exterior product. Regarding the exterior product, note that

$$dz \wedge d\bar z = -2i\,dx \wedge dy. \tag{3.5.2}$$
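Both the duality of the vector fields $\partial/\partial z$, $\partial/\partial\bar z$ with $dz$, $d\bar z$ and formula (3.5.2) follow from short computations, spelled out here as a worked check (using only $dx(\partial/\partial x) = dy(\partial/\partial y) = 1$, $dx(\partial/\partial y) = dy(\partial/\partial x) = 0$ and the antisymmetry of $\wedge$):

```latex
dz\!\left(\tfrac{\partial}{\partial z}\right)
 = \tfrac12\,(dx + i\,dy)\!\left(\tfrac{\partial}{\partial x} - i\,\tfrac{\partial}{\partial y}\right)
 = \tfrac12\,(1 - i^2) = 1,
\qquad
dz\!\left(\tfrac{\partial}{\partial \bar z}\right)
 = \tfrac12\,(1 + i^2) = 0,
\\
d\bar z\!\left(\tfrac{\partial}{\partial \bar z}\right)
 = \tfrac12\,(1 - i^2) = 1,
\qquad
d\bar z\!\left(\tfrac{\partial}{\partial z}\right)
 = \tfrac12\,(1 + i^2) = 0,
\\
dz \wedge d\bar z = (dx + i\,dy) \wedge (dx - i\,dy)
 = -i\,dx \wedge dy + i\,dy \wedge dx
 = -2i\,dx \wedge dy .
```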

Regarding the exterior differential, one has, of course, for a complex continuously (real)-differentiable function $f$,

$$df = \frac{\partial f}{\partial z}\,dz + \frac{\partial f}{\partial \bar z}\,d\bar z.$$

Recalling the Cauchy-Riemann condition in the form (3.5.1), it then becomes natural to write, for a form $\omega_0 \in \{1, dz, d\bar z, dz \wedge d\bar z\}$ and a complex continuously differentiable function $f$ on $U$,

$$\partial(f\omega_0) = \frac{\partial f}{\partial z}\,dz \wedge \omega_0, \qquad \bar\partial(f\omega_0) = \frac{\partial f}{\partial \bar z}\,d\bar z \wedge \omega_0$$

(the point, of course, being that $d\omega_0 = 0$). Of course, we have

$$d = \partial + \bar\partial.$$

One readily verifies that $\partial$ and $\bar\partial$ are invariant under a change of holomorphic coordinate (see Exercise (13)). Because of that, $\partial$ and $\bar\partial$ are well-defined on any Riemann surface $\Sigma$.
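Two small worked examples, added here as an illustration, show how the operators separate holomorphic from non-holomorphic functions. The function $z^2$ satisfies (3.5.1), while $|z|^2$ does not:

```latex
f(z) = z^2 = (x+iy)^2:\quad
\frac{\partial f}{\partial \bar z}
 = \tfrac12\big(2(x+iy) + i \cdot 2i\,(x+iy)\big) = 0,
\qquad
\frac{\partial f}{\partial z}
 = \tfrac12\big(2(x+iy) - i \cdot 2i\,(x+iy)\big) = 2z;
\\
f(z) = |z|^2 = x^2 + y^2:\quad
\frac{\partial f}{\partial \bar z} = \tfrac12\,(2x + 2iy) = z,
\qquad
\frac{\partial f}{\partial z} = \tfrac12\,(2x - 2iy) = \bar z,
\qquad
df = \bar z\,dz + z\,d\bar z = 2x\,dx + 2y\,dy .
```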

Note that on a compact Riemann surface, there may exist non-trivial holomorphic 1-forms. For example, the form $dz$ obviously determines a well-defined holomorphic 1-form on any elliptic curve as defined in Subsection 3.2. Compare this with Exercise (9), which asserts that every holomorphic function on a compact Riemann surface is constant. In fact, note that if $\Sigma$ is a compact Riemann surface, then the space $\Omega^1_{Hol}(\Sigma)$ of holomorphic 1-forms on $\Sigma$ embeds canonically into the de Rham cohomology with complex coefficients

$$H^1_{DR}(\Sigma; \mathbb{C}) = H^1_{DR}(\Sigma) \otimes_{\mathbb{R}} \mathbb{C}.$$

This is because if a holomorphic 1-form $\omega$ satisfies $\omega = df$, then $f$ is necessarily holomorphic and hence constant, and hence $\omega = 0$. In fact, one can prove that for $\Sigma$ compact, there is a canonical isomorphism

$$H^1_{DR}(\Sigma; \mathbb{C}) \cong \Omega^1_{Hol}(\Sigma) \oplus \overline{\Omega^1_{Hol}(\Sigma)}.$$

Let us remark that the 1-form $dz$, of course, pulls back to any open subset $U \subseteq \mathbb{C}$, and hence also to any covering $\pi : V \to U$. We shall simplify notation by denoting $\pi^* dz = d(z \circ \pi)$ also by $dz$, thus defining "complex integration" of functions on any Riemann surface $V$ equipped with a covering $\pi : V \to U$ where $U \subseteq \mathbb{C}$ is an open subset. Since every point $z \in V$ has an open neighborhood which is mapped by $\pi$ holomorphically bijectively onto an open subset of $U$, a complex derivative of holomorphic functions $f : V \to \mathbb{C}$ is then also defined, as is the concept of a primitive function of $f$ on open subsets of $V$.

3.6 Complex line integrals revisited

In Chapter 8, we investigated extensively the implications of reparametrizing a piecewise continuously differentiable parametrized curve $L$. Note that in particular, we can make the domain of the parametrization the interval $\langle 0,1\rangle$, which lets us consider the parametrized curve $L$ as a path in the sense of Subsection 3.3. Note also that reparametrizations result in homotopic paths.

3.6.1 Theorem. Let $\Sigma$ be a Riemann surface, and let $\omega$ be a holomorphic 1-form on $\Sigma$. Let $L$, $M$ be piecewise continuously differentiable parametrized curves in $\Sigma$ which are homotopic as paths (in particular, they have the same beginning points and the same end points). Then

$$\int_L \omega = \int_M \omega.$$

Proof. Consider the covering $\Sigma_\omega$ of $\Sigma$ corresponding to the local primitive functions of $\omega$ (see 3.3.2 and 3.4.2). Let $z_0$ be the beginning point of the parametrized curves $L$, $M$. Let $\tilde L$ (resp. $\tilde M$) be a lift of the path $L$ (resp. $M$) to $\Sigma_\omega$ with beginning point $(z_0, 0)$. Let $(z_1, K_1)$, $(z_2, K_2)$ be the end points of $\tilde L$, $\tilde M$. We claim that

$$\int_L \omega = K_1, \qquad \int_M \omega = K_2.$$

In effect, find again $0 = t_0 < t_1 < \dots < t_k = 1$ such that $L[\langle t_i, t_{i+1}\rangle]$, $M[\langle t_i, t_{i+1}\rangle]$ for each chosen $i$ are contained in a fundamental neighborhood of the covering, and use the properties of primitive functions.

But then since $L$, $M$ are homotopic, $(z_1, K_1) = (z_2, K_2)$ by Theorem 3.3.4, which proves our statement. □

3.6.2 Corollary. Let $U \subseteq \mathbb{C}$ be an open set, let $f : U \to \mathbb{C}$ be a holomorphic function and let $L$, $M$ be piecewise continuously differentiable parametrized curves in $U$ which are homotopic as paths. Then

$$\int_L f(z)\,dz = \int_M f(z)\,dz. \qquad \square$$

Note that it would be quite difficult to prove this directly using the techniques of Chapter 10, in particular since there is no theory of line integrals of the second kind over continuous paths: we have really used the force of Theorem 3.3.4 here.
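The corollary can be observed numerically. The following is a rough sketch under stated assumptions, not a proof: the helper `line_integral` is a hypothetical name, it approximates the complex line integral by a midpoint rule on chords, and the three sample paths from $1$ to $-1$ in $U = \mathbb{C} \setminus \{0\}$ are chosen for illustration. The first two paths stay in the closed upper half-plane and are homotopic in $U$; the third passes below $0$ and is not homotopic to them.

```python
import cmath
import math


def line_integral(f, path, steps=20000):
    """Approximate the integral of f(z) dz along `path` (a map from [0, 1] to C)."""
    total = 0j
    for n in range(steps):
        z0 = path(n / steps)
        z1 = path((n + 1) / steps)
        # Midpoint rule on each small chord of the curve.
        total += f((z0 + z1) / 2) * (z1 - z0)
    return total


def upper(t):
    """Upper unit semicircle from 1 to -1."""
    return cmath.exp(1j * math.pi * t)


def detour(t):
    """Piecewise linear path 1 -> 2i -> -1, also in the upper half-plane."""
    if t <= 0.5:
        return 1 + 2 * t * (2j - 1)
    return 2j + (2 * t - 1) * (-1 - 2j)


def lower(t):
    """Lower unit semicircle from 1 to -1, on the other side of 0."""
    return cmath.exp(-1j * math.pi * t)


def inv(z):
    return 1 / z


i_upper = line_integral(inv, upper)
i_detour = line_integral(inv, detour)
i_lower = line_integral(inv, lower)
```

The first two values agree (both are close to $\pi i$), while the third is close to $-\pi i$; the difference $2\pi i$ reflects the fact that the loop formed by `upper` and the reverse of `lower` winds once around $0$.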

However, for open subsets of $\mathbb{C}$, we can go even further. Recall the definition of a simply connected open set from Subsection 1.2.

3.6.3 Theorem. For a connected open set $U \subsetneq \mathbb{C}$, the following are equivalent:
(1) $U$ is simply connected (i.e. every holomorphic function on $U$ has a primitive function).
(2) $U$ is holomorphically isomorphic to $\Omega(0,1)$.
(3) Let $a, b \in U$. Then any two paths $\omega, \eta$ with beginning point $a$ and end point $b$ are homotopic.

Proof. (1) implies (2) by the Riemann Mapping Theorem 1.2. (2) implies (3) because $\Omega(0,1)$ is a convex set: We may define the homotopy simply by $h(s,t) = (1-t)\,\omega(s) + t\,\eta(s)$. To see that (3) implies (1), suppose that $U$ is a connected open subset of $\mathbb{C}$ satisfying (3). Let $f$ be a holomorphic function on $U$. Let $\Sigma$ be the covering 3.3.2 corresponding to the primitive functions of $f$, and let $\Sigma_0$ be a connected component of $\Sigma$. By definition, the restriction $\pi_0 : \Sigma_0 \to U$ of the projection is a covering. We claim, in fact, that it is a holomorphic isomorphism. By Theorem 3.3.4, and the fact that $U$ is path-connected, $\pi_0$ is onto. Thus, if it is not a holomorphic isomorphism, it cannot be injective, i.e. there must be two distinct points $x, y \in \Sigma_0$ with $\pi_0(x) = \pi_0(y)$. But $\Sigma_0$ is connected, and since it is a manifold, also path-connected, so there is a path $\omega$ in $\Sigma_0$ with beginning point $x$ and end point $y$. Then the projection $\pi_0 \circ \omega$ in $U$ has the same beginning point and end point $\pi_0(x) = \pi_0(y)$, but cannot be homotopic to the constant path by Theorem 3.3.4, since its lift $\omega$ has a different beginning point and end point.

The contradiction proves that $\pi_0$ is a holomorphic isomorphism; the second coordinate of $\pi_0^{-1}(z)$ is then a primitive function of $f$ on $U$. □
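The convexity step in (2) implies (3) can be checked directly. For a convex-combination homotopy between two paths $\omega$, $\eta$ in $\Omega(0,1)$ from $a$ to $b$, written below in the form $(1-t)\,\omega(s) + t\,\eta(s)$ (this worked check is an added illustration), the triangle inequality keeps the homotopy inside the disc and the end points stay fixed:

```latex
|h(s,t)| = |(1-t)\,\omega(s) + t\,\eta(s)|
 \le (1-t)\,|\omega(s)| + t\,|\eta(s)| < 1,
\\
h(0,t) = (1-t)\,a + t\,a = a, \qquad h(1,t) = (1-t)\,b + t\,b = b .
```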


4 The universal covering and multi-valued functions

Theorem 3.6.3 suggests the following definition: A Riemann surface $\Sigma$ is called simply connected if it is connected, and if any two paths $\omega$, $\eta$ which have the same beginning point and the same end point are homotopic. The Riemann Mapping Theorem actually has a generalization called the Uniformization Theorem stating that every simply connected Riemann surface is holomorphically isomorphic to $\Omega(0,1)$, $\mathbb{C}$ or $\mathbb{CP}^1$, but we shall not prove this here (see, however, Exercise (17)).

4.1 Theorem. Every connected Riemann surface $\Sigma$ has a covering $\pi : \tilde\Sigma \to \Sigma$ where $\tilde\Sigma$ is simply connected. (This covering is called the universal covering of $\Sigma$.)

Proof. Select a point $x_0 \in \Sigma$. Define $\tilde\Sigma$ as the set of homotopy classes (i.e. equivalence classes with respect to the relation of homotopy) of paths $\omega$ with beginning point $x_0$. The homotopy class of a path $\omega$ will be denoted by $[\omega]$. We have an obvious map $\pi : \tilde\Sigma \to \Sigma$, sending a class $[\omega]$ to the end point of $\omega$ (by our definition of homotopy, this does not depend on the choice of a representative). Therefore, it remains to define a structure of a Riemann surface on $\tilde\Sigma$ and to prove that it is simply connected and that $\pi$ is a covering.

It is helpful here to introduce the operation of concatenation of paths, which is a generalization of the operation $+$ on parametrized continuously differentiable curves we considered in Chapter 8: If $\omega$, $\eta$ are paths in $\Sigma$ where the end point of $\omega$ is the beginning point of $\eta$, define the path $\omega * \eta$ by

$$(\omega * \eta)(t) = \begin{cases} \omega(2t) & \text{for } 0 \le t \le 1/2 \\ \eta(2t - 1) & \text{for } 1/2 \le t \le 1. \end{cases}$$

Note that (just as for piecewise continuously differentiable curves) concatenation is associative up to homotopy. Also similarly as for curves, the operation $-L$ of Chapter 8 has a generalization to paths: the inverse path of $\omega$ is defined by

$$\bar\omega(t) = \omega(1 - t).$$
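The two operations translate directly into code. The sketch below is an illustration only; it represents a path as a Python function on the interval from 0 to 1, and the names `concat` and `inverse` are hypothetical:

```python
def concat(omega, eta):
    """Concatenation: run omega at double speed, then eta at double speed.

    Assumes omega(1) == eta(0), so the two formulas agree at t = 1/2.
    """
    def path(t):
        return omega(2 * t) if t <= 0.5 else eta(2 * t - 1)
    return path


def inverse(omega):
    """Inverse path: omega traversed backwards."""
    return lambda t: omega(1 - t)


# Example: a horizontal segment from 0 to 1 followed by a vertical one to 1 + i.
segment = lambda t: complex(t)
vertical = lambda t: 1 + 1j * t
corner = concat(segment, vertical)
there_and_back = concat(segment, inverse(segment))
```

The path `there_and_back` begins and ends at $0$; it is not the constant path, but it is homotopic to one.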

One readily proves that $\omega * \bar\omega$ is homotopic to a constant path, as is $\bar\omega * \omega$.

To proceed further, let $[\omega] \in \tilde\Sigma$, $\pi[\omega] = x$. Let $(U_x, h_x)$ be a coordinate system of $\Sigma$ at $x$, and let $V \subseteq h_x[U_x]$ be a convex open subset containing $h_x(x)$. Then let $U_{[\omega],V} \subseteq \tilde\Sigma$ be the set consisting of all classes $[\omega * ((h_x)^{-1} \circ L)]$ where $L$ is a linearly parametrized line segment in $V$ with beginning point $h_x(x)$. Note that this class does not depend on the choice of the representative $\omega$ of the class $[\omega]$. Note that by definition, $\pi$ maps $U_{[\omega],V}$ bijectively onto $h_x^{-1}[V]$. (Note: our notation implies that a fixed coordinate system is specified at each $x \in \Sigma$; otherwise, the notation $U_{[\omega],V}$ must be modified to reflect the coordinate system.)


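4.1.1 Lemma. If $\pi([\omega]) = \pi([\eta])$ (i.e. $\omega$ and $\eta$ have the same end point $x$) and $[\omega] \neq [\eta]$ (i.e. $\omega$ and $\eta$ are not homotopic), then for any convex open subset $V \subseteq h_x[U_x]$,

$$U_{[\omega],V} \cap U_{[\eta],V} = \emptyset.$$

Proof. If $\omega * ((h_x)^{-1} \circ L) \simeq \eta * ((h_x)^{-1} \circ L)$, then

$$\omega * ((h_x)^{-1} \circ L) * \overline{((h_x)^{-1} \circ L)} \simeq \eta * ((h_x)^{-1} \circ L) * \overline{((h_x)^{-1} \circ L)}$$

(note: this uses associativity of $*$ up to homotopy), which in turn implies

$$\omega \simeq \eta$$

(which uses the inverse property). □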

We still need to make yet another observation.

4.1.2 Lemma. Let $\omega$, $V$ be as above and let $y \in h_x^{-1}[V]$. Then there exists a path $\eta$ and an $\varepsilon > 0$ such that for a convex open $W \subseteq \Omega(h_y(y), \varepsilon)$, $U_{[\eta],W} \subseteq U_{[\omega],V}$.

Proof. It suffices to choose $\varepsilon > 0$ such that $h_y^{-1}[\Omega(h_y(y), \varepsilon)] \subseteq h_x^{-1}[V]$. We set

$$\eta = \omega * ((h_x)^{-1} \circ L)$$

where $L$ is a linearly parametrized line segment with beginning point $h_x(x)$ and end point $h_x(y)$. To prove that $U_{[\eta],W} \subseteq U_{[\omega],V}$, let $M$ be a linearly parametrized line segment in $W$ with beginning point $h_y(y)$ and end point $h_y(z)$. We need to prove that

$$[\eta * (h_y^{-1} \circ M)] \in U_{[\omega],V}. \tag{*}$$

To this end, note that by associativity of $*$,

$$\eta * (h_y^{-1} \circ M) \simeq \omega * (h_x^{-1} \circ L) * (h_y^{-1} \circ M).$$

Now we have

$$(h_x^{-1} \circ L) * (h_y^{-1} \circ M) = h_x^{-1} \circ (L * (h_x \circ h_y^{-1} \circ M)).$$

Clearly, the path $L * (h_x \circ h_y^{-1} \circ M)$ in $V$ is not, in general, a linearly parametrized line segment, but is homotopic to one since $V$ is a convex set. This proves (*). □


Now by Lemma 4.1.2, we can give $\tilde\Sigma$ a topology where a subset $U$ is a neighborhood of an $[\omega] \in U$ if and only if it contains a subset of the form $U_{[\omega],V}$ (we need the lemma to conclude that this definition is correct in the sense that a set we call a neighborhood indeed contains an open subset). Lemma 4.1.1 then implies that for $U = h_x^{-1}[V]$ as above, $\pi^{-1}[U]$ is a disjoint union of open subsets $U_{[\omega],V}$ over all the $[\omega]$ with $\pi([\omega]) = x$. We can then define an atlas of $\tilde\Sigma$ as the open sets $U_{[\omega],V}$ together with the coordinate maps $h_x \circ \pi$ where $\pi([\omega]) = x$. It then follows that $\tilde\Sigma$ is a path-connected Riemann surface and $\pi$ is a covering, except for one detail: we must prove that $\tilde\Sigma$ is separable.

To this end, let $U_i$ be a countable basis of $\Sigma$ such that each $U_i$ is connected and contained in a convex subset of a coordinate neighborhood. Then each $U_i$ is a fundamental neighborhood. We will prove that for each $x \in \Sigma$, the set $\pi^{-1}[\{x\}]$ is countable; then the connected components of $\pi^{-1}[U_i]$ form a countable basis of $\tilde\Sigma$. But note that by compactness as above, for every path $\omega$ with beginning point $x_0$ and end point $x$, there exist $0 = t_0 < t_1 < \dots < t_m = 1$ and a finite sequence $i_1, \dots, i_m$ such that $\omega[\langle t_{j-1}, t_j\rangle] \subseteq U_{i_j}$ for $j = 1, \dots, m$. In particular, $U_{i_j} \cap U_{i_{j+1}} \neq \emptyset$. One then proves by induction that there exist unique connected components $\tilde U_{i_j}$ of $\pi^{-1}[U_{i_j}]$ such that $\tilde U_{i_j} \cap \tilde U_{i_{j+1}} \neq \emptyset$. Since clearly $[\omega] \in \tilde U_{i_m}$, and since there are only countably many such sequences $i_1, \dots, i_m$, there are only countably many $[\omega]$ with $\pi([\omega]) = x$, as claimed.

Finally, we shall prove that $\tilde\Sigma$ is simply connected. But this is easy. First of all, $\tilde\Sigma$ is path-connected by construction (since the lift of a path $\omega$ with beginning point $x_0$ has, by definition, end point $[\omega]$). Next, suppose that $\alpha$, $\beta$ are two paths in $\tilde\Sigma$ with the same beginning point $[\omega]$ and the same end point $[\eta]$. But this means $[\omega * (\pi \circ \alpha)] = [\eta] = [\omega * (\pi \circ \beta)]$, i.e. $\omega * (\pi \circ \alpha) \simeq \omega * (\pi \circ \beta)$, which implies $\pi \circ \alpha \simeq \pi \circ \beta$ (by concatenating with $\bar\omega$), which implies $\alpha \simeq \beta$ by Theorem 3.3.4. □

4.2 Base points, universality and multi-valued functions

Let $\Sigma$ be a connected Riemann surface and let $x_0 \in \Sigma$. We refer to such a chosen point as a base point of $\Sigma$. Note that we already used the base point in the construction of the universal covering $\tilde\Sigma$, and that in fact that construction comes with a preferred base point $\tilde x_0$, represented by the constant path at $x_0$. We have $\pi(\tilde x_0) = x_0$. We refer to a covering $\pi : T \to \Sigma$ with a choice of base points $\pi(\tilde x_0) = x_0$ as a based covering.

The term "universal covering" (which should really pedantically be called "universal based covering") is justified by the following fact:

4.2.1 Theorem. Let $\Sigma$ be a connected Riemann surface. Consider the based universal covering $\pi : \tilde\Sigma \to \Sigma$, with base points $\tilde x_0 \mapsto x_0$, and let $\rho : T \to \Sigma$ be any based covering, with base points $y_0 \mapsto x_0$. Then there exists a unique based covering $\varphi : \tilde\Sigma \to T$ such that $\varphi(\tilde x_0) = y_0$. In fact, we have $\rho \circ \varphi = \pi$.

Proof. Let $x \in \tilde\Sigma$. Let $\omega$ be a path in $\tilde\Sigma$ with beginning point $\tilde x_0$ and end point $x$. By Theorem 3.3.4, there is a unique lift $\eta$ of the path $\pi \circ \omega$ to $T$ with beginning point $y_0$. Let $\varphi(x)$ be the end point of $\eta$. (Note in fact that this definition is forced by the path lifting property, which already implies uniqueness.) On the other hand, also note that our definition of $\varphi(x)$ did not depend on the choice of the path $\omega$, since any two such paths are homotopic as $\tilde\Sigma$ is simply connected. Because of this, if $U$ is a connected fundamental open neighborhood of a point $z \in \Sigma$ for both the coverings $\pi$ and $\rho$, and if $U_i$ (resp. $U_j$) is the open disjoint summand of $\pi^{-1}[U]$ (resp. $\rho^{-1}[U]$) such that $\pi_0 = \pi|_{U_i} : U_i \to U$ (resp. $\rho_0 = \rho|_{U_j}$) and which contains $x$ (resp. $\varphi(x)$), then $\varphi|_{U_i}$ is given by the formula $\rho_0^{-1} \circ \pi_0$, which shows that $\varphi$ is a covering with such fundamental neighborhoods $U_j$. (Note: if $y \in T$ is not in the connected component of the base point, then it won't be in the image of $\varphi$, so the fundamental neighborhood of $y$ can be chosen to be the whole connected component.) □

We immediately get the following

4.2.2 Theorem. A connected Riemann surface $\Sigma$ is simply connected if and only if every covering $\pi : T \to \Sigma$ with $T$ connected is a holomorphic isomorphism.

Proof. Suppose $\Sigma$ is simply connected and $\pi : T \to \Sigma$ is a covering with $T$ connected. We already remarked that $\pi$ is onto by Theorem 3.3.4. Suppose $\pi(x_1) = \pi(x_2)$. Let $\omega$ be a path in $T$ with beginning point $x_1$ and end point $x_2$. Then $\pi \circ \omega$ has a beginning point equal to its end point, and hence is homotopic to the constant path since $\Sigma$ is simply connected. Thus, by Theorem 3.3.4, $x_1 = x_2$. Thus, $\pi$ is also injective, and thus is a holomorphic isomorphism.

On the other hand, suppose $\Sigma$ is connected but not simply connected. Then the universal covering $\pi : \tilde\Sigma \to \Sigma$ cannot be a holomorphic isomorphism, since $\tilde\Sigma$ is simply connected. □

4.2.3 Corollary. (Uniqueness of universal covering) Let $\Sigma$ be a connected Riemann surface. A based universal covering $\pi : \tilde\Sigma \to \Sigma$ with base points $\tilde x_0 \mapsto x_0$ is unique in the sense that for any other based universal covering $\rho : T \to \Sigma$ with base points $z_0 \mapsto x_0$, there exists a unique holomorphic isomorphism $\phi : \tilde\Sigma \to T$ such that $\phi(\tilde x_0) = z_0$. In fact, $\rho \circ \phi = \pi$.

Proof. By Theorem 4.2.1, there exists a unique covering $\phi : \tilde\Sigma \to T$ with the specified properties. Since $T$ is simply connected, by Theorem 4.2.2, $\phi$ is a holomorphic isomorphism. □

4.2.4 Multi-valued functions

Let $\Sigma$ be a based connected Riemann surface with base point $x_0$, and let $\pi : \tilde\Sigma \to \Sigma$ be a based universal covering with base points $\tilde x_0 \mapsto x_0$. Then we define a multi-valued holomorphic function on $\Sigma$ based at $x_0$ as a holomorphic function on $\tilde\Sigma$.

Note that then, in particular, the multi-valued function based at a point $x_0$ does have a well-defined "value" at the point $x_0$.

Multi-valued holomorphic functions based at $x_0$ form an algebra in the sense that they contain (ordinary) holomorphic functions (a holomorphic function $f$ is identified with the multi-valued function $f \circ \pi$), and have well-defined operations of addition and multiplication. Much more is true, of course: for example, if $f$ is a multi-valued holomorphic function based at $x_0$ and $g : \mathbb{C} \to \mathbb{C}$ is an ordinary holomorphic function, then there is a well-defined multi-valued holomorphic function $g \circ f$ based at $x_0$.

Note that by Corollary 4.2.3, the choice of $\tilde\Sigma$ does not matter in the sense that multi-valued holomorphic functions defined via any other based holomorphic universal covering are related to those defined via $\tilde\Sigma$ by a preferred bijection, namely the one induced by the based holomorphic isomorphism between $\tilde\Sigma$ and $T$, and that this bijection preserves all the operations in sight. It is important to note, however, that unless $\Sigma$ is simply connected, there is no preferred way of identifying the algebras of multi-valued holomorphic functions based at different base points of $\Sigma$.

Examples of multi-valued holomorphic functions on Riemann surfaces can be obtained from holomorphic 1-forms $\omega$: Note that we have a primitive function $F$ of $\omega$ well-defined on any connected component of the covering $\Sigma_\omega$, and hence, by Theorem 4.2.1, on the universal cover. This is referred to as the multi-valued primitive function of $\omega$. Note that a discussion of base points is not so important here, since no matter how we choose base points, two multi-valued primitive functions of the same holomorphic 1-form will differ by a constant. In particular, for connected open sets $U \subseteq \mathbb{C}$, we have a well-defined notion (up to additive constant) of a multi-valued primitive function based at $z_0 \in U$ of a given multi-valued function based at $z_0$.

For example, the multi-valued primitive function of

$$f(z) = \frac{1}{z - z_0}$$

on $\mathbb{C} \setminus \{z_0\}$ with value equal to 0 at the base point which is chosen to project to $z_0 + 1 \in \mathbb{C} \setminus \{z_0\}$ is called the multi-valued logarithm $\ln(z - z_0)$. Choosing an arbitrary $\alpha \in \mathbb{C}$, we then obtain the multi-valued function

$$(z - z_0)^\alpha = e^{\alpha \ln(z - z_0)}$$

on $\mathbb{C} \setminus \{z_0\}$, also based at $z_0 + 1 \in \mathbb{C} \setminus \{z_0\}$. Sometimes different conventions of base points are appropriate (see below). In any case, no matter what base point we specify, the multi-valued logarithm is well defined up to adding an integral multiple of $2\pi i$, and $(z - z_0)^\alpha$ is well defined up to a non-zero multiplicative constant (of modulus 1 when $\alpha$ is real).
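The ambiguity by multiples of $2\pi i$ can be observed numerically. The following is a minimal sketch, not part of the original text: it continues the primitive of $1/(z - z_0)$ along the circle $z_0 + e^{it}$, starting with value 0 at the base point $z_0 + 1$. The function name `continue_log` and the step count are choices made only for this illustration.

```python
import cmath
import math


def continue_log(z0, turns, steps_per_turn=2000):
    """Integrate dz/(z - z0) along the circle z0 + e^{it}, t from 0 to 2*pi*turns.

    The starting value at the base point z0 + 1 is 0, so the return value is the
    value of the continued logarithm at the end point of the lifted path.
    Negative `turns` means clockwise orientation.
    """
    n = abs(turns) * steps_per_turn
    total = 0j
    for k in range(n):
        t0 = 2 * math.pi * turns * k / n
        t1 = 2 * math.pi * turns * (k + 1) / n
        a = z0 + cmath.exp(1j * t0)
        b = z0 + cmath.exp(1j * t1)
        mid = z0 + cmath.exp(1j * (t0 + t1) / 2)
        # midpoint rule for the integral of dz/(z - z0) over a short arc
        total += (b - a) / (mid - z0)
    return total


if __name__ == "__main__":
    one_turn = continue_log(0.5 + 0.5j, 1)
    print("after one counter-clockwise turn:", one_turn)
    # (z - z0)^alpha = exp(alpha * ln(z - z0)) picks up the factor exp(2*pi*i*alpha)
    print("factor for alpha = 1/2:", cmath.exp(0.5 * one_turn))
```

Each full turn adds approximately $2\pi i$ to the value, so for $\alpha = 1/2$ the power $(z - z_0)^\alpha$ changes sign after one turn.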


4.2.5 Example

The behavior of multi-valued functions can be quite complicated. Consider the multi-valued function

$$f(z) = z^a (z - 1)^b \tag{1}$$

on $U = \mathbb{C} \setminus \{0, 1\}$. Assume, for simplicity, $a, b > 0$ to be real numbers. (Note that there exist unique multi-valued functions $z^a$, $(z - 1)^b$ based at any chosen point $0 < z_0 < 1$ whose values at the base point are positive real numbers.)

Now let $F$ be the multi-valued primitive function of $f$ on $U$. (Let, for example, the value of $F$ at the base point $\tilde z_0$, $\pi(\tilde z_0) = z_0$, be 0.) Now let $K$ be the circle with center 0 and radius $z_0$ (and beginning point $z_0$) oriented counter-clockwise, and let $L$ be the circle with center 1 and radius $1 - z_0$ (and beginning point $z_0$) oriented counter-clockwise.

Let $\omega$ be a concatenation of $m$ copies of $K$ and $n$ copies of $L$ (in any fixed order), and let $\tilde z_1$ be the end point of the lift $\tilde\omega$ to the universal covering with beginning point $\tilde z_0$. Then one immediately sees that

$$f(\tilde z_1) = f(\tilde z_0) e^{2\pi(ma + nb)i}. \tag{2}$$

Let us now examine the behavior of the function $F$: First note that the integrals

$$A = \int_0^{z_0} z^a (z - 1)^b\,dz, \qquad B = \int_{z_0}^1 z^a (z - 1)^b\,dz$$

actually exist in the sense of ordinary real analysis, and are equal to (finite) positive real numbers. Additionally, the integrals of (1) over a circle with radius $\varepsilon$ and center 0 or 1 go to 0 with $\varepsilon \to 0$. Because of this, if $\tilde K$ is a lift of $K$ to the universal covering with beginning point $\tilde z_1$ as above, we have, denoting the end point by $\tilde z_2$,

$$F(\tilde z_2) - F(\tilde z_1) = e^{2\pi(ma + nb)i} (e^{2\pi a i} - 1) A, \tag{3}$$

while if $\tilde L$ is a lift of $L$ to the universal cover with beginning point $\tilde z_1$ as above and end point $\tilde z_3$, we have

$$F(\tilde z_3) - F(\tilde z_1) = e^{2\pi(ma + nb)i} (1 - e^{2\pi b i}) B. \tag{4}$$

Note that the operations (3), (4) do not commute: if we begin at $\tilde z_1$ and follow first $K$ and then $L$, the value of the primitive function increases by

$$e^{2\pi(ma + nb)i} (e^{2\pi a i} - 1) A + e^{2\pi((m+1)a + nb)i} (1 - e^{2\pi b i}) B,$$

while following $L$ first and then $K$ beginning from the same point $\tilde z_1$ gives an increase of

$$e^{2\pi(ma + (n+1)b)i} (e^{2\pi a i} - 1) A + e^{2\pi(ma + nb)i} (1 - e^{2\pi b i}) B.$$

These two values are in general not equal. Because of this, it is not true, contrary to what one may naively expect, that $F(z)/f(z)$ would be a single-valued function on $U$ (in the sense that it would be a composition of an ordinary holomorphic function on $U$ with $\pi$). Note that we also see that the end points of the lifts of $K * L$ and $L * K$ to the universal covering with the same beginning point are, in fact, different.

Up to normalization, the function $F$ belongs to a family of functions called hypergeometric functions; they are, in some sense, the "simplest" multi-valued holomorphic functions on a connected open subset of $\mathbb{C}$ for which this phenomenon occurs.
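The failure of commutativity in this example can be made concrete with a few lines of arithmetic. The sketch below simply applies the increment rules (3) and (4) in the two possible orders. The numerical values of $a$, $b$, $A$, $B$ are hypothetical inputs chosen for illustration (in the example, $A$ and $B$ are the integrals defined above), and the names `step_K`, `step_L` are not from the text.

```python
import cmath
import math


def phase(x):
    """Return e^{2 pi i x}."""
    return cmath.exp(2j * math.pi * x)


def step_K(state, a, b, A):
    """Follow one copy of K from a point reached after m copies of K and n of L.

    state = (m, n, F): counts of K and L traversed so far and the current
    value of the primitive function. Uses the increment formula (3).
    """
    m, n, F = state
    return (m + 1, n, F + phase(m * a + n * b) * (phase(a) - 1) * A)


def step_L(state, a, b, B):
    """Follow one copy of L; uses the increment formula (4)."""
    m, n, F = state
    return (m, n + 1, F + phase(m * a + n * b) * (1 - phase(b)) * B)


if __name__ == "__main__":
    a, b, A, B = 0.5, 0.25, 1.0, 2.0  # hypothetical sample values
    start = (0, 0, 0j)
    k_then_l = step_L(step_K(start, a, b, A), a, b, B)
    l_then_k = step_K(step_L(start, a, b, B), a, b, A)
    print("K then L:", k_then_l[2])
    print("L then K:", l_then_k[2])
    print("difference:", k_then_l[2] - l_then_k[2])
```

Subtracting the two displayed increments gives the difference $e^{2\pi(ma+nb)i}(e^{2\pi a i} - 1)(1 - e^{2\pi b i})(A + B)$, which is non-zero whenever neither $a$ nor $b$ is an integer, since $A + B > 0$.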

4.3 The fundamental group

Let $\Sigma$ be a connected Riemann surface with base point $x_0$. Denote by $\pi_1(\Sigma, x_0)$ the set of all homotopy classes of paths in $\Sigma$ with beginning point and end point $x_0$. Recall the proof of Theorem 4.1, and specifically the operation $*$ of concatenation of paths. From the arguments given there, it follows that $*$ gives a well-defined binary operation on $\pi_1(\Sigma, x_0)$. Moreover, note that the constant path at $x_0$ is a unit element for the operation $*$. Also observe that if, for a path $\omega$, we define a path $\overline{\omega}$ given by

$$\overline{\omega}(t) = \omega(1 - t),$$

then $[\overline{\omega}]$ is the inverse of $[\omega]$ with respect to $*$. Thus, the set $\pi_1(\Sigma, x_0)$ with the operation $*$ is a group in the sense of Appendix B, 3.1. This group is called the fundamental group of $\Sigma$ with base point $x_0$. There are many interesting and deep connections between the fundamental group and coverings, which we cannot explore in this text, in part because we do not develop the theory of groups in any substantial way. After filling in the necessary algebra, say, in [2], the reader can find more information in [6, 13, 20].

There is, however, one connection between the fundamental group and the universal cover which is too beautiful and striking to pass up. Consider a based universal cover

$$\pi : (\tilde\Sigma, \tilde x_0) \to (\Sigma, x_0)$$

of a connected Riemann surface $\Sigma$. A deck transformation is a homeomorphism $f : \tilde\Sigma \to \tilde\Sigma$ such that the following diagram commutes:

$$\begin{array}{ccc}
\tilde\Sigma & \xrightarrow{\;f\;} & \tilde\Sigma \\
{\scriptstyle \pi}\searrow & & \swarrow{\scriptstyle \pi} \\
 & \Sigma &
\end{array}$$

(In other words, such that $\pi \circ f = \pi$.) Note that a deck transformation is automatically a holomorphic isomorphism, and that deck transformations form a group with respect to composition of maps. Denote this group by $\Gamma$.

Now define a map

$$\Phi : \Gamma \to \pi^{-1}[\{x_0\}]$$

by letting, for a deck transformation $f$,

$$\Phi(f) = f(\tilde x_0).$$

Define, on the other hand, a map

$$\Psi : \pi_1(\Sigma, x_0) \to \pi^{-1}[\{x_0\}]$$

as follows: Let $\omega$ be a path in $\Sigma$ with beginning point and end point $x_0$. Let $\tilde\omega$ be a path in $\tilde\Sigma$ which is the unique lifting of $\omega$ with beginning point $\tilde x_0$ (see Theorem 3.3.4). Then let $\Psi([\omega])$ be the end point of $\tilde\omega$. By Theorem 3.3.4, this does not depend on the choice of the representative $\omega$ of the class $[\omega] \in \pi_1(\Sigma, x_0)$. The following result can often be used to compute the fundamental group (see Exercise (15)).

Theorem. The maps $\Phi$ and $\Psi$ are bijections. Moreover, the composition $\Phi^{-1} \circ \Psi$ is a homomorphism (hence isomorphism) of groups.

Proof. The fact that $\Phi$ is bijective is a special case of the universality Theorem 4.2.1. To show that $\Psi$ is onto, recall that $\tilde\Sigma$ is connected, and hence path-connected. Let $y \in \pi^{-1}[\{x_0\}]$ and let $\eta$ be a path in $\tilde\Sigma$ from $\tilde x_0$ to $y$. Put $\omega = \pi \circ \eta$. Then $\Psi([\omega]) = y$. To prove injectivity, note that the $\eta$ just mentioned is unique up to homotopy since $\tilde\Sigma$ is simply connected, and composing with $\pi$ gives uniqueness of $[\omega]$.

To prove that $\Phi^{-1} \circ \Psi$ is a homomorphism of groups, let $\eta$, $\omega$ be paths in $\Sigma$ with beginning points and end points $x_0$ and let $\tilde\eta$, $\tilde\omega$ be their lifts to $\tilde\Sigma$ with beginning point $\tilde x_0$. Let, on the other hand, $\hat\eta$ be the lift of $\eta$ to $\tilde\Sigma$ whose beginning point is the end point $\tilde x_1$ of $\tilde\omega$. Now let $f$ be the deck transformation which sends $\tilde x_0$ to $\tilde x_1$. Then by uniqueness of path lifting, $f \circ \tilde\eta = \hat\eta$. In particular, if we denote the end point of $\tilde\eta$ by $\hat x_0$ and the end point of $\hat\eta$ by $\hat x_1$, then

$$f(\hat x_0) = \hat x_1.$$

Let $g$ be the deck transformation which sends $\tilde x_0$ to $\hat x_0$. We see that

$$\Phi(f \circ g) = \hat x_1 = \Psi([\omega * \eta]),$$

so

$$f \circ g = \Phi^{-1} \circ \Psi([\omega * \eta]),$$

while

$$f = \Phi^{-1} \circ \Psi([\omega]), \qquad g = \Phi^{-1} \circ \Psi([\eta]),$$

which is what we wanted to prove. □
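For the covering $e^z : \mathbb{C} \to \mathbb{C} \setminus \{0\}$ of Exercise (14), the map $\Psi$ can be computed numerically by lifting loops step by step. The sketch below is an illustration under stated assumptions, not part of the original text: the base points are taken to be $0 \mapsto 1$, loops are given as functions on $[0, 1]$, and the helper names `lift_through_exp`, `concat` and `circle` are hypothetical. The end point of the lift is an element of the fibre $2\pi i \mathbb{Z}$, and concatenating loops adds the end points.

```python
import cmath
import math


def lift_through_exp(loop, steps=4000):
    """End point of the lift of `loop` through exp, starting at 0.

    `loop` maps [0, 1] to C \\ {0} with loop(0) = loop(1) = 1. The lift is
    built incrementally: over a short step the ratio of consecutive values is
    close to 1, so its principal logarithm is the increment of the lift.
    """
    w = 0j
    prev = loop(0.0)
    for k in range(1, steps + 1):
        cur = loop(k / steps)
        w += cmath.log(cur / prev)
        prev = cur
    return w


def concat(first, second):
    """Concatenation of two loops: run `first`, then `second`."""
    return lambda t: first(2 * t) if t < 0.5 else second(2 * t - 1)


def circle(n):
    """The unit circle traversed n times (counter-clockwise for n > 0)."""
    return lambda t: cmath.exp(2j * math.pi * n * t)


if __name__ == "__main__":
    print(lift_through_exp(circle(2)))
    print(lift_through_exp(concat(circle(2), circle(-3))))
    # a loop through 1 that does not go around 0 lifts to a closed path
    print(lift_through_exp(lambda t: 2 - cmath.exp(2j * math.pi * t)))
```

In this case the deck transformations are the translations $w \mapsto w + 2\pi i k$, so the end point of the lift determines the deck transformation $\Phi^{-1}(\Psi([\omega]))$ directly.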

4.4 Comment

The reader no doubt noticed that the concepts of covering, universal covering, and fundamental group do not use the structure of a Riemann surface very substantially. They can, indeed, be defined for more general topological spaces. In order for the nice theorems we presented to be true, however, some "local assumptions" about the topological spaces involved must be included. The book [20] contains an easily accessible discussion of coverings in a more general topological context. One case which works very well is the case of smooth (or even topological) manifolds. Definition 3.3.1, Theorem 3.3.4, Theorem 4.1, Theorem 4.2.1, the definition of fundamental group in 4.3 and Theorem 4.3 remain valid if we replace "Riemann surface" by "smooth manifold" (resp. "topological manifold") and "holomorphic isomorphism" by "diffeomorphism" (resp. "homeomorphism").

Yet, the case of Riemann surfaces, which we discussed above, is particularly striking, and in this context, coverings were first discovered by Riemann.

5 Complex analysis beyond holomorphic functions

We are now ready to extend Cauchy's formula (Theorem 3.3 of Chapter 10) to the case of any continuously (real)-differentiable function. Let us write an integration variable

$$\zeta = s + it$$

(to distinguish from the standard convention $z = x + iy$).

5.1 Theorem. (The Cauchy-Green formula) Let $U$ be a domain in $\mathbb{C}$. Let $L_1, \dots, L_k$ be simple piecewise continuously differentiable closed curves with disjoint images such that $L_1 \sqcup \dots \sqcup L_k$ is the boundary of $U$ oriented counter-clockwise. Let $f$ be defined and have continuous real partial derivatives on an open set $V \subseteq \mathbb{C}$ containing $\overline{U}$. Then for $z \in U$, we have

$$-\frac{1}{\pi} \int_U \frac{(\partial f/\partial\bar\zeta)\,ds\,dt}{\zeta - z} + \frac{1}{2\pi i} \sum_{j=1}^k \int_{L_j} \frac{f(\zeta)\,d\zeta}{\zeta - z} = f(z). \tag{5.1.1}$$

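Proof. It is actually almost the same as the proof of Theorem 3.3 of Chapter 10. Using the language of Subsection 3.5, we may rewrite (5.1.1) as

$$-\frac{1}{2\pi i} \int_U d\left(\frac{f(\zeta)\,d\zeta}{\zeta - z}\right) + \frac{1}{2\pi i} \sum_{j=1}^k \int_{L_j} \frac{f(\zeta)\,d\zeta}{\zeta - z} = f(z). \tag{*}$$

On the other hand, for $\varepsilon > 0$ small, if we denote by $K$ the boundary of $\Omega(z, \varepsilon)$ oriented counter-clockwise, then Green's Theorem 5.4 of Chapter 8 gives

$$-\frac{1}{2\pi i} \int_{U \setminus \Omega(z, \varepsilon)} d\left(\frac{f(\zeta)\,d\zeta}{\zeta - z}\right) + \frac{1}{2\pi i} \sum_{j=1}^k \int_{L_j} \frac{f(\zeta)\,d\zeta}{\zeta - z} = \frac{1}{2\pi i} \int_K \frac{f(\zeta)\,d\zeta}{\zeta - z}. \tag{**}$$

When $\varepsilon \to 0$, the right-hand side tends to $f(z)$ by the same argument as in the proof of Theorem 3.3 of Chapter 10. So it remains to prove that

$$\lim_{\varepsilon \to 0} \int_{\Omega(z, \varepsilon)} \frac{(\partial f/\partial\bar\zeta)\,ds\,dt}{\zeta - z} = 0,$$

which, since $\partial f/\partial\bar\zeta$ is continuous and hence bounded near $z$, follows from

$$\lim_{\varepsilon \to 0} \int_{\Omega(z, \varepsilon)} \frac{ds\,dt}{|\zeta - z|} = 0,$$

which is an obvious calculation (for example in polar coordinates, where the integral equals $2\pi\varepsilon$). □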
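Formula (5.1.1) can be tested numerically on the unit disk. The following is a rough sketch under stated assumptions, not a general implementation: $U$ is taken to be the open unit disk with its boundary circle as the only curve, the area integral is evaluated in polar coordinates centred at $z$ (which removes the singularity, since $ds\,dt/(\zeta - z) = e^{-i\theta}\,dr\,d\theta$), and simple midpoint and trapezoid rules are used. The function name `cauchy_green` and the test function $f(\zeta) = \zeta\bar\zeta$ are choices made for this illustration.

```python
import cmath
import math


def cauchy_green(f, dfbar, z, n_theta=400, n_r=200):
    """Left-hand side of (5.1.1) for U = open unit disk and |z| < 1.

    f     : the function, evaluated at complex points
    dfbar : its derivative with respect to the conjugate variable
    """
    h = 2 * math.pi / n_theta
    boundary = 0j
    area = 0j
    for j in range(n_theta):
        e = cmath.exp(1j * h * j)  # e^{i theta}
        # boundary term: zeta = e^{i theta}, d zeta = i e^{i theta} d theta
        boundary += f(e) / (e - z) * 1j * e * h
        # distance R from z to the unit circle in the direction theta
        c = (z / e).real  # Re(z e^{-i theta}), since |e| = 1
        R = -c + math.sqrt(c * c + 1 - abs(z) ** 2)
        dr = R / n_r
        # midpoint rule along the ray zeta = z + r e^{i theta}, 0 <= r <= R
        inner = sum(dfbar(z + (k + 0.5) * dr * e) for k in range(n_r)) * dr
        area += inner / e * h  # factor e^{-i theta} from ds dt / (zeta - z)
    return -area / math.pi + boundary / (2j * math.pi)


if __name__ == "__main__":
    z = 0.3 + 0.2j
    # f(zeta) = zeta * conj(zeta) is not holomorphic; d f / d conj(zeta) = zeta
    value = cauchy_green(lambda w: w * w.conjugate(), lambda w: w, z)
    print(value, "should be close to", abs(z) ** 2)
```

For this $f$ the boundary term alone gives 1, so the area term is exactly what corrects the classical Cauchy formula down to $|z|^2$.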

5.2 The “inverse” Cauchy-Riemann operator

The Cauchy-Green formula is the starting point of applying methods of complex analysis to classes of functions which are not necessarily holomorphic, but merely satisfy differentiability conditions in the real sense. We will use these methods in Section 5 of Chapter 15, when we will construct a complex structure on an oriented surface with a Riemann metric.

Recall Hölder's inequality (Theorem 8.1 of Chapter 5)

$$\left\| \int_{\mathbb{C}} f(\zeta) g(z - \zeta)\,ds\,dt \right\|_\infty \le \|f\|_p \|g\|_q \quad \text{for } \frac{1}{p} + \frac{1}{q} = 1. \tag{5.2.1}$$

We will focus here on functions defined on an open disk

$$D = \Omega(0, 1).$$

(We could, of course, equivalently work on any other disk.) Where needed, we may extend such functions to $\mathbb{C}$ by 0. Note also that for $z \in D$, using polar coordinates, we have

$$\phi(\zeta) = \frac{1}{\zeta - z} \in L^q(D) \quad \text{for every } q < 2.$$

Thus, by (5.2.1), for $p > 2$, we have a well-defined operator

$$P : L^p(D) \to L^\infty(\mathbb{C})$$

defined by

$$(P(f))(z) = -\frac{1}{\pi} \int_D \frac{f(\zeta)\,ds\,dt}{\zeta - z}.$$

We will also need another version of this operator, defined by the formula

$$(P_1(f))(z) = -\frac{1}{\pi} \int_{\mathbb{C}} f(\zeta) \left( \frac{1}{\zeta - z} - \frac{1}{\zeta} \right) ds\,dt.$$

Note that the function

$$\psi(\zeta) = \frac{1}{\zeta - z} - \frac{1}{\zeta}$$

is in $L^q(\mathbb{C} \setminus \Omega(0, 2|z|))$ for every $q > 1$, and thus $P_1(f)$ is defined for any function $f \in L^p(\mathbb{C})$, $p > 2$, and produces a (not necessarily bounded) complex function defined everywhere on $\mathbb{C}$.

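5.2.1 Lemma. Let $f$ be a continuous function on $\mathbb{C}$ with support in $D$ such that

$$|f(z) - f(t)| \le K|z - t|^\alpha \quad \text{for some } \alpha > 0 \tag{5.2.2}$$

for some constant $K$. Then $P(f)$ is a continuously differentiable function on $\mathbb{C}$. In fact, we have

$$\frac{\partial (P(f))(z)}{\partial z} = -\frac{1}{\pi} \int_D \frac{f(\zeta) - f(z)}{(\zeta - z)^2}\,ds\,dt, \qquad \frac{\partial (P(f))(z)}{\partial \bar z} = f(z). \tag{5.2.3}$$

If $f$ is a continuous function on $\mathbb{C}$ which is in $L^p(\mathbb{C})$, $p > 2$, and satisfies (5.2.2), then $P_1(f)$ is a continuously differentiable function on $\mathbb{C}$ and

$$\frac{\partial (P_1(f))(z)}{\partial z} = -\frac{1}{\pi} \int_{\mathbb{C}} \left( \frac{f(\zeta) - f(z)}{(\zeta - z)^2} - \frac{f(\zeta) - f(0)}{\zeta^2} \right) ds\,dt, \qquad \frac{\partial (P_1(f))(z)}{\partial \bar z} = f(z) - f(0). \tag{5.2.4}$$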

Proof. Let us first prove the statement for $P$. Using polar coordinates, one easily proves the identity

$$-\frac{1}{\pi} \int_D \frac{ds\,dt}{\zeta - z} = \bar z \quad \text{for } z \in D. \tag{1}$$

Using this formula for $z \in D$ and small $|\Delta z|$, we have

$$(P(f))(z + \Delta z) - (P(f))(z) = -\frac{1}{\pi} \int_D \frac{f(\zeta) - f(z)}{\zeta - z} \cdot \frac{(\Delta z)\,ds\,dt}{\zeta - z - \Delta z} + f(z)\,\overline{(\Delta z)}. \tag{2}$$

Dividing by $\Delta z$ and taking limits $\Delta z \to 0$ along the lines $y = ix$, $y = -ix$, we obtain the formulas (5.2.3). Note carefully that the function

$$\frac{f(\zeta) - f(z)}{|\zeta - z|^\alpha}$$

(with arbitrary value at $\zeta = z$) is, by assumption, bounded in $\zeta \in D$. After dividing by $\Delta z$, the limit behind the integral sign can be taken by the Lebesgue Dominated Convergence Theorem after we restrict the integral to $D \setminus \Omega(z, 2|\Delta z|)$. The remaining integral is bounded by a constant times

$$\int_{\Omega(z, 2|\Delta z|)} \frac{ds\,dt}{|\zeta - z|^{1-\alpha} |\zeta - z - \Delta z|}. \tag{3}$$

We must show that (3) converges to 0 with $\Delta z \to 0$. The integral (3) is certainly finite, and without loss of generality, $z = 0$. Now a substitution $\zeta = \eta\,\Delta z$ shows that (3) is proportional to $|\Delta z|^\alpha$, and hence tends to 0 with $\Delta z \to 0$, as needed.

Proving the continuity of

$$\frac{\partial (P(f))(z)}{\partial z}$$

is actually easier: we may use the Lebesgue Dominated Convergence Theorem directly on the entire range of integration after substituting $\eta = \zeta - z$. If $z$ is not in the support of $f$, (2) still remains valid since $f(z) = 0$ (formula (1) is not used in that case).

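The case of $P_1$ is analogous: Instead of (2), we have

$$(P_1(f))(z + \Delta z) - (P_1(f))(z) = -\frac{1}{\pi} \int_{\mathbb{C}} \left( \frac{f(\zeta) - f(z)}{\zeta - z} \cdot \frac{\Delta z}{\zeta - z - \Delta z} - \frac{f(\zeta) - f(0)}{\zeta} \cdot \frac{\Delta z}{\zeta - \Delta z} \right) ds\,dt + (f(z) - f(0))\,\overline{(\Delta z)}.$$

The Lebesgue dominated convergence argument can then be applied on the set

$$\mathbb{C} \setminus (\Omega(z, 2|\Delta z|) \cup \Omega(0, 2|\Delta z|)),$$

and we use the estimate (5.2.3) at the point $z$ on $\Omega(z, 2|\Delta z|)$ and at the point $z = 0$ on $\Omega(0, 2|\Delta z|)$. The rest of the argument is the same. □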
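The identity (1) used at the start of the proof is stated without computation. One possible way to verify it is sketched below in polar coordinates centred at $z$; the auxiliary functions $c(\theta)$ and $R(\theta)$ are notation introduced only for this sketch and do not appear in the original text.

```latex
% Sketch: polar coordinates centred at z, \zeta = z + r e^{i\theta}, with |z| < 1.
% R(\theta) is the distance from z to the unit circle in the direction \theta.
\begin{align*}
\frac{ds\,dt}{\zeta - z}
  &= \frac{r\,dr\,d\theta}{r e^{i\theta}} = e^{-i\theta}\,dr\,d\theta, \\
R(\theta)
  &= -c(\theta) + \sqrt{c(\theta)^2 + 1 - |z|^2},
  \qquad c(\theta) = \operatorname{Re}\bigl(z e^{-i\theta}\bigr)
     = \tfrac{1}{2}\bigl(z e^{-i\theta} + \bar z e^{i\theta}\bigr), \\
-\frac{1}{\pi} \int_D \frac{ds\,dt}{\zeta - z}
  &= -\frac{1}{\pi} \int_0^{2\pi} e^{-i\theta} R(\theta)\,d\theta
   = \frac{1}{\pi} \int_0^{2\pi} e^{-i\theta} c(\theta)\,d\theta
   = \frac{1}{\pi} \cdot \pi \bar z = \bar z.
\end{align*}
```

The square root term drops out because it is unchanged under $\theta \mapsto \theta + \pi$ (it depends only on $c(\theta)^2$), while $e^{-i\theta}$ changes sign, so its contribution integrates to zero over a full period.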

As an application, we get the following extension of Liouville's Theorem, which will be useful in Section 5 of Chapter 15.

5.3 Theorem. Let $f$ be a function on $\mathbb{C}$ with continuous first (real) partial derivatives. Assume that

$$\lim_{z \to \infty} f(z) = 0$$

and that there exists a function $A(z)$ with continuous first (real) partial derivatives and compact support such that

$$\frac{\partial f}{\partial \bar z} = Af.$$

Then $f(z) = 0$ for all $z \in \mathbb{C}$.

Proof. Assume without loss of generality that the support of $A(z)$ is contained in $D$. Put $B = P(A)$ and $F(z) = f(z) e^{-B(z)}$. Using Lemma 5.2.1, we compute

$$\frac{\partial F}{\partial \bar z} = e^{-B(z)} \left( \frac{\partial f(z)}{\partial \bar z} - f(z) A(z) \right) = 0.$$

Thus, $F$ is a holomorphic function on $\mathbb{C}$, and since it tends to 0 at $\infty$, it is zero by Liouville's Theorem 5.1 of Chapter 10. □

Finally, we will prove two easy inequalities involving the operator $P$, which will also be useful in Section 5 of Chapter 15:

5.3.1 Lemma. (1) If $f$ is a continuously differentiable function on $\mathbb{C}$ with support in $D$, then

$$|(P(f))(z)| \le \frac{8 \|f\|_\infty}{1 + |z|}.$$

(2) For every $p > 2$ there exists a constant $C_p$ such that if $f \in L^p(\mathbb{C})$, then

$$|(P_1(f))(z_1) - (P_1(f))(z_2)| \le C_p \|f\|_p |z_1 - z_2|^{1 - 2/p}.$$

Proof. For (1), clearly it suffices to prove that

$$\int_D \frac{ds\,dt}{|\zeta - z|} \le \frac{8\pi}{1 + |z|}. \tag{*}$$

First of all, by polar coordinates, we clearly have

$$\int_{\Omega(0, r)} \frac{dx\,dy}{\sqrt{x^2 + y^2}} = 2\pi r.$$

Thus, for $|z| \le 1$, we may use $r = 2$ to show that the left-hand side of (*) is less than or equal to $4\pi$. For $|z| > 1$, the idea is to integrate $1/|\zeta - z|$ over the intersection of

$$\Omega(z, 1 + |z|) \setminus \Omega(z, |z| - 1) \tag{**}$$

with the smallest angle with center $z$ which contains $D$. As already remarked, the integral of $1/|\zeta - z|$ over (**) is $4\pi$, so the integral over the intersection of (**) with an angle of size $\alpha$ will be

$$2\alpha.$$

The angle in question has size

$$2\arcsin(1/|z|) \le \frac{\pi}{|z|} \le \frac{2\pi}{1 + |z|},$$

which is good enough.

To prove (2), use Hölder's inequality with $1/p + 1/q = 1$ (put $\eta = \zeta/|z|$, $u = s/|z|$, $v = t/|z|$):

$$\left| \int_{\mathbb{C}} \left( \frac{f(\zeta)}{\zeta - z} - \frac{f(\zeta)}{\zeta} \right) ds\,dt \right| \le |z| \cdot \|f\|_p \cdot \left( \int_{\mathbb{C}} \frac{ds\,dt}{|(\zeta - z)\zeta|^q} \right)^{1/q} = \|f\|_p \cdot \left( \int_{\mathbb{C}} \frac{du\,dv}{|(\eta - 1)\eta|^q} \right)^{1/q} |z|^{(2/q) - 1}.$$

Applying this to the function $f(z + z_2)$ at the point $z_1 - z_2$ gives the claimed inequality. □
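The bound (*) in the proof of part (1) is far from sharp, which a quick numerical experiment makes visible. The sketch below is an illustration only: it uses the observation that in polar coordinates centred at $z$ the integrand $1/|\zeta - z|$ cancels against the Jacobian $r$, so the integral over $D$ equals the integral over all directions $\theta$ of the length of the part of the ray from $z$ that lies in $D$. The function names `ray_length` and `disk_integral` are hypothetical.

```python
import cmath
import math


def ray_length(z, theta):
    """Length of the intersection of the unit disk with the ray from z in direction theta."""
    c = (z * cmath.exp(-1j * theta)).real
    disc = c * c + 1 - abs(z) ** 2
    if disc <= 0:
        return 0.0  # the line through z in this direction misses the disk
    s = math.sqrt(disc)
    r1, r2 = -c - s, -c + s  # signed distances to the two intersection points
    return max(r2, 0.0) - max(r1, 0.0)


def disk_integral(z, n_theta=20000):
    """Approximate the integral of 1/|zeta - z| over the unit disk (midpoint rule in theta)."""
    h = 2 * math.pi / n_theta
    return sum(ray_length(z, (j + 0.5) * h) for j in range(n_theta)) * h


if __name__ == "__main__":
    for z in (0, 0.5, 0.99, 1, 1.5, 3, 10j):
        bound = 8 * math.pi / (1 + abs(z))
        print(z, disk_integral(z), "<=", bound)
```

At $z = 0$ the integral is $2\pi$, in agreement with the polar coordinate formula quoted in the proof with $r = 1$.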

6 Exercises

(1) Prove that a Möbius transformation maps an open disk, an open half-plane or the complement of a closed disk onto an open disk, open half-plane or the complement of a closed disk. Prove furthermore that for any two subsets of $\mathbb{C} \cup \{\infty\}$ of any two of the above three types, there exists a Möbius transformation mapping one onto the other.

(2) Let $w_0 \in \Omega(0, 1)$. Consider the Möbius transformation

$$f(z) = \frac{z - w_0}{1 - \overline{w_0} z}.$$

Prove that this gives a holomorphic automorphism of $\Omega(0, 1)$ which maps $w_0$ to 0. [Hint: Consider the effect of this Möbius transformation on $|z| = 1$.]

(3) Construct a non-constant (non-injective) holomorphic function $f$ on $\mathbb{C}$ which is not onto.

(4) Let $a < b \in \mathbb{R}$ and let $f, g : [a, b] \to \mathbb{R}$ be continuous real functions which are continuously differentiable in $(a, b)$ and such that for $a < x < b$, $f(x) < g(x)$. Prove that the set

$$\{x + iy \in \mathbb{C} \mid a < x < b,\ f(x) < y < g(x)\}$$

is simply connected.

(5) Find an elementary function which maps the set $\{z \in \Omega(0, 1) \mid \operatorname{Re}(z) > 0,\ \operatorname{Im}(z) > 0\}$ bijectively holomorphically onto $\Omega(0, 1)$. [Hint: Find, in this order, holomorphic isomorphisms of the set described onto an open half-disk, an open quadrant, an open half-plane, $\Omega(0, 1)$.]

(6) Show that if the polygon $P$ is a triangle, then in Theorem 2.4, the points $w_1, w_2, w_3$ can be chosen to be any three points on the unit circle which occur in this order when the circle is oriented counter-clockwise. [Hint: Using the maps of Exercise (2) and rotations, show that there is a holomorphic automorphism of $\Omega(0, 1)$ which extends holomorphically to an open set containing $\overline{\Omega(0, 1)}$, and maps a given choice of points $w_1, w_2, w_3$ to any other such given choice.]

(7) Determine a choice of the points $w_i$ when $P$ is a regular $k$-gon.


(8) Using the Schwarz-Christoffel formula, write down an explicit formula (with one free parameter) for a function $f$ mapping bijectively holomorphically the upper half-plane onto a rectangle. Such formulas are called elliptic integrals. Using complex conjugation (similarly as in Lemma 2.3), prove that the inverse function $g$ extends to a meromorphic function on $\mathbb{C}$, which is doubly periodic, with periods equal to the sides of the rectangle (such functions are called elliptic functions). For information on elliptic functions, the reader may look at [11].

(9) Prove that every holomorphic function on a compact Riemann manifold is constant.

(10) Prove that the Möbius transformations are the only holomorphic automorphisms of $\mathbb{C}P^1$. [Hint: Use Proposition 1.1.1.]

(11) Prove that non-constant meromorphic functions on $\mathbb{C}P^1$ are precisely rational functions, i.e. functions of the form $p(z)/q(z)$ where $p(z)$, $q(z)$ are polynomials, $q(z)$ not identically zero. [Hint: Multiply (resp. divide) such a function $f(z)$ by the product of all factors $(z - z_i)^{k_i}$ where $z_i$ is a pole (resp. zero) of order $k_i$ in $\mathbb{C}$ (infinitely many zeroes or poles would mean $f$ is a constant 0 or $\infty$ by the Uniqueness Theorem 4.4 of Chapter 10). Then we may assume without loss of generality that the restriction of $f$ to $\mathbb{C}$ has neither zeroes nor poles. Now if $f(\infty) \ne \infty$, then $f$ is bounded on $\mathbb{C}$, while if $f(\infty) = \infty$, then $1/f(z)$ is bounded on $\mathbb{C}$. In either case, $f$ is constant by Liouville's Theorem 5.1 of Chapter 10.]

(12) Prove in detail that for $U \subseteq \mathbb{C}$ an open set, $F(z)$ is a primitive function for the 1-form $f(z)\,dz$ with $f$ holomorphic if and only if $F(z)$ is a primitive function of $f(z)$.

(13) Prove in detail that the definitions of the operators $\partial$, $\bar\partial$ on differential forms on a Riemann surface are invariant under holomorphic change of coordinates.

(14) Prove that the function $e^z$, considered as a holomorphic map $\mathbb{C} \to \mathbb{C} \setminus \{0\}$, is a covering and that, in fact, it is the universal covering of $\mathbb{C} \setminus \{0\}$.

(15) From Exercise (14), construct an isomorphism

$$\pi_1(\mathbb{C} \setminus \{0\}, x_0) \to \mathbb{Z}$$

for any base point $x_0$.

(16) Prove that $\pi_1(\mathbb{C} \setminus \{0, 1\}, x_0)$ is not abelian for any base point $x_0$. [Hint: Use Example 4.2.5.]

(17) Prove that the Riemann surface $\mathbb{C}P^1$ is simply connected. [Hint: Use smooth partition of unity to prove that every path is homotopic to a piecewise continuously differentiable parametrized curve. For any two piecewise continuously differentiable curves, there exists a point which is in neither of their images.]

(18) Prove that a connected Riemann surface $\Sigma$ (or, for that matter, a connected manifold, see Comment 4.4) with a point $x_0 \in \Sigma$ is simply connected if and only if $\pi_1(\Sigma, x_0)$ is the trivial group (i.e. a group with a single element).


(19) Recall the concept of a Lie group from Chapter 12, Exercise (6). Prove that the fundamental group of a Lie group is commutative (see Comment 4.4). [Hint: the concatenation of paths is homotopic to the point-wise product, using the group operation.]

(20) Prove that if $\pi : \Gamma \to G$ is a covering and $G$ is a Lie group (cf. Comment 4.4) with both $G$ and $\Gamma$ connected, then $\Gamma$ can be given a structure of a Lie group such that $\pi$ is a homomorphism of groups.

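(21) Define for $f \in L^p(D)$, $p > 2$,

$$(Q(f))(z) = -\frac{1}{\pi} \int_D \frac{f(\zeta)\,ds\,dt}{\overline{\zeta} - \overline{z}}.$$

Assuming (5.2.2), calculate

$$\frac{\partial (Q(f))(z)}{\partial z}, \qquad \frac{\partial (Q(f))(z)}{\partial \bar z}.$$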

14 Calculus of Variations and the Geodesic Equation

The aim of this chapter is to give a glimpse of the main principle of the calculus of variations which, in its most basic problem, concerns minimizing certain types of linear functions on the space of continuously differentiable curves in $\mathbb{R}^n$ with fixed beginning point and end point. For further study in this subject, we recommend [7]. We derive the Euler-Lagrange equation which can be used to axiomatize a large part of classical mechanics. We then consider in more detail the possibly most fundamental example of the calculus of variations, namely the problem of finding the shortest curve connecting two points in an open set in $\mathbb{R}^n$ with an arbitrary given (smoothly varying) inner product on its tangent space. The Euler-Lagrange equation in this case is known as the geodesic equation. The smoothly varying inner product captures the idea of curved space. Thus, solving the geodesic equation here goes a long way toward motivating the basic techniques of Riemannian geometry, which we will develop in the next chapter.

1 The basic problem of the calculus of variations, and the Euler-Lagrange equations

1.1

For the purposes of this chapter, define a continuously differentiable function

$$y : \langle a, b \rangle \to \mathbb{R}^n \tag{*}$$

as a function with the property that the function defined as the derivative of $y$ on $(a, b)$ and as the respective one-sided derivatives at $a$ and $b$ is everywhere defined and continuous on $\langle a, b \rangle$.

Now consider the space $V = V_{a,b,p,q}$ of all continuously differentiable functions (*) such that

$$y(a) = p, \quad \text{and} \quad y(b) = q$$

for some fixed values $p, q \in \mathbb{R}^n$. Let

$$L = L(t, x_1, \dots, x_n, v_1, \dots, v_n) : \langle a, b \rangle \times \mathbb{R}^{2n} \to \mathbb{R}$$

be a function with continuous first partial derivatives (again, take one-sided derivatives at $a$, $b$ when applicable).

The most basic problem of the calculus of variations is looking for the extremes of the function (we use the term functional)

$$S : V \to \mathbb{R}$$

given by

$$S(y) = \int_a^b L(t, y(t), y'(t))\,dt.$$

Note that $S$ is continuous when we consider the metric on $V$ given by the norm

$$\|y\| = \sup_{t \in \langle a, b \rangle} \|y(t)\| + \sup_{t \in \langle a, b \rangle} \|y'(t)\|.$$

(Here we may choose any of the usual norms on $\mathbb{R}^n$, for example the maximum one.) However, in the kind of formal investigation we are going to do, even this will play only a peripheral role.

Lemma. Let $f : \langle a, b \rangle \to \mathbb{R}$ be a continuous function such that

$$\int_a^b f(t) h(t)\,dt = 0$$

for all continuously differentiable functions $h$ such that $h(a) = h(b) = 0$. Then $f \equiv 0$.

Proof. Suppose $f$ is not identically zero. Then $f(t_0) \ne 0$ for some $t_0 \in (a, b)$. Suppose, without loss of generality, $f(t_0) > 0$. Since $f$ is continuous, there exists an $\varepsilon > 0$ such that $f(t) > 0$ for all $t \in (t_0 - \varepsilon, t_0 + \varepsilon)$. Now let $h$ be a continuously differentiable function which is positive on some non-empty interval contained in $(t_0 - \varepsilon, t_0 + \varepsilon)$, and 0 elsewhere (we may use the "baby version" of smooth partition of unity 5.1 of Chapter 8). Then

$$\int_a^b f(t) h(t)\,dt > 0,$$

contradicting the assumption. □

1.2 Theorem. (The Euler-Lagrange equations) Suppose the functional $S : V \to \mathbb{R}$ has an extreme at a function $y \in V$. Then the function $y$ satisfies the system of differential equations

$$\left. \frac{\partial L}{\partial x_i} \right|_{x = y,\, v = y'} = \frac{d}{dt} \left( \left. \frac{\partial L}{\partial v_i} \right|_{x = y,\, v = y'} \right).$$

Comment: It is often customary to write the equations in the form

$$\frac{\partial L}{\partial y_i} = \frac{d}{dt} \left( \frac{\partial L}{\partial y_i'} \right),$$

but there is some danger in such a notation, since in the partial derivatives, we must treat $y_i$, $y_i'$ as formal symbols plugged in for the arguments $x_i$, $v_i$ of $L$, while the derivative by $t$ is the actual total derivative by the independent variable $t$.

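Proof of the theorem: Choose any continuously differentiable function $h : \langle a, b \rangle \to \mathbb{R}$, such that $h(a) = h(b) = 0$. Consider the real function of $n$ variables $u = (u_1, \dots, u_n)$

$$\Phi_h(u_1, \dots, u_n) = \int_a^b L(t, y(t) + u h(t), y'(t) + u h'(t))\,dt.$$

If the functional $S$ has an extreme at $y$, then $\Phi_h$ has an extreme at $o$, and since it has continuous partial derivatives by the chain rule everywhere, we must have

$$\frac{\partial \Phi_h(o)}{\partial u_i} = 0.$$

Denoting by $e_i$ the $i$'th standard basis vector of $\mathbb{R}^n$, compute

$$\begin{aligned}
\frac{1}{u} \left( \Phi_h(u e_i) - \Phi_h(o) \right)
&= \frac{1}{u} \int_a^b \left( L(t, y(t) + u e_i h(t), y'(t) + u e_i h'(t)) - L(t, y(t), y'(t)) \right) dt \\
&= \frac{1}{u} \left( \int_a^b \frac{\partial L(t, y(t) + \theta u e_i h(t), y'(t) + u e_i h'(t))}{\partial x_i}\, u h(t)\,dt \right. \\
&\qquad \left. {} + \int_a^b \frac{\partial L(t, y(t), y'(t) + \vartheta u e_i h'(t))}{\partial v_i}\, u h'(t)\,dt \right)
\end{aligned}$$

for some $0 < \theta, \vartheta < 1$. On the right hand side, we used the Mean Value Theorem 3.3 of Chapter 3 twice. Note that the $u$ factor cancels out, and letting $u \to 0$, using $h(a) = h(b) = 0$ and integration by parts in the second integral, we get

$$\dots = \int_a^b \left( \frac{\partial L(\dots)}{\partial x_i} - \frac{d}{dt} \left( \frac{\partial L(\dots)}{\partial v_i} \right) \right) h(t)\,dt.$$

Now use Lemma 1.1. □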
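The central step of the proof, that the derivative of $u \mapsto \Phi_h(u)$ vanishes at 0 when $y$ is an extreme, can be checked numerically in a simple case. The sketch below is only an illustration under assumptions not made in the text: $n = 1$, the sample Lagrangian $L(t, x, v) = v^2/2 - x^2/2$ (whose Euler-Lagrange equation is $y'' = -y$), the interval $\langle 0, \pi/2 \rangle$, and the test variation $h(t) = \sin 2t$. The function names `action` and `first_variation` are hypothetical.

```python
import math


def action(L, y, dy, a, b, n=2000):
    """Midpoint-rule approximation of S(y), the integral of L(t, y(t), y'(t)) over <a, b>."""
    step = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * step
        total += L(t, y(t), dy(t))
    return total * step


def first_variation(L, y, dy, h, dh, a, b, u=1e-4):
    """Central difference approximation of d/du S(y + u h) at u = 0."""
    plus = action(L, lambda t: y(t) + u * h(t), lambda t: dy(t) + u * dh(t), a, b)
    minus = action(L, lambda t: y(t) - u * h(t), lambda t: dy(t) - u * dh(t), a, b)
    return (plus - minus) / (2 * u)


def sample_lagrangian(t, x, v):
    """Hypothetical example: L = v^2/2 - x^2/2."""
    return 0.5 * v * v - 0.5 * x * x


if __name__ == "__main__":
    a, b = 0.0, math.pi / 2
    h = lambda t: math.sin(2 * t)       # vanishes at both end points
    dh = lambda t: 2 * math.cos(2 * t)
    # y = sin t solves y'' = -y with y(a) = 0, y(b) = 1: a critical function
    print(first_variation(sample_lagrangian, math.sin, math.cos, h, dh, a, b))
    # the straight line with the same end points is not critical
    line = lambda t: 2 * t / math.pi
    dline = lambda t: 2 / math.pi
    print(first_variation(sample_lagrangian, line, dline, h, dh, a, b))
```

For the straight line the first variation in the direction $h$ works out to $-1/2$ by direct integration, so the line cannot be an extreme of this functional.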


1.3 Comment

The main idea of the proof resembles the idea of the total differential of a function of finitely many variables (see Exercise (1) below for a more concrete statement). It may seem we got something for free: how come we can find extremes of functionals on a space of continuously differentiable functions as easily as extremes of functions of finitely many variables? There is, however, one major catch: with the space $V$ not being compact (not even locally), there is no guarantee an extreme of the functional $S$ on $V$ exists at all! Therefore, Theorem 1.2 is not nearly as strong as it may seem, giving only candidates for a possible extreme. Similarly as in the case of functions of finitely many variables, we call these candidates critical functions. Highly non-trivial methods are generally needed to show that a given critical function is in fact an extreme (we will see an example of that below).

2 A few special cases and examples

Simplifications in the form of the Euler-Lagrange equations occur in certain special cases.

2.1 When L does not depend on x

In this case, the Euler-Lagrange equations become

$$\frac{\partial L(t, y(t), y'(t))}{\partial v_i} = K_i$$

where $K_1, \dots, K_n \in \mathbb{R}$ are constants.

Example: Let $n = 1$. Let us verify that the shortest graph of a function connecting two points $(a, p)$, $(b, q)$ in $\mathbb{R}^2$, $a < b$, is indeed a straight line. The formula for the arc length of a graph of a function $y = y(t)$ is

$$S(y) = \int_a^b \sqrt{1 + y'(t)^2}\,dt,$$

and hence

$$L(t, x, v) = \sqrt{1 + v^2}.$$

Therefore, we have

$$\frac{\partial L}{\partial v} = \frac{v}{\sqrt{1 + v^2}},$$

and we get a differential equation

$$\frac{y'}{\sqrt{1 + y'^2}} = K,$$

or

$$(1 - K^2) y'^2 = K^2.$$

Thus we get a critical function if and only if $y'$ is constant.
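That the critical function found here really is the shortest graph can be illustrated by comparing arc lengths numerically. The sketch below is a sanity check, not a proof: it perturbs the straight line by $u \sin(\pi (t - a)/(b - a))$, which keeps both end points fixed, and the names `arc_length` and `perturbed_slope` as well as the sample end points are choices made for this illustration.

```python
import math


def arc_length(dy, a, b, n=4000):
    """Midpoint-rule approximation of the integral of sqrt(1 + y'(t)^2) over <a, b>."""
    step = (b - a) / n
    return sum(math.sqrt(1 + dy(a + (k + 0.5) * step) ** 2) for k in range(n)) * step


def perturbed_slope(a, b, p, q, u):
    """Derivative of the line from (a, p) to (b, q) plus u * sin(pi (t - a)/(b - a))."""
    slope = (q - p) / (b - a)
    w = math.pi / (b - a)
    return lambda t: slope + u * w * math.cos(w * (t - a))


if __name__ == "__main__":
    a, b, p, q = 0.0, 2.0, 1.0, 4.0  # hypothetical end points (a, p), (b, q)
    for u in (-0.3, -0.1, 0.0, 0.1, 0.3):
        print(u, arc_length(perturbed_slope(a, b, p, q, u), a, b))
```

Only $u = 0$ reproduces the Euclidean distance between the two end points; every other value of $u$ in the sample gives a strictly longer graph.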

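2.2 When L does not depend on t

Then we have

$$\frac{d}{dt} L(y(t), y'(t)) = \sum_{i=1}^n \left( \frac{\partial L(y(t), y'(t))}{\partial x_i} \cdot y_i' + \frac{\partial L(y(t), y'(t))}{\partial v_i} \cdot y_i'' \right),$$

which, using the Euler-Lagrange equation, is equal to

$$\sum_{i=1}^n \left( \frac{d}{dt} \left( \frac{\partial L(y(t), y'(t))}{\partial v_i} \right) \cdot y_i' + \frac{\partial L(y(t), y'(t))}{\partial v_i} \cdot y_i'' \right) = \frac{d}{dt} \left( \sum_{i=1}^n \frac{\partial L(y(t), y'(t))}{\partial v_i} \cdot y_i' \right).$$

Thus, we obtain

$$\frac{d}{dt} \left( L - \sum_{i=1}^n \frac{\partial L}{\partial v_i} y_i' \right) = 0,$$

or in other words

$$\sum_{i=1}^n \frac{\partial L}{\partial v_i} y_i' - L = K \tag{2.2.1}$$

is a "conserved quantity" in $t$. This expression is called the Hamiltonian. When $n = 1$, the equation (2.2.1) can be used directly instead of the Euler-Lagrange equation to find critical functions.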

Example: The brachistochrone problem. Design a shape of a roller-coaster track in the $tx$ plane such that the car starting at the point $(0, s)$ reaches the point $(r, 0)$ in the shortest possible time. (Gravity is assumed to pull in the negative direction of the $x$ axis. Caution: here $x$ is the vertical coordinate, and $t$ is the horizontal coordinate, not time!)

We may choose units such that the mass of the car is 1, as is the acceleration of gravity. Then the potential energy at the point $(0, s)$ is $s$, and hence by conservation of energy, at a point $(t, x)$, the kinetic energy is $(s - x)$. Thus, if the component of the velocity in the $t$ direction is $w$, we have

$$\frac{1}{2} w^2 (1 + x'^2) = s - x,$$

and hence

$$w = \sqrt{\frac{2(s - x)}{1 + x'^2}},$$

or

$$L(t, x, v) = 1/w = \sqrt{\frac{1 + v^2}{2(s - x)}}.$$

Thus, (2.2.1) gives

$$\sqrt{\frac{1 + x'^2}{2(s - x)}} - x' \cdot \frac{x'}{\sqrt{1 + x'^2} \sqrt{2(s - x)}} = K,$$

which yields

$$1 = K \sqrt{2(s - x)(1 + x'^2)}$$

or

$$1 = 2K^2 (s - x)(1 + x'^2).$$

It is not difficult to verify that the solution can be expressed parametrically as

$$t(\theta) = A(\sin(\theta) + \theta) + B, \qquad x(\theta) = s - A(1 + \cos(\theta)) \tag{2.2.2}$$

for suitable constants $A$, $B$. This curve is called a cycloid.
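The claim that a cycloid solves the conserved-quantity equation can be verified numerically. The sketch below assumes the parametrization $t(\theta) = A(\theta + \sin\theta) + B$, $x(\theta) = s - A(1 + \cos\theta)$, with $x$ the vertical coordinate, and checks that $(s - x)(1 + x'^2)$ stays equal to $2A$ along the curve, which is the equation $1 = 2K^2(s - x)(1 + x'^2)$ with $4AK^2 = 1$. The function names and the sample constants are hypothetical, and the slope $x' = dx/dt$ is approximated by a central difference in $\theta$.

```python
import math


def cycloid(theta, A, B, s):
    """Point (t, x) on the assumed cycloid; x is the vertical coordinate."""
    t = A * (theta + math.sin(theta)) + B
    x = s - A * (1 + math.cos(theta))
    return t, x


def conserved_quantity(theta, A, B, s, d=1e-6):
    """(s - x) * (1 + x'^2) at the parameter value theta, with x' = dx/dt."""
    t1, x1 = cycloid(theta - d, A, B, s)
    t2, x2 = cycloid(theta + d, A, B, s)
    slope = (x2 - x1) / (t2 - t1)
    _, x = cycloid(theta, A, B, s)
    return (s - x) * (1 + slope * slope)


if __name__ == "__main__":
    A, B, s = 0.7, 0.2, 1.0  # hypothetical sample constants
    for theta in (-2.5, -1.0, 0.0, 1.0, 2.5):
        print(theta, conserved_quantity(theta, A, B, s), "vs", 2 * A)
```

The parameter values $\theta = \pm\pi$ are avoided in the check because $dt/d\theta$ vanishes there and the tangent to the curve is vertical.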

2.3 Lagrangian mechanics

In physics, the motion of a system of finitely many particles in $\mathbb{R}^3$ can be described using the Euler-Lagrange equations. Consider all the coordinates of all the particles together, so we have a variation problem in $\mathbb{R}^{3n}$, where the coordinates of the $i$-th particle are coordinates number $3i - 2$, $3i - 1$, $3i$. The basic principle for writing down the Lagrangian is

\[
L = \text{kinetic energy} - \text{potential energy}. \tag{2.3.1}
\]

In the basic setup of Newtonian mechanics, the particles have masses $m_i$, and the kinetic energy is

\[
\sum_{i=1}^{n} \frac{1}{2} m_i \left( v_{3i-2}^2 + v_{3i-1}^2 + v_{3i}^2 \right). \tag{2.3.2}
\]

A kinetic energy formula of this form, i.e. essentially the form $\sum \frac{1}{2} m v^2$, is referred to as a standard kinetic term.

The potential energy term is more variable. Assuming the particles act on one another by gravity, Newton’s law of gravity gives potential energy

\[
-\sum_{i<j} \frac{G m_i m_j}{\sqrt{\sum_{k=0}^{2} (x_{3i-k} - x_{3j-k})^2}} \tag{2.3.3}
\]

where $G$ is Newton’s universal constant of gravity. We may generalize this further by including a conservative force field acting on each particle, $F_i = \operatorname{grad}(\Phi_i)$, $\Phi_i : \mathbb{R}^3 \to \mathbb{R}$, $i = 1, \dots, n$, in which case we add to the potential energy the term

\[
-\sum_{i=1}^{n} \Phi_i(x_{3i-2}, x_{3i-1}, x_{3i}). \tag{2.3.4}
\]

According to the recipe (2.3.1), the (original) Lagrangian is obtained by taking the standard kinetic term (2.3.2), and subtracting the potential terms (2.3.3), (2.3.4), thus getting

\[
L(x_1, \dots, x_{3n}, v_1, \dots, v_{3n}) = \sum_{i=1}^{n} \frac{1}{2} m_i \left( v_{3i-2}^2 + v_{3i-1}^2 + v_{3i}^2 \right) + \sum_{i<j} \frac{G m_i m_j}{\sqrt{\sum_{k=0}^{2} (x_{3i-k} - x_{3j-k})^2}} + \sum_{i=1}^{n} \Phi_i(x_{3i-2}, x_{3i-1}, x_{3i}).
\]


Lagrange’s principle states that the equation of motion is given by the critical function for this Lagrangian on a time interval $\langle a, b \rangle$ with given positions at the times $a$ and $b$, i.e. that it is subject to the Euler-Lagrange equations 1.2. We will not prove this here. In fact, a mathematical “proof” in this setting is not to be expected: we are referring to a system of physical particles. What could be proved, however, is that Lagrange’s equations are equivalent to Newton’s.

Observe that in the presence of the standard kinetic term (2.3.2), the Hamiltonian (2.2.1) of the Lagrangian (2.3.1) has the physical meaning of the total energy of the system, which, indeed, should be conserved by the law of conservation of energy.
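The identification of the Hamiltonian with the total energy can be illustrated on a single particle in one dimension. In the sketch below, the mass $m = 2$ and the potential $V(x) = x^4$ are hypothetical choices made only for the example; the derivative $\partial L / \partial v$ is approximated by a central difference.

```python
def lagrangian(x, v, m=2.0):
    # Hypothetical one-particle example: standard kinetic term minus potential V(x) = x**4.
    return 0.5 * m * v * v - x ** 4

def hamiltonian(x, v, eps=1e-6):
    # (dL/dv) * v - L, with dL/dv approximated by a central finite difference.
    dL_dv = (lagrangian(x, v + eps) - lagrangian(x, v - eps)) / (2 * eps)
    return dL_dv * v - lagrangian(x, v)

def total_energy(x, v, m=2.0):
    # Kinetic energy plus potential energy.
    return 0.5 * m * v * v + x ** 4

samples = [(0.3, 1.2), (-1.0, 0.5), (2.0, -3.0)]
gaps = [abs(hamiltonian(x, v) - total_energy(x, v)) for x, v in samples]
```

The entries of `gaps` should be at the level of finite-difference error, reflecting that $v \cdot \partial L/\partial v = m v^2$ is twice the kinetic energy.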

The Lagrangian mechanics setup may seem like nothing new, since it only recovers Newton’s equations, and, in fact, is even less general, since it requires a conservative force field. However, the Lagrangian turns out to be extremely beneficial for generalizations. In fact, most of modern physics uses the Lagrangian formalism.

3 The geodesic equation

Let us return to mathematics. Perhaps the single most important example of the Euler-Lagrange equation is the geodesic equation in a Riemann metric (although it should be pointed out that the equation does have a physical meaning, describing in fact the motion of a light ray in a gravity field in Einstein’s general relativity).

3.1 A Riemann metric on an open subset of $\mathbb{R}^n$

Let $U \subseteq \mathbb{R}^n$ be an open subset. Let $g_{ij} : U \to \mathbb{R}$, $i, j = 1, \dots, n$, be smooth functions such that for each $x \in U$, $g(x) = (g_{ij}(x))_{i,j}$ is a positive definite symmetric matrix. We will interpret $g(x)$ as the associated matrix of a (real) inner product

\[
\langle u, v \rangle_g
\]

of tangent vectors at the point $x \in U$, which will be called a Riemann metric on $U$ (see 7.7 of Appendix A). The key point is that the tangent space of $U$ is canonically identified with $\mathbb{R}^n$ via the coordinate map which is simply the embedding $U \subseteq \mathbb{R}^n$. As a generalization of formula (**) in Subsection 2.2 of Chapter 8, we will, then, define the length with respect to the Riemann metric $g$ of a piecewise continuously differentiable curve represented by a map

\[
\gamma : \langle a, b \rangle \to U
\]

by the formula

\[
s_g(\gamma) = \int_a^b \sqrt{\langle \gamma'(t), \gamma'(t) \rangle_g}\, dt. \tag{3.1.1}
\]

(See Exercise (8).)


We will be interested in the variational problem of minimizing the functional (3.1.1) over the set of continuously differentiable curves with given boundary points $\gamma(a) = A$, $\gamma(b) = B \in U$. Before getting into this seriously, we will introduce a notational convention which is helpful when figuring out numerical examples in complicated formulas with many indices: often, we are making multiple sums over indices, for example, $i = 1, \dots, n$ over terms where the index $i$ occurs, and is equal, in two factors entering the formula. In this, and the following chapter, we will make the convention that

When an index $i$ appears in more than one factor of a product, then $i$ will occur in precisely two such factors, once as a subscript and once as a superscript. This notation shall mean summation over all permissible values of $i$, which must be the same in both factors in question; the summation symbol $\sum_i$ will be omitted. (3.1.2)

Thus, using this convention, the components of the function $\gamma$ will be written with superscripts, $\gamma^i$, $i = 1, \dots, n$, and the formula (3.1.1) above will assume the form

\[
s_g(\gamma) = \int_a^b \sqrt{g_{ij} (\gamma^i)' (\gamma^j)'} = \int_a^b \sqrt{g_{ij}(\gamma(t)) (\gamma^i)'(t) (\gamma^j)'(t)}\, dt. \tag{3.1.3}
\]

The convention (3.1.2) may seem unreasonably restrictive, but turns out adequate in the types of formulas we will encounter. It is known as (one version of) the Einstein convention. When two quantities share an index as a subscript in one and a superscript in the other (and summation over all permissible values is to be performed), we call the quantities coupled. We can see already in (3.1.3) in comparison with (3.1.1) that the Einstein convention can make formulas more explicit. In the next chapter, when talking about the more general context of manifolds, we will talk about tensors, and will give the Einstein convention a deeper interpretation.
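For readers who prefer to see the summation spelled out, the following minimal sketch evaluates $g_{ij} v^i v^j$ as the explicit double sum that the Einstein convention abbreviates. The matrix and vector are sample values chosen for illustration, not taken from the text.

```python
def metric_quadratic_form(g, v):
    # g_ij v^i v^j: the repeated indices i and j are both summed over.
    n = len(v)
    return sum(g[i][j] * v[i] * v[j] for i in range(n) for j in range(n))

g = [[2.0, 1.0], [1.0, 3.0]]   # a sample symmetric positive definite matrix (assumption)
v = [1.0, -2.0]
value = metric_quadratic_form(g, v)
```

Here the four terms of the double sum are $2 \cdot 1$, $1 \cdot (-2)$, $1 \cdot (-2)$ and $3 \cdot 4$.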

3.2 A trick: modifying the functional

We see immediately that the Euler-Lagrange equation for the functional (3.1.3) will be a pain because of the square root in the Lagrangian. This problem has a surprisingly simple solution, which, at first, cannot possibly seem right: simply omit the square root! Thus, we will consider the functional

\[
S_g(\gamma) = \int_a^b g_{ij} (\gamma^i)' (\gamma^j)'. \tag{3.2.1}
\]


To justify this, recall that by Lemma 8.6.1 of Chapter 5,

\[
S_g(\gamma) \geq \frac{1}{b - a} (s_g(\gamma))^2,
\]

while equality arises if and only if $g_{ij}(\gamma(t)) (\gamma^i)'(t) (\gamma^j)'(t)$ is constant in $t$ (keep in mind that we are using the Einstein convention). This condition is called parametrization by arc length.

Note that any continuously differentiable curve can be parametrized by arc length: Letting

\[
s(t) = \int_a^t \sqrt{g_{ij}(\gamma(t)) (\gamma^i)'(t) (\gamma^j)'(t)},
\]

we obtain an increasing continuously differentiable map with positive derivative $s$ from $\langle a, b \rangle$ to the interval $\langle 0, s_g(\gamma) \rangle$; composing $\gamma$ with $s^{-1}$ is a parametrization by arc length. This shows that if the functional $S_g$ indeed has a minimum in the space of continuously differentiable curves with fixed boundary points $A$, $B$, then the minimum curve also minimizes the functional $s_g$, and furthermore is parametrized by arc length!
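The inequality between the two functionals is easy to observe numerically. The sketch below is an illustration under assumptions: it uses the Euclidean metric in the plane and the sample curve $\gamma(t) = (t^2, t)$ on $\langle 0, 1 \rangle$, which is not parametrized by arc length, and approximates both integrals by the midpoint rule.

```python
import math

def integrate(f, a, b, n=20000):
    # Composite midpoint rule.
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

a, b = 0.0, 1.0

def speed_sq(t):
    # Euclidean metric; sample curve gamma(t) = (t**2, t), so gamma'(t) = (2t, 1).
    return (2.0 * t) ** 2 + 1.0

length = integrate(lambda t: math.sqrt(speed_sq(t)), a, b)   # the length functional
energy = integrate(speed_sq, a, b)                           # the modified functional
```

Since the speed of this curve is not constant, `energy` should be strictly larger than `length ** 2 / (b - a)`.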

3.3 The Euler-Lagrange equation for the modified functional - the geodesic equation

The modified functional (3.2.1) of 3.2 gives us the Lagrangian

\[
L(x, v) = g_{ij}(x) v^i v^j, \tag{3.3.1}
\]

using the notation $x = (x^1, \dots, x^n)$, $v = (v^1, \dots, v^n)$ and the Einstein convention. We have

\[
\frac{\partial L}{\partial v^i} = 2 g_{ij}(x) v^j.
\]

Note here that from the point of view of the Einstein convention, we must treat the $i$ in $\dfrac{\partial}{\partial v^i}$ as a subscript.

By the chain rule, we therefore have

\[
\frac{d}{dt} \left( \frac{\partial L(x, x')}{\partial v^i} \right) = 2 g_{ij} (x^j)'' + 2 \frac{\partial g_{ij}}{\partial x^k} (x^j)' (x^k)' = 2 g_{ij} (x^j)'' + \left( \frac{\partial g_{ij}}{\partial x^k} + \frac{\partial g_{ik}}{\partial x^j} \right) (x^j)' (x^k)'.
\]

The last step may seem to do nothing, but we will see later that it is useful to have the quantity coupled to $(x^j)' (x^k)'$ symmetrical in $j, k$ (it will help eliminate a certain, somewhat counterintuitive, quantity known as torsion).


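Also note that by the chain rule, we have

\[
\frac{\partial L(x, x')}{\partial x^i} = \frac{\partial g_{jk}}{\partial x^i} (x^j)' (x^k)',
\]

and hence the Euler-Lagrange equation becomes (after cancelling 2),

\[
g_{ij} (x^j)'' + \frac{1}{2} \left( \frac{\partial g_{ij}}{\partial x^k} + \frac{\partial g_{ik}}{\partial x^j} - \frac{\partial g_{jk}}{\partial x^i} \right) (x^j)' (x^k)' = 0, \tag{3.3.2}
\]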

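which is called the geodesic equation. It is useful to write

\[
\Gamma_{ijk} = \frac{1}{2} \left( \frac{\partial g_{ij}}{\partial x^k} + \frac{\partial g_{ik}}{\partial x^j} - \frac{\partial g_{jk}}{\partial x^i} \right). \tag{3.3.3}
\]

Then (3.3.2) becomes

\[
g_{ij} (x^j)'' + \Gamma_{ijk} (x^j)' (x^k)' = 0. \tag{3.3.4}
\]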

As we learned from the theory of differential equations in Chapter 6, it is useful to have the highest derivative in explicit form. In the present case, it suffices to multiply by the matrix $g^{-1}$ inverse to $g$. To conform with the Einstein convention, it is customary to denote the $(i, j)$’th entry of the matrix $g^{-1}$ as $g^{ij}$. Then we obtain

\[
g^{ij} g_{jk} = \delta^i_k
\]

where

\[
\delta^i_k = \begin{cases} 1 & \text{when } i = k, \\ 0 & \text{otherwise} \end{cases}
\]

is called the Kronecker $\delta$ (see also Appendix A, 7.2). Thus, putting

\[
\Gamma^i_{jk} = g^{i\ell} \Gamma_{\ell jk},
\]

the geodesic equation becomes

\[
(x^i)'' + \Gamma^i_{jk} (x^j)' (x^k)' = 0. \tag{3.3.5}
\]

The symbols $\Gamma_{ijk}$ or $\Gamma^i_{jk}$ are known as Christoffel symbols of the first resp. second kind.
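The formulas for the Christoffel symbols translate directly into a small computation. The sketch below is an illustration rather than part of the text's development: it takes the half-plane metric $g = \mathrm{diag}(1/y^2, 1/y^2)$ of Exercise (9) as a test case, approximates the partial derivatives of $g_{ij}$ by central differences, forms the symbols of the first kind as in (3.3.3), and raises the first index with $g^{-1}$. Index 0 stands for $x$ and index 1 for $y$; all function names are hypothetical.

```python
def metric(p):
    # Half-plane metric g = diag(1/y^2, 1/y^2), used here only as a test case.
    y = p[1]
    return [[1.0 / y**2, 0.0], [0.0, 1.0 / y**2]]

def d_metric(p, k, eps=1e-5):
    # Partial derivatives of all g_ij with respect to x^k (central difference).
    hi, lo = list(p), list(p)
    hi[k] += eps
    lo[k] -= eps
    gh, gl = metric(hi), metric(lo)
    return [[(gh[i][j] - gl[i][j]) / (2 * eps) for j in range(2)] for i in range(2)]

def inverse_2x2(g):
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]

def christoffel_second_kind(p):
    dg = [d_metric(p, k) for k in range(2)]      # dg[k][i][j] = d g_ij / d x^k
    # first[i][j][k] = Gamma_ijk as in (3.3.3)
    first = [[[0.5 * (dg[k][i][j] + dg[j][i][k] - dg[i][j][k])
               for k in range(2)] for j in range(2)] for i in range(2)]
    ginv = inverse_2x2(metric(p))
    # result[i][j][k] = Gamma^i_jk = g^{il} Gamma_ljk
    return [[[sum(ginv[i][l] * first[l][j][k] for l in range(2))
              for k in range(2)] for j in range(2)] for i in range(2)]

gamma = christoffel_second_kind([0.3, 2.0])
```

For this metric a hand computation gives $\Gamma^x_{xy} = -1/y$, $\Gamma^y_{xx} = 1/y$ and $\Gamma^y_{yy} = -1/y$, which the numerical values at $y = 2$ should reproduce.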

Parametrized curves satisfying the geodesic equation are called geodesics parametrized by arc length, or simply geodesics. Let us keep in mind, however, that geodesics are merely critical for the functional (3.2.1) of 3.2. We have not proved that geodesics minimize the length of continuously differentiable curves with given boundary points. In fact, this is false in general (see Exercise (7) (c) of Chapter 15 below). Yet, for the sake of geometry, we are clearly interested at least in some minimum length statement regarding geodesics, and it is important to note that the variational tools we supplied do not give that. We will prove such a statement in the next section using different methods.

4 The geometry of geodesics

The purpose of this section is to study geodesics in more detail, and eventually to prove that locally they really are the curves of minimal length connecting two points with respect to a given Riemann metric.

4.1 Dependence on boundary conditions, the exponential map

Recall now Theorem 6.5.5 where we investigated the dependence of solutions of an ordinary differential equation on initial conditions. At this point, we are interested in dealing with smooth functions. Let us distill the result we will need here:

4.1.1 Lemma. Let $U \subseteq \mathbb{R}^n$ be an open set, and let $f : \mathbb{R} \times U \to \mathbb{R}^n$ be a smooth function. Consider points $t_0 \in \mathbb{R}$, $x_0 \in U$. Then there exists an open neighborhood $V$ of $(t_0, x_0)$ in $\mathbb{R} \times U$ and a unique smooth function $y : V \to U$ such that $y(t_0, x) = x$ and

\[
\frac{\partial y(t, x)}{\partial t} = f(t, y(t, x)) \tag{*}
\]

for all $(t, x) \in V$.

Proof. As explained in Subsection 5.1 of Chapter 6, we can treat dependence on initial conditions as dependence on parameters. From this point of view, the existence and uniqueness of a continuous solution $y$ as claimed follows from Theorem 5.3 of Chapter 6, and its continuous differentiability in all variables follows from Theorem 5.5 of Chapter 6. Now applying the equations (5.5.2) of Chapter 6 for the partial derivatives, we obtain, by induction, the existence and continuity of all higher partial derivatives. □

By 1.2 of Chapter 6, an analogue of Lemma 4.1.1 also holds for systems of higher order differential equations. Applying this specifically to the case of the geodesic equation (3.3.5) of 3.3, we obtain the following

4.1.2 Corollary. For a smooth Riemann metric $g$ on an open set $U \subseteq \mathbb{R}^n$ and a point $P \in U$, pick an isometry

\[
\phi : (\mathbb{R}^n, \langle \cdot, \cdot \rangle) \to (\mathbb{R}^n, \langle \cdot, \cdot \rangle_{g_P}).
\]

(Here on the left hand side, $\langle \cdot, \cdot \rangle$ denotes the dot product, see Appendix A, Section 4.3.) Then there exists a convex open neighborhood $V$ of $o \in \mathbb{R}^n$, and a unique smooth map $\psi : V \to U$ such that

(1) $\psi(o) = P$,

(2) for each $v \in \mathbb{R}^n$, $\psi(vt)$ considered as a function of $t$ in an open neighborhood of $0$ in which $vt \in V$ is a $g$-geodesic parametrized by arc length (in the sense of 3.3),

(3) $\partial_v \psi(o) = \phi(v)$.

The smooth map $\psi$ of Corollary 4.1.2 is often denoted by exp and called the exponential map.

4.2 Behavior of geodesics with respect to lengths and angles

Let us first verify that solutions of the equation (3.3.5) of 3.3 are indeed parametrized by arc length with respect to the Riemann metric $g$. While we argued in 3.2 that this must be true for parametric curves minimizing the functional (3.2.1), note that we have so far only proved that the solutions of (3.3.5) are critical. Hence, that argument cannot be used rigorously.

4.2.1 Lemma. Let $x : (a, b) \to U$ be a solution of the equation (3.3.5). Then we have

\[
\left( g_{ij} (x^i)' (x^j)' \right)' = 0
\]

(using the Einstein convention).

Proof. Let us compute the Hamiltonian (2.2.1) of 2.2 for the Lagrangian (3.3.1) of 3.3:

\[
\frac{\partial L(x, x')}{\partial v^i} (x^i)' - L(x, x') = 2 g_{ij} (x^i)' (x^j)' - g_{ij} (x^i)' (x^j)' = g_{ij} (x^i)' (x^j)'.
\]

Thus, the quantity whose constancy in $t$ we are trying to prove is in effect the Hamiltonian. Hence, our statement follows from 2.2. □
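The statement of Lemma 4.2.1 can be observed numerically. The sketch below is an illustration under assumptions: it integrates the geodesic equation (3.3.5) for the half-plane metric $g = \mathrm{diag}(1/y^2, 1/y^2)$ of Exercise (9), whose nonzero Christoffel symbols of the second kind are $\Gamma^x_{xy} = \Gamma^x_{yx} = -1/y$, $\Gamma^y_{xx} = 1/y$, $\Gamma^y_{yy} = -1/y$, using a classical fourth-order Runge-Kutta step, and compares $g_{ij}(x^i)'(x^j)'$ at the start and at the end. The initial data and step size are arbitrary choices.

```python
def rhs(state):
    # state = (x, y, x', y'); the geodesic equation (3.3.5) for g = diag(1/y^2, 1/y^2)
    # reads x'' = 2 x' y' / y and y'' = (y'^2 - x'^2) / y.
    x, y, vx, vy = state
    return (vx, vy, 2.0 * vx * vy / y, (vy * vy - vx * vx) / y)

def rk4_step(state, h):
    # One classical Runge-Kutta step of size h.
    def shifted(s, k, c):
        return tuple(si + c * ki for si, ki in zip(s, k))
    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, h / 2))
    k3 = rhs(shifted(state, k2, h / 2))
    k4 = rhs(shifted(state, k3, h))
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def speed_squared(state):
    # g_ij (x^i)' (x^j)' for the half-plane metric.
    x, y, vx, vy = state
    return (vx * vx + vy * vy) / (y * y)

state = (0.0, 1.0, 1.0, 0.0)     # start at (0, 1) with unit horizontal velocity
initial = speed_squared(state)
for _ in range(1000):
    state = rk4_step(state, 0.001)
final = speed_squared(state)
```

Up to integration error, `final` should agree with `initial`. For these initial data the computed points should also stay on the unit circle $x^2 + y^2 = 1$, which is the shape one finds for this geodesic by solving the equation by hand.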

Note that the proof of Lemma 4.2.1 suggests multiplying the Lagrangian (3.3.1) of 3.3 by a factor of 1/2, and calling it energy.

4.2.2 Now we will prove that when we shift a geodesic to a nearby geodesic, the angle of the shift is also conserved, provided that we do not change the scale of parametrization. More precisely, let solutions of the geodesic equation (3.3.5) of 3.3 depend on some smooth parameter $u$ in the space of initial conditions, as in the proof of Lemma 4.1.1. Let us assume further that

\[
\frac{\partial \left( g_{ij} (x^i)' (x^j)' \right)}{\partial u} = 0. \tag{1}
\]

Note that by Lemma 4.2.1, it suffices to verify this condition at one point, and the condition indeed means that we are not changing the scale of arc length parametrization with $u$. Now let

\[
z = \frac{\partial x}{\partial u},
\]

as, again, in the proof of Lemma 4.1.1.

Lemma. We have

\[
\left( g_{ij} z^i (x^j)' \right)' = 0. \tag{2}
\]

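Proof. Compute

\[
\left( g_{ij} z^i (x^j)' \right)' = \frac{\partial g_{ij}}{\partial x^k} \frac{\partial x^k}{\partial t} \frac{\partial x^i}{\partial u} \frac{\partial x^j}{\partial t} + g_{ij} \frac{\partial^2 x^i}{\partial u\, \partial t} \frac{\partial x^j}{\partial t} + g_{ij} \frac{\partial x^i}{\partial u} \frac{\partial^2 x^j}{(\partial t)^2}. \tag{3}
\]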

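Now (1) implies that

\[
\frac{\partial g_{ij}}{\partial x^k} \frac{\partial x^k}{\partial u} \frac{\partial x^i}{\partial t} \frac{\partial x^j}{\partial t} + 2 g_{ij} \frac{\partial^2 x^i}{\partial u\, \partial t} \frac{\partial x^j}{\partial t} = 0. \tag{4}
\]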

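Subtracting 1/2 times (4) from the right hand side of (3), we get

\[
\cdots = \frac{\partial g_{ij}}{\partial x^k} \frac{\partial x^k}{\partial t} \frac{\partial x^i}{\partial u} \frac{\partial x^j}{\partial t} - \frac{1}{2} \frac{\partial g_{ij}}{\partial x^k} \frac{\partial x^k}{\partial u} \frac{\partial x^i}{\partial t} \frac{\partial x^j}{\partial t} + g_{ij} \frac{\partial x^i}{\partial u} \frac{\partial^2 x^j}{(\partial t)^2}. \tag{5}
\]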

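Using the geodesic equation (3.3.5) of 3.3 for $\dfrac{\partial^2 x^j}{(\partial t)^2}$, we see that the last term of (5) is equal to

\[
-g_{i\ell} \Gamma^{\ell}_{jk} z^i (x^j)' (x^k)' = -\Gamma_{ijk} z^i (x^j)' (x^k)'.
\]

Using the definition of $\Gamma_{ijk}$ in 3.3, this is equal to

\[
-\frac{1}{2} \frac{\partial x^i}{\partial u} \left( \frac{\partial g_{ij}}{\partial x^k} + \frac{\partial g_{ik}}{\partial x^j} - \frac{\partial g_{jk}}{\partial x^i} \right) \frac{\partial x^j}{\partial t} \frac{\partial x^k}{\partial t}.
\]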

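This is equal to

\[
-\frac{\partial g_{ij}}{\partial x^k} \frac{\partial x^k}{\partial t} \frac{\partial x^i}{\partial u} \frac{\partial x^j}{\partial t} + \frac{1}{2} \frac{\partial g_{ij}}{\partial x^k} \frac{\partial x^k}{\partial u} \frac{\partial x^i}{\partial t} \frac{\partial x^j}{\partial t}
\]

by renaming variables, which shows that (5) is 0. □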

Remark: In comparison with Lemma 4.2.1, we may ask if Lemma 4.2.2 has a similarly conceptual proof (our proof was by calculation from the definition of the Christoffel symbols). Such a conceptual proof indeed exists, and is related to our comments in Sections 7 and 8 of Chapter 6: the condition (1) indicates that the Lagrangian has an infinitesimal symmetry. By a similar but somewhat more elaborate argument to the discussion in Chapter 6, this always implies a conserved quantity known as a Noether current, which is the cause of the conservation law proved in Lemma 4.2.2. Discussing this more systematically, however, exceeds the scope of this text.

4.3 Minimality of geodesics

Let us now consider an open subset $U \subseteq \mathbb{R}^n$ with a smooth Riemann metric $g$ and a point $P \in U$. Choose an isometry $\phi$ as in Corollary 4.1.2, and let $\psi : V \to U$, $\psi(o) = P$, be the corresponding exponential map. By the Inverse Function Theorem 7.3 of Chapter 3, we may further assume that $\psi$ is a diffeomorphism onto its image.

4.3.1 Lemma. Let

\[
S_r = \{ x \in \mathbb{R}^n \mid \|x\| = r \}.
\]

Assuming $S_r \subseteq V$, $x \in S_r$, $v \in T(S_r)_x$, then $(D\psi)_x(v)$ is $g$-orthogonal to $(D\psi)_x(x)$. In other words, $T(\psi(S_r))_{\psi(x)}$ is $g$-orthogonal to $(D\psi)_x(x)$.

Caution: It is not claimed, and, as we will see in the next section, certainly not true in general, that $\psi$ would be an isometry!

Proof. We will use Lemma 4.2.2. Let $\tilde{x} = x / \|x\| = x / r$. Consider the geodesic $\psi(t \tilde{x})$. By the definition of $\psi$, and the fact that it is a diffeomorphism onto its image when restricted to $V$, the space $T(\psi(S_r))_{\psi(x)}$ is spanned by the vectors $z(r)$ of 4.2.2 with respect to the boundary condition change

\[
x(0, u) = P, \qquad x'(0, u) = \phi\left( \frac{\tilde{x} + u w}{\|\tilde{x} + u w\|} \right) \tag{*}
\]

where $\langle x, w \rangle = 0$. The condition (1) of 4.2.2 is then satisfied (at $t = 0$ and hence, by Lemma 4.2.1, for all $t \in (-r, r)$) by the fact that $\phi$ is an isometry. By (*),

\[
g_{ij} z^i (x^j)' = 0
\]

at $t = 0$, and hence, by Lemma 4.2.2, also at $t = r = \|x\|$, which implies the statement of Lemma 4.3.1. □

Assume now, without loss of generality, that $V$ is the open ball of radius $R$ centered at $o$, for some $R > 0$.

4.3.2 Theorem. Let $y : \langle 0, a \rangle \to \psi(V)$ be a continuously differentiable curve such that $y(0) = P$ and $y(a) \in \psi(S_r)$. Then, recalling the notation (3.1.1), we have

\[
s_g(y) \geq r,
\]

and equality is attained if and only if $y$ is a geodesic, i.e. $y \sim \tilde{y}$ where $\tilde{y}$ is a geodesic parametrized by arc length.

Proof. Consider the function

\[
h : \psi(V) \to \mathbb{R}
\]

given by

\[
h(x) = \|\psi^{-1} x\|.
\]

Then the function $h$ is smooth on $\psi(V) \setminus \{P\}$. By Lemma 4.3.1, the vector

\[
\left( g^{ij} \frac{\partial h(x)}{\partial x^j} \right)_i \tag{a}
\]

is a positive multiple of the derivative at $t = \|\psi^{-1}(x)\|$ of the geodesic

\[
\psi\left( t\, \psi^{-1}(x) / \|\psi^{-1}(x)\| \right). \tag{b}
\]

We have

\[
g^{ij} \frac{\partial h(x)}{\partial x^i} \frac{\partial h(x)}{\partial x^j} = 1. \tag{c}
\]

(Change coordinates so that one coordinate vector will be the derivative of (b) at $t = \|\psi^{-1}(x)\|$ and the other, $g$-orthogonal coordinate vectors will be tangent vectors at $x$ to $\psi(S_{\|\psi^{-1}(x)\|})$. Then the contributions to (c) in the new coordinates from all but $i = j = 1$ will be 0, and the contribution from the first coordinate is 1 by the fact that the geodesic (b) is parametrized by arc length, and $\phi$ is an isometry.) Hence, by the finite-dimensional Cauchy-Schwarz inequality (see 4.4 of Appendix A), for any $z \in \mathbb{R}^n$,

\[
\sqrt{g_{ij}(x) z^i z^j} \geq \frac{\partial h(x)}{\partial x^i} z^i
\]

where equality arises if and only if $z$ is a positive multiple of (a). Therefore,

\[
s_g(y) = \int_{\langle 0, a \rangle} \sqrt{g_{ij}(y(t)) (y^i)'(t) (y^j)'(t)}\, dt \geq \int_{\langle 0, a \rangle} h(y(t))'\, dt = h(y(a))
\]

where equality arises if and only if $y'(t)$ is a positive multiple of a tangent vector of a geodesic of the form (b) for $y(t) = x$ almost everywhere in $t$ (and hence everywhere, by continuity). □
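A quick numerical comparison shows what the theorem predicts in a concrete case. The sketch below is only a sanity check under assumptions: it uses the half-plane metric $g = \mathrm{diag}(1/y^2, 1/y^2)$ of Exercise (9), in which the vertical curve $t \mapsto (0, e^t)$ solves the geodesic equation, and compares its length with that of a hypothetical competing curve with the same endpoints. Lengths are approximated by the midpoint rule applied to (3.1.1).

```python
import math

def hyperbolic_length(curve, velocity, a, b, n=20000):
    # Midpoint-rule approximation of the length (3.1.1) for g = diag(1/y^2, 1/y^2).
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        _, y = curve(t)
        vx, vy = velocity(t)
        total += math.sqrt(vx * vx + vy * vy) / y * h
    return total

# Vertical segment from (0, 1) to (0, e): a geodesic of this metric.
geodesic_len = hyperbolic_length(
    lambda t: (0.0, math.exp(t)),
    lambda t: (0.0, math.exp(t)),
    0.0, 1.0)

# A competing curve with the same endpoints that bulges sideways.
detour_len = hyperbolic_length(
    lambda t: (0.5 * math.sin(math.pi * t), math.exp(t)),
    lambda t: (0.5 * math.pi * math.cos(math.pi * t), math.exp(t)),
    0.0, 1.0)
```

The geodesic has unit speed in this metric, so `geodesic_len` should be 1 up to rounding, and `detour_len` should come out strictly larger.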

5 Exercises

(1) Prove that for $y \in V_{a,b,p,q}$, and $h \in V_{a,b,o,o}$, we have

\[
S(y + h) - S(y) = \int_a^b D_y(t) h(t)\, dt + M(h) \cdot \|h\|
\]

where

\[
M : V_{a,b,o,o} \to \mathbb{R}
\]

satisfies

\[
\lim_{h \to 0} M(h) = 0
\]

and

\[
(D_y(t))_i = \frac{\partial L(t, y(t), y'(t))}{\partial x_i} - \frac{d}{dt} \left( \frac{\partial L(t, y(t), y'(t))}{\partial v_i} \right).
\]

The function $D_y(t)$ is an example of what we call a Fréchet derivative, although it is more common to consider this concept on normed vector spaces (while $V_{a,b,p,q}$ is an affine space). [Hint: Mimic the proof of Theorem 1.2, but keep in mind that $h$ plays a slightly different role, $h(t)$ now having values in $\mathbb{R}^n$.]

(2) Find the critical functions $\gamma : \langle a, b \rangle \to \mathbb{R}$ for the functional

\[
S(y) = \int_a^b y(x) \sqrt{1 + (y'(x))^2}\, dx
\]

on continuously differentiable functions $y \in V_{a,b,p,q}$, $p, q > 0$.


(3) Find the Euler-Lagrange equation for the functional

\[
S(y) = \int_0^1 \left( y^2 - (y')^2 \right) dx.
\]

(4) Find the critical functions $\gamma : \langle a, b \rangle \to \mathbb{R}$ for the functional

\[
S(y) = \int_a^b \left( p^2 (y')^2 + q^2 y^2 \right) dx.
\]

(5) By reversing the coordinates in Example 2.2 (i.e. making the vertical coordinate the independent and the horizontal coordinate the dependent variable), find an alternate solution to the brachistochrone problem using the method of Example 2.1.

(6) Find the critical functions for the functional

\[
S(u, v) = \int_0^1 \left( (u')^2 + (v')^2 + u' v' \right) dx.
\]

(7) Prove in detail the parametric form (2.2.2) of the solution of the brachistochrone problem.

(8) Prove that the formula (3.1.1) of 3.1 does not depend on the parametrization $\gamma$ of a piecewise continuously differentiable curve $L$.

(9) The hyperbolic plane is the upper half-plane of complex numbers, i.e. the set

\[
\mathbb{H} = \{ x + iy \in \mathbb{C} \mid y > 0 \}
\]

with the Riemannian metric $g_{ij}$ associated, at a point $x + iy \in \mathbb{H}$, with the matrix

\[
\begin{pmatrix} 1/y^2 & 0 \\ 0 & 1/y^2 \end{pmatrix}.
\]

Using the geodesic equation, determine the geodesics in $\mathbb{H}$.

(10) (Spherical geometry) Consider, on

\[
\mathbb{C} = \{ x + iy \mid x, y \in \mathbb{R} \},
\]

the Riemann metric $g_{ij}$ associated, at a point $x + iy \in \mathbb{C}$, with the matrix

\[
\begin{pmatrix} 1/(1 + x^2 + y^2) & 0 \\ 0 & 1/(1 + x^2 + y^2) \end{pmatrix}.
\]

Using the geodesic equations, determine the geodesics in this space.

15 Tensor Calculus and Riemannian Geometry

The attentive reader probably noticed that the concept of a Riemann metric on an open subset of $\mathbb{R}^n$ which we introduced in the last chapter, and the related material on geodesics, beg for a generalization to manifolds. Although this is not quite as straightforward as one might imagine, the work we have done in the last chapter gets us well underway. A serious problem we must address, of course, is how the concepts we introduced behave under change of coordinates. It turns out that what we have said on covariance and contravariance in manifolds is not quite enough: we need to discuss the notation of tensor calculus.

Additionally, it turns out that discussing geodesics in a Riemann metric directly would cause us to copy many expressions over and over unnecessarily. There is a natural intermediate notion which axiomatizes the Christoffel symbols of the second kind directly, without referring to a Riemann metric. This gives rise to the concept of an affine connection. In the presence of an affine connection, we can discuss geodesics, but also the important geometric concepts of torsion and curvature. We will show that vanishing of torsion and curvature characterizes, in an appropriate sense, the canonical affine connection on $\mathbb{R}^n$ (the flat connection).

We will define the notion of a Riemann manifold, and show how it canonically specifies an affine connection, known as the Levi-Civita connection. This will lead us to the concept of curvature of a Riemann manifold. We will show that locally, a Riemann manifold with zero curvature is isometric to an open subset of $\mathbb{R}^n$. We will also show that every oriented Riemann manifold in dimension 2 has a compatible structure of a Riemann surface.

Although we make no reference to physics, the present chapter gives a good rigorous foundation for the mathematics of general relativity theory. In fact, the notation we use (writing out the indices in tensors) is closer to physics than is customary in most mathematical texts. As we shall see, this notation does not sacrifice rigor, and can make calculations with tensors more transparent by showing explicitly which coordinates we are contracting.

To comment on the title of this chapter, by tensor calculus, one usually means the basic development of tensor fields, their transformation under changes of coordinates, and the covariant derivative. Riemannian geometry develops the same concepts further and on a higher level of abstraction. By making a kind of a vertical slice through the concepts, we are hoping to make advanced geometry more accessible to the reader.

Riemannian geometry is a vast subject, and here we only explore its very beginnings. For further study of differential geometry and Riemannian geometry, we recommend [10, 16, 21].

From here on, we commonly drop the bold-faced letter convention from 1.2 of Chapter 3. Exceptions will be made where we specifically need to refer to material of previous chapters, such as Comment 1.1 below.

1 Tensor calculus

1.1 Tensors and tensor fields

Let $M$ be a smooth manifold and let $x \in M$. An $m$-times contravariant and $n$-times covariant tensor (or, more briefly, tensor of type $(m, n)$) at $x$ is simply an element of

\[
(TM_x)^{\otimes m} \otimes \left( (TM_x)^* \right)^{\otimes n}. \tag{1.1.1}
\]

A smooth tensor field $T$ on $M$ of type $(m, n)$ is a map assigning to each $x \in M$ a tensor $T_x$ at $x$ of type $(m, n)$ such that for a smooth coordinate system $h : U \to \mathbb{R}^N$ at any point $x \in M$,

\[
\underbrace{h_* \otimes \cdots \otimes h_*}_{m + n \text{ times}}\, T_y, \quad y \in U \tag{1.1.2}
\]

depends smoothly on $y$. (By $h_*$, we mean $Dh_x$ on the contravariant coordinates and $(D(h^{-1})_{h(x)})^*$ on the covariant coordinates. Note also that the tangent space of $\mathbb{R}^N$ is identified canonically with $\mathbb{R}^N$, so the target of (1.1.2) is canonically identified with the same finite-dimensional vector space for all $y \in U$.)

For example, therefore, a smooth tensor field of type $(1, 0)$ is the same as a smooth vector field, and a smooth tensor field of type $(0, 1)$ is the same as a smooth 1-form.

Comment: The reader may wonder why the convention on using the terms covariant and contravariant when referring to tensors is opposite to the functoriality we observed in 2.4 of Chapter 12. The reason is that the traditional terminology on tensors (which we follow here) focuses on coordinates rather than the objects themselves. In other words, one does not refer to functoriality with respect to smooth maps, but with respect to coordinate change, which turns out to be the opposite. To give an example, to use the notation of Chapter 14, a tangent vector would be, in local coordinates $h^1, \dots, h^n$, written as

\[
v = v^i \frac{\partial}{\partial h^i}
\]

(using the Einstein convention). From the point of view of tensor calculus, we write $v$ simply as $v^i$. Note that composing the coordinate system $h$ with $\phi$ where $\phi : U \to V$ is a diffeomorphism of open subsets of $\mathbb{R}^n$, we have

\[
\frac{\partial}{\partial (\phi \circ h)^i} = \frac{\partial h^j}{\partial \phi^i} \frac{\partial}{\partial h^j}
\]

by the chain rule, and thus with respect to the coordinates $(\phi \circ h)^i$, the coordinates of $v$ will be

\[
D\phi^{-1} (v^1, \dots, v^n)^T.
\]

1.2 A coordinate-free meaning for indices

Even though we have not specified coordinates, it is often customary to give a tensor of type $(m, n)$ $m$ different superscripts and $n$ different subscripts, e.g.

\[
T^{i_1 i_2 \dots i_m}_{j_1 j_2 \dots j_n}.
\]

The superscripts and subscripts are formal symbols each one of which refers simply to a particular factor of (1.1.1). For example a tensor of type $(2, 2)$ may be then denoted by

\[
T^{ij}_{k\ell}.
\]

This notation has immediate benefits. For example, the Einstein convention now makes sense for tensors: for tensors $T$, $S$, by the symbol

\[
T^{\dots i \dots}_{\dots\dots} S^{\dots\dots}_{\dots i \dots} = S^{\dots\dots}_{\dots i \dots} T^{\dots i \dots}_{\dots\dots}
\]

we mean the image of $S \otimes T$ under the map which applies the evaluation map

\[
(TM_x)^* \otimes (TM_x) \to \mathbb{R}
\]

to the coordinates of $S$ and $T$ labeled by $i$. We stipulate that each index will occur at most twice, but there may be multiple pairs of coinciding indices, in which case we apply multiple evaluation maps: For example,

\[
T^{ij}_{k\ell} S^{k\ell}_{ij} \in \mathbb{R}
\]

makes sense for two tensors of type $(2, 2)$ at the same point $x \in M$. This operation is often referred to as contraction.
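In coordinates, contracting the two indices of a tensor of type $(1, 1)$ amounts to taking the trace of its component matrix, and the result does not depend on the basis. The sketch below illustrates this with sample component values and a sample change-of-basis matrix (both assumptions made for the example): the components $T^i_j$ transform to $P T P^{-1}$, while the contraction $T^i_i$ stays the same.

```python
def matmul(a, b):
    # Product of two square matrices given as nested lists.
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def contract(t):
    # T^i_i: the evaluation map applied to a tensor of type (1, 1), in coordinates.
    return sum(t[i][i] for i in range(len(t)))

T = [[1.0, 2.0], [3.0, 4.0]]          # components T^i_j in one basis (sample values)
P = [[2.0, 1.0], [1.0, 1.0]]          # change-of-basis matrix with determinant 1
P_inv = [[1.0, -1.0], [-1.0, 2.0]]    # its inverse
T_new = matmul(matmul(P, T), P_inv)   # components of the same tensor in the new basis
```

Both `contract(T)` and `contract(T_new)` evaluate to the same number, which is the coordinate-free meaning of the contraction.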


The other benefit is that we can easily talk about symmetric and antisymmetric tensors: Recall that for two vector spaces $V$, $W$ there is a canonical interchange map

\[
V \otimes W \to W \otimes V, \quad v \otimes w \mapsto w \otimes v.
\]

A tensor

\[
T^{\dots i \dots j \dots}_{\dots\dots} \quad \text{or} \quad T^{\dots\dots}_{\dots i \dots j \dots}
\]

is called symmetric (resp. antisymmetric) in the coordinates $i, j$ if applying the interchange map to those coordinates gives again $T$ (resp. $-T$). We may also say that a tensor is symmetric (resp. antisymmetric) in a set of coordinates $S$ if it is symmetric (resp. antisymmetric) in any pair $i, j \in S$. Realize that then, for example, a smooth tensor field of type $(0, k)$ antisymmetric in all its coordinates is the same thing as a smooth $k$-form.

One example needs to be discussed explicitly: recall from 2.5 of Chapter 11 that the canonical map

\[
\kappa_{V,W} : V^* \otimes W \to \operatorname{Hom}(V, W), \quad (\kappa(f \otimes w))(v) = f(v) w
\]

is an isomorphism when $V$, $W$ are finite-dimensional. We then have a smooth tensor field of type $(1, 1)$ on any manifold $M$ which, at any point $x \in M$, is given by

\[
\kappa^{-1}_{TM_x, TM_x}(\mathrm{Id}_{TM_x}).
\]

This tensor field is denoted by

\[
\delta^i_j.
\]

1.3 Comment

The reader probably noticed the difference between the way subscripts and superscripts are used in the context of tensors on a Riemann manifold, and the way we used them in the last chapter: in the last chapter, an index $i$ stood simply for the $i$’th coordinate, where $i$ is a number, and the Einstein convention was used to sum terms where the same $i$ occurs twice. In the context of tensors, no number is plugged in for $i$, it simply is a label denoting which factor of the tensor product we are working with, and the Einstein convention means an application of the evaluation map.

Conveniently, these two points of view are somewhat interchangeable: if we pick a local coordinate system $h$, then we have a basis $\dfrac{\partial}{\partial h^i}$ of $TM_x$, and a dual basis $dh^i$ of $(TM_x)^*$, and the evaluation map can be indeed computed by summing products of terms coupling a basis element with the corresponding element of the dual basis.

Nevertheless, one must be careful to note that the coordinate-free tensor context is more restrictive: The tensor notation should be used only for quantities which are intrinsically coordinate-free. For example, let us take the Christoffel symbols $\Gamma^k_{ij}$. On an open set in $\mathbb{R}^n$, the tangent space is canonically identified with $\mathbb{R}^n$, so we could certainly view $\Gamma^k_{ij}$ as a tensor of type $(1, 2)$. The trouble is, however, that if we change coordinates, i.e. apply a diffeomorphism to another open subset of $\mathbb{R}^n$, this will not preserve the tangent space identification, and we find that it would not preserve the tensor $\Gamma^k_{ij}$ we just defined, i.e. that for each choice of coordinates, we would get a different tensor. Usually, this is expressed by saying that $\Gamma^k_{ij}$ is not a tensor and transforms according to different rules (see Exercise (1) below). It is more accurate, however, to say that there is no canonical tensor given by the Christoffel symbols.

2 Affine connections

2.1 The definition of an affine connection

There is no general natural way of taking a derivative of a vector field by another vector field on a smooth manifold. However, we can give a manifold additional structure which enables such operations, and specify axioms which make this operation “behave like a derivative”. This leads to the notion of an affine connection.

Consider the $\mathbb{R}$-vector space $W(M)$ of all smooth vector fields on a smooth manifold $M$. An affine connection (or, more briefly, connection) on $M$ is a bilinear map

\[
W(M) \times W(M) \to W(M), \quad (u, v) \mapsto \nabla_u(v)
\]

such that for a smooth function $f : M \to \mathbb{R}$, we have

\[
\nabla_{f \cdot u}(v) = f \cdot \nabla_u(v) \tag{2.1.1}
\]

and

\[
\nabla_u(f \cdot v) = \partial_u f \cdot v + f \cdot \nabla_u(v). \tag{2.1.2}
\]

By $\partial_u f$ we mean a function which, at $x \in M$, is the directional derivative at $x$ of $f$ by the vector $u(x)$. (Note that (2.1.2) can be interpreted as a kind of a Leibniz rule.)
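The Leibniz-type axiom (2.1.2) can be tested numerically in the simplest situation, where $\nabla_u(v)$ is taken to be the ordinary directional derivative of an $\mathbb{R}^2$-valued function on the plane. Everything concrete in the sketch below (the function `f`, the vector field `v`, the base point and the direction) is a sample choice made for illustration; derivatives are approximated by central differences.

```python
import math

def directional_derivative(F, x, u, eps=1e-6):
    # Derivative at x in the direction u of a function F returning a list of components.
    xp = [xi + eps * ui for xi, ui in zip(x, u)]
    xm = [xi - eps * ui for xi, ui in zip(x, u)]
    return [(p - m) / (2 * eps) for p, m in zip(F(xp), F(xm))]

def f(x):
    # Sample smooth function.
    return math.sin(x[0]) + x[1] ** 2

def v(x):
    # Sample vector field on the plane.
    return [x[0] * x[1], math.cos(x[0])]

def fv(x):
    # The product f * v.
    return [f(x) * c for c in v(x)]

point, u = [0.4, -1.1], [2.0, 0.5]     # base point and the value u(x)

lhs = directional_derivative(fv, point, u)                    # nabla_u(f v)
du_f = directional_derivative(lambda x: [f(x)], point, u)[0]  # the directional derivative of f
nabla_v = directional_derivative(v, point, u)                 # nabla_u(v)
rhs = [du_f * c + f(point) * d for c, d in zip(v(point), nabla_v)]
```

The two sides `lhs` and `rhs` should agree componentwise up to finite-difference error.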

2.2 Locality

Perhaps the first thing to notice about affine connections is that they are “local” in the following sense: the value of $\nabla_u(v)$ at a point $x \in M$ clearly depends only on the value $u(x)$ and the value of $v$ on the image of any continuously differentiable oriented curve

\[
\gamma : (-a, a) \to M
\]

(such that $\gamma'(t) \neq 0$) where $\gamma(0) = x$ and $\gamma'(0) = u(x)$: Choosing vector fields $e_1, \dots, e_N$ such that $e_1(x), \dots, e_N(x)$ form a basis of $TM_x$, by (2.1.1) and bilinearity, $\nabla_{e_i}(v)$ clearly determine the value of $\nabla_u(v)$ at $x$ for any $u$. To prove the statement about the $v$ variable, first note that $\nabla_u(v)(x)$ and $\nabla_u(w)(x)$ are equal if $v$, $w$ coincide in a neighborhood of $x$: in such a case, there exists a smooth function $h : M \to \mathbb{R}$ such that $h$ is constant 1 in a neighborhood of $x$, and $hv = hw$. This implies our claim by axiom (2.1.2). Now choose local coordinates $h : U \to \mathbb{R}^n$, $h(x) = 0$. Then without loss of generality, $M = h[U]$, which is an open set in $\mathbb{R}^n$ (in other words, $h$ is the inclusion). We can write

\[
v = f^i e_i, \quad w = g^i e_i,
\]

and by our assumption, $f^i$ and $g^i$ coincide on the image $\gamma[(-a, a)]$ of $\gamma$. In particular,

\[
\partial_u f^i(x) = \partial_u g^i(x).
\]

Consequently, again, our claim follows from axiom (2.1.2).

Consider now a function $v$ assigning to $y \in \gamma[(-a, a)]$ an element in $TM_y$ which is smooth in the sense that $h_* \circ v \circ \gamma : (-\varepsilon, \varepsilon) \to \mathbb{R}^n$ is a smooth function where $h$ are local coordinates at $x$. We shall call such a function a smooth vector field defined on $\gamma[(-a, a)]$. By the Implicit Function Theorem, a smooth vector field defined on $\gamma[(-a, a)]$ extends to a smooth vector field in a neighborhood of $x$, and by the previous remarks, $\nabla_u(v)(x)$ is well defined even though $v$ is not a priori a smooth vector field defined on a neighborhood of $x$. This is sometimes important.

2.3 Examples

1. The most basic example is the canonical connection in $\mathbb{R}^n$: since the tangent space of $\mathbb{R}^n$ is canonically identified with $\mathbb{R}^n$, vector fields are canonically identified with $\mathbb{R}^n$-valued functions, and we may simply define the value of $\nabla_u(v)$ at $x$ as the $u(x)$-directional derivative of $v$ (considered as an $\mathbb{R}^n$-valued function) at $x$.

2. Let us now generalize this example in the spirit of the previous section. Let $U$ be an open subset of $\mathbb{R}^n$ and let $g_{ij}$ be a Riemann metric on $U$. Define

$$\nabla_u(v) = u^i\left(\frac{\partial v^j}{\partial x^i}\, e_j + v^j \Gamma^k_{ij}\, e_k\right) \tag{2.3.1}$$

using the standard coordinates $x^i$ in $\mathbb{R}^n$, and letting $e_i$ be the standard basis of $\mathbb{R}^n$. The axioms (2.1.1), (2.1.2) are readily verified. To explain where this formula comes from, note that by the chain rule, if $x = x(t)$ is a geodesic, then (2.3.1) gives precisely

$$\nabla_{x'(t)}\, x' = 0, \tag{2.3.2}$$

which really "looks like" a generalization of the equation of a straight line in $\mathbb{R}^n$, although the left-hand side must be taken in the sense of the remarks made in 2.2.

Perhaps the main purpose of this section is to develop this example further, generalize it to Riemann manifolds and show its significance to a Riemann metric; but we first need to develop the general theory of connections further.
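As a numerical illustration of formula (2.3.1), here is a minimal Python sketch. It assumes the standard expression $\Gamma^k_{ij} = \tfrac12 g^{kl}(\partial_i g_{jl} + \partial_j g_{il} - \partial_l g_{ij})$ for the Christoffel symbols of a metric (the text refers to Chapter 14 for their definition, which is not reproduced here), approximates the partial derivatives by central differences, and uses the Euclidean metric of the plane written in polar coordinates as a test case. All function names are hypothetical.

```python
import math

def metric_polar(p):
    """Euclidean metric of the plane in polar coordinates p = (r, theta)."""
    r, _theta = p
    return [[1.0, 0.0], [0.0, r * r]]

def inverse_2x2(g):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]

def metric_derivatives(metric, p, h=1e-5):
    """dg[l][i][j] approximates the partial derivative of g_ij along x^l."""
    dg = []
    for l in range(2):
        plus, minus = list(p), list(p)
        plus[l] += h
        minus[l] -= h
        gp, gm = metric(plus), metric(minus)
        dg.append([[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(2)]
                   for i in range(2)])
    return dg

def christoffel(metric, p):
    """gamma[k][i][j] approximates Gamma^k_ij (assumed standard formula)."""
    g_inv = inverse_2x2(metric(p))
    dg = metric_derivatives(metric, p)
    return [[[0.5 * sum(g_inv[k][l] * (dg[i][j][l] + dg[j][i][l] - dg[l][i][j])
                        for l in range(2))
              for j in range(2)] for i in range(2)] for k in range(2)]

def covariant_derivative(u, v, dv, gamma):
    """Components of nabla_u(v) by (2.3.1); dv[i][j] is the partial of v^j along x^i."""
    n = len(u)
    return [sum(u[i] * (dv[i][k] + sum(v[j] * gamma[k][i][j] for j in range(n)))
                for i in range(n)) for k in range(n)]

def constant_field_polar(p):
    """Polar components (and their partials) of the constant Cartesian field (1, 0)."""
    r, th = p
    v = [math.cos(th), -math.sin(th) / r]
    dv = [[0.0, math.sin(th) / r ** 2],          # partials along r
          [-math.sin(th), -math.cos(th) / r]]    # partials along theta
    return v, dv
```

At the point $r = 2$ the sketch returns approximately $-2$ for `gamma[0][1][1]` and $0.5$ for `gamma[1][0][1]`, and the covariant derivative of the constant Cartesian field vanishes up to discretization error, as one expects for the canonical connection written in other coordinates.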

2.4 Parallel transport and geodesics

2.4.1
Let $\gamma : \langle a, b\rangle \to M$ be a continuously differentiable parametrized curve in a smooth manifold $M$ with affine connection $\nabla$ (as usual, we assume that $\gamma'(t) \neq 0$ for any $t \in \langle a, b\rangle$ and take the one-sided derivatives at the boundary points). Consider the equation

$$\nabla_{\gamma'(t)}\, y(\gamma(t)) = 0 \tag{*}$$

where $y$ is a smooth vector field defined on $\gamma[\langle a, b\rangle]$. Clearly, we can treat this problem locally, and hence we may work in a coordinate neighborhood $U$ of $M$, where we have a smooth coordinate system $h : U \to \mathbb{R}^N$. Let $e_i = \dfrac{\partial}{\partial h^i}$. Writing

$$y = x^i e_i,$$

the equation (*) becomes a system of first-order linear differential equations in the coefficients $x^i$. Thus, by Theorem 1.3 of Chapter 7, there is a unique solution to the equation (*) with given value

$$v = y(\gamma(a)) \in TM_{\gamma(a)}.$$

This solution is called the parallel transport of the vector $v$ along the parametrized curve $\gamma$ with respect to the affine connection $\nabla$. It is important to note, however, that performing parallel transport on a vector $v = y(\gamma(a)) \in TM_{\gamma(a)}$ along a parametrized closed curve $\gamma$ may produce a different vector $y(\gamma(b)) \neq v$ in $TM_{\gamma(b)} = TM_{\gamma(a)}$. This is related to two quantities known as torsion and curvature associated with the affine connection $\nabla$, which we will discuss in the next section.
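The following Python sketch illustrates how parallel transport around a closed curve can fail to return the original vector. It is only an illustration under stated assumptions: the manifold is the unit sphere with spherical coordinates $(\theta, \phi)$, the connection is the one with the standard Christoffel symbols $\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta$, $\Gamma^\phi_{\theta\phi} = \Gamma^\phi_{\phi\theta} = \cot\theta$ (not derived in the text), the curve is the latitude circle $\theta = \theta_0$, $\phi = t$, and the linear system (*) is integrated with the classical Runge-Kutta method. The function names are hypothetical.

```python
import math

def rk4(rhs, state, t_end, steps):
    """Classical fourth-order Runge-Kutta for an autonomous system state' = rhs(state)."""
    h = t_end / steps
    s = list(state)
    for _ in range(steps):
        k1 = rhs(s)
        k2 = rhs([a + 0.5 * h * b for a, b in zip(s, k1)])
        k3 = rhs([a + 0.5 * h * b for a, b in zip(s, k2)])
        k4 = rhs([a + h * b for a, b in zip(s, k3)])
        s = [a + h / 6.0 * (b1 + 2 * b2 + 2 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(s, k1, k2, k3, k4)]
    return s

def latitude_transport(theta0, v0, steps=2000):
    """Transport v0 = (v^theta, v^phi) once around the latitude theta = theta0."""
    def rhs(v):
        v_theta, v_phi = v
        # dv^k/dt = -Gamma^k_ij gamma'^i v^j with curve velocity gamma' = (0, 1)
        return [math.sin(theta0) * math.cos(theta0) * v_phi,
                -math.cos(theta0) / math.sin(theta0) * v_theta]
    return rk4(rhs, v0, 2.0 * math.pi, steps)

def sphere_norm(theta0, v):
    """Length of v at colatitude theta0 in the round metric diag(1, sin^2 theta)."""
    return math.sqrt(v[0] ** 2 + (math.sin(theta0) * v[1]) ** 2)
```

For $\theta_0 = \pi/3$ and the initial vector $(1, 0)$, the transported vector comes back approximately as $(-1, 0)$: its length is preserved, but it is reversed. On the equator $\theta_0 = \pi/2$ the right-hand side vanishes and the vector returns unchanged.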


2.4.2 Geodesics
We can now also see that the concept of a geodesic generalizes to any smooth manifold with an affine connection. In effect, if, in a local coordinate system $h$, we write $e_i = \dfrac{\partial}{\partial h^i}$, and define Christoffel symbols of the connection $\nabla$ by

$$\nabla_{e_i}(e_j) = \Gamma^k_{ij}\, e_k, \tag{*}$$

then in this generalized sense, any affine connection in local coordinates is given by the formula (2.3.1) of 2.3 (by the axioms of 2.1). We then see that the "geodesic equation" (2.3.2) written in coordinates becomes a (non-linear) second-order ordinary differential equation, and hence locally has solutions uniquely determined by the value and derivative at a single point (by Corollary 4.1.2 of Chapter 14).
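To make the coordinate form of the geodesic equation concrete, the sketch below integrates $\ddot x^k + \Gamma^k_{ij}\dot x^i \dot x^j = 0$ for the canonical connection of the plane written in polar coordinates $(r, \theta)$, where the only non-zero symbols are $\Gamma^r_{\theta\theta} = -r$ and $\Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = 1/r$ (standard values, assumed here rather than taken from the text). The solution is determined by the initial point and velocity, and it should trace a straight line. The helper names are hypothetical.

```python
import math

def rk4(rhs, state, t_end, steps):
    """Classical fourth-order Runge-Kutta for an autonomous system state' = rhs(state)."""
    h = t_end / steps
    s = list(state)
    for _ in range(steps):
        k1 = rhs(s)
        k2 = rhs([a + 0.5 * h * b for a, b in zip(s, k1)])
        k3 = rhs([a + 0.5 * h * b for a, b in zip(s, k2)])
        k4 = rhs([a + h * b for a, b in zip(s, k3)])
        s = [a + h / 6.0 * (b1 + 2 * b2 + 2 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(s, k1, k2, k3, k4)]
    return s

def polar_geodesic_rhs(state):
    """First-order form of the geodesic equation; state = (r, theta, r', theta')."""
    r, _theta, dr, dtheta = state
    return [dr,
            dtheta,
            r * dtheta ** 2,             # r''     = -Gamma^r_{theta theta} theta'^2
            -2.0 * dr * dtheta / r]      # theta'' = -2 Gamma^theta_{r theta} r' theta'

def polar_geodesic(r0, theta0, dr0, dtheta0, t_end, steps=2000):
    """Return the Cartesian end point (x, y) of the geodesic at time t_end."""
    r, theta, _, _ = rk4(polar_geodesic_rhs, [r0, theta0, dr0, dtheta0], t_end, steps)
    return r * math.cos(theta), r * math.sin(theta)
```

Starting at $(r, \theta) = (1, 0)$ with velocity $(\dot r, \dot\theta) = (0, 1)$, which corresponds to the Cartesian point $(1, 0)$ and velocity $(0, 1)$, the computed curve stays on the vertical line $x = 1$ and reaches approximately $(1, 1)$ at $t = 1$.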

3 Tensors associated with an affine connection: torsion and curvature

Recall the vector space $W(M)$ of smooth vector fields on $M$. We will prove the following

3.1 Lemma. Suppose we have a multi-linear function

$$\Phi : \underbrace{W(M) \times \dots \times W(M)}_{k \text{ times}} \to W(M)$$

which has the property that

$$\Phi(u_1, \dots, u_{i-1}, y u_i, u_{i+1}, \dots, u_k)_x = y(x)\,\Phi(u_1, \dots, u_k)_x \tag{*}$$

for every smooth function $y : M \to \mathbb{R}$ and every $i = 1, \dots, k$. Then $\Phi(u_1, \dots, u_k)_x$ only depends on $(u_1)_x, \dots, (u_k)_x$, and defines a smooth tensor field of type $(1, k)$.

Remark: Multi-linearity of $\Phi$, as we defined it, guarantees condition (*) for a constant function $y$.

Proof. By the same reasoning as in 2.2, the value $\Phi(u_1, \dots, u_k)_x$ depends only on values of $u_i$ in an open neighborhood $U$ of $x$. We may assume $U$ to be a coordinate neighborhood with a coordinate function $h : U \to \mathbb{R}^N$, and let $e_i = \dfrac{\partial}{\partial h^i}$. Then we may write

$$u_i = {}_i y^j e_j$$

for smooth functions ${}_i y^j$ on $U$, so (*) implies that $\Phi(u_1, \dots, u_k)_x$ is the sum of

$${}_1 y^{j_1}(x) \cdots {}_k y^{j_k}(x)\, \Phi(e_{j_1}, \dots, e_{j_k})_x$$

over all possible choices $1 \le j_1, \dots, j_k \le N$. This implies our statement. □

3.2

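Let $\nabla$ be an affine connection on a smooth manifold $M$. We will give two examples of quantities satisfying the assumptions of Lemma 3.1, namely

$$T(u, v) = \nabla_u(v) - \nabla_v(u) - [u, v]$$

and

$$R(u, v, w) = \nabla_u \nabla_v(w) - \nabla_v \nabla_u(w) - \nabla_{[u,v]}(w)$$

where $[u, v]$ is the Lie bracket of the smooth vector fields $u, v$ (see Section 7 of Chapter 6, and Exercise (5)).

Lemma. The functions $T$, $R$ satisfy the hypotheses of Lemma 3.1, and hence define smooth tensor fields $T^k_{ij}$, $R^\ell_{ijk}$. Furthermore, both of these tensors are antisymmetric in the coordinates $i, j$.

Remark: The tensors $T^k_{ij}$, $R^\ell_{ijk}$ are called the torsion tensor and curvature tensor, respectively.

Proof of the Lemma: Multilinearity is obvious, as is antisymmetry in the specified coordinates. Condition (*) of Lemma 3.1 is a direct calculation.

$$\begin{aligned}
T(yu, v) &= \nabla_{yu}(v) - \nabla_v(yu) - [yu, v] \\
&= y\nabla_u(v) - y\nabla_v(u) - (\partial_v y)\cdot u - y[u, v] + (\partial_v y)\cdot u \\
&= yT(u, v),
\end{aligned}$$

$$\begin{aligned}
R(yu, v, w) &= \nabla_{yu}\nabla_v(w) - \nabla_v\nabla_{yu}(w) - \nabla_{[yu,v]}(w) \\
&= y\nabla_u\nabla_v(w) - \nabla_v(y\nabla_u(w)) - \nabla_{y[u,v]}w + \nabla_{\partial_v y\cdot u}w \\
&= y\nabla_u\nabla_v(w) - y\nabla_v\nabla_u(w) - \partial_v y\cdot\nabla_u(w) - y\nabla_{[u,v]}w + \partial_v y\cdot\nabla_u(w) \\
&= yR(u, v, w),
\end{aligned}$$

$$\begin{aligned}
R(u, v, yw) &= \nabla_u\nabla_v(yw) - \nabla_v\nabla_u(yw) - \nabla_{[u,v]}(yw) \\
&= \nabla_u(y\nabla_v(w)) + \nabla_u(\partial_v y\cdot w) - \nabla_v(y\nabla_u(w)) - \nabla_v(\partial_u y\cdot w) - y\nabla_{[u,v]}(w) - \partial_{[u,v]}y\cdot w \\
&= y\nabla_u\nabla_v(w) + \partial_u y\,\nabla_v(w) + \partial_v y\,\nabla_u(w) + \partial_u\partial_v y\cdot w \\
&\quad - y\nabla_v\nabla_u(w) - \partial_v y\,\nabla_u(w) - (\partial_u y)\nabla_v(w) - \partial_v\partial_u y\cdot w \\
&\quad - y\nabla_{[u,v]}w - \partial_u\partial_v y\cdot w + \partial_v\partial_u y\cdot w \\
&= yR(u, v, w).
\end{aligned}$$

The other cases follow by antisymmetry. □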

3.3 Example

The connection defined in Example 2.3, item 2, has zero torsion. This immediately follows from the fact that

$$\Gamma^k_{ij} = \Gamma^k_{ji}. \tag{3.3.1}$$

Compare this to the beginning of Subsection 3.3 of Chapter 14, where we specifically defined the Christoffel symbols in such a way so as to make (3.3.1) true.

In fact, more generally, we see from the comments made in 2.4.2 and formula (2.3.1) of 2.3 that any affine connection has zero torsion if and only if, in local coordinates, it satisfies (3.3.1) in the sense of 2.4.2.
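The last claim can be checked by a one-line computation, sketched here under the conventions of 2.4.2. For coordinate vector fields $e_i = \partial/\partial h^i$ the Lie bracket $[e_i, e_j]$ vanishes (partial derivatives commute), so the definition of $T$ in 3.2 together with the definition of the Christoffel symbols gives

$$T(e_i, e_j) = \nabla_{e_i}(e_j) - \nabla_{e_j}(e_i) - [e_i, e_j] = \left(\Gamma^k_{ij} - \Gamma^k_{ji}\right) e_k .$$

Hence the components of the torsion tensor in these coordinates are $T^k_{ij} = \Gamma^k_{ij} - \Gamma^k_{ji}$, and they vanish exactly when the symmetry (3.3.1) holds.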

3.4 A characterization of the Euclidean connection

Theorem. Let $M$ be a smooth manifold with an affine connection $\nabla$, and let $x \in M$. Then there exists an open neighborhood of $x$ in which $\nabla$ has torsion and curvature tensors equal identically to 0 if and only if there exists an open neighborhood $U$ of $x$ and a coordinate system $h : U \to \mathbb{R}^n$ which sends $\nabla$ restricted to $U$ to the canonical connection (Example 2.3, item 1) on $\mathbb{R}^n$, restricted to $h[U]$.

Proof. Clearly, the Euclidean connection has torsion and curvature 0, and hence the existence of the coordinate system $h : U \to \mathbb{R}^n$ with the specified properties implies that $\nabla$ is torsion and curvature free on $U$.

On the other hand, consider a connection $\nabla$ on $M$ which is torsion and curvature free on an open neighborhood of $x$. Choose a basis $e_1, \dots, e_n$ of $TM_x$. Let $\phi : (-a_1, a_1) \to M$, $\phi(0) = x$, $\phi'(0) = e_1$ be a geodesic with respect to $\nabla$. Now denote the parallel transport of $e_2$ along $\phi$ also by $e_2$ at each point $t_1 \in (-a_1, a_1)$. Let $\phi_{t_1} : (-a_2, a_2) \to M$ be a geodesic with $\phi_{t_1}(0) = \phi(t_1)$, $\phi'_{t_1}(0) = e_2$. Note that we may assume the number $a_2 > 0$ is independent of $t_1$ because of smooth dependence of geodesics on boundary conditions (the argument of 2.4 extends verbatim to this situation). By the same argument, we may also consider $\phi$ as a smooth function

$$\phi : (-a_1, a_1) \times (-a_2, a_2) \to M.$$

We will denote the two independent variables by $t_1 \in (-a_1, a_1)$, $t_2 \in (-a_2, a_2)$. We clearly have

$$\left[\frac{\partial\phi}{\partial t_1}, \frac{\partial\phi}{\partial t_2}\right] = 0 \tag{1}$$

by the commutation of partial derivatives. Write

$$e_i = \frac{\partial\phi}{\partial t_i}, \tag{2}$$

$i = 1, 2$. By the fact that $\nabla$ has 0 curvature, parallel transports along the curves $\phi_{t_1,?}$ and $\phi_{?,t_2}$ with constant $t_1$ resp. $t_2$ therefore commute. We conclude in particular that

$$\nabla_{e_2}(e_1) = 0,$$

since it is true at $t_2 = 0$ by our definition. Since $\nabla$ has 0 torsion, we also have

$$\nabla_{e_1}(e_2) = 0,$$

and since the curvature is 0,

$$\nabla_{e_2}\nabla_{e_1}(e_1) = \nabla_{e_1}\nabla_{e_2}(e_1) = 0.$$

Hence, in fact,

$$\nabla_{e_1}(e_1) = 0,$$

since it is true at $t_2 = 0$ by our definitions. In conclusion,

$$\nabla_{e_i}(e_j) = 0 \tag{3}$$

for $i, j \in \{1, 2\}$.

Now assume, by induction, that we have a function

$$\phi : (-a_1, a_1) \times \dots \times (-a_k, a_k) \to M$$

such that if we define (2), then (3) is true for all $i, j \in \{1, \dots, k\}$. If $k < n$, denote the parallel transport of $e_{k+1}$ to any of the points $(t_1, \dots, t_k)$ by the curves $\phi(t_1, \dots, t_{i-1}, ?, t_{i+1}, \dots, t_k)$ (with only one $t_i$ non-constant) by $e_{k+1}$. Smooth dependence on boundary conditions implies that $\phi$ is a smooth function of the $k+1$ variables $t_1, \dots, t_{k+1}$ on some set

$$(-a_1, a_1) \times \dots \times (-a_{k+1}, a_{k+1})$$

and applying the above argument to individual pairs of coordinates gives (3) for $i, j \in \{1, \dots, k+1\}$.

Thus, we may assume $k = n$. But then $\phi$ is locally the inverse of a local coordinate system on $M$ at $x$ (by the Inverse Function Theorem), and (3) implies that this coordinate system carries the connection $\nabla$ to the Euclidean connection of Example 2.3, item 1, as claimed. □

4 Riemann manifolds

The purpose of this section is to put, finally, everything together. We define a connection canonically associated with a Riemann metric on a smooth manifold, called the Levi-Civita connection. We define the curvature of a Riemann manifold, and prove that vanishing of the curvature locally characterizes Euclidean geometry up to isometry.

4.1 Riemann metrics

A (smooth) Riemann metric on a smooth manifold $M$ is a smooth tensor field of type $(2, 0)$ denoted usually by $g_{ij}$ which is symmetric and such that for each $x \in M$, the symmetric bilinear form on $TM_x$ defined by

$$g(u, v) = g_{ij} u^i v^j$$

is positive-definite (and hence defines a real inner product). A smooth manifold with a Riemann metric is called a Riemann manifold. The fact that we considered an inner product on $TM_x$ (as opposed to $TM^*_x$) is merely a convention: we claim that given a Riemann metric $g_{ij}$, there exists a unique tensor of type $(2, 0)$ denoted by $g^{ij}$ such that

$$g_{ij} g^{jk} = \delta^k_i,$$

which, moreover, defines a positive-definite symmetric bilinear form on $TM^*_x$: Picking an ordered basis $B$ of $TM_x$, the matrix of $g^{ij}$ with respect to the ordered basis of $TM^*_x$ dual to $B$ is the inverse of the matrix of $g_{ij}$ with respect to $B$, which is also positive-definite (see Exercise (2)). Similarly, we could have started with a positive-definite symmetric tensor $g^{ij}$, and a positive-definite symmetric tensor $g_{ij}$ would be determined.

An isometry is a smooth diffeomorphism $f : M \to N$ between Riemannian manifolds with Riemann metrics $g$, $\tilde g$ such that $f_* g = \tilde g$.

4.1.1 Lemma. Every smooth manifold $M$ has a Riemann metric.

Proof. The statement is certainly true if we replace $M$ by one of its coordinate neighborhoods $U_i$ (since for an open subset of $\mathbb{R}^n$, we can take the standard inner product on $\mathbb{R}^n$). Let ${}_i g$ be the Riemann metric on $U_i$, and let $u_i$ be a smooth partition of unity subordinate to the open cover $(U_i)$. Then

$$\sum_i u_i \cdot ({}_i g)$$

is a Riemann metric on $M$. (Note that a linear combination of finitely many positive-definite symmetric matrices with positive coefficients is a positive-definite symmetric matrix.) □

4.1.2 The induced Riemann metric
Lemma 4.1.1 is often very useful technically, but is perhaps not very geometric: the Riemann metric which we proved to exist has no geometric meaning. Typically, we are dealing with a situation where a Riemann metric is given and we are interested in its properties. The most common way a Riemann metric can be given is as follows: suppose we are given a Riemann metric on a smooth manifold $N$, and suppose $\iota : M \subseteq N$ is a smooth submanifold (we could more generally consider the situation when $\iota$ is an immersion). Then we have a naturally induced Riemann metric on $M$, simply because for $x \in M$, we have an embedding $TM_x \subseteq TN_x$. To show that this induced Riemann metric is smooth, recall that $g_{ij}$ is contravariant with respect to $\iota$, so we know that $(g_{ij})_M = \iota^*((g_{ij})_N)$ is a smooth tensor field.

4.2 Riemann metrics and connections

Let $g_{ij}$ be a Riemann metric on a smooth manifold $M$, and let $\nabla$ be an affine connection on $M$. We say that the connection $\nabla$ is compatible with the Riemann metric $g_{ij}$ if $g_{ij}$ is preserved by parallel transport, i.e. for a smooth parametrized curve $\gamma$ with boundary points $x$, $y$, and two vectors $u, v \in TM_x$, if $\tilde u$, $\tilde v$ are the parallel transports of $u, v$ to $TM_y$, we have

$$g_{ij}(\tilde u^i, \tilde v^j) = g_{ij}(u^i, v^j). \tag{4.2.1}$$

An "infinitesimal version" of this condition is (dropping the indices)

$$\partial_u(g(v, w)) = g(\nabla_u(v), w) + g(v, \nabla_u(w)). \tag{2}$$

(See Exercise (4).)

Theorem. For every Riemann metric $g$ on a smooth manifold $M$, there exists a unique affine connection $\nabla$ on $M$ which is compatible with $g$ and has 0 torsion. This affine connection $\nabla$ is known as the Levi-Civita connection.


Proof. We shall prove uniqueness first. Suppose we have a torsion free affine connection compatible with the Riemann metric $g$. Let $u, v, w$ be smooth vector fields on $M$. Compute from (2) and the fact that $\nabla$ is torsion free:

$$\begin{aligned}
&\partial_u(g(v, w)) + \partial_v(g(u, w)) - \partial_w(g(u, v)) \\
&\quad = g(\nabla_u v, w) + g(\nabla_u w, v) + g(\nabla_v u, w) + g(\nabla_v w, u) - g(\nabla_w u, v) - g(\nabla_w v, u) \\
&\quad = 2g(\nabla_u v, w) - g([u, v], w) + g([u, w], v) + g([v, w], u).
\end{aligned}$$

(In the last step, we used $\nabla_v u = \nabla_u v - [u, v]$, $\nabla_u w - \nabla_w u = [u, w]$ and $\nabla_v w - \nabla_w v = [v, w]$.) Therefore,

$$g(\nabla_u v, w) = \frac{1}{2}\Big(\partial_u(g(v, w)) + \partial_v(g(u, w)) - \partial_w(g(u, v)) + g([u, v], w) - g([u, w], v) - g([v, w], u)\Big).$$

Hence, $\nabla_u v$ is determined by $g$.

Now we will prove existence. We will first treat the case when $M = U$ is an open subset of $\mathbb{R}^n$. In this case, consider the connection (2.3.1) constructed in Example 2.3, item 2. We already know from Example 3.3 that this connection is torsion free. To verify that this connection is compatible with the metric $g$, by the chain rule, it suffices to verify the condition (2) in the case when $u = e_i$, $v = e_j$, $w = e_k$. Thus, we need to show that

$$\partial_{e_i}(g(e_j, e_k)) = g(\nabla_{e_i} e_j, e_k) + g(e_j, \nabla_{e_i} e_k),$$

which translates to

$$\frac{\partial g_{ij}}{\partial x^k} = \Gamma_{kij} + \Gamma_{jik},$$

which follows directly from equation (3.3.3) of Chapter 14.

Now let $M$ be an arbitrary smooth Riemann manifold, and let $(U_i)$ be a coordinate cover of $M$. Then by what we just proved, and by locality of connections, we have smooth torsion free connections on each $U_i$ which are compatible with $g$. By uniqueness, further, the connections corresponding to $U_i$ and $U_j$ coincide on $U_i \cap U_j$. Thus, these connections together define a torsion free affine connection on $M$ compatible with $g$. □
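It may help to see what the uniqueness argument yields in coordinates. The following is a sketch that assumes only the formula for $g(\nabla_u v, w)$ obtained in the uniqueness part and the convention $\nabla_{e_i}(e_j) = \Gamma^l_{ij} e_l$ of 2.4.2. For coordinate vector fields $u = e_i$, $v = e_j$, $w = e_k$, all Lie brackets vanish, and $\partial_{e_i}(g(e_j, e_k)) = \partial g_{jk}/\partial x^i$, so

$$g_{lk}\,\Gamma^l_{ij} = g(\nabla_{e_i} e_j, e_k) = \frac{1}{2}\left(\frac{\partial g_{jk}}{\partial x^i} + \frac{\partial g_{ik}}{\partial x^j} - \frac{\partial g_{ij}}{\partial x^k}\right).$$

Multiplying by the inverse matrix $g^{km}$ of 4.1 gives

$$\Gamma^m_{ij} = \frac{1}{2}\, g^{km}\left(\frac{\partial g_{jk}}{\partial x^i} + \frac{\partial g_{ik}}{\partial x^j} - \frac{\partial g_{ij}}{\partial x^k}\right),$$

which is visibly symmetric in $i, j$, in agreement with (3.3.1).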

4.3 The curvature tensor of a Riemann manifold, and a characterization of Euclidean geometry

Let $M$ be a smooth manifold with a Riemann metric $g$. To this data, we have uniquely associated the Levi-Civita connection $\nabla$ by Theorem 4.2. The curvature tensor $R$ of the Levi-Civita connection is called the curvature tensor of the Riemann manifold $M$. The culmination of our work is the following result, which characterizes Euclidean geometry in the world of Riemann manifolds!

Theorem. Let $M$ be a Riemann manifold, and let $x \in M$. Then there exists an open neighborhood of $x$ on which $R = 0$ if and only if there exists an open neighborhood $U$ of $x$ and a smooth map $h : U \to \mathbb{R}^n$ which is an isometry onto its image.

Proof. The necessity of 0 curvature for the existence of $h$ follows directly from Theorem 3.4, and the sufficiency almost does. In effect, if curvature vanishes in a neighborhood of $x$, from Theorem 3.4, we get an open neighborhood $U$ of $x$ and a map $h : U \to \mathbb{R}^n$ which is a diffeomorphism onto its image such that $h$ maps the Levi-Civita connection on $U$ to the Euclidean connection on $h[U]$. Clearly, we may then assume that $U = M$ and $h$ is the identity. Note however that we have not proved the map $h$ preserves Riemann metrics. In effect, we must investigate the question: What Riemann metrics is the Euclidean connection $\nabla$ compatible with?

To answer this question, assume, without loss of generality, that $U$ is connected (in fact, we could assume without loss of generality that it is an open ball). We see from the formulation (4.2.1) of compatibility of the connection with the metric that given an inner product $g_x$ on $TM_x$ for a chosen point $x \in U$, there is at most one Riemann metric $g_{ij}$ on $U$ with which $\nabla$ is compatible and such that $(g_{ij})_x = g_x$ (since the inner product on $TM_y$ for all $y \in U$ is then determined by parallel transport). Since, however, for the Euclidean connection, parallel transport is simply the identity when we make the canonical identification of $TM_y$ with $\mathbb{R}^n$, for any inner product $g_x$ on $TM_x = \mathbb{R}^n$, there is precisely one Riemann metric with which $\nabla$ is compatible, namely the one specified by the same inner product on all $TM_y = \mathbb{R}^n$. Since any two inner product spaces of the same dimension are isomorphic, to get the desired isometry, it suffices to pick an affine map $\alpha : \mathbb{R}^n \to \mathbb{R}^n$ which takes the inner product on $TM_x$ to the standard inner product on $\mathbb{R}^n$ for a single point $x \in U$. We may then put $h = \alpha|_U$. □

Remark: For a general Riemann metric, it is not so easy to characterize all Riemann metrics with which its Levi-Civita connection is compatible, although (for connected manifolds) it remains true that such Riemann metrics are characterized by the inner product they give on $TM_x$ at a single point. Which of these inner products are allowable, however, is related to the notion of holonomy, which we do not discuss here. We refer the interested reader to [21].

5 Riemann surfaces and surfaces with Riemann metric

Despite the fact that both concepts are attributed to Riemann, a Riemann surface is not the same thing as a Riemann manifold which is a surface (i.e. has dimension 2).

A Riemann surface $\Sigma$ is of course, in particular, a 2-dimensional manifold, and hence Lemma 4.1.1 applies. Additionally, $\Sigma$ comes with the structure of a complex manifold, but that is not the same thing as a Riemann metric.


5.1 The compatible complex structure

When putting a Riemann metric on a Riemann surface, we are usually only interested in compatible metrics, which means that for any tangent vector $u \in T\Sigma_x$ for any $x \in \Sigma$, $u$ is orthogonal to $iu$. Nevertheless, the method of Lemma 4.1.1 readily applies to prove the following

Lemma. Every Riemann surface $\Sigma$ has a compatible Riemann metric.

Proof. On an open subset of $\mathbb{C}$, the metric on $\mathbb{C}$ identified with $\mathbb{R}^2$ via the isomorphism

$$z \mapsto (\mathrm{Re}(z), \mathrm{Im}(z)) \tag{5.1.2}$$

is clearly compatible. Let, again, $U_i$ be the coordinate neighborhoods of $\Sigma$, let ${}_i g$ be a compatible Riemann metric on $U_i$ and let $u_i$ be a smooth partition of unity subordinate to $(U_i)$. Then, as before,

$$\sum_i u_i \cdot ({}_i g)$$

is the desired compatible Riemann metric on $\Sigma$. □

In this context, it is also appropriate to make the following

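Observation. Every Riemann surface $\Sigma$ comes with a canonical (i.e. preferred) orientation.

Proof. We will produce a nowhere vanishing 2-form on $\Sigma$. In fact, on an open subset of $\mathbb{C}$, we can simply take the form $dx\,dy$ where $x$ and $y$ are the first and second coordinates of $\mathbb{R}^2$ (i.e. $z = x + iy$). Note again that the coordinates of a complex number $\lambda z = \tilde x + i\tilde y$, $\lambda = |\lambda| e^{i\alpha} \in \mathbb{C}$, are given by

$$\begin{pmatrix} \tilde x \\ \tilde y \end{pmatrix} = |\lambda| \begin{pmatrix} \cos(\alpha) & -\sin(\alpha) \\ \sin(\alpha) & \cos(\alpha) \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}.$$

We conclude that

$$d\tilde x\, d\tilde y = |\lambda|^2\, dx\, dy. \tag{5.1.3}$$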
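The scaling factor $|\lambda|^2$ in (5.1.3) is just the determinant of the real $2 \times 2$ matrix of multiplication by $\lambda$. A minimal Python check, with hypothetical helper names and an arbitrarily chosen $\lambda$, might look as follows.

```python
import cmath

def multiplication_matrix(lam):
    """Real 2x2 matrix of the map z -> lam * z with respect to the basis (1, i)."""
    a, b = lam.real, lam.imag
    return [[a, -b], [b, a]]

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def apply(m, x, y):
    """Apply the matrix m to the column vector (x, y)."""
    return m[0][0] * x + m[0][1] * y, m[1][0] * x + m[1][1] * y

lam = 1.5 * cmath.exp(0.7j)          # |lam| = 1.5, argument 0.7 (sample values)
m = multiplication_matrix(lam)
area_factor = det2(m)                # equals |lam|^2 up to rounding
```

Since the determinant is positive for every $\lambda \neq 0$, holomorphic coordinate changes preserve the sign of $dx\,dy$, which is what the rest of the proof uses.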

Now let $(U_i)$ be the coordinate neighborhoods of $\Sigma$ and let $\omega_i$ be a 2-form induced as above from $dx\,dy$ by the complex coordinate $z = x + iy$ on $U_i$. The key observation is that, by (5.1.3), on the intersection $U_i \cap U_j$,

$$\omega_i = h\,\omega_j$$

where $h$ is a positive smooth real function. Thus, if $u_i$ is, again, a smooth partition of unity subordinate to $(U_i)$, then

$$\omega = \sum_i u_i\,\omega_i$$

is the nowhere vanishing 2-form on $\Sigma$ we were seeking. Simultaneously, it follows that the form obtained from any other complex atlas is a multiple of $\omega$ by a positive smooth real function. □

5.2 The complex structure on an oriented surface with a Riemann metric: reduction to the equation of holomorphic disks

The orientation constructed in the Observation is called a compatible orientation on $\Sigma$. In view of the Observation and Lemma 5.1, it is a natural question if there is a converse, i.e. if every 2-dimensional oriented Riemann manifold has a structure of a Riemann surface with which the Riemann metric and orientation are compatible. The answer is affirmative, but the proof turns out to be quite hard. We will need the full force of the methods of Section 5 of Chapter 13.

Let $\Sigma$ be a 2-dimensional oriented manifold with a Riemann metric. Our task is to construct a complex structure compatible with the metric. Let $x \in \Sigma$. Clearly, it is enough to construct a conformal orientation-preserving coordinate $u : U \to \mathbb{C}$ (with non-singular differential at $x$). It turns out that it is somewhat easier to construct the inverse of the coordinate function $u$, which we will denote by $f = f(z)$. Note that, without loss of generality, we may assume that $U = \Sigma$ is an open subset of $\mathbb{C}$ and $x = 0 = u(x)$, so the function $f$ we seek should map an open neighborhood of 0 onto $U$, $f(0) = 0$, and $Df_0$ should be non-singular and orientation preserving. What is, however, the condition of compatibility of complex structure with Riemann metric in this setting? To understand this, note that a 2-dimensional oriented inner product $\mathbb{R}$-vector space $V$ comes with a canonical complex structure $J$, which means a linear map $J : V \to V$ such that $J^2 = -\mathrm{Id}$. In fact, define $Jv$ to be the vector of length $\|v\|$ which is orthogonal to $v$ and has the property that $v \wedge Jv$ has positive orientation.
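For $V = \mathbb{R}^2$ with its standard orientation, the canonical complex structure $J$ can be written down explicitly. The sketch below assumes the inner product is given by a symmetric positive-definite matrix with entries $E, F, F, H$ (names chosen here for illustration) and uses the closed form $J = \frac{1}{\sqrt{EH - F^2}}\begin{pmatrix} -F & -H \\ E & F \end{pmatrix}$, which one can check directly against the three defining properties. The function names are hypothetical.

```python
import math

def complex_structure(g):
    """Canonical J for the inner product g = [[E, F], [F, H]] on oriented R^2."""
    E, F, H = g[0][0], g[0][1], g[1][1]
    root = math.sqrt(E * H - F * F)   # positive because g is positive-definite
    return [[-F / root, -H / root],
            [E / root, F / root]]

def apply(m, v):
    """Apply a 2x2 matrix to a vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def inner(g, u, v):
    """Inner product g(u, v) = g_ij u^i v^j."""
    gv = apply(g, v)
    return u[0] * gv[0] + u[1] * gv[1]

def wedge(u, v):
    """Coefficient of u ^ v with respect to the standard orientation."""
    return u[0] * v[1] - u[1] * v[0]
```

For the standard inner product ($E = H = 1$, $F = 0$) this reduces to rotation by a right angle, i.e. multiplication by $i$ under the identification (5.1.2).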

In this setting, the Riemann metric therefore specifies, at each $z \in U$, a complex structure $J_z$ on $\mathbb{C} = TU_z$, which varies smoothly as a function of $z$. This is referred to as an almost complex structure. We are therefore seeking a smooth function $f(z)$ defined in a neighborhood of 0 such that

$$Df_z(it) = J_{f(z)} Df_z(t), \qquad f(0) = 0, \qquad \det(Df_0) \neq 0. \tag{5.2.1}$$

This is our first encounter with the equation of holomorphic disks. In order to solve the equation, however, it is more convenient to write it in terms of complex differential 1-forms. A complex differential 1-form $\alpha$ on $U$ is said to be of $J$-type $(1, 0)$ if for every $v \in \mathbb{C}$, and every $z \in U$,

$$\alpha(z)(J_z v) = i\,\alpha(z)(v).$$

(Note that a 1-form of type $(1, 0)$ with respect to the standard complex structure $i$ is simply of the form

$$\lambda(z)\,dz,$$

where $\lambda(z)$ is a smooth function, i.e. not necessarily a holomorphic 1-form.)

Now by definition, there exists a smooth function $\mu : U \to \mathbb{C}$ such that

$$dz = \alpha + \mu(z)\,\bar\alpha$$

where $\alpha$ is of $J$-type $(1, 0)$. We have

$$d\bar z = \bar\alpha + \bar\mu\,\alpha,$$

and hence

$$\alpha = \frac{dz - \mu\, d\bar z}{1 - |\mu|^2}.$$

Thus, the complex 1-form $dz - \mu(z)\,d\bar z$ is of $J$-type $(1, 0)$ and the condition of $f$ being $J$-holomorphic means that

$$f^*(dz - \mu(z)\,d\bar z) = \lambda(z)\,dz$$

for a smooth function $\lambda(z)$. We have

$$f^*(dz) = \frac{\partial f}{\partial z}\,dz + \frac{\partial f}{\partial \bar z}\,d\bar z,$$

$$f^*(d\bar z) = (\bar f)^*(dz) = \frac{\partial \bar f}{\partial z}\,dz + \frac{\partial \bar f}{\partial \bar z}\,d\bar z.$$

Thus, we have

$$f^*(dz - \mu(z)\,d\bar z) = \left(\frac{\partial f}{\partial z} - \mu(f(z))\frac{\partial \bar f}{\partial z}\right)dz + \left(\frac{\partial f}{\partial \bar z} - \mu(f(z))\frac{\partial \bar f}{\partial \bar z}\right)d\bar z.$$

The condition that this be a form of type $(1, 0)$ with respect to the standard complex structure then reads

$$\frac{\partial f}{\partial \bar z} = \mu(f(z))\,\frac{\partial \bar f}{\partial \bar z}. \tag{5.2.2}$$

(Note that $\partial \bar f/\partial \bar z = \overline{\partial f/\partial z}$.)


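Our goal is then to solve the differential equation (5.2.2). To this end, we will make one more reduction. Applying $\partial/\partial z$ to (5.2.2) and writing

$$g(z) = \frac{\partial \mu}{\partial z}, \qquad h(z) = \frac{\partial \mu}{\partial \bar z}, \tag{*}$$

we obtain

$$\begin{aligned}
\frac{\partial^2 f}{\partial z\,\partial \bar z} - \mu(f(z))\frac{\partial^2 \bar f}{\partial z\,\partial \bar z}
&= \frac{\partial \bar f}{\partial \bar z}\cdot\left(g(f(z))\frac{\partial f}{\partial z} + h(f(z))\frac{\partial \bar f}{\partial z}\right) \\
&= \frac{\partial \bar f}{\partial \bar z}\cdot\left(g(f(z))\frac{\partial f}{\partial z} + h(f(z))\,\overline{\mu(f(z))}\,\frac{\partial f}{\partial z}\right) \\
&= \left(g(f(z)) + \overline{\mu(f(z))}\,h(f(z))\right)\cdot\left|\frac{\partial f}{\partial z}\right|^2.
\end{aligned}$$

(The second equality uses the equation (5.2.2).) Putting

$$b(z) = g(z) + \overline{\mu(z)}\,h(z), \tag{5.2.3}$$

we therefore have

$$\frac{\partial^2 f}{\partial z\,\partial \bar z} - \mu(f(z))\frac{\partial^2 \bar f}{\partial z\,\partial \bar z} = b(f(z))\left|\frac{\partial f}{\partial z}\right|^2.$$

The complex conjugate equation is

$$-\overline{\mu(f(z))}\,\frac{\partial^2 f}{\partial z\,\partial \bar z} + \frac{\partial^2 \bar f}{\partial z\,\partial \bar z} = \overline{b(f(z))}\left|\frac{\partial f}{\partial z}\right|^2.$$

Putting

$$a(z) = \frac{b(z) + \mu(z)\,\overline{b(z)}}{1 - |\mu(z)|^2}, \tag{5.2.4}$$

this gives the equation

$$\frac{\partial^2 f}{\partial z\,\partial \bar z} = a(f(z))\left|\frac{\partial f}{\partial z}\right|^2. \tag{5.2.5}$$

Our strategy is first to solve the equation (5.2.5), and then show that the solution (with suitable conditions) also satisfies (5.2.2), and hence (5.2.1).

Before doing so, however, let us briefly consider what restriction we can place on the function $a(z)$. Note that this function is related to the smooth function $\mu(z)$ by the equations (*), (5.2.3) and (5.2.4). On the function $\mu(z)$ we can certainly impose the relation

$$\mu(0) = 0,$$

since we are free to choose the differential of $f$ to preserve the complex structure at 0. Further, by substituting $t = \delta z$ for $\delta > 0$ small if necessary, we can make $\mu(z)$ and its first several chosen partial derivatives arbitrarily small in a chosen neighborhood of 0, and further, since we are only interested in a correct solution in a neighborhood of 0, we may assume $\mu(z) = 0$ for $|z| > 1/2$. Using the equations (*), (5.2.3) and (5.2.4), we can translate this to similar conditions on $a(z)$, i.e., for any fixed chosen $\delta > 0$, we can assume

$$a(0) = 0; \qquad a(z) = 0 \text{ for } |z| > 1/2; \qquad |a(z)|,\ \left|\frac{\partial a}{\partial z}\right|,\ \left|\frac{\partial a}{\partial \bar z}\right| < \delta \text{ for all } z \in \mathbb{C}. \tag{5.2.6}$$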

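5.3 Theorem. There exists a $\delta > 0$ such that for a smooth function $a(z)$ satisfying (5.2.6), there exists a solution $f(z)$ to the equation (5.2.5) with $\partial f/\partial z$, $\partial f/\partial \bar z$ continuous, $f(0) = 0$,

$$\lim_{z \to \infty} f(z) = \infty, \tag{5.3.1}$$

$(\partial f/\partial z)(0) \neq 0$ and

$$\lim_{z \to \infty} \frac{\partial f}{\partial \bar z} = 0. \tag{5.3.2}$$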

Proof. Recall Section 5.2 of Chapter 13. We will find a solution of the form

$$f(z) = z + P_1(\phi(z)), \qquad \phi \in L^3(\mathbb{C}). \tag{5.3.3}$$

Define

$$(A(\phi))(z) = a(z + P_1(\phi)) \cdot |\phi(z) + 1|^2.$$

Let us consider first the equation

$$\frac{\partial \phi}{\partial \bar z} = A(\phi). \tag{5.3.4}$$

In effect, we will solve the equation (5.3.4) in the set $Q_\varepsilon$ of continuous bounded functions on $\mathbb{C}$ which satisfy

$$|\phi(z)| \le \frac{\varepsilon}{1 + |z|} \tag{5.3.5}$$

with the metric induced from the metric on the space $C(\mathbb{C})$ of bounded continuous functions on $\mathbb{C}$ (the supremum metric). Note that obviously, $Q_\varepsilon$ is a closed subset of $C(\mathbb{C})$.

The parameter $\varepsilon > 0$ will be chosen later, but note that (5.3.5) implies

$$Q_\varepsilon \subset L^3(\mathbb{C}).$$

Since

$$|(P_1(\phi))(z)| \le C_3 K \varepsilon |z|^{1/3}$$

where

$$K = \left(\int_{\mathbb{C}} \frac{dx\,dy}{(1 + |z|)^3}\right)^{1/3},$$

choosing $C_3 K \varepsilon < 1/2$ guarantees

$$|z + P_1(\phi)| > 1/2 \text{ for } |z| > 1 - \delta \text{ for some } \delta > 0,$$

so

$$\mathrm{supp}((A(\phi))(z)) \subseteq D. \tag{5.3.6}$$

Let us also assume $0 < \varepsilon < 1$. Now by choosing $\delta > 0$ sufficiently small, we may assume

$$|A(\phi)| < \varepsilon/8$$

and

$$|A(\phi) - A(\psi)| \le \frac{1}{2}|\phi - \psi| \tag{5.3.7}$$

for $\phi, \psi \in Q_\varepsilon$. (Again, we are considering the norm in $C(\mathbb{C})$.)

Now put

$$\phi_1 = 0, \qquad \phi_{n+1} = P(A(\phi_n)).$$

By Lemma 5.3.1 (1) of Chapter 13, we have $\phi_n \in Q_\varepsilon$, and by (5.3.7), $(\phi_n)$ is a Cauchy sequence in $Q_\varepsilon$. Put

$$\phi = \lim_{n \to \infty} \phi_n.$$

Since $P$ is continuous on $C(\mathbb{C})$, we have

$$\phi = PA(\phi), \qquad |\phi(z)| \le \varepsilon/(1 + |z|), \qquad |A(\phi)| < \varepsilon/8. \tag{5.3.8}$$

Now by (5.3.6), $A(\phi)$ has support in $D$, so by Lemma 5.3.1 (2) of Chapter 13,

$$|\phi(z) - \phi(t)| < K|z - t|^{1/3}$$

for a suitable constant $K$. By Lemma 5.3.1 (2) of Chapter 13 again, there exist constants $L, \lambda > 0$ such that

$$|A\phi(z) - A\phi(t)| < L|z - t|^{\lambda},$$

and hence $\phi$ is continuously differentiable by Lemma 5.2.1 of Chapter 13, and moreover satisfies (5.3.4).
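The construction of $\phi$ above is an instance of the usual fixed-point iteration for a map that shrinks distances by a factor of at most $1/2$. The toy Python sketch below is not the operator $PA$ of the proof (which acts on the function space $Q_\varepsilon$); it only illustrates the same iteration scheme for a hypothetical contraction $T(x) = \tfrac12\cos x$ of the real line, whose derivative is bounded by $1/2$ in absolute value.

```python
import math

def T(x):
    """A toy contraction of the real line: |T'(x)| = |sin(x)| / 2 <= 1/2."""
    return 0.5 * math.cos(x)

def fixed_point_iterates(T, x0, steps):
    """Return the list x_0, x_1 = T(x_0), ..., x_steps of successive iterates."""
    xs = [x0]
    for _ in range(steps):
        xs.append(T(xs[-1]))
    return xs

iterates = fixed_point_iterates(T, 0.0, 60)
limit = iterates[-1]   # numerically satisfies limit = T(limit)
```

As in the proof, consecutive iterates satisfy $|x_{n+2} - x_{n+1}| \le \tfrac12 |x_{n+1} - x_n|$, so the sequence is Cauchy and its limit is a fixed point of $T$.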

Now consider the function $f(z)$ defined by (5.3.3). First note that by the definition of $P_1$, $f(0) = 0$. The equality (5.3.1) follows from the second estimate (5.3.8) and from Lemma 5.3.1 (2) of Chapter 13. Also,

$$\frac{\partial P_1(\phi(z))}{\partial z} = \phi(z) - \phi(0)$$

by formula (5.2.4) of Lemma 5.2.1 of Chapter 13. Therefore, we have in (5.3.3)

$$\frac{\partial f}{\partial z} = \phi(z) + 1 - \phi(0),$$

and $f$ is continuously differentiable on $\mathbb{C}$ by Lemma 5.2.1 of Chapter 13. Therefore, $f$ solves the equation (5.2.5), and $\partial f/\partial z$ is non-zero at the point $z = 0$ because

$$|\phi(0)| \le \varepsilon.$$

To prove (5.3.2), it suffices to prove that

$$\lim_{z \to \infty} \frac{\partial P_1(\phi)}{\partial \bar z} = 0. \tag{5.3.9}$$

Because of the second estimate (5.3.8), we can write

$$P_1(\phi(z)) = -\frac{1}{\pi}\int_{\mathbb{C}} \frac{\phi(\zeta)}{\bar\zeta - \bar z}\,ds\,dt + \frac{1}{\pi}\int_{\mathbb{C}} \frac{\phi(\zeta)}{\bar\zeta}\,ds\,dt.$$

The second summand is constant in $z$, the first one is, by substitution $\eta = \zeta - z$, $\eta = u + iv$,

$$-\frac{1}{\pi}\int_{\mathbb{C}} \frac{\phi(\eta + z)}{\bar\eta}\,du\,dv.$$

Differentiating after the integral sign gives

$$-\frac{1}{\pi}\int_{\mathbb{C}} \frac{\partial\phi(z + \eta)/\partial \bar z \cdot du\,dv}{\bar\eta} = -\frac{1}{\pi}\int_{D} \frac{\partial\phi(\zeta)/\partial\bar\zeta \cdot ds\,dt}{\bar\zeta - \bar z}. \tag{5.3.10}$$

Note that the integrand on the right-hand side is 0 outside $D$, which lets us restrict the integration from $\mathbb{C}$ to $D$. This also implies that taking derivatives after the integral sign is legal by Theorem 5.2 of Chapter 5. Now the right-hand side of (5.3.10) obviously tends to 0 with $z \to \infty$, which proves (5.3.9). □

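5.4 Proposition. Any solution $f(z)$ of the equation (5.2.5) which satisfies the conditions of Theorem 5.3 is also a solution of the equation (5.2.2).

Proof. Let $f$ be as assumed. Then, recalling (5.2.4), we have

$$\frac{\partial^2 f}{\partial z\,\partial \bar z} - \mu(f(z))\frac{\partial^2 \bar f}{\partial z\,\partial \bar z} = \left(a(f(z)) - \mu(f(z))\,\overline{a(f(z))}\right)\left|\frac{\partial f}{\partial z}\right|^2 = b(f(z))\left|\frac{\partial f}{\partial z}\right|^2.$$

Using the chain rule, we obtain from (5.2.3)

$$\frac{\partial}{\partial z}\left(\frac{\partial f}{\partial \bar z} - \mu(f(z))\frac{\partial \bar f}{\partial \bar z}\right) + \frac{\partial \bar f}{\partial \bar z}\cdot\frac{\partial \mu(f(z))}{\partial z} = \frac{\partial \bar f}{\partial \bar z}\cdot\left(g(f(z))\cdot\frac{\partial f}{\partial z} + \overline{\mu(f(z))}\cdot h(f(z))\cdot\frac{\partial f}{\partial z}\right).$$

From this, we obtain

$$\frac{\partial}{\partial z}\left(\frac{\partial f}{\partial \bar z} - \mu(f(z))\frac{\partial \bar f}{\partial \bar z}\right) = -\frac{\partial \bar f}{\partial \bar z}\cdot h(f(z))\cdot\left(\frac{\partial \bar f}{\partial z} - \overline{\mu(f(z))}\cdot\frac{\partial f}{\partial z}\right).$$

Setting

$$F(z) = \frac{\partial f}{\partial \bar z} - \mu(f(z))\cdot\frac{\partial \bar f}{\partial \bar z},$$

we therefore have

$$\frac{\partial F}{\partial z} = A(z)\cdot\overline{F(z)}$$

where

$$A(z) = -\frac{\partial \bar f}{\partial \bar z}\cdot h(f(z))$$

is a continuously differentiable function with compact support. Further, we have

$$\lim_{z \to \infty} F(z) = 0$$

(by (5.3.1) and the fact that $\mu$ has compact support). Hence, $F(z) = 0$ for all $z \in \mathbb{C}$ by Theorem 5.3 of Chapter 13, which proves our statement. □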

Therefore, we have finished the proof of the following result.

5.5 Theorem. Every oriented smooth surface $\Sigma$ with a Riemann metric has a compatible complex structure. □

Note that in view of the comments of Subsection 5.3 of Chapter 10 and the Riemann Mapping Theorem 1.2 of Chapter 13, this can be equivalently phrased to say that for every surface $\Sigma$ with a Riemann metric, and any point $x \in \Sigma$, any sufficiently small simply connected open neighborhood of $x$ can be mapped conformally bijectively onto the open unit disk. In cartography, this theorem is of major significance: Note that together with the Riemann Mapping Theorem, we can make a flat local chart of any (smooth) landscape in the shape of any simply connected open set in $\mathbb{C}$ (other than $\mathbb{C}$ itself) which preserves surface angles.
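A classical instance of such an angle-preserving chart is the Mercator projection of the unit sphere. It is not discussed in the text and is used here only as an illustration. The sketch below assumes the standard inverse Mercator formula (latitude $= 2\arctan(e^{y}) - \pi/2$, longitude $= x$), computes the Riemann metric induced from $\mathbb{R}^3$ in the chart coordinates $(x, y)$ by central differences, and lets one check that it is a scalar multiple of the standard metric $dx^2 + dy^2$, which is exactly the condition that the chart preserves angles. The function names are hypothetical.

```python
import math

def inverse_mercator(x, y):
    """Point of the unit sphere in R^3 with longitude x and Mercator ordinate y."""
    lat = 2.0 * math.atan(math.exp(y)) - math.pi / 2.0
    return (math.cos(lat) * math.cos(x),
            math.cos(lat) * math.sin(x),
            math.sin(lat))

def induced_metric(x, y, h=1e-6):
    """Matrix of the metric induced from R^3 in the chart coordinates (x, y)."""
    px = [(a - b) / (2 * h)
          for a, b in zip(inverse_mercator(x + h, y), inverse_mercator(x - h, y))]
    py = [(a - b) / (2 * h)
          for a, b in zip(inverse_mercator(x, y + h), inverse_mercator(x, y - h))]
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    return [[dot(px, px), dot(px, py)],
            [dot(px, py), dot(py, py)]]

g = induced_metric(0.3, 0.8)   # sample point; g is close to a multiple of the identity
```

At the sample point the off-diagonal entries vanish and the two diagonal entries agree (both are approximately $1/\cosh^2(0.8)$) up to discretization error, so the chart distorts lengths but not angles.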

6 Exercises

(1) Let $M$ be a smooth manifold with an affine connection and let $U$ be an open subset of $M$. Let $x^i$, $y^i$ be two different coordinate systems on $U$, and let $\Gamma^k_{ij}$ be the Christoffel symbols with respect to the coordinates $x^i$, and $\bar\Gamma^k_{ij}$ the Christoffel symbols with respect to $y^i$. Prove that

$$\bar\Gamma^k_{ij} = \frac{\partial x^p}{\partial y^i}\frac{\partial x^q}{\partial y^j}\frac{\partial y^k}{\partial x^r}\,\Gamma^r_{pq} + \frac{\partial y^k}{\partial x^m}\frac{\partial^2 x^m}{\partial y^i\,\partial y^j}.$$

Note that the second term is the "error term for the symbol $\Gamma^k_{ij}$ behaving as a tensor of type $(2, 1)$".

(2) Prove that the inverse of a positive-definite symmetric matrix is positive-definite. [Hint: We have $x^T A x > 0$ when $x \neq 0$, and we want to prove $y^T (A^{-1}) y > 0$ for $y \neq 0$. Consider $y = Ax$.]

(3) Let $M$ be a Riemann manifold with Riemann metric $g$. Define, for $x, y \in M$,

$$\rho(x, y) = \inf_{\gamma} s_g(\gamma)$$

where $\gamma$ is a parametrized continuously differentiable curve with boundary points $x, y$. Prove that the function $\rho$ is a metric and that the associated topology to $\rho$ is the topology on $M$ which is a part of the definition of a manifold. [Hint: Use Theorem 4.3.2 of Chapter 14. Keep in mind that one of the things to show is that $\rho(x, y) = 0$ implies $x = y$.]

(4) Prove that the conditions (4.2.1) and (2) of 4.2 are equivalent. [Hint: Integrating condition (2) along a curve $\gamma$ where $\nabla_{\gamma'}(v) = \nabla_{\gamma'}(w) = 0$, $u = \gamma'$ gives (4.2.1). This also means that (4.2.1) implies (2) at points where $\nabla_u(v) = \nabla_u(w) = 0$. Fixing local coordinates, the general case then follows by the chain rule.]

(5) Volume associated with a Riemann metric:

(a) Let $g$ be a Riemann metric defined on a bounded open subset $U \subseteq \mathbb{R}^n$. Assuming $B \subseteq U$ is a Borel set, define

$$\mathrm{vol}_g(B) = \int_B \sqrt{\det(g_{ij})}.$$

Prove that this definition is invariant under diffeomorphism, provided we transform $g_{ij}$ as a tensor of type $(2, 0)$.

(b) Let $M$ be a Riemann manifold with coordinate atlas $(U_p, h_p)_{p \in P}$ and let $u_p$ be a smooth partition of unity subordinate to $U_p$. Recall that $P$ can be chosen to be countable, since we defined manifolds to have a countable basis. Let $B$ be a Borel subset of $M$. Prove that we can write $B$ as a disjoint union of Borel sets $B_p$, $p \in P$, such that $B_p \subseteq U_p$. Put

$$\mathrm{vol}_g(B) = \sum_p \mathrm{vol}_{(h_p)_* g}(h_p[B_p]).$$

Prove that $\mathrm{vol}_g(B)$ does not depend on the choices (i.e. the atlas and the sets $B_p$).

(6) Let $\phi : \langle a, b\rangle \to (0, \infty)$ be a smooth function (taking one-sided derivatives at the boundary points). Consider the smooth map of manifolds

$$\iota : (a, b) \times S^1 \to \mathbb{R}^3$$

given by

$$(x, e^{2it}) \mapsto (x, \phi(x)\cos(t), \phi(x)\sin(t)).$$

Prove that $\iota$ is an embedding of manifolds. Let $g$ be the Riemannian metric on $M = \mathrm{Im}(\iota)$ induced from $\mathbb{R}^3$. Find an explicit formula for the volume (= "area") of $M$ in terms of the function $\phi$. Find the function $\phi$ which minimizes the surface area of $M$ subject to given values $\phi(a), \phi(b) > 0$. You may assume without proof that such a smooth function $\phi$ exists. [Hint: compare with Exercise (2) of Chapter 14.]

(7) (a) Consider the 2-sphere

S2 D f.x; y; z/ 2 R3 j x2 C y2 C z2 D 1g


with the Riemann metric induced from R3. State precisely and provethat geodesics are precisely segments of great circles parametrized by arclength.

(b) Generalize this to the n-sphere.(c) Construct a Riemann metric on R2 in which there exists a geodesic with

boundary points A, B which does not minimize the distance functionalamong continuously differentiable curves with boundary points A, B .[Hint: Remove a point from S2, and induce a Riemann metric on R2 fromthe Riemann metric (a) via the radial projection diffeomorphism.]

(8) Let $M \subseteq N$ be a smooth submanifold, and let $g$ be the Riemann metric on $M$ induced by a Riemann metric $\tilde g$ on $N$. If we denote by $\nabla$ resp. $\tilde\nabla$ the Levi-Civita connection of $g$ resp. $\tilde g$, prove that $(\nabla_u(v))_x$ is the $\tilde g$-orthogonal projection of $\tilde\nabla_u(v)$ onto $TM_x$ for $x \in M$ (note that $\tilde\nabla_u(v)$ is only defined in the sense of 2.2). Use this to compute the curvature tensor of $S^2$ with the Riemann metric induced from $\mathbb{R}^3$. Conclude that no non-empty open set of $S^2$ is isometric to an open set of $\mathbb{R}^2$ (with the respective Riemann metrics). This fact was first rigorously proved by Gauss.

(9) Prove that every 1-dimensional manifold is diffeomorphic either to $S^1$ or to $\mathbb{R}$. [Hint: Use Lemma 4.1.1 and parametrization by arc length.]

(10) Consider the sphere $S$ in $\mathbb{R}^3$ given by the equation

$$x^2 + y^2 + (z - 1)^2 = 1.$$

Identifying the $xy$-plane with $\mathbb{C}$ by

$$z = x + iy,$$

define a map from $S \setminus \{(0, 0, 2)\}$ to $\mathbb{C}$ by mapping a point $P$ on $S$ to the point $Q$ in the $xy$-plane such that $P$, $Q$ and $(0, 0, 2)$ lie on a straight line. This is called the stereographic projection. If we take on $S$ the induced Riemann metric from $\mathbb{R}^3$, and the standard complex structure on $\mathbb{C}$, prove that the stereographic projection gives a coordinate system of a compatible complex structure on $S$ (or, equivalently, a conformal map).
[Hint: This can be done using basic trigonometry. A particularly elegant solution can be obtained by comparing the isometries of $S$ with Möbius transformations on $\mathbb{C} \cup \{\infty\}$.]

16 Banach and Hilbert Spaces: Elements of Functional Analysis

Let us now turn to infinite-dimensional geometry. The simplest such structure is probably that of a Hilbert space. It is highly relevant for analysis, and plays a key role in such areas as stochastic analysis and quantum physics. In this chapter we will discuss the basics of this concept; in the next one we will present some of its uses.

In the process we will also introduce the more general Banach spaces. Some facts about Hilbert spaces readily generalize to Banach ones, but deeper theorems in this much broader area require separate methods. These methods comprise a vast area of mathematics called functional analysis. For good texts on this subject we can recommend, e.g., [17, 19]. In this chapter we will be able to present some of the simpler highlights of functional analysis, in particular the Hahn-Banach Theorem and some of its consequences, and the duality of $L^p$ spaces.

1 Banach and Hilbert spaces

1.1

In this chapter we will work with vector spaces over the field $\mathbb{R}$ of real numbers and the field $\mathbb{C}$ of complex numbers (see Appendix A). Since the case of $\mathbb{C}$ is perhaps less familiar, we will emphasize it, especially in the theory of Hilbert spaces. All we say for $\mathbb{C}$ there remains true essentially verbatim over the field $\mathbb{R}$ as well, and the reader is encouraged to consider what changes are appropriate in the real case (mostly, complex conjugation disappears). In the case of Banach spaces, the cases of $\mathbb{R}$ and $\mathbb{C}$ are sometimes really different. In those cases, we will spell out both alternatives in detail.

Now recall the notion of an inner product from 4.2 of Appendix A and its associated norm (and hence metric) from 1.2.3 of Chapter 2. Recall also the general notion of a norm as introduced in 1.2 of Chapter 2.



If a normed vector space is complete (in the sense of Section 7 of Chapter 2) we speak of a Banach space. If, moreover, the norm has been obtained from an inner product as in 1.2.3 of Chapter 2, we speak of a Hilbert space. By an isomorphism of Banach spaces, we mean a vector space isomorphism which is also a homeomorphism. An isometric isomorphism (briefly isometry) is an isomorphism of vector spaces which preserves the norm. Note that an isometric isomorphism of Banach spaces which are Hilbert also necessarily preserves the inner product (Exercise (2)).

Examples: In particular, $\mathbb{R}^n$ (resp. $\mathbb{C}^n$) equipped with the standard Pythagorean metric is an example of a real (resp. complex) Hilbert space, and more generally, each of the norms $\|v\|_p$ makes $\mathbb{R}^n$, $\mathbb{C}^n$ into a Banach space (Exercise (20) of Chapter 5).

More interestingly, let $B \subseteq \mathbb{R}^n$ be a Borel subset. Recall the spaces $L^p(B)$, $L^p(B, \mathbb{C})$ of Section 8 of Chapter 5. In the present terminology, Theorem 8.5.2 of Chapter 5 says that $L^p(B)$ and $L^p(B, \mathbb{C})$, $1 \le p \le \infty$, are real resp. complex Banach spaces. In fact, on $L^2(B)$, $L^2(B, \mathbb{C})$ we have a real (resp. complex) inner product defined by

$$f \cdot g = \int_B f\,\overline{g},$$

which is finite by the Cauchy-Schwarz inequality applied at every point. Since the norm on $L^2$ is the norm corresponding to this inner product, the spaces $L^2(B)$ and $L^2(B, \mathbb{C})$ are real and complex Hilbert spaces. The spaces $L^p(B)$, $L^p(B, \mathbb{C})$ are, in some sense, the most fundamental examples.

1.2 Theorem. A norm is a uniformly continuous map $V \to \mathbb{R}$.

Proof. We have $\|x\| = \|y + (x - y)\| \le \|y\| + \|x - y\|$ and similarly with the roles of $x$ and $y$ reversed, so

$$\big|\,\|x\| - \|y\|\,\big| \le \|x - y\|. \qquad \square$$

1.3 An important convention

A subspace of a Banach resp. Hilbert space is a subset that is a Banach resp. Hilbert space in the inherited structure. In particular, it is required to be complete. Thus, by Proposition 7.3.1 of Chapter 2,

subspaces of a Banach resp. Hilbert space are precisely closed linear (vector) subspaces.


2 Uniformly convex Banach spaces

2.1

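A normed linear space $V$ is said to be uniformly convex if for every $\varepsilon > 0$ there exists a $\delta > 0$ such that for all $x, y \in V$ we have the implication

$$\Big(\|x\| = \|y\| = 1 \ \text{ and } \ \Big\|\frac{x + y}{2}\Big\| > 1 - \delta\Big) \ \Rightarrow\ \|x - y\| < \varepsilon.$$

To avoid $\varepsilon$-$\delta$ bookkeeping, it is sometimes convenient to rephrase this as the following obviously equivalent statement:

For sequences of elements $(x_n)$, $(y_n)$ in $V$,

$$\Big(\|x_n\| = \|y_n\| = 1 \ \text{ and } \ \Big\|\frac{x_n + y_n}{2}\Big\| \to 1\Big) \ \Rightarrow\ \|x_n - y_n\| \to 0$$

(here the symbol $\to$ indicates the limit with $n \to \infty$).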

Explanation. This condition expresses the intuitive notion of convexity of the (unit) ball in the space as a sort of "bulging". If you take for instance the norm from Example 1.2.2(a) of Chapter 2, the unit ball is a cube; it does not really bulge: two elements $x, y$ on any of its faces may be far from each other while the distance of the mean point $\frac{x+y}{2}$ from the center is still 1. In Example 1.2.2(c) of Chapter 2, on the other hand, if we move $x$, $y$ on the border away from each other, the point $\frac{x+y}{2}$ moves away from the border. Draw a picture.

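2.2 Theorem. A Hilbert space is uniformly convex.

Proof. Choose an $\varepsilon > 0$ and set $\delta = 1 - \sqrt{1 - \frac{\varepsilon^2}{4}}$. If $\|x\| = \|y\| = 1$ and $\|\frac{x+y}{2}\| > 1 - \delta = \sqrt{1 - \frac{\varepsilon^2}{4}}$, then we have

$$1 - \frac{\varepsilon^2}{4} < \frac14 (x + y)(x + y) = \frac14 (1 + yx + xy + 1) = \frac14 (2 + yx + xy)$$

and consequently

$$xy + yx > 2 - \varepsilon^2,$$

so

$$\|x - y\|^2 = (x - y)(x - y) = \|x\|^2 + \|y\|^2 - xy - yx = 2 - (xy + yx) < \varepsilon^2. \qquad \square$$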
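The same estimate can be read off from the parallelogram identity, which holds in every inner product space. This is offered as an alternative sketch of the computation, not as the authors' argument; it uses only $\|x\| = \|y\| = 1$ and $\|\frac{x+y}{2}\|^2 > 1 - \frac{\varepsilon^2}{4}$.

```latex
\begin{align*}
\|x + y\|^2 + \|x - y\|^2 &= 2\|x\|^2 + 2\|y\|^2 = 4,\\
\|x - y\|^2 &= 4 - 4\,\Big\|\frac{x + y}{2}\Big\|^2
  < 4 - 4\Big(1 - \frac{\varepsilon^2}{4}\Big) = \varepsilon^2 .
\end{align*}
```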

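2.3 Lemma. Let $y_n, z_n$ be elements of a uniformly convex Banach space such that

$$\lim \|y_n\| = \lim \|z_n\| = \lim \Big\|\frac{y_n + z_n}{2}\Big\| = 1.$$

Then $\lim \|y_n - z_n\| = 0$.

Proof. First, we obviously have

$$\lim \Big\|\frac{z_n}{\|z_n\|} - \frac{z_n}{\|y_n\|}\Big\| = \lim \Big\|\frac{1}{\|z_n\|}\Big(1 - \frac{\|z_n\|}{\|y_n\|}\Big) z_n\Big\| = 0. \tag{2.3.1}$$

Since the norm is a continuous function, it follows from (2.3.1) and the assumptions that

$$\lim \Big\|\frac12\Big(\frac{z_n}{\|z_n\|} + \frac{y_n}{\|y_n\|}\Big)\Big\| = \lim \Big\|\frac{1}{\|y_n\|}\cdot\frac{z_n + y_n}{2} + \frac12\Big(\frac{z_n}{\|z_n\|} - \frac{z_n}{\|y_n\|}\Big)\Big\| = 1$$

and hence we obtain, by the uniform convexity, that

$$\lim \Big\|\frac{y_n}{\|y_n\|} - \frac{z_n}{\|z_n\|}\Big\| = 0$$

and we conclude, using (2.3.1) again, that

$$\lim \|y_n - z_n\| = \lim \|y_n\| \cdot \Big\|\frac{y_n}{\|y_n\|} - \frac{z_n}{\|z_n\|} + \frac{z_n}{\|z_n\|} - \frac{z_n}{\|y_n\|}\Big\| = 0. \qquad \square$$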

2.4 Theorem. Let $K$ be a closed convex subset of a uniformly convex Banach space $B$ and let $a \in B$. Then there exists precisely one element $y \in K$ such that

$$\|y - a\| = \inf\{\|x - a\| \mid x \in K\}.$$

Proof. The maps $x \mapsto x - a$ and $x \mapsto \alpha x$ (for $\alpha \ne 0$) are obviously homeomorphisms preserving convexity. Thus, except for the trivial case of $a \in K$, we can assume that

$$a = o \quad\text{and}\quad \inf\{\|x\| \mid x \in K\} = 1.$$

Then there exists a sequence $(x_n)$, $n = 1, 2, \dots$, of elements of $K$ such that

$$\lim \|x_n\| = 1.$$

Since $K$ is convex we have

$$1 \le \Big\|\frac{x_n + x_m}{2}\Big\| \le \frac12 (\|x_n\| + \|x_m\|). \tag{2.4.1}$$

Suppose that the sequence $(x_n)_n$ is not Cauchy. Then there exist subsequences $(y_n)_n$ and $(z_n)_n$ such that for some $\varepsilon_0 > 0$ and all $n$,

$$\|y_n - z_n\| \ge \varepsilon_0.$$


However, we have $\lim \|y_n\| = \lim \|z_n\| = 1$ and by (2.4.1) also $\lim \|\frac{y_n + z_n}{2}\| = 1$, and hence by Lemma 2.3, $\lim \|y_n - z_n\| = 0$, a contradiction.

Thus, $(x_n)_n$ is a Cauchy sequence and if we set $y = \lim x_n$ we have $y \in K$ (since $K$ is closed) and $\|y\| = 1$. If we had $\|z\| = 1$ for another $z \in K$ we would have, according to the same reasoning as above, a Cauchy sequence $y, z, y, z, \dots, y, z, \dots$, which is possible only if $z = y$. □
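A small worked contrast may help to see where uniform convexity enters. The set $K$, the point $a = o$ and the two norms on $\mathbb{R}^2$ below are illustrative assumptions, not taken from the text: with the Euclidean norm (a Hilbert space norm) the nearest point is unique, while with the maximum norm (complete, but not uniformly convex) it is not.

```latex
\[
K = \{(1, t) \mid t \in \mathbb{R}\}, \qquad a = o:
\]
\begin{align*}
\|(1, t)\|_2 &= \sqrt{1 + t^2} \ \ge\ 1, \quad\text{with equality only for } t = 0,\\
\|(1, t)\|_\infty &= \max(1, |t|) = 1 \quad\text{for every } |t| \le 1 .
\end{align*}
```

In the second case every point $(1, t)$ with $|t| \le 1$ realizes the infimum, so the uniqueness claim of Theorem 2.4 fails without uniform convexity.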

3 Orthogonal complements and continuous linear forms

3.1

Similarly as in 4.6 of Appendix A, we define for a subspace $M$ of a Hilbert space $H$

$$M^\perp = \{x \mid xy = 0 \text{ for all } y \in M\}.$$

Note that from the property $xx = 0 \Rightarrow x = o$ of the scalar product it follows that

$$M \cap M^\perp = \{o\}.$$

Also note that $M^\perp$ is a Hilbert subspace: it is obviously a vector subspace of $H$, and it is closed since the mapping $((x, y) \mapsto xy) : H \times H \to \mathbb{C}$ resp. $\mathbb{R}$ is continuous (indeed, we have $|xy - x'y'| = |xy - xy' + xy' - x'y'| \le |xy - xy'| + |xy' - x'y'| \le \|x\| \cdot \|y - y'\| + \|y'\| \cdot \|x - x'\|$).

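3.2 Theorem. Let $M$ be a (Hilbert) subspace of a Hilbert space $H$. Then each $x \in H$ can be uniquely written as

$$x = y + z \quad\text{with } y \in M \text{ and } z \in M^\perp.$$

Proof. Using 2.4 (which applies by 2.2), consider the element $y \in M$ for which

$$\|x - y\| = \min\{\|x - u\| \mid u \in M\}$$

and put $z = x - y$. For a general non-zero $u \in M$ we have $y + \frac{zu}{uu}u \in M$, so

$$\Big\|z - \frac{zu}{uu}u\Big\| \ge \|x - y\| = \|z\|,$$

and hence

$$\|z\|^2 - \overline{\frac{zu}{uu}}\,zu - \frac{zu}{uu}\,uz + \frac{zu}{uu}\,\overline{\frac{zu}{uu}}\,uu - \|z\|^2 \ge 0.$$

Each of the three products equals $\frac{|zu|^2}{uu}$, hence

$$-|zu|^2 = -(zu)\overline{(zu)} = -(zu)(uz) \ge 0,$$

so $zu = 0$, and finally $z \in M^\perp$.

If we have $x = z + y = z' + y'$ with $y, y' \in M$ and $z, z' \in M^\perp$ then $y - y' = z' - z$ and these differences are in $M \cap M^\perp = \{o\}$. □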
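As a finite-dimensional sanity check of the decomposition, here is a worked instance in $\mathbb{R}^2$ with the standard inner product; the subspace $M$ and the vector $x$ are illustrative choices, not taken from the text. The component $y$ is computed by the same coefficient $\frac{xu}{uu}$ that appears in the proof.

```latex
\[
M = \{(t, t) \mid t \in \mathbb{R}\},\quad u = (1, 1),\quad x = (3, 1):
\]
\begin{align*}
y &= \frac{xu}{uu}\,u = \frac{4}{2}\,(1, 1) = (2, 2) \in M,\\
z &= x - y = (1, -1), \qquad zu = 1 - 1 = 0, \ \text{ so } z \in M^\perp,\\
\|x\|^2 &= 10 = 8 + 2 = \|y\|^2 + \|z\|^2 .
\end{align*}
```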

3.3 Theorem. $(M^\perp)^\perp = M$ for all subspaces $M$.

Proof. Obviously $M \subseteq (M^\perp)^\perp$. Now let $x \in (M^\perp)^\perp$. Using 3.2, write $x = y + z$ with $y \in M$ and $z \in M^\perp$. Then

$$zz = zx - zy = 0 - 0 = 0,$$

and hence $z = o$ and $x = y \in M$. □

3.4

By Theorem 3.3, the mapping $M \mapsto M^\perp$ is a bijection of the set of all subspaces of $H$ onto itself. Since it obviously reverses order by inclusion (i.e. $M_1 \subseteq M_2 \Rightarrow M_2^\perp \subseteq M_1^\perp$), we have

$$(M \cap N)^\perp = M^\perp + N^\perp \quad\text{and}\quad (M + N)^\perp = M^\perp \cap N^\perp$$

where $M + N$ is the smallest subspace containing both $M$ and $N$.

3.5 Theorem. Let $V, V'$ be normed vector spaces (real or complex). Then the following statements for a linear operator $f : V \to V'$ are equivalent.
(1) $f$ is continuous.
(2) $f$ is uniformly continuous.
(3) There exists a number $K$ such that

$$\|x\| \le 1 \ \Rightarrow\ \|f(x)\| \le K.$$

Because of condition (3) of the theorem, continuous linear operators between normed linear spaces are also referred to as bounded.

Proof. (2)$\Rightarrow$(1) is trivial.

(1)$\Rightarrow$(3): Suppose the implication does not hold. Then there exist $x_k \in V$ such that $\|x_k\| \le 1$ and $\|f(x_k)\| \ge k$. Put $y_k = \frac1k x_k$. Then $\lim y_k = o$ while $\|f(y_k)\| \ge 1$ and hence $f(y_k)$ cannot converge to $o = f(o)$.

(3)$\Rightarrow$(2): Suppose such a $K$ exists; we may assume $K > 0$. For $\varepsilon > 0$, put $\delta = \frac{\varepsilon}{K}$. Now if $\|x - y\| < \delta$, then $\|\frac{K}{\varepsilon}(x - y)\| \le 1$, and hence

$$\|f(x) - f(y)\| = \|f(x - y)\| = \frac{\varepsilon}{K}\Big\|\frac{K}{\varepsilon} f(x - y)\Big\| \le \frac{\varepsilon}{K} K = \varepsilon. \qquad \square$$


3.5.1

This leads to a concept of a norm of a continuous linear map $f : V \to V'$ between normed vector spaces defined by

$$\|f\| = \sup\{\|f(x)\| \mid \|x\| \le 1\}.$$

It is an easy exercise to show that it is indeed a norm on the vector space

$$L(V, V')$$

of all continuous linear maps $f : V \to V'$ (with the natural addition and multiplication by scalars).

3.5.2

A linear form on a real or complex normed vector space $V$ is a continuous linear mapping $V \to \mathbb{R}$ resp. $V \to \mathbb{C}$. Similarly as in 1.1 of Chapter 11, we will denote by

$$V^*$$

the space of all linear forms on $V$. This is called the dual space of the normed vector space $V$. Note, however, that, unlike in 1.1 of Chapter 11, we now take the continuous linear forms only. The definition from 3.5.1 yields a norm on $V^*$ defined by

$$\|\varphi\| = \sup\{|\varphi(x)| \mid \|x\| \le 1\}.$$

3.5.3

Similarly as in 1.2 of Chapter 11, we have for a continuous linear mapping $f : V \to V'$ a linear mapping $f^* : (V')^* \to V^*$ defined by

$$f^*(\varphi) = \varphi \circ f$$

(if $f, \varphi$ are continuous then the composition $\varphi \circ f$ is continuous as well). We will show that $f^*$ is continuous. This is an immediate consequence of the following

3.6 Lemma. We have $\|f^*(\varphi)\| \le \|f\| \cdot \|\varphi\|$.

Proof. We have $\|f^*(\varphi)\| = \|\varphi \circ f\| = \sup\{|\varphi(f(x))| \mid \|x\| \le 1\}$. The case $f = 0$ is trivial, so let $f \ne 0$. If $\|x\| \le 1$ then $\|f(x)\| \le \|f\|$. Thus, $\frac{1}{\|f\|}\|f(x)\| \le 1$ and $\frac{1}{\|f\|}|\varphi(f(x))| = |\varphi(\frac{1}{\|f\|} f(x))| \le \|\varphi\|$. □

3.6.1 Theorem. (The Riesz Representation Theorem) Let $H$ be a Hilbert space. Then
- for every $a \in H$, the mapping $(x \mapsto xa) : H \to \mathbb{C}$ is a linear form, and
- on the other hand every linear form $\varphi : H \to \mathbb{C}$ is given by the formula $(x \mapsto xa)$ for a uniquely determined $a \in H$.

Proof. The first statement is obvious.

Now let $\varphi : H \to \mathbb{C}$ be a continuous linear mapping. If it is constant (and hence, zero everywhere) we can set $\varphi(x) = xo$. Otherwise

$$M = \{x \mid \varphi(x) = 0\}$$

is a subspace unequal to $H$ and hence $M^\perp \ne \{o\}$, by 3.2. First we will show that $\dim M^\perp = 1$. Indeed, let $o \ne x, y \in M^\perp$ and consider $u = \varphi(y)x - \varphi(x)y$. Then

$$\varphi(u) = \varphi(y)\varphi(x) - \varphi(x)\varphi(y) = 0$$

and hence $u \in M \cap M^\perp = \{o\}$. Thus, $\varphi(y)x - \varphi(x)y = o$ and since $x, y$ are non-zero in $M^\perp$, $\varphi(x), \varphi(y)$ are non-zero and $x, y$ are linearly dependent.

Thus, we have

$$M^\perp = \{\alpha b \mid \alpha \in \mathbb{C}\}$$

for some $b \ne o$. Now by 3.2, a general $x \in H$ can be written as

$$x = x_M + \alpha(x) b \quad\text{with } x_M \in M.$$

Hence we have

$$\varphi(x) = \alpha(x)\varphi(b) \quad\text{and}\quad xb = \alpha(x)(bb).$$

Comparing these two equations we obtain

$$\varphi(x) = xa \quad\text{where}\quad a = \frac{\overline{\varphi(b)}}{bb}\, b.$$

The uniqueness is obvious (if $a \ne b$ then $0 \ne (a - b)(a - b)$ and hence $xa \ne xb$ for $x = a - b$). □

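3.7 Lemma. Let $\varphi : H \to \mathbb{C}$ be given by $\varphi(x) = xa$. Then

$$\|\varphi\| = \sup\{|\varphi(x)| \mid \|x\| \le 1\} = \|a\|.$$

Proof. If $\|x\| \le 1$ then $|\varphi(x)| = |xa| \le \|x\| \|a\| \le \|a\|$. On the other hand, for $a \ne o$ we have $\varphi(\frac{1}{\|a\|} a) = \frac{1}{\|a\|} aa = \|a\|$. □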
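For a concrete feel of Lemma 3.7, here is a worked instance in $\mathbb{R}^2$ with the standard inner product; the vector $a = (3, 4)$ is an illustrative choice, not taken from the text.

```latex
\[
a = (3, 4),\qquad \varphi(x) = xa = 3x_1 + 4x_2,\qquad \|a\| = \sqrt{9 + 16} = 5:
\]
\begin{align*}
|\varphi(x)| &\le \|x\|\,\|a\| \le 5 \quad\text{whenever } \|x\| \le 1,\\
\varphi\Big(\frac{1}{\|a\|}\,a\Big) &= \varphi\Big(\frac35, \frac45\Big) = \frac{9}{5} + \frac{16}{5} = 5 = \|a\| .
\end{align*}
```

So the supremum defining $\|\varphi\|$ is attained at the normalized vector $a / \|a\|$.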

3.7.1

A map $f$ between vector spaces over $\mathbb{C}$ is said to be antilinear if it preserves addition and sends $\alpha z$ to $\overline{\alpha} f(z)$.

Theorem. The correspondence $\theta = \theta_H : H \to H^*$ defined by $\theta(a)(x) = xa$ is bijective, antilinear and preserves norms.

Proof. $\theta$ is one-one onto by Theorem 3.6.1. We have $\theta(a + b)(x) = x(a + b) = xa + xb = \theta(a)(x) + \theta(b)(x)$, and $\theta(\alpha z)(x) = x(\alpha z) = \overline{\alpha}(xz)$. The norm is preserved by Lemma 3.7. □

3.7.2 Remark

Note that in the case of Hilbert spaces over $\mathbb{R}$, the mappings $\theta_H$ are norm preserving isomorphisms.

3.8

Let $f : H \to H'$ be a continuous linear mapping. By 3.5.3 we have a continuous linear mapping $f^* : (H')^* \to H^*$ dual to $f$. On the other hand, in view of 3.7.1, we have a continuous linear mapping associated with $f$ going in the same direction, namely the $g$ from the commutative diagram

$$\begin{array}{ccc}
H & \xrightarrow{\ \theta_H\ } & H^* \\
{\scriptstyle f}\downarrow & & \downarrow{\scriptstyle g = \theta_{H'} f \theta_H^{-1}} \\
H' & \xrightarrow{\ \theta_{H'}\ } & (H')^*.
\end{array}$$

This calls for a closer analysis. For a continuous linear mapping $f$ and a fixed $y \in H'$ we have the linear form, obviously continuous,

$$h = (x \mapsto f(x)y).$$

By Theorem 3.6.1, there is, hence, a $z \in H$ such that

$$h = (x \mapsto xz).$$

Setting $z = f^{Ad}(y)$ we obtain a mapping $H' \to H$ satisfying the formula

$$\forall x, y: \quad f(x) \cdot y = x \cdot f^{Ad}(y).$$

This mapping $f^{Ad}$ is referred to as the mapping adjoint to $f$. We will show that the mapping $g$ from the diagram above is equal to $(f^{Ad})^*$. Indeed, we have

$$(f^{Ad})^*(\theta_H(a))(x) = (\theta_H(a) \circ f^{Ad})(x) = \theta_H(a)(f^{Ad}(x)) = f^{Ad}(x) \cdot a = \overline{a \cdot f^{Ad}(x)} = \overline{f(a) \cdot x} = x \cdot f(a) = \theta_{H'}(f(a))(x).$$


3.9

A continuous linear mapping $f : H \to H$ is said to be Hermitian if it is adjoint to itself, that is, if $f = f^{Ad}$, explicitly

$$f(x) \cdot y = x \cdot f(y) \quad\text{for all } x, y \in H.$$

Remark. Hermitian mappings (one also speaks of Hermitian operators) play an important role in theoretical physics. It is a useful exercise to show that Hermitian operators $\mathbb{C}^n \to \mathbb{C}^n$ are associated with matrices $A$ such that

$$A = \overline{A}^T = A^*$$

(we have in mind the complex case with $xy = \sum x_i \overline{y_i}$ and the complex conjugate matrix defined by $\overline{(a_{ij})_{ij}} = (\overline{a_{ij}})_{ij}$; $A^T$ is, as usual, the transposed matrix). Recall from 7.2 of Appendix A that $A^*$ is sometimes called the adjoint matrix.

3.9.1

The eigenvalues of a linear operator $f : H \to H$ are numbers $\lambda$ such that $f(u) = \lambda u$ for a non-zero $u$, and the $u$'s satisfying such equations are called eigenvectors (compare 5.1 of Appendix B). We have

Theorem. 1. All the eigenvalues of a Hermitian operator $f$ are real.
2. Two eigenvectors associated with different eigenvalues are orthogonal.

Proof. 1. Let $f(u) = \lambda u$ and $u \ne o$. Then we have

$$\lambda (u \cdot u) = \lambda u \cdot u = f(u) \cdot u = u \cdot f(u) = u \cdot (\lambda u) = \overline{\lambda} (u \cdot u).$$

2. Let $f(u) = \alpha u$ and $f(v) = \beta v$, $\alpha \ne \beta$. Then we have (using that $\beta$ is real by 1)

$$(\alpha - \beta) uv = \alpha (uv) - \beta (uv) = (\alpha u) v - u (\beta v) = f(u) v - u f(v) = 0. \qquad \square$$
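Both statements can be checked by hand on a small matrix. The following sketch uses an illustrative $2 \times 2$ matrix (not taken from the text) acting on $\mathbb{C}^2$ by $f(x) = Ax$, with the inner product $xy = \sum x_i \overline{y_i}$ from the Remark in 3.9.

```latex
\[
A = \begin{pmatrix} 2 & i \\ -i & 3 \end{pmatrix} = \overline{A}^{T},
\qquad
\det(A - \lambda E) = \lambda^2 - 5\lambda + 5,
\qquad
\lambda_{\pm} = \frac{5 \pm \sqrt{5}}{2} \in \mathbb{R}.
\]
\begin{align*}
u_{\pm} &= (i,\ \lambda_{\pm} - 2), \qquad A u_{\pm} = \lambda_{\pm} u_{\pm}
  \quad (\text{using } \lambda_{\pm}^2 = 5\lambda_{\pm} - 5),\\
u_{+} u_{-} &= i \cdot \overline{i} + (\lambda_{+} - 2)(\lambda_{-} - 2)
  = 1 + \lambda_{+}\lambda_{-} - 2(\lambda_{+} + \lambda_{-}) + 4
  = 1 + 5 - 10 + 4 = 0 .
\end{align*}
```

Here $E$ denotes the identity matrix; the eigenvalues are real and the two eigenvectors are orthogonal, as the theorem predicts.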

4 Infinite sums in a Hilbert space and Hilbert bases

4.1

We say that a system $(x_j)_{j \in J}$ of elements of a Hilbert space has a sum $x$ and write

$$x = \sum_J x_j$$

if for every $\varepsilon > 0$ there exists a finite $J(\varepsilon) \subseteq J$ such that for every finite $K$ such that $J(\varepsilon) \subseteq K \subseteq J$ we have

$$\Big\|x - \sum_{j \in K} x_j\Big\| < \varepsilon.$$

Observation. If a sum of $(x_j)_{j \in J}$ exists then it is uniquely determined.

(Indeed, let the statement above hold for $x$ and $y$. Then, for a suitable finite $K$,

$$\|x - y\| \le \Big\|x - \sum_{j \in K} x_j\Big\| + \Big\|y - \sum_{j \in K} x_j\Big\| < 2\varepsilon.)$$

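4.2 Theorem. $(x_j)_{j \in J}$ has a sum if and only if for every $\varepsilon > 0$ there exists a finite subset $K(\varepsilon) \subseteq J$ such that for each finite subset $K \subseteq J$ satisfying $K \cap K(\varepsilon) = \emptyset$ one has $\|\sum_K x_j\| < \varepsilon$.

Proof. $\Rightarrow$: Consider an $\varepsilon > 0$ and put $K(\varepsilon) = J(\frac{\varepsilon}{2})$. Let $K$ be finite and such that $K \cap K(\varepsilon) = \emptyset$. Then we have

$$\Big\|\sum_K x_j\Big\| = \Big\|\sum_{K \cup K(\varepsilon)} x_j - \sum_{K(\varepsilon)} x_j\Big\| \le \Big\|\sum_{K \cup K(\varepsilon)} x_j - x\Big\| + \Big\|\sum_{K(\varepsilon)} x_j - x\Big\| < \varepsilon.$$

$\Leftarrow$: Set $K_n = K(1) \cup K(\frac12) \cup \dots \cup K(\frac1n)$ and $y_n = \sum_{j \in K_n} x_j$. From the assumption we easily see that $(y_n)_n$ is a Cauchy sequence and hence it has a limit $x = \lim y_n$. We will show that $x = \sum_J x_j$.

Choose an $\varepsilon > 0$ and an $n$ such that $\|x - y_n\| < \frac{\varepsilon}{2}$ and at the same time $\frac1n < \frac{\varepsilon}{2}$. Take a finite $K \supseteq K_n$ and set $L = K \setminus K_n$. Then

$$\Big\|x - \sum_K x_j\Big\| = \Big\|x - y_n - \sum_L x_j\Big\| \le \|x - y_n\| + \Big\|\sum_L x_j\Big\| \le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$

(The last inequality uses $L \cap K_n = \emptyset$.) □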

4.3 Theorem. A system $(x_j)_{j \in J}$ has a sum $x$ if and only if either $J$ is finite and $\sum_{j \in J} x_j = x$ in the ordinary sense, or the following conditions hold simultaneously:

(a) for at most countably many $j$, $x_j \ne o$,
(b) whenever we order the $x_j \ne o$ in a sequence $x_1, x_2, \dots$ we have

$$\lim_n \sum_{k=1}^n x_k = x$$

with the same result $x$.

Proof. We will use the same notation as above.

$\Rightarrow$: The set $L = \bigcup_{n=1}^\infty K(\frac1n)$ is countable and if $j \notin L$ then $\|x_j\| < \frac1n$ for all $n$, and hence $x_j = o$. Thus, without loss of generality,

$$J = \{1, 2, \dots, n, \dots\}.$$

For $\varepsilon > 0$ choose $n_\varepsilon$ such that $J(\varepsilon) \subseteq \{1, 2, \dots, n_\varepsilon\}$. Then for $n \ge n_\varepsilon$, we obviously have

$$\Big\|x - \sum_{k=1}^n x_k\Big\| < \varepsilon.$$

$\Leftarrow$: Suppose the sum $\sum_J x_j$ does not exist. Choose a fixed order $x_1, x_2, \dots$. Then the limit $x = \lim_n \sum_{k=1}^n x_k$ either does not exist or it does but it is not $\sum_J x_j$. In the latter case, by the definition, there exists an $a > 0$ such that

$$\forall \text{ finite } L \subseteq J \ \ \exists \text{ finite } K(L) \text{ such that } L \subseteq K(L) \subseteq J \text{ and } \Big\|\sum_{K(L)} x_j - x\Big\| \ge a.$$

Put

$$A_1 = \{1\}, \quad B_1 = K(A_1),$$

$$A_2 = \{1, 2, \dots, \max B_1 + 1\}, \quad B_2 = K(A_2)$$

and further, assuming $A_1, \dots, A_n$, $B_1, \dots, B_n$ are already determined, put

$$A_{n+1} = \{1, 2, \dots, \max B_n + 1\}, \quad B_{n+1} = K(A_{n+1}).$$

Now $A_1 \subseteq B_1 \subsetneq A_2 \subseteq B_2 \subsetneq A_3 \subseteq \cdots$ and

$$\lim_n \Big\|\sum_{A_n} x_j - x\Big\| = 0 \quad\text{while}\quad \Big\|\sum_{B_n} x_j - x\Big\| \ge a. \tag{4.3.1}$$

If we rearrange the sequence $x_1, x_2, \dots$ into a sequence $y_1, y_2, \dots$ by taking successively all $x_j$'s from the blocks

$$A_1,\ B_1 \setminus A_1,\ A_2 \setminus B_1,\ \dots,\ A_n \setminus B_{n-1},\ B_n \setminus A_n,\ A_{n+1} \setminus B_n,\ \dots$$

(the $x_j$ in the individual blocks ordered arbitrarily), we see that in view of (4.3.1), $\lim_n \sum_{k=1}^n y_k$ does not exist. □


4.4 Theorem. Let $\sum_J x_j$ and $\sum_J y_j$ exist in a Hilbert space $H$. Then

(1) $\sum_J \alpha x_j$ exists and is equal to $\alpha \sum_J x_j$,

(2) $\sum_J (x_j + y_j)$ exists and is equal to $\sum_J x_j + \sum_J y_j$, and

(3) for every $z$ the sum $\sum_J (x_j z)$ exists and is equal to $(\sum_J x_j) z$.

Proof. (1) and (2) are straightforward.

(3): The mapping $(x \mapsto xz)$ is continuous. By Theorem 4.3, we can think of the system $(x_j)_j$ as of a sequence $x_1, x_2, \dots$ with the sum $x = \lim_n \sum_{k=1}^n x_k$ and conclude that

$$xz = \Big(\lim_n \sum_{k=1}^n x_k\Big) z = \lim_n \sum_{k=1}^n (x_k z) = \sum_J (x_j z). \qquad \square$$

4.5

Similarly as in 4.5 of Appendix A, we will speak of an orthogonal system $(x_j)_{j \in J}$ if $x_j x_k = 0$ whenever $j \ne k$. If, moreover, $\|x_j\| = 1$ for all $j \in J$ we say that the system is orthonormal.

4.6 Theorem. (Generalized Pythagoras' Theorem) An orthogonal system $(x_j)_j$ in a Hilbert space has a sum if and only if the system $(\|x_j\|^2)_j$ has a sum in $\mathbb{R}$. In that case, we have

$$\Big\|\sum_J x_j\Big\|^2 = \sum_J \|x_j\|^2.$$

Proof. I. Existence:

$\Rightarrow$: Consider the sets $K(\varepsilon)$ from 4.2. If $K \subseteq J$ is finite and $K \cap K(\varepsilon) = \emptyset$ then, using orthogonality,

$$\sum_K \|x_j\|^2 = \sum_{j,k \in K} x_j x_k = \Big(\sum_K x_j\Big)\Big(\sum_K x_j\Big) = \Big\|\sum_K x_j\Big\|^2 < \varepsilon^2.$$

$\Leftarrow$: Reason as in the $\Rightarrow$ implication but in reverse, using, this time, the sets $K(\varepsilon^2)$.

II. The equality:

Set $x = \sum_J x_j$. By 4.4(3), we have

$$xx = \Big(\sum_J x_j\Big) x = \sum_J (x_j x) = \sum_J x_j \Big(\sum_J x_k\Big) = \sum_J \sum_J (x_j x_k) = \sum_j x_j x_j. \qquad \square$$
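To see the criterion at work, consider the following sketch. It assumes the Hilbert space $\ell^2$ of square-summable sequences with its standard orthonormal vectors $e_j$, which is not introduced in the text above, and uses the classical value $\sum_j 1/j^2 = \pi^2/6$.

```latex
\begin{align*}
x_j = \tfrac{1}{j}\, e_j : &\qquad \sum_{j} \|x_j\|^2 = \sum_{j=1}^{\infty} \frac{1}{j^2} = \frac{\pi^2}{6}
  \ \Longrightarrow\ \sum_{j} x_j \text{ exists and } \Big\|\sum_{j} x_j\Big\|^2 = \frac{\pi^2}{6},\\
x_j = \tfrac{1}{\sqrt{j}}\, e_j : &\qquad \sum_{j} \|x_j\|^2 = \sum_{j=1}^{\infty} \frac{1}{j} = \infty
  \ \Longrightarrow\ \sum_{j} x_j \text{ does not exist.}
\end{align*}
```

In the second case the individual terms still tend to $o$, so the theorem gives strictly more information than the necessary condition $\|x_j\| \to 0$.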

4.7 Theorem. (Bessel's inequality) Let $(x_j)_{j \in J}$ be an orthonormal system in a Hilbert space $H$. Then for each element $x \in H$, the sum $\sum_J |x x_j|^2$ exists and one has

$$\sum_J |x x_j|^2 \le \|x\|^2.$$

Proof. Let $K \subseteq J$ be a finite subset. We have

$$0 \le \Big\|x - \sum_K (x x_j) x_j\Big\|^2 = \Big(x - \sum_K (x x_j) x_j\Big)\Big(x - \sum_K (x x_j) x_j\Big)$$

$$= xx - \sum_K \overline{(x x_j)}(x x_j) - \sum_K (x x_j)(x_j x) + \sum_{j,k \in K} (x x_j)\overline{(x x_k)}(x_j x_k)$$

$$= xx - \sum_K (x x_j)\overline{(x x_j)} - \sum_K (x x_j)\overline{(x x_j)} + \sum_K (x x_j)\overline{(x x_j)}$$

$$= xx - \sum_K (x x_j)\overline{(x x_j)} = xx - \sum_K |x x_j|^2$$

and hence

$$\sum_K |x x_j|^2 \le \|x\|^2.$$

Thus, the sum $\sum_J |x x_j|^2$ absolutely converges (recall 6.2 and 6.3 of Chapter 1). □

4.8

From 4.7 and 4.6, we immediately obtain the following

Corollary. If $(x_j)_{j \in J}$ is an orthonormal system in $H$ then for every $x \in H$ there exists the sum

$$\sum_J (x x_j) x_j.$$

4.9 Theorem. (Parseval's equality) One has $\sum_J |x x_j|^2 = \|x\|^2$, that is, the Bessel inequality becomes equality, if and only if $x = \sum_J (x x_j) x_j$.

Proof. Recall the beginning of the proof of Theorem 4.7: instead of the inequality $0 \le \|x - \sum_K (x x_j) x_j\|^2$ consider $0 = \|x - \sum_J (x x_j) x_j\|^2$ and observe that the formulas in the statement express the same fact. □
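A finite-dimensional instance shows both the strict Bessel inequality and the Parseval case. The vectors below are illustrative choices in $\mathbb{R}^3$ with the standard inner product and the orthonormal system $x_1 = e_1$, $x_2 = e_2$ (deliberately not a basis).

```latex
\begin{align*}
x = (1, 2, 3): &\qquad \sum_{j} |x x_j|^2 = 1 + 4 = 5 \ <\ 14 = \|x\|^2,
  \qquad \sum_{j} (x x_j)\, x_j = (1, 2, 0) \ne x,\\
x' = (1, 2, 0): &\qquad \sum_{j} |x' x_j|^2 = 1 + 4 = 5 = \|x'\|^2,
  \qquad \sum_{j} (x' x_j)\, x_j = (1, 2, 0) = x' .
\end{align*}
```

Equality holds exactly for the vector that is recovered from its coefficients, in line with Theorem 4.9.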

4.10

A Hilbert basis of a Hilbert space $H$ is a maximal orthonormal system in $H$, that is, an orthonormal system $(x_j)_{j \in J}$ such that no non-zero $x \in H$ is orthogonal to all of the $x_j$, $j \in J$.

Using Zorn's lemma (for the system of all orthonormal systems ordered by inclusion), one easily proves the following

Proposition. Every Hilbert space has a Hilbert basis.

Remark. There is a terminological conflict: a Hilbert basis of $H$ is not a basis of $H$ as a vector space. The point is not in the orthogonality: we already have the concept of an orthogonal basis in a vector space with a scalar product, and a Hilbert basis is in general not that either. It does not generate the space: a general element is not necessarily a linear combination of its elements. But, as we will see, a general element can be expressed as an "infinite linear combination" of the elements of a Hilbert basis.

4.11 Theorem. Let $(x_j)_{j \in J}$ be an orthonormal system in a Hilbert space $H$. Then the following statements are equivalent.
(1) $(x_j)_{j \in J}$ is a Hilbert basis.
(2) If $x$ is orthogonal to all the $x_j$, $j \in J$, then $x = o$.
(3) For every $x \in H$ one has $x = \sum_J (x x_j) x_j$.
(4) For every two $x, y \in H$ one has

$$xy = \sum_J (x x_j)\overline{(y x_j)}.$$

(5) For every $x \in H$ one has

$$\|x\| = \sqrt{\sum_J |x x_j|^2}.$$

Proof. (1)$\Leftrightarrow$(2) is just a reformulation of the definition.

(2)$\Rightarrow$(3): For every $x \in H$, one has $(x - \sum (x x_j) x_j) x_k = x x_k - x x_k = 0$ for each $k$ and hence by (2), $x - \sum (x x_j) x_j = o$.

(3)$\Rightarrow$(4): We have

$$xy = \Big(\sum_j (x x_j) x_j\Big)\Big(\sum_k (y x_k) x_k\Big) = \sum_{j,k} (x x_j)\overline{(y x_k)}\, x_j x_k = \sum_j (x x_j)\overline{(y x_j)}.$$

(4)$\Rightarrow$(5): Put $y = x$.

(5)$\Rightarrow$(1): Suppose (1) does not hold. Choose an element $x$ such that $\|x\| = 1$ and $x x_j = 0$ for all $j$. Then $\|x\| = 1 \ne 0 = \sqrt{\sum_J |x x_j|^2}$. □

5 The Hahn-Banach Theorem

Let us now turn our attention to Banach spaces. Recall that linear maps $f : V \to \mathbb{R}$, $f : V \to \mathbb{C}$ for a real resp. complex vector space $V$ are called linear forms.

5.1 Theorem. (Hahn-Banach) Let $V$ be a real vector space and let $p : V \to \mathbb{R}$ be a function such that
(a) for all $x, y \in V$, $p(x + y) \le p(x) + p(y)$ and
(b) for every $x \in V$ and $r \in \langle 0, \infty)$, $p(rx) = r\,p(x)$.
Let $V_0$ be a vector subspace of $V$ and let $f_0$ be a linear form on $V_0$ such that

$$f_0(x) \le p(x) \quad\text{for all } x \in V_0.$$

Then there exists a linear form $f$ on $V$ such that

$$f_0 = f|V_0 \quad\text{and}\quad f(x) \le p(x) \quad\text{for all } x \in V.$$

Proof. Consider the system $\mathcal{W}$ of all pairs $(W, g)$ where $W \supseteq V_0$ is a vector subspace of $V$ and $g : W \to \mathbb{R}$ a linear form such that $g|V_0 = f_0$ and that $g(x) \le p(x)$ for all $x \in W$.

On $\mathcal{W}$ define an order $\sqsubseteq$ by the formula

$$(W_1, g_1) \sqsubseteq (W_2, g_2) \ \equiv_{\mathrm{df}}\ W_1 \subseteq W_2 \text{ and } g_2|W_1 = g_1.$$

Let $C = \{(W_i, g_i) \mid i \in J\} \subseteq \mathcal{W}$ be a chain in this order. Setting $W = \bigcup_{i \in J} W_i$ and defining $g : W \to \mathbb{R}$ by $g(x) = g_i(x)$ for $x \in W_i$, we obtain a $(W, g)$ majorizing all the $(W_i, g_i)$. By Zorn's Lemma, there is, hence, a $(W, g) \in \mathcal{W}$ maximal in the order $\sqsubseteq$.

We will prove the statement of the theorem by showing that $W = V$. Suppose $W \ne V$. Choose $a \in V \setminus W$ and let

$$W' = \{x + ra \mid x \in W,\ r \in \mathbb{R}\}.$$

For arbitrary $x, y \in W$ we have

$$g(x) + g(y) = g(x + y) \le p(x + a + y - a) \le p(x + a) + p(y - a)$$


and hence

$$g(y) - p(y - a) \le -g(x) + p(x + a).$$

Since $x, y$ are arbitrary there is a real number $\alpha$ such that

$$\forall x, y \in W: \quad g(y) - p(y - a) \le \alpha \le -g(x) + p(x + a) \tag{*}$$

(for instance $\alpha = \sup_y (g(y) - p(y - a))$, or $\alpha = \inf_x (-g(x) + p(x + a))$). Now define a linear form

$$h : W' \to \mathbb{R} \quad\text{by letting}\quad h(x + ra) = g(x) + r\alpha$$

(this is correct: if $x + ra = y + sa$ then $(s - r)a = x - y \in W$, hence $s = r$ and $x = y$). Let $r > 0$. Since, by (*),

$$g(\tfrac1r x) + \alpha \le p(\tfrac1r x + a),$$

we have

$$h(x + ra) = r\big(g(\tfrac1r x) + \alpha\big) \le r\,p(\tfrac1r x + a) = p(x + ra).$$

Similarly if $r < 0$ we use the inequality

$$g(-\tfrac1r x) - \alpha \le p(-\tfrac1r x - a)$$

to obtain

$$h(x + ra) = -r\big(g(-\tfrac1r x) - \alpha\big) \le -r\,p(-\tfrac1r x - a) = p(x + ra).$$

Since trivially $h(x + 0 \cdot a) \le p(x + 0 \cdot a)$ we conclude that $h(y) \le p(y)$ for all $y \in W'$, contradicting the maximality of $(W, g)$. □
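The one-dimensional extension step can be carried out explicitly in a toy case. The sketch below writes $p$ for the dominating function of the theorem (the symbol is an assumption, since the glyph is lost in the source) and uses illustrative data in $V = \mathbb{R}^2$ that are not taken from the text.

```latex
\[
p(x_1, x_2) = |x_1| + |x_2|,\qquad W = \{(t, 0)\},\qquad g(t, 0) = t,\qquad a = (0, 1):
\]
\begin{align*}
\sup_{y \in W}\big(g(y) - p(y - a)\big) &= \sup_{t}\big(t - |t| - 1\big) = -1,\\
\inf_{x \in W}\big(-g(x) + p(x + a)\big) &= \inf_{t}\big(-t + |t| + 1\big) = 1,\\
h(t, r) = t + r\alpha \ \text{ with } -1 \le \alpha \le 1: &\qquad
h(t, r) \le |t| + |r| = p(t, r).
\end{align*}
```

Every $\alpha$ in the interval $\langle -1, 1\rangle$ gives an admissible extension, which illustrates that the extension in the Hahn-Banach Theorem is in general far from unique.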

5.2 Corollary. (Hahn-Banach Theorem, the complex version) Let $V$ be a complex vector space, and let $p : V \to \langle 0, \infty)$ satisfy
(a) for all $x, y \in V$, $p(x + y) \le p(x) + p(y)$ and
(b) for every $x \in V$ and $r \in \mathbb{C}$, $p(rx) = |r|\,p(x)$.
Let $V_0$ be a vector subspace of $V$ and let $f_0$ be a linear form on $V_0$ such that

$$|f_0(x)| \le p(x) \quad\text{for all } x \in V_0.$$

Then there exists a linear form $f$ on $V$ such that

$$f_0 = f|V_0 \quad\text{and}\quad |f(x)| \le p(x) \quad\text{for all } x \in V.$$

Proof. View $V$ as a vector space over $\mathbb{R}$. By the Hahn-Banach Theorem, there exists a linear map $g : V \to \mathbb{R}$ such that $g|V_0 = \mathrm{Re}(f_0)$, $g(x) \le p(x)$. Then there exists a (unique) complex-linear map $f : V \to \mathbb{C}$ such that $\mathrm{Re}(f) = g$. In particular, by uniqueness, $f|V_0 = f_0$. Now for every $x \in V$, there exists a complex number $\lambda$ of modulus 1 such that

$$\lambda f(x) = |f(x)|.$$

Thus,

$$|f(x)| = f(\lambda x).$$

Hence $f(\lambda x) \in \mathbb{R}$, and hence $f(\lambda x) = g(\lambda x)$. Now compute:

$$|f(x)| = g(\lambda x) \le p(\lambda x) = |\lambda|\,p(x) = p(x). \qquad \square$$
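The existence and uniqueness of the complex-linear $f$ with $\mathrm{Re}(f) = g$, used without comment in the proof, can be verified directly. The explicit formula below is the standard one and is supplied here as a sketch; it does not appear in the text above.

```latex
\begin{align*}
\text{Define } f(x) &= g(x) - i\,g(ix). \\
f(ix) &= g(ix) - i\,g(i \cdot ix) = g(ix) - i\,g(-x) = g(ix) + i\,g(x)
       = i\big(g(x) - i\,g(ix)\big) = i\,f(x),\\
\text{Uniqueness: } \ \mathrm{Im}\, f(x) &= -\mathrm{Re}\big(i\,f(x)\big) = -\mathrm{Re}\, f(ix) = -g(ix)
  \quad\text{for any complex-linear } f \text{ with } \mathrm{Re}(f) = g .
\end{align*}
```

Since $f$ is real-linear by construction and commutes with multiplication by $i$, it is complex-linear; the last line shows that the imaginary part is forced by $g$.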

5.3

As an easy consequence of the Hahn-Banach Theorem we obtain

Proposition. Let $L$ be a normed real or complex vector space and let $M$ be a vector subspace of $L$. Let $g$ be a continuous linear form on $M$. Then there exists a continuous linear form $f$ on $L$ such that $f|M = g$ and $\|f\| = \|g\|$ (the norms in $L^*$ and $M^*$).

Proof. Use Theorem 5.1 resp. Corollary 5.2 with $V = L$, $V_0 = M$ and $p(x) = \|g\| \cdot \|x\|$. □

5.4

And here is another one.

Proposition. Let $L$ be a normed vector space and let $M$ be a closed vector subspace. Let $M \ne L$. Then there is a continuous non-zero linear form $f$ on $L$ such that $f|M$ is constant zero.

Remark. Note that we speak of continuity but not of the norm: the norm of $f|M$ is zero and would not help us.

Proof. Choose an $a \in L \setminus M$. Since $M$ is closed, $\inf\{\|x - a\| \mid x \in M\} = d > 0$. Define a linear form $g$ on $M' = \{x + ra \mid x \in M,\ r \in \mathbb{R}\}$ by setting $g(x + ra) = r$. For $r \ne s$ we have

$$\|(x + ra) - (y + sa)\| = \|x - y + (r - s)a\| = |r - s| \cdot \Big\|\frac{1}{r - s}(x - y) + a\Big\| \ge |r - s|\,d$$

and hence $g$ is continuous. Now extend $g$ to a continuous linear form on $L$ using the Hahn-Banach Theorem. □

6 Dual Banach spaces and reflexivity

6.1

Recall the definition 3.5.2 of the dual $L^*$ of a normed vector space $L$.

Proposition. $L^*$ is always complete (and consequently is always a Banach space).

Proof. To fix ideas, let us consider the real case (the complex case is analogous). Suppose $(f_n)$ is a Cauchy sequence in $L^*$. Let $B$ be the unit ball in $L$. Then, by definition, the restrictions $f_n|B$ form a Cauchy sequence in the space $C(B)$ of bounded continuous functions on $B$, which we discussed in Chapter 2 (and, in fact, the $L^*$-distances $\|f_m - f_n\|$ are equal to the $C(B)$-distances). However, we already know that the space $C(B)$ is complete, and thus the sequence $(f_n|B)$ converges uniformly to a function $f_0 : B \to \mathbb{R}$. Then it is immediate that the function $f \in L^*$ defined by

$$f(v) = \|v\| \cdot f_0(v / \|v\|)$$

(for $v \ne o$, and $f(o) = 0$) is the limit of the sequence $(f_n)$ in $L^*$. □

6.2

Recall from Section 3.6 that for a continuous linear mapping $f : L \to M$, we have a continuous linear mapping

$$f^* : M^* \to L^* \quad\text{by setting}\quad f^*(\psi) = \psi \circ f$$

and that we have $\|f^*\| \le \|f\|$. In fact, the norms are equal.

Proposition. We have $\|f^*\| = \|f\|$.

Proof. To fix ideas, let us consider the real case (the complex case is analogous). We may assume $f \ne 0$. Choose an $\varepsilon$ with $0 < \varepsilon < \|f\|$ and an $x_0 \in L$ such that $0 < \|x_0\| \le 1$ and $\|f(x_0)\| \ge \|f\| - \varepsilon$, so that $f(x_0) \ne o$. On the vector subspace $\{r f(x_0) \mid r \in \mathbb{R}\}$ define a linear form $g$ by setting $g(r f(x_0)) = r \|f(x_0)\|$. Then $\|g\| = 1$ (the unit ball is $\{r f(x_0) \mid |r| \le \frac{1}{\|f(x_0)\|}\}$) and hence there is, by Proposition 5.3, a linear form $\psi \in M^*$ such that $\|\psi\| \le 1$ and $\psi(f(x_0)) = \|f(x_0)\|$. Thus, $\|f^*\| \ge \|f^*(\psi)\| = \|\psi \circ f\| \ge |\psi(f(x_0))| = \|f(x_0)\| \ge \|f\| - \varepsilon$. Since $\varepsilon > 0$ was arbitrary we conclude that $\|f^*\| = \|f\|$. □


6.3

For a normed linear space $L$ define

$$\iota = \iota_L : L \to L^{**} \quad\text{by setting}\quad (\iota(x))(\psi) = \psi(x).$$

6.3.1 Proposition. $\iota$ is a linear map preserving norm, and for every continuous linear map $f : L \to M$ we have a commutative diagram

$$\begin{array}{ccc}
L & \xrightarrow{\ \iota_L\ } & L^{**} \\
{\scriptstyle f}\downarrow & & \downarrow{\scriptstyle f^{**}} \\
M & \xrightarrow{\ \iota_M\ } & M^{**}.
\end{array}$$

Proof. Again, to fix ideas, let us work in the real case. The complex case is the same.

Checking that $\iota$ is linear is straightforward. Consider the formula

$$\|\iota(x)\| = \sup\{|\iota(x)(f)| \mid \|f\| \le 1\} = \sup\{|f(x)| \mid \|f\| \le 1\}.$$

By Lemma 3.6, $|f(x)| \le \|f\| \cdot \|x\|$ and hence we see that $\|\iota(x)\| \le \|x\|$.

Now fix an $x \ne o$ and define a linear form $g : L' = \{rx \mid r \in \mathbb{R}\} \to \mathbb{R}$ by setting $g(rx) = r\|x\|$. The unit ball in $L'$ is the set $\{rx \mid |r| \le \frac{1}{\|x\|}\}$ and hence $\|g\| = 1$. By Proposition 5.3, we can extend $g$ to a linear form $f$ on $L$ with $\|f\| = 1$ and we have $\iota(x)(f) = f(x) = \|x\|$. Thus, $\|\iota(x)\| \ge \|x\|$.

Finally, let $f : L \to M$ be a continuous linear map, $x \in L$ and $\psi \in M^*$. We have

$$((f^{**} \circ \iota_L)(x))(\psi) = (f^{**}(\iota_L(x)))(\psi) = (\iota_L(x) \circ f^*)(\psi) = \iota_L(x)(f^*(\psi)) = \iota_L(x)(\psi \circ f) = \psi(f(x)) = (\iota_M(f(x)))(\psi) = ((\iota_M \circ f)(x))(\psi),$$

that is, $f^{**} \circ \iota_L = \iota_M \circ f$. □

6.4

A Banach space $B$ is said to be reflexive if the mapping $\iota_B$ is surjective (and hence a norm preserving isomorphism).

6.4.1 Remark:

We have seen in Theorem 3.7.1 that the dual space of a Hilbert space $H$ is antilinearly isomorphic to $H$ by the inner product. Composing the antilinear isomorphisms

$$H \to H^* \to (H^*)^*,$$

one gets the map $\iota$ of 6.3, and thus a Hilbert space is always reflexive.

6.5 Proposition. Let a Banach space $B$ not be reflexive. Then neither is the Banach space $B^*$.

Proof. Since $B$ is complete, the vector subspace $\iota_B[B]$ of $B^{**}$ is also complete (it is norm-isomorphic) and hence, by Proposition 7.3.1 of Chapter 2, closed in $B^{**}$. Since $B$ is not reflexive, $\iota_B[B] \ne B^{**}$, and by Proposition 5.4 there exists an $F \in B^{***}$, a linear form on $B^{**}$ that is non-zero but identically zero on $\iota_B[B]$. We will show that it is not in $\iota_{B^*}[B^*]$. Suppose it is, that is, $F = \iota_{B^*}(f)$ for a linear form $f$ on $B$. In particular, for each $\iota_B(x)$ we have $F(\iota_B(x)) = 0$. Thus,

$$0 = \iota_{B^*}(f)(\iota_B(x)) = \iota_B(x)(f) = f(x)$$

for all $x$, hence $f = o$ and finally also $F = \iota_{B^*}(o)$ is identically zero, a contradiction. □

6.6 The weak topology

The following construction works over $\mathbb{R}$ or $\mathbb{C}$. To fix ideas, let us work over $\mathbb{C}$. The treatment over $\mathbb{R}$ is analogous. Let $W$ be a Banach space and let $W^*$ be its dual. The weak topology of $W^*$ (with respect to $W$) has a basis of open sets determined by all possible choices of elements $f_1, \dots, f_n \in W$, and open sets $U_1, \dots, U_n \subseteq \mathbb{C}$: the basis element corresponding to this data is

$$\{X \in W^* \mid X(f_1) \in U_1, \dots, X(f_n) \in U_n\}.$$

6.6.1 Lemma. Let V be a normed vector space. Then the unit ballB of .V �/� is theclosure of the image B1 of the unit ball of V under the canonical map V ! .V �/�,with respect to the weak topology (with respect to V �).

Proof. To prove that $B$ is contained in the closure of $B_1$ with respect to the weak topology, it suffices to show that every open set $U$ in the weak topology disjoint from $B_1$ is also disjoint from $B$. For open sets $U$ which are of the form

\[ F_1^{-1}[U_1] \cap \dots \cap F_n^{-1}[U_n] \]

with $U_1, \dots, U_n$ open and $F_1, \dots, F_n \in V^*$ (such sets form a basis of the weak topology), we may as well take the quotient of both $V$ and $(V^*)^*$ by the annihilator of $F_1, \dots, F_n$ (i.e. the subspace of elements which have 0 evaluation on $F_1, \dots, F_n$). The map induced on the quotients from the canonical map $V \to (V^*)^*$, however, is the canonical map $W \to (W^*)^*$ where $W$ is the quotient of $V$ by the annihilator of $F_1, \dots, F_n$, which is an isomorphism since $W$ is finite-dimensional.

On the other hand, if $X \notin B$, there exists an $F \in V^*$ such that $\|F\| = 1$, $X(F) > 1$. This means that the open set determined by

\[ F_1 = F \]

and

\[ U_1 = (1 + (X(F) - 1)/2, \infty) \]

contains $X$ but is disjoint from $B_1$, thus showing that $X$ is not in the closure of $B_1$ with respect to the weak topology. □

6.7 Theorem. (The Milman-Pettis Theorem) Every uniformly convex Banach space $V$ is reflexive.

Proof (The proof we present here is due to J.R. Ringrose). Let $V$ be a uniformly convex Banach space. By uniform convexity, for every $\varepsilon > 0$ it is possible to choose a $\delta = \delta(\varepsilon) > 0$ such that if $x, y \in V$ satisfy

\[ \|x\|, \|y\| \le 1, \quad \|x + y\| \ge 2 - \delta, \]

then

\[ \|x - y\| < \varepsilon. \]

Now suppose $V$ is a uniformly convex Banach space which is not reflexive. Let $B$ be the closed unit ball in $(V^*)^*$, and let $B_1$ be the image of the closed unit ball in $V$ under the canonical map $V \to (V^*)^*$. Then $B$ is contained in the closure of $B_1$ under the weak topology (with respect to the space $V^*$). Assuming $B \ne B_1$, since the canonical embedding $V \to (V^*)^*$ is an isometry, by completeness, the image is closed, and thus $B_1$ is a closed subset of $B$. This means that there exist an $\varepsilon > 0$ and an $X \in B$ such that, in $(V^*)^*$, the open ball $\Omega(X, 2\varepsilon)$ of radius $2\varepsilon$ about $X$ satisfies

\[ \Omega(X, 2\varepsilon) \cap B_1 = \emptyset. \tag{*} \]

Now choose an $F \in V^*$ such that $\|F\| = 1$ and $|X(F) - 1| < \frac{1}{2}\delta$ where $\delta = \delta(\varepsilon)$. Then put

\[ \mathcal{V} = \{Y \in (V^*)^* \mid |Y(F) - 1| < \tfrac{1}{2}\delta\}. \]

If $Y, Y_1 \in \mathcal{V} \cap B_1$, we have $|Y(F) + Y_1(F)| > 2 - \delta$, and hence $\|Y + Y_1\| > 2 - \delta$, and therefore $\|Y - Y_1\| < \varepsilon$. Fixing $Y$, we deduce that

\[ \mathcal{V} \cap B_1 \subseteq Y + \varepsilon B. \]

Since, however, the right-hand set is closed in $(V^*)^*$ under the weak topology (with respect to $V^*$), while $X$ is in the closure of $\mathcal{V} \cap B_1$ with respect to the weak topology (since, in that topology, $\mathcal{V}$ is open), we deduce that $X \in Y + \varepsilon B$. This is a contradiction with (*). □

7 The duality of $L^p$-spaces

We begin with the following result:

7.1 Theorem. For $1 < p < \infty$, the spaces $L^p(B)$, $L^p(B,\mathbb{C})$ are uniformly convex.

7.2 Reduction to the real case

The remainder of this section will consist of a proof of Theorem 7.1. The first thing we should realize is that the real and complex cases are actually somewhat different, since in the complex case the definition of $L^p$ uses the complex absolute value, which, in effect, is a Hilbert space norm on $\mathbb{C} = \mathbb{R}^2$. Because of this, we don't have an obvious isomorphism of $L^p(B,\mathbb{C})$, considered as a real Banach space, to a real $L^p$-space (although we won't prove that they are not isomorphic). Of course, $L^p(B)$ is embedded into $L^p(B,\mathbb{C})$ isometrically, and hence the uniform convexity of $L^p(B,\mathbb{C})$ implies the uniform convexity of $L^p(B)$. We will, however, be interested in the opposite implication, as the proof of uniform convexity of $L^p(B)$ is, in fact, somewhat simpler.

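Assume, therefore, that we already know that $L^p(B)$ is uniformly convex, and let $(f_n)$, $(g_n)$ be sequences in $L^p(B,\mathbb{C})$ such that

\[ \|f_n\|_p = \|g_n\|_p = 1, \quad \left\|\frac{f_n + g_n}{2}\right\|_p \to 1. \]

Then certainly

\[ \|\,|f_n|\,\|_p = \|\,|g_n|\,\|_p = 1, \]

and

\[ \left\|\frac{f_n + g_n}{2}\right\|_p \le \left\|\frac{|f_n| + |g_n|}{2}\right\|_p \le 1 \]

(the second inequality by the triangle inequality), so

\[ \left\|\frac{|f_n| + |g_n|}{2}\right\|_p \to 1, \]

and hence by the uniform convexity of $L^p(B)$,

\[ \|\,|f_n| - |g_n|\,\|_p \to 0. \]

This means that there exist measurable functions $\alpha_n : B \to \mathbb{C}$, $|\alpha_n(x)| = 1$ for all $x$, such that

\[ \|f_n - \alpha_n g_n\|_p \to 0. \tag{*} \]

From the uniform convexity of Hilbert spaces (applied to the 1-dimensional complex Hilbert space $\mathbb{C}$), we know that for each $\varepsilon > 0$ there exists a $\delta > 0$ such that

\[ |\alpha_n(x) - 1| > \varepsilon \;\Rightarrow\; \left|\frac{g_n(x) + \alpha_n(x) g_n(x)}{2}\right| < (1 - \delta)^{1/p} |g_n(x)|. \]

Denote by $S_n$ the set of all $x \in B$ such that $|\alpha_n(x) - 1| > \varepsilon$, and denote by $c_n = c_{S_n}$ its characteristic function (i.e. the function equal to 1 on $S_n$ and 0 elsewhere). Then

\[ \left(\left\|\frac{g_n + \alpha_n g_n}{2}\right\|_p\right)^p \le (\|g_n\|_p)^p - \delta (\|g_n \cdot c_n\|_p)^p. \]

Taking $n \to \infty$ and using (*), we obtain

\[ \lim_{n \to \infty} \|g_n \cdot c_n\|_p = 0, \]

and hence, since $|1 - \alpha_n| \le \varepsilon$ outside $S_n$ and $|1 - \alpha_n| \le 2$ on $S_n$,

\[ \lim_{n \to \infty} \|f_n - g_n\|_p \le \lim_{n \to \infty} \|f_n - \alpha_n g_n\|_p + \lim_{n \to \infty} \|(1 - \alpha_n) g_n\|_p \le \lim_{n \to \infty} (\varepsilon \|g_n\|_p + 2\|g_n \cdot c_n\|_p) = \varepsilon. \]

Since $\varepsilon > 0$ was arbitrary, we are done: it suffices to prove the uniform convexity of $L^p(B)$.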

7.3 The uniform convexity of $L^p(B)$

We will show now a simple argument proving the uniform convexity of $L^p(B)$ which does not generalize to the complex case, thus explaining in particular why the reduction 7.2 pays off.

7.3.1 Lemma. Let $1 \le p < \infty$ and let $f, g$ be non-negative real functions which represent elements in $L^p(B)$. Then

\[ (\|f + g\|_p)^p \ge (\|f\|_p)^p + (\|g\|_p)^p. \]

Proof. Note that for non-negative numbers $x, y$ and $p \ge 1$, we have

\[ (x + y)^p \ge x^p + y^p. \]

In effect, dividing by $y^p$ (the case $y = 0$ being trivial), we may assume without loss of generality $y = 1$, and then

\[ (x + 1)^p - x^p = \int_x^{x+1} p t^{p-1}\,dt \]

is a non-decreasing function in $x$, and hence is $\ge 1$, its value at $x = 0$. Now we have

\[ (\|f + g\|_p)^p - (\|f\|_p)^p - (\|g\|_p)^p = \int_B \big((f(t) + g(t))^p - f(t)^p - g(t)^p\big) \ge 0, \]

as claimed. □
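As a quick numerical sanity check of Lemma 7.3.1, the following Python sketch replaces $B$ by a finite set with the counting measure, so that functions become finite lists and the integral becomes a sum. This finite model and the helper names `p_norm_pow` and `superadditivity_gap` are illustrative assumptions, not part of the text.

```python
import random

def p_norm_pow(f, p):
    """Return (||f||_p)^p for a finite list f, i.e. the sum of |f(t)|^p."""
    return sum(abs(v) ** p for v in f)

def superadditivity_gap(f, g, p):
    """Return (||f+g||_p)^p - (||f||_p)^p - (||g||_p)^p for non-negative f, g."""
    s = [a + b for a, b in zip(f, g)]
    return p_norm_pow(s, p) - p_norm_pow(f, p) - p_norm_pow(g, p)

random.seed(0)
for p in (1.0, 1.5, 2.0, 3.0):
    f = [random.random() for _ in range(50)]
    g = [random.random() for _ in range(50)]
    # Lemma 7.3.1 predicts a non-negative gap (up to rounding error).
    print(p, superadditivity_gap(f, g, p) >= -1e-12)
```

For $p = 1$ the gap vanishes, which matches the fact that the pointwise inequality $(x + y)^p \ge x^p + y^p$ is an equality in that case.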

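7.3.2 Lemma. If, in a normed vector space, sequences $(x_n)$, $(y_n)$ satisfy

\[ \|x_n\| \to 1, \quad \|x_n + y_n\|^p + \|x_n - y_n\|^p \to 2, \]

then

\[ \|x_n + y_n\| \to 1, \quad \|x_n - y_n\| \to 1. \]

Proof. Using the compactness of the interval $\langle 0, 3\rangle$, by picking a subsequence, we may assume, without loss of generality, that

\[ \|x_n + y_n\| \to \alpha, \quad \|x_n - y_n\| \to \beta \]

for some $\alpha, \beta \ge 0$. Now we have

\[ \alpha + \beta = \lim_{n \to \infty} (\|x_n + y_n\| + \|x_n - y_n\|) \ge \lim_{n \to \infty} \|2x_n\| = 2, \]

while $\alpha^p + \beta^p = 2$. Thus,

\[ \left(\tfrac{1}{2}(\alpha + \beta)\right)^p \ge \tfrac{1}{2}(\alpha^p + \beta^p) = 1, \]

and hence, since $t^p$ is a convex function on $\langle 0, \infty)$, $\alpha = \beta$ and equality occurs. □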

7.3.3 Proof that $L^p(B)$ is uniformly convex

Suppose $(f_n)$, $(g_n)$ to be sequences in $L^p(B)$ such that

\[ \|f_n\|_p = \|g_n\|_p = 1, \quad \left\|\frac{f_n + g_n}{2}\right\|_p \to 1. \]

Put

\[ x_n = \frac{f_n + g_n}{2}, \quad y_n = \frac{f_n - g_n}{2}. \]

Then

\[ \|x_n + y_n\|_p = \|f_n\|_p = 1 = \|g_n\|_p = \|x_n - y_n\|_p, \]

and hence

\[ 2 = (\|x_n + y_n\|_p)^p + (\|x_n - y_n\|_p)^p \]
\[ = \int_B \big(|x_n(t) + y_n(t)|^p + |x_n(t) - y_n(t)|^p\big) \]
\[ = \int_B \big(|\,|x_n(t)| + |y_n(t)|\,|^p + |\,|x_n(t)| - |y_n(t)|\,|^p\big) \]
\[ = (\|\,|x_n| + |y_n|\,\|_p)^p + (\|\,|x_n| - |y_n|\,\|_p)^p. \]

(Note that in the third equality, it is crucial that $x_n(t)$, $y_n(t)$ are real numbers.) Now by Lemma 7.3.2,

\[ \|\,|x_n| + |y_n|\,\|_p \to 1. \]

Using Lemma 7.3.1,

\[ (\|y_n\|_p)^p \le (\|\,|x_n| + |y_n|\,\|_p)^p - (\|x_n\|_p)^p \to 0, \]

as claimed. This concludes the proof that $L^p(B)$ is uniformly convex, and hence, by Subsection 7.2, the proof of Theorem 7.1. □
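To see what uniform convexity rules out, it can help to compare the exponents $p = 1$ and $p = 2$ on a two-point space with the counting measure. The sketch below is only a finite-dimensional illustration under that assumption (it is not the space $L^p(B)$ of the theorem), and the helper names are hypothetical.

```python
def p_norm(v, p):
    """The l^p norm of a finite real vector, for 1 <= p < infinity."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def midpoint_and_distance(x, y, p):
    """Return (||(x+y)/2||_p, ||x-y||_p) for two vectors x, y."""
    mid = [(a + b) / 2 for a, b in zip(x, y)]
    diff = [a - b for a, b in zip(x, y)]
    return p_norm(mid, p), p_norm(diff, p)

e1, e2 = [1.0, 0.0], [0.0, 1.0]   # unit vectors for every p

# p = 1: the midpoint still has norm 1 although e1 and e2 are far apart,
# so no delta > 0 can work in the definition of uniform convexity.
print(midpoint_and_distance(e1, e2, 1.0))

# p = 2: the midpoint is pushed strictly inside the unit ball.
print(midpoint_and_distance(e1, e2, 2.0))
```

The $p = 1$ behaviour is the finite-dimensional shadow of the failure of uniform convexity outside the range $1 < p < \infty$.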

7.4 Theorem. Let $B$ be a Borel subset of $\mathbb{R}^n$. Let $1 < p < \infty$ and let

\[ \frac{1}{p} + \frac{1}{q} = 1 \]

(then, of course, also $1 < q < \infty$). We have isometric isomorphisms of Banach spaces

\[ U_q : L^q(B) \cong (L^p(B))^* \]

and

\[ U_q : L^q(B,\mathbb{C}) \cong (L^p(B,\mathbb{C}))^* \]

given by

\[ (U_q(y))(x) = \int_B x \cdot y. \tag{7.4.1} \]

Proof. Let us prove the complex case (the real case is analogous). By Hölder's inequality, the integral (7.4.1) exists, and we have

\[ |(U_q(y))(x)| \le \|y\|_q \cdot \|x\|_p. \]

Since $U_q(y)$ is linear, we therefore have $U_q(y) \in (L^p(B,\mathbb{C}))^*$ with

\[ \|U_q(y)\| \le \|y\|_q. \tag{*} \]

To deduce that $U_q$ is an isometry, we need to show that the norms are in fact equal. Let, therefore, $y \in L^q(B,\mathbb{C})$ be such that $\|y\|_q = 1$. Let $\alpha : B \to \mathbb{C}$ be a measurable function such that $|\alpha(t)| = 1$ for $t \in B$ and

\[ \alpha(t) y(t) = |y(t)|. \]

Define $x(t) = |y(t)|^{q/p} \alpha(t)$. Then $x \in L^p(B,\mathbb{C})$, and $\|x\|_p = 1$. Since $q/p + 1 = q$, we compute:

\[ (U_q(y))(x) = \int_B xy = \int_B |y|^{q/p} |y| = \int_B |y|^q = 1, \]

thus proving the equality in (*).

Thus, $U_q$ is an isometric embedding, and since $L^q(B,\mathbb{C})$ is complete, the image of $U_q$ is closed. We need to show this map is onto. However, if $U_q$ is not onto, then by Proposition 5.4, there exists a non-zero $\omega \in ((L^p(B,\mathbb{C}))^*)^*$ such that

\[ \omega(U_q(y)) = 0 \text{ for all } y \in L^q(B,\mathbb{C}). \]

However, since $L^p(B,\mathbb{C})$ is uniformly convex by Theorem 7.1, it is reflexive by Theorem 6.7, and hence $\omega$ is the image of some $x \in L^p(B,\mathbb{C})$ under the canonical map of 6.3. We conclude that

\[ (U_p(x))(y) = 0 \text{ for all } y \in L^q(B,\mathbb{C}), \]

which contradicts the fact that $U_p$ is an isometry. □
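The norming function $x = |y|^{q/p}\alpha$ used in the proof can be checked numerically in a finite model. The sketch below assumes a four-point space with the counting measure, so that the integral (7.4.1) becomes a finite sum; the function name `norming_element` and the sample vector are hypothetical.

```python
def p_norm(v, p):
    """The l^p norm of a finite complex vector."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def norming_element(y, p):
    """Given y with ||y||_q = 1, return x with x_t = |y_t|^(q/p) * alpha_t,
    where |alpha_t| = 1 and alpha_t * y_t = |y_t|."""
    q = p / (p - 1.0)                      # conjugate exponent, 1/p + 1/q = 1
    x = []
    for v in y:
        r = abs(v)
        alpha = v.conjugate() / r if r > 0 else 1.0
        x.append(r ** (q / p) * alpha)
    return x

p = 3.0
q = p / (p - 1.0)
raw = [1 + 2j, -0.5j, 0.0, 3.0]
scale = p_norm(raw, q)
y = [v / scale for v in raw]               # normalise so that ||y||_q = 1

x = norming_element(y, p)
pairing = sum(a * b for a, b in zip(x, y)) # finite analogue of (7.4.1)
print(p_norm(x, p), pairing)               # both should be close to 1
```

Both printed values being (numerically) 1 is the finite counterpart of $\|U_q(y)\| = \|y\|_q$.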

8 Images of Banach spaces under bounded linear maps

8.1

Recall that a map $f : X \to Y$ between topological spaces is open if the image of each open subset of $X$ is open. It is relatively open if its restriction $X \to f[X]$ is open. In this section, we will write for subsets $S, T$ of a vector space $V$ and a point $x \in V$,

\[ x + S = \{x + y \mid y \in S\}, \quad S + T = \{x + y \mid x \in S,\ y \in T\} \]

and similarly $x - S$, $S - T$ etc.

We have an immediate

8.1.1 Observation. A linear map $f : M \to N$ between normed vector spaces is open if and only if the image $f[U]$ of every neighbourhood of zero in $M$ is a neighbourhood of zero in $N$.

8.1.2 Corollary. An open linear map $f : M \to N$ is onto.

Proof. $f[M]$ contains an open neighborhood $U$ of $o$, so there exists an $\varepsilon > 0$ such that $\|v\| < \varepsilon \Rightarrow v \in f[M]$. But scalar multiples of elements of $U$ are also in $f[M]$ since $f$ is linear, and these include all elements of $N$. □

8.2 Proposition. Let $M, N$ be normed vector spaces and let $f : M \to N$ be an open continuous linear map. If $M$ is complete then $N$ is also complete.

Proof. Let $(y_n)$ be a Cauchy sequence in $N$. Let $B$ be the unit ball in $M$. Then since $f$ is open, there exists a $\delta > 0$ such that $f[B]$ contains all vectors of norm $\le \delta$. By passing to a subsequence, if necessary, we may assume that

\[ \|y_n - y_{n+1}\| < \frac{1}{2^n}. \]

Now $f$ is onto, so there is an $x_1 \in M$ such that $f(x_1) = y_1$. By induction, then, we may choose $x_n$ such that

\[ f(x_n) = y_n \]

and

\[ \|x_n - x_{n+1}\| < \frac{1}{2^n \delta}. \]

Then $(x_n)$ is a Cauchy sequence. Let $x = \lim x_n$. Then $f(x) = \lim y_n$ by continuity. □

8.3 Lemma. Let $M, M_1$ be normed vector spaces such that $M$ is complete. Let $f : M \to M_1$ be a continuous linear map such that for each neighbourhood $U$ of $o$ in $M$ the closure $\overline{f[U]}$ of the image $f[U]$ is a neighborhood of $o$ in $M_1$. Then for each neighbourhood $U$ of $o$ the image $f[U]$ is a neighborhood of $o$ (and hence $f$ is open).

Proof. Choose a neighborhood $U$ of $o$ and an $\alpha > 0$ such that

\[ \{x \in M \mid \|x\| \le \alpha\} \subseteq U. \]

Let

\[ U_n = \{x \mid \|x\| \le \alpha/2^n\}, \quad V_n = \overline{f[U_n]}. \]

Thus, every $V_n$ is a neighborhood of $o$ in $M_1$. We will prove that $f[U]$ is a neighborhood of zero by showing that $V_1 \subseteq f[U]$. To this end, let $y \in V_1$ be arbitrary; we look for an $x \in U$ such that $y = f(x)$.

We will find inductively $x_k \in U_k$, $k = 1, 2, \dots$, such that for all $n$,

\[ y - \sum_{k=1}^{n} f(x_k) \in V_{n+1} \quad \text{and} \quad \Big\|y - \sum_{k=1}^{n} f(x_k)\Big\| < \frac{1}{n}. \tag{*} \]

First, since $(y - V_2) \cap \{z \mid \|y - z\| < 1\}$ is a neighborhood of $y$ and $y$ is in the closure of $f[U_1]$, we have a

\[ y_1 \in (y - V_2) \cap \{z \mid \|y - z\| < 1\} \cap f[U_1], \]

that is, a $y_1 = f(x_1)$ with $x_1 \in U_1$ such that $\|y - f(x_1)\| < 1$ and $y_1 = y - v$ with $v \in V_2$, that is, $y - f(x_1) = v \in V_2$.

Now suppose we already have $x_1, \dots, x_n$ such that (*) holds. Then $y - \sum_{k=1}^{n} f(x_k) \in \overline{f[U_{n+1}]}$ and since

\[ \Big(\Big(y - \sum_{k=1}^{n} f(x_k)\Big) - V_{n+2}\Big) \cap \Big\{z \;\Big|\; \Big\|y - \sum_{k=1}^{n} f(x_k) - z\Big\| < \frac{1}{n+1}\Big\} \]

is a neighborhood of $y - \sum_{k=1}^{n} f(x_k)$, there is an $x_{n+1} \in U_{n+1}$ such that

\[ \Big(y - \sum_{k=1}^{n} f(x_k)\Big) - f(x_{n+1}) = y - \sum_{k=1}^{n+1} f(x_k) \in V_{n+2}, \quad \text{and} \]

\[ \Big\|y - \sum_{k=1}^{n} f(x_k) - f(x_{n+1})\Big\| = \Big\|y - \sum_{k=1}^{n+1} f(x_k)\Big\| < \frac{1}{n+1}, \]

which are the conditions (*) with $n + 1$ replacing $n$.

Since $x_k \in U_k$, clearly, the sequence

\[ \Big(\sum_{k=1}^{n} x_k\Big)_n \]

is Cauchy, and if we denote its limit (which exists since $M$ is complete) by $x$, then

\[ f(x) = \lim_{n} \sum_{k=1}^{n} f(x_k) = y. \]

Finally, $\|x\| \le \sum_{k=1}^{\infty} \alpha/2^k = \alpha$, and hence $x \in U$. □

Recall the definition of a meager set (set of the first category) from 3.3 of Chapter 9, and Theorem 3.4 of Chapter 9 stating that no complete space is meager in itself (Baire's Category Theorem).

8.4 Theorem. Let $M, N$ be normed vector spaces, $M$ a complete one. Let $f : M \to N$ be a continuous linear map. Then there holds precisely one of the following statements.

(1) $f[M]$ is complete and $f$ is relatively open.

(2) $f[M]$ is meager in itself and $f$ is not open; moreover, there is a neighborhood $U$ of $o$ such that $f[U]$ is nowhere dense in $f[M]$.

Proof. The two alternatives exclude each other by Baire's Category Theorem (Theorem 3.4 of Chapter 9).

I. Suppose there is a neighbourhood $U$ of zero such that $f[U]$ is nowhere dense in $f[M]$. Then $f$ is obviously not open. Furthermore, $M = \bigcup_{n=1}^{\infty} nU$ and hence $f[M] = \bigcup_{n=1}^{\infty} n f[U]$. Obviously, if $A$ is nowhere dense, then $nA$ is nowhere dense also. Thus, $f[M]$ is meager in itself.

II. Let none of the $f[U]$ with $U$ a neighbourhood of zero be nowhere dense. Thus, for each such $U$ the closure $\overline{f[U]}$ (taken in $f[M]$) is a neighbourhood of some of its points. We will prove that in fact it is a neighbourhood of $o$ and the statement will follow from Proposition 8.2 and Lemma 8.3.

Let $U$ be a neighborhood of zero in $M$. By continuity of the addition we have a neighborhood $V'$ of zero such that $V' + V' \subseteq U$, and by continuity of the map $x \mapsto (-x)$, $-V'$ is a neighborhood of zero, and finally also $V = V' \cap (-V')$ is a neighborhood of $o$. The set $\overline{f[V]}$ is a neighborhood of a point $y_0$ and since $V = V' \cap (-V')$, it is also a neighborhood of $-y_0$. Consider the homeomorphism $\varphi = (y \mapsto y - y_0)$. It maps $\overline{f[V]}$ onto $\overline{f[V]} - y_0$ and since $\overline{f[V]} - y_0 \subseteq \overline{f[V]} + \overline{f[V]} \subseteq \overline{f[U]}$ we have $\varphi[\overline{f[V]}] \subseteq \overline{f[U]}$, and since $\varphi(y_0) = o$ and $\varphi$ is a homeomorphism, $\overline{f[U]}$ is a neighborhood of $o$. □


8.5

As an immediate corollary we obtain an important

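Theorem. Let $M, N$ be Banach spaces and let $f : M \to N$ be a bijective continuous linear map. Then $f$ is a homeomorphism.

Proof. Since $f[M] = N$ is complete, alternative (2) of Theorem 8.4 is excluded by Baire's Category Theorem. Hence $f$ is open, that is, $f^{-1}$ is continuous. □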

8.6

Note that, somewhat surprisingly, we have in Theorem 8.5 the continuity of $f^{-1}$ implied by the continuity of $f$ (reminiscent of the mappings between compact Hausdorff spaces, and, even more basically, the behaviour of algebraic homomorphisms).

We will present, as a consequence of Theorem 8.5, another case of an "inverted implication".

Let $X_1, X_2$ be metric spaces; consider a mapping $f : X_1 \to X_2$ and its graph

\[ G = \{(x, f(x)) \mid x \in X_1\} \subseteq X_1 \times X_2. \]

If $f$ is continuous then the graph $G$ is obviously closed in $X_1 \times X_2$ (a sequence $(x_n, f(x_n))$ either converges to $(\lim x_n, f(\lim x_n))$ or does not converge at all).

Equally obviously, closedness of the graph $G$ does not imply continuity (consider a discontinuous one-one onto map $f$ with continuous $f^{-1}$). For Banach spaces we have, however,

8.6.1 Theorem. (The Closed Graph Theorem) Let $M_i$, $i = 1, 2$, be Banach spaces and let $f : M_1 \to M_2$ be a linear map with a closed graph $G = \{(x, f(x)) \mid x \in M_1\} \subseteq M_1 \times M_2$. Then $f$ is continuous.

Proof. Consider the space $M_1 \times M_2$ with the norm

\[ \|(x_1, x_2)\| = \max(\|x_1\|, \|x_2\|). \]

This is a Banach space (a product of two complete metric spaces is complete). The graph $G = \{(x, f(x)) \mid x \in M_1\}$ is a closed vector subspace of $M_1 \times M_2$ and hence it is, again, a Banach space.

Now the projection

\[ p_1 = ((x, y) \mapsto x) : G \to M_1 \]

is a continuous map. It is linear, one-one and onto, and hence, by Theorem 8.5, the inverse $p_1^{-1} : M_1 \to G$ is continuous. Since also $p_2 = ((x, y) \mapsto y) : G \to M_2$ is continuous, the composition $f = p_2 p_1^{-1} : M_1 \to M_2$ is continuous. □


8.6.2 Remark: The completeness hypothesis in Theorem 8.6.1 is essential. Consider the space $C(\langle a, b\rangle)$ of continuous real functions on a closed interval $\langle a, b\rangle$ with the norm $\|\varphi\| = \max_{t \in \langle a, b\rangle} |\varphi(t)|$. Take the subspace $M \subseteq C(\langle a, b\rangle)$ consisting of the functions with a continuous derivative (one-sided in $a$ and $b$). Now $M$ is a normed vector space (not complete, though) and the convergence in $M$ is uniform convergence. By Theorem 5.3 of Chapter 1, if functions $x_n$ converge to $x$ and if the derivatives $x_n'$ converge to $y$ then $x'$ exists and $x' = y$. Thus, the mapping $D = (x \mapsto x') : M \to C(\langle a, b\rangle)$ of taking the derivative has a closed graph. Obviously, however, $D$ is not continuous; in fact it is continuous at no point $x \in M$.
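The discontinuity of $D$ at $o$ can be made concrete with the functions $x_n(t) = \sin(nt)/n$, which tend to $o$ uniformly while their derivatives $\cos(nt)$ keep norm 1. The following sketch estimates the maximum norms by sampling on a grid; the choice of these functions, the interval $\langle 0, 2\pi\rangle$ and the helper `sup_norm` are assumptions made for illustration only.

```python
import math

def sup_norm(func, a, b, samples=10001):
    """Approximate max |func(t)| on <a, b> by sampling on a uniform grid."""
    step = (b - a) / (samples - 1)
    return max(abs(func(a + i * step)) for i in range(samples))

a, b = 0.0, 2.0 * math.pi
for n in (1, 10, 100):
    x_n = lambda t, n=n: math.sin(n * t) / n   # x_n -> 0 uniformly
    dx_n = lambda t, n=n: math.cos(n * t)      # but ||D(x_n)|| stays equal to 1
    print(n, sup_norm(x_n, a, b), sup_norm(dx_n, a, b))
```

By linearity, translating the same sequence by a fixed $x \in M$ shows that $D$ fails to be continuous at every point.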

9 Exercises

(1) Prove that any finite-dimensional vector space $V$ with an inner product is a Hilbert space. Prove that the norms associated with any two inner products on $V$ define equivalent metrics.

(2) Prove that if $f : H \to H'$ is an isometric isomorphism of Banach spaces where $H, H'$ are Hilbert spaces, then $f(u) \cdot f(v) = u \cdot v$. [Hint: there is a formula expressing the dot product from its associated norm.]

(3) Prove that the closure of the unit ball $\Omega(o, 1)$ in a Hilbert space $H$ is compact if and only if $H$ is finite-dimensional.

(4) Give an example of a bounded linear operator $F : H \to H$, where $H$ is a Hilbert space, whose image is not closed.

(5) Prove that the symbol $\|f\|$ defined in 3.5.1 is a norm on the space $L(B, B')$ of continuous linear maps $B \to B'$ for Banach spaces $B, B'$.

(6) Prove the statement of 3.4 in detail.

(7) Let $V$ be a finite-dimensional Hilbert (= inner product) space over $\mathbb{C}$ and let $f : V \to V$ be a Hermitian operator. Define, for $x, y \in V$, $B(x, y) = f(x) \cdot y$. Prove that $B$ is a Hermitian form.

(8) Let $H, J$ be Hilbert spaces. A linear operator $F : H \to J$ is called compact if $F[B]$ is compact where $B = \{x \in H \mid \|x\| \le 1\}$.
(a) Prove that if $F$ is compact then for any bounded closed subset $S \subseteq H$, $F[S]$ is compact.
(b) Prove that a compact operator is always bounded.
(c) An operator $F : H \to J$ between Hilbert spaces is called finite if its image is finite-dimensional. Prove that a finite operator is always compact.
(d) Give an example of a compact operator between Hilbert spaces which is not finite.

(9) Prove that if $F : H \to J$ is a compact linear operator between Hilbert spaces, then there exists an $x \in H$ such that $\|x\| = 1$ and $\|F(x)\| = \|F\| \cdot \|x\|$. [Hint: Consider $y \in F[B]$ to be of maximal norm (note that the norm is continuous and $F[B]$ is compact).]

(10) Let $F : H \to J$ be a compact linear operator where $H$, $J$ are Hilbert spaces.
(a) Prove that there exist orthonormal systems $(e_i)_{i \in \mathbb{N}}$, $(f_i)_{i \in \mathbb{N}}$ in $H$ and $J$ respectively and numbers $s_1 \ge s_2 \ge \cdots$ such that

\[ F(e_n) = s_n f_n \tag{i} \]

and

$F$ is 0 on the orthogonal complement of the closure of the vector subspace generated by $e_1, e_2, \dots$. (ii)

Prove further that the numbers $s_n$ are uniquely determined and that the orthonormal systems $(e_i)$, $(f_i)$ are uniquely determined up to a scalar multiple if $s_1 > s_2 > \cdots$. The numbers $s_i$ are known as singular values of the operator $F$. [Hint: $s_1 = \|F\|$. Use Exercise (9) and pass to orthogonal complements.]

(b) Prove that

\[ \lim_{n \to \infty} s_n = 0. \tag{iii} \]

Conversely, prove that if $F : H \to J$ is an operator which satisfies (i), (ii) and (iii), then $F$ is compact.

(11) A compact linear operator $F : H \to J$ between Hilbert spaces is called trace class if its singular values satisfy

\[ \sum_{n=1}^{\infty} s_n < \infty. \]

Prove that when an operator $F : H \to H$ for a Hilbert space $H$ is trace class, and, for every Hilbert basis $(e_i)_{i \in I}$ of $H$,

\[ F(e_j) = \sum_{i \in I} a_{ij} e_i, \]

then the series $\sum_{i \in I} a_{ii}$ is absolutely convergent and does not depend on the choice of Hilbert basis. This number is denoted by $\mathrm{tr}(F)$ and called the trace of $F$.

(12) A compact linear operator $F : H \to J$ between Hilbert spaces is called Hilbert-Schmidt if its singular values satisfy

\[ \sum_{n=1}^{\infty} (s_n)^2 < \infty. \]

Let $(e_i)_{i \in I}$ be a Hilbert basis of $H$. For two Hilbert-Schmidt operators $F, G : H \to J$, define

\[ F \cdot G = \sum_{i \in I} F(e_i) \cdot G(e_i). \]

Prove that this is a well-defined inner product on the space $HS(H, J)$ of all Hilbert-Schmidt linear operators, and that moreover $HS(H, J)$ with this inner product is a Hilbert space.

(13) Prove that if $L$ is a uniformly convex Banach space and $0 \ne h \in L^*$, then there exists a $z \in L$ such that $\|z\| = 1$ and $h(z) = \|h\|$. [Hint: Choose a sequence $z_n$ in the unit ball of $L$ such that $h(z_n) \to \|h\|$. Uniform convexity implies that it is Cauchy.]

(14) Let $B$ be a Borel set in $\mathbb{R}^n$ such that $\mu(B) > 0$. Prove that $L^1(B)$, $L^1(B,\mathbb{C})$, $L^\infty(B)$, $L^\infty(B,\mathbb{C})$ are not uniformly convex. [Hint: It suffices to consider the "baby" version - see Exercise (20) of Chapter 5.]

(15) Let $F : L \to M$ be a bounded operator where $L, M$ are Banach spaces, and the vector space $M/F[L]$ is finite-dimensional. Prove that then $F[L]$ is closed in $M$. [Hint: There is a finite-dimensional vector space $V$ and an extension $\tilde{F} : L \oplus V \to M$ which is onto, and maps $V$ isomorphically onto $M/F[L]$. Now $\tilde{F}$ is open and the image, under $\tilde{F}$, of the open subset $L \times (V \setminus \{0\})$ is $M \setminus F[L]$.]

17 A Few Applications of Hilbert Spaces

In the previous chapter we developed, with the help of analysis, an understanding of Hilbert (and Banach) spaces as a kind of satisfactory generalization of linear algebra to infinite-dimensional spaces. In particular, we developed modified notions of duals and bases which behave well in this situation.

The real force of Hilbert spaces, however, is that they naturally occur in a variety of contexts. In physics, specifically in quantum mechanics, a (complex) Hilbert space is the basic structure on a state space, which is the fundamental concept of the theory. In this chapter, we will remain in mathematics, and give examples of Hilbert spaces which occur as certain spaces of functions (generally known as $L^2$-spaces). We will then explore two particular roles $L^2$-spaces play.

First, they provide us with a useful technical tool. To illustrate this, we will prove the Radon-Nikodym Theorem on derivatives of measures. We will then apply that result to proving a Lebesgue integral version of the Fundamental Theorem of Calculus, which is ultimately a very satisfactory, but also very difficult theorem. In some sense, this theorem brings the story of the Lebesgue integral, which we used extensively (although often implicitly) throughout this book, to a conclusion.

The second use of $L^2$-spaces, and generally Hilbert spaces, is as a rigorous foundation for modelling intuitive geometric ideas. We will illustrate this on the concepts of Fourier series and the Fourier transformation.

1 Some preliminaries: Integration by a measure

In most of this book, we worked with the Lebesgue integral which we constructed by passing to limits from the Riemann integral. As a result, we obtained a construction of the Lebesgue measure. At this point, however, we need to talk about measures in greater generality. In this section, we summarize the basics of integration theory with respect to more general measures.

1.1

To avoid excessive definitions, we will only consider so-called Borel measures. For the completely abstract concepts of measure and integration, the reader is referred to [18].

Let $X \subseteq \mathbb{R}^n$ be a Borel subset. By a Borel measure on $X$ we shall mean a map $\mu$ which assigns to each Borel subset $S \subseteq X$ a number $\mu(S) \in [0, \infty]$. We require that for disjoint subsets $S_1, S_2, \dots, S_n, \dots$ of $X$, we have

\[ \mu(S_1 \cup S_2 \cup \dots) = \sum_{n=1}^{\infty} \mu(S_n) \]

(a property known as $\sigma$-additivity). Note that when $E_1 \subseteq E_2 \subseteq \dots$, then $\sigma$-additivity, applied to the sets $E_{n+1} \setminus E_n$, implies

\[ \mu(E_n) \nearrow \mu\Big(\bigcup E_n\Big). \]

Example: By Proposition 3.2 and Corollary 3.4.1 of Chapter 4, we know that the Lebesgue measure on $\mathbb{R}^n$ can be considered as a Borel measure on $\mathbb{R}^n$ (if we ignore the fact that it is defined on even more general sets).

1.2 Definition and basic facts about integration of non-negative real functions with respect to a measure

First, by a simple function on $X$ we mean a function expressed in the form

\[ s = \sum_{i=1}^{n} a_i c_{A_i} \tag{1.2.1} \]

where $A_i \subseteq X$ are Borel subsets, and $0 \le a_i < \infty$. We define the integral of a simple function with respect to a Borel measure $\mu$ by

\[ \int_X s\,d\mu = \sum_{i=1}^{n} a_i \mu(A_i). \tag{1.2.2} \]

If $\mu(A_i) = \infty$ and $a_i = 0$, we set the $i$'th summand equal to 0. Note carefully that a priori, the integral of $s$ as defined may depend on the expression (1.2.1). However, it doesn't (see Exercise (1)). Even without knowing that fact, however, we define for a Borel measurable function $f : X \to [0, \infty]$ (recall 4.4 of Chapter 4)

\[ \int_X f\,d\mu = \sup \int_X s\,d\mu \tag{1.2.3} \]

where the supremum is taken over all simple functions (1.2.1) such that $s \le f$.

Note: For a Borel set $B \subseteq X$, $\int_B f\,d\mu$ may be defined simply as the integral of the restriction of $f$ by the restriction of $\mu$ to $B$.
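For a measure given by point masses on a finite set, formula (1.2.2) is a finite computation, and one can observe directly that two different expressions (1.2.1) of the same simple function give the same integral. The sketch below assumes such a discrete measure on $X = \{0, 1, 2, 3\} \subseteq \mathbb{R}$; the weights and the helper names are made up for illustration.

```python
import math

def measure(weights, subset):
    """mu(S) for the measure with point mass weights[x] at each x (math.inf allowed)."""
    return sum(weights[x] for x in subset)

def integrate_simple(terms, weights):
    """Integral (1.2.2) of s = sum a_i * c_{A_i}, given as a list of pairs (a_i, A_i)."""
    total = 0.0
    for a, A in terms:
        m = measure(weights, A)
        # Convention from the text: a summand with a_i = 0 and mu(A_i) = infinity is 0.
        total += 0.0 if a == 0 else a * m
    return total

weights = {0: 0.5, 1: 2.0, 2: 1.0, 3: math.inf}

# Two different expressions (1.2.1) of the same simple function s:
# s(0) = 1, s(1) = 3, s(2) = 2, s(3) = 0.
expr1 = [(1.0, {0, 1, 2}), (2.0, {1}), (1.0, {2}), (0.0, {3})]
expr2 = [(1.0, {0}), (3.0, {1}), (2.0, {2})]
print(integrate_simple(expr1, weights), integrate_simple(expr2, weights))
```

The explicit test `a == 0` is needed because the floating-point product `0.0 * math.inf` is `nan`, not 0.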

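1.2.1 Lemma. For any Borel function $f : X \to [0, \infty]$ there exist simple functions $s_n$ such that $s_n \nearrow f$.

Proof. Put

\[ s_n(x) = k/2^n \]

when $0 \le k \le n2^n$ and $x$ is such that

\[ k/2^n \le f(x) < (k + 1)/2^n, \]

and put $s_n(x) = n$ for the remaining $x$, i.e. when $f(x) \ge n + 1/2^n$. □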
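The dyadic construction in the proof is easy to experiment with. In the sketch below, `dyadic_simple` computes the value $s_n(x)$ from the value $f(x)$; capping at $n$ where $f(x) \ge n$ is the usual way to keep $s_n$ a simple function and is stated here as an assumption, as is the sample function $f(x) = x^2$.

```python
import math

def dyadic_simple(f_value, n):
    """Value s_n(x) of the n-th dyadic simple function, given f_value = f(x) in [0, inf]."""
    if f_value >= n:
        return float(n)                            # cap, so s_n takes finitely many values
    return math.floor(f_value * 2 ** n) / 2 ** n   # k/2^n with k/2^n <= f(x) < (k+1)/2^n

f = lambda x: x * x
for x in (0.3, 1.7, 2.5):
    values = [dyadic_simple(f(x), n) for n in range(1, 8)]
    print(x, values)    # non-decreasing in n and approaching f(x)
```

Once $n$ exceeds $f(x)$, the error $f(x) - s_n(x)$ is below $2^{-n}$, which is the pointwise convergence $s_n \nearrow f$ of the lemma.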

1.2.2 Corollary. When $\mu$ is the Lebesgue measure and $f : X \to [0, \infty]$ is Borel measurable, then (1.2.3) is the Lebesgue integral of $f$ over $X$.

Proof. Use Lemma 1.2.1, the Lebesgue Monotone Convergence Theorem (Theorem 1.1 of Chapter 5), and recall definition 3.1 of Chapter 5. □

1.2.3 Theorem. (the Lebesgue Monotone Convergence Theorem for a Borel measure) Let $f_n \nearrow f$, where $f_n : X \to [0, \infty]$ are Borel-measurable functions. Then

\[ \lim_{n \to \infty} \int_X f_n\,d\mu = \int_X f\,d\mu. \]

Proof. First note that

\[ \int_X f_n\,d\mu \le \int_X f_{n+1}\,d\mu, \]

so the limit makes sense. If $s \le f_n$ is a simple function, then clearly $s \le f$. This implies the $\le$ inequality. For the opposite inequality, let $s \le f$ be a simple function and let $0 < c < 1$. Let $E_n$ be the set of all $x \in X$ such that $c\,s(x) \le f_n(x)$. Clearly, $E_n \subseteq E_{n+1}$, and $\bigcup E_n = X$, so

\[ c \int_X s\,d\mu = \int_X cs\,d\mu = \lim_{n \to \infty} \int_{E_n} cs\,d\mu \le \lim_{n \to \infty} \int_{E_n} f_n\,d\mu \le \lim_{n \to \infty} \int_X f_n\,d\mu. \]

(The second equality follows from $\sigma$-additivity.) Now taking the supremum of the left-hand side of the inequality we just derived over all $0 < c < 1$ and all simple functions $s \le f$, we obtain the $\ge$ inequality of the statement. □


1.2.4 Lemma. When $f, g : X \to [0, \infty]$ are Borel measurable functions, and $c \in [0, \infty)$, we have

\[ \int_X (cf)\,d\mu = c \int_X f\,d\mu, \]

\[ \int_X (f + g)\,d\mu = \int_X f\,d\mu + \int_X g\,d\mu. \]

Proof. The first equality is obvious, since simple functions $\le f$ correspond bijectively to simple functions $\le cf$ by multiplication by $c$. For the second equality, let, by Lemma 1.2.1, $s_n \nearrow f$, $s_n' \nearrow g$ where $s_n$, $s_n'$ are simple functions. We have $s_n + s_n' \nearrow f + g$, so by the Lebesgue Monotone Convergence Theorem,

\[ \int_X (f + g)\,d\mu = \lim \int_X (s_n + s_n')\,d\mu = \lim \int_X s_n\,d\mu + \lim \int_X s_n'\,d\mu = \int_X f\,d\mu + \int_X g\,d\mu. \quad □ \]

1.2.5 Comment

Let $\mu$ be a Borel measure on $X$ and $u : X \to [0, \infty]$ a Borel-measurable function. Then it follows from Lemma 1.2.4 and the Lebesgue Monotone Convergence Theorem that

\[ E \mapsto \int_E u\,d\mu \]

is a Borel measure on $X$; this Borel measure is often denoted by $u\mu$.

1.3 Integration of complex functions over a measure

Let $\mu$ be a Borel measure. A Borel-measurable function $f : X \to \mathbb{C}$ is called $\mu$-integrable if

\[ \int_X |f|\,d\mu < \infty. \]

(This is clearly equivalent to requiring that $\mathrm{Re}(f)^+$, $\mathrm{Im}(f)^+$, $\mathrm{Re}(f)^-$ and $\mathrm{Im}(f)^-$ all have finite integrals.) We then put

\[ \int_X f\,d\mu = \int_X \mathrm{Re}(f)^+\,d\mu - \int_X \mathrm{Re}(f)^-\,d\mu + i\left(\int_X \mathrm{Im}(f)^+\,d\mu - \int_X \mathrm{Im}(f)^-\,d\mu\right). \]

By Proposition 2.7 of Chapter 5, a Borel-measurable function is integrable by the Lebesgue measure if and only if it has a finite Lebesgue integral, and the integral just defined equals its Lebesgue integral.

1.3.1 Lemma. Let $f, g : X \to \mathbb{C}$ be $\mu$-integrable functions, and let $\alpha \in \mathbb{C}$. Then $\alpha f$, $f + g$ are $\mu$-integrable and we have

\[ \int_X \alpha f\,d\mu = \alpha \int_X f\,d\mu, \]

\[ \int_X (f + g)\,d\mu = \int_X f\,d\mu + \int_X g\,d\mu. \]

Proof. The second formula immediately follows from Lemma 1.2.4. To prove the first formula, one first notes that for $\alpha \ge 0$ it follows from Lemma 1.2.4, then one checks it for $\alpha = -1$ and $\alpha = i$, and uses the second formula to pass to the case of $\alpha$ arbitrary. □

1.3.2 Theorem. (the Lebesgue Dominated Convergence Theorem) Suppose $f_n : X \to \mathbb{C}$ are Borel measurable functions and assume that $f_n \to f$, and there exists a $\mu$-integrable function $g : X \to [0, \infty]$ such that for all $n$, $|f_n| \le g$. Then

\[ \lim \int_X f_n\,d\mu = \int_X f\,d\mu. \]

Proof. We have

\[ |f_n - f| \le 2g, \]

so by Fatou's lemma (the proof of Lemma 8.5.1 of Chapter 5 works for any Borel measure),

\[ \int_X 2g\,d\mu \le \liminf_{n \to \infty} \int_X (2g - |f - f_n|)\,d\mu \]
\[ = \liminf_{n \to \infty} \left(\int_X 2g\,d\mu - \int_X |f - f_n|\,d\mu\right) \]
\[ = \int_X 2g\,d\mu - \limsup_{n \to \infty} \int_X |f - f_n|\,d\mu. \]

Subtracting $\int_X 2g\,d\mu$ from both sides, we get $\limsup_{n \to \infty} \int_X |f - f_n|\,d\mu \le 0$, so

\[ \limsup_{n \to \infty} \int_X |f - f_n|\,d\mu = 0 \]

and hence

\[ \lim_{n \to \infty} \int_X |f - f_n|\,d\mu = 0. \]

An analogue of Lemma 8.4.1 of Chapter 5 also holds by the same proof. Therefore,

\[ \lim_{n \to \infty} \int_X (f - f_n)\,d\mu = 0, \]

which implies our statement by Lemma 1.3.1. □

2 The spaces $L^p_\mu(X,\mathbb{C})$ and the Radon-Nikodym Theorem

2.1 The spaces $L^p_\mu(X,\mathbb{C})$

The definition of the spaces $L^p$, $1 \le p \le \infty$, with respect to an arbitrary Borel measure completely parallels the discussion of the case of the Lebesgue measure in Section 8 of Chapter 5. In particular, let $X \subseteq \mathbb{R}^n$ be a Borel subset and let $\mu$ be a Borel measure on $X$. Let, for $1 \le p < \infty$, $L^p_\mu(X,\mathbb{C})$ denote the set of equivalence classes of all Borel-measurable functions $f : X \to \mathbb{C}$ such that

\[ \int_X |f|^p\,d\mu < \infty \tag{2.1.1} \]

with respect to the equivalence relation of being equal almost everywhere (i.e. $f \sim g$ if and only if $\mu(\{x \in X \mid f(x) \ne g(x)\}) = 0$). The relation $\sim$ is a congruence, so $L^p_\mu(X,\mathbb{C})$ inherits a structure of a $\mathbb{C}$-vector space from the set of all functions satisfying (2.1.1). Again, elements of $L^p_\mu(X,\mathbb{C})$ are often (slightly imprecisely but usually harmlessly) identified with their representative functions. Again, we define $\|f\|_p$ to be the $p$'th root of the left-hand side of (2.1.1). For $p = \infty$, we define, again, $\|f\|_\infty$ to be the infimum of the numbers $M$ such that $|f(x)| \le M$ almost everywhere, and we define $L^\infty_\mu(X,\mathbb{C})$ to be the quotient of the vector space of such functions by the congruence of being equal almost everywhere. An analogue of Minkowski's inequality (Theorem 8.2 of Chapter 5) holds by the same proof, thus providing us with a norm on $L^p_\mu(X,\mathbb{C})$. The proof of Theorem 8.5.2 of Chapter 5 extends to the case of Borel measures to prove that the spaces $L^p_\mu(X,\mathbb{C})$ are complete, and hence are Banach spaces. In fact, all the theory of the spaces $L^p(B)$, $L^p(B,\mathbb{C})$ we built up in Chapter 16 extends verbatim to the case of the spaces $L^p_\mu(X)$, $L^p_\mu(X,\mathbb{C})$. In particular, for $1 < p < \infty$, these Banach spaces are uniformly convex and hence are reflexive; we simply didn't want to complicate the discussion in Chapter 16 with unnecessary generality where we didn't need it. (However, see Exercises 6, 7 below.)

It is worthwhile pointing out, though, that the case of Borel measures gives some interesting examples which we haven't seen before: Let $S$ be a countable set with the measure $\mu$ in which every element has measure 1. Then the space $L^p_\mu(S)$ is isometric to the space of all sequences $(a_n)_{n \in \mathbb{N}}$ such that

\[ \sum_{n=1}^{\infty} |a_n|^p < \infty \]

with the norm

\[ \|(a_n)_n\| = \Big(\sum |a_n|^p\Big)^{1/p}. \]

Such spaces are denoted by $\ell^p$ ($\ell^p(\mathbb{C})$ in the complex case).

As before, a special role belongs to the spaces $L^2_\mu(X)$, $L^2_\mu(X,\mathbb{C})$. By the Cauchy-Schwarz inequality, for $f, g \in L^2_\mu(X,\mathbb{C})$,

\[ \int_X f \cdot \overline{g}\,d\mu < \infty, \tag{2.1.2} \]

so the formula (2.1.2) defines an inner product on $L^2_\mu(X,\mathbb{C})$. Since the norm comes from the inner product, $L^2_\mu(X,\mathbb{C})$ with the inner product (2.1.2) is a Hilbert space (and similarly, $L^2_\mu(X)$ is a real Hilbert space).

2.2 The Radon-Nikodym Theorem

Let �, � be Borel measures on a Borel set X � Rn. We say that � is absolutelycontinuous with respect to � if for every Borel set S � X , �.X/ D 0 implies�.X/ D 0.

Theorem. Suppose that �, � are Borel measures on a Borel set X � Rn,�.X/ < 1, �.X/ < 1 and � is absolutely continuous with respect to �. Thenthere exists a Borel measurable function h W X ! Œ0;1/ such that � D h�. (seeComment 1.2.5). The function h is called the Radon-Nikodym derivative of � by �.

Proof. Consider the measure � D �C �. We then have �.X/ < 1 and for everyBorel set S � X , �.S/ � �.S/. Then every function in f 2 L2�.X;C/, f is�-integrable, hence �-integrable. Define

I.f / DZ

X

f d�:

Clearly,

I W L2�.X;C/ ! C

is a C-linear map. We claim that I is continuous. By Theorem 3.5 of Chapter 16,it suffices to prove that there exists a number K such that jjf jj�;2 � 1 implies


jI.f /j � K . Let S D fx 2 X j jf .x/j � 1g.

jI.f /j �Z

S

jf jd�CZ

XXSjf jd�

�Z

S

jjf jj2d�C �.X X S/ � jjf jj2�;2 C �.X X S/ � 1C �.X/:

By the Riesz Representation Theorem 3.6.1 of Chapter 16, there exists a g 2L2�.X;C/ such that for all f 2 L2�.X;C/,

Z

X

f d� DZ

X

f � gd�: (2.2.1)

But we claim that in fact

�.fxj0 � g < 1g/ D �.X/: (2.2.2)

In effect, �.S/ > 0 where S is any of the sets fx 2 X j Im.g/ > 1=ng, fx 2X j Im.g/ < �1=ng, fx 2 X j Re.g/ < �1=ng, fx 2 X j Re.g/ � 1g, wouldviolate (2.2.1) for f D cS (in particular, for S D fx 2 X j Re.g/ � 1g, we wouldget �.S/ � �.S/ D �.S/C �.S/, so �.S/ D 0 which contradicts the assumption�.S/ > 0 by absolute continuity). Thus the above sets S have �.S/ D 0, whichproves (2.2.2). Now rewrite (2.2.1) as

Z

X

f .1 � g/d� DZ

X

fgd�: (2.2.3)

Put $h = g/(1 - g)$ where defined, and $h = 0$ elsewhere. Now let

$$E_n = \{x \in X \mid g(x) < 1 - 1/n\}.$$

Then for a Borel set $S \subseteq E_n$, $f = c_S/(1 - g)$ is bounded, non-negative and Borel-measurable, and hence is in $L^2_\lambda(X,\mathbb{C})$, so applying (2.2.3) gives $\nu(S) = \int_S h \, d\mu$. If $S \subseteq X \setminus \bigcup E_n$, then $\lambda(S) = 0$, hence $\nu(S) = \mu(S) = 0$, so $\nu(S) = \int_S h \, d\mu$. Now any Borel subset of $X$ is a countable union of sets for which the statement was just proved. $\square$
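As an illustration only, the construction in the proof can be traced on a finite set, where measures are just lists of point masses and every integral is a finite sum. The sketch below uses made-up weights `mu` and `nu` (assumptions, not taken from the text), forms $\lambda = \mu + \nu$, takes $g = \nu/\lambda$ pointwise as the density from (2.2.1), and then sets $h = g/(1-g)$ as in the proof.

```python
from fractions import Fraction

# Hypothetical toy data: two finite measures on X = {0, 1, 2, 3}, given by point masses.
# nu is absolutely continuous with respect to mu: nu vanishes wherever mu does (point 3).
mu = [Fraction(1, 2), Fraction(1, 4), Fraction(2), Fraction(0)]
nu = [Fraction(1), Fraction(0), Fraction(1, 3), Fraction(0)]

def radon_nikodym_via_proof(mu, nu):
    """Follow the proof: lam = mu + nu, g = density of nu w.r.t. lam, h = g / (1 - g)."""
    h = []
    for m, n in zip(mu, nu):
        lam = m + n
        # On a lam-null point the value of g is irrelevant; we pick 0.
        g = n / lam if lam != 0 else Fraction(0)
        # As in the proof, h = 0 where g / (1 - g) is not defined.
        h.append(g / (1 - g) if g != 1 else Fraction(0))
    return h

h = radon_nikodym_via_proof(mu, nu)

def measure(weights, subset):
    """Measure of a subset of X for a measure given by point masses."""
    return sum(weights[x] for x in subset)

def integral_h_dmu(subset):
    """Integral of h over the subset with respect to mu (a finite sum here)."""
    return sum(h[x] * mu[x] for x in subset)

print(h)
print(measure(nu, [0, 2]), integral_h_dmu([0, 2]))
```

In this finite setting $h$ is simply the ratio of point masses wherever $\mu$ is positive; the point of the sketch is that the detour through $\lambda$ and $g$ used in the proof produces the same function.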

2.3

The following statement will be useful in the next section.

Lemma. Let $X \subseteq \mathbb{R}^n$ be a Borel set, and let $\mu$, $\nu$ be Borel measures on $X$ such that $\nu(X) < \infty$. Then $\nu$ is absolutely continuous with respect to $\mu$ if and only if for every $\varepsilon > 0$ there exists a $\delta > 0$ such that $\mu(S) < \delta$ implies $\nu(S) < \varepsilon$.


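Proof. Let the $\delta$-$\varepsilon$ condition hold. Then a set $S$ with $\mu(S) = 0$ satisfies the hypothesis for every $\delta$, and hence $\nu(S) < \varepsilon$ for every $\varepsilon > 0$, so $\nu(S) = 0$.

Conversely, let $\nu$ be absolutely continuous with respect to $\mu$. Suppose the $\delta$-$\varepsilon$ condition does not hold, i.e. there exist an $\varepsilon > 0$ and sets $E_i$ with $\mu(E_i) \le 1/2^i$ such that $\nu(E_i) \ge \varepsilon$. Put $A_i = E_i \cup E_{i+1} \cup \cdots$. Then $\nu(A_i) \ge \varepsilon$, $A_i \supseteq A_{i+1}$, $\mu(A_i) \le 1/2^{i-1}$, and hence $\mu(\bigcap A_i) = 0$ while $\nu(\bigcap A_i) \ge \varepsilon$ by $\sigma$-additivity (here we use $\nu(X) < \infty$), which contradicts absolute continuity. $\square$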

3 Application: The Fundamental Theorem of (Lebesgue) Calculus

In this section, we derive an application of the Radon-Nikodym Theorem which is the analogue, for the Lebesgue integral, of the Fundamental Theorem of Calculus, stating, roughly, that the derivative and the integral are inverse operations. We begin with the part about the derivative of the integral. This part does not need the Radon-Nikodym Theorem, but it shows that things are much harder than in the case of the Riemann integral. Throughout this section, we will work with real functions; all statements immediately follow for complex-valued functions by treating them as pairs of real functions.

3.1 Absolute continuity of functions

A function $f : \langle a,b\rangle \to \mathbb{R}$ is called absolutely continuous if for every $\varepsilon > 0$ there exists a $\delta > 0$ such that for any $m$-tuple of non-empty disjoint intervals $\langle a_i,b_i\rangle \subseteq \langle a,b\rangle$, $i = 1,\dots,m$, which satisfy

$$\sum_{i=1}^m (b_i - a_i) < \delta,$$

we have

$$\sum_{i=1}^m |f(b_i) - f(a_i)| < \varepsilon.$$

An absolutely continuous function is clearly continuous (take $m = 1$).

3.2 The derivative of an integral

Consider now the situation when $f : \langle a,b\rangle \to \mathbb{R}$ is a Lebesgue integrable function. (Recall that by Theorem 4.4 of Chapter 5, we may assume that $f$ is Borel measurable.) Now define a function

$$F : \langle a,b\rangle \to \mathbb{R}$$

by

$$F(x) = \int_{\langle a,x\rangle} f.$$

(The integral is with respect to the Lebesgue measure $\mu$.)

Proposition. The function $F : \langle a,b\rangle \to \mathbb{R}$ is absolutely continuous.

Proof. The measure $|f|\mu$ (see Comment 1.2.5) is clearly absolutely continuous with respect to $\mu$. Our statement therefore follows from Lemma 2.3 and Lemma 8.4.1 of Chapter 5. $\square$

Theorem. The function $F$ has a derivative almost everywhere in $\langle a,b\rangle$ and we have $F'(x) = f(x)$ almost everywhere in $\langle a,b\rangle$.

Proof. Recall that for every $\delta > 0$, there exists a continuous function $g : \langle a,b\rangle \to \mathbb{R}$ such that

$$\int_{\langle a,b\rangle} |f - g| < \delta.$$

(By our definition of the Lebesgue integral, we may replace $f$ by a function in $Z^{\mathrm{up}}$, which can then be replaced by a continuous function.) Now our statement is true for $g$ in place of $f$ by the corresponding statement for the Riemann integral (Theorem 8.6 of Chapter 1). Now let us investigate the function

$$h = f - g.$$

Let $\varepsilon > 0$. Let $B$ be the set of all $x \in \langle a,b\rangle$ for which there exists a $t(x) > 0$ with

$$a \le x - t(x) < x + t(x) \le b$$

such that

$$\int_{\langle x - t(x),\, x + t(x)\rangle} |h| > \varepsilon\, t(x).$$

Let $K$ be a compact subset of the open set $B$. Then there exist $x_1,\dots,x_N$ such that

$$\bigcup_{i=1}^N (x_i - t(x_i),\, x_i + t(x_i)) \supseteq K.$$

Note that we may find $i_1 < \cdots < i_m$ such that the intervals

$$(x_{i_j} - t(x_{i_j}),\, x_{i_j} + t(x_{i_j})), \quad j = 1,\dots,m,$$

are disjoint, and

$$\bigcup_{j=1}^m (x_{i_j} - 3t(x_{i_j}),\, x_{i_j} + 3t(x_{i_j})) \supseteq K. \tag{3.2.1}$$

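In fact, assume without loss of generality that

$$t(x_1) \ge \cdots \ge t(x_N).$$

Then it suffices to let $i_{j+1}$ be the smallest number $i > i_j$ such that $(x_i - t(x_i),\, x_i + t(x_i))$ is disjoint from $(x_{i_k} - t(x_{i_k}),\, x_{i_k} + t(x_{i_k}))$ for $k \le j$. By (3.2.1), we see that

$$\mu(K) \le 6 \sum_{j=1}^m t(x_{i_j}) < \frac{6}{\varepsilon} \sum_{j=1}^m \int_{\langle x_{i_j} - t(x_{i_j}),\, x_{i_j} + t(x_{i_j})\rangle} |h| \le \frac{6}{\varepsilon} \int_{\langle a,b\rangle} |h| \le \frac{6\delta}{\varepsilon}.$$

Since $K \subseteq B$ was an arbitrary compact subset, we conclude

$$\mu(B) \le \frac{6\delta}{\varepsilon}.$$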

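Now the point is that for every $\varepsilon > 0$ we can choose $\delta > 0$ such that $\mu(B)$ is arbitrarily small. Let

$$C = \{a \le x \le b \mid |h(x)| \ge \varepsilon\}.$$

Clearly,

$$\mu(C) \le \frac{\delta}{\varepsilon}.$$

However, for

$$x \in \langle a,b\rangle \setminus (B \cup C), \tag{3.2.2}$$

for every $t > 0$ such that $a \le x - t < x + t \le b$, we have for both $J = \langle x - t, x\rangle$ and $J = \langle x, x + t\rangle$

$$\frac{1}{t} \left| \int_J f - \int_J g \right| \le \frac{1}{t} \int_J |h| \le \varepsilon,$$

while $|f(x) - g(x)| < \varepsilon$, and thus

$$\left| \left( \frac{1}{t} \int_J f \right) - f(x) \right| < 3\varepsilon \quad \text{for sufficiently small } t > 0 \tag{3.2.3}$$

for $x$ as in (3.2.2). Now the sets $B$, $C$ depend on $\delta$ and $\varepsilon$, but writing $B = B(\delta,\varepsilon)$, $C = C(\delta,\varepsilon)$, (3.2.3) holds for

$$x \in \langle a,b\rangle \setminus \bigcap_n \bigl( B(1/n,\varepsilon) \cup C(1/n,\varepsilon) \bigr),$$

which is almost everywhere. Since $\varepsilon$ was arbitrary, considering $\varepsilon = 1/k$, $k = 1,2,\dots$, we see that $F'(x) = f(x)$ almost everywhere on $\langle a,b\rangle$, as claimed. $\square$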

3.3 The integral of the derivative

Let us now consider the harder direction, namely the integral of the derivative of a function $F : \langle a,b\rangle \to \mathbb{R}$. By Proposition 3.2, it suffices to consider the case when $F$ is absolutely continuous.

Theorem. Let $F : \langle a,b\rangle \to \mathbb{R}$ be absolutely continuous. Then $F'(x)$ exists almost everywhere and for every $x \in \langle a,b\rangle$,

$$\int_{\langle a,x\rangle} F' = F(x) - F(a).$$

The proof will consist of several steps. First assume that

$$F \text{ is increasing.} \tag{*}$$

We start with

3.3.1 Lemma. Let (*) hold and let $F$ be absolutely continuous. Let $S \subseteq \langle a,b\rangle$ satisfy $\mu(S) = 0$. Then $\mu(F[S]) = 0$.

Proof. Suppose, without loss of generality, $a, b \notin S$. By Exercise (9) of Chapter 5, there exists for every $\delta > 0$ an open set $U \supseteq S$ such that $\mu(U) < \delta$. Then we may express $U$ as a countable disjoint union of open intervals $(a_i, b_i)$, $i = 1, 2, \dots$ (Lemma 5.2.1 of Chapter 2). By the definition of absolute continuity (applied to $i = 1,\dots,n$ and taking a limit with $n \to \infty$) we see that for every $\varepsilon > 0$ there exists a $\delta > 0$ for which $\mu(F[S]) < \varepsilon$. Since $\varepsilon > 0$ was arbitrary, $\mu(F[S]) = 0$, as desired. $\square$

3.3.2 Proof of the Theorem under the hypothesis (*)

Since $F$ is increasing and continuous on a compact interval, $F^{-1}$ is continuous on $F[\langle a,b\rangle]$, hence Borel measurable. Hence, we can define a Borel measure $\nu$ on $\langle a,b\rangle$ by

$$\nu(S) = \mu(F[S]).$$

Further, by the lemma, $\nu$ is absolutely continuous with respect to the Lebesgue measure $\mu$, and hence satisfies the assumptions of the Radon-Nikodym Theorem. Let $h$ be the Radon-Nikodym derivative of $\nu$ by $\mu$. Then applying the statement of Theorem 2.2 to the sets $\langle a,x\rangle$, we get

$$\int_{\langle a,x\rangle} h \, d\mu = F(x) - F(a),$$

as claimed. The fact that $h$ is the derivative of $F$ almost everywhere follows from Theorem 3.2. $\square$

3.3.3 Lemma. Let $F : \langle a,b\rangle \to \mathbb{R}$ be absolutely continuous. Let

$$G(x) = \sup \sum_{i=1}^N |F(t_i) - F(t_{i-1})|$$

where the supremum is over all $N$ and all choices of points

$$a = t_0 < \cdots < t_N = x.$$

Then the functions $G$, $G - F$, $G + F$ are increasing and absolutely continuous. (The function $G$ is called the total variation of the function $F$.)

Proof. Let $a \le y < x \le b$. The supremum in the definition of $G(x)$ clearly will not change if we take it only over such tuples $(t_i)$ which additionally satisfy $t_i = y$ for some $i$. This shows that

$$G(x) - G(y) = \sup \sum_{i=1}^N |F(t_i) - F(t_{i-1})| \tag{*}$$

where the supremum is taken over all

$$y = t_0 < \cdots < t_N = x.$$

Now choose an $\varepsilon > 0$. Then if $F$ satisfies the condition of absolute continuity with a particular $\delta > 0$, (*) (applied to $y = a_i$, $x = b_i$ for each individual $i$ in Definition 3.1) shows that $G$ satisfies the condition of absolute continuity for the same $\delta$.

To show that $G - F$ and $G + F$ are non-decreasing, note that by definition, for $a \le y < x \le b$,

$$G(x) - G(y) \ge |F(x) - F(y)|,$$

and hence

$$G(x) - G(y) \ge \pm(F(x) - F(y)),$$

as required. $\square$
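A small numerical sketch may make the lemma concrete. The helper `variation_on_grid` below is a hypothetical name, and the choice $F = \sin$ on $\langle 0, 2\pi\rangle$ is an assumption made only for illustration. The code computes running sums of $|F(t_i) - F(t_{i-1})|$ along one fixed partition. This is in general only a lower bound for the supremum defining $G$, but for a piecewise monotone $F$ whose turning points lie on the grid the two agree at the grid points.

```python
import math

def variation_on_grid(F, points):
    """Running sums of |F(t_i) - F(t_{i-1})| along the given partition.

    Returns the sampled values of F and the partial sums G, with G[0] = 0.
    """
    values = [F(t) for t in points]
    G = [0.0]
    for prev, cur in zip(values, values[1:]):
        G.append(G[-1] + abs(cur - prev))
    return values, G

# Hypothetical example: F = sin on <0, 2*pi>, uniform grid containing pi/2 and 3*pi/2.
N = 1000
points = [2 * math.pi * i / N for i in range(N + 1)]
values, G = variation_on_grid(math.sin, points)

# The two functions from the lemma, sampled on the grid.
g_minus_f = [g - v for g, v in zip(G, values)]
g_plus_f = [g + v for g, v in zip(G, values)]

print("approximate total variation of sin on <0, 2*pi>:", G[-1])
```

On this grid `G`, `g_minus_f` and `g_plus_f` are all non-decreasing up to rounding, which is the discrete counterpart of the inequality $G(x) - G(y) \ge \pm(F(x) - F(y))$.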

3.3.4 Proof of the Theorem in the general case

Clearly, an $\mathbb{R}$-linear combination of absolutely continuous functions is absolutely continuous. Let $F$ be as in the hypothesis of the theorem. Then the conclusion holds with $F$ replaced by the increasing functions $G + F + x$, $G + x$, and hence, by the linearity of derivatives and integrals, for

$$F = (G + F + x) - (G + x). \qquad \square$$

4 Fourier series and the discrete Fourier transformation

In the preceding sections, we obtained strong theorems (Theorems 2.2 and 3.3) which used the theory of Hilbert spaces in their proofs, but Hilbert spaces were not a part of the final statements. The role of Hilbert spaces in this and the next section is different, namely as a framework in which intuitive statements can be easily made rigorous. Of course, much more can be said on the subjects we touch on here, but what we say is a good example of the role the concept plays, for example, in mathematical physics.

4.1 The discrete Fourier transform (L2-Fourier series)

We begin with an auxiliary result.

4.1.1 The subspace of continuous functions with compact support in $L^p$

Let $U \subseteq \mathbb{R}^n$ be an open set. Recall that the support $\operatorname{supp}(f)$ of a function $f : U \to \mathbb{R}$ is the closure in $U$ of the set of all $x \in U$ such that $f(x) \ne 0$. The set (vector space) of continuous functions on $U$ with compact support is denoted by $C_c(U)$. Similarly, the space of continuous complex functions with compact support on $U$ is denoted by $C_c(U,\mathbb{C})$.

Theorem. Let $U \subseteq \mathbb{R}^n$ be an open set and let $1 \le p < \infty$. Then the set $C_c(U)$ (resp. $C_c(U,\mathbb{C})$) is dense in $L^p(U)$ (resp. $L^p(U,\mathbb{C})$).

Proof. Let us prove the complex case; the real case is analogous. Let $K \subseteq U$ be a compact set. We will first prove that in $L^p(U,\mathbb{C})$,


$$c_K \in \overline{C_c(U,\mathbb{C})} \tag{4.1.1}$$

(recall that $c_K$ is the characteristic function, which has value $1$ on $K$ and $0$ elsewhere). In effect, $K$ is contained in the union of all balls $\Omega(x, \varepsilon_x)$ with $x \in K$, $\varepsilon_x < 1/k$ which are contained in $U$, and hence in finitely many of those balls by compactness. Let $U_k$ be the union of these finitely many open balls. Then by Tietze's Theorem (Theorem 8.5 of Chapter 2), there exists a continuous function $f_k : \mathbb{R}^n \to \langle 0,1\rangle$ such that $f_k(x) = 1$ for $x \in K$, and $f_k(x) = 0$ for $x \notin U_k$. Clearly, $f_k$ has compact support, and for $k$ sufficiently large, $\operatorname{supp}(f_k) \subseteq U$. Then $f_k \searrow c_K$, and we have

$$\lim_{k \to \infty} \|f_k - c_K\|_p = 0$$

by the Lebesgue Dominated Convergence Theorem, which implies (4.1.1).

Next, we claim that (4.1.1) extends to any $F_\sigma$-set $K$ which satisfies $\mu(K) < \infty$: this is because any such set is a union of countably many compact sets $K_n$, and we may assume $K_1 \subseteq K_2 \subseteq \cdots$ and use the fact that by the Lebesgue Dominated Convergence Theorem,

$$\lim_{k \to \infty} \|c_{K_k} - c_K\|_p = 0.$$

Finally, recall from Exercise (8) of Chapter 5 that for every measurable set $S \subseteq U$ with $\mu(S) < \infty$, there exists an $F_\sigma$-set $K \subseteq S$ with $\mu(S \setminus K) = 0$, so in $L^p(U,\mathbb{C})$, $c_S$ is in the closure of $C_c(U,\mathbb{C})$. Consequently, so is any non-negative simple function $s$ with finite integral (which is equivalent to $s^p$ having a finite integral). Now for any $f \ge 0$, $f \in L^p(U,\mathbb{C})$, there are non-negative simple functions $s_n$ with $s_n \nearrow f$. Then

$$\lim_{k \to \infty} \|f - s_k\|_p = 0$$

by Lebesgue's Dominated Convergence Theorem, and hence $f \in \overline{C_c(U,\mathbb{C})}$, which implies the same conclusion about any $f \in L^p(U,\mathbb{C})$ (by considering $\operatorname{Re}(f)^+$, $\operatorname{Re}(f)^-$, $\operatorname{Im}(f)^+$, $\operatorname{Im}(f)^-$). $\square$

4.1.2 Comments

1. Note that unlike our previous results on $L^p$, Theorem 4.1.1 does not readily generalize to an arbitrary Borel measure.
2. Also note that $C_c(U)$ is certainly not dense in $L^\infty(U)$. Since the complement of a measure $0$ set in $U$ is necessarily dense, on $C_c(U)$, $L^\infty$-convergence is uniform convergence, and thus the closure of $C_c(U)$ in $L^\infty(U)$ consists, in particular, of continuous functions.


4.1.3 The Discrete Fourier Transform Theorem

Consider the space $L^2(\langle 0,2\pi\rangle)$ (but we could adapt our arguments to any compact interval of non-zero length, see Exercise (12)). Then by explicit calculation, the functions

$$\frac{1}{\sqrt{2\pi}} e^{inx}, \quad n \in \mathbb{Z} \tag{4.1.2}$$

form an orthonormal system in $L^2(\langle 0,2\pi\rangle)$.

Theorem. The system (4.1.2) forms an orthonormal basis of $L^2(\langle 0,2\pi\rangle)$.

Proof. Consider the space

$$S^1 = \{z \in \mathbb{C} \mid |z| = 1\}$$

with the topology induced by $\mathbb{C}$. Now consider the $\mathbb{R}$-vector subspace $A \subseteq C(S^1,\mathbb{R})$ spanned by the functions $z^n + z^{-n}$, $i(z^n - z^{-n})$, $n \in \mathbb{Z}$. Then $A$ is closed under multiplication, contains a non-zero constant function and separates points, and hence satisfies the hypotheses of the Stone-Weierstrass Theorem 6.4.1 of Chapter 9. Consequently, every continuous function $f : S^1 \to \mathbb{R}$ is a uniform limit of a sequence of elements of $A$. Composing with the map $e^{ix}$, we see that in particular, every continuous function $g : (0,2\pi) \to \mathbb{R}$ with compact support is a uniform limit of functions $g_n$ which are finite linear combinations of the functions $\sin(nx)$, $\cos(nx)$, $n \in \mathbb{Z}$. Therefore, every continuous function $g : (0,2\pi) \to \mathbb{C}$ with compact support is a uniform limit of functions $g_n$ where each $g_n$ is a finite linear combination of the functions $e^{inx}$, $n \in \mathbb{Z}$. By the Lebesgue Dominated Convergence Theorem, a sequence in $L^2(\langle 0,2\pi\rangle,\mathbb{C})$ which converges uniformly converges in $L^2$. Since the functions $g_n$ are (finite) linear combinations of the elements (4.1.2), $g$ is in the closure of the subspace spanned by (4.1.2). Thus, our statement follows from Theorem 4.1.1. $\square$

4.1.4

As already remarked in Section 2.1 above, sometimes one denotes by $\ell^2(\mathbb{C})$ the space $L^2_\mu(\mathbb{Z},\mathbb{C})$ where $\mu$ is the counting measure on $\mathbb{Z}$, i.e. $\mu(S)$ is the number of elements of $S$ when $S$ is finite, and $\mu(S) = \infty$ for $S$ infinite. Then the assignment

$$\sum_{n \in \mathbb{Z}} \frac{a(n)}{\sqrt{2\pi}} e^{inx} \mapsto (a : \mathbb{Z} \to \mathbb{C}) \tag{*}$$

defines an isomorphism

$$L^2(\langle 0,2\pi\rangle,\mathbb{C}) \to \ell^2(\mathbb{C})$$

which is sometimes referred to as the discrete Fourier transformation, and the expression on the left-hand side of (*) of an element $f \in L^2(\langle 0,2\pi\rangle,\mathbb{C})$ is called its Fourier series. Much hard mathematics concerns convergence of Fourier series


in other spaces than $L^2$. Note, however, that by Theorem 4.9 of Chapter 16, we have an expression for the coefficients $a_n$:

$$a_n = \frac{1}{\sqrt{2\pi}} \int_{\langle 0,2\pi\rangle} f(x) e^{-inx}. \tag{**}$$
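Formula (**) can be tried out numerically. The sketch below is an illustration under stated assumptions: the function name `fourier_coefficient`, the midpoint rule standing in for the Lebesgue integral, and the test function $f(x) = x$ are choices made here, not part of the text. For this $f$, integration by parts gives $a_0 = \pi\sqrt{2\pi}$ and $a_n = i\sqrt{2\pi}/n$ for $n \ne 0$, which the numerical values should approach.

```python
import cmath
import math

def fourier_coefficient(f, n, steps=4000):
    """Approximate a_n = (1/sqrt(2*pi)) * integral of f(x) e^{-inx} over <0, 2*pi>.

    The midpoint rule is used as a numerical stand-in for the Lebesgue integral.
    """
    h = 2 * math.pi / steps
    total = 0j
    for k in range(steps):
        x = (k + 0.5) * h
        total += f(x) * cmath.exp(-1j * n * x)
    return total * h / math.sqrt(2 * math.pi)

# Hypothetical test function f(x) = x on <0, 2*pi>.
f = lambda x: x
a0 = fourier_coefficient(f, 0)
a1 = fourier_coefficient(f, 1)
a2 = fourier_coefficient(f, 2)

print("a_0 =", a0, " expected", math.pi * math.sqrt(2 * math.pi))
print("a_1 =", a1, " expected", 1j * math.sqrt(2 * math.pi))
print("a_2 =", a2, " expected", 0.5j * math.sqrt(2 * math.pi))
```

The coefficients decay only like $1/n$ here, which reflects the jump of the periodic extension of $f$ at the endpoints; the sequence is nevertheless square-summable, as membership in $\ell^2(\mathbb{C})$ requires.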

5 The continuous Fourier transformation

5.1 The continuous Fourier transformation formula

While Exercise (15) of the previous Section gives a basis of $L^2(\mathbb{R},\mathbb{C})$, one may ask if there is a more compelling analogue of formula (**) which would apply to $L^2(\mathbb{R},\mathbb{C})$. There is a surprisingly simple answer, namely to apply (**) for a continuous parameter instead of $n \in \mathbb{Z}$, and integrate over all of $\mathbb{R}$, thus obtaining, again, a function on $\mathbb{R}$: Define for a function $f : \mathbb{R} \to \mathbb{C}$ and for $t \in \mathbb{R}$,

$$\hat f(t) = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} f(x) e^{-ixt} \, dx. \tag{5.1.1}$$

(The integral on the right-hand side is the Lebesgue integral; we include the symbol $dx$ to emphasize that we are integrating in the variable $x$.)
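To get a feeling for formula (5.1.1), one can approximate it numerically for a function that is negligible outside a bounded interval. The sketch below is an assumption-laden illustration: `fourier_transform`, the truncation to $\langle -12, 12\rangle$ and the midpoint rule are choices made here. It is applied to $e^{-x^2/2}$, whose Fourier transform is the same function according to Exercise (18).

```python
import cmath
import math

def fourier_transform(f, t, cutoff=12.0, steps=6000):
    """Approximate (1/sqrt(2*pi)) * integral over R of f(x) e^{-ixt} dx.

    Midpoint rule on <-cutoff, cutoff>; only sensible when f is negligible
    outside that interval.
    """
    h = 2 * cutoff / steps
    total = 0j
    for k in range(steps):
        x = -cutoff + (k + 0.5) * h
        total += f(x) * cmath.exp(-1j * x * t)
    return total * h / math.sqrt(2 * math.pi)

gauss = lambda x: math.exp(-x * x / 2)

# Compare the numerical transform with the Gaussian itself at a few points.
samples = [(t, fourier_transform(gauss, t)) for t in (0.0, 0.5, 1.0, 2.0)]
for t, value in samples:
    print(f"t = {t:3.1f}   numerical = {value.real:.10f}   exp(-t^2/2) = {gauss(t):.10f}")
```

The imaginary parts come out at rounding level because the Gaussian is even; for a general integrable $f$ the transform is genuinely complex-valued.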

Despite the simplicity of the generalization, it is immediately visible that the situation will be more complicated than in the case of the discrete Fourier transformation. For example, we cannot expect the formula (5.1.1) to work for every $f \in L^2(\mathbb{R},\mathbb{C})$: in order for (5.1.1) to make sense, $f$ must be integrable. Conversely, suppose (5.1.1) does make sense. Do we have $\hat f \in L^2(\mathbb{R},\mathbb{C})$?

We will answer these questions partially: We will apply the continuous Fourier transform formula (5.1.1) to a certain subspace of functions called "rapidly decreasing functions", and extend it to an isometric isomorphism of Hilbert spaces

$$\mathcal{F} : L^2(\mathbb{R},\mathbb{C}) \cong L^2(\mathbb{R},\mathbb{C}).$$

Again, much deeper and more specific convergence theorems exist, but we will not discuss them in this text.

5.2 Lemma. (The Riemann-Lebesgue lemma) Let $f : \mathbb{R} \to \mathbb{C}$ be an integrable function. Then $\hat f : \mathbb{R} \to \mathbb{C}$ is continuous and we have

$$\lim_{t \to -\infty} \hat f(t) = \lim_{t \to \infty} \hat f(t) = 0.$$


Proof. When $t_n \to t$, then $f(x) e^{-it_n x} \to f(x) e^{-itx}$, while $|f(x) e^{-it_n x}| = |f(x)|$. Thus, $\hat f(t_n) \to \hat f(t)$ by the Lebesgue Dominated Convergence Theorem. This proves continuity.

To prove the limit formula, first consider the case when $f = c_{(a,b\rangle}$, $a < b$: we have

$$\left| \int_{(a,b\rangle} e^{-itx} \, dx \right| = \frac{1}{|t|} \, |e^{-itb} - e^{-ita}|$$

and the right-hand side goes to $0$ with $|t| \to \infty$. By a step function we shall now mean a (finite) $\mathbb{C}$-linear combination of the functions $c_{(a,b\rangle}$ (with varying $a < b$). Then we claim that for every integrable function $f : \mathbb{R} \to \mathbb{C}$ and every $\varepsilon > 0$, there exists a step function $s$ such that

$$\int_{\mathbb{R}} |f - s| < \varepsilon.$$

First, this is true for continuous functions with compact supports (by the convergence of the Riemann integral). Then it is true for non-negative functions in $Z^{\mathrm{dn}}$ and hence for all integrable functions by the Lebesgue Monotone Convergence Theorem and linearity of integrals. But

$$|\hat f(t) - \hat s(t)| \le \int_{\mathbb{R}} |f - s| < \varepsilon,$$

and thus the limit formula for $s$ implies the limit formula for $f$. $\square$
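The first step of the proof is explicit enough to compute. The following sketch (the name `hat_indicator` and the interval $(0, 1\rangle$ are assumptions made for illustration) evaluates the transform (5.1.1) of a characteristic function $c_{(a,b\rangle}$ from the elementary antiderivative of $e^{-itx}$ and compares its modulus with the bound $2/(|t|\sqrt{2\pi})$, which tends to $0$ as $|t| \to \infty$.

```python
import cmath
import math

def hat_indicator(a, b, t):
    """Transform (5.1.1) of the characteristic function of (a, b> at t != 0.

    Uses the antiderivative of e^{-itx}:
    integral from a to b of e^{-itx} dx = (e^{-ita} - e^{-itb}) / (it).
    """
    return (cmath.exp(-1j * t * a) - cmath.exp(-1j * t * b)) / (1j * t * math.sqrt(2 * math.pi))

# Hypothetical interval (0, 1>.
a, b = 0.0, 1.0
for t in (1.0, 10.0, 100.0, 1000.0):
    value = abs(hat_indicator(a, b, t))
    bound = 2 / (abs(t) * math.sqrt(2 * math.pi))
    print(f"t = {t:7.1f}   |transform| = {value:.6f}   bound = {bound:.6f}")
```

For a general integrable $f$ no rate of decay is available; the lemma only asserts that the limit is $0$, obtained by approximating $f$ by step functions as in the proof.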

5.3 Lemma. Let $f : \mathbb{R} \to \mathbb{C}$ be such that both $f(x)$ and $x \cdot f(x)$ are integrable. Then $\hat f(t)$ is differentiable, and

$$\frac{d \hat f}{dt} = \widehat{-ixf(x)}(t).$$

(Note: By the right-hand side, we mean the Fourier transform of $-ixf(x)$, which is a function of $t$.)

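Proof. Under the conditions given, we have

$$\frac{d}{dt} \int_{\mathbb{R}} f(x) e^{-itx} \, dx = \int_{\mathbb{R}} \frac{\partial f(x) e^{-itx}}{\partial t} \, dx = \int_{\mathbb{R}} (-ix) f(x) e^{-itx} \, dx$$

by Theorem 5.2 of Chapter 5 (differentiation under the integral sign). $\square$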

5.4 Lemma. Let $f : \mathbb{R} \to \mathbb{C}$ have a continuous derivative, and assume $f(x)$ and $f'(x)$ are integrable, and that

$$\lim_{x \to \pm\infty} f(x) = 0.$$

Then we have

$$\widehat{f'}(t) = it \hat f(t).$$

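Proof. Compute (omitting the common factor $1/\sqrt{2\pi}$ on both sides):

$$\int_{\mathbb{R}} f'(x) e^{-itx} \, dx = \lim_{a \to \infty} \int_{\langle -a,a\rangle} f'(x) e^{-itx} \, dx = \lim_{a \to \infty} \left( f(a) e^{-ita} - f(-a) e^{ita} + \int_{\langle -a,a\rangle} it f(x) e^{-itx} \, dx \right) = it \int_{\mathbb{R}} f(x) e^{-itx} \, dx.$$

The passages to the limit follow from the Lebesgue Dominated Convergence Theorem. The middle equality is integration by parts (for the Riemann integral). $\square$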

5.5 Lemma. Let $f, g : \mathbb{R} \to \mathbb{C}$ be integrable functions and let $a > 0$. Then

$$\int_{\mathbb{R}} f(ax) \hat g(x) \, dx = \int_{\mathbb{R}} g(ay) \hat f(y) \, dy. \tag{5.5.1}$$

(Again, on both sides, we mean the Lebesgue integral.)

Proof. First note that both sides of (5.5.1) make sense by the Riemann-Lebesgue lemma, since $\hat f$ and $\hat g$ are continuous and bounded. Next, consider the integral

$$\int_{\mathbb{R}^2} f(x) g(t) e^{-itx/a}.$$

Clearly, this integral exists (replace the integrand by $|f(x)| \cdot |g(t)|$), and is equal to both sides of (5.5.1) by Fubini's Theorem and linear substitutions $x/a = u$ and $t/a = v$. $\square$

5.6 Rapidly decreasing functions

A function $f : \mathbb{R} \to \mathbb{C}$ is called rapidly decreasing (or Schwarzian) if $f$ has all derivatives, and for all numbers $m, n = 0, 1, 2, \dots$, we have

$$\lim_{x \to \pm\infty} x^m f^{(n)}(x) = 0.$$

(Note that the term "rapidly decreasing" is a misnomer, since these functions are, in fact, never decreasing.) Note that any smooth function with compact support is rapidly decreasing (since all its derivatives will have, again, compact support). The vector space of all rapidly decreasing functions $f : \mathbb{R} \to \mathbb{C}$ is denoted by $\mathcal{S}$.

Lemma. Let $f \in \mathcal{S}$. Then $\hat f \in \mathcal{S}$.


Proof. By induction, using Lemmas 5.3 and 5.4, $t^m \hat f^{(n)}(t)$ is a (finite) linear combination of functions of the form $\widehat{x^k f^{(\ell)}(x)}(t)$. Use the assumption and the Riemann-Lebesgue lemma. $\square$

5.7 The Fourier Inversion Theorem

Define the inverse Fourier transform $\tilde f$ by

$$\tilde f(t) = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} f(x) e^{itx} \, dx.$$

Then by definition,

$$\tilde f = \overline{\widehat{\bar f}}$$

where $\bar x$ is the complex conjugate of $x$. It follows that the inverse Fourier transform maps $\mathcal{S}$ to $\mathcal{S}$.

Theorem. For $f \in \mathcal{S}$, the inverse Fourier transform of the Fourier transform of $f$ is equal to $f$.

Proof. Let $f, g \in \mathcal{S}$. By the Lebesgue Dominated Convergence Theorem and Lemma 5.6, we may pass to the limit $a \to 0$ in Lemma 5.5, getting

$$f(0) \int_{\mathbb{R}} \hat g = g(0) \int_{\mathbb{R}} \hat f. \tag{5.7.1}$$

Setting

$$g(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2},$$

and using Exercise (15) of Chapter 5, and Exercise (18) below, (5.7.1) becomes

$$f(0) = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} \hat f,$$

which is the special case of the formula we desire at the point $x = 0$. The general case follows from Exercise (16) below. $\square$

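Corollary. For $f, g \in \mathcal{S}$, we have

$$\langle f, g\rangle = \langle \hat f, \hat g\rangle.$$

Proof. We have

$$\langle \hat f, \hat g\rangle = \int_{\mathbb{R}} \hat f \, \overline{\hat g} = \int_{\mathbb{R}} f \, \widehat{\overline{\hat g}} = \int_{\mathbb{R}} f \, \overline{\tilde{\hat g}} = \int_{\mathbb{R}} f \, \bar g = \langle f, g\rangle. \qquad \square$$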

Lemma. $\mathcal{S}$ is dense in $L^2(\mathbb{R},\mathbb{C})$.

Proof. In effect, by Theorem 4.1.1, continuous functions with compact support are dense in $L^2(\mathbb{R},\mathbb{C})$, but we claim that if $f$ is a continuous function with compact support $K$, then there exist a compact $L \supseteq K$ and smooth $f_n$ with $\operatorname{supp}(f_n) \subseteq L$ which converge to $f$ uniformly (and hence in $L^2$). In effect, let $U$ be any open neighborhood of $K$ such that $\overline{U}$ is compact. Let $\varepsilon > 0$. Since $f$ is uniformly continuous, there exists a $\delta > 0$ such that for $x \in K$, $y \in \Omega(x,\delta)$, $|f(x) - f(y)| < \varepsilon$. Further, by compactness, for $\delta > 0$ sufficiently small and all $x \in K$, $\Omega(x,\delta) \subseteq U$. Now choose a smooth partition of unity $u_x$ subordinate to the open cover by the balls $\Omega(x,\delta)$ and $\mathbb{R} \setminus K$, and let

$$g(t) = \sum_{x \in K} u_x(t) f(x).$$

Then $|f(t) - g(t)| < 2\varepsilon$ for all $t \in K$, while $g$ is smooth and $\operatorname{supp}(g) \subseteq U$. $\square$

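5.8 Theorem. The maps $\mathcal{S} \to \mathcal{S}$ given by $f \mapsto \hat f$, $f \mapsto \tilde f$ extend to linear isometries

$$\mathcal{F}, \mathcal{F}^{-1} : L^2(\mathbb{R},\mathbb{C}) \to L^2(\mathbb{R},\mathbb{C})$$

which are inverse to each other.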

Proof. An isometry of inner product spaces is always injective. Thus, by Corollary 5.7, the Fourier transform gives an injective linear map $\mathcal{S} \to \mathcal{S}$, and by Theorem 5.7, it is onto. Hence it is a linear isomorphism, and the inverse Fourier transform is an inverse linear isomorphism. Hence, the inverse Fourier transform is also an isometry (of course, this could also be proved directly).

Now composing either the Fourier transform or the inverse Fourier transform $\mathcal{S} \to \mathcal{S}$ with the inclusion $\mathcal{S} \subseteq L^2(\mathbb{R},\mathbb{C})$, we obtain uniformly continuous maps into a complete metric space, which can therefore be uniquely extended to a uniformly continuous map

$$L^2(\mathbb{R},\mathbb{C}) \to L^2(\mathbb{R},\mathbb{C})$$

by Proposition 4.6 of Chapter 9. These maps are clearly linear isometries by continuity of the inner product and vector space operations, and are inverse to each other by uniqueness of the extension. $\square$

6 Exercises

(1) Prove that the expression (1.2.2) of 1.2 does not depend on the expression of a simple function (1.2.1).

(2) Prove that the function $\mathrm{vol}_g(B)$ on Borel subsets $B$ of a Riemann manifold $M$ from Exercise (5) of Chapter 15 is a Borel measure on $M$.

(3) Prove that for two Riemann metrics $g_1$, $g_2$ on a smooth manifold $M$, the Borel measure $\mathrm{vol}_{g_1}$ is absolutely continuous with respect to the Borel measure $\mathrm{vol}_{g_2}$. Conclude that it makes sense to speak of a measure $0$ set in a smooth manifold, even when we do not specify a Riemann metric.

(4) Extend the Radon-Nikodym Theorem to the case when there exist subsets $X_1, X_2, \dots \subseteq X$ such that $X = \bigcup X_n$ and $\mu(X_n) < \infty$. (The measure $\mu$ on $X$ is then called $\sigma$-finite.) Note that we are keeping the assumption $\nu(X) < \infty$.
[Hint: Apply Theorem 2.2 for each $X_n$ instead of $X$.]

(5) Prove uniqueness in the Radon-Nikodym Theorem, i.e. prove that if two functions $h_1$, $h_2$ in the statement of Theorem 2.2 satisfy the conclusion, then they are equal almost everywhere.

(6) Prove that if $B \subseteq \mathbb{R}^n$ is a Borel set and $\mu$ is a $\sigma$-finite Borel measure on $B$, then there is an isomorphism of Banach spaces $(L^1_\mu(B))^* \cong L^\infty_\mu(B)$, and similarly in the complex case.
[Hint: Extend the Radon-Nikodym Theorem to a situation where instead of the measure $\nu$ we have a continuous linear functional on $L^1_\mu(B)$ under the condition $\mu(X) < \infty$ - the proof is the same! The "Radon-Nikodym derivative" $h$ is the function in $L^\infty$ which we are seeking; Exercise (4) is also relevant. To prove that there is a bound $M$ such that $|h(x)| < M$ almost everywhere, assume for contradiction that $|h(x)| > 2^n$ on a subset $X_n$ of positive measure, $X_n$ disjoint. Then there exists an integrable function $f_n$ with $\int_{X_n} |f_n| \le 1/2^n$, $\int_{X_n} f_n \cdot h = 1$.]

(7) Prove that if $U$ is an open set in $\mathbb{R}^n$, then the spaces $L^1(U)$, $L^1(U,\mathbb{C})$ are not reflexive.
[Hint: Use 4.1.2. Let $V$ be the closure of $C_c(U)$ in $L^\infty(U)$. Prove that there is a non-zero continuous linear form $X$ on $L^\infty(U)$ which is $0$ on $C_c(U)$. Consequently, $X$ cannot come from $L^1(U)$. (Consider Exercise (6).)]

(8) Prove that in Lemma 2.3, the assumption $\nu(X) < \infty$ is needed. Find a counterexample and describe where the proof goes wrong when we omit this condition.

(9) The requirement in Definition 3.1 that the intervals $\langle a_i, b_i\rangle$ be disjoint is needed. Give an example showing that we get a different notion if we drop it.


(10) Prove that while $x^2 \sin(1/x^2)$ has a derivative everywhere, it is not absolutely continuous on $\langle -1,1\rangle$, and thus the Lebesgue integral of its derivative does not exist.

(11) Let $F : \langle a,b\rangle \to \mathbb{R}$ be Lipschitz (see 3.1 of Chapter 6). Prove that $F$ has a derivative almost everywhere.

(12) In analogy with 4.1, find a Hilbert basis of the space $L^2(\langle a,b\rangle)$ for $a < b$.

(13) Using 4.1, find a real orthonormal basis of the real Hilbert space

$$L^2(\langle 0,2\pi\rangle,\mathbb{R}).$$

(14) Let $f : \langle 0,2\pi\rangle \to \mathbb{R}$ be defined by
(a) $f(x) = x$,
(b) $f(x) = 1$ for $0 \le x \le \pi$ and $f(x) = 0$ else.
Compute the Fourier series of $f$.

(15) Prove that the functions $\varphi_{m,n} : \mathbb{R} \to \mathbb{C}$ where $\varphi_{m,n}(x) = \frac{1}{\sqrt{2\pi}} e^{inx}$ when $2\pi m \le x < 2\pi(m+1)$ and $\varphi_{m,n}(x) = 0$ otherwise, form an orthonormal basis of $L^2(\mathbb{R},\mathbb{C})$.

(16) Let $f : \mathbb{R} \to \mathbb{C}$ be an integrable function and let $a \in \mathbb{R}$. Define a function $f_a : \mathbb{R} \to \mathbb{C}$ by $f_a(x) = f(x + a)$. Prove that $\widehat{f_a}(t) = e^{ita} \hat f(t)$.

(17) Define the convolution of functions $f, g : \mathbb{R} \to \mathbb{C}$ by

$$f * g(t) = \int_{\mathbb{R}} f(x) g(t - x) \, dx.$$

Prove that if $f$ and $g$ are integrable then the convolution is well defined, and, with the normalization of (5.1.1), one has

$$\widehat{f * g} = \sqrt{2\pi} \, \hat f \cdot \hat g.$$

[Hint: Use Fubini's Theorem.]

(18) Prove that the function $e^{-x^2/2}$ is rapidly decreasing and that its Fourier transform is the same function.

A Linear Algebra I: Vector Spaces

1 Vector spaces and subspaces

1.1

Let $\mathbb{F}$ be a field (in this book, it will always be either the field of reals $\mathbb{R}$ or the field of complex numbers $\mathbb{C}$). A vector space

$$V = (V, +, o, \alpha \cdot (-)\ (\alpha \in \mathbb{F}))$$

over $\mathbb{F}$ is a set $V$ with a binary operation $+$, a constant $o$ and a collection of unary operations (i.e. maps) $\alpha : V \to V$ labelled by the elements of $\mathbb{F}$, satisfying

(V1) $(x + y) + z = x + (y + z)$,
(V2) $x + y = y + x$,
(V3) $0 \cdot x = o$,
(V4) $\alpha \cdot (\beta \cdot x) = (\alpha\beta) \cdot x$,
(V5) $1 \cdot x = x$,
(V6) $(\alpha + \beta) \cdot x = \alpha \cdot x + \beta \cdot x$, and
(V7) $\alpha \cdot (x + y) = \alpha \cdot x + \alpha \cdot y$.

Here, we write $\alpha \cdot x$ and we will write also $\alpha x$ for the result $\alpha(x)$ of the unary operation $\alpha$ in $x$. Often, one uses the expression "multiplication of $x$ by $\alpha$"; but it is useful to keep in mind that what we really have is a collection of unary operations (see also 5.1 below). The elements of a vector space are often referred to as vectors. In contrast, the elements of the field $\mathbb{F}$ are then often referred to as scalars.

In view of this, it is useful to reflect for a moment on the true meaning of the axioms (equalities) above. For instance, (V4), often referred to as the "associative law", in fact states that the composition of the functions $V \to V$ labelled by $\beta$, $\alpha$ is labelled by the product $\alpha\beta$ in $\mathbb{F}$; the "distributive law" (V6) states that the (pointwise) sum of the mappings labelled by $\alpha$ and $\beta$ is labelled by the sum $\alpha + \beta$ in $\mathbb{F}$; and (V7) states that each of the maps $\alpha$ preserves the sum $+$. See Example 3 in 1.2.

I. Kriz and A. Pultr, Introduction to Mathematical Analysis,DOI 10.1007/978-3-0348-0636-7, © Springer Basel 2013



1.2 Examples

Vector spaces are ubiquitous. We present just a few examples; the reader will certainly be able to think of many more.

1. The $n$-dimensional row vector space $\mathbb{F}^n$. The elements of $\mathbb{F}^n$ are the $n$-tuples $(x_1,\dots,x_n)$ with $x_i \in \mathbb{F}$, the addition is given by

$$(x_1,\dots,x_n) + (y_1,\dots,y_n) = (x_1 + y_1,\dots,x_n + y_n),$$

$o = (0,\dots,0)$, and the $\alpha$'s operate by the rule

$$\alpha((x_1,\dots,x_n)) = (\alpha x_1,\dots,\alpha x_n).$$

Note that $\mathbb{F}^1$ can be viewed as $\mathbb{F}$ itself. However, although the operations $\alpha \cdot (-)$ come from the binary multiplication in $\mathbb{F}$, their role in a vector space is different. See 5.1 below.

2. Spaces of real functions. The set $F(M)$ of all real functions on a set $M$, with pointwise addition and multiplication by real numbers, is obviously a vector space over $\mathbb{R}$. Similarly, we have the vector space $C(J)$ of all the continuous functions on an interval $J$, or e.g. the space $C^1(J)$ of all continuously differentiable functions on an open interval $J$, or the space $C^\infty(J)$ of all smooth functions on $J$, i.e. functions which have all higher derivatives. There are also analogous $\mathbb{C}$-vector spaces of complex functions.

3. Let $V$ be the set of positive reals. Define $x \oplus y = xy$, $o = 1$, and for arbitrary $\alpha \in \mathbb{R}$, $\alpha \cdot x = x^\alpha$. Then $(V, \oplus, o, \alpha \cdot (-)\ (\alpha \in \mathbb{R}))$ is a vector space (see Exercise (1)).
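Example 3 lends itself to a quick sanity check. The sketch below is only a randomized numerical spot check, not a proof (the proof is the subject of Exercise (1)); the helper names `add`, `scale` and `check_axioms` are made up for this illustration. It tests the axioms (V1) to (V7) for the positive reals with $x \oplus y = xy$, $o = 1$ and $\alpha \cdot x = x^\alpha$, up to floating-point tolerance.

```python
import math
import random

# Example 3: V = positive reals, "addition" is multiplication, the zero vector is 1,
# and the scalar alpha acts by x -> x**alpha.
def add(x, y):
    return x * y

def scale(alpha, x):
    return x ** alpha

ZERO = 1.0

def close(u, v):
    return math.isclose(u, v, rel_tol=1e-9)

def check_axioms(x, y, z, alpha, beta):
    """Return True if (V1)-(V7) hold for these sample values (up to rounding)."""
    return all([
        close(add(add(x, y), z), add(x, add(y, z))),                            # (V1)
        close(add(x, y), add(y, x)),                                            # (V2)
        close(scale(0.0, x), ZERO),                                             # (V3)
        close(scale(alpha, scale(beta, x)), scale(alpha * beta, x)),            # (V4)
        close(scale(1.0, x), x),                                                # (V5)
        close(scale(alpha + beta, x), add(scale(alpha, x), scale(beta, x))),    # (V6)
        close(scale(alpha, add(x, y)), add(scale(alpha, x), scale(alpha, y))),  # (V7)
    ])

random.seed(0)
results = [
    check_axioms(
        random.uniform(0.1, 10.0), random.uniform(0.1, 10.0), random.uniform(0.1, 10.0),
        random.uniform(-3.0, 3.0), random.uniform(-3.0, 3.0),
    )
    for _ in range(1000)
]
print("all sampled axiom checks passed:", all(results))
```

Behind the check is the observation that the logarithm carries this structure to the usual vector space $\mathbb{R}$: $\ln(xy) = \ln x + \ln y$ and $\ln(x^\alpha) = \alpha \ln x$.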

1.3 An important convention

We have distinguished above the elements of the vector space and the elements of the field by using roman and greek letters. This is a good convention for a definition, but in the row vector spaces $\mathbb{F}^n$, which will play a particular role below, it is somewhat clumsy. Instead, we will use for an arithmetic vector a bold-faced variant of the letter denoting the coordinates. Thus,

$$\mathbf{x} = (x_1,\dots,x_n), \quad \mathbf{a} = (a_1,\dots,a_n), \quad \text{etc.}$$

Similarly we will write

$$\mathbf{f} = (f_1,\dots,f_n)$$

for the $n$-tuple of functions $f_j : X \to \mathbb{R}$ resp. $\mathbb{C}$ (after all, they can be viewed as mappings $\mathbf{f} : X \to \mathbb{F}^n$), and similarly.


These conventions make reading about vectors much easier, and we will maintain them as long as possible (for example in our discussion of multivariable differential calculus in Chapter 3). The fact is, however, that in certain more advanced settings the conventions become cumbersome or even ambiguous (for example in the context of tensor calculus in Chapter 15), and because of this, in the later chapters of this book we eventually abandon them, as one usually does in more advanced topics of analysis.

We do, however, use the symbol $o$ universally for the zero element of a general vector space, so that in $\mathbb{F}^n$ we have $o = (0, 0, \dots, 0)$.

1.4

We have the following trivial

Observation. In any vector space $V$, for all $x \in V$, we have $x + o = x$ and there exists precisely one $y$ such that $x + y = o$, namely $y = (-1)x$.

(Indeed, $x + o = 1 \cdot x + 0 \cdot x = (1 + 0)x = x$ and $x + (-1)x = 1x + (-1)x = (1 + (-1))x = 0 \cdot x = o$, and if $x + y = o$ and $x + z = o$ then $y = y + (x + z) = (y + x) + z = z$.)

1.5 (Vector) subspaces

A subspace of a vector space $V$ is a subset $W \subseteq V$ that is itself a vector space with the operations inherited from $V$. Since the equations required in $V$ hold for special as well as general elements, we have a trivial

Observation. A subset $W \subseteq V$ of a vector space is a subspace if and only if
(a) $o \in W$,
(b) if $x, y \in W$ then $x + y \in W$, and
(c) for all $\alpha \in \mathbb{F}$ and $x \in W$, $\alpha x \in W$.

1.5.1

Also the following statement is immediate.

Proposition. The intersection of an arbitrary set of subspaces of a vector space $V$ is a subspace of $V$.

1.6 Generating sets

By 1.5.1, we see that for each subset $M$ of $V$ there exists the smallest subspace $W \subseteq V$ containing $M$, namely

$$L(M) = \bigcap \{W \mid W \text{ subspace of } V \text{ and } M \subseteq W\}.$$

For $M$ finite, we use the notation

$$L(u_1,\dots,u_n) \text{ instead of } L(\{u_1,\dots,u_n\}).$$

Obviously $L(\emptyset) = \{o\}$.

We say that $M$ generates $L(M)$; in particular if $L(M) = V$ we say that $M$ is a generating set (of $V$). One often speaks of a set of generators but we have to keep in mind that this does not imply each of its elements generates $V$, which would be a much stronger statement.

If there exists a finite generating system we say that $V$ is finitely generated, or finite-dimensional.

1.7 The sum of subspaces

Let $W_1, W_2$ be subspaces. Unlike the intersection $W_1 \cap W_2$, the union $W_1 \cup W_2$ is generally (and typically) not a subspace. But we have the smallest subspace containing both $W_1$ and $W_2$, namely $L(W_1 \cup W_2)$. It will be denoted by

$$W_1 + W_2$$

and called the sum of $W_1$ and $W_2$. (One often uses the symbol '$\oplus$' instead of '$+$' when one also has $W_1 \cap W_2 = \{o\}$.)

2 Linear combinations, linear independence

2.1

A linear combination of a system $x_1,\dots,x_n$ of elements of a vector space $V$ over $\mathbb{F}$ is a formula

$$\alpha_1 x_1 + \cdots + \alpha_n x_n \quad \Bigl(\text{briefly, } \sum_{j=1}^n \alpha_j x_j\Bigr). \tag{*}$$

The "system" in question is to be understood as the sequence, although the order in which it is presented will play no role. However, a possible repetition of an individual element is essential.

Note that we spoke of (*) as of a "formula". That is, we had in mind the full information involved (more pedantically, we could speak of the linear combination as of the sequence together with the mapping $\{1,\dots,n\} \to \mathbb{F}$ sending $j$ to $\alpha_j$). The vector obtained as the result of the indicated operations should be referred to as the result of the linear combination (*). We will follow this convention consistently


to begin with; later, we will speak of a linear combination $\sum_{j=1}^n \alpha_j x_j$ more loosely, trusting that the reader will be able to tell from the context whether we will mean the explicit formula or its result.

2.2

A linear combination (*) is said to be non-trivial if at least one of the $\alpha_j$ is non-zero.

A system $x_1,\dots,x_n$ is linearly dependent if there exists a non-trivial linear combination (*) with result $o$. Otherwise, we speak of a linearly independent system.

2.2.1 Proposition.
1. If $x_1, \dots, x_n$ is linearly dependent resp. independent then for any permutation $\pi$ of $\{1, \dots, n\}$ the system $x_{\pi(1)}, \dots, x_{\pi(n)}$ is linearly dependent resp. independent.
2. A subsystem of a linearly independent system is linearly independent.
3. Let $\beta_2, \dots, \beta_n$ be arbitrary. Then $x_1, \dots, x_n$ is linearly independent if and only if the system $x_1 + \sum_{j=2}^{n} \beta_j x_j,\ x_2, \dots, x_n$ is.
4. A system $x_1, \dots, x_n$ is linearly dependent if and only if one of its members is a (result of a) linear combination of the others.

In particular, any system containing $o$ is linearly dependent. Similarly, if there exist $j \ne k$ such that $x_j = x_k$ then $x_1, \dots, x_n$ is linearly dependent.

Proof. 1. is trivial.

2. A non-trivial linear combination demonstrating the dependence of the smaller system demonstrates the dependence of the bigger one if we put $\alpha_j = 0$ for the remaining summands.

3. It suffices to prove one implication; the other follows by symmetry, since the first system can be obtained from the second by using the coefficients $-\beta_j$. Thus, let $\alpha_1\bigl(x_1 + \sum_{j=2}^{n} \beta_j x_j\bigr) + \alpha_2 x_2 + \cdots + \alpha_n x_n = o$ with an $\alpha_k \ne 0$. Then we have

$$\alpha_1 x_1 + (\alpha_2 + \alpha_1\beta_2) x_2 + \cdots + (\alpha_n + \alpha_1\beta_n) x_n = o$$

and it is a non-trivial linear combination of $x_1, \dots, x_n$: indeed, either $\alpha_1 \ne 0$ or $(\alpha_k + \alpha_1\beta_k) = \alpha_k \ne 0$.

4. If $\alpha_1 x_1 + \cdots + \alpha_n x_n = o$ (briefly, $\sum_{j=1}^{n} \alpha_j x_j = o$) with $\alpha_k \ne 0$ then $x_k = \sum_{j \ne k} \frac{-\alpha_j}{\alpha_k}\, x_j$. On the other hand, if $x_k = \sum_{j \ne k} \alpha_j x_j$ we have the non-trivial linear combination $x_k + \sum_{j \ne k} (-\alpha_j) x_j = o$. $\square$
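For vectors with rational entries, the definition in 2.2 can be tested mechanically. The following Python sketch is an illustration only: the function name `is_linearly_independent` is hypothetical, and it relies on row elimination (a technique treated only later, in Appendix B) rather than on anything proved in this section.

```python
from fractions import Fraction

def is_linearly_independent(vectors):
    """Return True if the given rational vectors form a linearly independent system."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    if not rows:
        return True  # the empty system is formally independent
    rank = 0
    for col in range(len(rows[0])):
        # look for a row at or below position `rank` with a non-zero entry in this column
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    # the system is independent exactly when no row was eliminated to zero
    return rank == len(rows)
```

In line with 2.2.1, a system containing the zero vector or a repeated vector is reported as dependent, and replacing $x_1$ by $x_1 + \beta_2 x_2$ does not change the answer.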


2.3 Conventions

We speak of a linearly independent finite set $X \subseteq V$ if $X$ is independent when ordered as a sequence without repetition. A general subset $X \subseteq V$ is said to be independent if each of its finite subsets is independent.

2.4 Theorem. Let $M$ be an arbitrary subset of a vector space $V$. Then $L(M)$ is the set of all the (results of) linear combinations of finite subsystems of $M$.

Proof. The set of all such results of linear combinations is obviously a subspace of $V$. On the other hand, a subspace $W$ containing $M$ has to contain all the (results of) linear combinations of elements of $M$. $\square$

2.5 Proposition. $L(u_1, \dots, u_n) \subseteq L(v_1, \dots, v_k)$ if and only if each of the $u_j$'s is a linear combination of $v_1, \dots, v_k$.

Proof. If it is, the inclusion follows from 2.4 since $L(u_1, \dots, u_n)$ is the smallest subspace containing all the $u_j$; if we have the inclusion then the $u_j$'s are the desired linear combinations, again by 2.4. $\square$

2.6 Theorem. (Steinitz' Theorem, or The Exchange Theorem) Let $v_1, \dots, v_k$ be a linearly independent system in a vector space $V$ and let $\{u_1, \dots, u_n\}$ be a generating set. Then
(1) $k \le n$, and
(2) there exists a bijection $\phi\colon \{1, \dots, n\} \to \{1, \dots, n\}$ (i.e. a permutation of the set $\{1, \dots, n\}$) such that

$$\{v_1, \dots, v_k, u_{\phi(k+1)}, \dots, u_{\phi(n)}\}$$

is a generating set.

Proof. By induction.

If $k = 1$ we have $v_1 = \sum_{j=1}^{n} \alpha_j u_j$ and since $v_1 \ne o$ by 2.2, there exists at least one $u_{j_0}$ with $\alpha_{j_0} \ne 0$. Now

$$u_{j_0} = \frac{1}{\alpha_{j_0}}\, v_1 + \sum_{j \ne j_0} \frac{-\alpha_j}{\alpha_{j_0}}\, u_j$$

and we have, by 2.5,

$$L(v_1, u_1, \dots, u_{j_0-1}, u_{j_0+1}, \dots, u_n) = L(u_1, \dots, u_n) = V.$$

Rearrange the $u_j$ by exchanging $u_1$ with $u_{j_0}$.


Now let the statement hold for $k$ and let us have a linearly independent system $v_1, \dots, v_k, v_{k+1}$. Then $v_1, \dots, v_k$ is linearly independent and we have, after a rearrangement of the $u_j$,

$$L(v_1, \dots, v_k, u_{k+1}, \dots, u_n) = V.$$

Since $v_{k+1} \in V$ we have

$$v_{k+1} = \sum_{j=1}^{k} \alpha_j v_j + \sum_{j=k+1}^{n} \alpha_j u_j.$$

We cannot have all the $\alpha_j$ with $j > k$ equal to zero: since $v_1, \dots, v_k, v_{k+1}$ are independent, this would contradict 2.2.1 4. Thus, $\alpha_{j_0} \ne 0$ for some $j_0 > k$ and hence, first,

$$n \ge k + 1,$$

and, second, after rearranging the $u_j$'s to exchange $u_{j_0}$ with $u_{k+1}$ we obtain

$$\frac{1}{\alpha_{k+1}}\, v_{k+1} = \sum_{j=1}^{k} \frac{\alpha_j}{\alpha_{k+1}}\, v_j + u_{k+1} + \sum_{j=k+2}^{n} \frac{\alpha_j}{\alpha_{k+1}}\, u_j,$$

and hence

$$u_{k+1} = \sum_{j=1}^{k} \frac{-\alpha_j}{\alpha_{k+1}}\, v_j + \frac{1}{\alpha_{k+1}}\, v_{k+1} + \sum_{j=k+2}^{n} \frac{-\alpha_j}{\alpha_{k+1}}\, u_j,$$

and $L(v_1, \dots, v_k, v_{k+1}, u_{k+2}, \dots, u_n) = L(u_1, \dots, u_n) = V$ by 2.5 again. $\square$

3 Basis and dimension

3.1

We have observed a somewhat complementary behaviour of generating sets and independent systems: the former remain a generating set if more elements are added, the latter remain independent if some elements are deleted. This suggests the importance of minimal generating sets and maximal independent ones. We will see they are, basically, the same. The resulting concept is of fundamental importance in linear algebra.

A basis of a vector space $V$ is a subset that is both generating and linearly independent.


3.1.1 Observation. In a vector space $V$,

(1) if $u_1, \dots, u_n$ is a generating set then each $x$ can be written as $x = \sum_{j=1}^{n} \alpha_j u_j$,
(2) if $u_1, \dots, u_n$ is linearly independent then each $x$ can be written in at most one way as $x = \sum_{j=1}^{n} \alpha_j u_j$,
(3) if $u_1, \dots, u_n$ is a basis then each $x$ can be written in precisely one way as $x = \sum_{j=1}^{n} \alpha_j u_j$.

((1) is in 2.4; as for (2), if $\sum_{j=1}^{n} \alpha_j u_j = \sum_{j=1}^{n} \beta_j u_j$ then $\sum_{j=1}^{n} (\alpha_j - \beta_j) u_j = o$ and $\alpha_j - \beta_j = 0$; (3) is a combination of (1) and (2).)

3.2 Theorem.
1. Every (finite) generating system $u_1, \dots, u_n$ contains a basis.
2. Every linearly independent system $v_1, \dots, v_k$ of a finitely generated vector space can be extended to a basis.
3. All bases of a finitely generated vector space have the same number of elements.

Proof. 1. If $u_1, \dots, u_n$ are linearly independent we already have a basis. Otherwise there is, by 2.2.1 4, an element $u_j$, say $u_n$ (which we can achieve by rearrangement), that is a linear combination of the others. Then by 2.5, $L(u_1, \dots, u_{n-1}) = L(u_1, \dots, u_n) = V$ and we can repeat the procedure with the generating system $u_1, \dots, u_{n-1}$. After repeating the procedure sufficiently many times we finish with a generating system $u_1, \dots, u_k$ that is linearly independent. (Note that this last system can be empty if the preceding system $u_1$ consisted of $u_1 = o$ only; the empty system is formally independent, and constitutes a basis of the trivial vector space $\{o\}$.)

2. From 1 we already know that $V$ has a basis $u_1, \dots, u_n$ and from 2.6 we infer that after rearrangement we have a generating system

$$v_1, \dots, v_k, u_{k+1}, \dots, u_n \tag{*}$$

and this, by 1 again, has to contain a basis. But this basis cannot be a proper subset of (*), by 2.6, since there exists an independent system $u_1, \dots, u_n$.

3. If $u_1, \dots, u_n$ and $v_1, \dots, v_k$ are bases then by 2.6, $k \le n$ and $n \le k$. $\square$

3.3

The common cardinality of all bases of a finitely generated vector space $V$ is called the dimension of $V$ and denoted by

$$\dim V.$$


From 2.6 and 3.2 we immediately obtain

Corollary. Let $\dim V = n$. Then
1. every generating system $u_1, \dots, u_n$ is a basis, and
2. every linearly independent system $u_1, \dots, u_n$ is a basis.

3.4 Theorem. A subspace $W$ of a finitely generated vector space $V$ is finitely generated and we have $\dim W \le \dim V$. If $\dim W = \dim V$ then $W = V$.

Proof. We just have to show that $W$ is finitely generated; the other statements are consequences of the already proved facts (since a basis of $W$ is a linearly independent system in $V$). Suppose $W$ is not finitely generated. Then, first, it contains a non-zero element $u_1$. Suppose we have already found a linearly independent system $u_1, \dots, u_n$ in $W$. Since $W \ne L(u_1, \dots, u_n)$ there exists a $u_{n+1} \in W \setminus L(u_1, \dots, u_n)$. Then, by 2.2.1 4, $u_1, \dots, u_n, u_{n+1}$ is linearly independent, and we can construct inductively an arbitrarily large independent system, contradicting 2.6. $\square$

3.5 Remark

We have learned that every finitely generated vector space has a basis. In fact, one can easily prove, using Zorn's lemma, that every vector space has one. Indeed, let

$$\{I_j \mid j \in J\}$$

be a chain of independent subsets of $V$. Then $I = \bigcup \{I_j \mid j \in J\}$ is an independent set again, since any finite subset $M = \{x_1, \dots, x_n\} \subseteq I$ is independent: if $x_k \in I_{j_k}$ then $M \subseteq I_r$, the largest of the $I_{j_k}$, $k = 1, \dots, n$. Thus there exists a maximal independent set $B$ and this $B$ is a basis: if there were $x \notin L(B)$ we would have $\{x\} \cup B$ independent, by 2.2.1 4, contradicting the maximality.

Recall the sum of subspaces from 1.7. We have

3.6 Theorem. Let $W_1, W_2$ be finitely generated subspaces of a vector space $V$. Then

$$\dim W_1 + \dim W_2 = \dim(W_1 \cap W_2) + \dim(W_1 + W_2).$$

Proof. Consider a basis $u_1, \dots, u_k$ of $W_1 \cap W_2$. By 3.2, there exist bases

$$u_1, \dots, u_k, v_{k+1}, \dots, v_r \ \text{ of } W_1, \quad\text{and}$$

$$u_1, \dots, u_k, w_{k+1}, \dots, w_s \ \text{ of } W_2.$$

Then the system

$$u_1, \dots, u_k, v_{k+1}, \dots, v_r, w_{k+1}, \dots, w_s$$

obviously generates $W_1 + W_2$ and hence our statement will follow if we prove that it is linearly independent (and hence a basis), since then $\dim(W_1 + W_2) = r + s - k$.

To this end, let

$$\sum_{j=1}^{k} \alpha_j u_j + \sum_{j=k+1}^{r} \beta_j v_j + \sum_{j=k+1}^{s} \gamma_j w_j = o.$$

Then we have

$$\sum_{j=k+1}^{r} \beta_j v_j = -\sum_{j=1}^{k} \alpha_j u_j - \sum_{j=k+1}^{s} \gamma_j w_j \in W_1 \cap W_2$$

and since it also can be written as $\sum_{j=1}^{k} \delta_j u_j$, all the $\beta_j$ are zero, by 3.1.1.

Consequently,

$$\sum_{j=1}^{k} \alpha_j u_j + \sum_{j=k+1}^{s} \gamma_j w_j = o$$

and since $u_1, \dots, u_k, w_{k+1}, \dots, w_s$ is a basis, also all the $\alpha_j$ and $\gamma_j$ are zero. $\square$
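As a quick sanity check of the formula in 3.6, here is a small worked instance. The choice of subspaces in $\mathbb{R}^3$ is an illustrative assumption and not part of the original text.

```latex
\begin{gather*}
W_1 = L\bigl((1,0,0),\,(0,1,0)\bigr), \qquad
W_2 = L\bigl((0,1,0),\,(0,0,1)\bigr) \subseteq \mathbb{R}^3, \\
W_1 \cap W_2 = L\bigl((0,1,0)\bigr), \qquad W_1 + W_2 = \mathbb{R}^3, \\
\dim W_1 + \dim W_2 = 2 + 2 = 4 = 1 + 3 = \dim(W_1 \cap W_2) + \dim(W_1 + W_2).
\end{gather*}
```

In the notation of the proof, $k = 1$, $r = s = 2$, and the combined system has $r + s - k = 3$ elements.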

4 Inner products and orthogonality

4.1

In this section, it is important that we work with vector spaces over $\mathbb{R}$ or $\mathbb{C}$. Since all the formulas in the real context will be special cases of the respective complex ones, the proofs will be done in $\mathbb{C}$.

Recall the complex conjugate $\bar{z} = z_1 - i z_2$ of $z = z_1 + i z_2$, the formulas $\overline{z + z'} = \bar{z} + \bar{z}'$ and $\overline{z \cdot z'} = \bar{z} \cdot \bar{z}'$, the absolute value $|z| = \sqrt{z \bar{z}}$, and realize that for a real $z$ this absolute value is the standard one.

4.2

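An inner product in a vector space $V$ over $\mathbb{C}$ resp. $\mathbb{R}$ is a mapping

$$((x, y) \mapsto x \cdot y)\colon V \times V \to \mathbb{C} \ \text{ resp. } \ \mathbb{R}$$

such that
(1) $u \cdot u \ge 0$ (in particular always real), and $u \cdot u = 0$ only if $u = o$,
(2) $u \cdot v = \overline{v \cdot u}$ ($u \cdot v = v \cdot u$ in the real case),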


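(3) $(\alpha u) \cdot v = \alpha (u \cdot v)$, and
(4) $u \cdot (v + w) = u \cdot v + u \cdot w$.

We usually write simply $uv$ for $u \cdot v$, and $u^2$ for $uu$. Note that

$$u(\alpha v) = \overline{(\alpha v)u} = \overline{\alpha (vu)} = \bar{\alpha}\,\overline{(vu)} = \bar{\alpha}(uv)$$

and, similarly using the complex conjugate twice,

$$(v + w)u = vu + wu.$$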

Remark: The notation for an inner product sometimes varies. The most common alternate notation to $x \cdot y$ is $\langle x, y \rangle$ (although one must beware of possible confusion with our notation for closed intervals). The notation is particularly convenient when we want to express the dependence of the product on some other data, such as a matrix (see Section 7.7 below).

Further, we introduce the norm

$$\|u\| = \sqrt{uu}.$$

4.3 An important example

In the row vector space we will use, without further mention, the inner product

$$x \cdot y = \sum_{j=1}^{n} x_j \overline{y_j} \quad \Bigl(\text{in the real case } x \cdot y = \sum_{j=1}^{n} x_j y_j\Bigr)$$

(see Exercise (2)). This specific example of an inner product is sometimes referred to as the dot product.

4.4 Theorem. (The Cauchy-Schwarz inequality) We have $|xy| \le \sqrt{xx} \cdot \sqrt{yy}$.

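Proof. We have

$$0 \le (x + \lambda y)(x + \lambda y) = xx + (\lambda y)x + x(\lambda y) + (\lambda y)(\lambda y) = xx + \lambda (yx) + \bar{\lambda}(xy) + \lambda\bar{\lambda}(yy). \tag{*}$$

If $y = o$ then the inequality in the statement holds trivially. Else set

$$\lambda = -\frac{xy}{yy}$$

to obtain from (*)

$$0 \le xx - \frac{xy}{yy}(yx) - \frac{yx}{yy}(xy) + \frac{(xy)(yx)}{(yy)(yy)}(yy) = xx - \frac{xy}{yy}(yx)$$

and hence $(xy)\overline{(xy)} = (xy)(yx) \le (xx)(yy)$. Take square roots. $\square$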
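The inequality can be checked numerically for the dot product of 4.3. The sketch below is only an illustration: the helper names `inner` and `norm` and the sample vectors are assumptions, not part of the text.

```python
import math

def inner(x, y):
    """Dot product on C^n as in 4.3: sum of x_j times the conjugate of y_j."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def norm(x):
    """The norm ||x|| = sqrt(xx); xx is real, so only its real part is used."""
    return math.sqrt(inner(x, x).real)

x = [1 + 2j, 3 - 1j, 0.5j]
y = [2 - 1j, 1j, 4]

lhs = abs(inner(x, y))      # |xy|
rhs = norm(x) * norm(y)     # sqrt(xx) * sqrt(yy)
print(lhs <= rhs)
```

When $y$ is a scalar multiple of $x$ the two sides coincide (up to rounding), which matches the fact that the proof only loses something in the step $0 \le (x + \lambda y)(x + \lambda y)$.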

4.5

Vectors $u, v$ are said to be orthogonal if $uv = 0$. Note that

the only vector orthogonal to itself is $o$.

A system $u_1, \dots, u_n$ is said to be orthogonal if $u_j u_k = 0$ whenever $j \ne k$. It is orthonormal if, moreover, $\|u_j\| = 1$ for all $j$.

4.5.1 Proposition. An orthogonal system consisting of non-zero elements (in particular, an orthonormal system) is linearly independent.

Proof. Multiply $o = \sum \alpha_j u_j$ by $u_k$ from the right. We obtain $0 = \sum (\alpha_j u_j) u_k = \sum \alpha_j (u_j u_k) = \alpha_k (u_k u_k)$. Since $u_k u_k \ne 0$, $\alpha_k = 0$. $\square$

4.5.2 Theorem. (The Gram-Schmidt orthogonalization process) For every basis $u_1, \dots, u_n$ of a vector space $V$ with inner product there exists an orthonormal basis $v_1, \dots, v_n$ such that for each $k = 1, 2, \dots, n$,

$$L(v_1, \dots, v_k) = L(u_1, \dots, u_k).$$

If $u_1, \dots, u_r$ is orthonormal we can have $v_j = u_j$ for $j \le r$.

Proof. Start with $v_1 = \frac{1}{\|u_1\|}\, u_1$. If we already have an orthonormal system $v_1, \dots, v_k$ such that $L(v_1, \dots, v_r) = L(u_1, \dots, u_r)$ for all $r \le k$, set

$$w = u_{k+1} - \sum_{j=1}^{k} (u_{k+1} v_j)\, v_j.$$

For all $v_r$, $r \le k$, we have

$$w v_r = u_{k+1} v_r - \sum_{j=1}^{k} (u_{k+1} v_j)(v_j v_r) = u_{k+1} v_r - u_{k+1} v_r = 0.$$

We have $w \ne o$ since otherwise $u_{k+1} = \sum_{j=1}^{k} (u_{k+1} v_j)\, v_j \in L(v_1, \dots, v_k) = L(u_1, \dots, u_k)$, contradicting the linear independence of $u_1, \dots, u_k, u_{k+1}$. Thus we can set

$$v_{k+1} = \frac{w}{\|w\|}$$

and obtain an orthonormal system $v_1, \dots, v_k, v_{k+1}$ and

$$L(v_1, \dots, v_k, v_{k+1}) = L(u_1, \dots, u_k, u_{k+1})$$

by 2.5.

Finally observe that if $u_1, \dots, u_r$ was already orthonormal, the procedure yields $v_j = u_j$ until $j = r$. $\square$
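The inductive step of the proof translates directly into a procedure. The following Python sketch assumes the dot product of 4.3 on $\mathbb{C}^n$; the names `inner` and `gram_schmidt` and the tolerance `1e-12` are illustrative choices, and floating-point arithmetic makes the result only approximately orthonormal.

```python
import math

def inner(x, y):
    """Dot product on C^n as in 4.3: sum of x_j times the conjugate of y_j."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def gram_schmidt(basis):
    """Orthonormalize a linearly independent list of vectors, following the proof of 4.5.2."""
    ortho = []
    for u in basis:
        # w = u - sum_j (u v_j) v_j, with v_j the vectors constructed so far
        w = [complex(c) for c in u]
        for v in ortho:
            coeff = inner(u, v)
            w = [wi - coeff * vi for wi, vi in zip(w, v)]
        length = math.sqrt(inner(w, w).real)
        if length < 1e-12:
            # w = o would mean u lies in the span of the previous vectors
            raise ValueError("input vectors are linearly dependent")
        ortho.append([wi / length for wi in w])
    return ortho

orthonormal = gram_schmidt([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
```

Each pass keeps $L(v_1, \dots, v_k) = L(u_1, \dots, u_k)$, because the new vector differs from $u_{k+1}$ only by a combination of the earlier $v_j$ and a non-zero scaling.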

4.6

The orthogonal complement of a subspace $W$ of a vector space $V$ with inner product is the set

$$W^\perp = \{u \in V \mid uv = 0 \text{ for all } v \in W\}.$$

From the properties in 4.2 we immediately obtain

4.6.1 Observations.
1. $W^\perp$ is a subspace of $V$ and we have $W^\perp \cap W = \{o\}$ and the implication

$$W_1 \subseteq W_2 \ \Rightarrow\ W_2^\perp \subseteq W_1^\perp.$$

2. $L(v_1, \dots, v_n)^\perp = \{u \mid u v_j = 0 \text{ for all } j = 1, \dots, n\}$.

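4.6.2 Theorem. Let $V$ be a finite-dimensional vector space with inner product. Then we have, for subspaces $W, W_j \subseteq V$,
(1) $W \oplus W^\perp = V$,
(2) $\dim W^\perp = \dim V - \dim W$,
(3) $(W^\perp)^\perp = W$, and
(4) $(W_1 \cap W_2)^\perp = W_1^\perp + W_2^\perp$ and $(W_1 + W_2)^\perp = W_1^\perp \cap W_2^\perp$.

Proof. (1) and (2): Let $u_1, \dots, u_k$ be an orthonormal basis of $W$. By 2.6 and 4.5.2 we can extend it to an orthonormal basis $u_1, \dots, u_k, u_{k+1}, \dots, u_n$ of $V$. If $x = \sum_{j=1}^{n} \alpha_j u_j$ is in $W^\perp$ we have $0 = x u_r = \sum_{j=1}^{n} \alpha_j (u_j u_r) = \alpha_r$ for $r \le k$ and $x \in L(u_{k+1}, \dots, u_n)$. On the other hand, if $x \in L(u_{k+1}, \dots, u_n)$ then $x \in W^\perp$ by 4.6.1 2. Thus, $W^\perp = L(u_{k+1}, \dots, u_n)$, and (1) and (2) follow.

(3) Obviously $W \subseteq (W^\perp)^\perp$. By (2), $\dim W = \dim (W^\perp)^\perp$ and hence $W = (W^\perp)^\perp$ by 3.4.

(4) Obviously $W_i^\perp \subseteq (W_1 \cap W_2)^\perp$ and hence $W_1^\perp + W_2^\perp \subseteq (W_1 \cap W_2)^\perp$, and similarly $W_1^\perp \cap W_2^\perp \supseteq (W_1 + W_2)^\perp$. Now, using (3) and 4.6.1 1 we obtain

$$(W_1 \cap W_2)^\perp = \bigl((W_1^\perp)^\perp \cap (W_2^\perp)^\perp\bigr)^\perp \subseteq \bigl((W_1^\perp + W_2^\perp)^\perp\bigr)^\perp = W_1^\perp + W_2^\perp, \quad\text{and}$$

$$(W_1 + W_2)^\perp = \bigl((W_1^\perp)^\perp + (W_2^\perp)^\perp\bigr)^\perp \supseteq \bigl((W_1^\perp \cap W_2^\perp)^\perp\bigr)^\perp = W_1^\perp \cap W_2^\perp. \qquad \square$$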

4.7 Hermitian and Symmetric Bilinear Forms

For a vector space $V$ over $\mathbb{C}$, a mapping $B\colon V \times V \to \mathbb{C}$ satisfying all the axioms of 4.2 except axiom (1) is called a Hermitian form. (Note that by axiom (2) of 4.2, $B(v, v)$ is always a real number.) If we replace $\mathbb{C}$ by $\mathbb{R}$ in this definition, we speak of a symmetric bilinear form (over $\mathbb{R}$). For Hermitian and symmetric bilinear forms, one usually does not use the notation $\cdot$, but a letter, for example $B(u, v)$, $u, v \in V$. A Hermitian (resp. real symmetric bilinear) form $B$ is then called positive definite (resp. negative definite) if $B$ is an inner product (resp. $-B$ is an inner product). $B$ is called indefinite if it is neither positive nor negative definite. A Hermitian resp. real symmetric bilinear form $B$ is called degenerate if there exists a non-zero vector $v \in V$ such that for every $w \in V$, $B(v, w) = 0$. Otherwise, $B$ is called non-degenerate. Clearly, every degenerate Hermitian or real symmetric bilinear form is indefinite.

Whether a real symmetric bilinear form is non-degenerate, and whether it is positive or negative definite, is important in multivariable differential calculus (see Section 8 of Chapter 3). Hermitian forms behave analogously in many ways. It is therefore natural to ask: given a Hermitian or real symmetric bilinear form, can we decide if it is positive or negative definite? Doing this algorithmically requires solving systems of linear equations, which we will review in Appendix B, so we will postpone the solution of this problem to Appendix B.2.6 below.

5 Linear mappings

5.1

Let $V, W$ be vector spaces. A mapping $f\colon V \to W$ is said to be linear if

$$\text{for all } x, y \in V, \quad f(x + y) = f(x) + f(y), \quad\text{and}$$

$$\text{for all } \alpha \in \mathbb{F} \text{ and } x \in V, \quad f(\alpha x) = \alpha f(x).$$

Note that the “multiplication by elements of $\mathbb{F}$” really acts as individual unary operations (recall 1.1). In particular, a linear mapping $f\colon \mathbb{F} \to \mathbb{F}$ with $\mathbb{F}$ viewed as $\mathbb{F}^1$ (recall 1.2 1) satisfies $f(ax) = a f(x)$, not $f(ax) = f(a) f(x)$.

A linear mapping $f\colon V \to W$ is an isomorphism if there is a linear mapping $g\colon W \to V$ such that $fg = \mathrm{id}$ and $gf = \mathrm{id}$; $V$ and $W$ are then said to be isomorphic.


We have an immediate

5.1.1 Observation. A composition of linear mappings is a linear mapping.

5.2 Examples

1. The projections $p_k = ((x_1, \dots, x_n) \mapsto x_k)\colon \mathbb{F}^n \to \mathbb{F}^1$ are linear mappings.
2. The mapping $((x_1, x_2, x_3) \mapsto (x_2, x_1 - x_3))\colon \mathbb{F}^3 \to \mathbb{F}^2$ is linear.
3. Recall 1.2 2. The mapping $(\varphi \mapsto \varphi(x))\colon F(X) \to \mathbb{R}^1$ is linear.
4. Let $J$ be an open interval. Recall 1.2 2 again. Taking the derivative at a point $a \in J$ is a linear mapping from $C^1(J)$ to $\mathbb{R}^1$.

See the Exercises for more examples.

5.3 Theorem. Let $f\colon V \to W$ be a linear mapping such that $f[V] = W$, let $g\colon V \to Z$ be a linear mapping, and let $h\colon W \to Z$ be a mapping such that $hf = g$. Then $h$ is linear.

Proof. For each $w \in W$ choose an element $\xi(w) \in V$ such that $f(\xi(w)) = w$. We have

$$h(x + y) = h(f(\xi(x)) + f(\xi(y))) = hf(\xi(x) + \xi(y)) = g(\xi(x) + \xi(y)) = g\xi(x) + g\xi(y) = hf\xi(x) + hf\xi(y) = h(x) + h(y)$$

and similarly

$$h(\alpha x) = h(\alpha f\xi(x)) = hf(\alpha \xi(x)) = g(\alpha \xi(x)) = \alpha g(\xi(x)) = \alpha hf\xi(x) = \alpha h(x). \qquad \square$$

Note. This is a general fact about homomorphisms between algebraic structures.

5.3.1 Corollary. Every linear mapping $f\colon V \to W$ that is one-one and onto is an isomorphism.

(Indeed, there is a $g\colon W \to V$ such that $fg = \mathrm{id}$ and $gf = \mathrm{id}$. Since $f$ is onto and $\mathrm{id}$ is linear, $g$ is linear.)

5.3.2 Corollary. If $\dim V = n$ then $V$ is isomorphic to $\mathbb{F}^n$.

(Choose a basis $u_1, \dots, u_n$ and define a mapping $f\colon \mathbb{F}^n \to V$ by setting $f((x_1, \dots, x_n)) = \sum x_i u_i$. This $f$ is obviously linear and by 3.1.1 it is one-one and onto.)

5.4 Proposition. Let $f\colon V \to W$ be a linear mapping. If $f$ is one-one then it sends every linearly independent system to a linearly independent one; if $f$ is onto then it sends every generating set to a generating one. Consequently, isomorphisms preserve generating sets, linearly independent ones, and bases.

Proof. Let $f$ be one-one and let $\sum \alpha_j f(x_j) = o$. Then $f(\sum \alpha_j x_j) = f(o)$ and $\sum \alpha_j x_j = o$, so that if $x_1, \dots, x_n$ are linearly independent, all the $\alpha_j$ are zero.

Let $f$ be onto and let $M$ generate $V$. For a $y \in W$ choose an $x \in V$ such that $f(x) = y$ and write $x$ as $\sum \alpha_i u_i$ with $u_i \in M$. Then $y = f(x) = f(\sum \alpha_i u_i) = \sum \alpha_i f(u_i)$ with $f(u_i) \in f[M]$. $\square$

5.5 Theorem. Let $u_1, \dots, u_n$ be a basis of a vector space $V$, let $W$ be a vector space and let $\varphi\colon \{u_1, \dots, u_n\} \to W$ be an arbitrary mapping. Then there exists precisely one linear mapping $f\colon V \to W$ such that $f(u_i) = \varphi(u_i)$ for each $i$.

Proof. Since every element of $V$ can be written as $x = \sum \alpha_j u_j$ there is at most one such $f$: we must have $f(x) = \sum \alpha_j \varphi(u_j)$. On the other hand, if $x = \sum \alpha_j u_j$ and $y = \sum \beta_j u_j$ then $x + y = \sum (\alpha_j + \beta_j) u_j$ and it is, by 3.1.1, the only such representation. Similarly for $\alpha x = \sum \alpha \alpha_j u_j$. Thus, setting

$$f(x) = \sum \alpha_j \varphi(u_j) \quad\text{where}\quad x = \sum \alpha_j u_j$$

yields a linear mapping $f\colon V \to W$ such that $f(u_i) = \varphi(u_i)$. $\square$

5.6 The Free Vector Space on a Set S

In view of Theorem 5.5, it is an interesting question whether, for any set $S$, we can find a vector space with a basis $B$ and a bijection $\iota\colon S \xrightarrow{\ \cong\ } B$. This is called the free $\mathbb{F}$-vector space on the set $S$, and denoted by $\mathbb{F}S$ (it is customary to treat $\iota$ as the identity, which is usually harmless, since it is specified). Of course, for $S$ finite, we may simply take $\mathbb{F}^n$ where $n$ is the cardinality of $S$. However, for $S$ infinite, the Cartesian product $\mathbb{F}^S$ turns out not to be the right construction. Rather, we set

$$\mathbb{F}S = \{a\colon S \to \mathbb{F} \mid \text{there exists a finite subset } F \subseteq S \text{ such that } a(s) = 0 \text{ for } s \in S \setminus F\}.$$

The operations of addition and multiplication by a scalar are done point-wise. In fact, this is a vector subspace of $\mathbb{F}^S$, which is the space of all maps $S \to \mathbb{F}$. The basis $B$ in question is the set of all maps $a_s\colon S \to \mathbb{F}$ where $a_s(s) = 1$ and $a_s(t) = 0$ for $t \ne s$. It is easily verified that this is a basis. One usually treats the map $S \to \mathbb{F}S$, $s \mapsto a_s$, as an inclusion, so $a_s$ becomes identified with $s$.
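Since the elements of the free vector space vanish outside a finite set, each of them can be stored by its finitely many non-zero values. The following Python sketch represents such a map as a dictionary; the function names `basis_vector`, `add` and `scale` are hypothetical, and the scalars are assumed to be ordinary Python numbers.

```python
def basis_vector(s):
    """The map a_s: value 1 at s and 0 elsewhere, stored by its finite support."""
    return {s: 1}

def add(a, b):
    """Point-wise sum of two finitely supported maps."""
    result = dict(a)
    for s, coeff in b.items():
        total = result.get(s, 0) + coeff
        if total == 0:
            result.pop(s, None)   # keep only the non-zero values
        else:
            result[s] = total
    return result

def scale(alpha, a):
    """Point-wise multiplication of a finitely supported map by the scalar alpha."""
    return {s: alpha * c for s, c in a.items() if alpha * c != 0}

# the element 2*a_p - 3*a_q of the free vector space on a set containing "p" and "q"
v = add(scale(2, basis_vector("p")), scale(-3, basis_vector("q")))
```

The dictionary keys play the role of the elements $s \in S$ identified with $a_s$, and the set $S$ itself may be infinite because only the finite support is ever stored.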

5.7 Affine subsets

Let $W$ be a subspace of a vector space $V$ and let $x_0 \in V$. A subset of the form

$$x_0 + W = \{x_0 + w \mid w \in W\}$$

is called an affine subset of $V$ (or affine set in $V$).


5.7.1 Proposition. Let $L$ be an affine set in $V$. Then the subspace $W$ in the representation

$$L = x_0 + W$$

is uniquely determined, while for $x_0$ one can take an arbitrary element of $L$. The space $W$ is sometimes referred to as the associated vector subspace of $V$, and the dimension of $W$ is referred to as the dimension of $L$.

Proof. We have

$$w \in W \ \text{ if and only if } \ w = x - y \text{ with } x, y \in L$$

($(x_0 + u) - (x_0 + v) = u - v \in W$ and on the other hand, if $w \in W$ then $w = (x_0 + w) - x_0$). Now let $x_1 = x_0 + w_0$ be arbitrary, $w_0 \in W$. Then for any $w \in W$ we have $x_1 + w = x_0 + (w_0 + w) \in L$, and $x_0 + w = x_1 - w_0 + w$. $\square$

5.8 Theorem. Let $f\colon V \to Z$ be a linear mapping. Then
(1) $W = f^{-1}[\{o\}]$ is a subspace of $V$, and
(2) the $f^{-1}[\{z\}]$ are precisely the affine sets in $V$ of the form $v + W$ with $f(v) = z$.

Proof. (1): If $f(x) = f(y) = o$ then $f(\alpha x + \beta y) = o$.

(2) Let $f(v_0) = z$. Then for each $w \in W$ we have $f(v_0 + w) = f(v_0) + f(w) = z + o = z$ and on the other hand, if $f(v) = z$ then $f(v - v_0) = z - z = o$, hence $v - v_0 \in W$, and $v = v_0 + (v - v_0)$. $\square$

5.9 Affine maps

By an affine map between affine subsets $L \subseteq V$, $M \subseteq W$ of vector spaces $V$, $W$ we shall mean simply a map

$$f\colon L \to M$$

which is of the form

$$f(x) = y_0 + g(x - x_0)$$

where $x_0 \in L$, $y_0 \in M$, and $g$ is a linear map between the associated vector subspaces.

It is possible to say a lot more about affine subsets and affine maps. Alternately, many calculus texts do not mention them at all and refer to affine subsets as “linear subsets”, and affine maps imprecisely as “linear maps [in the broader sense]”. We decided to make the compromise of keeping the terminology precise without dwelling on details which would not be useful to us.


6 Congruences and quotients

6.1

A congruence on a vector space $V$ is an equivalence relation $E \subseteq V \times V$ (we will write $xEy$ for $(x, y) \in E$) such that

$$xEy \ \Rightarrow\ (\alpha x)E(\alpha y) \text{ for all } \alpha \in \mathbb{F}, \quad\text{and}$$

$$x_i E y_i,\ i = 1, 2 \ \Rightarrow\ (x_1 + x_2)E(y_1 + y_2).$$

For the equivalence (congruence) classes $[x], [y]$ set

$$[x] + [y] = [x + y] \quad\text{and}\quad \alpha [x] = [\alpha x]$$

(this is correct: if $x' \in [x]$ and $y' \in [y]$ then $x'Ex$ and $y'Ey$ and hence $(x' + y')E(x + y)$ and $x' + y' \in [x + y]$; similarly for $[\alpha x]$). It is easy to check that the set of equivalence classes with these operations constitutes a vector space, denoted by

$$V/E,$$

and that

$$p_E = (x \mapsto [x])\colon V \to V/E$$

is a linear mapping onto.

6.2 Theorem. The formulas

$$E \mapsto W_E = \{x \mid xEo\} \quad\text{and}\quad W \mapsto E_W = \{(x, y) \mid x - y \in W\}$$

constitute a one-one correspondence between the congruences on $V$ and the subspaces of $V$.

The congruence classes of $E$ are precisely the affine sets

$$x + W_E.$$

Proof. Obviously $W_E = \{x \mid xEo\}$ is a subspace. If $W$ is a subspace then $E_W$ is a congruence: trivially $x E_W x$; if $x E_W y$ then $x - y \in W$, hence $y - x = -(x - y) \in W$ and $y E_W x$; if $x E_W y$ and $y E_W z$ then $x - z = (x - y) + (y - z) \in W$ and $x E_W z$; if $x_i E_W y_i$ then $(x_1 - y_1) + (x_2 - y_2) \in W$, that is, $(x_1 + x_2) - (y_1 + y_2) \in W$; and finally, if $x E_W y$ we have $x - y \in W$ and hence $\alpha x - \alpha y \in W$, that is, $\alpha x E_W \alpha y$.

Now $x \in W_{E_W}$ if and only if $x E_W o$ if and only if $x = x - o \in W$, and $x E_{W_E} y$ if and only if $x - y \in W_E$ if and only if $(x - y)Eo$ if and only if $xEy$.


Finally, if $y \in [x]$ then $yEx$, hence $(y - x)Eo$, that is, $y - x \in W_E$, and $y = x + (y - x) \in x + W_E$. If $y \in x + W_E$ then $y = x + w$ with $w \in W_E$, hence $(y - x)Eo$ and $yEx$, that is, $y \in [x]$. $\square$

6.2.1

If $W$ is a subspace of $V$ we will use, in view of 6.2, the symbol

$$V/W \quad\text{instead of}\quad V/E_W.$$

We call the vector space $V/W$ the quotient space (or factor) of $V$ by the subspace $W$.

6.3

Let $f\colon V \to Z$ be a linear mapping. The subspace $f^{-1}[\{o\}]$ of $V$ is called the kernel of $f$ and denoted by

$$\operatorname{Ker} f.$$

Theorem. (The homomorphism theorem for vector spaces) For every linear mapping $f\colon V \to Z$ and every subspace $W \subseteq \operatorname{Ker} f$ there is a homomorphism

$$h\colon V/W \to Z$$

defined by $h(x + W) = f(x)$. If $f$ is onto, so is $h$. If $W = \operatorname{Ker} f$, $h$ is one-to-one.

Proof. Using the projection $V/W \to V/\operatorname{Ker} f$, $x + W \mapsto x + \operatorname{Ker} f$, it suffices to consider the case $W = \operatorname{Ker} f$. If $x + \operatorname{Ker} f = y + \operatorname{Ker} f$ then $x - y \in \operatorname{Ker} f$, hence $f(x) - f(y) = o$ and $f(x) = f(y)$. Thus, the mapping $h$ is correctly defined. Since we have $hp = f$ for the linear mapping $p = (x \mapsto [x])\colon V \to V/\operatorname{Ker} f$, which is onto, $h$ is a linear mapping, by 5.3. Now $h$ is obviously onto if $f$ is. If $x + \operatorname{Ker} f \ne y + \operatorname{Ker} f$ then $x - y \notin \operatorname{Ker} f$ and $f(x) - f(y) = f(x - y) \ne o$, so that $h$ is one-one. $\square$

7 Matrices and linear mappings

7.1 Matrices

In this section we will deal with vector spaces over the field of complex or real numbers. A matrix of the type $m \times n$ is an array

$$A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix}$$

where the entries $a_{jk}$ are numbers, real or complex, according to the context. If $m$ and $n$ are obvious we often write simply

$$A = (a_{jk})_{j,k} \quad\text{or}\quad (a_{jk})_{jk}.$$

Sometimes the $jk$-th entry of a matrix $A$ is denoted by $A_{jk}$.

The row vectors

$$(a_{j1}, \dots, a_{jn}), \quad j = 1, \dots, m$$

are called the rows of the matrix $A$, and the

$$(a_{1k}, \dots, a_{mk}), \quad k = 1, \dots, n$$

are called the columns of $A$. Hence, a matrix of the type $m \times n$ is sometimes referred to as a matrix with $m$ rows and $n$ columns.

Matrices of the type $m \times m$ are called square matrices.

7.2 Basic operations with matrices

Transposition. Let $A = (a_{jk})_{jk}$ be an $m \times n$ matrix. The $n \times m$ matrix

$$A^T = (a'_{jk})_{jk} \quad\text{where}\quad a'_{jk} = a_{kj}$$

is called the transposed matrix of $A$. There is a variant of this construction over the field $\mathbb{C}$: if $A$ is a matrix over $\mathbb{C}$, we denote by $A^*$ the complex conjugate of $A^T$, i.e. the matrix obtained from $A^T$ by replacing every entry by its complex conjugate. This is sometimes called the adjoint matrix of $A$. A (necessarily square) matrix $A$ which satisfies $A^T = A$ (resp. $A^* = A$) is called symmetric (resp. Hermitian).

Multiplication. Let $A = (a_{jk})_{jk}$ be an $m \times n$ matrix and let $B = (b_{jk})_{jk}$ be an $n \times p$ matrix. The product of $A$ and $B$ is the matrix

$$AB = (c_{jk})_{jk} \quad\text{where}\quad c_{jk} = \sum_{r=1}^{n} a_{jr} b_{rk}.$$

The unit matrices are the matrices of type $n \times n$ defined by

$$I = I_n = (\delta_{kj})_{jk} \quad\text{where}\quad \delta_{kj} = \begin{cases} 1 & \text{if } j = k, \\ 0 & \text{if } j \ne k. \end{cases}$$

We obviously have

$$(AB)^T = B^T A^T, \quad (AB)^* = B^* A^* \quad\text{and}\quad AI = A \text{ and } IA = A \text{ whenever defined.}$$

The motivation for the definition of the product will be apparent in 7.6 below, where we will also learn more about its properties.
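The definitions of 7.2 can be tried out on small examples. The following Python sketch stores a matrix as a list of rows; the function names `transpose`, `matmul` and `identity` are illustrative choices rather than notation from the text.

```python
def transpose(A):
    """The transposed matrix: entry (j, k) of the result is entry (k, j) of A."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Product of an m x n matrix A and an n x p matrix B: c_jk = sum_r a_jr * b_rk."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    p = len(B[0])
    return [[sum(A[j][r] * B[r][k] for r in range(n)) for k in range(p)]
            for j in range(len(A))]

def identity(n):
    """The unit matrix I_n."""
    return [[1 if j == k else 0 for k in range(n)] for j in range(n)]

A = [[1, 2, 3], [4, 5, 6]]      # type 2 x 3
B = [[1, 0], [2, 1], [0, 3]]    # type 3 x 2
AB = matmul(A, B)               # type 2 x 2
```

With these sample matrices one can confirm $(AB)^T = B^T A^T$, $A I_3 = A$ and $I_2 A = A$ by direct comparison of the resulting lists.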

7.3 Row and column vectors as matrices

A vector $x = (x_1, \dots, x_n) \in \mathbb{F}^n$ will be viewed as a matrix of the type $1 \times n$. Also, we will consider the column vectors, matrices of type $n \times 1$,

$$x^T = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.$$

Clearly, all column vectors of a given dimension $n$ also form a vector space over $\mathbb{F}$, known as the $n$-dimensional column vector space and denoted as $\mathbb{F}_n$. We will see that in spite of the fact that it is more convenient to write rows than columns, the space of columns is more convenient in the sense that for columns, composition of linear maps corresponds to multiplication of matrices without reversing orders (see Theorem 7.6 below). Because of this, nearly all courses in linear algebra now use the space of column vectors and not row vectors as the default model of an $n$-dimensional vector space. We will follow this convention in this text as well. In particular, we will extend the convention 1.3 to column vectors.

7.4 The standard bases of $\mathbb{F}^n$, $\mathbb{F}_n$

In the row vector space $\mathbb{F}^n$, we will consider the basis

$$e^1, \dots, e^n \quad\text{where}\quad (e^j)_k = \begin{cases} 1 & \text{if } j = k, \\ 0 & \text{if } j \ne k \end{cases}$$

and in $\mathbb{F}_n$, we will consider the basis

$$e_1, \dots, e_n \quad\text{where}\quad e_i = (e^i)^T$$

(this notation conforms with 1.3; of course $(e^j)_k = \delta_{kj}$ from 7.2).

The $e^j$'s from $\mathbb{F}^m$ and $\mathbb{F}^n$ with $m \ne n$ differ (and similarly for $e_j$), but this rarely causes confusion. In the rare cases where it can we will display the dimension $n$ as ${}^n e^j$, ${}^n e_j$.

Obviously we have

$$x = \sum_{j=1}^{n} x_j e^j. \tag{7.4.1}$$


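7.5 The linear maps $f_A$, $f^A$

Let $A$ be a matrix of type $m \times n$. Define a mapping

$$f_A\colon \mathbb{F}^m \to \mathbb{F}^n \quad\text{by setting}\quad f_A(x) = xA,$$

and a mapping

$$f^A\colon \mathbb{F}_n \to \mathbb{F}_m \quad\text{by setting}\quad f^A(x) = Ax.$$

7.5.1 Theorem. The mappings $f_A$, $f^A$ are linear and the formula

$$A \mapsto f_A$$

resp.

$$A \mapsto f^A$$

yields a bijective correspondence between matrices of type $m \times n$ and the set of all linear mappings $\mathbb{F}^m \to \mathbb{F}^n$ resp. $\mathbb{F}_n \to \mathbb{F}_m$.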

Proof. We will prove the statement about row spaces. The statement for column spaces is analogous (see Exercise (10)). The linearity of the mappings is an immediate consequence of the definition of a product of matrices.

We have

$$(e^j A)_{1k} = \sum_{r=1}^{m} (e^j)_{1r}\, a_{rk} = a_{jk} \tag{*}$$

and hence if $A \ne B$, there exist $r, s$ such that $a_{rs} \ne b_{rs}$. Thus, $f_A \ne f_B$.

Now let $f\colon \mathbb{F}^m \to \mathbb{F}^n$ be an arbitrary linear mapping. Consider the $a_{jk}$ uniquely defined by the formula

$$f({}^m e^j) = \sum_{k=1}^{n} a_{jk}\, ({}^n e^k)$$

and define $A$ as the array $(a_{jk})_{jk}$. We have, by (*),

$$f(x) = f\Bigl(\sum_j x_j\, ({}^m e^j)\Bigr) = \sum_j x_j f({}^m e^j) = \sum_j x_j \sum_k a_{jk}\, ({}^n e^k) = \sum_k \Bigl(\sum_j x_j a_{jk}\Bigr) ({}^n e^k),$$

and hence $f(x)_{1k} = (xA)_{1k}$ and finally $f(x) = xA$. $\square$


7.6 Theorem. In the representation of linear mappings from 7.5 we have

$$f_I = \mathrm{id}, \quad f_{AB} = f_B \circ f_A,$$

and

$$f^I = \mathrm{id}, \quad f^{AB} = f^A \circ f^B.$$

Proof. We will only prove the statement for row vectors. The statement for column vectors is analogous (see Exercise (11)). The first formula is obvious. Now let $A$, $B$ be matrices of types $m \times n$ resp. $n \times p$. If two linear maps agree on a basis they obviously coincide. We have

$$f_B(f_A({}^m e^j)) = f_B\Bigl(\sum_k a_{jk}\, ({}^n e^k)\Bigr) = \sum_k a_{jk} f_B({}^n e^k) = \sum_k a_{jk} \Bigl(\sum_r b_{kr}\, ({}^p e^r)\Bigr) = \sum_r \Bigl(\sum_k a_{jk} b_{kr}\Bigr)\, {}^p e^r = f_{AB}({}^m e^j). \qquad \square$$

7.6.1

From the associativity of composition of mappings and from the uniqueness of the matrix in the representation of linear mappings as $f_A$ we immediately obtain

Corollary. Multiplication of matrices is associative, that is, $A(BC) = (AB)C$ whenever defined.
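The difference in the order of composition between row and column vectors can be observed on a small example. In the following Python sketch the helper names `f_row` and `f_col` (standing for the maps $x \mapsto xA$ and $x \mapsto Ax$ of 7.5) and the sample matrices are assumptions made for illustration.

```python
def matmul(A, B):
    """Matrix product for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def f_row(A):
    """The map x -> xA on row vectors (x is a list of entries)."""
    return lambda x: matmul([x], A)[0]

def f_col(A):
    """The map x -> Ax on column vectors (x is a list of entries)."""
    return lambda x: [row[0] for row in matmul(A, [[c] for c in x])]

A = [[1, 2, 0], [0, 1, 3]]      # type 2 x 3
B = [[1, 1], [0, 2], [4, 0]]    # type 3 x 2
AB = matmul(A, B)               # type 2 x 2

x_row = [5, -1]                 # a row vector with 2 entries
x_col = [2, 7]                  # a column vector with 2 entries

# rows: the map of AB applies A first, then B
same_rows = f_row(AB)(x_row) == f_row(B)(f_row(A)(x_row))
# columns: the map of AB applies B first, then A, matching the written order
same_cols = f_col(AB)(x_col) == f_col(A)(f_col(B)(x_col))
```

For columns the order of the factors in $AB$ agrees with the order of composition, which is the convenience mentioned in 7.3.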

7.6.2 Different bases, base change

At this point we must mention the fact that the association between matrices and linear maps works for arbitrary finite-dimensional vector spaces $V, W$. Let $B = (v_1, \dots, v_n)$ resp. $C = (w_1, \dots, w_m)$ be sequences of distinct vectors in $V$ resp. $W$ which, when considered as sets, form bases of $V$ and $W$ (we speak of ordered bases). Then for an $m \times n$ matrix $A$ over $\mathbb{F}$, we have an associated linear map

$${}_{B,C}f^A\colon V \to W$$

given by

$${}_{B,C}f^A(v_j) = \sum_{i=1}^{m} a_{ij} w_i.$$

Clearly (for example, by considering the isomorphisms between $V$, $\mathbb{F}_n$ and $W$, $\mathbb{F}_m$ mapping $B$ and $C$ to the standard bases), this again defines a bijective correspondence between $m \times n$ matrices over $\mathbb{F}$ and linear maps from $V$ to $W$. We will say that the linear map ${}_{B,C}f^A$ is associated to the matrix $A$ with respect to the bases $B$ and $C$, and, vice versa, that $A$ is the matrix associated with the linear map (or simply the matrix of the linear map) $f = {}_{B,C}f^A$ with respect to the bases $B, C$. An analogue of Theorem 7.6 of course holds, i.e.

$${}_{B,D}f^{A_1 A_2} = {}_{C,D}f^{A_1} \circ {}_{B,C}f^{A_2} \tag{*}$$

for an $m \times n$ matrix $A_1$ and an $n \times p$ matrix $A_2$, and ordered bases $B, C, D$ of $p$- resp. $n$- resp. $m$-dimensional spaces $U$, $V$, $W$.

For two ordered bases $B, B'$ of the same finite-dimensional vector space $V$, the matrix of $\mathrm{Id}\colon V \to V$ with respect to the basis $B$ in the domain and $B'$ in the codomain is sometimes referred to as the base change matrix from the basis $B$ to the basis $B'$. By (*), base change matrices can be used to relate matrices of linear maps with respect to different bases, both in the domain and codomain.

7.7 Hermitian matrices and Hermitian forms

Given a Hermitian (resp. symmetric) matrix $A$ of type $n \times n$ over $\mathbb{C}$ (resp. over $\mathbb{R}$), we have a Hermitian (resp. symmetric bilinear) form $B$ on $\mathbb{C}_n$ (resp. $\mathbb{R}_n$) given by

$$B(x, y) = y^* A x.$$

In case when $B$ is positive-definite, this becomes an inner product, also denoted by

$$\langle x, y \rangle_B.$$

(In the real case, of course, $y^* = y^T$.) Conversely, the axioms immediately imply that every Hermitian (resp. symmetric bilinear) form on $\mathbb{C}_n$ (resp. $\mathbb{R}_n$) arises in this way. We will say that the form $B$ is associated with the matrix $A$ and vice versa. Sometimes we simplify the terminology and call a Hermitian (resp. real symmetric) matrix positive definite resp. negative definite resp. indefinite if the corresponding property holds for its associated Hermitian (resp. symmetric bilinear) form.
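The form $B(x, y) = y^* A x$ is easy to evaluate directly. The following Python sketch does so for one Hermitian $2 \times 2$ matrix; the function names `conj_transpose`, `is_hermitian` and `form`, as well as the sample matrix and vectors, are assumptions made for illustration.

```python
def conj_transpose(A):
    """The adjoint matrix A*: transpose and conjugate every entry."""
    return [[complex(A[j][k]).conjugate() for j in range(len(A))]
            for k in range(len(A[0]))]

def is_hermitian(A):
    """Check the condition A* = A."""
    return conj_transpose(A) == [[complex(c) for c in row] for row in A]

def form(A, x, y):
    """B(x, y) = y* A x for column vectors x, y given as lists of entries."""
    Ax = [sum(A[j][k] * x[k] for k in range(len(x))) for j in range(len(A))]
    return sum(complex(y[j]).conjugate() * Ax[j] for j in range(len(y)))

A = [[2, 1j], [-1j, 3]]     # a Hermitian matrix
x = [1 + 1j, 2]
y = [1j, 1 - 1j]

value_xx = form(A, x, x)    # real, as noted in 4.7
```

For this particular $A$ one finds $B(x, x) = 20$ for the sample $x$, and $B(x, y)$ is the complex conjugate of $B(y, x)$, in agreement with axiom (2) of 4.2.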

8 Exercises

(1) Prove the statement made in Example 1.2 3.
(2) Prove that the dot product from 4.3 satisfies the definition of an inner product, and more generally that the $B$ defined in Subsection 7.7 is a Hermitian (resp. symmetric bilinear) form.
(3) Prove that every Hermitian (resp. symmetric bilinear) form on $\mathbb{C}_n$ (resp. $\mathbb{R}_n$) is associated with a Hermitian (resp. symmetric) matrix.
(4) Take the vector space $V$ from 1.2 3. Prove that $(x \mapsto \ln x)$ is an isomorphism $V \to \mathbb{R}^1$.


(5) Prove that if $\varphi_1$, $\varphi_2$ are inner products on a (real or complex) vector space $V$, and $\alpha, \beta > 0$, then $\alpha\varphi_1 + \beta\varphi_2$ is an inner product.

(6) Prove that linear maps $\mathbb{F} \to \mathbb{F}$ are precisely the mappings $(x \mapsto ax)$ where $a \in \mathbb{F}$ is fixed.

(7) Prove that if $\langle a, b \rangle$ is a closed interval then $\left(\varphi \mapsto \int_a^b \varphi(x)\,dx\right)$ is a linear mapping $C(\langle a, b \rangle) \to \mathbb{R}^1$.

(8) Prove that the set of all $a_s$, $s \in S$ in 5.6 forms a basis of the free vector space $\mathbb{F}S$ on a set $S$.

(9) Prove that an affine map $f : L \to M$ between affine subsets of vector spaces $V$, $W$ can be made to satisfy the definition 5.9 with any choice of the element $x_0 \in L$. Is an analogous statement true for $y_0 \in M$?

(10) Prove the statement of Theorem 7.5.1 for column vectors.

(11) Prove the statement of Theorem 7.6 for column vectors.

(12) Prove that the set of all matrices of type $m \times n$ with entries in $\mathbb{F}$ is a vector space over $\mathbb{F}$ where addition is addition of matrices, and multiplication by a scalar $\alpha \in \mathbb{F}$ is the operation which multiplies each entry by $\alpha$. Is this vector space finite-dimensional? What is its dimension?

B Linear Algebra II: More about Matrices

1 Transforming a matrix. Rank

1.1 Elementary row and column operations

Recall Section A.7. Let $A$ be a matrix of type $m \times n$. The vector subspace $\operatorname{Row}(A)$ of $\mathbb{F}^n$ generated by the rows of $A$ is called the row space of $A$ and the vector subspace $\operatorname{Col}(A)$ of $\mathbb{F}^m$ generated by the columns is called the column space of $A$.

An elementary row (resp. column) operation on $A$ is any of the following three transformations of the matrix.
(E1) A permutation of the rows (resp. columns).
(E2) Multiplication of one of the rows (resp. columns) by a non-zero number.
(E3) Adding to a row (resp. column) a linear combination of the other rows (resp. columns).

1.1.1 Observation. An elementary row (resp. column) operation does not change the row (resp. column) space.

1.2

The column space is, of course, changed by a row operation (and the row space is changed by a column operation). We have, however, the following

Proposition. An elementary row (resp. column) operation preserves the dimension of the column (resp. row) space.

Proof. Let $p$ be a permutation of the set $\{1, 2, \dots, n\}$. Define $\varphi_p : \mathbb{F}^n \to \mathbb{F}^n$ by setting

\[ \varphi_p(x_1, \dots, x_n) = (x_{p(1)}, \dots, x_{p(n)}). \]

Obviously $\varphi_p$ is an isomorphism: trivially it is linear, and it has an inverse, namely $\varphi_{p^{-1}}$.

Further, for a non-zero $a$ define

\[ \psi_a(x_1, x_2, \dots, x_n) = (a x_1, x_2, \dots, x_n). \]

Again, it is an isomorphism, with the inverse $\psi_{a^{-1}}$. Finally, setting for numbers $b_2, \dots, b_n$,

\[ \chi(x_1, x_2, \dots, x_n) = \Big(x_1 + \sum_{j=2}^{n} b_j x_j,\ x_2, \dots, x_n\Big), \]

we obtain an isomorphism with the inverse sending $(x_1, x_2, \dots, x_n)$ to

\[ \Big(x_1 - \sum_{j=2}^{n} b_j x_j,\ x_2, \dots, x_n\Big). \]

Now performing elementary row operations on $A$ we transform the column space by the isomorphisms $\varphi_p$, $\psi_a$ and $\chi$; an isomorphism sends a basis to a basis (A.5.4) and hence preserves dimension. □

1.3 Theorem. For any matrix $A$, the dimensions of the row and column spaces coincide.

Proof. By 1.1.1 and 1.2, the dimensions are unchanged after arbitrarily many row and column operations.

If $a_{jk} = 0$ for all $j, k$ then both the dimensions are zero. Let there be an $a_{jk} \neq 0$. Performing (E1), we can move the $a_{jk}$ to the position $(1,1)$ and multiplying the (now) first row by $\frac{1}{a_{jk}}$ we have our matrix transformed to

\[
\begin{pmatrix}
1 & b_{12} & \dots & b_{1n} \\
b_{21} & b_{22} & \dots & b_{2n} \\
\dots & \dots & \dots & \dots \\
b_{m1} & b_{m2} & \dots & b_{mn}
\end{pmatrix}.
\]

Now we will perform the operations (E3), subtracting the first row $b_{j1}$ times from the $j$-th one, and when this is finished we do the same with the columns, thus obtaining the matrix transformed to

\[
\begin{pmatrix}
1 & 0 & \dots & 0 \\
0 & a^{(2)}_{22} & \dots & a^{(2)}_{2n} \\
\dots & \dots & \dots & \dots \\
0 & a^{(2)}_{m2} & \dots & a^{(2)}_{mn}
\end{pmatrix}.
\]


If all the $a^{(2)}_{jk}$ with $j, k \geq 2$ are zero, the dimensions of the two spaces are both 1.

Otherwise choose an $a^{(2)}_{jk} \neq 0$, move it to the position $(2,2)$ by (E1) operations (without affecting the first row and column) and repeat the procedure as above to obtain

\[
\begin{pmatrix}
1 & 0 & 0 & \dots & 0 \\
0 & 1 & 0 & \dots & 0 \\
0 & 0 & a^{(3)}_{33} & \dots & a^{(3)}_{3n} \\
\dots & \dots & \dots & \dots & \dots \\
0 & 0 & a^{(3)}_{m3} & \dots & a^{(3)}_{mn}
\end{pmatrix}.
\]

After sufficiently many repetitions of the procedure we have $a^{(r+1)}_{jk} = 0$ for all $j, k > r$ and have a matrix

\[
B = \begin{pmatrix}
1 & 0 & \dots & 0 & 0 & \dots & 0 \\
0 & 1 & \dots & 0 & 0 & \dots & 0 \\
\dots & \dots & \dots & \dots & \dots & \dots & \dots \\
0 & 0 & \dots & 1 & 0 & \dots & 0 \\
0 & 0 & \dots & 0 & 0 & \dots & 0 \\
\dots & \dots & \dots & \dots & \dots & \dots & \dots \\
0 & 0 & \dots & 0 & 0 & \dots & 0
\end{pmatrix}
\]

with the first $r$ diagonal entries 1 and all the others zero, and hence the dimensions of both the row and the column spaces are equal to $r$. □

1.4

The common dimension of the row and column spaces is called the rank of the matrix and denoted by

\[ \operatorname{rank} A. \]
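The rank can be computed mechanically with the operations (E1) to (E3). The sketch below is an illustration under stated assumptions rather than the procedure of the proof of 1.3: it uses row operations only, works over the rationals via `fractions.Fraction`, and the names `rank` and `transpose` are chosen here for the example. Comparing the rank of a matrix with that of its transpose then illustrates Theorem 1.3 on a concrete matrix.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows), computed by elementary row operations."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivots found so far
    n_cols = len(m[0]) if m else 0
    for c in range(n_cols):
        # (E1): find a row at or below position r with a non-zero entry in column c
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # (E2): scale the pivot row so that the pivot becomes 1
        p = m[r][c]
        m[r] = [x / p for x in m[r]]
        # (E3): subtract multiples of the pivot row from all other rows
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

def transpose(rows):
    return [list(col) for col in zip(*rows)]

A = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]
print(rank(A), rank(transpose(A)))  # row rank and column rank agree
```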

2 Systems of linear equations

2.1

Let $A = (a_{jk})_{jk}$ be a matrix of type $m \times n$ and let $b_1, \dots, b_m$ be numbers. A system of linear equations is a name for the task of determining $x_1, \dots, x_n \in \mathbb{F}$ such that

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n &= b_1 \\
\dots \qquad & \\
a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n &= b_m.
\end{aligned}
\tag{2.1.1}
\]

If $(b_1, \dots, b_m) = o$ we speak of a homogeneous system, and when replacing the original $b$ by $o$ we speak of the homogeneous system associated with (2.1.1).

The matrix $A$ is called the matrix of the system and the matrix

\[
\begin{pmatrix}
a_{11} & \dots & a_{1n} & b_1 \\
\dots & \dots & \dots & \dots \\
a_{m1} & \dots & a_{mn} & b_m
\end{pmatrix}
\]

is referred to as the augmented matrix of the system.

2.2 Three views of the task

1. Recall A.7.5. We seek an $x$ such that

\[ A x^T = b^T. \]

Thus we have a linear map $f : \mathbb{F}^n \to \mathbb{F}^m$ and would like to determine the set

\[ f^{-1}[b^T]. \]

2. If we denote by $c_1, \dots, c_n$ the columns of $A$, we are seeking numbers $x_1, \dots, x_n$ such that

\[ \sum_{j=1}^{n} x_j c_j = b^T. \]

3. The associated homogeneous system can be understood as seeking the $x$ such that

\[ x \cdot \overline{a}_j = 0 \quad \text{for all } j = 1, \dots, m \]

where $\cdot$ is the dot product and $\overline{a}_j = (\overline{a_{j1}}, \dots, \overline{a_{jn}})$ are the complex conjugates of the rows of $A$ (this approach is valid for $\mathbb{F} = \mathbb{R}, \mathbb{C}$, which, as remarked above, are the only contexts we are interested in).

Thus, the set of solutions of the associated homogeneous system coincides with the orthogonal complement

\[ L(\overline{a}_1, \dots, \overline{a}_m)^{\perp}. \]

Now the dimension of $L(\overline{a}_1, \dots, \overline{a}_m)$ is the same as that of the row space, that is, equal to the rank $r$ of $A$: if we perform the procedure from Theorem A.3.2 (the Gram-Schmidt process) on the system $\overline{a}_1, \dots, \overline{a}_m$, we end up with a basis of the same size as when starting with $a_1, \dots, a_m$ (since $\overline{a}_j$ is a linear combination of the other $\overline{a}_k$'s if and only if $a_j$ is a linear combination of the other $a_k$'s).

Thus, by Theorem A.4.6.2, the dimension of the subspace of solutions of a homogeneous system is $n - \operatorname{rank} A$.

2.3

From 2.2 2, we immediately obtain

2.3.1 Theorem (Frobenius). A system of linear equations has a solution if and only if the rank of the matrix of the system is the same as the rank of the augmented one.

(That is: if and only if the right-hand side column is in the column space of $A$.) From 2.2 1 and 2.2 3, we obtain

2.3.2 Theorem. If a system of linear equations has a solution $x_0$, then the set of all solutions is an affine set

\[ x_0 + W \]

where $W$ is the set of all solutions of the associated homogeneous system. The dimension of this affine set is $n - \operatorname{rank} A$.
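Both theorems can be checked on small examples. The following sketch assumes rational entries and uses hypothetical helper names (`rank`, `is_solvable`, `solution_dimension`); it tests solvability by comparing the rank of the matrix with that of the augmented matrix, as in 2.3.1, and reports the dimension $n - \operatorname{rank} A$ from 2.3.2.

```python
from fractions import Fraction

def rank(rows):
    """Rank via row operations of types (E1) and (E3), over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

def is_solvable(A, b):
    """Frobenius: solvable iff rank A equals the rank of the augmented matrix."""
    augmented = [list(row) + [bi] for row, bi in zip(A, b)]
    return rank(A) == rank(augmented)

def solution_dimension(A):
    """Dimension n - rank A of the solution set, provided a solution exists."""
    return len(A[0]) - rank(A)

A = [[1, 2], [2, 4]]
print(is_solvable(A, [3, 6]), is_solvable(A, [3, 7]), solution_dimension(A))
```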

2.4 The Gauss Elimination Method

By 2.3.2, to determine the set of all solutions of the system (2.1.1), it suffices to find one of its solutions and $s = n - r$ linearly independent solutions $x_1, \dots, x_s$ of the associated homogeneous system, where $r = \operatorname{rank} A$. The general solution is then

\[ x_0 + \sum_{j=1}^{s} \alpha_j x_j, \qquad \alpha_j \in \mathbb{F} \text{ arbitrary}. \]

First observe that

elementary row operations on the augmented matrix preserve the solution set.

Column operations change the solution set and will not be used, with the exception of the (E1) performed on the $A$-part of the augmented matrix: this is relatively harmless; we will only have to keep track of the permuted coordinates of solutions.

Start with the augmented matrix and transform it by (E1) operations so that the $(1,1)$ entry is non-zero, moving there a non-zero $a_{j_1 k}$. Remember $j_1$. Then multiply the first row by $(a'_{j_1 k})^{-1}$ to obtain

\[
\begin{pmatrix}
1 & a'_{12} & \dots & a'_{1n} & b'_1 \\
a'_{21} & a'_{22} & \dots & a'_{2n} & b'_2 \\
\dots & \dots & \dots & \dots & \dots \\
a'_{m1} & a'_{m2} & \dots & a'_{mn} & b'_m
\end{pmatrix}
\]

and then subtract from the $j$-th rows, $j = 2, \dots, m$, the $a'_{j1}$ multiple of the first one. Now we have

\[
\begin{pmatrix}
1 & a'_{12} & \dots & a'_{1n} & b'_1 \\
0 & a''_{22} & \dots & a''_{2n} & b''_2 \\
\dots & \dots & \dots & \dots & \dots \\
0 & a''_{m2} & \dots & a''_{mn} & b''_m
\end{pmatrix}.
\]

We repeat the procedure in the part of the matrix with indices $\geq 2$ (during this, of course, the $a'_{12}, \dots, a'_{1n}$ are permuted, too; again, the $j_2$ from the $a''_{j_2 k}$ moved to the $(2,2)$ position is to be remembered). After repeating the procedure $r - 1$ times we obtain a matrix

\[
\begin{pmatrix}
1 & c_{12} & c_{13} & \dots & c_{1r} & \dots & c_{1n} & \tilde b_1 \\
0 & 1 & c_{23} & \dots & c_{2r} & \dots & c_{2n} & \tilde b_2 \\
0 & 0 & 1 & \dots & c_{3r} & \dots & c_{3n} & \tilde b_3 \\
\dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots \\
0 & 0 & 0 & \dots & 1 & \dots & c_{rn} & \tilde b_r \\
0 & 0 & 0 & \dots & 0 & \dots & 0 & 0 \\
\dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots \\
0 & 0 & 0 & \dots & 0 & \dots & 0 & 0
\end{pmatrix}
\]

(note that because of Frobenius' Theorem the right-hand side becomes zero after the $r$-th row or else the system has no solution) corresponding to a system of equations

\[
\begin{aligned}
y_1 + c_{12}y_2 + c_{13}y_3 + \dots + c_{1r}y_r + c_{1,r+1}y_{r+1} + \dots + c_{1n}y_n &= \tilde b_1, \\
y_2 + c_{23}y_3 + \dots + c_{2r}y_r + c_{2,r+1}y_{r+1} + \dots + c_{2n}y_n &= \tilde b_2, \\
\dots \qquad & \\
y_r + c_{r,r+1}y_{r+1} + \dots + c_{rn}y_n &= \tilde b_r
\end{aligned}
\]

with the same system of solutions if we set $y_k = x_{j_k}$.

The one solution $y_0$ of the system can be obtained by setting $y_{0,r+1} = y_{0,r+2} = \dots = y_{0,n} = 0$, $y_{0,r} = \tilde b_r$, and then recursively

\[ y_{0,k-1} = -\sum_{j=k}^{n} c_{k-1,j}\, y_{0j} + \tilde b_{k-1}. \]

A basis $y_i$ ($i = 1, \dots, s = n - r$) of the vector space of solutions of the associated homogeneous system can then be obtained by setting $y_{i,r+i} = 1$, $y_{i,r+j} = 0$ otherwise, and then recursively

\[ y_{i,k-1} = -\sum_{j=k}^{n} c_{k-1,j}\, y_{ij}. \]
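The elimination can be sketched in code as follows. This is a variant under stated assumptions, not a transcription of the procedure above: instead of permuting columns and back-substituting, it reduces the augmented matrix all the way (clearing entries above the pivots as well) and records the pivot columns, so no bookkeeping of permuted coordinates is needed. The name `solve` and the use of exact rationals are choices made for this example.

```python
from fractions import Fraction

def solve(A, b):
    """Return (x0, basis) describing the solution set x0 + span(basis),
    or None if the system has no solution."""
    m, n = len(A), len(A[0])
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    pivots = []                        # pivot column of each reduced row
    r = 0
    for c in range(n):
        p = next((i for i in range(r, m) if M[i][c] != 0), None)
        if p is None:
            continue                   # no pivot: x_c becomes a free variable
        M[r], M[p] = M[p], M[r]        # (E1) on the rows
        piv = M[r][c]
        M[r] = [v / piv for v in M[r]]  # (E2)
        for i in range(m):             # (E3): clear column c in all other rows
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [u - f * v for u, v in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    # Frobenius: a non-zero right-hand side below row r means no solution
    if any(M[i][n] != 0 for i in range(r, m)):
        return None
    free = [c for c in range(n) if c not in pivots]
    x0 = [Fraction(0)] * n             # particular solution: free variables set to 0
    for i, c in enumerate(pivots):
        x0[c] = M[i][n]
    basis = []                         # one homogeneous solution per free variable
    for fc in free:
        v = [Fraction(0)] * n
        v[fc] = Fraction(1)
        for i, c in enumerate(pivots):
            v[c] = -M[i][fc]
        basis.append(v)
    return x0, basis

x0, basis = solve([[1, 2, 1], [2, 4, 0]], [4, 2])
print(x0, basis)
```

In this example the matrix has rank 2 and $n = 3$, so the basis of the homogeneous solutions has $n - r = 1$ element, in agreement with 2.3.2.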

2.5 Regular matrices

A matrix $A = (a_{ij})_{ij}$ of type $n \times n$ is said to be regular (or non-singular) if $\operatorname{rank} A = n$. In such a case, each system of equations

\[ \sum_{j=1}^{n} a_{ij} x_j = b_i, \qquad i = 1, 2, \dots, n \]

has precisely one solution: it has a solution since the augmented matrix, being of type $n \times (n+1)$, cannot have a bigger rank than $n$; on the other hand, the dimension of the set of solutions is $n - n = 0$. By 1.3,

a matrix $A$ is regular if and only if $A^T$ is regular.

2.5.1 Theorem. The following statements about a square matrix $A$ are equivalent.
(1) $A$ is regular.
(2) There exists a matrix $U$ such that $AU = I$.
(3) There exists a matrix $V$ such that $VA = I$.
(4) The matrix $A$ has a unique inverse matrix, that is, there is a unique $U$ such that $UA = AU = I$.

Notation. The inverse matrix of $A$ will be denoted by $A^{-1}$.

Proof. (1)$\Rightarrow$(2),(3): Notation from 2.2 1 and A.7.4. For each $e_i$ on the right-hand side there is a solution $x_i$ such that

\[ A x_i^T = e_i^T \ (= e^i). \]

Thus, $\sum_k a_{jk} x_{ik} = \delta_{ji}$, and if we set $u_{ij} = x_{ji}$ we have $\sum_k a_{jk} u_{ki} = \delta_{ji}$, that is, we have a $U$ such that $AU = I$. The statement (3) is obtained by applying this reasoning to $A^T$ and using A.7.2.

(2)$\Rightarrow$(1): Let $\sum_j a_{ij} u_{jk} = \delta_{ki}$. Fix $k$ and set $x_j = u_{jk}$. Then in the notation of 2.2 2 we have for the columns $c_j$ of $A$,

\[ \sum_j x_j c_j = e_k. \]

Thus, the column space contains all the $e_k$ and hence its dimension is $n$.

(2)&(3)$\Rightarrow$(4): If $AU = I$ and $VA = I$ we have $V = V(AU) = (VA)U = U$.

(4)$\Rightarrow$(2) is trivial. □

2.6 Deciding if a Hermitian form is positive-definite or negative-definite

Recall now our problem from A.4.7 of deciding if a Hermitian (or real symmetric bilinear) form is positive-definite or negative-definite. Consider a Hermitian form $B$ on a finite-dimensional complex vector space $V$ (the case of a real symmetric bilinear form is analogous). Then perform the following procedure:

Start with $k = 0$. Suppose we have constructed vectors $v_1, \dots, v_k \in V$ such that $B(v_i, v_i) \neq 0$, $B(v_i, v_j) = 0$ for $i \neq j$. Note that the vectors $v_i$ must be linearly independent. (In effect, suppose

\[ \sum_{i=1}^{k} a_i v_i = 0. \]

Applying $B(?, v_i)$, we get $a_i = 0$.) Then, using a system of linear equations, find a non-zero vector $w \in V$ such that $B(v_i, w) = 0$ for all $i = 1, \dots, k$. If no such $w$ exists, then by 2.2 3, $k \geq \dim(V)$, and by linear independence, equality arises, so the $v_i$'s form a basis of $V$. In this case, if the signs of the real numbers $B(v_i, v_i)$ are all positive (resp. negative), $B$ is positive-definite (resp. negative-definite). Otherwise, $B$ is indefinite.

Suppose the vector $w$ exists. If $B(w, w) \neq 0$, put $v_{k+1} = w$ and repeat the procedure with $k$ replaced by $k + 1$. If $B(w, w) = 0$, find a vector $u \in V$ such that $B(w, u) \neq 0$. If no such $u$ exists, $B$ is degenerate. If $u$ exists, then

\[ 4B(u, w) = B(u + w, u + w) + iB(iu + w, iu + w) - B(-u + w, -u + w) - iB(-iu + w, -iu + w) \]

by the axioms, so choosing $v_{k+1}$ as one of the vectors $u + w$, $-u + w$, $iu + w$, $-iu + w$, the vector $v_{k+1}$ will satisfy $B(v_{k+1}, v_{k+1}) \neq 0$. Repeat the procedure with $k$ replaced by $k + 1$.


3 Determinants

3.1

A group $G$ is a set with a binary operation $\cdot$ which satisfies associativity, has a unit element $e$ and an inverse unary operation $(?)^{-1}$. Explicitly, the axioms are

\[
(a \cdot b) \cdot c = a \cdot (b \cdot c), \qquad
a \cdot e = e \cdot a = a, \qquad
x \cdot x^{-1} = x^{-1} \cdot x = e.
\]

For groups $G, H$, a map $f : G \to H$ is called a homomorphism of groups if we have

\[ f(a \cdot b) = f(a) \cdot f(b) \quad \text{for all } a, b \in G. \]

A bijective homomorphism of groups is called an isomorphism (of groups). Obviously, the inverse of an isomorphism is again an isomorphism.

Immediate examples of groups include the set $\mathbb{Z}$ of all integers with the operation $+$, the set $\{1, -1\}$ with the operation $\cdot$ (multiplication), as well as $\mathbb{R}$ or $\mathbb{C}$ with the operation $+$, or $\mathbb{R}^{\times} = \mathbb{R} \smallsetminus \{0\}$, $\mathbb{C}^{\times} = \mathbb{C} \smallsetminus \{0\}$ with the operation $\cdot$. Note that all those groups have the additional property that

\[ a \cdot b = b \cdot a \]

where $\cdot$ is the operation. Groups satisfying this property are called commutative or abelian. We will soon encounter examples of groups which are not abelian.

We will not develop the theory of groups at all here (and the reader is referred to [2] and [4] for more on abstract algebra), but they do come up naturally in the context of the determinant. In particular we will use the obvious fact that the mappings $G \to G$

\[ x \mapsto x^{-1} \quad \text{and} \quad x \mapsto ax \ \text{ for a fixed } a \in G \]

are bijections (the first is inverse to itself, the other one to $x \mapsto a^{-1}x$). It then follows that if $G$ is finite and $f : G \to \mathbb{R}$ or $\mathbb{C}$ is any mapping then

\[ \sum_{x \in G} f(x) = \sum_{x \in G} f(x^{-1}) = \sum_{x \in G} f(ax) \tag{3.1.1} \]

(all three are the same sum, only rearranged).


3.2 The sign of a permutation

We will be concerned with the group $P(n)$ of permutations of the set $\{1, 2, \dots, n\}$, i.e. bijections $\{1, 2, \dots, n\} \to \{1, 2, \dots, n\}$, where the operation is composition. A permutation $p \in P(n)$ will usually be encoded as a sequence

\[ (k_1, \dots, k_n) \quad \text{where } k_j = p(j). \]

A transposition is a permutation interchanging two of the elements and keeping all the others.

3.2.1 Theorem. 1. Every permutation can be obtained as a composition of transpositions.

2. If $p \in P(n)$ can be represented as a composition of an even (resp. odd) number of transpositions then in any such representation the number of transpositions is even (resp. odd).

Proof. 1. By induction. The statement is obvious for $n = 1, 2$. Now let it hold for $P(n)$ and let $p$ be a permutation of $\{1, \dots, n, n+1\}$. Consider the transposition $\tau$ interchanging $n + 1$ with $p(n + 1)$ (if $p(n + 1) = n + 1$ set $\tau = \mathrm{id}$). Now $q = \tau \circ p$ sends $n + 1$ to $n + 1$, hence $\{1, \dots, n\}$ to $\{1, \dots, n\}$. The restriction $q'$ of $q$ to $\{1, \dots, n\}$ can be written as $q' = \tau'_1 \circ \dots \circ \tau'_r$ with transpositions $\tau'_j$. Extending these to transpositions $\tau_j$ of $\{1, \dots, n, n + 1\}$ we obtain a representation

\[ p = \tau \circ \tau_1 \circ \dots \circ \tau_r. \]

2. Encode $p$ as $(k_1, \dots, k_n)$ and set

\[ I(p) = \{(i, j) \mid i < j \text{ and } k_i > k_j\}, \qquad \iota(p) = \# I(p) \]

(# indicates the number of elements). We will prove that for any transposition $\tau$ the number

\[ |\iota(\tau \circ p) - \iota(p)| \]

is odd; since $\iota(\mathrm{id}) = 0$ the statement will follow.

Let $\tau$ exchange $\alpha$ with $\beta$, $\alpha < \beta$, let $q = \tau \circ p$. Then we have

\[ p = (k_1, \dots, k_{\alpha-1}, k_{\alpha}, k_{\alpha+1}, \dots, k_{\beta-1}, k_{\beta}, k_{\beta+1}, \dots, k_n) \quad \text{and} \]

\[ q = (k_1, \dots, k_{\alpha-1}, k_{\beta}, k_{\alpha+1}, \dots, k_{\beta-1}, k_{\alpha}, k_{\beta+1}, \dots, k_n). \]

We obviously have $(i, j) \in I(p)$ if and only if $(i, j) \in I(q)$ for

$i, j \neq \alpha, \beta$; or $i < \alpha$ and $j \in \{\alpha, \beta\}$, or $\beta < j$ and $i \in \{\alpha, \beta\}$.

Thus we have to discuss the cases
(a) $(\alpha, j)$ with $\alpha < j < \beta$,
(b) $(j, \beta)$ with $\alpha < j < \beta$, and
(c) $(\alpha, \beta)$.

In cases (a) and (b) we have together an even number of changes: we have $(\alpha, j) \in I(p)$ if and only if $(j, \beta) \notin I(q)$, and $(j, \beta) \in I(p)$ if and only if $(\alpha, j) \notin I(q)$; thus if there are $s$ many $(\alpha, j) \in I(p)$ and $t$ many $(j, \beta) \in I(p)$ we have $s + t$ such pairs in $I(p)$ and $(u - s) + (u - t) = 2u - (s + t)$ such pairs in $I(q)$, where $u = \beta - \alpha - 1$ is the number of indices $j$ with $\alpha < j < \beta$. The case (c) stands alone, and it is in precisely one of the $I(p)$, $I(q)$. □

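3.2.2 Notation and observation

We define

\[
\operatorname{sgn} p =
\begin{cases}
+1 & \text{if } p \text{ is a composition of an even number of transpositions,} \\
-1 & \text{if } p \text{ is a composition of an odd number of transpositions.}
\end{cases}
\]

From the definition we immediately infer that

\[ \operatorname{sgn} \mathrm{id} = 1, \qquad \operatorname{sgn}(p \circ q) = \operatorname{sgn} p \cdot \operatorname{sgn} q \qquad \text{and} \qquad \operatorname{sgn} p^{-1} = \operatorname{sgn} p. \]

Permutations $p$ with $\operatorname{sgn} p = 1$ (resp. $\operatorname{sgn} p = -1$) are called even (resp. odd).

3.2.3 Corollary. The map

\[ \operatorname{sgn} : P(n) \to \{1, -1\} \]

sending a permutation to its sign is a homomorphism of groups, where on $\{1, -1\}$ we consider the operation of multiplication.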
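The sign can be computed from the number of pairs in $I(p)$ used in the proof of 3.2.1, since each transposition changes that number by an odd amount. The sketch below makes two assumptions for convenience: permutations act on $\{0, \dots, n-1\}$ instead of $\{1, \dots, n\}$, and the helper names `sign`, `compose` and `inverse` are chosen for this illustration. It then checks the identities of 3.2.2 by brute force on $P(4)$.

```python
from itertools import permutations

def sign(p):
    """Sign of a permutation of 0..n-1 encoded as the tuple (p(0), ..., p(n-1)).
    Counts the pairs i < j with p(i) > p(j) and returns +1 if the count is even."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def compose(p, q):
    """The composition p o q, that is, j -> p(q(j))."""
    return tuple(p[q[j]] for j in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for j, k in enumerate(p):
        inv[k] = j
    return tuple(inv)

all_perms = list(permutations(range(4)))
print(all(sign(compose(p, q)) == sign(p) * sign(q) for p in all_perms for q in all_perms))
print(all(sign(inverse(p)) == sign(p) for p in all_perms))
```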

3.3

The determinant of a matrix $A = (a_{ij})_{ij}$ is the number

\[ \det A = \sum_{p \in P(n)} \operatorname{sgn} p \cdot a_{1,p(1)} \cdots a_{n,p(n)}. \]

It is often indicated as

\[
\begin{vmatrix}
a_{11} & \dots & a_{1n} \\
\dots & \dots & \dots \\
a_{n1} & \dots & a_{nn}
\end{vmatrix}.
\]

Thus for instance $\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc$ (and this is about the only case of a determinant easily and transparently computed from the basic definition).
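The defining formula can be transcribed literally, which is instructive for small matrices even though the sum has $n!$ terms. In this sketch the names `sign` and `det_leibniz` are illustrative, indices run from 0, and the sign is computed by counting inversions as in the proof of 3.2.1.

```python
from itertools import permutations
from math import prod

def sign(p):
    """+1 or -1 according to the parity of the number of inversions of p."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def det_leibniz(A):
    """Determinant as the sum over all permutations p of
    sgn(p) * a[0][p(0)] * ... * a[n-1][p(n-1)]."""
    n = len(A)
    return sum(sign(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

print(det_leibniz([[1, 2], [3, 4]]))                   # ad - bc = -2
print(det_leibniz([[2, 0, 1], [1, 3, 2], [1, 1, 4]]))  # 18
```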


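3.3.1 Proposition. 1. $\det A^T = \det A$.

2. If $B$ is obtained from a square matrix $A$ by permuting the rows or columns following a permutation $p \in P(n)$ then $\det B = \operatorname{sgn} p \cdot \det A$.

Proof. 1. Rearranging the factors we obtain the formula $a_{1p(1)} \cdots a_{np(n)} = a_{p^{-1}(1)1} \cdots a_{p^{-1}(n)n}$ and since $\operatorname{sgn} p^{-1} = \operatorname{sgn} p$ we can rewrite the formula from the definition as

\[ \det A = \sum_{p \in P(n)} \operatorname{sgn} p^{-1} \cdot a_{p^{-1}(1)1} \cdots a_{p^{-1}(n)n} \]

which is, by (3.1.1), equal to

\[ \sum_{p \in P(n)} \operatorname{sgn} p \cdot a_{p(1)1} \cdots a_{p(n)n}. \]

2. It suffices to prove it for a permutation of rows. We have $B = (a_{p(i)j})_{ij}$ so that

\[ \det B = \sum_{q \in P(n)} \operatorname{sgn} q \cdot a_{p(1)q(1)} \cdots a_{p(n)q(n)}. \]

Rearranging the factors and using 3.2.2, we obtain

\[
\begin{aligned}
\det B &= \sum_{q \in P(n)} \operatorname{sgn} q \cdot a_{1,qp^{-1}(1)} \cdots a_{n,qp^{-1}(n)} \\
&= \operatorname{sgn} p \sum_{q \in P(n)} \operatorname{sgn}(qp^{-1}) \cdot a_{1,qp^{-1}(1)} \cdots a_{n,qp^{-1}(n)}
\end{aligned}
\]

and by (3.1.1),

\[ \dots = \operatorname{sgn} p \sum_{q \in P(n)} \operatorname{sgn} q \cdot a_{1,q(1)} \cdots a_{n,q(n)} = \operatorname{sgn} p \cdot \det A. \qquad \square \]

3.3.2 Corollary. If there are in a matrix $A$ two equal columns or rows then $\det A = 0$.

(For, transposing two such rows yields $\det A = -\det A$.) From the formula for $\det A$ we immediately get the following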

3.4 Theorem. A determinant is linear in each of its rows (resp. columns). That is, if $A$ is a matrix of type $n \times n$ and if $A_j(x)$ is obtained from $A$ by replacing the $j$-th row by $x$ then the mapping

\[ (x \mapsto \det A_j(x)) : \mathbb{F}^n \to \mathbb{R} \text{ resp. } \mathbb{C} \]

is linear.


3.4.1 Convention

The notation $A_j(x)$ will be kept in the remainder of this chapter. Furthermore, we will use the symbol $A_j(x^T)$ for the matrix in which the $j$-th column is replaced by $x^T$.

3.4.2 Theorem. If $B$ is obtained from $A$ by adding to a row (resp. column) a linear combination of the other rows (columns) then $\det B = \det A$.

Proof. Let $a_1, \dots, a_n$ be the rows of $A$. We have $A = A_i(a_i)$ and $B = A_i\big(a_i + \sum_{j \neq i} \alpha_j a_j\big)$. By 3.3.2, $\det A_i(a_j) = 0$ for $j \neq i$ and hence

\[ \det B = \det A_i\Big(a_i + \sum_{j \neq i} \alpha_j a_j\Big) = \det A_i(a_i) + \sum_{j \neq i} \alpha_j \det A_i(a_j) = \det A. \qquad \square \]

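3.4.3 Proposition. Let $a_{ij} = 0$ for $i > j$. Then $\det A = a_{11}a_{22} \cdots a_{nn}$. More explicitly,

\[
\begin{vmatrix}
a_{11} & a_{12} & a_{13} & \dots & a_{1,n-1} & a_{1n} \\
0 & a_{22} & a_{23} & \dots & a_{2,n-1} & a_{2n} \\
0 & 0 & a_{33} & \dots & a_{3,n-1} & a_{3n} \\
\dots & \dots & \dots & \dots & \dots & \dots \\
0 & 0 & 0 & \dots & 0 & a_{nn}
\end{vmatrix}
= a_{11}a_{22} \cdots a_{nn}.
\]

Proof. This follows again from the definition: if $p \neq \mathrm{Id}$ then there is an $i$ with $i > p(i)$. □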

3.4.4 Computing a determinant

Using elementary operations of the type (E1) and (E3) we can easily transform the matrix in our determinant into the form as in 3.4.3; then we will have the value as the product of the elements on the diagonal.

The (E3) operations do not change the value (see 3.4.2). We have to be more careful with the (E1) operations, though. Since computing the sign may not be quite transparent, it is prudent to use transpositions only, and whenever one is performed, to multiply automatically one of the rows or columns by $-1$.
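A sketch of this computation follows. It is one possible realization under stated assumptions: exact rational arithmetic, row operations only, and a sign variable that is flipped at every row transposition instead of multiplying a row by $-1$ (the two are equivalent by linearity in each row). The name `det_by_elimination` is chosen for this example.

```python
from fractions import Fraction

def det_by_elimination(A):
    """Determinant via reduction to upper triangular form using (E1) and (E3)."""
    n = len(A)
    M = [[Fraction(v) for v in row] for row in A]
    sign = 1
    for c in range(n):
        # look for a non-zero entry in column c at or below the diagonal
        p = next((i for i in range(c, n) if M[i][c] != 0), None)
        if p is None:
            return Fraction(0)          # the triangular form has a zero on the diagonal
        if p != c:
            M[c], M[p] = M[p], M[c]     # a transposition of rows changes the sign
            sign = -sign
        for i in range(c + 1, n):       # (E3) does not change the determinant
            f = M[i][c] / M[c][c]
            M[i] = [u - f * v for u, v in zip(M[i], M[c])]
    result = Fraction(sign)
    for i in range(n):
        result *= M[i][i]               # product of the diagonal entries, by 3.4.3
    return result

print(det_by_elimination([[2, 0, 1], [1, 3, 2], [1, 1, 4]]))  # 18
```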

4 More about determinants

4.1 Minors and the inverse matrix

Denote by $A_{(i,j)}$ the matrix obtained from $A$ by deleting the $i$-th row and the $j$-th column. The number

\[ \alpha_{ij} = (-1)^{i+j} \det A_{(i,j)} \]

is called the $(i, j)$-th minor of $A$.


4.1.1

Recall the notation from 3.4.1. We have the following

Theorem. $\det A_i(x) = \sum_{j=1}^{n} x_j \alpha_{ij}$ and $\det A_j(x^T) = \sum_{i=1}^{n} x_i \alpha_{ij}$.

Proof. We shall treat the case of rows (the case of columns is analogous). Since $x = \sum x_j e_j$ we have

\[ \det A_i(x) = \sum_j x_j \det A_i(e_j). \]

Now

\[
\det A_i(e_j) =
\begin{vmatrix}
a_{1,1} & \dots & a_{1,j-1} & 0 & a_{1,j+1} & \dots & a_{1,n} \\
\dots & \dots & \dots & \dots & \dots & \dots & \dots \\
a_{i-1,1} & \dots & a_{i-1,j-1} & 0 & a_{i-1,j+1} & \dots & a_{i-1,n} \\
0 & \dots & 0 & 1 & 0 & \dots & 0 \\
a_{i+1,1} & \dots & a_{i+1,j-1} & 0 & a_{i+1,j+1} & \dots & a_{i+1,n} \\
\dots & \dots & \dots & \dots & \dots & \dots & \dots \\
a_{n,1} & \dots & a_{n,j-1} & 0 & a_{n,j+1} & \dots & a_{n,n}
\end{vmatrix}.
\]

Exchange successively the $i$-th row with the $(i-1)$-th one, then the $(i-1)$-th row with the $(i-2)$-th one, etc., and then, operating similarly with the columns, we move the 1 from the $(i, j)$-th to the $(1,1)$-th position and obtain

\[
\det A_i(e_j) = (-1)^{i+j}
\begin{vmatrix}
1 & o \\
y^T & A_{(i,j)}
\end{vmatrix}
= (-1)^{i+j} \det A_{(i,j)} = \alpha_{ij}. \qquad \square
\]

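4.1.2 Corollary. In particular, for $x$ the $k$-th row (resp. column) of $A$, we obtain

\[ \sum_{j=1}^{n} a_{kj}\alpha_{ij} = \sum_{j=1}^{n} a_{jk}\alpha_{ji} = \delta_{ki} \det A; \quad \text{hence} \quad A \cdot (\alpha_{jk})^T_{jk} = I \cdot \det A \tag{*} \]

from which we immediately get a formula for the inverse matrix,

\[ A^{-1} = \left( \frac{\alpha_{ij}}{\det A} \right)^T_{ij}. \]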
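The formula for the inverse can be tried out on small integer matrices. The sketch below assumes integer entries (so that `fractions.Fraction` can form exact quotients) and uses illustrative names (`det`, `minor_matrix`, `cofactor`, `inverse`); the determinant is computed from the definition in 3.3, and indices run from 0.

```python
from fractions import Fraction
from itertools import permutations
from math import prod

def det(A):
    """Determinant from the definition: sum over permutations with signs."""
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        inv = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
        total += (-1) ** inv * prod(A[i][p[i]] for i in range(n))
    return total

def minor_matrix(A, i, j):
    """A with the i-th row and the j-th column deleted."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def cofactor(A, i, j):
    """The number alpha_ij = (-1)^(i+j) det A_(i,j)."""
    return (-1) ** (i + j) * det(minor_matrix(A, i, j))

def inverse(A):
    """Inverse as the transposed matrix of the alpha_ij divided by det A."""
    n = len(A)
    d = det(A)
    if d == 0:
        raise ValueError("matrix is not regular")
    # entry (i, j) of the inverse is alpha_ji / det A (note the transposition)
    return [[Fraction(cofactor(A, j, i), d) for j in range(n)] for i in range(n)]

print(inverse([[2, 1], [5, 3]]))
```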

4.2 Cramer’s Rule

Recall the representation of a system of linear equations as

\[ A x^T = b^T \]

from 2.2 1. If $A$ is a regular matrix we can multiply this formula by $A^{-1}$ from the left to obtain

\[ x^T = A^{-1}Ax^T = A^{-1}b^T. \]

Thus, by 4.1.2 we obtain

\[ x_i = \frac{1}{\det A} \sum_j \alpha_{ji} b_j. \]

The sum is then by 4.1.1 equal to $\det A_i(b^T)$ so that we obtain the formula (Cramer's Rule)

\[ x_i = \frac{\det A_i(b^T)}{\det A}. \]

Of course computing the solutions using this formula would be much harder than using the Gauss Elimination. It is, however, useful for theoretical purposes.
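For very small systems the rule is nevertheless easy to try out. The following sketch assumes integer entries and a regular matrix; the names `det` and `cramer` are illustrative, and the determinant is again computed from the definition, so the cost grows very quickly with $n$.

```python
from fractions import Fraction
from itertools import permutations
from math import prod

def det(A):
    """Determinant from the definition in 3.3 (only sensible for small n)."""
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        inv = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
        total += (-1) ** inv * prod(A[i][p[i]] for i in range(n))
    return total

def cramer(A, b):
    """Solve A x = b for a regular matrix A by Cramer's Rule."""
    d = det(A)
    if d == 0:
        raise ValueError("matrix is not regular")
    solution = []
    for i in range(len(A)):
        # replace the i-th column of A by the right-hand side b
        Ai = [row[:i] + [b[k]] + row[i + 1:] for k, row in enumerate(A)]
        solution.append(Fraction(det(Ai), d))
    return solution

print(cramer([[2, 1], [1, 3]], [3, 5]))  # the solution is (4/5, 7/5)
```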

4.3 Determinants and products of matrices

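4.3.1 Lemma. Let $A, B$ be square matrices and let $C$ be a matrix of the form

\[
\begin{pmatrix} A & M \\ O & B \end{pmatrix}
\quad \text{or} \quad
\begin{pmatrix} A & O \\ M & B \end{pmatrix}
\]

where $O$ indicates a system of zero entries while the entries at $M$ are arbitrary. Then

\[ \det C = \det A \cdot \det B. \]

Proof. It suffices to treat the first case. Transform the matrix as indicated in 3.4.4 to obtain

\[
\left(\begin{array}{ccccc|ccccc}
a'_{11} & a'_{12} & a'_{13} & \dots & a'_{1m} & & & & & \\
0 & a'_{22} & a'_{23} & \dots & a'_{2m} & & & & & \\
0 & 0 & a'_{33} & \dots & a'_{3m} & & & M & & \\
\dots & \dots & \dots & \dots & \dots & & & & & \\
0 & 0 & 0 & \dots & a'_{mm} & & & & & \\
\hline
 & & & & & b'_{11} & b'_{12} & b'_{13} & \dots & b'_{1n} \\
 & & & & & 0 & b'_{22} & b'_{23} & \dots & b'_{2n} \\
 & & O & & & 0 & 0 & b'_{33} & \dots & b'_{3n} \\
 & & & & & \dots & \dots & \dots & \dots & \dots \\
 & & & & & 0 & 0 & 0 & \dots & b'_{nn}
\end{array}\right).
\]

If we perform the transformation first only in the first $m$ rows and columns and then in the remaining ones, the left upper part corresponds to the transformation of the matrix $A$ and the right lower one is the matrix $B$ transformed. Thus we have $\det A = a'_{11}a'_{22} \cdots a'_{mm}$, $\det B = b'_{11}b'_{22} \cdots b'_{nn}$ and $\det C = a'_{11}a'_{22} \cdots a'_{mm}\, b'_{11}b'_{22} \cdots b'_{nn} = \det A \cdot \det B$. □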

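4.3.2 Theorem. Let $A, B$ be matrices of type $n \times n$. Then

\[ \det AB = \det A \cdot \det B. \]

Proof. Consider the matrix

\[
C = \begin{pmatrix}
a_{11} & \dots & a_{1n} & -1 & 0 & \dots & 0 \\
a_{21} & \dots & a_{2n} & 0 & -1 & \dots & 0 \\
\dots & \dots & \dots & \dots & \dots & \dots & \dots \\
a_{n1} & \dots & a_{nn} & 0 & 0 & \dots & -1 \\
0 & \dots & 0 & b_{11} & b_{12} & \dots & b_{1n} \\
\dots & \dots & \dots & \dots & \dots & \dots & \dots \\
0 & \dots & 0 & b_{n1} & b_{n2} & \dots & b_{nn}
\end{pmatrix}.
\]

To the $i$-th column ($i = 1, \dots, n$) add the $a_{1i}$ multiple of the $(n+1)$-th column, the $a_{2i}$ multiple of the $(n+2)$-th column, etc., until the $a_{ni}$ multiple of the $2n$-th column. Then the upper left part is annihilated, and the lower left part, whose $(j, i)$ entry is $\sum_k b_{jk}a_{ki}$, becomes $BA$, schematically

\[ \begin{pmatrix} O & -I_n \\ BA & B \end{pmatrix}. \]

Now let us exchange the $i$-th and $(n+i)$-th columns and, to compensate the change of sign, multiply after each of these exchanges the $i$-th column by $-1$. We obtain

\[ D = \begin{pmatrix} I_n & O \\ -B & BA \end{pmatrix} \]

and still $\det C = \det D$. By Lemma 4.3.1, $\det C = \det A \cdot \det B$ and $\det D = \det I \cdot \det BA = \det BA$. Hence $\det BA = \det A \cdot \det B$ for all $A, B$ of type $n \times n$, and interchanging the roles of $A$ and $B$ gives $\det AB = \det A \cdot \det B$. □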

4.4 Proposition. A square matrix $A$ is regular if and only if $\det A \neq 0$.

Proof. If $A$ is not regular then one of the rows is a linear combination of the others and $\det A = 0$ by 3.4.2. If $A$ is regular it has an inverse $A^{-1}$. Thus by 4.3.2, $\det A \cdot \det A^{-1} = \det AA^{-1} = \det I = 1$ and hence $\det A \neq 0$. □


4.5 The determinant of a linear map

Let $V$ be a finite-dimensional vector space over $\mathbb{F}$ and let $f : V \to V$ be a linear map. Then Theorem 4.3.2 enables us to define the determinant $\det(f)$ of the linear map $f$ as the determinant of the matrix $A$ of $f$ with respect to the same ordered basis $B$ in the domain and the codomain (see A.7.6.2). Note that the choice of the basis $B$ does not matter because if we choose another basis $B'$ and denote the base change matrix from $B$ to $B'$ by $M$, then the matrix of $f$ with respect to $B'$ in the domain and codomain is $MAM^{-1}$, and

\[ \det(MAM^{-1}) = \det(M)\det(A)\det(M)^{-1} = \det(A). \]

5 The Jordan canonical form of a matrix

5.1 Eigenvalues and eigenvectors of a matrix

An eigenvalue of a matrix $A$ is a number $\lambda \in \mathbb{F}$ such that there exists a non-zero column vector $v$ with

\[ Av = \lambda v. \tag{5.1.1} \]

The column vector $v$ is then called an eigenvector of $A$ (associated with the eigenvalue $\lambda$).

Note. These concepts are very useful (see an application in Chapter 7). One interpretation is as a generalized fixed point. If we recall the linear mapping $f_A : \mathbb{F}^n \to \mathbb{F}^n$ we see that we have here an "almost fixed point" $v$ with $f_A(v) = \lambda v$. In the set of all lines through the origin $\{\alpha v \mid \alpha \in \mathbb{F}\}$ ($v \neq o$), which has a lot of structure and is called the $(n-1)$-dimensional projective space, the directions generated by eigenvectors become fixed points of the action by $f_A$.

5.1.1 Determining eigenvalues: the characteristic polynomial

The formula (5.1.1), that is, $\sum_{k=1}^{n} a_{jk}v_k = \lambda v_j$, can be viewed as $\sum_{k=1}^{n} a_{jk}v_k = \sum_{k=1}^{n} \lambda\delta_{kj}v_k$, rewritten as $\sum_{k=1}^{n} (\lambda\delta_{kj} - a_{jk})v_k = 0$, or

\[ (\lambda I - A)v = o. \tag{5.1.2} \]

Now this is a system of linear equations that has a non-zero solution if and only if $\operatorname{rank}(\lambda I - A) < n$, that is, by 4.4, if and only if

\[ \chi_A(\lambda) = \det(\lambda I - A) = 0. \]

The expression $\chi_A(\lambda)$ is easily seen to be a polynomial in $\lambda$ with coefficients in $\mathbb{F}$. It is called the characteristic polynomial of $A$.

We will also apply it to arguments $\lambda$ more general than the numbers from $\mathbb{F}$; see the next paragraph.

5.2 The algebra of matrices of type $n \times n$

Matrices of type $n \times n$ can be added by the rule

\[ A + B = (a_{jk} + b_{jk})_{jk} \quad \text{where } A = (a_{jk})_{jk} \text{ and } B = (b_{jk})_{jk} \]

and multiplied by $\alpha \in \mathbb{F}$ by setting

\[ \alpha A = (\alpha a_{jk})_{jk}. \]

This is of course the same as computing in the vector space of $n \times n$ matrices over $\mathbb{F}$. (Recall that the zero vector is the zero matrix $O$, i.e. the matrix with all the entries 0.) Note that the $\lambda I - A$ in (5.1.2) agrees with this notation.

For convenience we sometimes write the multiplication by numbers also from the right, as $A\alpha$.

Furthermore we have the multiplication of matrices $AB$ and we easily deduce that

\[ (A + B)C = AC + BC, \quad A(B + C) = AB + AC, \quad AO = OA = O, \]

\[ 0 \cdot A = O, \quad \text{and} \quad (\alpha A)B = A(\alpha B) = \alpha(AB). \]

This structure is called the algebra of matrices (of type $n \times n$). It will be denoted by

\[ \mathcal{A}_n. \]

Thus, we can consider polynomials with coefficients in $\mathcal{A}_n$.

5.2.1 Lemma. Let $A \in \mathcal{A}_n$, and let

\[ p(x) = C_k x^k + \dots + C_1 x + C_0 \]

where $C_0, \dots, C_k \in \mathcal{A}_n$ commute with $A \in \mathcal{A}_n$. Then there exists a polynomial $q(x)$ with coefficients in $\mathcal{A}_n$ such that

\[ p(x) = (xI - A)q(x) + p(A). \]

Proof. Apply division of polynomials with remainder by $xI - A$; we work with polynomials with coefficients in $\mathcal{A}_n$. All matrices involved as coefficients commute with $A$. □

5.3 Theorem. (Cayley-Hamilton) Plugging a matrix $A \in \mathcal{A}_n$ into its own characteristic polynomial gives

\[ \chi_A(A) = O. \]

Proof. Let $B(\lambda) = \lambda I - A$, let

\[ C(\lambda)_{jk} = (-1)^{j+k} \det B(\lambda)_{(j,k)}. \]

By Cramer's rule,

\[ (\lambda I - A)C(\lambda)^T = I \cdot \chi_A(\lambda). \]

Applying Lemma 5.2.1, we have

\[ (\lambda I - A)C(\lambda)^T = (\lambda I - A)q(\lambda) + \chi_A(A), \]

or

\[ (\lambda I - A)(C(\lambda)^T - q(\lambda)) = \chi_A(A). \]

Examining the highest power of $\lambda$ which occurs in $C(\lambda)^T - q(\lambda)$, we see that

\[ C(\lambda)^T - q(\lambda) = 0, \]

proving the statement of the Theorem. □
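The theorem is easy to test numerically on small integer matrices. The sketch below is an illustration with assumed helper names (`poly_mul`, `char_poly`, `mat_mul`, `evaluate_at_matrix`): it expands $\det(xI - A)$ from the definition in 3.3 with polynomial entries, stored as coefficient lists from the constant term upward, and then evaluates the resulting polynomial at $A$ by Horner's scheme.

```python
from itertools import permutations

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (constant term first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def char_poly(A):
    """Coefficients c[0..n] of det(xI - A) = c[0] + c[1] x + ... + c[n] x^n."""
    n = len(A)
    coeffs = [0] * (n + 1)
    for p in permutations(range(n)):
        inv = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
        term = [(-1) ** inv]
        for i in range(n):
            # entry (i, p(i)) of xI - A as a polynomial in x
            entry = [-A[i][p[i]], 1] if p[i] == i else [-A[i][p[i]]]
            term = poly_mul(term, entry)
        for k, c in enumerate(term):
            coeffs[k] += c
    return coeffs

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def evaluate_at_matrix(coeffs, A):
    """Horner evaluation of c[0] I + c[1] A + ... + c[n] A^n."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_mul(result, A)
        for i in range(n):
            result[i][i] += c
    return result

A = [[1, 2], [3, 4]]
print(char_poly(A))                          # x^2 - 5x - 2 as [-2, -5, 1]
print(evaluate_at_matrix(char_poly(A), A))   # the zero matrix
```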

5.4

By a Jordan block we mean a matrix of the form

\[
\begin{pmatrix}
\lambda & 0 & \dots & 0 & 0 \\
1 & \lambda & \dots & 0 & 0 \\
0 & 1 & \dots & 0 & 0 \\
\dots & \dots & \dots & \dots & \dots \\
0 & 0 & \dots & \lambda & 0 \\
0 & 0 & \dots & 1 & \lambda
\end{pmatrix}.
\]

A matrix similar to a matrix $A$ is a matrix of the form $B^{-1}AB$ where $B$ is an invertible matrix. A direct sum of square matrices $A_1, \dots, A_k$ is the matrix

\[
\begin{pmatrix}
A_1 & 0 & \dots & 0 \\
0 & A_2 & \dots & 0 \\
\dots & \dots & \dots & \dots \\
0 & 0 & \dots & A_k
\end{pmatrix}.
\]

5.5

A vector space $V$ is a direct sum of subspaces $U_1, \dots, U_r$ if $U_j \cap U_k = \{o\}$ for $j \neq k$ and $V = U_1 + \dots + U_r$; in other words, if each $v \in V$ can be written as a unique sum $v = v_1 + \dots + v_r$ with $v_j \in U_j$. We then write $V = U_1 \oplus \dots \oplus U_r$. From now on, we will work over the field $\mathbb{F} = \mathbb{C}$.

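5.5.1 Lemma. Put

\[ U_{\lambda} = \{v \in \mathbb{C}^n \mid (\lambda I - A)^N v = 0 \text{ for some } N = 0, 1, 2, \dots\}. \]

Then $\mathbb{C}^n$ is the direct sum of the spaces $U_{\lambda}$.

Proof. Let us write

\[ \chi_A(x) = \prod_{i=1}^{k} (x - \lambda_i)^{n_i}; \]

thus, $\lambda_i$ are the eigenvalues of $A$, and $\sum n_i = n$. Define subspaces $W_i \subseteq \mathbb{C}^n$, $i = 0, \dots, k$ and linear transformations

\[ f_i : W_{i-1} \to W_i, \quad i = 1, \dots, k \]

as follows.

\[ W_0 = \mathbb{C}^n, \qquad f_i = (A - \lambda_i I)^{n_i}|_{W_{i-1}}, \qquad W_i = f_i[W_{i-1}]. \]

By definition,

\[ \operatorname{Ker}(f_i) \subseteq U_{\lambda_i}, \quad i = 1, \dots, k. \tag{1} \]

By Cayley-Hamilton's Theorem,

\[ W_k = 0. \tag{2} \]

Since the $f_i$ are onto we have by (2),

\[ \dim(\operatorname{Ker}(f_1)) + \dots + \dim(\operatorname{Ker}(f_k)) = n, \]

hence, by (1),

\[ \sum_{i=1}^{k} \dim(U_{\lambda_i}) \geq n. \]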

Thus, it suffices to show that if

\[ \sum_{i=1}^{k} v_i = 0, \quad v_i \in U_{\lambda_i}, \tag{3} \]

then $v_1 = \dots = v_k = 0$. Let

\[ n_i = \min\{N \mid (A - \lambda_i I)^N v_i = 0\}. \]

Suppose $n_{i_0} \neq 0$. Then replacing each vector $v_i$ by $v'_i = (A - \lambda_{i_0} I)v_i$, the vectors $v'_i$ still satisfy (3) in place of the $v_i$'s. When we make this replacement, the number $n_{i_0}$ decreases by 1, while the numbers $n_i$, $i \neq i_0$, remain unchanged. After applying this procedure finitely many times, we achieve a situation where $n_{i_1} = 1$ for some $i_1$, and $n_i = 0$ for $i \neq i_1$. Then (3) reads

\[ v_{i_1} = 0, \]

which contradicts $n_{i_1} = 1$.

Thus, we have proved that $n_i = 0$ for all $i$, in other words $v_i = 0$, which is what we needed to show. □

5.5.2 Theorem. (Jordan) Every $n \times n$ matrix is similar to a direct sum of Jordan blocks. Moreover, up to order, the Jordan blocks are uniquely determined.

(We refer to this direct sum as the Jordan canonical form of the matrix $A$.)

Proof. We will exhibit a proof which will allow us to find the Jordan blocks and the matrix $T$ explicitly (assuming we already have the eigenvalues).

Fix an eigenvalue $\lambda$. We shall exhibit a basis of $U_{\lambda}$ with respect to which the matrix of the linear transformation $A|_{U_{\lambda}}$ is a direct sum of Jordan blocks. Put $f_{\lambda} = \lambda I - A$. Define subspaces

\[ U_{\lambda 0} \subseteq U_{\lambda 1} \subseteq \dots \subseteq U_{\lambda m} \tag{1} \]

of $U_{\lambda}$ inductively by

\[ U_{\lambda 0} = 0, \qquad U_{\lambda, i+1} = f_{\lambda}^{-1}[U_{\lambda i}]. \]

We see that if we let $m$ be the first number such that

\[ U_{\lambda m} = U_{\lambda}, \]

then all the inclusions (1) are strict. Let $v_{\lambda j 1}, \dots, v_{\lambda j q_j}$ be a set of vectors in $U_{\lambda j}$ which projects to a basis of $U_{\lambda, j}/(U_{\lambda, j-1} + f_{\lambda}[U_{\lambda, j+1}])$, $j = 1, \dots, m$ (recall A.6.2.1). Then

\[ v_{\lambda j i},\ f_{\lambda}(v_{\lambda j i}),\ \dots,\ (f_{\lambda})^{j-1} v_{\lambda j i}, \qquad j = 1, \dots, m, \quad i = 1, \dots, q_j \]

is by definition the desired basis. Combining these bases for all eigenvalues $\lambda$, by Lemma 5.5.1, gives a basis with respect to which the linear transformation $A$ is a sum of Jordan blocks. Further, the sizes of the Jordan blocks determine and are determined by the dimensions of the spaces $U_{\lambda j}$, which in turn depend only on the matrix $A$. This implies the uniqueness statement. □
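The last remark of the proof can be made concrete. Since $U_{\lambda j}$ is the kernel of $(\lambda I - A)^j$, its dimension is $n - \operatorname{rank}(\lambda I - A)^j$, and it is a standard fact that the number of Jordan blocks of size at least $j$ for the eigenvalue $\lambda$ equals $\dim U_{\lambda j} - \dim U_{\lambda, j-1}$. The sketch below uses this fact; it assumes rational entries and a known eigenvalue, and the names `rank`, `mat_mul`, `kernel_dimensions` and `jordan_block_sizes`, as well as the sample matrix, are chosen for this illustration only.

```python
from fractions import Fraction

def rank(rows):
    """Rank via row operations, over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def kernel_dimensions(A, lam):
    """Dimensions of Ker (lam*I - A)^j for j = 1, 2, ... until they stop growing."""
    n = len(A)
    N = [[(lam if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]
    dims, power = [], N
    while True:
        d = n - rank(power)
        if dims and d == dims[-1]:
            return dims
        dims.append(d)
        power = mat_mul(power, N)

def jordan_block_sizes(A, lam):
    """Sizes of the Jordan blocks of A for the eigenvalue lam, largest first."""
    dims = [0] + kernel_dimensions(A, lam)
    # at_least[j-1] = number of blocks of size >= j
    at_least = [dims[j] - dims[j - 1] for j in range(1, len(dims))]
    sizes = []
    for j, count in enumerate(at_least, start=1):
        next_count = at_least[j] if j < len(at_least) else 0
        sizes += [j] * (count - next_count)   # blocks of size exactly j
    return sorted(sizes, reverse=True)

M = [[2, 1, 0], [0, 2, 0], [0, 0, 2]]   # hypothetical example with the single eigenvalue 2
print(jordan_block_sizes(M, 2))          # one block of size 2 and one of size 1
```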

6 Exercises

(1) Write down a detailed proof of Theorem 2.3.2.(2) Find all solutions of the system of linear equations over R

x C 2y C 3z C 4t C u D 10;

2x C 4y C 2z C 5t C u D 8;

3x C 6y C 5z C 9t C 2u D 1:

(3) Prove that a Hermitian form over Cn (resp. symmetric bilinear form over Rn)is non-degenerate if and only if its associated matrix is regular.

(4) Decide whether the symmetric bilinear form on $\mathbb{R}^3$ associated with the matrix

$$\begin{pmatrix} 4 & 6 & 1 \\ 6 & 8 & 2 \\ 1 & 2 & 4 \end{pmatrix}$$

is non-degenerate, and whether it is positive-definite, negative-definite or indefinite.


(5) Compute the determinant of the matrix

$$\begin{pmatrix} 2 & 1 & 3 & 4 \\ 2 & 2 & 4 & 5 \\ 1 & 4 & 3 & 3 \\ 3 & 5 & 6 & 8 \end{pmatrix}.$$

(6) Prove that the set of all $n \times n$ matrices over $\mathbb{R}$ (resp. $\mathbb{C}$) of non-zero determinant with the operation of matrix multiplication is a group. This group is called the general linear group and denoted by $GL_n(\mathbb{R})$ (resp. $GL_n(\mathbb{C})$).

(7) Prove that

$$\det : GL_n(\mathbb{F}) \to \mathbb{F}^{\times}$$

is a homomorphism of groups where $\mathbb{F}$ stands for $\mathbb{R}$ or $\mathbb{C}$.

(8) Prove that the determinant of a square matrix with entries in An in which two rows (or two columns) coincide is 0. [Hint: the same product appears once with a $+$ and once with a $-$.]

(9) Write down an explicit condition on when a $2 \times 2$ matrix

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

($a, b, c, d \in \mathbb{C}$) is regular, and write down a closed formula for its inverse.

(10) Determine the Jordan canonical form of the matrix

$$A = \begin{pmatrix} 1 & 1 & 0 & 3 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

and find a non-singular matrix $P$ such that $P^{-1}AP$ is in Jordan form.
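A reader may wish to check hand computations of determinants, such as those arising in Exercises (5), (7) and (8), by machine. The following is a minimal sketch and not part of the original text. It assumes rational entries and uses Gaussian elimination with exact arithmetic rather than the permutation formula; the function name `det` is our own illustrative choice.

```python
from fractions import Fraction


def det(rows):
    """Determinant of a square matrix, by Gaussian elimination over rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    n = len(m)
    result = Fraction(1)
    for c in range(n):
        # Find a row at or below the diagonal with a non-zero entry in column c.
        pivot = next((i for i in range(c, n) if m[i][c] != 0), None)
        if pivot is None:
            return Fraction(0)  # the column has no pivot, so the matrix is singular
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            result = -result  # exchanging two rows changes the sign
        result *= m[c][c]
        for i in range(c + 1, n):
            factor = m[i][c] / m[c][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[c])]
    return result


A = [[2, 1], [1, 3]]
B = [[0, 1], [1, 0]]
AB = [[1, 2], [3, 1]]  # the product of A and B, computed by hand
print(det(A), det(B), det(AB))
print(det([[1, 2, 3], [1, 2, 3], [4, 5, 6]]))  # two coinciding rows
```

The sample values are consistent with the multiplicativity asserted in Exercise (7) and with the vanishing asserted in Exercise (8), but a numerical check of this kind is of course no substitute for the proofs the exercises ask for.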


Index of Symbols

.a; b/ open interval, 6ha; bi closed interval, 6A� adjoint matrix, 470AT transposed matrix, 470A�1 inverse matrix, 483C.X/ space of bounded continuous functions,

56Cr , C1 degrees of smoothness, 289F� , Gı , F�ı . . . types of Borel sets, 123Lp , 138R`ijk curvature tensor, 375

T kij torsion tensor, 375V � dual vector space, 268W.y1; : : : ; yn/ Wronskian, 180W ? orthogonal complement, 463Œu; v� Lie bracket of vector fields, 167ƒ Lebesgue measurable functions, 118ei , ei standard bases, 471o the zero element of a vector space, 452u � v inner product, dot product, 461v row or column vector, 452�A characteristic polynomial of a matrix, 493ıji Kronecker delta, 359

detA, jAj determinant of a matrix, 487dim.V / dimension of a vector space, 458

.I /

Z

L

line integral of the first kind, 199

.II/

Z

L

line integral of the second kind, 199Zf Lebesgue integral, 109

Z

J

f Riemann integral over an n-dimensional

interval, 99Z

L

f .z/dz complex line integral, 202Z

B

! integral of a differential form, 302Z

M

f Lebesgue integral over a set, 124

Z

X

fd� integral by a measure, 427Z b

a

f .x/dx the integral, 27

`p , 433`p.C/, 433Of the Fourier transform formula, 443

ln.x/ natural logarithm, 30C the field of complex numbers, 6FS free vector space on a set S , 466F field of real or complex numbers, 451Fn the space of column vectors, 471

Fn row vector space, 452R the field of real numbers, 4Z functions with compact support on R

n, 106Zup, Zdn, Z� sets of certain limits of compactly

supported functions, 107B Borel sets, 123F Fourier transformation, 443F�1 inverse Fourier transformation, 447S the space of rapidly decreasing functions,

445L Lebesgue integrable functions, 110Lup, Ldn, L� functions with a (possibly

infinite) Lebesgue integral, 113TMx the tangent space at a point x, 293sgnp sign of a permutation, 487@f

@xipartial derivative, 66

@vf directional derivative, 68Df total differential, 73d exterior derivative, 298 definition of, 311.†; x0/ fundamental group, 338', 325sin.x/; cos.x/ trigonometric functions, 30Col.A/ column space, 477Row.A/ row space, 477Im.z/, 6Re.z/, 6



Arg.z/, 258grad, div, curl operators on vector fields, 306rankA rank of a matrix, 479Qf inverse Fourier transformation formula, 446�ijk Christoffel symbols of the second kind,

359�ijk Christoffel symbols of the first kind, 359.M/ the de Rham complex of M , 300.x; "/ open ball, 39k.M/ the vector space of k-forms, 298jjf jjp , 135jjxjj the norm of x, 34cM characteristic function, 113ex exponential function, 30

f ŒX� the image of a set under a map, 3f � dual linear map, 269f �1ŒX� The pre-image of a set under a map, 3f Ad adjoint linear operator, 401fA, f A linear maps associated with a matrix,

472fn ! f pointwise convergence, 18fn % f increasing limit, 103fn � f uniform convergence, 18, 58fn & f decreasing limit, 103Hom.U; V / the vector space of

homomorphisms, 267Ker.f / kernel, 469P.X/ power set, 43

Index

Abelian group, 485Absolute convergence, 19Absolutely continuous function, 435Absolutely continuous measure, 433Adjoint linear operator, 401Adjoint matrix, 470Affine approximation, 73Affine connection, 371Affine map, 467Affine set, 466Algebra of matrices, 494Almost complex structure, 383Almost everywhere, 114Argument, 258Argument Principle, 258Arzela-Ascoli Theorem, 229Associated homogeneous system, 480Associated vector subspace to an affine set,

467Atlas, 287Augmented matrix of a system of linear

equations, 480

Baire’s Category Theorem, 220Banach’s Fixed Point Theorem, 55Banach space, 393Banach subspace, 394Base change matrix, 474Base point, 334Basis of a topology, 46Basis of a vector space, 457Beginning point, 325Bessel’s inequality, 406Betti numbers, 301, 309Bijective map, 4Bolzano-Cauchy Theorem, 11Borel measurable function, 125Borel measure, 428Borel set, 123

Boundary oriented counter-clockwise, 205Bounded linear operator, 398Bounded metric space, 52Brachistochrone, 353

Cantor set, 63Category theory, 269Cauchy-Riemann conditions, 239Cauchy-Schwarz inequality, 461Cauchy sequence, 10, 54Cauchy’s formula, 245Cayley-Hamilton Theorem, 495Chain rule, 71Change of coordinates, 368Characteristic function, 113Characteristic matrix, 190Characteristic polynomial, 183, 493Chart, 287Christoffel symbol, 359Closed form, 300Closed set, 40, 44Closed simple curve, 196Closure, 40, 44Codomain, 3Column, 470

space, 477vector, 471

Compact interval, 11Compact metric space, 51Compact operator, 424Compact topological space, 218Completely regular space, 225Complete metric space, 54Completion, 223Complex conjugates, 6Complex derivative, 238Complex line integral, 202Complex primitive function, 243Composition, 4, 66



Concave function, 15Conformal map, 253Congruence on a vector space, 468Connected component, 49Connected space, 47Connection, 371Conserved quantity, 353Continuous Fourier transformation, 443Continuous function, 11Continuous map, 36, 45Contravariance, 268Convergence, 35Convergent sequence, 10Convex function, 15Convex polygon, 317Convex set, 73Coordinate neighborhood, 287Coordinate system, 287Countable set, 20Coupled quantities, 357Covariance, 268Covering, 213, 324Cramer’s rule, 490Critical function, 352Critical point, 14, 88Curvature tensor, 375Curve, 193Cyclic vector, 191Cycloid, 354

Daniell’s method, 109Deck transformation, 338Decreasing sequence of functions, 103Degree of a polynomial, 8Dense subset, 44de Rham cohomology, 301, 309de Rham complex, 300Derivative, 13Determinant, 487

of a linear map, 493Diffeomorphism, 289Differential form, 295Dimension of a vector space, 458Dini’s Theorem, 102Directional derivative, 68Direct sum of vector spaces, 496Discrete Fourier transform, 442Distance, 33Domain, 3, 205Dot product, 461Dual basis, 269Dual space, 399Dual vector space, 268

Dynkin’s Lemma, 132

Eigenvalues, 402, 493Eigenvectors, 402, 493Einstein convention, 357, 370Elementary row and column operations, 477Elliptic curve, 323Elliptic functions, 347Elliptic integral, 347End point, 325Energy, 355, 361Equation of holomorphic disks, 383Essential singularity, 256Euclidean connection, 376Euclidean plane, 6Euler-Lagrange equations, 350Even permutation, see Permutation, evenExact form, 300Existence theorem for systems of LDE’s, 177Existence and Uniqueness Theorem for

Systems of ODE’s, 151Exponential map, 361Exterior algebra, 277Exterior derivative, 298Exterior power, 277Exterior product, 281

Factor, 469Fatou’s lemma, 139Field, 4Finite-dimensional vector space, 454� -finite measure, 448Finite operator, 424Flux, 306Fourier series, 442Fourier transformation, 443Frechet derivative, 365Free vector space on a set, 466Frobenius’ Theorem, 481Fubini’s Theorem, 101, 128Function, 7Functoriality, 285Fundamental group, 338Fundamental neighborhood, 324Fundamental system of solutions, 180Fundamental Theorem of Algebra, 253Fundamental Theorem of Calculus, 29, 435,

438Fundamental Theorem of Line Integrals, 306

Gauss elimination method, 481


Gaussian plane, 6Generalized Cantor set, 116Generalized Pythagoras’ Theorem, 405Generalized symmetry of a system of ODE’s,

169General linear group, 499Generating set of a vector space, 453Geodesic, 359, 374

equation, 359Global extreme, 90Gram-Schmidt orthogonalization process, 462Goursat’s Theorem, 241Grassmann algebra, 277Green’s Theorem, 206Gronwall’s inequality, 154Group, 485

Hamiltonian, 353Hausdorff space, 225Heine-Borel Theorem, 217Hermitian form, 464Hermitian matrix, 470Hermitian operator, 402Hessian, 88Higher derivative, 16Hilbert basis, 407Hilbert-Schmidt operator, 426Hilbert space, 393Hilbert subspace, 394Hodge � operator, 283Holder’s inequality, 136Holomorphic automorphism, 312, 322Holomorphic 1-form, 327Holomorphic function, 241, 322Holomorphic isomorphism, 312, 322Holomorphic Open Mapping Theorem, 260Holonomy, 381Homeomorphism, 42Homogeneous differential equation, 170Homogeneous equation, 163Homogeneous LDE’s, 176Homogeneous system of linear equations, 480Homomorphism theorem (for vector spaces),

469Homomorphism of groups, 485Homotopy of paths, 325Hurwitz’s Theorem, 260Hyperbolic plane, 366Hypergeometric functions, 338

Identity, 4Imaginary part, 6

Immersion, 295Implicit differentiation, 94Implicit Function Theorem, 77, 81Increasing sequence of functions, 103Indefinite Hermitian, real symmetric matrix,

form, 474, 484Induced Riemann metric, 379Infimum, 5Infinitesimal symmetry, 168Injective map, 4Injective space, 61Inner product, 460

norm, 461Integral by a Borel measure, 430Integral curves, 166Integral equations, 147Integral Mean Value Theorem, 28Intermediate Value Theorem, 12Interval (n-dimensional), 97Inverse, 4

Fourier transform, 446Function Theorem, 86matrix, 483

Isolated singularity, 256Isometry, 378Isomorphism

of banach spaces, 394of groups, 485holomorphic, 312isometric, 394, 418of vector space, 464

Jacobian, 83Jacobi identity, 168Jensen’s inequality, 143Jordan block, 495Jordan canonical form, 497Jordan’s Curve Theorem, 263Jordan’s Theorem on Matrices, 497

Kernel, 469k-form, 295Kronecker ı, 359

Lagrange’s Theorem, 14, 73Lagrangian, 354Laurent series, 256LDE, see Linear differential equationLebesgue integrable function, 110Lebesgue integral, 109

over a set, 123


Lebesgue measure, 120Lebesgue’s Dominated Convergence Theorem,

117, 431Lebesgue’s Monotone Convergence Theorem,

117, 429Left invariant vector field, 308Levi-Civita connection, 379Levi’s Theorem, 117Lie algebra, 168Lie bracket, 168, 307Lie group, 308Lifting, 325Lindelof space, 213Linear combination, 454Linear differential equation (LDE), 164,

175Linear independence, 455Linear map, 464

associated with a matrix, 474Line integral of the first kind, 198Line integral of the second kind, 199Liouville’s Theorem, 252Lipschitz function, 149Local extreme, 17, 88Locally finite cover, 290Looman-Menchoff’s Theorem, 239Lower sum, 26

Manifold, 287Map, 3Matrix, 469

associated with a linear map, 474of a linear map, 473of a system of linear equations, 480

Maximum principle, 260Mean Value Theorem, 14Measurable function, 118Measurable set, 120Meromorphic function, 323Mesh, 26Metric, 33

space, 33subspace, 37

Metrizable space, 45Minor of a matrix, 489Mobius strip, 310Mobius transformations, 312, 323Modulus, 6Multiplication of matrices, 470Multiplicity of a root, 9Multi-valued holomorphic function,

335

Negative definite Hermitian, real symmetricmatrix, form, 474, 484

Neighborhood, 39, 44Noether current, 363Non-singular matrix, 483Non-vanishing vector field, 297Norm, 34Normal space, 226Normed vector space, 34

Odd permutation, see Permutation, oddODE, see Ordinary differential equationOnto map, 4Open cover, 213Open set, 40, 44Ordered basis, 473Ordinary differential equation (ODE), 145Orientation, 280, 301Oriented curve, 195Orthogonal complement, 397, 463Orthogonal vectors, 462Orthonormal system, 462

Parallel transport, 373Parametrization, 193

by arc length, 358Parametrized curve, 194Parseval’s equality, 406Partial derivatives, 66

of higher order, 74Partition of an interval, 26, 97Path, 325Permutation, 456

even, 486odd, 486

Path-connected space, 49Picard-Lindelof Theorem, 151Piecewise continuously differentiable curve,

194Point, 33Pole, 256Polynomial, 8Positive definite Hermitian, real symmetric

matrix, form, 474, 484Power series, 23Power set, 43Primitive function, 327Product of metric spaces, 39

Quotient vector space, 469


Radius of convergence, 24Radon-Nikodym Theorem, 433Rank of a matrix, 479Rapidly decreasing function, 445Real numbers - a rigorous construction, 234Real part, 6Refinement of a cover, 290Refinement of a partition, 26, 98Region with corners, 303Regular matrix, 191, 483Regular space, 225Removable singularity, 256Residue, 257Residue Theorem, 257Restriction of a map, 4Riemann integrable function, 99, 142Riemann integral, 26, 27, 97, 98Riemann-Lebesgue lemma, 443Riemann Mapping Theorem, 314Riemann metric, 356, 378Riemann surface, 322Riemann zeta function, 261Riesz Representation Theorem, 399Root of a polynomial, 8Rouche’s Theorem, 259Row, 470

space, 477vector, 471

Scalar, 451Schwartz-Christoffel formula, 317Schwartz’s Lemma, 313Schwarzian function, 445Separable space, 213Separation axioms, 224Separation of variables, 161, 168Series, 19Set of measure 0, 114Sign of a permutation, 486Similar matrices, 496Simple arc, 196Simply connected Riemann surface, 332Simply connected set, 314Singular values, 425Slice Theorem, 297Smooth coordinate system, 289Smooth function, 288Smooth manifold, 288Smooth partition of unity, 204, 290Space of solutions, 179Spherical coordinates, 142Square matrix, 470Standard basis, 471

Steinitz’ Theorem, 456Stereographical projection, 392Stokes’ Theorem, 304Stone-Weierstrass Theorem, 231Subbasis of a topology, 46Subcover, 213Submanifold, 295Submersion, 295Substitution in differential equations, 165Substitution Theorem, 130, 135Sum in a Hilbert space, 402Sum of vector subspaces, 454Support, 106, 440Supremum, 5Surface, 322Surjective map, 4Symmetric bilinear form, 464Symmetric matrix, 470Symmetry of a system of ODE’s, 167System of linear differential equations, 175

with constant coefficients, 183System of linear equations, 479System of ordinary differential equations,

145

Tangent vector, 292Taylor’s Theorem, 16, 87, 248Tensor, 368

calculus, 368field, 368product, 271, 272

Tietze’s Real Line Theorem, 61Tietze’s Theorem, 59Topological concept, 43Topological invariant, 301Topological manifold, 287Topological space, 43Topology, 44Torsion tensor, 375Total differential, 68, 72, 294Totally bounded metric space, 215Trace class operator, 425Transposition, 470, 486Triangle inequality, 33T0 and T1 spaces, 224T2 space, 225T3 and T3C 1

2spaces, 225

T4 space, 226

Uncountable set, 30Uniform convergence, 18, 58Uniformization Theorem, 332


Uniformly continuous function, 12Uniformly continuous map, 36Uniformly convex Banach space, 395Uniqueness theorem for holomorphic

functions, 251Unit matrix, 470Universal covering, 332Universal object, 271, 276Upper sum, 26Urysohn’s Theorem, 228

Variation of constants, 164, 181Vector, 451

field, 166, 295

space, 451subspace, 453

Volume, 391form, 301

Weak topology, 413Weierstrass’s Theorem, 247Wronskian, 180

Young’s inequality, 16

Zero, 256
