

An Introduction to Multivariable Mathematics


Copyright © 2008 by Morgan & Claypool

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means—electronic, mechanical, photocopy, recording, or any other—except for brief quotations in printed reviews, without the prior permission of the publisher.

 An Introduction to Multivariable Mathematics

Leon Simon

 www.morganclaypool.com

ISBN: 9781598298017 paperback 

ISBN: 9781598298024 ebook 

DOI 10.2200/S00147ED1V01Y200808MAS003

 A Publication in the Morgan & Claypool Publishers series

SYNTHESIS LECTURES ON MATHEMATICS AND STATISTICS 

Lecture #3

Series Editor: Steven G. Krantz, Washington University, St. Louis

Series ISSN: Synthesis Lectures on Mathematics and Statistics, ISSN pending.


An Introduction to Multivariable Mathematics

Leon Simon

Stanford University 

SYNTHESIS LECTURES ON MATHEMATICS AND STATISTICS #3



ABSTRACT

The text is designed for use in a 40-lecture introductory course covering linear algebra, multivariable differential calculus, and an introduction to real analysis.

The core material of the book is arranged to allow for the main introductory material on linear algebra, including basic vector space theory in Euclidean space and the initial theory of matrices and linear systems, to be covered in the first 10 or 11 lectures, followed by a similar number of lectures on basic multivariable analysis, including first theorems on differentiable functions on domains in Euclidean space and a brief introduction to submanifolds. The book then concludes with further essential linear algebra, including the theory of determinants, eigenvalues, and the spectral theorem for real symmetric matrices, and further multivariable analysis, including the contraction mapping principle and the inverse and implicit function theorems. There is also an appendix which provides a 9-lecture introduction to real analysis. There are various ways in which the additional material in the appendix could be integrated into a course—for example, in the Stanford Mathematics honors program, run as a 4-lecture-per-week program in the Autumn Quarter each year, the first 6 lectures of the 9-lecture appendix are presented at the rate of one lecture a week in weeks 2–7 of the quarter, with the remaining 3 lectures per week during those weeks being devoted to the main chapters of the text.

It is hoped that the text would be suitable for a 1-quarter or 1-semester course for students who have scored well in the BC Calculus advanced placement examination (or equivalent), particularly those who are considering a possible major in mathematics.

The author has attempted to make the presentation rigorous and complete, with the clarity and simplicity needed to make it accessible to an appropriately large group of students.

KEYWORDS

vector, dimension, dot product, linearly dependent, linearly independent, subspace, Gaussian elimination, basis, matrix, transpose matrix, rank, row rank, column rank, nullity, null space, column space, orthogonal, orthogonal complement, orthogonal projection, echelon form, linear system, homogeneous linear system, inhomogeneous linear system, open set, closed set, limit, limit point, differentiability, continuity, directional derivative, partial derivative, gradient, isolated point, chain rule, critical point, second derivative test, curve, tangent vector, submanifold, tangent space, tangential gradient, permutation, determinant, inverse matrix, adjoint matrix, orthonormal basis, Gram-Schmidt orthogonalization, linear transformation, eigenvalue, eigenvector, spectral theorem, contraction mapping, inverse function, implicit function, supremum, infimum, sequence, convergent, series, power series, base point, Taylor series, complex series, vector space, inner product, orthonormal sequence, Fourier series, trigonometric Fourier series.


Contents

Preface

1 Linear Algebra
  1 Vectors in R^n
  2 Dot product and angle between vectors in R^n
  3 Subspaces and linear dependence of vectors
  4 Gaussian Elimination and the Linear Dependence Lemma
  5 The Basis Theorem
  6 Matrices
  7 Rank and the Rank-Nullity Theorem
  8 Orthogonal complements and orthogonal projection
  9 Row Echelon Form of a Matrix
  10 Inhomogeneous systems

2 Analysis in R^n
  1 Open and closed sets in Euclidean Space
  2 Bolzano-Weierstrass, Limits and Continuity in R^n
  3 Differentiability
  4 Directional Derivatives, Partial Derivatives, and Gradient
  5 Chain Rule
  6 Higher-order partial derivatives
  7 Second derivative test for extrema of multivariable functions
  8 Curves in R^n
  9 Submanifolds of R^n and tangential gradients

3 More Linear Algebra
  1 Permutations
  2 Determinants
  3 Inverse of a Square Matrix
  4 Computing the Inverse
  5 Orthonormal Basis and Gram-Schmidt
  6 Matrix Representations of Linear Transformations
  7 Eigenvalues and the Spectral Theorem

4 More Analysis in R^n
  1 Contraction Mapping Principle
  2 Inverse Function Theorem
  3 Implicit Function Theorem

A Introductory Lectures on Real Analysis
  Lecture 1: The Real Numbers
  Lecture 2: Sequences of Real Numbers and the Bolzano-Weierstrass Theorem
  Lecture 3: Continuous Functions
  Lecture 4: Series of Real Numbers
  Lecture 5: Power Series
  Lecture 6: Taylor Series Representations
  Lecture 7: Complex Series, Products of Series, and Complex Exponential Series
  Lecture 8: Fourier Series
  Lecture 9: Pointwise Convergence of Trigonometric Fourier Series

Index


Preface

The present text is designed for use in an introductory honors course. It is in fact a fairly faithful representation of the material covered in the course "Math 51H—Honors Multivariable Mathematics" taught during the 10-week Autumn quarter in the Stanford Mathematics Department, and typically taken in the first year by those students who have scored well in the BC Calculus advanced placement examination (or equivalent), many of whom are considering a possible major in mathematics.

The text is designed to be covered by 4 lectures per week (each of 50 minutes duration) in a 10-week quarter, or by 3 lectures per week over a 13-week semester. In the Stanford program the material in the Appendix ("Introductory lectures on real analysis") is run parallel to the material of the main text—the first 6 real analysis lectures are presented in doses of 1 lecture per week (Mondays are "real analysis days" in the Stanford program) over a 6-week period, and during these weeks the remaining 3 lectures per week are devoted to the main text. A typical timetable based on this approach is to be found at:

http://math.stanford.edu/˜lms/51h-timetable.htm

A serious effort has been made to ensure that the present text is rigorous and complete, but at the same time that it has the clarity and simplicity needed to make it accessible to an appropriately large group of students. For most students, regular (at least weekly) problem sets will be of paramount importance in coming to grips with the material, and ideally these problem sets should involve a mixture of questions which are closely integrated with the lectures and which are designed to help in the assimilation of the various concepts at a level which is at least one step removed from routine application of standard definitions and theorems.

This text is dedicated to the many Stanford students who have taken the Honors Multivariable Mathematics course in recent years. Their vitality and enthusiasm have been an inspiration.

Leon Simon
Stanford University

 July 2008


C H A P T E R 1

Linear Algebra

1 VECTORS IN R^n

$\mathbb{R}^n$ will denote real Euclidean space of dimension $n$—that is, $\mathbb{R}^n$ is the set of all ordered $n$-tuples of real numbers, referred to as "points in $\mathbb{R}^n$" or "vectors in $\mathbb{R}^n$"—the two alternative terminologies "points" and "vectors" will be used quite interchangeably, and we in no way distinguish between them.^1

Depending on the context we may decide to write vectors (i.e., points) in $\mathbb{R}^n$ as columns or rows: in the latter case, $\mathbb{R}^n$ would denote the set of all ordered $n$-tuples $(x_1, \dots, x_n)$ with $x_j \in \mathbb{R}$ for each $j = 1, \dots, n$. In the former case, $\mathbb{R}^n$ would denote the set of all columns $\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$. In this case we often (for compactness of notation) write $(x_1, \dots, x_n)^T$ as an alternative notation for the column vector $\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$, and conversely $\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}^T = (x_1, \dots, x_n)$.

For the moment, we shall write our vectors (i.e., points) as rows. Thus, for the moment, points $x \in \mathbb{R}^n$ will be written $x = (x_1, \dots, x_n)$. The real numbers $x_1, \dots, x_n$ are referred to as components of the vector $x$. More specifically, $x_j$ is the $j$-th component of $x$.

To start with, there are two important operations involving vectors in $\mathbb{R}^n$: we can (i) add them according to the definition

1.1  $(x_1, \dots, x_n) + (y_1, \dots, y_n) = (x_1 + y_1, \dots, x_n + y_n)$

and (ii) multiply a vector $(x_1, \dots, x_n)$ in $\mathbb{R}^n$ by a scalar (i.e., by a $\lambda \in \mathbb{R}$) according to the definition

1.2  $\lambda (x_1, \dots, x_n) = (\lambda x_1, \dots, \lambda x_n)$.

Using these two definitions we can prove the following more general property: If $x = (x_1, \dots, x_n)$, $y = (y_1, \dots, y_n)$, and if $\lambda, \mu \in \mathbb{R}$, then

1.3  $\lambda x + \mu y = (\lambda x_1 + \mu y_1, \dots, \lambda x_n + \mu y_n)$.

^1 Nevertheless, from the intuitive point of view it is sometimes helpful to think geometrically of vectors as directed arrows; for example, such a geometric interpretation is suggested in the discussion of line vectors below.


Notice, in particular, that, using the definitions 1.1, 1.2, starting with any vector $x = (x_1, \dots, x_n)$ we can write

1.4  $x = \sum_{j=1}^n x_j e_j$,

where $e_j$ is the vector with each component $= 0$ except for the $j$-th component, which is equal to 1. Thus, $e_1 = (1, 0, \dots, 0)$, $e_2 = (0, 1, 0, \dots, 0), \dots, e_n = (0, \dots, 0, 1)$. The vectors $e_1, \dots, e_n$ are called standard basis vectors for $\mathbb{R}^n$.

The definitions 1.1, 1.2 are of course geometrically motivated—the definition of addition is motivated by the desire to ensure that the triangle (or parallelogram) law of addition holds (we'll discuss this below after we introduce the notion of "line vector" joining two points of $\mathbb{R}^n$) and the second is motivated by the natural desire to ensure that $\lambda x$ should be a vector in either the same or reverse direction as $x$ (according as $\lambda$ is positive or negative) and that it should have "length" equal to $|\lambda|$ times the original length of $x$. To make sense of this latter requirement, we need to first define the length of a vector $x = (x_1, \dots, x_n) \in \mathbb{R}^n$: motivated by Pythagoras,^2 it is natural to define the length of $x$ (also referred to as the distance of $x$ from 0), denoted $\|x\|$, by

1.5  $\|x\| = \sqrt{\sum_{j=1}^n x_j^2}$.

There is also a notion of "line vector" $\vec{AB}$ from one point $A = a = (a_1, \dots, a_n) \in \mathbb{R}^n$ to another point $B = b = (b_1, \dots, b_n)$, defined by

1.6  $\vec{AB} = (b_1 - a_1, \dots, b_n - a_n)$  $(= b - a$ using the definition 1.3$)$.

Notice that, geometrically, we may think of $\vec{AB}$ as a directed line segment (or arrow) with tail at $A$ and head at $B$, but mathematically, $\vec{AB}$ is nothing but $b - a$, i.e., the point with coordinates $(b_1 - a_1, \dots, b_n - a_n)$. By definition, we then do have the triangle identity for addition:

1.7  $\vec{AB} + \vec{BC} = \vec{AC}$,

because if $A = a = (a_1, \dots, a_n)$, $B = b = (b_1, \dots, b_n)$, $C = c = (c_1, \dots, c_n)$ then $\vec{AB} + \vec{BC} = (b - a) + (c - b) = c - a = \vec{AC}$.

^2 There is often confusion here in elementary texts, which are apt to claim, at least in the case $n = 2, 3$, that 1.5 is a consequence of Pythagoras, i.e., that 1.5 is proved by using Pythagoras' theorem; that is not the case—1.5 is a definition, motivated by the desire to ensure that the Pythagorean theorem holds when we later give the appropriate definition of angle between vectors.


2 DOT PRODUCT AND ANGLE BETWEEN VECTORS IN R^n

If $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$, we define the dot product of $x$ and $y$, denoted by $x \cdot y$, by $x \cdot y = \sum_{j=1}^n x_j y_j$. Notice that then the dot product has the properties:

(i)  $x \cdot y = y \cdot x$
(ii)  $(\mu x + \lambda y) \cdot z = \mu\, x \cdot z + \lambda\, y \cdot z$
(iii)  $x \cdot x = \|x\|^2$.

Using the above properties, we can check the identity

2.1  $\|x + y\|^2 = \|x\|^2 + \|y\|^2 + 2\, x \cdot y$

as follows:

$\|x + y\|^2 = (x + y) \cdot (x + y)$ (by (iii))
$= x \cdot x + y \cdot y + 2\, x \cdot y$ (by several applications of (i) and (ii))
$= \|x\|^2 + \|y\|^2 + 2\, x \cdot y$ (by (iii) again).

In particular, with $ty$ in place of $y$ (where $t \in \mathbb{R}$) we see that

2.2  $\|x + ty\|^2 = t^2 \|y\|^2 + 2t\, x \cdot y + \|x\|^2$.

Observe that if $y \neq 0$ this is a quadratic in the variable $t$, and by using "completion of the square" $at^2 + 2bt + c = a(t + b/a)^2 + (ac - b^2)/a$ (valid if $a \neq 0$), we see that 2.2 can be written as:

2.3  $\|x + ty\|^2 = \|y\|^2 \big(t + x \cdot y/\|y\|^2\big)^2 + \big(\|y\|^2 \|x\|^2 - (x \cdot y)^2\big)/\|y\|^2$.

Observe that the right side here evidently takes its minimum value when, and only when, $t = -x \cdot y/\|y\|^2$, the corresponding minimum value of $\|x + ty\|^2$ is $\big(\|y\|^2 \|x\|^2 - (x \cdot y)^2\big)/\|y\|^2$, and the corresponding value of $x + ty$ is the point $x - \|y\|^{-2}(x \cdot y)\, y$. Thus,

2.4  $\|x + ty\|^2 \geq \big(\|y\|^2 \|x\|^2 - (x \cdot y)^2\big)/\|y\|^2$,

with equality if and only if $t = -x \cdot y/\|y\|^2$, in which case $x + ty = x - \|y\|^{-2}(x \cdot y)\, y$. We can give a geometric interpretation of 2.4 as follows.

For $y \neq 0$, the straight line $\ell$ through $x$ parallel to $y$ is by definition the set of points $\{x + ty : t \in \mathbb{R}\}$, so 2.4 says geometrically that the point $x - \|y\|^{-2}(x \cdot y)\, y$ is the point of $\ell$ which has least distance to the origin, and the least distance is equal to $\sqrt{\|y\|^2 \|x\|^2 - (x \cdot y)^2}\,/\|y\|$.

Observe that an important particular consequence of the above discussion is that $\|y\|^2 \|x\|^2 - (x \cdot y)^2 \geq 0$, i.e.,

2.5  $|x \cdot y| \leq \|x\|\,\|y\| \quad \forall\, x, y \in \mathbb{R}^n$,


and in the case $y \neq 0$ equality holds in 2.5 if and only if $x = \|y\|^{-2}(x \cdot y)\, y$. (We technically proved 2.5 only in the case $y \neq 0$, but observe that 2.5 is also true, and equality holds, when $y = 0$.)

The inequality 2.5 is known as the Cauchy-Schwarz inequality.

An important corollary of 2.5 is the following "triangle inequality" for vectors in $\mathbb{R}^n$:

2.6  $x, y \in \mathbb{R}^n \Rightarrow \|x + y\| \leq \|x\| + \|y\|$.

This is proved as follows. By 2.1 and 2.5, $\|x + y\|^2 = \|x\|^2 + \|y\|^2 + 2\, x \cdot y \leq \|x\|^2 + \|y\|^2 + 2\|x\|\|y\| = (\|x\| + \|y\|)^2$, and then 2.6 follows by taking square roots.
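The identity 2.1 and the inequalities 2.5 and 2.6 are easy to test numerically. The sketch below is our illustration (random vectors, NumPy), not part of the text:

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.standard_normal(5), rng.standard_normal(5)

# Identity 2.1: ||x + y||^2 = ||x||^2 + ||y||^2 + 2 x.y
lhs = np.linalg.norm(x + y)**2
rhs = np.linalg.norm(x)**2 + np.linalg.norm(y)**2 + 2*np.dot(x, y)
print(np.isclose(lhs, rhs))                                        # True

# Cauchy-Schwarz 2.5 and the triangle inequality 2.6
print(abs(np.dot(x, y)) <= np.linalg.norm(x)*np.linalg.norm(y))    # True
print(np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y))  # True
```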

Now we want to give the definition of the angle between two nonzero vectors $x, y$. As a preliminary we recall that the function $\cos\theta$ is a $2\pi$-periodic function which has maximum value 1, attained at $\theta = k\pi$, $k$ any even integer, and minimum value $-1$, attained at $\theta = k\pi$ with $k$ any odd integer. Also $\cos\theta$ is strictly decreasing from 1 to $-1$ as $\theta$ varies between 0 and $\pi$. Thus, for each value $t \in [-1, 1]$ there is a unique value of $\theta \in [0, \pi]$ (denoted $\arccos t$) with $\cos\theta = t$. We will give a proper definition of the function $\cos\theta$ and discuss these properties in detail later (in Lecture 6 of Appendix A), but for the moment we just assume them without further discussion. We are now ready to give the formal definition of the angle between two nonzero vectors. So suppose $x, y \in \mathbb{R}^n \setminus \{0\}$.

Then we define^3 the angle $\theta$ between $x, y$ by

2.7  $\theta = \arccos\big(\|x\|^{-1}\|y\|^{-1}\, x \cdot y\big)$.

Notice this makes sense because $\|x\|^{-1}\|y\|^{-1}\, x \cdot y \in [-1, 1]$ by the Cauchy-Schwarz inequality 2.5, which says precisely that $-\|x\|\,\|y\| \leq x \cdot y \leq \|x\|\,\|y\|$.

^3 Again, there is often confusion in elementary texts about what is a definition and what is a theorem. In the present treatment, 2.7 is a definition which relies only on the Cauchy-Schwarz inequality 2.5 (which ensures $\|x\|^{-1}\|y\|^{-1}\, x \cdot y \in [-1, 1]$) and the fact that $\arccos$ is a well-defined function on $[-1, 1]$ (with values in $[0, \pi]$). With this approach it becomes a theorem rather than a definition that $\cos\theta$ is the ratio (length of the edge adjacent to the angle $\theta$)/(length of hypotenuse) in a right triangle—see Problem 6.5 of real analysis Lecture 6 in the Appendix for further discussion.

Observe that by definition we then have

2.8  $x \cdot y = \|x\|\,\|y\| \cos\theta, \quad x, y \in \mathbb{R}^n \setminus \{0\}$,

where $\theta$ is the angle between $x, y$.

2.9 Definition: Two vectors $x, y$ are said to be orthogonal if $x \cdot y = 0$.

Observe that then the zero vector is orthogonal to every other vector, and, by 2.8, two nonzero vectors $x, y \in \mathbb{R}^n$ are orthogonal if and only if the angle between them is $\pi/2$ (because $\cos\theta = 0$ with $\theta \in [0, \pi]$ if and only if $\theta = \pi/2$).
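Definition 2.7 also translates directly into code. The sketch below is our illustration; the clipping step is a practical guard against floating-point round-off pushing the normalized dot product just outside $[-1, 1]$:

```python
import numpy as np

def angle(x, y):
    # arccos of the normalized dot product, as in Definition 2.7;
    # Cauchy-Schwarz guarantees the exact value lies in [-1, 1].
    t = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(t, -1.0, 1.0))

print(angle(np.array([1.0, 0.0]), np.array([1.0, 1.0])))  # pi/4
print(angle(np.array([1.0, 0.0]), np.array([0.0, 2.0])))  # pi/2: orthogonal
```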

SECTION 2 EXERCISES

2.1 Explain why the following "proof" of the Cauchy-Schwarz inequality $|x \cdot y| \leq \|x\|\,\|y\|$, where $x, y \in \mathbb{R}^n$, is not valid:



Proof: If either $x$ or $y$ is zero, then the inequality $|x \cdot y| \leq \|x\|\,\|y\|$ is trivially correct because both sides are zero. If neither $x$ nor $y$ is zero, then as proved above $x \cdot y = \|x\|\,\|y\| \cos\theta$, where $\theta$ is the angle between $x$ and $y$. Hence, $|x \cdot y| = \|x\|\,\|y\|\, |\cos\theta| \leq \|x\|\,\|y\|$.

Note: We shall give a detailed discussion of the function $\cos x$ later (in Lecture 6 of the Appendix); for the moment you should of course assume that $\cos x$ is well defined and has its usual properties (e.g., it is $2\pi$-periodic, has absolute value $\leq 1$ and takes each value in $[-1, 1]$ exactly once if we restrict $x$ to the interval $[0, \pi]$, etc.). Thus, $|\cos\theta| \leq 1$ (used in the last step above) is certainly correct.

2.2 (a) The Cauchy-Schwarz inequality proved above guarantees $|x \cdot y| \leq \|x\|\|y\|$ and hence $x \cdot y \leq \|x\|\|y\|$ for all $x, y \in \mathbb{R}^n$. If $y \neq 0$, prove that equality holds in the first inequality if and only if $x = \lambda y$ for some $\lambda \in \mathbb{R}$, and equality holds in the second inequality if and only if $x = \lambda y$ for some $\lambda \geq 0$.

(b) By examining the proof of the triangle inequality $\|x + y\| \leq \|x\| + \|y\|$ given above (recall that proof began with the identity $\|x + y\|^2 = \|x\|^2 + \|y\|^2 + 2\, x \cdot y$), prove that equality holds in the triangle inequality $\iff$ either at least one of $x, y$ is 0, or $x, y \neq 0$ and $y = \lambda x$ with $\lambda > 0$.

2.3 (Another proof of the Cauchy-Schwarz inequality.) If $a = (a_1, \dots, a_n)$, $b = (b_1, \dots, b_n) \in \mathbb{R}^n$, prove the identity $\frac{1}{2} \sum_{i,j=1}^n (a_i b_j - a_j b_i)^2 = \|a\|^2 \|b\|^2 - (a \cdot b)^2$, and hence prove $|a \cdot b| \leq \|a\|\,\|b\|$.

2.4 Using the dot product, prove, for any vectors $x, y \in \mathbb{R}^n$:

(a) The parallelogram law: $\|x - y\|^2 + \|x + y\|^2 = 2(\|x\|^2 + \|y\|^2)$.

(b) The law of cosines: $\|x - y\|^2 = \|x\|^2 + \|y\|^2 - 2\|x\|\,\|y\| \cos\theta$, assuming $x, y$ are nonzero and $\theta$ is the angle between $x$ and $y$.

(c) Give a geometric interpretation of identities (a), (b) (i.e., describe what (a) is saying about the parallelogram determined by $x, y$—i.e., $OACB$ where $\vec{OA} = x$, $\vec{OB} = y$, $\vec{OC} = x + y$—and what (b) is saying about the triangle determined by $x$ and $y$—i.e., $OAB$, where $\vec{OA} = x$, $\vec{OB} = y$).

3 SUBSPACES AND LINEAR DEPENDENCE OF VECTORS

We say that a subset $V$ of $\mathbb{R}^n$ is a subspace of $\mathbb{R}^n$ (or more specifically a linear subspace of $\mathbb{R}^n$) if $V$ has the properties that it contains at least the zero vector $0 = (0, \dots, 0)$, i.e.,

3.1  $0 \in V$,

and if it is closed under addition and multiplication by scalars, i.e.,

3.2  $x, y \in V$ and $\lambda, \mu \in \mathbb{R} \Rightarrow \lambda x + \mu y \in V$.

For example, the subset $V$ consisting of just the zero vector (i.e., $V = \{0\}$) is a subspace (usually referred to as the trivial subspace), and the subset $V$ consisting of all vectors in $\mathbb{R}^n$ (i.e., $V = \mathbb{R}^n$) is a subspace.


To facilitate discussion of some less trivial examples we need to introduce some important terminology, as follows: We first define the span of a given collection $v_1, \dots, v_N$ of vectors in $\mathbb{R}^n$.

3.3 Definition:

$$\mathrm{span}\{v_1, \dots, v_N\} = \{c_1 v_1 + c_2 v_2 + \dots + c_N v_N : c_1, \dots, c_N \in \mathbb{R}\}.$$

An expression of the form $c_1 v_1 + c_2 v_2 + \dots + c_N v_N$ (with $c_1, \dots, c_N \in \mathbb{R}$) is a linear combination of $v_1, \dots, v_N$, and the linear combination is said to be nontrivial if not all the $c_1, \dots, c_N$ are zero.

Using this terminology, Definition 3.3 says that $\mathrm{span}\{v_1, \dots, v_N\}$ is the set of all linear combinations of $v_1, \dots, v_N$.

We claim the following.

3.4 Lemma. For any given nonempty collection $v_1, \dots, v_N \in \mathbb{R}^n$, $\mathrm{span}\{v_1, \dots, v_N\}$ is a subspace of $\mathbb{R}^n$.

Proof: Let $V = \mathrm{span}\{v_1, \dots, v_N\}$. Notice that $c_1 v_1 + \dots + c_N v_N = 0$ if $c_j = 0\ \forall\, j = 1, \dots, N$, hence $0 \in V$. Also, if $w_1, w_2 \in V$ then $w_1 = c_1 v_1 + \dots + c_N v_N$ and $w_2 = d_1 v_1 + \dots + d_N v_N$ for suitable choices of $c_j, d_j \in \mathbb{R}$, and so $\lambda w_1 + \mu w_2 = (\lambda c_1 + \mu d_1) v_1 + \dots + (\lambda c_N + \mu d_N) v_N \in V$.

One of the theorems we'll prove later is that in fact any subspace $V$ of $\mathbb{R}^n$ can be expressed as the span of a suitable collection of vectors $v_1, \dots, v_N \in \mathbb{R}^n$.

 We need one more piece of terminology.

3.5 Definition: Vectors $v_1, \dots, v_N$ are linearly dependent (l.d.) if some nontrivial linear combination of $v_1, \dots, v_N$ is the zero vector, i.e., $v_1, \dots, v_N$ are linearly dependent if there are constants $c_1, \dots, c_N$, not all zero, such that $c_1 v_1 + \dots + c_N v_N = 0$.

Notice that $v_1, \dots, v_N$ are l.d. if and only if one of the vectors $v_1, \dots, v_N$ can be expressed as a linear combination of the others. That is:

3.6 Lemma. $v_1, \dots, v_N$ are l.d. $\iff \exists\, j \in \{1, \dots, N\}$ and constants $c_i$, $i \neq j$, with $v_j = \sum_{i=1, i \neq j}^N c_i v_i$.

Proof of $\Rightarrow$: We are given $c_1, \dots, c_N$ not all zero such that $\sum_i c_i v_i = 0$. Pick $j$ such that $c_j \neq 0$. Then $v_j = \sum_{i=1, i \neq j}^N \big({-\tfrac{c_i}{c_j}}\big) v_i$.

Proof of $\Leftarrow$: Assume $j \in \{1, \dots, N\}$ and $c_i$, $i \neq j$, are such that $v_j = \sum_{i=1, i \neq j}^N c_i v_i$. Then $\sum_{i=1}^N c_i v_i = 0$, where we define $c_j = -1$.

SECTION 3 EXERCISES

3.1 Let $v_1, \dots, v_k$ be any set of vectors in $\mathbb{R}^n$. Let $\tilde{v}_1, \dots, \tilde{v}_k$ be obtained by applying any one of the 3 "elementary operations" to $v_1, \dots, v_k$, where the 3 elementary operations are: (i) interchange of any pair of vectors (i.e., (i) merely changes the ordering of the vectors); (ii) multiplication of one of the vectors by a nonzero scalar; (iii) replacing the $i$-th vector $v_i$ by the new vector $\tilde{v}_i = v_i + \mu v_j$, where $i \neq j$ and $\mu \in \mathbb{R}$. Prove that $\mathrm{span}\{v_1, \dots, v_k\} = \mathrm{span}\{\tilde{v}_1, \dots, \tilde{v}_k\}$.

4 GAUSSIAN ELIMINATION AND THE LINEAR DEPENDENCE LEMMA 

We begin with some remarks about linear systems of equations:

$$(*) \qquad \begin{cases} a_{11} x_1 + a_{12} x_2 + \dots + a_{1n} x_n = b_1 \\ a_{21} x_1 + a_{22} x_2 + \dots + a_{2n} x_n = b_2 \\ \qquad\vdots \\ a_{m1} x_1 + a_{m2} x_2 + \dots + a_{mn} x_n = b_m, \end{cases}$$

where $a_{ij}, b_i$ are given real constants and the unknowns $x_1, \dots, x_n$ are to be determined.

Terminology:

(1) Any choice $x_1, \dots, x_n$ which satisfies all $m$ of the equations is called a solution of the system.

(2) The corresponding vector $(x_1, x_2, \dots, x_n)^T$ is called a vector solution (or a solution vector) for the system. (Here we use the notation that $(x_1, x_2, \dots, x_n)^T$ denotes the column vector with $j$-th entry $x_j$; for reasons which will become apparent later, we always write solution vectors as column vectors rather than row vectors.)

(3) The set of all possible solution vectors is called the solution set of the system.

There is a systematic procedure for solving (i.e., finding the solution set of) such linear systems, known as Gaussian elimination. We'll discuss Gaussian elimination and its consequences in detail later in this chapter, but for the moment we just need to describe the first step in the process.

The procedure is based on the (easily checked) observation that the set of solutions of the system $(*)$ is unchanged under any one of the following 3 operations:

(i) interchanging any two of the equations;

(ii) multiplying any one of the equations by a nonzero constant;

(iii) adding a multiple of one equation to another one of the equations, i.e., if we add a multiple of the $i$-th equation to the $j$-th equation, where $i \neq j$.

We now consider the system $(*)$, and the following alternatives.

Case 1: The coefficient $a_{i1}$ of $x_1$ is zero for each $i = 1, \dots, m$; thus in this case the system $(*)$ has the form

$$\begin{cases} 0\,x_1 + a_{12} x_2 + \dots + a_{1n} x_n = b_1 \\ 0\,x_1 + a_{22} x_2 + \dots + a_{2n} x_n = b_2 \\ \qquad\vdots \\ 0\,x_1 + a_{m2} x_2 + \dots + a_{mn} x_n = b_m \end{cases}$$


(i.e., the unknown $x_1$ does not appear at all in any of the equations).

Case 2: At least one of the coefficients $a_{i1} \neq 0$; then by operation (i) we may arrange that $a_{11} \neq 0$, and by operation (ii) we may actually arrange that $a_{11} = 1$, so that the system $(*)$ can be changed to a new system having the same set of solutions as the original system and having the form

$$\begin{cases} 1\,x_1 + \tilde a_{12} x_2 + \dots + \tilde a_{1n} x_n = \tilde b_1 \\ \tilde a_{21} x_1 + \tilde a_{22} x_2 + \dots + \tilde a_{2n} x_n = \tilde b_2 \\ \qquad\vdots \\ \tilde a_{m1} x_1 + \tilde a_{m2} x_2 + \dots + \tilde a_{mn} x_n = \tilde b_m. \end{cases}$$

Now the operation (iii) allows us to subtract $\tilde a_{21}$ times the first equation from the second equation, $\tilde a_{31}$ times the first equation from the third equation, and so on, thus giving a new system having the same set of solutions as the original system and having the form

$$\begin{cases} 1\,x_1 + \tilde a_{12} x_2 + \dots + \tilde a_{1n} x_n = \tilde b_1 \\ 0\,x_1 + \tilde a_{22} x_2 + \dots + \tilde a_{2n} x_n = \tilde b_2 \\ \qquad\vdots \\ 0\,x_1 + \tilde a_{m2} x_2 + \dots + \tilde a_{mn} x_n = \tilde b_m \end{cases}$$

(i.e., the $x_1$ unknown only appears in the first equation), where the $\tilde a_{ij}, \tilde b_j$ with $i \geq 2$ have changed again, and each $\tilde b_j$ is a linear combination of the original $b_1, \dots, b_m$; in particular, $\tilde b_j = 0$ for each $j = 1, \dots, m$ if all the original $b_j = 0$, $j = 1, \dots, m$.

This completes the first step of the Gaussian elimination process. Notice that this first step provides us with a new system of equations which is equivalent to (i.e., has precisely the same solution set as) the original system, and which either has the form

$$(\tilde *) \qquad \begin{cases} 0\,x_1 + \tilde a_{12} x_2 + \dots + \tilde a_{1n} x_n = \tilde b_1 \\ 0\,x_1 + \tilde a_{22} x_2 + \dots + \tilde a_{2n} x_n = \tilde b_2 \\ \qquad\vdots \\ 0\,x_1 + \tilde a_{m2} x_2 + \dots + \tilde a_{mn} x_n = \tilde b_m \end{cases}$$

(i.e., the $x_1$ unknown does not appear at all in any of the equations), or

$$(\tilde *) \qquad \begin{cases} 1\,x_1 + \tilde a_{12} x_2 + \dots + \tilde a_{1n} x_n = \tilde b_1 \\ 0\,x_1 + \tilde a_{22} x_2 + \dots + \tilde a_{2n} x_n = \tilde b_2 \\ \qquad\vdots \\ 0\,x_1 + \tilde a_{m2} x_2 + \dots + \tilde a_{mn} x_n = \tilde b_m \end{cases}$$

(i.e., the $x_1$ unknown only appears in the first equation).

The system $(*)$ is said to be homogeneous if all the $b_j = 0$, $j = 1, \dots, m$; as we noted above, the first step in the Gaussian elimination process gives a new system which is also homogeneous if the original system is homogeneous.
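The first step just described is completely mechanical, and it may help to see it written out as a procedure. The following sketch is our illustration only (the function name and the use of NumPy are our own choices); it applies exactly the operations (i)–(iii) above to the augmented data $(A, b)$ of the system $(*)$:

```python
import numpy as np

def first_elimination_step(A, b):
    # Returns an equivalent system in which x1 either does not appear at
    # all (Case 1) or appears only in the first equation (Case 2).
    # Illustration only: a serious implementation would also pivot for
    # numerical stability.
    A, b = A.astype(float).copy(), b.astype(float).copy()
    rows = np.flatnonzero(A[:, 0])
    if rows.size == 0:                    # Case 1: x1 absent everywhere
        return A, b
    i = rows[0]                           # operation (i): swap equations
    A[[0, i]], b[[0, i]] = A[[i, 0]], b[[i, 0]]
    b[0] /= A[0, 0]                       # operation (ii): make a11 = 1
    A[0] /= A[0, 0]
    for k in range(1, A.shape[0]):        # operation (iii): clear column 1
        b[k] -= A[k, 0] * b[0]
        A[k] -= A[k, 0] * A[0]
    return A, b

A = np.array([[0.0, 2.0, 1.0],
              [3.0, 1.0, 4.0]])
b = np.array([5.0, 6.0])
A1, b1 = first_elimination_step(A, b)
print(A1)   # first row (1, 1/3, 4/3); zeros below it in column 1
print(b1)   # (2, 5)
```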


The following lemma refers to such a homogeneous system of linear equations, i.e., a system of $m$ equations in $n$ unknowns:

$$(**) \qquad \begin{cases} a_{11} x_1 + a_{12} x_2 + \dots + a_{1n} x_n = 0 \\ a_{21} x_1 + a_{22} x_2 + \dots + a_{2n} x_n = 0 \\ \qquad\vdots \\ a_{m1} x_1 + a_{m2} x_2 + \dots + a_{mn} x_n = 0. \end{cases}$$

Such a system is called "under-determined" if there are fewer equations than unknowns, i.e., if $m < n$.

4.1 "Under-determined systems lemma." An under-determined homogeneous system of linear equations (i.e., a system as in $(**)$ above with $m < n$) always has a nontrivial solution.

Proof: The proof is based on induction on $m$. When $m = 1$ the given system is just the single equation

$$a_{11} x_1 + a_{12} x_2 + \dots + a_{1n} x_n = 0$$

with $n \geq 2$, and if $a_{11} = 0$ we see that $x_1 = 1$, $x_2 = \dots = x_n = 0$ is a nontrivial solution. On the other hand, if $a_{11} \neq 0$ then we get a nontrivial solution by taking $x_2 = 1$, $x_1 = -(a_{12}/a_{11})$, and $x_3 = \dots = x_n = 0$. Thus, the lemma is true in the case $m = 1$.

So take $m \geq 2$ and, as an inductive hypothesis, assume that the theorem is true with $m - 1$ in place of $m$, and that we are given the homogeneous system $(**)$ with $m < n$. According to the above discussion of Gaussian elimination we know that $(**)$ has the same set of solutions as a system of the form

$$(\widetilde{**}) \qquad \begin{cases} \tilde a_{11} x_1 + \tilde a_{12} x_2 + \dots + \tilde a_{1n} x_n = 0 \\ 0\,x_1 + \tilde a_{22} x_2 + \dots + \tilde a_{2n} x_n = 0 \\ \qquad\vdots \\ 0\,x_1 + \tilde a_{m2} x_2 + \dots + \tilde a_{mn} x_n = 0, \end{cases}$$

where either $\tilde a_{11} = 1$ or $\tilde a_{11} = 0$, so it suffices to show that there is a nontrivial solution of $(\widetilde{**})$. The last $m - 1$ equations here form an under-determined system of $m - 1$ equations in the $n - 1$ unknowns $x_2, \dots, x_n$, so by the inductive hypothesis there is a nontrivial solution $x_2, \dots, x_n$. If $\tilde a_{11} = 1$ we also solve the first equation by taking $x_1 = -(\tilde a_{12} x_2 + \dots + \tilde a_{1n} x_n)$, so the proof is complete in the case $\tilde a_{11} = 1$.

On the other hand, if $\tilde a_{11} = 0$ then we get a nontrivial solution of the system $(\widetilde{**})$ by simply taking $x_1 = 1$ and $x_2 = \dots = x_n = 0$, so the proof is complete.
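Lemma 4.1 is also easy to confirm numerically. In the sketch below (our illustration, not the text's method) the nontrivial solution is read off from a full singular value decomposition rather than from Gaussian elimination: when $m < n$, the rows of $V^T$ beyond the first $m$ are orthonormal vectors spanning the null space of $A$:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 5                      # fewer equations than unknowns
A = rng.standard_normal((m, n))

_, _, Vt = np.linalg.svd(A)      # full SVD: Vt is n x n
x = Vt[-1]                       # a vector orthogonal to every row of A
print(np.allclose(A @ x, 0.0))   # True: x solves the homogeneous system
print(np.linalg.norm(x))         # 1.0, so x is certainly nontrivial
```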

A very important corollary of the under-determined systems lemma is the following linear dependence lemma.


4.2 "Linear Dependence Lemma." Suppose $v_1, v_2, \dots, v_k$ are any vectors in $\mathbb{R}^n$. Then any $k + 1$ vectors in $\mathrm{span}\{v_1, \dots, v_k\}$ are l.d.

4.3 Remark: An important special case of this is when $k = n$ and $v_j = e_j$, where $e_j$ is the $j$-th standard basis vector as in 1.4. Since $\mathrm{span}\{e_1, \dots, e_n\} =$ all of $\mathbb{R}^n$ (by 1.4), in this case the above Linear Dependence Lemma says simply that any $n + 1$ vectors in $\mathbb{R}^n$ are linearly dependent.

Proof of the Linear Dependence Lemma: Let $w_1, \dots, w_{k+1} \in \mathrm{span}\{v_1, \dots, v_k\}$. Then each $w_j$ is a linear combination of $v_1, \dots, v_k$, so that

(1)  $w_j = a_{1j} v_1 + a_{2j} v_2 + \dots + a_{kj} v_k, \quad j = 1, \dots, k + 1$,

for suitable scalars $a_{ij}$, $i = 1, \dots, k$, $j = 1, \dots, k + 1$.

Now we want to prove that $w_1, \dots, w_{k+1}$ are l.d.; in other words, we want to prove that the system of equations

(2)  $x_1 w_1 + \dots + x_{k+1} w_{k+1} = 0$

has a nontrivial solution (i.e., we must show that we can find $x_1, \dots, x_{k+1}$ which are not all zero and which satisfy the system (2)).

Substituting the given expressions (1) for the $w_j$ into (2), we see that (2) actually says

$$x_1 (a_{11} v_1 + \dots + a_{k1} v_k) + x_2 (a_{12} v_1 + \dots + a_{k2} v_k) + \dots + x_{k+1} (a_{1\,k+1} v_1 + \dots + a_{k\,k+1} v_k) = 0;$$

that is,

(3)  $(x_1 a_{11} + \dots + x_{k+1} a_{1\,k+1}) v_1 + (x_1 a_{21} + \dots + x_{k+1} a_{2\,k+1}) v_2 + \dots + (x_1 a_{k1} + x_2 a_{k2} + \dots + x_{k+1} a_{k\,k+1}) v_k = 0$,

so it suffices to get a nontrivial solution of the homogeneous linear system

$$\begin{cases} a_{11} x_1 + a_{12} x_2 + \dots + a_{1\,k+1} x_{k+1} = 0 \\ a_{21} x_1 + a_{22} x_2 + \dots + a_{2\,k+1} x_{k+1} = 0 \\ \qquad\vdots \\ a_{k1} x_1 + a_{k2} x_2 + \dots + a_{k\,k+1} x_{k+1} = 0 \end{cases}$$

of $k$ equations in the $k + 1$ unknowns $x_1, \dots, x_{k+1}$, and we can do this by the under-determined systems Lemma 4.1, so the proof is complete.

SECTION 4 EXERCISES

4.1 Use the Linear Dependence Lemma to prove that if $k \in \{2, \dots, n\}$, then $k$ vectors $v_1, \dots, v_k \in \mathbb{R}^n$ are l.d. $\iff \dim \mathrm{span}\{v_1, \dots, v_k\} \leq k - 1$.


5 THE BASIS THEOREM

We now introduce the notion of basis for a nontrivial subspace of $\mathbb{R}^n$. For this we first need the appropriate definitions, as follows.

5.1 Definition: Vectors $v_1, \dots, v_N \in \mathbb{R}^n$ are linearly independent (l.i.) if they are not linearly dependent.

Thus, $v_1, \dots, v_N$ are l.i. if and only if $\sum_{j=1}^N c_j v_j = 0 \Rightarrow c_j = 0\ \forall\, j = 1, \dots, N$.

5.2 Definition: Let $V$ be any nontrivial subspace of $\mathbb{R}^n$. A basis for $V$ is a collection of vectors $w_1, \dots, w_q$ such that

(i) $w_1, \dots, w_q$ are l.i., and

(ii) $\mathrm{span}\{w_1, \dots, w_q\} = V$.

Such a basis always exists:

5.3 Theorem ("The Basis Theorem.") Suppose $V$ is a nontrivial subspace of $\mathbb{R}^n$. Then

(a) $V$ has a basis; and

(b) if $u_1, \dots, u_k$ are l.i. vectors in $V$ then there is a basis for $V$ which contains $u_1, \dots, u_k$; more precisely, there is a basis $v_1, \dots, v_q$ for $V$ with $q \geq k$ and $v_j = u_j$ for each $j = 1, \dots, k$.

Proof of (a): Define

$$q = \max\{\ell \in \{1, \dots, n\} : \exists \text{ l.i. vectors } w_1, \dots, w_\ell \in V\},$$

and choose l.i. vectors $v_1, \dots, v_q \in V$. (Thus, roughly speaking, $v_1, \dots, v_q$ are chosen to give "a maximum number of linearly independent vectors in $V$.") We claim that such $v_1, \dots, v_q$ must span $V$. To see this, let $v$ be an arbitrary vector in $V$ and consider the vectors $v_1, \dots, v_q, v$. If $q = n$ this is a set of $n + 1$ vectors in $\mathbb{R}^n = \mathrm{span}\{e_1, \dots, e_n\}$ and so must be l.d. by the Linear Dependence Lemma 4.2. On the other hand, if $q < n$ then $q + 1 \leq n$ and so $v_1, \dots, v_q, v$ must again be l.d., otherwise $v_1, \dots, v_q, v$ is a set of $q + 1 \in \{1, \dots, n\}$ l.i. vectors in $V$, contradicting the definition of $q$. Thus, in either case ($q = n$, $q < n$), the vectors $v_1, \dots, v_q, v$ are l.d. Thus, $c_0 v + c_1 v_1 + \dots + c_q v_q = 0$ for some $c_0, \dots, c_q$ not all zero. But of course then $c_0 \neq 0$, because otherwise this identity would tell us that $c_1 v_1 + \dots + c_q v_q = 0$ with not all $c_1, \dots, c_q$ zero, contradicting the linear independence of $v_1, \dots, v_q$. Thus, $v = -c_0^{-1}(c_1 v_1 + \dots + c_q v_q)$, which completes the proof.

Proof of (b): The proof of (b) is almost the same as the proof of (a), except that we start by defining

$$q = \max\{\ell \in \{k, \dots, n\} : \exists \text{ l.i. vectors } w_1, \dots, w_\ell \in V \text{ with } w_j = u_j \text{ for each } j = 1, \dots, k\},$$

so that we can select l.i. vectors $v_1, \dots, v_q \in V$ with $v_j = u_j$ for $j = 1, \dots, k$. The remainder of the proof is identical, word for word, with the proof of (a), and yields the conclusion that $v_1, \dots, v_q$ span $V$, and hence $v_1, \dots, v_q$ are a basis for $V$ with $q \geq k$ and $v_j = u_j$ for each $j = 1, \dots, k$.


5.4 Theorem. Let $V$ be a nontrivial subspace of $\mathbb{R}^n$. Then every basis for $V$ has the same number of vectors.

Proof: (Based on the Linear Dependence Lemma 4.2.) Otherwise, we would have two sets of basis vectors $v_1, \dots, v_k$ and $w_1, \dots, w_q$ with $q > k$. Then, in particular, $V = \mathrm{span}\{v_1, \dots, v_k\} = \mathrm{span}\{w_1, \dots, w_q\} \supset \{w_1, \dots, w_{k+1}\}$; that is, $w_1, \dots, w_{k+1}$ are $k + 1$ linearly independent vectors contained in $\mathrm{span}\{v_1, \dots, v_k\}$, which contradicts the Linear Dependence Lemma.

In view of the above result we can now define the notion of "dimension" of a subspace.

5.5 Definition: If $V$ is a nontrivial subspace of $\mathbb{R}^n$ then the dimension of $V$ is the number of vectors in a basis for $V$. The trivial subspace $\{0\}$ is said to have dimension zero.

5.6 Remarks: Of course $\mathbb{R}^n$ itself has dimension $n$, because $e_1, \dots, e_n$ (as in 1.4) are l.i. and span all of $\mathbb{R}^n$; i.e., $e_1, \dots, e_n$ is a basis of $\mathbb{R}^n$ (which we call the "standard basis for $\mathbb{R}^n$"; the vector $e_j$ is called the "$j$-th standard basis vector").

We now prove a third theorem, which is also, like Theorem 5.4 above, a consequence of the Linear Dependence Lemma.

5.7 Theorem. Let $V$ be a nontrivial subspace of $\mathbb{R}^n$ of dimension $k$ (so $k \leq n$, because any $n + 1$ vectors in $\mathbb{R}^n$ are l.d. by the Linear Dependence Lemma). Then:

(a) any $k$ vectors $v_1, \dots, v_k$ in $V$ which span $V$ must be l.i. (and hence form a basis for $V$); and

(b) any $k$ l.i. vectors $v_1, \dots, v_k$ in $V$ must span $V$ (and hence form a basis for $V$).

Proof of (a): $v_1, \dots, v_k$ l.d. would mean (by Lemma 3.6 of the present chapter) that some $v_j$ is a linear combination of the remaining $v_i$, and hence that for some $j \in \{1, \dots, k\}$ we have

$$\mathrm{span}\{v_1, \dots, v_k\}\ (= V) = \mathrm{span}\{v_1, \dots, v_{j-1}, v_{j+1}, \dots, v_k\},$$

and so any basis of $V$ consists of $k$ l.i. vectors in $\mathrm{span}\{v_1, \dots, v_{j-1}, v_{j+1}, \dots, v_k\}$, contradicting the Linear Dependence Lemma.

Proof of (b): Otherwise, there is a vector $v \in V$ which is not in $\mathrm{span}\{v_1, \dots, v_k\}$, and so $v_1, \dots, v_k, v$ are $k + 1$ l.i. vectors (see Exercise 5.2 below) in the subspace $V = \mathrm{span}\{w_1, \dots, w_k\}$, where $w_1, \dots, w_k$ is any basis for $V$, contradicting the Linear Dependence Lemma.

SECTION 5 EXERCISES

5.1 Use the basis theorem to prove that if $V, W$ are subspaces of $\mathbb{R}^n$ with $V \subset W$ and $\dim V = \dim W$, then $V = W$.

5.2 If $v_1, \dots, v_k$ are l.i. and $v \notin \mathrm{span}\{v_1, \dots, v_k\}$, prove that $v_1, \dots, v_k, v$ are l.i.


6 MATRICES

We begin with some basic definitions.

6.1 Definitions: (i) An $m \times n$ real matrix $A = (a_{ij})$ is an array of $mn$ real numbers $a_{ij}$, arranged in $m$ rows and $n$ columns, with $a_{ij}$ denoting the entry in the $i$-th row and $j$-th column. Thus, the $i$-th row is the vector

$$\rho_i = (a_{i1}, \dots, a_{in}) \quad (\text{so } \rho_i^T \in \mathbb{R}^n)$$

and the $j$-th column is the vector

$$\alpha_j = (a_{1j}, \dots, a_{mj})^T \in \mathbb{R}^m.$$

(ii) If $A = (a_{ij})$ is an $m \times n$ matrix as in (i) with columns $\alpha_1, \dots, \alpha_n \in \mathbb{R}^m$ and if $x = (x_1, \dots, x_n)^T \in \mathbb{R}^n$, we define the matrix product $Ax$ to be the vector $y = (y_1, \dots, y_m)^T \in \mathbb{R}^m$ with

$$y = \sum_{k=1}^n x_k \alpha_k.$$

Equivalently,

$$y_i = \sum_{k=1}^n a_{ik} x_k, \quad i = 1, \dots, m,$$

which is also equivalent to

$$y = \sum_{i=1}^m \Big( \sum_{k=1}^n a_{ik} x_k \Big) e_i,$$

where $e_i$ is the $i$-th standard basis vector in $\mathbb{R}^m$ as in 1.4. We note, in particular, that by taking the special choice $x = e_j$ we see that this says exactly

$$A e_j = \text{the } j\text{-th column } \alpha_j = (a_{1j}, \dots, a_{mj})^T = \sum_{i=1}^m a_{ij} e_i \text{ of } A, \quad j = 1, \dots, n.$$

(iii) More generally, if $A = (a_{ij})$ is an $m \times n$ matrix and $B = (b_{ij})$ is an $n \times p$ matrix, then $AB$ denotes the $m \times p$ matrix with $j$-th column $A\beta_j\ (\in \mathbb{R}^m)$, where $\beta_j\ (\in \mathbb{R}^n)$ is the $j$-th column of $B$; thus

$$j\text{-th column of } AB = A\beta_j, \text{ where } \beta_j = \text{the } j\text{-th column of } B.$$

Equivalently,

$$AB = (c_{ij}), \text{ where } c_{ij} = \sum_{k=1}^n a_{ik} b_{kj}, \quad i = 1, \dots, m, \ j = 1, \dots, p.$$
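These definitions are easy to check numerically. The sketch below (ours, not the text's) verifies that the column description of $Ax$ in (ii) and the column-by-column description of $AB$ in (iii) agree with the usual matrix product:

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])        # 2 x 3
B = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 4.0]])             # 3 x 2
x = np.array([1.0, -1.0, 2.0])

Ax = sum(x[k] * A[:, k] for k in range(3))               # sum of columns
print(np.allclose(Ax, A @ x))                            # True

AB = np.column_stack([A @ B[:, j] for j in range(2)])    # column by column
print(np.allclose(AB, A @ B))                            # True
```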


(iv) If $A = (a_{ij})$ is an $m \times n$ matrix, then the transpose of $A$, denoted $A^T$, is the $n \times m$ matrix with entry $a_{ji}$ in the $i$-th row and $j$-th column. Notice that this notation is consistent with our previous usage of the notation $x^T$ when $x$ is a row or column vector: if we are writing vectors in $\mathbb{R}^n$ as columns, then $x^T$ denotes the row vector (i.e., $1 \times n$ matrix) with the same entries as $x$, and if we are writing vectors in $\mathbb{R}^n$ as rows then $x^T$ denotes the column (i.e., $n \times 1$ matrix) with the same entries as $x$.

Notice that

$$\alpha_j = j\text{-th column of } A \Rightarrow \alpha_j^T \text{ is the } j\text{-th row of } A^T,$$
$$\rho_i = i\text{-th row of } A \Rightarrow \rho_i^T \text{ is the } i\text{-th column of } A^T.$$

Notice that in a more formal sense an $m \times n$ matrix $A$ can be thought of as a point in $\mathbb{R}^{nm}$, with the agreement that the entries are ordered into rows and columns rather than a single row or single column. From this point of view it is natural to define the sum of matrices, multiplication of a matrix by a scalar, and the length (or "norm") $\|A\|$ of a matrix, as in the following.

Analogous to addition of vectors in $\mathbb{R}^n$ we also add matrices component-wise; thus, if $A = (a_{ij})$, $B = (b_{ij})$ are $m \times n$ matrices then we define $A + B$ to be the $m \times n$ matrix with $a_{ij} + b_{ij}$ in the $i$-th row and $j$-th column. Thus,

6.2  $(a_{ij}) + (b_{ij}) = (a_{ij} + b_{ij})$.

Note, however, that it does not make sense to add matrices of different sizes.

Again analogous to the corresponding operation for vectors in $\mathbb{R}^n$, for a given $m \times n$ matrix $A = (a_{ij})$ and a given $\lambda \in \mathbb{R}$ we define

6.3  $\lambda A = (\lambda a_{ij})$,

and for such an $m \times n$ matrix $A = (a_{ij})$ we define the "length" or "norm" of $A$, $\|A\|$, by

6.4  $\|A\| = \sqrt{\sum_{i=1}^m \sum_{j=1}^n a_{ij}^2} \ \Big(= \sqrt{\sum_{j=1}^n \sum_{i=1}^m a_{ij}^2}\Big)$.

Since after all this norm really is just the length of the vector with the $nm$ entries $a_{ij}$, we can then use the usual properties of the norm of vectors in Euclidean space:

6.5  $\|A + B\| \leq \|A\| + \|B\|, \quad \|\lambda A\| = |\lambda|\,\|A\|$.

6.6 Remark: Observe that if $A = (a_{ij})$ is $m \times n$ and $x \in \mathbb{R}^n$ then the matrix product of $A$ times $x$ gives a vector $Ax$ in $\mathbb{R}^m$, provided we agree to write vectors in $\mathbb{R}^n$ and $\mathbb{R}^m$ as columns. Indeed, if $x = (x_1, \dots, x_n)$ is a row vector of length $n$ (i.e., a $1 \times n$ matrix) with $n \geq 2$ then $Ax$ makes no sense. So we shall henceforth always assume we are writing vectors in $\mathbb{R}^n$ as columns when we wish to use the matrix product $Ax$ of an $m \times n$ matrix $A$ by a vector in $\mathbb{R}^n$.


We also have the important inequality

6.7  $\|Ax\| \leq \|A\|\,\|x\|$, $A$ any $m \times n$ matrix and $x \in \mathbb{R}^n$.

Proof of 6.7: Since $x = \sum_{j=1}^n x_j e_j$ we have $Ax = \sum_{j=1}^n x_j A e_j = \sum_{j=1}^n x_j \alpha_j$, where $\alpha_j$ is the $j$-th column of $A$ (by (ii) above), and so, by the triangle inequality 2.6, $\|Ax\| \leq \sum_{j=1}^n |x_j|\,\|\alpha_j\| = y \cdot \gamma$, where $y = (|x_1|, \dots, |x_n|)^T$, $\gamma = (\|\alpha_1\|, \dots, \|\alpha_n\|)^T \in \mathbb{R}^n$, and hence by the Cauchy-Schwarz inequality we have

$$\|Ax\| \leq \|y\| \sqrt{\sum_{j=1}^n \|\alpha_j\|^2} = \|x\|\,\|A\|,$$

as claimed.
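A quick numerical sanity check of 6.7 (our illustration; NumPy calls the norm of 6.4 the "Frobenius" norm):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))
x = rng.standard_normal(6)

normA = np.linalg.norm(A, 'fro')                             # the norm of 6.4
print(np.linalg.norm(A @ x) <= normA * np.linalg.norm(x))    # True
```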

One of the key reasons that matrices are important is that $m \times n$ matrices naturally represent linear transformations $\mathbb{R}^n \to \mathbb{R}^m$, as follows.

Suppose $T : \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation (i.e., $T$ is a mapping from $\mathbb{R}^n$ to $\mathbb{R}^m$ such that $\lambda, \mu \in \mathbb{R},\ x, y \in \mathbb{R}^n \Rightarrow T(\lambda x + \mu y) = \lambda T(x) + \mu T(y)$). Then we can write $x = \sum_{j=1}^n x_j e_j$ and use the linearity of $T$ to give $T(x) = \sum_{j=1}^n x_j T(e_j)$. Thus, if we let $A$ be the $m \times n$ matrix with $j$-th column $\alpha_j = T(e_j)\ (\in \mathbb{R}^m)$, we then have by 6.1(ii) above that $T(x) = Ax$. That is:

6.8  $T : \mathbb{R}^n \to \mathbb{R}^m$ linear $\Rightarrow T(x) \equiv Ax$, where the $j$-th column of $A = T(e_j)$.

SECTION 6 EXERCISES

6.1 Let $\theta \in [0, 2\pi)$ and let $T$ be the linear transformation of $\mathbb{R}^2$ defined by $T(x) = Q(\theta) x$, where $Q(\theta)$ is the $2 \times 2$ matrix $\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$. Prove that if $x = \begin{pmatrix} r\cos\alpha \\ r\sin\alpha \end{pmatrix}$ (with $r \geq 0$ and $\alpha \in [0, 2\pi)$) then $T(x) = \begin{pmatrix} r\cos(\alpha+\theta) \\ r\sin(\alpha+\theta) \end{pmatrix}$. With the aid of a sketch, give a geometric interpretation of this.

6.2 What is the matrix (in the sense described above) of the linear transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$ which takes the point $(x, y)$ to its "reflection in the line $y = x$," i.e., the transformation $T(x, y) = (y, x)$?

Caution: In this exercise points in $\mathbb{R}^2$ are written as row vectors, but in order to represent $T$ in terms of matrix multiplication you should first rewrite everything in terms of column vectors.

7 RANK AND THE RANK-NULLITY THEOREM

Let $A = (a_{ij})$ be an $m \times n$ matrix. We define the column space $C(A)$ of $A$ to be the subspace of $\mathbb{R}^m$ spanned by the columns of $A$: thus, if $\alpha_j = (a_{1j}, a_{2j}, \dots, a_{mj})^T$ is the $j$-th column of $A$, then

7.1  $C(A) = \mathrm{span}\{\alpha_1, \dots, \alpha_n\}$,


which (by 3.4 of the present chapter) is a subspace of $\mathbb{R}^m$. The "null space" $N(A)$ of $A$ is defined to be the set of all solutions $x \in \mathbb{R}^n$ of the homogeneous system $Ax = 0$:

7.2  $N(A) = \{x \in \mathbb{R}^n : Ax = 0\}$,

so that $N(A)$ is a subspace of $\mathbb{R}^n$.

The rank (or column rank) of $A$ is defined to be the dimension of the column space $C(A)$ of $A$; that is, it is the dimension of the subspace of $\mathbb{R}^m$ spanned by the columns of $A$.

The row rank of $A$ is defined to be the dimension of the subspace of $\mathbb{R}^n$ spanned by the rows of $A$, in case we agree to write all vectors in $\mathbb{R}^n$ as row vectors. (Equivalently, it is the column rank, i.e., the dimension of the column space, of $A^T$.)

One of the things we'll prove later (in Theorem 8.5 below) is that the row rank and the column rank are equal. This is not intuitively obvious, but true nevertheless.

The nullity of $A$ is defined to be the dimension of the null space $N(A)$.

Let's look at the extreme cases for $N(A)$: (a) $N(A) = \{0\}$ and (b) $N(A) = \mathbb{R}^n$ (i.e., the case when $N(A)$ is trivial and the case when $N(A)$ is the whole space $\mathbb{R}^n$). In case (a), we have that $Ax = 0$ has only the trivial solution $x = 0$; but $Ax = \sum_{j=1}^n x_j \alpha_j$, so this just says $\sum_{j=1}^n x_j \alpha_j = 0$ has only the trivial solution $x = 0$. That is, (a) just says that the columns of $A$ are l.i., and hence are a basis for $C(A)$, and the column space $C(A)$ has dimension $n$. In case (b), we have $Ax = 0$ for each $x$ in $\mathbb{R}^n$, which, in particular, means that $A e_j = 0$, which says that the $j$-th column of $A$ is the zero vector for each $j$, i.e., $A = O$ (the zero matrix, with every entry $= 0$), and hence $C(A) = \{0\}$ in this case.

Notice that in each of these two extreme cases we have $\dim C(A) + \dim N(A) = n$. The rank/nullity theorem (below) says that this is in fact true in all cases, not just in the extreme cases.

7.3 Rank/Nullity Theorem. Let $A$ be any $m \times n$ matrix. Then

$$\dim C(A) + \dim N(A) = n.$$

7.4 Remarks: (1) Notice this is correct in the extreme cases $N(A) = \{0\}$ and $N(A) = \mathbb{R}^n$ (as we discussed above), so we only have to give the proof in the nonextreme cases (i.e., when $N(A) \neq \{0\}$ and $N(A) \neq \mathbb{R}^n$).

(2) Recall that $\dim C(A)$ is called the "rank" of $A$ (i.e., "$\mathrm{rank}(A)$") and $\dim N(A)$ the "nullity" of $A$ (i.e., "$\mathrm{nullity}(A)$"); using this terminology the above theorem says

$$\mathrm{rank}(A) + \mathrm{nullity}(A) = n,$$

hence the name "rank/nullity theorem."

Proof of Rank/Nullity Theorem: By Remark (1) above we can assume that $N(A) \neq \{0\}$ and $N(A) \neq \mathbb{R}^n$. In particular, since $N(A)$ is nontrivial it has a basis $u_1, \dots, u_k$. By part (b) of the Basis Theorem 5.3 we can find a basis $v_1, \dots, v_n$ for all of $\mathbb{R}^n$ with $v_j = u_j$ for each $j = 1, \dots, k$, and of course also $k \leq n - 1$ because $N(A) \neq \mathbb{R}^n$. Now

$$\begin{aligned} C(A) &= \mathrm{span}\{\alpha_1, \dots, \alpha_n\}, \text{ where } \alpha_1, \dots, \alpha_n \text{ denote the columns of } A \\ &= \{x_1 \alpha_1 + \dots + x_n \alpha_n : x_1, \dots, x_n \in \mathbb{R}\} \\ &= \{Ax : x \in \mathbb{R}^n\} \\ &= \{A(\textstyle\sum_{j=1}^n c_j v_j) : c_1, \dots, c_n \in \mathbb{R}\} \quad (\text{because } \mathrm{span}\{v_1, \dots, v_n\} = \mathbb{R}^n) \\ &= \{\textstyle\sum_{j=1}^n c_j A v_j : c_1, \dots, c_n \in \mathbb{R}\} \\ &= \{\textstyle\sum_{j=k+1}^n c_j A v_j : c_{k+1}, \dots, c_n \in \mathbb{R}\} \quad (\text{because } A v_j = 0 \text{ for } j = 1, \dots, k) \\ &= \mathrm{span}\{A v_{k+1}, \dots, A v_n\}. \end{aligned}$$

Also, $A v_{k+1}, \dots, A v_n$ are l.i.; check:

$$\begin{aligned} \textstyle\sum_{j=k+1}^n c_j A v_j = 0 &\Rightarrow A\big(\textstyle\sum_{j=k+1}^n c_j v_j\big) = 0 \\ &\Rightarrow \textstyle\sum_{j=k+1}^n c_j v_j \in N(A) = \mathrm{span}\{v_1, \dots, v_k\} \\ &\Rightarrow \textstyle\sum_{j=k+1}^n c_j v_j = \textstyle\sum_{j=1}^k d_j v_j \text{ for some } d_1, \dots, d_k \\ &\Rightarrow \textstyle\sum_{j=1}^k d_j v_j - \textstyle\sum_{j=k+1}^n c_j v_j = 0 \\ &\Rightarrow c_{k+1}, \dots, c_n, d_1, \dots, d_k \text{ are all } 0, \text{ because } v_1, \dots, v_n \text{ are l.i.} \end{aligned}$$

Thus, $A v_{k+1}, \dots, A v_n$ are a basis for $C(A)$, and hence $\dim C(A) = n - k = n - \dim N(A)$.
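The theorem is easy to check numerically; in the sketch below (our example) the nullity is computed by counting the zero singular values of $A$:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0],      # = 2 x (first row)
              [1.0, 0.0, 1.0, 0.0]])     # 3 x 4 matrix of rank 2

rank = np.linalg.matrix_rank(A)
s = np.linalg.svd(A, compute_uv=False)
nullity = A.shape[1] - np.count_nonzero(s > 1e-10)
print(rank, nullity, rank + nullity == A.shape[1])   # 2 2 True
```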

SECTION 7 EXERCISES

7.1 (a) Use Gaussian elimination to show that the solution set of the homogeneous system $Ax = 0$ with matrix

$$A = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 4 & 3 & 2 \\ 2 & 5 & 6 & 7 \\ 1 & 0 & 3 & 6 \end{pmatrix}$$

is a plane (i.e., the span of 2 l.i. vectors), and find explicitly 2 l.i. vectors whose span is the solution space.

(b) Find the dimension of the column space of the above matrix (i.e., the dimension of the subspace of $\mathbb{R}^4$ which is spanned by the columns of the matrix).

Hint: Use the result of (a) and the rank-nullity theorem.

7.2 If $A, B$ are any $m \times n$ matrices, prove that $\mathrm{rank}(A + B) \leq \mathrm{rank}(A) + \mathrm{rank}(B)$. ($\mathrm{rank}(A)$ is the dimension of the column space of $A$, i.e., the dimension of the subspace of $\mathbb{R}^m$ spanned by the columns of $A$.)

7.3 (a) If $A, B$ are, respectively, $m \times n$ and $n \times p$ matrices, prove that $\mathrm{rank}(AB) \leq \min\{\mathrm{rank}(A), \mathrm{rank}(B)\}$.


(b) Give an example in the case $n = m = p = 2$ to show that strict inequality may hold in the inequality of (a).

7.4 Suppose $v_1, \dots, v_n$ are vectors in $\mathbb{R}^m$, not all zero, and let $\ell \in \{1, \dots, n\}$ be the maximum number of linearly independent vectors which can be selected from $v_1, \dots, v_n$. Select any $1 \leq j_1 < j_2 < \dots < j_\ell \leq n$ such that $v_{j_1}, \dots, v_{j_\ell}$ are l.i. Prove that $v_{j_1}, \dots, v_{j_\ell}$ is a basis for $\mathrm{span}\{v_1, \dots, v_n\}$.

(In particular, given a nonzero $m \times n$ matrix $A$ with $j$-th column $\alpha_j$, we can always select certain of the columns $\alpha_{j_1}, \dots, \alpha_{j_\ell}$ to give a basis for $C(A)$.)

8 ORTHOGONAL COMPLEMENTS AND ORTHOGONAL PROJECTION

Let $V$ be a subspace of $\mathbb{R}^n$. The "orthogonal complement" $V^\perp$ of $V$ is defined to be the set of all vectors $x$ in $\mathbb{R}^n$ such that $x \cdot v = 0$ for each $v \in V$. That is, $V^\perp$ is the set of all vectors which are orthogonal to each vector in $V$. Of course $V^\perp$ so defined is a subspace of $\mathbb{R}^n$.

If $V$ is the trivial subspace $\{0\}$ we evidently have $V^\perp = \mathbb{R}^n$, whereas if $V$ is nontrivial the Basis Theorem 5.3(a) guarantees that we can choose a basis $u_1, \dots, u_k$ for $V$, and then by 5.3(b) (applied with $\mathbb{R}^n$ in place of $V$) there is a basis $v_1, \dots, v_n$ for $\mathbb{R}^n$ with $v_j = u_j$ for each $j = 1, \dots, k$. It is then very easy to check that $V^\perp$ is exactly characterized by saying that

$$x \in V^\perp \iff u_j \cdot x = 0 \text{ for each } j = 1, \dots, k.$$

The $k$ equations here form a homogeneous system of $k$ linear equations in the $n$ unknowns $x_1, \dots, x_n$, so by the under-determined systems Lemma 4.1 it is a nontrivial subspace unless $k = n$, i.e., unless $V = \mathbb{R}^n$.

Our main initial results about $V^\perp$ are given in the following theorem.

8.1 Theorem. Let $V$ be a subspace of $\mathbb{R}^n$. Then the subspace $V^\perp$ satisfies the following:

(i)  $V^\perp \cap V = \{0\}$
(ii)  $V + V^\perp = \mathbb{R}^n$
(iii)  $\dim V + \dim V^\perp = n$
(iv)  $(V^\perp)^\perp = V$.

Remark: Notice that (i) says that $V$ and $V^\perp$ have only the zero vector in common, and (ii) says that every vector in $\mathbb{R}^n$ can be written as the sum of a vector in $V$ and a vector in $V^\perp$.

Proof of (i): $w \in V \cap V^\perp \Rightarrow w \in V$ and $w \in V^\perp$. Thus, in particular, $w \cdot w = 0$, i.e., $\|w\|^2 = 0$, i.e., $w = 0$.

Proof of (ii): $V + V^\perp = \{v + u : v \in V,\ u \in V^\perp\}$, and this is clearly a subspace of $\mathbb{R}^n$. Call this subspace $W$. Take any vector $x \in W^\perp$, so that $x \cdot (v + u) = 0$ for each $v \in V$ and each $u \in V^\perp$.


Taking $u = 0$, this implies $x \cdot v = 0$ for each $v \in V$, so $x \in V^\perp$, and, taking $v = 0$, it implies $x \cdot u = 0$ for each $u \in V^\perp$. In particular, since $x \in V^\perp$, we must have $x \cdot x = 0$. Thus, $x = 0$; that is, we have shown that $W^\perp$ is the trivial subspace, and, by the remarks preceding the statement of the theorem, this can occur only if $W = \mathbb{R}^n$, so (ii) is proved.

Proof of (iii): Notice that (iii) is trivially true if $V$ is the trivial subspace (because then $V^\perp = \mathbb{R}^n$), so we can assume $V$ is nontrivial. Likewise, we can assume that $V^\perp$ is nontrivial, because, as we pointed out in the discussion preceding the statement of the theorem, $V^\perp$ trivial implies $V = \mathbb{R}^n$, and again (iii) holds. Thus, we can assume that both $V$ and $V^\perp$ are nontrivial, and hence by the basis theorem we can select a basis $v_1, \dots, v_k$ for $V$ and a basis $u_1, \dots, u_\ell$ for $V^\perp$. Of course then by (ii) we have $\mathrm{span}\{v_1, \dots, v_k, u_1, \dots, u_\ell\} = \mathbb{R}^n$, and we claim that $v_1, \dots, v_k, u_1, \dots, u_\ell$ are linearly independent. To check this, suppose that $c_1 v_1 + \dots + c_k v_k + d_1 u_1 + \dots + d_\ell u_\ell = 0$. Then $c_1 v_1 + \dots + c_k v_k = -d_1 u_1 - \dots - d_\ell u_\ell$, and the left side is in $V$ while the right side is in $V^\perp$, so by (i) both sides must be zero. That is, $c_1 v_1 + \dots + c_k v_k = 0 = d_1 u_1 + \dots + d_\ell u_\ell$, and since $v_1, \dots, v_k$ are l.i., this gives all the $c_j = 0$, and similarly, since $u_1, \dots, u_\ell$ are l.i., all the $d_j = 0$.

Thus, we have proved that the vectors $v_1, \dots, v_k, u_1, \dots, u_\ell$ both span $\mathbb{R}^n$ and are l.i.; that is, they are a basis for $\mathbb{R}^n$, and hence $k + \ell = n$. That is, $\dim V + \dim V^\perp = n$, as claimed.

Proof of (iv): Observe that by definition of $V^\perp$ we have $v \cdot u = 0$ for each $v \in V$ and each $u \in V^\perp$, which, in particular, says that each $v \in V$ is in $(V^\perp)^\perp$ (because it says that each $v \in V$ is orthogonal to each vector in $V^\perp$), so we have shown $V \subset (V^\perp)^\perp$.

By (iii), $\dim V^\perp = n - \dim V$. Also, $V^\perp$ is a subspace of $\mathbb{R}^n$, so (iii) holds with $V^\perp$ in place of $V$. That is, $\dim (V^\perp)^\perp = n - \dim V^\perp = n - (n - \dim V) = \dim V$.

That is, we have shown that $V \subset (V^\perp)^\perp$ and that $V, (V^\perp)^\perp$ have the same dimension. They therefore must be equal. (Notice that here we use the following general principle: if $V, W$ are subspaces of $\mathbb{R}^n$ and $V \subset W$, then $\dim V = \dim W$ implies $V = W$. This is an easy consequence of the Basis Theorem 5.3—see Exercise 5.1 above.)

This completes the proof of Theorem 8.1.

8.2 Remark: 8.1(ii) tells us that we can write any vector x ∈ R^n as x = v + u with v ∈ V and u ∈ V^⊥. We claim that such v, u are unique. Indeed, if we could also write $x = \tilde v + \tilde u$ with $\tilde v \in V$ and $\tilde u \in V^\perp$, then we would have $x = v + u = \tilde v + \tilde u$, hence $v - \tilde v = \tilde u - u$. Since the left side is in V and the right side is in V^⊥, we would then have by 8.1(i) that $v - \tilde v = 0 = \tilde u - u$, i.e., that $v = \tilde v$ and $u = \tilde u$. So v, u are unique as claimed. For this reason the decomposition of 8.1(ii) is sometimes referred to as a direct sum.

Using Thm. 8.1 we can prove the existence of a unique orthogonal projection of R^n onto a given subspace V, as follows.


8.3 Theorem (Orthogonal Projection.) Given a subspace V ⊂ R^n there is a unique map $P_V : \mathbb{R}^n \to \mathbb{R}^n$ with the two properties

$P_V(x) \in V \ \ \forall x \in \mathbb{R}^n$,
$x - P_V(x) \in V^\perp \ \ \forall x \in \mathbb{R}^n$.

This map automatically has the additional properties

(i) $P_V : \mathbb{R}^n \to \mathbb{R}^n$ is linear (i.e., $P_V(\lambda x + \mu y) = \lambda P_V(x) + \mu P_V(y)$ for all x, y ∈ R^n)

(ii) $x \cdot P_V(y) = P_V(x) \cdot y$ for all x, y ∈ R^n

(iii) for all x ∈ R^n, $\|x - P_V(x)\| \le \|x - y\|$ for all y ∈ V, with equality ⟺ $y = P_V(x)$.

8.4 Remark: The map P_V is called the orthogonal projection of R^n onto V. Notice that the properties (i), (ii) tell us that P_V is given by matrix multiplication by a symmetric matrix $(p_{ij})$; that is, $P_V(x) = \sum_{i,j=1}^n p_{ij} x_j e_i$, where $p_{ij} = p_{ji}$ for each i, j = 1, …, n, and property (iii) says that $P_V(x)$ is the unique nearest point of V to x, as in the following schematic diagram depicting the case n = 3, dim V = 2.

Figure 1.1: The orthogonal projection P V  onto a subspace V .

Proof of Theorem 8.3: Theorem 8.1(ii) implies that for any x ∈ R^n we can write x = v + u with v ∈ V and u ∈ V^⊥, and by Rem. 8.2 such v, u are unique. Thus, we can define $P_V(x) = v$ (∈ V), and then $x - P_V(x) = u$ (∈ V^⊥). Using Thm. 8.1(i) we can readily check the claimed uniqueness of P_V and the stated properties (i) and (ii) (see Exercise 8.2 below).

To prove (iii), observe that for any x ∈ R^n and y ∈ V we have $\|x - y\|^2 = \|(x - P_V(x)) - (y - P_V(x))\|^2 = \|x - P_V(x)\|^2 + \|y - P_V(x)\|^2 - 2(x - P_V(x)) \cdot (y - P_V(x))$, and the last term is


zero because $x - P_V(x) \in V^\perp$ and $y - P_V(x) \in V$. That is, we have shown that $\|x - y\|^2 = \|x - P_V(x)\|^2 + \|y - P_V(x)\|^2 \ge \|x - P_V(x)\|^2$, with equality if and only if $y - P_V(x) = 0$, i.e., if and only if $y = P_V(x)$.
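As an aside, the projection of Rem. 8.4 can be written down explicitly once a basis of V is chosen: if the columns of A are a basis of V, one standard formula (assumed here, not derived in the text) is $P_V = A(A^T A)^{-1} A^T$. A short hedged numerical sketch:

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 2))          # columns: a basis of V in R^3
P = A @ np.linalg.inv(A.T @ A) @ A.T     # candidate matrix of P_V

x = rng.standard_normal(3)
v = P @ x                                # P_V(x), a point of V
print(np.allclose(P, P.T))               # symmetric, as Rem. 8.4 predicts
print(np.allclose(A.T @ (x - v), 0))     # x - P_V(x) lies in V-perp
y = A @ rng.standard_normal(2)           # any other point of V
print(np.linalg.norm(x - v) <= np.linalg.norm(x - y))   # property (iii)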

The following theorem, which is based on Thm. 8.1 above and the Rank-Nullity Theorem, establishes some important connections between column spaces and null spaces of a matrix A and its transpose A^T.

8.5 Theorem. Let A be an m × n matrix. Then

(i) $(C(A))^\perp = N(A^T)$

(ii) $C(A) = (N(A^T))^\perp$

(iii) $\dim C(A) = \dim C(A^T)$.

8.6 Remark: Notice that (iii) says that the row rank and the column rank are the same.

Proof of (i): Let $\alpha_1, \dots, \alpha_n$ be the columns of A. Then

$x \in (C(A))^\perp \iff x \cdot \alpha_j = 0 \ \forall j = 1, \dots, n \iff A^T x = 0 \iff x \in N(A^T) .$

Proof of (ii): By (i), $(C(A))^\perp = N(A^T)$, and hence (using (iv) of Thm. 8.1) $C(A) = ((C(A))^\perp)^\perp = (N(A^T))^\perp$.

Proof of (iii): By the rank-nullity theorem (applied to the transpose matrix A^T) we have

(1) $\qquad \dim C(A^T) = m - \dim N(A^T) .$

By part (iii) of Thm. 8.1 and by (i) above we have

(2) $\qquad \dim C(A) = m - \dim (C(A))^\perp = m - \dim N(A^T) .$

By (1), (2) we have $\dim C(A) = \dim C(A^T)$ as claimed.
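A quick numerical illustration (an illustration only, not a proof) of Thm. 8.5(iii): NumPy reports the same rank for A and A^T, i.e., the row rank equals the column rank.

import numpy as np

A = np.array([[1., 2., 3., 4.],
              [1., 4., 3., 2.],
              [2., 5., 6., 7.]])
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T))   # equal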

SECTION 8 EXERCISES

8.1 (a) If V is a subspace of R^n and if P is the orthogonal projection onto V, prove that I − P is the orthogonal projection onto V^⊥ (I is the identity transformation, so (I − P)(x) = x − P(x)).

(b) If α ∈ R^n with ‖α‖ = 1 and if V = span{α}, find the matrix of the orthogonal projection onto V and also the matrix of the orthogonal projection onto V^⊥.

8.2 If V and $P_V(x)$ are as in the first part of Thm. 8.3 above, prove that properties (i), (ii) hold as claimed.

Hint: To check (i), note that $P_V(\lambda x + \mu y) - \lambda P_V(x) - \mu P_V(y) = -(\lambda x + \mu y - P_V(\lambda x + \mu y)) + \lambda(x - P_V(x)) + \mu(y - P_V(y))$ and use Thm. 8.1(i) and the stated properties of P_V.


9 ROW ECHELON FORM OF A MATRIX 

Let $A = (a_{ij})$ be an m × n matrix. Recall that the “First Stage” (described in Sec. 4 above) of the process of Gaussian elimination leads to 2 possible cases: either the first column is zero, or else there is at least one nonzero entry in the first column, and using only the 3 elementary row operations we produce a new matrix with all entries in the first column zero except for the first entry, which we can arrange to be 1. That is, Stage 1 of the process of Gaussian elimination produces a new matrix $\tilde A$ with the same null space as A (because elementary row operations leave the solution set of the homogeneous system Ax = 0 unchanged), and $\tilde A$ has one of the two forms:

Case 1: $\tilde A = \begin{pmatrix} 0 & \tilde a_{12} & \cdots & \tilde a_{1n} \\ 0 & \tilde a_{22} & \cdots & \tilde a_{2n} \\ \vdots & \vdots & & \vdots \\ 0 & \tilde a_{m2} & \cdots & \tilde a_{mn} \end{pmatrix}$ or Case 2: $\tilde A = \begin{pmatrix} 1 & \tilde a_{12} & \cdots & \tilde a_{1n} \\ 0 & \tilde a_{22} & \cdots & \tilde a_{2n} \\ \vdots & \vdots & & \vdots \\ 0 & \tilde a_{m2} & \cdots & \tilde a_{mn} \end{pmatrix} .$

Now in Case 1 we can apply this whole first stage process again to the m × (n − 1) matrix

$\begin{pmatrix} \tilde a_{12} & \cdots & \tilde a_{1n} \\ \tilde a_{22} & \cdots & \tilde a_{2n} \\ \vdots & & \vdots \\ \tilde a_{m2} & \cdots & \tilde a_{mn} \end{pmatrix} ,$

while in Case 2 we can apply the whole first stage process again to the (m − 1) × (n − 1) matrix

$\begin{pmatrix} \tilde a_{22} & \cdots & \tilde a_{2n} \\ \vdots & & \vdots \\ \tilde a_{m2} & \cdots & \tilde a_{mn} \end{pmatrix} .$

Page 33: Math51H

7/28/2019 Math51H

http://slidepdf.com/reader/full/math51h 33/142

9. ROW ECHELON FORM OF A MATRIX 23

Using mathematical induction on the number n of columns we can check that, in fact, after at most n repetitions of the process described above, we obtain a matrix of the following form:

$$
\begin{pmatrix}
0 \cdots 0 & 1 & * \cdots * & * & * \cdots * & * & * \cdots * & * & * \cdots * \\
0 \cdots 0 & 0 & 0 \cdots 0 & 1 & * \cdots * & * & * \cdots * & * & * \cdots * \\
0 \cdots 0 & 0 & 0 \cdots 0 & 0 & 0 \cdots 0 & 1 & * \cdots * & * & * \cdots * \\
\vdots & & & & & & & & \vdots \\
0 \cdots 0 & 0 & 0 \cdots 0 & 0 & 0 \cdots 0 & 0 & 0 \cdots 0 & 1 & * \cdots * \\
0 \cdots 0 & 0 & 0 \cdots 0 & 0 & 0 \cdots 0 & 0 & 0 \cdots 0 & 0 & 0 \cdots 0 \\
\vdots & & & & & & & & \vdots \\
0 \cdots 0 & 0 & 0 \cdots 0 & 0 & 0 \cdots 0 & 0 & 0 \cdots 0 & 0 & 0 \cdots 0
\end{pmatrix}
$$

Schematic Diagram of the Row Echelon Form of A (the pivot 1's stand in columns number $j_1 < j_2 < j_3 < \cdots < j_Q$; the first Q rows are the block labeled (a), and the remaining rows are the block labeled (b)),

where the first Q rows (labeled (a) above) are nonzero, and each contains a pivot, i.e., a 1 preceded by an unbroken string of zeros, and each successive one of these rows has a strictly longer unbroken string of zeros before the pivot; and the remaining rows (labeled (b) above), if there are any, are all zero.

Notice that mathematical induction does indeed provide a rigorous proof of this, because the result is trivially true for n = 1, and for n ≥ 2 the inductive hypothesis that the result is true for matrices with n − 1 columns would tell us that the smaller matrix above can be reduced to echelon form by elementary row operations. By applying the same row operations to the matrix $\tilde A$ we then get the required echelon form for A.
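The reduction just described is easy to express in code. The following is a minimal sketch (mine, not the text's), assuming floating-point arithmetic with a crude tolerance; a production version would also pivot on the largest available entry for numerical stability.

import numpy as np

def row_echelon(A, tol=1e-12):
    A = A.astype(float).copy()
    m, n = A.shape
    r = 0                                   # row where the next pivot will go
    for c in range(n):                      # scan columns left to right
        rows = [i for i in range(r, m) if abs(A[i, c]) > tol]
        if not rows:                        # Case 1: no usable entry; next column
            continue
        A[[r, rows[0]]] = A[[rows[0], r]]   # row swap brings a nonzero entry up
        A[r] = A[r] / A[r, c]               # scale so the pivot equals 1
        for i in range(r + 1, m):           # eliminate the entries below the pivot
            A[i] -= A[i, c] * A[r]
        r += 1                              # Case 2: continue on the submatrix
        if r == m:
            break
    return A

print(row_echelon(np.array([[0., 2., 4.], [1., 1., 1.], [2., 2., 2.]])))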

Such a form (obtained from A by elementary row operations) is called a “row echelon form” for the matrix A. Notice that there are a certain number Q ≤ min{m, n} of nonzero rows, each of which consists of an unbroken string of zeros (“the leading zeros” of the row) followed by a 1 called the “leading 1” or “pivot,” and each successive nonzero row has strictly more leading zeros than the previous. Correspondingly, the column numbers of the pivot columns (i.e., the columns which contain the pivots) are $j_1, \dots, j_Q$, with $1 \le j_1 < j_2 < \cdots < j_Q \le n$, as shown in the above schematic diagram.

Of course by using further elementary row operations (subtracting the appropriate multiples of a row containing a pivot from the rows above it) we can now eliminate all nonzero entries above the pivot in each of the pivot columns, thus yielding a matrix called the “reduced row echelon form of A” (abbreviated rref A):


(‡)

$$
\begin{pmatrix}
0 \cdots 0 & 1 & * \cdots * & 0 & * \cdots * & 0 & * \cdots * & 0 & * \cdots * \\
0 \cdots 0 & 0 & 0 \cdots 0 & 1 & * \cdots * & 0 & * \cdots * & 0 & * \cdots * \\
0 \cdots 0 & 0 & 0 \cdots 0 & 0 & 0 \cdots 0 & 1 & * \cdots * & 0 & * \cdots * \\
\vdots & & & & & & & & \vdots \\
0 \cdots 0 & 0 & 0 \cdots 0 & 0 & 0 \cdots 0 & 0 & 0 \cdots 0 & 1 & * \cdots * \\
0 \cdots 0 & 0 & 0 \cdots 0 & 0 & 0 \cdots 0 & 0 & 0 \cdots 0 & 0 & 0 \cdots 0 \\
\vdots & & & & & & & & \vdots \\
0 \cdots 0 & 0 & 0 \cdots 0 & 0 & 0 \cdots 0 & 0 & 0 \cdots 0 & 0 & 0 \cdots 0
\end{pmatrix}
$$

The Reduced Row Echelon Form (“rref A”) of A (pivot 1's in columns number $j_1, \dots, j_Q$, now with zeros above as well as below each pivot).

Note: Such schematic diagrams are accurate and useful, but must be interpreted correctly; for instance, if $j_1 = 1$ (i.e., if there is a pivot in the first column), then the first few zero columns shown in the above diagrams are not present, and the reduced row echelon form of A (rref A) would look as follows:

$$
\begin{pmatrix}
1 & * \cdots * & 0 & * \cdots * & 0 & * \cdots * & 0 & * \cdots * \\
0 & 0 \cdots 0 & 1 & * \cdots * & 0 & * \cdots * & 0 & * \cdots * \\
0 & 0 \cdots 0 & 0 & 0 \cdots 0 & 1 & * \cdots * & 0 & * \cdots * \\
\vdots & & & & & & & \vdots \\
0 & 0 \cdots 0 & 0 & 0 \cdots 0 & 0 & 0 \cdots 0 & 1 & * \cdots * \\
0 & 0 \cdots 0 & 0 & 0 \cdots 0 & 0 & 0 \cdots 0 & 0 & 0 \cdots 0 \\
\vdots & & & & & & & \vdots \\
0 & 0 \cdots 0 & 0 & 0 \cdots 0 & 0 & 0 \cdots 0 & 0 & 0 \cdots 0
\end{pmatrix}
$$

(with pivots in columns number $j_1 = 1, j_2, j_3, \dots, j_Q$).

In the case when there is a pivot in every column (i.e., Q = n and $j_1 = 1, j_2 = 2, \dots, j_n = n$), a row echelon form of A would look like


$$
\begin{pmatrix}
1 & * & * & \cdots & * \\
0 & 1 & * & \cdots & * \\
\vdots & & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
0 & 0 & 0 & \cdots & 0 \\
\vdots & & & & \vdots \\
0 & 0 & 0 & \cdots & 0
\end{pmatrix}
$$

and the reduced row echelon form of A (i.e., rref A) is:

(‡‡)

$$
\begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
0 & 0 & 0 & \cdots & 0 \\
\vdots & & & & \vdots \\
0 & 0 & 0 & \cdots & 0
\end{pmatrix} .
$$

That is, if there is a pivot in every column (Q = n and $j_1, \dots, j_Q = 1, \dots, n$, respectively), then the reduced row echelon form of A has first n rows equal to the n × n identity matrix, and remaining rows all zero.

Since Gaussian elimination (which uses only the 3 elementary row operations) does not change the null space, we have

$N(A) = N(\mathrm{rref}\,A) ,$

so to find N(A) we instead only need solve the simpler problem of finding N(rref A).

Notice that in case Q = 0 we have the trivial case A = O and N(A) = R^n, and when Q = n we have a pivot in every column and N(A) = N(rref A) = {0}, so we assume from now on that Q ∈ {1, …, n − 1}, and we proceed to compute N(rref A).

If we let $b_{ij}$ denote the element in the i-th row and j-th column of rref A, then we see that, for i ∈ {1, …, Q}, equation number i in the system $\mathrm{rref}\,A\,x = 0$ is satisfied ⟺ $x_{j_i} = -\sum_{k \ne j_1, \dots, j_Q} x_k b_{ik}$, and of course the last m − Q equations are all trivially true for all x because the last m − Q rows of rref A are zero. Thus, since for any vector x ∈ R^n we can write $x = \sum_{k=1}^n x_k e_k = \sum_{i=1}^Q x_{j_i} e_{j_i} + \sum_{k \ne j_1, \dots, j_Q} x_k e_k$, we see that

$Ax = 0 \iff \mathrm{rref}\,A\,x = 0 \iff x_{j_i} = -\sum_{k \ne j_1, \dots, j_Q} x_k b_{ik}, \ i = 1, \dots, Q$
$\iff x = -\sum_{i=1}^Q \bigl( \sum_{k \ne j_1, \dots, j_Q} x_k b_{ik} \bigr) e_{j_i} + \sum_{k \ne j_1, \dots, j_Q} x_k e_k$
$\iff x = \sum_{k \ne j_1, \dots, j_Q} x_k \bigl( e_k - \sum_{i=1}^Q b_{ik} e_{j_i} \bigr)$


with $x_k$ arbitrary for $k \ne j_1, \dots, j_Q$. Thus,

$N(A) = \mathrm{span}\{\, e_k - \sum_{i=1}^Q b_{ik} e_{j_i} : k \ne j_1, \dots, j_Q \,\} ,$

and the n − Q vectors on the right here are clearly l.i., so in fact we have $\dim N(A) = n - Q$.

Observe also that rref A has the standard basis vectors $e_1, \dots, e_Q$ in columns number $j_1, \dots, j_Q$, and all the other columns have their last m − Q entries = 0, so trivially the column space of rref A is spanned by the l.i. vectors $e_1, \dots, e_Q$: if $k \ne j_1, \dots, j_Q$, then the k-th column $\beta_k$ of rref A is $\sum_{\ell=1}^Q c_\ell e_\ell$ for suitable constants $c_1, \dots, c_Q$. Since the j-th column of a matrix is obtained by multiplying the matrix by the j-th standard basis vector $e_j$, in terms of matrix multiplication this says exactly $\mathrm{rref}\,A\,(e_k - \sum_{\ell=1}^Q c_\ell e_{j_\ell}) = 0$; i.e., the vector $e_k - \sum_{\ell=1}^Q c_\ell e_{j_\ell}$ is in N(rref A). Since N(rref A) = N(A) we must then have that $e_k - \sum_{\ell=1}^Q c_\ell e_{j_\ell}$ is in N(A), or in other words, $A(e_k - \sum_{\ell=1}^Q c_\ell e_{j_\ell}) = 0$. That is, for each $k \ne j_1, \dots, j_Q$, the k-th column $\alpha_k$ of A is the linear combination $\sum_{\ell=1}^Q c_\ell \alpha_{j_\ell}$. Thus, we have proved that the column space C(A) of A is spanned by the columns number $j_1, \dots, j_Q$ of A (i.e., the columns $\alpha_{j_1}, \dots, \alpha_{j_Q}$ of A, where $j_1, \dots, j_Q$ are the numbers of the pivot columns of rref A). We also claim that $\alpha_{j_1}, \dots, \alpha_{j_Q}$ are l.i., because otherwise there are scalars $c_1, \dots, c_Q$ not all zero such that $\sum_{\ell=1}^Q c_\ell \alpha_{j_\ell} = 0$, and this says exactly that the vector $\sum_{\ell=1}^Q c_\ell e_{j_\ell} \in N(A) = N(\mathrm{rref}\,A)$, with $c_1, \dots, c_Q$ not all zero, so that $\sum_{\ell=1}^Q c_\ell e_\ell = \mathrm{rref}\,A\,(\sum_{\ell=1}^Q c_\ell e_{j_\ell}) = 0$ with $c_1, \dots, c_Q$ not all zero, which is of course not true because $e_\ell$, $\ell = 1, \dots, Q$, are l.i. vectors. Observe that this part of the argument (that $\alpha_{j_1}, \dots, \alpha_{j_Q}$ are l.i.) is also correct if Q = n; of course in this case we must have $j_1 = 1, j_2 = 2, \dots, j_n = n$ and the result is that all n columns of A are l.i.

Thus, to summarize, we have proved that if the pivot columns of rref A are columns number $j_1, \dots, j_Q$ (Q ≥ 1), then the columns $\alpha_{j_1}, \dots, \alpha_{j_Q}$ are a basis for the column space of A. In particular, Q (the number of nonzero rows in rref A) is equal to the dimension of C(A), the column space of A. Since we already proved directly that $\dim N(A) = n - Q$, we have thus given a second proof of the rank/nullity theorem, and at the same time we have shown how to explicitly find the null space N(A) and the column space C(A).

SECTION 9 EXERCISES

9.1 Write down a single homogeneous linear equation with unknowns $x_1, x_2, x_3$ such that the solution set consists of the span of the 2 vectors

$\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} , \qquad \begin{pmatrix} 1 \\ -2 \\ 0 \end{pmatrix} .$


9.2 Use Gaussian elimination to show that the solution set of the homogeneous system Ax = 0 with matrix

$A = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 4 & 3 & 2 \\ 2 & 5 & 6 & 7 \\ 1 & 0 & 3 & 6 \end{pmatrix}$

is a plane (i.e., the span of 2 l.i. vectors), and find explicitly 2 l.i. vectors whose span is the solution space.

(b) Find the dimension of the column space of the above matrix (i.e., the dimension of the subspace of R^4 which is spanned by the columns of the matrix).

Hint: Use the result of (a) and the rank-nullity theorem.

9.3 Suppose

$A = \begin{pmatrix} 1 & 0 & 1 & 1 & 1 \\ 2 & 1 & 1 & 3 & 1 \\ 1 & 1 & 2 & 0 & 2 \\ 0 & 0 & 1 & -1 & 1 \end{pmatrix} .$

Find a basis for the null space N(A) and the column space C(A) of A.

10 INHOMOGENEOUS SYSTEMS

Notice that Gaussian elimination can also be used to solve inhomogeneous systems

10.1 $\qquad Ax = b ,$

where A is m × n, b is a given vector in R^m, and x is to be determined. First we have the following lemma, which establishes that the set of all solutions of such an inhomogeneous system is either empty or an affine space which is a translate of the null space N(A) of A.

10.2 Lemma. Suppose A is m × n and b ∈ R^m. Then:

(i) there exists a solution x ∈ R^n of 10.1 ⟺ b ∈ C(A) (the column space of A);

(ii) if $x_0 \in \mathbb{R}^n$ is any solution of 10.1, then the complete set of solutions x of 10.1 is the affine space $x_0 + N(A)$.

10.3 Remark: If V is a subspace of R^n and a ∈ R^n, then “the affine space through a parallel to V” means the translate of V by the vector a; that is, $a + V$ (= {a + x : x ∈ V}). With this terminology, part (ii) of the above lemma says that the set of all solutions of 10.1 is the affine space through $x_0$ parallel to N(A), assuming that there is at least one solution $x = x_0$ of 10.1.

Proof of 10.2: Observe first that if $\alpha_j$ is the j-th column of A, then by definition of matrix multiplication 6.1(ii) we have $Ax = \sum_{j=1}^n x_j \alpha_j$, which means that the set of all vectors Ax corresponding


Figure 1.2: Schematic picture of the affine space a + V  with n = 3, dim V  = 2.

to x ∈ Rn is just the set of all linear combinations of the columns α1, . . . , αn, i.e., the column spaceof A. Hence, Ax = b has a solution if and only if b ∈ C(A), so (i) is proved.

Suppose x0 ∈ Rn  with Ax0 = b and observe that then

Ax = b ⇐⇒ A(x − x0) + Ax0 = b ⇐⇒ A(x − x0) = 0

⇐⇒ x − x0 ∈ N(A) ⇐⇒ x ∈ x0 + N(A) ,

so (ii) is also proved.

In practice, we can check if there is a solution $x_0$, and if so actually find the whole solution set, by using Gaussian elimination as follows: We start with the m × (n + 1) “augmented matrix” $(A \mid b)$, which has columns $\alpha_1, \dots, \alpha_n, b$, and use the same elementary row operations which reduce A to rref A, but applied to the augmented matrix instead of A: this gives a new m × (n + 1) augmented matrix $(\mathrm{rref}\,A \mid \tilde b)$ such that $\mathrm{rref}\,A\,x = \tilde b$ has the same set of solutions as the original system Ax = b. One can then directly check whether the new system $\mathrm{rref}\,A\,x = \tilde b$ has a solution, and, if so, actually find all the solutions $x_0 + N(A)$.
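A hedged worked sketch of this procedure (the example matrix is my own choice): row-reduce the augmented matrix, detect inconsistency by a pivot landing in the last column, and otherwise read off the solution set.

from sympy import Matrix

A = Matrix([[1, 0, 1], [2, 1, 1], [1, 1, 0]])
b = Matrix([1, 1, 0])
R, pivots = A.row_join(b).rref()    # rref of the augmented matrix (A | b)
if A.cols in pivots:                # a pivot in the b-column: b not in C(A)
    print("no solution")
else:
    sol, params = A.gauss_jordan_solve(b)
    print(sol)                      # general solution, parametrized: x0 + N(A)
    print(A.nullspace())            # basis of N(A)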

SECTION 10 EXERCISES

10.1 Find a cubic polynomial $ax^3 + bx^2 + cx + d$ whose graph passes through the 4 points (−1, 0), (0, 1), (1, −2), (2, 5). Is this cubic polynomial unique?

10.2 For each of the inhomogeneous systems with the indicated augmented matrix $(A \mid b)$: (a) check whether or not a solution exists, and (b) in case a solution exists, find the set of all solutions (describe


your solution in terms of an affine space $x_0 + V$, giving $x_0$ and a basis for the subspace V in each case):

(i) $\ (A \mid b) = \left(\begin{array}{ccccc|c} 1 & 0 & 1 & 1 & 1 & 1 \\ 2 & 1 & 1 & 3 & 1 & 1 \\ 1 & 1 & 2 & 0 & 2 & 2 \\ 0 & 0 & 1 & -1 & 1 & 1 \end{array}\right)$

(ii) $\ (A \mid b) = \left(\begin{array}{ccccc|c} 1 & 0 & 1 & 1 & 2 & 1 \\ 2 & 1 & 1 & 3 & 1 & 1 \\ 1 & 1 & 2 & 0 & 2 & 2 \\ 0 & 0 & 1 & -1 & 1 & 1 \end{array}\right) .$


CHAPTER 2

Analysis in R^n

1 OPEN AND CLOSED SETS IN R^n

We begin with the definition of open and closed sets in R^n. In these definitions we use the notation that for ρ > 0 and y ∈ R^n the open ball of radius ρ and center y (i.e., $\{x \in \mathbb{R}^n : \|x - y\| < \rho\}$) is denoted $B_\rho(y)$.

1.1 Definition: A subset U ⊂ R^n is open if for each a ∈ U there is a ρ > 0 such that the ball $B_\rho(a) \subset U$.

Observe that direct use of the above definition yields the following general facts:

• R^n and ∅ (the empty set) are both open.

• If $U_1, \dots, U_N$ is any finite collection of open sets, then $\cap_{j=1}^N U_j$ is open (thus “the intersection of finitely many open sets is again open”).

• If $\{U_\alpha\}_{\alpha \in \Lambda}$ (Λ any nonempty indexing set) is any collection of open sets, then $\cup_{\alpha \in \Lambda} U_\alpha$ is open (thus “the union of any collection of open sets is again open”).

In the definition of closed set we need the notion of limit point of a set C ⊂ R^n: a point y ∈ R^n is said to be a limit point of a set C ⊂ R^n if there is a sequence $\{x^{(k)}\}_{k=1,2,\dots}$ with $x^{(k)} \in C$ for each k and $\lim_{k\to\infty} x^{(k)} = y$ (i.e., $\lim_{k\to\infty} \|x^{(k)} - y\| = 0$). For example, in the case n = 1 the point 0 is a limit point of the open interval (0, 1), because $\{\frac{1}{k+1}\}_{k=1,2,\dots}$ is a sequence of points in (0, 1) with limit zero. Notice that any point y ∈ C is trivially a limit point of C, because it is the limit of the constant sequence $\{x^{(k)}\}_{k=1,2,\dots}$ with $x^{(k)} = y$ for each k; hence it is always true that

$C \subset \{\text{limit points of } C\} .$

1.2 Definition: A subset C ⊂ Rn is closed if C contains all its limit points.

 There is a very important connection between open and closed sets.

1.3 Lemma. A set U  ⊂ Rn is open ⇐⇒ Rn \ U  is closed.

Before we give the proof, we point out an equivalent version of the same result.

1.4 Corollary. A set C ⊂ Rn is closed  ⇐⇒ Rn \ C is open.

Proof of Corollary: Apply 1.3 to the case U = R^n \ C; this works because U = R^n \ C ⟺ C = R^n \ U.


Proof of 1.3 “⇒”: Assume U is open and let y be a limit point of R^n \ U. If y ∈ U, then by definition of open there is ρ > 0 such that $B_\rho(y) \subset U$, so all points of R^n \ U have distance at least ρ from y; so, in particular, y cannot be a limit point of R^n \ U, contradicting the choice of y. Thus, we have eliminated the possibility that y ∈ U, and hence y ∈ R^n \ U.

Proof of 1.3 “⇐”: Take y ∈ U. Then y is not a limit point of R^n \ U (because R^n \ U contains all of its limit points), and we claim there must be some ρ > 0 such that $B_\rho(y) \cap (\mathbb{R}^n \setminus U) = \emptyset$. Indeed, otherwise we would have $B_\rho(y) \cap (\mathbb{R}^n \setminus U) \ne \emptyset$ for every ρ > 0, and, in particular, $B_{1/k}(y) \cap (\mathbb{R}^n \setminus U) \ne \emptyset$ for each k = 1, 2, …, so for each k = 1, 2, … there is a point $x^{(k)} \in B_{1/k}(y) \cap (\mathbb{R}^n \setminus U)$, hence $\|x^{(k)} - y\| < \frac{1}{k}$ and y is a limit point of R^n \ U, a contradiction. So in fact there is indeed a ρ > 0 such that $B_\rho(y) \cap (\mathbb{R}^n \setminus U) = \emptyset$, or, in other words, $B_\rho(y) \subset U$.

Notice that in view of the De Morgan Laws (see Exercise 1.5 below) $\mathbb{R}^n \setminus (\cup_{\alpha \in \Lambda} U_\alpha) = \cap_{\alpha \in \Lambda} (\mathbb{R}^n \setminus U_\alpha)$ and $\mathbb{R}^n \setminus (\cap_{\alpha \in \Lambda} U_\alpha) = \cup_{\alpha \in \Lambda} (\mathbb{R}^n \setminus U_\alpha)$, and the general properties of open sets mentioned above, we conclude the following general properties of closed sets:

• R^n and ∅ are both closed.

• If $C_1, \dots, C_N$ is any finite collection of closed sets, then $\cup_{j=1}^N C_j$ is closed (thus “the union of finitely many closed sets is again closed”).

• If $\{C_\alpha\}_{\alpha \in \Lambda}$ (Λ any nonempty indexing set) is any collection of closed sets, then $\cap_{\alpha \in \Lambda} C_\alpha$ is closed (thus “the intersection of any collection of closed sets is again closed”).

SECTION 1 EXERCISES

1.1 Prove that $\{(x, y) \in \mathbb{R}^2 : y > x^2\}$ is an open set in R^2 which is not closed, and $\{(x, y) \in \mathbb{R}^2 : y \le x^2\}$ is closed but not open.

1.2 Let A ⊂ R^n, and define E ⊂ A to be relatively open in A if for each x ∈ E there is δ > 0 such that $B_\delta(x) \cap A \subset E$.

Prove that E ⊂ A is relatively open in A ⟺ there exists an open set U ⊂ R^n such that E = A ∩ U.

1.3 Check the claim made above that $U_1, \dots, U_N$ open ⇒ $\cap_{j=1}^N U_j$ is open.

1.4 Give an example, in the case n = 1, to show that the result of the previous exercise does not extend to the intersection of infinitely many open sets; thus, give an example of open sets $U_1, U_2, \dots$ in R such that $\cap_{j=1}^\infty U_j$ is not open.

1.5 Prove the first De Morgan law mentioned above; that is, if $\{A_\alpha\}_{\alpha \in \Lambda}$ is any collection of subsets of R^n, then $\mathbb{R}^n \setminus (\cup_{\alpha \in \Lambda} A_\alpha) = \cap_{\alpha \in \Lambda} (\mathbb{R}^n \setminus A_\alpha)$.

Hint: You have to prove $\mathbb{R}^n \setminus (\cup_{\alpha \in \Lambda} A_\alpha) \subset \cap_{\alpha \in \Lambda} (\mathbb{R}^n \setminus A_\alpha)$ and $\mathbb{R}^n \setminus (\cup_{\alpha \in \Lambda} A_\alpha) \supset \cap_{\alpha \in \Lambda} (\mathbb{R}^n \setminus A_\alpha)$, i.e., $x \in \mathbb{R}^n \setminus (\cup_{\alpha \in \Lambda} A_\alpha) \Rightarrow x \in \cap_{\alpha \in \Lambda} (\mathbb{R}^n \setminus A_\alpha)$ and $x \in \cap_{\alpha \in \Lambda} (\mathbb{R}^n \setminus A_\alpha) \Rightarrow x \in \mathbb{R}^n \setminus (\cup_{\alpha \in \Lambda} A_\alpha)$.


2 BOLZANO-WEIERSTRASS, LIMITS AND CONTINUITY IN R^n

Here we shall write points in R^n as rows: $x = (x_1, \dots, x_n)$.

We first give the definition of convergence of a sequence $\{x^{(k)}\}_{k=1,2,\dots}$ of such points:

2.1 Definition: Given a sequence $\{x^{(k)}\}_{k=1,2,\dots}$ of points in R^n, $\lim_{k\to\infty} x^{(k)} = y$ means $\lim_{k\to\infty} \|x^{(k)} - y\| = 0$.

Notice that this makes sense, because $\{\|x^{(k)} - y\|\}_{k=1,2,\dots}$ is a sequence of real numbers, and we are already familiar with the definition of convergence of such real sequences. In terms of ε, N, Def. 2.1 actually says that $\lim x^{(k)} = y$ ⟺ for each ε > 0 there is N such that $\|x^{(k)} - y\| < \varepsilon$ whenever k ≥ N. In view of the inequalities

2.2 $\qquad \max\{ |x_j^{(k)} - y_j| : j = 1, \dots, n \} \le \|x^{(k)} - y\| \le \sum_{j=1}^n |x_j^{(k)} - y_j|$

it is very easy to check the following lemma (see Exercise 2.2 below).

2.3 Lemma. $\lim x^{(k)} = y$ with $y = (y_1, \dots, y_n)$ ⟺ $\lim x_j^{(k)} = y_j$ for all $j = 1, \dots, n$.

2.4 Remark: The real sequences $\{x_j^{(k)}\}_{k=1,2,\dots}$ (for j = 1, …, n) are referred to as the component sequences corresponding to the vector sequence $x^{(k)}$, so with this terminology Lem. 2.3 says that the vector sequence $x^{(k)}$ converges if and only if each of the component sequences converges, and in this case the limit of the vector sequence is the point in R^n with j-th component equal to the limit of the j-th component sequence, j = 1, …, n.

Recall from Lecture 2 of the Appendix that every bounded sequence in R has a convergent subsequence. A similar theorem is true for sequences in R^n.

2.5 Theorem (Bolzano-Weierstrass in R^n.) If $\{x^{(k)}\}_{k=1,2,\dots}$ is a bounded sequence in R^n (so that there is a constant R with $\|x^{(k)}\| \le R$ for all k = 1, 2, …), then there is a convergent subsequence $\{x^{(k_j)}\}_{j=1,2,\dots}$.

Proof: As we mentioned above, this is already proved in Lecture 2 of the Appendix for the case n = 1, and the general case follows from this and induction on n. The details are left as an exercise (see Exercise 2.3 below).

Now suppose A is an arbitrary subset of R^n, and let f : A → R^m, where m ≥ 1. We want to define continuity of f at a point a ∈ A. The definition which follows is similar to the special case discussed in Lecture 3 of Appendix A, i.e., the special case when A is an interval in R and m = 1. In the general case when A is any subset of R^n and m ≥ 1 the definition is in fact as follows:

Definition (i): f : A → R^m is continuous at the point a ∈ A if for each ε > 0 there is δ > 0 such that $x \in B_\delta(a) \cap A \Rightarrow \|f(x) - f(a)\| < \varepsilon$.


(ii) f : A → R^m is said to be continuous if it is continuous at every point a ∈ A.

Analogous to the 1-variable theorem discussed in Lecture 3 of Appendix A, we have the following theorem in R^n.

2.6 Theorem. Suppose K ⊂ R^n is compact (i.e., K is a closed bounded subset of R^n) and f : K → R is continuous. Then f attains its maximum and minimum values on K; that is, there are points a, b ∈ K with f(a) ≤ f(x) ≤ f(b) for all x ∈ K.

Proof: The proof is almost identical to the proof of the corresponding 1-variable theorem given in Lecture 3 of the Appendix, except that we use the Bolzano-Weierstrass theorem in R^n rather than in R. (See Exercise 2.4 below.)

We shall also need the concept of limit of a function which is defined on a subset of R^n. So suppose f : A → R^m, where A ⊂ R^n and m ≥ 1. We shall define the notion of limit at any point a ∈ A which is not an isolated point of A: a point a ∈ A is called an isolated point of A if there is some $\delta_0 > 0$ such that $A \cap B_{\delta_0}(a) = \{a\}$ (i.e., if there is some ball $B_{\delta_0}(a)$ such that a is the only point of A which lies in the ball $B_{\delta_0}(a)$). Thus a ∈ A is not an isolated point of A if and only if $A \cap (B_\delta(a) \setminus \{a\}) \ne \emptyset$ for each δ > 0.

2.7 Definition: Suppose A ⊂ R^n, f : A → R^m, a ∈ A, a not an isolated point of A, and y ∈ R^m. Then $\lim_{x\to a} f(x) = y$ means that for each ε > 0 there is a δ > 0 such that $x \in A \cap B_\delta(a) \setminus \{a\} \Rightarrow \|f(x) - y\| < \varepsilon$.

2.8 Remark: With the notation of the above definition, notice that $\lim_{x\to a} f(x) = y$ with $y = (y_1, \dots, y_m)$ ⟺ $\lim_{x\to a} f_j(x) = y_j$ for all j = 1, …, m.

We leave the proof as an exercise (see Exercise 2.4 below), and we also leave the proofs of the standard limit theorems in the following lemma as an exercise (see Exercise 2.4).

2.9 Lemma. Suppose λ, μ ∈ R, f, g : A → R^m, a ∈ A, and $\lim_{x\to a} f(x)$ and $\lim_{x\to a} g(x)$ both exist. Then:

(i) $\lim_{x\to a} (\lambda f(x) + \mu g(x))$ exists and $= \lambda \lim_{x\to a} f(x) + \mu \lim_{x\to a} g(x)$

(ii) $m = 1 \Rightarrow \lim_{x\to a} (f(x) g(x))$ exists and $= (\lim_{x\to a} f(x))(\lim_{x\to a} g(x))$

(iii) m = 1 and $\lim_{x\to a} g(x) \ne 0 \Rightarrow \lim_{x\to a} (f(x)/g(x))$ exists and $= (\lim_{x\to a} f(x))/(\lim_{x\to a} g(x))$.

2.10 Remark: Suppose f : A → R^m and a ∈ A. Observe that f is trivially continuous at a if a is an isolated point of A, and if a is not an isolated point of A, then f is continuous at a if and only if $\lim_{x\to a} f(x) = f(a)$ (meaning $\lim_{x\to a} f(x)$ exists and is equal to f(a)).


SECTION 2 EXERCISES

2.1 (Sandwich theorem.) Suppose A ⊂ R^n, a ∈ A, y ∈ R^m, f, g, h : A → R^m and $\lim_{x\to a} g(x) = \lim_{x\to a} h(x) = y$. Prove $g(x) \le f(x) \le h(x)$ for all x ∈ A ⇒ $\lim_{x\to a} f(x) = y$.

Note: Part of what you have to prove is that $\lim_{x\to a} f(x)$ exists.

2.2 Give the proof of Lem. 2.3 above, using the inequalities 2.2.

2.3 Using the Bolzano-Weierstrass theorem for real sequences (from Lecture 2 of Appendix A), give the proof of Thm. 2.5 above.

Caution: An application of the theorem in the case n = 1 (Lecture 2 of Appendix A) tells us that each component sequence $\{x_j^{(k)}\}_{k=1,2,\dots}$ has a convergent subsequence, but you need to take account of the fact that there may be a different subsequence corresponding to each j.

2.4 Give the proofs of Thm. 2.6, Rem. 2.8, and Lem. 2.9 above.

2.5 Suppose K is a compact (i.e., closed and bounded) subset of R^n and let f : K → R be continuous. Prove that for each ε > 0 there is δ > 0 such that x, y ∈ K and $\|x - y\| < \delta$ ⇒ $|f(x) - f(y)| < \varepsilon$.

Hint: Otherwise there is $\varepsilon = \varepsilon_0 > 0$ such that this fails for each δ > 0, which means it fails for $\delta = \frac{1}{k}$ for each k = 1, 2, …, which means that for each k there are points $x_k, y_k \in K$ with $\|x_k - y_k\| < \frac{1}{k}$ and $|f(x_k) - f(y_k)| \ge \varepsilon_0$.

3 DIFFERENTIABILITY IN R^n

Recall first the definition of differentiability of a function f : (α, β) → R of one variable: f is differentiable at a point a ∈ (α, β) if

3.1 $\qquad \lim_{h\to 0} \frac{f(a+h) - f(a)}{h}$ exists.

In this case, the limit is denoted f′(a) and is referred to as the derivative of f at a.

This definition does not extend well to functions of more than one variable because in the case of functions f : U → R^m with U ⊂ R^n open and n ≥ 2, difference quotients like $\frac{f(a+h) - f(a)}{h}$ make no sense (we cannot divide by the vector quantity h). We therefore need an alternative definition of differentiability which makes sense for such functions and which is equivalent to the usual definition 3.1 of differentiability when n = 1. The key observation in this regard is that 3.1 (in the case n = 1 when f : (α, β) → R and a ∈ (α, β)) is equivalent to the requirement that there is a real number A such that

3.2 $\qquad \lim_{h\to 0} |h|^{-1} |f(a+h) - (f(a) + Ah)| = 0 .$


Notice that such A exists if and only if $A = \lim \frac{f(a+h) - f(a)}{h}$, because

$|h|^{-1} |f(a+h) - (f(a) + Ah)| = \Bigl| \frac{f(a+h) - f(a)}{h} - A \Bigr|$

and hence

$\lim_{h\to 0} |h|^{-1} |f(a+h) - (f(a) + Ah)| = 0 \iff \lim_{h\to 0} \frac{f(a+h) - f(a)}{h} = A .$

So 3.2 is equivalent to 3.1 in case n = 1, but the point now is that 3.2 has a natural generalization to the case when f : U → R^m, where U ⊂ R^n is open and m, n are arbitrary positive integers, as follows.

3.3 Definition: If f : U → R^m, then we say that f is differentiable at a ∈ U if there is an m × n matrix A such that

$\lim_{h\to 0} \|h\|^{-1} \|f(a+h) - (f(a) + Ah)\| = 0 .$

Figure 2.1: Definition of differentiability, n = 2, m = 1.

In case 3.3 holds, A is called the derivative matrix (or Jacobian matrix) of the function f at the point a. It is unique if it exists at all, and we could fairly easily check that now, but in any case it will be a by-product of the result of Thm. 4.3 in the next section, so we'll leave the proof of uniqueness until then.

The derivative matrix A of f at a, when it exists, is usually denoted Df(a).

As for functions of 1 variable we have the fact that differentiability ⇒ continuity. More precisely:


3.4 Lemma. Suppose f : U → R^m with U open in R^n and suppose a ∈ U. Then differentiability of f at a ⇒ continuity of f at a.

Proof: Assume f is differentiable at a ∈ U. Pick ρ > 0 such that $B_\rho(a) \subset U$, which we can do because U is open. Then for $0 < \|h\| < \rho$ we have

$\|f(a+h) - f(a)\| = \|f(a+h) - (f(a) + Df(a)h) + Df(a)h\|$
$\le \|f(a+h) - (f(a) + Df(a)h)\| + \|Df(a)h\|$
$\le \bigl( \|h\|^{-1} \|f(a+h) - (f(a) + Df(a)h)\| \bigr) \|h\| + \|Df(a)h\| ,$

and $\lim_{h\to 0} \|h\|^{-1}\|f(a+h) - (f(a) + Df(a)h)\|$, $\lim_{h\to 0} \|h\|$, and $\lim_{h\to 0} \|Df(a)h\|$ are all zero, so, by Lem. 2.9, $\lim_{h\to 0} \|f(a+h) - f(a)\| = 0 \cdot 0 + 0 = 0$, which is the definition of continuity of f at a.
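Definition 3.3 is easy to probe numerically. A hypothetical example (mine, not the text's): for f(x, y) = (xy, x + y²) the derivative matrix at a = (1, 2) is worked out by hand in the comments, and the quotient in 3.3 visibly tends to 0.

import numpy as np

def f(p):
    x, y = p
    return np.array([x * y, x + y ** 2])

a = np.array([1.0, 2.0])
A = np.array([[2.0, 1.0],    # row 1: partials of xy at a are (y, x) = (2, 1)
              [1.0, 4.0]])   # row 2: partials of x + y^2 at a are (1, 2y) = (1, 4)
rng = np.random.default_rng(2)
for t in [1e-1, 1e-3, 1e-5]:
    h = t * rng.standard_normal(2)
    print(np.linalg.norm(f(a + h) - f(a) - A @ h) / np.linalg.norm(h))  # -> 0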

4 DIRECTIONAL DERIVATIVES, PARTIAL DERIVATIVES, AND GRADIENT

Notice that if f : U → R^m with U open and if a ∈ U, we can find ρ > 0 such that $B_\rho(a) \subset U$, and so, for any vector v ∈ R^n, f(a + tv) is a function of the real variable t, and is well defined for $|t| < \|v\|^{-1}\rho$ if v ≠ 0, and is a constant function f(a + tv) ≡ f(a) for all t ∈ R if v = 0.

4.1 Definition: The directional derivative of f at a by the vector v, denoted $D_v f(a)$, is defined by

$D_v f(a) = \lim_{h\to 0} h^{-1} (f(a + hv) - f(a))$

whenever the limit on the right exists. Observe that, since g(t) = f(a + tv) is a function of the real variable t for t in some open interval containing 0, Def. 4.1 just says that the directional derivative $D_v f(a)$ is exactly the derivative g′(0) at t = 0 of the function g(t) defined by g(t) = f(a + tv).

4.2 Definition: In the special case $v = e_j$ (the j-th standard basis vector in R^n) the directional derivative $D_{e_j} f(a)$ is usually denoted $D_j f(a)$ whenever it exists; thus

$D_j f(a) = \lim_{h\to 0} h^{-1} (f(a + h e_j) - f(a)) .$

$D_j f(a)$ so defined is called the j-th partial derivative of f at a. An alternative notation (sometimes referred to as “classical notation”) is to write

$\frac{\partial f(x)}{\partial x_j} = D_j f(x) .$

Notice that the simplest way of calculating $D_j f(x)$ is usually simply to take the derivative (as in 1-variable calculus) of the function $f(x_1, \dots, x_n)$ with respect to the single variable $x_j$ while holding all the other variables $x_i$, i ≠ j, fixed. Thus, in general, it is not necessary to use the formal limit definition of 4.2.


There is an important connection between partial derivatives and the derivative matrix at points where f is differentiable, given in the following theorem.

4.3 Theorem. If f : U → R^m with U open and if f is differentiable at a point a ∈ U, then all the partial derivatives of f at a exist, and in fact the j-th column of the derivative matrix Df(a) is exactly the j-th partial derivative $D_j f(a)$, j = 1, …, n. Furthermore, if f is differentiable at a, then all the directional derivatives $D_v f(a)$ exist, and are given by the formula

$D_v f(a) = Df(a)\, v ,$

or equivalently,

$D_v f(a) = \sum_{j=1}^n v_j D_j f(a) .$

4.4 Caution: The above says, in particular, that f differentiable at a implies that all the partial derivatives $D_j f(a)$, j = 1, …, n, and all the directional derivatives $D_v f(a)$ exist, but the converse is false; that is, it may happen that all the directional derivatives (including the partial derivatives) exist yet f still fails to be differentiable at a. (See Exercise 4.1 below.)

Proof of Theorem 4.3: Take ρ > 0 such that $B_\rho(a) \subset U$ and let A = Df(a). We are given $\lim_{h\to 0} \|h\|^{-1}\|f(a+h) - (f(a) + Ah)\| = 0$, so given ε > 0 there is a δ ∈ (0, ρ) such that $0 < \|h\| < \delta \Rightarrow \|h\|^{-1}\|f(a+h) - (f(a) + Ah)\| < \varepsilon$, and taking, in particular, $h = h e_j$ we see that h ∈ R with $0 < |h| < \delta \Rightarrow |h|^{-1}\|f(a + h e_j) - (f(a) + h A e_j)\| < \varepsilon \iff \|h^{-1}(f(a + h e_j) - f(a)) - A e_j\| < \varepsilon$, which is precisely the ε, δ definition of $\lim h^{-1}(f(a + h e_j) - f(a)) = A e_j$, so in fact $D_j f(a)$ exists and equals the j-th column of Df(a). If v is any vector in $\mathbb{R}^n \setminus \{0\}$ and if h ∈ R with $0 < |h| < \delta/\|v\|$, then $|h|^{-1}\|f(a + hv) - (f(a) + hAv)\| < \varepsilon$, and hence $\|h^{-1}(f(a + hv) - f(a)) - Av\| < \varepsilon$, which, since ε > 0 is arbitrary, says $\lim_{h\to 0} h^{-1}(f(a + hv) - f(a)) = Av$; that is, $D_v f(a)$ exists and is equal to $Av = Df(a)v = \sum_{j=1}^n v_j D_j f(a)$, since we already proved the j-th column of Df(a) is $D_j f(a)$.
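Numerically, Thm. 4.3 says that difference quotients along $e_1, \dots, e_n$ assemble into Df(a), and the directional quotient along any v matches Df(a)v. A small sketch under my own choice of f:

import numpy as np

def f(p):
    x, y = p
    return np.array([np.sin(x * y), x ** 2 + y])

a, h = np.array([0.5, -1.0]), 1e-6
Df = np.column_stack([(f(a + h * e) - f(a)) / h      # j-th column ~ D_j f(a)
                      for e in np.eye(2)])
v = np.array([2.0, 3.0])
Dvf = (f(a + h * v) - f(a)) / h                      # directional quotient
print(np.allclose(Dvf, Df @ v, atol=1e-4))           # True, up to O(h) error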

4.5 Definition: Let f : U → R with U open. The gradient of f at a point a ∈ U is the vector $\nabla f(a) = (D_1 f(a), \dots, D_n f(a))^T$.

We claim that the gradient has geometric significance: If f is differentiable at a, it gives the direction of the fastest rate of change of f when we start from the point a, and furthermore its magnitude $\|\nabla f(a)\|$ gives the actual rate of change. More precisely:

4.6 Lemma. Suppose f : U → R with U open, let a ∈ U and assume that f is differentiable at a. Then

$\max\{ D_v f(a) : v \in \mathbb{R}^n,\ \|v\| = 1 \} = \|\nabla f(a)\| ,$

and for $\nabla f(a) \ne 0$ the maximum is attained when and only when $v = \|\nabla f(a)\|^{-1} \nabla f(a)$.

Proof: By Thm. 4.3 we have $D_v f(a) = \sum_{j=1}^n v_j D_j f(a) = v \cdot \nabla f(a)$, and by the version of the Cauchy-Schwarz inequality in Ch. 1, Exercise 2.2 we have for $\|v\| = 1$ that

(1) $\qquad v \cdot \nabla f(a) \le \|\nabla f(a)\|$


and equality holds in this inequality if and only if $\nabla f(a) = \lambda v$ with λ ≥ 0. But $\nabla f(a) = \lambda v$ with λ ≥ 0 and $\|v\| = 1$ ⇒ $\lambda = \|\nabla f(a)\|$, so if $\nabla f(a) \ne 0$ equality holds in (1) if and only if $v = \|\nabla f(a)\|^{-1} \nabla f(a)$, as claimed.
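A quick numerical check of Lem. 4.6 (my own illustration, for f(x, y) = x²y, whose gradient is (2xy, x²)): over many random unit vectors, v · ∇f(a) never exceeds ‖∇f(a)‖, and the bound is attained at v = ‖∇f(a)‖⁻¹∇f(a).

import numpy as np

a = np.array([1.0, 2.0])
grad = np.array([2 * a[0] * a[1], a[0] ** 2])        # grad f(a) = (4, 1)
rng = np.random.default_rng(3)
vs = rng.standard_normal((1000, 2))
vs /= np.linalg.norm(vs, axis=1, keepdims=True)      # random unit vectors
print(np.max(vs @ grad) <= np.linalg.norm(grad) + 1e-12)         # True
print(grad @ grad / np.linalg.norm(grad), np.linalg.norm(grad))  # equal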

In view of the remarks in 4.4 above, we have the inconvenient fact that all the partial derivatives of f might exist, yet f still not be differentiable. Fortunately, there is after all a criterion, involving only the partial derivatives of f, which guarantees differentiability of f, as follows.

4.7 Theorem. Let  f  : U  → Rm and assume that there is a ball  Bρ (a) ⊂ U  such that the partial derivatives Dj f(x) exist at each point x ∈ Bρ (a) and are continuous at a.Then f  is differentiable at a.

4.8 Remark: In particular, if the partial derivatives Dj f  exist everywhere in U  and are continuousat each point of U  (in this case we say that “f  is a C1 function on U ”), then we can apply the abovetheorem at every point of U , so f  is differentiable at every point of  U  in this case.

Proof of Theorem 4.7: Observe that it suffices to prove the theorem in the case m = 1, because in case m ≥ 2 we have $f(x) = (f_1(x), \dots, f_m(x))^T$ and we can apply the case m = 1 to each individual component $f_j$, which evidently implies the required result. With m = 1 we have f real-valued, and for $0 < \|h\| < \rho$ we have

(1) $\qquad f(a+h) - f(a) = \sum_{j=1}^n \bigl( f(a + h^{(j)}) - f(a + h^{(j-1)}) \bigr) ,$

where we use the notation $h^{(j)} = \sum_{i=1}^j h_i e_i$ for j = 1, …, n and $h^{(0)} = 0$. Observe that then $h^{(j)} = h^{(j-1)} + h_j e_j$ for j = 1, …, n, and so in the difference $f(a + h^{(j)}) - f(a + h^{(j-1)}) = f(a + h^{(j-1)} + h_j e_j) - f(a + h^{(j-1)})$ we are varying only the j-th variable (i.e., a single variable), so the Mean Value Theorem from 1-variable calculus implies $f(a + h^{(j)}) - f(a + h^{(j-1)}) = h_j D_j f(a + h^{(j-1)} + \theta_j h_j e_j)$ for some $\theta_j \in (0, 1)$, whence (1) implies

$f(a+h) - f(a) = \sum_{j=1}^n h_j D_j f(a + h^{(j-1)} + \theta_j h_j e_j) ,$

and hence

$f(a+h) - f(a) - Df(a)h = \sum_{j=1}^n h_j \bigl( D_j f(a + h^{(j-1)} + \theta_j h_j e_j) - D_j f(a) \bigr) .$


In particular, with $Df = (D_1 f, \dots, D_n f)$,

(2) $\qquad |f(a+h) - f(a) - Df(a)h| = \bigl| \sum_{j=1}^n h_j \bigl( D_j f(a + h^{(j-1)} + \theta_j h_j e_j) - D_j f(a) \bigr) \bigr|$
$\le \sum_{j=1}^n |h_j| \, \bigl| D_j f(a + h^{(j-1)} + \theta_j h_j e_j) - D_j f(a) \bigr|$
$\le \|h\| \sum_{j=1}^n \bigl| D_j f(a + h^{(j-1)} + \theta_j h_j e_j) - D_j f(a) \bigr| ,$

where we used the triangle inequality (2.6 of Ch. 1) to go from the first line to the second.

Let ε > 0. Since $D_j f(x)$ is continuous at x = a, for each j = 1, …, n we have $\delta_j \in (0, \rho)$ such that

$\|k\| < \delta_j \Rightarrow |D_j f(a + k) - D_j f(a)| < \varepsilon/n ,$

and hence with $\delta = \min\{\delta_1, \dots, \delta_n\}$ we have

(3) $\qquad \|k\| < \delta \Rightarrow |D_j f(a + k) - D_j f(a)| < \varepsilon/n$ for each j = 1, …, n.

Finally, observe that if $\|h\| < \delta$, then $\|h^{(j-1)} + \theta_j h_j e_j\| = \sqrt{\sum_{i=1}^{j-1} h_i^2 + \theta_j^2 h_j^2} \le \|h\| < \delta$, and hence we can use (3) with $k = h^{(j-1)} + \theta_j h_j e_j$, and so (2) implies

$|f(a+h) - f(a) - Df(a)h| < \varepsilon \|h\|$ whenever $0 < \|h\| < \delta$,

thus proving that f is differentiable at a.

SECTION 4 EXERCISES

4.1 Let f be the function on R^2 defined by $f(x, y) = \begin{cases} \dfrac{x^3}{y} & \text{if } y \ne 0 \\ 0 & \text{if } y = 0. \end{cases}$

(i) Prove that the directional derivative $D_v f(0, 0)$ (exists and) = 0 for each v ∈ R^2.

(ii) Prove that f is not continuous at (0, 0).

(iii) Prove that f is not differentiable at (0, 0).

4.2 For the function $f(x_1, x_2) = \begin{cases} \dfrac{|x_1| x_2}{\sqrt{x_1^2 + x_2^2}} & \text{if } (x_1, x_2) \ne (0, 0) \\ 0 & \text{if } (x_1, x_2) = (0, 0), \end{cases}$

(i) Find the directional derivatives at (0, 0) (and show they all exist),

(ii) Show that the formula $D_v f(0, 0) = \sum_{j=1}^2 v_j D_j f(0, 0)$ fails for some vectors v ∈ R^2,

(iii) Hence, using (ii) together with the appropriate theorem, prove that f is not differentiable at (0, 0).


5 CHAIN RULE

The chain rule for functions of one variable says that the composite function g ∘ f is differentiable at a point x with derivative g′(f(x)) f′(x) provided both the derivatives f′(x) and g′(f(x)) exist. The chain rule for functions of more than one variable is similar, and the proof is more or less identical, except that the derivatives f′, g′ must be replaced by the relevant derivative matrices. The precise theorem is as follows.

5.1 Theorem (Chain Rule.) Suppose U ⊂ R^n and V ⊂ R^m are open and f : U → V, $g : V \to \mathbb{R}^\ell$ are given, and suppose that f is differentiable at a point x ∈ U and that g is differentiable at the point y = f(x) ∈ V. Then g ∘ f is differentiable at x and $D(g \circ f)(x) = Dg(f(x))\, Df(x)$.

5.2 Remark: $Dg(f(x))\,Df(x)$ is the product of the ℓ × m matrix Dg(f(x)) times the m × n matrix Df(x), which makes sense and gives an ℓ × n matrix, which is at least the correct size matrix to possibly represent D(g ∘ f)(x), because the composite g ∘ f maps the set U ⊂ R^n into R^ℓ, so at least the statement of the theorem makes sense.

Proof of the Chain Rule: Let ε ∈ (0, 1) and take $\delta_0 > 0$ such that $B_{\delta_0}(x) \subset U$ and $B_{\delta_0}(f(x)) \subset V$. Observe first that by differentiability of g at y = f(x) we can choose $\delta_1 \in (0, \delta_0)$ such that

$\|z - y\| < \delta_1 \Rightarrow \|g(z) - g(y) - Dg(y)(z - y)\| \le \varepsilon \|z - y\|$

and, on the other hand, by differentiability of f at x we can choose $\delta_2 \in (0, \delta_0)$ such that

$\|h\| < \delta_2 \Rightarrow \|f(x+h) - f(x) - Df(x)h\| \le \varepsilon \|h\|$

and, in particular, for $\|h\| < \delta_2$,

$\|f(x+h) - f(x)\| = \|f(x+h) - f(x) - Df(x)h + Df(x)h\|$
$\le \|f(x+h) - f(x) - Df(x)h\| + \|Df(x)h\|$
$\le \varepsilon\|h\| + \|Df(x)\|\,\|h\|$
$\le (1 + \|Df(x)\|)\,\|h\| < \delta_1$

if $\|h\| < (1 + \|Df(x)\|)^{-1}\delta_1$. So, in particular, if $\|h\| < \min\{(1 + \|Df(x)\|)^{-1}\delta_1, \delta_2\}$, then we have the two inequalities

$\|f(x+h) - f(x) - Df(x)h\| \le \varepsilon\|h\|$
$\|g(z) - g(y) - Dg(y)(z - y)\| \le \varepsilon (1 + \|Df(x)\|)\,\|h\|$


with z = f(x+h), y = f(x). So with z = f(x+h), y = f(x) we see that $\|h\| < \min\{(1 + \|Df(x)\|)^{-1}\delta_1, \delta_2\}$ ⇒

$\|g(z) - g(y) - Dg(y)Df(x)h\|$
$= \|g(z) - g(y) - Dg(y)(z - y) + Dg(y)\bigl(f(x+h) - f(x) - Df(x)h\bigr)\|$
$\le \|g(z) - g(y) - Dg(y)(z - y)\| + \|Dg(y)\bigl(f(x+h) - f(x) - Df(x)h\bigr)\|$
$\le \|g(z) - g(y) - Dg(y)(z - y)\| + \|Dg(y)\|\,\|f(x+h) - f(x) - Df(x)h\|$
$\le \varepsilon\|z - y\| + \varepsilon\|Dg(y)\|\,\|h\|$
$\le \varepsilon\bigl(1 + \|Df(x)\| + \|Dg(y)\|\bigr)\|h\| = M\varepsilon\|h\| ,$

where $M = 1 + \|Df(x)\| + \|Dg(f(x))\|$. That is, we have proved that if we take $\delta = \min\{(1 + \|Df(x)\|)^{-1}\delta_1, \delta_2\}$, then

$\|h\| < \delta \Rightarrow \|g(f(x+h)) - g(f(x)) - Dg(f(x))Df(x)h\| \le M\varepsilon\|h\| .$

Since we can repeat the whole argument with $M^{-1}\varepsilon$ in place of ε, we have thus shown that there is a δ > 0 such that

$\|h\| < \delta \Rightarrow \|g(f(x+h)) - g(f(x)) - Dg(f(x))Df(x)h\| \le \varepsilon\|h\| .$

That is, g ∘ f is differentiable at x and $D(g \circ f)(x) = Dg(f(x))\,Df(x)$.
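A finite-difference check of the chain rule (a sketch with functions of my own choosing; the Jacobians in the comments are computed by hand):

import numpy as np

def f(p):
    x, y = p
    return np.array([x * y, x + y])

def g(q):
    u, v = q
    return np.array([u ** 2 + np.sin(v)])

x = np.array([1.0, 2.0])
Df = np.array([[x[1], x[0]],              # D(xy) = (y, x)
               [1.0, 1.0]])               # D(x + y) = (1, 1)
u, v = f(x)
Dg = np.array([[2 * u, np.cos(v)]])       # Dg(u, v) = (2u, cos v)

h = 1e-6
D_num = np.column_stack([(g(f(x + h * e)) - g(f(x))) / h for e in np.eye(2)])
print(np.allclose(D_num, Dg @ Df, atol=1e-4))   # D(g∘f)(x) = Dg(f(x)) Df(x)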

SECTION 5 EXERCISES

5.1 Use the chain rule and any other theorems from the text that you need to show the following:

(i) If f : U → V and g : V → R^p are C¹, where U ⊂ R^n, V ⊂ R^m are open, then g ∘ f is C¹ on U.

(ii) If f : R → R is C¹ and g : R³ → R is defined by $g(x) = f(2x_1 - x_2 + 3x_3)$, prove that g is C¹ and show $D_1 g(x) = -2 D_2 g(x) = (2/3) D_3 g(x)$ at each point $x = (x_1, x_2, x_3) \in \mathbb{R}^3$.

(iii) If ϕ : R → R is C¹ and $g(x) = \varphi(\|x\|)$ for x ∈ R^n, then g is C¹ on $\mathbb{R}^n \setminus \{0\}$ and $D_j g(x) = \frac{x_j}{\|x\|}\, \varphi'(\|x\|)$ for all x ≠ 0.

(iv) If γ : [0, 1] → R^n is a C¹ curve and if f : R^n → R is C¹, then f ∘ γ is C¹ on [0, 1] and $(f \circ \gamma)'(t) = \nabla f(\gamma(t)) \cdot \gamma'(t) = (D_{\gamma'(t)} f)(\gamma(t))$ for all t ∈ [0, 1], where ∇f is the gradient of f (i.e., $(Df)^T$) and $D_v f$ means the directional derivative of f by v.

6 HIGHER-ORDER PARTIAL DERIVATIVES

Recall that f : U → R^m, with U ⊂ R^n open, is said to be C¹ on U if all the partial derivatives $D_j f$ exist and are continuous on U. Likewise, f is said to be C² on U if each of the partial derivatives $D_j f$ is C¹, so in this case the second order partial derivatives $D_i D_j f$ exist and are continuous. An important general fact about such second order partials is that

6.1 $\qquad D_i D_j f = D_j D_i f$ on U, provided f is C² on U;


i.e., we claim that the order in which one computes the mixed partials is irrelevant. Of course then if k ≥ 2 and f : U → R^m is C^k (meaning all partial derivatives $D_{i_1} D_{i_2} \cdots D_{i_k} f(x)$ exist and are continuous for any choice of $i_1, \dots, i_k \in \{1, \dots, n\}$), then again the order in which the partial derivatives are taken is irrelevant:

6.2 $\qquad f \in C^k \Rightarrow D_{i_1} D_{i_2} \cdots D_{i_k} f(x) = D_{i_{j_1}} D_{i_{j_2}} \cdots D_{i_{j_k}} f(x)$

whenever $j_1, \dots, j_k$ is a re-ordering (“a permutation”) of the integers 1, …, k, which is an easy consequence of 6.1 and induction on k, because for k ≥ 3 we can write

$D_{i_1} D_{i_2} \cdots D_{i_k} f(x) = D_{i_1} D_{i_2} \cdots D_{i_{k-1}} (D_{i_k} f)(x) = D_{i_1} (D_{i_2} \cdots D_{i_{k-1}} D_{i_k} f)(x) .$

A partial derivative $D_{i_1} \cdots D_{i_k} f(x)$ as in 6.2 above is usually referred to as a partial derivative of f of order k.

We will actually prove a result which is slightly more general than 6.1, as follows.

6.3 Theorem. Suppose f : U → R^m, where U is an open subset of R^n, suppose ρ > 0, a ∈ U, $B_\rho(a) \subset U$, i, j ∈ {1, …, n}, and suppose the second-order partial derivatives $D_i D_j f(x)$ and $D_j D_i f(x)$ exist at each point x of $B_\rho(a)$ and are continuous at the point x = a. Then $D_i D_j f(a) = D_j D_i f(a)$.

Proof: Evidently, there is nothing to prove when i = j, so we can assume i ≠ j. Also, we can assume m = 1, because otherwise we can first apply the case m = 1 of the theorem to each component $f_j$ of $f = (f_1, \dots, f_m)^T$, and this then evidently implies the theorem as stated. So we assume that m = 1 (i.e., that f is real-valued) and i ≠ j. For h, k ∈ (−ρ/2, ρ/2) we observe that the points $a + h e_i$, $a + k e_j$, and $a + h e_i + k e_j$ are all in $B_\rho(a)$, and so it makes sense to consider the “second difference”

$f(a + h e_i + k e_j) - f(a + h e_i) - f(a + k e_j) + f(a) .$

The expression here is called a second difference because it can be written

$\varphi_{j,k}(a + h e_i) - \varphi_{j,k}(a)$ with $\varphi_{j,k}(x) = f(x + k e_j) - f(x)$

(i.e., it can be written as the “first difference of the first difference $\varphi_{j,k}$,” hence the name “second difference”), and similarly it can also be written

$\psi_{i,h}(a + k e_j) - \psi_{i,h}(a)$ with $\psi_{i,h}(x) = f(x + h e_i) - f(x) ,$

so in fact,

(1) $\qquad \varphi_{j,k}(a + h e_i) - \varphi_{j,k}(a) = \psi_{i,h}(a + k e_j) - \psi_{i,h}(a) .$

Now only the i-th variable is being varied in the difference on the left of (1), so we can use the Mean Value Theorem from 1-variable calculus in order to conclude that

$\varphi_{j,k}(a + h e_i) - \varphi_{j,k}(a) = h D_i \varphi_{j,k}(a + \theta h e_i)$ for some θ ∈ (0, 1),


and of course by taking the partial derivative with respect to $x_i$ on each side of the identity $\varphi_{j,k}(x) = f(x + k e_j) - f(x)$, we see that $D_i \varphi_{j,k}(x) = D_i f(x + k e_j) - D_i f(x)$. Hence,

$\varphi_{j,k}(a + h e_i) - \varphi_{j,k}(a) = h D_i f(a + \theta h e_i + k e_j) - h D_i f(a + \theta h e_i)$

and on the right here is a difference in which only the j-th variable is being varied, hence we can again use the Mean Value Theorem from 1-variable calculus to see that the expression on the right can be written $hk\, D_j D_i f(a + \theta h e_i + \eta k e_j)$ for some η ∈ (0, 1). Thus, finally we have shown

(2) $\qquad \varphi_{j,k}(a + h e_i) - \varphi_{j,k}(a) = hk\, D_j D_i f(a + \theta h e_i + \eta k e_j)$ for |h|, |k| < ρ/2.

By a similar argument (starting with the right side of (1) instead of the left) we have

(3) $\qquad \psi_{i,h}(a + k e_j) - \psi_{i,h}(a) = kh\, D_i D_j f(a + \tilde\theta h e_i + \tilde\eta k e_j)$ for |h|, |k| < ρ/2,

and for suitable $\tilde\theta, \tilde\eta \in (0, 1)$. Thus, by (1), (2), (3) we see that

(4) $\qquad D_j D_i f(a + \theta h e_i + \eta k e_j) = D_i D_j f(a + \tilde\theta h e_i + \tilde\eta k e_j)$ for |h|, |k| < ρ/2.

Taking limits as $\sqrt{h^2 + k^2} \to 0$ and using the fact that $D_i D_j f(x)$ and $D_j D_i f(x)$ are both continuous at x = a, we have $D_i D_j f(a) = D_j D_i f(a)$ as claimed.

SECTION 6 EXERCISES

6.1 Suppose f, g : R → R are C² functions, and let F : R² → R be defined by $F(x, y) = f(x + y) + g(x - y)$. Prove that F is C² on R² and satisfies the wave equation on R² (i.e., the equation $\frac{\partial^2 F}{\partial x^2} - \frac{\partial^2 F}{\partial y^2} = 0$).

6.2 Prove that the degree n polynomials u, v obtained by taking the real and imaginary parts of $(x + iy)^n$ are harmonic (i.e., $\frac{\partial^2 p}{\partial x^2} + \frac{\partial^2 p}{\partial y^2} \equiv 0$ on R² in both cases p = u, p = v).

Hint: Thus, $(x + iy)^n = u(x, y) + i\,v(x, y)$ where u, v are real-valued polynomials in the variables x, y. Start by showing that $\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}$ and $\frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}$.

7 THE SECOND DERIVATIVE TEST FOR EXTREMA OF A MULTIVARIABLE FUNCTION

Here we want to discuss conditions on a C² function f : U → R, with U open in R^n, which are sufficient to guarantee that f has a local maximum or local minimum at a point a ∈ U. First we give the precise definition of local maximum and local minimum.

7.1 Definition: Suppose f : U → R with U open and a ∈ U. f is said to have a local maximum at a if there is ρ > 0 with $B_\rho(a) \subset U$ and

$f(x) \le f(a) \ \ \forall x \in B_\rho(a) ,$

and the local maximum is said to be strict if there is ρ > 0 such that $B_\rho(a) \subset U$ and

$f(x) < f(a) \ \ \forall x \in B_\rho(a) \setminus \{a\} .$

Likewise, f is said to have a local minimum at a if there is ρ > 0 with $B_\rho(a) \subset U$ and $f(x) \ge f(a)$ for all $x \in B_\rho(a)$, and the local minimum is said to be strict if there is ρ > 0 such that we have the strict inequality $f(x) > f(a)$ for all $x \in B_\rho(a) \setminus \{a\}$.

Of course if f : U → R is differentiable with U ⊂ R^n open, then ∇f(a) = 0 at any point a ∈ U where f has a local maximum or minimum (because if f has a local maximum or minimum at a, then $f(a + t e_j)$ clearly has a local maximum or minimum at t = 0 and hence by 1-variable calculus has zero derivative with respect to t at t = 0, and the derivative at t = 0 is by definition the j-th partial derivative $D_j f(a)$).

Recall the second derivative test from single variable calculus: if f : (α, β) → R is C² (i.e., the first and second derivatives of f exist and are continuous), then f has a strict local maximum at a point a ∈ (α, β) if f′(a) = 0 and f″(a) < 0, and a strict local minimum at a point a ∈ (α, β) if f′(a) = 0 and f″(a) > 0.

Here we want to show that a similar statement holds for functions f : U → R, with U open in R^n. In order to state the relevant theorem, we first need to give a brief discussion of the notion of quadratic form.

A quadratic form on R^n is an expression of the form

7.2 $\qquad Q(\xi) = \sum_{i,j=1}^n q_{ij}\, \xi_i \xi_j , \quad \xi = (\xi_1, \dots, \xi_n) \in \mathbb{R}^n ,$

where $(q_{ij})$ is a given n × n real symmetric matrix. The quadratic form is said to be positive definite if Q(ξ) > 0 for all $\xi \in \mathbb{R}^n \setminus \{0\}$ and negative definite if Q(ξ) < 0 for all $\xi \in \mathbb{R}^n \setminus \{0\}$.

Of course since a quadratic form is just a homogeneous degree 2 polynomial in the variables $\xi_1, \dots, \xi_n$, it is evidently continuous and hence by Thm. 2.6 of the present chapter attains its maximum and minimum value on any closed bounded set in R^n. In particular, the unit sphere $S^{n-1}$ in R^n,

7.3 $\qquad S^{n-1} = \{x \in \mathbb{R}^n : \|x\| = 1\} ,$

is a closed bounded set, hence the restriction of the quadratic form to $S^{n-1}$, $Q|S^{n-1}$, does attain its maximum and minimum value. Let

7.4 $\qquad m = \min Q|S^{n-1}, \quad M = \max Q|S^{n-1}$

and observe that for $\xi \in \mathbb{R}^n \setminus \{0\}$ we can write $\xi = \|\xi\|\, \hat\xi$, where $\hat\xi = \|\xi\|^{-1} \xi \in S^{n-1}$, and so $Q(\xi) = \|\xi\|^2 Q(\hat\xi)$, whence

7.5 $\qquad m\|\xi\|^2 \le Q(\xi) \le M\|\xi\|^2 \ \ \forall \xi \in \mathbb{R}^n ,$


with m, M as in 7.4.

Notice, in particular, if Q is positive definite then the minimum m of $Q|S^{n-1}$ is positive, and so 7.5 guarantees

7.6 $\qquad Q$ positive definite ⇒ there is m > 0 such that $Q(\xi) \ge m\|\xi\|^2$ for all $\xi \in \mathbb{R}^n$,

and similarly,

7.7 $\qquad Q$ negative definite ⇒ there is m > 0 such that $Q(\xi) \le -m\|\xi\|^2$ for all $\xi \in \mathbb{R}^n$.
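As a numerical aside of mine (it relies on the spectral theorem for symmetric matrices, which the text treats later): the numbers m and M of 7.4 turn out to be the smallest and largest eigenvalues of $(q_{ij})$, so definiteness can be tested numerically. A hedged sketch:

import numpy as np

q = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
evals = np.linalg.eigvalsh(q)            # eigenvalues of the symmetric matrix
print(evals.min() > 0)                   # True: Q is positive definite
rng = np.random.default_rng(4)
xi = rng.standard_normal(3)
Q = xi @ q @ xi
print(evals.min() * (xi @ xi) <= Q <= evals.max() * (xi @ xi))  # 7.5 holds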

We are now ready to state the second derivative test for C² functions f : U → R, where U is open in R^n. In this theorem, $\mathrm{Hess}_{f,a}(\xi)$ will denote the Hessian quadratic form of f at a, defined by

7.8 $\qquad \mathrm{Hess}_{f,a}(\xi) = \sum_{i,j=1}^n D_i D_j f(a)\, \xi_i \xi_j , \quad \xi = (\xi_1, \dots, \xi_n)^T \in \mathbb{R}^n .$

7.9 Theorem (Second derivative test for functions of n variables.) Suppose U is open in R^n, f : U → R is C², and a ∈ U. Then

$f(x) = f(a) + (x - a) \cdot \nabla f(a) + \tfrac{1}{2} \mathrm{Hess}_{f,a}(x - a) + E(x) ,$

where $\lim_{x\to a} \|x - a\|^{-2} E(x) = 0$.

In particular, f has a strict local maximum at a if ∇f(a) = 0 and $\mathrm{Hess}_{f,a}(\xi)$ is negative definite, and a strict local minimum at a if ∇f(a) = 0 and $\mathrm{Hess}_{f,a}(\xi)$ is positive definite.

7.10 Terminology: If f : U → R is C¹ and U ⊂ R^n is open, points a ∈ U with ∇f(a) = 0 are usually referred to as critical points of f. Using this terminology, Thm. 7.9 says that if f is C², then f has a strict local minimum at any critical point a ∈ U where $\mathrm{Hess}_{f,a}(\xi)$ is positive definite and a strict local maximum at any critical point a ∈ U where $\mathrm{Hess}_{f,a}(\xi)$ is negative definite.

Proof of 7.9: We begin with an identity from single variable calculus: If h : [0, 1] → R is C², then we have the identity

(1) $\qquad \int_0^1 (1 - t)\, h''(t)\, dt = h(1) - h(0) - h'(0) ,$

which is proved using integration by parts. (See Exercise 7.3 below.) We apply (1) in the special case when $h(t) = f(a + t(x - a))$, with $x \in B_\rho(a)$. By the Chain Rule, h(t) is then indeed C² on [0, 1] and $h'(t) = (x - a) \cdot \nabla f(a + t(x - a))$, $h''(t) = \sum_{i,j=1}^n (x_i - a_i)(x_j - a_j) D_i D_j f(a + t(x - a))$, so (1) gives

$f(x) = f(a) + (x - a) \cdot \nabla f(a) + \sum_{i,j=1}^n (x_i - a_i)(x_j - a_j) \int_0^1 (1 - t)\, D_i D_j f(a + t(x - a))\, dt .$

Since $\int_0^1 (1 - t)\, D_i D_j f(a + t(x - a))\, dt = \int_0^1 (1 - t)\bigl(D_i D_j f(a + t(x - a)) - D_i D_j f(a)\bigr)\, dt + \int_0^1 (1 - t)\, dt \; D_i D_j f(a)$ and $\int_0^1 (1 - t)\, dt = \tfrac{1}{2}$, this can be rewritten

(2) $\qquad f(x) = f(a) + (x - a) \cdot \nabla f(a) + \tfrac{1}{2} \mathrm{Hess}_{f,a}(x - a) + E(x) ,$


where $E(x) = \sum_{i,j=1}^n q_{ij}(x)(x_i - a_i)(x_j - a_j)$ with

$q_{ij}(x) = \int_0^1 (1 - t)\bigl(D_i D_j f(a + t(x - a)) - D_i D_j f(a)\bigr)\, dt .$

Now observe that

(3) $\qquad |q_{ij}(x)| = \Bigl| \int_0^1 (1 - t)\bigl(D_i D_j f(a + t(x - a)) - D_i D_j f(a)\bigr)\, dt \Bigr| \le \int_0^1 (1 - t)\, |D_i D_j f(a + t(x - a)) - D_i D_j f(a)|\, dt ,$

and let ε > 0 and i, j ∈ {1, …, n}. Since $D_i D_j f(x)$ is continuous at x = a, we then have a $\delta_{ij} > 0$ such that $x \in B_{\delta_{ij}}(a) \Rightarrow |D_i D_j f(x) - D_i D_j f(a)| < \varepsilon/n^2$. Let $\delta = \min\{\delta_{ij} : i, j \in \{1, \dots, n\}\}$

and observe that $x \in B_\delta(a) \Rightarrow a + t(x - a) \in B_\delta(a)$ for all t ∈ [0, 1], so then we have

$x \in B_\delta(a) \Rightarrow |D_i D_j f(a + t(x - a)) - D_i D_j f(a)| < \varepsilon/n^2$ for each i, j = 1, …, n

and each t ∈ [0, 1]. Hence, using (3),

$x \in B_\delta(a) \Rightarrow |q_{ij}(x)| \le \int_0^1 (1 - t)\, dt \; \varepsilon/n^2 = \tfrac{1}{2}\varepsilon/n^2 < \varepsilon/n^2 ,$

and so

(4) $\qquad x \in B_\delta(a) \Rightarrow |E(x)| \le \sum_{i,j=1}^n |x_i - a_i|\,|x_j - a_j|\,|q_{ij}(x)| \le \|x - a\|^2\, n^2\, \varepsilon/n^2 = \varepsilon\|x - a\|^2 .$

Notice that this is the ε, δ definition of $\lim_{x\to a} \|x - a\|^{-2} E(x) = 0$, so the first part of the theorem is proved.

Now suppose that $\mathrm{Hess}_{f,a}(\xi)$ is positive definite. Then by 7.6 there is m > 0 such that the term $\tfrac{1}{2}\mathrm{Hess}_{f,a}(x - a)$ in (2) satisfies

(5) $\qquad \tfrac{1}{2}\mathrm{Hess}_{f,a}(x - a) \ge m\|x - a\|^2$ for all $x \in \mathbb{R}^n ,$

and, on the other hand, by using (4) with ε = m/2 we have δ > 0 such that

(6) $\qquad x \in B_\delta(a) \Rightarrow |E(x)| \le m\|x - a\|^2/2 \Rightarrow E(x) \ge -m\|x - a\|^2/2 ,$

and so using (5) and (6) in (2) we see that if ∇f(a) = 0, then

$x \in B_\delta(a) \Rightarrow f(x) \ge f(a) + m\|x - a\|^2/2 ,$

so f has a strict local minimum at a.

Applying the same argument with −f in place of f, we deduce that f has a strict local maximum at a if ∇f(a) = 0 and if $\mathrm{Hess}_{f,a}(\xi)$ is negative definite.
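A worked example of the test (the function is my own choice; the Hessian entries and the critical-point equation are computed by hand in the comments):

import numpy as np

# f(x, y) = x^2 + 3y^2 - 2xy + x;  grad f = (2x - 2y + 1, -2x + 6y)
H = np.array([[2.0, -2.0],               # D1D1 f, D1D2 f (constant here)
              [-2.0, 6.0]])              # D2D1 f, D2D2 f
crit = np.linalg.solve(H, np.array([-1.0, 0.0]))   # grad f = 0 <=> H x = (-1, 0)
print(crit)                              # the unique critical point
evals = np.linalg.eigvalsh(H)
if evals.min() > 0:
    print("Hessian positive definite: strict local minimum")   # this case runs
elif evals.max() < 0:
    print("Hessian negative definite: strict local maximum")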

Page 58: Math51H

7/28/2019 Math51H

http://slidepdf.com/reader/full/math51h 58/142

48 CHAPTER 2. ANALYSIS IN R n

SECTION 7 EXERCISES7.1 (a) We proved above that if  U  ⊂ Rn is open, if  f  : U  → R is C2, and if  x0 ∈ U  is a criticalpoint (i.e., ∇ f (x0) = 0),then f  has a local minimum/maximum at x0 if the Hessian quadratic formQ(ξ) ≡ n

i,j =1 Di Dj f (x0)ξ i ξ j  is positive definite/negative definite, respectively. Prove that f  hasneither a local maximum nor a local minimum at a critical point x0 of f  if Q(ξ) is indefinite (i.e., if there are unit vectors ξ, η such that Q(ξ) > 0 and Q(η) < 0.

(b) Find all local max/local min for the function f  : R2 → R given by  f( x , y ) = 13

(x3 + y3) −x2 − 2y2 − 3x + 3y, and justify your answer.

7.2 Suppose that $f(x,y) = \frac13(x^3+y^3) - x^2 - 2y^2 - 3x + 3y$. Find the critical points (i.e., the points where $\nabla f = 0$) of $f$, and discuss whether $f$ has a local maximum/minimum at these points. (Justify any claims that you make by proof or by referring to the relevant theorem above.)

7.3 Prove the identity $\int_0^1(1-t)\,h''(t)\,dt = h(1) - h(0) - h'(0)$ used in the proof of Thm. 7.9, assuming that $h$ is a $C^2$ function on $[0,1]$.

Hint: Write $\int_0^1(1-t)\,h''(t)\,dt = \int_0^1(1-t)\,\frac{d}{dt}h'(t)\,dt$, and use integration by parts.
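As an illustrative aside (not part of the text's formal development), the following Python sketch can be used to check answers to Exercises 7.1(b) and 7.2: the gradient of the given $f$ vanishes exactly at the four listed points, and the sign of the Hessian's eigenvalues classifies each point as in 7.9. NumPy and the hard-coded critical points are assumptions made here for illustration.

```python
# A sketch (not from the text): classify the critical points of
# f(x, y) = (x^3 + y^3)/3 - x^2 - 2y^2 - 3x + 3y via the Hessian, as in 7.9.
import numpy as np

def hessian(x, y):
    # D1D1 f = 2x - 2, D2D2 f = 2y - 4, D1D2 f = D2D1 f = 0
    return np.array([[2*x - 2, 0.0], [0.0, 2*y - 4]])

# grad f = (x^2 - 2x - 3, y^2 - 4y + 3) vanishes exactly at these points:
critical_points = [(3, 1), (3, 3), (-1, 1), (-1, 3)]

for (x, y) in critical_points:
    eig = np.linalg.eigvalsh(hessian(x, y))
    if np.all(eig > 0):
        kind = "strict local minimum (Hessian positive definite)"
    elif np.all(eig < 0):
        kind = "strict local maximum (Hessian negative definite)"
    else:
        kind = "neither (Hessian indefinite; cf. Exercise 7.1(a))"
    print(f"({x:2d},{y:2d}): eigenvalues {eig}, {kind}")
```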

8 CURVES IN R^n

By a $C^0$ curve in $\mathbb{R}^n$ we mean a continuous map $\gamma:[a,b]\to\mathbb{R}^n$, where $a,b\in\mathbb{R}$ with $a<b$. Of course one might think geometrically of the curve as the image $\gamma([a,b])$, but formally the curve is in fact the map $\gamma:[a,b]\to\mathbb{R}^n$.

By a $C^1$ curve in $\mathbb{R}^n$ we shall mean a $C^1$ map $\gamma:[a,b]\to\mathbb{R}^n$. We should clarify this, because to date we have only considered functions which are $C^1$ on open sets, whereas here we take the domain to be the closed interval $[a,b]$. In fact, by saying that $\gamma:[a,b]\to\mathbb{R}^n$ is $C^1$ we mean that: (i) the derivative $\gamma'(t) = \lim_{h\to0}h^{-1}(\gamma(t+h)-\gamma(t))$ exists for each $t\in(a,b)$; (ii) the one-sided derivatives $\gamma'(a) = \lim_{h\downarrow0}h^{-1}(\gamma(a+h)-\gamma(a))$ and $\gamma'(b) = \lim_{h\uparrow0}h^{-1}(\gamma(b+h)-\gamma(b))$ exist; and (iii) $\gamma'(t)$, so defined, is a continuous function of $t$ on the closed interval $[a,b]$.

$\gamma'(t)$ is sometimes referred to as the velocity vector of $\gamma$, because if $t, t+h\in[a,b]$, the quantity $\gamma(t+h)-\gamma(t)$ is the change of position of $\gamma$ as the parameter changes from $t$ to $t+h$, so, as $h\to0$, $h^{-1}(\gamma(t+h)-\gamma(t))$ can be thought of as the (vector) rate of change of the point $\gamma(t)$. If $\gamma'(t)\neq0$ we define

8.1 $\tau(t) = \|\gamma'(t)\|^{-1}\gamma'(t),$

and we refer to $\tau(t)$ as the unit tangent vector of the curve $\gamma$. From the schematic diagram (depicting the case $n=3$) it is intuitively evident that indeed $\gamma'(t)$ is tangent to the curve at $\gamma(t)$ if it is nonzero, and hence that the terminology "unit tangent vector" is appropriate.

We can also discuss the length $\ell(\gamma)$ of a $C^0$ curve $\gamma$ as follows.


Figure 2.2: Schematic picture of the definition of the velocity vector, $n = 3$ (showing $\sigma(a)$, $\sigma(b)$, $\sigma(t)$, $\sigma(t+h)$, and the chord $\sigma(t+h)-\sigma(t)$).

First we introduce the notion of partition of the interval $[a,b]$, meaning a finite collection $P = \{t_0,\dots,t_N\}$ of points of $[a,b]$ with $a = t_0 < t_1 < \cdots < t_N = b$. For such a partition $P$ we define

$$\ell(\gamma,P) = \sum_{j=1}^N \|\gamma(t_j) - \gamma(t_{j-1})\|.$$

Geometrically, $\ell(\gamma,P)$ is the length of the "polygonal approximation" to $\gamma$ obtained by taking the union of the straight line segments joining the points $\gamma(t_{j-1})$ and $\gamma(t_j)$, as in the following schematic diagram, which depicts the case $n=3$.

We say the curve has finite length if the set

8.2 $\{\ell(\gamma,P) : P \text{ is a partition of } [a,b]\}$

is bounded above. In case $\gamma$ has finite length we can define the length $\ell(\gamma)$ to be the least upper bound of the set in 8.2:

8.3 $\ell(\gamma) = \sup\{\ell(\gamma,P) : P \text{ is a partition of } [a,b]\}.$

It is a straightforward exercise (Exercise 8.1 below) to check the additivity property that if $c\in(a,b)$ then

8.4 $\ell(\gamma) = \ell(\gamma|[a,c]) + \ell(\gamma|[c,b]).$


Figure 2.3: Schematic picture of polygonal approximation of γ , n = 3.

 The explicit formula for length of a C1 curve given in the following theorem is important.

8.5 Theorem. If $\gamma:[a,b]\to\mathbb{R}^n$ is $C^1$, then $\gamma$ has finite length and in fact

$$\ell(\gamma) = \int_a^b \|\gamma'(t)\|\,dt.$$

Proof: First observe by the fundamental theorem of calculus

$$\gamma(t_j) - \gamma(t_{j-1}) = \int_{t_{j-1}}^{t_j} \gamma'(t)\,dt.$$

Using the fact that $\bigl\|\int_{t_{j-1}}^{t_j}\gamma'(t)\,dt\bigr\| \le \int_{t_{j-1}}^{t_j}\|\gamma'(t)\|\,dt$ (see Exercise 8.3 below), we then have

$$\|\gamma(t_j) - \gamma(t_{j-1})\| \le \int_{t_{j-1}}^{t_j}\|\gamma'(t)\|\,dt$$

for each $j=1,\dots,N$, and hence by summing over $j$ we have

$$\sum_{j=1}^N \|\gamma(t_j)-\gamma(t_{j-1})\| \le \int_a^b \|\gamma'(t)\|\,dt,$$

so the collection of all sums

$$\Bigl\{\sum_{j=1}^N \|\gamma(t_j)-\gamma(t_{j-1})\| : t_0 = a < t_1 < \cdots < t_N = b \text{ is a partition of } [a,b]\Bigr\}$$


has the integral $\int_a^b\|\gamma'(t)\|\,dt$ as an upper bound, so indeed $\gamma$ has finite length as claimed and

(1) $\ell(\gamma) \le \int_a^b \|\gamma'(t)\|\,dt.$

For $t\in[a,b]$ we define $S(t) = \ell(\gamma|[a,t])$, so that $S(a)=0$ and $S(b) = \ell(\gamma)$. Take $\tau\in[a,b)$ and $h>0$ such that $\tau+h\in[a,b]$, and observe that, by 8.4 with $a,\tau,\tau+h$ in place of $a,c,b$, respectively, we have

$$S(\tau+h) = S(\tau) + \ell(\gamma|[\tau,\tau+h])$$

so

(2) $\ell(\gamma|[\tau,\tau+h]) = S(\tau+h) - S(\tau),$

but, on the other hand, by (1) and the definition of length we have

$$\|\gamma(\tau+h)-\gamma(\tau)\| \le \ell(\gamma|[\tau,\tau+h]) \le \int_\tau^{\tau+h}\|\gamma'(t)\|\,dt,$$

i.e., by (2),

$$\|\gamma(\tau+h)-\gamma(\tau)\| \le S(\tau+h)-S(\tau) \le \int_\tau^{\tau+h}\|\gamma'(t)\|\,dt = \int_a^{\tau+h}\|\gamma'(t)\|\,dt - \int_a^\tau\|\gamma'(t)\|\,dt,$$

so

$$h^{-1}\|\gamma(\tau+h)-\gamma(\tau)\| \le h^{-1}\bigl(S(\tau+h)-S(\tau)\bigr) \le h^{-1}\Bigl(\int_a^{\tau+h}\|\gamma'(t)\|\,dt - \int_a^\tau\|\gamma'(t)\|\,dt\Bigr).$$

Notice the right side has limit as $h\downarrow0$ equal to $\frac{d}{d\tau}\int_a^\tau\|\gamma'(t)\|\,dt$, which by the fundamental theorem of calculus is $\|\gamma'(\tau)\|$. So both the left and the right side have limit as $h\downarrow0$ equal to $\|\gamma'(\tau)\|$, and hence by the Sandwich Theorem (Exercise 2.1 of the present chapter) we deduce that for $\tau\in[a,b)$

$$\lim_{h\downarrow0} h^{-1}\bigl(S(\tau+h)-S(\tau)\bigr) = \|\gamma'(\tau)\|.$$

By a similar argument with $h<0$ we also prove for $\tau\in(a,b]$

$$\lim_{h\uparrow0} h^{-1}\bigl(S(\tau+h)-S(\tau)\bigr) = \|\gamma'(\tau)\|.$$

Thus, we have shown $S'(t)$ exists and equals $\|\gamma'(t)\|$ for every $t\in[a,b]$ (of course we proved only the existence of the relevant one-sided derivatives at the endpoints $a,b$), so by the fundamental theorem of calculus and the fact that $S(a)=0$ we have

$$S(b) = \int_a^b \|\gamma'(t)\|\,dt.$$


Since, by definition, $S(b) = \ell(\gamma)$, this completes the proof. □
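The conclusion of Theorem 8.5 is easy to see numerically. The following short sketch (an illustration added here, not part of the text; NumPy and the sample parameters $a=b=\omega=1$ for the helix of Exercise 8.4 are assumptions) shows the polygonal sums $\ell(\gamma,P)$ increasing toward $\int_a^b\|\gamma'(t)\|\,dt$ as the partition is refined.

```python
# A numerical check of Theorem 8.5 (a sketch, not from the text): for the
# helix with a = b = omega = 1, polygonal lengths approach the speed integral.
import numpy as np

def gamma(t, a=1.0, b=1.0, w=1.0):
    return np.stack([a*np.cos(w*t), a*np.sin(w*t), b*w*t], axis=-1)

T = 2*np.pi                                   # parameter interval [0, 2*pi]
for N in (4, 16, 64, 256):
    t = np.linspace(0.0, T, N + 1)            # uniform partition with N segments
    pts = gamma(t)
    poly = np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1))
    print(f"N = {N:4d}: polygonal length = {poly:.6f}")

# |gamma'(t)| = sqrt(a^2 w^2 + b^2 w^2) = sqrt(2) here, so l(gamma) = 2*pi*sqrt(2).
print(f"integral of speed   = {T*np.sqrt(2):.6f}")
```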

8.6 Remark: If $\gamma:[a,b]\to\mathbb{R}^n$ is $C^1$, then, as shown in the above proof, the function $S(t) = \ell(\gamma|[a,t])$ is a $C^1$ function on $[a,b]$, and $S'(t) = \|\gamma'(t)\|$ for each $t\in[a,b]$. Thus, if $\gamma'(t)\neq0$ then $S$ has positive derivative, hence is a strictly increasing function mapping $[a,b]$ onto $[0,\ell(\gamma)]$, and by 1-variable calculus the inverse function $T:[0,\ell(\gamma)]\to[a,b]$ is $C^1$ with derivative $T'(s) = (S'(t))^{-1}$, assuming the variables $s,t$ are related by $s=S(t)$ (or, equivalently, $t=T(s)$). We can then define $\tilde\gamma = \gamma\circ T$, so $\tilde\gamma(s) = \gamma(t)$, assuming again that $s,t$ are related by $s=S(t)$ or, equivalently, $t=T(s)$. The map $\tilde\gamma$ is called the arc-length parametrization of $\gamma$ (because $s = S(t) = \ell(\gamma|[a,t])$). Notice that by the 1-variable calculus chain rule we have $\tilde\gamma'(s) = \|\gamma'(t)\|^{-1}\gamma'(t)$ and, in particular, $\|\tilde\gamma'(s)\| \equiv 1$. The reader should keep in mind that all of this depends on the assumption that $\gamma$ is $C^1$ with $\gamma'(t)\neq0$.

SECTION 8 EXERCISES

8.1 Let $\gamma:[a,b]\to\mathbb{R}^n$ be an arbitrary continuous curve of finite length. Prove directly from the definition of length (as the least upper bound of lengths of polygonal approximations, as in 8.3) that if $c\in(a,b)$ then

$$\ell(\gamma|[a,c]) + \ell(\gamma|[c,b]) = \ell(\gamma).$$

Hint: Start by showing that if $P,Q$ are partitions for $[a,c]$ and $[c,b]$, respectively, then $\ell(\gamma|[a,c],P) + \ell(\gamma|[c,b],Q) \le \ell(\gamma)$.

8.2 Let $\gamma:[0,1]\to\mathbb{R}^2$ be the $C^0$ curve defined by $\gamma(t) = \bigl(t,\,t\sin\frac1t\bigr)$ if $t\in(0,1]$ and $\gamma(0) = (0,0)$. Prove that $\gamma$ has infinite length.

Hint: One approach is to directly check that for each real $K>0$ there is a partition $P: t_0 = 0 < t_1 < t_2 < \cdots < t_N = 1$ such that $\sum_{j=1}^N\|\gamma(t_j)-\gamma(t_{j-1})\| > K$.

8.3 Let $f = (f_1,\dots,f_n):[a,b]\to\mathbb{R}^n$ be continuous and as usual take $\int_a^b f(t)\,dt$ to mean $\bigl(\int_a^b f_1(t)\,dt,\dots,\int_a^b f_n(t)\,dt\bigr)$ $(\in\mathbb{R}^n)$. Prove $\bigl\|\int_a^b f(t)\,dt\bigr\| \le \int_a^b\|f(t)\|\,dt$. Hint: First observe $v\cdot\int_a^b f(t)\,dt = \int_a^b v\cdot f(t)\,dt$ for any fixed vector $v\in\mathbb{R}^n$.

8.4 Consider the "helix" in $\mathbb{R}^3$ given by the curve $\gamma(t) = (a\cos\omega t,\,a\sin\omega t,\,b\omega t)$ for $t\in\mathbb{R}$, where $a,b,\omega$ are given positive constants.

(a) Show that the speed (i.e., $\|\gamma'(t)\|$) is constant for $t\in\mathbb{R}$.

(b) Show that the velocity vector $\gamma'(t)$ makes a constant nonzero angle with the $z$-axis.

(c) With $t_1=0$ and $t_2 = 2\pi/\omega$, show that there is no $\tau\in(t_1,t_2)$ such that $\gamma(t_1)-\gamma(t_2) = (t_1-t_2)\gamma'(\tau)$. (Thus the Mean Value Theorem fails for vector-valued functions.)


8.5 For the helix $\gamma(t) = (a\cos\omega t,\,a\sin\omega t,\,b\omega t)$, where $t\in[0,2\pi/\omega]$ and $a,b,\omega>0$, prove that the arc-length variable $s = S(t)\,(= \ell(\gamma|[0,t]))$ (as in Rem. 8.6 above) is a constant multiple of $t$. Also, find the length, unit tangent vector, and curvature vector $\tilde\gamma''(s)$ for $\gamma$.

8.6 Suppose $\gamma:[a,b]\to\mathbb{R}^n$ is a $C^2$ curve with nonzero velocity vector $\gamma'$. If $\tilde\gamma(s) = \gamma(t)$, where $s$ is the arc-length parameter (i.e., $s = S(t) = \ell(\gamma|[a,t])$ as in Rem. 8.6 above), prove that the curvature vector $\kappa = \frac{d}{ds}\tilde\gamma'(s)$ at $s$ is given in terms of the $t$-variable by the formula

$$\kappa(s) = \|\gamma'(t)\|^{-2}\bigl(I - \|\gamma'(t)\|^{-2}\gamma'(t)\gamma'(t)^{\mathrm T}\bigr)\gamma''(t).$$

Here $I$ is the $n\times n$ identity matrix and we assume $\gamma'(t)$ is written as a column; $\gamma'(t)\gamma'(t)^{\mathrm T}$ is matrix multiplication of the $n\times1$ matrix $\gamma'(t)$ by the $1\times n$ matrix $\gamma'(t)^{\mathrm T}$.

9 SUBMANIFOLDS OF R^n AND TANGENTIAL GRADIENTS

In this section it will be convenient to write vectors in $\mathbb{R}^n$ as rows, $x = (x_1,\dots,x_n)$, and the reader should keep in mind that then the gradient $\nabla f(x)$ of a $C^1$ function $f: U\to\mathbb{R}$ will also be a row vector $\nabla f(x) = (D_1f(x),\dots,D_nf(x))$.

 We first introduce the notion of k-dimensional graph in Rn, with k ∈ {1, . . . , n − 1}.

9.1 Definition: If $g = (g_1,\dots,g_{n-k}): U\to\mathbb{R}^{n-k}$, where $U$ is open in $\mathbb{R}^k$, the graph map corresponding to $g$ is the map $G: U\to\mathbb{R}^n$ defined by

$$G(x) = (x, g(x)),\quad x\in U,$$

and the graph of $g$ is defined to be $G(U)$, i.e., $\{G(x): x\in U\}$. Notice that $G$ is $C^1$ if $g$ is $C^1$, and in this case we can view the graph of $g$ (i.e., $G(U)$) as a $k$-dimensional $C^1$ surface sitting in $\mathbb{R}^n$, as represented in the following schematic diagram, which is actually representative of the special case $k=2$, $n=3$.

9.2 Definition: A subset $M\subset\mathbb{R}^n$ is a $k$-dimensional $C^1$ submanifold of $\mathbb{R}^n$ if for each $a\in M$ there is a $\delta>0$ such that $M\cap B_\delta(a)$ is $P_{j_1,\dots,j_n}(G(U))$, where $G(x)$ (as in 9.1) is the graph map $(x,g(x))$ of a $C^1$ function $g: U\to\mathbb{R}^{n-k}$ with $U$ open in $\mathbb{R}^k$, and $P_{j_1,\dots,j_n}$ is the permutation map $P_{j_1,\dots,j_n}(x_1,\dots,x_n) = (x_{j_1},\dots,x_{j_n})$, with $j_1,\dots,j_n$ a permutation (i.e., a re-ordering) of $1,\dots,n$.

This definition sounds at first a little abstract, but it is really just saying that if $a\in M$ then, in a suitably small ball $B_\delta(a)$, we can re-label the coordinates (using $x_{j_i}$ as the new $i$-th coordinate for $i=1,\dots,n$) in such a way that $M\cap B_\delta(a)$ becomes the graph of a $C^1$ function as in 9.1. Figure 2.5 gives a schematic picture of what is going on in the case $k=1$, $n=2$.


Figure 2.4: Schematic picture of a C1 graph in case k = 2, n = 3.

Next we define the notion of the tangent space $T_aM$ of the $C^1$ manifold $M$ at $a\in M$, as follows.

9.3 Definition: If $M$ is a $k$-dimensional $C^1$ submanifold of $\mathbb{R}^n$ and $a\in M$, then the tangent space of $M$ at $a$, $T_aM$, is defined by

$$T_aM = \{\gamma'(0): \gamma \text{ is a } C^1 \text{ map } [0,\alpha)\to\mathbb{R}^n \text{ with } \alpha>0,\ \gamma([0,\alpha))\subset M,\ \gamma(0)=a\}.$$

Thus, $T_aM$ is in fact the set of all initial velocity vectors of $C^1$ curves $\gamma(t)$ which are contained in $M$ and which start from the given point $a$ at $t=0$. We claim that $T_aM$ is a $k$-dimensional subspace of $\mathbb{R}^n$.

9.4 Lemma. If $M$ is a $k$-dimensional $C^1$ submanifold of $\mathbb{R}^n$ and if $a\in M$, then the tangent space $T_aM$ is a $k$-dimensional subspace of $\mathbb{R}^n$.

9.5 Remark: A schematic picture representing the situation described in the above lemma, in the case $k=2$ and $n=3$, is given in Figure 2.6.

Proof of 9.4: Since it is merely a matter of relabeling the coordinates, we can suppose without loss of generality that the permutation $j_1,\dots,j_n$ needed in Def. 9.2 is in fact the identity permutation (i.e., $j_i = i$ for each $i=1,\dots,n$), and hence there is $\delta>0$ such that

$$M\cap B_\delta(a) = \mathrm{graph}\,g,$$


Figure 2.5: Schematic picture of a C1 manifold M  in case k = 1, n = 2.

where $g: U\to\mathbb{R}^{n-k}$ is $C^1$ and $U$ is open in $\mathbb{R}^k$. If $v\in T_aM$ then by definition of $T_aM$ there is a $C^1$ curve $\gamma:[0,\alpha)\to\mathbb{R}^n$ ($\alpha>0$) with $\gamma([0,\alpha))\subset M$, $\gamma(0)=a$ and $\gamma'(0)=v$. Since $\gamma$ is continuous we have $\rho>0$ such that $0\le t<\rho \Rightarrow \gamma(t)\in M\cap B_\delta(a)$, and so $0\le t<\rho \Rightarrow$

(1) $\gamma(t) = G(\sigma(t)),$

where $G$ is the graph map as in 9.1 and $\sigma(t) = (\gamma_1(t),\dots,\gamma_k(t))$ (i.e., $\sigma(t)$ is the vector in $\mathbb{R}^k$ obtained by taking the first $k$ components of $\gamma(t)$), so that $\sigma:[0,\rho)\to\mathbb{R}^k$ is $C^1$ with $\sigma(0)=a_0$, where $a_0 = (a_1,\dots,a_k)$ (i.e., $a_0$ is the vector in $\mathbb{R}^k$ obtained by taking the first $k$ components of $a$). By applying the chain rule to the composite on the right of (1) we then have

(2) $v = \gamma'(0) = \sum_{j=1}^k y_j\,D_jG(a_0),$

where $(y_1,\dots,y_k) = \sigma'(0)\in\mathbb{R}^k$, thus showing, since $v\in T_aM$ was arbitrary, that

(3) $T_aM \subset \mathrm{span}\{D_1G(a_0),\dots,D_kG(a_0)\}.$

We can also prove the reverse inclusion

(4) $\mathrm{span}\{D_1G(a_0),\dots,D_kG(a_0)\} \subset T_aM,$


Figure 2.6: The tangent space $T_aM$ of $M$. (Notice that the tangent space in general does not pass through the point $a$, whereas the affine space $a+T_aM = \{a+v: v\in T_aM\}$, i.e., the translate of the subspace $T_aM$ by the vector $a$, does pass through $a$.)

because if $y\in\mathbb{R}^k$ we can take $\rho>0$ such that $B_\rho(a_0)\subset U$ (because $U$ is open), and hence for any $\alpha>0$ with $\alpha\|y\|<\rho$ we have that $\gamma(t) = G(a_0+ty)$ is $C^1$ for $|t|<\alpha$, $\gamma((-\alpha,\alpha))\subset M$, $\gamma(0)=a$, and again (2) holds by the chain rule. Since $y_1,\dots,y_k$ were arbitrary, this does prove the reverse inclusion (4). Thus, in fact (by (3), (4)) we have proved

(5) $T_aM = \mathrm{span}\{D_1G(a_0),\dots,D_kG(a_0)\}.$

Also, $D_1G(a_0),\dots,D_kG(a_0)$ are l.i.; indeed,

$$G(x) = (x, g(x)) \Rightarrow D_jG(x) = (e_j, D_jg(x)),\quad j=1,\dots,k,$$

which are clearly l.i. vectors. This completes the proof that $T_aM$ is a $k$-dimensional subspace of $\mathbb{R}^n$. □

9.5 Remark: Part of the above proof (in fact the proof of inclusion (4)) shows that for any $a\in M$ and for any $y\in\mathbb{R}^k$ we can find $\gamma:(-\alpha,\alpha)\to\mathbb{R}^n$ (i.e., $\gamma$ is $C^1$ on the whole interval $(-\alpha,\alpha)$ rather than just on $[0,\alpha)$) with $\alpha>0$, $\gamma((-\alpha,\alpha))\subset M$, $\gamma(0)=a$ and $\gamma'(0) = \sum_{j=1}^k y_j\,D_jG(a_0)$. Notice also


that for any $v\in T_aM$ we can choose these $y_1,\dots,y_k$ such that $\sum_{j=1}^k y_j\,D_jG(a_0) = v$ (by (5) in the above proof), i.e., $\gamma$ can be selected so that $\gamma'(0) = v$.

Given a $C^1$ map $f: U\to\mathbb{R}$, where $U$ is an open subset of $\mathbb{R}^n$ with $U\supset M$, we define the tangential gradient relative to $M$ of $f$, denoted $\nabla_M f$ (also referred to as the "gradient of $f|M$"), by

9.6 $\nabla_M f(a) = P_{T_aM}\bigl(\nabla_{\mathbb{R}^n}f(a)\bigr),\quad a\in M,$

where $P_{T_aM}:\mathbb{R}^n\to\mathbb{R}^n$ is the orthogonal projection onto the tangent space $T_aM$ (as in 8.3 of Ch. 1 with $V = T_aM$). Recall that by 8.3 of Ch. 1 the definition 9.6 in fact simply says that $\nabla_Mf(a)$ is the $T_aM$-component $v$ in the decomposition $\nabla_{\mathbb{R}^n}f(a) = v+u$ with $v\in T_aM$ and $u\in(T_aM)^\perp$, hence the name "tangential gradient."

Of course, since $P_{T_aM}$ is a linear map, $\nabla_M f$ is also linear in $f$ in the sense that if $f_j: U_j\to\mathbb{R}$ is $C^1$ with $U_j$ open and $U_j\supset M$ for $j=1,2$, and if $\lambda,\mu\in\mathbb{R}$, then

9.7 $\nabla_M(\lambda f_1+\mu f_2)(a) = \lambda\nabla_Mf_1(a) + \mu\nabla_Mf_2(a),\quad a\in M.$

9.8 Lemma. Suppose $M$ is a $k$-dimensional $C^1$ submanifold of $\mathbb{R}^n$ and $f: U\to\mathbb{R}$ is $C^1$, where $U$ is an open set containing $M$. Then at each point $a\in M$ we have

(∗) $v\in T_aM \Rightarrow D_vf(a) = v\cdot\nabla_{\mathbb{R}^n}f(a) = v\cdot\nabla_Mf(a).$

Furthermore, if $\tilde f:\tilde U\to\mathbb{R}$ is another $C^1$ function with $\tilde U\supset M$ and if $\tilde f|M = f|M$, then $\nabla_M\tilde f(a) = \nabla_Mf(a)$ for all $a\in M$.

9.9 Remarks: (1) Notice, in particular, that (∗) implies that all the directional derivatives of $f$ by vectors $v\in T_aM$ are determined by the tangential gradient $\nabla_Mf(a)$.

(2) In the last part of the above lemma we use the notation $f|M$ to mean the "restriction of $f$ to $M$," meaning the function which is defined on $M$ only (and not on the larger set $U$) and which agrees with $f$ at all points of $M$; the last part of the lemma in fact shows that the gradient of $f$ on $M$, $\nabla_Mf$, depends only on $f|M$ and is unchanged if we take another $\tilde f$, provided the new $\tilde f$ is still $C^1$ and agrees with the original $f$ at all points of $M$.

Proof of 9.8: For $v\in T_aM$, Def. 9.3 ensures that there is a $C^1$ map $\gamma:[0,\alpha)\to\mathbb{R}^n$ with $\alpha>0$, $\gamma([0,\alpha))\subset M$, $\gamma(0)=a$, and $\gamma'(0)=v$. By the chain rule, $\frac{d}{dt}f(\gamma(t)) = \gamma'(t)\cdot\nabla_{\mathbb{R}^n}f(\gamma(t))$, so, in particular, $\frac{d}{dt}f(\gamma(t))|_{t=0} = v\cdot\nabla_{\mathbb{R}^n}f(a) = D_vf(a)$. On the other hand, since $v\in T_aM$ we have $v = P_{T_aM}(v)$ and then by 8.3 (with $V = T_aM$) we have $v\cdot\nabla_{\mathbb{R}^n}f(a) = P_{T_aM}(v)\cdot\nabla_{\mathbb{R}^n}f(a) = v\cdot P_{T_aM}(\nabla_{\mathbb{R}^n}f(a)) = v\cdot\nabla_Mf(a)$ by the definition 9.6.

To prove the last part of the lemma, observe that under the stated hypotheses $f-\tilde f$ is $C^1$ on the open set $U\cap\tilde U\supset M$, and so we can apply the above argument with $f-\tilde f$ in place of $f$. Since $f-\tilde f = 0$ at each point of $M$, this gives $v\cdot\nabla_M(f-\tilde f)(a) = 0$ for all $a\in M$ and all $v\in T_aM$, and so taking $v = \nabla_M(f-\tilde f)(a)$ we conclude $\nabla_M(f-\tilde f)(a) = 0$. Then by the linearity 9.7 this can be written $\nabla_Mf(a) - \nabla_M\tilde f(a) = 0$, so the proof of 9.8 is complete. □


In view of the above theorem the following definition is natural.

9.10 Definition: Suppose $M$ is a $k$-dimensional $C^1$ submanifold of $\mathbb{R}^n$ and $f: U\to\mathbb{R}$ is $C^1$, where $U$ is an open set containing $M$. Then a point $a\in M$ is called a critical point of $f|M$ (the restriction of $f$ to $M$ as in 9.9(2)) if $\nabla_Mf(a) = 0$.

Using the terminology of the above definition we then have the following.

9.11 Lemma. If $f|M$ has a local maximum or minimum at $a\in M$ (i.e., if there is $\delta>0$ such that either $f(x)\le f(a)$ for all $x\in M\cap B_\delta(a)$ or $f(x)\ge f(a)$ for all $x\in M\cap B_\delta(a)$), then $a$ is a critical point of $f|M$ (i.e., $\nabla_Mf(a) = 0$).

9.12 Caution: Of course it need not be true under the hypotheses of the above lemma that the full $\mathbb{R}^n$ gradient $\nabla_{\mathbb{R}^n}f(a) = 0$, because we are only assuming $f|M$ has a local maximum or minimum at $a$, and this gives no information about the directional derivatives of $f$ at $a$ by vectors $v$ which are not in $T_aM$.

Proof of Lemma 9.11: Assume $f|M$ has a local maximum at $a$, and take $\delta>0$ such that $f(x)\le f(a)$ for all $x\in M\cap B_\delta(a)$. Let $v\in T_aM$. By Rem. 9.5 we can find a $C^1$ map $\gamma:(-\alpha,\alpha)\to\mathbb{R}^n$ with $\gamma((-\alpha,\alpha))\subset M$, $\gamma(0)=a$, and $\gamma'(0)=v$. $\gamma$ is $C^1$, hence continuous, and so there is an $\eta>0$ such that $|t|<\eta \Rightarrow \|\gamma(t)-\gamma(0)\|<\delta$, so, since $\gamma(0)=a$, we have $|t|<\eta \Rightarrow \gamma(t)\in M\cap B_\delta(a)$, and hence $|t|<\eta \Rightarrow f(\gamma(t))\le f(a)$, so $f(\gamma(t))$ has a local maximum at $t=0$. Hence, by 1-variable calculus the derivative at $t=0$ is zero. That is, by the Chain Rule and 9.8, $0 = \frac{d}{dt}f(\gamma(t))|_{t=0} = \gamma'(0)\cdot\nabla_{\mathbb{R}^n}f(a) = v\cdot\nabla_{\mathbb{R}^n}f(a) = v\cdot\nabla_Mf(a)$, so $v\cdot\nabla_Mf(a) = 0$ for each $v\in T_aM$. In particular, with $v = \nabla_Mf(a)$ we infer $\|\nabla_Mf(a)\|^2 = 0$, hence $\nabla_Mf(a) = 0$.

If $f|M$ has a local minimum at $a$ the same argument can be applied to $-f$, hence the proof of 9.11 is complete. □

One of the things we shall prove later (in Ch. 4) is that if $k\in\{1,\dots,n-1\}$ then the intersection of the level sets of the $n-k$ $C^1$ functions $g_1,\dots,g_{n-k}$ is a $k$-dimensional $C^1$ submanifold, at least near points of the intersection where the gradients $\nabla_{\mathbb{R}^n}g_1,\dots,\nabla_{\mathbb{R}^n}g_{n-k}$ are l.i. More precisely:

9.13 Theorem. If $U\subset\mathbb{R}^n$ is open, if $g_1,\dots,g_{n-k}: U\to\mathbb{R}$ are $C^1$, if $S = \{x\in U: g_j(x) = 0$ for each $j=1,\dots,n-k\}$ and if $M\neq\emptyset$, where

$$M = \{y\in S: \nabla_{\mathbb{R}^n}g_1(y),\dots,\nabla_{\mathbb{R}^n}g_{n-k}(y) \text{ are l.i.}\},$$

then $M$ is a $k$-dimensional $C^1$ submanifold.

 We shall give the proof of this later, in Sec. 3 of Ch. 4, as a corollary of the Implicit Function Theorem.

9.14 Lagrange Multiplier Theorem. If $U\subset\mathbb{R}^n$ is open, if $f, g_1,\dots,g_{n-k}: U\to\mathbb{R}$ are $C^1$, if $S = \{x\in U: g_j(x) = 0$ for each $j=1,\dots,n-k\}$, if $a\in M \equiv \{y\in S: \nabla g_1(y),\dots,\nabla g_{n-k}(y)$ are l.i.$\}$ (so that $M$ is a $k$-dimensional $C^1$ manifold by 9.13), and if $a$ is a critical point of $f|M$, then $\exists\,\lambda_1,\dots,\lambda_{n-k}\in\mathbb{R}$ such that $\nabla_{\mathbb{R}^n}f(a) = \sum_{j=1}^{n-k}\lambda_j\nabla_{\mathbb{R}^n}g_j(a)$.

9.15 Terminology: The constants λ1, . . . , λn−k are called Lagrange Multipliers for f  relative to the constraints g1, . . . , gn−k.

9.16 Remark: In view of 9.11 the hypothesis that $a$ is a critical point of $f|M$ is automatically true (and hence the conclusion of 9.14 applies) if $f|M$ has a local maximum or local minimum at $a$ (or, in particular, if $f|S$ has a local maximum or local minimum at $a\in M$).

Proof of 9.14: By definition of critical point of $f|M$ we have

$$P_{T_aM}\bigl(\nabla_{\mathbb{R}^n}f(a)\bigr) = \nabla_Mf(a) = 0,$$

and hence $\nabla_{\mathbb{R}^n}f(a)\in(T_aM)^\perp$. By 9.4, $T_aM$ has dimension $k$, and hence, by 8.1(iii) of Ch. 1, we have $\dim(T_aM)^\perp = n-k$.

Now $g_j\equiv0$ on $M$, hence by Lem. 9.8 above we have $\nabla_Mg_j\equiv0$ on $M$ and hence we also have $\nabla_{\mathbb{R}^n}g_j(a)\in(T_aM)^\perp$. Furthermore, $\nabla_{\mathbb{R}^n}g_1(a),\dots,\nabla_{\mathbb{R}^n}g_{n-k}(a)$ are l.i., and hence, by Lem. 5.7(b) of Ch. 1, they must be a basis for $(T_aM)^\perp$, so there are unique $\lambda_1,\dots,\lambda_{n-k}\in\mathbb{R}$ such that $\nabla_{\mathbb{R}^n}f(a) = \sum_{j=1}^{n-k}\lambda_j\nabla_{\mathbb{R}^n}g_j(a)$. □
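As an illustrative aside (not part of the text's formal development), the Lagrange multiplier condition can be solved symbolically in simple cases. The sketch below, which assumes SymPy is available, treats the setting of Exercise 9.2(b) below: extremize $f = x+y$ subject to the single constraint $g = x^2+y^2-2 = 0$.

```python
# A sketch (using SymPy) of the Lagrange multiplier condition of Thm. 9.14:
# extremize f = x + y on the circle g = x^2 + y^2 - 2 = 0.
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
f = x + y
g = x**2 + y**2 - 2

# grad f = lam * grad g, together with the constraint g = 0:
eqs = [sp.diff(f, x) - lam*sp.diff(g, x),
       sp.diff(f, y) - lam*sp.diff(g, y),
       g]
for sol in sp.solve(eqs, [x, y, lam], dict=True):
    print(sol, " f =", f.subs(sol))
# Solutions (1, 1) and (-1, -1): the maximum 2 and the minimum -2 of f|S.
```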

SECTION 9 EXERCISES

9.1 (a) Find the maximum value of the product $x_1x_2\cdots x_n$ subject to the constraints $x_j>0$ for each $j=1,\dots,n$ and $\sum_{j=1}^n x_j = 1$, and (rigorously) justify your answer.

(b) Prove that if $a_1,\dots,a_n$ are positive numbers, then $(a_1a_2\cdots a_n)^{1/n} \le \frac{a_1+a_2+\cdots+a_n}{n}$. ("The inequality between the arithmetic and geometric mean.")

9.2 Find:

(a) the minimum value of $x^2+y^2$ on the line $x+y=2$;

(b) the maximum value of $x+y$ on the circle $x^2+y^2=2$.

9.3 Suppose $f(x,y,z) = 3xy + z^3 - 3z$ for $(x,y,z)\in\mathbb{R}^3$.

(i) Show that $f$ has no local max or min in $\mathbb{R}^3$.

(ii) Prove that $f$ attains its max and min values on the sphere $x^2+y^2+z^2-2z=0$.

(iii) Find all the points on the sphere $x^2+y^2+z^2-2z=0$ where $f$ attains its max and min values.

9.4 Let $n\ge2$ and $S^{n-1} = \{x\in\mathbb{R}^n: \|x\|=1\}$. Prove that $S^{n-1}$ is an $(n-1)$-dimensional $C^1$ submanifold.

Hint: Notice that we can write

$$S^{n-1} = \bigcup_{j=1}^n\Bigl\{x\in\mathbb{R}^n: \textstyle\sum_{i\neq j}x_i^2<1 \text{ and } x_j = \sqrt{1-\sum_{i\neq j}x_i^2}\Bigr\} \;\cup\; \bigcup_{j=1}^n\Bigl\{x\in\mathbb{R}^n: \textstyle\sum_{i\neq j}x_i^2<1 \text{ and } x_j = -\sqrt{1-\sum_{i\neq j}x_i^2}\Bigr\}.$$


CHAPTER 3

More Linear Algebra

1 PERMUTATIONS

A permutation on $n$ elements is an ordered $n$-tuple $(i_1,\dots,i_n)$ which is a re-ordering of the first $n$ positive integers $1,\dots,n$; that is, the set $\mathcal P_n$ of all such permutations is

$$\mathcal P_n = \bigl\{(i_1,\dots,i_n): \{i_1,\dots,i_n\} = \{1,\dots,n\}\bigr\}.$$

Notice that there are exactly $n!$ such permutations, because there are exactly $n$ ways to choose the first entry $i_1$, and for each of these $n$ choices there are exactly $n-1$ ways of choosing the entry $i_2$, so the number of ways of choosing the first two entries $i_1,i_2$ is $n(n-1)$, and similarly (if $n\ge3$) there is a total of $n(n-1)(n-2)$ ways of choosing the first 3 entries $i_1,i_2,i_3$, and so on.

Let $1\le k<\ell\le n$, and define $\tau_{k,\ell}:\mathbb{R}^n\to\mathbb{R}^n$ by

$$\tau_{k,\ell}(x_1,\dots,x_k,\dots,x_\ell,\dots,x_n) = (x_1,\dots,x_\ell,\dots,x_k,\dots,x_n)$$

(i.e., $\tau_{k,\ell}$ interchanges the elements at positions $k,\ell$, so that on the right $x_\ell$ occupies the $k$-th slot and $x_k$ the $\ell$-th slot). Such a transformation is called a transposition; notice that $\tau_{k,\ell}$ is its own inverse. Thus, $\tau_{k,\ell}\circ\tau_{k,\ell}$ is the identity transformation on $\mathbb{R}^n$:

$$\tau_{k,\ell}\circ\tau_{k,\ell}(x_1,\dots,x_n) \equiv (x_1,\dots,x_n)\quad\forall\,(x_1,\dots,x_n)\in\mathbb{R}^n.$$

We claim that any permutation can be achieved by applying a finite sequence of such transpositions. Indeed, if $(i_1,\dots,i_n)$ is any permutation then we can write this permutation in terms of a permutation of the first $n-1$ (instead of $n$) integers as follows:

$$(i_1,\dots,i_n) = \begin{cases} (i_1,\dots,i_{n-1},\,n), & \text{where } (i_1,\dots,i_{n-1}) \text{ is a perm. of } 1,\dots,n-1, \text{ if } i_n = n\\[1mm] \tau_{k,n}(i_1,\dots,i_n,\dots,i_{n-1},\,n), & \text{with } i_n \text{ in the } k\text{-th slot, if } i_n<n \text{ (i.e., } i_k = n \text{ for some } k<n), \end{cases}$$

where in the second case $(i_1,\dots,i_n,\dots,i_{n-1})$ is a permutation of $1,\dots,n-1$, and so by induction on $n$ it follows that each permutation can indeed be achieved by a sequence of transpositions; that is, for any permutation $(i_1,\dots,i_n)$ we can find a sequence $\tau^{(1)},\dots,\tau^{(q)}$ of transpositions such that

1.1 $(i_1,\dots,i_n) = \tau^{(1)}\circ\tau^{(2)}\circ\cdots\circ\tau^{(q)}(1,\dots,n).$


Correspondingly, we can permute the $n$ coordinates $x_1,\dots,x_n$ of a point in $\mathbb{R}^n$ using the same sequence of transpositions:

1.2 $(x_{i_1},\dots,x_{i_n}) = \tau^{(1)}\circ\tau^{(2)}\circ\cdots\circ\tau^{(q)}(x_1,x_2,\dots,x_n).$

1.3 Note: We of course do not claim the representation 1.1 is unique, and it is not; indeed there are many different ways of writing a given permutation in terms of a sequence of transpositions. For example, if $n=3$ then the permutation $(3,2,1)$ can be written as $\tau_{1,3}(1,2,3)$ but also as $\tau_{2,3}\circ\tau_{1,2}\circ\tau_{2,3}(1,2,3)$.

An important quantity related to a permutation $(i_1,\dots,i_n)$ is the integer $N(i_1,\dots,i_n)$, which is defined as the number of inversions present in the sequence $i_1,\dots,i_n$; that is,

$$N(i_1,\dots,i_n) = \text{the number of pairs } k,\ell\in\{1,\dots,n\} \text{ with } \ell>k \text{ and } i_\ell<i_k.$$

One usually computes this as $\sum_{j=1}^{n-1}N_j(i_1,\dots,i_n)$, where

$N_1(i_1,\dots,i_n)$ = the number of integers $\ell>1$ such that $i_\ell<i_1$,

$N_2(i_1,\dots,i_n)$ = the number of integers $\ell>2$ such that $i_\ell<i_2$,

$\;\vdots$

$N_{n-1}(i_1,\dots,i_n)$ = the number of integers $\ell>n-1$ such that $i_\ell<i_{n-1}$

(so that $N_{n-1}(i_1,\dots,i_n) = 1$ if $i_n<i_{n-1}$ and $=0$ if $i_n>i_{n-1}$).

The quantity $N(i_1,\dots,i_n)$ allows us to define the "parity" (i.e., "evenness" or "oddness") of a permutation.

1.4 Definition: The permutation $(i_1,\dots,i_n)$ is said to be even or odd according as the integer $N(i_1,\dots,i_n)$ is even or odd.

Notice that then

$$(-1)^{N(i_1,\dots,i_n)} = \begin{cases} +1 & \text{if } (i_1,\dots,i_n) \text{ is even}\\ -1 & \text{if } (i_1,\dots,i_n) \text{ is odd}. \end{cases}$$

The quantity $(-1)^{N(i_1,\dots,i_n)}$ is called the index of the permutation $(i_1,\dots,i_n)$.
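The inversion count is easy to compute mechanically. The following Python sketch (an illustrative aside, not part of the text; it simply transcribes the definition above) computes $N$ and the index, and is applied to the two permutations of Exercise 1.1 below.

```python
# A sketch computing N(i_1,...,i_n) and the index (-1)^N from the definition.
def inversions(perm):
    """Number of pairs k < l with perm[l] < perm[k] (the N_j's summed)."""
    n = len(perm)
    return sum(1 for k in range(n) for l in range(k + 1, n)
               if perm[l] < perm[k])

def index(perm):
    return -1 if inversions(perm) % 2 else 1

for p in [(4, 3, 1, 5, 7, 2, 6), (2, 3, 1, 8, 5, 4, 7, 6)]:   # Exercise 1.1
    N = inversions(p)
    print(p, " N =", N, " index =", index(p),
          "(odd)" if N % 2 else "(even)")
```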

1.5 Lemma. If $1\le k<\ell\le n$ and if $(i_1,\dots,i_n)$ is a permutation of $(1,\dots,n)$, then

$$(-1)^{N(i_1,\dots,i_\ell,\dots,i_k,\dots,i_n)} = -(-1)^{N(i_1,\dots,i_k,\dots,i_\ell,\dots,i_n)}.$$

1.6 Remarks: (1) Notice that $(i_1,\dots,i_\ell,\dots,i_k,\dots,i_n)$ can be written in terms of the transposition $\tau_{k,\ell}$ as $\tau_{k,\ell}(i_1,\dots,i_k,\dots,i_\ell,\dots,i_n)$, so the above lemma says that the parity of the permutation changes each time we apply a transposition; notice, in particular, this means that the index $(-1)^{N(i_1,\dots,i_n)}$ can be written alternatively as

$$(-1)^{N(i_1,\dots,i_n)} = (-1)^q,$$


where $q$ is the number of transpositions appearing in 1.1 above.

(2) An additional consequence of the lemma is that for each $n\ge2$ there are $\frac{n!}{2}$ even permutations and $\frac{n!}{2}$ odd permutations, because according to the lemma the transposition $\tau_{1,2}$ maps the even permutations to the odd permutations, and this map is clearly 1:1 and onto (it is onto because any permutation $(i_1,i_2,i_3,\dots,i_n)$ can be written $\tau_{1,2}(i_2,i_1,i_3,\dots,i_n)$).

Proof of Lemma 1.5: If $\ell = k+1$, then $i_k, i_\ell$ are adjacent elements in the permutation $(i_1,\dots,i_n)$. Observe that, if we use the notation that $\#A$ = the number of elements in the set $A$ (zero if $A$ is empty), then, since the pair in the $k$-th and $(k+1)$-st slots of the swapped arrangement is $i_{k+1}, i_k$ (which is an inversion if $i_{k+1}>i_k$ but not if $i_k<i_{k+1}$), we have

$$i_{k+1}>i_k \Rightarrow N_k(i_1,\dots,i_{k-1},i_{k+1},i_k,\dots,i_n) = 1+\#\{j: j\ge k+2 \text{ and } i_j<i_{k+1}\}$$

(with $i_{k+1}$ in the $k$-th slot and $i_k$ in the $(k+1)$-st slot) and $N_{k+1}(i_1,\dots,i_{k-1},i_{k+1},i_k,\dots,i_n) = \#\{j: j\ge k+2 \text{ and } i_j<i_k\}$, so that

$$N_k(i_1,\dots,i_{k-1},i_{k+1},i_k,\dots,i_n) + N_{k+1}(i_1,\dots,i_{k-1},i_{k+1},i_k,\dots,i_n) = 1+\#\{j: j\ge k+2 \text{ and either } i_j<i_{k+1} \text{ or } i_j<i_k\},$$

which, except for the "$1+$" on the right side, is exactly the same as

$$N_k(i_1,\dots,i_{k-1},i_k,i_{k+1},\dots,i_n) + N_{k+1}(i_1,\dots,i_{k-1},i_k,i_{k+1},\dots,i_n).$$

That is, we have shown that

$$i_{k+1}>i_k \Rightarrow N_k(i_1,\dots,i_{k+1},i_k,\dots,i_n) + N_{k+1}(i_1,\dots,i_{k+1},i_k,\dots,i_n) = N_k(i_1,\dots,i_k,i_{k+1},\dots,i_n) + N_{k+1}(i_1,\dots,i_k,i_{k+1},\dots,i_n) + 1.$$

By a similar argument (indeed it's the same identity applied to the permutation $(j_1,\dots,j_n)$, where $j_\ell = i_\ell$ for $\ell\neq k,k+1$ and $j_k = i_{k+1}$, $j_{k+1} = i_k$) we have

$$i_{k+1}<i_k \Rightarrow N_k(i_1,\dots,i_{k+1},i_k,\dots,i_n) + N_{k+1}(i_1,\dots,i_{k+1},i_k,\dots,i_n) = N_k(i_1,\dots,i_k,i_{k+1},\dots,i_n) + N_{k+1}(i_1,\dots,i_k,i_{k+1},\dots,i_n) - 1.$$

Since trivially $N_p(i_1,\dots,i_{k-1},i_{k+1},i_k,\dots,i_n) = N_p(i_1,\dots,i_{k-1},i_k,i_{k+1},\dots,i_n)$ if either $p<k$ or $p>k+1$, we have actually shown that

$$N(i_1,\dots,i_{k-1},i_{k+1},i_k,\dots,i_n) = N(i_1,\dots,i_{k-1},i_k,i_{k+1},\dots,i_n) \pm 1,$$

and so $(-1)^{N(i_1,\dots,i_{k-1},i_{k+1},i_k,\dots,i_n)} = -(-1)^{N(i_1,\dots,i_{k-1},i_k,i_{k+1},\dots,i_n)}$, and the lemma is proved in the case when $\ell = k+1$.

In the case when $\ell\ge k+2$ we can get the permutation $(i_1,\dots,i_\ell,\dots,i_k,\dots,i_n)$ from the original permutation $(i_1,\dots,i_k,\dots,i_\ell,\dots,i_n)$ by first making a sequence of $\ell-k$ transpositions of adjacent elements to move $i_k$ to the right of $i_\ell$, and then making a sequence of $\ell-k-1$ transpositions


of adjacent elements to move $i_\ell$ back to the $k$-th position. In all that is $2(\ell-k)-1$ transpositions of adjacent elements, each changing the index by a factor of $-1$, so the total change in the index is $(-1)^{2(\ell-k)-1} = -1$, and the lemma is proved. □

If $(i_1,\dots,i_n)$ is any permutation of $1,\dots,n$, recall that then we can pick transpositions $\tau^{(1)},\dots,\tau^{(q)}$ such that $(i_1,\dots,i_n) = \tau^{(1)}\circ\tau^{(2)}\circ\cdots\circ\tau^{(q)}(1,\dots,n)$. Then (since $\tau^{(i)}\circ\tau^{(i)} =$ identity for each $i$) the same sequence of transpositions in reverse order "undoes" this permutation: $(1,\dots,n) = \tau^{(q)}\circ\cdots\circ\tau^{(1)}(i_1,\dots,i_n)$. The permutation $(j_1,\dots,j_n) = \tau^{(q)}\circ\cdots\circ\tau^{(1)}(1,\dots,n)$ is called the "inverse" of $(i_1,\dots,i_n)$. Observe that $j_k$ is uniquely determined by the condition $i_{j_k} = k$ for each $k=1,\dots,n$, as one sees by taking $x_k = i_k$ in the identity $(x_{j_1},\dots,x_{j_n}) = \tau^{(q)}\circ\cdots\circ\tau^{(1)}(x_1,\dots,x_n)$. Observe also that $(-1)^{N(j_1,\dots,j_n)} = (-1)^q = (-1)^{N(i_1,\dots,i_n)}$; that is, the permutation $(i_1,\dots,i_n)$ and its inverse $(j_1,\dots,j_n)$ have the same index.

SECTION 1 EXERCISES

1.1 For each of the following permutations, find the parity (i.e., evenness or oddness) by two methods: (i) by directly calculating the number $N$ of inversions; and (ii) by representing the permutation by a sequence of transpositions applied to the trivial permutation $(1,2,\dots,n)$:

(a) $(4,3,1,5,7,2,6)$,  (b) $(2,3,1,8,5,4,7,6)$.

2 DETERMINANTS

 We here take the general “multilinear algebra” approach, which begins with the following abstract

theorem (which in the course of the proof will become quite concrete).

2.1 Theorem. There is one and only one function $D: \underbrace{\mathbb{R}^n\times\cdots\times\mathbb{R}^n}_{n\text{ factors}}\to\mathbb{R}$ having the following properties:

(i) $D$ is linear in the $j$-th factor for each $j=1,\dots,n$; for $j=1$ this means

$$D(c_1\beta_1+c_2\gamma_1,\alpha_2,\dots,\alpha_n) = c_1D(\beta_1,\alpha_2,\dots,\alpha_n) + c_2D(\gamma_1,\alpha_2,\dots,\alpha_n)$$

for any $c_1,c_2\in\mathbb{R}$ and any $\beta_1,\gamma_1,\alpha_2,\dots,\alpha_n\in\mathbb{R}^n$, and we require a similar identity for the other factors;

(ii) $D$ changes sign when we interchange any pair of factors: that is, for any $k<\ell$,

$$D(\alpha_1,\dots,\alpha_k,\dots,\alpha_\ell,\dots,\alpha_n) = -D(\alpha_1,\dots,\alpha_\ell,\dots,\alpha_k,\dots,\alpha_n);$$

(iii) $D(e_1,\dots,e_n) = 1$.

2.2 Remark: Observe that property (ii) says, in particular, that $D(\alpha_1,\dots,\alpha_n) = 0$ in case any two of the vectors $\alpha_i,\alpha_j$ ($i\neq j$) are equal, because in that case by interchanging $\alpha_i$ and $\alpha_j$ and using property (ii) we conclude that $D(\alpha_1,\dots,\alpha_n)$ must be the negative of itself, i.e., it must be zero.


Proof: The proof is fairly straightforward now that we have the basic properties of permutations at our disposal. We start by assuming such a $D$ exists, and seeing what conclusions that enables us to draw. In fact, if such $D$ exists, then we can write each vector $\alpha_j$ in the expression $D(\alpha_1,\dots,\alpha_n)$ as a linear combination of the standard basis vectors and use the linearity property (i). Thus, we write $\alpha_1 = \sum_{i_1=1}^n a_{i_11}e_{i_1}$, $\alpha_2 = \sum_{i_2=1}^n a_{i_22}e_{i_2}$, ..., $\alpha_n = \sum_{i_n=1}^n a_{i_nn}e_{i_n}$, and use the linearity property (i) on each of the sums. This gives the expression

$$D(\alpha_1,\dots,\alpha_n) = \sum_{i_1,i_2,\dots,i_n=1}^n a_{i_11}a_{i_22}\cdots a_{i_nn}\,D(e_{i_1},e_{i_2},\dots,e_{i_n}).$$

In view of Rem. 2.2 above we can drop all terms where $i_\ell = i_k$ for some $k\neq\ell$; that is, we are summing only over distinct $i_1,\dots,i_n$, i.e., over $i_1,\dots,i_n$ such that $(i_1,\dots,i_n)$ is a permutation of the integers $1,\dots,n$, so only $n!$ terms (from the total of $n^n$ terms) are possibly nonzero, and we can write

$$D(\alpha_1,\dots,\alpha_n) = \sum_{\text{distinct } i_1,\dots,i_n\le n} a_{i_11}a_{i_22}\cdots a_{i_nn}\,D(e_{i_1},e_{i_2},\dots,e_{i_n}).$$

Now interchanging any two of the indices $i_k, i_\ell$ (with $k\neq\ell$) just changes the sign of the expression $D(e_{i_1},e_{i_2},\dots,e_{i_n})$ (by virtue of property (ii)). So pick a sequence of transpositions $\tau^{(1)},\dots,\tau^{(q)}$ such that $(i_1,\dots,i_n) = \tau^{(1)}\circ\tau^{(2)}\circ\cdots\circ\tau^{(q)}(1,\dots,n)$; then $D(e_{i_1},\dots,e_{i_n}) = (-1)^q\,D(e_1,\dots,e_n)$, and from our discussion of permutations we know that $(-1)^q$ is just the index $(-1)^{N(i_1,\dots,i_n)}$ of the permutation $(i_1,\dots,i_n)$, and also $D(e_1,\dots,e_n)=1$ by property (iii), so we have proved (assuming a $D$ satisfying (i), (ii), (iii) exists) that

(1) $D(\alpha_1,\dots,\alpha_n) = \sum_{\text{perms. } (i_1,\dots,i_n) \text{ of } \{1,\dots,n\}} (-1)^{N(i_1,\dots,i_n)}\,a_{i_11}a_{i_22}\cdots a_{i_nn}.$

Thus, if such a map $D$ exists, then there can only be one possible expression for it, viz. the expression on the right of (1). Thus, to complete the proof we just have to show that if we define $D(\alpha_1,\dots,\alpha_n)$ to be the expression on the right side of (1), then $D$ has the required properties (i), (ii), (iii).

Now properties (i), (iii) are obviously satisfied, so all we really need to do is check property (ii).

To check property (ii), observe first that $i_k, i_\ell$ are dummy variables, so we can relabel them $p, q$, in which case (with $\alpha_\ell$ and the factor $a_{p\ell}$ in the $k$-th slot, and $\alpha_k$ and the factor $a_{qk}$ in the $\ell$-th slot)

$$D(\alpha_1,\dots,\alpha_\ell,\dots,\alpha_k,\dots,\alpha_n) = \sum_{\text{distinct } i_1,\dots,p,\dots,q,\dots,i_n} (-1)^{N(i_1,\dots,p,\dots,q,\dots,i_n)}\,a_{i_11}a_{i_22}\cdots a_{p\ell}\cdots a_{qk}\cdots a_{i_nn}$$
$$= \sum_{\text{distinct } i_1,\dots,p,\dots,q,\dots,i_n} (-1)^{N(i_1,\dots,p,\dots,q,\dots,i_n)}\,a_{i_11}a_{i_22}\cdots a_{qk}\cdots a_{p\ell}\cdots a_{i_nn}$$
$$= -\sum_{\text{distinct } i_1,\dots,q,\dots,p,\dots,i_n} (-1)^{N(i_1,\dots,q,\dots,p,\dots,i_n)}\,a_{i_11}a_{i_22}\cdots a_{qk}\cdots a_{p\ell}\cdots a_{i_nn},$$


where we used the fact that $(-1)^{N(i_1,\dots,p,\dots,q,\dots,i_n)} = -(-1)^{N(i_1,\dots,q,\dots,p,\dots,i_n)}$ (by the main lemma, Lem. 1.5, in the previous section), and now we simply relabel $q = i_k$ and $p = i_\ell$ to see that the final expression here is indeed, as claimed, the quantity $-D(\alpha_1,\dots,\alpha_k,\dots,\alpha_\ell,\dots,\alpha_n)$.

 Thus, the proof of the theorem is complete.
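Formula (1) can be transcribed directly into code. The sketch below (an illustrative aside, not part of the text; NumPy and the random test matrix are assumptions) sums $(-1)^{N(i_1,\dots,i_n)}a_{i_11}\cdots a_{i_nn}$ over all $n!$ permutations and checks the result against a library determinant. It is exponential in $n$, so it is for illustration only.

```python
# A sketch implementing formula (1) of the proof literally.
import itertools
import numpy as np

def inversions(perm):
    n = len(perm)
    return sum(1 for k in range(n) for l in range(k + 1, n)
               if perm[l] < perm[k])

def det_by_permutations(A):
    n = A.shape[0]
    total = 0.0
    for perm in itertools.permutations(range(n)):    # (i_1, ..., i_n)
        prod = 1.0
        for col in range(n):
            prod *= A[perm[col], col]                # the factor a_{i_j j}
        total += (-1) ** inversions(perm) * prod
    return total

A = np.random.default_rng(0).standard_normal((5, 5))
print(det_by_permutations(A), np.linalg.det(A))      # the two values agree
```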

 We can now define the determinant det A of an n × n matrix A = (aij ).

2.3 Definition: det A = D(α1, . . . , αn), where α1, . . . , αn are the columns of A.

We can now check the various properties of the determinant. The first three are just restatements of properties (i), (ii), (iii) of the theorem, and hence require no proof.

P1. $\det A$ is a linear function of each of the columns of $A$ (assuming the other columns are held fixed).

P2. If we interchange 2 (different) columns of $A$, then the value of the determinant is changed by a factor $-1$ (and hence $\det A = 0$ if $A$ has two equal columns).

P3. If A = I  (the identity matrix), then det A = 1.

P4. If A, B are n × n matrices, then det AB = det A det B.

Proof of P4: Let $A = (a_{ij})$, $B = (b_{ij})$. Recall that by definition of matrix multiplication the $j$-th column of $AB$ is $\sum_{k=1}^n b_{kj}\alpha_k$, where $\alpha_k$ is the $k$-th column of $A$. Thus, using the linearity and alternating properties (i), (ii) for $D$ we have

$$\det AB = D\Bigl(\sum_{k_1=1}^n b_{k_11}\alpha_{k_1},\dots,\sum_{k_n=1}^n b_{k_nn}\alpha_{k_n}\Bigr) = \sum_{k_1,k_2,\dots,k_n=1}^n b_{k_11}b_{k_22}\cdots b_{k_nn}\,D(\alpha_{k_1},\dots,\alpha_{k_n})$$
$$= \sum_{\text{distinct } k_1,k_2,\dots,k_n} b_{k_11}b_{k_22}\cdots b_{k_nn}\,D(\alpha_{k_1},\dots,\alpha_{k_n}) = \sum_{\text{distinct } k_1,k_2,\dots,k_n} (-1)^{N(k_1,\dots,k_n)}\,b_{k_11}b_{k_22}\cdots b_{k_nn}\,D(\alpha_1,\dots,\alpha_n)$$
$$= \det B\,\det A$$

as claimed.

P5. $\det A = \det A^{\mathrm T}$ (and hence the properties P1, P2 apply also to the rows of $A$).

P6. (Expansion along the $i$-th row or $j$-th column.) The following formulae are valid:

$$\det A = \sum_{j=1}^n(-1)^{i+j}a_{ij}\det A_{ij} \quad\text{for each } i=1,\dots,n$$
$$\det A = \sum_{i=1}^n(-1)^{i+j}a_{ij}\det A_{ij} \quad\text{for each } j=1,\dots,n,$$


where $A_{ij}$ denotes the $(n-1)\times(n-1)$ matrix obtained by deleting the $i$-th row and $j$-th column of the matrix $A$.

We defer the proof of P5, P6 for a moment, in order to make an important point about the practical computation of determinants:

2.4 Remark: (Highly relevant for the efficient computation of determinants.) Notice that one almost never directly uses the definition (or, for that matter, P6) to compute a determinant. Rather, one uses the properties P1, P2 (and the corresponding facts about rows instead of columns, which we can do by P5) in order to first drastically simplify the matrix before attempting to expand the determinant. In particular, observe that the third kind of elementary row operation does not change the value of the determinant at all: writing $r_1,\dots,r_n$ for the rows,

$$\det\begin{pmatrix} r_1\\ r_2\\ \vdots\\ r_i\\ \vdots\\ r_j+\lambda r_i\\ \vdots\\ r_n\end{pmatrix} = \det\begin{pmatrix} r_1\\ r_2\\ \vdots\\ r_i\\ \vdots\\ r_j\\ \vdots\\ r_n\end{pmatrix} + \lambda\det\begin{pmatrix} r_1\\ r_2\\ \vdots\\ r_i\\ \vdots\\ r_i\\ \vdots\\ r_n\end{pmatrix} = \det\begin{pmatrix} r_1\\ r_2\\ \vdots\\ r_i\\ \vdots\\ r_j\\ \vdots\\ r_n\end{pmatrix}$$

(the rows shown are the $i$-th and $j$-th), where we used P1 and P2 (the middle matrix has two equal rows, hence zero determinant). We can of course also use the other two types of elementary row operations, so long as we keep track of the fact that interchanging 2 rows changes the sign of the determinant and multiplying a row by a constant $\lambda$ has the effect of multiplying the determinant by $\lambda$.
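The strategy of Remark 2.4 is exactly how determinants are computed in practice. The sketch below (an aside added here, not part of the text; NumPy and partial pivoting are choices made for illustration) reduces a matrix to upper triangular form using row operations of the third kind plus row swaps, so the determinant is the signed product of the pivots.

```python
# A sketch of Remark 2.4: determinant via elimination to triangular form.
import numpy as np

def det_by_elimination(A):
    U = A.astype(float).copy()
    n = U.shape[0]
    sign = 1.0
    for j in range(n):
        p = j + np.argmax(np.abs(U[j:, j]))      # pivot row (partial pivoting)
        if U[p, j] == 0.0:
            return 0.0                           # zero column below: det = 0
        if p != j:
            U[[j, p]] = U[[p, j]]                # row swap changes the sign
            sign = -sign
        for i in range(j + 1, n):
            U[i] -= (U[i, j] / U[j, j]) * U[j]   # r_i -> r_i + lambda r_j: det unchanged
    return sign * np.prod(np.diag(U))

A = np.array([[1., 4., 3.], [1., 4., 5.], [2., 5., 1.]])
print(det_by_elimination(A), np.linalg.det(A))   # both give 6.0
```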

 We now want to check properties P5, P6.

Proof of P5: Recall that if $(i_1,\dots,i_n)$ is any permutation of $1,\dots,n$ then we can pick transpositions $\tau^{(1)},\dots,\tau^{(q)}$ such that $(i_1,\dots,i_n) = \tau^{(1)}\circ\tau^{(2)}\circ\cdots\circ\tau^{(q)}(1,\dots,n)$, and then the same sequence of transpositions in reverse order "undoes" this permutation: $(1,\dots,n) = \tau^{(q)}\circ\cdots\circ\tau^{(1)}(i_1,\dots,i_n)$. The permutation $(j_1,\dots,j_n) = \tau^{(q)}\circ\cdots\circ\tau^{(1)}(1,\dots,n)$ is called the "inverse" of $(i_1,\dots,i_n)$. Observe that $j_k$ is uniquely determined by the condition $i_{j_k} = k$ for each $k=1,\dots,n$, as one sees by taking $x_k = i_k$ in the identity $(x_{j_1},\dots,x_{j_n}) = \tau^{(q)}\circ\cdots\circ\tau^{(1)}(x_1,\dots,x_n)$. Observe also that $(-1)^{N(j_1,\dots,j_n)} = (-1)^q = (-1)^{N(i_1,\dots,i_n)}$; that is, the permutation $(i_1,\dots,i_n)$ and its inverse $(j_1,\dots,j_n)$ have the same index.

Now

$$\det A = \sum_{\text{distinct } i_1,\dots,i_n} (-1)^{N(i_1,\dots,i_n)}\,a_{i_11}a_{i_22}\cdots a_{i_nn},$$


and observe that the terms in the product $a_{i_11}a_{i_22}\cdots a_{i_nn}$ can be reordered so that the $j_k$-th factor (which is $a_{i_{j_k}j_k} = a_{kj_k}$) appears as the $k$-th factor. Thus,

$$a_{i_11}a_{i_22}\cdots a_{i_nn} = a_{1j_1}a_{2j_2}\cdots a_{nj_n},$$

and, as we mentioned above, the index $(-1)^{N(j_1,\dots,j_n)}$ of $(j_1,\dots,j_n)$ is the same as the index $(-1)^{N(i_1,\dots,i_n)}$ of $(i_1,\dots,i_n)$. We thus have

$$\det A = \sum_{\text{distinct } i_1,\dots,i_n} (-1)^{N(j_1,\dots,j_n)}\,a_{1j_1}a_{2j_2}\cdots a_{nj_n}.$$

Since the correspondence $(i_1,\dots,i_n)\to(j_1,\dots,j_n)$ between a permutation and its inverse is 1:1 (hence onto all $n!$ permutations), we see that $(j_1,\dots,j_n)$ visits each of the $n!$ permutations exactly once in the above sum over distinct $i_1,\dots,i_n$. That is, we have shown

$$\det A = \sum_{\text{distinct } j_1,\dots,j_n} (-1)^{N(j_1,\dots,j_n)}\,a_{1j_1}a_{2j_2}\cdots a_{nj_n}.$$

But now observe that what we have here is exactly the expression for the determinant of $A^{\mathrm T}$. Thus, $\det A = \det A^{\mathrm T}$ as claimed in P5.

Proof of P6: In view of P5 we need only prove the second identity here, because the first follows by applying the second to the transpose matrix. Furthermore, we need only prove the second identity in the case $j=n$, because the case $j=n$ implies the result for $j<n$ by virtue of property P2. Let $\alpha_1,\dots,\alpha_n$ be the columns of $A$. Since $\alpha_n = \sum_{i=1}^n a_{in}e_i$, we have

(1) $\det A = D(\alpha_1,\dots,\alpha_n) = D\bigl(\alpha_1,\dots,\alpha_{n-1},\sum_{i=1}^n a_{in}e_i\bigr) = \sum_{i=1}^n a_{in}\,D(\alpha_1,\dots,\alpha_{n-1},e_i),$

and by making $n-i$ interchanges of adjacent rows, and using the fact that (by P5) property P2 also applies with respect to interchanges of rows rather than columns, we can move the $i$-th row of the $n\times n$ matrix $(\alpha_1\,\alpha_2\cdots\alpha_{n-1}\,e_i)$ to the $n$-th position, at the expense of gaining a factor $(-1)^{n-i}$; and so, since $(-1)^{n-i} = (-1)^{n+i}$, we obtain

(2) $D(\alpha_1,\dots,\alpha_{n-1},e_i) = (-1)^{n+i}\det\begin{pmatrix} A_{in} & 0\\ r_i & 1\end{pmatrix},$

where $0$ is the zero column vector in $\mathbb{R}^{n-1}$ and $r_i$ is the row vector $(a_{i1},a_{i2},\dots,a_{i\,n-1})\in\mathbb{R}^{n-1}$ (i.e., $r_i$ is just the first $n-1$ entries of the $i$-th row of $A$).

Now by definition

$$\det\begin{pmatrix} A_{in} & 0\\ r_i & 1\end{pmatrix} = \sum_{\text{distinct } i_1,\dots,i_n\le n} (-1)^{N(i_1,\dots,i_n)}\,b_{i_11}b_{i_22}\cdots b_{i_nn},$$


where $b_{ij}$ is the element in the $i$-th row and $j$-th column of the matrix $\begin{pmatrix} A_{in} & 0\\ r_i & 1\end{pmatrix}$; notice that $b_{i_nn} = 0$ if $i_n\neq n$ and $=1$ if $i_n = n$, so this expression simplifies to

$$\det\begin{pmatrix} A_{in} & 0\\ r_i & 1\end{pmatrix} = \sum_{\text{distinct } i_1,\dots,i_{n-1}\le n-1} (-1)^{N(i_1,\dots,i_{n-1},n)}\,b_{i_11}b_{i_22}\cdots b_{i_{n-1}\,n-1},$$

which of course (since $N(i_1,\dots,i_{n-1},n) = N(i_1,\dots,i_{n-1})$) is just exactly the expression for the determinant of the $(n-1)\times(n-1)$ matrix obtained by deleting the $n$-th row and the $n$-th column of the matrix $\begin{pmatrix} A_{in} & 0\\ r_i & 1\end{pmatrix}$, i.e., it is the determinant $\det A_{in}$; that is,

(3) $\det\begin{pmatrix} A_{in} & 0\\ r_i & 1\end{pmatrix} = \det A_{in}.$

By combining (1), (2), (3) we have P6 as claimed. □

SECTION 2 EXERCISES

2.1 Calculate the determinant of

$$\begin{pmatrix} 10 & 11 & 12 & 13 & 426\\ 2000 & 2001 & 2002 & 2003 & 421\\ 2 & 2 & 1 & 0 & 419\\ 100 & 101 & 101 & 102 & 2000\\ 2003 & 2004 & 2005 & 2006 & 421 \end{pmatrix}$$

(show all row/column operations; no calculators).

2.2 Prove that the "Vandermonde determinant"

$$\det\begin{pmatrix} 1 & 1 & \cdots & 1\\ x_1 & x_2 & \cdots & x_n\\ x_1^2 & x_2^2 & \cdots & x_n^2\\ \vdots & \vdots & & \vdots\\ x_1^{n-1} & x_2^{n-1} & \cdots & x_n^{n-1} \end{pmatrix}$$

is the product of all possible factors $x_j-x_i$ with $1\le i<j\le n$; that is, it equals $\prod_{1\le i<j\le n}(x_j-x_i)$.

Hint: Consider using row operations to reduce the $n$-variable case to the $(n-1)$-variable case.

3 INVERSE OF A SQUARE MATRIX 

One very important theoretical consequence of the formula P6 of the previous section is that it enables us to prove that the inverse of an $n\times n$ matrix $A$ (i.e., the $n\times n$ matrix $A^{-1}$ such that $A^{-1}A = AA^{-1} = I$)


exists if $\det A\neq0$. First observe that according to the first identity in P6 we have

$$\sum_{k=1}^n(-1)^{i+k}a_{\ell k}\det A_{ik} = \det\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{\ell1} & a_{\ell2} & \cdots & a_{\ell n}\\ \vdots & \vdots & & \vdots\\ a_{\ell1} & a_{\ell2} & \cdots & a_{\ell n}\\ \vdots & \vdots & & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}$$

(the matrix shown is $A$ with its $i$-th row replaced by a copy of the $\ell$-th row of $A$), which is zero if $i\neq\ell$ (two equal rows). On the other hand, if $i=\ell$ then P6 says the expression on the left is just $\det A$. Thus, we have

$$\sum_{k=1}^n(-1)^{i+k}a_{\ell k}\det A_{ik} = \begin{cases}\det A & \text{if } i=\ell\\ 0 & \text{if } i\neq\ell,\end{cases}$$

or in other words (labeling the indices differently)

$$\sum_{k=1}^n(-1)^{j+k}a_{ik}\det A_{jk} = \begin{cases}\det A & \text{if } i=j\\ 0 & \text{if } i\neq j.\end{cases}$$

Notice that the expression on the left is just the element in the $i$-th row and $j$-th column of the matrix product $AM$, where $M$ is the $n\times n$ matrix with entry $m_{ij}$ in the $i$-th row and $j$-th column, where $m_{ij} = (-1)^{i+j}\det A_{ji}$; in other words,

$$M = \bigl((-1)^{i+j}\det A_{ji}\bigr),$$

and then the above, in matrix terminology, just says

$$AM = (\det A)\,I,$$

where $I$ is the $n\times n$ identity matrix. By using an almost identical argument, based on the second identity in P6 rather than the first, we also deduce

$$MA = (\det A)\,I,$$

where $M$ is the same matrix.

3.1 Terminology: The above matrix $M$ (which has the entry $(-1)^{i+j}\det A_{ji}$ in its $i$-th row and $j$-th column) is called the "adjoint matrix of $A$," and is denoted $\mathrm{adj}\,A$.

Because of the above we now have a complete understanding of when an $n\times n$ matrix $A$ has an inverse, i.e., when there is an $n\times n$ matrix $A^{-1}$ such that $A^{-1}A = AA^{-1} = I$.


3.2 Theorem. Let $A$ be an $n\times n$ matrix. Then $A$ has an inverse if and only if $\det A\neq0$, and in this case the inverse $A^{-1}$ is given explicitly by $A^{-1} = \frac{1}{\det A}\,\mathrm{adj}\,A$.

Proof: First assume $\det A\neq0$. Then by the formulae $(\mathrm{adj}\,A)\,A = A\,(\mathrm{adj}\,A) = (\det A)\,I$ proved above, we see that indeed the inverse exists and is given explicitly by the formula $\frac{1}{\det A}\,\mathrm{adj}\,A$.

Conversely, if $A^{-1}$ exists, then we have $AA^{-1} = I$, and hence taking determinants of each side and using properties P4 and P3 we obtain $(\det A)(\det A^{-1}) = 1$, which implies $\det A\neq0$, so the proof is complete. □

3.3 Remark: Notice that the last part of the above argument gives the additional information that $\det A^{-1} = \frac{1}{\det A}$.

Recall (see the remark after the statement of properties P1–P6) that we can compute $\det A$ by using elementary row operations, and that the first type of elementary row operation (interchanging any 2 rows) only changes the sign, the second type (multiplying a row by a nonzero scalar $\lambda$) multiplies the determinant by $\lambda$ ($\neq0$), and the third type (replacing the $i$-th row by the $i$-th row plus any multiple of the $j$-th row with $i\neq j$) does not change the value of the determinant at all. Thus, none of the elementary row operations can change whether or not the matrix has zero determinant. In particular, this means that $\det A\neq0 \iff \det\mathrm{rref}\,A\neq0$. On the other hand, since $A$ is $n\times n$ (i.e., we are in the case $m=n$), then, from our previous discussion of $\mathrm{rref}\,A$ in Sec. 9 of Ch. 1, we know that either $\mathrm{rref}\,A = I$ (the $n\times n$ identity matrix) or $\mathrm{rref}\,A$ has a zero row (in which case it trivially has determinant $=0$). We also recall (again from Sec. 9 of Ch. 1) that (since we are in the case $m=n$) $\mathrm{rref}\,A$ has a zero row if and only if the null space $N(A)$ is nontrivial (i.e., if and only if the homogeneous system $Ax=0$ has a nontrivial solution), which in turn is true if and only if $A$ has rank less than $n$.

 Thus, combining these facts with Thm. 3.2 above, we have:

3.4 Theorem. If $A$ is an $n\times n$ matrix, then

$$A^{-1} \text{ exists} \iff \det A\neq0 \iff \mathrm{rref}\,A = I \iff N(A) = \{0\} \iff \mathrm{rank}\,A = n.$$

SECTION 3 EXERCISES

3.1 Suppose $A, B$ are $n\times n$ matrices and $AB = I$. Prove that then $\det A\neq0$ and $B = (\det A)^{-1}\,\mathrm{adj}\,A$. (In particular, $AB = I \Rightarrow BA = I$ and $B$ is the unique inverse $(\det A)^{-1}\,\mathrm{adj}\,A$.)

3.2 If $A, B$ are, respectively, $m\times n$ and $n\times p$ matrices, prove

(a) $(AB)^{\mathrm T} = B^{\mathrm T}A^{\mathrm T}$, and (b) $m=n=p$ and $A, B$ invertible $\Rightarrow AB$ invertible and $(AB)^{-1} = B^{-1}A^{-1}$.


3.3 Let $A = (a_{ij})$ be an $n\times n$ matrix with $\det A\neq0$, and $b = (b_1,\dots,b_n)^{\mathrm T}\in\mathbb{R}^n$. Prove that the solution $x = (x_1,\dots,x_n)^{\mathrm T} = A^{-1}b$ of the system $Ax=b$ is given by the formula

$$x_i = \frac{\det A_i}{\det A},\quad i=1,\dots,n,$$

where $A_i$ is obtained from $A$ by replacing the $i$-th column of $A$ by the given vector $b$.

Hint: Use the formula $A^{-1} = (\det A)^{-1}\bigl((-1)^{i+j}\det A_{ji}\bigr)$ proved above to calculate the $i$-th component of $A^{-1}b$.
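Both the adjoint formula of Thm. 3.2 and the solution formula of Exercise 3.3 (Cramer's rule) are straightforward to check numerically. The sketch below (an aside added for illustration, not part of the text; NumPy and the sample matrix are assumptions) builds $\mathrm{adj}\,A$ entry by entry and compares the results against library routines.

```python
# A sketch of A^{-1} = (det A)^{-1} adj A (Thm. 3.2) and Cramer's rule (Ex. 3.3).
import numpy as np

def adj(A):
    n = A.shape[0]
    M = np.empty_like(A, dtype=float)
    for i in range(n):
        for j in range(n):
            # entry (-1)^{i+j} det A_{ji}: delete row j and column i of A
            minor = np.delete(np.delete(A, j, axis=0), i, axis=1)
            M[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return M

A = np.array([[1., 4., 3.], [1., 4., 5.], [2., 5., 1.]])
b = np.array([1., 2., 3.])
d = np.linalg.det(A)

print(adj(A) / d)                         # agrees with np.linalg.inv(A)

x_cramer = []
for i in range(3):
    Ai = A.copy()
    Ai[:, i] = b                          # replace i-th column of A by b
    x_cramer.append(np.linalg.det(Ai) / d)
print(x_cramer, np.linalg.solve(A, b))    # the two solutions agree
```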

4 COMPUTING THE INVERSE

By Exercise 3.1 above, an $n\times n$ matrix $B$ is the inverse of the $n\times n$ matrix $A$ if and only if $AB = I$, where $I$ is the $n\times n$ identity matrix, i.e., the matrix with $j$-th column $e_j$. Since the $j$-th column of $AB$ is $A\beta_j$, where $\beta_j$ is the $j$-th column of $B$, to find the inverse of $A$ we thus have to solve

4.1 $Ax = e_j,\quad j=1,\dots,n.$

In view of the discussion in Sec. 10 of Ch. 1 we can attempt to do this by using elementary row operations on the augmented matrix $A\,|\,e_j$ to give the new augmented matrix

4.2 $\mathrm{rref}\,A\,|\,\beta_j,$

and by Thm. 3.4 above we know that $A^{-1}$ exists if and only if $\mathrm{rref}\,A = I$, in which case the augmented system in 4.2 is simply

4.3 $I\,|\,\beta_j,\quad j=1,\dots,n,$

and hence if $A^{-1}$ exists then the solution of 4.1, which gives the $j$-th column of $A^{-1}$, is exactly the vector which appears on the right side of 4.2 (and in that case $\mathrm{rref}\,A = I$). We can of course solve all $j$ equations in 4.1 simultaneously by working with the $n\times2n$ augmented matrix $A\,|\,I$ (because the $j$-th column of $I$ is $e_j$).

Thus, to summarize, we can find the inverse by starting with the $n\times2n$ augmented matrix $A\,|\,I$ and applying elementary row operations, which produce $\mathrm{rref}\,A = I$ as the first $n$ columns, and produce the $n\times2n$ augmented matrix $I\,|\,B$. Then $B$ is the required inverse $A^{-1}$. This process will also tell us when $A^{-1}$ does not exist, because in that case $\mathrm{rref}\,A$ will have at least one row of zeros (and $\det A = 0$).

For example, if $A$ is the $3\times3$ matrix $\begin{pmatrix}1&4&3\\ 1&4&5\\ 2&5&1\end{pmatrix}$, then we compute the inverse (and at the same time check that the inverse exists) by looking at the $3\times6$ augmented matrix

$$\left(\begin{array}{ccc|ccc} 1&4&3&1&0&0\\ 1&4&5&0&1&0\\ 2&5&1&0&0&1 \end{array}\right),$$


and using elementary row operations as follows:

$$\left(\begin{array}{ccc|ccc} 1&4&3&1&0&0\\ 1&4&5&0&1&0\\ 2&5&1&0&0&1 \end{array}\right) \xrightarrow{\;r_2\to r_2-r_1,\ r_3\to r_3-2r_1\;} \left(\begin{array}{ccc|ccc} 1&4&3&1&0&0\\ 0&0&2&-1&1&0\\ 0&-3&-5&-2&0&1 \end{array}\right)$$

$$\xrightarrow{\;r_2\to \frac12 r_2,\ r_3\to -r_3\;} \left(\begin{array}{ccc|ccc} 1&4&3&1&0&0\\ 0&0&1&-1/2&1/2&0\\ 0&3&5&2&0&-1 \end{array}\right) \xrightarrow{\;r_1\to r_1-3r_2,\ r_3\to r_3-5r_2\;} \left(\begin{array}{ccc|ccc} 1&4&0&5/2&-3/2&0\\ 0&0&1&-1/2&1/2&0\\ 0&3&0&9/2&-5/2&-1 \end{array}\right)$$

$$\xrightarrow{\;r_3\to\frac13 r_3\;} \left(\begin{array}{ccc|ccc} 1&4&0&5/2&-3/2&0\\ 0&0&1&-1/2&1/2&0\\ 0&1&0&9/6&-5/6&-1/3 \end{array}\right) \xrightarrow{\;r_1\to r_1-4r_3\;} \left(\begin{array}{ccc|ccc} 1&0&0&-21/6&11/6&4/3\\ 0&0&1&-1/2&1/2&0\\ 0&1&0&9/6&-5/6&-1/3 \end{array}\right)$$

$$\xrightarrow{\;r_2\leftrightarrow r_3\;} \left(\begin{array}{ccc|ccc} 1&0&0&-21/6&11/6&4/3\\ 0&1&0&9/6&-5/6&-1/3\\ 0&0&1&-1/2&1/2&0 \end{array}\right).$$

Therefore, $A^{-1} = \begin{pmatrix} -7/2 & 11/6 & 4/3\\ 3/2 & -5/6 & -1/3\\ -1/2 & 1/2 & 0 \end{pmatrix}$.
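The worked example above can be verified in a few lines. The sketch below (an illustrative aside, not part of the text; NumPy is an assumption) confirms both that $AA^{-1} = I$ and that the row-reduction answer agrees with a library inverse.

```python
# A quick check of the worked example: the matrix found by row reduction
# is indeed A^{-1}.
import numpy as np

A    = np.array([[1., 4., 3.], [1., 4., 5.], [2., 5., 1.]])
Ainv = np.array([[-7/2, 11/6,  4/3],
                 [ 3/2, -5/6, -1/3],
                 [-1/2,  1/2,  0. ]])

print(np.allclose(A @ Ainv, np.eye(3)))      # True
print(np.allclose(Ainv, np.linalg.inv(A)))   # True
```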

SECTION 4 EXERCISES

4.1 Calculate the inverse $A^{-1}$ of $A$, if $A$ is the $4\times4$ matrix

$$\begin{pmatrix} 1&1&1&1\\ 0&1&1&1\\ 1&0&2&3\\ 0&0&1&2 \end{pmatrix}.$$

5 ORTHONORMAL BASIS AND GRAM-SCHMIDT

Let $v_1,\dots,v_N$ be vectors in $\mathbb{R}^n$. We say that the vectors are orthonormal if $v_i\cdot v_j = \delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta ($=1$ when $i=j$ and $=0$ when $i\neq j$). Thus, $v_1,\dots,v_N$ orthonormal means that each vector has length 1 and the vectors are mutually orthogonal (i.e., each vector is orthogonal to all the others).

If $V$ is a nontrivial subspace of $\mathbb{R}^n$ then an orthonormal basis for $V$ is a basis for $V$ consisting of an orthonormal set of vectors.


5.1 Remark: Observe that orthonormal vectors $v_1,\dots,v_N$ are automatically l.i., because

$$\sum_{i=1}^N c_iv_i = 0 \;\Rightarrow\; c_j = 0\quad\forall\,j=1,\dots,N,$$

as we see by taking the dot product of each side of the identity $\sum_{i=1}^N c_iv_i = 0$ with the vector $v_j$.

5.2 Lemma. If $V$ is a nontrivial subspace of $\mathbb{R}^n$ of dimension $k$ and if $u_1,\dots,u_k$ is an orthonormal basis for $V$, then the orthogonal projection $P_V$ of $\mathbb{R}^n$ onto $V$ (as in Thm. 8.3 of Ch. 1) is given explicitly by

$$P_V(x) = \sum_{i=1}^k (x\cdot u_i)\,u_i,\quad x\in\mathbb{R}^n.$$

Proof: Define $Q(x) = \sum_{i=1}^k (x\cdot u_i)\,u_i$ for $x\in\mathbb{R}^n$, and observe that then $Q(x)\in V$ by definition. Also, since $u_j\cdot u_i = \delta_{ij}$, $u_j\cdot(x-Q(x)) = u_j\cdot x - x\cdot u_j = 0$, so that $x-Q(x)$ is orthogonal to each $u_j$, hence orthogonal to $\mathrm{span}\{u_1,\dots,u_k\} = V$. That is, $Q(x)\in V$ and $x-Q(x)\in V^\perp$, and hence $Q(x)$ satisfies the two conditions which according to Thm. 8.3 of Ch. 1 uniquely determine the orthogonal projection $P_V$. Hence, $Q = P_V$ and the proof is complete. □

Our main result here is the following.

5.3 Theorem. Let $v_1,\dots,v_k$ be any l.i. set of vectors in $\mathbb{R}^n$. Then there is an orthonormal set of vectors $u_1,\dots,u_k$ such that $\mathrm{span}\{u_1,\dots,u_j\} = \mathrm{span}\{v_1,\dots,v_j\}$ for each $j=1,\dots,k$, and in fact $u_1,\dots,u_j$ is an orthonormal basis for $\mathrm{span}\{v_1,\dots,v_j\}$ for each $j=1,\dots,k$.

Proof ("Gram-Schmidt orthogonalization"): We proceed inductively to construct orthonormal $u_1,u_2,\dots,u_k$ such that $\mathrm{span}\{u_1,\dots,u_j\} = \mathrm{span}\{v_1,\dots,v_j\}$ for each $j$ by the following procedure, in which we use the notation that $P_j$ denotes the orthogonal projection of $\mathbb{R}^n$ onto $\mathrm{span}\{v_1,\dots,v_j\}$ for each $j=1,\dots,k$:

$$u_1 = \|v_1\|^{-1}v_1$$

and, assuming $j\in\{1,\dots,k-1\}$ and that we have selected orthonormal $u_1,\dots,u_j$ with $\mathrm{span}\{u_1,\dots,u_j\} = \mathrm{span}\{v_1,\dots,v_j\}$, we define

(1) $u_{j+1} = \|v_{j+1}-P_j(v_{j+1})\|^{-1}\bigl(v_{j+1}-P_j(v_{j+1})\bigr).$

Observe that this makes sense because $v_{j+1}-P_j(v_{j+1})\neq0$ (because by definition $P_j(v_{j+1})$ is a linear combination of $v_1,\dots,v_j$, and so $v_{j+1}-P_j(v_{j+1})$ is a nontrivial linear combination of $v_1,\dots,v_{j+1}$ and hence cannot be zero because $v_1,\dots,v_{j+1}$ are l.i.).

Then by definition, $u_{j+1}$ is a unit vector, and if $i=1,\dots,j$ then we have

$$u_i\cdot u_{j+1} = c\bigl(u_i\cdot v_{j+1} - u_i\cdot P_j(v_{j+1})\bigr),\quad c = \|v_{j+1}-P_j(v_{j+1})\|^{-1},$$


and by Thm. 8.3(ii) of Ch. 1 the right side here is $c\bigl(u_i\cdot v_{j+1} - P_j(u_i)\cdot v_{j+1}\bigr) = c\bigl(u_i\cdot v_{j+1} - u_i\cdot v_{j+1}\bigr) = 0$, so $u_{j+1}$ is orthogonal to $u_1,\dots,u_j$, hence (since the inductive hypothesis guarantees that $u_1,\dots,u_j$ are orthonormal) $u_1,\dots,u_{j+1}$ are orthonormal and by 5.1 are l.i. vectors in the $(j+1)$-dimensional space $\mathrm{span}\{v_1,\dots,v_{j+1}\}$, and so are a basis for $\mathrm{span}\{v_1,\dots,v_{j+1}\}$. Thus, in particular, $\mathrm{span}\{u_1,\dots,u_{j+1}\} = \mathrm{span}\{v_1,\dots,v_{j+1}\}$ and the proof of 5.3 is complete. □

5.4 Remark: Notice that in view of Lem. 5.2 the inductive definition (1) in the above proof can be written

$$u_{j+1} = \Bigl\|v_{j+1}-\sum_{i=1}^j(v_{j+1}\cdot u_i)\,u_i\Bigr\|^{-1}\Bigl(v_{j+1}-\sum_{i=1}^j(v_{j+1}\cdot u_i)\,u_i\Bigr).$$

The procedure of getting orthonormal vectors in this way is known as Gram-Schmidt orthogonalization.
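The formula in Rem. 5.4 translates directly into code. The following sketch (an aside for illustration, not part of the text; NumPy and the choice of test vectors from Exercise 5.2 are assumptions, and no numerical safeguards such as re-orthogonalization are included) implements the inductive step verbatim.

```python
# A direct transcription of the inductive definition in Rem. 5.4.
import numpy as np

def gram_schmidt(vs):
    """vs: list of l.i. vectors v_1,...,v_k. Returns orthonormal u_1,...,u_k."""
    us = []
    for v in vs:
        w = v - sum((v @ u) * u for u in us)   # v_{j+1} - P_j(v_{j+1})
        us.append(w / np.linalg.norm(w))       # normalize; w != 0 since vs are l.i.
    return us

# Example: the three vectors of Exercise 5.2.
vs = [np.array([1., 0., 0., 1.]), np.array([1., 1., 0., 0.]),
      np.array([0., 0., 1., 1.])]
us = gram_schmidt(vs)
# The Gram matrix of the output is the identity, confirming orthonormality:
print(np.round(np.array([[u @ w for w in us] for u in us]), 12))
```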

Finally, we want to introduce the important notion of orthogonal matrix.

5.5 Definition: An $n\times n$ matrix $Q = (q_{ij})$ is orthogonal if the columns $q_1,\dots,q_n$ of $Q$ are orthonormal.

Observe that by definition $Q$ orthogonal $\Rightarrow Q^{\mathrm T}Q = I$, because by definition of matrix multiplication the entry of $Q^{\mathrm T}Q$ in the $i$-th row and $j$-th column is $q_i\cdot q_j$, which is $\delta_{ij}$ because $q_1,\dots,q_n$ are orthonormal. Notice that the fact that $Q^{\mathrm T}Q = I$ means that $\det(Q^{\mathrm T}Q) = 1$, and hence by properties P4, P5 we see that $(\det Q)^2 = 1$, or in other words $\det Q = \pm1$. In particular, $\det Q\neq0$, so $Q^{-1}$ exists by Thm. 3.4. Indeed, since $Q^{\mathrm T}Q = I$ we see that $Q^{-1} = Q^{\mathrm T}$ and hence we must automatically have $QQ^{\mathrm T} = I$. That is (since $Q = (Q^{\mathrm T})^{\mathrm T}$) we have

$$Q \text{ orthogonal} \iff Q^{\mathrm T} \text{ orthogonal},$$

so, in fact, $Q$ orthogonal implies that the rows of $Q$ are orthonormal.

SECTION 5 EXERCISES

5.1 If $S$ is the two-dimensional subspace of $\mathbb{R}^4$ spanned by the vectors $(1,1,0,0)^{\mathrm T}$, $(0,0,1,1)^{\mathrm T}$, find an orthonormal basis for $S$ and find the matrix of the orthogonal projection of $\mathbb{R}^4$ onto $S$.

5.2 If $S$ is the subspace of $\mathbb{R}^4$ spanned by the vectors $(1,0,0,1)^{\mathrm T}$, $(1,1,0,0)^{\mathrm T}$, $(0,0,1,1)^{\mathrm T}$, find an orthonormal basis for $S$, and find the matrix of the orthogonal projection of $\mathbb{R}^4$ onto $S$.

6 MATRIX REPRESENTATIONS OF LINEAR TRANSFORMATIONS

Suppose $V$ is a nontrivial subspace of $\mathbb{R}^n$ and $T: V\to V$ is linear. Thus, $T(\lambda x+\mu y) = \lambda T(x)+\mu T(y)$ for all $x,y\in V$ and for all $\lambda,\mu\in\mathbb{R}$. We showed in Sec. 6 of Ch. 1 that, in case $V = \mathbb{R}^n$, $T$ is


represented by the $n\times n$ matrix $A$ with $j$-th column $T(e_j)$, in the sense that $T(x)\equiv Ax$ for all $x\in\mathbb{R}^n$. A similar result holds in the present more general case when $V$ is an arbitrary nontrivial subspace of $\mathbb{R}^n$.

 To describe this result we first need to introduce a little terminology.

6.1 Terminology: Let $v_1,\dots,v_k$ be a basis for $V$. Then each $x\in V$ can be uniquely represented as $x = \sum_{j=1}^k x_jv_j$. The vector $\xi = (x_1,\dots,x_k)^{\mathrm T}\in\mathbb{R}^k$ is referred to as the coordinates of $x$ relative to the given basis $v_1,\dots,v_k$.

Observe that x = Σ_{j=1}^{k} x_j v_j ⇒ T(x) = Σ_{j=1}^{k} x_j T(v_j) by linearity of T, and since T(v_j) ∈ V and v_1, . . . , v_k is a basis for V we can express T(v_j) as a unique linear combination of v_1, . . . , v_k; thus

6.2   T(v_j) = Σ_{i=1}^{k} a_{ij} v_i   for some unique a_{ij} ∈ R .

The k × k matrix A = (a_{ij}) is called the matrix of T relative to the given basis v_1, . . . , v_k.

Observe that, still with x = Σ_{j=1}^{k} x_j v_j, we then have T(x) = Σ_{j=1}^{k} x_j T(v_j) = Σ_{j=1}^{k} Σ_{i=1}^{k} a_{ij} x_j v_i = Σ_{i=1}^{k} (Σ_{j=1}^{k} a_{ij} x_j) v_i. Thus if x has coordinates ξ = (x_1, . . . , x_k)^T relative to v_1, . . . , v_k, then T(x) has coordinates Aξ relative to the same basis v_1, . . . , v_k, where A = (a_{ij}) is the matrix of T relative to the given basis v_1, . . . , v_k as in 6.2.
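Concretely, the columns of A can be computed by solving linear systems. A small Python sketch (our own hypothetical example, assuming NumPy), taking V = R^2 with the non-standard basis v_1 = (1, 1)^T, v_2 = (1, −1)^T:

    import numpy as np

    M = np.array([[2., 1.],
                  [1., 2.]])                     # T(x) = Mx on V = R^2
    v1, v2 = np.array([1., 1.]), np.array([1., -1.])
    B = np.column_stack([v1, v2])                # columns are the basis vectors

    # Column j of A holds the coordinates of T(v_j) relative to v_1, v_2,
    # i.e. 6.2 read as the linear system B a_j = M v_j.
    A = np.linalg.solve(B, M @ B)
    print(A)     # diag(3, 1): here v_1, v_2 happen to be eigenvectors of M (see Sec. 7)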

7 EIGENVALUES AND THE SPECTRAL THEOREM

Let A = (a_{ij}) be an n × n (real) matrix. We begin with the definition of eigenvalue of A.

Observe that the expression det(A − λI) is a degree n polynomial in the variable λ, and the coefficient of the top degree term λ^n is (−1)^n. Thus, det(A − λI) = (−1)^n (λ^n + Σ_{j=0}^{n−1} a_j λ^j) for some real numbers a_0, . . . , a_{n−1}.

One can check this directly in case n = 2 (in which case det(A − λI) = (a_{11} − λ)(a_{22} − λ) − a_{12} a_{21} = λ^2 − (a_{11} + a_{22}) λ + (a_{11} a_{22} − a_{12} a_{21})), and it can easily be checked, in general, by induction on n.

Notice that then the fundamental theorem of algebra tells us that there are (possibly complex) numbers λ_1, . . . , λ_n such that

7.1   det(A − λI) ≡ (−1)^n (λ − λ_1) · · · (λ − λ_n) ,   ∀ λ .

7.2 Definition: The numbers λ_1, . . . , λ_n (which are the roots of the equation det(A − λI) = 0) are called eigenvalues of the matrix A.

 There are two important properties of such eigenvalues.


7.3 Lemma. Let λ_1, . . . , λ_n be the eigenvalues of A as in 7.2 above. Then:

(i) det A = λ_1 · · · λ_n (i.e., det A = the product of the eigenvalues of A);

(ii) for any λ ∈ R: λ an eigenvalue of A ⇐⇒ ∃ v ∈ R^n \ {0} with Av = λv.

7.4 Remark: Any v ∈ R^n \ {0} with Av = λv as in (ii) above is called an eigenvector of A corresponding to the eigenvalue λ. Notice we only here consider eigenvectors corresponding to real eigenvalues λ; it would also be possible to consider eigenvectors corresponding to complex eigenvalues λ, although in that case the corresponding eigenvectors would be in C^n \ {0} rather than in R^n \ {0}.

Proof of 7.3: First note that (i) is proved simply by setting λ = 0 in the identity 7.1.

To prove (ii), simply observe that, for any real λ, det(A − λI) = 0 ⇐⇒ N(A − λI) ≠ {0} ⇐⇒ ∃ v ∈ R^n \ {0} with (A − λI)v = 0, by Thm. 3.4 of the present chapter.
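A quick numerical illustration of 7.3 (our own, in Python with NumPy; np.linalg.eigvals returns the roots of the characteristic polynomial):

    import numpy as np

    A = np.array([[2., 1.],
                  [0., 3.]])
    lam = np.linalg.eigvals(A)             # roots of det(A - lambda*I) = 0
    print(sorted(lam))                     # [2.0, 3.0]
    print(np.prod(lam), np.linalg.det(A))  # both 6.0, illustrating 7.3(i)

    # 7.3(ii): for lambda = 3, A - 3I has the nontrivial null vector v = (1, 1)^T.
    v = np.array([1., 1.])
    print(np.allclose(A @ v, 3 * v))       # True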

 We can now state and prove the spectral theorem.

7.5 Theorem (Spectral Theorem.) Let A = (a_{ij}) be a symmetric n × n matrix. Then there is an orthonormal basis v_1, . . . , v_n for R^n consisting of eigenvectors of A.

Proof: The proof is by induction on n. It is trivial in the case n = 1, because in that case the matrix A is just a real number and R^n is the real line R. So assume n ≥ 2, and as an inductive hypothesis assume the theorem is true with n − 1 in place of n.

Let A(x) be the quadratic form given by the n × n matrix A. Thus,

A(x) = Σ_{i,j=1}^{n} a_{ij} x_i x_j ,

and observe that we can write

(1)   A(x) = ‖x‖^2 A(x̂) ,   x̂ = ‖x‖^{-1} x ∈ S^{n−1} ,   x ∈ R^n \ {0} .

By Thm. 2.6 of Ch. 2 we know that A|_{S^{n−1}} attains its minimum value m at some point v_1 ∈ S^{n−1}, so, by (1), for x ∈ R^n \ {0}

(2)   ‖x‖^{-2} A(x) = A(x̂) ≥ m = ‖x‖^{-2} A(x)|_{x=v_1} ,

and hence the function ‖x‖^{-2} A(x) : R^n \ {0} → R actually attains a minimum at the point v_1 and hence (see the discussion following 7.1 of Ch. 2) must have zero gradient at this point. But direct computation shows that the i-th partial derivative of the function ‖x‖^{-2} A(x) is

2‖x‖^{-2} ( Σ_{j=1}^{n} a_{ij} x_j − ‖x‖^{-2} ( Σ_{k,ℓ=1}^{n} a_{kℓ} x_k x_ℓ ) x_i ) ,


so at x = v_1 = (v_{11}, v_{12}, . . . , v_{1n})^T we get

Σ_{j=1}^{n} a_{ij} v_{1j} − m v_{1i} = 0 ,   i = 1, . . . , n ,

which in matrix terms says

A v_1 = m v_1 ,

so, by 7.3(ii), we have shown that v_1 is an eigenvector corresponding to the eigenvalue m.

Let T be the linear transformation determined by the matrix A, so that

T(x) = Ax ,   x ∈ R^n ,

and let V = span{v_1}. Then dim V = 1, and by Thm. 8.1(iii) we know that dim V^⊥ = n − 1, and it is a straightforward exercise (Exercise 7.1(b) below) to check that T : V^⊥ → V^⊥. So let w_1, . . . , w_{n−1} be an orthonormal basis for V^⊥ and let A_0 = (a^{(0)}_{ij}) be the (n − 1) × (n − 1) matrix of T|_{V^⊥} relative to the basis w_1, . . . , w_{n−1} as discussed in Sec. 6 above. Observe that then by 6.2 we have

T(w_j) = Σ_{k=1}^{n−1} a^{(0)}_{kj} w_k ,   j = 1, . . . , n − 1 ,

and by taking the dot product of each side with w_i we get (using Exercise 7.1(a) below)

a^{(0)}_{ij} = w_i · T(w_j) = w_j · T(w_i) = a^{(0)}_{ji} ,

so that a^{(0)}_{ij} = a^{(0)}_{ji}. Thus, (a^{(0)}_{ij}) is an (n − 1) × (n − 1) symmetric matrix, and we can use the

inductive hypothesis to find an orthonormal basis u_2, . . . , u_n of R^{n−1} and corresponding λ_2, . . . , λ_n such that

A_0 u_j = λ_j u_j ,   j = 2, . . . , n ,

so, with u_j = (u_{j1}, . . . , u_{j,n−1})^T,

(3)   Σ_{i=1}^{n−1} a^{(0)}_{ki} u_{ji} = λ_j u_{jk} ,   j = 2, . . . , n ,   k = 1, . . . , n − 1 .

We then let v_j = Σ_{i=1}^{n−1} u_{ji} w_i for j = 2, . . . , n and observe that by (3)

A v_j = Σ_{i=1}^{n−1} u_{ji} A w_i = Σ_{i=1}^{n−1} u_{ji} T(w_i) = Σ_{i,k=1}^{n−1} u_{ji} a^{(0)}_{ki} w_k = λ_j Σ_{k=1}^{n−1} u_{jk} w_k = λ_j v_j ,

for j = 2, . . . , n, so the v_j are eigenvectors of A for j = 2, . . . , n. Finally, observe that since by construction v_2, . . . , v_n ∈ V^⊥ and v_1 ∈ V, we have that v_1, . . . , v_n are an orthonormal set of vectors in R^n, and hence (since orthonormal vectors are automatically l.i. by Rem. 5.1 of the present chapter) v_1, . . . , v_n are an orthonormal basis for R^n.
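The conclusion is easy to verify numerically. A short Python sketch (our own illustration, assuming NumPy; np.linalg.eigh is a library routine for symmetric matrices, not the inductive construction of the proof):

    import numpy as np

    A = np.array([[2., 1., 0.],
                  [1., 2., 1.],
                  [0., 1., 2.]])                 # symmetric 3 x 3

    lam, Q = np.linalg.eigh(A)                   # eigenvalues, orthonormal eigenvectors
    print(np.allclose(Q.T @ Q, np.eye(3)))       # True: columns of Q are orthonormal
    print(np.allclose(A @ Q, Q * lam))           # True: A v_j = lambda_j v_j
    print(np.round(Q.T @ A @ Q, 10))             # diagonal matrix, cf. Exercise 7.2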


SECTION 7 EXERCISES

7.1 Let T : R^n → R^n be a linear transformation with the property that the matrix of T (i.e., the matrix Q such that T(x) = Qx for all x ∈ R^n) is symmetric (i.e., Q = (q_{ij}) with q_{ij} = q_{ji}). Prove:

(a) x · T(y) = y · T(x) for all x, y ∈ R^n.

(b) If v is a nonzero vector such that T(v) = λv for some λ ∈ R, and if ℓ is the one-dimensional subspace spanned by v, then T(ℓ^⊥) ⊂ ℓ^⊥.

7.2 Let A be an n × n symmetric matrix. Prove there is an n × n orthogonal matrix Q (i.e., Q^T Q = I) such that Q^T A Q is a diagonal matrix.

Hint: The Spectral Theorem proved above says that there is an orthonormal basis v_1, . . . , v_n for R^n such that A v_j = λ_j v_j for each j (where λ_j ∈ R for each j).


CHAPTER 4

More Analysis in R^n

1 CONTRACTION MAPPING PRINCIPLE

First we define the notion of contraction. Given a mapping f : E → R^n, where E ⊂ R^n, we say that f is a contraction if there is a constant θ ∈ (0, 1) such that

1.1   ‖f(x) − f(y)‖ ≤ θ ‖x − y‖ ,   ∀ x, y ∈ E .

Theorem (Contraction Mapping Theorem.) Let E be a closed subset of R^n and f a contraction mapping of E into E (thus, f(E) ⊂ E and f satisfies 1.1). Then f has a fixed point, i.e., there is a z ∈ E such that f(z) = z.

Remark: The proof we give below is constructive, i.e., not only will the proof show that there is a fixed point z, but at the same time it actually gives a practical method for finding (or at least approximating arbitrarily closely) such a point z.

Proof of the Contraction Mapping Theorem: We inductively construct a sequence {x_k}_{k=0,1,2,...} of points of E as follows:

(i) Take x_0 ∈ E arbitrary.

(ii) Assuming that k ≥ 1 and that x_0, . . . , x_{k−1} are already chosen, define

x_k = f(x_{k−1}) .

Observe that then we have, using 1.1,

‖x_{k+1} − x_k‖ = ‖f(x_k) − f(x_{k−1})‖ ≤ θ ‖x_k − x_{k−1}‖ ,   k ≥ 1 ,

and by mathematical induction we have

‖x_{k+1} − x_k‖ ≤ θ^k ‖x_1 − x_0‖ ,   k ≥ 1 ,

and so if ℓ > k ≥ 1 we have

‖x_ℓ − x_k‖ = ‖Σ_{j=k}^{ℓ−1} (x_{j+1} − x_j)‖ ≤ Σ_{j=k}^{ℓ−1} ‖x_{j+1} − x_j‖ ≤ (Σ_{j=k}^{ℓ−1} θ^j) ‖x_1 − x_0‖ = (Σ_{j=k}^{ℓ−1} θ^j) ‖f(x_0) − x_0‖ .

On the other hand, Σ_{j=k}^{∞} θ^j is a convergent geometric series with first term θ^k and common ratio θ, so it has sum θ^k/(1 − θ), and hence the above inequality gives

(∗)   ‖x_ℓ − x_k‖ ≤ (1 − θ)^{-1} ‖f(x_0) − x_0‖ θ^k ,   ℓ > k ≥ 1 .


In particular, with k = 1 we have

‖x_ℓ‖ = ‖x_ℓ − x_1 + x_1‖ ≤ ‖x_ℓ − x_1‖ + ‖x_1‖ ≤ (1 − θ)^{-1} ‖f(x_0) − x_0‖ + ‖f(x_0)‖

for all ℓ > 1, so the sequence {x_k}_{k=1,2,...} is bounded, and by the Bolzano-Weierstrass theorem there is a convergent subsequence {x_{k_j}}_{j=1,2,...}. Since E is closed, the limit lim_{j→∞} x_{k_j} is a point of E.

Notice that we can also take ℓ = k_j in (∗), giving

(∗∗)   ‖x_{k_j} − x_k‖ ≤ (1 − θ)^{-1} ‖f(x_0) − x_0‖ θ^k ,   k_j > k ≥ 1 ,

and hence, letting z = lim_{j→∞} x_{k_j} and taking limits in (∗∗), we see that

‖z − x_k‖ ≤ (1 − θ)^{-1} ‖f(x_0) − x_0‖ θ^k ,   ∀ k ≥ 1 ,

so that, in fact (since θ^k → 0 as k → ∞), the whole sequence x_k (and not just the subsequence x_{k_j}) converges to z. Then by continuity of f at the point z we have lim f(x_k) = f(z). But, on the other hand, f(x_k) = x_{k+1}, so we must also have lim f(x_k) = lim x_{k+1} = z. Thus, f(z) = z.
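As a quick illustration of the constructive nature of the proof, here is a minimal Python sketch (our own, not part of the text); the iteration is exactly the x_k = f(x_{k−1}) of the proof:

    import numpy as np

    def fixed_point(f, x0, tol=1e-12, max_iter=1000):
        # Iterate x_k = f(x_{k-1}) as in the proof; by (*) the error after k
        # steps is at most (1 - theta)^{-1} ||f(x0) - x0|| theta^k.
        x = x0
        for _ in range(max_iter):
            x_next = f(x)
            if abs(x_next - x) <= tol:
                return x_next
            x = x_next
        return x

    # Example: f(x) = cos x maps [0.5, 1] into itself and is a contraction
    # there (|f'(x)| = |sin x| <= sin 1 < 1), so a unique fixed point exists.
    z = fixed_point(np.cos, 1.0)
    print(z, np.cos(z) - z)               # z = 0.7390851..., residual ~ 0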

SECTION 1 EXERCISES

1.1 Show that the system of two nonlinear equations

x^2 − y^2 + 4x − 1 = 0
((1/2) x^2 + y^2) x + 4y + 1 = 0

has a solution in the disk x^2 + y^2 ≤ 1.

Hint: Define f(x, y) = (x + (1/4)(x^2 − y^2) − 1/4, y + (1/4)((1/2) x^2 + y^2) x + 1/4)^T and show that F(x, y) = (x, y)^T − f(x, y) has a fixed point in the disk.

2 INVERSE FUNCTION THEOREM

We here prove the inverse function theorem, which guarantees that a C^1 function f : U → R^n is locally C^1 invertible near points where the Jacobian matrix is nonsingular. More precisely:

Theorem (Inverse Function Theorem.) Suppose U ⊂ R^n is open, f : U → R^n is C^1, and x_0 ∈ U is such that Df(x_0) is nonsingular (i.e., (Df(x_0))^{-1} exists). Then there are open sets V, W containing x_0 and f(x_0), respectively, such that f|_V is a 1:1 map of V onto W with C^1 inverse g : W → V.

Proof: We first consider the special case when Df (x0) = I . (The general case follows directly by applying this special case to the function f̃(x) ≡ (Df (x0))−1f(x).)

Let ε ∈ (0, 1/2]. Observe that since U is open and Df(x) is continuous and = I at x = x_0, we have a ρ > 0 such that B_ρ(x_0) ⊂ U and ‖Df(x) − I‖ < ε ≤ 1/2 for ‖x − x_0‖ < ρ; so, in particular, ‖Df(x) v‖ = ‖v + (Df(x) − I) v‖ ≥ ‖v‖ − ‖(Df(x) − I) v‖ ≥ ‖v‖ −


‖Df(x) − I‖ ‖v‖ ≥ ‖v‖ − (1/2)‖v‖ = (1/2)‖v‖ for any v ∈ R^n, and hence the null space N(Df(x)) is trivial. Thus, the above choice of ρ ensures that (Df(x))^{-1} exists for each x ∈ B_ρ(x_0).

Next observe that with the same ρ > 0 as in the above paragraph we have, by the fundamental theorem of calculus, ∀ x, y ∈ B_ρ(x_0)

(y − f(y)) − (x − f(x)) = ∫_0^1 (d/dt) F(x + t(y − x)) dt ,   F(x) = x − f(x) ,

which by the Chain Rule gives

(y − f(y)) − (x − f(x)) = ∫_0^1 (I − Df(x + t(y − x))) · (y − x) dt ,

and hence

(1)   ‖(y − f(y)) − (x − f(x))‖ = ‖∫_0^1 (I − Df(x + t(y − x))) · (y − x) dt‖
      ≤ ∫_0^1 ‖(I − Df(x + t(y − x))) · (y − x)‖ dt
      ≤ ∫_0^1 ‖I − Df(x + t(y − x))‖ ‖y − x‖ dt
      ≤ ε ‖y − x‖   ∀ x, y ∈ B_ρ(x_0) ,

and hence

(2)   (1 − ε) ‖x − y‖ ≤ ‖f(x) − f(y)‖ ≤ (1 + ε) ‖x − y‖ ,   ∀ x, y ∈ B_ρ(x_0) .

We are going to prove the theorem with V = B_ρ(x_0). We first claim that if z is an arbitrary point of the open ball B_ρ(x_0) and if σ ∈ (0, ρ − ‖z − x_0‖) then

(3)   B_{σ/2}(f(z)) ⊂ f(B_σ(z)) ⊂ f(B_ρ(x_0)) .

Notice that the second inclusion here is trivial because of the inclusion B_σ(z) ⊂ B_ρ(x_0), which we check as follows: x ∈ B_σ(z) ⇒ ‖x − x_0‖ = ‖x − z + z − x_0‖ ≤ ‖x − z‖ + ‖z − x_0‖ ≤ σ + ‖z − x_0‖ < ρ because σ < ρ − ‖z − x_0‖. Thus, we have only to prove the first inclusion in (3). To prove this we have to show that for each vector v ∈ B_{σ/2}(f(z)) there is an x_v ∈ B_σ(z) with f(x_v) = v. We check this using the contraction mapping principle on the closed ball B̄_σ(z): We define F(x) = x − f(x) + v, and note that

‖F(x) − z‖ = ‖(x − f(x)) − (z − f(z)) + v − f(z)‖ ≤ ‖(x − f(x)) − (z − f(z))‖ + ‖v − f(z)‖ < (1/2)σ + (1/2)σ = σ ,

for all x ∈ B̄_σ(z), so F maps the closed ball B̄_σ(z) into the corresponding open ball B_σ(z), and since F(x) − F(y) = (x − f(x)) − (y − f(y)) we see by (1) that F is a contraction mapping of B̄_σ(z) into itself and so has a fixed point x ∈ B̄_σ(z) by the contraction mapping principle. This fixed point x actually lies in the open ball B_σ(z) because F maps B̄_σ(z) into B_σ(z); thus x − f(x) + v = x, or in other words f(x) = v, with x ∈ B_σ(z), so x is the required point x_v ∈ B_σ(z), and indeed (3) is proved. Notice that (3) in particular guarantees that the image f(B_ρ(x_0)) is an open set, which we


call W. By (2), f is a 1:1 map of B_ρ(x_0) onto this open set W. Let g : W → B_ρ(x_0) be the inverse function and write x = g(u), y = g(v) (u, v ∈ W); then the first inequality in (2) can be written

(2′)   ‖g(u) − g(v)‖ ≤ (1 − ε)^{-1} ‖u − v‖ ≤ 2 ‖u − v‖ ,   u, v ∈ W ,

which guarantees that the inverse function g is continuous on W. To prove the inverse is differentiable at a given v ∈ W, pick y ∈ B_ρ(x_0) such that f(y) = v, and let A = Df(y). We claim that g is differentiable at v with Dg(v) = A^{-1} (= (Df(y))^{-1}). To see this, for given ε ∈ (0, 1/4) pick δ > 0 such that B_δ(v) ⊂ W (which we can do since W is open), B_δ(y) ⊂ B_ρ(x_0), and

(4)   0 < ‖x − y‖ < δ ⇒ ‖f(x) − f(y) − A(x − y)‖ < ε ‖x − y‖ ,

which we can do since f is differentiable at y. Recall that we already checked that A^{-1} (= (Df(y))^{-1}) exists for all y ∈ B_ρ(x_0), and then

(5)   ‖g(u) − g(v) − A^{-1}(u − v)‖ = ‖A^{-1}(A(g(u) − g(v)) − (u − v))‖ = ‖A^{-1}(A(x − y) − (f(x) − f(y)))‖ ≤ ‖A^{-1}‖ ‖f(x) − f(y) − A(x − y)‖

for every u ∈ W, where x = g(u) (i.e., u = f(x)). But, since ‖x − y‖ = ‖g(u) − g(v)‖ ≤ 2 ‖u − v‖ (by (2′)), we then have ‖u − v‖ < δ/2 ⇒ ‖x − y‖ < δ, which by (4) implies the expression on the right of (5) is ≤ ε ‖A^{-1}‖ ‖x − y‖ = ε ‖A^{-1}‖ ‖g(u) − g(v)‖ ≤ 2ε ‖A^{-1}‖ ‖u − v‖. That is, we have proved

‖u − v‖ < δ/2 ⇒ ‖g(u) − g(v) − A^{-1}(u − v)‖ < 2ε ‖A^{-1}‖ ‖u − v‖ ,

which proves that g is differentiable at v with Dg(v) = A^{-1}, as claimed.

Finally, we have to check the continuity of the Jacobian matrix Dg(u) for u ∈ W; but f(g(u)) ≡ u for u ∈ W, so by the chain rule we have Df(g(u)) Dg(u) = I for every u ∈ W; i.e., Dg(u) = (Df(g(u)))^{-1}, which is a continuous function of u because g is a continuous function of u ∈ W and Df is continuous on B_ρ(x_0), which in turn implies that (Df)^{-1} is continuous on B_ρ(x_0). This completes the proof of the inverse function theorem.
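The map F(x) = x − f(x) + v used in the proof also suggests a simple numerical scheme for computing the local inverse. A minimal Python sketch (our own construction, assuming NumPy, valid when Df is close to I as in the special case treated above):

    import numpy as np

    def local_inverse(f, v, x_init, tol=1e-12, max_iter=200):
        # Approximate g(v) = f^{-1}(v) near x_init by iterating the
        # contraction F(x) = x - f(x) + v from the proof.
        x = np.asarray(x_init, dtype=float)
        for _ in range(max_iter):
            x_next = x - f(x) + v
            if np.linalg.norm(x_next - x) <= tol:
                return x_next
            x = x_next
        return x

    # Example with Df(0) = I: f(x1, x2) = (x1 + 0.1*x2^2, x2 + 0.1*x1^2).
    f = lambda x: np.array([x[0] + 0.1 * x[1]**2, x[1] + 0.1 * x[0]**2])
    v = np.array([0.2, 0.3])
    x = local_inverse(f, v, np.zeros(2))
    print(x, f(x) - v)                    # f(x) agrees with v to machine precision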

3 IMPLICIT FUNCTION THEOREM

The implicit function theorem considers C^1 mappings from an open set U in R^n into R^m, with m < n. It is convenient notationally to think of R^n as the same as the product R^{n−m} × R^m, and accordingly, points in R^n should be written as column vectors with an upper block x ∈ R^{n−m} and a lower block y ∈ R^m; this is somewhat cumbersome, so we instead abbreviate it by writing [x, y] (with square brackets). Likewise, a function f on R^n should strictly be written as a function of such a column vector, again too cumbersome, so we'll write f[x, y]. After a while you'll get comfortable with this, and start fudging notation by simply writing f(x, y) (with round brackets)


instead of f[x, y]. Note that if f[x, y] is a C^1 function then we use the notation D_x f[x, y] = (∂f[x, y]/∂x_1, . . . , ∂f[x, y]/∂x_{n−m}) and D_y f[x, y] = (∂f[x, y]/∂y_1, . . . , ∂f[x, y]/∂y_m), and so in this notation the Jacobian matrix of f is (D_x f[x, y], D_y f[x, y]).

Implicit Function Theorem. Let U be an open subset of R^n, G : U → R^m a C^1 mapping with m < n, let S = {[x, y] ∈ U : G[x, y] = 0}, and let [a, b] be a point in S such that the matrix D_y G[x, y] (that's the m × m matrix with j-th column ∂G[x, y]/∂y_j) is nonsingular at the point [a, b]. Then there are open sets V ⊂ R^n and W ⊂ R^{n−m} with [a, b] ∈ V and a ∈ W, and a C^1 map h : W → R^m, such that S ∩ V = graph h.

Note: As usual, graph h = {[x, y] : x ∈ W and y = h(x)}, that is, {[x, h(x)] : x ∈ W}.

Proof of the Implicit Function Theorem: The proof is rather easy: it is simply a clever application of the inverse function theorem. We start by defining a map f : U → R^n by simply taking f[x, y] = [x, G[x, y]]. Of course this is a C^1 map because G is C^1. Note that the mapping f leaves the first n − m coordinates (i.e., the x coordinates) of each point [x, y] fixed. Also, by direct calculation the Jacobian matrix Df[a, b] (which is n × n) has the block form

( I_{(n−m)×(n−m)}        O        )
( D_x G[a, b]      D_y G[a, b] )

and hence det Df[a, b] = det D_y G[a, b] ≠ 0, so the inverse function theorem implies that there are open sets V, Q in R^n with [a, b] ∈ V, f[a, b] = [a, G[a, b]] = [a, 0] ∈ Q, such that f : V → Q is 1:1 and onto and the inverse g : Q → V is C^1. Since f[x, y] ≡ [x, G[x, y]] (i.e., f fixes the first n − m components x of the point [x, y] ∈ R^n), g must have the form

(1)   g[x, y] = [x, H[x, y]] ,   with H a C^1 function on Q .

Now we define W = {x ∈ R^{n−m} : [x, 0] ∈ Q ∩ (R^{n−m} × {0})} (it is easy to check that W is then open in R^{n−m} because Q is open in R^n), and let h(x) = H[x, 0] with H as in (1), so that

g[x, 0] = [x, h(x)] ,   x ∈ W ,

and then observe that [x, 0] = f(g[x, 0]) = f[x, h(x)] = [x, G[x, h(x)]]; that is, G[x, h(x)] = 0, so we've shown that graph h ⊂ S ∩ V. To check equality, we just have to check the reverse inclusion S ∩ V ⊂ graph h as follows:

[x, z] ∈ S ∩ V ⇒ G[x, z] = 0 ⇒ f[x, z] = [x, 0] ⇒ [x, z] = g(f[x, z]) = g[x, 0] = [x, h(x)] ,

so h(x) = z and [x, z] ∈ graph h.
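Numerically, the function h can be computed pointwise by solving G[x, y] = 0 for y with x held fixed. A hedged Python sketch for the case m = 1 (our own stand-in using Newton's method in y, not the construction of the proof):

    def h_of_x(G, dG_dy, x, y0, tol=1e-12, max_iter=100):
        # Solve G(x, y) = 0 for y near y0 by Newton's method in y alone;
        # requires dG_dy != 0 near the solution, as in the theorem.
        y = y0
        for _ in range(max_iter):
            step = G(x, y) / dG_dy(x, y)
            y -= step
            if abs(step) <= tol:
                break
        return y

    # Example: G[x, y] = x^2 + y^2 - 1 near [a, b] = [0, 1], where D_y G = 2y != 0;
    # here the theorem's h is h(x) = sqrt(1 - x^2).
    G = lambda x, y: x**2 + y**2 - 1.0
    dG_dy = lambda x, y: 2.0 * y
    print(h_of_x(G, dG_dy, 0.3, 1.0))     # 0.95393920... = sqrt(1 - 0.09)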

Finally, we give an alternative version of the implicit function theorem which we stated in Thm. 9.13of Ch. 2 and which was used in the proof of the Lagrange multiplier Thm. 9.14 of Ch. 2.


Corollary. If k ∈ {1, . . . , n − 1}, if U ⊂ R^n is open, if g_1, . . . , g_{n−k} : U → R are C^1, if S = {x ∈ U : g_j(x) = 0 for each j = 1, . . . , n − k}, and if M ≠ ∅, where

M = {y ∈ S : ∇_{R^n} g_1(y), . . . , ∇_{R^n} g_{n−k}(y) are l.i.} ,

then M is a k-dimensional C^1 manifold.

Proof: Let m = n − k, let a ∈ M, and let G = (g_1, . . . , g_m)^T : U → R^m. The m rows of the m × n matrix DG(a) are given to be l.i., so DG(a) has rank m and we must be able to find m l.i. columns D_{j_1} G(a), . . . , D_{j_m} G(a). Reordering the coordinates if necessary, we can suppose without loss of generality that j_1 = n − m + 1, j_2 = n − m + 2, . . . , j_m = n (i.e., j_1, . . . , j_m are the last m integers between 1 and n), and we can relabel the coordinates of points of R^n as [x, y] with x ∈ R^{n−m}, y ∈ R^m, so that the point a can also be written as [a_0, b_0] with a_0 ∈ R^{n−m} and b_0 ∈ R^m. Then D_y G[x, y] is an m × m nonsingular matrix at the point [a_0, b_0], so the Implicit Function Theorem applies with [a_0, b_0] in place of [a, b], and hence Def. 9.2 of Ch. 2 is checked.


APPENDIX A

Introductory Lectures on Real Analysis

LECTURE 1: THE REAL NUMBERS

 We assume without proof the usual properties of the integers: For example, that the integers are

closed under addition and subtraction, that the principle of mathematical induction holds for thepositive integers, and that 1 is the least positive integer.

We also assume the usual field and order properties for the real numbers R. Thus, we accept without proof that the reals satisfy the field axioms, as follows:

F1 a + b = b + a, ab = ba (commutativity)

F2 a + (b + c) = (a + b) + c, a(bc) = (ab)c (associativity)

F3 a(b + c) = ab + ac (distributive law)

F4 ∃ elements 0, 1 with 0 ≠ 1 such that 0 + a = a and 1 · a = a ∀ a ∈ R (additive and multiplicative identities)

F5 ∀ a ∈ R, ∃ an element −a such that a + (−a) = 0 (additive inverse)

F6 ∀ a ∈ R with a ≠ 0, ∃ an element a^{-1} such that a a^{-1} = 1 (multiplicative inverse).

Terminology: a + (−b), a b^{-1} are usually written a − b, a/b, respectively. Notice that the latter makes sense only for b ≠ 0.

Notice that all the other standard algebraic properties of the reals follow from these. (See Exercise 1.1 below.)

 We here also accept without proof that the reals satisfy the following order axioms:

O1 For each a ∈ R, exactly one of the possibilities a > 0, a = 0, −a > 0 holds.

O2 a > 0 and b > 0 ⇒ ab > 0 and a + b > 0.


 Terminology: a > b means a − b > 0, a ≥ b means that either a > b or a = b, a < b means b > a,and a ≤ b means b ≥ a.

We claim that all the other standard properties of inequalities follow from these and from F1–F6. (See Problem 1.2 below.)

Notice also that the above properties (i.e., F1–F6, O1, O2) all hold with the rational numbers Q ≡ {p/q : p, q are integers, q ≠ 0} in place of R. F1–F6 also hold with the complex numbers C = {x + iy : x, y ∈ R} in place of R, but inequalities like a > b make no sense for complex numbers.

In addition to F1–F6, O1, O2 there is one further key property of the real numbers. To discuss it we need first to introduce some terminology.

 Terminology: If S  ⊂ R we say:

(1) S  is bounded above if ∃ a number K ∈ R such that x ≤ K ∀ x ∈ S . (Any such number K is calledan upper bound  for S .)

(2) S  is bounded below if ∃ a number k ∈ R such that x ≥ k ∀ x ∈ S . (Any such number k is calleda lower bound  for S .)

(3) S  is bounded  if it is both bounded above and  bounded below. (This is equivalent to the fact that∃ a positive real number L such that |x| ≤ L ∀ x ∈ S .)

 We can now introduce the final key property of the real numbers.

C. (“Completeness property of the reals”): If S ⊂ R is nonempty and bounded above, then S has a least upper bound.

Notice that the terminology “least upper bound” used here means exactly what it says: a number α is a least upper bound for a set S ⊂ R if

(i) x ≤ α ∀ x ∈ S (i.e., α is an upper bound), and

(ii) if β is any other upper bound for S, then α ≤ β (i.e., α is ≤ any other upper bound for S).

Such a least upper bound is unique, because if α_1, α_2 are both least upper bounds for S, the property (ii) implies that both α_1 ≤ α_2 and α_2 ≤ α_1, so α_2 = α_1. It therefore makes sense to speak of the least upper bound of S (also known as “the supremum” of S). The least upper bound of S will henceforth be denoted sup S. Notice that property C guarantees sup S exists if S is nonempty and bounded above.

Remark: If S  is nonempty and bounded below, then it has a greatest lower bound (or “infimum”), which we denote inf  S . One can prove  the existence of inf  S  (if S  is bounded below and nonempty)

by using property C on the set −S  = {−x : x ∈ S }. (See Exercise 1.5 below.)

 We should be careful to distinguish between the maximum element of a set S  (if it exists) and thesupremum of S . Recall that we say a number α is the maximum of S  (denoted max S ) if 


(i) x ≤ α ∀ x ∈ S (i.e., α is an upper bound for S), and

(ii) α ∈ S.

These two properties say exactly that α is an upper bound for S and also one of the elements of S. Thus, clearly a maximum α of S, if it exists, satisfies both (i), (ii) and hence must agree with sup S. But keep in mind that max S may not exist, even if the set S is nonempty and bounded above: for example, if S = (0, 1) (= {x ∈ R : 0 < x < 1}), then sup S = 1, but max S does not exist, because 1 ∉ S.

Notice that of course any finite nonempty set S  ⊂ R has a maximum. One can formally prove thisby induction on the number of elements of the set S . (See Exercise 1.3 below.)

Using the above-mentioned properties of the integers and the reals it is now possible to give formal rigorous proofs of all the other properties of the reals which are used, even the ones which seem self-evident. For example, one can actually prove formally, using only the above properties, the fact that the set of positive integers is not bounded above. (Otherwise there would be a least upper bound α, so, in particular, we would have n ≤ α for each positive integer n; but n + 1 is also a positive integer for each n, so also n + 1 ≤ α, or in other words n ≤ α − 1 for each positive integer n, contradicting the fact that α is the least upper bound!) Thus, we have shown rigorously, using only the axioms F1–F6, O1, O2, and C, that the positive integers are not bounded above. Thus,

∀ positive a ∈ R, ∃ a positive integer n with n > a (i.e., 1/n < 1/a).

 This is referred to as “The Archimedean Property” of the reals.

Similarly, using only the axioms F1–F6, O1, O2, and C, we can give a formal proof of all the basic properties of the real numbers—for example, in Problem 1.7 below you are asked to prove that square roots of positive numbers do indeed exist.

Final notes on the reals: (1) We have assumed without proof all properties F1–F6, O1, O2 and C. In a more advanced course we could, starting only with the positive integers, give a rigorous construction of the real numbers, and prove all the properties F1–F6, O1, O2, and C. Furthermore, one can prove (in a sense that can be made precise) that the real number system is the unique field with all the above properties.

(2) You can of course freely use all the standard rules for algebraic manipulation of equations and inequalities involving the reals; normally you do not need to justify such things in terms of the axioms F1–F6, O1, O2 unless you are specifically asked to do so.

LECTURE 1 PROBLEMS

1.1 Using only properties F1–F6, prove

(i) a · 0 = 0 ∀ a ∈ R

(ii) ab = 0 ⇒ either a = 0 or b = 0


(iii) a/b + c/d = (ad + bc)/(bd) ∀ a, b, c, d ∈ R with b ≠ 0, d ≠ 0.

Note: In each case present your argument in steps, stating which of F1–F6 is used at each step.

Hint for (iii): First show the “cancellation law” that x/y = (xz)/(yz) for any x, y, z ∈ R with y ≠ 0, z ≠ 0.

1.2 Using only properties F1–F6 and O1, O2 (and the agreed terminology), prove the following:

(i) a > 0 ⇒ 0 > −a (i.e., −a < 0)

(ii) a > 0 ⇒ 1/a > 0

(iii) a > b > 0 ⇒ 1/a < 1/b

(iv) a > b and c > 0 ⇒ ac > bc.

1.3 If S  is a finite nonempty subset of R, prove that max S  exists. (Hint: Let n be the number of 

elements of S  and try using induction on n.)

1.4 Given any number x ∈ R, prove there is an integer n such that n ≤ x < n + 1.

Hint: Start by proving there is a least integer > x.

Note: 1.4 establishes rigorously the fact that every real number x can be written in the form x = integer plus a remainder, with the remainder ∈ [0, 1). The integer is often referred to as the “integer part of x.” We emphasize once again that such properties are completely standard properties of the real numbers and can normally be used without comment; the point of the present exercise is to show that it is indeed possible to rigorously prove such standard properties by using the basic properties of the integers and the axioms F1–F6, O1, O2, C.

1.5 Given a set S  ⊂ R, −S  denotes {−x : x ∈ S }. Prove:

(i) S is bounded below if and only if −S is bounded above.

(ii) If S  is nonempty and bounded below, then inf S  exists and = − sup(−S).

(Hint: Show that α = − sup (−S) has the necessary 2 properties which establish it to be the greatestlower bound of S .)

1.6 If  S  ⊂ R is nonempty and bounded above, prove ∃ a sequence a1, a2, . . . of points in S  withlim an = sup S .

Hint: In case sup S ∉ S, let α = sup S, and for each integer j ≥ 1 prove there is at least one element a_j ∈ S with α > a_j > α − 1/j.

1.7 Prove that every positive real number has a positive square root. (That is, for any a > 0, prove there is a real number α > 0 such that α^2 = a.)

Hint: Begin by observing that S = {x ∈ R : x > 0 and x^2 < a} is nonempty and bounded above, and then argue that sup S is the required square root.


LECTURE 2: SEQUENCES OF REAL NUMBERS AND THE BOLZANO-WEIERSTRASS THEOREM

Let a_1, a_2, . . . be a sequence of real numbers; a_n is called the n-th term of the sequence. We sometimes use the abbreviation {a_n} or {a_n}_{n=1,2,...}.

Technically, we should distinguish between the sequence {a_n} and the set of terms of the sequence—i.e., the set S = {a_1, a_2, . . . }. These are not the same: e.g., the sequence 1, 1, . . . has infinitely many terms each equal to 1, whereas the set S is just the set {1} containing one element.

Formally, a sequence is a mapping from the positive integers to the real numbers; the n-th term a_n of the sequence is just the value of this mapping at the integer n. From this point of view—i.e., thinking of a sequence as a mapping from the integers to the real numbers—a sequence has a graph consisting of discrete points in R^2, one point of the graph on each of the vertical lines x = n. Thus, for example, the sequence 1, 1, . . . (each term = 1) has graph consisting of the discrete points marked “⊗” in the following figure:

Figure A.1: Graph of the trivial sequence {an}n=1,... where an = 1 ∀ n.

 Terminology: Recall the following terminology. A sequence a1, a2, . . . is:

(i) bounded above  if ∃ a real number K such that an ≤ K ∀ integer n ≥ 1.

(ii) bounded below if ∃ a real number k such that an ≥ k ∀ integer n ≥ 1.

(iii) bounded  if it is both bounded above and  bounded below. (This is equivalent to the fact that ∃a real number L such that |an| ≤ L ∀ integer n ≥ 1.)

(iv) increasing if a_{n+1} ≥ a_n ∀ integer n ≥ 1.

(v) strictly increasing if a_{n+1} > a_n ∀ integer n ≥ 1.

(vi) decreasing if an+1 ≤ an ∀ integer n ≥ 1.

(vii) strictly decreasing  if an+1 < an ∀ integer n ≥ 1.

(viii) monotone if either  the sequence is increasing or the sequence is decreasing.


(ix) We say the sequence has limit ℓ (ℓ a given real number) if for each ε > 0 there is an integer N ≥ 1 such that

(∗)   |a_n − ℓ| < ε   ∀ integers n ≥ N .

(Notice that (∗) is equivalent to ℓ − ε < a_n < ℓ + ε ∀ integers n ≥ N.)

(x) In case the sequence {a_n} has limit ℓ we write

lim a_n = ℓ   or   lim_{n→∞} a_n = ℓ   or   a_n → ℓ .

(xi) We say the sequence {a_n} converges (or “is convergent”) if it has limit ℓ for some ℓ ∈ R.

Theorem 2.1. If {a_n} is monotone and bounded, then it is convergent. In fact, if S = {a_1, a_2, . . . } is the set of terms of the sequence, we have the following:

(i) if {a_n} is increasing and bounded then lim a_n = sup S.

(ii) if {a_n} is decreasing and bounded then lim a_n = inf S.

Proof: See Exercise 2.2. (Exercise 2.2 proves part (i), but the proof of part (ii) is almost identical.)

 Theorem 2.2. If  {an} is convergent, then it is bounded.

Proof: Let l = lim a_n. Using the definition (ix) above with ε = 1, we see that there exists an integer N ≥ 1 such that |a_n − l| < 1 whenever n ≥ N. Thus, using the triangle inequality, we have |a_n| ≡ |(a_n − l) + l| ≤ |a_n − l| + |l| < 1 + |l| ∀ integers n ≥ N. Thus,

|a_n| ≤ max{|a_1|, . . . , |a_N|, |l| + 1}   ∀ integers n ≥ 1 .

Theorem 2.3. If {a_n}, {b_n} are convergent sequences, then the sequences {a_n + b_n}, {a_n b_n} are also convergent and

(i) lim(a_n + b_n) = lim a_n + lim b_n
(ii) lim(a_n b_n) = (lim a_n) · (lim b_n) .

In addition, if b_n ≠ 0 ∀ n and lim b_n ≠ 0, then

(iii) lim(a_n/b_n) = (lim a_n)/(lim b_n) .

Proof: We prove (ii); the remaining parts are left as an exercise. First, since {a_n}, {b_n} are convergent, the previous theorem tells us that there are constants L, M > 0 such that

(∗)   |a_n| ≤ L and |b_n| ≤ M   ∀ positive integers n .


Now let l = lim a_n, m = lim b_n, and note that by the triangle inequality and (∗),

(∗∗)   |a_n b_n − lm| ≡ |a_n b_n − l b_n + l b_n − lm| ≡ |(a_n − l) b_n + l (b_n − m)| ≤ |a_n − l| |b_n| + |l| |b_n − m| ≤ M |a_n − l| + |l| |b_n − m|   ∀ n ≥ 1 .

On the other hand, for any given ε > 0 we use the definition of convergence (i.e., (ix) above) to deduce that there exist integers N_1, N_2 ≥ 1 such that

|a_n − l| < ε/(2(1 + M + |l|))   ∀ integers n ≥ N_1

and

|b_n − m| < ε/(2(1 + M + |l|))   ∀ integers n ≥ N_2 .

Thus, for each integer n ≥ max{N_1, N_2}, (∗∗) implies

|a_n b_n − lm| < M ε/(2(1 + M + |l|)) + |l| ε/(2(1 + M + |l|)) < ε/2 + ε/2 = ε ,

and the proof is complete, because we have shown that the definition (ix) is satisfied for the sequence a_n b_n with lm in place of ℓ.

Remark: Notice that if we take {a_n} to be the constant sequence −1, −1, . . . (so that trivially lim a_n = −1) in part (ii) above, we get

lim(−b_n) = − lim b_n .

Hence, using part (i) with {−b_n} in place of {b_n}, we conclude

lim(a_n − b_n) = lim a_n − lim b_n .

By a similar argument, (i) and (ii) imply that

lim(α a_n + β b_n) = α lim a_n + β lim b_n

for any α, β ∈ R (provided lim a_n, lim b_n both exist).

 The following definition of subsequence is important.

Definition: Given a sequence a_1, a_2, . . . , we say a_{n_1}, a_{n_2}, . . . is a subsequence of {a_n} if n_1, n_2, . . . are integers with 1 ≤ n_1 < n_2 < n_3 < . . . . (Note the strict inequalities.)

Thus, {b_n}_{n=1,2,...} is a subsequence of {a_n}_{n=1,2,...} if and only if for each j ≥ 1 both the following hold:

(i) b_j is one of the terms of the sequence a_1, a_2, . . . , and


(ii) the real number b_{j+1} appears later than the real number b_j in the sequence a_1, a_2, . . . .

Thus, {a_{2n}}_{n=1,2,...} = a_2, a_4, a_6, . . . is a subsequence of {a_n}_{n=1,2,...}, and 1/2, 1/3, 1/4, 1/5, . . . is a subsequence of {1/n}_{n=1,2,...}, but 1/3, 1/2, 1/4, 1/5, . . . is not a subsequence of {1/n}.

Theorem 2.4. (Bolzano-Weierstrass Theorem.) Any bounded sequence {a_n} has a convergent subsequence.

Proof (“Method of bisection”): Since {a_n} is bounded we can find a lower bound c and an upper bound d with c < d; thus

(∗)   c ≤ a_k ≤ d   ∀ integers k ≥ 1 .

Now subdivide the interval [c, d] into the two half intervals [c, (c+d)/2], [(c+d)/2, d]. By (∗), at least one of these (call it I_1, say) has the property that there are infinitely many integers k with a_k ∈ I_1. Similarly,

we can divide I_1 into two equal subintervals (each of length (d − c)/4), at least one of which (call it I_2) has the property that a_k ∈ I_2 for infinitely many integers k. Proceeding in this way we get a sequence of intervals {I_n}_{n=1,2,...} with I_n = [c_n, d_n] and with the properties that, for each integer n ≥ 1,

(1)   length I_n ≡ d_n − c_n = (d − c)/2^n
(2)   [c_{n+1}, d_{n+1}] ⊂ [c_n, d_n] ⊂ [c, d]
(3)   a_k ∈ [c_n, d_n] for infinitely many integers k ≥ 1 .

(Notice that (3) says c_n ≤ a_k ≤ d_n for infinitely many integers k ≥ 1.)

Using properties (1), (2), (3), we proceed to prove the existence of a convergent subsequence as follows. Select any integer k_1 ≥ 1 such that a_{k_1} ∈ [c_1, d_1] (which we can do by (3)). Then select any integer k_2 > k_1 with a_{k_2} ∈ [c_2, d_2]; such k_2 of course exists by (3). Continuing in this way we get integers 1 ≤ k_1 < k_2 < . . . such that a_{k_n} ∈ [c_n, d_n] for each integer n ≥ 1. That is,

(4)   c_n ≤ a_{k_n} ≤ d_n   ∀ integers n ≥ 1 .

On the other hand, by (1), (2) we have

(5)   c ≤ c_n ≤ c_{n+1} < d_{n+1} ≤ d_n ≤ d   ∀ integers n ≥ 1 .

Notice that (5), in particular, guarantees that {c_n}, {d_n} are bounded sequences which are, respectively, increasing and decreasing, hence by Thm. 2.1 are convergent. On the other hand, (1) says d_n − c_n = (d − c)/2^n (→ 0 as n → ∞), hence lim c_n = lim d_n (= ℓ, say). Then by (4) and the Sandwich Theorem (see Exercise 2.5 below) we see that {a_{k_n}}_{n=1,2,...} also has limit ℓ.
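The bisection argument is easy to mimic on a computer. The following rough Python sketch is our own illustration (not part of the text), with “contains a_k for infinitely many k” necessarily approximated by a count over finitely many terms:

    # Home in on a cluster point of the bounded sequence a_k = (-1)^k + 1/k,
    # whose cluster points are -1 and 1.
    a = lambda k: (-1.0)**k + 1.0 / k
    N = 10000
    terms = [a(k) for k in range(1, N + 1)]

    c, d = -2.0, 2.0                       # (*): c <= a_k <= d for all k
    for _ in range(40):
        mid = (c + d) / 2
        left = sum(1 for t in terms if c <= t <= mid)
        right = sum(1 for t in terms if mid <= t <= d)
        if left >= right:                  # keep the half with the most terms
            d = mid
        else:
            c = mid
    print(c, d)                            # both very close to the cluster point 1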

LECTURE 2 PROBLEMS

2.1 Use the Archimedean property of the reals (Lem. 1.1 of Lecture 1) to prove rigorously that lim 1/n = 0.


2.2 Prove part (i) of Thm. 2.1. Hint: Let α = sup S, and show first that for each ε > 0, ∃ an integer N ≥ 1 such that a_N > α − ε.

2.3 Using the definition (ix) on p. 92, prove that a sequence {a_n} cannot have more than one limit.

2.4 If {a_n}, {b_n} are given convergent sequences and a_n ≤ b_n ∀ n ≥ 1, prove lim a_n ≤ lim b_n.

Hint: lim(a_n − b_n) = lim a_n − lim b_n, so it suffices to prove that lim c_n ≤ 0 whenever {c_n} is convergent with c_n ≤ 0 ∀ n.

2.5 (“Sandwich Theorem.”) If {a_n}, {b_n} are given convergent sequences with lim a_n = lim b_n, and if {c_n} is any sequence such that a_n ≤ c_n ≤ b_n ∀ n ≥ 1, prove that {c_n} is convergent and lim c_n = lim a_n (= lim b_n). (Hint: Let l = lim a_n = lim b_n and use the definition (ix) on p. 92.)

2.6 If k is a fixed positive integer and if {a_n} is any sequence such that 1/n^k ≤ a_n ≤ n^k ∀ n ≥ 1, prove that lim a_n^{1/n} = 1. (Hint: use 2.5 and the standard limit result that lim n^{1/n} = 1.)


LECTURE 3: CONTINUOUS FUNCTIONS

Here we shall mainly be interested in real-valued functions defined on some closed interval [a, b]; thus f : [a, b] → R. (This is reasonable notation, since for each x ∈ [a, b], f assigns a value f(x) ∈ R.)

First we recall the definition of continuity of such a function.

Definition 1. f : [a, b] → R is said to be continuous at the point c ∈ [a, b] if for each ε > 0 there is a δ > 0 such that

x ∈ [a, b] with |x − c| < δ ⇒ |f(x) − f(c)| < ε .

Definition 2. We say f : [a, b] → R is continuous if f is continuous at each point c ∈ [a, b].

We want to prove the important theorem that such a continuous function attains both its maximum and minimum values on [a, b]. We first make the terminology precise.

Terminology: If f : [a, b] → R, then:

(1) f is said to attain its maximum at a point c ∈ [a, b] if f(x) ≤ f(c) ∀ x ∈ [a, b];

(2) f is said to attain its minimum at a point c ∈ [a, b] if f(x) ≥ f(c) ∀ x ∈ [a, b].

 We shall also need the following lemma, which is of independent importance.

Lemma 3.1. If a_n ∈ [a, b] ∀ n ≥ 1, if lim a_n = c ∈ [a, b], and if f : [a, b] → R is continuous at c, then

lim f(a_n) = f(c) ,

i.e., the sequence {f(a_n)}_{n=1,2,...} converges to f(c).

Proof: Let ε > 0. By Def. 1 above, ∃ δ > 0 such that

(∗)   |f(x) − f(c)| < ε whenever x ∈ [a, b] with |x − c| < δ .

On the other hand, by the definition of lim a_n = c, with δ in place of ε (i.e., we use the definition (ix) on p. 92 of Lecture 2 with δ in place of ε), we can find an integer N ≥ 1 such that

|a_n − c| < δ whenever n ≥ N .

Then (since a_n ∈ [a, b] ∀ n) (∗) tells us that

|f(a_n) − f(c)| < ε   ∀ n ≥ N .

Theorem 3.2. If f : [a, b] → R is continuous, then f is bounded and ∃ points c_1, c_2 ∈ [a, b] such that f attains its maximum at the point c_1 and its minimum at the point c_2; that is, f(c_2) ≤ f(x) ≤ f(c_1) ∀ x ∈ [a, b].


Proof: It is enough to prove that f is bounded above and that there is a point c_1 ∈ [a, b] such that f attains its maximum at c_1, because we can get the rest of the theorem by applying these results to −f.

To prove f is bounded above we argue by contradiction. If f is not bounded above, then for each integer n ≥ 1 we can find a point x_n ∈ [a, b] such that f(x_n) ≥ n. Since {x_n} is a bounded sequence, by the Bolzano-Weierstrass Theorem (Thm. 2.4 of Lecture 2) we can find a convergent subsequence x_{n_1}, x_{n_2}, . . . . Let c = lim_{j→∞} x_{n_j}. Of course, since a ≤ x_{n_j} ≤ b ∀ j, we must have c ∈ [a, b]. Also, since 1 ≤ n_1 < n_2 < . . . (and since n_1, n_2, . . . are integers), we must have n_{j+1} ≥ n_j + 1, hence by induction on j

(1)   n_j ≥ j   ∀ integers j ≥ 1 .

Now since x_{n_j} ∈ [a, b] and lim x_{n_j} = c ∈ [a, b], we have by Lem. 3.1 that lim f(x_{n_j}) = f(c). Thus, {f(x_{n_j})}_{j=1,2,...} is convergent, hence bounded by Thm. 2.2. But by construction f(x_{n_j}) ≥ n_j (≥ j by (1)), hence {f(x_{n_j})}_{j=1,2,...} is not bounded, a contradiction. This completes the proof that f is bounded above.

We now want to prove f attains its maximum value at some point c_1 ∈ [a, b]. Let S = {f(x) : x ∈ [a, b]}. We just proved above that S is bounded above, hence (since it is nonempty by definition) S has a least upper bound, which we denote by M. We claim that for each integer n ≥ 1 there is a point x_n ∈ [a, b] such that f(x_n) > M − 1/n. Indeed, otherwise M − 1/n would be an upper bound for S, contradicting the fact that M was chosen to be the least upper bound. Again we can use the Bolzano-Weierstrass Theorem to find a convergent subsequence x_{n_1}, x_{n_2}, . . . , and again (1) holds.

Let c_1 = lim x_{n_j}. By Lem. 3.1 again we have

(2)   f(c_1) = lim f(x_{n_j}) .

However, by construction we have

M ≥ f(x_{n_j}) > M − 1/n_j ≥ M − 1/j   (by (1)) ,

and hence by the Sandwich Theorem (Exercise 2.5 of Lecture 2) we have lim f(x_{n_j}) = M. By (2) this gives f(c_1) = M. But M is an upper bound for S = {f(x) : x ∈ [a, b]}, hence we have f(x) ≤ f(c_1) ∀ x ∈ [a, b], as required.

An important consequence of the above theorem is the following.

Lemma (Rolle’s Theorem): If f : [a, b] → R is continuous, if f(a) = f(b) = 0, and if f is differentiable at each point of (a, b), then there is a point c ∈ (a, b) with f′(c) = 0.

Proof: If f is identically zero then f′(c) = 0 for every point c ∈ (a, b), so assume f is not identically zero. Without loss of generality we may assume f(x) > 0 for some x ∈ (a, b) (otherwise this


property holds with −f in place of f). Then max f (which exists by Thm. 3.2) is positive and is attained at some point c ∈ (a, b). We claim that f′(c) = 0. Indeed, f′(c) = lim_{x→c} (f(x) − f(c))/(x − c) = lim_{x→c+} (f(x) − f(c))/(x − c) = lim_{x→c−} (f(x) − f(c))/(x − c), and the latter two limits are, respectively, ≤ 0 and ≥ 0. But they are equal, hence they must both be zero.

Corollary (Mean Value Theorem): If f : [a, b] → R is continuous and f is differentiable at each point of (a, b), then there is some point c ∈ (a, b) with f′(c) = (f(b) − f(a))/(b − a).

Proof: Apply Rolle’s Theorem to the function F(x) = f(x) − f(a) − ((f(b) − f(a))/(b − a))(x − a).

LECTURE 3 PROBLEMS

3.1 Give an example of a bounded function f : [0, 1] → R such that f is continuous at each point c ∈ [0, 1] except at c = 0, and such that f attains neither a maximum nor a minimum value.

3.2 Prove carefully (using the definition of continuity on p. 96) that the function f : [−1, 1] → R defined by

f(x) = +1 if 0 < x ≤ 1 ,   f(x) = 0 if −1 ≤ x ≤ 0

is not continuous at x = 0. (Hint: Show the definition fails with, e.g., ε = 1/2.)

3.3 Let f : [a, b] → R be continuous, and let |f| : [a, b] → R be defined by |f|(x) = |f(x)|. Prove that |f| is continuous.

3.4 Suppose f : [a, b] → R and c ∈ [a, b] are given, and suppose that lim f(x_n) = f(c) for all sequences {x_n}_{n=1,2,...} ⊂ [a, b] with lim x_n = c. Prove that f is continuous at c. (Hint: If not, ∃ ε > 0 such that (∗) on p. 96 fails for each δ > 0; in particular, ∃ ε > 0 such that (∗) fails for δ = 1/n ∀ integer n ≥ 1.)

3.5 If f : [0, 1] → R is defined by

f(x) = 1 if x ∈ [0, 1] is a rational number ,   f(x) = 0 if x ∈ [0, 1] is not rational ,

prove that f is continuous at no point of [0, 1].

Hint: Recall that any interval (c, d) ⊂ R (with c < d) contains both rational and irrational numbers.

3.6 Suppose f : (0, 1) → R is defined by f(x) = 0 if x ∈ (0, 1) is not a rational number, and f(x) = 1/q if x ∈ (0, 1) can be written in the form x = p/q with p, q positive integers without common factors. Prove that f is continuous at each irrational value x ∈ (0, 1).

Hint: First note that for a given ε > 0 there are at most finitely many positive integers q with 1/q ≥ ε.


3.7 Suppose f : [0, 1] → R is continuous, and f(x) = 0 for each rational point x ∈ [0, 1]. Prove f(x) = 0 for all x ∈ [0, 1].

3.8 If f : R → R is continuous at each point of R, and if f(x + y) = f(x) + f(y) ∀ x, y ∈ R, prove ∃ a constant a such that f(x) = ax ∀ x ∈ R. Show by example that the result is false if we drop the requirement that f be continuous.


LECTURE 4: SERIES OF REAL NUMBERS

Consider the series

a_1 + a_2 + · · · + a_n + · · ·

(usually written with summation notation as Σ_{n=1}^{∞} a_n), where a_1, a_2, . . . is a given sequence of real numbers; a_n is called the n-th term of the series. The sum of the first n terms is

s_n = Σ_{k=1}^{n} a_k ;

s_n is called the n-th partial sum of the series. If

s_n → s

(i.e., if lim s_n = s) for some s ∈ R, then we say the series converges, and has sum s. Also, in this case we write

s = Σ_{n=1}^{∞} a_n .

If sn does not converge, then we say the series diverges.

Example: If a ∈ R is given, then the series 1 + a + a^2 + · · · (i.e., the geometric series) has n-th partial sum

s_n = 1 + a + · · · + a^{n−1} = n if a = 1 ,   and = (1 − a^n)/(1 − a) if a ≠ 1 .

Using the fact that a^n → 0 if |a| < 1, we thus see that the series converges and has sum 1/(1 − a) if |a| < 1, whereas the series diverges for |a| ≥ 1. (Indeed {s_n} is unbounded if |a| > 1 or a = 1, and if a = −1, {s_n}_{n=1,2,...} = 1, 0, 1, 0, . . . .)

 The following simple lemma is of key importance.

Lemma 4.1. If Σ_{n=1}^{∞} a_n converges, then lim a_n = 0.

Note: The converse is not true. For example, we check below that Σ_{n=1}^{∞} 1/n does not converge, but its n-th term is 1/n, which does converge to zero.

Proof of Lemma 4.1: Let s = lim s_n. Then of course we also have s = lim s_{n+1}. But s_{n+1} − s_n = (a_1 + a_2 + · · · + a_{n+1}) − (a_1 + a_2 + · · · + a_n) = a_{n+1}, hence (see the remark following Thm. 2.3 of Lecture 2) we have lim a_{n+1} = lim s_{n+1} − lim s_n = s − s = 0, i.e., lim a_n = 0.

The following lemma is of theoretical and practical importance.

Lemma 4.2. If Σ_{n=1}^{∞} a_n, Σ_{n=1}^{∞} b_n both converge, and have sums s, t, respectively, and if α, β are real numbers, then Σ_{n=1}^{∞} (α a_n + β b_n) also converges and has sum αs + βt; i.e., Σ_{n=1}^{∞} (α a_n + β b_n) = α Σ_{n=1}^{∞} a_n + β Σ_{n=1}^{∞} b_n if both Σ_{n=1}^{∞} a_n, Σ_{n=1}^{∞} b_n converge.


Proof: Let s_n = Σ_{k=1}^{n} a_k, t_n = Σ_{k=1}^{n} b_k. We are given s_n → s and t_n → t; then α s_n + β t_n → αs + βt (see the remarks following Thm. 2.3 of Lecture 2). But α s_n + β t_n = α Σ_{k=1}^{n} a_k + β Σ_{k=1}^{n} b_k = Σ_{k=1}^{n} (α a_k + β b_k), which is the n-th partial sum of Σ_{n=1}^{∞} (α a_n + β b_n).

There is a very convenient criterion for checking convergence in case all the terms are nonnegative. Indeed, in this case

s_{n+1} − s_n = a_{n+1} ≥ 0   ∀ n ≥ 1 ,

hence the sequence {s_n} is increasing if a_n ≥ 0. Thus, by Thm. 2.1(i) of Lecture 2 we see that {s_n} converges if and only if it is bounded. That is, we have proved:

Lemma 4.3. If each term of Σ_{n=1}^{∞} a_n is nonnegative (i.e., if a_n ≥ 0 ∀ n), then the series converges if and only if the sequence of partial sums (i.e., {s_n}_{n=1,2,...}) is bounded.

Example. Using the above criterion, we can discuss convergence of Σ_{n=1}^{∞} 1/n^p, where p > 0 is given. The n-th partial sum in this case is

s_n = Σ_{k=1}^{n} 1/k^p .

Since 1/x^p is a decreasing function of x for x > 0, we have, for each integer k ≥ 1,

1/(k+1)^p ≤ 1/x^p ≤ 1/k^p   ∀ x ∈ [k, k + 1] .

Integrating, this gives

1/(k+1)^p ≡ ∫_k^{k+1} 1/(k+1)^p dx ≤ ∫_k^{k+1} 1/x^p dx ≤ ∫_k^{k+1} 1/k^p dx ≡ 1/k^p ,

so if we sum from k = 1 to n, we get

Σ_{k=1}^{n} 1/(k+1)^p ≤ ∫_1^{n+1} 1/x^p dx ≤ Σ_{k=1}^{n} 1/k^p .

That is,

s_{n+1} − 1 ≤ ∫_1^{n+1} 1/x^p dx ≤ s_n   ∀ n ≥ 1 .

But

∫_1^{n+1} 1/x^p dx = log(n + 1) if p = 1 ,   and = ((n+1)^{1−p} − 1)/(1 − p) if p ≠ 1 .

Thus, we see that {s_n} is unbounded if p ≤ 1 and bounded for p > 1, hence from Lem. 4.3 we conclude that Σ_{n=1}^{∞} 1/n^p converges for p > 1 and diverges for p ≤ 1.


Remark. The above method can be modified to discuss convergence of other series. See Exercise 4.6 below.
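The dichotomy in the example is easy to see numerically. A brief Python illustration (our own, not part of the text) of the partial sums for p = 1 and p = 2:

    # Partial sums s_n of sum 1/k^p for p = 1 (divergent) and p = 2 (convergent):
    for p in (1, 2):
        for n in (100, 10000, 1000000):
            s = sum(1.0 / k**p for k in range(1, n + 1))
            print(f"p={p}, n={n}: s_n = {s:.6f}")
    # For p = 1 the sums grow like log(n + 1); for p = 2 they approach a
    # finite limit (in fact pi^2/6 = 1.6449...), as the example predicts.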

Theorem 4.4. If Σ_{n=1}^{∞} |a_n| converges, then Σ_{n=1}^{∞} a_n converges.

Terminology: If Σ_{n=1}^{∞} |a_n| converges, then we say that Σ_{n=1}^{∞} a_n is absolutely convergent. Thus, with this terminology, the above theorem just says “absolute convergence ⇒ convergence.”

Proof of Theorem 4.4: Let s_n = Σ_{k=1}^{n} a_k, t_n = Σ_{k=1}^{n} |a_k|. Then we are given t_n → t for some t ∈ R.

For each integer n ≥ 1, let

p_n = a_n if a_n ≥ 0 , p_n = 0 if a_n < 0 ;   q_n = −a_n if a_n ≤ 0 , q_n = 0 if a_n > 0 ,

and let s_n^+ = Σ_{k=1}^{n} p_k, s_n^− = Σ_{k=1}^{n} q_k. Notice that for each n ≥ 1 we then have

a_n = p_n − q_n ,   s_n = s_n^+ − s_n^− ,
|a_n| = p_n + q_n ,   t_n = s_n^+ + s_n^− ,

and p_n, q_n ≥ 0. Also,

0 ≤ s_n^+ ≤ t_n ≤ t   and   0 ≤ s_n^− ≤ t_n ≤ t .

Hence, we have shown that Σ_{n=1}^{∞} p_n, Σ_{n=1}^{∞} q_n have bounded partial sums; hence, by Lem. 4.3, both Σ_{n=1}^{∞} p_n, Σ_{n=1}^{∞} q_n converge. But then (by Lem. 4.2) Σ_{n=1}^{∞} (p_n − q_n) converges, i.e., Σ_{n=1}^{∞} a_n converges.

Rearrangement of series: We want to show that the terms of an absolutely convergent series can be rearranged in an arbitrary way without changing the sum. First we make the definition clear.

Definition: Let j_1, j_2, . . . be any sequence of positive integers in which every positive integer appears once and only once (i.e., the mapping n ↦ j_n is a 1:1 mapping of the positive integers onto the positive integers). Then the series Σ_{n=1}^{∞} a_{j_n} is said to be a rearrangement of the series Σ_{n=1}^{∞} a_n.

Theorem 4.5. If Σ_{n=1}^{∞} a_n is absolutely convergent, then any rearrangement Σ_{n=1}^{∞} a_{j_n} converges, and has the same sum as Σ_{n=1}^{∞} a_n.

Proof: We give the proof when a_n ≥ 0 ∀ n (in which case “absolute convergence” just means “convergence”). The extension to the general case is left as an exercise. (See Exercise 4.8 below.)

Hence, assume Σ_{n=1}^{∞} a_n converges and a_n ≥ 0 ∀ integer n ≥ 1, and let Σ_{n=1}^{∞} a_{j_n} be any rearrangement. For each n ≥ 1, let

P(n) = max{j_1, . . . , j_n} ,


so that {j_1, . . . , j_n} ⊂ {1, . . . , P(n)} and hence (since a_k ≥ 0 ∀ k)

a_{j_1} + a_{j_2} + · · · + a_{j_n} ≤ a_1 + · · · + a_{P(n)} ≤ s ,

where s = Σ_{n=1}^{∞} a_n. Thus, we have shown that the partial sums of Σ_{n=1}^{∞} a_{j_n} are bounded above by s; hence by Lem. 4.3, Σ_{n=1}^{∞} a_{j_n} converges, and has sum t satisfying t ≤ s. But Σ_{n=1}^{∞} a_n is a rearrangement of Σ_{n=1}^{∞} a_{j_n} (using the rearrangement given by the inverse mapping j_n ↦ n), and hence by the same argument we also have s ≤ t. Hence, s = t as required.

LECTURE 4 PROBLEMS

The first few problems give various criteria to test for absolute convergence (and also, in some cases, for divergence).

4.1 (i) (Comparison test.) If Σ_{n=1}^{∞} a_n, Σ_{n=1}^{∞} b_n are given series and if |a_n| ≤ |b_n| ∀ n ≥ 1, prove: Σ_{n=1}^{∞} b_n absolutely convergent ⇒ Σ_{n=1}^{∞} a_n absolutely convergent.

(ii) Use this to discuss convergence of:

(a) Σ_{n=1}^{∞} (sin n)/n^2   (b) Σ_{n=1}^{∞} sin(1/n)/n .

4.2 (Comparison test for divergence.) If Σ_{n=1}^{∞} a_n, Σ_{n=1}^{∞} b_n are given series with nonnegative terms, and if a_n ≥ b_n ∀ n ≥ 1, prove: Σ_{n=1}^{∞} b_n diverges ⇒ Σ_{n=1}^{∞} a_n diverges.

4.3 (i) (Ratio test.) If a_n ≠ 0 ∀ n, if λ ∈ (0, 1), and if there is an integer N ≥ 1 such that |a_{n+1}|/|a_n| ≤ λ ∀ n ≥ N, prove that Σ_{n=1}^{∞} a_n is absolutely convergent. (Hint: First use induction to show that |a_n| ≤ λ^{n−N} |a_N| ∀ n ≥ N.)

(ii) (Ratio test for divergence.) If a_n ≠ 0 ∀ n and if ∃ an integer N ≥ 1 such that |a_{n+1}|/|a_n| ≥ 1 ∀ n ≥ N, then prove Σ_{n=1}^{∞} a_n diverges.

4.4 (Cauchy root test.) (i) Suppose ∃ λ ∈ (0, 1) and an integer N ≥ 1 such that |a_n|^{1/n} ≤ λ ∀ n ≥ N. Prove that Σ_{n=1}^{∞} a_n converges.

(ii) Use the Cauchy root test to discuss convergence of Σ_{n=1}^{∞} n^2 x^n. (Here x ∈ R is given—consider the possibilities |x| < 1, |x| > 1, |x| = 1.)

4.5 Suppose a_n ≥ 0 ∀ n ≥ 1 and Σ_{n=1}^{∞} a_n diverges. Prove:

(i) Σ_{n=1}^{∞} a_n/(1 + a_n) diverges;

(ii) Σ_{n=1}^{∞} a_n/(1 + n^2 a_n) converges.

4.6 (Integral test.) If f : [1, ∞) → R is positive and continuous at each point of [1, ∞), and if f is decreasing, i.e., x < y ⇒ f(y) ≤ f(x), prove, using a modification of the argument on pp. 100–102, that Σ_{n=1}^{∞} f(n) converges if and only if {∫_1^n f(x) dx}_{n=1,2,...} is bounded.


4.7 Use the integral test (in Exercise 4.6 above) to discuss convergence of:

(i) Σ_{n=2}^{∞} 1/(n log n)   (ii) Σ_{n=2}^{∞} 1/(n (log n)^{1+ε}), where ε > 0 is a given constant.

4.8 Complete the proof of Thm. 4.5 (i.e., discuss the general case when Σ |a_n| converges). (Hint: The theorem has already been established for series of nonnegative terms; use p_n, q_n as in Thm. 4.4.)


LECTURE 5: POWER SERIES

A power series is a series of the form Σ_{n=0}^{∞} a_n x^n, where a_0, a_1, . . . are given real numbers and x is a real variable. Here we use the standard convention that x^0 = 1, so the first term a_0 x^0 just means a_0.

Notice that for x = 0 the series trivially converges and its sum is a0.

 The following lemma describes the key convergence property of such series.

Lemma 1. If the series Σ_{n=0}^{∞} a_n x^n converges for some value x = c, then the series converges absolutely for every x with |x| < |c|.

Proof: Σ_{n=0}^{∞} a_n c^n converges ⇒ lim a_n c^n = 0 ⇒ {a_n c^n}_{n=1,2,...} is a bounded sequence. That is, there is a fixed constant M > 0 such that |a_n c^n| ≤ M ∀ n = 0, 1, . . . , and so |x| < |c| ⇒, for any j = 0, 1, . . . ,

|a_j x^j| = |a_j c^j| |x/c|^j ≤ M |x/c|^j = M r^j ,   r = |x|/|c| < 1 ,

and hence

Σ_{j=0}^{n−1} |a_j x^j| ≤ M Σ_{j=0}^{n−1} r^j = M (1 − r^n)/(1 − r) ≤ M/(1 − r) ,   n = 1, 2, . . . .

Thus, the series Σ_{n=0}^{∞} |a_n x^n| is convergent, because it has nonnegative terms and we’ve shown its partial sums are bounded. This completes the proof.

 We can now directly apply the above lemma to establish the following basic property of power series.

Theorem 5.1. For any given power series Σ_{n=0}^{∞} a_n x^n, exactly one of the following 3 possibilities holds:

(i) the series diverges ∀ x ≠ 0, or

(ii) the series converges absolutely ∀ x ∈ R, or

(iii) ∃ ρ > 0 such that the series converges absolutely ∀ x with |x| < ρ, and diverges ∀ x with |x| > ρ.

Terminology: If (iii) holds, the number ρ is called the radius of convergence and the interval (−ρ, ρ) is called the interval of convergence. If (i) holds we say the radius of convergence is zero, and if (ii) holds we say the radius of convergence = ∞.

Note: The theorem says nothing about what happens at x = ±ρ in case (iii).

Proof of Theorem 5.1: Consider the set S ⊂ R defined by

S = {|x| : Σ_{n=0}^{∞} a_n x^n converges} .

Notice that we always have 0 ∈ S, so S is nonempty. If S = {0} then case (i) holds, so we can assume S contains at least one c with c ≠ 0. If S is not bounded then by Lem. 1 we have that Σ_{n=0}^{∞} a_n x^n is absolutely convergent (A.C.) on (−R, R) for each R > 0, and hence (ii) holds. If


S ≠ {0} is bounded, then R = sup S exists and is positive. Now for any x ∈ (−R, R) we have c ∈ S with |c| > |x| (otherwise |c| ≤ |x| for each c ∈ S, meaning that |x| would be an upper bound for S smaller than R, contradicting R = sup S), and hence by Lem. 1 Σ a_n x^n is A.C. So, in fact, Σ_{n=0}^{∞} a_n x^n is A.C. for each x ∈ (−R, R). We must of course also have Σ_{n=0}^{∞} a_n x^n diverges for each x with |x| > R, because otherwise we would have x_0 with |x_0| > R and Σ_{n=0}^{∞} a_n x_0^n convergent, hence |x_0| ∈ S, which contradicts the fact that R is an upper bound for S.

Suppose now that a given power series Σ_{n=0}^{∞} a_n x^n has radius of convergence ρ > 0 (we include here the case ρ = ∞, which is to be interpreted to mean that the series converges absolutely for all x ∈ R).

A reasonable and natural question is whether or not we can also expand f(x) = Σ_{n=0}^{∞} a_n x^n in terms of powers of x − α for given α ∈ (−ρ, ρ). The following theorem shows that we can do this for |x − α| < ρ − |α| (and for all x in case ρ = ∞).

Theorem 5.2 (Change of Base-Point). If $f(x) = \sum_{n=0}^\infty a_n x^n$ has radius of convergence $\rho > 0$ or $\rho = \infty$, and if $|\alpha| < \rho$ ($\alpha$ arbitrary in case $\rho = \infty$), then we can also write

$$(*)\qquad f(x) = \sum_{m=0}^\infty b_m (x-\alpha)^m \quad \forall x \text{ with } |x-\alpha| < \rho - |\alpha| \quad (x, \alpha \text{ arbitrary if } \rho = \infty),$$

where $b_m = \sum_{n=m}^\infty \binom{n}{m} a_n \alpha^{n-m}$ (so, in particular, $b_0 = f(\alpha)$); part of the conclusion here is that the series for $b_m$ converges, and the series $\sum_{m=0}^\infty b_m (x-\alpha)^m$ converges for the stated values of $x, \alpha$.

Note: The series on the right in (∗) is a power series in powers of $x - \alpha$, hence the fact that it converges for $|x - \alpha| < \rho - |\alpha|$ means that it has radius of convergence (as a power series in powers of $x - \alpha$) $\ge \rho - |\alpha|$ (and radius of convergence $= \infty$ in case $\rho = \infty$). Thus, in particular, the series on the right of (∗) automatically converges absolutely for $|x - \alpha| < \rho - |\alpha|$ by Lem. 1.

Proof of Theorem 5.2: We take any $\alpha$ with $|\alpha| < \rho$ and any $x$ with $|x-\alpha| < \rho - |\alpha|$ ($\alpha, x$ arbitrary if $\rho = \infty$), and we look at the partial sum $S_N = \sum_{n=0}^N a_n x^n$. Since the Binomial Theorem tells us that $x^n = (\alpha + (x-\alpha))^n = \sum_{m=0}^n \binom{n}{m} \alpha^{n-m}(x-\alpha)^m$, we see that $S_N$ can be written

$$\sum_{n=0}^N a_n x^n = \sum_{n=0}^N a_n \sum_{m=0}^n \binom{n}{m} \alpha^{n-m}(x-\alpha)^m.$$

Using the interchange of sums formula (see Problem 6.3 below)

$$\sum_{n=0}^N \sum_{m=0}^n c_{nm} = \sum_{m=0}^N \sum_{n=m}^N c_{nm},$$

this then gives

$$(1)\qquad \sum_{n=0}^N a_n x^n = \sum_{m=0}^N \Bigl(\sum_{n=m}^N \binom{n}{m} a_n \alpha^{n-m}\Bigr)(x-\alpha)^m.$$

Now since $\binom{n}{m}^{1/n} \to 1$ as $n \to \infty$ for each fixed $m$, we see that for any $\varepsilon > 0$ there is $n_0$ such that $\binom{n}{m} \le (1+\varepsilon)^n$ for all $n \ge n_0$, hence $|\binom{n}{m} a_n x^n| \le |a_n((1+\varepsilon)x)^n|$ $\forall n \ge n_0$, and hence by the comparison test $\sum_{n=0}^\infty \binom{n}{m} a_n x^n$ also converges absolutely for all $x \in (-\rho, \rho)$ (because $|x| < \rho \Rightarrow (1+\varepsilon)|x| < \rho$ for suitable $\varepsilon > 0$). Thus, since $|\alpha| < \rho$ we have, in particular, that $\sum_n \binom{n}{m} a_n \alpha^{n-m}$ is absolutely convergent, and we can substitute

$$\sum_{n=m}^N \binom{n}{m} a_n \alpha^{n-m} = \sum_{n=m}^\infty \binom{n}{m} a_n \alpha^{n-m} - \sum_{n=N+1}^\infty \binom{n}{m} a_n \alpha^{n-m}$$

in (1) above, whence (1) gives

$$\sum_{n=0}^N a_n x^n = \sum_{m=0}^N \Bigl(\sum_{n=m}^\infty \binom{n}{m} a_n \alpha^{n-m}\Bigr)(x-\alpha)^m - \sum_{m=0}^N \Bigl(\sum_{n=N+1}^\infty \binom{n}{m} a_n \alpha^{n-m}\Bigr)(x-\alpha)^m,$$

and

$$\begin{aligned}
\Bigl|\sum_{m=0}^N \Bigl(\sum_{n=N+1}^\infty \binom{n}{m} a_n \alpha^{n-m}\Bigr)(x-\alpha)^m\Bigr|
&\le \sum_{m=0}^N \sum_{n=N+1}^\infty \binom{n}{m} |a_n|\,|\alpha|^{n-m}|x-\alpha|^m\\
&\equiv \sum_{n=N+1}^\infty \sum_{m=0}^N \binom{n}{m} |a_n|\,|\alpha|^{n-m}|x-\alpha|^m\\
&\le \sum_{n=N+1}^\infty \sum_{m=0}^n \binom{n}{m} |a_n|\,|\alpha|^{n-m}|x-\alpha|^m\\
&\equiv \sum_{n=N+1}^\infty |a_n|\bigl(|\alpha|+|x-\alpha|\bigr)^n, \qquad |\alpha|+|x-\alpha| < \rho,
\end{aligned}$$

where we used the Binomial Theorem again in the last line. Now observe that we have

$$\sum_{n=N+1}^\infty |a_n|(|\alpha|+|x-\alpha|)^n = \sum_{n=1}^\infty |a_n|(|\alpha|+|x-\alpha|)^n - \sum_{n=1}^N |a_n|(|\alpha|+|x-\alpha|)^n \to 0 \text{ as } N \to \infty,$$

so the above shows that $\lim_{N\to\infty} \sum_{m=0}^N b_m(x-\alpha)^m$ exists (and is real), i.e., the series $\sum_{m=0}^\infty b_m(x-\alpha)^m$ converges for $|x-\alpha| < \rho - |\alpha|$, and that the sum of the series agrees with $\sum_{n=0}^\infty a_n x^n$, so the proof of Thm. 5.2 is complete.
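As a concrete check of Thm. 5.2 (a sketch added here, not from the text), take the geometric series $f(x) = \sum_{n=0}^\infty x^n = 1/(1-x)$ with $\rho = 1$, recenter it at $\alpha = 1/2$, and compare the recentered sum with $1/(1-x)$; the truncation levels below are arbitrary choices made comfortably past convergence:

```python
from math import comb

# A sketch (not from the text): change of base-point for the geometric
# series a_n = 1 (rho = 1), recentered at alpha.
alpha, x = 0.5, 0.7      # |x - alpha| = 0.2 < rho - |alpha| = 0.5
N_INNER, N_OUTER = 200, 60

# b_m = sum_{n >= m} C(n, m) * a_n * alpha^(n - m), truncated at N_INNER
b = [sum(comb(n, m) * alpha ** (n - m) for n in range(m, N_INNER))
     for m in range(N_OUTER)]

print(b[0])                                  # b_0 = f(alpha) = 2, as claimed
recentered = sum(b[m] * (x - alpha) ** m for m in range(N_OUTER))
print(recentered, 1.0 / (1.0 - x))           # both ~ 3.3333
```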

LECTURE 5 PROBLEMS

5.1 (i) Suppose the radius of convergence of $\sum_{n=0}^\infty a_n x^n$ is 1, and the radius of convergence of $\sum_{n=0}^\infty b_n x^n$ is 2. Prove that $\sum_{n=0}^\infty (a_n + b_n)x^n$ has radius of convergence 1.

(Hint: Lemma 1 guarantees that $\{a_n x^n\}_{n=1,2,\ldots}$ is unbounded if $|x|$ is greater than the radius of convergence of $\sum_{n=0}^\infty a_n x^n$.)

(ii) If both $\sum_{n=0}^\infty a_n x^n$, $\sum_{n=0}^\infty b_n x^n$ have radius of convergence $= 1$, show that

(a) the radius of convergence of $\sum_{n=0}^\infty (a_n+b_n)x^n$ is $\ge 1$, and

(b) for any given number $R > 1$ you can construct examples with radius of convergence of $\sum_{n=0}^\infty (a_n+b_n)x^n = R$.

5.2 If $\exists$ constants $c, k > 0$ such that $c^{-1}n^{-k} \le |a_n| \le c n^k$ $\forall n = 1, 2, \ldots$, what can you say about the radius of convergence of $\sum_{n=0}^\infty a_n x^n$?


LECTURE 6: TAYLOR SERIES REPRESENTATIONS

The change of base-point theorem proved in Lecture 5 is actually quite strong; for example, it makes it almost trivial to check that a power series is differentiable arbitrarily many times inside its interval of convergence (i.e., a power series is "$C^\infty$" in its interval of convergence), and furthermore all the derivatives can be correctly calculated simply by differentiating each term (i.e., "termwise" differentiation is a valid method for computing the derivatives of a power series in its interval of convergence). Specifically, we have:

Theorem 6.1. Suppose the power series $\sum_{n=0}^\infty a_n x^n$ has radius of convergence $\rho > 0$ (or $\rho = \infty$), and let $f(x) = \sum_{n=0}^\infty a_n x^n$ for $|x| < \rho$. Then all derivatives $f^{(m)}(x)$, $m = 1, 2, \ldots$, exist at every point $x$ with $|x| < \rho$, and, in fact,

$$f^{(m)}(x) = \sum_{n=m}^\infty n(n-1)\cdots(n-m+1)\, a_n x^{n-m}, \qquad x \in (-\rho, \rho),$$

which says precisely that the derivatives of $f$ can be correctly computed simply by differentiating the series $\sum_{n=0}^\infty a_n x^n$ termwise (because $n(n-1)\cdots(n-m+1) a_n x^{n-m}$ is just the $m$th derivative of $a_n x^n$); that is,

$$\frac{d^m}{dx^m}\sum_{n=0}^\infty a_n x^n = \sum_{n=0}^\infty \frac{d^m}{dx^m}\bigl(a_n x^n\bigr) \qquad \text{for } |x| < \rho.$$

Proof: It is enough to check the stated result for $m = 1$, because then the general result follows directly by induction on $m$.

The proof for $m = 1$ is an easy consequence of Thm. 5.2 (change of base-point theorem), which tells us that we can write

$$(1)\qquad f(x) - f(\alpha) = \sum_{m=1}^\infty b_m(x-\alpha)^m \equiv (x-\alpha)b_1 + (x-\alpha)\sum_{m=2}^\infty b_m(x-\alpha)^{m-1}$$

for $|x-\alpha| < \rho - |\alpha|$, where $b_m = \sum_{n=m}^\infty \binom{n}{m} a_n \alpha^{n-m}$. That is,

$$(2)\qquad \frac{f(x)-f(\alpha)-b_1(x-\alpha)}{x-\alpha} = \sum_{m=2}^\infty b_m(x-\alpha)^{m-1}, \qquad 0 < |x-\alpha| < \rho - |\alpha|,$$

and the expression on the right is a convergent power series with radius of convergence at least $r = \rho - |\alpha| > 0$ and hence is A.C. if $x-\alpha$ is in the interval of convergence $(-r, r)$. In particular, it is A.C. when $|x-\alpha| = r/2$, and hence (2) shows that

$$(3)\qquad \Bigl|\frac{f(x)-f(\alpha)}{x-\alpha} - b_1\Bigr| \le \sum_{m=2}^\infty |b_m|\,|x-\alpha|^{m-1} \le |x-\alpha|\sum_{m=2}^\infty |b_m|(r/2)^{m-2}$$

for $0 < |x-\alpha| < r/2$ (where $r = \rho - |\alpha| > 0$). Since the right side in (3) $\to 0$ as $x \to \alpha$, this shows that $f'(\alpha)$ exists and is equal to $b_1 = \sum_{n=1}^\infty n a_n \alpha^{n-1}$.
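For instance (a numerical sketch added here, not from the text), termwise differentiation of the geometric series gives $\sum_{n\ge1} n x^{n-1} = 1/(1-x)^2$ on $(-1,1)$, which is easy to confirm:

```python
# A sketch (not from the text): Thm. 6.1 applied to f(x) = sum_n x^n,
# whose termwise derivative should equal 1/(1-x)^2 for |x| < 1.
x, N = 0.3, 400
termwise = sum(n * x ** (n - 1) for n in range(1, N))
print(termwise, 1.0 / (1.0 - x) ** 2)   # both ~ 2.04081632...
```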

We now turn to the important question of which functions $f$ can be expressed as a power series on some interval. Since we have shown power series are differentiable to all orders in their interval of convergence, a necessary condition is clearly that $f$ is differentiable to all orders; however, this is not sufficient; see Exercise 6.2 below. To get a reasonable sufficient condition, we need the following theorem.

Theorem 6.2 (Taylor's Theorem). Suppose $r > 0$, $\alpha \in \mathbb{R}$, and $f$ is differentiable to order $m+1$ on the interval $|x-\alpha| < r$. Then $\forall x$ with $|x-\alpha| < r$, $\exists\, c$ between $\alpha$ and $x$ such that

$$f(x) = \sum_{n=0}^m \frac{f^{(n)}(\alpha)}{n!}(x-\alpha)^n + \frac{f^{(m+1)}(c)}{(m+1)!}(x-\alpha)^{m+1}.$$

Proof: Fix $x$ with $0 < x-\alpha < r$ (a similar argument holds in case $-r < x-\alpha < 0$), and, for $|t-\alpha| < r$, define

$$(1)\qquad g(t) = f(t) - \sum_{n=0}^m \frac{f^{(n)}(\alpha)}{n!}(t-\alpha)^n - M(t-\alpha)^{m+1},$$

where $M$ (constant) is chosen so that $g(x) = 0$, i.e.,

$$M = \frac{f(x) - \sum_{n=0}^m \frac{f^{(n)}(\alpha)}{n!}(x-\alpha)^n}{(x-\alpha)^{m+1}}.$$

Notice that, by direct computation in (1),

$$(2)\qquad g^{(n)}(\alpha) = 0 \ \ \forall n = 0, \ldots, m, \qquad g^{(m+1)}(t) = f^{(m+1)}(t) - M(m+1)!, \quad |t-\alpha| < r.$$

In particular, since $g(\alpha) = g(x) = 0$, the mean value theorem tells us that there is $c_1 \in (\alpha, x)$ such that $g'(c_1) = 0$. But then $g'(\alpha) = g'(c_1) = 0$, and hence again by the mean value theorem there is $c_2 \in (\alpha, c_1)$ such that $g''(c_2) = 0$. After $m+1$ such steps we deduce that there is $c_{m+1} \in (\alpha, x)$ such that $g^{(m+1)}(c_{m+1}) = 0$. However, by (2), $g^{(m+1)}(t) = f^{(m+1)}(t) - M(m+1)!$, hence this gives

$$M = \frac{f^{(m+1)}(c_{m+1})}{(m+1)!}.$$

In view of our definition of $M$, this proves the theorem with $c = c_{m+1}$.
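The point $c$ is rarely computable in closed form, but for simple $f$ one can solve for it numerically. The following sketch (added here, not from the text) does this for $f = \exp$, $\alpha = 0$, $x = 1$, $m = 3$, where the theorem promises some $c \in (0, 1)$ with remainder $e^c/(m+1)!$:

```python
import math

# A sketch (not from the text): locating the intermediate point c of
# Taylor's theorem for f = exp, alpha = 0, x = 1, m = 3.
m, x = 3, 1.0
poly = sum(x ** n / math.factorial(n) for n in range(m + 1))
remainder = math.exp(x) - poly
c = math.log(remainder * math.factorial(m + 1))  # solve e^c/(m+1)! = remainder
print(remainder, c)   # remainder ~ 0.051615, and c ~ 0.214 lies in (0, 1)
```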

Theorem 6.2 gives us a satisfactory sufficient condition for writing $f$ in terms of a power series. Specifically, we have:

Theorem 6.3. If $f(x)$ is differentiable to all orders in $|x-\alpha| < r$, and if there is a constant $C > 0$ such that

$$(*)\qquad \Bigl|\frac{f^{(n)}(x)}{n!}\Bigr|\, r^n \le C \qquad \forall n \ge 0 \text{ and } \forall x \text{ with } |x-\alpha| < r,$$

then $\sum_{n=0}^\infty \frac{f^{(n)}(\alpha)}{n!}(x-\alpha)^n$ converges, and has sum $f(x)$, for every $x$ with $|x-\alpha| < r$.


Note 1: Whether or not (∗) holds, and whether or not $\sum_{n=0}^\infty \frac{f^{(n)}(\alpha)}{n!}(x-\alpha)^n$ converges to $f(x)$, we call the series $\sum_{n=0}^\infty \frac{f^{(n)}(\alpha)}{n!}(x-\alpha)^n$ the Taylor series of $f$ about $\alpha$.

Note 2: Even if the Taylor series converges in some interval $|x-\alpha| < r$, it may fail to have sum $f(x)$ (see Problem 6.1 below). Of course the above theorem tells us that it will have sum $f(x)$ in case the additional condition (∗) holds.

Proof of Theorem 6.3: The condition (∗) guarantees that the term $\frac{f^{(m+1)}(c)}{(m+1)!}(x-\alpha)^{m+1}$ on the right in Thm. 6.2 has absolute value $\le C\bigl(\frac{|x-\alpha|}{r}\bigr)^{m+1}$, and hence Thm. 6.2, with $m = N$, ensures

$$\Bigl|f(x) - \sum_{n=0}^N \frac{f^{(n)}(\alpha)}{n!}(x-\alpha)^n\Bigr| \le C\Bigl(\frac{|x-\alpha|}{r}\Bigr)^{N+1} \to 0 \text{ as } N \to \infty$$

if $|x-\alpha| < r$. Hence,

$$\lim_{N\to\infty}\sum_{n=0}^N \frac{f^{(n)}(\alpha)}{n!}(x-\alpha)^n = f(x)$$

whenever $|x-\alpha| < r$; i.e., $\sum_{n=0}^\infty \frac{f^{(n)}(\alpha)}{n!}(x-\alpha)^n = f(x)$, $|x-\alpha| < r$.
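For example, every derivative of $\cos$ is bounded by 1 in absolute value, so (∗) holds on $|x| < r$ for every $r$ (with $C$ depending on $r$), and the Taylor series about 0 must converge to $\cos x$ everywhere. A quick sketch (added here, not from the text):

```python
import math

# A sketch (not from the text): partial sums of the Taylor series of cos
# about 0, which Thm. 6.3 guarantees converge to cos(x) for every real x.
x = 2.0
for N in (2, 5, 10, 20):
    partial = sum((-1) ** n * x ** (2 * n) / math.factorial(2 * n)
                  for n in range(N + 1))
    print(N, partial)        # -> cos(2) = -0.4161468...
```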

LECTURE 6 PROBLEMS

6.1 Prove that the function $f$, defined by

$$f(x) = \begin{cases} e^{-1/x^2} & \text{if } x \ne 0\\ 0 & \text{if } x = 0, \end{cases}$$

is $C^\infty$ on $\mathbb{R}$ and satisfies $f^{(m)}(0) = 0$ $\forall m \ge 0$.

Note: This means the Taylor series of $f(x)$ about 0 is identically zero; i.e., it is an example where the Taylor series converges, but the sum is not $f(x)$.
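A numerical look (a sketch added here, not from the text) makes the flatness at 0 vivid: $e^{-1/x^2}$ is crushed below every power of $x$ as $x \to 0$:

```python
import math

# A sketch (not from the text): f(x) = exp(-1/x^2) vanishes at 0 faster
# than any power of x, which is why every Taylor coefficient at 0 is zero.
for x in (0.5, 0.2, 0.1, 0.05):
    f = math.exp(-1.0 / x ** 2)
    print(x, f, f / x ** 10)   # even f(x)/x^10 -> 0 as x -> 0
```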

6.2 If $f$ is as in 6.1, prove that there does not exist any interval $(-\varepsilon, \varepsilon)$ on which $f$ is represented by a power series; that is, there cannot be a power series $\sum_{n=0}^\infty a_n x^n$ such that $f(x) = \sum_{n=0}^\infty a_n x^n$ for all $x \in (-\varepsilon, \varepsilon)$.

6.3 Let $b_{nm}$ be arbitrary real numbers, $0 \le n \le N$, $0 \le m \le n$. Prove

$$\sum_{n=0}^N \sum_{m=0}^n b_{nm} = \sum_{m=0}^N \sum_{n=m}^N b_{nm}.$$

Hint: Define

$$\tilde b_{nm} = \begin{cases} b_{nm} & \text{if } m \le n\\ 0 & \text{if } n < m \le N. \end{cases}$$


6.4 Find the Taylor series about $x = 0$ of the following functions; in each case prove that the series converges to the function in the indicated interval.

(i) $\frac{1}{1-x^2}$, $|x| < 1$ (Hint: $\frac{1}{1-y} = 1 + y + y^2 + \cdots$, $|y| < 1$).

(ii) $\log(1+x)$, $|x| < 1$.

(iii) $e^x$, $x \in \mathbb{R}$.

(iv) $e^{x^2}$, $x \in \mathbb{R}$ (Hint: set $y = x^2$).

6.5 (The analytic definition of the functions $\cos x$, $\sin x$, and the number $\pi$.)

Let $\sin x$, $\cos x$ be defined by $\sin x = \sum_{n=0}^\infty (-1)^n \frac{x^{2n+1}}{(2n+1)!}$, $\cos x = \sum_{n=0}^\infty (-1)^n \frac{x^{2n}}{(2n)!}$. For convenience of notation, write $C(x) = \cos x$, $S(x) = \sin x$. Prove:

(i) The series defining $S(x)$, $C(x)$ both have radius of convergence $\infty$, and $S'(x) \equiv C(x)$, $C'(x) \equiv -S(x)$ for all $x \in \mathbb{R}$.

(ii) $S^2(x) + C^2(x) \equiv 1$ for all $x \in \mathbb{R}$. Hint: Differentiate and use (i).

(iii) $\sin$, $\cos$ (as defined above) are the unique functions $S$, $C$ on $\mathbb{R}$ with the properties (a) $S(0) = 0$, $C(0) = 1$ and (b) $S'(x) = C(x)$, $C'(x) = -S(x)$ for all $x \in \mathbb{R}$. Hint: Thus, you have to show that $\bar S = S$ and $\bar C = C$ assuming that properties (a), (b) hold with $\bar S$, $\bar C$ in place of $S$, $C$, respectively; show that Thm. 6.3 is applicable.

(iv) $C(2) < 0$, and hence there is $p \in (0, 2)$ such that $C(p) = 0$, $S(p) = 1$, and $C(x) > 0$ for all $x \in [0, p)$.

Hint: $C(x)$ can be written $1 - \frac{x^2}{2} + \frac{x^4}{24} - \sum_{k=1}^\infty \Bigl(\frac{x^{4k+2}}{(4k+2)!} - \frac{x^{4k+4}}{(4k+4)!}\Bigr)$.

(v) $S$, $C$ are periodic on $\mathbb{R}$ with period $4p$. Hint: Start by defining $\bar C(x) = S(x+p)$ and $\bar S(x) = -C(x+p)$ and use the uniqueness result of (iii) above.

Note: The number $2p$, i.e., twice the value of the first positive zero of $\cos x$, is rather important, and we have a special name for it: it is usually denoted by $\pi$. Calculation shows that $\pi = 3.14159\ldots$.

(vi) $\gamma(x) = (C(x), S(x))$, $x \in [0, 2\pi]$, is a $C^1$ curve, the mapping $\gamma|[0, 2\pi)$ is a 1:1 map of $[0, 2\pi)$ onto the unit circle $S^1$ of $\mathbb{R}^2$, and the arc-length of the part of the curve $\gamma|[0, t]$ is $t$ for each $t \in [0, 2\pi]$. (See Figure A.2.)

Remark: Thus, we can geometrically think of the angle between $e_1$ and $(C(t), S(t))$ as $t$ (that's $t$ radians, meaning the arc on the unit circle going from $e_1$ to $P = (C(t), S(t))$ (counterclockwise) has length $t$, as you are asked to prove in the above question), and we have the geometric interpretation that $C(t)\,(= \cos t)$ and $S(t)\,(= \sin t)$ are, respectively, the lengths of the adjacent and the opposite sides of the right triangle with vertices at $(0,0)$, $(C(t), 0)$, $(C(t), S(t))$ and angle $t$ at the vertex $(0,0)$, at least if $0 < t < p = \frac{\pi}{2}$. (Notice this is now a theorem concerning $\cos t$, $\sin t$, as distinct from a definition.)


Figure A.2:

Note: Part of the conclusion of (vi) is that the length of the unit circle is $2\pi$. (Again, this becomes a theorem; it is not a definition: $\pi$ is defined to be $2p$, where $p$, as in (iv), is the first positive zero of the function $\cos x = \sum_{n=0}^\infty (-1)^n \frac{x^{2n}}{(2n)!}$.)


LECTURE 7: COMPLEX SERIES, PRODUCTS OF SERIES, ANDCOMPLEX EXPONENTIAL SERIES

In Real Analysis Lecture 4 we discussed series $\sum_{n=1}^\infty a_n$ where $a_n \in \mathbb{R}$. One can analogously consider complex series, i.e., the case when $a_n \in \mathbb{C}$. The definition of convergence is exactly the same as in the real case. That is, we say the series converges if the $n$th partial sum (i.e., $\sum_{j=1}^n a_j$) converges; more precisely:

Definition: $\sum_{n=1}^\infty a_n$ converges if the sequence of partial sums $\{\sum_{j=1}^n a_j\}_{n=1,2,\ldots}$ is a convergent sequence in $\mathbb{C}$; that is, there is a complex number $s = u + iv$ ($u, v$ real) such that $\lim_{n\to\infty}\sum_{j=1}^n a_j = s$.

Note: Of course here we are using the terminology that a sequence $\{z_n\}_{n=1,2,\ldots} \subset \mathbb{C}$ converges, with limit $a = \alpha + i\beta$, if the real sequence $|z_n - a|$ has limit zero, i.e., $\lim_{n\to\infty}|z_n - a| = 0$. In terms of "$\varepsilon, N$" this is the same as saying that for each $\varepsilon > 0$ there is $N$ such that $|z_n - a| < \varepsilon$ for all $n \ge N$. Writing $z_n$ in terms of its real and imaginary parts, $z_n = u_n + iv_n$, this is the same as saying $u_n \to \alpha$ and $v_n \to \beta$, so applying this to the sequence of partial sums we see that the complex series $\sum_{n=1}^\infty a_n$ with $a_n = \alpha_n + i\beta_n$ is convergent if and only if both of the real series $\sum_{n=1}^\infty \alpha_n$, $\sum_{n=1}^\infty \beta_n$ are convergent, and in this case $\sum_{n=1}^\infty a_n = \sum_{n=1}^\infty \alpha_n + i\sum_{n=1}^\infty \beta_n$.

Most of the theorems we proved for real series carry over, with basically the same proofs, to the complex case. For example, if $\sum_{n=1}^\infty a_n$, $\sum_{n=1}^\infty b_n$ are convergent complex series then the series $\sum_{n=1}^\infty (a_n + b_n)$ is convergent and its sum (i.e., $\lim_{n\to\infty}\sum_{j=1}^n (a_j + b_j)$) is just $\sum_{n=1}^\infty a_n + \sum_{n=1}^\infty b_n$.

Also, again analogously to the real case, we say the complex series $\sum_{n=1}^\infty a_n$ is absolutely convergent if $\sum_{n=1}^\infty |a_n|$ is convergent. We claim that just as in the real case absolute convergence implies convergence.

Lemma 1. The complex series $\sum_{n=1}^\infty a_n$ is convergent if $\sum_{n=1}^\infty |a_n|$ is convergent.

Proof: Let $\alpha_n, \beta_n$ denote the real and imaginary parts of $a_n$, so that $a_n = \alpha_n + i\beta_n$ and $|a_n| = \sqrt{\alpha_n^2 + \beta_n^2} \ge \max\{|\alpha_n|, |\beta_n|\}$. So $\sum_{n=1}^\infty |a_n|$ converges $\Rightarrow \exists$ a fixed $M > 0$ such that $\sum_{j=1}^n |a_j| \le M$ $\forall n$ $\Rightarrow \sum_{j=1}^n |\alpha_j| \le M$ $\forall n$ $\Rightarrow \sum_{j=1}^\infty |\alpha_j|$ is convergent, so $\sum_{n=1}^\infty \alpha_n$ is absolutely convergent, hence convergent. Similarly, $\sum_{n=1}^\infty \beta_n$ is convergent. But $\sum_{j=1}^n a_j = \sum_{j=1}^n \alpha_j + i\sum_{j=1}^n \beta_j$, and so $\lim_{n\to\infty}\sum_{j=1}^n a_j$ exists and equals $\lim_{n\to\infty}\sum_{j=1}^n \alpha_j + i\lim_{n\to\infty}\sum_{j=1}^n \beta_j = \sum_{n=1}^\infty \alpha_n + i\sum_{n=1}^\infty \beta_n$, which completes the proof.

We next want to discuss the important process of multiplying two series: If $\sum_{n=0}^\infty a_n$ and $\sum_{n=0}^\infty b_n$ are given complex series, we observe that the product of the partial sums, i.e., the product $\bigl(\sum_{n=0}^N a_n\bigr)\cdot\bigl(\sum_{n=0}^N b_n\bigr)$, is just the sum of all the elements in the rectangular array


$$\begin{pmatrix}
a_N b_0 & a_N b_1 & a_N b_2 & \cdots & a_N b_N\\
a_{N-1} b_0 & a_{N-1} b_1 & a_{N-1} b_2 & \cdots & a_{N-1} b_N\\
\vdots & \vdots & \vdots & & \vdots\\
a_2 b_0 & a_2 b_1 & a_2 b_2 & \cdots & a_2 b_N\\
a_1 b_0 & a_1 b_1 & a_1 b_2 & \cdots & a_1 b_N\\
a_0 b_0 & a_0 b_1 & a_0 b_2 & \cdots & a_0 b_N
\end{pmatrix}$$

Thus

$$\Bigl(\sum_{n=0}^N a_n\Bigr)\Bigl(\sum_{n=0}^N b_n\Bigr) = \sum_{n=0}^{2N}\ \sum_{0\le i,j\le N,\ i+j=n} a_i b_j,$$

and observe that if $i, j \ge 0$ and $i+j \le N$ we automatically have $i, j \le N$, and so with $c_n = \sum_{i,j\ge0,\ i+j=n} a_i b_j\ \bigl(= \sum_{i=0}^n a_i b_{n-i}\bigr)$ we see that the right side of the above identity can be written $\sum_{n=0}^N c_n + \sum_{0\le i,j\le N,\ i+j>N} a_i b_j$, and so we have the identity

$$(*)\qquad \Bigl(\sum_{n=0}^N a_n\Bigr)\Bigl(\sum_{n=0}^N b_n\Bigr) - \sum_{n=0}^N c_n = \sum_{0\le i,j\le N,\ i+j>N} a_i b_j$$

for each $N = 0, 1, \ldots$. (Geometrically, $\sum_{n=0}^N c_n$ is the sum of the lower triangular elements of the array, including the leading diagonal, and the term on the right of (∗) is the sum of the remaining, upper triangular, elements.) If the given series $\sum a_n$, $\sum b_n$ are absolutely convergent, we show below that the right side of (∗) $\to 0$ as $N \to \infty$, so that $\sum_{n=0}^\infty c_n$ converges and has sum equal to $\bigl(\sum_{n=0}^\infty a_n\bigr)\bigl(\sum_{n=0}^\infty b_n\bigr)$. That is:

Lemma (Product Theorem). If $\sum a_n$ and $\sum b_n$ are absolutely convergent complex series, then $\bigl(\sum_{n=0}^\infty a_n\bigr)\bigl(\sum_{n=0}^\infty b_n\bigr) = \sum_{n=0}^\infty c_n$, where $c_n = \sum_{i=0}^n a_i b_{n-i}$ for each $n = 0, 1, 2, \ldots$; furthermore, the series $\sum c_n$ is absolutely convergent.

Proof: By (∗) we have

$$\begin{aligned}
\Bigl|\sum_{i=0}^N a_i\sum_{j=0}^N b_j - \sum_{n=0}^N c_n\Bigr| &= \Bigl|\sum_{i,j\le N,\ i+j>N} a_i b_j\Bigr| \le \sum_{i,j\le N,\ i+j>N} |a_i||b_j|\\
&\le \sum_{i,j\le N,\ i>N/2} |a_i||b_j| + \sum_{i,j\le N,\ j>N/2} |a_i||b_j|\\
&= \Bigl(\sum_{i=[N/2]+1}^N |a_i|\Bigr)\Bigl(\sum_{j=0}^N |b_j|\Bigr) + \Bigl(\sum_{i=0}^N |a_i|\Bigr)\Bigl(\sum_{j=[N/2]+1}^N |b_j|\Bigr)\\
&\le \Bigl(\sum_{i=[N/2]+1}^\infty |a_i|\Bigr)\Bigl(\sum_{j=0}^\infty |b_j|\Bigr) + \Bigl(\sum_{i=0}^\infty |a_i|\Bigr)\Bigl(\sum_{j=[N/2]+1}^\infty |b_j|\Bigr) \to 0 \text{ as } N \to \infty,
\end{aligned}$$

where $[N/2] = \frac N2$ if $N$ is even and $[N/2] = \frac{N-1}2$ if $N$ is odd. Notice that in the last line we used the fact that $\sum_{i=[N/2]+1}^\infty |a_i| = \sum_{i=0}^\infty |a_i| - \sum_{i=0}^{[N/2]} |a_i| \to 0$ as $N \to \infty$, because by definition $\sum_{n=0}^\infty |a_n| = \lim_{J\to\infty}\sum_{n=0}^J |a_n|$, and similarly $\sum_{j=[N/2]+1}^\infty |b_j| = \sum_{j=0}^\infty |b_j| - \sum_{j=0}^{[N/2]} |b_j| \to 0$, because $\sum_{n=0}^\infty |b_n| = \lim_{J\to\infty}\sum_{n=0}^J |b_n|$. This completes the proof that $\sum_{n=0}^\infty c_n$ converges, and $\sum_{n=0}^\infty c_n = \bigl(\sum_{n=0}^\infty a_n\bigr)\bigl(\sum_{n=0}^\infty b_n\bigr)$.

To prove that $\sum_{n=0}^\infty |c_n|$ converges, just note that for each $n \ge 0$ we have $|c_n| \equiv \bigl|\sum_{i+j=n} a_i b_j\bigr| \le \sum_{i+j=n} |a_i||b_j| \equiv C_n$ say, and the above argument, with $|a_i|, |b_j|$ in place of $a_i, b_j$, respectively, shows that $\sum_{n=0}^\infty C_n$ converges, so by the comparison test $\sum_{n=0}^\infty |c_n|$ also converges.
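The construction of $c_n$ is easy to implement directly. The following sketch (added here, not from the text; the coefficient choices are hypothetical test data) forms the Cauchy product of two absolutely convergent geometric series and checks the product formula:

```python
# A sketch (not from the text): the Cauchy product c_n = sum a_i * b_{n-i}
# for the hypothetical test data a_n = (1/2)^n, b_n = (1/3)^n, whose sums
# are 2 and 3/2, so sum c_n should be 3.
N = 60
a = [0.5 ** n for n in range(N)]
b = [(1.0 / 3.0) ** n for n in range(N)]
c = [sum(a[i] * b[n - i] for i in range(n + 1)) for n in range(N)]
print(sum(c))   # ~ 3.0 = (sum a_n) * (sum b_n)
```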

 We now define the complex exponential series.

Definition: The complex exponential function $\exp z$ (also denoted $e^z$) is defined by $\exp z = \sum_{n=0}^\infty \frac{z^n}{n!}$.

Observe that this makes sense for all $z$ because the series $\sum_{n=0}^\infty \frac{|z|^n}{n!}$ is the real exponential series, which we know is convergent on all of $\mathbb{R}$, so that the series $\sum_{n=0}^\infty \frac{z^n}{n!}$ is absolutely convergent (hence convergent by Lem. 1) for all $z$.

We can use the above product theorem to check the following facts, which explain why the notation $e^z$ is sometimes used instead of $\exp z$:

(i) $(\exp a)(\exp b) = \exp(a+b)$, $a, b \in \mathbb{C}$;

(ii) $\exp ix = \cos x + i\sin x$, $x \in \mathbb{R}$.

The proof is left as an exercise (Exercises 7.1 and 7.2 below).

Notice that it follows from (i) that $\exp z$ is never zero (because by (i) $(\exp z)(\exp(-z)) = \exp 0 = 1 \ne 0$ $\forall z \in \mathbb{C}$).
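Both facts are easy to observe numerically from the defining series (a sketch added here, not from the text; `exp_series` is a hypothetical helper with a generously chosen truncation level):

```python
import math

# A sketch (not from the text): checking (i) and (ii) with partial sums of
# the defining series exp(z) = sum z^n / n!.
def exp_series(z, terms=60):
    s, term = 0.0 + 0.0j, 1.0 + 0.0j
    for n in range(terms):
        s += term                # term holds z^n / n!
        term *= z / (n + 1)
    return s

a, b, x = 0.3 + 1.1j, -0.7 + 0.4j, 2.0
print(exp_series(a) * exp_series(b), exp_series(a + b))       # fact (i)
print(exp_series(1j * x), complex(math.cos(x), math.sin(x)))  # fact (ii)
```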

LECTURE 7 PROBLEMS

7.1 Use the product theorem to show that $(\exp a)(\exp b) = \exp(a+b)$ for all $a, b \in \mathbb{C}$.

7.2 Justify the formula $\exp ix = \cos x + i\sin x$ for all $x \in \mathbb{R}$.

Note: $\cos$, $\sin$ are defined by $\cos x = \sum_{k=0}^\infty (-1)^k \frac{x^{2k}}{(2k)!}$ and $\sin x = \sum_{k=0}^\infty (-1)^k \frac{x^{2k+1}}{(2k+1)!}$ for all real $x$.


LECTURE 8: FOURIER SERIES

So far we have only considered vectors in $\mathbb{R}^n$, but of course the concept of vector is really much more general. Indeed, any collection of objects $V$ which has well-defined operations of addition and multiplication by real scalars such that the following eight vector space axioms are satisfied is called a real vector space, and the objects themselves are called vectors:

8.1 Vector Space Axioms for $V$:

We assume that for $v, w \in V$ and $\lambda \in \mathbb{R}$ there are well-defined operations "$+$" (addition) and "$\cdot$" (multiplication of a vector by a scalar) giving $v + w \in V$ and $\lambda \cdot v \in V$ (henceforth written just as $\lambda v$), and that the following properties hold:

(i) v, w ∈ V  ⇒ v + w = w + v

(ii) u , v , w ∈ V  ⇒ (u + v) + w = u + (v + w)

(iii) ∃ 0 ∈ V  (called the “zero vector”) with v + 0 = v ∀ v ∈ V 

(iv) v ∈ V  ⇒ ∃ an element −v with v + (−v) = 0

(v) λ, μ ∈ R, v ∈ V  ⇒ λ(μv) = (λμ)v

(vi) λ, μ ∈ R, v ∈ V  ⇒ (λ + μ)v = (λv) + (μv)

(vii) λ ∈ R, v , w ∈ V  ⇒ λ(v + w) = (λv) + (λw)

(viii) v ∈ V  ⇒ 1 v = v.

Note: In a general vector space we usually write $v - w$ as an abbreviation for $v + (-w)$.

There are many examples of such general real vector spaces: For example, the set $C^0([a,b])$ of continuous functions $f : [a,b] \to \mathbb{R}$ trivially satisfies the above axioms if addition and multiplication by scalars are defined pointwise (i.e., $f, g \in C^0([a,b]) \Rightarrow f + g \in C^0([a,b])$ is defined by $(f+g)(x) = f(x) + g(x)$ $\forall x \in [a,b]$, and $\lambda \in \mathbb{R}$ and $f \in C^0([a,b]) \Rightarrow \lambda f$ is defined by $(\lambda f)(x) = \lambda f(x)$ for $x \in [a,b]$). In this case, the zero vector is the zero function $0(x) = 0$ $\forall x \in [a,b]$. Most of the definitions and theorems for the Euclidean space $\mathbb{R}^n$ carry over with only notational changes to general real vector spaces. For example, the linear dependence lemma holds in a general vector space, and the basis theorem applies in any subspace which is the span of finitely many vectors.

Suppose now that $V$ is a real vector space equipped with an inner product $\langle\ ,\ \rangle$. Thus, $\langle v, w\rangle$ is real valued and

(i) $\langle v, w\rangle = \langle w, v\rangle$ $\forall v, w \in V$,

(ii) $\langle \alpha v_1 + \beta v_2, w\rangle = \alpha\langle v_1, w\rangle + \beta\langle v_2, w\rangle$ $\forall v_1, v_2, w \in V$ and $\alpha, \beta \in \mathbb{R}$, and

(iii) $\langle v, v\rangle \ge 0$ $\forall v \in V$, with equality if and only if $v = 0$.

We define the norm on $V$, denoted $\|\cdot\|$, by $\|v\| = \sqrt{\langle v, v\rangle}$.


Notice that then the inner product and the norm have properties very similar to the dot product and norm of vectors in $\mathbb{R}^n$ (indeed an important special case of inner product is the case $V = \mathbb{R}^n$ and $\langle v, w\rangle = v \cdot w$). In the general case, by properties (i), (ii), (iii), we have

$$(1)\qquad \|v - w\|^2 = \|v\|^2 + \|w\|^2 - 2\langle v, w\rangle \quad \forall v, w \in V,$$

and we can prove various other results which have analogues in the special case when $V = \mathbb{R}^n$ and $\langle v, w\rangle = v \cdot w$; for example, there is a Cauchy-Schwarz inequality (proved exactly as in $\mathbb{R}^n$):

$$|\langle v, w\rangle| \le \|v\|\,\|w\|, \qquad v, w \in V.$$

Analogous to the $\mathbb{R}^n$ case, a set $v_1, \ldots, v_N$ of vectors in $V$ is said to be orthonormal if

$$(2)\qquad \langle v_i, v_j\rangle = \delta_{ij} \quad \forall i, j \in \{1, \ldots, N\},$$

where $\delta_{ij}$ denotes the "Kronecker delta," defined by

$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j\\ 0 & \text{if } i \ne j. \end{cases}$$

Given such an orthonormal set $v_1, \ldots, v_N$ and given any other $v \in V$, we define

$$(3)\qquad c_n = \langle v, v_n\rangle.$$

Then by using (1) we get

$$\Bigl\|v - \sum_{n=1}^N c_n v_n\Bigr\|^2 = \|v\|^2 + \sum_{n=1}^N \sum_{m=1}^N c_n c_m\langle v_n, v_m\rangle - 2\sum_{n=1}^N c_n\langle v, v_n\rangle.$$

And hence by (2), (3) we get

$$(*)\qquad \Bigl\|v - \sum_{n=1}^N c_n v_n\Bigr\|^2 = \|v\|^2 - \sum_{n=1}^N c_n^2,$$

and hence in particular,

$$(**)\qquad \sum_{n=1}^N c_n^2 \le \|v\|^2,$$

with equality if and only if $v = \sum_{n=1}^N c_n v_n$. (Notice that geometrically $\sum_{n=1}^N c_n v_n$ represents the "orthogonal projection" of $v$ onto the subspace spanned by $v_1, \ldots, v_N$; see Exercise 8.1.)

Now in the case when $V$ is not finite dimensional we can have an orthonormal sequence $v_1, v_2, \ldots$ of elements of $V$. In particular, the above discussion applies to $v_1, v_2, \ldots, v_N$ for each integer $N \ge 1$, and (∗∗) tells us that the partial sums $\sum_{n=1}^N c_n^2$ of the series $\sum_{n=1}^\infty c_n^2$ are bounded above by $\|v\|^2$. Hence, $\sum_{n=1}^\infty c_n^2$ converges (recall from Lecture 4 that a series of nonnegative terms converges if and only if the partial sums are bounded) and

$$(\ddagger)\qquad \sum_{n=1}^\infty c_n^2 \le \|v\|^2.$$


(‡) is called Bessel's Inequality. Notice (by (∗)) that equality holds in (‡) if and only if $\lim_{N\to\infty}\|v - \sum_{n=1}^N c_n v_n\| = 0$. This ties in with the following definition.

Definition: The orthonormal sequence $v_1, v_2, \ldots$ is said to be complete if, for each $v \in V$, we have $\lim_{N\to\infty}\|v - \sum_{n=1}^N c_n v_n\| = 0$; here of course $c_n = \langle v, v_n\rangle$ as in (3) above. Notice that by (∗) completeness holds if and only if equality holds in (‡) $\forall v \in V$.

Terminology: If $\lim_{N\to\infty}\|v - \sum_{n=1}^N c_n v_n\| = 0$ then we say the series $\sum_{n=1}^\infty c_n v_n$ converges to $v$ in the norm of $V$. Whether or not this convergence takes place, the series $\sum_{n=1}^\infty c_n v_n$ is called the Fourier series for $v$ relative to the orthonormal sequence $v_1, v_2, \ldots$.
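In $\mathbb{R}^n$ with the dot product, all of this is elementary to verify directly; a tiny sketch (added here, not from the text, with hypothetical vectors):

```python
# A sketch (not from the text): (**) for the orthonormal pair v1, v2 in R^3,
# with c_n = <v, v_n> as in (3).
dot = lambda u, w: sum(ui * wi for ui, wi in zip(u, w))
v1, v2 = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
v = (3.0, 4.0, 12.0)
c = [dot(v, v1), dot(v, v2)]
print(sum(cn * cn for cn in c), dot(v, v))   # 25.0 <= 169.0, strict since
                                             # v is not in span{v1, v2}
```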

For the rest of this lecture we specialize to a concrete situation: viz., Fourier series for piecewise continuous functions. First we introduce some notation: for any given closed interval $[a,b] \subset \mathbb{R}$, $C_P([a,b])$ = the set of piecewise continuous functions on $[a,b]$. That is, $f \in C_P([a,b])$ means there is a partition $a = x_0 < x_1 < \cdots < x_J = b$ of the interval $[a,b]$ such that $f$ is continuous at each point of $(x_{j-1}, x_j)$, $j = 1, \ldots, J$, and such that

$$\lim_{x\downarrow a} f(x),\ \lim_{x\uparrow b} f(x),\ \lim_{x\downarrow x_j} f(x),\ \lim_{x\uparrow x_j} f(x) \quad \text{all exist}, \quad j = 1, \ldots, J-1.$$

$C_A([a,b])$ denotes the subset of $C_P([a,b])$ functions $f$ with the additional properties that

$$f(a) = f(b) = \frac12\Bigl(\lim_{x\downarrow a} f(x) + \lim_{x\uparrow b} f(x)\Bigr)$$

and

$$f(x_j) = \frac12\Bigl(\lim_{x\downarrow x_j} f(x) + \lim_{x\uparrow x_j} f(x)\Bigr), \qquad j = 1, \ldots, J-1.$$

One readily checks that if $f, g \in C_P([a,b])$ and if $\alpha, \beta \in \mathbb{R}$, then $\alpha f + \beta g \in C_P([a,b])$, so $C_P([a,b])$ is a vector space which is a subspace of the space of all functions $f : [a,b] \to \mathbb{R}$. Likewise, $C_A([a,b])$ is a vector space, which is a subspace of $C_P([a,b])$.

We now specialize further to the interval $[-\pi, \pi]$ and we define an inner product on $C_A([-\pi,\pi])$ by

$$(\dagger)\qquad \langle f, g\rangle = \frac1\pi\int_{-\pi}^\pi f(x)g(x)\,dx,$$

so that $\|f\| = \sqrt{\frac1\pi\int_{-\pi}^\pi f^2(x)\,dx}$. (The factor $\frac1\pi$ is present purely for technical reasons.)

By direct computation, one checks that the sequence $\frac1{\sqrt2}, \cos x, \sin x, \cos 2x, \sin 2x, \ldots, \cos nx, \sin nx, \ldots$ is an orthonormal sequence relative to this inner product (Exercise 8.4).

Definition: Given $f \in C_A([-\pi,\pi])$, the trigonometric Fourier series of $f$ is defined to be the Fourier series of $f$ relative to the orthonormal sequence $\frac1{\sqrt2}, \cos x, \sin x, \cos 2x, \sin 2x, \ldots, \cos nx, \sin nx, \ldots$. Thus, the trigonometric Fourier series of $f$ can be written

$$c_0\frac1{\sqrt2} + \sum_{n=1}^\infty(a_n\cos nx + b_n\sin nx), \quad\text{where}$$

$$c_0 = \frac1\pi\int_{-\pi}^\pi \frac1{\sqrt2} f(x)\,dx, \qquad a_n = \frac1\pi\int_{-\pi}^\pi \cos(nx)f(x)\,dx, \qquad b_n = \frac1\pi\int_{-\pi}^\pi \sin(nx)f(x)\,dx, \quad n = 1, 2, \ldots.$$

A much more convenient notation is achieved by setting $a_0 = \sqrt2\,c_0$, in terms of which the above trigonometric Fourier series is written

$$\frac{a_0}2 + \sum_{n=1}^\infty(a_n\cos nx + b_n\sin nx), \quad\text{where}$$

$$a_n = \frac1\pi\int_{-\pi}^\pi \cos(nx)f(x)\,dx, \quad n = 0, 1, 2, \ldots, \qquad b_n = \frac1\pi\int_{-\pi}^\pi \sin(nx)f(x)\,dx, \quad n = 1, 2, \ldots.$$
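These integrals are easy to approximate numerically. The sketch below (added here, not from the text; `coeff` is a hypothetical helper using a midpoint rule) computes the coefficients of $f(x) = x$ and compares the partial sums of $\frac{a_0^2}2 + \sum(a_n^2 + b_n^2)$ with $\frac1\pi\int_{-\pi}^\pi f^2$:

```python
import math

# A sketch (not from the text): trigonometric Fourier coefficients of
# f(x) = x on [-pi, pi], approximated by a midpoint rule.
def coeff(f, trig, n, M=20000):
    h = 2.0 * math.pi / M
    return sum(trig(n * (-math.pi + (j + 0.5) * h))
               * f(-math.pi + (j + 0.5) * h) for j in range(M)) * h / math.pi

f = lambda x: x
a = [coeff(f, math.cos, n) for n in range(11)]
b = [coeff(f, math.sin, n) for n in range(1, 11)]
print(b[:4])   # b_n = 2*(-1)^(n+1)/n: 2, -1, 2/3, -1/2, ... (a_n = 0)
partial = a[0] ** 2 / 2 + sum(t * t for t in a[1:]) + sum(t * t for t in b)
print(partial, 2.0 * math.pi ** 2 / 3)  # partial sums increase toward
                                        # (1/pi) * integral of x^2 = 2*pi^2/3
```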

Bessel's inequality (‡) says in this case (keeping in mind that $c_0^2 = a_0^2/2$) that

$$\frac{a_0^2}2 + \sum_{n=1}^\infty(a_n^2 + b_n^2) \le \frac1\pi\int_{-\pi}^\pi f^2(x)\,dx,$$

and by the above general discussion we have equality if and only if the Fourier series is norm convergent to $f$ (i.e., if and only if the orthonormal sequence $\frac1{\sqrt2}, \cos x, \sin x, \cos 2x, \sin 2x, \ldots$ is complete in the vector space $V = C_A([-\pi,\pi])$). We claim that this is in fact true; that is, we claim:

Theorem 8.2. For each $f \in C_A([-\pi,\pi])$, we have

$$\lim_{N\to\infty}\Bigl\|f - \Bigl(\frac{a_0}2 + \sum_{n=1}^N(a_n\cos nx + b_n\sin nx)\Bigr)\Bigr\| = 0,$$

and

$$\frac{a_0^2}2 + \sum_{n=1}^\infty(a_n^2 + b_n^2) = \|f\|^2 \equiv \frac1\pi\int_{-\pi}^\pi f^2(x)\,dx.$$

Proof: The proof is nontrivial, and we omit it. Nevertheless, in the next lecture we will give a complete discussion (with proof) of pointwise convergence of the Fourier series to $f$. (For pointwise convergence we will have to assume a stronger condition on $f$ than merely $f \in C_A([-\pi,\pi])$.)

LECTURE 8 PROBLEMS

8.1 With the notation introduced above, prove that $\sum_{n=1}^N c_n v_n$ really does represent "orthogonal projection" onto the subspace of $V$ spanned by $v_1, \ldots, v_N$; that is, prove (i) $\sum_{n=1}^N c_n v_n = v$ when $v \in \operatorname{span}\{v_1, \ldots, v_N\}$, and (ii) $\langle v - \sum_{n=1}^N c_n v_n, w\rangle = 0$ whenever $w \in \operatorname{span}\{v_1, \ldots, v_N\}$.

8.2 (i) With the same notation, prove that for any constants $d_1, \ldots, d_N$ and any $v \in V$,

$$\Bigl\|v - \sum_{n=1}^N d_n v_n\Bigr\|^2 = \|v\|^2 + \sum_{n=1}^N(d_n^2 - 2c_n d_n) \qquad (c_n = \langle v, v_n\rangle).$$

(ii) Using (i) together with the inequality $2ab \le a^2 + b^2$ (with equality if and only if $a = b$), prove that (with $c_n = \langle v, v_n\rangle$)

$$(*)\qquad \Bigl\|v - \sum_{n=1}^N d_n v_n\Bigr\|^2 \ge \Bigl\|v - \sum_{n=1}^N c_n v_n\Bigr\|^2,$$

with equality if and only if $c_n = d_n$ $\forall n = 1, \ldots, N$.

(Notice that geometrically (∗) says that $\sum_{n=1}^N c_n v_n$ is the point in the subspace spanned by $v_1, \ldots, v_N$ which has least distance from $v$.)

8.3 If $f, g \in C_A([a,b])$ and $\alpha, \beta \in \mathbb{R}$, prove $\alpha f + \beta g \in C_A([a,b])$.

8.4 Prove that $\langle\ ,\ \rangle$ defined in (†) is an inner product on $C_A([-\pi,\pi])$, and check that the sequence of functions $\frac1{\sqrt2}, \cos x, \sin x, \cos 2x, \sin 2x, \ldots$ is an orthonormal sequence (as claimed in the above lecture).

8.5 Compute the Fourier series for the following functions $f \in C_A([-\pi,\pi])$:

(i) $f(x) = |x|$, $-\pi \le x \le \pi$;

(ii) $f(x) = \begin{cases} x, & -\pi < x < \pi\\ 0, & x = \pm\pi. \end{cases}$

8.6 Assuming the result of Thm. 8.2 and using the Fourier series of 8.5(i), (ii), find the sums of the series

(i) $\sum_{n=1}^\infty \frac1{(2n-1)^4}$,

(ii) $\sum_{n=1}^\infty \frac1{n^2}$.


LECTURE 9: POINTWISE CONVERGENCE OF  TRIGONOMETRIC FOURIER SERIES

In the previous lecture we defined $C_A([a,b])$ to be the space of functions $f : [a,b] \to \mathbb{R}$ which are piecewise continuous (i.e., in $C_P([a,b])$) and which satisfy the additional conditions

$$f(a) = f(b) = \frac12\bigl(f(a+) + f(b-)\bigr)$$

and

$$(*)\qquad f(y) = \frac12\bigl(f(y+) + f(y-)\bigr)$$

at each point $y \in (a,b)$. Here and subsequently we use the notation $f(y+) = \lim_{x\downarrow y} f(x)$, $f(y-) = \lim_{x\uparrow y} f(x)$; notice that of course we automatically have $f(y+) = f(y-) = f(y)$ at any point $y$ where $f$ is continuous, so (∗) really only imposes a restriction on $f$ at the finitely many points where $f$ is not continuous.

Now take $f \in C_A([-\pi,\pi])$. Since $f(-\pi) = f(\pi)$, we can extend $f$ to give a function $\bar f : \mathbb{R} \to \mathbb{R}$ by defining

$$\bar f(x + 2k\pi) = f(x), \qquad x \in [-\pi,\pi],\ k = 0, \pm1, \pm2, \ldots.$$

This extended function is $2\pi$-periodic. (A function $g : \mathbb{R} \to \mathbb{R}$ is said to be $p$-periodic if $g(x+p) = g(x)$ $\forall x \in \mathbb{R}$; evidently $\bar f$ has this property with $p = 2\pi$ by definition.) Henceforth we shall drop the "bar" in $\bar f$; that is, $f$ will denote both the given function in $C_A([-\pi,\pi])$ and its $2\pi$-periodic extension as defined above. Of course the $2\pi$-periodic extension is piecewise continuous on all of $\mathbb{R}$ and satisfies (∗) at each point $y \in \mathbb{R}$.

 An important fact about periodic functions (which we shall need below) is the following.

Lemma 9.1. If $g : \mathbb{R} \to \mathbb{R}$ is piecewise continuous and $p$-periodic for some $p > 0$, then

$$\int_a^{a+p} g(x)\,dx = \int_0^p g(x)\,dx \qquad \forall a \in \mathbb{R}.$$

Proof:

$$\int_0^a g(x)\,dx = \int_0^a g(x+p)\,dx \ \ (\text{since } g(x) \equiv g(x+p)) = \int_p^{a+p} g(t)\,dt \ \ (\text{by the change of variable } t = x+p),$$

so

$$\int_a^{a+p} g(t)\,dt = \int_0^{a+p} g(t)\,dt - \int_0^a g(t)\,dt = \int_0^p g(t)\,dt + \int_p^{a+p} g(t)\,dt - \int_0^a g(t)\,dt = \int_0^p g(t)\,dt.$$


We now prove the main result about pointwise convergence of trigonometric Fourier series. Notice that in the statement of the following theorem we use the $2\pi$-periodic extension of $f$ as described above in order to make sense of the quantities $f(x+)$ if $x = \pi$ and $f(x-)$ if $x = -\pi$, and also to make sense of the one-sided derivatives when $x = \pm\pi$.

Theorem 9.2. Suppose $f \in C_A([-\pi,\pi])$ and let $x \in [-\pi,\pi]$ be such that the one-sided derivatives

$$(*)\qquad \lim_{h\uparrow 0}\frac{f(x+h) - f(x-)}{h}, \qquad \lim_{h\downarrow 0}\frac{f(x+h) - f(x+)}{h}$$

both exist. Then the Fourier series $\frac{a_0}2 + \sum_{n=1}^\infty(a_n\cos nx + b_n\sin nx)$ converges to $f(x)$.

Important Notes: (1) Of course the limits in (∗) exist automatically (and are equal) if $f$ is differentiable at $x$.

(2) Here the coefficients are as described in Lecture 8, i.e.,

$$(\ddagger)\qquad a_n = \frac1\pi\int_{-\pi}^\pi f(t)\cos nt\,dt, \quad n \ge 0, \qquad b_n = \frac1\pi\int_{-\pi}^\pi f(t)\sin nt\,dt, \quad n \ge 1.$$

Recall also that Bessel's inequality tells us that

$$\frac{a_0^2}2 + \sum_{n=1}^\infty(a_n^2 + b_n^2) \le \frac1\pi\int_{-\pi}^\pi f^2(t)\,dt.$$

(As a matter of fact, Thm. 8.2 tells us that equality holds here, but we will not need this fact in the present discussion.)

A key step in the proof of Thm. 9.2 is the following lemma, which gives an alternative expression for the partial sum $\frac{a_0}2 + \sum_{n=1}^N(a_n\cos nx + b_n\sin nx)$.

Lemma 9.3. If $S_N(x) = \frac{a_0}2 + \sum_{n=1}^N(a_n\cos nx + b_n\sin nx)$ with $a_n, b_n$ as above, then for $N \ge 1$,

$$S_N(x) - f(x) = \int_{-\pi}^0(f(x-t) - f(x+))D_N(t)\,dt + \int_0^\pi(f(x-t) - f(x-))D_N(t)\,dt,$$

where $D_N(t) = \frac1\pi\Bigl(\frac12 + \sum_{n=1}^N \cos nt\Bigr)$.

Proof: Using (‡) we have

$$\begin{aligned}
S_N(x) &= \frac1\pi\int_{-\pi}^\pi f(t)\Bigl\{\frac12 + \sum_{n=1}^N(\cos nt\cos nx + \sin nt\sin nx)\Bigr\}\,dt\\
&= \frac1\pi\int_{-\pi}^\pi f(t)\Bigl\{\frac12 + \sum_{n=1}^N \cos(n(x-t))\Bigr\}\,dt\\
&= \int_{-\pi}^\pi f(t)\,D_N(x-t)\,dt \quad (\text{by definition of } D_N)\\
&= \int_{x-\pi}^{x+\pi} f(x-s)\,D_N(s)\,ds \quad (\text{by the change of variable } s = x-t)\\
&= \int_{x-\pi}^{x+\pi} f(x-t)\,D_N(t)\,dt\\
&= \int_{-\pi}^{\pi} f(x-t)\,D_N(t)\,dt \quad (\text{by Lem. 9.1})\\
&= \int_{-\pi}^0 f(x-t)\,D_N(t)\,dt + \int_0^\pi f(x-t)\,D_N(t)\,dt, \qquad (1)
\end{aligned}$$

where in the second line we used the identity $\cos(A-B) = \cos A\cos B + \sin A\sin B$.

Now evidently $\int_{-\pi}^0 D_N(t)\,dt = \int_0^\pi D_N(t)\,dt = \frac12$, and since $f \in C_A([-\pi,\pi])$, we have $f(x) = \frac12(f(x+) + f(x-))$. Hence,

$$\int_{-\pi}^0 f(x-t)D_N(t)\,dt = \int_{-\pi}^0(f(x-t) - f(x+))D_N(t)\,dt + \tfrac12 f(x+),$$

and

$$\int_0^\pi f(x-t)D_N(t)\,dt = \int_0^\pi(f(x-t) - f(x-))D_N(t)\,dt + \tfrac12 f(x-),$$

so (1) yields

$$S_N(x) - f(x) = \int_{-\pi}^0(f(x-t) - f(x+))D_N(t)\,dt + \int_0^\pi(f(x-t) - f(x-))D_N(t)\,dt$$

as required.

Before proceeding to the proof of Thm. 9.2 we notice that we have the following alternative expression for $D_N(t)$:

$$(*)\qquad D_N(t) = \frac{\sin\bigl(N + \frac12\bigr)t}{2\pi\sin\bigl(\frac12 t\bigr)}, \qquad -\pi \le t \le \pi,\ t \ne 0.$$

To see this, note that $\pi D_N(t) \equiv \frac12 + \sum_{n=1}^N \cos nt = \frac12\sum_{n=-N}^N e^{int}$ (by virtue of the formula $\cos x = \frac12(e^{ix} + e^{-ix})$), so that by multiplying by $e^{\pm it/2}$ and taking the difference of the resulting expressions, we have

$$\pi D_N(t)\bigl(e^{it/2} - e^{-it/2}\bigr) = \frac12\bigl(e^{i(N+\frac12)t} - e^{-i(N+\frac12)t}\bigr),$$

and so, using $\sin x = \frac1{2i}(e^{ix} - e^{-ix})$, we obtain

$$\pi\sin\bigl(\tfrac12 t\bigr)D_N(t) = \tfrac12\sin\bigl(N + \tfrac12\bigr)t,$$

or, in other words,

$$D_N(t) = \frac{\sin\bigl(N + \frac12\bigr)t}{2\pi\sin\frac12 t} \quad \text{if } -\pi \le t \le \pi,\ t \ne 0.$$

We shall also need the following inequality for $D_N(t)$:

$$(**)\qquad |t|\,|D_N(t)| \le 1, \qquad |t| \in \Bigl(0, \frac{2\pi}3\Bigr).$$

Indeed, this is an easy consequence of (∗), because $|\sin(\frac t2)| \ge \frac{|t|}4$ for $|t| \le \frac{2\pi}3$ (see Exercise 9.1), and hence by (∗)

$$|t|\,|D_N(t)| \equiv \frac{\bigl|\sin\bigl(N+\frac12\bigr)t\bigr|}{2\pi}\cdot\frac{|t|}{|\sin(\frac t2)|} \le \frac4{2\pi}\,\bigl|\sin\bigl(N+\tfrac12\bigr)t\bigr| \le 1 \quad \text{for } |t| \le \frac{2\pi}3.$$
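Both expressions for $D_N$ are easy to compare numerically (a sketch added here, not from the text; `D_sum` and `D_closed` are illustrative names):

```python
import math

# A sketch (not from the text): the cosine-sum definition of D_N against
# the closed form (*).
def D_sum(N, t):
    return (0.5 + sum(math.cos(n * t) for n in range(1, N + 1))) / math.pi

def D_closed(N, t):
    return math.sin((N + 0.5) * t) / (2.0 * math.pi * math.sin(0.5 * t))

for t in (0.3, 1.0, 2.5):
    print(D_sum(7, t), D_closed(7, t))   # the two values agree
```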


Proof of Theorem 9.2: Choose an arbitrary $\delta \in (0, \frac{2\pi}3)$, and note that by Lem. 9.3 and (∗) (and using the identity $\sin(N+\frac12)t = \sin Nt\cos\frac12 t + \cos Nt\sin\frac12 t$, which gives $2\pi D_N(t) = \cot\frac12 t\,\sin Nt + \cos Nt$) we can write

$$\begin{aligned}
(1)\quad 2\pi(S_N(x) - f(x)) &= \int_{-\pi}^{-\delta}(f(x-t) - f(x+))\cot\tfrac t2\,\sin Nt\,dt + \int_{-\pi}^{-\delta}(f(x-t) - f(x+))\cos Nt\,dt\\
&\quad + \int_\delta^\pi(f(x-t) - f(x-))\cot\tfrac t2\,\sin Nt\,dt + \int_\delta^\pi(f(x-t) - f(x-))\cos Nt\,dt\\
&\quad + 2\pi\int_{-\delta}^0(f(x-t) - f(x+))D_N(t)\,dt + 2\pi\int_0^\delta(f(x-t) - f(x-))D_N(t)\,dt.
\end{aligned}$$

Now notice that the first 2 integrals on the right are just ($2\pi$ times) the Fourier coefficients of $\sin Nx$ and $\cos Nx$, respectively, in the Fourier series of

$$g_1(t) = \begin{cases} \frac12(f(x-t) - f(x+))\cot\frac t2, & -\pi < t < -\delta\\ 0, & -\delta < t < \pi \end{cases}$$

and

$$g_2(t) = \begin{cases} \frac12(f(x-t) - f(x+)), & -\pi < t < -\delta\\ 0, & -\delta < t < \pi, \end{cases}$$

and hence, since $g_1(t), g_2(t)$ are piecewise continuous on $[-\pi,\pi]$, these first 2 integrals converge to zero (by Bessel's inequality) as $N \to \infty$ for each fixed $\delta \in (0, \frac{2\pi}3)$. Similarly, the 3rd and 4th integrals converge to zero as $N \to \infty$ for each fixed $\delta \in (0, \frac{2\pi}3)$. On the other hand, using the hypotheses (∗) of Thm. 9.2, we have that there is a fixed constant $C$ and a constant $\delta_1 > 0$ such that

$$|f(x-t) - f(x+)| \le C|t|, \qquad -\delta_1 < t < 0,$$

and

$$|f(x-t) - f(x-)| \le C|t|, \qquad 0 < t < \delta_1.$$

Hence, the final two integrals on the right in the expression above can be estimated as follows if $\delta \le \delta_1$:

$$\begin{aligned}
\Bigl|\int_{-\delta}^0(f(x-t) - f(x+))D_N(t)\,dt\Bigr| &+ \Bigl|\int_0^\delta(f(x-t) - f(x-))D_N(t)\,dt\Bigr|\\
&\le \int_{-\delta}^0 |f(x-t) - f(x+)|\,|D_N(t)|\,dt + \int_0^\delta |f(x-t) - f(x-)|\,|D_N(t)|\,dt\\
&\le C\Bigl(\int_{-\delta}^0 |t|\,|D_N(t)|\,dt + \int_0^\delta |t|\,|D_N(t)|\,dt\Bigr)\\
&\le 2C\delta \quad \text{by (∗∗) above}.
\end{aligned}$$

Thus, we can finish the proof as follows. Let $\varepsilon > 0$ be given. Choose $\delta \in (0, \frac{2\pi}3)$ such that $\delta < \delta_1$ and $2C\delta < \frac\varepsilon2$ (i.e., $\delta < \frac\varepsilon{4C}$). Then with this choice of $\delta$ fixed, the first 4 integrals on the right of identity (1) converge to zero as $N \to \infty$, hence there is $N_1 \ge 1$ such that the absolute value of each of the first 4 integrals is $< \frac\varepsilon8$ for all $N \ge N_1$. That is,

$$2\pi|f(x) - S_N(x)| < 4\times\frac\varepsilon8 + \frac\varepsilon2 = \varepsilon \quad \text{for all } N \ge N_1,$$

and hence $|f(x) - S_N(x)| < \varepsilon$ for all $N \ge N_1$, which completes the proof.


LECTURE 9 PROBLEMS

9.1 Prove $|\sin t| \ge \frac{|t|}2$ for $|t| \le \frac\pi3$.

(Hint: Since $|\sin t|$, $|t|$ are even functions, it is enough to check the inequality for $t \in [0, \frac\pi3]$. Use the mean value theorem on the function $\sin t - \frac t2$.)

9.2 Use the Fourier series of the function in Problem 8.5(i) of Lecture 8, together with the pointwise convergence result of Thm. 9.2, to find the sum of the series $\sum_{n=1}^\infty \frac1{(2n-1)^2}$.

9.3 Calculate the Fourier series for the function $f(x) = x^2$, $-\pi \le x \le \pi$, and prove that the Fourier series converges to $f(x)$ for each $x \in [-\pi,\pi]$. (Use any general theorem from the text that you need.) Hence, check the identity $\sum_{n=1}^\infty \frac1{n^2} = \frac{\pi^2}6$ (independently of the result of Thm. 8.2).

9.4 By using a change of scale $x \to \frac{\pi x}L$ and the theory of trigonometric Fourier series for functions in $C_A([-\pi,\pi])$, give a general discussion of trigonometric Fourier series of functions in $C_A([-L, L])$.


