rigorous calculus

121
Rigorous Calculus Part I, Notes for Math 147 Kenneth R. Davidson University of Waterloo

Upload: others

Post on 15-Oct-2021

18 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Rigorous Calculus

Rigorous CalculusPart I, Notes for Math 147

Kenneth R. DavidsonUniversity of Waterloo

Page 2: Rigorous Calculus

PREFACE

These notes are designed for a rigorous course in calculus based on the complete-ness of the real numbers. Students are assumed to have had a high school levelcalculus course already. While we do not assume that students know about proofs,we only provide a rather brief introduction to the notion of proofs in the first chapterbefore diving in to the business of doing real analysis.

The real numbers are defined as the set of infinite decimals with the identifi-cation of a number ending in an infinite string of 9s with another decimal numberending in an infinite string of 0s. This definition has its problems when it comesto defining addition and multiplication. However it is familiar, and is the quickestway to get going. A better method of defining the real numbers is provided in anappendix. Chapter 2 is the meat of the course, and is also the most difficult. Weexplore what is meant by the completeness of the real numbers in several guises.All of the deeper theorems of calculus, including the Extreme Value Theorem andthe Intermediate Value Theorem, rely in an essential way on this material.

Curve sketching is stressed from the beginning. Students can use their knowl-edge of calculus from their earlier course to help them. To a great extent, the earlyexamples require only a little differentiation. Trigonometric functions, and the log-arithm and exponential functions are used throughout, as these provide a wealthof interesting functions. We assume that students are already familiar with trigfunctions and the basic trig identities such as the addition formula. The naturallogarithm is introduced as an area and its properties are derived without the use ofintegration. The exponential function is the inverse function of the logarithm.

Chapter 4 introduces the key concept of continuity and establishes the ExtremeValue Theorem and the Intermediate Value Theorem.

Finally in Chapter 5, we define the derivative formally. We discuss maxima andminima with an emphasis on the Mean Value Theorem. We do not spend much timeon the useful topic of max-min problems. We assume that this was well coveredin the high school course. We do include some exercises with such problems. Wediscuss convexity of functions in some detail, relating it to the second derivativeand deriving Jensen’s inequality.

It is my experience that students have learned L’Hopital’s Rule in high school,of course without the rather tricky proof. I make a big deal about this being unac-ceptable until it is proven. In particular, if the famous limit limx→0

sinxx = 1 is ever

‘proven’ using L’Hopital’s Rule, there will be a severe penalty. This limit is neededto determine the derivative of the sin function, and so using L’Hopital’s Rule is cir-cular. Moreover, I make the point soon after that many limits that can be computedby two or three applications of L’Hopital’s Rule can be established quickly usinglow order Taylor polynomial approximations.

This material is the first half of a two semester course on calculus. The divisionhas been made to leave all of integration to the second course.

i

Page 3: Rigorous Calculus
Page 4: Rigorous Calculus

CONTENTS

Preface i

Chapter 1. The Logic of Proofs 11.1. The Language of Mathematics 11.2. Proofs 4Exercises for Chapter 1 8

Chapter 2. The Real Numbers and Limits 102.1. The real numbers 102.2. Limits 122.3. Least upper bound Principle 172.4. Monotone Sequences 192.5. Subsequences 212.6. Completeness 232.7. Some Topology 25Exercises for Chapter 2 27

Chapter 3. Functions 293.1. Limits of functions 293.2. The natural logarithm 343.3. The exponential function 38Exercises for Chapter 3 43

Chapter 4. Continuity 454.1. Continuous functions 454.2. Properties of Continuous Functions 474.3. Extreme Value Theorem 484.4. Intermediate Value Theorem 494.5. Monotone Functions 52Exercises for Chapter 4 56

Chapter 5. Differentiation 585.1. The derivative 585.2. Differentiation Rules 615.3. Maxima and Minima 665.4. Mean Value Theorem 685.5. Convexity and the second derivative 725.6. Convexity and Jensen’s Inequality 765.7. L’Hopital’s Rule 80Exercises for Chapter 5 83

iii

Page 5: Rigorous Calculus

iv Contents

Chapter 6. Approximations 866.1. Taylor Polynomials 866.2. Uniform Continuity 936.3. Uniform limits 946.4. Newton’s Method 97Exercises for Chapter 6 100

Appendices 102A.1. Equivalence Relations 102A.2. A Construction of R 103A.3. Cardinality 106A.4. e is transcendental 109A.5. π is irrational 112

Index 115

Page 6: Rigorous Calculus

CHAPTER 1

The Logic of Proofs

1.1. The Language of Mathematics

In this section, we provide a quick overview of some of the language behindmathematical and logical statements. This is not a thorough treatment, but shouldsuffice to get us going.

Mathematics deals with statements, which assert some relationship betweencertain items or classes of items. Examples are

• x belongs to the set X .

• Y is a subset of X .

• x = y or x < y or x > y.

• x < y.

• f : [0, 1] → R is a monotone increasing function.

• All horses are white.

• Every cat has one tail.

Examples of non-statements are

• x+ y.

• Don’t divide by 0.

Statements must have the property that they are either True or False. Often a con-text is specified that limits the variables (if any) in the statements. Certain self-referential statements are not allowed, such as

• This statement is false.

Statements can be manipulated or combined with others to make new state-ments. There are three key words that have precise meanings that may differ frompopular usage in everyday speech. These are not, and and or.

If A is a statement, then “not A” or ¬A is the negation of A. If A is true, then¬A is false and vice versa.

If A and B are statements, then “A and B” or A ∧B is true if and only if bothA and B are true. If either is false, then A ∧B is false.

If A and B are statements, then “A or B” or A∨B is true if and only if at leastone of A or B is true. If both are false, then A ∨B is false.

1

Page 7: Rigorous Calculus

2 The Logic of Proofs

We also define implication, “A implies B” or A ⇒ B, means that if A is true,then B is true. The statement A ⇒ B is true if A and B are both true, but also ifA is false regardless of the truth of B. This is because such a statement means “IfA is true, then B is also true.” For this reason, you also see this written “if A, thenB”.

Since a statement can be either true or false, one can make truth tables whichstart with the various possibilities for the original statements and calculates the truthor falsity of other statements obtained by combining them. Here is an example. Tosave space, let MP be the statement (A ∧ (A⇒ B)) ⇒ B.

A B ¬A A∧B A∨B A⇒B B⇒A ¬B⇒¬A A∧(A⇒B) MPT T F T T T T T T TT F F F T F T F F TF T T F T T F T F TF F T F F T T T F T

We don’t use truth tables to prove theorems, but they can be used to investigategeneral relationships. The converse of the statement A⇒ B is the statement B ⇒A. You can see from the truth table that these statements are not equivalent. IfA⇒B is true, the converse may or may not be true depending on the circumstances.

The contrapositive of the statement A ⇒ B is the statement ¬B ⇒ ¬A. Thetruth table shows that these statements are equivalent. You can think this through asfollows: Assume that A⇒ B. This means that if A is true, then B is true. So if Bis false, then A cannot be true, so A is also false. That is the statement ¬B ⇒ ¬A.

We use the expression “A if and only if B” and write A ⇔ B to mean that(A⇒ B)∧ (B ⇒ A). It means that either both A and B are true or both are false.

A statement is a tautology if it is always true. The statementMP is a tautology.This particular statement is known as modus ponens.

A formula is a statement usually involving some variables. The truth or falsityof the statement may depend on the values assigned to the variables.

For example, consider the statement P (x, y) that 2x2−2xy+y2+2x−4y ≥ 20.Here we are told that x and y are real numbers. You can check that P (4, 2) is true,but P (3, 3) is false.

1.1.1. Quantifiers. There are two important quantifiers that we use in math-ematics. The first is the universal quantifier “for all” or ∀. This has the form∀x ∈ X, P (x). It states that for every variable x in some specified range X , thestatement P (x) is true. If the statement fails for a single value of a variable, thenthe whole statement is false. Consider

(A) ∀n ∈ N, n2 − 2 is even.

(B) ∀n ∈ N, n2 + n+ 41 is prime.

In both statements, the range is specified to be positive integers. Statement (A) istrue because n2 − n = n(n− 1), and either n is even, and thus so is the product or

Page 8: Rigorous Calculus

1.1 The Language of Mathematics 3

n is odd, in which case n − 1 is even, and so is the product. Thus n2 − n is evenfor all choices of n.

For (B), we look at the value of n2 + n + 41 for n ≥ 1. It starts out well:43, 47, 53, 61, 71, 83, 97, 113, 131, 151. So far, all are prime. However when youget to n = 40, we get (40)2+40+41 = (41)2 which is not prime. So the statementis false. The number n = 40 is called a counterexample to statement (B).

The second is the existential quantifier “there exists” or ∃. A statement has theform “∃x ∈ X such that P (x)”. It is true if there is a single x for which P (x) istrue. Consider

(C) ∃x ∈ R such that x sin( 1x) = 1.

(D) ∃n ∈ N such that 1141n2 + 1 is a perfect square.

The function f(x) = x sin( 1x) is even, i.e., f(−x) = f(x), so we can consider

x ≥ 0. When 0 ≤ x < 1, |x sin( 1x)| ≤ |x| < 1. If x ≥ 1, then 1

x ≤ 1 ∈ (0, π2 ).In this range, we will show in the course that 0 < sin 1

x < 1x , so 0 < f(x) < 1.

Therefore there is not a single value of x which makes the statement true—so (A)is false.

To properly analyze (D), we would need to know some more number theory.You could check on a computer, say the first trillion numbers, and you would notsucceed. It turns out that there are infinitely many integers which make this aperfect square, but the smallest has 26 digits:

30, 693, 385, 322, 765, 657, 197, 397, 208

Therefore (D) is true, even though you would never find it by crude methods.When you negate a statement using ∀, it becomes an ∃. The reason is that a “for

all” statement is contradicted by a single counterexample. For instance ¬(B) is thestatement: ∃n ∈ N such that n2 + n + 41 is not prime. This is a true statement,since n = 40 does the job.

Similarly, when you negate ∃, it becomes a ∀ statement. This is again becauseto contradict ∃x ∈ X such that P (x), you must show that P (x) is false for allx ∈ X . So the negation is ∀x ∈ X , ¬P (x). For instance, ¬(C) is the statement∀x ∈ R, x sin( 1

x) = 1. We showed that this is a true statement by proving some-thing a bit stronger, that

∣∣x sin( 1x)∣∣ < 1 for all x ∈ R.

Things start to get more complicated when we have more quantifiers. Thiscomes up in calculus because the definitions of limit and continuity require multiplequantifiers. Here we just give a couple of elementary examples. Consider

(E) ∀n ∈ N0 ∃m ∈ N0 such that 13 divides m2 + n2.

(F) ∃m ∈ N0 such that ∀n ∈ N0, 13 divides m2 + n2.

For statement (E), we are asked if for each n ∈ N0, we can select some m ∈ N0 sothat m2 + n2 is a multiple of 13. We are allowed to choose m any way we wish.So let’s choose m = 5n. Then m2 + n2 = 25n2 + n2 = 13(2n2) is divisible by13. Therefore (E) is true.

Page 9: Rigorous Calculus

4 The Logic of Proofs

Look at the difference when we reverse the order of the quantifiers. This isasking for a singlem so thatm2+n2 is always a multiple of 13. Thus bothm2+02

and m2 + 12 would need to be divisible by 13. But then (m2 + 1)− (m2 + 0) = 1would be divisible by 13. This is absurd. So the statement (F) is false.

1.2. Proofs

1.2.1. Direct proofs. Start with the hypothesis and work through straight tothe answer. Here is an example.

1.2.1. DEFINITION. If x = a0.x1x2x3 . . . is an infinite decimal (here a0 is aninteger and xi ∈ {0, 1, . . . , 9}), we say that the expansion is eventually periodic ifthere are positive integers N and d so that xn+d = xn for all n ≥ N .

1.2.2. THEOREM. If x ∈ R has a decimal expansion which is eventually peri-odic, then x is rational.

PROOF. Multiply x by 10N and by 10N+d. There are integers b and c so that

10N+dx = c.xN+d+1xN+d+2xN+d+3 . . .

10Nx = b.xN+1xN+2xN+3 . . .

Hence by subtracting,(10N+d − 10N )x = c− b.

Therefore x =c− b

10N+d − 10Nis rational. ■

1.2.2. Pigeonhole Principle. If n+ 1 or more objects are divided into n cate-gories, then there are at least two objects in the same category. The name refers toan office mailroom in which each person gets a pigeonhole in which to receive let-ters. There are variants, such as putting mn+ 1 objects into n categories. You candeduce that some category contains at least m+ 1 objects. Usually two is enough.

1.2.3. THEOREM. If x ∈ Q, then the decimal expansion of x is eventuallyperiodic.

PROOF. Since x is rational, we can write x =p

qwhere p, q are integers and

q > 0. For each k ≥ 0, there is a unique integer rk ∈ {0, 1, . . . , q − 1} sothat q divides 10k − rk. That is, rk is the remainder left when dividing q into10k. Then {r0, r1, . . . , rq} are q + 1 remainders taking only q possible values. By

Page 10: Rigorous Calculus

1.2 Proofs 5

the pigeonhole principle, there are two values 0 ≤ i < j ≤ q so that ri = rj .Since q divides 10i − ri and 10j − rj , it also divides the difference 10j − 10i, say10j − 10i = qa. Then

x =p

q=ap

aq= 10−i ap

10j−i − 1.

Let d = j − i and divide the denominator 10d − 1 into ap to get an integer b with aremainder s with 0 ≤ s < 10d − 1. So s has a most d digits, and thus we can writes = s1s2 . . . sd with si ∈ {0, 1, . . . , 9} and we put 0’s at the beginning if requiredto make this a d digit number. Then

y = 0.s1s2 . . . sds1s2 . . . sd · · · =∞∑k=1

10−dks =10−ds

1 − 10−d=

s

10d − 1

because a periodic decimal is a geometric series. Hence

10ix = b.s1s2 . . . sds1s2 . . . sd · · · = b.s1s2 . . . sd

and the decimal expansion of x is just the same with the decimal place shifted iplaces to the left. So it is eventually periodic. ■

1.2.3. Proof by Contradiction. We saw that the contrapositive of a statementA ⇒ B is ¬B ⇒ ¬A, and has the same truth. So we assume that B is false, anddeduce that A is false. This proves the contrapositive, so we are done. We usuallythink of this as reaching a contradiction to the hypothesis that A is true, whichexplains the name.

1.2.4. THEOREM. If d ∈ N is a positive integer which is not a perfect square,then

√d is irrational.

PROOF. Assume that√d is rational. Then A = {n ∈ N : n

√d ∈ N} is not

empty. A non-empty subset of N has a smallest element, so let a be the smallestelement of A. So a

√d ∈ N but b

√d is not an integer for 1 ≤ b < a.

Choose the integer m ∈ N so that m2 < d < (m+ 1)2. Notice that

0 <√d−m <

√(m+ 1)2 −m = 1.

Therefore 0 < b := a(√d −m) < a. However b = a

√d − am is an integer, so

1 ≤ b ≤ a− 1. Finally b√d = ad− (a

√d)m is an integer. This means that b ∈ A

and b < a. This contradicts the assumption that a was the smallest element, whichin turn was a consequence of the fact that A was non-empty. Hence A is empty,and so

√d is irrational. ■

Page 11: Rigorous Calculus

6 The Logic of Proofs

1.2.4. Proof by Induction. The Principle of Induction: let P (n) be a se-quence of statements for n ≥ n0. Suppose that

(1) P (n0) is true, and

(2) If n > n0 and P (k) is true for n0 ≤ k < n, then P (n) is true.Then P (n) is true for each n ≥ n0.

You can think of the statements P (n) as dominoes which are lined up in sucha way that if the earlier dominoes are knocked over, then one of them will knockover the nth domino. Then you knock over the first one and watch them all fall insuccession.

Simple induction works by showing that statement P (n−1) implies P (n). Butthere is no reason that it cannot be some other earlier statement or more than oneearlier statement which are required to establish P (n). We will see two examples.The first is a bit trickier than simple induction in that it depends on two previousstatements.

1.2.5. THEOREM. The Fibonacci sequence is defined recursively by

F (0) = F (1) = 1 and F (n+ 2) = F (n) + F (n+ 1) for n ≥ 0.

Let τ =

√5 + 12

. Then F (n) =τn+1 − (−1/τ)n+1

√5

.

PROOF. First observe that

1τ=

2√5 + 1

√5 − 1√5 − 1

=2(√

5 − 1)4

=

√5 − 12

.

Let P (n) be the statement F (n) =τn+1 − (−1/τ)n+1

√5

for n ≥ 0. When n = 0,

τ 1 − (−1/τ)1√

5=

1√5

(√5 + 12

+

√5 − 12

)=

√5√5= 1.

Hence P (0) is true. Next consider n = 1.

τ 2 − (−1/τ)2√

5=

1√5

((√5 + 1)2

4− (

√5 − 1)2

4

)=

4√

54√

5= 1.

Thus P (1) is true. This is step one.We need the identities

1 + τ = 1 +

√5 + 12

=

√5 + 32

= τ 2

and

1 − 1/τ = 1 −√

5 − 12

=3 −

√5

2=

1τ 2 .

Page 12: Rigorous Calculus

1.2 Proofs 7

Now for the induction step. Suppose n ≥ 2 and that P (n − 2) and P (n − 1)are true. Then

F (n) = F (n− 2) + F (n− 1)

=τn−1 − (−1/τ)n−1

√5

+τn − (−1/τ)n√

5

=1√5

(τn−1(1 + τ)− (−1/τ)n−1(1 − 1/τ)

)=

1√5

(τn−1τ 2 − (−1/τ)n−1(1/τ)2

)=τn+1 − (−1/τ)n+1

√5

This shows that P (n) follows from the previous two statements, P (n − 2) andP (n− 1). By induction, the statements P (n) are true for all n ≥ 0. ■

Here is a second example.

1.2.6. THEOREM. Every natural number n ≥ 2 is a product of prime numbers.

PROOF. Let P (n) be the statement that n factors as a product of primes. Startat n = 2. Then P (2) holds because 2 is prime (so is the product of one prime). Forthe induction step, we need to assume P (k) for all 2 ≤ k < n. Consider P (n).There are two cases.

Case 1: n is prime. Then P (n) is true,Case 2: n is not prime, so n = ab where 2 ≤ a, b < n. By P (a) and P (b), we

can write a = p1 . . . pm as a product of primes and b = q1 . . . qn as another productof primes. Therefore n = ab = p1 . . . pmq1 . . . qn is a product of primes, so P (n)is true. By induction, every n ≥ 2 is a product of primes. ■

In principle, this proof is more complex than the first. For example, for P (48),we might factor 48 = 6 · 8. Then the statement P (48) is deduced from P (6) andP (8). In each case, we don’t need to know the explicit factorization, only that aand b are strictly smaller so that we are assured that P (a) and P (b) have alreadybeen verified.

Page 13: Rigorous Calculus

8 The Logic of Proofs

Exercises for Chapter 1

1. In Xanadu, people are either Knights, Normals or Villains. Knights always tellthe truth, Villains always lie, and Normals can do both. Knights outrank Nor-mals, who outrank Villains.(a) Three people are known to consist of exactly one Knight, one Normal and

one Villain. They say:Alice: I’m normal.Bob: That is true.Charlie: I’m not normal.

Determine which type each person is. Explain.

(b) Two people from Xanadu, Dick and Jane, say:Dick: I rank below Jane.Jane: That is not true.

Determine the types of both and decide who is telling the truth. Explain.

2. Three young men accused of stealing cellphones make the following state-ments:

(1) Ed: “Fred did it, and Ted is innocent.”(2) Fred: “If Ed is guilty, then so is Ted.”(3) Ted: “I’m innocent, but at least one of the others is guilty.”

(a) If they are all innocent, who is lying? Explain.(b) If all these statements are true, who is guilty? Explain.(c) If the innocent told the truth and the guilty lied, who is guilty? Explain.

3. (a) Show that |15 sin θ + 8 cos θ| ≤ 17.Use trig identities, not calculus, to do this exercise.HINT: show that there is an angle α with sinα = 8

17 and cosα = 1517 .

(b) When does equality hold in this inequality?

4. Let a, b, c be positive real numbers greater than 1. Show that

loga(bc) logb(ac) logc(ab) = loga(bc) + logb(ac) + logc(ab) + 2.

HINT: : express everything in terms of A = log a, B = log b and C = log c.

5. (a) Find an expression for

f(x) =∣∣2x− |2x+ 1|

∣∣− ∣∣x− |x− 2|∣∣ for x ∈ R

which avoids the use of absolute value signs or square roots. You may splitthe real line into disjoint intervals and have a different algebraic expressionon each one.

(b) Graph f(x). In particular, indicate all solutions of f(x) = 0.

Page 14: Rigorous Calculus

1.2 Exercises for Chapter 1 9

6. (a) Show that if x is a positive real number, then−18x2 < x

(√x2 + 1 − x

)− 1

2<

−18(x2 + 1)

.

HINT: Use a − b = a2−b2

a+b to ‘rationalize the numerator’ twice! This issuperior to working backwards from the answer and bashing it out usingalgebra.

(b) Use (a) to show that for x > 0,

x+1

2x− 1

8x3 <√x2 + 1 < x+

12x

− 18x3 +

18x4 .

7. The Fibonacci sequence is defined by F (0) = F (1) = 1 and F (n + 2) =F (n+1)+F (n) for all n ≥ 0. Fix an integerD ≥ 2. Consider the remaindersq(n) obtained by dividing F (n) by D, with 0 ≤ q(n) < D. Prove that thissequence is periodic with some period d ≤ D2 as follows:(a) Show there are integers 0 ≤ i < j ≤ D2 such that q(i) = q(j) and

q(i+ 1) = q(j + 1).HINT: pigeonhole.

(b) Let d = j−i. Use induction to show that q(n+d) = q(n) and q(n+1+d) =q(n+ 1) for all n ≥ i.

(c) Show that q(n + d) = q(n) for all n ≥ 0. HINT: work backwards fromn = i.

Page 15: Rigorous Calculus

CHAPTER 2

The Real Numbers and Limits

2.1. The real numbers

What are the real numbers? You may think that you know the answer, but itturns out that you need to be very careful about this. It took mathematicians a longtime to realize that this was even necessary. There are some sophisticated ways toaccomplish this, but we will get by with a more mundane approach. We will haveto slough over some fine points in order to get on with doing calculus.

We will define a real number to be an infinite decimal. For convenience, wewill (temporarily) write our real numbers in the form

x = a0.a1a2a3 . . . where a0 ∈ Z and ai ∈ {0, 1, . . . , 9} for i ≥ 1.

This is a bit peculiar in the sense that we think of x as a (possibly negative) integera0 plus a positive infinite decimal number 0.a1a2a3 . . . . This will be convenientfor us, in that it treats all intervals [n, n + 1] the same. Once we start doing ourreal business, we make the fairly trivial change back to the common usage wherenegative real numbers are written as the negative of a positive real number.

We immediately run into difficulty with this definition. Are 1.000 . . . and0.999 . . . different real numbers? They are definitely different infinite decimals.However, when we try to distinguish them, we find that they are infinitely close toone another. Indeed, if we have any infinite decimal that ends in an infinite stringof 9’s, we use the formula for summing a geometric series to get

x = a0.a1 . . . an999 . . .

= a0.a1 . . . an + 10−n∑k≥1

910k

= a0.a1 . . . an + 10−n.

The rational number on the right hand side has a finite decimal expansion. If Istart at the point where an = 9, this is y = a0.a1 . . . an−1(an+1)000 . . . So itmake sense to identify x and y as the same real number. We can think of theinfinite decimal as a name for the real number, and certain numbers, those endingin an infinite string of 0’s or 9’s, have two names. We call this an equivalencerelation where certain names are identified and considered as a single object. (SeeAppendix A.1.)

10

Page 16: Rigorous Calculus

2.1 The real numbers 11

The set R of real numbers is an ordered field. It has a total order: a relation <on R such that

(1) For x, y ∈ R, exactly one of x < y, x = y or y < x holds.

(2) For x, y, z ∈ R, if x < y and y < z, then x < z.

And R is a field: there are operations of addition (x + y) and multiplication (xy)and special elements 0, 1 such that for x, y, z ∈ R,

(3) x+ y = y + x (addition is commutative).

(4) x+ (y + z) = (z + y) + z (addition is associative).

(5) x+ 0 = x (0 is the additive identity).

(6) For x ∈ R, ∃y ∈ R called “−x” so that x+ y = 0 (additive inverse).

(7) xy = yx (multiplication is commutative).

(8) x(yz) = (xy)z (multiplication is associative).

(9) x1 = x (1 is the multiplicative identity).

(10) If x = 0, ∃y ∈ R called “x−1” so that xy = 1 (multiplicative inverse).

(11) (x+ y)z = xz + yz (distributive law).

Finally there are some axioms that relate the order with the algebraic relations.

(12) If x < y, then x+ z < y + z.

(13) If 0 < x and 0 < y, then 0 < xy.

It is a lot of work to check all of these properties, and we are not going to do it.We will just discuss a few issues that arise.

Firstly, the order is easy to describe. If x = y, then choose a decimal expansionfor each:

x = a0.a1a2a3 . . . and y = b0.b1b2b3 . . .

Since they differ, there is a first n ≥ 0 such that ai = bi for 0 ≤ i < n and an = bn.If an < bn, we say that x < y, and if an > bn, we say y < x. The rational numbers,and even the numbers with a finite decimal expansion (you can check that these arerational numbers of the form x = p

2m5n ) are order dense in R, i.e., if x < y, there isa finite decimal number z such that x < z < y. Indeed, z = b0.b1b2b3 . . . bn000 . . .works unless y = z. If an ≤ bn − 2, z = b0.b1b2b3 . . . bn−1(bn−1)000 . . . works.Lastly, if an = bn − 1 and x = a0.a1a2a3 . . . an999999999am . . . with am < 9,then

z = a0.a1a2a3 . . . an999999999(am+1)000 . . .

works. Note that in this case, x cannot end in infinitely many 9’s because then

x = a0.a1a2a3 . . . an999 · · · = a0.a1a2a3 . . . (an+1)000 · · · = y.

Page 17: Rigorous Calculus

12 The Real Numbers and Limits

Adding and multiplying infinite decimals is delicate. Considering computingπ + e.

π = 3.141592653589 . . .e = 2.718281828459 . . .

π + e = 5.85987448204?

We have to add from the left, but it is necessary to look ahead for carries from theright. The red digits are the result of carries. The blue ? indicates that we needmore information to decide if the next digit is 8 or 9. You may have to look a longway to be sure of the next digit. However we can say for sure that

5.859874482048 < π + e < 5.859874482050.

So we know the answer is 5.859874482049 ± 10−12. In calculus, this informationis just as useful as knowing the digit for sure.

Multiplication is even more challenging to define than addition. Again we canbound the product to any desired accuracy by using finite decimal approximations.In Appendix A.2, we will explain a better way to approach this problem.

We will need to use the absolute value frequently. This is defined as

|x| =

{x if x ≥ 0

−x if x < 0.

An easy but important fact is the triangle inequality

|x+ y| ≤ |x|+ |y|.This is an equality if x and y have the same sign, but is strict when xy < 0. It canbe rearranged to provide the inequalies

|x| ≥ |x+ y| − |y| and |x± y| ≥∣∣|x| − |y|

∣∣.A frequent use of the absolute value is in describing an interval by {x : |x−a| < r}.This means that −r < x− a < r, which can be rewritten as a− r < x < a+ r.

2.2. Limits

What does it mean to say that a sequence an converges to L? Here are someattempts:

(A) The larger n gets, the closer an gets to L.

(B) The larger n gets, the closer an gets to L; and it gets arbitrarily close.

(C) Eventually an = L.

(D) Eventually an is close but not equal to L.

(E) Eventually every an is as close as we want to L.

Page 18: Rigorous Calculus

2.2 Limits 13

One problem with (A) is that it doesn’t say how close. So a sequence like12 , 1,

23 , 1,

34 , 1,

45 , . . . gets closer and closer to π, but also closer and closer to e.

Statement (B) tries to fix that by specifying that it has to get very, very close. How-ever it seems to suggest a monotone approach. What about 1

2 ,32 ,

23 ,

54 ,

34 ,

98 .

45 ,

1716 , . . .

This sequence approaches 1 from both sides, and the even terms are getting there alot faster than the odd terms. This sequence should be considered as convergent—so the sequence can get close, back off a little bit, and get even closer. A sequencecan approach the limit from both sides, and doesn’t have to get closer at each step.Statement (C) is way too strong. And (D) excludes a sequence like 1, 1, 1, 1, . . . or12 , 1,

23 , 1,

34 , 1, . . . .

What about the sequence 12 ,

13 ,

23 ,

14 ,

34 ,

15 ,

45 , . . . ? Does this converge to both 0

and 1, or to neither? What about 0, 1, 0, 1, 0, 1, . . . ? We say that these sequencesdo not converge because they do not get close to any number L. The odd termsare far from 1, and the even terms are far from 0. All are eventually bounded awayfrom anything else.

Statement (E) seems to paraphrase what we want. The trouble with it is justthat the meaning is not precisely articulated. In mathematics, we need to be ableto nail it down is quantitative terms. By “arbitrarily close’’, we use a small (butunspecified) positive number ε > 0. By ‘eventually’, we mean that there should besome numberN so that we are close within ε for all n ≥ N . Putting this altogetherwe get the following formulation.

2.2.1. DEFINITION. limn→∞

an = L means: for any ε > 0, there is an N so that

for all n ≥ N , we have |an − L| < ε. In symbols: ∀ε>0∃N∈N∀n≥N |an − L| < ε.A sequence which has a limit is said to converge.

We will refer to this as the ε–N definition of limit. To verify this definition inexamples, you should think of ε being given to you, and your job is to find the Nwhich makes the definition work.

2.2.2. EXAMPLES.

(1) Let an =

{n

n+1 if n is odd.1 + 2−n if n is even.

Let’s show that L = 1 is the limit. We are

given ε > 0, and we can find someN large enough so that 1N < ε. If n ≥ N is odd,

then |an−1| = 1n ≤ 1

N < ε; while if n ≥ N is even, then |an−1| = 2−n ≤ 1N < ε.

So this choice ofN does the job. There is no advantage to choosingN in an optimalway. Thus lim

n→∞an = 1.

(2) Let xn =2n3 + n2 − 137

5n3 − n− 1for n ≥ 1. We rewrite this as

Page 19: Rigorous Calculus

14 The Real Numbers and Limits

xn =2 + 1

n − 137n3

5 − 1n2 − 1

n3

.

This makes it clear that the numerator tends to 2 while the denominator tends to 5.We need a quantitative estimate for the difference that is tractable and goes to 0.

|xn − 25 | =

∣∣∣2n3 + n2 − 1375n3 − n− 1

− 25

∣∣∣=

|(10n3 + 5n2 − 685)− (10n3 − 2n− 2)|(5n3 − n− 1)5

=|7n2 − 687|

25n3 − 5n− 5<

7n2

24n3 <1

3n.

The second last inequality is valid provided that n ≥ 10. Now given ε > 0, choosean N ≥ max{10, 1

3ε}. Then N ≥ 10, so that the inequality is valid, and also1

3N < ε. Now if n ≥ N , we have |xn − 25 | <

13n < ε. This verifies the definition

of limit. Thus limn→∞

xn = 25 .

(3) Let an = (−1)n for n ≥ 1; i.e., −1, 1,−1, 1,−1, 1, . . . . This sequence doesnot appear to converge. To establish this, we first need to understand the negationof the limit definition. In order for the definition to fail for a specific value of L, weonly need to find a single ε > 0 for which it fails, but we then need to show that nochoice of N will work. To that end, given any N , we need to find some n ≥ N sothat |an − L| ≥ ε. But we also need to consider all values of L.

So take an arbitrary L ∈ R. Consider two cases. Case 1 L ≥ 0. Take ε = 1.Given any N , choose an odd n > N . Then |an−L| = L+ 1 ≥ ε. So no N works,and thus L is not the limit. Case 2 L < 0. Take ε = 1. Given any N , choose aneven n > N . Then |an − L| = |L|+ 1 > ε. So no N works, and thus L is not thelimit. Therefore this sequence has no limit.

(4) If the limit exists, then it is unique. That is, if limn→∞

an = L and limn→∞

an =M ,

then L = M . Indeed, if M = L, let ε = |L −M |/2. If limn→∞

an = L, use this ε

and find N so that |an − L| < ε for all n ≥ N . Then

|an −M | ≥ |L−M | − |L− an| > |L−M | − |L−M |/2 = ε.

Hence the sequence does not converge to M .

2.2.3. SQUEEZE THEOREM. Suppose that an ≤ bn ≤ cn for n ≥ 1, and thatlimn→∞

an = L = limn→∞

cn. Then limn→∞

bn = L.

PROOF. Let ε > 0. Since limn→∞

an = L, there is an N1 ∈ N so that for all

n ≥ N1, |an − L| < ε. Similarly, since limn→∞

cn = L, there is an N2 ∈ N so that

Page 20: Rigorous Calculus

2.2 Limits 15

for all n ≥ N2, |cn − L| < ε. Let N = max{N1, N2}. If n ≥ N , then

L− ε < an ≤ bn ≤ cn < L+ ε.

Therefore |bn − L| < ε. So limn→∞

bn = L. ■

Here are some more examples.

2.2.4. EXAMPLES.(5) Consider bn =

(1 + 1

n2

)n for n ≥ 1. By the binomial theorem,

bn =n∑

k=0

(n

k

)1n2k = 1 +

n∑k=1

n(n− 1) . . . (n+ 1 − k)

nk1

k!nk

Therefore for n ≥ 2,

1 ≤ bn ≤ 1 +∞∑k=1

1nk

= 1 +1n

1 − 1n

= 1 +1

n− 1.

This is crude but sufficient for our purpose. Take bn = 1 and cn = 1 + 1n−1 for

n ≥ 2. They both converge to 1. So by the Squeeze Theorem, limn→∞

bn = 1.

(6) Let a1 = 12 , a2 = 1

2+ 12, a3 = 1

2+ 12+ 1

2

, a4 = 12+ 1

2+ 12+ 1

2

, a5 = 12+ 1

2+ 12+ 1

2+ 12

,

a6 = 12+ 1

2+ 12+ 1

2+ 12+ 1

2

′a7 = 1

2+ 12+ 1

2+ 12+ 1

2+ 12+ 1

2+ 12

, a8 = 12+ 1

2+ 12+ 1

2+ 12+ 1

2+ 12+ 1

2+ 12

The limit is called a continued fraction. What we need is a formula for an. Thenatural way to do this is to find a recursion formula which defines an+1 in terms of

an. Here we have a1 = 12 and an+1 =

12 + an

for n ≥ 1.

First suppose that the limit L exists. Then we can compute

L = limn→∞

an+1 = limn→∞

12 + an

=1

2 + L.

Therefore L2 + 2L − 1 = 0. Thus L = −1 ±√

2. However L ≥ 0, so we getL =

√2 − 1. To check that this really is the limit, we must verify the definition.

Note that L = 12+L .

an+1 − L =1

2 + an− 1

2 + L=

L− an(2 + an)(2 + L)

.

This shows that an − L alternates in sign, and we can estimate

|an+1 − L| = |an − L|(2 + an)(2 + L)

<|an − L|

4.

Page 21: Rigorous Calculus

16 The Real Numbers and Limits

Now|a1 − L| = |1

2− (

√2 − 1)| = 3

2−√

2 <110.

Therefore

|an+1 − L| < |an − L|4

<|an−1 − L|

42 <|a1 − L|

4n<

110

4−n.

We can now show that limn→∞

an =√

2 − 1.

Suppose that ε = 12 10−100 because we want 100 decimals accuracy. We would

require n so that 110 4−n < 1

2 10−100, or 1099 < 22n−1. Use the simple fact that210 = 1024 > 103. Then 1099 < 2330, so that n = 166 works.

(7) Consider limn→∞

5n100 + 3 · 2n + 7n!3n100 + 2n + 5n!

. The question here is which term dom-

inates as n → ∞, n100, 2n or n!? First, polynomials grow more slowly than

exponentials. The way to see this is to consider the ratio bn =n100

2n. Observe that

bn+1

bn=(n+ 1

n

)100 12−→ 1

2.

So eventually this ratio is decreasing almost by a factor of 2 each time, and thusbn → 0. This shows that 2n grows faster than n100. Let’s deal with n! in a similar

fashion. Set cn =n!2n

. Then

cn+1

cn=

(n+ 1)!n!

2n

2n+1 =n+ 1

2−→ +∞.

This shows that cn → ∞, and hence n! grows more quickly that 2n. Therefore

limn→∞

5n100 + 3 · 2n + 7n!3n100 + 2n + 5n!

= limn→∞

5n100/n! + 3 · 2n/n! + 73n100/n! + 2n/n! + 5

=75.

We have already been using some rules of manipulation of limits which appearto be true. We record these basic operations.

2.2.5. PROPOSITION. Suppose that limn→∞

an = L and limn→∞

bn = M , and letr ∈ R. Then

(1) limn→∞

an + bn = L+M .

(2) limn→∞

ran = rL.

(3) limn→∞

anbn = LM .

(4) If M = 0, then there is an N0 so that bn = 0 for n ≥ N0, and

limn→∞

anbn

=L

M.

Page 22: Rigorous Calculus

2.3 Least upper bound Principle 17

PROOF. I will prove (4) and leave the others as exercises. First take ε = |M |/2.Find an N0 so that for n ≥ N0, |bn −M | < |M |/2. Then

|bn| ≥ |M | − |M |/2 = |M |/2.

In particular, bn = 0. CalculateL

M− anbn

=Lbn −Man

Mbn=Lbn − LM + LM −Man

Mbn

=L(bn −M) +M(L− an)

Mbn.

Therefore for n ≥ N0,∣∣∣ LM

− anbn

∣∣∣ ≤ |L| |bn −M |+ |M | |L− an||M | |M |/2

=2|L|M2 |bn −M |+ 2

|M ||an − L|.

Now let ε > 0 be given. Use the two limits to find N1 so that if n ≥ N1, then

|an − L| < ε|M |4

; and choose N2 so that if n ≥ N2, then |bn −M | < εM2

4|L|+ 1.

Define N = max{N0, N1, N2}. If n ≥ N , then∣∣∣ LM

− anbn

∣∣∣ < 2|L|M2 |bn −M |+ 2

|M ||an − L|

<2|L|M2

εM2

4|L|+ 1+

2|M |

ε|M |4

2+ε

2= ε.

Therefore limn→∞

anbn

=L

M. ■

2.2.6. PROPOSITION. Every convergent sequence is bounded.

PROOF. Suppose that limn→∞

an = L. Take ε = 1 and find N so that if n ≥ N ,

then |an − L| < 1. Then

M = max{|a1|, |a2, . . . , |aN−1|, |L|+ 1}is a bound for {an : n ≥ 1}. ■

2.3. Least upper bound Principle

2.3.1. DEFINITION. If ∅ = S ⊂ R, say S is bounded above if it has an upperbound R ∈ R, meaning that s ≤ R for all s ∈ S. A number M is the least upperbound or supremum of S if M is an upper bound for S and whenever R is an upperbound, then M ≤ R. We write M = supS.

Page 23: Rigorous Calculus

18 The Real Numbers and Limits

Similarly S is bounded below if there is some R ∈ R so that R ≤ s for alls ∈ S. The greatest lower bound or infimum of S is the lower bound L which islargest among all lower bounds. We write L = infS.

If S has no upper bound, we write supS = +∞; and if it has no lower bound,we write infS = −∞.

2.3.2. EXAMPLES.(1) If our universe is Q, the set of rational numbers, then some sets do not havea supremum. Consider A = {x ∈ Q : x2 < 2}. It is not hard to show thatsupA =

√2. However since

√2 is not rational, the upper bounds in Q are all

strictly bigger than√

2. Then because Q is order dense in R, if r ∈ Q is an upperbound, we can find another s ∈ Q such that

√2 < s < r. Hence there is no best

choice. This is an important difference between Q and R.

(2) A = {1,−e, 6,√

91,−3.5, π}. Then supA =√

91 and infA = −3.5.

(3) B = {2, 4, 6, 8, . . . } = 2N. Then supB = +∞ and infB = 2.

(4) C − {(−1)n nn+1 : n ≥ 1}. Then supC = 1 and infC = −1. Neither ±1

belongs to C.

(5) D = {sinn : n ∈ N}. Then 1 is an upper bound and −1 is a lower bound. Tofigure out the sup and inf, we use the pigeonhole principle. Let ε > 0.

The angle n (always in radians because this is calculus!!) only matters modulointeger multiples of 2π. So for each n, let θ(n) = n −

⌊n

⌋2π ∈ [0, 2π). No two

are the same because if 2π divides m − n, say m − n = 2πk, then π = m−n2k is

rational. But π is irrational, so this doesn’t happen. (See Appendix A.5.)Divide [0, 2π) into N intervals of length less than ε. We have to take N >

2π/ε. Now {θ(n) : n ≥ 1} is an infinite sequence. There are infinitely manyθ(n)’s in N intervals. Hence there are two in one interval, say |θ(n) − θ(m)| < εfor 1 ≤ n < m. then

θ(m− n) =

{θ(m− n) ∈ (0, ε) if θ(m) > θ(n)

2π + θ(m− n) ∈ (2π − ε, 2π) if θ(m) < θ(n).

Now sin π2 = 1. Suppose that θ(m − n) = α ∈ (0, ε). Let k =

⌊π

⌋and look at

θ(km − kn). This belongs to the interval(π2 − ε, π2

), say θ(km − kn) = π

2 − βfor 0 < β < ε. If, on the other hand, if θ(m − n) = 2π − α ∈ (2π − ε, 2π), letk =

⌈ 3π2α

⌉and look at θ(km− kn). Again, θ(km − kn) = π

2 − β for 0 < β < ε.Therefore

sin(km− kn) = sin(π2 − β) = cosβ > 1 − β2 > 1 − ε2.

(We will prove this estimate for cosx later in the course.) Since ε > 0 was arbitrary,we obtain that supD = 1. Similarly, infD = −1.

Page 24: Rigorous Calculus

2.4 Monotone Sequences 19

Now we will establish a very important property of the real numbers.

2.3.3. LEAST UPPER BOUND PRINCIPLE. Every non-empty set S ⊂ Rwhich is bounded above has a least upper bound. Every non-empty set S ⊂ Rwhich is bounded below has a greatest lower bound.

PROOF. We prove the second statement.Assume that S has a lower bound M , and we can take M to be an integer. Let

s ∈ S, and let k = ⌈s −M⌉. Consider M,M + 1,M + 2, . . . ,M + k. SinceM + k ≥ s, there is a largest integer in this list, say a0 so that a0 is a lowerbound for S and a0 + 1 is not. Choose s0 ∈ S so that s0 < a0 + 1. This is a‘witness’ to the fact that a0 + 1 is not a lower bound. Now consider the numbersa0.0, a0.1, . . . , a0.9. Pick the largest value a1 ∈ {0, 1, . . . , 9} so that a0.a1 is alower bound. Select a witness s1 ∈ S so that s1 < a0.a1 +

110 .

Repeat this procedure recursively. Suppose that a0.a1a2 . . . an is a lower boundfor S, and there is an sn ∈ S so that sn < a0.a1a2 . . . an + 10−n. Considera0.a1a2 . . . an0, . . . , a0.a1a2 . . . an9 and pick the largest an+1 ∈ {0, 1, . . . , 9} sothat a0.a1a2 . . . anan+1 is a lower bound. Then select a witness sn+1 ∈ S so thatsn+1 < a0.a1a2 . . . anan+1 + 10−n−1 to show that this is not a lower bound.

Let L = a0.a1a2a3 . . . . If s ∈ S, then s ≥ a0.a1a2 . . . an for all n ≥ 0.Therefore s ≥ L. If L < b, then there is a finite decimal so that

L < c = c0.c1c2 . . . cn < b.

Thus

a0.a1a2 . . . an ≤ L ≤ sn < a0.a1a2 . . . an + 10−n ≤ c < b.

The witness sn shows that b is not a lower bound. Therefore L is the greatest lowerbound.

For the first part, we observe that supS = − sup(−S). ■

2.4. Monotone Sequences

2.4.1. DEFINITION. Say that (an)n≥1 is a increasing sequence if an ≤ an+1for all n ≥ 1; and say that it is a strictly increasing sequence if an < an+1 for alln ≥ 1. Sometimes we say that (an) is monotone increasing for emphasis. Similarlywe define decreasing sequence and strictly decreasing sequence.

2.4.2. MONOTONE CONVERGENCE THEOREM. If (an) is an increasingsequence which is bounded above, then (an) converges, i.e., lim

n→∞an = L exists. If

(an) is a decreasing sequence which is bounded below, then (an) converges.

Page 25: Rigorous Calculus

20 The Real Numbers and Limits

PROOF. Suppose M is an upper bound for S = {an : n ≥ 1}. Let L = supS,which exists by the Least Upper Bound Principle. We claim that lim

n→∞an = L.

Let ε > 0. Then L− ε is not a lower bound for S. Hence there is an N so thatL− ε < aN . For n ≥ N , L− ε < aN ≤ an ≤ L. Therefore |L− an| < ε. Thuslimn→∞

an = L.

If (an) is decreasing, then (−an) is increasing with limit sup{−an : n ≥ 1}.Therefore (an) has limit L = inf{an : n ≥ 1}. ■

2.4.3. EXAMPLE. Let a1 = 1 and an+1 =√

2 +√an for n ≥ 1.

Claim: an is increasing. Indeed, a2 =√

3 > a1. Proceed by induction. Ifan > an−1, then

an+1 =√

2 +√an >

√2 +

√an−1 = an.

Hence by induction, an+1 > an for all n ≥ 1.Claim: an ≤ 2 for all n ≥ 1. Again this is true for a1 = 1. If an < 2, then

an+1 =√

2 +√an <

√2 +

√2 < 2.

Therefore (an) is monotone increasing and bounded above. By the MonotoneConvergence Theorem (MCT), L = lim

n→∞an exists. Hence

L = limn→∞

an+1 == limn→∞

√2 +

√an =

√2 +

√L

So L2 = 2 +√L; whence (L2 − 2)2 = L, or

0 = L4 − 4L2 − L+ 4 = (L− 1)(L3 + L2 − 3L− 4).

NowL ≥ a2 =√

3, soL = 1. ThusL is a root of the cubic p(x) = x3+x2−3x−4.Now p′(x) = 3x2 + 2x− 3 = 3(x2 − 1) + 2x > 0 on [1, 2]. Thus p is strictly

increasing and p(1) = −5 and p(2) = 2. The curve must cross the axis, and it hasexactly one root between 1 and 2. There is a formula for cubics, though it is notvery enlightening:

L =13

(3

√79 +

√2241

2+

3

√79 −

√2241

2− 1

)≈ 1.831177.

2.4.4. EXAMPLE. Let a1 = 2 and an+1 =1 + a2

n

2for n ≥ 1. First we show

that (an) is increasing. Now a2 = 52 > 2 = a1. If an−1 < an, then

an =1 + a2

n−1

2<

1 + a2n

2= an+1.

Thus the sequence is increasing by induction.

Page 26: Rigorous Calculus

2.5 Subsequences 21

Suppose that L = limn→∞

an. Then

L = limn→∞

an+1 = limn→∞

1 + a2n

2=

1 + L2

2.

Therefore 0 = L2 − 2L+ 1 = (L− 1)2; so that L = 1. But this is absurd, becausean ≥ 2. What went wrong?

The problem is that this sequence isn’t bounded, and thus does not converge.In fact, an > n for n ≥ 1. We have seen this for n = 1, 2; and a3 = 29

8 > 3.

Suppose that n ≥ 3 and an > n. Then an+1 =1 + a2

n

2>

1 + n2

2; and

1 + n2

2− (n+ 1) =

12(n2 − 2n− 1) >

12n(n− 3) ≥ 0.

By induction, an > n for all n, and the sequence is unbounded.

It is convenient to be able to describe the divergence to infinity precisely.

2.4.5. DEFINITION. If (an)n≥1 is a sequence, then limn→∞

an = +∞ meansthat for all R > 0, there is an integer N so that for n ≥ N , an > R. Andlimn→∞

an = −∞ is defined analogously.

The notion of being close to a limit value L is replaced by eventually beinggreater than any large number R. In our example above, given R, we just chooseN so that N ≥ R and for n ≥ N , an > n > R.

2.5. Subsequences

2.5.1. DEFINITION. A subsequence of (an)n≥1 is a sequence (ani)i≥1 wheren1 < n2 < n3 < . . . .

The following proposition is elementary.

2.5.2. PROPOSITION. Suppose that limn→∞

an = L and that (ani)i≥1 is a subse-

quence of (an)n≥1. Then limi→∞

ani = L.

PROOF. Let ε > 0 and select an N so that |an − L| < ε for n ≥ N . Then ifi ≥ N , then ni ≥ i ≥ N , so |ani − L| < ε. Thus lim

i→∞ani = L. ■

Page 27: Rigorous Calculus

22 The Real Numbers and Limits

We showed in Proposition 2.2.6 the elementary fact that convergent sequencesare bounded. The following result is deep, and deals with a partial converse. It hasa very interesting proof.

2.5.3. BOLZANO-WEIERSTRASS THEOREM. Every bounded sequencehas a convergent subsequence.

PROOF. Suppose that |an| ≤ B for n ≥ 1. Let I0 = [−B,B], and split thisinterval into two halves, I0− = [−B, 0] and I0+ = [0, B]. One (and possiblyboth) of these two intervals must contain infinitely many terms of the sequence.Let I1 = [b1, c1] be such an interval, and pick n1 so that an1 ∈ I1. Split I1 intotwo halves I1− =

[b1,

b1+c12

]and I1+ =

[b1+c1

2 , c1]. Again, at least one of these

intervals, say I2 = [b2, c2], contains infinitely many terms of the sequence. Pick ann2 > n1 so that an2 ∈ I2.

We repeat this procedure recursively. Suppose that we have a nested sequenceof intervals I1 ⊃ I2 ⊃ · · · ⊃ Im = [bm, cm] so that the length |Ik| = 2−k|I0| for1 ≤ k ≤ m so that Im contains infinitely many terms of the sequence, and thatwe have selected n1 < n2 < · · · < nm so that ank

∈ Ik for 1 ≤ k ≤ m. SplitIm into two halves Im− =

[bm,

bm+cm2

]and Im+ =

[bm+cm

2 , cm]. Pick one, say

Im+1 = [bm+1, cm+1], which contains infinitely many elements of the sequence.Then select nm+1 > nm so that anm+1 ∈ Im+1.

Our subsequence is (ani)i≥1. Observe that because of the construction of thenested intervals Im, we have that

b1 ≤ b2 ≤ · · · ≤ bm ≤ anm ≤ cm ≤ · · · ≤ c2 ≤ c1

and cm − bm = 2−m|I0| = 21−mB. The sequence (bm) is increasing and boundedabove by c1. By the MCT, lim

m→∞bm = L exists. Also (cm) is a decreasing sequence

bounded below by b1, and thus by MCT, limm→∞

cm =M exists. Moreover

M − L = limm→∞

cm − bm = 0.

so M = L. Finally limi→∞

ani = L follows from the Squeeze Theorem. ■

2.5.4. EXAMPLE. In Example 2.3.2(5), we considered the sequence (sinn).What are the possible limits of subsequences of this? The way the argument workedis that we found terms in this sequence approaching 1 by approximating the angleπ2 by θ(ni) for various integers ni. There was nothing special about π

2 except thatit was where sinx takes the value 1. We can do the same thing for any angle. Thusgiven L ∈ [−1, 1], we can find positive integers mi (not necessarily increasing) sothat sinmi converges to L.

To select an increasing sequence, look back at the construction. The number

km− kn ≥ k =⌊ π

⌋>⌊ π

⌋.

Page 28: Rigorous Calculus

2.6 Completeness 23

This is large when ε is very small. So what we do is to choose the sequence recur-sively, and if n1 < · · · < nk are defined, we select ε so small that the lower boundis greater than nk. Then the terms that we choose will be increasing. Thus everyvalue in [−1, 1] is a limit of a subsequence of (sinn).

2.6. Completeness

Given a sequence (an), is it possible to decide if it will converge without iden-tifying a limit? The answer is crucial to explaining why the real line has no “holes”in it, while the rational numbers has many. A sequence like Example 2.2.4(6) is asequence of rational numbers with an irrational limit,

√2 − 1. How do we know

that there aren’t sequences of real numbers converging to something in a larger uni-verse? The answer is the notion of completeness, which relies on a good answer tothe question just posed.

2.6.1. PROPOSITION. If limn→∞

an = L and ε > 0, there is an integer N so that

for all N ≤ m ≤ n, |an − am| < ε.

PROOF. Use ε2 is the definition of limit, and find N so that for n ≥ N , we have

|an − L| < ε2 . Then if N ≤ m ≤ n,

|an − am| ≤ |an − L|+ |L− am| < ε

2+ε

2= ε. ■

2.6.2. DEFINITION. A sequence (an) is a Cauchy sequence if for every ε > 0,there is an integer N so that for all N ≤ m ≤ n, |an − am| < ε.

We can use essentially the same proof as for Proposition 2.2.6 to show thatCauchy sequences are bounded.

2.6.3. PROPOSITION. Every Cauchy sequence is bounded.

PROOF. Suppose that (an) is a Cauchy sequence. Take ε = 1 and find N sothat for all N ≤ m ≤ n, |an − am| < ε. Let

M = max{|a1|, |a2, . . . , |aN−1|, |aN |+ 1}.

If n ≥ N , then |an| ≤ |aN |+ |an − aN | < |aN |+ 1. ■

Like convergent sequences, Cauchy sequences don’t have multiple limit points.

Page 29: Rigorous Calculus

24 The Real Numbers and Limits

2.6.4. LEMMA. Let (an) be a Cauchy sequence. If there is a convergent subse-quence lim

i→∞ani = L, then lim

n→∞an = L.

PROOF. Let ε > 0 be given. Using ε2 , select N so that |an − am| < ε

2 for allN ≤ m ≤ n. Using the limit, find an integer I so that if i ≥ I , then |ani −L| < ε

2 .Select some i0 ≥ I so that ni0 ≥ N . Then if n ≥ N ,

|an − L| ≤ |an − ani0|+ |ani0

− L| < ε

2+ε

2= ε.

Therefore limn→∞

an = L. ■

2.6.5. DEFINITION. A set S ⊆ R is complete if every Cauchy sequence in Sconverges to a point in S.

2.6.6. COMPLETENESS THEOREM. R is complete.

PROOF. Let (an) be a Cauchy sequence. By Proposition 2.6.3, it is bounded.By the Bolzano-Weierstrass Theorem, there is a convergent subsequence (ani).Then by Lemma 2.6.4, the whole sequence converges. Therefore R is complete. ■

2.6.7. EXAMPLE. Let a0 = 0 and an =2

3an + 5for n ≥ 0. The first few terms

are 0, 25 ,

1031 ,

62185 ,

3701111 , . . . which is approximately

0, 0.4, 0.32258, 0.335135, 0.3330333, . . .

We will show that (an)n≥1 is a Cauchy sequence. First

an+1 − an =2

3an + 5− 2

3an−1 + 5=

6(an−1 − an)

(3an + 5)(3an−1 + 5).

Since it is clear that an ≥ 0 for all n,

|an+1 − an| <6|an − an−1|

25<

|an − an−1|4

.

Therefore,

|an+1 − an| <|an−1 − an−2|

42 <|an−2 − an−3|

43 <|a1 − a0|

4n.

Now, since a1 − a0 = 25 , if N ≤ m < n,

|an − am| =∣∣∣ n−1∑i=m

ai+1 − ai

∣∣∣ ≤ n−1∑i=m

|ai+1 − ai|

<25

n−1∑i=m

4−i <25

4−m 11 − 1

4

=8

154−m ≤ 8

154−N .

Page 30: Rigorous Calculus

2.7 Some Topology 25

Given ε > 0, we choose N so large that 815 4−N < ε, and we see that (an) is

Cauchy. By the Completeness Theorem, L = limn→∞

an exists. Therefore,

L = limn→∞

an+1 = limn→∞

23an + 5

=2

3L+ 5.

Thus 0 = 3L2 + 5L − 2 = (3L − 1)(L + 2). So L ∈ {13 ,−2} and we know that

L ≥ 0. Therefore limn→∞

an = 13 .

We have established a number of results which all say something rather similar.In particular, we started by establishing the Least Upper Bound Principle. This wasused to derive the Monotone Convergence Theorem. Then, we used MCT to provethe Bolzano-Weierstrass Theorem. And finally we used the Bolzano-WeierstrassTheorem to prove Completeness of R. Let’s go full circle, and use the Complete-ness Theorem to prove the Least Upper Bound Principle. This will show that eachof these results is an equivalent formulation of completeness.

Suppose that S ⊂ R is non-empty and bounded above. To get started supposethat s0 ∈ S and s ≤ M for all s ∈ S. Let L = (s0 + M)/2. If L is not anupper bound for S, pick s1 ∈ S with s1 > L and set M1 = M . Otherwise, if Lis an upper bound, set M1 = L and s1 = s0. Either way, M1 − s1 ≤ 1

2(M − s0).Repeat this procedure recursively to construct an increasing sequence sn in S anda decreasing sequence Mn of upper bounds for S so that lim

n→∞Mn − sn = 0. The

sequence (sn) is Cauchy because if ε > 0, select N so that MN − sN < ε. Then ifN ≤ m ≤ n, then

sN ≤ sm ≤ sn ≤MN < sN + ε.

Therefore |sn − sm| < ε. By the Completeness Theorem, L = limn→∞

sn exists.

Moreover limn→∞

Mn = limn→∞

sn +(Mn − sn) = L. Thus no number smaller than Lis an upper bound, but L is an upper bound; and so it is the supremum of S.

2.7. Some Topology

It is convenient to introduce some notation that is used to describe two specialclasses of sets of real numbers.

2.7.1. DEFINITION. A subset U of R is open if for each x ∈ U , there is anr > 0 so that (x − r, x + r) ⊂ U . In particular, (a, b) = {x ∈ R : a < x < b} isan open interval.

A subset A of R is closed if it contains all of its limit points; i.e., if an ∈ A andlimn→∞

an = b, then b ∈ A. In particular, [a, b] = {x ∈ R : a ≤ x ≤ b} is a closedinterval.

Page 31: Rigorous Calculus

26 The Real Numbers and Limits

2.7.2. EXAMPLES.(1) An open interval (a, b) is open. If x ∈ (a, b), then r = min{x− a, b− x} > 0and (x− r, x+ r) ⊂ (a, b).

(2) A closed interval [a, b] is closed. For if xn ∈ [a, b] and limn→∞

xn = y, then sincea ≤ xn ≤ b, we get a ≤ y ≤ b.

(3) A union of open sets if open: if Un are open, then U =⋃

n≥1 Un is open. Forif x ∈ U , there is some n so that x ∈ Un. Hence there is some r > 0 so that(x− r, x+ r) ⊂ Un ⊂ U .

(4) The intersection of two open sets is open: if U and V are open, and x ∈U ∩ V , then there are r1 > 0 so that (x − r1, x + r1) ⊂ U and r2 > 0 so that(x− r2, x+ r2) ⊂ V . Take r = min{r1, r2} and note that (x− r, x+ r) ⊂ U ∩V .

2.7.3. PROPOSITION. A set U is open if and only if U c = R \ U is closed.

PROOF. Suppose that U is open, and an ∈ U c and limn→∞

an = b. If b ∈ U , then

for some r > 0, (b − r, b + r) ⊂ U . But then there is an N so that |an − b| < rfor all n ≥ N , and they all belong to U , which is false. Thus b ∈ U c; whence U c

is closed.Conversely, suppose thatA is a closed set. Let x ∈ Ac. IfA∩(x−r, x+r) = ∅

for all r > 0, then taking r = 1n , we can pick an ∈ A so that |x − an| < 1

n . Thenlimn→∞

an = x. Since A is closed, x ∈ A, a contradiction. Thus for some r > 0,

(x− r, x+ r) ⊂ Ac. So Ac is open. ■

2.7.4. PROPOSITION. A subset S ⊂ R is complete if and only if it is closed.

PROOF. Suppose that S is complete, and sn ∈ S so that the sequence (sn)n≥1converges in R, say lim

n→∞sn = b. Then (sn) is a Cauchy sequence in S. Therefore

the limit b belongs to S. Thus S is closed.Conversely, suppose that S is closed. Let (sn) be a Cauchy sequence in S.

Since R is complete, limn→∞

sn = b exists in R. Then because S is closed, b ∈ S.Thus S is complete. ■

Page 32: Rigorous Calculus

2.7 Exercises for Chapter 2 27

Exercises for Chapter 2

1. For the following sequences, determine the limit if it exists, or prove that itdoes not converge using the definition of limit

(a)(−1)n

√n cosn

n2 + 1.

(b) sin(nπ

3

).

(c)2100+5n

e4n−10 , n ≥ 1.

(d) (−1)n(n−1)(n−2)(n−3)

8 , n ≥ 1.

2. (a) Let an =√n2 + 3n− 3 − n for n ≥ 1. Find the limit.

(b) Using ε = 12 10−20, find an N that works in the limit definition.

3. Suppose that limn→∞

an = L. Show that limn→∞

a1 + a2 + · · ·+ ann

= L.

4. Let x0 = 3 and xn+1 = (xn + 8xn

)/2 for n ≥ 0.(a) Assume a limit exists and figure out what L must be.(b) Set εn = xn − L. Show that 0 < εn+1 < ε2

n/5.(c) Hence show that the limit exists.

5. Let x0 = 0 and xn+1 =√

15 − 2xn for n ≥ 0.(a) Figure out what the limit L should be.(b) Show by induction that 2 ≤ xn ≤ 4 for n ≥ 1.(c) Define εn = xn−L. Find a formula for εn+1 in terms of εn. Show that this

alternates in sign. Hence prove that limn→∞

xn = L.

6. Let a0 and a1 be positive numbers, and set an+2 =√an+1 +

√an for n ≥ 0.

(a) Show that here is some N so that an ≥ 1 for all n ≥ N .(b) Let εn = |an − 4|. Show that εn+2 ≤ (εn+1 + εn)/3 for n ≥ N .(c) Hence prove that this sequence converges.

7. Give a careful ε–N proof that: if (an) and (bn) are sequences of real numbersso that lim

n→∞an = L and lim

n→∞bn =M , then lim

n→∞anbn = LM.

8. Let an = n√

3n + 5n for n ≥ 1.(a) Prove that this sequence is monotone decreasing and bounded below. What

can you conclude?(b) Evaluate lim

n→∞an.

9. A series∞∑n=1

an is said to converge if the sequence sn =n∑

k=1ak, for n ≥ 1,

converges. Suppose that an ≥ 0 for all n ≥ 1. Prove that the following are all

Page 33: Rigorous Calculus

28 The Real Numbers and Limits

equivalent:

(i)∑∞

n=1 an converges.(ii) the sequence sn =

∑nk=1 ak is bounded above.

(iii) for all ε > 0, there is an N so that∑m

k=N+1 ak < ε for all m > N .

10. Suppose that (xn)∞n=1 is a sequence such that∑∞

n=1 |xn+1 − xn| converges.Prove that (xn) is a Cauchy sequence.

11. Let S = {x ∈ R : x = 0, 0 < sin( 1x) <

12}. Find supS and infS.

12. Suppose that (an)∞n=1 is a sequence such that

a2n−1 ≤ a2n+1 ≤ a2n+2 ≤ a2n for all n ≥ 1

Prove that the sequence converges if and only if limn→∞

an − an+1 = 0.

13. (a) Find limh→0

3√

1 + h− 1h

. HINT: rationalize the numerator.

(b) (i) Show that if h > −1 and h = 0, then 3√

1 + h < 1 + h3 .

(ii) For any 0 < ε < 18 , show that if 0 < h < 4ε, then

1 + (13− ε)h <

3√

1 + h.

(iii) For any 0 < ε < 18 , show that if −4ε < h < 0, then

1 + (13+ ε)h <

3√

1 + h.

(c) Use (b) to provide a different proof of (a).

14. Let x0 = 0 and xn+1 =√

5 − 2xn for n ≥ 0.(a) Compute x1, . . . , x10 on your calculator or computer.(b) Prove that the even terms are increasing and bounded above by all the odd

terms, which are decreasing. HINT: f(x) =√

5 − 2x is decreasing.(c) Get a bound for |xn+1 − xn| in terms of |xn − xn−1| for n ≥ 4.(d) Prove that lim

n→∞xn exists, and evaluate this limit.

15. Find a sequence of rational numbers (an) so that a real number L is a limit ofa subsequence of (an) if and only if e ≤ |L| ≤ π.

Page 34: Rigorous Calculus

CHAPTER 3

Functions

We briefly review some the of terminology regarding functions. A function is amap f : X → Y from a set X into a set Y that assigns exactly one value y = f(x)for each x ∈ X . The domain of f is the set on which it is defined, and the targetspace Y is the codomain. The range of f is the set {y = f(x) : x ∈ X}.

A function is one-to-one or injective if x1 = x2 ∈ X implies f(x1) = f(x2).A function is onto or surjective if for each y ∈ Y , there is some x ∈ X so thatf(x) = y. A function is one-to-one and onto or bijective if it is both injective andsurjective.

When the range is a field like R, say f, g : X → R, we can add, subtract andmultiply functions: (f ± g)(x) = f(x) ± g(x) and (fg)(x) = f(x)g(x). We canalso multiply them by a real scalar, say t ∈ R: (tf)(x) = tf(x). If g(x) = 0 for

all x ∈ X , we can divide: (f/g)(x) =f(x)

g(x).

If f : X → Y and g : Y → Z, the composition g ◦ f : X → Z is defined byg ◦ f(x) = g(f(x)). If f : X → Y is a bijection, then there is a unique functiong : Y → X so that g ◦ f(x) = idX(x) = x for all x ∈ X , namely, for each y ∈ Y ,there is a unique x ∈ X so that f(x) = y; and we set g(y) = x. This is called theinverse function of f , and is denoted f−1. We also have f ◦ f−1 = idY , so that fis also the inverse of f−1. We will discuss inverse functions in more detail later.

3.1. Limits of functions

We want to extend our definition of limits of sequences to limits of functions.

3.1.1. DEFINITION. Suppose that a real valued function f is defined on aninterval (a − d, a + d) \ {a} for some d > 0. The limit of a function f as x → ais L, written lim

x→af(x) = L, means that for all ε > 0, there is a δ > 0 so that if

0 < |x− a| < δ, then |f(x)− L| < ε.In symbols, ∀ε>0∃δ>0∀0<|x−a|<δ |f(x)− L| < ε.

3.1.2. REMARKS. (1) f does not need to be defined at x = a, and in any case,the value f(a) has no bearing on the value of the limit.

29

Page 35: Rigorous Calculus

30 Functions

(2) When f is defined on (a, a + d), we can talk about the limit from the right.We write lim

x→a+f(x) = L to mean: for all ε > 0, there is a δ > 0 so that if

a < x < a + δ, then |f(x) − L| < ε. Similarly, we define limit from the left,lim

x→a−f(x) = L, if for all ε > 0, there is a δ > 0 so that if a − δ < x < a, then

|f(x)− L| < ε.

3.1.3. EXAMPLES.(1) lim

x→3

√x =

√3. To prove this, estimate the difference

|√x−

√3| =

∣∣∣(x−√

3)(x+√

3)x+

√3

∣∣∣ = |x− 3|√x+

√3.

Make an initial choice to control the denominator: if |x− 3| < 1, then 2 < x < 4and so

1√x+

√3<

1√2 +

√3<

13.

Therefore |√x −

√3| < |x− 3|

3. Given ε > 0, pick δ = min{3ε, 1}. Then if

0 < |x − 3| < δ, we get |√x −

√3| < |x− 3|

3< ε. The first inequality requires

δ ≤ 1 and the second requires δ ≤ 3ε. Hence limx→3

√x =

√3.

(2) Let f(x) =

{1 if x ∈ Q0 if x ∈ Q

. Then limx→0

f(x) does not exist. Take any value L,

and let ε = 12 .

Case 1. Suppose that L ≤ 12 . For any δ > 0, pick n ∈ N so that 0 < 1

n < δ.Then |f( 1

n)− L| = |1 − L| ≥ 12 = ε. So L is not the limit.

Case 2. Suppose that L ≥ 12 . Given δ > 0, choose n ∈ N so that 0 < π

n < δ.Then |f(πn) − L| = |L| ≥ 1

2 = ε. Thus L is not the limit. So the limit does notexist.

Let g(x) = xf(x) =

{x if x ∈ Q0 if x ∈ Q.

Then 0 ≤ g(x) ≤ x. Then limx→0

0 = 0 =

limx→0

x. Therefore limx→0

g(x) = 0 by the function version of the Squeeze Theorem.

3.1.4. SQUEEZE THEOREM FOR FUNCTIONS. Suppose that f, g, h arefunctions on [a − d, a + d] \ {a} such that f(x) ≤ g(x) ≤ h(x) and lim

x→af(x) =

L = limx→a

h(x). Then limx→a

g(x) = L.

The following analogue of Proposition 2.2.5 is established in the same way asfor sequences. We leave the proofs as an exercise.

Page 36: Rigorous Calculus

3.1 Limits of functions 31

3.1.5. PROPOSITION. Suppose that f and g are functions on (a − d, a + d)such that lim

x→af(x) = L and lim

x→ag(x) =M , and let r ∈ R. Then

(1) limx→a

f(x) + g(x) = L+M .

(2) limx→a

rf(x) = rL.

(3) limx→a

f(x)g(x) = LM .

(4) If M = 0, then there is a δ0 > 0 so that g(x) = 0 for 0 < |x− a| < δ0;and

limx→a

f(x)

g(x)=

L

M.

3.1.6. EXAMPLE. Consider limθ→0

sin θθ

. Remember that we are using radians.

Note that if −π2 < θ < 0, then

sin θθ

=sin |θ||θ|

because this is an even function. So

it is enough to work with 0 < θ < π2 . Draw a circle of radius 1, centre O, and mark

the points A and B on rays separated by angle θ. Draw the lines perpendicular toOA throughA andB, and mark pointsC andD as in the figure. Note that segments

O

B

C

D A

FIGURE 3.1. Limit of sinxx

OA and OB have length 1, while BD has length sin θ, and AC has length tan θ.Observe that

△OAB ⊂ sector OAB ⊂ △OAD.Thus their areas compare:

12

sin θ ≤ 12θ 12 ≤ 1

2tan θ =

sin θ2 cos θ

.

Rearranging we get

cos θ ≤ sin θθ

≤ 1.

Page 37: Rigorous Calculus

32 Functions

In particular, 0 < sin θ < θ if 0 < θ < π2 . Therefore if 0 < θ < 1,

cos θ =√

1 − sin2 θ ≥√

1 − θ2 ≥ 1 − θ2.

Thus 1 − θ2 ≤ sin θθ

≤ 1. By the Squeeze Theorem, limθ→0

sin θθ

= 1.

Another important limit that is a consequence of this is

limx→0

1 − cosxx2 = lim

x→0

1 − (1 − 2 sin2 x2 )

x2 = limx→0

2 sin2 x2

4(x2 )2 =

12

We can use these two limits to compute the derivative of the sinx and cosxfunctions. We will not study the derivative formally until Chapter 5. However moststudents in this course have seen the derivative in their high school calculus course.So we will make use of that knowledge until we get to the theory later on.

d

dx(sinx)

∣∣x=a

= limh→0

sin(a+ h)− sin ah

= limh→0

sin a cosh+ cos a sinh− sin ah

= limh→0

sin a(cosh− 1)

h+ cos a

sinhh

= sin a(0) + cos a(1) = cos a.

andd

dx(cosx)

∣∣x=a

= limh→0

cos(a+ h)− cos ah

= limh→0

cos a cosh− sin a sinh− cos ah

= limh→0

cos a(cosh− 1)

h− sin a

sinhh

= cos a(0)− sin a(1) = − sin a.

Thusd

dx(sinx) = cosx and

d

dx(cosx) = − sinx.

Now I want to graph the function f(x) =sinxx

for x = 0. Before we draw any-thing, I will collect some information. First we know that sinx oscillates between±1 with period 2π. Therefore f(x) oscillates between the graphs of y = ± 1

x . Sincesinx has zeros at all integer multiples of π, πZ, f(x) = 0 at

{nπ : n ∈ Z \ {0}

};

but as we appraoch x = 0, we have limx→0

f(x) = 1. Also f(x) touches the curves

y = ± 1x at alternate odd multiples of π

2 . The function f is even, meaning that

f(−x) = sin(−x)−x

=sinxx

= f(x).

Page 38: Rigorous Calculus

3.1 Limits of functions 33

It won’t help much to solve for f ′(x) = 0 because we know that when thecurve touches y = ± 1

x , it will be tangent there, but won’t have derivative zero.Thinking of x > 0, the zeroes of f ′(x) will happen a bit before touching the curvey = ± 1

x , reaching an extremal point and turning slightly to line up with the othercurve. We can’t actually solve explicitly for these points, and the information onlyhelps a little bit. For example, the minimum of the function occurs at points ±x0where x0 is a bit smaller than 3π

2 . The exception is the behaviour near x = 0. Hereit helps.

f ′(x) =x cosx− sinx

x2 and limx→0

f ′(x) = limx→0

cosx− sinxx

x.

But we just showed that 0 > cosx − sinxx > cosx − 1, and limx→0

1−cosxx = 0.

Thus by the Squeeze Theorem, limx→0

f ′(x) = 0. This means that the curve flattens

out as it approached x = 0 at (0, 1).

O

B

C

D A

FIGURE 3.2. Graph of sinxx

3.1.7. EXAMPLE. Graph g(x) = x sin 1x on R \ {0}. This is also an even

function. As x approaches 0, 1x approaches ±∞. So sin 1

x oscillates faster andfaster between ±1. So g(x) oscillates between the two functions y = ±x. Inparticular, lim

x→0g(x) = 0. The zeroes occur at

{ 1nπ

: n ∈ Z \ {0}}

. At the points2

(2n+1)π for n ≥ 0, g(x) = (−1)nx. So the largest point where the curve touches

Page 39: Rigorous Calculus

34 Functions

FIGURE 3.3. Graph of x sin 1x

the bounding lines is at x = 2π . Use the even property to reflect this over to the

negative real axis.This explains the behaviour as we approach x = 0. What happens as x→ ∞?

In this limit, substitute u = 1x . As x→ ∞, u→ 0+.

limx→∞

x sin 1x = lim

u→0+

sinuu

= 1.

Thus this curve has a horizontal asymptote y = 1 as x → ∞ and by symmetry, asx→ −∞ as well.

3.2. The natural logarithm

For 0 < a < b, define A(a, b) to be the area under the graph y = 1x between

x = a and x = b. Set A(b, a) = −A(a, b) when b > a and L(a) = A(1, a).Observe that A(a, a) = 0 and A(a, b) +A(b, c) = A(a, c).

We are not going to do any integral calculus here. Rather I am going to arguegeometrically to obtain the properties that we need. For s > 0, consider the lineartransformation

Ts(x, y) = (sx, ys ) for (x, y) ∈ R2.

This takes a square S with corners (x, y), (x+ h, y), (x, y+ h) and (x+ h, y+ h)to the rectangle R = TsS with corners (sx, ys ), (sx + sh, ys ), (sx,

ys + h

s ), and(sx + sh, ys + h

s ). Now S has area h2 while R has area (sh)(hs ) = h2. Any niceplanar region can be approximated by a union of small squares. Since Ts preservesarea on each square, it preserves area for any nice region.

Page 40: Rigorous Calculus

3.2 The natural logarithm 35

FIGURE 3.4. Area under y = 1x

Next observe that the points on the curve y = 1x for x > 0, say (x, 1

x), aremapped by Ts to (sx, 1

sx). So Ts maps the curve onto itself. It similarly maps thex-axis onto itself. The line x = a, which consists of points (a, y), is mapped ontothe line x = sa. It is now easy to see that the region under the curve y = 1

x betweenx = a and x = b is mapped by Ts onto the region under y = 1

x between x = saand x = sb. Therefore, since Ts preserves area,

A(a, b) = A(sa, sb) for 0 < a < b and s > 0.

3.2.1. PROPOSITION. For a, b > 0, L(ab) = L(a) + L(b).

PROOF. L(ab) = A(1, ab) = A(1, a) +A(a, ab)

= A(1, a) +A(1, b) = L(a) + L(b). ■

3.2.2. COROLLARY. For a > 0 and n ∈ Z, L(an) = nL(a).

PROOF. First consider n ≥ 0. Clearly L(a0) = L(1) = A(1, 1) = 0 andL(a1) = L(a). Proceed by induction. Assume that the formula is true for n. Then

L(an+1) = L(aan) = L(a) + L(an) = L(a) + nL(a) = (n+ 1)L(a).

Thus by induction, the formula is valid for all n ∈ N0.Next consider n = −1.

L( 1a) = A(1, 1

a) = −A( 1a , 1) = −A(1, a) = −L(a).

Finally, if n ∈ N, L(a−n) = −L(an) = −nL(a). So the formula is valid forn ∈ Z. ■

Page 41: Rigorous Calculus

36 Functions

3.2.3. COROLLARY. L(x) is a strictly increasing function on (0,∞), and

limx→∞

L(x) = +∞ and limx→0+

L(x) = −∞.

PROOF. Strictly increasing is clear because A(a, b) > 0 for a < b. Therefore

limx→∞

L(x) = limn→∞

L(2n) = limn→∞

nL(2) = +∞

because L(2) > 0. Similarly,

limx→0+

L(x) = limn→ −∞

L(2n) = limn→−∞

nL(2) = −∞. ■

3.2.4. PROPOSITION. For a, b > 0,1b<

A(a, b)

b− a<

1a

. Thus for x > 1,

1x<L(x)

x− 1< 1.

PROOF. Notice that the region with areaA(a, b) contains the rectangle on [a, b]with height 1

b and is contained in the rectangle on [a, b] with height 1a . Therefore

FIGURE 3.5. Bounds for A(a, b)

b− a

b≤ A(a, b) ≤ b− a

a.

Divide by b− a. Take a = 1 and b = x to get the second formula. ■

3.2.5. COROLLARY. d

dxL(x) =

1x

for x > 0.

PROOF. For x > 0 and h > 0, we have1

x+ h<A(x, x+ h)

h=L(x+ h)− L(x)

h<

1x.

Page 42: Rigorous Calculus

3.2 The natural logarithm 37

Let h→ 0+ and use the Squeeze Theorem to get

limh→0+

L(x+ h)− L(x)

h=

1x.

Similarly if 0 < h < x,1x<A(x− h, x)

h=L(x− h)− L(x)

−h<

1x− h

.

Thus

limh→0−

L(x+ h)− L(x)

h=

1x.

Therefore L′(x) =1x

. ■

3.2.6. DEFINITION. The natural logarithm, written lnx or logx, is defined aslnx = L(x) for x > 0. The number e is the unique real number such that ln e = 1.

The main property of any logarithm function is given in Proposition 3.2.1:ln(ab) = ln a + ln b. The base of the logarithm is the number, in this case e,such that ln e = 1. This specific number is chosen so that the derivative is 1

x , notsome multiple. Corollary 3.2.2 shows that ln en = n, but in fact it says much more.Consider a rational number m

n . Then

m = ln em = ln((em/n)n) = n ln em/n.

Therefore ln em/n = mn .

If we have a log function such that a number a > 1 had log equal to 1, we callthis function loga x. Because of the product property, it follows that loga a

mn = m

n

and ln amn = m

n ln a. Therefore loga x =lnxln a

. Therefored

dxloga x =

1x ln a

.

We want to get a rough estimate for the value of e. We will find more precisemethods later. First estimate ln 2 = A(1, 2) is bounded above by the trapezoid withvertices (1, 0), (1, 1), (2, 1

2), (2, 0) which has base 1 and average height 12(1+

12) =

34 . So ln 2 < 0.75. Next look at ln 2.5. From 2 to 2.5, we bound the curve by

FIGURE 3.6. Estimating e

another trapezoid including vertices (2.5, 0.4) and (2.5, 0). Its area is greater than

Page 43: Rigorous Calculus

38 Functions

A(2, 2.5). This trapezoid has base 12 and average height 1

2(12 + 2

5) =9

20 , so it addsan additional 9

40 . Therefore ln 2.5 < 34 + 9

40 = 3940 < 1. Hence 2.5 < e.

On the other hand, the area A(1, 3) under 1x from 1 to 3 is bounded below by a

trapezoid on [1, 2] with upper edge tangent to 1x at (3

2 ,23) plus the rectangle on [2.3]

of height 13 . This provides a lower bound ln 3 = A(1, 3) > 1(2

3) +13 = 1. See the

figures. Thus 2.5 < e < 3.

3.3. The exponential function

Since lnx is strictly increasing on (0,∞) and maps onto the whole line, R,there is an inverse function E : R → (0,∞) satisfying

E(lnx) = x for x > 0 and ln(E(x)) = x for x ∈ R.

In particular, E is strictly increasing, and E(0) = 1 and E(1) = e. Each point(x, y) on the graph of lnx converts to the point (y, x) on the graph of E.

Since ln(ab) = ln a+ ln b, apply E to get ab = E(ln a+ ln b). Substitute x =ln a and y = ln b, which are arbitrary real numbers, to get E(x)E(y) = E(x+ y).Thus E(x)n = E(nx) and so E(xn)

n = E(x). Thus E(mnn ) = E(xn)

m = E(x)mn .

Now take x = 1 to get E(mn ) = emn . As E is monotone increasing, we obtain that

E(x) = ex for all real numbers. This is known as the exponential function.

(A) Graph of lnx (B) Graph of ex

3.3.1. EXAMPLES. Let’s try to understand the growth rates of these functionsmore precisely.

(1) limx→∞

lnxx

. If en ≤ x ≤ en+1, we have n ≤ lnx ≤ n+ 1, and hence

n

en+1 ≤ lnxx

≤ n+ 1en

.

Page 44: Rigorous Calculus

3.3 The exponential function 39

Claim: limn→∞

n

en= 0. Set an =

n

enand compute

an+1

an=n+ 1ne

≤ 2e< 0.8.

Therefore by repeated application, we get an+1 ≤ (.8)na1 → 0. Thus by the

Squeeze Theorem, limx→∞

lnxx

= 0.

(2) Let a > 0 (think of a as small). Then in the next limit, substitute y = xa andnote that as x→ ∞, then y → ∞.

limx→∞

lnxxa

= limy→∞

ln y1/a

y=

1a

limy→∞

ln yy

= 0.

This says that lnx grows more slowly than any positive power of x, and thus in-creases very slowly for large x. The curve gets flatter and flatter as x increases.

(3) This converts to a statement about ex. Here we substitute y = ex, and note thatas x→ ∞, then y → ∞.

limx→∞

ex

x= lim

y→∞

y

ln y= +∞.

Then for any b > 0 (think of b as very large), and again substitute y = ex,

limx→∞

ex

xb= lim

y→∞

y

(ln y)b= lim

y→∞

(y1/b

ln y

)b= +∞.

Hence ex grows faster than any power of x.

(4) Now consider the behaviour of lnx as x → 0+. Again let a > 0. So asx → 0+, we have lnx → −∞ and xa → 0. We will substitute y = 1

x , so thaty → ∞.

limx→0+

xa lnx = limy→∞

y−a ln y−1 = limy→∞

− ln yya

= 0.

Moreover it takes negative values. So while lnx → −∞, it goes more slowly thanxa goes to 0 for any a > 0.

(5) If I substitute y = −x in the following:

limx→−∞

|x|bex = limy→∞

yb

ey= 0.

This shows that ex goes to zero faster than any polynomial xn goes to ∞ as x →−∞.

3.3.2. EXAMPLE. Graph f(x) = x lnx for x > 0. Then f(x) = 0 only atx = 1, where it changes from negative to positive. It increases to infinity a bitfaster than a straight line. The interesting behaviour is between 0 and 1. We have

Page 45: Rigorous Calculus

40 Functions

limx→0+

f(x) = 0. Now look at the derivative: f ′(x) = lnx + 1. The is monotone

increasing, so the curve always curves upwards as a faster rate as x increases. Notethat 0 = f ′(x) implies that lnx = −1, so that x = 1

e . This is a minimum at (1e ,−

1e).

Notice also that limx→0+

f ′(x) = −∞. This means that the curve is becoming vertical.

There is a vertical tangent at the limit point (0, 0). See the figure.

FIGURE 3.8. Graph of x lnx.

3.3.3. PROPOSITION. d

dx(ex) = ex for x ∈ R.

PROOF. Apply the chain rule to the identity ln(ex) = x to get1ex

d

dx(ex) = 1 or

d

dx(ex) = ex. ■

3.3.4. PROPOSITION. limn→∞

(1 + x

n

)n= ex for x ∈ R.

PROOF. By Proposition 3.2.4 applied to 1 + xn , we have

11 + x

n

<ln(1 + x

n)

x/n<

11.

Thereforex

1 + xn

< ln(1 + x

n

)n< x.

Exponentiate to gete

x1+ x

n <(1 + x

n

)n< ex.

Now let n→ ∞. By the Squeeze Theorem, limn→∞

(1 + x

n

)n= ex. ■

In the next result, we need Bernouilli’s inequality.

3.3.5. LEMMA. (1 + x)n ≥ 1 + nx for 1 + x > 0 and n ≥ 1.

Page 46: Rigorous Calculus

3.3 The exponential function 41

PROOF. Proceed by induction on n. For n = 1, 1 + x = 1 + x. Assume that itis true for n− 1. Then

(1+x)n = (1+x)n−1(1+x) ≥ (1+(n−1)x)(1+x) = 1+nx+(n−1)x2 ≥ 1+nx.

Hence it is true for n. By induction, this holds for all n ≥ 1. ■

3.3.6. PROPOSITION. Let an = (1 + 1n

)n and bn = (1 + 1n

)n+1. Then an ismonotone increasing, bn is monotone decreasing and

limn→∞

an = limn→∞

bn = e

Similarly cn = (1 − 1n

)n−1 and dn = (1 − 1n

)n are monotone decreasing andincreasing respectively with limit 1

e .

PROOF. Set x = 1 in the previous Proposition and get limn→∞

an = e1. Also

limn→∞

bn = limn→∞

an(1 + 1n

)= e(1) = e.

Compute, using Bernouilli’s inequality at the last step.

an+1

bn=(n+ 2n+ 1

)n+1( n

n+ 1

)n+1=((n+ 2)n(n+ 1)2

)n+1

=(

1 − 1(n+ 1)2

)n+1≥ 1 − n+ 1

(n+ 1)2 =n

n+ 1.

Thus an+1 ≥ nn+1bn = an. Similarly we can invert this to get

bnan+1

=(

1 +1

n2 + 2n

)n+1≥ 1 +

n+ 1n2 + 2n

>n+ 2n+ 1

.

Thus bn+1 = n+2n+1an+1 < bn.

Note that cn =1

an−1and dn =

1bn−1

. The monotonicity and limit follows. ■

3.3.7. EXAMPLE. We will graph a more complicated function, g(x) = xe1/x.We first collect as much information as we can about zeros and limits approaching±∞ and any points where g(x) is undefined. Then we will look for critical points.

• g is undefined at x = 0. Also g has no zeros.

• Check the behaviour as x→ 0. Substitute y = 1x in the limits.

limx→0+

xe1/x = limy→+∞

ey

y= +∞.

This is a vertical asymptote. Notice that the limits from the two sidesdiffer!

limx→0−

xe1/x = limy→−∞

ey

y= 0−.

Page 47: Rigorous Calculus

42 Functions

• Behaviour at +∞. limx→+∞

xe1/x = +∞. However e1/x → 1, so in some

sense xe1/x ≈ x. To see how good an approximation this is, we computeanother limit. Substitute y = 1

x .

limx→+∞

xe1/x − x = limx→+∞

e1/x − 11/x

= limy→0+

ey − 1y

=d

dy(ey)|y=0 = e0 = 1.

This means that g(x) approaches the line y = x + 1 as x → ∞. So thisline is an oblique asymptote.

• Behaviour at −∞. limx→−∞

xe1/x = −∞, and again looks like x.

limx→−∞

xe1/x − x = limy→0−

ey − 1y

=d

dy(ey)|y=0 = 1.

Thus the line y = x+ 1 is an oblique asymptote as x→ −∞ as well.

• g′(x) = e1/x + xe1/x(−1x2 ) = e1/x(1 − 1

x). So g′(1) = 0 is the only

critical point. Since g(x) → +∞ as x → 0+ and as x → +∞, there is alocal minimum at (1, e).

• We should also check how g approaches 0 as x→ 0−.

limx→0−

g′(x) = limx→0−

(1 − 1x)e

1/x = limy→−∞

(1 − y)ey = 0.

So there is a horizontal tangent at the limit point (0, 0).

FIGURE 3.9. Graph of g(x).

Page 48: Rigorous Calculus

3.3 Exercises for Chapter 3 43

Exercises for Chapter 3

1. Give a careful ε–δ argument to prove that limx→8

3x1/3

x+ 4=

12

.

2. Compute the following limits:

(a) limx→1

3x3 − 1

− 4x4 − 1

(b) limx→1

sin(x2 − 1)x− 1

(c) limx→0

tanx− sinxx3

(d) limx→+∞

x3/2(√x+ 2 +

√x− 2

√x+ 1

)3. Consider the functions f(x) = x sin2( 1

x) for x = 0(a) Where does the graph of this function touch the curves y = 0, y = x and

y = 1x?

(b) Graph the function f(x) (by hand! ). Include graphs of the the auxillarycurves y = x and y = 1

x on the same graph and show the intersectionpoints from (a). Pay attention to the behaviour of f(x) as x approaches 0and ±∞ and identify any asymptotes.Do not try to compute local maxima and minima or inflection points.

4. Compute the following limits:

(a) limx→+∞

lnx1000

x1/1000

(b) limx→0

x−4e−1/x2

(c) limx→1

x2 − 1lnx

(d) limn→+∞

(1 + 1

n

)n2(1 + 1

n+1

)−(n+1)2

5. Graph the function f(x) =1 − cosx

x2 for x = 0. Draw the auxillary curve

y =2x2 on the same graph and explain where these two curves touch and where

f touches the x-axis. Pay attention to the behaviour of f(x) as x approaches0 and ±∞ and identify any asymptotes. Do not try to compute local maximaand minima or inflection points.

6. (a) Graph the function f(x) =lnxx

for x > 0. Pay attention to the behaviour

of f(x) as x approaches 0+ and +∞, asymptotes, maxima and minima,zeros and points of inflection (i.e. points where f ′′(x) changes sign).

(b) Which is larger, 3π or π3?

Page 49: Rigorous Calculus

44 Functions

7. Two of the hyperbolic trig functions are

sinh(x) =ex − e−x

2and cosh(x) =

ex + e−x

2.

(a) Sketch the functions ex/2, e−x/2, sinh(x) and cosh(x) on the same graph.Pay attention to the relationships between the four curves. I am not expect-ing a detailed graph here.

(b) Show that cosh2(x)− sinh2(x) = 1 for all x ∈ R.(c) Show that sinh(x+y) = sinh(x) cosh(y)+cosh(x) sinh(y) for all x, y ∈ R.(d) Solve sinh(x) = y for x as a function of y.

8. Consider the function f(x) = xe3x−1x2 for x = 0. Graph the function f(x) (by

hand! ). Pay attention to the following (show your work):• asymptotic behaviour at ±∞ and behaviour at 0.• compute the derivative and find the critical points, including lim

x→0f ′(x).

• compute any points of inflection.• choose a scale that illustrates the key features.

9. For which values of t > 1 does the expression tttt...

make sense?HINT: fix t > 1 and define a0 = 1 and an+1 = tan for n ≥ 0. The questionasks when this sequence has a limit. Try t =

√2 and t = 2 on the computer to

see what happens.(a) Show that an+1 > an for all n ≥ 1. What does this tell you?(b) When L = lim

n→∞an exists, solve for t in terms of L. Use this to find the

optimal upper bound for those values of t for which the limit exists. Whathappens for larger t?

(c) For these values of t, show by induction that an is bounded above by e forall n ≥ 1. What does this tell you?

Note: the behaviour when 0 < t < 1 is very interesting, but much trickier.Compute a few terms using t = 1

16 and see what occurs. Compare with t = 14 .

Page 50: Rigorous Calculus

CHAPTER 4

Continuity

4.1. Continuous functions

We introduce an extremely important property of functions.

4.1.1. DEFINITION. Suppose that f : [b, c] → R and b < a < c. The functionf is continuous at a if lim

x→af(x) = f(a). That is, for ε > 0, there is a δ > 0 so

that if |x − a| < δ, then |f(x) − f(a)| < ε. If it is not continuous at a, then it isdiscontinuous at a.

At endpoints of a closed interval, use one-sided limits. So f is continuous at bif lim

x→b+f(x) = f(b); and f is continuous at c if lim

x→c−f(x) = f(c). We say that

f(x) is a continuous function on a set X if it is continuous at each a ∈ X .

4.1.2. REMARK. In the definition of limx→a

f(x), we specify that 0 < |x−a| < δ.

But here we said |x−a| < δ. This is fine because f(a) is defined and is the putativelimit. In particular. |f(a)− f(a)| = 0 < ε is always true.

4.1.3. EXAMPLES.(1) Let f(x) = 1

x for x = 0. Fix a = 0. Then

|f(x)− f(a)| =∣∣∣1x− 1a

∣∣∣ = |a− x||ax|

.

Let ε > 0. If 0 < δ ≤ δ0 = |a|2 , then |x−a| < δ implies |x| ≥ |a|−|x−a| ≥ |a|/2.

Therefore|a− x||ax|

≤ 2|a− x|a2 . Now choose δ = min{a2ε

2 ,|a|2 }. Then |x− a| < δ

implies that |f(x) − f(a)| < 2δa2 ≤ ε. Therefore f(x) is continuous on R \ {0}.

Since limx→0+

1x= +∞, there is no way to define f at 0 to make it continuous there.

(2) Consider sinx. Use the addition formula

sin(a+h)−sin a=sin a cosh+cos a sinh−sin a=sin a(cosh−1) + cos a sinh.

45

Page 51: Rigorous Calculus

46 Continuity

Using the estimates from Example 3.1.6 for |h| < π/2,

| sin(a+ h)− sin a| ≤ 1| cosh− 1|+ 1| sinh| < h2 + |h|.Therefore lim

h→0sin(a+ h) = sin a. Hence sinx is continuous.

(3) A function f : [a, b] → R is Lipschitz with Lipschitz constant L if

|f(x)− f(y)| ≤ L|x− y| for all x, y ∈ [a, b].

Given ε > 0, we can take δ = ε/L. Then for any x, y in the domain with |x− y| <δ, we have |f(x)− f(y)| ≤ Lδ = ε. Therefore Lipschitz functions are continuous.

(4) Let f(x) =

sinxx

if x = 0

1 if x = 0. Then f(x) is continuous at x = 0 because

of Example 3.1.6. It is continuous everywhere else by general facts that we willestablish in the next section.

(5) Let f(x) =

1 if x > 00 if x = 0

−1 if x < 0. Then f(x) is continuous on R \ {0}, but is

discontinuous at 0.

(6) Let f(x) =

{x if x ∈ Q0 if x ∈ R \Q.

Then f(x) is continuous at 0, but discontinu-

ous at every a = 0.

(7) Let f(x) = sin 1x . This has a bad discontinuity at 0. There is no way to define

f(0) to make it continuous because every value in [−1, 1] is a limit of f(x) alongsome subsequence approaching 0.

x

y1

−1

0.5−0.5

FIGURE 4.1. Graph of sin(1/x).

(8) Thomae’s function. Let f(x) =

{1q if x = p

q ∈ Q, gcd(p, q) = 1, q > 0.0 if x ∈ R \Q.

Claim: f(x) is discontinuous on Q and continuous on R \Q.Every a ∈ R is a limit of irrational numbers. So if a ∈ Q, then since f(a) = 0,

f is discontinuous at a. Now suppose that a is irrational, and ε > 0. Choose N so

Page 52: Rigorous Calculus

4.2 Properties of Continuous Functions 47

x

y

−1 0 1 2

1

FIGURE 4.2. Thomae’s function on [−1, 2]

that 1N ≤ ε. The set X = {p

q: 1 ≤ q ≤ N, p ∈ Z} ∩ [a − 1, a + 1] is a finite

set of rational numbers. Thus dist(a,X) = min{|a − x| : x ∈ X} = δ > 0. If|x − a| < δ, then either x ∈ Q and f(x) − f(a) = 0 or x = p

q with q > N andgcd(p, q) = 1. Hence |f(x)− f(a)| ≤ 1

N+1 < ε. Therefore f is continuous at a.

4.2. Properties of Continuous Functions

The standard operations of functions that we use preserve continuity.

4.2.1. PROPOSITION. Suppose that f, g : [b, c] → R are both continuous at a.Then rf + sg and fg are continuous at a for any r, s ∈ R. If g(a) = 0, then f/gis also continuous at a.

PROOF. This is an immediate consequence of Proposition 3.1.5. We will usethe ε–δ definition to show that fg is continuous at a. We need to control

|f(x)g(x)− f(a)g(a)| = |f(x)g(x)− f(a)g(x) + f(a)g(x)− f(a)g(a)|≤ |f(x)− f(a)| |g(x)|+ |f(a)|, |g(x)− g(a)|.

First bound |g(x)|. Take ε0 = 1 and find δ0 > 0 so that |x − a| < δ0 implies that|g(x)−g(a)| < 1; and hence |g(x)| ≤ |g(a)|+1. Now given ε > 0, find δ1 > 0 sothat |x−a| < δ1 implies that |f(x)−f(a)| < ε

2(|g(a)|+1) . Also choose δ2 > 0 so that

|x − a| < δ2 implies that |g(x) − g(a)| < ε2(|f(a)|+1) . Define δ = min{δ0, δ1, δ2}.

If |x− a| < δ, then

|f(x)g(x)− f(a)g(a)| ≤ |f(x)− f(a)| |g(x)|+ |f(a)|, |g(x)− g(a)|

2(|g(a)|+ 1)(|g(a)|+ 1) +

ε

2(|f(a)|+ 1)|f(a)| < ε.

Therefore f(x)g(x) is continuous at a. ■

Next we show that composition preserves continuity.

Page 53: Rigorous Calculus

48 Continuity

4.2.2. THEOREM. Suppose that f : [r, s] → [u, v] and g : [u, v] → R. Supposethat a ∈ (r, s) and f(a) = b. If lim

x→af(x) = f(a) and lim

y→bg(y) = g(b), then

limx→a

g ◦ f(x) = g ◦ f(a); i.e., if f is continuous at a and g is continuous at f(a),then g ◦ f is continuous at a. If f and g are both continuous, then so is g ◦ f .

PROOF. Let ε > 0 be given. Find δ1 > 0 so that |y − b| < δ1 implies that|g(y)−g(b)| < ε. Use this δ1 as an epsilon in the limit of f to obtain δ2 > 0 so that|x− a| < δ2 implies that |f(x)− f(a)| < δ1. And hence |g(f(x))− g(f(a))| < ε.Therefore g ◦ f is continuous at a. ■

4.2.3. EXAMPLES.(1) Trig functions. We saw that sinx is continuous everywhere. Hence cosx =

sin(x + π2 ) is continuous. The function tanx =

sinxcosx

is then continuous except

at the points where cosx = 0, namely odd multiples of π2 . However tanx is not

defined at these points. Similarly, cotx, secx and cscx are continuous where theyare defined.

(2) If a > 0, we define ax = ex ln a. This is continuous on R.

(3) Rational functions, which are functions of the form f(x) =p(x)

q(x)where p and

q are polynomials are continuous except at the roots of q.

(4) f(x) = cos(x2 + 1x3 ) sin(e3 tanx) is continuous except at

{0, (2n+1)π

2 : n ∈ Z}

.

4.3. Extreme Value Theorem

This fundamental result describing general conditions which guarantee that afunction attains its maximum and minimum values depends both on continuity andon the completeness of the real line.

4.3.1. EXTREME VALUE THEOREM. Let f(x) be a continuous function ona closed, bounded interval [a, b]. Then f is bounded and there is a point c ∈ [a, b]so that f(c) = supa≤x≤b f(x), and a point d ∈ [a, b] so that f(d) = infa≤x≤b f(x).

PROOF. Let L = supa≤x≤b f(x). This could possibly be +∞. Pick real num-bers L1 < L2 < Ln < Ln+1 < . . . so that L = limn→∞ Ln. (E.g., if L < ∞,take Ln = L− 1

n and if L = ∞, take Ln = n.) Since supa≤x≤b f(x) > Ln, thereis an xn ∈ [a, b] so that f(xn) > Ln. Now (xn)n≥1 is a bounded sequence. Bythe Bolzano-Weierstrass Theorem, there is a convergent subsequence (xni) with

Page 54: Rigorous Calculus

4.4 Intermediate Value Theorem 49

limi→∞

xni = c. Since [a, b] is closed, c ∈ [a, b]. By continuity of f ,

L ≥ f(c) = limi→∞

f(xni) ≥ limi→∞

Lni = L.

Therefore L = f(c) <∞ and the supremum is attained.Similarly, there is a point d ∈ [a, b] so that f(d) = infa≤x≤b f(x). ■

4.3.2. EXAMPLES.(1) Examples where the maximum is attained are familiar. Like f(x) = 1− |x| on[−2, 2]. Then max f(x) = f(0) = 1.

(2) The real line, R, is closed but not bounded. The function f(x) = −11+x2 is

continuous and bounded, but does not attain sup f(x) = 0.

(3) The continuous function f(x) = x is not even bounded on R.

(4) Let f(x) = x on (0, 1). The interval (0, 1) is bounded, but not closed. Thecontinuous function f does not attain its supremum or infimum.

(5) Let f(x) = 1x on (0, 1]. This is continuous, but unbounded. Again the domain

(0, 1] is not closed.

(6) Define f(x) =

{0 if x = 01x if 0 < x ≤ 1

. This function is defined on a closed

bounded interval, but is unbounded. The problem is that f is not continuous at0.

(7) A similar example is f(x) =

{12 if x ∈ {0, 1}x if 0 < x < 1.

Again [0, 1] is closed and

bounded, and here f is bounded, but does not attain its supremum or infimum. Theproblem is that f is not continuous at 0 or 1.

4.4. Intermediate Value Theorem

There is a second important theorem about continuous functions that relies inan essential way on the completeness of R.

4.4.1. INTERMEDIATE VALUE THEOREM. Let f : [a, b] → R be a contin-uous function such that f(a) < L < f(b). Then there is a point c ∈ (a, b) so thatf(c) = L.

Page 55: Rigorous Calculus

50 Continuity

4.4.2. REMARK. Intuitively this says that if the curve y = f(x) starts below theline y = L at a and arrives above the line at b, then it must cross the line somewhere.However if we define f : Q → Q by f(x) = x3, then f(0) = 0 < 3 < 8 = f(2),but there is no rational number x so that f(x) = 3. There is a gap in Q at 3

√3. It is

the incompleteness of Q which allows the result to fail.

PROOF. Let X = {x ∈ [a, b] : f(x) < L}. Then a ∈ X and X ⊂ [a, b), soit is a non-empty bounded set. Hence c = supX exists by the Least Upper BoundPrinciple, and clearly c ≤ b.

Claim: c < b. Let ε = f(b)− L > 0. By continuity of f at b, there is a δ > 0so that if |x− b| < δ, then |f(x)− f(b)| < ε. Hence f(x) > f(b)− ε = L. so on(b− δ, b], f(x) > L. Thus X ∩ (b− δ, b] = ∅, and so c ≤ b− δ < b.

Claim: f(c) = L. Any x > c is not in X and so f(x) > L. Choose a sequence(an) in (c, b] and decreasing to c. Then by continuity,

f(c) = limn→∞

f(an) ≥ limn→∞

L = L.

On the other hand, there is a sequence of points xn ∈ X such that limn→∞

xn = c.Thus,

f(c) = limn→∞

f(xn) ≤ limn→∞

L = L.

Therefore, f(c) = L. ■

4.4.3. EXAMPLES.(1) Every polynomial of odd degree has a real root. Write the polynomial asp(x) = anx

n + an−1xn−1 + · · · + a1x + a0, where n is odd and an = 0. Ob-

serve that

limx→+∞

p(x) = limx→+∞

anxn(1 +

an−1

x+ · · ·+ a1

xn−1 +a0

xn)=

{+∞ if an > 0−∞ if an < 0.

Similarly,

limx→−∞

p(x) = limx→−∞

anxn(1 +

an−1

x+ · · ·+ a1

xn−1 +a0

xn)=

{−∞ if an > 0+∞ if an < 0.

Therefore p(x) changes sign. By the Intermediate Value Theorem (IVT), theremust be a point x0 so that p(x0) = 0.

(2) Every monic polynomial of even degree attains its minimum value. Writep(x) = x2n + a2n−1x

2n−1 + · · · + a1x + a0, Arguing as in the previous exam-ple, we see that

limx→+∞

p(x) = limx→−∞

p(x) = +∞.

Page 56: Rigorous Calculus

4.5 Intermediate Value Theorem 51

Therefore there is some large numberN so that if |x| > N , then p(x) > p(0)+1. Itfollows that the infimum of p occurs in [−N,N ]. By the Extreme Value Theorem,p attains its minimum value.

(3) Let f(x) = x179 +163

1 + x2 + sin2 x. The equation f(x) = 119 has a solution.

Note that f is continuous since the denominator is always non-zero. Now f(0) =

163 > 119 and f(1) = 1 +163

2 + sin2 1< 1 + 81.5 < 119. By IVT, there is a point

c ∈ (0, 1) so that f(c) = 119.

4.4.4. COROLLARY. If f : [a, b] → R is continuous, then Ran(f) is a closedbounded interval.

PROOF. By the Extreme Value Theorem, the range of f is bounded, and thereare points x0, x1 ∈ [a, b] so that

f(x0) = inf{f(x) : a ≤ x ≤ b} and f(x1) = sup{f(x) : a ≤ x ≤ b}.

By the Intermediate Value Theorem, every value y with f(x0) ≤ y ≤ f(x1) is inthe range of f . Thus Ran(f) = [f(x0), f(x1)] is a closed bounded interval. ■

When f is defined on an interval I which is not closed or not bounded, then byrestricting f to an increasing sequence of closed bounded subintervals with unionI , you can conclude that the range is an interval, but it may be open, closed or halfopen independent of I , and it may be unbounded.

4.4.5. COROLLARY. If I is an interval (open, closed or half closed) and f :I → R is continuous and one-to-one, then f is strictly monotone.

PROOF. We prove this by contradiction. Notice that f is monotone increasingand one-to-one if and only if a < b < c implies f(a) < f(b) < f(c); and likewisef is monotone decreasing if a < b < c implies f(a) > f(b) > f(c). Thus failureto be monotone means that there are points a < b < c so that f(a) < f(b) > f(c)or the inequalities are reversed, since an equality contradicts injectivity. Eitherway, there is a value L in (f(a), f(b)) ∩ (f(c), f(b)). By the IVT, there are pointsx1 ∈ (a, b) and x2 ∈ (b, c) so that f(x1) = L = f(x2), contradicting the fact thatf is one to one. Therefore f is monotone. ■

Page 57: Rigorous Calculus

52 Continuity

4.5. Monotone Functions

4.5.1. PROPOSITION. Let f : [a, b] → R be a monotone increasing function,and let a < c < b. Then lim

x→c−f(x) = L and lim

x→c+f(x) = M both exist, and

L ≤ f(c) ≤M .

PROOF. Let A = {f(x) : a ≤ x < c}. This is a non-empty set bounded aboveby f(c). Hence L = supA exists by the LUBP, and L ≤ f(c). For ε > 0, L− ε isnot an upper bound for A, and hence there is some x0 < c so that f(x0) > L− ε.Set δ = c − x0. If c − δ < x < c, then L − ε < f(x0) ≤ f(x) ≤ L. Thereforelim

x→c−f(x) = L. Similarly lim

x→c+f(x) = inf{f(x) : c < x ≤ b} =M ≥ f(c). ■

4.5.2. DEFINITION. If limx→c−

f(x) = L and limx→c+

f(x) = M both exist, but L,

M and f(c) are not all equal, then f is said to have a jump discontinuity.

4.5.3. REMARK. At the endpoint a, a similar analysis shows that limx→a+

f(x) =

M exists, and f(a) ≤ M . If f(a) < M , we call this a jump discontinuity as well.Similarly, lim

x→b−f(x) = L ≤ f(b); and it is a jump discontinuity if L < f(b).

The Proposition shows that the only type of discontinuity that a monotone func-tion can have is a jump discontinuity. This leads to the following useful conclusion.

4.5.4. COROLLARY. If I is an interval (closed, open or half open) and f : I →R is a monotone function, then f is continuous if and only if Ran(f) is an interval.

PROOF. If f is discontinuous at an interior point c ∈ I , then

limx→c−

f(x) = L < M = limx→c+

f(x).

Thus Ran(f) ⊂ [f(a), L]∪{f(c)}∪ [M,f(b)]. The range omits all except possiblyone point of (L,M), and hence the range is not an interval. A similar analysisworks at an endpoint.

Conversely, if Ran(f) is an interval, then we see that f has no jump disconti-nuities. Thus for a < c < b, we have lim

x→c−f(x) = f(c) = lim

x→c+f(x). Therefore

f is continuous at c. The argument works similarly at any endpoints. ■

Recall that when a function f is one-to-one, it is a bijection from its domainonto its range. Thus it has an inverse function f−1. We have seen that when fis continuous and one-to-one on an interval, then it is monotone and the range inan interval. We can now show that the inverse function is also continuous andmonotone.

Page 58: Rigorous Calculus

4.5 Monotone Functions 53

4.5.5. PROPOSITION. If I is an interval and f is continuous and strictly in-creasing, then the inverse function f−1 is continuous and strictly increasing. Sim-ilarly, if f is continuous and strictly decreasing, then the inverse function f−1 iscontinuous and strictly decreasing.

PROOF. We will deal with the increasing case. Since f is continuous, the rangeis an interval J . The inverse function f−1 : J → I is a bijection. If y1 < y2 ∈ J ,then there are unique points x1, x2 ∈ I with f(xi) = yi. Since f is increasing,x1 < x2. Thus f−1(y1) = x1 < x2 = f−1(y2). Thus f−1 is strictly increasing.The range is an interval, and thus has no gaps. Therefore f−1 is continuous. ■

A countable set X is either finite or can be written as a list X = {xn : n ∈ N}.See the Appendix section A.3.

4.5.6. PROPOSITION. A monotone function f : (a, b) → R is continuousexcept on a countable set.

PROOF. We may suppose that f is increasing. Since the only discontinuitiesare jump discontinuities, we count the jumps based on their size. Since the rangeof f may be unbounded, we also need to carefully approach the endpoints.

Between a+ 1n and b− 1

n , count the jumps Jn of height at least 1n . There can’t

be more than n(f(b− 1

n)−f(a+1n)), which is finite. Therefore all discontinuities

belong to J =⋃

n≥1 Jn, which is countable. ■

4.5.7. EXAMPLES.

(1) Inverse trig functions. The trig functions are all periodic, either 2π-periodiclike sinx and cosx or π-periodic like tanx and cotx. So they are not one-to-one. We get around that by restricting the domain so that it is injective, maps ontothe whole range, and the domain is as connected as possible and includes the firstquadrant (0, π2 ).

Take sinx. It is monotone increasing on [−π2 ,

π2 ] and maps onto [−1, 1]. Thus

sin−1(y) is the unique x ∈ [−π2 ,

π2 ] with sinx = y. For cosx, we use [0, π], on

which cosx is monotone decreasing and maps onto [−1, 1].The tangent tanx is defined on (−π

2 ,π2 ), is strictly increasing, and maps onto

R. So tan−1 : R → (−π2 ,

π2 ) is a bounded function on the whole line. Similarly

cotx maps (0, π) onto R and is strictly decreasing.The secant is a problem, because the range of secx is (−∞,−1]∪[1,∞), which

is the union of two disjoint intervals. Normal practice is to restrict the domain to[0, π2 ) ∪ (π2 , π]. On [0, π2 ), secant is monotone increasing and maps onto [1,∞);while on (π2 , π], secant is also increasing with range (−∞,−1]. Hence sec−1 is

Page 59: Rigorous Calculus

54 Continuity

increasing, and maps [1,∞) onto [0, π2 ), and maps (−∞,−1] onto (π2 , π]. Simi-larly, csc−1 is strictly decreasing, and maps [−π

2 , 0) onto (−∞,−1] and (0, π2 ] onto[1,∞).

FIGURE 4.3. sec−1(x)

(2) Enumerate Q ∩ (0, 1) = { 12 ,

13 ,

23 ,

14 ,

34 ,

15 ,

25 ,

35 ,

45 ,

16 , . . . } as {rn : n ≥ 1}.

Define f : [0, 1] → [0, 1] by

f(x) =∑

{n:rn<x}

2−n.

Then if 0 ≤ x < y ≤ 1, there is some rn so that x < rn < y, and thereforef(x) < f(y). So f is strictly monotone increasing. Clearly f has a jump discon-tinuity at every rn. However it is continuous at each irrational number as wellas 0 and 1. Say c is irrational in (0, 1) and ε > 0. Pick N so that 2−N < ε.Set δ = dist(x, {rn : 1 ≤ n ≤ N}. If c − δ < y < x < z < c + δ, then{n : y ≤ rn < z} ⊂ {n : n > N}, and hence

f(z)− f(y) =∑

{n:y≤rn<z}

2−n <∑n>N

2−n = 2−N < ε.

Defining L and M as in Proposition 4.5.1, we see that M −L < f(z)− f(y) < ε.But ε > 0 was arbitrary, and thus L =M and f is continuous at c.

(3) The Cantor function. Define a function f : [0, 1] → [0, 1] as follows: setf(0) = 0 and f(1) = 1. On the middle third, [1

3 ,23 ], set f(x) = 1

2 . Then take themiddle third from each of the remainder, and set f(x) = 1

4 on [19 ,

29 ] and f(x) = 3

4on [7

9 ,89 ]. At the nth stage, there are 2n − 1 intervals on which f is defined to

take the values k2n , in order, for 1 ≤ k ≤ 2n − 1. What remains are 2n intervals

of length 3−n on which f has not yet been defined. In each one, take the middle

Page 60: Rigorous Calculus

4.5 Monotone Functions 55

third and define f to take the average of the values at the two ends of the interval,which will be a number of the form 2k−1

2n+1 for 1 ≤ k ≤ 2n. In the end, we havedefined a monotone increasing, locally constant function on the union S of all ofthese intervals, taking all of the values k

2n for n ≥ 1 and 0 ≤ k ≤ 2n. These are thediadic rationals numbers in [0, 1]. The set S does not include the whole interval, sonow we define f on the rest by

f(x) = sup{f(t) : t ≤ x, t ∈ S}.

This defines a monotone increasing function on [0, 1] known as the Cantor function.Notice that the range of f has no gaps because the range includes all diadic

rationals,. Therefore f is continuous. On each open interval in S, namely (13 ,

23),

(19 ,

29), (

79 ,

89), etc., the function f is constant, and thus has derivative 0. The total

length of all of these intervals is

13+ 2(1

9) + 4( 127) + · · · =

∞∑k=1

2k−1

3n=

1/31 − 2/3

= 1.

So this is “most” of the interval in some sense. What remains after these openintervals are removed is a closed set C known as the Cantor set. It includes theendpoints of the removed intervals, and their limits, which is actually quite a lotmore.

x1

y1

0

FIGURE 4.4. The Cantor function

To understand this function better, write x in base 3 as x = (0.x1x2x3 . . . )base 3,where xi ∈ {0, 1, 2}. The interval [1

3 ,23 ] = {x : x1 = 1}. The endpoints are

13 = (0.1000 . . . )base 3 and 2

3 = (0.1222 . . . )base 3. Like in base 10, numbers with afinite expansion in base 3 also have another ending in an infinite string of 2’s. Then

[19 ,

29 ] = {x = (0.01x3x4 . . . )base 3} and [7

9 ,89 ] = {x = (0.21x3x4 . . . )base 3}.

Page 61: Rigorous Calculus

56 Continuity

At the nth stage the intervals are determined by an initial sequence of 0s and/or 2’sof length n − 1 followed by a 1. The endpoints actually have another expressionusing only 0s and 2s, 1

3 = (0.0222 . . . )base 3, 23 = (0.2000 . . . )base 3,

19 = (0.00222 . . . )base 3, 2

9 = (0.02000 . . . )base 3, 79 = (0.20222 . . . )base 3, etc.

The Cantor set consists of all numbers in [0, 1] which have a ternary expansionwith no 1s. Let (ε1, ε2, . . . ) be a sequence of 0s and 1s. Then

f((0.(2ε1)(2ε2)(2ε3) . . . )base 3 = (0, ε1ε2ε3 . . . )base 2.

The Cantor set is mapped onto [0, 1] by f . This shows that the Cantor set has thesame cardinality as [0, 1] and R, (See Appendix A.3.)

Exercises for Chapter 4

1. Let f(x) and g(x) be continuous functions on [a, b]. Define

f ∧ g(x) = min{f(x), g(x)} and f ∨ g(x) = max{f(x), g(x)}.Prove that f ∧ g and f ∨ g are continuous on [a, b].

2. (a) Let f(x) = x1

1−x for x ≥ 0, x = 1. Can f be defined at x = 1 in order tomake the function continuous there?

(b) Let g(x) =e1/x

1 + e1/x for x = 0. Can g be defined at x = 0 in order to make

the function continuous there?(c) Let f(x) = (sin2 x)sec2 x. Where is this function defined? Can f be defined

at the missing points in order to make the function continuous there?HINT: xa = ea lnx. Look for a derivative in the exponent.

3. Fix a number d > 0. A function f(x) on R is called d–periodic if f(x+ d) =f(x) for all x ∈ R. Let f be a continuous d-periodic function on R. Show thatf attains its maximum and minimum values.

4. Suppose that f : R → (0,∞) is a continuous function such that f(0) > 0 andlim

x→+∞f(x) = 0 = lim

x→−∞f(x). Prove that f attains its supremum.

5. Let f : (a, b) → R be a monotone increasing function.(a) Show that if f is bounded above, then lim

x→b−f(x) exists.

HINT: use MCT for a sequence approaching b−.(b) Hence show that for all a < c < b that lim

x→c−f(x) and lim

x→c+f(x) both exist.

When is f(x) continuous at c?(c) Construct a function which is monotone increasing and bounded on R and

is discontinuous at every rational number.HINT: list all of the rational numbers as Q = {r1, r2, r3, . . . } and introducea small jump discontinuity at each rk.

Page 62: Rigorous Calculus

4.5 Exercises for Chapter 4 57

6. Show that f(x) = |x|1/2 is continuous but not Lipschitz.

7. Suppose that there is a constant C such that

|f(x)− f(y)| ≤ C|x− y|2 for all a ≤ x, y ≤ b.

Prove that f is constant.

8. Show that tanx− 4 sinx = ex has at least 3 solutions in (−π2 ,

π2 ).

9. Show that there are two antipodal points on the equator with exactly the sametemperature. HINT: parametrize points on the equator by [0, 2π] using theangle θ from the centre of the earth. Assume that the temperature functionT (θ) is continuous. (Is this reasonable?) You are asked to prove the existenceof some θ so that T (θ) = T (θ + π).

10. Suppose that f : R → (0,∞) is a positive continuous function such thatlim

x→+∞f(x) = 0 = lim

x→−∞f(x). Prove that f attains its supremum.

11. (a) Show that a continuous function on (−∞,+∞) cannot take every real valueexactly twice.

(b) Find a continuous function on (−∞,+∞) which takes every real valueexactly three times. A sketch of the curve will suffice. An exact formula isnot required.

Page 63: Rigorous Calculus

CHAPTER 5

Differentiation

5.1. The derivative

As you have seen in high school calculus, the derivative of a function f(x) atx0, if it exists, is the slope of the tangent line to the curve y = f(x) at a point(x0, f(x0)). We compute this by computing the slope of a secant, the line segmentfrom (x0, f(x0)) to (x0 + h, f(x0 + h)), and taking the limit as h → 0 throughboth positive and negative values.

FIGURE 5.1. Tangent line

5.1.1. DEFINITION. Let f : (a, b) → R and let x0 ∈ (a, b). Then f is differen-tiable at x0 if

limx→x0

f(x)− f(x0)

x− x0= lim

h→0

f(x0 + h)− f(x0)

h

exists and is finite. The limit is called f ′(x0) or(

ddxf)(x0).

We say f(x) is differentiable on (a, b) if it is differentiable at every pointx0 ∈ (a, b). When f is defined on a closed interval [a, b], we will say that f isdifferentiable on [a, b] if it is differentiable on (a, b) and the one-sided derivatives

limh→0+

f(a+ h)− f(a)

hand lim

h→0−

f(b+ h)− f(b)

h

58

Page 64: Rigorous Calculus

5.1 The derivative 59

both exists and are finite. The tangent line to f(x) at x0 is

T (x) = f(x0) + f ′(x0)(x− x0).

This is the line through (x0, f(x0)) with slope f ′(x0).

5.1.2. THEOREM. If f(x) is differentiable at x0, then f is continuous at x0.

PROOF. This is an easy computation

limx→x0

f(x)− f(x0) = limx→x0

f(x)− f(x0)

x− x0(x− x0) = f ′(x0) · 0 = 0. ■

Now we provide two useful variants which are equivalent to differentiability.

5.1.3. THEOREM. Let f : (a, b) → R and let x0 ∈ (a, b). The following areequivalent:

(1) f is differentiable at x0 and f ′(x0) = m.

(2) there is a linear function T (x) = f(x0) +m(x− x0) such that

limx→x0

f(x)− T (x)

x− x0= 0.

(3) there is a function φ(x) which is continuous at x0 such that φ(x0) = mand f(x) = f(x0) + φ(x)(x− x0).

PROOF. (1)⇒(3). Define φ(x) =f(x)− f(x0)

x− x0if x = x0 and φ(x0) = m.

Then f(x) = f(x0) + φ(x)(x− x0) and

limx→x0

φ(x) = limx→x0

f(x)− f(x0)

x− x0= f ′(x0) = m = φ(x0).

Therefore φ is continuous at x0, and (3) holds.(3)⇒(2). Let T (x) = f(x0) +m(x− x0). Then

limx→x0

f(x)− T (x)

x− x0= lim

x→x0

f(x0) + φ(x)(x− x0)− f(x0)−m(x− x0)

x− x0

= limx→x0

φ(x)−m = 0.

(2)⇒(1). If (2) holds, then

0 = limx→x0

f(x)− T (x)

x− x0= lim

x→x0

f(x)− f(x0)

x− x0−m.

Therefore f ′(x0) = limx→x0

f(x)− f(x0)

x− x0= m exists. ■

Page 65: Rigorous Calculus

60 Differentiation

5.1.4. EXAMPLES.(1) Let f(x) = sinx. we have shown that f ′(x) = cosx. Our proof used theaddition formula:

sin(x0 + h) = sinx0 cosh+ cosx0 sinh

= sinx0 + cosx0 h+ sinx0(cosh− 1) + cosx0(h− sinh).

The tangent line is T (x0 + h) = sinx0 + cosx0 h. Define

φ(x0 + h) = cosx0 + sinx0cosh− 1

h+ cosx0

h− sinhh

.

Then sin(x0 + h) = sinx0 + φ(x0 + h)h and

limh→0

φ(x0 + h) = cosx0 + sinx0 limh→0

cosh− 1h

+ cosx0 limh→0

h− sinhh

= cosx0.

(2) Let f(x) = xn. Then using the binomial theorem,

f ′(a) = limh→0

(a+ h)n − an

h= lim

h→0

∑nk=0

(nk

)an−khk − an

h

= limh→0

an − an

h+

(n

1

)an−1h

h+

n∑k=2

(n

k

)an−khk−1

= nan−1 + limh→0

hn∑

k=2

(n

k

)an−khk−2 = nan−1.

So f ′(x) = nxn−1.

(3) f(x) = ax = ex ln a. Then

f ′(x0) = limh→0

ax0+h − ax0

h= ax0 lim

h→0

eh ln a − 1h ln a

ln a

= ax0 ln a limu→0

eu − 1u

= ax0 ln a.

(4) Let f(x) =

{x2 sin 1

x if x = 00 if x = 0

. For x = 0, we have

f ′(x) = 2x sin 1x + x2

(−1x2

)cos 1

x = 2x sin 1x − cos 1

x .

However at x = 0, we have

limh→0

f(h)− f(0)h

= limh→0

h sin 1h = 0

by the Squeeze Theorem (since −|h| ≤ h sin 1h ≤ |h|). Thus f ′(0) = 0. Notice

however that f ′(x) is discontinuous at 0:

limx→0

f ′(x) = limx→0

2x sin 1x − cos 1

x

Page 66: Rigorous Calculus

5.2 Differentiation Rules 61

does not exist!To graph f(x), we notice that as x → 0, the curve oscillates rapidly between

y = x2 and y = −x2. The function is odd. It has zeroes at 1nπ for n ∈ Z \ {0}.

It touches the bounding curves in between, and at ±2π . As x → ±∞, we can

approximate x2 sin 1x ≈ x2 1

x = x. Thus we compute (by substituting u = 1x )

limx→∞

x2 sin1x− x = lim

u→0+

1u

(sinuu

− 1)

We showed that for 0 < u < π2 , that cosu <

sinuu

< 1, and hence

cosu− 1u

<1u

(sinuu

− 1)< 0.

As limu→0cosu− 1

u= 0, the Squeeze Theorem shows that lim

x→∞x2 sin 1

x −x = 0.Thus y = x is an oblique asymptote. A similar analysis shows that as x→ −∞, itis also an asymptote.

FIGURE 5.2. Graph of x2 sin 1x

5.2. Differentiation Rules

We first derive the rules for derivatives of sums, products and quotients of func-tions. The addition rule is routine. The proofs of the product rule and the quotientrule are similar, but the latter is a bit more complicated. We will prove the quotientiule and leave the others as an exercise.

Page 67: Rigorous Calculus

62 Differentiation

5.2.1. PROPOSITION. Let f, g be differentiable at x, and let r, s ∈ R. Then(1) (rf + sg)′(x) = rf ′(x) + sg′(x).

(2) (fg)′(x) = f ′(x)g(x) + f(x)g′(x).

(3) If g(x) = 0,(f/g

)′(x) =

f ′(x)g(x)− f(x)g′(x)

g(x)2 .

PROOF.

limh→0

f(x+h)g(x+h) −

f(x)g(x)

h= lim

h→0

f(x+ h)g(x)− f(x)g(x+ h)

hg(x+ h)g(x)

= limh→0

1g(x+h)g(x)

(f(x+h)−f(x)h

g(x)+f(x)g(x)−g(x+ h)

h

)=

1g(x)2

(f ′(x)g(x)− f(x)g′(x)

).

We used the fact that differentiability of g implies continuity of g at x. ■

The chain rule for the derivative of a composition will be proven using Theo-rem 5.1.3.

5.2.2. CHAIN RULE. Let f : (a, b) → (c, d) and g : (c, d) → R. Supposethat f is differentiable at x0 and that g is differentiable at f(x0). Then g ◦ f isdifferentiable at x0 and

(g ◦ f)′(x0) = g′(f(x0))f′(x0).

PROOF. Since f is differentiable at x0, applying Theorem 5.1.3, we can writef(x) = f(x0) + φ(x)(x − x0) where φ(x0) = f ′(x0) = lim

x→x0φ(x). Similarly,

set y0 = f(x0) and write g(y) = g(y0) + ψ(y)(y − y0) where ψ(y0) = g′(y0) =limy→y0

ψ(y). Set h(x) = (g ◦ f)(x) = g(f(x)). Then

h(x) = h(x0) + ψ(f(x))(f(x)− f(x0)) = h(x0) + ψ(f(x))φ(x)(x− x0).

Let χ(x) = ψ(f(x))φ(x). Then χ(x0) = ψ(y0)φ(x0) = g′(f(x0))f′(x0). Since

f is continuous at x0, and φ and ψ are continuous at x0 and f(x0) respectively,

limx→x0

χ(x) = limx→x0

ψ(f(x))φ(x) = ψ(f(x0))φ(x0) = χ(x0).

By Theorem 5.1.3, h is differentiable at x0 and h′(x0) = g′(f(x0))f′(x0). ■

Suppose that f(x) has an inverse function f−1(x). If we knew that f andf−1 were differentiable, we could apply the chain rule to y = f ◦ f−1(y) to get1 = f ′(f−1(y))(f−1)′(y). Solving yields

(f−1)′(y) =1

f ′(f−1(y)).

Page 68: Rigorous Calculus

5.2 Differentiation Rules 63

For this to make sense, we also need f ′(f−1(y)) = 0. To avoid assuming differen-tiability, we use Theorem 5.1.3 again.

5.2.3. THEOREM (Inverse functions). Suppose that f is strictly monotoneand maps (a, b) onto (c, d). Let f−1 : (c, d) → (a, b) be the inverse function. Lety0 ∈ (c, d) and x0 = f−1(y0). If f is differentiable at x0 and f ′(x0) = 0, then f−1

is differentiable at y0 and

(f−1)′(y0) =1

f ′(f−1(y0)).

PROOF. Since f is differentiable at x0, applying Theorem 5.1.3, we can writef(x) = f(x0)+φ(x)(x−x0) where φ(x0) = f ′(x0) = lim

x→x0φ(x). For y ∈ (c, d),

let x = f−1(y). Therefore

y = f(x) = f(x0) + φ(x)(x− x0)

= y0 + φ(f−1(y))(f−1(y)− f−1(y0)).

Note that if y = y0, then φ(f−1(y)) = 0. Solve for f−1(y):

f−1(y) = f−1(y0) +y − y0

φ(f−1(y))= f−1(y0) + ψ(y)(y − y0),

where ψ(y) =1

φ(f−1(y))for y = y0. Set ψ(y0) =

1f ′(x0)

. Then

limy→y0

ψ(y) = limy→y0

1φ(f−1(y))

= limx→x0

1φ(x)

=1

φ(x0)=

1f ′(x0)

= ψ(y0).

Thus ψ is continuous at y0, so by Theorem 5.1.3, f−1 is differentiable at x0 and

(f−1)′(y0) =1

f ′(x0)=

1f ′(f−1(y0))

. ■

5.2.4. EXAMPLES.(1) Trig functions. We have seen that d

dx sinx = cosx and ddx cosx = − sinx.

d

dxtanx =

( sinxcosx

)′=

(cosx)(cosx)− (sinx)(− sinx)cos2 x

= sec2 x.

d

dxcotx =

(cosxsinx

)′=

− sin2 x− cos2 x

sin2 x= − csc2 x.

d

dxsecx =

( 1cosx

)′= − − sinx

cos2 x=

sinxcosx

1cosx

= tanx secx.

d

dxcscx =

( 1sinx

)′= − cosx

sin2 x= − cotx cscx.

Page 69: Rigorous Calculus

64 Differentiation

(2) Inverse trig functions. We restrict tanx to (−π2 ,

π2 ), which is an increasing

function with range R. Hence tan−1 : R → (−π2 ,

π2 ). Moreover, lim

x→∞tan−1(x) =

π2 shows that there is a horizontal asymptote, and similarly lim

x→−∞tan−1(x) = −π

2 .

We compute

(tan−1)′(x) =1

tan′(tan−1(x))=

1sec2(tan−1(x))

= cos2(tan−1(x)).

Take a right triangle with angle θ = tan−1(x) and compute cos θ = 1√1+x2 . There-

FIGURE 5.3. Graph of tan−1(x)

fore ddx tan−1(x) =

11 + x2 .

Next we restrict cotx to (0, π). It has range R, is decreasing, and verticalasymptotes x = 0 and x = π. Thus cot−1 : R → (0, π) is decreasing, and hashorizontal asymptotes y = 0 and y = π. The formula cotx = tan(π2 − x) impliesthat cot−1 x = π

2 − tan−1 x. Thus

d

dxcot−1(x) = − d

dxtan−1 x =

−11 + x2 .

For sin−1, we restrict sinx to [−π2 ,

π2 ] on which sinx is increasing with range

[−1, 1]. Thus the inverse sin−1 : [−1, 1] → [−π2 ,

π2 ] is increasing. The derivative is

d

dxsin−1(x) =

1cos(sin−1 x)

=1√

1 − x2.

Again we compute the cosine by drawing a right triangle with angle θ = sin−1 x.Notice that the derivative is undefined at x = ±1. This corresponds to the points±π

2 at which sin′ x = cosx = 0. The function sin−1 x has vertical tangents at(±1,±π

2 ) corresponding to the horizontal tangents of sinx at (±π2 ,±1).

Page 70: Rigorous Calculus

5.2 Differentiation Rules 65

Now cosx maps [0, π] onto [−1, 1] and is decreasing. We will use the relationcosx = sin(π2 − x). Therefore cos−1 x = π

2 − sin−1 x. Differentiating shows thatd

dxcos−1 x =

−1√1 − x2

.

For sec−1 x, recall from Example 4.5.7(1) that sec−1 has two branches definedon (−∞,−1] and [1,∞), respectively. We can compute the derivative by notingthat sec−1 x = cos−1 1

x . Therefore by the chain rule,

d

dxsec−1 x =

d

dxcos−1( 1

x) =−1√1 − 1

x2

−1x2 =

1|x|

√x2 − 1

.

There is a subtlety here. Since x2 > 0, we factor it as |x|√x2 in order to

clear the denominator in√

1 − 1x2 . This keeps the derivative positive, instead of

changing the sign if we forget the absolute value. Similarly, csc−1 x = sin−1 1x and

d

dxcsc−1 x =

−1|x|

√x2 − 1

.

(3) Let a = 0 and let f(x) = xa for x > 0, which is a non-integer power of x. Wewrite f(x) = ea lnx. Thus

f ′(x) = ea lnx a

x=axa

x= axa−1.

(4) Let f(x) = ln |x| for x = 0. When x > 0, f(x) = lnx and so f ′(x) = 1x .

When x < 0, f(x) = ln(−x), and so by the chain rule, f ′(x) = 1−x(−1) = 1

x .Therefore f ′(x) = 1

x .

(5) Implicit differentiation. Consider the curve

x3 + 2x2y + xy2 + 3y3 − 2x2 − xy − y2 + x+ 7y − 9 = 0.

By inspection, (0, 1) lies on the curve. However there is no easy way to solve for yas a function of x. We differentiate the whole expression:

3x2 + 2xy + x2y′ + y2 + 2xyy′ + 9y2y′ − 4x− y − xy′ − 2yy′ + 1 + 7y′ = 0.

Solve for y′:

y′ =−(3x2 + 2xy + y2 − 4x+ 1)x2 + 2xy + 9y2 − x− 2y + 7

.

Plugging in x = 0 and y = 1 yields y′ = −1/7 at (0, 1).

(6) Logarithmic differentiation. Consider f(x) = (tan2 x)cosxex2(x + 1)4. Then

since f(x) ≥ 0

ln f(x) = 2 cosx ln | tanx|+ x2 + 2 ln |x+ 1|

Page 71: Rigorous Calculus

66 Differentiation

which is valid unless f(x) = 0. Differentiate both sides:

f ′(x)

f(x)= −2 sinx ln | tanx|+ 2

cosxtanx

sec2 x+ 2x+2

|x+ 1|.

Solve for f ′(x). At points where f(x) = 0, a separate argument is needed.

5.3. Maxima and Minima

Recognizing the maximum and minimum points on a graph, even locally, hasmany applications.

5.3.1. DEFINITION. If f : [a, b] → R is a function, a point x0 is a maximumfor f if f(x0) = sup{f(x) : a ≤ x ≤ b}. A point x0 is a minimum for f iff(x0) = inf{f(x) : a ≤ x ≤ b}.

We say that x0 is a local maximum for f if there is δ > 0 so that it is a maximumon the smaller interval [x0 − δ, x0 + δ]. A point x0 is a local minimum for f if thereis δ > 0 so that it is a minimum on [x0 − δ, x0 + δ].

5.3.2. FERMAT’S THEOREM. Suppose that f : [a, b] → R is a continuousfunction which attains its maximum or minimum value at x0. Then either

(1) x0 ∈ {a, b} is an endpoint of [a, b],

(2) f ′(x0) is undefined,or

(3) f ′(x0) = 0.

FIGURE 5.4. Fermat’s Theorem

Page 72: Rigorous Calculus

5.3 Maxima and Minima 67

PROOF. Suppose that x0 is a maximum for f and (1) is false, so a < x0 < b;and that (2) is also false, so that f ′(x0) is defined. Then

f ′(x0) = limh→0+

f(x0 + h)− f(x0)

h≤ 0

= limh→0−

f(x0 + h)− f(x0)

h≥ 0.

The first inequality is because the numerator is negative and the denominator ispositive, while the second inequality is because the numerator is negative and thedenominator is negative. Therefore f ′(x0) = 0. Minima are treated similarly. ■

We get the following important corollary.

5.3.3. ROLLE’S THEOREM. Suppose that f : [a, b] → R is continuous, isdifferentiable on (a, b) and f(a) = f(b). Then there is a point x0 ∈ (a, b) so thatf ′(x0) = 0.

PROOF. If f is constant, any point x0 ∈ (a, b) will do. Otherwise there issome x with f(x) = f(a). without loss of generality, f(x) > f(a). By theExtreme Value Theorem, f attains its maximum value at some point x0; and clearlyx0 ∈ {a, b}. By Fermat’s Theorem, since f is differentiable at x0, f ′(x0) = 0. ■

5.3.4. EXAMPLE. Snell’s Law. A beam of light travels from point A to pointB. It starts in one medium in which the speed of light is c1, but when it crossesthe line CD into the second medium, the speed is c2. (Typical media are air, water,glass, vacuum, etc.) The light will travel in a straight line from A to some point Xon CD, but then will change the angle (refraction) and follow a straight line to B.The point X is determined by Fermat’s Principle: the light will travel along thepath requiring the least time.

FIGURE 5.5. Snell’s law

Drop perpendicular lines fromA to a point C on the line, and fromB to a pointD. Let the distances be CD = L, AC = h1 and BD = h2. We will treat X asa variable position along the line, and compute the time it would take the light to

Page 73: Rigorous Calculus

68 Differentiation

travel through X . Then we will minimize the time to find a relationship betweenthe angle of incidence, α1 = ∠CAX , and the angle of refraction, α2 = ∠DXB.Once we fix the angle α1, the point X and the angle α2 are determined as functionsof α1.

The distances travelled in each medium are

AX = h1 secα1 and XB = h2 secα2.

Thus the time taken is

T (α1) =h1

c1secα1 +

h2

c2secα2.

We also need to work L into this by observing that

L = CX +XD = h1 tanα1 + h2 tanα2.

This works for −π2 < α1 <

π2 , so even if the light should travel away fromB or be-

yondB at first (actually impossible), the formula is still valid. We will differentiateit with respect to α1:

0 =dL

dα1= h1 sec2 α1 + h2 sec2 α2

dα2

dα1.

Thereforedα2

dα1=

−h1 sec2 α1

h2 sec2 α2. Observe that lim

α1→±π2

T (α1) = ∞, and thus T will

attain its minimum value in between. Differentiate T :dT

dα1=h1

c1secα1 tanα1 +

h2

c2secα2 tanα2

dα2

dα1

=h1

c1secα1 tanα1 −

h2

c2secα2 tanα2

h1 sec2 α1

h2 sec2 α2

= h1 sec2 α1

(sinα1

c1− sinα2

c2

).

Therefore T ′(α1) = 0 if and only ifsinα1

c1=

sinα2

c2; and this must be the mini-

mum. The relationship between the two angles is known as Snell’s law.

5.4. Mean Value Theorem

There is a routine but powerful extension of Rolle’s Theorem.

5.4.1. MEAN VALUE THEOREM. Suppose that f : [a, b] → R is continuous,and is differentiable on (a, b). Then there is a point x0 ∈ (a, b) so that

f ′(x0) =f(b)− f(a)

b− a.

Page 74: Rigorous Calculus

5.4 Mean Value Theorem 69

FIGURE 5.6. Mean Value Theorem

PROOF. Let g(x) = f(x)−(f(b)− f(a)

b− a

)x. Then

g(b)− g(a) = f(b)−(f(b)− f(a)

b− a

)b− f(a) +

(f(b)− f(a)

b− a

)a

= f(b)− f(a)−(f(b)− f(a)

b− a

)(b− a) = 0.

So g(a) = g(b). By Rolle’s Theorem, thee is a point x0 ∈ (a, b) so that

0 = g′(x0) = f ′(x0)−f(b)− f(a)

b− a.

Thus f ′(x0)−f(b)− f(a)

b− a. ■

The following immediate consequence is used all the time.

5.4.2. COROLLARY. Suppose that f : [a, b] → R is continuous, and is differ-entiable on (a, b).

• If f ′(x) > 0 on (a, b), then f is strictly increasing.

• If f ′(x) ≥ 0 on (a, b), then f is increasing.

• If f ′(x) < 0 on (a, b), then f is strictly decreasing.

• If f ′(x) ≤ 0 on (a, b), then f is decreasing.

PROOF. We only prove the first statement. Suppose that a ≤ x < y ≤ b. Thenby the Mean Value Theorem applied to [x, y], there is a point x0 ∈ (x, y) so that

f(y)− f(x)

y − x= f ′(x0) > 0.

Hence f(x) < f(y). ■

5.4.3. PROPOSITION. If f : [a, b] has a continuous derivative f ′(x) andf ′(x0) > 0, then there is a δ > 0 so that f is strictly increasing on (x0 − δ, x0 + δ).

Page 75: Rigorous Calculus

70 Differentiation

PROOF. Take ε = f ′(x0). By continuity of f ′, there is a δ > 0 so that ifx ∈ (x0 − δ, x0 + δ), then |f ′(x)− f ′(x0)| < ε. Hence

f(x) ≥ f(x0)− |(f(x)− f(x0)| > ε− ε = 0.

Thus f is strictly increasing on (x0 − δ, x0 + δ). ■

5.4.4. EXAMPLE. This proposition can fail if f ′ is not continuous. Let

f(x) = ax+ x2 sin1x

for 0 < a < 1. By Example 5.1.4(4), this function is differentiable everywhere,and f ′(x) is continuous except at x = 0. Moreover f ′(0) = a > 0. However

f ′(x) = a+ 2x sin1x+ x2 cos

1x

(−1x2

)= a+ 2x sin

1x− cos

1x.

For xn = 12nπ , f ′(xn) = a − 1 < 0. By Proposition 5.4.3, f is decreasing on

a small interval around xn. Since xn → 0, f is not increasing on any intervalcontaining 0.

If a > 1, then for any |x| < (a− 1)/2 = ε, we have f ′(x) ≥ a− 2ε− 1 = 0.Thus f is strictly increasing on

( 1−a2 , a−1

2

).

When a = 1, we see that f ′(xn) = 0. However

f ′′(x) = 2 sin 1x + 2x cos 1

x

(−1x2

)+ sin 1

x

(−1x2

)=(2 − 1

x2

)sin 1

x − 2x cos 1

x .

This is continuous except at x = 0 and f ′′(xn) = −2xn

< 0. By Proposition 5.4.3,f ′(x) is strictly decreasing on a small interval (xn − δ, xn + δ) around xn. Hencef ′(x) < 0 on (xn, xn + δ). Again f is not increasing on any interval containing 0.

Now we demonstrate another application of the Mean Value Theorem.

5.4.5. EXAMPLES.(1) Let f(x) = sinx. For 0 < x ≤ π

2 , apply MVT on [0, x]. There is a pointx0 ∈ (0, x) so that

sinxx

=sinx− sin 0

x− 0= f ′(x0) = cosx0 ∈ (0, 1).

Hence 0 < sinx < x, an inequality established in Example 3.1.6.Now let g(x) = 1

2x2 + cosx. Then g′(x) = x− sinx > 0 on (0, π2 ). Thus g is

strictly increasing, so that 12x

2 + cosx > g(0) = 1 on (0, π2 ]. That is,

1 − 12x

2 < cosx < 1 for 0 < x ≤ π2 .

Now let h(x) = sinx− x+ 16x

3. This is chosen so that h(0) = 0 and h′(x) =cosx− 1+ 1

2x2 > 0. Thus h is strictly increasing on [0, π2 ]. Hence h(x) > h(0) =

0. Thereforex− 1

6x3 < sinx < x for 0 < x ≤ π

2 .

Page 76: Rigorous Calculus

5.4 Mean Value Theorem 71

Once more! Let k(x) = cosx − (1 − 12x

2 + 124x

4). Then k(0) = 0 andk′(x) = − sinx + x − 1

6x3 = −h(x) < 0. Thus k is strictly decreasing, and so

k(x) < 0. Therefore

1 − 12x

2 < cosx < 1 − 12x

2 + 124x

4 for 0 < x ≤ π2 .

(2) Istanxx

>x

sinxon (0, π2 )?

This is true if and only if f(x) = sinx tanx− x2 > 0. Note that f(0) = 0 and

f ′(x) = cosx tanx+ sinx sec2 x− 2x = sinx(1 + sec2 x)− 2x.

Thus f ′(0) = 0 and

f ′′(x) = cosx(1 + sec2 x) + sinx(2 secx(secx tanx))− 2

= (cosx− 2 + secx) + 2sin2 x

cos3 x

=(√

cosx− 1√cosx

)2+ 2

sin2 x

cos3 x> 0.

Thus f ′′(x) > 0 on (0, π2 ), and therefore f ′(x) is strictly increasing on (0, π2 ).Since f ′(x) = 0, we have f ′(x) > 0, and thus f(x) is strictly increasing. Finally,since f(0) = 0, we have f(x) > 0 on (0, π2 ). So the answer is yes.

(3) We introduce the hyperbolic trig functions.

sinhx =ex − e−x

2coshx =

ex + e−x

2and tanhx =

sinhxcoshx

=ex − e−x

ex + e−x.

Note that tanhx has limx→∞

tanhx = 1 and limx→−∞

tanhx = −1. Thus tanhx has

horizontal asymptotes y = ±1. Note that

d

dxsinhx =

ex + e−x

2= coshx and

d

dxcoshx =

ex − e−x

2= sinhx.

Hence

d

dxtanhx =

cosh2 x− sinh2 x

cosh2 x

=(e2x + 2 + e−2x)− (e2x − 2 + e−2x)

4 cosh2 x

=1

cosh2 x=: sech2 x.

Claim: tan−1 x < π2 tanhx. Note that π

2 tanhx has the same horizontal asymp-totes, y = ±π

2 , as tan−1 x. Also tan−1 0 = 0 = tanh 0. However

π

2tanh′(0) =

π

2> 1 =

11 + 02 =

d

dxtan−1(0).

Page 77: Rigorous Calculus

72 Differentiation

So π2 tanhx increases faster near x = 0. However that means that tan−1 x will

have to increase faster for larger x to make up the difference. Subtracting them anddifferentiating will not help us here.

Let f(x) =tan−1 x

tanhxfor x ≥ 0. Then lim

x→∞

tan−1 x

tanhx= π

2 . Differentiate

f ′(x) =1

1+x2 tanhx− tan−1 x sech2 x

tanh2 x

=sinhx coshx− (1 + x2) tan−1 x

(1 + x2) sinh2 x

=12 sinh 2x− (1 + x2) tan−1 x

(1 + x2) sinh2 x

The denominator is positive and f ′(0) = 0. Let g(x) = 12 sinh 2x−(1+x2) tan−1 x.

Theng′(x) = cosh 2x− 2x tan−1 x− 1

g′′(x) = 2 sinh 2x− 2 tan−1 x− 2x1 + x2

and

g(3)(x) = 4 cosh 2x− 21 + x2 −

2 + 2x2 − 2x(2x)(1 + x2)2 = 4

(cosh 2x− 1

(1 + x2)2

)> 0

because cosh 2x > 1 > 1(1+x2)2 . Now 0 = g(0) = g′(0) = g′′(0). Since g(3) > 0,

g′′ is strictly increasing and hence positive. Thus g′ is strictly increasing, and hencepositive; and thus g is strictly increasing. Finally, this means that g > 0, and thusf ′ > 0. Thus f(x) is strictly increasing. In particular, f(x) < π

2 for all x > 0.

5.5. Convexity and the second derivative

5.5.1. DEFINITION. Higher order derivatives. If f ′(x) is differentiable, thenwe write f ′′(x) = d

dxf′(x) for the second derivative of f . Similarly, for k ≥ 3,

we write f (k)(x) = ddxf

(k−1)(x) =: dk

dxk f(x) for the kth derivative. If f has kderivatives and f (k) is continuous, we call f a Ck function.

5.5.2. DEFINITION. A function f : (a, b) → R is convex if

f(tu+ (1 − t)v) ≤ tf(u) + (1 − t)f(v) for a < u < v < b and 0 < t < 1.

And f is strictly convex if this is a strict inequality. A function g is concave iff(x) = −g(x) is convex.

Page 78: Rigorous Calculus

5.5 Convexity and the second derivative 73

5.5.3. REMARK. The straight line through (u, f(u)) and (v, f(v)) is

L(tu+ (1 − t)v) = tf(u) + (1 − t)f(v)

= f(u) +f(v)− f(u)

v − u(x− u) for t ∈ R.

The values 0 < t < 1 yield all points x = tu + (1 − t)v between u and v.So convexity means that the function always lies below the chord connecting twopoints on the graph of f .

FIGURE 5.7. A convex function

5.5.4. PROPOSITION. f ′′(x) ≥ 0 on (a, b) implies that f ′(x) is increasing on(a, b), which implies that f(x) is convex on (a, b). Similarly, f ′′(x) ≤ 0 on (a, b)implies that f ′(x) is decreasing on (a, b), which implies that f(x) is concave on(a, b). Strict inequalities yield strict convexity/concavity.

PROOF. The first step follows from the Mean Value Theorem. Now supposethat f ′(x) is increasing (even if f ′′(x) is not defined). Define

g(x) = f(x)− L(x) = f(x)− f(u)− f(v)− f(u)

v − u(x− u).

Then g(u) = g(v) = 0 and g′(x) is increasing. By Rolle’s Theorem, there is apoint x0 ∈ (u, v) so that g′(x0) = 0. Then g′(x) ≤ 0 on [u, x0], so that g(x)is decreasing and thus g(x) ≤ g(u) = 0. Likewise g′(x) ≥ 0 on [x0, v], so thatg(x) is increasing, and thus g(x) ≤ g(v) = 0. Thus g(x) ≤ 0 on [u, v]; whencef(x) ≤ L(x).

The other cases are similar. ■

This next corollary is known as the second derivative test for extreme points.

Page 79: Rigorous Calculus

74 Differentiation

5.5.5. COROLLARY. Suppose that f is C2 on (a, b). If x0 ∈ (a, b), f ′(x0) = 0and f ′′(x0) < 0, then x0 is a local maximum. If x0 ∈ (a, b), f ′(x0) = 0 andf ′′(x0) > 0, then x0 is a local minimum.

PROOF. Since f ′′(x0) < 0 and f ′′ is continuous, there is a δ > 0 so thatf ′′(x) < 0 on (x0−δ, x0+δ). Thus f ′(x) is strictly decreasing on (x0−δ, x0+δ).Since f ′(x0) = 0, f ′(x) > 0 on (x0 − δ, x0); and thus f(x) ≤ f(x0) there.Similarly, f ′(x) < 0 on (x0, x0 + δ); and thus f(x) ≤ f(x0) there too. Hence x0is a local maximum, ■

The second derivative f ′′(x) measures the curvature of the curve y = f(x).When f ′′(x) > 0, the slope f ′(x) is increasing, and thus the graph is curvingupwards. Similarly if f ′′(x) < 0, the slope is decreasing, and the graph is curvingdownwards. When f ′′(x) changes sign, the curve switches from curving down tocurving up or vice versa. The transition point is called a point of inflection. Thiscan happen when f ′′(x) = 0, but also when there is a vertical tangent

FIGURE 5.8. Inflection points

What does the third derivative, f (3)(x) represent physically? For a movingvehicle, it measures the change in acceleration. For example, when a subway carstarts up, there is a jerk as it takes off. For this reason, the third derivative issometimes called the jerk, especially in physics.

5.5.6. EXAMPLE. Graph f(x) =x1/3

x− 1.

• f(x) = 0 only at x = 0.• f is undefined at x = 1. lim

x→1−f(x) = −∞ and lim

x→1+f(x) = +∞. So x = 1 is

a vertical asymptote.• lim

x→±∞f(x) = 0+. So y = 0 is a horizontal asymptote as x→ ±∞.

• f ′(x) =13x

−2/3(x− 1)− x1/3

(x− 1)2 =(x− 1)− 3x3x2/3(x− 1)2 =

−1 − 2x3x2/3(x− 1)2 .

So f ′(−12) = 0. Thus (−1

2 ,3√43 ) is a critical point. The denominator is positive,

and the numerator changes sign from positive to negative at x = − 12 . Hence this is

Page 80: Rigorous Calculus

5.6 Convexity and the second derivative 75

a local maximum.Also f ′(x) is undefined at x = 0 and 1. At the point (0, 0), lim

x→0f ′(x) = −∞. This

is a vertical tangent. It is also an inflection point.We already know what is happening near x = 1.• For f ′′(x), use the product rule rather than the quotient rule.

f ′′(x) =−2

3x2/3(x− 1)2 +(−1 − 2x)(−2

3)

3x5/3(x− 1)2 +(−1 − 2x)(−2)3x2/3(x− 1)3

=−2x(x− 1) + (1 + 2x)(2

3)(x− 1) + (1 + 2x)(2x)3x5/3(x− 1)3

=2(5x2 + 5x− 1)

9x5/3(x− 1)3 .

Now f ′′(x) = 0 at the roots of 5x2 + 5x − 1 = 0, namely x =−5 ± 3

√5

10which

are about 0.17 and −1.17. These are inflection points.Also f ′′ is undefined at 0, 1, but this doesn’t add new information.

FIGURE 5.9. Graph of f(x)

Page 81: Rigorous Calculus

76 Differentiation

5.6. Convexity and Jensen’s Inequality

We first look at how close convex functions are to being differentiable.

5.6.1. SECANT LEMMA. Let f : (a, b) → R be convex and let a < x < y <z < b. Then

f(y)− f(x)

y − x≤ f(z)− f(x)

z − x≤ f(z)− f(yx)

y − y.

FIGURE 5.10. Secant Lemma

PROOF. Let t = z−yz−x and observe that y = tx + (1 − t)z. By convexity,

f(y) ≤ tf(x) + (1 − t)f(z). Therefore

f(y)− f(x) ≤ (1 − t)(f(z)− f(x)).

Since y − x = (1 − t)(z − x), we can divide and obtain

f(y)− f(x)

y − x≤ f(z)− f(x)

z − x.

The second inequality is proven in the same manner. ■

5.6.2. COROLLARY. If f : (a, b) → R is convex, then it is continuous.

PROOF. It is enough to show that f is continuous on [c, d] if a < c < d < b.Pick c′ and d′ so that a < c′ < c < d < d′ < b. If c ≤ x < y ≤ d, then by theSecant Lemma,

C :=f(c)− f(c′)

c− c′≤ f(y)− f(x)

y − x≤ f(d′)− f(d)

d′ − d=: D.

Hence |f(y) − f(x)| ≤ max{|C|, |D|}|y − x|. Therefore f is Lipschitz on [c, d],and so is continuous. ■

Page 82: Rigorous Calculus

5.6 Convexity and Jensen’s Inequality 77

5.6.3. REMARK. A convex function can fail to be continuous at an endpoint.

For example, f(x) =

{0 if 0 < x < 11 if x ∈ {0, 1}

.

Also a convex function does not need to be differentiable everywhere. Thefunction f(x) = |x| on (−1, 1) is typical.

5.6.4. DEFINITION. A function f : (a, b) → R has a right derivative at x if

D+f(x) = limh→0+

f(x+ h)− f(x)

hexists. Similarly there is a left derivative at x if

D−f(x) = limh→0−

f(x+ h)− f(x)

hexists.

5.6.5. THEOREM. Let f : (a, b) → R be convex. Then f has left and rightderivatives at every point. If a < x < y < b, then

D−f(x) ≤ D+f(x) ≤ D−f(y) ≤ D+f(y).

PROOF. Let 0 < h < k < min{x − a, 12(y − x), b − y} =: δ. By the Secant

Lemma,

f(x)− f(x− k)

k≤ f(x)− f(x− h)

h≤ f(x+ h)− f(x)

h≤ f(x+ k)− f(x)

k.

Therefore g(h) =f(x+ h)− f(x)

his an increasing function on (−δ, δ) \ {0}. By

Proposition 4.5.1,

limh→0−

f(x+ h)− f(x)

h= sup

h<0

f(x+ h)− f(x)

h= D−f(x)

exists, and similarly

limh→0+

f(x+ h)− f(x)

h= sup

h>0

f(x+ h)− f(x)

h= D+(x)

exists and D−f(x) ≤ D+f(x).

The Secant Lemma also shows thatf(x+ h)− f(x)

h≤ f(x)− f(x− k)

k,

and therefore D+f(x) ≤ D−f(y). ■

5.6.6. COROLLARY. A convex function is differentiable except on a countableset.

PROOF. By Proposition 4.5.6, the monotone function D−f(x) is continuousexcept on a countable set. However if D−f(x) is continuous at x, then for ε > 0,there is a δ > 0 so that |y − x| < δ implies that D−f(y)−D−f(x)| < ε. Now if

Page 83: Rigorous Calculus

78 Differentiation

y > x, choose x < y < y′ < x+ δ and note that

D−f(x) ≤ D+f(x) ≤ D+f(y) ≤ D−f(y′) < D−f(x) + ε.

Since ε is arbitrarily small, we have that D+f(x) = D−f(x), and it is also con-tinuous at x. Hence f is differentiable at each x except for the countable set ofdiscontinuities of D−f . ■

In a certain sense, the following is just a repeated application of the defintionof convex function. However it has some surprising applications.

5.6.7. JENSEN’S INEQUALITY. Let f : (a, b) → R be convex. Suppose that

x, . . . , xn ∈ (a, b), t1, . . . , tn ≥ 0 andn∑i=1

ti = 1. Then

f( n∑

i=1

tixi)≤

n∑i=1

tif(xi).

If f is strictly convex, and ti > 0 for all i, then equality only holds if x1 = · · · = xn.

PROOF. Proceed by induction on n ≥ 2. The case n = 2 is the definition ofconvexity. Now assume that the result is true for n, and let x, . . . , xn+1 ∈ (a, b),

t1, . . . , tn+1 ≥ 0 andn+1∑i=1

ti = 1. Define

t =

n∑i=1

ti = 1 − tn+1 and y =

n∑i=1

titxi.

Then y ∈ (a, b) and∑n

i=1 ti = 1, so the n case yields

f(y) ≤n∑i=1

titf(xi).

Observe that 0 < t < 1 and

ty + tn+1xn+1 =

n+1∑i=1

tixi.

By convexity of f ,

f( n+1∑

i=1

tixi)= f(ty + tn+1xn+1) ≤ tf(y) + tn+1f(xn+1)

≤ tn∑i=1

titf(xi) + tn+1f(xn+1) =

n+1∑i=1

tif(xi).

Now if f is strictly convex, then the n = 2 case is a strict inequality if x1 = x2and 0 < t1 < 1. So if every ti > 0, then in the argument above, equality in the first

Page 84: Rigorous Calculus

5.6 Convexity and Jensen’s Inequality 79

line forces y = xn+1; and equality in the inequality for f(y), by the n case, forcesx1 = · · · = xn = y. ■

5.6.8. EXAMPLES.(1) Let f(x) = ex. Since f ′′(x) = ex > 0, this is a strictly convex function on R.

Suppose that ai > 0 and ti > 0 for 1 ≤ i ≤ n andn∑i=1

ti = 1. Let xi = ln ai and

apply Jensen’s inequality. It says

at11 a

t22 . . . a

tnn = e

∑ni=1 ti ln ai ≤

n∑i=1

tieln ai =

n∑i=1

tiai.

This is known as the generalized geometric mean–arithmetic mean inequality.The usual GM–AM inequality takes t1 = t2 = · · · = tn = 1

n . It says that

n√a1a2 . . . an ≤ a1 + a2 + · · ·+ an

n.

(2) Let 0 < s < t and ai > 0 for 1 ≤ i ≤ n. We will show that( 1n

n∑i=1

asi)1/s ≤

( 1n

n∑i=1

ati)1/t

.

Let f(x) = xt/s for x ≥ 0. Then f ′′(x) = ts(

ts − 1)x

ts−2 > 0 if x > 0. Hence f

is strictly convex. Let xi = asi and ti = 1n . Then by Jensen’s inequality,

f( 1n

n∑i=1

xi)≤ 1n

n∑i=1

f(xi);

which becomes ( 1n

n∑i=1

asi)t/s ≤ 1

n

n∑i=1

ati.

Take the tth root to obtain the inequality.

(3) For n ≥ 3, find the n-gon inscribed in a circle which has the greatest area.Choose points A1, . . . , An in order around the circumference of a circle of ra-

dius r. Then the chord AiAi+1 subtends an angle αi ∈ (0, π] and∑n

i=1 αi = 2π.(Here we mean A1 when we write An+1 and O is the centre of the circle.) Thetriangle OAiAi+1 is isosceles with base 2r sin αi

2 and height r cos αi2 . Thus it has

area r2 sin αi2 cos αi

2 = r2

2 sinαi. Therefore the area of the polygon is

A = A(α1, . . . , αn) =r2

2

n∑i=1

sinαi.

Consider f(x) = sinx on [0, π]. Since f ′′(x) = − sinx < 0 on (0, π), this is astrictly concave function. Equivalently, −f(x) is strictly convex. This reverses the

Page 85: Rigorous Calculus

80 Differentiation

FIGURE 5.11. Maximize area of the inscribed polygon

inequality in Jensen’s formula. This becomes

A =r2n

21n

n∑i=1

f(αi) ≤r2n

2f( 1n

n∑i=1

αi

)=r2n

2sin

2πn.

Moreover the inequality is strict unless α1 = · · · = αn = 2πn . The maximum area

occurs only at the regular n-gon which has area r2

2 n sin 2πn .

5.7. L’Hopital’s Rule

We begin with an intermediate value theorem for derivatives, which need notbe continuous.

5.7.1. DARBOUX’S THEOREM. Suppose that f, g are differentiable on [a, b]and f ′(a) < L < f ′(b). Then there is an x0 ∈ (a, b) so that f ′(x0) = L.

PROOF. Let g(x) = f(x) − Lx. This is differentiable on [a, b] with g′(x) =f ′(x) − L; so g′(a) < 0 < g′(b). Since g is differentiable, it is continuous. Bythe Extreme Value Theorem, g attains its minimum value at some point x0 ∈ [a, b].Near x = a,

0 > g′(a) = limh→0+

g(a+ h)− g(a)

h.

Therefore there is a δ > 0 so that for 0 < h < δ, g(a + h) < g(a), and thusthe minimum does not occur at a. Similarly the minimum cannot occur at b. ByFermat’s Theorem, since g is differentiable, g′(x0) = 0. Therefore f ′(x0) = L. ■

5.7.2. CAUCHY’S MVT. Suppose that f, g are continuous on [a, b] and differ-entiable on (a, b). If g′(x) = 0 on (a, b), then there is a x0 ∈ (a, b) so that

f(b)− f(a)

g(b)− g(a)=f ′(x0)

g′(x0).

Page 86: Rigorous Calculus

5.7 L’Hopital’s Rule 81

PROOF. Let h(x) =(f(b)− f(a)

)g(x)− f(x)

(g(b)− g(a)

). Then

h(a) = f(b)g(a)− f(a)g(b) = h(b).

By Rolle’s Theorem, there is an x0 ∈ (a, b) so that

0 = h′(x0) =(f(b)− f(a)

)g′(x0)− f ′(x0)

(g(b)− g(a)

).

Since g′(x) = 0, sign(g′(x)) is contant by Darboux’s Theorem. Thus g(x) isstrictly monotone, and thus g(b)− g(a) = 0. Now divide by g′(x0))

(g(b)− g(a)

)to get the result. ■

The main result of this section is very popular with student’s, but in practice,there are usually superior methods. The author warns the reader to never ever

apply L’Hopital’s rule to limx→0

sinxx

= 1. This limit must be known before one can

differentiate sinx, and thus the argument is circular! The hypotheses of this resultare crucial, and need to be verified in any application.

5.7.3. L’HOPITAL’S RULE. Suppose that f, g are differentiable on an openinterval J with c as an endpoint (±∞ are allowed). Suppose that

(1) g(x) = 0 and g′(x) = 0 for x ∈ J .

(2) limx→c

f(x) = limx→c

g(x) = 0 or limx→c

|f(x)| = limx→c

|g(x)| = +∞.

(3) limx→c

f ′(x)

g′(x)= L exists.

Then limx→c

f(x)

g(x)= L.

PROOF. Case 1. Suppose that c ∈ R and limx→c

f(x) = limx→c

g(x) = 0. (This

may be a one-sided limit, in which case x → c+ or x → c− as appropriate.) Since

limx→c

f ′(x)

g′(x)= L, given ε > 0, there is a δ > 0 so that 0 < |x− c| < δ implies that

L− ε <f ′(x)

g′(x)< L+ ε.

If we define f(c) = g(c) = 0, then both f and g are continuous at c. Then wecan apply the Cauchy Mean Value Theorem on the interval [c, x] (or [x, c]) with|x− c| < δ to get

f(x)

g(x)=f(x)− f(c)

g(x)− g(c)=f ′(x0)

g′(x0)∈ (L− ε, L+ ε).

Since ε > 0 was arbitrary, limx→c

f(x)

g(x)= L.

Page 87: Rigorous Calculus

82 Differentiation

Case 2. Suppose that c ∈ R and limx→c

|f(x)| = limx→c

|g(x)| = +∞. For ε > 0,

find δ > 0 for limx→c

f ′(x)

g′(x)= L as before. Consider points c < x < y < c + δ (or

c − δ < y < x < c) and apply the Cauchy Mean Value Theorem to [x, y] to get apoint x0 ∈ (x, y) so that

f ′(x0)

g′(x0)=f(x)− f(y)

g(x)− g(y)=

f(x)g(x) −

f(y)g(x)

1 − g(y)g(x)

∈ (L− ε, L+ ε).

Let x → c while holding y fixed. Thenf(y)

g(x)→ 0 and

g(y)

g(x)→ 0. Therefore

for x sufficiently close to c, this quantity will be within ε off(x)

g(x), and hence∣∣∣f(x)

g(x)− L

∣∣∣ < 2ε. This means that limx→c

f(x)

g(x)= L.

Case 3. When c = ±∞, make the substitution u = 1x . Make J smaller to

exclude 0 if necessary. Set F (u) = f( 1u) and G(u) = g( 1

u) and I = { 1x

: x ∈ J}.Then

(1) G(u) = g( 1u) = 0 and G′(u) = g′( 1

u)(−1u2

)= 0 for u ∈ I .

(2) limu→0

F (u) = limx→c |f(x)| ∈ {0,∞} and similarly

limu→0

G(u) = limx→c |g(x)| ∈ {0,∞}.

(3) limu→0

F ′(u)

G′(u)= lim

u→0

f ′( 1u)(−1u2

)g′( 1

u)(−1u2

) = limx→c

f ′(x)

g′(x)= L.

So the hypotheses for either Case 1 or 2 is satisfied. Therefore

L = limu→0

F (u)

G(u)= lim

x→c

f(x)

g(x). ■

5.7.4. EXAMPLES.

(1) Let a > 0. Find limx→a+

√x+

√x− a−

√a√

x2 − a2.

Both numerator f(x) and denominator g(x) tend to 0 as x → a+, and g(x) =√x2 − a2 = 0. Compute g′(x) =

2x2√x2 − a2

= 0 as well. So we compute

limx→a+

f ′(x)

g′(x)= lim

x→a+

12√x+ 1

2√x−a

2x2√x2−a2

= limx→a+

(√x− a+

√x)

2√x√x− a

√x2 − a2

x

= limx→a+

√x2 − a2 +

√x(x+ a)

2x√x

=

√2a

2a√a=

1√2a.

Page 88: Rigorous Calculus

5.7 Exercises for Chapter 5 83

Thus limx→a+

√x+

√x− a−

√a√

x2 − a2= 1√

2a.

(2) Compute limx→0

( tanxx

)1/x2

. This function is even, so consider x > 0. Then take

the logarithmln tanx− lnx

x2 =:f(x)

g(x). We now have a “0/0” situation because

limx→0

lntanxx

= ln 1 = 0; and the denominator x2 and its derivative 2x never vanish

on (0, 1). Compute

limx→0+

f ′(x)

g′(x)= lim

x→0+

sec2 xtanx − 1

x

2x= lim

x→0+

x− sinx cosx2x2 sinx cosx

= limx→0+

2x− sin 2x2x2 sin 2x

= limx→0+

2x− sin 2x4x3

2xsin 2x

Now limx→0+

2xsin 2x

= 1 is a known limit. Consider limx→0+

2x−sin 2x4x3 . This is another

“0/0” situation, and we have (setting y = 2x)

limx→0+

2 − 2 cos 2x12x2 = lim

y→0+

2 − 2 cos y3y2 =

13.

by Example 3.1.6. Therefore L’Hopital’s rule applies, and limx→0+

2x− sin 2x4x3 = 1

3 .

Thus limx→0+

f ′(x)

g′(x)= 1

3 . So by L’Hopital’s rule again, limx→0+

f(x)

g(x)= 1

3 . Finally we

have

limx→0

( tanxx

)1/x2

= e1/3.

(3) Consider limx→∞

x− sinxx+ sinx

. Here we have an “∞/∞” situation, and the denomi-

nator is never 0 for x > 1. However, limx→∞

1 − cosx1 + cosx

does not exist, in part because

the denominator vanishes periodically. So L’Hopital’s rule does not apply. In fact,

limx→∞

x− sinxx+ sinx

= limx→∞

1 − sinxx

1 + sinxx

=11= 1.

Exercises for Chapter 5

1. Let f(x) = e−1/x2for x = 0 and f(0) = 0.

(a) Compute the derivative of f at x = 0 from the definition.(b) Compute the derivative of f for x = 0. Is f ′(x) continuous?

Page 89: Rigorous Calculus

84 Differentiation

2. (a) Simplify the expression f(x) = sec(tan−1(sin(tan−1 x))) for x ∈ R.(b) Graph f(x) including identification of inflection points.

HINT: work with the simplified formula.

3. Two corridors meet at right angles. They have widths a meters and b meters,respectively. Find the length of the longest ladder that can be moved aroundthe corner (while keeping the ladder horizontal). HINT: use an angle as yourvariable.

4. (a) A movie screen 10m high is mounted on the wall 5m above the floor. Atwhat distance from the screen does a point on the floor subtend the greatestangle and find the angle.HINT: consider A = (0, 5), B = (0, 15) and X = (x, 0) for x ≥ 0. Findan expression for the angle ∠AXB as a function of x.

(b) Bonus. Show that the solution to (a) may be obtained by finding the circlethrough A and B which is tangent to the x-axis. Explain why this is thesolution using Euclidean geometry.

5. (a) Find the trapezoid that fits inside a semicircle of radius r with the two sidesparallel to the diameter which has the largest area.

(b) Bonus. Find the quadrilateral of largest area that fits inside a semicircle ofradius r.

6. (a) Sketch csc(x) for −π/2 ≤ x ≤ π/2, x = 0 and its inverse functioncsc−1(x).

(b) Compute the derivative of csc−1(x). Warning: be careful with signs.

7. (a) Show that tanx > x+x3

3+

2x5

15for 0 < x <

π

2.

(b) Show that tanx < x+x3

3+

2x5

5for 0 < x < 1. HINT: cosx > 1− x2

2on

(0, π2 ).

8. (a) Let f(x) = x2 sin(1/x) for x = 0 and f(0) = 0. Show that 0 is a criticalpoint of f that is not a local maximum nor a local minimum nor an inflectionpoint.

(b) Let f : (a, b) → R be a continuous function which is differentiable exceptpossibly at x0, and lim

x→x0f ′(x) = L exists. Prove that f ′(x0) = L. HINT:

use MVT on [x0, x0 + h].(c) Let g(x) = 2x2 + f(x). Show that g does have a global minimum at 0, but

g′(x) changes sign infinitely often on both (0, ε) and (−ε, 0) for any ε > 0.

9. Let f : (a, b) → R be a continuous function which is differentiable exceptpossibly at x0, and lim

x→x0f ′(x) = L exists. Prove that f ′(x0) = L. HINT: use

MVT on [x0, x0 + h].

Page 90: Rigorous Calculus

5.7 Exercises for Chapter 5 85

10. Let f be a differentiable function on [a, b], but the derivative may be discontin-uous.(a) Suppose that f ′(a) < 0 < f ′(b). Show that the minimum of f does not

occur at an endpoint. What can you conclude?(b) Suppose that f ′(a) < L < f ′(b). Prove that there is an c ∈ (a, b) such that

f ′(c) = L.(c) Prove that if f ′(x) is monotone, then f ′(x) is continuous.

11. Graph f(x) = exp(x2−3x2−x

). You can use a computer program to find the ap-

proximate roots of the degree 5 polynomial that occurs in the second derivative.

12. Let n ≥ 1, and let f and g have nth order derivatives on (a, b).

Show that (fg)(n)(x) =n∑

k=0

(nk

)f (n−k)(x)g(k)(x).

13. (a) Show that ln(1 + ex) is convex. Sketch it.

(b) Show that for any ai > 0,(

1 + n√a1a2 . . . an

)n≤∏n

i=1(1 + ai).(c) Suppose that 0 < a = x0 < x1 < · · · < xn = b. Show that the maximum

ofx0x1 . . . xn

(x0 + x1)(x1 + x2) . . . (xn−1 + xn)occurs when

xi+1

xiare all equal for

0 ≤ i < n.

14. (a) Suppose that f ′′(a) exists. Show that

limh→0

f(a+ h) + f(a− h)− 2f(a)h2 = f ′′(a).

(b) Show by example that this limit may exist even when f ′′(a) does not.

15. Compute the following limits.

(a) limx→0+

(1 + x)1/x − e

x

(b) limx→0

cot2 x− 1x2

16. Let f(x) = e−2x(cosx+ 2 sinx) and g(x) = e−x(cosx+ sinx).Find all of the errors (if any) in the following L’Hopital’s rule argument:

limx→∞

f(x)

g(x)= lim

x→∞

f ′(x)

g′(x)= lim

x→∞

52e−x = 0.

17. Suppose that f is differentiable on [a, b], f(a) = 0 and |f ′(x)| ≤ A|f(x)| forall x ∈ [a, b], where A is a positive real number. Prove that f is constant.

Page 91: Rigorous Calculus

CHAPTER 6

Approximations

6.1. Taylor Polynomials

In this section, we examine whether we can use higher derivatives to get a betterapproximation to a function along the lines of the tangent line.

6.1.1. DEFINITION. If f(x) has n derivatives at a, the Taylor polynomial ofdegree n for f at a is

Pn,a(x) = f(a) + f ′(a)(x− a) +f ′′(a)

2(x− a)2 + · · ·+ f (n)(a)

n!(x− a)n

=

n∑k=0

f (k)(a)

k!(x− a)k.

First we see that this polynomial has the same derivatives at a as f does up tothe nth order.

6.1.2. LEMMA. P (k)n,a(a) = f (k)(a) for 0 ≤ k ≤ n.

PROOF. It is straightforward to check that

dk

dxk(x− a)l =

0 if l < k

k! if l = k

l(l − 1) · · · (l + 1 − k)(x− a)l−k if l > k

whence

dk

dxk(x− a)l

∣∣∣x=a

=

0 if l < k

k! if l = k

0 if l > k.

.

Therefore P (k)n,a(a) =

f (k)(a)

k!k! = f (k)(a). ■

In order to decide if the Taylor polynomial is a good approximation to f(x)other than just as one approaches a, we introduce the error function Rn,a(x) :=f(x)− Pn,a(x). Taylor’s Theorem is a higher order Mean Value Theorem.

86

Page 92: Rigorous Calculus

6.1 Taylor Polynomials 87

6.1.3. TAYLOR’S THEOREM. Suppose that f(x) has n + 1 derivatives on[a, b]. Then there is an x0 ∈ (a, b) so that

Rn,a(b) = f(b)− Pn,a(b) =f (n+1)(x0)(b− a)n+1

(n+ 1)!.

PROOF. For each t ∈ [a, b], let Pn,t(x) =∑n

k=0f (k)(t)

k! (x − t)k be the Taylorpolynomial for f about t. Set

R(t) = f(b)− Pn,t(b) = f(b)−n∑

k=0

f (k)(t)

k!(b− t)k.

Note that R(a) = Rn,a(b) and R(b) = 0. The following computation involves atelescoping sum.

R′(t) = −n∑

k=0

f (k+1)(t)

k!(b− t)k − f (k)(t)

k!k(b− t)k−1

= −n∑

k=0

f (k+1)(t)

k!(b− t)k +

n∑k=1

f (k)(t)

(k − 1)!(b− t)k−1

= −n+1∑k=1

f (k)(t)

(k − 1)!(b− t)k−1 +

n∑k=1

f (k)(t)

(k − 1)!(b− t)k−1

= −f(n+1)(t)

n!(b− t)n

Now let G(t) = R(t) −( b− t

b− a

)n+1R(a). Then G(a) = R(a) − R(a) = 0

and G(b) = R(b)− 0 = 0. By Rolle’s Theorem, there is an x0 ∈ (a, b) so that

0 = G′(x0) = −f(n+1)(x0)(b− x0)

n

n!− (n+ 1)(b− x0)

n

(b− a)n+1 R(a).

Solve for R(a):

Rn,a(b) = Rn(a) =f (n+1)(x0)(b− a)n+1

(n+ 1)!. ■

6.1.4. COROLLARY. If f ∈ Cn+1[a, b], then |Rn,a(x)| ≤ C|x− a|n+1. And ifq(x) is a polynomial of degree at most n so that |f(x)− q(x)| ≤ C ′|x− a|n+1 forsome contant C ′, then q(x) = Pn,a(x).

PROOF. By hypothesis, f (n+1) is continuous on [a, b]. By the Extreme ValueTheorem, max

a≤x≤b|f(x)| = M < ∞. By Taylor’s Theorem (with x in place of b),

C = M(n+1)! works.

Page 93: Rigorous Calculus

88 Approximations

Now if q(x) is another polynomial of degree at most n satisfying a similarinequality, then

|q(x)− Pn,a(x)| ≤ |q(x)− f(x)|+ |f(x)− Pn,a(x)| ≤ (C ′ + c)|x− a|n+1.

Write q(x) = b0 + b1(x − a) + b2(x − a)2 + · · · + bn(x − a)n. Let k be thesmallest integer at which the coefficients differ from the coefficients of Pn,a(x),namely ck := 1

k!f(k)(a).. Then q(x)−Pn,a(x) = (x− a)k(bk − ck) + · · ·+ (bn −

cn)(x− a)n−k). Thus

limx→a

|q(x)− Pn,a(x)||x− a|n+1 = lim

x→a

|(bk − ck) + · · ·+ (bn − cn)(x− a)n−k||x− a|n+1−k

= +∞.

This contradicts the estimate above. Hence q(x) = Pn,a(x). ■

6.1.5. EXAMPLES.(1) Let f(x) = ex and a = 0. Then f (n)(x) = ex for all n ≥ 1. So f (n)(0) = 1.

Therefore Pn,0(x) =n∑

k=0

xk

k!. For any x ∈ R, Taylor’s Theorem provides an x0

between 0 and x so that∣∣∣ex − n∑k=0

xk

k!

∣∣∣ = |f (n+1)(x0)||x|n+1

(n+ 1)!≤ max{ex, 1}|x|n+1

(n+ 1)!.

Now limn→∞

|x|n+1

(n+ 1)!= 0 because for |x| ≤ N and n = 2N + k,

|x|n

n!=

|x|2N

(2N)!|x|

2N + 1. . .

|x|2N + k

<|x|2N

(2N)!12k

→ 0.

Therefore this sum converges to ex as n→ ∞, so that

ex =∞∑k=0

xn

n!and e =

∞∑k=0

1n!.

If we take n = 13, we get∣∣e −∑13

k=01n!

∣∣ < e14! < 4 · 10−11, which yields 10

decimals accuracy.This is a poor way to find ex if x is large, and even for x = 1. Notice that

∣∣e1/16 −10∑k=0

1n!24k

∣∣ < e1/16

11!244 = ε < 1.6 · 10−21.

Set a =∑10

k=01

n!24k . Then e1/16 − ε < a < e1/16, so that

e > a16 =(((a2)2)2)2

> (e1/16 − ε)16 > e− 16e15/16ε > e− 7 · 10−20.

This has the same number of computations, but has 19 decimals of accuracy.

Page 94: Rigorous Calculus

6.1 Taylor Polynomials 89

(2) Let f(x) = sinx, g(x) = cosx and a = 0. The derivatives are periodic:

f ′(x) = cosx and g′(x) = − sinx

f (2)(x) = − sinx and g(2)(x) = − cosx

f (2n)(x) = (−1)n sinx and g(2n)(x) = (−1)n cosx

f (2n+1)(x) = (−1)n cosx and g(2n+1)(x) = (−1)n+1 sinx.

So f (2n)(0) = 0 = g(2n+1)(0); and f (2n+1)(0) = (−1)n = g(2n)(0). The Taylorpolynomials for sinx are P2n−1,0(x) = P2n,0(x) where

P2n,0(x) = x− x3

3!+x5

5!− · · ·+ (−1)n−1 x2n−1

(2n− 1)!=

n−1∑k=0

(−1)kx2k+1

(2k + 1)!.

Likewise the Taylor polynomials for cosx are Q2n,0(x) = Q2n+1,0(x) where

Q2n+1,0(x) = 1 − x2

2!+x4

4!− · · ·+ (−1)n

x2n

(2n)!=

n∑k=0

(−1)kx2k

(2k)!.

By Taylor’s Theorem, the remainders have the form

R2n,0(x) = sinx− P2n,0(x) =f (2n+1)(x0)x

2n+1

(2n+ 1)!= (−1)n+1 cosx0

x2n+1

(2n+ 1)!.

Therefore |R2n,0(x)| ≤ |x|2n+1

(2n+1)! . As in the previous example, we know that thisconverges to 0. Therefore

sinx = x− x3

3!+x5

5!− x7

7!+ · · · =

∞∑k=0

(−1)kx2k+1

(2k + 1)!.

Convergence is slow for a while if x is large, so you should always use trig identitiesto manipulate things so that x is small. For example, if x is 1◦, which is π

180 inradians, then

sin π180 ≈ π

180 − π3

6(180)3 +π5

120(180)5 − π7

7!(180)7 +π9

9!(180)9

with an error of at most π11

11!(180)11 < 5 · 10−22. The real issue will be computingpowers of π.

Similarly, the error for cosx is given as

cosx−Q2n+1,0 =f (2n+2)(x0)x

2n+2

(2n+ 2)!= (−1)n+1 cosx0

x2n+2

(2n+ 2)!.

Thus | cosx−Q2n+1,0| ≤ |x|2n+2

(2n+2)! . Again we conclude that the error tends to 0 forany value of x. Thus

cosx = 1 − x2

2!+x4

4!− x6

6!+x8

8!− · · · =

∞∑k=0

(−1)kx2k

(2k)!.

Page 95: Rigorous Calculus

90 Approximations

If we take f(x) = sinx and a = π/6, then

P3,π/6(x) =12+

√3

2(x− π

6)− 1

212(x− π

6)2 −

√3

216(x− π

6)3.

This is a better starting point if you want to estimate sin(31◦) = π6 + π

180 .

(3) These examples give a false impression about how well Taylor polynomialswork. Consider f(x) = tan−1(x). The derivatives get progressively more com-plicated. But there is a way around the problem. Note that f is an odd function,and

f ′(x) =1

1 + x2 = 1 − x2 + x4 − x6 + x8 − · · · =∞∑k=0

(−1)kx2k.

This is a geometric series, and it converges when |x| < 1, and diverges if |x| ≥ 1.Let

Q(x) =n∑

k=0

(−1)kx2k =1 − (−1)n+1x2n+2

1 − (−x2)=

1 + (−1)nx2n+2

1 + x2 .

Therefore

|f ′(x)−Q(x)| = |x|2n+2

1 + x2 ≤ |x|2n+2.

By Corollary 6.1.4, Q(x) is the 2n+ 1st Taylor polynomial of f ′(x) at a = 0.It follows immediately from the definition that if Pn,a(x) is the nth Taylor

polynomial for f(x), then P ′n,a(x) is the Taylor polynomial for f ′(x) of degree

n− 1. Since f(x) = 0, it follows that

P2n+2,0 =

n∑k=0

(−1)k

2k + 1x2k+1 = x− 1

3x3 +

15x5 − · · ·+ (−1)n

2n+ 1x2n+1.

Rather than applying Taylor’s estimate for the error, which we can’t do easily sincewe don’t know the higher derivatives, we instead apply the Mean Value Theoremto f(x)− P2n+2,0(x) on [0, x]. There is an x0 ∈ (0, x) so that

|f(x)− P2n+2,0(x)||x|

=|(f(x)− P2n+2,0(x))− (f(0)− P2n+2,0(0))|

|x|= |f ′(x0)− P ′

2n+2,0(x0)| = |f ′(x0)−Q(x0)| < |x|2n+2.

For example, tan−1( 110) =

110 −

13000 +

1500000 −

170000000 is within 1

9·109 . In yourhomework, you will be asked to verify that

π

4= 4 tan−1 1

5− tan−1 1

239.

Using this, you can get a formula for π.The function tan−1(x) is defined on the whole real line, but the Taylor polyno-

mials only approximate f(x) when |x| < 1. This is fairly typical behaviour.

Page 96: Rigorous Calculus

6.1 Taylor Polynomials 91

6.1.6. DEFINITION Big O and little o Notation.. We say that f(x) is O(g)as x → a if there is a constant C so that |f(x)| ≤ C|g(x)| for x ∈ (a − δ, a + δ).

We write f(x) = O(g(x)). Also f(x) is o(g) as x→ a if limx→a

f(x)

g(x)= 0. We write

f(x) = o(g(x)).

Normally g(x) will go to 0 as x → a, and f = O(g) means that it goes to 0 atthe same rate or faster. And f = o(g) means that f goes to 0 faster than g.

It is not hard to show that if fi = O(gi) near x = a, then f1f2 = O(g1g2) andf1 + f2 = O(max{g1, g2}. Sometimes division is possible if the denominator is

closely related to the numerator. For exampleO((x− a)n)

(x− a)k= O((x− a)n−k).

The Corollary 6.1.4 says that f(x) = Pn,a(x) +O((x− a)n+1).

6.1.7. EXAMPLES.(1) Let f(x) = tanx and a = 0. This is another function whose derivatives getcomplicated quickly. Note that f is odd, so that f (2n)(0) = 0 for n ≥ 1.

f ′(x) = sec2 x and f ′(0) = 1

f ′′(x) = 2 sec2 x tanx and f ′′(0) = 0

f (3)(x) = 4 sec2 x tan2 x+ 2 sec4 x and f (3)(0) = 2

f (4)(x) = 8 sec2 x tan3 x+ 16 sec4 x tanx and f (4)(0) = 0

f (5)(x) = 16 sec2 x tan4 x+ 88 sec4 x tan2 x+ 16 sec6 x and f (5)(0) = 16.

Therefore P5,0(x) = P6,0(x) = x+ 13x

3 − 215x

5.We will illustrate how to use big O arithmetic to find the Taylor polynomials.

tanx =sinxcosx

=x− 1

3x3 + 1

120x5 − 1

7!x7 +O(x9)

1 − 12x

2 + 124x

4 − 1720x

6 +O(x8)

In order to invert the expression for cosx, we find the polynomial which multipliesit to get 1 +O(x8), which can be done by long division. We get(1− 1

2x2+ 1

24x4− 1

720x6+O(x8)

)(1+ 1

2x2+ 5

24x4+ 61

720x6+O(x8)

)= 1+O(x8).

Therefore

tanx =(x− 1

3x3 + 1

120x5 − 1

7!x7 +O(x9)

)(1 + 1

2x2 + 5

24x4 + 61

720x6 +O(x8)

)= x+ 1

3x3 + 2

15x5 + 17

315x7 +O(x9).

Note that when doing the multiplication, we only need to keep track of terms oforder at most 7 (since 8 never occurs). By Corollary 6.1.4, we have

P8,0(x) = x+13x3 +

215x5 +

17315

x7.

Page 97: Rigorous Calculus

92 Approximations

(2) Compute limx→0

cot2 x− 1x2 .

limx→0

cot2 x− 1x2 = lim

x→0

1tan2 x

− 1x2

= limx→0

1(x+ 1

3x3 +O(x5))2

− 1x2

= limx→0

1x2 + 2

3x4 +O(x6)

− 1x2

= limx→0

1 − (1 + 23x

2 +O(x4))

x2(1 + 23x

2 +O(x4))

= limx→0

−23 +O(x2)

1 + 23x

2 +O(x4)= −2

3.

(3) Compute limx→0

(1 + x)1/x − e

x.

First (1 + x)1/x = eln(1+x)

x . We know that eu = 1 + u+ 12u

2 +O(u3). We find theTaylor polynomial for f(x) = ln(1+x) at a = 0. Then f(0) = 0, f ′(x) = 1

1+x andf ′(0) = 1 and f ′′(x) = −1

(1+x)2 and f ′′(0) = −1. So ln(1+x) = x− 12x

2 +O(x3),

and thusln(1 + x)

x= 1 − 1

2x+O(x2). Therefore

(1 + x)1/x = eln(1+x)

x = ee−12x+O(x2)

= e(1 + (−1

2x+O(x2)) +O

((−1

2x+O(x2))2))

= e(1 − 12x+O(x2)) = e− e

2x+O(x2).

Consequently.

limx→0

(1 + x)1/x − e

x= lim

x→0

(e− e2x+O(x2))− e

x= lim

x→0−e

2+O(x) = −e

2.

These two limits are easier with Taylor polynomials than by L’Hopital’s rule.

(4) We finish off with a strange example, f(x) =

{e−1/x2

if x = 00 if x = 0

.

Observe that for n ≥ 1, with the substitution u = −1/x2,

limx→0

f(x)

x2n = limx→0

e−1/x2

x2n = limu→−∞

euun = 0.

Therefore f(x) = o(x2n) as x → 0. By Corollary 6.1.4, P2n−1,0(x) = 0. Inparticular, we have that f (n)(0) = 0 for all n ≥ 0. The function f(x) is extremelyflat at x = 0.

Page 98: Rigorous Calculus

6.2 Uniform Continuity 93

Notice that the Taylor series∞∑n=0

f (n)(0)n! xn =

∞∑n=0

0xn = 0 converges quickly

for all x ∈ R. However the sum equals f(x) only at the point x = 0.

6.2. Uniform Continuity

Recall that f : J → R is continuous if for each x ∈ J and ε > 0, there is aδ > 0 so that |x − y| < δ implies |f(y) − f(x)| < ε. However in checking thisin many examples, we find a δ which works for many, sometimes all, values of xsimultaneously. This is an important distinction which is captured in this definition.Note that the quantifiers for x and δ come in the reversed order.

6.2.1. DEFINITION. A function f : J → R is uniformly continuous if for eachε > 0, there is a δ > 0 so that x, y ∈ J and |x− y| < δ implies |f(y)− f(x)| < ε.

6.2.2. EXAMPLES.(1) Let f ∈ C1[a, b] and set M = maxa≤x≤b |f ′(x)|, which is finite by the Ex-treme Value Theorem. Then by the Mean Value Theorem, there is some x0 so that|f(y)− f(x)|

|y − x|= |f ′(x0)| ≤ M . Hence |f(y)− f(x)| ≤ M |y − x|. So f is Lips-

chitz with constant M . As in Example 4.1.3(3), Lipschitz functions are uniformlycontinuous using δ = ε/M . Thus C1 functions on a closed bounded interval areuniformly continuous.

(2) Let f(x) =1x

on (0, 1]. Then f is continuous. However if 0 < ε < 1 is given

and δ > 0, choose x = min{ε, δ}. Then |x− x10 | < |x| ≤ δ, but

|f( x10

)− f(x)| = 9x> 9 > ε.

So this (arbitrary) choice of δ does not work! Therefore this function is not uni-formly continuous. This is a C1 function, but the derivative blows up as x→ 0.

(3) f(x) = x2. On any bounded interval, the derivative is bounded. So by (1), f isuniformly continuous. However on the whole line R, it is not uniformly continuous.For 0 < x < y,

f(y)− f(x) = y2 − x2 = (y + x)(y − x).

Let 0 < ε < 1, and suppose that δ > 0. Take x = 1δ and y = x + δ/2. Then

|y − x| = δ/2 < δ, but

|f(y)− f(x)| = (y + x)(y − x) > 2xδ

2= 1 > ε.

Page 99: Rigorous Calculus

94 Approximations

Hence this (arbitrary) choice of δ does not work! So this function is not uniformlycontinuous. In this example, the derivative f ′(x) = 2x blows up as x→ ∞.

(4) f(x) =√x on [0,∞). This function is continuous, but it fails to be differen-

tiable at x = 0. Moreover f ′(x) =1

2√x

is unbounded as x → 0+. Nevertheless,

we will show that f is uniformly continuous.Let ε > 0. Define δ = ε2/2. Suppose that 0 ≤ x ≤ y and |y − x| < δ. There

are two cases. If x ≤ δ, then y < 2δ = ε2. Then |√y −√x| ≤ √

y < ε.In the second case, δ < x. Then

|√y −√x| = y − x

√y +

√x<

δ

2√δ=

√δ

2=

ε

2√

2< ε.

There is an important general result which implies uniform continuity.

6.2.3. THEOREM. Suppose that f : [a, b] → R is continuous on a closed,bounded interval. Then f is uniformly continuous.

PROOF. If f is not uniformly continuous, then the definition fails for someε0 > 0. That means that no value of δ makes the definition work. So we takeδn = 1

n . Since the definition fails, there are xn, yn ∈ [a, b] so that |xn − yn| < 1n

and |f(xn) − f(yn)| ≥ ε0. By the Bolzano-Weierstrass Theorem, the sequence(xn) has a convergent subsequence (xnk

)k≥1, say c = limk→∞

xnk. Therefore

limk→∞

ynk= lim

k→∞(ynk

− xnk) + xnk

= 0 + c = c.

By continuity of f(x) at c, we have

f(c) = limk→∞

f(xnk) = lim

k→∞f(ynk

).

Hence0 = lim

k→∞|f(xnk

)− f(ynk)| ≥ ε0.

This is a contradiction. Thus f must be uniformly continuous. ■

6.3. Uniform limits

In the section on Taylor polynomials, we were sometimes able to show thatthe infinite series of functions converges to our function. Exactly how this happenscan be delicate, and like the previous section, it is better when the estimates forconvergence are uniform over the domain. Note that in the ε–N version, there is aninterchange of when x and δ are determined.

Page 100: Rigorous Calculus

6.3 Uniform limits 95

6.3.1. DEFINITION. Suppose that f, fn : [a, b] → R are functions.We say that fn converges pointwise to a function f(x) if for each x ∈ [a, b],

limn→∞

fn(x) = f(x). This means that for any ε > 0 and x ∈ [a, b], there is an N so

that if n ≥ N , then |fn(x)− f(x)| < ε.We say that fn converges uniformly to a function f(x) if

limn→∞

supa≤x≤b

|fn(x)− f(x)| = 0.

This means that for any ε > 0, there is an N so that if x ∈ [a, b] and n ≥ N , then|fn(x)− f(x)| < ε.

6.3.2. EXAMPLES.(1) Let fn(x) = xn for x ∈ [0, 1]. Then

limn→∞

xn = f(x) =

{0 if 0 ≤ x < 11 if x = 1.

So fn converge pointwise to f . The convergence is not uniform because

sup0≤x≤1

|fn(x)− f(x)| = sup0≤x<1

xn = 1

for every n ≥ 1, Notice that each fn(x) is continuous, but the limit function is not.

(2) Let fn(x) = 1n sinnx on [0, 2π]. Then max

0≤x≤2π|fn(x)| =

1n

. Therefore

fn(x) → 0 uniformly on [0, 2π].

(3) Let fn(x) =

n2x if 0 ≤ x ≤ 1

n

n2( 2n − x) if 1

n ≤ x ≤ 2n

0 if 2n ≤ x ≤ 1.

for n ≥ 2. Then fn are continuous,

and

limn→∞

fn(x) =

{0 if x = 00 if x > 0 because x > 2

n for large n.

Therefore fn converges pointwise to f(x) = 0. This limit is continuous. Neverthe-less,

sup0≤x≤1

|fn(x)− f(x)| = sup0≤x≤1

fn(x) = fn(1n) = n.

Therefore the convergence is not uniform.

The advantage of uniform convergence is explained in the next result.

6.3.3. THEOREM. Suppose that fn : [a, b] → R are continuous functions whichconverge uniformly to a function f(x). Then f is continuous.

Page 101: Rigorous Calculus

96 Approximations

PROOF. Let ε > 0. Pick N so that supa≤x≤b

|fn(x) − f(x)| ≤ ε

3. Since fn

is continuous on a closed bounded interval, it is uniformly continuous by Theo-rem 6.2.3. Thus there is some δ > 0 so that x, y ∈ [a, b] and |x − y| < δ implies|fn(x)− fn(y)| < ε

3 . Compute

|f(x)− f(y)| = |f(x)− fn(x) + fn(x)− fn(y) + fn(y)− f(y)|≤ |f(x)− fn(x)|+ |fn(x)− fn(y)|+ |fn(y)− f(y)|

3+ε

3+ε

3= ε.

Therefore f(x) is (uniformly) continuous. ■

6.3.4. EXAMPLES.(1) We shows in Example 6.1.5(1) that ex =

∞∑k=0

1k!x

k. This is pointwise conver-

gence of the partial sums fn(x) =n∑

k=0

1k!x

k. But in fact, we showed more. We had

estimates from Taylor’s Theorem

|ex − fn(x)| =∣∣∣ex − n∑

k=0

1k!xk∣∣∣ ≤ e|x|n+1

(n+ 1)!.

The convergence is not uniform because the terms 1k!x

k are not uniformly small ifx can be arbitrarily large. However if we restrict the domain to [−R,R] for any R,we obtain

sup|x|≤R

|ex − fn(x)| ≤eRn+1

(n+ 1)!→ 0

as n → ∞. Thus the convergence is uniform on [−R,R]. Frequently this is thebest one can do on an unbounded domain.

(2) Let fn(x) =∑n

k=0(−1)kx2k for x ∈ (−1, 1). We can sum this geometricseries as in Example 6.1.5(3) to get

fn(x) =1 + (−1)nx2n+2

1 + x2 .

This converges pointwise to f(x) =1

1 + x2 .

sup−1<x<1

|fn(x)− f(x)| = sup−1<x<1

|x|2n+2

1 + x2 =12.

This does not go to 0, so the convergence is not uniform. However, if 0 < r < 1,and we restrict our domain to [−r, r], then

sup−r≤x≤r

|fn(x)− f(x)| = sup−r≤x≤r

|x|2n+2

1 + x2 ≤ r2n+2 → 0.

Page 102: Rigorous Calculus

6.4 Newton’s Method 97

Therefore the convergence is uniform on [−r, r]. Frequently this is the best one cando on an open domain.

6.4. Newton’s Method

This is an algorithm for approximately solving as equation f(x) = 0 usingcalculus. The idea is to repeatedly solve for where the tangent line crosses the axis.The power of the method lies in its rapid convergence.

FIGURE 6.1. Newton’s method

Given an approximation xn, find the tangent line

Tx = f(xn) + f ′(xn)(x− xn).

Solve for the root of T , x = xn − f(xn)

f ′(xn).

Hypotheses

• f(x) has a zero x∗, i.e., f(x∗) = 0.

• f is C2 near x∗.

• f ′(x∗) = 0.

Algorithm.

• Start with an initial guess x0 “sufficiently close” to x∗ with f(x0) ‘small’.

• Given xn, set xn+1 = xn − f(xn)

f ′(xn).

Page 103: Rigorous Calculus

98 Approximations

Error estimates.Fix a small interval [a, b] containing x∗ and x0 with x∗ near the middle on whichf ′(x) = 0. This is necessary to ensure that the iterates stay in the interval. Let

m = mina≤x≤b

|f ′(x)| and C = maxa≤x≤b

|f ′′(x)|.

• |xn − x∗| ≤|f(xn)|m

. This follows from the MVT

f(xn)

xn − x∗=f(xn)− f(x∗)

xn − x∗= f ′(c)

for some point c between xn and x∗, so c ∈ [a, b]. Thus

|xn − x∗| =|f(xn)||f ′(c)|

≤ |f(xn)|m

.

• Consider the map Sx = x− f(x)

f ′(x). Then

S′(x) = 1 − f ′(x)f ′(x)− f(x)f ′′(x)

f ′(x)2 =f(x)f ′′(x)

f ′(x)2 .

Then S(x∗) = x∗ and S′(x∗) = 0. So there is an interval around x∗ on which|S′(x)| < 1

2 . Then the MVT shows that

|xn+1 − x∗||xn − x∗|

=|S(xn)− S(x∗)|

|xn − x∗|= |S′(c)| < 1

2.

Therefore, |xn+1 − x∗| < 12 |xn − x∗|. In particular, this shows that lim

n→∞xn = x∗.

This is decent convergence, but in fact it improves as we get closer because theestimate on S′ improves.

• |xn+1 − x∗| <C

m|xn − x∗|2. In the first estimate, we found a point c so that

xn − x∗ =f(xn)

f ′(c)and the algorithm says xn+1 − xn = − f(xn)

f ′(xn). Thus

xn+1 − x∗ = (xn − x∗) + (xn+1 − xn)

=f(xn)

f ′(c)− f(xn)

f ′(xn)

=f(xn)(f

′(xn)− f ′(c))

f ′(c)f ′(xn)

We apply MVT to f ′(xn) − f ′(c) to find a point d between xn and c, and hencebetween xn and x∗, so that f ′(xn)− f ′(c) = f ′′(d)(xn − c). Plugging this back in

Page 104: Rigorous Calculus

6.4 Newton’s Method 99

yields

xn+1 − x∗ =f(xn)(f

′(xn)− f ′(c))

f ′(c)f ′(xn)

=f(xn)

f ′(c)

f ′′(d)(xn − c)

f ′(xn)

=f ′′(d)

f ′(xn)(xn − x∗)(xn − c)

Therefore, since |xn − c| < |xn − x∗|,

|xn+1 − x∗| ≤C

m|xn − x∗|2.

This is known as quadratic convergence, and says that the number of decimals ofaccuracy roughly doubles with each iteration. Very quickly on a computer, the issuebecomes the ability to do high precision calculation.

6.4.1. EXAMPLES.(1) Compute

√149.

Let f(x) = x2 − 149 and restrict the domain to [12, 13]. Then f ′(x) = 2x andf ′′(x) = 2. Thus m = minx∈[12,13] 2x = 24 and C = maxx∈[12,13] 2 = 2. HenceCm = 1

12 .Start with x0 = 12 and x∗ =

√149. The first estimate shows that

|x0 − x∗| ≤|f(x0)|m

=524

< 0.21.

The algorithm is

xn+1 = xn − f(xn)

f ′(xn)= xn − x2

n − 1492xn

=x2n + 149

2xn=

12(xn +

149xn

).

The map S has derivative S′(x) = |f(x)f ′′(x)|f ′(x)2 ≤ (169−149)2

242 < 112 . So

|xn+1 − x∗| ≤1

12|xn − x∗|,

but the quadratic estimate actually is better immediately.

x1 = 12.2083 |x1 − x∗| ≤1

12(0.21)2 < 3.7 10−3

x2 = 12.2065557 |x2 − x∗| ≤1

12(3.7 · 10−3)2 < 1.16 10−6

x3 = 12.2065556157337036 |x3 − x∗| ≤1

12(1.16 · 10−6)2 < 3.6 10−13

x4 = 12.20655561573370295189 . . . |x4 − x∗| ≤1

12(3.6 · 10−13)2 < 1.2 10−26

So in four steps, I have surpassed the accuracy of my computer’s significant digits.

Page 105: Rigorous Calculus

100 Approximations

(2) Solve 2x = 4x.Let f(x) = 2x − 4x. By inspection, x = 4 is a solution, but the graph shows a

FIGURE 6.2. Graph of 2x and 4x

second smaller solution. Compute f ′(x) = 2x ln 2− 4 and f ′′(x) = 2x(ln 2)2 > 0.Since f ′′ > 0, the curve is convex, and hence there are only two solutions. Nowf(0) = 1 > 0 > −2 = f(1), so the second solution is in (0, 1) by IVT. We willwork in [0, 1]. Then

m = minx∈[0,1]

|f ′(x)| = 4−2 ln 2 ≈ 2.6 and C = maxx∈[0,1]

|f ′′(x)| = 2(ln 2)2 < 1.

Thus Cm < 0.37. Start with x0 = 0.25. Then

|x∗ − x0| <|f ′(x0)|m

=0.1892

2.6< 0.073.

Applying the algorithm we get

x1 = 0.309579 |x1 − x∗| ≤ 0.37(.073)2 < .002

x2 = 0.30990692 |x2 − x∗| ≤ 0.37(.002)2 < 1.5 10−6

x3 = 0.3099069323806 |x3 − x∗| ≤ 0.37(1.5 · 10−6)2 < 8.3 10−13

x4 = 0.3099069323806905654546 |x4 − x∗| ≤ 0.37(8.3 · 10−13)2 < 2.5 10−25

Exercises for Chapter 6

1. Use Taylor polynomials to compute these limits, not L’Hopital’s Rule.

(a) limx→0

ex + e−x − 2x2 .

(b) limx→0

sinhx− sinxx3 .

Page 106: Rigorous Calculus

7.0 Exercises for Chapter 6 101

(c) limx→0

(sinxx

)1/x2

. HINT: Use a 3rd order polynomial for sin(x) at x = 0

and a 2nd order polynomial for ln(x) at x = 1.

2. (a) Verify that 4 tan−1(15)− tan−1( 1

239) =π

4.

(b) Use (a) and Taylor polynomials to calculate π to 6 decimals of accuracywith error estimates.

3. (a) Find the Taylor polynomials Pn,1(x) for f(x) = lnx about a = 1, and givethe error estimates.

(b) Compare the errors for the following methods for computing ln 2. Which isbest?(i) Pn,1(2) (ii) −Pn,1(.5) (iii) Pn,1(

43)− Pn,1(

23).

4. Let

f(x) =

1

1 + (lnx)2 if x > 0

0 if x = 0.Prove that f(x) is uniformly continuous on [0,∞).HINT: first prove it separately on [0, 3] and on [2,∞).

5. Let f(x) = (1 + x)−1/2.(a) Find a formula for f (k)(x). Hence show that

f (k)(0)k!

=

(–12k

):=

–12(

–12 − 1) · · · (–1

2 + 1 − k)

k!=

(−1)k

4k

(2kk

)(b) Find the Taylor polynomial Pn,0(x) for f , and give the error estimate.(c) Show that

√2 = 1.4f(−.02). Use this to compute

√2 to 6 decimal places

with error estimates.

6. Show that p(x) = x3 + x+ 1 has exactly one real root. Find this root of p(x)to 6 decimal places using Newton’s method. Provide the algorithm and showyour error estimates.

You can use a computer to do the calculations, but an analysis of the errorshould be done by hand. Try using Maple: (punctuation is critical!)

with(Student[Calculus1]):NewtonsMethod(xˆ5+x+1,x=-0.5,

view=[-1.5 .. 0.0, DEFAULT],output=sequence);

This only yield 5 terms, so substitute the last result back in and repeat.

Page 107: Rigorous Calculus

APPENDICES

A.1. Equivalence Relations

We introduce a basic mathematical construction known as an equivalence re-lation. Equivalence relations occur frequently in mathematics and have appearedoccasionally in these notes.

A.1.1. DEFINITION. Let X be a set, and let R be a subset of X × X . ThenR is a relation on X . Let us write x ∼ y if (x, y) ∈ R. We say that R or ∼ is anequivalence relation if it is

(1) (reflexive) x ∼ x for all x ∈ X .

(2) (symmetric) if x ∼ y for x, y ∈ X , then y ∼ x.

(3) (transitive) if x ∼ y and y ∼ z for y x, y, z ∈ X , then x ∼ z.

If ∼ is an equivalence relation on X and x ∈ X , then the equivalence class [x]is the set {y ∈ X : y ∼ x}. By X/∼ we mean the collection of all equivalenceclasses.

A.1.2. EXAMPLES.

(1) Equality is an equivalence relation on any set. Verify this.

(2) Consider the integers Z. Say that m ≡ n (mod 12) if 12 divides m− n. Notethat 12 divides n − n = 0 for any n, and thus n ≡ n (mod 12). So it is reflexive.Also if 12 divides m−n, then it divides n−m = −(m−n). So m ≡ n (mod 12)implies that n ≡ m (mod 12) (i.e., symmetry). Finally, if l ≡ m (mod 12) andm ≡ n (mod 12), then we may write l −m = 12a and m − n = 12b for certainintegers a, b. Thus l − n = (l −m) + (m − n) = 12(a + b) is also a multiple of12. Therefore, l ≡ n (mod 12), which is transitivity.

There are twelve equivalence classes [r] for 0 ≤ r < 12 determined by theremainder r obtained when n is divided by 12. So [r] = {12a+ r : a ∈ Z}.

(3) Consider the set R with the relation x ≤ y. This relation is reflexive (x ≤ x)and transitive x ≤ y and y ≤ z implies x ≤ z. However, it is antisymmetric: x ≤ yand y ≤ x both occur if and only if x = y. This is not an equivalence relation.

When dealing with functions defined on equivalence classes, we often definethe function on an equivalence class in terms of a representative. In order for thefunction to be well defined, that is, for the definition of the function to make sense,we must check that we get same value regardless of which representative is used.

102

Page 108: Rigorous Calculus

A.2 A Construction of R 103

A.1.3. EXAMPLES.(1) Consider the set of real numbers R. Say that x ≡ y (mod 2π) if x − y is aninteger multiple of 2π. Verify that this is an equivalence relation. Define a functionf([x]) = (cosx, sinx). We are really defining a function F (x) = (cosx, sinx) onR and asserting that F (x) = F (y) when x ≡ y (mod 2π). Indeed, we then havey = x+ 2πn for some n ∈ Z. As sin and cos are 2π-periodic, we have

F (y) = (cos y, sin y)

= (cos(x+ 2πn), sin(x+ 2πn))

= (cosx, sinx) = F (x).

It follows that the function f([x]) = F (x) yields the same answer for every y ∈ [x].So f is well defined. One can imagine the function f as wrapping the real linearound the circle infinitely often, matching up equivalent points.

(2) Consider R modulo 2π again, and look at f([x]) = ex. Then 0 ≡ 2π (mod 2π)but e0 = 1 = e2π. So f is not well defined on equivalence classes.

(3) Now consider Example A.1.2(2). We wish to define multiplication modulo 12by [n][m] = [nm]. To check that this is well defined, consider two representativesn1, n2 ∈ [n] and two representatives m1,m2 ∈ [m]. Then there are integers a andb so that n2 = n1 + 12a and m2 = m1 + 12b. Then

n2m2 = (n1 + 12a)(m1 + 12b)

= n1m1 + 12(am1 + n1b+ 12ab).

Therefore, n2m2 ≡ n1m1 (mod 12). Consequently, multiplication modulo 12 iswell defined.

A.2. A Construction of R

Our description of the real numbers as the set of all infinite decimals modulothe issue with terminal 9’s versus terminal 0’s, (which is an equivalence relation!),was a bit problematic because the rules for addition and multiplication were veryhard to formulate. Now that you have read Chapter 2, we can introduce a superiormethod for constructing the reals. The idea is to start with the rational numbers, Q,and complete it.

Let C denote the set of all Cauchy sequences of rational numbers. Note that inthe definitions of Cauchy sequence and limit, there is no harm in using only rationalnumbers for ε. Put an equivalence relation on C by (xn) ∼ (yn) if lim

n→∞xn − yn =

0. The three properties of an equivalence relation are very easy to check. LetR = C/ ∼ be the set of equivalence classes. We define an imbedding J of Q intoR by J(r) = [(r, r, r, r, . . . )], the equivalence class of the constant sequence.

Page 109: Rigorous Calculus

104 Appendices

Next we define addition, multiplication and order.• [(xn)] + [(yn)] = [(xn + yn)].

• [(xn)] · [(yn)] = [(xnyn)].

• If [(xn)] = [(yn)], say [(xn)] < [(yn)] if there is some N so that xn < ynfor all n ≥ N .

We need to verify that these notions are well defined. Suppose that (x′n) ∼ (xn)and (y′n) ∼ (yn). Then

limn→∞

(x′n + y′n)− (xn + yn) = limn→∞

x′n − xn + limn→∞

y′n − yn = 0.

Hence (x′n + y′n) ∼ (xn + yn), and thus addition is well defined. Similarly

limn→∞

(x′ny′n)− (xnyn) = lim

n→∞(x′n − xn)y

′n + lim

n→∞xn(y

′n − yn) = 0.

This uses that Cauchy sequences are bounded by Proposition 2.6.3.The order is a bit more delicate. If [(xn)] = [(yn)], then xn − yn does not con-

verge to 0. Thus there is some ε > 0 so that |xni − yni | ≥ ε for some subsequence.Use ε/3 in the definition of Cauchy sequence to obtain N so that for N ≤ m ≤ n,

|xn − xm| < ε

3and |yn − ym| < ε

3.

Then choose some ni ≥ N , and for definiteness, suppose that xni − yni ≥ ε. (Theother case is similar.) Then for n ≥ N ,

xn − yn = (xni − yni)− (xni − xn)− (yn − yni ≥ ε− ε

3− ε

3=ε

3.

Hence [(yn)] < [(xn)]. Now take equivalent sequences (x′n) and (y′n). Then

lim infx′n − y′n = lim inf(x′n − xn) + (xn − yn) + (y′n − yn) ≥ 0 +ε

3+ 0 =

ε

3.

This shows that order is well defined. Moreover, we see that exactly one of

[(xn)] < [(yn)], [(xn)] = [(yn)] or [(xn)] > [(yn)]

holds. You should check that [(xn)] < [(yn)] and [(yn)] < [(zn)] implies that[(xn)] < [(zn)].

Next observe that the embedding J : Q → R preserves the ordered fieldproperties:

J(r) + J(s) = J(r + s) and J(r) · J(s) = J(rs) for r, s ∈ Q,

and r < s implies that J(r) < J(s). We let 0 = J(0) and 1 = J(1).It is straightforward to verify all of the field operations. We give two examples.

The distributive law follows from the corresponding property for Q:([(xn)] + [(yn)]

)· [(zn)] = [(xn + yn)] · [(zn)] =

[((xn + yn)zn)

)]= [(xnzn + ynzn)] = [(xnzn)] + [(ynzn)]

= [(xn)] · [(zn)] + [(yn)] · [(zn)].

Page 110: Rigorous Calculus

A.3 A Construction of R 105

Multiplicative inverses take a bit of work. If [(xn)] = 0, then there is some ε0so that |xn| ≥ ε0 for n ≥ N . Let yn = 0 for n < N and yn = x−1

n for n ≥ N . Wehave to show that this is Cauchy. Choose N1 ≥ N so that if N1 ≤ m ≤ n, then|xn − xm| < εε2

0. Then

|yn − ym| =∣∣∣ 1xn

− 1xm

∣∣∣ = |xm − xn||xnxm|

<εε2

0

ε20

= ε.

Thus (yn) ∈ R and [(xn)] · [(yn)] = 1.We leave the straightforward verification of the other properties of an ordered

field to the interested reader.There is more work to be done. The ordered field R is Archimedean: if

[(xn)] > 0, then there is an r ∈ Q so that 0 < J(r) < [(xn)]. This follows becausewe showed that lim infxn ≥ ε > 0 for some rational ε, and so any 0 < r < ε willwork. If we like, we can take r = 1

k for k ∈ N sufficiently large. This also impliesthat that if [(xn)] ∈ R, then there is an integer k ∈ N so that [(xn)] < J(k). In-deed, if [(xn)] ≤ 0, then k = 1 suffices. If x = [(xn)] > 0, then x−1 > 0. By theArchimedean property, J( 1

k ) < x−1 for some k ∈ N, and thus x < J(k).We need to verify the Least Upper Bound Property for R. Let S ⊂ R be

a nonempty set which is bounded above by z ∈ R, and let s ∈ S. Since R isArchimedean, we can find integers a, b so that a < s ≤ z < b. Recursively definesequences xn and yn of rational numbers as follows. Let x1 = a and y1 = b.Suppose that xi and yi have been defined in Q for 1 ≤ i < n so that J(xi) is not anupper bound for S and J(yi) is an upper bound for S and yi − xi = 21−i(b − a).Let cn = 1

2(xn−1 + yn−1). If cn is an upper bound for S, then let xn = xn−1 andyn = cn; while if cn is not an upper bound for S, then let xn = cn and yn = yn−1.Let x = [(xn)]. Then x = [(yn)] because limn→∞ yn − xn = 0. We claim thatsupS = x.

Let s = [(sn)] ∈ S. If s > x, then by the Archimedean property, s > x+J( 1k )

for some k ∈ N. So there is an integer N so that sn > yn + 12k for all n ≥ N .

Choose M ≥ N so that 21−M (b− a) < 14k . Then for n ≥M

yn = yM +m∑

i=M+1

(yi − yi−1) < yM +m∑

i=M+1

21−i(b− a) < yM +1

4k.

Therefore for n ≥ M , we have sn > yM + 14k ; and hence s ≥ J(yM ) + J( 1

4k ).This contradicts the fact that J(yM ) is an upper bound for S. So no such s exists,and x is an upper bound for S . A similar argument shows that if z < x, then z isnot an upper bound.

It follows that R is an ordered field with the Least Upper Bound Property.This is exactly the property of R that we used to establish the various versions ofcompleteness. We call this field the real numbers, R. It is a subtle point that thereis only one such field with these properties. This issue will not be addressed here.

Page 111: Rigorous Calculus

106 Appendices

A.3. Cardinality

Cardinality is the notion that measures the size of a set in the crudest of ways—by counting the numbers of elements. Obviously, the number of elements in a setcould be 0, 1, 2, 3, 4, or some other finite number. Or a set can have infinitelymany elements. Perhaps surprisingly, not all infinite sets have the same cardinality.For our purposes, infinite sets have two possible sizes: countable and uncountable(the uncountable ones are larger). The most important ideas to understand are whatcountable means and what distinguishes countable sets from those with larger car-dinality.

A.3.1. DEFINITION. Two sets A and B have the same cardinality if there is abijection f from A onto B. We write |A| = |B| in this case. Similarly, we say thatthe cardinality of A is less than that of B (|A| ≤ |B|) if there is an injection f fromA into B.

The definition says simply that if all of the elements of A can be paired, one-to-one, with all of the elements of B, then A and B have the same size. If A fitsinside B in a one-to-one manner, then A is smaller than B. One of the subtletiesthat we address later is whether |A| ≤ |B| and |B| ≤ |A| mean that |A| = |B|. Theanswer is yes, but this is not obvious for infinite sets.

A.3.2. EXAMPLES.(1) The cardinality of any finite set is the number of elements, and this numberbelongs to N0 = {0, 1, 2, 3, 4, . . . }. Set theorists go to some trouble to define thenatural numbers too. But we will take for granted that the reader is familiar withthe notion of a finite set.

(2) Most sets encountered in analysis are infinite, meaning that they are not finite.The sets of natural numbers N, integers Z, rational numbers Q, and real numbersR are all infinite. Moreover, we have the natural containments N ⊂ Z ⊂ Q ⊂ R.So |N| ≤ |Z| ≤ |Q| ≤ |R|. Notice that the integers can be written as a list0, 1,−1, 2,−2, 3,−3, . . . . This amounts to defining a bijection f : N → Z by

f(n) =

{(1 − n)/2 if n is oddn/2 if n is even.

Therefore, |N| = |Z|.

A.3.3. DEFINITION. A set A is a countable set is it is finite or if |A| = |N|.The cardinal |N| is denoted by ℵ0. This is the first letter of the Hebrew alphabet,aleph, with subscript zero. It is pronounced aleph nought.

An infinite set that is not countable is called an uncountable set.

Page 112: Rigorous Calculus

A.3 Cardinality 107

Equivalently, A is countable and infinite if the elements of A may be listedas a1, a2, a3, . . . . Indeed, the list itself determines a bijection from N to A byf(k) = ak. It is a basic fact that countable sets are the smallest infinite sets.

Notice that two uncountable sets could have different cardinalities.

A.3.4. LEMMA. Every infinite subset of N is countable. Moreover, if A is aninfinite set such that |A| ≤ |N|, then |A| = |N|.

PROOF. Any nonempty subset X of N has a smallest element. Indeed, asX is nonempty, it contains an integer n. Consider the elements of the finite set{1, 2, . . . , n} in order and pick the first one that belongs to X—that is, the small-est.

Let B be an infinite subset of N. List the elements of B in increasing order asb1 < b2 < b3 < . . . . This is done by choosing the smallest element b1, then thesmallest of the remaining set B \ {b1}, then the smallest of B \ {b1, b2} and so on.The result is an infinite list of elements of B in increasing order. It must includeevery element b ∈ B because {n ∈ B : n ≤ b} is finite, containing say k elements.Then bk = b. As noted before the proof, this implies that |B| = |N|.

Now consider a set A with |A| ≤ |N|. By definition, there is a injection f ofA into N. Let B = f(A). Note that f is a bijection of A onto B. Then B is aninfinite subset of N. So |A| = |B| = |N|. ■

A.3.5. PROPOSITION. The countable union of countable sets is countable.

PROOF. By the previous lemma, we may assume that there is a countably in-finite collection of sets A1, A2, A3, . . . that are each countably infinite. Write theelements of Ai as a list ai,1, ai,2, ai,3, . . . . Then we may write A =

⋃i≥1 Ai as a

list as follows:

a1,1, a1,2, a2,1, a1,3, a2,2, a3,1, a1,4, a2,3, a3,2, a4,1, . . . ,

where the elements ai,j are written so that i+ j is monotone increasing, and withinthe set of pairs (i, j) with i+ j = n, the terms are written with the i’s in increasingorder. See Figure 7.1. Thus A is countable. ■

A.3.6. COROLLARY. The set Q of rational numbers is countable.

PROOF. The set Z × N = {(i, j) : i ∈ Z, j ∈ N} is the disjoint union ofthe sets Ai = {(i, j) : j ∈ N} for i ∈ Z. Each Ai is evidently countable. ByExample A.3.2(2), Z is countable. Hence Z×N is the countable union of countablesets, and thus is countable by Proposition A.3.5.

Define a map from Q into Z×N by f(r) = (a, b) if r = a/b, where a and b areintegers with no common factor and b > 0. These conditions uniquely determine

Page 113: Rigorous Calculus

108 Appendices

a1,1

a2,1

a3,1

a4,1

a5,1

a1,2 a1,3 a1,4 a1,5

FIGURE 7.1. The set N× N is countable.

the pair (a, b) for each rational r, and so f is a function. Clearly, f is injectivesince r is recovered from (a, b) by division. Therefore, f is an injection of Q intoa countable set. Hence Q is an infinite set with |Q| ≤ |N|. So Q is countable byLemma A.3.4. ■

A.3.7. COROLLARY. If A and B are countable, then A × B is countable.Hence Zn is countable for all n ≥ 1.

PROOF. First A× B =⋃

b∈B A× {b} is a countable union of countable sets,and thus is countable. In particular, Z2 is countable. By induction, Zn is countablefor each n ≥ 1. ■

There are infinite sets that are not countable.

A.3.8. THEOREM. The set R of real numbers is uncountable.

PROOF. The proof uses a diagonalization argument due to Cantor. Supposeto the contrary that R is countable. Then all real numbers may be written as a listx1, x2, x3, . . . . Express each xi as an infinite decimal, which we write as xi =xi0.xi1xi2xi3 . . . , where xi0 is any integer and xik is an integer from 0 to 9 for eachk ≥ 1. Our goal is to write down another real number that does not appear in this(supposedly exhaustive) list. Let a0 = 0 and define ak = 7 if xik ∈ {0, 1, 2, 3, 4}and ak = 2 if xik ∈ {5, 6, 7, 8, 9}. Define a real number a = a0.a1a2a3 . . . .

Since a is a real number, it must appear somewhere in this list, say a = xk.However, the kth decimal place ak of a and xk,k of xk differ by at least 3. Thiscannot be accounted for by the fact that certain real numbers have two decimalexpansions, one ending in zeros and the other ending in nines because this changesany digit by no more than 1 (counting 9 and 0 as being within 1). So a = xk, andhence a does not occur in this list. It follows that there is no list containing all realnumbers, and thus R is uncountable. ■

Page 114: Rigorous Calculus

A.4 e is transcendental 109

We conclude with the result promised in the start of this section.

A.3.9. SCHROEDER–BERNSTEIN THEOREM. If A and B are sets with|A| ≤ |B| and |B| ≤ |A|, then |A| = |B|.

PROOF. The proof is surprisingly simple. Since |A| ≤ |B|, there is an injectionf mapping A into B. Likewise, as |B| ≤ |A|, there is an injection g mapping Binto A. Let B1 = B \ f(A). Recursively define Ai = g(Bi) and Bi+1 = f(Ai) fori ≥ 1. Define A0 = A \

⋃i≥1 Ai and B0 = B \

⋃i≥1 Bi. We will show that the

actions of f and g fit the scheme of Figure 7.2.

B B1 B2 B3 · · · B0

A A1 A2 A3 · · · A0

g g g

f fg f

FIGURE 7.2. Schematic of action of f and g on A and B.

First we show that theBi’s are disjoint. Clearly eachBi for i ≥ 2 is in the rangeof f and hence does not intersect B1. Suppose that 1 < i < j. Then (fg)i−1 is aninjection of B into itself that carries Bk onto Bk+i−1 for every k ≥ 1. In particular,B1 is mapped onto Bi and Bj−i+1 is mapped onto Bj . Since B1 ∩ Bj−i+1 = ∅and (fg)i−1 is one-to-one, it follows that Bi ∩Bj = ∅.

By construction, g−1 is a bijection of each Ai onto Bi for i ≥ 1. We claim thatf maps A0 onto B0. Observe that f maps Ai onto Bi+1 for each i ≥ 1. Thus theremainder of A, namely A0, is mapped onto the remainder of the image. Thus

f(A0) = f(A) \⋃i≥1

f(Ai) = (B \B1) \⋃i≥1

Bi+1 = B \⋃i≥1

Bi = B0.

This means that the function

h(a) =

{g−1(a) if a ∈

⋃i≥1 Ai

f(a) if a ∈ A0

is a bijection between A and B. Therefore |A| = |B|. ■

A.4. e is transcendental

We first establish a relatively easy fact.

Page 115: Rigorous Calculus

110 Appendices

A.4.1. THEOREM. e is irrational.

PROOF. We use the formula e =∑∞

k=01k! . Suppose that e = p

q where p, q ∈ N.Then q ≥ 2 since 2 < e < 3. We have that

p(q − 1)! = q!e =q∑

k=0

q!k!

+

∞∑k=q+1

q!k!

is an integer. Therefore we have an integer

∞∑k=q+1

q!k!

=1

q + 1+

1(q + 1)(q + 2)

+1

(q + 1)(q + 2)(q + 3)+ . . .

<

∞∑k=1

1(q + 1)k

=

1q+1

1 − 1q+1

=1q< 1.

Hence this “integer” lies in (0, 1), which is absurd. Therefore e is irrational. ■

A.4.2. DEFINITION. A real number α is algebraic if there is a polynomial p(x)with integer coefficients with α as a root. A real number is transcendental if it isnot algebraic.

A.4.3. PROPOSITION. The set of algebraic numbers is countable.

PROOF. First count the number of polynomials with integer coefficients of de-gree n. Such a polynomial has n+1 coefficients, which we can associate to a pointin Zn+1, (or more precisely Zn × (Z \ {0}) since the leading coefficient an = 0,)which is countable by Corollary A.3.7. Each polynomial has n roots, so the set ofroots is countable. Finally we take the union over all n, and the countable union ofcountable sets is countable. Thus the set of algebraic numbers is countable. ■

The following proof is tricky and pulls several functions out of thin air.

A.4.4. THEOREM. e is transcendental.

PROOF. Suppose that e is algebraic. Then there are integers c0, . . . , cn so that

cnen + cn−1e

n−1 + · · ·+ c1e+ c0 = 0.

We may suppose that cn = 0 = c0 because we can factor out an x if c0 = 0. Let pbe a prime which is much larger than max{n, |c0|} to be specified more precisely

Page 116: Rigorous Calculus

A.4 e is transcendental 111

later. Define a polynomial f(x) of degree r = (n+ 1)p− 1 by

f(x) =1

(p− 1)!xp−1(1 − x)p(2 − x)p · · · (n− x)p =

(n+1)p−1∑k=p−1

ak(p− 1)!

xk

=(n!)p

(p− 1)!xp−1 +

ap(p− 1)!

xp +ap+1

(p− 1)!xp+1 + · · ·+ (−1)n

(p− 1)!x(n+1)p−1.

Note that each ak is an integer; and ap−1 = n! and a(n+1)p−1 = (−1)n.Claim: If l ≥ p and j ∈ Z, then f (l)(j) is an integer multiple of p. Indeed,

since f is a polynomial, and l ≥ p, dl

dxl

( (n!)p

(p−1)!xp−1)= 0 and

dl

dxl

( ak(p− 1)!

xk)= pak

k(k − 1) . . . (k + 1 − p) . . . (k + 1 − l)

p!

= pak

(k

p

)(k − p) . . . (k + 1 − l).

This is a product of integers including p; so is a multiple of p.Claim: If 0 ≤ l ≤ p − 1 and 0 ≤ j ≤ n, then f (l)(j) = 0 except for

f (p−1)(0) = (n!)p; and (n!)p is an integer but is not a multiple of p. For each1 ≤ j ≤ n, the factor (j − x)p has a zero of order p, and so f (l)(x) has a zero oforder p − l ≥ 1; and hence f (l)(j) = 0. At j = 0, a similar argument shows thatf (l)(0) = 0 for 0 ≤ l ≤ p− 2. Finally, f (p−1)(x) = (n!)p + higher order terms, sof (p−1)(0) = (n!)p. Since n < p, this has no factor of p.

Now we define two more functions.

F (x) = f(x) + f ′(x) + f (2)(x) + · · ·+ f (r)(x) =∞∑k=0

f (k)(x)

and G(x) = e−xF (x). Then

G′(x) = e−x(F ′(x)− F (x))

= e−x( ∞∑

k=0

f (k+1)(x)−∞∑k=0

f (k)(x))= −e−xf(x).

Apply the MVT on [0, k] for 1 ≤ k ≤ n and find points xk ∈ (0, k) so that

G(k)−G(0)k

= G′(xk) = −e−xkf(xk).

Multiply by ckkek to get

ckF (k)− ckekF (0) = −kckek−xkf(xk)

and sum from 1 to nn∑

k=1

ckF (k)−( n∑k=1

ckek)F (0) = −

n∑k=1

kckek−xkf(xk).

Page 117: Rigorous Calculus

112 Appendices

Now we observe that∑n

k=1 ckek = −c0 by hypothesis, and by the two claims

above, the LHS is a sum of integers and all but one is a multiple of p,

n∑k=0

ckF (k) ≡ c0(n!)p ≡ 0 (mod p).

In particular, this is a non-zero integer—and hence is at least 1 in absolute value.Now we make some estimates to show that the RHS is too small for p sufficientlylarge, reaching a contradiction.

Using the formula for f(x), we get the crude estimate max0≤x≤n

|f(x)| ≤ (nn+1)p

(p− 1)!.

Therefore

|RHS| ≤( n∑

k=1

k|ck|ek) (nn+1)p

(p− 1)!.

The bound has the formABp

(p− 1)!for constants A and B. However

limp→∞

ABp

(p− 1)!= 0.

Thus we can choose a prime p large enough that |RHS| < 1. Therefore e is notalgebraic, and so is transcendental. ■

A.5. π is irrational

Here is another off-the-wall irrationality proof. It implies that π is irrational.The argument will need to use a trig function in order to encapsulate the value πsomehow. It is much in the same spirit as the previous proof.

A.5.1. THEOREM. π2 is irrational.

PROOF. Suppose that π2 = ab where a, b ∈ N. Since lim

n→∞

πan

n!= 0, we

choose an n so that 0 <πan

n!< 1. Let f(x) =

xn(1 − x)n

n!. Then 0 < f(x) < 1

n!

if 0 < x < 1. Also

f(x) =1n!xn

n∑k=0

(n

k

)(−1)kxk =

2n∑k=n

ckn!xk

Page 118: Rigorous Calculus

A.5 π is irrational 113

where ck ∈ Z are integers and cn = 1. Compute

f (k)(0) = 0 if 0 ≤ k ≤ n− 1 and k > 2n

f (n)(0) = n!cnn!

= 1 if k = n

f (k)(0) = k!ckn!

= k(k − 1) · · · (n+ 1)ck if n+ 1 ≤ k ≤ 2n.

In particular, f (k)(0) is always an integer. Notice that f(1 − x) = f(x), and hencef (k)(1 − x) = (−1)kf (k)(x), Therefore, f (k)(1) is also always an integer.

Define another polynomial

F (x) = bn(π2nf(x)− π2n−2f (2)(x) + · · ·+ (−1)nf (2n)(x)

)= bn

n∑k=0

(−1)kπ2n−2kf (2k)(x).

Since π2 = ab by assumption, bnπ2n−2k are integers. Also by the previous para-

graph, F (0) and F (1) are integers. Compute

π2F (x)+F ′′(x) = bnn∑

k=0

(−1)kπ2n+2−2kf (2k)(x)+bnn∑

k=0

(−1)kπ2n−2kf (2k+2)(x)

= bnn∑

k=0

(−1)kπ2n+2−2kf (2k)(x)−bnn+1∑l=1

(−1)lπ2n+2−2lf (2l)(x)

= bnπ2n+2f(x) = anπ2f(x).

Now we introduce some trig functions. Define

G(x) = F ′(x) sinπx− πF (x) cosπx.

Then

G′(x) = F ′′(x) sinπx+ πF ′(x) cosπx− πF ′(x) cosπx+ π2F (x) sinπx

= (F ′′(x) + π2F (x)) sinπx = anπ2f(x) sinπx.

Now G(0) = −πF (0) and G(1) = πF (1). Therefore

G(1)−G(0)π

= −F (0)− F (1) ∈ Z.

The MVT provides an x0 ∈ (0, 1) so that G(1)−G(0) = G′(x0). Hence

| − F (0)− F (1)| = |G(1)−G(0)|π

=|G′(x0)|

π= anπ|f(x0) sinπx0| > 0

and anπ|f(x0) sinπx0| < anπn! < 1. This is an integer in (0, 1), which is a contra-

diction. Therefore π2 is irrational. ■

Page 119: Rigorous Calculus
Page 120: Rigorous Calculus

INDEX

aleph nought (ℵ0), 106algebraic, 110and, 1antisymmetric, 102Archimedean, 105asymptote, 41

Bernouilli’s inequality, 40bijective, 29Bolzano-Weierstrass Theorem, 22bounded above, 17bounded below, 18

Cantor, 108Cantor function, 54Cantor set, 55cardinality, 106Cauchy sequence, 23Cauchy’s mean value theorem, 80Chain Rule, 62closed, 25closed interval, 25codomain, 29complete, 24Completeness Theorem, 24composition, 29concave, 72continuous function, 45contrapositive, 2converge, 13converges pointwise, 95converges uniformly, 95converse, 2convex, 72countable, 53countable set, 106counterexample, 3

Darboux’s Theorem, 80diagonalization, 108differentiable, 58differentiable at x0, 58discontinuous, 45domain, 29

e, 37

e is irrational, 110equivalence class, 102equivalence relation, 102even, 32eventually periodic, 4exponential function, 38Extreme Value Theorem, 48

Fermat’s Principle, 67Fermat’s Theorem, 66for all, 2formula, 2function, 29

generalized geometric mean–arithmeticmean inequality, 79

greatest lower bound, 18

hyperbolic trig functions, 71

implication, 2increasing sequence, 19infimum, 18injective, 29Intermediate Value Theorem, 49inverse function, 29

Jensen’s inequality, 78jump discontinuity, 52

kth derivative, 72

L’Hopital’s rule, 81least upper bound, 17Least Upper Bound Principle, 19left derivative, 77limit from the left, 30limit from the right, 30limit of a function, 29limit points, 25Lipschitz, 46Lipschitz constant, 46local maximum, 66local minimum, 66

maximum, 66Mean Value Theorem, 68

115

Page 121: Rigorous Calculus

116 Index

minimum, 66modus ponens, 2Monotone Convergence Theorem, 19monotone increasing, 19

natural logarithm, 37negation, 1Newton’s method, 97not, 1

oblique asymptote, 42one-to-one, 29one-to-one and onto, 29onto, 29open, 25open interval, 25or, 1order dense, 11ordered field, 11

pigeonhole principle, 4point of inflection, 74Principle of Induction, 6proof by contradiction, 5proof by induction, 6

quantifiers, 2

range, 29relation, 102right derivative, 77Rolle’s Theorem, 67

Schroeder–Bernstein Theorem, 109secant, 58Secant Lemma, 76second derivative, 72second derivative test, 73Snell’s Law, 67Squeeze Theorem, 14Squeeze Theorem for Functions, 30statements, 1strictly convex, 72strictly increasing sequence, 19subsequence, 21supremum, 17surjective, 29

tangent line, 59tautology, 2Taylor polynomial, 86Taylor’s Theorem, 87there exists, 3

Thomae’s function, 46total order, 11transcendental, 110triangle inequality, 12truth tables, 2

uncountable set, 106uniformly continuous, 93upper bound, 17