

MATH 234 – THIRD SEMESTER CALCULUS

Fall 2015


Math 234 – 3rd Semester Calculus
Lecture notes version 0.9.1 (Fall 2015)

This is a self-contained set of lecture notes for Math 234. The notes were written by Sigurd Angenent; some problems were taken from Guichard's open calculus text, which is available at http://www.whitman.edu/mathematics/multivariable/src/

The LaTeX files, as well as the Python and Inkscape-svg files that were used to produce the notes before you, can be obtained from the following web site: http://www.math.wisc.edu/~angenent/Free-Lecture-Notes

They are meant to be freely available for non-commercial use, in the sense that “free software” is free. More precisely:

Copyright (c) 2009 Sigurd B. Angenent. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled “GNU Free Documentation License”.


Contents

Chapter 1. Vector Geometry in Three dimensional space
1. Three dimensional space
2. Geometric description of vectors
3. Arithmetic of vectors
4. Vector algebra
5. Component representation of vectors
6. The dot product
7. The cross product
8. The triple product
9. Determinants
10. Determinants, the triple product, and the cross product
11. Defining equations for lines and planes
12. Problems

Chapter 2. Parametric curves and vector functions
1. Vector functions
2. Using vector functions to describe motion
3. Lines
4. Circular motion
5. The cycloid
6. The helix
7. The derivative of a vector function
8. The derivative as velocity vector
9. Acceleration
10. The differentiation rules
11. Vector functions of constant length
12. Two examples
13. Arc length
14. Arc length derivative
15. Unit Tangent and Curvature
16. Osculating plane
17. Problems

Chapter 3. Functions of more than one variable
1. Functions of two variables and their graphs
2. Linear functions
3. Quadratic forms
4. Functions in polar coordinates r, θ
5. Methods of visualizing the graph of a function
Problems

Chapter 4. Derivatives
1. Interior points and continuous functions
2. Partial Derivatives
3. Problems
4. The linear approximation to a function
5. The tangent plane to a graph
6. The Two Variable Chain Rule
7. Problems
8. Gradients
9. The chain rule and the gradient of a function of three variables
10. Implicit Functions
Problems
11. The Chain Rule with more Independent Variables; Coordinate Transformations
12. Problems
13. Higher Partials and Clairaut's Theorem
14. Finding a function from its derivatives
15. Problems

Chapter 5. Maxima and Minima
1. Local and Global extrema
2. Continuous functions on closed and bounded sets
3. Problems
4. Critical points
5. When there are more than two variables
6. Problems
7. A Minimization Problem: Linear Regression
8. Problems
9. The Second Derivative Test
10. Problems
11. Second derivative test for more than two variables
12. Optimization with constraints and the method of Lagrange multipliers
13. Problems

Chapter 6. Integrals
1. Ways of Integrating
2. Double Integrals
3. Problems
4. Triple integrals
5. Why compute a Triple Integral?
6. Integration in special coordinate systems
7. Problems

Chapter 7. Vector Calculus
1. Vector Fields
2. Examples of vector fields
3. Line integrals
4. Problems
5. Line integrals of vector fields
6. Another Fundamental Theorem of Calculus
7. Conservative vector fields
8. Problems
9. Flux integrals
10. Green's Theorem
11. Conservative vector fields and Clairaut's theorem
12. Problems
13. Surfaces and Surface integrals
14. Examples
15. The divergence theorem and Stokes' theorem
16. $\vec\nabla$ – differentiating vector fields
17. Problems

Math 234 – Answers and Hints

CHAPTER 1

Vector Geometry in Three dimensional space

1. Three dimensional space

The world according to our first and second semester calculus courses is flat: except for a brief digression about surfaces of revolution, everything that we discussed in Math 221 and 222 took place in the (x, y)-plane. All curves were curves in the plane and all functions had graphs that were curves in the plane. This semester we leave two dimensions behind and enter the three dimensional world. In order to understand the objects we will be dealing with, such as curves that are free to loop around in space, or functions whose graphs are themselves two dimensional curved surfaces, we will first review some three dimensional geometry. In particular, we will review the use of vectors in three dimensional geometry.

2. Geometric description of vectors

2.1. Points and their coordinates. We are used to describing the location of any point in the plane by choosing two perpendicular “coordinate axes” (the x and y axes), and specifying the corresponding (x, y)-coordinates of any given point. In the same way we can describe where points are in three dimensional space by choosing three mutually perpendicular axes, which we call the x, y, and z axes. To say where some given point P is, we travel from the origin to P, first along the x axis, then parallel to the y-axis, and finally parallel to the z-axis. The distances we had to go in the x, y, and z directions are the x, y, and z coordinates of our point P.

Figure 1. To determine the location of points in three dimensional space (such as the center of the blue sphere in this drawing), we should choose three coordinate axes (the x-, y-, and z-axes), and specify three numbers: the x, y, and z coordinates of the point.


2.2. Vectors. While points and their coordinates are used to describe locations in space, vectors are used to describe displacements, i.e. how to go from one point to another. Such a displacement has a size (how far we have to go), and a direction (which way we go). Vectors are also used in non-geometric situations to describe objects that have size and direction; velocities and forces in physics are typical examples of vector-like objects.

Informal definition of “vectors”. We will think of a vector as an arrow connecting two points. If the points are $A$ and $B$ then we call the vector $\overrightarrow{AB}$. If we translate a vector $\overrightarrow{AB}$ without turning it then we say that the resulting vector $\overrightarrow{CD}$ is the same vector as the original vector $\overrightarrow{AB}$. A more precise way of saying that we should be able to move $\overrightarrow{AB}$ “without turning” is to insist that the line segments $AB$ and $CD$ should be parallel, and have the same length and orientation.

Figure 2. This figure contains four points ($A$, $B$, $C$, $D$) and two line segments ($AB$ and $CD$), but only one vector, since $\overrightarrow{AB}$ and $\overrightarrow{CD}$ represent the same vector: $\overrightarrow{AB} = \overrightarrow{CD}$.

We say that the arrows $\overrightarrow{AB}$ and $\overrightarrow{PQ}$ both represent the same vector. Since both $\overrightarrow{AB}$ and $\overrightarrow{PQ}$ are the same vector we will often want to use a notation for vectors that does not emphasize any particular choice of initial- and endpoint. The notation we will use in this course is
$$\vec a = \overrightarrow{AB} = \overrightarrow{PQ},$$
i.e., a single letter with an arrow on top will always stand for a vector in this course.

Figure 3. Adding vectors: to add two vectors, move one vector until its initial point is the end point of the other, and combine them into $\vec a + \vec b$.

3. Arithmetic of vectors

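To add two vectors $\overrightarrow{AB}$ and $\overrightarrow{PQ}$ we first translate the vector $\overrightarrow{PQ}$ so that its initial point becomes $B$; let the result of this translation be the vector $\overrightarrow{BC}$. Then, by definition, the sum of $\overrightarrow{AB}$ and $\overrightarrow{PQ}$ is $\overrightarrow{AC}$: in a formula,
$$\overrightarrow{AB} + \overrightarrow{PQ} = \overrightarrow{AB} + \overrightarrow{BC} = \overrightarrow{AC}.$$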

An equivalent way of adding two vectors $\overrightarrow{AB}$ and $\overrightarrow{PQ}$ is to move the vectors around until they have the same initial point. Two vectors with a common initial point form two sides of a parallelogram (see Figure 4) and the sum of the two vectors is the diagonal of that parallelogram.

Figure 4. Using a parallelogram to add vectors. To find $\overrightarrow{AB} + \overrightarrow{AD}$ we move the vector $\overrightarrow{AD}$ so that its initial point is at $B$, i.e. the endpoint of $\overrightarrow{AB}$. This gives us a parallelogram $ABCD$, where $\overrightarrow{AD} = \overrightarrow{BC}$. Therefore $\overrightarrow{AB} + \overrightarrow{AD} = \overrightarrow{AB} + \overrightarrow{BC} = \overrightarrow{AC}$.

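One can also multiply vectors with numbers. To multiply a vector $\vec a$ with a positive real number $t > 0$, we multiply the length of the vector by a factor $t$, without changing the direction of the vector. Multiplying by a negative number $t < 0$ multiplies the length by $|t|$ and reverses the direction; in particular $-\vec a = (-1)\vec a$ has the same length as $\vec a$ but points the other way, and subtraction is defined by $\vec a - \vec b = \vec a + (-\vec b)$.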

Figure 5. Multiplying and subtracting vectors: $\vec a$, $2\vec a$, and $-\vec a$; and $\vec a$, $\vec b$, $-\vec a$, $-\vec b$, $\vec a - \vec b$, and $\vec b - \vec a$.

4. Vector algebra

The addition and multiplication of vectors and numbers satisfy a number of algebraic properties that should look familiar, as they are very similar to the usual algebraic properties for adding and multiplying numbers. Here they are:
$$\vec a + \vec b = \vec b + \vec a \qquad \text{(commutative law)}$$
$$(\vec a + \vec b) + \vec c = \vec a + (\vec b + \vec c), \qquad t\cdot(s\cdot\vec a) = (ts)\cdot\vec a \qquad \text{(associative laws)}$$
$$t\cdot(\vec a + \vec b) = t\vec a + t\vec b, \qquad (t + s)\vec a = t\vec a + s\vec a \qquad \text{(distributive laws)}$$


5. Component representation of vectors

5.1. Components of a vector in two dimensional space. There is a way to represent a vector by specifying a list of numbers instead of by giving a geometric description of the vector. To do this for vectors in the plane, we must choose two perpendicular coordinate axes (the “x” and “y” axes). We define

$\vec e_1$ = vector with length 1, in the direction of the x axis,
$\vec e_2$ = vector with length 1, in the direction of the y axis.

Then any other vector can be written as the sum of a multiple of $\vec e_1$ and another multiple of $\vec e_2$:
$$\vec a = a_1\vec e_1 + a_2\vec e_2. \tag{1}$$

See Figure 6. The numbers $a_1$ and $a_2$ are called the components of the vector $\vec a$. If we know the components $a_1$ and $a_2$ of a vector, and if we know the two vectors $\vec e_1$ and $\vec e_2$, then we can reconstruct the vector $\vec a$ by using the formula (1).

Figure 6. Describing a vector in terms of its components: $\vec a = a_1\vec e_1 + a_2\vec e_2$.
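Formula (1) can be mirrored in a few lines of Python. This is a minimal sketch that assumes vectors are stored as tuples and that, for the chosen axes, $\vec e_1 = (1, 0)$ and $\vec e_2 = (0, 1)$; the helper names `scale`, `add`, and `from_components` are our own and not part of any library.

```python
# Vectors are tuples of numbers; E1 and E2 are the unit vectors along the
# x and y axes (an assumed, standard choice of coordinate axes).
E1 = (1.0, 0.0)
E2 = (0.0, 1.0)

def scale(t, v):
    """Multiply the vector v by the number t, component by component."""
    return tuple(t * vi for vi in v)

def add(u, v):
    """Add two vectors with the same number of components."""
    return tuple(ui + vi for ui, vi in zip(u, v))

def from_components(a1, a2):
    """Reconstruct a = a1*e1 + a2*e2, as in formula (1)."""
    return add(scale(a1, E1), scale(a2, E2))

print(from_components(3, -2))  # (3.0, -2.0)
```

The same `scale` and `add` helpers work unchanged for vectors with three components.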

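Instead of using the notation (1), one very often writes
$$\vec a = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \quad\text{or}\quad \vec a = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix}, \quad\text{or}\quad \vec a = \langle a_1, a_2\rangle. \tag{2}$$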

This notation says that $\vec a$ is the vector whose components are $a_1$ and $a_2$. Since the two vectors $\vec e_1$ and $\vec e_2$ depend on our choice of coordinate axes, we can only use the component notation if it is clear to everyone how we chose the coordinate axes.

The first way of writing the vector, in which the components $a_1$ and $a_2$ are listed in a column enclosed in either parentheses or square brackets, is the standard way of writing “column vectors,” and is used in linear algebra courses (Math 320, 340, 341, etc.), as well as by most computational software (Matlab™, Octave, etc.). The other way of writing the components, i.e. as $\langle a_1, a_2\rangle$, also gets used, especially when one has to type the equations rather than write them by hand.

5.2. Components of a vector in three dimensional space. The preceding also applies to vectors in three dimensional space: instead of choosing two coordinate axes we choose three axes, and call them the x, y, and z axes (or, the $x_1$, $x_2$, and $x_3$ axes). Then we define $\vec\imath$, $\vec\jmath$, and $\vec k$ (or $\vec e_1$, $\vec e_2$, and $\vec e_3$) to be vectors of length one in the direction of the three coordinate axes. A vector $\vec a$ in space can then be written as a combination of the three vectors $\vec\imath$, $\vec\jmath$, and $\vec k$, namely,
$$\vec a = a_1\vec\imath + a_2\vec\jmath + a_3\vec k, \quad\text{or}\quad \vec a = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix}.$$

The $\vec e_1, \vec e_2, \vec e_3$ notation is more systematic, but the $\vec\imath, \vec\jmath, \vec k$ notation, which was introduced into vector geometry and vector calculus by J. W. Gibbs (Josiah Willard Gibbs, 1839–1903; https://en.wikipedia.org/wiki/Josiah_Willard_Gibbs), is also very common.

Figure 7. Components of a vector in three dimensional space: the vector $\vec a = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix}$ is $\vec a = a_1\vec e_1 + a_2\vec e_2 + a_3\vec e_3$.

5.3. Length of a vector whose components are given. We will write $\|\vec a\|$ for the length of a vector $\vec a$. If the vector is given in components,
$$\vec a = a_1\vec e_1 + a_2\vec e_2, \quad\text{or}\quad \vec a = a_1\vec e_1 + a_2\vec e_2 + a_3\vec e_3,$$
then the length of the vector is determined by Pythagoras' law (see Figures 6 and 7):
$$\|\vec a\| = \sqrt{a_1^2 + a_2^2}, \quad\text{or}\quad \|\vec a\| = \sqrt{a_1^2 + a_2^2 + a_3^2}. \tag{3}$$
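Formula (3) becomes a one-line function once the components are stored in a tuple. A minimal sketch (the name `length` is our own), using only Python's standard `math` module:

```python
import math

def length(a):
    """Length of a vector from its components, Pythagoras' law (3).

    Works for any number of components (two or three in this chapter).
    """
    return math.sqrt(sum(ai * ai for ai in a))

print(length((3, 4)))     # 5.0
print(length((2, 3, 6)))  # 7.0
```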

6. The dot product

There are two different descriptions of the dot product of two vectors: one geometric, and the other in terms of the components of the vectors.

6.1. Geometric description of the dot product. If $\vec a$ and $\vec b$ are two given vectors, then, by definition,
$$\vec a\cdot\vec b = \|\vec a\|\,\|\vec b\|\cos\theta, \tag{4}$$
where $\theta$ is the angle between the two vectors $\vec a$ and $\vec b$.

Figure: The dot product between two vectors $\vec a$ and $\vec b$, which make an angle $\theta$.





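6.2. The dot product in terms of vector components. If we choose an orthonormal set of vectors $\vec e_1$, $\vec e_2$, $\vec e_3$, and write
$$\vec a = a_1\vec e_1 + a_2\vec e_2 + a_3\vec e_3 = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix}, \qquad \vec b = b_1\vec e_1 + b_2\vec e_2 + b_3\vec e_3 = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix},$$
then
$$\vec a\cdot\vec b = a_1b_1 + a_2b_2 + a_3b_3. \tag{5}$$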

The fact that (4) and (5) always give the same result is not obvious (the formulas look very different), and requires a proof. A very common proof relies on the law of cosines (it was given in Math 222; see also Problem 12.17).
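The agreement between (4) and (5) can at least be checked numerically. The sketch below works in the plane, where the angle $\theta$ can be found without the dot product, as the difference of the two polar angles given by `math.atan2`. This is a sanity check under that assumption, not a proof; the function names are our own.

```python
import math

def dot(a, b):
    """Dot product from components, formula (5)."""
    return sum(ai * bi for ai, bi in zip(a, b))

def dot_geometric(a, b):
    """Dot product from lengths and angle, formula (4), for plane vectors.

    theta is the difference of the polar angles of a and b; since cos is
    even and 2*pi-periodic, its sign and multiples of 2*pi do not matter.
    """
    theta = math.atan2(a[1], a[0]) - math.atan2(b[1], b[0])
    return math.hypot(*a) * math.hypot(*b) * math.cos(theta)

a, b = (3.0, 1.0), (-1.0, 2.0)
print(dot(a, b))            # -1.0
print(dot_geometric(a, b))  # -1.0 up to rounding error
```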

6.3. Algebraic properties of the dot product. The dot product has the following algebraic properties, which we will use very often throughout this course:
$$\vec a\cdot\vec b = \vec b\cdot\vec a \qquad \text{(commutative)}$$
$$s(\vec a\cdot\vec b) = (s\vec a)\cdot\vec b \qquad \text{(associative)}$$
$$(\vec a + \vec b)\cdot\vec c = \vec a\cdot\vec c + \vec b\cdot\vec c \qquad \text{(distributive)}$$

We will not prove these properties here. Proofs can be given if one starts either from the algebraic description of the dot product (5), or from the geometric description (4) (although the distributive property is more difficult to prove from the geometric description than from the algebraic description).

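The sign of the dot product tells us if the angle between two nonzero vectors is acute, obtuse, or if the vectors are perpendicular:
$$\vec a\perp\vec b \iff \vec a\cdot\vec b = 0 \tag{6a}$$
$$\vec a\cdot\vec b > 0 \iff \theta < \tfrac{\pi}{2} \tag{6b}$$
$$\vec a\cdot\vec b < 0 \iff \theta > \tfrac{\pi}{2}. \tag{6c}$$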
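Solving (4) for $\cos\theta$ gives the angle itself, and (6a) to (6c) classify it by sign alone. The sketch below does both in plain Python; the function names are hypothetical, and the exact test `d == 0` is meant for integer components (with floating point data one would compare against a small tolerance instead).

```python
import math

def dot(a, b):
    """Dot product from components, formula (5)."""
    return sum(ai * bi for ai, bi in zip(a, b))

def angle_between(a, b):
    """Angle in radians between two nonzero vectors, solved from (4)."""
    c = dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))
    # Rounding can push c slightly outside [-1, 1]; clamp before acos.
    return math.acos(max(-1.0, min(1.0, c)))

def angle_type(a, b):
    """Classify the angle by the sign of the dot product, (6a)-(6c)."""
    d = dot(a, b)
    if d == 0:
        return "perpendicular"
    return "acute" if d > 0 else "obtuse"

print(angle_type((1, 2, 0), (2, -1, 5)))            # perpendicular
print(math.degrees(angle_between((1, 0), (1, 1))))  # about 45.0
```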

7. The cross product

As with the dot product, the cross product of two vectors also has a geometric description, and a description in terms of components.

7.1. Geometric description of the cross product. Let $\vec a$ and $\vec b$ be two vectors in three dimensional space; then their cross product is the vector $\vec a\times\vec b$ that satisfies

• $\vec a\times\vec b$ is perpendicular to $\vec a$, and also to $\vec b$;
• the length of $\vec a\times\vec b$ is given by
$$\|\vec a\times\vec b\| = \|\vec a\|\,\|\vec b\|\sin\theta,$$
where $\theta$ is the angle between the vectors $\vec a$ and $\vec b$;
• the three vectors $\vec a$, $\vec b$, $\vec a\times\vec b$ satisfy the right hand rule: if on your right hand $\vec a$ is the index finger and $\vec b$ is the middle finger, then your thumb points in the direction of $\vec a\times\vec b$. See Figure 8.

Figure 8. The cross product: $\vec a\times\vec b$ is perpendicular to both $\vec a$ and $\vec b$; its direction follows from the right-hand rule.

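The length of the cross product of two vectors has a geometric interpretation. Namely, the quantity $\|\vec a\|\,\|\vec b\|\sin\theta$ is exactly the area of the parallelogram spanned by the vectors $\vec a$ and $\vec b$.

Figure: The parallelogram spanned by $\vec a$ and $\vec b$ has base $\|\vec b\|$ and height $\|\vec a\|\sin\theta$, so its area is height $\times$ base $= \|\vec a\|\,\|\vec b\|\sin\theta$.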

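7.2. Algebraic description of the cross product. If $\vec a$ and $\vec b$ are given in components, as in § 6.2, i.e. by
$$\vec a = a_1\vec e_1 + a_2\vec e_2 + a_3\vec e_3 = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix}, \qquad \vec b = b_1\vec e_1 + b_2\vec e_2 + b_3\vec e_3 = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix},$$
then
$$\vec a\times\vec b = \begin{pmatrix} a_2b_3 - a_3b_2 \\ a_3b_1 - a_1b_3 \\ a_1b_2 - a_2b_1 \end{pmatrix}.$$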
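The component formula translates line for line into code. Below is a plain-Python sketch (tuples for vectors, function names of our own choosing); the two printed dot products illustrate the first property in § 7.1, that $\vec a\times\vec b$ is perpendicular to both factors.

```python
def cross(a, b):
    """Cross product of two 3D vectors, from the component formula in 7.2."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2,
            a3 * b1 - a1 * b3,
            a1 * b2 - a2 * b1)

def dot(a, b):
    """Dot product from components, formula (5)."""
    return sum(ai * bi for ai, bi in zip(a, b))

a, b = (1, 2, 3), (4, 5, 6)
n = cross(a, b)
print(n)                      # (-3, 6, -3)
print(dot(n, a), dot(n, b))   # 0 0
```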

7.3. Algebraic properties of the cross product. The cross product has the distributive property, namely,
$$(\vec a + \vec b)\times\vec c = \vec a\times\vec c + \vec b\times\vec c \tag{7}$$
holds true for any three vectors $\vec a$, $\vec b$, $\vec c$.

The cross product is not commutative: $\vec a\times\vec b$ and $\vec b\times\vec a$ are not the same thing. Instead, we have:
$$\vec a\times\vec b = -\vec b\times\vec a. \tag{8}$$
Because of this property the cross product is said to be “anti-commutative.”


The associative property fails completely for the cross product: for most vectors $\vec a$, $\vec b$, $\vec c$ one has
$$(\vec a\times\vec b)\times\vec c \neq \vec a\times(\vec b\times\vec c). \tag{9}$$
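Properties (8) and (9) are easy to observe numerically. The sketch below, with arbitrarily chosen integer vectors, checks anti-commutativity and exhibits one triple of vectors for which the two groupings in (9) differ; a single example illustrates (9) but of course proves nothing in general. The helper names are our own.

```python
def cross(a, b):
    """Cross product from the component formula in 7.2."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def neg(v):
    """The vector -v."""
    return tuple(-vi for vi in v)

a, b, c = (1, 2, 0), (0, 1, 1), (1, 0, 1)

# Anti-commutativity (8): a x b equals -(b x a).
print(cross(a, b) == neg(cross(b, a)))  # True

# Associativity fails (9): the two groupings give different vectors.
print(cross(cross(a, b), c))  # (-1, -1, 1)
print(cross(a, cross(b, c)))  # (-2, 1, -1)
```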

If you need a vector that is perpendicular to two given vectors, take their cross product.

The length of the cross product $\vec a\times\vec b$ is the area of the parallelogram spanned by those vectors.

8. The triple product

Just as two vectors in the plane form a parallelogram, three vectors in space will form a shape called a parallelepiped. By definition, a parallelepiped is a solid body each of whose faces is a parallelogram.

Figure 9. A parallelepiped spanned by three vectors $\vec a$, $\vec b$, $\vec c$. Since the base of the parallelepiped is a parallelogram with edges $\vec b$ and $\vec c$, we have
$$\text{Area of base} = \|\vec b\times\vec c\|.$$
The height of the parallelepiped is $\|\vec a\|\cos\theta$, where $\theta$ is the angle between $\vec a$ and $\vec b\times\vec c$, and therefore the volume is given by
$$\text{Volume} = \text{height}\cdot\text{area of base} = \|\vec a\|\,\|\vec b\times\vec c\|\cos\theta = \vec a\cdot(\vec b\times\vec c).$$
This derivation applies to the situation on the left, where the vector $\vec a$ and the cross product $\vec b\times\vec c$ point to the same side of the base (they make an acute angle). If these vectors form an obtuse angle, as is the case on the right, then $\cos\theta < 0$, and the height is $-\|\vec a\|\cos\theta$. In that case one has
$$\text{Volume} = \text{height}\cdot\text{area of base} = -\|\vec a\|\,\|\vec b\times\vec c\|\cos\theta = -\vec a\cdot(\vec b\times\vec c).$$

If we are given three vectors $\vec a$, $\vec b$, and $\vec c$, then the volume of the parallelepiped they determine is given by the formula “Volume equals area of base times height.” In terms of the three vectors this is
$$V = \bigl|\vec a\cdot(\vec b\times\vec c)\bigr|. \tag{10}$$

A derivation is sketched in Figure 9. The quantity $\vec a\cdot(\vec b\times\vec c)$ (without the absolute values) is called the triple product of the three vectors $\vec a$, $\vec b$, and $\vec c$. Apart from its use in computing the volume of a parallelepiped, the triple product appears in many other contexts. At first sight the expression $\vec a\cdot(\vec b\times\vec c)$ suggests that the order in which the vectors appear is important, but cyclically permuting the three vectors does not change the triple product. One has
$$\vec a\cdot(\vec b\times\vec c) = \vec b\cdot(\vec c\times\vec a) = \vec c\cdot(\vec a\times\vec b)$$
for any $\vec a$, $\vec b$, $\vec c$. Interchanging two of the vectors, on the other hand, changes the sign of the triple product; for instance $\vec b\cdot(\vec a\times\vec c) = -\vec a\cdot(\vec b\times\vec c)$.
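The volume formula (10) and the cyclic symmetry of the triple product can be checked on a concrete example. A minimal sketch, with our own function names and arbitrarily chosen integer vectors:

```python
def cross(a, b):
    """Cross product from the component formula in 7.2."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def dot(a, b):
    """Dot product from components, formula (5)."""
    return sum(ai * bi for ai, bi in zip(a, b))

def triple(a, b, c):
    """Triple product a . (b x c)."""
    return dot(a, cross(b, c))

def volume(a, b, c):
    """Volume of the parallelepiped spanned by a, b, c, formula (10)."""
    return abs(triple(a, b, c))

a, b, c = (1, 0, 0), (1, 2, 0), (0, 1, 3)
print(triple(a, b, c), triple(b, c, a), triple(c, a, b))  # 6 6 6
print(triple(b, a, c))  # -6: swapping two vectors flips the sign
print(volume(b, a, c))  # 6: the volume does not depend on the order
```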

9. Determinants

For any four numbers $a, b, c, d$, one defines the $2\times2$ determinant to be
$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc. \tag{11}$$

One can also define $3\times3$ determinants. Namely, for any nine numbers $a_1, \dots, c_3$ one defines
$$\begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix} = a_1b_2c_3 - a_1b_3c_2 - a_2b_1c_3 + a_2b_3c_1 + a_3b_1c_2 - a_3b_2c_1. \tag{12}$$

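This can be written as
$$\begin{aligned} \begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix} &= a_1(b_2c_3 - b_3c_2) - a_2(b_1c_3 - b_3c_1) + a_3(b_1c_2 - b_2c_1) \\ &= a_1\begin{vmatrix} b_2 & c_2 \\ b_3 & c_3 \end{vmatrix} - a_2\begin{vmatrix} b_1 & c_1 \\ b_3 & c_3 \end{vmatrix} + a_3\begin{vmatrix} b_1 & c_1 \\ b_2 & c_2 \end{vmatrix}, \end{aligned} \tag{13}$$
where each coefficient in the first column is multiplied with the $2\times2$ determinant that remains after one deletes the row and column containing the coefficient.

Instead of expanding along the first column one can also expand along the first row:
$$\begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix} = a_1\begin{vmatrix} b_2 & c_2 \\ b_3 & c_3 \end{vmatrix} - b_1\begin{vmatrix} a_2 & c_2 \\ a_3 & c_3 \end{vmatrix} + c_1\begin{vmatrix} a_2 & b_2 \\ a_3 & b_3 \end{vmatrix}. \tag{14}$$
Many other mnemonic devices exist to remember how to compute a $3\times3$ determinant. A popular trick is “Sarrus' rule” (see Figure 10).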

One can also define larger determinants, i.e. $4\times4$, $5\times5$, etc., and generally $n\times n$ determinants. The theory, which is beyond the scope of this course, is treated in linear algebra courses such as Math 320, 340, or 341.
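The definitions (11), (12), and the expansion (14) can be compared directly in code. The sketch below stores a $3\times3$ determinant as a list of rows $(a_i, b_i, c_i)$, matching the layout used above; the function names are our own, and the six terms in `det3_explicit` are the same six products that Sarrus' rule reads off along the diagonals.

```python
def det2(a, b, c, d):
    """2x2 determinant |a b; c d| = ad - bc, formula (11)."""
    return a * d - b * c

def det3(m):
    """3x3 determinant expanded along the first row, as in (14).

    The rows of m are (a1, b1, c1), (a2, b2, c2), (a3, b3, c3).
    """
    (a1, b1, c1), (a2, b2, c2), (a3, b3, c3) = m
    return (a1 * det2(b2, c2, b3, c3)
            - b1 * det2(a2, c2, a3, c3)
            + c1 * det2(a2, b2, a3, b3))

def det3_explicit(m):
    """The same determinant from the six-term formula (12)."""
    (a1, b1, c1), (a2, b2, c2), (a3, b3, c3) = m
    return (a1*b2*c3 - a1*b3*c2 - a2*b1*c3
            + a2*b3*c1 + a3*b1*c2 - a3*b2*c1)

m = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
print(det3(m), det3_explicit(m))  # 39 39
```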

10. Determinants, the triple product, and the cross product

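If the numbers $a_1, \dots, c_3$ in a determinant happen to be the components of three vectors $\vec a$, $\vec b$, $\vec c$, i.e. if
$$\vec a = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix}, \quad \vec b = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}, \quad \vec c = \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix},$$
then the corresponding determinant is exactly the triple product:
$$\begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix} = \vec a\cdot(\vec b\times\vec c). \tag{15}$$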
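Identity (15) can be spot-checked by placing the components of $\vec a$, $\vec b$, $\vec c$ in the columns of a determinant and comparing with the triple product. A minimal sketch with our own function names and an arbitrarily chosen example:

```python
def cross(b, c):
    """Cross product from the component formula in 7.2."""
    b1, b2, b3 = b
    c1, c2, c3 = c
    return (b2 * c3 - b3 * c2, b3 * c1 - b1 * c3, b1 * c2 - b2 * c1)

def det3(m):
    """3x3 determinant from the six-term formula (12).

    The rows of m are (a1, b1, c1), (a2, b2, c2), (a3, b3, c3).
    """
    (a1, b1, c1), (a2, b2, c2), (a3, b3, c3) = m
    return (a1*b2*c3 - a1*b3*c2 - a2*b1*c3
            + a2*b3*c1 + a3*b1*c2 - a3*b2*c1)

a, b, c = (1, 2, 3), (0, 1, 4), (5, 6, 0)
# The components of a, b, c go into the columns, as in (15).
m = [[a[i], b[i], c[i]] for i in range(3)]
triple = sum(ai * wi for ai, wi in zip(a, cross(b, c)))
print(det3(m), triple)  # 1 1
```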


Figure 10. Computing $3\times3$ determinants. There are several shortcuts to remember how to compute a $3\times3$ determinant. Pictured here is “Sarrus' rule,” which tells us to copy the first two columns of the determinant to the right of the determinant, and read off the six terms in the determinant by following the diagonals: the products $a_1b_2c_3$, $a_2b_3c_1$, $a_3b_1c_2$ along one set of diagonals get a plus sign, and the products $a_3b_2c_1$, $a_1b_3c_2$, $a_2b_1c_3$ along the other set get a minus sign.

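Related to this is the following practical trick for computing the cross product of two column vectors. Given two column vectors $\vec b$ and $\vec c$ one can write their cross product as
$$\begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}\times\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} = \begin{vmatrix} \vec e_1 & b_1 & c_1 \\ \vec e_2 & b_2 & c_2 \\ \vec e_3 & b_3 & c_3 \end{vmatrix} = \begin{vmatrix} b_2 & c_2 \\ b_3 & c_3 \end{vmatrix}\vec e_1 - \begin{vmatrix} b_1 & c_1 \\ b_3 & c_3 \end{vmatrix}\vec e_2 + \begin{vmatrix} b_1 & c_1 \\ b_2 & c_2 \end{vmatrix}\vec e_3.$$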

The $3\times3$ determinant in this equation is unusual in that some of its entries are vectors instead of numbers. The intention of this notation is that one expand the determinant along the first column, as in (13), and then interpret the result as a vector.
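Carrying out that first-column expansion gives the components of $\vec b\times\vec c$ as three signed $2\times2$ determinants. The sketch below (function names are our own) does exactly that; on the example it reproduces the value given by the component formula of § 7.2 for the same vectors.

```python
def det2(a, b, c, d):
    """2x2 determinant |a b; c d| = ad - bc, formula (11)."""
    return a * d - b * c

def cross_via_minors(b, c):
    """Expand |e1 b1 c1; e2 b2 c2; e3 b3 c3| along its first column.

    The coefficients of e1, e2, e3 are the signed 2x2 minors.
    """
    b1, b2, b3 = b
    c1, c2, c3 = c
    return (det2(b2, c2, b3, c3),
            -det2(b1, c1, b3, c3),
            det2(b1, c1, b2, c2))

print(cross_via_minors((1, 2, 3), (4, 5, 6)))  # (-3, 6, -3)
```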

11. Defining equations for lines and planes

11.1. Lines. Let $\ell$ be a line in the plane, and suppose we know one point $A$ on the line, and that we also have a vector $\vec n$ that is perpendicular to the line (and we exclude $\vec n = \vec 0$). Such a vector is called a normal vector to the line. Given any other point $X$ in the plane we can form the vector $\overrightarrow{AX}$ and consider its dot product with the normal. We have
$$\vec n\cdot\overrightarrow{AX} = \|\vec n\|\,\|\overrightarrow{AX}\|\cos\theta,$$
where $\theta$ is the angle between the normal vector $\vec n$ and $\overrightarrow{AX}$.

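The combination $\|\overrightarrow{AX}\|\cos\theta$ is, up to its sign, the distance from the line $\ell$ to the point $X$: if $X$ lies on the side of $\ell$ at which the normal vector points then $\vec n\cdot\overrightarrow{AX} > 0$; if $X$ lies on the other side then $\vec n\cdot\overrightarrow{AX} < 0$. We therefore have the following formula for the distance between a point $X$ and the line $\ell$:
$$d = \frac{\vec n\cdot\overrightarrow{AX}}{\|\vec n\|}. \tag{16}$$
Strictly speaking, (16) gives a signed distance: $d$ is positive on the side of $\ell$ to which $\vec n$ points and negative on the other side, and the actual distance is $|d|$. When we use this equation to compute the distance from $X$ to $\ell$, it is good to recall that if $\vec x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ and $\vec a = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}$ are the position vectors of the points $X$ and $A$, then
$$\overrightarrow{AX} = \vec x - \vec a = \begin{pmatrix} x_1 - a_1 \\ x_2 - a_2 \end{pmatrix}.$$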

Figure: The distance $d$ from $X$ to $\ell$. If $\vec n\cdot\overrightarrow{AX} > 0$ then $d = \|\overrightarrow{AX}\|\cos\theta$; if $\vec n\cdot\overrightarrow{AX} < 0$ then $d = \|\overrightarrow{AX}\|\cos(\pi - \theta) = -\|\overrightarrow{AX}\|\cos\theta$.

Moreover, the length of the normal vector is $\|\vec n\| = \sqrt{n_1^2 + n_2^2}$, so we can rewrite (16) as
$$d = \frac{n_1(x_1 - a_1) + n_2(x_2 - a_2)}{\sqrt{n_1^2 + n_2^2}}.$$
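The coordinate form of (16) is straightforward to evaluate. A minimal sketch (the function name is hypothetical) that returns the signed value, so that its absolute value is the distance from $X$ to $\ell$:

```python
import math

def signed_distance_to_line(n, a, x):
    """Signed distance (16) from the point x to the line through a with normal n.

    Positive when x lies on the side of the line that n points to,
    negative on the other side, zero on the line itself.
    """
    ax = (x[0] - a[0], x[1] - a[1])  # the vector AX = x - a
    return (n[0] * ax[0] + n[1] * ax[1]) / math.hypot(n[0], n[1])

n, a = (3, 4), (1, 1)
print(signed_distance_to_line(n, a, (4, 5)))    # 5.0
print(signed_distance_to_line(n, a, (-2, -3)))  # -5.0
```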

This last formula is more impressive than (16), but it is better to remember (16).

The equation for the distance from any point $X$ to a given line $\ell$ is also important because it gives us the defining equation for the line $\ell$. The defining equation is an equation that tells us for any given point $X$ in the plane if that point is on the line or not. Since $X$ is on $\ell$ exactly when the distance from $\ell$ to $X$ vanishes, it follows from (16) that $X$ is on $\ell$ if and only if
$$\vec n\cdot\overrightarrow{AX} = 0. \tag{17}$$

We can again rewrite this equation in a few different ways. If we want to write it in terms of the position vectors of $A$ and $X$, then we get
$$\vec n\cdot(\vec x - \vec a) = 0, \quad\text{i.e.:}\quad \vec n\cdot\vec x = \vec n\cdot\vec a.$$
Written without vectors, but in terms of the coordinates of the points $A$, $X$, and the components of the normal vector $\vec n$, we can write this last version of our equation as
$$n_1x_1 + n_2x_2 = n_1a_1 + n_2a_2.$$
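The last form of the defining equation lends itself to a quick membership test. The sketch below, with hypothetical function names, computes the coefficients $(n_1, n_2, c)$ with $c = \vec n\cdot\vec a$ and checks whether a point satisfies $n_1x_1 + n_2x_2 = c$; the exact comparison assumes integer data.

```python
def line_equation(n, a):
    """Coefficients (n1, n2, c) of n1*x1 + n2*x2 = c, where c = n . a."""
    return n[0], n[1], n[0] * a[0] + n[1] * a[1]

def on_line(n, a, x):
    """True when x satisfies the defining equation (exact test, integer data)."""
    n1, n2, c = line_equation(n, a)
    return n1 * x[0] + n2 * x[1] == c

print(line_equation((2, -1), (1, 3)))    # (2, -1, -1)
print(on_line((2, -1), (1, 3), (0, 1)))  # True
print(on_line((2, -1), (1, 3), (2, 2)))  # False
```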

11.2. Planes. We can repeat the derivation of the distance from a point to a line in the plane and derive a formula for the distance from a point in three dimensional space to a given plane. The drawings are harder to make (at first only, practice makes perfect!), but the resulting formulas are the same.

The distance from a point $X$ to a plane $P$ is given by equation (16), where $\vec n$ is a normal vector to the plane (a vector that is perpendicular to the plane), and $A$ is some point on the plane that we happen to know.


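Figure: The distance $d$ from a point $X$ to a plane through $A$ with normal vector $\vec n$: $d = \|\overrightarrow{AX}\|\cos\theta$, while $\vec n\cdot\overrightarrow{AX} = \|\vec n\|\,\|\overrightarrow{AX}\|\cos\theta$.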
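In three dimensions the same recipe gives the distance to a plane. The sketch below takes the absolute value of (16) so that it always returns a nonnegative distance; the function name and the example plane (through $A = (0, 0, 1)$ with normal $\vec n = (1, 2, 2)$) are our own choices for illustration.

```python
import math

def distance_to_plane(n, a, x):
    """Distance from the point x to the plane through a with normal n.

    This is |n . (x - a)| / ||n||, i.e. the absolute value of (16).
    """
    ax = [xi - ai for xi, ai in zip(x, a)]
    d = sum(ni * axi for ni, axi in zip(n, ax))
    return abs(d) / math.sqrt(sum(ni * ni for ni in n))

# Plane through A = (0, 0, 1) with normal n = (1, 2, 2), i.e. x + 2y + 2z = 2.
print(distance_to_plane((1, 2, 2), (0, 0, 1), (1, 2, 3)))  # 3.0
```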

12. Problems

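1. (a) Simplify the following
$$\vec a = \begin{pmatrix} 1 \\ -2 \\ 3 \end{pmatrix} + 3\begin{pmatrix} 0 \\ 1 \\ 3 \end{pmatrix}, \qquad \vec b = 12\begin{pmatrix} 1 \\ 1/3 \end{pmatrix} - 3\begin{pmatrix} 4 \\ 1 \end{pmatrix},$$
$$\vec c = (1 + t)\begin{pmatrix} 1 \\ 1 - t \end{pmatrix} - t\begin{pmatrix} 1 \\ -t \end{pmatrix}, \qquad \vec d = t\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + t^2\begin{pmatrix} 0 \\ -1 \\ 2 \end{pmatrix} - \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.$$
(b) Write the vectors from part (a) using Gibbs' notation, i.e. write them in terms of $\vec\imath$, $\vec\jmath$, $\vec k$. (See § 5.)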

2. If $\vec a$, $\vec b$, $\vec c$ are as in the previous problem, then which of the following expressions mean anything? Compute those expressions that are well defined.
(a) $\vec a + \vec b$  (b) $\vec b + \vec c$  (c) $\pi\vec a$
(d) $\vec b^{\,2}$  (e) $\vec b/\vec c$  (f) $\|\vec a\| + \|\vec b\|$
(g) $\|\vec b\|^2$  (h) $\vec b/\|\vec c\|$

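3. Let $\vec a = \begin{pmatrix} 1 \\ -2 \\ 2 \end{pmatrix}$ and $\vec b = \begin{pmatrix} 2 \\ -1 \\ 1 \end{pmatrix}$. Compute:
(a) $\|\vec a\|$  (b) $2\vec a$  (c) $\|2\vec a\|^2$
(d) $\vec a + \vec b$  (e) $3\vec a - \vec b$ •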

4. Given: points $A(2, 1)$ and $B(-1, 4)$. Compute the vector $\overrightarrow{AB}$. Is $\overrightarrow{AB}$ a position vector? •

5. Given: points $A(2, 1)$, $B(3, 2)$, $C(4, 4)$ and $D(5, 2)$. Question: Is $ABCD$ a parallelogram? •

6. Given: points $A(0, 2, 1)$, $B(0, 3, 2)$, $C(4, 1, 4)$ and $D$.
(a) If $ABCD$ is a parallelogram, then what are the coordinates of the point $D$? •
(b) If $ABDC$ is a parallelogram, then what are the coordinates of the point $D$? •

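7. You are given three points in the plane: $A$ has coordinates $(2, 3)$, $B$ has coordinates $(-1, 2)$ and $C$ has coordinates $(4, -1)$.
(a) Compute the vectors $\overrightarrow{AB}$, $\overrightarrow{BA}$, $\overrightarrow{AC}$, $\overrightarrow{CA}$, $\overrightarrow{BC}$ and $\overrightarrow{CB}$.
(b) Find the points $P$, $Q$, $R$ and $S$ whose position vectors are $\overrightarrow{AB}$, $\overrightarrow{BA}$, $\overrightarrow{AC}$, and $\overrightarrow{BC}$, respectively. Make a precise drawing.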

8. Explain how you can use the dot product to find the angle between the vectors $\vec a = 2\vec\imath - 3\vec\jmath$ and $\vec b = \vec\jmath + \vec k$.

Figure 11. Figure for problem 12.10: a cube with vertices $A$, $B$, $C$, $D$, $E$, $F$, $G$, $H$.

9. For which value(s) of the number $s$ are the vectors $\vec a = \begin{pmatrix} s \\ 1 - s \end{pmatrix}$ and $\vec b = \begin{pmatrix} 2 \\ 3 \end{pmatrix}$ perpendicular? For which values of $s$ do they make an acute angle? •

10. Figure 11 shows a cube whose sides have length 1.
Choose $A$ to be the origin, and let the x, y, and z axes be along the sides $AB$, $AD$, and $AE$, respectively.
(a) Draw the vectors $\vec e_1$, $\vec e_2$, and $\vec e_3$ in the figure.
(b) Find a normal vector to the plane through the points $B$, $D$, and $E$.
(c) Draw the plane through $ACH$ (or at least the portion of that plane that lies inside the cube). Find a normal to the plane $ACH$.
(d) Find the angle between the two planes $BDE$ and $ACH$. (The angle between two planes is the same as the angle between their normal vectors, i.e. to find the angle between two planes find a normal vector for each of the planes and compute the angle between these two vectors.)
(e) Find the angle between the two planes $BDE$ and $HFC$.

11. (a) Draw two vectors $\vec a$ and $\vec b$ for which $\vec a$ has length 3, $\vec b$ has length 5, and for which $\vec a\cdot\vec b = -12$. How many solutions are there? •
(b) Can there be two vectors $\vec a$ and $\vec b$ whose lengths are $\|\vec a\| = 3$ and $\|\vec b\| = 5$, and whose inner product is $\vec a\cdot\vec b = 25$? •

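12. Compute
$$\vec a = (\vec\imath\times\vec\jmath)\times\vec\jmath \quad\text{and}\quad \vec b = \vec\imath\times(\vec\jmath\times\vec\jmath).$$
What does your answer say about the associative property for the cross product? (See § 7.3.)
What about $\vec c = (\vec\imath\times\vec\jmath)\times\vec k$ and $\vec d = \vec\imath\times(\vec\jmath\times\vec k)$?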

13. Which of the following vector equations are true for any pair of vectors $\vec a$ and $\vec b$? Either give a proof (using the algebraic properties or the algebraic or geometric descriptions) or find a counterexample.
(a) $(\vec a + \vec b)\cdot(\vec a - \vec b) = \|\vec a\|^2 - \|\vec b\|^2$? •
(b) If $\vec a\perp\vec b$ then $\|\vec a + \vec b\|^2 = \|\vec a\|^2 + \|\vec b\|^2$? •
(c) If $\vec a\perp\vec b$ then $\|\vec a - \vec b\|^2 = \|\vec a\|^2 - \|\vec b\|^2$? •


14. True or False:
(a) If $\vec a\perp\vec b$ and also $\vec b\perp\vec c$ then $\vec a\perp\vec c$?
(b) If $\vec a\perp\vec b$ and also $\vec a\perp\vec c$ then $\vec a\perp(\vec b + \vec c)$?
(c) If $\vec a\perp\vec b$ and also $\vec b\perp\vec c$ then $\vec b\perp(\vec a - \vec c)$?
(d) If $\vec a\perp(\vec b + \vec c)$ and also $\vec a\perp(\vec b - \vec c)$ then $\vec a\perp\vec b$?

15. Simplify the following expressions
(a) $(\vec a + \vec b)\times(\vec a + \vec b)$ •
(b) $(\vec a + \vec b + \vec c)\times(\vec a + \vec b + \vec c)$ •
(c) $(\vec a - \vec b)\times(\vec a + \vec b)$ •
(d) $(\vec a + \vec b - \vec c)\times(\vec a - \vec b + \vec c)$
(e) $(\vec a + \vec b - \vec c)\cdot(\vec a - \vec b + \vec c)$

16. This problem is about “cross division,” i.e. can you solve $\vec a\times\vec b = \vec c$ for $\vec b$ if you know $\vec a$ and $\vec c$?
(a) Let $\vec a = \vec e_1 - \vec e_3$, $\vec c = \vec e_1 + 3\vec e_2 + 2\vec e_3$. Find a vector $\vec b$ for which $\vec a\times\vec b = \vec c$, if there is such a thing. (Hint: if $\vec c = \vec a\times\vec b$, then what do you know about $\vec a\cdot\vec c$?) •
(b) Let $\vec a = 2\vec e_1 - \vec e_3$, and $\vec c = \vec e_1 + 3\vec e_2 + 2\vec e_3$. Find a vector $\vec b$ for which $\vec a\times\vec b = \vec c$, if such a thing exists. •

17. The law of cosines says that in a triangle $\triangle ABC$ for which you know the sides $AB$ and $AC$, as well as the angle $\angle A$, the length of the opposing side $BC$ is given by
$$(BC)^2 = (AB)^2 + (AC)^2 - 2(AB)(AC)\cos\angle A.$$
Show how you can use the dot product to (re)prove this law.
Hint: consider the vector equation $\overrightarrow{BC} = \overrightarrow{AC} - \overrightarrow{AB}$. You will need both the geometric description (4) of the dot product, and the algebraic properties from § 6.3.

CHAPTER 2

Parametric curves and vector functions

1. Vector functions

So far in calculus we have only considered functions $y = f(x)$ where both the independent variable $x$ and the dependent variable $y$ are real numbers.

A vector function is a function of one variable whose values are vectors instead of numbers. One way to specify a vector function is to say what its components are:
$$\vec x(t) = \begin{pmatrix} x(t) \\ y(t) \\ z(t) \end{pmatrix} = x(t)\vec e_1 + y(t)\vec e_2 + z(t)\vec e_3.$$

2. Using vector functions to describe motion

One way to visualize a vector function $\vec x(t)$ is to think of the vector $\vec x(t)$ for any given value of $t$ as the position vector of some point in space (or the plane, if $\vec x(t)$ is a two-dimensional vector). In other words, we represent the vector $\vec x(t)$ as an arrow starting at the origin, and ending at some point $X(t)$ whose coordinates are $(x(t), y(t), z(t))$:
$$\vec x(t) = \overrightarrow{OX(t)}.$$
As $t$ varies, the point $X(t)$ moves around and traces out a curve. Such a curve is called a parametrized curve, or a parametric curve. The quantity $t$ is called the parameter.

We will now take a look at some examples of parametric curves.

Figure 1. A parametric curve: as the parameter $t$ changes, the vector $\vec x(t)$ will also move. Keeping the initial point of the vector $\vec x(t)$ at the origin $O$, the endpoint $X(t)$ traces out a space curve.

3. Lines

Consider the parametric curve given by
$$\vec x(t) = \vec a + t\vec v \tag{18}$$
where $\vec a$ and $\vec v$ are given constant vectors. As before we let $X(t)$ be the point with $\vec x(t) = \overrightarrow{OX(t)}$, i.e. $\vec x(t)$ is the position vector of the point $X(t)$, and as $t$ changes, $X(t)$ traces out the parametric curve.

To see what the parametric curve looks like, we let $A$ be the point with $\overrightarrow{OA} = \vec a$; then, since
$$\overrightarrow{OX(t)} = \overrightarrow{OA} + \overrightarrow{AX(t)},$$
it follows from (18) that $\overrightarrow{AX(t)} = t\vec v$. Now consider going from the origin $O$ to the point $X(t)$ in two steps: first move from $O$ to the point $A$, then go from $A$ to $X(t)$. The displacement in the second step is $\overrightarrow{AX(t)} = t\vec v$. Changing $t$ will then make the point $X(t)$ slide along the line through the point $A$ in the direction of $\vec v$.

Figure 2. Vector form of linear motion given by $\vec x(t) = \vec a + t\vec v$.

We say that $\vec x(t)$ given by (18) describes motion with constant velocity, whose velocity vector is $\vec v$.
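A parametrized line is easy to tabulate. The sketch below evaluates (18) at a few parameter values; the vectors $\vec a$ and $\vec v$ are arbitrary choices for illustration, and `line_point` is our own name. Equal steps in $t$ produce equal displacements $\vec v$, which is what “constant velocity” means here.

```python
def line_point(a, v, t):
    """Position x(t) = a + t v on the line (18)."""
    return tuple(ai + t * vi for ai, vi in zip(a, v))

a, v = (1, 0, 2), (2, -1, 0)
for t in (0, 1, 2):
    print(t, line_point(a, v, t))
# 0 (1, 0, 2)
# 1 (3, -1, 2)
# 2 (5, -2, 2)
```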

4. Circular motion

For given constants $R > 0$ and $\omega$ we consider the vector function
$$\vec x(t) = R\cos\omega t\,\vec e_1 + R\sin\omega t\,\vec e_2 = \begin{pmatrix} R\cos\omega t \\ R\sin\omega t \end{pmatrix}. \tag{19}$$

The corresponding point is $X(t) = (R\cos\omega t, R\sin\omega t)$. It lies on the circle of radius $R$ with center at the origin, and the angle subtended by $OX(t)$ and the positive x-axis is exactly $\omega t$.

If $\omega > 0$ then as $t$ increases, the angle $\omega t$ increases and the point $X(t)$ goes around the circle in the counter-clockwise direction. If $\omega < 0$ then $X(t)$ goes around in the clockwise direction.

The number $\omega$ is the rate of increase of the angle $\omega t$, and is called the angular velocity of the motion.

Figure 3. Circular motion with angular velocity $\omega$.
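The description of (19) can be checked numerically. The following is a minimal Python sketch (the helper name `circle_point` and the sample values of $R$, $\omega$, $t$ are our own choices, not part of the notes): it evaluates $X(t)$ and confirms that the point stays at distance $R$ from the origin, and that its polar angle, as returned by `math.atan2`, equals $\omega t$ whenever $\omega t$ lies in $(-\pi, \pi]$.

```python
import math

def circle_point(R, omega, t):
    """Return X(t) = (R cos(omega t), R sin(omega t)), the point from (19)."""
    return (R * math.cos(omega * t), R * math.sin(omega * t))

R, omega = 2.0, 0.5                      # illustrative radius and angular velocity
for t in [0.0, 1.0, 2.5, 4.0]:
    x, y = circle_point(R, omega, t)
    radius = math.hypot(x, y)            # distance from the origin O
    angle = math.atan2(y, x)             # polar angle, in (-pi, pi]
    print(f"t={t}: |OX|={radius:.6f}, angle={angle:.6f}, omega*t={omega * t:.6f}")
```

Taking a negative $\omega$ makes the sampled angles negative, i.e. the point moves clockwise, in line with the text.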

5. The cycloid

The cycloid is the curve we get if we put a (bicycle) wheel on the ground, mark the point on the tire that touches the ground, and follow this point as we roll the wheel forward. If we call the point $X$, then it depends on the angle $\theta$ through which the wheel has turned since $X$ was on the ground. Figure 4 provides a derivation of the vector function $\vec x(\theta) = \overrightarrow{OX(\theta)}$ that describes the cycloid. The result is

$$\vec x(\theta) = \begin{pmatrix} R\theta - R\sin\theta \\ R - R\cos\theta \end{pmatrix}. \tag{20}$$

Figure 4. The cycloid. A wheel of radius $R$ rolls over the $x$-axis. Initially the wheel touches the $x$-axis at the origin $O$. The cycloid is the curve traced out by a point $X$ on the wheel.

Derivation of the cycloid motion. The arc $AX$ and the line segment $OA$ have the same length. Since $AX$ has length $R\theta$, the $x$-coordinates of the points $A$, $B$, and $C$ are $R\theta$. The right triangle $CXB$ has hypotenuse $R$, so the lengths of $XB$ and $CB$ are $R\sin\theta$ and $R\cos\theta$, respectively. Therefore the coordinates of the point $X$ are $x = R\theta - R\sin\theta$ and $y = R - R\cos\theta$.
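A quick numerical check of (20), as a Python sketch (the helper names and sample values are ours): by the derivation above, the center $C$ of the wheel is at $(R\theta, R)$, so the point $X$ given by (20) should always be at distance $R$ from $C$. The sketch also shows that $X$ is at the top of the wheel when $\theta = \pi$ and back on the ground when $\theta = 2\pi$.

```python
import math

def cycloid_point(R, theta):
    """The point X on the rolling wheel, as given by (20)."""
    return (R * theta - R * math.sin(theta), R - R * math.cos(theta))

def wheel_center(R, theta):
    """The center C of the wheel: directly above A = (R*theta, 0), at height R."""
    return (R * theta, R)

R = 1.5                                   # illustrative wheel radius
for theta in [0.0, math.pi / 3, math.pi, 2 * math.pi]:
    x, y = cycloid_point(R, theta)
    cx, cy = wheel_center(R, theta)
    dist = math.hypot(x - cx, y - cy)     # should equal R: X lies on the tire
    print(f"theta={theta:.4f}: X=({x:.4f}, {y:.4f}), |CX|={dist:.6f}")
```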

6. The helix

When we walk up a spiral staircase we are tracing out a helix: we are going around in circles, and moving upward at the same time. The parametric curve that does this (and that has the $z$-axis as its central axis) is given by

$$\vec x(\theta) = \begin{pmatrix} R\cos\theta \\ R\sin\theta \\ a\theta \end{pmatrix}, \tag{21}$$

or: $\vec x(\theta) = R\cos\theta\,\vec e_1 + R\sin\theta\,\vec e_2 + a\theta\,\vec e_3$.

Here $R > 0$ is the radius of the helix, i.e. the radius of the circle on the ground above which the helix lies; the number $a$ represents the rate at which the helix goes up: the height $a\theta$ increases by $a$ for every unit increase of $\theta$.

Figure 5. The helix. The point $X$ traces out a helix: it sits at a height $a\theta$ above the point $Y$, while $Y$ runs around on a circle of radius $R$; here $\theta = \angle AOY$.
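In the same spirit, a small Python sketch of (21) (helper name and sample values are ours) confirms the picture of the helix: the “shadow” $Y = (R\cos\theta, R\sin\theta, 0)$ of the point $X$ stays on the circle of radius $R$, while the height $a\theta$ grows linearly, so after one full turn ($\theta = 2\pi$) the shadow is back where it started and the point has risen by $2\pi a$.

```python
import math

def helix_point(R, a, theta):
    """The point X(theta) on the helix (21)."""
    return (R * math.cos(theta), R * math.sin(theta), a * theta)

R, a = 1.0, 0.25                           # illustrative radius and rise rate
for theta in [0.0, math.pi / 2, math.pi, 2 * math.pi]:
    x, y, z = helix_point(R, a, theta)
    shadow_radius = math.hypot(x, y)       # distance of Y = (x, y, 0) from the z-axis
    print(f"theta={theta:.4f}: shadow radius={shadow_radius:.6f}, height={z:.6f}")
```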

7. The derivative of a vector function

For a function $y = f(x)$ of one variable we had two ways of describing the derivative: on one hand we had a geometric description of $f'(x)$ as “the slope of the tangent to the graph,” and on the other we could describe $f'(x)$ in terms of a difference quotient, i.e.

$$f'(x) = \lim_{\Delta x\to 0}\frac{f(x+\Delta x) - f(x)}{\Delta x}.$$

For vector functions we can imitate both descriptions. We begin with the formal definition in terms of limits and then proceed to the geometric description, in which we interpret the derivative as the “instantaneous velocity vector.”

Definition. If $\vec x(t)$ is a vector function, then we set

$$\vec x\,'(t) \overset{\text{def}}{=} \lim_{\Delta t\to 0}\frac{\vec x(t+\Delta t) - \vec x(t)}{\Delta t}. \tag{22}$$

For (22) to make sense we would have to define what the limit of a vector function is. This can be done, but we will not go into the precise definitions in this course. More important for our use is that if the components of a vector function $\vec x(t)$ are given, then the derivative can be computed by just differentiating those components:

$$\vec x\,'(t) = \begin{pmatrix} x'(t) \\ y'(t) \\ z'(t) \end{pmatrix}, \quad\text{or}\quad \vec x\,'(t) = x'(t)\,\vec e_1 + y'(t)\,\vec e_2 + z'(t)\,\vec e_3. \tag{23}$$

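As with ordinary functions of one variable we will use Leibniz’ notation for the derivative whenever it seems convenient. Thus the following are equivalent ways of expressing the same derivative:

$$\vec a\,'(t) = \frac{d\vec a(t)}{dt} = \frac{d}{dt}\vec a(t).$$

Example. For instance,

$$\vec x(\theta) = \begin{pmatrix} \cos\theta \\ 0 \\ \theta \end{pmatrix} = \cos\theta\,\vec e_1 + \theta\,\vec e_3$$

defines a vector function. Here we have called the independent variable $\theta$ instead of $t$. The derivative of this vector function is

$$\frac{d\vec x}{d\theta} = \frac{d}{d\theta}\begin{pmatrix} \cos\theta \\ 0 \\ \theta \end{pmatrix} = \begin{pmatrix} -\sin\theta \\ 0 \\ 1 \end{pmatrix} = -\sin\theta\,\vec e_1 + \vec e_3.$$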
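The componentwise rule (23) can be compared numerically with the limit (22). The Python sketch below (the helper names and the step size $h$ are our own choices) approximates the limit by the symmetric difference quotient $(\vec x(t+h) - \vec x(t-h))/(2h)$, a standard variant of the one-sided quotient in (22) that is more accurate for a fixed $h$, and compares the result with $(-\sin\theta, 0, 1)$ from the example above.

```python
import math

def x_vec(theta):
    """The example vector function x(theta) = (cos theta, 0, theta)."""
    return (math.cos(theta), 0.0, theta)

def vector_derivative(f, t, h=1e-6):
    """Approximate f'(t) by (f(t+h) - f(t-h)) / (2h), component by component."""
    forward, backward = f(t + h), f(t - h)
    return tuple((p - q) / (2 * h) for p, q in zip(forward, backward))

theta = 0.8
approx = vector_derivative(x_vec, theta)
exact = (-math.sin(theta), 0.0, 1.0)      # componentwise derivative from the text
print(approx)
print(exact)
```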

8. The derivative as velocity vector

Suppose the motion of some point $X(t)$ in space is described by its position vector function $\vec x(t)$. Let us try to define the instantaneous velocity of the point. This velocity should have magnitude (“how fast the point is moving”) and also direction (“which way is the point going?”). The velocity should therefore be a vector. To see which vector, we go back to the notion that “velocity” is always “displacement divided by time.”

Figure 6. The vector function $\vec x(t)$ traces out a curve in space. The vector $\vec x(t)$ is the position vector of a point $X(t)$ on this curve. As we increase time from $t$ to $t + \Delta t$, the point $X(t)$ moves. The displacement of the point $X(t)$ is given by $\Delta\vec x = \vec x(t+\Delta t) - \vec x(t)$. The average velocity vector during this displacement is “displacement/time”, i.e. $\Delta\vec x/\Delta t$. If we let $\Delta t \to 0$, then the average velocity becomes the instantaneous velocity at time $t$: $\vec v = \lim_{\Delta t\to 0}\Delta\vec x/\Delta t = \vec x\,'(t)$. This vector is tangent to the curve traced out by the vector function $\vec x(t)$. We call it a tangent vector.

We consider two instants in time, say, time $t$ and time $t + \Delta t$. Then the position vectors of the point $X$ at these two different times are $\vec x(t)$ and $\vec x(t+\Delta t)$. The displacement of the point $X$ between these two times is then

$$\Delta\vec x = \vec x(t+\Delta t) - \vec x(t)$$

(see Figure 6). We say that the average velocity over the time interval from $t$ to $t + \Delta t$ is “the displacement divided by $\Delta t$,” i.e.

$$\vec v_{\text{average}} = \frac{\vec x(t+\Delta t) - \vec x(t)}{\Delta t}.$$

Note that the average velocity is a vector. If we write it out in components, we get a much larger formula:

$$\vec v_{\text{average}} = \begin{pmatrix} \dfrac{x(t+\Delta t) - x(t)}{\Delta t} \\[2ex] \dfrac{y(t+\Delta t) - y(t)}{\Delta t} \\[2ex] \dfrac{z(t+\Delta t) - z(t)}{\Delta t} \end{pmatrix}.$$

One big advantage of using vector notation is that many formulas simplify considerably when written in terms of vectors.

To get the instantaneous velocity, we do the same thing as in one variable calculus: we take the limit as $\Delta t \to 0$ of the average velocity over the time interval from $t$ to $t + \Delta t$. Thus we get

$$\vec v(t) = \lim_{\Delta t\to 0}\frac{\vec x(t+\Delta t) - \vec x(t)}{\Delta t} \overset{\text{def}}{=} \frac{d\vec x}{dt}. \tag{24}$$

In terms of components this derivative is

$$\vec x\,'(t) = \frac{d\vec x}{dt} = \begin{pmatrix} x'(t) \\ y'(t) \\ z'(t) \end{pmatrix}.$$

Thus the velocity vector of any given vector function $\vec x(t)$ is the same as the derivative of this vector function.

9. Acceleration

Having found the velocity vector of a point $X(t)$ whose position vector is a given vector function $\overrightarrow{OX(t)} = \vec x(t)$, we can also define the acceleration vector of the moving point. By definition, the acceleration vector is the derivative of the velocity vector, i.e.

$$\vec a(t) = \frac{d\vec v}{dt} = \frac{d^2\vec x}{dt^2} = \begin{pmatrix} x''(t) \\ y''(t) \\ z''(t) \end{pmatrix}. \tag{25}$$

This definition is entirely analogous to the definition of acceleration (“$a = \frac{dv}{dt}$”) from first semester calculus. The only difference is that, here, the position, velocity, and acceleration all have directions in addition to magnitudes: they are vectors.


Newton’s famous law relating forces and acceleration continues to hold. If a point $X(t)$ moves according to some vector function $\vec x(t)$, then the force acting on this point is a vector (it has magnitude and direction), and, according to Newton, it is given by

$$\vec F = m\vec a = m\frac{d\vec v}{dt} = m\frac{d^2\vec x}{dt^2}, \tag{26}$$

where $m$ is the mass of the object at the point $X(t)$ whose motion we are considering. It is always assumed to be a positive number.

Note that according to this law, the absence of forces, i.e. $\vec F = \vec 0$, is the same as $\frac{d\vec v}{dt} = \vec 0$, i.e. no force acts on the point if and only if its velocity vector is constant. Here “constant” means constant magnitude and constant direction.

10. The differentiation rules

Just as with ordinary derivatives, the derivatives of vector functions satisfy certain rules, such as the product rule. The purpose of these rules is not the same as in one variable calculus. There we used sum, product, quotient and chain rules to compute derivatives of given functions without having to fall back on the definition of a derivative all the time. For vector functions we do not need such rules, because we can differentiate them by simply differentiating each of their components (see the example in § 7). Instead, the differentiation rules for vector functions are mostly used to gain insight and establish general facts about vector functions, a number of which we will see shortly.

10.1. The sum rule. The analog of the sum rule (“derivative of the sum is the sum of the derivatives”) looks exactly like the ordinary sum rule. It says that for any two vector functions $\vec a(t)$ and $\vec b(t)$ one has

$$\frac{d}{dt}\bigl(\vec a(t) \pm \vec b(t)\bigr) = \frac{d\vec a(t)}{dt} \pm \frac{d\vec b(t)}{dt}.$$

10.2. The many product rules. There is no quotient rule for vector functions, simply because we have no way of dividing vectors. On the other hand we have two ways of multiplying vectors, and we can also multiply vectors and numbers, so there are three different product rules. Fortunately they all look like the product rule from first semester calculus.

If $\vec a(t)$ and $\vec b(t)$ are vector functions, and if $f(t)$ is a function, then

$$\begin{aligned} \frac{d\,\bigl(\vec a(t)\cdot\vec b(t)\bigr)}{dt} &= \frac{d\vec a(t)}{dt}\cdot\vec b(t) + \vec a(t)\cdot\frac{d\vec b(t)}{dt},\\ \frac{d\,\bigl(\vec a(t)\times\vec b(t)\bigr)}{dt} &= \frac{d\vec a(t)}{dt}\times\vec b(t) + \vec a(t)\times\frac{d\vec b(t)}{dt},\\ \frac{d\,\bigl(f(t)\,\vec a(t)\bigr)}{dt} &= \frac{df(t)}{dt}\,\vec a(t) + f(t)\,\frac{d\vec a(t)}{dt}. \end{aligned}$$

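In spite of the fact that these rules “look right,” they could still be wrong, so to be sure we would have to prove them. The proofs are very straightforward. Here is a short proof for the product rule involving the dot product, written out for vectors in the plane (for vectors in space there is one more term, $a_3b_3$, which is handled in exactly the same way). To shorten the formulas we omit the “$(t)$” from all functions:

$$\begin{aligned} \frac{d\,(\vec a\cdot\vec b)}{dt} &= \frac{d}{dt}\bigl(a_1b_1 + a_2b_2\bigr) = \frac{d(a_1b_1)}{dt} + \frac{d(a_2b_2)}{dt} \\ &= \frac{da_1}{dt}b_1 + a_1\frac{db_1}{dt} + \frac{da_2}{dt}b_2 + a_2\frac{db_2}{dt} && \text{ordinary product rule}\\ &= \frac{da_1}{dt}b_1 + \frac{da_2}{dt}b_2 + a_1\frac{db_1}{dt} + a_2\frac{db_2}{dt} && \text{switch terms around}\\ &= \frac{d\vec a}{dt}\cdot\vec b + \vec a\cdot\frac{d\vec b}{dt}. && \text{recognize the dot products} \end{aligned}$$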
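A numerical spot check of the dot-product rule, as a Python sketch: we pick two concrete vector functions (the particular $\vec a(t)$ and $\vec b(t)$ below are arbitrary choices for illustration), differentiate $\vec a\cdot\vec b$ with a symmetric difference quotient, and compare with the right-hand side of the rule computed from the exact component derivatives.

```python
import math

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

# Two arbitrary vector functions, together with their exact derivatives.
def a(t):
    return (math.cos(t), t ** 2, 1.0)

def da(t):
    return (-math.sin(t), 2 * t, 0.0)

def b(t):
    return (t, math.exp(t), math.sin(t))

def db(t):
    return (1.0, math.exp(t), math.cos(t))

def lhs(t, h=1e-6):
    """d/dt (a . b), approximated by a symmetric difference quotient."""
    return (dot(a(t + h), b(t + h)) - dot(a(t - h), b(t - h))) / (2 * h)

def rhs(t):
    """a' . b + a . b', the right-hand side of the product rule."""
    return dot(da(t), b(t)) + dot(a(t), db(t))

for t in [0.0, 0.5, 1.2]:
    print(f"t={t}: difference quotient {lhs(t):.8f}, product rule {rhs(t):.8f}")
```

Agreement at a few sample points is of course no proof; it only guards against sign or ordering slips when applying the rule.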

11. Vector functions of constant length

As an immediate application of the product rule for the dot product we prove the following fact about vector functions whose length does not change, i.e. vector functions $\vec a(t)$ that change their direction, but not their length.

[Figure: the vectors $\vec a(t)$ and $\vec a(t+\Delta t)$, and the small change $\Delta\vec a$ between them.] If a vector function $\vec a(t)$ has constant length, then, when the parameter $t$ undergoes a small change $\Delta t$, the corresponding small change $\Delta\vec a$ in the vector function will be almost perpendicular to $\vec a(t)$ itself.

Theorem. Let $\vec a(t)$ be a vector function. Then a necessary and sufficient condition for the length $\|\vec a(t)\|$ to be constant is that $\vec a(t)$ and $\vec a\,'(t)$ be perpendicular for all $t$.

Proof. Differentiating both sides of the equation $\|\vec a(t)\|^2 = \vec a(t)\cdot\vec a(t)$ we get

$$\frac{d}{dt}\|\vec a(t)\|^2 = \vec a\,'(t)\cdot\vec a(t) + \vec a(t)\cdot\vec a\,'(t) = 2\,\vec a(t)\cdot\vec a\,'(t). \tag{27}$$

If $\vec a(t)$ has constant length, then $\|\vec a(t)\|^2$ is also constant, and thus $\frac{d}{dt}\|\vec a(t)\|^2 = 0$. Therefore, for a vector function $\vec a(t)$ whose length is constant, $\vec a(t)\cdot\vec a\,'(t) = 0$, i.e. $\vec a(t) \perp \vec a\,'(t)$.

Conversely, if $\vec a(t)$ is a vector function for which $\vec a(t) \perp \vec a\,'(t)$ holds for all $t$, then $\vec a(t)\cdot\vec a\,'(t) = 0$, and (27) implies that $\frac{d}{dt}\|\vec a(t)\|^2 = 0$, i.e. that $\|\vec a(t)\|^2$ and hence $\|\vec a(t)\|$ are constant. ∎
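The theorem is easy to test numerically. The Python sketch below uses $\vec a(t) = (\cos t\cos 2t,\ \cos t\sin 2t,\ \sin t)$, a vector function we chose for illustration because its length is always 1 (indeed $\cos^2 t\,(\cos^2 2t + \sin^2 2t) + \sin^2 t = 1$). It approximates $\vec a\,'(t)$ by a symmetric difference quotient and checks that $\vec a(t)\cdot\vec a\,'(t)$ vanishes up to rounding error.

```python
import math

def a(t):
    """A vector function of constant length 1 (it moves on the unit sphere)."""
    return (math.cos(t) * math.cos(2 * t), math.cos(t) * math.sin(2 * t), math.sin(t))

def derivative(f, t, h=1e-6):
    """Componentwise symmetric difference quotient approximating f'(t)."""
    return tuple((p - q) / (2 * h) for p, q in zip(f(t + h), f(t - h)))

for t in [0.3, 1.0, 2.2]:
    v = a(t)
    length = math.sqrt(sum(c * c for c in v))
    a_dot_da = sum(p * q for p, q in zip(v, derivative(a, t)))   # should be ~0
    print(f"t={t}: |a(t)|={length:.12f}, a(t).a'(t)={a_dot_da:.2e}")
```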


12. Two examples

12.1. Motion on a straight line. We return to the motion given by (18), i.e.

$$\vec x(t) = \vec a + t\,\vec v. \tag{28}$$

The velocity and acceleration are easy to compute:

$$\frac{d\vec x(t)}{dt} = \vec v, \qquad \frac{d^2\vec x(t)}{dt^2} = \frac{d\vec v}{dt} = \vec 0,$$

since $\vec v$ is a constant vector in this case.

We see that if a point $X(t)$ moves according to the parametrization (18), then its velocity is constant, and its acceleration is zero. According to Newton’s law, no force is exerted on an object undergoing this motion.

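12.2. Circular motion. For the point $X(t)$ moving on a circle of radius $R$ with angular velocity $\omega$ we have (19), i.e.

$$\vec x(t) = R\cos\omega t\,\vec e_1 + R\sin\omega t\,\vec e_2,$$

so that the velocity and acceleration are easy to compute:

$$\begin{aligned} \vec v(t) &= \vec x\,'(t) = -\omega R\sin\omega t\,\vec e_1 + \omega R\cos\omega t\,\vec e_2,\\ \vec a(t) &= \vec v\,'(t) = -\omega^2R\cos\omega t\,\vec e_1 - \omega^2R\sin\omega t\,\vec e_2. \end{aligned}$$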

Note that the velocity vector $\vec v(t)$ is perpendicular to the position vector $\vec x(t)$, as predicted in § 11, since $\vec x(t)$ has constant length $R$. Our expression for the velocity vector $\vec v(t)$ contains the familiar relation between angular velocity and velocity: the speed $v = \|\vec v(t)\|$ with which the point $X(t)$ is moving is

$$\begin{aligned} v(t) &= \bigl\|-\omega R\sin\omega t\,\vec e_1 + \omega R\cos\omega t\,\vec e_2\bigr\| \\ &= \sqrt{\omega^2R^2\sin^2\omega t + \omega^2R^2\cos^2\omega t} \\ &= |\omega|\,R. \end{aligned} \tag{29}$$

Hence the angular velocity of an object undergoing circular motion satisfies

$$|\omega| = \frac{v}{R}. \tag{30}$$

Figure 7. If an object moves along a circle with constant angular velocity, then the force $\vec F$ required to make the object follow that motion is $\vec F = -m\omega^2\vec x$. In particular it is parallel to the position vector $\vec x$ but points in the opposite direction.


We also note that the acceleration is a multiple of the position vector: $\vec a(t) = -\omega^2\vec x(t)$. According to Newton the force acting on the object at $X(t)$ is $\vec F = m\vec a = -m\omega^2\vec x$, and its magnitude is

$$F = \|\vec F\| = \|m\omega^2\vec x(t)\| = m\omega^2R, \tag{31}$$

because $\|\vec x(t)\| = R$ at all times.

Using (30) we can replace the angular velocity $\omega$ by the actual velocity, which leads to the classical formula for the centripetal force (the force that keeps the object on the circle)

$$F = \frac{mv^2}{R}. \tag{32}$$
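A numerical check of the relations above, as a minimal Python sketch (the mass $m$, radius $R$ and angular velocity $\omega > 0$ are made-up values): compute $\vec x$, $\vec v$ and $\vec a$ from the formulas in § 12.2, then confirm that $\vec v \perp \vec x$, $\vec a = -\omega^2\vec x$, $v = \omega R$, and $\|\vec F\| = mv^2/R$.

```python
import math

m, R, omega = 0.2, 1.5, 3.0   # illustrative mass, radius, angular velocity (omega > 0)

def position(t):
    return (R * math.cos(omega * t), R * math.sin(omega * t))

def velocity(t):
    return (-omega * R * math.sin(omega * t), omega * R * math.cos(omega * t))

def acceleration(t):
    return (-omega ** 2 * R * math.cos(omega * t), -omega ** 2 * R * math.sin(omega * t))

t = 0.7
x, v, acc = position(t), velocity(t), acceleration(t)
speed = math.hypot(*v)
force = (m * acc[0], m * acc[1])                  # Newton: F = m a
print("a + omega^2 x =", (acc[0] + omega ** 2 * x[0], acc[1] + omega ** 2 * x[1]))
print("speed =", speed, " omega*R =", omega * R)
print("|F| =", math.hypot(*force), " m v^2 / R =", m * speed ** 2 / R)
```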

13. Arc length

For any given vector function there is a simple formula for the length of the curve it traces out. The formula is essentially the same as the formula for the length of a parametric curve (or, to a lesser extent, of the graph of a function) that was described in Math 221. Here we repeat the intuitive derivation of the formula, written in terms of vectors this time.

Let $\vec x(t)$ ($a \le t \le b$) be a vector function. To determine the length of the arc traced out by $X(t)$ as $t$ varies from $t = a$ to $t = b$, we divide the interval $a \le t \le b$ into many very short subintervals. The corresponding points $X(t)$ on the curve split the curve into many short segments, each of which will be “close to a line segment.” We approximate the length of the curve by adding the lengths of all these short segments. Finally we take the limit in which the number of partition points becomes infinite and our sum of lengths of short segments becomes an integral. To see which integral we get, we need to find an expression for the length of a short segment between two adjacent partition points on the curve.

[Figure: the curve from its start ($t = a$) to its end ($t = b$), with one partition piece from $X(t)$ to $X(t+\Delta t)$ and the vector $\Delta\vec x$.]

Suppose we have two points on the curve, with parameter values $t$ and $t + \Delta t$, respectively. The points are $X(t)$ and $X(t+\Delta t)$, and the distance between them is the length of the vector $\Delta\vec x$ from one point to the next. This vector is

$$\Delta\vec x = \vec x(t+\Delta t) - \vec x(t) = \frac{\vec x(t+\Delta t) - \vec x(t)}{\Delta t}\,\Delta t \approx \vec x\,'(t)\,\Delta t,$$

so that its length is $\approx \|\vec x\,'(t)\|\,\Delta t$. Adding the lengths of the short segments together, we find that the length is approximately $\sum\|\vec x\,'(t)\|\,\Delta t$ (where the summation is over all short pieces of the curve). Taking the limit we arrive at this formula for the length of the curve traced out by $\vec x(t)$, $a \le t \le b$:

$$\text{Length} = \int_{t=a}^{b}\|\vec x\,'(t)\|\,dt. \tag{33}$$

This integral looks simple, but that appearance turns out to be deceptive, as we find out when we write it in terms of the components of the vector function $\vec x(t)$. Suppose $\vec x(t) = x(t)\,\vec e_1 + y(t)\,\vec e_2 + z(t)\,\vec e_3$. Then

$$\vec x\,'(t) = x'(t)\,\vec e_1 + y'(t)\,\vec e_2 + z'(t)\,\vec e_3,$$

so that

$$\|\vec x\,'(t)\| = \sqrt{x'(t)^2 + y'(t)^2 + z'(t)^2}.$$

Therefore the length formula (33) of the curve is equivalent to

$$\text{Length} = \int_{t=a}^{b}\sqrt{x'(t)^2 + y'(t)^2 + z'(t)^2}\,dt. \tag{34}$$

The square root makes this formula a reliable source of very difficult integrals. In fact the list of curves whose length one can actually compute by doing the integral is rather short (see Problem …).
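Even when the integral (34) cannot be done by hand, it is easy to approximate numerically. The Python sketch below applies the composite Simpson rule (our choice of quadrature; any standard rule would do) to $\|\vec x\,'(t)\|$. As a test case it uses one arch of the cycloid (20), $0 \le \theta \le 2\pi$, whose length happens to be computable exactly: $\|\vec x\,'(\theta)\| = R\sqrt{2 - 2\cos\theta} = 2R\sin(\theta/2)$ on this interval, which integrates to $8R$.

```python
import math

def arc_length(dx, a, b, n=1000):
    """Approximate the integral (33) of ||x'(t)|| over a <= t <= b with the
    composite Simpson rule on n subintervals (n must be even).
    dx(t) returns the components of x'(t)."""
    if n % 2 != 0:
        raise ValueError("Simpson's rule needs an even number of subintervals")

    def speed(t):
        return math.sqrt(sum(c * c for c in dx(t)))

    h = (b - a) / n
    total = speed(a) + speed(b)
    for i in range(1, n):
        weight = 4 if i % 2 == 1 else 2
        total += weight * speed(a + i * h)
    return total * h / 3

R = 1.0   # illustrative wheel radius

def cycloid_velocity(theta):
    """x'(theta) for the cycloid (20)."""
    return (R - R * math.cos(theta), R * math.sin(theta))

print(arc_length(cycloid_velocity, 0.0, 2 * math.pi))   # close to 8R = 8.0
```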

14. Arc length derivative

Let $\vec x(t)$ be some vector function that describes the motion through space of some point $X(t)$, and let $f(t)$ be some other function. In what follows it will help to think of the parameter $t$ as “time.” Typical examples of functions $f$ that we might want to consider are $f(t) = \|\vec x(t)\|$ (the distance from the point $X(t)$ to the origin) or $f(t) = \|\vec x\,'(t)\|$ (the speed at which the point is moving).

To describe the rate with which $f(t)$ is changing we could compute its derivative $\frac{df}{dt}$, which tells us the ratio between the change $\Delta f$ of $f$ and the change $\Delta t$ in the parameter $t$ (at least approximately, if $\Delta t$ is small). If we interpret $t$ as “time” then this derivative tells us how fast $f(t)$ changes per second. But sometimes it is more useful to know how much $f$ changes after we have travelled a small distance along the curve, rather than after a short amount of time has passed. In other words, for two nearby points $X(t)$ and $X(t+\Delta t)$ on the curve we would like to know the ratio

$$\frac{\text{change in } f}{\text{distance travelled}} = \frac{f(t+\Delta t) - f(t)}{\text{distance from } X(t) \text{ to } X(t+\Delta t)}. \tag{35}$$

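We can work this out by observing that the distance from $X(t)$ to $X(t+\Delta t)$ is the length of the vector from $X(t)$ to $X(t+\Delta t)$, i.e.

$$\text{distance from } X(t) \text{ to } X(t+\Delta t) = \|\vec x(t+\Delta t) - \vec x(t)\|.$$

Assuming $\Delta t > 0$ is small, we have

$$\|\vec x(t+\Delta t) - \vec x(t)\| = \Bigl\|\frac{\vec x(t+\Delta t) - \vec x(t)}{\Delta t}\Bigr\|\,\Delta t \approx \|\vec x\,'(t)\|\,\Delta t.$$

We substitute this in (35), and get

$$\frac{\text{change in } f}{\text{distance travelled}} \approx \frac{f(t+\Delta t) - f(t)}{\|\vec x\,'(t)\|\,\Delta t}.$$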

Now let $\Delta t \to 0$: the quantity on the left becomes what is called the arc length derivative of the function $f$ along the curve $\vec x(t)$, which is commonly denoted by $\frac{df}{ds}$. In the quantity on the right we recognize the derivative of $f$ with respect to $t$ (time), which leads to

$$\frac{df}{ds} = \frac{1}{\|\vec x\,'(t)\|}\frac{df}{dt}. \tag{36}$$

Here $\frac{df}{dt} = f'(t)$ is the usual derivative of $f$ with respect to $t$.

If we want to emphasize the distinction between these two derivatives, then we can call $\frac{df}{dt}$ the “time derivative of $f$.”
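Formula (36) translates directly into a few lines of code. In the Python sketch below (the function names are ours), both $\frac{df}{dt}$ and $\vec x\,'(t)$ are approximated by symmetric difference quotients. For the test case we take a point moving along the $x$-axis at speed 2, $\vec x(t) = (2t, 0, 0)$, and let $f(t)$ be its $x$-coordinate: then $\frac{df}{dt} = 2$ while $\frac{df}{ds} = 1$, because $f$ grows by exactly the distance travelled.

```python
import math

def derivative(g, t, h=1e-6):
    """Symmetric difference quotient for a scalar-valued g."""
    return (g(t + h) - g(t - h)) / (2 * h)

def speed(x, t, h=1e-6):
    """Approximate ||x'(t)|| for a vector-valued x."""
    dx = [(p - q) / (2 * h) for p, q in zip(x(t + h), x(t - h))]
    return math.sqrt(sum(c * c for c in dx))

def arc_length_derivative(f, x, t):
    """df/ds = (1 / ||x'(t)||) * df/dt, as in (36)."""
    return derivative(f, t) / speed(x, t)

def x(t):
    return (2 * t, 0.0, 0.0)     # moves along the x-axis with speed 2

def f(t):
    return x(t)[0]               # the x-coordinate of X(t)

print(derivative(f, 1.0), arc_length_derivative(f, x, 1.0))
```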


15. Unit Tangent and Curvature

15.1. Unit tangent. We have seen that we can find a tangent vector to the curve traced out by some vector function $\vec x(t)$ simply by differentiating the vector function: $\vec x\,'(t)$ always provides a tangent vector (if $\vec x\,'(t) \ne \vec 0$). In fact any multiple $\lambda\vec x\,'(t)$ of this vector will also be a tangent vector (provided $\lambda \ne 0$). We can single out one special tangent vector by choosing $\lambda > 0$ so that $\lambda\vec x\,'(t)$ has length 1 (a vector with length 1 is called a unit vector). Since for $\lambda > 0$ we have $\|\lambda\vec x\,'(t)\| = \lambda\|\vec x\,'(t)\|$, the value of $\lambda$ that will make $\lambda\vec x\,'(t)$ a unit vector is $\lambda = 1/\|\vec x\,'(t)\|$.

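For this reason the vector

$$\vec T(t) = \frac{d\vec x}{ds} = \frac{\vec x\,'(t)}{\|\vec x\,'(t)\|} \tag{37}$$

is called the unit tangent vector to the curve corresponding to the vector function $\vec x(t)$. (The middle expression is the arc length derivative (36), applied to each component of $\vec x(t)$.)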

15.2. Example. For our constant velocity parametrization (18) of a straight line from § 3 we have

$$\vec x(t) = \vec a + t\,\vec v,$$

so that $\vec x\,'(t) = \vec v$ and hence

$$\vec T = \frac{\vec v}{\|\vec v\|}.$$

We see that the unit tangent vector is constant.

15.3. Curvature and normal. If the curve described by a vector function $\vec x(t)$ is not a straight line, then the tangent to the curve will turn as one moves along the curve. The curvature vector $\vec\kappa$ measures how much the curve is curved. It is defined to be the rate of change of the unit tangent, but with respect to arc length instead of with respect to the given parameter $t$. Thus

$$\vec\kappa \overset{\text{def}}{=} \frac{d\vec T}{ds}. \tag{38}$$

According to our definition of “derivative with respect to arc length” the right hand side stands for

$$\frac{d\vec T}{ds} = \frac{1}{\|\vec x\,'(t)\|}\frac{d\vec T}{dt}. \tag{39}$$

To write this completely in terms of the original vector function $\vec x(t)$ we use (37):

$$\vec\kappa = \frac{1}{\|\vec x\,'(t)\|}\frac{d}{dt}\Bigl\{\frac{1}{\|\vec x\,'(t)\|}\frac{d\vec x}{dt}\Bigr\}. \tag{40}$$

This formula is not as short as the original definition (38), but it does show that the curvature vector comes about by differentiating the vector function $\vec x(t)$ twice (and dividing by $\|\vec x\,'(t)\|$ at the right moments).
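Formula (40) can be evaluated numerically without computing any derivatives by hand. The Python sketch below (helper names and step sizes are ours) replaces both derivatives in (40) by symmetric difference quotients. For a circle of radius $R$ it should return a vector of length $1/R$ pointing from $X(t)$ toward the center. Nested difference quotients lose accuracy quickly, so this is meant as a check, not as a production method.

```python
import math

def diff(f, t, h):
    """Componentwise symmetric difference quotient (f(t+h) - f(t-h)) / (2h)."""
    return [(p - q) / (2 * h) for p, q in zip(f(t + h), f(t - h))]

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def curvature_vector(x, t, h=1e-5, k=1e-3):
    """Approximate (40): kappa = (1/||x'||) d/dt { x'/||x'|| }.
    The inner derivative uses step h, the outer one the larger step k,
    which keeps rounding errors of the nested quotients under control."""
    def unit_tangent(s):
        dx = diff(x, s, h)
        length = norm(dx)
        return [c / length for c in dx]
    speed = norm(diff(x, t, h))
    return [c / speed for c in diff(unit_tangent, t, k)]

R = 2.0                                      # illustrative radius

def circle(t):
    """Circle of radius R in the xy-plane, written as a space curve."""
    return (R * math.cos(t), R * math.sin(t), 0.0)

kappa = curvature_vector(circle, 0.9)
print(kappa, norm(kappa))                    # norm should be close to 1/R = 0.5
```

For a straight line the same routine returns a vector that is zero up to rounding, in line with § 15.2.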

Theorem. The curvature vector $\vec\kappa$ is perpendicular to the tangent, i.e. $\vec\kappa \perp \vec T$.

Proof. We have to show that $\vec\kappa\cdot\vec T = 0$. From the second form (39) of the definition of $\vec\kappa$ we see

$$\vec\kappa\cdot\vec T = \Bigl(\frac{1}{\|\vec x\,'(t)\|}\frac{d\vec T}{dt}\Bigr)\cdot\vec T = \frac{1}{\|\vec x\,'(t)\|}\,\frac{d\vec T}{dt}\cdot\vec T.$$

Remember that $\vec T(t)$ is always a unit vector, i.e. $\vec T(t)$ has constant length: by § 11 this implies that $\frac{d\vec T}{dt} \perp \vec T(t)$ and thus $\frac{d\vec T}{dt}\cdot\vec T = 0$, so we are done. ∎

There are two concepts that are derived from the curvature vector: the curvature $\kappa$ is by definition the length of the curvature vector $\vec\kappa$,

$$\kappa = \|\vec\kappa\| = \Bigl\|\frac{d\vec T}{ds}\Bigr\|, \tag{41}$$

and the normal vector to the curve is

$$\vec N = \frac{\vec\kappa}{\|\vec\kappa\|} = \frac{d\vec T/ds}{\|d\vec T/ds\|}. \tag{42}$$

The normal vector is undefined when $\vec\kappa = \vec 0$, because it would require division by zero.

Since $\vec\kappa$ is perpendicular to $\vec T$, the normal vector $\vec N$ is also perpendicular to $\vec T$ (hence its name). Combining (41) and (42) we can write

$$\frac{d\vec T}{ds} = \kappa\,\vec N. \tag{43}$$

16. Osculating plane

At any point $X(t)$ on a space curve given by $\vec x(t)$ one defines the osculating plane to be the plane that contains the point $X(t)$ and that is parallel to both the tangent $\vec T(t)$ and the normal $\vec N(t)$ of the curve.

If we want to write a defining equation for the osculating plane as in Chapter 1, § 11.2, then we need a vector perpendicular to the osculating plane. Since this plane is defined to be parallel to both $\vec T$ and $\vec N$, we can find a normal vector to the osculating plane by taking the cross product of $\vec T$ and $\vec N$. This vector is called the binormal to the curve. In a formula, it is defined to be

$$\vec B = \vec T \times \vec N. \tag{44}$$

17. Problems

1. Let $\ell$ be the line given by

$$\vec x(t) = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} + t\begin{pmatrix} -1 \\ 2 \\ 1 \end{pmatrix}.$$

(a) Find the unit tangent vector, the curvature, and the tangent line to the line $\ell$ at the point where $t = 2$.

(b) Find the unit tangent vector, the curvature, and the tangent line to the line $\ell$ at any point on the line.

2. What sign does $\omega$ have in Figure 7? How would the figure change if we change the sign of $\omega$? Does the force $\vec F$ on the object change if we change the sign of $\omega$?

3. Suppose a point $P$ is rotating around a line $\ell$, keeping its distance to the line fixed at $r$, and moving in a plane perpendicular to the line. Suppose the point has angular velocity $\omega$: this means that during a time interval of length $t$ the angle swept out by the line segment connecting $P$ to $\ell$ is exactly $\omega t$.

In a previous math or physics class it was shown that the speed of the point $P$ is $\omega r$, where $r$ is the distance from $P$ to the line $\ell$.

The angular velocity vector is defined to be the vector $\vec\omega$ whose length is $\omega$, and that is parallel to the line $\ell$. There are two such vectors ($\pm\vec\omega$). By definition $\vec\omega$ points in the direction in which a screw would move if it were turning in the same direction as the point $P$.

(a) Assuming the line $\ell$ passes through the origin, show from the drawing that the velocity vector $\vec v$ of the point $P$ is given by $\vec\omega\times\vec x$. You can do this in two steps, namely: show that $\vec\omega\times\vec x$ has the same direction as $\vec v$, and show that $\vec\omega\times\vec x$ has the same length as $\vec v$.

(b) Show that the acceleration vector is given by $\vec a = \vec\omega\times(\vec\omega\times\vec x)$. (Hint: don’t use the drawing, but combine the definitions of $\vec v$ and $\vec a$ in (24) and (25) and also the product rule; finally, keep in mind that you have just found that $\vec v = \vec\omega\times\vec x$.)

(c) If someone told you they had computed the acceleration vector and found

$$\vec a = (\vec\omega\times\vec\omega)\times\vec x,$$

could they be right? Explain! What if they told you they got $\vec a = \vec\omega\times\vec\omega\times\vec x$?

(d) True or False (explain your answers): (i) $\vec v \perp \vec x$? (ii) $\vec a \perp \vec v$? (iii) $\vec a$ and $\vec x$ are parallel?

(e) Include the acceleration vector $\vec a$ in the drawing for this problem.

4. Consider the “twisted cubic,” i.e. the curve given by $\vec x(t) = t\,\vec e_1 + t^2\,\vec e_2 + t^3\,\vec e_3$.

(a) Find a parametrization for the tangent line to the curve at the point where $t = 1$. Where does this line intersect the $xy$-plane?

(b) For any given $t$ find the tangent line to the curve at the point $X(t)$, and find where this line intersects the $xy$-plane.

(c) If you call that intersection point $P(t)$, then which curve is traced out by the point $P(t)$ as $t$ varies?

5. Compute the length of one full turn of the helix by taking the parametrization given in (21) and computing the length of the segment with $0 \le \theta \le 2\pi$.

After computing the length, consider this: let $P$ be the perimeter of the circle underneath the helix, and let $H$ be the height achieved by one full turn of the helix. Show that the length $L$ of the helix satisfies $L^2 = P^2 + H^2$.

6. There is a multistory parking ramp where the way out is a path in the shape of a helix that is wound around the outside of the building. As a car drives down this path at night its headlights shine a spot on the ground. Which curve is traced out by this light spot as the car drives all the way down?

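[Figure for Problem 3: the line $\ell$ through the origin with the angular velocity vector $\vec\omega$ along it; the point $P$ at distance $r$ from $\ell$ with position vector $\vec x$; its velocity $\vec v = \vec\omega\times\vec x$; and the arc length $\Delta s = r\,\Delta\theta = r\omega\,\Delta t$.]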


Make a good drawing. Assume for simplicity that the center of the parking ramp is the $z$-axis.

7. Compute the tangent, curvature, normal and binormal for the following curves.

(a) The parabola: $\vec x(t) = \begin{pmatrix} t^2 \\ t \end{pmatrix}$. At which point on the curve is the curvature the largest?

(b) Neil’s parabola: $\vec x(t) = \begin{pmatrix} t^2 \\ t^3 \end{pmatrix}$. At which point on the curve is the curvature the largest?

(c) The helix: $\vec x(\theta) = \begin{pmatrix} R\cos\theta \\ R\sin\theta \\ a\theta \end{pmatrix}$ (see § 6 for an explanation of the constants $R$ and $a$). At which point on the curve is the curvature the largest?

(d) The graph of $y = e^x$, by using the parametrization $\vec x(t) = \begin{pmatrix} t \\ e^t \end{pmatrix}$. Where on the graph is the curvature the largest?

CHAPTER 3

Functions of more than one variable

1. Functions of two variables and their graphs

1.1. Definition. A function of two variables has two ingredients: a domain and a rule. The domain of the function is a collection of points in the $xy$-plane. For each point $(x, y)$ from the domain of the function, the rule should tell us how to find the function value $f(x, y)$.

Just as with functions of one variable, the “rule” that gives us the function value is often specified by some formula, e.g. $f(x, y) = x + y$. The domain of a function is the set of points at which we define the function. This can in principle be any set of points in the plane. Typically the domain will be a rectangle, or a disc, or it could be the entire $xy$-plane, possibly with some points and lines removed.

Figure 1. The graph of some function, and its domain (a rectangle in this example).

1.2. Graphs. By definition, the graph of a function $z = f(x, y)$ is the collection of all points $(x, y, z)$ in three dimensional space that satisfy the equation $z = f(x, y)$.

The graph is usually a surface that floats above (or below) the domain of the function (see Figure 2).


1.3. Level sets. The graph of a function of two variables is a surface sitting in three dimensional space, which can be difficult to draw or visualize. Instead of looking at the graph we can also consider its level sets. If $c$ is any real number, then, by definition, the level set at level $c$ of the function is the set of all points $(x, y)$ in the plane that satisfy $f(x, y) = c$.

Figure 2. The graph of some function (top), and a construction of one of its level sets (bottom). Note that by definition the level set (“at level $c$”) is the curve in the $xy$-plane under the graph: it is obtained by intersecting the graph of the function with a horizontal plane at height $c$, and then projecting this curve of intersection onto the $xy$-plane.

Since the level set is the set of all solutions to the equation $f(x, y) = c$, one often uses the notation $f^{-1}(c)$ (“$f$-inverse of $c$”) for the level set. We can summarize the definition in an equation:

$$f^{-1}(c) = \bigl\{(x, y) : f(x, y) = c\bigr\}.$$

Note that the definition says that $f^{-1}(c)$ is not a number, but a set of points!


Level sets tend to be curves in the $xy$-plane, although in general level sets can have any shape (see Problem 5.13 for an example). They are usually easier to draw than the graphs of the corresponding functions.

1.4. An example from the “real” world. Here is a function of local interest. The domain of the function is the water surface of Lake Mendota (let’s pretend this is a plane domain), and the function, which we will call $d$ instead of $f$, is given by $d(x, y) =$ the depth of the lake at location $(x, y)$. There is no formula for this function, but the Wisconsin Department of Natural Resources has measured the depth and presented the results in terms of the level sets of the function $d$.

Figure 3. The level curves of a function $z = d(x, y)$. The domain of this function is the lake surface, and $d(x, y)$ is the depth in meters of Lake Mendota at $(x, y)$. To see the graph of the function we could try to drain the lake.

See http://limnology.wisc.edu/lake_information/mendota/mendota.html

1.5. A comment about language and set-theoretic notation. We will often say “consider a function $z = f(x, y)$…”, but there is a sense in which this is incorrect. It is convenient to say “consider a function $z = f(x, y)$…” since it not only names the function, but it also gives the independent variables $x, y$ and the dependent variable $z$ a name. Nevertheless, the symbol in the equation $z = f(x, y)$ that actually represents the function is “$f$”. The correct way of introducing the function¹ would be to say “consider a function $f$.”

In fact, in the notation that is used in modern mathematics one would write “Consider the function $f : D \to \mathbb R$…” Here $f$ is the name of the function we are introducing, $D$ is the domain of that function (so $D$ is a set of points in the plane), and $\mathbb R$ stands for the set of real numbers, indicating that computing $f$ always results in a real number.

¹Saying “consider the function $z = f(x, y)$…” to introduce the function $f$ is like saying “Please meet my brother Joe, Bill, and Sue” when you want to introduce your brother Joe, who happens to be standing next to Bill and Sue. To introduce your brother, you would of course say “Please meet my brother Joe,” and to introduce the function you should really say “Consider the function $f$.”

1.6. Vector notation. If $\vec x$ is the position vector of the point $(x, y)$ in the plane, i.e. if $\vec x = \begin{pmatrix} x \\ y \end{pmatrix}$, then one sometimes writes

$$f(x, y) = f(\vec x).$$

Physicists have a preference for $\vec r$ instead of $\vec x$ (because they call the position vector the “radius vector”), and will write $f(x, y) = f(\vec r)$.

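2. Linear functions

The simplest functions of one variable are those of the form $f(x) = ax + b$. Their graphs are lines, and we called them linear functions.

A linear function of two variables is a function $f$ of the form

$$z = f(x, y) = ax + by + c, \tag{45}$$

where $a, b, c$ are constants.

Figure 4. The graph of a linear function $z = ax + by + c$.

The graph of a linear function is always a plane. Indeed, the graph consists of all points $(x, y, z)$ that satisfy the equation

$$-ax - by + z = c,$$

which we can write as $\vec n\cdot\vec x = \vec n\cdot\vec p$, where $\vec x$ is the position vector of the point $(x, y, z)$ and

$$\vec n = \begin{pmatrix} -a \\ -b \\ 1 \end{pmatrix}, \qquad \vec p = \begin{pmatrix} 0 \\ 0 \\ c \end{pmatrix}.$$

This is the defining equation of the plane through the point with position vector $\vec p$ that has normal vector $\vec n$.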
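A quick check of this claim as a Python sketch, with made-up coefficients $a$, $b$, $c$: every point $(x, y, f(x, y))$ on the graph satisfies $\vec n\cdot\vec x = \vec n\cdot\vec p$ for the vectors $\vec n$ and $\vec p$ given above.

```python
import random

a, b, c = 1.5, -2.0, 4.0            # illustrative coefficients of f(x, y) = ax + by + c

def f(x, y):
    return a * x + b * y + c

n = (-a, -b, 1.0)                   # normal vector of the plane
p = (0.0, 0.0, c)                   # a point on the plane (where x = y = 0)

def dot(u, v):
    return sum(s * t for s, t in zip(u, v))

random.seed(0)
for _ in range(5):
    x, y = random.uniform(-10, 10), random.uniform(-10, 10)
    point = (x, y, f(x, y))         # a point on the graph
    print(f"n.x - n.p = {dot(n, point) - dot(n, p):.2e}")
```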

3. Quadratic forms

After learning about linear functions in pre-calculus one usually goes on to quadratic functions. We will do the same for functions of two variables and study quadratic forms. Just as in the one variable case where quadratic functions can have a maximum or minimum, quadratic forms provide examples of functions of two variables that can have a maximum or a minimum, or, it turns out, a third kind of “min-max” or “saddle shape.” They provide the basic profile of what we will run into when we look for local minima and maxima of functions of two variables. In particular, the technique of classifying quadratic forms by completing the square, which we will see in this section, is the key to the second derivative test for functions of more than one variable.

3.1. Definition. The general quadratic form in two variables is

$$f(x, y) = Ax^2 + Bxy + Cy^2, \tag{46}$$

where $A$, $B$, and $C$ are constants. Depending on the values of these constants the graphs of the functions can have a number of different shapes.

In addition to these quadratic forms one can also consider the more general class of quadratic functions,

$$f(x, y) = Ax^2 + Bxy + Cy^2 + Dx + Ey + F,$$

which also have terms of degree 1 and 0. We will restrict ourselves to quadratic forms (for now).

The prototypical examples. There are several important special cases that are representative of what the graphs of quadratic forms can look like. These special cases are

$$f(x, y) = x^2 + y^2 \quad\text{and}\quad g(x, y) = -x^2 - y^2, \tag{47a}$$

$$h(x, y) = x^2 \quad\text{and}\quad -h(x, y) = -x^2, \tag{47b}$$

$$k(x, y) = xy. \tag{47c}$$

Their graphs are discussed in Figure 5.
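The outcome of the completing-the-square procedure in § 3.2 can be condensed into a short decision rule based on the sign of $4AC - B^2$. Here is a minimal Python sketch of that rule (the function name `classify` and its return labels are our own). In the case $4AC - B^2 = 0$ it relies on the fact that the form is then a multiple of a perfect square: $A\bigl(x + \frac{B}{2A}y\bigr)^2$ if $A \ne 0$, or $Cy^2$ if $A = 0$ (since then $B = 0$ as well). It is applied to the prototypical examples (47a) to (47c).

```python
def classify(A, B, C):
    """Classify Q(x, y) = A x^2 + B xy + C y^2 by the sign of 4AC - B^2."""
    D = 4 * A * C - B * B
    if D > 0:
        # Q = A (u^2 + v^2): same sign as A everywhere except the origin.
        return "positive definite" if A > 0 else "negative definite"
    if D < 0:
        # Q = A (u^2 - v^2) = A (u + v)(u - v) takes both signs.
        return "indefinite"
    # D == 0: Q is a multiple of a perfect square; A (or C if A == 0) gives the sign.
    lead = A if A != 0 else C
    if lead > 0:
        return "positive semidefinite"
    if lead < 0:
        return "negative semidefinite"
    return "zero form"

examples = {
    "x^2 + y^2": (1, 0, 1),
    "-x^2 - y^2": (-1, 0, -1),
    "x^2": (1, 0, 0),
    "-x^2": (-1, 0, 0),
    "xy": (0, 1, 0),
}
for name, coeffs in examples.items():
    print(f"{name:>10}: {classify(*coeffs)}")
```

With floating-point coefficients the exact test for $4AC - B^2 = 0$ is fragile; in practice one would compare against a small tolerance instead.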

3.2. Classifying quadratic forms – the general procedure. All quadratic formshave graphs that look like one of the examples shown above – but how can we tell whichit is? In other words, if Q(x, y) is a given quadratic form how can we tell if it is de�nite,inde�nite, or semide�nite? How do we know for which (x, y) the formQ(x, y) is positiveor negative? It turns out that we can always �nd out by using the trick of “completingthe square.”

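The general procedure for a given quadratic form $Q(x, y) = Ax^2 + Bxy + Cy^2$ is as follows:

(1) If A = 0, then we really have $Q = Bxy + Cy^2$ and we can factor Q as

$$Q(x, y) = (Bx + Cy)\,y.$$

If B ≠ 0, the two factors vanish on two different lines through the origin and Q changes sign across each of them, so the form is indefinite; if B = 0, then $Q = Cy^2$ cannot change sign, so the form is semidefinite.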


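(2) Assume A ≠ 0. We factor out A, and complete the square for the first two terms:

$$\begin{aligned}
Q(x, y) &= A\Big\{x^2 + \frac{B}{A}xy + \frac{C}{A}y^2\Big\}\\
&= A\Big\{\Big(x + \frac{B}{2A}y\Big)^2 - \Big(\frac{B}{2A}y\Big)^2 + \frac{C}{A}y^2\Big\}\\
&= A\Big\{\underbrace{\Big(x + \frac{B}{2A}y\Big)^2}_{u^2} + \underbrace{\frac{4AC - B^2}{4A^2}\,y^2}_{\pm v^2}\Big\}.
\end{aligned}$$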

(3) If $4AC - B^2 > 0$, then the expression in braces is positive unless x = y = 0, and we can write

$$Q(x, y) = A(u^2 + v^2), \quad\text{where}\quad u = x + \frac{B}{2A}y \quad\text{and}\quad v = \frac{\sqrt{4AC - B^2}}{2A}\,y.$$

Depending on the sign of A our function is always positive or always negative (except at the origin, where it vanishes), and we say the form is positive definite or negative definite.

The two forms f and g from (47a) are called definite, since they cannot change sign:

$$f(x, y) = x^2 + y^2$$

is the sum of two squares, and therefore is always positive, unless both x and y vanish. Similarly, g(x, y) = −f(x, y) is always negative, except at (x, y) = (0, 0).

The form $h(x, y) = x^2$ is called semidefinite because it too cannot change its sign. Clearly, $h(x, y) = x^2$ is never negative, but for h(x, y) to be positive, we need x ≠ 0. So, the function h(x, y) is positive, except on the line x = 0 (the y axis), where it vanishes. The graph of the function $-x^2$ is similar, but upside down.

The form k(x, y) = xy is called indefinite, because it can be both positive and negative: if x and y have the same sign, then xy > 0, but if they have opposite signs, then xy < 0. Thus the graph of z = xy lies above the xy-plane in the first and third quadrants, and below the xy-plane in the second and fourth quadrants.

Figure 5. Graphs of some representative quadratic forms. (Panel for k(x, y) = xy: xy > 0 in the first and third quadrants, xy < 0 in the second and fourth.)


(4) If $4AC - B^2 < 0$, then we have

$$Q(x, y) = A(u^2 - v^2), \quad\text{where}\quad u = x + \frac{B}{2A}y \quad\text{and}\quad v = \frac{\sqrt{B^2 - 4AC}}{2A}\,y.$$

When this happens we can factor the quadratic form, i.e. we have

$$Q(x, y) = A(u + v)(u - v).$$

The form is indefinite.

(5) In the only remaining case we have $4AC - B^2 = 0$, so that

$$Q(x, y) = A\Big(x + \frac{B}{2A}y\Big)^2.$$

In this case the form is a perfect square (times A). The form is semidefinite.
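Steps (1)–(5) only use the sign of $4AC - B^2$ together with the signs of A and C, so the whole procedure fits in a few lines of code. The following is a minimal Python sketch of that reading; the function name `classify_quadratic_form` and the returned labels are our own choices, not notation from these notes, and we assume exact (integer or rational) coefficients so that the test $4AC - B^2 = 0$ is reliable.

```python
def classify_quadratic_form(A, B, C):
    """Classify Q(x, y) = A x^2 + B x y + C y^2 using the sign of 4AC - B^2."""
    D = 4 * A * C - B * B
    if D > 0:
        # Step (3): Q = A(u^2 + v^2). Here A != 0, since 4AC > B^2 >= 0.
        return "positive definite" if A > 0 else "negative definite"
    if D < 0:
        # Step (4), or step (1) with B != 0: Q is a product of two
        # different linear factors, so it changes sign.
        return "indefinite"
    if A == 0 and B == 0 and C == 0:
        return "zero"
    # Step (5), or step (1) with B == 0: Q is a constant times a perfect
    # square. A and C cannot have opposite signs here, since 4AC = B^2 >= 0.
    return "positive semidefinite" if A > 0 or C > 0 else "negative semidefinite"
```

For example, `classify_quadratic_form(-3, 6, 9)` returns `"indefinite"` for the form $-3x^2 + 6xy + 9y^2$, and `classify_quadratic_form(2, -4, 6)` returns `"positive definite"` for $2x^2 - 4xy + 6y^2$. Completing the square by hand remains more informative, since it also tells us where the form is positive or negative.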

To understand this procedure it is perhaps best to look at how it works in some examples.

3.3. Classifying quadratic forms – two examples.

3.3.1. An indefinite quadratic form. Consider the form $Q(x, y) = -3x^2 + 6xy + 9y^2$. We rewrite this as follows:

$$\begin{aligned}
Q &= -3x^2 + 6xy + 9y^2\\
&= -3\big(x^2 - 2xy - 3y^2\big)\\
&= -3\big[\underbrace{x^2 - 2xy + y^2}_{\text{complete the square}} - 4y^2\big]\\
&= -3\big[(x - y)^2 - 4y^2\big] && \text{a difference of two squares, so use } a^2 - b^2 = (a - b)(a + b)\\
&= -3(x - y - 2y)(x - y + 2y)\\
&= -3(x - 3y)(x + y).
\end{aligned}$$

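This shows that Q(x, y) > 0 when y lies above both of the lines $y = \tfrac13 x$ and y = −x, or below both of them, and that Q(x, y) < 0 when y lies between these two lines, i.e. when $-x < y < \tfrac13 x$ (for x > 0) or $\tfrac13 x < y < -x$ (for x < 0).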
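Both the factorization and the sign pattern can be spot-checked numerically. Since $Q = -3(x - 3y)(x + y) = 3(3y - x)(y + x)$, the form is positive exactly when $3y - x$ and $y + x$ have the same sign, i.e. when (x, y) lies above both lines or below both. The sketch below is our own check, not part of the notes; it uses an integer grid so that all arithmetic is exact.

```python
def Q(x, y):
    # The form from example 3.3.1.
    return -3 * x**2 + 6 * x * y + 9 * y**2

def Q_factored(x, y):
    # The factored form obtained by completing the square.
    return -3 * (x - 3 * y) * (x + y)

def sign(t):
    return (t > 0) - (t < 0)

def predicted_sign(x, y):
    # Q = 3 (3y - x)(y + x): positive above both lines y = x/3 and y = -x
    # or below both, negative between them, zero on the lines.
    return sign((3 * y - x) * (y + x))

# Check both claims on an integer grid (integers keep the arithmetic exact).
for x in range(-5, 6):
    for y in range(-5, 6):
        assert Q(x, y) == Q_factored(x, y)
        assert sign(Q(x, y)) == predicted_sign(x, y)
```

The point (−10, 5), for instance, lies above the line y = x/3 but below the line y = −x, and indeed Q(−10, 5) = −375 < 0.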

Figure 6. The signs of the quadratic form in example 3.3.1. (The figure marks the two opposite wedges between the lines x − 3y = 0 and x + y = 0 where Q(x, y) < 0, and fills the other two wedges with + signs.)


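3.3.2. A positive definite quadratic form. To see a different example, consider the quadratic form $Q(x, y) = 2x^2 - 4xy + 6y^2$. By completing the square we can write it as

$$\begin{aligned}
Q(x, y) &= 2\big\{x^2 - 2xy + 3y^2\big\}\\
&= 2\big\{x^2 - 2xy + y^2 + 2y^2\big\} && \text{the square is complete}\\
&= 2\big\{(x - y)^2 + 2y^2\big\}\\
&= 2(x - y)^2 + 4y^2.
\end{aligned}$$

We see that this particular quadratic form is positive definite: it is a sum of squares with positive coefficients, and it vanishes only if y = 0 and x − y = 0, i.e. only at the origin.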
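Example 3.3.2 is an instance of step (3) of the general procedure, with A = 2, B = −4, C = 6 and $4AC - B^2 = 32 > 0$. The sketch below is an illustrative check with variable names of our choosing: it computes u and v from the formulas in step (3) and confirms that $A(u^2 + v^2)$ agrees with $2(x - y)^2 + 4y^2$ on a grid. Comparisons use a tolerance because v involves a square root.

```python
import math

A, B, C = 2, -4, 6    # Q(x, y) = 2x^2 - 4xy + 6y^2 from example 3.3.2

def Q(x, y):
    return A * x**2 + B * x * y + C * y**2

def uv(x, y):
    # Substitution from step (3); valid because 4AC - B^2 = 32 > 0.
    u = x + B / (2 * A) * y
    v = math.sqrt(4 * A * C - B**2) / (2 * A) * y
    return u, v

for x in range(-5, 6):
    for y in range(-5, 6):
        u, v = uv(x, y)
        # For these coefficients u = x - y and v = sqrt(2) * y.
        assert math.isclose(Q(x, y), A * (u**2 + v**2), abs_tol=1e-9)
        assert math.isclose(Q(x, y), 2 * (x - y)**2 + 4 * y**2, abs_tol=1e-9)
```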

4. Functions in polar coordinates r, θ

Recall that instead of using Cartesian coordinates (x, y) to specify the location of points in the plane, we can also use polar coordinates. In many cases it is much easier to describe a function using polar coordinates than in Cartesian coordinates.

To go back and forth between Cartesian and polar coordinates we can use the following relations:

$$x = r\cos\theta, \tag{48a}$$

$$y = r\sin\theta, \tag{48b}$$

$$r = \sqrt{x^2 + y^2}, \tag{48c}$$

$$\theta = \arctan\Big(\frac{y}{x}\Big). \tag{48d}$$

The equation for θ is only valid for x > 0, where $-\frac{\pi}{2} < \theta < \frac{\pi}{2}$. In other regions of the plane there are other expressions relating θ to (x, y). See problem 5.8.
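In code, the usual way around the restriction x > 0 is the two-argument arctangent. The sketch below uses Python's standard `math.hypot` and `math.atan2`; the helper names `to_polar` and `to_cartesian` are ours. Note that `atan2` picks the branch $-\pi < \theta \le \pi$, which is one choice among many and differs from the choice $0 \le \theta < 2\pi$ discussed in § 4.2.

```python
import math

def to_polar(x, y):
    """Return (r, theta) for the point (x, y); atan2 gives -pi < theta <= pi."""
    return math.hypot(x, y), math.atan2(y, x)

def to_cartesian(r, theta):
    """Inverse conversion, equations (48a) and (48b)."""
    return r * math.cos(theta), r * math.sin(theta)

# Right half plane (x > 0): arctan(y/x) and atan2(y, x) agree up to rounding.
print(math.atan(2 / 3), math.atan2(2, 3))
# Left half plane (x < 0): arctan(y/x) is off by pi.
print(math.atan(1 / -1), math.atan2(1, -1))   # about -0.785 and 2.356
```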

Figure 7. Polar coordinates are defined in the picture on the right (see also equations (48)). On the left: the set of points at which θ has one given value $\theta_0$ forms a half line emanating from the origin that makes an angle $\theta_0$ with the positive x-axis. The set of points at which r has a given value $r_0$ forms a circle centered at the origin, with radius $r_0$.

The simplest kinds of functions one can consider in polar coordinates are those that only depend on one of those coordinates, i.e. functions that only depend on the radius r, and functions that only depend on the polar angle θ. Let's look at some examples of such functions.


Figure 8. Radially symmetric functions. The graph of $z = r = \sqrt{x^2 + y^2}$, shown together with the graph of its profile $z = \Phi(r) = r$.

4.1. Radially symmetric functions. The functions

$$f(x, y) = x^2 + y^2, \qquad g(x, y) = \sqrt{x^2 + y^2}, \qquad h(x, y) = \ln\big(x^2 + y^2\big),$$

all can be expressed in terms of the radius r only. Namely, using $r^2 = x^2 + y^2$, we have $f(x, y) = r^2$, $g(x, y) = r$, $h(x, y) = \ln r^2 \;(= 2\ln r)$.

In general, a function z = f(x, y) that can be written in terms of the radius r only, i.e. a function for which there is some function Φ of one variable with

$$f(x, y) = \Phi(r), \quad\text{i.e.}\quad f(x, y) = \Phi\big(\sqrt{x^2 + y^2}\big),$$

is called a radially symmetric function.

Since a radially symmetric function only depends on the radius r, its level sets consist of circles centered at the origin (one exception: the origin, r = 0, can also be a level set, and this is obviously not a circle but a point).

As an example, we consider the function $g(x, y) = \sqrt{x^2 + y^2} = r$ in more detail. The function Φ of one variable here is Φ(r) = r. We can try to visualize the graph of g by first looking at the positive x-axis only. There we have $g(x, 0) = \sqrt{x^2} = x$. We get the graph of g by revolving the graph of z = x (for x ≥ 0) around the z-axis. See Figure 8.
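A radially symmetric function is determined by its profile Φ. In the following sketch (the helper name `radial` is hypothetical) we build $f(x, y) = \Phi\big(\sqrt{x^2 + y^2}\big)$ from a one-variable Python function, and check that g and h from § 4.1 are constant on a circle and agree with their formulas in terms of r.

```python
import math

def radial(phi):
    """Return the radially symmetric function (x, y) -> phi(sqrt(x^2 + y^2))."""
    return lambda x, y: phi(math.hypot(x, y))

g = radial(lambda r: r)                   # g(x, y) = sqrt(x^2 + y^2) = r
h = radial(lambda r: 2 * math.log(r))     # h(x, y) = ln(x^2 + y^2) = 2 ln r, r > 0

# Sample points on the circle r = 1.5: g takes the same value at all of them.
r0 = 1.5
circle = [(r0 * math.cos(t), r0 * math.sin(t)) for t in (0.0, 1.0, 2.5, 4.0)]
print([round(g(x, y), 12) for x, y in circle])   # prints 1.5 four times
```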

4.2. Functions of θ only. Here are two functions that happen to depend on the polar angle θ only:

$$f(x, y) = \sin\theta, \qquad h(x, y) = \theta.$$

We can rewrite these functions in terms of x and y by using the relations between Cartesian and polar coordinates (48). We get

$$f(x, y) = \sin\theta = \frac{y}{r} = \frac{y}{\sqrt{x^2 + y^2}}$$

for f, and

$$h(x, y) = \theta = \arctan\frac{y}{x}$$

for h, at least in the right half plane where x > 0.

A function that only depends on θ is constant on rays emanating from the origin because the polar angle θ is constant on such rays. The level sets of such a function therefore consist of half-lines (“rays”) starting at the origin. Its graph consists of “spokes” attached to the z-axis. Each spoke lies above a ray in the xy-plane with some polar angle θ, and is attached to the z-axis at a height given by the function value. As we vary θ, the spoke rotates around the vertical axis and moves up or down, as dictated by the function. Figure 9 shows what happens for f(x, y) = sin θ.

Figure 9. The graph of a function z = f(θ) of θ only consists of horizontal “spokes” attached to the z-axis; each spoke lies above a “ray” in the xy-plane. The second panel shows the graph of z = sin θ (the x-axis is coming right at us).

The function z = θ has a simpler formula in polar coordinates but actually has a more complicated graph. Let us try to visualize its graph: the spokes that make up the graph are horizontal, attached to the z-axis, and are at height θ. If we increase the angle θ the spokes go up at a steady rate in a way that should remind us of a helix (see § 6 and Figure 5). Based on this description its graph should look like the surface drawn in Figure 10. The surface is called the helicoid, and it is not the graph of a function (it fails the “vertical line test”). We could have known this from the beginning, because when we described our function as h(x, y) = θ, we should have immediately asked which θ? The polar angle θ of any given point is only determined up to a multiple of 2π. The “graph” that we have drawn of the “function” z = θ reflects this. To make h(x, y) = θ into an honest function we have to say which of the many possible angles θ we choose when we are given a point. One possible choice is to always require the polar angle θ to lie between 0 and 2π (radians). More precisely, we can insist on

$$0 \le \theta < 2\pi.$$

If we do this then there is a unique angle θ for each point (x, y) in the plane other than the origin (where the polar angle is not defined). The graph of this function is shown on the right in Figure 10.
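The branch $0 \le \theta < 2\pi$ can be computed from Python's `math.atan2` by adding 2π to negative angles. The sketch below (the function name `theta_0_2pi` is ours) does this, and shows the price of choosing a branch: the function jumps by almost 2π across the positive x-axis, which is where the helicoid in Figure 10 has been cut.

```python
import math

TWO_PI = 2 * math.pi

def theta_0_2pi(x, y):
    """Polar angle of (x, y), normalized to 0 <= theta < 2*pi."""
    if x == 0 and y == 0:
        raise ValueError("the polar angle is not defined at the origin")
    theta = math.atan2(y, x)          # -pi < theta <= pi
    if theta < 0:
        theta += TWO_PI
    # For a tiny negative angle, theta + 2*pi can round up to exactly 2*pi;
    # map that case to 0 so that the result stays in [0, 2*pi).
    if theta >= TWO_PI:
        theta = 0.0
    return theta

# Just above and just below the positive x-axis the values are far apart:
print(theta_0_2pi(1.0, 1e-6), theta_0_2pi(1.0, -1e-6))   # about 1e-06 and 6.283184
```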

5. Methods of visualizing the graph of a function

5.1. Freezing a variable. If a function is not familiar, then a good strategy for drawing its graph is to “freeze a variable.” In other words, to analyze a function z = f(x, y) we pretend y is a constant: then x is the only independent variable, and we can try to draw the graph of the function z = f(x, y), now thinking of this as a function of only one variable. This graph is a curve in the xz plane (or, in three dimensions, in the vertical plane parallel to the xz plane at that value of y). We get one such curve for each choice of y. Piecing these graphs together then gives us the graph of the two-variable function z = f(x, y).

We could apply the same procedure with the roles of x and y switched, i.e. for each fixed x we try to graph z = f(x, y) as a function of the variable y only, after which we try to fit all the graphs we get for different values of x together.
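In code, freezing a variable is just partial application. Here is a minimal Python sketch, with hypothetical helper names `freeze_y` and `freeze_x`, applied to the form $f(x, y) = x^2 + y^2$ from (47a). Each frozen slice is the parabola $z = x^2 + y_0^2$, whose lowest point sits at height $y_0^2$.

```python
def freeze_y(f, y0):
    """The one-variable function x -> f(x, y0): y is frozen at y0."""
    return lambda x: f(x, y0)

def freeze_x(f, x0):
    """The one-variable function y -> f(x0, y): x is frozen at x0."""
    return lambda y: f(x0, y)

def f(x, y):
    return x**2 + y**2     # the form f from (47a)

xs = [-2, -1, 0, 1, 2]
for y0 in (-2, -1, 0, 1, 2):
    slice_y0 = freeze_y(f, y0)
    # Each slice is the parabola z = x^2 + y0^2.
    print(f"y = {y0:+d}:", [slice_y0(x) for x in xs])
```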

Figure 10. The graph of z = θ is the helicoid. It is not the graph of a function, but one can extract a function by choosing a “branch” of the function. One possible choice, drawn here on the right, is to restrict the polar angle θ to the interval 0 ≤ θ < 2π. There are many other possible choices.

5.2. Moving graphs. There is another way of visualizing a function z = f(x, y) of two variables in which we think of one of the independent variables (e.g. y) as “time.” The final picture is not one static image of a three dimensional surface, but rather a movie of a graph that is moving around in the xz plane.

If we have a function z = f(x, y), then let us think of y as time, and let us relabel it as t, so that we are looking at the function z = f(x, t). Now at each moment in time t we can think of z = f(x, t) as a function of one variable x whose graph we can try to draw, regarding it as a still image. Then, as we let time t vary, putting the still images in a sequence, we get a movie of a graph of a changing function of one variable.

For instance, if the function is (once again) the saddle surface function z = xy, then we would be considering the function z = xt. At each moment t the graph of z = xt is a line with slope t. Putting these graphs together gives a movie which begins with a line of rather negative slope; during the movie the slope increases, and in the middle of the movie our line has achieved horizontality; finally, the closing shot presents us with a line with a very positive slope. Figure 11 shows some stills from the movie.

Figure 11. The saddle movie, with stills at t = −1, −1/2, 0, 1/2, and 1 (each panel shows the graph of z = xt in the xz plane). It's about a line segment whose slope changes, even though it is otherwise stuck to the origin.

This interpretation is not very different from the procedure of “freezing the y variable.” The only real difference lies in what we do with all the separate graphs we get after we freeze a variable. In one case we try to piece them together to make a bigger drawing of a three-dimensional object, in the other we put them together to make a motion picture.
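The stills in Figure 11 can be generated by evaluating z = f(x, t) at a few sample points x for each fixed time t. The following sketch (the helper name `frame` is ours, and it computes numbers rather than drawing, to stay free of plotting libraries) confirms that each still of the saddle movie z = xt is a line through the origin with slope t.

```python
def frame(f, t, xs):
    """One still of the movie of z = f(x, t): the values f(x, t) for x in xs."""
    return [f(x, t) for x in xs]

def saddle(x, t):
    return x * t

xs = [-1.0, 0.0, 1.0]
for t in (-1.0, -0.5, 0.0, 0.5, 1.0):        # the times shown in Figure 11
    z = frame(saddle, t, xs)
    slope = (z[2] - z[0]) / (xs[2] - xs[0])  # rise over run across the frame
    print(f"t = {t:+.1f}: z = {z}, slope = {slope:+.1f}")
```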

Problems

In the problems in this stage of the course, you will be asked to “sketch the graph of a function.” From math 221 you remember that this meant you had to find minima, maxima, inflection points, and other features of the graph. In 234 you will learn to do the same for functions of two (and more) variables, but for now you should try to use the method of “freezing a variable” or other similar tricks to get an idea of what the graph of f looks like.

You can use a graphing program (such as Grapher.app on the Mac, GraphCalc on Windows, or one of the many websites such as http://www.graphycalc.com/) to check your answer.

Note: very often students try to fit their drawings into a region the size of a post-it. In this course, whenever you make a drawing, especially if it's a three-dimensional drawing, make it large! Use half a page for a drawing. Make sure you have enough paper; try to find lots of cheap scrap paper.

1. If we were to drain Lake Mendota, as suggested in § 1.4, would the lake bottom give us the graph of d(x, y) or of −d(x, y) (where d is the depth of the lake)? •

2. What are the signs of the coefficients a, b, and c for the linear function whose graph is drawn in Figure 4? •

3. About planes and their intersections with the coordinate axes.

(a) Where does the plane z = 3x − y + 6 intersect the three coordinate axes? •
(b) Find the equation for the plane that intersects the x-axis at x = 4, the y-axis at y = 2, and the z-axis at z = 3. •
(c) Find the equation for the plane that intersects the x-axis at x = a, the y-axis at y = b, and the z-axis at z = c. (Write the equation as nicely as possible.) •

4. Find a formula for the distance to the origin of the graph of (45). •

5. Classify the following quadratic forms as definite, indefinite, or other, by completing the square. Determine the zero set for each of these quadratic forms.

(a) $f(x, y) = x^2 + 2y^2$ •
(b) $Q(x, y) = x^2 - y^2$ •
(c) $g(x, y) = x^2 - 4xy + 3y^2$ •
(d) $Q(s, t) = 9s^2 - 36st + 81t^2$ •
(e) $M(\alpha, \beta) = \tfrac12\alpha^2 - \alpha\beta + \beta^2$ •
(f) $Q(x, y) = xy + y^2$ •
(g) $Q(x, y) = x^2 + 2xy$ •

6. For which values of the constant k is the quadratic form

$$Q(x, y) = x^2 + 2kxy + y^2$$

positive definite? •

7. Which functions of two variables z = f(x, y) are defined by the following formulae?



- Find and draw the domain of each function (the largest domain on which the definition would make sense).

- Try to sketch their graphs.

- Draw the level sets for each function.

(a) z = xy •

(b) $z - x^2 = 0$ •

(c) $z^2 - x = 0$ •

(d) $z - x^2 - y^2 = 0$ •

(e) $z^2 - x^2 - y^2 = 0$ •

(f) xyz = 1 •

(g) $xy/z^2 = 1$ •

(h) $x + y + z^2 = 0$

(i) $x + y + z^2 = 1$

8. The following expressions are all equal to the polar angle θ in some region of the xy-plane. Explain why the expression gives θ, and identify in which region this holds.

(a) $\theta = \arctan\frac{y}{x}$ •

(b) $\theta = \pi + \arctan\frac{y}{x}$ •

(c) $\theta = 2\pi + \arctan\frac{y}{x}$ •

(d) $\theta = \frac{\pi}{2} - \arctan\frac{x}{y}$ •

(e) $\theta = \arcsin\frac{y}{\sqrt{x^2 + y^2}}$. •

9. “The level set is always a curve. . . ” Not! If d(x, y) is the depth function of Lake Mendota (see § 1.4), then what are the level sets $d^{-1}(c)$ for c = 0, c = +24 and for c = −24 (meters)? What is the level set $d^{-1}(400)$ (meters)? •

10. Describe and explain the relation between the graph of the function y = g(x) of one variable, and the corresponding function $f(x, y) = g\big(\sqrt{x^2 + y^2}\big)$ of two variables.

What do the level sets of f(x, y) look like?

For instance, if g(x) = x, then $f(x, y) = \sqrt{x^2 + y^2}$: what is the relation between the graphs of g and f? •

11. Find the largest domain on which the following functions of two (or occasionally three) variables can be defined:

(a) $f(x, y) = \sqrt{9 - x^2} + \sqrt{y^2 - 4}$ •

(b) $f(x, y) = \arcsin(x^2 + y^2 - 2)$ •

(c) $f(x, y) = \sqrt{x}\cdot\sqrt{y}$ •

(d) $f(x, y) = \sqrt{xy}$ •

(e) $f(x, y, z) = 1/\sqrt{xyz}$

(f) $f(x, y) = \sqrt{16 - x^2 - 4y^2}$ •

12. Here are two sets of level curves with levels z = 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4. One is for a function whose graph is a cone ($z = \sqrt{x^2 + y^2}$), the other is for a paraboloid ($z = x^2 + y^2$). Which is which? Explain. •

13. Let Q be the square in the plane consisting of all points (x, y) with |x| ≤ 1, |y| ≤ 1. This problem is about the so-called distance function to Q. This function is defined as follows: f(x, y) is the distance from the point (x, y) to the point in Q nearest to (x, y).

(a) Which point in Q is nearest to $(0, \tfrac12)$? Which is closest to (0, 2)? Which is closest to (3, 4)? •

(b) Compute $f(0, \tfrac12)$, f(0, 2) and f(3, 4). •

(c) What is the zero set of f? •

(d) Draw the level sets of f at levels −1, 1, 2, and 3. Describe the general level set f(x, y) = c where c is an arbitrary number. •

(e) Give a formula for f(x, y). (It turns out to be too hard to capture the distance function in one formula. You will have to split the plane into different regions and describe f(x, y) by different formulas, according to which region (x, y) belongs to.) •

14. Describe the “movie” that goes with each of the following functions.

(a) f(x, t) = x sin t •
(b) f(x, t) = x sin 2t •
(c) f(x, t) = t sin x •
(d) f(x, t) = 2t sin x •
(e) f(x, t) = t sin 2x •
(f) $f(x, t) = (x - t)^2$ •
(g) $f(x, t) = (x - \sin t)^2$ •
(h) $f(x, t) = (x - t^2)^2$
(i) $f(x, t) = \dfrac{t^2}{1 + x^2}$
(j) $f(x, t) = \dfrac{1}{(1 + x^2)(1 + t^2)}$ •

15. Describe the movie that goes with the function

$$f(x, t) = \arctan\frac{x}{t},$$

for t > 0. The function is not defined at t = 0, but can you describe the limit of this function as t → 0? (Hint: the sign of x matters.)

16. If y = g(x) is any function of one variable, then a function of the form f(x, t) = g(x − ct) is often called a traveling wave with wave speed c and profile g. Let g be any nonconstant function of your choice and describe the movie presented by the function f(x, t) = g(x − ct) (can't choose? Then try “Agnesi's witch” $g(x) = \frac{1}{1 + x^2}$.)

The number c is called the wave speed. If c > 0, is the motion to the left or to the right? Explain. •

17. If y = g(x) is any function of one variable, then a function of the form

$$f(x, t) = \cos(\omega t)\,g(x)$$

is often called a standing wave. Let g be any nonconstant function of your choice and describe the movie presented by the function f(x, t) = cos(ωt)g(x) (can't choose? Then try “Agnesi's witch” $g(x) = \frac{1}{1 + x^2}$ again, or for this example, try g(x) = sin x.)

The number $\frac{\omega}{2\pi}$ is called the frequency of the standing wave. The function g(x) is called its profile. How long does it take before the standing wave returns to its original position, i.e. what is the smallest T > 0 for which f(x, T) = f(x, 0) for all x? Explain. •

CHAPTER 4

Derivatives

1. Interior points and continuous functions

Before diving into the calculus of partial derivatives we need to discuss certain assumptions that we shall always implicitly make about the functions in this course. The first concerns the domains of our functions. Namely:

(49) We only consider functions at interior points of their domain.

Here, by definition, a point (a, b) in the domain of a function is called an interior point if the function is also defined at all points (x, y) that lie within some small disc centered at (a, b).

Figure 1. Interior and boundary points in the domain of f: $P_1$, $P_2$, and $P_3$ are interior points in the domain. Each of these points is the center of a sufficiently small disc that is still contained in the domain. For points such as Q, that lie on the edge of the domain, any disc centered at Q will “stick out of the domain,” no matter how small the disc is chosen. If we talk about the derivative of a function at some point in its domain, then, in this course, we will always assume that we are not at an edge-point like Q.

The other standing assumption we make in this course is that

(50) all functions we consider are continuous.

We have seen the concept of continuity for functions of one variable. For functions of more variables “continuity” has a similar definition. In this course we will aim for an intuitive understanding of the concept, which can be formulated as follows.

The function z = f(x, y) is continuous at some point (a, b) if the function value f(x, y) at any point (x, y) is close to f(a, b) when (x, y) is close to (a, b).


There are many other ways of describing continuity, e.g. one can say that f is continuous at (a, b) if

$$\lim_{(x,y)\to(a,b)} f(x, y) = f(a, b).$$

To make this precise we would have to define what “$\lim_{(x,y)\to(a,b)} \dots$” means. A precise definition of “f is continuous at (a, b)” invokes ε's and δ's:

The function z = f(x, y) is continuous at some point (a, b) if for every ε > 0 there is a δ > 0 such that for every point (x, y) that lies in the disc of radius δ centered at (a, b) one has |f(x, y) − f(a, b)| < ε.

In this course we will not use the definition much, but we will occasionally appeal to the intuitive notion of “continuity.” The problems show some examples of how a function of two variables can fail to be continuous (e.g. Problem 3.1).

Now that we have dispensed with these preliminary issues, we can go on to the central topic in the first half of the semester: partial derivatives and the chain rule.

2. Partial Derivatives

The derivative f′(x) of a function of one variable, y = f(x), measures a rate of change: if we increase x by a small amount Δx then y = f(x) changes by a small amount Δy. The ratio between these two changes is approximately the derivative: $f'(x) \approx \frac{\Delta y}{\Delta x}$.

For a function z = f(x, y) of two variables there is a similar concept: if we change x and/or y by a small amount then z will also change by a small amount, and there are formulas relating the changes Δx, Δy and Δz. Because there are many different ways in which we can change x and y there are a few different formulas. We will encounter the following versions of “the derivative of f(x, y)”:

- Change only one of the variables but not the other: this leads to the so-called partial derivatives.
- Simultaneously vary both x and y: the resulting change turns out to be approximately the sum of the changes we would get if we were to vary only x or only y, respectively. This will follow from the chain rule, and the resulting formula is called the total derivative.

We begin with the partial derivatives.

2.1. Definition of Partial Derivatives. If z = f(x, y) is a function of two variables then the partial derivatives of f with respect to x and with respect to y are

$$\frac{\partial f}{\partial x}(x, y) = \lim_{\Delta x\to 0}\frac{f(x + \Delta x, y) - f(x, y)}{\Delta x} \tag{51}$$

and

$$\frac{\partial f}{\partial y}(x, y) = \lim_{\Delta y\to 0}\frac{f(x, y + \Delta y) - f(x, y)}{\Delta y}. \tag{52}$$

The following more convenient notation is used very often (because it's so much shorter):

$$f_x(x, y) = \frac{\partial f}{\partial x}(x, y), \qquad f_y(x, y) = \frac{\partial f}{\partial y}(x, y). \tag{53}$$

When we are in a hurry we can also drop the “(x, y)” from our notation for derivatives and just write $f_x$ and $f_y$.
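The limits in (51) and (52) can be watched numerically by evaluating the difference quotients for shrinking Δx and Δy. In the sketch below the test function $f(x, y) = x^2y$ is our own choice (its partial derivatives are $f_x = 2xy$ and $f_y = x^2$), as are the helper names `dq_x` and `dq_y`. At the point (1, 2) the x-quotient works out to exactly 4 + 2Δx (up to rounding), so it approaches $f_x(1, 2) = 4$, while the y-quotient equals $f_y(1, 2) = 1$ for every Δy.

```python
def f(x, y):
    return x**2 * y          # illustrative choice: f_x = 2xy, f_y = x^2

def dq_x(f, x, y, dx):
    """Difference quotient from (51), before the limit dx -> 0 is taken."""
    return (f(x + dx, y) - f(x, y)) / dx

def dq_y(f, x, y, dy):
    """Difference quotient from (52), before the limit dy -> 0 is taken."""
    return (f(x, y + dy) - f(x, y)) / dy

# At (x, y) = (1, 2) we expect f_x = 4 and f_y = 1.
for h in (0.1, 0.01, 0.001):
    print(h, dq_x(f, 1.0, 2.0, h), dq_y(f, 1.0, 2.0, h))
```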


Figure 2. The partial derivatives of a function at some point (x, y) measure how fast the function f(x, y) changes if we move the point either horizontally (the x direction) or vertically (the y direction): ∂f/∂x is the rate of change of f in the horizontal direction, and ∂f/∂y is the rate of change of f in the vertical direction. When we define the partial derivatives at some point (x, y), we assume that the function is defined on some sufficiently small disc centered at that point (x, y).

2.2. Partial derivatives of functions of three or more variables. If a function depends on three or more variables then one can define its partial derivatives in the same way as for functions of two variables. For instance, if w = f(x, y, z) is a function of three variables, then its partial derivative with respect to x is defined to be

$$\frac{\partial f}{\partial x} = \lim_{\Delta x\to 0}\frac{f(x + \Delta x, y, z) - f(x, y, z)}{\Delta x}.$$

The derivatives of f with respect to y and z have very similar definitions.

2.3. Examples. Computing partial derivatives is not harder than computing ordinary derivatives. To find the partial derivative of a function with respect to x we just pretend all other variables are constants and differentiate. Or, in other words, we could think of the partial derivative of f(x, y) with respect to x as the ordinary derivative of the function f in which we have frozen the variable y at some particular value.

For instance, the partial derivatives of the function $f(x, y, z) = x^2\sin\pi y + z$ of three variables x, y, and z, are

$$f_x = 2x\sin\pi y, \qquad f_y = \pi x^2\cos\pi y, \qquad f_z = 1.$$
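A quick way to double-check hand computations like these is to compare them with difference quotients. The sketch below uses a central difference $\frac{f(\dots, x_i + h, \dots) - f(\dots, x_i - h, \dots)}{2h}$, a standard and usually more accurate variant of the one-sided quotient in the definition, together with a hypothetical helper `partial` that holds every variable except the i-th one fixed, as in § 2.2. The sample point and step size are arbitrary choices.

```python
import math

def partial(f, point, i, h=1e-6):
    """Central-difference estimate of the partial derivative of f with respect to
    its i-th variable at `point`; every other variable is held fixed."""
    forward = list(point)
    backward = list(point)
    forward[i] += h
    backward[i] -= h
    return (f(*forward) - f(*backward)) / (2 * h)

def f(x, y, z):
    return x**2 * math.sin(math.pi * y) + z

def f_x(x, y, z):
    return 2 * x * math.sin(math.pi * y)

def f_y(x, y, z):
    return math.pi * x**2 * math.cos(math.pi * y)

def f_z(x, y, z):
    return 1.0

p = (1.3, 0.2, -0.7)    # an arbitrary sample point
for i, exact in enumerate((f_x, f_y, f_z)):
    print(i, partial(f, p, i), exact(*p))
```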

3. Problems

1. For each of the following functions sketch the graph (use a graphing program, if necessary) and decide if you think the function has a limit as (x, y) approaches (0, 0).

(a) $f(x, y) = \dfrac{xy}{x^2 + y^2}$

(b) $g(x, y) = \dfrac{1}{x^2 + y^2}$

(c) $h(x, y) = \dfrac{x}{x^2 + y^2}$.

(d) $p(x, y) = \dfrac{x}{\sqrt{x^2 + y^2}}$.

(e) $q(x, y) = \dfrac{x^2}{\sqrt{x^2 + y^2}}$.

2. Find the partial derivatives of the following functions:

(a) $f(x, y) = x^2y^3 - x^3y^2$.

(b) $f(x, y) = \cos(x^2y) + y^3$. •

(c) $f(x, y) = \dfrac{xy}{x^2 + y}$. •

(d) $f(x, t) = (x + t)^4$.

(e) $f(x, t) = (x - t)^4$.

(f) $f(x, t) = \sin\omega t\,\cos\dfrac{2\pi x}{L}$.


(g) $f(x, y) = e^{x^2 + y^2}$. •

(h) f(x, y) = xy ln(xy). •

(i) $f(x, y) = \sqrt{1 - x^2 - y^2}$. •

(j) $f(x, y, z) = \sqrt{x^2 + y^2 + z^2}$

(k) $f(u, v) = e^{u + v}$

(l) f(x, y) = x tan(y). •

(m) $f(x, y) = \dfrac{1}{xy}$. •

3. Let r be the radius in polar coordinates, as defined in § 4 of Chapter III.

(a) Compute the partial derivatives of r.

(b) Show that the partial derivatives of r can be written as

$$\frac{\partial r}{\partial x} = \frac{x}{r}, \qquad \frac{\partial r}{\partial y} = \frac{y}{r}.$$

4. Let θ be the polar angle function, defined in § 4.2 of Chapter III.

(a) In the right half plane the function θ is defined by

$$\theta(x, y) = \arctan\frac{y}{x}.$$

Use this expression to find its partial derivatives, $\frac{\partial\theta}{\partial x}$ and $\frac{\partial\theta}{\partial y}$. •

(b) Check that the angle function also satisfies

$$x\sin\theta = y\cos\theta$$

at all points in the plane. Use implicit differentiation to find the partial derivatives $\frac{\partial\theta}{\partial x}$ and $\frac{\partial\theta}{\partial y}$.

5. Let f(x, y) = the distance from (x, y) to the origin. Find a formula for f, and compute

$$f_x, \quad f_y, \quad\text{and}\quad \sqrt{f_x^2 + f_y^2}.$$

(Hint: compare this problem with problem 3.3.) •

6. Suppose f(t) and g(t) are single variable differentiable functions. Find ∂z/∂x and ∂z/∂y for each of the following two variable functions.

(a) z = f(x)g(y) •

(b) z = f(xy) •

(c) z = f(x/y) •

7. Let f be the distance function to the square Q from problem 5.13. Find the partial derivatives $f_x$ and $f_y$ of f. (You will need your answer to problem 5.13, in particular the description of f as a “piecewise defined function”.)

4. The linear approximation to a function

4.1. The Chain Rule and friends. When we compute the partial derivative of a function with respect to a variable x we pretend all other variables are constants, and just differentiate with respect to x, just as we would in first semester calculus. There is therefore no need to state a product rule or quotient rule, because these are exactly the same as for functions of one variable. The chain rule on the other hand is different: there is a chain rule for functions of several variables, but it has more terms than the chain rule from one-variable calculus. There are several related topics that fit together in a discussion of the chain rule, namely Linear Approximation, Tangent Planes to a Graph, and The Total Derivative. We will go through these one at a time in the next few sections.

4.2. The linear approximation formula. The key to the chain rule is the linear approximation formula. This formula tells us approximately how much a function z = f(x, y) of two variables changes if both variables are subjected to a small change.

More precisely, if we have a function z = f(x, y), and we know its value $f(x_0, y_0)$ at some point $(x_0, y_0)$, then how much does the function value change if x is increased from $x_0$ to $x_0 + \Delta x$, and if y is similarly increased from $y_0$ to $y_0 + \Delta y$?


Figure 3. Computation of the linear approximation (54). We can change $(x_0, y_0)$ to $(x_0 + \Delta x, y_0 + \Delta y)$ in two steps: first keep y fixed and increase x by Δx, then keep x fixed and increase y by Δy. To express the change in function values in terms of derivatives, we can use the Mean Value Theorem. We get two intermediate points: $(\bar x, y_0)$, for the increase in f when x changes, and $(x_0 + \Delta x, \bar y)$, for the increase in f when y changes.

The basic idea in the computation of the change in f(x, y) is to go from $(x_0, y_0)$ to $(x_0 + \Delta x, y_0 + \Delta y)$ in two steps:

$$\begin{aligned}
\Delta f &= f(x_0 + \Delta x, y_0 + \Delta y) - f(x_0, y_0)\\
&= \underbrace{f(x_0 + \Delta x, y_0 + \Delta y) - f(x_0 + \Delta x, y_0)}_{\text{only } y \text{ changes}} + \underbrace{f(x_0 + \Delta x, y_0) - f(x_0, y_0)}_{\text{only } x \text{ changes}}
\end{aligned} \tag{54}$$

We have written the total change in f as the sum of two changes, one of them caused by the change in x, and the other due to the change in y. See Figure 3.

In the second difference only x changes while y remains the same, so we can use the one variable Mean Value Theorem to conclude that there is some number $\bar x$ between $x_0$ and $x_0 + \Delta x$ with

$$\frac{f(x_0 + \Delta x, y_0) - f(x_0, y_0)}{\Delta x} = f_x(\bar x, y_0),$$

i.e.

$$f(x_0 + \Delta x, y_0) - f(x_0, y_0) = f_x(\bar x, y_0)\cdot\Delta x. \tag{55}$$

Likewise, in the difference in (54) where only y changes we can use the Mean Value Theorem to conclude that there is some $\bar y$ between $y_0$ and $y_0 + \Delta y$ such that

$$\frac{f(x_0 + \Delta x, y_0 + \Delta y) - f(x_0 + \Delta x, y_0)}{\Delta y} = f_y(x_0 + \Delta x, \bar y),$$

and hence

$$f(x_0 + \Delta x, y_0 + \Delta y) - f(x_0 + \Delta x, y_0) = f_y(x_0 + \Delta x, \bar y)\cdot\Delta y. \tag{56}$$

If we now combine (55) and (56) with (54) then we get

$$\Delta f = f_x(\bar x, y_0)\cdot\Delta x + f_y(x_0 + \Delta x, \bar y)\cdot\Delta y.$$

This equation is exactly true, i.e. we have not made any approximations, and we have not ignored any kind of “error terms.” However, the equation does contain the numbers $\bar x$ and $\bar y$, which are provided by the Mean Value Theorem, and of which we therefore do not know anything besides the fact that $\bar x$ lies between $x_0$ and $x_0 + \Delta x$, and $\bar y$ lies between $y_0$ and $y_0 + \Delta y$. We can get rid of this uncertainty by settling for an approximation for Δf instead of the exact expression we have just found. To do this we assume that Δx and Δy are “small.” Then, since $\bar x$ lies between $x_0$ and $x_0 + \Delta x$, we know that $\bar x \approx x_0$, so, if the function $f_x$ is continuous, then it seems reasonable to assume that

$$f_x(\bar x, y_0) \approx f_x(x_0, y_0). \tag{57}$$

Similarly, since $x_0 + \Delta x \approx x_0$ and $\bar y \approx y_0$, we will assume (if $f_y$ is continuous) that

$$f_y(x_0 + \Delta x, \bar y) \approx f_y(x_0, y_0). \tag{58}$$

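Substituting these approximations into the exact expression for Δf we find

$$\Delta f \approx f_x(x_0, y_0)\,\Delta x + f_y(x_0, y_0)\,\Delta y. \tag{59}$$

Keeping in mind that $\Delta f = f(x_0 + \Delta x, y_0 + \Delta y) - f(x_0, y_0)$, we conclude

$$f(x_0 + \Delta x, y_0 + \Delta y) \approx f(x_0, y_0) + f_x(x_0, y_0)\,\Delta x + f_y(x_0, y_0)\,\Delta y. \tag{60}$$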

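The linear approximation formula (60) is often written using Leibniz-style notation for the derivatives, where one writes $\frac{\partial f}{\partial x}$ for $f_x$, and $\frac{\partial f}{\partial y}$ for $f_y$. In this notation the approximation formula takes these forms:

$$f(x_0 + \Delta x, y_0 + \Delta y) \approx f(x_0, y_0) + \frac{\partial f}{\partial x}(x_0, y_0)\cdot\Delta x + \frac{\partial f}{\partial y}(x_0, y_0)\cdot\Delta y,$$

or, shorter,

$$\Delta f \approx \frac{\partial f}{\partial x}\,\Delta x + \frac{\partial f}{\partial y}\,\Delta y. \tag{61}$$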

The approximation (60) can also be written without Δx and Δy by a change of notation. To do this we introduce

$$x = x_0 + \Delta x \quad\text{and}\quad y = y_0 + \Delta y, \tag{62}$$

and interpret (60) as a formula that tells us approximately what the function value at (x, y) is, provided (x, y) is close enough to $(x_0, y_0)$. Written in terms of x and y, (60) says

$$f(x, y) \approx f(x_0, y_0) + f_x(x_0, y_0)\,(x - x_0) + f_y(x_0, y_0)\,(y - y_0). \tag{63}$$
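Formula (63) translates directly into code. The sketch below (the helper name `linear_approx` is ours) builds the right-hand side of (63) from f, $f_x$, $f_y$ and a base point, and applies it to the saddle function f(x, y) = xy at the arbitrarily chosen point (3, −1). For this particular f the error can be computed by hand: $(x_0 + \Delta x)(y_0 + \Delta y) - (x_0y_0 + y_0\Delta x + x_0\Delta y) = \Delta x\,\Delta y$, so with Δx = h and Δy = 2h the error should be $2h^2$, which shrinks much faster than h.

```python
def linear_approx(f, fx, fy, x0, y0):
    """Right-hand side of (63), as a function of (x, y)."""
    f0, a, b = f(x0, y0), fx(x0, y0), fy(x0, y0)
    return lambda x, y: f0 + a * (x - x0) + b * (y - y0)

def f(x, y):
    return x * y     # saddle function

def fx(x, y):
    return y         # its partial derivative with respect to x

def fy(x, y):
    return x         # its partial derivative with respect to y

x0, y0 = 3.0, -1.0
L = linear_approx(f, fx, fy, x0, y0)
for h in (0.1, 0.01, 0.001):
    x, y = x0 + h, y0 + 2 * h
    error = f(x, y) - L(x, y)
    print(h, error, 2 * h * h)   # the last two columns agree up to rounding
```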

4.3. Linear approximation – infinitesimal version. We expect the approximation in (61) to improve as we decrease Δx and Δy (and we will try to make this statement more precise in the next section, § 4.4). We could then say, as is commonly done, that there is an exact equation when Δx and Δy are “infinitely small,” and write this equation as

$$df = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy. \tag{64}$$

The meaning of this equation is that infinitesimally small changes in x and y, of magnitudes dx and dy, respectively, lead to an infinitesimally small change in f of magnitude df, and that df, dx, and dy are related by (64). Even though it is very difficult to make sense of the “infinitely small” quantities dx, dy, df in (64), this notation is widely used, because the make-believe it entails allows one to ignore the more awkward error terms that we will now discuss.


4.4. The linear approximation formula with error term. In our computation of the change Δf of the function we approximated $f_x(\bar x, y_0)$ by $f_x(x_0, y_0)$, and $f_y(x_0 + \Delta x, \bar y)$ by $f_y(x_0, y_0)$. As a result our linear approximation formula (60) is not an exact equation, but only says that one thing is “approximately equal” to another.

We can make this a bit more precise by including error terms, i.e. by saying that there are small numbers $e_x$ and $e_y$ such that

$$f_x(\bar x, y_0) = f_x(x_0, y_0) + e_x, \qquad f_y(x_0 + \Delta x, \bar y) = f_y(x_0, y_0) + e_y.$$

Here $e_x$ and $e_y$ depend on Δx and Δy, and as both Δx and Δy go to zero, the errors $e_x$ and $e_y$ will also go to zero (this is where we use that $f_x$ and $f_y$ are continuous).

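Putting this in (54) we get the linear approximation formula with error terms:

$$f(x_0 + \Delta x, y_0 + \Delta y) = \underbrace{f(x_0, y_0) + f_x(x_0, y_0)\,\Delta x + f_y(x_0, y_0)\,\Delta y}_{\text{linear approximation}} + \underbrace{e_x\,\Delta x + e_y\,\Delta y}_{\text{error}} \tag{65}$$

in which $e_x$ and $e_y$ depend on Δx, Δy, and satisfy

$$\lim_{\Delta x,\Delta y\to 0} e_x = \lim_{\Delta x,\Delta y\to 0} e_y = 0.$$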

If we ignore the “error term” then we recover the linear approximation formula (60). Our more precise linear approximation formula (65) tells us that the error in (60) (the difference between the left and right hand sides) is given by $e_x\Delta x + e_y\Delta y$, and that this error is “small” compared to Δx and Δy. We could write this as

$$\text{Error in the approximation} = e_x\,\Delta x + e_y\,\Delta y = o(\Delta x) + o(\Delta y).$$
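As a worked illustration (our own example, not taken from the notes), take $f(x, y) = x^2 + y^2$, so that $f_x = 2x$ and $f_y = 2y$. For this f the Mean Value Theorem points can be found explicitly, and the error terms come out as $e_x = \Delta x$ and $e_y = \Delta y$, which do go to zero:

$$\begin{aligned}
f(x_0 + \Delta x, y_0) - f(x_0, y_0) &= 2x_0\,\Delta x + (\Delta x)^2 = 2\bar x\,\Delta x, \qquad \bar x = x_0 + \tfrac12\Delta x,\\
f(x_0 + \Delta x, y_0 + \Delta y) - f(x_0 + \Delta x, y_0) &= 2y_0\,\Delta y + (\Delta y)^2 = 2\bar y\,\Delta y, \qquad \bar y = y_0 + \tfrac12\Delta y,\\
e_x = f_x(\bar x, y_0) - f_x(x_0, y_0) &= 2\bar x - 2x_0 = \Delta x,\\
e_y = f_y(x_0 + \Delta x, \bar y) - f_y(x_0, y_0) &= 2\bar y - 2y_0 = \Delta y,\\
e_x\,\Delta x + e_y\,\Delta y &= (\Delta x)^2 + (\Delta y)^2.
\end{aligned}$$

So in this example the error in the linear approximation is exactly $(\Delta x)^2 + (\Delta y)^2$, which is indeed small compared to Δx and Δy.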

5. �e tangent plane to a graph

5.1. The tangent plane. For a function $z = f(x, y)$ and a point $(x_0, y_0)$ the linear approximation (63) gives us an approximation for the function $f$ at any other point $(x, y)$ near $(x_0, y_0)$. It says

$$
z \approx f(x_0, y_0) + f_x(x_0, y_0)(x - x_0) + f_y(x_0, y_0)(y - y_0).
$$

If we replace "≈" by equality, then we get a new function of $(x, y)$:

$$
z = f(x_0, y_0) + f_x(x_0, y_0)(x - x_0) + f_y(x_0, y_0)(y - y_0). \tag{66}
$$

Keeping in mind that $f(x_0, y_0)$, $f_x(x_0, y_0)$, and $f_y(x_0, y_0)$ are constants, while only $(x, y)$ are variables here, we see that this is the equation for a plane, which we call the tangent plane to the graph of $f$ at the point $(x_0, y_0, f(x_0, y_0))$.

5.2. Example: tangent plane to the saddle surface at the origin. Find the equation for the tangent plane to the saddle surface $z = xy$ at the origin.

Solution: The saddle surface is the graph of the function $f(x, y) = xy$. To find the tangent plane at $x_0 = 0$, $y_0 = 0$, we compute the partial derivatives,

$$
f_x(x, y) = \frac{\partial (xy)}{\partial x} = y, \quad\text{so at } (x_0, y_0) = (0, 0) \text{ we have } f_x(0, 0) = 0,
$$

and

$$
f_y(x, y) = \frac{\partial (xy)}{\partial y} = x, \quad\text{so at } (x_0, y_0) = (0, 0) \text{ we have } f_y(0, 0) = 0.
$$

56 4. DERIVATIVES

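[Figure 4, top panel "Linear approximation to the graph of $z = f(x, y)$": increments $f_x\Delta x$, $f_y\Delta y$ and $f_x\Delta x + f_y\Delta y$ over the rectangle from $(x, y)$ to $(x + \Delta x, y + \Delta y)$]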

Figure 4. Top: The graph of the linear approximation of $f$ (the graph of $f$ itself is not shown; see the bottom figure). If we increase $x$ by $\Delta x$, then $f$ will increase by approximately $f_x\Delta x$, and if we increase $y$ by $\Delta y$, then $f$ increases by approximately $f_y\Delta y$. If we increase $x$ and $y$ by $\Delta x$ and $\Delta y$ at the same time, then $f$ increases by roughly $f_x\Delta x + f_y\Delta y$. The vertical dotted line behind the parallelogram represents this increase in $f$.

Bottom: The graph of a function, and of its tangent plane at some point $(x_0, y_0, z_0)$. The tangent plane is the graph of the linear approximation to $f$.

Moreover, we also have $f(x_0, y_0) = f(0, 0) = 0$, so that the equation for the tangent plane is

$$
z = 0 + 0\cdot(x - 0) + 0\cdot(y - 0) = 0,
$$

i.e., $z = 0$.

The tangent plane at the origin is just the $xy$-plane.

5.3. Example: another tangent plane to the saddle surface. Find the equation for the tangent plane to the saddle surface $z = xy$ at the point $(2, 1, 2)$. Where does this plane intersect the coordinate axes?

Solution: This is almost the same problem as before. The only difference is that we are trying to find the tangent plane at a point other than the origin. To get the tangent plane at the point $(x_0, y_0) = (2, 1)$ we compute the derivatives

$$
f_x(x, y) = y \implies f_x(2, 1) = 1,
$$


Figure 5. The graph of z = xy and the tangent plane at the origin.

and

$$
f_y(x, y) = x \implies f_y(2, 1) = 2.
$$

The equation for the tangent plane is therefore

$$
\begin{aligned}
z &= x_0 y_0 + y_0(x - x_0) + x_0(y - y_0) \\
  &= 2 + 1\cdot(x - 2) + 2\cdot(y - 1) \\
  &= -2 + x + 2y.
\end{aligned} \tag{67}
$$

The intersections with the $x$, $y$ and $z$ axes are, respectively, $(2, 0, 0)$, $(0, 1, 0)$, and $(0, 0, -2)$.
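The computation in §5.3 can be checked numerically. In the sketch below, which is our own illustration, $f_x$ and $f_y$ are estimated with central difference quotients instead of the exact derivatives. The sketch then builds the tangent plane (66) in the form $z = ax + by + c$ and reads off where it meets the coordinate axes. The helper names `partials`, `tangent_plane` and `axis_intercepts` are hypothetical.

```python
def saddle(x, y):
    return x * y  # the saddle surface z = xy


def partials(func, x0, y0, h=1e-6):
    """Central-difference estimates of f_x(x0, y0) and f_y(x0, y0)."""
    fx = (func(x0 + h, y0) - func(x0 - h, y0)) / (2 * h)
    fy = (func(x0, y0 + h) - func(x0, y0 - h)) / (2 * h)
    return fx, fy


def tangent_plane(func, x0, y0):
    """Coefficients (a, b, c) of the tangent plane z = a*x + b*y + c, cf. (66)."""
    a, b = partials(func, x0, y0)
    c = func(x0, y0) - a * x0 - b * y0
    return a, b, c


def axis_intercepts(a, b, c):
    """Points where z = a*x + b*y + c meets the x, y and z axes (a, b nonzero)."""
    return (-c / a, 0.0, 0.0), (0.0, -c / b, 0.0), (0.0, 0.0, c)


if __name__ == "__main__":
    a, b, c = tangent_plane(saddle, 2.0, 1.0)
    print(f"z = {a:.6f} x + {b:.6f} y + {c:.6f}")
    print(axis_intercepts(a, b, c))
```

For $f(x, y) = xy$ the difference quotients are exact up to rounding. The sketch should therefore reproduce $z = x + 2y - 2$ and the intercepts $(2, 0, 0)$, $(0, 1, 0)$, $(0, 0, -2)$ to about six digits.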

5.4. Example: tangent plane to a sphere. The point $(x_0, y_0, z_0)$ lies on the upper half of the sphere with radius 4 centered at the origin. Find an equation for the tangent plane to the sphere at that point, if $x_0 = 1$ and $y_0 = 3$.

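Solution: The equation for the sphere is $x^2 + y^2 + z^2 = 4^2 = 16$, so the upper half is the graph of the function

$$
f(x, y) = \sqrt{16 - x^2 - y^2}.
$$

The $z$ coordinate of the given point is therefore $z_0 = \sqrt{16 - 1^2 - 3^2} = \sqrt{6}$. The partial derivatives of $f$ at $(x_0, y_0) = (1, 3)$ are

$$
\frac{\partial f}{\partial x} = \frac{-x_0}{\sqrt{16 - x_0^2 - y_0^2}} = -\frac{1}{\sqrt6}, \qquad
\frac{\partial f}{\partial y} = \frac{-y_0}{\sqrt{16 - x_0^2 - y_0^2}} = -\frac{3}{\sqrt6}.
$$

The equation for the tangent plane is then

$$
\begin{aligned}
z &= \sqrt6 - \frac{1}{\sqrt6}(x - 1) - \frac{3}{\sqrt6}(y - 3) \\
  &= \frac{16}{\sqrt6} - \frac{x}{\sqrt6} - \frac{3y}{\sqrt6}.
\end{aligned}
$$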
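The following Python sketch double-checks the algebra in §5.4. It is our own check and uses only the standard `math` module. It evaluates the unsimplified tangent plane $z_0 + f_x(x - 1) + f_y(y - 3)$ and the simplified form $(16 - x - 3y)/\sqrt6$ at a few sample points, and the two should agree up to rounding.

```python
import math

X0, Y0, RADIUS = 1.0, 3.0, 4.0

# height of the point on the upper hemisphere: sqrt(16 - 1 - 9) = sqrt(6)
Z0 = math.sqrt(RADIUS**2 - X0**2 - Y0**2)
# partial derivatives of f(x, y) = sqrt(16 - x^2 - y^2) at (1, 3)
FX = -X0 / Z0
FY = -Y0 / Z0


def plane_unsimplified(x, y):
    return Z0 + FX * (x - X0) + FY * (y - Y0)


def plane_simplified(x, y):
    return (16 - x - 3 * y) / math.sqrt(6)


if __name__ == "__main__":
    for x, y in [(1, 3), (0, 0), (2.5, -1)]:
        print(x, y, plane_unsimplified(x, y), plane_simplified(x, y))
```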


6. The Two Variable Chain Rule

6.1. The chain rule. Given two functions $x = x(t)$, $y = y(t)$ of one variable, and a function $z = f(x, y)$ of two variables, what is the derivative of the function

$$
g(t) = f\big(x(t), y(t)\big)?
$$

We can find a general formula for $g'(t)$ by using the linear approximation (§ 4) in the following way.

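To find $g'(t_0)$ for some $t_0$, we must compute

$$
\frac{g(t_0 + \Delta t) - g(t_0)}{\Delta t}
$$

and let $\Delta t \to 0$.

If $t$ increases by an amount $\Delta t$ from $t_0$ to $t_0 + \Delta t$, then $x$ and $y$ will also change. We write $\Delta x$ and $\Delta y$ for the changes in $x$ and $y$, i.e.

$$
\Delta x = x(t_0 + \Delta t) - x_0, \qquad \Delta y = y(t_0 + \Delta t) - y_0,
$$

where $x_0 = x(t_0)$ and $y_0 = y(t_0)$. The resulting change in $g$ is thus

$$
\begin{aligned}
\Delta g &= g(t_0 + \Delta t) - g(t_0) \\
         &= f\big(x(t_0 + \Delta t), y(t_0 + \Delta t)\big) - f\big(x(t_0), y(t_0)\big) \\
         &= f(x_0 + \Delta x, y_0 + \Delta y) - f(x_0, y_0).
\end{aligned}
$$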

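By the linear approximation formula (65) one then has

$$
\frac{\Delta g}{\Delta t} = f_x(x_0, y_0)\frac{\Delta x}{\Delta t} + f_y(x_0, y_0)\frac{\Delta y}{\Delta t} + e_x\frac{\Delta x}{\Delta t} + e_y\frac{\Delta y}{\Delta t}.
$$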

As we let $\Delta t \to 0$ the quotients $\Delta x/\Delta t$ and $\Delta y/\Delta t$ converge to $x'(t_0)$ and $y'(t_0)$. Also $\Delta x, \Delta y \to 0$, so the errors $e_x$ and $e_y$ converge to zero. We therefore get the two-variable chain rule:

$$
\frac{d f\big(x(t), y(t)\big)}{dt} = f_x(x_0, y_0)\cdot x'(t_0) + f_y(x_0, y_0)\cdot y'(t_0). \tag{68}
$$

The chain rule is often also written as

$$
\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}. \tag{69}
$$

This form becomes easy to remember if we interpret the first term as "the change in $f$ caused by the change in $x$" and the second term as "the change in $f$ caused by the change in $y$."

In the way (69) is written a number of details are swept under the rug. The two derivatives $\frac{dx}{dt}$ and $\frac{dy}{dt}$ are ordinary (Math 221) derivatives of the two functions $x(t)$ and $y(t)$. The two partial derivatives $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ are the partial derivatives of $f$ in which one has substituted $x(t)$ and $y(t)$. A more correct way of writing the equation would be

$$
\frac{d f\big(x(t), y(t)\big)}{dt} = \frac{\partial f}{\partial x}\big(x(t), y(t)\big)\cdot x'(t) + \frac{\partial f}{\partial y}\big(x(t), y(t)\big)\cdot y'(t). \tag{70}
$$

Many people find (69) easier on the eyes, so that is what we will usually write.
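Formula (70) is easy to test numerically. In the Python sketch below, $f(x, y) = x^2 y + \sin y$, $x(t) = \cos t$ and $y(t) = t^2$ are arbitrary choices of ours. The sketch compares the right-hand side of (70) with a central difference quotient of $g(t) = f(x(t), y(t))$.

```python
import math

# Arbitrary sample choices (ours, not from the notes):
# f(x, y) = x^2 y + sin y,  x(t) = cos t,  y(t) = t^2


def f(x, y):
    return x**2 * y + math.sin(y)


def f_x(x, y):
    return 2 * x * y


def f_y(x, y):
    return x**2 + math.cos(y)


def x_of(t):
    return math.cos(t)


def y_of(t):
    return t**2


def chain_rule(t):
    """Right-hand side of (70): f_x(x(t), y(t)) x'(t) + f_y(x(t), y(t)) y'(t)."""
    x, y = x_of(t), y_of(t)
    dx_dt, dy_dt = -math.sin(t), 2 * t
    return f_x(x, y) * dx_dt + f_y(x, y) * dy_dt


def difference_quotient(t, h=1e-6):
    """Central difference quotient of g(t) = f(x(t), y(t))."""
    def g(s):
        return f(x_of(s), y_of(s))
    return (g(t + h) - g(t - h)) / (2 * h)


if __name__ == "__main__":
    for t in [0.3, 1.0, 2.0]:
        print(t, chain_rule(t), difference_quotient(t))
```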


6.2. The difference between $d$ and $\partial$. Compare (69) with the linear approximation formula (64) with infinitely small quantities. Equation (69) is just (64) in which one has divided both sides by $dt$. Equation (64) contains the strange "infinitely small quantities" $dx$, $dy$, $df$. In contrast, equation (69) contains the derivatives $\frac{dx}{dt}$, etc., which are well-defined.

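Note that we have a breakdown of Leibniz's notation. Suppose we ignore the distinction between "$d$" and "$\partial$", and just cancel $dx$ against $\partial x$, and also $dy$ against $\partial y$ on the right. Then we end up with

$$
\frac{df}{dt} = \frac{\partial f}{\cancel{\partial x}}\,\frac{\cancel{dx}}{dt} + \frac{\partial f}{\cancel{\partial y}}\,\frac{\cancel{dy}}{dt} = \frac{\partial f}{dt} + \frac{\partial f}{dt} = 2\,\frac{\partial f}{dt},
$$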

which doesn't make a lot of sense. The moral: don't cancel $dx$ against $\partial x$!

6.3. An example. Suppose $x(t) = \cos\omega t$ and $y(t) = \sin\omega t$, so that $\vec x(t) = x(t)\,\vec e_1 + y(t)\,\vec e_2$ traces out the unit circle.

How fast does $S(t) = 2x(t) + 3y(t)$ change along this motion? In other words, what can we say about $\frac{dS}{dt}$?

The quantity $S(t)$ is the composition of a function of two variables with the functions $x(t)$ and $y(t)$, i.e. it is the result of substituting $x(t)$ and $y(t)$ in the function $f(x, y) = 2x + 3y$.

Answer 1 – without using the chain rule. We can simply compute $S(t) = 2\cos\omega t + 3\sin\omega t$ and differentiate:

$$
\frac{dS}{dt} = \frac{d}{dt}\big\{2\cos\omega t + 3\sin\omega t\big\} = -2\omega\sin\omega t + 3\omega\cos\omega t. \tag{71}
$$

Note that we did not use our new two-variable chain rule here. This answer shows that the point of the two-variable chain rule is not to compute $\frac{d}{dt} f(x(t), y(t))$ in situations where we have formulas for the functions $f(x, y)$, $x(t)$, and $y(t)$. In such a situation we can always substitute $x(t)$ and $y(t)$ in the function $f(x, y)$, after which we get a function $S(t) = f(x(t), y(t))$ of one variable. We learned how to differentiate those in our first calculus course.

Answer 2 – using the chain rule. The quantity we want to differentiate is

$$
S(t) = f\big(x(t), y(t)\big),
$$

where $f(x, y) = 2x + 3y$, and $x(t) = \cos\omega t$, $y(t) = \sin\omega t$.

The chain rule tells us that

$$
\frac{dS}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}. \tag{72}
$$

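Here the first term stands for the change in $S$ that is caused by the change in $x$. To compute it we first find

$$
\frac{\partial f}{\partial x} = \frac{\partial\{2x + 3y\}}{\partial x} = 2, \quad\text{so that}\quad \frac{\partial f}{\partial x}\frac{dx}{dt} = 2\cdot\frac{dx}{dt}.
$$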

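Similarly, the second term in (72) represents the change in $S(t)$ due to the fact that $y$ is changing:

$$
\frac{\partial f}{\partial y} = \frac{\partial\{2x + 3y\}}{\partial y} = 3 \implies \frac{\partial f}{\partial y}\frac{dy}{dt} = 3\cdot\frac{dy}{dt}.
$$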


To get the rate of change of $S$ we add both the $x$ and $y$ contributions to this rate of change, which leads us to

$$
\frac{dS}{dt} = 2\cdot\frac{dx}{dt} + 3\cdot\frac{dy}{dt}. \tag{73}
$$

So far we have not used what we know about $x(t)$ and $y(t)$. The expression we have just derived for $dS/dt$ is true no matter which $x(t)$, $y(t)$ we are given. In our case we have

$$
x(t) = \cos\omega t \implies \frac{dx}{dt} = -\omega\sin\omega t, \qquad
y(t) = \sin\omega t \implies \frac{dy}{dt} = +\omega\cos\omega t.
$$

Substitute this in (73):

$$
\frac{dS}{dt} = -2\omega\sin\omega t + 3\omega\cos\omega t,
$$

as before.

The moral: In this example the answer using the chain rule was longer, much more verbose, and perhaps more complicated than the straightforward computation that led to our first answer (71). Indeed, if the derivative of $S$ is all we want then our first computation is the most efficient way of getting $dS/dt$. However, the computation using the chain rule did give us some useful intermediate results, such as the general expression (73) for $dS/dt$. This expression remains valid if we change the path $(x(t), y(t))$. It can therefore be useful in situations where we are allowed to choose the path and would like one for which $dS/dt$ has some prescribed value. For example, if we want to keep $S$ constant, how should we choose the path?
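The point that (73) holds for every path can be seen in a short sketch of our own. It evaluates (73) for the circular path and compares the result with (71). It then checks that a straight path with velocity $(3, -2)$, which is perpendicular to $(2, 3)$, gives $dS/dt = 0$. That is one answer to the closing question: since $S = 2x + 3y$ is linear, such a path keeps $S$ exactly constant.

```python
import math


def dS_dt(dx_dt, dy_dt):
    """Formula (73): valid for any path (x(t), y(t))."""
    return 2 * dx_dt + 3 * dy_dt


def circle_velocity(t, omega):
    """(x'(t), y'(t)) for x = cos(omega t), y = sin(omega t)."""
    return -omega * math.sin(omega * t), omega * math.cos(omega * t)


def dS_dt_direct(t, omega):
    """Formula (71), from differentiating S(t) = 2cos(omega t) + 3sin(omega t)."""
    return -2 * omega * math.sin(omega * t) + 3 * omega * math.cos(omega * t)


if __name__ == "__main__":
    omega, t = 2.0, 0.7
    print(dS_dt(*circle_velocity(t, omega)), dS_dt_direct(t, omega))
    # a straight path with velocity (3, -2) is perpendicular to (2, 3)
    print(dS_dt(3, -2))  # 0
```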

6.4. Another example. Suppose the temperature at the point $(x, y)$ in the plane is given by $T(x, y)$, and suppose that an ant is walking along the parametrized curve

$$
x(t) = R\cos\omega t, \qquad y(t) = R\sin\omega t.
$$

Thus the ant is walking on a circle with radius $R$, and with angular velocity $\omega$. How fast is the temperature of the ant changing, i.e. what is $\frac{dT}{dt}$?

Here we are not given an explicit formula for the function $T(x, y)$, so we cannot substitute $x(t)$ and $y(t)$ in $T$ and differentiate using only our first semester calculus skills. The approach in Answer 1 of our previous example does not apply here; we must use the chain rule.

In § 6.1 we have seen several equivalent ways of writing the chain rule. Let us look at two of these and consider the meaning of the terms that arise.

The short form (69) of the chain rule tells us that

$$
\frac{dT}{dt} = \frac{\partial T}{\partial x}\frac{dx}{dt} + \frac{\partial T}{\partial y}\frac{dy}{dt}.
$$

The $T$ on the left stands for $T(x(t), y(t))$, which we can interpret as the temperature at the point $(x(t), y(t))$. That point is the location of the ant at time $t$, so the $T$ on the left is the temperature the ant feels at time $t$. This is a function of $t$. In mathematical terms it is the result of substituting (composing) the functions $x(t)$ and $y(t)$ in the function $T = T(x, y)$.

The two $T$'s on the right appear in partial derivatives. Here $\frac{\partial T}{\partial x}$ stands for the partial derivative of the function $T = T(x, y)$ with respect to the variable $x$. One can compute this without knowing the ant's path $(x(t), y(t))$. Similarly, $\frac{\partial T}{\partial y}$ is the partial derivative of $T$ with respect to $y$. The partial derivatives $\frac{\partial T}{\partial x}$ and $\frac{\partial T}{\partial y}$ themselves are again functions of $x$ and $y$. After computing these partials, we evaluate them at the point $(x(t), y(t))$.

[Figure 6: level curves of the temperature $T$, labeled from 48°F to 74°F]

Figure 6. Ant walking in a region of varying temperature.

This leads us to the more verbose version (70) of the chain rule, which tells us

$$
\frac{dT\big(x(t), y(t)\big)}{dt} = \frac{\partial T}{\partial x}\big(x(t), y(t)\big)\cdot x'(t) + \frac{\partial T}{\partial y}\big(x(t), y(t)\big)\cdot y'(t).
$$

At this point the only additional information we have is about the ant's motion, namely, $x(t) = R\cos\omega t$ and $y(t) = R\sin\omega t$. We can compute the derivatives of $x(t)$ and $y(t)$, which gives us the velocity of the ant in the $x$ and $y$ directions:

$$
x'(t) = -\omega R\sin\omega t, \qquad y'(t) = \omega R\cos\omega t.
$$

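If we substitute everything we know in the chain rule we find that the rate at which the ant's temperature changes is

$$
\frac{dT}{dt} = -\frac{\partial T}{\partial x}(R\cos\omega t, R\sin\omega t)\cdot\omega R\sin\omega t + \frac{\partial T}{\partial y}(R\cos\omega t, R\sin\omega t)\cdot\omega R\cos\omega t.
$$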

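To make the equation more readable one can leave out the $(R\cos\omega t, R\sin\omega t)$, which results in

$$
\frac{dT}{dt} = -\omega R\sin\omega t\,\frac{\partial T}{\partial x} + \omega R\cos\omega t\,\frac{\partial T}{\partial y}.
$$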

The disadvantage of this shorter version is that the reader has to figure out where we intended to evaluate the two partial derivatives $\frac{\partial T}{\partial x}$ and $\frac{\partial T}{\partial y}$.
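The notes deliberately leave $T(x, y)$ unspecified. To see the formula above in action, the sketch below assumes a hypothetical temperature field $T(x, y) = 60 + 2x - y^2$ and hypothetical values $R = 2$, $\omega = 0.5$. It then compares the chain-rule expression with a difference quotient of the temperature the ant feels.

```python
import math

R, OMEGA = 2.0, 0.5  # hypothetical radius and angular velocity


def T(x, y):
    # Hypothetical temperature field; the notes leave T(x, y) unspecified.
    return 60 + 2 * x - y**2


def T_x(x, y):
    return 2.0


def T_y(x, y):
    return -2 * y


def dT_dt(t):
    """-wR sin(wt) T_x + wR cos(wt) T_y, partials evaluated at the ant's position."""
    x, y = R * math.cos(OMEGA * t), R * math.sin(OMEGA * t)
    return (-OMEGA * R * math.sin(OMEGA * t) * T_x(x, y)
            + OMEGA * R * math.cos(OMEGA * t) * T_y(x, y))


def dT_dt_numeric(t, h=1e-6):
    """Central difference quotient of the temperature the ant feels."""
    def felt(s):
        return T(R * math.cos(OMEGA * s), R * math.sin(OMEGA * s))
    return (felt(t + h) - felt(t - h)) / (2 * h)


if __name__ == "__main__":
    for t in [0.0, 1.0, 4.0]:
        print(t, dT_dt(t), dT_dt_numeric(t))
```

The partials `T_x` and `T_y` are evaluated at the ant's position inside `dT_dt`. This mirrors the remark that the short form hides where the partial derivatives are evaluated.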

7. Problems

1. Find the linear approximation to $f(x, y)$ at the point $(a, b)$ in the following cases:

(a) $f(x, y) = xy^2$, $(a, b) = (3, 1)$. •
(b) $f(x, y) = x/y^2$, $(a, b) = (3, 1)$. •
(c) $f(x, y) = \sin x + \cos y$, $(a, b) = (\pi, \pi)$. •
(d) $f(x, y) = xy/(x + y)$, $(a, b) = (3, 1)$. •

2. Find an equation for the plane tangent to the graph of $f(x, y) = \sin(xy)$ at $(\pi, 1/2, 1)$. •

3. Find an equation for the plane tangent to the graph of $f(x, y) = x^2 + y^3$ at $(3, 1, 10)$. •


4. Find an equation for the plane tangent to the graph of $f(x, y) = x\ln(xy)$ at $(2, 1/2, 0)$. •

5. (a) Find an equation for the plane tangent to the surface defined by $2x^2 + 3y^2 - z^2 = 4$ at $(1, 1, -1)$. (Hint: first write the surface as a graph $z = f(x, y)$.) •
(b) The same question at the point $(1, 1, +1)$.

6. (a) Suppose you have computed the two partial derivatives of a function $z = f(x, y)$ at $(x_0, y_0)$, and you found $f_x(x_0, y_0) = A$ and $f_y(x_0, y_0) = B$. Find a normal vector to the tangent plane of the graph of $z = f(x, y)$ at $(x_0, y_0, z_0)$.

(Hint: If you know the equation for a plane, then how do you find a normal vector to this plane?) •
(b) Find an equation in vector form for the tangent plane to $x^2 + 4y^2 = 2z$ at $(2, 1, 4)$. Also find an equation for the normal line to the graph at $(2, 1, 4)$. (The normal line to the graph of a function at some point $P$ is the line through $P$ that is perpendicular to the tangent plane to the graph at $P$.) •

7. Imagine a differentiable function $f(x, y)$. Make a good drawing of the function $f$ and show how $f_x(a, b)$ and $f_y(a, b)$ are the slopes of two lines which are tangent to the graph at $(a, b)$. Indicate clearly which two lines you mean, and describe how they are defined.

(Can't think of a nice graph? Take something like the bottom drawing in Figure 4.) •

8. Let $f$ be as in problem 7.4. Use linear approximation to approximate $f(1.98, 0.4)$ by hand. Compare your answer with the actual value of $f(1.98, 0.4)$ (you'll need a calculator). •

9. (a) The tangent plane to the saddle surface $z = xy$ at the origin intersects the graph of the saddle surface in two lines. Which lines are they? •
(b) Consider the tangent plane to the saddle surface at $x = 2$, $y = 1$ that was computed in §5.3. Let $(x, y, z)$ be a point on the saddle surface, and let $(x, y, z^*)$ be the point on the tangent plane with the same $x$ and $y$ coordinates. What is the difference in heights of these two points? •
(c) Show that the saddle surface and its tangent plane intersect when $x = 2$ or $y = 1$.

10. (a) Find an equation for the tangent plane to the graph of $f(x, y) = xy$ at the point $(a, b, ab)$. Here $a$ and $b$ are constants which will appear in your answer. •
(b) Show that the intersection of the tangent plane and the graph consists of two straight lines. •

8. Gradients

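8.1. The gradient vector of a function. The right hand side in the chain rule (68) can be written as a dot-product of two vectors, namely

$$
\begin{aligned}
\frac{df}{dt} &= f_x(x_0, y_0)\cdot x'(t_0) + f_y(x_0, y_0)\cdot y'(t_0) \\
&= \begin{pmatrix} f_x(x_0, y_0) \\ f_y(x_0, y_0) \end{pmatrix} \cdot \begin{pmatrix} x'(t_0) \\ y'(t_0) \end{pmatrix}.
\end{aligned} \tag{74}
$$

This turns out to be so useful that the vector containing the derivatives of $f$ has been given a name. It is called the gradient of $f$, and it is written as

$$
\vec\nabla f(x, y) \stackrel{\text{def}}{=} \begin{pmatrix} f_x(x, y) \\ f_y(x, y) \end{pmatrix}. \tag{75}
$$

The symbol $\vec\nabla$ is pronounced "nabla."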

The chain rule, written in vector form, looks like this:

$$
\frac{d f(\vec x(t))}{dt} = \vec\nabla f(\vec x(t))\cdot\vec x\,'(t). \tag{76}
$$


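[Figure 7: level curves of $f$ at levels −0.6, −0.3, 0.0, 0.3, 0.6 around a point $P$, with the gradient $\vec\nabla f(P)$ and four candidate points A, B, C, D]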

Figure 7. The gradient as direction of fastest increase. Suppose we are at a point $P$, we are allowed to jump to any point at a given fixed distance from $P$, and we only know $\vec\nabla f(P)$. Then the linear approximation formula tells us that

to maximize $f$ we follow the gradient (choose A);
to minimize $f$ we go in the direction opposite to $\vec\nabla f(P)$ (choose D);
to keep $f$ fixed we move perpendicular to the gradient (choose B or C).

The linear approximation formula (60) can also be rewritten more compactly using the gradient vector:

$$
f(\vec x_0 + \Delta\vec x) \approx f(\vec x_0) + \vec\nabla f(\vec x_0)\cdot\Delta\vec x. \tag{77}
$$

8.2. The gradient as the "direction of greatest increase" for a function $f$. We apply the formula

$$
\vec a\cdot\vec b = \|\vec a\|\,\|\vec b\|\cos\angle(\vec a, \vec b) \tag{78}
$$

for the dot product to the vector form (77) of the linear approximation equation. This gives a very useful interpretation of the gradient. Suppose we are at a point with position vector $\vec x_0$ ($P$ in Figure 7). We are allowed to make a small step $\Delta\vec x$ of prescribed length in any direction we like. Which way should we go if we want to increase $f$ as much as possible? And where should we go if, instead, we want to decrease $f$ as much as possible? What if we want to keep $f$ the same?

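From (77) we see that the change in $f$ is (approximately) given by

$$
\Delta f \stackrel{\text{def}}{=} f(\vec x + \Delta\vec x) - f(\vec x) \stackrel{(77)}{\approx} \vec\nabla f\cdot\Delta\vec x \stackrel{(78)}{=} \|\vec\nabla f\|\,\|\Delta\vec x\|\cos\theta,
$$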

where $\theta$ is the angle between the gradient $\vec\nabla f$ and the vector $\Delta\vec x$ which represents the step we take. In this formula the lengths $\|\vec\nabla f\|$ and $\|\Delta\vec x\|$ are fixed, and the angle $\theta$ is the only thing we can change. Therefore the largest change in $f$ results if $\cos\theta = +1$, the smallest (most negative) when $\cos\theta = -1$, and no change (to first order) will result if $\cos\theta = 0$. So we conclude:

• To increase $f$ as much as possible, choose $\Delta\vec x$ in the direction of the gradient $\vec\nabla f$;

• To decrease $f$ as much as possible, choose $\Delta\vec x$ in the direction opposite to the gradient $\vec\nabla f$, i.e. in the direction of $-\vec\nabla f$;

• To keep $f$ constant, choose $\Delta\vec x$ perpendicular to the gradient.
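These conclusions can be seen with a brute-force experiment. The sketch below is our own illustration. It uses a sample function $f(x, y) = x^2 + 3xy$ chosen for convenience and tries steps of one fixed small length in 3600 equally spaced directions. It reports which direction increases $f$ most, and that direction should match the direction of $\vec\nabla f$ up to the sampling resolution.

```python
import math


def f(x, y):
    # sample function (our choice, not from the notes)
    return x**2 + 3 * x * y


def grad_f(x, y):
    return 2 * x + 3 * y, 3 * x


def best_direction(x0, y0, step=1e-4, samples=3600):
    """Try steps of fixed small length in `samples` equally spaced directions
    and return the angle (in radians, in [0, 2*pi)) of the step that
    increases f the most."""
    best_angle, best_gain = 0.0, -math.inf
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        gain = f(x0 + step * math.cos(theta), y0 + step * math.sin(theta)) - f(x0, y0)
        if gain > best_gain:
            best_angle, best_gain = theta, gain
    return best_angle


if __name__ == "__main__":
    x0, y0 = 1.0, 0.5
    gx, gy = grad_f(x0, y0)
    print("best sampled direction:", best_direction(x0, y0))
    print("gradient direction:    ", math.atan2(gy, gx) % (2 * math.pi))
```

Replacing `gain > best_gain` by a search for the smallest gain would likewise land near the direction of $-\vec\nabla f$.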


This is sometimes summarized by saying that the gradient $\vec\nabla f$ points in the direction of fastest increase for the function $f$.

8.3. The gradient is perpendicular to the level curve. Suppose that for some function $z = f(x, y)$ the level set at level $C$ is a curve, and suppose that we have a parametric representation $\vec x(t) = \begin{pmatrix} x(t) \\ y(t) \end{pmatrix}$ of this curve. This means that $x(t)$ and $y(t)$ satisfy

$$
f\big(x(t), y(t)\big) = C.
$$

By the chain rule we then get

$$
0 = \frac{d f(\vec x(t))}{dt} = \vec\nabla f(\vec x(t))\cdot\vec x\,'(t),
$$

which tells us that the tangent vector $\vec x\,'(t)$ to the level set is perpendicular to the gradient $\vec\nabla f(\vec x(t))$ of the function. Therefore,

if $\vec\nabla f(x_0, y_0) \neq \vec 0$, then $\vec\nabla f(x_0, y_0)$ is a normal vector to the tangent to the level curve of $f$ at $(x_0, y_0)$.

We now have the necessary ingredients to write the equation for the tangent. We know a point $(x_0, y_0)$ on the line, and we know a normal vector to the line (the gradient). Thus the equation for the tangent is

$$
\vec\nabla f(\vec x_0)\cdot(\vec x - \vec x_0) = 0,
$$

or, equivalently,

$$
\frac{\partial f}{\partial x}(x_0, y_0)\cdot(x - x_0) + \frac{\partial f}{\partial y}(x_0, y_0)\cdot(y - y_0) = 0.
$$

8.4. The tangent to the parabola $y = x^2$, again. The very first example anyone sees in their first calculus course must surely be the computation of the tangent to the parabola $y = x^2$ at the point $(x, y) = (1, 1)$. We know the answer: it is a line with slope 2, through the point $(1, 1)$.

We can interpret the parabola as the zero set of the function of two variables given by $f(x, y) = y - x^2$. We should therefore be able to find the same tangent at $(1, 1)$ by computing the gradient of $f$. The computation goes like this:

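$$
f(x, y) = y - x^2 \implies \vec\nabla f(x, y) = \begin{pmatrix} f_x \\ f_y \end{pmatrix} = \begin{pmatrix} -2x \\ 1 \end{pmatrix}.
$$

At $(x, y) = (1, 1)$ we have

$$
\vec\nabla f(1, 1) = \begin{pmatrix} -2 \\ 1 \end{pmatrix}.
$$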

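This vector is perpendicular to the tangent to the zero set. Let $\vec x_0 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$ be the position vector of our point on the parabola, and let $\vec n = \vec\nabla f(1, 1)$. Then the equation for the tangent to the parabola at this point is

$$
\vec n\cdot(\vec x - \vec x_0) = 0, \quad\text{i.e.}\quad \begin{pmatrix} -2 \\ 1 \end{pmatrix}\cdot\begin{pmatrix} x - 1 \\ y - 1 \end{pmatrix} = 0.
$$

Simplifying this we get $-2\cdot(x - 1) + 1\cdot(y - 1) = 0$, and thus $y = 2x - 1$.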

This is the same line that we found in our first calculus course.
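The recipe "normal vector from the gradient, then $\vec n\cdot(\vec x - \vec x_0) = 0$" is short enough to write out. The sketch below is our own. It hard-codes the gradient of $f(x, y) = y - x^2$ and returns the tangent line as coefficients $(a, b, c)$ of $ax + by = c$. The function name `tangent_line` is hypothetical.

```python
def grad_f(x, y):
    # f(x, y) = y - x**2, so f_x = -2x and f_y = 1
    return -2 * x, 1.0


def tangent_line(x0, y0):
    """Coefficients (a, b, c) of the tangent line a*x + b*y = c,
    obtained from grad f(x0, y0) . (x - x0, y - y0) = 0."""
    a, b = grad_f(x0, y0)
    return a, b, a * x0 + b * y0


if __name__ == "__main__":
    print(tangent_line(1.0, 1.0))  # (-2.0, 1.0, -1.0), i.e. y = 2x - 1
    print(tangent_line(2.0, 4.0))  # (-4.0, 1.0, -4.0), i.e. y = 4x - 4
```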


8.5. Example: the tangent to the zero set of $x^2 - y^2 + y^3$. Consider the zero set of the function

$$
f(x, y) = x^2 - y^2 + y^3.
$$

The resulting curve is not as familiar as the parabola from the previous example, and drawing the curve takes some effort.¹

We will not try to draw the whole zero set in this example. Instead, we will see what happens when we try to find the tangent to the zero set at two different points on the zero set, namely, at $(0, 1)$ and at the origin.

The tangent at $(0, 1)$. To find the tangent at any point on the zero set of $f$ we use the fact that the normal to the tangent is given by the gradient of $f$,

$$
\vec\nabla f = \begin{pmatrix} f_x \\ f_y \end{pmatrix} = \begin{pmatrix} 2x \\ -2y + 3y^2 \end{pmatrix}.
$$

The normal to the tangent at the point $(0, 1)$ is therefore

$$
\vec n = \vec\nabla f(0, 1) = \begin{pmatrix} 0 \\ -2 + 3 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.
$$

In other words, the normal to the tangent at $(0, 1)$ is the vertical unit vector $\vec e_2$, and therefore the tangent is a horizontal line through $(0, 1)$. Its equation is $y = 1$. We could also find this equation by working out the general equation $\vec n\cdot(\vec x - \vec x_0) = 0$ for a line with a given normal and point. Here we have

$$
\vec n = \vec\nabla f(0, 1) = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \qquad \vec x_0 = \begin{pmatrix} 0 \\ 1 \end{pmatrix},
$$

so the equation for the tangent is

$$
\begin{pmatrix} 0 \\ 1 \end{pmatrix}\cdot\begin{pmatrix} x - 0 \\ y - 1 \end{pmatrix} = 0,
$$

which simplifies to $y - 1 = 0$.

The tangent at the origin. When we repeat the previous calculation at $(x_0, y_0) = (0, 0)$ we run into problems. These problems begin when we compute the gradient $\vec\nabla f$ at the origin:

$$
\vec\nabla f(0, 0) = \begin{pmatrix} 2x \\ -2y + 3y^2 \end{pmatrix}_{x = 0,\, y = 0} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.
$$

The gradient at the origin turns out to be the zero vector. This is problematic because the zero vector has no direction, and thus is not perpendicular to any particular line. We cannot find the tangent at the origin!

To see what is going on one has to take a closer look at the curve near the origin (see Figure 8). It turns out that near the origin the zero set of $f$ consists of two smooth curves that cross each other.² The gradient has to be perpendicular to both of these curves, and the only vector that achieves this is the zero vector. Note also that there is no single line that is tangent to the zero set at the origin. If we had seen the drawing ahead of time, we would not have expected to find a tangent to the zero set of $f$ at the origin.

¹ One could start by solving the equation for $x$, which leads to $x = \pm y\sqrt{1 - y}$. This shows that $y \le 1$ on the curve. Graphing $x = y\sqrt{1 - y}$ using our first semester calculus skills then gives us half the curve. The other half is given by its reflection in the $y$-axis, i.e. $x = -y\sqrt{1 - y}$.

² One way to see this is to solve $x^2 - y^2 + y^3 = 0$ for $x$, which gives $x = \pm y\sqrt{1 - y}$. Near the origin $y$ is very small, so we can approximate $\sqrt{1 - y} \approx \sqrt{1} = 1$. The zero set near the origin is therefore approximately described by $x = \pm y$, i.e. two crossing lines.

[Figure 8: the zero set $f(x, y) = 0$, with gradient vectors $\vec\nabla f(0, 1)$ and $\vec\nabla f(a, b)$ drawn at points on it]

Figure 8. The zero set of the function $f(x, y) = x^2 - y^2 + y^3$, and its gradient at various points on this zero set. The gradient is always perpendicular to the level set of a function, so a drawing of the zero set tells us the direction of the gradient, up to sign. However, the drawing does not say anything about the length of the gradient.
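The two gradient computations in §8.5 can be reproduced with a short sketch of our own. The sketch evaluates $\vec\nabla f$ at $(0, 1)$ and at the origin. It also samples points on the branches $x = \pm y\sqrt{1 - y}$ from the footnotes, including points close to the origin where the two branches cross, to confirm that they lie on the zero set.

```python
import math


def f(x, y):
    return x**2 - y**2 + y**3


def grad_f(x, y):
    return 2 * x, -2 * y + 3 * y**2


def on_branches(ys):
    """Points (+-y*sqrt(1 - y), y) from the footnotes; each should satisfy f = 0."""
    return [(s * y * math.sqrt(1 - y), y) for y in ys for s in (1, -1)]


if __name__ == "__main__":
    print(grad_f(0, 1))  # (0, 1): normal is e2, so the tangent is y = 1
    print(grad_f(0, 0))  # (0, 0): zero vector, no normal direction
    for x, y in on_branches([-0.1, -0.01, 0.01, 0.1]):
        print(f"({x:+.4f}, {y:+.4f}) -> f = {f(x, y):.1e}")
```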

9. The chain rule and the gradient of a function of three variables

9.1. The gradient, etc. So far we have only looked at the gradient of a function of two variables. But for a function of three variables there is a very similar definition, and the facts we have discovered have nearly identical counterparts.

If $u = f(x, y, z)$ is a function of three variables, then its gradient is defined to be the vector

$$
\vec\nabla f(x, y, z) = \begin{pmatrix} f_x(x, y, z) \\ f_y(x, y, z) \\ f_z(x, y, z) \end{pmatrix}.
$$

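The chain rule in this context says the following. Let $x = x(t)$, $y = y(t)$, and $z = z(t)$ be functions of one variable, and substitute $x(t)$, $y(t)$, $z(t)$ in $f$. The derivative of the resulting function is given by any of the following three equivalent formulas

$$
\begin{aligned}
\frac{d f\big(x(t), y(t), z(t)\big)}{dt} &= f_x\big(x(t), y(t), z(t)\big)\,x'(t) + f_y\big(x(t), y(t), z(t)\big)\,y'(t) + f_z\big(x(t), y(t), z(t)\big)\,z'(t) \\
&= \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} + \frac{\partial f}{\partial z}\frac{dz}{dt} \\
&= \vec\nabla f(\vec x(t))\cdot\vec x\,'(t), \quad\text{where } \vec x(t) = \begin{pmatrix} x(t) \\ y(t) \\ z(t) \end{pmatrix}.
\end{aligned} \tag{79}
$$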

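The linear approximation formula for the function $f$ at some point $(x_0, y_0, z_0)$ approximates the amount by which $f$ increases if we go from $(x_0, y_0, z_0)$ to $(x, y, z) = (x_0 + \Delta x, y_0 + \Delta y, z_0 + \Delta z)$. It is as follows:

$$
\Delta f = f(x, y, z) - f(x_0, y_0, z_0) \approx \frac{\partial f}{\partial x}\cdot\Delta x + \frac{\partial f}{\partial y}\cdot\Delta y + \frac{\partial f}{\partial z}\cdot\Delta z, \tag{80}
$$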


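in which the partial derivatives are to be evaluated at $(x_0, y_0, z_0)$. Compare this with the two variable version (59). In vector form we have

$$
\Delta f = f(\vec x_0 + \Delta\vec x) - f(\vec x_0) \approx \vec\nabla f(\vec x_0)\cdot\Delta\vec x, \tag{81}
$$

where

$$
\vec x_0 = \begin{pmatrix} x_0 \\ y_0 \\ z_0 \end{pmatrix}, \qquad \Delta\vec x = \begin{pmatrix} \Delta x \\ \Delta y \\ \Delta z \end{pmatrix}.
$$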

This is the same formula as in the two-variable case, where we had (77). The discussion about "direction of fastest increase" applies to the three variable case without change. Suppose we are at a point $\vec x_0$ and may change our position by a small vector $\Delta\vec x$ of a prescribed length. To increase $f$ as much as possible, we should choose $\Delta\vec x$ in the direction of the gradient $\vec\nabla f(\vec x_0)$. To decrease $f$ as much as possible, we should choose it in the direction of $-\vec\nabla f(\vec x_0)$. To keep $f$ constant, we should choose $\Delta\vec x$ perpendicular to $\vec\nabla f(\vec x_0)$.

9.2. Tangent plane to a level set. If $u = f(x, y, z)$ is a function of three variables then it is hard to visualize its graph. This would require drawing four mutually perpendicular axes, something we, three dimensional creatures, cannot do. However, we can try to visualize the level sets of the function. The level set at level $C$ consists, by definition, of all points in three dimensional space whose coordinates satisfy the equation $f(x, y, z) = C$.

For instance, the unit sphere is given by the equation $x^2 + y^2 + z^2 = 1$, so it is the level set at level 1 of the function $f(x, y, z) = x^2 + y^2 + z^2$. The sphere with radius $R$ is the level set of the same function $f$ at level $R^2$.

Consider a function of three variables, and let $(x_0, y_0, z_0)$ be some point on the level set at level $C$ (thus $f(x_0, y_0, z_0) = C$). The equation for the level set itself is $f(x, y, z) = C$. Since $(x_0, y_0, z_0)$ satisfies this equation, we can write the equation for the level set as

$$
f(x, y, z) - f(x_0, y_0, z_0) = 0.
$$

Near the point $(x_0, y_0, z_0)$ we can use the linear approximation of $f$ to approximate the equation for the level set of $f$. We have

$$
f(x, y, z) - f(x_0, y_0, z_0) \approx \frac{\partial f}{\partial x}\cdot(x - x_0) + \frac{\partial f}{\partial y}\cdot(y - y_0) + \frac{\partial f}{\partial z}\cdot(z - z_0),
$$

where, as in (80), the partial derivatives are to be computed at the given point $(x_0, y_0, z_0)$. They are, in particular, constants: they depend on $(x_0, y_0, z_0)$ but not on $(x, y, z)$.

[Figure: the level set $f(x, y, z) = C$ and the gradient $\vec\nabla f$]


Thus we see that near any particular point on the level set of a function we can approximate the equation for the level set by

$$
\frac{\partial f}{\partial x}\cdot(x - x_0) + \frac{\partial f}{\partial y}\cdot(y - y_0) + \frac{\partial f}{\partial z}\cdot(z - z_0) = 0. \tag{82}
$$

If at least one of the partial derivatives at $(x_0, y_0, z_0)$ is nonzero, then this is the equation of a plane. We call this plane the tangent plane to the level set.

In vector form the equation for the tangent plane to a level set of $f$ at a point with position vector $\vec x_0$ can be written as

$$
\vec\nabla f(\vec x_0)\cdot(\vec x - \vec x_0) = 0. \tag{83}
$$

From this equation we see that, just as in the case (§8.3) of level curves of a function of two variables, the gradient $\vec\nabla f(\vec x_0)$ is perpendicular to the tangent plane of the level set of the function $f$ at the point $\vec x_0$.

9.3. Example: tangent plane to a sphere revisited. In the example in § 5.4 we found the tangent plane to the sphere at the point $(1, 3, \sqrt6)$, where the sphere had radius 4, and was centered at the origin. There we represented the top half of the sphere as the graph of a function. We will now redo this calculation by representing the sphere as the level set of some other function.

By Pythagoras the distance $d$ from a point $(x, y, z)$ to the origin satisfies

$$
d^2 = x^2 + y^2 + z^2.
$$

The sphere with radius 4 and center at the origin therefore consists of all points $(x, y, z)$ that satisfy

$$
x^2 + y^2 + z^2 = 4^2 = 16.
$$

In other words, it is the level set at level $C = 16$ of the function

$$
f(x, y, z) = x^2 + y^2 + z^2.
$$

To find an equation for the tangent plane through the point $(1, 3, \sqrt6)$ we need two ingredients: a point on the plane and a normal vector to the plane. (See Chapter I, §11.2.) We already have a point on the plane, namely our point $(1, 3, \sqrt6)$. The normal is given by the gradient of the function $f$ whose level set is the sphere. This gradient is easy to compute. Since $f(x, y, z) = x^2 + y^2 + z^2$, we have

$$
\frac{\partial f}{\partial x} = 2x, \qquad \frac{\partial f}{\partial y} = 2y, \qquad \frac{\partial f}{\partial z} = 2z,
$$

and thus

$$
\vec\nabla f(1, 3, \sqrt6) = \begin{pmatrix} 2x \\ 2y \\ 2z \end{pmatrix}_{(x, y, z) = (1, 3, \sqrt6)} = \begin{pmatrix} 2 \\ 6 \\ 2\sqrt6 \end{pmatrix}.
$$

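The equation for the tangent plane is $\vec n\cdot(\vec x - \vec x_0) = 0$, where the normal $\vec n$ to the tangent plane is the gradient $\vec\nabla f$ evaluated at our given point $\vec x_0$. So, the tangent plane is given by

$$
\vec\nabla f(1, 3, \sqrt6)\cdot(\vec x - \vec x_0) = 0,
$$

which we can write as

$$
\begin{pmatrix} 2 \\ 6 \\ 2\sqrt6 \end{pmatrix}\cdot\begin{pmatrix} x - 1 \\ y - 3 \\ z - \sqrt6 \end{pmatrix} = 0,
$$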


i.e.

$$
2(x - 1) + 6(y - 3) + 2\sqrt6\,(z - \sqrt6) = 0.
$$

After some cleaning up we get

$$
x + 3y + \sqrt6\,z = 16.
$$

This is the same answer we got in §5.4. Multiplying the equation found there by $\sqrt6$ gives $\sqrt6\,z = 16 - x - 3y$.
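Formula (83) turns "find the tangent plane to a level set" into a gradient evaluation and one dot product. The sketch below is our own. It returns the plane as coefficients $(a, b, c, d)$ of $ax + by + cz = d$, and applying it to the sphere reproduces $x + 3y + \sqrt6\,z = 16$ after dividing by 2. The name `tangent_plane_level_set` is hypothetical.

```python
import math


def grad_f(x, y, z):
    """Gradient of f(x, y, z) = x^2 + y^2 + z^2."""
    return 2 * x, 2 * y, 2 * z


def tangent_plane_level_set(grad, p):
    """Coefficients (a, b, c, d) of the plane a*x + b*y + c*z = d given by (83):
    grad f(p) . (x - p) = 0, i.e. grad f(p) . x = grad f(p) . p."""
    n = grad(*p)
    d = sum(n_i * p_i for n_i, p_i in zip(n, p))
    return (*n, d)


if __name__ == "__main__":
    p = (1.0, 3.0, math.sqrt(6))
    a, b, c, d = tangent_plane_level_set(grad_f, p)
    # dividing by 2 should give x + 3y + sqrt(6) z = 16
    print(a / 2, b / 2, c / 2, d / 2)
```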

9.4. Example. Find the linear approximation of $F(x, y, z) = e^{-y}(x - z)^2$ and the tangent plane to its level set at $x = 1$, $y = 2$, $z = 5$.

Solution: At the given values of $x, y, z$ one has $F(1, 2, 5) = e^{-2}(1 - 5)^2 = 16/e^2$. The partial derivatives of $F$ are

$$
F_x = 2(x - z)e^{-y}, \qquad F_y = -e^{-y}(x - z)^2, \qquad F_z = -2(x - z)e^{-y},
$$

which at $(x, y, z) = (1, 2, 5)$ reduce to $F_x = -8/e^2$, $F_y = -16/e^2$ and $F_z = +8/e^2$. If $(x, y, z)$ is close to $(1, 2, 5)$, then the linear approximation formula tells us that

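$$
F(x, y, z) \approx F(1, 2, 5) - \frac{8}{e^2}(x - 1) - \frac{16}{e^2}(y - 2) + \frac{8}{e^2}(z - 5)
$$

or, in "$\Delta x$" notation (by definition $\Delta x = x - 1$, $\Delta y = y - 2$, $\Delta z = z - 5$),

$$
F(1 + \Delta x, 2 + \Delta y, 5 + \Delta z) \approx F(1, 2, 5) - \frac{8}{e^2}\Delta x - \frac{16}{e^2}\Delta y + \frac{8}{e^2}\Delta z.
$$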

The equation for the tangent plane to the level set of $F$ at the point $(1, 2, 5)$ is therefore

$$
-\frac{8}{e^2}(x - 1) - \frac{16}{e^2}(y - 2) + \frac{8}{e^2}(z - 5) = 0,
$$

or, after multiplying by $-e^2/8$: $(x - 1) + 2(y - 2) - (z - 5) = 0$. Further simplification shows that the equation for the tangent plane is

$$
x + 2y - z = 0.
$$
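The hand-computed partial derivatives in §9.4 can be checked with central difference quotients. The sketch below is our own; `numeric_gradient` is a hypothetical helper, not a library function. It estimates $(F_x, F_y, F_z)$ at $(1, 2, 5)$ and scales the result by $e^2$, so the text predicts values close to $(-8, -16, 8)$.

```python
import math


def F(x, y, z):
    return math.exp(-y) * (x - z) ** 2


def numeric_gradient(func, p, h=1e-6):
    """Central-difference estimate of (F_x, F_y, F_z) at the point p."""
    grad = []
    for i in range(3):
        up, down = list(p), list(p)
        up[i] += h
        down[i] -= h
        grad.append((func(*up) - func(*down)) / (2 * h))
    return grad


if __name__ == "__main__":
    g = numeric_gradient(F, (1.0, 2.0, 5.0))
    # scaled by e^2; the text predicts -8, -16, 8
    print([round(g_i * math.e**2, 4) for g_i in g])
    # the tangent plane x + 2y - z = 0 passes through (1, 2, 5)
    print(1 + 2 * 2 - 5)
```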

10. Implicit Functions

In first semester calculus we learned a procedure for finding derivatives of implicitly defined functions. Suppose some function $y = f(x)$ was not given by an explicit formula, but rather by an implicit equation

$$
F(x, y) = 0. \tag{84}
$$

There was then a way to find the derivative of $y = f(x)$ from the above equation only. But there was no general formula for $f'(x)$, because such a formula involves the partial derivatives of $F$.

In this section we revisit implicit differentiation. The following theorem is about the zero set of the function $F$. One usually thinks of the zero set of a function of two variables as a curve ("an equation defines a curve"), but this is not always so. The theorem below gives us a way to find out if the zero set is really a curve, at least near any given point on the zero set which we happen to know.


[Figure 9: the zero set $F(x, y) = 0$ with labeled points A, B, C, D and local graphs $y = f(x)$ and $x = g(y)$]

Figure 9. The Implicit Function Theorem. The zero set of a function $F(x, y)$ does not have to be the graph of a function. However, if we have $F_y \neq 0$ at some point (A) on the zero set, then near A the zero set is the graph of a function $y = f(x)$. If $F_x \neq 0$ at some point (B), then near B the zero set is also the graph of a function, provided we let $x$ be a function of $y$: $x = g(y)$.

Exceptional points: At some points, like C and D in this figure, the level set of $F$ cannot be represented as the graph of a function $y = f(x)$, nor can it be represented as a graph of the type $x = g(y)$. At such points the Implicit Function Theorem implies that both $F_x = 0$ and $F_y = 0$.

10.1. The Implicit Function Theorem. Let $F(x, y)$ be a function defined on some plane domain with continuous partial derivatives in that domain, and suppose that a point $(x_0, y_0)$ in the zero set of $F$ is given.

If $\frac{\partial F}{\partial y}(x_0, y_0) \neq 0$, then there is a small rectangle centered at $(x_0, y_0)$ within which the zero set of $F$ is the graph of a function $y = f(x)$. The derivative of this function is

$$
f'(x) = \frac{dy}{dx} = -\frac{F_x(x, f(x))}{F_y(x, f(x))}. \tag{85}
$$

If $\frac{\partial F}{\partial x}(x_0, y_0) \neq 0$, then there is a small rectangle centered at $(x_0, y_0)$ within which the zero set of $F$ is the graph of a function $x = g(y)$. The derivative of this function is

$$
g'(y) = \frac{dx}{dy} = -\frac{F_y(g(y), y)}{F_x(g(y), y)}. \tag{86}
$$

A proof may be given in class, time permitting.

There is no need to memorize the formulas (85) and (86). We can get them by using the method of implicit differentiation from Math 221. For instance, suppose that the graph of the function $y = f(x)$ gives you a piece of the zero set of $F$. This means that

$$
F(x, f(x)) = 0 \quad\text{for all } x.
$$


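Differentiating both sides of this equation leads us via the chain rule,

$$
\frac{dF(x, f(x))}{dx} = \frac{\partial F}{\partial x}(x, f(x))\cdot\frac{dx}{dx} + \frac{\partial F}{\partial y}(x, f(x))\cdot\frac{df(x)}{dx},
$$

to

$$
0 = \frac{dF(x, f(x))}{dx} = F_x(x, f(x)) + F_y(x, f(x))\,f'(x). \tag{87}
$$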

Solve this for f ′(x) and we get

f ′(x) =dy

dx= −Fx(x, f(x))

Fy(x, f(x)),

which is what the theorem claims.
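Here is a minimal Python sketch that checks formula (85) numerically. It assumes the example $F(x, y) = x^2 + y^2 - 1$ (the unit circle, which reappears in Problem 9 below) and approximates $F_x$ and $F_y$ by central finite differences. The helper names `partial` and `implicit_slope` are made up for this illustration.

```python
import math

def F(x, y):
    # Assumed example: the zero set of F is the unit circle
    return x**2 + y**2 - 1

def partial(func, x, y, var, h=1e-6):
    """Central-difference approximation of func_x (var='x') or func_y (var='y')."""
    if var == "x":
        return (func(x + h, y) - func(x - h, y)) / (2 * h)
    return (func(x, y + h) - func(x, y - h)) / (2 * h)

def implicit_slope(func, x, y):
    """dy/dx = -F_x / F_y as in (85); only meaningful where F_y != 0."""
    fy = partial(func, x, y, "y")
    if abs(fy) < 1e-9:
        raise ValueError("F_y = 0 here: the theorem does not give y as a function of x")
    return -partial(func, x, y, "x") / fy

# Near (0.6, 0.8) the circle is the graph of f(x) = sqrt(1 - x^2).
x0 = 0.6
y0 = math.sqrt(1 - x0**2)
slope_formula = implicit_slope(F, x0, y0)
slope_direct = -x0 / math.sqrt(1 - x0**2)   # derivative of sqrt(1 - x^2)
print(slope_formula, slope_direct)          # both approximately -0.75
```

Calling `implicit_slope(F, 1.0, 0.0)` raises the error. This matches the fact that near (1, 0) the circle is not the graph of a function $y = f(x)$.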

10.2. The Implicit Function Theorem with more variables. There are many variations and extensions of Theorem 10.1. The simplest is to consider the level set of a function of three rather than two variables. Suppose F is a function of three variables with continuous partial derivatives, and consider the set of points defined by the equation

$$F(x, y, z) = C.$$

This is the level set of F at level C. If $(x_0, y_0, z_0)$ is a point on this level set and

$$\frac{\partial F}{\partial y}(x_0, y_0, z_0) \neq 0,$$

then near $(x_0, y_0, z_0)$ the level set of F is the graph of a function $y = g(x, z)$. In other words, the function $y = g(x, z)$ satisfies

$$F(x, g(x, z), z) = C.$$

Hence we can find the partial derivatives of this function by implicit differentiation. The result is

$$\frac{\partial y}{\partial x} = g_x(x, z) = -\frac{F_x(x, y, z)}{F_y(x, y, z)}, \qquad \frac{\partial y}{\partial z} = g_z(x, z) = -\frac{F_z(x, y, z)}{F_y(x, y, z)}, \tag{88}$$

where $y = g(x, z)$.

10.3. Example: the saddle surface again. The saddle surface is the graph of the function $z = xy$, which we can think of as the zero set of the function

$$F(x, y, z) = z - xy.$$

The point (2, 3, 6) lies on the saddle surface, and at this point the partial derivatives of F are

$$F_x = \frac{\partial(z - xy)}{\partial x} = -y = -3, \qquad F_y = \frac{\partial(z - xy)}{\partial y} = -x = -2, \qquad F_z = \frac{\partial(z - xy)}{\partial z} = 1.$$

Since $F_x(2, 3, 6) = -3$ is nonzero, the Implicit Function Theorem tells us that near this point the zero set of F is the graph of a function $x = g(y, z)$. Solving $F = 0$ for x, we see that this function is in fact

$$x = g(y, z) = \frac{z}{y}.$$

The partial derivatives of g are easy to compute in this example. But even if we couldn't find them directly, the Implicit Function Theorem would tell us that

$$g_y(3, 6) = -\frac{F_y(2, 3, 6)}{F_x(2, 3, 6)} = -\frac{2}{3}, \qquad g_z(3, 6) = -\frac{F_z(2, 3, 6)}{F_x(2, 3, 6)} = \frac{1}{3}.$$

These agree with the direct computation $g_y = -z/y^2 = -2/3$ and $g_z = 1/y = 1/3$ at $(y, z) = (3, 6)$.
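The same computation can be checked numerically. The Python sketch below compares the Implicit Function Theorem values with direct differentiation of $g(y, z) = z/y$. The choices it makes are ours, not the notes': central differences with step $h = 10^{-6}$ and the helper name `d`.

```python
def F(x, y, z):
    # The saddle surface z = xy written as a zero set, as in Example 10.3
    return z - x * y

def d(func, point, i, h=1e-6):
    """Central-difference approximation of the i-th partial derivative at point."""
    plus, minus = list(point), list(point)
    plus[i] += h
    minus[i] -= h
    return (func(*plus) - func(*minus)) / (2 * h)

def g(y, z):
    # Solving F = 0 for x near (2, 3, 6)
    return z / y

p = (2.0, 3.0, 6.0)
Fx, Fy, Fz = (d(F, p, i) for i in range(3))
gy_ift, gz_ift = -Fy / Fx, -Fz / Fx        # Implicit Function Theorem
gy_direct = d(g, (3.0, 6.0), 0)            # differentiate z/y directly
gz_direct = d(g, (3.0, 6.0), 1)
print(gy_ift, gy_direct)   # both approximately -2/3
print(gz_ift, gz_direct)   # both approximately 1/3
```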


Problems

1. Compute the gradient of each function in Problem 3.2 of § 3.

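2. Show that for any two differentiable functions f and g one has

$$\vec\nabla(f \pm g) = \vec\nabla f \pm \vec\nabla g, \qquad \vec\nabla(fg) = f\,\vec\nabla g + g\,\vec\nabla f, \qquad \vec\nabla\Big(\frac{f}{g}\Big) = \frac{g\,\vec\nabla f - f\,\vec\nabla g}{g^2}.$$

In other words, the sum, product, and quotient rules for differentiation also apply to the gradient. •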

3. (a) Draw the level sets of the function $f(x, y) = x^2 + 4y^2$ at levels 0, 4, 16.

(b) Find the points on the level set $f(x, y) = 4$ where the gradient is parallel to the vector $\begin{pmatrix}1\\1\end{pmatrix}$. What can you say about the tangent line to the level set at those points? Draw the gradient vectors and the tangent lines at the points you just found.

Hint: two nonzero vectors $\vec v$ and $\vec w$ are parallel if there is a number s such that $\vec v = s\,\vec w$. •

(c) Repeat the same two problems for the function $g(x, y) = 4xy^2$. •

4. (a) Draw the zero set of the function $f(x, y, z) = x^2 + y^2 - 2z$. •

(b) Find all points on the zero set of the function f where the gradient is parallel to the vector $\vec v = \begin{pmatrix}1\\1\\2\end{pmatrix}$. •

5. A bug is crawling on the surface of a hot plate. The temperature of the plate at the point x units to the right of and y units up from the lower left corner is given by $T(x, y) = 100 - x^2 - 3y^3$.

(a) If the bug is at the point (2, 1), in what direction should it move to cool off the fastest? •

(b) If the bug is at the point (1, 3), in what direction should it move in order to maintain its temperature? •

6. The level sets of a function $z = f(x, y)$ are often curves. Must they always be curves? Could the zero set of a function be a solid square (e.g. all points (x, y) with $0 \le x \le 1$ and $0 \le y \le 1$)? •

7. The caption of Figure 8 says that, from just one of its level sets, one can see only the direction, but not the length, of the gradient $\vec\nabla f$ of a function. It is however possible to see where the gradient is larger from a drawing of several level sets: the level sets are more bunched together in some regions than in others.

[Figure: level sets f = −0.1, 0.0, 0.1, 0.2, 0.3 of a function]

The picture above shows some level sets of a function. On the bottom left the level sets are further apart; on the top right they are more bunched together. Where is the gradient larger, i.e. where is $\|\vec\nabla f\|$ larger: bottom left or top right? •

8. Have a look at Figure 8. Assume the function is differentiable at the origin.

(a) What can you say about the gradient $\vec\nabla f$ at the origin? •

(b) Where is the function positive and where is it negative (assume that the whole zero set is drawn)? •

9. This problem asks you to think about the Implicit Function Theorem 10.1.

Consider the unit circle C with equation

$$x^2 + y^2 = 1.$$

The unit circle C is a level set of the function $F(x, y) = x^2 + y^2$.

(a) Where on C is $F_y \neq 0$? Near which points P on C can one represent C as a graph of the form $y = f(x)$?

(b) Near which points P on C can one represent C as a graph of the form $x = g(y)$?

10. Here is the zero set of a function $z = f(x, y)$ (in bold). The function is zero only on the bold curve; it is nonzero everywhere else.

[Figure: the zero set f(x, y) = 0 in bold, with two other curves labeled A and B; one of them is marked "f(x, y) = −0.1?"]

(a) One of the two other curves above is the level set $f(x, y) = -0.1$. Which one is it, A or B? As always, explain your answer.

(b) Draw a possible level set $f(x, y) = +0.1$.

(c) Draw possible gradients on the zero set (similar to Figure 8).

11. Here is the zero set of a differentiable function $z = f(x, y)$.

[Figure: the zero set f(x, y) = 0, with two points A and B marked on it]

Explain why the Implicit Function Theorem (§10.1) implies that $\vec\nabla f = \vec 0$ at the two points A and B.

12. (a) Compute the gradient of the "distance to the square function" f from problems 5.13 and 3.7.

(b) How much is $|\vec\nabla f|$? •

(c) Make a drawing of the level sets of f and the gradient $\vec\nabla f$.

13. Let $f(x, y) = \ln(2 + 2x + e^y)$.

(a) Compute the gradient of f at the point $(x_0, y_0)$ with position vector $\vec x_0 = \begin{pmatrix}1\\0\end{pmatrix}$.

(b) You are allowed to choose a point at a distance 0.01 from the point (1, 0). Where would you choose the new point if you want f to be as large as possible? (Hint: review the linear approximation formula and the subsequent discussion of the gradient as the direction of greatest increase in §8.2.)

(c) Is your answer to the previous part the exact answer, or only an approximation? I.e., could someone else find a point at distance 0.01 from (1, 0) at which f has a (slightly) higher value than at the point you found?

(d) The level set C of f through the point (1, 0) happens to be the graph of a function $y = g(x)$. Find that function.

(e) Find a normal vector to the tangent line to C at the point (1, 0). Find an equation for the tangent line to C at (1, 0).

(f) How much is $g(1)$? Find two different ways to compute $g'(1)$ based on the work you have done so far.

14. Let (a, b, c) be a point on the sphere with radius R centered at the origin. Find an equation for the tangent plane to the sphere at (a, b, c). Simplify your answer as much as possible (a, b, c, and R will of course show up in your answer). •

11. The Chain Rule with more Independent Variables; Coordinate Transformations

The chain rule we have seen so far tells us how to differentiate expressions of the form $f(x(t), y(t))$. Such expressions are the result of substituting two functions $x(t), y(t)$ of one variable t into one function of two variables $z = f(x, y)$. What do we do if the functions x, y that get substituted into $f(x, y)$ depend on not one, but two (or more) variables? The answer is easy: we do exactly the same.

For instance, suppose we want to substitute $x = x(u, v)$ and $y = y(u, v)$ into a function $z = f(x, y)$, resulting in a function $F(u, v) = f(x(u, v), y(u, v))$, and suppose we want to find the partial derivative of F with respect to u. To compute this we keep v fixed and regard u as the variable. Then $x(u, v)$ and $y(u, v)$ are functions of the single variable u, and we can apply the chain rule we already know. This leads to

$$\frac{\partial F}{\partial u} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial u}.$$

The only difference from (69) is that we have written the derivatives of x and y as partial derivatives. We do this to indicate that, while computing this derivative, we momentarily consider x and y as functions of u alone, but later we may want to vary v again.

The same considerations lead to the partial derivative of F with respect to v:

$$\frac{\partial F}{\partial v} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial v} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial v}.$$
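The Python sketch below puts the two formulas to work by comparing them with finite-difference quotients of $F(u, v) = f(x(u, v), y(u, v))$. The functions $f(x, y) = x^2 y + \sin y$, $x = uv$ and $y = u + v^2$ are hypothetical choices made only for illustration. They do not appear in the notes.

```python
import math

# Hypothetical example functions, chosen only for illustration
def f(x, y):
    return x**2 * y + math.sin(y)

def f_x(x, y):
    return 2 * x * y

def f_y(x, y):
    return x**2 + math.cos(y)

def x_of(u, v):
    return u * v            # dx/du = v,  dx/dv = u

def y_of(u, v):
    return u + v**2         # dy/du = 1,  dy/dv = 2v

def F(u, v):
    return f(x_of(u, v), y_of(u, v))

def chain_rule(u, v):
    """(dF/du, dF/dv) from the two chain-rule formulas."""
    x, y = x_of(u, v), y_of(u, v)
    dF_du = f_x(x, y) * v + f_y(x, y) * 1
    dF_dv = f_x(x, y) * u + f_y(x, y) * (2 * v)
    return dF_du, dF_dv

def finite_diff(u, v, h=1e-6):
    """Central-difference approximations of dF/du and dF/dv."""
    return ((F(u + h, v) - F(u - h, v)) / (2 * h),
            (F(u, v + h) - F(u, v - h)) / (2 * h))

print(chain_rule(1.0, 2.0))
print(finite_diff(1.0, 2.0))   # agrees closely with the line above
```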

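11.1. An example without context. Suppose f is some function of two variables, and we want to find the partial derivatives of

$$g(u, v, w) = f(2uv,\ u^2 + w^2).$$

By this we mean that g is the result of substituting $x = 2uv$ and $y = u^2 + w^2$ into f. Note that g is a function of three variables, and f is a function of two variables.

The chain rule tells us that the derivatives of g are

$$\frac{\partial g}{\partial u} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial u} = 2v\,\frac{\partial f}{\partial x} + 2u\,\frac{\partial f}{\partial y},$$

$$\frac{\partial g}{\partial v} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial v} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial v} = 2u\,\frac{\partial f}{\partial x},$$

$$\frac{\partial g}{\partial w} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial w} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial w} = 2w\,\frac{\partial f}{\partial y},$$

where we used that y does not depend on v and x does not depend on w.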
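A concrete instance helps to check these formulas. In the Python sketch below, the function $f(x, y) = x^3 + xy$ is a hypothetical choice, since the example leaves f arbitrary. The sketch compares the chain-rule values with central-difference approximations of g in each of its three variables.

```python
# Hypothetical choice of f; Example 11.1 leaves f arbitrary.
def f(x, y):
    return x**3 + x * y

def f_x(x, y):
    return 3 * x**2 + y

def f_y(x, y):
    return x

def g(u, v, w):
    # Substitute x = 2uv and y = u^2 + w^2, as in Example 11.1
    return f(2 * u * v, u**2 + w**2)

def g_partials_chain(u, v, w):
    """(dg/du, dg/dv, dg/dw) from the chain-rule formulas of Example 11.1."""
    x, y = 2 * u * v, u**2 + w**2
    return (2 * v * f_x(x, y) + 2 * u * f_y(x, y),
            2 * u * f_x(x, y),
            2 * w * f_y(x, y))

def g_partials_numeric(u, v, w, h=1e-6):
    """Central differences of g in each of the three variables."""
    p = [u, v, w]
    out = []
    for i in range(3):
        plus, minus = p.copy(), p.copy()
        plus[i] += h
        minus[i] -= h
        out.append((g(*plus) - g(*minus)) / (2 * h))
    return tuple(out)

print(g_partials_chain(1.0, 0.5, 2.0))    # (10.0, 16.0, 4.0)
print(g_partials_numeric(1.0, 0.5, 2.0))  # approximately the same
```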

11.2. Example: a rotated coordinate system. We are used to specifying the location of points in the plane by giving their x and y coordinates, but sometimes it is better to use different coordinates. For instance, two people A and B could have chosen the same origin, but their axes could be rotated with respect to each other, B's axes being rotated through the angle α relative to A's. See Figure 10. If A's coordinates are called x, y and B's coordinates are X, Y, then it should be possible to find A's coordinates of a point if we know what coordinates B assigns to this point. That is, given X, Y, what are x, y? The answer to this question is³

$$\begin{cases} x = X\cos\alpha - Y\sin\alpha,\\ y = X\sin\alpha + Y\cos\alpha. \end{cases} \tag{89}$$

Figure 10. After choosing different x and y axes, A and B will assign different x, y coordinates to the same point in the plane. Equations (89) give the relation between these two sets of coordinates.

Suppose both A and B are measuring the temperature T at various points in the plane. A predicts the temperature at various points in the plane: he says that at the point with coordinates (x, y) the temperature will be T(x, y). In fact, he has also found the partial derivatives $\frac{\partial T}{\partial x}$ and $\frac{\partial T}{\partial y}$.

Equipped with the $X, Y \to x, y$ conversion (89), B can now take A's formula for the temperature and express it in terms of her own X, Y coordinates. Write $T_A(x, y)$ for the temperature at the point whose A-coordinates are (x, y), and $T_B(X, Y)$ for the temperature at the point whose B-coordinates are (X, Y). Then we have

$$T_B(X, Y) = T_A(x, y) = T_A(X\cos\alpha - Y\sin\alpha,\ X\sin\alpha + Y\cos\alpha).$$

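What is the relation between the partial derivatives of the temperature as computed by A and by B? The chain rule gives the answer:

$$\frac{\partial T_B}{\partial X} = \frac{\partial}{\partial X}\Big\{T_A\big(\underbrace{X\cos\alpha - Y\sin\alpha}_{=x},\ \underbrace{X\sin\alpha + Y\cos\alpha}_{=y}\big)\Big\} = \frac{\partial T_A}{\partial x}\cos\alpha + \frac{\partial T_A}{\partial y}\sin\alpha.$$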
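The formula for $\partial T_B/\partial X$ can be tested numerically. The sketch below assumes a made-up temperature formula $T_A(x, y) = x^2 + 3y$ and an arbitrary angle $\alpha = 0.3$ rad. Neither comes from the notes.

```python
import math

ALPHA = 0.3   # rotation angle in radians (arbitrary choice)

def T_A(x, y):
    # Hypothetical temperature formula used by A
    return x**2 + 3 * y

def grad_T_A(x, y):
    # (dT_A/dx, dT_A/dy) for the formula above
    return 2 * x, 3.0

def to_A(X, Y, alpha=ALPHA):
    """Convert B's coordinates (X, Y) into A's coordinates (x, y) using (89)."""
    return (X * math.cos(alpha) - Y * math.sin(alpha),
            X * math.sin(alpha) + Y * math.cos(alpha))

def T_B(X, Y):
    return T_A(*to_A(X, Y))

def dTB_dX_chain(X, Y, alpha=ALPHA):
    """cos(alpha) dT_A/dx + sin(alpha) dT_A/dy, evaluated at the same point."""
    Tx, Ty = grad_T_A(*to_A(X, Y, alpha))
    return Tx * math.cos(alpha) + Ty * math.sin(alpha)

X0, Y0, h = 1.0, 2.0, 1e-6
numeric = (T_B(X0 + h, Y0) - T_B(X0 - h, Y0)) / (2 * h)
print(numeric, dTB_dX_chain(X0, Y0))   # the two values agree closely
```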

11.3. Another example: polar coordinates. Suppose a quantity P is given in terms of Cartesian coordinates x and y: $P = f(x, y)$. How does P change if we vary the polar coordinates r and θ, i.e. what are the partial derivatives of P with respect to r and θ?

To answer this question we must write P as a function of r and θ. Recall that the relation between Cartesian coordinates and polar coordinates is

$$x = r\cos\theta, \qquad y = r\sin\theta. \tag{90}$$

Therefore $P = f(x, y) = f(r\cos\theta, r\sin\theta)$, and we get

$$\frac{\partial P}{\partial r} = \cos\theta\,\frac{\partial f}{\partial x} + \sin\theta\,\frac{\partial f}{\partial y}, \qquad \frac{\partial P}{\partial\theta} = -r\sin\theta\,\frac{\partial f}{\partial x} + r\cos\theta\,\frac{\partial f}{\partial y}. \tag{91}$$

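Since the function f always gives us the value of the quantity P, these relations are usually written in this way:

$$\frac{\partial P}{\partial r} = \cos\theta\,\frac{\partial P}{\partial x} + \sin\theta\,\frac{\partial P}{\partial y}, \qquad \frac{\partial P}{\partial\theta} = -r\sin\theta\,\frac{\partial P}{\partial x} + r\cos\theta\,\frac{\partial P}{\partial y}. \tag{92}$$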

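Using the relation (90) between polar and Cartesian coordinates, we can write these equations in yet another way:

$$\frac{\partial P}{\partial r} = \frac{x}{r}\,\frac{\partial P}{\partial x} + \frac{y}{r}\,\frac{\partial P}{\partial y}, \qquad \frac{\partial P}{\partial\theta} = -y\,\frac{\partial P}{\partial x} + x\,\frac{\partial P}{\partial y}. \tag{93}$$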
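The Python sketch below checks (93). It uses the hypothetical quantity $P = f(x, y) = x^2 y$, writes it in polar coordinates via (90), and compares the right-hand sides of (93) with finite-difference derivatives in r and θ.

```python
import math

def f(x, y):
    # Hypothetical quantity P = f(x, y), chosen for illustration
    return x**2 * y

def f_x(x, y):
    return 2 * x * y

def f_y(x, y):
    return x**2

def P_polar(r, theta):
    """P written in polar coordinates via (90)."""
    return f(r * math.cos(theta), r * math.sin(theta))

def polar_partials_93(r, theta):
    """dP/dr and dP/dtheta from equation (93)."""
    x, y = r * math.cos(theta), r * math.sin(theta)
    dP_dr = (x / r) * f_x(x, y) + (y / r) * f_y(x, y)
    dP_dtheta = -y * f_x(x, y) + x * f_y(x, y)
    return dP_dr, dP_dtheta

r0, t0, h = 2.0, 0.7, 1e-6
numeric = ((P_polar(r0 + h, t0) - P_polar(r0 - h, t0)) / (2 * h),
           (P_polar(r0, t0 + h) - P_polar(r0, t0 - h)) / (2 * h))
print(polar_partials_93(r0, t0))
print(numeric)   # agrees closely with the line above
```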

12. Problems

³ One way of arriving at these relations is to use vectors, as in the first vector work sheet of this semester.


1. Use the chain rule to compute dz/dt for $z = \sin(x^2 + y^2)$, $x = t^2 + 3$, $y = t^3$. •

2. Use the chain rule to compute dz/dt for $z = x^2 y$, $x = \sin(t)$, $y = t^2 + 1$. •

3. Use the chain rule to compute ∂z/∂s and ∂z/∂t for $z = x^2 y$, $x = \sin(st)$, $y = t^2 + s^2$. •

4. Use the chain rule to compute ∂z/∂s and ∂z/∂t for $z = x^2 y^2$, $x = st$, $y = t^2 - s^2$. •

5. (a) Let $x = x(u, v)$, $y = y(u, v)$ be the following pair of functions of u, v:

$$x = u^2 - v^2, \qquad y = 2uv.$$

If $g(u, v) = f(x(u, v), y(u, v))$, compute $g_u(1, 0)$, $g_u(1, 1)$, $g_v(1, 0)$, and $g_v(1, 1)$, given these values of the partial derivatives of f:

$$\begin{array}{cc|cc} x & y & f_x(x, y) & f_y(x, y)\\ \hline 0 & 0 & A & B\\ 1 & 0 & C & D\\ 0 & 1 & E & F\\ 1 & 1 & G & H\\ 2 & 0 & I & J\\ 0 & 2 & K & L \end{array}$$

(b) Repeat the above problem if x and y are given by $x = u$, $y = v/u$.

(c) Repeat part (a) of this problem if x and y are given by $x = u + v$, $y = u - v$.

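6. Let x, y, X, Y, $T_A$, and $T_B$ be as in the example in §11.2. In that section we computed $\frac{\partial T_B}{\partial X}$.

(a) Compute $\frac{\partial T_B}{\partial Y}$. •

(b) Show that

$$\left(\frac{\partial T_A}{\partial x}\right)^2 + \left(\frac{\partial T_A}{\partial y}\right)^2 = \left(\frac{\partial T_B}{\partial X}\right)^2 + \left(\frac{\partial T_B}{\partial Y}\right)^2.$$

In other words, A and B may measure different partial derivatives, but the temperature gradients they find have the same length: $\|\vec\nabla T_A\| = \|\vec\nabla T_B\|$. •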

7. (About polar coordinates.) In §11.3 we saw how to use the chain rule to find $\frac{\partial f}{\partial r}$ and $\frac{\partial f}{\partial\theta}$ if we know the function f in terms of Cartesian coordinates (x, y). In this problem we turn the question around: suppose we are given a function in polar coordinates; how do we compute its gradient?

Recall that polar and Cartesian coordinates are related by

$$r = \sqrt{x^2 + y^2} \quad\text{and}\quad \theta = \arctan\frac{y}{x},$$

at least in the region where x > 0. (See Chapter III, § 4.)

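(a) Compute $\frac{\partial r}{\partial x}$, $\frac{\partial r}{\partial y}$, $\frac{\partial\theta}{\partial x}$, $\frac{\partial\theta}{\partial y}$. Try to simplify your answer as much as possible by reusing the variables r and θ. For instance, the simplest way to write $\frac{\partial r}{\partial x}$ is as $\frac{\partial r}{\partial x} = \frac{x}{r}$.

(b) Suppose a quantity P is given in terms of polar coordinates by $P = f(r, \theta)$. Express $\frac{\partial P}{\partial x}$ and $\frac{\partial P}{\partial y}$ in terms of $\frac{\partial f}{\partial r}$ and $\frac{\partial f}{\partial\theta}$.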

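More precisely, compute

$$\frac{\partial P}{\partial x} \stackrel{\text{def}}{=} \frac{\partial\{f(r(x, y), \theta(x, y))\}}{\partial x} \quad\text{and}\quad \frac{\partial P}{\partial y} \stackrel{\text{def}}{=} \frac{\partial\{f(r(x, y), \theta(x, y))\}}{\partial y}.$$

(c) Show that

$$\|\vec\nabla P\|^2 = \left(\frac{\partial f}{\partial r}\right)^2 + \frac{1}{r^2}\left(\frac{\partial f}{\partial\theta}\right)^2.$$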

8. For some function f we are told that at the point with Cartesian coordinates (4, 3) one has

$$\frac{\partial f}{\partial r} = 3, \qquad \frac{\partial f}{\partial\theta} = 6.$$

Compute the gradient $\vec\nabla f$ at (4, 3).

9. In physics an electric field is described by its potential function $\varphi = \varphi(x, y)$ (in this problem we assume the world is two-dimensional; the potential φ is measured in volts). Minus the gradient of the potential function is called the electric field:

$$\vec E = -\vec\nabla\varphi.$$

The electric potential of a point charge in the plane is given in polar coordinates by $\varphi = -C\ln r$, for some constant C. (Physicists will tell you that C depends on the charge placed at the origin; for us it is just some number, and we will in fact assume that C = 1.)

(a) Compute the electric field $\vec E$ corresponding to the potential $\varphi = -\ln r$. •

(b) Compute $\|\vec E\|$ (this quantity measures the strength of the electric field, but not its direction). Where is the electric field stronger? •

(c) Make a drawing of the level curves of the potential φ, and the electric field $\vec E$.

(d) In the three-dimensional world the electric potential generated by a charged particle at the origin is not given by $-C\ln r$, but instead by the so-called Coulomb potential

$$\varphi = \frac{C}{r}, \quad\text{where } r = \sqrt{x^2 + y^2 + z^2}.$$

Compute the corresponding electric field $\vec E = -\vec\nabla\varphi$.

10. The ideal gas law, PV = nRT, relates the pressure, volume, and temperature of n moles of gas (R is the ideal gas constant). Thus we can view pressure, volume, and temperature as variables, each one dependent on the other two.

(In this problem pressure is measured in pascals, temperature in kelvins, and volume in liters.)

Each of the following three questions can be answered by applying the chain rule to differentiate $z(t) = f(x(t), y(t))$ for suitable quantities x, y, and z. In each case state which variables play the role of x, y, z, and what the function f is.

(a) If the pressure of a gas is increasing at a rate of 0.2 Pa/min and the temperature is increasing at a rate of 1 K/min, how fast is the volume changing?

(b) If the volume of a gas is decreasing at a rate of 0.3 L/min and the temperature is increasing at a rate of 0.5 K/min, how fast is the pressure changing?

(c) If the pressure of a gas is decreasing at a rate of 0.4 Pa/min and the volume is increasing at a rate of 3 L/min, how fast is the temperature changing?

11. The ideal gas law says PV = nRT, where P, V, T are variables, and n, R are constants. Verify the following identity:

$$\frac{\partial P}{\partial V}\,\frac{\partial V}{\partial T}\,\frac{\partial T}{\partial P} = -1.$$

12. The previous exercise was a special case of the following fact, which you are asked to verify here:

Assume that F(x, y, z) is a function of 3 variables, and suppose that the relation F(x, y, z) = 0 defines each of the variables in terms of the other two, namely $x = f(y, z)$, $y = g(x, z)$ and $z = h(x, y)$. Then

$$\frac{\partial x}{\partial y}\,\frac{\partial y}{\partial z}\,\frac{\partial z}{\partial x} = -1.$$

Hint: this is a problem about implicit differentiation.

13. Four cartographers are using different coordinates to describe the same landscape. Each of them describes the landscape by specifying the height of a point in the landscape as a function of its position above a horizontal plane.

Cartographer A uses Cartesian coordinates (x, y) in the plane; B uses Cartesian coordinates (X, Y) in the plane. The coordinates (X, Y) are rotated by 45° with respect to (x, y) (see §11.2).

Cartographer C works with A but uses polar coordinates (r, θ) (r is the distance to the origin, θ is the angle with A's x-axis).

Cartographer D works with B and uses polar coordinates (r, ϕ) (r is the distance to the origin, ϕ is the angle with B's X-axis).

Here is a picture of the landscape that A, B, C, and D are looking at:

(a) If B has found that the height is given by the function $f(X, Y) = 2XY/(X^2 + Y^2)$, what function does A find for the height? •

(b) What height function does C find? •

(c) What height function does D find? •

14. Brian and Ally are using different Cartesian coordinate systems in the plane: (x, y) for Ally, (X, Y) for Brian. They have the same origin, but Brian's coordinates are rotated by an angle of $\theta = \arctan\frac{4}{3}$ (≈ 53°, although that is only an approximation; you can give exact answers in this problem, and you don't need a calculator).

(a) What is the relation between (x, y) and (X, Y)?

(b) If Ally has found that $T_A(x, y) = 32 + 0.1y$, what formula $T_B(X, Y)$ will Brian use to describe the temperature?

(c) On a different occasion Ally found that the temperature had changed. Now Ally measures the temperature and finds that at the point with x = 1, y = 1 one has $T_A(1, 1) = 35$, and also $\frac{\partial T_A}{\partial x} = 0.05$ and $\frac{\partial T_A}{\partial y} = 0.8$. Which coordinates does Brian assign to this point? Which temperature $T_B$, and which derivatives $\frac{\partial T_B}{\partial X}$ and $\frac{\partial T_B}{\partial Y}$, does Brian compute at this point?

[Hint: before you compute anything, find sin θ and cos θ; also draw a right triangle one of whose acute angles is θ.]

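13. Higher Partials and Clairaut's Theorem

13.1. Higher partial derivatives. By definition

$$\frac{\partial^2 f}{\partial x^2} = \frac{\partial\left(\frac{\partial f}{\partial x}\right)}{\partial x}, \qquad \frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial\left(\frac{\partial f}{\partial y}\right)}{\partial x}, \qquad \frac{\partial^2 f}{\partial y\,\partial x} = \frac{\partial\left(\frac{\partial f}{\partial x}\right)}{\partial y}, \qquad \frac{\partial^2 f}{\partial y^2} = \frac{\partial\left(\frac{\partial f}{\partial y}\right)}{\partial y}. \tag{94}$$

In subscript notation these higher partial derivatives are written as follows:

$$f_{xx}(x, y) = \frac{\partial^2 f}{\partial x^2}, \qquad f_{xy}(x, y) = \frac{\partial^2 f}{\partial y\,\partial x}, \qquad f_{yx}(x, y) = \frac{\partial^2 f}{\partial x\,\partial y}, \qquad f_{yy}(x, y) = \frac{\partial^2 f}{\partial y^2}.$$

Note the reversal of the x, y order in the mixed partial derivatives!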

13.2. Example. If $f(x, y) = x^2 y + \cos xy$, then $f_x = 2xy - y\sin xy$, and hence

$$f_{xx} = \frac{\partial(2xy - y\sin xy)}{\partial x} = 2y - y^2\cos xy, \qquad f_{xy} = \frac{\partial(2xy - y\sin xy)}{\partial y} = 2x - \sin xy - xy\cos xy.$$

The other partial derivatives follow from $f_y = x^2 - x\sin xy$, and they are

$$f_{yx} = 2x - \sin xy - xy\cos xy, \qquad f_{yy} = -x^2\cos xy.$$

Every time we take a derivative, we can choose whether to differentiate with respect to x or y. Differentiating once gives two possibilities; differentiating twice gives 2 × 2 = 4 possibilities, etc. That is why we found four partial derivatives of second order in the above example. But if we look carefully, we also see that $f_{xy}$ and $f_{yx}$ are the same. This is no coincidence.
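The hand computations in Example 13.2 can be spot-checked numerically. This Python sketch compares them with central second-difference quotients at one arbitrarily chosen point. It illustrates that $f_{xy}$ and $f_{yx}$ agree for this particular f, but of course it proves nothing in general.

```python
import math

def f(x, y):
    return x**2 * y + math.cos(x * y)

# Second partials computed by hand in Example 13.2
def f_xx(x, y):
    return 2 * y - y**2 * math.cos(x * y)

def f_xy(x, y):
    # Example 13.2 found the same expression for f_yx
    return 2 * x - math.sin(x * y) - x * y * math.cos(x * y)

def f_yy(x, y):
    return -x**2 * math.cos(x * y)

def second_differences(x, y, h=1e-4):
    """Central second-difference approximations of f_xx, f_xy (= f_yx), f_yy."""
    dxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    dyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    dxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return dxx, dxy, dyy

x0, y0 = 1.3, -0.4
print(second_differences(x0, y0))
print(f_xx(x0, y0), f_xy(x0, y0), f_yy(x0, y0))   # close to the line above
```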

13.3. Clairaut's Theorem: mixed partials are equal. Let f be a function of two variables. Suppose the mixed partial derivative $f_{xy}(x, y)$ exists for all (x, y) in a neighborhood of a point (a, b), and this derivative is continuous at (a, b). Then the other mixed partial derivative $f_{yx}(a, b)$ also exists, and $f_{xy}(a, b) = f_{yx}(a, b)$.

So we normally don't have to worry about the order in which we take partial derivatives.


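13.4. Proof of Clairaut's theorem. With some algebra we can show that the definition of partial derivatives implies

$$\frac{\partial^2 f}{\partial x\,\partial y} = \lim_{\Delta x\to 0}\ \lim_{\Delta y\to 0} \frac{f(x + \Delta x, y + \Delta y) - f(x, y + \Delta y) - f(x + \Delta x, y) + f(x, y)}{\Delta x\,\Delta y}, \tag{95}$$

while

$$\frac{\partial^2 f}{\partial y\,\partial x} = \lim_{\Delta y\to 0}\ \lim_{\Delta x\to 0} \frac{f(x + \Delta x, y + \Delta y) - f(x, y + \Delta y) - f(x + \Delta x, y) + f(x, y)}{\Delta x\,\Delta y}. \tag{96}$$

So it is a matter of showing that one can switch the two limits. We won't go into the details here, but the hypothesis that $f_{xy}$ is continuous implies that we are indeed allowed to switch the limits.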
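For a smooth function, the order of the two limits in (95) and (96) does not matter. To see this numerically, one can evaluate the double difference quotient with Δy much smaller than Δx, and the other way around. This Python sketch does so for the function of Example 13.2. The step sizes are arbitrary choices.

```python
import math

def f(x, y):
    # The function from Example 13.2
    return x**2 * y + math.cos(x * y)

def double_quotient(x, y, dx, dy):
    """The difference quotient inside the limits in (95) and (96)."""
    return (f(x + dx, y + dy) - f(x, y + dy) - f(x + dx, y) + f(x, y)) / (dx * dy)

def mixed_partial(x, y):
    # f_xy = f_yx as computed by hand in Example 13.2
    return 2 * x - math.sin(x * y) - x * y * math.cos(x * y)

x0, y0 = 1.0, 0.5
# dy much smaller than dx imitates taking the inner limit dy -> 0 first, as in (95);
# dx much smaller than dy imitates the order of limits in (96).
results = []
for dx, dy in [(1e-2, 1e-4), (1e-4, 1e-2), (1e-3, 1e-3)]:
    q = double_quotient(x0, y0, dx, dy)
    results.append(q)
    print(f"dx={dx:g}, dy={dy:g}: quotient={q:.5f}")
print(f"f_xy(1, 0.5) = {mixed_partial(x0, y0):.5f}")
```

All three quotients land close to the same value, whichever step is taken smaller.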

14. Finding a function from its derivatives

We now look at integrating the partial derivatives of a function. This may look out of place in a chapter on derivatives rather than integrals, but Clairaut's Theorem actually turns out to play a role.

If we have the derivative $f'(x)$ of some function of one variable, then we know how to recover the function $f(x)$: we integrate, i.e.

$$f(x) = \int f'(x)\,dx + C.$$

Furthermore, any continuous function is the derivative of some function: if someone gives us a continuous function $f(x)$, then

$$F(x) \stackrel{\text{def}}{=} \int_a^x f(t)\,dt$$

is a differentiable function whose derivative is $F'(x) = f(x)$.

What about functions of more than one variable? Suppose we know the partial derivatives

$$\frac{\partial f}{\partial x} = P(x, y) \quad\text{and}\quad \frac{\partial f}{\partial y} = Q(x, y) \tag{97}$$

of a function of two variables. Can we then find the function f(x, y)?

The answer is: "yes, you can find f by integrating, if it exists; but not every pair of functions P and Q is the pair of partial derivatives of some function." The following two examples are typical of what can happen.

14.1. Example. Does there exist a function f(x, y) of two variables such that

$$\frac{\partial f}{\partial x} = x^3 - 2xy \quad\text{and}\quad \frac{\partial f}{\partial y} = 3y^2$$

both hold? The answer is no, such a function cannot exist. Here is the reason: if there were such a function, then we could compute

$$\frac{\partial^2 f}{\partial y\,\partial x} = \frac{\partial(x^3 - 2xy)}{\partial y} = -2x \quad\text{and}\quad \frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial(3y^2)}{\partial x} = 0.$$

By Clairaut's Theorem both computations should give the same answer, but they don't (they differ wherever x ≠ 0). Therefore the function f whose partials are as above cannot exist.
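Clairaut's condition is easy to test numerically. The Python sketch below approximates $P_y - Q_x$ for the pair of Example 14.1 and, for comparison, for the pair of Example 14.2 that follows (both pairs share the same P). The helper name `clairaut_defect` is ours.

```python
import math

def clairaut_defect(P, Q, x, y, h=1e-5):
    """Central-difference approximation of P_y - Q_x at (x, y).

    A clearly nonzero value at even one point shows (by Clairaut's theorem) that no
    function f with f_x = P and f_y = Q exists. Values near zero at a few sample
    points prove nothing by themselves; they only fail to rule such an f out.
    """
    P_y = (P(x, y + h) - P(x, y - h)) / (2 * h)
    Q_x = (Q(x + h, y) - Q(x - h, y)) / (2 * h)
    return P_y - Q_x

def P(x, y):
    return x**3 - 2 * x * y          # proposed f_x in Examples 14.1 and 14.2

def Q(x, y):
    return 3 * y**2                  # proposed f_y in Example 14.1

def Q2(x, y):
    return math.sin(math.pi * y) - x**2   # proposed f_y in Example 14.2

for point in [(1.0, 0.0), (2.0, -1.0)]:
    print(point, clairaut_defect(P, Q, *point), clairaut_defect(P, Q2, *point))
```

For the first pair the defect is approximately −2x; for the second it is approximately zero.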

14.2. Example. Does there exist a function f(x, y) of two variables whose derivatives are

$$\frac{\partial f}{\partial x} = x^3 - 2xy \quad\text{and}\quad \frac{\partial f}{\partial y} = \sin\pi y - x^2?$$

Let's check Clairaut's condition:

$$\frac{\partial^2 f}{\partial y\,\partial x} = \frac{\partial(x^3 - 2xy)}{\partial y} = -2x \quad\text{and}\quad \frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial(\sin\pi y - x^2)}{\partial x} = -2x.$$

This time both computations give the same answer, so Clairaut's theorem does not rule out the existence of the function f that we are looking for. We can try to compute it by integrating both partial derivatives. There is a systematic way of doing this that usually leads to the answer.

We first integrate $f_x$ while treating y as a constant:

$$f(x, y) = \int\{x^3 - 2xy\}\,dx = \tfrac14 x^4 - x^2 y + C(y).$$

The "constant" is only a constant in the sense that it does not depend on x. It may depend on y, and that is why we wrote it as C(y). To find C(y) we differentiate this result with respect to y:

$$\sin\pi y - x^2 = f_y = \frac{\partial\{\tfrac14 x^4 - x^2 y + C(y)\}}{\partial y} = -x^2 + C'(y).$$

So we see that $C'(y) = \sin\pi y$, and hence $C(y) = -\frac{1}{\pi}\cos\pi y + K$, where K is a real constant (K depends neither on x nor on y). We find that the following function has the prescribed partial derivatives:

$$f(x, y) = \tfrac14 x^4 - x^2 y - \frac{1}{\pi}\cos\pi y + K,$$

where K is an arbitrary constant.

The method used in this example always works, and we summarize this fact in the following theorem.

14.3. Theorem. Suppose P(x, y) and Q(x, y) are two functions defined on a rectangular domain $R = \{(x, y) : a < x < b,\ c < y < d\}$, and suppose that they have continuous partial derivatives on this domain.

If a function f(x, y) exists such that (97) holds on R, then

$$\frac{\partial P}{\partial y} = \frac{\partial Q}{\partial x} \tag{98}$$

must hold on R.

Conversely, if P and Q satisfy (98), then there is a function f defined on R that satisfies (97).

To prove this theorem we need to understand integrals of functions of several variables, and Green's theorem in particular, so the proof will have to wait until the end of the semester. See § VII.11.

It should be noted that the assumption above that the functions P and Q be defined on a rectangle is important: the theorem is no longer true if the domain of P and Q "has holes." See problem 15.16.
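The answer found in Example 14.2 can be verified numerically. The Python sketch below compares central-difference partials of $f(x, y) = \tfrac14 x^4 - x^2 y - \frac{1}{\pi}\cos\pi y + K$ with P and Q at a few arbitrarily chosen sample points. It takes K = 0, since any constant gives the same partials.

```python
import math

def f(x, y, K=0.0):
    # Candidate found in Example 14.2 (K is the arbitrary constant)
    return 0.25 * x**4 - x**2 * y - math.cos(math.pi * y) / math.pi + K

def P(x, y):
    return x**3 - 2 * x * y

def Q(x, y):
    return math.sin(math.pi * y) - x**2

def max_mismatch(points, h=1e-5):
    """Largest |f_x - P| or |f_y - Q| over the sample points (central differences)."""
    worst = 0.0
    for x, y in points:
        fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
        fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
        worst = max(worst, abs(fx - P(x, y)), abs(fy - Q(x, y)))
    return worst

sample = [(0.0, 0.0), (1.0, 0.5), (-2.0, 1.5), (0.3, -0.7)]
print(max_mismatch(sample))   # tiny: only finite-difference error remains
```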


15. Problems

1. Find all first and second partial derivatives of $x^3 y^2 + y^5$. •

2. Find all first and second partial derivatives of $4x^3 + xy^2 + 10$. •

3. Find all first and second partial derivatives of $x\sin y$. •

4. Find all first and second partial derivatives of $\sin(3x)\cos(2y)$.

5. Find all first and second partial derivatives of $e^{x + y^2}$.

6. Find all first and second partial derivatives of $\ln\sqrt{x^3 + y^4}$.

7. Find all first and second partial derivatives of z with respect to x and y if $x^2 + 4y^2 + 16z^2 - 64 = 0$. (Hint: solve for z or use implicit differentiation...)

8. Find all first and second partial derivatives of z with respect to x and y if $xy + yz + xz = 1$. (Hint: solve for z or use implicit differentiation...)

9. How many different second partial derivatives does a function of two variables have? What about a function of three variables? How many derivatives of third order does a function of two variables have? •

10. Derive the formulas (95) and (96) from the definition of partial derivatives (51) and (52).

11. The equation that describes the vibrating string (as in a guitar, piano, or violin string) is

$$\frac{\partial^2 f}{\partial t^2} = c^2\,\frac{\partial^2 f}{\partial x^2}, \tag{99}$$

where c > 0 is some constant. This equation is called the wave equation. It is an example of a partial differential equation.

Note: this problem looks like a problem about differential equations, but to answer the following questions you really only have to compute partial derivatives of certain functions and solve some (easy) algebraic equations.

(a) For which values of the constant v is a "traveling wave with velocity v and profile F(x)" a solution of the wave equation (99)? Does it matter which profile F is used here?

(For the terminology used here, revisit problem 5.16 in Chapter III, §5.2.)

(b) Suppose the string is clamped down at its ends, and that its length is L. For which values of the constants A and α is

$$f(x, t) = A\sin(\alpha t)\sin\frac{\pi x}{L}$$

a solution of the wave equation? (Assume A ≠ 0.)

(c) Same question for

$$g(x, t) = B\sin(\beta t)\sin\frac{2\pi x}{L}.$$

(d) Describe the movies that go with the solutions you found in (b) and (c). Which of the two graphs moves faster?

(e) Show that $h(x, t) = f(x, t) + g(x, t)$ is again a solution of the wave equation, where f and g are as above. (Don't use the formulas for f and g: it is easier to prove a more general fact, namely, if two functions f and g satisfy (99), then so does their sum f + g.)

(f) Describe the movie that goes with the function h(x, t) (it is probably better to use a graphing application like grapher.app on Mac OS X, or graphcalc.exe on Windows or Linux).

12. Suppose $P(x, y) = x^2 - 2xy^3$ and $Q(x, y) = (xy)^2$. Does there exist a function f(x, y) such that $P = f_x$ and $Q = f_y$?

13. Suppose $P(x, y) = x^2 + axy^3$ and $Q(x, y) = (xy)^2$, where a is a constant. For which a does there exist a function f(x, y) such that $P = f_x$ and $Q = f_y$?

14. Suppose $P(x, y) = x^2 - 2xy^3$ and $Q(x, y) = (xy)^2$. Does there exist a function f(x, y) such that $P = f_x$ and $Q = f_y$?


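15. Suppose $x = u + v$, $y = u - v$, and suppose $f(x, y) = g(u, v)$. Then compute

(a) $\dfrac{\partial^2 g}{\partial u^2}$ •

(b) $\dfrac{\partial^2 g}{\partial v^2}$ •

(c) $\dfrac{\partial^2 g}{\partial u\,\partial v}$ •

(d) $\dfrac{\partial^2 g}{\partial u^2} - \dfrac{\partial^2 g}{\partial v^2}$ •

(e) $\dfrac{\partial^2 g}{\partial u^2} + \dfrac{\partial^2 g}{\partial v^2}$ •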

16. [For discussion] Let

$$P(x, y) = \frac{-y}{x^2 + y^2}, \qquad Q(x, y) = \frac{x}{x^2 + y^2}.$$

(a) What is the domain of P and Q?

(b) Show that

$$P = \frac{\partial\theta}{\partial x}, \qquad Q = \frac{\partial\theta}{\partial y},$$

where θ is the angle variable from polar coordinates.

(c) Show that P and Q satisfy the condition (98). (You don't have to compute the derivatives to check this, although you could.)

(d) Is there a function f such that (97) holds?

CHAPTER 5

Maxima and Minima

In first semester calculus we learned how to find the maximal and minimal values of a function y = f(x) of one variable. The basic method is as follows. Assuming the independent variable is restricted to some interval $a \le x \le b$, we first look for interior maxima and minima. For a differentiable function these always occur at critical or stationary points, i.e. solutions x of $f'(x) = 0$. We then check the function values at the endpoints a and b of the interval, to see if they might be maxima or minima.

To find out which solutions of $f'(x) = 0$ are actually local maxima or minima, we can look at the sign of the derivative $f'(x)$ to see where the function is increasing or decreasing, or we can apply the second derivative test.

In this chapter we will see how to solve similar questions about functions of two or more variables.

1. Local and Global extrema

Let z = f(x, y) be the function whose maximal or minimal values we are looking for, and let D be the domain of this function. This domain could be the largest possible domain for the given function (in case f is defined by a formula), but it could also be some smaller region that we ourselves have chosen. The question we are considering is

What are the largest and smallest values that f(x, y) can have if the point (x, y) belongs to the domain D?

1.1. Definition of global extrema. The function f has a global maximum or absolute maximum at a point (a, b) in D if $f(x, y) \le f(a, b)$ for all points (x, y) in D.

Similarly, the function f has a global minimum or absolute minimum at a point (a, b) in D if $f(x, y) \ge f(a, b)$ for all points (x, y) in D.

1.2. Definition of local extrema. The function f has a local maximum at a point (a, b) in D if there is an r > 0 such that $f(x, y) \le f(a, b)$ for all points (x, y) in D that also lie in the disc of radius r centered at (a, b).

Local minima are defined analogously.

1.3. Interior extrema. Recall that a point (a, b) in a domain D is called interior if it is not a boundary point; more precisely, if there is some small r > 0 such that the disc with radius r centered at (a, b) is entirely contained in D. We will apply this distinction to the local and global maxima and minima that we find: an interior local minimum is a local minimum that occurs at an interior point of the domain D of the function.


Figure 1. The graph of $f(x, y) = x^2 + y^2$ from example § 2.2 on three different rectangles Q. From left to right:

(i) $0 \le x \le 1$, $0 \le y \le 1$. Both max and min are attained at a corner point of the rectangle.
(ii) $0 \le x \le 1$, $-1 \le y \le 1$. Two maxima, both attained at corner points of the rectangle; the minimum is attained at an edge point.
(iii) $-1 \le x \le 1$, $-1 \le y \le 1$. Four maxima, all attained at corner points of the rectangle; the minimum is attained at an interior point.

2. Continuous functions on closed and bounded sets

Before we go into the details of how we can actually find the maxima and minima, it is good to know the following general fact. It tells us when we can expect maxima and minima to exist.

Let $z = f(x_1, \dots, x_n)$ be a continuous function defined on some closed and bounded region D in $\mathbb{R}^n$. Here "closed" means that D contains all its boundary points, and "bounded" means that no point in D is further away from the origin than some fixed radius R (D does not "stretch all the way to infinity").

We will also assume that f is continuous on D.

2.1. Theorem about Maxima and Minima of Continuous Functions. A continuous function defined on a closed and bounded region $D \subset \mathbb{R}^n$ has both a maximum and a minimum within that region.

The precise definitions of the concepts (continuous, closed, bounded) and the proof of this theorem all involve a fair number of ε's and δ's. This material is treated in courses like Math 421, 521 (real analysis) or 551 (point set topology), and it really does not belong here in Math 234. Nevertheless, it is important to have some understanding of what the above theorem means. The following examples are meant to clarify this.

2.2. Example –�e function f(x, y) = x2 + y2. �is function is continuous, andthe square Q = {(x, y) : 0 ≤ x ≤ 1, 0 ≤ y ≤ 1} is bounded, and it contains allboundary points (the edges of the square). �erefore �eorem 2.1 tells us that f a�ainsboth its highest and lowest values somewhere in the square. �e theorem does not saywhere these max/min points are, but in this example they are easy to �nd. �e functionf(x, y) = x2 + y2 is at its smallest when both x = 0 and y = 0, i.e. at the bo�om-le�corner of the square. And f(x, y) is at its largest when x and y are both as large as theycan be, i.e. when x = 1 and y = 1. �is happens at the top-right corner of the square.

Note that the boundary of the rectangle $Q$ has two different kinds of points: it has four corner points, and then all the other points that lie on the edges.

If we change the rectangle $Q$, then the minimum can appear at a corner point, at a point on an edge, or at an interior point. See Figure 1.


2.3. A fishy example. Consider the function $f(x, y) = x^2 - x^3 - y^2$. Its zero set is the curve $y^2 = x^2 - x^3$, which is shaped like the letter α, or like a fish (see Figure 2). The function is positive on the tail ($D_1$) and also on the body ($D_2$) of the fish, it vanishes on the curve that traces out the fish, and $f$ is negative elsewhere.

We assume that both regions $D_1$ and $D_2$ are closed, which means that we assume that they include their boundary points. See Figure 2 below.

Theorem 2.1 does not apply to the region $D_1$ because $D_1$ is not bounded (it contains the whole negative x-axis). But the region $D_2$ is bounded, and our function $f$ is continuous, so Theorem 2.1 does apply to $D_2$. The theorem tells us that the function $f$ has a maximal value and a minimal value somewhere in $D_2$. In the interior of $D_2$ the function is strictly positive, and at the boundary points of $D_2$ we have $f = 0$. Therefore each boundary point is a minimum point of $f$ on $D_2$. The point(s) in $D_2$ where $f$ attains its highest value must be somewhere in the interior of $D_2$. In the next section we will see how to find it (and how to check that in this case there really is only one such point).
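Even before we have the tools of the next section, a grid search can locate the interior maximum on $D_2$ approximately. The Python sketch below keeps only grid points with $x \ge 0$ and $f \ge 0$, which separates the body $D_2$ from the tail $D_1$. The grid spacing $1/300$ and the bounding box $0 \le x \le 1$, $|y| \le 0.4$ are our own choices; the box contains $D_2$ because $y^2 \le x^2(1 - x) \le \tfrac{4}{27}$ there.

```python
def f(x, y):
    return x**2 - x**3 - y**2


n = 300
# Grid points of the box 0 <= x <= 1, -0.4 <= y <= 0.4 that lie in D2,
# i.e. where f >= 0 (restricting to x >= 0 excludes the tail D1).
samples = [
    (i / n, j / n)
    for i in range(0, n + 1)
    for j in range(-120, 121)
    if f(i / n, j / n) >= 0
]
best = max(samples, key=lambda p: f(*p))
print(best, f(*best))   # approximately (0.667, 0.0) and 0.148
```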

[Figure 2, left: the curve $x^2 - y^2 - x^3 = 0$ in the $(x, y)$-plane, enclosing the regions $D_1$ (tail) and $D_2$ (body), with interior points and boundary points marked.]

Figure 2. Left: The region where $f(x, y) = x^2 - x^3 - y^2$ is positive consists of two parts, one bounded ($D_2$) and the other unbounded ($D_1$). Theorem 2.1 does not apply to the unbounded region, but it does apply to the bounded region $D_2$. In that region $f$ must attain a maximum and also a minimum. Since $f = 0$ on the boundary of the region $D_2$ and $f > 0$ in the interior, $f$ achieves its lowest value in $D_2$ everywhere on the boundary of $D_2$ and its highest value somewhere in the interior. Theorem 2.1 does not tell us how to find that interior point, and allows for the possibility that there might be more interior maxima, as well as a few interior (local) minima.

Right: The graph of the function $z = x^2 - y^2 - x^3$.

3. Problems

1. Suppose you want to find the maximal value of $f(x, y) = x^2 - x^3 - y^2$ over all possible $(x, y)$ with $x \ge 0$ (and no restriction on $y$; this region is called the right half plane).

(a) Explain why you should always choose $y = 0$ in order to maximize this particular function $f(x, y)$. •

(b) Use your answer to part (a) to find the point $(x, y)$ that maximizes $f(x, y)$ over the right half plane. •

(c) Does our function $f(x, y)$ have a maximal value if $(x, y)$ can be any point in the plane? (Hint: what is $f(-1000, 0)$?) •


2. Suppose that $D$ is a bounded and closed region in the plane (you should draw one: any region will do as long as you include the boundary points).

Where does the function $f(x, y) = x$ attain its maximum in the region that you drew? Can $f$ attain its maximum at an interior point of the region?

What about minima?

3. Draw the region
$$R = \{(x, y) : y^2 \le 4(x^3 - x^4)\}.$$
Find the largest and smallest values that the function $f(x, y) = x$ can have on this region.

(Hint: where is $4(x^3 - x^4) = 4x^3(1 - x)$ positive? The region looks like an onion.) •

4. Critical points

For functions $y = f(x)$, $a \le x \le b$, of one variable the standard way of finding minima (and maxima) is to look for them in two different places: either the minimum is attained at one of the end points $x = a$ or $x = b$ of the interval, or else the minimum is attained at an interior point. At an interior minimum one has $f'(x) = 0$, so such minima can be found by solving the equation $f'(x) = 0$. The same approach works for functions of two or more variables. The basic fact that tells us this is Theorem 4.2 below.

4.1. Definition (critical point). A critical point of a function $z = f(x, y)$ of two variables is a point $(a, b)$ at which $\vec\nabla f(a, b) = \vec 0$, i.e. at which
$$f_x(a, b) = 0 \quad\text{and}\quad f_y(a, b) = 0.$$

At a critical point of a function the tangent plane to the graph is horizontal.

4.2. Theorem. Local extrema are critical points. If a function $z = f(x, y)$ defined on a domain $D$ has a local minimum or local maximum at an interior point $(a, b)$, then
$$\frac{\partial f}{\partial x}(a, b) = 0 \quad\text{and}\quad \frac{\partial f}{\partial y}(a, b) = 0.$$

Picture proof. (See Figure 3.) If $f$ has a local maximum at an interior point $(a, b)$, then $f(x, y) \le f(a, b)$ for all $(x, y)$ close to $(a, b)$. This means that a small piece of the graph of $f$ near its local maximum at $(a, b, f(a, b))$ lies below the horizontal plane $z = f(a, b)$ and touches it at $(a, b, f(a, b))$. This plane must therefore be the tangent plane to the graph of $f$. Being horizontal, its slopes are zero, and these slopes are exactly the partial derivatives of $f$ at $(a, b)$.

Frozen variable proof. Suppose $f$ has a local maximum at an interior point $(a, b)$ of the domain $D$. Then we can freeze the $y$-variable at the value $y = b$ and consider the function of one variable $g(x) = f(x, b)$. This function has a local maximum at $x = a$, so by first semester calculus we know that $g'(a) = 0$. By definition $g'(a) = f_x(a, b)$, so we conclude that $f_x(a, b) = 0$.

By freezing $x$ instead of $y$ we find that $f_y(a, b) = 0$ also must hold. The same arguments apply in the case of a local minimum.

4.3. Three typical critical points. Let’s find the critical points of the following three functions:
$$f(x, y) = x^2 + y^2, \qquad g(x, y) = x^2 - y^2, \qquad h(x, y) = -x^2 - y^2.$$


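[Figure 3: a local maximum of a graph over the $(x, y)$-plane, with the horizontal tangent plane and the slopes $f_x = 0$, $f_y = 0$ indicated.]

Figure 3. Theorem 4.2: at a local maximum the tangent plane to the graph is horizontal. The partial derivatives with respect to both $x$ and $y$ vanish, and in fact, the derivative along any path through $(a, b)$ vanishes. To see a picture of a local minimum turn the page upside down.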

▸ $f(x, y) = x^2 + y^2$. Computing the partial derivatives of the first function we find
$$\frac{\partial f}{\partial x} = 2x, \qquad \frac{\partial f}{\partial y} = 2y.$$
If $(x, y)$ is a critical point of $f$, then $x$ and $y$ must satisfy the equations $f_x(x, y) = 0$ and $f_y(x, y) = 0$, in this case $2x = 0$ and $2y = 0$. So $f$ has exactly one critical point, namely the origin $(x, y) = (0, 0)$.

Is this critical point perhaps a minimum or a maximum? Since squares can never be negative, $f(x, y) = x^2 + y^2$ is always non-negative, and it is at its smallest when both terms $x^2$ and $y^2$ vanish, i.e. when $x = y = 0$. So $f(x, y)$ has a global minimum at the origin.

▸ $h(x, y) = -x^2 - y^2$. This function is just $-f(x, y)$, and without looking at its derivatives we can tell that it has a global maximum at the origin (because $f(x, y)$ has a global minimum there). The derivatives are
$$\frac{\partial h}{\partial x} = -2x, \qquad \frac{\partial h}{\partial y} = -2y,$$
so the origin is the only critical point of this function.

[Figure 4: three graphs, labeled local max, local min, and saddle point.]

Figure 4. The three most common kinds of critical point. See the examples in §4.3 and also the second derivative test in §9.


▸ $g(x, y) = x^2 - y^2$. The derivatives of $g$ are
$$\frac{\partial g}{\partial x} = 2x, \qquad \frac{\partial g}{\partial y} = -2y,$$
so, once again, the origin is the only critical point. But, unlike the previous two functions, $g$ has neither a maximum nor a minimum at the origin. We can see this by first looking at what $g$ does on the x-axis, and then at what $g$ does on the y-axis:

On the x-axis we have $g(x, 0) = +x^2$, so $g$ restricted to the x-axis has a minimum at the origin.

On the y-axis we have $g(0, y) = -y^2$, so $g$ restricted to the y-axis has a maximum at the origin.

So arbitrarily close to the origin we can find points $(x, y)$ where $g(x, y)$ is larger than $g(0, 0)$, and other points where $g(x, y)$ is smaller than $g(0, 0)$. Therefore $g$ does not have a local maximum or a local minimum at the origin.

Figure 4 shows the three cases we have just discussed.

4.4. Critical points in the fishy example. What are the critical points of the function $f(x, y) = x^2 - x^3 - y^2$ from §2.3?

We compute the partial derivatives of the function:
$$\frac{\partial f}{\partial x} = 2x - 3x^2 = (2 - 3x)x, \qquad \frac{\partial f}{\partial y} = -2y.$$
The equation $f_y = 0$ implies that $y = 0$, while $f_x = 0$ implies $x = 0$ or $x = \tfrac23$. Therefore $f$ has two critical points: one at the origin $(0, 0)$, and the other at $(\tfrac23, 0)$.


In this example we could have predicted from the shape of the zero set of $f$ that $f$ has at least two critical points, without computing the derivatives of $f$. Namely, the zero set of $f$ is a curve that crosses itself at the origin. If $\vec\nabla f(0, 0)$ were not zero, the Implicit Function Theorem 10.1 (chapter 2) would say that near the origin the zero set is a single smooth curve, which it is not; hence $f_x = f_y = 0$ there. And in §2.3 we argued that the function $f$ must have a local maximum somewhere in the interior of the region $D_2$ (Figure 2), so $f$ must have at least one more critical point. Computing the critical points tells us more than the picture does: there is only one critical point in the interior of $D_2$, namely $(\tfrac23, 0)$, so the maximum of $f$ on $D_2$ is attained at exactly that point.

4.5. Another example – find the critical points of $f(x, y) = x - x^3 - xy^2$.

Solution: The derivatives of our function are
$$\frac{\partial f}{\partial x} = 1 - 3x^2 - y^2, \qquad \frac{\partial f}{\partial y} = -2xy.$$
The critical points are therefore the solutions of the equations
$$1 - 3x^2 - y^2 = 0, \qquad -2xy = 0.$$

This is a system of two equations with two unknowns (that always happens when we look for critical points, since we are looking for solutions of $f_x(x, y) = 0$, $f_y(x, y) = 0$). The second equation, $-2xy = 0$, implies that either $x = 0$ or $y = 0$ (or both). We have to treat these two cases separately:

The case $x = 0$. If $x = 0$, then only the first equation is left, and it tells us $1 - y^2 = 0$, i.e. $y = \pm 1$. We find two critical points with $x = 0$, namely $(0, 1)$ and $(0, -1)$.

The other case, $x \ne 0$. If $x \ne 0$, then the second equation ($-2xy = 0$) implies $y = 0$. Substituting this in the first equation we find $1 - 3x^2 = 0$, i.e. $x = \pm\tfrac13\sqrt3$, so that we have two critical points with $x \ne 0$, namely $(-\tfrac13\sqrt3, 0)$ and $(\tfrac13\sqrt3, 0)$.

[Figure 5: the line $x = 0$ and the unit circle $x^2 + y^2 = 1$, with the sign (+ or −) of $f$ marked in each region and the four critical points labeled A, B, C, D.]

Figure 5. The zero set and signs of the function $f(x, y) = x - x^3 - xy^2$.

The conclusion is that this function has four critical points, two on the x-axis and two on the y-axis. Without looking into this in any further detail we cannot tell if any of these points are local maxima or minima. In general the second derivative test (to be explained in §9) will provide this information. For this example a look at the zero set of $f$ also helps us figure out what kind of critical points we have found. Since $f$ factors as

$$f(x, y) = x\,(1 - x^2 - y^2),$$

we see that its zero set consists of the line $x = 0$ and the unit circle $x^2 + y^2 = 1$. In Figure 5, $f > 0$ in the grey region, and $f < 0$ in the white area. Consider the right half of the unit disc. The function is positive in the interior, and zero on the boundary of this region. Just as in the “fishy example” of §2.3, the maximum of the function on this closed and bounded region must be attained at one or more interior points. According to our computation $f$ only has one critical point in the interior of the right half of the unit disc, and therefore this point must be a local maximum of the function. Conclusion: $D = (\tfrac13\sqrt3, 0)$ is a local maximum.

In the same spirit you can argue that $f$ has a local minimum at $C$. The other two points $A$, $B$ are neither local maxima nor minima, since arbitrarily close to $A$ or $B$ there are both points $(x, y)$ with $f(x, y)$ positive, and points with $f(x, y)$ negative. The points $A$ and $B$ turn out to be “saddle points” (see §9 on the second derivative test).

5. When there are more than two variables

The whole discussion so far has been about functions of two variables. Fortunately, not much changes when you have more variables. The concepts local minimum and local maximum are defined in the same way, and it turns out that any interior local maximum or minimum must be a critical point of the function. Here, by definition, a critical point of a function $w = f(x_1, \dots, x_n)$ of $n$ variables is a solution of the equations
$$\frac{\partial f}{\partial x_1}(x_1, \dots, x_n) = 0, \quad \frac{\partial f}{\partial x_2}(x_1, \dots, x_n) = 0, \quad \dots, \quad \frac{\partial f}{\partial x_n}(x_1, \dots, x_n) = 0.$$


Observe that there are $n$ equations and also $n$ unknowns $x_1, \dots, x_n$, so that we should in principle be able to solve these equations. In practice the system of equations we get can be very easy, difficult, or simply impossible to solve exactly.
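When the equations cannot be solved by hand, one option is to solve them numerically. The sketch below applies Newton's method to the system $\partial f/\partial x_i = 0$, $i = 1, \dots, n$, approximating the first and second partial derivatives by central differences. It is plain Python; the function names, step sizes, and stopping rule are our own assumptions, not part of these notes. Newton's method finds critical points of any kind (not only minima), and from a poor starting point it may converge to a different critical point or not converge at all.

```python
def gradient(f, p, h=1e-5):
    """Central-difference approximation of (f_{x_1}, ..., f_{x_n}) at p."""
    grad = []
    for i in range(len(p)):
        up, down = list(p), list(p)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def hessian(f, p, h=1e-4):
    """Central-difference approximation of the matrix of second partials f_{x_i x_j}."""
    n = len(p)

    def shifted(i, j, si, sj):
        q = list(p)
        q[i] += si * h
        q[j] += sj * h
        return f(q)

    return [[(shifted(i, j, 1, 1) - shifted(i, j, 1, -1)
              - shifted(i, j, -1, 1) + shifted(i, j, -1, -1)) / (4 * h * h)
             for j in range(n)] for i in range(n)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A assumed invertible)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def newton_critical_point(f, p, steps=50, tol=1e-8):
    """Newton iteration p <- p - H(p)^{-1} grad f(p) for the system grad f = 0."""
    p = list(p)
    for _ in range(steps):
        grad = gradient(f, p)
        if max(abs(g) for g in grad) < tol:
            break
        step = solve(hessian(f, p), [-g for g in grad])
        p = [pi + si for pi, si in zip(p, step)]
    return p


# Example: f(x, y, z) = x - x^3 - x y^2 + z^2 has a critical point at (sqrt(3)/3, 0, 0).
def f(p):
    x, y, z = p
    return x - x**3 - x * y**2 + z**2


print(newton_critical_point(f, [0.5, 0.1, 0.3]))   # close to [0.57735, 0.0, 0.0]
```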


6. Problems

1. Find all critical points of the following functions. Try to classify them into local/global maxima/minima, saddles, or other kinds of critical points. (Write clear solutions. You will need your solutions later in problem 10.5.)

(a) $f(x, y) = x^2 + 4y^2 - 2x + 8y - 1$ •

(b) $f(x, y) = x^2 - y^2 + 6x - 10y + 2$ •

(c) $f(x, y) = x^2 + 4xy + y^2 - 6y + 1$ •

(d) $f(x, y) = x^2 - xy + 2y^2 - 5x + 6y - 9$ •

(e) $f(x, y) = y^2 - 18x^2 + x^4$ •

(f) $f(x, y) = y^4 - 4y^2 - 18x^2 + x^4$ •

(g) $f(x, y) = 9 + 4x - y - 2x^2 - 3y^2$ •

(h) $f(x, y) = xy(4 - x - 2y)$ •

(i) $f(x, y) = x(x - y)(x - 1)$ •

(j) $f(x, y) = (x - y)(xy - 4)$ •

(k) $f(x, y) = y^2 + \cos x$

(l) $f(x, y) = x^2y - \tfrac13 y^3$ •

(m) $f(x, y) = (x - y^2)(x - 1)$ •

(n) $f(x, y) = (x - y)(xy - 4)$ •

(o) $f(x, y) = x^2$ •

(p) $f(x, y) = x^2y$ •

(q) $f(x, y) = (1 - x^2 - y^2)^2$ •

(r) $f(x, y) = x^2y$ •

2. (a) Draw the zero set of the function $f(x, y) = \sin(x)\sin(y)$.

(b) Where is the function $f$ positive? Find as many critical points as you can without computing $f_x$ or $f_y$.

(c) Find all critical points of $f(x, y)$. Which are local minima or local maxima?

3. Find the critical points of the function
$$f(x, y, z) = x^2 + y^2 + z^2 - 2x + 4y - 2.$$

4. Draw the zero set and find the critical points of the functions
$$f(x, y, z) = x^2 + y^2 - z^2 \quad\text{and}\quad g(x, y, z) = x^2 - y^2 - z^2.$$

5. If we have three points $A$, $B$, and $C$ in the plane, which point is closest to all three of them? The answer depends on what we mean by “closest to all three points.” The following problem gives us one interpretation of this general question.

Consider the three points $(1, 4)$, $(5, 2)$, and $(3, -2)$ in the plane. The function
$$f(x, y) = (x - 1)^2 + (y - 4)^2 + (x - 5)^2 + (y - 2)^2 + (x - 3)^2 + (y + 2)^2$$
is the sum of the squares of the distances from the point $(x, y)$ to the three points.

(a) Assuming that there is a global minimum, find $x$ and $y$ so that $f(x, y)$ is minimized. •

(b) (For discussion) Does $f(x, y)$ have a global minimum? How can we be sure that the point we found in part (a) is not actually a maximum or some other critical point?

(c) Given the three points $(a, b)$, $(c, d)$, and $(e, f)$, let $f(x, y)$ be the sum of the squares of the distances from the point $(x, y)$ to the three points. Find $x$ and $y$ so that this quantity is minimized. •

6. Suppose that a function $f(x, y)$ factors, i.e. we can write it as the product of two other differentiable functions, $f(x, y) = g(x, y)h(x, y)$.

Prove: if a point $(a, b)$ lies in the zero set of $g$ and also in the zero set of $h$, then $(a, b)$ is a critical point of $f$.

Hint: compute the partial derivatives of $f$ by applying the product rule to $f = g \cdot h$. •

7. Find the critical points of the functions

(a) $f(x, y, z) = x^2 + y^2 + z^2 - 2x + 4y - 2$

(b) $f(x, y, z) = x^4 + y^2 + z^2 - 2xz + 4y$

(c) $f(x, y, z) = xyz\,e^{-x-y-z}$

(d) $f(x, y, z) = x^2 + y^2 + z^2 - 2xyz$


7. A Minimization Problem: Linear Regression

Suppose we are measuring two quantities $x$ and $y$ in some experiment, and suppose that we expect that there is a linear relation of the form $y = ax + b$ between $x$ and $y$. If we have a set of data points $(x_k, y_k)$ from our experiment, then what do they tell us about $a$ and $b$? Which choice of coefficients $a$ and $b$ best fits our data? Because of experimental errors we would not expect our data points to lie on a straight line; instead, we expect them to be clustered around a straight line. We could plot the data points, get a ruler, and draw by hand a straight line that looks like the best match, and then measure $a$, $b$ from our drawing. A more systematic approach is to first define what we mean by “best match” and then find the line that best matches according to our chosen criterion.

A very common criterion is the least-mean-square fit. To describe it, imagine we have $N$ data points $(x_1, y_1), \dots, (x_N, y_N)$, and consider the line with coefficients $a$ and $b$. Most data points $(x_k, y_k)$ will then probably not lie on the line $y = ax + b$, and one uses
$$E_k = \tfrac12\bigl(ax_k + b - y_k\bigr)^2$$
as a measure for the mismatch between the data point $(x_k, y_k)$ and the line $y = ax + b$ (the factor $\tfrac12$ makes formulas later on nicer). Adding all these errors we get the total “mean square” error
$$E = E_1 + \dots + E_N.$$

If we think of all the numbers $x_1, \dots, x_N, y_1, \dots, y_N$ as given constants (after all, we measured them, so we shouldn’t change them any more¹), then the total error only depends on the coefficients $a$ and $b$. It is a measure for how well the line $y = ax + b$ fits our data points, and the common method of linear regression consists in choosing the coefficients $a$ and $b$ so as to minimize this error $E$.

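[Figure 6: data points $(x_k, y_k)$ scattered around the line $y = ax + b$, with the mismatch $|ax_k + b - y_k|$ marked for one point.]

Figure 6. Which line best fits a set of data points?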

This leads us to the problem of finding the critical points of the total error $E$ as a function of $a$ and $b$. We have to solve
$$\frac{\partial E}{\partial a} = 0, \qquad \frac{\partial E}{\partial b} = 0.$$

¹ This is called the “Sushi Principle”: raw data is better than cooked data.


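The total error is the sum of the individual errors $E_k(a, b)$, so we get
$$\frac{\partial E}{\partial a} = \frac{\partial E_1}{\partial a} + \dots + \frac{\partial E_N}{\partial a}, \qquad \frac{\partial E}{\partial b} = \frac{\partial E_1}{\partial b} + \dots + \frac{\partial E_N}{\partial b}.$$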

The individual errors have the following derivatives:
$$\frac{\partial E_k}{\partial a} = x_k\bigl(ax_k + b - y_k\bigr), \qquad \frac{\partial E_k}{\partial b} = ax_k + b - y_k.$$

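Adding all these derivatives then leads to
$$\frac{\partial E}{\partial a} = \sum x_k\bigl(ax_k + b - y_k\bigr) = \Bigl(\sum x_k^2\Bigr)a + \Bigl(\sum x_k\Bigr)b - \sum x_k y_k$$
and
$$\frac{\partial E}{\partial b} = \sum \bigl(ax_k + b - y_k\bigr) = \Bigl(\sum x_k\Bigr)a + Nb - \sum y_k.$$
Here “$\sum$” represents summation over $k = 1, \dots, N$, i.e.
$$\sum x_k y_k = x_1 y_1 + \dots + x_N y_N,$$
etc. If $(a, b)$ is a critical point then $a$ and $b$ must satisfy
$$\begin{aligned} \Bigl(\sum x_k^2\Bigr)a + \Bigl(\sum x_k\Bigr)b &= \sum x_k y_k,\\ \Bigl(\sum x_k\Bigr)a + Nb &= \sum y_k. \end{aligned}$$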

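These are two linear equations for the two unknowns $a$ and $b$. Solving them leads to
$$a = \frac{N\sum x_k y_k - \sum x_k \sum y_k}{N\sum x_k^2 - \bigl(\sum x_k\bigr)^2}, \qquad b = \frac{-\sum x_k \sum x_k y_k + \sum x_k^2 \sum y_k}{N\sum x_k^2 - \bigl(\sum x_k\bigr)^2}.$$
(The denominator $N\sum x_k^2 - (\sum x_k)^2$ is positive as long as not all $x_k$ are equal.)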

These are the standard formulas for the coefficients $a$ and $b$ provided by the method of linear regression. Many calculators, and spreadsheets such as Excel, have these formulas preprogrammed, so we only have to enter the data points $(x_k, y_k)$ and “push the right button” to get $a$ and $b$.
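The two formulas translate directly into code. Below is a minimal Python sketch, together with a helper `total_error` for $E(a, b)$ so that one can check that nudging $a$ or $b$ away from the computed values increases the error. The data points are made up for illustration and the function names are ours; in practice one would normally call a library routine (for example NumPy's `polyfit` with degree 1) instead.

```python
def linear_regression(xs, ys):
    """Coefficients (a, b) of the least-squares line y = a*x + b,
    from the closed-form solution of the two linear equations above."""
    N = len(xs)
    Sx, Sy = sum(xs), sum(ys)
    Sxx = sum(x * x for x in xs)
    Sxy = sum(x * y for x, y in zip(xs, ys))
    denom = N * Sxx - Sx**2        # mathematically zero only when all x_k are equal
    if denom == 0:
        raise ValueError("need at least two different x-values")
    a = (N * Sxy - Sx * Sy) / denom
    b = (-Sx * Sxy + Sxx * Sy) / denom
    return a, b


def total_error(a, b, xs, ys):
    """E(a, b) = sum over k of (1/2)(a x_k + b - y_k)^2."""
    return sum(0.5 * (a * x + b - y) ** 2 for x, y in zip(xs, ys))


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]     # made-up measurements scattered around a line
a, b = linear_regression(xs, ys)
print(a, b)                        # about 1.99 and 1.04
```

Since $E$ is a quadratic function of $(a, b)$, the critical point found this way is the global minimum; the test of nudging $a$ and $b$ is only a sanity check of the arithmetic.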

8. Problems

1. We are given $N$ measurements $x_1, \dots, x_N$ from some experiment, and, inspired by the Linear Regression example, we decide to see which number $a$ “best fits the data.” We define the error (or “measure of misfit”) for each measurement to be
$$E_k(a) = \tfrac12(a - x_k)^2$$
and we look for the number $a$ which minimizes the total error
$$E(a) = E_1(a) + \dots + E_N(a).$$
(a) Is this a problem about several variable calculus, or about one variable calculus? •

(b) Which number $a$ do we find? •

2. We have a series of data points $(x_k, y_k)$, and when we plot them we think we see a convex curve rather than a straight line. In fact it looks like a parabola, and so we set out to find a quadratic function $y = ax^2 + bx + c$ that minimizes the error
$$E(a, b, c) = E_1 + \dots + E_N, \quad\text{with}\quad E_k(a, b, c) = \tfrac12\bigl(ax_k^2 + bx_k + c - y_k\bigr)^2.$$
(a) How many variables are there in this problem? •

(b) If $(a, b, c)$ is a critical point of $E(a, b, c)$, then $a$, $b$, and $c$ satisfy three linear equations. Find these equations (don’t solve them). •

3. A measurement in a certain experiment results in three numbers $(x, y, z)$. The point of the experiment is to see if there is a linear relation of the form $z = ax + by + c$ between the three measured quantities, and to estimate the coefficients $a$, $b$, $c$.

After repeating the experiment $N$ times we have $N$ data points $(x_k, y_k, z_k)$ ($k = 1, \dots, N$). We decide to choose $a$, $b$, $c$ so as to minimize the mean square error
$$E = E_1 + \dots + E_N, \quad\text{with}\quad E_k(a, b, c) = \tfrac12\bigl(ax_k + by_k + c - z_k\bigr)^2.$$
Which (linear) equations will we get for $a$, $b$, and $c$? •

9. The Second Derivative Test

9.1. Review of the one-variable second derivative test and Taylor’s formula. For a function $y = f(x)$ of one variable you can tell if a critical point $a$ is a local maximum or minimum by looking at the sign of the second derivative $f''(a)$ of the function at that point.

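[Figure: a graph curved upwards at a critical point $a$ with $f''(a) > 0$, and a graph curved downwards at a critical point $b$ with $f''(b) < 0$.]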

If $f''(a) > 0$, then the graph of $f$ is curved upwards and $f$ has a local minimum at $a$; if $f''(a) < 0$, then $f$ has a local maximum. This section is about the analogous test for critical points of functions of two variables.

One way to understand the second derivative test is to look at the Taylor expansion of the function $y = f(x)$. If $x = a$ is a critical point for $f$, then
$$f(x) = f(a) + f'(a)(x - a) + \tfrac12 f''(a)(x - a)^2 + \cdots$$
Since $a$ is a critical point of $f$ we have $f'(a) = 0$, so that the Taylor expansion reduces to
$$f(x) = f(a) + \tfrac12 f''(a)(x - a)^2 + \cdots \tag{100}$$
If we ignore the remainder term (the dots), then we find that
$$f(x) \approx f(a) + \tfrac12 f''(a)(x - a)^2.$$

Near the critical point the graph of $y = f(x)$ is approximately a parabola. It is curved upwards if $f''(a) > 0$, and downwards if $f''(a) < 0$.

To apply the same reasoning to a function of two (or more) variables we need to know the Taylor expansion of such a function.

9.2. Taylor’s formula for a function of several variables. The Taylor expansion of a function $z = f(x, y)$ should give us an approximation of $f(a + \Delta x, b + \Delta y)$ in terms involving powers of $\Delta x$ and $\Delta y$. There is a general formula, but here we only need the terms up to second order, so we’ll derive those and stop there.

The trick to finding the Taylor expansion is to consider the function
$$g(t) = f(a + t\Delta x,\, b + t\Delta y). \tag{101}$$
By definition
$$g(1) = f(a + \Delta x,\, b + \Delta y)$$


is the quantity we want to approximate, and $g(0) = f(a, b)$. Since $g(t)$ is a function of one variable, we can apply Taylor’s formula from Math 222 to it. We get:
$$g(t) = g(0) + g'(0)\,t + g''(0)\,\frac{t^2}{2!} + \cdots \tag{102}$$
The dots contain the remainder term, which we will ignore. Now we set $t = 1$, and we get
$$g(1) = g(0) + g'(0) + \tfrac12 g''(0) + \cdots$$

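The derivatives of $g$ can be computed with the chain rule:
$$\begin{aligned} g'(t) &= \frac{d f(a + t\Delta x, b + t\Delta y)}{dt}\\ &= f_x(a + t\Delta x, b + t\Delta y)\,\frac{d(a + t\Delta x)}{dt} + f_y(a + t\Delta x, b + t\Delta y)\,\frac{d(b + t\Delta y)}{dt}\\ &= f_x(a + t\Delta x, b + t\Delta y)\,\Delta x + f_y(a + t\Delta x, b + t\Delta y)\,\Delta y. \end{aligned}\tag{103}$$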

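The second derivative is
$$\begin{aligned} g''(t) = {}& f_{xx}(a + t\Delta x, b + t\Delta y)\,(\Delta x)^2\\ &+ 2f_{xy}(a + t\Delta x, b + t\Delta y)\,\Delta x\,\Delta y\\ &+ f_{yy}(a + t\Delta x, b + t\Delta y)\,(\Delta y)^2. \end{aligned}\tag{104}$$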

In computing $g''(t)$ we run into terms involving $f_{xy}$ and terms with $f_{yx}$. Because of Clairaut’s theorem these are the same, and combining them leads to the coefficient “2” in front of $f_{xy}$ above.

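Setting $t = 0$ in (103) and in (104) gives you expressions for $g'(0)$ and $g''(0)$, and by substituting these in (102) with $t = 1$ we get the second order Taylor expansion of a function of two variables:
$$\begin{aligned} f(a + \Delta x, b + \Delta y) = {}& f(a, b) + f_x(a, b)\,\Delta x + f_y(a, b)\,\Delta y\\ &+ \tfrac12\Bigl\{f_{xx}(a, b)(\Delta x)^2 + 2f_{xy}(a, b)\,\Delta x\,\Delta y + f_{yy}(a, b)(\Delta y)^2\Bigr\} + \cdots \end{aligned}\tag{105}$$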

The first three terms are exactly the linear approximation (60) of the function that we saw in Chapter III, § 4.2. The next terms in (105) are
$$\tfrac12 f_{xx}(a, b)(\Delta x)^2 + f_{xy}(a, b)\,\Delta x\,\Delta y + \tfrac12 f_{yy}(a, b)(\Delta y)^2.$$
These terms determine a quadratic form in the variables $\Delta x$ and $\Delta y$. The quantities $\tfrac12 f_{xx}(a, b)$, etc. are the coefficients of the form.

[Figure 7: the point $(a, b)$, the increments $\Delta x$ and $\Delta y$, and the nearby point $(a + \Delta x, b + \Delta y)$.]

Figure 7. $\Delta x$ and $\Delta y$: Taylor’s formula lets us approximate a function $z = f(x, y)$ at points $(x, y) = (a + \Delta x, b + \Delta y)$ close to $(a, b)$. The expansion gives us $f(x, y) = f(a + \Delta x, b + \Delta y)$ as a function of $\Delta x$ and $\Delta y$.

As always, the dots in the expansion (105) contain the remainder term. By carefully including the one-variable Lagrange remainder in the derivation we can get a formula for the remainder in (105). We will not do that, but it can be shown that the remainder is $o\bigl((\Delta x)^2 + (\Delta y)^2\bigr)$, i.e. that it is small compared to $(\Delta x)^2 + (\Delta y)^2$ when $\Delta x$ and $\Delta y$ are small.

9.3. Example – compute the Taylor expansion of $f(x, y) = \sin 2x \cos y$ at the point $(\tfrac16\pi, \tfrac16\pi)$. To find the expansion we need to compute $f$, $f_x$, $f_y$, $f_{xx}$, $f_{xy}$, and $f_{yy}$ at $(\tfrac16\pi, \tfrac16\pi)$. Here goes:
$$\begin{aligned} f &= \sin 2x\cos y = \tfrac34\\ f_x &= 2\cos 2x\cos y = \tfrac12\sqrt3\\ f_y &= -\sin 2x\sin y = -\tfrac14\sqrt3\\ f_{xx} &= -4\sin 2x\cos y = -3\\ f_{xy} &= -2\cos 2x\sin y = -\tfrac12\\ f_{yy} &= -\sin 2x\cos y = -\tfrac34. \end{aligned}$$

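Substituting in the Taylor expansion we get
$$\begin{aligned} f\bigl(\tfrac16\pi + \Delta x, \tfrac16\pi + \Delta y\bigr) &= \tfrac34 + \tfrac12\sqrt3\,\Delta x - \tfrac14\sqrt3\,\Delta y + \tfrac12\Bigl\{-3(\Delta x)^2 - 2\cdot\tfrac12\,\Delta x\,\Delta y - \tfrac34(\Delta y)^2\Bigr\} + \cdots\\ &= \tfrac34 + \tfrac12\sqrt3\,\Delta x - \tfrac14\sqrt3\,\Delta y - \tfrac32(\Delta x)^2 - \tfrac12\,\Delta x\,\Delta y - \tfrac38(\Delta y)^2 + \cdots \end{aligned}$$
Note that the first three terms in the expansion are the linear approximation of the function:
$$f\bigl(\tfrac16\pi + \Delta x, \tfrac16\pi + \Delta y\bigr) = \tfrac34 + \tfrac12\sqrt3\,\Delta x - \tfrac14\sqrt3\,\Delta y + \cdots$$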

9.4. Another example – the Taylor expansion of $f(x, y) = x^3 + y^3 - 3xy$ at the point $(1, 1)$. The function $f(x, y) = x^3 + y^3 - 3xy$ has the following derivatives at $(1, 1)$:
$$\begin{aligned} f &= x^3 + y^3 - 3xy = -1\\ f_x &= 3x^2 - 3y = 0\\ f_y &= 3y^2 - 3x = 0\\ f_{xx} &= 6x = 6\\ f_{xy} &= -3\\ f_{yy} &= 6y = 6 \end{aligned}$$
The first derivatives vanish, so $(1, 1)$ is a critical point of $f$. The second order Taylor expansion of $f$ at $(1, 1)$ is
$$f(1 + \Delta x, 1 + \Delta y) = -1 + 3(\Delta x)^2 - 3\,\Delta x\,\Delta y + 3(\Delta y)^2 + \cdots \tag{106}$$
Note that there are no first order terms in this expansion because $(1, 1)$ is a critical point: the coefficients of the first order terms are both zero.

To see what kind of critical point $(1, 1)$ is, we have to analyze the second order, quadratic, terms
$$3(\Delta x)^2 - 3\,\Delta x\,\Delta y + 3(\Delta y)^2. \tag{107}$$
This expression is a quadratic form in $\Delta x$ and $\Delta y$, and by completing the square (see Chapter III, § 3) we find that
$$3(\Delta x)^2 - 3\,\Delta x\,\Delta y + 3(\Delta y)^2 = 3\Bigl[\bigl(\Delta x - \tfrac12\Delta y\bigr)^2 + \tfrac34(\Delta y)^2\Bigr].$$

In particular, the quadratic terms in the Taylor expansion of $f$ at the critical point are always positive, no matter what $\Delta x$ and $\Delta y$ we choose (as long as they are not both zero). If we are allowed to ignore the remainder term (the “$\cdots$”), then this implies that the function has a local minimum: after all, the Taylor expansion (106) says that for small $\Delta x$ and $\Delta y$ the function value $f(1 + \Delta x, 1 + \Delta y)$ is
$$f(1 + \Delta x, 1 + \Delta y) \approx f(1, 1) + 3\bigl(\Delta x - \tfrac12\Delta y\bigr)^2 + \tfrac94(\Delta y)^2.$$
The second order terms are never negative, so the Taylor expansion tells us that
$$f(1 + \Delta x, 1 + \Delta y) \ge f(1, 1),$$
at least for small $\Delta x$ and $\Delta y$. The function therefore has a local minimum at $(1, 1)$.

9.5. Example of a saddle point. The same function $f(x, y) = x^3 + y^3 - 3xy$ has another critical point, namely the origin. By calculating the derivatives at $(0, 0)$ we find that the Taylor expansion at the origin is
$$f(\Delta x, \Delta y) = -3\,\Delta x\,\Delta y + \cdots \tag{108}$$
Ignoring the remainder terms we see that near the origin $f(\Delta x, \Delta y) \approx -3\,\Delta x\,\Delta y$, which suggests that $f$ is negative when $\Delta x$ and $\Delta y$ are both positive, or when they are both negative, while $f$ is positive when $\Delta x$ and $\Delta y$ have opposite signs.

Arbitrarily close to the origin the function $f$ therefore has both positive and negative values, and since $f(0, 0) = 0$, $f$ has neither a local maximum nor a local minimum at the origin. In fact the Taylor expansion (108) suggests that the graph of $f$ should look like that of the “saddle function” $z = xy$.

9.6. The two-variable second derivative test. The last two examples essentially show us how the second derivative test for functions of two variables works. To explain how it works in general, let’s suppose a function $f$ has a critical point at $(a, b)$. Then the first partial derivatives of $f$ vanish at $(a, b)$, and hence the Taylor expansion has no first order terms. We get
$$f(a + \Delta x, b + \Delta y) = f(a, b) + \tfrac12\Bigl\{f_{xx}(a, b)(\Delta x)^2 + 2f_{xy}(a, b)\,\Delta x\,\Delta y + f_{yy}(a, b)(\Delta y)^2\Bigr\} + \cdots \tag{109}$$

This is the two-variable analog of equation (100). To see if $(a, b)$ is a local maximum or minimum (or something else), we have to see if the quadratic terms in (109) are always negative, always positive, or if they can have either sign, depending on the choice of $\Delta x$, $\Delta y$.

The precise statement of the second derivative test uses the terminology introduced in Chapter I, §3 and Figure 5 in that chapter.

Theorem (second derivative test). If $(a, b)$ is a critical point of $f(x, y)$, and if
$$Q(\Delta x, \Delta y) = \tfrac12\Bigl\{f_{xx}(a, b)(\Delta x)^2 + 2f_{xy}(a, b)\,\Delta x\,\Delta y + f_{yy}(a, b)(\Delta y)^2\Bigr\}$$
is the quadratic part of the Taylor expansion of $f$ at the critical point, then

▸ if $Q$ is positive definite, then $(a, b)$ is a local minimum of $f$;

▸ if $Q$ is negative definite, then $(a, b)$ is a local maximum of $f$;

▸ if $Q$ is indefinite, then $(a, b)$ is a saddle point of $f$;

▸ if $Q$ is degenerate (semidefinite but not definite), the second derivative test is inconclusive.


When the form $Q$ is indefinite, so that it can be factored as
$$Q(\Delta x, \Delta y) = (k\,\Delta x + l\,\Delta y)(m\,\Delta x + n\,\Delta y),$$
then near $(a, b)$ the level set of the function $f$ containing the critical point $(a, b)$ consists of two curves. One of these curves is tangent to the line
$$k\,\Delta x + l\,\Delta y = 0, \quad\text{i.e.}\quad k(x - a) + l(y - b) = 0,$$
while the other is tangent to
$$m\,\Delta x + n\,\Delta y = 0, \quad\text{i.e.}\quad m(x - a) + n(y - b) = 0.$$

9.7. Example – apply the second derivative test to the fishy example. In § 2.3 and § 4.4 we had found that the function $f(x, y) = x^2 - x^3 - y^2$ has two critical points, one at the origin, and one at the point $(\tfrac23, 0)$. By carefully looking at the zero set of the function we discovered that the origin is neither a local maximum nor a local minimum, and that the point $(\tfrac23, 0)$ is a local maximum. The second derivative test provides a more systematic way of reaching these conclusions. To apply the test we need to know the second derivatives of $f$ at the critical points. They are:

$$\begin{array}{c|ccc} (x, y) & f_{xx}(x, y) & f_{xy}(x, y) & f_{yy}(x, y)\\ \hline (x, y) & 2 - 6x & 0 & -2\\ (0, 0) & 2 & 0 & -2\\ (\tfrac23, 0) & -2 & 0 & -2 \end{array}$$

Therefore the second order Taylor expansion of $f$ at the origin is
$$\begin{aligned} f(\Delta x, \Delta y) &= f(0, 0) + \tfrac12\Bigl\{2\cdot(\Delta x)^2 + 2\cdot 0\cdot\Delta x\,\Delta y + (-2)(\Delta y)^2\Bigr\} + \cdots\\ &= (\Delta x)^2 - (\Delta y)^2 + \cdots = (\Delta x - \Delta y)(\Delta x + \Delta y) + \cdots \end{aligned}$$
The quadratic part of the Taylor expansion can be factored, so this is the “indefinite” case: it can be both positive and negative, depending on our choice of $\Delta x$ and $\Delta y$. The second derivative test implies that the origin is a saddle point. It also says that the zero set of $f$ near the origin consists of two curves, whose tangents at the origin are given by the two equations
$$\Delta x - \Delta y = 0 \quad\text{and}\quad \Delta x + \Delta y = 0. \tag{110}$$


In this case the point (a, b) is the origin, so ∆x = x − a = x and ∆y = y − b = y, andthe two tangents are the lines y = ±x.

The second order Taylor expansion at the other critical point $(\tfrac23, 0)$ is given by
$$f(\tfrac23 + \Delta x, \Delta y) = f(\tfrac23, 0) - (\Delta x)^2 - (\Delta y)^2 + \cdots \tag{111}$$
This time we see that the second order terms of the Taylor expansion are negative definite. The second derivative test therefore says that we have a local maximum at $(\tfrac23, 0)$.

10. Problems

1. [For discussion] Are the $\Delta x$ in § 9.4 and § 9.5 the same?

Are the $\Delta x$ in the equations (110) and in (111) of the second derivative test example the same? Explain what they stand for. •

2. Compute the second order Taylor expansion of the following functions at the indicated points:

[In this problem you are asked to find Taylor expansions of functions at various points. Since these points are not necessarily critical points, the expansions you find will generally have first and second order terms. In the expansions you will compute when you use the second derivative test later on, there will be no first order terms.]

(a) $f(x, y) = (1 - x + xy)^2$ at $(0, 0)$ •

(b) $f(x, y) = (1 - x + xy)^2$ at $(1, 1)$ •

(c) $f(x, y) = e^{x - y^2}$ at $(0, 0)$ •

(d) $f(x, y) = e^{x - y^2}$ at $(1, 1)$ •

(e) $f(x, y) = \dfrac{x}{1 - y}$ at $(0, 0)$

(f) $f(x, y) = \dfrac{x}{1 + y}$ at $(1, 0)$

3. Factor, or complete the square in, the following quadratic forms, draw their zero sets, and determine if they are positive definite, negative definite, indefinite or degenerate.

(a) $Q(x, y) = x^2 + 3xy + y^2$

(b) $Q(x, y) = x^2 + xy + y^2$

(c) $Q(x, y) = 2x^2 + 3xy - 4y^2$

(d) $Q(x, y) = 2x^2 + 3xy - 5y^2$

(e) $Q(\Delta x, \Delta y) = (\Delta x)^2 + (\Delta y)^2$

(f) $Q(\Delta x, \Delta y) = (\Delta x)^2 - 3(\Delta y)^2$

(g) $Q(\Delta x, \Delta y) = \Delta x\,\Delta y$

(h) $Q(\Delta x, \Delta y) = \Delta x\,\Delta y - 2(\Delta y)^2$

4. If $a$ is a constant, then for which values of $a$ is the form $Q(x, y) = x^2 + 2axy + y^2$ positive/negative definite, indefinite, or degenerate? •

5. Find all critical points of the following functions (you did many of these in problem 6.1). Apply the second derivative test to all critical points you find. •

(a) $f(x, y) = x^2 + 4y^2 - 2x + 8y - 1$

(b) $f(x, y) = x^2 - y^2 + 6x - 10y + 2$

(c) $f(x, y) = x^2 + 4xy + y^2 - 6y + 1$

(d) $f(x, y) = x^2 - xy + 2y^2 - 5x + 6y - 9$

(e) $f(x, y) = y^2 - 18x^2 + x^4$

(f) $f(x, y) = y^4 - 4y^2 - 18x^2 + x^4$

(g) $f(x, y) = 9 + 4x - y - 2x^2 - 3y^2$

(h) $f(x, y) = xy(4 - x - 2y)$

(i) $f(x, y) = x(x - y)(x - 1)$

(j) $f(x, y) = (x - y)(xy - 4)$

(k) $f(x, y) = y^2 + \cos x$

(l) $f(x, y) = x^2y - \tfrac13 y^3$

(m) $f(x, y) = (x - y^2)(x - 1)$

(n) $f(x, y) = (x - y)(xy - 4)$

(o) $f(x, y) = x^2$

(p) $f(x, y) = x^2 - y^4$

(q) $f(x, y) = x^2 + y^4$

(r) $f(x, y) = x^2y$

6. (a) Draw the zero set of the function $f(x, y) = \sin(x)\sin(y)$.

(b) Where is the function $f$ positive? Find as many critical points as you can without computing $f_x$ or $f_y$.

(c) Find all critical points of $f(x, y)$. Which are local minima or local maxima?


7. Find all critical points of the following functions, and apply the second derivative test to the points you find.

(a) $f(x, y) = x^2 + y^2 - \tfrac12 xy^2$ •

(b) $f(x, y) = x^2 + y^2 - x^2y^2$

(c) $f(x, y) = x + 2y - xy^2$ •

(d) $f(x, y) = 8x^4 + y^4 - xy^2$

8. Suppose that $f(x, y) = x^2 + y^2 + kxy$. Find and classify the critical points, and discuss how they change when $k$ takes on different values.

9. Consider the function

$$f(x, y) = x^3 - 3xy^2.$$

The graph of this function is known as the “Monkey Saddle.”

(a) Show that $(0, 0)$ is the only critical point of $f$.

(b) Show that the second derivative test is inconclusive for $f$.

(c) Draw the zero set of $f$, and indicate where $f > 0$ and where $f < 0$.

(d) What kind of critical point is $(0, 0)$?

10. Consider the function

$$f(x, y) = x^3 - x^2y.$$

(a) Draw the zero set of $f$ and indicate where $f(x, y)$ is positive, and where $f(x, y)$ is negative.

(b) Find all the critical points of the function.

(c) Does the second derivative test apply to any of the critical points of $f$?

(d) Use the sign diagram you made in part (a) to decide which critical points are local maxima or minima.
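For problems 3 and 4 it can help to check hand computations against the criterion that completing the square produces. For $Q(x, y) = ax^2 + bxy + cy^2$ with $a \neq 0$ one has $Q = a\big(x + \tfrac{b}{2a}y\big)^2 + \tfrac{4ac - b^2}{4a}y^2$, so the sign of $4ac - b^2$ decides the type. The Python sketch below encodes that criterion, assuming that “degenerate” means $4ac - b^2 = 0$; the function name `classify_quadratic_form` is hypothetical, and the example forms are deliberately not the ones from the problems.

```python
def classify_quadratic_form(a, b, c):
    """Classify Q(x, y) = a*x**2 + b*x*y + c*y**2.

    For a != 0, completing the square gives
        Q = a*(x + b/(2*a)*y)**2 + (4*a*c - b**2)/(4*a) * y**2,
    so the type is decided by the sign of 4ac - b^2.
    """
    disc = 4 * a * c - b * b
    if disc > 0:
        # then a*c > 0: a and c are nonzero and have the same sign
        return "positive definite" if a > 0 else "negative definite"
    if disc < 0:
        # Q takes both signs (this also covers the case a = 0, b != 0)
        return "indefinite"
    # 4ac = b^2: Q is a constant times a perfect square, or identically zero
    return "degenerate"


print(classify_quadratic_form(2, 1, 3))  # positive definite
print(classify_quadratic_form(1, 2, 1))  # degenerate, since Q = (x + y)^2
```

The criterion only replaces the final sign check; drawing the zero sets, as the problems ask, still requires the factored or completed-square form.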

11. Second derivative test for more than two variables

The ideas that lead to the second derivative test for functions of two variables also work when we have a function with more variables. However, the second derivative test for functions of more than two variables is beyond the scope of Math 234, and this short section tries to explain why.

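11.1. The second order Taylor expansion. If $z = f(x_1, x_2, \dots, x_n)$ is a function of $n$ variables, then its Taylor expansion of order two at some point $(a_1, a_2, \dots, a_n)$ turns out to be

$$
\begin{aligned}
f(a_1 + \Delta x_1, \dots, a_n + \Delta x_n) = {}& f(a_1, \dots, a_n) + f_{x_1}\Delta x_1 + \cdots + f_{x_n}\Delta x_n \\
&+ \frac12 \Big\{ f_{x_1x_1}(\Delta x_1)^2 + \cdots + f_{x_1x_n}\Delta x_1\Delta x_n \\
&\qquad + f_{x_2x_1}\Delta x_2\Delta x_1 + \cdots + f_{x_2x_n}\Delta x_2\Delta x_n \\
&\qquad \;\vdots \\
&\qquad + f_{x_nx_1}\Delta x_n\Delta x_1 + \cdots + f_{x_nx_n}(\Delta x_n)^2 \Big\} + \cdots
\end{aligned}
$$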

where the partial derivatives $f_{x_i}$ and $f_{x_ix_j}$ are to be evaluated at the point $(a_1, \dots, a_n)$. The same trick involving the function “$g(t)$” that was used in §9.2 to derive the two-variable Taylor expansion works without modification.
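The expansion can be evaluated numerically for any $n$. The Python sketch below is only an illustration: it approximates the partial derivatives $f_{x_i}$ and $f_{x_ix_j}$ by central finite differences (an assumption of this sketch, with an arbitrarily chosen step `h`) and then adds up the constant, linear and quadratic terms exactly as in the formula above. The name `taylor2` is hypothetical.

```python
import math


def taylor2(f, a, dx, h=1e-3):
    """Second order Taylor polynomial of f (a function of a list of n numbers)
    at the point a, evaluated at a + dx.

    f_{x_i} and f_{x_i x_j} are approximated by central finite differences.
    """
    n = len(a)

    def shifted(p, i, s):
        # copy of the point p with s added to coordinate i
        q = list(p)
        q[i] += s
        return q

    # first order terms f_{x_1} dx_1 + ... + f_{x_n} dx_n
    linear = 0.0
    for i in range(n):
        fxi = (f(shifted(a, i, h)) - f(shifted(a, i, -h))) / (2 * h)
        linear += fxi * dx[i]

    # second order terms: all n^2 products f_{x_i x_j} dx_i dx_j
    quadratic = 0.0
    for i in range(n):
        for j in range(n):
            fxixj = (f(shifted(shifted(a, i, h), j, h))
                     - f(shifted(shifted(a, i, h), j, -h))
                     - f(shifted(shifted(a, i, -h), j, h))
                     + f(shifted(shifted(a, i, -h), j, -h))) / (4 * h * h)
            quadratic += fxixj * dx[i] * dx[j]

    return f(a) + linear + 0.5 * quadratic


# example with n = 3: f = exp(x1 + 2 x2 + x3) at the origin
g = lambda p: math.exp(p[0] + 2 * p[1] + p[2])
approx = taylor2(g, [0.0, 0.0, 0.0], [0.01, 0.02, -0.01])
exact = g([0.01, 0.02, -0.01])
print(approx, exact)
```

The double loop makes the $n^2$ second order terms explicit; at a critical point the linear part vanishes and only this quadratic part remains.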

12. OPTIMIZATION WITH CONSTRAINTS AND THE METHOD OF LAGRANGE MULTIPLIERS 101

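If $(a_1, \dots, a_n)$ is a critical point then $f_{x_1} = f_{x_2} = \cdots = f_{x_n} = 0$, so the linear terms are absent, and the function is described by the quadratic terms of the Taylor expansion

$$
\begin{aligned}
f(a_1 + \Delta x_1, \dots, a_n + \Delta x_n) = {}& f(a_1, \dots, a_n) + \frac12 \Big\{ f_{x_1x_1}(\Delta x_1)^2 + \cdots + f_{x_1x_n}\Delta x_1\Delta x_n \\
&\qquad + f_{x_2x_1}\Delta x_2\Delta x_1 + \cdots + f_{x_2x_n}\Delta x_2\Delta x_n \\
&\qquad \;\vdots \\
&\qquad + f_{x_nx_1}\Delta x_n\Delta x_1 + \cdots + f_{x_nx_n}(\Delta x_n)^2 \Big\} + \cdots
\end{aligned}
$$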

Just as in the two-variable case we could now try to see if the quadratic terms are positive definite or negative definite by completing squares. The procedure is, however, much more complicated, and is best understood in terms of “eigenvalues of matrices,” a subject which is explained in courses on linear algebra or matrix algebra (Math 320, 340, or 341). Therefore, we will only use the second derivative test for functions of two variables in this course.

12. Optimization with constraints and the method of Lagrange multipliers

In many optimization problems we want to find the maximal or minimal value of a function $f(x, y)$ where $(x, y)$ can be any point satisfying a certain constraint

$$g(x, y) = C. \tag{112}$$

Thus the domain $D$ of the function we want to minimize consists of all points $(x, y)$ that satisfy the equation $g(x, y) = C$: it is a level set of $g$.

12.1. Solution by elimination or parametrization. One approach to minimization problems with a constraint is to “eliminate one variable.” If we are asked to find the minimal value that $f(x, y)$ can have if $(x, y)$ must satisfy the constraint $g(x, y) = C$, then we first try to solve the constraint equation for one of the variables, say, for $y$:

$$g(x, y) = C \iff y = h(x).$$

Now the only $(x, y)$ that we have to consider are points of the form $(x, h(x))$, so the old minimization problem is equivalent to a new problem: find the minimal value of $F(x) = f(x, h(x))$, where there are no constraints on $x$. This new problem is a one variable problem of the kind we learned to solve in Math 221.

12.2. Example – which rectangle with perimeter 1 has the largest area? This is another problem, like finding the tangent to the parabola $y = x^2$, that appears in almost every first semester calculus course. We recall its solution.

If the sides of the rectangle are $x$ and $y$, then its area is $xy$ and its perimeter is $2(x + y)$. Hence the function we want to maximize is $f(x, y) = xy$ and the constraint is

$$g(x, y) = 2(x + y) = 1.$$

Solving the constraint for $y$ tells you that $y = \tfrac12 - x$, so we want to maximize the function $F(x) = f(x, \tfrac12 - x) = x(\tfrac12 - x)$. The only remaining constraint is that $x$ cannot be negative, and that $y = \tfrac12 - x$ also cannot be negative. Thus we want to maximize $F(x) = x(\tfrac12 - x)$ over all $x$ in the interval $0 \le x \le \tfrac12$.
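The remaining step is a Math 221 problem: $F'(x) = \tfrac12 - 2x$ vanishes only at $x = \tfrac14$, and the endpoints $x = 0$ and $x = \tfrac12$ give zero area. A small Python check of this step, using exact fractions for the three candidates and a brute-force grid as a sanity check (the grid size is an arbitrary choice):

```python
from fractions import Fraction


def F(x):
    # area xy after eliminating y = 1/2 - x with the perimeter constraint
    return x * (Fraction(1, 2) - x)


# candidates: the critical point of F (F'(x) = 1/2 - 2x = 0) and the endpoints
candidates = [Fraction(0), Fraction(1, 4), Fraction(1, 2)]
best_x = max(candidates, key=F)
best_y = Fraction(1, 2) - best_x
print(best_x, best_y, F(best_x))  # 1/4 1/4 1/16

# sanity check: evaluate the area on a fine grid of 0 <= x <= 1/2
grid = [i / 2000 for i in range(1001)]
grid_best = max(grid, key=lambda x: x * (0.5 - x))
```

So the rectangle of largest area is the square with sides $\tfrac14$.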


12.3. Example – maximize $x + 2y$ over the unit circle. We are asked to find the maximal value of $f(x, y) = x + 2y$ where $(x, y)$ is allowed to be any point that satisfies the constraint $g(x, y) = x^2 + y^2 = 1$. If we try to solve for $y$ we find that there are two solutions, $y = \pm\sqrt{1 - x^2}$, and so the “function” $F(x) = x + 2y = x \pm 2\sqrt{1 - x^2}$ is not really a function at all. In this case we can still solve the problem by noting that any point on the unit circle can be written as $(x, y) = (\cos\theta, \sin\theta)$ for some angle $\theta$, and thus we have to maximize the function

$$F(\theta) = f(\cos\theta, \sin\theta) = \cos\theta + 2\sin\theta.$$

Here there are no constraints on $\theta$, and we again have a first semester calculus problem.
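For this one-variable problem, $F'(\theta) = -\sin\theta + 2\cos\theta$ vanishes when $\tan\theta = 2$. The Python sketch below takes the solution in the first quadrant, $\theta = \operatorname{atan2}(2, 1)$, checks that it gives the value $\sqrt5$, and compares with a brute-force search over one period (the number of samples is an arbitrary choice).

```python
import math


def F(theta):
    # f(x, y) = x + 2y on the unit circle, parametrized as (cos t, sin t)
    return math.cos(theta) + 2 * math.sin(theta)


# F'(theta) = -sin(theta) + 2 cos(theta) = 0  <=>  tan(theta) = 2
theta_star = math.atan2(2, 1)
x_star, y_star = math.cos(theta_star), math.sin(theta_star)  # (1/sqrt5, 2/sqrt5)

# brute force over one full period as a sanity check
n = 100_000
brute_max = max(F(2 * math.pi * k / n) for k in range(n))

print(x_star, y_star, F(theta_star), math.sqrt(5))
```

The other solution of $\tan\theta = 2$, namely $\theta^* + \pi$, gives the minimum value $-\sqrt5$.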

12.4. Solution by Lagrange multipliers. In both examples above we were lucky because we could either solve the constraint equation or we could parametrize all possible points that satisfy the constraint. There is a method due to Joseph-Louis Lagrange (known from the remainder term in Taylor’s formula) that does not require this kind of luck. His method is based on the following observation (see Figure 8).

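[Figure 8: level curves $f = 0.65, 0.66, 0.67, 0.68, 0.69$ of $f$, the constraint curve $g(x, y) = C$, and the gradients $\vec\nabla f$, $\vec\nabla g$ at two points $A$ and $B$ on the constraint curve.]

Figure 8. Lagrange multipliers: if, at some point like $B$ on the constraint set, the gradients of $f$ and $g$ are not parallel, then we can increase $f$ by moving along the constraint set in the direction of the component of $\vec\nabla f$ tangent to the constraint set. At a point (such as $A$) where the function $f$ reaches a maximum, the gradients $\vec\nabla f$ and $\vec\nabla g$ must be parallel.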

Let $B = (x, y)$ be a point on the constraint set as in the figure. Assume that $\vec\nabla g \neq \vec 0$ at $B$; then near $B$ the Implicit Function Theorem says that the constraint set $g(x, y) = C$ is a curve, and that its tangent is perpendicular to $\vec\nabla g(B)$. If $\vec\nabla f(B)$ is not perpendicular to the constraint set at $B$, then it provides us a direction along the constraint set in which $f$ will increase (see Figure 8). Therefore $f$ does not have a maximum at $B$. It follows that at a maximum of $f$ on the constraint set $g(x, y) = C$ the gradient of $f$ must be perpendicular to the constraint set, and hence it must be parallel to the gradient of $g$ there. Since two vectors are parallel exactly when one is a multiple of the other, and $\vec\nabla g \neq \vec 0$, we have found the following fact.


12.5. Theorem (Lagrange multipliers). If the function $z = f(x, y)$ attains its largest value among all points that satisfy the constraint $g(x, y) = C$ at the point $(a, b)$, and if

$$\vec\nabla g(a, b) \neq \vec 0, \tag{113}$$

then the point $(a, b)$ satisfies the Lagrange Multiplier equation

$$\vec\nabla f(a, b) = \lambda\,\vec\nabla g(a, b) \tag{114}$$

for some number $\lambda$. The number $\lambda$ is called the Lagrange multiplier, and it is one of the unknowns in the equations we must solve when we use Lagrange’s method.

12.6. Example. We again try to find the largest rectangle with perimeter 1, as in example 12.2.

The problem is to maximize $f(x, y) = xy$ with constraint $g(x, y) = 2x + 2y = 1$. We compute the gradients

$$\vec\nabla f = \begin{pmatrix} y \\ x \end{pmatrix}, \qquad \vec\nabla g = \begin{pmatrix} 2 \\ 2 \end{pmatrix}.$$

The gradient of $g$ never vanishes, i.e. $\vec\nabla g(x, y) \neq \vec 0$ for all $(x, y)$, so Lagrange tells us that at any minimum or maximum the following equations hold:

$$
\begin{aligned}
f_x &= \lambda g_x, & &\text{i.e. } y = 2\lambda,\\
f_y &= \lambda g_y, & &\text{i.e. } x = 2\lambda,\\
g(x, y) &= C, & &\text{i.e. } 2x + 2y = 1.
\end{aligned}
$$

The first two equations come from $\vec\nabla f = \lambda\vec\nabla g$, and the last equation is the constraint. We have three equations, and we also have three unknowns: $x$, $y$ and the Lagrange multiplier $\lambda$.

In this case it is easy to solve the equations: the first two say that both $y$ and $x$ equal $2\lambda$, so in particular, they equal each other: $y = x$. This already tells us that the solution is a square! To complete the problem we must still solve for $x, y, \lambda$. Since $x = y$ the constraint implies $4x = 1$, so $x = y = \tfrac14$. Finally, either of the first two equations provides $\lambda = \tfrac12 x = \tfrac12 y = \tfrac18$.

What is the meaning of $\lambda$? In this example you see that we first found the solution $(x, y)$, and then computed $\lambda$. Up to sign, the multiplier $\lambda$ is the ratio between the lengths of the gradients of $f$ and $g$ at the maximum, and it can be interpreted as the rate at which the maximum of $f$ changes if the value $C$ in the constraint $g = C$ is varied; the details go beyond this course (but see Problems 13.15 and 13.16). In any case, the upshot is that, when using Lagrange’s method, you must always also find $\lambda$, or at least make sure that a $\lambda$ exists for the $x$ and $y$ you have found.

Did we find a maximum or a minimum? Lagrange’s method does not tell us if we have a maximum or a minimum, and we will have to use different methods to figure this out. There does exist a second derivative test for constrained minimization problems, but it falls outside the scope of this course.
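In this example the Lagrange equations happen to be linear in the unknowns $x$, $y$, $\lambda$, so they can be solved mechanically. The Python sketch below does this with exact Gauss-Jordan elimination on fractions; the helper `solve_linear` is a hypothetical name written for this illustration. For nonlinear Lagrange equations, such as those in the next example, this approach does not apply as is.

```python
from fractions import Fraction


def solve_linear(A, b):
    """Solve A u = b exactly by Gauss-Jordan elimination (A square, invertible)."""
    n = len(A)
    # augmented matrix [A | b] with exact rational entries
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # bring a row with a nonzero entry in this column to the pivot position
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        # clear this column in all other rows
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


# unknowns (x, y, lam): f_x = lam g_x, f_y = lam g_y, g(x, y) = C
A = [[0, 1, -2],   # y - 2*lam = 0
     [1, 0, -2],   # x - 2*lam = 0
     [2, 2, 0]]    # 2x + 2y   = 1
b = [0, 0, 1]
x, y, lam = solve_linear(A, b)
print(x, y, lam)  # 1/4 1/4 1/8
```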

12.7. A three variable example. Find the largest value of $x + y + z$ on the sphere with equation $x^2 + y^2 + z^2 = 1$.

Solution: We must maximize $f(x, y, z) = x + y + z$ with constraint $g(x, y, z) = x^2 + y^2 + z^2 = 1$.


Lagrange’s method says that the minimum and maximum either occur at a point $(x_0, y_0, z_0)$ with $\vec\nabla g(x_0, y_0, z_0) = \vec 0$, or else at a point that satisfies Lagrange’s equations. The gradient of $g$ is

$$\vec\nabla g(x, y, z) = \begin{pmatrix} 2x \\ 2y \\ 2z \end{pmatrix},$$

and the only point where $\vec\nabla g = \vec 0$ is the origin. The origin does not satisfy the constraint $g(x, y, z) = 1$, so we can rule out the possibility of the maximum or minimum occurring at a point with $\vec\nabla g = \vec 0$.

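This leads us to consider the Lagrange multiplier equations, which are

$$
\begin{aligned}
1 &= \lambda \cdot 2x && (f_x = \lambda g_x)\\
1 &= \lambda \cdot 2y && (f_y = \lambda g_y)\\
1 &= \lambda \cdot 2z && (f_z = \lambda g_z)\\
x^2 + y^2 + z^2 &= 1 && (g(x, y, z) = C)
\end{aligned}
$$

Solve the first three equations for $x, y, z$ and substitute the result in the constraint, and we find

$$\frac{1}{4\lambda^2} + \frac{1}{4\lambda^2} + \frac{1}{4\lambda^2} = 1 \implies \frac{3}{4\lambda^2} = 1 \implies \lambda = \pm\tfrac12\sqrt3.$$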

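We therefore find two points on the sphere,

$$(x, y, z) = \left(\tfrac13\sqrt3, \tfrac13\sqrt3, \tfrac13\sqrt3\right) \quad\text{and}\quad (x, y, z) = \left(-\tfrac13\sqrt3, -\tfrac13\sqrt3, -\tfrac13\sqrt3\right).$$

By computing the function values ($\sqrt3$ and $-\sqrt3$) we find that the first point maximizes $x + y + z$, and the second minimizes $x + y + z$.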
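A short numerical check of this example in Python: for both values of $\lambda$ it rebuilds the point $x = y = z = 1/(2\lambda)$, checks the constraint and the equation $\vec\nabla f = \lambda\vec\nabla g$, and compares $f$ with its values at randomly chosen points on the sphere. The sampling method (normalizing Gaussian random vectors) and the sample size are choices made for this sketch.

```python
import math
import random


def f(p):
    x, y, z = p
    return x + y + z


def grad_f(p):
    return (1.0, 1.0, 1.0)


def grad_g(p):
    # g(x, y, z) = x^2 + y^2 + z^2
    return tuple(2 * c for c in p)


results = []
for lam in (math.sqrt(3) / 2, -math.sqrt(3) / 2):
    # 1 = lam * 2x (and the same for y, z) gives x = y = z = 1/(2 lam)
    p = (1 / (2 * lam),) * 3
    on_sphere = abs(sum(c * c for c in p) - 1) < 1e-12
    lagrange_ok = all(abs(u - lam * v) < 1e-12 for u, v in zip(grad_f(p), grad_g(p)))
    results.append((p, f(p), on_sphere, lagrange_ok))


def random_point_on_sphere(rng):
    # a normalized Gaussian random vector is uniformly distributed on the sphere
    v = [rng.gauss(0.0, 1.0) for _ in range(3)]
    r = math.sqrt(sum(c * c for c in v))
    return [c / r for c in v]


rng = random.Random(0)
values = [f(random_point_on_sphere(rng)) for _ in range(20_000)]

for p, value, on_sphere, lagrange_ok in results:
    print(p, value, on_sphere, lagrange_ok)
print(min(values), max(values))
```

The random values all stay between $-\sqrt3$ and $\sqrt3$, consistent with the conclusion above.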

13. Problems

1. Minimize $xy$ subject to the constraint

$$x^2 + \tfrac14 y^2 = 1.$$

Draw the constraint set. •

2. A six-sided rectangular box is to hold 1/2 cubic meter. Which shape should the box be to minimize surface area?

(a) Find the solution without using Lagrange’s method. •

(b) Use Lagrange multipliers to solve this problem. •

3. Using the methods of this section, find the shortest distance from the origin to the plane $x + y + z = 10$. (Suggestion: instead of minimizing the distance, you can also minimize the square of the distance.) •

4. Use Lagrange multipliers to find the largest and smallest values of $f(x, y) = x$ under the constraint $g(x, y) = y^2 - x^3 + x^4 = 0$.

5. (a) Using Lagrange multipliers, find the shortest distance from the point $(2, 1, 4)$ to the plane $2x - y + 3z = 1$. •

(b) Using Lagrange multipliers, find the shortest distance from the point $(x_0, y_0, z_0)$ to the plane $ax + by + cz = d$. •

6. (a) Find the shortest distance from the point $(0, b)$ to the parabola $y = x^2$, using Lagrange multipliers.

(b) Find the shortest distance from the point $(0, 0, b)$ to the paraboloid $z = x^2 + y^2$.

(c) Find the shortest distance from the point $(0, 0, b)$ to the paraboloid $z = x^2 + \tfrac14 y^2$.

7. Find the volume of the largest rectangular box with edges parallel to the axes that can be inscribed in the ellipsoid

$$2x^2 + 72y^2 + 18z^2 = 288.$$

8. A six-sided rectangular box is to hold 1/2 cubic meter; what shape should the box be to minimize surface area? •

13. PROBLEMS 105

9. A circular cone has height $H$, and its base has radius $R$. If the volume of the cone is fixed, then which ratio of radius to height ($R : H$) minimizes the surface area of the cone? (The area of the cone is $A = \pi R\sqrt{R^2 + H^2}$, its volume is $V = \tfrac13\pi R^2 H$, and instead of minimizing the area you could also minimize the square of the area.)

10. The post office will accept packages whose combined length and girth are at most 130 inches (girth is the maximum distance around the package perpendicular to the length). What is the largest volume that can be sent in a rectangular box? •

11. The bottom of a rectangular box costs twice as much per unit area as the sides and top. Find the shape for a given volume that will minimize cost. •

12. Find all points on the surface

$$xy - z^2 + 1 = 0$$

that are closest to the origin. •

13. The material for the bottom of an aquarium costs half as much as the high strength glass for the four sides. Find the shape of the cheapest aquarium that holds a given volume $V$. •

14. The plane $x - y + z = 2$ intersects the cylinder $x^2 + y^2 = 4$ in an ellipse. Find the points on the ellipse closest to and farthest from the origin. (Hint: on the plane you always have $z = 2 - x + y$, so you can eliminate $z$ and make this a problem about functions of $(x, y)$ only.) •

15. (Interpretation of the Lagrange multiplier, general case.) Suppose that for all values of the constraint parameter $C$ we have a solution $\big(x(C), y(C)\big)$ to the Lagrange multiplier equations

$$\vec\nabla f(x, y) = \lambda\,\vec\nabla g(x, y), \qquad g(x, y) = C.$$

Show that the derivative of $f\big(x(C), y(C)\big)$ with respect to $C$ is exactly $\lambda$.

16. (Interpretation of the Lagrange multiplier, example.)

(a) Use Lagrange multipliers to find the rectangle with sides $x$ and $y$ and enclosed area $A$ whose perimeter is as small as possible. Find the $x$ and $y$ coordinates of the solution, the Lagrange multiplier $\lambda$, as well as the smallest perimeter $L$, and write all of them as functions of the prescribed area $A$.

(b) Compute the derivative

$$\frac{dL}{dA}.$$

Describe in words what this derivative represents (“the rate of change of . . . ”), and verify that in this example $\dfrac{dL}{dA} = \lambda$.

CHAPTER 6

Integrals

1. Ways of Integrating

In this chapter we will see several different ways of integrating functions of several variables. Before introducing them one by one, we spend this section reviewing how integration was defined in first semester calculus and outlining the general features that all different ways of integrating have in common.

1.1. The one variable integral. To begin, let us quickly recall how the integral of a function of one variable is defined. Given a function $y = f(x)$ and an interval $[a, b]$, we choose a partition of the interval $[a, b]$, which means that

• we split the interval $[a, b]$ into shorter intervals $[x_0, x_1], [x_1, x_2], \dots, [x_{N-1}, x_N]$, where $a = x_0 < x_1 < \cdots < x_N = b$,
• and we choose one sample point $\xi_k$ from each interval $[x_{k-1}, x_k]$.

From these ingredients we compute the Riemann sum

$$R = f(\xi_1)\Delta x_1 + \cdots + f(\xi_N)\Delta x_N = \sum_{k=1}^{N} f(\xi_k)\,\Delta x_k$$

where $\Delta x_k = x_k - x_{k-1}$ is the length of the $k$th interval.

Figure 1. Riemann sums for $\int_a^b f(x)\,dx$ with one partition $a = x_0 < x_1 < \cdots < x_6 = b$ on the left, and a finer partition on the right. The dashed lines in the figure on the left indicate where the sample points $\xi_k$ were chosen.

For most functions $y = f(x)$ it is true that upon making the intervals $[x_{k-1}, x_k]$ shorter (and hence choosing more partition intervals), the resulting Riemann sums approach a limiting value. When this happens we call the limiting value of the Riemann sums the integral of the function $f(x)$ over the interval $[a, b]$:

$$\int_a^b f(x)\,dx = \lim_{\text{“as the partition gets finer”}} f(\xi_1)\Delta x_1 + \cdots + f(\xi_N)\Delta x_N.$$
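The definition translates directly into code. In the Python sketch below the partition and the sample points are passed in explicitly; the example uses $f(x) = x^2$ on $[0, 1]$, equally spaced points and right endpoints as sample points (choices made for this illustration), and the sums approach $\int_0^1 x^2\,dx = \tfrac13$ as $N$ grows. The function names are hypothetical.

```python
def riemann_sum(f, points, samples):
    """R = f(xi_1)*dx_1 + ... + f(xi_N)*dx_N.

    points:  the partition a = x_0 < x_1 < ... < x_N = b
    samples: the sample points xi_1, ..., xi_N, with x_{k-1} <= xi_k <= x_k
    """
    if len(samples) != len(points) - 1:
        raise ValueError("need exactly one sample point per interval")
    total = 0.0
    for k in range(1, len(points)):
        xi = samples[k - 1]
        if not points[k - 1] <= xi <= points[k]:
            raise ValueError("sample point outside its interval")
        total += f(xi) * (points[k] - points[k - 1])  # f(xi_k) * dx_k
    return total


def uniform_partition(a, b, N):
    # N intervals of equal length (b - a)/N
    return [a + (b - a) * k / N for k in range(N + 1)]


for N in (6, 60, 600):
    xs = uniform_partition(0.0, 1.0, N)
    # right endpoints x_1, ..., x_N as sample points
    print(N, riemann_sum(lambda x: x * x, xs, xs[1:]))
```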


108 6. INTEGRALS

The individual terms in the Riemann sum are areas of the narrow rectangles in the figure. Added together they approximate the area of the region under the graph, so that the integral is the area between the graph of $y = f(x)$ and the $x$-axis (at least in the case that $f$ is a positive function, so that its graph lies above the $x$-axis).

A note about rigor. Our quick description of the single variable integral is lacking in mathematical precision. It is based on a belief that we know what “area” is. In the late 19th and early 20th centuries many examples of geometric figures were found in which area computations give unexpected and counterintuitive results. Therefore one cannot base a theory on our intuitive idea of “area,” and instead the integral, defined as a limit of Riemann sums, is used as a way of giving a rigorous definition of the notion of “area.” For a proper treatment of these issues the student is referred to a more advanced course on Real Analysis (e.g. Math 421 or 521).

1.2. Generalizing the one variable integral. While there is essentially only one kind of integral in single variable calculus, there are many different ways of integrating functions of several variables. All these different notions of “integral” fit the following broad description.

In any kind of integral we have these ingredients:
• a domain. Depending on the kind of integral, this can be a region in the plane, a region in space, a plane curve, a space curve, or even some surface in three dimensional space.
• a function that is defined on the domain
• a way of measuring the “size” of pieces of the domain

To define the integral we “partition” the region, i.e. we divide it into lots of little pieces. Given any such partition of the region into smaller pieces, we then form the following “Riemann sum”

$$\sum_{\substack{\text{pieces in the}\\ \text{partition}}} \big(f \text{ at sample point in piece } \#k\big) \times \big\{\text{size of piece } \#k\big\}.$$

This gives us a number for each way of partitioning the region. As we make the partition finer, i.e. as we choose more, smaller, pieces, the Riemann sums tend to get closer to one particular number, which is called the integral of the function. In short, the integral is the limit of the Riemann sums we find as we take finer and finer partitions:

$$\int_{\text{some region}} f(x)\,dx = \lim_{\substack{\text{as the partition}\\ \text{gets finer}}} \sum_{\substack{\text{pieces in the}\\ \text{partition}}} \big(f \text{ at sample point in piece } \#k\big) \times \big\{\text{size of piece } \#k\big\}$$

Depending on what kind of function we have, and what kind of region the function is defined on, and also how we decide to measure the size of the small pieces in the partition, this process can lead to many different kinds of integrals. The integrals we will meet in this chapter are double integrals and triple integrals; in the next chapter on vector calculus we will also see line integrals and surface integrals. See Table 1.
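The general recipe only needs, for each piece of the partition, one sample point and one number measuring its size. The Python sketch below (function names are hypothetical) takes the pieces as a list of `(sample_point, size)` pairs, so the same function produces a Riemann sum for a region in the plane, with area as size, and for a curve, with arc length as size. The two partitions used as examples, squares with their centers and arcs of the unit circle with their midpoints, are choices made for this illustration.

```python
import math


def general_riemann_sum(f, pieces):
    """Sum of f(sample point) * size over the pieces of a partition.

    pieces: iterable of (sample_point, size) pairs. What "size" means
    (length, area, volume, ...) depends on the kind of integral.
    """
    return sum(f(*point) * size for point, size in pieces)


def square_pieces(n):
    # the unit square cut into n x n squares of area 1/n^2, sampled at their centers
    h = 1.0 / n
    return [(((i + 0.5) * h, (j + 0.5) * h), h * h) for i in range(n) for j in range(n)]


def circle_pieces(n):
    # the unit circle cut into n arcs of length 2*pi/n, sampled at their midpoints
    ds = 2 * math.pi / n
    return [((math.cos((k + 0.5) * ds), math.sin((k + 0.5) * ds)), ds) for k in range(n)]


area_sum = general_riemann_sum(lambda x, y: x * y, square_pieces(100))  # region, dA
arc_sum = general_riemann_sum(lambda x, y: x * x, circle_pieces(100))   # curve, ds
print(area_sum, arc_sum)
```

Only the list of pieces changes from one kind of integral to the next; this mirrors the “size of piece” column of Table 1.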

2. Double Integrals

Let $z = f(x, y)$ be a function of two variables defined on some region $D$ in the plane. The double integral of $f$ over $D$ is defined in terms of Riemann sums, following the general scheme described in the previous section. To form a Riemann sum we first need a

2. DOUBLE INTEGRALS 109

| Kind of integral | Domain | Typical piece of partition | Size of piece |
|---|---|---|---|
| “Good old 221 integral” $\int_a^b f(x)\,dx$ | interval $a \le x \le b$ | small subinterval $(x_{k-1}, x_k)$ | length of subinterval $\Delta x_k = x_k - x_{k-1}$ |
| Multiple integral $\iint_D f(x, y)\,dA$ | region in the plane | tiny subdomain | area $\Delta A$ of tiny subdomain |
| Multiple integral $\iiint_D f(x, y, z)\,dV$ | region in space | tiny subdomain | volume $\Delta V$ of tiny subdomain |
| Line integral $\int_C f(x, y)\,ds$ | curve in the plane | short subarc of the curve | length $\Delta s$ of the subarc |
| Line integral $\int_C f(x, y, z)\,ds$ | curve in space | short subarc of the curve | length $\Delta s$ of the subarc |
| Surface integral $\iint_S f(x, y, z)\,dA$ | surface in space | small patch on the surface | area $\Delta A$ of the patch |

Table 1. A list of the different kinds of integrals that we will encounter in Math 234.

partition of the region $D$ into smaller regions $D_1, \dots, D_N$, and we need to choose a sample point $(x_k, y_k)$ from each region $D_k$. If $\Delta A_k$ is the area of region $D_k$, then the Riemann sum corresponding to the partition $D_1, \dots, D_N$ and the choice of sample points $(x_1, y_1), \dots, (x_N, y_N)$ is

$$R = f(x_1, y_1)\Delta A_1 + \cdots + f(x_N, y_N)\Delta A_N = \sum_{k=1}^{N} f(x_k, y_k)\,\Delta A_k. \tag{115}$$

If the partition is “sufficiently fine” then this Riemann sum will in many cases be close to one particular number, which we will call the integral of the function $f$ over the region

Figure 2. On the left: a region in the plane with some partition. Many pieces of the partition are rectangles. This is a common choice, but the pieces don’t have to be rectangles: here the pieces that touch the boundary of the domain have at least one curved edge. On the right: the same region with two finer partitions.


$D$. Thus

$$\iint_D f(x, y)\,dA = \lim_{\substack{\text{as the partition}\\ \text{“gets finer \& finer”}}} \sum_{k=1}^{N} f(x_k, y_k)\,\Delta A_k. \tag{116}$$

To make this more precise one has to resort to $\varepsilon$’s and $\delta$’s, which results in the following definition.

2.1. Definition. If for every $\varepsilon > 0$ there is a $\delta > 0$ such that the Riemann sum corresponding to any partition of the region $D$ into smaller pieces $D_1, \dots, D_N$ whose pieces have diameter no more than $\delta$, and to any choice of sample points $(x_k, y_k)$ in $D_k$, satisfies

$$\left| I - \sum_{k=1}^{N} f(x_k, y_k)\,\Delta A_k \right| < \varepsilon,$$

then we say that

$$\iint_D f(x, y)\,dA = I.$$

On one hand it can be shown in many cases that the integral of a function exists according to the above definition. On the other hand the $\varepsilon$-$\delta$ definition is neither a practical method of computing such integrals, nor does it provide an easy intuitive understanding of the properties of the integral. Therefore, we will stick to the less precise definition (116) in this course.

2.2. The integral is the volume under the graph, when $f \ge 0$. If the function $f$ is positive, then its graph lies above the $xy$-plane, and there is a simple interpretation of the integral, namely

$$\iint_D f(x, y)\,dA = \text{Volume of } R,$$

where $R$ is “the region under the graph of $f$ above the domain $D$”; in symbols,

$$R = \big\{(x, y, z) : (x, y) \text{ lies in } D, \text{ and } 0 \le z \le f(x, y)\big\}. \tag{117}$$

To see why this is so, imagine that we have a positive function $z = f(x, y)$ defined on some region $D$ in the $xy$-plane, and let us try to compute the integral $\iint_D f(x, y)\,dA$ “geometrically.” To compute the integral we begin by finely partitioning the region $D$ into smaller regions $D_1, D_2, \dots, D_N$ (see Figure 3 on the left, where the small pieces were themselves chosen to be rectangles). We also choose one “sample point” $(x_k, y_k)$ in each region $D_k$. The Riemann sum we get this way is

$$R = f(x_1, y_1)\Delta A_1 + \cdots + f(x_N, y_N)\Delta A_N$$

where $\Delta A_k$ is the area of $D_k$. The $k$th term, $f(x_k, y_k)\Delta A_k$, is the volume of a block whose base is $D_k$ and whose top touches the graph of the function at the point above the sample point $(x_k, y_k)$. This volume is almost, but usually not exactly, the same as the volume of the region between the graph of the function and the small region $D_k$ in the $xy$-plane. The volume $f(x_k, y_k)\Delta A_k$ of the block above $D_k$ is not exactly the same as the volume of the region under the graph because the top of the block is a piece of a horizontal plane while the graph of $f$ will usually have a slope (see Figure 3).

The total Riemann sum is therefore the sum of the volumes of such blocks (see Figure 4), and this will approximate the volume between the graph of $f$ and the domain of


Figure 3. On the left: the domain $a \le x \le b$, $c \le y \le d$ of the function $f$ partitioned into $6 \times 5$ pieces, each with the same width $\Delta x$ and height $\Delta y$. To form a Riemann sum we have to choose one sample point $(x_k, y_k)$ in each piece $D_k$ of the partition. Below we will always choose the upper-right-hand corner of the rectangle to be the sample point. On the right: any piece in the partition corresponds to a term in the Riemann sum of the form $f(x_k, y_k)\Delta A_k$. This is the volume of a block of height $f(x_k, y_k)$ and base $D_k$, which is approximately the volume of the region under the graph of $f$ and above the piece $D_k$. Adding all these volumes together we see that a Riemann sum approximates the total volume between the graph and the region $D$.

Figure 4 (left: $N = 4$, $M = 3$; right: $N = 8$, $M = 6$). Approximating the region under the graph of $z = f(x, y)$ from Figure 3 by vertical blocks. The base of each block is a rectangle in a partition of the domain of $f$. As we choose finer and finer partitions, the region occupied by the vertical blocks gets closer to the region under the graph of $f$.

integration $D$. The finer the partition, the better the approximation, and so we can conclude¹ that the limit of the Riemann sums is the volume under the graph, i.e. the volume of the region $R$ defined in (117).

¹As promised before, this is not a very precise “proof”; a proof that the limit of Riemann sums exists quickly leads us to $\varepsilon$&$\delta$ arguments.


2.3. How to compute a double integral. So far, we have a definition for the double integral $\iint_D f(x, y)\,dA$, and an interpretation of the integral as “volume under the graph of $f$.” What is missing is a method of actually computing the integral. In this section we’ll see how one can compute a double integral by doing two one-variable integrals.

Let us take another look at the integral of the function $f$ over the rectangle

$$D = \{(x, y) : a \le x \le b,\ c \le y \le d\}$$

from the previous section. We again partition $D$ into smaller rectangles, as in Figure 3, but instead of just counting them and arbitrarily numbering the pieces $1, 2, \dots, N$, we can use the fact that the smaller rectangles appear in rows and columns. If we take $N$ rectangles in the $x$ direction, and $M$ in the $y$ direction, then the smaller rectangles will measure $\Delta x$ by $\Delta y$, where

$$\Delta x = \frac{b - a}{N}, \qquad \Delta y = \frac{d - c}{M}.$$

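We let $(x_k, y_l)$ be the upper-right-hand corner of the rectangle in the $k$th column from the left, and the $l$th row from below. Then

$$x_k = a + k\Delta x, \qquad y_l = c + l\Delta y. \tag{118}$$

The Riemann sum corresponding to this partition and choice of sample points $(x_k, y_l)$ is

$$
\begin{aligned}
R &= \sum f(x_k, y_l)\,\Delta x\,\Delta y \\
&= f(x_1, y_1)\Delta x\Delta y + \cdots + f(x_N, y_1)\Delta x\Delta y \\
&\quad + f(x_1, y_2)\Delta x\Delta y + \cdots + f(x_N, y_2)\Delta x\Delta y \\
&\quad \;\vdots \\
&\quad + f(x_1, y_M)\Delta x\Delta y + \cdots + f(x_N, y_M)\Delta x\Delta y
\end{aligned}
\tag{119}
$$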

Since we are choosing the upper-right-hand corner of each rectangle as the sample point in that rectangle, the sample point for the rectangle at the top right is $(x_N, y_M)$ (see Figure 3 on the left). Therefore, in this summation $k$ can be any integer with $1 \le k \le N$ and $l$ can be any integer with $1 \le l \le M$.
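The Riemann sum (119) can be sketched in Python as follows; the row-by-row loop order mirrors the way (119) is written, and the sample points are the upper-right-hand corners (118). The test function $x^2 + y^2$ on the unit square, whose integral is $\tfrac23$ (see example 2.5), and the values of $N$ and $M$ are chosen only for illustration; the function name is hypothetical.

```python
def double_riemann_sum(f, a, b, c, d, N, M):
    """Riemann sum (119) for f over the rectangle a <= x <= b, c <= y <= d,
    with N columns, M rows, and the upper-right-hand corner
    (x_k, y_l) = (a + k*dx, c + l*dy) of each small rectangle as sample point."""
    dx = (b - a) / N
    dy = (d - c) / M
    total = 0.0
    for l in range(1, M + 1):          # row l, counted from below
        y_l = c + l * dy
        for k in range(1, N + 1):      # column k, counted from the left
            x_k = a + k * dx
            total += f(x_k, y_l) * dx * dy
    return total


f = lambda x, y: x ** 2 + y ** 2
for n in (3, 6, 60, 600):
    print(n, double_riemann_sum(f, 0.0, 1.0, 0.0, 1.0, n, n))
```

Because $x^2 + y^2$ increases in both variables on the unit square, upper-right corners give sums slightly above the integral, and the gap shrinks as the partition gets finer.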

The term corresponding to rectangle $(k, l)$ represents the volume of a block whose height is $f(x_k, y_l)$ and whose base is a $\Delta x \times \Delta y$ rectangle. Together these blocks approximate the region between the graph of the function and the $xy$-plane.

Consider the terms on the $l$th row in equation (119); after factoring out $\Delta y$ we get

$$\text{row } \#l \text{ of (119)} = \Delta y\,\big\{f(x_1, y_l)\Delta x + f(x_2, y_l)\Delta x + \cdots + f(x_N, y_l)\Delta x\big\}.$$

Note that in this sum the function is always evaluated at the same value of $y$, namely $y_l$. The sum between braces $\{\cdots\}$ is actually a Riemann sum for the one-variable integral

$$I = \int_a^b f(x, y_l)\,dx$$

in which we treat $f(x, y_l)$ as a function of $x$ only and consider the variable $y$ to be frozen at $y = y_l$. The value of this integral depends on the value at which $y$ is frozen, so it is better to write

$$I(y) = \int_a^b f(x, y)\,dx.$$

With this notation we find that

$$\text{row } \#l \text{ of (119)} \approx \Delta y \times \big\{I(y_l)\big\} = I(y_l)\,\Delta y.$$


To find the value of the Riemann sum that approximates the double integral $\iint_D f(x, y)\,dA$ we add the rows in (119) and find

$$R \approx I(y_1)\Delta y + I(y_2)\Delta y + \cdots + I(y_M)\Delta y.$$

The sum on the right is again a Riemann sum for a one variable integral, namely, $\int_c^d I(y)\,dy$. Therefore we find that

$$R \approx \int_c^d I(y)\,dy.$$

If we now take the limit in which we let the size of the pieces in the partition go to zero, then it can be shown (with quite a bit of effort) that the approximation above gets better, and that one has

$$\iint_D f(x, y)\,dA = \int_c^d I(y)\,dy.$$

Therefore, remembering the definition of $I(y)$, we have found the following method of computing a double integral.

2.4. Theorem. If $f(x, y)$ is a function defined on a rectangle

$$D = \{(x, y) : a \le x \le b,\ c \le y \le d\},$$

then the double integral of $f$ over $D$ is given by

$$\iint_D f(x, y)\,dA = \int_c^d \left\{\int_a^b f(x, y)\,dx\right\} dy. \tag{120}$$

One can also first integrate with respect to $y$ and then $x$, so that

$$\iint_D f(x, y)\,dA = \int_a^b \left\{\int_c^d f(x, y)\,dy\right\} dx. \tag{121}$$

Figure 5. This picture shows the blocks, above the rectangle bounded by $x = a$, $x = b$, $y = c$, $y = d$, corresponding to all those terms in the Riemann sum $R$ from equation (119) in which $y = y_l$. These terms $\{f(x_1, y_l)\Delta x + \cdots + f(x_N, y_l)\Delta x\}\Delta y$ give you the total volume of one row of “matchsticks” from Figure 4. In this sum $y$ is frozen at the value $y = y_l$, so we can think of $f(x_1, y_l)\Delta x + \cdots + f(x_N, y_l)\Delta x$ as a Riemann sum for the integral $\int_a^b f(x, y_l)\,dx$.


The second way of computing the double integral $\iint_D f(x, y)\,dA$, i.e. equation (121), follows by the same reasoning that led us to (120), except in (119) one groups the terms by columns rather than rows.

To compute the right hand side in this equation we have to compute two one-variable integrals. The expression

$$\int_c^d \left\{\int_a^b f(x, y)\,dx\right\} dy = \int_c^d \int_a^b f(x, y)\,dx\,dy$$

is called an iterated integral. The two integrals that appear in an iterated integral are often called “inner” and “outer” integral:

$$\underbrace{\int_c^d \Big\{ \underbrace{\int_a^b f(x, y)\,dx}_{\text{inner integral}} \Big\}\,dy}_{\text{outer integral}}.$$
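Numerically, an iterated integral is just a one-variable rule applied twice: once to the inner integral with the outer variable frozen, and once to the resulting function $I(y)$. The Python sketch below uses the midpoint rule for both (our choice; any one-variable rule would do, and the function names are hypothetical) and evaluates (120) and (121) for the example $f(x, y) = xy^2$ on $[0, 1] \times [0, 2]$, whose exact integral is $\tfrac12 \cdot \tfrac83 = \tfrac43$.

```python
def midpoint_integral(g, lo, hi, n=400):
    # one-variable midpoint rule for the integral of g from lo to hi
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h) for i in range(n)) * h


def iterated_dx_dy(f, a, b, c, d, n=400):
    # (120): inner integral I(y) over a <= x <= b with y frozen, then over y
    I = lambda y: midpoint_integral(lambda x: f(x, y), a, b, n)
    return midpoint_integral(I, c, d, n)


def iterated_dy_dx(f, a, b, c, d, n=400):
    # (121): inner integral over c <= y <= d with x frozen, then over x
    J = lambda x: midpoint_integral(lambda y: f(x, y), c, d, n)
    return midpoint_integral(J, a, b, n)


f = lambda x, y: x * y ** 2
v1 = iterated_dx_dy(f, 0.0, 1.0, 0.0, 2.0)
v2 = iterated_dy_dx(f, 0.0, 1.0, 0.0, 2.0)
print(v1, v2)  # both close to 4/3
```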

2.5. Example: the volume under the graph of the paraboloid $z = x^2 + y^2$ above the square $Q = \{(x, y) : 0 \le x \le 1,\ 0 \le y \le 1\}$. The double integral we have to compute is

$$\text{Volume} = \iint_Q \left(x^2 + y^2\right) dA$$

and to compute it we write it as an iterated integral

$$\iint_Q \left(x^2 + y^2\right) dA = \int_0^1 \left\{\int_0^1 (x^2 + y^2)\,dx\right\} dy.$$

In the inner integral the variable $y$ is frozen, so to compute the inner integral, we simply treat $y$ as a constant, and integrate with respect to $x$. We get

$$\int_0^1 (x^2 + y^2)\,dx = \Big[\tfrac13 x^3 + y^2 x\Big]_{x=0}^{1} = \tfrac13 + y^2.$$

(This is $I(y)$ in the notation of the previous section.) To get the double integral we must still do the outer integral:

$$
\begin{aligned}
\iint_Q \left(x^2 + y^2\right) dA &= \int_0^1 \left\{\int_0^1 (x^2 + y^2)\,dx\right\} dy \\
&= \int_0^1 \left(\tfrac13 + y^2\right) dy \\
&= \Big[\tfrac13 y + \tfrac13 y^3\Big]_0^1 \\
&= \tfrac13 + \tfrac13 = \tfrac23.
\end{aligned}
$$

Since the surrounding block (Figure 6) is a $1 \times 1 \times 2$ block, its volume is 2, and the region under the graph occupies exactly one third of the whole block.

To compute the volume of the region under the graph of the same function above the rectangle $\{(x, y) : 0 \le x \le a,\ 0 \le y \le b\}$ one can compute either of the iterated integrals

$$\int_0^a \int_0^b \left(x^2 + y^2\right) dy\,dx \quad\text{or}\quad \int_0^b \int_0^a \left(x^2 + y^2\right) dx\,dy.$$


Figure 6. The graph of $z = x^2 + y^2$ above the unit square $Q$ on the left, and above the rectangle $\{(x, y) : 0 \le x \le a \text{ and } 0 \le y \le b\}$ on the right, together with the surrounding block. What fraction of the volume of the block lies below the graph?
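The question in the caption can be answered with the iterated integrals above, done by hand: the inner integral is $\big[x^2y + \tfrac13 y^3\big]_{y=0}^{b} = bx^2 + \tfrac13 b^3$ and the outer one gives $\tfrac13 a^3b + \tfrac13 ab^3$. Assuming, as for the unit square, that the surrounding block has height $a^2 + b^2$ (the largest value of $x^2 + y^2$ on the rectangle), the Python sketch below compares the two volumes in exact arithmetic for a few sample rectangles; the function names are hypothetical.

```python
from fractions import Fraction


def volume_under_graph(a, b):
    # int_0^a int_0^b (x^2 + y^2) dy dx = a^3 b / 3 + a b^3 / 3
    a, b = Fraction(a), Fraction(b)
    return a ** 3 * b / 3 + a * b ** 3 / 3


def block_volume(a, b):
    # base a x b, height = max of x^2 + y^2 on the rectangle = a^2 + b^2
    a, b = Fraction(a), Fraction(b)
    return a * b * (a ** 2 + b ** 2)


for a, b in [(1, 1), (2, 3), (Fraction(1, 2), 5)]:
    ratio = volume_under_graph(a, b) / block_volume(a, b)
    print(a, b, volume_under_graph(a, b), ratio)
```

Algebraically, $\tfrac13 ab(a^2 + b^2)$ divided by $ab(a^2 + b^2)$ is $\tfrac13$ for every rectangle, which is what the loop confirms.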

2.6. Double integrals when the domain is not a rectangle. We have seen how to compute a double integral when the domain is a rectangle. The reasoning that led us from a double integral to an iterated integral also works for non-rectangular domains, provided they are not too complicated. Suppose we want to compute $\iint_D f(x, y)\,dA$ where the domain $D$ is the region caught between the graphs of two functions:

$$D = \{(x, y) : a \le x \le b,\ c(x) \le y \le d(x)\}.$$

We again partition the region by cutting it along many vertical lines $x = x_1, x = x_2, \dots, x = x_N$, and many horizontal lines $y = y_1, \dots, y = y_M$. Most of the pieces of the partition will be rectangles, but those that overlap with the boundary of the region $D$ may have curved edges. See Figures 7 and 8.

This time, all the terms in a Riemann sum corresponding to one particular strip $x_{k-1} \le x \le x_k$ add up to a Riemann sum for an integral over the $y$ variable,

$$\int_{c(x_k)}^{d(x_k)} f(x_k, y)\,dy \times \Delta x,$$

and adding all these we get the iterated integral

$$\iint_D f(x, y)\,dA = \int_a^b \left\{\int_{c(x)}^{d(x)} f(x, y)\,dy\right\} dx. \tag{122}$$
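Formula (122) only changes the limits of the inner integral: they now depend on the frozen value of $x$. A Python sketch with the midpoint rule (our choice of one-variable rule; the function names are hypothetical), tested on the region between $y = x^2$ and $y = \sqrt{x}$ for $0 \le x \le 1$: with $f = 1$ it gives the area $\int_0^1 (\sqrt x - x^2)\,dx = \tfrac13$, and with $f = x$ it gives $\int_0^1 x(\sqrt x - x^2)\,dx = \tfrac{3}{20}$.

```python
def midpoint_integral(g, lo, hi, n=400):
    # one-variable midpoint rule for the integral of g from lo to hi
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h) for i in range(n)) * h


def integral_between_graphs(f, a, b, c, d, n=400):
    """Approximate (122): the integral over a <= x <= b of the inner integral
    of f(x, y) over c(x) <= y <= d(x), assuming c(x) <= d(x) on [a, b]."""
    inner = lambda x: midpoint_integral(lambda y: f(x, y), c(x), d(x), n)
    return midpoint_integral(inner, a, b, n)


lower = lambda x: x * x        # c(x)
upper = lambda x: x ** 0.5     # d(x)
area = integral_between_graphs(lambda x, y: 1.0, 0.0, 1.0, lower, upper)
moment = integral_between_graphs(lambda x, y: x, 0.0, 1.0, lower, upper)
print(area, moment)  # close to 1/3 and 3/20
```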

2.7. An example – the parabolic office building. Consider the region under the graph of $f(x, y) = x + y$, above the domain

$$D = \{(x, y) : 0 \le x \le 1,\ (1 - x)^2 \le y \le 1\}.$$


Figure 7. The region $D$ between the graphs of $y = c(x)$ (bottom) and $y = d(x)$ (top) for $a \le x \le b$, with the strip $c(x) \le y \le d(x)$, $x_{k-1} \le x \le x_k$.

The volume of this region is given by

$$V = \iint_D (x + y)\,dA.$$

We can compute this volume by finding the following iterated integral

$$V = \int_{x=0}^{1} \int_{(1-x)^2}^{1} (x + y)\,dy\,dx. \tag{123}$$

Alternatively, the region $D$ can also be described as

$$D = \{(x, y) : 0 \le y \le 1,\ 1 - \sqrt{y} \le x \le 1\}.$$

This leads to the following iterated integral for the volume

$$V = \int_{y=0}^{1} \int_{1-\sqrt{y}}^{1} (x + y)\,dx\,dy. \tag{124}$$

Both iterated integrals should give the same answer. Let’s compute the first one:

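$$
\begin{aligned}
V &= \int_0^1 \int_{(1-x)^2}^{1} (x + y)\,dy\,dx
   = \int_0^1 \Big[ xy + \tfrac12 y^2 \Big]_{y=(1-x)^2}^{y=1}\,dx \\
  &= \int_0^1 \Big[ x\big(1 - (1-x)^2\big) + \tfrac12\big(1^2 - (1-x)^4\big) \Big]\,dx \\
  &= \int_0^1 \Big[ 2x^2 - x^3 + \tfrac12\big(4x - 6x^2 + 4x^3 - x^4\big) \Big]\,dx \\
  &= \int_0^1 \Big[ 2x^2 - x^3 + 2x - 3x^2 + 2x^3 - \tfrac12 x^4 \Big]\,dx \\
  &= \tfrac23 - \tfrac14 + 1 - 1 + 2 \times \tfrac14 - \tfrac12 \times \tfrac15 \\
  &= \tfrac{49}{60}.
\end{aligned}
$$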
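The arithmetic can be double-checked numerically. The sketch below is plain Python; the helper name `iterated_integral` and the choice of the midpoint rule are illustrative assumptions, not part of these notes. It approximates the iterated integrals (123) and (124) by Riemann sums with sample points at midpoints, in both orders of integration. Since the terms in the last line of the computation sum to 2/3 − 1/4 + 1/2 − 1/10 = 49/60, both numbers should come out close to 49/60 ≈ 0.8167.

```python
import math

def iterated_integral(f, a, b, lower, upper, n=400):
    """Approximate int_a^b ( int_{lower(s)}^{upper(s)} f(s, t) dt ) ds
    with the composite midpoint rule in both variables.
    s is the outer variable, t the inner one."""
    hs = (b - a) / n
    total = 0.0
    for i in range(n):
        s = a + (i + 0.5) * hs                 # midpoint of the i-th outer strip
        lo, hi = lower(s), upper(s)            # inner limits depend on s
        ht = (hi - lo) / n
        inner = sum(f(s, lo + (j + 0.5) * ht) for j in range(n)) * ht
        total += inner * hs
    return total

# (123): x outside, y from (1 - x)^2 to 1
V1 = iterated_integral(lambda x, y: x + y, 0.0, 1.0,
                       lambda x: (1 - x) ** 2, lambda x: 1.0)
# (124): y outside, x from 1 - sqrt(y) to 1
V2 = iterated_integral(lambda y, x: x + y, 0.0, 1.0,
                       lambda y: 1 - math.sqrt(y), lambda y: 1.0)
print(V1, V2, 49 / 60)
```

Because the integrand x + y is linear in the inner variable, the inner midpoint sums are exact here; the remaining error comes from the outer sums, and with n = 400 both results should agree with 49/60 to roughly four decimal places.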



Figure 8. On the left: the domain of integration, a partition, and all pieces in the partition corresponding to one value of y. On the right: the “parabolic office building,” the region whose volume is computed in example 2.7.

Note that even though the function we integrated is very simple (it’s just x + y), the integral can still become complicated because of the shape of the domain D over which we are integrating.

2.8. Double integrals in Polar Coordinates. Sometimes Cartesian coordinates are just not the best choice. For instance, a disc of radius R, centered at the origin, is very easy to describe in polar coordinates as “all points with r ≤ R.” In Cartesian coordinates we need Pythagoras, and we have to say “all points with x² + y² ≤ R².” In the same spirit a “polar rectangle” is a domain of the form

$$R = \{\text{all points with } \theta_0 \le \theta \le \theta_1,\ r_0 \le r \le r_1\}.$$

See Figure 9 (on the left). There is a very natural way of partitioning such a region into many smaller regions, by cutting the region along curves of constant r (arcs centered at the origin) or constant θ (rays emanating from the origin). If the partition is sufficiently fine, then the pieces in the partition will almost be real Cartesian rectangles, with sides r∆θ and ∆r (∆θ being the angle between adjacent rays, and ∆r being the difference in radius between two consecutive arcs). The area of such a small partition piece is therefore ∆A ≈ r∆θ × ∆r, and one arrives at the following formula for the integral of a function over a polar rectangle:

$$\iint_R f(x, y)\,dA = \int_{r_0}^{r_1} \int_{\theta_0}^{\theta_1} F(r, \theta)\, r\,d\theta\,dr = \int_{\theta_0}^{\theta_1} \int_{r_0}^{r_1} F(r, \theta)\, r\,dr\,d\theta. \tag{125}$$


Figure 9. Left: a “polar rectangle” and a partition by lines of constant θ (the spokes) and curves of constant r (the arcs). Right: the area of a small piece of such a partition is approximately ∆A ≈ ∆r × r∆θ.

Here F(r, θ) = f(r cos θ, r sin θ) is the function f(x, y) written in polar coordinates.²

Figure 10. The gray region is the region between the polar graphs r = a(θ) and r = b(θ).

There is a similar formula for more complicated domains. If a domain can be described in polar coordinates by

$$D = \{\text{all points with } \alpha \le \theta \le \beta,\ a(\theta) \le r \le b(\theta)\}$$

and if we want to integrate a function z = f(x, y) over this domain, then we can again partition the domain D into many small pieces that are bounded by circular arcs centered at the origin, and rays emanating from the origin. The area of a small piece in the partition is once again given by ∆A ≈ ∆r × r∆θ, and therefore the integral of f over D is

$$\iint_D f(x, y)\,dA = \int_\alpha^\beta \int_{a(\theta)}^{b(\theta)} F(r, \theta)\, r\,dr\,d\theta. \tag{126}$$

² It is very common to use the same letter f for both functions, i.e. to write f(x, y) for f as a function of Cartesian coordinates, and also f(r, θ) for the same function but written in polar coordinates. This raises the question of what f(0.3, 1.24) means: are (0.3, 1.24) the polar or the Cartesian coordinates of the point at which f is to be evaluated? To avoid this kind of ambiguity we will try to use different letters for the same quantity regarded as a function of Cartesian coordinates, and of polar coordinates.



Figure 11. The graph of the function z = aθ in polar coordinates is called the helicoid. Here we see one quarter turn of a helicoid with a = 1/2. The volume under the helicoid is given by a double integral which is best computed using polar coordinates. Which fraction of the volume in the surrounding quarter cylinder lies beneath the helicoid?

2.9. Example: the volume under a quarter turn of a helicoid. A helicoid is the surface that in polar coordinates is given by

z = aθ

where a > 0 is some constant. (See Chapter III, § 4.2.) If we choose the constant a = 1/2, and take the first quarter turn of this surface, on which 0 ≤ θ ≤ π/2, then we get the picture in Figure 11. In that drawing we have only included the part with 0 ≤ r ≤ 1. To compute the volume of the region under the quarter helicoid using Cartesian coordinates, we would have to compute this integral:

$$V = \int_0^1 \int_0^{\sqrt{1-x^2}} \tfrac12 \arctan\frac{y}{x}\,dy\,dx.$$

(Try to set up this integral yourself!) In polar coordinates things are easier. The domain is a polar rectangle,

$$0 \le r \le 1, \quad 0 \le \theta \le \tfrac12\pi,$$

and the function is very simple, $F(r, \theta) = \tfrac12\theta$.

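The double integral that represents the volume is therefore

$$V = \iint_D \tfrac12\theta\,dA = \int_0^1 \int_0^{\pi/2} \tfrac12\theta\, r\,d\theta\,dr = \left(\int_0^1 r\,dr\right)\left(\int_0^{\pi/2} \tfrac12\theta\,d\theta\right) = \tfrac12 \cdot \frac{\pi^2}{16} = \frac{\pi^2}{32}.$$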
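To see how much the change of variables helps, one can approximate both the Cartesian integral and the polar integral numerically. The sketch below uses a hypothetical midpoint-rule helper `midpoint_iterated` (an assumption for illustration, not from the notes). For x > 0, `math.atan2(y, x)` equals arctan(y/x), i.e. the polar angle θ.

```python
import math

def midpoint_iterated(f, a, b, lower, upper, n=400):
    """Midpoint-rule approximation of int_a^b int_{lower(s)}^{upper(s)} f(s, t) dt ds."""
    hs = (b - a) / n
    total = 0.0
    for i in range(n):
        s = a + (i + 0.5) * hs
        lo, hi = lower(s), upper(s)
        ht = (hi - lo) / n
        total += hs * ht * sum(f(s, lo + (j + 0.5) * ht) for j in range(n))
    return total

# Cartesian form: x outside, y from 0 to sqrt(1 - x^2), integrand (1/2) arctan(y/x)
V_cart = midpoint_iterated(lambda x, y: 0.5 * math.atan2(y, x), 0.0, 1.0,
                           lambda x: 0.0, lambda x: math.sqrt(1.0 - x * x))
# Polar form: r outside, theta from 0 to pi/2, integrand F(r, theta) * r
V_polar = midpoint_iterated(lambda r, th: 0.5 * th * r, 0.0, 1.0,
                            lambda r: 0.0, lambda r: math.pi / 2)
print(V_cart, V_polar, math.pi ** 2 / 32)
```

With sample points at midpoints, the polar sum is exact for the integrand ½θr (it is linear in each variable separately), so it reproduces π²/32 ≈ 0.3084 up to rounding; the Cartesian sum only approaches this value as n grows, because arctan(y/x) changes rapidly near the origin.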


3. Problems

1. Compute these iterated integrals:

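(a) $\int_0^1 \int_0^4 x\,dy\,dx$ •

(b) $\int_0^1 \int_0^4 x\,dx\,dy$ •

(c) $\int_{-1}^1 \int_0^{x^2} dy\,dx$ •

(d) $\int_0^\pi \int_0^y \frac{\sin y}{y}\,dx\,dy$ •

(e) $\int_0^\pi \int_0^\theta \frac{\sin\theta}{\theta}\,dr\,d\theta$ •

(f) $\int_0^1 \int_0^{\sqrt{1-x^2}} dy\,dx$ •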

2. What is wrong with the iterated integral

$$\int_x^1 \left\{ \int_0^1 \sin(\pi x)\,dx \right\} dy\ ?$$

Is the answer a number, or does it depend on x or y? •

3. (a) Is the following true or false? For any two functions f(x) and g(y) one has

$$\int_0^1 \int_0^2 f(x) g(y)\,dx\,dy = \left(\int_0^1 f(x)\,dx\right) \cdot \left(\int_0^2 g(y)\,dy\right).$$

Explain your answer (if you claim “true” give a proof, if you claim “false” give a counterexample). •

(b) Is the following true or false? For any two functions f(x) and g(y) one has

$$\int_0^2 \int_0^1 f(x) g(y)\,dy\,dx = \left(\int_0^1 f(x)\,dx\right) \cdot \left(\int_0^2 g(y)\,dy\right).$$

Explain your answer (no, this is not the same question as before. Look at the integration bounds.) •

(c) Suppose D is the unit disc, D = {(x, y) : x² + y² < 1}. True or false: for any two functions f(x) and g(y) one has

$$\iint_D f(x) g(y)\,dx\,dy = \left(\int_{-1}^1 f(x)\,dx\right) \cdot \left(\int_{-1}^1 g(y)\,dy\right).$$

Again, explain your answer. •

4. Answer the question posed in Figure 6. •

5. Compute the following double integrals. In each case sketch the domain of integration and show which iterated integral you must compute to find the given double integral.

(a) $\iint_D (1 + x)\,dA$, $D = \{(x, y) : 0 \le x \le 2,\ 0 \le y \le 4\}$. •

(b) $\iint_D (x + y)\,dA$, $D = \{(x, y) : |x| \le 1,\ 0 \le y \le 4\}$. •

(c) $\iint_D xy\,dA$, $D = \{(x, y) : 0 \le x \le y,\ 1 \le y \le 2\}$. •

(d) $\iint_D dA$, $D = \{(x, y) : \tfrac12 y^2 \le x \le \sqrt{y},\ 0 \le y \le 1\}$. •

(e) $\iint_D \frac{x^2}{y^2}\,dA$, $D = \{(x, y) : 1 \le x \le 2,\ 1 \le y \le x\}$. •

(f) $\iint_D \frac{y}{e^x}\,dA$, $D = \{(x, y) : 0 \le x \le 1,\ 0 \le y \le x^2\}$. •

(g) $\iint_D x\cos y\,dA$, $D = \{(x, y) : 0 \le x \le \sqrt{\pi/2},\ 0 \le y \le x^2\}$. •

(h) $\iint_D \sqrt{x^3 + 1}\,dA$, $D = \{(x, y) : 0 \le y \le 1,\ \sqrt{y} \le x \le 1\}$. •

(i) $\iint_D y\sin(x^2)\,dA$, $D = \{(x, y) : 0 \le y \le 1,\ y^2 \le x \le 1\}$. •

(j) $\iint_D x\sqrt{1 + y^2}\,dA$, $D = \{(x, y) : 0 \le x \le 1,\ x^2 \le y \le 1\}$. •

(k) $\iint_D 2\sqrt{1 - x^2}\,dA$, where D is the triangle bounded by the y axis, the line y = 1 and the line y = x. •

6. Find the volumes of the following regions by computing a double integral.

(a) the region bounded by z = x² + y² and z = 4. •

(b) the region in the first octant bounded by y² = 4 − x and y = 2z. •

(c) the region in the first octant bounded by y² = 4x, 2x + y = 4, z = y, and y = 0. •

(d) the region in the first octant bounded by x + y + z = 9, 2x + 3y = 18, and x + 3y = 9. •

(e) the region in the first octant bounded by x² + y² = a² and z = x + y. •

(f) the region bounded by x² + y² = 4z and z = 2. •

(g) the region bounded by z = x² + y² and z = y. •

7. The average value of a function f(x, y) over a domain D is by definition

$$\text{average of } f \text{ over } D = \frac{\iint_D f(x, y)\,dA}{\text{area of } D}.$$

Find the average value of $f(x, y) = e^y\sqrt{x + e^y}$ on the rectangle with vertices (0, 0), (4, 0), (4, 1) and (0, 1).

8. Suppose f(x) is a positive function defined on an interval a ≤ x ≤ b. Let A be the area under the graph of y = f(x) (a ≤ x ≤ b), and let B be the area under the graph of y = f(x)² (a ≤ x ≤ b).

(a) Compute $\int_a^b \int_0^{f(x)} dy\,dx$. •

(b) Compute $\int_a^b \int_0^{f(x)} y\,dy\,dx$. •

9. Let V be the volume under the graph of the function

$$z = \frac{2xy}{x^2 + y^2},$$

above the region

$$D = \{(x, y) : x \ge 0,\ y \ge 0,\ x^2 + y^2 \le 1\}.$$

(a) Write an iterated integral for the volume V, using Cartesian coordinates. (You don’t have to compute the integral you get.) •

(b) Compute V using polar coordinates. •

10. Let V be the volume under the graph of z = xy above the domain

$$D = \{(x, y) : x \ge 0,\ y \ge 0,\ x^2 + y^2 \le 4\}.$$

Try to draw the region D, and the graph of z = xy above D.

(a) Use Cartesian coordinates to compute V. (Hint: this is similar to part (a) of the previous problem, but the integral in this problem isn’t as bad.)

(b) Use polar coordinates to compute V.

4. Triple integrals

Instead of integrating over two-dimensional regions in the plane, we can also integrate over three-dimensional regions in space. In this section we will see the definition, how to compute triple integrals using iterated integrals, and some examples of how triple integrals come up in the real world.

4.1. Definition, and how to compute triple integrals. The definition of triple integrals follows the same pattern as that of double integrals. Let D be some three dimensional region in three dimensional space: D could be a cube, a “block,” a cylinder, a sphere, or in general, the region enclosed by some surface. A particular case is that of a rectangular block, which is a region defined by the inequalities

$$a_x \le x \le b_x, \quad a_y \le y \le b_y, \quad a_z \le z \le b_z. \tag{127}$$

To define the triple integral of a function w = f(x, y, z) over such a region we consider a partition of D into many smaller pieces. We number the pieces 1, 2, · · · , N and for each j we choose a sample point $(x_j, y_j, z_j)$ from the jth partition piece. Let $\Delta V_j$ be the volume of the jth partition piece and consider the Riemann sum

$$f(x_1, y_1, z_1)\Delta V_1 + \cdots + f(x_N, y_N, z_N)\Delta V_N = \sum_{j=1}^{N} f(x_j, y_j, z_j)\Delta V_j.$$

If these Riemann sums converge to some number as we choose finer and finer partitions, then this limit is called the triple integral, or volume integral, of f over D. The notation we use is

$$\iiint_D f(x, y, z)\,dV = \lim_{\substack{\text{as the partition}\\ \text{gets finer}}} \sum_{j=1}^{N} f(x_j, y_j, z_j)\Delta V_j. \tag{128}$$

If the domain D is a rectangular block, defined by the inequalities (127), then the triple integral can be computed by an iterated integral

$$\iiint_D f(x, y, z)\,dV = \int_{a_z}^{b_z} \int_{a_y}^{b_y} \int_{a_x}^{b_x} f(x, y, z)\,dx\,dy\,dz. \tag{129}$$

This follows from the same kind of arguments that allowed us to turn a double integral into an iterated integral in § 2.3.

We can use (129) to compute a triple integral over any three dimensional block. To compute triple integrals over more general domains we can use the same slicing method as in § 2.6. If the domain D is given by inequalities of the type

$$a_x(y, z) \le x \le b_x(y, z), \quad a_y(z) \le y \le b_y(z), \quad a_z \le z \le b_z, \tag{130}$$

where $a_y(z)$, $b_y(z)$, $a_x(y, z)$, and $b_x(y, z)$ now are functions rather than constants, then the triple integral of a function f(x, y, z) over D is given by

$$\iiint_D f(x, y, z)\,dV = \int_{a_z}^{b_z} \int_{a_y(z)}^{b_y(z)} \int_{a_x(y,z)}^{b_x(y,z)} f(x, y, z)\,dx\,dy\,dz.$$

4.2. Example – the integral of f(x, y, z) = x² + y² over a rectangular block. Let’s compute the integral of f(x, y, z) = x² + y² over the domain

$$D = \{(x, y, z) : 0 \le x \le A,\ 0 \le y \le B,\ 0 \le z \le C\},$$

where A, B, and C are the sides of the block. The integral of f over D is

$$\iiint_D (x^2 + y^2)\,dV = \int_0^C \int_0^B \int_0^A (x^2 + y^2)\,dx\,dy\,dz.$$


It is a good idea to write such an integral as

$$\iiint_D (x^2 + y^2)\,dV = \int_{z=0}^{C} \int_{y=0}^{B} \int_{x=0}^{A} (x^2 + y^2)\,dx\,dy\,dz = \tfrac13 ABC(A^2 + B^2),$$

to emphasize which integral goes with which variable. The computation goes in three steps (there are three integrals). The innermost integral is

$$\int_0^A (x^2 + y^2)\,dx = \tfrac13 A^3 + y^2 A.$$

Next we integrate this with respect to y:

$$\int_0^B \int_0^A (x^2 + y^2)\,dx\,dy = \int_0^B \left(\tfrac13 A^3 + y^2 A\right) dy = \tfrac13 A^3 B + \tfrac13 A B^3.$$

Finally, we integrate with respect to z:

$$
\begin{aligned}
\int_0^C \int_0^B \int_0^A (x^2 + y^2)\,dx\,dy\,dz &= \int_0^C \left(\tfrac13 A^3 B + \tfrac13 A B^3\right) dz \\
&= \tfrac13 A^3 BC + \tfrac13 AB^3 C \\
&= \tfrac13 ABC(A^2 + B^2).
\end{aligned}
$$
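The definition (128) can be tried out directly on this example. The sketch below cuts the block into n³ equal boxes, takes the center of each box as its sample point, and compares the Riemann sum with ⅓ABC(A² + B²). The function name `riemann_sum_block` and the test dimensions A = 1, B = 2, C = 3 are arbitrary choices, not from the notes; for these dimensions the exact value is 10.

```python
def riemann_sum_block(f, A, B, C, n):
    """Riemann sum (128) for the block [0,A]x[0,B]x[0,C], cut into n^3
    equal boxes, sampling f at the center of each box."""
    dx, dy, dz = A / n, B / n, C / n
    dV = dx * dy * dz
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        for j in range(n):
            y = (j + 0.5) * dy
            for k in range(n):
                z = (k + 0.5) * dz
                total += f(x, y, z) * dV
    return total

A, B, C = 1.0, 2.0, 3.0                    # arbitrary test dimensions
exact = A * B * C * (A**2 + B**2) / 3      # the result of example 4.2
for n in (5, 10, 20, 40):
    approx = riemann_sum_block(lambda x, y, z: x**2 + y**2, A, B, C, n)
    print(n, approx, exact - approx)
```

For this particular integrand one can check that the midpoint Riemann sum falls short of the exact value by exactly ABC(A² + B²)/(12n²), so the printed error (2.5/n² here, up to rounding) drops by a factor of 4 each time n is doubled.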

4.3. Example of setting up a triple iterated integral: the integral of e^x over the unit sphere. Suppose we needed to know the integral

$$\iiint_D e^x\,dV,$$

where the domain

$$D = \{(x, y, z) : x^2 + y^2 + z^2 \le 1\}$$

is the (solid) unit sphere. By slicing the domain D in the x, y, and z directions we can describe it following the general template (130):

• z can take any value between −1 and +1,
• for given z the coordinate y can be anything between $-\sqrt{1 - z^2}$ and $+\sqrt{1 - z^2}$,
• for given y and z the remaining coordinate x can have all values from $-\sqrt{1 - y^2 - z^2}$ to $+\sqrt{1 - y^2 - z^2}$. (See Figure 12.)

This lets us write the triple integral as an iterated integral:

$$\iiint_D e^x\,dV = \int_{-1}^{1} \int_{-\sqrt{1-z^2}}^{\sqrt{1-z^2}} \int_{-\sqrt{1-y^2-z^2}}^{\sqrt{1-y^2-z^2}} e^x\,dx\,dy\,dz.$$

Even though it can be computed, this is not an easy integral; the point of this example was to find the integration bounds in the iterated integral.
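The bounds found above can be checked by evaluating the iterated integral numerically. In the sketch below the helper names and the midpoint rule are illustrative choices, not from the notes; each of the three nested integrals is a midpoint sum whose limits are exactly the ones in the three bullets. For comparison: slicing the ball perpendicular to the x-axis instead, each slice is a disc of area π(1 − x²), so the integral also equals $\int_{-1}^{1} \pi(1 - x^2) e^x\,dx = 4\pi/e \approx 4.623$.

```python
import math

def midpoint(g, lo, hi, n):
    """Composite midpoint rule for int_lo^hi g(t) dt with n subintervals."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

def exp_x_over_ball(n=60):
    """Iterated integral of e^x with the bounds from the three bullets:
    z in [-1, 1], y in [-sqrt(1-z^2), sqrt(1-z^2)],
    x in [-sqrt(1-y^2-z^2), sqrt(1-y^2-z^2)]."""
    def over_y(z):
        ymax = math.sqrt(1 - z * z)
        def over_x(y):
            # max(...) only guards against tiny negative values from rounding
            xmax = math.sqrt(max(0.0, 1 - y * y - z * z))
            return midpoint(math.exp, -xmax, xmax, n)
        return midpoint(over_x, -ymax, ymax, n)
    return midpoint(over_y, -1.0, 1.0, n)

approx = exp_x_over_ball()
print(approx, 4 * math.pi / math.e)
```

With n = 60 points per variable the result should be within about 0.01 of 4π/e; most of the error comes from the square-root behaviour of the y-integrand at the ends of each slice.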


Figure 12. Turning a triple integral over the unit sphere into an iterated integral. The horizontal gray disc contains all points with a given fixed value of z; the solid line in that disc contains all points at height z whose y coordinate is also fixed at a particular value. From this drawing we can see that z runs between −1 and +1; for any given z, the y coordinate runs between $-\sqrt{1 - z^2}$ and $+\sqrt{1 - z^2}$; for fixed y and z, the x coordinate can take any value between $-\sqrt{1 - y^2 - z^2}$ and $+\sqrt{1 - y^2 - z^2}$.

5. Why compute a Triple Integral?

5.1. The 4D-volume under a graph. Just as $\int_a^b f(x)\,dx$ is the area between the graph of the function y = f(x) and the interval [a, b] on the x-axis, and $\iint_D f(x, y)\,dA$ is the volume caught between the graph of z = f(x, y) and the domain D in the xy plane, there should be a similar description of $\iiint_D f(x, y, z)\,dV$. There is, but it requires some imagination: the graph of f is the set of points in four dimensional space whose coordinates (x, y, z, w) satisfy w = f(x, y, z), and the triple integral $\iiint_D f(x, y, z)\,dV$ is the “four dimensional volume” of the four dimensional region caught between the graph of f and the domain D in xyz-space. Of course, even though people will draw cartoon-like representations of the situation like this,

[Figure: a cartoon of the graph w = f(x, y, z). What appears as the “x-axis” in this drawing really is meant to represent the three dimensional “xyz-space”; the region between the graph and the horizontal axis is four dimensional.]

we cannot really visualize four dimensional volumes. Rather than telling us what the triple integral is, the interpretation “integral = volume” gives a definition of what “four dimensional volume” should be.

5.2. The average of a function over a domain D. There is a formula for the “average value of a function on a region.” The only rigorous definition for the “average” is just that formula, so we could simply state the formula and be done with it. Here it is: the average of a function w = f(x, y, z) over a region D is defined to be

$$\text{Average of } f \text{ over } D = \frac{1}{V_D} \iiint_D f(x, y, z)\,dV, \tag{131}$$

where $V_D$ is the volume of D.

There is however an intuitive derivation (a story) that justifies why we call this particular quantity the average. Understanding this derivation is at least as important as just knowing the formula (131).

Why (131) deserves to be called the average. What is an average? If we have finitely many numbers $a_1, \dots, a_N$ then their average is just

$$\text{Average} = \frac{a_1 + \cdots + a_N}{N}.$$

If we only have finitely many points $(x_1, y_1, z_1), \dots, (x_N, y_N, z_N)$ in the region D then the average function value at these points is

$$\text{Average function value at given points} = \frac{f(x_1, y_1, z_1) + \cdots + f(x_N, y_N, z_N)}{N}.$$

To define the average of a function over a region D, we cannot simply add all the function values of f at all the points in D because there are infinitely many such points. Instead, we sprinkle the region D with a very large but finite number of points, and calculate the average value of the function at all these points. If the points are evenly distributed, and if there are enough of them, then the average value of the function at the dots should be a good approximation for the average value of the function on the region. E.g. the average of our function over the region on the left should be approximately the average of the function at the dots drawn in that region.

[Figure: the region D sprinkled with dots, and partitioned into pieces D₁, D₂, D₃, . . . , Dₙ.]

To approximate the average at the dots we partition the region into many small pieces, which we label $D_1, \dots, D_n$. We write $\Delta V_j$ for the volume of the jth piece $D_j$, and $V_D$ for the volume of the whole region D. We assume that the pieces are so small that the function is practically constant in each piece.

Since the dots are evenly distributed over D, the number of dots in the jth partition piece is proportional to the volume of that piece, so

$$\frac{N_j}{N} \approx \frac{\Delta V_j}{V_D}, \tag{132}$$

where $N_j$ is the number of dots in the jth piece, and N is the total number of dots.

To compute the average value of f at all the dots we begin with

$$\text{sum of } f \text{ at all dots} = \sum_j \big(\text{sum of } f \text{ at all dots in the } j\text{th piece}\big).$$

If we pick a sample point $(x_j, y_j, z_j)$ in each piece $D_j$, then, since the pieces are assumed to be small, we may approximate the function value at every dot in $D_j$ by the value of the function at the sample point. There are $N_j$ dots in $D_j$, so we find that

$$\text{sum of } f \text{ at all dots} \approx \sum_j N_j f(x_j, y_j, z_j).$$

Using (132) we therefore find that the average function value at all the dots is

$$
\begin{aligned}
\frac{\text{sum of } f \text{ at all dots}}{\text{number of dots in } D} &\approx \frac1N \sum_j N_j f(x_j, y_j, z_j) \\
&\approx \sum_j \frac{\Delta V_j}{V_D} f(x_j, y_j, z_j) \\
&= \frac{1}{V_D} \sum_j f(x_j, y_j, z_j)\,\Delta V_j \\
&\approx \frac{1}{V_D} \iiint_D f(x, y, z)\,dV.
\end{aligned}
$$

This is exactly how we had defined the average of f over the region D. Keep in mind that the above is not a proof of the equation (131), but rather an intuitive justification for taking (131) as the definition of the average.

5.3. Example 4.2 continued. In §4.2 we computed the volume integral of f(x, y, z) = x² + y² over the rectangular block D given by 0 ≤ x ≤ A, 0 ≤ y ≤ B, 0 ≤ z ≤ C, and we found

$$\iiint_D (x^2 + y^2)\,dV = \tfrac13 ABC(A^2 + B^2).$$

Since the volume of the block is ABC, the average value of f(x, y, z) = x² + y² over the block D is

$$\text{Average of } x^2 + y^2 \text{ over } D = \frac{\tfrac13 ABC(A^2 + B^2)}{ABC} = \tfrac13 (A^2 + B^2).$$
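The “sprinkle the region with dots” story can be imitated directly. The sketch below substitutes uniformly distributed random points for the evenly distributed dots (a Monte Carlo estimate, which is a swapped-in technique, not the notes’ method); the function name `average_by_dots` and the dimensions A = 1, B = 2, C = 3 are hypothetical choices. The result should be close to (A² + B²)/3 = 5/3.

```python
import random

def average_by_dots(f, A, B, C, n_dots, seed=0):
    """Estimate the average of f over the block [0,A]x[0,B]x[0,C] by
    sprinkling n_dots uniformly distributed random points in it and
    averaging the function values at those points."""
    rng = random.Random(seed)              # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(n_dots):
        x, y, z = rng.uniform(0, A), rng.uniform(0, B), rng.uniform(0, C)
        total += f(x, y, z)
    return total / n_dots

A, B, C = 1.0, 2.0, 3.0                    # arbitrary block dimensions
estimate = average_by_dots(lambda x, y, z: x**2 + y**2, A, B, C, 200_000)
print(estimate, (A**2 + B**2) / 3)
```

Random points are only evenly distributed on average, so the estimate fluctuates from run to run (with different seeds); with 200,000 points its typical error is a few thousandths.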

5.4. Densities. If a substance (for example, think of a gas in a cylinder) occupies a certain region D in space, then its density µ is defined to be

$$\mu = \text{density} = \frac{\text{mass in } D}{\text{volume of } D}.$$

If the substance is evenly distributed throughout the region D, then the mass-to-volume ratio will be the same for any subregion D′. Thus the mass contained in any smaller region D′ will be proportional to the volume of that region:

mass in D′ = µ × volume of D′.

When the substance is not distributed evenly this proportionality will no longer hold, and we say that “the density varies from point to point.” If we now want to give a precise definition of the density at any point P, we run into the same kind of problem we had in first semester calculus when we tried to define the slope of a tangent, or the velocity at one moment in time. Namely, the “density at P” should be the mass of the substance at P divided by the volume of the point P; but there is no mass at one point, and the volume of one point is zero, so this leads to the meaningless expression density = 0/0. The way out of this is to calculate the average density for very small regions D′ surrounding the point P, and to declare those as approximations of the density at P. To get a better approximation we should choose a smaller region D′.

This is summarized in the following formula,

$$\mu(x, y, z) = \lim_{D' \searrow P} \frac{\text{mass in } D'}{\text{volume of } D'}, \tag{133}$$

where “D′ ↘ P” means that we are taking the limit as the region D′ shrinks to the point P.


Figure 13. Density of gas in a container; in these drawings most of the gas concentrates in the bottom of the container. Left: the total mass in two regions, D′ and D′′, depends on their location, even though they have the same shape and volume. Right: to define the density at a point P, we compute the average density over smaller and smaller regions D₁, D₂, . . . which shrink to the given point P. If the average densities converge to some number, then we call that limit the density at P.

5.5. Mass as integral of the density. Suppose the density of a substance is given to us as a function µ(x, y, z). How do we find the total mass of the substance present in a particular region D? The answer is in terms of a triple integral, and the way this integral comes about is typical for a large number of applications of double and triple integrals.

To find the total mass present in a region D we partition it into many small pieces, and compute the mass in each small piece. Consider one such piece. If it is small enough, then we may assume that the density µ(x, y, z) is nearly constant in that small piece, and hence the mass in one small piece will be approximately

mass in a piece of the partition ≈ µ(x, y, z) × ∆V.

Here (x, y, z) is a sample point in the partition piece, and ∆V is the volume of the piece. So when we compute the total mass by adding all the masses of the partition pieces, each piece in the partition contributes one term of the form µ(x, y, z)∆V. Our formula for the total mass is therefore a Riemann sum for the following triple integral:

$$\text{total mass} = \iiint_D \mu(x, y, z)\,dV. \tag{134}$$


5.6. Example: air in the atmosphere. How much air is there in the atmosphere in a vertical column of height H above one square meter?

According to one model of the atmosphere, the density of the atmosphere decays exponentially with height, so that

$$\mu(x, y, z) = C e^{-z/L} \quad (\text{kg/m}^3), \tag{135}$$

where z is the height above sea level, and x, y are horizontal coordinates. The constant C is the density of air at sea level, and L is another constant (L must have the units of length).

[Figure: the 1 × 1 base square in the xy-plane and the air column of height H above it.]

We adapt our coordinates to the 1 × 1 square which is the base of the air column whose mass we are to compute, namely, we let the origin be one of the corners of the square, and we let the sides at this corner be the x and y axes. The region occupied by the air column is then a rectangular block

D = {(x, y, z) : 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, 0 ≤ z ≤ H}

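and the mass of the air in this block is

$$
\begin{aligned}
M &= \iiint_D \mu(x, y, z)\,dV \\
  &= \int_0^H \int_{y=0}^{1} \int_{x=0}^{1} C e^{-z/L}\,dx\,dy\,dz \\
  &= \int_0^H C e^{-z/L}\,dz \\
  &= LC\left(1 - e^{-H/L}\right).
\end{aligned}
$$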

To get the mass of all the air above our 1× 1 square, we let H →∞ which leads to

Total Mass = LC.
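The closed form M = LC(1 − e^{−H/L}) can be compared with a Riemann sum for the triple integral. In the sketch below the values C = 1.2 kg/m³ and L = 8000 m are assumed round numbers chosen only for illustration (the notes leave C and L unspecified), and the function name `air_column_mass` is hypothetical.

```python
import math

def air_column_mass(C, L, H, n=10_000):
    """Midpoint-rule approximation of the triple integral for the air column
    over a 1 x 1 square. The x and y integrals are each over [0, 1] and the
    integrand does not depend on x or y, so they contribute a factor 1."""
    dz = H / n
    return sum(C * math.exp(-(k + 0.5) * dz / L) for k in range(n)) * dz

C = 1.2      # assumed sea-level density in kg/m^3 (illustrative value)
L = 8000.0   # assumed length constant in m (illustrative value)
for H in (1000.0, 8000.0, 50000.0):
    print(H, air_column_mass(C, L, H), L * C * (1 - math.exp(-H / L)))
print("H -> infinity:", L * C)
```

As H grows the computed mass approaches LC, in agreement with the limit H → ∞ above.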

5.7. The moment of inertia of a solid about an axis of rotation. An object of mass m that moves with velocity v has kinetic energy given by

$$K = \tfrac12 m v^2. \tag{136}$$

If a solid object is rotating about an axis, then it also has kinetic energy, but the formula (136) does not apply, because different parts of the solid will be moving with different velocities. The problem is that v is not a constant: it varies from place to place, and thus it is a function of where we measure the velocity.

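[Figure: the kinetic energy of a whirling potato. The solid rotates with angular velocity ω (radians/second); a particle at distance r from the axis moves with speed v = ωr.]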

To compute the kinetic energy of a rotating solid we break it up into small pieces: if each of the pieces is small enough, then all the particles in that small piece will have nearly the same velocity. If the object is rotating with angular velocity ω about an axis, then the velocity of a particle in the object is given by v = ωr, where r is the distance from the particle to the axis of rotation (in one unit of time the particle moves along an arc of radius r and angle ω, whose length is ωr). On the other hand the mass of such a small piece will be µ · ∆V, where µ is the density of the material (which we assume to be constant here), and ∆V is the volume of the small piece. Therefore if we break the object into many small pieces (partition the object), the kinetic energy of any one of the small pieces is

$$\text{K.E. of one piece} \approx \tfrac12 \mu (\omega r)^2 \Delta V.$$

Adding the kinetic energies of all the small pieces again gives us a Riemann sum for an integral, and this leads us to the formula

$$K = \iiint_D \tfrac12 \mu \omega^2 r^2\,dV = \tfrac12 M \omega^2, \tag{137}$$

where

$$M \stackrel{\text{def}}{=} \mu \iiint_D r^2\,dV \tag{138}$$

is called the moment of inertia of the given object about the given axis of rotation.

5.8. Example. Compute the moment of inertia of a wooden rectangular block

$$D = \{(x, y, z) : 0 \le x \le A,\ 0 \le y \le B,\ 0 \le z \le C\}$$

around the z axis. The density of the wood is µ. The integral we have to calculate is

$$M = \mu \iiint_D r^2\,dV.$$

To compute this we have to figure out what r is: since r is the distance from the point (x, y, z) to the axis of rotation, and since this axis is the z-axis, we get, by Pythagoras, r² = x² + y². Therefore we have to compute

$$M = \mu \iiint_D (x^2 + y^2)\,dV.$$

We have already computed this integral in §4.2, where we found that

$$M = \tfrac13 \mu ABC(A^2 + B^2).$$

6. Integration in special coordinate systems

Many volume integrals arise in situations where there is a lot of symmetry. When this happens Cartesian “x, y, z” coordinates are usually not the best choice to compute the integral. There are many different coordinates besides Cartesian. In this section we will look at the two most commonly used coordinate systems. They can both be thought of as three-dimensional variations on polar coordinates in the plane.

6.1. Cylindrical coordinates. Let P be some point in three dimensional space. If we provide the z coordinate as well as the polar coordinates (r, θ) of the projection of P on the xy plane, then the location of P is completely determined. See the drawing on the left in Figure 14. From this drawing it is easy to derive the relation between cylindrical and Cartesian coordinates:

$$x = r\cos\theta, \qquad y = r\sin\theta, \qquad z = z. \tag{139}$$


Figure 14. Left: in cylindrical coordinates we specify the location of a point by its height z above the xy-plane, and the polar coordinates (r, θ) of its projection on the xy-plane. Right: in spherical coordinates we specify the location of a point by its distance ρ to the origin, the polar angle θ of its projection on the xy-plane, and the angle φ between the z-axis and the line segment from the point to the origin.

6.2. Spherical coordinates. We can also specify the location of a point P by providing these three numbers:

– the distance ρ from P to the origin,
– the angle φ between the positive z-axis and the line from the origin to the point P,
– the polar angle θ of the projection of P onto the xy-plane.

See the drawing on the right in Figure 14, from which we can derive the following relation between the spherical coordinates (ρ, φ, θ) and the Cartesian coordinates (x, y, z) of a point:

$$x = \rho\sin\varphi\cos\theta, \qquad y = \rho\sin\varphi\sin\theta, \qquad z = \rho\cos\varphi. \tag{140}$$

DO NOT MEMORIZE. What if the north pole happens to be on the x-axis? Can you still relate spherical and Cartesian coordinates?

The angle φ takes values between 0 and +π, with φ = 0 on the north pole, and φ = π on the south pole. The polar angle θ can take all values from 0 to 2π, or more generally any value in some interval of length 2π (like −π < θ < π).
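Formulas (139) and (140) translate directly into code. The sketch below (function names are illustrative, not from the notes) converts cylindrical and spherical coordinates to Cartesian ones, and inverts (140) using `math.acos` for φ and `math.atan2` for θ. This inverse is one possible choice, assuming 0 ≤ φ ≤ π and θ taken in the range −π to π returned by `math.atan2`.

```python
import math

def cylindrical_to_cartesian(r, theta, z):
    """Equation (139)."""
    return r * math.cos(theta), r * math.sin(theta), z

def spherical_to_cartesian(rho, phi, theta):
    """Equation (140): phi is measured from the positive z-axis,
    theta is the polar angle of the projection on the xy-plane."""
    return (rho * math.sin(phi) * math.cos(theta),
            rho * math.sin(phi) * math.sin(theta),
            rho * math.cos(phi))

def cartesian_to_spherical(x, y, z):
    """One inverse of (140), with 0 <= phi <= pi and theta from atan2."""
    rho = math.sqrt(x * x + y * y + z * z)
    if rho == 0.0:
        return 0.0, 0.0, 0.0               # the origin: phi and theta are undefined
    phi = math.acos(max(-1.0, min(1.0, z / rho)))  # clamp guards against rounding
    theta = math.atan2(y, x)
    return rho, phi, theta

# Round trip: a point with rho = 2, phi = pi/3, theta = 3*pi/4
p = spherical_to_cartesian(2.0, math.pi / 3, 3 * math.pi / 4)
print(p, cartesian_to_spherical(*p))
```

The round trip should return (2, π/3, 3π/4) up to rounding. On the z-axis the polar angle θ is not determined by the point, and this sketch simply returns whatever `math.atan2` gives there.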

6.3. Triple integral in cylindrical coordinates. Suppose we wanted to find a triple integral

$$\iiint_D f(x, y, z)\,dV$$

over a domain D which is a “rectangular block” in cylindrical coordinates, i.e. suppose D is given by the inequalities

$$r_0 \le r \le r_1, \quad z_0 \le z \le z_1, \quad \theta_0 \le \theta \le \theta_1.$$

Let’s try to write it as an iterated integral. To do this we partition the region D into many small pieces by dividing the interval r₀ ≤ r ≤ r₁ into pieces of length ∆r, the interval z₀ ≤ z ≤ z₁ into pieces of length ∆z, and the interval θ₀ ≤ θ ≤ θ₁ into pieces of length ∆θ. The whole region D then gets broken up into small regions in which the radius is constrained to lie in the interval (r, r + ∆r), the height to the interval (z, z + ∆z) and the polar angle to (θ, θ + ∆θ). See Figure 15.

Figure 15. The cylindrical volume element.

Such a small region is approximately a rectangular block, so we can approximate its volume by multiplying the lengths of its sides, which leads to

∆V ≈ ∆r × r∆θ × ∆z.

Arguing as in the case of polar coordinates (see §2.8) we get the following iterated integral formula for a triple integral over a rectangular block in cylindrical coordinates:

$$\iiint_D f(x, y, z)\,dV = \int_{r_0}^{r_1} \int_{z_0}^{z_1} \int_{\theta_0}^{\theta_1} f(r\cos\theta, r\sin\theta, z)\, r\,d\theta\,dz\,dr. \tag{141}$$

If the function to be integrated is given in terms of the Cartesian coordinates x, y, z, then we first have to rewrite it in terms of cylindrical coordinates using (139).
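The role of the factor r in (141) can be seen numerically. The sketch below (a hypothetical helper using midpoint sums, not from the notes) integrates f(x, y, z) = x² + y², which becomes F = r² by (139), over the cylinder 0 ≤ r ≤ 1, 0 ≤ z ≤ 2, 0 ≤ θ ≤ 2π, once with and once without the factor r. The correct value is $\int_0^1 \int_0^2 \int_0^{2\pi} r^2 \cdot r\,d\theta\,dz\,dr = \pi$.

```python
import math

def cylindrical_block_integral(F, r0, r1, z0, z1, t0, t1, n=40, jacobian=True):
    """Midpoint-rule version of (141): sum F(r, theta, z) * r * dr * dz * dtheta
    over a cylindrical block. With jacobian=False the factor r is left out,
    to show what goes wrong without it."""
    dr, dz, dt = (r1 - r0) / n, (z1 - z0) / n, (t1 - t0) / n
    total = 0.0
    for i in range(n):
        r = r0 + (i + 0.5) * dr
        w = r if jacobian else 1.0
        for j in range(n):
            z = z0 + (j + 0.5) * dz
            for k in range(n):
                t = t0 + (k + 0.5) * dt
                total += F(r, t, z) * w * dr * dz * dt
    return total

F = lambda r, theta, z: r ** 2     # f(x, y, z) = x^2 + y^2 in cylindrical coordinates
with_r = cylindrical_block_integral(F, 0, 1, 0, 2, 0, 2 * math.pi)
without_r = cylindrical_block_integral(F, 0, 1, 0, 2, 0, 2 * math.pi, jacobian=False)
print(with_r, without_r, math.pi)
```

The version with the factor r should be close to π ≈ 3.1416, while dropping r gives approximately 4π/3 ≈ 4.19, which is the value of a different integral and not the one we want.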

6.4. Triple integral in spherical coordinates. A spherical block is a region D which in spherical coordinates is given by the inequalities

$$\rho_0 \le \rho \le \rho_1, \quad \theta_0 \le \theta \le \theta_1, \quad \varphi_0 \le \varphi \le \varphi_1.$$

To integrate a function over such a block we divide it into many small spherical blocks. In each of these blocks ρ increases by ∆ρ, θ by ∆θ, and φ by ∆φ. See Figure 16. Any sufficiently small spherical block is approximately rectangular, and we can therefore compute its volume by multiplying the lengths of its sides. If we carefully look at the drawing on the right in Figure 16, then we find that

∆V ≈ ρ∆φ × ρ sin φ ∆θ × ∆ρ.

This leads us to the formula for integration in spherical coordinates:

$$\iiint_D f(x, y, z)\,dV = \int_{\rho_0}^{\rho_1} \int_{\theta_0}^{\theta_1} \int_{\varphi_0}^{\varphi_1} f(x, y, z)\, \rho^2 \sin\varphi\,d\varphi\,d\theta\,d\rho \tag{142}$$

As in the cases of polar coordinates and cylindrical coordinates we first have to express the function f(x, y, z) in terms of the variables ρ, φ, and θ, using (140).


Figure 16. Left: a number of small spherical blocks with varying φ but the same θ and ρ stacked together. Right: the volume of a small spherical block is approximately the product of the lengths of its sides, so ∆V ≈ ρ² sin φ ∆ρ ∆θ ∆φ.

6.5. Example – Rotational Kinetic energy of the Earth. The earth is roughly a sphere with radius a ≈ 6400 km, which rotates around its axis with angular velocity ω = 2π rad/day. Let’s assume that the density of the earth is constant, say, µ kg/m³.

To compute the total kinetic energy of the earth we can use formula (137), which tells us that we have to find

$$\iiint_{\text{Earth}} r^2\,dV,$$

where r is the distance to the earth’s axis of rotation. This integral is best computed using spherical coordinates, in which r = ρ sin φ (see Figure 14, right). Thus the kinetic energy is

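$$
\begin{aligned}
K &= \tfrac12 \mu \omega^2 \iiint_{\text{Earth}} \rho^2 \sin^2\varphi\,dV \\
  &= \tfrac12 \mu \omega^2 \int_{\varphi=0}^{\pi} \int_{\rho=0}^{a} \int_{\theta=0}^{2\pi} \rho^2 \sin^2\varphi\; \underbrace{\rho^2 \sin\varphi\,d\theta\,d\rho\,d\varphi}_{dV}.
\end{aligned}
$$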

After doing the θ and ρ integrals, we get

$$K = \pi \mu \omega^2 \frac{a^5}{5} \int_0^\pi \sin^3\varphi\,d\varphi.$$

This last integral can be done several ways (integrate by parts and find a reduction formula, or substitute u = −cos φ); it equals 4/3. The result is

$$K = \tfrac{4}{15} \pi \mu \omega^2 a^5.$$
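As a check on the last two steps, the spherical-coordinate integral can be evaluated numerically and compared with (4/15)πµω²a⁵. The sketch below uses unit values µ = ω = a = 1 (an arbitrary choice, since the notes leave µ unspecified); the function name is hypothetical, the θ-integral is done by hand (it gives 2π), and the ρ and φ integrals are done by midpoint sums.

```python
import math

def earth_kinetic_energy(mu, omega, a, n=60):
    """Midpoint-rule evaluation of
    K = (1/2) mu omega^2 int_0^pi int_0^a int_0^{2 pi} rho^2 sin^2(phi) * rho^2 sin(phi) dtheta drho dphi.
    The integrand does not depend on theta, so that integral is just 2*pi."""
    dphi, drho = math.pi / n, a / n
    total = 0.0
    for i in range(n):
        phi = (i + 0.5) * dphi
        for j in range(n):
            rho = (j + 0.5) * drho
            total += rho ** 4 * math.sin(phi) ** 3 * drho * dphi
    return 0.5 * mu * omega ** 2 * 2 * math.pi * total

mu, omega, a = 1.0, 1.0, 1.0   # unit values, just to compare with the closed form
print(earth_kinetic_energy(mu, omega, a), 4 / 15 * math.pi * mu * omega**2 * a**5)
```

The two printed numbers should agree to about three decimal places (both are close to 4π/15 ≈ 0.838).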

7. Problems


1. Describe the following sets (given in spherical coordinates):

(a) All points with φ = π/6. •

(b) All points with φ = π. •

(c) All points with φ = π/2. •

(d) All points with θ = π/2 •

2. Let E be the part of the sphere with radius a, centered at the origin, and contained in the first octant.

(a) Describe E in terms of spherical coordinates. •

(b) Describe E in terms of cylindrical coordinates. There are two possible answers, find both. •

3. Draw the volume elements in cylindrical and in spherical coordinates and show how these lead to dV = r dr dθ dz, and dV = ρ² sin φ dρ dθ dφ, respectively. •

4. Look at Figure 12. Suppose the grey disc has height z, and suppose all points on the line segment drawn in this disc have the same y-coordinate (y).

(a) What are the radii of the two circles drawn in the xy plane? •

(b) What are the coordinates of the two endpoints of the drawn line segment? •

5. The potential energy in a pile of honey.If you li� an object to height h above theground, then the potential energy you giveit is mgh, where m is the mass of the ob-ject, and g is the acceleration of gravitation(g ≈ 9.8m/sec2).

Suppose that a certain substance occu-pies a three dimensional region D (think ofhoney that has just been poured into a jar,see the drawing which gives a two dimen-sional side view of the situation).

heig

ht=f

(x, y

)

e potential energyof a small piece is

Δm × g × zz

Assuming the base of the jar is an A × Brectangle, the honey occupies the region

D ={

(x, y, z) :

0 ≤ x ≤ A, 0 ≤ y ≤ B,0 ≤ z ≤ f(x, y)

}.

Here f(x, y) is the height of the honeyabove the point (x, y) in the base of the jar.

(a) What is the potential energy of a small piece of the honey at (x, y, z)? (Assume the density of the honey is µ, and that this is constant.) Is your formula an exact formula?

(b) Write a volume integral for the total (gravitational) potential energy contained in the honey.

(c) Write your triple integral as an iterated integral, and show that you can do the integration in the z direction even if you don't know the height function f(x, y).

6. The kinetic energy in a tornado.

Assume an air mass is whirling around the z-axis, and assume that the wind velocity v(r) only depends on the distance from the z-axis.

Assume furthermore that the air has constant density µ.

(a) Derive a volume integral for the total kinetic energy of the air mass in a given region D. (The kinetic energy of an object of mass m and velocity v is ½mv². See the derivation of the moment of inertia in §5.7.)

(b) Suppose the velocity is actually given by $v(r) = 1/\sqrt{1+r^2}$ and the density is µ = 1.

Let D be the cylinder of height H and radius R, with the z-axis as its central axis. How much kinetic energy does the air mass in D have? (Hint: which coordinates should you use?)


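7. For each of the following iterated integrals, describe and draw the domain of integration. Then compute the integral.

(a) $\displaystyle\int_0^2\int_{-1}^{x^2}\int_1^{y} xyz\,dz\,dy\,dx.$

(b) $\displaystyle\int_0^1\int_0^{x}\int_0^{\ln y} e^{x+y+z}\,dz\,dy\,dx.$

(c) $\displaystyle\int_0^{\pi/2}\int_0^{\sin\theta}\int_0^{r\cos\theta} r^2\,dz\,dr\,d\theta.$

(d) $\displaystyle\int_0^{\pi}\int_0^{\sin\theta}\int_0^{r\sin\theta} r\cos^2\theta\,dz\,dr\,d\theta.$

(e) $\displaystyle\int_0^1\int_0^{y^2}\int_0^{x+y} x\,dz\,dx\,dy.$

(f) $\displaystyle\int_1^2\int_y^{y^2}\int_0^{\ln(y+z)} e^x\,dx\,dz\,dy.$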

8. Find the mass of a cube with edge length 2 and density equal to the square of the distance from one corner.

9. Find the mass of a cube with edge length 2 and density equal to the square of the distance from one edge.

If a mass is distributed throughout a region D with density µ(x, y, z), then, by definition, the coordinates (X, Y, Z) of the center of mass are

$$X \overset{\text{def}}{=} \frac{\iiint_D x\,\mu(x,y,z)\,dV}{\text{Mass of } D},$$

and similarly for Y and Z.

10. An object occupies the volume of the upper hemisphere of x² + y² + z² = 4 and has density z at (x, y, z). Find the center of mass.

11. An object occupies the volume of the pyramid with corners at (1, 1, 0), (1, −1, 0), (−1, −1, 0), (−1, 1, 0), and (0, 0, 2) and has density x² + y² at (x, y, z). Find the center of mass.

12. Let z = f(x, y) be a function on some domain D, and assume that D is split into two parts: D₊, on which f ≥ 0, and D₋, on which f(x, y) < 0.

Let V₊ be the volume of the region beneath the graph of f and above the domain D₊ in the xy-plane, and, similarly, let V₋ be the volume of the region above the graph of f and beneath the region D₋ in the xy-plane.

Reminder: volumes are never negative, so both V₊ ≥ 0 and V₋ ≥ 0.

(a) Express the following integrals in terms of V₊ and V₋:

$$I = \iint_{D_+} f(x,y)\,dA, \qquad J = \iint_{D_-} f(x,y)\,dA,$$

$$K = \iint_{D} f(x,y)\,dA, \qquad L = \iint_{D} |f(x,y)|\,dA.$$

(b) Find the region E in three dimensional space for which

$$\iiint_E (1 - x^2 - y^2 - z^2)\,dV$$

is a maximum. [Hint: Suppose E is some region; consider then what happens to the integral if you make E larger by adding on a piece.]

13. Evaluate

$$\int_0^1\int_0^x\int_0^{\sqrt{x^2+y^2}} \frac{(x^2+y^2)^{3/2}}{x^2+y^2+z^2}\,dz\,dy\,dx.$$

14. Evaluate $\iiint x^2\,dV$ over the interior of the cylinder x² + y² = 1 between z = 0 and z = 5.

15. Evaluate $\iiint xy\,dV$ over the interior of the cylinder x² + y² = 1 between z = 0 and z = 5.

16. Evaluate $\iiint z\,dV$ over the region above the xy-plane, inside x² + y² − 2x = 0 and under x² + y² + z² = 4.

17. Evaluate $\iiint yz\,dV$ over the region in the first octant, inside x² + y² − 2x = 0 and under x² + y² + z² = 4.

18. Evaluate $\iiint (x^2 + y^2)\,dV$ over the interior of x² + y² + z² = 4.

19. Evaluate $\iiint \sqrt{x^2+y^2}\,dV$ over the interior of x² + y² + z² = 4.

20. Find the mass of a right circular cone of height h and base radius a if the density is proportional to the distance from the base.

21. Find the mass of a right circular cone of height h and base radius a if the density is proportional to the distance from its axis of symmetry.

22. An object occupies the region inside the unit sphere at the origin, and has density equal to the distance from the x-axis. Find the mass.

23. An object occupies the region inside the unit sphere at the origin, and has density equal to the square of the distance from the origin. Find the mass.

24. An object occupies the region between the unit sphere at the origin and a sphere of radius 2 with center at the origin, and has density equal to the distance from the origin. Find the mass.

CHAPTER 7

Vector Calculus

1. Vector Fields

So far we have been studying the calculus of functions of several variables. Functions are used to describe things that have different values at different locations, e.g. quantities like temperature or density. Many other physical phenomena are described by vector fields, i.e. by vectors whose direction and magnitude can vary from place to place. Vector calculus is the theory of integration and differentiation of vector fields.

By definition, a vector field in the plane is a vector valued function of two variables: whereas an ordinary function of two variables gives us a number for each (x, y) in its domain, a vector field gives us a vector in the plane for each point (x, y) in its domain. Such a vector is determined by its two components, both of which are ordinary functions of (x, y). The notation we will use in this course is as follows:

$$\vec v(x,y) = \begin{pmatrix}P(x,y)\\ Q(x,y)\end{pmatrix} = P(x,y)\,\vec\imath + Q(x,y)\,\vec\jmath. \tag{143}$$

For a vector field in three dimensional space we must specify a vector $\vec v(x,y,z)$ at each point (x, y, z) in a three dimensional domain:

$$\vec v(x,y,z) = \begin{pmatrix}P(x,y,z)\\ Q(x,y,z)\\ R(x,y,z)\end{pmatrix} = P(x,y,z)\,\vec\imath + Q(x,y,z)\,\vec\jmath + R(x,y,z)\,\vec k.$$

To draw a vector field in the plane we would have to compute $\vec v(x,y)$ at lots of points and simply plot them. The more points we pick, the busier the picture gets. See for example Figure 1, in which the vector field

$$\vec v(x,y) = \begin{pmatrix}-y/(x^2+y^2)\\ x/(x^2+y^2)\end{pmatrix} \tag{144}$$

is drawn.

2. Examples of vector fields

2.1. Gradients as vector fields. We have already seen examples of vector fields before: the gradient of any function f(x, y) is a vector field,

$$\vec\nabla f(x,y) = \begin{pmatrix}f_x(x,y)\\ f_y(x,y)\end{pmatrix}.$$

In fact, the example (144) is such a vector field: it is the gradient of the polar angle θ. In §III.4.2 we saw that for x > 0 this angle is given by θ(x, y) = arctan(y/x), and we checked in problem III.15.16 that $\vec v$ given by (144) and shown below in Figure 1 satisfies $\vec v = \vec\nabla\theta$.
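The identity $\vec v = \vec\nabla\theta$ can also be checked numerically. The sketch below is our own illustration: it uses Python's `math.atan2` for the polar angle (which agrees with arctan(y/x) when x > 0), approximates the gradient by central differences with an arbitrarily chosen step `h`, and compares the result with formula (144) at a few sample points with x > 0.

```python
import math


def theta(x, y):
    """Polar angle of (x, y); math.atan2 returns it in (-pi, pi], equal to arctan(y/x) for x > 0."""
    return math.atan2(y, x)


def v(x, y):
    """The vector field (144)."""
    r2 = x * x + y * y
    return (-y / r2, x / r2)


def numerical_gradient(f, x, y, h=1e-6):
    """Central-difference approximation of (f_x, f_y) at (x, y)."""
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return (fx, fy)


for point in [(1.0, 0.5), (2.0, -1.0), (0.3, 0.7)]:
    print(point, v(*point), numerical_gradient(theta, *point))  # the two pairs agree closely
```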


Figure 1. A vector field in the plane. This vector field is

$$\vec v(x,y) = \vec\nabla\theta(x,y) = \frac{-y}{x^2+y^2}\,\vec\imath + \frac{x}{x^2+y^2}\,\vec\jmath.$$

In general, drawings of vector fields become messy in the region where the vectors are long, because they tend to overlap. Drawing a three dimensional vector field is challenging.

2.2. Fluid flow. Vector fields appear in various ways in physics. The easiest way to visualize a vector field is by thinking of it as the velocity field of a fluid flow. Suppose a fluid is flowing through a certain region in space. The velocities of the fluid particles will generally vary from place to place, and also with time. A fluid flow is called steady if the velocity of a fluid particle only depends on its location. This means that the velocity vector $\vec v$ of a fluid particle is a function of its coordinates (x, y, z) only, and does not depend on time.

Figure 2. Fluid flow in a cylindrical pipe. Left: as a viscous fluid flows through a pipe it sticks to the walls, and so its velocity will be highest at the center of the pipe. Right: a drawing of a cross section of the flow on the left. We see the vector field corresponding to so-called Poiseuille flow, given by Equation (145).

For instance, if a viscous fluid flows through a cylindrical pipe, the velocity of the fluid will only depend on the distance to the central axis of the pipe. On the walls the velocity will vanish (the fluid sticks to the wall of the pipe), and in the center the fluid will move fastest. Under certain circumstances it follows from the laws of fluid mechanics that the velocity field

• is always parallel to the central axis, and
• depends quadratically on the distance to the central axis.


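It is given by

$$\vec v(x,y,z) = v_c\Bigl(1 - \frac{r^2}{R^2}\Bigr)\vec\imath = \begin{pmatrix}v_c\bigl(1-(r/R)^2\bigr)\\ 0\\ 0\end{pmatrix}, \tag{145}$$

where R is the radius of the pipe, r is the distance to its central axis, and $v_c$ is the velocity at the center of the pipe. (Since $\vec v$ points in the $\vec\imath$ direction, the central axis is the x-axis here, so $r = \sqrt{y^2+z^2}$.)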
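A short Python sketch of formula (145) makes the two bullet points concrete. It rests on the assumption stated above that the pipe's axis is the x-axis, so r = √(y² + z²); the function name `poiseuille` and its default parameters are our own.

```python
import math


def poiseuille(x, y, z, v_c=1.0, R=1.0):
    """Velocity field (145) inside a pipe of radius R.

    Assumption: the central axis of the pipe is the x-axis (the flow points in the
    i-direction), so the distance to the axis is r = sqrt(y**2 + z**2).
    Only meaningful for r <= R.
    """
    r = math.hypot(y, z)
    return (v_c * (1 - (r / R) ** 2), 0.0, 0.0)


print(poiseuille(0.0, 0.0, 0.0, v_c=2.0, R=0.5))   # (2.0, 0.0, 0.0): fastest on the axis
print(poiseuille(3.0, 0.5, 0.0, v_c=2.0, R=0.5))   # (0.0, 0.0, 0.0): no motion at the wall
print(poiseuille(3.0, 0.0, 0.25, v_c=2.0, R=0.5))  # (1.5, 0.0, 0.0): halfway out
```

The field never depends on x, and it always points along the axis, which is exactly the first bullet point.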

This example describes the motion of a fluid, but a vector field can be the velocity field of anything that moves. In particular, a gas flow has a velocity field, and the velocities in a moving elastic solid (think "Jello") must also be described by a vector field.

2.3. Force fields. If we assume the Earth is flat, then the gravitational force it exerts on a mass m is always the vector $\vec F = \begin{pmatrix}0\\ -mg\end{pmatrix}$. We can think of this as a constant vector field: its magnitude and direction are the same everywhere.

But the Earth is not flat, and according to Newton the gravitational force $\vec F$ is a vector pointing towards the center of the Earth, whose magnitude is inversely proportional to the square of the distance to the center of the Earth. If we choose the Earth's center to be the origin, then Newton's law looks like this:

$$\vec F(x,y,z) = -C\,\frac{\vec x}{\|\vec x\|^3}, \qquad \vec x = \begin{pmatrix}x\\ y\\ z\end{pmatrix}. \tag{146}$$

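Here C is a constant that depends on the mass m of the object and the mass M of the Earth (physics tells us that C = GMm, where G is called the "universal gravitational constant").

[Figure: the gravitational force $\vec F$ on a mass m above a flat Earth, and on a mass m at position $\vec x$ near a round Earth of mass M.]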
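Formula (146) encodes the inverse-square law through the cube in the denominator: $\|\vec F\| = C\|\vec x\|/\|\vec x\|^3 = C/\|\vec x\|^2$. The Python sketch below is our own illustration, with C set to 1 as a placeholder for GMm. It shows that doubling the distance divides the force by four.

```python
import math


def newton_force(x, y, z, C=1.0):
    """Gravitational force (146): F = -C * x / |x|**3, with C standing in for G*M*m."""
    norm = math.sqrt(x * x + y * y + z * z)
    factor = -C / norm**3
    return (factor * x, factor * y, factor * z)


def length(v):
    return math.sqrt(sum(c * c for c in v))


F_near = newton_force(1.0, 2.0, 2.0)  # distance 3 from the center
F_far = newton_force(2.0, 4.0, 4.0)   # distance 6: twice as far away
print(length(F_near) / length(F_far))  # 4 up to rounding: twice as far, a quarter of the force
```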

Other prominent examples of vector fields appear in the theory of electromagnetism. The electric currents and charges around us create an electric field and a magnetic field, which at each point in space are given by vectors $\vec E$ and $\vec B$. These vectors change from place to place, and so they define vector fields

$$\vec E = \vec E(x,y,z), \qquad \vec B = \vec B(x,y,z).$$

For example, Coulomb's law states that the electric field generated by a charged particle at the origin is given by

$$\vec E(x,y,z) = \frac{Q}{4\pi\varepsilon_0}\,\frac{\vec x}{\|\vec x\|^3}, \tag{147}$$

which has the same form as Newton's law (146) for the gravitational field, apart from the sign and the constant in front. Here $\varepsilon_0$ is a physical constant (the permittivity of free space), and Q is the electric charge of the particle.

If an electric current of strength I runs upward through the z-axis, then this current will create a magnetic field which is given by

$$\vec B(x,y,z) = \frac{\mu_0 I}{2\pi}\begin{pmatrix}-y/(x^2+y^2)\\ x/(x^2+y^2)\\ 0\end{pmatrix}. \tag{148}$$

Again, a constant ($\mu_0$) appears. If we compare (148) with (144), then we see that, except for the constant factor $\mu_0 I/2\pi$, this vector field is a three dimensional version of the one drawn in Figure 1: we can regard Figure 1 as a "top view" of the magnetic field $\vec B$ of an electric current.
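Two features of (148) are easy to confirm numerically: its magnitude is $\mu_0 I/(2\pi r)$, where r is the distance to the z-axis, and it is perpendicular to the radial direction (x, y, 0). This makes the field lines circles around the current. In the sketch below, `mu0` defaults to 1, a choice of units we made for illustration; no physical value is implied.

```python
import math


def magnetic_field(x, y, z, I=1.0, mu0=1.0):
    """Field (148) of a current I flowing up the z-axis; mu0 defaults to 1 (illustrative units)."""
    r2 = x * x + y * y
    c = mu0 * I / (2 * math.pi)
    return (-c * y / r2, c * x / r2, 0.0)


x, y, z = 3.0, 4.0, 7.0  # a point at distance r = 5 from the z-axis
B = magnetic_field(x, y, z, I=2.0)
print(math.hypot(*B), 2.0 / (2 * math.pi * 5.0))  # equal: |B| = mu0*I / (2*pi*r)
print(B[0] * x + B[1] * y)  # 0 up to rounding: B is perpendicular to (x, y, 0)
```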


3. Line integrals

3.1. Line integrals of functions. Instead of integrating over plane domains or regions in space, it often turns out to be useful to integrate over a curve in the plane or a curve in space.

If C is a curve in the plane or in space (think of a line segment, a circular arc, or a fancier curve), and if w = f(x, y, z) is a function, then the basic pattern for defining the integral of f over the curve C is the same as for all the other integrals we have defined in the previous chapter.

Figure 3. Partitioning a curve.

To define the integral we divide the curve C into many short arcs, and label them $C_1, \dots, C_n$; we choose one sample point $(x_k, y_k, z_k)$ on arc $C_k$ for every $k = 1, \dots, n$; and we compute the length $\Delta s_k$ of each arc $C_k$. With these data we form the Riemann sum

$$R = f(x_1,y_1,z_1)\,\Delta s_1 + \cdots + f(x_n,y_n,z_n)\,\Delta s_n, \tag{149}$$

and if these Riemann sums converge as one makes the partition arbitrarily fine, then we call the limit the line integral of f with respect to arc length over the curve C:

$$\int_C f(x,y,z)\,ds = \lim_{\substack{\text{"as the partition}\\ \text{gets finer"}}}\ \sum_{k=1}^{n} f(x_k,y_k,z_k)\,\Delta s_k. \tag{150}$$

The length of the curve C can be expressed as a line integral:

$$\text{Length of } C = \int_C ds.$$
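The definition (149) and (150) can be tried out directly. The Python sketch below is our own illustration, not the notes' method. It assumes the curve is handed over as a function of a parameter t (parametrizations are discussed in §3.2), but only uses it to generate partition points. Each $\Delta s_k$ is approximated by the chord between consecutive points, and the first point of each arc serves as the sample point.

```python
import math


def riemann_line_integral(f, curve, a, b, n=1000):
    """Riemann sum (149) for the integral of f over a curve with respect to arc length.

    Assumption: the curve is given as a function t -> (x, y, z) on [a, b], used only
    to generate n + 1 points along it.  The arc length Delta s_k is approximated by
    the chord between consecutive points; the sample point is the start of each arc.
    """
    points = [curve(a + (b - a) * k / n) for k in range(n + 1)]
    total = 0.0
    for p, q in zip(points, points[1:]):
        total += f(*p) * math.dist(p, q)
    return total


quarter_circle = lambda t: (math.cos(t), math.sin(t), 0.0)
length = riemann_line_integral(lambda x, y, z: 1.0, quarter_circle, 0.0, math.pi / 2)
x_integral = riemann_line_integral(lambda x, y, z: x, quarter_circle, 0.0, math.pi / 2)
print(length, math.pi / 2)  # close: the length of the quarter circle
print(x_integral)           # close to 1
```

Refining the partition (larger `n`) brings both sums closer to their limits, as (150) requires.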

3.2. How to calculate a line integral. Recall that a curve C is usually given by a parametrization

$$\vec x = \vec x(t) = \begin{pmatrix}x(t)\\ y(t)\\ z(t)\end{pmatrix}, \qquad (a\le t\le b),$$

also written as $\vec x(t) = x(t)\,\vec\imath + y(t)\,\vec\jmath + z(t)\,\vec k$.

Given such a parametrization it is easy to make partitions by just partitioning the parameter interval a ≤ t ≤ b into many short subintervals, $a = t_0 < t_1 < \cdots < t_n = b$. We could choose the kth sample point to be the point $\vec x(t_k)$. The length of the arc from $\vec x(t_{k-1})$ to $\vec x(t_k)$ is approximately the same as the distance between these two points (as one makes the partition finer, the arcs become more and more like short line segments). Thus we find

$$\Delta s_k \approx \|\vec x(t_k) - \vec x(t_{k-1})\| = \left\|\frac{\vec x(t_k) - \vec x(t_{k-1})}{\Delta t_k}\right\|\Delta t_k \approx \|\vec x\,'(t_k)\|\,\Delta t_k,$$


Figure 4. A parametrized curve. Left: The vector $\vec x\,'(t)$ is tangent to the curve at the point $\vec x(t)$; the vector $\vec x(t)$ is the position vector of a point on the curve. Right: Increasing the parameter t by a small amount Δt changes the position vector to $\vec x(t+\Delta t)$, causing the corresponding point on the curve to move by $\vec x(t+\Delta t) - \vec x(t) \approx \vec x\,'(t)\,\Delta t$.

where $\Delta t_k = t_k - t_{k-1}$. The Riemann sum for the line integral is

$$R \approx \sum_{k=1}^{n} f(\vec x(t_k))\,\|\vec x\,'(t_k)\|\,\Delta t_k.$$

As the partition is made finer the approximation gets better, and in the limit we get

$$\int_C f(\vec x)\,ds = \int_a^b f(\vec x(t))\,\|\vec x\,'(t)\|\,dt. \tag{151}$$
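Formula (151) turns a line integral into an ordinary integral over [a, b], which any one-variable rule can approximate. The Python sketch below is our own and uses the midpoint rule. The caller supplies both $\vec x(t)$ and $\vec x\,'(t)$, and the helix example is a hypothetical test case whose speed is the constant √2.

```python
import math


def arc_length_integral(f, x, dx, a, b, n=10_000):
    """Formula (151): integral over [a, b] of f(x(t)) * |x'(t)| dt, by the midpoint rule.

    x(t) is the parametrization and dx(t) its derivative x'(t); both are supplied by the caller.
    """
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        speed = math.sqrt(sum(c * c for c in dx(t)))
        total += f(*x(t)) * speed * h
    return total


# Hypothetical example: the helix x(t) = (cos t, sin t, t), 0 <= t <= 2*pi, has speed sqrt(2).
helix = lambda t: (math.cos(t), math.sin(t), t)
helix_velocity = lambda t: (-math.sin(t), math.cos(t), 1.0)

length = arc_length_integral(lambda x, y, z: 1.0, helix, helix_velocity, 0.0, 2 * math.pi)
print(length, 2 * math.pi * math.sqrt(2))  # nearly equal
```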

3.3. Example – What is the average of f(x, y) = x over the quarter unit circle in the first quadrant? Just as with double and triple integrals, the average of a function over a curve C is defined to be

$$\text{Average of } f(\vec x) = \frac{\int_C f(\vec x)\,ds}{\int_C ds},$$

where $\int_C ds$ is the length of C.

Figure 5. The average x-coordinate on a quarter circle.

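To compute these integrals we must first parametrize the curve. Since the curve is the unit circle, we can parametrize points on the curve by their polar coordinate θ, which gives us

$$\vec x(\theta) = \begin{pmatrix}\cos\theta\\ \sin\theta\end{pmatrix}, \quad 0\le\theta\le\pi/2, \qquad\text{and thus}\qquad \|\vec x\,'(\theta)\| = \left\|\begin{pmatrix}-\sin\theta\\ \cos\theta\end{pmatrix}\right\| = 1.$$

Therefore

$$\int_C x\,ds = \int_0^{\pi/2}\cos\theta\,d\theta = 1.$$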


The length of the curve is π/2, so the average value of x on the quarter circle is

$$\frac{1}{\pi/2} = \frac{2}{\pi} \approx 0.6366198\ldots.$$
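The same average can be computed numerically as the ratio of the two integrals in the definition. The Python sketch below is ours: it evaluates both integrals with the midpoint rule using $ds = \|\vec x\,'(\theta)\|\,d\theta$, and uses the parametrization of the quarter circle from this example.

```python
import math


def average_over_curve(f, x, dx, a, b, n=10_000):
    """Average of f over a parametrized curve: (integral of f ds) / (integral of ds),
    with ds = |x'(t)| dt and both integrals done by the midpoint rule."""
    h = (b - a) / n
    weighted = length = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        ds = math.hypot(*dx(t)) * h
        weighted += f(*x(t)) * ds
        length += ds
    return weighted / length


# The quarter circle of this example, parametrized by the polar angle.
point = lambda th: (math.cos(th), math.sin(th))
velocity = lambda th: (-math.sin(th), math.cos(th))
average_x = average_over_curve(lambda x, y: x, point, velocity, 0.0, math.pi / 2)
print(average_x, 2 / math.pi)  # both about 0.6366198
```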

4. Problems

1. If C is the quarter of the unit circle that lies in the first quadrant, then...

(a) What is the average distance to the origin on C?

(b) What is the average polar angle θ?

2. (a) Compute the average x and y coordinates of the polygon from A(1, 0) to B(1, a) to C(0, a) (a > 0 is a constant; the polygon has the shape of an upside-down "L").

(b) Compute the average polar angle θ on the same polygon ABC.

3. Find the average x- and y-coordinates on the part of the circle with radius R and center at the origin that lies above the x-axis.

4. Compute $\int_C x\,ds$ where C is the parabola y = x², with 0 ≤ x ≤ 1.

5. A wire is made in the shape of a helix, of radius a and height H, with parametrization

$$\vec x(t) = \begin{pmatrix}a\cos t\\ a\sin t\\ \dfrac{Ht}{2\pi}\end{pmatrix} \qquad (0\le t\le 2\pi).$$

[Figure: a helix in xyz-space labeled (cos θ, sin θ, θ/4), with marks 1 on the x- and y-axes and π/2 on the z-axis.]

Suppose the temperature at (x, y, z) is $T = T_0 e^{-z/L}$ for constants L and $T_0$.

(a) What are the units of a, H, $T_0$, and L?

(b) What values do a and H have for the helix in the drawing?

(c) What is the average temperature on the wire? (Check that your answer has the right units.)

5. Line integrals of vector fields

5.1. Definition. If C is a curve in three dimensional space, and $\vec F(x,y,z)$ is a vector field, then the line integral of $\vec F$ over C is defined to be

$$\int_C \vec F\cdot d\vec x = \lim_{\substack{\text{"as the partition}\\ \text{gets finer"}}}\ \sum_{k=1}^{n}\vec F(x_k,y_k,z_k)\cdot\Delta\vec x_k \tag{152}$$

To define the Riemann sum we have partitioned the curve into n pieces; $(x_k, y_k, z_k)$ is a sample point on the kth short arc in the partition, and $\Delta\vec x_k$ is the vector connecting the initial and final points of the kth partition arc. See Figure 6 (right).
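The Riemann sum in (152) can be computed literally. The Python sketch below is our own illustration. It assumes the curve is given as a function of an increasing parameter, so that the direction of traversal is fixed. It takes $\Delta\vec x_k$ to be the chord vector of the kth arc and the sample point to be the curve point at the middle parameter value. The test field $\vec F = (-y, x)$ on the unit circle is a hypothetical example.

```python
import math


def vector_riemann_sum(F, curve, a, b, n=2000):
    """Riemann sum (152): sum over k of F(sample_k) . Delta x_k.

    Assumption: C is given as t -> point (a tuple) for a <= t <= b, traversed in the
    direction of increasing t.  Delta x_k is the vector from the initial to the final
    point of the k-th arc; the sample point is the curve point at the middle parameter value.
    """
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        start = curve(a + k * h)
        end = curve(a + (k + 1) * h)
        sample = curve(a + (k + 0.5) * h)
        delta = [e - s for s, e in zip(start, end)]
        total += sum(Fc * dc for Fc, dc in zip(F(*sample), delta))
    return total


# Hypothetical example: F = (-y, x) once around the unit circle, counterclockwise.
circle = lambda t: (math.cos(t), math.sin(t))
result = vector_riemann_sum(lambda x, y: (-y, x), circle, 0.0, 2 * math.pi)
print(result, 2 * math.pi)  # close to 2*pi
```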

5.2. Integrals over closed curves. A curve C is closed if its initial and final points coincide. If we are integrating a vector field over a closed curve, and if we want to emphasize this in the notation, then we can write

$$\oint_C \vec F\cdot d\vec x, \qquad\text{or}\qquad \oint_C P\,dx + Q\,dy + R\,dz.$$


Figure 6. Left: The work done by a force $\vec F$ acting on an object is equal to the product of the length of the displacement $\Delta\vec x$ and the magnitude of the force in the direction of the displacement. If the angle between force and displacement is θ, then this is $W = \|\vec F\|\,\|\Delta\vec x\|\cos\theta = \vec F\cdot\Delta\vec x$. Right: To define $\int_C \vec F\cdot d\vec x$ we partition the curve into small pieces (from start to end), and add the work done by the force $\vec F$ over all partition pieces.

5.3. Differential form notation for line integrals. The $d\vec x$ that appears in line integrals is often interpreted as an "infinitesimally short vector" connecting two adjacent points on the curve C. Its components give us the amounts by which the coordinates x, y, and z change as we go "from one point to the next" on the curve, and therefore one often writes

$$d\vec x = \begin{pmatrix}dx\\ dy\\ dz\end{pmatrix}.$$

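If the vector field $\vec F$ has components $\vec F = \begin{pmatrix}P\\ Q\\ R\end{pmatrix}$, where P, Q, and R are functions of (x, y, z), then the expression $\vec F\cdot d\vec x$ can be written as

$$\vec F\cdot d\vec x = \begin{pmatrix}P\\ Q\\ R\end{pmatrix}\cdot\begin{pmatrix}dx\\ dy\\ dz\end{pmatrix} = P(x,y,z)\,dx + Q(x,y,z)\,dy + R(x,y,z)\,dz.$$

Because of this the following notation for line integrals is often used:

$$\int_C \vec F\cdot d\vec x = \int_C P\,dx + Q\,dy + R\,dz.$$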

For instance, the integral

$$\int_C x\,dx + z\,dy - xy\,dz$$

stands for the line integral of the vector field $\vec F = \begin{pmatrix}x\\ z\\ -xy\end{pmatrix}$ over the curve C.

Expressions of the form P dx + Q dy + R dz, such as x dx + z dy − xy dz above, are called differential forms.
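In code, a differential form P dx + Q dy + R dz is simply a triple of functions (P, Q, R). The Python sketch below integrates one along a parametrized curve using the substitution dx = x′(t) dt, dy = y′(t) dt, dz = z′(t) dt (justified in §5.8), with the midpoint rule. The curve $\vec x(t) = (t, t, t)$, 0 ≤ t ≤ 1, is our own hypothetical choice. Along it the form x dx + z dy − xy dz becomes (t + t − t²) dt, whose integral is 2/3.

```python
def integrate_form(P, Q, R, curve, velocity, a, b, n=10_000):
    """Integral of P dx + Q dy + R dz along a parametrized curve, by the midpoint rule.

    Along the curve dx = x'(t) dt, dy = y'(t) dt and dz = z'(t) dt, where
    velocity(t) returns (x'(t), y'(t), z'(t)).
    """
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        x, y, z = curve(t)
        dx_dt, dy_dt, dz_dt = velocity(t)
        total += (P(x, y, z) * dx_dt + Q(x, y, z) * dy_dt + R(x, y, z) * dz_dt) * h
    return total


# The form x dx + z dy - xy dz from the text, along the hypothetical segment x(t) = (t, t, t).
value = integrate_form(
    lambda x, y, z: x,
    lambda x, y, z: z,
    lambda x, y, z: -x * y,
    lambda t: (t, t, t),
    lambda t: (1.0, 1.0, 1.0),
    0.0,
    1.0,
)
print(value)  # close to 2/3
```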

5.4. The orientation of a curve. The Riemann sum (152) contains the vectors $\Delta\vec x_k$, which connect two adjacent points in our partition of the curve C. Whenever we have two points A and B, there are two vectors connecting them, namely $\overrightarrow{AB}$ and $\overrightarrow{BA} = -\overrightarrow{AB}$. To make sure that the direction of the vector $\Delta\vec x_k$ in the Riemann sum (152) is unambiguous, we have to agree on a direction in which the curve C is traversed. Such a direction is called an orientation of the curve. A curve can have exactly two orientations, and to distinguish between a curve and the same curve with the opposite orientation, one writes

−C = the curve C with its orientation reversed.

If one reverses the orientation of a curve (e.g. by switching its beginning and end points; see Figure 6), then each vector $\Delta\vec x_k$ in the Riemann sum in (152) changes its sign, and as a result the whole Riemann sum changes its sign. In the limit the integral changes its sign. Thus we have

$$\int_{-C}\vec F\cdot d\vec x = -\int_C \vec F\cdot d\vec x. \tag{153}$$

It is important to realize that the integral changes its sign here because it is the line integral of a vector field; line integrals of functions with respect to arc length do not change sign. If w = f(x, y, z) is a function of three variables then

$$\int_{-C} f(x,y,z)\,ds = +\int_C f(x,y,z)\,ds.$$

For instance, if f = 1 is constant then $\int_{-C} ds$ and $\int_C ds$ are the lengths of −C and C. Since the length of a curve is always a positive number and does not depend on its orientation, we have

$$\int_C ds = \text{length of } C = \text{length of } -C = \int_{-C} ds.$$
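The contrast between (153) and the arc-length case can be seen in a small experiment. The Python sketch below is our own illustration, with a hypothetical field $\vec F = (-y, x)$ and function f = x². It evaluates both kinds of Riemann sums on the same points of a quarter circle, first in order and then in reverse order, which traverses −C. The vector sum flips sign. The scalar sum keeps its sign and changes only by a small discretization error, because the left-endpoint sample points shift when the order is reversed.

```python
import math


def points_on(curve, a, b, n=1000):
    """n + 1 points along the curve, listed in order of increasing parameter."""
    return [curve(a + (b - a) * k / n) for k in range(n + 1)]


def vector_sum(F, points):
    """Riemann sum (152): chord vectors Delta x_k, left endpoints as sample points."""
    total = 0.0
    for p, q in zip(points, points[1:]):
        total += sum(Fc * (qc - pc) for Fc, pc, qc in zip(F(*p), p, q))
    return total


def scalar_sum(f, points):
    """Riemann sum (149): chord lengths stand in for the arc lengths Delta s_k."""
    return sum(f(*p) * math.dist(p, q) for p, q in zip(points, points[1:]))


F = lambda x, y: (-y, x)   # hypothetical vector field
f = lambda x, y: x * x     # hypothetical function
C = points_on(lambda t: (math.cos(t), math.sin(t)), 0.0, math.pi / 2)  # quarter circle
minus_C = C[::-1]          # the same points visited in the opposite order

work_C, work_minus_C = vector_sum(F, C), vector_sum(F, minus_C)
ds_C, ds_minus_C = scalar_sum(f, C), scalar_sum(f, minus_C)
print(work_C, work_minus_C)  # about pi/2 and -pi/2: the sign flips
print(ds_C, ds_minus_C)      # both about pi/4: no sign change
```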

5.5. Integrating over piecewise defined curves. To compute a line integral it is often best to start with a parametrization of the curve and use (151). In practice it can be very difficult to find such a parametrization of the whole curve, even though the curve can be broken into a few pieces, each of which does have a simple parametrization. For instance, the edges of a square together form a closed curve. It is difficult to find one parametrization for all four edges at once, but each edge of the square is a simple line segment for which one can easily find a parametrization. In this situation one can write


the line integral over the whole curve as a sum of line integrals over the separate pieces. Going back to the example of the square, we have

$$\int_C \vec F\cdot d\vec x = \int_{AB}\vec F\cdot d\vec x + \int_{BC}\vec F\cdot d\vec x + \int_{CD}\vec F\cdot d\vec x + \int_{DA}\vec F\cdot d\vec x.$$

In general, if a curve C consists of two parts, $C_1$ and $C_2$, then we express this by writing $C = C_1 + C_2$.


A line integral over the whole curve is the sum of the line integrals over the separate pieces:

$$\int_C \vec F\cdot d\vec x = \int_{C_1}\vec F\cdot d\vec x + \int_{C_2}\vec F\cdot d\vec x. \tag{154}$$
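For the square, (154) means we only ever need to integrate over straight segments. The Python sketch below is our own illustration and makes two hypothetical choices: the square has corners (0, 0), (1, 0), (1, 1), (0, 1) and is traversed counterclockwise, and the field is $\vec F = (-y, x)$. Each edge is parametrized as start + t (end − start), 0 ≤ t ≤ 1, and the four edge integrals are added.

```python
def segment_integral(F, start, end, n=1000):
    """Integral of F . dx over the straight segment from start to end.

    Uses x(t) = start + t * (end - start), 0 <= t <= 1, whose velocity is the constant
    vector end - start, and the midpoint rule in t.
    """
    direction = [e - s for s, e in zip(start, end)]
    total = 0.0
    for k in range(n):
        t = (k + 0.5) / n
        point = [s + t * d for s, d in zip(start, direction)]
        total += sum(Fc * dc for Fc, dc in zip(F(*point), direction)) / n
    return total


# Hypothetical square with corners (0,0), (1,0), (1,1), (0,1), traversed counterclockwise.
corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
edges = list(zip(corners, corners[1:] + corners[:1]))
F = lambda x, y: (-y, x)
pieces = [segment_integral(F, s, e) for s, e in edges]
print(pieces, sum(pieces))  # about [0, 1, 1, 0] and a total of 2
```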

5.6. The line integral as the integral of the tangential component of a vector field. In the Riemann sum (152) the kth term contains the vector $\Delta\vec x_k$, which connects two adjacent points in the partition of the curve (see Figure 6, on the right). We can write this vector as the product of a unit vector and a positive number:

$$\Delta\vec x_k = \frac{\Delta\vec x_k}{\|\Delta\vec x_k\|}\,\|\Delta\vec x_k\|.$$

The vector $\Delta\vec x_k$ will be almost tangent to the curve, and the finer one makes the partition, the smaller the angle between $\Delta\vec x_k$ and the tangent to the curve will be. If the partition is sufficiently fine, then we will have

$$\frac{\Delta\vec x_k}{\|\Delta\vec x_k\|} \approx \vec T_k,$$

where $\vec T_k$ is the unit tangent vector to the curve at the point $(x_k, y_k, z_k)$. Furthermore, the chord length $\|\Delta\vec x_k\| \approx \Delta s_k$ approximates the length of the kth arc in the partition, and hence we can write the Riemann sum as

$$\sum_{k=1}^{n}\vec F(x_k,y_k,z_k)\cdot\Delta\vec x_k \approx \sum_{k=1}^{n}\vec F(x_k,y_k,z_k)\cdot\vec T_k\,\Delta s_k.$$

The sum on the right is a Riemann sum for the line integral $\int_C \vec F(\vec x)\cdot\vec T\,ds$. Taking the limit of arbitrarily fine partitions, we conclude that

$$\int_C \vec F\cdot d\vec x = \int_C \vec F\cdot\vec T\,ds. \tag{155}$$

Since $\vec T$ is a unit vector, the quantity $\vec F\cdot\vec T$ is the (signed) length of the component of the vector field $\vec F$ tangential to the curve. Equation (155) therefore says that the line integral of the vector field $\vec F$ along a curve C is the same as the line integral (in the sense of §3.1) of the tangential component of $\vec F$ along C.

5.7. Example – work around a circle. The formula (155) for the line integral is useful if we know the angle between the force $\vec F$ and the curve, and the magnitude of the force. For instance, consider this problem:

Compute the work done by the vector field $\vec F(x,y) = x\,\vec\imath + y\,\vec\jmath = \begin{pmatrix}x\\ y\end{pmatrix}$ along the curve C, where C is some piece of the unit circle in the plane.

We are asked to compute $\int_C \vec F\cdot d\vec x$. The vector field $\vec F = \begin{pmatrix}x\\ y\end{pmatrix}$ always points away from the origin, and thus it is always perpendicular to the tangent $\vec T$ to the unit circle (see Figure 7). Hence $\vec F\cdot\vec T = 0$, and we find that

$$\int_C \vec F\cdot d\vec x = \int_C \vec F\cdot\vec T\,ds = \int_C 0\,ds = 0.$$

Those who prefer the differential form notation (§5.3) can write this as

$$\int_C x\,dx + y\,dy = \int_C \vec F\cdot d\vec x = 0.$$
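The same conclusion also follows from a direct computation. Assume, as our own choice of parametrization, that the piece C is traversed as $\vec x(t) = (\cos t, \sin t)$ for $t_0 \le t \le t_1$. Then dx = −sin t dt and dy = cos t dt along C, so

$$\int_C x\,dx + y\,dy = \int_{t_0}^{t_1}\bigl(\cos t\,(-\sin t) + \sin t\,\cos t\bigr)\,dt = \int_{t_0}^{t_1} 0\,dt = 0.$$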


Figure 7. Left: the vector field $\vec F(x,y) = x\,\vec\imath + y\,\vec\jmath$ from §5.7. Right: the vector field $\vec F$ is perpendicular to the path C and hence does no work.

5.8. How to compute a line integral. If a parametrization of a curve C is given, so that the curve C is the image of

$$\vec x = \vec x(t) = \begin{pmatrix}x(t)\\ y(t)\\ z(t)\end{pmatrix}, \qquad a\le t\le b,$$

then we can partition the curve C by partitioning the parameter interval a ≤ t ≤ b, choosing partition points $a = t_0 < t_1 < \cdots < t_n = b$, just as in §3.2. The kth term in the Riemann sum (152) defining $\int_C \vec F\cdot d\vec x$ is $\vec F(x_k,y_k,z_k)\cdot\Delta\vec x_k$, with

$$\Delta\vec x_k = \vec x(t_k) - \vec x(t_{k-1}) \approx \vec x\,'(t_k)\,\Delta t_k$$

(again as in §3.2). The Riemann sum for $\int_C \vec F\cdot d\vec x$ is therefore

$$\sum_{k=1}^{n}\vec F(x_k,y_k,z_k)\cdot\Delta\vec x_k \approx \sum_{k=1}^{n}\vec F(x_k,y_k,z_k)\cdot\vec x\,'(t_k)\,\Delta t_k.$$

The sum on the right converges to the integral $\int_a^b \vec F(\vec x(t))\cdot\vec x\,'(t)\,dt$, and thus we have found

$$\int_C \vec F\cdot d\vec x = \int_a^b \vec F\bigl(x(t),y(t),z(t)\bigr)\cdot\vec x\,'(t)\,dt. \tag{156}$$

We can think of this as a substitution formula for integrals, in which we substitute $\vec x = \vec x(t)$ in the integral $\int_C \vec F(\vec x)\cdot d\vec x$, using the rule

$$d\vec x = \vec x\,'(t)\,dt.$$
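Formula (156) is easy to turn into a numerical routine. The Python sketch below is our own: it approximates $\int_a^b \vec F(\vec x(t))\cdot\vec x\,'(t)\,dt$ with the midpoint rule, and the caller supplies both $\vec x(t)$ and $\vec x\,'(t)$. The test case, $\vec F = (-y, x)$ once around the unit circle, is a hypothetical example. There $\vec F\cdot\vec x\,'(t) = \sin^2 t + \cos^2 t = 1$, so the answer is 2π.

```python
import math


def line_integral(F, x, dx, a, b, n=10_000):
    """Formula (156): the integral over [a, b] of F(x(t)) . x'(t) dt, by the midpoint rule.

    F takes the coordinates of a point and returns the components of the field there;
    x(t) is the parametrization and dx(t) its derivative x'(t), both supplied by the caller.
    """
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        total += sum(Fc * vc for Fc, vc in zip(F(*x(t)), dx(t))) * h
    return total


# Hypothetical example: F = (-y, x) once around the unit circle, counterclockwise.
circle = lambda t: (math.cos(t), math.sin(t))
circle_velocity = lambda t: (-math.sin(t), math.cos(t))
work = line_integral(lambda x, y: (-y, x), circle, circle_velocity, 0.0, 2 * math.pi)
print(work, 2 * math.pi)  # nearly equal
```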

5.9. Three examples. Let $C_1$ be the line segment from the origin to the point (1, 1), and let $C_2$ be the piece of the parabola y = x² between the origin and the point (1, 1). Compute the work done by the vector field $\vec F(x,y) = \begin{pmatrix}-y\\ x\end{pmatrix}$ along each of these two paths.

The two curves $C_1$ and $C_2$ together bound a region R. Let $C_3$ be the boundary of this region, traversed in the clockwise direction, and compute the work done by $\vec F$ along the closed curve $C_3$.

To find these integrals we need parametrizations of the curves. For $C_1$ we can use

$$\vec x_1(t) = \begin{pmatrix}t\\ t\end{pmatrix}, \qquad 0\le t\le 1,$$


Figure 8. Left: Two different paths from the origin to the point (1, 1), and the vector field $\vec F$. Right: By reversing the orientation of the second path $C_2$ we can create a closed path that starts and ends at the origin. This path ($C_1$ combined with $-C_2$) is the boundary of the shaded region R, traversed in the clockwise sense.

and for $C_2$ we can use

$$\vec x_2(t) = \begin{pmatrix}t\\ t^2\end{pmatrix}, \qquad 0\le t\le 1.$$

To show how both notations work, we will do the first integral using vector notation, and the second using the differential form notation.

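Integral over $C_1$. The first integral is computed as follows:

$$\begin{aligned}
\int_{C_1}\vec F\cdot d\vec x &= \int_{C_1}\begin{pmatrix}-y\\ x\end{pmatrix}\cdot d\vec x && \text{substitute } \vec x = \vec x_1(t) = \begin{pmatrix}t\\ t\end{pmatrix}\\
&= \int_{t=0}^{1}\underbrace{\begin{pmatrix}-t\\ t\end{pmatrix}}_{\vec F}\cdot\underbrace{\begin{pmatrix}1\\ 1\end{pmatrix}dt}_{d\vec x} && \text{since } d\vec x = \vec x_1'(t)\,dt = \begin{pmatrix}1\\ 1\end{pmatrix}dt\\
&= \int_0^1 0\,dt\\
&= 0.
\end{aligned}$$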

Integral over $C_2$. The second integral written using differential forms is

$$\int_{C_2}\vec F\cdot d\vec x = \int_{C_2} -y\,dx + x\,dy.$$

Here we substitute the parametrization of the path,

$$x = x_2(t) = t \quad\text{and}\quad y = y_2(t) = t^2,$$

with

$$dx = dt, \qquad dy = d(t^2) = 2t\,dt,$$


and we find

$$\int_{C_2}\vec F\cdot d\vec x = \int_{t=0}^{1}\underbrace{-t^2\,dt}_{-y\,dx} + \underbrace{t\cdot 2t\,dt}_{x\,dy} = \int_0^1 t^2\,dt = \frac{1}{3}.$$

Integral over $C_3$. We had defined $C_3$ to be the combination of the curves $C_1$ and $-C_2$ (which is $C_2$ with its orientation reversed). Therefore

$$\int_{C_3}\vec F\cdot d\vec x = \int_{C_1}\vec F\cdot d\vec x + \int_{-C_2}\vec F\cdot d\vec x = \int_{C_1}\vec F\cdot d\vec x - \int_{C_2}\vec F\cdot d\vec x.$$

We have already computed these two integrals, so there is no need to do a new integration. The result we are looking for is

$$\int_{C_3}\vec F\cdot d\vec x = 0 - \frac{1}{3} = -\frac{1}{3}.$$
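The three answers can be cross-checked numerically with formula (156) and the midpoint rule. In the Python sketch below (our own), $W_1$ and $W_2$ use the parametrizations $\vec x_1$ and $\vec x_2$ from this example. $W_3$ is obtained from (153) and (154) as $W_1 - W_2$, exactly as in the text.

```python
def line_integral(F, x, dx, a, b, n=10_000):
    """Formula (156) by the midpoint rule: integral over [a, b] of F(x(t)) . x'(t) dt."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        total += sum(Fc * vc for Fc, vc in zip(F(*x(t)), dx(t))) * h
    return total


F = lambda x, y: (-y, x)
W1 = line_integral(F, lambda t: (t, t), lambda t: (1.0, 1.0), 0.0, 1.0)        # along C1
W2 = line_integral(F, lambda t: (t, t * t), lambda t: (1.0, 2 * t), 0.0, 1.0)  # along C2
W3 = W1 - W2                                                                    # C3 = C1 + (-C2)
print(W1, W2, W3)  # about 0, 1/3 and -1/3
```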

6. Another Fundamental Theorem of Calculus

If we know the derivative f′(x) of a function y = f(x) of one variable, then the Fundamental Theorem of Calculus tells us that we can recover the function by integrating the derivative:

$$f(b) = f(a) + \int_a^b f'(x)\,dx \tag{157}$$

This semester we saw in Chapter IV, §14 that one can do the same for functions of several variables: following a somewhat complicated procedure, one can recover a function of two or more variables if one knows its partial derivatives. In this section we show that the procedure has a much shorter description in terms of a line integral.

6.1. Theorem. For any path C and any differentiable function f one has

$$f(B) - f(A) = \int_C \vec\nabla f(\vec x)\cdot d\vec x, \tag{158}$$

where A and B are the initial and final points, respectively, of the path C.

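In differential form notation the same statement is written as

$$\int_C \left\{\frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy + \frac{\partial f}{\partial z}\,dz\right\} = f(B) - f(A).$$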

6.2. Line integral of a gradient does not depend on the path. The examples in §5.9 show that the line integral $\int_C \vec F\cdot d\vec x$ of a vector field $\vec F$ in general depends on the path C. However, it follows from Theorem 6.1 that if the vector field $\vec F$ happens to be the gradient of a function, $\vec F = \vec\nabla f$, then the line integral $\int_C \vec F\cdot d\vec x$ only depends on the initial and final points, A and B, of the path C, and not on the way that C gets from A to B.
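To see §6.2 in action, the Python sketch below integrates a gradient along two different paths with the same endpoints. It uses formula (156) and the midpoint rule. The function f(x, y, z) = xy + sin z and both paths are our own hypothetical choices. Both results agree with f(B) − f(A) = 2 + sin 3 ≈ 2.1411, as Theorem 6.1 predicts.

```python
import math


def line_integral(F, x, dx, a, b, n=20_000):
    """Formula (156): integral over [a, b] of F(x(t)) . x'(t) dt, by the midpoint rule."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        total += sum(Fc * vc for Fc, vc in zip(F(*x(t)), dx(t))) * h
    return total


# Hypothetical example: f(x, y, z) = x*y + sin(z), so grad f = (y, x, cos z).
f = lambda x, y, z: x * y + math.sin(z)
grad_f = lambda x, y, z: (y, x, math.cos(z))

# Two different paths from A = (0, 0, 0) to B = (1, 2, 3), each with its derivative.
straight = (lambda t: (t, 2 * t, 3 * t), lambda t: (1.0, 2.0, 3.0))
curved = (lambda t: (t**2, 2 * t**3, 3 * math.sin(math.pi * t / 2)),
          lambda t: (2 * t, 6 * t**2, 1.5 * math.pi * math.cos(math.pi * t / 2)))

along_straight = line_integral(grad_f, *straight, 0.0, 1.0)
along_curved = line_integral(grad_f, *curved, 0.0, 1.0)
print(along_straight, along_curved, f(1, 2, 3) - f(0, 0, 0))  # all three about 2.1411
```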

6.3. Line integral of a gradient around a closed curve vanishes. An important special case of Theorem 6.1 is that in which the curve C is closed. If C is a closed curve, then its initial and final points coincide, so that one always has

$$\oint_C \vec\nabla f(\vec x)\cdot d\vec x = 0. \tag{159}$$


Figure 9. If we know the gradient of a function and its value at one point (say, the origin), then we can compute f(P) at any other point P by choosing a path C from the origin to P and computing the line integral of the gradient. We have $f(P) = f(0,0) + \int_C \vec\nabla f\cdot d\vec x$. It does not matter which path we choose.

6.4. Proof of the Fundamental Theorem. Suppose $\vec F = \vec\nabla f(x,y,z)$, and let the curve C be parametrized by $\vec x = \vec x(t)$, a ≤ t ≤ b. Then

$$\vec F = \vec\nabla f = \begin{pmatrix}f_x\\ f_y\\ f_z\end{pmatrix},$$

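and hence

$$\begin{aligned}
\int_C \vec\nabla f\cdot d\vec x &= \int_C \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy + \frac{\partial f}{\partial z}\,dz\\
&= \int_a^b \Bigl\{\frac{\partial f}{\partial x}\bigl(x(t),y(t),z(t)\bigr)\cdot x'(t) + \frac{\partial f}{\partial y}\bigl(x(t),y(t),z(t)\bigr)\cdot y'(t) + \frac{\partial f}{\partial z}\bigl(x(t),y(t),z(t)\bigr)\cdot z'(t)\Bigr\}\,dt
\end{aligned}$$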

The expression between {⋯} is what the Chain Rule would give us if we tried to differentiate f(x(t), y(t), z(t)) with respect to t. So we get

$$\int_C \vec\nabla f\cdot d\vec x = \int_a^b \frac{d f\bigl(x(t),y(t),z(t)\bigr)}{dt}\,dt = f\bigl(x(b),y(b),z(b)\bigr) - f\bigl(x(a),y(a),z(a)\bigr).$$

The point B = (x(b), y(b), z(b)) is the end point of the curve C, and A = (x(a), y(a), z(a)) is its initial point, so we have found the fundamental theorem (158).


7. Conservative vector fields

7.1. Definition. A vector field $\vec F$ is called conservative if one has

$$\oint_C \vec F\cdot d\vec x = 0 \tag{160}$$

for every closed curve C.

The name "conservative" derives from the interpretation of the integral in (160) as the amount of work done by the force field $\vec F$ around the closed curve C. As an object moves along the curve C, the force $\vec F$ acts on it, does work, and therefore provides energy to the object. The line integral (160) measures how much energy the force adds to the object after going around the curve C once. For a conservative vector field the total energy provided to the object is exactly zero, suggesting that its energy is conserved.

Figure 10. As an object moves along the closed curve C the force $\vec F$ acts on it. At times the force works in the direction of the motion, at other times it works against the motion. If the object starts at P and goes around once, will it have gained energy when returning to P?

It follows from §6.3 that any vector field $\vec F$ that is the gradient of a function is a conservative vector field. The following theorem says that these are actually the only conservative vector fields.

7.2. Theorem. If $\vec F$ is a conservative vector field then there is a function f such that $\vec F = \vec\nabla f$.

If $\vec F = \vec\nabla f$, the function V = −f is called a potential of the vector field $\vec F$. Thus a function V is a potential of the vector field $\vec F$ if

$$\vec F = -\vec\nabla V.$$

The potential V can be found by choosing one fixed point A, at which we declare V(A) = 0, and then computing the line integral

$$V(P) \overset{\text{def}}{=} -\int_A^P \vec F\cdot d\vec x \tag{161}$$

where the integral is a line integral over a path from the point A to the point P. The assumption that $\vec F$ is a conservative vector field implies that the integral in (161) does not depend on the path that is chosen.
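Formula (161) translates directly into a numerical recipe. The Python sketch below is a minimal illustration under our own assumptions. The path from A to P is the straight segment, which is allowed because any path gives the same value for a conservative field. The line integral is approximated by the midpoint rule. The test field $\vec F = (-2x, -2y, -2z) = -\vec\nabla V$ with V = x² + y² + z² is a hypothetical example whose potential we know in advance.

```python
def potential(F, P, A=(0.0, 0.0, 0.0), n=10_000):
    """V(P) = -(integral of F . dx from A to P), formula (161), along the straight segment.

    The segment is parametrized as A + t * (P - A), 0 <= t <= 1; its velocity is P - A.
    """
    d = [p - a for a, p in zip(A, P)]
    total = 0.0
    for k in range(n):
        t = (k + 0.5) / n
        point = [a + t * di for a, di in zip(A, d)]
        total += sum(Fc * dc for Fc, dc in zip(F(*point), d)) / n
    return -total


# Hypothetical conservative field F = -grad V with V = x^2 + y^2 + z^2, and A = origin (V(A) = 0).
F = lambda x, y, z: (-2 * x, -2 * y, -2 * z)
print(potential(F, (1.0, 2.0, 2.0)))  # about 9 = 1 + 4 + 4
```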


8. Problems

1. Is the gravitational vector field $\vec g(x,y) = -g\,\vec e_2 = \begin{pmatrix}0\\ -g\end{pmatrix}$ a conservative vector field?

2. Newton's gravitational vector field

$$\vec F(x,y,z) = -\frac{\vec x}{\|\vec x\|^3}$$

from §2.3, equation (146), is a conservative vector field. Show this by finding a potential of the form $f(x,y,z) = K\|\vec x\|^a$ for suitable constants a and K.

3. Reread the section in Chapter IV about Clairaut's theorem. You now have two ways to tell that a vector field $\vec F = P(x,y)\,\vec e_1 + Q(x,y)\,\vec e_2$ cannot be a gradient. Which are they?

4. (a) Compute the line integrals of the vector fields $\vec F = \begin{pmatrix}x\\ 0\end{pmatrix}$ and $\vec G = \begin{pmatrix}0\\ x\end{pmatrix}$ around the unit circle $\vec x(\theta) = \cos\theta\,\vec e_1 + \sin\theta\,\vec e_2$.

(b) Which of the vector fields $\vec F$ or $\vec G$ cannot be a gradient, based on your answer to (a)?

(c) Can you conclude from your answer to (a) that any of the vector fields $\vec F$ or $\vec G$ must be a gradient?

9. Flux integrals

9.1. Definition of flux. In §5 we defined the integral of a vector field along a curve C, and saw in §5.6 that it is the line integral of the tangential component of the vector field. If the curve C is not a space curve, but lies in the xy-plane, then one can also define the flux of the vector field across the curve.

To define the flux we must first choose a unit normal vector $\vec N$ for the curve C, i.e. at each point on C we must choose a vector $\vec N$ that has unit length and that is perpendicular to the curve:

$$\|\vec N\| = 1, \qquad\text{and}\qquad \vec N\cdot\vec T = 0.$$

Once a unit normal for the curve C has been chosen, the flux of a vector field $\vec v$ across the curve C in the direction of $\vec N$ is defined to be

$$\text{Flux} = \int_C \vec v\cdot\vec N\,ds \tag{162}$$

The flux integral has a very natural interpretation if the vector field $\vec v$ is the velocity field of a two dimensional fluid flowing in the plane. If $C$ is an arc in the plane, and if $\vec N$ is a unit normal to $C$, then fluid will flow across this arc, and one can ask how much fluid flows across the arc in the direction of $\vec N$. The answer is given by the flux integral (162). For an explanation see Figure 11. There the arc is divided into many small sub-arcs, which may be considered nearly straight. During a time interval of length $\Delta t$ the fluid flowing through one such short arc sweeps out a parallelogram of which one side has length $\Delta s$, while the other is given by the vector $\vec v\,\Delta t$. The area of the small parallelogram is then $\vec N\cdot\vec v\,\Delta t\,\Delta s$. To get the rate at which fluid flows across the short arc we divide this by $\Delta t$ to get $\vec v\cdot\vec N\,\Delta s$. Adding over all short arcs that comprise the curve $C$ leads to the flux integral (162).
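The parallelogram argument translates directly into a numerical recipe: cut $C$ into short arcs and add up $\vec v\cdot\vec N\,\Delta s$. The Python sketch below is not part of the original notes, and the function name `flux_across_curve` is hypothetical. It applies the midpoint rule to a curve given by a parametrization $\vec x(u)$ and its derivative $\vec x\,'(u)$. It assumes that $\vec N$ is the unit tangent rotated 90° clockwise, so that $\vec N$ points to the right of the direction of travel. This differs from the counterclockwise convention mentioned in Figure 11; negating the result gives the flux for the opposite normal.

```python
def flux_across_curve(v, x, dx, a, b, n=1000):
    """Approximate the flux integral (162) of the plane vector field v
    across the parametrized curve u -> x(u), a <= u <= b.

    The curve is cut into n short arcs and the contributions v . N ds are
    added up, with v evaluated at the midpoint of each arc.
    Assumption: N is the unit tangent rotated 90 degrees clockwise (it points
    to the right of the direction of travel); negate the result for -N.
    """
    du = (b - a) / n
    total = 0.0
    for k in range(n):
        u = a + (k + 0.5) * du       # midpoint of the k-th short arc
        px, py = x(u)
        tx, ty = dx(u)               # x'(u) = T * (ds/du)
        vx, vy = v(px, py)
        # N ds = (T rotated clockwise) * |x'(u)| du = (x2'(u), -x1'(u)) du
        total += (vx * ty - vy * tx) * du
    return total


# The river of Section 9.3 with V = 1, crossed along x(u) = (0, u), -1 <= u <= 1.
river = lambda px, py: (1.0 - py ** 2, 0.0)
print(flux_across_curve(river, lambda u: (0.0, u), lambda u: (0.0, 1.0), -1.0, 1.0))
```

For the river of § 9.3 with $V = 1$, the printed value should be close to the exact answer $\frac43$ found there. For a counterclockwise closed curve, this choice of $\vec N$ is the outward normal.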

9.2. Flux across a closed curve. If $C$ is a closed curve without self intersections, and $R$ is the region it encloses, then the flux of a vector field $\vec v$ across the curve $C$ can again be interpreted as the rate at which fluid flows across the curve $C$. Since the curve now encloses the bounded region $R$, we can also say that the flux of $\vec v$ across the curve $C$ is the net rate at which fluid leaves the region $R$ (provided $\vec N$ is the outward pointing unit normal).


Figure 11. Left: At each point on a plane curve there are two choices of unit normal. If a unit tangent is given, then the most common choice of normal is to rotate the unit tangent counterclockwise by 90°. Top, right: if water is flowing over the plane with velocity field $\vec v$, then the rate at which water flows across the curve $C$ in the direction of the normal $\vec N$ is given by the flux integral (162) of the velocity. Bottom, right: the amount of water flowing across a short arc of length $\Delta s$ on the curve $C$ in time $\Delta t$ is the area of a parallelogram one of whose sides is $\vec v\,\Delta t$. The area of this parallelogram is the length of the normal component of $\vec v\,\Delta t$ times $\Delta s$.

Figure 12. The flux of a vector field $\vec v$ across a closed curve measures the rate at which fluid is flowing out of the enclosed region, if $\vec N$ is the outward normal to the curve.

9.3. Example – water under the bridge. An endless river $R$ occupies the strip
$$R = \{(x,y) : -1\le y\le 1\}$$
in the $xy$-plane. (The width of the river is 2.) The water in the river flows with velocity
$$\vec v(x,y) = V(1-y^2)\,\vec e_1 = \begin{pmatrix} V(1-y^2)\\ 0\end{pmatrix},$$
where $V$ is a constant. It is the maximal velocity of the water, which is attained at $y = 0$, i.e. in the middle of the river. This flow is a two dimensional version of the Poiseuille flow from § 2.2.


Figure 13. The river with velocity field $\vec v = V(1-y^2)\,\vec e_1$ flowing under a bridge. The shaded region represents the water that passed under the bridge during one time unit.

Question: How much water flows from left to right through the line segment $AB$, where $A$ is the point $(0,-1)$, and $B$ is the point $(0,1)$?

Solution: We parametrize the line segment by
$$\vec x(u) = \begin{pmatrix}0\\ u\end{pmatrix}, \qquad -1\le u\le 1.$$
Normally one refers to the parameter as “time,” but since we are considering flowing water, time is already part of the problem. Therefore we have called the parameter on the curve $u$ instead of $t$.

The line segment is vertical, so the unit normal is a horizontal vector of length 1, i.e. either $\vec N = \vec e_1$ or $\vec N = -\vec e_1$. We are asked to find how much water flows from left to right, so we need the normal that points to the right: $\vec N = +\vec e_1$. We can now compute the integral. We begin with

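$$ds = \|\vec x\,'(u)\|\,du = \left\|\begin{pmatrix}0\\ 1\end{pmatrix}\right\|du = du,$$
and
$$\vec N\cdot\vec v = \begin{pmatrix}1\\ 0\end{pmatrix}\cdot\begin{pmatrix}V(1-y^2)\\ 0\end{pmatrix} = V(1-y^2) = V(1-u^2),$$
which gives us
$$\int_C\vec v\cdot\vec N\,ds = \int_{u=-1}^{1} V(1-u^2)\,du = V\Big[u - \tfrac13u^3\Big]_{-1}^{1} = \tfrac43V.$$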

9.4. An expanding flow. A substance, perhaps a fluid, or a gas, is spreading from the origin and is moving with velocity field
$$\vec v = V\begin{pmatrix}x/R\\ y/R\end{pmatrix} = \frac VR\,\vec x,$$
where $V$ and $R$ are constants: $V$ has the units of a velocity, and $R$ has the units of a length. The interpretation of these constants is that $V$ is the speed at which fluid particles are moving when they are at a distance $R$ from the origin.


Figure 14. Left: The vector field $\vec v(x,y) = \frac VR\left(x\,\vec e_1 + y\,\vec e_2\right)$ and a circle with radius $a$. Right: This vector field cannot describe the flow of an “incompressible” fluid like water, since more fluid flows out of the circle with radius $b$ than through the circle with radius $a$. Water would have to be created in the annular region between the two circles.

Question: How much fluid flows out of the circle with radius $a$?

Before we compute anything, let us decide on the units that the answer should have. The question of “how much” fluid flows across the curve is ambiguous, since we could answer in terms of mass (pounds or kilograms of fluid per second) or in terms of volume (gallons per second). These two are related by the density (pounds per gallon, kilos per liter, etc.) of the substance. Since we do not know anything about the density, we will measure “how much” in terms of the volume of substance flowing across the curve per second. In fact, since we are dealing with a two dimensional model (the substance is flowing in the plane rather than in three dimensional space), we will measure the area that flows across the curve per second instead of the volume.

Solution: We need to compute
$$\oint_{C_a}\vec v\cdot\vec N\,ds$$
where $C_a$ is the circle with radius $a$ centered at the origin. The unit normal $\vec N$ is the outward pointing normal, because we are asked to find how much fluid flows out of the circle.

In this case $\vec N$ and $\vec v$ are parallel, so that on the circle $C_a$ we have
$$\vec N\cdot\vec v = \|\vec v\| = \frac{Va}{R}.$$
Therefore the flux integral is very simple, namely
$$\oint_{C_a}\vec v\cdot\vec N\,ds = \oint_{C_a}\frac{Va}{R}\,ds = V\frac aR\cdot\underbrace{\oint_{C_a}ds}_{\text{Length of }C_a} = 2\pi V\frac{a^2}{R}.$$

This answer is unrealistic if we assume that $\vec v$ really is the velocity field of a normal fluid (like water). To see what is wrong, we compute how much fluid flows through circles of different radii $a$ and $b$. If $a<b$, then the rate at which fluid flows through the smaller circle is less than the rate at which fluid flows out of the larger circle. The difference,
$$2\pi\frac VR\left(b^2-a^2\right), \tag{163}$$
represents the amount of fluid that is (apparently) being created every second in the ring-shaped region between $C_a$ and $C_b$.

However, the computation could apply to a flowing gas. In this case we have computed the volume of gas that flows across each circle per time unit (or the area of gas, because we are using a two dimensional model here). A larger volume could flow across $C_b$ than across $C_a$, provided the gas is less dense at the circle $C_b$ than it is at the smaller circle $C_a$. This kind of reasoning is important for fluid and gas dynamics, and in fact appears in many other branches of physics.
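As a sanity check of (163), here is a short Python sketch. It is our own illustration, and the sample values for $V$, $R$, $a$, $b$ are chosen arbitrarily. The sketch approximates the outward flux of the expanding flow across circles of two radii and compares the fluxes and their difference with the formulas above.

```python
import math


def flux_out_of_circle(v, a, n=2000):
    """Midpoint-rule approximation of the outward flux of the plane field v
    across the circle C_a of radius a centered at the origin.

    With x(t) = (a cos t, a sin t) the outward normal is N = x / a and
    ds = a dt, so v . N ds = v(x) . (a cos t, a sin t) dt.
    """
    dt = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        px, py = a * math.cos(t), a * math.sin(t)
        vx, vy = v(px, py)
        total += (vx * px + vy * py) * dt
    return total


V, R = 2.0, 5.0                      # sample constants, chosen arbitrarily
expanding = lambda px, py: (V * px / R, V * py / R)
a, b = 1.0, 3.0
print(flux_out_of_circle(expanding, a), 2 * math.pi * V * a ** 2 / R)
print(flux_out_of_circle(expanding, b) - flux_out_of_circle(expanding, a),
      2 * math.pi * V / R * (b ** 2 - a ** 2))
```

In each printed pair the two numbers should agree up to rounding error.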

10. Green’s Theorem

We have seen that the line integral $\oint_C\vec F\cdot\vec T\,ds$ of a vector field along a closed curve vanishes if the vector field happens to be the gradient of some function (§ 6.3). If the vector field $\vec F$ is not the gradient of a function, however, its line integral around a closed curve need not vanish (see the example in § 5.8). We have also seen examples where a flux integral $\oint_C\vec v\cdot\vec N\,ds$ is non-zero.

Green’s theorem relates the line integral of any vector field on the boundary curve $C$ of some domain $R$ to a double integral, over the domain $R$ itself, involving partial derivatives of the vector field. There are two versions of the theorem, depending on what kind of line integral one considers. The first version is for “work-type integrals,” and is best written in differential form notation. The second version is about flux integrals.

10.1. Simply connected domains. In both versions of Green’s theorem one has a plane region $R$ and its boundary curve(s). The boundary curves of a region can be somewhat complicated. The simplest situation is where the domain $R$ is simply connected. This means that $R$ is the region enclosed by one curve $C$ (the curve $C$ is not allowed to intersect itself). Another way of describing a simply connected region is to say that it “has no holes.” See Figure 15. If a domain is not simply connected, then its boundary may consist of more than one curve (Figure 15 on the right).

Green’s theorem. Let $R$ be a simply connected region in the plane, and let $C$ be the boundary curve of the region $R$, with the counterclockwise orientation. Let
$$\vec v(x,y) = P(x,y)\,\vec e_1 + Q(x,y)\,\vec e_2$$
be a vector field that is defined and has continuous derivatives everywhere in $R$. Then one has
$$\oint_C P(x,y)\,dx + Q(x,y)\,dy = \iint_R\left\{\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right\}dA. \tag{164}$$
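A quick numerical check can make (164) more concrete. The Python sketch below is only an illustration, built on our own choices. The field $P = -y^3$, $Q = x^3$ and the unit disk as region are picked for convenience, so that $Q_x - P_y = 3x^2 + 3y^2$, and both sides of (164) are approximated with the midpoint rule. Both printed numbers should be close to $\frac32\pi$.

```python
import math


def circulation_unit_circle(P, Q, n=4000):
    """Left side of (164): the integral of P dx + Q dy around the unit circle,
    counterclockwise, x = cos t, y = sin t (midpoint rule in t)."""
    dt = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        x, y = math.cos(t), math.sin(t)
        # dx = -sin t dt, dy = cos t dt
        total += (P(x, y) * -math.sin(t) + Q(x, y) * math.cos(t)) * dt
    return total


def double_integral_unit_disk(g, nr=400, nt=400):
    """Right side of (164) for a given integrand g = Q_x - P_y, computed in
    polar coordinates with dA = r dr dtheta (midpoint rule in r and theta)."""
    dr, dth = 1.0 / nr, 2 * math.pi / nt
    total = 0.0
    for i in range(nr):
        r = (i + 0.5) * dr
        for j in range(nt):
            th = (j + 0.5) * dth
            total += g(r * math.cos(th), r * math.sin(th)) * r * dr * dth
    return total


# Example field (chosen for illustration): P = -y^3, Q = x^3.
lhs = circulation_unit_circle(lambda x, y: -y ** 3, lambda x, y: x ** 3)
rhs = double_integral_unit_disk(lambda x, y: 3 * x ** 2 + 3 * y ** 2)
print(lhs, rhs, 1.5 * math.pi)
```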

The second form of Green’s Theorem is about flux integrals and is often called the “divergence theorem.”


Figure 15. Left: A simply connected domain $R$ with boundary curve $C$, i.e. a domain “without holes.” Right: a non-simply connected domain $R$, i.e. “a domain with a hole.” For this non-simply connected domain the boundary consists of two closed curves $C_1$ and $C_2$ rather than one.

Flux version of Green’s theorem. Let $R$ be a bounded domain in the plane that is enclosed by a curve $C$. If
$$\vec v = \begin{pmatrix}P(x,y)\\ Q(x,y)\end{pmatrix}$$
is a vector field that is everywhere defined and differentiable on $R$, then
$$\oint_C\vec v\cdot\vec N\,ds = \iint_R\left\{\frac{\partial P}{\partial x} + \frac{\partial Q}{\partial y}\right\}dA \tag{165}$$
where $\vec N$ is the outward unit normal for the domain $R$.

The quantity
$$\frac{\partial P}{\partial x} + \frac{\partial Q}{\partial y} \tag{166}$$
is called the divergence of the vector field $\vec v$, and is written as “$\operatorname{div}\vec v$.” It is one of several combinations of partial derivatives of vector fields that turn out to be useful. See § 16 for more of these.
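The flux version (165) can be checked numerically in the same spirit. In the Python sketch below, the field $\vec v = (x^2, xy)$ and the unit square are arbitrary choices, not taken from the notes. The left side of (165) is the sum of the fluxes through the four edges. The right side integrates the divergence (166), approximated by central differences. Both values should be close to $\frac32$, since $\operatorname{div}\vec v = 3x$.

```python
def outward_flux_unit_square(P, Q, n=1000):
    """Left side of (165) for the unit square 0 <= x, y <= 1: add up v . N ds
    over the four edges (midpoint rule), with N the outward unit normal."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        total += P(1.0, t) * h    # right edge,  N = +e1
        total -= P(0.0, t) * h    # left edge,   N = -e1
        total += Q(t, 1.0) * h    # top edge,    N = +e2
        total -= Q(t, 0.0) * h    # bottom edge, N = -e2
    return total


def divergence_integral_unit_square(P, Q, n=200, eps=1e-5):
    """Right side of (165): the double integral of div v = P_x + Q_y over the
    unit square, with the partial derivatives taken by central differences."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            Px = (P(x + eps, y) - P(x - eps, y)) / (2 * eps)
            Qy = (Q(x, y + eps) - Q(x, y - eps)) / (2 * eps)
            total += (Px + Qy) * h * h
    return total


# Example field (chosen for illustration): v = (x^2, xy), so div v = 3x.
P = lambda x, y: x ** 2
Q = lambda x, y: x * y
print(outward_flux_unit_square(P, Q), divergence_integral_unit_square(P, Q))
```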

10.2. Examples illustrating Green’s Theorem.

An example where the line integral vanishes on any closed curve. Consider the vector field
$$\vec F(x,y) = x\,\vec e_1 + y\,\vec e_2 = \begin{pmatrix}x\\ y\end{pmatrix},$$
and let $C$ be a closed curve in the plane that encloses the region $R$. Then the line integral of $\vec F$ along $C$ is given by
$$\oint_C\vec F\cdot d\vec x = \iint_R\left\{\frac{\partial y}{\partial x} - \frac{\partial x}{\partial y}\right\}dA = \iint_R 0\,dA = 0.$$


We find that the integral is always zero, no matter what the region $R$ is. If we were lucky enough to note that this particular vector field is a gradient,
$$\vec F = \begin{pmatrix}x\\ y\end{pmatrix} = \vec\nabla\left(\tfrac12x^2 + \tfrac12y^2\right),$$
then we could also have used (159) to conclude that $\oint_C\vec F\cdot d\vec x = 0$ for any closed curve $C$.

The expanding gas example again. Let
$$\vec v(x,y) = \frac VR\,\vec x = \frac VR\,x\,\vec e_1 + \frac VR\,y\,\vec e_2$$
be the velocity field of the expanding gas from § 9.4, and let $C$ be any closed curve that is the boundary curve of some domain $R$. We again compute the flux of the velocity field across the curve $C$ in the direction of its outward normal, but this time we use Green’s Theorem.

Figure 16. A “gas” is flowing in the plane with velocity field $\vec v$. At what rate is gas flowing out of the shaded region? The answer turns out to be proportional to the area of the region.

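According to Green’s Theorem we have
$$\oint_C\vec v\cdot\vec N\,ds = \iint_R\operatorname{div}\vec v\,dA$$
where $\operatorname{div}\vec v$ is the divergence of $\vec v$, defined in (166). Thus
$$\operatorname{div}\vec v = \frac{\partial v_1}{\partial x} + \frac{\partial v_2}{\partial y} = \frac{\partial\{Vx/R\}}{\partial x} + \frac{\partial\{Vy/R\}}{\partial y} = \frac VR + \frac VR = 2\frac VR,$$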

and finally,
$$\oint_C\vec v\cdot\vec N\,ds = \iint_R 2\frac VR\,dA = 2\frac VR\cdot\text{area of }R. \tag{167}$$
This is consistent with our previous computation in § 9.4. There we found in (163) that the amount of fluid produced in an annulus of inner and outer radii $a$ and $b$ is $2\pi\frac VR(b^2-a^2)$. Since the area of the annulus is $\pi b^2 - \pi a^2$, this is the same result that we just found in (167).
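The key fact in this example is that $\operatorname{div}\vec v$ is the same constant $2V/R$ at every point, so the flux depends only on the area of $R$. The Python sketch below checks this with central differences. The helper name `divergence_2d` is hypothetical, and $V = 3$, $R = 2$ are arbitrary sample values.

```python
def divergence_2d(v, x, y, h=1e-6):
    """Central-difference approximation of div v = dv1/dx + dv2/dy at (x, y)."""
    dv1_dx = (v(x + h, y)[0] - v(x - h, y)[0]) / (2 * h)
    dv2_dy = (v(x, y + h)[1] - v(x, y - h)[1]) / (2 * h)
    return dv1_dx + dv2_dy


V, R = 3.0, 2.0                        # sample constants
expanding = lambda x, y: (V * x / R, V * y / R)
for point in [(0.0, 0.0), (1.0, -2.0), (5.0, 7.0)]:
    print(point, divergence_2d(expanding, *point))   # about 2V/R = 3.0 everywhere
```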

11. Conservative vector fields and Clairaut’s theorem

Let $\vec F(x,y) = P(x,y)\,\vec e_1 + Q(x,y)\,\vec e_2$ be a vector field on some region $R$ in the plane. The fundamental theorem for line integrals and Clairaut’s Theorem (III.13.3) provide connections between conservative vector fields, gradient vector fields, and the partial derivatives of $P$ and $Q$. To summarize what we have seen so far, recall that. . .

• if $\vec F = \vec\nabla f$ for some function $f(x,y)$, then $\vec F$ is conservative,
• if $\vec F$ is conservative, then $\vec F = \vec\nabla f$ for some function $f(x,y)$,
• if $\vec F = \vec\nabla f$, then $\dfrac{\partial P}{\partial y} = \dfrac{\partial Q}{\partial x}$.

Looking at this list, we see that the missing statement would be that “$P_y = Q_x$ implies that $\vec F$ is a gradient vector field.” This turns out to be true only if we impose an extra assumption on the domain $R$, namely, $R$ must be simply connected (see § 10.1). We formulate this more precisely in a theorem.

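11.1. Theorem. If the domain $R$ is simply connected and if $\vec F = P\,\vec e_1 + Q\,\vec e_2$ is a vector field on $R$ for which
$$\frac{\partial P}{\partial y} = \frac{\partial Q}{\partial x}, \tag{168}$$
then $\vec F$ is conservative, and hence $\vec F = \vec\nabla f$ for some function $f$.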

The proof is an instructive application of Green’s theorem, so we include it here:

Proof. We will show that (168) implies that $\vec F$ is conservative, i.e. that the line integral of $\vec F$ around any closed curve in $R$ vanishes.

Let $C$ be a closed curve in $R$, and assume to begin with that the curve does not intersect itself. Then it must enclose a domain $D$, and since $R$ is simply connected, the domain $D$ enclosed by the curve $C$ lies entirely within $R$. We can therefore apply Green’s theorem to the curve $C$ and conclude that
$$\oint_C\vec F\cdot d\vec x = \oint_C P\,dx + Q\,dy = \iint_D\left\{\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right\}dA = 0.$$
This is what we have to show. For a complete proof, we would still have to remove the assumption that the curve $C$ does not intersect itself. We will not do this in detail. We merely point out that if $C$ has one self intersection, then one can break the curve into pieces, each of which forms a closed curve without self intersections. The previous argument applies to each of these pieces. ∎

Figure 17. Left: In the proof of Theorem 11.1, the case that $C$ has no self intersections (the curve $C$ in $R$ encloses the domain $D$). Right: the case where $C$ has at least one self intersection ($C = C_1 + C_2$).


12. Problems

1. Use Green’s theorem to compute the line integrals
$$I = \oint_C y\,dx - x\,dy,\qquad J = \oint_{-C} y\,dx - x\,dy,\qquad K = \oint_C (x - \sin y)\,dy$$
where $C$ is this curve:

[Figure: the curve $C$, drawn with both axes marked at $1$ and $-1$.]

In this drawing the circle has radius 1, and the height of the triangle is also 1. The orientation of the curve is in the direction of the arrows.

2. Let $R$ be the unit square, i.e. $R = \{(x,y) : 0\le x, y\le 1\}$. Let $C$ be the boundary of the square $R$, traversed in the counterclockwise sense.

(a) Compute $\int_C 2y\,dx + 3x\,dy$ by finding parametrizations of the edges and applying the definition of the line integral.

(b) Compute $\int_C 2y\,dx + 3x\,dy$ by applying Green’s theorem and computing a suitable double integral over $R$. •

3. Compute $\oint_C\vec\nabla(x^2y^2)\cdot\vec T\,ds$, where $C$ is the counterclockwise traversed boundary of the region $R$ defined by $x^2+y^2<16$. •

4. A gas is flowing in the plane with velocity field
$$\vec v(x,y) = \begin{pmatrix}1\\ -y\end{pmatrix}.$$

(a) Draw the vector field.

(b) How much gas flows out of the rectangle $R$ defined by $0<x<L$, $-H<y<H$?

5. In each of the following problems, $C$ is the counterclockwise traversed boundary of the region $R$, and you are asked to compute the indicated line integral in two ways: directly, and by using Green’s Theorem.

(a) $\oint_C xy\,dx + xy\,dy$, $R: 0\le x, y\le 1$. •

(b) $\oint_C e^{2x+3y}\,dx + e^{xy}\,dy$, $R: -2\le x\le 2,\ -1\le y\le 1$. •

(c) $\oint_C\vec F\cdot\vec T\,ds$, $\vec F(x,y) = \begin{pmatrix}y\cos x\\ y\sin x\end{pmatrix}$, $R: 0\le x\le\pi/2,\ 1\le y\le 2$. •

(d) $\oint_C xy^2\,dx + x^2y\,dy$, $R: 0\le x\le 1,\ 0\le y\le x$. •

(e) $\oint_C x^2y\,dx + xy^2\,dy$, $R: 0\le x\le 1,\ 0\le y\le x$. •

(f) $\oint_C x\sqrt y\,dx + \sqrt{x+y}\,dy$, $R: 1\le x\le 2,\ 2x\le y\le 4$. •

(g) $\oint_C (x/y)\,dx + (2+3x)\,dy$, $R: 1\le x\le 2,\ 1\le y\le x^2$. •

(h) $\oint_C \sin y\,dx + \sin x\,dy$, $R: 0\le x\le\pi/2,\ x\le y\le\pi/2$. •

(i) $\oint_C x\ln y\,dx$, $R: 1\le x\le 2,\ e^x\le y\le e^{x^2}$. •

(j) $\oint_C\sqrt{1+x^2}\,dy$, $R: -1\le x\le 1,\ x^2\le y\le 1$. •

(k) $\oint_C x^2y\,dx - xy^2\,dy$, $R: x^2+y^2\le 1$. •

(l) $\oint_C\vec v\cdot\vec N\,ds$, $\vec v(x,y) = \begin{pmatrix}xy^2\\ x^2y\end{pmatrix}$, $R: x^2+y^2\le 1$, $\vec N$ the outward normal. •

(m) $\oint_C y^3\,dx + 2x^3\,dy$, $R: x^2+y^2\le 4$. •

13. Surfaces and Surface integrals

In addition to integrals over two and three dimensional domains, and line integrals over curves in the plane or in space, one can also integrate over surfaces. In this section we will give a quick introduction to surfaces and surface integrals. For an in-depth study of the subject, students should consider taking a more advanced course on vector calculus, such as Math 321.

Figure 18. Two dimensional surfaces.

13.1. Surfaces and surface patches. We can think of a curve as the result of taking a line and bending it into some curved shape. In the same way, a surface can be thought of as the result of taking a portion of a flat plane and bending and twisting it into some other shape. Just as some curves appear as the boundaries (or edges) of plane domains, some surfaces appear as boundaries of domains in three dimensional space. For example, the sphere centered at the origin and with radius $R$,
$$x^2+y^2+z^2 = R^2, \tag{169}$$
is the boundary of the three dimensional ball it encloses.

Surfaces can be described using “defining equations,” i.e. by specifying an equation whose solution set is the intended surface. For example, the sphere of radius $R$ has (169) as defining equation. For purposes of integration, it is more convenient to represent surfaces in terms of surface patches. These are the surface analog of parametrized curves.

Definition. A surface patch is a differentiable vector function of two variables
$$\vec x = \vec x(u,v),\qquad a\le u\le b,\quad c\le v\le d.$$

13.2. Example – the graph of a function is a surface patch. If $z = f(x,y)$ is a function defined for $a\le x\le b$, $c\le y\le d$, then its graph can be thought of as a surface patch, where
$$\vec x(u,v) = \begin{pmatrix}u\\ v\\ f(u,v)\end{pmatrix}. \tag{170}$$
In words: we take the $x$ and $y$ coordinates as parameters, setting $x = u$ and $y = v$. The $z$ component of any point on the patch is then $z = f(x,y) = f(u,v)$.


Figure 19. A surface patch. A vector function $\vec x$ of two variables $u$ and $v$ maps a piece of the $uv$-plane (the rectangle $a\le u\le b$, $c\le v\le d$) into three dimensional space, $(u,v)\mapsto\vec x(u,v)$. The rectangular grid in the $uv$ domain gets mapped onto a network of curves on the surface patch $S$: the images of the lines with $v$ constant, $a\le u\le b$, and of those with $u$ constant, $c\le v\le d$. If the rectangular grid in the $uv$-domain is sufficiently fine, then the corresponding curves on the surface divide the surface patch into small pieces that are approximately parallelograms.

Figure 20. A graph as a surface patch: the graph of a function $z = f(x,y)$ can be represented as a surface patch. The vector function $\vec x$ that parametrizes the graph is $\vec x(u,v) = u\,\vec e_1 + v\,\vec e_2 + f(u,v)\,\vec e_3$.

13.3. Example – the sphere as a surface patch. The sphere is a two dimensional surface, and one way to parametrize it is to use spherical coordinates. Thus
$$\vec x(\theta,\varphi) = \begin{pmatrix}R\cos\varphi\sin\theta\\ R\sin\varphi\sin\theta\\ R\cos\theta\end{pmatrix}\quad\text{with}\quad 0\le\theta\le\pi,\ 0\le\varphi\le2\pi \tag{171}$$
is a surface patch that parametrizes the sphere. See §VI-6.2, where spherical coordinates were defined, and see Figure 21 for a picture.

All points with $\theta = 0$ are mapped to the “north pole”; all points with $\theta = \pi$ correspond to the “south pole”; the points with $\theta = \frac12\pi$ form the “equator.”


Figure 21. Sphere: a piece of the sphere parametrized by the surface patch in (171). Shown is the piece with $0.1\pi\le\theta\le0.9\pi$ and $0.1\pi\le\varphi\le1.9\pi$.

13.4. Area of a surface patch. For any given surface we can ask “what is its surface area?” The intuitive interpretation of this could be
$$\text{“how much paint do we need to cover one side of the surface?”} \tag{172}$$
or
$$\text{“how much paper do we need to make the surface?”} \tag{173}$$
Neither interpretation stands up to closer scrutiny. There are surfaces, like the Möbius strip in Figure 22, that only have one side, so that questions (172) and (173) will give different answers. On the other hand, while it is possible to take a flat piece of paper and bend it into the shape of a cylinder, a cone, or a Möbius strip, it is not possible to bend a flat piece of paper into a sphere without ripping or stretching it (and thus changing its area).

In spite of these (and other) issues, we will argue from intuition and derive a formula for the area of a surface patch. The story is very similar to the derivation of the arc length of a parametrized curve in § II.13.

Figure 22. A Möbius strip. What is the surface area of this strip, and how many square inches of paper do we need to make one?

If $\vec x(u,v)$ is a surface patch with domain $a\le u\le b$, $c\le v\le d$, then we divide its domain into many small rectangular pieces of size $\Delta u$ by $\Delta v$ by partitioning both the $u$ and $v$ intervals. See the left half of Figure 23. This leads to a partitioning of the surface patch into small regions, each of which is approximately a parallelogram (on the right in Figure 23). We compute the area of the surface patch by adding the areas of all these smaller pieces. Since any such piece is approximately a parallelogram, we can find its area by computing the cross product of the vectors defined by its edges.

To find these edges, consider Figure 24. In a small partition piece on the surface patch, the parameter $u$ is allowed to vary between some value $u_0$ and $u_0+\Delta u$, while the other parameter $v$ is allowed to vary between some $v_0$ and $v_0+\Delta v$. One edge of the surface patch (on the right in Figure 24) represents the change in $\vec x(u,v)$ as $u$ is increased by $\Delta u$ while $v$ is kept constant, i.e. it is
$$\vec x(u_0+\Delta u, v_0) - \vec x(u_0,v_0) \approx \frac{\partial\vec x}{\partial u}(u_0,v_0)\,\Delta u.$$
The other edge represents the change in $\vec x(u,v)$ when $v$ is increased by $\Delta v$, and is thus given by
$$\vec x(u_0, v_0+\Delta v) - \vec x(u_0,v_0) \approx \frac{\partial\vec x}{\partial v}(u_0,v_0)\,\Delta v.$$

Figure 23. Computing the area and normal to a surface patch. The small rectangle of size $\Delta u$ by $\Delta v$ in the $uv$-domain gets mapped to a small region on the surface patch $S$. This small region is almost a parallelogram, with area $\Delta A$ and unit normal $\vec N$, whose sides are given by the vectors $\vec x_u\Delta u$ and $\vec x_v\Delta v$.

Figure 24. The small blue rectangle in the $uv$-plane from Figure 23, with corners at $(u_0,v_0)$ and $(u_0+\Delta u, v_0+\Delta v)$, and its image on the surface patch, whose edges are $\vec x(u_0+\Delta u,v_0) - \vec x(u_0,v_0)$ and $\vec x(u_0,v_0+\Delta v) - \vec x(u_0,v_0)$.


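The area of the small parallelogram on the surface patch is therefore the length of the cross product of these two vectors:
$$\Delta A \approx \left\|\frac{\partial\vec x}{\partial u}(u_0,v_0)\,\Delta u\times\frac{\partial\vec x}{\partial v}(u_0,v_0)\,\Delta v\right\| = \|\vec x_u\times\vec x_v\|\,\Delta u\,\Delta v.$$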

Adding this over all pieces that make up the surface patch gives us the total area of the patch:
$$\text{Area of }S = \int_c^d\int_a^b\|\vec x_u\times\vec x_v\|\,du\,dv. \tag{174}$$
The quantity that appears in this integral appears in many other surface integrals and is called “the area element” of the surface patch $\vec x$. The usual notation for this quantity is
$$dA = \|\vec x_u\times\vec x_v\|\,du\,dv, \tag{175}$$
and it is thought of as the “area of an infinitesimally small piece of the surface.”
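Formula (174) can also be evaluated numerically when the integral is hard to do by hand. The Python sketch below is our own illustration, not a method from the notes. It approximates $\vec x_u$ and $\vec x_v$ by central differences and sums $\|\vec x_u\times\vec x_v\|\,\Delta u\,\Delta v$ over a midpoint grid. We apply it to the sphere patch (171), writing $t=\theta$ and $p=\varphi$ and taking the sample radius $R = 2$. The result should be close to $4\pi R^2$.

```python
import math


def cross(p, q):
    """Cross product of two 3-vectors given as sequences."""
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def surface_area(x, a, b, c, d, n=200, h=1e-6):
    """Approximate (174), the integral of |x_u x x_v| over a<=u<=b, c<=v<=d,
    by the midpoint rule on an n-by-n grid. The partial derivatives x_u and
    x_v are approximated by central differences with step h."""
    du, dv = (b - a) / n, (d - c) / n
    total = 0.0
    for i in range(n):
        u = a + (i + 0.5) * du
        for j in range(n):
            v = c + (j + 0.5) * dv
            xu = [(p - q) / (2 * h) for p, q in zip(x(u + h, v), x(u - h, v))]
            xv = [(p - q) / (2 * h) for p, q in zip(x(u, v + h), x(u, v - h))]
            total += math.sqrt(sum(w * w for w in cross(xu, xv))) * du * dv  # dA
    return total


R = 2.0
sphere = lambda t, p: (R * math.cos(p) * math.sin(t),
                       R * math.sin(p) * math.sin(t),
                       R * math.cos(t))
print(surface_area(sphere, 0.0, math.pi, 0.0, 2 * math.pi), 4 * math.pi * R ** 2)
```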

13.5. Surface integrals. If $f(x,y,z)$ is some function that is defined on the surface (e.g. a density of some kind), then one defines its integral over the surface to be
$$\iint_S f(x,y,z)\,dA = \int_c^d\int_a^b f(\vec x(u,v))\,\|\vec x_u\times\vec x_v\|\,du\,dv. \tag{176}$$
Here $f(\vec x(u,v))$ is the result of substituting the surface parametrization $\vec x(u,v)$ in the function.
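Here is a similar sketch for (176). This time the partial derivatives $\vec x_u$, $\vec x_v$ are supplied by hand instead of by finite differences; the function name and the test integrand are our own choices. For the sphere (171) we use the partials $\vec x_\theta$, $\vec x_\varphi$ that are worked out in § 14.1. For comparison: by symmetry, the integral of $z^2$ over the sphere of radius $R$ is one third of the integral of $x^2+y^2+z^2 = R^2$, i.e. $\frac43\pi R^4$.

```python
import math


def surface_integral(f, x, x_u, x_v, a, b, c, d, n=200):
    """Approximate (176), the integral of f over the patch x(u, v),
    a <= u <= b, c <= v <= d, by the midpoint rule on an n-by-n grid.
    x_u and x_v are the partial derivatives of x, supplied by the caller."""
    du, dv = (b - a) / n, (d - c) / n
    total = 0.0
    for i in range(n):
        u = a + (i + 0.5) * du
        for j in range(n):
            v = c + (j + 0.5) * dv
            p, q = x_u(u, v), x_v(u, v)
            m = (p[1] * q[2] - p[2] * q[1],          # m = x_u x x_v
                 p[2] * q[0] - p[0] * q[2],
                 p[0] * q[1] - p[1] * q[0])
            dA = math.sqrt(m[0] ** 2 + m[1] ** 2 + m[2] ** 2) * du * dv
            total += f(*x(u, v)) * dA                # f(x(u, v)) dA
    return total


# Sphere of radius R via (171), with the partials x_theta, x_phi computed by hand.
R = 1.5
sph = lambda t, p: (R * math.cos(p) * math.sin(t), R * math.sin(p) * math.sin(t), R * math.cos(t))
sph_t = lambda t, p: (R * math.cos(p) * math.cos(t), R * math.sin(p) * math.cos(t), -R * math.sin(t))
sph_p = lambda t, p: (-R * math.sin(p) * math.sin(t), R * math.cos(p) * math.sin(t), 0.0)
print(surface_integral(lambda x, y, z: z * z, sph, sph_t, sph_p, 0.0, math.pi, 0.0, 2 * math.pi),
      4 * math.pi * R ** 4 / 3)
```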

13.6. Unit normal to a surface patch. From Figures 23 and 24 it appears that both vectors $\vec x_u$ and $\vec x_v$ are tangent to the surface, and that their cross product $\vec x_u\times\vec x_v$ is perpendicular to the surface. We adopt this as the definition of the tangent plane and normal direction to the surface:

Definition. Let $\vec x$ be a surface patch, and let $X_0$ be a point with position vector $\vec x(u_0,v_0)$ on the surface patch. If
$$\vec m \overset{\text{def}}{=} \vec x_u(u_0,v_0)\times\vec x_v(u_0,v_0)\neq\vec 0,$$
then the vector $\vec m$ defines the normal direction to the surface. The tangent plane to the surface through $X_0$ is the plane with normal vector $\vec m$ that goes through $X_0$.

In general the vector $\vec m$ does not have unit length, and one often needs a normal vector with length one for the surface. Thus one defines
$$\vec N = \frac{\vec m}{\|\vec m\|} = \frac{\vec x_u\times\vec x_v}{\|\vec x_u\times\vec x_v\|} \tag{177}$$
to be the unit normal for the surface patch $\vec x$. Note that $-\vec N$ is also a unit vector that is normal to the surface.


13.7. Flux across a surface patch. In § 9 we defined the flux of a vector field $\vec v$ across a curve, where we think of $\vec v$ as the velocity field of some flowing liquid or gas. The set-up in § 9 was purely two dimensional. Now that we have introduced surface integrals, we can formulate the same concept for the more realistic situation of a fluid flowing through three dimensional space with velocity field $\vec v$. We define the flux of a vector field $\vec v$ across a surface patch to be
$$\text{Flux} = \iint_S\vec v\cdot\vec N\,dA \tag{178}$$
We have expressions for both $\vec N$ and $dA$ (namely, (177) and (175)). When put together, they simplify to
$$\vec N\,dA = \frac{\vec x_u\times\vec x_v}{\|\vec x_u\times\vec x_v\|}\cdot\|\vec x_u\times\vec x_v\|\,du\,dv = \vec x_u\times\vec x_v\,du\,dv.$$
Therefore the flux integral can be computed as
$$\iint_S\vec v\cdot\vec N\,dA = \int_c^d\int_a^b\vec v\cdot(\vec x_u\times\vec x_v)\,du\,dv, \tag{179}$$
where $\vec v$ is evaluated at $\vec x(u,v)$.
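Formula (179) is the convenient one for computations, because it needs neither the normalization in (177) nor the square root in (175). A Python sketch of it follows; it is our own illustration, and `surface_flux` is a hypothetical name. For the expanding gas $\vec v = \frac{V_0}{R_0}\vec x$ across the sphere of radius $R$, the flux computed in § 14.2 is $4\pi\frac{V_0}{R_0}R^3$. The sample values $V_0 = 1$, $R_0 = 2$, $R = 3$ below are arbitrary.

```python
import math


def surface_flux(v, x, x_u, x_v, a, b, c, d, n=200):
    """Approximate (179), the integral of v(x(u,v)) . (x_u x x_v) du dv, by the
    midpoint rule. The orientation is the one of (177): N points along
    x_u x x_v, so swapping the roles of u and v flips the sign."""
    du, dv = (b - a) / n, (d - c) / n
    total = 0.0
    for i in range(n):
        u = a + (i + 0.5) * du
        for j in range(n):
            s = c + (j + 0.5) * dv           # second parameter
            p, q = x_u(u, s), x_v(u, s)
            m = (p[1] * q[2] - p[2] * q[1],  # m = x_u x x_v, so N dA = m du dv
                 p[2] * q[0] - p[0] * q[2],
                 p[0] * q[1] - p[1] * q[0])
            w = v(*x(u, s))
            total += (w[0] * m[0] + w[1] * m[1] + w[2] * m[2]) * du * dv
    return total


V0, R0, R = 1.0, 2.0, 3.0
gas = lambda x, y, z: (V0 / R0 * x, V0 / R0 * y, V0 / R0 * z)
sph = lambda t, p: (R * math.cos(p) * math.sin(t), R * math.sin(p) * math.sin(t), R * math.cos(t))
sph_t = lambda t, p: (R * math.cos(p) * math.cos(t), R * math.sin(p) * math.cos(t), -R * math.sin(t))
sph_p = lambda t, p: (-R * math.sin(p) * math.sin(t), R * math.cos(p) * math.sin(t), 0.0)
print(surface_flux(gas, sph, sph_t, sph_p, 0.0, math.pi, 0.0, 2 * math.pi),
      4 * math.pi * V0 / R0 * R ** 3)
```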

14. Examples

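14.1. Area and unit normal of a sphere. The sphere with radius $R$ can be represented by the surface patch
$$\vec x(\theta,\varphi) = \begin{pmatrix}R\cos\varphi\sin\theta\\ R\sin\varphi\sin\theta\\ R\cos\theta\end{pmatrix}, \tag{180}$$
for which we have
$$\vec x_\theta = R\begin{pmatrix}\cos\varphi\cos\theta\\ \sin\varphi\cos\theta\\ -\sin\theta\end{pmatrix},\qquad \vec x_\varphi = R\begin{pmatrix}-\sin\varphi\sin\theta\\ \cos\varphi\sin\theta\\ 0\end{pmatrix}$$
and hence
$$\vec x_\theta\times\vec x_\varphi = R^2\begin{pmatrix}\cos\varphi\sin^2\theta\\ \sin\varphi\sin^2\theta\\ \cos^2\varphi\sin\theta\cos\theta + \sin^2\varphi\sin\theta\cos\theta\end{pmatrix} = R^2\begin{pmatrix}\cos\varphi\sin^2\theta\\ \sin\varphi\sin^2\theta\\ \sin\theta\cos\theta\end{pmatrix} = R^2\sin\theta\begin{pmatrix}\cos\varphi\sin\theta\\ \sin\varphi\sin\theta\\ \cos\theta\end{pmatrix}.$$
The vector in the last expression has length one. Since $\sin\theta\ge0$ for $0\le\theta\le\pi$, the length of $\vec x_\theta\times\vec x_\varphi$ is
$$\|\vec x_\theta\times\vec x_\varphi\| = R^2\sin\theta\left\|\begin{pmatrix}\cos\varphi\sin\theta\\ \sin\varphi\sin\theta\\ \cos\theta\end{pmatrix}\right\| = R^2\sin\theta,$$
and the area element on the sphere is
$$dA = R^2\sin\theta\,d\theta\,d\varphi.$$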


Integrating over the sphere gives us the area of the sphere:
$$\text{Area of sphere} = \int_{\varphi=0}^{2\pi}\int_{\theta=0}^{\pi}R^2\sin\theta\,d\theta\,d\varphi = 4\pi R^2, \tag{181}$$
which is the familiar answer.

We also find from our formula for $\vec x_\theta\times\vec x_\varphi$ that the unit normal at the point with position vector $\vec x(\theta,\varphi)$ is
$$\vec N = \frac{\vec x_\theta\times\vec x_\varphi}{\|\vec x_\theta\times\vec x_\varphi\|} = \begin{pmatrix}\cos\varphi\sin\theta\\ \sin\varphi\sin\theta\\ \cos\theta\end{pmatrix}.$$

Figure 25. The unit normal at a point on a sphere centered at the origin has the same direction as the position vector of the point: $\vec N = \vec x/R$.

Looking back at the definition (180) of our surface patch we see that
$$\vec x = R\vec N,\quad\text{or,}\quad\vec N = \frac{\vec x}{R}.$$
In words: the unit normal is just the position vector $\vec x$ rescaled to length one. Perhaps with hindsight, this should be clear from a drawing of the sphere (e.g. Figure 25). In many geometrically simple situations it is easier to guess the unit normal from a drawing than to go through a computation like the one we did in this example. Sometimes it is even possible to compute the area element without working out $\vec x_u$, $\vec x_v$, and their cross product. For instance, it is possible to derive our formula for the area element $dA = R^2\sin\theta\,d\theta\,d\varphi$ from a drawing like Figure VI.16.

14.2. The flux of a vector field across the sphere. We consider the velocity field of the expanding gas from § 9.4 again, except that we now consider a gas occupying three dimensional space:
$$\vec v = \frac{V_0}{R_0}\,\vec x.$$
Here $V_0$ and $R_0$ are constants: $V_0$ is the velocity of the gas when it has reached distance $R_0$ from the origin.

We compute the flux
$$\text{Flux} = \iint_{S_R}\vec v\cdot\vec N\,dA$$
of this velocity field across the sphere $S_R$ with radius $R$ in two ways.


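First, we use the formula for $\vec N\,dA$,
$$\vec N\,dA = \vec x_\theta\times\vec x_\varphi\,d\varphi\,d\theta = R^2\sin\theta\begin{pmatrix}\cos\varphi\sin\theta\\ \sin\varphi\sin\theta\\ \cos\theta\end{pmatrix}d\varphi\,d\theta.$$
On the sphere we have $\vec x = R\,(\cos\varphi\sin\theta,\ \sin\varphi\sin\theta,\ \cos\theta)$, and we compute
$$\begin{aligned}
\text{Flux} &= \int_{\theta=0}^{\pi}\int_{\varphi=0}^{2\pi}\frac{V_0}{R_0}\,\vec x\cdot R^2\sin\theta\begin{pmatrix}\cos\varphi\sin\theta\\ \sin\varphi\sin\theta\\ \cos\theta\end{pmatrix}d\varphi\,d\theta\\
&= \frac{V_0}{R_0}R^2\int_{\theta=0}^{\pi}\int_{\varphi=0}^{2\pi}R\begin{pmatrix}\cos\varphi\sin\theta\\ \sin\varphi\sin\theta\\ \cos\theta\end{pmatrix}\cdot\begin{pmatrix}\cos\varphi\sin\theta\\ \sin\varphi\sin\theta\\ \cos\theta\end{pmatrix}\sin\theta\,d\varphi\,d\theta\\
&= \frac{V_0}{R_0}R^3\int_{\theta=0}^{\pi}\int_{\varphi=0}^{2\pi}\sin\theta\,d\varphi\,d\theta\\
&= 4\pi\frac{V_0}{R_0}R^3.
\end{aligned}$$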

The second approach is more geometrical and avoids computing any integrals. We begin by noting that the unit normal on the sphere at the point with position vector $\vec x$ is $\vec N = \vec x/R$, and hence that
$$\vec v\cdot\vec N = \frac{V_0}{R_0}\,\vec x\cdot\frac{\vec x}{R} = \frac{V_0}{R_0}\,\frac{\vec x\cdot\vec x}{R} = \frac{V_0}{R_0}\,\frac{R^2}{R} = V_0\frac{R}{R_0}.$$
The quantity we want to integrate is therefore constant. We find that the flux is
$$\text{Flux} = \iint_S V_0\frac{R}{R_0}\,dA = V_0\frac{R}{R_0}\cdot\text{Area of }S = V_0\frac{R}{R_0}\cdot4\pi R^2,$$
which is the same as we got using the first approach.

15. The divergence theorem and Stokes’ theorem

15.1. The divergence theorem in three dimensions. Suppose $S$ is a surface that encloses a three dimensional region $R$, $\vec v$ is a vector field that is defined and differentiable on all of $R$, and $\vec N$ is the outward unit normal on $S$. Then
$$\iint_S\vec v\cdot\vec N\,dA = \iiint_R\operatorname{div}\vec v\,dV \tag{182}$$
where $\operatorname{div}\vec v$ is the divergence of the vector field $\vec v$.

By definition, the divergence of the vector field
$$\vec v = \begin{pmatrix}v_1(x,y,z)\\ v_2(x,y,z)\\ v_3(x,y,z)\end{pmatrix} = v_1(x,y,z)\,\vec e_1 + v_2(x,y,z)\,\vec e_2 + v_3(x,y,z)\,\vec e_3$$
is
$$\operatorname{div}\vec v = \frac{\partial v_1}{\partial x} + \frac{\partial v_2}{\partial y} + \frac{\partial v_3}{\partial z}.$$


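15.2. Stokes’ Theorem. If $S$ is a surface patch, if the curve $C$ is the boundary of $S$, and if $\vec F$ is a differentiable vector field defined everywhere on the surface, then
$$\oint_C\vec F\cdot d\vec x = \iint_S(\operatorname{curl}\vec F)\cdot\vec N\,dA \tag{183}$$
where the “curl” of a vector field is defined by
$$\operatorname{curl}\vec F = \begin{pmatrix}\dfrac{\partial F_3}{\partial y} - \dfrac{\partial F_2}{\partial z}\\[1.5ex] \dfrac{\partial F_1}{\partial z} - \dfrac{\partial F_3}{\partial x}\\[1.5ex] \dfrac{\partial F_2}{\partial x} - \dfrac{\partial F_1}{\partial y}\end{pmatrix}.$$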
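The component formula for the curl is easy to mistype, so a numerical version can be useful for checking hand computations. The Python sketch below is an illustration only. It uses central differences with an arbitrarily chosen step $h$ to evaluate $\operatorname{curl}\vec F$ at a point. We try it on the Poiseuille flow of § 16.2, whose curl is computed there to be $(0, -2z, 2y)$.

```python
def curl(F, x, y, z, h=1e-5):
    """Central-difference approximation of curl F at (x, y, z), following the
    component formula above. F maps (x, y, z) to a triple (F1, F2, F3)."""
    def d(i, axis):
        # approximate d F_(i+1) / d(coordinate number axis); 0 = x, 1 = y, 2 = z
        plus, minus = [x, y, z], [x, y, z]
        plus[axis] += h
        minus[axis] -= h
        return (F(*plus)[i] - F(*minus)[i]) / (2 * h)
    return (d(2, 1) - d(1, 2),    # dF3/dy - dF2/dz
            d(0, 2) - d(2, 0),    # dF1/dz - dF3/dx
            d(1, 0) - d(0, 1))    # dF2/dx - dF1/dy


# Poiseuille flow of Section 16.2 (R = 1, v_c = 1): curl should be (0, -2z, 2y).
poiseuille = lambda x, y, z: (1 - y ** 2 - z ** 2, 0.0, 0.0)
print(curl(poiseuille, 0.3, 0.2, -0.4))     # approximately (0, 0.8, 0.4)
```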

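15.3. Example involving the divergence theorem. We return to the computation in § 14.2 of the flux of the expanding gas vector field $\vec v = \frac{V_0}{R_0}\vec x$ across the sphere $S$ of radius $R$. According to the divergence theorem we have
$$\iint_S\vec v\cdot\vec N\,dA = \iiint_B\operatorname{div}\vec v\,dV$$
where $B$ is the region enclosed by the sphere (the ball of radius $R$). The divergence of $\vec v$ is easy to compute:
$$\operatorname{div}\vec v = \frac{\partial}{\partial x}\left\{\frac{V_0x}{R_0}\right\} + \frac{\partial}{\partial y}\left\{\frac{V_0y}{R_0}\right\} + \frac{\partial}{\partial z}\left\{\frac{V_0z}{R_0}\right\} = 3\frac{V_0}{R_0}.$$
Since the divergence is constant, its integral over $B$ is easy:
$$\iiint_B\operatorname{div}\vec v\,dV = 3\frac{V_0}{R_0}\cdot\text{Volume of }B = 3\frac{V_0}{R_0}\cdot\frac43\pi R^3 = 4\pi\frac{V_0}{R_0}R^3,$$
where we have used that the volume of the ball $B$ is $\frac43\pi R^3$.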

16. $\vec\nabla$ – differentiating vector fields

The components of a vector field are functions, and therefore we can differentiate them. As we have seen in the divergence theorem and Stokes’ theorem, various combinations of the partial derivatives of vector fields turn out to be very useful. The easiest way to describe these is to introduce the so-called “nabla operator” (or “del operator”) defined by
$$\vec\nabla = \begin{pmatrix}\frac{\partial}{\partial x}\\ \frac{\partial}{\partial y}\\ \frac{\partial}{\partial z}\end{pmatrix} = \frac{\partial}{\partial x}\vec\imath + \frac{\partial}{\partial y}\vec\jmath + \frac{\partial}{\partial z}\vec k. \tag{184}$$
At first sight something is missing here: there are partial derivatives, but the function whose derivative is supposed to be taken is missing. This is intentional, and the way $\vec\nabla$ is to be interpreted is as follows:


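in any formula containing $\vec\nabla$, the partial derivatives are to be taken of all functions appearing to the right of the $\vec\nabla$.

For example, if $f(x,y,z)$ is a function of $(x,y,z)$, then
$$\vec\nabla f = \begin{pmatrix}\frac{\partial}{\partial x}\\ \frac{\partial}{\partial y}\\ \frac{\partial}{\partial z}\end{pmatrix}f(x,y,z) = \begin{pmatrix}\frac{\partial f}{\partial x}(x,y,z)\\ \frac{\partial f}{\partial y}(x,y,z)\\ \frac{\partial f}{\partial z}(x,y,z)\end{pmatrix}.$$
So $\vec\nabla f$ is the gradient of the function $f$, just as we had defined it before. Sometimes a different notation is used, namely
$$\vec\nabla f = \operatorname{grad}f.$$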

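Next, suppose we have a vector field
$$\vec v = \begin{pmatrix}P(x,y,z)\\ Q(x,y,z)\\ R(x,y,z)\end{pmatrix}.$$
What would be the result of “multiplying” $\vec\nabla$ with $\vec v$? Since we think of $\vec\nabla$ as a vector, the multiplication can be either a dot product or a cross product. If we “take the dot product” of $\vec\nabla$ and $\vec v$, we get
$$\vec\nabla\cdot\vec v = \begin{pmatrix}\frac{\partial}{\partial x}\\ \frac{\partial}{\partial y}\\ \frac{\partial}{\partial z}\end{pmatrix}\cdot\begin{pmatrix}P\\ Q\\ R\end{pmatrix} = \frac{\partial P}{\partial x} + \frac{\partial Q}{\partial y} + \frac{\partial R}{\partial z}.$$
This combination of derivatives of the components of $\vec v$ is the divergence of the vector field $\vec v$ from § 15.1, so the two notations
$$\operatorname{div}\vec v = \vec\nabla\cdot\vec v$$
are used interchangeably.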

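If we take the cross product of $\vec\nabla$ and $\vec v$, we find the so-called curl of the vector field $\vec v$,
$$\vec\nabla\times\vec v = \begin{vmatrix}\vec\imath & \frac{\partial}{\partial x} & P\\ \vec\jmath & \frac{\partial}{\partial y} & Q\\ \vec k & \frac{\partial}{\partial z} & R\end{vmatrix} = \begin{pmatrix}R_y - Q_z\\ P_z - R_x\\ Q_x - P_y\end{pmatrix}.$$
The curl of a vector field $\vec v$ is sometimes called the “rotation of $\vec v$,” and the following alternative notations are also used:
$$\vec\nabla\times\vec v = \operatorname{curl}\vec v = \operatorname{rot}\vec v.$$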

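16.1. Example – compute the divergence of $\vec v(x,y,z) = \vec x$ and $\vec w = \rho\,\vec x$. The vector fields are
$$\vec v(x,y,z) = \vec x = \begin{pmatrix}x\\ y\\ z\end{pmatrix},\quad\text{and}\quad\vec w(x,y,z) = \rho\,\vec x = \begin{pmatrix}\rho x\\ \rho y\\ \rho z\end{pmatrix},$$
in which $\rho$ is the radius from spherical coordinates, i.e.
$$\rho = \sqrt{x^2+y^2+z^2}.$$
The divergence of $\vec v$ is easy:
$$\vec\nabla\cdot\vec v = \frac{\partial x}{\partial x} + \frac{\partial y}{\partial y} + \frac{\partial z}{\partial z} = 3,\quad\text{or}\quad\operatorname{div}\vec v = 3.$$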

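The divergence of $\vec w$ is a little harder. To begin with, we have
$$\vec\nabla\cdot\vec w = \frac{\partial(\rho x)}{\partial x} + \frac{\partial(\rho y)}{\partial y} + \frac{\partial(\rho z)}{\partial z}.$$
It helps to find the partial derivatives of $\rho$ separately. They are
$$\frac{\partial\rho}{\partial x} = \frac x\rho,\qquad\frac{\partial\rho}{\partial y} = \frac y\rho,\qquad\frac{\partial\rho}{\partial z} = \frac z\rho.$$
These formulas look nicer in vector form, namely
$$\vec\nabla\rho = \begin{pmatrix}x/\rho\\ y/\rho\\ z/\rho\end{pmatrix} = \frac1\rho\begin{pmatrix}x\\ y\\ z\end{pmatrix} = \frac{\vec x}{\rho}. \tag{185}$$
(Problem 17.8 will ask you to check this.) Armed with these partial derivatives, we find
$$\frac{\partial(\rho x)}{\partial x} = \frac x\rho\,x + \rho\,\frac{\partial x}{\partial x} = \frac{x^2}{\rho} + \rho.$$
We get similar terms for $\frac{\partial(\rho y)}{\partial y}$ and $\frac{\partial(\rho z)}{\partial z}$. Adding these together leads to
$$\vec\nabla\cdot\vec w = \frac{x^2}{\rho} + \frac{y^2}{\rho} + \frac{z^2}{\rho} + 3\rho = \frac{x^2+y^2+z^2}{\rho} + 3\rho = \frac{\rho^2}{\rho} + 3\rho = 4\rho.$$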
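The result $\operatorname{div}\vec w = 4\rho$ can be spot-checked numerically. In the Python sketch below, the divergence is approximated with central differences. The point $(1,2,2)$ is an arbitrary choice; there $\rho = 3$, so the approximation should be close to $12$.

```python
import math


def divergence(F, x, y, z, h=1e-5):
    """Central-difference approximation of div F = dF1/dx + dF2/dy + dF3/dz."""
    return ((F(x + h, y, z)[0] - F(x - h, y, z)[0])
            + (F(x, y + h, z)[1] - F(x, y - h, z)[1])
            + (F(x, y, z + h)[2] - F(x, y, z - h)[2])) / (2 * h)


def w(x, y, z):
    """The field w = rho * x from Section 16.1."""
    rho = math.sqrt(x * x + y * y + z * z)
    return (rho * x, rho * y, rho * z)


print(divergence(w, 1.0, 2.0, 2.0))   # rho = 3 here, so 4 * rho = 12 is expected
```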

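16.2. Example – compute the curl of the Poiseuille flow from § 2.2. The flow is given in Equation (145). For simplicity we will assume $R = 1$ and $v_c = 1$. If we assume that the central axis is the $x$ axis, then the distance $r$ to the central axis is $r = \sqrt{y^2+z^2}$, and the velocity field in the cylinder is given by
$$\vec v(x,y,z) = \begin{pmatrix}1-y^2-z^2\\ 0\\ 0\end{pmatrix}.$$
Its curl is then
$$\vec\nabla\times\vec v = \begin{vmatrix}\vec\imath & \frac{\partial}{\partial x} & 1-y^2-z^2\\ \vec\jmath & \frac{\partial}{\partial y} & 0\\ \vec k & \frac{\partial}{\partial z} & 0\end{vmatrix} = \begin{pmatrix}0\\ -2z\\ +2y\end{pmatrix}.$$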

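16.3. The curl of a gradient always vanishes. If $f(x,y,z)$ is any function of three variables, then its gradient is a vector field. What is the curl of this vector field? The computation is straightforward:
$$\vec\nabla\times\vec\nabla f = \vec\nabla\times\begin{pmatrix}f_x\\f_y\\f_z\end{pmatrix} = \begin{vmatrix}\vec\imath & \frac{\partial}{\partial x} & f_x\\ \vec\jmath & \frac{\partial}{\partial y} & f_y\\ \vec k & \frac{\partial}{\partial z} & f_z\end{vmatrix} = \begin{pmatrix}(f_z)_y - (f_y)_z\\ (f_x)_z - (f_z)_x\\ (f_y)_x - (f_x)_y\end{pmatrix}. \tag{186}$$
We know that for any function of several variables “mixed partials are equal” (when they are continuous), meaning $(f_x)_y = (f_y)_x$, etc. Another look at the curl we just computed therefore tells us that
$$\vec\nabla\times\vec\nabla f = \vec 0, \quad\text{or}\quad \operatorname{curl}\operatorname{grad} f = \vec 0, \tag{187}$$
for any function $f$ whose second derivatives are continuous.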
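Identity (187) can also be observed numerically. In the sketch below, the function $f = x^2y + \sin(yz) + e^{xz}$ is a hypothetical example of our own choosing. Its gradient is entered by hand, and the curl of that gradient is approximated by central differences. All three components come out at the level of the discretization error, not exactly zero.

```python
import math


def grad_f(x, y, z):
    """Gradient of the sample function f = x^2 y + sin(yz) + exp(xz), computed by hand."""
    return (2 * x * y + z * math.exp(x * z),
            x * x + z * math.cos(y * z),
            y * math.cos(y * z) + x * math.exp(x * z))


def curl_numeric(v, p, h=1e-5):
    """Central-difference curl (R_y - Q_z, P_z - R_x, Q_x - P_y) of v at p."""
    def d(k, i):  # derivative of component k of v along axis i
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        return (v(*plus)[k] - v(*minus)[k]) / (2 * h)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))


p = (0.7, -1.2, 0.4)
print(curl_numeric(grad_f, p))  # every component is tiny, as (187) predicts
```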


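$$\text{Function}\ \xrightarrow{\ \operatorname{grad}\ }\ \text{Vector field}\ \xrightarrow{\ \operatorname{curl}\ }\ \text{Vector field}\ \xrightarrow{\ \operatorname{div}\ }\ \text{Function}$$
$$f \xrightarrow{\ \operatorname{grad}\ } \vec\nabla f, \qquad \vec v \xrightarrow{\ \operatorname{curl}\ } \vec\nabla\times\vec v, \qquad \vec w \xrightarrow{\ \operatorname{div}\ } \vec\nabla\cdot\vec w$$

Figure 26. The three basic operations of vector calculus. If we apply two consecutive operations in this diagram, we get zero. See Equations (187) and (188).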

16.4. The divergence of a curl always vanishes. A computation just like the one above shows that if we have a vector field $\vec v$ and we compute the divergence of its curl, we always get zero:
$$\vec\nabla\cdot(\vec\nabla\times\vec v) = 0, \quad\text{or}\quad \operatorname{div}\operatorname{curl}\vec v = 0. \tag{188}$$
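For completeness, here is the computation alluded to above, written out for $\vec v = (P, Q, R)$. It assumes that $P$, $Q$, $R$ have continuous second derivatives, so that mixed partials are equal. The notation $R_{yx}$ means $(R_y)_x$, as in (186):

$$\begin{aligned}
\vec\nabla\cdot(\vec\nabla\times\vec v)
&= \frac{\partial}{\partial x}\left(R_y - Q_z\right) + \frac{\partial}{\partial y}\left(P_z - R_x\right) + \frac{\partial}{\partial z}\left(Q_x - P_y\right)\\
&= \left(R_{yx} - R_{xy}\right) + \left(P_{zy} - P_{yz}\right) + \left(Q_{xz} - Q_{zx}\right) = 0.
\end{aligned}$$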

Both Equations (187) and (188) are easy to remember in their “$\vec\nabla$” form, if we pretend that $\vec\nabla$ is a real vector.

To get (187), remember that the cross product of any vector with itself always vanishes: $\vec a\times\vec a = \vec 0$ for any $\vec a$. The expression $\vec\nabla\times\vec\nabla f$ contains the cross product of $\vec\nabla$ with itself, and so it should vanish. This argument is not a proof, because $\vec\nabla$ is not really a vector, but our computation (186) shows that the conclusion is true anyway.

To get (188), we use that $\vec a\times\vec b$ is always perpendicular to $\vec a$, no matter what $\vec a$ and $\vec b$ are, so that $\vec a\cdot(\vec a\times\vec b) = 0$ always holds. Equation (188) is exactly that, with “$\vec a = \vec\nabla$” and “$\vec b = \vec v$.”

16.5. Other combinations of gradient, curl and divergence. The divergence of the gradient does not normally vanish. If we expand the definitions we find
$$\vec\nabla\cdot\vec\nabla f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}.$$
This combination of second derivatives of a function occurs very often and is called the Laplacian of the function $f$. The following notation is used:
$$\Delta(f) = \vec\nabla\cdot\vec\nabla f = f_{xx} + f_{yy} + f_{zz}.$$
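Here is a small numerical illustration of the Laplacian, using the standard seven-point finite-difference stencil. This is our own sketch, and the step `h` and the test point are arbitrary. We have $\Delta(\rho^2) = 2 + 2 + 2 = 6$. The function $1/\rho$ satisfies $\Delta(1/\rho) = 0$ away from the origin; that is a standard fact, not proved in this section.

```python
import math


def laplacian(f, p, h=1e-4):
    """Seven-point finite-difference approximation of f_xx + f_yy + f_zz at p."""
    x, y, z = p
    center = f(x, y, z)
    total = 0.0
    for dx, dy, dz in [(h, 0, 0), (0, h, 0), (0, 0, h)]:
        total += f(x + dx, y + dy, z + dz) - 2 * center + f(x - dx, y - dy, z - dz)
    return total / h ** 2


rho_squared = lambda x, y, z: x * x + y * y + z * z
inverse_rho = lambda x, y, z: 1 / math.sqrt(x * x + y * y + z * z)

p = (1.0, -0.5, 2.0)
print(laplacian(rho_squared, p))  # close to 6
print(laplacian(inverse_rho, p))  # close to 0
```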

The other combination of derivatives that one can consider is “the curl of the curl.” If $\vec v$ is a vector field then its curl $\vec\nabla\times\vec v$ is again a vector field, and thus one can compute the curl of the curl: $\vec\nabla\times(\vec\nabla\times\vec v)$. This combination usually does not vanish.

For a given vector field one can also consider its divergence, $\vec\nabla\cdot\vec v$, which is a function, and of which one can compute the gradient, $\vec\nabla(\vec\nabla\cdot\vec v)$. This quantity usually also does not vanish.

There is a relation between the curl of the curl and the gradient of the divergence. It is useful in mathematical physics, and we state it here for reference only: for any vector field $\vec v$ one has
$$\vec\nabla\times(\vec\nabla\times\vec v) = \vec\nabla(\vec\nabla\cdot\vec v) - \Delta(\vec v),$$
where $\Delta(\vec v)$ is the vector whose components are the Laplacians of the components of $\vec v$.
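Since the identity is stated without proof, here is a numerical spot check at one point for one sample field. It is not a proof. The field $\vec v = (x^2y,\ yz^3,\ xz)$ is a hypothetical choice of ours. By hand, one finds that both sides equal $(1,\ 2x - 6yz,\ 3z^2)$, which is $(1, 10, 6.75)$ at the point used below. All derivatives in the sketch are nested central differences with step `h`, so agreement is only approximate.

```python
def d(F, p, i, h=1e-3):
    """Central-difference partial derivative of the scalar function F along axis i at p."""
    plus, minus = list(p), list(p)
    plus[i] += h
    minus[i] -= h
    return (F(*plus) - F(*minus)) / (2 * h)


def comp(v, k):
    """k-th component of the vector field v, as a scalar function."""
    return lambda x, y, z: v(x, y, z)[k]


def curl(v):
    """curl v as a new (numerically evaluated) vector field."""
    P, Q, R = comp(v, 0), comp(v, 1), comp(v, 2)

    def w(x, y, z):
        p = (x, y, z)
        return (d(R, p, 1) - d(Q, p, 2), d(P, p, 2) - d(R, p, 0), d(Q, p, 0) - d(P, p, 1))
    return w


def grad_div(v, p):
    """Gradient of div v at p."""
    div_v = lambda x, y, z: sum(d(comp(v, i), (x, y, z), i) for i in range(3))
    return tuple(d(div_v, p, i) for i in range(3))


def laplacian(F, p):
    """F_xx + F_yy + F_zz at p, computed as repeated central differences."""
    return sum(d(lambda *q, i=i: d(F, q, i), p, i) for i in range(3))


v = lambda x, y, z: (x * x * y, y * z ** 3, x * z)   # hypothetical sample field
p = (0.5, -1.0, 1.5)

lhs = curl(curl(v))(*p)
rhs = tuple(g - laplacian(comp(v, k), p) for k, g in enumerate(grad_div(v, p)))
print(lhs)
print(rhs)   # both should be close to (1, 10, 6.75)
```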

17. Problems

1. If the central axis of the cylinder in Figure 2 is the $x$-axis, and if the vector field is as given in (145), then write $\vec v$ in terms of $x, y, z$ instead of $r$. •


2. It is always said that Newton discovered the “inverse square law” for gravitation. According to this law the strength of the gravitational force is inversely proportional to the square of the distance to the center of the Earth. But the exponent in our equation (146) is three instead of two!

Could this be a different law? A typo? To find out, compute the length $\|\vec F\|$ of the gravitational force in (146). •

3. Show that the magnetic field in (148) can be written as
$$\vec B(x,y,z) = C\,\frac{\vec k\times\vec x}{\|\vec k\times\vec x\|^n}$$
for some integer $n$ and some constant $C$. Find the right $n$ and $C$. •

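4. Let $\vec a$ and $\vec m$ be two constant vectors, with components
$$\vec a = \begin{pmatrix}a_1\\a_2\\a_3\end{pmatrix}, \qquad \vec m = \begin{pmatrix}m_1\\m_2\\m_3\end{pmatrix}.$$
Let $\vec v(x,y,z)$ be the vector field
$$\vec v = (\vec m\cdot\vec x)\,\vec a.$$
(a) Write $\vec v$ in terms of its components:
$$\vec v = \begin{pmatrix}\cdots ? \cdots\\ \cdots ? \cdots\\ \cdots ? \cdots\end{pmatrix}.$$
(b) Compute $\vec\nabla\cdot\vec v$.

(c) Compute $\vec\nabla\times\vec v$.

(d) If $\vec v$ is the gradient of some function $f$, what can you say about the vectors $\vec a$ and $\vec m$?

(e) If $\vec v$ is the curl of some vector field $\vec w$, what can you say about the vectors $\vec a$ and $\vec m$?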

5. Let $\vec a$ and $\vec m$ be as in the previous problem. Consider the vector field
$$\vec v(x,y,z) = e^{\vec m\cdot\vec x}\,\vec a = e^{m_1x+m_2y+m_3z}\begin{pmatrix}a_1\\a_2\\a_3\end{pmatrix}.$$
(a) Show by computing the derivatives that $\vec\nabla\left(e^{\vec m\cdot\vec x}\right) = e^{\vec m\cdot\vec x}\,\vec m$. •

(b) Compute $\vec\nabla\cdot\vec v$. (Find the shortest way to write the answer.) •

(c) Compute $\vec\nabla\times\vec v$. Again, simplify your answer. •

(d) Which condition must the vectors $\vec a$ and $\vec m$ satisfy if $\vec v$ is to be “divergence free,” i.e. if $\operatorname{div}\vec v = 0$? •

(e) Suppose that $\vec v = \vec\nabla\varphi$ for some function $\varphi$. What do you know about $\vec a$ and $\vec m$? •

6. If $\vec v = \begin{pmatrix}P\\Q\\R\end{pmatrix}$ is a vector field and $f$ is a function, then what is $\vec v\cdot\vec\nabla f$? •

7. Product rules. Let $f$ be a function of three variables, and let $\vec v$ be a three dimensional vector field.

(a) Show that $\vec\nabla\cdot(f\vec v) = (\vec\nabla f)\cdot\vec v + f\,\vec\nabla\cdot\vec v$. •

(b) Guess a product rule for $\vec\nabla\times(f\vec v)$ and prove it. •

8. In this problem, as in all the problems in this section, $\rho = \sqrt{x^2+y^2+z^2} = \|\vec x\|$ is the radius in spherical coordinates.

Check the following formulas:
$$\vec\nabla\rho = \frac{\vec x}{\rho}, \qquad \vec\nabla\cdot\vec x = 3.$$
•

9. Use the product rule from Problem 17.7 and the formulas from Problem 17.8 to compute the following quantities:

(a) $\vec\nabla\cdot(\rho^2\vec x)$ •

(b) $\vec x\cdot\vec\nabla\rho$ •

(c) $\operatorname{div}\dfrac{\vec x}{\|\vec x\|^3}$. What does this say about the Earth's gravitational field? •

10.

(a) Show that $\vec x = \frac12\vec\nabla(\rho^2)$.

(b) Compute $\vec\nabla\times\vec x$ without doing any derivatives. •

(c) Compute $\vec\nabla\times(\rho\vec x)$ using the product rule from Problem 17.7. •

11. Compute $\vec\nabla\times\vec v$ for the vector field $\vec v(x,y,z) = \vec k\times\vec x$. •


12. Consider the vector field
$$\vec v(x,y,z) = \rho^n\vec x,$$
where $n$ is a constant. (Both Newton's law of gravitation and Coulomb's law have this vector field with $n = -3$.)

(a) Write $\vec v(x,y,z)$ in the form $\begin{pmatrix}\cdots\\ \cdots\\ \cdots\end{pmatrix}$, using only Cartesian coordinates $x, y, z$. •

(b) Compute $\vec\nabla\cdot\vec v$. (Use one of the product rules from Problem 17.7; you can also avoid computing the derivatives of $\rho$ by looking them up in the text.) •

(c) For which value(s) of $n$ does one have $\operatorname{div}\vec v = 0$? •

13. A function of three variables is called radially symmetric if it only depends on the radius $\rho = \sqrt{x^2+y^2+z^2}$, i.e. if it can be written as $F(\rho)$ for some function $F$ of one variable. E.g. $f(x,y,z) = \rho^{-2}$ or $g(x,y,z) = e^{-\rho}$ are radially symmetric functions.

Find the gradient of a radially symmetric function $F(\rho)$. (You may want to use $\rho_x = x/\rho$, etc. from (185) to speed up the computation.) •

(a) Let $\vec v = \rho^n\vec x$, as in Problem 17.12. Does there exist a function $f(x,y,z)$ such that $\vec v = \vec\nabla f$? (Hint: try a radially symmetric function, and use Problem 17.13.) •

Math 234 – Answers and Hints

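(I12.3e) (a) 3 (b) $\begin{pmatrix}2\\-4\\4\end{pmatrix}$ (c) 36 (d) $\begin{pmatrix}3\\-3\\3\end{pmatrix}$ (e) $\begin{pmatrix}1\\-5\\5\end{pmatrix}$

(I12.4) Every vector is a position vector. To see which point it is the position vector of, translate it so that its initial point is the origin.

Here $\overrightarrow{AB} = \begin{pmatrix}-3\\3\end{pmatrix}$, so $\overrightarrow{AB}$ is the position vector of the point $(-3,3)$.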

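(I12.5) One always labels the vertices of a parallelogram counterclockwise (see §??).

ABCD is a parallelogram if $\overrightarrow{AB} + \overrightarrow{AD} = \overrightarrow{AC}$. Here
$$\overrightarrow{AB} = \begin{pmatrix}1\\1\end{pmatrix}, \qquad \overrightarrow{AC} = \begin{pmatrix}2\\3\end{pmatrix}, \qquad \overrightarrow{AD} = \begin{pmatrix}3\\1\end{pmatrix}, \qquad\text{so}\qquad \overrightarrow{AB} + \overrightarrow{AD} = \begin{pmatrix}4\\2\end{pmatrix} \neq \overrightarrow{AC},$$
and ABCD is not a parallelogram.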

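(I12.6a) As in the previous problem, we want $\overrightarrow{AB} + \overrightarrow{AD} = \overrightarrow{AC}$. If $D$ is the point $(d_1, d_2, d_3)$ then
$$\overrightarrow{AB} = \begin{pmatrix}0\\1\\1\end{pmatrix}, \qquad \overrightarrow{AD} = \begin{pmatrix}d_1\\d_2-2\\d_3-1\end{pmatrix}, \qquad \overrightarrow{AC} = \begin{pmatrix}4\\-1\\3\end{pmatrix},$$
so that $\overrightarrow{AB} + \overrightarrow{AD} = \overrightarrow{AC}$ will hold if $d_1 = 4$, $d_2 = 0$ and $d_3 = 3$.

(I12.6b) Now we want $\overrightarrow{AB} + \overrightarrow{AC} = \overrightarrow{AD}$, so $d_1 = 4$, $d_2 = 2$, $d_3 = 5$.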

(I12.9) Compute the dot product: $\vec a\cdot\vec b = 2s + 3(1-s) = 3 - s$. When the dot product vanishes the vectors are perpendicular; this happens when $s = 3$. The angle between the vectors is acute if the dot product is positive. This happens when $3 - s > 0$, i.e. when $s < 3$.

(I12.11a) The problem is open-ended because it doesn't specify what “draw” means. If you are allowed to use a calculator and a protractor, then you could use the dot product to compute the angle $\theta$ between the two vectors. Then, using your protractor, draw two line segments that make this angle, and mark off lengths 3 and 5 to get the vectors. From the dot product and the two lengths you find $3\times5\times\cos\theta = -12$, so $\cos\theta = -\frac{12}{15} = -0.8$. This implies $\theta = \arccos(-0.8) \approx 2.498\ldots$ radians, or $\theta \approx 143.13\ldots$ degrees.

This turns out to be only half the answer: we have forgotten that the equation $\cos\theta = -0.8$ has many more solutions than just $\arccos(-0.8)$. One other solution is $-\arccos(-0.8)$. This gives us two vectors $\vec b$ with $\|\vec b\| = 5$ and $\vec a\cdot\vec b = -12$.

A different approach goes like this: you could assume $\vec a = 3\vec e_1$, which has length 3, and $\vec b = \begin{pmatrix}b_1\\b_2\end{pmatrix}$. The condition that $\vec b$ have length 5 then says $b_1^2 + b_2^2 = 5^2 = 25$, while the dot product is $\vec a\cdot\vec b = a_1b_1 + a_2b_2 = 3b_1$. Since the dot product must be $-12$ we find $b_1 = -\frac{12}{3} = -4$. Using the length of $\vec b$ leads to $b_2 = \pm\sqrt{25 - (-4)^2} = \pm3$. Thus we find two solutions:
$$\vec b = \begin{pmatrix}-4\\ \pm3\end{pmatrix} = -4\vec e_1 \pm 3\vec e_2.$$
You make the drawing.
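The second approach is easy to reproduce in a few lines of Python. This is our own sketch, with $\vec a = 3\vec e_1$ as in the text. It also confirms that both solutions make the same angle $\arccos(-0.8) \approx 143.13^\circ$ with $\vec a$; they are mirror images of each other in the $\vec e_1$ axis.

```python
import math

a = (3.0, 0.0)                 # a = 3 e_1, a vector of length 3
b1 = -12 / a[0]                # a . b = 3 b_1 must equal -12
solutions = [(b1, s * math.sqrt(25 - b1 ** 2)) for s in (1, -1)]  # b_1^2 + b_2^2 = 25

for b in solutions:
    length = math.hypot(*b)
    dot = a[0] * b[0] + a[1] * b[1]
    angle = math.degrees(math.acos(dot / (3 * length)))
    print(b, length, dot, round(angle, 2))
```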

(I12.11b) No. The inner product of two vectors is $\vec a\cdot\vec b = \|\vec a\|\,\|\vec b\|\cos\theta$, and therefore it can never be larger than $\|\vec a\|\,\|\vec b\|$.

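(I12.13a) True:
$$\begin{aligned}(\vec a+\vec b)\cdot(\vec a-\vec b) &= (\vec a+\vec b)\cdot\vec a - (\vec a+\vec b)\cdot\vec b\\ &= \vec a\cdot\vec a + \vec b\cdot\vec a - \vec a\cdot\vec b - \vec b\cdot\vec b\\ &= \|\vec a\|^2 - \|\vec b\|^2.\end{aligned}$$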


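(I12.13b) True: This is Pythagoras' theorem. Here is an algebraic derivation; the last step uses $\vec a\cdot\vec b = 0$, since $\vec a\perp\vec b$:
$$\begin{aligned}\|\vec a+\vec b\|^2 &= (\vec a+\vec b)\cdot(\vec a+\vec b)\\ &= (\vec a+\vec b)\cdot\vec a + (\vec a+\vec b)\cdot\vec b\\ &= \vec a\cdot\vec a + \vec b\cdot\vec a + \vec a\cdot\vec b + \vec b\cdot\vec b\\ &= \|\vec a\|^2 + 2\,\vec a\cdot\vec b + \|\vec b\|^2\\ &= \|\vec a\|^2 + \|\vec b\|^2.\end{aligned}$$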

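(I12.13c) Not so. The same computation as for the previous problem shows
$$\begin{aligned}\|\vec a-\vec b\|^2 &= (\vec a-\vec b)\cdot(\vec a-\vec b)\\ &= (\vec a-\vec b)\cdot\vec a - (\vec a-\vec b)\cdot\vec b\\ &= \vec a\cdot\vec a - \vec b\cdot\vec a - \vec a\cdot\vec b + \vec b\cdot\vec b\\ &= \|\vec a\|^2 - 2\,\vec a\cdot\vec b + \|\vec b\|^2\\ &= \|\vec a\|^2 + \|\vec b\|^2,\end{aligned}$$
where the last step again uses $\vec a\cdot\vec b = 0$. Therefore $\|\vec a-\vec b\|^2 = \|\vec a\|^2 - \|\vec b\|^2$ is only true if $\vec b = \vec 0$.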

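(I12.15a) $(\vec a+\vec b)\times(\vec a+\vec b) = \vec 0$

(I12.15b) $(\vec a+\vec b+\vec c)\times(\vec a+\vec b+\vec c) = \vec 0$

(I12.15c) $(\vec a+\vec b)\times(\vec a-\vec b) = \vec a\times\vec a - \vec a\times\vec b + \vec b\times\vec a - \vec b\times\vec b = -2\,\vec a\times\vec b$.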

(I12.16a) $\vec a\cdot\vec c = \vec a\cdot(\vec a\times\vec b) = 0$, but for the two given vectors in the problem $\vec a\cdot\vec c = -1 \neq 0$. So there cannot be a vector $\vec b$ with $\vec a\times\vec b = \vec c$, as $\vec c$ is not perpendicular to $\vec a$.

(I12.16b) In this case $\vec a\perp\vec c$, so the argument from the first part of this problem doesn't rule out that there might be a solution. So let's try $\vec b = \begin{pmatrix}b_1\\b_2\\b_3\end{pmatrix}$. Then
$$\vec a\times\vec b = \begin{pmatrix}b_2\\ -b_1-2b_3\\ 2b_2\end{pmatrix} \stackrel{?}{=} \vec c = \begin{pmatrix}1\\3\\2\end{pmatrix}.$$
Solving this for $b_1$, $b_2$, and $b_3$ leads to $b_2 = 1$, and $-b_1 - 2b_3 = 3$ as the only remaining equation. Since we have found $b_2$, there are still two unknowns left. We can choose an arbitrary $b_3$ and set $b_1 = -3 - 2b_3$; e.g. $b_3 = 0$ works, provided we choose $b_1 = -3$.
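The vector $\vec a$ is not restated in this answer. However, the displayed formula $\vec a\times\vec b = (b_2,\ -b_1-2b_3,\ 2b_2)$ is exactly what one gets for $\vec a = (2, 0, -1)$; treat that value as our reconstruction. With it, the Python sketch below checks that every $\vec b = (-3-2t,\ 1,\ t)$ solves $\vec a\times\vec b = \vec c$. The solution is therefore a whole line of vectors, not a single one.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


a = (2, 0, -1)   # reconstructed from a x b = (b2, -b1 - 2 b3, 2 b2); an assumption
c = (1, 3, 2)

for t in (0, 1, -2.5):
    b = (-3 - 2 * t, 1, t)     # b3 = t is free, b1 = -3 - 2 b3, b2 = 1
    print(b, cross(a, b))      # the cross product equals c every time
```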

(II17.7d)
$$\kappa(x) = \frac{e^x}{\left(1+e^{2x}\right)^{3/2}}.$$
To find the point with largest curvature, compute
$$\kappa'(x) = \frac{e^x}{\left(1+e^{2x}\right)^{5/2}}\left(1 - 2e^{2x}\right),$$
so the maximal curvature (smallest radius of curvature) occurs when $x = -\frac12\ln 2$.
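As an independent check of the calculus, one can locate the maximum of $\kappa$ by brute force on a fine grid. In this Python sketch (ours), the interval $[-3, 3]$ and the grid spacing $10^{-4}$ are arbitrary choices. It should find a point within one grid step of $x^* = -\frac12\ln 2 \approx -0.3466$.

```python
import math


def kappa(x):
    """kappa(x) = e^x / (1 + e^(2x))^(3/2), as in the answer above."""
    return math.exp(x) / (1 + math.exp(2 * x)) ** 1.5


x_star = -0.5 * math.log(2)

# Brute-force search on a grid with spacing 1e-4 over [-3, 3].
grid = [-3 + k * 1e-4 for k in range(60001)]
x_best = max(grid, key=kappa)
print(x_star, x_best, kappa(x_star))
```

At $x^*$ one has $e^{2x^*} = \frac12$, so the maximal curvature is $\kappa(x^*) = \frac{1/\sqrt2}{(3/2)^{3/2}} = \frac{2}{3\sqrt3} \approx 0.385$.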

(III5.1) −d(x, y).

(III5.2) a < 0, b > 0, c > 0.

(III5.3a) x = −2 for the x-axis, y = 6 for the y-axis, z = 6 for the z-axis.

(III5.3b) $z = 3 - \frac34x - \frac32y$.


(III5.3c) $\dfrac{x}{a} + \dfrac{y}{b} + \dfrac{z}{c} = 1$ is a nice symmetric way of writing the equation.

(III5.4) The distance is $\dfrac{|c|}{\sqrt{1+a^2+b^2}}$.

(III5.5a) This one is already the sum of squares. We don't have to do anything, and can immediately conclude that $f(x,y) > 0$ for all $(x,y)$ in the plane except the origin, where $x = y = 0$ and $f(x,y) = 0$.

(III5.5b) The square containing x is already complete (no xy terms) and we can immediately factor Q(x, y) = (x −y)(x+ y).

(III5.5c) We complete the square:
$$g(x,y) = (x-2y)^2 - y^2.$$
We get the difference of two squares, so we can factor the quadratic form:
$$g(x,y) = (x-2y-y)(x-2y+y) = (x-3y)(x-y).$$

(III5.5d) This one is positive definite:
$$Q = 9\left(s^2 - 4st + 9t^2\right) = 9\left[(s-2t)^2 - 4t^2 + 9t^2\right] = 9\left[(s-2t)^2 + 5t^2\right] = 9(s-2t)^2 + 45t^2.$$

(III5.5e) Positive definite:
$$M = \frac12\left\{\alpha^2 - 2\alpha\beta + 2\beta^2\right\} = \frac12\left\{(\alpha-\beta)^2 + \beta^2\right\}.$$

(III5.5f) This quadratic form has no $x^2$ term. When that happens you can immediately factor the form, because all terms contain $y$:
$$Q(x,y) = xy + y^2 = (x+y)y.$$
This form is indefinite.

(III5.5g) Now this form does have an $x^2$ term, so we can complete the square if we want to . . . but if we look carefully then we see that there's no $y^2$ term. Because of this we can factor out $x$, and we get
$$Q = x^2 + 2xy = x(x+2y).$$
The form is indefinite.

What if we don't notice that $y^2$ is missing and just blindly complete the square? Nothing goes wrong and we get the same answer:
$$Q = x^2 + 2xy = x^2 + 2xy + y^2 - y^2 = (x+y)^2 - y^2 = (x+y-y)(x+y+y) = x(x+2y).$$
We did work too hard though :-(

(III5.6) Complete the square:
$$Q = (x+ky)^2 - k^2y^2 + y^2 = (x+ky)^2 + (1-k^2)y^2.$$
If $1 - k^2 > 0$ then we have the sum of two squares. If $1 - k^2 < 0$, then we can rewrite $Q$ as the difference of two squares
$$Q = (x+ky)^2 - (k^2-1)y^2 = (x+ky)^2 - \left(\sqrt{k^2-1}\,y\right)^2,$$
which is indefinite. That is all we need to know: we are not actually asked to factor the form when it is indefinite. But in case you're wondering, the somewhat ugly formula is this:
$$Q = \left(x + \left(k+\sqrt{k^2-1}\right)y\right)\left(x + \left(k-\sqrt{k^2-1}\right)y\right).$$
The conclusion is that $Q(x,y)$ is positive definite if $-1 < k < 1$ and indefinite when $k > 1$ or $k < -1$. In the remaining cases $k = \pm1$ we have
$$Q = (x+ky)^2 - k^2y^2 + y^2 = (x+ky)^2 + (1-k^2)y^2 = (x\pm y)^2,$$
i.e. the form is a square (it is semidefinite).
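The case analysis is short enough to code directly. In this sketch (ours), the form is $Q(x,y) = x^2 + 2kxy + y^2$, which is what the completed square $(x+ky)^2 + (1-k^2)y^2$ expands to. The classification depends only on the sign of $1 - k^2$.

```python
def classify(k):
    """Classify Q(x, y) = x^2 + 2kxy + y^2 = (x + ky)^2 + (1 - k^2) y^2."""
    c = 1 - k * k   # coefficient of y^2 after completing the square
    if c > 0:
        return "positive definite"
    if c < 0:
        return "indefinite"
    return "semidefinite"   # k = +1 or -1: Q = (x + ky)^2 is a perfect square


Q = lambda x, y, k: x * x + 2 * k * x * y + y * y

for k in (-2, -1, -0.5, 0, 0.5, 1, 2):
    print(k, classify(k))

print(Q(1, 0, 2), Q(-2, 1, 2))   # prints 1 -3: opposite signs, so Q is indefinite for k = 2
```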


(III5.7a) The graph is the saddle surface; the function is defined at all $(x,y)$. The level set is given by $xy = c$. If $c \neq 0$ then this set consists of both branches of the hyperbola $y = c/x$. If $c = 0$ then $xy = 0$ is equivalent to $x = 0$ or $y = 0$, so the level set is the union of the $x$- and $y$-axes.

(III5.7b) $z - x^2 = 0$. Domain $\mathbb R^2$. The graph is a parabolic cylinder and consists of horizontal lines perpendicular to the $xz$-plane, going through the parabola $z = x^2$ in that plane. Level sets: the parallel straight lines $x = \pm\sqrt z$ if $z > 0$, the $y$-axis if $z = 0$, the empty set if $z < 0$.

(III5.7c) $z^2 - x = 0$. Implicit function. At least two functions are defined, namely $z = \pm\sqrt x$. Domain: all points $(x,y)$ with $x \ge 0$. The graph is half a parabolic cylinder and consists of horizontal lines perpendicular to the $xz$-plane, going through the parabola $z = \sqrt x$ (or $z = -\sqrt x$, depending on which function you choose) in that plane. Level sets (assuming we choose the function $z = +\sqrt x$): the line $x = z^2$ if $z \ge 0$, the empty set otherwise.

(III5.7d) $z - x^2 - y^2 = 0$. Domain is the whole plane. The graph is a paraboloid of revolution, obtained by rotating the parabola $z = x^2$ in the $xz$-plane around the $z$ axis. Level sets: the circle with radius $\sqrt z$ for $z > 0$, the origin for $z = 0$ (note: this level set is a point rather than a curve), empty for $z < 0$.

(III5.7e) $z^2 - x^2 - y^2 = 0$. Implicit function. Domain all of $\mathbb R^2$. Possible functions are $z = \pm\sqrt{x^2+y^2}$. The graph is the cone obtained by rotating the half line $z = x$, $x \ge 0$ in the $xz$-plane around the $z$ axis (or the half line $z = -x$, $x \ge 0$, if you chose $z = -\sqrt{x^2+y^2}$). Level sets (assuming we choose $z = +\sqrt{x^2+y^2}$): the circle with radius $z$ when $z > 0$, the origin when $z = 0$, empty when $z < 0$.

(III5.7f) $xyz = 1$. Domain: the whole plane with the $x$- and $y$-axes removed, i.e. all points $(x,y)$ with $xy \neq 0$. The function is $f(x,y) = \frac{1}{xy}$. For each fixed $y$, the cross section of the graph is the hyperbola $z = 1/(yx)$, which is just the standard hyperbola $z = 1/x$ stretched vertically by a factor $1/y$. As $y \to 0$ this factor goes to $\infty$.

(III5.7g) $xy/z^2 = 1$. Implicit function. Domain: first and third quadrants (all points with $xy > 0$). Functions $z = \pm\sqrt{xy}$. Cross sections with planes $y = $ constant are half parabolas. Note: it is harder to see, but the surface with equation $xy = z^2$ is in fact the cone obtained by rotating the $x$-axis around the line $x = y$ in the $xy$-plane.

(III5.8a) x > 0. This one is in the text.

(III5.8b) x < 0.

(III5.8c) $x > 0$. This is the same region as in part (a): remember that the polar angle is only determined up to a multiple of $2\pi$.

(III5.8d) In the upper half plane, y > 0.

(III5.8e) In the whole plane, except the origin, and the negative x-axis. This formula for the polar angle θ clearly is validin a larger region than the other formulas, but it does not look half as nice.

(III5.9) The level set for $c = -24$ is the empty set, since it consists of all points on the lake surface where the lake is $-24$ meters deep, i.e. where the water reaches 24 meters above the lake. Similarly, the level set for $c = +400$ is also empty, since the lake is not that deep anywhere. The level set $d^{-1}(0)$ consists of those points where the lake is 0 meters deep. This is exactly the shore line. The level set $d^{-1}(24)$ consists of all points on the lake surface where the lake is exactly 24 meters deep. From the map it looks like this happens on two separate curves near the center of the lake.

(III5.10) See § 4.

(III5.11a) The two rectangular strips −3 ≤ x ≤ 3, 2 ≤ y <∞ and −3 ≤ x ≤ 3,−∞ < y ≤ −2.

(III5.11b) By definition $\arcsin(x)$ is only defined if $-1 \le x \le 1$. For $\arcsin(x^2+y^2-2)$ to be defined, we must therefore have $-1 \le x^2+y^2-2 \le 1$, i.e. $1 \le x^2+y^2 \le 3$. The domain of this function is the ring-shaped region between the circles with radii 1 and $\sqrt3$, both centered at the origin. The circles are included in the domain.


(III5.11c) The way this function is written, both $\sqrt x$ and $\sqrt y$ must be defined, so the domain consists of all $(x,y)$ with $x \ge 0$ and $y \ge 0$.

(III5.11d) √xy must exist, which happens for all (x, y) in the first and third quadrants (axes included.)

(III5.11f) The region in the plane given by $x^2 + 4y^2 \le 16$, which is the region enclosed by the ellipse $\frac{x^2}{16} + \frac{y^2}{4} = 1$. This ellipse has semi-major axis of length 4 along the $x$ axis and semi-minor axis of length 2 along the $y$-axis. The ellipse is included.

(III5.12) The level sets of the function whose graph is a cone are equally spaced circles (the level set at level $c$ is a circle with radius $c$). Hence the one on the right corresponds to the cone, and the one on the left corresponds to the paraboloid.

(III5.13a) $(0,\frac12)$ is in the square $Q$, so it is the point closest to $(0,\frac12)$. The point $(0,1)$ on the top edge of the square is closest to $(0,2)$. The corner point $(1,1)$ is closest to $(3,4)$.

(III5.13b) $f(0,\frac12) = 0$; $f(0,2) = 1$ and $f(3,4) = \sqrt{2^2+3^2} = \sqrt{13}$.

(III5.13c) The zero set of f is the squareQ.

(III5.13d) The level set at level $-1$ is empty. The others are “rounded rectangles”; see this drawing, in which the square is grey and the dashed lines are given by $x = \pm1$ or $y = \pm1$.

[Figure: level sets of $f$ around the square $Q$; axes $x$ and $y$.]

(III5.13e) The lines $x = \pm1$ and $y = \pm1$ divide the plane into nine regions. On each region the function is given by a different formula. Here they are:
$$f(x,y) = \begin{cases} 0 & (x,y)\in Q\\ x-1 & x\ge1,\ |y|\le1\\ y-1 & |x|\le1,\ y\ge1\\ -x-1 & x\le-1,\ |y|\le1\\ -y-1 & |x|\le1,\ y\le-1\\ \sqrt{(x-1)^2+(y-1)^2} & x\ge1 \text{ and } y\ge1\\ \sqrt{(x-1)^2+(y+1)^2} & x\ge1 \text{ and } y\le-1\\ \sqrt{(x+1)^2+(y-1)^2} & x\le-1 \text{ and } y\ge1\\ \sqrt{(x+1)^2+(y+1)^2} & x\le-1 \text{ and } y\le-1\end{cases}$$

(III5.14a) At time $t$ we have a line through the origin with slope $\sin t$. As time progresses this line turns up and down, and up and down, etc.

(III5.14b) Same as previous problem, but twice as fast.

(III5.14c) At all times one sees the graph of y = sinx stretched vertically by a factor t.

(III5.14d) Same as previous problem, but twice as fast.


(III5.14e) The graph of y = sin 2x stretched vertically by a factor t.

(III5.14f) Parabola with its minimum on the $x$-axis at $x = t$. So we see the parabola $y = x^2$ translating from the left to the right with constant speed 1.

(III5.14g) Parabola with its minimum on the $x$-axis at $x = \sin t$. So we see the parabola $y = x^2$ translating back and forth horizontally every $2\pi$ time units.

(III5.14j) At time $t$ we see Agnesi's witch, i.e. the graph $y = a/(1+x^2)$ with amplitude $a = 1/(1+t^2)$. Thus we see a bump which starts out small at $t = -\infty$, grows to its maximal size at time $t = 0$, and then decays again, until it vanishes at $t = +\infty$.

(III5.16) The graph of $y = g(x-a)$ is obtained from the graph of $y = g(x)$ by translating the graph of $y = g(x)$ by $a$ units to the right. Hence the graph of $g(x-ct)$ is the graph of $g(x)$ translated by $ct$ units to the right. As time changes, the graph of $g(x-ct)$ therefore moves with velocity $c$ to the right.

(III5.17) If you know the graph of a function $y = g(x)$, then you get the graph of $y = cg(x)$ by stretching the graph of $g$ vertically by a factor $c$ (here $c$ is a constant). If you allow this constant to depend on time, e.g. as in this problem by setting $c = \cos(\omega t)$, then the “movie” you get shows the graph of $g$ growing and shrinking vertically.

[Figure: the graphs of $y = g(x)$ and $y = \cos(\omega t)\,g(x)$.]

(IV3.2b) $-2xy\sin(x^2y)$, $-x^2\sin(x^2y) + 3y^2$

(IV3.2c) $(y^2 - x^2y)/(x^2+y)^2$, $x^3/(x^2+y)^2$

(IV3.2g) $2xe^{x^2+y^2}$, $2ye^{x^2+y^2}$

(IV3.2h) $y\ln(xy) + y$, $x\ln(xy) + x$

(IV3.2i) $-x/\sqrt{1-x^2-y^2}$, $-y/\sqrt{1-x^2-y^2}$

(IV3.2l) $\tan y$, $x/\cos^2 y$

(IV3.2m) $-1/(x^2y)$, $-1/(xy^2)$

(IV3.4a)
$$\frac{\partial\theta}{\partial x} = -\frac{y}{x^2+y^2}, \qquad \frac{\partial\theta}{\partial y} = \frac{x}{x^2+y^2}.$$

(IV3.5) The distance to the origin is exactly the radius in polar coordinates, so $f(x,y) = \sqrt{x^2+y^2}$, and
$$f_x = \frac{x}{\sqrt{x^2+y^2}}, \qquad f_y = \frac{y}{\sqrt{x^2+y^2}}.$$
This is the same as in problem 3.3. The only quantity that we did not compute before is
$$(f_x)^2 + (f_y)^2 = \frac{x^2}{x^2+y^2} + \frac{y^2}{x^2+y^2} = \frac{x^2+y^2}{x^2+y^2} = 1.$$

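(IV3.6a) $\dfrac{\partial z}{\partial x} = f'(x)g(y)$, $\dfrac{\partial z}{\partial y} = f(x)g'(y)$.

(IV3.6b) $\dfrac{\partial z}{\partial x} = yf'(xy)$, $\dfrac{\partial z}{\partial y} = xf'(xy)$.

(IV3.6c) $\dfrac{\partial z}{\partial x} = \dfrac1y f'\!\left(\dfrac xy\right)$, $\dfrac{\partial z}{\partial y} = -\dfrac{x}{y^2}f'\!\left(\dfrac xy\right)$.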

(IV7.1a) The linear approximation formula is equation (60), in which $x_0 = a = 3$, $y_0 = b = 1$, and $\Delta x = x - a = x - 3$, $\Delta y = y - b = y - 1$. So for this problem the linear approximation of $f(x,y) = xy^2$ at $(3,1)$ is
$$f(x,y) \approx 3 + (x-3) + 6(y-1) = x + 6y - 6.$$
This approximation is only expected to be good when $(x,y)$ is close to $(3,1)$. The approximation contains an error which is small compared to $|x-3|$ and $|y-1|$.

FAQ: What is the relation between the linear approximation and the tangent plane?
Answer: They are very closely related: the tangent plane is the graph of the linear approximation. The linear approximation is the equation for the tangent plane. To compute either you have to do the same thing.

(IV7.1b) $x/y^2 \approx 3 + (x-3) - 6(y-1) = x - 6y + 6$ when $x$ is close to 3 and $y$ is close to 1.

(IV7.1c) $\sin x + \cos y \approx -1 + (-1)(x-\pi) + (0)(y-\pi) = \pi - 1 - x$ when $x$ is close to $\pi$ and $y$ is close to $\pi$.

(IV7.1d) $\dfrac{xy}{x+y} \approx \dfrac34 + \dfrac1{16}(x-3) + \dfrac9{16}(y-1)$ when $x$ is close to 3 and $y$ is close to 1.

(IV7.2) z = 1

(IV7.3) z = 6(x− 3) + 3(y − 1) + 10

(IV7.4) z = (x− 2) + 4(y − 1/2)

(IV7.5a) Solve for $z$: $z = \pm\sqrt{2x^2+3y^2-4}$. In this problem we are looking at the point $(1,1,-1)$, so we have the graph of $z = f(x,y) = -\sqrt{2x^2+3y^2-4}$. The partials are
$$\frac{\partial f}{\partial x} = \frac{-2x}{\sqrt{2x^2+3y^2-4}}, \qquad \frac{\partial f}{\partial y} = \frac{-3y}{\sqrt{2x^2+3y^2-4}},$$
so that at $(1,1,-1)$ you get $f_x = -2$, $f_y = -3$. Therefore the equation for the tangent plane is $z = -2(x-1) - 3(y-1) - 1$.

(IV7.6a) The tangent plane has equation $z = z_0 + A(x-x_0) + B(y-y_0)$. By putting the variables $x, y, z$ on one side, and all the constants on the other, you can write this as
$$Ax + By - z = Ax_0 + By_0 - z_0.$$
This is the equation for a plane whose normal is $\vec n = \begin{pmatrix}A\\B\\-1\end{pmatrix}$. Any other multiple of this vector is also a valid normal to the plane; in particular, $\begin{pmatrix}-A\\-B\\+1\end{pmatrix}$ is OK.

(IV7.6b) We want a normal to the graph of $z = f(x,y) = \frac12x^2 + 2y^2$ at the point $P$. By the previous problem, a normal is given by
$$\vec n = \begin{pmatrix}f_x(2,1)\\f_y(2,1)\\-1\end{pmatrix} = \begin{pmatrix}2\\4\\-1\end{pmatrix}.$$
A line through $P$ in the direction of $\vec n$ is given by
$$\vec r(t) = \begin{pmatrix}2\\1\\4\end{pmatrix} + t\begin{pmatrix}2\\4\\-1\end{pmatrix}.$$

(IV7.7) Below you see the graph of a function and two (solid) lines which are tangent to the graph. On one line you have $x = a$ (hence constant), and its slope is $f_y(a,b)$; on the other you have $y = b$, and it has slope $f_x(a,b)$.

The tangent plane to the graph (not drawn here, but see Figure 4 in the notes) is the plane containing the two lines in the drawing.

(IV7.8) The function is $f(x,y) = x\ln(xy)$. We have $f(2,\frac12) = 2\ln(2\cdot\frac12) = 2\ln 1 = 0$. The gradient of the function is
$$\vec\nabla f = \begin{pmatrix}\ln(xy)+1\\ x/y\end{pmatrix}.$$
At the point $(2,\frac12)$ this is $\vec\nabla f = \begin{pmatrix}1\\4\end{pmatrix}$, so the linear approximation is
$$f(x,y) \approx f(2,\tfrac12) + 1\cdot(x-2) + 4\cdot(y-\tfrac12),$$
i.e.
$$f(x,y) \approx 1(x-2) + 4(y-\tfrac12).$$
(This is also the answer to problem 7.4.)

Here we don't want to describe the tangent plane; instead we want to find the value of $f(x,y)$ for $(x,y) = (1.98, 0.4)$. Substituting these values of $x$ and $y$ in the linear approximation we get $f(1.98, 0.4) \approx (1.98-2) + 4(0.4-0.5) = -0.42$.

This is only an approximation, and you wonder how good it is. We have $\Delta x = 1.98 - 2 = -0.02$, and $\Delta y = 0.4 - \frac12 = -0.1$ . . . are these numbers “small”? To find the error in the approximation you could use a Lagrange-type remainder term, but that's not part of Math 234. Instead we grab a calculator and compute $f(1.98, 0.4) = 1.98\cdot\ln(1.98\cdot0.4) = -0.46172\cdots$. So our linear approximation formula is off by $0.04\cdots$.
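The comparison between the linear approximation and the calculator value can be reproduced with Python's standard `math` module. This is our own sketch, and the values $f(2,\frac12) = 0$, $f_x = 1$, $f_y = 4$ are hard-coded from the computation above.

```python
import math


def f(x, y):
    return x * math.log(x * y)


def linear_approx(x, y):
    """f(2, 1/2) + f_x (x - 2) + f_y (y - 1/2) with f(2, 1/2) = 0, f_x = 1, f_y = 4."""
    return 0 + 1 * (x - 2) + 4 * (y - 0.5)


approx = linear_approx(1.98, 0.4)
exact = f(1.98, 0.4)
print(approx, exact, exact - approx)   # the error is roughly -0.04
```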

(IV7.9a) The x-and y-axes.

(IV7.9b) The heights are the $z$-coordinates, so $z = xy$ and $z^* = -2 + x + 2y$. The difference is
$$z - z^* = xy - (-2 + x + 2y) = xy - x - 2y + 2.$$

(IV7.10a) The tangent plane has equation z = ab+ b(x− a) + a(y − b) = bx+ ay − ab.


(IV7.10b) The point $(x,y,z)$ lies on the intersection if $z = xy$ and $z = bx + ay - ab$. Therefore $x$ and $y$ must satisfy $xy - bx - ay + ab = 0$. This equation factors as follows:
$$xy - bx - ay + ab = (x-a)(y-b) = 0,$$
so that the intersection contains the line $x = a$, $z = ay$, and also the line $y = b$, $z = bx$.

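(IV10.2) $\dfrac{\partial(f+g)}{\partial x} = f_x + g_x$ and $\dfrac{\partial(f+g)}{\partial y} = f_y + g_y$, so
$$\begin{pmatrix}\partial(f+g)/\partial x\\ \partial(f+g)/\partial y\end{pmatrix} = \begin{pmatrix}f_x+g_x\\ f_y+g_y\end{pmatrix} = \begin{pmatrix}f_x\\f_y\end{pmatrix} + \begin{pmatrix}g_x\\g_y\end{pmatrix}.$$
Hence $\vec\nabla(f+g) = \vec\nabla f + \vec\nabla g$.

The product and quotient rules follow in the same way.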

(IV10.3b) The gradient is $\vec\nabla f = \begin{pmatrix}2x\\8y\end{pmatrix}$. This vector is parallel to $\begin{pmatrix}1\\1\end{pmatrix}$ if there is a number $s$ such that $\vec\nabla f = s\begin{pmatrix}1\\1\end{pmatrix}$, i.e. $\begin{pmatrix}f_x\\f_y\end{pmatrix} = \begin{pmatrix}s\\s\end{pmatrix}$. This happens if $f_x(x,y) = f_y(x,y)$. From our computation of the partial derivatives of $f$ we find that $\vec\nabla f$ is parallel to $\begin{pmatrix}1\\1\end{pmatrix}$ when $2x = 8y$. This happens at every point on the line $y = \frac14x$.

We are asked which points on the level set $f = 4$ satisfy this condition, so we must find where the line $y = \frac14x$ intersects the level set $x^2 + 4y^2 = 4$. Solving the two equations gives two points, $\left(\frac45\sqrt5, \frac15\sqrt5\right)$ and $\left(-\frac45\sqrt5, -\frac15\sqrt5\right)$.
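Here is a quick numerical check of the two points. This is our own sketch, with $f(x,y) = x^2 + 4y^2$ as implied by the gradient $(2x, 8y)$ and the level set $x^2 + 4y^2 = 4$. Each point should lie on the level set, and the two components of $\vec\nabla f$ should be equal there.

```python
import math

s5 = math.sqrt(5)
points = [(4 * s5 / 5, s5 / 5), (-4 * s5 / 5, -s5 / 5)]

for x, y in points:
    level = x ** 2 + 4 * y ** 2      # value of f; should be 4
    grad = (2 * x, 8 * y)            # gradient of f = x^2 + 4y^2
    print((x, y), level, grad)       # the two gradient components coincide
```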

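(IV10.3c) $\vec\nabla g = \begin{pmatrix}4y^2\\8xy\end{pmatrix}$. This is parallel to $\begin{pmatrix}1\\1\end{pmatrix}$ when $y = 2x$. This line intersects the level set $g = 4$ in the point $\left(\frac12\sqrt[3]2, \sqrt[3]2\right)$.

Note: when you solve the equations $\vec\nabla g = \begin{pmatrix}s\\s\end{pmatrix}$, you find $y = 2x$, but also the line $y = 0$ (the $x$-axis). On this line the gradient actually vanishes, i.e. $\vec\nabla g = \vec 0$ and has no direction, so you can't really say it is parallel to $\begin{pmatrix}1\\1\end{pmatrix}$.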

(IV10.4a) It’s a paraboloid of revolution.

(IV10.4b) $\vec\nabla f = \begin{pmatrix}2x\\2y\\-2\end{pmatrix} = s\begin{pmatrix}1\\1\\2\end{pmatrix}$ if $-2 = 2s$, i.e. $s = -1$. This then implies $2x = -1$, $2y = -1$, so that $x = y = -\frac12$. Since the point has to lie on the zero set of $f$, we find $z = \frac12(x^2+y^2) = \frac14$.

(IV10.5a) At $(2,1)$ the gradient is $\vec\nabla T = \begin{pmatrix}-2x\\-9y^2\end{pmatrix} = \begin{pmatrix}-4\\-9\end{pmatrix}$. To cool off as fast as possible, the bug should go in the opposite direction, i.e. in the direction of $\begin{pmatrix}4\\9\end{pmatrix}$, or any positive multiple of this vector.

(IV10.5b) At $(1,3)$ the gradient is $\vec\nabla T = \begin{pmatrix}-2\\-81\end{pmatrix}$. To keep its temperature constant the bug should walk in any direction perpendicular to the gradient. The vector $\begin{pmatrix}81\\-2\end{pmatrix}$ is perpendicular to the gradient, so the bug should go in the direction of $\begin{pmatrix}81\\-2\end{pmatrix}$ or the opposite direction, $\begin{pmatrix}-81\\2\end{pmatrix}$.

Any non-zero multiple of $\begin{pmatrix}-81\\2\end{pmatrix}$ is also a valid answer, since we can only give the direction and not the speed.

Remember: the vector $\begin{pmatrix}-b\\a\end{pmatrix}$ is perpendicular to $\begin{pmatrix}a\\b\end{pmatrix}$.

(IV10.6) The zero set doesn't have to be a curve. For example, the zero set of the function $f(x,y) = $ distance from $(x,y)$ to the square $Q$ (Problems 5.13 and 3.7) is the whole square $Q$.

(IV10.7) ‖ #‰∇f‖ is larger at the top right, because there the function f changes faster.

(IV10.8a) The gradient at the origin is the zero vector. This was explained in the text.

(IV10.8b) The function increases in the direction of the gradient. Since it vanishes on the curve in Figure 8, the function will be positive in the region above the curve, and it will be negative both below the curve and inside the little loop.

(IV10.12b) The result of a rather long calculation is that $\|\vec\nabla f\| = 1$ everywhere outside the square, and $\|\vec\nabla f\| = 0$ inside the square (because $f$ is constant in the square).

(IV10.14) ax+ by + cz = R2.


(IV12.1) $4xt\cos(x^2+y^2) + 6yt^2\cos(x^2+y^2)$

(IV12.2) $2xy\cos t + 2x^2t$

(IV12.3) $2xyt\cos(st) + 2x^2s$, $2xys\cos(st) + 2x^2t$

(IV12.4) $2xy^2t - 4yx^2s$, $2xy^2s + 4yx^2t$

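(IV12.6a) $\dfrac{\partial T_B}{\partial Y} = -\sin\alpha\,\dfrac{\partial T_A}{\partial x} + \cos\alpha\,\dfrac{\partial T_A}{\partial y}$.

(IV12.6b) Take the formulas for $\dfrac{\partial T_B}{\partial X}$ and $\dfrac{\partial T_B}{\partial Y}$ and work out the right hand side in this problem.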

(IV12.9a) $\vec E = -\vec\nabla\ln r = -\dfrac{1}{r^2}\begin{pmatrix}x\\y\end{pmatrix}$.

(IV12.9b) $\|\vec E\| = 1/r = \dfrac{1}{\sqrt{x^2+y^2}}$.

(IV12.13a) Height $= -(x^2-y^2)/(x^2+y^2)$

(IV12.13b) Height $= \sin2\theta$.

(IV12.13c) Height $= \cos2\varphi$.

(IV15.1) $f_x = 3x^2y^2$, $f_y = 2x^3y + 5y^4$, $f_{xx} = 6xy^2$, $f_{yy} = 2x^3 + 20y^3$, $f_{xy} = 6x^2y$

(IV15.2) $f_x = 12x^2 + y^2$, $f_y = 2xy$, $f_{xx} = 24x$, $f_{yy} = 2x$, $f_{xy} = 2y$

(IV15.3) $f_x = \sin y$, $f_y = x\cos y$, $f_{xx} = 0$, $f_{yy} = -x\sin y$, $f_{xy} = \cos y$

(IV15.9) A function of two variables has
$$f_{xx}, \qquad f_{xy} = f_{yx}, \qquad f_{yy},$$
so it has three different partial derivatives of second order. A function of three variables has these partial derivatives:
$$\begin{matrix} f_{xx} & f_{xy} & f_{xz}\\ f_{yx} & f_{yy} & f_{yz}\\ f_{zx} & f_{zy} & f_{zz}\end{matrix}$$
The ones “below the diagonal” are the same as the corresponding derivatives above the diagonal, so there are only six different partial derivatives of second order, namely these:
$$\begin{matrix} f_{xx} & f_{xy} & f_{xz}\\ & f_{yy} & f_{yz}\\ & & f_{zz}\end{matrix}$$
A function of two variables has
$$f_{xxx}, \qquad f_{xxy} = f_{xyx} = f_{yxx}, \qquad f_{xyy} = f_{yxy} = f_{yyx}, \qquad\text{and}\qquad f_{yyy},$$
so four different partial derivatives of third order.

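(IV15.15a) We have $g(u,v) = f(u+v, u-v)$, so
$$\frac{\partial g}{\partial u} = \frac{\partial f}{\partial x}\frac{\partial(u+v)}{\partial u} + \frac{\partial f}{\partial y}\frac{\partial(u-v)}{\partial u} = f_x(u+v,u-v) + f_y(u+v,u-v).$$
Similarly,
$$\frac{\partial g}{\partial v} = f_x(u+v,u-v) - f_y(u+v,u-v).$$
Differentiate again to get
$$\frac{\partial^2 g}{\partial u^2} = f_{xx}(u+v,u-v) + 2f_{xy}(u+v,u-v) + f_{yy}(u+v,u-v).$$

(IV15.15b)
$$\frac{\partial^2 g}{\partial v^2} = f_{xx}(u+v,u-v) - 2f_{xy}(u+v,u-v) + f_{yy}(u+v,u-v).$$
Note that this is almost the same as $\dfrac{\partial^2 g}{\partial u^2}$: the only change is the minus sign before $f_{xy}$.

(IV15.15c)
$$\frac{\partial^2 g}{\partial u\,\partial v} = f_{xx}(u+v,u-v) - f_{yy}(u+v,u-v)$$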

(IV15.15d)
$$\frac{\partial^2 g}{\partial u^2} - \frac{\partial^2 g}{\partial v^2} = 4f_{xy}$$

(IV15.15e)
$$\frac{\partial^2 g}{\partial u^2} + \frac{\partial^2 g}{\partial v^2} = 2\left(f_{xx} + f_{yy}\right).$$

(V3.1a) If $y \neq 0$ then you can increase $x^2 - x^3 - y^2$ by setting $y = 0$. To put it differently, no matter what you choose for $y$, you always have
$$f(x,y) = x^2 - x^3 - y^2 \le x^2 - x^3 = f(x,0).$$

(V3.1b) The maximum has to appear on the $x$ axis, so the question is which $x \ge 0$ maximizes $f(x,0) = x^2 - x^3$. This is a Math 221 question. The answer is at $x = 2/3$.

(V3.1c) No: $\lim_{x\to-\infty} f(x,y) = +\infty$, so $f$ has no largest value.

(V3.3) [Figure: the region $R$, with the points $(\frac34, \frac38\sqrt3)$ and $(\frac34, -\frac38\sqrt3)$ marked.]

The quantity $4(x^3 - x^4) = 4x^3(1-x)$ is negative when $x < 0$ or $x > 1$, so the region is confined to the vertical strip $0 \le x \le 1$. Within this strip, $R$ is comprised of those points which satisfy
$$-\sqrt{4(x^3-x^4)} \le y \le +\sqrt{4(x^3-x^4)}.$$
The largest $x$ value is attained at the point with $x = 1$, where $y = 0$, i.e. at the point $(1,0)$. The smallest $x$ value is attained at the point $(0,0)$. The largest $y$ value is attained at the point where $y^2 = 4x^3 - 4x^4$ is maximal. This happens when $x = \frac34$, and the largest $y$ value is therefore $\sqrt{4\left[(3/4)^3 - (3/4)^4\right]} = \frac38\sqrt3$. The smallest $y$ value also occurs at $x = \frac34$ and is given by $y = -\frac38\sqrt3$.

(V6.1a) $f_x = 2x - 2$, $f_y = 8y + 8$, $f_{xx} = 2$, $f_{xy} = 0$, $f_{yy} = 8$. There is exactly one critical point, at $(x,y) = (1,-1)$. The 2nd order Taylor expansion at this point is
$$f(1+\Delta x, -1+\Delta y) = f(1,-1) + (\Delta x)^2 + 4(\Delta y)^2 + \cdots$$
The quadratic part is positive definite, therefore $f$ has a local minimum at $(1,-1)$.

(V6.1b) $f_x = 2x + 6$, $f_y = -2y - 10$, $f_{xx} = 2$, $f_{xy} = 0$, $f_{yy} = -2$. There is exactly one critical point, at $(x,y) = (-3,-5)$. The 2nd order Taylor expansion at this point is
$$\begin{aligned} f(-3+\Delta x, -5+\Delta y) &= f(-3,-5) + (\Delta x)^2 - (\Delta y)^2 + \cdots\\ &= f(-3,-5) + (\Delta x - \Delta y)(\Delta x + \Delta y) + \cdots\end{aligned}$$
The quadratic part factors, therefore $f$ has a saddle point at $(-3,-5)$. The level set near the critical point consists of two crossing curves whose tangents are given by the equations $\Delta x = \Delta y$ and $\Delta x = -\Delta y$. Since $\Delta x = x - a = x + 3$ and $\Delta y = y - b = y + 5$, the two tangent lines have equations $x + 3 = y + 5$ and $x + 3 = -(y+5)$.

[Figure: critical point and level set near the critical point.]

(V6.1c) $f_x = 2x + 4y$, $f_y = 4x + 2y - 6$, $f_{xx} = 2$, $f_{xy} = 4$, $f_{yy} = 2$. There is one critical point: $(x, y) = (2, -1)$. The 2nd order Taylor expansion at this point is

$$\begin{aligned} f(2 + \Delta x, -1 + \Delta y) &= f(2, -1) + (\Delta x)^2 + 4\,\Delta x\,\Delta y + (\Delta y)^2 + \cdots \\ &= f(2, -1) + (\Delta x + 2\Delta y)^2 - 3(\Delta y)^2 + \cdots \\ &= f(2, -1) + \big(\Delta x + (2 + \sqrt3)\Delta y\big)\big(\Delta x + (2 - \sqrt3)\Delta y\big) + \cdots \end{aligned}$$

The quadratic part factors, therefore $f$ has a saddle point at $(2, -1)$. The level set near the critical point consists of two crossing curves whose tangents are given by the equations $\Delta x = -(2 + \sqrt3)\Delta y$ and $\Delta x = -(2 - \sqrt3)\Delta y$. Since $\Delta x = x - a = x - 2$ and $\Delta y = y - b = y + 1$, the two tangent lines have equations

$$x - 2 = -(2 + \sqrt3)(y + 1) \quad\text{and}\quad x - 2 = -(2 - \sqrt3)(y + 1).$$

[Figure: critical point and level set near the critical point.]

(V6.1d) $f_x = 2x - y - 5$, $f_y = -x + 4y + 6$, $f_{xx} = 2$, $f_{xy} = -1$, $f_{yy} = 4$. There is again one critical point: $x = 2$, $y = -1$. The 2nd order Taylor expansion at this point is

$$\begin{aligned} f(2 + \Delta x, -1 + \Delta y) &= f(2, -1) + (\Delta x)^2 - \Delta x\,\Delta y + 2(\Delta y)^2 + \cdots \\ &= f(2, -1) + \big(\Delta x - \tfrac12\Delta y\big)^2 + \tfrac74(\Delta y)^2 + \cdots \end{aligned}$$

The second order part of the Taylor expansion is positive definite, so $(2, -1)$ is a local minimum.

(V6.1e) $f_x = -36x + 4x^3$, $f_y = 2y$, $f_{xx} = -36 + 12x^2$, $f_{xy} = 0$, $f_{yy} = 2$. The equation $f_x = 0$ has three solutions, $x = 0$ and $x = \pm3$. The equation $f_y = 0$ has only one solution, $y = 0$. Therefore there are three critical points, the origin and the points $(\pm3, 0)$. The Taylor expansions at these points are

$$\begin{aligned} f(\Delta x, \Delta y) &= f(0, 0) - 18(\Delta x)^2 + (\Delta y)^2 + \cdots \\ &= f(0, 0) + \big(\Delta y - \sqrt{18}\,\Delta x\big)\big(\Delta y + \sqrt{18}\,\Delta x\big) + \cdots \\ f(3 + \Delta x, \Delta y) &= f(3, 0) + 36(\Delta x)^2 + (\Delta y)^2 + \cdots \\ f(-3 + \Delta x, \Delta y) &= f(-3, 0) + 36(\Delta x)^2 + (\Delta y)^2 + \cdots \end{aligned}$$

The second order terms in the Taylor expansions at $(3, 0)$ and at $(-3, 0)$ are positive for all $(\Delta x, \Delta y) \neq (0, 0)$, so both points $(\pm3, 0)$ are local minima. The second order part of the expansion at the origin factors, and hence the origin is a saddle point. The tangents to the zero set at the origin are the lines $\Delta y = \pm\sqrt{18}\,\Delta x = \pm3\sqrt2\,\Delta x$. Since here $\Delta x = x - a = x$ and $\Delta y = y$, the tangents are the lines through the origin given by $y = \pm3\sqrt2\,x$.

You can try to draw the zero set of this function and analyze it in the same way as the “fishy example” in 4.4. The zero set of $f$ consists of the graphs of $y = \pm\sqrt{18x^2 - x^4} = \pm|x|\sqrt{18 - x^2}$. It looks like a squashed “∞” or a butterfly (you decide).

[Figure: critical points and zero set.]

(V6.1f) There are nine critical points: four global minima at $(\pm3, \pm\sqrt3)$, four saddle points at $(0, \pm\sqrt3)$ and $(\pm3, 0)$, and finally, a local but not global maximum at the origin.

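(V6.1g) $f_x = 4 - 4x$, $f_y = -1 - 6y$, $f_{xx} = -4$, $f_{xy} = 0$, $f_{yy} = -6$. There is one critical point, at $(1, -\tfrac16)$. Second order Taylor expansion at the critical point:

$$f\big(1 + \Delta x, -\tfrac16 + \Delta y\big) = f\big(1, -\tfrac16\big) - 2(\Delta x)^2 - 3(\Delta y)^2 + \cdots$$

The second order terms are negative for all $(\Delta x, \Delta y) \neq (0, 0)$, so $(1, -\tfrac16)$ is a local maximum.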

(V6.1h) The derivatives are:

$$f_x = 4y - 2xy - 2y^2,\quad f_y = 4x - x^2 - 4xy,\quad f_{xx} = -2y,\quad f_{xy} = 4 - 2x - 4y,\quad f_{yy} = -4x.$$

This function is given in factored form, so without solving the equations $f_x = 0$, $f_y = 0$ you can say the following about this problem. The zero set consists of three lines: the $y$-axis ($x = 0$), the $x$-axis ($y = 0$) and the line with equation $4 - x - 2y = 0$. It follows that the intersection points $(0, 0)$, $(4, 0)$, and $(0, 2)$ of these lines are saddle points. Since $f > 0$ in the triangle formed by the three lines, this triangle must contain at least one local maximum.


To find all critical points solve these equations:

$$f_x = 4y - 2xy - 2y^2 = 0 \iff y = 0 \text{ or } 4 - 2x - 2y = 0$$

and

$$f_y = 4x - x^2 - 4xy = 0 \iff x = 0 \text{ or } 4 - x - 4y = 0.$$

Since both equations $f_x = 0$ and $f_y = 0$ lead to two possibilities, we have to consider $2 \times 2 = 4$ cases:

$y = 0$ and $x = 0$: this tells us the origin is a critical point.

$y = 0$ and $4 - x - 4y = 0$: solving these equations leads to $x = 4$, $y = 0$, so $(4, 0)$ is a critical point.

$4 - 2x - 2y = 0$ and $x = 0$: solve and you find that $(0, 2)$ is a critical point.

$4 - 2x - 2y = 0$ and $4 - x - 4y = 0$: solve these equations and you get $(x, y) = (\tfrac43, \tfrac23)$.

The first three critical points are the saddle points we predicted. The fourth critical point must be a local maximum, since there has to be one in the triangle, and all the other critical points we have found are saddle points.
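The discriminant form of the second derivative test gives a quick cross-check of this case analysis. The Python sketch below plugs the second derivatives listed above into $D = f_{xx}f_{yy} - f_{xy}^2$ at each of the four critical points. It assumes $f(x, y) = xy(4 - x - 2y)$, the factored form whose derivatives are the ones listed; the helper names `hessian` and `classify` are ours, not from the notes.

```python
from fractions import Fraction


def hessian(x, y):
    """Second partials of f(x, y) = 4xy - x^2 y - 2 x y^2, from the formulas above."""
    fxx = -2 * y
    fxy = 4 - 2 * x - 4 * y
    fyy = -4 * x
    return fxx, fxy, fyy


def classify(fxx, fxy, fyy):
    """Second derivative test using the discriminant D = fxx*fyy - fxy**2."""
    d = fxx * fyy - fxy ** 2
    if d > 0:
        return "local min" if fxx > 0 else "local max"
    if d < 0:
        return "saddle"
    return "test inconclusive"


# Exact arithmetic with Fraction avoids rounding at (4/3, 2/3).
critical_points = [(0, 0), (4, 0), (0, 2), (Fraction(4, 3), Fraction(2, 3))]
results = {p: classify(*hessian(*p)) for p in critical_points}
for point, kind in results.items():
    print(point, kind)
```

At $(\tfrac43, \tfrac23)$ one gets $D = \tfrac{64}9 - \tfrac{16}9 > 0$ with $f_{xx} < 0$, in agreement with the local maximum found above.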

(V6.1i) Two saddle points: (0, 0) and (1, 1).

(V6.1j) Two saddle points: (2, 2) and (−2,−2)

(V6.1l) The origin. It is neither a local max, a local min, nor a saddle. The graph of this function is called the “Monkey Saddle” as it accommodates two legs and a tail too. Draw it in your graphing program to see this.

(V6.1m) The zero set consists of the parabola with equation $x = y^2$ and the line $x = 1$. They intersect at $(1, \pm1)$, so the function has two saddle points, $(1, 1)$ and $(1, -1)$. The region between the line $x = 1$ and the parabola must contain a local minimum. It is located at $(\tfrac12, 0)$.

(V6.1n) Two saddle points: (2, 2) and (−2, −2). Yes, this problem appeared twice.

(V6.1o) All points on the y-axis are critical points. They are all global minima, but the second derivative test doesn’t tell you so.

(V6.1p) All points on the y-axis are again critical points. Those with y > 0 are local minima, those with y < 0 are local maxima, and the origin is neither. The second derivative test applies to none of these points.

(V6.1q) All points on the unit circle are global minima, because the function vanishes there and is positive everywhere else. The origin is a local maximum. The 2nd derivative test applies to the origin, but not to any of the other critical points.

(V6.1r) All points on the y-axis are again critical points. Those with y > 0 are local minima, those with y < 0 are local maxima, and the origin is neither. The second derivative test applies to none of these points.

(V6.5a) (3, 4/3)

(V6.5c) x = (a+ c+ e)/3, y = (b+ d+ f)/3.

(V6.6) You have to show that $f_x(a, b) = f_y(a, b) = 0$. By the product rule $f_x(a, b) = g_x(a, b)h(a, b) + g(a, b)h_x(a, b)$. Since both $g(a, b) = 0$ and $h(a, b) = 0$, it follows that $f_x(a, b) = 0$. The same reasoning applies to $f_y(a, b)$.

(V8.1a) One variable calculus! There is only one variable, a, and we must solve E′(a) = 0.

(V8.1b) $a = (x_1 + \cdots + x_N)/N$, i.e. the average provides “the best fit.”

(V8.2a) Three: a, b, and c.


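(V8.2b) The equations for $(a, b, c)$ are:

$$\begin{aligned} \Big(\sum x_k^4\Big) a + \Big(\sum x_k^3\Big) b + \Big(\sum x_k^2\Big) c &= \sum x_k^2 y_k \\ \Big(\sum x_k^3\Big) a + \Big(\sum x_k^2\Big) b + \Big(\sum x_k\Big) c &= \sum x_k y_k \\ \Big(\sum x_k^2\Big) a + \Big(\sum x_k\Big) b + N c &= \sum y_k \end{aligned}$$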
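To see these three equations at work, here is a minimal Python sketch that assembles them from data and solves the resulting $3 \times 3$ system with Cramer's rule. It reads them as the normal equations for the least-squares parabola $y = ax^2 + bx + c$ (our reading of the roles of $a, b, c$, which matches the coefficients above); the function names and the sample data are hypothetical. For data lying exactly on a parabola the fit should reproduce its coefficients up to rounding.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve the 3x3 system m * sol = rhs by Cramer's rule."""
    d = det3(m)
    sol = []
    for col in range(3):
        mc = [row[:] for row in m]
        for i in range(3):
            mc[i][col] = rhs[i]          # replace column `col` by the right-hand side
        sol.append(det3(mc) / d)
    return sol


def fit_parabola(xs, ys):
    """Least-squares a, b, c for y = a x^2 + b x + c via the normal equations."""
    def s(p):                            # sum of x_k^p
        return sum(x ** p for x in xs)

    def t(p):                            # sum of x_k^p * y_k
        return sum(x ** p * y for x, y in zip(xs, ys))

    matrix = [[s(4), s(3), s(2)],
              [s(3), s(2), s(1)],
              [s(2), s(1), len(xs)]]
    return solve3(matrix, [t(2), t(1), t(0)])


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x ** 2 - 3 * x + 1 for x in xs]    # sample data on a known parabola
a, b, c = fit_parabola(xs, ys)
print(a, b, c)
```

Cramer's rule is used only because it keeps the sketch short; for larger or badly conditioned systems Gaussian elimination is the usual choice.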

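(V8.3) The equations are

$$\begin{aligned} \Big(\sum x_k^2\Big) a + \Big(\sum x_ky_k\Big) b + \Big(\sum x_k\Big) c &= \sum x_kz_k \\ \Big(\sum x_ky_k\Big) a + \Big(\sum y_k^2\Big) b + \Big(\sum y_k\Big) c &= \sum y_kz_k \\ \Big(\sum x_k\Big) a + \Big(\sum y_k\Big) b + N c &= \sum z_k \end{aligned}$$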
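The same approach works for these equations, which we read as the normal equations for a least-squares plane $z = ax + by + c$ (an interpretation consistent with the right-hand sides). A minimal sketch with hypothetical names and sample points lying exactly on a known plane:

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve the 3x3 system m * sol = rhs by Cramer's rule."""
    d = det3(m)
    sol = []
    for col in range(3):
        mc = [row[:] for row in m]
        for i in range(3):
            mc[i][col] = rhs[i]
        sol.append(det3(mc) / d)
    return sol


def fit_plane(points):
    """Least-squares a, b, c for z = a x + b y + c from a list of (x, y, z)."""
    sx = sum(p[0] for p in points)
    sy = sum(p[1] for p in points)
    sz = sum(p[2] for p in points)
    sxx = sum(p[0] * p[0] for p in points)
    syy = sum(p[1] * p[1] for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    sxz = sum(p[0] * p[2] for p in points)
    syz = sum(p[1] * p[2] for p in points)
    matrix = [[sxx, sxy, sx],
              [sxy, syy, sy],
              [sx, sy, len(points)]]
    return solve3(matrix, [sxz, syz, sz])


# A 3 x 3 grid of sample points on the plane z = 0.5 x - 2 y + 3.
points = [(x, y, 0.5 * x - 2.0 * y + 3.0) for x in range(3) for y in range(3)]
a, b, c = fit_plane(points)
print(a, b, c)
```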

(V10.1) The two sets of $\Delta x$ and $\Delta y$ are different. The first set of $(\Delta x, \Delta y)$ is

$$\Delta x = x - 0,\quad \Delta y = y - 0,$$

$(0, 0)$ being the coordinates of the first critical point we studied. The second set of $(\Delta x, \Delta y)$ is

$$\Delta x = x - \tfrac23,\quad \Delta y = y - 0,$$

where $(\tfrac23, 0)$ is the other critical point. In a drawing:

[Figure: the critical points $(0, 0)$ and $(2/3, 0)$; near the first, $(x, y) = (\Delta x, \Delta y)$, and near the second, $(x, y) = (2/3 + \Delta x, \Delta y)$.]

(V10.2a) $f(\Delta x, \Delta y) = \big(1 - \Delta x + \Delta x\,\Delta y\big)^2 = 1 - 2\Delta x + (\Delta x)^2 + 2\Delta x\,\Delta y + \cdots$

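(V10.2b) $f(1 + \Delta x, 1 + \Delta y) = \big(1 - (1 + \Delta x) + (1 + \Delta x)(1 + \Delta y)\big)^2 = \big(1 + \Delta y + \Delta x\,\Delta y\big)^2 = 1 + 2\Delta y + 2\Delta x\,\Delta y + (\Delta y)^2 + \cdots$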
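A quick numerical sanity check of the expansion in (V10.2b): if the quadratic polynomial is right, the error $f(1+h, 1+h) - P(h, h)$ is of third order, so it should shrink by a factor of roughly 8 each time $h$ is halved. This minimal sketch assumes $f(x, y) = (1 - x + xy)^2$, the function whose expansions appear in (V10.2a) and (V10.2b); `taylor2` is a hypothetical helper name.

```python
def f(x, y):
    return (1 - x + x * y) ** 2


def taylor2(dx, dy):
    """Second order Taylor polynomial of f at (1, 1)."""
    return 1 + 2 * dy + 2 * dx * dy + dy ** 2


steps = [0.1, 0.05, 0.025]
errors = [abs(f(1 + h, 1 + h) - taylor2(h, h)) for h in steps]
ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]
print(errors, ratios)    # ratios near 8 signal a third order error
```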

(V10.2c) $f(\Delta x, \Delta y) = e^{\Delta x - (\Delta y)^2} = 1 + \Delta x + \tfrac12(\Delta x)^2 - (\Delta y)^2 + \cdots$

(V10.2d) $f(1 + \Delta x, 1 + \Delta y) = e^{(1 + \Delta x) - (1 + \Delta y)^2} = 1 + \Delta x - 2\Delta y + \tfrac12(\Delta x)^2 - 2\Delta x\,\Delta y + (\Delta y)^2 + \cdots$

(V10.4) Complete the square and you get

$$Q(x, y) = (x + ay)^2 + (1 - a^2)y^2.$$

When $1 - a^2 > 0$, i.e. when $-1 < a < 1$, the form is positive definite. When $a = \pm1$ the form is a perfect square, namely,

$$x^2 \pm 2xy + y^2 = (x \pm y)^2.$$

When $1 - a^2 < 0$, i.e. when $a > 1$ or $a < -1$, the form is indefinite:

$$x^2 + 2axy + y^2 = \big(x + ay - \sqrt{a^2 - 1}\,y\big)\big(x + ay + \sqrt{a^2 - 1}\,y\big) = (x - k_+y)(x - k_-y),$$

where $k_\pm = -a \pm \sqrt{a^2 - 1}$.
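The classification and the factorization of $Q(x, y) = x^2 + 2axy + y^2$ can be spot-checked numerically. A minimal sketch; the helper names and the sample values of $a$, $x$, $y$ are ours:

```python
import math


def classify_form(a):
    """Type of Q(x, y) = x^2 + 2 a x y + y^2, read off from the coefficient 1 - a^2."""
    if abs(a) < 1:
        return "positive definite"
    if abs(a) == 1:
        return "perfect square"
    return "indefinite"


def factor_roots(a):
    """k_plus, k_minus with Q = (x - k_plus y)(x - k_minus y); requires |a| > 1."""
    r = math.sqrt(a * a - 1)
    return -a + r, -a - r


a = 2.0
kp, km = factor_roots(a)
x, y = 0.7, -1.3
q = x * x + 2 * a * x * y + y * y
prod = (x - kp * y) * (x - km * y)
print(classify_form(a), q, prod)    # the two numbers should agree
```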

(V10.5) See the solutions to Problem 6.1 for the solutions to this problem.


(V10.7a) $f_x = 2x - \tfrac12y^2$, $f_y = 2y - xy$. The equation $f_y = y(2 - x) = 0$ leads to two possibilities: $x = 2$ or $y = 0$.

If $y = 0$ then $f_x = 0$ implies $x = 0$, which gives us one critical point, the origin $(0, 0)$. If on the other hand $x = 2$, then $f_x = 0$ implies $y^2 = 8 \iff y = \pm2\sqrt2$. We therefore get two more critical points, $(2, \pm2\sqrt2)$.

The second derivatives are $f_{xx} = 2$, $f_{xy} = -y$, $f_{yy} = 2 - x$. Therefore we have the following Taylor expansions at the three critical points:

$$\begin{aligned} f(\Delta x, \Delta y) &= f(0, 0) + (\Delta x)^2 + (\Delta y)^2 + \cdots &&\Longrightarrow \text{loc. min.} \\ f(2 + \Delta x, 2\sqrt2 + \Delta y) &= f(2, 2\sqrt2) + (\Delta x)^2 - 2\sqrt2\,\Delta x\,\Delta y + 0\,(\Delta y)^2 + \cdots \\ &= f(2, 2\sqrt2) + \big(\Delta x - 2\sqrt2\,\Delta y\big)\Delta x + \cdots &&\Longrightarrow \text{saddle} \\ f(2 + \Delta x, -2\sqrt2 + \Delta y) &= f(2, -2\sqrt2) + (\Delta x)^2 + 2\sqrt2\,\Delta x\,\Delta y + 0\,(\Delta y)^2 + \cdots \\ &= f(2, -2\sqrt2) + \big(\Delta x + 2\sqrt2\,\Delta y\big)\Delta x + \cdots &&\Longrightarrow \text{saddle} \end{aligned}$$

The origin is therefore a local minimum, and the points $(2, \pm2\sqrt2)$ are saddle points. At $(2, 2\sqrt2)$ the level set consists of two crossing curves, whose tangents are given by $\Delta x = 0$ (a vertical line) and $\Delta x = 2\sqrt2\,\Delta y$ (a line with slope $1/(2\sqrt2) = \tfrac14\sqrt2$).

(V10.7c) $f_x = 1 - y^2$, $f_y = 2 - 2xy$. Critical points: $f_x = 0$ holds when $y = \pm1$. If $y = +1$, then $f_y = 0$ implies $x = 1$, and if $y = -1$ then $f_y = 0$ implies $x = -1$. There are therefore two critical points, $(1, 1)$ and $(-1, -1)$.

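(V13.1) $f(x, y) = xy$, $g(x, y) = x^2 + \tfrac14y^2$, so

$$\vec\nabla f = \begin{pmatrix} y \\ x \end{pmatrix},\qquad \vec\nabla g = \begin{pmatrix} 2x \\ y/2 \end{pmatrix}.$$

First we check for possible maxima/minima which satisfy $\vec\nabla g = \vec0$. But the only point $(x, y)$ satisfying $\vec\nabla g(x, y) = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$ is the origin $(x, y) = (0, 0)$, and this point does not lie on the constraint set.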

Therefore, if there is a minimum, it is attained at a solution of Lagrange’s equations

$$\begin{aligned} f_x = \lambda g_x &\iff y = 2\lambda x \\ f_y = \lambda g_y &\iff x = \lambda y/2 \\ g(x, y) = 1 &\iff x^2 + \tfrac14y^2 = 1 \end{aligned}$$

Multiply the first equation with $y$ and the second with $4x$; then you get

$$y^2 = 2\lambda xy \quad\text{and}\quad 4x^2 = 2\lambda xy.$$

Hence $y^2 = 4x^2$. Put that in the constraint, and you find

$$1 = x^2 + \tfrac14y^2 = 2x^2.$$

Thus $x = \pm\sqrt{1/2} = \pm\tfrac12\sqrt2$ and $y = \pm\sqrt2$. In all we have found four possible solutions. Lagrange’s method does not tell us which, if any, of these are minima.

[Figure: level sets of the function $f(x, y) = xy$ and the constraint set $x^2 + \tfrac14y^2 = 1$, with the four candidate points labeled A, B, C, D.]

By looking at the constraint set (it’s an ellipse with horizontal semi-axis of length 1 and vertical semi-axis of length 2) and taking into account that $f(x, y) = xy$ is positive in the first and third quadrants, and negative in the second and fourth, you find that the two points $(\tfrac12\sqrt2, \sqrt2)$ and $(-\tfrac12\sqrt2, -\sqrt2)$ (A and C in the figure) are maximum points, while $(-\tfrac12\sqrt2, \sqrt2)$ and $(\tfrac12\sqrt2, -\sqrt2)$ (B and D in the figure) are minimum points.
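Because the constraint set is an ellipse, the Lagrange result can be double-checked by brute force: parametrize the ellipse as $(\cos t, 2\sin t)$ (our choice of parametrization) and scan $f = xy$ over many values of $t$. A minimal sketch:

```python
import math

N = 100_000
samples = []
for i in range(N):
    t = 2 * math.pi * i / N
    x, y = math.cos(t), 2 * math.sin(t)     # a point on x^2 + y^2/4 = 1
    samples.append((x * y, x, y))

f_max, x_max, y_max = max(samples)
f_min, x_min, y_min = min(samples)
print(f_max, (x_max, y_max))    # about 1, at a point with |x| = sqrt(2)/2 and y = 2x
print(f_min, (x_min, y_min))    # about -1, at a point with |x| = sqrt(2)/2 and y = -2x
```

The scan only samples the curve, so it confirms the candidates rather than proving they are extrema; here that is enough because the ellipse is compact and $f$ is continuous.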


(V13.2a) Let the sides of the box be $x, y, z$. We want to minimize the quantity $A = 2xy + 2yz + 2xz$, with the constraint $V = xyz = \tfrac12$. The constraint implies that $x \neq 0$, $y \neq 0$ and $z \neq 0$; moreover, given $x$ and $y$, the only $z$ which satisfies the constraint is $z = 1/(2xy)$. Thus we must minimize the following function of two variables

$$A(x, y) = 2xy + \frac1x + \frac1y$$

over all $x > 0$, $y > 0$. A minimum must be an interior minimum (it can’t be on the $x$ or $y$-axis since these are excluded), and thus must be a critical point:

$$\frac{\partial A}{\partial x} = 2y - \frac{1}{x^2},\qquad \frac{\partial A}{\partial y} = 2x - \frac{1}{y^2}.$$

Solving $A_x = A_y = 0$ gives $2x^2y = 1 = 2xy^2$, hence $x = y$ and $2x^3 = 1$, so $x = y = 1/\sqrt[3]2$. Then $z = 1/(2xy) = 1/\sqrt[3]2$ as well, so the solution is a cube $1/\sqrt[3]2$ on a side.
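As a quick plausibility check (not a proof), substitute $z = 1/(2xy)$ into $A = 2xy + 2yz + 2xz$, which gives $2xy + 1/x + 1/y$, and compare its value at the cube $x = y = 2^{-1/3}$ with nearby points. A minimal sketch; the step $0.01$ is an arbitrary choice:

```python
def area(x, y):
    """Surface area 2xy + 2yz + 2xz of a box with xyz = 1/2, after eliminating z = 1/(2xy)."""
    return 2 * x * y + 1 / x + 1 / y


side = 2 ** (-1 / 3)                 # the cube found above
best = area(side, side)              # should equal 6 * side**2 = 3 * 2**(1/3)
neighbors = [area(side + dx, side + dy)
             for dx in (-0.01, 0.0, 0.01)
             for dy in (-0.01, 0.0, 0.01)
             if (dx, dy) != (0.0, 0.0)]
print(best, 3 * 2 ** (1 / 3), all(v > best for v in neighbors))
```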

(V13.2b) We wish to minimize $A(x, y, z) = 2yz + 2xz + 2xy$ with constraint $V(x, y, z) = xyz = \tfrac12$, using Lagrange’s method.

First we check for exceptional points on the constraint set, i.e. points $(x, y, z)$ that satisfy both $V(x, y, z) = \tfrac12$ and $\vec\nabla V(x, y, z) = \vec0$. Since

$$\vec\nabla V = \begin{pmatrix} yz \\ xz \\ xy \end{pmatrix},$$

the gradient $\vec\nabla V$ vanishes if and only if at least two of the three coordinates $x, y, z$ are zero. But such a point can never satisfy the constraint $xyz = \tfrac12$. Therefore, if there is a box with least area, its sides $x, y, z$ must satisfy Lagrange’s equations. Lagrange’s equations are

$$\begin{aligned} A_x = \lambda V_x &\iff 2y + 2z = \lambda yz \\ A_y = \lambda V_y &\iff 2x + 2z = \lambda xz \\ A_z = \lambda V_z &\iff 2x + 2y = \lambda xy \end{aligned}$$

To get rid of $\lambda$, multiply the first equation with $x$ and the second with $y$ to get

$$y(2x + 2z) = \lambda xyz = x(2y + 2z) \implies 2xy + 2yz = 2xy + 2xz \implies 2yz = 2xz.$$

Therefore we find that either $z = 0$ or $x = y$. But $z = 0$ is not possible, because $(x, y, z)$ must satisfy the constraint $xyz = \tfrac12$. Therefore we get $x = y$. If you multiply the second Lagrange equation with $y$ and the third with $z$, then the same reasoning as above tells you that $y = z$. So, if there is a minimum, then it happens when $x = y = z$, i.e. when the box is a cube. The only cube that satisfies the constraint has sides $x = y = z = 2^{-1/3}$. As always, Lagrange’s method does not rule out the possibility that the cube we have found actually maximizes the surface area, rather than minimizing it. That this is actually not the case is something you would have to prove by other means. We will not do that in this course.

(V13.3) Answer: the shortest distance is $\sqrt{100/3}$.

Solution: If $(x, y, z)$ is any point then its distance to the origin is $d(x, y, z) = \sqrt{x^2 + y^2 + z^2}$. We want to minimize $d(x, y, z)$ over all points $(x, y, z)$ which satisfy the constraint $g(x, y, z) = x + y + z = 10$. Instead of minimizing $d(x, y, z)$ we will minimize $f(x, y, z) = d(x, y, z)^2 = x^2 + y^2 + z^2$. You can do this problem directly with the function $d(x, y, z)$ and you will get the same answer; the computations are just a little longer because $f$ has easier derivatives than $d$.

We use Lagrange’s method. First we check for exceptional points, i.e. points on the constraint set which satisfy $\vec\nabla g = \vec0$. Since $\vec\nabla g = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$, the gradient of $g$ can never be the zero vector, so there are no exceptional points.

If there is a minimum of $f$ on the constraint set, it must be a solution of Lagrange’s equations. The Lagrange equations are

$$f_x = \lambda g_x \iff 2x = \lambda,\qquad f_y = \lambda g_y \iff 2y = \lambda,\qquad f_z = \lambda g_z \iff 2z = \lambda.$$

Therefore if there is a nearest point to the origin on the plane then it must satisfy $x = y = z = \lambda/2$ as well as the constraint. The only point satisfying these conditions is $(\tfrac{10}3, \tfrac{10}3, \tfrac{10}3)$.


Lagrange’s method does not tell us that this is the nearest point. As far as Lagrange is concerned it could also be the furthest point from the origin. (But because we know what a plane looks like, we “know” that there has to be a nearest point to the origin.)

(V13.5a) Minimize $f(x, y, z) = (x - 2)^2 + (y - 1)^2 + (z - 4)^2$ subject to the constraint $g(x, y, z) = 2x - y + 3z = 1$.

First, since $\vec\nabla g = \begin{pmatrix} 2 \\ -1 \\ 3 \end{pmatrix} \neq \vec0$, there are no exceptional points, so the nearest point (if it exists) is a solution of Lagrange’s equations. These are

$$2(x - 2) = 2\lambda,\qquad 2(y - 1) = -\lambda,\qquad 2(z - 4) = 3\lambda.$$

Eliminate $\lambda$ to get

$$x = -2y + 4,\qquad z = -3y + 7.$$

Combined with the constraint you then find

$$y = 2,\quad x = 0,\quad z = 1.$$

The Lagrange multiplier is $\lambda = x - 2 = -2$. The distance from the point we found to the given point $(2, 1, 4)$ is

$$d = \sqrt{(x - 2)^2 + (y - 1)^2 + (z - 4)^2} = \sqrt{14}.$$

(V13.5b) $|ax_0 + by_0 + cz_0 - d|/\sqrt{a^2 + b^2 + c^2}$
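The distance formula in (V13.5b) comes with an explicit nearest point: subtract $t\,(a, b, c)$ from $(x_0, y_0, z_0)$, where $t = (ax_0 + by_0 + cz_0 - d)/(a^2 + b^2 + c^2)$. This minimal sketch assumes the plane is written as $ax + by + cz = d$ and uses a helper name of our own; applied to the data of (V13.5a) it reproduces the point $(0, 2, 1)$ and the distance $\sqrt{14}$.

```python
import math


def nearest_point_on_plane(p, normal, d):
    """Nearest point to p on the plane normal . x = d, and the distance from p to it."""
    nn = sum(c * c for c in normal)
    t = (sum(pc * nc for pc, nc in zip(p, normal)) - d) / nn
    q = tuple(pc - t * nc for pc, nc in zip(p, normal))
    # |t| * |normal| equals |a x0 + b y0 + c z0 - d| / sqrt(a^2 + b^2 + c^2)
    return q, abs(t) * math.sqrt(nn)


q, dist = nearest_point_on_plane((2, 1, 4), (2, -1, 3), 1)
print(q, dist, math.sqrt(14))
```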

(V13.8) a cube

(V13.10) 65/3× 65/3× 130/3

(V13.11) It has a square base, and is one and one half times as tall as wide. If the volume is $V$ the dimensions are $\sqrt[3]{2V/3} \times \sqrt[3]{2V/3} \times \sqrt[3]{9V/4}$.

(V13.12) (0, 0, 1), (0, 0,−1)

(V13.13) $\sqrt[3]{4V} \times \sqrt[3]{4V} \times \sqrt[3]{V/16}$

(V13.14) Farthest: $(-\sqrt2, \sqrt2, 2 + 2\sqrt2)$; closest: $(2, 0, 0)$, $(0, -2, 0)$

(VI3.1a) 2

(VI3.1b) 8

(VI3.1c) 2/3

(VI3.1d)

$$\int_0^\pi \int_0^y \frac{\sin y}{y}\,dx\,dy = \int_0^\pi \frac{\sin y}{y}\cdot y\,dy = \int_0^\pi \sin y\,dy = 2.$$

(VI3.1e) Except for a change in notation ($y \to \theta$ and $x \to r$) this is the same integral as in the previous problem. The answer is again 2.

(VI3.1f) Which function is being integrated? It’s the function $f(x, y) = 1$.

$$\int_0^1 \int_0^{\sqrt{1 - x^2}} dy\,dx = \int_0^1 \big[y\big]_{y=0}^{y=\sqrt{1 - x^2}}\,dx = \int_0^1 \sqrt{1 - x^2}\,dx.$$

The last integral is the area of a quarter circle with radius 1, so the answer is $\pi/4$.

(VI3.2) Once you compute the inner integral

$$\int_0^1 \sin(\pi x)\,dx = \Big[-\frac1\pi\cos \pi x\Big]_{x=0}^{1} = -\frac1\pi\cos\pi - \Big(-\frac1\pi\cos 0\Big) = \frac2\pi,$$

you get

$$\int_x^1 \Big\{\int_0^1 \sin(\pi x)\,dx\Big\}\,dy = \int_x^1 \frac2\pi\,dy = \Big[\frac{2y}\pi\Big]_{y=x}^{1} = \frac2\pi(1 - x).$$

The result depends on $x$. The $x$ in the answer and the two $x$-es in the inner integral refer to different quantities. This is at best confusing, and should really never be done.


(VI3.3a) Not true! To give a counterexample for the statement in the problem, almost any two functions $f$ and $g$ will do. For instance, if you choose $f(x) = x$, $g(y) = 1$, then you get

$$\int_0^1 \int_0^2 f(x)g(y)\,dx\,dy = \int_0^1 \int_0^2 x\,dx\,dy = 2,$$

but

$$\int_0^1 f(x)\,dx \times \int_0^2 g(y)\,dy = \int_0^1 x\,dx \times \int_0^2 dy = \tfrac12 \times 2 = 1.$$

(VI3.3b) True!

$$\int_0^1 \int_0^2 f(x)g(y)\,dy\,dx = \int_0^1 \Big\{\int_0^2 f(x)g(y)\,dy\Big\}\,dx.$$

Since $f(x)$ does not depend on $y$, we have

$$\int_0^2 f(x)g(y)\,dy = f(x)\int_0^2 g(y)\,dy.$$

Therefore

$$\int_0^1 \Big\{\int_0^2 f(x)g(y)\,dy\Big\}\,dx = \int_0^1 f(x)\Big\{\int_0^2 g(y)\,dy\Big\}\,dx.$$

The integral $\int_0^2 g(y)\,dy$ is a constant, and therefore does not depend on $x$, so we can factor it out of the $x$-integral:

$$\int_0^1 f(x)\Big\{\int_0^2 g(y)\,dy\Big\}\,dx = \int_0^1 f(x)\,dx \cdot \int_0^2 g(y)\,dy,$$

which is what we had to show.

(VI3.3c) This is false, and there is no simple way of fixing it. To see that this fails, evaluate both sides with $f(x) = 1$ and $g(y) = 1$. On the left you get the area of the disc $D$, which is $\pi$, and on the right you get $2 \cdot 2 = 4$.

(VI3.4) The volume under the graph is $\tfrac13ba^3 + \tfrac13ab^3 = \tfrac13ab(a^2 + b^2)$. The volume of the surrounding block is $a \times b \times (a^2 + b^2)$, so the region beneath the graph occupies one third of the surrounding block, no matter which $a$ or $b$ you choose.

(VI3.5a) 16

(VI3.5b) 4

(VI3.5c) 15/8

(VI3.5d) 1/2

(VI3.5e) 5/6

(VI3.5f) 12− 65/(2e).

(VI3.5g) 1/2

(VI3.5h) $(2/9)\,2^{3/2} - (2/9)$

(VI3.5i) (1− cos(1))/4

(VI3.5j) $(2\sqrt2 - 1)/6$

(VI3.5k) π − 2

(VI3.6a) 8π

(VI3.6b) 2

(VI3.6c) 5/3

(VI3.6d) 81/2


(VI3.6e) $2a^3/3$

(VI3.6f) 8π

(VI3.6g) π/32

(VI3.8a) A

(VI3.8b) B/2

(VI3.9a)

$$\int_0^1 \int_0^{\sqrt{1 - x^2}} \frac{2xy}{x^2 + y^2}\,dy\,dx.$$

(VI3.9b) In polar coordinates the function simplifies to $F(r, \theta) = 2\sin\theta\cos\theta$, so the volume is

$$V = \int_0^1 \int_0^{\pi/2} 2\sin\theta\cos\theta\,r\,d\theta\,dr = \int_0^1 \big[\sin^2\theta\big]_0^{\pi/2}\,r\,dr = \tfrac12.$$
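The value $\tfrac12$ is easy to confirm numerically. The sketch below applies the midpoint rule on the $(r, \theta)$ rectangle to the polar integrand $2\sin\theta\cos\theta\,r$ from (VI3.9b); the grid size is an arbitrary choice.

```python
import math


def polar_volume(n=200):
    """Midpoint rule for the integral of 2 sin(th) cos(th) r over 0 <= r <= 1, 0 <= th <= pi/2."""
    dr = 1.0 / n
    dth = (math.pi / 2) / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            th = (j + 0.5) * dth
            total += 2 * math.sin(th) * math.cos(th) * r * dr * dth
    return total


print(polar_volume())    # close to 0.5
```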

(VI7.1a) A cone around the positive z axis, with opening angle π/6.

(VI7.1b) The negative half of the z axis.

(VI7.1c) The xy plane.

(VI7.1d) The half of the yz plane which contains the positive y axis, and which ends at the z-axis.

(VI7.2a) 0 ≤ θ ≤ π/2, 0 ≤ ρ ≤ a, 0 ≤ φ ≤ π/2.

(VI7.2b) $0 \le \theta \le \pi/2$, $0 \le r \le a$, $0 \le z \le \sqrt{a^2 - r^2}$, or:

$0 \le \theta \le \pi/2$, $0 \le z \le a$, $0 \le r \le \sqrt{a^2 - z^2}$.

(VI7.3) Figures 15 and 16.

(VI7.4a) The large circle has radius 1; the smaller one has radius $\sqrt{1 - z^2}$.

(VI7.4b) $x = \sqrt{1 - y^2 - z^2}$ for the point in front, and $x = -\sqrt{1 - y^2 - z^2}$ for the point in the back (furthest away from you, the viewer).

(VI7.5a) The potential energy is “mass × height × g”. The mass of the small piece of honey is $\Delta m = \mu\,\Delta V$, where $\Delta V$ is the volume occupied by the small piece of honey, so its potential energy is approximately $\mu\,\Delta V\,g z$, with $z$ its height. This is not an exact formula, but only an approximation, since not all particles in the small piece of honey have exactly the same height. However, as one considers smaller and smaller pieces the approximation gets better.

(VI7.5b) The total potential energy is

$$\text{P.E.} = \iiint_D \mu g z\,dV.$$

Interpretation: this is the total energy that would be released if you put all the honey at height zero (e.g. by pouring it out of the jar onto the floor).

(VI7.5c) The iterated integral is

$$\text{P.E.} = \int_{x=0}^{A} \int_{y=0}^{B} \int_{z=0}^{f(x,y)} \mu g z\,dz\,dy\,dx = \frac12\mu g \int_{x=0}^{A} \int_{y=0}^{B} f(x, y)^2\,dy\,dx.$$

(VI7.6a) The kinetic energy in a small region of the air mass is $\tfrac12\Delta m\,v^2$, where $\Delta m$ is the mass of the air in the small region. This mass is $\mu\,\Delta V$, with $\Delta V$ the volume of the small region, so the kinetic energy of the small region is $\tfrac12\mu v^2\,\Delta V$. Partitioning the whole air mass and adding the kinetic energies of all the small pieces leads to this integral:

$$\text{K.E.} = \iiint_D \tfrac12\mu\,v(r)^2\,dV = \tfrac12\mu \iiint_D v(r)^2\,dV.$$


(VI7.6b) In cylindrical coordinates the domain is defined by $0 \le r \le R$ and $0 < z \le H$, so the integral is

$$\text{K.E.} = \frac12 \int_{\theta=0}^{2\pi} \int_{z=0}^{H} \int_{r=0}^{R} \frac{r}{1 + r^2}\,dr\,dz\,d\theta = \frac\pi2 H \ln\big(1 + R^2\big).$$
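A numerical check of the closed form in (VI7.6b), taking the integrand $r/(1 + r^2)$ exactly as it appears in the iterated integral there. Since it depends only on $r$, the $z$ and $\theta$ integrals just contribute the factor $2\pi H$; the values of $R$ and $H$ below are arbitrary sample values.

```python
import math


def kinetic_energy(R, H, n=2000):
    """(1/2) * integral of r/(1 + r^2) over 0<=r<=R, 0<=z<=H, 0<=theta<=2*pi, midpoint rule in r."""
    h = R / n
    radial = 0.0
    for i in range(n):
        r = (i + 0.5) * h
        radial += r / (1 + r * r) * h
    return 0.5 * (2 * math.pi) * H * radial    # the z and theta integrals give 2*pi*H


R, H = 2.0, 3.0
numeric = kinetic_energy(R, H)
closed_form = math.pi / 2 * H * math.log(1 + R * R)
print(numeric, closed_form)
```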

(VI7.7a) 623/60

(VI7.7b) $-3e^2/4 + 2e - 3/4$

(VI7.7c) 1/20

(VI7.7d) π/48

(VI7.7e) 11/84

(VI7.7f) 151/60

(VI7.8) 32

(VI7.9) 64/3

(VI7.10) x = y = 0, z = 16/15

(VI7.11) x = y = 0, z = 1/3

(VI7.12a) $I = V_+$, $J = -V_-$ (note the minus sign), $K = V_+ - V_-$, $L = V_+ + V_-$.

(VI7.13) π/12

(VI7.14) 5π/4

(VI7.15) 0

(VI7.16) 5π/4

(VI7.17) 4/5

(VI7.18) 256π/15

(VI7.19) $4\pi^2$

(VI7.20) $\pi k h^2 a^2/12$

(VI7.21) $\pi k h a^3/6$

(VI7.22) $\pi^2/4$

(VI7.23) 4π/5

(VI7.24) 15π

(VII4.1a) The answer is 1. You could compute that, but you don’t have to. The distance is 1 everywhere, so its average should also be 1.

(VII4.1b) $\dfrac{\int_0^{\pi/2}\theta\,d\theta}{\pi/2} = \dfrac\pi4$

(VII4.3) The average x coordinate is zero, and the average y coordinate is 2/π.

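(VII4.4) $\vec x(t) = \begin{pmatrix} t \\ t^2 \end{pmatrix}$, $0 \le t \le 1$, is a parametrization, so the integral becomes

$$\int_C x\,ds = \int_{t=0}^{1} \underbrace{t}_{x = t}\;\underbrace{\sqrt{1 + 4t^2}}_{\|\vec x\,'(t)\|}\,dt = \Big[\frac23\cdot\frac18\big(1 + 4t^2\big)^{3/2}\Big]_0^1 = \frac{5\sqrt5 - 1}{12}.$$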
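The antiderivative step can be double-checked with a direct midpoint-rule evaluation of $\int_0^1 t\sqrt{1 + 4t^2}\,dt$, using the parametrization $\vec x(t) = (t, t^2)$ from (VII4.4). A minimal sketch; the number of subintervals is an arbitrary choice:

```python
import math


def integral_x_ds(n=10_000):
    """Midpoint rule for the integral of x ds along x(t) = (t, t^2), 0 <= t <= 1."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        speed = math.sqrt(1 + 4 * t * t)    # |x'(t)| with x'(t) = (1, 2t)
        total += t * speed * h              # the integrand x equals t on this curve
    return total


print(integral_x_ds(), (5 * math.sqrt(5) - 1) / 12)
```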


(VII4.5a) $a$, $H$, $L$ are lengths; $T_0$ is a temperature.

(VII4.5b) $a = 1$ is the radius of the cylinder on which the helix lies, and $H = \pi/2$ is the height of one turn of the helix.

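(VII4.5c) The average is

$$\text{average temp.} = \frac{\int_C T\,ds}{\int_C ds}.$$

With the given parametrization $ds = \|\vec x\,'(t)\|\,dt = \sqrt{a^2 + H^2/4\pi^2}\,dt$, an ugly expression, but it’s constant, which is good for integrating. You get

$$\int_C ds = \int_0^{2\pi} \sqrt{a^2 + H^2/4\pi^2}\,dt = 2\pi\sqrt{a^2 + H^2/4\pi^2} = \sqrt{4\pi^2a^2 + H^2}$$

and

$$\begin{aligned} \int_C T\,ds &= \int_0^{2\pi} T_0 e^{-Ht/2\pi L}\sqrt{a^2 + H^2/4\pi^2}\,dt \\ &= T_0\sqrt{a^2 + H^2/4\pi^2}\,\Big[-\frac{2\pi L}{H}e^{-Ht/2\pi L}\Big]_{t=0}^{2\pi} \\ &= T_0\sqrt{a^2 + H^2/4\pi^2}\,\frac{2\pi L}{H}\big[1 - e^{-H/L}\big]. \end{aligned}$$

Therefore the average temperature is

$$\text{average temp.} = \frac{L}{H}\big(1 - e^{-H/L}\big)T_0.$$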
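The closed form for the average temperature can be checked numerically. The sketch assumes, as in the integrals of (VII4.5c), that the helix is parametrized by $t \in [0, 2\pi]$ with constant speed $\sqrt{a^2 + H^2/4\pi^2}$ and that $T = T_0e^{-Ht/2\pi L}$ along it; $a = 1$ and $H = \pi/2$ are the values from (VII4.5b), while $L$ and $T_0$ are made-up sample values.

```python
import math


def average_temperature(a, H, L, T0, n=20_000):
    """Midpoint-rule average of T0 * exp(-H t / (2 pi L)) along one turn of the helix."""
    speed = math.sqrt(a * a + H * H / (4 * math.pi ** 2))    # |x'(t)|, constant in t
    h = 2 * math.pi / n
    weighted = length = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        weighted += T0 * math.exp(-H * t / (2 * math.pi * L)) * speed * h
        length += speed * h
    return weighted / length


a, H = 1.0, math.pi / 2          # values from (VII4.5b)
L, T0 = 3.0, 20.0                # hypothetical sample values
closed_form = (L / H) * (1 - math.exp(-H / L)) * T0
print(average_temperature(a, H, L, T0), closed_form)
```

Because the speed is constant it cancels between numerator and denominator, which is why the answer does not depend on $a$.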

(VII8.1) Yes. It is the gradient of f(x, y) = gy.

(VII8.3) By Clairaut’s theorem, if $\vec F$ is a gradient, then $P_y = Q_x$.

By the fundamental theorem for line integrals, if $\vec F$ is a gradient, then $\oint_C \vec F\cdot d\vec x = 0$ (or, equivalently, $\oint_C P\,dx + Q\,dy = 0$) for every closed curve $C$.

(VII8.4a) $\int_C \vec F\cdot d\vec x = \int_C x\,dx = 0$ and $\int_C \vec G\cdot d\vec x = \int_C x\,dy = \pi$.

(VII8.4b) Since $\int_C \vec G\cdot d\vec x \neq 0$, the vector field $\vec G$ cannot be a gradient.

(VII8.4c) The integral $\int_C \vec F\cdot d\vec x$ vanishes, but to check that $\vec F$ is conservative one has to check $\oint_C \vec F\cdot d\vec x = 0$ for all closed curves $C$, and not just the unit circle. So our integral computation does not imply that $\vec F$ is a gradient.

You can use different arguments to show directly that $\vec F$ is a gradient, for instance by noting that $\vec\nabla(\tfrac12x^2) = \begin{pmatrix} x \\ 0 \end{pmatrix}$, or, if you’re not that lucky, by using the methods of § IV.14.

(VII12.2b) The answer in both cases is the same (because they are two different ways of computing the same integral). The second approach, using Green’s theorem, leads to

$$\oint_C 2y\,dx + 3x\,dy = \iint_R \Big(\frac{\partial(3x)}{\partial x} - \frac{\partial(2y)}{\partial y}\Big)\,dA = \iint_R (3 - 2)\,dA,$$

so the answer is the area of the square, i.e. 1.
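To see that the two approaches agree, here is a minimal sketch that evaluates $\oint_C 2y\,dx + 3x\,dy$ directly, edge by edge. It assumes $C$ is the boundary of the unit square $[0,1]\times[0,1]$ traversed counterclockwise, which is consistent with the area 1 in (VII12.2b). The midpoint rule is exact here (up to rounding) because the integrand is linear along each edge.

```python
def edge_integral(p, q, n=1000):
    """Midpoint rule for the integral of 2y dx + 3x dy along the segment from p to q."""
    (x0, y0), (x1, y1) = p, q
    dx, dy = (x1 - x0) / n, (y1 - y0) / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) / n
        x, y = x0 + s * (x1 - x0), y0 + s * (y1 - y0)
        total += 2 * y * dx + 3 * x * dy
    return total


corners = [(0, 0), (1, 0), (1, 1), (0, 1)]        # counterclockwise
circulation = sum(edge_integral(corners[i], corners[(i + 1) % 4]) for i in range(4))
print(circulation)    # Green's theorem predicts 1, the area of the square
```

Edge by edge the contributions are 0, 3, −2 and 0, which add up to the area.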

(VII12.3) Using Green’s theorem we get zero. But here we do not need Green’s theorem: the Fundamental Theorem for line integrals (see the section on integrals of gradients over closed curves) tells us that this integral must be zero.

(VII12.5a) 0

(VII12.5b) $1/(2e) - 1/(2e^7) + e/2 - e^7/2$

(VII12.5c) 1/2

(VII12.5d) 0


(VII12.5e) −1/6

(VII12.5f) $(2\sqrt3 - 10\sqrt5 + 8\sqrt6)/3 - 2\sqrt2/5 + 1/5$

(VII12.5g) 11/2− ln(2)

(VII12.5h) 2− π/2

(VII12.5i) −17/12

(VII12.5j) 0

(VII12.5k) −π/2

(VII12.5l) −π/2

(VII12.5m) 12π

(VII17.1) The distance $r$ to the central axis satisfies $r^2 = y^2 + z^2$, so

$$\vec v(x, y, z) = v_c\Big(1 - \frac{y^2 + z^2}{R^2}\Big)\vec\imath.$$

(VII17.2) The inverse square law holds:

$$\|\vec F\| = \Big\|\frac{-C\vec x}{\|\vec x\|^3}\Big\| = \frac{C}{\|\vec x\|^3}\|\vec x\| = \frac{C}{\|\vec x\|^2}.$$

(VII17.3) $n = 2$ and $C = \mu_0 I/2\pi$.

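(VII17.5a) $\big(e^{\vec m\cdot\vec x}\big)_{x_1} = m_1 e^{\vec m\cdot\vec x}$, and the same for the $x_2$ and $x_3$ derivatives. Therefore

$$\vec\nabla\big(e^{\vec m\cdot\vec x}\big) = \begin{pmatrix} m_1e^{\vec m\cdot\vec x} \\ m_2e^{\vec m\cdot\vec x} \\ m_3e^{\vec m\cdot\vec x} \end{pmatrix} = e^{\vec m\cdot\vec x}\begin{pmatrix} m_1 \\ m_2 \\ m_3 \end{pmatrix}.$$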

(VII17.5b) After simplifying you get $\vec\nabla\cdot\vec v = (\vec m\cdot\vec a)\,e^{\vec m\cdot\vec x}$.

(VII17.5c) $\vec\nabla\times\vec v = (\vec m\times\vec a)\,e^{\vec m\cdot\vec x}$.

(VII17.5d) $\vec a$ and $\vec m$ must be perpendicular.

(VII17.5e) If $\vec v$ is the gradient of some function, then its curl must vanish. Therefore $\vec a\times\vec m = \vec0$ in view of part (c) of this problem. The conclusion is that $\vec a$ and $\vec m$ must be parallel.

(VII17.6) $\vec v\cdot\vec\nabla f = Pf_x + Qf_y + Rf_z$.

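(VII17.7a) By definition,

$$\begin{aligned} \vec\nabla\cdot(f\vec v) &= \vec\nabla\cdot\begin{pmatrix} fP \\ fQ \\ fR \end{pmatrix} = \frac{\partial(fP)}{\partial x} + \frac{\partial(fQ)}{\partial y} + \frac{\partial(fR)}{\partial z} \\ &= f_xP + fP_x + f_yQ + fQ_y + f_zR + fR_z \\ &= f_xP + f_yQ + f_zR + f\big(P_x + Q_y + R_z\big) \\ &= \begin{pmatrix} f_x \\ f_y \\ f_z \end{pmatrix}\cdot\begin{pmatrix} P \\ Q \\ R \end{pmatrix} + f\,\vec\nabla\cdot\vec v \\ &= \vec\nabla f\cdot\vec v + f\,\vec\nabla\cdot\vec v, \end{aligned}$$

as claimed.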


(VII17.7b) $\vec\nabla\times(f\vec v) = (\vec\nabla f)\times\vec v + f\,\vec\nabla\times\vec v$ is the rule. The derivation goes along the same lines as for the previous product rule.

(VII17.8) This is example 16.1.

(VII17.9a) $5\rho^2$.

(VII17.9b) $\dfrac{\vec x\cdot\vec x}{\rho} = \|\vec x\|^2/\rho = \rho^2/\rho = \rho$.

(VII17.9c) Note that $\|\vec x\| = \rho$, so you have to compute $\vec\nabla\cdot(\vec x/\rho^3)$. The answer is zero.

It says that the divergence of the gravitational field of the Earth is zero.

(VII17.10b) Since $\vec x$ is the gradient of some function, its curl must vanish.

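(VII17.10c) $\vec\nabla\times(\rho\vec x) = (\vec\nabla\rho)\times\vec x + \rho\,\vec\nabla\times\vec x = \vec0$, since $\vec\nabla\rho = \vec x/\rho$ is parallel to $\vec x$ and $\vec\nabla\times\vec x = \vec0$.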

(VII17.11) $\vec v(x, y, z) = \begin{pmatrix} -y \\ x \\ 0 \end{pmatrix}$, so $\vec\nabla\times\vec v = \begin{pmatrix} 0 \\ 0 \\ 2 \end{pmatrix} = 2\vec k$.

(VII17.12a)

$$\vec v(x, y, z) = \begin{pmatrix} x\,(x^2 + y^2 + z^2)^{n/2} \\ y\,(x^2 + y^2 + z^2)^{n/2} \\ z\,(x^2 + y^2 + z^2)^{n/2} \end{pmatrix}.$$

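(VII17.12b) Using the product rule, you get

$$\vec\nabla\cdot(\rho^n\vec x) = (\vec\nabla\rho^n)\cdot\vec x + \rho^n\,\vec\nabla\cdot\vec x = n\rho^{n-1}(\vec\nabla\rho)\cdot\vec x + \rho^n\,\vec\nabla\cdot\vec x.$$

Now recall (or compute again): $\vec\nabla\rho = \dfrac{\vec x}{\rho}$, and $\vec\nabla\cdot\vec x = 3$. This leads to

$$\vec\nabla\cdot(\rho^n\vec x) = n\rho^{n-1}\frac{\vec x}{\rho}\cdot\vec x + 3\rho^n = n\rho^{n-2}\|\vec x\|^2 + 3\rho^n = (n + 3)\rho^n.$$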

(VII17.12c) n = −3.
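The formula $\vec\nabla\cdot(\rho^n\vec x) = (n + 3)\rho^n$, and with it the special value $n = -3$, can be checked with central differences. A minimal sketch; the sample point, the exponents and the step size are arbitrary choices:

```python
import math


def field(p, n):
    """The vector field rho^n * x at the point p, where rho = |x|."""
    rho = math.sqrt(sum(c * c for c in p))
    return [rho ** n * c for c in p]


def divergence(p, n, h=1e-5):
    """Central-difference approximation of the divergence of rho^n * x at p."""
    total = 0.0
    for k in range(3):
        plus, minus = list(p), list(p)
        plus[k] += h
        minus[k] -= h
        total += (field(plus, n)[k] - field(minus, n)[k]) / (2 * h)
    return total


p = (0.3, -0.4, 1.2)                        # rho = 1.3 at this point
rho = math.sqrt(sum(c * c for c in p))
for n in (2, -1, -3):
    print(n, divergence(p, n), (n + 3) * rho ** n)
```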

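(VII17.13) There is a long and a short answer. The long(er) computation goes like this:

$$\vec\nabla F(\rho) = \begin{pmatrix} F(\rho)_x \\ F(\rho)_y \\ F(\rho)_z \end{pmatrix} = \begin{pmatrix} F'(\rho)\rho_x \\ F'(\rho)\rho_y \\ F'(\rho)\rho_z \end{pmatrix} = F'(\rho)\begin{pmatrix} \rho_x \\ \rho_y \\ \rho_z \end{pmatrix}.$$

Now recall (185), and you find

$$\vec\nabla F(\rho) = F'(\rho)\begin{pmatrix} x/\rho \\ y/\rho \\ z/\rho \end{pmatrix} = \frac1\rho F'(\rho)\,\vec x.$$

The short computation is essentially the same, but you never write the components of the vectors:

$$\vec\nabla F(\rho) = F'(\rho)\,\vec\nabla\rho = \frac1\rho F'(\rho)\,\vec x.$$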

(VII17.13a) If $f(x, y, z) = F(\rho)$, then by the previous problem we have $\vec\nabla f = \rho^{-1}F'(\rho)\,\vec x$. We want this to be equal to $\rho^n\vec x$, so $F(\rho)$ must satisfy

$$\rho^{-1}F'(\rho) = \rho^n \implies F'(\rho) = \rho^{1+n} \implies F(\rho) = \frac{\rho^{2+n}}{2 + n} + C$$

for some constant $C$. We are only asked to find one function $f$, so we find that the given vector field is indeed the gradient of a radially symmetric function:

$$\vec v = \rho^n\vec x = \vec\nabla\Big(\frac{\rho^{2+n}}{2 + n}\Big).$$

The exceptional case is when $n = -2$, in which case you get $F(\rho) = \ln\rho$.

CHAPTER 8

GNU Free Documentation License

Version 1.3, 3 November 2008

Copyright © 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc.

〈http://fsf.org/〉

Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

Preamble

The purpose of this License is to make a manual, textbook, or other functional and useful document “free” in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others.

This License is a kind of “copyleft”, which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software.

We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose is instruction or reference.

1. APPLICABILITY AND DEFINITIONS

This License applies to any manual or other work, in any medium, that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. Such a notice grants a world-wide, royalty-free license, unlimited in duration, to use that work under the conditions stated herein. The “Document”, below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as “you”. You accept the license if you copy, modify or distribute the work in a way requiring permission under copyright law.

A “Modified Version” of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language.

A “Secondary Section” is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document’s overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. (Thus, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them.

The “Invariant Sections” are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License. If a section does not fit the above definition of Secondary then it is not allowed to be designated as Invariant. The Document may contain zero Invariant Sections. If the Document does not identify any Invariant Sections then there are none.

The “Cover Texts” are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License. A Front-Cover Text may be at most 5 words, and a Back-Cover Text may be at most 25 words.

A “Transparent” copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, that is suitable for revising the document straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup, or absence of markup, has been arranged to thwart or discourage subsequent modification by readers is not Transparent. An image format is not Transparent if used for any substantial amount of text. A copy that is not “Transparent” is called “Opaque”.

Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML, PostScript or PDF designed for human modification. Examples of transparent image formats include PNG, XCF and JPG. Opaque formats include proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML, PostScript or PDF produced by some word processors for output purposes only.

The “Title Page” means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, “Title Page” means the text near the most prominent appearance of the work’s title, preceding the beginning of the body of the text.

The “publisher” means any person or entity that distributes copies of the Document to the public.

A section “Entitled XYZ” means a named subunit of the Document whose title either is precisely XYZ or contains XYZ in parentheses following text that translates XYZ in another language. (Here XYZ stands for a specific section name mentioned below, such as “Acknowledgements”, “Dedications”, “Endorsements”, or “History”.) To “Preserve the Title” of such a section when you modify the Document means that it remains a section “Entitled XYZ” according to this definition.

The Document may include Warranty Disclaimers next to the notice which states that this License applies to the Document. These Warranty Disclaimers are considered to be included by reference in this License, but only as regards disclaiming warranties: any other implication that these Warranty Disclaimers may have is void and has no effect on the meaning of this License.

2. VERBATIM COPYING

You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3.

You may also lend copies, under the same conditions stated above, and you may publicly display copies.

3. COPYING IN QUANTITY

If you publish printed copies (or copies in media that commonly have printed covers) of the Document, numbering more than 100, and the Document’s license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects.

If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages.

If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a computer-network location from which the general network-using public has access to download using public-standard network protocols a complete Transparent copy of the Document, free of added material. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public.

It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document.

4. MODIFICATIONS

You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:

A. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which should, if there were any, be listed in the History section of the Document). You may use the same title as a previous version if the original publisher of that version gives permission.

B. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together with at least five of the principal authors of the Document (all of its principal authors, if it has fewer than five), unless they release you from this requirement.

C. State on the Title page the name of the publisher of the Modified Version, as the publisher.

D. Preserve all the copyright notices of the Document.

E. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices.

F. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the terms of this License, in the form shown in the Addendum below.

G. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document’s license notice.

H. Include an unaltered copy of this License.

I. Preserve the section Entitled “History”, Preserve its Title, and add to it an item stating at least the title, year, new authors, and publisher of the Modified Version as given on the Title Page. If there is no section Entitled “History” in the Document, create one stating the title, year, authors, and publisher of the Document as given on its Title Page, then add an item describing the Modified Version as stated in the previous sentence.

J. Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network locations given in the Document for previous versions it was based on. These may be placed in the “History” section. You may omit a network location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives permission.

K. For any section Entitled “Acknowledgements” or “Dedications”, Preserve the Title of the section, and preserve in the section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein.

L. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the equivalent are not considered part of the section titles.

M. Delete any section Entitled “Endorsements”. Such a section may not be included in the Modified Version.

N. Do not retitle any existing section to be Entitled “Endorsements” or to conflict in title with any Invariant Section.

O. Preserve any Warranty Disclaimers.

If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version’s license notice. These titles must be distinct from any other section titles.

You may add a section Entitled “Endorsements”, provided it contains nothing but endorsements of your Modified Version by various parties—for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard.

You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one.

The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version.

5. COMBINING DOCUMENTS

You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice, and that you preserve all their Warranty Disclaimers.

The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work.


In the combination, you must combine any sections Entitled “History” in the various original documents, forming one section Entitled “History”; likewise combine any sections Entitled “Acknowledgements”, and any sections Entitled “Dedications”. You must delete all sections Entitled “Endorsements”.

6. COLLECTIONS OF DOCUMENTS

You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects.

You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document.

7. AGGREGATION WITH INDEPENDENT WORKS

A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, is called an “aggregate” if the copyright resulting from the compilation is not used to limit the legal rights of the compilation’s users beyond what the individual works permit. When the Document is included in an aggregate, this License does not apply to the other works in the aggregate which are not themselves derivative works of the Document.

If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one half of the entire aggregate, the Document’s Cover Texts may be placed on covers that bracket the Document within the aggregate, or the electronic equivalent of covers if the Document is in electronic form. Otherwise they must appear on printed covers that bracket the whole aggregate.

8. TRANSLATION

Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. You may include a translation of this License, and all the license notices in the Document, and any Warranty Disclaimers, provided that you also include the original English version of this License and the original versions of those notices and disclaimers. In case of a disagreement between the translation and the original version of this License or a notice or disclaimer, the original version will prevail.

If a section in the Document is Entitled “Acknowledgements”, “Dedications”, or “History”, the requirement (section 4) to Preserve its Title (section 1) will typically require changing the actual title.

9. TERMINATION

You may not copy, modify, sublicense, or distribute the Document except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense, or distribute it is void, and will automatically terminate your rights under this License.

However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation.

Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable means, this is the first time you have received notice of violation of this License (for any work) from that copyright holder, and you cure the violation prior to 30 days after your receipt of the notice.

Termination of your rights under this section does not terminate the licenses of parties who have received copies or rights from you under this License. If your rights have been terminated and not permanently reinstated, receipt of a copy of some or all of the same material does not give you any rights to use it.

10. FUTURE REVISIONS OF THIS LICENSE

The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See http://www.gnu.org/copyleft/.

Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License “or any later version” applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation. If the Document specifies that a proxy can decide which future versions of this License can be used, that proxy’s public statement of acceptance of a version permanently authorizes you to choose that version for the Document.

11. RELICENSING

“Massive Multiauthor Collaboration Site” (or “MMC Site”) means any World Wide Web server that publishes copyrightable works and also provides prominent facilities for anybody to edit those works. A public wiki that anybody can edit is an example of such a server. A “Massive Multiauthor Collaboration” (or “MMC”) contained in the site means any set of copyrightable works thus published on the MMC site.

“CC-BY-SA” means the Creative Commons Attribution-Share Alike 3.0 license published by Creative Commons Corporation, a not-for-profit corporation with a principal place of business in San Francisco, California, as well as future copyleft versions of that license published by that same organization.

“Incorporate” means to publish or republish a Document, in whole or in part, as part of another Document.

An MMC is “eligible for relicensing” if it is licensed under this License, and if all works that were first published under this License somewhere other than this MMC, and subsequently incorporated in whole or in part into the MMC, (1) had no cover texts or invariant sections, and (2) were thus incorporated prior to November 1, 2008.

The operator of an MMC Site may republish an MMC contained in the site under CC-BY-SA on the same site at any time before August 1, 2009, provided the MMC is eligible for relicensing.
