supp notes calculus

MATH 2010

Supplementary Notes on Multivariable Calculus

Department of Mathematics

The Hong Kong University of Science and Technology

January 27, 2012

Contents

1 Vectors and Geometry of Space 11.1 Three-Dimensional Coordinate Systems . . . . . . . . . . . . . . . . . . . . . . . . . 11.2 Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51.3 The Dot Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91.4 The Cross Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121.5 Equations of Lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161.6 Equations of Planes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201.7 Quadric Surfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

2 Vector-Valued Functions 292.1 Vector Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292.2 Calculus with Vector Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332.3 Arc Length in Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

3 Partial Derivatives 393.1 Functions of Several Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393.2 Limits and Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423.3 Partial Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463.4 The Chain Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 593.5 Directional Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 663.6 Applications of Partial Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

4 Multiple Integrals 994.1 Double Integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 994.2 Double Integrals Over Non-rectangular Regions . . . . . . . . . . . . . . . . . . . . . 1084.3 Double Integrals in Polar Coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . 1174.4 Triple Integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1284.5 Triple Integrals in Cylindrical Coordinates . . . . . . . . . . . . . . . . . . . . . . . . 1344.6 Triple Integrals in Spherical Coordinates . . . . . . . . . . . . . . . . . . . . . . . . . 136

iii

Chapter 1

Vectors and Geometry of Space

To apply calculus in many real-world situations as well as in higher mathematics, we need a mathematicaldescription of the three-dimensional space. In this beginning chapter we will introduce three-dimensionalcoordinate systems and vectors. Building on what we already know about coordinates in the two-dimensionalxy-plane, we establish coordinates in space by adding a third axis which measures distance above and belowthe xy-plane. Vectors are used to study the analytic geometry of space, where they provide simple ways todescribe lines, planes, surfaces, and curves in space. We will use these geometric ideas later to study motionin space and the calculus of several variables, with their many important applications in science, engineering,and higher mathematics.

1.1 Three-Dimensional Coordinate Systems

In this section we will have a fairly short introducion on the three-dimensional coordinate system andconventions that we will be using. We will also take a brief look at how the different coordinate systems canchange the graph of an equation. The 3-dimensional coordinate system is often denoted by R

3. Likewisethe 2-dimensional coordinate system is often denoted by R

2 and the 1-dimensional by R.

To locate a point in R3, we use three mutually perpendicular coordinate axes, arranged as in the

following figure. The axes shown there make a right-handed coordinate frame. When you hold your righthand so that the fingers curl from the positive x-axis toward the positive y-axis, your thumb points alongthe positive z-axis.

x

y

z

•S = (x, 0, z)

•Q = (x, y, 0)

• R = (0, y, z)

• P = (x, y, z)

Let us look at the basic coordinate system. It is assumed that only the positive directions are shown by theaxes. If we need the negative axis for any reason we will put them in as needed.

1

1. Vectors and Geometry of Space

Also note the various points in the figure. The point P of the Cartesian coordinates (x, y, z) is the generalpoint sitting out in 3-dimensional space R

3. If we start at P and drop a straight line down until we reacha z-coordinate of zero we arrive that the point Q. We say that Q sits in the xy-plane. The xy-planecorresponds to all the points which have a zero z-coordinate. We can also start at P and move in the othertwo directions as shown to get points in the xz-plane (that is the point S with a y-coordinate of zero) andthe yz-plane (that is the point R with a x-coordinate of zero). The xy, yz, and xz-planes are often calledthe coordinate planes. Also, the point Q is often referred to as the projection of P in the xy-plane. Likewise,R is the projection of P in the yz-plane and S is the projection of P in the xz-plane.

Most of the formulae that you are familiar with in R2 have their natural extensions in R

3. For instance,the distance between two points in R

2 is given by

dist(P1, P2) =p

(x2 − x1)2 + (y2 − y1)2,

while the distance between two points in R3 is given by

dist(P1, P2) =p

(x2 − x1)2 + (y2 − y1)2 + (z2 − z1)2.

We might use the distance formula to write equations for spheres in space. A point P (x, y, z) lies on thesphere of radius a centered at P0(x0, y0, z0) precisely when dist(P0, P ) = a, or

(x − x0)2 + (y − y0)

2 + (z − z0)2 = a2.

It is worth-mentioning that we should be careful when just translating everything we know about R2 into

R3 and assuming that it will work the same way. A good example of this is in graphing to some extent.

Consider the following example.

� Example 1.1.1 (Graphing in different coordinate systems)

Graph and interpret the equation x = 3 geometrically in the coordinate systems R, R2, and R

3.

Solution In R we have a single coordinate system and so x = 3 is a point in a one-dimensional coordinatesystem.

In R2 the equation x = 3 tells us to graph all the points that are in the form (3, y). This is a vertical

line in a two-dimensional coordinate system.

In R3 the equation x = 3 tells us to graph all the points that are in the form (3, y, z). If you go back

and look at the coordinate planes in 3-space this is very similar to yz-plane except this time we have x = 3instead of x = 0. So, in a three-dimensional coordinate system this is a plane parallel to the yz-plane.Here are the graphs of each of these.

0 1 2 3 4• x

1 2 3x

y

x

y

z

2

1.1 Three-Dimensional Coordinate Systems

�

Note that at this moment we can now write down the equations for each of the coordinate planes as wellusing this idea.

x = 0 ⇐⇒ the yz-plane,y = 0 ⇐⇒ the xz-plane,z = 0 ⇐⇒ the xy-plane.

Let us take a look at a slightly more general example.


Graph and interpret the equationy = 2x − 1

geometrically in the coordinate systems R2 and R

3.

Solution Of course we have to throw out R for this example since there are two variables which meansthat we cannot be in a one-dimensional space.

In R2 the equation y = 2x − 1 is a line with slope 2 and a y-intercept of −1.

However, in R3 this is not necessarily a line. Because we have not specified a value of z ,we may consider

that z can take any value. This means that at any particular value of z we will get a copy of this line. So,the graph of the given equation in 3-space is then a vertical plane that lies over the line given by y = 2x− 1in the xy-plane.

y

x

x

y

z

y = 2x − 1on xy-plane

�

Notice that if we look to where the plane intersect the xy-plane we will get the graph of the line in R2 as

shown in the above graph. Let us take a look at one more example of the difference between graphs in thedifferent coordinate systems.

3



Graph and interpret the equation

x2 + y2 = 4

geometrically in the coordinate systems R2 and R

3.

Solution As with the previous example this won’t have a one-dimensional graph since there are twoindependent variables in the equation.

In R2 this is a circle centered at the origin with radius 2.

In R3, as with the previous example, this may or may not be a circle. Since we have not specified z in any

way we must assume that z can take on any value. In other words, at any value of z this equation must besatisfied and so at any value z we have a circle of radius 2 centered on the z-axis. This means that we havea cylinder of radius 2 centered on the z-axis. The following is the graphs for this example.

y

x

x

y

z

Again, if we look to where the cylinder intersects the xy-plane (i.e., z = 0) we will again get the circlefrom R

3. �

We need to be careful with the last two examples. It would be interesting to take the results of these andsay that we can’t graph lines or circles in R

3 and yet that does not really make sense. There is no reasonfor not allowing to graph a line or a circle in R

3. To graph a circle in R3 we would need to do something like

x2 + y2 = 4 at z = 5. This would be a circle of radius 2 centered on the z-axis at the level of z = 5. So,as long as we specify a z-value we will get a circle and not a cylinder. We will see an easier way to specifycircles in a later section. We could do the same thing with the line from Example 1.1.2. However, we willbe looking at line in more generality in Section 1.5 (page 16) and so we will see a better way to deal withlines in R

3 there.

The point of the examples in this section is to make sure that we are being careful with graphing equationsand making sure that we always remember which coordinate system that we are in. Another quick pointto make here is that, as we have seen in the above examples, many graphs of equations in R

3 are surfaces.That does not mean that we cannot graph curves in R

3. We can and will graph curves in R3 as well as we

will see later in this chapter.

4

1.2 Vectors

As we mentioned in the beginning of this chapter, there is a mathematical tool which can be used todescribe lines, curves, planes, and surfaces in space. This is the building block in geometry: vectors. Inthe next section we will give a short review on vectors and their basic properties.

1.2 Vectors

In this section we show how to represent things that have both magnitude and direction in the plane or inspace. A quantity in physics such as force, displacement, or velocity is called a vector and is represented bya directed line segment. The arrow points in the direction of the action and its length gives the magnitudeof the action in terms of a suitably chosen unit. For example, a force vector points in the direction in whichthe force acts; its length is a measure of the force’s strength; a velocity vector points in the direction ofmotion and its length is the speed of the moving object.

−−→PQ

Initialpoint

P

Terminalpoint

Q

1.2.1 Definition A vector is a directed line segment. The directed line segment−−→PQ has initial point P

and terminal point Q; its length (or norm) is denoted by ‖−−→PQ‖. Two vectors are equal if they have thesame length and direction.

The arrows we use when we draw vectors are understood to represent the same vector if they have thesame length, are parallel, and point in the same direction regardless of the initial point. We need a way torepresent vectors algebraically so that we can be more precise about the direction of a vector. Let

v =−−→PQ.

There is one directed line segment equal to−−→PQ whose initial point is the origin. It is the representative of

v in standard position and is the vector we normally use to represent v. We can specify v by writing thecoordinates of its terminal point (v1, v2, v3) when v is in standard position.

1.2.2 Definition If v is a two-dimensional vector in the plane equal to the vector with initial point atthe origin and terminal point (v1, v2), then the component form of v is

v = 〈v1, v2〉.

If v is a three-dimensional vector equal to the vector with initial point at the origin and terminal point(v1, v2, v3), then the component form of v is

v = 〈v1, v2, v3〉.

5


So a two-dimensional vector v = 〈v1, v2〉 is an ordered pair of real numbers, and a three-dimensional vectorv = 〈v1, v2, v3〉 is an ordered triple of real numbers. The numbers v1, v2, v3 are the coordinates of v.

In general, an n-component multivariable x = 〈x1, x2, · · · , xn〉 is called an n-dimensional vector. Thecomponents x1, x2, · · · , xn are called the coordinates of the vector x. The collection R

n of all n-dimensional vectors is called the n-dimensional Euclidean space. We indicate x is an n-dimensionalvector by writing x ∈ R

n.

We may visualize low dimensional Euclidean spaces as follows.

R0 : a single point;

R1 : a straight line with origin;

R2 : a plane with origin;

R3 : our living world with a choice of reference point as origin.

The origin corresponds to the zero vector 0 = 〈0, 0, · · · , 0〉. We may also imagine higher dimensionalEuclidean spaces by analogy.

In summary, a position vector is represented either as a point in the Euclidean space or as an arrow fromthe origin to the point.

0

•(−2, 1)

•(1,−1)

• (2, 2)

0

〈−2, 1〉

〈1,−1〉

〈2, 2〉

vectors as points vectors as arrows

In the real world, one often has to deal with arrows not starting from the reference point (i.e., theorigin). In practical, we get around the problem by parallely moving the arrows so that the starting pointsbecome the origin. Throughout this course any two arrows will be considered as the same vector if one canbe parallely moved to the other.

0

v

v

Two principal operations involving vectors are vector addition and scalar multiplication. A scalar issimply a real number, and is called to distinguish it from vectors. Scalars can be positive, negative or zerowhich can be used to “scale” a vector by multiplication.

6

1.2 Vectors

• Addition and scalar multiplication

Vectors of the same dimension may be added and scalar multiplied in the natural way:

〈x1, x2, · · · , xn〉 + 〈y1, y2, · · · , yn〉 = 〈x1 + y1, x2 + y2, · · · , xn + yn〉,c〈x1, x2, · · · , xn〉 = 〈cx1, cx2, · · · , cxn〉.

The scalar multiplication is easily visualized as the stretching and shrinking of vectors. The addition isvisualized with the help of parallelograms.

−u 0•

12u u 2u

v u + v

The above figure indicates that the operations on vectors have physical meaning in the real world. Forexample, suppose vectors x and y represent two forces applied to the same point of an object. Then thecombined effect of the two forces on the object is a force represented by the vector x + y.

By repeatedly making use of the two operations, we have linear combinations of vectors.

� Example 1.2.1 (Linear combination of vectors)

Let

x = 〈1,−2,−3, 5〉, y = 〈0,−1, 2,−4〉, z = 〈−1, 3, 1,−1〉.Then

−x + 2y − 3z = 〈2,−9, 4,−10〉, 2x + y − 2z = 〈4,−11,−6, 8〉.�

The following figure indicates all the linear combinations of two vectors.

•−u − v•−v

•u − v

•2u − v

•−u 0 u

u − 12v

•2u

•−u + v v12u + 1

2v

•u + v

•2u + v

•2v

•u + 2v

7


• Unit vectors

A vector v of length 1 is called a unit vector. The standard unit vectors are

i = 〈1, 0, 0〉, j = 〈0, 1, 0〉, k = 〈0, 0, 1〉.

Any vector v = 〈v1, v2, v3〉 can be written as a linear combination of the standard unit vectors as follows:

v = 〈v1, v2, v3〉= 〈v1, 0, 0〉 + 〈0, v2, 0〉 + 〈0, 0, v3〉= v1 〈1, 0, 0〉 + v2 〈0, 1, 0〉 + v3 〈0, 0, 1〉= v1i + v2j + v3k.

Whenever v = 0, its length ‖v‖ is not zero and‚‚‚‚ 1

‖v‖ v

‚‚‚‚ =1

‖v‖ ‖v‖ = 1.

That is, v/‖v‖ is a unit vector in the direction of v, called the direction of the nonzero vector v. Theprocess of rescaling a nonzero vector into a unit vector is often named the normalization of the vector.

� Example 1.2.2 (Unit vectors)

Find a unit vector u in the direction of the vector from P1(1, 0, 1) to P2(3, 2, 0).

Solution We divide−−−→P1P2 by its length.

−−−→P1P2 = (3 − 1) i + (2 − 0) j + (0 − 1)k = 2i + 2j − k,

‖−−−→P1P2‖ =p

(2)2 + (2)2 + (−1)2 =√

9 = 3,

u =

−−−→P1P2

‖−−−→P1P2‖=

2i + 2j − k

3=

2

3i +

2

3j − 1

3k.

The unit vector u is in the direction of−−−→P1P2. �

� Example 1.2.3 (Unit vectors)

If v = 3i − 4j is a velocity vector, express v as a product of its and a unit vector in the direction ofmotion.

Solution Speed is the magnitude (length) of v:

‖v‖ =p

(3)2 + (−4)2 =√

25 = 5.

The unit vector v/‖v‖ has the same direction as v:

v

‖v‖ =3i − 4j

5=

3

5i − 4

5j.

So,

v = 3i − 4j = 5

„3

5i − 4

5j

«| {z }a unit vector

.

8

1.3 The Dot Product

�

In summary, we can express any nonzero vector v in terms of its two important features, length and

direction, by writing v = ‖v‖ v

‖v‖ .

Theorem 1.2.1 (Normalizing a Vector) If v = 0, then

(1)v

‖v‖ is a unit vector in the direction of v.

(2) The equation v = ‖v‖ v

‖v‖ expresses v in terms of its length and direction.

1.3 The Dot Product

In this section, we will see how we can determine the angle between two vectors directly from their com-ponents. A key part of the calculation is an expression called the dot product. Dot products are alsocalled inner or scalar products because the product results in a scalar, not a vector. After introducing thedot product, we will examine its physical meaning such as to determine the work done by a constant forceacting through a displacement.

u

v

θ

When two nonzero vectors u and v are placed so their initial points coincide, they form an angle θ ofmeasure 0 � θ � π. If the vectors do not lie along the same line, the angle θ is measured in the planecontaining both of them. If they do lie along the same line, the angle between them is 0 if they point inthe same direction, and π if they point in opposite directions. The angle θ is the angle between u andv. The following gives a formula to determine this angle.

Theorem 1.3.1 (Angle Between Two Vectors) The angle θ between the two nonzero vectorsu = 〈u1, u2, u3〉 and v = 〈v1, v2, v3〉 is given by

u · v = ‖u‖ ‖v‖ cos θ,

or explicitly

θ = cos−1

„u1v1 + u2v2 + u3v3

‖u‖ ‖v‖«

.

9


In Theorem 1.3.1, we pay our attention to the expression u1v1 + u2v2 + u3v3 in the calculation of θ.

1.3.1 Definition The dot product u · v (read “u dot v”) of vectors u = 〈u1, u2, u3〉 andv = 〈v1, v2, v3〉 is

u · v = u1v1 + u2v2 + u3v3.

� Example 1.3.1 (The dot product)

(a) 〈1,−2, 3〉 · 〈−2,−3, 6〉 = (1)(−2) + (−2)(−3) + (3)(6) = −2 + 6 + 18 = 22.

(b)

„1

2i − 2j + 3k

«· (4i − 3j + 2k) = (

1

2)(4) + (−2)(−3) + (3)(2) = 14.

�

� Example 1.3.2 (The dot product)

Find the angle between u = i − 2j − 2k and v = 6i + 3j + 2k.

Solution We use the formula in Theorem 1.3.1:

u · v = (1)(6) + (−2)(3) + (−2)(2) = −4,

‖u‖ =p

(1)2 + (−2)2 + (−2)2 =√

9 = 3,

‖v‖ =p

(6)2 + (3)2 + (2)2 =√

49 = 7,

θ = cos−1

„u · v

‖u‖ ‖v‖«

= cos−1

„ −4

(3)(7)

«≈ 1.76 (radians).

�

• Work

Let us briefly discuss the physical interpretation of dot product. If a force F is applied to a particle movingalong a path, we often need to know the magnitude of the force in the direction of motion. If d is parallelto the tangent line to the path at the point where F is applied, then we want the magnitude of F in thedirection of d. The following figure shows that the scalar quantity we seek is the length ‖F‖ cos θ, whereθ is the angle between the two vectors F and d.

Length = ‖F‖ cos θ

F

θ d

10

1.3 The Dot Product

Notice that the magnitude of the force F in the direction of vector d is the length ‖F‖ cos θ of theprojection of F onto d.

Recall that the work done by a constant force of magnitude F in moving an object through a distanced is W = Fd. This formula holds only if the force is directed along the line of motion. Now if a forceF moving an object through a displacement d is in a particular direction, the work is performed by thecomponent of F in the direction of d. If θ is the angle between F and d, then

Work = (scalar component of F in the direction of d) (length of d)

= (‖F‖ cos θ) ‖d‖ = F · d.

In other words, we apply the dot product to finding the work done by a constant force acting through adisplacement. The work done is

W = F · d = ‖F‖ ‖d‖ cos θ.

• Perpendicular (orthogonal) vectors

Two nonzero vectors u and v are perpendicular or orthogonal if the angle between them is π/2.For such vectors, we have u · v = 0 because cos(π/2) = 0. The converse is also true. If u and v arenonzero vectors with u · v = ‖u‖ ‖v‖ cos θ = 0, then cos θ = 0 and hence θ = cos−1 0 = π/2.

Theorem 1.3.2 Vectors u and v are orthogonal (or perpendicular) if and only if u ·v = 0.

• Dot product properties

The dot product obeys many of the laws that hold for ordinary products of real numbers (scalars).

Theorem 1.3.3 (Properties of the Dot Product) If u, v, and w are any vectors and c is ascalar, then

(1) u · v = v · u.

(2) (cu) · v = u · (cv) = c(u · v).

(3) u · (v + w) = u · v + u · w.

(4) u · u = ‖u‖2.

(5) 0 · u = 0.

The properties are easy to prove using the definition. We skip the detail of any proofs here but students aresuggested to try the proofs of (1) and (3), for instance.

11


1.4 The Cross Product

In the 2-dimensional plane, when we want to describe the orientation a line, we may use the notions of“slope”. In the 3-dimensional space, we also need a way to describe how a plane is tilted. We accomplishthis by multiplying two vectors in the plane together to get a third vector perpendicular to the plane. Thedirection of this third vector tells us the “inclination” of the plane. The product we use to multiply thevectors together is the vector or cross product, the second of the two vector multiplication methods. Westudy the cross product in this section.

2

1

u × v

u

vθ

n

We start with two nonzero vectors u and v in space. If u and v are not parallel, they uniquelydetermine a plane. We select a unit vector n perpendicular to the plane by the right-hand rule. Thismeans that we choose n to be the unit (normal) vector that points the way your right thumb points whenyour fingers curl through the angle θ from u to v. Then the cross product u× v (read “u cross v”)is the vector defined as follows:

1.4.1 Definitionu × v = (‖u‖ ‖v‖ sin θ)n.

Unlike the dot product, the cross product is a vector. For this reason it is also called the vector productof u and v, and applies only to vectors in space. The vector u × v is orthogonal to both u and vbecause it is a scalar multiple of n.

There is a straightforward way to calculate the cross product of two vectors from their components. Themethod does not require that we know the angle between them (as suggested by the definition), but wepostpone that calculation at this moment so we can focus first on the properties of the cross product.

Since the sines of 0 and π are both zero, it makes sense to define the cross product of two parallelnonzero vectors to be 0. If one or both of u and v are zero, we also define u× v to be zero. This way,the cross product of two vectors u and v is zero if and only if u and v are parallel or one or both ofthem are zero.

Theorem 1.4.1 Nonzero vectors u and v are parallel if and only if u × v = 0.

12


The cross product obeys the following basic rules.

Theorem 1.4.2 (Properties of the Cross Product) If u, v, and w are any vectors and r, sare scalars, then

(1) (ru) × (sv) = (rs) (u × v).

(2) u × (v + w) = u × v + u ×w.

(3) (v + w) × u = v × u + w × u.

(4) v × u = −u × v.

(5) 0 × u = 0.

To visualize (4), for example, notice that when the fingers of a right hand curl through the angle θ from vto u, the thumb points the opposite way and the unit vector we choose in forming v × u is the negativeof the one we choose in forming u× v. Also remind that cross product is not associative so (u× v) ×wdoes not generally equal u × (v × w).

When we apply the definition to calculate the pairwise cross products of i, j, and k, we find

i × j = −(j× i) = k, k× i = −(i× k) = j,

j × k = −(k× j) = i, i× i = j × j = k × k = 0.

• Cross product as an area of a parallelogram

θ

h = ‖v‖ sin θ

u

v

Area = base · height = (‖u‖) (‖v‖ sin θ)

Because n is a unit vector, the magnitude of u × v is

Theorem 1.4.3‖u × v‖ = ‖u‖ ‖v‖ | sin θ| ‖n‖ = ‖u‖ ‖v‖ sin θ.

This is the area of the parallelogram determined by u and v, ‖u‖ being the base of the parallelogramand (‖v‖ sin θ) the height. Remark that sin θ is positive (and hence | sin θ| = sin θ) for 0 < θ < π.

13


• Determinant formula for cross product

We may also calculate the cross product u × v from the components of u and v in the Cartesiancoordinate system. For ease in calculating the cross product using determinants, we usually write vectors inthe form v = v1i + v2j + v3k rather than as ordered triples v = 〈v1, v2, v3〉. We have the following rule.

Theorem 1.4.4 (Calculating Cross Products Using Determinants) If u = u1i + u2j + u3kand v = v1i + v2j + v3k, then

u × v =

˛˛ i j ku1 u2 u3

v1 v2 v3

˛˛ .

� Example 1.4.1 (Cross product by determinant)

Find u × v and v × u if u = 2i + j + k and v = −4i + 3j + k.

Solution By the formula,

u × v =

˛˛ i j k

2 1 1−4 3 1

˛˛

=

˛˛1 13 1

˛˛ i −

˛˛ 2 1−4 1

˛˛ j +

˛˛ 2 1−4 3

˛˛ k

= −2i − 6j + 10k,

v × u = −(u × v) = 2i + 6j − 10k.

�

� Example 1.4.2 (Cross product by determinant)

Find the area of the triangle with vertices P (1,−1, 0), Q(2, 1,−1), and R(−1, 1, 2).

Solution The vector−−→PQ×−→

PR is perpendicular to the plane because it is perpendicular to both vectors.In terms of components,

−−→PQ = (2 − 1) i + (1 + 1) j + (−1 − 0)k = i + 2j − k,

−→PR = (−1 − 1) i + (1 + 1) j + (2 − 0)k = −2i + 2j + 2k.

Then

−−→PQ ×−→

PR =

˛˛ i j k

1 2 −1−2 2 2

˛˛

=

˛˛2 −12 2

˛˛ i −

˛˛ 1 −1−2 2

˛˛ j +

˛˛ 1 2−2 2

˛˛ k = 6i + 6k.

The area of the parallelogram determined by P , Q, and R is

‖−−→PQ ×−→PR‖ = ‖6i + 6k‖ =

p(6)2 + (6)2 =

√2 · 36 = 6

√2.

The triangle’s area is half of this, or 3√

2. �

14


• Triple scalar or box product

The product (u× v) ·w is called the triple scalar product of u, v, and w (in that order). As youcan see in the formula,

|(u × v) · w| = ‖u × v‖ ‖w‖ | cos θ|,the absolute value of the product is the volume of the parallelepiped (parallelogram-sided box) determinedby u, v, and w (see the following figure). The number ‖u × v‖ is the area of the base parallelogram.The number ‖w‖ | cos θ| is the parallelepiped’s height. Because of this geometry, (u×v) ·w is also calledthe box product of u, v, and w.

u × v

Height = ‖w‖ | cos θ|θ

w

v

uArea = ‖u × v‖

Volume= area of base · height= (‖u × v‖) (‖w‖ | cos θ|)

By treating the planes of v and w and of w and u as the base planes of the parallelepiped determinedby u, v, and w, we see that

(u × v) · w = (v ×w) · u = (w × u) · v.

Since the dot product is commutative, we also have

(u × v) · w = u · (v × w).

The triple scalar product can be evaluated as a determinant:

Theorem 1.4.5 (Calculating the Triple Scalar Products by Determinants)If u = u1i + u2j + u3k, v = v1i + v2j + v3k and w = w1i + w2j + w3k, then

(u × v) · w =

˛˛u1 u2 u3

v1 v2 v3

w1 w2 w3

˛˛ .

Three vectors in 3-space are said to be coplaner if the parallelepiped they span has zero volume; if theirtails coincide, three such vectors must lie in the same plane.

Theorem 1.4.6 (Three Coplanar Vectors)

u, v, and w are coplanar ⇐⇒ (u × v) · w = 0 ⇐⇒˛˛u1 u2 u3

v1 v2 v3

w1 w2 w3

˛˛ = 0.

15


� Example 1.4.3 (Triple scalar product)

Find the volume of the box (parallelepiped) determined by u = i+2j−k, v = −2i+3k, and w = 7j− 4k.

Solution Using the rule for calculating determinants, we find

(u × v) · w =

˛˛ 1 2 −1−2 0 30 7 −4

˛˛ = ˛˛0 3

7 −4

˛˛+ 2

˛˛2 −17 −4

˛˛ = −21 + 2(−8 + 7) = −23.

The volume is |(u × v) · w| = 23 units cubed. �

1.5 Equations of Lines

In this section we will take a look at the equation of a line in R3. As we saw in Example 1.1.2 (page 3) the

equation y = mx + b does not describe a line in R3, instead it describes a surface, particularly a plane.

This does not mean however that we cannot write down an equation for a line in three-dimensional space.To see how to do this let us think about what we need to write down the equation of a line in R

2. Intwo-dimensional space we need the slope (i.e., m) and a point that was on the line in order to write downthe equation.

In R3 that is still all that we need except in this case the “slope” will not be a simple number as it was in

the two dimensions. In this case we will need to acknowledge that a line can have a three-dimensional slope.So, we need something that will allow us to describe a direction that is potentially in three dimensions. Wealready have a quantity that will do this for us. Vectors give directions and are three-dimensional objects.

So, let us start with the following information. Suppose that we know a point that is on the line,P0(x0, y0, z0), and that v = 〈a, b, c〉 is a vector which is parallel to the line. Note that v is not necessaryto be on the line itself. We only need v to be parallel to the line. Finally, let P (x, y, z) be any point on theline. Now, since our “slope” is a vector let us also turn the the two points (i.e., P0 and P ) into vectorsas well. Of course, we don’t actually turn them into vectors, we instead use position vectors to representthem. So, let r0 and r be the position vectors for P0 and P , respectively. Also, for no apparent reason,

let us define a to be the vector with representation−−→P0P . We now have the following sketch with all these

vectors.

The Line

P0(x0, y0, z0) P (x, y, z)

v

r0 r

a

x

y

z

16


Now we can write r as a sum of two vectors by the parallelogram rule.

r = r0 + a.

Also notice that a and v are parallel. Therefore, there is a number t such that

a = tv.

We then have

r = r0 + tv = 〈x0, y0, z0〉 + t 〈a, b, c〉.This is called the vector form of the equation of a line. The only unknown in this equation is theparameter t. Notice that tv is a vector that lies along the line and it tells us how far from the originalpoint that we should move. If t is positive we move to the right of the original point and if t is negativewe move to the left of the original point. As t varies over all possible values we will completely span theline.

There are several other forms of the equation of the line. To get the first alternative form let us startwith the vector form and do a slight rewrite.

r = 〈x0, y0, z0〉 + t 〈a, b, c〉,〈x, y, z〉 = 〈x0 + at, y0 + bt, z0 + ct〉.

The only way for two vectors to be equal is for the components to be equal. In other words,

x = x0 + at,

y = y0 + bt,

z = z0 + ct.

This set of equations is called the parametric form of the equation of a line. To get a point on theline, all we need to do is to pick a t-value and plug it into either form of the line. In the vector form of theline we get a position vector for the point and in the parametric form we get the actual coordinates of thepoint.

There is one more form of the line that we want to look at. If we assume that a, b, and c are all nonzeronumbers we can solve each of the equations in the parametric form of the line for t. We can then set all ofthem equal to each other since t will be the same number in each. Doing this gives the following.

x − x0

a=

y − y0

b=

z − z0

c(= t).

This is called the symmetric equations of the line. If one of a, b, c does happen to be zero we canstill write down the symmetric equations. To see this let us suppose that b = 0. In this case t will notexist in the parametric equation for y and so we will only solve the parametric equations for x and z fort. We can set those equal and acknowledge the parametric equation for y as follows.

x − x0

a=

z − z0

c, y = y0.

� Example 1.5.1 (Equations of lines)

Write down the equation of the line that passes through the points (2,−1, 3) and (1, 4,−3). Write downall three forms of the equation of the line.

17


Solution To do this we need the vector v which is parallel to the line. This can be any vector as longas it is parallel to the line. In general, v will not lie on the line itself. However, in this case it will. All weneed to do is let v be the vector that starts at the second point and ends at the first point. Hence,

v = 〈2,−1, 3〉 − 〈1, 4,−3〉 = 〈1,−5, 6〉.Note that the order of the points was chosen to reduce the number of minus signs in the vector. Once wehave v there really is not anything else to do. To use the vector form we will need a point on the line.From the given information we have two points, so we can use either one. Let us use the first point. Thefollowing is the vector form of the equation of the line.

r = 〈2,−1, 3〉 + t 〈1,−5, 6〉 = 〈2 + t, −1 − 5t, 3 + 6t〉.Once we have this equation the other two forms follow. Here is the parametric equations of the line.

x = 2 + t, y = −1 − 5t, z = 3 + 6t.

Here is the symmetric form.x − 2

1=

y + 1

−5=

z − 3

6.

�

� Example 1.5.2 (Equations of lines)

Determine if the line that passes through the point (0,−3, 8) and is parallel to the line given by

x = 10 + 3t, y = 12t, and z = −3 − t

passes through the xz-plane. If it does give the coordinates of the point.

Solution To answer this we will first need to write down the equation of the line. We know a point onthe line and just need a parallel vector. We know that the new line must be parallel to the line given by theparametric equations in the problem statement. That means that any vector that is parallel to the givenline must also be parallel to the new line.

Now recall that in the parametric form of the line the numbers multiplied by t are the components ofthe vector that is parallel to the line. Therefore, the vector

v = 〈3, 12,−1〉is parallel to the given line and so must also be parallel to the new line. The equation of the new line is then

r = 〈0,−3, 8〉 + t 〈3, 12,−1〉 = 〈3t, −3 + 12t, 8 − t〉.If this line passes through the xz-plane then we know that the y-coordinate of that point must be zero. So,let us set the y-component of the equation equal to zero and see if we can solve for t. If we can, this willget the value of t for which the point will pass through the xz-plane.

−3 + 12t = 0 =⇒ t =1

4.

So, the line does pass through the xz-plane. To get the complete coordinates of the point all we need to dois plug t = 1/4 into any of the equations. We will use the vector form.

r = 〈3(1

4), −3 + 12(

1

4), 8 − (

1

4)〉 = 〈3

4, 0,

31

4〉.

Recall that this vector is the position vector for the point on the line and so the coordinates of the pointhere the line will pass through the xz-plane are given by

(3

4, 0,

31

4).

�

18


• The distance from a point to a line in space

To find the distance from a point S to a line that passes through a point P parallel to a vector v, we find

the absolute value of the scalar component of−→PS in the direction of a vector normal to the line (see the

following figure). In the notation of the figure, the absolute value of the scalar component is ‖−→PS‖ sin θ,

which is ‖−→PS × v‖/‖v‖.

v

‖−→PS‖ sin θ = d (the distance we want)

•S

•P θ The Line defined by P and v

Theorem 1.5.1 The distance from a point S to a line through P parallel to v is given by

d =‖−→PS × v‖

‖v‖ .

Just recall the definition of cross product in Section 1.4 (page 12) that

−→PS × v = (‖−→PS‖ ‖v‖ sin θ)n =⇒ ‖−→PS × v‖ = ‖−→PS‖ ‖v‖ | sin θ|.

� Example 1.5.3 (Distance from a point to a line)

Find the distance from the point S(1, 1, 5) to the line

x = 1 + t, y = 3 − t, z = 2t.

Solution We see from the equations that the line passes through P (1, 3, 0) parallel to v = i − j + 2k.With −→

PS = (1 − 1) i + (1 − 3) j + (5 − 0)k = −2j + 5k

and

−→PS × v =

˛˛i j k0 −2 51 −1 2

˛˛ = i + 5j + 2k.

Thus,

d =‖−→PS × v‖

‖v‖ =

√1 + 25 + 4√1 + 1 + 4

=

√30√6

=√

5.

�

19


1.6 Equations of Planes

In the first section of this chapter we saw some equations of planes. However, none of those equations hasthree variables in them and are basically the extension of graphs in two dimensions. We would like have amore general equation for planes.

Let us start by assuming that we know a point on the plane, P0(x0, y0, z0). In addition, we assumethat we have a vector that is orthogonal (perpendicular) to the plane, n = 〈a, b, c〉. This vector is calledthe normal vector. For an arbitrary point P = (x, y, z) on this plane, we can find the position vectorsr0 and r of P0 and P , respectively. Here is the sketch of all these vectors.

The Plane

P (x, y, z)

P0(x0, y0, z0)

r

r0

n

r − r0

x

y

z

Notice that the vector r − r0 is also lying on the plane. Also notice that we put the normal vector n onthe plane, but it is not necessary to be the case and we put it here just for illustration. It is possible that thenormal vector does not touch the plane. Now, because n is orthogonal to the plane, it is also orthogonalto any vector that lies in the plane. In particular it is orthogonal to r − r0. Recall that two orthogonalvectors will have a dot product of zero. In other words,

n · (r − r0) = 0 =⇒ n · r = n · r0.

This is called the vector equation of the plane.

The vector equation of the plane is not a very useful equation in some ways. Let us get a much moreuseful form of the equations. Let us start with the first form of the vector equation.

〈a, b, c〉 · (〈x, y, z〉 − 〈x0, y0, z0〉) = 0,

〈a, b, c〉 · 〈x − x0, y − y0, z − z0〉 = 0.

Now, actually compute the dot product,

a(x − x0) + b(y − y0) + c(z − z0) = 0.

This is called the scalar equation of plane. Often this will be written as

ax + by + cz = d,

20


where d = ax0 + by0 + cz0. This second form is often how we are given equations of planes. Notice that ifwe are given the equation of a plane in this form we can quickly get a normal vector for the plane. A normalvector is

n = 〈a, b, c〉.

Let us work a couple of examples.

� Example 1.6.1 (Equations of planes)

Determine the equation of the plane that contains the points P = (1,−2, 0), Q = (3, 1, 4), and R =(0,−1, 2).

Solution In order to write down the equation of plane we need a point (yes, we have three here) and anormal vector. We need to find a normal vector. Recall that we saw how to do this in Section 1.4. We canform the following two vectors from the given points.

−−→PQ = 〈2, 3, 4〉, −→

PR = 〈−1, 1, 2〉.

These two vectors lie completely in the plane since we formed them from points that were in the plane.Notice that there are many possible vectors for use, we just chose two of the possibilities.

Now we know that the cross product of two vectors is orthogonal to both of these vectors. Since both ofthese are in the plane any vector that is orthogonal to both of these will also be orthogonal to the plane.Therefore, we can use the cross product as the normal vector.

n =−−→PQ ×−→

PR =

˛˛ i j k

2 3 4−1 1 2

˛˛ = 2i − 8j + 5k.

The equation of the plane is then 2(x − 1) − 8(y + 2) + 5(z − 0) = 0, or

2x − 8y + 5z = 18.

We used P for the point, but could have used any of the three points. �

� Example 1.6.2 (Equations of planes)

Determine if the plane given by −x+2z = 10 and the line given by r = 〈5, 2− t, 10+4t〉 are orthogonal,parallel or neither.

Solution This is not a problem as difficult as it first appeared to be. We can pick off a vector that isnormal to the plane. This is n = 〈−1, 0, 2〉. We can also get a vector that is parallel to the line. This isv = 〈0,−1, 4〉.

Now, if these two vectors are parallel then the line and the plane will be orthogonal. If you think aboutit this makes some sense. If n and v are parallel, then v is orthogonal to the plane, but v is also parallelto the line. So, if the two vectors are parallel the line and plane will be orthogonal. Let us consider thefollowing.

n × v =

˛˛ i j k−1 0 20 −1 4

˛˛ = 2i + 4j + k = 0.

So, the vectors are not parallel and so the plane and the line are not orthogonal.

21


Now, let us check to see if the plane and line are parallel. If the line is parallel to the plane then anyvector parallel to the line will be orthogonal to the normal vector of the plane. In other words, if n andv are orthogonal then the line and the plane will be parallel. Let us check this.

n · v = 0 + 0 + 8 = 8 = 0.

The two vectors are not orthogonal and so the line and plane are not parallel.

So, the line and the plane are neither orthogonal nor parallel. �

• Line of intersection

Just as lines are parallel if and only if they have the same direction, two planes are parallel if and only iftheir normals are parallel, or n1 = kn2 for some scalar k. Two planes that are not parallel intersect in aline.

Plane 1

Plane 2

The Lineof Intersection

n1 × n2

n2

n1

� Example 1.6.3 (Line of intersection)

Find a vector parallel to the line of intersection of the planes 3x − 6y − 2z = 15 and 2x + y − 2z = 5.

Solution The line of intersection of two planes is perpendicular to both planes’ normal vectors n1 andn2 and therefore parallel to n1 ×n2. Turning this around, n1 ×n2 is a vector parallel to the planes’ lineof intersection. In our case,

n1 × n2 =

˛˛i j k3 −6 −22 1 −2

˛˛ = 14i + 2j + 15k.

Any nonzero scalar multiple of n1 × n2 will do as well. �


Find parametric equations for the line in which the planes 3x−6y−2z = 15 and 2x+y−2z = 5 intersect.

22


Solution We find a vector parallel to the line and a point on the line. Example 1.6.3 identifies v =14i + 2j + 15k as a vector parallel to the line. To find a point on the line, we can take any point commonto the two planes. Substituting z = 0 in the plane equations and solving for x and y simultaneouslyidentifies one of these points as (3,−1, 0). The line is

x = 3 + 14t, y = −1 + 2t, z = 15t.

The choice z = 0 is arbitrary and we could have chosen z = 1 or z = −1 just as well. Or we could havelet x = 0 and solved for y and z. The different choices would simply give different parametrizations ofthe same line. �


Find the point where the line

x =8

3+ 2t, y = −2t, z = 1 + t

intersects the plane 3x + 2y + 6z = 6.

Solution The point

(8

3+ 2t, −2t, 1 + t)

lies in the plane if its coordinates satisfy the equation of the plane, that is, if

3(8

3+ 2t) + 2(−2t) + 6(1 + t) = 6,

8 + 6t − 4t + 6 + 6t = 6,

8t = −8,

t = −1.

The point of intersection is

(8

3+ 2(−1), −2(−1), 1 + (−1)) = (

2

3, 2, 0).

�

• The distance from a point to a plane

If P is a point on a plane with normal n, then the distance from any point S to the plane is the length

of the vector projection of−→PS onto n. That is, the distance from S to the plane is

d =

˛˛−→PS · n

‖n‖˛˛,

where n = Ai + Bj + Ck is normal to the plane.


Find the distance from S(1, 1, 3) to the plane 3x + 2y + 6z = 6.

23


Solution We find a point P in the plane and calculate the length of the vector projection of−→PS onto a

vector n normal to the plane (see the following figure). The coefficients in the equation 3x + 2y + 6z = 6give

n = 3i + 2j + 6k.

x

y

z

3x + 2y + 6z = 6

The Plane

Distance fromS to the plane

n = 3i + 2j + 6k

•(2, 0, 0)

•(0, 0, 1)

S(1, 1, 3)

P (0, 3, 0)

The points on the plane easiest to find from the plane’s equation are the intercepts. If we take P to be they-intercept (0, 3, 0), then

−→PS = (1 − 0) i + (1 − 3) j + (3 − 0)k

= i− 2j + 3k,

‖n‖ =p

(3)2 + (2)2 + (6)2 =√

49 = 7.

The distance from S to the plane is the length of projn−→PS, or

d =

˛˛−→PS · n

‖n‖˛˛

=

˛˛(i − 2j + 3k) · (3

7i +

2

7j +

6

7k)

˛˛

=

˛˛37 − 4

7+

18

7

˛˛ =

17

7.

�

• Angles between planes

The angles between two intersecting planes is defined to be the (acute) angle between their normal vectors.

� Example 1.6.7 (Angle between planes)

Find the angle between the planes 3x − 6y − 2z = 15 and 2x + y − 2z = 5.

Solution The vectors

n1 = 3i − 6j − 2k, n2 = 2i + j − 2k

24

1.7 Quadric Surfaces

are normals to the planes. The angle between them is

θ = cos−1

„n1 · n2

‖n1‖ ‖n2‖«

= cos−1

„4

21

«≈ 13.8 radians.

�


In the previous two sections we have looked at lines and planes in three dimensions (or R3) and while

these are used quite heavily at times in calculus class there are many other surfaces that are also used fairlyregularly and so we need to take a look at those.

In this section we are going to be looking at quadric surfaces. Quadric surfaces are the graphs of anyequation that can be put into the general form

Ax2 + By2 + Cz2 + Dxy + Exz + Fyz + Gx + Hy + Iz + J = 0,

where A, B, · · · , J are all constants. There is no way that we can possibly list all of them, but there aresome standard equations so here is a list of some of the more common quadric surfaces.

• Ellipsoid

Here is the general equation of an ellipsoid,

x2

a2+

y2

b2+

z2

c2= 1.

The following is the sketch of a typical ellipsoid.

x

y

z

If a = b = c then we will have a sphere.

Notice that we only gave the equation for the ellipsoid that has been centered on the origin. Clearlyellipsoids don’t have to be centered on the origin. However, in order to make the discussion in this section alittle easier we have chosen to concentrate on surfaces that are centered on the origin in one way or another.

25


• Cone

Here is the general equation of a cone,

x2

a2+

y2

b2=

z2

c2.

The following is the sketch of a typical cone.

x

y

z

Note that this is the equation of a cone that will open along the z-axis. To get the equation of a cone thatopens along one of the other axes all we need to do is make a slight modification of the equation. This willbe the case for the rest of the surfaces that we will be looking at in this section as well.

In the case of a cone the variable that sits by itself on one side of the equal sign will determine the axisthat the cone opens up along. For instance, a cone that opens up along the x-axis will have the equation

y2

b2+

z2

c2=

x2

a2.

For most of the following surfaces we will not give the other possible formulae. We will however acknowledgehow each formula needs to be changed to get a change of orientation for the surface.

• Cylinder

Here is the general equation of a cylinder,

x2 + y2 = r2.

The following is the sketch of a typical cylinder.

x

y

z

26


The cylinder will be centered on the axis corresponding to the variable that does not appear in the equation.Be careful to not confuse this with a circle. In two dimensions it is a circle, but in three dimensions it is acylinder.

• Hyperboloid of one sheet

Here is the general equation of a hyperboloid of one sheet,

x2

a2+

y2

b2− z2

c2= 1.

The following is the sketch of a typical hyperboloid of one sheet.

x

y

z

The variable with the negative in front of it will give the axis along which the graph is centered.

• Hyperboloid of two sheets

Here is the general equation of a hyperboloid of two sheets,

−x2

a2− y2

b2+

z2

c2= 1.

The following is the sketch of a typical hyperboloid of two sheets.

x

y

z

27


The variable with the positive in front of it will give the axis along which the graph is centered. Notice thatthe only difference between the hyperboloid of one sheet and the hyperboloid of two sheets is the signs infront of the variables. They are exactly the opposite signs.

• Elliptic paraboloid

Here is the general equation of an elliptic paraboloid,

x2

a2+

y2

b2=

z

c.

The following is the sketch of a typical elliptic paraboloid.

x

y

z

In this case the variable that is not squared determines the axis upon which the paraboloid opens up. Also,the sign of c will determine the direction that the paraboloid opens. If c is positive then it opens up andif c is negative then it opens down.

• Hyperbolic paraboloid

Here is the general equation of an hyperbolic paraboloid,

x2

a2− y2

b2=

z

c.

The following is the sketch of a typical hyperbolic paraboloid.

x

y

z

As with the elliptic paraboloid the sign of c will determine the direction in which the surface “opens up”.The graph above is shown for c positive.

With the both of the paraboloids the surface can be easily moved up (resp. down) by adding (resp.subtracting) a constant from the left side. For instance, z = −x2 − y2 + 6 is an elliptic paraboloid thatopens downward and starts at z = 6 instead of z = 0. Can you make a sketch of it?

28

Chapter 2

Vector-Valued Functions

2.1 Vector Functions

We have seen some examples of graphing surfaces in R3. However, as we saw with lines, not every graph in

R3 needs to be a surface. We can graph curves (sometimes called space curves) that are three-dimensional

as well. To do this we may use vector-valued functions or shortly vector functions. Actually, wecan also use vector functions to represent surfaces as well. However, in this section we will focus on talkingabout curves instead of surfaces. The vector form of the equation for a line is a good example of a vectorfunction. For instance,

r(t) = r0 + tv.

Vector functions take real numbers (or scalars) as arguments, t in this case, and return vectors that arethe position vectors for points on the curve (or surface).

The general form of a three-dimensional vector function for a curve is

r(t) = 〈f(t), g(t), h(t)〉, or r(t) = f(t) i + g(t) j + h(t)k,

where f(t), g(t), and h(t) are sometimes called the component functions. In the following figure the

moving point P (x, y, z) = (f(t), g(t), h(t)) (with r(t) =−−→OP being its position vector) makes up the curve

in space. We say that the equations x = f(t), y = g(t), z = h(t) parametrize the curve. Each fixedvalue of t will determine one point on the curve. Here r(t) is a vector function whereas the componentsof r are scalar functions of t.

x

y

z

rP (f(t), g(t), h(t))

O

The Curve

29

2. Vector-Valued Functions

The domain of a vector function is the set of all t for which all the component functions are defined.

� Example 2.1.1 (Vector functions)

Determine the domain of the following vector function

r(t) = 〈cos t, ln(4 − t),√

t + 1〉.

Solution The first component is defined for all t. The second component is only defined for t < 4. Thethird component is only defined for t � −1. Putting all of these together gives the following domain

[−1, 4).

This is the largest possible interval for which all three component functions are defined. �

We now need to think about how to get the graph of a space curve from a vector function. There aretwo ways to do this. The first is to think of the graph as the set of points (x, y, z) where

x = f(t), y = g(t), z = h(t).

Note that these are also the parametric equations for the curve. We have seen parametric equationsbefore and the only difference is that we are now working with them in three dimensions instead of twodimensions that we used last time.

The second way to interpret the graph is to think of r(t) = 〈f(t), g(t), h(t)〉 as the position vector ofthe point (f(t), g(t), h(t)).

Either these two ways of thinking about the graph will work and useful. We will mostly use the first wayof thinking of the graph of a vector function. Let us take a look at a couple of graphs of vector functions.

� Example 2.1.2 (Graph of vector functions)

Sketch the graph of the following vector function

r(t) = 〈2 − 4t, 1 + 2t, −3 − t〉.

Solution Notice that the graph of the given vector function is nothing but just a line. It might be betterif we rewrite it a little,

r(t) = 〈2, 1,−3〉 + t 〈−4, 2,−1〉.In this form we can see that this is the equation of a line that passes through the point (2, 1,−3) and isparallel to the vector v = 〈−4, 2,−1〉.

To graph this line all that we need to do is to plot the point and then sketch in the parallel vector. Inorder to get the sketch we assume that the vector is on the line and will start at the point in the line. Tosketch in the line all we need to do is to extend the parallel vector into a line.

Here is the sketch.

30

2.1 Vector Functions

x

y

z

v = 〈−4, 2,−1〉

•

(2, 1,−3)

The Line

�



r(t) = 〈2 cos t, 2 sin t, 5〉.

Solution In order to sketch the graph let us first get the parametric equations for the curve.

x = 2 cos t, y = 2 sin t, z = 5.

If we ignore the equation for z we will recall that the parametric equations for x and y will give acircular cylinder of radius 2 with axis along the z-axis in R

3. Now, all the parametric equations here tellus that no matter what is going on in the graph all the z-coordinates must be 5. So, we get a circle ofradius 2 centered on the z-axis and at the level of z = 5. The following is a sketch.

x

y

z

5 The circlewith radius 2

31


�

Note that it is very easy to modify the above vector function to get a circle centered on the x- or y-axisas well. For instance,

r(t) = 〈10 sin t, −3, 10 cos t〉will be a circle of radius 10 centered on the y-axis and at y = −3. In other words, as long as two ofthe terms are sine or cosine (with the same coefficient) and the other is a fixed number then we will have acircle that is centered on the axis that is given by the fixed number.



r(t) = 〈4 cos t, 4 sin t, t〉.Solution If this one has a constant in the z component we will have another circle. However, in thiscase we don’t have a constant. Instead we have a t and that will change the curve. However, because thex and y component functions are still a circle in parametric equations our curve should have a circularnature to it in some way. In fact, the only change in the z component and as t increases the z coordinatewill increase. Also, as t increases the x and y coordinates will continue to form a circle centered on thez-axis. Putting these two ideas together tells us that as we increase t the circle that is being traced out inthe x and y directions should also be rising. With its spiral-like graph the curve is named a helix.

The helix

-4 -20

24

-4-2

02

4

0

5

10

15

20

-20

24

�

As with circles the component that has the t will determine the axis that the helix rotates about. Forinstance, the vector function

r(t) = 〈t, 6 cos t, 6 sin t〉represents a helix that rotates around the x-axis. Also note that if we allow the coefficients on the sine andcosine for both the circle and helix to be different we will get ellipses. For example,

r(t) = 〈9 cos t, t, 2 sin t〉will be a helix that rotates about the y-axis and is in the shape of an ellipse.

32

2.2 Calculus with Vector Functions

� Example 2.1.5 (Vector equation for a line segment)

Determine the vector equation for the line segment starting at the point P = (x1, y1, z1) and ending at thepoint Q = (x2, y2, z2).

Solution It is important to note here that we only want to find the equation of the line segment thatstarts at P and ends at Q. We don’t want any other portion of the line and we do want the direction ofthe line segment preserved as we increase t. Let us not worry about that and just find the vector equationof the line that passes through the two points. Once we have this we will be able to get what we are after.So, we need a point on the line. Two points are given by the question and we will use P . We need a vectorthat is parallel to the line and since we have two points we can find the vector between them. This vectorwill lie on the line and hence parallel to the line. Also, remember that we want to preserve the starting andending points of the line segment so let us construct the vector using the same “orientation”.

v = 〈x2 − x1, y2 − y1, z2 − z1〉.Using this vector and the point P we get the following vector equation of the line.

r(t) = 〈x1, y1, z1〉 + t 〈x2 − x1, y2 − y1, z2 − z1〉.While this is the vector equation of the line, let us rewrite the equation slightly.

r(t) = 〈x1, y1, z1〉 + t 〈x2, y2, z2〉 − t 〈x1, y1, z1〉= (1 − t) 〈x1, y1, z1〉 + t 〈x2, y2, z2〉.

This is the equation of the line that contains the points P and Q. We of course just want the line segmentthat starts at P and ends at Q. We can get this by simply restricting the values of t. Notice that

r(0) = 〈x1, y1, z1〉, r(1) = 〈x2, y2, z2〉.So, if we restrict t to be between zero and one we will cover the line segment and we will start and end atthe corresponding point. So the vector equation of the line segment that starts at P = (x1, y1, z1) andends at Q = (x2, y2, z2) is

r(t) = (1 − t) 〈x1, y1, z1〉 + t 〈x2, y2, z2〉, 0 � t � 1.

�

As noted at the beginning of this section we can also use vector functions for surfaces as well. So, to makesure that we don’t forget that let us work an example with that as well.


In this section we need to discuss briefly about limits, derivatives and integrals of vector functions. As youwill see, these behave in a fairly predicatable manner. We will be doing all of the work in R

3 but we cannaturally extend the results in this section to R

n (i.e., n-dimensional space).

• Limits. Let us start with limits. Here is the limit of a vector function.

limt→a

r(t) = limt→a

〈f(t), g(t), h(t)〉

= 〈limt→a

f(t), limt→a

g(t), limt→a

h(t)〉 or limt→a

f(t) i + limt→a

g(t) j + limt→a

h(t)k.

So all we need to do is take the limits of each of the component functions and leave it as a vector.

33


� Example 2.2.1 (Limit of vector functions)

Compute limt→1

r(t) where r(t) = 〈t3, sin(3t − 3)

t − 1, e2t〉.

Solution Taking the limit of the vector function is equivalent to taking the limits of the three componentscalar functions.

limt→1

r(t) = 〈limt→1

t3, limt→1

sin(3t − 3)

t − 1, lim

t→1e2t〉

= 〈limt→1

t3, limt→1

3 cos(3t − 3)

1, lim

t→1e2t〉 = 〈1, 3, e2〉.

Notice that we used l’Hopital’s Rule on finding the limit of the second component function. �

• Continuity. We define continuity for vector functions the same way we define continuity for scalarfunctions. A vector function r(t) is continuous at a point t = t0 in its domain if

limt→t0

r(t) = r(t0).

The function is continuous if it is continuous at every point in its domain. Particularly, r(t) is continuousat t = t0 if and only if each component function is continuous there.

• Derivatives. Now let us take care of derivatives and after seeing how limits work it should not besurprising that we have the following for derivatives.

r′(t) = 〈f ′(t), g′(t), h′(t)〉 = f ′(t) i + g′(t) j + h′(t)k.

� Example 2.2.2 (Derivative of vector functions)

Compute r′(t) where r(t) = t6 i + sin 2t j − ln(t + 1) k.

Solution The calculations are nothing but just taking the derivatives of each component independently.

r′(t) = 6t5 i + 2 cos 2t j − 1

t + 1k.

�

1.d

dt(u + v) = u′ + v′. 4.

d

dt(u · v) = u′ · v + u · v′.

2.d

dt(cu) = cu′. 5.

d

dt(u × v) = u′ × v + u× v′.

3.d

dt(f(t)u(t)) = f ′(t)u(t) + f(t)u′(t). 6.

d

dt(u(f(t))) = u′(f(t)) f ′(t).

Most of the basic facts that we know about derivatives still hold however, just to make it clear here are somefacts about derivatives of vector functions. There is also one quick definition that we should get out of theway so that we can use it when we need to.

34


A smooth curve is any curve for which r′(t) is continuous and r′(t) = 0 for any t. A helix is asmooth curve, for example.

• Integrations. Finally, we need to discuss integrals of vector functions. Using both limits andderivatives as a guide it should be natural that we also have the following for indefinite integrals:

Zr(t) dt = 〈

Zf(t) dt,

Zg(t)dt,

Zh(t) dt〉 + c,

Zr(t) dt =

Zf(t) dt i +

Zg(t) dt j +

Zh(t) dtk + c

and the following for the definite integrals:

Z b

a

r(t) dt = 〈Z b

a

f(t) dt,

Z b

a

g(t) dt,

Z b

a

h(t) dt〉,

Z b

a

r(t) dt =

Z b

a

f(t) dt i +

Z b

a

g(t)dt j +

Z b

a

h(t) dtk.

With the indefinite integrals we put in a constant of integration to make sure that it was clear that theconstant in this case needs to be a vector instead of a regular constant.

Also, for the definite integrals we will sometimes write it as follows,Z b

a

r(t) dt =

„〈Z

f(t) dt,

Zg(t) dt,

Zh(t) dt〉

«˛˛ba

,

Z b

a

r(t) dt =

„Zf(t) dt i +

Zg(t)dt j +

Zh(t) dtk

«˛˛ba

.

In other words, we will do the indefinite integral and then do the evaluation of the vector as a whole insteadof on a component-by-component basis.

� Example 2.2.3 (Indefinite integration of vector functions)

Compute

Zr(t) dt for r(t) = 〈sin t, 6, 4t〉.

Solution All we need to do is integrate each of the components and we are done.Zr(t) dt = 〈− cos t, 6t, 2t2〉 + c,

where c is the constant (vector) of integration. �

� Example 2.2.4 (Definite integration of vector functions)

Compute

Z 1

0

r(t) dt for r(t) = 〈sin t, 6, 4t〉.

35


Solution In this case all that we need to do is apply the result from the previous example and then dothe evaluation. Z 1

0

r(t) dt =`〈− cos t, 6t, 2t2〉´˛1

0

= 〈− cos 1, 6, 2〉 − 〈−1, 0, 0〉 = 〈1 − cos 1, 6, 2〉.�

2.3 Arc Length in Space

In this section we will focus on an old formula with vector functions. We want to determine the length of avector function

r(t) = 〈f(t), g(t), h(t)〉,on the interval a � t � b. We actually know how to do this. Recall that we can write the vector functioninto the parametric form

x = f(t), y = g(t), z = h(t).

Also, recall that with two-dimensional parametric curves the arc length is given by

L =

Z b

a

p[f ′(t)]2 + [g′(t)]2 dt.

There is a natural extension of this into three dimensions. So, the length of the curve r(t) on the intervala � t � b is

L =

Z b

a

p[f ′(t)]2 + [g′(t)]2 + [h′(t)]2 dt.

There is a nice simplification that we can make for this. Notice that the integrand (the function we areintegrating) is nothing more than the magnitude of the tangent vector, i.e.,

‖r′(t)‖ =p

[f ′(t)]2 + [g′(t)]2 + [h′(t)]2.

Therefore, the arc length of the curve can be written as

L =

Z b

a

‖r′(t)‖dt.

� Example 2.3.1 (Arc length)

Determine the length of the curve r(t) = 〈2t, 3 sin 2t, 3 cos 2t〉 on the interval 0 � t � 2π.

Solution We need the tangent vector and its magnitude.

r′(t) = 〈2, 6 cos 2t, −6 sin 2t〉,

‖r′(t)‖ =p

4 + 36 cos2 2t + 36 sin2 2t =√

4 + 36 = 2√

10.

The arc length is then

L =

Z b

a

‖r′(t)‖dt =

Z 2π

0

2√

10 dt = 4π√

10.

36

2.3 Arc Length in Space

�

We need to take a quick look at another concept here. We define the arc length function as

s(t) =

Z t

0

‖r′(u)‖ du.

Before we look at why this might be important let us work a quick example.

� Example 2.3.2 (Arc length function)

Determine the arc length function for

r(t) = 〈2t, 3 sin 2t, 3 cos 2t〉.

Solution From the previous example we know that

‖r′(t)‖ = 2√

10.

The arc length function is then

s(t) =

Z t

0

2√

10 du =h2√

10uit

0= 2

√10 t.

�

• Finding arc length parametrizations. Now the question is why would we want to do this? Let ustake the result of the example above and solve it for t.

t =s

2√

10.

Taking this and plugging it into the original vector function and we can reparametrize the function intothe form r(t(s)). For our function this is

r(t(s)) = 〈 s√10

, 3 sins√10

, 3 coss√10

〉.

So, why would we want to do this? What is the reason of considering the problem of finding an arc lengthparametrization of a vector function that is expressed initially in terms of some other parameter t? Thereason is: with the reparametrization we can now tell where we are on the curve after we have traveled adistance of s along the curve. Note as well that we will start the measurement of distance from where weare at t = 0.

� Example 2.3.3 (Reparametrization of vector function)

Where on the curve

r(t) = 〈2t, 3 sin 2t, 3 cos 2t〉

are we after traveling for a distance ofπ√

10

3.

37


Solution To determine this we need the reparametrization, which we have from above

r(t(s)) = 〈 s√10

, 3 sins√10

, 3 coss√10

〉.

Then, to determine where we are all that we need to do is plug in s =π√

10

3into this and we will get our

location.

r

„t(

π√

10

3)

«= 〈π

3, 3 sin

π

3, 3 cos

π

3〉 = 〈π

3,

3√

3

2,

3

2〉.

So, after traveling a distance ofπ√

10

3along the curve we are at the point (

π

3,

3√

3

2,

3

2) in space. �

Because arc length parameters for a curve are intimately related to the geometric characteristics of the curve,arc length parametrizations have properties that are not enjoyed by other parametrizations. For example,there is a theorem stating that if a smooth function curve is represented parametrically using arc lengthparameter, then the tangent vectors all have length 1, i.e., ‖dr/ds‖ = 1.

38

Chapter 3

Partial Derivatives

The notation y = f(x) is used to indicate that the variable y depends on the single independent variablex, that is, that y is a function of x. In fact, many functions depend on more than one independent variable.For instance, the volume of a circular cone is a function V = (1/3)πr2h of its radius and its height, so itis a function V (r, h) of two variables. In this chapter we extend the basic ideas of single variable calculusto functions of several variables.

The calculus of several variables is basically single variable calculus applied to several variables onceat a time. When we hold all but one of the independent variables of a function constant and differentiatewith respect to that one variable, we get a “partial” derivative. Section 3.3 (page 46) will show how partialderivatives are defined and interpreted geometrically, and how to calculate them by applying the rules fordifferentiating functions of a single variable. Despite the fact that this chapter is about derivatives wewould like to first develop the fundamentals and to introduce the basic concepts on limits and continuity offunctions of several variables.

3.1 Functions of Several Variables

In this beginning section we first define functions of more than one independent variable and discuss theirgeometric representations.

Real-valued functions of several independent variables are defined similarly to functions in the singlevariable case. By analogy with the corresponding definition for functions of single variable, we define afunction of n variables as follows:

3.1.1 Definition Suppose D is a set of n-tuples of real numbers (x1, x2, · · · , xn). A real-valued functionf on D is a rule that assigns a unique (single) real number

w = f(x1, x2, · · · , xn)

to each element in D. The set D is the function’s domain. The set of w-values taken on by f is thefunction’s range. The symbol w is the dependent variable of f , and f is said to be a function of then independent variables x1 to xn. We also call the xj’s the function’s input variables and call w thefunction’s output variable.

Most of the examples we consider hereafter will be functions of two or three independent variables. Whena function f depends on two variables, we will usually call these independent variables x and y, and we willuse z to denote the dependent variable that represents the value of the function; that is, z = f(x, y). Wewill normally use x, y, and z as the independent variables of a function of three variables and w as the valueof the function: w = f(x, y, z). Some definitions will be given, and some theorems will be stated only forthe two-variable case, but extensions to three or more variables will usually be natural and obvious.

39

3. Partial Derivatives

• Natural domain

As we stated in the previous definition that the independent variables of a function of two or more variablesmay be restricted to lie in some set D, which we call the domain of f . Sometimes the domain will bedetermined by physical restrictions on the variables (for instance, the time t must be non-negative). If thefunction is defined by a formula and if there are no physical restrictions or other restrictions stated explicitly,then it is understood that the domain consists of all points for which the formula yields a real value for thedependent variable. We call this the natural domain of the function.

� Example 3.1.1 (Natural domain)

Sketch the natural domain of the function f(x, y) = ln(x2 − y).

Solution The function ln(x2 − y) is defined only when

x2 − y > 0 or y < x2.

We first sketch the parabola y = x2 as a “dashed” curve.

x

y

y = x2

The region y < x2 then consists of all points below this curve. Remark that the dashed boundary does notbelong to the domain. �

� Example 3.1.2 (Natural domain)

Letf(x, y, z) =

p1 − x2 − y2 − z2.

Find f(0, 12, 1

2) and the natural domain of f .

Solution By substitution,

f(0,1

2,1

2) =

r1 − (0)2 − (

1

2)2 − (

1

2)2 =

r1

2.

Because of the square root sign, we must have 1 − x2 − y2 − z2 � 0 in order to have a real value forf(x, y, z). Rewriting this inequality in the form

x2 + y2 + z2 � 1

we see that the natural domain of f consists of all points on or within the sphere

x2 + y2 + z2 = 1.

�

40

3.1 Functions of Several Variables

• Graphical representations

The graph of a function f of one variable (i.e., the graph of the equation y = f(x)) is the set of pointsin the xy-plane having coordinates (x, f(x)), where x is in the domain of f . Similarly, the graph of afunction of two variables (i.e., the graph of the equation z = f(x, y)) is the set of points in 3-space havingcoordinates (x, y, f(x, y)), where (x, y) belongs to the domain of f . This graph is a surface in R

3 lyingabove (if f(x, y) > 0) or below (if f(x, y) < 0) the domain of f in the xy-plane. The graph of a functionof three variables is a three-dimensional hypersurface in 4-space, R

4. In general, the graph of a function ofn variables is an n-dimensional surface in R

n+1. However, we will not attempt to draw graphs of functionsof more than two variables!

� Example 3.1.3 (Surface as a plane)

Consider the function

f(x, y) = 4“1 − x

2− y

3

”, 0 � x � 2, 0 � y � 4 − 2x.

The graph of f is the plane triangular surface with vertices at (2, 0, 0), (0, 3, 0), and (0, 0, 4). If thedomain of f had not been explicitly stated to be a particular set in the xy-plane, the graph would have beenthe whole plane through these three points. �

� Example 3.1.4 (Surface as a shell)

Consider the functionf(x, y) =

p9 − x2 − y2.

The expression inside the square root cannot be negative, so the domain is the disk x2 + y2 � 9 in the xy-plane. If we square the equation z =

p9 − x2 − y2, we can rewrite the result in the form x2 +y2 +z2 = 9.

This is a spherical shell of radius 3 centered at the origin. However, the graph of f is only the upperhemisphere where z � 0. �

Quite often it is difficult to sketch the surface z = f(x, y) onto a two-dimensional paper without considerableartistic talent and training. Nevertheless, you should always try to visualize such a graph and sketch it asbest you can. Sometimes it is convenient to sketch only part of a graph, for instance, the part lying in thefirst octant. It is also helpful to determine (and sketch) the intersections of the graph with various planes,especially the coordinate planes, and planes parallel to the coordinate planes.

Another way to represent the function f(x, y) graphically is to produce a two-dimensional topographicmap of the surface z = f(x, y). In the xy-plane we sketch the curves f(x, y) = C for various values of theconstant C. These curves are called level curves of f because they are the vertical projections onto thexy-plane of the curves in which the graph z = f(x, y) intersects the horizontal (level) planes z = C. Forexample, the graph of the function f(x, y) = x2 + y2 is a circular paraboloid in 3-space; the level curvesare circles centered at the origin in the xy-plane.

� Example 3.1.5 (Level curves)

The level curves of the function f(x, y) = 4“1 − x

2− y

3

”of Example 3.1.3 are the segments of the straight

lines

4“1 − x

2− y

3

”= C or

x

2+

y

3= 1 − C

4, 0 � C � 3,

which lie in the first quadrant. Such level curves correspond to equally spaced values of C, and their equalspacing indicates the uniform steepness of the graph of f . �

41


� Example 3.1.6 (Level curves)

The level curves of the function f(x, y) =p

9 − x2 − y2 of Example 3.1.4 are the concentric circlesp9 − x2 − y2 = C or x2 + y2 = 9 − C2, 0 � C � 3.

The level curves (i.e. concentric circles) should be drawn for several equally spaced values of C. The circlesare getting closer and closer for values of C getting closer and closer to 0. The decreasing spacing indicatesthe steepness of the hemispherical surface that is the graph of f . �

Let us summarize all these terminologies for a function of two variables in the following definition.

3.1.2 Definition The set of points in the plane where a function f(x, y) has a constant value f(x, y) = Cis called a level curve of f . The set of all points (x, y, f(x, y)) in space, for (x, y) in the domain of f , iscalled the graph of f . The graph of f is also called the surface z = f(x, y).

3.2 Limits and Continuity

In this section we will take a look at limits involving functions of more than one variable. In fact, we willconcentrate on limits of functions of two variables, but the ideas can be extended out to functions with morethan two variables.

Before getting into this let us briefly recall how limits of functions of one variable work. We say that

limx→a

f(x) = L (exists)

provided that the two one-sided limits exist and have equal value, that is,

limx→a−

f(x) = limx→a+

f(x) = L.

Also recall that limx→a−

f(x) is the left hand limit and requires us to only look at values of x that are less

than a. Likewise, limx→a+

f(x) is the right hand limit and requires us to only look at values of x that are

greater than a.

x

y

from the left

(a

)

from the right

L

Function of single variabley = f(x)

point of interest

•

Now, notice that in this case there are only two paths that we can take as we move in towards x = a. Wecan either move in from the left or we can move in from the right. Then in order for the limit of a functionof one variable to exist the function must be approaching the same value as we take each of these paths intowards x = a.

42


• Limits of functions of two variables

With functions of two variables we will have to do something similar, except this time there is possibly goingto be a lot more work involved. Let us first address the notation and get a feeling of what we are going tobe asking for in these kinds of limits.

We will be asking to take the limit of the function f(x, y) as x approaches a and y approaches b. Thiscan be written in several ways. Here are a couple of the more standard notations

limx→a, y→b

f(x, y), lim(x,y)→(a,b)

f(x, y).

We will use the second notation in this course. The second notation is also a little more helpful in illustratingwhat we are really doing here when we are taking a limit. In taking a limit of a function of two variableswe are really asking what the value of f(x, y) is doing as we move the point (x, y) in closer and closer tothe point (a, b) without actually letting it be (a, b).

Just like with limits of functions of one variable, in order for this limit to exist, the function must beapproaching the same value regardless of the path that we take as we move in towards (a, b). The problemthat we are immediately faced with is that there are literally an infinite number of paths that we can takeas we move towards the point (a, b).

x

y

z

Function of two variablesz = f(x, y)

a

b

• x

y

a

b

Domain off(x, y)

The above figure (right side) gives a few examples of paths that we could take. We put in several straightline paths as well as a couple of “stranger” paths that are not straight line paths. Also, we only included6 paths here and as you can see simply by varying the slope of the straight line paths there are an infinitenumber of these and then we would need to consider paths that are not straight line paths.

In other words, to show that a limit (of function of two variables) exists we would technically need tocheck an infinite number of paths and verify that the function is approaching the same value regardless ofthe path we are using to approach the point. Of course, practically, this is simply not possible. Fortunatelyhowever we can use the main ideas from single variable calculus to help us take limits here. See the followingdefinition for continuity of a function of two variables.

• Continuity of functions of two variables

As for functions of one variable, continuity of function of a function f at a point of its domain is defineddirectly in terms of the limit.

43


3.2.1 Definition A function f is continuous at the point (a, b) if

1. f is defined at (a, b),

2. lim(x,y)→(a,b)

f(x, y) exists,

3. lim(x,y)→(a,b)

f(x, y) = f(a, b).

A function is continuous if it is continuous at every point of its domain.

From a graphical viewpoint this definition means the same thing as it did when we first learnt singlevariable calculus. A function will be continuous at a point if the graph does not have any holes or breaks atthat point.

How can this help us take limits? Well, just as in single variable calculus, if you know that a functionis continuous at (a, b) then you also know that

lim(x,y)→(a,b)

f(x, y) = f(a, b)

must be true. So, if we know that a function is continuous at a point then all we need to do to take the limitof the function at that point is plug the point into the function. All the standard functions that we knowto be continuous are still continuous even if we are plugging in more than one variable now. We just needto watch out for division by zero, square roots of negative numbers, logarithms of zero or negative numbers,etc.

Note that the idea about paths however is not one that we should forget since it is a nice way to determineif a limit does not exist. If we can find two paths upon which the function approaches different values as weget near the point then we will know that the limit does not exist. Let us first take a look at a couple ofexamples.

� Example 3.2.1 (Evaluation of limits)

Determine if the following limits exist or not. If they do exist then give the value of the limit.

(a) lim(x,y)→(2,3)

`2x − y2´ , (b) lim

(x,y)→(a,b)x2y, (c) lim

(x,y)→(2,−1)

`3x2 + xy cos(πx)

´.

Solution In this example the three functions are continuous at the respective point and so all we need todo is plug in the values and we are done.

(a) lim(x,y)→(2,3)

`2x − y2

´= 2(2) − (3)2 = 4 − 9 = −5,

(b) lim(x,y)→(a,b)

x2y = a2b,

(c) lim(x,y)→(2,−1)

`3x2 + xy cos(πx)

´= 3(2)2 + (2)(−1) cos(2π) = 10.

Recall that any combinations and compositions of continuous functions (e.g. polynomials, rational functions,sine/cosine functions, exponential functions, etc.) are still continuous. For example, the composite functions

ex−y,p

x2 + y2 + 1, cosxy

x2 + 1, ln(1 + x2y2)

are continuous at every point (x, y). So, basically, we can calculate the limits of this kind of continuousfunctions by evaluating the function values at (a, b). The only reminder is that the rational functions mustbe defined at (a, b). �

44


� Example 3.2.2 (Evaluation of limits)

Investigate the limiting behavior of the function f as (x, y) approaches (5, 1):

f(x, y) =xy

x + y.

Solution In this example the function f will not be continuous along the line y = −x since we willencounter division by zero when this is true. However, for this problem that is not something that we willneed to worry about since the point that we are taking the limit at is not on this line. Therefore, all thatwe need to do is plug in the point as in Example 3.2.1 since the function is continuous at this point.

lim(x,y)→(5,1)

xy

x + y=

5

6(exists).

�

� Example 3.2.3 (Existence of limits)

Find the limit if it exists.

lim(x,y)→(0,0)

x2y2

x4 + 3y4.

Solution Now, in this example the function is not continuous at the point in question and so we cannotjust plug in the point. So, since the function is not continuous at the point there is at least a chance thatthe limit does not exist. If we can find two different paths to approach the point that will give two differentvalues for the limit then we will know that the limit does not exist. Two of the more common paths to checkare the x and y-axis so let us try those.

Before actually doing this we need to address just what exactly do we mean when we say that we aregoing to approach a point along a path. When we approach a point along a path we will do this be eitherfixing x or y or by relating x and y through some function. In this way we can reduce the limit to just alimit involving a single variable which we know how to do from elementary calculus.

So, let us see what happens along the x-axis. If we are going to approach (0, 0) along the x-axis we cantake the advantage of the fact that along the x-axis we know that y = 0. This means that, along the x-axis,we will plug in y = 0 into the function and then take the limit as x approaches zero.

lim(x,y)→(0,0)

x2y2

x4 + 3y4= lim

(x,0)→(0,0)

x2(0)2

x4 + 3(0)4= lim

x→00 = 0.

So, along the x-axis the function will approach zero as we move in towards the origin.

Now, let us try the y-axis. Along the y-axis we have x = 0 and so the limit becomes

lim(x,y)→(0,0)

x2y2

x4 + 3y4= lim

(0,y)→(0,0)

(0)2y2

(0)4 + 3y4= lim

y→00 = 0.

So, the same limit along two paths. Do not mis-read this. This does NOT say that the limit exists and hasa limit value of zero. This only means that limit happens to have the same limit value along two specialpaths.

Let us take a look at the limit of the function along a third (fairly common) path. In this case we willmove in towards the origin along the path y = x. This is what we meant previously about relating x andy through a function. To do this we will replace all the y’s with x’s and then let x approach zero. Let ustake a look at this limit.

lim(x,y)→(0,0)

x2y2

x4 + 3y4= lim

(x,x)→(0,0)

x2(x)2

x4 + 3(x)4= lim

x→0

x4

4x4= lim

x→0

1

4=

1

4.

So, a different value from the previous two paths and this means that the limit

lim(x,y)→(0,0)

x2y2

x4 + 3y4does not exist.

45


Note that we can use this idea of moving in towards the origin along a line with the more general pathy = mx if we need to. �

� Example 3.2.4 (Existence of limits)

Find the limit if it exists.

lim(x,y)→(0,0)

x3y

x6 + y2.

Solution With this ending example we still have continuity problems at the origin. So, again let us see ifwe can find a couple of paths that give different values of the limit.

First, we will use the path y = x. Along this path we have

lim(x,y)→(0,0)

x3y

x6 + y2= lim

(x,x)→(0,0)

x3(x)

x6 + (x)2= lim

x→0

x4

x2(x4 + 1)= lim

x→0

x2

x4 + 1= 0.

Now, let us try the path y = x3. Along this path the limit becomes

lim(x,y)→(0,0)

x3y

x6 + y2= lim

(x,x3)→(0,0)

x3(x3)

x6 + (x3)2= lim

x→0

x6

2x6= lim

x→0

1

2=

1

2.

We now have two paths that give different values for the limit and so the limit does not exist. As this limithas shown us we can, and often need, to use paths other than straight lines. �

3.3 Partial Derivatives

In this section we begin the process of differentiating functions of more than one variable. Before we actuallystart taking derivatives of functions of more than one variable let us recall an important interpretation ofderivatives of functions of one variable.

Recall that given a function of one variable, f(x), the derivative, f ′(x), represents the rate of changeof the function as x changes. This is an important interpretation of derivatives and we are not going to wantto lose it with functions of more than one variable. The problem with functions of more than one variableis that there is more than one variable. In other words, what do we do if we only want one of the variablesto change, or if we want more than one of them to change? In fact, if we are going to allow more than oneof the variables to change there are then going to be an infinite amount of ways for them to change. Forinstance, one variable could be changing faster than the other variable(s) in the function. Notice as wellthat it will be completely possible for the function to be changing differently depending on how we allowone or more of the variables to change.

We will need to develop ways, and notations, for dealing with all of these cases. In this section we aregoing to concentrate exclusively on only changing one of the variables at a time, while remaining variable(s)are held fixed. We will deal with allowing multiple variables to change in a later section (page 66). Becausewe are going to only allow one of the variables to change taking the derivative will become a fairly simpleprocess. Let us start off with a fairly simple function. Consider the function

f(x, y) = 2x2y3.

Let us determine the rate at which the function is changing at a point (a, b), if we hold y fixed and allow xto vary and if we hold x fixed and allow y to vary. We will start by looking at the case of holding y fixedand allowing x to vary. Since we are interested in the rate of change of the function at (a, b) and are holding

46


y fixed this means that we are going to always have y = b. Doing this will give us a function involvingonly x’s and we can define a new function as follows

g(x) = f(x, b) = 2x2b3.

Now, this is a function of a single variable and at this point all that we are asking is to determine the rateof change of g(x) at x = a. In other words, we want to compute g′(a) and since this is a function of asingle variable we already know how to do that. Here is the rate of change of the function at (a, b) if wehold y fixed and allow x to vary.

g′(a) = 4ab3.

We will call g′(a) the partial derivative of f(x, y) with respect to x at (a, b) and we will denote it inthe following way,

fx(a, b) = 4ab3.

Now, let us do it the other way. We will now hold x fixed and allow y to vary. We can do this in a similarway. Since we are holding x fixed it must be fixed at x = a and so we can define a new function of y andthen differentiate this as we have always done with functions of one variable. Here is the work for this,

h(y) = f(a, y) = 2a2y3 =⇒ h′(b) = 6a2b2.

In this case we call h′(b) the partial derivative of f(x, y) with respect to y at (a, b) and we denote itas follows

fy(a, b) = 6a2b2.

Note that these two partial derivatives are sometimes called the first-order partial derivatives. Justas with functions of one variable we can have derivatives of all orders. We will be looking at higher-order(partial) derivatives later in this section (page 57).

Note that the notation for partial derivatives is different than that for derivatives of functions of a singlevariable. With functions of a single variable we could denote the derivative with a single prime. However,with partial derivatives we will always need to remember the variable that we are differentiating with respectto and so we will subscript the variable that we differentiated with respect to. We will shortly be seeingsome various notations for partial derivatives as well.

Note as well that we usually do not use the (a, b) notation for partial derivatives. The more standardnotation is to just continue to use (x, y). So, the partial derivatives from above will more commonly bewritten as

fx(x, y) = 4xy3 and fy(x, y) = 6x2y2.

Now, as this quick example has shown taking derivatives of functions of more than one variable is done inpretty much the same manner as taking derivatives of a single variable. To compute fx(x, y) all we need todo is treat all the y’s as constants (or numbers) and then differentiate the x’s as we have done. Likewise,to compute fy(x, y) we will treat all the x’s as constants and then differentiate the y’s as we are used todoing.

Before we do a few examples let us get the formal definition of the partial derivative out of the way aswell as some alternate (but equivalent) notations. Since we can think of the two partial derivatives aboveas derivatives of single variable functions it should not be too surprising that the definition of each is verysimilar to the definition of the derivative for single variable functions. For a function of two variables, wemake this precise in the following definition.

3.3.1 Definition The first partial derivatives of the function f(x, y) with respect to the variablesx and y are the functions fx(x, y) and fy(x, y) given by

fx(x, y) = limh→0

f(x + h, y) − f(x, y)

h, and fy(x, y) = lim

h→0

f(x, y + h) − f(x, y)

h,

provided these limits exist.

47


Each of the two partial derivatives is the limit of a difference quotient in one of the variables. Observe thatfx(x, y) is just the ordinary first derivative of f(x, y) considered as a function of x only, regarding y asa constant parameter. Similarly, fy(x, y) is the first derivative of f(x, y) considered as a function of yalone, with x held fixed.

� Example 3.3.1 (Partial derivatives)

If f(x, y) = x2 sin y, then

fx(x, y) = 2x sin y and fy(x, y) = x2 cos y.

�

Remark that various notations can be used freely to denote the partial derivatives of z = f(x, y) consideredas functions of x and y:

fx(x, y) = fx =∂f

∂x=

∂

∂x(f(x, y)) = zx =

∂z

∂x,

fy(x, y) = fy =∂f

∂y=

∂

∂y(f(x, y)) = zy =

∂z

∂y.

For the fractional notation for the partial derivative notice the difference between the partial and the ordinaryderivative from single variable calculus.

f(x) =⇒ f ′(x) =df

dx,

f(x, y) =⇒ fx(x, y) =∂f

∂x, and fy(x, y) =

∂f

∂y.

To distinguish partial derivatives from ordinary derivatives we use the symbol ∂ rather than the d usedin single variable calculus. The symbol ∂/∂x should be read as “partial with respect to x” so ∂f/∂x is“partial f with respect to x”.

Let us work some examples. When working these examples always keep in mind that we need to payvery much attention to which variable we are differentiating with respect to. This is important because weare going to treat all other variables as constants and then proceed with the derivative as if it was a functionof a single variable. Also note that the standard differentiation rules for sums, products, reciprocals, andquotients continue to apply to partial derivatives.

� Example 3.3.2 (Partial derivatives)

Find all of the first-order partial derivatives for the following functions.

(a) f(x, y) = x4 + 6√

y − 10.

(b) w = x2y − 10y2z3 + 44x − 7 tan(4y).

(c) h(s, t) = t7 ln s2 +9

t3− 7

√s4.

(d) f(x, y) = cos4

xex2y−5y3

.

(e) z =9u

u2 + 5v.

(f) g(x, y, z) =x sin y

z2.

(g) z =p

x2 + ln(5x − 3y2).

Solution

48


(a) Let us first take the derivative with respect to x and remember that as we do so all the y’s will betreated as constants. The partial derivative with respect to x is

fx(x, y) = 4x3.

Notice that the second and the third terms differentiate to zero in this case. It should be clear whythe third term differentiated to zero. It is a constant and we know that constants always differentiateto zero. This is also the reason that the second term differentiated to zero. Remember that since weare differentiating with respect to x here we are going to treat all y’s as constants. This means thatthose terms that only involve y’s will be treated as constants and hence differentiate to zero.

Now, let us take the derivative with respect to y. In this case we treat all x’s as constants and so thefirst term involves only x’s and so will differentiate to zero, just as the third term will. The partialderivative with respect to y is

fy(x, y) =3√y

.

(b) With this function we have three first-order derivatives to compute. Let us do the partial derivativeswith respect to x first. Since we are differentiating with respect to x we will treat all y’s and all z’sas constants. This means that the second and fourth terms will differentiate to zero since they onlyinvolve y’s and z’s. The first term contains both x’s and y’s and so when we differentiate with respectto x the y is just treated to be a multiplicative constant and so the first term will be differentiatedjust as the third term will be differentiated. Here is the partial derivative with respect to x.

∂w

∂x= 2xy + 44.

Let us now differentiate with respect to y. In this case all x’s and z’s will be treated as constants.This means the third term will differentiate to zero since it contains only x’s while the x’s in the firstterm and the z’s in the second term will be treated as multiplicative constants. Here is the partialderivative with respect to y.

∂w

∂y= x2 − 20yz3 − 28 sec2(4y).

Finally, let us get the derivative with respect to z. Since only one of the terms involve z’s this will bethe only non-zero term in the derivative. Also, the y’s in that term will be treated as multiplicativeconstants. Here is the partial derivative with respect to z.

∂w

∂z= −30y2z2.

(c) With this function we will not put in the detail of the first two. Before taking derivative let us rewritethe function a little to help us with the differentiation process.

h(s, t) = 2t7 ln s + 9t−3 − s47 .

Now, the fact that we are using s and t here instead of the “standard” x and y should not be a problemat all. It will work the same way. Here are two partial derivatives for this function.

hs(s, t) =∂h

∂s= 2t7 · 1

s+ 0 − 4

7s−

37 =

2t7

s− 4

7s−

37 ,

ht(s, t) =∂h

∂t= 14t6 · ln s − 27t−4.

49


(d) Now, we cannot forget the product rule with derivatives. The product rule will work the same wayhere as it does with functions of one variable. We will just need to be careful to remember whichvariable we are differentiating with respect to. Let us start out by differentiating with respect to x. Inthis case both the cosine and the exponential contain x’s and so we have a product of two functionsinvolving x’s and so we will need the product rule for differentiation. Here is the derivative withrespect to x.

fx(x, y) = − sin4

x· (− 4

x2) · ex2y−5y3

+ cos4

x· ex2y−5y3 · (2xy)

=4

x2sin

4

xex2y−5y3

+ 2xy cos4

xex2y−5y3

.

Do not forget the chain rule for functions of one variable. We will be looking at the chain rule forsome more complicated expressions for multivariable functions in a later section (page 59). However,at this point we are treating all the y’s as constants and so the chain rule will continue to work as itdoes in single variable calculus.

Also, do not forget how to differentiate exponential functions

d

dx

“ef(x)

”= f ′(x) ef(x).

Now, let us differentiate with respect to y. In this case we do not have a product rule to worry aboutsince the only place that the y shows up is in the exponential. Therefore, since x’s are considered tobe constants for this derivative, the cosine in the front will also be treated as a multiplicative constant.Here is the partial derivative with respect to y.

fy(x, y) = (x2 − 15y2) cos4

xex2y−5y3

.

(e) We also cannot forget about the quotient rule. Since there is not much to do this one, we will simplygive the derivatives.

zu =9(u2 + 5v) − 9u(2u)

(u2 + 5v)2=

−9u2 + 45v

(u2 + 5v)2,

zv =(0)(u2 + 5v) − 9u(5)

(u2 + 5v)2=

−45u

(u2 + 5v)2.

In the case of the derivative with respect to v recall that u’s are constants and so when we differentiatethe numerator we will get zero.

(f) Now, we do need to be careful however not to use the quotient rule when it does not need to be used.In this case we do have a quotient, however, since the x’s and y’s only appear in the numerator andthe z’s only appear in the denominator this really is not a quotient rule problem. Let us computethe derivatives with respect to x and y first. In both these cases the z’s are constants and so thedenominator is a constant and so we do not really need to worry too much about it. Here are thederivatives for these two cases.

gx(x, y, z) =sin y

z2and gy(x, y, z) =

x cos y

z2.

Now, in the case of differentiation with respect to z we can avoid the quotient rule with a quick rewriteof the function. Here is the rewrite as well as the derivative with respect to z.

g(x, y, z) = x sin y · z−2,

gz(x, y, z) = −2x sin y · z−3 = −2x sin y

z3.

50


(g) In the last part we are going to apply the chain rule. If you have a good knowledge in single variablecalculus this should not be all that difficult of a problem. Here are the two derivatives.

zx =1

2

`x2 + ln(5x − 3y2)

´− 12 · ∂

∂x

`x2 + ln(5x − 3y2)

´=

1

2

`x2 + ln(5x − 3y2)

´− 12 ·„

2x +5

5x − 3y2

«

=

„x +

5

2(5x − 3y2)

«`x2 + ln(5x − 3y2)

´− 12 ,

zy =1

2

`x2 + ln(5x − 3y2)

´− 12 · ∂

∂y

`x2 + ln(5x − 3y2)

´

=1

2

`x2 + ln(5x − 3y2)

´− 12 ·„ −6y

5x − 3y2

«

= − 3y

5x − 3y2

`x2 + ln(5x − 3y2)

´− 12 .

�

So, there are some examples of partial derivatives. Hopefully you will agree that as long as we canremember to treat the other variables as constants these work in exactly the same manner that derivativesof functions of one variable do. So, if you can do single variable calculus derivative you should not have toomuch difficulty in doing basic partial derivatives.

• Implicit differentiation

There is one important topic that we need to take a quick look in this section, implicit differentiation. Beforegetting into implicit differentiation for multivariable functions let us first remark how implicit differentiationworks for functions of one variable.

� Example 3.3.3 (Implicit differentiation)

Finddy

dxfor 3y4 + x7 = 5x.

Solution Remember that the key to this is to always think of y as a function of x, or y = y(x) and sowhenever we differentiate a term involving y’s with respect to x we will need to use the chain rule which

will mean that we will add on ady

dxto that term. The first step is to differentiate both sides with respect

to x, we have

12y3 · dy

dx+ 7x6 = 5.

The second step is to solve fordy

dx.

dy

dx=

5 − 7x6

12y3.

�

Implicit differentiation works in exactly the same manner with multivariable functions. If we have a functionin terms of three variables x, y, and z we will assume that z is in fact a function of x and y. In other

51


words, z = z(x, y). Then whenever we differentiate z’s with respect to x we will use the chain rule and

add on a∂z

∂x. Likewise, whenever we differentiate z’s with respect to y we will add on a

∂z

∂y. Let us take

a quick look at a couple examples of implicit differentiation problems.


Find∂z

∂xand

∂z

∂yfor each of the following functions.

(a) x3z2 − 5xy5z = x2 + y3. (b) x2 sin(2y − 5z) = 1 + y cos(6xz).

Solution

(a) Let us start with finding∂z

∂x. We will differentiate both sides with respect to x and remember to

add on a∂z

∂xwhenever we differentiate a z.

3x2z2 + 2x3z · ∂z

∂x− 5y5z − 5xy5 · ∂z

∂x= 2x.

Remember that since we are assuming z = z(x, y) then any product of x’s and z’s will be a product

and so we need the product rule. Now, solve for∂z

∂x.

`2x3z − 5xy5

´ ∂z

∂x= 2x − 3x2z2 + 5y5z,

∂z

∂x=

2x − 3x2z2 + 5y5z

2x3z − 5xy5.

Now, we will do the same thing for∂z

∂yexcept this time we will need to remember to add on a

∂z

∂ywhenever we differentiate a z.

2x3z · ∂z

∂y− 25xy4z − 5xy5 · ∂z

∂y= 3y2,

`2x3z − 5xy5

´ ∂z

∂y= 3y2 + 25xy4z,

∂z

∂y=

3y2 + 25xy4z

2x3z − 5xy5.

(b) Basically, we will do the same thing for this function as we did in the previous part. Let us find∂z

∂xfirst.

2x sin(2y − 5z) + x2 cos(2y − 5z) · (−5∂z

∂x) = −y sin(6xz) · (6z + 6x

∂z

∂x).

Do not forget to do the chain rule on each of the trigonometric functions and when we are differentiating

the inside function on the cosine we will need to also use the product rule. Now let us solve for∂z

∂x.

2x sin(2y − 5z) − 5x2 cos(2y − 5z)∂z

∂x= −6yz sin(6xz) − 6xy sin(6xz)

∂z

∂x,

2x sin(2y − 5z) + 6yz sin(6xz) =`5x2 cos(2y − 5z) − 6xy sin(6xz)

´ ∂z

∂x,

∂z

∂x=

2x sin(2y − 5z) + 6yz sin(6xz)

5x2 cos(2y − 5z) − 6xy sin(6xz).

52


Next, let us find∂z

∂y. This one will be slightly easier than the first one.

x2 cos(2y − 5z) ·„

2 − 5∂z

∂y

«= cos(6xz) − y sin(6xz)(6x

∂z

∂y),

2x2 cos(2y − 5z) − 5x2 cos(2y − 5z)∂z

∂y= cos(6xz) − 6xy sin(6xz)

∂z

∂y,

`6xy sin(6xz) − 5x2 cos(2y − 5z)

´ ∂z

∂y= cos(6xz) − 2x2 cos(2y − 5z),

∂z

∂y=

cos(6xz) − 2x2 cos(2y − 5z)

6xy sin(6xz) − 5x2 cos(2y − 5z).

�

• Interpretations of partial derivatives

At this point we will show that the two main interpretations of derivatives of functions of a single variablestill hold for partial derivatives, with small modifications of course to account for the fact that we now havemore than one variable.

Rates of change. The first interpretation we have already seen and is the more important of the two.As with functions of several variables partial derivatives represent the rates of change of the functions as thevariables change. As we saw previously, fx(x, y) represents the rate of change of the function f(x, y) aswe change x and hold y fixed while fy(x, y) represents the rate of change of the function f(x, y) as wechange y and hold x fixed.

� Example 3.3.5 (Rates of change)

Determine if f(x, y) =x2

y3is increasing or decreasing at (2, 5),

(a) if we allow x to vary and hold y fixed. (b) if we allow y to vary and hold x fixed.

Solution

(a) In this case we still first need fx(x, y) and its value at the point.

fx(x, y) =2x

y3=⇒ fx(2, 5) =

4

125> 0.

The partial derivative with respect to x is positive and therefore if we hold y fixed the function isincreasing at (2, 5) as we vary x.

(b) For this part we will need fy(x, y) and its value at the point.

fy(x, y) = −3x2

y4=⇒ fy(2, 5) = − 12

625< 0.

The partial derivative with respect to y is negative and therefore the function is decreasing at (2, 5)as we vary y and hold x fixed. �

53


Note that it is completely possible for a function to be increasing for a fixed y and decreasing for a fixed x ata point as the above example has shown. To see a nice example of this take a look at the following graph.

y

x

z

-2

-1

0

1

2

-2

0

2

-0.5

0

0.5

1

-2

-1

0

1

This is a graph of hyperbolic paraboloid and at the origin we can see that if we move along the positivex-axis the graph is increasing and if we move along the positive y-axis the graph is decreasing. So it iscompletely possible to have a graph both increasing and decreasing at a point depending upon the directionthat we move. We should never expect the function will behave in exactly the same way at a point as eachvariable changes.

Slopes of tangent lines. The next interpretation was one of the standard interpretations in any singlevariable calculus course. We know that from single variable calculus that f ′(a) represents the slope of thetangent line to the curve y = f(x) at x = a. Here, fx(x, y) and fy(x, y) also represent the slopes oftangent lines. The difference is the functions that they represent tangent lines to.

Partial derivatives are the slopes of traces. By a trace to f(x, y) at the point (a, b) we mean theintersection curve between the surface defined by z = f(x, y) and a plane defined by x = a (resp. byanother plane y = b). In particular, the partial derivative fx(a, b) is the slope of the trace to f(x, y) forthe plane y = b at the point (a, b). Likewise, the partial derivative fy(a, b) is the slope of the trace tof(x, y) for the plane x = a at the point (a, b).

� Example 3.3.6 (Slopes of tangent lines)

Find the slopes of the traces to z = 10 − 4x2 − y2 at the point (1, 2).

Solution We sketch the graphs of the traces for the planes x = 1 and y = 2 in the following.

54


x y

z

Trace for x = 1x y

z

Trace for y = 2

Next we will need the two partial derivatives so we can get the slopes.

fx(x, y) = −8x, fy(x, y) = −2y.

To get the slopes all we need to do is evaluate the partial derivatives at the point (1, 2).

fx(1, 2) = −8, fy(1, 2) = −4.

So, the tangent line at (1, 2) for the trace to z = 10 − 4x2 − y2 for the plane y = 2 has a slope of −8.Also, the tangent line at (1, 2) for the trace to z = 10 − 4x2 − y2 for the plane x = 1 has a slope of −4.

�

� Example 3.3.7 (Slopes of tangent lines)

The plane x = 1 intersects the paraboloid z = x2 + y2 in a parabola. Find the slope of the tangent to theparabola at (1, 2, 5).

Solution The slope is the value of the partial derivative ∂z/∂y at (1, 2).

∂z

∂y

˛˛(1,2)

=∂

∂y(x2 + y2)

˛˛(1,2)

= 2y˛y=2

= 4.

As a check, we can treat the parabola as the graph of the single variable function z = (1)2 + y2 = 1 + y2

in the plane x = 1 and ask for the slope at y = 2. The slope, calculated now as an ordinary derivative, is

dz

dy

˛˛y=2

=d

dy(1 + y2)

˛˛y=2

= 2y˛y=2

= 4.

Could you sketch the graphs of the functions involved in this question? �

Vector equations of tangent line. Finally, let us briefly talk about getting the equations of the tangentline. Recall that the equation of a line in 3-space is given by a vector equation. Also to get the equation weneed a point on the line and a vector that is parallel to the line. The point is easy. Since we know the x-ycoordinates of the point all we need to do is plug this into the equation to get the point. So, the point willbe

(a, b, f(a, b)).

55


The parallel (or tangent) vector is also easy. We can write the equation of the surface as a vector functionas follows,

r(x, y) = xi + yj + zk = xi + yj + f(x, y)k,

or in the alternate vector notation

r(x, y) = 〈x, y, f(x, y)〉.We know that if we have a vector function of one variable we can get a tangent vector by differentiating thevector function. The same will still be true here. If we differentiate with respect to x we will get a vector totraces for the plane y = b (i.e. for fixed y) and if we differentiate with respect to y we will get a vector totraces for the plane x = a (for fixed x). The following is the tangent vector for traces with fixed y.

rx(x, y) = 〈1, 0, fx(x, y)〉.

We differentiated each component with respect to x. Therefore the first component becomes a one and thesecond becomes a zero because we are treating y as a constant when we differentiate with respect to x. Thethird component is just the partial derivative of the function with respect to x. For traces with fixed x thetangent vector is

ry(x, y) = 〈0, 1, fy(x, y)〉.The equation for the tangent line to traces with fixed y is

r(t) = 〈a, b, f(a, b)〉 + t 〈1, 0, fx(a, b)〉,

whereas the tangent line to traces with fixed x is

r(t) = 〈a, b, f(a, b)〉 + t 〈0, 1, fy(a, b)〉.

� Example 3.3.8 (Vector equations of tangent line)

Write down the vector equations of the tangent lines to the traces to

z = 10 − 4x2 − y2

at the point (1, 2).

Solution Actually there is not much to do other than plugging the values and function into the formulasabove. We have already computed the derivatives and their values at (1, 2) in Example 3.3.6 (page 54) andthe point on each trace is

(1, 2, f(1, 2)) = (1, 2, 2).

The equation of the tangent line to the trace for the plane y = 2 is given by

r(t) = 〈1, 2, 2〉 + t 〈1, 0,−8〉 = 〈1 + t, 2, 2 − 8t〉,

and the equation of the tangent line to the trace for the plane x = 1 is given by

r(t) = 〈1, 2, 2〉 + t 〈0, 1,−4〉 = 〈1, 2 + t, 2 − 4t〉.

�

56


• Higher-order partial derivatives

Just as we have higher-order derivatives with functions of one variable we will also have higher-order (partial)derivatives of functions of more than one variable. Consider the case of a function of two variables, f(x, y),since both of the first-order partial derivatives are also functions of x and y we could in turn differentiateeach with respect to x or y. This means that for the case of a function of two variables there will be a totalfour possible second-order derivatives. Here they are the notations that we will use to denote them.

(fx)x = fxx =∂

∂x

„∂f

∂x

«=

∂2f

∂x2,

(fx)y = fxy =∂

∂y

„∂f

∂x

«=

∂2f

∂y∂x,

(fy)x = fyx =∂

∂x

„∂f

∂y

«=

∂2f

∂x∂y,

(fy)y = fyy =∂

∂y

„∂f

∂y

«=

∂2f

∂y2.

In the above, the second and third second-order partial derivatives are often called mixed partial derivativessince we are taking derivatives with respect to more than one variable. Note as well that the order that wetake the derivatives in is given by the notation for each of these. If we are using the subscripting notation, forexample fxy, then we will differentiate from left to right. In other words, in this case, we will differentiate

first with respect to x and then with respect to y. With the fractional notation, for example∂2f

∂y∂x, it is

the opposite. In these cases we differentiate moving along the denominator from right to left. So, again, inthis case we differentiate with respect to x first and then y.

Let us take a quick look at an example.

� Example 3.3.9 (Second-order partial derivatives)

Find all the second-order derivatives for f(x, y) = cos(2x) − x2e5y + 3y2.

Solution We will need the first-order derivatives so here they are

fx(x, y) = −2 sin(2x) − 2xe5y , fy(x, y) = −5x2e5y + 6y.

Now, let us get the second-order derivatives. They are

fxx = −4 cos(2x) − 2e5y , fxy = −10xe5y , fyx = −10xe5y , fyy = −25x2e5y + 6.

�

Note that we dropped the (x, y) from the derivatives (i.e., writing for example fxx instead of fxx(x, y)).This is fairly standard and we will be doing it most of the time from now on. We will also be dropping itfor the first-order derivatives in most cases.

You may have noticed that the “mixed” second-order partial derivatives

fxy =∂2f

∂y∂xand fyx =

∂2f

∂x∂y

in Example 3.3.9 are equal. This is not a coincidence. If the function is “nice enough” this will always bethe case. So, what is actually “nice enough”? The following theorem tells us the answer.

57


Theorem 3.3.1 (The Mixed Derivative Theorem) If f(x, y) and its partial derivatives fx, fy,fxy, and fyx are defined on a disk containing a point (a, b) and are all continuous at (a, b), then

fxy(a, b) = fyx(a, b).

The theorem is also known as Clairaut’s Theorem, named after the French mathematician Alexis Clairautwho discovered it. The proof is omitted here. This theorem says that to calculate a mixed second-orderderivative, we may differentiate in either order, provided the continuity conditions are satisfied. This canlead to our advantage.

� Example 3.3.10 (Mixed derivative)

Find∂2w

∂x∂yif w = xy +

ey

y2 + 1.

Solution The symbol ∂2w∂x∂y

tells us to differentiate first with respect to y and then with respect to x. Ifwe postpone the differentiation with respect to y and differentiate first with respect to x, however, we getthe answer more quickly. In two steps,

∂w

∂x= y and

∂2w

∂y∂x= 1.

If we differentiate first with respect to y, certainly we still obtain ∂2w∂x∂y

= 1 as well. We can differentiatein either order because the conditions of Theorem 3.3.1 hold for w at all points. �

Although we will deal mostly with first- and second-order partial derivatives, because these appear the mostfrequently in applications, there is no theoretical limit to how many times we can differentiate a functionas long as the derivatives involved exist. There are higher-order derivatives as well and the following is acouple of the third-order partial derivatives of a function of two variables.

fxyx = (fxy)x =∂

∂x

„∂2f

∂y∂x

«=

∂3f

∂x∂y∂x,

fyxx = (fyx)x =∂

∂x

„∂2f

∂x∂y

«=

∂3f

∂2x∂y.

Notice as well that for both of these we differentiate once with respect to y and twice with respect to x.There is also another third-order partial derivative in which we can do this, fxxy. There is an extension toClairaut’s Theorem that says if furthermore all three of these are continuous then they should all be equal,

fxxy = fxyx = fyxx.

To this point we have only looked at functions of two variables, but everything that we have done here willwork regardless of the number of variables that we have got in the function and there are natural extensionto Clairaut’s theorem to all of these cases as well. For instance,

fxz(x, y, z) = fzx(x, y, z),

provided both of the derivatives are continuous. In general, we can extend Clairaut’s theorem to anyfunction and mixed partial derivatives. The only requirement is that in each derivative we differentiatewith respect to each variable the same number of times. In other words, provided we meet the continuitycondition, the following will be equal

fssrtsrr = frtsrssr

because in each case we differentiate with respect to t once, s three times and r three times. Let us do acouple of examples with higher-order derivatives and functions of more than two variables.

� Example 3.3.11 (Higher-order derivatives)

Find the indicated derivative for each of the following functions.

58

3.4 The Chain Rule

(a) Find fxxyzz for f(x, y, z) = z3y2 lnx. (b) Find∂3f

∂y∂x2for f(x, y) = exy.

Solution

(a) In this case remember that we differentiate from left to right. The derivatives are

fx =z3y2

x, fxx = −z3y2

x2, fxxy = −2z3y

x2,

fxxyz = −6z2y

x2, fxxyzz = −12zy

x2.

(b) Here we differentiate from right to left. The derivatives are

∂f

∂x= yexy,

∂2f

∂x2= y2exy,

∂3f

∂y∂x2= 2yexy + xy2exy.

�

3.4 The Chain Rule

The chain rule for functions of a single variable says that when y = f(x) is a differentiable function of xand x = g(t) is a differentiable function of t, y becomes a differentiable function of t and dy/dt could becalculated with the formula

dy

dt=

dy

dx

dx

dt.

It is now time to extend the chain rule out to more complicated situations. Notice that in the above thederivative dy/dt really does make sense since if we plug in for x then y really will be a function of t. Oneway to remember this form of the chain rule is to note that if we think of the two derivatives on the rightside as fractions the dx’s will cancel to get the same derivative on both sides.

As with many topics in multivariable calculus, there are in fact many different formulas depending onthe number of variables that we are dealing with. So, let us start this discussion off with a function of twovariables, z = f(x, y). From this point there are still many different possibilities that we can look at. Wewill be looking at two distinct cases prior to generalizing the whole idea out.

Case 1. z = f(x, y), x = g(t), y = h(t) and computedz

dt.

This case is analogous to the standard chain rule from single variable calculus that we looked at above. Inthis case we are going to compute an ordinary derivative since z really would be a function of t only if wesubstitute in for x and y. The chain rule for this case is

dz

dt=

∂f

∂x

dx

dt+

∂f

∂y

dy

dt.

So, basically what we have done here is differentiating f with respect to each variable in it and thenmultiplying each of these by the derivative of that variable with respect to t. The final step is to add themup together. Let us take a look at a couple of examples.

� Example 3.4.1 (Chain rule)

Computedz

dtfor each of the following.

59


(a) z = xexy, x = t2, y = t−1. (b) z = x2y3 + y cos x, x = ln t, y = sin(4t).

Solution

(a) We may directly apply the formula

dz

dt=

∂f

∂x

dx

dt+

∂f

∂y

dy

dt

= (exy + xyexy)(2t) + x2exy(−t−2)

= 2t(1 + xy) exy − t−2x2exy.

So, technically we have computed the derivative. However, we should probably go ahead and substitutein for x and y as well at this point since we have already got t’s in the derivative. Doing this gives

dz

dt= 2t(1 + t) et − t−2t4et = (2t + t2) et.

Note that in this case it might actually be easier to just substitute in for x and y in the originalfunction and just compute the derivative as we normally would. For comparison purpose let us dothat

z = t2et =⇒ dz

dt= 2tet + t2et.

The same result for less work. Note however, that often it will actually be more work to do thesubstitution first.

(b) In this case it would almost definitely be more work to do the substitution first so we will use thechain rule first and then substitute.

dz

dt= (2xy3 − y sin x)(

1

t) + (3x2y2 + cos x)(4 cos(4t))

=2 sin3(4t) ln t − sin(4t) sin(ln t)

t+ 4 cos(4t)

`3 sin2(4t) ln2 t + cos(ln t)

´.

Note that sometimes, because of the significant mess of the final answer, we will only simplify the firststep a little and leave the answer in terms of x and y, and t. This is dependent upon the situation,class and instructor however this kind of substitution work is not necessary in the examinations forthis class.

�

Now, there is a special case that we should take a quick look at before moving on to the next case. Let ussuppose that we have the following situation.

z = f(x, y) and y = g(x).

In this cae the chain rule fordz

dxbecomes

dz

dx=

∂f

∂x

dx

dx+

∂f

∂y

dy

dx=

∂f

∂x+

∂f

∂y

dy

dx.

In the first term we used the fact thatdx

dx=

d

dx(x) = 1.

Let us take a quick look at an example.

60

3.4 The Chain Rule


Computedz

dxfor

z = x ln(xy) + y3 and y = cos(x2 + 1).

Solution We just plug into the formula

dz

dx=

»ln(xy) + x · y

xy

–+

»x · x

xy+ 3y2

–`−2x sin(x2 + 1)´

= ln`x cos(x2 + 1)

´+ 1 − 2x sin(x2 + 1)

„x

cos(x2 + 1)+ 3 cos2(x2 + 1)

«= ln

`x cos(x2 + 1)

´+ 1 − 2x2 tan(x2 + 1) − 6x sin(x2 + 1) cos2(x2 + 1).

�

Let us take a look at the second case.

Case 2. z = f(x, y), x = g(s, t), y = h(s, t) and compute∂z

∂sand

∂z

∂t.

In this case if we substitute in for x and y we may find that z is a function of s and t and so it makes sensethat we will be computing partial derivatives here and that there will be two of them. Here is the chain rulefor both of these partial derivatives.

∂z

∂s=

∂f

∂x

∂x

∂s+

∂f

∂y

∂y

∂sand

∂z

∂t=

∂f

∂x

∂x

∂t+

∂f

∂y

∂y

∂t.

So, not surprisingly, these are very similar to the first case that we looked at. Here is a quick example ofthis kind of chain rule.


Find∂z

∂sand

∂z

∂tfor

z = e2r sin(3θ), r = st − t2, θ =p

s2 + t2.

Solution Here is the chain rule for∂z

∂s.

∂z

∂s=

`2e2r sin(3θ)

´(t) +

`3e2r cos(3θ)

´ s√s2 + t2

= 2t e2(st−t2) sin(3√

s2 + t2) +3se2(st−t2) cos(3

√s2 + t2)√

s2 + t2.

Now the chain rule for∂z

∂t.

∂z

∂t=

`2e2r sin(3θ)

´(s − 2t) +

`3e2r cos(3θ)

´ t√s2 + t2

= 2(s − 2t) e2(st−t2) sin(3√

s2 + t2) +3te2(st−t2) cos(3

√s2 + t2)√

s2 + t2.

�

61


We have seen a couple of cases for the chain rule let us see the general version of the chain rule.

Chain Rule Suppose that z is a function of n variables x1, x2, · · · , xn, and that each of thesevariables are in turn functions of m variables t1, t2, · · · , tm. Then for any variable ti (i = 1, 2, · · · , m),we have the following

∂z

∂ti=

∂z

∂x1

∂x1

∂ti+

∂z

∂x2

∂x2

∂ti+ · · · + ∂z

∂xn

∂xn

∂ti.

This is a bit troublesome. There is actually an easier way to construct all the chain rules that we havediscussed in the section or will look at in later examples. We can build up a tree diagram that will giveus the chain rule for any situation. To see how these work let us go back and take a look at the chain rulefor ∂z/∂s given that z = f(x, y), x = g(s, t), y = h(s, t). Of course we have already known the answerbut the following tree diagram is used as an illustration. For reference, here is the chain rule for this case,

∂z

∂s=

∂f

∂x

∂x

∂s+

∂f

∂y

∂y

∂s.

Here is the tree diagram for this case.

z

∂z∂x

∂z∂y

x y

∂x∂s

∂x∂t

∂y∂s

∂y∂t

s t s t

We start at the top with the function itself and the branch out from that point. The first set of branches isfor the variables in the function. From each of these endpoints we put down a further set of branches thatgives the variables that both x and y are a function of. We connect each letter with a line and each linerepresents a partial derivative as shown. Note that the letter in the numerator of the partial derivative isthe upper “node” of the tree and the letter in the denominator of the partial derivative is the lower “node”of the tree.

To use this to get the chain rule we start at the bottom and for each branch that ends with the variablewe want to take the derivative with respect to (s in this case) we move up the tree until we hit the topmultiplying the derivatives that we see along that set of branches. Once we have done this for each branchthat ends at s, we then add the results up to get the chain rule for that given situation.

Note that we do not usually put the derivatives in the tree. They are always an assumed part of thetree. Let us write down the chain rules for a couple of examples.


Use a tree diagram to write down the chain rule for the given derivatives.

62

3.4 The Chain Rule

(a)dw

dtfor w = f(x, y, z), x = g1(t), y = g2(t), z = g3(t).

(b)∂w

∂rfor w = f(x, y, z), x = g1(r, s, t), y = g2(r, s, t), z = g3(r, s, t).

Solution

(a) We first draw the tree diagram.

w

x y z

t t t

From this tree diagram we know that the chain rule is given by

dw

dt=

∂f

∂x

dx

dt+

∂f

∂y

dy

dt+

∂f

∂z

dz

dt

which is really just a natural extension to the two variable case that we saw before.

(b) Here is the tree diagram for this situation.

w

x y z

r s t r s t r s t

From this tree diagram we know that the chain rule is given by

∂w

∂r=

∂f

∂x

∂x

∂r+

∂f

∂y

∂y

∂r+

∂f

∂z

∂z

∂r.

�

So, provided we can construct the tree diagram, and it is not too difficult to write down the chain rule forany set up that we might run across. We have now seen how to take the first-order derivatives of these morecomplicated situation, but what about higher-order derivatives? How do we do these? It is probably easiestto see how to deal with these with an example.

63



Compute∂2z

∂θ2for z = f(x, y) if x = r cos θ and y = r sin θ.

Solution We will need the first-order derivative before we can even think about finding the second-orderderivative so let us get that. This situation falls into the second case that we looked at above so we do notneed a new tree diagram. The following is the first-order derivative.

∂f

∂θ=

∂f

∂x

∂x

∂θ+

∂f

∂y

∂y

∂θ

= −r sin θ∂f

∂x+ r cos θ

∂f

∂y.

Now, the second-order derivative is given by

∂2f

∂θ2=

∂

∂θ

„∂f

∂θ

«=

∂

∂θ

„−r sin θ

∂f

∂x+ r cos θ

∂f

∂y

«.

The issue here is to correctly deal with this derivative. Since the two first-order derivatives, ∂f∂x

and ∂f∂y

,are both functions of x and y which are in turn functions of r and θ both of these terms are products. So,using the product rule gives the following,

∂2f

∂θ2= −r cos θ

∂f

∂x− r sin θ

∂

∂θ

„∂f

∂x

«− r sin θ

∂f

∂y+ r cos θ

∂

∂θ

„∂f

∂y

«.

We now need to determine what ∂∂θ

`∂f∂x

´and ∂

∂θ

“∂f∂y

”will be. These are both chain rule problems again

since both of the derivatives are functions of x and y and we want to take the derivative with respect to θ.

∂

∂θ

„∂f

∂x

«= −r sin θ

∂

∂x

„∂f

∂x

«+ r cos θ

∂

∂y

„∂f

∂x

«

= −r sin θ∂2f

∂x2+ r cos θ

∂2f

∂y∂x,

∂

∂θ

„∂f

∂y

«= −r sin θ

∂

∂x

„∂f

∂y

«+ r cos θ

∂

∂y

„∂f

∂y

«

= −r sin θ∂2f

∂x∂y+ r cos θ

∂2f

∂y2.

The final step is to plug these back into the second-order derivative and do some simplifying.

∂2f

∂θ2= −r cos θ

∂f

∂x− r sin θ

„−r sin θ

∂2f

∂x2+ r cos θ

∂2f

∂y∂x

«

−r sin θ∂f

∂y+ r cos θ

„−r sin θ

∂2f

∂x∂y+ r cos θ

∂2f

∂y2

«

= −r cos θ∂f

∂x+ r2 sin2 θ

∂2f

∂x2− r2 sin θ cos θ

∂2f

∂y∂x

−r sin θ∂f

∂y− r2 sin θ cos θ

∂2f

∂x∂y+ r2 cos2 θ

∂2f

∂y2

= −r cos θ∂f

∂x− r sin θ

∂f

∂y+ r2 sin2 θ

∂2f

∂x2

−2r2 sin θ cos θ∂2f

∂y∂x+ r2 cos2 θ

∂2f

∂y2.

64

3.4 The Chain Rule

It is long and fairly messy but there it is. �

Implicit Differentiation The final topic in this section is a revisiting of implicit differentiation. Withthese forms of the chain rule implicit differentiation actually becomes a fairly simple process. We will startwith a function in the form F (x, y) = 0 (if it is not in this form simply move everything to one side of theequal sign to get it into this form) where y = y(x). In a single variable calculus course we were asked tocompute dy

dxand this was often a fairly messy process. Using the chain rule from this section however we

can get a nice simple formula for doing this. We will start by differentiating both sides with respect to x.This will mean using the chain rule on the left side and the right side will differentiate to zero. Here is theresult of that.

Fx + Fydy

dx= 0 =⇒ dy

dx= −Fx

Fy.

As shown, all we need to do next is solve for dydx

and we now have a very nice formula to use for implicitdifferentiation. Note as well that in order to simplify the formula we switched back to using the subscriptnotation for the derivatives. Let us check out a quick example.


Finddy

dxfor

x cos(3y) + x3y5 = 3x − exy.

Solution The first step is to get a zero on one side of the equal sign and that is easy enough to do.

x cos(3y) + x3y5 − 3x + exy = 0.

Now, the function on the left is F (x, y) in our formula so all we need to do is use the formula to find thederivative.

dy

dx= − cos(3y) + 3x2y5 − 3 + yexy

−3x sin(3y) + 5x3y4 + xexy.

�

We can also do something similar to handle the types of implicit differentiation problems involving partialderivatives like those we saw when we first introduced partial derivatives. In these cases we will start offwith a function in the form

F (x, y, z) = 0 and assume that z = f(x, y)

and we want to find ∂z∂x

and/or ∂z∂y

. Let us start by trying to find ∂z∂x

. We will differentiate both sideswith respect to x and we will need to remember that we are going to be treating y as a constant. Also, theleft side will require the chain rule. Here is the derivative.

∂F

∂x

∂x

∂x+

∂F

∂y

∂y

∂x+

∂F

∂z

∂z

∂x= 0.

Now, we have the following,∂x

∂x= 1 and

∂y

∂x= 0.

The first is because we are just differentiating x with respect to x and we know that is 1. The second isbecause we are treating y as a constant and so it will differentiate to zero.

Plugging these in and solving for ∂z∂x

gives

∂z

∂x= −Fx

Fz.

65


A similar argument can be used to show that

∂z

∂y= −Fy

Fz.

As with the one variable case we switched to the subscripting notation for derivatives to simplify the formulas.Let us take a quick look at an example of this.


Find∂z

∂xand

∂z

∂yfor

x2 sin(2y − 5z) = 1 + y cos(6xz).

Solution This is one of the functions discussed in Example 3.3.4 (page 52). You might go back and seethe difference between the two. First let us get everything on one side,

x2 sin(2y − 5z) − 1 − y cos(6xz) = 0.

Now, the function on the left is F (x, y, z) and so all we need to do is use the formulas developed above tofind the derivatives,

∂z

∂x= − 2x sin(2y − 5z) + 6yz sin(6xz)

−5x2 cos(2y − 5z) + 6xy sin(6xz),

∂z

∂y= − 2x2 cos(2y − 5z) − cos(6xz)

−5x2 cos(2y − 5z) + 6xy sin(6xz).

If you go back and compare these answers to those that we found the first time around you will notice thatthey might appear to be different. However, if you take into account the minus sign that sits in the front ofour answers here you will see that they are in fact the same. �

3.5 Directional Derivatives

So far we have only looked at the two partial derivatives fx(x, y) and fy(x, y). Recall that thesederivatives represent the rate of change of f as we vary x (holding y fixed) and as we vary y (holding xfixed) respectively. We now need to discuss how to find the rate of change of f if we allow both x and yto change simultaneously. The problem here is that there are many ways to allow both x and y to change.For instance one could be changing faster than the other and then there is also the issue of whether or noteach is increasing or decreasing. So, before we get into finding the rate of change we need to get a coupleof preliminary ideas first. The main idea that we need to look at is just how are we going to define thechanging of x and/or y.

Let us start off by supposing that we want the rate of change of f at a particular point, say (x0, y0).Let us also suppose that both x and y are increasing and that, in this case, x is increasing twice as fast asy is increasing. So, as y increases one unit of measure x will increase two units of measure.

To help us see how we are going to define this change let us suppose that a particle is sitting at (x0, y0)and the particle will move in the direction given by the changing x and y. Therefore, the particle will moveoff in a direction of increasing x and y and the x-coordinate of the point will increase twice as fast as they-coordinate. Now we are thinking of this changing x and y as a direction of movement we can get a way of

66


defining the change. We have known that vectors can be used to define a direction and so the particle, atthis point, can be said to be moving in the direction,

v = 〈2, 1〉.Since this vector can be used to define how a particle at a point is changing we can also use it to describehow x and/or y is changing at a point. For our example we will say that we want the rate of change of f inthe direction of v = 〈2, 1〉. In this way we will know that x is increasing twice as fast as y is. There is stilla small problem with this, however. There are many vectors that point in the same direction. For instance,all of following vectors point in the same direction as v = 〈2, 1〉,

v = 〈15,

1

10〉, v = 〈6, 3〉, v = 〈 2√

5,

1√5〉.

We need a way to consistently find the rate of change of a function in a given direction. We will do this byinsisting that the vector that defines the direction of change be a unit vector. Recall that a unit vector is avector with length, or magnitude, of 1. This means that for the example that we started off thinking aboutwe might want to use

v = 〈 2√5,

1√5〉,

since this is the unit vector that points in the direction of change.

For reference purposes recall that the magnitude or length of the vector v = 〈a, b, c〉 is given by

‖v‖ =p

a2 + b2 + c2.

For two-dimensional vectors we drop the c from the formula. Sometimes we will give the direction ofchanging x and y as an angle. For instance, we may say that we want the rate of change of f in the directionof θ = π/3. The unit vector that points in this direction is given by

u = 〈cos θ, sin θ〉.Now that we know how to define the direction of changing x and y it is time to start talking about findingthe rate of change of f in this direction. Let us first give the formal definition.

3.5.1 Definition The rate of change of f(x, y) in the direction of the unit vector u = 〈a, b〉 is called thedirectional derivative and is denoted by Duf(x, y). The definition of the directional derivative is

Duf(x, y) = limh→0

f(x + ah, y + bh) − f(x, y)

h.

So, the definition of the directional derivative is very similar to the definition of partial derivatives. However,in practice this can be a very difficult limit to compute so we need an easier way of taking directionalderivatives. It is actually fairly simple to derive an equivalent formula for taking directional derivatives.

To see how we can do this let us define a new function of a single variable,

g(z) = f(x0 + az, y0 + bz),

where x0, y0, a, and b are some fixed numbers. Note that this really is a function of a single variable nowsince z is the only letter that is not representing a fixed number. Then by the definition of the derivativefor functions of a single variable we have

g′(z) = limh→0

g(z + h) − g(z)

h

67


and the derivative at z = 0 is given by

g′(0) = limh→0

g(h) − g(0)

h.

If we now substitute in for g(z) we get

g′(0) = limh→0

g(h) − g(0)

h= lim

h→0

f(x0 + ah, y0 + bh) − f(x0, y0)

h= Duf(x0, y0). (3.1)

Now let us look at this from another perspective. Let us rewrite g(z) as follows,

g(z) = f(x, y), where x = x0 + az and y = y0 + bz.

We can now use the chain rule to compute

g′(z) =dg

dz=

∂f

∂x

dx

dz+

∂f

∂y

dy

dz= fx(x, y)a + fy(x, y) b.

So, from the chain rule we get the following relationship

g′(z) = fx(x, y)a + fy(x, y) b. (3.2)

If we now take z = 0 we will get that x = x0 and y = y0 (from how we defined x and y above) andplug these into (3.2) we get

g′(0) = fx(x0, y0)a + fy(x0, y0) b. (3.3)

Now, simply equate (3.1) and (3.3) to get that

Duf(x0, y0) = g′(0) = fx(x0, y0)a + fy(x0, y0) b.

If we now go back to allowing x and y to be any number we get the following formula for computingdirectional derivatives.

Duf(x, y) = fx(x, y)a + fy(x, y) b.

This is much simpler than the limit definition. Also note that this definition assumed that we were workingwith functions of two variables. There are similar formulas that can be derived by the same type of argumentfor functions with more than two variables. For instance, the directional derivative of f(x, y, z) in thedirection of the unit vector u = 〈a, b, c〉 is given by

Duf(x, y, z) = fx(x, y, z) a + fy(x, y, z) b + fz(x, y, z) c.

Let us work out a couple of examples.

� Example 3.5.1 (Directional derivative)

Find each of the directional derivatives.

(a) Duf(2, 0), where f(x, y) = xexy + y and u is the unit vector in the direction of θ =2π

3.

(b) Duf(x, y, z), where f(x, y, z) = x2z + y3z2 − xyz and u is the unit vector in the direction ofv = 〈−1, 0, 3〉.

Solution

68


(a) We will first find Duf(x, y) and then use this formula for finding Duf(2, 0). The unit vector givingthe direction is

u = 〈cos 2π

3, sin

2π

3〉 = 〈−1

2,

√3

2〉.

So, the directional derivative is

Duf(x, y) = −1

2(exy + xyexy) +

√3

2

`x2exy + 1

´,

Duf(2, 0) = −1

2(1) +

√3

2(5) =

5√

3 − 1

2.

(b) In this case let us first check to see if the direction vector is a unit vector or not and if it is not convertit into one. To do this all we need to do is compute its magnitude.

‖v‖ =√

1 + 0 + 9 =√

10.

So, it is not a unit vector. Recall that we can normalize it into the unit vector

u =1√10

〈−1, 0, 3〉 = 〈− 1√10

, 0,3√10

〉.

The directional derivative is then

Duf(x, y, z) = − 1√10

(2xz − yz) + 0`3y2z2 − xz

´+

3√10

`x2 + 2y3z − xy

´

=1√10

`3x2 + 6y3z − 3xy − 2xz + yz

´.

�

There is another form of the formula that we may use to get the directional derivative that is slightly betterand somewhat more compact. It is also a much more general formula that will encompass both of theformulas above.

Let us start with the second one and notice that we can rewrite it as follows.

Duf(x, y, z) = fx(x, y, z) a + fy(x, y, z) b + fz(x, y, z) c

= 〈fx, fy , fz〉 · 〈a, b, c〉.In other words we can write the directional derivative as a dot product and notice that the second vector isnothing more than the unit vector u that gives the direction of change. Also, if we like to use this versionfor functions of two variables the third component will not be there, but other than that the formula will bethe same.

Now let us give a name and notation to the first vector in the dot product since this vector will show upfairly regularly throughout this course. The gradient of f or gradient vector of f is defined to be

∇f = 〈fx, fy , fz〉 or ∇f = 〈fx, fy〉.Or, if we want to use the standard basis vectors the gradient is

∇f = fx i + fy j + fz k or ∇f = fx i + fy j.

The definition is only shown for functions of two or three variables, however there is a natural extension tofunctions of any number of variables that we would like.

69


With the definition of the gradient we can now say that the directional derivative is given by

Duf = ∇f · u,

where we will no longer show the variable and use this formula for any number of variables. Note as wellthat we will sometimes use the following notation

Duf(x) = ∇f(x) · u,

where x = 〈x, y, z〉 or x = 〈x, y〉 as needed. This notation will be used when we want to note the variablesin some way, but don’t really want to restrict ourselves to a particular number of variables. In other words,x will be used to represent as many variables as we need in the formula and we will most often use thisnotation when we are already using vectors or vector notation in the problem.

Let us work out a couple of examples using this formula of the directional derivative.

� Example 3.5.2 (Directional derivative)

Find each of the directional derivatives.

(a) Duf(2, 0) for f(x, y) = x cos y in the direction of v = 〈2, 1〉.

(b) Duf(x, y, z) for f(x, y, z) = sin(yz) + ln x2 at (1, 1, π) in the direction of v = 〈1, 1,−1〉.Solution

(a) Let us first compute the gradient for this function.

∇f = 〈cos y,−x sin y〉.Also, as we saw earlier in this section the unit vector for the direction of v is

u = 〈 2√5,

1√5〉.

The directional derivative is then

Duf(x) = 〈cos y,−x sin y〉 · 〈 2√5,

1√5〉 =

1√5

(2 cos y − x sin y) ,

Duf(2, 0) =1√5

(2 − 0) =2√5.

(b) In this case we are asking for the directional derivative at a particular point. To do this we will firstcompute the gradient, evaluate it at the point in question and then do the dot product. So, let us getthe gradient.

∇f(x, y, z) = 〈 2

x, z cos(yz), y cos(yz)〉,

∇f(1, 1, π) = 〈21, π cos π, cos π〉 = 〈2,−π,−1〉.

Next, we need the unit vector for the direction.

‖v‖ =√

3, u = 〈 1√3,

1√3,− 1√

3〉.

Finally, the directional derivative at the point is

Duf(1, 1, π) = 〈2,−π,−1〉 · 〈 1√3,

1√3,− 1√

3〉

=1√3

(2 − π + 1) =3 − π√

3.

70


�

Before proceeding let us note that the first-order partial derivatives that we were looking at in the majorityof the section can be thought of as special cases of the directional derivatives. For instance, fx can bethought of as the directional derivative of f in the direction of u = 〈1, 0〉 or u = 〈1, 0, 0〉, depending onthe number of variables that we are working with. The same can be done for fy and fz.

Gradient vectors. We will finish this section with a couple of nice facts about the gradient vector. Thefirst tells us how to determine the maximum rate of change of a function at a point and the direction thatwe need to move in order to achieve that maximum rate of change.

Theorem 3.5.1 The maximum value of Duf(x) ( and hence then the maximum rate of change ofthe function f(x) ) is given by ‖∇f(x)‖ and will occur in the direction given by ∇f(x).

This theorem provides a very useful interpretation for the gradient vector. Now we are going to discuss thedetail of this theorem and the reasoning behind.

For any point x and any unit vector u we have

Duf(x) = ∇f(x) · u = ‖∇f(x)‖ cos θ,

where θ is the angle between the vector u and ∇f(x). Since cos θ only takes on values between −1and 1, Duf(u) only takes on values between −‖∇f(x)‖ and ‖∇f(x)‖. Moreover,

Duf(x) = −‖∇f(x)‖ if and only if u points in the opposite direction to ∇f(x) (cos θ = −1),

Duf(x) = ‖∇f(x)‖ if and only if u points in the same direction as ∇f(x) (cos θ = 1).

The directional derivative is zero in the direction θ = π/2; this is the direction of the (tangent line to the)level curve of f through x.

We summarize these properties of the gradient as follows.

Theorem 3.5.2 (Geometric Properties of the Gradient Vector)

(a) At x, f(x) increases most rapidly in the direction of the gradient vector ∇f(x). The maximumrate of increase is ‖∇f(x)‖.

(b) At x, f(x) decreases most rapidly in the direction of −∇f(x). The maximum rate of decreaseis ‖∇f(x)‖.

(c) The rate of change of f(x) at x is zero in directions tangent to the level curve of f that passesthrough x.

As we remarked before, these properties hold in both two and three dimensions.

� Example 3.5.3 (Gradient vector)

Find the directions in which the function f(x, y) =x2

2+

y2

2

71


(a) Increases most rapidly at the point (1, 1).

(b) Decreases most rapidly at (1, 1).

(c) What are the directions of zero change in f at (1, 1).

Solution

(a) The function increases most rapidly in the direction of ∇f(x) at (1, 1). The gradient there is

∇f(1, 1) = 〈∂f

∂x,∂f

∂y〉˛(1,1)

= 〈x, y〉˛(1,1)

= 〈1, 1〉.

Its direction is

u =1√2〈1, 1〉 = 〈 1√

2,

1√2〉.

(b) The function decreases most rapidly in the direction of −∇f(x) at (1, 1), which is

−u = 〈− 1√2,− 1√

2〉.

(c) The directions of zero change at (1, 1) are the directions orthogonal to ∇f :

n = 〈− 1√2,

1√2〉 and − n = 〈 1√

2,− 1√

2〉.

�

� Example 3.5.4 (Gradient vector)

Suppose that the height of a hill above sea level is given by

z = 1000 − 0.01x2 − 0.02y2.

If you are at the point (60, 100) in what direction is the elevation changing faster? What is the maximumrate of change of the elevation at this point? Is the maximum rate of change of the elevation towards thecenter of the hill or away from it?

Solution First, you will hopefully know that the graph of the function is an elliptic paraboloid that opensdownward. So even though most hills are not this symmetrical it will at least be vaguely hill shaped and sothe question makes at least some sense.

To this problem there are a couple of questions to answer here, but using Theorem 3.5.2 makes answeringthem very simple. We will first need the gradient vector.

∇f(x) = ∇f(x, y) = 〈−0.02x,−0.04y〉.

The maximum rate of change of the elevation will then occur in the direction of

∇f(60, 100) = 〈−1.2,−4〉.

The maximum rate of change of elevation at this point is

‖∇f(60, 100)‖ =p

(−1.2)2 + (−4)2 =√

17.44 ≈ 4.176.

To answer the final part it might be convenient to have a quick sketch of the gradient at this point.

72


x

y

0 20 40 60 80 100

100

80

60

40

20

0

We have only shown a portion of the axis system here to make the picture easier to see. The center of thehill is at the origin and that is also the highest point on the hill. If we are standing at the point (60, 100)then the direction with the maximum rate of change of the elevation is given by ∇f(60, 100) = 〈−1.2,−4〉.This means that both x and y are decreasing (since they are negative) and y is decreasing faster than x.This is shown by the vector in the above sketch.

This also shows that the direction with maximum rate of change of the elevation is generally up the hill(and hence towards the center) rather than down the hill (away from the hill). �

The second fact about the gradient vector that we need to give before the end of this section will be veryconvenient in some later sections.

Let us consider the case of two variables for illustration. If a differentiable function f(x, y) has a constantvalue c along a smooth curve r = 〈g(t), h(t)〉 (making the curve a level curve of f), then f(g(t), h(t)) = c.Differentiating both sides of this equation with respect to t leads to the equations

d

dtf(g(t), h(t)) =

d

dt(c),

∂f

∂x

dg

dt+

∂f

∂y

dh

dt= 0,

〈∂f

∂x,∂f

∂y〉 · 〈dg

dt,dh

dt〉 = 0,

in which we might denote

∇f = 〈∂f

∂x,∂f

∂y〉 and

dr

dt= 〈dg

dt,dh

dt〉.

The last equation says that ∇f is normal to the tangent vector dr/dt, so it is normal to the curve.

We summarize the second fact about gradient vector as follows.

Theorem 3.5.3 (Gradient Vector Normal to Level Curve) The gradient vector ∇f(x0, y0) isorthogonal (or perpendicular) to the level curve f(x, y) = c at the point (x0, y0). Likewise, the gradientvector ∇f(x0, y0, z0) is orthogonal to the level surface f(x, y, z) = c at the point (x0, y0, z0).

73


As we are going to see in later sections we always like to know the vectors that are orthogonal to a surfaceor curve. Therefore, what we need to do is to compute a gradient vector and we will get the orthogonalvector we need. We will see the first application of this in the next section.

3.6 Applications of Partial Derivatives

In this section we will take a look at a couple of applications of partial derivatives. Most of the applicationsare simply extensions of what we have learnt about ordinary derivatives in single variable calculus. Forinstance, we will be looking at finding absolute and relative extrema of a function and we will also lookat optimization. Both of these subjects are major applications back in single variable calculus. They will,however, be a little more work here because we now have more than one variable.

• Tangent planes and linear approximations

Previously we saw (page 54) how the two partial derivatives fx and fy can be thought of as the slopesof traces. We want to extend this idea a little bit in this section. The graph of a function z = f(x, y) is asurface in R

3 (three-dimensional space) and so we can now start thinking of the plane that is “tangent” tothe surface as a point.

Let us start out with a point (x0, y0) and also let C1 represent the trace to f(x, y) for the planey = y0 (i.e., allowing x to vary with y held fixed) and we will let C2 represent the trace to f(x, y) forthe plane x = x0 (i.e., allowing y to vary with x held fixed). Now, we know that fx(x0, y0) is the slopeof the tangent line to the trace C1 and fy(x0, y0) is the slope of the tangent line to the trace C2. So,let L1 be the tangent line to the trace C1 and let L2 be the tangent line to the trace C2.

The tangent plane will then be the plane that contains the two lines L1 and L2. Geometrically thisplane will serve the same purpose that a tangent line did in single variable calculus. A tangent line to acurve was a line that just touched the curve at that point and was “parallel” to the curve at the point inquestion. Tangent planes to a surface are planes that just touch the surface at the point and are “parallel”to the surface at the point. Note that this gives us a point that is on the plane. Since the tangent plane andthe surface touch at (x0, y0) the following point will be on both the surface and the plane.

(x0, y0, z0) = (x0, y0, f(x0, y0)) .

What we need to do now is determine the equation of the tangent plane. We know that the general equationof a plane is given by

a(x − x0) + b(y − y0) + c(z − z0) = 0,

where (x0, y0, z0) is a point that is on the plane, which we know already. Let us rewrite this a little. Wewill move the x terms and y terms to the other side and divide both sides by c. Doing this gives

z − z0 = −a

c(x − x0) − b

c(y − y0).

Now, let us rename the constants to simplify up the notation a little. Let us rename them as follows.

A = −a

c, B = − b

c.

With this renaming the equation of the tangent plane becomes

z − z0 = A(x − x0) + B(y − y0)

and we need to determine values for A and B.

74


Let us first think about what happens if we hold y fixed, i.e., if we assume that y = y0. In this case theequation of the tangent plane becomes

z − z0 = A(x − x0).

This is the equation of a line and this line must be tangent to the surface at (x0, y0) (since it is part ofthe tangent plane). In addition, this line assumes that y = y0 (i.e., fixed) and A is the slope of this line.But if we think about it this is exactly that the tangent to C1 is a line tangent to the surface at (x0, y0)assuming that y = y0. In other words,

z − z0 = A(x − x0)

is the equation for L1 and we know that the slope of L1 is given by fx(x0, y0). Therefore we have thefollowing

A = fx(x0, y0).

If we hold x fixed at x = x0 the equation of the tangent plane becomes

z − z0 = B(y − y0).

However, by a similar argument to the one above we can see that this is nothing more than the equation forL2 and that its slope is B or fy(x0, y0). So,

B = fy(x0, y0).

The equation of the tangent plane to the surface given by z = f(x, y) at (x0, y0) is then

z − z0 = fx(x0, y0) (x − x0) + fy(x0, y0) (y − y0).

Also, if we use the fact that z0 = f(x0, y0) we can rewrite the equation of the tangent plane as

z − f(x0, y0) = fx(x0, y0) (x − x0) + fy(x0, y0) (y − y0),

z = f(x0, y0) + fx(x0, y0) (x − x0) + fy(x0, y0) (y − y0).

We will see another derivation of this formula (actually a more general formula) later on (page 76). So ifyou didn’t quite follow this argument hold off until then to see a better derivation.

� Example 3.6.1 (Tangent plane)

Find the equation of the tangent plane to z = ln(2x + y) at the point (−1, 3).

Solution There really is not too much to do here other than taking a couple of derivatives and doing somequick evaluations.

f(x, y) = ln(2x + y), z0 = f(−1, 3) = ln 1 = 0,

fx(x, y) =2

2x + y, fx(−1, 3) = 2,

fy(x, y) =1

2x + y, fy(−1, 3) = 1.

The equation of the plane is then

z − 0 = 2(x + 1) + (1)(y − 3),

z = 2x + y − 1.

75


�

One nice use of tangent planes is that they give us a way to approximate a surface near a point. As longas we are near to the point (x0, y0) then the tangent plane should nearly approximate the function at thatpoint. The tangent plane to the graph of z = f(x, y) at (x0, y0) is z = L(x, y), where

L(x, y) = f(x0, y0) + fx(x0, y0) (x − x0) + fy(x0, y0) (y − y0)

is the linear approximation of f at (x0, y0). We can use L(x, y) to approximate values of f(x, y)near (x0, y0):

f(x, y) ≈ L(x, y) = f(x0, y0) + fx(x0, y0) (x − x0) + fy(x0, y0) (y − y0).

� Example 3.6.2 (Linear approximation)

Find an approximate value for f(x, y) =√

2x2 + e2y at (2.2,−0.2).

Solution It is convenient to use the linear approximation at (x0, y0) = (2, 0), where the values of f andits partial derivatives are easily evaluated:

f(x, y) =√

2x2 + e2y, f(2, 0) = 3,

fx(x, y) =2x√

2x2 + e2y, fx(2, 0) =

4

3,

fy(x, y) =e2y

√2x2 + e2y

, fy(2, 0) =1

3.

Thus, L(x, y) = 3 +4

3(x − 2) +

1

3(y − 0), and

f(2.2,−0.2) ≈ L(2.2,−0.2) = 3 +4

3(2.2 − 2) +

1

3(−0.2 − 0) = 3.2.

(For the sake of comparison, f(2.2,−0.2) ≈ 3.2172 to 4 decimal places.) �

• Gradient vector, tangent planes and normal lines

In this subsection we want to revisit the derivation of tangent planes in the light of the gradient vector. Inthe process we will also take a look at a normal line to a surface.

Let us first recall the equation of a plane that contains the point (x0, y0, z0) with normal vectorn = 〈a, b, c〉 is given by

a(x − x0) + b(y − y0) + c(z − z0) = 0.

When we introduced the gradient vector in Section 3.5 (page 69) on directional derivatives we gave thefollowing fact in Theorem 3.5.3 (page 73).

Fact 1 The gradient vector ∇f(x0, y0, z0) is orthogonal to the level surface f(x, y, z) = c at the point(x0, y0, z0).

76


This says that the gradient vector is always orthogonal, or normal, to the surface at the point.

Also recall that the gradient vector is

∇f = 〈fx, fy , fz〉.So, the tangent plane to the surface given by f(x, y, z) = c at (x0, y0, z0) has the equation

fx(x0, y0, z0) (x − x0) + fy(x0, y0, z0) (y − y0) + fz(x0, y0, z0) (z − z0) = 0.

This is a much more general form of the equation of a tangent plane than the one that was derived previously(page 75).

Note however, that we can also get the equation from the previous subsection (page 75) using this moregeneral formula. To see this let us start with the equation z = f(x, y) and we want to find the tangentplane to the surface given by z = f(x, y) at the point (x0, y0, z0) where z0 = f(x0, y0). In order to usethe formula above we need to have all the variables on one side. This is easy enough to do. All we need todo is to subtract a z from both sides to get

f(x, y) − z = 0.

Now, if we define a new functionF (x, y, z) = f(x, y) − z,

we can see that the surface given by z = f(x, y) is identical to the surface given by F (x, y, z) = 0 andthis new equivalent equation is in the correct form for the equation of the tangent plane that we derived inthis subsection.

So, the first thing that we need to do is to find the gradient vector for F ,

∇F = 〈Fx, Fy, Fz〉 = 〈fx, fy ,−1〉.

Notice that

Fx =∂

∂x(f(x, y) − z) = fx,

Fy =∂

∂y(f(x, y) − z) = fy ,

Fz =∂

∂z(f(x, y) − z) = −1.

The equation of the tangent plane is then

fx(x0, y0) (x − x0) + fy(x0, y0) (y − y0) − (z − z0) = 0.

Solving for z givesz = f(x0, y0) + fx(x0, y0) (x − x0) + fy(x0, y0) (y − y0)

which is identical to the equation that we derived in the previous subsection (page 75).

We can get another nice piece of information out of the gradient vector as well. We might on occasionwant a line that is orthogonal to a surface at a point, sometimes called the normal line. This is easyenough to get if we recall that the equation of a line only requires that we have a point and a parallel vector.Since we want a line that is at the point (x0, y0, z0) we know that this point must also be on the line andwe know that ∇f(x0, y0, z0) is a vector that is normal to the surface and hence will be parallel to the line.Therefore the equation of the normal line is

r(t) = 〈x0, y0, z0〉 + t ∇f(x0, y0, z0).

77


� Example 3.6.3 (Tangent plane, normal line)

Find the tangent plane and normal line to x2 + y2 + z2 = 30 at the point (1,−2, 5).

Solution For this case the function that we are going to be working with is

F (x, y, z) = x2 + y2 + z2,

and note that we don’t have to have a zero on one side of the equal sign. All that we need is a constant. Tofinish this problem out we simply need the gradient evaluated at the point

∇F (x, y, z) = 〈2x, 2y, 2z〉, ∇F (1,−2, 5) = 〈2,−4, 10〉.

The tangent plane is then

2(x − 1) − 4(y + 2) + 10(z − 5) = 0.

The normal line is

r(t) = 〈1,−2, 5〉 + t 〈2,−4, 10〉 = 〈1 + 2t, −2 − 4t, 5 + 10t〉.�

• Relative minima and maxima

In this subsection we are going to extend one of the more important ideas from single variable calculus intofunctions of two variables. We wil see how we can find minima and maxima of functions in two variables.Recall that we often use the word extrema to refer to both minima and maxima. This in fact will be thetopic of the following two subsections as well (i.e., absolute maxima and Lagrange multipliers).

The definition of relative extrema for functions of two variables is identical to that for functions of onevariable we just need to remember now that we are working with functions of two variables. So, for the sakeof completeness here is the definition of relative minima and relative maxima for functions of two variables.

3.6.1 Definition (Relative Extrema)

1. A function f(x, y) has a relative minimum at the point (a, b) if f(x, y) � f(a, b) for allpoints (x, y) in some region around (a, b).

2. A function f(x, y) has a relative maximum at the point (a, b) if f(x, y) � f(a, b) for allpoints (x, y) in some region around (a, b).

Note that this definition does not say that a relative minimum is the smallest value that the function willever take. It only says that in some region around the point (a, b) the function will always be larger thanf(a, b). Outside of that region it is completely possible for the function to be smaller. Likewise, a relativemaximum only says that around (a, b) the function will always be smaller than f(a, b).Again, outside of that region it is completely possible that the function will be larger.

Next we need to extend the idea of critical values/points up to functions of two variables. Recall thata critical value of the function f(x) is a number x = c so that either f ′(c) = 0 or f ′(c) does notexist. If x = c is a critical value of f , then (c, f(c)) is said to be a critical point. We have a similardefinition for critical points of functions of two variables.

78


3.6.2 Definition The point (a, b) is a critical point ( or a stationary point ) of f(x, y) providedone of the following is true.

1. ∇f(a, b) = 0 ( this is equivalent to saying that fx(a, b) = 0 and fy(a, b) = 0 ).

2. fx(a, b) and/or fy(a, b) does not exist.

To see the equivalence in the first part let us start with ∇f = 0 and put in the definition of each part.

∇f(a, b) = 0,

〈fx(a, b), fy(a, b)〉 = 〈0, 0〉.

The only way that these two vectors can be equal is to have fx(a, b) = 0 and fy(a, b) = 0. In fact, we willuse this definition of the critical point more than the gradient definition since it will be easier to find thecritical points if we start with the first-order partial derivatives. Note as well that both of the first-orderpartial derivatives must be zero at (a, b). If only one of the first-order partial derivatives is zero at thepoint then the point will NOT be a critical point.

We now have the following fact that, at least partially, relates critical points to relative extrema.

Fact 2 If the point (a, b) is a relative extremum of the function f(x, y) then (a, b) is also a criticalpoint of f(x, y).

Note that this does NOT say that all critical points are relative extrema. It only says that relative extremawill be critical points of the function. To see this let us consider the function

f(x, y) = xy.

The two first-order partial derivatives are

fx(x, y) = y and fy(x, y) = x.

The only point that will make both of these derivatives zero at the same time is (0, 0) and so (0, 0) is acritical point for the function. Here is the graph of the function.

79


z

y

x

Note that the axes are not in the standard orientation here so that we can see more clearly what is happeningat the origin, i.e., at (0, 0). If we start at the origin and move into either of the quadrants where both xand y are the same sign the function increases. However, if we start at the origin and move into either ofthe quadrants where x and y have opposite signs then the function decreases. In other words, no matterwhat region you take about the origin there will be points larger than f(0, 0) = 0 and points smaller thanf(0, 0) = 0. Therefore, there is no way that (0, 0) can be a relative extremum. Critical points that exhibitthis kind of behavior are called saddle points.

While we have to be careful not to misinterpret the results of this fact. Because of this fact we knowthat if we have all the critical points of a function then we also have every possible relative extrema for thefunction. The fact tells us that all relative extrema must be critical points so we know that if the functiondoes have relative extrema then they must be in the collection of all the critical points. Remember however,that it will be completely possible that at least one of the critical points won’t be a relative extremum.

So, once we have all the critical points in hand all we need to do is test these points to see if they arerelative extrema or not. To determine if a critical point is a relative extremum (and in fact to determine ifit is a minimum or a maximum) we can use the following fact.

Fact 3 (Second Derivative Test) Suppose that (a, b) is a critical point of f(x, y) and that thesecond-order partial derivatives are continuous in some region that contains (a, b). Next define

D = D(a, b) = fxx(a, b) fyy(a, b) − [fxy(a, b)]2 .

We then have the following classifications of the critical point.

1. If D > 0 and fxx(a, b) > 0, then (a, b) is a relative minimum.

2. If D > 0 and fxx(a, b) < 0, then (a, b) is a relative maximum.

3. If D < 0, then (a, b) is a saddle point.

4. If D = 0, then (a, b) may be a relative minimum, relative maximum or a saddle point. Othertechniques would need to be used to classify the critical point.

80


Note that we are not going to be seeing any cases in this class where D = 0. We will be able to classify allthe critical points that we find.

Let us see a couple of examples.

� Example 3.6.4 (Critical points)

Find and classify all the critical points of

f(x, y) = 4 + x3 + y3 − 3xy.

Solution We first need all the first-order (to find the critical points) and second-order (to classify thecritical points) partial derivatives so let us get those.

fx = 3x2 − 3y, fy = 3y2 − 3x,

fxx = 6x, fyy = 6y, fxy = −3.

Let us first find the critical points. Critical points will be solutions to the system of equations(fx = 3x2 − 3y = 0,

fy = 3y2 − 3x = 0.

This is a nonlinear system of equations and these can (quite often) be difficult to solve. However, in thiscase it is not too bad. We can solve the first equation for y as follows

3x2 − 3y = 0 =⇒ y = x2.

Plugging this into the second equation gives

3(x2)2 − 3x = 3x(x3 − 1) = 0.

From this we can see that we must have x = 0 and x = 1. Now use the fact that y = x2 to get thecritical points.

x = 0 : y = 02 = 0 =⇒ (0, 0),

x = 1 : y = 12 = 1 =⇒ (1, 1).

So we get two critical points. All we need to do now is classify them. To do this we will need to know thesign of D. Here is the general formula for D.

D(x, y) = fxx(x, y) fyy(x, y) − [fxy(x, y)]2

= (6x)(6y) − (−3)2

= 36xy − 9.

To classify the critical points all that we need to do is plug in the critical points and use the fact above toclassify them.

For the critical point (0, 0):D = D(0, 0) = −9 < 0.

For (0, 0), D is negative and so this must be a saddle point.

For the critical point (1, 1):

D = D(1, 1) = 36 − 9 = 27 > 0, fxx(1, 1) = 6 > 0.

For (1, 1), D is positive and fxx is positive and so we must have a relative minimum. �

81



Find and classify all the critical points of

f(x, y) = 3x2y + y3 − 3x2 − 3y2 + 2.

Solution We first need all the first-order (to find the critical points) and second-order (to classify thecritical points) partial derivatives so let us get those.

fx = 6xy − 6x, fy = 3x2 + 3y2 − 6y,

fxx = 6y − 6, fyy = 6y − 6, fxy = 6x.

We will first need the critical points. The equations that we will need to solve this time are(6xy − 6x = 0,

3x2 + 3y2 − 6y = 0.

These equations are a little trickier to solve than the first set, but once you see what to do they really arenot too complicated. First, notice that we can factor out a 6x from the first equation to get

6x(y − 1) = 0.

So, we can see that the first equation will be zero if x = 0 or y = 1. Be careful to not just cancel x fromboth sides. If we really do this we would miss the case x = 0.

To find the critical points we can plug these (individually) into the second equation and solve for theremaining variable.

x = 0 : 3y2 − 6y = 3y(y − 2) = 0 =⇒ y = 0, y = 2,

y = 1 : 3x2 − 3 = 3(x2 − 1) = 0 =⇒ x = −1, x = 1.

So, if x = 0 we have the following critical points

(0, 0) and (0, 2)

and if y = 1 the critical points are

(1, 1) and (−1, 1).

Now all we need to do is classify the critical points. To do this we will need the general formula for D.

D(x, y) = (6y − 6)(6y − 6) − (6x)2 = (6y − 6)2 − 36x2.

To classify the critical points all that we need to do is to plug in the critical points and use the fact aboveto classify them.

For the critical point (0, 0): D = D(0, 0) = 36 > 0, fxx(0, 0) = −6 < 0. So (0, 0) is a relative maximum.

For the critical point (0, 2): D = D(0, 2) = 36 > 0, fxx(0, 2) = 6 > 0. So (0, 2) is a relative minimum.

For the critical point (1, 1): D = D(1, 1) = −36 < 0. So (1, 1) is a saddle point.

For the critical point (−1, 1): D = D(−1, 1) = −36 < 0. So (−1, 1) is a saddle point. �

Let us do one more example that is a little different from the first two.

82



Determine the point on the plane 4x − 2y + z = 1 that is closest to the point (−2,−1, 5).

Solution Note that we are NOT asking for the critical points of the plane. In order to do this examplewe are going to work out the equation that we are going to work with.

First let us suppose that (x, y, z) is any point on the plane. The distance between this point and thepoint in question, (−2,−1, 5), is given by the formula

d =p

(x + 2)2 + (y + 1)2 + (z − 5)2.

What are then asking is to find the minimum value of this distance function. The point (x, y, z) that givesthe minimum value of this equation will be the point on the plane that is closest to (−2,−1, 5).

There are a couple of issues with this function. First, it is a function of x, y and z and we can onlydeal with functions of x and y at this point. This is easy to fix however. We can solve the equation ofthe plane to see that

z = 1 − 4x + 2y.

Plugging this into the distance function gives

d =p

(x + 2)2 + (y + 1)2 + (1 − 4x + 2y − 5)2

=p

(x + 2)2 + (y + 1)2 + (−4 − 4x + 2y)2.

Now, the next issue is that there is a square root in this formula and we know that we are going to bedifferentiating this eventually. So, in order to make our argument a little easier let us notice that findingthe minimum value of d will be equivalent to finding the minimum value of d2. So, let us instead find theminimum value of

f(x, y) = d2 = (x + 2)2 + (y + 1)2 + (−4 − 4x + 2y)2.

Now, we need to be a little careful here. We are being asked to find the closest point on the plane to(−2,−1, 5) and that is not really the same thing as what we have been doing in this subsection. In thissubsection we have been finding and classifying critical points as relative minima or maxima and what weare really asking is to find the smallest value the function will take, or the absolute minimum. Hopefully, itdoes make sense from a physical standpoint that there will be a closest point on the plane to (−2,−1, 5).Also, this point should be a relative minimum.

So, let us go through the process from the first and second example and see what we get as far as relativeminima go. If we only get a single relative minimum then we will be done since that point will also need tobe the absolute minimum of the function and hence the point on the plane that is closest to (−2,−1, 5).We will need the derivatives first.

fx = 2(x + 2) + 2(−4)(−4 − 4x + 2y) = 36 + 34x − 16y,

fy = 2(y + 1) + 2(2)(−4 − 4x + 2y) = −14 − 16x + 10y,

fxx = 34,

fyy = 10,

fxy = −16.

Now, before we get into finding the critical point(s) let us compute D quickly.

D = (34)(10) − (−16)2 = 84 > 0.

83


So, in this case D will always be positive and also notice that fxx = 34 > 0 is always positive and so anycritical points that we get will be guaranteed to be relative minima. Now, let us find the critical point(s).This will mean solving the system (

36 + 34x − 16y = 0,

−14 − 16x + 10y = 0.

To do this we can solve the first equation for x.

x =1

34(16y − 36) =

1

17(8y − 18).

Now, plug this into the second equation and solve for y.

−14y − 16

17(8y − 18) + 10y = 0 =⇒ y = −25

21.

Backward substituting this into the equation for x gives x = −34/21. So we get a single critical point„−34

21, −25

21

«.

Also, since we know this will be a relative minimum and it is the only critical point we know that this is alsothe x and y coordinates of the point on the plane that we are looking for. We can find the z coordinateby plugging into the equation of the plane as follows

z = 1 − 4(−34

21) + 2(−25

21) =

107

21.

So, the point on the plane that is closest to (−2,−1, 5) is„−34

21,

25

21,

107

21

«.

�

• Absolute minima and maxima

Now we are going to extend the work from the previous subsection. In the previous subsection we wereasked to find and classify all critical points as relative minima, relative maxima and/or saddle points. Inthis subsection we want to optimize a function, that is identify the absolute minimum and/or the absolutemaximum of the function, on a given region in R

2. Note that when we say we are going to be working ona region in R

2 we mean that we are going to be looking at some region in the xy-plane.

In order to optimize a function in a region we are going to introduce a couple of definitions out of theway. Here are the definitions.

3.6.3 Definition

1. A region in R2 is called closed if it includes its boundary. A region is called open if it does

not include any of its boundary points.

2. A region in R2 is called bounded if it can be completely contained in a disk. In other words, a

region will be bounded if it is finite.

84


Let us think a little more about the definition of a closed region. We said a region is closed if it includes itsboundary. Just what does this mean? Let us think of a rectangle. Below are two definitions of a rectangle,one is closed and the other is open.

Open Closed

−5 < x < 3, −5 � x � 3,

1 < y < 6. 1 � y � 6.

In the first case we don’t allow the ranges to include the endpoints (i.e., we are not including the edges ofthe rectangle) and so we are not allowing the region to include any points on the edge of the rectangle. Inother words, we are not allowing the region to include its boundary and so it is open. In the second casewe are allowing the region to contain points on the edges and so will contain its entire boundary and hencewill be closed. This is an important idea because of the following fact.

Theorem 3.6.1 (Extreme Value Theorem) If f(x, y) is continuous in some closed, bounded set Din R

2 then there are points in D, (x1, y1) and (x2, y2) so that f(x1, y1) is the absolute maximumand f(x2, y2) is the absolute minimum of the function in D.

Note that this theorem does NOT tell us where the absolute minimum or absolute maximum will occur. Itonly tells us that they will exist. Note as well that the absolute minimum and/or absolute maximum mayoccur in the interior of the region or it may occur on the boundary of the region.

The basic process to find absolute maxima is pretty much identical to the process that we used in singlevariable calculus when we looked at finding absolute extrema of functions of a single variable. There willhowever, be some differences to account for the fact that we now are dealing with functions of two variables.Here is the process.

Theorem 3.6.2 (Finding Absolute Extrema)

1. Find all the critical points of the function that lie in the region D and determine the functionvalue at each of these points.

2. Find all extrema of the function on the boundary. This usually involves the single variable calculusapproach for this work.

3. The largest and smallest values found in the first two steps are the absolute maximum and theabsolute minimum of the function respectively.

The main difference between this process and the process that we used in single variable calculus is that the“boundary” in single variable calculus is just two points and so there is just little to do in the second step.For these problems the majority of the work is often in the second step as we will often end up doing anabsolute extrema problem in single variable calculus one or more times.

Let us take a look at a couple of examples.

85


� Example 3.6.7 (Absolute extrema)

Find the absolute minimum and absolute maximum of

f(x, y) = x2 + 4y2 − 2x2y + 4

on the rectangle given by −1 � x � 1 and −1 � y � 1.

Solution Let us first get a quick picture of the rectangle for reference purposes.

x

y

x = 1

y = 1

x = −1

y = −1

The boundary of this rectangle is given by the following conditions.

right side : x = 1, −1 � y � 1,

left side : x = −1, −1 � y � 1,

upper side : y = 1, −1 � x � 1,

lower side : y = −1, −1 � x � 1.

These will be important in the second step of our process. We will start this off by finding all the criticalpoints that lie inside the given rectangle. To do this we will need the two first-order derivatives

fx = 2x − 4xy, fy = 8y − 2x2.

Note that since we are not going to be classifying the critical points we don’t have to consider the second-orderderivatives. To find the critical points we will need to solve the system(

2x − 4xy = 0,

8y − 2x2 = 0.

We can solve the second equation for y to get

y =x2

4.

Plugging this into the first equation gives

2x − 4x(x2

4) = 2x − x3 = x(2 − x2) = 0.

86


This implies that we must have

x = 0 or x = ±√

2 ≈ ±1.414.

Now, recall that we only want critical points in the region that we are given. This means that we only wantcritical points for which −1 � x � 1. The only value of x that will satisfy this is the first one so we canignore the last two for this problem. Note however that a simple change to the boundary would includethese two so don’t forget to always check if the critical points are in the region (or on the boundary sincethat can also happen).

Put x = 0 into the equation for y gives

y =02

4= 0.

The single critical point, in the region (and again, that’s important), is (0, 0). We now need to get thevalue of the function at the critical point.

f(0, 0) = 4.

Eventually we will compare this to values of the function found in the next step and take the largest andsmallest as the absolute extrema of the function in the rectangle.

Now we have reached the long part of this problem. We need to find the absolute extrema of the functionalong the boundary of the rectangle. What this means is that we are going to look at what the function isdoing along each of the sides of the rectangle listed above.

Let us first take a look at the right side. As noted above the right side is defined by

x = 1, −1 � y � 1.

Notice that along the right side we know that x = 1. Let us take advantage of this by defining a newfunction as follows

g(y) = f(1, y) = (1)2 + 4y2 − 2(1)2y + 4 = 5 + 4y2 − 2y.

Now, finding the absolute extrema of f(x, y) along the right side will be equivalent to finding the absoluteextrema of g(y) in the range −1 � y � 1. Hopefully you can recall how to do this from single variablecalculus. We find the critical points of g(y) in the range −1 � y � 1 and then evaluate g(y) at thecritical points and the end points of the range y’s. Let us do that for this problem.

g′(y) = 8y − 2 =⇒ y =1

4.

This is in the range and so we will need the following function evaluations.

g(−1) = 11, g(1) = 7, g(1

4) =

19

4= 4.75.

Notice that, using the definition of g(y) these are also function values for f(x, y).

g(−1) = f(1,−1) = 11,

g(1) = f(1, 1) = 7,

g(1

4) = f(1,

1

4) =

19

4= 4.75.

We can now do the left side of the rectangle which is defined by

x = −1, −1 � y � 1.

87


Again, we will define a new function (it doesn’t matter we are still using the symbol g) as follows

g(y) = f(−1, y) = (−1)2 + 4y2 − 2(−1)2y + 4 = 5 + 4y2 − 2y.

However, notice that for this boundary, this is the same function as we looked at for the right side. Thiswill not always happen, but for this example let us take advantage of the fact that we have already done thework for this function. We know that the critical point is y = 1/4 and we know that the function value atthe critical point and the end points are

g(−1) = 11, g(1) = 7, g(1

4) =

19

4= 4.75.

The only real difference here is that these will correspond to value of f(x, y) at different points than forthe right side. In this case these will correspond to the following function values for f(x, y).

g(−1) = f(−1,−1) = 11,

g(1) = f(−1, 1) = 7,

g(1

4) = f(−1,

1

4) =

19

4= 4.75.

We can now look at the upper side defined by

y = 1, −1 � x � 1.

We will again define a new function except this time it will be a function of x.

h(x) = f(x, 1) = x2 + 4(1)2 − 2x2(1) + 4 = 8 − x2.

We need to find the absolute extrema of h(x) on the range −1 � x � 1. First find the critical points.

h′(x) = −2x =⇒ x = 0.

The value of this function at the critical point and the end points are

h(−1) = 7, h(1) = 7, h(0) = 8,

and the corresponding values for f(x, y) are

h(−1) = f(−1, 1) = 7,

h(1) = f(1, 1) = 7,

h(0) = f(0, 1) = 8.

Note that there are several “repeats” here. The first two function values have already been computed whenwe looked at the right and left sides. This will often happen.

Finally, we need to take care of the lower side. This side is defined by

y = −1, −1 � x � 1.

The new function we will define in this case is

h(x) = f(x,−1) = x2 + 4(−1)2 − 2x2(−1) + 4 = 8 + 3x2.

The critical point for this function is

h′(x) = −2x =⇒ x = 0.

88


The function values at the critical point and the end points are

h(−1) = 11, h(1) = 11, h(0) = 8,

and the corresponding values for f(x, y) are

h(−1) = f(−1,−1) = 11,

h(1) = f(1,−1) = 11,

h(0) = f(0,−1) = 8.

The final step to this long process is to collect up all the function values for f(x, y) that we havecomputed in this problem. Here they are

f(0, 0) = 4, f(1,−1) = 11, f(1, 1) = 7,

f(1,1

4) = 4.75, f(−1, 1) = 7, f(−1,−1) = 11,

f(−1,1

4) = 4.75, f(0, 1) = 8, f(0,−1) = 8.

The absolute minimum is at (0, 0) since it gives the smallest function value and the absolute maximumoccurs at (1,−1) and (−1,−1) since these two points give the largest value among all. �

As this example has shown these can be very long problems. Let us take a look at an easier problemwith a different kind of boundary.

� Example 3.6.8 (Absolute extrema)

Find the absolute minimum and absolute maximum of

f(x, y) = 2x2 − y2 + 6y

on the disk of radius 4, x2 + y2 � 16.

Solution First note that a disk of radius 4 is given by the inequality in the problem statement. The “lessthan” inequality is included to get the interior of the disk and the equal sign is included to get the boundary.Of course, this also means that the boundary of the disk is a circle of radius 4.

x

y

Circular boundary

x2 + y2 = 16

89


Let us find the critical points of the function that lie inside the disk. This will require the following twofirst-order partial derivatives.

fx = 4x, fy = −2y + 6.

To find the critical points we will need to solve the system(4x = 0,

−2y + 6 = 0.

This is actually a fairly simple system to solve however. The first equation tells us that x = 0 and thesecond tells us that y = 3. So the only critical point for this function is (0, 3) and this is inside the diskof radius 4. The function value at this critical point is

f(0, 3) = 9.

Now we need to look at the boundary. This one will be somewhat different from the previous example. Inthis case we don’t have fixed values of x and y on the boundary. Instead we have

x2 + y2 = 16.

We can solve this for x2 and plug this into the x2 in f(x, y) to get a function of y as follows

x2 = 16 − y2,

g(y) = 2(16 − y2) − y2 + 6y = 32 − 3y2 + 6y.

We will need to find the absolute extrema of this function on the range −4 � y � 4 (this is the range ofy’s for the disk). We will first need the critical points of this function.

g′(y) = −6y + 6 =⇒ y = 1.

The value of this function at the critical point and the end points are

g(−4) = −40, g(4) = 8, g(1) = 35.

Unlike the first example we will still need to find the values of x that correspond to these. We can do thisby plugging the value of y into our equation for the circle and solving for y.

y = −4 : x2 = 16 − 16 = 0 =⇒ x = 0,

y = 4 : x2 = 16 − 16 = 0 =⇒ x = 0,

y = 1 : x2 = 16 − 1 = 15 =⇒ x = ±√15.

The function values for g(y) then correspond to the following function values for f(x, y).

g(−4) = −40 =⇒ f(0,−4) = −40,

g(4) = 8 =⇒ f(0, 4) = 8,

g(1) = 35 =⇒ f(−√15, 1) = 35 and f(

√15, 1) = 35.

Note that the third one actually corresponds to two different values for f(x, y) since that y also producestwo different values of x. So, comparing these values to the value of the function at the critical point off(x, y) that we found earlier we can see that the absolute minimum occurs at (0,−4) while the absolutemaximum occurs twice at (−√

15, 1) and (√

15, 1). �

In these examples one of the absolute extrema actually occured at more than one place. Sometimes thiswill happen and sometimes it won’t, so don’t read too much into the fact that it happened in both examplesgiven here. Also note that, as we have seen, absolute extrema will often occur on the boundaries of theseregions, although they don’t have to occur at the boundaries. There are more complicated examples withmultiple critical points that the absolute extrema may occur interior to the region and not on the boundary.

90


• Lagrange multipliers

In the previous subsection we optimized (i.e., found the absolute extrema of) a function on a region thatcontains its boundary. Find potential optimal points in the interior of the region is not too difficult ingeneral, all that we need to do is to find the critical points and plug them into the function. However, aswe saw in Examples 3.6.7, 3.6.8 finding potential optimal critical points on the boundary is often a fairlylong and messy process.

Now we are going to take a look at another method (Lagrange multipliers) of optimizing a functionsubject to given constraint(s). The constraint(s) may be equation(s) that describe the boundary of a regionalthough in this subsection we won’t concentrate on those types of problems since this method just requiresa general constraint and doesn’t really care where the constraint came from.

So, let us get things set up. We want to optimize (find the minimum and maximum) a function,f(x, y, z), subject to the constraint g(x, y, z) = c. Again, the constraint may be the equation thatdescribes the boundary of a region or it may not be. The process is actually fairly simple, although the workcan still be a little overwhelming at times.

Theorem 3.6.3 (Method of Lagrange Multipliers)

1. Solve the following system of equations8<:

∇f(x, y, z) = λ ∇g(x, y, z),

g(x, y, z) = c.

2. Plug in all solutions, (x, y, z), from the first step into f(x, y, z) and identify the minimum andmaximum values, provided they exist.

The constant, λ, is called the Lagrange multiplier.

Notice that the system of equations actually has four equations, we just wrote the system in a simpler form.To see this let us take the first equation and put in the definition of the gradient vector to see what we get.

〈fx, fy , fz〉 = λ 〈gx, gy, gz〉= 〈λgx, λgy, λgz〉.

For these two vectors to be equal, the individual components must also be equal. So, we actually have threeequations here.

fx = λgx, fy = λgy, fz = λgz.

These three equations along with the constraint, g(x, y, z) = c, give four equations with four unknowns x,y, z, and λ.

Note as well that if we only have functions of two variables then we won’t have the third component ofthe gradient and so will only have three equations in three unknowns x, y, and λ.

Let us work a couple of examples.

91


� Example 3.6.9 (Lagrange multiplier)

Find the dimensions of the box with largest volume if the total surface area is 64 cm2.

Solution Before we start the process here note that we also have a way to solve this kind of problem insingle variable calculus, except in those problems we require a condition that relates one of the sides of thebox to the other sides so that we can get down to a volume and surface area function that only involve twovariables. We no longer need this condition for these problems.

Now, let us go on to solve the problem. We first need to identify the function that we are going tooptimize as well as the constraint. Let us set the length of the box to be x, the width of the box to be yand the height of the box to be z. We want to find the largest volume and so the function that we wantto optimize is given by

f(x, y, z) = xyz.

Next we know that the surface area of the box must be a constant 64. So this is the constraint. The surfacearea of a box is simply the sum of the areas of each of the sides so the constraint is given by

2xy + 2yz + 2xz = 64 =⇒ xy + yz + xz = 32.

Note that we divide the constraint by 2 to simplify the equation a little. Also, we get the function g(x, y, z)from this.

g(x, y, z) = xy + yz + xz.

Here are the four equations that we need to solve.

yz = λ(y + z) (fx = λgx) , (3.4)

xz = λ(x + z) (fy = λgy) , (3.5)

xy = λ(x + y) (fz = λgz) , (3.6)

xy + yz + xz = 32 (g(x, y, z) = 32) . (3.7)

Although the equations are nonlinear, there are many ways to solve this system. We will solve it in thefollowing way. Let us multiply equation (3.4) by x, equation (3.5) by y and equation (3.6) by z.

xyz = λx(y + z), (3.8)

xyz = λy(x + z), (3.9)

xyz = λz(x + y). (3.10)

Now notice that we can set equations (3.8) and (3.9) equal. Doing this gives

λx(y + z) = λy(x + z),

λ(xy + xz) − λ(xy + yz) = 0,

λ(xz − yz) = 0 =⇒ λ = 0 or xz = yz.

This implies two possibilities. The first, λ = 0, is not possible since if this is the case equation (3.4) willreduce to

yz = 0 =⇒ y = 0 or z = 0.

Since we are talking about the dimensions of a box neither of these are possible so we can discount λ = 0.This leaves the second possibility.

xz = yz.

Since we know that z = 0 (again since we are talking about the dimensions of a box) we can cancel the zfrom both sides. This gives

x = y. (3.11)

92


Next, let us set equations (3.9) and (3.10) equal. Doing this gives

λy(x + z) = λz(x + y),

λ(xy + yz − xz − yz) = 0,

λ(xy − xz) = 0 =⇒ λ = 0 or xy = xz.

As already discussed we know that λ = 0 won’t work and so this gives

xy = xz.

We can also say that x = 0 since we are dealing with the dimensions of a box so we have

y = z. (3.12)

Plugging equations (3.11) and (3.12) into equation (3.7) we get

y2 + y2 + y2 = 3y2 = 32, y = ±r

32

3≈ ±3.266.

However, we know that y must be positive since we are talking about the dimensions of a box. Thereforethe only solution that makes physical sense here is

x = y = z ≈ 3.266 cm.

This shows that we have a cube here.

We should be a little careful here. Since we have obtained only one solution we might be tempted toassume that this is the dimensions that will give the largest volume. The method of Lagrange multiplierswill give a set of points that will either maximize or minimize a given function subject to the constraint.However, when we get a single solution it may be either a maximum or a minimum. To verify that we indeedhave a maximum, as we want, all we need to do is to pick any other point that satisfies the constraint andcheck its volume against the volume of the point we got above. If the volume of the point above is largerthan the second point we will know that we indeed have a maximum.

To get the second point let us choose y = z = 2 plugging these into the constraint gives

2x + 2x + 4 = 32, x = 7.

Checking the volume at the two points gives

f(3.266, 3.266, 3.266) = 34.8376,

f(7, 2, 2) = 28.

So, it is certain that we did get a maximum value as expected. �

Notice that we never actually found values for λ in the above example. This is fairly standard for thesekind of problems. The value of λ is not really important to determining if the point is a maximum or aminimum so often we will not bother with finding a value for it. On occasion we will need its value to helpsolve the system, let us take a look at the next example for illustration.


Find the maximum and minimum off(x, y) = 5x − 3y

subject to the constraint x2 + y2 = 136.

93


Solution This one is going to be a little easier than the previous one since it only has two variables. Hereis the system that we need to solve.

5 = 2λx,

−3 = 2λy,

x2 + y2 = 136.

Notice that, as with the last example, we cannot have λ = 0 since that would not satisfy the first twoequations. So, since we know that λ = 0 we can solve the first two equations for x and y, respectively.This gives

x =5

2λ, y = − 3

2λ.

Plugging these into the constraint gives

25

4λ2+

9

4λ2=

17

2λ2= 136.

We can solve this for λ.

λ2 =1

16=⇒ λ = ±1

4.

Now, that we know λ we can find the points that will be potential maxima and/or minima.

If λ = −1

4, we get

x = −10, y = 6.

If λ =1

4, we get

x = 10, y = −6.

To determine if we have maxima or minima we just need to plug these into the function.

f(−10, 6) = −68, minimum at (−10, 6),

f(10,−6) = 68, maximum at (10,−6).

�

In the first two examples we have excluded λ = 0 either for physical reasons or because it would notsatisfy one or more of the equations. Do not always expect this to happen. Sometimes we will be able toautomatically exclude a value of λ and sometimes we won’t.

Let us take a look at another example.


Find the maximum and minimum off(x, y, z) = xyz

subject to the constraint x + y + z = 1. Assume that x, y, z � 0.

Solution Here is the system that we need to solve.

yz = λ, (3.13)

xz = λ, (3.14)

xy = λ, (3.15)

x + y + z = 1. (3.16)

94


Let us start this solution process off by noticing that since the first three equations all have λ they are allequal. So, let us start off by setting equations (3.13) and (3.14) equal.

yz = xz =⇒ z(y − x) = 0 =⇒ z = 0 or y = x.

So, we have two possibilities here. Let us consider the first possibility: z = 0. With this we can see fromeither equation (3.13) or (3.14) that we must have λ = 0. From equation (3.15) we see that this meansthat xy = 0. This in turn means that either x = 0 or y = 0.

So, we have two possible cases to deal with here. In each case two of the variables must be zero. Oncewe know this we can plug into the constraint, equation (3.16), to find the remaining value.

z = 0, and x = 0 =⇒ y = 1,

z = 0, and y = 0 =⇒ x = 1.

So, we get two possible solutions (0, 1, 0) and (1, 0, 0).

Now, let us go back and take a look at the other possibility, y = x. We also have two possible cases tolook at here as well.

The first case is x = y = 0. In this case we can see from the constraint that we must have z = 1 andso we now have a third solution (0, 0, 1).

The second case is x = y = 0. Let us set equations (3.14) and (3.15) equal.

xz = xy =⇒ x(z − y) = 0 =⇒ x = 0 or z = y.

Now, we have already assumed that x = 0 and so the only possibility is that z = y. However, this alsomeans that

x = y = z.

Using this in the constraint gives

3x = 1 =⇒ x =1

3.

So, the next solution is (1

3,

1

3,

1

3). We got four solutions by setting the first two equations equal.

To completely finish this problem out we should probably set equations (3.13) and (3.15) equal as wellas setting equations (3.14) and (3.15) equal to see what we get. Doing this gives

yz = xy =⇒ y(z − x) = 0 =⇒ y = 0 or z = x,

xz = xy =⇒ x(z − y) = 0 =⇒ x = 0 or z = y.

Both of these are very similar to the first situation that we looked at and we will leave it up to you to showthat in each of these cases we arrive back at the four solutions that we already found. So, we have foursolutions that we need to check in the function to see whether we have minima or maxima.

f(0, 0, 1) = 0, f(0, 1, 0) = 0, f(1, 0, 0) = 0 : all minima,

f(1

3,

1

3,

1

3) =

1

27: maximum.

So, in this case the maximum occurs only once while the minimum occurs three times.

Note as well that we never really used the assumption that x, y, z � 0 in this problem. This assumptionis here mostly to make sure that we really do have a maximum and a minimum of the function. Without

95


this assumption it would not be too difficult to find points that give both larger and smaller values for thefunction. For example,

x = −100, y = 100, z = 1 : −100 + 100 + 1 = 1, f(−100, 100, 1) = −10000,

x = −50, y = −50, z = 101 : −50 − 50 + 101 = 1, f(−50,−50, 101) = 252500.

With these examples you can clearly see that it is not too hard to find points that will give larger andsmaller function values. However, all of these examples required negative values of x, y, and/or z to makesure we satisfy the constraint. By eliminating these we can now say that we have found the minimum andmaximum values of the function. �

To this point we have only looked at constraints that were equations. We can also have constraints that areinequalities. The process for these types of problems is nearly identical to what we have been doing. Themain difference between the two types of problems is that we will also need to find all the critical pointsthat satisfy the inequality in the constraint and check these in the function when we check the values wefound using Lagrange multipliers. We are not going to give any examples of this type here and this will notappear in our examinations.

The final topic that we need to discuss here is what to do if we have more than one constraint. We willlook at two constraints, but we can naturally extend the work here to more than two constraints. We wantto optimize f(x, y, z) subject to the constraint g(x, y, z) = c and h(x, y, z) = k. The system that weneed to solve in this case is8>>>><

>>>>:

∇f(x, y, z) = λ ∇g(x, y, z) + μ ∇h(x, y, z),

g(x, y, z) = c,

h(x, y, z) = k.

So in this case we get two Lagrange multipliers (i.e., λ and μ). Also, note that the first equation reallyis three equations as we saw in the previous examples. Let us see an example of this kind of optimizationproblem.


Find the maximum and minimum of

f(x, y, z) = 4y − 2z

subject to the constraints 2x − y − z = 2 and x2 + y2 = 1.

Solution Here is the system that we need to solve.

0 = 2λ + 2μx (fx = λgx + μhx) , (3.17)

4 = −λ + 2μy (fy = λgy + μhy) , (3.18)

−2 = −λ (fz = λgz + μhz) , (3.19)

2x − y − z = 2, (3.20)

x2 + y2 = 1. (3.21)

96


First, let us notice that from equation (3.19) we get λ = 2. Plugging this into equation (3.17) andequation (3.18) and solving for x and y respectively gives

0 = 4 + 2μx =⇒ x = − 2

μ,

4 = −2 + 2μy =⇒ y =3

μ.

Now, plug these into equation (3.21).

4

μ2+

9

μ2=

13

μ2= 1 =⇒ μ = ±

√13.

So, we have two cases to look at here. First, let us see what we get when μ =√

13. In this case we have

x = − 2√13

and y =3√13

.

Plugging these into equation (3.20) gives

− 4√13

− 3√13

− z = 2 =⇒ z = −2 − 7√13

.

So, we have obtained one solution.

Let us now see what we get if we take μ = −√13. Here we have

x =2√13

and y = − 3√13

.

Plugging these into equation (3.20) gives

4√13

+3√13

− z = 2 =⇒ z = −2 +7√13

,

and this is the second solution.

Now all that we need to do is check the two solutions in the function to see which is the maximum andwhich is the minimum.

f(− 2√13

,3√13

, −2 − 7√13

) = 4 +26√13

≈ 11.2111,

f(2√13

, − 3√13

, −2 +7√13

) = 4 − 26√13

≈ −3.2111.

So, we have

a maximum at (− 2√13

,3√13

, −2 − 7√13

) and a minimum at (2√13

, − 3√13

, −2 +7√13

).

�

97


98

Chapter 4

Multiple Integrals

4.1 Double Integrals

Before starting on double integrals let us do a quick review of what is the definition of definite integrals forfunctions of one single variable. For these integrals we are integrating over the interval a � x � b,Z b

a

f(x) dx.

Now, when we are going to make the definition of the definite integral we first consider this as a classical“area problem”. What is the area problem? Historically, the idea of limits has been used to find areas ofdifferent shapes. Let us first consider a simple example: how the area of a circle can be found by a limitingprocess?

r height aboutsame as radius

width approximately = half circumference

We divide the circular disc into pieces of equal size and then rearrange the pieces as in the figure above.In the limit of increasing number of pieces (i.e., smaller and smaller divisions) the rearrangement will givesomething closer to a rectangle (imagine it) of which the area can easily be found. Thus the area of thecircle is the same as the area of the limiting rectangle, which is

height × width = r × πr = πr2.

However, in general, we cannot divide things up like we can with the circle. We need a general methodthat will work for a wider variety of shapes. For example, given a region in the plane, we need a consistentmethod that helps to find the enclosed area. Very often this refers to finding the area of the region boundedabove by the graph of a given positive function f(x), bounded below by the x-axis, bounded to the leftby the vertical line x = a, and to the right by the vertical line x = b. The answer to this problem camethrough a very nice idea. Indeed, this can be done by first dividing the region into strips (figure below) andapproximating the area of each strip by a small rectangle whose area we know how to find! Of course, thearea approximated by the rectangles is not quite the same as the area under the curve, but as the rectanglesget thinner (more rectangles at the same time), the total area of the rectangles gets closer to the real area.Again, this is a limiting process similar to the disc area problem.

99

4. Multiple Integrals

The following is a figure illustrating the above idea of dividing the region into (infinitely many) strips.

As mentioned before, we may regard integration as the limiting process of an algebraic summation. So itis not hard to imagine that definite integration plays the role in summing up the areas of rectangles in alimiting way. In the following we shall try to derive the mathematics of finding areas by definite integration.In the following you will see how the method works.

Mathematical formulation. Let f(x) be a continuous positive function defined on the closed interval[a, b]. We divide the interval [a, b] by any finite set of points, called a partition P of [a, b], i.e.,

a = x0 < x1 < x2 < · · · < xn−1 < xn = b.

Hence, [a, b] is divided into n subintervals: [a, x1], [x1, x2], [x2, x3], · · · , [xi−1, xi], · · · , [xn−1, b]. Thelength of each subinterval is

Δxi = xi − xi−1, i = 1, 2, · · · , n.

Denote the longest length of these subintervals by Δx = max1�i�n

{Δxi} and we call it the norm of the

partition P. On each of the subintervals [xi−1, xi] (i = 1, 2, · · · , n), we choose an arbitrary point x∗i such

that xi−1 � x∗i � xi. The choice for x∗

i will define a particular rectangle, with f(x∗i ) as its height and

Δxi as its width, on the subinterval [xi−1, xi]. The following figure summarizes the details of what wehave discussed.

x

y

0 a bxi−1

x∗i xi

y = f(x)

magnified parton right side

Δxi

xi−1 x∗i xi (x∗

i movable)

A Riemann sum of f for the partition P is defined as a finite algebraic sum of the form

Sn =nX

i=1

f(x∗i ) Δxi, where x∗

i ∈ [xi−1, xi] for i = 1, 2, · · · , n.

The sum Sn is nothing but just the sum of areas of the rectangles over [a, b]. If the sum has a limit as thenumber of subintervals increases without bound (while the longest length Δx goes to zero) and if the limitis independent of the choices x∗

i , then we call the limit the definite integral of f(x) over [a, b] and write it

as

Z b

a

f(x) dx. We make the conclusion in the following definition.

100


4.1.1 Definition Let f be a real-valued function defined on [a, b], P be a partition of [a, b]. If the limit

limn→∞

nPi=1

f(x∗i )Δxi exists independently of the mode of subdivision of [a, b] and x∗

i , then the limit is called

the definite integral of f(x) from x = a to x = b and is written as

limn→∞

nXi=1

f(x∗i )Δxi =

Z b

a

f(x) dx. (4.1)

The function f(x) is called the integrand. The numbers a and b are called the lower and upper limitsof integration, respectively and [a, b] is called the range of integration. The letter x is the variableof integration which is actually a “dummy” variable that can be replaced by any other letter. It can be

proved that the definite integral

Z b

a

f(x) dx exists if f is continuous on [a, b]. It also exists if f is piecewise

continuous.

Geometrically,

Z b

a

f(x) dx represents the area bounded by the curve y = f(x), the x-axis and the

vertical lines at x = a, x = b only if f(x) � 0 (i.e., the whole graph of f lies in the upper half plane).Otherwise, the definite integral represents the algebraic sum of the areas above and below the x-axis, treatingareas above the x-axis as positive and areas below the x-axis as negative. That is to say, in general, a definiteintegral gives the net area between the graph of f and the x-axis, i.e., the sum of the areas of the regionswhere y = f(x) is above the x-axis minus the sum of the areas of the regions where y = f(x) is belowthe x-axis.

Double integrals. In this section we want to integrate a function of two variables, f(x, y). For functionsof one variable we integrate over an interval (i.e., a one-dimensional space) and so it makes some sense thatwhen integrating a function of two variables we will integrate over a region in R

2 (i.e., a two-dimensionalspace).

We begin by assuming that the region in R2 is a rectangle which we will denote as follows

D = [a, b] × [c, d].

This means that the ranges for x and y are a � x � b and c � y � d. Also we will initially assume thatf(x, y) � 0 although this does not really have to be the case. Let us start out with the graph of the surfaceS given by graphing z = f(x, y) over the rectangle D.

x

y

z

a

b

cd

Rectangular Region D

Surface S

��

��

101


Now, just like with functions of one variable, let us first ask what the volume of the region under the surfaceS (and above the xy-plane of course) is.

We will first approximate the volume much as we approximated the area before. We will first divide upa � x � b into n subintervals and divide up c � y � d into m subintervals. This will divide up theregion D into a series of smaller rectangles and from each of these we will choose a point (x∗

i , y∗j ). Here

is a sketch of this set up.

Region D

x

y

d = ym

yj

c = y0

a = x0 x1 xi xn−1 b = xn

(x∗i , y∗

j )

•

Now, over each of these smaller rectangles we will construct a box whose height is given by f(x∗i , y∗

j ). Hereis a sketch of that.

x

y

z

Surface S

Region D (“projection”)

Each of the rectangles has a base area of ΔA and a height of f(x∗i , y

∗j ) so the volume of each of these

boxes is f(x∗i , y∗

j ) ΔA. The volume under the surface S is then approximately

V ≈nX

i=1

mXj=1

f(x∗i , y∗

j )ΔA.

We will have a double sum since we will need to add up volumes in both the x and y directions.

102


To get a better estimation of the volume we will take n and m go to infinity. In other words,

V = limn,m→∞

nXi=1

mXj=1

f(x∗i , y

∗j ) ΔA.

Now, this looks familiar. This looks like the definition of the integral of a function of single variable. In factthis is also the definition of a double integral, or more exactly an integral of a function of two variables overa rectangle.

Here is the formal definition of a double integral of a function of two variables over a rectangular regionD as well as the notation that we will use for it.

ZZD

f(x, y) dA = limn,m→∞

nXi=1

mXj=1

f(x∗i , y

∗j ) ΔA.

Note the similarities and differences in the notation to single integrals. We have two integrals (i.e.,RR

) todenote the fact that we are dealing with a two-dimensional region and we have a differential here as well.Note that the differential is dA instead of the dx and dy that we are used to seeing. Note as well that wedo not have limits on the integrals in this notation. Instead we have the D written below the two integralsto denote the region that we are integrating over.

Note that one interpretation of the double integral of f(x, y) over the rectangle D is the volume underthe function f(x, y) (and above the xy-plane), or

volume =

ZZD

f(x, y) dA.

We can use this double sum in the definition to estimate the value of a double integral if we need to. Wecan do this by choosing (x∗

i , y∗j ) to be the midpoint of each rectangle. When we do this we usually denote

the point as (xi, yj). This leads to the midpoint ruleZZD

f(x, y) dA ≈nX

i=1

mXj=1

f(xi, yj) ΔA.

In the next section we start looking at how to actually compute double integrals.

• Iterated integrals

We gave the definition of the double integral as a double sum (page 103). However, just like single integral,the definition is very difficult to use in practice and so we need to start looking into how we actually computedouble integrals. We still assume that we are integrating over the rectangle

D = [a, b] × [c, d].

We will look at more non-rectangular regions in the next section.

The following theorem tells us how to compute a double integral over a rectangle.

103


Theorem 4.1.1 (Fubini’s Theorem) If f(x, y) is continuous on D = [a, b] × [c, d], then

ZZD

f(x, y) dA =

Z b

a

Z d

c

f(x, y) dy dx =

Z d

c

Z b

a

f(x, y) dx dy.

These integrals are called iterated integrals.

Note that there are in fact two ways to compute a double integral and also notice that the inner differentialmatches up with the limits on the inner integral and similarly for the outer differential and limits. In otherwords, if the inner differential is dy then the limits on the inner integral must be y limits of integrationand if the outer differential is dy then the limits on the outer integral must be y limits of integration.

Now, on some level this is just notation and does not really tell us how to compute the double integral.Let us just take the first possibility above and change the notation a little.ZZ

D

f(x, y) dA =

Z b

a

»Z d

c

f(x, y) dy

–dx.

We will compute the double integral by first computing the inner integralZ d

c

f(x, y) dy

and we compute this by holding x constant and integrating with respect to y as if this is a single integral.The output of this integral is therefore a function involving only x’s which we can in turn integrate.

We have done a similar process with partial derivatives. To take the derivative of a function with respectto y we treated the x as constants and differentiated with respect to y as if it was a function of a singlevariable. Double integrals work in the same manner. We think of all the x’s as constants and integratewith respect to y or we think of all y’s constants and integrate with respect to x, depending on which of theiterated integrals is considered.

Let us take a look at some examples.

� Example 4.1.1 (Double integrals)

Compute each of the following double integrals over the indicated rectangles.

(a)

ZZD

6xy2 dA, D = [2, 4] × [1, 2].

(b)

ZZD

(2x − 4y3) dA, D = [−5, 4] × [0, 3].

(c)

ZZD

ˆx2y2 + cos(πx) + sin(πy)

˜dA, D = [−2,−1] × [0, 1].

(d)

ZZD

1

(2x + 3y)2dA, D = [0, 1] × [1, 2].

(e)

ZZD

xexy dA, D = [−1, 2] × [0, 1].

104


Solution

(a) It does not matter which variable we integrate with respect to first, we will get the same answerregardless of the order of integration. To justify that let us work this one with each order to makesure that we do get the same answer.

Solution 1. In this case we will integrate with respect to y first. So, the iterated integral that weneed to compute is ZZ

D

6xy2 dA =

Z 4

2

Z 2

1

6xy2 dy dx.

When setting these up make sure the limits match up to the differentials. Since the dy is the innerdifferential (i.e. we are integrating with respect to y first) the inner integral needs to have y limits.To compute this we will do the inner integral first and we typically keep the outer integral around asfollows ZZ

D

6xy2 dA =

Z 4

2

h2xy3

i21dx =

Z 4

2

(16x − 2x) dx =

Z 4

2

14x dx.

Remember that we treat the x as a constant when doing the first integral and we don’t do anyintegration with it yet. Now, we have a normal single integral so let us finish the integral by computingthis ZZ

D

6xy2 dA =h7x2i42

= 84.

Solution 2. In this case we will integrate with respect to x first and then y. Here is the computationfor the solution.ZZ

D

6xy2 dA =

Z 2

1

Z 4

2

6xy2 dx dy =

Z 2

1

h3x2y2

i42dy =

Z 2

1

36y2 dy =h12y3

i21

= 84.

Sure enough the same answer as the first solution. So, remember that we can do the integration inany order.

(b) For this integral we will integrate with respect to y first.ZZD

(2x − 4y3) dA =

Z 4

−5

Z 3

0

(2x − 4y3) dy dx =

Z 4

−5

h2xy − y4

i30dx

=

Z 4

−5

(6x − 81) dx =h3x2 − 81x

i4−5

= −756.

Remember that when integrating with respect to y all x’s are treated as constants and so as far asthe inner integral is concerned the 2x is a constant and we know that when we integrate constantswith respect to y we just tack on a y and so we get 2xy from the first term.

(c) In this case we will integrate with respect to x first.ZZD


˜dA =

Z 1

0

Z −1

−2


˜dx dy

=

Z 1

0

»1

3x3y2 +

1

πsin(πx) + x sin(πy)

–−1

−2

dy

=

Z 1

0

»7

3y2 + sin(πy)

–dy =

»7

9y3 − 1

πcos(πy)

–10

=7

9+

2

π.

105


(d) In this case because the limits for x are kind of nice (i.e., they are zero and one which are often nicefor evaluation) let us integrate with respect to x first. We will also rewrite the integrand to help withthe first integration. ZZ

D

1

(2x + 3y)2dA =

Z 2

1

Z 1

0

(2x + 3y)−2 dx dy

=

Z 2

1

»−1

2(2x + 3y)−1

–10

dy

= −1

2

Z 2

1

»1

2 + 3y− 1

3y

–dy

= −1

2

»1

3ln |2 + 3y| − 1

3ln |y|

–21

= −1

6(ln 8 − ln 2 − ln 5) .

(e) Now, while we can technically integrate with respect to either variable first sometimes one way issignificantly easier than the other way. In this case it will be significantly easier to integrate withrespect to y first as we will see. ZZ

D

xexy dA =

Z 2

−1

Z 1

0

xexy dy dx.

The y integration can be done with the quick substitution

u = xy, du = x dy

which gives ZZD

xexy dy dx =

Z 2

−1

hexyi10dx =

Z 2

−1

[ex − 1] dx

=hex − x

i2−1

= e2 − 2 − (e−1 + 1) = e2 − e−1 − 3.

So, not too bad of an integral there provided you get the substitution. Now let us see what wouldhappen if we had integrand with respect to x first.ZZ

D

xexy dA =

Z 1

0

Z 2

−1

xexy dx dy.

In order to do this we would have to use integration by parts. We are not even going to continuehere as the outcome is indeed troublesome to do. (However, students are suggested to work out thisfor review purposes as integration by parts is a very important technique in single variable integralcalculus. Do not forget the basic integration techniques.) �

As we saw in the previous set of examples we can do the integral in either direction. However, sometimesone direction of integration is significantly easier than the other so make sure that you think about whichone you should do first before actually doing the integral. The next topic of this section is a quick fact thatcan be used to make some iterated integrals somewhat easier to compute on occasion.

106


Theorem 4.1.2 If f(x, y) = g(x)h(y) and we are integrating over the rectangle D = [a, b] × [c, d], then

ZZD

f(x, y) dA =

ZZD

g(x)h(y) dA =

„Z b

a

g(x)dx

«„Z d

c

h(y) dy

«.

So we can break up the function into a function only of x times a function of y then we can do two integralsindividually and multiply them together. Let us do a quick example using this integral.

� Example 4.1.2 (Integral as a product)

Evaluate ZZD

x cos2 y dA, D = [−2, 3] × [0,π

2].

Solution Since the integrand is a function of x times a function of y we can use the fact.ZZD

x cos2 y dA =

„Z 3

−2

x dx

« Z π/2

0

cos2 y dy

!

=

„Z 3

−2

x dx

« Z π/2

0

1 + cos(2y)

2dy

!

=

»1

2x2

–3−2

·»1

2y +

1

4sin(2y)

–π/2

0

=5

2· π

4=

5π

8.

�

We have one more topic to discuss in this section. This topic really doesn’t have anything to do with iteratedintegrals, but this is as good a place as any to put it and there are liable to be some questions about it atthis point as well so this is as good a place as any.

What we want to do is discuss single definite integrals of a function of two variables. In other words wewant to look at integrals like the following.Z

(x sec2 y + 4xy) dy,

Z(x3 − e−x/y) dx.

From single variable calculus we know that these integrals are asking what function that we differentiate toget the integrand. However, in this case we need to pay attention to the differential (dy or dx) in theintegral, because that will change things a little. In the case of the first integral we are asking what functionwe differentiate with respect to y to get the integrand whereas in the second integral we are asking whatfunction differentiate with respect to x to get the integrand. For the most part answering these questions isnot that difficult. The important issue is how we deal with the constant of integration.

Here are the integrals. Z(x sec2 y + 4xy) dy = x tan y + 2xy2 + g(x),Z

(x3 − e−x/y) dx =1

4x4 + ye−x/y + h(y).

Notice that the “constants” of integration are now functions of the opposite variable. If we are differentiatingthe first integral with respect to y and we know that any function involving only x’s will differentiate to zeroand so when integrating with respect to y we need to acknowledge that there may have been a function ofonly x’s in the primitive function and so the “constant” of integration is in general a function of x. Likewise,

107


in the second integral, the “constant” of integration must be a function of y since we are integrating withrespect to x. Again, remember if we differentiate the answer with respect to x then any function of only y’swill differentiate to zero.

4.2 Double Integrals Over Non-rectangular Regions

In the previous section we looked at double integrals over rectangular regions. The problem with this is thatpractically most of the regions are not rectangular so we need to now look at the following double integralZZ

R

f(x, y) dA,

where R is a general region.

There are two types of regions that we need to look at. Here is a sketch of both of them.

y

xa b

y = g2(x)

y = g1(x)

Region R

y

x

c

d

x = h2(y)

x = h1(y)

Type 1 Type 2

We will often use set builder notation to describe these regions. Here is the definition for the region in Type 1:

R = {(x, y) : a � x � b, g1(x) � y � g2(x)},and here is the definition for the region in Type 2:

R = {(x, y) : h1(y) � x � h2(y), c � y � d}.This notation is really just a fancy way of saying we are going to use all the points (x, y) in which both ofthe coordinates satisfy the two given inequalities. The double integral for both of these cases are defined interms of iterated integrals as follows.

In Type 1 where R = {(x, y) : a � x � b, g1(x) � y � g2(x)} the integral is defined to beZZR

f(x, y) dA =

Z b

a

Z g2(x)

g1(x)

f(x, y) dy dx.

In Type 2 where R = {(x, y) : h1(y) � x � h2(y), c � y � d} the integral is defined to beZZR

f(x, y) dA =

Z d

c

Z h2(y)

h1(y)

f(x, y) dx dy.

108


The following are some properties of the double integral that we should go over before we actually do someexamples. Note that all three of these properties are really just extensions of properties of single integralsthat have been extended to double integrals.

Properties.

(1)

ZZR

[f(x, y) + g(x, y)]dA =

ZZR

f(x, y) dA +

ZZR

g(x, y) dA.

(2)

ZZR

cf(x, y) dA = c

ZZR

f(x, y) dA, where c is any constant.

(3) If the region R can be split into two separate regions R1 and R2 then the integral can bewritten as ZZ

R

f(x, y) dA =

ZZR1

f(x, y) dA +

ZZR2

f(x, y) dA.

Let us take a look at some examples of double integrals over non-rectangular regions.

� Example 4.2.1 (Double integrals over non-rectangular regions)

Evaluate each of the following double integrals over the given region R.

(a)

ZZR

ex/y dA, R = {(x, y) : 1 � y � 2, y � x � y3}.

(b)

ZZR

(4xy − y3) dA, R is the region bounded by y =√

x and y = x3.

(c)

ZZR

(6x2 − 40y) dA, R is the triangle with vertices (0, 3), (1, 1), and (5, 3).

Solution

(a) Let us do this one by just using the formula.

ZZR

ex/y dA =

Z 2

1

Z y3

y

ex/y dx dy =

Z 2

1

hyex/y

iy3

ydy

=

Z 2

1

hyey2 − ye1

idy =

»1

2ey2 − e

2y2

–21

=1

2e4 − 2e.

(b) In this case we need to determine the two inequalities for x and y that we need to evaluate the integral.The best way to do this is first sketch the graphs of the two curves. Here is the sketch.

109


y

x0 1

y =√

x

y = x3

Region R

So, from the sketch we can see that inequalities are

0 � x � 1 and x3 � y �√

x.

We can now do the integralZZR

(4xy − y3) dA =

Z 1

0

Z √x

x3(4xy − y3) dy dx

=

Z 1

0

»2xy2 − 1

4y4

–√x

x3dx

=

Z 1

0

»7

4x2 − 2x7 +

1

4x12

–dx

=

»7

12x3 − 1

4x8 +

1

52x13

–10

=55

156.

(c) We have even less information about the region for this case. Let us start this off by sketching thetriangle and getting equations for each side of the triangle.

y

x

(5, 3)(0, 3)

(1, 1)

y = 3

y = −2x + 3y =

1

2x +

1

2

y

x

(5, 3)(0, 3)

(1, 1)

x = −1

2y +

3

2 x = 2y − 1

Now, there are two ways to describe this region (i.e., Type 1 and Type 2, page 108). If we usefunctions of x, as shown in the figure (left) we will have to break the region up into two differentpieces since the lower function is different depending upon the value of x. In this case the region willbe given by R = R1 ∪ R2, where

R1 = {(x, y) : 0 � x � 1, −2x + 3 � y � 3},R2 = {(x, y) : 1 � x � 5,

1

2x +

1

2� y � 3}.

110


Note that the ∪ is the “union” notation and it just means that R is the region given by combiningthe two regions. If we do this then we will need to do two separate integrals, one for each of theregions. To avoid this we could turn things around and solve the two equations for x to get.

y = −2x + 3 =⇒ x = −1

2y +

3

2,

y =1

2x +

1

2=⇒ x = 2y − 1.

If we do this we can notice that the same function is always on the right and same function is alwayson the left, as shown in the figure above (right). The region is

R = {(x, y) : −1

2y +

3

2� x � 2y − 1, 1 � y � 3}.

Writing the region in this form means doing a single integral instead of the two integrals. Either waysshould give the same answer and let us compare the two ways in the following.

Solution 1.ZZR

(6x2 − 40y) dA =

ZZR1

(6x2 − 40y) dA +

ZZR2

(6x2 − 40y) dA

=

Z 1

0

Z 3

−2x+3

(6x2 − 40y) dy dx +

Z 5

1

Z 3

12 x+ 1

2

(6x2 − 40y) dy dx

=

Z 1

0

h6x2y − 20y2

i3−2x+3

dx +

Z 5

1

h6x2y − 20y2

i312 x+ 1

2

dx

=

Z 1

0

ˆ12x3 + 80x2 − 240x

˜dx

+

Z 5

1

ˆ−3x3 + 20x2 + 10x − 175˜dx

=

»3x4 +

80

3x3 − 120x2

–10

+

»−3

4x4 +

20

3x3 + 5x2 − 175x

–51

=

„3 +

80

3− 120

«

+

»„−1875

4+

2500

3+ 125 − 875

«−„−3

4+

20

3+ 5 − 175

«–= −935

3.

The above is a lot of work. Let us see the second way of evaluating the same integral.

Solution 2.

This way will be a lot less work since we are only going to do one single integral.ZZR

(6x2 − 40y) dA =

Z 3

1

Z 2y−1

− 12 y+ 3

2

(6x2 − 40y) dx dy

=

Z 3

1

h2x3 − 40xy

i2y−1

− 12 y+ 3

2

dy

=

Z 3

1

»65

4y3 − 505

4y2 +

475

4y − 35

4

–dy

=

»65

16y4 − 505

12y3 +

475

8y2 − 35

4y

–31

= −935

3.

111


So, the numbers are a little messier, but other than that there is much less work for the same result.�

As the part (c) of Example 4.2.1 has shown we can integrate these integrals in either order (i.e., x followedby y or y followed by x), although often one order will be easier than the other. In fact there will be timeswhen it will not even be possible to do the integral in one order while it will be possible to do the integralin the other order.

Let us see a couple examples of these kinds of integrals.

� Example 4.2.2 (Change the order of a double integral)

Evaluate the following integrals by first reversing the order of integration.

(a)

Z 3

0

Z 9

x2x3ey3

dy dx. (b)

Z 8

0

Z 2

3√y

px4 + 1 dx dy.

Solution

(a) First, notice that if we try to integrate with respect to y we cannot do the integral because we wouldneed a y2 in front of the exponential in order to do the y integration. Instead, we hope that if wereverse the order of integration we will get an integral that we can do.

Now, when we say that we are going to reverse the order of integration this means that we want tointegrate with respect to x first and then y. Note as well that we cannot just interchange the integrals,keeping the original limits, and be done with it. This would not fix our original problem and in orderto integrate with respect to x we cannot have x’s in the limits of the integrals. Even if we ignoredthat the answer would not be a constant as it should be.

So, let us see how we reverse the order of integration. The best way to reverse the order of integrationis to first sketch the region given by the original limits of integration. From the integral we see thatthe inequalities that define this region are

0 � x � 3 and x2 � y � 9.

These inequalities tell us that we want the region with y = x2 as the lower boundary and y = 9 asthe upper boundary that lies between x = 0 and x = 3. Here is a sketch of that region (left).

y

x0 3

y = 9

y = x2

y

x0

9

x = 0 x =√

y

112


Since we want to integrate with respect to x first we will need to determine limits of x (probably interms of y) and then get the limits on y’s. Here are the inequalities

0 � y � 9 and 0 � x � √y.

Any horizontal line drawn in this region will start at x = 0 and end at x =√

y and so these arethe limits on the x’s and the range of y’s for the region is from 0 to 9. The sketch of the region isin the above (right). Now the integral, with the order reversed, is given by

Z 3

0

„Z 9

x2x3ey3

dy

«dx =

Z 9

0

„Z √y

0

x3ey3dx

«dy

and notice that we can do the first integration with this order. We will also hope that this will giveus a second integral that we can do. Here is the work for this integral.

Z 3

0

Z 9

x2x3ey3

dy dx =

Z 9

0

Z √y

0

x3ey3dx dy

=

Z 9

0

»1

4x4ey3

–√y

0

dy

=

Z 9

0

»1

4y2ey3

–dy

=

»1

12ey3–90

=1

12

`e729 − 1

´.

(b) As with the first integral we cannot do this integral by integrating with respect to x first so we willhope that by reversing the order of integration we will get something that we can integrate. Here arethe limits for the variables that we get from this integral.

0 � y � 8 and 3√

y � x � 2.

Here is a sketch of this region (left).

y

x0

8

x = 2x = 3√

y

y

x0 2y = 0

y = x3

So, if we reverse the order of integration we get the following limits (sketch above on the right).

0 � x � 2 and 0 � y � x3.

113


The integral is then

Z 8

0

Z 2

3√y

px4 + 1 dx dy =

Z 2

0

Z x3

0

px4 + 1 dy dx

=

Z 2

0

hyp

x4 + 1ix3

0dx

=

Z 2

0

x3p

x4 + 1 dx

=1

4

Z 2

0

(x4 + 1)1/2 d(x4 + 1)

=1

4

»(x4 + 1)3/2

3/2

–20

=1

6

“173/2 − 1

”.

�

The final topic of this section is two geometric interpretations of a double integral. The first interpretation isan extension of the idea that we used to develop the idea of a double integral in Section 4.1 of this chapter.We did this by looking at the volume of the solid that was below the surface of the function z = f(x, y)and over the rectangle D in the xy-plane. This idea can be extended to more general regions.

The volume of the solid that lies below the surface given by z = f(x, y) and above any non-rectangularregion R in the xy-plane is

volume =

ZZR

f(x, y) dA.

Let us look at a couple of examples.

� Example 4.2.3 (Volume as a double integral)

Find the volume of the solid that lies below the surface given by z = 16xy + 200 and lies above the regionin the xy-plane bounded by y = x2 and y = 8 − x2.

Solution The following is the graph of the given region, say R, in the xy-plane.

y

x

Region R

−2 2

y = 8 − x2

y = x2

114


By setting the two bounding equations equal we can see that they will intersect at x = 2 and x = −2.So, the inequalities that will define the region R in the xy-plane are

−2 � x � 2 and x2 � y � 8 − x2.

The volume is then given by

ZZR

(16xy + 200) dA =

Z 2

−2

Z 8−x2

x2(16xy + 200) dy dx

=

Z 2

−2

h8xy2 + 200y

i8−x2

x2dx

=

Z 2

−2

`−128x3 − 400x2 + 512x + 1600´

dx

=

»−32x4 − 400

3x3 + 256x2 + 1600x

–2−2

= 2

»−400

3(2)3 + 1600(2)

–=

12800

3.

�

� Example 4.2.4 (Volume as a double integral)

Find the volume of the solid enclosed by the planes

z + 4x + 2y = 10, y = 3x, z = 0, x = 0.

Solution This example is a little different from the previous one. Here the region R is not explicitlygiven so we are going to look for it. First, notice that the last two planes are really telling us that we won’tgo past the xy-plane and the yz-plane when we reach them. The first plane, z + 4x + 2y = 10, is the topof the volume and so we are really looking for the volume under

z = 10 − 4x − 2y

and above the region R in the xy-plane. The second plane, y = 3x, gives one of the sides of the volumeas shown below. The region R will be the region in the xy-plane (i.e., z = 0) that is bounded byy = 3x, x = 0, and the line where z + 4x + 2y = 10 intersects the xy-plane. We can determine wherez + 4x + 2y = 10 intersects the xy-plane by plugging z = 0 into it.

0 + 4x + 2y = 10 =⇒ 2x + y = 5 =⇒ y = −2x + 5.

So, the following is the sketch of the solid and the region in the xy-plane. Note that the region R is alittle out of scale.

115


z

x

y

z + 4x + 2y = 10

y = 3x

x

y

Region R

0 1

y = −2x + 5

y = 3x

The region R is really where this solid will sit on the xy-plane and the region is defined by the inequalities:

0 � x � 1 and 3x � y � −2x + 5.

The volume is then given by

volume =

ZZR

(10 − 4x − 2y) dA

=

Z 1

0

Z −2x+5

3x

(10 − 4x − 2y) dy dx

=

Z 1

0

h10y − 4xy − y2

i−2x+5

3xdx

=

Z 1

0

`25x2 − 50x + 25

´dx =

»25

3x3 − 25x2 + 25x

–10

=25

3.

�

The second geometric interpretation of a double integral is the following.

Theorem 4.2.1 (Area as a double integral)

Area of region R =

ZZR

1 dA,

in which the integrand is the constant function 1.

This is easy to see why this is true in general. Let us suppose that we want to find the area of the regionshown below.

116

4.3 Double Integrals in Polar Coordinates

y

xa b

Region R

y = g2(x)

y = g1(x)

From single variable calculus we know that this area can be found by the integral

area of the region R =

Z b

a

(upper curve − lower curve) dx

=

Z b

a

[g2(x) − g1(x)] dx.

Or in terms of a double integral we have

area of the region R =

ZZR

1 dA

=

Z b

a

Z g2(x)

g1(x)

1 dy dx

=

Z b

a

hyig2(x)

g1(x)dx

=

Z b

a

[g2(x) − g1(x)]dx.

This is exactly the same formula we have in single variable calculus.


To this point we have seen quite a few double integrals. However, in every case region R could be easilydescribed in terms of simple functions in Cartesian coordinates. In this section we want to look at someregions that are much easier to describe in terms of polar coordinates. For instance, we might have a regionthat is a disk, ring, or a portion of a disk or ring. In these cases using Cartesian coordinates could besomewhat cumbersome. For instance, let us suppose we have the following integralZZ

R

f(x, y) dA, R is the disk of radius 2.

To this we have to determine a set of inequalities for x and y that describe this region. These would be

−2 � x � 2 and −p

4 − x2 � y �p

4 − x2,

117


and then the integral would becomeZZR

f(x, y) dA =

Z 2

−2

Z √4−x2

−√

4−x2f(x, y) dy dx.

Due to the limits on the inner integral this is liable to be an uneasy integral to compute.

However, a disk of radius of 2 can be defined easily in polar coordinates by the following inequalities

0 � θ � 2π and 0 � r � 2.

These are very simple limits and, in fact, are constant limits of integration which always makes integralssomewhat easier. So, if we could convert our double integral formula into one involving polar coordinateswe would be in pretty good shape. The problem is that we cannot just convert the dx and the dy into adr and a dθ. In computing double integrals to this point we have been using the fact that dA = dx dyand this really does require Cartesian coordinates to use. In fact it can be shown that, in terms of polarcoordinates, dA can be written as

dA = r dr dθ.

Note the addition of the r in the formula. That is important. Without it the answer will be wrong. Wenow need to find a formula for the double integral when we use polar coordinates to define the region R.First, let us get a sketch of a sample region as well as the area differential (area element):

y

x

r = r1(θ)

r = r2(θ)

θ = θ1

θ = θ2

Δθ

ΔA

r + Δr

r

Area differential:dA ≈ ΔA

So, our general region will be defined by inequalities

θ1 � θ � θ2 and r1(θ) � r � r2(θ).

Now, if we are going to integrate with respect to polar coordinates we have to make sure that we havealso converted all the x’s and y’s into polar coordinates as well. To do this we will need to remember thefollowing conversion formulas

x = r cos θ, y = r sin θ, and x2 + y2 = r2.

Also, as we mentioned that the area differential can be written as dA = r dr dθ. We are going to give thedetail of the proof in the following.

dA ≈ ΔA

=1

2(r + Δr)2 Δθ − 1

2r2 Δθ

= r Δr Δθ +1

2(Δr)2 Δθ

≈ r Δr Δθ

≈ r dr dθ.

118


We are now ready to write down a formula for the double integral in terms of polar coordinates.

ZZR

f(x, y) dA =

Z θ2

θ1

Z r2(θ)

r1(θ)

f(r cos θ, r sin θ) r dr dθ.

It is important not to forget the added r in the differential and also don’t forget to convert the Cartesiancoordinates (i.e., x, y) in the integrand to polar coordinates.

Let us look at a couple of examples of these kinds of integrals.

� Example 4.3.1 (Double integrals in polar coordinates)

Evaluate the following integrals by converting them into polar coordinates.

(a)

ZZR

2xy dA, R is the portion of the region between the circles of radius 2 and radius 5 centered at

the origin that lies in the first quadrant.

(b)

ZZR

ex2+y2dA, R is the unit circle centered at the origin.

Solution

(a) First let us get R in terms of polar coordinates. The circle of radius 2 is given by r = 2 and thecircle of radius 5 is given by r = 5. We want the region between them so we will have the followinginequality for r,

2 � r � 5.

Also, since we only want the portion that is in the first quadrant we get the following range of θ’s,

0 � θ � π

2.

Now we can do the integralZZR

2xy dA =

Z π2

0

Z 5

2

2(r cos θ)(r sin θ) r dr dθ.

Do not forget to do the conversions and to add in the extra r. Now, let us simplify and make use ofthe double angle formula for sine to make the integral a little easier.ZZ

R

2xy dA =

Z π2

0

Z 5

2

r3 sin(2θ) dr dθ

=

Z π2

0

»1

4r4 sin(2θ)

–52

dθ

=

Z π2

0

609

4sin(2θ) dθ

=

»−609

8cos(2θ)

– π2

0

=609

4.

119


(b) In this case we can’t do this integral in terms of Cartesian coordinates. We will however be able todo it in polar coordinates. First, the region R is defined by

0 � θ � 2π and 0 � r � 1.

In terms of polar coordinates the integral is thenZZR

ex2+y2dA =

Z 2π

0

Z 1

0

er2r dr dθ.

Notice that the addition of r gives us an integral that can be solvable. Here is how to do the integral.ZZR

ex2+y2dA =

Z 2π

0

Z 1

0

rer2dr dθ

=

Z 2π

0

Z 1

0

1

2er2

d(r2) dθ

=

Z 2π

0

»1

2er2–10

dθ

=

Z 2π

0

1

2(e − 1) dθ

= π(e − 1).

�

Do not forget that we still have the two geometric interpretations for these integrals as well. They includefinding enclosed areas and volumes in polar coordinates. We use several examples in the following to illustratethe advantage of using polar coordinates. That is to say occasionally it is technically easier to evaluate thedouble integrals in polar coordinates.

� Example 4.3.2 (Area in polar coordinates)

Find the area of the region enclosed by the two circles:

x2 + (y − 1)2 = 1 and x2 + (y − 2)2 = 4.

Solution For the first circle x2 +(y−1)2 = 1, substituting x = r cos θ and y = r sin θ into the equationand simplifying, we have

r = 2 sin θ.

Similarly, the second circle isr = 4 sin θ.

The enclosed region is shown as follows.

y

x

Region R

120


The values of θ is given by0 � θ � π.

Thus, the area of the region R is

area =

ZZR

1 dA

=

Z π

0

Z 4 sin θ

2 sin θ

1 r dr dθ

=

Z π

0

»r2

2

–4 sin θ

2 sin θ

dθ

=

Z π

0

6 sin2 θ dθ

=

Z π

0

3(1 − cos 2θ) dθ

=

»3θ − 3

2sin 2θ

–π

0

= 3π.

�


Find the area of the region enclosed by the two circles:

x2 + (y − 1)2 = 1 and (x − 1)2 + y2 = 1.

Solution In polar coordinates, the two circles are

r = 2 sin θ and r = 2 cos θ.

The region can be divided into two equal-sized non-overlapping regions. We just need to evaluate one ofthem and eventually multiply it by 2 to obtain the required area. The enclosed region R is shown as follows.

y

x

Region R

r = 2 sin θ

r = 2 cos θ

Solving for the intersection points between the two circles gives

2 sin θ = 2 cos θ,

tan θ = 1,

θ =π

4.

121


Now, the area of the region R is

area = 2

Z π/4

0

Z 2 sin θ

0

r dr dθ

= 2

Z π/4

0

»r2

2

–2 sin θ

0

dθ

= 4

Z π/4

0

sin2 θ dθ

= 2

Z π/4

0

(1 − cos 2θ) dθ

= 2

»θ − 1

2sin 2θ

–π/4

0

=π

2− 1.

�


Determine the area of the region that lies inside r = 3 + 2 sin θ and outside r = 2.

Solution Below shows a sketch of the region R that we want to determine the area of. To determinethis area we will need to know that value(s) of θ for which the two curves intersect. We can determine thesepoints by setting the two equations equal and solving. Thus, 3 + 2 sin θ = 2, and hence

sin θ = −1

2=⇒ θ =

7π

6,

11π

6.

Here is the sketch of the figure with the rays (θ = 7π/6, θ = 11π/6) shown.

y

x

Region R

r = 3 + 2 sin θ

r = 2θ =7π

6θ = −π

6,

11π

6

Note as well that we have acknowledged that −π

6is another representation for the angle

11π

6. This is

important since we need the range of θ to actually enclose the region as we increase from the lower limit to

the upper limit. If we choose to use11π

6then as we increase from

7π

6to

11π

6we would be tracing out

the lower portion of the circle and that is not the region that we are looking for.

122


So, here are the ranges that will define the region.

−π

6� θ � 7π

6and 2 � r � 3 + 2 sin θ.

To get the ranges for r the function that is closest to the origin is the lower bound and the function thatis farthest to the origin is the upper bound.

The area of the region R is then

area =

Z 7π6

− π6

Z 3+2 sin θ

2

r dr dθ

=

Z 7π6

− π6

»1

2r2

–3+2 sin θ

2

dθ

=

Z 7π6

− π6

»5

2+ 6 sin θ + 2 sin2 θ

–dθ

=

Z 7π6

− π6

»7

2+ 6 sin θ − cos(2θ)

–dθ

=

»7

2θ − 6 cos θ − 1

2sin(2θ)

– 7π6

− π6

=11

√3

2+

14π

3≈ 24.19.

�

� Example 4.3.5 (Volume in polar coordinates)

Find the volume of the spherex2 + y2 + z2 = a2.

Solution First recall that the volume of the solid enclosed by two surfaces (top and bottom) is given by

volume =

ZZR

(ztop − zbottom) dA.

It would be difficult to use the rectangular coordinates to find the volume. In terms of polar coordinates,

volume =

Z 2π

0

Z a

0

(ztop − zbottom) r dr dθ.

Solving the given equation for z gives

z2 = a2 − (x2 + y2) = a2 − r2 =⇒ z = ±p

a2 − r2.

The volume of the solid is given by

volume =

Z 2π

0

Z a

0

hpa2 − r2 − (−

pa2 − r2)

ir dr dθ

= 2

Z 2π

0

Z a

0

pa2 − r2 r dr dθ

= −Z 2π

0

Z a

0

(a2 − r2)1/2 d(a2 − r2) dθ

= −Z 2π

0

»(a2 − r2)3/2

3/2

–a

0

dθ

=2

3

Z 2π

0

a3 dθ =2

3a3 (2π) =

4

3πa3.

123


�


Find the volume of the solid that lies under the sphere

x2 + y2 + z2 = 9,

above the plane z = 0 and inside the cylinder x2 + y2 = 5.

Solution We know that the formula for finding the volume is given by

volume =

ZZR

(ztop − zbottom) dA.

In order to make use of this formula we are going to determine the integrand function that we should beintegrating and the region R that we are going to be integrating over.

The integrand function is just not too complicated. The solid is bounded above by the top surface whichis a portion of the sphere (i.e., z =

p9 − x2 − y2) and bounded below by the bottom surface which is the

xy-plane (i.e., z = 0). Remark that we took the positive square root since we are wanting the solid liesabove the xy-plane.

ztop =p

9 − x2 − y2 and zbottom = 0.

The region R is also not too difficult in this case either. As we take points, (x, y), from the region we needto completely graph the portion of the sphere that we are working with. Since we only want the portion ofthe sphere that actually lies inside the cylinder given by x2 + y2 = 5 this is also the region R. The regionR is the disk x2 + y2 � 5 in the xy-plane.

For reference purposes here is a sketch of the solid that we are trying to find the volume of.

z

yx

Solid

Region R (bottom of solid)

��

��

So, the solid that we want the volume for is really a cylinder with a cap that comes from the sphere.Definitely we are going to do this integral in terms of polar coordinates, so here are the limits (in polarcoordinates) for the region R:

0 � θ � 2π and 0 � r �√

5.

124


We will also need to convert the function to polar coordinates as well.

ztop − zbottom =p

9 − (x2 + y2) − 0 =p

9 − r2.

The volume is therefore

volume =

Z 2π

0

Z √5

0

p9 − r2 r dr dθ

= −1

2

Z 2π

0

Z √5

0

(9 − r2)1/2 d(9 − r2) dθ

= −1

2

Z 2π

0

»(9 − r2)3/2

3/2

–√5

0

dθ

=1

3

Z 2π

0

“93/2 − 43/2

”dθ

=1

3

`33 − 23´ (2π) =

38

3π.

�


Find the volume of the solid that lies inside

z = x2 + y2

and below the plane z = 16.

Solution Here is a sketch of the solid.

z

yx

Solid

Region R (“projection”)

��

��

Now, in this case the standard formula is not going to work. The formula

volume =

ZZR

f(x, y) dA

125


gives the volume under the function f(x, y) and we are actually look for the volume above the function.This is not the problem that it might appear to be however. First, notice that

V1 =

ZZR

16 dA

will be the volume under z = 16 (of course we will need to determine R eventually) while

V2 =

ZZR

(x2 + y2) dA

will be the volume under z = x2 + y2, using the same R.

The volume that we are looking for is really the difference between these two, or

V = V1 − V2 =

ZZR

ˆ16 − (x2 + y2)

˜dA.

Now all that we need to do is determine the region R and then convert everything to polar coordinates.

Determining the region R in this case is not too complicated. If we look straight down the z-axis ontothe region we will see a circle of radius 4 centered at the origin. This is because the top of the solid, wherethe elliptic paraboloid intersects the plane, is the widest part of the solid. We know the z coordinate at theintersection so, setting z = 16 in the equation of the paraboloid gives

16 = x2 + y2

which is the equation of a circle of radius 4 centered at the origin. Here are the inequalities for the regionR and the function we will be integrating in terms of polar coordinates.

0 � θ � 2π, 0 � r � 4, z = 16 − r2.

The volume is then

volume =

ZZR

ˆ16 − (x2 + y2)

˜dA

=

Z 2π

0

Z 4

0

(16 − r2) r dr dθ

=

Z 2π

0

»8r2 − 1

4r4

–40

dθ =

Z 2π

0

64 dθ = 128π.

�

In Examples 4.3.5– 4.3.7 we would have not been able to easily compute the volume without first convertingto polar coordinates. So, as these examples have shown, it is a good idea to always remember polarcoordinates.

There is one more type of example that we need to look at before moving on to the next section.Sometimes we are given an iterated integral that is already in terms of x and y and we need to convert thisover to polar coordinates so that we can actually do the integral. We need to see an example of how to dothis kind of conversion.

126


� Example 4.3.8 (Double integral in polar coordinates)

Evaluate the following integral by first converting to polar coordinates.

Z 1

0

Z √1−y2

0

cos(x2 + y2) dx dy.

Solution First, notice that we can not do this integral in Cartesian coordinates and so converting to polarcoordinates may be the only option we have for actually doing the integral. Notice that the function willconvert to polar coordinates nicely and so should not be a problem.

Let us first determine the region that we are integrating over and see if it is a region that can be easilyconverted into polar coordinates. Here are the inequalities that define the region in terms of Cartesiancoordinates.

0 � y � 1, 0 � x �p

1 − y2.

Now, the upper limit for the x’s is

x =p

1 − y2

and this looks like the right side of the circle of radius 1 centered at the origin. Since the lower limit for thex’s is x = 0 it looks like we are going to have a portion (or all) of the right side of the disk of radius 1centered at the origin.

The range for the y’s, however, tells us that we are only going to have positive y’s. This means that weare only going to have the portion of the disk of radius 1 centered at the origin that is in the first quadrant.So, we know that the inequalities that will define this region in terms of polar coordinates are then

0 � θ � π

2, 0 � r � 1.

Finally, we just need to remember that

dx dy = dA = r dr dθ

and so the integral becomes

Z 1

0

Z √1−y2

0

cos(x2 + y2) dx dy =

Z π/2

0

Z 1

0

cos(r2) r dr dθ.

Note that this is an integral that we can do. Here is the rest of the work for this integral.

Z 1

0

Z √1−y2

0

cos(x2 + y2) dx dy =

Z π/2

0

»1

2sin(r2)

–10

dθ

=

Z π/2

0

1

2sin(1) dθ =

π

4sin 1.

�

127


4.4 Triple Integrals

After introducing integrations over a two-dimensional region we need to move on to integrations over athree-dimensional solid. We used a double integral to integrate over a two-dimensional region and so itshould not be too surprising that we will use a triple integral to integrate over a three-dimensional solid.The notation for the general triple integrals isZZZ

G

f(x, y, z) dV.

Let us start with simple solid by integrating over the box,

G = [a, b] × [c, d] × [r, s].

Note that when using this notation we list the x’s first, the y’s second and the z’s third. Just as a doubleintegral can be evaluated by two successive single integrations, so a triple integral can be evaluated by threesuccessive integrations. The triple integral in this case is

ZZZG

f(x, y, z) dV =

Z s

r

Z d

c

Z b

a

f(x, y, z) dx dy dz.

Note that we integrate with respect to x first, then y, and finally z here, but in fact there is no reason to dothe integrals in this order. There are 6 different possible orders to do the integral and in which order youdo the integral will depend on the function and the order that you feel will be the easiest. We will get thesame answer regardless of the order however. The six different orders for the iterated integral are

dx dy dz, dy dz dx, dz dx dy,

dx dz dy, dz dy dx, dy dx dz.

Let us do a quick example of a triple integral over a three-dimensional box.

� Example 4.4.1 (Triple integral)

Evaluate the following integralZZZG

8xyz dV, where G = [2, 3] × [1, 2] × [0, 1].

Solution Just to make the point that order does not matter let us use a different order from that listedabove. We will do the integral in the following order.ZZZ

G

8xyz dV =

Z 2

1

Z 3

2

Z 1

0

8xyz dz dx dy

=

Z 2

1

Z 3

2

ˆ4xyz2

˜10

dx dy

=

Z 2

1

Z 3

2

4xy dx dy

=

Z 2

1

ˆ2x2y

˜32

dy

=

Z 2

1

10y dy =ˆ5y2˜2

1= 15.

128


�

Before moving on to more general solids let us get a nice geometric interpretation about the triple integralout of the way so we can use it in some of the examples to follow. In the special case where f(x, y, z) = 1,the triple integral yields the formula for calculating the volume of a solid.

Theorem 4.4.1 The volume of the three-dimensional solid G is given by the integral,

volume of G =

ZZZG

1 dV .

Let us now move on the more general three-dimensional solids. We have three different possibilities fora general solid. Here is a sketch of the first possibility.

x

y

z

Solid G

Region R (“projection”)

z = u2(x, y)

z = u1(x, y)

��

��

In this case we define the solid G as follows,

G = {(x, y, z) : (x, y) ∈ R, u1(x, y) � z � u2(x, y)},

where (x, y) ∈ R is the notation which means that the point (x, y) lies in the region R from the xy-plane.In this case we will evaluate the triple integral as follows

ZZZG

f(x, y, z) dV =

ZZR

"Z u2(x,y)

u1(x,y)

f(x, y, z) dz

#dA,

where the double integral can be evaluated in any of the methods that we saw in the previous couple ofsections. In other words, we can integrate first with respect to x, we can integrate first with respect to y, orwe can use polar coordinates as needed.

129



Evaluate

ZZZG

2x dV , where G is the solid under the plane 2x + 3y + z = 6 that lies in the first octant.

Solution We should first define octant. Just as the two-dimensional coordinates system can be dividedinto four quadrants the three-dimensional coordinate system can be divided into eight octants. The firstoctant is the octant in which all three of the coordinates are positive.

Here is a sketch of the plane in the first octant.

z

y

x

0

3

2

Region Ron xy-plane

2x + 3y + z = 6

��

��

We now need to determine the region R in the xy-plane. We can get a visualization of the region bypretending to look straight down on the object from above. What we see will be the region R in thexy-plane. So R will be the triangle with vertices at (0, 0), (3, 0), and (0, 2). Here is a sketch of R.

y

x

Region R

0

2

3

y = −2

3x + 2

or

x = −3

2y + 3

��

��

Now we need the limits of integration. Since we are under the plane and in the first octant (so we are abovethe plane z = 0) we have the following limits for z.

0 � z � 6 − 2x − 3y.

130


We can integrate the double integral over R using either of the following two sets of inequalities

0 � x � 3, 0 � y � −2

3x + 2

or

0 � x � −3

2y + 3 0 � y � 2.

Since neither really holds an advantage over the other we will use the first one. The integral is thenZZZG

2x dV =

ZZR

»Z 6−2x−3y

0

2x dz

–dA

=

ZZR

h2xzi6−2x−3y

0dA

=

Z 3

0

Z − 23 x+2

0

2x(6 − 2x − 3y) dy dx

=

Z 3

0

h12xy − 4x2y − 3xy2

i− 23 x+2

0dx

=

Z 3

0

„4

3x3 − 8x2 + 12x

«dx

=

»1

3x4 − 8

3x3 + 6x2

–30

= 9.

�

Let us now move onto the second possible three-dimensional solid we may run into for triple integrals.Here is the sketch of the solid.

x

y

z

R (“projection” on yz-plane)Solid G

x = u2(y, z)

x = u1(y, z)

��

��

For this possibility we define the solid G as follows,

G = {(x, y, z) : (y, z) ∈ R, u1(y, z) � x � u2(y, z)}.

131


So, the region R will be a region in the yz-plane. Here is how we will evaluate these integrals.

ZZZG

f(x, y, z) dV =

ZZR

"Z u2(y,z)

u1(y,z)

f(x, y, z) dx

#dA.

As with the first possibility we will have two options for doing the double integral in the yz-plane as wellas the option of using polar coordinates if needed.

Of course, there is the third (and final) possible three-dimensional solid we may run into for tripleintegrals. Here is a sketch of the solid G.

x

y

z

Region R Solid G

y = u1(x, z)

y = u2(x, z)

��

��

In this final case G is defined as

G = {(x, y, z) : (x, z) ∈ R, u1(x, z) � y � u2(x, z)}and here the region R will be a region in the xz-plane. Here is how we will evaluate these integrals.

ZZZG

f(x, y, z) dV =

ZZR

"Z u2(x,z)

u1(x,z)

f(x, y, z) dy

#dA,

where we can use either of the two possible orders for integrating R in the xz-plane or we can use polarcoordinates as needed.


Evaluate

ZZZG

p3x2 + 3z2 dV , where G is the solid bounded by y = 2x2 + 2z2 and the plane y = 8.

Solution Here is a sketch of the solid G.

132


z

y

x

08

Region R

Solid G

y = 2x2 + 2z2

��

��

The region R in the xz-plane can be found by “projecting” the surface onto the xz-plane and we can seethat R will be a disk in the xz-plane. This disk will come from the front of the solid and we can determinethe equation of the disk by setting the elliptic paraboloid and the plane equal.

2x2 + 2z2 = 8 =⇒ x2 + z2 = 4.

This region, as well as the integrand, both seem to suggest that we should use something like polar coor-dinates. However we are in the xz-plane and we have only seen polar coordinates in the xy-plane. This isindeed not a problem. We can always “translate” them over to the xz-plane with the following definition.

x = r cos θ, z = r sin θ.

Since the region does not have y’s we will let z take the place of y in all the formulas. Note that the definitionalso leads to the formula

x2 + z2 = r2.

With this in hand we can arrive at the limits of the variables that we will need for this integral.

2x2 + 2z2 � y � 8, 0 � r � 2, 0 � θ � 2π.

The integral is then

ZZZG

p3x2 + 3z2 dV =

ZZR

»Z 8

2x2+2z2

p3x2 + 3z2 dy

–dA

=

ZZR

hyp

3x2 + 3z2i82x2+2z2

dA

=

ZZR

p3(x2 + z2)

“8 − (2x2 + 2z2)

”dA.

Now, since we are going to do the double integral in polar coordinates let us get everything converted overto polar coordinates. The integrand is

p3(x2 + z2)

“8 − (2x2 + 2z2)

”=

√3r2

`8 − 2r2´

=√

3`8r − 2r3´ .

133


The integral is then ZZZG

p3x2 + 3z2 dV =

ZZR

√3`8r − 2r3´ dA

=√

3

Z 2π

0

Z 2

0

`8r − 2r3´ r dr dθ

=√

3

Z 2π

0

»8

3r3 − 2

5r5

–20

dθ

=√

3

Z 2π

0

128

15dθ

=256

√3 π

15.

�

4.5 Triple Integrals in Cylindrical Coordinates

In this section we want to take a look at triple integrands done completely in cylindrical coordinates.Recall that cylindrical coordinates are really nothing more than an extension of polar coordinates into threedimensions. The following are the conversion formulas for cylindrical coordinates.

x = r cos θ, y = r sin θ, z = z.

In order to do the integral in cylindrical coordinates we will need to know that dV will become in termsof cylindrical coordinates. We will be able to show that

dV = r dz dr dθ.

The solid G over which we are integrating becomes

G = {(x, y, z) : (x, y) ∈ R, u1(x, y) � z � u2(x, y)}

= {(r, θ, z) : θ1 � θ � θ2, r1(θ) � r � r2(θ), u1(r, θ) � z � u2(r, θ)}.

Note that we have only given this for G’s in which R is in the xy-plane. We can modify this accordinglyif R is in the yz-plane or the xz-plane as needed.

In terms of cylindrical coordinates a triple integral isZZZG

f(x, y, z) dV =

Z θ2

θ1

Z r2(θ)

r1(θ)

Z u2(r,θ)

u1(r,θ)

f(r cos θ, r sin θ, z) r dz dr dθ.

Do not forget to add in the r and make sure that all the x’s and y’s also get converted over intocylindrical coordinates. Practically, to apply the above formula it is best to begin with a three-dimensionalsketch of the solid G, from which the limits of integration can be obtained as follows:

Determining Limits of Integration: Cylindrical Coordinates

134

4.5 Triple Integrals in Cylindrical Coordinates

Step 1. Identify the upper surface z = u2(r, θ) and the lower surface z = u1(r, θ) of the solid. Thefunctions u1(r, θ) and u2(r, θ) determine the z-limits of integration. (If the upper and lowersurfaces are given in rectangular coordinates, convert them to cylindrical coordinates.)

Step 2. Make a two-dimensional sketch of the projection R of the solid on the xy-plane. From thissketch the r- and θ-limits of integration may be obtained exactly as with double integrals in polarcoordinates.

Let us see some examples in the following.

� Example 4.5.1 (Cylindrical Coordinates)

Evaluate

ZZZG

y dV , where G is the solid that lies below the plane z = x + 2 above the xy-plane and

between the cylinders x2 + y2 = 1 and x2 + y2 = 4.

Solution There really is not too much with this one other than do the conversions to cylindrical coordinatesand then evaluate the integral.

We will start out by getting the range for z in terms of cylindrical coordinates.

0 � z � x + 2 =⇒ 0 � z � r cos θ + 2.

Remember that we are above the xy-plane and so we are above the plane z = 0. Next, the region R isthe region between the two circles x2 + y2 = 1 and x2 + y2 = 4 in the xy-plane and so the ranges forthe region are

0 � θ � 2π, 1 � r � 2.

Here is the integral. ZZZG

y dV =

Z 2π

0

Z 2

1

Z r cos θ+2

0

(r sin θ) r dz dr dθ

=

Z 2π

0

Z 2

1

(r2 sin θ) (r cos θ + 2) dr dθ

=

Z 2π

0

Z 2

1

„1

2r3 sin 2θ + 2r2 sin θ

«dr dθ

=

Z 2π

0

»1

8r4 sin 2θ +

2

3r3 sin θ

–21

dθ

=

Z 2π

0

„15

8sin 2θ +

14

3sin θ

«dθ

=

»−15

16cos 2θ − 14

3cos θ

–2π

0

= 0.�

Just as we did with double integral involving polar coordinates we can start with an iterated integral interms of x, y, and z and convert it to cylindrical coordinates.

135


� Example 4.5.2 (Cylindrical Coordinates)

Convert

Z 1

−1

Z √1−y2

0

Z √x2+y2

x2+y2xyz dz dx dy into an integral in cylindrical coordinates.

Solution Here are the ranges of the variables from this iterated integral.

−1 � y � 1, 0 � x �p

1 − y2, x2 + y2 � z �p

x2 + y2.

The first two inequalities define the region R and since the upper and lower bounds for the x’s arex =

p1 − y2 and x = 0 we know that we have got at least part of the right half a circle of radius 1

centered at the origin. Since the range of y’s is −1 � y � 1 we know that we have the complete right halfof the disk of radius 1 centered at the origin. So, the ranges for R in cylindrical coordinates are

−π

2� θ � π

2, 0 � r � 1.

The last thing we have to do is to convert the limits of the z range, but that is pretty simple.

r2 � z � r.

Here we notice that the lower bound is an elliptic paraboloid and the upper bound is a cone. Therefore Gis a portion of the solid between these two surfaces. The integral is then

Z 1

−1

Z √1−y2

0

Z √x2+y2

x2+y2xyz dz dx dy =

Z π/2

−π/2

Z 1

0

Z r

r2(r cos θ)(r sin θ) z r dz dr dθ

=

Z π/2

−π/2

Z 1

0

Z r

r2zr3 cos θ sin θ dz dr dθ.

�

4.6 Triple Integrals in Spherical Coordinates

We introduce here another common coordinate system that is frequently more convenient than either rect-angular or cylindrical coordinates. In particular, some triple integrals that cannot be calculated exactly ineither rectangular or cylindrical coordinates can be dealt with easily in spherical coordinates.

First, we need to recall just how spherical coordinates are defined. The following sketch shows therelationship between the Cartesian and spherical coordinate systems.

136


θ

φ

ρ

O

z

y

x

z

P (x, y, z) = (ρ, θ, φ)

Q(0, 0, z)

R(x, y, 0)

Spherical coordinates locate points in space with two angles and one distance, as shown in the above figure.The first coordinate,

ρ = ‖−−→OP ‖ =p

x2 + y2 + z2,

is the point’s distance from the origin. Unlike r, the distance ρ is never negative. The second coordinate,

φ, is the angle−−→OP makes with the positive z-axis. It is required to lie in the interval [0, π]. The third

coordinate is the angle θ as measured in cylindrical coordinates.

If you look closely at the above figure, you can see how to relate rectangular and spherical coordinates.Notice that

x = ‖−→OR‖ cos θ = ‖−−→QP‖ cos θ.

Looking at the triangle OQP , we find that ‖−−→QP‖ = ρ sin φ, so that

x = ρ sin φ cos θ.

Similarly, we have

y = ‖−→OR‖ sin θ = ρ sin φ sin θ.

Finally, focusing again on triangle OQP , we have

z = ρ cos φ.

Here is the summary of the conversion formulas for spherical coordinates.

x = ρ sin φ cos θ, y = ρ sin φ sin θ, z = ρ cos φ,

x2 + y2 + z2 = ρ2.

As just mentioned we have the following restrictions on the coordinates.

ρ � 0, 0 � φ � π.

137


For our integrals we are going to restrict G down to spherical wedge. This will mean that we are going totake ranges for the variables as follows,

ρ1 � ρ � ρ2, θ1 � θ � θ2, φ1 � φ � φ2.

Here is a quick sketch of a spherical wedge for reference purposes.

z

y

x

Solid G

From this sketch we can see that G is really nothing more than the intersection of a sphere and a cone. Infact we can show that

dV = ρ2 sin φ dρ dθ dφ.

Therefore the integral will become

ZZZG

f(x, y, z) dV =

Z φ2

φ1

Z θ2

θ1

Z ρ2

ρ1

f(ρ sin φ cos θ, ρ sin φ sin θ, ρ cos φ) ρ2 sin φ dρ dθ dφ.

This looks a bit complicated, but given that the limits are all constants the integrals here tend to not betoo bad. Let us see a couple of examples of this kind.

� Example 4.6.1 (Spherical Coordinates)

Evaluate ZZZG

16z dV,

where G is the upper half of the sphere x2 + y2 + z2 = 1.

Solution Since we are taking the upper half of the sphere the limits for the variables are

0 � ρ � 1, 0 � θ � 2π, 0 � φ � π

2.

138


The integral is then ZZZG

16z dV =

Z π/2

0

Z 2π

0

Z 1

0

(16ρ cos φ) ρ2 sin φ dρ dθ dφ

=

Z π/2

0

Z 2π

0

Z 1

0

8ρ3 sin 2φ dρ dθ dφ

=

Z π/2

0

Z 2π

0

2 sin 2φ dθ dφ

=

Z π/2

0

4π sin 2φ dφ

=h− 2π cos 2φ

iπ/2

0

= 4π.

�

� Example 4.6.2 (Spherical Coordinates)

Convert Z 3

0

Z √9−y2

0

Z √18−x2−y2

√x2+y2

`x2 + y2 + z2

´dz dx dy

into spherical coordinates.

Solution Let us first write down the limits for the variables.

0 � y � 3, 0 � x �p

9 − y2,p

x2 + y2 � z �p

18 − x2 − y2.

The range for x tells us that we have a portion of the right half of a disk of radius 3 centered at the origin.Since we are restricting y’s to positive values it looks like we will have a quarter disk in the first quadrant.Therefore since R is in the first quadrant the solid G must be in the first octant and this follows that wehave the following range for θ (since this is the angle around the z-axis).

0 � θ � π

2.

Now, let us see what the range for z tells us. The lower bound, z =p

x2 + y2, is the upper half of a

cone. At this point we don’t need this quite yet, but we will later. The upper bound, z =p

18 − x2 − y2,is the upper half of the sphere,

x2 + y2 + z2 = 18

and so from this we now have the following range for ρ.

0 � ρ �√

18 = 3√

2.

Now we need the range for φ. There are two ways to get this. One is from where the cone and the sphereintersect. Plugging in the equation for the cone into the sphere gives“p

x2 + y2”2

+ z2 = 18,

z2 + z2 = 18,

z2 = 9,

z = 3.

139


Note that we can assume z is positive here since we have the upper half of the cone and/or sphere.Finally, put this into the conversion for z and take advantage of the fact that we know that ρ = 3

√2 since

we are intersecting on the sphere. This gives

ρ cos φ = 3,

3√

2 cos φ = 3,

cos φ =1√2

=

√2

2=⇒ φ =

π

4.

So, we have the following range.

0 � φ � π

4.

The other way to get this range is from the cone itself. By first converting the equation into cylindricalcoordinates and then into spherical coordinates we get the following

z =p

x2 + y2,

ρ cos φ = ρ sin φ,

1 = tanφ =⇒ φ =π

4.

So, recalling thatρ2 = x2 + y2 + z2,

the integral is then

Z 3

0

Z √9−y2

0

Z √18−x2−y2

√x2+y2

`x2 + y2 + z2´ dz dx dy =

Z π/4

0

Z π/2

0

Z 3√

2

0

ρ4 sin φ dρ dθ dφ.

�

140

supp notes calculus

Documents