
A FIRST COURSE IN OPTIMIZATION THEORY

RANGARAJAN K. SUNDARAM
New York University

CAMBRIDGE UNIVERSITY PRESS

Contents

Preface xiii
Acknowledgements xvii

1 Mathematical Preliminaries 1
1.1 Notation and Preliminary Definitions 2
1.1.1 Integers, Rationals, Reals, R^n 2
1.1.2 Inner Product, Norm, Metric 4
1.2 Sets and Sequences in R^n 7
1.2.1 Sequences and Limits 7
1.2.2 Subsequences and Limit Points 10
1.2.3 Cauchy Sequences and Completeness 11
1.2.4 Suprema, Infima, Maxima, Minima 14
1.2.5 Monotone Sequences in R 17
1.2.6 The Lim Sup and Lim Inf 18
1.2.7 Open Balls, Open Sets, Closed Sets 22
1.2.8 Bounded Sets and Compact Sets 23
1.2.9 Convex Combinations and Convex Sets 23
1.2.10 Unions, Intersections, and Other Binary Operations 24
1.3 Matrices 30
1.3.1 Sum, Product, Transpose 30
1.3.2 Some Important Classes of Matrices 32
1.3.3 Rank of a Matrix 33
1.3.4 The Determinant 35
1.3.5 The Inverse 38
1.3.6 Calculating the Determinant 39
1.4 Functions 41
1.4.1 Continuous Functions 41
1.4.2 Differentiable and Continuously Differentiable Functions 43

1.4.3 Partial Derivatives and Differentiability 46
1.4.4 Directional Derivatives and Differentiability 48
1.4.5 Higher Order Derivatives 49
1.5 Quadratic Forms: Definite and Semidefinite Matrices 50
1.5.1 Quadratic Forms and Definiteness 50
1.5.2 Identifying Definiteness and Semidefiniteness 53
1.6 Some Important Results 55
1.6.1 Separation Theorems 56
1.6.2 The Intermediate and Mean Value Theorems 60
1.6.3 The Inverse and Implicit Function Theorems 65
1.7 Exercises 66

2 Optimization in R^n 74
2.1 Optimization Problems in R^n 74
2.2 Optimization Problems in Parametric Form 77
2.3 Optimization Problems: Some Examples 78
2.3.1 Utility Maximization 78
2.3.2 Expenditure Minimization 79
2.3.3 Profit Maximization 80
2.3.4 Cost Minimization 80
2.3.5 Consumption-Leisure Choice 81
2.3.6 Portfolio Choice 81
2.3.7 Identifying Pareto Optima 82
2.3.8 Optimal Provision of Public Goods 83
2.3.9 Optimal Commodity Taxation 84
2.4 Objectives of Optimization Theory 85
2.5 A Roadmap 86
2.6 Exercises 88

3 Existence of Solutions: The Weierstrass Theorem 90
3.1 The Weierstrass Theorem 90
3.2 The Weierstrass Theorem in Applications 92
3.3 A Proof of the Weierstrass Theorem 96
3.4 Exercises 97

4 Unconstrained Optima 100
4.1 "Unconstrained" Optima 100
4.2 First-Order Conditions 101
4.3 Second-Order Conditions 103
4.4 Using the First- and Second-Order Conditions 104

4.5 A Proof of the First-Order Conditions 106
4.6 A Proof of the Second-Order Conditions 108
4.7 Exercises 110

5 Equality Constraints and the Theorem of Lagrange 112
5.1 Constrained Optimization Problems 112
5.2 Equality Constraints and the Theorem of Lagrange 113
5.2.1 Statement of the Theorem 114
5.2.2 The Constraint Qualification 115
5.2.3 The Lagrangean Multipliers 116
5.3 Second-Order Conditions 117
5.4 Using the Theorem of Lagrange 121
5.4.1 A "Cookbook" Procedure 121
5.4.2 Why the Procedure Usually Works 122
5.4.3 When It Could Fail 123
5.4.4 A Numerical Example 127
5.5 Two Examples from Economics 128
5.5.1 An Illustration from Consumer Theory 128
5.5.2 An Illustration from Producer Theory 130
5.5.3 Remarks 132
5.6 A Proof of the Theorem of Lagrange 135
5.7 A Proof of the Second-Order Conditions 137
5.8 Exercises

6 Inequality Constraints and the Theorem of Kuhn and Tucker 145
6.1 The Theorem of Kuhn and Tucker 145
6.1.1 Statement of the Theorem 145
6.1.2 The Constraint Qualification 147
6.1.3 The Kuhn-Tucker Multipliers 148
6.2 Using the Theorem of Kuhn and Tucker 150
6.2.1 A "Cookbook" Procedure 150
6.2.2 Why the Procedure Usually Works
6.2.3 When It Could Fail 152
6.2.4 A Numerical Example 155
6.3 Illustrations from Economics
6.3.1 An Illustration from Consumer Theory
6.3.2 An Illustration from Producer Theory 161
6.4 The General Case: Mixed Constraints
6.5 A Proof of the Theorem of Kuhn and Tucker 165
6.6 Exercises

7 Convex Structures in Optimization Theory 172
7.1 Convexity Defined 173
7.1.1 Concave and Convex Functions 174
7.1.2 Strictly Concave and Strictly Convex Functions 176
7.2 Implications of Convexity 177
7.2.1 Convexity and Continuity 177
7.2.2 Convexity and Differentiability 179
7.2.3 Convexity and the Properties of the Derivative 183
7.3 Convexity and Optimization 185
7.3.1 Some General Observations 185
7.3.2 Convexity and Unconstrained Optimization 187
7.3.3 Convexity and the Theorem of Kuhn and Tucker 187
7.4 Using Convexity in Optimization 189
7.5 A Proof of the First-Derivative Characterization of Convexity 190
7.6 A Proof of the Second-Derivative Characterization of Convexity 191
7.7 A Proof of the Theorem of Kuhn and Tucker under Convexity 194
7.8 Exercises 198

8 Quasi-Convexity and Optimization 203
8.1 Quasi-Concave and Quasi-Convex Functions 204
8.2 Quasi-Convexity as a Generalization of Convexity 205
8.3 Implications of Quasi-Convexity 209
8.4 Quasi-Convexity and Optimization 213
8.5 Using Quasi-Convexity in Optimization Problems 215
8.6 A Proof of the First-Derivative Characterization of Quasi-Convexity 216
8.7 A Proof of the Second-Derivative Characterization of Quasi-Convexity 217
8.8 A Proof of the Theorem of Kuhn and Tucker under Quasi-Convexity 220
8.9 Exercises 221

9 Parametric Continuity: The Maximum Theorem 224
9.1 Correspondences 225
9.1.1 Upper- and Lower-Semicontinuous Correspondences 225
9.1.2 Additional Definitions 228
9.1.3 A Characterization of Semicontinuous Correspondences 229
9.1.4 Semicontinuous Functions and Semicontinuous Correspondences 233
9.2 Parametric Continuity: The Maximum Theorem 235
9.2.1 The Maximum Theorem 235
9.2.2 The Maximum Theorem under Convexity 237

9.3 An Application to Consumer Theory
9.3.1 Continuity of the Budget Correspondence
9.3.2 The Indirect Utility Function and Demand Correspondence
9.4 An Application to Nash Equilibrium
9.4.1 Normal-Form Games
9.4.2 The Brouwer/Kakutani Fixed Point Theorem
9.4.3 Existence of Nash Equilibrium
9.5 Exercises

10 Supermodularity and Parametric Monotonicity
10.1 Lattices and Supermodularity
10.1.1 Lattices
10.1.2 Supermodularity and Increasing Differences
10.2 Parametric Monotonicity
10.3 An Application to Supermodular Games
10.3.1 Supermodular Games
10.3.2 The Tarski Fixed Point Theorem
10.3.3 Existence of Nash Equilibrium
10.4 A Proof of the Second-Derivative Characterization of Supermodularity
10.5 Exercises

11 Finite-Horizon Dynamic Programming
11.1 Dynamic Programming Problems
11.2 Finite-Horizon Dynamic Programming
11.3 Histories, Strategies, and the Value Function
11.4 Markovian Strategies
11.5 Existence of an Optimal Strategy
11.6 An Example: The Consumption-Savings Problem
11.7 Exercises

12 Stationary Discounted Dynamic Programming
12.1 Description of the Framework
12.2 Histories, Strategies, and the Value Function
12.3 The Bellman Equation
12.4 A Technical Digression
12.4.1 Complete Metric Spaces and Cauchy Sequences
12.4.2 Contraction Mappings
12.4.3 Uniform Convergence

12.5 Existence of an Optimal Strategy 291
12.5.1 A Preliminary Result 292
12.5.2 Stationary Strategies 294
12.5.3 Existence of an Optimal Strategy 295
12.6 An Example: The Optimal Growth Model 298
12.6.1 The Model 299
12.6.2 Existence of Optimal Strategies 300
12.6.3 Characterization of Optimal Strategies 301
12.7 Exercises 309

Appendix A Set Theory and Logic: An Introduction 315
A.1 Sets, Unions, Intersections 315
A.2 Propositions: Contrapositives and Converses 316
A.3 Quantifiers and Negation 318
A.4 Necessary and Sufficient Conditions 320

Appendix B The Real Line 323
B.1 Construction of the Real Line 323
B.2 Properties of the Real Line 326

Appendix C Structures on Vector Spaces 330
C.1 Vector Spaces 330
C.2 Inner Product Spaces 332
C.3 Normed Spaces 333
C.4 Metric Spaces 336
C.4.1 Definitions 336
C.4.2 Sets and Sequences in Metric Spaces 337
C.4.3 Continuous Functions on Metric Spaces 339
C.4.4 Separable Metric Spaces 340
C.4.5 Subspaces 341
C.5 Topological Spaces 342
C.5.1 Definitions 342
C.5.2 Sets and Sequences in Topological Spaces 343
C.5.3 Continuous Functions on Topological Spaces 343
C.5.4 Bases 343
C.6 Exercises 345

Bibliography 349
Index 351

Preface

This book developed out of a course I have taught since 1988 to first-year Ph.D. students at the University of Rochester on the use of optimization techniques in economic analysis. A detailed account of its contents is presented in Section 2.5 of Chapter 2. The discussion below is aimed at providing a broad overview of the book, as well as at emphasizing some of its special features.

An Overview of the Contents

The main body of this book may be divided into three parts. The first part, encompassing Chapters 3 through 8, studies optimization in n-dimensional Euclidean space, R^n. Several topics are covered in this span. These include, but are not limited to, (i) the Weierstrass Theorem, and the existence of solutions to optimization problems; (ii) the Theorem of Lagrange, and necessary conditions for optima in problems with equality constraints; (iii) the Theorem of Kuhn and Tucker, and necessary conditions for optima in problems with inequality constraints; (iv) the role of convexity in obtaining sufficient conditions for optima in constrained optimization problems; and (v) the extent to which convexity can be replaced with quasi-convexity, while still obtaining sufficiency of the first-order conditions for global optima.

The second part of the book, comprised of Chapters 9 and 10, looks at the issue of parametric variation in optimization problems, that is, at the manner in which solutions to optimization problems respond to changes in the values of underlying parameters. Chapter 9 begins this exercise with the question of parametric continuity: under what conditions will solutions to optimization problems vary "continuously" with changes in the underlying parameters? An answer is provided in the centerpiece of this chapter, the Maximum Theorem. The strengthening of the Maximum Theorem that is obtained by adding convexity restrictions to the problem is also examined. Chapter 10 is concerned with parametric monotonicity, that is, with conditions under which increases in the values of parameters result in increases in the size of optimal

1 Mathematical Preliminaries

This chapter lays the mathematical foundation for the study of optimization that occupies the rest of this book. It focuses on three main topics: the topological structure of Euclidean spaces, continuous and differentiable functions on Euclidean spaces and their properties, and matrices and quadratic forms. Readers familiar with real analysis at the level of Rudin (1976) or Bartle (1964), and with matrix algebra at the level of Munkres (1964) or Johnston (1984, Chapter 4), will find this chapter useful primarily as a refresher; for others, a systematic knowledge of its contents should significantly enhance understanding of the material to follow.

Since this is not a book in introductory analysis or linear algebra, the presentation in this chapter cannot be as comprehensive or as leisurely as one might desire. The results stated here have been chosen with an eye to their usefulness towards the book's main purpose, which is to develop a theory of optimization in Euclidean spaces. The selective presentation of proofs in this chapter reveals a similar bias. Proofs whose formal structure bears some resemblance to those encountered in the main body of the text are spelled out in detail; others are omitted altogether, and the reader is given the choice of either accepting the concerned results on faith or consulting the more primary sources listed alongside the result.

It would be inaccurate to say that this chapter does not presuppose any knowledge on the part of the reader, but it is true that it does not presuppose much. Appendices A and B aim to fill in the gaps and make the book largely self-contained. Appendix A reviews the basic rules of propositional logic; it is taken for granted throughout that the reader is familiar with this material. An intuitive understanding of the concept of an "irrational number," and of the relationship between rational and irrational numbers, suffices for this chapter and for the rest of this book. A formal knowledge of the real line and its properties will, however, be an obvious advantage, and readers who wish to acquaint themselves with this material may consult Appendix B.

The discussion in this chapter takes place solely in the context of Euclidean spaces. This is entirely adequate for our purposes, and avoids generality that we do not need. However, Euclidean spaces are somewhat special in that many of their properties (such as completeness, or the compactness of closed and bounded sets) do not carry over to more general metric or topological spaces. Readers wishing to view the topological structure of Euclidean spaces in a more abstract context can, at a first pass, consult Appendix C, where the concepts of inner product, norm, metric, and topology are defined on general vector spaces, and some of their properties are reviewed.

1.1 Notation and Preliminary Definitions

1.1.1 Integers, Rationals, Reals, R^n

The notation we adopt is largely standard. The set of positive integers is denoted by N, and the set of all integers by Z:

N = {1, 2, 3, ...}
Z = {..., -2, -1, 0, 1, 2, ...}.

The set of rational numbers is denoted by Q:

Q = {x | x = p/q, p, q ∈ Z, q ≠ 0}.

Finally, the set of all real numbers, both rational and irrational, is denoted by R. As mentioned earlier, it is presumed that the reader has at least an intuitive understanding of the real line and its properties. Readers lacking this knowledge should first review Appendix B.

Given a real number z ∈ R, its absolute value will be denoted |z|:

|z| = z, if z ≥ 0
|z| = -z, if z < 0.

The Euclidean distance between two points x and y in R is defined as |x - y|, i.e., as the absolute value of their difference.

For any positive integer n ∈ N, the n-fold Cartesian product of R will be denoted R^n. We will refer to R^n as n-dimensional Euclidean space. When n = 1, we shall continue writing R for R^1.

A point in R^n is a vector x = (x_1, ..., x_n), where for each i = 1, ..., n, x_i is a real number. The number x_i is called the i-th coordinate of the vector x.

We use 0 to denote the real number 0 as well as the null vector (0, ..., 0) ∈ R^n. This notation is ambiguous, but the correct meaning will usually be clear from the context.

Vector addition and scalar multiplication are defined in R^n as follows: for x, y ∈ R^n and a ∈ R,

x + y = (x_1 + y_1, ..., x_n + y_n)
ax = (ax_1, ..., ax_n).

Figure 1.1 provides a graphical interpretation of vector addition and scalar multiplication in R^2.

Fig. 1.1. Vector Addition and Scalar Multiplication in R^2

Given any two n-vectors x = (x_1, ..., x_n) and y = (y_1, ..., y_n), we write

x = y, if x_i = y_i, i = 1, ..., n,
x ≥ y, if x_i ≥ y_i, i = 1, ..., n,
x > y, if x ≥ y and x ≠ y,
x ≫ y, if x_i > y_i, i = 1, ..., n.

Note that

• x ≥ y does not preclude the possibility that x = y, and
• for n > 1, the vectors x and y need not be comparable under any of the categories above; for instance, the vectors x = (2, 1) and y = (1, 2) in R^2 do not satisfy x ≥ y, but neither is it true that y ≥ x.

The nonnegative and strictly positive orthants of R^n, denoted R^n_+ and R^n_++, respectively, are defined as

R^n_+ = {x ∈ R^n | x ≥ 0}

and

R^n_++ = {x ∈ R^n | x ≫ 0}.
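As a quick computational illustration of these componentwise orders, the following Python sketch (our own, assuming only NumPy) implements x ≥ y, x > y, and x ≫ y and confirms that x = (2, 1) and y = (1, 2) are not comparable.

```python
import numpy as np

def geq(x, y):   # x >= y : every coordinate is at least as large
    return bool(np.all(x >= y))

def gt(x, y):    # x > y : x >= y and x != y
    return geq(x, y) and not np.array_equal(x, y)

def gg(x, y):    # x >> y : every coordinate is strictly larger
    return bool(np.all(x > y))

x = np.array([2.0, 1.0])
y = np.array([1.0, 2.0])

print(2 * x, x + y)          # scalar multiplication and vector addition
print(geq(x, y), geq(y, x))  # False False: x and y are not comparable
```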

1.1.2 Inner Product, Norm, Metric

This subsection describes three structures on the space R^n: the Euclidean inner product of two vectors x and y in R^n, the Euclidean norm of a vector x in R^n, and the Euclidean metric measuring the distance between two points x and y in R^n. Each of these generalizes a familiar concept from R. Namely, when n = 1, and x and y are just real numbers, the Euclidean inner product of x and y is just the product xy of the numbers x and y; the Euclidean norm of x is simply the absolute value |x| of x; and the Euclidean distance between x and y is the absolute value |x - y| of their difference.

Given x, y ∈ R^n, the Euclidean inner product of the vectors x and y, denoted x · y, is defined as:

x · y = Σ_{i=1}^n x_i y_i.

We shall henceforth refer to the Euclidean inner product simply as the inner product.

Theorem 1.1 The inner product has the following properties for any vectors x, y, z ∈ R^n and scalars a, b ∈ R:

1. Symmetry: x · y = y · x.
2. Bilinearity: (ax + by) · z = a(x · z) + b(y · z) and x · (ay + bz) = a(x · y) + b(x · z).
3. Positivity: x · x ≥ 0, with equality holding if and only if x = 0.

Proof Symmetry and bilinearity are easy to verify from the definition of the inner product. To check that positivity holds, note that the square of a real number is always nonnegative, and can be zero if and only if the number is itself zero. It follows that, as the sum of squared real numbers, x · x = Σ_{i=1}^n x_i^2 is always nonnegative, and is zero if and only if x_i = 0 for each i, i.e., if and only if x = 0. □

The inner product also satisfies a very useful condition called the Cauchy-Schwartz inequality:

Theorem 1.2 (Cauchy-Schwartz Inequality) For any x, y ∈ R^n we have

|x · y| ≤ (x · x)^{1/2} (y · y)^{1/2}.

Proof For notational ease, let X = x · x, Y = y · y, and Z = x · y. Then, the result will be proved if we show that XY ≥ Z^2, since the required inequality will follow simply by taking square roots on both sides.

If x = 0, then Z = X = 0, and the inequality holds trivially. Suppose, therefore, that x ≠ 0. Note that by the positivity property of the inner product, we must then

have X > 0. The positivity property also implies that for any scalar a ∈ R, we have

0 ≤ (ax + y) · (ax + y)
  = a^2 x · x + 2a x · y + y · y
  = a^2 X + 2aZ + Y.

In particular, this inequality must hold for a = -Z/X. When this value of a is used in the equation above, we obtain

(Z/X)^2 X - 2(Z/X)Z + Y = -(Z^2/X) + Y ≥ 0,

or Y ≥ Z^2/X. Since X > 0, this in turn implies XY ≥ Z^2, as required. □

The Euclidean norm (henceforth, simply the norm) of a vector x ∈ R^n, denoted ||x||, is defined as

||x|| = ( Σ_{i=1}^n x_i^2 )^{1/2}.

The norm is related to the inner product through the identity

||x|| = (x · x)^{1/2}

for all x ∈ R^n; in particular, the Cauchy-Schwartz inequality may be written as

|x · y| ≤ ||x|| ||y||.

Our next result, which describes some useful properties of the norm, uses this observation.

Theorem 1.3 The norm satisfies the following properties at all x, y ∈ R^n, and a ∈ R:

1. Positivity: ||x|| ≥ 0, with equality if and only if x = 0.
2. Homogeneity: ||ax|| = |a| · ||x||.
3. Triangle Inequality: ||x + y|| ≤ ||x|| + ||y||.

Proof The positivity property of the norm follows from the positivity property of the inner product, and the fact that ||x|| = (x · x)^{1/2}. Homogeneity obtains since

||ax|| = (ax · ax)^{1/2} = (a^2 x · x)^{1/2} = |a| (x · x)^{1/2} = |a| ||x||.

The triangle inequality is a little trickier; we will need the Cauchy-Schwartz inequality to establish it. Observe that for any x and y in R^n, we have

||x + y||^2 = (x + y) · (x + y) = ||x||^2 + 2 x · y + ||y||^2.

By the Cauchy-Schwartz inequality, x · y ≤ ||x|| ||y||. Substituting this in the previous equation, we obtain

||x + y||^2 ≤ ||x||^2 + 2 ||x|| ||y|| + ||y||^2 = (||x|| + ||y||)^2.

The proof is completed by taking square roots on both sides. □

The Euclidean distance d(x, y) between two vectors x and y in R^n is given by

d(x, y) = ( Σ_{i=1}^n (x_i - y_i)^2 )^{1/2}.

The distance function d is called a metric, and is related to the norm || · || through the identity

d(x, y) = ||x - y||

for all x, y ∈ R^n.

Theorem 1.4 The metric d satisfies the following properties at all x, y, z ∈ R^n:

1. Positivity: d(x, y) ≥ 0, with equality if and only if x = y.
2. Symmetry: d(x, y) = d(y, x).
3. Triangle Inequality: d(x, z) ≤ d(x, y) + d(y, z) for all x, y, z ∈ R^n.

Proof The positivity property of the metric follows from the positivity property of the norm, and the observation that d(x, y) = ||x - y||. Symmetry is immediate from the definition. The inequality d(x, z) ≤ d(x, y) + d(y, z) is the same as

||x - z|| ≤ ||x - y|| + ||y - z||.

This is just the triangle inequality for norms, which we have already established. □

The concepts of inner product, norm, and metric can be defined on any abstract vector space, and not just R^n. In fact, the properties we have listed in Theorems 1.1, 1.3, and 1.4 are, in abstract vector spaces, defining characteristics of the respective concepts. Thus, for instance, an inner product on a vector space is defined to be any operator on that space that satisfies the three properties of symmetry, bilinearity, and positivity; while a norm on that space is defined to be any operator that meets the conditions of positivity, homogeneity, and the triangle inequality. For more on this, see Appendix C.
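As a numerical sanity check (ours, assuming only NumPy), the sketch below verifies the Cauchy-Schwartz inequality and the triangle inequality on randomly drawn vectors; the small tolerance only guards against floating-point round-off.

```python
import numpy as np

rng = np.random.default_rng(0)

def norm(x):
    # Euclidean norm ||x|| = (x . x)^(1/2)
    return np.sqrt(np.dot(x, x))

for _ in range(1000):
    x = rng.normal(size=5)
    y = rng.normal(size=5)
    # Cauchy-Schwartz: |x . y| <= ||x|| ||y||
    assert abs(np.dot(x, y)) <= norm(x) * norm(y) + 1e-12
    # Triangle inequality for the norm, hence for the metric d(x, y) = ||x - y||
    assert norm(x + y) <= norm(x) + norm(y) + 1e-12

print("Cauchy-Schwartz and triangle inequality hold on all samples")
```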

1.2 Sets and Sequences in R^n

1.2.1 Sequences and Limits

A sequence in R^n is the specification of a point x_k ∈ R^n for each integer k ∈ {1, 2, ...}. The sequence is usually written as

x_1, x_2, x_3, ...,

or, more compactly, simply as {x_k}. Occasionally, where notational clarity will be enhanced by this change, we will use superscripts instead of subscripts, and denote the sequence by {x^k}.

A sequence of points {x_k} in R^n is said to converge to a limit x (written x_k → x) if the distance d(x_k, x) between x_k and x tends to zero as k goes to infinity, i.e., if for all ε > 0, there exists an integer k(ε) such that for all k ≥ k(ε), we have d(x_k, x) < ε.

A sequence {x_k} which converges to a limit x is called a convergent sequence. For example, the sequence {x_k} in R defined by x_k = 1/k for all k is a convergent sequence, with limit x = 0. To see this, let any ε > 0 be given. Let k(ε) be any integer such that k(ε) > 1/ε. Then, for all k > k(ε), we have d(x_k, 0) = d(1/k, 0) = 1/k < 1/k(ε) < ε, so indeed, x_k → 0.

Theorem 1.5 A sequence can have at most one limit. That is, if {x_k} is a sequence in R^n converging to a point x ∈ R^n, it cannot also converge to a point y ∈ R^n for y ≠ x.

Proof This follows from a simple application of the triangle inequality. If x_k → x and y ≠ x, then

d(x_k, y) ≥ d(x, y) - d(x_k, x).

Since d(x, y) > 0 and d(x_k, x) → 0, this inequality shows that d(x_k, y) cannot go to zero as k goes to infinity, so x_k → y is impossible. □

A sequence {x_k} in R^n is called a bounded sequence if there exists a real number M such that ||x_k|| ≤ M for all k. A sequence {x_k} which is not bounded is said to be unbounded; that is, {x_k} is an unbounded sequence if for any M ∈ R, there exists k(M) such that ||x_{k(M)}|| > M.

Theorem 1.6 Every convergent sequence in R^n is bounded.

Proof Suppose x_k → x. Let ε = 1 in the definition of convergence. Then, there exists k(1) such that for all k ≥ k(1), d(x_k, x) < 1. Since d(x_k, x) = ||x_k - x||, an

application of the triangle inequality yields for any k ≥ k(1)

||x_k|| = ||(x_k - x) + x||
       ≤ ||x_k - x|| + ||x||
       < 1 + ||x||.

Now define M to be the maximum of the finite set of numbers

{||x_1||, ..., ||x_{k(1)-1}||, 1 + ||x||}.

Then, M ≥ ||x_k|| for all k, completing the proof. □

While Theorem 1.5 established that a sequence can have at most one limit, Theorem 1.6 implies that a sequence may have no limit at all. Indeed, because every convergent sequence must be bounded, it follows that if {x_k} is an unbounded sequence, then {x_k} cannot converge. Thus, for instance, the sequence {x_k} in R defined by x_k = k for all k is a non-convergent sequence.¹

¹ This may also be shown directly: for any fixed candidate limit x, the distance d(x_k, x) = |x - x_k| = |x - k| becomes unbounded as k goes to infinity. It follows that no x ∈ R can be a limit of this sequence, and therefore that it does not possess a limit.

However, unboundedness is not the only reason a sequence may fail to converge. Consider the following example: let {x_k} in R be given by

x_k = 1/k,      k = 1, 3, 5, ...
x_k = 1 - 1/k,  k = 2, 4, 6, ...

This sequence is bounded since we have |x_k| ≤ 1 for all k. However, it does not possess a limit. The reason here is that the odd terms of the sequence are converging to the point 0, while the even terms are converging to the point 1. Since a sequence can have only one limit, this sequence does not converge.

Our next result shows that convergence of a sequence {x^k} in R^n is equivalent to convergence in each coordinate. This gives us an alternative way to establish convergence in R^n. We use superscripts to denote the sequence in this result to avoid confusion between the k-th element x^k of the sequence, and the i-th coordinate x_i of a vector x.

Theorem 1.7 A sequence {x^k} in R^n converges to a limit x if and only if x_i^k → x_i for each i ∈ {1, ..., n}, where x^k = (x_1^k, ..., x_n^k) and x = (x_1, ..., x_n).

Proof We will use the fact that the Euclidean distance between two points x = (x_1, ..., x_n) and y = (y_1, ..., y_n) in R^n can be written as

d(x, y) = ( Σ_{i=1}^n |x_i - y_i|^2 )^{1/2},

where |x_i - y_i| is the Euclidean distance between x_i and y_i in R.

First, suppose that x^k → x. We are to show that x_i^k → x_i for each i, i.e., that, given any i and ε > 0, there exists k_i(ε) such that for k ≥ k_i(ε), we have |x_i^k - x_i| < ε. So let ε > 0 be given. By definition of x^k → x, we know that there exists k(ε) such that d(x^k, x) < ε for all k ≥ k(ε). Therefore, for k ≥ k(ε) and any i, we obtain:

|x_i^k - x_i| = ( |x_i^k - x_i|^2 )^{1/2} ≤ ( Σ_{j=1}^n |x_j^k - x_j|^2 )^{1/2} = d(x^k, x) < ε.

Setting k_i(ε) = k(ε) for each i, the proof that x_i^k → x_i for each i is complete.

Now, suppose that {x_i^k} converges to x_i for each i. Let ε > 0 be given. We will show that there is k(ε) such that d(x^k, x) < ε for all k ≥ k(ε), which will establish that x^k → x.

Define η = ε/√n. For each i, there exists k_i(η) such that for k ≥ k_i(η), we have |x_i^k - x_i| < η. Define k(ε) to be the maximum of the finite set of numbers k_1(η), ..., k_n(η). Then, for k ≥ k(ε), we have |x_i^k - x_i| < η for all i, so

d(x^k, x) = ( Σ_{i=1}^n |x_i^k - x_i|^2 )^{1/2} < ( Σ_{i=1}^n [ε/√n]^2 )^{1/2} = ε,

which completes the proof. □

Theorem 1.7 makes it easy to prove the following useful result:

Theorem 1.8 Let {x^k} be a sequence in R^n converging to a limit x. Suppose that for every k, we have a ≤ x^k ≤ b, where a = (a_1, ..., a_n) and b = (b_1, ..., b_n) are some fixed vectors in R^n. Then, it is also the case that a ≤ x ≤ b.

Proof The theorem will be proved if we show that a_i ≤ x_i ≤ b_i for each i ∈ {1, ..., n}. Suppose that the result were false, so for some i, we had x_i < a_i (say). Since x^k → x, it is the case by Theorem 1.7 that x_j^k → x_j for each j ∈ {1, ..., n}; in particular, x_i^k → x_i. But x_i^k → x_i combined with x_i < a_i implies that for all large k, we must have x_i^k < a_i. This contradicts the hypothesis that a_i ≤ x_i^k ≤ b_i for all k. A similar argument establishes that x_i > b_i also leads to a contradiction. Thus, a_i ≤ x_i ≤ b_i, and the proof is complete. □
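A small numerical illustration of Theorem 1.7 (ours; the sequence x^k = (1/k, 1 + 1/k^2) is just an example, and only NumPy is assumed): the distance d(x^k, x) and the coordinatewise errors shrink together.

```python
import numpy as np

# x^k = (1/k, 1 + 1/k^2) converges to x = (0, 1) in R^2;
# equivalently (Theorem 1.7), each coordinate converges in R.
def x(k):
    return np.array([1.0 / k, 1.0 + 1.0 / k**2])

limit = np.array([0.0, 1.0])
for k in [1, 10, 100, 1000]:
    dist = np.linalg.norm(x(k) - limit)   # d(x^k, x)
    coord = np.abs(x(k) - limit)          # |x_i^k - x_i| for each coordinate
    print(k, dist, coord)                 # both go to zero together
```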

1.2.2 Subsequences and Limit Points

Let a sequence {x_k} in R^n be given. Let m be any rule that assigns to each k ∈ N a value m(k) ∈ N. Suppose further that m is increasing, i.e., for each k ∈ N, we have m(k) < m(k + 1). Given {x_k}, we can now define a new sequence {x_{m(k)}}, whose k-th element is the m(k)-th element of the sequence {x_k}. This new sequence is called a subsequence of {x_k}. Put differently, a subsequence of a sequence is any infinite subset of the original sequence that preserves the ordering of terms.

Even if a sequence {x_k} is not convergent, it may contain subsequences that converge. For instance, the sequence 0, 1, 0, 1, 0, 1, ... has no limit, but the subsequences 0, 0, 0, ... and 1, 1, 1, ..., which are obtained from the original sequence by selecting the odd and even elements, respectively, are both convergent.

If a sequence contains a convergent subsequence, the limit of the convergent subsequence is called a limit point of the original sequence. Thus, the sequence 0, 1, 0, 1, 0, 1, ... has two limit points 0 and 1. The following result is simply a restatement of the definition of a limit point:

Theorem 1.9 A point x is a limit point of the sequence {x_k} if and only if for any ε > 0, there are infinitely many indices m for which d(x, x_m) < ε.

Proof If x is a limit point of {x_k}, then there must be a subsequence {x_{m(k)}} that converges to x. By definition of convergence, it is the case that for any ε > 0, all but finitely many elements of the sequence {x_{m(k)}} must be within ε of x. Therefore, infinitely many elements of the sequence {x_k} must also be within ε of x.

Conversely, suppose that for every ε > 0, there are infinitely many m such that d(x_m, x) < ε. Define a subsequence {x_{m(k)}} as follows: let m(1) be any m for which d(x_m, x) < 1. Now for k = 2, 3, ... define successively m(k) to be any m that satisfies (a) d(x, x_m) < 1/k, and (b) m > m(k - 1). This construction is feasible, since for each k, there are infinitely many m satisfying d(x_m, x) < 1/k. Moreover, the sequence {x_{m(k)}} evidently converges to x, so x is a limit point of {x_k}. □

In general, a sequence {x_k} may have any number of limit points. For instance, every positive integer arises as a limit point of the sequence

1, 1, 2, 1, 2, 3, 1, 2, 3, 4, ...

If a sequence {x_k} is convergent (say, to a limit x), then it is apparent that every subsequence of {x_k} must converge to x. It is less obvious, but also true, that if every subsequence {x_{m(k)}} of a given sequence {x_k} converges to the limit x, then {x_k} itself converges to x. We do not offer a proof of this fact here, since it may be easily derived as a consequence of other considerations. See Corollary 1.19 below.
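The two limit points of the alternating sequence above can be seen directly by extracting the odd- and even-indexed subsequences; a minimal Python sketch (ours):

```python
# The sequence 0, 1, 0, 1, ... has no limit, but its odd- and even-indexed
# subsequences converge to the two limit points 0 and 1.
def x(k):            # k = 1, 2, 3, ...
    return 0 if k % 2 == 1 else 1

odd_subseq  = [x(k) for k in range(1, 20, 2)]   # x_1, x_3, x_5, ...
even_subseq = [x(k) for k in range(2, 21, 2)]   # x_2, x_4, x_6, ...
print(odd_subseq)    # all 0s: converges to the limit point 0
print(even_subseq)   # all 1s: converges to the limit point 1
```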

Of course, it is also possible that no subsequence of a given sequence converges, so a given sequence may have no limit points at all. A simple example is the sequence {x_k} in R defined by x_k = k for all k: every subsequence of this sequence diverges to +∞.

1.2.3 Cauchy Sequences and Completeness

A sequence {x_k} in R^n is said to satisfy the Cauchy criterion if for all ε > 0 there is an integer k(ε) such that for all m, l ≥ k(ε), we have d(x_m, x_l) < ε. Informally, a sequence {x_k} satisfies the Cauchy criterion if, by choosing k large enough, the distance between any two elements x_m and x_l in the "tail" of the sequence can be made as small as desired. A sequence which satisfies the Cauchy criterion is called a Cauchy sequence.

An example of a Cauchy sequence is given by the sequence {x_k} in R defined by x_k = 1/k^2 for all k. To check that this sequence does, in fact, satisfy the Cauchy criterion, let any ε > 0 be given. Let k(ε) be any integer k that satisfies k^2 ε > 2. For m, l ≥ k(ε), we have

d(x_m, x_l) = |1/m^2 - 1/l^2| ≤ 1/m^2 + 1/l^2 ≤ 1/[k(ε)]^2 + 1/[k(ε)]^2 = 2/[k(ε)]^2,

and this last term is less than ε by choice of k(ε).

Our first result deals with the analog of Theorem 1.7 for Cauchy sequences. It establishes that a sequence in R^n is a Cauchy sequence if and only if each of the coordinate sequences is a Cauchy sequence in R.

Theorem 1.10 A sequence {x^k} in R^n is a Cauchy sequence if and only if for each i ∈ {1, ..., n}, the sequence {x_i^k} is a Cauchy sequence in R.

Proof Let {x^k} be a Cauchy sequence in R^n. We are to show that for any i and any ε > 0, there is k_i(ε) such that for all m, l ≥ k_i(ε), we have |x_i^m - x_i^l| < ε. So let ε > 0 and i be given. Since {x^k} is a Cauchy sequence, there is k(ε) such that for all m, l ≥ k(ε), we have d(x^m, x^l) < ε. Therefore, for any m, l ≥ k(ε), we have

|x_i^m - x_i^l| = ( |x_i^m - x_i^l|^2 )^{1/2} ≤ ( Σ_{j=1}^n |x_j^m - x_j^l|^2 )^{1/2} = d(x^m, x^l) < ε.

By setting k_i(ε) = k(ε), the proof that {x_i^k} is a Cauchy sequence for each i is complete.

Now suppose {x_i^k} is a Cauchy sequence for each i. We are to show that for any ε > 0, there is k(ε) such that m, l ≥ k(ε) implies d(x^m, x^l) < ε. So let ε > 0 be given. Define η = ε/√n. Since each {x_i^k} is a Cauchy sequence, there exists k_i(η) such that for all m, l ≥ k_i(η), we have |x_i^m - x_i^l| < η. Define k(ε) = max{k_1(η), ..., k_n(η)}. Then, for m, l ≥ k(ε), we have

d(x^m, x^l) = ( Σ_{i=1}^n |x_i^m - x_i^l|^2 )^{1/2} < ( Σ_{i=1}^n [ε/√n]^2 )^{1/2} = ε,

and the proof is complete. □

Observe that the important difference between the definition of a convergent sequence and that of a Cauchy sequence is that the limit of the sequence is explicitly involved in the former, but plays no role in the latter. Our next result shows, however, that the two concepts are intimately linked:

Theorem 1.11 A sequence {x_k} in R^n is a Cauchy sequence if and only if it is a convergent sequence, i.e., if and only if there is x ∈ R^n such that x_k → x.

Proof The proof that every convergent sequence must also be a Cauchy sequence is simple. Suppose x_k → x. Let ε > 0 be given. Define η = ε/2. Since x_k → x, there exists k(η) such that for all j ≥ k(η), we have d(x_j, x) < η. It follows by using the triangle inequality that for all j, l ≥ k(η), we have

d(x_j, x_l) ≤ d(x_j, x) + d(x, x_l) < η + η = ε.

Thus, setting k(ε) = k(η), the proof that {x_k} is a Cauchy sequence is complete.

The proof that every Cauchy sequence must converge unfortunately requires more apparatus than we have built up. In particular, it requires a formal definition of the notion of a "real number," and the properties of real numbers. Appendix B, which presents such a formal description, proves that every Cauchy sequence in R must converge. An appeal to Theorem 1.10 then completes the proof. □

Any metric space which has the property that every Cauchy sequence is also a convergent sequence is said to be complete. Thus, by Theorem 1.11, R^n is complete. It is important to note that not all metric spaces are complete (so the definition is not vacuous). For more on this, see Appendix C.

Even without appealing to the completeness of R^n, it is possible to show that a Cauchy sequence must possess two properties that all convergent sequences must have: namely, that it must be bounded, and that it has at most one limit point.

Theorem 1.12 Let {x_k} be a Cauchy sequence in R^n. Then,

1. {x_k} is bounded.
2. {x_k} has at most one limit point.

Proof To see that every Cauchy sequence {x_k} in R^n must be bounded, take ε = 1 in the Cauchy criterion. Then, there exists an integer k(1) such that for all j, l ≥ k(1), we have d(x_j, x_l) < 1. An application of the triangle inequality now implies for all j ≥ k(1) that

||x_j|| = ||(x_j - x_{k(1)}) + x_{k(1)}||
       ≤ ||x_j - x_{k(1)}|| + ||x_{k(1)}||
       < 1 + ||x_{k(1)}||.

Let M be the maximum of the finite set of numbers

{||x_1||, ..., ||x_{k(1)-1}||, 1 + ||x_{k(1)}||}.

Then, by construction, M ≥ ||x_k|| for all k, and the proof is complete.

To see that a Cauchy sequence cannot have two or more limit points, suppose that x is a limit point of the Cauchy sequence {x_k}. We will show that x_k → x. Let ε > 0 be given, and let η = ε/2. Since {x_k} is Cauchy, there exists k(η) such that d(x_j, x_l) < η for all j, l ≥ k(η). Moreover, since x is a limit point of {x_k}, there are elements of the sequence {x_k} that lie arbitrarily close to x; in particular, we can find m ≥ k(η) such that d(x_m, x) < η. Therefore, if j ≥ k(η),

d(x_j, x) ≤ d(x_j, x_m) + d(x_m, x) ≤ η + η = ε.

Since ε > 0 was arbitrary, this string of inequalities shows precisely that x_k → x. □

Finally, two Cauchy sequences {x_k} and {y_k} are said to be equivalent if for all ε > 0, there is k(ε) such that for all j ≥ k(ε) we have d(x_j, y_j) < ε. We write this as {x_k} ~ {y_k}. It is easy to see that ~ is, in fact, an equivalence relationship. It is reflexive ({x_k} ~ {x_k}) and symmetric ({x_k} ~ {y_k} implies {y_k} ~ {x_k}). It is also transitive: {x_k} ~ {y_k} and {y_k} ~ {z_k} implies {x_k} ~ {z_k}. To see this, let ε > 0 be given. Let η = ε/2. Since {x_k} ~ {y_k}, there is k_1(η) such that for all k ≥ k_1(η), we have d(x_k, y_k) < η. Similarly, there is k_2(η) such that for all k ≥ k_2(η), d(y_k, z_k) < η. Let k(ε) = max{k_1(η), k_2(η)}. Then, for k ≥ k(ε), we have d(x_k, z_k) ≤ d(x_k, y_k) + d(y_k, z_k) < η + η = ε, and we indeed have {x_k} ~ {z_k}.
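To make the Cauchy criterion and the notion of equivalence concrete, here is a small Python check (ours; the sequence y_k = 1/k^2 + 1/2^k is an invented example, while x_k = 1/k^2 and the bound k^2 ε > 2 come from the text):

```python
import math

eps = 1e-3
k_eps = math.ceil(math.sqrt(2.0 / eps)) + 1        # any integer k with k*k*eps > 2

def x(k):                                          # the text's example x_k = 1/k^2
    return 1.0 / k**2

def y(k):                                          # a nearby sequence (our example)
    return 1.0 / k**2 + 1.0 / 2**k

worst = max(abs(x(m) - x(l))
            for m in range(k_eps, k_eps + 200)
            for l in range(k_eps, k_eps + 200))
print(worst < eps)                 # True: tail elements lie within eps of each other
print(abs(x(60) - y(60)) < eps)    # d(x_k, y_k) -> 0, so {x_k} ~ {y_k}
```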

Equivalent Cauchy sequences must have the same limit.² For, suppose {x_k} ~ {y_k} and x_k → x. Given ε > 0, there is k_1(ε) such that k ≥ k_1(ε) implies d(x_k, x) < ε/2. There is also k_2(ε) such that k ≥ k_2(ε) implies d(x_k, y_k) < ε/2. Therefore, letting k(ε) = max{k_1(ε), k_2(ε)}, k ≥ k(ε) implies d(x, y_k) ≤ d(x, x_k) + d(x_k, y_k) < ε/2 + ε/2 = ε, which states precisely that y_k → x.

² In fact, the procedure of constructing the real numbers from the rational numbers using Cauchy sequences of rationals (see Appendix B) defines a real number to be an equivalence class of Cauchy sequences of rational numbers.

1.2.4 Suprema, Infima, Maxima, Minima

Let A be a nonempty subset of R. The set of upper bounds of A, denoted U(A), is defined as

U(A) = {u ∈ R | u ≥ a for all a ∈ A},

while the set of lower bounds of A, denoted L(A), is given by

L(A) = {l ∈ R | l ≤ a for all a ∈ A}.

In general, U(A) and/or L(A) could be empty. For instance, if A = N, the set of positive integers, then U(A) is empty; if A = Z, the set of all integers, then both U(A) and L(A) are empty. If U(A) is nonempty, then A is said to be bounded above. If L(A) is nonempty, then A is said to be bounded below.

The supremum of A, written sup A, is defined to be the least upper bound of A. Namely, if U(A) is nonempty, then sup A is defined to be the unique point a* ∈ U(A) such that a* ≤ u for all u ∈ U(A). If U(A) is empty, on the other hand, then A has no finite upper bounds, so, by convention, we set sup A = +∞.

Similarly, the infimum of A, denoted inf A, is defined to be the greatest lower bound of A. That is, when L(A) is nonempty, then inf A is the unique point a ∈ L(A) such that a ≥ l for all l ∈ L(A). If L(A) is empty, then A admits no finite lower bounds, so, by convention, we set inf A = -∞.

Theorem 1.13 If U(A) is nonempty, the supremum of A is well defined, i.e., there is a* ∈ U(A) such that a* ≤ u for all u ∈ U(A). Similarly, if L(A) is nonempty, the infimum of A is well defined, i.e., there is a ∈ L(A) such that a ≥ l for all l ∈ L(A).

Remark 1 By our conventions that sup A = +∞ when U(A) is empty and inf A = -∞ when L(A) is empty, this will establish that sup A and inf A are defined for any nonempty set A ⊂ R. □

Remark 2 To avoid legitimate confusion, we should stress the point that some

authors (e.g., Apostol, 1967, or Bartle, 1964) take an axiomatic approach to the real line. In this approach, Theorem 1.13 is an axiom, indeed, a key axiom. Other authors adopt a constructive approach to the real line, building the real numbers from the rationals by using, for example, the method of Dedekind cuts (Rudin, 1976), or equivalent Cauchy sequences of rationals (Hewitt and Stromberg, 1965; Strichartz, 1982; Appendix B in this book). In this case, the result that bounded sets have well-defined suprema is genuinely a "theorem." We implicitly adopt the latter approach in this chapter. □

Proof We prove that sup A is well defined whenever A is bounded above. A similar procedure can be employed to show that inf A is well defined whenever A is bounded below. The details are left as an exercise.

So suppose A is bounded above and U(A) is nonempty. We will construct two sequences {a_k} and {u_k} that have a common limit a*. The sequence {a_k} will consist entirely of points from A and will converge "upwards" to a*; while the sequence {u_k} will consist solely of points from U(A), and will converge "downwards" to a*. It will follow quite easily from the construction of these sequences that the common limit a* is, in fact, sup A.

The required sequences are constructed using a divide-and-conquer procedure. Pick any point a_1 in A and any point u_1 in U(A). Let z_1 = (a_1 + u_1)/2 be their midpoint. Of course, a_1 ≤ z_1 ≤ u_1. There are two possible cases: z_1 ∈ U(A) and z_1 ∉ U(A). In case 1, where z_1 ∈ U(A), set a_2 = a_1 and u_2 = z_1. In case 2, where z_1 ∉ U(A), there must exist some point a ∈ A such that a ≥ z_1.³ In this case, let a_2 = a and u_2 = u_1 (see Figure 1.2).

³ Note that z_1 ∉ U(A) does not imply that z_1 ∈ A. For instance, if A = [0, 1] ∪ [3, 4] and z_1 = 2, then z_1 ∉ U(A) and z_1 ∉ A.

Fig. 1.2. Constructing the Supremum

Note that in either case we have a_2 ∈ A and u_2 ∈ U(A). Moreover, we must also have a_1 ≤ a_2 and u_1 ≥ u_2. Finally, in the first case, we have d(a_2, u_2) = d(a_1, (a_1 + u_1)/2) = d(a_1, u_1)/2, while in the second case, we have d(a_2, u_2) = d(a, u_1) ≤ d(a_1, u_1)/2, since a ≥ (a_1 + u_1)/2. Thus, in either case, we have d(a_2, u_2) ≤ d(a_1, u_1)/2.

We iterate on this procedure. Let z_2 = (a_2 + u_2)/2 be the midpoint of a_2 and u_2. If z_2 ∈ U(A), let u_3 = z_2 and a_3 = a_2. If z_2 ∉ U(A), then there must exist a' ∈ A such that a' ≥ z_2; in this case, set a_3 = a' and u_3 = u_2. Then, it is once again true that the following three conditions hold in either case: first, we have a_3 ∈ A and u_3 ∈ U(A); second, a_3 ≥ a_2 and u_3 ≤ u_2; finally, d(a_3, u_3) ≤ d(a_2, u_2)/2 ≤ d(a_1, u_1)/4.

Continuing in this vein, we obtain sequences {a_k} and {u_k} such that

1. for each k, a_k ∈ A and u_k ∈ U(A);
2. {a_k} is a nondecreasing sequence, and {u_k} is a nonincreasing sequence; and
3. d(a_k, u_k) ≤ d(a_1, u_1)/2^{k-1}.

Property 2 implies in particular that {a_k} and {u_k} are bounded sequences. It now follows from property 3 that {a_k} and {u_k} are equivalent Cauchy sequences, and, therefore, that they have the same limit a*. We will show that a* = sup A by showing first that a* ∈ U(A), and then that a* ≤ u for all u ∈ U(A).

Pick any a ∈ A. Since u_k ∈ U(A) for each k, we have u_k ≥ a for each k, so a* = lim_k u_k ≥ a. Since a ∈ A was arbitrary, this inequality implies a* ∈ U(A). Now pick any u ∈ U(A). Since a_k ∈ A for each k, we must have u ≥ a_k for each k. Therefore, we must also have u ≥ lim_k a_k = a*. Since u ∈ U(A) was arbitrary, this inequality implies u ≥ a* for all u ∈ U(A).

Summing up, a* ∈ U(A) and a* ≤ u for all u ∈ U(A). By definition, this means a* = sup A, and the proof is complete. □

The following result is an immediate consequence of the definition of the supremum. We raise it to the level of a theorem, since it is an observation that comes in handy quite frequently:

Theorem 1.14 Suppose sup A is finite. Then, for any ε > 0, there is a(ε) ∈ A such that a(ε) > sup A - ε.

Proof For notational ease, let a* = sup A. Suppose the theorem failed for some ε > 0, that is, suppose there were ε > 0 such that a ≤ a* - ε for all a ∈ A. Then, a* - ε would be an upper bound of A. But this violates the definition of a* as the least upper bound, since a* - ε is obviously strictly smaller than a*. □

A similar result to Theorem 1.14 evidently holds for the infimum. It is left to the reader as an exercise to fill in the details.
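The divide-and-conquer construction in the proof of Theorem 1.13 is easy to run numerically. The Python sketch below is ours: it borrows the footnote's example set A = [0, 1] ∪ [3, 4] (approximated by a finite grid), keeps a_k in A and u_k in U(A), halves the gap at each step, and converges to sup A = 4.

```python
# Example set A = [0, 1] U [3, 4], approximated by a fine grid; sup A = 4.
A = [i / 1000.0 for i in range(0, 1001)] + [3 + i / 1000.0 for i in range(0, 1001)]

def is_upper_bound(z):
    return all(a <= z for a in A)

a_k, u_k = A[0], 10.0                       # a_1 in A, u_1 in U(A)
while u_k - a_k > 1e-9:
    z = (a_k + u_k) / 2.0                   # midpoint
    if is_upper_bound(z):
        u_k = z                             # case 1: z in U(A)
    else:
        a_k = max(a for a in A if a >= z)   # case 2: pick some a in A with a >= z
        # (any such a works; the gap still at least halves because a >= z)

print(a_k, u_k)                             # both approach sup A = 4
```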

Two concepts closely related to the supremum and the infimum are the maximum and the minimum of a nonempty set A ⊂ R. The maximum of A, denoted max A, is defined as a point z ∈ A such that z ≥ a for all a ∈ A. The minimum of A, denoted min A, is defined as a point w ∈ A such that w ≤ a for all a ∈ A. By definition, the maximum must be an upper bound of A, and the minimum must be a lower bound of A. Therefore, we can equivalently define max A = A ∩ U(A), and min A = A ∩ L(A).

It is very important to point out that while sup A and inf A are always defined for any nonempty set A (they could be infinite), A ∩ U(A) and A ∩ L(A) could both be empty, so max A and min A need not always exist. This is true even if sup A and inf A are both finite. For instance, if A = (0, 1), we have U(A) = {x | x ≥ 1} and L(A) = {x | x ≤ 0}, so sup A = 1 and inf A = 0, but max A and min A do not exist. Indeed, it follows from the definition that if max A (resp. min A) is well defined, we must have max A = sup A (resp. min A = inf A), so that max A exists if, and only if, sup A ∈ A (resp. min A exists if and only if inf A ∈ A).

1.2.5 Monotone Sequences in R

A sequence {x_k} in R is said to be a monotone increasing sequence if it is the case that

x_{k+1} ≥ x_k for all k.

It is monotone decreasing if

x_{k+1} ≤ x_k for all k.

We will also refer to monotone increasing sequences as nondecreasing sequences, and to monotone decreasing sequences as nonincreasing sequences.

Monotone sequences possess a particularly simple asymptotic (i.e., limiting) structure, and this is one of the reasons they are of special interest. To state the formal result requires one more definition.

Say that a sequence {x_k} in R diverges to +∞ (written x_k ↑ +∞) if for all positive integers p ∈ N, there is k(p) such that for all k ≥ k(p), we have x_k ≥ p; and that {x_k} diverges to -∞ (written x_k ↓ -∞) if for any positive integer p ∈ N, there exists k(p) such that for all k ≥ k(p), we have x_k ≤ -p.

Observe that while a sequence that diverges to ±∞ must necessarily be unbounded, the converse is not always true: the sequence {x_k} defined by

x_k = 1, if k is odd
x_k = k, if k is even

is an unbounded sequence but it does not diverge to +∞ (why?). On the other hand, it

is true that if {x_k} is an unbounded sequence, it must contain at least one subsequence that diverges (to either +∞ or -∞).

The following result classifies the asymptotic behavior of monotone increasing sequences.

Theorem 1.15 Let {x_k} be a monotone increasing sequence in R. If {x_k} is unbounded, it must diverge to +∞. If {x_k} is bounded, it must converge to the limit x, where x is the supremum of the set of points {x_1, x_2, ...}.

Proof First suppose that {x_k} is an unbounded sequence. Then, for any p ∈ R there exists an integer k(p) such that x_{k(p)} ≥ p. Since {x_k} is monotone increasing, it is now the case that for any k ≥ k(p), we have x_k ≥ x_{k(p)} ≥ p. This says precisely that {x_k} diverges to +∞.

Next suppose that {x_k} is a bounded sequence. We will show that x_k → x, where, as defined in the statement of the theorem, x is the supremum of the set of points {x_1, x_2, ...}. Let ε > 0 be given. The proof will be complete if we show that there is k(ε) such that d(x, x_k) < ε for all k ≥ k(ε).

Since x is the least upper bound of the set of points {x_1, x_2, ...}, x - ε is not an upper bound of this set. Therefore, there exists k(ε) such that x_{k(ε)} > x - ε. Since {x_k} is monotone increasing, it follows that x_k > x - ε for all k ≥ k(ε). On the other hand, since x is an upper bound, it is certainly true that x ≥ x_k for all k. Combining these statements, we have

x_k ∈ (x - ε, x], k ≥ k(ε),

which, of course, implies that d(x_k, x) < ε for all k ≥ k(ε). □

Since a sequence {x_k} in R is monotone increasing if and only if {-x_k} is monotone decreasing, Theorem 1.15 has the following immediate corollary:

Corollary 1.16 Let {x_k} be a monotone decreasing sequence in R. If {x_k} is unbounded, it must diverge to -∞. If {x_k} is bounded, it must converge to the limit x, where x is the infimum of the set of points {x_1, x_2, ...}.

1.2.6 The Lim Sup and Lim Inf

We now define the important concepts of the "lim sup" and the "lim inf" of a real-valued sequence. Throughout this discussion, we will assume as a convention that the values ±∞ are allowed as "limit points" of a sequence of real numbers {x_k} in the following sense: the sequence {x_k} will be said to have the limit point +∞ if

{x_k} contains a subsequence {x_{m(k)}} that diverges to +∞, and to have the limit point -∞ if {x_k} contains a subsequence {x_{l(k)}} that diverges to -∞. In particular, if the sequence {x_k} itself diverges to +∞ (resp. -∞), then we will refer to +∞ (resp. -∞) as the limit of the sequence {x_k}.

So let a sequence {x_k} in R be given. The lim sup of the sequence {x_k} is then defined as the limit as k → ∞ of the sequence {a_k} defined by

a_k = sup{x_k, x_{k+1}, x_{k+2}, ...},

and is usually abbreviated as lim_{k→∞} sup_{l≥k} x_l, or in more compact notation, simply lim sup_{k→∞} x_k. To see that the lim sup is always well defined, note that there are only three possibilities that arise concerning the sequence {a_k}. First, it could be that a_k = +∞ for some k, in which case a_k = +∞ for all k. If this case is ruled out (so a_k is finite for each k), the sequence {a_k} must satisfy a_{k+1} ≤ a_k for all k, since a_{k+1} is the supremum over a smaller set; that is, {a_k} must be a nonincreasing sequence in R. By Corollary 1.16, this specifies the only two remaining possibilities: either {a_k} is unbounded, in which case a_k ↓ -∞, or {a_k} is bounded, in which case {a_k} converges to a limit a in R. In all three cases, therefore, the limit of a_k is well defined (given our convention that ±∞ may be limiting values), and this limit is, of course, lim sup_{k→∞} x_k. Thus, the lim sup of any real-valued sequence always exists.

In a similar vein, the lim inf of the sequence {x_k} is defined as the limit as k → ∞ of {b_k}, where

b_k = inf{x_k, x_{k+1}, x_{k+2}, ...},

and is usually denoted lim_{k→∞} inf_{l≥k} x_l, or just lim inf_{k→∞} x_k. Once again, either b_k = -∞ for some k (in which case b_k = -∞ for all k), or {b_k} is a nondecreasing sequence in R. Therefore, lim inf_{k→∞} x_k is always well defined for any real-valued sequence {x_k}, although it could also take on infinite values.

The following result establishes two important facts: first, that the lim sup and lim inf of a sequence {x_k} are themselves limit points of the sequence {x_k}, and second, that the lim sup is actually the supremum of the set of limit points of {x_k}, and the lim inf is the infimum of this set. This second part gives us an alternative interpretation of the lim sup and lim inf.

Theorem 1.17 Let {x_k} be a real-valued sequence, and let A denote the set of all limit points of {x_k} (including ±∞ if {x_k} contains such divergent subsequences). For notational convenience, let a = lim sup_{k→∞} x_k and b = lim inf_{k→∞} x_k. Then:

1. There exist subsequences m(k) and l(k) of k such that x_{m(k)} → a and x_{l(k)} → b.
2. a = sup A and b = inf A.

Remark Note that the first part of the theorem does not follow from the second, since there exist sets which do not contain their suprema and infima.

Proof We prove the results for the lim sup. The arguments for the lim inf are analogous; the details are left to the reader. We consider, in turn, the cases where a is finite, a = +∞, and a = -∞.

We first consider the case where a is finite. We will show that for any ε > 0, an infinite number of terms of the sequence {x_k} lie in the interval (a - ε, a + ε). By Theorem 1.9, this will establish that a is a limit point of {x_k}.

Suppose that for some ε > 0, only a finite number of the terms of the sequence {x_k} lie in the interval (a - ε, a + ε). We will show that a contradiction must result. If x_k ∈ (a - ε, a + ε) for only finitely many k, there must exist K sufficiently large such that for all k ≥ K,

x_k ∉ (a - ε, a + ε).

We will show that for any k ≥ K, we must now have |a_k - a| ≥ ε, where, as in the definition of the lim sup, a_k = sup{x_k, x_{k+1}, ...}. This violates the definition of a as the limit of the sequence {a_k}, and provides the required contradiction.

So pick any k ≥ K. For any j ≥ k, there are only two possibilities: either x_j ≥ a + ε, or x_j ≤ a - ε. Therefore, a_k must satisfy either (i) a_k ≥ a + ε (which happens if there exists at least one j ≥ k such that x_j ≥ a + ε), or (ii) a_k ≤ a - ε (which occurs if x_j ≤ a - ε for all j ≥ k). In either case, |a_k - a| ≥ ε, as required. This completes the proof that a is a limit point of {x_k} when a is finite.

Now suppose a = +∞. This is possible only if a_k = +∞ for all k.⁴ But a_1 = +∞ implies

sup{x_1, x_2, ...} = +∞,

and from the definition of the supremum, this is possible only if for all positive integers p, there is k(p) such that x_{k(p)} ≥ p. Of course, this is the same as saying that there is a subsequence of {x_k} which diverges to +∞. It therefore follows that a is a limit point of {x_k} in this case also.

Finally, suppose a = -∞. This is possible only if the nonincreasing sequence {a_k} is unbounded below, i.e., that for any positive integer p, there is k(p) such that a_k < -p for all k ≥ k(p). But a_k is the supremum of the set {x_k, x_{k+1}, ...}, so this implies in turn that for all k ≥ k(p), we must have x_k < -p. This states precisely that the sequence {x_k} itself diverges to -∞, so of course a is a limit point of {x_k} in this case also. This completes the proof of part 1 that a ∈ A.

⁴ If a_k is finite for some k, then a_j must also be finite for any j > k, because a_j is the supremum over a smaller set.

Part 2 will be proved if we can show that a is an upper bound of the set of limit points A: we have already shown that a ∈ A, and if a is also an upper bound of A, we must clearly have a = sup A. If a = +∞, it is trivial that a ≥ x must hold for all x ∈ A. If a = -∞, we have shown that the sequence {x_k} itself must diverge to -∞, so A consists of only the point a, and vacuously, therefore, a ≥ x for all x ∈ A.

This leaves the case where a is finite. Let x ∈ A. Then, by definition of A, there exists a subsequence {x_{m(k)}} such that x_{m(k)} → x. Now, a_{m(k)}, as the supremum of the set {x_{m(k)}, x_{m(k)+1}, x_{m(k)+2}, ...}, clearly satisfies a_{m(k)} ≥ x_{m(k)}, and since a_k → a, it is also evidently true that a_{m(k)} → a. Therefore,

a = lim_{k→∞} a_{m(k)} ≥ lim_{k→∞} x_{m(k)} = x,

and the proof is complete. □

It is a trivial consequence of Theorem 1.17 that lim sup_{k→∞} x_k ≥ lim inf_{k→∞} x_k for any sequence {x_k}. (This could also have been established directly from the definitions, since a_k ≥ b_k for each k.) Strict inequality is, of course, possible: the sequence {x_k} = {0, 1, 0, 1, 0, 1, ...} has lim sup_{k→∞} x_k = 1 and lim inf_{k→∞} x_k = 0. Indeed, the only situation in which equality obtains is identified in the following result:

Theorem 1.18 A sequence {x_k} in R converges to a limit x ∈ R if and only if lim sup_{k→∞} x_k = lim inf_{k→∞} x_k = x. Equivalently, {x_k} converges to x if and only if every subsequence of {x_k} converges to x.

Proof From the second part of Theorem 1.17, lim sup_{k→∞} x_k is the supremum, and lim inf_{k→∞} x_k the infimum, of the set of limit points of {x_k}. So lim sup_{k→∞} x_k = lim inf_{k→∞} x_k implies that {x_k} has only one limit point, and therefore that it converges. Conversely, if lim sup_{k→∞} x_k > lim inf_{k→∞} x_k, then the sequence {x_k} has at least two limit points by the first part of Theorem 1.17. □

Finally, the following result is obtained by combining Theorems 1.18 and 1.7:

Corollary 1.19 A sequence {x_k} in R^n converges to a limit x if and only if every subsequence of {x_k} converges to x.

The Exercises contain some other useful properties of the lim sup and lim inf.
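A numerical illustration (ours; finite tails only approximate the true suprema and infima over infinite tails): for the sequence 0, 1, 0, 1, ... the tail suprema a_k are all 1 and the tail infima b_k are all 0, giving lim sup = 1 > 0 = lim inf, consistent with the two limit points identified above.

```python
def x(k):                      # the sequence 0, 1, 0, 1, ...
    return 0 if k % 2 == 1 else 1

def tail(k, horizon=1000):     # finite stand-in for {x_k, x_{k+1}, ...}
    return [x(j) for j in range(k, k + horizon)]

a_k = [max(tail(k)) for k in range(1, 11)]   # approximates sup of the k-th tail
b_k = [min(tail(k)) for k in range(1, 11)]   # approximates inf of the k-th tail
print(a_k)   # [1, 1, ..., 1]  -> lim sup = 1
print(b_k)   # [0, 0, ..., 0]  -> lim inf = 0
```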

1.2.7 Open Balls, Open Sets, Closed Sets

Let x ∈ R^n. The open ball B(x, r) with center x and radius r > 0 is given by

B(x, r) = {y ∈ R^n | d(x, y) < r}.

In words, B(x, r) is the set of points in R^n whose distance from x is strictly less than r. If we replace the strict inequality with a weak inequality (d(x, y) ≤ r), then we obtain the closed ball B̄(x, r).

A set S in R^n is said to be open if for all x ∈ S, there is an r > 0 such that B(x, r) ⊂ S. Intuitively, S is open if given any x ∈ S, one can move a small distance away from x in any direction without leaving S.

A set S in R^n is said to be closed if its complement S^c = {x ∈ R^n | x ∉ S} is open. An equivalent, and perhaps more intuitive, definition is provided by the following result, using the notion of convergent sequences. Roughly put, it says that a set S is closed if and only if any sequence formed by using only points of S cannot "escape" from S.

Theorem 1.20 A set S ⊂ R^n is closed if and only if for all sequences {x_k} such that x_k ∈ S for each k and x_k → x, it is the case that x ∈ S.

Proof Suppose S is closed. Let {x_k} be a sequence in S and suppose x_k → x. We are to show that x ∈ S. Suppose x ∉ S, i.e., x ∈ S^c. Then, since S is closed, S^c must be open, so there exists r > 0 such that B(x, r) ⊂ S^c. On the other hand, by definition of x_k → x, there must exist k(r) such that for all k ≥ k(r), we have d(x_k, x) < r, i.e., such that x_k ∈ B(x, r) ⊂ S^c. This contradicts the hypothesis that {x_k} is a sequence in S, which proves the result.

Now suppose that for all sequences {x_k} in S such that x_k → x, we have x ∈ S. We will show that S must be closed. If S is not closed, then S^c is not open. For the openness of S^c to fail, there must exist a point x ∈ S^c such that no open ball with center x is ever completely contained in S^c, i.e., such that every open ball with center x and radius r > 0 has at least one point x(r) that is not in S^c. For k = 1, 2, 3, ..., define r_k = 1/k, and let x_k = x(r_k). Then, by construction, x_k ∉ S^c for each k, so x_k ∈ S for each k. Moreover, since x_k ∈ B(x, r_k) for each k, it is the case that d(x_k, x) < r_k = 1/k, so x_k → x. But this implies that x must be in S, a contradiction. □

Among the commonly encountered closed and open sets are the "closed unit interval" [0, 1] defined as {x ∈ R | 0 ≤ x ≤ 1}, and the "open unit interval" (0, 1) defined as {x ∈ R | 0 < x < 1}. Observe that there exist sets that are neither open nor closed, such as the intervals [0, 1) = {x ∈ R | 0 ≤ x < 1} and (0, 1] = {x ∈ R | 0 < x ≤ 1}.

Page 31: A First Course in Optimization Theory

C = [x E ]R2 I IIx II = 1)

'D = [x E 1R2 IlIxll s I)

is a convex subset of]R2. On the other hand, the unit circle

1.2.9 Convex Combinations and Convex Sets

Given any finite collection of points XI, ... ,Xm E R", a point Z E ]Rn is said tobea convex combination of the points (x] •... , xm) if there exists A E lRm satisfyirig(i) Ai ?:: 0, j = I, ... .rn , and (ii) L:7'=1 Ai = I, such that z = L:;:' I AjXj.

A set S C IRn is convex if the convex combination of any two points in S' is al soin S. Intuitively. S is convex if the straight line joining any two points in Sis it'clfcompletely contained in S.

For example, the closed and open unit intervals (0, I) and [0, 1] are both convexsubsets of JR, While the unit disk

oProof See Rudin (1976,Theorem 2.41, pAO).

Theorem 1.21 A set S C lRn is compact if and only if it is closed and bounded.

1.2.8 Bounded Sets and Compact SetsA set S in IRn is said to be bounded if there exists r > 0 such that S c B(O,I"I.that is, if S is completely contained in some open ball in IRn centered at the origin.For instance, the interval (0, I) is a bounded subset of lR. but the set of integers,{I, 2, 3, ... } is not.

A set S c ]Rn is said to be compact if for all sequences of points IXk) such tha tXk E S for each k, there exists a subsequence IXm(k)) of (xk) and a point x E ,...;

such that Xm(k) ~ x. In words, this definition is abbreviated as "a set is compact i fevery sequence contains a convergent subsequence." If S C lRn is compact. it is easyto see that S must be bounded. For, if S were unbounded. it would be possible 1<)

pick a sequence \xd in S such that \\Xk l\ > k, and such a sequence cannot contama convergent subsequence (why?). Similarly, if S is compact, it must also be closed.If not, there would exist a sequence {Xk} in S such that Xk ~ x, where x 1. S.All subsequences of this sequence then also converge to x, and since x 1. S, tiledefinition of compactness is violated. Thus, every compact set in JRn must be close dand bounded.

The following result establishes that the converse of this statement is also true. Itis a particularly useful result since it gives us an easy way of identifying cornpoctsets in ]R",

1.2 Sets and Sequences

Page 32: A First Course in Optimization Theory

1.2.10 Unions, Intersections, and Other Binary Operations

This section summarizes some important properties of open, closed, compact, andconvex sets. While most of the properties listed here are used elsewhere in the book,some are presented solely for their illustrative value.

Some definitions first. A set A is said to index a collection of sets S in 1R"if a setoS;' E S is specified for each a E A, and each S E S corresponds to some a EA.We will denote such a collection by (Sa)aEA, or when A is understood, simply by(.\,). When the index set A consists of a finite number of elements. we will call ita finite index set; if there are no restrictions on the number of elements in A. wewill call it an arbitrary index set. If B C A. then the collection (Sp)PEB is called asubcollection of (Sa )aEA. If B consists of only a finite number of elements, (Sp) PEBis called e finite subcollection. For notational clarity. we will use (S"')I{JEF to denotea finite subcollection, and retain (Sp)pEB to denote an arbitrary subcollection.

Given any collection of sets (Sa) indexed by A. recall that the union of the col­lection (Sa). denoted UaEASa. and the intersection of the collection (Sa). denoted

is not a convex subset of ]R2 since (-1,0) E C and (1,0) E C, but the convexcombination (0,0) 1. C. Additional examples of convex and nonconvex sets aregiven in Figure 1.3.

Fig. 1.3. Convex and Nonconvex Sets

Ilonconvex

convex

nonconvex

COil vex

Chapin J Mathematical Preliminaries

nonconvex

COIIYf>X

24

Page 33: A First Course in Optimization Theory

Proof By definition, naeAHex is closed if and only if (nftEA}/a)'" is open. I3yDeMorgan's laws. (naEAHa)C = UexEAHg. For each a E A, Hg is open since Ha isclosed. Now use Theorem 1.23. 0

Theorem 1.24 Let A be (III arbitrary index set. Suppose for each a EA. J/" is (/closed set. Then, naEA Hex is closed. That is, the arbitrary intersection of closed sets;.;closed.

Proof Suppose x E UaEAGa. Then. x E Gft for some a. Since Ca is open. there isr > 0 sueh that B(x. r) C G« C UOEAG"o II

Theorem 1.23 Let A be all arbitrary index set. Suppose for each a E A, Ca is anopen set. Then. UaEACa is a/so open. That is. the arbitrary union of open sets isopen.

The next set of results deals with conditions under which closedness and opennessare preserved under unions and intersections.

Proof We prove part 1 here. Pan 2 may be established analogously. and is left asan exercise. We first show that each element of the set on the left-hand side (LHS)of (I) is also an element of the right-hand side (RHS). thereby establishing UIS CRHS. Then, we show that RHS C LHS also. completing the proof.

So suppose x E L H S. Then. by definition. x 1:- Ga for any a EA. so x E G;; forall a, and, therefore x E RHS. So LH S C RH S.

Now suppose x E RHS. Then x E G~ for all a. or x rf- Ca for any a E A. Itfollows that x E L H S. Thus we also have RH S C L H S. []

Theorem 1.22 (DeMorgan's Laws) Let t\ be WI arbitrary index set, 1I1lt! /N (; (t)

be a collection of sets ill IRn indexed by A. Theil.

I. (UaEAGa/ = naEAG~. and2. (naEAGa)C = UaEAG~.

The following pair of identities. known as Delvlorgan's laws, relates unions. in­tersections, and complementation.

U"'EASa = {x E )Rn Ix E S'u for some a E A In"'EASa = (x E IR:n I x E .<"';, for all a E A) 0

l.2 Sets and Sequences

Page 34: A First Course in Optimization Theory

As an immediate corollary, we have the result that every nested sequence of com­pact sets has a nonempty intersection:

oProof See Rudin (1976, Theorem 2.36, p.38).

Theorem 1.28 Suppose (Sa )aeA is a col/ection of compact sets in IRnsuch that everyfinite subcollection (Stp)tpEF has nonempty intersection. Then the entire collectionhas nonempty intersection, i.e., naeA Sa '# 0.

Since compact sets are necessarily closed, all of the properties we have identifiedhere for closed sets also apply to compact sets. The next collection of results describessome properties that are special to compact sets. Given a subcollection (Sp)pEBof a collection (Sa )aeA, say that the subcollection has a nonempty intersection ifnfJeBSp ¥ 0.

Example 1.27 Let A = {I, 2,3, ... J. For each a E A, let Ca be the open interval(0,1 + I/a), and Ha be the closed interval [0, I - I/a]. Then, naeAGa = (0, 1],which is not open, while UaeA Ha = [0, I), which is not closed. 0

Unlike Theorems 1.23 and 1.24, neither Theorem 1.25 nor Theorem 1.26 is validif we allow for arbitrary (i.e., possibly infinite) intersections and unions respectively.Consider the following counterexample:

Proof This is an immediate consequence of Theorem 1.25 and DeMorgan's laws.o

Theorem 1.26 Let HI , ... , HI be closed sets, where I is some positive integer: Then.u1=:1 Hi is closed. That is, the finite union of closed sets is closed.

Proof Let x E nf=1 G], We will show that there is r > 0 such that Btx , r) Cnf =: IGi. Since x is an arbitrary point in the intersection, the proof will be complete.

Since x is in the intersection, we must have x E Gi for each i. Since G; is open,there exists rj > 0 such that B(x, rt) C G], Let r denote the minimum of the finiteset of numbers rl, ... , rio Clearly, r > O.Moreover, Btx , r) C B(x, rj) for each i,so B(x,r) C Gj foreachi. Therefore, B(x,r) C n1=IG;. 0

Theorem 1.25 Let G I, ... , Gr be open sets, where I is some positive integer: Then,n{=1 Gi is open. That is, the finite intersection of open sets is open.

Chapter I Mathematical Preliminaries26

Page 35: A First Course in Optimization Theory

Example 1.33 Let St = {(x, y) E IR2 I xy = J} and S2 = {(x. Y) E IR2 I xy =-1 }(see Figure 1.4). Then, 51 and S2are closed. For each k = 1.2, 3 ..... (k. 1/ k) E

51 and (-k, Ijk) E S2. so (0, 2jk) E St + S2. This sequence converges 10 (Om

The following example, which shows that the word "compact" in Theorem 1J 2cannot be replaced with "closed," provides another illustration of the power of com­pactness.

Proof Let (xk) be a sequence in 51 + 52. We will show that it has a convcrgen t

SUbsequence, i.e., that there is a point x E 51 + 52 and a subsequence / (k) of k suchthat XI(k) -+ x.

For each k, there must exist points Yk E 51 and Zk E S2 such that Xk = Yk + "j .

Since SI is compact, there is a subsequence m(k) of k such that Ym(k) ~ Y E SI .Now {Zm(k)}, as a subsequence of {Zk)' itself defines a sequence in the compact set52. Thus, there must exist a further subsequence of m(k) (denoted say, /(k» suchthat Z/(k) -+ Z E S2. Since l(k) is a subsequence of m (k) and Ym(k) -+ Y, it is clearlythe case that .Yi(k) -+ y. Therefore, XI(k) = .Yi(k) + Z/(k) is a subsequence of (xJc 1 thatconverges to y + z. Since Y E SI and z E S2, (y + z) E SI + S2. so we have shownthe existence of the required subsequence. LJ

Theorem 1.32 If SI and S2 are compact. so is 51 + 52.

Next, given sets 5t, S2 in IRn, define their sum 51 + S2 by:

SI + S2 = {x E lRn I X =Xt +X2, XI E 51,X2 E S21·

Example 1.31 Let Sk = (0,1/ k); kEN. Once again, Sk is noncompact for eachk (this time it is bounded but not closed). Since 1/k > 1/(k + I). the collection(5k )kENis evidently nested. But nb ISk is again empty. For if x E (0. I),x rt S; fo:­any k satisfying kx > I; while if x .:s 0 or x ::: I, x ~ Sk for any k . D

Example 1.30 For kEN, let Sk = [k, 00). Then, Sk is noncom pact for each k (it isclosed but not bounded). The collection (Sk)kENis clearly nested since Sk+ I C SAfor each k. However, nbt Sk is empty. To see this, consider any x E IR. If k is anyinteger larger than x, then x ¢. Si; so x cannot be in the intersection. 0

Compactness is essential in Theorem 1.28 and Corollary 1.29. The followill);examples show that the results may fail if the sets are allowed to be noncompact.

Corollary 1.29 Suppose (Sr)keNis a collection of compact sets ill IRn that is n{'.114J'('

i.e., such that Sk+l C Si for each k. Then nblsk :f 0.

271.2 Sets and Sequences

Page 36: A First Course in Optimization Theory

Example 1.34 Let S be the closed unbounded interval [0, (0), and let A = N. ForkEN, define Sk be the open interval (k - 2, k+2). Clearly, (SkheN is an open coverof S that hasno finite subcover. 0

It is an elementary matter to construct sets S in IRnand open covers (Sa )aeA of S,such that (Sa)aeA admits no finite subcover. Here are two examples:

The open cover (Su )aeA of S is said to admit a finite subcover if there exists a finitesubcollection (S'q,)",eF such that

The next result describes a property that is equivalent to compactness. A definitionis required first. A collection of open sets (Sa)aeA in R" is said to be an open coverof a given set S C jR", if

which cannot be in S) + S2. To see this, observe that (0, 0) can be in S) + S2 if andonly if there is a point of the form (x, y) E S) such that (-x, -y) E S2. However,xy = (-x)( -y), so (x, y) E S, implies (-x, -y) E St also. 0

Fig. 1.4. The Sum of Closed Sets Need Not Be Closed

y

Chapter 1 Mathematical Preliminaries2R

Page 37: A First Course in Optimization Theory

On the other hand. the convexity of SI and S2 obviously has no implications forS) U S2. For instance. if SI = [0. I] and S2 = [2. 3], then SI and S2 are both convex,but Sl U S2 is not.

Proof Let S = SI + S2. Pick any x , yES and A E (0, I). Then. there exist poi ntsXI E SI and X2 E S2 such that X = XI + X2; and points YI E SI and Y2 E S2 suchthat Y = YI + )'2. Since SI and S2 are both convex, AXI + (1 - A»)'I E SI andAx2 + (1 - A)Y2 E S2. Therefore,

[(Ax I + (I - A)YI) + (Ax2 + (l - A»)'2)J E S.

But Ax] +(1-).,)YI +A.x2+(l-),,)Y2 = A(XI+X2)+(l-A)(YI +yZ) = h+(l-A)y.so Ax + (l - ),,)y E S, as required. 0

Theorem 1.38 Suppose SI and S2 are COilvex sets ill lRn. Then SI + S2 is cOllvex.

Proof Let S = naEASa. If XI, X2 E S. then XI, X2 E Sa for every a EA. Sinceeach Sa is convex, any convex combination of XI and X2 is in each Sa, and therefore~~ 0

Theorem 1.37 Suppose (Sa )aEA is a collection of convex sets in jRn. Theil n"EA .S;,is also convex.

There is yet another property, called tue finite intersection property, that is equiv­alent to compactness. This property is described in the Exercisesv f-e, Ej'~~

Lastly. we turn to some of the properties of convex sets:

oProof See Rudin (1976, Theorem 2041, pAO).

Theorem 1.36 A set Sin ]Rn is compact if and only if every open cover of S has afinite subcover:

The following result states that examples such as these are impossible if (and onlyif) S is compact.

Example 1.35 Let S be the bounded open interval (0, 1), and let A = N..-FoLkEN, let Sk be the open interval (l/(k + 2), II k). A little reflection shows thatUkENSk = (0, 1) = S, and therefore that the collection (Sk )kEN is an open coverof S. The collection (Sk)kEN. moreover, admits no finite subcover. For, suppose{kl, ...• k/J is any finite subset of N. Let k" = max{kl, ... ,k,}. If x is such thato < x < l/(k'" +2), then x if, u5=1 Skj' so the finite subcollection (Sk)~= I does notcover S. 0

1.2 Sets and Sequences

Page 38: A First Course in Optimization Theory

Observe that A + B is defined only if A and B have the same numbers of rows andcolumns.

If A is an 11 x m matrix and B is an m x k matrix, their product A B is the n x kmatrix whose (i, j)-th entry is the inner product of the i-th row Ai of A and the j-th

[

all ~bll

A+B = :ani +bnl

If.A and B are two n x m matrices, their sum A + B is the n x m matrix whose(i, j)-th entry is a;j + bij:

[

a~1

A= :ani

is called the j-th column of the matrix A, and will be denoted Aj, j = 1, ... ,m.Thus, an 11 xm matrix A has 11 rows and m columns, and may be represented variouslyas

is called the i -th row of the matrix A, and will be denoted Ai, i = 1, ... , n. Thevector

[ail,.··,aim]

where ail is a real number for each i E {l, ... , n} and j E {I, ... , m}. We willfrequently refer toaij as the (i, j)-th entry of the matrix A. In more compact notation,we write the matrix as A = (aij). The vector

(a"al2

a21 a22A= .

ani an2

1.3 Matrices1.3.1 Sum, Product, Transpose

An n x m matrix A is an array

Chapter I Mathematical Preliminaries30

Page 39: A First Course in Optimization Theory

l. (A + B)' = A' + B'.2. (AB)' = B'A',

It is easy to check from the definitions that the transpose has the following prop­erties with respect to the sum and product. (For the second property. note that theproduct B' A' is well-defined whenever the product AB is well defined and viceversa.)

then A' is the 2 x 3 matrix

The transpose of a matrix A, denoted A', is the matrix whose (i. j)-th entry is aitThus, if A is an n x m matrix, A' is an m x n matrix.

For example, if A is the 3 x 2 matrix

oProof Immediate from the definitions.

l. A + B = B + A.2. Addition is associative: (A + B) +C = A + (B + e).3. Multiplication is associative: (A B)C = A (Be).4. Multiplication distributes over addition: A (B + e) = A B + Ac.

Theorem 1.39 The matrix slim A+ II and product A B have lire [ollowing properties:

Of course. for any i E (I •...• n) and j E {I •...• k), Ai . BJ = L7~I ailblj. Notethat for the product A B to be well-defined. the number of columns in A must be thesame as the number of rows in B. Note also that the product A B is not. in general.the same as the product BA. Indeed. if A is an n x m matrix. and B is an m x kmatrix. then AB is well-defined. but BA is not even defined unless n = k.

Ar.Be]I k

AS, Bf

A~· B'kA~· B2[

A~ . Bf

Az' BfAB=

A~· Bf

column BJ of B:

31/.3 Matrices

Page 40: A First Course in Optimization Theory

Note that every diagonal matrix is also symmetric.

o

Diagonal Matrix

1\ diagonal matrix D of order 11 is a square matrix of order n , all of whose off-diagonalentries are zero:

Symmetric Matrix

A square matrix A of order n is called a symmetric matrix if for all i, j E {I, ... , n},we have aij = aji. Observe that A is a symmetric matrix if and only if it coincideswith its transpose. i.e., if and only if we have A = A'.

Square Matrix

A square matrix is an n x m matrix A for which n =: m (i.e .• the number of rowsand columns are the same). The common value of nand m for a square matrix Ais called the order of the matrix. Given a square matrix A = (aU) of order n , theelements aii, i = I, ... n, are called the diagonal entries of A. and the elements at]

for i -:j:. j are called the off-diagonal elements.

1.3.2 Some Important Classes of Matrices

The object of this subsection is to single out some important classes of matrices.Properties of these matrices are highlighted in the succeeding sections on rank. de­terminants. and inverses.

Finally. a word on notation. It is customary in linear algebra to regard vectorsx E ~II as CO/III1/I1 vectors, i.e.• as n x I matrices, and to write x' when one wishes torepresent x as a row vector, i.e .• as a I x n matrix. In particular, under this convention.we would write x' y to denote the inner product of the vectors x and y. rather thanx . y as we have chosen to do. In an abuse of notation, we will continue to use x . yto denote the inner product, but will, in the context of matrix multiplication, regardx as a column vector. Thus. for instance, we will use Ax to denote the product ofthe m x n matrix A and the vector x E IRn. Similarly, we will use x' A to denote thepre-multiplication of the II x k matrix A by the vector x.

Chapter I Mathematical Preliminaries32

Page 41: A First Course in Optimization Theory

1.3.3 Rank of a MatrixLet a finite collection of vectors XI, ... , Xk in Rn be given. The vectors XI, .•.• Xk

are said to be linearly dependent if there exist real numbers UI •...• Uk. with a, :j; 0for some i,such that

Note that the transpose of an upper-triangular matrix is a lower-triangular matrix,and vice versa.

o

Upper-triangular Matrix

An upper-triangular matrix of order n is a square matrix of order n which has theproperty that all the entries below the diagonal are zero:

od22

Lower-Triangular Matrix

A lower triangular matrix of order 11 is a square matrix D of order II which has theproperty that all the entries above the diagonal arc zero:

The identity matrix is a diagonal matri x (and, therefore, also a symmetric matrix). Inaddition, it has the property that if A and B arc any k x nand Il x m matrices, thenA J == A and J B = B. In particular, therefore, /2 = / x 1 = I.

o

o

Identity Matrix

The identity matrix of order n is a square n x n matrix whose diagonal entries arc allequal to unity, and whose off-diagonal entries are all zero:

1.3 Matrices

Page 42: A First Course in Optimization Theory

Corollary 1.41 The rank peA) of any matrix A is equal to the rank peA') of itstransposeA'.

In the sequel. we will denote this common value of pr(A) and pC(A) by p(A).Note that since pr(A) S n and pC(A) S m, we must have peA) S minIm, n}.An immediate consequence of Theorem 1.40.which we present as a corollary foremphasis. is that the rank of a matrix coincideswith the rank of its transpose:

Proof See Munkres (1964. Corollarly 7.3. p.27) or Johnston (1984. Chapter 4.p.IIS). 0

Theorem 1.40 The row rank pr(A) ofanyn x m matrixA coincides with its columnrank pC (A).

I. There is some subset (II•...• Ik) of k distinct integersof the sst {I •...• n} suchthat the set of vectors All' ...• A'k are linearly independent.

2. For all selections of (k + 1) distinct integers (11, ... , h+ I) from {I •...• n}, thevectors Arl •...• Air are linearlydependent.

I *+1

Note that if k = n. the secondconditionis redundant.since it is not possible to select(n + 1) distinct integers from a set consistingof only n integers.Similarly.each of the m columnsof A definesa vectorin JRn• and the column rank

of A. denoted pC(A). is defined as the maximum number of linearly independentcolumns in A. Since A is an n x m matrix.we musthavepr (A) S nand pC(A) Sm.Among the most useful results in linear algebra is:

Let A be an n x m matrix. Then. each of the n rows of A defines a vector in JRm .The rowrankof A. denoted pr (A). is definedto be themaximumnumber of linearlyindependent rows of A. That is. the row rank of A is k if the following conditionsboth hold:

are linearly dependent since X + 3y - 2z = O. On the other hand. the followingvectors x, y. and z are obviously linearly independent:

If. on the other hand. the only solution to alxl + ... + akxk = 0 is al = ... =(Xk = O. the vectorsXI ••.•• Xk are said to be linearly independent.

For example. the vectors x, y. and z given by

Chapter I Mathematical Preliminaries34

Page 43: A First Course in Optimization Theory

1.3.4 The Determinant

Throughout this subsection. it will be assumed that A is a square matrix of order n,The determinant is a function that assigns to every square matrix A, a real number

denoted IAI. The formal definition of this function is somewhat complicated. Wepresent it in several steps.

Given the finite set of integers (1, .... n), a permutation of this set is a rewritingof these n integers in some order. say iI, ... , in. Fix a permutation. and pick any j, .

[]Proof See Johnston (1984. Chapter 4. p.122).

l. If B is any n x k matrix, then peA B) .s: min(p(A). pCB)}.

2. Suppose P and Q are square matrices of orders m and n, respectively, that arcboth offull rank. Then. p(PA) = p(AQ) = p(PA Q) = peA).

Theerem 1.43 Let A be an m x n matrix.

The three operations described in Theorem 1.42 are termed the "elementary rowoperations." They playa significant role in solving systems of linear equations bythe method of transforming the coefficient matrix to reduced row-echelon form (sec

, Munkres. 1964. for details).Finally. an m x n matrix A is said to be of full rank if p(A) = min(m, n). In

particular. a square matrix A of order n is said to have full rank if p (A) = n.

f]Proof See Munkres (1964. Theorem 5.1. p.IS).

I. by interchanging any two rows of A, or2. by multiplying each entry in a given row by a nonzero constant a, or3. by replacing a given row, say the i-th by itself plus a scalar multiple a of SOI1l('

other row, say the j-th,

the rank pCB) of B is the same as the rank pCA) of A. The same result is true If theword "row" in each of the operations above is replaced by "column."

Theorem 1.42 Let A be a given n x m matrix. If B is an n x m matrix obtainedfrom A

Our next result lists some important transformations of a matrix which leave Irsrank unchanged.

1.3 Matrices

Page 44: A First Course in Optimization Theory

IAI = a11a22a33 - a11a23a32 - a12a21033

+ a12a23a31 + al3a21a32 - a13a22a31·

then

a22

032

we have IAI = a II a22 - a21 a 12.The expression for the determinant of a 3 x 3 matrixis more complicated. If

al2 ]

a22[

allA=

021

Prefix a + sign (0 this product if the permutation jl, ... , jn is an odd permutationof {I, ... , n }, and a - sign if the permutation is even.

When all possible such expressions a Iii •.. Olljn are written down (prefixed withthe appropriate ± sign), the sum of these expressions gives us a number, which isthe determinant of A, and which we have denoted IA I.

For example, given the 2 x 2 matrix

Count the number of integers that follow ii in the ordering iv. ... , ill, but that pre­cede it in the natural order I, ... n. This number is the number of inversions causedby ji, When we determine the number of inversions caused by each i; and sum themup, we obtain the total number of inversions in the permutation ii- ... , ill' If thistotal is an odd number, we call the permutation an odd permutation; if the total iseven, we call the permutation an even permutation.

For example, consider the permutation {4, 2, 3, I} of the integers {I, 2, 3, 4}. Thenumber of inversions caused by the number 4 in the permutation is 3, since all threenumbers that follow it in the permutation also precede it in the natural ordering.The number 2 in the permutation causes only one inversion, since only the integer Ifollows it in the permutation and also precedes it in the natura) ordering. Similarly, thenumber 3 also causes only one inversion, and, finally, the number I evidently causesno inversions. Thus, the total number of inversions in the permutation {4, 2, 3, I} is3 + I + I == 5, making it an odd permutation.

Now let an n x nmatrix A = (oii) begiven. Let jl , ... , jn denote any permutationof the integers {1, ... , n}, and consider the vector (01 i. ' ... , anin)' Note that thisvector consists of one entry from each row of A, and that no two.entries lie in thesame column. Take the product of these entries

Chapter J Mathematical Preliminaries36

Page 45: A First Course in Optimization Theory

columns is called a submatrix of B. For instance. if

[bl1b12 b!3 ]B = h2) bn b23

b31 b32 b33b41 b42 b43

then the following are all submatrices of B:

[b21] [ b31 b33 J [bl1 bl2 b!3]b41 b43

b21 bn b23 .b31 h)2 bJ3

The notion of the determinant is closely related to that of the rank of a matrix. Tostate this relationship requires one further definition. Given a (possibly nonsquare)n x m matrix B. any matrix obtained from B by deleting some of its rows and/or

Proof Sec Munkres (1964. Chapter 8).

1. If the matrix B is obtained from the matrix A by inte nhang ing (IIlV 111'0 roll'S illA, then IBI = -IAI.

2. If B is obtained from A by multiplying each entry of some given TOll' of A by thenonzero constant a, then IBI= alAI.

3. If B is obtained from A by replacing some row of A (sa» row i) hy mil' i {JIIIS (X

times row i. then IBI = IAI.4. If A has a row of zeros, then IA I = O.5. If A is a lower-triangular (or upper-triangular) matrix of order n, then the

determinant of A is the product of the diagonal terms: IA I = a II '" Unn. Inparticular, the determinant of the identity matrix is unity: III= I.

Each of the first four properties remains valid If the word" row" is replaced hy"column."

Theorem 1.44 Let A be a square matrix of order 11.

For higher order matrices, the expressions get progressively me ssier: there are-4' '=24 permutations of the set {I. 2, 3, 4J. so there are 24 expression s in the determinantof a 4 x 4 matrix. while the number for a 5 x 5 matrix is 120.

It is evident that the definition of the determinant we have provided is not ofmueh use from a computational standpoint. In a later subsect ion. we offer someeasier procedures for calculating determinants. These procedures arc based on thefollowing very useful properties:

:'71.3 Matrices

Page 46: A First Course in Optimization Theory

It is possible to describe a procedure for constructing the inverse of a square matrixA which satisfies the condition that IA I =1= O. Consider first the (n - 1) x (n - 1)submatrix of A that is obtained when row i and column j are deleted. Denote thedeterminant of this submatrix by IA(ij)l. Define

Cij(A) = (-l)i+jIA(ij)I.

Cij(A) is called the (i, j)-th cofactorof A.

oProof See Johnston (1984, Chapter 4).

1. A necessary and sufficientconditionfor A to havean inverse is that A have rank11, or,equivalently, that IAI =I: O.

2. If A-I exists, it is unique, i.e., two different matrices Band C cannot both bethe inverse of a given matrix A.

Theorem 1.47 Let A he an n x 11 matrix.

1.3.5 The InverseLet a square n x n matrix A be given. The inverseof A, denoted A-I, is defined tobe an n x n matrix B which has the property that A B = 1.

.Corollary 1.46 A square matrix A of ordern has rank 11 if, and only if, IA I =1= o.

A special case of this result is that square matrices have full rank if and only ifthey have a non-zero determinant:

oProof See Munkres (1964, Theorem 8.1).

Theorem 1.45 Let B be an n x m matrix. Let k be the order of the largest squaresubmatrix of B whose determinant is nonzero. Then, p(B) = k. In particular, therowsof B are linearly independent if and only if B contains some square submatrixof order 11 whose determinant is nonzero.

Of course, not all submatrices of B need be square. The following, for instance, isnot:

Chapter I Mathematical PreliminariesJR

Page 47: A First Course in Optimization Theory

C'j(A) = (-I)'+jIA(ij)l.

where A(ij) is the (n - I) x (n - I) matrix obtained from A by deleting row j andcolumn j.

Here, as in the previous subsection, Cij(A) is the (i. j)-th cofactor of the matrix A.i.e.,

1.3.6 Calculating the Determinant

We offer in this subsection two methods for calculating the determinant of a squarematrix A. Each method is easier to use than directly following the definition of thedeterminant.

The first method is based on the observation that. from the definition of the deter­minant, the expression for IAI for any given n x n matrix A can be written as

oProof See Johnston (1984, Chapter 4, pp.133-l35).

Theorem 1.48 The inverse has the following properties:

l. The inverse of A-I is A: (A-1)-1 = A.2. The inverse of the transpose is the transpose of the ;nverse: (A ') - I = (A - I)'.3. (AB)-I = B-IA-1•

4. lA-II = l/IAI.5. The inverse of a lower- (resp. upper-) triangular matrix is also a lower- iresp.

upper-) triangular matrix.

We list some useful properties of the inverse in our next result. In stating theseproperties, it is assumed in each case that the relevant inverse is well defined.

Now, construct ann x n matrixC(..>f)whose (i. j)-th entry is Ci, (A). The transpose . -of C(A) is called the adjoint of A, and is denoted Adj(A):

. rCII.(A) cnl.CA)]AdJ(A) =: :.

Cln(A) CnnCA)

Finally, when each element of Adj(A) is divided by IA I. the result is A -I. Thus.

A-I = I~I Adj(A).

I ..? Matrices

Page 48: A First Course in Optimization Theory

521

I. If B is obtained from A by interchanging any two rows (or columns), thenIBI =: -IAI·

2. If B is obtained from A by replacing any row i (resp. column i) by itself plus extimes some other row j (resp. some other column j), then IB I ::::IA I·

3. If A is a (lower- or upper-) triangular matrix, then the determinant of A is theproduct of its diagonal elements.

Specifically, the idea is to repeatedly use properties I and 2 to convert a given matrixA into a triangular matrix B, and then use property 3. This method is easier to use thanthe earlier one on large matrices (say, order 4 or larger). Nonetheless, we illustrateits use through the same 3 x 3 matrix used to illustrate the earlier method:

JAJ ::::ailCiI + +ainCin

::::aljClj + + anjCnj'

The second method is based on exploiting the following three properties of thedeterminant:

Nothing in this method relies on using the first row of the matrix. Any row (or, forthat matter, column) would do as well; in fact, we have

JAJ= 1(18 - 8) - 5(27 -48) + 7(3 - 12) = 10+ 105 - ~3 = 52.

all a12 a13a21 1 a22 a231 1 a21 a231 + a131 a2l a221·a22 a23 =a11 - a12a3l

a32 a33 a31 a33 a31 a32°32 a33

For example, if

A~U 5 :].2

then, we have

This observation gives us a recursive method for computing the determinant, sinceit enables us to express the determinant of an n x n matrix A in terms of the deter­minants of the (n - I) x (n - I) submatrices A(lI), ... , A(1n). By reapplying themethod, we can express the determinants of each of these (n - I) x (n - I) subma­trices in terms of some of the (n - 2) x (n - 2) submatrices of these submatrices,and so on. Thus, for example, for a 3 x 3 matrix A = (ai), we obtain:

Chapter 1 Mathematical Preliminaries40

Page 49: A First Course in Optimization Theory

is continuous everywhere except at x = O. At x = 0, every open ball B(x,,) withcenter x and radius 8 > 0 contains at least one point y > 0, At all such points.fey) = I > 0 = I(x), and this approximation docs not get better, no matter howclose y gets to x (i.e., no matter how small we take .5 to be).

A function I: S --+ T is said to be continuous Oil S if it is cont inuous at each pointin S,

Observe that if I: S C IRn --+ IRI, then J consists of I "component functions"(/1, ... ,r» i.e .• there are functions t'. S ~ JR, i = I .... ,I, such that for eachXES, we have f(x) = (fl (x), ... , II (x). It is left to the reader as an exercise to

xsOx>o{

0,I(x) =

I,

1.4.1 Continuous Functions{..etI: S --+ T, where S c IRn and T C /R/. Then, I is said to be continuous atXES if for all f > 0, there is /j > 0 such that YES and d(x, Y) < /j impliesd(j(x), I(y» < f. (Note that d(x, y) is the distance between x and y in Jl(", whiled(f(x),/(y» is the distance in IR.t.) In the language of sequences, I: S --+ Tis continuous at XES if for all sequences (xk) such that Xk E S for all k . andIimk-+ooXk = x. it is the case that limk-+oo I(Xk) = ftx),

Intuitively, I is continuous at x if the value of I at any point j- that is "close'I to .ris a good approximation of the value of I at .r, Thus, the identity function I(x) = .rfor all x E 1Ris continuous at each x E IR,while the function f lR --+ lR given by

1.4 Functions

Let S, T be subsets of lRn and IRl, respectively. AJunction J from S to T, denotedJ: S --+ T, is a rule that associates with each element of S, one and only one elementof T. The set S is called the domain of the function J, and the set T its range.

where the second expression is obtained from the first by replacing row 2 by itselfplus (-3) times row I; the third is obtained from the second by replacing row 3 byitself plus (-6) times row 1;and, finally, the last expression is obtained from thethird by replacing row 3 by itself plus (-29/13) times row 2. Since the last is intriangular form, the determinant of A is seen to be given by (I )( -IJ)(--4) = 52.

1.4 Functions 41

We have:

1 5 7 1 5 7 1 5 7 1 5 73 2 8 = 0 -13 -13 0 -13 -13 0 -13 -136 9 6 9 0 -29 -33 0 0 -4

Page 50: A First Course in Optimization Theory

Finally, some observations. Note that continuity of a function I at a point x is alocal property, i.e., it relates to the behavior of I near x, but tells us nothing about the

II! particular. if S is an open set in jRn, I is continuous on S if and only if I-I (V)is an open set injRn jor each open set V in jR/.

Corollary 1.50 A junction I: S C JR_n -* jRl is continuous 011 S if and only if foreach open set V C JR.', there is an open set U C jRn such that I-I (v) = U n S,where /-1 (V) is defined by

I-I(V) = (x E S I I(x) E VI.

As an immediate corollary, we have the following statement, which is usuallyabbreviated as: "a function is continuous if and only if the inverse image of everyopen set is open."

Proof Suppose I is continuous at x, and V is an open set in jRicontaining ftx],Suppose, per absurdum. that the theorem was false, so for any open set U containingx, there is Y E un S such that I(y) 'I. V. We will show a contradiction. Fork E (I, 2,3, ... }.Iet Uk be the open ball with center x and radius 1/k. Let Yk E UknSbe such that I(Yk) ¢ V. The sequence {n} is clearly well defined. and since Yk E Ukfor all k, we have d(x, Yk) < 1/k for each k, so Yk -* x as k -* 00. Since I iscontinuous at x by hypothesis, we also have I(Yk) -* I(x) as k -* 00. However,I(Yk) ¢ V for any k, and since V is open, VC is closed, so I(x) =}imk I(Yk) Eve,which contradicts I(x) E V.

Now, suppose that for each open set V containing f'tx), there is an open set Ucontaining x such that I(y) E V for all Y E un S.We will show that I is continuousat x. Let E > 0 be given. Define Vf to be the open ball in]R' with center I(x) andradius E. Then, there exists an open set U., containing x such that I(y) E V., for ally E U, n S. Pick any 0 > 0 so that Btx, 8) c Uf• Then, by construction, it is truethat yES and d(x, y) < 8 implies I(y) E Vf, i.e., that d(f(x),f(y)) < E. SinceE > 0 was arbitrary, we have shown precisely that I is continuous at x. 0

Theorem 1.49 A function I: S C jRn -* jRl is continuous at a point XES if andonly iffor all open sets V C jR' such that I(x) E V. there is an open set U C jRnsuch that x E U. and I(z) E V for all z E un s.

show that I is continuous at XES (resp. I is continuous on S) if and only if eachr is continuous at x (resp. if and only if each / is continuous on S).The following result gives us an equivalent characterization of continuity that

comes in quite useful in practice:

Chapter J Mathematical Preliminaries42

Page 51: A First Course in Optimization Theory

(The notation "y ~ x" is shorthand for "for all sequences {Yk) such that Yk -+ x .")The matrix A in this case is called the derivative of f at x and is denoted Df(x).Figure 1.5 provides a graphical illustration of the derivative. In keeping with standardpractice, we shall, in the sequel. denote Df(x) by [,(x) whenever rt = rn = 1. i.e ..whenever S C JRand f: S ~ JR.

lim (II[(Y) - f(x) - A(y - X)II) = o.y->x I!y-xl!

Equivalently, f is differentiable at XES if

IIf(x) - fey) - A(x - y)11 < E IIx - YII·

1.4.2 Differentiable and Continuously Differentiable FunctionsThroughout this subsection S will denote an open set in JRn.

A function f: S ~ Rm is said to be differentiable at a point XES if there existsan m x n matrix A such that for all E > 0, there is /) > 0 such that yES and[x - yll < /)implies

behavior of f elsewhere. In particular, the continuity of f at x has /to implicationseven for the continuity of f at points "close" to x. Indeed, it is easy to constructfunctions that are continuous at a given point x, but that are discontinuous at e~'cr)'other point in every neighborhood of x (see the Exercises). It is also important tonote that, in general, functions need not be continuous at even a single point in theirdomain. Consider f: JR+ ~ IR+ given by f(x) = I, if x is a rational number. and[(x) = 0, otherwise. This function is discontinuous everywhere on IR+.

Fig. 1.5. The Derivative

x

fly)

f(x)

1.4 Functions

Page 52: A First Course in Optimization Theory

if x = 0if x :;£0.

Example 1.51 Let I: lR -+ lR be given by

If f is differentiable at all points in S, then f is said to be differentiable on S.When f is differentiable on S, the derivative Df itself forms a function from S toIRmxn. If Df: S -+ ]Rmxn is a continuous function, then f is said to be continuously£iii/erell/iah/e on 5;, and we write .f is c'.

The following observations are immediate from the definitions. A function f: S CIRn -+ lRm is differentiable at XES if and only if each of the m componentfunctions I': S -+ lR of I is differentiable at x, in which case we have Df(x) =(Drl (x), ... , Dim (x». Moreover, I is Cion S if and only if each f is Cion S.The difference between differentiability and continuous differentiability is non­

trivial. The following example shows that a function may be differentiable every­where, but may still not be continuously differentiable.

This is precisely the definition of the derivative we have given. 0

(lIf(Y) - g(Y)II) = (lIf(Y) - (A(y - x) + f(X»II) -+ 0 as y -+ x.

IIY-xil lIy-xli

g(y) = Ay - Ax + f(x) = A(y - x) + lex).

Given this value for g(y), the task of identifying the best affine approximation to fat x now amounts to identifying a matrix A such that

goes to zero as y -+ x. Since the values of I and g must coincide at x (otherwise gwould hardly be a good approximation to I at x), we must have g(x) = Ax + b =f(x), or b = lex) - Ax. Thus, we may write this approximating function g as

( IIICy) - g(y) II)lIy-xll

where A is an m x n matrix, and b E lRm. (When b = 0, the function g is calledlinear.s Intuitively, the derivative of I at a point x E Sis the best affine approximationto I at x, i.e., the best approximation of I around the point x by an affine functiong. Here, "best" means that the ratio

g(y)::: Ay-s-b,

Remark The definition of the derivative DImay bemotivated as follows. An affinefunction from lRn to lRm is a function g of the form

Chapter 1 Mathematical Preliminaries44

Page 53: A First Course in Optimization Theory

Theorems 1.52 and 1.53 are only one-way implications. For instance. while thedifferentiability of f and g at x implies the differentiability of (j+ g) at x. U+ g)can be differentiable everywhere (even Cl) without I and g being differentiableanywhere. For an example, let f: lR ~ IRbe given by [(x) = 1 if x is rational. and

[]Proof See Rudin (1976. Theorem 9.15. p.214).

/)Un h\(x) = IJj(fI(.\){)Ir(x).

Theorem 1.53 Let I: Rn ~ IRm and h: IRk _. IRn. Let x E IRk. If h is diffcrentiabt«at x, and f is differentiable at hex). then 10 h is itself different iable at x. and iTS

derivative may be obtained through' the "chain rule" as:

Next, given functions I: Rn ~ IRmand II: IRk _. IRn. define their composition10 h to be the function from IRk to Rm whose value at any x E IRk is givCll hyf(h(x», that is. by the value oif evaluated at htx ),

[]Proof Obvious from the definition of differentiability.

D(j + g)(x) = DI(x) + Dg(x).

Theorem 1.52 If I: IR" __. IRnl and g: IRn __. lRm are both differentiable (I( a pointx E IRn. so is (f + g) and, in fact,

TUs example notwithstanding, it is true that the derivative of an everywhere dif­ferentiable function f must possess a minimal amount of continuity. See the Inter­mediate Vaiue Theorem for the Derivative in subsection 1.6.1 for details.

We close this subsection with a statement of two important properties of the deriva­tive. First, given two functions f: IRn ~ IRm and g: Rn __. IRm, define their SI/III

(j+g) to be the function from R" to Rm whose value at any x E lRn is I(x) +g(x).

Sincelsin(l/x2)1 ~ l.wehavelxsin(i/x2)1 ~ Ixl.soxsin(l/x2) __. Oasx __. O.This means f'(O) = O.Thus. I is not C' on IR+. 0

f'(x) = 2x sin (12) - (~) cos (12) .Since I sint-)] ~ I and Icost-)] ~'I, but (2/x) ~ ClO as x ~ O. it is clear that thelimit as x -'+ 0 of f'(x) is not well defined. However. f'(O) does exist! Indeed.

f'(O) = lim (f(X)-/(O») :::::limxsin(~2)'x-.o x - 0 x-O X

For x =I- 0, we have

4:;1.4 Functions

Page 54: A First Course in Optimization Theory

Example 1.55 Let f :]R2 ~ ]R be given by /(0,0) = 0, and for (x , Y) i= (0,0),xy

f(x,y)= .Jx2 + y2

We will show that f has all partial derivatives everywhere (including at (0,0», butthat these partials are not continuous at (0,0). Then we will show that f is notdifferentiable at (0,0).

Thus, to check if f is Ct, we only need figure out if (a) the partial derivatives allexist on S, and (b) if they are all continuous on S. On the other hand. the require­ment that the partials not only exist but be continuous at x is very important for thecoincidence of the vector of partials with Df(x). In the absence of this condition,all partials could exist at some point without the function itself being differentiableat that point. Consider the following example:

oProof See Rudin (1976, Theorem 9.21, p.219).

Theorem 1.54 Let f: S ~ JR, where S C JRn is open.

1. Iff is differentiable at x, then all partials IJ/(x)/IJXj exist at;, and D/(x) =[af(X)/3xl, ... , IJ/(x)/3xn].

2. If all the partials exist and are continuous at x, then Df(x) exists, and Df(x) =r8f(x)/8xl .... , 8/(x)/8x,,).

3. f is CIon S if and only if all partial derivatives of f exist and are continuouson S.

1.4.3 Partial Derivatives and DifferentiabilityLet f: S ~ JR, where S C JRn is an open set. Let ej denote the vector in JRn thathas a I in the j-Ih place and zeros elsewhere (j = I, ... , n). Then the j-th partialderivative of f is said to exist at a point x if there is a number af(x) / axj such that

lim (f(X + tej) - f(X») = of (x).1-+0 t 8xj

Among the more pleasant facts of life are the following:

f(x) = 0 otherwise, and let g: lR ~ lR be given by g(x) = 0 if x is rational, andg(x) = I otherwise. Then, f and g are discontinuous everywhere, so are certainlynot differentiable anywhere. However, (f +g)(x) = I for all x, so (f +g)' (x) = 0at all x, meaning (j + g) is Cr. Similarly, the differentiability of f 0 h has noimplications for the differentiability of f at h (x) or the differentiability of h at x.

Chapter 1 Mathematical Preliminaries46

Page 55: A First Course in Optimization Theory

Intuitively, the feature that drives this example is that in looking at the partialderivative of / with respect to (say) x at a point (x , yl, we arc moving along onlythe line through (x, y) parallel to the x-axis (see the line denoted II in Figure 1.6).Similarly, the partial with derivative with respect to y involves holding the x variablefixed, and moving only on the line through (x, y) parallel to the y-axis (see the linedenoted 12 in Figure 1.6). On the other hand, in looking at the derivative DJ. both thex and y variables arc allowed to vary simultoneouslv (for instance. along the dottedcurve in Figure 1.6).

Lastly, it is worth stressing that although a function must be continuous in orderto be differentiable (this is easy to see from the definitions), there is no implicationin the other direction whatsoever. Extreme examples exist of functions which arccontinuous on all of IR, but fail to be differentiable at even a single point (sec.for example, Rudin, 1976, Theorem 7.18, p.154). Such functions are by no meanspathological; they play, for instance. a central role in the study of Brownian motion

[]so the limit of this fraction as a ~ 0 cannot be zero.

III(a,a) - /(0,0) - D/(O.O)· (a,a)1!

11(0, a) - (0, 0) I!

lim II I(x, Y) - 1(0.0) - Dj(O, 0) . (_~_:_}~~~== D.(x.y)-+(o.O) lI(x,y)-(O,O)1I

but this is impossible if Dj(O, 0) = O. To see this, take any point (x, y) of the form(a, a) for some a > 0, and note that every neighborhood of (0.0) contains at least

one such point. Since j(O, 0) = o, D1(0, 0) = (0,0). and I! (x, y) II = 1:2~~-~.2.itfollows that

so aj(o. O)/ax exists at (0, 0), but is not the limit of al(o. y)/ ax as y ~ O. Similarly.we also have a (0, o)/ay = 0 1= I = lim......o af(x, o)/ay.

Suppose I' 'ere differentiable at (0,0). Then, the derivative Df(O, 0) must coin­cide with the vector of partials at (0,0) so we must have DI(O, 0) = (0,0). However.from the definition of the derivative, we must also have

af (0,0) = lim f(x, 0) - 1(0,0) = lim 0 - 0 = 0,ax x-O x x-o X

Since [t», 0) = 0 for any x.",. 0; it-is immediate that for all x oj: 0,

af(x, 0) = Jim f(x,;) -: j(x,O) = Jim x = I.ay y-+O y Y---'o Jx2 + ,y2

Similarly, at all points of the form (0, y) for y =F 0, we have aj(O, y)/ih:: I.However, note that

471.4 Functions

Page 56: A First Course in Optimization Theory

when this limit exists, and is denoted DI(x; h). (The notation t --+ 0+ is shorthandfor t > 0, t --+ 0.)When the condition t --+ 0+ is replaced with t --+ 0, we obtain what is sometimes

called the "two-sided directional derivative." Observe that partial derivatives are aspecial case of two-sided directional derivatives: when h = e, for some i, the two­sided directional derivative at x is precisely the partial derivative aI(x) / ax;.

In the previous subsection, it was pointed out that the existence of all partialderivatives at a point x is not sufficient to ensure that I is differentiable at x. It isactually true that not even the existence of all two-sided directional derivatives at ximplies that f is differentiable at x (see the Exercises for an example). However, thefollowing relationship in the reverse direction is easy to show and is left to the readeras an exercise.

. (f(X + th) - I(X))11m1 ......0+ t

1.4.4 Directional Derivatives and DifferentiabilityLet f: S --+ JR, where S C JR" is open. Let x be any point in S, and let h E ]Rn. Thedirectional derivative of f at x in the direction h is defined as

in probability theory (with probability one, a Brownian motion path '\s everywherecontinuous and nowhere differentiable).

Fig. 1.6. Partial Derivatives and Differentiability

x

I .-\ ,.\ "" _/..... __ ....

IIIIIIIIII

Chapter J Mathematical Preliminaries

v48

Page 57: A First Course in Optimization Theory

a2 I--(x) =(Jx;(JXjfor all i.] -: l •...• ni and for all x E D.

Theorem 1.58 If I: D -+ JRn is a C2 function, D2 j is a symmetric matrix, i.e., I~'ehave

OXnOXI ox;Once again. we shall follow standard practice and denote D2 I(x) by II/(x) whenevern = I (i.e .• if S c R).

If I is twice-differentiable at each x in S, we say that I is twice-differentiable Of!

S. When I is twice-differentiable on S. and for each i, j = I, ... , n , the cross-partia Ia2 f/ox;xj is a continuous function from S to JR, we say that f is twice continuousl vdifferentiable on S. and we write f is C2.

When f is C2• the second-derivative D2 f ,which is also called the matrix ofcross-partials (or the hessian of f at x), has the following useful property:

1.4.5 Higher Order DerivativesLet I be a function from S c JRn to lR. where S is an open set. Throughout thissubsection. we will assume that I is differentiable on all of S. so that the derivativeDI = [al/axl •.. ·. al/oxn] itself defines a function from S to R".

Suppose now that there is XES such that the derivative Df is itself differentiableat x, i.e., such that for each i, the function af/ox;: S -+ JR is differentiable at x ,Denote f .e partial of ai/ax; in the direction ej at x by a2 I(x) /aXjax;. if i =1= i. anda2/(x)/oxl. if i = j. Then. we say that j is twice-differentiable at x, with secondderivative D2/(x), where

Remark What is the relationship between DI(x) and the two-sided directionalderivative of I at x in an arbitrary direction h?

Corollary 1.57 If DI(x) exists, then DI(x; h) = -Df(x; -h).

Theorem 1.56 Suppose I is diffeieniiabie at XES. Then, lor any h E [Rn .ut«:(one-sided) directional derivative DI(x; h) of I at x in the direction h exists, and,in fact, we have DI(x; h) = DI(x) . h.

An immediate corollary is:

491.4 Functions

Page 58: A First Course in Optimization Theory

ngA (x) = x' Ax = L ajjxjx)

;,)=1

where A = (aU) is any symmetric n x n matrix. Since the quadratic form gA iscompletely specified by the matrix A, we henceforth refer to A itself as the quadraticform. Our interest in quadratic forms arises from the fact that if I is a C2 function,and z is a point in the domain of I. then the matrix of second partials D2/(z) definesa quadratic form (this follows from Theorem 1.58 on the symmetry property of D2 ffor a C2 function j).

A quadratic form A is said to be

J. positive definite if we have x' Ax > 0 for all x E lRn, x =1= o.2. positive semidefinite if we have x' Ax ? 0 for all x E ]Rn. x =1= o.3. negative definite if we have x' Ax < 0 for all x E R", x #- O.4. negative semidefinite if we have x' Ax ::5 0 for all x E ]Rn, x =1= o.

1.5 Quadratic Forms: Definite and Semidefinite Matrices1.5.1 Quadratic Forms and Definiteness

A quadratic form on ]Rn is a function gA on ]Rn of the form

For an example where the symmetry of D2 I fails because the cross-partials failto be continuous, see the Exercises.

The condition that the partials should be continuous for D2I to be a symmetricmatrix can be weakened a little. In particular, for

a2 f a2I--(y) = --(y)aX)aXk aXkaX)

to hold, it suffices just that (a) the partials al/axj and aflaxk exist everywhere on D,and (b) that one of the cross-partials a2 fI ax) aXk or a21/8xkdXj exist everywhereon D and be continuous at y.

Still higher derivatives (third, fourth, etc.) may be defined for a function I: lRn ~

R. The underlying idea is simple: for instance, a function is thrice-differentiable ata point x if all the component functions of its second-derivative D2 f (i.e., if all thecross-partial functions a2 flaxj ax) are themselves differentiable at x; it is c3 if allthese component functions are continuously differentiable, etc. On the other hand,the notation becomes quite complex unless n = 1 (i.e., f: lR ~ lR), and we do nothave any use in this book for derivatives beyond the second, so we will not attemptformal definitions here.

oProof See Rudin (1976, Corollary to Theorem 9.41, p.236).

Chapter 1 Mathematical Preliminaries50

Page 59: A First Course in Optimization Theory

Proof We will make use of the Weierstrass Theorem, which is proved in Chapter 3(see Theorem 3.1). The Weierstrass Theorem states that if KeIRn is compact. andf: K -)0 ]R is a continuous function, then f has both a maximum and a minimum

Theorem 1.59 Let A be a positive definite n x n matrix. Theil, there is y > 0such that if B is any symmetric n x II matrix with Ihjk - lJjk I < y [or 1/1/ i.k E

[L, ... , nl, then B is also positive definite. A similar statement holds for negativedefinite matrices A.

A=[~ ~lFor x = 0, I), x' Ax = 2 > 0, so A is not negative semidefinite. But for x = (- l . I),x' Ax = -2 < 0, so A is not positive semidefinite either.

Given a quadratic form A and any t E IR, we have itx]' AU x) = t2x' Ax, so thequadratic form has the same sign along lines through the origin. Thus, in particular,A is positive definite (resp. negative definite) if and only if it satisfies x' A x > °(resp. x' Ax < 0) for all x in the unit sphere C = {u E jRn I lIuli = I}. We will usethis observation to show that if A is a positive definite (or negative definite) n x nmatrix, so is any other quadratic form B which is sufficiently close to A:

For any x = (XI. X2) E jR2, we have x' Ax = xT, so x' Ax can be zero even if x of O.(For example, x' Ax = 0 if x = (0, I).) Thus, A is not positive definite. On the otherhand, it is certainly true that we always have x' Ax :::0, so A is positive semidefinite.

Observe that there exist matrices A which are neither positive semidefinite nornegative semidefinite, and that do not, therefore, fit into any of the four categories wehave identified. Such matrices are called indefinite quadratic forms. As an exampleof an ir Iefinite quadratic form A, consider

is positive definite, since for any x ::::(xj , X2) E ]R2, we have x I A x = x? + xi, andthis quantity is positive whenever x of O. On the other hand, consider the quadraticform

The terms "nonnegative definite" and )'nonpositivc definite" arc: often used in placeof "positive semidefinite" and "negative semidefinite" respectively.

For instance, the quadratic form A defined by

A = [~ n511.5 Quadratic Forms

Page 60: A First Course in Optimization Theory

Finally, it is important to point out that Theorem 1.59 is no longer true if "positivedefinite" is replaced by "positive semidefinite." Consider, as a counterexample, thematrix A defined by

Corollary 1.60 Iff is a C2 function such that at some point x, D2 f(x) is a positivedefinitematrix, then there isa neighborhood B(x, r) of x such thatforall y E Bix ; r),D2 fey) is also a positive definite matrix. A similar statement holds If D21(x) is,instead, a negative definite matrix.

A particular implication of this result, which we will use in Chapter 4 in the studyof unconstrained optimization problems, is the following:

oso B is also positive definite, and the desired result is established.

Therefore, for any x E C,

x'Bx = x'Ax+x'(B-A)x:::: E-E/2 = E/2

n< y L IXjllxkl

j,k=1

< yn2 = E/2.

n

< L Ibjk - ajkllxjllxkl/.Ic=1

nIx'(B - A)xl =·1L (bjk - ajk)XjXkl

j,k=1

If A is positive definite, then z' Az must be strictly positive, so there must exist E > 0such that x' Ax :::: E > 0 for all x EC.

Define y = E /2n2 > O. Let B be any symmetric n x n matrix, which is such thatIbjk - ajkl < Y for all j, k = 1, ... , n. Then, for any x E C,

z'Az::s x'Ax.

on K, i.e., there exist points k' and k" in K such that f(k') ~ f(k) ~ f(k*) for allk E K.

Now, the unit sphere C is clearly compact, and the quadratic form A is continuouson this set. Therefore, by the Weierstrass Theorem, there is z in C such that for anyx E C, we have

Chapter 1 Mathematical Preliminaries52

Page 61: A First Course in Optimization Theory

A natural conjecture is that this theorem would continue to hold if the words "neg­ative definite" and "positive definite" were replaced with "negative semidefinite" and

oProof See Debreu (1952, Theorem 2, p.296).

Moreover. a positive semidefinite quadratic form A is positive definite if and onlv ifIAI =I- O. while a negative semidefinite quadraticform is negative definit« ij and onlvi/IAI =I- O.

Theorem 1.61 All n x n symmetric matrix A is

I. negative definite ifandonly if(-I)kIAkl > O/orall k E {I, ... , nl.2. positive definite if and only if IAk I > 0 for all k E {1, ... , n I·

We will refer to Ak as the k-th naturally ordered principal minor of A.
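As a concrete illustration of the sign tests in Theorem 1.61 (our sketch, not from the text), the Python code below computes the naturally ordered principal minors of a symmetric matrix and classifies it; the example matrix is arbitrary.

```python
import numpy as np

def leading_principal_minors(A):
    """Return [|A_1|, ..., |A_n|], the naturally ordered principal minors of A."""
    n = A.shape[0]
    return [np.linalg.det(A[:k, :k]) for k in range(1, n + 1)]

def classify_definite(A, tol=1e-12):
    """Apply the sign tests of Theorem 1.61 to a *symmetric* matrix A."""
    minors = leading_principal_minors(A)
    if all(m > tol for m in minors):
        return "positive definite"
    if all(((-1) ** (k + 1)) * m > tol for k, m in enumerate(minors)):
        # (-1)^k |A_k| > 0 with k starting at 1
        return "negative definite"
    return "test inconclusive (matrix may be semidefinite or indefinite)"

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])             # illustrative symmetric matrix
print(leading_principal_minors(A))      # [2.0, 5.0]
print(classify_definite(A))             # positive definite
print(classify_definite(-A))            # negative definite
```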

1.5.2 Identifying Definiteness and Semidefinite nessFrom a practical standpoint, it is of interest to ask: what restrictions on the structure ofA are imposed by the requirement that A be a positive (or negative) definite quadraticform? We provide answers to this question in this section. The results we presentare, in fact, equivalence statements; that is, quadratic forms possess the requireddefiniteness or semidefiniteness property if and only if they meet the conditions weoutline.

The first result deals with positive and negative definiteness. Given an n x 11

symmetric matrix A, let Ak denote the k x k submatrix of A that is obtained whenonly the fi.ct k rows and columns are retained, i.e., let

satisfies |a_ij - b_ij| < γ for all i, j. However, B is not positive semidefinite: for x = (x1, x2), we have x'Bx = x1² - εx2², and this quantity can be negative (for instance, if x1 = 0 and x2 ≠ 0). Thus, there is no neighborhood of A such that all quadratic forms in that neighborhood are also positive semidefinite.

We have seen above that A is positive semidefinite (but not positive definite). Pick any γ > 0. Then, for ε = γ/2, the matrix

B = [ 1   0
      0  -ε ]


Finally, let Π denote the set of all possible permutations of {1, ..., n}.

A^π = [ a_{π1π1}  ...  a_{π1πn}
          ...            ...
        a_{πnπ1}  ...  a_{πnπn} ]

For k ∈ {1, ..., n}, let A_k^π denote the k × k symmetric submatrix of A^π obtained by retaining only the first k rows and columns:

A_k^π = [ a_{π1π1}  ...  a_{π1πk}
            ...            ...
          a_{πkπ1}  ...  a_{πkπk} ]

Roughly speaking, the feature driving this counterexample is that, in both thematrices A and B, the zero entries in all but the (2,2)-place of the matrix make thedeterminants of order 1 and 2 both zero. In particular, no play is giv~n to the sign ofthe entry in the (2,2)-place, which is positive in one case, and negative in the other.On the other hand, an examination of the expressions x' Ax and x' Bx reveals that inboth cases, the sign of the quadratic form is determined precisely by the sign of the(2,2)-entry.

This problem points to the need to expand the set of submatrices that we consider, ifwe are to obtain an analog of Theorem 1.61 for positive and negative semidefiniteness.Let an II x II symmetric matrix A be given, and let zr = (1ft, ... , 1fn) be a permutationof the integers {I, ... ,n}. Denote by Arr the symmetric n x n matrix obtained byapplying the permutation rr to both the rows and columns of A:

Then, A and B are both symmetric matrices. Moreover, |A_1| = |A_2| = |B_1| = |B_2| = 0, so if the conjecture were true, both A and B would pass the test for positive semidefiniteness, as well as the test for negative semidefiniteness. However, for any x ∈ R², x'Ax = x2² and x'Bx = -x2². Therefore, A is positive semidefinite but not negative semidefinite, while B is negative semidefinite, but not positive semidefinite.

o

Example 1.62 Let

"positive semidefinite," respectively, provided the strict inequalities were replacedwith weak ones. This conjecture isfaLse. Consider the following example:


1.6 Some Important ResultsThis section brings together some results of importance for the study of optimizationtheory. Subsection 1.6.1 discusses a class of results called "separation theorems" forconvex sets in lRn. Subsection 1.6.2 summarizes some important consequences of

It is an easy matter to check that the determinants of all four of these are nonnegative, so A passes the test for positive semidefiniteness. However, A is not positive semidefinite: we have x'Ax = x1x2, which could be positive or negative. □

A = [ 0  1
      0  0 ].

There are only two possible permutations of the set {1, 2}, namely, {1, 2} itself, and {2, 1}. This gives rise to four different submatrices, whose determinants we have to consider:

[a11],  [a22],  [ a11  a12 ]        [ a22  a21 ]
                [ a21  a22 ],  and  [ a12  a11 ].

Example 1.65 Let

A = [ 1  -3
      0   1 ].

Note that |A_1| = 1, and |A_2| = (1)(1) - (-3)(0) = 1, so A passes the test for positive definiteness. However, A is not a symmetric matrix, and is not, in fact, positive definite: we have x'Ax = x1² + x2² - 3x1x2, which is negative for x = (1, 1).

o

Example 1.64 Let

One final remark is important. The symmetry assumption is crucial to the validityof these results. If it fails, a matrix A might pass all the tests for (say) positivesemidefiniteness without actually being positive semidefinite. Here are two examples:

oProof See Debreu (1952. Theorem 7, p.298).


Theorem 1.63 A symmetric n × n matrix A is
1. positive semidefinite if and only if |A_k^π| ≥ 0 for all k ∈ {1, ..., n} and for all π ∈ Π.
2. negative semidefinite if and only if (-1)^k |A_k^π| ≥ 0 for all k ∈ {1, ..., n} and for all π ∈ Π.
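The semidefiniteness test of Theorem 1.63 requires checking the permuted submatrices A_k^π for every permutation π. The following sketch (ours; the example matrices are illustrative) does exactly that, and also shows why the symmetry requirement discussed below matters.

```python
import numpy as np
from itertools import permutations

def passes_psd_minor_test(A, tol=1e-12):
    """Check the condition of Theorem 1.63: |A_k^pi| >= 0 for every permutation pi
    and every leading block of the permuted matrix.  A should be symmetric."""
    n = A.shape[0]
    for pi in permutations(range(n)):
        Api = A[np.ix_(pi, pi)]          # permute rows and columns by pi
        for k in range(1, n + 1):
            if np.linalg.det(Api[:k, :k]) < -tol:
                return False
    return True

A = np.array([[0.0, 0.0],
              [0.0, 1.0]])   # x'Ax = x2^2: positive semidefinite, not definite
B = np.array([[0.0, 1.0],
              [0.0, 0.0]])   # not symmetric: the test is not valid for it

print(passes_psd_minor_test(A))   # True
# For the non-symmetric matrix of Example 1.64 the minors are all zero, so the
# test also "passes" even though x'Bx = x1*x2 changes sign: symmetry is crucial.
print(passes_psd_minor_test(B))   # True, illustrating the caveat
```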

Page 64: A First Course in Optimization Theory

IfD and E are separated by Ht.p, a) and one of the sets (say, £) consists of just asingle point x, we will indulge in a slight abuse of terminology and say that H(p, a)separates the set V and the point x .

1\'10 sets 'D and £ in lRn are said to be separated by the hyperplane H (p, a) in lRlIif D and £ lie on opposite sides of H (p, a). i.e., if we have

p · y ≤ a, for all y ∈ D
p · z ≥ a, for all z ∈ E.

bounds D: if xy ≥ 1 and x, y ≥ 0, then we must have (x + y) ≥ (x + x⁻¹) ≥ 2. In fact, H(p, a) is a supporting hyperplane for D since H(p, a) and D have the point (x, y) = (1, 1) in common. □

Example 1.66 Let D = {(x, y) E 1R~I xy ~ l}. Let p be the vector (1,1), and leta = 2. Then. the hyperplane

H(p, a) = {(x, y) ∈ R² | x + y = 2}

If V is bounded by Htp, a) and V n Ht p, a) =1= 0, then H(p, a) is said 10 be asupporting hyperplane for D.

p · x ≥ a, for all x ∈ D.

or

p · x ≤ a, for all x ∈ D,

is called a hyperplane in an, and will be denoted H (p, a).Ahyperplane in R", for example, is simply a straight line: if p E ]R2anda E Rthe

hyperplane Ht p, a) is simply the set of points (XI, X2) that satisfy PIXI + P2X2 = a.Similarly, a hyperplane in ]R3 is a plane.

A set D in ]Rn is said to be bounded by a hyperplane H(p, a) if'D lies entirely onone side of H(p, a), i.e., if either

H = {x ∈ Rⁿ | p · x = a}

1.6.1 Separation TheoremsLet p # 0 be a vector in JR.n, and let a E IR.The set H defined by

assuming the continuity and/or differentiability of real-valued functions defined onIR".Finally, Subsection 1.6.3 outlines two fundamental results known as the InverseFunction Theorem and the Implicit Function Theorem.


d(x*, y*) ≤ d(x*, y),    for all y ∈ Y.

Indeed, this is an easy consequence of the Weierstrass Theorem. Let B̄(x*, r) denote the closed ball with center x* and radius r. Pick r > 0 sufficiently large so that the set Y = B̄(x*, r) ∩ D is nonempty. Since B̄(x*, r) and D are both closed sets, so is Y. Since B̄(x*, r) is bounded, and Y ⊂ B̄(x*, r), Y is also bounded. Therefore, Y is compact. The function f: Y → R defined by f(y) = d(x*, y) is clearly continuous on Y, since the Euclidean metric d is continuous. Thus, by the Weierstrass Theorem, there exists a minimum y* of f on Y, i.e., there exists y* in Y such that

d(x*, y*) ≤ d(x*, y),    for all y ∈ D.

Proof We first prove the result for the case where D is a closed set. Then, using this result, we establish the general case. The proof will make use of the Weierstrass Theorem (see Theorem 3.1 in Chapter 3), which states that if K ⊂ Rⁿ is compact and f: K → R is a continuous function, then f has both a maximum and a minimum on K; i.e., there exist points k' and k* in K such that f(k') ≤ f(k) ≤ f(k*) for all k ∈ K.

So suppose D is a nonempty, closed, convex set in Rⁿ, and x* ∉ D. We claim that there exists y* ∈ D such that

Theorem 1.67 Let D be a nonempty convex set in Rⁿ, and let x* be a point in Rⁿ that is not in D. Then, there is a hyperplane H(p, a) in Rⁿ with p ≠ 0 which separates D and x*. We may, if we desire, choose p to also satisfy ||p|| = 1.
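The proof of Theorem 1.67 constructs the separating hyperplane from the point of D nearest to x*. The sketch below (ours) carries this construction out numerically for a closed convex set where the nearest point is available in closed form, namely a Euclidean ball; the set and the point x* are illustrative.

```python
import numpy as np

# Illustrative sketch (not from the text): for a closed convex D and x* outside it,
# the nearest point y* in D to x* yields a separating hyperplane H(p, a) with
# p = y* - x* and a = p . y*.  Here D is the closed unit ball centered at 0.

def project_onto_ball(x, center, radius):
    """Euclidean projection of x onto the closed ball B(center, radius)."""
    d = x - center
    dist = np.linalg.norm(d)
    return x if dist <= radius else center + radius * d / dist

x_star = np.array([2.0, 2.0])                          # a point outside D
y_star = project_onto_ball(x_star, np.zeros(2), 1.0)   # nearest point in D
p = y_star - x_star
a = p @ y_star

print("p =", p, " a =", a)
print("p . x* =", p @ x_star, "<= a ?", p @ x_star <= a)
# Every y in D satisfies p . y >= a; check on a few sample points of D.
for y in [np.array([0.0, 0.0]), np.array([-1.0, 0.0]), np.array([0.6, -0.6])]:
    assert p @ y >= a - 1e-12
```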

Intuitively, the closure of X is the "smallest" closed set that contains X. Since thearbitrary intersection of closed sets in closed, XO is closed for any set ,yo. Note thatXO = X if and only if X is itself closed.

The following results deal with the separation of convex sets by hyperplanes. Thcyplaya significant role in the study of inequality-constrained oprimizarion problemsunder convexity restrictions (see Chapter 7).

then

X̄ = ∩_{Y ∈ Δ(X)} Y.

Δ(X) = {Y ⊆ Rⁿ | Y is closed and X ⊆ Y},

A final definition is required before we state the main results of this section. Given a set X ⊂ Rⁿ, the closure of X, denoted X̄, is defined to be the intersection of all closed sets containing X, i.e., if


Rearranging terms once again, and using the definitional relations p = (y" - x")and p- y* = a, we finally obtain

p · y = (y* - x*) · y ≥ (y* - x*) · y* = a.

Takinglimits as A~ 0, and dividing the result by 2, results in the inequality

0 ≤ (x* - y) · (x* - y*) - ||x* - y*||²
  = (x* - y*) · (x* - y - x* + y*)
  = (x* - y*) · (y* - y).

Dividingboth sides by ). > 0, we get

0 ≤ λ||x* - y||² + 2(1 - λ)(x* - y) · (x* - y*) - (2 - λ)||x* - y*||².

Rearrangingterms, this gives us

0 ≤ λ²||x* - y||² + 2λ(1 - λ)(x* - y) · (x* - y*) - λ(2 - λ)||x* - y*||².

By expanding the terms in this inequality.we obtain

IIx" - Y*1I2 :5 IIx* _ y()..) 112= IIx* - AY - (1 - A)Y*1I2

= 1I)..(x* - Y) + (l - )..)(x* - Y*)1I2

= A211x* - YII2 + 21..(1 - A)(x* - Y) . (x" - y*)

+ (1 - >.illx* - Y*1I2.

To complete the proof of the theorem for the case where D is closed, we will now show that p · y ≥ a for all y ∈ D. Pick any y ∈ D. Since y* ∈ D and D is convex, it is the case that for all λ ∈ (0, 1), the point y(λ) = λy + (1 - λ)y* is also in D. By definition of y*, we then have

Now let p = y* - x* and let a = p · y*. We will show that the hyperplane H(p, a) separates D and x*. To this end, note first that

p · x* = (y* - x*) · x*
       = -(y* - x*) · (y* - x*) + y* · (y* - x*)
       = -||p||² + a
       < a,

If y ∈ D and y ∉ Y, then we must have y ∉ B̄(x*, r), so d(x*, y) > r. Therefore, we have established, as required, that

d(x*, y*) ≤ d(x*, y),    for all y ∈ D.


If we define a = p . x", we have shown that the hyperplane H(p, a) separates Dand z". []

p · x* ≤ p · y,    for all y ∈ D.

Since D C 1)0, this implies

p · x* ≤ p · y,    for all y ∈ D̄,

and the inner product is continuous, by taking limits as k -+ 00. we obtain

p_{m(k)} · x_{m(k)} ≤ p_{m(k)} · y,

Therefore, there is a subsequence m(k) of k, and a point p ∈ C such that p_{m(k)} → p. Since we have

Pick a sequence {rk} with rk > 0 and r« -+ O. For notational ease, let Pk = p(rk)and Xk = x(r(k». Since" Pk II= I for all k, each Pk lies in the compact set

C = {z E IRn Ilizil = I}.

p(r) · x(r) ≤ p(r) · y,    for all y ∈ D̄,

Thus, H (p, ii) also separates D and x", as required, and we have shown that thevector p in the separating hyperplane may be taken to have un it norm without loss ofgenerality. This completes the proof of the theorem for the case where D is closed.

Now, suppose D is not closed. Let D" denote the closure of P.As the closure of aconvex set, 'D" is also convex. Ifx· rf. D", then by the arguments we have just given,there exists a hyperplane that separates the closed set D" and x". Since P cpo,this implies Dand x· can be separated.

Finally, suppose x* ∉ D, but x* ∈ D̄. Then, for any r > 0, there must exist a point x(r) ∈ B(x*, r) such that x(r) ∉ D̄. Therefore, there exists p(r) ∈ Rⁿ with ||p(r)|| = 1 such that

p̄ · y = (p · y)/||p|| ≥ a/||p|| = ā.

while for Y E D, we also have

p̄ · x* = (p · x*)/||p|| < a/||p|| = ā,

Thus, we have shown that there is a hyperplane H(p, a) that separates D and x* when D is closed. It remains to be shown that we may, without loss of generality, take ||p|| to be unity. Suppose ||p|| ≠ 1. We will show the existence of (p̄, ā) such that H(p̄, ā) also separates D and x*, and which further satisfies ||p̄|| = 1. Since p ≠ 0, we have ||p|| > 0. Define p̄ = p/||p|| and ā = a/||p||. Then, we have ||p̄|| = 1 and


Remark It might appear at first glance that the intermediate value property actuallycharacterizes continuous functions, i.e., that a function f: [a, b] ~ lR is continuousif and only if for any two points XI < X2 and for any real number c lying betweenl(xl) and f(x2), there is x E (XI. X2) such that f(x) = c. The Intermediate Value

oProof See Rudin (1976, Theorem 4.23, p.93).

Theorem 1.69 (Intermediate Value Theorem) Let D = [a, b] be an interval in R and let f: D → R be a continuous function. If f(a) < f(b), and if c is a real number such that f(a) < c < f(b), then there exists x ∈ (a, b) such that f(x) = c. A similar statement holds if f(a) > f(b).
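Numerically, the Intermediate Value Theorem is what makes bisection work: repeatedly halving a bracketing interval locates a point where f takes the value c. The code below is our own illustration with an arbitrary function.

```python
def bisect(f, a, b, c, tol=1e-10):
    """Locate x in (a, b) with f(x) = c, assuming f is continuous and
    f(a) < c < f(b); a direct numerical companion to Theorem 1.69."""
    lo, hi = a, b
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative use: f(x) = x**3 on [0, 2] attains the intermediate value c = 5.
x = bisect(lambda x: x**3, 0.0, 2.0, 5.0)
print(x, x**3)   # approximately 1.70998, 5.0
```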

1.6.2 The Intermediate and Mean Value TheoremsThe Intermediate Value Theorem asserts that a continuous real function on an intervalassumes all intermediate values on the interval. Figure 1.7 illustrates the result.

It follows that SUPyE£p . y !:: infxED P . x. If a E [SUPyE£P . y, infxED p- x], thehyperplane H (p, a) separates D and E,

That p can also be chosen to satisfy IIpll = I is established in the same way as inTheorem 1.67, and is left to the reader as an exercise. 0

x E D,yE £.r» !:: p·x,

This is the same thing as

Z E F.p·o !:: p' z,

By Theorem 1.38, the convexity of D and £ implies that F is also convex. We claimthat 0 ¢ :F. If we had 0 E F, then there would exist points xED and y E E such thatx - y = O.But this implies x = y, so xED nE, which contradicts the assumptionthat Dn£ is empty. Therefore, 0 ~ F.

By Theorem 1.67, there exists P E ]R" such that

(y E JRn I - y EE).

Proof Let F = D+ (-£), where, in obvious notation, -£ is the set

Theorem 1.68 Let D and E be convex sets in lRnsuch that D nE = 0. Then, thereexists a hyperplane Hi p, a) in IR" which separates D and E, We may, if we desire,choose p to also satisfy IIpli = I.


It is very important to emphasize that Theorem 1.70 docs not assume that I IS

a C 1 function. Indeed, if I were C I, the result would be a trivial consequence ofthe Intermediate Value Theorem, since the derivative I' would then be a continuousfunction on D.

The next result. the Mean Value Theorem, provides another property that thederivative must satisfy. A graphical representation of this result is provided in Fig­ure 1.8. As with Theorem 1.70. it is assumed only that I is everywhere differentiableon its domain D, and not that it is Cl.

[JProof See Rudin (1976, Theorem 5.12, p.WS).

Theorem 1.70 (Intermediate Value Theorem for the Derivative) Let D = [a, b] be an interval in R, and let f: D → R be a function that is differentiable everywhere on D. If f'(a) < f'(b), and if c is a real number such that f'(a) < c < f'(b), then there is a point x ∈ (a, b) such that f'(x) = c. A similar statement holds if f'(a) > f'(b).

We have seen in Example 1.51 that a function may be differentiable everywhere,but may fail to be continuously differentiable, The following result (which may beregarded as an Intermediate Value Theorem for the derivative) states, however, that thederivative must still have some minimal continuity properties, viz., that the derivativemust assume all intermediate values. In particular, it shows that the derivative I' ofan everywhere differentiable function I cannot have jump discontinuities.

Theorem shows that the "only if" part is true. It is left to the reader to show that theconverse, namely the "if" part, is actually false. (Hint: Use Example 1.51.) []

Fig. 1.7. The Intermediate Value Theorem


oProof See Rudin (1976, Theorem 5.l5, p.1IO).

f(y) = Σ_{k=0}^{m} f^(k)(x)(y - x)^k / k!  +  f^(m+1)(z)(y - x)^(m+1) / (m + 1)!

Theorem 1.72 (Taylor's Theorem) Let f: D → R be a C^m function, where D is an open interval in R, and m ≥ 0 is a nonnegative integer. Suppose also that f^(m+1)(z) exists for every point z ∈ D. Then, for any x, y ∈ D, there is z ∈ (x, y) such that

The following generalization of the Mean Value Theorem is known as Taylor's Theorem. It may be regarded as showing that a many-times differentiable function can be approximated by a polynomial. The notation f^(k)(z) is used in the statement of Taylor's Theorem to denote the k-th derivative of f evaluated at the point z. When k = 0, f^(k)(x) should be interpreted simply as f(x).
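The following small Python illustration (ours) compares a function with its m-th order Taylor polynomial; for f = exp every derivative equals exp, which keeps the sketch self-contained. The choice of f and of the expansion point is arbitrary.

```python
import math

def taylor_poly_exp(x, y, m):
    """m-th order Taylor polynomial of exp around x, evaluated at y.
    For f = exp every derivative is exp, so f^(k)(x) = e^x."""
    return sum(math.exp(x) * (y - x) ** k / math.factorial(k) for k in range(m + 1))

x, y = 0.0, 0.5
for m in range(5):
    approx = taylor_poly_exp(x, y, m)
    # The gap is the Lagrange remainder f^(m+1)(z)(y-x)^(m+1)/(m+1)! for some z in (x, y)
    print(m, approx, math.exp(y) - approx)
```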

oProof See Rudin (1976, Theorem 5.10, p.108).

f(b) - f(a) = (b - a) f'(x).

Theorem 1.71 (Mean Value Theorem) Let D = [a, b] be an interval in Randlet f: D -+ R be a continuousfunction. Suppose f is differentiableon (a, b). Then,there exists x E (a, b) such that

Fig. 1.8. The Mean Value Theorem


Finally. we turn to Taylor's Theorem in ]Rn. A complete statement of this resultrequires some new notation, and is also irrelevant for the remainder of this book.Weconfine ourselves, therefore, to stating two special cases that are useful for ourpurposes.

oThe theorem is proved.

f(b) - f(a) = Df(z(λ')) · (b - a).

Substituting for g in terms of f ,this is precisely the statement that

g(l) - g(O) = g'(A')(l _-0) = g'(A').

Proof For notational ease, let z(λ) = (1 - λ)a + λb. Define g: [0, 1] → R by g(λ) = f(z(λ)) for λ ∈ [0, 1]. Note that g(0) = f(a) and g(1) = f(b). Since f is everywhere differentiable by hypothesis, it follows that g is differentiable at all λ ∈ [0, 1], and in fact, g'(λ) = Df(z(λ)) · (b - a). By the Mean Value Theorem for functions of one variable, therefore, there is λ' ∈ (0, 1) such that

Theorem 1.74 (The Mean Value Theorem in Rⁿ) Let D ⊂ Rⁿ be open and convex, and let f: D → R be a function that is differentiable everywhere on D. Then, for any a, b ∈ D, there is λ ∈ (0, 1) such that

f(b) - f(a) = Df((1 - λ)a + λb) · (b - a).

An n-dimensional version of the Mean ValueTheorem is similady established:

odone.

Proof We derive this result as a consequence of the Intermediate Value Theorem in R. Let g: [0, 1] → R be defined by g(λ) = f((1 - λ)a + λb), λ ∈ [0, 1]. Since f is a continuous function, g is evidently continuous on [0, 1]. Moreover, g(0) = f(a) and g(1) = f(b), so g(0) < c < g(1). By the Intermediate Value Theorem in R, there exists λ ∈ (0, 1) such that g(λ) = c. Since g(λ) = f((1 - λ)a + λb), we are

Theorem 1.73 (The Intermediate Value Theorem in Rⁿ) Let D ⊂ Rⁿ be a convex set, and let f: D → R be continuous on D. Suppose that a and b are points in D such that f(a) < f(b). Then, for any c such that f(a) < c < f(b), there is λ ∈ (0, 1) such that f((1 - λ)a + λb) = c.

Each of the results stated irilhiS'"subsection,with the obvious exception of the In­termediate ValueTheorem for theDerivative,also has an n-dimensional version.Westate these versionshere, derivingtheir proofsas consequences of the correspondingresult in IR.


5Weare implicitly assuming here that g(.) is well-defined. i.e.• that (I-/)X +Iy ED for aliI E [0. 1J.Thisassumption is without loss of generality. To see this. note that since D is open and xED. there is r > 0such that B(x, r) CD. By shrinking S if need be. we can ensure that /) < r. Then it is evidently the casethat for all y satisfying IIy - xII < 8, the line segment joining y and x (i.e .• the set of points (I -I)X + tyfor I E (0, I f) is completely contained in D.

||y - x|| < δ implies |h(y)| < ε and ||Dh(y)|| < ε.

Fix any y satisfying ||y - x|| < δ. Define a function g on [0, 1] by

g(t) = h[(1 - t)x + ty].⁵

Then, g(0) = h(x) = 0. Moreover, g is C¹ with g'(t) = Dh[(1 - t)x + ty] · (y - x). Now note that ||(1 - t)x + ty - x|| = t||y - x|| < δ for all t ∈ [0, 1], since ||x - y|| < δ. Therefore, ||Dh[(1 - t)x + ty]|| < ε for all t ∈ [0, 1], and it follows that |g'(t)| ≤ ε||y - x|| for all t ∈ [0, 1].

So let E > ° be given. By the continuity of h and Dh, there is D > ° such that

||y - x|| < δ implies |h(y)| < ε||x - y||.

F(y) = f(x) + Df(x) . (y - x).

Let hey) = f(y)- F(y). Since f and Fare c', so ish. Note that h(x) = Dh(x) = O.The first-part of the theorem will be proved if we show that

h(y)/||y - x|| → 0 as y → x,

or, equivalently, if we show that for any E > 0, there is D > ° such that

Proof Fix any xED, and define the function F(·) on D by

where the remainder term R I(x, y) has the property that

lim (RI (x, y») = 0y-+x IIx - yll .

If J is C2• this statement can be strengthened to

f(y) = f(x) + Df(x)(y - x) + ½(y - x)'D²f(x)(y - x) + R2(x, y),

where the remainder term R2 (x, y) has the property that

lim_{y→x} R2(x, y)/||x - y||² = 0.

f(y) = f(x) + Df(x)(y - x) + R1(x, y),

Theorem 1.75 (Taylor's Theorem in Rⁿ) Let f: D → R, where D is an open set in Rⁿ. If f is C¹ on D, then it is the case that for any x, y ∈ D, we have
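A quick numerical check of the first-order statement in Theorem 1.75 (our sketch, with an arbitrary C¹ function): the remainder R1(x, y) = f(y) - f(x) - Df(x)·(y - x) should vanish faster than ||y - x|| as y approaches x.

```python
import numpy as np

def f(v):
    return np.sin(v[0]) + v[0] * v[1] ** 2

def Df(v):
    return np.array([np.cos(v[0]) + v[1] ** 2, 2 * v[0] * v[1]])

x = np.array([0.3, -0.7])
direction = np.array([1.0, 2.0]) / np.sqrt(5.0)

for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    y = x + t * direction
    R1 = f(y) - f(x) - Df(x) @ (y - x)
    print(t, R1 / np.linalg.norm(y - x))   # ratio shrinks roughly linearly in t
```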


Turning now to the Implicit Function Theorem, the question this result addressesmay be motivated by a simple example. Let S = lR.~+, and let.f: S -+ lR. be defined

oProof See Rudin (1976, Theorem 9.24, p.221).

Dg(x) = (Df(y))⁻¹, where f(y) = x.

1. There are open sets U and V in Rⁿ such that x ∈ U, y ∈ V, f is one-to-one on V, and f(V) = U.
2. The inverse function g: U → V of f is a C¹ function on U, whose derivative at any point x ∈ U satisfies

Theorem 1.76 (Inverse Function Theorem) Let f: S → Rⁿ be a C¹ function, where S ⊂ Rⁿ is open. Suppose there is a point y ∈ S such that the n × n matrix Df(y) is invertible. Let x = f(y). Then:
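The derivative formula in part 2 of Theorem 1.76, Dg(x) = (Df(y))⁻¹, can be checked numerically. The sketch below (ours) uses an illustrative map whose inverse happens to be available in closed form, and compares (Df(y))⁻¹ with a finite-difference Jacobian of that inverse.

```python
import numpy as np

def f(v):
    return np.array([v[0] + v[1] ** 3, np.exp(v[1])])

def Df(v):
    return np.array([[1.0, 3 * v[1] ** 2],
                     [0.0, np.exp(v[1])]])

y = np.array([0.5, 0.2])
x = f(y)
Dg_at_x = np.linalg.inv(Df(y))           # derivative of the local inverse at x

# Here the inverse is available explicitly: g(u1, u2) = (u1 - (log u2)^3, log u2).
def g(u):
    return np.array([u[0] - np.log(u[1]) ** 3, np.log(u[1])])

h = 1e-6
fd = np.column_stack([(g(x + h * e) - g(x)) / h for e in np.eye(2)])
print(np.allclose(fd, Dg_at_x, atol=1e-4))   # True
```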

1.6.3 The Inverse and Implicit Function TheoremsWe now state two results of much importance especially for "comparative statics"exercises. The second of these results (the Implicit Function Theorem) also plays acentral role in proving Lagrange's Theorem on the first-order conditions for equality­constrained optimization problems (see Chapter 5 below). Some new terminologyis, unfortunately, required first.

Given a function f: A -+ B, we say that the function f maps A onto E, if forany bEE, there is some a E A such that f(a) = h. We say that f is a on e-t o­one function if for any b E B. there is at most one a E A such that I(a) = b . Itf: A -+ B is both one-to-one and onto. then it is easy to see tha t there is a tunique;function g: B -+ A such that f(g(b) = b for all b E B. (Note that we also haveg(f(a» = a for all a E A.) The function g is called the inversejunction of f .

Since y was an arbitrary point satisfying Iy - x] < 8, the first part of the theorem isproved.

The second part may be established analogously. The details arc left as an exercise.LJ

|h(y)| = |g(1)| = |g'(t*)| ≤ ε||y - x||.

Therefore,

g(l) = g(O)+ g'(t*)(l - 0) = g'(t*).

By Taylor's Theorem in JR, there'is!" E (0, 1) such that


1.7 Exercises

1. Show that equality obtains in the Cauchy-Schwartz inequality if and only if thevectors x and y are collinear (i.e., either x = ay for some a E IRor y = bx forsome b E IR). (Note: This is a difficult question. Prove the result for unit vectors

oProof See Rudin (1976, Theorem 9.28, p.224).

Dg(x) = -(DF_y(x, y))⁻¹ · DF_x(x, y).

Theorem 1.77 (Implicit Function Theorem) Let F: S ⊂ R^(m+n) → Rⁿ be a C¹ function, where S is open. Let (x*, y*) be a point in S such that DF_y(x*, y*) is invertible, and let F(x*, y*) = c. Then, there is a neighborhood U ⊂ R^m of x* and a C¹ function g: U → Rⁿ such that (i) (x, g(x)) ∈ S for all x ∈ U, (ii) g(x*) = y*, and (iii) F(x, g(x)) ≡ c for all x ∈ U. The derivative of g at any x ∈ U may be obtained from the chain rule:
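For a single constraint the chain-rule formula of Theorem 1.77 reduces to g'(x) = -(∂F/∂y)⁻¹ ∂F/∂x. The sketch below (ours, with the illustrative constraint F(x, y) = x² + y² and level c = 25) compares this implicit derivative with a finite-difference derivative of the explicit solution.

```python
import numpy as np

# Around (x*, y*) = (3, 4) we have DF_y = 2y != 0, g(x) = sqrt(25 - x^2),
# and g'(x) = -(DF_y)^{-1} DF_x = -x/y.

x_star, y_star, c = 3.0, 4.0, 25.0
dFdx, dFdy = 2 * x_star, 2 * y_star
dg_implicit = -dFdx / dFdy                      # -3/4

g = lambda x: np.sqrt(c - x ** 2)               # explicit solution on this level set
h = 1e-6
dg_numeric = (g(x_star + h) - g(x_star - h)) / (2 * h)

print(dg_implicit, dg_numeric)                  # both approximately -0.75
```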

Thus, the values of the x-variable on the level set C(x, y) can be represented explicitlyin terms of the values of the y-variable on this set, through the function h.

In general, an exact form for the original function I may not be specified-forinstance, we may only know that I is an increasing CI function on JR2-so we maynot be able to solve for h explicitly. The question arises whether at least an implicitrepresentation of the function h would exist in such a case.

The Implicit Function Theorem studies this problem in a general setting. That is,it looks at level sets of functions I from S c lR'" to]Rk, where m > k, and askswhen the values of some of the variables in the domain can be represented in termsof the others, on a given level set. Under very general conditions, it proves that atleast a local representation is possible. •

The statement of the theorem requires a little more notation. Given integers m 2: Iand n 2: I, let a typical point in ~,"+n be denoted by (x , y), where x E lRm andy E ]Rn. For a C 1 function F mapping some subset of ]Rm+/J into JR/J, let DFy(x, y)denote that portion of the derivative matrix D F(x, y) corresponding to the last nvariables. Note that D Fy(x, y) is an n x n matrix. D Fx (x, y) is defined similarly.

f(h(y), y) ≡ f(x̄, ȳ),    y ∈ R₊₊.

If we now define the function h: lR++ ~ R by h (y) = I(x, ji) / y, we have

C(x̄, ȳ) = {(x, y) ∈ S | f(x, y) = f(x̄, ȳ)}.

by I{x, y) = xy. Pick any point (x, ji) E S, and consider the "level set"


5. Give an example of a nonconvergent real-valued sequence {Xk} such that [.lie Icontains at least one convergent subsequence, and every convergent subsequenceof {Xk} converges to the limit x = o.

6. Letx,y E ]Rn. Show that for any sequences {Xk}, {n} such that x, --+ x andYk --+ y. we have limr., 00 d(Xb Yk) = d(x, Y), where d is the Euclidean metricon R".

7. Provide a formal proof that if a sequence {Xk} converges to a limit x , then everysubsequence of (Xk] also converges to x.

8. Show that lim supj, Xk = -lim infk( -Xk) for any sequence {xkl in IR.9. Given two sequences {ak} and {bk} in IR, show that

lim sup_k (a_k + b_k) ≤ lim sup_k a_k + lim sup_k b_k

lim inf_k (a_k + b_k) ≥ lim inf_k a_k + lim inf_k b_k.

Give examples where the strict inequalities obtain.

10. Let {ak} and {hk] be two real-valued sequences such that (lk, hk ~ 0 for all k.What is the relationship between lim sUPk(akbk) and (lim sUPk (lk )(lim sUPkbk) ')What if ai; ble ::::0 for all k?

II. Le: ;A~} be a real-valued sequence, and a E IR. When is it the case thatlim sUPk(aa,,) = a lim !>uPk(lie?

12. In each of the following cases, give an example of a real-valued sequence {x_k} meeting the stated properties:

(a) lim sup_k x_k = lim inf_k x_k = +∞.

(b) lim sup_k x_k = +∞ and lim inf_k x_k = -∞.

(c) lim sup_k x_k = +∞ and lim inf_k x_k = 0.
(d) lim sup_k x_k = 0 and lim inf_k x_k = -∞.
or show that no such sequence can exist.

first. i.e .• for vectors x andy which satisfy x . x = Y . Y = 1. Then reduce thegeneral case to this one.)

2. Suppose a real-valued sequence (Xk I satisfies Xic > 0 for all k, and x is any limitpoint of the sequence {Xk}. Under what further conditions on {.q Ican we assertthat x > O?

3. Let {Xk}. {Yk} be sequences in lR.n such that Xk --+ x and Yk --+ y. For each k • letZk = Xk + Yk. and let Wie = Xk • Yk. Show that Zie --+ (x + y) and Wk --+ x . y.

4. Give an example of a sequence {Xk Iwhich has exactly n limit points, wherenE{1.2 .... }.


20. Give an example of an open set in lR that is not convex.

21. Give an example of a compact set in ]R that is not convex.

22. A set X C lRn is said to be connected if there do not exist two non empty opensets XI and X2 such that XI n X2 = 0 and XI U X2 :::> X. Show that everyconnected set in lR must be convex. Give an example of a connected set in ]R2that is not convex.

23. Let X C ]R be an open set. Show that if a finite number of points {XI, .. , , XI}are removed from X, the remaining set is still open. Is this true if we remove acountable infinity of points {Xl, X2, .•• J from X?

24. Let An be the interval (0, lin). Provide a formal proof that n~1 An is empty.

25. Let B C ]R2 be as follows:

B = {(x, y) ∈ R² : y = sin(1/x), x > 0} ∪ {(0, 0)}.

Is B closed? open? bounded? compact?

13. Find the lim sup and the lim inf of each of the following sequences:

(a) x_k = (-1)^k, k = 1, 2, ...
(b) x_k = k(-1)^k, k = 1, 2, ...
(c) x_k = (-1)^k + 1/k, k = 1, 2, ...
(d) x_k = 1 if k is odd, and x_k = -k/2 if k is even, k = 1, 2, ...

14. Find the lim sup and lim inf of the following sequence: 1,1,2,1,2,3,1,2,3,4, ...

15. Let {xkl be a bounded sequence of real numbers. Let S C lRbe the set whichconsists only of members of the sequence {Xk}. i.e.• XES if and only if x = Xkfor some k. What is the relationship between lim sUPkXk and sup S?

16. Find the supremum, infimum, maximum, and minimum of the set X in each ofthe following cases:

(a) X = {x ∈ [0, 1] | x is irrational}.
(b) X = {x | x = 1/n, n = 1, 2, ...}.
(c) X = {x | x = 1 - 1/n, n = 1, 2, ...}.
(d) X = {x ∈ [0, π] | sin x > 1/2}.

17. Prove that the "closed interval" [0,1] is, in fact, a closed set, and that the "openinterval" (0,1) is, in fact, an open set. Prove also that [0,1) and (0,1) are neitheropen nor closed.

18. Consider the set Z; = {O,1,2, ... J of nonnegative integers viewed as a subsetof IR.. Is it closed, open, or neither?

19. Is R" viewed as a subset of itself an open set? Is it a closed set? Explain youranswer.


for every finite subset F of A, then we also have

∩_{α∈A} S_α ∩ S ≠ ∅.


Give an example of a set X C ]R2 that cannot be expressed as the Cartesianproduct of sets A, B C IR.

31. Give an example of an infinite collection of compact sets whose union is boundedbut not compact.

32. Let A = {I, 1/2, 1/3, ... , I/n, ... J U (OJ. Is A closed? Is it compact')

33. Give an example of a countably infinite collection of bounded sets whose unionis bounced and an example where the union is unbounded.

34. Say that a set Sin lRnhas thefinite intersection properly if the following conditionholds: given an arbitrary collection of closed sets (S'a)aEA in IRn, it is the casethat whenever every finite subcollection (Sa )aEF has nonernpty intersection withS, then the entire collection has nonempty intersection with S, i.e., whenever

∩_{α∈F} S_α ∩ S ≠ ∅

A x B = {(a, b) I a E A, b « BJ.

B is called the projection of A onto the x-axis.

(a) If A is a closed set in JR2, is B necessarily a closed set in lR?(b) If A is an open set in ]R2, is B necessarily an open set in lR?

30. Given two subsets A and B of R, recall that their Cartesian product A x B C [R2is defined as

Is A open? bounded? compact?

27. Let A = [-1,0) and B = (0, IJ. Examine whether each of the followingstatements is true or false.

(a) A ∪ B is compact;
(b) A + B = {x + y : x ∈ A, y ∈ B} is compact;
(c) A ∩ B is compact.

28. Let A ⊂ Rⁿ be given. Show that there is a smallest closed set, Ā, which contains A, i.e., Ā is a closed set containing A and if C is a closed set containing A then A ⊆ Ā ⊆ C. The set Ā is called the closure of A.

29. Let A C ]R2. Let B C ]R be defined by

B = {x E IR I there is y E IR such that (x , y) E A}.

26. Let A c ]R2 be defined as

A ={(X,Y)E]R2; I <x <2andy=x).


43. Let S be a finite set. Show that any function f: S --+ ]Rn is continuous on S.What if S is countably infinite?

40. Find the rank of each of the following matrices by finding the size of the largestsubmatrix which has a nonzero determinant:

A~U 6 4 n [ 18 -~] [ -5 8 11 n7 1 B= ~ C = 13 3 02 5 11 10 0 -6

41. Using the first method described in subsection 1.3.6, find the determinant of eachof the following three matrices. Verify that you get the same answers if you usethe second method.

[ -13 n [13 4 1] [ -4

4 ~nA = ~ -2 B= 0 4 C = ~ -310 -7 2 2

42. Find the inverse of the following matrices:

A ~ [~

13 ~][15 24 12] c~p 0 n15 B = 1~ 16 10 814 3 1 11 4

39. Let A beagivenn xn nonsingularmatrix, andletaA be the matrix whose (i. J)-thelement isaai], i, j = 1, ...• n.What is the relationship, if any-between (aA)-1and A-I?

37. Give an example of a bounded and convex set S C 1Rsuch that sup S E S butinf S rt. S.

38. Let A and B be 2 x 2 matrices. Identify general conditions on the entries (aij) and(bij) under which A B = BA. Using these conditions. find a numerical exampleof such matrices A and B.

Show that the finite intersection property is equivalent to compactness, i.e., thata set S is compact if and only if it has the finite intersection property.

35. Show that a set S C lR is convex if. and only if. it is an interval, i.c., it is ofthe form [a, b1. [a. b). (a. b1. or (a. b). (We do not preclude the possibility thata = -00 and/or b = +00.)

36. Show that if S c ]Rn is convex (i.e .• the convex combination of any two pointsin S is also in S). then the convex combination of any finite collection of pointsfrom S is also in S.



Show that if I is continuous at x = 0, then it is continuous at every point ofR. Also show that if I vanishes at a single point of JR, then I vanishes at everypoint of lit

51. Let f: R → R be defined by

f(x) = x sin(1/x) if x ≠ 0, and f(x) = 0 if x = 0.

f(x + y) = f(x)f(y) for all x, y ∈ R.

Find an open set 0 such that I-I(0) is not open and find a closed set C suchthat 1-1 (C) is not closed.

48. Give an example of a function I: IR --+ IRwhich is continuous at exactly twopoints (say, at ° and 1), or show that no such function can exist.

49. Show that it is possible for two functions I: lR --+ lR and g: lR --+ lR to bediscontinuous, but for their product I .g to be continuous. What about theircomposition log?

50. Let I: IR --+ IRbe a function which satisfies

47. Let f: R → R be defined by

f(x) = 1 if 0 ≤ x ≤ 1, and f(x) = 0 otherwise.

is a closed set.

[x E JR" I I(x) = O}

44. Let I be a function from JR." to IRm: For B C JR.m• define 1- I(B) by I-I (lJ) =[x E lRn I I(x) E B}. Show that for any subsets A I. A2 of lRn and BI, 82 ofIRm:

(a) f(A1 ∩ A2) ⊆ f(A1) ∩ f(A2).
(b) f(A1 ∪ A2) = f(A1) ∪ f(A2).
(c) f⁻¹(B1 ∪ B2) = f⁻¹(B1) ∪ f⁻¹(B2).
(d) f⁻¹(B1ᶜ) = [f⁻¹(B1)]ᶜ.
(e) f⁻¹(B1 ∩ B2) = f⁻¹(B1) ∩ f⁻¹(B2).
(f) A1 ⊆ f⁻¹(f(A1)).
(g) f(f⁻¹(B1)) ⊆ B1.

45. Let I :IRn --+ lR be continuous at a point p E lRn. Assume f( p) > O. Showthat there is an open ball B S; lRn such that p E B, and for all x E B, we haveI(x) > 0.

46. Suppose I: lRn --+ lR is a continuous function. Show that the set


Show that the two-sided directional derivative of I evaluated at (x, y) = (0,0)exists in all directions h E 1R2.but that I is not differentiable at (0.0).

f(x, y) = xy / √(x² + y²).

56. Let I: 1R2~ IRbe defined by 1(0,0) = 0, and for (x, y) ¥- (0,0),

for some fixed M > 0 and a > I. then I is a constant function, i.e .. lex) isidentically equal to some real number hat 1I11x E JR.

|f(x) - f(y)| ≤ M(|x - y|)^α

Show that I is continuous at 1/2 but discontinuous elsewhere.

54. Let f: JR'I ~ IR and g: IR ~ lR be continuous functions. Define h: lRn ~ lR byh (x) =: g[f(x»). Show that h is continuous. Is it possible for h to be continuouseven if I and g are not?

55. Show that if a function I: lR ~ IR satisfies

f(x) = x if x is rational, and f(x) = 1 - x if x is irrational.

(Drawing a picture of f for a fixed t will help.) Show that I is a separatelycontinuous function, i.e., for each fixed value of t, I is continuous as a functionof s, and for each fixed value of s, I is continuous in t. Show also that I is notjointly continuous in sand t , i.e., show that there exists a point (5, t) E D and asequence (sn, tn) in Dconvergingto(s. t) such that limn-+oo I(~n, tn) i= I(s, t).

53. Let f : IR ~ lR be defined as

°s E G. tJs E (t, 1].

2s2-­

tfi», t) =

2sand for t > 0,

l(s,O) = 0, for all s E [0, I],

Show that I is continuous at 0.

52. Let D be the unit square [0, I] x [0, I] in JR2. For (s, t) E D, let [ts, t) bedefined by


63. Find the hessians D²f of each of the following functions. Evaluate the hessians at the specified points, and examine if the hessian is positive definite, negative definite, positive semidefinite, negative semidefinite, or indefinite:

(a) f: R² → R, f(x) = x1² + √x2, at x = (1, 1).
(b) f: R² → R, f(x) = (x1x2)^(1/2), at an arbitrary point x ∈ R²₊₊.
(c) f: R² → R, f(x) = (x1x2)², at an arbitrary point x ∈ R²₊₊.
(d) f: R³₊ → R, f(x) = √x1 + √x2 + √x3, at x = (2, 2, 2).
(e) f: R³₊ → R, f(x) = √(x1x2x3), at x = (2, 2, 2).
(f) f: R³₊ → R, f(x) = x1x2 + x2x3 + x3x1, at x = (1, 1, 1).
(g) f: R³₊ → R, f(x) = ax1 + bx2 + cx3 for some constants a, b, c ∈ R, at x = (2, 2, 2).

A~ [~ ° ~]A~U 2 n4

° 6

A~ [~

0 f] [ -I 2 -I]I A = 2 -4 2 .0 -I 2 -I

57. Let f: R² → R be defined by f(0, 0) = 0, and for (x, y) ≠ (0, 0),

f(x, y) = xy(x² - y²) / (x² + y²).

Show that the cross-partials a2 [tx ,y) /axa y and a2 [t;x , y) I ayax exist at all(x,y) E lR2• but that these partials are not continuous at (0, 0). Show also that

∂²f/∂x∂y (0, 0) ≠ ∂²f/∂y∂x (0, 0).

58. Show that an n x n symmetric matrix A is a positive definite matrix if and only if-A is a negative definite matrix. (- A refers to the matrix whose (i, j)-th entryis -aij')

59. Prove the following statement or provide a counterexample to show it is false: ifA is a positive definite matrix. then A -I is a negative definite matrix.

60. Give an example of matrices A and B which are each negative semidefinite, but not negative definite, and which are such that A + B is negative definite.

61. Is it possible for a symmetric matrix A to be simultaneously negative semidefiniteand positive semidefinite? If yes, give an example. If not, provide a proof.

67. Examine the definiteness or sernidefiniteness of the following quadratic forms:


Maximize f(x) subject to x ∈ D,

and

Minimize f(x) subject to x ∈ D,

respectively. Alternatively, and more compactly, we shall also write

max{f(x) | x ∈ D},  and  min{f(x) | x ∈ D}.

2.1 Optimization Problems in ]Rn

An optimization problem in Rⁿ, or simply an optimization problem, is one where the values of a given function f: Rⁿ → R are to be maximized or minimized over a given set D ⊂ Rⁿ. The function f is called the objective function, and the set D the constraint set. Notationally, we will represent these problems by

This chapter constitutes the starting point of our investigation into optimization the­ory. Sections 2.1 and 2.2 introduce the notation we use to represent abstract optimiza­tion problems and their solutions. Section 2.3 then describes a number of examplesof optimization problems drawn from economics and its allied disciplines, which areinvoked at various points throughout the book to illustrate the use .of the techniqueswe develop to identify and characterize solutions to optimization problems. Finally,Sections 2.4 and 2.5 describe the chief questions of interest that we examine over thenext several chapters, and provide a roadmap for the rest of the book.

Optimization in lRn

2


1 It frequently happens throughout this text that definitions or results stated for maximization problems have an exact analog in the context of minimization problems. Rather than spell this analog out on each occasion, we shall often leave it to the reader to fill in the missing details.
2 In this context, see Theorem 2.4 below.

In the sequel, therefore, we will not talk of the solution of a given optimization problem, but of a set of solutions of the problem, with the understanding that this

Example 2.3 Let D = [-1, 1] and f(x) = x² for x ∈ D. The problem of maximizing f on D now has two solutions: x = -1 and x = 1. □

Example 2.2 Let D = [0, 1] and let f(x) = x(1 - x) for x ∈ D. Then, the problem of maximizing f on D has exactly one solution, namely the point x = 1/2. □

Example 2.1 Let D = R₊ and f(x) = x for x ∈ D. Then, f(D) = R₊ and sup f(D) = +∞, so the problem max{f(x) | x ∈ D} has no solution. □
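A small numerical illustration (ours) of the existence and uniqueness issues raised in Examples 2.1-2.3: a grid search over D recovers the unique maximizer in Example 2.2 and both maximizers in Example 2.3, while no finite computation can produce a maximizer in Example 2.1.

```python
import numpy as np

def argmax_on_grid(f, grid, tol=1e-12):
    values = np.array([f(x) for x in grid])
    return grid[np.abs(values - values.max()) < tol]

grid01 = np.linspace(0.0, 1.0, 100001)
grid11 = np.linspace(-1.0, 1.0, 100001)

print(argmax_on_grid(lambda x: x * (1 - x), grid01))   # [0.5]
print(argmax_on_grid(lambda x: x ** 2, grid11))        # [-1.  1.]
# For Example 2.1 (f(x) = x on all of R_+) no finite grid can settle the issue:
# the supremum is +infinity and no maximizer exists.
```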

We say in this case that f attains a minimum on 1) at z, and also refer to z as aminimizer of I on D.

The set of attainable values of I on D, denoted I(D), is defined by

f(D) = {w ∈ R | there is x ∈ D such that f(x) = w}.

We will also refer to feD) as the image of 1) under I. Observe that I attains amaximum on V (at some x) if and only if the set of real numbers feD) has a well­defined maximum, while f attains a minimum on D (at some z) if and only if I(D)has a well-defined minimum. (This is simply a restatement of the definitions.)

The following simple examples reveal two important points: first, that in a givenmaximization problem, a solution may fail to exist (that is, the problem may have nosolution at ail), and, second, that even if a solution does exist, it need not necessarilybe unique (that is, there could exist more than one solution). Similar statementsobviously also hold for minimization problerns.I

f(z) ≤ f(y) for all y ∈ D.

We will say in this case that I attains a maximum on D at x, and also refer to x as amaximizer of I on V.Similarly,' a solution to the problem min{f(x) I xED} is a point z in D such

that

f(x) ≥ f(y) for all y ∈ D.

Problems of the first sort are termedmaximization problems and those of the SCCQIHJ __

sort are called minimization problems. A solution to the problem max{/ex) I xED}is a point x in V such that


Proof We deal with the maximization problem here; the minimization problem isleft as an exercise. Suppose first that x maximizes lover D. Pick any y E V.Then,f(x) 2: fey), and since I{J is strictly increasing, rp(/(x» ::: rp(/(y». Since y E 'Dwas arbitrary, this inequality holds for all y E '0,which states precisely that x is amaximum of I{J0 fan V.

Now suppose that x maximizes cpo fan D, SOI{J(/(X» ::: I{J(/(Y» for all y E V.lfx did not also maximize f on D. there would exist y* E V such that I(y*) > f(x).

Remark As will be evident from the proof. it suffices that rp be a strictly increasingfunction on just the set /(V), i.e., that cp only satisfy I{J(ZI) > rp(Z2) for all ZI, Z2 E

feD) with ZI > Z2. 0

Then x is a maximum of f on D if and only if x is also a maximum of the composition φ ∘ f on D; and z is a minimum of f on D if and only if z is also a minimum of φ ∘ f on D.
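A quick numerical companion to Theorem 2.5 (ours; the objective and the grid are arbitrary): composing the objective with a strictly increasing transformation such as the logarithm leaves the set of maximizers unchanged.

```python
import numpy as np

D = np.linspace(0.01, 1.0, 10000)
f = lambda x: x ** 0.3 * (1 - 0.5 * x) ** 0.7   # illustrative positive objective

x_star_f = D[np.argmax(f(D))]
x_star_log_f = D[np.argmax(np.log(f(D)))]       # phi = log is strictly increasing
print(x_star_f, x_star_log_f)                   # identical grid point
```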

x > y implies l{J(x)> rp(y).

Theorem 2.5 Let I{J: lR ~ lR be a strictly increasing function, that is, a functionsuch that

Proof The point x maximizes lover '0 if and only if f(x) ::: fey) for all y E '0,while x minimizes - f over V if and only if - f(x) s - I(y) for all Y E V.Since I(x) 2: fey) is the same as - f(x) s - fey), the first part of the theorem isproved. The second part of the theorem follows from the first simply by noting that-(- f) = f· 0

Theorem 2.4 Let - I denote the function whose value at any x is - f(x). Then x isa maximum of Ion '0 if and only if x is aminimum of - f on '0;and z is a minimumof f on '0 if and only if z is a maximum of - Ion V.

The set arg minl/(x) I x E '0)of minimizers of I on '0 is defined analogously.We close this section with two elementary, but important, observations, which

we state in the form of theorems for ease of future reference. The first shows thatevery maximization problem may be represented as a minimization problem, andvice versa. The second identifies a transformation of the optimization problem underwhich the solution set remains unaffected.

set could, in general, be empty. The set of all maximizers of I on '0 will be denotedargmax{j(x) I x E V):

arg max{f(x) | x ∈ D} = {x ∈ D | f(x) ≥ f(y) for all y ∈ D}.


Similarly, we let V.«()) denote the set of minimizers of 1(',8) on V(O).

D*(θ) = arg max{f(x, θ) | x ∈ D(θ)} = {x ∈ D(θ) | f(x, θ) ≥ f(z, θ) for all z ∈ D(θ)}.

However, the set of max.imizers is itself an object of considerable interest in parame­trized optimization problems (see Section 2.4 below), and a less cumbersome notationwill be valuable. We shall use V*(()) to denote this set:

arg max{f(x, θ) | x ∈ D(θ)}.

Notice that although our notation is general enough to allow both I and D to dependin a non-trivial way on 8, it also admits as a special case the situation where () affectsthe shape of only the objective function or only the constraint set. More generally, itallows for the possibility that B is a vector of parameters, some of which affect onlythe objective function. and others only the constraint set.

The set of maximizers of f(·, e) on vee) will typically depend in a non-trivialway on (). It would be consistent with our notation for the unparametrized case todenote this set by

min{f(x, θ) | x ∈ D(θ)}.

while the corresponding minimization problem will be denoted

max{f(x, θ) | x ∈ D(θ)}.

2.2 Optimization Problems in Parametric Form

It is often the case that optimization problems are presented in what is called parametric form, that is, the objective function and/or the feasible set depend on the value of a parameter θ in some set of feasible parameter values Θ. Indeed, all of the examples of optimization problems presented in Section 2.3 below belong to this class. Although we do not study parametrized optimization problems until Chapter 9, it is useful to introduce at this point the notation we shall use to represent these problems in the abstract. We will denote by Θ the set of all parameters of interest, with θ denoting a typical parameter configuration in Θ. Given a particular value θ ∈ Θ, the objective function and the feasible set of the optimization problem under θ will be denoted f(·, θ) and D(θ), respectively. Thus, if the optimization problem is a maximization problem, it will be written as

Since φ is a strictly increasing function, it follows that φ(f(y*)) > φ(f(x)), so x does not maximize φ ∘ f over D, a contradiction, completing the proof. □


3 There is a slight abuse of terminology here. Since f*(θ) is defined through the supremum, the "maximized value function" could be well defined even if a maximum does not exist.

2.3.1 Utility Maximization

A basic, but typical, model in consumer theory concerns a single agent who consumes n commodities in nonnegative quantities. Her utility from consuming x_i ≥ 0 units of commodity i (i = 1, ..., n) is given by u(x1, ..., xn), where u: Rⁿ₊ → R is the agent's utility function. The agent has an income of I ≥ 0, and faces the price vector p = (p1, ..., pn), where p_i ≥ 0 denotes the unit price of the i-th commodity.
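As a numerical sketch of this problem (ours, not from the text), the code below assumes a Cobb-Douglas utility u(x) = x1^0.3 x2^0.7 with illustrative prices and income, and solves the budget-constrained problem with scipy; the closed-form Cobb-Douglas demands provide a check.

```python
import numpy as np
from scipy.optimize import minimize

alpha = np.array([0.3, 0.7])          # illustrative Cobb-Douglas exponents
p = np.array([2.0, 1.0])              # illustrative prices
I = 10.0                              # illustrative income

neg_u = lambda x: -np.prod(np.maximum(x, 1e-12) ** alpha)
budget = {"type": "ineq", "fun": lambda x: I - p @ x}       # p.x <= I
bounds = [(0.0, None)] * 2                                   # x >= 0

res = minimize(neg_u, x0=np.array([1.0, 1.0]), bounds=bounds, constraints=[budget])
print(res.x)                 # approximately [1.5, 7.0]
# Closed-form check for Cobb-Douglas with exponents summing to one: x_i* = alpha_i * I / p_i.
print(alpha * I / p)         # [1.5, 7.0]
```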

2.3 Optimization Problems: Some ExamplesEconomics and its allied disciplines are rich in decision problems that can be castas optimization problems. This section describes simple versions of several of theseproblems. Many of these examples are invoked elsewhere in the book to illustrate theuse of the techniques we shall develop. All of the examples presented in this sectionare also optimization problems in parametric form. This is pointed out explicitlyin some cases, but in others we leave it to the reader to identify the parameters ofinterest.

A similar remark is valid for 1.(0) when V.(O) is nonempty.

f*(θ) = f(x*, θ) for any x* ∈ D*(θ).

We call 1* the value function of the problem max{f(x, 8) I x E V(e)}, and thefunction j. the valuefunctionofthe minimization problem min{/(x, 8) I x E Vee»).We shall, in the sequel, refer to these functions as the maximized value function andthe minimized value function, respectively.' Observe that if V* (e) is nonempty forsome e, [hen we must have sup I(v(e» = l(x",8) = [tx'; e) for any x· and x'in v*(e), so in this case, we may also write

f_*(θ) = inf f(D(θ)).

and

f*(θ) = sup f(D(θ))

represents the set of attainable values of f(', B) on V(B), then

f(D(θ)) = {y ∈ R | y = f(x, θ) for some x ∈ D(θ)}

Lastly, we will denote by f*(B) and f.(B), respectively, the supremum and infimumof the objective function given the parameter configuration 8. That is, if


4Well. almost. The feasible set will change as long as prices and income do not change in the same proportion

Minimize p . x subject to x E X(ii).

The objectiv. is to solve:

X(ū) = {x ∈ Rⁿ₊ | u(x) ≥ ū}.

2.3.2 Expenditure Minimization

The expenditure minimization problem represents the flip side of utility maximization. The problem takes as given a price vector p ∈ Rⁿ₊, and asks: what is the minimum amount of income needed to give a utility-maximizing consumer a utility level of at least ū, where ū is some fixed utility level (possibly corresponding to some given commodity bundle). Thus, the constraint set in this problem is the set

In particular, our objective may be to examine how the solutions to the problemchange as the vector of "weights" a = (a I•...• an) changes. for a given level ofprices and income. The vector a would then constitute the only parameter of interest.

u(x1, ..., xn) = x1^{a1} x2^{a2} ··· xn^{an},    a_i > 0 for all i.

The utility maximization problem is a typical example of an optimization problemin parametric form. When the price vector p and/or the income 1change. so too doesthe feasible set" l3(p, I), and therefore. the underlying optimization problem. Thus.p and I parametrize this problem. Since prices and income are usually restricted totaking on nonnegative values. the range of possible values for these parameters isIR+ x lR+

Of course, depending on the purpose of the investigation. P and I Illay not bethe only-or even the primary-parameters of interest. For instance. we may wishto study the problem when the utility function is restricted to being in the Cobh­Douglas class:

Maximize u(x) subject to x E Bt p, 1).

The agent's objective is to maximize the level of her utility over the set of affordablecomrnor'ity bundles. i.e., to solve:

B(p, I) = {x ∈ Rⁿ₊ | p · x ≤ I}.

Her budget set (i.e., the set of feasible or affordable consumption bundles. given her - -income I and the prices p) is denoted B(p, I), and is given by


5Iflhe finn operates in a competitive environment. we usually assume Ihal pe,) = p for all y 2: 0 for somefixed p. i.e., that the finn has no influence over market prices.

Minimize w . x subject to X E F(Y).

Implicitly. this formulation of the cost-minimization problem presumes that a "freedisposal" condition holds, i.e .• that any excess amount produced may be costlesslydiscarded. If there is a nonzero cost of disposal, this should also be built into theobjective function.

and the problem is to solve

F(y) = {x ∈ Rⁿ₊ | g(x) ≥ y}.

2.3.4 Cost MinimizationThe cost minimization problem for a firm is similar to the expenditure minimizationproblem for a consumer. The aim is to identify the mix of inputs that will minimize thecost of producing at least y units of output, given the production function g: lR't ~lR+ and the input price vector W = (WI •.•. , wn) E lR't. Thus. the set of feasibleinput choices is

2.3.3 Profit MaximizationProducer theory in economics studies the decision-making processes of firms. Acanonical, if simple. model involves a firm which produces a single output using ninputs through the production relationship y = g(xl •... ,Xn), where Xi denotes thequantity of the i-th input used in the production process, y is the resultant output,and g: llt'+- ~ llt+ is the firm's production function or technology. The unit price ofinput i is Wj :::: O.When the firm produces y units of output, the unit price it canobtain is given by p(y), where p: lR+ ~ lR+ is the market (inverse- )demand curve. 5

Inputs may be used in any nonnegative quantities. Thus, the set of feasible inputcombinations is llt,+-. Letting W denote the input price vector (WI, ...• wn), and xthe input vector (XI, ... , xn), the firm's objective is to choose an input mix that willmaximize its level of profits, that is, which solves

Maximize p(g(x))g(x) - w · x subject to x ∈ Rⁿ₊.

The profit maximization problem is another typical example of an optimizationproblem in parametric form. The input price vector w, which affects the shape ofthe objective function. evidently parametrizes the problem, but th~e could also beother parameters of interest, such as those affecting the demand curve pt-), or thetechnology g( -}. For example, in the case of a competitive firm, we have p(y) = pfor all y ::::O,where p > 0 is given. In this case, the value of p is obviously importantfor the objective function; p, therefore, constitutes a parameter of the problem.


Letting y(φ) = (y1(φ), ..., yS(φ)), it follows that the agent's utility from choosing the portfolio φ is U(y(φ)).

A portfolio φ is affordable if the amount made from the sales of securities is sufficient to cover the purchases of securities, i.e., if p · φ ≤ 0. It leads to a feasible

y_s(φ) = ω_s + Σ_{i=1}^{N} φ_i z_{is}.

2.3.6 Portfolio ChoiceThe portfolio selection problem in finance involves a single agent who faces S pos­sible states of the world, one of which will be realized as the "true state." The agenthas a utility function U: IR~ ~ IRdefined across these states: if the income the agentwill have in state sis Ys, s = 1, ...• S, his utility is given by U(y! •... , ys).

If the true state of the world is revealed as s, the agent will have an endowment of ω_s ∈ R₊; this represents the agent's initial income in state s. Let ω = (ω1, ..., ωS) represent the initial endowment vector of the agent.

Before the true state is revealed, the agent may shift income from one state to another by buying and selling securities. There are N securities available. One unit of security i costs p_i ≥ 0 and pays an income of z_{is} if state s occurs.

A portfolio is a vector of securities φ = (φ1, ..., φN) ∈ R^N. If the agent selects the portfolio φ, then his income from the securities in state s is Σ_{i=1}^{N} φ_i z_{is}; thus, his overall income in state s is

Maximize u(x, l) subject to (x, l) ∈ F(p, w).

The agent's maximization problem may now be represented as:

2.3.5 Consumption-Leisure Choice

A simple model of the labor-supply decision process of households may be obtained from the utility maximization problem by making the income level I also an object of choice. That is, it is assumed that the agent begins with an endowment of H ≥ 0 units of "time," and faces a wage rate of w ≥ 0. By choosing to work for L units of time, the agent can earn an income of wL; thus, the maximum possible income is wH, while the minimum is zero. The agent also gets utility from leisure, i.e., from the time not spent working. Thus, the agent's utility function takes the form u(x1, ..., xn, l), where x_i represents the consumption of commodity i, and l = H - L is the amount of leisure enjoyed. Letting p = (p1, ..., pn) ∈ Rⁿ₊ denote the price vector for the n consumption goods, the set of feasible leisure-consumption bundles for the agent is, therefore, given by:

F(p, w) = {(x, l) ∈ R^(n+1)₊ | p · x ≤ w(H - l), l ≤ H}.


u_i(x_i') ≥ u_i(x_i), for i = 1, 2,

with strict inequality for at least one i.

F(ω) = {(x1, x2) ∈ Rⁿ₊ × Rⁿ₊ | x1 + x2 ≤ ω}.

A feasible allocation is Pareto optimal if there does not exist another feasible allo­calion (x;, x2) such that

2.3.7 Identifying Pareto Optima

The notion of a Pareto optimum is fundamental to many areas of economics. An economic outcome involving many agents is said to be Pareto optimal if it is not possible to readjust the outcome in a manner that makes some agent better off, without making at least one of the other agents worse off. The identification and characterization of Pareto optimal outcomes is a matter of considerable interest in many settings. A simple example involves allocating a given total supply of resources ω ∈ Rⁿ₊ between two agents. Agent i's utility from receiving the allocation x_i ∈ Rⁿ₊ is u_i(x_i), where u_i: Rⁿ₊ → R is agent i's utility function. An allocation (x1, x2) is feasible if x1, x2 ≥ 0 and x1 + x2 ≤ ω. Let F(ω) be the set of all feasible outcomes:

The agent's problem is to selecttheportfoliothatmaximizeshis utility from amongthe set of all feasible portfolios:

Maximize U(y(φ)) subject to φ ∈ Φ(p, ω).

As with the earlier examples, the portfolio selection problem is also one in para­metric form. In this case, the parameters of interest include the initial endowmentvectorw, the security price vector p, and the securitypayoffs (zij). Of course, onceagain, these need not be the only parameters of interest. For instance, the agent'sutility function may be of the expected utility form

U(y_1, ..., y_S) = Σ_{s=1}^S π_s u(y_s),

where π_s represents the (agent's subjective) probability of state s occurring, and u(y_s) is the utility the agent obtains from having an income of y_s in state s. In this case, the probability vector π = (π_1, ..., π_S) also forms part of the parameters of interest; so too may any parameters that affect the form of u(·).

A portfolio φ is affordable if p · φ ≤ 0; it leads to a feasible consumption policy if income is nonnegative in all states, i.e., if y_s(φ) ≥ 0 for all s. A portfolio is feasible if it is both affordable and leads to a feasible consumption policy. Thus, the set of all feasible portfolios is

Φ(p, ω) = {φ ∈ ℝ^N | p · φ ≤ 0 and y(φ) ≥ 0}.
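The primitives of the portfolio problem are easy to encode. In the Python sketch below, the payoff matrix Z, the prices, the endowment, the probabilities, and the log form of the utility are all illustrative assumptions; the sketch checks whether a candidate portfolio belongs to Φ(p, ω) and evaluates its expected utility.

    import numpy as np

    # Illustrative primitives for the portfolio-choice problem.
    Z = np.array([[1.0, 1.0, 1.0],     # security 1: riskless bond
                  [2.0, 1.0, 0.5]])    # security 2: risky asset;  Z[i, s] = z_is
    p = np.array([0.9, 1.0])           # security prices
    omega = np.array([1.0, 1.0, 1.0])  # state-contingent endowment
    pi = np.array([0.3, 0.4, 0.3])     # subjective state probabilities

    def state_income(phi):
        """y_s(phi) = omega_s + sum_i phi_i * z_is."""
        return omega + phi @ Z

    def is_feasible(phi):
        """phi lies in Phi(p, omega): affordable, nonnegative income in all states."""
        return p @ phi <= 1e-12 and np.all(state_income(phi) >= 0)

    def expected_utility(phi):
        return pi @ np.log(state_income(phi))

    phi = np.array([1.0, -0.9])        # buy the bond, short the risky asset
    print(is_feasible(phi), state_income(phi), expected_utility(phi))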



Φ(ω) = {(x_1, ..., x_n, y) ∈ ℝ^{n+1}_+ | there is x ∈ ℝ_+ such that y = h(x) and x + x_1 + ··· + x_n ≤ ω}.

2.3.8 Optimal Provision of Public Goods
As opposed to private consumption goods, a public good in economics is defined as a good which has the characteristic that one agent's consumption of the good does not affect other agents' ability to also consume it. A central problem in public finance concerns the optimal provision of public goods. A simple version of this problem involves a setting with n agents, a single public good, and a single private good ("money"). The initial total endowment of the private good is ω > 0, and that of the public good is zero. The private good can either be used in the production of the public good, or it can be allocated to the agents for consumption by them. If x ≥ 0 is the amount of the private good used in production of the public good, then the amount of the public good produced is h(x), where h: ℝ_+ → ℝ_+. Letting x_i denote the allocation of the private good to agent i, the set of feasible combinations of private consumption and public good provision that can be achieved is, therefore,

which contradicts the fact that (x_1*, x_2*) is a maximum of the weighted utility maximization problem.
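The weighted-utility approach is easy to illustrate numerically. In the Python sketch below, the single divisible good, the log utilities, and the value of ω are all illustrative assumptions; for each weight α the program recovers the corresponding Pareto-optimal division by direct search.

    import numpy as np

    # Tracing Pareto optima via the weighted problem, assuming a single
    # divisible good (omega = 10) and log utilities for both agents.
    omega = 10.0
    u1 = u2 = np.log

    def weighted_optimum(alpha, grid_size=100001):
        x1 = np.linspace(1e-9, omega - 1e-9, grid_size)
        values = alpha * u1(x1) + (1 - alpha) * u2(omega - x1)
        return x1[np.argmax(values)]

    for alpha in (0.25, 0.5, 0.75):
        x1_star = weighted_optimum(alpha)
        # For log utilities the exact solution is x1* = alpha * omega.
        print(f"alpha = {alpha:.2f}: x1* ~ {x1_star:.3f}, x2* ~ {omega - x1_star:.3f}")

Each value of α picks out a different Pareto-optimal division, which is consistent with the argument in the text.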

Observe that the weight α ∈ (0, 1) is a parameter of this optimization problem, as is the total endowment vector ω. It is not very hard to see that every solution (x_1*, x_2*) to this weighted utility maximization problem must define a Pareto optimal outcome. For, if (x_1*, x_2*) were a solution, but were not Pareto optimal, there would exist (x_1', x_2') ∈ F(ω) such that u_i(x_i') ≥ u_i(x_i*) with at least one strict inequality. But this would imply

U(x_1', x_2', α) = αu_1(x_1') + (1 − α)u_2(x_2') > αu_1(x_1*) + (1 − α)u_2(x_2*) = U(x_1*, x_2*, α),

Maximize U(x_1, x_2, α) subject to (x_1, x_2) ∈ F(ω).

Now consider the optimization problem

There are several alternative methods for identifying a Pareto optimal allocation. One is to use a "weighted" utility function. Pick any α ∈ (0, 1), and let U(x_1, x_2, α) be defined by U(x_1, x_2, α) = αu_1(x_1) + (1 − α)u_2(x_2).



6 It is possible to describe a suitable set of regularity conditions on u under which such existence and uniqueness may be guaranteed, but this is peripheral to our purposes here.

T(p, I) = {τ ∈ ℝ^n | τ · x(p + τ, I) ≥ R}.

Maximize u(x) subject to x ∈ B(p + τ, I) = {x ∈ ℝ^n_+ | (p + τ) · x ≤ I}.

Assume that a unique solution exists to this problem for each choice of values of the parameter vector (p, τ, I),⁶ and denote this solution by x(p + τ, I). That is, x(p + τ, I) is the commodity bundle that maximizes u(x) over B(p + τ, I). Further, let v(p + τ, I) be the maximized value of the utility function given (p, τ, I), i.e., v(p + τ, I) = u(x(p + τ, I)).
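For a concrete illustration, suppose (purely as an assumption) that u is of the Cobb-Douglas form u(x) = Σ_i a_i log x_i with Σ_i a_i = 1, so that the demands x_i(p + τ, I) = a_i I/(p_i + τ_i) and the indirect utility v(p + τ, I) are available in closed form. The following Python sketch computes the tax revenue and the consumer's maximized utility generated by a candidate tax vector; all numbers are illustrative.

    import numpy as np

    # Cobb-Douglas specification u(x) = sum_i a_i*log(x_i) with sum_i a_i = 1,
    # for which x_i(q, I) = a_i*I/q_i and v(q, I) = sum_i a_i*log(a_i*I/q_i),
    # where q = p + tau is the gross (tax-inclusive) price vector.
    a = np.array([0.5, 0.3, 0.2])
    p = np.array([1.0, 2.0, 4.0])
    I = 100.0

    def demand(tau):
        return a * I / (p + tau)

    def indirect_utility(tau):
        return np.sum(a * np.log(demand(tau)))

    def revenue(tau):
        return tau @ demand(tau)            # tau . x(p + tau, I)

    tau = np.array([0.10, 0.20, 0.40])      # a candidate tax vector
    print(f"revenue = {revenue(tau):.2f},  v(p + tau, I) = {indirect_utility(tau):.4f}")
    # A tax vector is feasible for the government's problem if revenue(tau) >= R.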

Now suppose the government has a given revenue requirement of R. The optimal commodity taxation problem is to determine the tax vector τ that will (a) raise at least the required amount of money, and (b) make the consumer best-off amongst all plans that will also raise at least R in tax revenue. That is, suppose that the tax authority chooses the tax vector τ. Then, the consumer will choose the commodity bundle x(p + τ, I), so this will result in a total tax revenue of τ · x(p + τ, I). Thus, the set of feasible tax vectors is

2.3.9 Optimal Commodity Taxation
A second question of interest in public finance is determining the optimal rate at which different commodities should be taxed to raise a given amount of money. It is taken for granted that consumers will react to tax changes in a manner that maximizes their utility under the new tax regime. Thus, the study of optimal taxation begins with the utility maximization problem of a typical consumer.

There are n goods in the model, indexed by i. The (pre-tax) prices of the n commodities are given by the vector p = (p_1, ..., p_n). Given a tax of τ_i per unit of the i-th commodity, the gross price vector faced by the consumer is p + τ, where τ = (τ_1, ..., τ_n). Thus, if I represents the consumer's income, and u: ℝ^n_+ → ℝ the consumer's utility function, the consumer solves

Maximize W[u_1(x_1, y), ..., u_n(x_n, y)] subject to (x_1, ..., x_n, y) ∈ Φ(ω).
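A minimal numerical sketch of the provision problem is given below; the production function h(x) = √x, the quasilinear utilities u_i(x_i, y) = x_i + b_i log(1 + y), and the utilitarian choice of W are all illustrative assumptions, chosen so that only the total production input x needs to be determined.

    import numpy as np

    # With quasilinear utilities and W = u_1 + ... + u_n, only the total amount
    # of private consumption matters, so it suffices to choose the production
    # input x; the remainder omega - x is split among the agents.
    omega = 10.0
    b = np.array([1.0, 2.0, 3.0])          # taste parameters for the public good

    def welfare(x):
        y = np.sqrt(x)                     # h(x)
        return (omega - x) + np.sum(b) * np.log(1.0 + y)

    x_grid = np.linspace(0.0, omega, 100001)
    x_star = x_grid[np.argmax(welfare(x_grid))]
    print(f"input into public good: x* ~ {x_star:.3f}, provision y* ~ {np.sqrt(x_star):.3f}")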

The utility of any agent i depends on her own consumption x_i of the private good, and on the level y of provision of the public good, and is denoted u_i(x_i, y). A social welfare function is a function W(u_1, ..., u_n) that takes as its arguments the utility levels of the n agents. (For instance, W could be a weighted utility function a_1u_1 + ··· + a_nu_n as in the previous subsection.) The public goods provision problem typically takes as given some social welfare function W, and aims to find the combination of private consumption levels (x_1, ..., x_n) and public good provision y that is feasible and that maximizes W, i.e., that solves



7 For instance, θ could be a real number representing a variable such as "income" or "price," and x could represent consumption levels.

1. The identification of conditions that every solution to an optimization problem must satisfy, that is, of conditions that are necessary for an optimum point.

2. The identification of conditions such that any point that meets these conditions is a solution, that is, of conditions that are sufficient to identify a point as being optimal.

3. The identification of conditions that ensure only a single solution exists to a given optimization problem, that is, of conditions that guarantee uniqueness of solutions.

4. A general theory of parametric variation in a parametrized family of optimization problems. For instance, given a family of optimization problems of the form max{f(x, θ) | x ∈ D(θ)}, we are interested in

(a) the identification of conditions under which the solution set D*(θ) varies "continuously" with θ, that is, conditions which give rise to parametric continuity; and

(b) in problems where the parameters θ and actions x have a natural ordering, the identification of conditions under which an increase in the value of θ also leads to an increase in the value of the optimal action x*, that is, of conditions that lead to parametric monotonicity.

2.4 The Objectives of Optimization Theory
Optimization theory has two main objectives. The first is to identify a set of conditions on f and D under which the existence of solutions to optimization problems is guaranteed. An important advantage to be gained from having such conditions is that the case-by-case verification of existence of solutions in applications can be avoided. In particular, given a parametrized family of optimization problems, it will be possible to identify restrictions on parameter values which ensure existence of solutions. On the other hand, to be of use in a wide variety of applications, it is important that the identified conditions be as general as possible. This makes the "existence question" non-trivial. The second objective of optimization theory is more detailed and open-ended: it

lies in obtaining a characterization of the set of optimal points. Broad categories of questions of interest here include the following:

Maximize v(p + τ, I) subject to τ ∈ T(p, I).

The tax vector τ gives the optimally reacting consumer a maximized utility of v(p + τ, I). Thus, the optimal commodity taxation problem is to solve



The main result of this chapter is the Theorem of Lagrange, which describes necessary conditions that must be met at all local optima of such problems. Analogous to the results for interior optima, we also describe conditions that are sufficient to identify specific points as being local optima. The remainder of the chapter focusses on

D = {x ∈ ℝ^n | g_i(x) = 0, i = 1, ..., k}.

2.5 A Roadmap

Chapter 3 begins our study of optimization with an examination of the fundamental question of existence. The main result of this chapter is the Weierstrass Theorem, which provides a general set of conditions under which optimization problems are guaranteed to have both maxima and minima. We then describe the use of this result in applications, and, in particular, illustrate how it could come in handy even in problems where the given structure violates the conditions of the theorem.

Chapters 4 through 6 turn to a study of necessary conditions for optima using differentiability assumptions on the underlying problem. Since differentiability is only a local property, the results of these chapters all pertain only to local optima, i.e., to points that are optima in some neighborhood, but not necessarily on all of the constraint set. Formally, we say that x is a local maximum of f on D if there is r > 0 such that x is a maximum of f on the set B(x, r) ∩ D. Local minima are defined similarly. Of course, every optimum of f on D is also a local optimum of f on D. To distinguish between points that are only local optima and those that are optima on all of D, we will refer to the latter as global optima.

Chapter 4 looks at the simplest case, where local optima occur in the interior of the feasible set D, i.e., at a point where it is possible to move away from the optimum in any direction by at least a small amount without leaving D. The main results here pertain to necessary conditions that must be met by the derivatives of the objective function at such points. We also provide sufficient conditions on the derivatives of the objective function that identify specific points as being local optima.

Chapters 5 and 6 examine the case where some or all of the constraints in D could matter at an optimal point. Chapter 5 focuses on the classical case where D is specified implicitly using equality constraints, i.e., where there are functions g_i: ℝ^n → ℝ, i = 1, ..., k, such that

These questions, and related ones, form the focus of the next several chapters of this book. As is perhaps to be expected, further hypotheses on the structure of the underlying optimization problems are required before many of them can be tackled. For instance, the notion of differentiability plays a central role in the identification of necessary conditions for optima, while the key concept in obtaining sufficient conditions is convexity. The next section elaborates further on this point.



The centerpiece of this chapter is the Theorem of Kuhn and Tucker, which describes necessary conditions for optima in such problems. The remainder of the analysis in this chapter is, in some sense, isomorphic to that of Chapter 5. We discuss a cookbook procedure for using the Theorem of Kuhn and Tucker in applications. As with the cookbook procedure for solving equality-constrained problems, this one is also not guaranteed to be successful, and once again, for the same reason (namely, because the conditions of the relevant theorem are only necessary, and not also sufficient). Nonetheless, this procedure also works well in practice, and the chapter examines when the procedure will definitely succeed and when it could fail.

Chapters 7 and 8 turn to a study of sufficient conditions, that is, conditions that, when met, will identify specific points as being global optima of given optimization problems. Chapter 7 presents the notion of convexity, that is, of convex sets, and of concave and convex functions, and examines the continuity, differentiability, and curvature properties of such functions. It also provides easy-to-use tests for identifying concave and convex functions in practice. The most important results of this chapter, however, pertain to the use of convexity in optimization theory. We show that the same first-order conditions that were proved, in earlier chapters, to be necessary for local optima, are also, under convexity conditions, sufficient for global optima.

Although these results are very strong, the assumption of convexity is not an unrestrictive one. In Chapter 8, therefore, a weakening of this condition, called quasi-convexity, is studied. The weakening turns out to be substantial; quasi-concave and quasi-convex functions fail to possess many of the strong properties that characterize concave and convex functions. However, it is again possible to give tests for identifying such functions in practice. This is especially useful because our main result of this chapter shows that, under quasi-convexity, the first-order necessary conditions for local optima are "almost" sufficient for global optima; that is, they are sufficient whenever some mild additional regularity conditions are met.

Chapters" md 10 address questions that arise from the study of parametric fami Iiesof optimizat on problems. Chapter 9 examines the issue of parametric continuityunder what conditions on the primitives will the solutions to such problems vary

D = {x ∈ ℝ^n | h_i(x) ≥ 0, i = 1, ..., l}.

using the theory in applications. We describe a "cookbook" procedure for using the necessary conditions of the Theorem of Lagrange in finding solutions to equality-constrained optimization problems. The procedure will not always work, since the conditions of the theorem are only necessary, and not also sufficient. Nonetheless, the procedure is quite successful in practice, and the chapter explores why this is the case, and, importantly, when the procedure could fail.

Chapter 6 moves on to inequality-constrained optimization problems, that is, where the constraint set is specified using functions h_i: ℝ^n → ℝ, i = 1, ..., l, as



1. Is Theorem 2.5 valid if φ is only required to be nondecreasing (i.e., if x > y implies φ(x) ≥ φ(y)) instead of strictly increasing? Why or why not?

2. Give an example of an optimization problem with an infinite number of solutions.

2.6 Exercises

continuously with the parameters θ? Of course, continuity in the solutions will not obtain without continuity in the primitives, and this requires a notion of continuity for a map such as D(·) which takes points θ ∈ Θ into sets D(θ) ⊂ ℝ^n. The study of such point-to-set maps, called correspondences, is the starting point of this chapter. With a satisfactory notion of continuity for correspondences in hand, Chapter 9 presents one of the major results of optimization theory, the Maximum Theorem. Roughly speaking, the Maximum Theorem states that the continuity properties of the primitives are inherited, but not in their entirety, by the solutions. A second result, the Maximum Theorem under Convexity, studies the impact of adding convexity assumptions on the primitives. Chapter 10 looks at the problem of parametric monotonicity. It introduces the

key notion of supermodularity of a function, which has become, in recent years, a valuable tool for the study of incomplete-information problems in economics. It is shown that, under some regularity conditions on the problem, supermodularity of the objective function suffices to yield monotonicity of optimal actions in the parameter. Chapters 11 and 12 introduce the reader to the field of dynamic programming, i.e.,

of multiperiod decision problems, in which the decisions taken in any period affect the decision-making environment in all future periods. Chapter 11 studies dynamic programming problems with finite horizons. It is shown that under mild continuity conditions, an optimal strategy exists in such problems. More importantly, it is shown that the optimal strategies can be recovered by the process of backwards induction, that is, by beginning in the last period of the problem and working backwards to the first period. Chapter 12 looks at dynamic programming problems with an infinite horizon. Such

problems are more complex than the corresponding finite-horizon case, since they lack a "last" period. The key to finding a solution turns out to be the Bellman Equation, which is essentially a statement of dynamic consistency. Using this equation, we show that a solution to the problem can be shown to exist under just continuity and boundedness conditions, and the solution itself obtained using a rather simple procedure. A detailed presentation of the neoclassical model of economic growth illustrates the use of these results; more importantly, it also demonstrates how convexity conditions can be worked into dynamic programming problems to obtain a sharp characterization of the solution.



(a) f(x) = 1 + x for all x ∈ D.
(b) f(x) = 1, if x < 1/2, and f(x) = 2x otherwise.
(c) f(x) = x, if x < 1, and f(1) = 2.
(d) f(0) = 1, f(1) = 0, and f(x) = 3x for x ∈ (0, 1).

4. Let D = [0, 1]. Suppose f: D → ℝ is increasing on D, i.e., for x, y ∈ D, if x > y, then f(x) > f(y). (Note that f is not assumed to be continuous on D.) Is f(D) a compact set? Prove your answer, or provide a counterexample.

5. Find a function f: ℝ → ℝ and a collection of sets S_k ⊂ ℝ, k = 1, 2, 3, ..., such that f attains a maximum on each S_k, but not on ∩_{k=1}^∞ S_k.

6. Give an example of a function f: [0, 1] → ℝ such that f([0, 1]) is an open set.

7. Give an example of a set D ⊂ ℝ and a continuous function f: D → ℝ such that f attains its maximum, but not a minimum, on D.

8. Let D = [0, 1]. Let f: D → ℝ be an increasing function on D, and let g: D → ℝ be a decreasing function on D. (That is, if x, y ∈ D with x > y, then f(x) > f(y) and g(x) < g(y).) Then, f attains a minimum and a maximum on D (at 0 and 1, respectively), as does g (at 1 and 0, respectively). Does f + g necessarily attain a maximum and minimum on D?

3. Let D = [0, 1]. Describe the set f(D) in each of the following cases, and identify sup f(D) and inf f(D). In which cases does f attain its supremum? What about its infimum?



Theorem 3.1 (The Weierstrass Theorem) Let D ⊂ ℝ^n be compact, and let f: D → ℝ be a continuous function on D. Then f attains a maximum and a minimum

3.1 The Weierstrass Theorem

The following result, a powerful theorem credited to the mathematician Karl Weierstrass, is the main result of this chapter:

We begin our study of optimization with the fundamental question of existence: under what conditions on the objective function f and the constraint set D are we guaranteed that solutions will always exist in optimization problems of the form max{f(x) | x ∈ D} or min{f(x) | x ∈ D}? Equivalently, under what conditions on f and D is it the case that the set of attainable values f(D) contains its supremum and/or infimum? Trivial answers to the existence question are, of course, always available: for

instance, f is guaranteed to attain a maximum and a minimum on D if D is a finite set. On the other hand, our primary purpose in studying the existence issue is from the standpoint of applications: we would like to avoid, to the maximum extent possible, the need to verify existence on a case-by-case basis. In particular, when dealing with parametric families of optimization problems, we would like to be in a position to describe restrictions on parameter values under which solutions always exist. All of this is possible only if the identified set of conditions possesses a considerable degree of generality. The centerpiece of this chapter, the Weierstrass Theorem, describes just such a

set of conditions. The statement of the theorem, and a discussion of its conditions, is the subject of Section 3.1. The use of the Weierstrass Theorem in applications is examined in Section 3.2. The chapter concludes with the proof of the Weierstrass Theorem in Section 3.3.

Existence of Solutions: The Weierstrass Theorem

3


Then D is not compact (it is neither closed nor bounded), and f is discontinuous at every single point of its domain (it "chatters" back and forth between the values 0 and 1). Nonetheless f attains a maximum (at every rational number) and a minimum (at every irrational number). □

f(x) = 1 if x is rational, and f(x) = 0 otherwise.

Example 3.5 Let D = ℝ_++, and let f: D → ℝ be defined by

Note that D is compact, but f fails to be continuous at just the two points −1 and 1. In this case, f(D) is the open interval (−1, 1); consequently, f fails to attain either a maximum or a minimum on D. □

Example 3.4 Let D = [−1, 1], and let f be given by

f(x) = 0 if x = −1 or x = 1, and f(x) = x if −1 < x < 1.

Example 3.3 Let D = (0, 1) and f(x) = x for all x ∈ (0, 1). Then f is continuous, but D is again noncompact (this time it is bounded, but not closed). The set f(D) is the open interval (0, 1), so, once again, f attains neither a maximum nor a minimum on D. □

attains neither a maximum nor a minimum on D. □

Example 3.2 Let D = ℝ, and f(x) = x³ for all x ∈ ℝ. Then f is continuous, but D is not compact (it is closed, but not bounded). Since f(D) = ℝ, f evidently

It is of the utmost importance to realize that the Weierstrass Theorem only provides sufficient conditions for the existence of optima. The theorem has nothing to say about what happens if these conditions are not met, and, indeed, in general, nothing can be said, as the following examples illustrate. In each of Examples 3.2-3.4, only a single condition of the Weierstrass Theorem is violated, yet maxima and minima fail to exist in each case. In the last example, all of the conditions of the theorem are violated, yet both maxima and minima exist.
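The failures in Examples 3.2-3.4 are easy to see numerically. The Python sketch below approximates each domain by a fine grid (only an approximation, of course, since none of these sets is finite) and confirms that the supremum of f(D) is never attained.

    import numpy as np

    # Example 3.2: D = R (truncated to a finite grid here).
    x = np.linspace(-100, 100, 200001)
    print("Ex 3.2: sup f(D) is +infinity; on the truncated grid, max =", np.max(x**3))

    # Example 3.3: D = (0, 1), f(x) = x.
    x = np.linspace(1e-9, 1 - 1e-9, 1000001)
    print("Ex 3.3: sup f(D) = 1, but the max over the open interval is only", np.max(x))

    # Example 3.4: D = [-1, 1], f = 0 at the endpoints and f(x) = x inside.
    x = np.linspace(-1, 1, 1000001)
    f = np.where(np.abs(x) == 1.0, 0.0, x)
    print("Ex 3.4: sup f(D) = 1 is never attained; grid max =", np.max(f))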

Proof See Section 3.3. □

x ∈ D.

on D, i.e., there exist points z_1 and z_2 in D such that f(z_1) ≥ f(x) ≥ f(z_2) for all



So suppose that (p, I) ∈ Θ'. Observe that even if the agent spent her entire income I on commodity i, her consumption of this commodity cannot exceed I/p_i. Therefore,

Θ' = {(p, I) ∈ Θ | p ≫ 0}.

where the price vector p = (p_1, ..., p_n) and income I are given. As is usual, we shall restrict prices and income to taking on nonnegative values, so the parameter set Θ in this problem is given by ℝ^n_+ × ℝ_+.

It is assumed throughout this example that the utility function u: ℝ^n_+ → ℝ is continuous on its domain. By the Weierstrass Theorem, a solution to the utility maximization problem will always exist as long as the budget set B(p, I) is compact. We will show that compactness of this set obtains as long as p ≫ 0; thus, a solution to the utility maximization problem is guaranteed to exist for all (p, I) ∈ Θ', where Θ' ⊂ Θ is defined by

Maximize u(x) subject to x ∈ B(p, I) = {x ∈ ℝ^n_+ | p · x ≤ I},

Example 3.6 Consider the utility maximization problem

3.2 The Weierstrass Theorem in Applications
Perhaps the most obvious use of the Weierstrass Theorem arises in the context of optimization problems in parametric form: the theorem makes it possible for us to identify restrictions on parameter values that will guarantee existence of solutions in such problems, by simply identifying the subset of the parameter space on which the required continuity and compactness conditions are satisfied. The following example illustrates this point using the framework of subsection 2.3.1.

To restate the point: if the conditions of the Weierstrass Theorem are met, a maximum and a minimum are guaranteed to exist. On the other hand, if one or more of the theorem's conditions fails, maxima and minima may or may not exist, depending on the specific structure of the problem in question.

As one might perhaps anticipate from Examples 3.2 through 3.5, the proof of the Weierstrass Theorem presented in Section 3.3 proceeds in two steps. First, it is shown that, under the stated conditions, f(D) must be a compact set. Since compact sets in ℝ are necessarily also bounded, the unboundedness problem of Example 3.2, where sup f(D) = +∞ and inf f(D) = −∞, cannot arise. Second, it is shown that if A is a compact set in ℝ, max A and min A are always well-defined; in particular, the possibility of f(D) being a bounded set that fails to contain its sup and inf, as in Examples 3.3 and 3.4, is also precluded. Since f(D) has been shown to be compact in step 1, the proof is complete.



where F(ω) = {(x_1, x_2) ∈ ℝ^n_+ × ℝ^n_+ | x_1 + x_2 ≤ ω} represents the set of possible divisions, and u_j is agent j's utility function, j = 1, 2. The parameters of this problem are α ∈ (0, 1) and ω ∈ ℝ^n_+; thus, the space Θ of possible parameter values is given by (0, 1) × ℝ^n_+.

It is easy to see that F(ω) is compact for any ω ∈ ℝ^n_+. Moreover, the weighted utility function αu_1(x_1) + (1 − α)u_2(x_2) is continuous as a function of (x_1, x_2) whenever the underlying utility functions u_1 and u_2 are continuous on ℝ^n_+. Thus, provided only that the utility functions u_1 and u_2 are continuous functions, the Weierstrass Theorem assures us of the existence of a solution to this family of optimization problems for every possible choice of parameters (α, ω) ∈ Θ. □

Maximize αu_1(x_1) + (1 − α)u_2(x_2) subject to (x_1, x_2) ∈ F(ω).

Example 3.7 In subsection 2.3.7, we described how a weighted utility function can be used to identify a Pareto-optimal division of a given quantity of resources between two agents. The optimization exercise here is

A similar, but considerably simpler, argument works for the problem described in subsection 2.3.7:

Since the sequence of real numbers p · x^k satisfies 0 ≤ p · x^k ≤ I for each k, another appeal to Theorem 1.8 yields 0 ≤ p · x ≤ I. Thus, x ∈ B(p, I), and B(p, I) is closed.

As a closed and bounded subset of ℝ^n, it follows that B(p, I) is compact. Since p ∈ ℝ^n_++ and I ≥ 0 were arbitrary, we are done. □
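The compactness argument lends itself to a direct numerical check. In the Python sketch below, the two-good setting and the Cobb-Douglas utility are illustrative assumptions; the budget set is approximated by a grid over the box [0, I/p_1] × [0, I/p_2] guaranteed by the bound above.

    import numpy as np

    # Two goods and an assumed utility u(x) = sqrt(x1*x2).  Since p >> 0, every
    # x in B(p, I) satisfies 0 <= x_i <= I/p_i, so a finite grid over this box
    # approximates the compact budget set.
    p = np.array([2.0, 5.0])
    I = 20.0

    grid1 = np.linspace(0.0, I / p[0], 401)
    grid2 = np.linspace(0.0, I / p[1], 401)
    x1, x2 = np.meshgrid(grid1, grid2)
    in_budget = p[0] * x1 + p[1] * x2 <= I          # the constraint p.x <= I
    u = np.where(in_budget, np.sqrt(x1 * x2), -np.inf)
    i, j = np.unravel_index(np.argmax(u), u.shape)
    print(f"approximate maximizer: x1 ~ {x1[i, j]:.2f}, x2 ~ {x2[i, j]:.2f}")
    # For this utility the exact solution spends half of income on each good:
    # x1* = 5, x2* = 2.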

Σ_{j=1}^n p_j x_j^k → Σ_{j=1}^n p_j x_j = p · x.

x̄ ≡ max{I/p_1, ..., I/p_n},

it is the case that for any x ∈ B(p, I), we have 0 ≤ x ≤ (x̄, ..., x̄). It follows that B(p, I) is bounded.

To see that B(p, I) is also closed, let {x^k} be a sequence in B(p, I), and suppose x^k → x. We are to show that x ∈ B(p, I), i.e., that x ≥ 0 and 0 ≤ p · x ≤ I. The first of these inequalities is easy: since x^k ≥ 0 for all k, and x^k → x, Theorem 1.8 implies x ≥ 0. To see that 0 ≤ p · x ≤ I, pick any j ∈ {1, ..., n}. Since x^k → x, we have x_j^k → x_j by Theorem 1.7. Therefore, p_j x_j^k → p_j x_j for each j, and summing over j, we obtain

if we define



Then, the firm will never use more than ξ_i units of input i: if it does so, then, no matter what the quantities of the other inputs used, this will result in total cost strictly exceeding c. On the other hand, the required output level can be produced

Thus, despite the continuity of the objective function, a direct appeal to the Weierstrass Theorem is precluded.

However, for a suitable range of parameter values, it is possible in this problem to "compactify" the action set, and thus to bring it within the ambit of the Weierstrass Theorem. Specifically, one can show that, whenever the input price vector w satisfies w ≫ 0, it is possible to define a compact subset F̂(y) of F(y), such that attention may be restricted to F̂(y) without any attendant loss of generality. Consider the following procedure. Let x̂ ∈ ℝ^n_+ be any vector that satisfies g(x̂) ≥ y. (We presume that at least one such vector exists; otherwise the set of feasible actions is empty, and the problem is trivial.) Let c = w · x̂. Define for i = 1, ..., n,

ξ_i ≡ 2c / w_i.

a_i > 0 for all i.

and members of the Cobb-Douglas family

g(x) = x_1^{a_1} ··· x_n^{a_n},

g(x) = a_1 x_1 + ··· + a_n x_n, a_i > 0 for all i,

This feasible set is unbounded, and therefore noncompact, for many popularly used forms for g, including the linear technology

F(y) ≡ {x ∈ ℝ^n_+ | g(x) ≥ y}.

Example 3.8 Recall that the cost minimization problem involves finding the cheapest mix of inputs through which a firm may produce (at least) y > 0 units of output, given the input price vector w ∈ ℝ^n_+ and technology g: ℝ^n_+ → ℝ. We will suppose throughout this example that g is a continuous function on ℝ^n_+. The problem is to minimize the (obviously continuous) objective function w · x over the feasible action set

Unlike these problems, natural formulations of some economic models involve feasible action sets that are noncompact for all feasible parameter values. At first blush, it might appear that the Weierstrass Theorem does not have anything to say in these models. However, it frequently happens that such problems may be reduced to equivalent ones with compact action sets, to which the Weierstrass Theorem is applicable. A typical example of this situation is the cost minimization problem outlined in subsection 2.3.4:



1 For some forms of the production function g, the necessary compactification could have been achieved by a simpler argument. For instance, it is apparent that when g is linear, the firm will never produce an output level strictly greater than the minimum required level y. (Otherwise, costs could be reduced by using less of some input and producing exactly y.) So attention may be restricted to the set {x ∈ ℝ^n_+ | g(x) = y}, which in this case is compact. However, the set {x ∈ ℝ^n_+ | g(x) = y} is noncompact for many forms of g (for instance, when g is a member of the Cobb-Douglas family), so, unlike the method outlined in the text, merely restricting attention to this set will not always work.

Like the cost minimization problem, the expenditure minimization problem of subsection 2.3.2 also involves an action space that is unbounded for most popularly used forms for the utility function. However, a construction similar to the one described here can also be employed to show that, under suitable conditions, the action space can be compactified, so that a solution to the expenditure minimization problem does exist (see the Exercises).

Finally, the Weierstrass Theorem can also be used in conjunction with other results to actually identify an optimal point. We demonstrate its value in this direction in the succeeding chapters, where we present necessary conditions that every solution to an optimization problem must satisfy. By themselves, these necessary conditions are not enough to identify an optimum, since they are, in general, not also sufficient; in particular, there may exist points that are not optima that meet these conditions, and indeed, there may exist points meeting the necessary conditions without optima even existing.

On the other hand, if the primitives of the problem also meet the compactness and continuity conditions of the Weierstrass Theorem, a solution must exist. By definition, moreover, the solution must satisfy the necessary conditions. It follows that one of the points satisfying the necessary conditions must, in fact, be the solution. Therefore, a solution can be found merely by evaluating the objective function at each point

Moreover, x^k ≥ 0 for all k implies x ≥ 0 by Theorem 1.8. Therefore, x ∈ F̂(y), so F̂(y) is closed. An appeal to the Weierstrass Theorem now shows the existence of a solution to the problem of minimizing w · x over F̂(y), and, therefore, to the problem of minimizing w · x over F(y).¹ □
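The compactification procedure can be mimicked numerically. In the Python sketch below, the Cobb-Douglas technology, the target output y, and the input prices are illustrative assumptions; the program constructs the bounds ξ_i = 2c/w_i from a feasible x̂ and then minimizes w · x over the resulting compact set by grid search.

    import numpy as np

    # An illustrative Cobb-Douglas technology g(x) = sqrt(x1*x2), target output
    # y, and input prices w >> 0.  Take x_hat with g(x_hat) >= y, set c = w.x_hat,
    # and search over the compact set {x : 0 <= x_i <= 2c/w_i, g(x) >= y}.
    w = np.array([1.0, 4.0])
    y = 2.0
    x_hat = np.array([2.0, 2.0])                    # g(x_hat) = 2 >= y
    c = w @ x_hat                                   # c = 10
    bounds = 2 * c / w                              # xi_i = 2c/w_i

    g1 = np.linspace(1e-6, bounds[0], 801)
    g2 = np.linspace(1e-6, bounds[1], 801)
    x1, x2 = np.meshgrid(g1, g2)
    feasible = np.sqrt(x1 * x2) >= y
    cost = np.where(feasible, w[0] * x1 + w[1] * x2, np.inf)
    i, j = np.unravel_index(np.argmin(cost), cost.shape)
    print(f"approximate minimizer: x1 ~ {x1[i, j]:.2f}, x2 ~ {x2[i, j]:.2f}, "
          f"cost ~ {cost[i, j]:.2f}")
    # For this technology the exact solution is x1* = 4, x2* = 1, with cost 8.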

This restricted set F̂(y) is clearly bounded. The continuity of g implies it is also closed. To see this, let {x^k} be any sequence in F̂(y), and let x^k → x. Then, g(x^k) ≥ y for each k, which implies by the continuity of g that

g(x) = lim_{k→∞} g(x^k) ≥ y.

F̂(y) = {x ∈ ℝ^n_+ | g(x) ≥ y, x_i ≤ ξ_i for all i}.

at a total cost of c by using the input vector x̂. It follows that we may, without loss, restrict attention to the set of actions



Proof Since A is bounded, sup A ∈ ℝ. For k = 1, 2, ..., let N_k represent the interval (sup A − 1/k, sup A], and let A_k = N_k ∩ A. Then A_k must be nonempty for each k. (If not, we would have an upper bound of A that was strictly smaller than sup A, which is impossible.) Pick any point from A_k, and label it x_k.

We claim that x_k → sup A as k → ∞. This follows simply from the observation that since x_k ∈ (sup A − 1/k, sup A] for each k, it must be the case that d(x_k, sup A) < 1/k. Therefore, d(x_k, sup A) → 0 with k, establishing the claim.

Lemma 3.10 If A ⊂ ℝ is compact, then sup A ∈ A and inf A ∈ A, so the maximum and minimum of A are well defined.

Proof Pick any sequence {y_k} in f(D). The lemma will be proved if we can show that {y_k} must have a convergent subsequence, i.e., there is a subsequence {y_m(k)} of {y_k} and a point y ∈ f(D) such that y_m(k) → y.

For each k, pick x_k ∈ D such that f(x_k) = y_k. (For each k, at least one such point x_k must exist by definition of f(D); if more than one exists, pick any one.) This gives us a sequence {x_k} in D. Since D is compact, there is a subsequence {x_m(k)} of {x_k}, and a point x ∈ D such that x_m(k) → x.

Define y ≡ f(x), and y_m(k) = f(x_m(k)). By construction, {y_m(k)} is a subsequence of the original sequence {y_k}. Moreover, since x ∈ D, it is the case that y = f(x) ∈

f(D). Finally, since x_m(k) → x and f is a continuous function, y_m(k) = f(x_m(k)) → f(x) = y, completing the proof. □

Lemma 3.9 If f: D → ℝ is continuous on D, and D is compact, then f(D) is also compact.

3.3 A Proof of the Weierstrass Theorem
The following lemmata, when combined, prove the Weierstrass Theorem. The first shows that, under the theorem's assumptions, f(D) must be a compact set, so, in particular, the problem of unboundedness cannot arise. The second completes the proof by showing that if A ⊂ ℝ is a compact set, max A and min A are both well defined, so the "openness" problem is also precluded.

that satisfies the necessary conditions, and choosing the point that maximizes (or, as required, minimizes) the value of the objective on this set. Since the set of points at which the necessary conditions are met is typically quite small, this procedure is not very difficult to carry out.



1. Prove the following statement or provide a counterexample: if f is a continuous function on a bounded (but not necessarily closed) set D, then sup f(D) is finite.

2. Suppose D ⊂ ℝ^n is a set consisting of a finite number of points {x_1, ..., x_T}. Show that any function f: D → ℝ has a maximum and a minimum on D. Is this result implied by the Weierstrass Theorem? Explain your answer.

3. Call a function f: ℝ^n → ℝ nondecreasing if x, y ∈ ℝ^n with x ≥ y implies f(x) ≥ f(y). Suppose f is a nondecreasing, but not necessarily continuous, function on ℝ^n, and D ⊂ ℝ^n is compact. Show that if n = 1, f always has a maximum on D. Show also that if n > 1, this need no longer be the case.

4. Give an example of a compact set D ⊂ ℝ^n and a continuous function f: D → ℝ such that f(D) consists of precisely k points where k ≥ 2. Is this possible if D is also convex? Why or why not?

5. Let f: ℝ_+ → ℝ be continuous on ℝ_+. Suppose that f also satisfies the conditions that f(0) = 1 and lim_{x→∞} f(x) = 0. Show that f must have a maximum on ℝ_+. What about a minimum?

6. Let C be a compact subset of ℝ. Let g: C → ℝ, and let f: ℝ → ℝ be a continuous function. Does the composition f ∘ g have a maximum on C?

7. Let g: ℝ → ℝ be a function (not necessarily continuous) which has a maximum and minimum on ℝ. Let f: ℝ → ℝ be a function which is continuous on the range of g. Does f ∘ g necessarily have a maximum on ℝ? Prove your answer, or provide a counterexample.

8. Use the Weierstrass Theorem to show that a solution exists to the expenditure minimization problem of subsection 2.3.2, as long as the utility function u is continuous on ℝ^n_+ and the price vector p satisfies p ≫ 0. What if one of these conditions fails?

9. Consider the profit maximization problem of subsection 2.3.3. For simplicity, assume that the inverse-demand curve facing the firm is constant, i.e., that p(y) = p̄ > 0 for all output levels y ≥ 0. Assume also that the technology g is continuous on ℝ^n_+, and that the input price vector w satisfies w ≫ 0. Are these assumptions

3.4 Exercises

The Weierstrass Theorem is an immediate consequence of these lemmata. □

But x_k ∈ A_k ⊂ A for each k, and A is a closed set, so the limit of the sequence {x_k} must be in A, establishing one part of the result. The other part, that inf A ∈ A, is established analogously. □



(a) Suppose there is x* > 0 such that p(x*) = 0. Show that the Weierstrass Theorem can be used to prove the existence of a solution to this problem.

(b) Now suppose instead there is x' > 0 such that c(x) ≥ x p(x) for all x ≥ x'. Show, once again, that the Weierstrass Theorem can be used to prove existence of a solution.

(c) What about the case where p(x) = p̄ for all x (the demand curve is infinitely elastic) and c(x) → ∞ as x → ∞?

14. A consumer, who lives for two periods, has the utility function v(c(1), c(2)), where c(t) ∈ ℝ^n_+ denotes the consumer's consumption bundle in period t, t = 1, 2. The price vector in period t is given by p(t) ∈ ℝ^n_+, p(t) = (p_1(t), ..., p_n(t)). The consumer has an initial wealth of W_0, but has no other income. Any amount not spent in the first period can be saved and used for the second period. Savings earn interest at a rate r ≥ 0. (Thus, a dollar saved in period 1 becomes $(1 + r) in period 2.)

sufficient to guarantee the existence of a solution to the profit maximization problem? Why or why not?

10. Show that the budget set F(p, w) = {(x, l) ∈ ℝ^{n+1}_+ | p · x ≤ w(H − l), l ≤ H} in the consumption-leisure choice problem of subsection 2.3.5 is compact if and only if p ≫ 0.

11. Consider the portfolio selection problem of subsection 2.3.6. The problem is said to admit arbitrage if there is some portfolio φ ∈ ℝ^N such that p · φ ≤ 0 and Z'φ > 0, where p = (p_1, ..., p_N) is the vector of security prices, and Z = (z_is) is the N × S matrix of security payoffs. (In words, the problem admits arbitrage if it is possible to create a portfolio at nonpositive cost, whose payoffs are nonnegative in all states, and are strictly positive in some states.) Suppose that the utility function U: ℝ^S_+ → ℝ in this problem is continuous and strictly increasing on ℝ^S_+ (that is, y, y' ∈ ℝ^S_+ with y > y' implies U(y) > U(y')). Show that a solution exists to the utility maximization problem if and only if there is no arbitrage.

12. Under what conditions (if any) on the primitives does the problem of the optimal provision of public goods (see subsection 2.3.8) meet the continuity and compactness conditions of the Weierstrass Theorem?

13. A monopolist faces a downward-sloping inverse-demand curve p(x) that satisfies p(0) < ∞ and p(x) ≥ 0 for all x ∈ ℝ_+. The cost of producing x units is given by c(x) ≥ 0 where c(0) = 0. Suppose p(·) and c(·) are both continuous on ℝ_+. The monopolist wishes to maximize π(x) = x p(x) − c(x) subject to the constraint x ≥ 0.



subject to x_1 ≤ y_1
           x_2 ≤ y_2 = f(y_1 − x_1)
           x_3 ≤ y_3 = f(y_2 − x_2)

and the nonnegativity constraints x_t ≥ 0, t = 1, 2, 3. Show that if π and f are continuous on ℝ_+, then the Weierstrass Theorem

may be used to prove that a solution exists to this problem. (This is immediate if one can show that the continuity of f implies the compactness of the set of feasible triples (x_1, x_2, x_3).)

Maximize π(x_1) + π(x_2) + π(x_3)

(a) Assume the usual nonnegativity constraints on consumption and set up the consumer's utility-maximization problem.

(b) Show that the feasible set in this problem is compact if and only if p(1) ≫ 0 and p(2) ≫ 0.

15. A fishery earns a profit of π(x) from catching and selling x units of fish. The firm owns a pool which currently has y_1 fish in it. If x ∈ [0, y_1] fish are caught this period, the remaining z = y_1 − x fish will grow to f(z) fish by the beginning of the next period, where f: ℝ_+ → ℝ_+ is the growth function for the fish population. The fishery wishes to set the volume of its catch in each of the next three periods so as to maximize the sum of its profits over this horizon. That is, it solves




1 The term "interior optimum" is sometimes used to represent what we have called an "unconstrained optimum." While this alternative term is both more descriptive and less misleading, it is not as popular, at least in the context of optimization theory. Consequently, we stick to "unconstrained optimum."

A point x where f achieves a maximum will be called an unconstrained maximum of f if x ∈ int D. Unconstrained minima are defined analogously.

One observation is important before proceeding to the analysis. The concepts of maxima and minima are global concepts, i.e., they involve comparisons between the value of f at a particular point x and all other feasible points z ∈ D. On the other hand, differentiability is a local property: the derivative of f at a point x tells us something about the behavior of f in a neighborhood of x, but nothing at all about the behavior of f elsewhere on D. Intuitively, this suggests that if there exists an open set X ⊂ D such that x is a maximum of f on X (but maybe not on all of D), the behavior of f at x would be similar to the behavior of f at a point z which is an unconstrained maximum of f on all of D. (Compare the behavior of f at x* and y* in Figure 4.1.) This motivates the following definitions:

int D = {x ∈ D | there is r > 0 such that B(x, r) ⊂ D}.

4.1 "Unconstrained" Optima

We now turn to a study of optimization theory under assumptions of differentiability. Our principal objective here is to identify necessary conditions that the derivative of f must satisfy at an optimum. We begin our analysis with an examination of what are called "unconstrained" optima. The terminology, while standard, is somewhat unfortunate, since unconstrained does not literally mean the absence of constraints. Rather, it refers to the more general situation where the constraints have no bite at the optimal point, that is, a situation in which we can move (at least) a small distance away from the optimal point in any direction without leaving the feasible set.¹

Formally, given a set D ⊂ ℝ^n, we define the interior of D (denoted int D) by

Unconstrained Optima

4


4.2 First-Order Conditions

Our first result states that the derivative of f must be zero at every unconstrained local maximum or minimum. At an intuitive level, this result is easiest to see in the one-dimensional case: suppose x* were a local maximum (say), and f'(x*) ≠ 0. Then, if f'(x*) > 0, it would be possible to increase the value of f by moving a small amount to the right of x*, while if f'(x*) < 0, the same conclusion could be obtained by moving a small amount to the left of x* (see Figure 4.2); at an

To maintain a distinction between the concepts, we will sometimes refer to a point x which is a maximum of f on all of D as a global maximum. Local and global minima of f on D are defined analogously.

The next two sections classify the behavior of the derivatives of f at unconstrained local maxima and minima. Section 4.2 deals with "first-order" conditions, namely, the behavior of the first derivative Df of f around an optimal point x*, while Section 4.3 presents "second-order" conditions, that is, those relating to the behavior of the second derivative D²f of f around the optimum x*.

• A point x ∈ D is a local maximum of f on D if there is r > 0 such that f(x) ≥ f(y) for all y ∈ D ∩ B(x, r).

• A point x ∈ D is an unconstrained local maximum of f on D if there is r > 0 such that B(x, r) ⊂ D, and f(x) ≥ f(y) for all y ∈ B(x, r).

Fig. 4.1. Global and Local Optima




In the sequel, we will refer to any point x* that satisfies the first-order condition Df(x*) = 0 as a critical point of f on D. To restate the point made by Theorem 4.1

Example 4.2 Let D = ℝ, and let f: ℝ → ℝ be given by f(x) = x³ for x ∈ ℝ. Then, we have f'(0) = 0, but 0 is neither a local maximum nor a local minimum of f on ℝ. For, any open ball around 0 must contain a point x > 0 and a point z < 0, and, of course, x > 0 implies f(x) = x³ > 0 = f(0), while z < 0 implies f(z) = z³ < 0 = f(0). □
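A quick numerical check of this example in Python (using a centered finite-difference quotient, which is of course only an approximation to the derivative):

    # x = 0 is a critical point of f(x) = x**3, yet f takes both larger and
    # smaller values than f(0) arbitrarily close to 0.
    f = lambda x: x**3
    h = 1e-6
    approx_derivative_at_0 = (f(h) - f(-h)) / (2 * h)
    print("f'(0) ~", approx_derivative_at_0)            # ~ 0 (up to rounding)
    print("f(0.001) > f(0) > f(-0.001):", f(0.001) > f(0) > f(-0.001))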

It must be emphasized that Theorem 4.1 only provides a necessary condition for an unconstrained local optimum. The condition is in no way also sufficient; that is, the theorem does not state that if Df(x*) = 0 for some x* in the interior of the feasible set, then x* must be either an unconstrained local maximum or an unconstrained local minimum. In point of fact, it need be neither. Consider the following example:

Proof See Section 4.5. □

Theorem 4.1 Suppose x* ∈ int D ⊂ ℝ^n is a local maximum of f on D, i.e., there is r > 0 such that B(x*, r) ⊂ D and f(x*) ≥ f(x) for all x ∈ B(x*, r). Suppose also that f is differentiable at x*. Then Df(x*) = 0. The same result is true if, instead, x* were a local minimum of f on D.

unconstrained point x*, such moves would be feasible, and this would violate the definition of a local maximum. A similar argument shows that f'(x*) must be zero at a local minimum x*.

Fig. 4.2. First-Order Conditions in ℝ




Observe that unlike Theorem 4.1, the conditions of Theorem 4.3 are not all necessary conditions. In particular, while parts 1 and 2 of Theorem 4.3 do identify necessary conditions that the second derivative D²f must satisfy at local optima, parts 3 and 4 are actually sufficient conditions that identify specific points as being local optima. Unfortunately, these necessary and sufficient conditions are not the same: while the necessary conditions pertain to semidefiniteness of D²f at the optimal point, the sufficient conditions require definiteness of this matrix. This is problematic from the point of view of applications. Suppose we have a critical point x* at which D²f(x*) is negative semidefinite (say), but not negative definite. Then, Part 3 of Theorem 4.3 does not allow us to conclude that x* must be a local maximum. On the other hand, we cannot rule out the possibility of x* being a local maximum either, because by Part 1 of the theorem, D²f(x*) need only be negative semidefinite at a local maximum x*.
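In practice, the definiteness conditions in parts 3 and 4 are often checked through the eigenvalues of the Hessian. The Python sketch below does this for an illustrative quadratic function whose gradient vanishes at the origin; the function itself is an assumption chosen only for the example.

    import numpy as np

    # f(x1, x2) = x1**2 + x2**2 - x1*x2 has Df(0, 0) = 0 and a constant Hessian.
    hessian_at_origin = np.array([[2.0, -1.0],
                                  [-1.0, 2.0]])
    eigenvalues = np.linalg.eigvalsh(hessian_at_origin)   # symmetric, so eigvalsh
    print("eigenvalues:", eigenvalues)
    if np.all(eigenvalues > 0):
        print("D2f is positive definite: the origin is a strict local minimum.")
    elif np.all(eigenvalues >= 0):
        print("D2f is only positive semidefinite: the test is inconclusive.")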

Proof See Section 4.6. □

1. If f has a local maximum at x, then D²f(x) is negative semidefinite.
2. If f has a local minimum at x, then D²f(x) is positive semidefinite.
3. If Df(x) = 0 and D²f(x) is negative definite at some x, then x is a strict local maximum of f on D.
4. If Df(x) = 0 and D²f(x) is positive definite at some x, then x is a strict local minimum of f on D.

Theorem 4.3 Suppose f is a C² function on D ⊂ ℝ^n, and x is a point in the interior of D.

Strict local minima are defined analogously. The following theorem classifies the behavior of D²f at unconstrained local optima.

• A local maximum x of f on D is called a strict local maximum if there is r > 0 such that f(x) > f(y) for all y ∈ B(x, r) ∩ D, y ≠ x.

4.3 Second-Order Conditions

The first-order conditions for unconstrained local optima do not distinguish between maxima and minima. To obtain such a distinction in the behavior of f at an optimum, we need to examine the behavior of the second derivative D²f of f. A preliminary definition first:

and Example 4.2, every unconstrained local optimum must be a critical point, but there could exist critical points that are neither local maxima nor local minima.



4.4 Using the First- and Second-Order Conditions
Taken by itself, the first-order condition of Theorem 4.1 is of limited use in computing solutions to optimization problems for at least two reasons. First, the condition applies only to the cases where the optimum occurs at an interior point of the feasible set, whereas in most applications, some or all of the constraints will tend to matter. Second, the condition is only a necessary one: as evidenced by Example 4.2, it is possible for a point x to meet the first-order condition without x being either a local maximum or a local minimum, and, indeed, without an optimum even existing.

Even if Theorem 4.1 is combined with the second-order conditions provided by Theorem 4.3, a material improvement in the situation does not result. Parts 1 and 2 of Theorem 4.3 also provide only necessary conditions, and the function of Example 4.5 passes both of these conditions at x = 0, although 0 is not even a local maximum or local minimum. While parts 3 and 4 do provide sufficient conditions, these are

Example 4.5 As in Example 4.2, let D = ℝ, and let f: ℝ → ℝ be given by f(x) = x³ for all x ∈ ℝ. Then, f'(x) = 3x² and f''(x) = 6x, so f'(0) = f''(0) = 0. Thus, f''(0), viewed as a 1 × 1 matrix, is both positive semidefinite and negative semidefinite (but neither positive definite nor negative definite). If parts 3 and 4 of Theorem 4.3 only required semidefiniteness, f''(0) would pass both tests; however, 0 is neither a local maximum nor a local minimum of f on D. □

Example 4.4 Let D = ℝ, and let f: ℝ → ℝ and g: ℝ → ℝ be defined by f(x) = x⁴ and g(x) = −x⁴, respectively, for all x ∈ ℝ. Since x⁴ ≥ 0 everywhere, and f(0) = g(0) = 0, it is clear that 0 is a global minimum of f on D, and a global maximum of g on D. However, f''(0) = g''(0) = 0, so viewed as 1 × 1 matrices, f''(0) is positive semidefinite, but not positive definite, while g''(0) is negative semidefinite, but not negative definite. □

or strengthen parts 3 and 4 of the theorem to

If Df(x) = 0 and D²f(x) is negative semidefinite (resp. positive semidefinite), then x is a (possibly non-strict) local maximum (resp. local minimum) of f on D.

However, such strengthening is impossible, as the following examples demonstrate. The first example shows that parts 1 and 2 of the theorem cannot be improved upon, while the second does the same for parts 3 and 4.

This situation could be improved if we could either strengthen parts 1 and 2 of the theorem to

D²f(x) must be negative definite at a local maximum x, and positive definite at a local minimum x,



2 Second-order conditions can also help to rule out candidate critical points. For instance, a critical point cannot be a global maximum if it passes the test for a strict local minimum.
3 A "solution," as always, refers to a global maximum or minimum, whichever is the object of study in the specific problem at hand.

On the other hand, in problems in which it is known a priori that a solution³ does exist (say, because the problem meets the conditions of the Weierstrass Theorem), Theorem 4.1 can, sometimes, come in handy in computing the solution. The simplest case where this is true is where the problem at hand is known to have an unconstrained solution. In this case, the set N* of all points that meet the first-order conditions must contain this unconstrained optimum point (denoted, say, x*); moreover, since x* is an optimum on all of the feasible set, it must certainly be an optimum on N*. Thus, by solving for N* and finding the point that optimizes the objective function over N*, the solution to the problem may be identified.

More generally, even if it is not known that the optimum must be in the interior of the feasible set, a modified version of this procedure can be successfully employed, provided the set of boundary points of the feasible set is "small" (say, is a finite set). To wit, the optimum must either occur in the interior of the feasible set, or on its boundary. Since it must satisfy the first-order conditions in the former case, it suffices, in order to identify the optimum, to compare the values of the objective function at the boundary points with its values at those interior points that meet the first-order conditions. The point that optimizes f over this restricted set is the optimum we are seeking. Here is a simple example of a situation where this procedure works:

−∞. □

Example 4.6 Let D = ℝ, and let f: D → ℝ be given by

f(x) = 2x³ − 3x².

It is easily checked that f is a C² function on ℝ, and that there are precisely two points at which f'(x) = 0: namely, at x = 0 and x = 1. Invoking the second-order conditions, we get f''(0) = −6, while f''(1) = +6. Thus, the point 0 is a strict local maximum of f on D, while the point 1 is a strict local minimum of f on D.
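The calculations in this example are easily verified; the Python sketch below simply re-does the example's own arithmetic.

    f = lambda x: 2 * x**3 - 3 * x**2
    fpp = lambda x: 12 * x - 6                       # f''(x)
    for x in (0.0, 1.0):                             # roots of f'(x) = 6x**2 - 6x
        print(f"x = {x}: f''(x) = {fpp(x)}")         # -6 at 0, +6 at 1
    print("f(10) =", f(10), " f(-10) =", f(-10))     # f is unbounded in both directions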

However, there is nothing in the first- or second-order conditions that will help determine whether these points are global optima. In fact, they are not: global optima do not exist in this problem, since lim_{x→+∞} f(x) = +∞, and lim_{x→−∞} f(x) =

sufficient only for local optima, and not for global optima. Thus, the second-order conditions can at most help ascertain if a point is a local optimum; they cannot help determine if a critical point is a global optimum or not.² In particular, global optima might fail to exist altogether (though several critical points, including local optima, may exist), but the first- or second-order conditions will not help spot this problem. Consider the following example:



4.5 A Proof of the First-Order Conditions

We provide two proofs of Theorem 4.1. The first illustrates the technique of "bootstrapping," a technique that we employ repeatedly in this book. That is, we first prove

{x | p · x = I, x ≥ 0}

is part of the boundary of the budget set B(p, I) = {x | p · x ≤ I, x ≥ 0}. Nonetheless, this procedure is indicative of how global optima may be identified by combining knowledge of the existence of a solution with first-order necessary conditions for local optima. We will return to this point again in the next two chapters.

It bears stressing that the a priori knowledge of existence of an optimum is important if the first-order conditions (or, more generally, any set of necessary conditions) are to be used successfully in locating an optimum. If such knowledge is lacking, then, as in Example 4.6, the set of points satisfying the necessary conditions may fail to contain the global optimum, because no global optimum may exist.

Finally, it must be stated that the limited importance given in this section to the use of second-order conditions in locating optima is deliberate. The value of second-order conditions is often exaggerated. As we have seen, if a solution is known to exist, then it may be identified simply by comparing the value of the objective function at the points that satisfy the first-order conditions. The only role the second-order conditions can play here is, perhaps, to cut down on the number of points at which the comparison has to be carried out; for instance, if we are seeking a maximum of f, then all critical points can be ruled out that also satisfy the second-order conditions for a strict local minimum. On the other hand, if a priori knowledge of existence of a solution is lacking, then as Example 4.6 shows, the most that second-order conditions can do is to help identify local optima, and this is evidently of limited value.

In most economic applications, however, the set of boundary points is quite large, so carrying out the appropriate comparisons is a non-trivial task. For example, in the utility maximization problem of subsection 2.3.1, the entire line segment

Example 4.7 Consider the problem of maximizing f(x) = 4x³ − 5x² + 2x over the unit interval [0, 1]. Since [0, 1] is compact, and f is continuous on this interval, the Weierstrass Theorem shows that f has a maximum on this interval. There are two possibilities: either the maximum occurs at one of the boundary points 0 or 1, or it is an unconstrained maximum. In the latter case, it must meet the first-order condition f'(x) = 12x² − 10x + 2 = 0. The only points that satisfy this condition are x = 1/2 and x = 1/3. Evaluating f at the four points 0, 1/3, 1/2, and 1 shows that x = 1 is the point where f is maximized on [0, 1]. An identical procedure also shows that x = 0 is the point where f is minimized on [0, 1]. □
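The comparison step of this example takes only a few lines of Python; no new assumptions are involved beyond the example itself.

    # Evaluate f at the boundary points and at the interior critical points,
    # then pick the largest and smallest values.
    f = lambda x: 4 * x**3 - 5 * x**2 + 2 * x
    candidates = [0.0, 1/3, 1/2, 1.0]                 # boundary points and roots of f'
    values = {x: f(x) for x in candidates}
    print(values)
    print("maximizer:", max(values, key=values.get))  # x = 1
    print("minimizer:", min(values, key=values.get))  # x = 0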



and the right-hand side (RHS) of this expression converges to ∂f(x*)/∂x_i as k → ∞. Therefore, g is differentiable at 0, and g'(0) = ∂f(x*)/∂x_i.

(g(t_k) − g(0)) / t_k = (f(x* + t_k e_i) − f(x*)) / t_k,

g(t) = f(x* + t e_i),

Note that g(0) = f(x*). Moreover, for any sequence t_k → 0, we have for each k,

Case 2: n > 1

Suppose f has a local maximum at x* ∈ D ⊂ ℝ^n, and that f is differentiable at x*. We will show that ∂f(x*)/∂x_i = 0 for any i. By the hypothesis that Df(x*) exists, we must have Df(x*) = [∂f(x*)/∂x_1, ..., ∂f(x*)/∂x_n], so this will complete the proof. For i = 1, ..., n, let e_i ∈ ℝ^n be the i-th unit vector, i.e., the vector with a 1 in the

i-th place and zeros elsewhere. Fix any i, and define a function g on R by

Taking limits as k goes to infinity along the yk sequence establishes f′(x*) ≥ 0, while taking limits along the zk sequence establishes f′(x*) ≤ 0. These can hold simultaneously only if f′(x*) = 0, which establishes the desired result.

(f(yk) − f(x*))/(yk − x*) ≥ 0 ≥ (f(zk) − f(x*))/(zk − x*).

Consider two sequences {yk} and {zk} such that (a) yk < x* for all k, yk → x*, and (b) zk > x* for all k, zk → x*. For sufficiently large k, we must have yk, zk ∈

Btx", E), so for all large k, f(x*) ~ f(Yk) and f(x·) 2: f(Zk). Therefore, for alllarge k,

lim_{k→∞} (f(yk) − f(x*))/(yk − x*) = f′(x*).

Case 1: n = 1
Since f is assumed differentiable at x*, it is the case that for any sequence yk → x*, we have

First Proof We proceed in two steps. First, we establish the result for the case n = 1, i.e., when V is a subset of R. Then we use this to prove that the result holds for any n > 1. Our proofs only cover local maxima. The case of local minima can be proved analogously; the details are left to the reader.

the result for a special case, and then use this result to establish the general case. The second proof, which is shorter, establishes the result by appealing to properties of the directional derivative (see Chapter 1).


Case 1: n = 1

When n = 1, the second derivative f″(x) of f at x is simply a real number. Therefore, Parts 1 and 2 of the theorem will be proved if we prove Parts 3 and 4. For, suppose Parts 3 and 4 are true. Suppose further that x is a local maximum, so f′(x) = 0. If

4.6 A Proof of the Second-Order Conditions

As with the first proof of Theorem 4.1, we adopt a two-step procedure to prove Theorem 4.3. We first prove the result for the case where n = 1 (i.e., D ⊆ R), and then use this result to prove the general case.

Therefore, for all t > 0 sufficiently small, it is the case that f(x* + th) − f(x*) > 0. However, for t < r/‖h‖, we also have d(x* + th, x*) = t‖h‖ < r, so (x* + th) ∈ B(x*, r). Since x* is a maximum on B(x*, r), this implies f(x* + th) ≤ f(x*), a contradiction which establishes the claim.

Now, pick any h ∈ Rⁿ, and let h1 = h, h2 = −h. Since h1, h2 ∈ Rⁿ, we must then have Df(x*; h1) = Df(x*)·h1 = Df(x*)·h ≤ 0, and Df(x*; h2) = Df(x*)·h2 = −Df(x*)·h ≤ 0. But Df(x*)·h ≤ 0 and −Df(x*)·h ≤ 0 are possible simultaneously only if Df(x*)·h = 0. Since h was chosen arbitrarily, we have Df(x*)·h = 0 for all h. □
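The argument of the second proof can be illustrated numerically: at an interior maximizer, the one-sided difference quotients in the directions h and −h are both (approximately) nonpositive, forcing Df(x*)·h ≈ 0. The sketch below is not from the original text; the function and the point x* = (1, 2) are hypothetical.

import numpy as np

def f(x):
    # Hypothetical smooth function with an interior maximum at x* = (1, 2).
    return -(x[0] - 1.0)**2 - (x[1] - 2.0)**2

def directional_derivative(f, x, h, t=1e-6):
    # One-sided quotient approximating Df(x; h) = lim_{t -> 0+} (f(x + th) - f(x))/t.
    return (f(x + t * h) - f(x)) / t

x_star = np.array([1.0, 2.0])
h = np.array([0.3, -0.7])
d_plus = directional_derivative(f, x_star, h)
d_minus = directional_derivative(f, x_star, -h)
print(d_plus, d_minus)   # both are <= 0 (up to rounding), so Df(x*).h must be 0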

Second Proof Once again, we prove the result for local maxima. The case of local minima may be handled analogously. So suppose x* is a local maximum of f on D, i.e., there is r > 0 such that f(x*) ≥ f(y) for all y ∈ B(x*, r). Suppose also that f is differentiable at x*.

Recall from Chapter 1 that the differentiability of f at x* implies that the (one-sided) directional derivative Df(x*; h) exists for any h ∈ Rⁿ, with, in fact, Df(x*; h) = Df(x*)·h (see Theorem 1.56). We will show that Df(x*; h) = 0 for all h ∈ Rⁿ. This will establish the desired result since Df(x*)·h = 0 can hold for all h only if Df(x*) = 0.

First, we claim that Df(x*; h) ≤ 0 for any h ∈ Rⁿ. Suppose this were not true for some h, so Df(x*; h) > 0. From the definition of Df(x*; h), we now have

Df'(x"; h) = lim f(x* + th) - f(x*) > O.t~O+ t

implying that g has a local maximum at t = 0. By Case 1, this means we must have g′(0) = 0, and the proof is complete. □

Now, d(x* + t ei, x*) = ‖x* + t ei − x*‖ = |t|, so for t sufficiently close to zero, we must have (x* + t ei) ∈ U. Therefore, for sufficiently small |t|,

g(t) = f(x* + t ei) ≤ f(x*) = g(0),


Case 2: n > 1
When n > 1, Parts 3 and 4 are no longer the contrapositives of Parts 1 and 2, respectively, since the failure of an n × n matrix to be positive definite does not make it negative semidefinite. So a proof of Parts 1 and 2 will not suffice to establish Parts 3 and 4, and vice versa.

We prove Part 1 first. Let x be an unconstrained local maximum of f on D. We have to show that for any z in Rⁿ, z ≠ 0, we have z′D²f(x)z ≤ 0.

Pick any z in Rⁿ. Define the real-valued function g by g(t) = f(x + tz). Note that g(0) = f(x). For |t| sufficiently small, (x + tz) will belong to the neighborhood of x on which x is a local maximum of f. It follows that there is some ε > 0 such that g(0) ≥ g(t) for all t ∈ (−ε, ε), i.e., that 0 is a local maximum of g. By Case 1, therefore, we must have g″(0) ≤ 0. On the other hand, it follows from the definition of g that g″(t) = z′D²f(x + tz)z, so that z′D²f(x)z = g″(0) ≤ 0, as desired. This proves Part 1. Part 2 is proved similarly and is left to the reader.
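The identity g″(0) = z′D²f(x)z used here is easy to verify numerically. The following sketch (not in the original text) does so for a hypothetical quadratic f, whose Hessian is known exactly, by taking a central difference of the restriction g(t) = f(x + tz).

import numpy as np

# Hypothetical C^2 function f(x) = x'Ax, whose Hessian is the constant matrix 2A.
A = np.array([[-2.0, 0.5],
              [ 0.5, -1.0]])

def f(x):
    return x @ A @ x

x0 = np.array([0.3, -0.2])
z = np.array([1.0, 2.0])

def g(t):
    # Restriction of f to the line through x0 in direction z.
    return f(x0 + t * z)

h = 1e-4
g2 = (g(h) - 2 * g(0.0) + g(-h)) / h**2   # central difference for g''(0)
print(g2, z @ (2 * A) @ z)                 # the two numbers agree (up to rounding)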

To see Part 3, suppose Df(x) = 0 and D²f(x) is negative definite. Since f is C², by Corollary 1.60, there exists ε > 0 such that D²f(w) is negative definite for all w ∈ B(x, ε). Fix ε. We will show that f(x) > f(w) for all w ∈ B(x, ε), w ≠ x.

Let S = {z ∈ Rⁿ | ‖z‖ = 1}. Define B = {y ∈ Rⁿ | y = x + tz, z ∈ S, |t| < ε}. We claim that B = B(x, ε).

To see this, first note that for any y = x + tz ∈ B, d(x, y) = ‖x + tz − x‖ = |t|‖z‖ = |t| < ε, so certainly B ⊆ B(x, ε). On the other hand, pick any w ∈ B(x, ε).

Define t = d(x, w) = ‖x − w‖, and z = (w − x)/t. Then, since w ∈ B(x, ε), we have 0 ≤ t < ε, and of course, ‖z‖ = ‖w − x‖/‖w − x‖ = 1, so z ∈ S. By construction, tz = w − x, or x + tz = w. This establishes B(x, ε) ⊆ B. Therefore, B(x, ε) = B, as claimed. We will prove that x is a strict local maximum of f on B. Fix any z ∈ S. Let

g(t) = f(x + tz), t ∈ (−ε, ε). Since (x + tz) ∈ B = B(x, ε), it is the case that D²f(x + tz) is negative definite for |t| < ε. The usual argument shows that

[" (x) ;t 0, we must have /"(x L> J), But this will imply by Part 4 that x is actually a_strict local minimum, which is a contradiction because a strict local minimum cannotalso be a local maximum.We prove Part 3 here. Part 4 may be proved by a completely analogous procedure.

Suppose x is in the interior of D and satisfies f′(x) = 0 and f″(x) < 0. Since f″(x) < 0, the first derivative f′ must be strictly decreasing in a neighborhood (denoted, say, B(x, r)) of x. Since f′(x) = 0, a point z ∈ B(x, r) must satisfy f′(z) > 0 if z < x, and f′(z) < 0 if z > x. In turn, these signs of f′ mean that f is strictly increasing to the left of x (i.e., f(x) > f(z) if z ∈ B(x, r), z < x) and strictly decreasing to the right of x (i.e., f(x) > f(z) if z ∈ B(x, r), z > x). This states precisely that x is a strict local maximum.


and lim sup_{y↓x*} (f(x*) − f(y))/(x* − y) ≤ 0.

lim inf_{y↑x*} (f(x*) − f(y))/(x* − y) ≥ 0,

on the set {(x, y) ∈ R²₊ | x + y = 9} by representing the problem as an unconstrained optimization problem in one variable.

6. Let f: R → R have a local maximum at x*. Show that the following inequalities (called the "generalized first-order conditions") hold regardless of whether f is assumed differentiable or not:

f(x, y) = 2 + 2x + 2y − x² − y²

4.7 Exercises

1. Is Theorem 4.1 valid if x* ∉ int D? If yes, provide a proof. If not, a counterexample.

2. Find all the critical points (i.e., points where f′(x) = 0) of the function f: R → R defined as f(x) = x − x² − x³ for x ∈ R. Which of these points can you identify as local maxima or minima using the second-order test? Are any of these global optima?

3. Suppose x ∈ int D is a local minimum of f on D. Suppose also that f is differentiable at x. Prove, without appealing to Theorem 4.1, that Df(x) = 0.

4. Find and classify the critical points (local maximum, local minimum, neither) of each of the following functions. Are any of the local optima also global optima?

(a) f(x, y) = 2x³ + xy² + 5x² + y².
(b) f(x, y) = e^{2x}(x + y² + 2y).
(c) f(x, y) = xy(a − x − y).

(d) f(x, y) = x sin y.
(e) f(x, y) = x⁴ + x²y² − y.

(f) f(x, y) = x⁴ + y⁴ − x³.
(g) f(x, y) = x/(1 + x² + y²).

(h) f(x, y) = (x⁴/32) + x²y² − x − y².

5. Find the maximum and minimum values of

g′(0) = 0. Moreover, g″(t) = z′D²f(x + tz)z, which is strictly negative for |t| < ε. This implies that the point 0 is a strict maximum of g on the interval |t| < ε. In turn, this means that f(x) > f(x + tz) for all t with 0 < |t| < ε.

Since z was arbitrary, this inequality holds for any z ∈ S and 0 < |t| < ε. This states precisely that f(x) > f(w) for all w ∈ B(x, ε), w ≠ x, proving Part 3. Part 4 is proved analogously. □


(The expression y ↑ x is shorthand for y < x and y → x, while y ↓ x is shorthand for y > x and y → x.) Explain whether the "lim sup" and "lim inf" in these inequalities can be replaced with a plain "limit." Examine also whether the weak inequalities can be replaced with strict ones if x* is instead a strict local maximum.

7. Suppose f: R → R has a local maximum at x that is not a strict local maximum. Does this imply that f is constant in some neighborhood of x? Prove your answer or provide a counterexample.

8. Let f: R₊ → R₊ be a C¹ function that satisfies f(0) = 0 and lim_{x→∞} f(x) = 0. Suppose there is only a single point x̂ ∈ R₊ at which f′(x̂) = 0. Show that x̂ must be a global maximum of f on R₊.

9. Let φ: R → R be a strictly increasing C² function. Let D be an open set in Rⁿ and let f: D → R also be a C² function. Finally, suppose that f has a strict local maximum at x* ∈ D, and that D²f(x*) is negative definite. Use Theorem 4.3 to show that the composition φ ∘ f also has a strict local maximum at x*.

10. Let f: Rⁿ → R be a C² function, and let g = −f. Fix any x, and show that the quadratic form D²g(x) is positive definite if and only if D²f(x) is negative definite.


¹Note that we do not preclude the possibility that U could simply be all of Rⁿ, that is, that D can be expressed using only the inequality and equality constraints.

hj(x) = xj, j = 1, ..., n,

where U ⊆ Rⁿ is open, g: Rⁿ → Rᵏ, and h: Rⁿ → Rˡ. We will refer to the functions g = (g1, ..., gk) as equality constraints, and to the functions h = (h1, ..., hl) as inequality constraints.¹

This specification for the constraint set D is very general, much more so than might appear at first sight. Many problems of interest in economic theory can be written in this form, including all of the examples outlined in Section 2.3. Nonnegativity constraints are, for instance, easily handled: if a problem requires that x ∈ Rⁿ₊, this may be accomplished by defining the functions hj: Rⁿ → R by

D = U ∩ {x ∈ Rⁿ | g(x) = 0, h(x) ≥ 0},

5.1 Constrained Optimization Problems

It is assumed in the sequel that the constraint set D has the form

It is not often that optimization problems have unconstrained solutions. Typically, some or all of the constraints will matter. Over this chapter and the next, we examine necessary conditions for optima in such a context. If the constraints do bite at an optimum x, it is imperative, in order to characterize

the behavior of the objective function f around x, to have some knowledge of what the constraint set looks like in a neighborhood of x. Thus, a first step in the analysis of constrained optimization problems is to require some additional structure of the constraint set D, beyond just that it be some subset of Rⁿ. The structure that we shall require, and the order in which our analysis will proceed, is the subject of the following section.

5  Equality Constraints and the Theorem of Lagrange


5.2 Equality Constraints and the Theorem of Lagrange
The Theorem of Lagrange provides a powerful characterization of local optima of equality-constrained optimization problems in terms of the behavior of the objective function f and the constraint functions g at these points. The conditions the theorem describes may be viewed as the first-order necessary conditions for local optima in these problems. The statement of the theorem, and a discussion of some of its components, is the subject of this section.

D = U ∩ {x ∈ Rⁿ | g(x) = 0, h(x) ≥ 0}.

where U ⊆ Rⁿ is open, and h: Rⁿ → Rˡ. We label these inequality-constrained optimization problems. Finally, in Section 6.4, we combine these results into the general case of mixed constraints, where the specification of D may involve both equality and inequality constraints:

D = U ∩ {x | h(x) ≥ 0},

where U ⊆ Rⁿ is open, and g: Rⁿ → Rᵏ. In the sequel, we shall refer to these as equality-constrained optimization problems.

Then, in Chapter 6, we study the complementary case where all the constraints are inequality constraints, i.e., the constraint set has the form

D = U ∩ {x | g(x) = 0},

As mentioned above, our analysis of constrained optimization problems in this book comes in three parts. In this chapter, we study the case where all the constraints are equality constraints, i.e., where the constraint set D can be represented as

More generally, requirements of the form α(x) ≥ a, β(x) ≤ b, or ψ(x) = c (where a, b, and c are constants) can all be expressed in the desired form by simply writing them as α(x) − a ≥ 0, b − β(x) ≥ 0, or c − ψ(x) = 0. Thus, for instance, the budget set B(p, I) = {x ∈ Rⁿ₊ | p·x ≤ I} of the utility

maximization problem of subsection 2.3.1 can be represented using (n + 1) inequality constraints. Indeed, if we define hj(x) = xj for j = 1, ..., n, and

h_{n+1}(x) = I − p·x,

then it follows that B(p, I) is the set

{x ∈ Rⁿ | hj(x) ≥ 0, j = 1, ..., n + 1}.

hj(x) ≥ 0, j = 1, ..., n.

and using the n inequality constraints


Example 5.2 Let f and g be functions on R² defined by f(x, y) = x³ + y³ and g(x, y) = x − y. Consider the equality-constrained optimization problem of maximizing and minimizing f(x, y) over the set D = {(x, y) ∈ R² | g(x, y) = 0}.

Let (x", y*) be the point (0,0), and let A* = O.Then, g(x*, y*) = 0, so (x", y*)is a feasible point. Moreover, since Dg(x, y) = (1, -1) for any (x, y), it is clearly

It must be stressed that the Theorem of Lagrange only provides necessary conditions for local optima x*, and, at that, only for those local optima x* which also meet the condition that ρ(Dg(x*)) = k. These conditions are not asserted to be sufficient; that is, the theorem does not claim that if there exist (x, λ) such that g(x) = 0 and Df(x) + Σ_{i=1}^k λi Dgi(x) = 0, then x must either be a local maximum or a local minimum, even if x also meets the rank condition ρ(Dg(x)) = k. Indeed, it is an easy matter to modify Example 4.2 slightly to show that these conditions cannot be sufficient:

Remark In the sequel, if a pair (x*, λ*) satisfies the twin conditions that g(x*) = 0 and Df(x*) + Σ_{i=1}^k λi* Dgi(x*) = 0, we will say that (x*, λ*) meets the first-order conditions of the Theorem of Lagrange, or that (x*, λ*) meets the first-order necessary conditions in equality-constrained optimization problems. □
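These twin conditions are easy to check mechanically for a candidate pair (x*, λ*). The sketch below (not part of the original text) does so for the data of Example 5.2, where f(x, y) = x³ + y³, g(x, y) = x − y, x* = (0, 0), and λ* = 0.

import numpy as np

def g(v):
    x, y = v
    return x - y

def Df(v):
    x, y = v
    return np.array([3 * x**2, 3 * y**2])

def Dg(v):
    return np.array([1.0, -1.0])      # derivative of g(x, y) = x - y is constant

x_star = np.array([0.0, 0.0])
lam_star = 0.0
print(np.isclose(g(x_star), 0.0))                              # feasibility: g(x*) = 0
print(np.allclose(Df(x_star) + lam_star * Dg(x_star), 0.0))    # Df(x*) + lambda* Dg(x*) = 0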

Proof See Section 5.6 below. □

where U ⊆ Rⁿ is open. Suppose also that ρ(Dg(x*)) = k. Then, there exists a vector λ* = (λ1*, ..., λk*) ∈ Rᵏ such that

Df(x*) + Σ_{i=1}^k λi* Dgi(x*) = 0.

D = U ∩ {x | gi(x) = 0, i = 1, ..., k},

Theorem 5.1 (The Theorem of Lagrange) Let f: Rⁿ → R and gi: Rⁿ → R be C¹ functions, i = 1, ..., k. Suppose x* is a local maximum or minimum of f on the set

∂ζi(x)/∂xj,  i = 1, ..., p,  j = 1, ..., n.

5.2.1 Statement of the Theorem
To state the Theorem of Lagrange, it is necessary to recall two pieces of notation from Chapter 1. First, that ρ(A) denotes the rank of a matrix A; and, second, that the derivative Dζ(x) of a function ζ: Rⁿ → Rᵖ is the p × n matrix whose (i, j)-th entry is


Lagrange also fail. □

Example 5.3 Let f: R² → R and g: R² → R be given by f(x, y) = −y and g(x, y) = y³ − x², respectively. Consider the equality-constrained optimization problem

Maximize f(x, y) subject to (x, y) ∈ D = {(x′, y′) ∈ R² | g(x′, y′) = 0}.

Since x² ≥ 0 for any real number x, and the constraint requires that y³ = x², we

must have y ≥ 0 for any (x, y) ∈ D; moreover, y = 0 if and only if x = 0. It easily follows that f attains a unique global maximum on D at the origin (x, y) = (0, 0). At this global, and therefore also local, maximum, Dg(x, y) = (0, 0), so ρ(Dg(0, 0)) = 0 < 1. Thus, the constraint qualification is violated. Moreover, Df(x, y) = (0, −1) at any (x, y), which means that there cannot exist any λ ∈ R such that Df(0, 0) + λDg(0, 0) = (0, 0). Thus, the conclusions of the Theorem of

5.2.2 The Constraint Qualification
The condition in the Theorem of Lagrange that the rank of Dg(x*) be equal to the number of constraints k is called the constraint qualification under equality constraints. It plays a central role in the proof of the theorem; essentially, it ensures that Dg(x*) contains an invertible k × k submatrix, which may be used to define the vector λ* (for details, see Section 5.6 below).

More importantly, it turns out to be the case that if the constraint qualification is violated, then the conclusions of the theorem may also fail. That is, if x* is a local optimum at which ρ(Dg(x*)) < k, then there need not exist a vector λ* such that Df(x*) + Σ_{i=1}^k λi* Dgi(x*) = 0. The following example, which involves seemingly well-behaved objective and constraint functions, illustrates this point:

In the subsections that follow, we focus on two aspects of the Theorem of Lagrange. Subsection 5.2.2 examines the importance of the rank condition that ρ(Dg(x*)) = k, while subsection 5.2.3 sketches an interpretation of the vector λ* = (λ1*, ..., λk*) whose existence the theorem asserts.

Thus, if the conditions of the Theorem of Lagrange were also sufficient, then (x*, y*) would be either a local maximum or a local minimum of f on D. It is, quite evidently, neither: we have f(x*, y*) = 0, but for every ε > 0, it is the case that (−ε, −ε) ∈ D and (ε, ε) ∈ D, so f(−ε, −ε) = −2ε³ < 0 = f(x*, y*), while f(ε, ε) = 2ε³ >

0 = f(x*, y*). □

the case that ρ(Dg(x*, y*)) = 1. Finally, since Df(x, y) = (3x², 3y²), we have

Df(x*, y*) + λ*Dg(x*, y*) = (0, 0) + 0·(1, −1) = (0, 0).


²Everything in the sequel remains valid if x*(c) is just a local maximum of f on D.

DF(c) = Df(x*(c))Dx*(c),

that is, that ∂F(c)/∂ci = λi*(c). In words, this states precisely that λi*(c) represents the sensitivity of the objective function at x*(c) to a small relaxation in the i-th constraint.

First, note that from the definition of F(·), we have

DF(c) = λ*(c),

be the value of the objective function at the optimum given the parameter c. Since f is C¹ and x*(·) has been assumed to be differentiable on C, F(·) is also differentiable on C. We will show that

F(c) = f(x*(c))

Finally, suppose that x*(·) is a differentiable function on C, that is, that the optimum changes smoothly with changes in the underlying parameters. Let

Df(x*(c)) + Σ_{i=1}^k λi*(c) Dgi(x*(c)) = 0.

Suppose further that, for each c ∈ C, the constraint qualification holds at x*(c), so there exists λ*(c) ∈ Rᵏ such that

D = U ∩ {x ∈ Rⁿ | g(x; c) = 0}.

where c = (c1, ..., ck) is a vector of constants. This assumption enables a formal definition of the concept we need: a relaxation of the i-th constraint may now be thought of as an increase in the value of the constant ci.

Now, let C be some open set of feasible values of c. Suppose that for each c ∈ C, there is a global optimum,² denoted x*(c), of f on the constraint set

g(x; c) = g(x) + c,

5.2.3 The Lagrangean Multipliers
The vector λ* = (λ1*, ..., λk*) described in the Theorem of Lagrange is called the vector of Lagrangean multipliers corresponding to the local optimum x*. The i-th multiplier λi* measures, in a sense, the sensitivity of the value of the objective function at x* to a small relaxation of the i-th constraint gi. We demonstrate this interpretation of λ* under some assumptions designed to simplify the exposition.

We begin with a clarification of the notion of the "relaxation" of a constraint. To this end, we will suppose in the rest of this subsection that the constraint functions g are given in parametric form as


5.3 Second-Order Conditions
The Theorem of Lagrange gives us the first-order necessary conditions for optimization problems with equality constraints. In this section, we describe second-order conditions for such problems. As in the unconstrained case, these conditions pertain only to local maxima and minima, since differentiability is only a local property.

So consider the problem of maximizing or minimizing f: Rⁿ → R over the set D = U ∩ {x | g(x) = 0}, where g: Rⁿ → Rᵏ, and U ⊆ Rⁿ is open. We will assume in this subsection that f and g are both C² functions. The following notation will come in handy: given any λ ∈ Rᵏ, define the function L on Rⁿ by

L(x; λ) = f(x) + Σ_{i=1}^k λi gi(x).

and the proof is complete. An economic interpretation of the result we have just derived bears mention. Since

λi*(c) = ∂F(c)/∂ci, a small relaxation in constraint i will raise the maximized value of the objective function by λi*(c). Therefore, λi*(c) also represents the maximum amount the decision-maker will be willing to pay for a marginal relaxation of constraint i, and is the marginal value or the "shadow price" of constraint i at c.
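The shadow-price interpretation can be illustrated numerically on a small hypothetical problem (not from the original text): maximize −x² − y² subject to g(x, y) + c = 0 with g(x, y) = −(x + y). Solving by hand gives x(c) = y(c) = c/2 and multiplier λ(c) = −c, so F(c) = −c²/2 and dF/dc = −c = λ(c). The sketch below checks this with a finite difference.

# Multiplier as a "shadow price" for the hypothetical problem above.
def F(c):
    x = y = c / 2.0            # optimal point for parameter value c (computed by hand)
    return -(x**2) - (y**2)    # value function F(c) = -c^2/2

c = 1.5
lam = -c                       # multiplier lambda(c) = -c for this problem
h = 1e-6
dF_dc = (F(c + h) - F(c - h)) / (2 * h)   # numerical derivative of the value function
print(dF_dc, lam)              # the two numbers agree (up to rounding)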

DF(c) = −Σ_{i=1}^k λi*(c)(−ei),

where ei is the i-th unit vector in Rᵏ, that is, the vector that has a 1 in the i-th place and zeros elsewhere. It follows that

Since x*(c) must be feasible at all c, it must identically be true for each i and for all c that gi(x*(c)) + ci = 0. Differentiating with respect to c, and rearranging, we obtain

By combining the last two expressions, we obtain

Df(x*(c)) = −Σ_{i=1}^k λi*(c) Dgi(x*(c)).

where Dx*(c) is the n × k matrix whose (i, j)-th entry is ∂xi*(c)/∂cj. By the first-order conditions at each c, we also have


There are obvious similarities between this result and the corresponding one for unconstrained maximization problems (Theorem 4.3). As there, parts 1 and 2 of this theorem are necessary conditions that must be satisfied by all local maxima and minima, respectively, while parts 3 and 4 are sufficient conditions that identify specific points as being local maxima or minima. There are also two very important differences. First, the second-order conditions

are not stated in terms of only the second derivative D²f(x*) of f at x*. Rather, we add the "correction term" Σ_{i=1}^k λi* D²gi(x*), and state these second-order conditions in terms of D²L(x*; λ*) = D²f(x*) + Σ_{i=1}^k λi* D²gi(x*). Second, the stated properties of the quadratic form D²L(x*; λ*) do not have to hold on all of Rⁿ, but only on the subset of Rⁿ defined by Z(x*). The last observation motivates our next result. In Chapter 1, we have seen that the

definiteness of a symmetric n × n matrix A can be completely characterized in terms of the submatrices of A. We now turn to a related question: the characterization of the definiteness of A on only the set {z ≠ 0 | Bz = 0}, where B is a k × n matrix of rank k. Such a characterization would give us an alternative way to check

Proof See Section 5.7 below. □

and let D²L* denote the n × n matrix D²L(x*; λ*) = D²f(x*) + Σ_{i=1}^k λi* D²gi(x*).

1. If f has a local maximum on D at x*, then z′D²L*z ≤ 0 for all z ∈ Z(x*).
2. If f has a local minimum on D at x*, then z′D²L*z ≥ 0 for all z ∈ Z(x*).
3. If z′D²L*z < 0 for all z ∈ Z(x*) with z ≠ 0, then x* is a strict local maximum of f on D.
4. If z′D²L*z > 0 for all z ∈ Z(x*) with z ≠ 0, then x* is a strict local minimum of f on D.

Theorem 5.4 Suppose there exist points x* ∈ D and λ* ∈ Rᵏ such that ρ(Dg(x*)) = k, and Df(x*) + Σ_{i=1}^k λi* Dgi(x*) = 0. Define

Z(x*) = {z ∈ Rⁿ | Dg(x*)z = 0},

Since f and g are both C² functions of x, so is L(·; λ) for any given value of λ ∈ Rᵏ. Thus, D²L(x; λ) is a symmetric matrix and defines a quadratic form on Rⁿ.

D²L(x; λ) = D²f(x) + Σ_{i=1}^k λi D²gi(x).

Note that the second derivative D²L(x; λ) of L(·; λ) with respect to the x-variables is the n × n matrix defined by


³Recall from Chapter 1 that a permutation π of the integers {1, ..., n} is simply a reordering of the integers. The notation πk is used to denote the new integer in position k under the ordering π. Thus, for instance, if we represent the permutation (2, 1, 3) of the set (1, 2, 3) by π, we have π1 = 2, π2 = 1, and π3 = 3.

In an obvious extension of this notation, A^π_l will be the l × l submatrix obtained from A^π by retaining only the first l rows and l columns of A^π, and B^π_kl will denote the k × l submatrix of B^π obtained by retaining only the first l columns of B^π.

Finally, given any l ∈ {1, ..., n}, let C_l be the (k + l) × (k + l) matrix obtained by "bordering" the submatrix A_l by the submatrix B_kl in the following manner:

and let B^π denote the k × n matrix obtained by applying the permutation π to only the columns of B:

A^π = [a_{πiπj}],  i, j = 1, ..., n,  that is, the (i, j)-th entry of A^π is a_{πiπj},

When k = 1, we shall denote B_kl simply by B_l. Next, given any permutation π of the first n integers,³ let A^π denote the n × n

symmetric matrix obtained from A by applying the permutation π to both its rows and columns,

Similarly, for l ∈ {1, ..., n}, let B_kl denote the k × l matrix obtained from B by retaining only the first l columns of B:

for the conditions of Theorem 5.4 by simply using the substitution A = D²L* and B = Dg(x*).

Some notation first. For l ∈ {1, ..., n}, let A_l denote the l × l submatrix of A obtained by retaining only the first l rows and columns of A.


⁴Recall from Chapter 1 that the second derivative D²h of a C² function h is called the hessian of h.

Note the important difference between parts 1 and 3 of Theorem 5.5 on the one hand, and parts 2 and 4 on the other. In parts 1 and 3, the term −1 is raised to the fixed power k, so that the signs of the determinants |C_r| are required to all be the same; in parts 2 and 4, the term −1 is raised to the power r, so the signs of the determinants |C_r| are required to alternate.

When A = D²L* and B = Dg(x*), the matrices C_r are called "bordered hessians." This term arises from the fact that C_r is constructed by bordering an r × r submatrix of the hessian D²L* with terms obtained from the matrix Dg(x*).⁴
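The bordered matrices C_r are straightforward to assemble by machine. The sketch below is not from the original text; the numerical values of A and B correspond to the cost-minimization data of subsection 5.5.2 with w1 = w2 = ȳ = 1 (so x1* = x2* = 1 and λ* = 1).

import numpy as np

def bordered_determinants(A, B):
    # Build C_r = [[0_k, B_kr], [B_kr', A_r]] for r = k+1, ..., n and return det(C_r).
    k, n = B.shape
    dets = {}
    for r in range(k + 1, n + 1):
        Bkr = B[:, :r]                      # k x r: first r columns of B
        Ar = A[:r, :r]                      # r x r: leading principal submatrix of A
        C = np.block([[np.zeros((k, k)), Bkr],
                      [Bkr.T,            Ar]])
        dets[r] = np.linalg.det(C)
    return dets

A = np.array([[ 0.0, -1.0],
              [-1.0,  0.0]])                # D^2 L* for the example above
B = np.array([[-1.0, -1.0]])                # Dg(x*) for the example above
print(bordered_determinants(A, B))
# {2: -2.0}: (-1)^k |C_2| = -(-2) = 2 > 0, consistent with part 3 of Theorem 5.5
# (positive definiteness of the quadratic form on {z : Bz = 0}).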

Proof See Debreu (1952, Theorem 4, p. 297, Theorem 5, p. 298, and Theorems 9 and 10, p. 299). □

Denote by C^π_l the matrix obtained similarly when A is replaced by A^π and B by B^π. The following result shows that the behavior of the quadratic form x′Ax on the

set {z ≠ 0 | Bz = 0} can, in turn, be completely characterized by the behavior of the bordered matrices C_l. For this result, we will assume that |B_k| ≠ 0. Since we have assumed that ρ(B) = k, the matrix B must contain some k × k submatrix whose determinant is nonzero; an assumption that this submatrix is the one consisting of the first k rows and columns involves no loss of generality.

Theorem 5.5 Let A be a symmetric n × n matrix, and B a k × n matrix such that |B_k| ≠ 0. Define the bordered matrices C_l as described above. Then,

1. x′Ax ≥ 0 for every x such that Bx = 0 if and only if for all permutations π of the first n integers, and for all r ∈ {k + 1, ..., n}, we have (−1)^k |C^π_r| ≥ 0.
2. x′Ax ≤ 0 for all x such that Bx = 0 if and only if for all permutations π of the first n integers, and for all r ∈ {k + 1, ..., n}, we have (−1)^r |C^π_r| ≥ 0.
3. x′Ax > 0 for all x ≠ 0 such that Bx = 0 if and only if for all r ∈ {k + 1, ..., n}, we have (−1)^k |C_r| > 0.
4. x′Ax < 0 for all x ≠ 0 such that Bx = 0 if and only if for all r ∈ {k + 1, ..., n}, we have (−1)^r |C_r| > 0.

C_l = [ 0_k     B_kl ]
      [ B′_kl   A_l  ],

where 0_k is a null k × k matrix, and B′_kl is the transpose of B_kl.


In practice, the values of x which maximize f over this set are also usually the solutions to the equality-constrained maximization problem we started with.

As the third and last step in the procedure, we evaluate f at each point x in the set

{x E IR" I there is A such that (x, A) E MJ.

M = {(x, λ) | x ∈ U, DL(x, λ) = 0}.

∂L/∂λi(x, λ) = 0,  i = 1, ..., k.

Let M be the set of all solutions to these equations for which x ∈ U:

The vector λ = (λ1, ..., λk) ∈ Rᵏ is called the vector of Lagrangean multipliers. As the second step in the procedure, we find the set of all critical points of L(x, λ)

for which x ∈ U, i.e., all points (x, λ) at which DL(x, λ) = 0 and x ∈ U. Since x ∈ Rⁿ and λ ∈ Rᵏ, the condition that DL(x, λ) = 0 results in a system of (n + k) equations in the (n + k) unknowns:

∂L/∂xj(x, λ) = 0,  j = 1, ..., n,

L(x, λ) = f(x) + Σ_{i=1}^k λi gi(x).

Maximize f(x) subject to x ∈ D = U ∩ {x | g(x) = 0}

be given, where f: Rⁿ → R and g: Rⁿ → Rᵏ are C¹ functions, and U ⊆ Rⁿ is open. We describe here a "cookbook" procedure for using the Theorem of Lagrange to solve this maximization problem. This procedure, which we shall call the Lagrangean method, involves three steps.

In the first step, we set up a function L: D × Rᵏ → R, called the Lagrangean, defined by:

5.4 Using the Theorem of Lagrange
5.4.1 A "Cookbook" Procedure

Let an equality-constrained optimization problem of the form

The use of the second-order conditions in applications is illustrated in Section 5.5 below, where we work out examples based on economic problems. Subsection 5.5.1 illustrates the use of Theorem 5.4 in the context of a utility maximization problem, while subsection 5.5.2 does likewise for Theorem 5.5 in the context of a cost-minimization problem.


1. A global optimum x* (i.e., a global maximum or, as required, a global minimum) exists to the given problem.

2. The constraint qualification is met at x*.

Then, there exists λ* such that (x*, λ*) is a critical point of L.

Proposition 5.6 Suppose the following two conditions hold:

for j = 1, ..., n. Thus, (x, λ) is a critical point of L if and only if it meets the first-order conditions of the Theorem of Lagrange, i.e., it satisfies both g(x) = 0 as well as Df(x) + Σ_{i=1}^k λi Dgi(x) = 0.

Now, suppose x is a local optimum, and the constraint qualification holds at x. Then, we must certainly have g(x) = 0, since x must be feasible. By the Theorem of Lagrange, there also exists λ such that Df(x) + Σ_{i=1}^k λi Dgi(x) = 0. This states precisely that (x, λ) must be a critical point of L, and establishes the claimed property.

A particular implication of this property is the following, which we state in the form of a proposition for ease of future reference:

∂L/∂xj(x, λ) = ∂f/∂xj(x) + Σ_{i=1}^k λi ∂gi/∂xj(x) = 0

for i = 1, ..., k, as well as

∂L/∂λi(x, λ) = gi(x) = 0,

Indeed, this is an immediate consequence of the definition of L. For (x, λ) to be a critical point of L, we must have

The set of all critical points of L contains the set of all local maxima and minima of f on D at which the constraint qualification is met. That is, if x is any local maximum or minimum of f on D, and if the constraint qualification holds at x, then there exists λ such that (x, λ) is a critical point of L.

5.4.2 Why the Procedure Usually Works

It is important to understand why the Lagrangean method typically succeeds in identifying the desired optima, and, equally importantly, when it may fail. The key to both questions lies in the following property of the set of critical points of L:

The steps to be followed for a minimization problem are identical to the ones for a maximization problem, with the sole difference that at the last step, we select the value of x that minimizes f over the set {x | there is λ such that (x, λ) ∈ M}.
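For small problems the three steps of the cookbook procedure can be carried out symbolically. The sketch below is not part of the original text; the problem it solves (maximize xy subject to x + y = 2 on U = R²) is a hypothetical illustration, with the single critical point x = y = λ = 1.

import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)

# Step 1: set up the Lagrangean for the hypothetical problem max xy s.t. 2 - x - y = 0.
f = x * y
g = 2 - x - y
L = f + lam * g

# Step 2: find all critical points of L, i.e., solutions of DL = 0.
critical_points = sp.solve([sp.diff(L, v) for v in (x, y, lam)], (x, y, lam), dict=True)
print(critical_points)                      # [{x: 1, y: 1, lam: 1}]

# Step 3: compare the value of f at the x-parts of the critical points.
best = max(critical_points, key=lambda s: f.subs(s))
print(best, f.subs(best))                   # the candidate maximizer and its value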


−2λx = 0
−1 + 3λy² = 0
−x² + y³ = 0.

Example 5.7 As in Example 5.3, let f and g be functions on R² given by f(x, y) = −y and g(x, y) = y³ − x², respectively. We saw earlier that the unique global maximum of f on D = {(x, y) | g(x, y) = 0} is at (x, y) = (0, 0), but that the constraint qualification fails at this point; and, consequently, that there is no λ such that Df(0, 0) + λDg(0, 0) = 0. Evidently, then, there is no choice of λ for which the point (0, 0, λ) turns up as a solution to the critical points of the Lagrangean L in this problem.

Indeed, the equations defining the critical points of L in this problem admit no solution at all: if we define L(x, y, λ) = f(x, y) + λg(x, y) = −y + λ(y³ − x²), the critical points of L are the solutions (x, y, λ) to the following system of equations:

5.4.3 When It Could Fail
Unfortunately, when the conditions of Proposition 5.6 fail to hold, the procedure could also fail to identify global optima. This subsection provides a number of examples to illustrate this point.

First, if an optimum exists but the constraint qualification is not met at the optimum, then the optimum need not appear as part of a critical point of L. Of course, this does not imply that L will have no critical points; the conditions of the Theorem of Lagrange are only necessary, so many critical points could exist. The following examples illustrate these points. In each example, a unique global maximum exists, and in each case the constraint qualification fails at this optimum. In the first example, this results in a situation where there are no solutions at all to the equations that define the critical points of L; in the second example, on the other hand, multiple solutions exist to these equations, but the problem's unique maximum is not one of these.

It follows that under the two conditions of this proposition, the Lagrangean method will succeed in identifying the optimum x*. Indirectly, the conditions of this proposition also explain why the Lagrangean method usually works in practice. In most applications, the existence of a solution is not a problem, and neither typically is the constraint qualification. In particular, it is often possible to verify existence a priori, say, by an appeal to the Weierstrass Theorem. Although it is not, in general, possible to do likewise for the constraint qualification (since this depends on the properties of Dg at the unknown optimal point x*), it is quite often the case that the constraint qualification holds everywhere on the feasible set D, so this is not a problem either. (The utility-maximization problem is a typical case in point. See subsection 5.5.1 below.)


⁵It is evident that f is negative when x < 0 because, in this case, 2x³ < 0 and −3x² < 0. Since f′(x) = 6x² − 6x, a simple calculation shows that f is strictly decreasing on the interval (0, 1) and is strictly increasing for x > 1. Since f(0) = f(3/2) = 0, the claim in the text obtains.

Alternatively, even if the constraint qualification holds everywhere on the constraint set D, the cookbook procedure may fail to identify the optimum, for the simple reason that an optimum may not exist. In this case, the optimum evidently cannot turn up as part of a critical point of L, although, once again, L may have many critical points. The following examples illustrate this point. In both examples, global optima do not exist. In the first example, it is also true that L has no critical points at all; in the second example, however, multiple solutions exist to the equations that define the critical points of L.

For the second condition to be met, we must have λ = 0 or y = 0. If y = 0, then the third condition implies x = 3, but x = 3 violates the first condition. This leaves the case λ = 0, and it is easily checked that there are precisely two solutions that arise in this case, namely, (x, y, λ) = (0, √27, 0) and (x, y, λ) = (1, √8, 0). In particular, the unique global maximum of the problem (x, y) = (3, 0) does not turn up as part of a solution to the critical points of L. □

Example 5.8 Let f and g be functions on R² defined by f(x, y) = 2x³ − 3x² and g(x, y) = (3 − x)³ − y², respectively. Consider the problem of maximizing f over the set D = {(x, y) ∈ R² | g(x, y) = 0}.

Since the constraint requires that (3 − x)³ = y², and since y² ≥ 0, it is easy to see that the largest value of x on the feasible set is x = 3, which occurs at y = 0. A little calculation also shows that f is nonpositive for x in the interval (−∞, 3/2], and is strictly positive and strictly increasing for x > 3/2.⁵ It follows from these statements that f attains a global maximum on D at the point (x, y) = (3, 0). Note that since Dg(x, y) = (−3(3 − x)², −2y), we have Dg(3, 0) = (0, 0), so the constraint qualification fails at this unique global maximum. We will show that, as a consequence, the cookbook procedure will fail to identify this point.

The Lagrangean L for this problem has the form L(x, y, λ) = 2x³ − 3x² + λ((3 − x)³ − y²), so the critical points of L are the solutions (x, y, λ) to

6x² − 6x − 3λ(3 − x)² = 0
−2λy = 0
(3 − x)³ − y² = 0.

For the first equation to be satisfied, we must have x = 0 or λ = 0. If λ = 0, the second equation cannot be satisfied. If x = 0, the third equation implies y = 0, and once again the second equation cannot be satisfied. □


This system of equations has two solutions: (x, y, λ) = (2, 2, −6) and (x, y, λ) = (1, 1, −3). Evaluating f at these two points, we get f(1, 1) = 5/6 and f(2, 2) = 2/3, so if

we were following the cookbook Lagrangean procedure, we would pick (1, 1) as the point where f is maximized on D, and (2, 2) as the point where f is minimized on

and g(x, y) = x − y. Consider the problem of maximizing and minimizing f on the set D = {(x, y) | g(x, y) = 0}. Since Dg(x, y) = (1, −1) at all (x, y), we have ρ(Dg(x, y)) = 1 at all (x, y), and the constraint qualification holds everywhere on the feasible set. If we set up the Lagrangean L(x, y, λ) = f(x, y) + λg(x, y), the critical points of L are the solutions (x, y, λ) to

x² + 2 + λ = 0
−3y − λ = 0
x − y = 0.

Example 5.10 Let f and g be functions on R² defined respectively by

f(x, y) = (1/3)x³ − (3/2)y² + 2x,

If λ ≠ 0, then the first two equations imply that x = −y, but this violates the third equation. If λ = 0, then from the first two equations we have x = y = 0, so again the third equation cannot be met. Thus, there are no solutions to the equations defining the critical points of L.

Since the constraint qualification holds everywhere, it must be the case that global maxima and minima fail to exist in this problem (otherwise, by Proposition 5.6, such points must arise as critical points of L), and, indeed, it is easy to see that this is the case. For any x ∈ R, the point (x, 1 − x) is in the feasible set of the problem. Moreover, f(x, 1 − x) = x² − (1 − x)², so by taking x large and positive, f can be made arbitrarily large, while by taking −x large and positive, f can be made arbitrarily negative. □

2x − λ = 0
−2y − λ = 0
1 − x − y = 0.

Example 5.9 Consider the problem of maximizing and minimizing f(x, y) = x² − y² subject to the single constraint g(x, y) = 1 − x − y = 0. Observe that Dg(x, y) = (−1, −1) at any (x, y), so ρ(Dg(x, y)) = 1 everywhere, and the constraint qualification holds everywhere on the feasible set. Let L(x, y, λ) = f(x, y) + λg(x, y). The critical points of L are the solutions (x, y, λ) to:


Note also that, since Dg(x, y) = (1, −1) at all (x, y), the set

Z(x, y) = {z ∈ R² | Dg(x, y)z = 0}

is simply the set {z ∈ R² | z = (w, w), w ∈ R}, which is independent of (x, y). Denote this set by just Z. To show that a point (x, y) is a strict local maximum (resp. strict local minimum) of f on D, it suffices by Theorem 5.4 to show that z′D²L(x, y)z is strictly negative (resp. strictly positive) for z ∈ Z with z ≠ 0. At (x, y) = (1, 1) and z = (w, w) ∈ Z, we have z′D²L(x, y)z = −w² < 0 for all w ≠ 0, while for (x, y) = (2, 2), we have z′D²L(x, y)z = w² > 0 for all w ≠ 0. The claimed results are proved.

⁷An exception to this statement arises when the constraint qualification is known to hold everywhere on the constraint set D. In this case, if a global optimum exists, it must turn up as part of a critical point of L; thus, if no critical points of L exist, it must be the case that the problem admits no solution.
⁸Note the important point that, as evidenced by Example 5.10, the second-order conditions are of limited help in resolving this problem. Even if a critical point of L could be shown to be a local maximum (say), this would not establish it to be a global maximum.

D²L(x, y) = [ 2x   0 ]
            [  0  −3 ].

⁶To see this, note that

The examples of this subsection add a word of caution regarding the blind use of the Lagrangean method in solving equality-constrained problems. In many applications, as we pointed out earlier, it is possible to verify a priori both the existence of a solution and the constraint qualification condition, and in such cases the method works well. Sometimes, however, such a priori knowledge may not be available, and this could be problematic, whether or not solutions to the critical points of the Lagrangean exist.

To elaborate, the equations defining the critical points of the Lagrangean could fail to have a solution for two very different reasons. First, as in Example 5.7, this could occur because, although an optimum exists, the constraint qualification is violated at that point; alternatively, as in Example 5.9, this could be the case because an optimum does not even exist. Therefore, the absence of a solution to the critical points of the Lagrangean does not enable us to draw any conclusions about the existence, or nonexistence, of global optima.⁷

On the other hand, it is also possible that while solutions do exist to the critical points of L, none of these is the desired optimum point. Again, this could be because, although an optimum exists, the constraint qualification is violated at that point (as in Example 5.8); or because an optimum does not exist (as in Example 5.10).⁸ Thus, even the existence of solutions to the critical points of L does not, in general, enable us to make inferences about the existence, or nonexistence, of optima.

as x → −∞. □

D. Indeed, the second-order conditions confirm these points, in the sense that (1, 1) passes the test for a strict local maximum, while (2, 2) passes the test for a strict local minimum.⁶

However, neither of these points is a global optimum of the problem. In fact, the problem has no solutions, since f(x, x) → +∞ as x → +∞, and f(x, x) → −∞


Evaluating f at these four points, we see that f(1, 0) = f(−1, 0) = 1, while f(0, 1) = f(0, −1) = −1. Since the critical points of L contain the global maxima and minima of f on D, the first two points must be global maximizers of f on D, while the latter two are global minimizers of f on D.
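The computation in this numerical example can be reproduced symbolically. The sketch below is not part of the original text; it simply solves the same critical-point equations with sympy and evaluates f at each solution.

import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)

# The problem of subsection 5.4.4: max/min x^2 - y^2 subject to 1 - x^2 - y^2 = 0.
f = x**2 - y**2
g = 1 - x**2 - y**2
L = f + lam * g

solutions = sp.solve([sp.diff(L, v) for v in (x, y, lam)], (x, y, lam), dict=True)
for s in solutions:
    print((s[x], s[y], s[lam]), f.subs(s))
# Four critical points: (+-1, 0, 1) with f = 1 (the maxima) and (0, +-1, -1) with f = -1 (the minima).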

(x, y, λ) = (+1, 0, +1), (−1, 0, +1), (0, +1, −1), (0, −1, −1).

From the first equation we have 2x(1 − λ) = 0, while from the second, we have 2y(1 + λ) = 0. If λ ≠ ±1, these can hold only if x = y = 0, but this violates the third equation. So λ = +1 or λ = −1. It easily follows that there are only four possible solutions:

2x − 2λx = 0
−2y − 2λy = 0
x² + y² = 1.

is simply the unit circle in R². We will first show that the two conditions of Proposition 5.6 (namely, existence of

global optima, and the constraint qualification) are both met, so the critical points of the Lagrangean L will, in fact, contain the set of global maxima and global minima. That solutions exist is easy to see: f is a continuous function on D, and D is evidently compact, so an appeal to the Weierstrass Theorem yields the desired conclusion. To check the constraint qualification, note that the derivative of the constraint function g at any (x, y) ∈ R² is given by Dg(x, y) = (−2x, −2y). Since x and y cannot be zero simultaneously on D (otherwise, x² + y² = 1 would fail), we must have ρ(Dg(x, y)) = 1 at all (x, y) ∈ D. Therefore, the constraint qualification holds everywhere on D.

Now set up the Lagrangean L(x, y, λ) = x² − y² + λ(1 − x² − y²). The critical points of L are the solutions (x, y, λ) ∈ R³ to

5.4.4 A Numerical Example

We close this section with an illustration of the use of the Lagrangean method on a simple numerical example. Consider the problem of maximizing and minimizing f(x, y) = x² − y² subject to the single constraint g(x, y) = 1 − x² − y² = 0. The constraint set


B(p, I) = {(x1, x2) | I − p1x1 − p2x2 ≥ 0, x1 ≥ 0, x2 ≥ 0}

is a compact set. The utility function u(x1, x2) = x1x2 is evidently continuous on

Step 1: Reduction to an Equality-Constrained Problem

As stated, this utility-maximization problem is not an equality-constrained one. We begin by transforming it in a manner that will facilitate use of the Lagrangean method. To this end, note that the budget set

5.5.1 An Illustration from Consumer Theory
The utility-maximization example we consider here involves a consumer who consumes two goods, and whose utility from consuming an amount xi of commodity i = 1, 2, is given by u(x1, x2) = x1x2. The consumer has an income I > 0, and the price of commodity i is pi > 0. The problem is to solve:

max{x1x2 | I − p1x1 − p2x2 ≥ 0, x1 ≥ 0, x2 ≥ 0}. We proceed with the analysis in three steps:

5.5 Two Examples from Economics
We now present two familiar examples drawn from consumer theory and producer theory, respectively, namely, those of utility maximization and cost minimization. The presentation here achieves two purposes:

1. As with most problems in economic theory, these problems are characterized by inequality, rather than equality, constraints. We show that, under suitable conditions, it may be possible to reduce such problems to equivalent equality-constrained problems, and thereby to bring them within the framework of the Theorem of Lagrange.

2. We demonstrate the use of the Lagrangean L in determining the solution to the transformed equality-constrained maximization problems. In particular, we show how to (a) use the critical points of L to identify candidate optimal points in the reduced equality-constrained problem; (b) use the second-order conditions to determine if a critical point of L is a local optimum; and, finally, (c) combine the Weierstrass Theorem and the Theorem of Lagrange to identify global optima of the reduced, and thereby the original, problem.

Of course, the phrase "under suitable conditions" is important for the transformation of inequality-constrained problems into equality-constrained ones. See the remarks at the end of this section.


If λ = 0, this system of equations has no solutions, since we must then have x1 =

x2 = 0 from the first and second equations, but this violates the third equation. So suppose λ ≠ 0. From the first two equations, we then have λ = x1/p2 = x2/p1, so x1 = p2x2/p1. Using this in the third equation, we see that the unique solution to this set of equations is given by x1* = I/2p1, x2* = I/2p2, and λ* = I/2p1p2.
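The critical point just derived can be confirmed symbolically. The following sketch is not part of the original text; it solves the same three equations with sympy, treating p1, p2, and the income I as positive parameters.

import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', positive=True)
p1, p2 = sp.symbols('p1 p2', positive=True)
inc = sp.symbols('I', positive=True)        # the income I

L = x1 * x2 + lam * (inc - p1 * x1 - p2 * x2)
sol = sp.solve([sp.diff(L, v) for v in (x1, x2, lam)], (x1, x2, lam), dict=True)
print(sol)   # the unique solution, equivalent to x1* = I/(2 p1), x2* = I/(2 p2), lam* = I/(2 p1 p2)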

Step 3: Classifying the Critical Points of L
As a first step in classifying the single critical point of L, we show how the second-order conditions may be used to check that (x1*, x2*) is a strict local maximum of u

on I3"(p, I).Webegin by noting that Dg(xT, xi) = (- PI. - P2) so the set

Z(x*) = {z ∈ R² | Dg(x*)z = 0}

L(x1, x2, λ) = x1x2 + λ(I − p1x1 − p2x2).

The critical points of L are the solutions (x1, x2, λ) ∈ R²₊₊ × R to:

x2 − λp1 = 0
x1 − λp2 = 0
I − p1x1 − p2x2 = 0.

Step 2: Obtaining the Critical Points of L

We set up the Lagrangean

B*(p, I) = R²₊₊ ∩ {(x1, x2) | I − p1x1 − p2x2 = 0},

and by setting U = R²₊₊ and g(x1, x2) = I − p1x1 − p2x2, we are now within the setting of the Theorem of Lagrange.

this set, so by the Weierstrass Theorem a solution (x1*, x2*) does exist to the given maximization problem.

Now, if either x1 = 0 or x2 = 0, then u(x1, x2) = 0. On the other hand, the consumption point (x̂1, x̂2) = (I/2p1, I/2p2), which divides the available income I > 0 equally between the two commodities, is a feasible consumption point, which satisfies u(x̂1, x̂2) = x̂1x̂2 > 0. Since any solution (x1*, x2*) must satisfy u(x1*, x2*) ≥ u(x̂1, x̂2), it follows that any solution (x1*, x2*) must satisfy xi* > 0, i = 1, 2. Moreover, any solution must also meet the income constraint with equality (i.e., we must have p1x1* + p2x2* = I), or total utility could be increased.

Combining these observations, we see that (x1*, x2*) is a solution to the original problem if and only if it is a solution to the problem

max{x1x2 | p1x1 + p2x2 = I, x1, x2 > 0}. The constraint set of this reduced problem, denoted, say, B*(p, I), can equivalently be written as


Minimize w1x1 + w2x2 subject to (x1, x2) ∈ X(ȳ).

We will assume throughout that w1 > 0 and w2 > 0 (otherwise, it is apparent that a solution cannot exist). Once again, we proceed in three steps.

The firm's optimization problem is:

5.5.2 An Illustration from Producer Theory
We now turn to the cost-minimization problem faced by a firm, which uses two inputs x1 and x2 to produce a single output y through the production function y = g(x1, x2). The unit prices of x1 and x2 are w1 ≥ 0 and w2 ≥ 0, respectively. The firm wishes to find the cheapest input combination for producing ȳ units of output, where ȳ > 0 is given. Let X(ȳ) denote the set of feasible input combinations:

X(ȳ) = {(x1, x2) ∈ R² | x1x2 = ȳ, x1 ≥ 0, x2 ≥ 0}.

by the arguments given in Step 1. Second, note that the single constraint

g(x1, x2) = I − p1x1 − p2x2

of this problem satisfies Dg(x1, x2) = (−p1, −p2) everywhere on B*(p, I). Since p1, p2 > 0 by hypothesis, we have ρ(Dg(x1, x2)) = 1 at all (x1, x2) ∈ B*(p, I). In particular, the constraint qualification holds at the global maximum. Therefore, by Proposition 5.6, the global maximum must be part of a critical point of the Lagrangean L in this problem. There is a unique critical point, with associated x-values (x1*, x2*), so these x-values must represent the problem's global maximum.

So for any z ∈ R², we have z′D²L*z = 2z1z2. In particular, for any z ∈ Z(x*) with z ≠ 0, we have z′D²L*z = −2p2z2²/p1 < 0. Thus, by Theorem 5.4, (x1*, x2*) is a strict local maximum of u on B*(p, I).

However, we can actually show the stronger result that (x1*, x2*) is a global maximum of u on B*(p, I) by showing that the conditions of Proposition 5.6 are met. First, note that a solution (i.e., a global maximum) does exist in the problem

D²L* = [ 0  1 ]      [ 0  0 ]   [ 0  1 ]
       [ 1  0 ] + λ*·[ 0  0 ] = [ 1  0 ].

Z(x*) = {z ∈ R² | z1 = −p2z2/p1}.

Defining D²L* = D²u(x*) + λ*·D²g(x*), we have

is simply


By Theorem 5.4, if we show that z′D²L*z > 0 for all z ≠ 0 such that Dg(x*)z = 0, then we would have established that x* is a strict local minimum of f on D. By Theorem 5.5, showing that z′D²L*z > 0 for all z ≠ 0 such that Dg(x*)z = 0 is the same as showing that the determinant of the following matrix is negative:

C2 = [ 0         Dg(x*) ]
     [ Dg(x*)′   D²L*   ].

Next, note that the matrix D²L* = D²f(x*) + λ*·D²g(x*) is given by

D²L* = [ 0  0 ]      [  0  −1 ]   [  0   −λ* ]
       [ 0  0 ] + λ*·[ −1   0 ] = [ −λ*   0  ].

Dg(x1*, x2*) = (−x2*, −x1*).

w1/w2 = x2/x1, or x2 = w1x1/w2.

Substituting this into the third equation, we obtain the unique solution

x1* = (w2ȳ/w1)^{1/2},  x2* = (w1ȳ/w2)^{1/2},  and  λ* = (w1w2/ȳ)^{1/2}.
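This solution, too, can be confirmed symbolically. The sketch below is not in the original text; it solves the same three critical-point equations with sympy, treating w1, w2, and ȳ as positive parameters.

import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', positive=True)
w1, w2, ybar = sp.symbols('w1 w2 ybar', positive=True)

L = w1 * x1 + w2 * x2 + lam * (ybar - x1 * x2)
sol = sp.solve([sp.diff(L, v) for v in (x1, x2, lam)], (x1, x2, lam), dict=True)
print(sol)   # the unique positive solution, equivalent to
             # x1* = (w2*ybar/w1)**(1/2), x2* = (w1*ybar/w2)**(1/2), lam* = (w1*w2/ybar)**(1/2)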

Step 3: Classifying the Critical Points of L

Once again, we begin with a demonstration that (x1*, x2*) is a strict local minimum of f(x1, x2) = w1x1 + w2x2 on X*(ȳ). This time, we employ Theorem 5.5 to achieve this end. First, note that

Step 2: Obtaining the Critical Points of L

Define the Lagrangean L(x1, x2, λ) = f(x1, x2) + λg(x1, x2) = w1x1 + w2x2 + λ(ȳ − x1x2). The critical points of L are the solutions to:

w1 − λx2 = 0
w2 − λx1 = 0
ȳ − x1x2 = 0.

From the first (or the second) equation, λ cannot be zero. Therefore, from the first two equations we have

X*(ȳ) = R²₊₊ ∩ {(x1, x2) ∈ R² | x1x2 = ȳ}.

Setting f(x1, x2) = w1x1 + w2x2, g(x1, x2) = ȳ − x1x2, and U = R²₊₊, we are in the framework of the Theorem of Lagrange.


Step 1: Reduction to an Equality-Constrained Problem

In this problem, the first step is trivial. If x1 = 0 or x2 = 0, we have x1x2 = 0, so the inequality constraints have no bite, and the constraint set is actually given by


5.5.3 Remarks
At the beginning of this section, it was mentioned that the reduction of inequality-constrained problems to equality-constrained ones was possible only under suitable conditions. The purpose of this subsection is to emphasize this point through three

so that f(x1, x2) = w1x1 + w2x2 achieves a global minimum on X*(ȳ) if and only if it achieves a global minimum on this restricted set. Since the restricted set is evidently compact, and f is continuous on it, a minimum certainly exists on the restricted set by the Weierstrass Theorem; therefore, a minimum of f on X*(ȳ) also exists.

That the constraint qualification condition is met is easier to see. The problem's single constraint g(x1, x2) = ȳ − x1x2 satisfies Dg(x1, x2) = (−x2, −x1), and since neither x1 nor x2 can be zero on X*(ȳ) (otherwise, x1x2 = ȳ > 0 is violated), we have ρ(Dg(x1, x2)) = 1 everywhere on X*(ȳ). In particular, the constraint qualification must hold at the global minimum.

Since both conditions of Proposition 5.6 are met, the global minimum must turn up as part of the solution to the critical points of L. Since only a single solution exists to these critical points, it follows that the point (x1*, x2*) is, in fact, a global minimum of f on D.

Some simple calculation shows that |C2| = −2λ*x1*x2* < 0, as required. Thus, (x1*, x2*) is a strict local minimum of f on D.

As in the utility-maximization problem, however, we can show the stronger result that (x1*, x2*) is a global minimum of f on X*(ȳ), by proving that the problem meets the conditions of Proposition 5.6. The existence issue is a little trickier, because the feasible action set X*(ȳ) is noncompact. However, as explained in Chapter 3, we can reduce the problem to one with a compact action set as follows. Let (x̂1, x̂2) = ((ȳ)^{1/2}, (ȳ)^{1/2}). Note that x̂1x̂2 = ȳ, so (x̂1, x̂2) ∈ X*(ȳ). Define c = w1x̂1 + w2x̂2. Let x̄1 = 2c/w1 and x̄2 = 2c/w2. Then, it is clear that the firm will never use more than x̄i units of input i, since the total cost from doing so will strictly exceed c (no matter what the quantity of the other input used), while the input combination (x̂1, x̂2) can produce the desired output at a cost of c. Thus, we may, without loss, restrict the feasible action set to the set of points in X*(ȳ) at which x1 ≤ x̄1 and x2 ≤ x̄2,

C2 = [  0     −x2*   −x1* ]
     [ −x2*    0     −λ*  ]
     [ −x1*   −λ*     0   ].

Substituting for Dg(x*) and D²L*, we obtain:


Example 5.12 Suppose that in the problem of subsection 5.5.2, the production function were modified to the linear function g(x1, x2) = x1 + x2. A little reflection shows

A similar problem can also arise in the cost-minimization exercise, as the next example demonstrates.

The first two equations are in contradiction except when p1 = p2, which is the only case when the nonnegativity constraints do not bite. Thus, except in this case, the Lagrangean method fails to identify a solution. □
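The inconsistency can be seen mechanically: with any numerical prices p1 ≠ p2, the critical-point equations of the (illegitimate) equality-constrained version of Example 5.11 have no solution. The sketch below is not from the original text; the values p1 = 1, p2 = 2, I = 10 are hypothetical.

import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
p1, p2, inc = 1, 2, 10        # hypothetical prices with p1 != p2, and income I

# Critical points of L = x1 + x2 + lam*(I - p1*x1 - p2*x2).
L = x1 + x2 + lam * (inc - p1 * x1 - p2 * x2)
print(sp.solve([sp.diff(L, v) for v in (x1, x2, lam)], (x1, x2, lam)))   # []: no critical points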

1 − λp1 = 0
1 − λp2 = 0
p1x1 + p2x2 = I.

and that any (x1, x2) ∈ R²₊ that satisfies p1x1 + p2x2 = I is optimal when p1 = p2. Thus, the constraints xi ≥ 0 "bite" when p1 ≠ p2. If we had ignored this problem, and attempted to use the Lagrangean method to

solve this problem, the following system of equations would have resulted:

(x1*, x2*) = (I/p1, 0)   if p1 < p2,
             (0, I/p2)   if p1 > p2,

It is still true that the utility function is continuous and the budget set is compact, so the Weierstrass Theorem guarantees us a maximum. It also remains true that any maximum (x1*, x2*) must satisfy I − p1x1* − p2x2* = 0. Nonetheless, it is not possible to set up the problem as an equality-constrained maximization problem, since the constraints x1 ≥ 0 and x2 ≥ 0 can no longer be replaced with x1 > 0 and x2 > 0. Indeed, it is obvious upon inspection that the solution to the problem is

Maximize x1 + x2 subject to (x1, x2) ∈ B(p, I).

Example 5.11 Consider a utility maximization problem as in subsection 5.5.1, but where the utility function u(x1, x2) = x1x2 is replaced with v(x1, x2) = x1 + x2.

That is, we are to solve

examples, which demonstrate .Ee danger of attempting such a reduction by ignor>­ing nonnegativity constraints. when ignoring these constraints may not be a legit­imate option. In the first two examples. the problems that arise take on the formof inconsistent equations defining the critical points of the Lagrangean. In the lastexample. the Lagrangean admits a unique solution. but this is not the solution tothe problem.


An optimum evidently exists in this problem, since the feasible set is compact and the objective function is continuous. Moreover, it is also apparent that at the optimum, both resources must be fully allocated; that is, we must have x1 + x2 = x and y1 + y2 = y. If we ignore the nonnegativity constraints in this problem, and

where

Maximize [α x1 y1 + (1 − α) x2 y2] subject to (x1, x2, y1, y2) ∈ F(x, y),

Example 5.13 We consider the problem of identifying Pareto-optimal divisions of a given amount of resources between two agents, as described in subsection 2.3.7. Suppose there are two commodities, and the endowment vector ω ∈ R²₊₊ consists of x units of the first resource and y units of the second resource. Let xi and yi denote respectively the amounts of resources x and y that are allocated to agent i, i = 1, 2. Agent i's utility from receiving the allocation (xi, yi) is assumed to be given by u(xi, yi) = xi yi. Given a weight α ∈ (0, 1), the problem is to solve

Finally, the following example shows that even if the equations that define the critical points of L do have a solution, the solution need not be the solution to the problem if the reduction of the original inequality-constrained problem to an equality-constrained one is not legitimate.

The first two equations are consistent if and only if w1 = w2. Thus, if w1 ≠ w2, the Lagrangean method will not work. □

w1 + λ = 0
w2 + λ = 0
x1 + x2 = ȳ.

and the critical points of L are the solutions to

and that any pair (x1, x2) ∈ X(ȳ) is optimal when w1 = w2. Thus, when w1 ≠ w2, the nonnegativity constraints matter in a nontrivial sense. If these constraints are ignored and the problem is represented as an equality-constrained problem, the Lagrangean has the form

(x1, x2) = (ȳ, 0)   if w1 < w2,
(x1, x2) = (0, ȳ)   if w1 > w2,

that, in this case, the solution to the problem is given by


1. We shall assume, without loss of generality, that the k × k submatrix of Dg(x*) that has full rank is the k × k submatrix consisting of the first k rows and k columns.

5.6 A Proof of the Theorem of Lagrange

The following notational simplifications will aid greatly in the proof of Theorem 5.1:

On the other hand, if the entire available vector of resources is given over to a single agent (say, agent 1), then the objective function has the value αxy. Since α ∈ (0, 1), this is strictly larger than α(1 − α)xy.

The problem here, as in the earlier examples, is that the nonnegativity constraints bite at the optimum: viz., it can be shown that the solution to the problem is to turn over all the resources to agent 1 if α > 1/2, to agent 2 if α < 1/2, and to give it all to either one of the agents if α = 1/2. □
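The gap between the Lagrangean's critical point and the true optimum can also be seen numerically. The sketch below is an illustration, not from the text; the values α = 0.7 and x = y = 1 are assumptions.

```python
# Sketch (not from the text): compare welfare at the Lagrangean's critical point
# with the welfare from giving all resources to agent 1.
a, x, y = 0.7, 1.0, 1.0                      # assumed illustrative values

# Critical point of the equality-constrained Lagrangean (derived in the text):
x1, x2 = (1 - a) * x, a * x
y1, y2 = (1 - a) * y, a * y

welfare_critical = a * x1 * y1 + (1 - a) * x2 * y2   # = a*(1-a)*x*y
welfare_corner = a * x * y                           # everything to agent 1

print(welfare_critical, welfare_corner)              # 0.21 < 0.7
```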

α x1* y1* + (1 − α) x2* y2* = α((1 − α)x)((1 − α)y) + (1 − α)(αx)(αy)
                            = α(1 − α)xy.

However, it is easy to show that these values do not identify a solution to the maximization problem. When they are substituted into the objective function, we obtain

y1* = (1 − α)y

y2* = αy

λy* = α(1 − α)x

x1* = (1 − α)x

x2* = αx

λx* = α(1 − α)y

A simple computation shows that these equations admit the unique solution

The critical points of L are the solutions to:

α y1 − λx = 0
(1 − α) y2 − λx = 0

α x1 − λy = 0

(1 − α) x2 − λy = 0

x − x1 − x2 = 0
y − y1 − y2 = 0.

write it as an equality-constrained problem with only the resource constraints, the Lagrangean for the problem is

L(x1, x2, y1, y2, λx, λy) = α x1 y1 + (1 − α) x2 y2 + λx(x − x1 − x2) + λy(y − y1 − y2).


9As we mentioned in Chapter I, it is customary to treat vectors as columnvectors, and to use the transposeto represent them as row vectors. We are not following this rule here.

lOWeassume in this proof that x' is a local maximum: the proof for a local minimum follows identical steps,with obvious changes.

λ* Dgw(w*, z*) = −Dfw(w*, z*)[Dgw(w*, z*)]⁻¹ Dgw(w*, z*) = −Dfw(w*, z*),

Wewill show that A' so defined meets the required conditions. Indeed, it follows fromthe very definition 01'),,* that when both sides arc postrnultiplicd by Dg",(w', z"),we obtain

λ* = −Dfw(w*, z*)[Dgw(w*, z*)]⁻¹.

Now define λ* by

At z = z", we have h(z·) = w'. Since Dgw(w·, z") is invertible by assumption,this implies

Dgw(h(z), z) Dh(z) + Dgz(h(z), z) = 0.

2. Wewill denote the first k coordinates of a vector x E V by w, and the last (n - k)coordinates by z, i.e., we write x = (w, z). In particular, we shall write (w*, z")to denote the local optimum r",

3. Wewill denote by Dlw(w, z), the derivative of I at (w, z) with respect to the wvariables alone, and by DIz(w, z), the derivative of I at (w, z) with respect to theZ variables alone. Dgw(w, z) and Dgz(w, z) are defined similarly. Note that thedimensions of the matrices Dlw(w, z), DIz(w, z), Dgw(w, z), and Dgz(w, z),are, respectively, 1 x k, 1 x (n - k), k x k, and k x (n - k).

4. We shall treat the vector A· in ]Rk, whose existence we are to show, as a 1 x kmatrix." Thus, for instance, we will write A"Dg(x') to represent L:7=1 A;Dg,(x').

In this notation, we are given the data that (ui", z·) is a local maximum!" of I onV = un few, z) E ]Rn I g(w, z) = O}; and that p(Dgw(w', z") = k. We are toprove that there exists )c. such that

Dfw(w*, z*) + λ* Dgw(w*, z*) = 0
Dfz(w*, z*) + λ* Dgz(w*, z*) = 0.

As a first step, note that since p (Dgw (w*, z") = k, the Implicit Function Theorem(see Theorem 1.77 in Chapter 1) shows that there exists an open set V in ]Rn-kcontaining z" , and aC 1 function h: V ~ ]Rk such that h (z") = w· and g(h (z), z) ==o for all z E V. Differentiating the identity g(h (z), z) == 0 with respect to z by usingthe chain rule, we obtain
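The construction above can be made concrete on a small example. The following sympy sketch is not from the text; it applies the recipe λ* = −Dfw(w*, z*)[Dgw(w*, z*)]⁻¹ to the assumed illustrative problem of maximizing f(x1, x2) = x1 x2 subject to x1 + x2 = 1, with w = x1 and z = x2.

```python
# Sketch (not from the text): the lambda* construction for an assumed example,
# max f(w, z) = w*z subject to g(w, z) = w + z - 1 = 0.
import sympy as sp

w, z = sp.symbols('w z')
f = w * z
g = w + z - 1

w_star, z_star = sp.Rational(1, 2), sp.Rational(1, 2)   # the constrained maximum

Dfw, Dfz = sp.diff(f, w), sp.diff(f, z)
Dgw, Dgz = sp.diff(g, w), sp.diff(g, z)

subs = {w: w_star, z: z_star}
lam_star = -Dfw.subs(subs) / Dgw.subs(subs)              # lambda* = -Dfw [Dgw]^(-1)

# Both first-order conditions of Theorem 5.1 should vanish:
print(Dfw.subs(subs) + lam_star * Dgw.subs(subs))        # 0
print(Dfz.subs(subs) + lam_star * Dgz.subs(subs))        # 0
```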


))Specifically. the added complication derives from the Iact that when there is only a single constraint 1/. thesecond-derivative D2g is an n x n matrix. while in the general case where g= (gl. 0 0 0 , gk). Dg is itselfan n x k matrix. so D2g = D(Dg) is an array of dimensions n x k x n. A typical entry in this array IS

a2g;(x)/axih/. where; = 1, ...• k; and j.I = 1•.... no

• We will denote a typical point x E IRn by (w, z ). where WEIR and z E IRn-1 0

The point x* will be denoted (ui", z").

Proof of Part J of Theorem 5.4

Wewill prove Part 1of Theorem 5,4 for the case where there is just a single constraint.i.e., where g: IRn ~ R The general case follows exactly along the same lines, exceptthat notation gets considerably more involved. I I The proof we will present is a directextension of the proof of the first-order necessary conditions provided in Section 5_(ioIn particular, we will use the notation introduced there. adapted to the case k "" I.That is:

5.7 A Proof of the Second-Order Conditions

We prove Parts 1 and 3 of Theorem 5.4 here. Parts 2 and 4 are proved analogously; the details are left as an exercise to the reader.

Dfz(w*, z*) + λ* Dgz(w*, z*) = 0.

Theorem 5.1 is proved. □

which, from the definition of .l.., is the same as

Substituting for Dh(z*), we obtain

−Dfw(w*, z*)[Dgw(w*, z*)]⁻¹ Dgz(w*, z*) + Dfz(w*, z*) = 0.

Dfw(w*, z*) Dh(z*) + Dfz(w*, z*) = 0.

Dfz(w*, z*) + λ* Dgz(w*, z*) = 0.

To chis end, define the funccion F: V ~ 1Rby F(z) = j(h(z), z). Since / ha.<;alocal maximum at (w*,z*) = (h(z*), z"), it is immediate that F has a local max­:ml:ul at z". Since V is open, z " is an unconstrained local maximum of F. and thefirst-c.der conditions for an unconstrained maximum now imply that DF(z*) = O.or that

Thus, it remains only to be shown that

Dfw(w*, z*) + λ* Dgw(w*, z*) = 0, which is the same as


12Note that. in full notation. we have

Dgw(w, z)Dh(z) + Dgz(w, z) = 0

As a first step, note that since g(h(z). z) = 0 for all z E V. we have

Df(x*) + λ* Dg(x*) = [Dfw* + λ* Dgw*,  Dfz* + λ* Dgz*] = 0.

We are now ready to show that, if D2L· denotes the n x n quadratic formn2 /(w*, ;:*) + A* D2g(w·, z*), we have

x′ D²L* x ≤ 0   for all x ∈ Z* = {x ∈ Rⁿ | Dg(w*, z*)x = 0}.

then

and tluu p( /)gll'(w*, z ") = I, where U C lR" is open. We have shown in Section 5.6under these conditions. that if

D = U ∩ {(w, z) ∈ Rⁿ | g(w, z) = 0},

• We will denote by D/w(w, z) and Dfz(w, z) the derivatives of / with respect tothe w-variable and the z-variables, The tenus Dgw and Dg, are defined similarly.Of course, we have D/ = (D/w, D/z) and Dg = (Dgw, Dgz) .

• We will assume, without loss of generality, that the 1 x I submatrix of Dgiui", z")that has full rank is Dgw(w·, z "). By the Implicit Function Theorem, there is aneighborhood V of z" E jR"- I. and a C I function h: V ~ IRsuch that h (z ") = ur",and g(h(z). z) = 0 for all Z E V.

In an extension of this notation to second-derivatives. we define D2/ww to be thederivative of D/w with respect to the ui-variable; D2/wz to be the derivative of D/wwith respect to the z-variables: and D2/zz to be the derivative of D/z with respect{O the a-variables, The terms D2gww. D2gwz. and D2gzz are defined analogously. 12

Finally. for notational ease. we will simply use the superscript ...... to denote eval­uation of the derivatives of / and g at r " = (ur", z"). So. for instance. D/oOwill denote DI(w*, z*), [)2f;~UI will denote D2 /ww(w*, z"), D2g~z will denoten2gll,:(III"',z"), etc.In this notation, we are given that / and g are C2 functions mapping IR" into JR.;

that (ui", z") is a local maximum of / on the set


The result is proved.

for all ~ E IR". To complete the proof, we will now show that this is preciselythe same thing as the negative definiteness of D2 L** on the set Z". Indeed. thisis immediate: note that since Dgtui", z") = (Dgw(w*, z"), Irg.tu:"; ZO», it is thecase that (w,~) E Z" if and only if Dg~Jw+ Dg;~ = 0, i.e., if and only if

w = −[Dgw*]⁻¹ Dgz* ξ = Dh* ξ.

Therefore, D2 L '" is negative semidefinite on Z' if and only if for all ~ E lRll•wehave

for all (ui, z). Differentiating again'wiih respect to z, and suppressing notation 011

(ur, z), we obtain

(Dh)′[D²gww Dh + D²gwz] + Dgw D²h + (Dh)′ D²gwz + D²gzz = 0.

Evaluating these expressions at (w*, z"), and using the assumption that Dg:! =J 0,we obtain

Dh" = -[Dg:l-1 Dg;.

D2h* = -[Dg:,l-I (Dh*)' D2g~!wDh* + 2(DI1*)' D:'g,:,z + /)2g;z)'

Now, define F: V ~ lR by F(z) = f(h(z), z), Since f has a local maximum at(w", z "), it is easy to sec that F has a local maximum on V at z". Therefore, wemust have z' D2 F(z*)z ~ 0 for all z E lRn. Since DF(z) = D/.,,(h(z), z) Dh (z) +Dfz(h(z), z) at all z E V, we have

D²F(z*) = (Dh*)′ D²fww* Dh* + 2(Dh*)′ D²fwz* + Dfw* D²h* + D²fzz*. Substituting for D²h*, using the fact that λ* = −Dfw*[Dgw*]⁻¹, and writing D²Lww* for D²fww* + λ* D²gww*, etc., we obtain

D²F(z*) = (Dh*)′ D²Lww* Dh* + 2(Dh*)′ D²Lwz* + D²Lzz*,

where Dh" = -[Dg~l-IDg;, Since f)2 F(z*) is a negative semidefinite quadraticform, we have shown that the expression on the right-hand side of this equation isalso a negative semidefinite quadratic form on ]Rn, i.e., we must have

(Dh* ξ)′ D²Lww* (Dh* ξ) + 2(Dh* ξ)′ D²Lwz* ξ + ξ′ D²Lzz* ξ ≤ 0

for all ~ E ]R". In more compact matrix notation, wc can write this as


Cⁿ⁻¹ = {y ∈ Rⁿ | ‖y‖ = 1}.

As a closed and bounded subsetof lRn,en-I is compact, so the sequence {YI} admits

y_l = (x_l − x*)/‖x_l − x*‖

Since XI i= x" for any I, the sequence {yJ} is well-defined. Moreover, we haveIIYIII= I for all I , so the sequence {yJ} lies in the unit circleen-I in IRn:

I. Y i= O.2. Dg(x*)y = O.3. y' D2L •y ~ o.This will furnish the required contradiction, since, by hypothesis, we must havey' D2L" y < 0 for all Y i= 0 such that Dg(x")y = O.Taylor's Theorem, which wasdescribed inChapter 1(see Theorem 1.75),will playa central role in this process.

Definea sequence {Y/} in IRn by

Weare to show that x" is a strict local maximum of f on TJ, i.e., that there existsr > 0 such that f(x*) > f(x) for all x E Btx", r) n D.Weusea reductio ad absurdum approach.Supposethat underthe statedconditions,

the conclusion of the theorem were not true. Then, it is the case that for all r > 0,there exists x(r) E Btx", r) n D such that x(r) i= x* and f(x(r» ~ f(x*). Pickany sequence rl > 0, rt ~ 0, and let XI = x(n). We shall use the sequence {xi} toshow that there must exist at least one point Y E IRn with the followingproperties:

x′ D²L* x < 0 for all 0 ≠ x ∈ Z* = {v ∈ Rⁿ | Dg(x*)v = 0}.

Proof of Part 3 of Theorem 5.4

The sufficient conditions given in Part 3 of Theorem 5.4 may be established byreversingthe arguments used in proving the necessary conditions of Part I. That is,we begin with the condition that D2L '" is negative definite on the set Z .., show thatthis implies that the second derivative D2 F(z*) of the function F(z) = f(h(z), z)is negativedefiniteon R", anduse this to conclude that f has a strict localmaximumat (w*, z"). This is a relatively straightforward procedure, so we do not fill in thedetails here. Instead, we offer an alternative method of establishing Part 3, whichuses a completely different approach.

We revert to the notation of Theorem 5.4, and drop the simplifying assumptionthat k = 1. We are given that x· satisfies p(Dg(x"» = k, and that there existsA"E IRk such that the following two conditions are met:

kDf(x") +LAj Dg;(x*) = 0,

;=1


y_{m(l)} = (x_{m(l)} − x*)/‖x_{m(l)} − x*‖,

we see that

[f(x_{m(l)}) − f(x*)] / ‖x_{m(l)} − x*‖²  =  (1/2) y_{m(l)}′ D²L* y_{m(l)} + R2(x_{m(l)}, x*)/‖x_{m(l)} − x*‖².

Substituting these into the Taylor expansion of L(-). we obtain

f(x_{m(l)}) = f(x*) + (1/2)(x_{m(l)} − x*)′ D²L* (x_{m(l)} − x*) + R2(x_{m(l)}, x*).

Rearranging terms, dividing through by IIXm(/) - x"1I2, and using the fact that

1. DL(x*) = Df(x*) + Σ_{i=1}^k λi* Dgi(x*) = 0.
2. D²L(x*) = D²L*.
3. L(x_{m(l)}) = f(x_{m(l)}) + Σ_{i=1}^k λi* gi(x_{m(l)}) = f(x_{m(l)}), since g(x_{m(l)}) = 0 by choice of x_{m(l)}.
4. L(x*) = f(x*) + Σ_{i=1}^k λi* gi(x*) = f(x*), since g(x*) = 0 by definition of x*.

Now, we also have

L(x_{m(l)}) = L(x*) + DL(x*)(x_{m(l)} − x*) + (1/2)(x_{m(l)} − x*)′ D²L(x*)(x_{m(l)} − x*) + R2(x_{m(l)}, x*).

o = Dg(x*)y.

Finally, to complete the proof, we will show that y' D2/," y ~ O. For notationalease,let L(·) denote L (.; ).."). Since L(.) is C2, a second-order Taylor series expansionaround x" yields

a convergent subsequence (Ym(I)'1 converging to a point y Een-I. Observe that sincc,we must have lIyll = I, it is certainly the case that y "I O.We will now show that Dg(x*)y = O. Using a first-order Taylor series expansion

of g around x" , we have

g(x_{m(l)}) = g(x*) + Dg(x*)(x_{m(l)} − x*) + R1(x_{m(l)}, x*).

By definition of x', and by choice of xm(l). we must have g(x') = g(Xml/) = o.Therefore,

0 = Dg(x*)(x_{m(l)} − x*)/‖x_{m(l)} − x*‖ + R1(x_{m(l)}, x*)/‖x_{m(l)} − x*‖ = Dg(x*) y_{m(l)} + R1(x_{m(l)}, x*)/‖x_{m(l)} − x*‖.

By Taylor's Theorem. we must have R(Xm(I), x·)/lIxm(/) - x"] -+ 0 as Xm(l) .-~

Therefore. taking limits as I --+ 00, and noting that Ym(/) --+ Z, we obtain


max_x  c′x + (1/2) x′Dx   subject to   Ax = b

(a) Solve the problem geometrically.(b) Show that the method of Lagrange multipliers does not work in this case.

Can you explain why?

6. Consider the following problem where the objective function is quadratic andthe constraints are linear:

(a) f(x, y) = xy subject to x² + y² = 2a².
(b) f(x, y) = 1/x + 1/y subject to (1/x)² + (1/y)² = (1/a)².
(c) f(x, y, z) = x + y + z subject to (1/x) + (1/y) + (1/z) = 1.
(d) f(x, y, z) = xyz subject to x + y + z = 5 and xy + xz + yz = 8.
(e) f(x, y) = x + y for xy = 16.
(f) f(x, y, z) = x² + 2y − z² subject to 2x − y = 0 and x + z = 6.

4. Maximize and minimize f(x, y) = x + y on the lemniscate (x² − y²)² = x² + y².
5. Consider the problem:

min x² + y² subject to (x − 1)³ − y² = 0.

5.8 Exercises

1. Find the maximum and minimum of [t», y) = x2 - y2 on the unit circlex2 + y2 = 1 using the Lagrange multipliers method. Using the substitutiony2 = I - x2, solve the same problem as a single variable unconstrained problem.Do you get the same results? Why or why not?

2. Show that the problem of maximizing [t», y) = x3 + y3 on the constraint set1J = {(x. y) I x + y = I) has no solution. Show also that if the Lagrangeanmethod were used on this problem, the critical points of the Lagrangean have aunique solution. Is the point identified by this solution either a local maximumor a (local or global) minimum?

3. Find the maxima and minima of the following functions subject to the specifiedconstraints:

as required. □

By the definition of the sequence {x_l}, the left-hand side of this equation must be nonnegative at each l. Since R2(x_{m(l)}, x*)/‖x_{m(l)} − x*‖² → 0 as x_{m(l)} → x*, taking limits as l → ∞ now results in


(a) Let ȳ = 1. Represent the set {(x1, x2) ∈ R²₊ | x1² + x2² ≥ ȳ} in a diagram, and

The firm incurs two types of inventory costs: a holding cost and an ordering cost. The average stock of inventory is x/2, and the cost of holding one unit of the commodity is Ch, so Ch·x/2 is the holding cost. The firm orders the commodity, as stated above, n times a year, and the cost of placing one order is Co, so Co·n is the ordering cost. The total cost is then:

C = Ch·x/2 + Co·n.

(a) In a diagram, show how the inventory level varies over time. Prove that the average inventory level is x/2.

(b) Minimize the cost of inventory, C, by choice of x and n subject to the constraint A = nx using the Lagrange multiplier method. Find the optimal x as a function of the parameters Co, Ch, and A. Interpret the Lagrange multiplier.
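One quick way to check the answer to part (b) is to eliminate n using the constraint A = nx and minimize directly. The sketch below is an assumption-laden illustration (not the exercise's intended Lagrange-multiplier solution), using Ch for the holding cost and Co for the ordering cost.

```python
# Sketch (not from the exercise): substitute n = A/x into C = Ch*x/2 + Co*n and minimize.
import sympy as sp

x, Ch, Co, A = sp.symbols('x C_h C_o A', positive=True)
cost = Ch * x / 2 + Co * (A / x)
x_opt = sp.solve(sp.diff(cost, x), x)[0]
print(x_opt)                                # sqrt(2*A*Co/Ch), the usual EOQ formula
```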

9. Suppose the utility function in subsection 5.5.1 is modified to

u(x1, x2) = x1^α + x2^β,

where α, β > 0. Under what circumstances, if any, can the problem now be reduced to an equality-constrained optimization problem?

10. Consider the cost-minimization problem of subsection 5.5.2. but with the pro­duction function g modified to

A = nx .

where A is a given symmetric matrix.

8. A firm's inventory of a certain homogeneous commodity, I(t), is depleted at a constant rate per unit time dI/dt, and the firm reorders an amount x of the commodity, which is delivered immediately, whenever the level of inventory is zero. The annual requirement for the commodity is A, and the firm orders the commodity n times a year, where

where c is a given n-vector, D is a given n × n symmetric, negative definite matrix, and A is a given m × n matrix.

(a) Set up the Lagrangean and obtain the first-order conditions.
(b) Solve for the optimal vector x* as a function of A, b, c, and D.

7. Solve the problem

max f(x) = x′Ax subject to x·x = 1


with either (1,0) or (0, 1) being optimal if WI = W2.

(b) Show that if the nonnegativity constraints are ignored and the Lagrangeanmethod is employed, the method fails to identify the solution, regardless ofthe values of WI and W2.

II. Consider the problem of maximizing the utility function

u(x, y) = xl/2 + yl/2

on the budget set {(x, y) E lRt I px + y = I}. Show that if the nonnegativityconstraints x :::0 and y ::: 0 are ignored, and the problem is written as anequality-constrained one, the resulting Lagrangean has a unique critical point.Does this critical point identify a solution to the problem? Why or why not?

(x1*, x2*) = (1, 0)   if w1 < w2,
(x1*, x2*) = (0, 1)   if w1 > w2,

argue using this diagram that the solution to the problem is


6.1.1 Statement of the Theorem

In the statement of the theorem, as elsewhere in the sequel, we say that an inequality constraint hi(x) ≥ 0 is effective at a point x* if the constraint holds with equality at

6.1 The Theorem of Kuhn and Tucker

The Theorem of Kuhn and Tucker provides an elegant characterization of the behavior of the objective function f and the constraint functions hi at local optima of inequality-constrained optimization problems. The conditions it describes may be viewed as the first-order necessary conditions for local optima in these problems. The statement of the theorem, and a discussion of some of its components, is the subject of this section.

D = U ∩ {x ∈ Rⁿ | gi(x) = 0, i = 1, ..., k,  hi(x) ≥ 0, i = 1, ..., l}.

where U ⊂ Rⁿ is open, and hi: Rⁿ → R, i = 1, ..., l. The centerpiece of this chapter is the Theorem of Kuhn and Tucker, which describes necessary conditions for local optima in such problems. Following the description of the theorem, and of its use in locating optima in inequality-constrained problems, we show that the Theorem of Lagrange may be combined with the Theorem of Kuhn and Tucker to obtain necessary conditions for local optima in the general case of optimization problems defined by mixed constraints, where the constraint set takes the form

D = U ∩ {x ∈ Rⁿ | hi(x) ≥ 0, i = 1, ..., l},

Building on the analysis of the previous chapter, we now turn to a study of optimization problems defined by inequality constraints. The constraint set will now be assumed to have the form

6 Inequality Constraints and the Theorem of Kuhn and Tucker


1 With the (important) exception of the nonnegativity of the vector λ*, the conclusions of the Theorem of Kuhn and Tucker can be derived from the Theorem of Lagrange. Indeed, this constitutes the starting point of our proof of the Theorem of Kuhn and Tucker in Section 6.5.

Remark In the sequel, we will say that a pair (x*, λ*) meets the first-order necessary conditions for a maximum (or that it meets the Kuhn-Tucker first-order conditions for a maximum) in a given inequality-constrained maximization problem, if (x*, λ*) satisfies h(x*) ≥ 0 as well as conditions [KT-1] and [KT-2] of Theorem 6.1. Similarly,

Proof Follows immediately from Theorem 6.1. □

Corollary 6.2 Suppose f and D are defined as in Theorem 6.1, and x* is a local minimum of f on D. Let E be the set of effective constraints at x*, let hE = (hi)i∈E, and suppose that ρ(DhE(x*)) = |E|. Then, there exists λ* ∈ Rˡ such that

[KT-1′] λi* ≥ 0 and λi* hi(x*) = 0 for all i.

[KT-2′] Df(x*) − Σ_{i=1}^l λi* Dhi(x*) = 0.

Although we have stated the Theorem of Kuhn and Tucker for local maxima, the theorem is easily extended to cover local minima. For, if x* were a local minimum of f on D, x* would be a local maximum of −f on D. Since D(−f) = −Df, we have the following:

Proof See Section 6.5.¹ □

where U is an open set in Rⁿ. Let E ⊂ {1, ..., l} denote the set of effective constraints at x*, and let hE = (hi)i∈E. Suppose ρ(DhE(x*)) = |E|. Then, there exists a vector λ* = (λ1*, ..., λl*) ∈ Rˡ such that the following conditions are met:

[KT-1] λi* ≥ 0 and λi* hi(x*) = 0 for i = 1, ..., l.

[KT-2] Df(x*) + Σ_{i=1}^l λi* Dhi(x*) = 0.

D = U ∩ {x ∈ Rⁿ | hi(x) ≥ 0, i = 1, ..., l},

Theorem 6.1 (Theorem of Kuhn and Tucker) Let f: Rⁿ → R and hi: Rⁿ → R be C¹ functions, i = 1, ..., l. Suppose x* is a local maximum of f on

x*, that is, we have hi(x*) = 0. We will also use the expression |E| to denote the cardinality of a finite set E, i.e., the number of elements in the set E.


6.1.2 The Constraint Qualification

As with the analogous condition in the Theorem of Lagrange, the condition in the Theorem of Kuhn and Tucker that the rank of DhE(x*) be equal to |E| is called

This example notwithstanding, the Theorem of Kuhn and Tucker turns out to be quite useful in practice in identifying optima of inequality-constrained problems. Its use in this direction is explored in Sections 6.2 and 6.3 below.

In the subsections that follow, we elaborate on two aspects of the Theorem of Kuhn and Tucker. Subsection 6.1.2 discusses the importance of the rank condition that ρ(DhE(x*)) = |E|. Subsection 6.1.3 then sketches an interpretation of the vector λ*, whose existence the theorem asserts.

Thus, if the conditions of the Theorem of Kuhn and Tucker were also sufficient, x* would be a local maximum of f on D. However, f is strictly increasing at x* = 0, so quite evidently x* cannot be such a local maximum. □
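A direct numerical check of this example is easy; the following sketch (not from the text) confirms that the Kuhn-Tucker conditions hold at x* = 0 even though f increases into the feasible set.

```python
# Sketch (not from the text): Example 6.3. The KT conditions hold at x* = 0 with
# lambda* = 0, yet x* is not a local maximum of f(x) = x**3 on {x >= 0}.
f = lambda x: x ** 3
df = lambda x: 3 * x ** 2
h = lambda x: x             # the single inequality constraint h(x) >= 0
dh = lambda x: 1.0

x_star, lam_star = 0.0, 0.0
print(df(x_star) + lam_star * dh(x_star))    # 0.0        -> [KT-2] holds
print(lam_star >= 0, lam_star * h(x_star))   # True 0.0   -> [KT-1] holds
print(f(0.1) > f(x_star))                    # True: f rises into the feasible set
```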

f′(x*) + λ* g′(x*) = 0.

Example 6.3 Let f: R → R and g: R → R be defined by f(x) = x³ and g(x) = x, respectively. Consider the problem of maximizing f on the set D = {x ∈ R | g(x) ≥ 0}.

Let x* = λ* = 0. Then, x* is evidently in the feasible set, and the problem's single constraint is effective at x*. Moreover, g′(x) = 1 for all x, so the constraint qualification holds at all x, and, in particular, at x*. Finally, note that

Condition [KT-1] in Theorem 6.1 (which is the same as condition [KT-1′] in Corollary 6.2) is called the condition of "complementary slackness." The terminology arises simply from the observation that by the feasibility of x*, we must have hi(x*) ≥ 0 for each i; therefore, for λi* hi(x*) = 0 to hold alongside λi* ≥ 0, we must have λi* = 0 if hi(x*) > 0, and hi(x*) = 0 if λi* > 0. That is, if one of the two inequalities is slack, the other must hold with equality.
It must be stressed that the Theorem of Kuhn and Tucker provides conditions that are only necessary for local optima, and at that, only for local optima that meet the constraint qualification. These conditions are not claimed to be sufficient to identify a point as being a local optimum, and indeed, it is easy to construct examples to show that the conditions cannot be sufficient. Here is a particularly simple one:

we will say that (x*, λ*) meets the first-order necessary conditions for a minimum (or that it meets the Kuhn-Tucker first-order conditions for a minimum) if it satisfies h(x*) ≥ 0 as well as conditions [KT-1′] and [KT-2′] of Corollary 6.2. □


2 The reason we have λi* ≥ 0 in this case, and not the strict inequality λi* > 0, is also intuitive: another constraint, say the j-th, may also have been binding at x*, and it may not be possible to raise the objective function without simultaneously relaxing constraints i and j.

6.1.3 The Kuhn-Tucker Multipliers

The vector λ* in the Theorem of Kuhn and Tucker is called the vector of Kuhn-Tucker multipliers corresponding to the local maximum x*. As with the Lagrangean multipliers, the Kuhn-Tucker multipliers may also be thought of as measuring the sensitivity of the objective function at x* to relaxations of the various constraints. Indeed, this interpretation is particularly intuitive in the context of inequality constraints. To wit, if hi(x*) > 0, then the i-th constraint is already slack, so relaxing it further will not help raise the value of the objective function in the maximization exercise, and λi* must be zero. On the other hand, if hi(x*) = 0, then relaxing the i-th constraint may help increase the value of the maximization exercise, so we have λi* ≥ 0.²
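This sensitivity interpretation is easy to see numerically in a one-constraint example. The sketch below is an illustration under assumed functional forms (f(x) = -(x-2)² and h(x) = c - x), not part of the text.

```python
# Sketch (not from the text): dF/dc = lambda*(c) in an assumed one-constraint example.
# Maximize f(x) = -(x - 2)**2 subject to h(x) = c - x >= 0 with c < 2, so the
# constraint binds, x*(c) = c, and the KT condition gives lambda*(c) = 2*(2 - c).
def value(c):
    x_star = c                      # binding constraint
    return -(x_star - 2.0) ** 2

c, eps = 1.0, 1e-6
lam_star = 2.0 * (2.0 - c)                           # from Df(x*) + lam*Dh(x*) = 0
dF_dc = (value(c + eps) - value(c - eps)) / (2 * eps)
print(lam_star, dF_dc)                               # both approximately 2.0
```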

For a more formal demonstration of this interpretation of λ*, we impose some simplifying assumptions in a manner similar to those used in Chapter 5. First, we will assume throughout this discussion that the constraint functions hi are all given

A solution to this problem can be obtained by inspection: the function f reaches a maximum at the point where (x² + y²) reaches a minimum. Since the constraint requires (x − 1)³ ≥ y², and y² ≥ 0 for all y, the smallest absolute value of x on D is x = 1, which occurs when y = 0; and, of course, the smallest absolute value of y on D is y = 0. It follows that f is maximized at (x*, y*) = (1, 0). Note that the problem's single constraint is effective at this point.

At this global maximum of f on D, we have Dh(x*, y*) = (3(x* − 1)², −2y*) = (0, 0), so ρ(Dh(x*, y*)) = 0 < 1, and the constraint qualification fails. On the other hand, we have Df(x*, y*) = (−2x*, −2y*) = (−2, 0), so there cannot exist λ ≥ 0 such that Df(x*, y*) + λ Dh(x*, y*) = (0, 0). Therefore, the conclusions of the Theorem of Kuhn and Tucker also fail. □
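A short numerical check of the two gradients at (1, 0), not from the text:

```python
# Sketch (not from the text): Example 6.4. At the optimum (1, 0) the constraint
# gradient vanishes, so no lambda >= 0 can satisfy Df + lambda*Dh = 0.
import numpy as np

def Df(x, y):
    return np.array([-2 * x, -2 * y])

def Dh(x, y):
    return np.array([3 * (x - 1) ** 2, -2 * y])

x_star, y_star = 1.0, 0.0
print(Dh(x_star, y_star))    # [0. 0.]  -> rank 0 < 1: constraint qualification fails
print(Df(x_star, y_star))    # [-2. 0.] -> cannot be cancelled by lambda * Dh
```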

D = {(x, y) ∈ R² | h(x, y) ≥ 0}.

Example 6.4 Let f: R² → R and h: R² → R be given by f(x, y) = −(x² + y²) and h(x, y) = (x − 1)³ − y², respectively. Consider the problem of maximizing f on the set

the constraint qualification. This condition plays a central role in the proof of the theorem (see Section 6.5 below). Moreover, if the constraint qualification fails, the theorem itself could fail. Here is an example to illustrate this point:


∂F(c)/∂ci = λi*(c).

F(c₋i, ĉi) = f(x*(c)) = F(c).

It follows immediately that ∂F(c)/∂ci = 0. On the other hand, it is also the case that hi(x*(c)) + ci > 0 implies, by the Kuhn-Tucker complementary slackness conditions, that λi*(c) = 0. Therefore, we have shown that if constraint i is slack at c, we must have

Since x* = x*(c) is a local maximum of f on the larger constraint set D(c), and since x*(c) ∈ D(c₋i, ĉi) by choice of ĉi, it follows that x*(c) is also a maximum of f on the constraint set D(c₋i, ĉi). Therefore,

Consider the constraint set D(c₋i, ĉi) which results when the parameter ci in constraint i is replaced by ĉi. Since ĉi < ci, we must have

The demonstration will be complete if we show that ∂F(c)/∂ci = λi*, i = 1, ..., l. Pick any c ∈ C. Suppose i is such that hi(x*(c)) + ci > 0. Pick any ĉi such that ĉi < ci and

F(c) = f(x*(c)).

Finally, suppose that λ*(·) and x*(·) are C¹ functions of the parameters c on the set C. Define F: C → R by

Df(x*(c)) + Σ_{i=1}^l λi*(c) Dhi(x*(c)) = 0.

Suppose further that the constraint qualification holds at each c ∈ C, so there exists λ*(c) ≥ 0 such that

D(c) = U ∩ {x ∈ Rⁿ | hi(x) + ci ≥ 0, i = 1, ..., l}.

Under this assumption, we can define a relaxation of the i-th constraint as a "small" increase in the value of the parameter ci.

Let C ⊂ Rˡ be some open set of feasible values for the parameters (c1, ..., cl). Suppose that for each c ∈ C, there is a global maximum x*(c) of f on the constraint set

in parametric form as


M = {(x, λ) | (x, λ) is a critical point of L and x ∈ U}.

∂L/∂λi(x, λ) ≥ 0,   λi ≥ 0,   λi ∂L/∂λi(x, λ) = 0,   i = 1, ..., l.

Any solution to this system of equations will be called a "critical point" of L. It is important to note that the equations that define the critical points of L differ from the corresponding ones in equality-constrained problems, in particular with respect to the λ-derivatives. Let M denote the set of all critical points of L for which x ∈ U:

∂L/∂xj(x, λ) = 0,   j = 1, ..., n,

The second step in the procedure is to find all solutions (x, λ) to the following set of equations:

As the first step in the procedure for solving this problem, we form a function L: Rⁿ × Rˡ → R, which we shall continue calling the Lagrangean, defined by:

L(x, λ) = f(x) + Σ_{i=1}^l λi hi(x).

Maximize f(x) subject to x ∈ D = U ∩ {x | h(x) ≥ 0}.

6.2 Using the Theorem of Kuhn and Tucker

6.2.1 A "Cookbook" ProcedureThe cookbook procedure for using the Theorem of Kuhn and Tucker to solve aninequality-constrained optimization problem involves essentially the same steps asthose in using the Theorem of Lagrange in solving equality-constrained problems:namely, we form a "Lagrangean" L, then we compute its critical points, and finally,we evaluate the objective at each critical point and select the point at which theobjective is optimized.

There are, however, some important differences in the details. One of these arises from the fact that the conclusions of the Kuhn-Tucker Theorem differ for local maxima and local minima. This necessitates a difference in the steps to be followed in solving maximization problems from those to be used in solving minimization problems. These differences are minor; nonetheless, for expositional ease, we postpone discussion of the minimization problem to the end of this subsection, and focus on maximization problems of the form

It remains to be shown that this relationship also holds if i is an effective constraint at c, i.e., if constraint i holds with equality. This may be achieved by adapting the argument used in subsection 5.2.3 of Chapter 5. The details are left to the reader.


6.2.2 Why the Procedure Usually Works

As earlier with the Theorem of Lagrange, it is not very hard to see why this method is usually successful. We discuss the reasons in the context of inequality-constrained maximization problems here. With the appropriate modifications, the same arguments also hold for minimization problems.

The key, once again, lies in a property of the Lagrangean L:

The set of critical points of L contains the set of all local maxima of f on D at which the constraint qualification is met. That is, if x is a local maximum of f on D, and the constraint qualification is satisfied at x, then there must exist λ such that (x, λ) is a critical point of L.

∂L/∂λi(x, λ) ≤ 0,   λi ≥ 0,   λi ∂L/∂λi(x, λ) = 0,   i = 1, ..., l.

Lastly, we evaluate f at each point x in the set {x | there is λ such that (x, λ) ∈ M}. The value of x that minimizes f over this set is typically also a global minimum of the original problem.

∂L/∂xj(x, λ) = 0,   j = 1, ..., n,

and follow the same remaining steps as listed for the maximization problem. That is, in the second step, we find the set M of all the points (x, λ) which satisfy x ∈ U as well as

Two routes are open to us. First, since x minimizes f over D if, and only if, x maximizes −f over D, we could simply rewrite the problem as a maximization problem with the objective function given by −f, and use the procedure listed above.

The alternative route is to modify the definition of the Lagrangean to

L(x, λ) = f(x) − Σ_{i=1}^l λi hi(x),

Minimize f(x) subject to x ∈ D = U ∩ {x | hi(x) ≥ 0}.

In practice, the value of x that maximizes f over this set is typically also the solution to the original maximization problem.

The reason this procedure works well in practice, and the conditions that may cause its failure, are explored in the following subsections. But first, some notes on inequality-constrained minimization problems. Suppose the original problem is to

solve

As the third and last step, we compute the value of f at each x in the set

{x | there is λ such that (x, λ) ∈ M}.


6.2.3 When It Could Fail

Unfortunately, the failure of either condition of Proposition 6.5 could also lead to failure of this procedure in identifying global optima. This subsection provides a number of examples to illustrate this point.

It follows that, under the conditions of Proposition 6.5, the procedure we have outlined above will succeed in identifying the maximum x*. Since neither the existence of solutions nor the constraint qualification is usually a problem in applications, Proposition 6.5 also provides an indirect explanation of why this procedure is quite successful in practice.

Proposition 6.5 Suppose the following conditions hold:
1. A global maximum x* exists to the given inequality-constrained problem.
2. The constraint qualification is met at x*.
Then, there exists λ* such that (x*, λ*) is a critical point of L.

Equivalently, (x, λ) is a critical point of L if and only if it satisfies the Kuhn-Tucker first-order conditions for a maximum, i.e., it satisfies h(x) ≥ 0 as well as conditions [KT-1] and [KT-2] of Theorem 6.1.

Now, suppose x* is a local maximum of f on D, and the constraint qualification holds at x*. Since x* is feasible, it must satisfy h(x*) ≥ 0. By the Theorem of Kuhn and Tucker, there must also exist λ* such that (x*, λ*) satisfies conditions [KT-1] and [KT-2] of Theorem 6.1. This says precisely that (x*, λ*) must be a critical point of L, establishing the claimed property.

A special case of this property is:

hi(x) ≥ 0,   λi ≥ 0,   λi hi(x) = 0,   i = 1, ..., l,
Df(x) + Σ_{i=1}^l λi Dhi(x) = 0.

∂L/∂λi(x, λ) = hi(x)

∂L/∂xj(x, λ) = ∂f/∂xj(x) + Σ_{i=1}^l λi ∂hi/∂xj(x),

a pair (x, λ) can be a critical point of L if and only if it satisfies the following conditions:

As earlier, this is an immediate consequence of the definition of L and its critical points. Since we have


Example 6.7 Let f and g be functions on R defined by f(x) = 2x³ − 3x² and g(x) = (3 − x)³, respectively. Consider the problem of maximizing f over the set D = {x | g(x) ≥ 0}.

Since (3 − x)³ ≥ 0 if and only if 3 − x ≥ 0, the constraint set is just the interval (−∞, 3]. A simple calculation shows that f is nonpositive for x ≤ 3/2, and is strictly positive and strictly increasing for x > 3/2. Therefore, the unique global maximum of f on D occurs at the point x* = 3. At this point, however, we have g′(x*) = −3(3 − x*)² = 0, so the constraint qualification fails. We will show that, as a consequence, the procedure we have outlined will fail to identify x*.

Example 6.6 As in Example 6.4, let f and h be C² functions on R² defined by f(x, y) = −(x² + y²) and h(x, y) = (x − 1)³ − y², respectively. We have seen in

Example 6.4 that the unique global maximum of f on D is achieved at (x, y) = (1, 0), but that the constraint qualification fails at this point; and that, as a consequence, there is no λ ∈ R₊ such that (1, 0, λ) meets the conclusions of the Theorem of Kuhn and Tucker. Evidently, then, there is no value of λ for which the point (1, 0, λ) arises as a critical point of L(x, y, λ) = f(x, y) + λ h(x, y), and the cookbook procedure fails to identify this unique global optimum.

Indeed, there are no solutions to the equations that define the critical points of L in this problem. These equations are given by:

−2x + 3λ(x − 1)² = 0
−2y − 2λy = 0
(x − 1)³ − y² ≥ 0,   λ ≥ 0,   λ((x − 1)³ − y²) = 0.

If y ≠ 0, then the second equation implies λ = −1, which violates the third condition. So we must have y = 0. If λ is also zero, then from the first equation, we have x = 0, but x = y = 0 violates (x − 1)³ − y² ≥ 0. On the other hand, if λ > 0, then, since y = 0, the complementary slackness condition implies (x − 1)³ = 0, or x = 1, but this violates the first equation. □

First, even if an optimum does exist, the constraint qualification may fail at the optimum, and as a consequence, the optimum may not turn up as part of a solution to the equations defining the critical points of L. It is very important to understand that this does not imply that L will possess no critical points. In each of Examples 6.6 and 6.7, a unique global maximum exists to the stated problem, and in each case the constraint qualification fails at the optimum. In Example 6.6, this results in a situation where L fails to have any critical points. This is not, however, the case in Example 6.7, where L possesses multiple critical points, although the problem's unique maximum is not amongst these.


D = {(x, y) ∈ R² | g(x, y) ≥ 0}.

Example 6.9 Let f and g be functions on R² defined by f(x, y) = x + y and g(x, y) = xy − 1, respectively. Consider the problem of maximizing f over the feasible set

x ≥ 0,   λ ≥ 0,   λx = 0.

This system of equations admits two solutions: (x, λ) = (0, 1) and (x, λ) = (1/2, 0). However, neither point is a solution to the given maximization problem: for instance, at x = 2 (which is a feasible point), we have f(x) = 2, while f(0) = 0 and f(1/2) = −1/4. Indeed, the given problem has no solution at all, since the feasible set is all of R₊, and f(x) ↑ ∞ as x ↑ ∞. □

2x − 1 + λ = 0,

to

Example 6.8 Let f and g be C¹ functions on R defined by f(x) = x² − x and g(x) = x, respectively. Consider the problem of maximizing f over the set D = {x | g(x) ≥ 0}. Note that since g′(x) = 1 everywhere, the constraint qualification holds everywhere on D.

Define L(x, λ) = f(x) + λ g(x). The critical points of L are the solutions (x, λ)

Alternatively, even if the constraint qualification holds everywhere on the feasible set, the procedure may still fail to identify global optima, because global optima may simply not exist. In this case, the equations that define the critical points of L may have no solutions; alternatively, there may exist solutions to these equations which are not global, or maybe even local, optima. Consider the following examples:

For the complementary slackness condition to hold, we must have either λ = 0 or x = 3. A simple calculation shows that if λ = 0, there are precisely two solutions to these equations, namely (x, λ) = (0, 0) and (x, λ) = (1, 0). As we have seen, neither x = 0 nor x = 1 is a global maximum of the problem. On the other hand, if x = 3, the first equation cannot be satisfied, since 6x² − 6x − 3λ(3 − x)² = 6x² − 6x = 36 ≠ 0. So the unique global maximum x* is not part of a solution to the critical points of L. □
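The claim in Example 6.7 can be verified directly; the following sketch (not from the text) evaluates the first-order condition at the two critical points and compares objective values.

```python
# Sketch (not from the text): Example 6.7. The Lagrangean has critical points at
# x = 0 and x = 1 (with lambda = 0), but the true maximum x* = 3 is not among them.
f = lambda x: 2 * x ** 3 - 3 * x ** 2
for x, lam in [(0.0, 0.0), (1.0, 0.0)]:
    foc = 6 * x ** 2 - 6 * x - 3 * lam * (3 - x) ** 2     # dL/dx
    print(x, foc, f(x))                                   # FOC = 0, but low values
print(3.0, f(3.0))                                        # f(3) = 27 is larger
```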

λ ≥ 0,   (3 − x)³ ≥ 0,   λ(3 − x)³ = 0.

6x² − 6x − 3λ(3 − x)² = 0,

Form the Lagrangean L(x, λ) = f(x) + λ g(x). The critical points of L are the solutions to


D = {(x, y) | g(x, y) ≥ 0}.

6.2.4 A Numerical Example

Let g(x, y) = 1 − x² − y². Consider the problem of maximizing f(x, y) = x² − y over the set

These examples suggest that caution should be employed in using the cookbook technique in solving inequality-constrained optimization problems. If one can verify existence of global optima and the constraint qualification a priori, then the method works well in identifying the optima. On the other hand, in situations where answers to these questions are not available a priori, problems arise.

First, the Lagrangean L may fail to have any critical points for two very different reasons. On the one hand, this may be because the problem itself does not have a solution (Example 6.8). On the other hand, this situation can arise even when a solution does exist, since the constraint qualification may be violated at the optimum (witness Example 6.6). Thus, the absence of critical points of L does not enable us to draw any conclusions about the existence or non-existence of solutions to the given problem.

Second, even if the Lagrangean L has one or more critical points, this set of critical points need not contain the solution. Once again, this may be because no solution exists to the problem (as in Example 6.9), or because a solution does exist, but one at which the constraint qualification is violated (cf. Example 6.7). Thus, even the presence of critical points of L does not enable us to draw conclusions about the existence of solutions to the given problem.

A simple calculation shows that the vector (x, y, λ) = (−1, −1, 1) satisfies all these conditions (and is, in fact, the unique solution to these equations). As we have seen, this point does not define a solution to the given problem. □
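Again, a direct check (not from the text):

```python
# Sketch (not from the text): Example 6.9. The unique critical point of the
# Lagrangean is (x, y, lambda) = (-1, -1, 1), yet f = x + y is unbounded on {xy >= 1}.
x, y, lam = -1.0, -1.0, 1.0
print(1 + lam * y, 1 + lam * x, lam * (x * y - 1))   # 0.0 0.0 0.0: critical point
for t in [1, 10, 100]:
    print(t + t)                                      # f(t, t) = 2t grows without bound
```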

λ ≥ 0,   xy ≥ 1,   λ(xy − 1) = 0.

1 + λx = 0

1 + λy = 0

This problem evidently has no solution: for any real number x ≥ 1, the vector (x, x) is in D, and f(x, x) = 2x, which is increasing and unbounded in x. We will show that, nonetheless, a unique solution exists to the equations defining the critical points of the Lagrangean in this problem.

The Lagrangean L has the form L(x, y, λ) = x + y + λ(xy − 1). The critical points of L are the solutions (x, y, λ) to


(x, y, λ) = (0, −1, 1/2).
At this critical point, we have f(0, −1) = 1 < 5/4, which means this point cannot be a solution to the original maximization problem. Since there are no other critical points, and we know that any global maximum of f on D must arise as part of a critical

Note that at either of these critical points, we have f(x, y) = 3/4 + 1/2 = 5/4.
This leaves the case x = 0. If we also have λ = 0, then the second equation

cannot be satisfied, so we must have λ > 0. This implies from the third equation that y² = 1, so y = ±1. Since y = 1 is inconsistent with a positive value for λ in the second equation, the only possible critical point in this case is

For the first equation to hold, we must have x = 0 or λ = 1. If λ = 1, then from the second equation, we must have y = −1/2, while from the third equation, we must have x² + y² = 1. This gives us two critical points of L, which differ only in the value of x:

λ ≥ 0,   1 − x² − y² ≥ 0,   λ(1 − x² − y²) = 0.

−1 − 2λy = 0

2x − 2λx = 0

We will first argue that both conditions of Proposition 6.5 are satisfied, so the cookbook procedure must succeed in identifying the optimum. The feasible set D in this problem is just the closed unit disk in R², which is evidently compact. Since the objective function is continuous on D, a maximum exists by the Weierstrass Theorem, so the first of the two conditions of Proposition 6.5 is satisfied. To see that the second condition is also met, note that at any point where the problem's single constraint is effective (that is, at any (x, y) where we have x² + y² = 1), we must have either x ≠ 0 or y ≠ 0. Since Dg(x, y) = (−2x, −2y) at all (x, y), it follows that at all points where g is effective, we must have ρ(Dg(x, y)) = 1. Thus, the constraint qualification holds if the optimum occurs at a point (x, y) where g(x, y) = 0. If the optimum occurs at a point (x, y) where g(x, y) > 0, then the set of effective constraints is empty. Since the constraint qualification pertains only to effective constraints, it holds vacuously in this case also. Thus, the second of the two conditions of Proposition 6.5 is also met.

Now set up the Lagrangean L(x, y, λ) = x² − y + λ(1 − x² − y²). The critical points of L are the solutions (x, y, λ) to


3 "All possible locations of the optimal point" does not mean the entire feasible set, since it may be possible to use the structure of the specific problem at hand to rule out certain portions of the feasible set. See the examples that follow in this section.

As a practical matter, these examples make an important point: whenever it is possible, it is better to solve an inequality-constrained optimization problem by reducing it to an equality-constrained one, since calculating the critical points of the Lagrangean in the latter class of problems is a significantly easier task.

Some comments on the examples themselves are also important. The two examples presented in this section share some common features. Most notably, in each case the formulations considered are such that the problem cannot be reduced to an equality-constrained problem. However, they are also designed to illustrate different aspects of solving inequality-constrained optimization problems. For instance, the set of critical points of the Lagrangean in the first example is very sensitive to the relationship between the parameters of the problem, whereas this dependence is

1. It explains the process of checking for the constraint qualification condition in the presence of multiple inequality constraints. Unlike the case with equality-constrained problems, where this process is relatively straightforward, there is the additional complication here that the constraint qualification depends on the precise subset of constraints that are effective at the unknown optimal point. Thus, the only way, in general, to check that the constraint qualification will hold at the optimal point is to take all possible locations of the optimal point, and demonstrate that the condition will hold at each of these locations.³

2. It details a method by which the critical points of L may be identified from the equations that define these points. This process is again more complicated than the corresponding situation in equality-constrained optimization problems, since some of the conditions (namely, the complementary slackness conditions) take the form of inequalities rather than equations.

6.3 Illustrations from Economics

In this section, we present two examples drawn from economics, which illustrate the use of the procedure outlined in the previous section in finding solutions to inequality-constrained optimization problems. The first example, presented in subsection 6.3.1, considers a utility maximization problem. The second example, presented in subsection 6.3.2, describes a cost minimization problem.
The presentation in this section serves a twofold purpose:

point of L, it follows that there are exactly two solutions to the given optimization problem, namely the points (x, y) = (√(3/4), −1/2) and (x, y) = (−√(3/4), −1/2). □
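For readers who want an independent numerical check of this example, the following sketch (not part of the text) hands the same problem to scipy.optimize; it should recover one of the two maximizers.

```python
# Sketch (not from the text): maximize x**2 - y on the closed unit disk numerically.
import numpy as np
from scipy.optimize import minimize

obj = lambda v: -(v[0] ** 2 - v[1])                       # maximize by minimizing -f
cons = [{'type': 'ineq', 'fun': lambda v: 1 - v[0] ** 2 - v[1] ** 2}]
res = minimize(obj, x0=np.array([0.5, 0.0]), constraints=cons)
print(res.x, -res.fun)        # approximately [0.866, -0.5] and 1.25
```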


As we have seen in subsection 5.5.3, the nonnegativity constraints h1 and h2 cannot be ignored, so this problem cannot be reduced to an equality-constrained one. We will show that it can be solved through the procedure outlined in Section 6.2.

We begin by showing that both conditions of Proposition 6.5 are met. The budget set B(p, I) is evidently compact, since prices and income are all strictly positive, and the utility function is continuous on this set. An appeal to the Weierstrass Theorem yields the existence of a maximum in this problem, so one of the two requirements of Proposition 6.5 is satisfied.

To check that the other requirement is also met, we first identify all possible combinations of constraints that can, in principle, be effective at the optimum. Since there are three inequality constraints, there are a total of eight different combinations to be checked: namely, ∅, h1, h2, h3, {h1, h2}, {h1, h3}, {h2, h3}, and {h1, h2, h3}. Of these, the last can be ruled out, since h1 = h2 = 0 implies h3 > 0. Moreover, since the utility function is strictly increasing in both arguments, it is obvious that all available income must be used up at the optimal point, so we must have h3 = 0. Therefore, there are only three possible values for the set hE of effective constraints at the optimum, namely hE = {h1, h3}, hE = {h2, h3}, and hE = h3. We will show that the constraint qualification holds in each of these cases.
If the optimum occurs at a point where only the first and third constraints are

effective and hE = {h1, h3}, we have

DhE(x1, x2) = [  1     0  ]
              [ -p1  -p2 ]

h1(x1, x2) = x1 ≥ 0
h2(x1, x2) = x2 ≥ 0
h3(x1, x2) = I − p1 x1 − p2 x2 ≥ 0.

where I, p1, and p2 are all strictly positive terms. There are three inequality constraints that define this problem:

6.3.1 An Illustration from Consumer Theory

In this subsection, we consider the problem of maximizing the utility function u(x1, x2) = x1 + x2 on the budget set

much less pronounced in the second example. On the other hand, every critical point in the first example also identifies a solution of the problem; in contrast, the second example always has at least one critical point that is not a solution.


Case 1: C = {h3}

Since only constraint 3 is assumed to hold with equality in this case, we must have x1 > 0 and x2 > 0. By the complementary slackness conditions, this implies we must also have λ1 = λ2 = 0. Substituting this into the first two equations that define

To solve for all the critical points of this system, we adopt the following procedure. We fix a subset C of {h1, h2, h3}, and examine if there are any critical points at which only the constraints in the set C hold with equality. Then, we vary C over all possible subsets of {h1, h2, h3}, and thereby obtain all the critical points.

In general, this procedure would be somewhat lengthy, since there are 8 possible values for C: namely, ∅, {h1}, {h2}, {h3}, {h1, h2}, {h1, h3}, {h2, h3}, and {h1, h2, h3}. However, as we have already mentioned, the optimum in this problem must occur at a point where h3 = 0, so it suffices to find the set of critical points of L at which h3 = 0. Moreover, as also mentioned, the case C = {h1, h2, h3} has no solutions because h1 = h2 = 0 implies h3 > 0. Thus, there are just three cases to be considered: {h3}, {h2, h3}, and {h1, h3}. We examine each of these cases in turn.

1. 1 + λ1 − λ3 p1 = 0.
2. 1 + λ2 − λ3 p2 = 0.
3. λ1 ≥ 0,   x1 ≥ 0,   λ1 x1 = 0.
4. λ2 ≥ 0,   x2 ≥ 0,   λ2 x2 = 0.
5. λ3 ≥ 0,   I − p1 x1 − p2 x2 ≥ 0,   λ3(I − p1 x1 − p2 x2) = 0.

L(x, λ) = x1 + x2 + λ1 x1 + λ2 x2 + λ3(I − p1 x1 − p2 x2).

The critical points of the Lagrangean are the solutions (x1, x2, λ1, λ2, λ3) to the following system of equations:

at any (x1, x2). Since p1 and p2 are strictly positive by hypothesis, this matrix has full rank, so the constraint qualification will hold at such a point. A similar computation shows that if the optimum occurs at a point where only the second and third constraints are effective, the constraint qualification will be met. Finally, if the third constraint is the only effective one at the optimum and hE = h3, we have DhE(x1, x2) = (−p1, −p2), and once again the positivity of the prices implies that there are no problems here.

In summary, the optimum in this problem must occur at a point where h3 = 0, but no matter where on this set the optimum lies (i.e., no matter which other constraints are effective at the optimum), the constraint qualification must hold. Therefore, the second condition of Proposition 6.5 is also satisfied, and the critical points of the Lagrangean must contain the global maxima of the problem.

The Lagrangean for this problem is:


• If p1 > p2, there is exactly one critical point of L (namely, that arising in Case 3), whose associated x-values are (x1, x2) = (0, I/p2).

• If p1 < p2, L has only a single critical point (namely, that arising in Case 2), and this has the associated x-values (x1, x2) = (I/p1, 0).

Summing up, we have the following:

(x1, x2, λ1, λ2, λ3) = (0, I/p2, p1/p2 − 1, 0, 1/p2).

The value of the objective function at this critical point is given by u(x1, x2) = I/p2.

Case 3: C = {h1, h3}
This is similar to Case 2. It is possible as a critical point of L only if p1 ≥ p2. The unique critical point of L in this case is given by:

(x1, x2, λ1, λ2, λ3) = (I/p1, 0, 0, p2/p1 − 1, 1/p1).

The value of the objective function at this critical point is given by u(x1, x2) = I/p1.

Therefore, such a critical point can exist only if p1 ≤ p2. Assuming this inequality to be true, it is seen that the unique critical point of L at which h2 = h3 = 0 is

Case 2: C = {h2, h3}
Since h2 = h3 = 0 translates to x2 = 0 and I − p1 x1 − p2 x2 = 0, we must have x1 = I/p1 > 0 in this case. Therefore, by the complementary slackness condition, we must have λ1 = 0. Of course, we must also have λ2 ≥ 0. Substituting these into the first two equations that define the critical points of L, we obtain

Note that at all these critical points, we have u(x1, x2) = I/p.

These conditions make sense only if p1 = p2. Thus, we retain this case as a possible solution only if the parameters happen to satisfy p1 = p2. Otherwise, it is discarded.

In the event that p1 and p2 are equal to a common value p > 0, we must evidently have λ3 = 1/p. It is now easily seen that, when p1 = p2 = p, there are infinitely many critical points of L that satisfy x1 > 0 and x2 > 0. Indeed, any point (x1, x2, λ1, λ2, λ3) is a critical point provided

λ1 = λ2 = 0,   λ3 = 1/p,   x1 ∈ (0, I/p),   and   x2 = (I − p x1)/p.

the critical points of L, we obtain


hl(XI,X2) = XI > 0

hZ(XI, X2) = X2 > 0

h3(XI.X2) = xl +xi - y 2: O.

As usual, we shall assume that all the parameters of the problem-namely. WI. lJ)2.

and y-are strictly positive.Once again, we begin our analysis with a demonstration that both conditions of

Proposition 6,5 are satisfied. The existence of solutions may be demonstrated in manyways, for instance, by compactifying the feasible action set in the manner describedin Example 3.8, The details are omitted.

To check that the constraint qualification will hold at the opt imurn, we first identifyall possible combinations of the constraints that can, in principle, hold with equalityat the optimum, Since there are three constraints, there are eight cases to be checked:0, h v. hz, h3, (hl,h2), (h),h3), (h2,h3), and (hl,h2,h3), Of these, the last canbe ruled out since hI = il2 = ()implies x? + xi = 0, whereas the constraint setrequiresxj +xi 2: y. It is also apparent that. since WI and W2 are strictly positive, wemust have h3 = 0 at the optimum (i.e., total production XI + xi must exactly equaly), or costs could be reduced by reducing output. This means there are only three

The constraint set of this problem is defined through three inequality constraints, namely

6.3.2 An Illustration from Producer Theory

The problem we consider in this section is the one of minimizing w1x1 + w2x2 over the feasible set

We have already shown that for any p1 > 0 and p2 > 0, the critical points of L must contain the solution of the problem. Since L has a unique critical point when p1 > p2, it follows that this critical point also identifies the problem's global maximum for this parameter configuration; that is, the unique solution to the problem when p1 > p2 is (x1, x2) = (0, 1/p2). Similarly, the unique solution to the problem when p1 < p2 is (x1, x2) = (1/p1, 0). Finally, when p1 = p2 = p, there are infinitely many critical points of L, but at all of these we have u(x1, x2) = 1/p. Therefore, each of these values of (x1, x2) defines a global maximum of the problem in this case.

• If p1 = p2 = p, L has infinitely many critical points (viz., all of the critical points arising in each of the three cases). The set of x-values that arises in this case is {(x1, x2) | x1 + x2 = 1/p, x1 ≥ 0, x2 ≥ 0}.
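The following is a minimal numerical cross-check of this consumer problem, not part of the text. It assumes scipy is available and uses illustrative price pairs; since the problem is linear, a linear-programming routine suffices.

```python
# Sketch (not from the text): maximize x1 + x2 subject to p1*x1 + p2*x2 <= 1, x1, x2 >= 0.
from scipy.optimize import linprog

for p1, p2 in [(2.0, 1.0), (1.0, 2.0), (1.5, 1.5)]:
    # linprog minimizes, so negate the objective to maximize x1 + x2.
    res = linprog(c=[-1.0, -1.0], A_ub=[[p1, p2]], b_ub=[1.0],
                  bounds=[(0, None), (0, None)])
    print(f"p = ({p1}, {p2}): x* = {res.x}, u(x*) = {-res.fun:.4f}")

# Expected pattern: all income goes to the cheaper good, so u(x*) = 1/min(p1, p2);
# when p1 = p2 = p, any budget-exhausting bundle attains u = 1/p.
```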


As in subsection 6.3.1, we fix a subset C of {h1, h2, h3} and find the set of all possible solutions to these equations when only the constraints in C hold with equality. As C ranges over all possible subsets of {h1, h2, h3}, we obtain the set of all critical points of L.

1. −w1 + λ1 + 2λ3x1 = 0.
2. −w2 + λ2 + 2λ3x2 = 0.
3. λ1 ≥ 0, x1 ≥ 0, λ1x1 = 0.
4. λ2 ≥ 0, x2 ≥ 0, λ2x2 = 0.
5. λ3 ≥ 0, x1² + x2² − y ≥ 0, λ3(x1² + x2² − y) = 0.
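A short symbolic sketch of this case enumeration, not part of the text, is given below. It assumes sympy and uses illustrative values for w1, w2, and y; it reproduces the procedure of solving the stationarity equations together with each admissible combination of binding constraints.

```python
# Sketch (not from the text): enumerate the complementary-slackness cases with sympy.
import sympy as sp

x1, x2, l1, l2, l3 = sp.symbols('x1 x2 l1 l2 l3', real=True)
w1, w2, y = 1, 2, 4                                   # illustrative parameter choices
stationarity = [-w1 + l1 + 2*l3*x1, -w2 + l2 + 2*l3*x2]

# The three cases with h3 binding: C = {h3}, C = {h2, h3}, C = {h1, h3}.
cases = {
    "{h3}":     [l1, l2, x1**2 + x2**2 - y],          # x1, x2 > 0  =>  l1 = l2 = 0
    "{h2, h3}": [x2, l1, x1**2 + x2**2 - y],          # x2 = 0, x1 > 0  =>  l1 = 0
    "{h1, h3}": [x1, l2, x1**2 + x2**2 - y],          # x1 = 0, x2 > 0  =>  l2 = 0
}
for name, extra in cases.items():
    for s in sp.solve(stationarity + extra, [x1, x2, l1, l2, l3], dict=True):
        if s[x1] >= 0 and s[x2] >= 0 and s[l3] >= 0:  # keep feasible, sign-correct roots
            print(name, s, "objective =", -(w1*s[x1] + w2*s[x2]))
```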

Note that we have implicitly set the problem up as a maximization problem, with objective −w1x1 − w2x2. The critical points of L are the solutions (x1, x2, λ1, λ2, λ3) to the following system of equations:

Since we are assuming that only h3 holds with equality, we must have x1, x2 > 0, so ρ(Dh_E(x1, x2)) = |E| = 1 as required.

Summing up, the optimum must occur at a point where h3 = 0, but no matter where on this set the optimum occurs, the constraint qualification will be met. It follows that the set of critical points of the Lagrangean must contain the solution(s) of the problem. The Lagrangean L in this problem has the form

L(x1, x2, λ1, λ2, λ3) = −w1x1 − w2x2 + λ1x1 + λ2x2 + λ3(x1² + x2² − y).

Since this matrix evidently has the required rank, it follows that if the optimum occurs at a point where h1 and h3 are the only effective constraints, the constraint qualification will be met. An identical argument, with the obvious changes, shows that the constraint qualification is not a problem if the optimum happens to occur at a point where h2 and h3 are effective. This leaves the case h_E = {h3}. In this case, we have

possible descriptions of the set h_E of effective constraints at the optimum: h_E = {h3}, h_E = {h1, h3}, and h_E = {h2, h3}. We will show that in each case, the constraint qualification must hold. First, consider the case h_E = {h1, h3}. Since h1 and h3 are effective, we have

x1 = 0 and x1² + x2² = y, so x2 = √y. Therefore,

Dh_E(x1, x2) = [ 1  0 ; 2x1  2x2 ] = [ 1  0 ; 0  2√y ].


The value of the objective function (−w1x1 − w2x2) at this critical point is −w1·y^(1/2).

2λ3x1 − w1 = 2λ3√y − w1 = 0,

or λ3 = w1/(2√y). Finally, substituting x2 = 0 into the second equation defining the critical points of L, we get λ2 = w2. Thus, the unique critical point of L which satisfies h2 = h3 = 0 is

(x1, x2, λ1, λ2, λ3) = (√y, 0, 0, w2, w1/(2√y)).

Case 2: C = {h2, h3}

In this case, we have x1 > 0 (and, in fact, x1 = √y, since h2 = h3 = 0 implies x2 = 0 and x1² + x2² − y = 0). Therefore, we must also have λ1 = 0. Substituting x1 = √y and λ1 = 0 into the first of the equations that define the critical points of L, we obtain

Case 1: C = {h3}

Since we must have x1 > 0 and x2 > 0 in this case, the complementary slackness conditions imply λ1 = λ2 = 0. Substituting these values into the first two equations that define the critical points of L, we obtain

2λ3x1 = w1

2λ3x2 = w2.

Dividing the first equation by the second, this implies

x1 = (w1/w2) x2.

By hypothesis, we also have h3 = 0, or x1² + x2² = y. Therefore, we have

x1 = (w1²y/(w1² + w2²))^(1/2),  x2 = (w2²y/(w1² + w2²))^(1/2),  λ3 = (w1² + w2²)^(1/2)/(2y^(1/2)).

Combined with λ1 = λ2 = 0, this represents the unique critical point of L in

which h1 > 0, h2 > 0, and h3 = 0. Note that the value of the objective function −w1x1 − w2x2 at this critical point is

−w1x1 − w2x2 = −(w1² + w2²)^(1/2) y^(1/2).

Once again, this process is simplified by the fact that we do not have to consider all possible subsets C. First, as we have mentioned, h3 must be effective at an optimum, so it suffices to find all critical points of L at which h3 = 0. Second, the case C = {h1, h2, h3} is ruled out, since h1 = h2 = 0 violates the third constraint that h3 ≥ 0. This results in three possible values for C, namely, C = {h3}, C = {h2, h3}, and C = {h1, h3}. We consider each of these in turn.


D = U ∩ {x ∈ Rn | g(x) = 0, h(x) ≥ 0}.

6.4 The General Case: Mixed Constraints

A constrained optimization problem with mixed constraints is a problem where the constraint set D has the form

• When w1 < w2, the larger of the two values is −w1·y^(1/2), which arises at (x1, x2) = (y^(1/2), 0). Therefore, the problem has a unique solution when w1 < w2, namely (y^(1/2), 0).

• When w1 > w2, the larger of the two values is −w2·y^(1/2), which arises at (x1, x2) = (0, y^(1/2)). Therefore, the problem has a unique solution when w1 > w2, namely (0, y^(1/2)).

• When w1 and w2 have a common value w > 0, these two values of the objective function coincide. Therefore, if w1 = w2, the problem has two solutions, namely (y^(1/2), 0) and (0, y^(1/2)).
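A minimal numerical cross-check of this comparison, not part of the text, is sketched below. It assumes scipy and uses illustrative parameter values; a local solver only returns a Kuhn-Tucker point, which is exactly why the text insists on comparing objective values across all critical points.

```python
# Sketch (not from the text): minimize w1*x1 + w2*x2 subject to x1**2 + x2**2 >= y, x >= 0.
import numpy as np
from scipy.optimize import minimize

def solve_cost_min(w1, w2, y):
    cost = lambda x: w1 * x[0] + w2 * x[1]
    cons = [{'type': 'ineq', 'fun': lambda x: x[0]**2 + x[1]**2 - y}]   # h3 >= 0
    res = minimize(cost, x0=[1.0, 1.0], bounds=[(0, None), (0, None)], constraints=cons)
    return res.x, res.fun

for w1, w2 in [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5)]:
    x, c = solve_cost_min(w1, w2, y=4.0)
    print(f"w = ({w1}, {w2}): x* = {np.round(x, 3)}, cost = {c:.3f}")

# Expected: the firm uses only the cheaper input, x_i = sqrt(y), cost = min(w1, w2)*sqrt(y).
# When w1 = w2 the two corners tie, and a local solver may even stop at the interior
# critical point on the circle -- mirroring the need to compare values at all critical points.
```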

We may, as a consequence, ignore the first value of the objective function, and the point that it represents. Comparing the value of the objective function at the remaining two points, it can be seen that

Summing up, L has three critical points, and the values taken by the objective function at these three points are −(w1² + w2²)^(1/2)y^(1/2), −w1y^(1/2), and −w2y^(1/2). Now, we have already established that the set of critical points of L in this problem contains the solution(s) to the problem; and, therefore, that the point(s) that maximize the value of the objective function among the set of critical points must be the solution(s) to the problem. All that remains now is to compare the value of the objective function at the three critical points.

Since w1 > 0 and w2 > 0, it is always the case that (w1² + w2²)^(1/2) > w1 and, therefore, that

and the value of the objective function (−w1x1 − w2x2) at this critical point is −w2y^(1/2).

Case 3: C = {h1, h3}. This is the same as Case 2, with the obvious changes. The unique critical point of L that falls in this case is


There is, of course, no loss of generality in this assumption, since this may be achieved simply by renumbering constraints.

hi(x*) = 0,  i = 1, ..., k

hi(x*) > 0,  i = k + 1, ..., l.

With the important exception of the nonnegativity of the vector λ, the Theorem of Kuhn and Tucker can be derived as a consequence of the Theorem of Lagrange. To simplify notation in the proof, we will denote |E| by k; we will also assume that the effective constraints at x* are the first k constraints:

1. λi ≥ 0, and λihi(x*) = 0, i = 1, ..., l.
2. Df(x*) + Σ_{i=1}^{l} λi Dhi(x*) = 0.

where h = (h1, ..., hl) is a C¹ function from Rn to Rl, and U ⊂ Rn is open. Let E be the set of effective constraints at x*, and suppose that ρ(Dh_E(x*)) = |E|, where h_E = (hi)_{i∈E}. We are to show that there is λ ∈ Rl such that

D = U ∩ {x ∈ Rn | h(x) ≥ 0}.

6.5 A Proof of the Theorem of Kuhn and Tucker

Let x" be a local maximum of f on the set

1. Aj 2: OandAjC{Jj(X*) = Of or jE {k+ I, ... ,k+/}.

2. Df(x*) +L~::x, Dcpi(X*) = O.

where U C lRnis open. Let E C {I, ... ,k+/} denote the set of effective constraints ([I

x", and let 'PE = (!pj)jEE. Suppose p«DC{J E(x*» = 1EI. Then. there exists ). E fRl t k

such that

D = UnIx E lRn I C{Ji (x) :::0, i = I, ... , k, C{Jj(x) 2: 0, j = k + I, ... , k + I},

Theorem 6.10 Let I: lRn ~ Rand C{Jj: lRn -+ R i = I, ... , I + k he C I[unctions.Suppose x" maximizes Ion

The following theorem is a simple consequence of combining the Theorem of La­grange with the Theorem of Kuhn and Tucker:

ifiE{l, ... ,k}

ifiE{k+I ....• k+l}.{gj,

C{Jj =hj-k,

where U C jRn is open. g: jRn -+- ~k •and II: lRn -+ lR/. For notational case. definetpj: an ~ ]Rk+/, where


It remains to be shown that λ ≥ 0. Since λi = 0 for i = k + 1, ..., l, we are required only to show that λi ≥ 0 for i = 1, ..., k. We will establish that λ1 ≥ 0. A similar argument will establish that λj ≥ 0 for j = 2, ..., k.

λihi(x*) = 0,  i = 1, ..., l,

which establishes one of the desired properties. Now, for any i ∈ {1, ..., k}, we have hi(x*) = 0, so certainly it is the case that

λihi(x*) = 0 for i ∈ {1, ..., k}. For i ∈ {k + 1, ..., l}, we have λi = 0, so it is also the case that λihi(x*) = 0 for i ∈ {k + 1, ..., l}. Summing up, we have as required

We will show that the vector λ satisfies the properties stated in the theorem. First, observe that for i = k + 1, ..., l, we have λi = 0, and, therefore, λiDhi(x*) = 0. Therefore,

Df(x*) + Σ_{i=1}^{l} λi Dhi(x*) = Df(x*) + Σ_{i=1}^{k} λi Dhi(x*) = Df(x*) + Σ_{i=1}^{k} μi Dhi(x*) = 0,

Now define λ ∈ Rl by λi = μi for i = 1, ..., k, and λi = 0 for i = k + 1, ..., l.

Df(x*) + Σ_{i=1}^{k} μi Dhi(x*) = 0.

By construction, we have x* ∈ D*. Since x* is a local maximum of f on D, it is certainly a local maximum of f on D*. Moreover, we have ρ(Dh_E(x*)) = k, by hypothesis. Therefore, by the Theorem of Lagrange, there exists a vector μ = (μ1, ..., μk) ∈ Rk such that

D* = U ∩ V ∩ {x ∈ Rn | hi(x) = 0, i = 1, ..., k}.

Let V = ∩_{i=k+1}^{l} Vi. By the continuity of hi, Vi is open for each i, and so, therefore, is V. Now, let D* ⊂ D be the equality-constrained set defined through k equality constraints and given by

For each i ∈ {k + 1, ..., l}, define

Vi = {x ∈ Rn | hi(x) > 0}.


To this end, we first show that there is η* > 0 such that for all η ∈ [0, η*), we must have ξ(η) ∈ D, where D is the constraint set of the original problem.

Df(x*)Dξ(0) ≤ 0.

= −λ1.

To complete the proof, we will show that

Df(x*)Dξ(0) = −(Σ_{i=1}^{l} λi Dhi(x*)) Dξ(0)

= −Σ_{i=1}^{l} λi Dhi(x*) Dξ(0)

Since λi = 0 for i = k + 1, ..., l, we now have

Dh1(x*)Dξ(0) = 1

Dhi(x*)Dξ(0) = 0,  i = 2, ..., k,

or, equivalently, that

Dh_E(x*)Dξ(0) = (1, 0, ..., 0).

DHx(x*, 0)Dξ(0) + DHη(x*, 0) = 0.

Since DHx(x, η) = Dh_E(x) and DHη(x, η) = (−1, 0, ..., 0) at any (x, η), this implies in turn

Differentiating this expression using the chain rule, and evaluating at ξ(0) = x*, we obtain

H(ξ(η), η) = 0,  η ∈ N.

Let DHx(x, η) denote the k × n derivative matrix of H with respect to the x variables alone, and DHη(x, η) denote the k × 1 derivative vector of H with respect to η. For future reference, we note that since h_E = (h1, ..., hk), we have DHx(x, η) = Dh_E(x); we also have DHη(x, η) = (−1, 0, ..., 0) at any (x, η).

By the definition of H, we have H(x*, 0) = 0. Moreover, ρ(DHx(x*, 0)) = ρ(Dh_E(x*)) = k. Therefore, by the Implicit Function Theorem (see Chapter 1, Theorem 1.77), there is a neighborhood N of zero, and a C¹ function ξ: N → Rn, such that ξ(0) = x*, and

H1(x, η) = h1(x) − η

Hi(x, η) = hi(x),  i = 2, ..., k.

To this end, define for x ∈ Rn and η ∈ R the function H = (H1, ..., Hk) from R^(n+1) to Rk by


6.6 Exercises

1. Solve the following maximization problem:

Maximize ln x + ln y

subject to x² + y² = 1, x, y ≥ 0.

2. A firm produces two outputs y and z using a single input x. The set of attainable output levels H(x), from an input use of x, is given by

H(x) = {(y, z) ∈ R²₊ | y² + z² ≤ x}.

The firm has available to it a maximum of one unit of the input x. Letting py

or Df(x*)Dξ(0) ≤ 0. The proof is complete. □

Df(ξ(0))Dξ(0) ≤ 0,

(f(ξ(0)) − f(ξ(η)))/η ≥ 0,

for all η > 0 and sufficiently small. Taking limits as η → 0, we obtain

Therefore,

f(x*) ≥ f(ξ(η)).

Finally, by shrinking the value of η* if need be, we can evidently ensure that ξ(η) ∈ U for all η ∈ [0, η*). Thus, there is η* > 0 such that ξ(η) ∈ D for η ∈ [0, η*), as claimed.

Now, since ξ(0) = x* is a local maximum of f on D, and ξ(η) is in the feasible set for η ∈ [0, η*), it follows that for η > 0 and sufficiently close to zero, we must have

hi(ξ(η)) > 0,  i = k + 1, ..., l.

hi(ξ(η)) = 0,  i = 2, ..., k.

For i = k + 1, ..., l, we have hi(ξ(0)) = hi(x*) > 0. Since both hi(·) and ξ(·) are continuous, it follows that by choosing η sufficiently small (say, η ∈ (0, η*)), we can also ensure that

and

If η > 0, then Hi(ξ(η), η) = 0 for i = 1, ..., k, and from the definition of the functions Hi, this means

h1(ξ(η)) = η


Let pi denote the unit price of yi, and wi that of xi. Describe the firm's optimization problem and derive the equations that define the critical points of the Lagrangean L. Calculate the solution when p1 = p2 = 1 and w1 = w2 = 2.

7. A consumer with a utility function given by u(x1, x2) = √x1 + x1x2 has an income of 100. The unit prices of x1 and x2 are 4 and 5, respectively.

(a) Compute the utility-maximizing commodity bundle, if consumption must be nonnegative.

(b) Now suppose the consumer were offered the following option. By paying a lump sum of a to the government, the consumer can obtain coupons which will enable him to purchase commodity 1 at a price of 3. (The price of

xt ≥ 0,  t = 1, ..., T.

5. A firm produces an output y using two inputs x1 and x2 as y = √(x1x2). Union agreements obligate the firm to use at least one unit of x1 in its production process. The input prices of x1 and x2 are given by w1 and w2, respectively. Assume that the firm wishes to minimize the cost of producing y units of output.

(a) Set up the firm's cost-minimization problem. Is the feasible set closed? Is it compact?

(b) Does the cookbook Lagrangean procedure identify a solution of this problem? Why or why not?

6. A firm manufactures two outputs y1 and y2 using two inputs x1 and x2. The production function f: R²₊ → R²₊ is given by

(y1, y2) = f(x1, x2) = (x1^(1/2), x1^(1/2) x2^(1/2)).

subject to

Maximize  Σ_{t=1}^{T} (1/2)^t √xt

3. A consumer has income I > 0, and faces a price vector p ∈ R³₊₊ for the three commodities she consumes. All commodities must be consumed in nonnegative amounts. Moreover, she must consume at least two units of commodity 2, and cannot consume more than one unit of commodity 1. Assuming I = 4 and p = (1, 1, 1), calculate the optimal consumption bundle if the utility function is given by u(x1, x2, x3) = x1x2x3. What if I = 6 and p = (1, 2, 3)?

4. Let T ≥ 1 be some finite integer. Solve the following maximization problem:

and pz denote the prices of the two outputs, determine the firm's optimal output mix.


commodity 2 remains unchanged.) For what values of a will the consumer accept this offer? For what values of a is he indifferent between accepting the offer and not accepting it?

8. An agent allocates the H hours of time available to her between labor (l) and leisure (H − l). Her only source of income is from the wages she obtains by working. She earns w per hour of labor; thus, if she works l ∈ [0, H] hours, her total income is wl. She spends her income on food (f) and entertainment (e), which cost p and q per unit respectively. Her utility function is given by u(f, e, l), and is increasing in f and e, and is decreasing in l.

(a) Describe the consumer's utility maximization problem.
(b) Describe the equations that define the critical points of the Lagrangean L.
(c) Assuming H = 16, w = 3, and p = q = 1, find the utility-maximizing

consumption bundle if u(f, e, l) = f^(1/3)e^(1/3) − l².

9. An agent who consumes three commodities has a utility function given by

u(x1, x2, x3) = x1^(1/3) + min{x2, x3}.

Given an income of I, and prices of p1, p2, p3, describe the consumer's utility maximization problem. Can the Weierstrass and/or Kuhn-Tucker theorems be used to obtain and characterize a solution? Why or why not?

10. A firm has contracted with its union to hire at least L* units of labor at a wage rate of w1 per unit. Any amount of additional labor may be hired at a rate of w2 per unit, where w1 > w2. Assume that labor is the only input used by the firm in its production process, and that the production function is given by f: R₊ → R₊, where f is C¹ and concave. Given that the output sells at a price of p, describe the firm's maximization problem. Derive the equations that define the critical points of the Lagrangean L in this problem. Do the critical points of L identify a solution of the problem?

11. A firm produces the output y using two inputs x1 and x2 in nonnegative quantities through the production relationship

y = g(x1, x2) = x1^(1/4) x2^(3/4).

The firm obtains a price of py > 0 per unit of y that it sells. It has available an inventory of K1 units of the input x1 and K2 units of the input x2. More units of x1 and x2 may be purchased from the market at the unit prices of p1 > 0 and p2 > 0, respectively. Alternatively, the firm can also sell any unused amount of its inputs to the market at these prices.

(a) Describe the firm's profit-maximization problem, and derive the equations that define the critical points of the Lagrangean L.


The unit price of y is py > 0, while that of the input xi is wi > 0, i = 1, 2, 3.

(a) Describe the firm's profit-maximization problem, and derive the equations that define the critical points of the Lagrangean L in this problem.

(b) Show that the Lagrangean L has multiple critical points for any choice of (py, w1, w2, w3) ∈ R⁴₊₊.

(c) Show that none of these critical points identifies a solution of the profit-maximization problem. Can you explain why this is the case?

(b) Assuming py = p1 = p2 = 1, K1 = 4, and K2 = 0, solve for the firm's optimal level of output y.

(c) Assume again that py = p1 = p2 = 1, but suppose now that the values of K1 and K2 are interchanged, i.e., we have K1 = 0 and K2 = 4. Is the firm's new optimal output level different from the old level? Why or why not?

12. A firm produces a single output y using three inputs x1, x2, and x3 in nonnegative quantities through the relationship

y = g(x1, x2, x3) = x1(x2 + x3).


The notion of convexity occupies a central position in the study of optimizationtheory. It encompasses not only the idea of convex sets, but also of concave andconvex functions (see Section 7.1 for definitions). The attractiveness of convexity foroptimization theory arises from the fact that when an optimization problem meetssuitable convexity conditions, the same first-order conditions that we have shown inprevious chapters to be necessary for local optima, also become sulJicient for globaloptima. Indeed, even more is true. When the convexity conditions are tightened towhat are called strict convexity conditions, we get the additional bonus of uniquenessof the solution.

The importance of such results, especially from a computational standpoint, isobvious. Of course, such a marked strengthening of our earlier analysis does notcome free. As we show in Section 7.2, the assumption of convexity is a strong one.A function that is concave or convex must necessarily be continuous everywhere onthe interior of its domain. It must also possess strong differentiability properties; forinstance, all directional derivatives of such a function must exist at all points in thedomain. Finally, an assumption of convexity imposes strong curvature restrictionson the underlying function, in the form of properties that must be met by its first­and second-derivatives.

These results indicate that an assumption of convexity is not an innocuous one,but, viewed from the narrow standpoint of this book, the restrictive picture they paintis perhaps somewhat exaggerated. For nne thing. we continue to assume ill study­ing constrained optimization problems that all the functions involved arc (at least)continuously differentiable. Continuous differentiability is a much greater degree ofsmoothness than can be obtained from just an assumption of convexity: thus, thecontinuity and differentiability properties of concave and convex functions, whichappear very strong when viewed in isolation, certainly do not imply any increasedrestrictiveness on this problem's structure. Although the same cannot be said of thecurvature implications of convexity, even these are not very significant in economic

Convex Structures in Optimization Theory

7


¹As always, some additional regularity conditions may have to be met.

7.1 Convexity Defined

Recall from Chapter 1 that a set D ⊂ Rn is called convex if the convex combination of any two points in D is itself in D, that is, if for all x and y in D and all λ ∈ (0, 1), it is the case that λx + (1 − λ)y ∈ D. Building on this definition, we introduce in subsection 7.1.1 two classes of functions called concave functions and convex functions.

Concave and convex functions play an important role in the study of maximization and minimization problems, respectively. Their significance arises primarily from the fact that in problems with a convex constraint set and a concave objective function, the first-order conditions are both necessary and sufficient to identify global maxima; while in problems with a convex constraint set and a convex objective function, the first-order conditions are necessary and sufficient to identify global minima.¹

Motivated by this, we will say in the sequel that a maximization problem is a convex

maximization problem if the constraint set is convex and the objective function isconcave. A minimization problem will similarly be said to be a convex minimizationproblem if the constraint set is convex, and the objective function is convex. Moregenerally. we will say that an optimization problem is a cOllvex optimit aticn, prohl em,or has a convex environment, if it is either a convex maximization problem. or a convexminimization problem. Thus, our use of the word "convexity" encompasses the entirecollective of convex sets, and concave and convex functions.

Following our definition of concave and convex functions in subsection 7.1.1.we define the more restricted notions of strictly concave and strictly convex func­tions in subsection 7.1.2. In addition to possessing all the desirable properties ofconcave and convex functions, respectively. strictly concave and strictly convexfunctions also possess the remarkable feature that they guarantee uniqueness ofsolutions. That is, a convex maximization problem with a strictly concave objec­tive function can have at most one solution, as can a convex rninimizatiou prohlcmwith a strictly convex objective function. In an obvious extension of the tcrmi­nology introduced earlier, we will say that a maximization problem is a strictlyconvex maximization problem if it is a convex maximization problem, and thcobjective function is strictly concave. Strictly convex minimization problems arcsimilarly defined.

applications, since they are often justifiable by an appeal to such considerations as diminishing marginal utility, or diminishing marginal product.


f[λx + (1 − λ)y] ≤ λf(x) + (1 − λ)f(y).

Similarly, f: D → R is convex if and only if for all x, y ∈ D and λ ∈ (0, 1), it is the case that

f[λx + (1 − λ)y] ≥ λf(x) + (1 − λ)f(y).

Theorem 7.1 A function f: D → R is concave on D if and only if for all x, y ∈ D and λ ∈ (0, 1), it is the case that

Intuitively, the subgraph of a function is the area lying below the graph of the function, while the epigraph of a function is the area lying above the graph of the function. Figure 7.1 illustrates these concepts.

A function f is said to be concave on D if sub f is a convex set, and to be convex on D if epi f is a convex set.

epi f = {(x, y) ∈ D × R | f(x) ≤ y}.

sub f = {(x, y) ∈ D × R | f(x) ≥ y}

7.1.1 Concave and Convex Functions

Let f: D ⊂ Rn → R. Through the rest of this chapter, it will be assumed that D is a convex set.

The subgraph of f and the epigraph of f, denoted sub f and epi f, are defined by:

Fig. 7.1. The Epigraph and Subgraph


so f is not convex either.

f[λx + (1 − λ)y] = f(−1) = −1 > −4 = λf(x) + (1 − λ)f(y),

so f is not concave. On the other hand, for λ = 3/4, we have

f[λx + (1 − λ)y] = f(1) = 1 < 4 = λf(x) + (1 − λ)f(y),

The notions of concavity and convexity are neither exhaustive nor mutually exclusive; that is, there are functions that are neither concave nor convex, and functions that are both concave and convex. As an example of a function of the former sort, consider f: R → R defined by f(x) = x³ for x ∈ R. Let x = −2 and y = 2. For λ = 1/4, we have

This completes the proof for concave functions. The result for convex functions is proved along analogous lines. □

(λw1 + (1 − λ)w2, λz1 + (1 − λ)z2) ∈ sub f.

Therefore, we have f[λw1 + (1 − λ)w2] ≥ λz1 + (1 − λ)z2, or

We will show that sub f is a convex set, i.e., that if (w1, z1) and (w2, z2) are arbitrary points in sub f, and λ ∈ (0, 1), then (λw1 + (1 − λ)w2, λz1 + (1 − λ)z2) is also in sub f.

By definition of sub f, we must have f(w1) ≥ z1 and f(w2) ≥ z2. By hypothesis, if λ ∈ (0, 1), we must also have

f[λx + (1 − λ)y] ≥ λf(x) + (1 − λ)f(y).

as required. Now, suppose it is the case that for all x, y ∈ D, and all λ ∈ (0, 1), it is the case that

f[λx + (1 − λ)y] ≥ λf(x) + (1 − λ)f(y),

By definition of sub f, a point (w, z) is in sub f only if f(w) ≥ z. Therefore,

(λx + (1 − λ)y, λf(x) + (1 − λ)f(y)) ∈ sub f.

Proof First, suppose f is concave, i.e., sub f is a convex set. Let x and y be arbitrary points in D. Then, (x, f(x)) ∈ sub f and (y, f(y)) ∈ sub f. Since sub f is a convex set, it is the case that for any λ ∈ (0, 1)


²Indeed, it can be shown that if a function on Rn is both concave and convex on Rn, then it must have the form f(x) = a · x + b for some a and b.
³The term "linear" is usually reserved for affine functions which also satisfy f(0) = 0.

Theorem 7.2 A function f: D → R is concave on D if and only if the function −f is convex on D. It is strictly concave on D if and only if −f is strictly convex on D.

is strictly concave if 0 < α < 1, and is strictly convex if α > 1. At the "knife-edge" point α = 1, the function is both concave and convex, but is neither strictly concave nor strictly convex.

Our last result of this section is an immediate consequence of the definitions of concavity and convexity. Its most important implication is that it enables us, in the sequel, to concentrate solely on concave functions, whenever this is convenient for expositional reasons. The analogous statements for convex functions in such cases are easily derived, and are left to the reader.

f(x) = x^α

It is trivial to give examples of functions that are concave, but not strictly concave, or convex, but not strictly convex. Any affine function, for instance, is both concave and convex, but is neither strictly concave nor strictly convex. On the other hand, the function f: R₊₊ → R defined by

f[λx + (1 − λ)y] < λf(x) + (1 − λ)f(y).

Similarly, a convex function f: D → R is said to be strictly convex if for all x, y ∈ D with x ≠ y, and for all λ ∈ (0, 1), we have

f[λx + (1 − λ)y] > λf(x) + (1 − λ)f(y).

7.1.2 Strictly Concave and Strictly Convex Functions

A concave function f: D → R is said to be strictly concave if for all x, y ∈ D with x ≠ y, and all λ ∈ (0, 1), we have

f[λx + (1 − λ)y] = a · [λx + (1 − λ)y] + b
= λ(a · x + b) + (1 − λ)(a · y + b)
= λf(x) + (1 − λ)f(y),

and it follows that f has the required properties.² Such functions, which are both concave and convex, are termed affine.³

For an example of a function that is both concave and convex, pick any a ∈ Rn and b ∈ R. Consider the function f: Rn → R defined by f(x) = a · x + b. For any x and y in Rn, and any λ ∈ (0, 1), we have


Proof We will prove that if D is open and f is concave on D, then f must be continuous on D. Since int D is always open for any set D, and since the concavity of f on D also implies its concavity on int D, this result will also prove that even if D is not open, f must be continuous on the interior of D.

So suppose D is open and x ∈ D. Let xk → x, xk ∈ D for all k. Since D is open, there is r > 0 such that B(x, r) ⊂ D. Pick a such that 0 < a < r. Let A ⊂ B(x, r)

Theorem 7.3 Let f: D → R be a concave function. Then, if D is open, f is continuous on D. If D is not open, f is continuous on the interior of D.

7.2.1 Convexity and Continuity

Our main result in this subsection is that a concave function must be continuous everywhere on its domain, except perhaps at boundary points.

Several other useful properties of concave and convex functions arc listed in theExercises.

With the exception of Theorem 7.9, which is used in the proof of Theorem 7.1 S,none of the results of this section play any role in the sequel, so readers who are in ahurry (and who are willing to accept on faith that convexity has strong implications)may wish to skip this section altogether. The statement of one result, however, may heworth spending some time on, on account of its practical significance: Theorem 7. I()in subsection 7.2.3 describes a test for identifying when a C2 function is concave orconvex, and this test is almost always easier to use in applications than the originaldefinition.

• Every concave or convex function must also be continuous on the interior of its domain (subsection 7.2.1).

• Every concave or convex function must possess minimal differentiability properties (subsection 7.2.2). Among other things, every directional derivative must be well-defined at all points in the domain of a concave or convex function.

• The concavity or convexity of an everywhere differentiable function f can be completely characterized in terms of the behavior of its derivative Df, and the concavity or convexity of a C² function f can be completely characterized in terms of the behavior of its second derivative D²f (subsection 7.2.3).

7.2 Implications of Convexity

This section is divided into three parts which examine, respectively, the continuity, differentiability, and curvature properties that obtain from an assumption of convexity. Our main results here are that:


The conclusions of Theorem 7.3 cannot be strengthened to obtain continuity on all of D; that is, the continuity of f could fail at the boundary points of D. Consider the following example:

We have already established that lim inf_k f(xk) ≥ f(x). Therefore, we must have lim_{k→∞} f(xk) = f(x), and the proof is complete. □

f(x) ≥ lim sup_{k→∞} f(xk).

Since λk must go to 1 as k → ∞, by taking limits we obtain

Secondly, it is also true that for all k ≥ K, there is wk ∈ A and λk ∈ (0, 1) such that x = λkxk + (1 − λk)wk (see Figure 7.2). Once again, exploiting the concavity of f, we have

lim inf_{k→∞} f(xk) ≥ f(x).

Taking limits, we now have

be defined by A = {z | ||z − x|| = a}. Pick K large enough so that for all k ≥ K, we have ||xk − x|| < a. Since xk → x, such a K exists. Then, for all k ≥ K, there is zk ∈ A such that xk = θkx + (1 − θk)zk for some

θk ∈ (0, 1) (see Figure 7.2). Since xk → x, and ||zk − x|| = a > 0 for all k, it is the case that θk → 1. Therefore, by the concavity of f,

Fig. 7.2. Convexity and Continuity



with strict inequality if g is strictly concave. This establishes one of the required inequalities. The other inequalities may be obtained in a similar way. □

with strict inequality if g is strictly concave. Rearranging this expression, we finally obtain

with strict inequality if g is strictly concave. Therefore,

g(x2) ≥ ((x2 − x1)/(x3 − x1)) g(x3) + ((x3 − x2)/(x3 − x1)) g(x1),

g[αx3 + (1 − α)x1] ≥ αg(x3) + (1 − α)g(x1),

Proof Define α = (x2 − x1)/(x3 − x1). Then α ∈ (0, 1), and some simple calculations show that (1 − α) = (x3 − x2)/(x3 − x1), and that αx3 + (1 − α)x1 = x2. Since g is concave, we have

Remark Figure 7.3 describes the content of Theorem 7.5 in a graph. □

If g is strictly concave, these inequalities become strict.

Theorem 7.5 Let g: D → R be concave, where D ⊂ R is open and convex. Let x1, x2, and x3 be points in D satisfying x1 < x2 < x3. Then,

7.2.2 Convexity and Differentiability

As with continuity, the assumption of convexity also carries strong implications for the differentiability of the function involved. We examine some of these implications in this subsection. We begin by establishing properties of concave functions of one variable. Then we use these properties in a bootstrapping argument to derive analogous results for concave functions defined on Rn.

Then, f is concave (in fact, strictly concave) on [0, 1], but is discontinuous at the boundary points 0 and 1. □

Example 7.4 Define f: [0, 1] → R by

f(x) = √x,  if 0 < x < 1;  f(x) = −1,  if x = 0, 1.


Proof Fix any x and y in R. First, suppose y > 0. Then,

(g(x + ty) − g(x))/t = ((g(x + ty) − g(x))/ty) · y.

Theorem 7.6 If g: R → R is concave, then all (one-sided) directional derivatives of g exist at all points x in R, although some of these may have infinite values.

is nonincreasing in b for b > 0. Therefore, allowing +∞ as a limiting value, the limit of this expression exists as b → 0+. It is now easy to establish that a concave function on R must possess all its directional derivatives:

Theorem 7.5 implies, in particular, that if g: R → R is a concave function, and x is any point in R, the difference quotient

(g(x + b) − g(x))/b

Fig. 7.3. Concavity and Declining Slopes


Moreover, for any α ∈ (0, 1),

g[αt + (1 − α)t′] = f[α(x + th) + (1 − α)(x + t′h)]
≥ αf(x + th) + (1 − α)f(x + t′h)
= αg(t) + (1 − α)g(t′),

(g(t) − g(0))/t = (f(x + th) − f(x))/t.

Proof Let x ∈ D and h ∈ Rn be arbitrary. Define g(t) = f(x + th), t ≥ 0. Since D is open, g is well-defined in a neighborhood of zero. Note that

Theorem 7.7 Let D ⊂ Rn be open and convex. Then, if f: D → R is concave, f possesses all directional derivatives at all points in D. (These derivatives could be infinite.)

An n-dimensional analogue of this result is now easy to establish:

(g(x + ty) − g(x))/t = (g(x) − g(x))/t = 0,

and this completes the proof. □

Since the limit on the right-hand side is known to be well-defined, we have shown that the directional derivative of g at x exists in any direction y < 0 also.

Finally, if y = 0, the directional derivative at x in the direction y trivially exists since

Since y > 0, it is the case that ty > 0 whenever t > 0; therefore, ty → 0+ if and only if t → 0+. Letting b = ty, we have

lim_{t→0+} (g(x + ty) − g(x))/t = y · lim_{b→0+} (g(x + b) − g(x))/b.

We have already established as a consequence of Theorem 7.5 that the limit of the second expression on the right-hand side exists as b → 0+. Therefore, the directional derivative of g at x exists in any direction y > 0.

Now suppose y < 0. Then, we have

(g(x + ty) − g(x))/t = ((g(x + ty) − g(x))/(−ty)) · (−y).

Since y < 0, we have −ty > 0 whenever t > 0; therefore, −ty → 0+ if and only if t → 0+. Letting b = −ty, we have

lim_{t→0+} (g(x + ty) − g(x))/t = (−y) · lim_{b→0+} (g(x + b) − g(x))/b.


Observe that the Lebesgue measure is a natural generalization of the familiar "measures" of length in R, area in R², and volume in R³. A cylinder in R is simply an

where a = (a1, ..., an) and b = (b1, ..., bn) are given vectors satisfying a ≪ b. The Lebesgue measure of this cylinder, denoted μ(E) say, is defined as

μ(E) = (b1 − a1)(b2 − a2) ··· (bn − an).

The following discussion is aimed at readers who are unfamiliar with measure theory. It attempts to provide an intuitive idea of what is meant by a set of Lebesgue measure zero. A more formal description may be found in Royden (1963) or Billingsley (1978). Define a cylinder in Rn to be a set E of the form

E = {(x1, ..., xn) | ai < xi ≤ bi, i = 1, ..., n}.

Remark A property which holds everywhere on a subset of Rn, except possibly on a set of Lebesgue measure zero, is also said to hold "almost everywhere" on that subset (or, more accurately, "almost everywhere with respect to the Lebesgue measure"). In this terminology, Theorem 7.8 would be stated as: a concave or convex function defined on an open subset of Rn must be differentiable almost everywhere on its domain. □

Proof See Fenchel (1953, p. 87). □

Theorem 7.8 Let D be an open and convex set in Rn, and let f: D → R be concave. Then f is differentiable everywhere on D, except possibly at a set of points of Lebesgue measure zero. Moreover, the derivative Df of f is continuous at all points where it exists.

A natural question that arises from Theorem 7.7 is whether, or to what extent, convexity actually implies full differentiability of a function. The answer to this question, unfortunately, requires a knowledge of the concept of Lebesgue measure on Rn. For readers unfamiliar with measure theory, an interpretation of the result is given following the statement of the theorem.

so g is concave on R₊. By Theorem 7.6, therefore, the difference quotient

(g(t) − g(0))/t

is nondecreasing as t → 0+, so the limit as t → 0+ of (g(t) − g(0))/t exists, although it could be infinite. From the definition of g, this limit is Df(x; h). Since x ∈ D and h ∈ Rn were arbitrary, we are done. □


Df(x)(y − x) ≥ f(y) − f(x)  for all x, y ∈ D,

Theorem 7.9 Let D be an open and convex set in Rn, and let f: D → R be differentiable on D. Then, f is concave on D if and only if

7.2.3 Convexity and the Properties of the Derivative

In this section, we establish the implications of convexity for the behavior of the derivative. Our first result provides a complete characterization of the concavity or convexity of an everywhere differentiable function f using its first derivative Df:

Σ_{k=1}^{∞} μ(Ek) = Σ_{k=1}^{∞} (1/2)^k η = η < ε,

so we have covered the rationals by cylinders of total measure less than ε. Since ε was arbitrary, we are done.

and let Ek be the interval (rk−, rk+]. Note that

μ(Ek) = 2(1/2)^(k+1) η = (1/2)^k η.

We now have rk ∈ Ek for each k, so Q ⊂ ∪_{k=1}^{∞} Ek. Moreover,

For instance, the rationals Q constitute a set of measure zero in R. To see this, take any enumeration {r1, r2, ...} of the rationals (recall that the rationals are a countable set), and any ε > 0. Pick η such that 0 < η < ε. For each k, define rk− and rk+ by

rk− = rk − (1/2)^(k+1) η,   rk+ = rk + (1/2)^(k+1) η,

Σ_{i=1}^{∞} μ(Ei) < ε.

interval (a, b], and its Lebesgue measure is (b − a), which is the same as the interval's length. Similarly, a cylinder in R² is a rectangle, whose Lebesgue measure corresponds to our usual notion of the rectangle's area, while a cylinder in R³ is a cube, whose Lebesgue measure corresponds to our usual notion of volume.

In an intuitive sense, a set of Lebesgue measure zero may be thought of as a set that can be covered using cylinders whose combined Lebesgue measure can be made arbitrarily small. That is, a set X ⊂ Rn has Lebesgue measure zero if it is true that given any ε > 0, there is a collection of cylinders E1, E2, ... in Rn such that X ⊂ ∪_{i=1}^{∞} Ei and


f(x, y) = x^a y^b,   a, b > 0.

Example 7.12 Let f: R²₊₊ → R be defined by

Our next example demonstrates the significance of Theorem 7.10 from a practical standpoint.

Example 7.11 Let f: R → R and g: R → R be defined by f(x) = −x⁴ and g(x) = x⁴, respectively. Then, f is strictly concave on R, while g is strictly convex on R. However, f″(0) = g″(0) = 0, so, viewed as 1 × 1 matrices, f″(0) is not negative definite, while g″(0) is not positive definite. □

It is very important to note that parts 3 and 4 of Theorem 7.10 are only one-way implications. The theorem does not assert the necessity of these conditions, and, indeed, it is easy to see that the conditions cannot be necessary. Consider the following example:

Proof See Section 7.6. □

1. f is concave on D if and only if D²f(x) is a negative semidefinite matrix for all x ∈ D.
2. f is convex on D if and only if D²f(x) is a positive semidefinite matrix for all x ∈ D.
3. If D²f(x) is negative definite for all x ∈ D, then f is strictly concave on D.
4. If D²f(x) is positive definite for all x ∈ D, then f is strictly convex on D.

Theorem 7.10 Let f: D → R be a C² function, where D ⊂ Rn is open and convex. Then,

The concavity or convexity of a C² function can also be characterized using its second derivative, as our next result shows. The result also provides a sufficient condition for identifying strictly concave and strictly convex functions, and is of especial interest from a computational standpoint, as we explain shortly.

Proof See Section 7.5. □

Df(x)(y − x) ≤ f(y) − f(x)  for all x, y ∈ D.

while f is convex on D if and only if


7.3.1 Some General Observations

This section presents two results which indicate the importance of convexity for optimization theory. The first result (Theorem 7.13) establishes that in convex optimization problems, all local optima must also be global optima; and, therefore, that

7.3 Convexity and Optimization

This section is divided into three parts. In subsection 73.1, we point out some simple.but strong, implications of assuming a convex structure in abstract optimizationproblems. Subsection 7.3.2 then deals with sufficiency of first-order conditions forunconstrained optima. Finally, in subsection 7.3.3, we present one of the main resultsof this section, Theorem 7.16, which shows that under a mild regularity condition,the Kuhn-Tucker first-order conditions are both necessary and sufficient to identifyoptima of convex inequality-constrained optimization problems.

All results in this section are stated in the context of maximization problems. Eachhas an exact analogue in the context of minimization problems: it is left to the readerto fill in the details.

The determinant of this matrix is ab(1 − a − b)x^(2a−2)y^(2b−2), which is positive if a + b < 1, zero if a + b = 1, and negative if a + b > 1. Since the diagonal terms are negative whenever a, b < 1, it follows that f is a strictly concave function if a + b < 1, that it is a concave function if a + b = 1, and that it is neither concave nor convex when a + b > 1. □
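A small symbolic check of this second-derivative test, not part of the text, is sketched below. It assumes sympy is available; the exponents are illustrative choices with a + b < 1, where strict concavity is expected.

```python
# Sketch (not from the text): second-derivative test for f(x, y) = x**a * y**b on R++^2.
import sympy as sp

x, y = sp.symbols('x y', positive=True)
a, b = sp.Rational(1, 3), sp.Rational(1, 2)        # a + b < 1: expect strict concavity
f = x**a * y**b
H = sp.hessian(f, (x, y))

# Negative definiteness via leading principal minors: H[0,0] < 0 and det H > 0.
pt = {x: 2, y: 3}                                  # an arbitrary point in the positive orthant
print(H[0, 0].subs(pt).evalf(), H.det().subs(pt).evalf())
# The determinant should also match the closed form ab(1-a-b)x^(2a-2)y^(2b-2); this prints 0.
print(sp.simplify(H.det() - a*b*(1 - a - b)*x**(2*a - 2)*y**(2*b - 2)))
```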

Compare checking for the convexity properties of f by using these inequalities to checking for the convexity properties of f using the second derivative test provided by Theorem 7.10. The latter only requires us to identify the definiteness of the following matrix:

D²f(x, y) = [ a(a − 1)x^(a−2)y^b    ab x^(a−1)y^(b−1) ; ab x^(a−1)y^(b−1)    b(b − 1)x^a y^(b−2) ].

while it is convex on R²₊₊ if for all (x, y) and (x̂, ŷ) in R²₊₊ and all λ ∈ (0, 1), we have

[λx + (1 − λ)x̂]^a [λy + (1 − λ)ŷ]^b ≤ λx^a y^b + (1 − λ)x̂^a ŷ^b,

For given a and b, this function is concave if, for any (x, y) and (x̂, ŷ) in R²₊₊ and any λ ∈ (0, 1), we have

[λx + (1 − λ)x̂]^a [λy + (1 − λ)ŷ]^b ≥ λx^a y^b + (1 − λ)x̂^a ŷ^b,


a contradiction. □

f(z) = f[λx + (1 − λ)y] > λf(x) + (1 − λ)f(y) = f(x),

Proof Suppose arg max{f(x) | x ∈ D} is nonempty. We will show it must contain a single point.

We have already shown in Theorem 7.13 that arg max{f(x) | x ∈ D} must be convex. Suppose this set contains two distinct points x and y. Pick any λ ∈ (0, 1) and let z = λx + (1 − λ)y. Then, z must also be a maximizer of f, so we must have f(z) = f(x) = f(y). However, by the strict concavity of f,

Theorem 7.14 Suppose D ⊂ Rn is convex, and f: D → R is strictly concave. Then arg max{f(x) | x ∈ D} either is empty or contains a single point.

and this must hold with equality or x1 and x2 would not be maximizers. Thus, the set of maximizers must be convex, completing the proof. □

since f(z) > f(x). But λx + (1 − λ)z is in B(x, r) by construction, so f(x) ≥ f[λx + (1 − λ)z], a contradiction. This establishes Part 1.

To see Part 2, suppose x1 and x2 are both maximizers of f on D. Then, we have f(x1) = f(x2). Further, for λ ∈ (0, 1), we have

f[λx1 + (1 − λ)x2] ≥ λf(x1) + (1 − λ)f(x2) = f(x1),

f[λx + (1 − λ)z] ≥ λf(x) + (1 − λ)f(z) > f(x),

Proof Suppose f admits a local maximum x that is not also a global maximum. Since x is a local maximum, there is r > 0 such that f(x) ≥ f(y) for all y ∈ B(x, r) ∩ D. Since x is not a global maximum, there is z ∈ D such that f(z) > f(x). Since D is convex, (λx + (1 − λ)z) ∈ D for all λ ∈ (0, 1). Pick λ sufficiently close to unity so that (λx + (1 − λ)z) ∈ B(x, r). By the concavity of f,

Theorem 7.13 Suppose D ⊂ Rn is convex, and f: D → R is concave. Then,

1. Any local maximum of f is a global maximum of f.
2. The set arg max{f(x) | x ∈ D} of maximizers of f on D is either empty or convex.

to find a global optimum in such problems, it always suffices to locate a local optimum. The second result (Theorem 7.14) shows that if a strictly convex optimization problem admits a solution, the solution must be unique.


D = {x ∈ U | hi(x) ≥ 0, i = 1, ..., l}

Then x* maximizes f over

hi(x) > 0,  i = 1, ..., l.

Theorem 7.16 (The Theorem of Kuhn and Tucker under Convexity) Let f be a concave C¹ function mapping U into R, where U ⊂ Rn is open and convex. For i = 1, ..., l, let hi: U → R also be concave C¹ functions. Suppose there is some x ∈ U such that

7.3.3 Convexity and the Theorem of Kuhn and Tucker

The following result, perhaps the most important of this entire section, states that the first-order conditions of the Theorem of Kuhn and Tucker are both necessary and sufficient to identify optima of convex inequality-constrained optimization problems, provided a mild regularity condition is met.

If Df(x) = 0, the right-hand side of this equation is also zero, so the equation states precisely that f(x) ≥ f(y). Since y ∈ D was arbitrary, x is a global maximum of f on D. □

f(y) − f(x) ≤ Df(x)(y − x).

Proof We have shown in Chapter 4 that the condition Df(x) = 0 must hold whenever x is an unconstrained local maximum. It must evidently also hold, therefore, if x is an unconstrained global maximum.

The reverse implication (which requires the concavity of f) is actually an immediate consequence of Theorem 7.9. For, suppose x and y are any two points in D. Theorem 7.9 states that, by the concavity of f, we must have

Theorem 7.15 Let D ⊂ Rn be convex, and f: D → R be a concave and differentiable function on D. Then, x is an unconstrained maximum of f on D if and only if Df(x) = 0.

unconstrained maxima, when such maxima exist.


7.3.2 Convexity and Unconstrained Optimization

The following result shows that the first-order condition for unconstrained optima (i.e., the condition that Df(x) = 0) is both necessary and sufficient to identify global


Example 7.18 As in Example 6.4, let f: R² → R and h: R² → R be defined by f(x, y) = −x² − y² and h(x, y) = (x − 1)³ − y², respectively. Consider the problem of maximizing f on the set

consists of exactly the one point 0; thus, there is no point x ∈ D such that h(x) > 0, and Slater's condition is violated. Evidently, the maximum of f on D must occur at 0. Since f′(0) = 1 and h′(0) = 0, there is no λ such that f′(0) + λh′(0) = 0, and [KT-1] fails at this optimum. □

D = {x ∈ R | h(x) ≥ 0}

Example 7.17 Let f and h be functions on R defined by f(x) = x and h(x) = −x² for all x ∈ R. Then f and h are concave functions. However, the constraint set

The condition that there exist a point x at which hi(x) > 0 for all i is called Slater's condition. There are two points about this condition that bear stressing.

First, Slater's condition is used only in the proof that [KT-1] and [KT-2] are necessary at an optimum. It plays no role in proving sufficiency. That is, the conditions [KT-1] and [KT-2] are sufficient to identify an optimum when f and the functions hi are all concave, regardless of whether Slater's condition is satisfied or not.

Second, the necessity of conditions [KT-l] and [KT-2] at any I,pcal maximumwas established in Theorem 6.1, but under a different hypothesis, namely, that therank condition described in Theorem 6.1 held. Effectively, the necessity part ofTheorem 7.16 states that this rank condition can be replaced with the combinationof Slater's condition and concave constraint functions. However, both parts of thiscombination are important: just as the necessity of [KT-I J and [KT-2] could fail if therank condition is not met, the necessity of [KT-I] and [KT-2] could also fail if eitherSlater's condition or the concavity of the functions h, fails. Consider the followingexamples:

Proof See Section 7.7. □

[KT-2]  λ* ≥ 0,  Σ_{i=1}^{l} λi* hi(x*) = 0.

if and only if there is λ* ∈ Rl such that the Kuhn-Tucker first-order conditions hold:

[KT-1]  Df(x*) + Σ_{i=1}^{l} λi* Dhi(x*) = 0.


Of course, as we have seen in Section 6.2, a point (x, λ) is a critical point of L if and only if x ∈ D and (x, λ) meets [KT-1] and [KT-2].

First, we consider the case where Slater's condition is met. In this case, Theo­rem 7.16 has the powerful implication that the cookbook procedure for using theTheorem of Kuhn and Tucker (outlined in Section 6.2) can be employed "blindly,"that is, without regard to whether the conditions of Proposition 6.5 are met or not.Two factors give rise to, and strengthen, this implication. First, the concavity of thefunctions f and h, together with Slater's condition imply that the first-order condi­tions [KT-I] and [KT-2] are necessary at any optimum. So, if an optimum exists atall, it must satisfy these conditions, and must, therefore, tum up as pan of a criticalpoint of the Lagrangean L. Second, since [KT-I J and [KT-2J arc also sufficient for a

λi ≥ 0,  hi(x) ≥ 0,  λihi(x) = 0,  i = 1, ..., l,

Df(x) + Σ_{i=1}^{l} λi Dhi(x) = 0,

and the critical points of L are the points (x, λ) ∈ U × Rl that satisfy the following conditions:

L(x, λ) = f(x) + Σ_{i=1}^{l} λi hi(x),

Maximize f(x) subject to x ∈ D = {z ∈ U | hi(z) ≥ 0, i = 1, ..., l},

where the functions f and hi are all concave C¹ functions on the open and convex set U ⊂ Rn. Recall that the Lagrangean for this problem is the function L: U × Rl → R defined by

7.4 Using Convexity in Optimization

The results of the previous section are of obvious importance in solving optimizationproblems. The purpose of this section is to highlight by repetition their value in thisdirection. We focus on inequality-constrained problems of the sort

The point of Example 7.18 is to stress that Slater's condition cannot, by itself, replace the rank condition of Theorem 6.1. Rather, this is possible only if the functions hi are all also concave.

optimal point.

Note that while Slater's condition is met in this problem (for instance, h(x, y) = 1 > 0 at (x, y) = (2, 0)), h is not concave on D. As a consequence, the conditions [KT-1] and [KT-2] fail to be necessary; indeed, we have shown in Example 6.4 that the unique global maximum of f on D occurs at (x*, y*) = (1, 0), but that there is no λ* ∈ R such that Df(x*, y*) + λ*Dh(x*, y*) = 0. Thus, [KT-1] fails at this


and the result is proved.

When t ∈ (0, 1), the concavity of f implies f(ty + (1 − t)x) ≥ tf(y) + (1 − t)f(x), so by choosing t > 0, t → 0, we have

Df(x)(y − x) ≥ lim_{t→0+} [tf(y) + (1 − t)f(x) − f(x)]/t = f(y) − f(x),

Df(x)(y − x) = lim_{t→0+} [f(x + t(y − x)) − f(x)]/t = lim_{t→0+} [f(ty + (1 − t)x) − f(x)]/t.

7.5 A Proof of the First-Derivative Characterization of Convexity

We prove Theorem 7.9 for the case where f is concave. The result for convex f then follows simply by noting that D(−f) = −Df, and appealing to Theorem 7.2.

So suppose first that f is concave. We have

In particular. the last step outlined in the cookbook procedure of Section 6.2 (viz .•comparing the values of f at the different critical points of L). can be ignored: sinceeach of these points identifies a global maximum, the value of f at all these pointsmust, perforce, be the same.

The situation is less rosy if Slater's condition is not met. In this case. it remains truethat every critical point of L identifies a solution to the problem. since the conditions[KT-I] and [KT-2] are still sufficient for a solution. On the other hand. [KT-I] and[KT-2J are no longer necessary. so the absence of a critical point of L does not enableus to conclude that no solution exists to the problem. In Example 7.17. for instance,we have seen that Slater's condition fails; and although the Lagrangean L admitsno critical points. a solution to the problem does exist. This points to a potentiallyserious problem in cases where the existence of solutions cannot be verified a priori;however, its practical significance is much diminished by the fact that the failure ofSlater's condition happens only in rare cases.

Finally, a small point. When the constraint functions hi are all concave. the con­straint set D is a convex set. Thus, if the objective function f happens to be strictlyconcave. there can exist at most one solution to the problem by Theorem 7.14. Froman applications standpoint. this observation implies that if we succeed in unearthinga single critical point of the Lagrangean, we need not search for any others, since theproblem's unique solution is already identified.

• If L has no critical points, then no solution exists to the given problem.
• If (x*, λ*) is a critical point of L, then x* is a solution of the problem.
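The following is a minimal sketch, not from the text, of this "blind" cookbook procedure applied to a convex problem. It assumes sympy; the example problem (log utility with a linear budget constraint) is an illustrative choice, chosen so that f and h are concave and Slater's condition clearly holds.

```python
# Sketch (not from the text): maximize ln(x1) + ln(x2) subject to h(x) = 1 - x1 - x2 >= 0.
# f and h are concave and Slater's condition holds (e.g., at x = (1/4, 1/4)),
# so any critical point of the Lagrangean is a global maximum.
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', positive=True)
f = sp.log(x1) + sp.log(x2)
h = 1 - x1 - x2
L = f + lam * h

# Stationarity plus the binding constraint (lam > 0 forces h = 0 here).
sol = sp.solve([sp.diff(L, x1), sp.diff(L, x2), h], [x1, x2, lam], dict=True)
print(sol)   # [{x1: 1/2, x2: 1/2, lam: 2}] -- the unique global maximum
```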

maximum, it is the case that every critical point identifies a solution to the problem. Summing up:


Lemma 7.19 Let f: Rn → R. Given any x and h in Rn, define the function φx,h(·) by φx,h(t) = f(x + th), t ∈ R. Then,

1. f is concave on Rn if and only if the function φx,h(·) is concave in t for each fixed x, h ∈ Rn.

7.6 A Proof of the Second-Derivative Characterization of Convexity

For expositional convenience, we prove Theorem 7.10 for the case where the domain D of f is all of Rn. With minor changes, the proof is easily adapted to include the case where D is an arbitrary open and convex subset of Rn. The proof requires the following preliminary result:

which completes the proof. □

λf(x) + (1 − λ)f(y) ≤ f(z) = f[λx + (1 − λ)y],

f(y) − f(z) ≤ Df(z)(y − z) = −(λ/(1 − λ)) Df(z)w.

Multiplying the first equation by λ/(1 − λ), and adding the two equations, we obtain

(λ/(1 − λ)) f(x) + f(y) − (1/(1 − λ)) f(z) ≤ 0.

Rearranging terms after multiplying through by (1 − λ), we have

and

f(x) − f(z) ≤ Df(z)(x − z) = Df(z)w,

By hypothesis, we also have

y = z − (λ/(1 − λ)) w.

Note that we have

z = λx + (1 − λ)y,
w = x − z = (1 − λ)(x − y).

which will establish that f is concave on D. For expositional convenience, define

f[λx + (1 − λ)y] ≥ λf(x) + (1 − λ)f(y),

Pick any x and y in D, and any λ ∈ (0, 1). We will show that we must have

Now, suppose that for all x1 and x2 in D we have

Df(x2)(x1 − x2) ≥ f(x1) − f(x2).


Case 1: n = 1

Suppose first that f: R → R is a C² concave function. We will show that D²f is negative semidefinite at all x ∈ D, i.e., that f″(x) ≤ 0 at all x ∈ D.

Let x, y ∈ R and suppose x < y. Pick sequences {xk} and {yk} in R so that for each k we have x < xk < yk < y, and xk → x, yk → y. By repeating the arguments

Proof of Theorem 7.10 It is easy to see that for any C² function g: Rn → R, the matrix of cross-partials D²g(x) at a point x is negative semidefinite (resp. negative definite) if and only if D²(−g)(x) is positive semidefinite (resp. positive definite). Therefore, Parts 2 and 4 of the theorem will be proved if we prove Parts 1 and 3. We concentrate on establishing Parts 1 and 3 here. Once again, we resort to a bootstrapping argument; we establish the theorem first for the case n = 1 (i.e., f: R → R), and then use this to prove the general case.

owhich completes the proof that! is concave.

The proof of Part 2 of the lemma is left as an exercise to the reader.

f«(1 - a)zl + aZ2) = ifJx.h(a)= IPx.h«(l - 0')0 + at)~ (l - a )rpx.h(0) + a!px.h (I)= (1 - a)!(zl)+ (1 - a)!(z2),

so <Px.h(-) is indeed concave in t.Next, suppose for any x and h in lRn, IPx.hO is concave in t. Pick any Z), Zz E IRn

and any tx E (0, 1). Let z(a) = aZI + (l - a)zz. In order to prove that! is concave,we are required to show that I(z(a» ~ af(z)) + (l - a)f(zz) for any a E (0, 1).

Consider the function ifJx.h(-) where x == Z), and h = zz - z). Note that ifJx.h(0) =f(zl) and !Px.h(l) == !(Z2). Moreover, !px.h(a) = f(zi + a(z2 - zj ) = 1«1 -a)zl + aZ2). Since !Px.hO is concave by hypothesis, we have for-any a E (0, I)

ifJx.k(a/ + (1 - a)t') == !(x + ath + (l - a)t'h)== !(a(x + th) + (1 - Cl)(X+ t'h»?: Cl!(x+th)+(l-a)!(x+r'h)== ClIPx.h(t)+ (1 - a)IPx.h(t'),

Proof of Lemma 7.19 First suppose that! is concave on IRN. Fix any x and h in]Rn. For any pair of real numbers t and t', and any a E (0. 1), we have

2. If <Px.h(.) is strictly concave in t for each fixed x , h E lRn with h i= 0. then! isstrictly concave On lRn.


Case 2: n :> 1

We now turn to the general case where f: IRn _.,. IR for some n ?: I. Lemma 7.19will be repeatedly invoked in this step.

We begin with Part I. Suppose first that f is concave. Pick any x and h in IR".Weare required to show that h' D2 J(x)h :: O. Define IPx,h(t) = J(x + th). Since J isC2 by hypothesis, so is IPx.h(·), and in fact, we have IP: h(t) = Df(x + th) . h. and

[f(x) − f(z)]/(x − z) ≥ [f(z) − f(y)]/(z − y).

Cross-multiplying and rearranging terms, we finally obtain

f(z) ≥ [(y − z)/(y − x)] f(x) + [(z − x)/(y − x)] f(y).

Since z = λx + (1 − λ)y, we have λ = (y − z)/(y − x) and 1 − λ = (z − x)/(y − x). Substituting in the inequality above, the proof of the concavity of f is complete. This establishes Part 1 of the theorem for the case n = 1.

Note that if we had f″(x) < 0 at all x, then we would also have had f′(w₁) > f′(w₂), so all the inequalities in the above argument become strict. In particular, retracing the steps (but with the strict inequality) establishes that f is then strictly concave, completing the proof of Part 3 of the theorem for the case n = 1.

Since w₁ < w₂ and f″ ≤ 0, we must have f′(w₁) ≥ f′(w₂). Using this in the expression above, we obtain

[f(z) − f(y)]/(z − y) = f′(w₂).

[f(x) − f(z)]/(x − z) = f′(w₁)  and

When k → ∞, the left-most term in this expression converges to f′(x), while the right-most term converges to f′(y), so we must have f′(x) ≥ f′(y). Since x and y were arbitrary points that satisfied x < y, this says f′ is a nonincreasing function on R. If f′ is nonincreasing, its derivative f″ must satisfy f″(x) ≤ 0 at all x ∈ R, which is precisely the statement that D²f is negative semidefinite at all x.

Now suppose f: R → R satisfies f″(x) ≤ 0 at all x ∈ R. We will show that f is concave on R, completing the proof of Part 1 of the theorem for the case n = 1. Pick any x and y in R and assume, without loss of generality, that x < y. Pick any λ ∈ (0, 1), and let z = λx + (1 − λ)y. We are to show that f(z) ≥ λf(x) + (1 − λ)f(y).

By the Mean Value Theorem (see Theorem 1.71 in Chapter 1), there exist points w₁ and w₂ such that w₁ ∈ (x, z), w₂ ∈ (z, y), and

of Theorem 7.5, it is seen that the following inequalities hold at each k:

[f(x) − f(x_k)]/(x − x_k) ≥ [f(x_k) − f(y_k)]/(x_k − y_k) ≥ [f(y_k) − f(y)]/(y_k − y).


4Note that. as with all necessary conditions. the concavity of f and the convexity of V played no role here.

Proof Suppose x* maximizes f over D. Let y point into D at x*. Then, for all η > 0 such that (x* + ηy) ∈ D, we have f(x*) ≥ f(x* + ηy) since x* is a maximizer. Subtracting f(x*) from both sides, dividing by η, and taking limits as η → 0⁺ establishes necessity.⁴

Conversely, suppose Df(x*; y) ≤ 0 for all y pointing into D at x*. If x* does not maximize f over D, there exists z ∈ D with f(z) > f(x*). Let y = z − x*. Then x* + y = z, so by taking w = 1, it follows from the convexity of D that y points

Theorem 7.20 Suppose f: D → R is concave, where D ⊂ R^n is convex. Then x* maximizes f over D if, and only if, Df(x*; y) ≤ 0 for all y pointing into D at x*.
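As a concrete check of the theorem (an illustrative sketch of my own; the quadratic objective, the square constraint set, and the sampled directions are assumptions, not part of the text), take f(x, y) = −(x − 2)² − (y − 2)² on D = [0, 1]². The maximizer is x* = (1, 1), and every direction pointing into D at that corner has nonpositive coordinates:

```python
import numpy as np

grad_f = lambda p: np.array([-2 * (p[0] - 2), -2 * (p[1] - 2)])  # f(x,y) = -(x-2)^2 - (y-2)^2
x_star = np.array([1.0, 1.0])                                    # maximizer of f on D = [0,1]^2

# Directions pointing into D at the corner (1,1) have nonpositive coordinates.
rng = np.random.default_rng(0)
directions = -rng.random((1000, 2))

# Theorem 7.20: at the maximizer, Df(x*; y) = Df(x*) . y <= 0 for every such direction.
print(np.all(directions @ grad_f(x_star) <= 0))              # -> True

# At a non-maximizer such as (0.5, 0.5), the direction toward (1, 1) gives a
# strictly positive directional derivative, so the criterion correctly fails there.
print(grad_f(np.array([0.5, 0.5])) @ np.array([0.5, 0.5]))    # -> 3.0 > 0
```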

7.7 A Proof of the Theorem of Kuhn and Tucker under Convexity

We first present a result that can be viewed as an abstract version of Theorem 7.16.Theil. we will usc this result to prove Theorem 7.16.

A definition first. Let x EVe IRn, and Y E lRn. We will say that y points into 'Dat x if there is w > 0 such that for all '7 E (0.w). we have (x + YiY) E V.

φ″_{x,h}(t) = h′D²f(x + th)h. Moreover, by Lemma 7.19, φ_{x,h}(·) is concave in t, so we have φ″_{x,h}(t) ≤ 0 for all t, by the result established for concave functions of one variable. Therefore, h′D²f(x + th)h ≤ 0 for all t, and in particular, h′D²f(x)h ≤ 0, as required.

Now suppose that D²f(z) is negative semidefinite at all z. We will show that f is concave by showing that for any x and h in R^n, the function φ_{x,h}(t) = f(x + th) is concave in t. Indeed, since f is C², we again have φ′_{x,h}(t) = Df(x + th)·h, and φ″_{x,h}(t) = h′D²f(x + th)h. Since D²f is negative semidefinite at all points, it is the case that h′D²f(x + th)h ≤ 0. Therefore, φ″_{x,h}(t) ≤ 0 everywhere, so by the result established for concave functions of one variable, φ_{x,h}(·) is concave. Since x and h were arbitrary, an appeal to Lemma 7.19 establishes that f is also concave, completing the proof of Part 1 of the theorem.

To see Part 3, suppose that D²f(z) were negative definite at all z. Pick any x and h in R^n with h ≠ 0. Let φ_{x,h}(t) = f(x + th). As above, the twice-continuous differentiability of f implies the same property for φ_{x,h}(·), and we have φ″_{x,h}(t) = h′D²f(x + th)h for any t. Since h ≠ 0 and D²f is negative definite everywhere, it follows that φ″_{x,h}(t) < 0 at all t. From the result established for the case n = 1, it follows that φ_{x,h}(·) is strictly concave in t. Since x and h ≠ 0 were arbitrary, Lemma 7.19 implies that f is also strictly concave. □
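In applications, the negative (semi)definiteness required by Theorem 7.10 is often checked through the eigenvalues of the Hessian. The snippet below is a small sketch of that check (my own illustration; the two sample Hessians are arbitrary):

```python
import numpy as np

def hessian_eigs(H):
    """Eigenvalues of a symmetric Hessian; all <= 0 means negative semidefinite."""
    return np.linalg.eigvalsh(H)

# f(x, y) = -x^2 - xy - y^2 has the constant Hessian below.
H_concave = np.array([[-2.0, -1.0],
                      [-1.0, -2.0]])
print(hessian_eigs(H_concave))    # -> [-3., -1.]  all negative: f is strictly concave

# g(x, y) = x^2 + y^2 is convex, not concave.
H_convex = np.array([[2.0, 0.0],
                     [0.0, 2.0]])
print(hessian_eigs(H_convex))     # -> [2., 2.]    positive: the concavity test fails
```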


for all t ∈ (0, ε). Taking limits as t ↓ 0, we obtain Dh_i(x*)y ≥ 0. Since λ_i* ≥ 0, we

Suppose x₁, x₂ ∈ D_i. Pick any λ ∈ (0, 1), and let z = λx₁ + (1 − λ)x₂. Then z ∈ U since U is convex. Moreover, h_i(z) ≥ λh_i(x₁) + (1 − λ)h_i(x₂) ≥ 0, so z ∈ D_i. Thus, D_i is convex for each i. This implies D = ∩_{i=1}^l D_i is also convex. Since f is concave, all that remains to be shown now is that Df(x*)y ≤ 0 for all y pointing into D at x*, and we can then appeal to Theorem 7.20.

So suppose y points into D at x*. Fix y. We will show that for each i = 1, ..., l, we have λ_i* Dh_i(x*)y ≥ 0. First, note that by definition of y, there is ε > 0 such that for all t ∈ (0, ε), we have (x* + ty) ∈ D. This implies h_i(x* + ty) ≥ 0 for all i, for all t ∈ (0, ε).

Pick any i. There are two possibilities: h_i(x*) > 0, and h_i(x*) = 0. In the first case, λ_i* = 0 by condition [KT-2], so certainly λ_i* Dh_i(x*)y ≥ 0. Now, consider the case h_i(x*) = 0. By hypothesis, we have h_i(x* + ty) ≥ 0 for all t ∈ (0, ε), so we also have

D_i = {x ∈ U | h_i(x) ≥ 0}.

We begin our proof of Theorem 7.16 by demonstrating the sufficiency of the conditions [KT-1] and [KT-2] when f and the functions h_i are all concave. (As we have already mentioned, Slater's condition plays no part in proving sufficiency.) Our proof will use the fact that if a function φ: U → R is differentiable at a point x, then the directional derivative Dφ(x; h) exists at all h ∈ R^n, and, in fact, Dφ(x; h) = Dφ(x)h.

So suppose that there exists λ* ∈ R₊^l such that [KT-1] and [KT-2] hold. Let

But the left-hand side tends to Df(x*; y) as η → 0⁺, and this implies Df(x*; y) > 0. By hypothesis, on the other hand, Df(x*; y) ≤ 0 since y points into D at x*, a contradiction establishing the theorem. □

[f(x* + η(z − x*)) − f(x*)]/η ≥ f(z) − f(x*) > 0.

by concavity of f. Thus,

f(x* + η(z − x*)) ≥ (1 − η)f(x*) + ηf(z) = f(x*) + η(f(z) − f(x*))

into D at x*. But for η ∈ (0, 1),


SOne can actually show that the following stronger "saddle-point condition" is met at (x", >.."):

L(x,>"*):5 L(x·,>"·):5 L(X·,A), XEU, AER~.

We claim that X nY is empty. For if there were a point (w, z) in this intersection,

Y = {(w, z) E lR x lRn I IV > j(x"), z» OJ.

and

Thus, rKT-I ] must also hold, and the proof will be complete.We will utilize a separation theorem to derive the required point J...". To this end,

define the sets X and Y by

X = {(w, z) ∈ R × R^l | w ≤ f(x), z ≤ h(x) for some x ∈ U},

Df(x*) + Σ_{i=1}^l λ_i* Dh_i(x*) = 0.

Since A" 2: 0, the first of these conditions establishes [KT-2J. The second conditionstates that x" is a maximum of L(·, A") on U. Since U is open and convex, and L(·,}.. *)is concave in x, x* can be a maximum of Lon U if and only if DLx(x*, A*) = 0,i.e., if and only if

x E U.L(x,>-") ::; L(x*,A*),

as well ass

i= I, ... ,I,)..7h;(x*) = 0,

To prove the result, we will show that there is )... E lR~ which satisfies

L(x, λ) = f(x) + Σ_{i=1}^l λ_i h_i(x).

Since y was an arbitrary vector pointing into D at x", this inequality holds for allsuch y. By Theorem 7.20, x· is then a maximum of I on V, completing the proofof the sufficiency of[KT-I] and [KT-2].

We now turn to necessity. Unlike the sufficiency part, Slater's condition will playan important role here. So suppose that x" is a maximum of f on the given constraintset D. We are to show the existence of A* such that [KT-l] and lKT-2] hold. Definethe function L: U x lR~ -+ lRby

finally obtain ).7Dh;(x*)y 2: 0 in this case also. It now follows that

Df(x*)y = − Σ_{i=1}^l λ_i* Dh_i(x*)y ≤ 0.


Σ_{i=1}^l λ_i* h_i(x*) = 0,

If we take x = x* in this expression, we obtain Σ_{i=1}^l λ_i* h_i(x*) ≤ 0. On the other hand, we also have h(x*) ≥ 0 and λ* ≥ 0, so Σ_{i=1}^l λ_i* h_i(x*) ≥ 0. Together, these inequalities imply

f(x) + Σ_{i=1}^l λ_i* h_i(x) ≤ f(x*),   x ∈ U.

We then have

But p = 0 combined with (p, q) ::: 0 and (p, q) =1= 0 implies q > 0 (i.e., thatq = (ql, ... ,qt) has at least one positive coordinate). By Slater's condition. thereis i E U such that hex) » O. Together with q > 0, this means q . hex) > O. acontradiction. Therefore, we cannot have p = O.

Now define

x E U.q . hex) ::5 0,

We have already established that (p, q) :::D.We now claim that p = 0 leads to acontradiction. If p = D, then from the last inequality, we must have

x E U.pj(x) + q . hex) ::5 p/(x*),

From the definition of Y. we must have (P. q) ::: O. For. if some coordinate wasnegative. by taking the corresponding coordinate of (u. v) positive and large, thesum pu +q . v could be made arbitrarily negative. It would not, then, be possible tosatisfy the inequality required by the separation.

Now, by taking a sequence (urn, vm) E Y converging to the boundary point([(x*), D), it also follows from the separation inequality and the definition of ,I.'that

pw + q · z ≤ pu + q · v,   (w, z) ∈ X, (u, v) ∈ Y.

we would have the existence of an x in U such that j(x) ::: w > /(x*)-aoo­o « z :::::hex). so the point x is feasible and dominates x'. This contradicts thepresumed optimality of .r ", Thus. X and Y have no points in common. as claimed.

It is also true that X and Yare convex sets. The convexity of Y is obvious. Tileconvexity of X follows from the concavity of / and the constraint functions h, .

By Theorem 1.68. there is a vector (P. q) E lR x lRn. (p. q) =1= O. such that


6Tosee that (x',). *) is actually a saddle point of L as claimed in the earlier footnote, note that since'L~=IAjhj(x') = 0, we also have 'L~=IAjhj(x') ~ 'L:=IA;hj(x·). Therefore,

I I

L(x·,). *)= f(x') +L::>;h;(x') ~ J(x') +LAjhj(x) = L(x', A).i=) ;::::::1

is convex. Is this still true for any linear combination of convex functions?

f(x) = aI/I (X) + ... +anln(x)

where a > O. Is I concave?

3. Let I: IR~ ~ JR be a concave function satisfying 1(0) = O. Show that for allk ?: 1 we have kf(x) ::: ftkx), What happens if k E [0, I)?

4. Let D = {(x, y) ∈ R² | x² + y² ≤ 1} be the unit disk in R². Give an example of a concave function f: D → R such that f is discontinuous at every boundary point of D.

S. Show that the linear function I: lR,n~ IRdefined by I(x) :: a . x - b, a E IRn,b E IR is both convex and concave on IRn. Conversely, show that if I: JRn ~ lRis both convex and concave, then it is a linear function.

6. Let III, 12, ... , In} be a set of convex functions from JRn to R Show that thenonnegative linear combination

7.8 Exercises

1. Define f: R² → R by f(x, y) = ax² + by² + 2cxy + d. For what values of a, b, c, and d is f concave?
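One way to experiment with a question of this kind numerically (an exploratory sketch of my own, not a substitute for the proof the exercise asks for; the parameter choices are arbitrary) is to note that the Hessian of f is the constant matrix with rows (2a, 2c) and (2c, 2b) and to test its definiteness:

```python
import numpy as np

def is_concave_quadratic(a, b, c):
    """f(x,y) = a x^2 + b y^2 + 2 c x y + d is concave iff its constant Hessian
    [[2a, 2c], [2c, 2b]] is negative semidefinite."""
    H = np.array([[2 * a, 2 * c],
                  [2 * c, 2 * b]])
    return np.all(np.linalg.eigvalsh(H) <= 1e-12)

print(is_concave_quadratic(-1.0, -1.0, 0.5))   # True:  a, b < 0 and ab >= c^2
print(is_concave_quadratic(-1.0, -1.0, 2.0))   # False: cross term too large (ab < c^2)
print(is_concave_quadratic(1.0, -1.0, 0.0))    # False: convex in the x direction
```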

2. Let I: IR~+ ~ JR be defined by

I(x}, ... , xn) = log (xI'" x~)

and, therefore, that for all x ∈ U,

L(x, λ*) = f(x) + Σ_{i=1}^l λ_i* h_i(x)
≤ f(x*)
= f(x*) + Σ_{i=1}^l λ_i* h_i(x*)
= L(x*, λ*).

The proof is complete.⁶ □


Show that I is concave on [0,2]. Observe that x" = I is a global maximizer ofI. Let V be the set of all y that point into [0,2] at x *. Show that DI (x'; y) ::: 0for I'!l Y E V.

f(x) = x if x ∈ [0, 1], and f(x) = 2 − x if x ∈ (1, 2].

Is h concave? Why or why not?

11. Let I and g be concave functions on JR. Give an example to show that thei rcomposition log is not necessarily concave. Show also that if I is an increasingconcave function and g is any concave function, then fog will also be concave.What if instead, g were increasing and concave, and I were simply concave?

12. Let I and g be concave functions on JR.Is the product function I· g concave onJR?Prove your answer, or provide a counterexample.

13. Let I: [0, 2] ~ IR be defined by

hex) = I[Ax + b], x E JRm.

Is g convex? Why or why not?

9. Let 1)c R,n be a convex set. Let I: 1) ~ IR be a differentiable function. Showthat the following are equivalent:

(a) I is concave on D.(b) I(y) - I(x) :::DI(x)(y - x) for all x, y E 1).

(c) [DI(y) - DI(x)](y - x) ::: ° for all x, y ED.10. Let I: IR" ~ IRbe concave. Let A be an n x m matrix, and let b E JR.". Consider

the function h : IRm ~ IR defined by

g(X) = inf f;(x)?iEJ

is a convex function on D.What about the frnction g, defined as

I(x) = sup f;(x)iet

8. Let {f; : j E J} be a set (finite or infinite) of functions from a convex set V s;: IR"to lRwhich are convex and bounded on V. Show that the function, I, defined as

is convex on [0, I]. Is the above statement still true if we replace convex withconcave?

7. Show that a function f: lRn ~ llr isconvex if and only if for each xj , x2 E :Ru,the function ip: [0, I] ~ 1Rdefined by

cp()..) = f{)..Xl + (1 - )..)x21


Maximize P!(XI •.•. , xn) - WIXI -'" - WnXn

subjectto Xi ~ O. i = I •...• n,

where x E lR+. and u and f are nondecreasing continuous functions from lR+into itself. Derive the Kuhn-Tucker first-order conditions for this problem, andexplain under what circumstances these conditions are sufficient.

20. A firm produces an output y using two inputs Xl and X2 as Y == ..jXIX2. The firmis obligated to use at least one unit of XI in its production process. The inputprices of XI and X2 are given by WI and W2. respectively. Assume that the firmwishes to minimize the cost of producing y units of output.

(a) Set up the firm's cost-minimization problem. Is the feasible set closed? com­pact? convex?

(b) Describe the Kuhn-Tucker first-order conditions. Are they sufficient for asolution? Why or why not?

21. Describe conditions on f under which the first-order conditions in the followingoptimization problem are sufficient:

subject to CI + XI :5 xCt + Xt :5 f(Xt-J). t :::::2•.... Tc,.Xt::: O. t == I •...• T

17. Identify a set of conditions on the parameters p and w of the portfolio choiceproblem of subsection 2.3.6 under which the constraint set ct>(p. w) ofthe prob­lem satisfies Slater's condition.

18. Find a set of sufficient conditions on the technology g under which the constraintset of the cost-minimization problem of subsection 2.3.4meets Slater's condition.

19. Let T be any positive integer. Consider the following problem:

TMaximize Lu(Ct)

/==1

14. Repeat problem 13 if f(x) = x, and if f<x) == x(l - x).

15. Describe a set of conditions on the parameters p and I under which the budget setBt p, I) of the utility-maximization problem of subsection 2.3.1 meets Slater'scondition.

16. Under what conditions on p and W does the constraint set F(p. w) of theconsumption-leisure choice problem of subsection 2.3.5 meet Slater's condi­tion?


Given an income of I, and prices of PI. P2, P3, write down the consumer's utility­maximization problem. Can the Weierstrass and/or Kuhn-Tucker theorems beused to obtain and characterize a solution? Why or why not?

26. An agent consumes two commodities x and y. His utility function is given hyu(x, y) = x + In(1 + y), where In(z) denotes the natural Iogarithm of z , Theprices of the two commodities are given by Px > 0 and Pv > O. The consumer

22. A finn has contracted with its union to hire at least L• units of labor at a wage rate -of WI per unit. Any amount of additional labor may be hired at a rate of W2 perunit, where WI > W2. Assume that labor is the only input used by the firm in itsproduction process, and that the production function is given by I: ~+ ~ ~+.where / is Cl and concave. Given that the output sells at a price of p. describethe finn's maximization problem. Derive the Kuhn-Tucker first-order condi tions.Are these necessary for a solution? Are they sufficient?

23. A finn uses two inputs,labor (/) and raw material (m). to produce a single outputy. The production function is given by y = f(l, m). The output sells for a priceof p, while labor has a unit cost of w. The firm has in its stock 4 units of the rawmaterial m. Additional units may be purchased from the market at a price of c.The firm can also sell part or all of its own raw material stock in the market atthe price c.

(a) Set up the firm's profit-maximization problem, and derive the Kuhn-Tuckerfirst-order conditions. Describe under what conditions on I, these conditionsare sufficient.

(b) Let p = w = m = I, and let 1(1,m) = /1/3m 1/3. Calculate the firm'soptimal choice of actions.

24. A firm produces an output y using a single input / through the production functiony = /(1). The output is sold at a constant price of p. The firm is a monopsonistin the labor market: if it offers a wage of w, then it can hire A.. ( w) units of labor,where A(O) = 0,)..' (w) > 0 at all w ::: 0, and )..(w) t 00 as tv t 00. Assume theusual nonnegativity conditions.

(a) Describe the firm's profit-maximization problem, and wri te down the Kuhn­Tucker first-order conditions for a maximum.

(b) Explain under what further-assumptions on 10 and ).(.) these first-orderconditions are also suffici~nt. (Specify the most general conditions you canthink of.)

25. An agent who consumes three commodities has a utility function given by

u(x₁, x₂, x₃) = x₁^{1/3} + min{x₂, x₃}.


has an income of I > O. Assuming consumption of either commodity must benonnegative, find the consumer's utility-maximizing commodity bundle.

27. 1\ consumer with a fixed income of I > 0 consumes two commodities. If hepurchases q, units of commodity i (i = 1,2), the price he pays is Pi (qi), wherePi (.) is a strictly increasing C I function. The consumer's utility function is givenby U(ql, q2) = lnql + Inq2.

(a) Describe the consumer's utility maximization problem. Does the WeierstrassTheorem apply to yield existence of a maximum? Write down the Kuhn­Tucker first-order conditions for a maximum.

(b) Under what conditions on PI (.) and P2 (-) are these conditions also sufficient?(Specify the most general conditions you can think of.)

(c) Suppose PI (qJ) = ..(iiI and P2(q2) = .fiji. Are the sufficient conditionsyou have given met by this specification? Calculate the optimal consumptionbundle in this case.


Quasi-Convexity and Optimization

The previous chapter showed that convexity carries powerful implications for optimization theory. However, from the point of view of applications, convexity is often also quite restrictive as an assumption. For instance, such a commonly used utility function as the Cobb-Douglas function is not concave unless Σ_{i=1}^n a_i ≤ 1. In this chapter, we examine optimization under a weakening of the condition of convexity, which is called quasi-convexity.

Quasi-concave and quasi-convex functions fail to exhibit many of the sharp properties that distinguish concave and convex functions. A quasi-concave function may, for instance, be discontinuous on the interior of its domain. A local maximum of a quasi-concave function need not also be a global maximum of the function. Perhaps more significantly, first-order conditions are not, in general, sufficient to identify global optima of quasi-convex optimization problems.

Nonetheless, quasi-concave and quasi-convex functions do possess enough structure to be of value for optimization theory. Most importantly, it turns out that the Kuhn-Tucker first-order conditions are "almost" sufficient to identify optima of inequality-constrained optimization problems under quasi-convexity restrictions; more precisely, the first-order conditions are sufficient provided the optimum occurs at a point where an additional regularity condition is also met. The result cannot, unfortunately, be strengthened to obtain unconditional sufficiency of the first-order conditions: it is easy to construct examples of otherwise well-behaved quasi-convex optimization problems where the regularity condition fails, and as a consequence, the first-order conditions do not suffice to identify optima. However, the regularity condition is not a very restrictive one, and is satisfied in many models of economic interest.
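The motivating point about the Cobb-Douglas case is easy to see numerically. In the sketch below (my own illustration; the exponents and sample points are arbitrary), u(x, y) = x^0.8 y^0.8 has exponent sum 1.6 > 1: the concavity inequality fails along a segment, while the weaker "min" inequality studied in this chapter still holds:

```python
import numpy as np

a = b = 0.8                                   # a + b = 1.6 > 1
u = lambda p: p[0] ** a * p[1] ** b

p1, p2 = np.array([1.0, 1.0]), np.array([9.0, 9.0])
mid = 0.5 * p1 + 0.5 * p2

# Concavity would require u(mid) >= 0.5*u(p1) + 0.5*u(p2); it fails here.
print(u(mid), 0.5 * u(p1) + 0.5 * u(p2))      # approx 13.1 < 17.3

# The quasi-concavity inequality u(mid) >= min{u(p1), u(p2)} still holds.
print(u(mid) >= min(u(p1), u(p2)))            # -> True
```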


8.1 Quasi-Concave and Quasi-Convex Functions

Throughout this chapter, D will denote a convex set in R^n. Let f: D → R. The upper-contour set of f at a ∈ R, denoted U_f(a), is defined as

U_f(a) = {x ∈ D | f(x) ≥ a},

while the lower-contour set of f at a ∈ R, denoted L_f(a), is defined as

L_f(a) = {x ∈ D | f(x) ≤ a}.

The function f is said to be quasi-concave on D if U_f(a) is a convex set for each a. It is said to be quasi-convex on D if L_f(a) is a convex set for each a.

As with concave and convex functions, it is also true in the case of quasi-concave and quasi-convex functions that a strong relationship exists between the value of a function at two points x and y, and the value of the function at a convex combination λx + (1 − λ)y:

Theorem 8.1 A function f: D → R is quasi-concave on D if and only if for all x, y ∈ D and for all λ ∈ (0, 1), it is the case that

f[λx + (1 − λ)y] ≥ min{f(x), f(y)}.

The function f is quasi-convex on D if and only if for all x, y ∈ D and for all λ ∈ (0, 1), it is the case that

f[λx + (1 − λ)y] ≤ max{f(x), f(y)}.

Proof First, suppose that f is quasi-concave, i.e., that U_f(a) is a convex set for each a ∈ R. Let x, y ∈ D and λ ∈ (0, 1). Assume, without loss of generality, that f(x) ≥ f(y). Letting f(y) = a, we have x, y ∈ U_f(a). By the convexity of U_f(a), we have λx + (1 − λ)y ∈ U_f(a), which means

f[λx + (1 − λ)y] ≥ a = f(y) = min{f(x), f(y)}.

Finally, it must be mentioned that technical expedience is not the only, or even theprimary, ground for the significance of quasi-convex structures in economic analy­sis. As Arrow and Enthoven (1961) demonstrate, the quasi-concavity of a functionturns out to be, under certain conditions, the precise mathematical expression of theeconomic concept of a diminishing marginal rate of substitution. The latter is anassumption that is frequently imposed on economic models, and is usually justi­fied by an appeal to economic considerations alone. That such functions also yieldsufficiency of the first-order conditions can be viewed as simply an added bonus.
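The definitions above are also easy to probe by brute force. The sketch below (my own illustration; the test function f(x, y) = xy on R²₊ and the random sampling are assumptions) looks for violations of the inequality f(λx + (1 − λ)y) ≥ min{f(x), f(y)} that, by Theorem 8.1, characterizes quasi-concavity:

```python
import numpy as np

f = lambda p: p[0] * p[1]            # quasi-concave on the nonnegative quadrant
rng = np.random.default_rng(1)

def violates_quasi_concavity(f, points, lambdas):
    for x in points:
        for y in points:
            for lam in lambdas:
                z = lam * x + (1 - lam) * y
                if f(z) < min(f(x), f(y)) - 1e-9:
                    return True
    return False

pts = rng.random((30, 2)) * 5.0      # random points in R^2_+
lams = np.linspace(0.05, 0.95, 10)
print(violates_quasi_concavity(f, pts, lams))    # -> False: no violation found
```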


Theorem 8.3 Let f: D ⊂ R^n → R. If f is concave on D, it is also quasi-concave on D. If f is convex on D, it is also quasi-convex on D.

8.2 Quasi-Convexity as a Generalization of Convexity

It is a simple matter to show that the set of all quasi-concave functions contains the set of all concave functions, and that the set of all quasi-convex functions contains the set of all convex functions:

Theorem 8.2 The function f: D → R is quasi-concave on D if and only if −f is quasi-convex on D. It is strictly quasi-concave on D if and only if −f is strictly quasi-convex on D.

A strictly quasi-convex function is similarly defined.In the sequel. we will use the term "quasi-convexity" to encompass the concepts

of convex sets, and of quasi-concave and quasi-convex functions. In an obviousextension of the terminology we defined in the previous chapter. we will say thata maximization problem is a quasi-convex maximization problem if it has a convexconstraint set and a quasi-concave objective function. A quasi-convex minimizatiollproblem will. similarly. refer to a minimization problem with a convex constraintset and a quasi-convex objective function. A quasi-convex optimiration problem willrefer to a problem that is either a quasi-convex maximiz-ation problem. or a quasi­convex minimization problem.

Finally. the following observation. which relates quasi-concave and quasi-convexfunctions. enables us to focus solely on quasi-concave functions. wherever this sim­plifies the exposition. Deriving analogous results for quasi-convex functions usingTheorem 8.2 is then a straightforward task. and will be left to the reader as an exercise.

f[h + (\ - )..)y) > min\J(x). fey)}.

A quasi-concave function f: D -+ lR is said to be strictly quasi-concave if thedefining inequality can be made strict. i.e .. if for all x, y E D with x i y, and for allA E (0, 1). we have

Now, suppose we have f[J...x +(I ~ J...)y) ~ min{j(x), fey)) for all x, y E-D·­and for all)" E (0, I). Let a E R. If UI(a) is empty or contains only one point.it is evidently convex, so suppose it contains at least two points x and y. Thenf(x) ~ a and ley) ~ a, so min{j(x), I(y)} ~ a. Now. for any A. E (0, I), we havej[)..x +(I - )..)y] ~ minl/(x). I(y») by hypothesis. and so Ax + (l - )..)y E Ur(a).Since a was arbitrary. the proof is complete for the case of quasi -concave functions.

An analogous argument shows that the claimed result is also true in the case ofquasi-convex functions. [J


Theorem 8.5 If f: D → R is quasi-concave on D, and φ: R → R is a monotone nondecreasing function, then the composition φ∘f is a quasi-concave function from D to R. In particular, any monotone transform of a concave function results in a quasi-concave function.

Proof Pick any x and y in D, and any λ ∈ (0, 1). We will show that

φ(f[λx + (1 − λ)y]) ≥ min{φ(f(x)), φ(f(y))},

which will complete the proof. Indeed, this is immediate. Since f is quasi-concave by hypothesis, we have

f[λx + (1 − λ)y] ≥ min{f(x), f(y)}.
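A tiny numerical illustration of the last assertion (my own sketch; the functions chosen are arbitrary): composing the concave function g(x) = x with the nondecreasing transform φ(t) = t³ gives h(x) = x³, which is quasi-concave but not concave:

```python
# phi(t) = t**3 is nondecreasing and g(x) = x is concave, so h = phi o g,
# i.e. h(x) = x**3, is quasi-concave by Theorem 8.5 -- but it is not concave.
h = lambda x: x ** 3

x, y, lam = 0.0, 2.0, 0.5
mid = lam * x + (1 - lam) * y

print(h(mid) >= lam * h(x) + (1 - lam) * h(y))   # False: concavity fails (1 < 4)
print(h(mid) >= min(h(x), h(y)))                 # True:  quasi-concavity holds (1 >= 0)
```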

The next result elaborates on the relationship between concave and quasi-concavefunctions. It also raises a very important question regarding the value of quasi­convexity for optimization theory. This is discussed after the proof of the theorem.

Since f(x) = max{f(x). f(y»). the first inequality shows that f is quasi-convexon R Since fey) = min{f(x). fey)}. the second inequality shows that f is quasi­concave on R.

Since it is always possible to choose a nondecreasing function I that is neitherconcave nor convex on lR (for instance, take I(x) = x3 for all x E lR). we haveshown that not every quasi-concave function is concave, and not every quasi-convexfunction is convex. 0

I(x) ?: f[Ax + (I - A)yJ ::: I(y)·

Example 8.4 Let I; IR ~ IR be any nondecreasing function on IR. Then, I is bothquasi-concave and quasi-convex on JR. To see this, pick any x and y in JR. and any x E

(0. I).Assume, without loss of generality, that x > y. Then. x > Ax+ (l - A)y > y.Since I is nondecreasing, we have

The converse of this result is false, as the following example shows:

so I is also quasi-concave. A similar argument establishes that if I is convex, it isalso quasi-convex. 0

f[Ax + (1 - A)yj 2: Aj(x) + (I - )...)/(y)2: )"'min{f(x), I(y)} + (1- )...)min{f(x), I(y)}

= min{f(x), I(y»)'

Proof Suppose f is concave. Then, for all x, y E 1) and x E (0, I), we have


Figure 8.1 illustrates this function. Evidently. f is a nondecreasing, and thereforequasi-concave, function on lR+ Note that f is constant in x for x E [0. 1]. and isstrictly increasing in x for x > I.

Suppose there existed a concave function g and a strictly increasing function if'such that rp 0 g == f.We will show that a contradiction must result.

We claim first that g must be constant on the interval [O. I]. To see this. supposethere existed x and y in 10. I J such that K(x) i- g(y). say g(:t) > g(y). Then. sincl'rp is strictly increasing. we must also have rp(g(x)) > rp(g(y)). This contradicts therequirement that ip 0 g be constant on [0, I].

Next we claim that g is strictly increasing in x for x > J. For. suppose thereexisted points x and y such that x > y > I. and such that g(x) :::: g(y). Then,

INote that Theorem 2.5 requires the transforming function cp 10 beslricliy increasing, not just nondecre avingIndeed, the equivalence claimed in the theorem would be false if", were onty nondecreasing. For instance,if", was a constant function. then it would also be nondecreasing. and evidently every point in the domainwould now maximize cp 0 f over V;clearly. nOI every point need be a maximizer of f on V.

Example 8.6 Let f: R₊ → R be defined by

f(x) = 0 if x ∈ [0, 1], and f(x) = (x − 1)² if x > 1.

A natural question that arises is whether the converse of the second part of The­orem 8.S is also true. that is. whether every quasi-concave function is a monotonetransformation of some concave function. This question is significant in light of The­orem 2.S. which states that if cp is any strictly increasing function on llt then a pointx· maximizes a given function f over a given constraint set D if and only if .r "maximizes the composition tp 0 f over D. I If it turned out that every quasi-concavefunction could be obtained as a strictly increasing transformation of some concavefunction, then any optimization problem with a quasi-concave objective could beconverted to an equivalent problem with a concave objective. Thus. quasi-convexitywould have nothing to add to convexity, at least from the point of view of optimizationtheory.

It turns out. however. that this concern is without foundation: there do exist func­tions which are quasi-concave. but which are not obtainable as a strictly increasingtransformation of a concave function. A particularly simple example of such a func­tion is presented in Example 8.1 below. Thus. the notion of quasi-concavity IS agenuine generalization of the notion of concavity.

The proof is complete. []

¢(f[h + (l - A)Y]) 2: ¢(min{f(x). f(y))) = min{¢U(x)).1>U(y))I.

Since ¢ is nondecreasing, this-implies


(This indifference curve is graphed in Figure 8.2.) Since f is strictly increasing, theupper-contour set Uf(k) consists of all points (x, Y) E lR~ lying "above" (i.e., to thenortheast of) this straight line, and so is clearly convex. Thus, I is quasi-concave.

4y + (4+ 2k)x = k2 + 2k.

Note that f is a strictly increasing function on lR~, i.e., we have al(x, y)jax :=: 0and ilf(x, y) jay:=: 0 at all (x, y) E lR~. Moreover, the "indifference curve" of f atk, i.e., the locus of points (x, Y) E lR~ such that [t», y) = k, is the straight line

[t», y) = (x - I)+ [(1 - x)2 + 4(x + y)JI/2.

Arrow and Enthoven (1961) provide a richer, but also more complicated, exampleto illustrate this point. They consider the function I: lR~ ~ lRgiven by

rp(g(x) s rp(g(y». This contradicts the requirement that cp 0 g be strictly increasingin x for x > I.

But if g is constant on the interval [0, I], it has a local maximum at every Z E (0, I).These local maxima arc not global maxima, since g is increasing for x > I. Thiscontradicts Theorem 7.13, which shows that every local maximum of a concavefunction must also be a global maximum. 0

Fig. 8.1. A Quasi-Concave Function That Is Not a Monotone Transform of a Concave Function


Since f is a nondecreasing function, it is both quasi-concave and quasi-convexon R. Clearly, I has a dicontinuity at x = 2. Moreover, f is constant on the openinterval (1,2), so every point in this interval isa local maximum of f as well as a local

x E [0, I]xE(I,2]

x> 2.

Example 8.7 Let I: IR- R be defined by

The following example illustrates all of these points:

1. Quasi-concave and quasi-convex functions are not necessarily continuous in theinterior of their domains;

2. Quasi-concave functions can have local maxima that are not global maxima, andquasi-convex functions can have local minima that are not global minima;

3. First-order conditions are not sufficient to identify even local optima under quasi­convexity.

8.3 Implications of Quasi-ConvexityQuasi-concave and quasi-convex functions do not enjoy many of the properties thatcome with convexity. For instance, unlike concave and convex functions,

To prove that leannot be a strict monotone transform of a concave function I ....

significantly more difficult. We leave it to the reader as an exercise.

Fig. 8.2. The Arrow-Enthoven Example: An Indifference Curve


It is also possible to give a second-derivative test for quasi-convexity along the lines of Theorem 7.10 for concave and convex functions. Let a C² function f defined on some open domain D ⊂ R^n be given, and let x ∈ D. For k = 1, ..., n, let C_k(x) be the (k + 1) × (k + 1) matrix defined by

C_k(x) = ⎡ 0              ∂f/∂x₁(x)        ···   ∂f/∂x_k(x)       ⎤
         ⎢ ∂f/∂x₁(x)      ∂²f/∂x₁²(x)      ···   ∂²f/∂x₁∂x_k(x)   ⎥
         ⎢ ⋮              ⋮                      ⋮                ⎥
         ⎣ ∂f/∂x_k(x)     ∂²f/∂x_k∂x₁(x)   ···   ∂²f/∂x_k²(x)     ⎦

Theorem 8.8 Let f: D → R be a C¹ function, where D ⊂ R^n is convex and open. Then f is a quasi-concave function on D if and only if it is the case that for all x, y ∈ D,

f(y) ≥ f(x) ⟹ Df(x)(y − x) ≥ 0.

Proof See Section 8.6. □
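The first-derivative condition of Theorem 8.8 can be spot-checked numerically. In the sketch below (my own illustration; the quasi-concave test function f(x, y) = xy on the open positive quadrant and the random sampling are assumptions), the implication f(y) ≥ f(x) ⟹ Df(x)(y − x) ≥ 0 is verified on many random pairs:

```python
import numpy as np

f = lambda p: p[0] * p[1]
grad_f = lambda p: np.array([p[1], p[0]])

rng = np.random.default_rng(2)
pts = rng.random((200, 2)) * 4.0 + 0.1        # points in the open positive quadrant

ok = True
for x in pts:
    for y in pts:
        if f(y) >= f(x):                      # premise of Theorem 8.8
            ok &= grad_f(x) @ (y - x) >= -1e-9
print(ok)                                     # -> True
```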

Another significant distinction between convexity and quasi-convexity arises fromthe fact that a strictly concave function cannot be even weakly convex, and a strictlyconvex function cannot be even weakly concave. In contrast, a strictly quasi-concavefunction may well be strictly quasi-convex also: a simple modification of Exam­ple 8.4 shows that if I is any strictly increasing function on IR (i.e., if x > y im­plies I(x) > I(y», then I is both strictly quasi-concave and strictly quasi-convexon JR.

However, as with concave and convex functions, the derivatives of quasi-concaveand quasi-convex functions do possess a good deal of structure. In particular, it ispossible to provide analogs of Theorems 7.9 and 7.10, which characterized the con­vexity properties of functions using their first- and second-derivatives, respectively.Here is the first result in this direction:

minimum of I. However, no point in (0, 1) is either a global maximum or a globalminimum. Finally, If (0) = 0, although 0 is evidently neither a local maximum, nora local mimi mum. 0


2There is also another difference between the two results. In Theorem 7.10. the strict inequality (the ncguuvcdefini.teness of D2 f) was shown to be sufficient for strict concavity. In Theorem 8.9. the strict inequalityis claimed to be sufficient only for quasi-concavity and not for strict quasi-concavity. We leave •••0 .hereader to check whether. or under what conditions. strict quasi-concavity results from the strict inequality.

4(x-l)(y- 1)

2(x _1)2(,1'-1)]

4(x - l)(y - I) .

2(x - 1)2

2(y-l)2

2(x _1)(y_I)2

and

2(x _1)(y_I)2].

2(y - 1)2

CI(x,y) _ [ 0- 2(x _ )(y_I)2

Therefore,

4(x - I)(y - I)] .

2(x _ 1)2[

2(y - 1)2

4(x - I)(y - I)

Example 8.10 Let f: lR~ ~ IRbe given by

f(x,y) = (x_I)2(y_l)2, (x,Y)ER~.

Then. we have D'[I;«, y) = (2(x - I)(y - 1)2. 2(x - 1)2(y -I», and

There is one very important difference between Theorem 8.9 and the correspondingresult for concave functions, Theorem 7.10. InTheorem 7.10, a weak inequality (viz ...the negative sernidefiniteness of D2 f) was both necessary and sufficient to establishconcavity. In the current result, the weak inequality is necessary for quasi-concavity.but the sufficient condition involves a strict inequality. The following example showsthat the weak inequality is not sufficient to identify quasi-convexity. so Theorem X.tJcannot be made into a perfect analog of Theorem 7.10.2

oProof See Section 8.7.

Theorem 8.9 Let f: D → R be a C² function, where D ⊂ R^n is open and convex.

Then:

1. If f is quasi-concave on D, we have (−1)^k |C_k(x)| ≥ 0 for k = 1, ..., n and all x ∈ D.

2. If (−1)^k |C_k(x)| > 0 for all k ∈ {1, ..., n} and all x ∈ D, then f is quasi-concave on D.
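The determinants |C_k(x)| are straightforward to assemble from the gradient and Hessian, so the sign conditions of Theorem 8.9 lend themselves to a quick numerical check. The sketch below (my own illustration; the bordered-matrix construction follows the definition of C_k(x) given above, and the test function f(x, y) = x²y² is an arbitrary choice) verifies (−1)^k |C_k(x)| > 0 at a few points of the strictly positive quadrant:

```python
import numpy as np

def bordered_minor(grad, hess, k):
    """|C_k(x)|: determinant of the (k+1)x(k+1) matrix built by bordering the
    leading k x k block of the Hessian with the first k partial derivatives."""
    C = np.zeros((k + 1, k + 1))
    C[0, 1:] = grad[:k]
    C[1:, 0] = grad[:k]
    C[1:, 1:] = hess[:k, :k]
    return np.linalg.det(C)

# Test function f(x, y) = x^2 y^2 on the strictly positive quadrant.
grad = lambda x, y: np.array([2 * x * y ** 2, 2 * x ** 2 * y])
hess = lambda x, y: np.array([[2 * y ** 2, 4 * x * y],
                              [4 * x * y, 2 * x ** 2]])

for (x, y) in [(0.5, 1.0), (2.0, 3.0), (1.5, 0.25)]:
    signs = [(-1) ** k * bordered_minor(grad(x, y), hess(x, y), k) for k in (1, 2)]
    print(all(s > 0 for s in signs))       # -> True at each point, consistent with Part 2
```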


bx°yh-I 1abxo-1yb-1 .

b(b - l)xQyb-2

and

In contrast, if we appeal to Theorem 8.9, we only have to show that ICI (x, y)1 < °amlIC2(x, y)1 > 0, where

Example 8.11 Let f :lRt+ ~ lRbe defined by

[(x,y)=xoi, a.b i- t).

We have seen in Example 7.12 that this function is strictly concave on lRi+ ifa + b < I, that it is concave on lR~+ if a + b = I, and that it is neither concavenor convex on lRt+ if a + b > I.We will show using Theorem 8.9 that it is quasi­concave on lR~+ for all a, b > O. Note that to show this directly from the definitionof quasi-concavity would require us to prove that the following inequality holds forall distinct (x, y) and (x, y) in lR~+ and for all )..E (0, I):

(Ax + (l - A)X)o(> ..y + (l - ),_)y)h :::: minlx" i,XO _yh}.

Thus, the sufficient condition of Theorem 8.9 cannot be strengthened to allow forweak inequalities. 0

As with the second-derivative test provided by Theorem 7.10 for convexity, the testgiven in Theorem 8.9 is also of considerable practical value. The following exampleillustrates this point.

I[)..(O, 0) + (l - ),,)(2, 2») = [(l, I) = 0 < 1 = min(f(O, 0), [(2,2)1·

with equality holding in either case if and only if x = 1 or y = 1. Thus, if weakinequalities were also sufficient to establish quasi-concavity, 1would have to bequasi-concave. However, 1 is not quasi-concave: we have 1(0,0) = 1(2,2) = 1,but for c = ~,

A pair of simple calculations now yields

|C₁(x, y)| = −4(x − 1)²(y − 1)⁴ ≤ 0,   (x, y) ∈ R²₊,

and

|C₂(x, y)| = 16(x − 1)⁴(y − 1)⁴ ≥ 0,   (x, y) ∈ R²₊,


A similar result to Theorem 8.12 is true for strictly quasi-convex functions inminimization problems. It is left 10 the reader to provide details. The most significantpart of Theorem 8.12 for optimization is that, as with strict concavity. strict quasi­concavity also implies uniqueness of the solution.

Perhaps the most important result concerning quasi-concave functions in optimiza­tion theory is the following. It shows that the Kuhn-Tucker first-order conditions are"almost" sufficient to identify the global optimum of an inequality-constrained max­imization problem, if all the functions involved are quasi-concave:

f(y(λ)) > min{f(x), f(z)} = f(x)

for any λ ∈ (0, 1). But for λ > 1 − r/d(x, z), we have d(x, y(λ)) < r, so y(λ) ∈ B(x, r) ∩ D, which contradicts the hypothesis that x is a local maximum, and establishes one part of the result.

To see the other part, let x and y both be maximizers of f on D. Pick any λ ∈ (0, 1) and let z = λx + (1 − λ)y. Since D is convex, z ∈ D. Since f is strictly quasi-concave, f(z) = f(λx + (1 − λ)y) > min{f(x), f(y)} = f(x) = f(y), and this contradicts the hypothesis that x and y are maximizers. □

Proof Suppose x is a local maximum of f on D, so there exists r > 0 such that f(x) ≥ f(y) for all y ∈ B(x, r) ∩ D. If x were not a global maximum of f on D, there must be z ∈ D such that f(z) > f(x). Let y(λ) = λx + (1 − λ)z. Note that y(λ) ∈ D because D is convex. By the strict quasi-concavity of f, we have

Theorem 8.12 Suppose f: D → R is strictly quasi-concave, where D ⊂ R^n is convex. Then, any local maximum of f on D is also a global maximum of f on D. Moreover, the set arg max{f(x) | x ∈ D} of maximizers of f on D is either empty or a singleton.

8.4 Quasi-Convexity and OptimizationWe have already seen that local maxima of quasi-concave functions need not. ingeneral, also be global maxima. When the function involved is strictly quasi-concave.however, a sharp result is valid:

A simple calculation shows that |C₁(x, y)| = −a²x^{2(a−1)}y^{2b}, which is strictly negative for all (x, y) ∈ R²₊₊, while

|C₂(x, y)| = x^{3a−2}y^{3b−2}(a²b + ab²),

which is strictly positive for all (x, y) ∈ R²₊₊. The quasi-concavity of f on R²₊₊ is established. □


31n fact. we have established in Example 8.6 that f cannot even be a strictly increasing transformation of aconcave function.

and h(x) = x for all x ∈ R, respectively. Consider the problem of maximizing f over the inequality-constrained set D = {x ∈ R | h(x) ≥ 0}. Note that f is not concave,³ so condition [QC-2] does not hold in this problem.

We claim that for any point x* ∈ [0, 1], there is λ* ≥ 0 such that the pair (x*, λ*) meets the Kuhn-Tucker first-order conditions. At any such point, [QC-1] fails, since f′(x*) = 0. Indeed, pick any point x* ∈ [0, 1] and set λ* = 0. Then,

f(x) = x³ if x < 0,   f(x) = 0 if 0 ≤ x ≤ 1,   and   f(x) = (x − 1)² if x > 1,

Example 8.14 Let f: R → R and h: R → R be quasi-concave C¹ functions given by

It is very important to note that Theorem 8.13 does not assert that the first-orderconditions of the Kuhn-Tucker theorem are sufficient under quasi-concavity, unlessaile of the conditions {QC-l] or (QC-2] holds. Indeed, if[QC-I] and [QC-2] bothfail, it is easy to construct examples where the Kuhn-Tucker first-order conditions donot identify a maximum point, even though all the other conditions of Theorem 8.13arc met. Consider the following:

Proof See Section 8.8. 0

[QC-1] Df(x*) ≠ 0.
[QC-2] f is concave.

Then, x* maximizes f over D provided at least one of the following conditions holds:

[KT-2] λ_i ≥ 0, λ_i h_i(x*) = 0, i = 1, ..., k.

[KT-1] Df(x*) + Σ_{i=1}^k λ_i Dh_i(x*) = 0.

Suppose there exist x* ∈ D and λ ∈ R^k such that the Kuhn-Tucker first-order conditions are met:

D = {x ∈ U | h_i(x) ≥ 0, i = 1, ..., k}.

Theorem 8.13 (The Theorem of Kuhn and Tucker under Quasi-Convexity) Let f and h_i (i = 1, ..., k) be C¹ quasi-concave functions mapping U ⊂ R^n into R, where U is open and convex. Define


Of course, the set of critical points of L is the same as the set of points (x. )..)that satisfy hex) 2: 0 as well as the first-order conditions [KT-l J and [KT-2J.

3. Identify the critical points (x,)..) of L which also satisfy DI(x) -# 0. At anysuch point (x, )..),condition {QC-l J of Theorem 8.13 is satisfied. so x must be asolution to the given maximization problem.

As an alternative to appealing to condition [QC-IJ in step 3, it might be easierin some problems to use condition [QC-2] that I is concave. Indeed, it may bepossible to use [QC-21 even if I is not itself concave, if I happens to at least bean increasing transformation of a concave function. In this case, the given problemcan be transformed into an equivalent one with a concave objective function, and

Df(x) + Σ_{i=1}^l λ_i Dh_i(x) = 0,
h_i(x) ≥ 0,   λ_i ≥ 0,   λ_i h_i(x) = 0,   i = 1, ..., l.

Maximize f(x) subject to x ∈ D = {z ∈ R^n | h_i(z) ≥ 0, i = 1, ..., l},

where f and h_i are quasi-concave C¹ functions, i = 1, ..., l. A simple three-step procedure may be employed.

1. Set up the Lagrangean L(x, λ) = f(x) + Σ_{i=1}^l λ_i h_i(x).
2. Calculate the set of critical points of L, i.e., the set of points (x, λ) at which the

following conditions hold:

8.5 Using Quasi-Convexity in Optimization ProblemsTheorem 8.13 is of obvious value in solving optimization problems of the form

It is also important to note that Theorem 8.13 only provides sufficient conditionsthat identify an optimum. It does not assert these conditions are necessary, and indeed,they are not unless the constraint qualification is met at x", or, as in Theorem 7. \ 6.the functions h, are all concave and Slater's condition holds.

Since x* ∈ [0, 1], we have f′(x*) = 0. Since λ* = 0, we also have λ*h′(x*) = 0. Therefore,

f′(x*) + λ*h′(x*) = 0,

and [KT-1] holds. Moreover, it is clearly true that for any x* ∈ [0, 1], we have h(x*) ≥ 0 and λ*h(x*) = 0, so [KT-2] also holds. Thus, (x*, λ*) meets the Kuhn-Tucker first-order conditions.

However, it is evident that no x* ∈ [0, 1] can be a solution to the problem: since f is unbounded on the feasible set R₊, a solution to the problem does not exist. □
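The failure in Example 8.14 can also be rendered as a direct computation (a small sketch of my own that simply restates what the example argues): the pair (x*, λ*) = (0.5, 0) satisfies the first-order conditions, yet f is unbounded above on the feasible set R₊:

```python
# Example 8.14: f(x) = x**3 for x < 0, 0 on [0, 1], (x - 1)**2 for x > 1; h(x) = x.
def f(x):
    if x < 0:
        return x ** 3
    if x <= 1:
        return 0.0
    return (x - 1) ** 2

def f_prime(x):          # derivative of f (f is C1)
    if x < 0:
        return 3 * x ** 2
    if x <= 1:
        return 0.0
    return 2 * (x - 1)

x_star, lam_star = 0.5, 0.0
h = lambda x: x          # constraint h(x) = x >= 0, so the feasible set is R_+

# The Kuhn-Tucker conditions hold at (x*, lambda*) ...
print(f_prime(x_star) + lam_star * 1.0 == 0.0)          # [KT-1]
print(lam_star >= 0 and lam_star * h(x_star) == 0.0)    # [KT-2]

# ... but x* is not a maximizer: f is unbounded above on the feasible set.
print(f(10.0), f(100.0))                                # 81.0 9801.0, grows without bound
```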


4Note that this condition does not exclude the possibility that f is itself concave. For, if f is itself concave.we can simply let 1= f ,and take IP to be the identity function lP(x) = x for all x, Then <p is a strictlyincreasing function. and 1is concave. and. of course. we now have / := rp 0 t.

Therefore, it is the case that for all t E (0. t):

I(x + try - x» - I(x) > O.t -

As t ~ 0+, the LHS of this expression converges to DI(x)(y - x), soDI(x) (y - x) 2:. 0, establishing one part of the result.Now suppose that for all x, y E 'D such that I(y) ~ ftx), we have

DI(x)(y - x) ~ 0. Pick any x, y E '0,and suppose without loss of generalitythat I(x) = min{f(x), j(y)}. Wewill show that for any t E [0, I], we must also

f'(» + t (y - x) = 1((1 - t)x + ty) ~ min{f(x), I(y)} = ftx),

8.6 A Proof of the First-Derivative Characterization of Quasi-ConvexityFirst, suppose I is quasi-concave on 'D, and let x, y E V be such that ri» ~I(x).Let t E (0, l ). Since I is quasi-concave,we have

it follows from Theorem 8.13 that everycritical point of i will identify a solution tothe transformedproblem, and, therefore, also to the original problem.Of course, if it turns out that [QC-2'] is inapplicable to the given problem, and

there is also no critical point (x , >-) of the LagrangeanL at which DI(x) =I- 0, thenit is not possible to appeal to Theorem 8.13. As we have seen, it is possible underthese circumstances that a point may meet the Kuhn-Tucker first-order conditions,and still not be a solution to the problem.

[QC-21can be used to obtain sufficiencyof the first-order conditions in the trans­formedproblem. Formally,webegin inthiscase bycheckingthe followingcondition,that we call [QC-2'):

[QC-2'] I is a strictly increasing transformof a Cl concave function ].4If [QC-2'J holds, the given optimizationproblem is equivalent to the following onein which the objective function is the concave function j:

Maximize j(x) subject to x E'D = (z E IRnI hj(z) 2:. 0, i = 1•.... II.

The concavity of j in this transformed problem implies that condition [QC-2] ofTheorem8.13 is met. If we now set up the Lagrangean

IL(x. >-) = j(x) + 'L).jh;(x).

;=1


Since x E A'[x], andx maximizes f over all of A[x), it also maximizes lover A'lx].Since Dg(x) = Df(x) i 0, we must have p(Dg(x» = 1. Since the constraint

Since A = I, it also satisfies A 2: O. Finally. at y = x , we have g(y) == 0, soAg(y) :::::O.Thus. x is a global maximum of / on A[x}.

Now define the subset A'[xJ of A[xJ by

A'[x) = Iy E ]Rn I g(y) == O}.

Since the constraint function is linear, A [x] is a convex set. Since the objectivefunction I is quasi-concave by hypothesis, Theorem 8.] 3 states that any point (y, A)meeting the Kuhn-Tucker first-order conditions identifies a solution to the problem.provided DJ(y) =I O. We will show that for A = I, the pair (x , A) meets the Kuhn­Tucker first-order conditions. Since we are assuming that D[ex) i 0, this means xis a global maximum of Jon A[x]. Indeed. since Dg(y) = -D[(x) at each y, wehave at (y, A) = (x, I),

DJ(y) + A Dg(y) = Df(x) + (-D(x» = O.

A[xj ::::: (y E lR" I g(y) ::: OJ.

8.7 A Proof of the Second-Derivative Characterization of Quasi-ConvexityWe prove Part 1 of the theorem first. Suppose / is a quasi-concave and C2 functionon the open and convex set V c lRn. Pick any xED. If Dj(x) = O. then wehave ICk(x)1 = 0 for each k ; since the first row of this matrix is null. This evidentlyimplies (-Ii \Ck (x) I 2: 0 for all k, so Part 1of the theorem is true in this case.

Now suppose D lex) =I O. Define g by g(y) = D J(x)(x - y) for y E lRn, andconsider the problem of maximizing / over the constraint set

have J[(l - t)x + ty] 2: minl!(~), I(y)}, establishing the quasi-concavity of j,For notational simplicity, let z(t) == (l - t)x + ty.

Defineg(l) = /fx+t(y-x»). Notethatg(O) == J(x) ~ J(y) = gel); andthat g is C' on [0, IJ with g'(t) = DJ[x + t(y - x)](y - x). We will show that ift* E (0, 1) is any point such that J[z(l*)J ~ J(x) (i.e., such that g(t") ~ g(O). wemust have g'(t*) = O. This evidently precludes the possibility of having any pointt E (0, 1) such that g(t) < g(O), and the desired result is established.

So suppose that t" E (0, I) and we have f(x) ::: f[z(t*)J. Then, by hypothesis,we must also have Df[z(t")J(x - zU*» = --t*Df[z«(")J(y - x) ::: O. Sincet > 0, this implies g'([*) ~ O. On the other hand. since it is also true that fey) :::I(x), we have fey) ?: /[z(t*)], so we must also have D/[z(t")](y - z tr ") =(1 - t*)Df[z(t*)J(y - x) 2: O.Since t" < I. this implies in turn that g'(t*) ':: O. Itfollows that g'(t*) = O. []


(-J)klCk(X)1 > O. k=l •...• n,

• Finally, we will show in Step 3 that if y is any other point in '0,and)", E (0, I),then f[)...x + (l - A)Y] ::: min I/(x) , fey)}. Since x was chosen arbitrarily, theproof of the quasi-concavity of I is complete.

So fix XED. We will show that x is a strict local maximum of f on A'[x]by showing (a) that the constraint qualification is met at x, (b) that there is )...suchthat (x, )",)meets the first-order conditions of the Theorem of Lagrange. and (c) that(x, A) also meets the second-order conditions of Theorems 5.4 and 5.5 for a strictlocal maximum.

First. note that since (-I)kICk(X)1 > 0 at each k, we cannot have DI(x) = O.Since Dg(y) = - Df(x) at any y E IRn. and Df(x) # O.we must have p(Dg(y» :::I at all y, and, in particular, p(Dg(x» = 1.This establishes (a). Moreover. for)",= 1,the pair (x.)",) is a critical point of the Lagrangean L(y) = I(y) + )",g(y). sinceDL(x) = DI(x) + )",Dg(x) = Df(x) - Df(x) = O.This establishes (b). Finally.note that since g is linear. we have D2L(x) = D2 I(x). From the definition of g. itnow follows that the condition

A[x] = {y E '0 I g(y) ::: O}.

• In Step 2, we will show that x is actually a global maximum of I on the constraintset

A'[x] = {y E '0 I g(y) = O}.

Since D2g(x) = 0, and Dg(x) = -DI(x), the matrix Mk is precisely Ck(X). Thus,we must have (_I)k ICk (x) I ::: 0 for k = 2, ... ,n, and this establishes the first partof the theorem.

The second part of the theorem will be proved using a three-step procedure. Fixan arbitrary point x in '0,and, as earlier, let g(y) = DI(x)(x - y).

• In Step 1, we will show that x is itself a strict local maximum of I_?nthe constraintset

Dg(x) ]

D2 f(x) + )"'D2g(x)

qualification is met, x must meet the second-order necessary conditions provided inTheorem 5.4, that is, it must be the case that the quadratic form D2f(x) +AD2g(x)is negative semidefinite on the set [z I Dg(x)z = OJ. By Theorem 5.5, a necessarycondition for this negative semidefiniteness to obtain is that the determinants of thesubmatrices Mk derived from the following matrix by retaining only the first (k + I)rows and columns should have the same sign as (_I)k, k = I, ... , n:


AD[(z)x + (l - ),_)Df(z)y ::: min{Df(z)x, Df(z)yj.Df(z)z

Since xtt" -77) E A'[X(t*)] for n E (0, to), this means that for n °but sufficientlysmall, we must have [(xU·» > [txt:" -77». This contradicts the definition of.rfr ")as the smallest minimizer of F on [0,1], and shows that 1* E (0, 1) is impossible.Therefore, we must have I'" = O. Step 2 is now complete, since t" = () impliesF(l) ~ F(O), which is the same as f(x) ::: fey).

This leaves Step 3. Pick any y E Vand any A E (0, I), and let z = Ax + (l - A»)'.Observe that

By Step I, Df(x(t*»)[x(t* - h) - x(t·)] = 0 implies that x(/·) is a strict localmaximum of f on the constraint set

Df(x(t·»[x(t· - h) - X(I*)] = 0.

Now pick any.,., E (0,1*). Observe that xU'" - 1/) - xU*) = -'1(x - y), whichimplies that

Df(x(t*»(x - y) = 0.

If (* = I, then we must have F'(t*) ::::0, or 1* could not be a minimum of F.Therefore, we must have Df(x(t·»(x - y) ::::0. However, at t" = I we also haveX(/·) = x, and since y is in A[x), we must have g(y) = Df(x)(x - y) ~ O.Combining these inequalities, we see that in this case also, we must have

Df(x(t*»(x - y) = 0.

F(t) = f(X(I».

Note that F(O) = fey), F(l) = f(x), and that F is Cl 011 [0, II with F/(I)

Df(x(t))(x - y). Let A denote the set of minimizers of F on [0,)),and let r " = inf A.We claim that I· = O.

Suppose (. i= 0. We will show the existence of a contradiction. If I· E (0, I ), thenwe must, of course, have F'(t*) = 0 since t" is an interior minimum of F on [0, II;and this implies we must have

is exactly the same as the second-order condition required under Thcorems-S.d..and 5.5. This establishes (c), proving that x is a strict local maximum of f on A'[x I.and completing Step I.

We tum to Step 2. Pick any y E A[x]. We will show that f(x) ~ fey). Since .l'

was chosen arbitrarily in A [x], Step 2 will be complete. Let x(t) = IX + (I - I) y,and define the function F on [0, I) by


Suppose first that Df(z)x ≤ Df(z)y. Then, Df(z)z ≥ Df(z)x implies Df(z)(z − x) ≥ 0. Therefore, x ∈ A[z], where, of course,

A[z] = {w ∈ ℝⁿ | Df(z)(z − w) ≥ 0}.

By Step 2, z maximizes f over A[z], so we must have f(z) ≥ f(x). Now suppose Df(z)y ≤ Df(z)x. Then, we must have Df(z)(z − y) ≥ 0, so y ∈ A[z]. Since z maximizes f over A[z], in this case we must have f(z) ≥ f(y). Thus, we must have either f(z) ≥ f(x) or f(z) ≥ f(y). This implies that f(z) ≥ min{f(x), f(y)}, so the proof is complete. □

8.8 A Proof of the Theorem of Kuhn and Tucker under Quasi-Convexity

We begin the proof of Theorem 8.13 by noting the simple fact that when hᵢ is quasi-concave, the set Dᵢ = {x | hᵢ(x) ≥ 0} is a convex set, and so then is the feasible set D = ∩ᵢ₌₁ᵏ Dᵢ. We now claim that

Lemma 8.15 Under the hypotheses of the theorem, it is the case that for any y ∈ D, we have Df(x*)(y − x*) ≤ 0.

Proof of Lemma 8.15 By hypothesis, we have

Df(x*)(y − x*) = − Σᵢ₌₁ᵏ λᵢ Dhᵢ(x*)(y − x*).

The lemma will be established if we can show that the sum on the right-hand side is nonpositive. We will prove that this is the case, by proving that for each i ∈ {1, ..., k}, it is the case that λᵢ Dhᵢ(x*)(y − x*) ≥ 0.

So pick any y ∈ D, and any i ∈ {1, ..., k}. By definition, we must have hᵢ(x*) ≥ 0. Suppose first that hᵢ(x*) > 0. Then, we must have λᵢ = 0 by the Kuhn-Tucker first-order conditions, and it follows that λᵢ Dhᵢ(x*)(y − x*) = 0. Next suppose hᵢ(x*) = 0. Since D is convex, (x* + t(y − x*)) = ((1 − t)x* + ty) ∈ D for all t ∈ (0,1), and therefore, we have hᵢ(x* + t(y − x*)) ≥ 0. It follows that for t ∈ (0,1), we also have

0 ≤ hᵢ(x* + t(y − x*))/t = [hᵢ(x* + t(y − x*)) − hᵢ(x*)]/t,

and taking limits as t → 0⁺ establishes that Dhᵢ(x*)(y − x*) ≥ 0. Since λᵢ ≥ 0, we finally obtain λᵢ Dhᵢ(x*)(y − x*) ≥ 0 in this case also. Thus, for any i ∈ {1, ..., k}, we must have λᵢ Dhᵢ(x*)(y − x*) ≥ 0, and the lemma is proved. □


We return now to the proof of Theorem 8.13. We will show that the theorem is true under each of the two identified conditions, [QC-1] Df(x*) ≠ 0, and [QC-2] f is concave.

Case 1: [QC-1] Df(x*) ≠ 0

Since Df(x*) ≠ 0, there is w ∈ ℝⁿ such that Df(x*)w < 0. Let z = x* + w. Note that we then have Df(x*)(z − x*) < 0.

Now pick any y ∈ D. For t ∈ (0,1), let

y(t) = (1−t)y + tz and x(t) = (1−t)x* + tz.

Fix any t ∈ (0,1). Then we have

Df(x*)(x(t) − x*) = t Df(x*)(z − x*) < 0,
Df(x*)(y(t) − x(t)) = (1−t) Df(x*)(y − x*) ≤ 0,

where the inequality in the first expression follows from the definition of z, and that in the second expression obtains from Lemma 8.15. When these inequalities are summed, we have

Df(x*)(y(t) − x*) < 0,

implying, by Theorem 8.8, that f(y(t)) < f(x*). Since this holds for any t ∈ (0,1), taking limits as t → 0, we have f(y) ≤ f(x*). Since y ∈ D was chosen arbitrarily, this states precisely that x* is a global maximum of f on D.

Case 2: [QC-2] f is concave

By repeating the arguments used in the proof of the Theorem of Kuhn and Tucker under Convexity (Theorem 7.16), it is readily established that Df(x*)y ≤ 0 for all y pointing into D at x*. Since f is concave and D is convex, the optimality of x* is now a consequence of Theorem 7.20. □
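Before turning to the exercises, here is a small numerical sanity check of the result just proved (a sketch of my own; the particular objective and constraint are assumptions chosen to satisfy [QC-1], not examples from the text). The objective is quasi-concave with a nonzero gradient at the candidate point, the constraint function is concave, and a grid search confirms that the point satisfying the Kuhn-Tucker first-order conditions is a global maximum on the feasible set.

```python
import numpy as np

# Candidate Kuhn-Tucker point for: maximize f(x, y) = x*y  subject to  h(x, y) = 1 - x - y >= 0,
# with x, y >= 0.  f is quasi-concave on the nonnegative orthant and Df(x*) != 0, so [QC-1] applies.
x_star, lam = np.array([0.5, 0.5]), 0.5

grad_f = np.array([x_star[1], x_star[0]])        # gradient of x*y at the candidate point
grad_h = np.array([-1.0, -1.0])                  # gradient of the constraint function
print("stationarity:", grad_f + lam * grad_h)    # ~ (0, 0)
print("complementary slackness:", lam * (1.0 - x_star.sum()))  # 0

# Grid search over the feasible set confirms the point is a global maximum.
pts = np.linspace(0.0, 1.0, 501)
X, Y = np.meshgrid(pts, pts)
feasible = X + Y <= 1.0
best_on_grid = np.max(np.where(feasible, X * Y, -np.inf))
print("best on grid:", best_on_grid, " f(x*):", x_star.prod())  # both ~ 0.25
```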

8.9 Exercises

1. Let f and g be real-valued functions on ℝ. Define the function h: ℝ² → ℝ by

h(x, y) = f(x) + g(y).

Show that if f and g are both strictly concave functions on ℝ, then h is a strictly concave function on ℝ². Give an example to show that if f and g are both strictly quasi-concave functions on ℝ, then h need not be a strictly quasi-concave function on ℝ².

2. Give an example of a strictly quasi-concave function f: ℝ₊ → ℝ which is also strictly quasi-convex, or show that no such example is possible.


3. We have seen in this chapter that if f: ℝ → ℝ is a nondecreasing function, then f is both quasi-concave and quasi-convex on ℝ. Is this true if f is instead nonincreasing?

4. Let f₁, ..., fₗ be functions mapping D ⊂ ℝⁿ into ℝ, where D is convex. Let a₁, ..., aₗ be nonnegative numbers. Show that if for each i ∈ {1, ..., l}, fᵢ is concave, then so is f, where f is defined by

f(x) = Σᵢ₌₁ˡ aᵢ fᵢ(x),   x ∈ D.

Give an example to show that if each fᵢ is only quasi-concave, then f need not be quasi-concave.

5. Let f: D → ℝ, where D ⊂ ℝⁿ is convex. Suppose f is a strictly quasi-concave function. If f is also a concave function, is it necessarily strictly concave? Why or why not?

6. Let g: ℝⁿ → ℝ be quasi-concave, and let f: ℝ → ℝ be a nondecreasing function. Show that h(x) = f[g(x)] is also quasi-concave.

7. Show that the "Cobb-Douglas" utility function u: ℝ²₊ → ℝ defined by

u(x₁, x₂) = x₁^α x₂^β,   α, β > 0,

(a) is concave if α + β ≤ 1.
(b) is quasi-concave, but not concave, if α + β > 1.

Show also that h(x₁, x₂) = log(u(x₁, x₂)) is concave for any value of α > 0 and β > 0.

8. Show that the function f: D ⊂ ℝ² → ℝ defined by f(x, y) = xy is quasi-concave if D = ℝ²₊, but not if D = ℝ².

9. Give an example of a function u: ℝ²₊ → ℝ such that u is strictly increasing and strictly quasi-concave on ℝ²₊. [Note: Strictly increasing means that if x ≥ x' and x ≠ x', then u(x) > u(x'). This rules out Cobb-Douglas type functions of the form u(x, y) = xᵃyᵇ for a, b > 0: since u(0, y) = u(x, 0) = 0 for all (x, y), u cannot be strictly increasing.]

10. Describe a continuous quasi-concave function u: ℝ²₊ → ℝ such that for all c ∈ ℝ, the lower-contour set Lᵤ(c) = {x ∈ ℝ²₊ | u(x) ≤ c} is a convex set. Would such an example be possible if u were required to be strictly quasi-concave? Why or why not?

11. A consumer gets utility not only out of the two goods that he consumes, but also from the income he has left over after consumption. Suppose the consumer's utility function is given by u(c₁, c₂, m), where cᵢ denotes the quantity consumed


of commodity i, and m ≥ 0 is the left-over income. Suppose further that u is strictly increasing in each of the three arguments. Assuming that the consumer has an income of I > 0, that the price of commodity i is pᵢ > 0, and that all the usual nonnegativity constraints hold, answer the following questions:

(a) Set up the consumer's utility maximization problem, and describe the Kuhn­Tucker first-order conditions.

(b) Explain under what further assumptions, if any, these conditions are alsosufficient for a maximum.


9

Parametric Continuity: The Maximum Theorem

The notion of a parametric family of optimization problems was defined in Section 2.2 of Chapter 2. To recall the basic notation, such a family is defined by a parameter space Θ ⊂ ℝˡ, and the specification for each θ ∈ Θ of two objects: a constraint set D(θ) ⊂ ℝⁿ and an objective function f(·, θ) defined on D(θ). We will assume throughout this chapter that we are dealing with maximization problems: that is, that the problem in question is to solve at each θ ∈ Θ,

Maximize f(x, θ) subject to x ∈ D(θ).

Deriving the corresponding results for minimization problems is a routine exercise and is left to the reader. As in Chapter 2, we will denote by f*(θ) the maximized value function which describes the supremum of attainable rewards under the parameter configuration θ; and by D*(θ) the set of maximizers in the problem at θ. That is,

f*(θ) = sup{f(x, θ) | x ∈ D(θ)}
D*(θ) = arg max{f(x, θ) | x ∈ D(θ)}.

We will sometimes refer to the pair (f*, D*) as the solutions of the given family of parametric optimization problems.

In this chapter, we examine the issue of parametric continuity: under what conditions do f*(·) and D*(·) vary continuously with the underlying parameters θ? At an intuitive level, it is apparent that for continuity in the solutions to obtain, some degree of continuity must be present in the primitives f(·,·) and D(·) of the problem. Being precise about this requires, first of all, a notion of continuity for a map such as D(·) which takes points θ ∈ Θ into sets D(θ) ⊂ ℝⁿ. Our analysis in this chapter begins in Section 9.1 with the study of such point-to-set maps, which are called correspondences.¹

¹ Note also that the set of solutions D*(θ) will not, in general, be single-valued. Thus, the question of when this set varies continuously with θ also requires a notion of continuity for point-to-set maps.


Section 9.2 is the centerpiece of this chapter. In subsection 9.2.1, we prove one of the major results of optimization theory, the Maximum Theorem. Roughly speaking, the Maximum Theorem states that continuity in the primitives is inherited by the solutions, but not in its entirety; that is, some degree of continuity in the problem is inevitably lost in the process of optimization. In subsection 9.2.2, we then examine the effect of placing convexity restrictions on the primitives, in addition to the continuity conditions required by the Maximum Theorem. We label the result derived here the Maximum Theorem under Convexity. We show here that, analogous to the continuity results of the Maximum Theorem, the convexity structure of the primitives is also inherited by the solutions, but, again, not in its entirety.

Finally, Sections 9.3 and 9.4 present two detailed worked-out examples, designed to illustrate the use of the material of Section 9.2.

9.1 Correspondences

Let Θ and S be subsets of ℝˡ and ℝⁿ, respectively. A correspondence Φ from Θ to S is a map that associates with each element θ ∈ Θ a (nonempty) subset Φ(θ) ⊂ S. To distinguish a correspondence notationally from a function, we will denote a correspondence Φ from Θ to S by Φ: Θ → P(S), where P(S) denotes the power set of S, i.e., the set of all nonempty subsets of S.

In subsection 9.1.1 below, we explore some definitions of continuity for a correspondence, based on "natural" generalizations of the corresponding definition for functions. Some additional structure on correspondences is then provided in subsection 9.1.2. In subsection 9.1.3, the various definitions are related to each other and characterized. Finally, the definitions for correspondences and functions are compared in subsection 9.1.4.

9.1.1 Upper- and Lower-Semicontinuous Correspondences

Any function f from Θ to S may also be viewed as a single-valued correspondence from Θ to S. Thus, an intuitively appealing consideration to keep in mind in defining a notion of continuity for correspondences is that the definition be consistent with the definition of continuity for functions; that is, we would like the two definitions to coincide when the correspondence in question is single-valued. Recall that a function f: Θ → S is continuous at a point θ ∈ Θ if and only if for all open sets V such that f(θ) ∈ V, there is an open set U containing θ such that for all θ' ∈ Θ ∩ U, we have f(θ') ∈ V. In extending this definition to a notion of continuity for correspondences Φ: Θ → P(S), a problem arises: there are at least two ways to replace the condition that

f(·) ∈ V


with a condition appropriate for correspondences: namely, one can require either that

Φ(·) ⊂ V,

or that

Φ(·) ∩ V ≠ ∅.

These two conditions coincide when Φ is single-valued everywhere (for, in this case, Φ(·) ⊂ V if and only if Φ(·) ∩ V ≠ ∅), but this is evidently not true if Φ is not single-valued. This leads us to two different notions of continuity for correspondences.

First, a correspondence Φ: Θ → P(S) is said to be upper-semicontinuous or usc at a point θ ∈ Θ if for all open sets V such that Φ(θ) ⊂ V, there exists an open set U containing θ, such that θ' ∈ U ∩ Θ implies Φ(θ') ⊂ V. We say that Φ is usc on Θ if Φ is usc at each θ ∈ Θ.

Second, the correspondence Φ is said to be lower-semicontinuous or lsc at θ ∈ Θ if for all open sets V such that V ∩ Φ(θ) ≠ ∅, there exists an open set U containing θ such that θ' ∈ U ∩ Θ implies V ∩ Φ(θ') ≠ ∅. The correspondence Φ is said to be lsc on Θ if it is lsc at each θ ∈ Θ.

Combining these definitions, the correspondence Φ: Θ → P(S) is said to be continuous at θ ∈ Θ if Φ is both usc and lsc at θ. The correspondence Φ is continuous on Θ if Φ is continuous at each θ ∈ Θ.

The following examples illustrate these definitions. The first presents an example of a correspondence that is usc but not lsc, the second describes one that is lsc but not usc, and the third one that is both usc and lsc, and, therefore, continuous.

Example 9.1 Let Θ = S = [0, 2]. Define Φ: Θ → P(S) by

Φ(θ) = {1} if 0 ≤ θ < 1, and Φ(θ) = S if 1 ≤ θ ≤ 2.

This correspondence is graphed in Figure 9.1. It is not very difficult to see that Φ is both usc and lsc at all θ ≠ 1. For, suppose θ < 1. Let ε = (1 − θ)/2. Let U be the open interval (θ − ε, θ + ε). Then, for all θ' ∈ U ∩ Θ, we have Φ(θ') = Φ(θ). Therefore, if V is any open set such that Φ(θ) ⊂ V, we also have Φ(θ') ⊂ V for all θ' ∈ U ∩ Θ; while if V is any open set such that V ∩ Φ(θ) ≠ ∅, we also have V ∩ Φ(θ') ≠ ∅ for all θ' ∈ U ∩ Θ. Therefore, Φ is both usc and lsc at all θ < 1. A similar argument establishes that it is both usc and lsc at all θ > 1.

At θ = 1, if V is any open set containing Φ(1) = [0, 2], then V contains Φ(θ') for all θ' ∈ Θ, so Φ is clearly usc at θ = 1. However, Φ is not lsc at θ = 1. To see this, consider the open interval V = (¼, ¾). Clearly, V has a nonempty intersection with Φ(1), but an empty intersection with Φ(θ) for all θ < 1. Since any open set


U containing θ = 1 must also contain at least one point θ' < 1, it follows that for this choice of V, there is no open set U containing θ = 1 which is also such that Φ(θ') ∩ V ≠ ∅ for all θ' ∈ U ∩ Θ. Therefore, Φ is not lsc at θ = 1. □

Example 9.2 Let Θ = S = [0, 2]. Define Φ: Θ → P(S) by

Φ(θ) = {1} if 0 ≤ θ ≤ 1, and Φ(θ) = S if 1 < θ ≤ 2.

A graphical illustration of Φ is provided in Figure 9.2. The same argument as in the earlier example establishes that Φ is both usc and lsc at all θ ≠ 1. At θ = 1, Φ(θ) can have a nonempty intersection with an open set V if and only if 1 ∈ V. Since we also have 1 ∈ Φ(θ') for all θ' ∈ Θ, it follows that Φ is lsc at 1.

However, Φ is not usc at θ = 1: the open interval V = (½, 3/2) contains Φ(1), but fails to contain Φ(θ') for any θ' > 1. □

Fig. 9.1. A Correspondence That Is Usc, But Not Lsc

Fig. 9.2. A Correspondence That Is Lsc, But Not Usc
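The two failures above are easy to reproduce by brute force. The following sketch (an illustration only; the sampled grid and the particular open intervals V are my own choices, mirroring the argument in the text) represents each image Φ(θ) by a fine sample of points and tests the two defining conditions for parameter values just below and just above θ = 1.

```python
import numpy as np

S = np.linspace(0.0, 2.0, 2001)          # a fine sample of S = [0, 2]

def phi_ex_9_1(theta):
    """Example 9.1: {1} for theta < 1, all of S for theta >= 1."""
    return np.array([1.0]) if theta < 1 else S

def phi_ex_9_2(theta):
    """Example 9.2: {1} for theta <= 1, all of S for theta > 1."""
    return np.array([1.0]) if theta <= 1 else S

def contained(image, a, b):   # is Phi(theta) a subset of (a, b)?
    return bool(np.all((image > a) & (image < b)))

def intersects(image, a, b):  # does Phi(theta) meet (a, b)?
    return bool(np.any((image > a) & (image < b)))

# Example 9.1 at theta = 1: usc holds, but lsc fails for V = (0.25, 0.75).
for theta in [0.999, 1.0, 1.001]:
    print("9.1", theta, "V meets image:", intersects(phi_ex_9_1(theta), 0.25, 0.75))

# Example 9.2 at theta = 1: lsc holds, but usc fails for V = (0.5, 1.5).
for theta in [0.999, 1.0, 1.001]:
    print("9.2", theta, "image inside V:", contained(phi_ex_9_2(theta), 0.5, 1.5))
```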


Example 9.3 Let Θ ⊂ ℝˡ and S ⊂ ℝⁿ be arbitrary, and let K be any subset of S. Define Φ: Θ → P(S) by

Φ(θ) = K,   θ ∈ Θ.

Figure 9.3 presents this correspondence in a graph. A correspondence such as Φ is called a "constant-valued" or simply a "constant" correspondence. Pick any open set V, and any θ ∈ Θ. Then

• Φ(θ) ⊂ V if and only if Φ(θ') ⊂ V for all θ' ∈ Θ, and
• Φ(θ) ∩ V ≠ ∅ if and only if Φ(θ') ∩ V ≠ ∅ for all θ' ∈ Θ.

It follows that a constant correspondence is both usc and lsc at all points in its domain. □

Fig. 9.3. A Continuous Correspondence

9.1.2 Additional Definitions

In this subsection, we define some additional concepts pertaining to correspondences. These are extremely useful in applications, but are also of help in understanding and characterizing the definitions of semicontinuity given above.

So let Θ ⊂ ℝˡ and S ⊂ ℝⁿ. A correspondence Φ: Θ → P(S) is said to be

1. closed-valued at θ ∈ Θ if Φ(θ) is a closed set;
2. compact-valued at θ ∈ Θ if Φ(θ) is a compact set; and
3. convex-valued at θ ∈ Θ if Φ(θ) is a convex set.

If a correspondence Φ is closed-valued (resp. compact-valued, convex-valued) at all θ ∈ Θ, then we will simply say that Φ is a closed-valued (resp. compact-valued, convex-valued) correspondence.

The graph of a correspondence Φ: Θ → P(S), denoted Gr(Φ), is defined as

Gr(Φ) = {(θ, s) ∈ Θ × S | s ∈ Φ(θ)}.

Note that Gr(Φ) is a subset of ℝˡ × ℝⁿ, since Θ ⊂ ℝˡ and S ⊂ ℝⁿ.


The correspondence Φ is said to be a closed correspondence or a closed-graph correspondence if Gr(Φ) is a closed subset of ℝˡ × ℝⁿ, that is, if it is the case that for all sequences {θm} in Θ such that θm → θ ∈ Θ,

sm ∈ Φ(θm), sm → s implies s ∈ Φ(θ).

Similarly, Φ is said to have a convex graph if Gr(Φ) is a convex subset of ℝˡ × ℝⁿ, i.e., if for any θ and θ' in Θ, and any s ∈ Φ(θ) and s' ∈ Φ(θ'), it is the case that

λs + (1 − λ)s' ∈ Φ[λθ + (1 − λ)θ'],   λ ∈ (0, 1).

Note that a closed-graph correspondence is necessarily closed-valued, but the converse is not true. Similarly, every correspondence with a convex graph is also convex-valued, but the converse is again false. Here is an example illustrating both points:

Example 9.4 Let Θ = S = [0, 1], and define Φ by

Φ(θ) = {θ} if 0 ≤ θ < 1, and Φ(θ) = {0} if θ = 1.

Clearly, Φ(θ) is compact and convex for each θ ∈ Θ, so Φ is both convex-valued and compact-valued. It does not have a closed graph since the sequence {(θm, sm)} defined by θm = sm = 1 − 1/m is in the graph of Φ, but this sequence converges to (1, 1), which is not in the graph of Φ. Nor does Φ have a convex graph since (θ, s) = (0, 0) and (θ', s') = (1, 0) are both in the graph of Φ, but the convex combination (½, 0) = ½(θ, s) + ½(θ', s') is not in the graph of Φ. □

9.1.3 A Characterization of Semicontinuous Correspondences

We present in this subsection a number of results that characterize semicontinuous correspondences. The discussion here is based on the excellent summary of Hildenbrand (1974).

Some preliminary definitions will help simplify the exposition. Given a set X ⊂ ℝᵏ and a subset A of X, we will say that A is open in X if there exists an open set U in ℝᵏ such that A = X ∩ U. We will also say that the subset B of X is closed in X if the set {x ∈ X | x ∉ B} is open in X. (The set {x ∈ X | x ∉ B} is called the complement of B in X. Abusing notation, we will denote this set by Bᶜ.) Thus, for instance, the set [0, a) is open in the closed unit interval [0, 1] for any a ∈ [0, 1), while the set (0, 1) is closed in itself.

Let a correspondence Φ: Θ → P(S) be given, where Θ ⊂ ℝˡ and S ⊂ ℝⁿ. Let W be any set in ℝⁿ. Define the upper inverse of W under Φ, denoted Φ⁺¹(W), by

Φ⁺¹(W) = {θ ∈ Θ | Φ(θ) ⊂ W}.


2Such a set must exist since W is open in El.

Proposition 9.6 Thefollowing conditionsare equivalent:

1. ¢ is lsc.

The desired result obtains since a set H is open in e (resp. S) if and only if e - His closed in e (resp. S - H is closed in S).

It remains to be shown that Condition 2 implies Condition 1.Suppose Condition 2holds. Let t) E e. and let V be any open set containing <1>(t). By Condition 2, theset W = <1>:;: I (V) is open in e. W is nonempty, since 0 E W. Therefore, if U is anyopen set such that U ne = W,2 we have (} E U and <1>«(}') C V for all 0' E Un e,which states precisely that <1> is usc at (J. Since () was arbitrary, we are done. 0

Φ⁺¹(S − A) = {θ ∈ Θ | Φ(θ) ⊂ S − A}
           = {θ ∈ Θ | Φ(θ) ∩ A = ∅}
           = Θ − {θ ∈ Θ | Φ(θ) ∩ A ≠ ∅}
           = Θ − Φ⁻¹(A).

So suppose first that Condition 1 holds. IfCondition 2 fails, ti1ere is an open set Gin S such that the set <1>:;: 1 (G) is not open in e.Therefore, there is 0 E <1>:;: 1 (G) suchthat for each E > 0, there is O'(E) E B(s, s) ne such that <1>(t)'(E» is not containedin G. But then <1> is not usc at 0, so Condition 1 also fails.

Next we show that Condition 2 holds if and only if Condition 3 holds. Given anytwo sets E and F, define E - F to be the set En Fe, that is E - F is the set ofelements of E that are not in F. Now, for all AcT,

Proof We prove the following series of implications:

Proposition 9.5 The following conditions are equivalent:

1. Φ is usc on Θ.
2. Φ⁺¹(G) is open in Θ for every G that is open in S.
3. Φ⁻¹(F) is closed in Θ for every F that is closed in S.

Observe that the definitions of lower- and upper-inverses coincide with each other,as well as with the definition of the inverse of a function, when the correspondenceis single-valued.

Similarly, the lower inverse of W under Φ, denoted simply Φ⁻¹(W), is defined by

Φ⁻¹(W) = {θ ∈ Θ | Φ(θ) ∩ W ≠ ∅}.


Proof Suppose <l> is usc at 8 E 8.Suppose also that ep --+ e, and sp E <P (Op) for allp. We are required to show that (i) there exists a convergent subsequence Skip) of sp.

and (ii) the limit s of this subsequence is in ¢(8). The set K = (O)U{8" 82. e3, ... J isclearly compact. Therefore, by Proposition 9.7, so is <l>(K). Hence, (spJ is a sequencein a compact set, and posseses a convergent subsequence, denoted (say) Sk(p). Thisproves (i). Let s = limp-> 00 SkiP)' If S f/:. 4>(0), then there is a closed neighborhood

Proposition 9.8 Let Φ: Θ → P(S) be a compact-valued correspondence. Then, Φ is usc at θ ∈ Θ if and only if for all sequences θp → θ ∈ Θ and for all sequences sp ∈ Φ(θp), there is a subsequence s_k(p) of sp such that s_k(p) converges to some s ∈ Φ(θ).

K c U~= IW0; = U~= I<l> ~ 1(Vo, ).

Therefore, the collection (VII; )~=1 is an open cover of <l> (K). Since each Vo; contai nsonly finitely many elements of (Va)aeA, it follows that (Vo; )~=l is a finite subcovcrof the open cover (Va )aeA.

We have shown that an arbitrary open cover of <l> (K) must have a finite subcovcr,which is precisely the statement that <l>(K) is compact. []

Proof We will use the characterization of compactness provided in Theorem 1.36of Chapter 1. that a set E C JR" is compact if and only if every open cover of E hasa finite subcover.

Let (Va)aeA be an open cover of <t>(K).Pick any e E 8. Then, <1> (e) c <t>(K)cUaeA Va. Since <p(e) is compact by hypothesis, there exists a finite subcollection80 C A such that <t>(8) C UfleBo Vf3. Let Vo = UflEBO Vfl. As the union of open sets.Vo is open.

By Proposition 9.5, the set Wo = <l>~ 1(Vo) is open in Eo). Trivially. e E Wo.Therefore, we have K C UfieK Woo which means that UOeK Wo is an open cover ofthe compact set K. Therefore, there exist finitely many indices 81 •... , 8, E K suchthat

Proposition 9.7 Let Φ: Θ → P(S) be a compact-valued, usc correspondence. Then, if K ⊂ Θ is compact, so is Φ(K) = {s ∈ S | s ∈ Φ(θ) for some θ ∈ K}. That is, compact-valued usc correspondences preserve compactness.

Proof This result may be proved in a manner similar to the proof of Proposition 9.5. The details are left as an exercise. □


2. Φ⁺¹(F) is closed in Θ for every F that is closed in S.
3. Φ⁻¹(G) is open in Θ for every G that is open in S.


Proof Suppose <I> is Isc at O. Let s E <1>(0), and let {Om} be a sequence in e suchthat em --+ e. For p = 1,2, .. " Jet Bts, t) denote the open ball in ~I with center S

and radius ~. Since <I> is Isc at e, it is the case that for each p, there exists an openset Up containing e such that <I>«()') n Bis, i) =1= 0 for each f)f E Up n e. SinceOm --+ 0, it is the case that for each p, there exists m(p) such that Om E Up for allIII ::::m(p). Obviously, the sequence m(p) can be chosen to be increasing, i.e., tosatisfy III(p + I) > m (p) for all p, Now, if m is such that m (p) ~ m < m (p + I),define Sill to be any point in the set <l>(em) n B(t, ~), Since m ::: m(p) impliesem E Up. it follows from the definition of Up that this intersection is nonempty. Byconstruction, the distance between Sm and s goes to zero as m ~ 00. This provesthe first result.

Now suppose that for all s E <1>(8)and for all sequences em ~ 0, there is asubsequence {Ok(m)1 of {OmI and Sk(m) E <1>(Ok(m» such that Skim) ~ s. Suppose <I>is not lsc at fJ, Then. there is an open set V such that V n <1>(8) =1= 0, and such thatfor all open sets U containing e, there is Of E un e such that <1>(0') n V = 0. Forp = 1,2 •... , let Up be the open ball B(e, t) with center 8 and radius ~. Letf)p E Upbe any point such that <1>«()p) n V = 0. Pick any s E <1>(8) nV. By hypothesis in thiscase, there is a subsequence k(p) and Sk(p) E <I>(Ok(p») such that Sk(p) ~ s. SinceV is open, and S E V, it must be the case that Skip) E V for all p sufficiently large.But this contradicts the definition of (Jp as a point where <I>«(Jp) n V == 0. 0

Proposition 9.9 Suppose the correspondence Φ: Θ → P(S) is lsc at θ, and s ∈ Φ(θ). Then, for all sequences θm → θ, there is sm ∈ Φ(θm) such that sm → s.

Conversely, let Φ: Θ → P(S), and let θ ∈ Θ, s ∈ Φ(θ). Suppose that for all sequences θm → θ, there is a subsequence k(m) of m, and s_k(m) ∈ Φ(θ_k(m)) such that s_k(m) → s. Then, Φ is lsc at θ.

G ::J <1>(8)such that s rt. G. But <1>(8p) C G for sufficiently large p, so skip) E G,and since skip) ~ t, we must have s E G after all, a contradiction. This proves (ii).

Conversely. suppose that for all ()p --+ S and for all sp E <1>«(),,) , there is skip)converging to some s E <1>«(). Suppose also that <1> is compact-valued. Suppose <1>is not usc at some 8 E e, that is, there exists an open set V containing <1>(8) suchthat for all open sets U containing 8, there exists ()' E un e such that <1>«()') is notcontained in V. For m == 1,2, ... , let Urn be the open ball B(e, *) around (), and letOm E Um n e be such that <f>«(}m) is not contained in V. Pick any point Sm E <I>(8m)

with Sm 1: V. By construction, it is the case that Om --+ O. Since .'1m E <1>(Om) for eachm, it is true by hypothesis that there is a subsequence Sk(m) of Sm and S E <1>(0) suchthat Skim) --+ s. But Sm ¢ V for each m, and V is open. Therefore, s f/:: V, whichcontradicts the assumption that <1>(8) C V. 0


Theorem 9.12 A single-valued correspondence that is semicontinuous (whether usc or lsc) is continuous when viewed as a function. Conversely, every continuous function, when viewed as a single-valued correspondence, is both usc and lsc.

9.1.4 Semicontinuous Functions and Semicontinuous Correspondences

We have mentioned above that when <I> is single-valued at each f} E (-), it can heviewed as a function from (0:) to S, This section explores further the relationshipbetween the continuity of a single-valued correspondence, and the continuity ofthe same map when it is viewed as a function. The following preliminary result isimmediate from the definitions, and was, indeed, used to motivate the concept ofcrrmcontinuity,

Example 9.11 (A compact-valued, closed-graph correspondence that is not usc) Let Θ = S = ℝ₊. Define Φ: Θ → P(S) by

Φ(θ) = {0} if θ = 0, and Φ(θ) = {1/θ} if θ > 0.

It is easily checked that Gr(Φ) is a closed subset of ℝ²₊. Evidently, Φ is also a compact-valued correspondence. However, Φ is not usc at θ = 0: if V is any bounded open set containing Φ(0) = {0}, then V cannot contain Φ(θ) for any θ which is positive, but sufficiently close to 0. □

Example 9.10 (An upper-sernicontinuous correspondence that does not have aclosed graph) Let e = s = [0, I]. and let <1>(11) = (0, I) for all () E 8. Then,<I> is usc (and, in fact, continuous) on e since it is a constant correspondence, butOr(<I» is not closed. Indeed, <I> is not even closed-valued. U
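A rough numerical probe makes the distinction in Example 9.11 concrete (a sketch of my own, using finite samples in place of open sets): points on the graph run off to infinity near θ = 0, so no sequence of graph points converges to a point outside the graph, yet the images Φ(θ) escape any fixed bounded open set around Φ(0) = {0}, which is exactly the failure of upper-semicontinuity.

```python
import numpy as np

def phi(theta):
    # Example 9.11: {0} at theta = 0, {1/theta} for theta > 0.
    return np.array([0.0]) if theta == 0 else np.array([1.0 / theta])

# Graph points (theta, 1/theta) blow up as theta -> 0, so they cannot converge
# to a limit outside the graph: consistent with Gr(Phi) being closed.
print([(t, float(phi(t)[0])) for t in [0.5, 0.1, 0.02]])

# Failure of usc at 0: V = (-1, 1) is a bounded open set containing Phi(0) = {0},
# but Phi(theta) escapes V for every small theta > 0.
for t in [0.5, 0.1, 0.02]:
    print(f"theta={t}: Phi(theta) inside (-1, 1)?", bool(np.all(np.abs(phi(t)) < 1.0)))
```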

We close this section with an important warning. It is not uncommon in the economics literature to define an upper-semicontinuous correspondence as one with a closed graph. Such a definition is legitimate provided the range space S of the correspondence is compact, and the correspondence itself is compact-valued, for, as Proposition 9.8 then shows, a correspondence is usc if and only if it has a closed graph. However, in general, the link between usc and closed-graph correspondences is weak: it is not the case that usc correspondences necessarily have closed graphs, nor is it true that all closed-graph correspondences are usc (indeed, even if they are also compact-valued). Consider the following examples:


Since a continuous function is both lsc and usc, it easily follows that a correspondence which is single-valued and usc is both usc and lsc, when viewed as a function. However, a function which is only usc but not lsc (or only lsc and not usc) is neither lsc nor usc when viewed as a single-valued correspondence. This is quite easy to see. Suppose that f were, for instance, usc, but not lsc. Then, there must exist a point x and a sequence xp → x such that f(x) > lim_p f(xp). Let ε = f(x) − lim_p f(xp). Take the open ball (f(x) − ε/2, f(x) + ε/2) around f(x). This open ball does not contain f(xp) for any large p, so, viewed as a single-valued correspondence, f is neither lsc nor usc.

There are notions of upper- and lower-semicontinuity for functions also, and somewhat unfortunately, the terminology is misleading.³ We elaborate on this below.

A function f: D ⊂ ℝⁿ → ℝ is said to be upper-semicontinuous or usc at x ∈ D if for all sequences xk → x, lim sup_{k→∞} f(xk) ≤ f(x). The function f is said to be lower-semicontinuous or lsc at x if −f is usc at x, i.e., if lim inf_{k→∞} f(xk) ≥ f(x).

A semicontinuous (usc or lsc) function need not be a continuous function; indeed, a function is continuous (if and) only if it is both usc and lsc. The importance of semicontinuous functions for optimization theory lies in the following generalization of the Weierstrass Theorem:

Theorem 9.13 (Generalized Weierstrass Theorem) Let D ⊂ ℝⁿ be compact and f: D → ℝ.

1. If f is usc on D, it attains its supremum on D, i.e., there is z₁ ∈ D such that f(z₁) ≥ f(x) for all x ∈ D.
2. If f is lsc on D, it attains its infimum on D, i.e., there exists z₂ ∈ D such that f(z₂) ≤ f(x) for all x ∈ D.

Proof Suppose f is usc on D. Let a = sup f(D). By definition of the supremum, there must exist a sequence {ap} in f(D) such that ap → a. Since ap ∈ f(D) for each p, there must be xp ∈ D such that f(xp) = ap, and f(xp) → a. Since D is compact, there must exist a subsequence x_k(p) of xp and z₁ ∈ D such that x_k(p) → z₁. Obviously, f(x_k(p)) → a, and since f is usc, we must have f(z₁) ≥ lim sup_{p→∞} f(x_k(p)) = a, so by the definition of a, f(z₁) = a, which establishes Part 1. Since f is usc if and only if −f is lsc, Part 2 is also proved. □

³ Perhaps on this account, it is not uncommon in the economics literature to use the adjective "hemicontinuous" when referring to correspondences, and to reserve "semicontinuous" for functions. Since very little of this book involves semicontinuous functions, we continue to use "semicontinuous" in both cases.
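A quick numerical illustration of the theorem (a sketch only; the particular function below is my own choice, not one from the text): the function is usc but not lsc on [0, 1], and a grid search confirms that its supremum is attained while its infimum is not.

```python
import numpy as np

def f(x):
    # usc on [0, 1]: the value jumps *up* to 1 at the single point x = 0.
    return 1.0 if x == 0 else x

grid = np.linspace(0.0, 1.0, 100001)
values = np.array([f(x) for x in grid])

print("max over grid:", values.max())   # 1.0 -- the supremum is attained (at x = 0 and x = 1)
print("min over grid:", values.min())   # ~1e-5, and it keeps shrinking as the grid is refined:
                                        # the infimum 0 is never attained
```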


4Thc mathematics literature on parametric optimization contains a large number of related results. whichexamine the consequences of weakening or otherwise altering the hypotheses of the Maximum Theorem.For details, the reader is referred 10 Berge (1963) or Bank. ct al. (1983).

Proof We will appeal to Propositions 9.8 and 9.9. So let () E e, and let (}mbe asequence in e converging to (). Pickr., E v·(em). By Proposition 9.8, there exists asubsequence Xk(m) of Xm such thatxk(m) -+ x E Dee). The theorem will be proved ifwe can show that f ix , 8) = f* (8), for this will establish not only the continuity off* at (), but also (by Proposition 9.8) the upper-semicontinuity of the correspondenceV* at e.

Since j is continuous on ex S, it is the case that f(Xk(m), (}k(m» -+ [t;x , 0). Sup­pose there were x* E D(e) such that ftx", (J) > f(x, 8). We show a contradictionmust result.

Since V is lsc at (), x· E D(e), and Xk(m) --+ x, there is, by Proposition 9.9, afurther subsequence l(m) of k(m) and xt("') E D(O'(m» such that x/~m) ---> x*. Itfollows from the continuity of j that

lim f(x~m)' e/(m)) = j(x·, e) > ft», 8) = lim l(x/(m),8/(m»'m~oo m~oo

Theorem 9.14 (The Maximum Theorem) Let f: S × Θ → ℝ be a continuous function, and D: Θ → P(S) be a compact-valued, continuous correspondence. Let f*: Θ → ℝ and D*: Θ → P(S) be defined by

f*(θ) = max{f(x, θ) | x ∈ D(θ)}
D*(θ) = arg max{f(x, θ) | x ∈ D(θ)} = {x ∈ D(θ) | f(x, θ) = f*(θ)}.

Then f* is a continuous function on Θ, and D* is a compact-valued, upper-semicontinuous correspondence on Θ.

9.2.1 The Maximum TheoremThe Maximum Theorem is essentially a statement that if the primitives of a parametricfamily of optimization problems possess a sufficient degree of continuity. then thesolutions will also be continuous, albeit not to the same degree.

9.2 Parametric Contlnulty: The Maximum TheoremThis section describes the two main results of this chapter. In subsection 9.2.1. wepresent one of the fundamental results of optimization theory. the Maximum Theo­rem. This is followed in subsection 9.2.2 by an examination of the problem, whenadditional convexity conditions are imposed on the family of optimization problems."Throughout this section, it will be assumed that e c ]R' and S c lRn. where I and /I

are arbitrary positive integers.


Example 9.15 Let Θ = [0, 1] and S = [1, 2]. Define f: S × Θ → ℝ and D: Θ → P(S) by

f(x, θ) = x^θ,   (x, θ) ∈ S × Θ,

and D(θ) = [1, 2] for all θ ∈ Θ. Note that D(·) is continuous and compact-valued on Θ, while f is continuous on S × Θ.

For θ > 0, f is strictly increasing on S, so we have

D*(θ) = {2},   θ > 0.

At θ = 0, however, f is identically equal to unity on D(0), so

D*(θ) = [1, 2],   θ = 0.

Clearly, D*(θ) is usc, but not lsc, at θ = 0. □

Example 9.16 Let S = Θ = [0, 1]. Define f: S × Θ → ℝ and D: Θ → P(S) by

f(x, θ) = x,   (x, θ) ∈ S × Θ,

and

D(θ) = [0, 1] if θ = 0, and D(θ) = {0} if θ > 0.

Trivially, f is continuous on S × Θ. The correspondence D(·) is continuous and compact-valued everywhere on Θ except at θ = 0, where it is usc, but not lsc. We have

It is important to reiterate that although we assume full continuity in the primitivesof the problem, the Maximum Theorem claims only that IlPper-semicontinuity of theoptimal action correspondence will result, and not lower-semicontinuity. This raisestwo questions. First, is it possible, under the theorem's hypotheses, to strengthenthe conclusions and obtain full continuity of V*'! Alternatively, is it possible toobtain the same set of conclusions under a weaker hypothesis, namely, that 1) is onlyupper-semicontinuous, and not necessarily also lower-sernicontinuous, on 8? Thefollowing examples show, respectively, that the answer to each of these questions isin the negative.

oand therefore, XI(m) ¢ V*(OI(m», a contradiction.

But this means that for all sufficiently large m, we must have


1. f* is a continuous function on Θ, and D* is a usc correspondence on Θ.

2. If f(·, θ) is concave in x for each θ, and D is convex-valued (i.e., D(θ) is a convex set for each θ), then D* is a convex-valued correspondence. When "concave" is replaced by "strictly concave," then D* is a single-valued usc correspondence, hence a continuous function.

3. If f is concave on S × Θ, and D has a convex graph, then f* is a concave function, and D* is a convex-valued usc correspondence. If "concave" is replaced by "strictly concave," then f* is also strictly concave, and D* is single-valued everywhere, and therefore, a continuous function.

ThcII:

f*(O) = max(j(x, 0) I x E V(O»)

V*(8) = argmax(j(x,8) I x E V(8») = (x E V(O) I ft», 0) = /*(8»).

Theorem 9.17 (The Maximum Theorem under Convexity) Suppose f is a CO/l­

tinuous junction on S x e and V is a compact-valued continuous correspondenceone. Let

9.2.2 The Maximum Theorem under ConvexityThe purpose of this subsection is to highlight some points regarding the MaximumTheorem under convexity restrictions on the underlying problem. We retain the no­tation of the previous subsection. We will prove the following result:

Finally, a small note. The joint continuity of I in (x, 0), i.e., the continuity of Ion S x 8, is important for the validity of the Maximum Theorem. This hypothesiscannot be replaced with one of separate continuity, i.e., that f(', 8) is continuouson S for each fixed 0, and that f(x, .) is continuous on e for each f xed x. For ..Illexample, see the Exercises.

f*(θ) = 1 if θ = 0, and f*(θ) = 0 if θ > 0,

and

D*(θ) = {1} if θ = 0, and D*(θ) = {0} if θ > 0.

Both conclusions of the Maximum Theorem fail: f* is not continuous at θ = 0, and D* is not usc (or, for that matter, lsc) at θ = 0. □
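The contrast between Examples 9.15 and 9.16 can be reproduced by brute force. The sketch below (illustrative only; it assumes the functional forms reconstructed above) computes f*(θ) on a grid of parameters by maximizing over a fine sample of D(θ), and shows the value function varying continuously in the first problem but jumping at θ = 0 in the second.

```python
import numpy as np

def value_function(f, D_sample, thetas):
    """Brute-force f*(theta): maximize over a finite sample of D(theta)."""
    return [max(f(x, t) for x in D_sample(t)) for t in thetas]

thetas = [0.0, 0.01, 0.1, 0.5, 1.0]

# Example 9.15: f(x, theta) = x**theta on the fixed set D(theta) = [1, 2].
ex15_f = lambda x, t: x ** t
ex15_D = lambda t: np.linspace(1.0, 2.0, 201)
print("Example 9.15:", value_function(ex15_f, ex15_D, thetas))   # ~ 2**theta: continuous in theta

# Example 9.16: f(x, theta) = x, with D(0) = [0, 1] but D(theta) = {0} for theta > 0.
ex16_f = lambda x, t: x
ex16_D = lambda t: (np.linspace(0.0, 1.0, 201) if t == 0 else np.array([0.0]))
print("Example 9.16:", value_function(ex16_f, ex16_D, thetas))   # jumps from 1 down to 0
```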


⁵ In fact, f is concave on S × Θ.

Example 9.18 Let S = Θ = [0, 1]. Define f: S × Θ → ℝ and D: Θ → P(S) by f(x, θ) = x for all (x, θ) ∈ S × Θ, and D(θ) = [0, θ²], θ ∈ Θ. Then, f is concave on S for each fixed θ,⁵ and D(·) is convex-valued for each θ. Note, however, that D(·) does not have a convex graph. Since f is strictly increasing on S for each θ, we have

Once again, simple examples show that the result cannot be strengthened. Forinstance, Part 2 of the theorem assumes that I is concave and V is convex-valued;while the convex-valuedness of V is inherited by D*, the theorem makes no claimregarding the concavity of f*. In fact, the hypotheses of Part 2 are insufficient toobtain concavity of 1*; the stronger assumption made in Part 3 that D has a convexgraph is needed. Here is an example:

This establishes the concavity of f*. If I is strictly concave, then the second in­equality in this string becomes strict, proving strict concavity of 1*. 0

f*(θ') ≥ f(x', θ') = f(λx + (1−λ)x̂, λθ + (1−λ)θ̂)
       ≥ λ f(x, θ) + (1−λ) f(x̂, θ̂) = λ f*(θ) + (1−λ) f*(θ̂).

so, by the definition of f*, we must also have x' E '0* (s), If f is strictly concave,then, since Vee) is a convex set, f(·,8) has a unique maximizer, so '0* must besingle-valued. This completes the proof of Part 2.

Finally, to see Part 3, let (J, e E e, and let 9' = J...(J+ (I - ),)0 for some). E (0, I).Pick any x E '0*(0), and i E V(O). Let x' = Ax + (1 - A)i. Then, since x E V(O)and; E V(e), and VO has a convex graph, we must have x' E V«(J'). Since x' isfeasible, but not necessarily optimal at 0',we have by the concavity of I,

f(x', θ) = f[λx + (1−λ)x̂, θ]
        ≥ λ f(x, θ) + (1−λ) f(x̂, θ)
        = λ f*(θ) + (1−λ) f*(θ)
        = f*(θ),

Proof Part 1 is just the statement of the Maximum Theorem. To see Part 2, supposex, i E 1)*(B). Let x' = Ax + (1 - A)i for some A E (0, 1). Since D(e) is convex,x' E D(e). Then,

Chapter 9 The Maximum Theorem238

Page 247: A First Course in Optimization Theory

Corollary 9.20 Let f: S × Θ → ℝ be continuous, and D: Θ → P(S) be continuous and compact-valued. Define f* and D* as in Theorem 9.17.

1. Suppose f(·, θ) is quasi-concave in x for each θ, and D is convex-valued on Θ. Then, D* is a convex-valued usc correspondence.

2. If "quasi-concave" is replaced by "strictly quasi-concave," D* is single-valued everywhere on Θ, and hence defines a continuous function.

The point of the examples of this subsection-as also of the examples of theprevious subsection-is simply that it is not possible to obtain all the properties ofthe primitives in the solutions: inevitably, some properties are lost in the process ofoptimization itself.

Finally, we note that since the maximization of quasi-concave functions over con­vex sets also results in a convex set of maximizers, and since strictly quasi-concavefunctions also have unique maxima, we have the following obvious corollary ofTheorem 9.17:

Example 9.19 Let S = Θ = [0, 1]. Define f: S × Θ → ℝ and D: Θ → P(S) by

f(x, θ) = √x,   (x, θ) ∈ S × Θ,

and

D(θ) = [0, √θ],   θ ∈ Θ.

Then, f is concave on S × Θ, and is, in fact, strictly concave on S for each fixed θ, while D(·) has a convex graph. Since f is strictly increasing on S for each fixed θ, we have

D*(θ) = {√θ},   θ ∈ Θ.

The graph of D*(·) is clearly not convex. □

Similarly, although Part 3 of the theorem assumes that V(·) has a convex graph, thetheorem does not claim that V* also has a convex graph. As the following examplereveals, such a claim would be false.

D*(θ) = {θ²},   θ ∈ Θ,

and, therefore, f*(θ) = θ² for θ ∈ Θ. Clearly, f* is not concave on Θ. □


Lemma 9.22 B: e -jo P(S) is usc.

Proof Compact-valuedness is obvious. since prices are assumed to be strictly pos­itive. Convex-valuedness is also obvious. The continuity of 8is established throughthe following lemmata:

Theorem 9.21 The correspondence B: e -jo P(S) is a continuous, compact-valuedand convex-valued correspondence.

9.3.1 Continuity of tireBudget Correspondence

We will prove the following result:

The function vO and the correspondence x(·) form the objects of our study in thissection. We proceed in two steps. First in subsection 9.4.1. we establish that thebudget correspondence B is a compact-valued continuous correspondence on e.Then. in subsection 9.4.2, we examine the properties of v and x in the parameters(P. 1).especially when additional convexity restrictions are placed on the primitives.

v(p, I) = max{u(x) | x ∈ B(p, I)}

x(p, I) = {x ∈ B(p, I) | u(x) = v(p, I)}.

Now, define the indirect utility function v(·) and the demand correspondence x(·) by:

B(p, I) = {x ∈ ℝˡ₊ | p · x ≤ I}.

9.3 An Application to Consumer Theory

This section is devoted to examining an application of the Maximum Theorem and theMaximum Theorem under convexity to the problem of utility maximization subjectto a budget constraint. The objective is to examine under what conditions it is thatmaximized utility varies continuously with changes in underlying parameters (i.e.,the prices and income), and how optimal demand varies as these parameters change.

There are I commodities in the model, which may be consumed in nonnegativeamounts. We will assume that the price of each commodity is always strictly positiveso IR~+ will represent our price space. Similarly, income will also be assumed to bestrictly positive always, so IR++ will represent the space of possible income levels.Define (9 = IR~+ x IR++. The set 00) will serve as our parameter space; a typicalelement of e will be represented by (p, I).

Let S = 1R~ be the set of all possible consumption bundles. Let u: S -jo IRbe acontinuous utility function. The budget correspondence 8:e -jo P(S) is given by:


Proof Let (p. I) E e and V be an open set such that V n Bt p, f) 1= 0. Let x bea point in this intersection. Then P: x ::: I. Since V is open. ox E V for 0 < I. 0close to I. Let x = ox. Then. p- i < p . x S I. Suppose there is no neighborhoodof (p. I) E e such that Bt p', I') n v # '"for all (p'. I') in the neighborhood. Takea sequence f (II) -+ O.and pick (Pn. In) E NE(n) (p, I) such that B(Pn. In) n V = 0.Now (Pn.x - In) ~ (p.x - I) < O,so for n sufficiently large • .i E B(Pn. In) eVe.a contradiction. 0

Lemma 9.23 B: e -+ peS) is /sc.

Therefore. there is a subsequence of (xn). which we shall continue to denote by {xn Ifor notational simplicity, along which {xn} converges to a limit x.

Since Xn ~ 0 for all n, we also have x :::O.Moreover. Pn . Xn -+ P: x , and sincePn .Xn ::: In for each n and In ~ I. we also have p- x ::: I. Therefore. x E B( p. i v,and since Btp, I) c V. by hypothesis. we have x E V.

However. Xn ¢ V for any n, and V is an open set. Therefore. we also have x 1. V.a contradiction. This establishes Lemma 9.22. []

M = {x ∈ ℝˡ₊ | η Σᵢ₌₁ˡ xᵢ ≤ 2I}.

Since In ~ I. we can also ensure. by taking n" large enough. that In ::: 21 forn :::n", Now, at any n :::n"; 1/ is a lower bound for the price of commodity i, sincePin > 1/. Moreover. 21 is an upper bound on the income at n. It follows that forn :::/1*. we have Xn EM. where M is the compact set defined by

i = 1 ....• 1.Pin> n,

Suppose B is not usc at (p. I). Then. for all E > O.there is (p'. I') in N( (p, I) suchthat B(p'. I') is not contained in V. i.e .• there exists x' such that x' E B( r'. I"),

x' if. V. We will show a contradiction must occur.ChooseasequenceE(n) ~ O.and let (Pn, I,,) E N(n)(P, I).withx" E B(Pn, In)

but Xn ¢ V. We will first argue that the {xn} sequence contains a convergent subse­quence. by showing that the sequence lies in a compact set.

Since P » 0, there is 1/ > 0 such that P, > 21/, i = 1, ... , l , Since {III --. I). wehave Pin ~ Pi for each i = I, ...• I. Therefore. there is n" such that for all II ~ /I' •

we have

Proof Let V C lR.tbe an open set such that B(p. J) C V. Define an e-neighborhood ,N,(p, I), of (p, I) in e by

Nf(p. I) = (p'. I') E 1R~+ x 1R++ I lip - p'lI + II - I'i -< (I.


p · x(λ) = p · (λx) + p · ((1 − λ)x')
         = λ p · x + (1 − λ) p · x'
         ≤ λI + (1 − λ)I'
         = I(λ).

Proof We have already shown that l3 is a continuous and compact-valued corre­spondence on 8. Since II is continuous on S by hypothesis, and II does not dependon (p. I),u is certainly continuous on ex S. Then, by the Maximum Theorem, v(·)is a continuous function on 8, and x(·) is a compact-valued usc correspondence ont). This establishes Part I.That v(p, 1) is nonincreasing in p for fixed 1 is easily seen: if there are (p, 1) and

(1'" I) in e such that p ~ p', then Bt.p, I) C l3(p'1). Thus, any consumption levelthat is feasible at (p, I) is also feasible at ip', 1), and it follows that we must havev(p', I) ::: v(p, 1). The other part of Part 2-that v is nondecreasing in 1 (i.e., thatmore income cannot hurt)-is similarly established.Now, suppose II is concave. We will first show using the Maximum Theorem under

Convexity that v(·) must be concave in 1for fixed p.We will use the notation B(p, .)to denote that p is fixed at the value p, and only the parameter [ is allowed to vary.Note that since B is continuous in (1', I), it is also continuous in 1 for fixed p.We claim that l3(p,') has a convexgraph in I. To see this, pick any I, [' in lR++,

andletx E l3(p.l)andx' E 8(p, 1').PickanyA E (0, 1).LetX(A) = h+(l-A)X',and J (A) = AI + (l - A)['. We have to show that x().) is in l3(p, J (A». But this isimmediate since

Theorem 9.24 The indirect utility function v(·) and the demand correspondence x(·) have the following properties on Θ:

1. v(·) is a continuous function on Θ, and x(·) is a compact-valued usc correspondence on Θ.
2. v(·) is nondecreasing in I for fixed p, and is nonincreasing in p for fixed I.
3. If u is concave, then v(·) is quasi-convex in p for fixed I, and is concave in I for fixed p; further, x(·) is a convex-valued correspondence.
4. If u is strictly concave, then x(·) is a continuous function.

9.3.2 The Indirect Utility Function and Demand CorrespondenceThe properties of v(·) and xO that we shall prove in this subsection are summed upin the following theorem:

By Lemmata 9.22 and 9.23, 8 is a continuous correspondence. The theorem isestablished. 0


A mixed strategy a, for player i is interpreted as a strategy under which player iwill play the pure action s, E S, with probability a, (Si). One of (he most importantreasons we allow players to usc mixed strategies is that the resulting strategy set

9.4 An Application to Nash Equilibrium

9.4.1 Normal-Form Games

Unlike perfectly competitive markets in which no single agent has any market power.and monopoly in which one agent has all the market power, game theory attemptsto study social situations in which there are many agents. each of whom has someeffe,.: (In the overall outcome, In this section. we introduce the notion of normal-formgames, and describe the notions of mixed strategies and a Nash equilibrium.

A normal-form game (henceforth, simply game) is specified by:

1. A finite set of players N. A generic player is indexed by i.
2. For each i ∈ N, a strategy set or action set Sᵢ, with typical element sᵢ.
3. For each i ∈ N, a payoff function or reward function rᵢ: S₁ × ⋯ × Sₙ → ℝ.

We will confine our attention in this section to the case where Sᵢ is a finite set for each i. Elements of Sᵢ are also called the pure strategies of player i. The set of mixed strategies available to player i is simply the set of all probability distributions on Sᵢ, and is denoted Σᵢ.

Remark It is left to the reader to check that if p is also allowed to vary, the graphof 8could fail to be convex in the parameters. In particular. Part 3 of the theoremcannot be strengthened. It is also left to the reader to examine if it is true that whenu is strictly concave. v is strictly concave in I for fixed p, i.e .. if Part 4 can bestrengthened.

"00' ' .....~ '.r _

Since u is concave in x and is independent of 1, it is certainly jointly concave in(x, I). Thus, v(p, l) = max(u(x) I x E B(p, l)} is concave in I by the MaximumTheorem under Convexity. This completes the proof of the first part of Pan 1.

The proof of the second part-the quasi-convexity of v in p, for each fixed value ofI-is left as an exercise to the reader. Finally, since Btp, I) is compact and convexfor each (p, I), it is immediate that x(p, 1) is a convex set for each (p. /). Thiscompletes the proof of Part 3.

Finally, suppose u is strictly concave. Since B(p, I) is a convex set, x(p, I) is single-valued for each (p, I), and is thus a continuous function. □
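For a concrete feel for Theorem 9.24, the following sketch (my own illustration; the particular utility function and the price/income values are assumptions, not taken from the text) computes v(p, I) and the maximizing bundle by brute force over a grid of the budget set, for a strictly concave u, and shows the demand and the indirect utility varying smoothly with prices, as the theorem predicts.

```python
import numpy as np

def u(x1, x2):
    # A strictly concave utility on the nonnegative orthant (my choice of example).
    return np.sqrt(x1) + np.sqrt(x2)

def demand_and_value(p1, p2, I, n=400):
    """Brute-force utility maximization over a grid of the budget set B(p, I)."""
    x1 = np.linspace(0.0, I / p1, n)
    x2 = np.linspace(0.0, I / p2, n)
    X1, X2 = np.meshgrid(x1, x2)
    feasible = p1 * X1 + p2 * X2 <= I
    U = np.where(feasible, u(X1, X2), -np.inf)
    i, j = np.unravel_index(np.argmax(U), U.shape)
    return (X1[i, j], X2[i, j]), U[i, j]

# Demand and indirect utility move continuously as the price of good 1 varies.
for p1 in [0.5, 1.0, 1.5, 2.0]:
    x, v = demand_and_value(p1, 1.0, 10.0)
    print(f"p1 = {p1}: demand ~ ({x[0]:.2f}, {x[1]:.2f}), v(p, I) ~ {v:.3f}")
```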


The definition of fixed points for functions generalizes in a natural way to a defi­nition of fixed points for correspondences <l> from X into itself: a point x E X is said

Example 9.25 Let X = [0, 1], and let f: X → X be defined by f(x) = x². Then, f has two fixed points on X, namely, the points 0 and 1. □
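Fixed points of a simple map like the one in Example 9.25 can also be located numerically, for instance by scanning for zeros of f(x) − x (a small sketch, not a general-purpose method):

```python
import numpy as np

f = lambda x: x ** 2
grid = np.linspace(0.0, 1.0, 1_000_001)
gap = f(grid) - grid

# Grid points where f(x) - x is (numerically) zero are fixed points: here x = 0 and x = 1.
fixed = grid[np.isclose(gap, 0.0, atol=1e-9)]
print(fixed)
```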

9.4.2 The Brouwerl Kakutani Fixed Point TheoremGiven a set X and a function I mapping X into itself, a point x E X is said to be afixed point of I if it is the ease that I(x) = x.

The set of all best responses of player ito (1 is denoted B R, (a). Observe that B R, (a)only depends on (1-i and not on the r-th coordinate of a.Notationally, however, it ismuch easier to write B R; as depending on the entire vector a.

A Nash equilibrium of the game is a strategy vector a" = (aj •... ,a;) E 1: suchthat for each; E N we have at E BR, (a*).

The existence of at least one Nash equilibrium in every finite game is proved below.A technical digression is required first in the form of the Brouwer/Kakutani FixedPoint Theorem.

It is left to the reader to verify that ri so defined is a continuous fsnction on :E foreach i.

Given any σ ∈ Σ, we define σ̂ᵢ ∈ Σᵢ to be player i's best response to σ if

rᵢ(σ̂ᵢ, σ₋ᵢ) ≥ rᵢ(σᵢ', σ₋ᵢ)   for all σᵢ' ∈ Σᵢ.

~j is convex. and convexity of the set of available strategies for each player plays acritical role in proving existence of equilibrium. (Note that Sj is not convex, since itis a finite set.)The following shorthand notation will be useful. In statements pertaining to i , let

the index =i denote "everyone-bur-r." Let S = XjeNSj. and for any i E N. letLj = xji.jSj' For any S_; E S-i and Sj E Sf, (Sj,s_;) will represent the obviousvector in S. Similarly. let :E= xieN:Ei, and let :E-i = x ji.i:Ei. Observe that (a) Sis a finite set. and (b) :E is a convex set. Observe also that 1: is strictly smaller thanthe set of probability measures on S (why?).

Given a vector (1 = (ai, ... , (1n) E :E. the pure-action profile () = (SI, ... sn) E Soccurs with probability al (SI) x ... X (1n(sn). Thus, player i receives a reward ofrj (s) with probability a1 (Sl) x ... X (1n(sn). Assuming that players seek to maximizeexpected reward, we extend the payoff functions to :E by defining for each i, and forany c = (ai, .... an) E:E,

rᵢ(σ) = Σ_{s ∈ S} rᵢ(s) σ₁(s₁) ⋯ σₙ(sₙ).


Example 9.29 Let X = ℝ. Define Φ by Φ(x) = [x + 1, x + 3]. Then, X is convex, Φ is nonempty-valued, convex-valued, and usc (in fact, continuous). However, X is not compact, and Φ fails to possess a fixed point. □

Example 9.28 Let X = [0, 1], and let Φ be defined by

Φ(x) = {0} if x ≠ 0, and Φ(x) = {1} if x = 0.

Then, X is compact and convex, and Φ is nonempty-valued and convex-valued. However, Φ is not usc; it also fails to have a fixed point. □

Example 9.27 Let X = [0, 2], and let Φ be defined by

Φ(x) = {2} if x ∈ [0, 1), Φ(x) = {0, 2} if x = 1, and Φ(x) = {0} if x ∈ (1, 2].

Then, X is compact and convex, and Φ is nonempty-valued and usc. However, Φ is not convex-valued at 1, and evidently Φ possesses no fixed point. □

While the condition that <l> be nonempty-valued is of obvious importance. theother conditions (that it be usc, compact- and convex-valued) are also critical. More­over, neither the compactness nor the convexity of X can be dispensed with. Simpleexamples show that if any of these conditions are not satisfied. a fixed point may notexist:

oProof See Smart (1974).

Theorem 9.26 (Kakutani's Fixed Point Theorem) Let X C IRn he compact andconvex. If 4>: X -+ P(X) is a usc correspondence that has none mptv, compact. andconvex values, then 4> has a fixed point.

to be a fixed point of <1>: X ~ P(X) if it is the case that x E <1> (x ). that is. if it is-thc­case that the image of x under <t> contains r . Note that when <l> is single-valued. thisis precisely 'he definition (If II fixed point for a function.

The following result provides sufficient conditions for a correspondence to possessa fixed point:


Lemma 9.34 B R is nonempty-valued, convex-valued, and usc on 'E.

Proof We show that BRi is nonempty-valued, convex-valued. and usc for each i.This will obviously establish the result.

Pick any a E 'E and i EN. Then. player j solves:

max{r;(a:, a_i) I a: E :E;}.

Lemma 9,33 L is COli vex and compact.We will now show that B R has all the properties required by the Kakutani Fixed

Point Theorem.

Proof Recall that for any a E 'E, B Ri (a) C :E; denotes the set of best responses ofplayer i to a. B R; evidently defines a correspondence from :E to Lj.Thus, definingB R = xiEN B Ri, we obtain a correspondence BR from 'E into itself. By definition,a Nash equilibrium is simply a fixed point of this correspondence. Thus. the theoremwill be proved if we can show that BR and L satisfy all the conditions of the Kakutanifixed point theorem. That L does so is immediate (why is compactness apparent?).

Theorem 9.32 Everyfinite game r = (N, (Sj, r;)jeN} has aNash equilibriumpoint.

9.4.3 Existence of Nash EquilibriumWe shall now show that every finite game has at least one Nash equilibrium.

oProof See Smart (1974).

Theorem 9.31 (Brouwer's Fixed Point Theorem) Let X C ]Rn be compact andconvex,and f: X -r X a continuousfunction. Then f has afixed point.

In closing this section, it should be noted that Kakutani's Theorem is a generalizationof an earlier fixed point theorem that was proved by L.E.I. Brouwer in 1912 forfunctions:

Example 9.30 Let X = [0, 1] ∪ [2, 3]. Let Φ be given by

Φ(x) = {2} if x ∈ [0, 1], and Φ(x) = {1} if x ∈ [2, 3].

Then, X is compact, Φ is nonempty-valued, convex-valued, and usc (in fact, continuous), but X is not convex, and Φ does not have a fixed point. □


o E 0).o E 8.o E 8.

(a) <1>(0) = [0, ()].(b) <1>(0) = [0,0).(c) <1>(0) = (0, OJ.

9.S Exercises

1. Let S = 8 = ~+.Determine, in each of the following cases, whether <1>: e _.P(S) is usc and/or lsc at each () E 8:

Remark The existence of a Nash equilibrium inpure strategies can be shown underthe following modified set of assumptions: for each i (i) S, is a compact and convex.set. and (ii) ri is quasi-concave on S, for every fixed s s., E S_ j. The argumentsinvolve essentially a repetition of those above, except that mixed strategies arc notused at any point (so the best-response correspondence maps S) x ... x S~ intoitself). The details are left to the reader. r I

The existence of a Nash equilibrium in every finite game now follows from Kakutani's fixed point theorem. □
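The fixed-point argument is non-constructive, but for small games a best-response search over a grid of mixed strategies makes the equilibrium visible. The sketch below (an illustration only; the payoff matrix is Matching Pennies, my own choice of example) locates the mixed profile at which neither player can gain by deviating.

```python
import numpy as np

# Matching Pennies: player 1's payoff matrix (player 2 receives the negative).
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

grid = np.linspace(0.0, 1.0, 101)           # probability of playing the first pure action
best_gap, best_profile = np.inf, None
for p in grid:                               # player 1 mixes (p, 1 - p)
    for q in grid:                           # player 2 mixes (q, 1 - q)
        s1, s2 = np.array([p, 1 - p]), np.array([q, 1 - q])
        # Maximal gain each player could obtain by deviating from (s1, s2):
        gain1 = (A @ s2).max() - s1 @ A @ s2
        gain2 = (-(A.T @ s1)).max() - s2 @ (-(A.T @ s1))
        gap = max(gain1, gain2)
        if gap < best_gap:
            best_gap, best_profile = gap, (p, q)

print(best_profile, best_gap)                # ~ (0.5, 0.5) with gap ~ 0: the mixed equilibrium
```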

It follows that rj is linear (so concave) on I:j. By the Maximum Theorem underConvexity, B R; is also convex-valued, and this completes the proof. 0

Then, for any OJ E Ei and a=i E L:_j,

rᵢ(σᵢ, σ₋ᵢ) = Σ_{sᵢ ∈ Sᵢ} rᵢ(sᵢ, σ₋ᵢ) σᵢ(sᵢ),

rᵢ(sᵢ, σ₋ᵢ) = Σ_{s₋ᵢ ∈ S₋ᵢ} rᵢ(sᵢ, s₋ᵢ) Π_{j ≠ i} σⱼ(sⱼ).

is a nonernpty-valued, compact-valued, usc correspondence from L: into itself.It remains to be shown that BRj is convex-valued. Define for any s, E S, and

a_i E "E-i,

BRj(a) = argmax{rj(ai.a_j) I aj E I:;}

This is simply a parametric family of optimization problems in which:

• The parameter space is given by Σ.
• The action space is given by Σᵢ.
• For σᵢ' ∈ Σᵢ and σ = (σᵢ, σ₋ᵢ) ∈ Σ, the reward f(σᵢ', σ) is given by rᵢ(σᵢ', σ₋ᵢ).
• The correspondence of feasible actions D is given by D(σ) = Σᵢ for all σ ∈ Σ.

Since 'Ei is compact and convex, the feasible action correspondence is continuous,compact-valued, and convex-valued. Since rj is continuous on I:; x L:_j, it followsfrom the Maximum Theorem that


if x E (0. 1]if x =0.f(x) = l:

Is this correspondence usc on /? Is it lsc? Does it have a closed graph?

9. Let X = Y = [0, 1I,and f: X -+ Y be defined by

4>(x) = (O, x}, x E I.

Is this correspondence usc on IR.+ ? Is itlsc? Does it have a closed graph?

8. Let I = [0, I], and let the correspondence 4>: 1 -+ P(l) be defined by

Determine if <1> is usc and/or Isc on IR.

7. Define the correspondence 4>: IR.+ ~ P(IR.+) as

4>(x) = [x, 00), xeR+

if x i= 0if x = O.

if x i= 0if x = O.

if x » 0if x = O.

if x > 0if x =0.

6. Let <1>: 1R ~ P(lR.) be defined by

<1>(x) = { (0, 1)[0, I]

Determine if <1> is usc and/or Isc on R,

Determine if 4> is usc and/or lsc on IR+.

S. Let 4>: 1R ~ P(lR.) be defined by

<1>(x) = { [0, 1](0, I)

Determine if ~ is usc and/or lsc on IR+.

4. Let e:R+ ~ P(lR) be defined by

<1>(x) = {{I/X}(Ol

Determine if 4> is usc and/or lsc on lR.3. Let e: lR+ ~ P(IR.) be defined as

{[0, llx]

4>(x) =(OJ

(d) ~(e) = (0, e) if e > 0, and <1>(0)= (OJ.

2. Let e:R ~ peR) be defined as

Φ(x) = [−|x|, |x|],   x ∈ ℝ.


(a) Do f and V meet all the conditions of the Maximum Theorem? If yes, proveyour claim. If no, list all the conditions you believe are violated and explainprecisely why you believe each of them is violated.

if 0 E [0, 1/2)

if o E (1/2, I].{[0, 1-28]

V(O) = (0,2 - 20}

Let the correspondence V: e ~ P(S) be defined by

o if x > 20.

if 0 > °and x E [0, 8)

if 0 > 0 and x E [8,28]

x/O2 - (x/FJ)

f('O) =

13. Let S = [0,2]. and e == [0, 1]. Let j: S x e .......IR be defined by

° if 0 = 0

Let V"(a) be the set of maximizers of 1(·,a) on V(a). Do the hypotheses of theMaximum Theorem hold for this problem? Verify, through direct calculation,whether the conclusions of the Maximum Theorem hold.

f*(a) = max{f(a, x) | x ∈ D(a)}.

Let

Examine if I is an upper-semicontinuous and/or lower-sernicontinuous functiononX.

12. Let f: IR+ x IR+ ~ IR be defined by

f(a, x) = (x − 1) − (x − a)², (a, x) ∈ ℝ²₊.

Define the correspondence V: IR+ ---}-P(IR+) by

D(a) = {y ∈ ℝ₊ : y ≤ a}, a ∈ ℝ₊.

if x is irrational

if x is rational.I(x) = {~

II. Let X = Y = [0, I] and f :X .......Y be defined by

Determine if(a) I is an upper-sernicontinuous function on X, and (b) if it attainsa maximum on X.

if x E [0, I)

if x = I.I(x) = {~

Examine if f is an upper-semicontinuous and/or lower-semicontinuous functionon X. Does f attain a minimum on X?

10. Let X = Y = [0, I], and f: X ~ Y be defined by


(a) Show that [*(-) must then be nondecreasing on 8.(b) Examine under what additional conditions [ will be strictly increasing on

e.(c) Give an example to show that if V is not monotone, then [* need not be

nondecreasing on e.16. Let S = e = (0, 1]. Define '0:8 -+ peS) by V(8) = [0, fJ] for all 8 E S. Give

an example of a function [: S x e -+ lR such that [*(8) = max{j(x, 0) I x ED(O) I fails to be continuous at only the point fJ = 1/2, or show that no suchexample is possible.

17. Let S = e = 1R+. Describe a continuous function f: S x e .....lR and acontinuous, compact-valued correspondence D: e -+ P(S) that possess thefollowing properties (or show that no such example is possible):

(a) '0 is an increasing correspondence on e, that is, 0 > 0' implies D(O) ::::>

'0(8').(b) There are points e), 82, 83 in e such that 8) > (h > 83, but /*(81) >

[*(03) > r«()2), where r(e) = max{f(x) I x E Vee)}.18. Provide examples to show that Part 2 of the Maximum Theorem under Convexity

is not necessarily true if'D is not convex-valued, or if f is not necessarily concave.

19. Provide an example to show that Part 3 of the Maximum Theorem under Con­vexity is not necessarily true if the graph of V fails to be convex.

That is, all actions feasible at a value of fJ are also feasible at all larger values ofe.

θ ≥ θ′ ⇒ D(θ) ⊃ D(θ′).

15. Let [: S x e .....lR be a continuous function, and V: e .....peS) be a contin­uous compact-valued correspondence, where S C ]Rn, e c ]R'. Suppose thecorrespondence V is monotonic in the sense that ife, e' E e, then

D(θ) = [0, 1 − 2θ] if θ ∈ [0, 1/2), and D(θ) = [0, 2θ − 1] if θ ∈ [1/2, 1].

Is it the case that D*(θ) ≠ ∅ for each θ ∈ Θ? If so, determine whether D* is usc and/or lsc on Θ.

14. Repeat the last problem with the definition of'D changed to

(b) Let D*: Θ → P(S) be given by

D*(θ) = {x ∈ D(θ) | f(x, θ) ≥ f(x′, θ) for all x′ ∈ D(θ)}.


22. Suppose that in the statement of the Maximum Theorem, the hypothesis on fwas weakened to require only the upper-semicontinuity of f on S x (-), ratherthan full continuity. Assume that the other conditions of the theorem remainunchanged.

(a) Prove that the maximized value function f· will also now be an upper­semicontinuous function on e.

(b) Give an example to show that V· may fail to be usc on 0).

23. Let {N, (S;, r;);eN) be a normal-form game in which for each i. S, is a convexand compact set, and rt is a continuous concave function on S. Show that thisgame has a Nash equilibrium in pure strategies, i.e., that there is s" E S suchthat rj (s") ~ rj (s;, S~j) for any Sj E S, and any i EN.

24. Show that the result of the previous question holds even if rt is only quasi-concaveon St for each fixed Lj E S_j.

25. Find all the Nash equilibria of each the following games:

D_*(θ) = arg min{f(x, θ) | x ∈ D(θ)}?

and the correspondence of minimizers

f_*(θ) = min{f(x, θ) | x ∈ D(θ)},

20. Given an example of a function"/: S x e -4 lR and a correspondence D: 't-) .....

P(S) such that f is not continuous on S x e,D is not continuous on 0),but f* isa continuous function on e, and V· is an upper-semicontinuous correspondenceon e, where f·andV·are obtained from f and Vas in the Maximum Theorem.

21. Suppose we were minimizing [tx, 0) over x E D(O), where f and D meet allthe conditions of the Maximum Theorem. What can one say about the millillli;:_edvalue function


(4, 10)   (2, 10)   (8, 10)
(4, 4)    (8, 3)    (2, 10)
(4, 7)    (3, 7)    (10, 5)

(4, 6)    (0, 0)    (6, 4)
(0, 0)    (4, 6)    (0, 0)
(6, 4)    (0, 0)    (4, 6)

(10, 10)  (10, 6)   (10, 1)
(6, 10)   (14, 14)  (8, 2)
(1, 10)   (2, 8)    (10, 10)


10 Supermodularity and Parametric Monotonicity

This chapter examines the issue of parametric monotonicity. The objective is to identify conditions on the primitives of parametric families of optimization problems under which the optimal action correspondence D*(·) varies monotonically with the parameter θ, that is, under which increases in the parameter θ result in increases in the optimal actions.

Implicit in the very question of parametric monotonicity is the idea of an order structure on the parameter space Θ and the action space S. In subsection 10.1.1 below, we identify the structure we will require these spaces to have. This is followed in subsection 10.1.2 by the definition of the two key conditions of supermodularity and increasing differences that will be placed on the objective function.

Section 10.2 then presents the main result of this chapter, a set of sufficient conditions on the primitives of a parametric family of optimization problems under which parametric monotonicity obtains. A rough summary of this result is as follows. Suppose S ⊂ ℝⁿ and Θ ⊂ ℝˡ. Let D²f(x, θ) represent the (n + l) × (n + l) matrix of second partials of the objective function f.

Then, under some technical regularity conditions, optimal actions increase in the parameter vector θ whenever all the off-diagonal terms in the matrix D²f(x, θ) are nonnegative. The remarkable part of the result is that it involves no restrictions on the diagonal terms of this matrix; in particular, there are no convexity assumptions required.

Finally, Section 10.3 provides an example illustrating the value of the results of this chapter.

10.1 Lattices and Supermodularity

Recall our basic notation: given any two vectors x = (x₁, …, x_m) and y = (y₁, …, y_m) in ℝᵐ, we have

x = y   if x_i = y_i, i = 1, …, m,
x ≥ y   if x_i ≥ y_i, i = 1, …, m,
x > y   if x ≥ y and x ≠ y,
x ≫ y   if x_i > y_i, i = 1, …, m.

Except in the case m = 1, the ordering given by "≥" on ℝᵐ is incomplete. Nonetheless, this ordering suffices to define the restrictions that we will need to obtain parametric monotonicity. We do this in two stages. In subsection 10.1.1, we discuss the conditions we will require the action space S to satisfy. Then, in subsection 10.1.2, we describe the assumptions that will be placed on the objective function f.

10.1.1 Lattices

Given two points x and y in ℝᵐ, we define the meet of x and y, denoted x ∧ y, to be the coordinate-wise minimum of x and y:

x ∧ y = (min{x₁, y₁}, …, min{x_m, y_m}).

In corresponding fashion, the join of x and y, denoted x ∨ y, is defined to be the coordinate-wise maximum of the points x and y:

x ∨ y = (max{x₁, y₁}, …, max{x_m, y_m}).

It is always true that x ∨ y ≥ x; equality occurs if and only if x ≥ y. Similarly, it is always true that x ∧ y ≤ x, with equality if and only if x ≤ y. In particular, it is the case that

x ≱ y ⇒ (x ∨ y) > x,
x ≰ y ⇒ (x ∧ y) < x.

These elementary observations will come in handy in the proof of parametric monotonicity in Section 10.2.

A set X ⊂ ℝᵐ is said to be a sublattice of ℝᵐ if the meet and join of any two points in X is also in X. That is, X is a sublattice of ℝᵐ if

x, y ∈ X ⇒ {(x ∧ y) ∈ X and (x ∨ y) ∈ X}.
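The following short Python sketch (an illustration, not part of the text; all names are invented) computes meets and joins coordinate-wise and checks the sublattice property by brute force for a finite set of points, using two points of the hyperplane x + y = 1 as a test case.

```python
# Illustration: coordinate-wise meet and join in R^m, plus a brute-force
# check of the sublattice property for a finite set of points.
import numpy as np

def meet(x, y):
    """Coordinate-wise minimum x ∧ y."""
    return np.minimum(x, y)

def join(x, y):
    """Coordinate-wise maximum x ∨ y."""
    return np.maximum(x, y)

def is_sublattice(points, tol=1e-12):
    """True if the meet and join of every pair of points is again in the set."""
    pts = [np.asarray(p, dtype=float) for p in points]
    def contains(q):
        return any(np.allclose(q, p, atol=tol) for p in pts)
    return all(contains(meet(x, y)) and contains(join(x, y))
               for x in pts for y in pts)

# (1, 0) and (0, 1) lie on the hyperplane x + y = 1, but their meet (0, 0)
# and join (1, 1) do not, so the pair alone is not a sublattice.
print(is_sublattice([(1.0, 0.0), (0.0, 1.0)]))                          # False
print(is_sublattice([(1.0, 0.0), (0.0, 1.0), (0.0, 0.0), (1.0, 1.0)]))  # True
```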


f(z) + f(z′) ≤ f(z ∨ z′) + f(z ∧ z′).

10.1.2 Supermodularity and Increasing Differences

Let S and Θ be subsets of ℝⁿ and ℝˡ, respectively. We will assume, through the rest of this chapter, that S and Θ are both sublattices.

A function f: S × Θ → ℝ is said to be supermodular in (x, θ) if it is the case that for all z = (x, θ) and z′ = (x′, θ′) in S × Θ, we have

Theorem 10.1 Suppose X is a nonempty, compact sublattice of ℝᵐ. Then, X has a greatest element and a least element.

Proof See Birkhoff (1967). □

The following result offers sufficient conditions for a sublattice in ]Rm to admita greatest element; it plays an important role in the proof of our main result inSection 10.2:

are sublattices of ℝ². On the other hand, the hyperplane

H′ = {(x, y) ∈ ℝ² | x + y = 1}

is not a sublattice of ℝ²: we have (1, 0) ∈ H′ and (0, 1) ∈ H′, but the meet (0, 0) and the join (1, 1) of these two points are not contained in H′.

A sublattice X ⊂ ℝᵐ is said to be a compact sublattice in ℝᵐ if X is also a compact set in the Euclidean metric. Thus, any compact interval is a compact sublattice in ℝ, while the unit square I is a compact sublattice in ℝ². On the other hand, the open interval (0, 1) is a noncompact sublattice of ℝ, while the set ℝᵐ₊ is a noncompact sublattice of ℝᵐ.

A point x* ∈ X is said to be a greatest element of a sublattice X if it is the case that x* ≥ x for all x ∈ X. A point x̲ ∈ X is said to be a least element of a sublattice X if it is the case that x̲ ≤ x for all x ∈ X.

It is an elementary matter to construct sublattices that admit no greatest and/or least element. For example, consider the open unit square I° in ℝ²:

I° = {(x, y) ∈ ℝ² | 0 < x < 1, 0 < y < 1}.

and the hyperplane

For instance, any interval in ℝ is a sublattice of ℝ, while the unit square

I = {(x, y) ∈ ℝ² | 0 ≤ x ≤ 1, 0 ≤ y ≤ 1}


If the inequality becomes strict whenever z and z′ are not comparable under the ordering ≥, then f is said to be strictly supermodular in (x, θ).¹

Example 10.2 Let S = Θ = ℝ₊, and let f: S × Θ → ℝ be given by

f(x, θ) = xθ.

Pick any (x, θ) and (x′, θ′) in S × Θ, and assume without loss that x ≥ x′. Suppose first that θ ≥ θ′. Then, (x, θ) ∨ (x′, θ′) = (x, θ) and (x, θ) ∧ (x′, θ′) = (x′, θ′). Therefore, it is trivially the case that

f(x, θ) + f(x′, θ′) ≤ f((x, θ) ∨ (x′, θ′)) + f((x, θ) ∧ (x′, θ′)).

Now suppose, alternatively, that θ < θ′. Then, (x, θ) ∨ (x′, θ′) = (x, θ′) and (x, θ) ∧ (x′, θ′) = (x′, θ). Therefore,

f((x, θ) ∨ (x′, θ′)) + f((x, θ) ∧ (x′, θ′)) = xθ′ + x′θ.

Now, (xθ′ + x′θ) − (xθ + x′θ′) = x(θ′ − θ) − x′(θ′ − θ) = (x − x′)(θ′ − θ) ≥ 0, so it is true in this case also that

f(x, θ) + f(x′, θ′) ≤ f((x, θ) ∨ (x′, θ′)) + f((x, θ) ∧ (x′, θ′)).

Therefore, f(x, θ) = xθ is supermodular on S × Θ. In fact, since the inequality is strict whenever x > x′ and θ′ > θ (or x < x′ and θ′ < θ), f is strictly supermodular on S × Θ. □

Supermodularity of f in (x, θ) is the key notion that goes into obtaining parametric monotonicity. It turns out, however, that two implications of supermodularity, which are summarized in Theorem 10.3 below, are all we really need. A definition is required before we can state the exact result.

A function f: S × Θ → ℝ is said to satisfy increasing differences in (x, θ) if for all pairs (x, θ) and (x′, θ′) in S × Θ, it is the case that x ≥ x′ and θ ≥ θ′ implies

f(x, θ) − f(x′, θ) ≥ f(x, θ′) − f(x′, θ′).

If the inequality becomes strict whenever x > x′ and θ > θ′, then f is said to satisfy strictly increasing differences in (x, θ).

In words, f has increasing differences in (x, θ) if the difference

f(x, θ) − f(x′, θ)

between the values of f evaluated at the larger action x and the lesser action x′ is itself an increasing function of the parameter θ.

¹If z and z′ are comparable under ≥ (i.e., if either z ≥ z′ or z′ ≥ z), then a simple computation shows that both sides of the defining inequality are equal to f(z) + f(z′), so a strict inequality is impossible.
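As a quick numerical companion to Example 10.2 (purely illustrative, not part of the text), the following Python snippet samples random points of ℝ₊ × ℝ₊ and checks the defining supermodularity inequality for f(x, θ) = xθ directly.

```python
# Check f(z) + f(z') <= f(z ∨ z') + f(z ∧ z') for f(x, θ) = xθ
# on randomly drawn points (illustration only).
import random

def f(x, theta):
    return x * theta

random.seed(0)
for _ in range(10_000):
    x, xp = random.uniform(0, 10), random.uniform(0, 10)
    t, tp = random.uniform(0, 10), random.uniform(0, 10)
    lhs = f(x, t) + f(xp, tp)
    rhs = f(max(x, xp), max(t, tp)) + f(min(x, xp), min(t, tp))
    assert lhs <= rhs + 1e-12, (x, t, xp, tp)
print("supermodularity inequality held on all sampled pairs")
```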


By a process a little more painful than that employed in Example 10.2, it can beshown that f meets the inequality required to be supermodular in (r , y, 0). For a

f((x, y), θ) = xyθ.

Example 10.5 Let S ⊂ ℝ²₊ and Θ ⊂ ℝ₊. Let f: S × Θ → ℝ be given by

Theorem 10.4 is of particular interest from a computational standpoint: it becomes easy to check for the supermodularity of a C² function. The following example illustrates this point, using a generalization of the function of Example 10.2:

[JProof See Section 10.4.

Theorem 10.4 Let Z be an open sublattice of ℝᵐ. A C² function h: Z → ℝ is supermodular on Z if and only if for all z ∈ Z, we have

∂²h/∂z_i∂z_j (z) ≥ 0,  i, j = 1, …, m, i ≠ j.

A partial converse to Part 2 of Theorem 10.3 is valid. This converse is somewhatperipheral to our purposes here; consequently, in the interests of expositional continu­ity, we postpone its statement and proof to Section lOA below (see Theorem 10.12).Perhaps the most significant aspect of Theorem 10.12 is that it makes for an easyproof of the following very useful result concerning when a twice-differentiablefunction is supermodular:

so f satisfies increasing differences also, as claimed. □

f(x, θ) − f(x, θ′) ≥ f(x′, θ) − f(x′, θ′),

Rearranging, and using the definitions of w and w′, this is the same as

f(w) + f(w′) ≤ f(z) + f(z′).

Proof Part 1 is trivial. To see Part 2, pick any z = (x, θ) and z′ = (x′, θ′) that satisfy x ≥ x′ and θ ≥ θ′. Let w = (x, θ′) and w′ = (x′, θ). Then, w ∨ w′ = z and w ∧ w′ = z′. Since f is supermodular on S × Θ,

Theorem 10.3 Suppose f: S × Θ → ℝ is supermodular in (x, θ). Then:

1. f is supermodular in x for each fixed θ, i.e., for any fixed θ ∈ Θ, and for any x and x′ in S, we have f(x, θ) + f(x′, θ) ≤ f(x ∨ x′, θ) + f(x ∧ x′, θ).

2. f satisfies increasing differences in (x, θ).


10.2 Parametric Monotonicity

To get an intuitive feel for the role played by the condition of increasing differences in obtaining parametric monotonicity, consider the following problem in the special case where S ⊂ ℝ (i.e., where x is a scalar):

Maximize f(x, θ) subject to x ∈ S.

Suppose that a solution exists for all θ ∈ Θ (for instance, suppose that f(·, θ) is continuous on S for each fixed θ, and that S is compact). Pick any two points θ₁ and θ₂ in Θ, and suppose that θ₁ > θ₂. Let x₁ and x₂ be points that are optimal at θ₁ and θ₂, respectively. Since x₂ need not be optimal at θ₁, nor x₁ at θ₂, we have

f(x₁, θ₁) − f(x₂, θ₁) ≥ 0 ≥ f(x₁, θ₂) − f(x₂, θ₂).

Suppose f satisfies strictly increasing differences. Suppose further that although θ₁ > θ₂, parametric monotonicity fails, and we have x₁ ≱ x₂. Since x is a scalar variable, we must then have x₁ < x₂. Therefore, the vectors (x₂, θ₁) and (x₁, θ₂) satisfy x₂ > x₁ and θ₁ > θ₂. By strictly increasing differences, this implies

f(x₂, θ₁) − f(x₁, θ₁) > f(x₂, θ₂) − f(x₁, θ₂),

which is in direct contradiction to the weak inequalities we obtained above. Thus, we must have x₁ ≥ x₂, and since x₁ and x₂ were arbitrary selections from the sets of optimal actions at θ₁ and θ₂, respectively, we have shown that if S is a subset of ℝ, the condition of strictly increasing differences suffices by itself to obtain monotonicity of optimal actions in the parameter.

Theorem 10.6 Suppose that the optimization problem

Maximize f(x, θ) subject to x ∈ S

has at least one solution for each θ ∈ Θ. Suppose also that f satisfies strictly increasing differences in (x, θ). Finally, suppose that S ⊂ ℝ. Then optimal actions are monotone increasing in the parameter θ.
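The scalar case lends itself to a direct numerical illustration. The sketch below is not part of the text: the objective f(x, θ) = xθ − x² is a made-up example with ∂²f/∂x∂θ = 1 > 0, so it has strictly increasing differences, and the grid maximizer over S = [0, 1] should be nondecreasing in θ.

```python
# Grid illustration of scalar parametric monotonicity (hypothetical example).
import numpy as np

S = np.linspace(0.0, 1.0, 1001)          # action grid
thetas = np.linspace(0.0, 2.0, 21)       # parameter grid

def f(x, theta):
    return x * theta - x ** 2            # strictly increasing differences in (x, θ)

argmaxes = [S[np.argmax(f(S, th))] for th in thetas]
assert all(a2 >= a1 - 1e-12 for a1, a2 in zip(argmaxes, argmaxes[1:]))
print(dict(zip(np.round(thetas, 2).tolist(), np.round(argmaxes, 3).tolist())))
```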

simpler procedure to verify this, note that

∂²f/∂x∂y (x, y, θ) = θ,   ∂²f/∂x∂θ (x, y, θ) = y,   and   ∂²f/∂y∂θ (x, y, θ) = x.

Since x, y, and θ are all nonnegative, these cross-partials are all nonnegative, so f is supermodular in (x, y, θ) by Theorem 10.4. □
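The cross-partial test of Theorem 10.4 is easy to automate. A small symbolic check for the function of Example 10.5, using the sympy library (an illustration, not part of the text), might look as follows.

```python
# Symbolic check of the cross-partial condition for f(x, y, θ) = xyθ.
import sympy as sp

x, y, theta = sp.symbols("x y theta", nonnegative=True)
f = x * y * theta

pairs = [(x, y), (x, theta), (y, theta)]
cross = {p: sp.diff(f, *p) for p in pairs}
print(cross)   # {(x, y): theta, (x, theta): y, (y, theta): x}, all nonnegative
assert all(expr.is_nonnegative for expr in cross.values())
```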


which contradicts the presumed optimality of x and x′ at θ. A similar argument also establishes that x ∨ x′ ∈ D*(θ). Thus, D*(θ) is a sublattice of ℝⁿ, and as a nonempty, compact sublattice of ℝⁿ, admits a greatest element x*(θ). This completes the proof of Part 1.

f(x ∧ x′, θ) < f(x, θ) = f(x′, θ).

Supermodularity in x then implies

f(x′ ∨ x, θ) > f(x, θ) = f(x′, θ),

so x ∈ D*(θ). Therefore, D*(θ) is closed, and as a closed subset of the compact set S, it is also compact. Let x and x′ be distinct elements of D*(θ). If x ∧ x′ ∉ D*(θ), we must have

f(x, θ) ≥ f(y, θ).

by the optimality of x_p. Taking limits as p → ∞, and using the continuity of f(·, θ), we obtain

f(x_p, θ) ≥ f(y, θ)

Proof Since f is continuous on S for each fixed θ and S is compact, D*(θ) is nonempty for each θ. Fix θ and suppose {x_p} is a sequence in D*(θ) converging to x ∈ S. Then, for any y ∈ S, we have

The arguments leading to Theorem 10.6 do not extend to the case where S ⊂ ℝⁿ for n ≥ 2, since now the failure of x₁ and x₂ to satisfy x₁ ≥ x₂ does not imply that x₁ < x₂. Thus, additional assumptions will be needed in this case. The appropriate condition is, it turns out, that of supermodularity in x. The following result is the centerpiece of this chapter:

Theorem 10.7 Let S be a compact sublattice of ℝⁿ, Θ be a sublattice of ℝˡ, and f: S × Θ → ℝ be a continuous function on S for each fixed θ. Suppose that f satisfies increasing differences in (x, θ), and is supermodular in x for each fixed θ. Let the correspondence D* from Θ to S be defined by

D*(θ) = arg max{f(x, θ) | x ∈ S}.

Then:

1. For each θ ∈ Θ, D*(θ) is a nonempty compact sublattice of ℝⁿ, and admits a greatest element, denoted x*(θ).

2. x*(θ₁) ≥ x*(θ₂) whenever θ₁ > θ₂.

3. If f satisfies strictly increasing differences in (x, θ), then x₁ ≥ x₂ for any x₁ ∈ D*(θ₁) and x₂ ∈ D*(θ₂), whenever θ₁ > θ₂.


f(x₁ ∧ x₂, θ₂) − f(x₂, θ₂) ≤ 0.

If the feasible correspondence D is allowed to be nonconstant, and if x₁ ∨ x₂ ∉ D(θ₁) (say), then we could not conclude from the optimality of x₁ at θ₁ that the first of these inequalities holds, since x₁ only maximizes f over D(θ₁). On the other hand, if we directly assume a condition that implies the feasibility of x₁ ∨ x₂ at θ₁ and of x₁ ∧ x₂ at θ₂, then we can obtain a generalization of Theorem 10.7 to the case where D is nonconstant:

and that the optimality of x₂ at θ₂ implied the inequality

Theorem 10.7 effectively assumes a constant feasible action correspondence D with D(θ) = S for all θ. The only significant role this assumption plays in the proof is in ensuring the feasibility of the action x₁ ∨ x₂ at θ₁, and the feasibility of the action x₁ ∧ x₂ at θ₂. To wit, the proof used the fact that the optimality of x₁ at θ₁ implied the inequality

Now, let θ₁ and θ₂ be given with θ₁ > θ₂. Let x₁ ∈ D*(θ₁) and x₂ ∈ D*(θ₂). Then, we have

0 ≤ f(x₁, θ₁) − f(x₁ ∨ x₂, θ₁)   (by optimality of x₁ at θ₁)
  ≤ f(x₁ ∧ x₂, θ₁) − f(x₂, θ₁)   (by supermodularity in x)
  ≤ f(x₁ ∧ x₂, θ₂) − f(x₂, θ₂)   (by increasing differences in (x, θ))
  ≤ 0                            (by optimality of x₂ at θ₂),

so equality holds at every point in this string.

Now, suppose x₁ = x*(θ₁) and x₂ = x*(θ₂). Since equality holds at all points in the string, it is the case that x₁ ∨ x₂ is also an optimal action at θ₁. If it were not true that x₁ ≥ x₂, then we would have x₁ ∨ x₂ > x₁, and this contradicts the definition of x₁ as the greatest element of D*(θ₁). Thus, we must have x₁ ≥ x₂, and this establishes Part 2 of the theorem.

To see Part 3, suppose that x₁ and x₂ are arbitrary selections from D*(θ₁) and D*(θ₂), respectively. Suppose we did not have x₁ ≥ x₂. Then, we must have x₁ ∨ x₂ > x₁ and x₁ ∧ x₂ < x₂. If f satisfies strictly increasing differences, then, since θ₁ > θ₂, we have

f(x₁ ∧ x₂, θ₁) − f(x₂, θ₁) < f(x₁ ∧ x₂, θ₂) − f(x₂, θ₂),

so the third inequality in the string becomes strict, contradicting the equality. □


It is easy to check that this correspondence does not meet the conditions of Corollary 10.8; indeed, the set B(−p_i) is not even a sublattice of ℝⁿ₊.
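A two-good numerical example (with made-up prices and income, purely for illustration) makes the failure concrete: two bundles can each be affordable while their join is not, so the budget set is not closed under joins.

```python
# Illustration: the budget set {x >= 0 : p·x <= I} is not a sublattice.
import numpy as np

p = np.array([1.0, 1.0])   # prices (invented)
I = 1.0                    # income (invented)

def affordable(x):
    return p @ x <= I + 1e-12

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
print(affordable(x), affordable(y))       # True True
print(affordable(np.maximum(x, y)))       # False: the join (1, 1) costs 2 > I
print(affordable(np.minimum(x, y)))       # True: the meet (0, 0) is affordable
```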

Letting x(p, I) denote a solution to this problem at the parameters (p, I), a question of some interest is whether, or under what conditions, "own-price" effects on optimal demand are negative, that is, whether x_i(p, I) is nonincreasing in the parameter p_i. It is easy to put this question into the framework of Corollary 10.8 by writing −p_i, rather than p_i, as the parameter of interest. Then, a nonincreasing own-price effect is the same thing as the optimal demand for commodity i being nondecreasing in the parameter −p_i, and this is exactly the manner in which the conclusions of Corollary 10.8 are stated.

Unfortunately, Corollary 10.8 cannot help us obtain an answer. Suppressing the dependence on the remaining parameters, the correspondence of feasible actions B(−p_i) is given by

B(−p_i) = {x ∈ ℝⁿ₊ | p_i x_i + Σ_{j≠i} p_j x_j ≤ I}.

Maximize u(x) subject to x ∈ B(p, I) = {z ∈ ℝⁿ₊ | p · z ≤ I}.

also meets the conditions of Corollary 10.8.On the other hand, these conditions are also sufficiently restrictive that many in­

teresting applications arc precluded. Consider, for instance, the utility maximizationproblem

satisfies them. More generally, if S = Θ = ℝᵐ₊, the correspondence

D(θ) = {x ∈ S | 0 ≤ x ≤ θ}

D(θ) = [0, θ]

The conditions on D given in Corollary 10.8 are less forbidding than they might appear at first sight. For instance, if S = Θ = ℝ₊, the correspondence

Corollary 10.8 Suppose S, Θ, and f meet the conditions of Theorem 10.7. Suppose also that D: Θ → P(S) satisfies the following conditions:

1. For all θ ∈ Θ, D(θ) is a compact sublattice of S.
2. For all θ₁ and θ₂ in Θ, and for all x₁ ∈ D(θ₁) and x₂ ∈ D(θ₂),

θ₁ ≥ θ₂ implies x₁ ∨ x₂ ∈ D(θ₁) and x₁ ∧ x₂ ∈ D(θ₂).

Then, if D*(θ) = arg max{f(x, θ) | x ∈ D(θ)}, both conclusions of Theorem 10.7 hold.


²The positive coefficient on p_j for j ≠ i captures the fact that an increase in the price of product j causes some consumers to move to the substitute offered by i; the negative coefficient on p_i reflects the demand that will be lost (or that will move to other firms) if i raises prices.

A simple calculation shows that r_i satisfies increasing differences in (p_i, p_{−i}). Since p_i is unidimensional, r_i also satisfies supermodularity in p_i for each p_{−i}.
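Since each best response is nondecreasing in the rivals' prices, iterating the joint best-response map is a natural way to compute an equilibrium numerically. The sketch below is not part of the text; the parameter values are invented for illustration.

```python
# Best-response (price) dynamics in the linear-demand Bertrand game.
import numpy as np

a = np.array([10.0, 12.0, 8.0])       # demand intercepts a_i (assumed values)
beta = np.array([2.0, 2.5, 1.5])      # own-price slopes β_i (assumed values)
gamma = np.array([[0.0, 0.5, 0.3],    # cross-price effects γ_ij, zero diagonal
                  [0.4, 0.0, 0.6],
                  [0.2, 0.3, 0.0]])
c = np.array([1.0, 1.5, 0.5])         # unit costs c_i (assumed values)

def best_response(p):
    # Maximizer of (p_i - c_i) * (a_i - β_i p_i + Σ_j γ_ij p_j) in p_i.
    return (a + gamma @ p + beta * c) / (2.0 * beta)

p = np.zeros(3)                        # start from the lowest prices
for _ in range(200):
    p_next = best_response(p)
    if np.allclose(p_next, p, atol=1e-12):
        break
    p = p_next                         # prices rise monotonically toward equilibrium

print("approximate Nash equilibrium prices:", np.round(p, 4))
print("fixed point check:", np.allclose(best_response(p), p))
```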

Finally, the strategy space S_i for firm i is simply ℝ₊, the set of all possible prices

r_i(p_i, p_{−i}) = (p_i − c_i) q_i(p_i, p_{−i}).

q_i(p) = a_i − β_i p_i + Σ_{j≠i} γ_ij p_j,

where β_i and (γ_ij)_{j≠i} are strictly positive parameters.² Suppose further that it costs firm i a total of c_i q_i to produce q_i units, where c_i > 0. Then, firm i's payoff from setting the price p_i, given the choices p_{−i} of the other firms, is

Example 10.9 A typical example of a supermodular game is the Bertrand oligopolymodel with linear demand curves. In this example, players i = 1, ... , n, are firms inan oligopolistic market. The firms' products are substitutes, but not perfect substitutesin the eyes of consumers. Thus, each firm has some degree of market power. It isassumed that firms compete by setting prices so as to maximize (own) profit. Supposethe demand curves are given by the following information: if the vector of priceschosen by the n firms is P = (PI, ... , Pn), the demand that arises for firm j'sproduct is

10.3.1 Supermodular Games
Recall from Section 9.4 that an n-person normal-form game is specified by a strategy set S_i and a payoff function r_i: S₁ × ⋯ × S_n → ℝ for each player i, i = 1, …, n. As earlier, we will write S for ×ⁿ_{i=1} S_i, and S_{−i} to denote the strategy sets of "everyone-but-i": S_{−i} = ×_{j≠i} S_j. The notation (s_i, s_{−i}) will be used to denote the obvious vector in S.

An n-person game is said to be a supermodular game if for each i, S_i is a sublattice of some Euclidean space, and r_i: S → ℝ is supermodular in s_i for each fixed s_{−i}, and satisfies increasing differences in (s_i, s_{−i}). The game is said to be strictly supermodular if r_i has strictly increasing differences in (s_i, s_{−i}) and is strictly supermodular in s_i for each fixed s_{−i}.

10.3 An Application to Supermodular Games
This section offers an application of the theory of the previous section to a class of n-person games known as supermodular games. For further examples along these lines, the reader is referred to Vives (1990), Milgrom and Roberts (1990), Fudenberg and Tirole (1990), or Topkis (1995).


Theorem 10.11 Suppose an n-player supermodular game has the property that for each i ∈ {1, …, n}, S_i is compact and r_i is continuous on S_i for each fixed s_{−i}. Then, the game has a Nash equilibrium.

Proof Given s ∈ S, player i solves³

Maximize r_i(s_i, s_{−i}) subject to s_i ∈ S_i.

³As in Section 9.4, player i's best-response problem depends on s only through s_{−i}. However, it is notationally convenient to write it as a function of s, rather than just s_{−i}.

10.3.3 Existence of Nash EquilibriumThe following result is an easy consequence of Theorem 10.7 on parametric mono­tonicity and Tarski's Fixed Point Theorem.

Theorem 10.10 (Tarski's Fixed Point Theorem) Let X be a nonempty compact sublattice of ℝᵐ. Let f: X → X be a nondecreasing function, i.e., x, y ∈ X with x ≥ y implies f(x) ≥ f(y). Then, f has a fixed point on X.

Proof See Tarski (1955). □

There are at least three features of the Tarski fixed point theorem that are worth stressing:

1. Unlike the great majority of fixed point theorems, the Tarski Fixed Point Theorem does not require convexity of the set X.

2. Equally unusually, the theorem does not require the map f to be continuous, but merely to be nondecreasing.

3. The theorem is valid for nondecreasing functions f, but (somewhat unintuitively) is false for nonincreasing functions f. Indeed, it is an elementary task to construct examples of nonincreasing functions from the compact sublattice [0, 1] into itself that do not have a fixed point; we leave this as an exercise.
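A finite special case shows the mechanics behind the theorem. In the sketch below (an illustration, not part of the text), X is the finite chain {0, 1, …, 10}; the orbit of the least element under a nondecreasing self-map is itself nondecreasing, so on a finite set it must stabilize, and the point where it stabilizes is a fixed point.

```python
# Tarski on a finite chain: a nondecreasing (even discontinuous) self-map
# of X = {0, ..., 10} must have a fixed point.
X = list(range(11))
f_table = [2, 2, 3, 3, 3, 7, 7, 7, 9, 9, 10]   # nondecreasing map given by a table

def f(x):
    return f_table[x]

x = min(X)                    # start at the least element of X
while f(x) != x:
    x = f(x)                  # the orbit is nondecreasing, hence stabilizes
print("fixed point:", x)      # here: 3
```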

10.3.2 The Tarski Fixed Point Theorem

As is almost always the case, the existence of Nash equilibrium in the class of supermodular games will also be proved through the use of a fixed point theorem. The relevant theorem in this case is the fixed point theorem due to Tarski, which deals with fixed points of monotone functions on lattices.

firm i may charge. Since this is evidently a sublattice of ℝ, the game we have described here is a supermodular game. □


If Z 2: z' or Z ~ z', this inequality trivially holds (in fact, we have an equality), sosuppose Z and z' are not comparable under 2:. For notational convenience, arrange

fez) + fez') ~ fez v z') + fez /\ z').

Proof That supermoduJarity implies increasing differences on Z can be establishedby a slight modification of the proof of Part 2 of Theorem 10.3. The details are leftto the reader.

To see the reverse implication (that increasing differences on Z implies supermod­ularity on Z), pick any Z and z' in Z. We are required to show that

Theorem 10.12 A function f: Z ⊂ ℝᵐ → ℝ is supermodular on Z if and only if f has increasing differences on Z.

In words, f has increasing differences on Z if it has increasing differences in eachpair (Zj, Zj) when all other coordinates are held fixed at some value.

10.4 AProof of the Second-Derivative Characterization of Supermodularity

Our proof of Theorem lOA will rely on a characterization of supermodularity interms of increasing differences that was proved in Topkis (1978). Some new notationand definitions are required.

Let Z ⊂ ℝᵐ. For z ∈ Z, we will denote by (z_{−ij}, z_i′, z_j′) the vector z, but with z_i and z_j replaced by z_i′ and z_j′, respectively. A function f: Z → ℝ will be said to satisfy increasing differences on Z if for all z ∈ Z, for all distinct i and j in {1, …, m}, and for all z_i′, z_j′ such that z_i′ ≥ z_i and z_j′ ≥ z_j, it is the case that

f(z_{−ij}, z_i′, z_j′) − f(z_{−ij}, z_i′, z_j) ≥ f(z_{−ij}, z_i, z_j′) − f(z_{−ij}, z_i, z_j).
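The definition lends itself to a brute-force numerical check. The sketch below (an illustration, not part of the text) perturbs each pair of coordinates upward and tests the displayed inequality, here for the supermodular function f(z) = z₁z₂z₃ from Example 10.5.

```python
# Brute-force check of pairwise increasing differences at random base points.
import itertools
import random

def has_increasing_differences(f, z, bumps=(0.5, 1.0), tol=1e-12):
    m = len(z)
    for i, j in itertools.permutations(range(m), 2):
        for di in bumps:
            for dj in bumps:
                zi_hi = list(z); zi_hi[i] += di          # (z_-ij, z_i', z_j)
                zj_hi = list(z); zj_hi[j] += dj          # (z_-ij, z_i, z_j')
                both = list(z); both[i] += di; both[j] += dj   # (z_-ij, z_i', z_j')
                lhs = f(both) - f(zi_hi)
                rhs = f(zj_hi) - f(z)
                if lhs < rhs - tol:
                    return False
    return True

random.seed(1)
f = lambda z: z[0] * z[1] * z[2]                          # supermodular on R^3_+
print(all(has_increasing_differences(f, [random.uniform(0, 5) for _ in range(3)])
          for _ in range(1000)))                          # True
```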

Since each S_i is a compact sublattice, so is S. By Tarski's Fixed Point Theorem, b* has a fixed point. Any fixed point of b* is clearly a Nash equilibrium of the game. □

b*(s) = (b₁*(s), …, b_n*(s)).

Since r_i is supermodular in s_i and has increasing differences in (s_i, s_{−i}), and since S_i is a compact sublattice of some Euclidean space, it follows from Theorem 10.7 that the set of maximizers in this problem (i.e., player i's best-response correspondence BR_i(·)) admits a greatest element b_i*(s) at each s ∈ S, and that b_i*(s) is nondecreasing in s. Therefore, so is the map b*: S → S, where


Since z and z′ were chosen arbitrarily, we have shown that f is supermodular on Z. Theorem 10.12 is proved. □

This is precisely the statement that

f(z ∨ z′) − f(z) ≥ f(z′) − f(z ∧ z′).

Since this inequality holds for all j satisfying k :::: j < m, it follows that the left-handside is at its highest value at j = m - I, while the right-hand side is at its lowestvalue when j = k. Therefore,

f(z^{k,m}) − f(z^{0,m}) ≥ f(z^{k,k}) − f(z^{0,k}).

k-I?(L [[(z;+I.) - I(zi.));=0

k-Il(zk,)+I) - [(zO.}+I)= L[[(z;+I,)+I) - fezi·i+1»)

;=0

Therefore, we have for k ::::j < m ,

j < m,Since 1has increasing differences on Z, it is the ease that for all 0 ::: i < k .:5

ZO.k = z 1\ z'

zk.m = z v z'zO.m = z

zk.k = z',

Then, we have

(This rearrangement of the coordinates may obviously be accomplished without lossof generality.) Note that since z and z' are not comparable under ?:_, we must have0< k < m.

Now forO::: i :::i z: m, define

the coordinates of z and z' so'thlii .


5. Let C(S) be the set of all continuous functions on S = [0, 1]. For f, g ∈ C(S), let f ∨ g and f ∧ g be defined by

(f ∨ g)(x) = max{f(x), g(x)}   and   (f ∧ g)(x) = min{f(x), g(x)}.

Show that f ∨ g ∈ C(S) and f ∧ g ∈ C(S), so C(S) is a lattice.

6. Suppose in Part 3 of Theorem 10.7. we replaced the condition that f satisfiesstrictly increasing differences with the condition that f is strictly supermoduJar.

10.5 Exercises

I. Show that the hyperplane H = {(x, y) I x - y = I} is a sublattice of]R2.

2. Give an example of a nonempty compact sublattice in ]R2which has at least twogreatest elements, or show that no such example is possible.

3. Suppose a closed set X in R2 has the property that the meet of any two pointsfrom X is also in X. Suppose also that X is bounded above, i.e., there is a E JR2

such that x S a for all x E X. Is it true that X must have a greatest element?Why or why not?

4. Give an example of a set X in JR2 with the property that the join of any two pointsin X is always in X, but there are at least two points x and y in X whose meet isnot in X.

We proceed with the proof of Theorem 10.4. Let f be a C² function on Z ⊂ ℝᵐ. By Theorem 10.12, f is supermodular on Z if and only if for all z ∈ Z, for all distinct i and j, and for all ε > 0 and δ > 0, we have

f(z_{−ij}, z_i + ε, z_j + δ) − f(z_{−ij}, z_i + ε, z_j) ≥ f(z_{−ij}, z_i, z_j + δ) − f(z_{−ij}, z_i, z_j).

Dividing both sides by the positive quantity δ and letting δ ↓ 0, we see that f is supermodular on Z if and only if for all z ∈ Z, for all distinct i and j, and for all ε > 0,

∂f/∂z_j (z_{−ij}, z_i + ε, z_j) ≥ ∂f/∂z_j (z_{−ij}, z_i, z_j).

Subtracting the right-hand side from the left-hand side, dividing by the positive quantity ε, and letting ε ↓ 0, f is seen to be supermodular on Z if and only if for all z ∈ Z, and for all distinct i and j, we have

∂²f/∂z_i∂z_j (z) ≥ 0.

Theorem 10.4 is proved. □


Will the result still hold that D* is a nondecreasing correspondence? Why or why not?

7. Let x ∈ S = [0, 1] and θ ∈ Θ = ℝ₊. In each of the following cases, determine if the given function is supermodular in (x, θ). In each case, determine also if the optimal action correspondence in the problem max{f(x, θ) | x ∈ S} is nondecreasing in θ.

(a) f(x, θ) = xθ − x²θ².
(b) f(x, θ) = xθ − x².
(c) f(x, θ) = x/(1 + θ).
(d) f(x, θ) = x(1 + θ).
(e) f(x, θ) = x(θ − x).

8. Prove Tarski's Fixed Point Theorem for the case where S ⊂ ℝ.

9. Show that Tarski's Fixed Point Theorem is false if "nondecreasing" is replaced by "nonincreasing." That is, give an example of a nonincreasing function from a compact lattice into itself that fails to have a fixed point.


11 Finite-Horizon Dynamic Programming

11.1 Dynamic Programming Problems
A dynamic programming problem is an optimization problem in which decisions have to be taken sequentially over several time periods. To make the problem non-trivial, it is usually assumed that periods are "linked" in some fashion, viz., that actions taken in any particular period affect the decision environment (and thereby, the reward possibilities) in all future periods. In practice, this is typically achieved by positing the presence of a "state" variable, representing the environment, which restricts the set of actions available to the decision-maker at any point in time, but which also moves through time in response to the decision-maker's actions. These twin features of the state variable provide the problem with "bite": actions that look attractive from the standpoint of immediate reward (for instance, a Caribbean vacation) might have the effect of forcing the state variable (the consumer's wealth or savings) into values from which the continuation possibilities (future consumption levels) are not as pleasant. The modelling and study of this trade-off between current payoffs and future rewards is the focus of the theory of dynamic programming.

In this book, we focus on two classes of dynamic programming problems: Finite-Horizon Markovian Dynamic Programming Problems, which are the subject of this chapter, and Infinite-Horizon Stationary Discounted Dynamic Programming Problems, which we examine in the next chapter.

11.2 Finite-Horizon Dynamic Programming
A Finite-Horizon (Markovian) Dynamic Programming Problem (henceforth FHDP) is defined by a tuple {S, A, T, (r_t, f_t, Φ_t)_{t=1}^T}, where

1. S is the state space of the problem, with generic element s.
2. A is the action space of the problem, with generic element a.
3. T, a positive integer, is the horizon of the problem.

4. For each t ∈ {1, …, T},

(a) r_t: S × A → ℝ is the period-t reward function,
(b) f_t: S × A → S is the period-t transition function, and
(c) Φ_t: S → P(A) is the period-t feasible action correspondence.

The FHDP has a simple interpretation. The decision-maker begins from some fixed initial state s₁ = s ∈ S. The set of actions available to the decision-maker at this state is given by the correspondence Φ₁(s₁) ⊂ A. When the decision-maker chooses an action a₁ ∈ Φ₁(s), two things happen. First, the decision-maker receives an immediate reward of r₁(s₁, a₁). Second, the state s₂ at the beginning of period 2 is realized as s₂ = f₁(s₁, a₁). At this new state, the set of feasible actions is given by Φ₂(s₂) ⊂ A, and when the decision-maker chooses an action a₂ ∈ Φ₂(s₂), a reward of r₂(s₂, a₂) is received, and the period-3 state s₃ is realized as s₃ = f₂(s₂, a₂). The problem proceeds in this way till the terminal date T is reached. The objective is to choose a plan for taking actions at each point in time in order to maximize the sum of the per-period rewards over the horizon of the model, i.e., to solve

Maximize  Σ_{t=1}^T r_t(s_t, a_t)

subject to  a_t ∈ Φ_t(s_t), t = 1, …, T,
            s_t = f_{t−1}(s_{t−1}, a_{t−1}), t = 2, …, T,
            s₁ = s ∈ S.

A cleaner way to represent this objective (and, as we explain, an analytically more advantageous one) is to employ the notion of a "strategy." We turn to this now.

11.3 Histories, Strategies, and the Value Function
A strategy for a dynamic programming problem is just a contingency plan, i.e., a plan that specifies what is to be done at each stage as a function of all that has transpired up to that point. In more formal language, a t-history η_t is a vector (s₁, a₁, …, s_{t−1}, a_{t−1}, s_t) of the state s_τ in each period τ up to t, the action a_τ taken in that period, and the period-t state s_t. Let H₁ = S, and for t > 1, let H_t denote the set of all possible t-histories η_t. Given a t-history η_t, we will denote by s_t[η_t] the period-t state under the history η_t.

A strategy σ for the problem is a sequence (σ_t)_{t=1}^T, where for each t, σ_t: H_t → A specifies the action σ_t(η_t) ∈ Φ_t(s_t[η_t]) to be taken in period t as a function of the history η_t ∈ H_t up to t. The requirement that σ_t(η_t) be an element of Φ_t(s_t[η_t]) ensures that the feasibility of actions at all points is built into the definition of a strategy. Let Σ denote the set of all strategies σ for the problem.
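For readers who find it helpful to see the objects of Section 11.2 laid out concretely, the following Python container (a schematic illustration with invented names, not part of the text) records the tuple {S, A, T, (r_t, f_t, Φ_t)} for finite state and action sets.

```python
# Schematic container for a finite-horizon dynamic programming problem.
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class FHDP:
    S: Sequence            # state space (a finite grid here)
    A: Sequence            # action space (a finite grid here)
    T: int                 # horizon
    r: List[Callable]      # r[t](s, a): period-t reward, t = 0, ..., T-1
    f: List[Callable]      # f[t](s, a): period-t transition, t = 0, ..., T-1
    Phi: List[Callable]    # Phi[t](s): feasible actions at state s in period t
```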


IThe value function V in this problem is the exact analog of the maximized value function f' that weexamined in the context of parametric families of optimization problems. Indeed. the initial state s of theFHDP can be viewed precisely as parametrizing the problem.

In this notation, the objective of the decision-maker in our dynamic programmingproblem can be described as the problem of finding an optimal strategy. Writing theproblem in this way has (at least) two big advantages over writing it as the problem ofmaximizing rewards over the set of feasible state-action sequences from each initialstate.

The first advantage is a simple one. If we succeed in obtaining an optimal strategy,this strategy will enable us to calculate the optimal sequence of actions from anyinitial statc. In particular, this will enable us to carry out "comparative dynamics"exercises, where we can examine the effect of a change in the value of the initial state(or some parameter of the problem) on the sequence of optimal actions and states.

The second advantage is deeper. In the formulation of dynamic programmingproblems we have given here. we have assumed that transitions are deterministic.

Each strategy σ ∈ Σ gives rise from each initial state s ∈ S to a unique sequence of states and actions {s_t(σ, s), a_t(σ, s)}, and therefore, to a unique sequence of histories η_t(σ, s), in the obvious recursive manner: we have s₁(σ, s) = s, and for t = 1, …, T,

a_t(σ, s) = σ_t[η_t(σ, s)],
s_{t+1}(σ, s) = f_t[s_t(σ, s), a_t(σ, s)],

where η_t(σ, s) = (s₁(σ, s), a₁(σ, s), …, s_t(σ, s)).

Thus, given an initial state s₁ = s ∈ S, each strategy σ gives rise to a unique period-t reward r_t(σ)(s) from s, defined as

r_t(σ)(s) = r_t(s_t(σ, s), a_t(σ, s)).

The total reward under σ from the initial state s, denoted W(σ)(s), is, therefore, given by

W(σ)(s) = Σ_{t=1}^T r_t(σ)(s).

Now, define the function V: S → ℝ by

V(s) = sup_{σ ∈ Σ} W(σ)(s).

The function V is called the value function of the problem.¹ A strategy σ* is said to be an optimal strategy for the problem if the payoff it generates from any initial state is the supremum over possible payoffs from that state, that is, if

W(σ*)(s) = V(s), for all s ∈ S.


2Hence, the adjective "Markovian" (0describe these problems.

We shall call the problem {S, A, T−τ, (r_t*, Φ_t*, f_t*)_{t=1}^{T−τ}} the (T−τ)-period continuation problem and, for notational simplicity, denote it by {S, A, T−τ, (r_t, Φ_t, f_t)_{t=τ+1}^{T}}. Clearly, all τ-histories η_τ that end in s_τ result in the same continuation (T−τ)-period problem. Therefore, at any point t in an FHDP, the current state s_t encapsulates all relevant information regarding continuation possibilities from period t onwards, such as the strategies that are feasible in the continuation and the consequent rewards that may be obtained.²

r_t*(s, a) = r_{t+τ}(s, a), (s, a) ∈ S × A,
Φ_t*(s) = Φ_{t+τ}(s), s ∈ S,
f_t*(s, a) = f_{t+τ}(s, a), (s, a) ∈ S × A.

11.4 Markovian Strategies

Any τ-history η_τ = (s₁, a₁, …, s_τ) in a given FHDP

{S, A, T, (r_t, Φ_t, f_t)_{t=1}^T}

results in another FHDP, namely the (T−τ)-period problem given by

{S, A, T−τ, (r_t*, Φ_t*, f_t*)_{t=1}^{T−τ}},

whose initial state is s = s_τ, and where, for t = 1, …, T−τ, we have:

i.e., that given any pcriod-r statc.s, and.pcriod-t action a" the period- (t -+- I) S!<tIC, I,

precisely determined as Ji (s/. at). In a variety of economic problems, it might be morerealistic to replace this assumption with one where (s,. at) determines a probabilitydistribution over S, according to which the period-(I + I) state is realized. When thetransition is allowed to be stochastic in this manner, it may be simply impossible topick a sequence of actions a priori, since some of the actions in this sequence maytum out to be infeasible at the realized states. Moreover, even if infeasibility is notan issue, picking an a priori sequence of actions leaves us without any flexibility,in that it is not possible to vary continuation behavior based on the actual (i.c.,realized) states. It is obvious that this is a serious shortcoming: it will rarely be thecase that a single continuation action is optimal, regardless of which future state isrealized. On the other hand, since a strategy specifics a contingency plan of action,its prescriptions do not suffer from such problems. We will not be dealing with thestochastic transitions case in this book, since the technical details involved arc quitemessy. Nonetheless, at a conceptual level, the techniques used to show existenceare exactly the same as what we develop here for the deterministic case. By dealingwith strategy spaces rather than sequences of actions, the methods we develop willcontinue to be applicable to the more general case.


11.5 Existence of an Optimal Strategy
Let a strategy σ = {σ₁, …, σ_T} be given, and suppose {g_τ, …, g_T} is a Markovian strategy for the (T−τ+1)-period continuation problem. Then, we will use the notation

(σ₁, …, σ_{τ−1}, g_τ, …, g_T)

to represent the strategy in which the decision-maker acts according to the recommendations of σ for the first (τ−1) periods, and then switches to following the dictates of the strategy (g_τ, …, g_T).

The key to proving the existence of an optimal strategy, indeed, of a Markovian optimal strategy, is the following lemma:

Lemma 11.1 Let σ = (σ₁, …, σ_T) be an optimal strategy for the FHDP

{S, A, T, (r_t, f_t, Φ_t)_{t=1}^T}.

Suppose that for some τ ∈ {1, …, T}, the (T−τ+1)-period continuation problem

{S, A, T−τ+1, (r_t, f_t, Φ_t)_{t=τ}^T}

admits a Markovian optimal strategy {g_τ, …, g_T}. Then, the strategy (σ₁, …, σ_{τ−1}, g_τ, …, g_T) is an optimal strategy for the original problem.

Since continuation possibilities from a state are not affected by how one arrivesat that state, it appears intuitively plausible that there is no gain to be made byconditioning actions on anything more than just the value of the current state and thetime period in which this state was reached. (This last factor is obviously important;to calculate optimal continuations from a given state s requires knowledge of thenumber of time periods remaining, or, equivalently, what the current date is.) Thisleads us to the notion of a Markovian strategy.

A Markovian strategy is a strategy σ in which, at each t = 1, …, T, σ_t depends on the t-history η_t only through t and the value of the period-t state s_t[η_t] under η_t. Such a strategy can evidently be represented simply by a sequence {g₁, …, g_T}, where for each t, g_t: S → A specifies the action g_t(s_t) ∈ Φ_t(s_t) to be taken in period t, as a function of only the period-t state s_t.

If a Markovian strategy {gl, ... ,gT} is also an optimal strategy. it is called aMarkovian optimal strategy. This terminology is standard, but is a little ambiguous;to avoid legitimate confusion, it should be stressed that a Markovian optimal strategyis a strategy that is optimal in the class of all strategies, and not just among Markovianstrategies.


Let gj-(s) be a solution to this problem at the state s. By Lemma ILl, the solutiongi- can be used in the last period of an optimal strategy, without changing the totalrewards. Now, given that we are going to use gr in the last period, we can find theactions in the two-period problem beginning at period-(T -I), that will be optimal forthe two-period problem. This gives us an strategy pair (gj- -I' gj.) that is optimal forthe two-period continuation problem beginning at period T-I. An induction argument

Maximize rr (s, a) subject to a E 4>1' (r ).

The importance of Lemma 11.1 arises from the fact that it enables us to solve foran optimal strategy by the method of backwards induction. That is, we first considerthe one-period problem in which, from any s E S, we solve

Ifwe now lets; denote the common period-r state under both y and (J, this inequalitystates that from s;, the strategy {gT' ... , gT} is strictly dominated by the continuationstrategy {aT" .. ,aT}' This is an obvious contradiction of the presumed optimalityof {gr, ... , srl in the (T +r-l-I )-period problem: for, when beginning from the initialstate s; in the (T-T+I )-period problem, one could simply behave "as if' the historyl1r (a, s) had occurred, and follow the continuation strategy {ar, ... , aT]. This wouldresult in a strictly larger reward from s; than from following [gr, ... , g-T]. [J

I=TI=T

By construction, 'L;;:i r/(O')(s) = 'L;;:i r/(y)(s), since a and yare identical overthe first (r -I)-periods. Therefore, for W (a)(s) > W (y)(s) to hold, we must have

T TLr/(a)(s) > Lr/(y)(s).

I=T/=1

T-I T= L r/(y)(s) +L I'/(Y)(S).

TW(y)(s) :::: Lr/(y)(s)

1=1

and

r-l rL I'/(a)(s) +L I'/(a)(s),/=1 /",cr

--,W(O')(s) :::: I:.> (a)(s)

/=1

Letting r,Ca )(s) and r( (y )(s) denote the period-r rewards under a and y, respectively.from the initial state .'I, we have

W(a)(s) > W(y)(s).

Proof For notational case, denote the strategy [o i, ... ,aT -I ,gr, ... ,gr J by y-.-IIthe claim in the theorem were not true, then there would exist an initial state s of theT -period problem, such that the total payoffs under y from the state s were stricti ydominated by the total payoff from s under the optimal strategy a, i.e., such that


Since rt is continuous on S x A, and <1>ris a compact-valued continuous corre­spondence, the solution to this problem is well defined. The maximized value of theobjective function is, of course, simply Vr(s). Let <1>j.(s)denote the set of maximiz­ers at s. By the Maximum Theorem, Vr is a continuous function on S, and <I>} is anonempty-valued usc correspondence from S into A.

Now, pick any function gr: S -+ A such that gr(s) E <l>j(s) for all s E S. (Sucha function is called a selection from <1>r.)Then, at any initial state of the one-periodproblem, the function gr recommends an action that is optimal from that state, sogr is an optimal strategy for the one-period problem. Note that <l>rmay admit manyselections, so the optimal strategy for the one-period problem may not be unique;however, all optimal strategies for this problem result in the payoff Vr(s) from theinitial state s.

Now pick any optimal strategy gr for the one-period problem, and consider thetwo-period problem from the initial state s (i.e., when s is the state at the beginning ofperiod (T-1». If the action a E<l>r-I (s) is now taken, the immediate reward receivedis rt -I (s, a).The period- T state is then realized as fr -I (s, a).The maximum one­period reward from the state fr _I (s, a) in the continuation one-period problem is,by definition, Vr[fT-I (s, a»), which can be attained using the action recommended

max{rr(s, a) I a E <l>r(s)}.

Proof We begin with the last period. A strategy gr for the one-period problem canbe optimal if, and only if, at each s, gr(s) solves:

now completes the construction of the optimal strategy. The formal details of this method are given below in Theorem 11.2. But first, since even the one-period problem will not have a solution unless minimal continuity and compactness conditions are met, we shall impose the following assumptions:

A1 For each t, r_t is continuous and bounded on S × A.

A2 For each t, f_t is continuous on S × A.

A3 For each t, Φ_t is a continuous, compact-valued correspondence on S.

Theorem 11.2 Under A1-A3, the dynamic programming problem admits a Markovian optimal strategy. The value function V_t(·) of the (T−t+1)-period continuation problem satisfies, for each t ∈ {1, …, T} and s ∈ S, the following condition, known as the "Bellman Equation," or the "Bellman Principle of Optimality":

V_t(s) = max_{a ∈ Φ_t(s)} {r_t(s, a) + V_{t+1}[f_t(s, a)]}.
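The Bellman recursion of Theorem 11.2 translates directly into a backwards-induction computation when S and A are finite grids. The sketch below is illustrative only; it assumes the hypothetical FHDP container sketched in Section 11.2 and that every transition f_t(s, a) lands exactly on a grid state.

```python
# Backwards induction over finite grids: returns value functions V[t]
# and a Markovian policy g[t], for t = 0, ..., T-1 (0-based time index).
def solve_backwards(problem):
    T = problem.T
    V = [dict() for _ in range(T + 1)]
    g = [dict() for _ in range(T)]
    for s in problem.S:
        V[T][s] = 0.0                                   # no rewards after the horizon
    for t in reversed(range(T)):                        # t = T-1, ..., 0
        for s in problem.S:
            best_val, best_a = float("-inf"), None
            for a in problem.Phi[t](s):
                # Bellman equation: current reward plus continuation value.
                val = problem.r[t](s, a) + V[t + 1][problem.f[t](s, a)]
                if val > best_val:
                    best_val, best_a = val, a
            V[t][s], g[t][s] = best_val, best_a
    return V, g
```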


By hypothesis, r, and It are both continuous in (.I', a), and VI+I is continuous on S.Therefore, the objective function in this problem is continuous on S x A, and since<1>/ is a continuous compact-valued correspondence, a maximum exists at each s; the

max (rr(s,a) + VI+ILfr(s,a)]}.aE¢,(s)

r.ts, a) + Vr+1 [fr(s, a)].

It follows that g7 can be part of an optimal strategy in the (T -( + I)-period problemif, and only if, at each s, g7(s) solves:

It now follows that (gj- _I' gj.) is an optimal strategy for the two-period problem if,and only if, at each S E S, gj. _I solves

max {rr-I (s, a) + Vr[fr-I (.I', a»)).aE¢T_!(S)

Since fr-I is continuous on S x A by assumption, and we have shown that Vr IS

continuous on S, it must be the case that Vr[!r -1(s, a)] is continuous in (5, a). Sincerr _I is also continuous on S x A, the objective function in this problem is continuouson S x A. By hypothesis, ¢r-I is a continuous, compact-valued correspondence.Thus, invoking the Maximum Theorem again, we see that the maximum is well­defined, i.e., the correspondence of maximizers ¢'T _I is nonempty valued and usc,and the maximized value (which is, of course, just Vr -I) is continuous on S. Letgj. _I be any selection from ¢'T -1' By construction, then, (gj. _ 1' gj,) is an optimalstrategy for the two-period problem. Since it is also a Markovian strategy, it is aMarkovian optimal strategy for the two-period problem.

To show that there is a solution for any t, we use induction. Suppose that for somet E (I, ... , T -I), we have shown the following:

I. A solution (that is, a Markovian optimal strategy (g:+ I' ... ,gj. I) exists for the(T -I)-period problem.

2. The value function V/+Iof the (T -/ )-period problem is continuous on S'.

We will show that these properties are also true for the (T - t+ I )-period problem.Suppose the initial state of the (T -t + I)-period problem (i.e., at the beginning

of period r) is given by s, If the action a E <1>/(s)is taken, then an immediatereward of r (.I', a) is received; the state at the beginning of period I is then fr (s, a).The maximum one can receive in the continuation is, by definition, Vr+ 1 [fr(s. a)].which can be obtained using (g:+I' ...• gj.). Thus, given the action a in period I,

the maximum possible reward is

by gj- at this state. Thus, given'th;i the action a is taken at state s in the first period'of the two-period model, the maximum reward that can be obtained is

rr-t(s,a) + Vr{fr-I(s,a»).


Now, consider the two-period problem starting from some level w ∈ S. The action g_{T−1}(w) is part of an optimal strategy for the two-period problem if, and only if,

V_T(w) = √w,  w ∈ S.

It is easy to see that since u(·) is a strictly increasing function, the unique solution at any w is to consume everything; thus, the unique optimal strategy for the one-period problem is g_T(w) = w for all w ∈ S. The one-period value function V_T is, then,

max_{c ∈ [0, w]} u(c).

11.6 An Example: The Consumption-8avings Problem

We illustrate the use of backwards induction in this section in a simple finite-horizondynamic optimization problem. A consumer faces aT-period planning horizon,where T is a finite positive integer. He has an initial wealth of W E IR+. If hebegins period-t with a wealth of WI, and consumes CI (WI ~ c, ~ 0) that period,then his wealth at the beginning of the next period is (WI - c,)( I + r), where r ~ 0is the interest rate. Consumption of c in any period gives the consumer utility ofu(c), where u(·) is a continuous, strictly increasing function on IR+~The consumer'sobjective is to maximize utility over the T -period horizon.

We shall first set up the consumer's maximization problem as a dynamic program­ming problem. Then, assuming u(c) = -/C, we shall obtain an exact solution.

The natural choice for the state space S is the set of possible wealth levels whichis IR+- Similarly, the natural choice for the action space A is the set of possibleconsumption levels, which is also IR+.

The reward function r_t depends only on the consumption c in any period, and is given by r_t(w, c) = u(c). The transition function f_t is given by f_t(w, c) = (w − c)(1 + r). And, finally, the feasible action correspondence Φ_t is given by Φ_t(w) = [0, w] for all w.

Now suppose u (c) = -vic. For ease of notation, let k = (l + r). In the last period,given a wealth level of w, the consumer solves:

maximized value of the objective function (which is VI) is continuous on S; and thecorrespondence of maximizers ct>7 is usc on S_ Letting g: denote any selection fromct>~, we see that {g:, ... , gr} is an optimal strategy for the (T -H I)-period problem.The induction step is complete.

Finally, note that the Bellman Equation at each t has also been established in thecourse of proving the induction argument. 0


Reworking this procedure for the three-period problem, the four-period problem, etc., is conceptually trivial, if a bit painful to carry out. When one actually does the calculations, a pattern begins to emerge: we obtain

g_t(w) = w / (1 + k + ⋯ + k^{T−t}),  w ∈ S,

and

V_t(w) = (1 + k + ⋯ + k^{T−t})^{1/2} w^{1/2},  w ∈ S.

To check that these are the correct expressions, we use induction. Suppose that these are the forms of V_τ and g_τ for τ ∈ {t+1, …, T}. We will show that V_t and g_t also have these forms.

In period t, at the state w, the consumer solves:

max_{c ∈ [0, w]} {√c + V_{t+1}[k(w − c)]},

where V_{t+1}[k(w − c)] = (1 + k + ⋯ + k^{T−t−1})^{1/2} [k(w − c)]^{1/2} by the induction hypothesis. This is, once again, a strictly convex optimization problem, so g_t(w) is uniquely determined by the solution. A simple calculation shows that it is, in fact, given by

g_t(w) = w / (1 + k + ⋯ + k^{T−t}),  w ∈ S.
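The closed forms above are easy to confirm numerically. The following grid-based backwards induction is illustrative only (the horizon, interest rate, and grid are invented); it recovers first-period consumption close to w/(1 + k + ⋯ + k^{T−1}) at w = 1.

```python
# Numerical check of g_t(w) = w / (1 + k + ... + k^(T-t)) with u(c) = sqrt(c).
import numpy as np

T, r = 5, 0.10
k = 1.0 + r
grid = np.linspace(0.0, 2.0, 2001)          # wealth/consumption grid (covers growth)

V_next = np.sqrt(grid)                      # V_T(w) = sqrt(w)
policy = {T: grid.copy()}                   # g_T(w) = w
for t in range(T - 1, 0, -1):
    V_t = np.empty_like(grid)
    g_t = np.empty_like(grid)
    for i, w in enumerate(grid):
        c = grid[grid <= w]                 # feasible consumption levels at w
        w_next = k * (w - c)                # next-period wealth
        vals = np.sqrt(c) + np.interp(w_next, grid, V_next)
        j = int(np.argmax(vals))
        V_t[i], g_t[i] = vals[j], c[j]
    V_next, policy[t] = V_t, g_t

w0 = 1.0
i0 = int(np.argmin(np.abs(grid - w0)))
closed_form = w0 / sum(k ** j for j in range(T))    # 1 + k + ... + k^(T-1)
print("grid solution g_1(1.0) ~", round(float(policy[1][i0]), 4),
      " closed form:", round(closed_form, 4))
```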

g_{T−1}(w) solves

max_{c ∈ [0, w]} {√c + V_T[k(w − c)]}.

(Recall that k = (1 + r).) Thus, g_{T−1}(w) must solve

max_{c ∈ [0, w]} {√c + √(k(w − c))}.

This is a strictly convex optimization problem. The first-order conditions are, therefore, necessary and sufficient to identify a solution. A simple calculation reveals that there is a unique solution at each w, given by w/(1 + k). So, if g_{T−1} is to be part of an optimal strategy for the two-period problem, we must have

g_{T−1}(w) = w / (1 + k),  w ∈ S.

Substituting this into the maximization problem, we see also that the two-period value function V_{T−1} is given by V_{T−1}(w) = (1 + k)^{1/2} w^{1/2}, w ∈ S.


subject to Σ_{t=1}^T a_t = C,
a_t ≥ 0, t = 1, …, T.

TMaximize L Sta~

1=1

11.7 Exercises

I. Redo the consumer's multi period utility maximization problem of Section 11.6for each of the following specifications of the utility function:

(a) !l(c) = cU, a E (0, 1).(b) !l(c) = c.(c) u(c) = cP, f3 > 1.

2. Consider the following problem of optimal harvesting of a natural resource. A firm (say, a fishery) begins with a given stock y > 0 of a natural resource (fish). In each period t = 1, …, T, of a finite horizon, the firm must decide how much of the resource to sell on the market that period. If the firm decides to sell x units of the resource, it receives a profit that period of π(x), where π: ℝ₊ → ℝ. The amount (y − x) of the resource left unharvested grows to an available amount of f(y − x) at the beginning of the next period, where f: ℝ₊ → ℝ₊. The firm wishes to choose a strategy that will maximize the sum of its profits over the model's T-period horizon.

(a) Set up the firm's optimization problem as a finite-horizon dynamic programming problem, i.e., describe precisely the state space S, the action space A, the period-t reward function r_t, etc.

(b) Describe sufficient conditions on f and π under which an optimal strategy exists in this problem.

(c) Assuming π(x) = log x and f(x) = x^α (0 < α ≤ 1), solve this problem for the firm's optimal strategy using backwards induction.

3. Given WI, W2, •.. , WT E 1R++and C E 1R++express the following problem as adynamic programming problem, i.e., find S, A. r, j" and 4>,. and solve it usinghackwards induction. The problem is:

The induction step is complete. 0

Substituting this back into the objective function of the problem, we obtain

V/(w) = (l+ k + ...+ e-/)1/2wl/2, wE S.

Chapter J J Finite-Horizon Dynamic Programming278

Page 287: A First Course in Optimization Theory

36I5

9673

64

6. Let an m x n matrix A be given. Consider the problem of finding a path betweenentries O;j in the matrix A which (i) starts at a 11 and ends at limn, (ii) which movesonly to the right or down, and (iii) which maximizes the sum of the entries ai;encountered. Express this as a dynamic programming problem. Using backwardsinduction solve the problem when the matrix A is given by:

PI ::: 0, St ::: 0, t = 1, ... , T.

where Pt and SI are parameters such that:

rLPr= 11=1

a, :::0, t = 1, ... , T

Tsubject to La, = C

1=1

5. Express the following problem as a dynamic programming problem and solve it,

T",_PtSIMaximize L1=1 SI +al

at ~ I, t = 1,... , T.

where 51, ... ,5r and c are given positive constants.

(a) Express this problem as a dynamic programming problem and solve it.(b) Repeat this exercise with the constraints modified to

T

Oat = c1=1

Tsubject to La, = C

1=1a, ~ 0, t = 1, ... , T

4. Consider the following maximization problem:

TMaximize L a~1

1=1

27')/1.7 Exercises

Page 288: A First Course in Optimization Theory

Tsubject to L Ct = c

(=1c( 2: o.

8. A firm is faced with achoice of two technologies. The first (the known technology)results in the finn incurring a constant unit cost of c per unit of production. Thesecond technology (the unknown technology) has associated with it an initialunit cost of production of d > c, but the cost of production declines over time asthis technology is used owing to a "learning by doing" effect. Suppose that if theunknown technology has been used for t periods, the per-unit cost of productionfrom using this for a (t + I)-th period is given by w(t), where

wet) = do + dIe-I

where do < c. (Note that we must have do +di = d, since w(O) = d.) Assumethat the firm produces exactly one unit of output per period, and is interested inminimizing the sum of its costs of production over a T -period horizon.

(a) Express the optimization problem facing the firm as a dynamic programmingproblem, and write down the Bellman Equation.

(b) Does this problem have a solution for any T? Why or why not?(c) As T increases, show that it becomes more likely that the firm will use the

unknown technology in the following sense: if there exists an optimal strategyin the T -horizon model in which the firm uses the unknown technology inthe first period, there exists an optimal strategy in the (T + I )-horizon modelin which the same statement is true.

(d) Fix any T > J. Prove that if it is optimal for the firm to use the unknowntechnology in the first period, then it is optimal for the firm to stay with theunknown technology through the entire T -period horizon.

TMaximize nc(

1=1

7. Describe the following problem in a dynamic programming framework, andsolve it using backwards induction.

Chapter 11 Finite-HorizonDynamic Programming280

Page 289: A First Course in Optimization Theory

281

I. S is the state space, or the set of environments, with generic element s. Weassume that S c IRn for some n.

2. A is the action space, with typical element a. We assume that A C IRk for somek.

3. <1>: S ~ P(A) is thefea.l'ihll' action correspondence that specifics for each .v c s·the set <1>(.1') C A of actions thai arc available at v.

4. f: S x A - S is the transition [unction for the stale. that specifics for eachcurrent state-action pair (s, a) the next-period state fts, a) E S.

5. r: S x A ~ IRis the (one-period) reward function that specifies a reward r(s. a)when the action a is taken at the state s,

6. /j E [0. I) is the one-period discount factor.

The interpretation of this framework is similar to that of the finite-horizon model.the one important difference being the horizon of the model, which is assumed (0 be

12.1 Description of the FrameworkA (deterministic) stationary discounted dynamic programming problem (henceforth.SDP) is specified by a tuple (S. A. <1>. j. r. 8), where

The principal complication that arises in extending the results of the last chapterto dynamic programming problems with an infinite horizon IS that infinite-horizonmodels lack a "last" period; this makes it impossible to usc backwards inductiontechniques to derive an optimal strategy. In this chapter. we show that general condi­tions may. nonetheless. be described for the existence of an optimal strategy in suchproblems. although the process of actually deriving an optimal strategy is necessar­ily more complicated. A final section then studies the application of these results toobtaining and characterizing optimal strategies in the canonical model of dynamiceconomic theory: the one-sector model of economic growth.

Stationary Discounted Dynamic Programming

12

Page 290: A First Course in Optimization Theory

r,(a)(s) = r[s,(a. s), a,(a,s)].

Thus, each strategy a induces. from each initial state s, a period-r reward r,(a)(s).where

12.2 Histories, Strategies, and the Value Function

A t-history h, = {so, ao •...• S,_I, a'-I. sd for theSDPis a listofthe stateandactionin each periodup to t - 1. and the period-r state. Let Ho = S. and for t = I, 2•... ,let H, denote the set of all possible r-histories, Denote by s,[h,] the period-r state inthe historyh,.

A strategy a for theSDP is. as earlier.acontingencyplan. whichspecifiesanactionfor the decision-maker as a function of the history that has transpired. Formally.a strategy a is a sequence of functions (a,}~o such that for each I = O. 1•...•a.: H, -+ A satisfies the feasibility requirement that a, (h,) E ¢(s, [hr)). Let 1:denote the space of all strategies.

Fix a strategy a. From each given initial state S E S, the strategy a determinesa unique sequence of states and actions (s,(a. s), a,(a. s)} as follows: so(a.s) = s.and for t = O. 1.2, ...•

h,(a, s) = {so(a.s), ao(a, s), ... , s,(a, s)}a,(a,s) = ar(h,(a. s))

s,+l(a.s) = /(s,(a.s).at(<J.s).

Once again. the notion of a "strategy" plays an important role in our analysis. Weturn to a formal definition of this concept.

subject to S1+1 = /(s,.a,), 1=0.1,2 •...a, E <1> (s,), t = 0.1.2 ....

00

Maxi rnize L8' r (s,. a, ),=0

infinitehere.An initial state So E S is given. If the decision-maker takes an actionaoin the set of actions ¢(so) that are feasible at this state. two things happen. First. thedecision-maker receives an immediate reward of r(so. ao). Second. the state movesto its period-I value SI = /(so. ao). The situation now repeats itself from SI. Thedecision-maker is presumed todiscountfuture rewards by the factora E [0, 1);thus.if the action a, is taken in period-r at the state s" the reward is deemed to be worthonly {)'r(s" a,) today. As earlier. the aim of the decision-maker is to maximize thesum of the rewards over the modeJ's (now infinite) horizon. i.e.• to solve from anygiven initial state so E S :

Chapter 12 Stationary Dynamic Programming282

Page 291: A First Course in Optimization Theory

IWe are being sloppy here in the interests of brevity. What we mean formally is that there is a strategy a inthis SDP such that from the initial suite s, the strategy a results in the series of actions {a,(a, sl} = {(4/3)' [.

Finally. let li = 3/4.Suppose the initial state is So = 3. It is easily checked that the sequence of actions

{a,} given by a, = (4/3)' is feasible from so. 1 For each 1,

li'r(s"ar} = GY GY = I,

so the total reward associated with this sequence is 1+ 1+ 1+ ... = +00. On theother hand. the sequence of actions {a:ldefinedbya; = (3/2)(4/3)',1= 0,1, 2, ...is also feasible from So = 3. and also yields infinite total reward. However, for each

a, (s, a) E S x A.res, a)

and

/(s,a)=3(s-a), (s,a)ESxA,

Example 12.1 Let S = A = IR+, and let <1>(s) = [0, s 1 for s E S. Define th etransition and reward functions by

Problem 1 Twoor more strategiesmayyield infinitetotal rewards from some initialstates, yet may not appear equivalent from an intuitive standpoint. Consider t~,efollowing example:

12.3 The Bellman EquationSimpleexamples showthat severalproblemsarise in thesearch foran optimal strategyunless we impose some restrictions on the structure of the SOP.

In words, an optimal strategy is one whose specificationsyield a reward that cannotbe beaten from any initial state.

W(o*)(s) = V(s), s E S.

A strategy 0* is said to be an optimal strategy for the SDP if:

V(s) == sup W(o)(s}.I7E:[

The value function V: S -l> IRof the SDP is defined as

00

W(o)(s) = Lt5lr,(o)(s).1=0

2H .'J 2.3 The Bellman Equation

Let W (a )(s) denote the tot~ldisc'ountedreward from s under the strategy 0:-

Page 292: A First Course in Optimization Theory

Overcoming the second problem is, of course, the central issue: when does theSDP have a solution? The first step in examining the existence of a solution is theBellmanEquation, or the Principle of Optimality. Intuitively, the Bellman Equationis a statement of "dynamic consistency" of the optimal path-viz., that once on thepath there is never a tendency to move away from it.

Proof Fix any s. Since r is bounded, it is the case that IW (O')(s)1 ::::K/O - 8) forany O'.Thus, V(s) = sup; W(O')(s) must be finite, so V: S -+ JR is well defined.Pick any sand e > O. If there were no strategy such that W (O')(s) 2: V (.I') - E. thenwe must have W(O')(s) ::::V(s) - E, and so V(s) :::::sUPa W(O')(s) ::::V(s) - E,

which is absurd. 0

Lemma 12.3 Under Assumption 1, the value function V is well defined as afunctionfrom S to JR, even if an optimal strategy does not exist. Moreover, given any s E S,and any Ii > 0, it is the case that there exists a strategy a, depending possibly on E

and s. such that W(O')(s) ~ V(s) - E.

Assumption 1 is, by itself, sufficient to yield two useful conclusions.

Assumption 1 There is a real number K such that Ir(s, a)1 < K for all (s, a) ESx A.

The first problem can be overcome if we assume S and A to be compact sets andr to be continuous on S x A. In this case, r is bounded on S x A, so for any feasibleaction sequence, total reward is finite since 8 < 1. More generally, we may directlyassume, as we shall, that r is bounded on S x A:

Example 12.2 Let S = {O},A = {I, 2,3, ... }, <1>(0)= A, [(0, a) = 0, andr(O, a) = (a -1)/a. Let 8be any number in (0, 1). It is easily seen that the supremumof the objective function over the space of feasible sequences of actions is (1 - 8)-I,but there is no feasible sequence {at I, such that L~o8t[(al - 1)/arJ = (I - ~)-I.

o

Problem 2 No optimal strategy may exist even ifL~o 81r(st, at) converges for eachfeasible sequence {ac} from each initial state s. Consider the following example:

a; = (3/2)ac, and one gets the feeling that {an is more "reasonable" as a possiblesolution than (at I. 0

Chapter 12 Stationary Dynamic Programming284

Page 293: A First Course in Optimization Theory

V(s) - e ~ W(a)(s)

= r(s,a) +/iW(alJ)[!(s, a)]

~ res, a) + 8V(j(s, a».

We now have

Let a =O'o(s), and 51 = res, a). Finally, let the continuation strategy specified by aafter the first period be denoted by a II. Since the strategy (J II is certainly a feasiblecontinuation SI, we must have

W(a)(s) ;:: V(s) - E.

This completes step 1 of the proof.Now fix an s E S. Pick any f > O.Let a be a strategy that satisfies

V(s);:: sup Ir(s, a) + 8V(J(s, a»}.aE<I>(s)

or, since IE > 0 was also arbitrary

V(s);:: sup (r(s,a)+/iV(f(s,a»-E)aE<I>(s)

But a was picked arbitrarily; therefore, this holds for all a E 4> (s). So

V(s) ~ r(s,a) +/iW(a)(J(s, a»

;:: res, a) + /iV(j(s, a» - f.

We will show that V = W, by first showing V(s) ~ W(s) for all s, and thenV(s) ~ W(s) for all s. Fix any s E S.

Pick any IE > O.Let a be any action in 4>(s) and let 05' = f ts, (1). Pick a strategy (J

that satisfies W (a )(s') ~ V (s') - E(/i.As we have observed, such a strategy alwaysex.ists by definition of V. Now the action a followed by the strategy a is feasiblefrom s, since a was chosen from 4>(05), and by definition all strategies specify onlyfeasible actions. So

Proof Let W: S -+ R be defined by

W(s) = sup (res, a) + /iV(J(s, a»}.aE<l>(s)

V(s) = Slip Ir(.I',a)+/iVIf(s,a)ll.aE<l>(s)

Theorem 12.4 (The BellmanEquation) The value function V satisfies thef;-l1()~:'­ing equation (the "Bellman Equation") at each S E S.-

2HSJ2.3 TheHellmanEquation

Page 294: A First Course in Optimization Theory

2VeClor spaces are defined in Appendix C.

Ii(y) - h(y)1 :'5 l1(y) - g(y) 1 + Ig(y) - h(y)1:'5 d(f. g) + d(g. h).

j,g E C(S).d(j, g) = sup l1(y) - g(y)l,ye[O.IJ

Since the absolute value I[(y) - g(y)1 is nonnegative. and is zero if and only iffey) = g(y), we have d(j, g) ~ 0, with equality if and only if j(y) = g(y) for ally, i.e., if and only if [ = g. Thus, the sup-norm metric meets the positivity condition.It is evidently symmetric. Finally, pick any j, g, h E C(S). For any yES. we have

A description of metric spaces and their properties may be found in Appendix C.We summarize here some of the properties that are important for this chapter.

The canonical example of a metric space is k-dimensional Euclidean space, JRk,

with the Euclidean metric. As we have seen in Chapter 1, the Euclidean metric meetsthe three conditions required in the definition of a metric.

A second example. and one of some importance for this chapter, is the spaceC (S) of all continuous and bounded functions from S C ]R" to R endowed with the"sup-norm metric" d, where d is defined by

1. (Positivity) d(x, y) ~ 0, with equality if and only if x = y;2. (Symmetry) d(x, y) = dey, x); and3. (Triangle Inequality) d(x, z) :'5 d(x, y) + dey, z).

12.4 A Technical Digression

12.4.1 Complete Metric Spaces and Cauchy SequencesA metric space is a pair (X, d) where X is a vector space.I and d: X x X ~ IR is afunction satisfying the following conditions for all x, y, Z E X:

We shall use the Bellman Equation to identify a set of conditions under which theSDP has a solution. To do this, a technical digression is required.

owhich completes the second step of the proof.

V(s) ::: sup{r(s, a) +<5V(j(s, a»),a

Since E is arbitrary, we have

V (s) - € < sup (res, a) + <5V(f(s, a»).ae<l>(s)

So

Chapter 12 Stationary Dynamic Programming286

Page 295: A First Course in Optimization Theory

A contraction mapping T on a metric space (X, d) must necessarily be continuouson that space. That is, it must be the case that for any sequence Yn ~ Y in X, we

oso T is a contraction with p = 1/2.

Example 12.5 Let X = [0, I], d be the Euclidean metric restricted to [0, I). LetT: X-+- Xbe defined by Tx = xj2. Then, d(Tx, Ty) = d(xj2, y/2) = d(x, y)/2.

d(Tx,Ty) s pd(x,y), x,yEX.

12.4.2 Contraction MappingsLet (X, d) be a metric space and T: X -+- X. For notational convenience in whatfollows, we shall write Tx rather than T(x) to denote the value of T at a point x E X.The map T is said to be a contraction if there is p E [0, I) such that

and the sup-norm metric d meets the triangle inequality condition also, Thus, d is <I

vueiric on C(S).'~nedefinitions of a convergent sequence and a Cauchy sequence are generalized

from IRn to abstract metric spaces in the obvious way. Given a sequence (xn I in ametric space (X, d), we say that {xnl converges to x E X (written Xn ~ x) if it isthe case that the sequence of real numbers d(xn, x) converges to zero, i.e., if for anyf > 0, there is n(f) such that for all n ~ n(f), it is the case that d(xn, x) < f.

A sequence {xn} in (X, d) is said to be a Cauchy sequence if for all f > 0, thereis n(f) such that for all n, m :::n(f), it is the case that d(xn, Xm) < f.

A Cauchy sequence {xn} in (X, d) is said to converge if there is x E X suchd(xn, x) -+- ° as n -+- 00. (That the limit x be an element of X is an important partof the definition.)

As in IRn with the Euclidean metric, every convergent sequence in an abstract met­ric space is also a Cauchy sequence. Unlike IRn, however, the converse is not alway.true. There exist metric spaces in which not all Cauchy sequences are convergentsequences. A simple example is the following: let X be the open interval (0, I), andlet d be the Euclidean metric on R restricted to (0,1). Then, the sequence (xn I in Xdefined by Xn = ~ for all n is a Cauchy sequence in X that does not converge: thereis no x E X such that ~ -+- x.

When a metric space has the property that all Cauchy sequences in the spaceconverge, it is said to be a complete metric space.

d(f,h) = suplf(y)-h(y)1 s d(f,g)+d(g,h),

Since yES was arbitrary, ' _," ,

28712.4 A Technical Digression

Page 296: A First Course in Optimization Theory

oThe theorem is proved.

Since x is fixed. d(Tx.x) is just a fixed number. Since p < I, it follows that bytaking /I sufficiently large. we can make d(Tmx. rnx) as small as desired for anym > n.Therefore, (Tn X J is a Cauchy sequence in X. and converges to a limit, say x ",

We will show that x" is a fixed point of T. Indeed. this is immediate: by thecontinuity of T, we have limn T(y,,) = T(limn y,,) for any convergent sequence y".so

Proof Uniqueness of the fixed point is easy to prove. Suppose x and y were bothfixed points of T. and x ¥- y. Then, we must have d(x,y) > O. Since x and yareboth fixed points. we have Tx = x and Ty = y, so dt'I'x, Ty) = dix , y). On theother hand. since T is a contraction. we must also have d(Tx. Ty) :5 pdtx. y). andsince we must have p < 1, this means d(Tx, Ty) < dtx , y). a cdntradiction. Thus,T can have at most one fixed point.

It remains to be shown that T has at least one fixed point. Pick any x E X. and letT2x denote T evaluated at Tx. By induction, define now Tnx to be T evaluated atT" -Ix. Using the fact that T is a contraction. we will show that the sequence' T" X Jis a Cauchy sequence in X.

To this end, note that d(Tn+1x, Tnx) :5 pdt'I"«, Tn-Ix). It follows thatd( m+1 x, T" x) s pnd(Tx. x). Therefore, if m > n.we have

d(rn'x. Tnx) :5 d(Tmx, Tm-1x) + ... +d(Tn+1x, Tnx)

:5 (pm-l + + pn)d(Tx, x)

= (1+p + + pm-n-l)pnd(Tx. x)

:::[I - prl pnd(Tx. x).

Remark Observe that the theorem asserts two results under the stated conditions:(a) that a fixed point exists. and (b) that the fixed point is unique. 0

Theorem 12.6 (The Contraction Mapping Theorem) Let (X. d) be a completemetric space. and T: X ~ X be a contraction. Then. T has a unique fixed point.

The importance of contraction mappings for us arises from the following powerfulresult. which is also known as the Banach Fixed Point Theorem:

d(TYn. Ty) :5 pd(Yn. y) ~ 0 as n ~ 00.

have TYn ~ Ty. This follows from applying the definition of a contraction: we have

Chapter 12 Stationary Dynamic Programming288

Page 297: A First Course in Optimization Theory

Theorem 12.9 (Uniform Convergence Theorem) Let S C IRm. LeI lin} be (I

sequence of functions from S to lR such that In --t I uniformly. If the functions f.are all bounded and continuous, then I is also bounded and continuous.

We will use this observation shortly to prove an important property about C(.')·).the space of continuous functions on S, endowed with the sup-norm. The followingpreliminary result, which is also of considerable independent interest, is needed first,

sup I/n(x) - l(x)1 ~ 0:1.'>n ~ 00.XES

In words, In converges uniformly to I if the distance between fn (x) and l(x) callbe made arbitrarily small simultaneously for all x, simply by taking n sufficientlylarge.It is immediate from the definitions of uniform convergence and of the sup-norm

that {in Iconverges to I uniformly if and only if {In} converges to I in the sup-normmetric, i.e., if ami only if

I/n(x) - l(x)1 < € for all XES.

12.4.3 Uniform ConvergenceLet S C IRk and UnI be a sequence of functions from S to JR. We say that thesequence (In Iconverges uniformly to a limit function I if for all e > 0, there is allinteger N(€), such that for all n :::Nte), we have

Example 12.8 (The completeness of X cannot be weakencd.) Let X = (0, I) withthe Euclidean metric. Let T (x) = x /2. Then T is a contraction. but X is not completeand T has no fixed point. IJ

Indeed, an even more subtle result is true. It is not even sufficient that for each pair(x, y), there exists p(x, y} < Isuch that d(Tx, Ty) < p(x, y)d(x, y). Equivalently.it is not sufficient that d(T x, Ty) < dt x , y) for all x , y. For an example, see theExercises.

Example 12.7 (The definition of the contraction cannot be weakened to allow torp = I.) Consider IR+ with the Euclidean metric. This is a complete metric space.Let T: IR+ --t IR+ be defined by rex) = I + x. Then d(Tx, Ty} :s d t x , y), but Thas no fixed point. 0

It may be seen through examples that the conditions of the theorem cannol beweakened:

2H')12.4 A Technical Digression

Page 298: A First Course in Optimization Theory

This says precisely that {In} converges uniformly to S.By the Uniform Convergence Theorem, I is a continuous function. Since uniform

convergence is the same as sup-norm convergence, we have shown that an arbitraryCauchy sequence in C(S) converges in the sup-norm to a limit in C(S). Thus, C(S)is complete when endowed with the sup-norm. 0

lin (X) - f(x)1 < E, XES.

Holding n fixed and letting m -+ 00, we obtain

Iln(x) - 1m (x)] < E, XES.

Proof Let {In} be a Cauchy sequence in S. Pick any XES. Then, (/n(x)} is a Cauchysequence in R. Since ]R. is complete, this Cauchy sequence converges to a limit. Callthis limit f(x). This defines a function f: S -+ lRsuch that fn(x) -+ I(x) for eachXES. We will first show that {In} converges uniformly to I. This will establish, bythe Uniform Convergence Theorem, that I is continuous on S.

Since (In) is a Cauchy sequence in C(S), it is the case that given any e > 0, thereis nie) such that m , n ::: neE) implies

Theorem 12.10 The space C(S) of continuous, bounded real-valuedfunctions onthe set Sc ]R.k is a complete metric space when endowedwith the sup-normmetric.

Our next result plays a significant role in establishing existence of an optimalstrategy in stationary dynamic programming problems.

oThe theorem is proved.

= E.

If(x) - It» 1 :s I/(x) - IN(X)I + IIN(X) - IN(y)1 + IIN(Y) - f(y)1

< €/3+E/3+E/3

Pick N sufficiently large so that for all n ~ N, I/(z) - In (z)] < E/3 for all z E S.Since fN is a continuous function, there is /) > 0 such that IIx - yll < /) impliesI[N(X) - [N(y)! < E/3. Therefore, whenever IIx - yll < 8, we have

I[(x) - f(y)1 :s If(x) - [k(x)1 + 1.Ik(x) - Ik(y)1 + 1.Ik(y) - f(y)l·

Proof Boundedness of f is obvious: since j,,(X) ~ f(x) for each x, if f isunbounded, then the functions In must also be. To show continuity of f at an arbitrarypoint XES we need to show that for all E > 0, there is 8 > 0 such that YES andIlx - yll < 8 implies If(x) - f(y)1 < E.

SOlet E > 0 be given. For any k, the triangle inequality implies

Chapter 12 Stationary Dynamic Programming290

Page 299: A First Course in Optimization Theory

W(a)(s) = sup {res, a) + 8W(a)(f(s, a»}.OE<1>(s)

In subsection 12.5.2, we identify a particularly useful class of strategies calledstationary strategies. Finally, in subsection 12.5.3, we show that under suitable

In subsection 12.5.1, we establish that a strategy a in the SDI' is an optimal strategyif~l;;' only if the total payoff Weal under {J also meets the Bellman Equation at eachs E:: S:

V(S) = sup !r(s,a)+oV(/(s,a»}.aE<l>{J)

12.5 Existence of an Optimal StrategyWe are now in a position to outline a set of assumptions on the SDP under whichthe existence of an optimal strategy can be guaranteed. We proceed in three stages.Recall that under Assumption I, we have shown that the value function V: S -4 IRsatisfies the Bellman Equation at all S E S:

Then, for each fixed x > 0, In (x) -+ 1 as n -+ 00, while In (0) = 0 for all n, so Inconverges pointwise to I defined by 1(0) = 0, I(x) = I for x > O. However, Indoes not converge uniformly to I (if it did, I would be continuous by the uniformconvergence theorem). [J

x:: linx > lin.{

nx,In(x) =

I,

Example 12.1l Let {In Ibe a sequence of functions from lR+ to lR+, defined by

We close this section with an' important remark. It is necessary to distingulslluniform convergence from another notion of convergence in functional spaces-thaI of pointwise convergence. We say thalli sequence of functions IIn ICOllvergespointwise 10a limit function I if for each ti xed x the following condition holds: for allE > 0, there exists an integer N(E), such that for all n ::: N(E), If" (x) - l(x)1 s (

Clearly a function converges uniformly only if it converges pointwise. TIle con­verse, however, is not true. The key distinction is that in pointwise convergence wefixx first and see if the sequence of real numbers (In (x) Iconverges 10 I(x). On theother hand, uniform convergence requires not only that Inex) -+ I(x) for each .r.but also (roughly speaking) that the "rate of convergence" of f~ex) to l(x) be thesame for all x. That is, for uniform convergence to hold, the same choice of N (r )must make Ifn(x) - l(x)1 < E for all n ::: N(E),for all x, whereas in the definitionof pointwise convergence, the choice of N(E) may depend on x. To illustrate thisvital distinction, consider the following example.

2()j12.5 Existence oj Of! Optimal Stratcg»

Page 300: A First Course in Optimization Theory

Proof That V is a fixed point of T is immediate from the definition of T andTheorem 12.4. We will show it is the only fixed point of T, by showing that T is acontraction on B(S). Since B(S) has been shown to be a complete metric space, the

at each s E S, then we must have w == V.

w(s) = sup {r(s, a) + 8w(f(s, a»}ae4>(s)

Theorem 12.13 V is the unique fixed point of the operator T. That is, If W E B( S)is anyfunction satisfying

Tw(s) = sup {r(s, a) + ow(f(s, a))}.aE¢>(S)

Since rand ware bounded by assumption, it follows that so is Tw, and thereforethat Tw E B(S). So T maps B(S) into itself. We first establish an important resultconcerning the fixed points of T.

Now, define a map T on B(S) as follows: for W E B(S), let Tw be that functionwhose value at any S E S is specified by

Proof Let {vn} be a Cauchy sequence in B(S). Then, for each YES, {vn(y)} is aCauchy sequence in R Ifwe let v(y) denote the limit of this sequence, we obtain afunction v: S -t R Since {vn} is bounded for each n, v is bounded. it is the case that.U E B(S). That {un} converges uniformly to v is established along the lines of theproof of Theorem 12.10 on the completeness of C(S) under the sup-norm metric.

D

Lemma 12.12 B(S) is a complete metric space when endowed with the sup-normmetric.

d(v, w) = sup Iv(y) - w(y)l.yeS

12.5.1 A Preliminary ResultLet B(S) denote the set of all bounded functions from S to lIt Note that underAssumption I,we have V E B(S). Endow B(S) with the sup-norm metric, i.e., forv, W E B(S), let the distance d(v, w) be given by

assumptions, there exists a stationary strategy rr such that W (rr) meets the Bell­man Equation, which completes the proof.

Chapter 12 Stationary Dynamic Programming292

Page 301: A First Course in Optimization Theory

To complete the proof of Theorem 12.13, we apply the Contraction MappingLemma to the mapping T. Note that T trivially satisfies monotonicity (a larger

and so SUPSES ILw(s)- Lv(s)1 s f311w- vII· Equivalently, IILw - LvII s .Bllw - vII·Since f3 E [0,1), we are done. 0

ILw(s) - Lv(s)1 s f3l1w - vII,

or [Lu(s) - Lw(s)] s f3l1w - vII. Combining this with the previous inequality, weobtain

Lv(s) s L(w + IIw - vll)(s) s Lw(s) + f3l1w - vII,

so that [Lw(s) - Lv(s)] s .Bllw - vII. Now, v(s) - w(s) s Ilw - vII, so goingthrough the same procedure yields

Lw(s) S L(v + IIw - vll)(s) s Lv(s) + f311w- vII,

So w(s) S v(s) + IIw - vII. Applying monotonicity and discounting in order, wenow have

W(s) - v(s) s sup luns ) - v(s)1 = IIw - vII·sES

Proof of the Contraction Mapping Lemma Let v, w E B(S). Clearly for anys E S, we have

Remark Like most shorthand notation, the notation in the lemma is sloppy. First,w ::::v means w(s) ::: v(s) for all s E S. Secondly, for W E B(S) and c E R w + cis the function which at any s assumes the value w(s) +c. Under this interpretation.it is clearly the case that whenever w E 8(S), we also have (w + c) E 8(S), soL(w + c) is well defined.

Then. L is a contraction.

L w + flc for all

Lemma 12.14 (Contraction Mapping Lemma) Let L: B(S) ~ B(S) be any mapthat satisfies

I. (Monotonicity) w :::: v => Lw ::: Lv.2. (Discounting) There is f3 E [0, I) such that L (w + c)

w E B(S) and c E lR.

Contraction Mapping Theorem w_iIIt,hcnimply that T has a unique fixed point on8(S).The proof that T is a contraction is simplified by the following lemma:

29.112.5 Existence of lin Optimal Strategy

Page 302: A First Course in Optimization Theory

12.5.2 Stationary StrategiesThe aim of this section is to single out a particularly useful class of strategies, theclass of stationary strategies. The definition of stationary strategies is motivatedby the observation that any t-history h, = (S(, a(, ... , St) in a stationary dynamicprogramming problem {S, A, r, I, <1>, 8} from the initial state s = si, simply resultsin the same stationary dynamic programming problem {S, A, r, f, <1>, o}, but withinitial state s = St. Intuitively, this appears to imply that there is no extra gain to be

By Theorem 12.15, the SDP will be solved if we can demonstrate the existence ofa strategy 0 such that W (0) satisfies the Bellman Equation at each s E S. We do thisin subsection 12.5.3 after first identifying an especially simple and attractive class ofstrategies.

oat all s E S, as required.

W(a)(s)= sup {r(s, a) + 8W(0)(f(s, a»),aE$(s)

Proof First, suppose that W (0) satisfies the given equation at each S E S. Since ris bounded by Assumption 1, W(o) is evidently bounded. But th'en we must haveW (0) = V, since (by Theorem 12.13) V is the unique bounded function that satisfiesthis equation at all s. And, of course, W (0) = V says precisely that 0 is an optimalstrategy.

Now suppose 0 is an optimal strategy. By definition of V, we must have W (0) =V, so we certainly have

W(o)(s) = sup (r(s,a) +8W(0)(f(s, a))}.aE$(s)

Theorem 12.15 Under Assumption 1,a strategy 0 in the SDP is an optimal strategyif, and only if, W (0) satisfies the following equation at each S E S:

We can now establish the main result of this subsection.

Since <5 E [0, 1) by hypothesis, T satisfies the discounting property also. Therefore,T is a contraction on B(S), and the theorem is proved. 0

T(w + c)(s) = sup{r(s, a) + o(w + c)(f(s, a»}= sup{r(s, a) + ow(f(s, a» + oc}

= Tw+oc.

function can only give a higher supremum). Furthermore,

Chapter 12 Stationary Dynamic Programming294

Page 303: A First Course in Optimization Theory

w·(s) = max {r(s,a)+ow*(f(s,a»)}.aE<t>(s)

Step 1 We first show that there is a unique continuous function ui":S ---)0 IR suchthat w· meets the Bellman Principle of Optimality at all S E S:

Under Assumptions 1-4,we will show the existence of a stationary strategy JT'

such that W (rr *) meets the Bellman Principle of Optimality at all s E S. Our dcriva­tion of this strategy is carried out in a series of three steps. The content of these stepsis as follows:

Assumption 4 <P is a continuous, compact-valued correspondence on S.

Assumption 3 f is continuous on S x A.

Assumption 2 r is continuous on S x A.

We now add the following assumptions:

Assumption 1 r is bounded on S x A.

!l.S.3 Existence of an Optimal StrategyWe have already assumed that:

made by conditioning the strategy on anything more than the current state, and noteven the date (i.e .• the time period) on which this state was reached. A strategy whichdepends solely on the current state in this fashion is called a stationary Markovianstrategy, or simply a stationary strategy.

For a more formal definition, aMarkovian strategy (J for the SDP is defined to bea strategy where for each t, o, depends on he only through t and the period-r stateunder hI. s, [h,]. Effectively. a Markovian strategy may be thought of as a sequence{iTe) where for each t , n, is a mapping from S to A satisfying ITt (s) E <P (s) for eachs. The interpretation is that it, (s) is the action to be taken in period t , if the state atthe beginning of period t is s.

A stationary strategy is a Markovian strategy {iTe} which satisfies the further con­dition that n, = iTr (= T(. say) for all t and r, Thus. in a stationary strategy, theaction taken in any period t depends only on the state at the beginning of that period.and not even on the value of t. It is usual to denote such a strategy by T( (00), but, fornotational simplicity. we shall denote such a strategy simply by the function tt .

Finally, a stationary optimal strategy is a stationary strategy that is also an optimalstrategy.

29512.5 Existence of an Optimal Strategy

Page 304: A First Course in Optimization Theory

Lemma 12.17 T: C(S) -4 C(S) is a contraction.

For w, V E C(S), it is immediate from the definition of T that w 2: v impliesTw 2: Tv, since a larger function can only give a larger maximum. Moreover,for any w E C(S) and C E R,we clearly have (w + c) E C(S), and, moreover,T(w +c) == Tw + lic. By mimicking the proof of the Contraction Mapping Lemma,it is easy to show that

Lemma 12.16 T maps C(S) into C(S).

Tw(s) = max (res, a) + (Jw(f(s, a»}.ae<l>(s)

Tw is evidently bounded on S since wand r are bounded functions. Moreover,since w is continuous and f is continuous, w 0 f is continuous as the compositionof continuous functions. By assumption, r is continuous on S x A, therefore theexpression in parentheses on the RHS is continuous on S x A. For fixed s, ct>(s) iscompact by hypothesis, so the maximum is well defined. The Maximum Theoremnow implies that Tw is also continuous on S. Therefore:

Step 1Let C(S) be the space of all real-valued continuous functions on S. Endow C(S) withthe sup-norm metric. We have already seen that C(S) is then a complete metric space.Now, define the function Ton C(S) as in subsection 12.5.1. That is, for wE C(S),let Tw be the function whose value at any S E S is given by:

W(rr*)(s) = w*(s).

By Step I, therefore, W(rr") is a fixed point of the mapping T. Thus, by Theo­rem 12.15, n" is a stationary optimal strategy.

Step 3 Finally, we shall show that the. total discounted reward W(rr*)(s) under thestationary strategy rr" defined in Step 2, from the initial state s, satisfies

for all S E S.

Step 2 Define C ..: S -4 peA) by

C"(s) = arg max {w*(s) = res, a) + liw*(f(s, a»}.Q€<I>(s)

We show that C* is well defined and admits a selection tt"; i.e., there is a functionrr ": S -+ A satisfying rr*(s) E C"(s) for all s E S. The function n" defines astationary strategy which satisfies, by definition,

w*(s) = res, rr*(s» + 8w"[f(s, ;rr*(s»],

Chapter 12 Stationary Dynamic Programming296

Page 305: A First Course in Optimization Theory

Iterating, we obtain for any integer T:

T-Iw·(s) = L 8(r,(Jl'°)(s) +8T w*[sr(]f*,s)].

(=0

On the other hand, by Step 2, we also have of LTC'

w*(s) = r(s, rr*(.\'» + '\w*[/(s, ]f*(s»]

= res, rr*(s» + 8w*(.\·I)

= res, rr*(s» + Sr(s!, at) + 82w*[/(S2)]= rO(]f*)(s) +- SrI (;rr*)(s) + 82w* [·\·2(rr"', s)].

00

W(Jl'*)(s) = L 81r(rr*)(s).,=0

Step 3

Define 11'° as in Step 2, and pick any initial state s E S. Recall that {51 (rr *, .1'), a, (11' *, s)]

denotes the sequence of states and actions that result from sunder rr ": and thatr(Jl'*)(s) = r[sr(Jl'*,s),al(rr*,s)] is the period-t reward under 11'* from .1'. Fornotational ease, let s( = s,(Jl' ° , s) and a, = a.tn", s). By definition of W(rr*), wehave

This completes Step 2.

WOes) = res, 11'* (s) + 8w*{f(s, Jl'°(s»]

2: res, a) + 8w*[f(s, a)], a E 1>(5).

By the Maximum Theorem, G* is a (nonernpty-valued) usc correspondence. Thus,there is a function 11'*: S -+ A such that for each s E S, ;rr*(s) E G*(s) C 1>(.1').The function n" defines a stationary optimal strategy, that by definition satisfies atall s E S,

Define GO: S -+ Pt A) by

GO(s) = arg max (res, a) +- ow * (/(s, a» = w·(s»).IlE<t>(s)

Step 2

WOes)= max {res, a) + 8w°(/(s, a»}.aEcl>(s)

Lemma 12.18 T has a uniquefixed point ui" E C(S). That is, there is a uniqueWOE C(S) that satisfies the/allowing equation at each S E S:

The following result, which obtains by combining these results, completes Step I~-

29712.5 Existence of an Optimal Strategy

Page 306: A First Course in Optimization Theory

12.6 An Example: The Optimal Growth Model

The one-sector model of optimal growth is a very popular framework in neoclas­sical economics. It studies the problem of a single agent (a "social planner") whomaximizes discounted utility from consumption over an infinite horizon subject totechnological constraints. It offers an excellent illustration of how convexity condi­tions can be combined with the continuity conditions on the primitives to provide avery sharp characterization of the solution.

W(rr*)(s) = max {res, a) + oW(rr*)(f(s, a»}aE¢>(S)

= res, rr"(s» + 8W(rr*Hf(s, rr * (s»].

Then, there exists a stationary optimal policy tt", Furthermore, the value functionV = W(rr*) is continuous on S, and is the unique bounded function that satisfiesthe Bellman Equation at each S E S:

1. r: S x A ~ IR is continuous and bounded on S x A.

2. f: S x A ~ S is continuous on S x A.3. <1>: S ~ peA) is a compact-valued, continuous correspondence.

Theorem 12.19 Suppose the SDP IS,A, <1>, f, r,oj satisfies the following condi­tions:

By Theorem 12.15, this equation establishes that rr* is an optimal strategy. Sincen" is also stationary, we have shown that a stationary optimal strategy exists underAssumptions 1-4. We summarize in the following theorem.

W(rr*)(s) = max {res, a) + cSW(JT*)(/(s, a»}.aE¢>(s)

it follows that

WOes) = max {r(s,a) +8w*(f(s, a»}.aE¢>(S)

so w* = W(rr"). Since Step 1 established that

00

1II"(s) = L r,(rr'")(s),1=0

Since ui" is bounded and li < 1, letting T ~ 00 yields

Chapter J 2 Stationary Dynamic Programming298

Page 307: A First Course in Optimization Theory

I. Under what conditions on u and I do optimal strategies exist in this model?2. When is the optimal strategy unique?3. What can one say about the dynamic implications of the optimal strategy? For

instance, letting {y" cr} denote the evolution of state-action levels under theoptimal strategy from an arbitrary initial state (i.e .• stock level) Yo:

(a) Will the sequences {y,} and {c/} be monotone. or will one (or both) exhibitcyclical tendencies?

The optimal growth model may be cast in a dynamic programming frameworkusing the following definitions. Let S* = IR+ be the state space. and A* = IR+ bethe action space. Let <l>(y) = [0, y] be the feasible action correspondence takingstates yES· into the set of feasible actions [0, y] C A'" at y. Let r(y. c) = II(C)be the reward from taking the action C E <l> (y) at the state y E S*. Finally, letF(y, c) = I(y - c) be the transition function taking current state-action pairs (y. c)into future states F(y, c). The tuple {S*. A". <P. r, F. oj now defines a stationarydiscounted dynamic programming problem, which represents the optimal growthmodel.

We are interested in several questions concerning this model. These include:

subject to YO = yY(+1 = I(x(). t =0, 1.2 .

c, +x( = y,. t =0.1.2 .c..x.> O. t=0,1.2 ..

00

Maximize L[/u(c()(=0

The basic model of optimal growth is very simple. There is a single good (themetaphorical com of ncoclnssical theory) which may he consumed or invested. Theconversion of investment to output takes one period and is achieved through a pm­duction function I: R+ -+ lR+. Thus. if XI denotes period-r investment. the outputavailable in period-It + I). denoted Y(+I. is given by I(x(). The agent begins withan initial endowment of Y = YO E IR++. In each period t = 0, 1.2 ..... the agentobserves the available stock Y( and decides on the division of this stock betweenperiod-r consumption CI and period-r investment x(. Consumption of c, in period I

gives the consumer instantaneous utility of u(c() where u: IR+ -+ IR is a utility func­tion. The agent discounts future utility by the discount factor 8 E [0. I). and wishesto maximize total discounted utility from lifetime consumption. Thus. the problemis to solve:

'12:0.1 The Model

12.6 The Optimal Growth Model

Page 308: A First Course in Optimization Theory

3The order in which results are listed here is not the order in which they are proved. Some of the resultslisted later here-such as the Ramsey-Euler Equation-are used to prove some of those listed earlier(such as the monotonicity of the sequence (c,1J.

4It must be emphasized that this result does not depend on f having any convexity properties.SThis is true only of the limiting values y. and c·. The path by which (y,1 and (c,1 converge to y. and c"does depend on u.

J2.6.2 Existence oj Optimal StrategiesWecannot directly appeal to Theorem 12.19to establish existence of optimal strate­gies in this problem, because u may be unbounded on 1R+.Rather than impose

(b) If the sequencesaremonotone,whatarethe propertiesof their limiting valuesy* and c"! In particular,are these limitingvalues independent of the initialstate YO (that is, are the "long-run" implications of growth independent ofwhere one starts)?

(c) Is it possible to use analogues of first-order conditions to characterize be­havior on the optimal path?

In subsection 12.6.2 below, we tackle the question of existence of an optimalstrategy.We show that under minimal continuity conditions, and a boundednessconditionon f thatenables compactificationof the state space, it is possible to showthat stationaryoptimal strategies exist.The characterizationof optimal strategies is taken up in subsection 12.6.3under

added strict convexity assumptions on / and u. We show that these assumptionscarry several strong implications, including the followingr'

1. If u is strictly concave, the optimal level of savings increases with stock levels.Therefore. the sequence of states {Yt} that results from any initial state under theoptimal plan is monotone, and convergesto a limiting value y:.4

2. If / and u are both strictly concave:(a) The optimal consumption sequence {et} from any initial state is also mono­

tone, and converges to a limit c".(b) The limiting values y* and c" are independentof the initial state; curiously,

they are even independent of the properties of the utility function u, anddepend only on the properties of the technology f and the discount factor8.5

(c) The right-hand side of the Bellman Equation is strictly concave in yandc. so a unique maximizing action exists for each y. As a consequence, theoptimal strategy is unique.

(d) If f and u are also continuously differentiable, then the optimal path mayalso be characterized using a dynamic analogue of first-order conditionsknown as the Ramsey-Euler Equations.

300 Chapter 12 Stationary Dynamic Programming

Page 309: A First Course in Optimization Theory

<1>(y') ::: [0, y'] :J [0, y] = <1>(y).

12.6.3 Characterization of Optimal StrategiesWhat more can we say about this problem without making additional assumptions?Consider two different initial states y and y' with y < y'. Let c = g(y) be an optimalaction at y. Then. c is a feasible (but not necessarily optimal) action at y', since

V(y) = max (1I(c)+8V[j(y-c)])CE[O.y]

= lI(g(.I'»+8VI/(y-g()'»].

Theorem 12.20 There is a stationary optimal strategy g: S ~ A ill the optima'growth problem under Assumptions I and 2. The value function V is continuous (If,

Sand satisfies the Bellman Equation at each yES:

The tuple IS, A, <1>, r. F, 8) now meets the requisite compactness and continuit­conditions to guarantee existence of an optimal strategy, and we have:

.<\~ -umptlon 2 II: IR.+ ~ lR is continuous on IR+.

Secondly, we make the usual continuity assumption on the reward function:

S = A = [0, v"].

Parts I and 2 of this assumption are self-explanatory. Pan 3 can be justified as aversion of diminishing marginal returns.

We assume that the initial state YO lies in some compact interval [0, y] of IR+. De­fine y* = max {x , YI. Then. by Assumption 1,if y E [0, yO]. we have .r(x) E [0, yO]for all x E [0, Y1. and we may, without loss of generality, restrict analysis to (0, y']We now set

I. (No free production) 1(0) = o.2. (Continuity and Monotonicity) f is continuous and nondecreasing on IR+.3. (Unproductivity at high investment levels) There is.r > 0 such that /(x) :5. .r

for all x :::x.

Assumption 1 The production function 1satisfies the following conditions:

boundedness on the framework as-an assumption. we consider a more natural tHld

plausible restriction which will ensure that we may, without loss of generali ty, restrictS* and A* to compact intervals in IR+, thereby obtaining boundedness of Ii from itscontinuity.

lOt12.6 The Optimal Growth Model

Page 310: A First Course in Optimization Theory

6An alternative. and somewhat lengthier. way to prove this result is to consider the set C' (S) of bounded.continuous. and nondecreasing functions on S with the sup-norm. This space is complete. and a simpleargument shows that the map T" defined on C'(Sl by

T"w(yl = malt(u(c) + c5w[/(y - ell Ic E [0.yllis a contraction that maps CO(S) into itself. (We use the monotonicity of wand f to show that r'w is anondecreasing function on S.) Therefore, T" has a unique fixed point VO. Since V' E C·(Sl. V' mustbe continuous. But V is the unique continuous function that satisfies the Bellman Equation. Therefore. wemust have V' = V. so V is nondecreasing.

Proof Suppose not. Then, there exist y. y' E S with y < y' and ';(y) > ~(y'). Forcase of notation, let x and x' denote ';(y) and S(v'). respectively.

Theorem 12.22 Under Assumptions 1-4, .; is nondecreasing on S. That is. y, y' E Swith y < y' implies ~(y) :::: ';(y').

V(y) = max (u(y-x)+cSV[f(x)]).X'E[O,y]

Note that under Assumptions 1-3, Theorem 12.21can be strengthened to the state­ment that V is a strictly increasing function. When Assumptions 1-3-are combinedwith the curvature condition in Assumption 4, we obtain a very strong conclusion:namely, that the marginal propensity to save must be nonnegative. That is, if the stocklevel increases from y to v'. then the optimal level of savings that results at y' mustbe at least as large as the optimal level of savings that results at y.

More formally, let ';(y) = y - g(y) denote the optimal savings (or investment)at y under the optimal strategy g. Since choosing the consumption level c at y isequivalent to choosing the investment level x = y - c, we must have

Assumption 4 u: 1R+ -+ IR is strictly concave on 1R+.

Assumption 3 u: R; -+ IR is strictly increasing on 1R+.

Without additional structure, it is not possible to further characterize the solution:we proceed therefore to make assumptions of increasing degrees of restrictivenesson the structure.

Theorem 12.21 V: S -+ IRis nondecreasing on S.

Moreover, since f is nondecreasing, consuming c at y' results in a period-I stocklevel of fCy' - c) ::: fey - c). Thus, in the continuation also, the optimal actionat ley - c) will be feasible (but not necessarily optimal) at f(y' - c). It followsby induction Ihal Ihe entire sequence of actions lei} that results from y under thcoptimal strategy g is feasible (but not necessarily optimal) from y'. Therefore:"

Chapter 12 Stationary Dynamic Programming302

Page 311: A First Course in Optimization Theory

Let let} and {e;} be the optimal consumption sequences from y and y' respectively.and let {ytl and {i,} denote the respective sequences of stock levels that arise. Ofcourse, we have

V(YA) :::: AV(y) + (I - A)V(y').

Proof Let y. y' E S with y # y'. Pick A E (0, I) and let YA = AY + (I - A) Vi. Weneed to show that

Theorem 12.23 Under Assumptions 1-5, V is concave on S.

Note that we do not (yet) require strict concavity of f. Nonetheless, this is nowsufficient to prove two strong results: the concavity of the value function V. andusing this, the uniqueness of the optimal strategy g.The second result is an especiallysignificant one.

Assumption 5 f is concave on lR+.

We now make an additional convexity assumption. Assumptions 1-5 describe theframework known as the concave one-sector growth model.

u(y - x') - u(y - x) s ut.y' - x') - u(y' - x).

But (y - x') and (y - x) are the same distance apart as (y' - x') and (y' - x).However, y < y', and u is increasing and strictly concave, which means we musthave U(y_X') - u(y-x) > u(y' - x') - u(y' -x), a contradiction. This establishesthe theorem. 0

and so

u(y - x) - u(y - x') ::::8[V(j(X'» - V(j(x») > u(y' - x) - u(y' - Xi).

From these inequalities we obtain:

V(y) = u(y - x) + 8V(j(x»

:::: u(y - x ') + 8V (j(x'».V(y') = 1I(y' - x') + 8V(j(x'»

~ ut y' - x ) + 8V(/(x».

Since y' > y:::: x, x is a f~as;bl~ level of savings at y'. Similarly. since Y'::::_ .'(:and x > x' by hypothesis, x' is a feasible level of savings at y. Since x and x' arethe optimal savings levels at y and v'. respectively. we must have

12.t'i The Optimal Growth Modr]

Page 312: A First Course in Optimization Theory

Theorem 12.24 Under Assumptions 1-5, the correSpondence of maximizers G ofthe Bellman Equation is single-valued on S. Therefore, there is a unique optimalstrategy g, and g is a continuous function on S.

owhich completes the proof.

= )"'V(y) + (1 - )",)V(i),

00 00

::: A LO'U(Cr) + (l - A) 2)ruCc~)1=0 1=0

00

= "L)' U[ACI + (1 - ).,)c;l1=0

00

V (n) :::Lor uCC;)1=0

The obvious induction argument now completes this step.Since Ic;) is a feasible, but not necessarily optimal. consumption sequence from

YA. and since II[)...C,+ (I - A)C;J ::: Au(e,) + (1 - A)U(C;) for each 1. we have

[(x~) = [lACy - co) + (1 - A)(y' - c~)]

::: )..I(y - co) + (l - )...)J(y' - co)

::: )...Cl+ (1 - A)C~_ A- Cl·

Using Xo = YA - cS and the concavity of J.we also have

YA = )..y+ (I - A)y' ::: ACo+ (1 .- A)co

The first of the required inequalities holds since

I(x,·) :::C~l·

Wewill establish that the sequence {ct) is a feasible C(>osumptionsequence from YA·Let {x;} denote the sequence of investment levels th~t will arise if {ct} is followedfrom YA. The feasibility of {ct} from YA will be es~ablished if we can show thatYA ::: cS and for t = 0, 1,2, ... ,

for all t. Now define for 1= 0,1,2, ... ,

c; = )..c,+ (1 - )..)c;.

Chapter 12 Stationary Dynamic Pr(Jgramming304

Page 313: A First Course in Optimization Theory

7The following is a sketch of the required arguments. We have seen that a strategy Jr in a dynamiv program­ming problem is optimal if and only if the payoff W (n) under Jr satisfies the Bellman Equation. Therefore.the actions prescribed by Tr at any Slate must solve the Bellman Equation. Since V is the Orll~ solutionto the Bellman Equation. it follows that if there is a unique action maximizing the right-hand Side of theBellman Equation, then there is also a unique optimal strategy.

Remark It is important to note that the proof of this result makes no use of theconcavity of f. 0

Theorem 12.25 Under Assumptions 1-7, it is the case that for all y > 0, we haveo < g(y) < y, i.e., the solution to the model is "interior:"

The assumptions that 11'(0) = +00 and f'(O) > 8-1 are called the lnpda conditions on u and f, respectively. They ensure essentially that the agent will try t<:

keep consumption levels strictly positive in all periods. If consumption in pcriod-t ispositive and in period-(t + I) is zero, the agent can gain in total utility by trallsferringa "small" amount from period-t to period-(t +1): the fall in marginal utility in periodt will be more than compensated for the gain in marginal utility in period (t + I).However, this argument is not quite complete, since f may be so "unproductive" asto make transferring any amount across periods unattractive. The Inada as~umptionon f rules out this possibility. The following result formalizes these ideas:

Assumption 8 f is strictly concave on lR+.

Assumption 7 f is C I on lR++ with lim, to f' (x) > 1/8.

Assumption 6 u is Cion lR++ with lilllcW u'(c) = 00.

It is very important to emphasize that this result shows the uniqueness of theoptimal strategy itself. not just that there is a unique stationary optimal straicgy. Weleave it as an exercise to the reader to explain why this is the case."

Finally, we add the following differentiability assumptions and obtain tHe differ­entiable concave model:

is strictly concave as a function of c. The single-valuedness of G follow'» As ;1

single-valued correspondence, G admits a unique selection g; since G is a usc cor­respondence. g must be a continuous function. [~

(u(c) + tSV[f(y - c»))

Proof By hypothesis, u is strietly-concave and f is concave. By Theorem 12.23; Vinherits this concavity. Thus, the RHS of the Bellman Equation

30<;J 2.6 The Optimal Growth Model

Page 314: A First Course in Optimization Theory

As an immediate consequence of Theorem 12.25,we obtain the Ramsey-EulerEquation, the first-order condition for optimality in the one-sectorgrowth model:

Thus,Cases I and 2 are both impossible, and the result is established. 0

This is not possibleunless x' = fey). But x' = fey) would similarly be impossibleas an optimal choice unless x" = fey'), where y" = fix'), and x" = v" - g(y"),and so on. Thus the only way this situation can arise is if on the entire path fromy, we have zero consumption. But this is evidently suboptimal since u is strictlyincreasing. 0

8u'(f(y) - x')f'(y) 2: u'(O).

Case 2: (x = y)

In this case, the FOC's are (using x = y)

+00 = ~u'(0)r (0) s u'(y) < u' (0) = +00.

If x = 0, then f(x) = 0, so x' = O. But these conditions then reduce to thecontradiction

~u'(j(x) - x')f'(x) ::.:u'(y - x).

Case J: (x = 0)

The first-orderconditions for a maximum in this case are:

Thereason is simple: suppose x was dominated in this problemby some z. Then, byconsuming(y - z) at the state y, and fez) - x' at the state fez), the utility over thefirsttwo periods beginning from y is strictly larger than followingthe prescriptionsof the optimal plan in these two periods. But in either case, the investmentat the endof the second period is x'; therefore, the continuationpossibilities after the secondperiod are the same in either case. This means x could not have been an optimalinvestmentlevel at y, a contradiction.Now,if itwerenot true that the solutionx to thistwo-periodmaximizationproblem

lies in (0, y), thenwe must havex = 0 or x = y.

max lu(y - z) + ~u(f(z) - x'»).ZE[O.yl

Proof Pick any y > 0. Let x = y - g(y) be the optimal investment at y, andx' = y' - g(y') be the optimal investment at y' = g(y - g(y», i.e., in the periodfollowingy. Then x must solve the following two-periodmaximizationproblem:

Chapter 12 Stationary Dynamic Programming306

Page 315: A First Course in Optimization Theory

Theorem 12.26 (Ramsey-Euler Equation)  Suppose Assumptions 1–7 are met. Let y > 0, and y* = f(y − g(y)). Then, g satisfies

u′[g(y)] = δu′[g(y*)]f′[y − g(y)].

Proof  We have shown that at each y, g(y) must solve

max over c ∈ [0, y] of  u(c) + δu[f(y − c) − x′],

and that the solution to this problem is interior. The Ramsey-Euler Equation is simply the first-order condition for an interior optimal solution to this problem. □

Remark  The Ramsey-Euler Equation is often expressed more elegantly as follows. Let y > 0 be any given initial state, and let {c_t} and {x_t} denote the optimal consumption and savings sequences that arise from y. Then, it must be the case that at each t:

u′(c_t) = δu′(c_{t+1})f′(x_t).

Using the Ramsey-Euler Equation, it is now an easy task to show that the optimal consumption policy is monotone, so that the sequence of consumption levels from any initial state must also be monotone:

Theorem 12.27  Under Assumptions 1–7, g is increasing on S. That is, ŷ > y implies g(ŷ) > g(y).

Proof  For notational ease, let c = g(y) and ĉ = g(ŷ). Let y₁ and ŷ₁ denote, respectively, the output levels that result from y and ŷ one period hence, i.e., y₁ = f(y − c) and ŷ₁ = f(ŷ − ĉ). Finally, let c₁ = g(y₁) and ĉ₁ = g(ŷ₁). By the Ramsey-Euler equation, we have

u′(c) = δu′(c₁)f′(y − c)
u′(ĉ) = δu′(ĉ₁)f′(ŷ − ĉ).

Therefore,

u′(c) / u′(ĉ) = [u′(c₁)f′(y − c)] / [u′(ĉ₁)f′(ŷ − ĉ)].

Suppose ĉ ≤ c. Then, since u is strictly concave, u′(ĉ) ≥ u′(c). Moreover, ŷ − ĉ > y − c, so f′(ŷ − ĉ) < f′(y − c). Therefore, we must have u′(ĉ₁) > u′(c₁), or ĉ₁ < c₁.

In summary, we have shown that if ŷ > y and ĉ ≤ c, then these inequalities are repeated in the next period also; that is, ŷ₁ > y₁ and ĉ₁ < c₁. Iterating on this argument, it is seen that the sequence of consumption levels from ŷ is dominated in


every period by the sequence of consumption levels from y. This means the total utility from ŷ is strictly smaller than that from y, a contradiction to the fact that V is strictly increasing. □

Finally, define a steady state under the optimal policy g to be a state y ∈ S with the property that

y = f(y − g(y)).

That is, y is a steady state under the optimal policy if, whenever the initial state is y, the system remains at y forever under the optimal policy. Also define the golden-rule state y* to be equal to f(x*), where x* is the unique solution to

δf′(x) = 1.

Finally, define the golden-rule consumption level c* to be that value of c that would make y* a steady state, i.e.,

c* = y* − x*.

Note that y*, x*, and c* are all independent of the utility function u, and depend only on f and the value of δ.

Theorem 12.28  Given any y ∈ S, define the sequence of states {y_t(y)}, t = 0, 1, 2, …, from y under the optimal policy g as y_0(y) = y, and for t ≥ 0, y_{t+1}(y) = f[y_t(y) − g(y_t(y))]. Let {c_t(y)} denote the corresponding sequence of consumption levels. Then, y_t(y) → y* and c_t(y) → c* as t → ∞.

Proof  For notational ease, fix y and suppress all dependence on y. Since the investment function y − g(y) is nondecreasing on S, and since f is also a nondecreasing function, the sequence {y_t} is a monotone sequence. Since g is nondecreasing, the sequence {c_t} is also monotone. Therefore, both sequences have limits, denoted, say, ȳ and c̄. Since y_{t+1} = f(y_t − c_t) for all t, we must have

ȳ = f(ȳ − c̄)

by the continuity of f. Moreover, since

u′(c_t) = δu′(c_{t+1})f′(y_t − c_t),

the continuous differentiability of f and u implies that in the limit u′(c̄) = δu′(c̄)f′(ȳ − c̄), or

1 = δf′(ȳ − c̄).

Thus, ȳ and c̄ are also solutions to the two equations that define y* and c*. Since these solutions are unique, we are done. □
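The convergence described in Theorem 12.28 can be seen concretely in a small computation. The sketch below is an aside, not part of the text: it assumes hypothetical primitives u(c) = log c, f(x) = x^α, and discount factor δ. Log utility is unbounded, so it falls outside this chapter's sufficiency conditions (compare Exercise 25 below), but the Bellman Equation still holds for this example, the optimal saving rate is known to be the constant αδ, and the state path converges to the golden-rule level (αδ)^{α/(1−α)}.

```python
import numpy as np

# A minimal numerical sketch (an aside): value function iteration for the one-sector
# growth model with hypothetical primitives u(c) = log c, f(x) = x**alpha, discount delta.
alpha, delta = 0.3, 0.95
f = lambda x: x ** alpha

ygrid = np.linspace(0.05, 1.0, 300)        # grid of states y
sgrid = np.linspace(0.0, 0.99, 300)        # grid of saving rates s, so x = s*y and c = (1 - s)*y
S, Y = np.meshgrid(sgrid, ygrid)           # Y[i, j] = ygrid[i], S[i, j] = sgrid[j]
V = np.zeros_like(ygrid)                   # initial guess for the value function

for _ in range(2000):                      # iterate the Bellman operator toward its fixed point
    vals = np.log((1 - S) * Y) + delta * np.interp(f(S * Y), ygrid, V)
    V_new = vals.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

saving_rate = sgrid[vals.argmax(axis=1)]   # optimal investment share x/y at each grid state
print("computed saving rate:", saving_rate[150], "  closed form alpha*delta:", alpha * delta)

# Iterate y_{t+1} = f(s(y_t)*y_t): the state path converges to the golden-rule level.
y = 0.10
for _ in range(200):
    y = f(np.interp(y, ygrid, saving_rate) * y)
print("limit of state path:", y, "  golden-rule state:", (alpha * delta) ** (alpha / (1 - alpha)))
```

The printed saving rate should match αδ up to the grid spacing, and the simulated state path settles near the golden-rule level, as the theorem predicts.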


12.7 Exercises

1. Let {x_n} be a Cauchy sequence. Show that if {x_n} has a convergent subsequence, then the sequence is itself convergent.

2. Determine whether or not the following subsets of ℝ are complete:

(a) [0, 1)
(b) Q
(c) Qᶜ, the set of irrational numbers
(d) N
(e) {1, 1/2, 1/3, …, 1/n, …}
(f) {1, 1/2, 1/3, …, 1/n, …} ∪ {0}
(g) [−1, 2)

3. Let f: ℝ → ℝ be defined as f(x) = ax + b. What are the values of a, b ∈ ℝ that make f a contraction?

4. Let X = (1, ∞). Let f: X → ℝ be given by

f(x) = (1/2)(x + a/x).

(a) Show that if a ∈ (1, 3), then f maps X into itself, i.e., f(x) ∈ X for all x ∈ X.
(b) Show that f is actually a contraction if a ∈ (1, 3). Find the fixed point as a function of a for each a. (A numerical illustration of this iteration appears after Exercise 28.)
(c) What about a = 1 or a = 3?
(d) Is X complete?

5. Let f: ℝ → ℝ be defined by

f(x) = x − (1/2)eˣ  if x ≤ 0,  and  f(x) = −1/2 + (1/2)x  if x > 0.

Show that f satisfies the condition that |f(x) − f(y)| < |x − y| for all x, y ∈ ℝ with x ≠ y, but that f has no fixed points.

6. Let X be a finite set, and f_n: X → ℝ. Suppose f_n converges pointwise to f, i.e., for each x ∈ X, f_n(x) → f(x). Show that f_n converges uniformly to f.


7. Let X = [0, 1]. For n ∈ N define f_n: X → ℝ by

f_n(x) = nx  if x ≤ 1/n,  and  f_n(x) = 1  if x > 1/n.

(a) Show that f_n converges pointwise to f, where f is given by

f(x) = 0  if x = 0,  and  f(x) = 1  if x > 0,

i.e., show that for each x ∈ X, f_n(x) → f(x).
(b) Let d be the sup-norm metric. What is d(f_n, f)? Show that d(f_n, f) does not converge to 0, i.e., f_n does not converge uniformly to f.

8. Let X = [0, 1] and for each n ∈ N define f_n: X → ℝ as

f_n(x) = xⁿ.

Show that f_n converges pointwise to f: X → ℝ defined by

f(x) = 0  if x < 1,  and  f(x) = 1  if x = 1.

Does f_n converge uniformly to f?

9. For each n ∈ N let f_n: ℝ → ℝ be defined by

f_n(x) = 1 − (1/n)|x|  if |x| < n,  and  f_n(x) = 0  if |x| ≥ n.

Show that f_n converges pointwise to the constant function f(x) = 1. Does f_n converge uniformly to f?

10. Let X ⊆ ℝ₊ be a compact set and let f: X → X be a continuous and increasing function. Prove that f has a fixed point.

11. Let X = [0, 1], and f: X → X be an increasing but not necessarily continuous function. Must f have a fixed point?

12. If f: ℝ → ℝ and g: ℝ → ℝ are both contraction mappings, what can we say about f ∘ g?

13. Is it possible to have two discontinuous functions whose composition is a contraction mapping?

14. If f: ℝ → ℝ is a differentiable contraction mapping, what can we say about the value of df(x)/dx?

15. A function f: X → Y is said to be an onto function if for any y ∈ Y, there is some x ∈ X such that f(x) = y. Show that the mapping f(x) = (1 − x)^{1/2} is an onto mapping from [0, 1] to [0, 1]. Find the fixed points of this mapping.


16. Let a mapping of the interval [0, 1] onto itself be defined by f(x) = 4x − 4x².

(a) Sketch the graph of the function and the diagonal line y = x.
(b) Is the mapping one-to-one in the interval?
(c) Find the fixed points of the mapping.

17. Give an example of a mapping of the interval [0, 1] into itself having precisely two fixed points, namely 0 and 1. Can such a function be a contraction mapping?

18. Give an example of an onto function from the open interval (0, 1) to itself having no fixed points.

19. Let X = [0, 1]. Find a function f: X → X which has 0 as its only fixed point, or show that no such example is possible.

20. Let S ⊂ ℝⁿ. Let BU(S) be the set of all bounded upper-semicontinuous functions mapping S into ℝ. Give BU(S) the sup-norm metric and show that BU(S) is a complete metric space under this metric.

21. Let S = {0, 1}, A = ℝ₊, and Φ(s) = A for any s ∈ S. Let the state transition function be given by

f(s, a) = 1  if s = 1 and a ≥ 1/2,  and  f(s, a) = 0  if s = 0 or a < 1/2.

Let r: S × A → ℝ be given by

r(s, a) = 1 − a  if s = 1,  and  r(s, a) = −a  if s = 0.

(a) Show that f is not continuous on S × A.
(b) Suppose δ = 0, i.e., the future is worthless. Then, the problem is simply to maximize r(s, a) for each s ∈ S. What is the solution? What is V(0)? V(1)?
(c) Suppose δ ∈ (0, 1). Show that a solution exists. Find the solution as a function of δ.
(d) What "should" the solution be for δ = 1?
(e) Suppose that f is modified to

f(s, a) = 1  if s = 1 and a > 1/2,  and  f(s, a) = 0  if s = 0 or a ≤ 1/2.

Does a solution exist for all δ ∈ (0, 1)? Why or why not?

22. A wholesaler faces a known demand of k units per period for its product, at a given price p. At any given time, a maximum of l ∈ N (l > k) units of the product can be ordered from the manufacturer at a cost of c per unit. There is also a fixed cost f > 0 of placing the order. Ordered amounts are delivered instantly. Any amount of the product that remains unsold in a given period can be stored


at a cost of s per unit. Assuming that the wholesaler discounts future profits by a factor δ ∈ (0, 1), and has an initial stock of x units of the product, describe the optimization problem as a stationary dynamic programming problem, and write down the Bellman Equation. Is it possible to assert the existence of an optimal strategy? Why or why not?

23. Consider the one-sector model of optimal growth. Suppose the only restrictions we place on the utility function u: ℝ₊ → ℝ and the production function f: ℝ₊ → ℝ₊ are:

(a) u and f are both continuous on ℝ₊.
(b) f(x) ∈ [0, 1] for all x ∈ [0, 1].

Let S = [0, 1]. Suppose the problem admits free disposal. That is, the agent can decide on the amount c_t of consumption and x_t of investment out of the available stock y_t, and costlessly dispose of the remaining quantity y_t − c_t − x_t. Then, given a discount factor δ ∈ (0, 1), the agent solves:

Maximize   Σ (t = 0 to ∞) δᵗ u(c_t)
subject to  y_0 = y ∈ S
            y_{t+1} = f(x_t)
            c_t + x_t ≤ y_t,   c_t, x_t ≥ 0.

(a) Does this problem have a solution? Why or why not?
(b) Describe the Bellman Equation for this problem.
(c) Show that the value function in (b) is nondecreasing on S.

24. Suppose that in the optimal growth problem we had u(c) = cᵃ, a ∈ (0, 1), and f(x) = x. Solve for the optimal policy g with S = [0, 1] and δ ∈ (0, 1).
Hint 1: Use the FOCs for the problem.
Hint 2: g is linear, i.e., g(y) = ky for some constant k. Solve for k.

25. Redo the last question, assuming f(x) = xᵃ for a ∈ (0, 1], and u(c) = log c. (Note that log c is unbounded at 0, so our sufficient conditions for the existence of an optimal strategy do not apply here. Nonetheless, it turns out that the Bellman Equation does hold, and a solution can be calculated using this equation. The value function V has the form V(y) = A + B log y. Using this on both sides of the Bellman Equation, solve for A and B, and thereby for an optimal policy.)


26. Consider the following optimization problem:

Maximize   Σ (t = 0 to ∞) u(a_t)
subject to  Σ (t = 0 to ∞) a_t ≤ s.

This problem, which is a dynamic programming problem with δ = 1, is known as the "cake-eating" problem. We begin with a cake of size s, and have to allocate this cake to consumption in each period of an infinite horizon.

(a) Show that the problem may be rewritten as a dynamic programming problem. That is, describe formally the state and action spaces, the reward function, the transition function, and the feasible action correspondence.
(b) Show that if u: A → ℝ is increasing and linear, i.e., u(a) = ka for some k > 0, then the problem always has at least one solution. Find a solution.
(c) Show that if u: A → ℝ is increasing and strictly concave, then the problem has no solution.
(d) What happens to part (c) if we assume 0 < δ < 1, i.e., if the objective function is given by Σ (t = 0 to ∞) δᵗ u(a_t)?

27. In each period t = 0, 1, 2, … of an infinite horizon, a fishery must decide on the quantity q_t of fish to be caught for selling in the market that period. (All fish caught must be sold.) The firm obtains a constant price p per unit of fish it sells. Catching fish requires effort by the firm. In each period t, the firm must decide on its effort level e_t ∈ [0, 1] that it wishes to expend in this direction. The total catch q_t is then determined as a function q_t = h(e_t, y_t) of the effort level e_t the firm expends and the number of fish y_t in the lake that period. Assume that h: [0, 1] × ℝ₊ → ℝ₊ is a continuous function satisfying h(e, y) ∈ [0, y] for all y ∈ ℝ₊, e ∈ [0, 1]. Expending an effort level e_t in period t also results in a cost to the firm of c(e_t), where c: [0, 1] → ℝ₊ is continuous.
Lastly, the population of fish y_t − q_t not caught in period t grows to a population y_{t+1} in period t + 1 according to a growth function f, as

y_{t+1} = f(y_t − q_t).

Assume that (i) f: ℝ₊ → ℝ₊ is a continuous function, (ii) f(0) = 0 (no fish in the lake this period implies no fish next period), and (iii) f maps [0, x̄] into [0, x̄] for some x̄ > 0. Assume also that the initial stock of fish is y_0 ∈ [0, x̄]. Finally, assume that the firm discounts future profit levels by a factor δ ∈ [0, 1). Set up the dynamic programming problem faced by the firm. Show that this problem admits a stationary optimal policy. Describe the Bellman Equation.


28. A firm's income in any period t depends on the level of capital stock x_t that it has that period, and the amount of investment i_t it undertakes that period, and is given by g(x_t) − h(i_t), where g and h are continuous functions. The capital stock next period is then given by βx_t + i_t, where β ∈ (0, 1) is a depreciation factor. In no period is investment allowed to exceed b > 0, where b is some fixed level. Investment is allowed to be negative but cannot be smaller than −βx, where x is the capital stock of that period. The firm discounts future income by δ ∈ (0, 1). Given that the firm begins with an initial capital stock of x_0 > 0, show that the dynamic programming problem facing the firm has a solution, and describe the Bellman Equation. Explain all your steps clearly.
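As promised in Exercise 4(b), here is a short numerical illustration (an aside, not a solution): for a ∈ (1, 3) the map f(x) = (1/2)(x + a/x) is a contraction on (1, ∞), and iterating it from any starting point drives the iterates to the fixed point √a.

```python
from math import sqrt

# Illustration of Exercise 4: iterate the contraction f(x) = (x + a/x)/2 on (1, oo).
# Successive iterates approach the fixed point sqrt(a); here a = 2, so sqrt(2).
def f(x, a):
    return 0.5 * (x + a / x)

a, x = 2.0, 5.0
for _ in range(8):
    x = f(x, a)
    print(x)
print("fixed point sqrt(a) =", sqrt(a))
```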


It may be that there is no x ∈ X that satisfies the property Π. For instance, there is no real number x that satisfies the property that x² + 1 = 0. In such a case, A will contain no elements at all. A set which contains no elements will be called the empty set, denoted ∅.

A = {x | x satisfies the property Π}.

If X is understood, then we will write this as simply

A = {x ∈ X | x satisfies the property Π},

If A is the set of all elements from some collection X which also satisfy some property Π, we will write this as

a ∉ A.

If a is not an element of A, we shall write

a ∈ A.

A.1 Sets, Unions, Intersections

We adopt the naive point of view regarding set theory. That is, we shall assume that what is meant by a set is intuitively clear. Throughout this chapter, we shall denote sets by capital letters such as A, B, X, and Y, and elements of these sets by lowercase letters such as a, b, x, and y. That an object a belongs to a set A is denoted by

This appendix discusses the basic rules of set theory and logic. For a more leisurely and detailed discussion of this material, we refer the reader to Halmos (1960), or the excellent introductory chapter of Munkres (1975).

Appendix A

Set Theory and Logic: An Introduction


IFor instance, by the expression "the complement of the negative reals," one usually means the complementof the negative reals ill R, which is the set of nonnegative reals.

A.2 Propositions: Contrapositives and ConversesGiven two propositions P and Q. the statement"If P, then Q" is interpreted as thestatement that if the proposition P is true, then the statement Q is also true. We

If the reference set X is understood, as it usually is in applications.' we will omit thewords "in X," and refer to AC as simply the complement of A.

Aᶜ = {x ∈ X | x ∉ A}.

If A ⊂ X, the complement of A in X, denoted Aᶜ, is defined as

A ∩ B = {x | x ∈ A and x ∈ B}.

The intersection of two sets A and B, denoted A ∩ B, is the set which consists of all elements which belong to both A and B:

A ∪ B = {x | x ∈ A or x ∈ B}.

If two sets A and B are not equal, we will write this as A ≠ B. Note that B is a proper subset of A if and only if we have B ⊂ A and B ≠ A.

The union of two sets A and B, denoted A ∪ B, is the set which consists of all elements which are either in A or in B (or both):

A ⊂ B and B ⊂ A.

The set B will be said to be a proper subset of A if B ⊂ A, and there is some x ∈ A with x ∉ B. In words, B is a proper subset of A if every element of B is also in A, but A contains at least one element that is not in B.

Two sets A and B are said to be equal, written A = B, if every element of A is also an element of B, and vice versa. That is, A = B if we have both

A ⊃ B.

We will also say in this case that A is a superset of B. and denote the relationshipB C A alternativelyby

B C A.

If every element of a set B is also an element of a set A. we shall say that B is asubset of A. and write


A statement and its contrapositive are logically equivalent. That is, if the statement is true, then the contrapositive is also true, while if the statement is false, so is the contrapositive. This is easy to see. Suppose, first, that P ⇒ Q is true. Then, if Q is false, P must also be false: if P were true, then, by P ⇒ Q, Q would have to be true, and a statement cannot be both true and false. Thus, if P ⇒ Q holds, then ~Q ⇒ ~P also holds. Now, suppose P ⇒ Q is false. The only way this can happen is if P were true and Q were false. But this is precisely the statement that ~Q ⇒ ~P is not true. Therefore, if P ⇒ Q is false, so is ~Q ⇒ ~P.
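This equivalence can also be checked mechanically. The short sketch below is an aside, not part of the text: it runs through all four truth-value combinations and confirms that P ⇒ Q and ~Q ⇒ ~P always agree, while the converse Q ⇒ P does not.

```python
from itertools import product

# Exhaustive truth-table check: "P implies Q" equals its contrapositive in every case.
def implies(p, q):
    return (not p) or q

for P, Q in product([True, False], repeat=2):
    assert implies(P, Q) == implies(not Q, not P)    # contrapositive: always equivalent
print("P => Q and ~Q => ~P agree in all four cases")
print("the converse can differ:", implies(False, True), "vs", implies(True, False))
```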

If x3 is not positive. then x is not positive.

is the statement

If x is positive, then x3 is positive

For example, the contrapositive of the statement

'" Q => '" P.

is the statement

P => Q

since the square of a positive number is positive. However. Q can be true even if Pis not true, since the square of a negative number is also positive.

Given a statement of the form "if P, then Q," its contrapositive is the statementthat "if Q is not true, then P is not true." If we let ~ Q denote the statement that Qis not true (we will simply call this "not Q," for short), then the contrapositive of thestatement

P => Q.

We will also say in this case that" P implies Q."We stress the point that P => Q only says that if P is true. then Q is also true. It

has nothing to say about the case where P is not true; in this case. Q could be eithertrue or false. For example, if P is the statement x > 0 and Q is the statement thatx2 > O. then it is certainly true that

P ⇒ Q.

denote this by


There exists a E A such that property n(a) does not hold.

Then. P is false if there is just a single element a E A for which the property n(a)does not hold. Thus, the negation of P is the proposition

For all a E A, property n(a) holds.

A.3 Quantifiers and NegationThere are two kinds of logical quantifiers. the universal or "for all" quantifier, and theexistential or "there exists" quantifier. The former is used to denote that a property nholds for every element a in some set A; the latter to denote that the property holdsfor at least one element a in the set A.The negation of a proposition P is its denial ~ P. If the proposition P involves a

universal quantifier, then its negation involves an existential quantifier: to deny thetruth of a universal statement requires us to find just one case where the statementfaiIs. For instance, let A be some set and let n(a) be some property defined forelements a EA. Suppose P is the proposition of the form

For example, if P is the proposition that x > 0 and Q is the proposition that x³ > 0, we have P ⇔ Q.

P ⇔ Q.

is false: x could be negative and still satisfy x² > 0. If a statement and its converse both hold, we express this by saying that "P if and

only if Q," and denote this by

Q ⇒ P, but the converse

P ⇒ Q,

that is, the statement that "if Q, then P." There is no logical relationship between a statement and its converse. As we have

seen, if P is the proposition that x > 0 and Q is the proposition that x² > 0, then it is certainly true that

Q ⇒ P,

An important implication of the logical equivalence of a statement and its contrapositive is the following: if we are required to prove that P ⇒ Q, the result can be regarded as established if we show that ~Q ⇒ ~P. The converse of the statement P ⇒ Q is the statement that


In fact, while the first statement is true (it asserts essentially that every positive real number has a positive square root), the second is false (it claims that a single fixed real number is the square root of every positive number).

The importance of the order of quantifiers makes it necessary to exercise caution in forming the negation of statements with multiple quantifiers, since the negation will also involve the use of multiple quantifiers. To elaborate, let Π(a, b) denote a

is most definitely not the same as the statement that

There exists y > 0 such that for all x > 0, y² = x.

For every x > 0, there exists y > 0 such that y² = x

However, the order of the quantifiers becomes significant if quantifiers of different types are involved. The statement

For all y ∈ ℝ, for all x ∈ ℝ, (x + y)² = x² + 2xy + y².

is the same as the statement

For all x ∈ ℝ, for all y ∈ ℝ, (x + y)² = x² + 2xy + y²,

When multiple quantifiers are involved in a statement, the situation gets a little more complicated. If all the quantifiers in a given proposition are of the same type (i.e., they are all universal, or are all existential) the order of the quantifiers is immaterial. For instance, the statement

As a concrete example, consider the following. Given a real number x, let Π(x) be the property that x² > 0. Let P be the proposition that "Property Π(x) holds for every real number x." In the language of quantifiers, we would express P as

For every x ∈ ℝ, x² > 0.

P is evidently negated if there is at least one real number whose square is not strictly positive. So the negation ~P is the statement

There is x ∈ ℝ such that x² ≤ 0.

For all b ∈ B, property Π′(b) does not hold.

its negation is the proposition

Similarly, the negation of an existential quantifier involves a universal quantifier: to deny that there is at least one case where the proposition holds requires us to show that the proposition fails in every case. That is, if Q is a proposition of the form

There exists b ∈ B such that property Π′(b) holds.
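For finite sets, these negation rules can be checked directly. The sketch below is an aside, not part of the text; Python's all and any only examine finitely many cases, whereas the logical rules hold in general. The set A and the property used here are hypothetical examples.

```python
# Finite-set illustration of quantifier negation:
#   not (for all a, P(a))  ==  there exists a with not P(a)
#   not (there exists a, P(a))  ==  for all a, not P(a)
A = [-3, -1, 0, 2, 5]
prop = lambda a: a ** 2 > 0      # the property "a squared is strictly positive"

assert (not all(prop(a) for a in A)) == any(not prop(a) for a in A)
assert (not any(prop(a) for a in A)) == all(not prop(a) for a in A)
print("negation of 'for all' becomes 'there exists ... not':",
      not all(prop(a) for a in A), any(not prop(a) for a in A))
```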


Under some subsidiary hypotheses on f and D, we then try to identify implications Q of x* being a maximum point. Any such implication Q, that arises from the assumption that x* is a maximum, is a necessary condition for an optimum, whenever the subsidiary hypotheses used in this derivation hold.

This is the approach we take in Chapters 4 through 6 of this book. In Chapter 4, for instance, we make the following subsidiary hypotheses, that we shall call H1 here:

P: x* is a maximum of a function f on a constraint set D.

is valid. Then, Q is said to be a necessary condition for P. The reason for this terminology is apparent: if P ⇒ Q holds, and P is to be true, then Q must necessarily also be true.

In optimization theory, P is usually taken to be a statement of the form

P ⇒ Q

A.4 Necessary and Sufficient Conditions

The study of optimization theory involves the use of conditions that are called necessary conditions and sufficient conditions. These are implications of the form P ⇒ Q and Q ⇒ P that were discussed in the previous section. There is, however, one point that bears elaboration. Necessary and sufficient conditions in optimization theory are usually derived under some subsidiary hypotheses on the problem, and their validity depends on these hypotheses holding. Moreover, these hypotheses need not be the same; that is, the necessary conditions may be derived under one set, while the sufficient conditions may use another. We discuss some aspects of this issue here.

We begin with a definition of necessary conditions in the abstract. Suppose an implication of the form

For every b ∈ B, there exists a ∈ A such that Π(a, b) fails.

We reiterate the importance of the order of quantifiers in forming this negation. The negation of P is not the statement

There exists a ∈ A such that for all b ∈ B, Π(a, b) fails.

The statement P will be falsified if there is even one a ∈ A for which the property Π(a, b) fails to hold, no matter what we take for the value of b ∈ B. Thus, the negation of P is the statement ~P defined by

For every a ∈ A, there exists b ∈ B such that Π(a, b) holds.

property defined on elements a and b in sets A and B, respectively. Consider the statement P


2The actual conditions we assume are weaker than this.

The objective is then to find a set of conditions Q* on f and D such that propositionP* is true. Such conditions Q*. in the context of optimization theory, are calledsufficient conditions for the existence of a maximum. These are the conditions thatChapter 3 and the chapters on dynamic programming are primarily concerned with.

P": There exists a maximum of f on D.

holds. then proposition P is also true. That is, the condition DI(x *) = 0 is a sufficientcondition for a maximum at x* provided H2 holds.

An alternative class of sufficient conditions that optimization theory studies in­volves the statement P* that

Q: Df(x') = 0

It is then shown that if proposition Q that

H2: D is convex and open, and f is concave and differentiable on D.

The objective now is to find. under some subsidiary hypotheses on the problem, aset of conditions Q such that whenever the conditions in Q are met at the point x· ,P is always true. Such conditions are called sufficient conditions for the point x " tobe a maximum.

This is the route we follow in Chapters 7 and 8 of this book. In Chapter 7, forinstance. one of the results assumes the subsidiary hypotheses H2 that

P: x* is a maximum of f on D.

is true. Then. Q is said to be a sufficient condition for P. In words, since the truth ofQ must imply the truth of P, it is enough for P to be true that Q is true.

Sufficient conditions come in many forms in optimization theory. In one, we takeP to be a statement of the form

Q=}P

Thus, the condition that Df(x*) = 0 is a necessary condition for f to have a maximum at x*, provided H1 holds.

For a definition of sufficient conditions in the abstract, suppose that a statement ofthe form

Q: Df(x*) = 0.

We then show that if the proposition P (that x* is a maximum of f on D) is true, then the proposition Q must be true, where Q states that

H1: D is open, and f is differentiable on D.²


provided H2 holds.

is necessary and sufficient for the proposition P that

P: x* is a maximum of f on D

Q: Df(x*) = 0

In this case, P and Q are equivalent in the sense that either both are true, or both arefalse. Thus, identifying the truth of P is the same thing as identifying the truth of Q.

In some sense, this equivalence makes conditions that are necessary and suffi­cient an ideal set of conditions for working with in optimization theory. However, asshould be apparent from the two examples above, necessary conditions often requirefar weaker subsidiary hypotheses on the problem than sufficient conditions. There­fore, necessary conditions derived under a given set of subsidiary hypotheses areunlikely to also be sufficient under those hypotheses. For instance. the condition thatDf(x*) = 0 is necessary for x* to be a maximum under HI, but it is not sufficientunder HI. Sufficiency of this condition requires the stronger hypotheses containedin H2.

On the other hand, since the subsidiary hypotheses used in deriving sufficientconditions are typically stronger than those used to prove that the same conditionsare necessary, it is possible that sufficient conditions derived under a given set ofhypotheses may also be necessary under the same hypotheses. For instance. it isapparent that whenever H2 is met. HI is also met. Therefore, we have the followingresult: the proposition Q that

P <:} Q.

In Chapter 3, for instance, we show that if Q" is the set of conditions that

Q*: D is a compact set in ℝⁿ and f is continuous on D,

then it is true that Q* ⇒ P*, i.e., that a solution exists in the maximization problem. A condition Q is said to be both necessary and sufficient for P if it is the case that


It is assumed throughout this appendix that the reader is familiar with handling rational numbers, and with the rules for addition (+) and multiplication (·) of such numbers. It can be shown that under these operations, the rational numbers form a field; that is, for all rationals a, b, and c in Q, the following conditions are met:

1. Addition is commutative: a + b = b + a.
2. Addition is associative: (a + b) + c = a + (b + c).

Q will denote the set of rational numbers:

Q = {x | x = p/q,  p, q ∈ Z,  q ≠ 0}.

N = {1, 2, 3, …}
Z = {…, −2, −1, 0, 1, 2, …}.

B.1 Construction of the Real Line

We use the following notation: N will denote the set of natural numbers and Z the set of all integers:

There are many methods for obtaining the real number system from the rationalnumber system. We describe one in this appendix, which "constructs" the real numbersystem as (appropriately defined) limits of Cauchy sequences of rational numbers.An alternative constructive approach-the method of Dedekind cuts-is describedin Rudin (1976). A third approach, which is axiomatic, rather than constructive, maybe found in Apostol (1967). Bartle (1964), or Royden (1968).

Our presentation in this appendix. which is based on Strichartz (1982), is brief andrelatively informal. For omitted proofs and greater detail than we provide here, werefer the reader to Hewitt and Stromberg (1965), or Strichartz (1982).

Appendix B

The Real Line

Page 332: A First Course in Optimization Theory

1.4, 1.41, 1.414, 1.4142, 1.41421, …
1.5, 1.42, 1.415, 1.4143, 1.41422, …

The second idea central to our construction is that the approximating sequence of rationals cannot be unique. For instance, the sequences

1.96, 1.9881, 1.999396, 1.99996164, …

when squared, results in the following sequence, which gets closer and closer to 2:

1.4, 1.41, 1.414, 1.4142, …

where |x| denotes the absolute value of a rational x, i.e., |x| = x, if x ≥ 0, and |x| = −x, otherwise. The need to extend the rational number system arises from the observation that

there are objects such as √2 that are easy to describe (say, as the solution to x² = 2), and that intuitively "should" form part of a reasonable number system, but that are not part of the rational number system. The real number system attempts to extend the rational number system to close such holes, but also in a manner which ensures that the field properties of the rationals and the order relation (>) are also extended to the reals. Two simple and intuitive ideas underlie the construction of the reals that we present

here. The first is that although objects such as √2 are not rational, they are capable of being approximated arbitrarily closely by rational numbers. For example, the sequence of rational numbers

|x + y| ≤ |x| + |y|,

A rational p/q (with q ≠ 0) is positive if p and q are both positive; it is negative if p is positive and q negative, or q is positive and p negative; and it is zero if p = 0. Every rational is either positive, negative, or zero. The rationals are also ordered: for every two distinct rationals a and b, we have either a > b (if a − b is positive) or b > a (if a − b is negative). Finally, the rationals satisfy the triangle inequality: if x and y are any rationals, we have

3. Multiplication is commutative: a · b = b · a.
4. Multiplication is associative: (a · b) · c = a · (b · c).
5. Multiplication distributes over addition: a · (b + c) = a · b + a · c.
6. 0 is the additive identity: 0 + a = a.
7. 1 is the multiplicative identity: 1 · a = a.
8. Every rational has a negative: a + (−a) = 0.

9. Every non-zero rational has an inverse: a · a⁻¹ = 1.


Definition B.1  The set of all equivalence classes of Cauchy sequences of rationals, denoted ℝ, is called the real number system. A typical element of ℝ, denoted x, is called a real number.

We write this as {x_n} ~ {y_n}. In words, equivalent Cauchy sequences are those whose terms are getting closer together, the farther we go out in the sequences. Intuitively, if two Cauchy sequences are equivalent, they can be viewed as approximating the same real number.

It is easy to show that ~ is, in fact, an equivalence relationship: it is reflexive ({x_n} ~ {x_n}), and symmetric ({x_n} ~ {y_n} implies {y_n} ~ {x_n}). It is also transitive: that is, {x_n} ~ {y_n} and {y_n} ~ {z_n} implies {x_n} ~ {z_n}. We leave the proof of the transitivity of ~ as an exercise to the reader.

An equivalence class C of a Cauchy sequence of rationals is a collection of Cauchy sequences of rationals such that if {x_n} ∈ C and {y_n} ∈ C, then {x_n} ~ {y_n}. At an intuitive level, all members of a given equivalence class may be viewed as approximating the same real number. Thus, we may as well identify each equivalence class with a real number, and this is, in fact, our definition:

In words, a Cauchy sequence of rationals is one where the distance between termsin the tail of the sequence gets smaller, the farther one goes out in the sequence.

Two Cauchy sequences of rationals {x_n} and {y_n} will be called equivalent if for all natural numbers n ∈ N, there exists m(n) such that for all k ≥ m(n), we have

|x_k − y_k| < 1/n.

both approximate √2, the first sequence from "below" (each term in the sequence is strictly smaller than √2), and the second from "above" (each term is strictly larger than √2).

To sum up, the idea is to regard real numbers as limits of sequences of rational numbers, but with the caveat that two sequences of rationals are to be regarded as equivalent (i.e., as defining the same real number) if they are themselves getting closer together, the farther one goes out in the sequence. We formalize these ideas now.

Let |a| denote the absolute value of a rational number a, and let |a − b| denote the distance between two rationals a and b. A sequence of rational numbers {x_n} is called a Cauchy sequence if it is the case that for all natural numbers n ∈ N, there exists an integer m(n), such that for all k, l ≥ m(n), we have

|x_k − x_l| < 1/n.
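The truncated decimal expansions of √2 used earlier give a concrete Cauchy sequence of rationals. The sketch below is an aside, not part of the text: it builds these approximations as exact fractions and displays the shrinking gaps between successive terms and the squares approaching 2.

```python
from fractions import Fraction
from math import isqrt

# The n-th term is the largest k/10**n (k an integer) whose square does not exceed 2,
# i.e., the decimal expansion of sqrt(2) truncated after n digits: 1.4, 1.41, 1.414, ...
xs = [Fraction(isqrt(2 * 10 ** (2 * n)), 10 ** n) for n in range(1, 9)]

for a, b in zip(xs, xs[1:]):
    print(float(a), " gap to next term:", float(b - a), " square:", float(a * a))
```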


The number x is negative if −x is positive. Under these definitions, it can be shown that the real numbers inherit another important property of the rationals:

x_k ≥ 1/n  for all  k ≥ m(n).

Now, let {x_n} be any Cauchy sequence of rationals converging to x ∈ ℝ. Then, x is said to be positive if there is n ∈ N and m(n) such that

Theorem B.3  The real numbers form a field.

In view of this lemma, we may simply define the sum (x + y) of two real numbers x and y in ℝ as the equivalence class of {x_n + y_n}, where {x_n} converges to x (i.e., {x_n} is in the equivalence class of x) and {y_n} converges to y. Similarly, the product x · y can be defined as the equivalence class of {x_n · y_n}. With addition and multiplication defined thus, our next result states that one of the properties desirable in ℝ is, in fact, true. The proof of this result is omitted.

Proof  Left as an exercise. □

Lemma B.2  Let {x_n} and {x′_n} be Cauchy sequences of rationals in the equivalence class x ∈ ℝ, and let {y_n} and {y′_n} be Cauchy sequences of rationals in the equivalence class y ∈ ℝ. Then:

1. {x_n + y_n} and {x′_n + y′_n} are both Cauchy sequences of rationals. Moreover, they are equivalent sequences.

2. {xn . Yn} and {x~ . Y~} are both Cauchy sequences of rationals. Moreover. theyare equivalent sequences.

B.2 Properties of the Real LineWenow tum to an examination of the properties possessed by the real line. when it isdefined as in the previous section. We are especially interested in examining whetherthe reals also constitute a field. and whether they can be ordered. The first questionrequires us to first define the notion of addition and multiplication for arbitrary realnumbers. The following result shows that one may proceed to do this in the obviousmanner.

In keeping with the intuition underlying this definition. we will also say that if(xn J is an element of the equivalence class x E R, then {xn} converges to x, or thatr is the limit of the sequence {xn}.


for each k. Taking limits as k → ∞, and using Lemma B.5, the theorem is proved. □

Proof  Let {x_k} and {y_k} be sequences of rationals converging to x and y, respectively. By definition, the Cauchy sequence of rationals {x_k + y_k} converges to x + y. By the triangle inequality for rationals, |x_k + y_k| ≤ |x_k| + |y_k|

|x + y| ≤ |x| + |y|.

Theorem B.6 (Triangle inequality for real numbers)  Let x and y be real numbers. Then,

Proof  If not, then x < y, so x − y < 0. Since x − y is the limit of x_n − y_n, there must then exist m ∈ N such that x_n − y_n < −1/m for all large n. This contradicts the assumption that x_n ≥ y_n for all n. □

Lemma B.5  Let x and y ∈ ℝ be the limits of the Cauchy sequences of rationals {x_n} and {y_n}, respectively. If x_n ≥ y_n for each n, then x ≥ y.

We can also define inequalities for real numbers: given x and y in ℝ, we say that x > y if x − y > 0, that x < y if y − x > 0, etc. Finally, we define the (Euclidean) distance between two real numbers x and y to be the absolute value |x − y| of their difference. The following simple result comes in handy surprisingly often:

|x| = x  if x ≥ 0,  and  |x| = −x  if x < 0.

When the notion of positivity on a field satisfies the conditions of this theorem, the field is called an ordered field. Thus, the reals, like the rationals, are an ordered field.

The notion of a positive number can be used to define the notion of absolute values for real numbers. Given any x ∈ ℝ, the absolute value of x, denoted |x|, is defined by

Proof  Omitted. □

Theorem B.4  Every real number is positive, negative, or zero. The sum and product of positive numbers is positive, while the product of a negative and positive number is negative.


IFor a definition of denseness in the context of abstract metric spaces, see Appendix C.2For a definition of completeness in the context of abstract metric spaces, see Appendix C.

Proof It is easy to see that if a sequence {XII} converges to a limit x, then {xn} mustbe a Cauchy sequence. We leave the details to the reader as an exercise.

So suppose that {xn I is a Cauchy sequence of real numbers. We are to show thatthere is X E R such that Xn ~ x.

Theorem B.8 (Completeness of R) A sequence of real numbers {xn} has a limit ifand only if it is a Cauchy sequence.

Note that these definitions are meaningful, since we have defined inequalities betweenany two real numbers.

Our last result is particularly important because it shows that if we do to the reals what we did to the rationals, no new numbers are created: the limits of Cauchy sequences of real numbers are also only real numbers. This fact is expressed as saying that the real line ℝ is complete.²

|x_k − x_l| < ε.

We also say that {x_n} is a Cauchy sequence if for all ε > 0, there is n(ε) such that for all k, l ≥ n(ε), we have

|x_n − x| < ε.

Extending the definition for rational numbers in the obvious manner, we say that a sequence of real numbers {x_n} converges to a limit x ∈ ℝ if it is the case that for all ε > 0, there is n(ε) such that for all n ≥ n(ε), we have

Define y = x_{m(n)}. Then, |x_k − y| < 1/n for all k ≥ m(n), so taking limits as k → ∞, and using Lemma B.5, we obtain |x − y| ≤ 1/n. □

|x_k − x_l| < 1/n,   k, l ≥ m(n).

Proof  Let {x_n} be a Cauchy sequence of rationals converging to x. Then, given n, there exists m(n) such that

Theorem B.7 (Denseness of Rationals)  Given any real number x and any n ∈ N, there exists a rational number y ∈ Q such that |x − y| ≤ 1/n.

There is another important property of the real line: it is the case that there are rational numbers arbitrarily close to any real number. This fact is expressed by saying that the rationals are dense in the reals.¹
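Theorem B.7 is easy to illustrate numerically: for any n, the rational ⌊nx⌋/n lies within 1/n of x. The sketch below is an aside, not part of the text; a Python float is itself a rational number, so this is only a finite-precision illustration, here using x = π.

```python
from fractions import Fraction
from math import floor, pi

# For each n, floor(n*x)/n is a rational within 1/n of x.
x = pi
for n in (1, 10, 100, 1000):
    y = Fraction(floor(n * x), n)
    print(n, float(y), abs(x - float(y)) <= 1.0 / n)
```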


Since {y_n} converges to x, it is the case that |x − y_n| can be made arbitrarily small for all n sufficiently large. Since this is also true of the term 1/n, it follows that {x_n} converges to x. □

|x_n − y_n| < 1/n.

It is not very difficult to show that since {x_n} is a Cauchy sequence, {y_n} is also a Cauchy sequence. Once again, we leave the details to the reader as an exercise. Since {y_n} is a Cauchy sequence of rationals, it converges to a limit x ∈ ℝ.

We will show that x = lim (n → ∞) x_n also. To this end, note that for any n, we have

|x − x_n| ≤ |x − y_n| + |y_n − x_n| ≤ |x − y_n| + 1/n.

Since each x_n is a real number, there is a rational number y_n arbitrarily close to

it. In particular, we can choose a sequence {y_n} of rationals such that for each n, we have


Throughout, the letters a, b. c, etc. will denote scalars while x, y, z, will denoteelements of V.

C.l VectorSpacesA vector space over JR (henceforth, simply vector space) is a set V. on which aredefined two operators "addition." which specifies for each x and y in V. an elementx + yin V; and "scalar multiplication," which specifies for each a E JR and x E V.an element ax in V. These operators are required to satisfy the following axioms forall x, y, Z E V and a, b E lR:

1. Addition satisfies the commutative group axioms:

(a) Commutativity: x + y = y + x.(b) Associativity: x + (y + z) = (x + y) + z.(c) Existence of zero: There is an element 0 in V such that x + 0 = x.(d) Existence of additive inverse: For every x E V, there is (-x) E V such that

x + (−x) = 0.

2. Scalar multiplication is

(a) Associative: (ab)x = a(bx), and(b) Distributive: a(x + y) = ax +ay. and (a + b)x = ax + bx.

This appendix provides a brief introduction to vector spaces, and the structures (innerproduct, norm, metric, and topology) that can be placed on them. It also describes anabstract context for locating the results of Chapter Ion the topological structure on]Rn. The very nature of the material discussed here makes it impossible to be eithercomprehensive or complete; rather, the aim is simply to give the rea~er a flavor ofthese topics. For more detail than we provide here, and for omitted proofs, we referthe reader to the books by Bartle (1964), Munkres (1975), or Royden (1968).

Structures on Vector Spaces

Appendix C


Many different structures can be placed on V. We will examine four of these (innerproduct spaces, normed spaces. metric spaces. and topological spaces) below. Thefour we examine are successive generalizations in that every inner product spacegenerates a normed space, every normed space a metric space, and every metric

Once again, it is a simple matter to verify that all relevant axioms are met, so C([0, 1]) defines a vector space over ℝ. □

(f + g)(ξ) = f(ξ) + g(ξ)

(af)(ξ) = af(ξ).

C([0, 1]) = {f: [0, 1] → ℝ | f is continuous on [0, 1]}.

Let V = C([0, 1]). Define addition and scalar multiplication on V as follows: for f, g in V and a ∈ ℝ, let (f + g) and af be the functions in V whose values at any ξ ∈ [0, 1] are given respectively by

Example C.3  Let C([0, 1]) denote the space of all continuous functions from [0, 1] to ℝ:

Then, V is a vector space. □

where |x_i| is the absolute value of x_i. Define addition and scalar multiplication on V as follows: for x = (x₁, x₂, …) and y = (y₁, y₂, …) in V, and a ∈ ℝ, let

x + y = (x₁ + y₁, x₂ + y₂, …)
ax = (ax₁, ax₂, …).

sup over i ∈ N of |x_i| < ∞,

Example C.2  Let V be the space of bounded sequences in ℝ. A typical element of V is a sequence x = (x₁, x₂, …) which has the property that

It is an elementary matter to check that addition and scalar multiplication so defined meet all of the requisite axioms to make ℝⁿ a vector space over ℝ. □

x + y = (x₁ + y₁, …, x_n + y_n)

ax = (ax₁, …, ax_n).

Example C.1  Let V = ℝⁿ. Define addition and scalar multiplication on V in the usual way: for x = (x₁, …, x_n) and y = (y₁, …, y_n) in V, and a ∈ ℝ, let

Standard examples of vector spaces V, that we will use for illustrative purposes throughout this chapter, include the following:


(f, f) = ∫₀¹ f(ξ)f(ξ) dξ

      = ∫₀¹ (f(ξ))² dξ.

It is immediate that the symmetry condition is met. To check positivity, note that

(f, g) = ∫₀¹ f(ξ)g(ξ) dξ.

Example C.5  Let V = C([0, 1]), with addition and scalar multiplication defined as in Example C.3. For f, g ∈ V, let

We have seen in Chapter 1 that (·, ·) defined in this way satisfies the conditions of symmetry, positivity, and bilinearity. Thus, (x, y) = Σ (i = 1 to n) x_i y_i defines an inner product over V = ℝⁿ, the Euclidean inner product. □

(x, y) = x · y = Σ (i = 1 to n) x_i y_i.

Example C.4  Let V = ℝⁿ, with addition and scalar multiplication defined as in Example C.1. For x, y ∈ ℝⁿ, let

An inner product space is a pair (V, (·, ·)), where V is a vector space over ℝ, and (·, ·) is an inner product on V.

Typical examples of inner product spaces include the following:

1. Symmetry: (x, y) = (y, x).

2. Bilinearity: (ax + by, z) = a(x, z) + b(y, z), and (x, ay + bz) = a(x, y) + b(x, z).
3. Positivity: (x, x) ≥ 0, and (x, x) = 0 iff x = 0.

C.2 Inner Product Spaces

The most restrictive structure on V that we shall study is that of the inner product.An inner product on V is a function (', .) from V x V to IRthat satisfies the followingconditions for all x, y E V:

space a topological space. The reverse containments need not always be true; thus,for instance, there exist topological spaces that are not generated by any metric space,and metric spaces that are not generated by any normed space.


C.3 Normed Spaces

A notion less restrictive (in a sense to be made precise) than the inner product is that of a norm. A norm is a function ‖·‖: V → ℝ that satisfies the following requirements for all x, y ∈ V:

1. Positivity: ‖x‖ ≥ 0, ‖x‖ = 0 iff x = 0.

obvious changes. □

Proof  The proof of the Cauchy-Schwartz inequality for the Euclidean inner product in ℝⁿ (see Chapter 1, Theorem 1.2) relied only on the three properties defining an inner product, and not on the specific definition of the Euclidean inner product itself. The current result may, therefore, be established by mimicking that proof, with the

for all x, y ∈ V.

(x, y) ≤ (x, x)^{1/2} (y, y)^{1/2}

Theorem C.6 (Cauchy-Schwartz Inequality)  Let (·, ·) be an inner product on V. Then,

defines an inner product on V. Unless a₁ = a₂ = 1, this is not the Euclidean inner product on ℝ².

We close this section with a statement of a very useful result regarding inner products:

It is important to note that more than one inner product can be defined on the same vector space V. For instance, let V = ℝ², and let a₁ and a₂ be strictly positive numbers. Then, it can be verified that

(x, y) = a₁x₁y₁ + a₂x₂y₂

(af + bg, h) = ∫₀¹ [af(ξ) + bg(ξ)]h(ξ) dξ

             = a ∫₀¹ f(ξ)h(ξ) dξ + b ∫₀¹ g(ξ)h(ξ) dξ

             = a(f, h) + b(g, h),

so bilinearity also holds. Therefore, (f, g) = ∫₀¹ f(ξ)g(ξ) dξ defines an inner product over V = C([0, 1]). □

This last term is always nonnegative, since (f(ξ))² ≥ 0 for all ξ, and the integral of a function which only takes on nonnegative values is nonnegative. Finally,


As in the earlier example, it is easy to check that the positivity and homogeneity conditions are met by ‖·‖_p. The triangle inequality is again a consequence of the Minkowski inequality (see Royden, 1968, p. 114). □

‖f‖_p = (∫₀¹ |f(ξ)|ᵖ dξ)^{1/p}

Example C.8  Let V = C([0, 1]), with addition and scalar multiplication defined as in Example C.3. For f ∈ V and p ≥ 1, let

‖x‖_∞ = ‖x‖_sup = sup over i = 1, …, n of |x_i|.

Note that, as with inner products, Example C.7 shows that many different norms can be placed on the same vector space. Examples C.8 and C.9 below, which define various norms on the space C([0, 1]), reiterate this point.

When p = 2, the p-norm ‖·‖_p on ℝⁿ is called the Euclidean norm. As p → ∞

we obtain the sup-norm

which is precisely the triangle inequality. Therefore, for any p ≥ 1, ‖·‖_p defines a norm over ℝⁿ, the so-called p-norm

on ℝⁿ. □

where |x_i| denotes the absolute value of x_i.

It is easy to see that ‖·‖_p satisfies homogeneity. It also satisfies positivity since |x_i|, and, hence, |x_i|ᵖ, is nonnegative for each i. Finally, that ‖·‖_p satisfies the triangle inequality is a consequence of an inequality known as the Minkowski inequality. For a general statement of Minkowski's inequality, we refer the reader to Royden (1968, p. 114). In the special context of ℝⁿ, the Minkowski inequality states that

(Σ (i = 1 to n) |x_i + y_i|ᵖ)^{1/p} ≤ (Σ (i = 1 to n) |x_i|ᵖ)^{1/p} + (Σ (i = 1 to n) |y_i|ᵖ)^{1/p},

‖x‖_p = (Σ (i = 1 to n) |x_i|ᵖ)^{1/p},

Example C.7  Let V = ℝⁿ, and let p ≥ 1 be any real number. Define ‖·‖_p: V → ℝ by

A normed space is a pair (V, ‖·‖), where V is a vector space, and ‖·‖ is a norm on V. Standard examples of normed spaces include the following:

2. Homogeneity: ‖ax‖ = |a| · ‖x‖.
3. Triangle Inequality: ‖x + y‖ ≤ ‖x‖ + ‖y‖.
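The properties of the p-norm of Example C.7 are easy to verify numerically on sample vectors. The sketch below is an aside, not part of the text: it checks homogeneity and the triangle (Minkowski) inequality for a few values of p on random vectors in ℝ⁵, treating the sup-norm as the limiting case p = ∞.

```python
import numpy as np

# Numerical check of the p-norm axioms on R^n for sample vectors.
def pnorm(x, p):
    if p == np.inf:
        return np.max(np.abs(x))                      # sup-norm, the p -> infinity limit
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

rng = np.random.default_rng(0)
x, y = rng.normal(size=5), rng.normal(size=5)
for p in (1, 2, 3, np.inf):
    assert abs(pnorm(2.5 * x, p) - 2.5 * pnorm(x, p)) < 1e-12     # homogeneity
    assert pnorm(x + y, p) <= pnorm(x, p) + pnorm(y, p) + 1e-12   # Minkowski / triangle inequality
    print("p =", p, "  ||x||_p =", pnorm(x, p))
```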


It is actually true that the parallelogram law characterizes norms and their associated inner products. If a norm ‖·‖ satisfies the parallelogram law, then the following

Proof  Omitted. See, e.g., Bartle (1964, p. 58). □

Theorem C.11 (Parallelogram Law)  If ‖·‖ is a norm generated by an inner product, then it satisfies the following equation for all x, y ∈ V:

‖x + y‖² + ‖x − y‖² = 2‖x‖² + 2‖y‖².

Proof  Immediate. □

Theorem C.10  Let (·, ·) be an inner product on V. Let ‖x‖ = (x, x)^{1/2}. Then, ‖·‖ is a norm on V.

The following results relate norms and inner products. The first result shows that every inner product on V generates a norm on V. The second result shows that if a norm is generated by an inner product, it has to satisfy a certain property. It is an easy matter to construct norms (even on ℝⁿ) that do not satisfy this property, so a corollary of this result is that the concept of a norm is less restrictive than that of the inner product.

and the triangle inequality is also met. Thus, ‖·‖ is a norm on C([0, 1]), the sup-norm. □

‖f + g‖ ≤ ‖f‖ + ‖g‖,

where the first inequality is just the triangle inequality for the absolute value of real numbers, and the second inequality follows from the definition of ‖·‖. Taking the supremum of |f(ξ) + g(ξ)| over ξ, we now obtain

|f(ξ) + g(ξ)| ≤ |f(ξ)| + |g(ξ)| ≤ ‖f‖ + ‖g‖,

Example C.9  Let V = C([0, 1]). For f ∈ V, let

‖f‖ = sup over ξ ∈ [0, 1] of |f(ξ)|.

Since f is continuous and [0, 1] is compact, ‖f‖ is well defined and finite for every f ∈ V. It is evident that ‖·‖ meets the positivity and homogeneity conditions required of a norm. To see that it meets the triangle inequality, note that for any f and g in V, and any ξ ∈ [0, 1], we have


Example C.13  Let V = C([0, 1]). Define the metric d_p on V by

d_p(f, g) = (∫₀¹ |f(x) − g(x)|ᵖ dx)^{1/p}.

Then, d_p is generated by the p-norm on C([0, 1]), and it follows from Theorem C.14 below that d_p is a metric on C([0, 1]). □

Just as every inner product generates a norm, it is also true that every norm generates a metric:

d_∞(x, y) = sup over i ∈ {1, …, n} of |x_i − y_i|.

When p = 2, the p-metric d_p on ℝⁿ is called the Euclidean metric. As p → ∞,

we obtain the sup-norm metric d_∞:

where ‖·‖_p is the p-norm of the previous section. It follows from Theorem C.14 below that d_p is a metric on ℝⁿ. Since d_p is generated by the p-norm, it is called the p-metric on ℝⁿ. □

d_p(x, y) = ‖x − y‖_p,

Example C.12  Let V = ℝⁿ. Define the metric d_p on V by

A metric space is a vector space V with a metric d on V, and is usually denoted (V, d). Standard examples of metric spaces include the following:

C.4 Metric Spaces
C.4.1 Definitions

A metric on V is a function d: V × V → ℝ that satisfies the following conditions for all x, y, z ∈ V:

1. Positivity: d(x, y) ≥ 0, with equality iff x = y.
2. Symmetry: d(x, y) = d(y, x).
3. Triangle Inequality: d(x, z) ≤ d(x, y) + d(y, z).

(x, y) = ¼(‖x + y‖² − ‖x − y‖²)

defines an inner product from the norm. and the norm defined by this inner productmust coincide with the original norm. We leave the details as an exercise.

relationship, called the polarization identity
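The parallelogram law gives a quick way to see that some norms cannot come from any inner product. The sketch below is an aside, not part of the text: it evaluates the parallelogram gap ‖x + y‖² + ‖x − y‖² − 2‖x‖² − 2‖y‖² for two vectors in ℝ². The gap is zero for the Euclidean norm, which is generated by the inner product x · y, but nonzero for the sup-norm, so the sup-norm is not generated by any inner product.

```python
import numpy as np

# Parallelogram gap: zero for inner-product norms, generally nonzero otherwise.
def parallelogram_gap(norm, x, y):
    return norm(x + y) ** 2 + norm(x - y) ** 2 - 2 * (norm(x) ** 2 + norm(y) ** 2)

x, y = np.array([1.0, 0.0]), np.array([0.0, 2.0])
euclid = lambda v: np.sqrt(np.sum(v ** 2))
supnorm = lambda v: np.max(np.abs(v))

print("Euclidean norm gap:", parallelogram_gap(euclid, x, y))   # approximately 0
print("sup-norm gap:", parallelogram_gap(supnorm, x, y))        # nonzero (-2 here)
```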


B(x, r) = {y ∈ V | d(x, y) < r}.

Note that convergence depends on the metric d. For instance, the sequence {x_n} = {1/n} converges in the Euclidean metric to the point 0, but not so in the discrete metric. (In the discrete metric, the only convergent sequences are those that are eventually constant.) In the interests of expositional cleanliness, however, we will drop the constant reference to the underlying metric d in the sequel. Thus, for instance, we will simply say that "x_n → x" rather than "x_n → x in the metric d."

A point x ∈ V is said to be a limit point of a sequence {x_n} if there exists a subsequence {x_{k(n)}} of {x_n} such that x_{k(n)} → x. Note that x is a limit point of {x_n} if and only if for all r > 0, it is the case that there are infinitely many indices n for which d(x_n, x) < r.

A sequence {x_n} in V is said to be a Cauchy sequence if for all ε > 0, there is n(ε) such that for all m, n ≥ n(ε), we have d(x_m, x_n) < ε.

Every convergent sequence in a metric space (V, d) must obviously be a Cauchy sequence, but the converse is not always true. For instance, suppose V = (0, 1) and the metric d on V is defined by d(x, y) = |x − y| for x, y ∈ V. Then, {x_n} = {1/n} is a Cauchy sequence in V, but one that does not converge (there is no x ∈ V such that x_n → x). A metric space (V, d) in which every Cauchy sequence converges is called complete.

The open ball of radius r and center x in (V, d), denoted B(x, r), is defined by

d(xn,x) -+ 0 as n .......00.

C.4.2 Sets and Sequences in Metric SpacesLet a metric space (V, d) be given. Given a sequence {xn} in V, we will say that {x"converges to the limit x E V (written Xn -+ x) in the metric d , if it is the case that

Thus, the concept of a metric space is less restrictive than that of a normed space.

ρ(x, y) = 0  if x = y,  and  ρ(x, y) = 1  if x ≠ y.

On the other hand, there do exist metrics even on ℝⁿ which are not generated by any norm. An example of such a metric is the "discrete metric" ρ on ℝ defined by

Proof  Immediate from the properties of the norm. □

Theorem C.14  Let ‖·‖ be a norm on V. Define d by d(x, y) = ‖x − y‖, x, y ∈ V. Then, d is a metric on V.


Theorem C.19  Let F be a finite index set. If Y_φ is a closed set in V for every φ ∈ F, then so is the union of the Y_φ over φ ∈ F.

Theorem C.18  Let F be a finite index set. If X_φ is an open set in V for every φ ∈ F, then so is the intersection of the X_φ over φ ∈ F.

Theorem C.17  Let A be an arbitrary index set. If Y_α is a closed set in V for each α ∈ A, then so is the intersection of the Y_α over α ∈ A.

Theorem C.16  Let A be an arbitrary index set. If X_α is an open set in V for each α ∈ A, then so is the union of the X_α over α ∈ A.

Furthermore, a set XcV is said to be bounded if X is completely contained insome open ball around 0, that is, if there is r > 0 such that X C BCO, r).

As in IRn with the Euclidean metric, the following properties hold for arbitrarymetric spaces also. The proofs are omitted:

Theorem C.IS The set XCV is closed if and only if for all sequences {Xk} in Xsuch that {Xk} converges to some x E t. it is the case that x E X.

VThe set XcV is compact if every sequence of points in X contains a conver­

gent subsequence. That is. X is compact if for all sequences {xn} in X, there is asubsequence (X,,(k)} of Ix,,}, and a point x E X. such that

lim Xn(k) = x.k--+oo

It is easy to mimic the steps of Theorem 1.20 in Chapter I to establish that

{y ∈ V | y ∉ X}.

A set X ⊂ V is said to be open in V, or simply open, if it is the case that for any x ∈ X, there is r > 0 such that B(x, r) ⊂ X.

As with convergent sequences. it must be stressed that whether or not a subset Xof V is open depends on the particular metric being used. For instance. while not allsubsets of lR are open in the Euclidean metric, every subset X of IR is open under thediscrete metric on IR. To see this. pick any X C IR and any x E X. If r E (0, I), thenfrom the definition of the discrete metric, the open ball B(x, r) with center x andradius r consists of only the point x. Since x E X, we have shown that there existsr > 0 such that Btx ; r) C X. Since x E X was arbitrary, we have shown that X isan open subset oflR under the discrete metric.

A set XCV is said to be closed in V, or simply closed, if XC is open, where XC,the complement of X in V, that is the set defined by


The function / is said to be continuous on VI if f is continuous at each V E VI,By using virtually the same arguments as in the proof of Theorem 1.49 in Chapter I,

it can be shown that

C.4.3 Continuous Functions on Metric SpacesGiven metric spaces (VI, dl) and (V2, d2), a function f :VI -+ V2 is said to becontinuous at the point V E Vt, if it is the case that for all sequences (un) in VI withVn -+ V, it is the case that f(un) -+ /(u), Equivalently, / is continuous at V if forall sequences {vn} in VI,

The set X is bounded since ‖e_i‖ = 1 for all i. Since d(e_i, e_j) = 1 for all i ≠ j, X contains no convergent sequences, so it is (vacuously) closed. However, the sequence {e_i} is a sequence in X with no convergent subsequence, so X is not compact. □

X = {e_i | i = 1, 2, …}.

Let e_i be the element of V that contains a 1 in the i-th place, and zeros elsewhere. Note that for any i and j with i ≠ j, we have d(e_i, e_j) = 1. Let

d(x, y) = ‖x − y‖ = sup over i ∈ N of |x_i − y_i|,

and let d denote the corresponding metric:

‖x‖ = sup over i ∈ N of |x_i|,

Example C.21  As in Example C.2, let V be the space of bounded sequences in ℝ. Let ‖·‖ denote the sup-norm on V, i.e., for x = (x₁, x₂, …) ∈ V, let

C.4.3 Continuous Functions on Metric Spaces

Given metric spaces (V1, d1) and (V2, d2), a function f: V1 → V2 is said to be continuous at the point v ∈ V1 if it is the case that for all sequences {v_n} in V1 with v_n → v, it is the case that f(v_n) → f(v). Equivalently, f is continuous at v if for all sequences {v_n} in V1,

    lim_{n→∞} v_n = v implies lim_{n→∞} f(v_n) = f(v).

The function f is said to be continuous on V1 if f is continuous at each v ∈ V1. By using virtually the same arguments as in the proof of Theorem 1.49 in Chapter 1, it can be shown that

Theorem C.22 Let f: V1 → V2, where (V1, d1) and (V2, d2) are metric spaces. Then, f is continuous at v ∈ V1 if and only if for all open sets U2 ⊂ V2 such that f(v) ∈ U2, there is an open set U1 ⊂ V1 such that v ∈ U1, and f(x) ∈ U2 for all x ∈ U1.

As an immediate corollary of this result, we get the result that a function is continuous if and only if "the inverse image of every open set is open":

Corollary C.23 A function f: V1 → V2 is continuous on V1 if and only if for all open sets U2 ⊂ V2, f^{-1}(U2) = {x ∈ V1 | f(x) ∈ U2} is an open set in V1.
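To see Corollary C.23 in a concrete case (an illustration of ours, not from the text), take f(x) = x² on ℝ with the Euclidean metric. The inverse image of the open interval (1, 4) should be the open set (-2, -1) ∪ (1, 2); the sketch below confirms that the two membership tests agree on a fine grid of sample points.

    def f(x):
        return x * x

    def in_preimage(x):
        # x lies in f^{-1}((1, 4)) exactly when 1 < f(x) < 4
        return 1 < f(x) < 4

    def in_union(x):
        # the open set (-2, -1) U (1, 2)
        return (-2 < x < -1) or (1 < x < 2)

    grid = [k / 1000 for k in range(-3000, 3001)]
    assert all(in_preimage(x) == in_union(x) for x in grid)
    print("f^{-1}((1, 4)) agrees with (-2, -1) U (1, 2) on the grid")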

C.4.4 Separable Metric Spaces

A point x ∈ V is said to be a limit point of a set X ⊂ V if for all r > 0, there is z ∈ V, z ≠ x, such that

    z ∈ B(x, r) ∩ X,

that is, if the open ball B(x, r) contains a point of X different from x.

The closure in V of a set X ⊂ V is the set X together with all the limit points of X. We will denote the closure of X by cl(X). Note that we have cl(X) = X if and only if the set X is itself closed.

A set W ⊂ V is said to be dense in V if the closure of W contains V, that is, if cl(W) ⊃ V. Every set is dense in itself. However, as we have shown in Appendix B, every real number arises as the limit of some Cauchy sequence of rational numbers, so the closure of the rationals ℚ is the entire real line ℝ. Thus, a dense subset of a set X could be very "small" in relation to X.

A metric space (V, d) is said to be separable if it possesses a countable dense set, that is, there exists a countable subset W of V such that cl(W) = V. For example, the set of rationals ℚ is countable and is dense in ℝ, so ℝ is separable.

The following result gives us an equivalent way of identifying separability. It comes in handy frequently in applications.

Theorem C.24 A metric space (V, d) is separable if and only if there is a countable family of open sets {Z_n} such that for any open Z ⊂ V, it is the case that Z = ∪_{n∈N(Z)} Z_n, where N(Z) = {n | Z_n ⊂ Z}.

Proof See Royden (1968, Chapter 7, p. 130). □
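The separability of ℝ rests on the density of ℚ: every open ball around a real number contains a rational. As a purely numerical illustration (ours), the sketch below produces, for each k, a rational within 10^(-k) of the square root of 2.

    from fractions import Fraction
    import math

    x = math.sqrt(2)  # an irrational target
    for k in range(1, 8):
        q = Fraction(round(x * 10**k), 10**k)   # a rational within 10^-k of x
        assert abs(x - float(q)) < 10**(-k)
        print(f"|sqrt(2) - {q}| < 1e-{k}")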

C.4.5 Subspaces

Let (V, d) be a metric space, and let W ⊂ V. Define a metric d_W on W by d_W(x, y) = d(x, y) for x, y ∈ W. That is, d_W is just the metric d restricted to W. The space (W, d_W) is then called a (metric) subspace of (V, d).

Let (W, d_W) be a subspace of (V, d). A set X ⊂ W is said to be open in W if there exists an open set Y ⊂ V such that X = Y ∩ W. If W is itself open in V, then a subset X of W is open in W if and only if it is open in V.

For example, consider the interval W = [0, 1] with the metric d_W(x, y) = |x - y| for x, y ∈ W. Then, (W, d_W) is a subspace of the real line with the Euclidean metric. The set (1/2, 1] is open in W since (1/2, 1] = (1/2, 3/2) ∩ W and (1/2, 3/2) is open in ℝ.
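A quick computational check of this example (ours; the grid of test points is an arbitrary choice) simply compares two membership conditions: being in (1/2, 1], and being in both (1/2, 3/2) and [0, 1].

    def in_Y(x):          # Y = (1/2, 3/2), an open set in R
        return 0.5 < x < 1.5

    def in_W(x):          # W = [0, 1], the subspace
        return 0.0 <= x <= 1.0

    def in_X(x):          # X = (1/2, 1], claimed to be open in W
        return 0.5 < x <= 1.0

    grid = [k / 1000 for k in range(-500, 2001)]
    assert all((in_Y(x) and in_W(x)) == in_X(x) for x in grid)
    print("on the grid, (1/2, 1] equals (1/2, 3/2) intersected with [0, 1]")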

The following result relates separability of (V, d) to separability of the subspace (W, d_W):

Theorem C.25 Every subspace of a separable metric space is itself separable.

Proof Let (V, d) be a separable metric space, and let (W, d_W) be a subspace of (V, d). Since V is separable, there exists, by Theorem C.24, a countable collection of open sets (Z_n)_{n∈ℕ} such that every open set in V can be expressed as the union of sets from this collection. Define the family (Y_n)_{n∈ℕ} by Y_n = Z_n ∩ W. Then, Y_n is open in W for each n. Now suppose a set X is open in W. Then, there exists a set X' open in V such that X = X' ∩ W. Therefore, there exists K ⊂ ℕ such that

    X' = ∪_{k∈K} Z_k.

It follows that

    X = X' ∩ W = ∪_{k∈K} (Z_k ∩ W) = ∪_{k∈K} Y_k.

Thus, we have shown that an arbitrary open set in W can be expressed as the union of open sets drawn from the countable collection (Y_n)_{n∈ℕ}. By Theorem C.24, W is separable. □

Finally, our last result of this section relates the notions of "completeness" and "closedness" of a metric space:

Theorem C.26 If (W, d_W) is complete, then it is closed. On the other hand, if W is a closed subset of V, and V is complete, then (W, d_W) is also complete.

Proof Left as an exercise. □

C.5 Topological Spaces

C.5.1 Definitions

Unlike a metric space (V, d), which uses the metric d to determine which classes of sets are open, and which are closed or compact, a topological space takes as the defining primitive on a vector space V the notion of open sets. Formally, a topological space (V, τ) is a vector space V and a collection τ of subsets of V that satisfy

1. ∅, V ∈ τ.
2. O1, O2 ∈ τ implies O1 ∩ O2 ∈ τ.
3. For any index set A, O_α ∈ τ for α ∈ A implies ∪_{α∈A} O_α ∈ τ.

The sets in τ are called the "open sets" of V (under τ); τ itself is called a topology on V.

As we have seen in the previous subsection, if the open sets of V are identified using a metric d on V, then the collection of open sets generated by d will have the three properties required above. Thus, every metric space (V, d) gives rise to a topological space (V, τ). If a topological space (V, τ) is generated by a metric space (V, d) in this manner, then the topological space is said to be metrizable.

It is possible that two different metric spaces (V, d1) and (V, d2) give rise to the same topological space (V, τ), since two different metrics may generate the same class of open sets (see the Exercises). Thus, even if a topological space is metrizable, there may not be a unique metric space associated with it. It is also possible that a topological space is not metrizable. For instance:

Example C.27 Let V be any vector space consisting of at least two points. Define τ to be the two-point set τ = {∅, V}. Then, it is easily verified that τ meets the three conditions required of a topology (τ is called the trivial topology on V). We leave it as an exercise to the reader to verify that there cannot exist a metric d on V such that the only open sets of V under d are ∅ and V itself.¹ □

Thus, a topological space is a generalization of the concept of a metric space.

Finally, a small point. Even if a topological space is metrizable, the topology could be quite "large." For instance, under the topology generated on V by the discrete metric (the discrete topology), every point of V, and therefore every subset of V, is open.

¹ If there were such a metric, it would have to satisfy d(x, y) = 0 for all x and y in V, but this violates the condition required of a metric that d(x, y) > 0 if x ≠ y.
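On a finite ground set the three conditions above can be verified mechanically. The Python sketch below (ours) checks them by brute force and confirms that both the trivial topology of Example C.27 and the discrete topology qualify.

    from itertools import chain, combinations

    def is_topology(V, tau):
        # tau: a collection of frozensets of elements of V
        if frozenset() not in tau or frozenset(V) not in tau:
            return False                          # condition 1
        for A, B in combinations(tau, 2):
            if A & B not in tau:
                return False                      # condition 2 (pairwise suffices)
        # condition 3: on a finite ground set, check the union of every subfamily
        for k in range(1, len(tau) + 1):
            for family in combinations(tau, k):
                if frozenset(chain.from_iterable(family)) not in tau:
                    return False
        return True

    V = {1, 2, 3}
    trivial = {frozenset(), frozenset(V)}
    discrete = {frozenset(s) for k in range(len(V) + 1) for s in combinations(V, k)}
    print(is_topology(V, trivial), is_topology(V, discrete))   # True True

For an infinite V this brute-force check is unavailable, which is why the conditions are verified analytically in the text.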

C.5.2 Sets and Sequences in Topological Spaces

A sequence {x_n} in a topological space (V, τ) is said to converge to a limit x ∈ V if for all open sets O ∈ τ such that x ∈ O, there exists N such that for all n ≥ N, x_n ∈ O.

A set X ⊂ V is closed if X^c is open (i.e., if X^c ∈ τ). It is easy to show the following properties:

Theorem C.28 Let (V, τ) be a topological space. Then, ∅ and V are both closed.

Theorem C.29 Let (V, τ) be a topological space. Let (X_φ)_{φ∈F} be a collection of closed sets in V, where F is a finite index set. Then, ∪_{φ∈F} X_φ is also closed.

Theorem C.30 Let (V, τ) be a topological space. Let (X_α)_{α∈A} be a collection of closed sets in V, where A is an arbitrary index set. Then, ∩_{α∈A} X_α is also closed.

A subset W of a topological space (V, τ) is said to be compact if every open cover of W has a finite subcover. It is not very hard to see that if (V, τ) is metrizable, then a set W is compact in the sequential sense (i.e., every sequence in W contains a convergent subsequence) if and only if it is topologically compact (every open cover of W has a finite subcover). If a topological space is not metrizable, this is no longer true.²

² In a sense, this is analogous to the result that while closed and bounded sets in ℝⁿ are compact, this is not true in general metric spaces.
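Convergence in the sense just defined can behave very differently from its metric-space counterpart. The sketch below (ours) implements the definition literally on a finite example and shows that, in the trivial topology of Example C.27, a constant sequence converges to every point, so limits need not be unique.

    def converges_to(x_seq, x, tau, N_max=100):
        # {x_n} converges to x if for every open O containing x there is an N
        # with x_n in O for all n >= N; we probe only n < N_max, so this is an
        # illustrative check rather than a proof.
        for O in tau:
            if x in O:
                if not any(all(x_seq(n) in O for n in range(N, N_max))
                           for N in range(N_max)):
                    return False
        return True

    V = {"a", "b"}
    trivial = [frozenset(), frozenset(V)]

    def const_a(n):
        return "a"                     # the constant sequence a, a, a, ...

    print(converges_to(const_a, "a", trivial))   # True
    print(converges_to(const_a, "b", trivial))   # also True: limits are not unique here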

C.5.3 Continuous Functions on Topological Spaces

Let (V, τ) and (V', τ') be topological spaces, and let f: V → V'. Then, f is said to be continuous at x ∈ V if for all O' ∈ τ' such that f(x) ∈ O', it is the case that f^{-1}(O') ∈ τ. Further, f is continuous on V if for all O' ∈ τ', we have f^{-1}(O') ∈ τ.

It is easy to see that if (V, τ) and (V', τ') are metrizable spaces, then a function is continuous at x in the topological sense given above if and only if f is sequentially continuous at x (i.e., for all sequences x_n → x we have f(x_n) → f(x)). We leave it to the reader to examine if this remains true if (V, τ) and/or (V', τ') are not metrizable.

C.5.4 Bases

A base at x for a topological space (V, τ) is a collection of open sets B_x ⊂ τ such that

    (O ∈ τ, x ∈ O) implies there is B ∈ B_x such that x ∈ B ⊂ O.

In words, B_x is a base for (V, τ) at x if every open set containing x is a superset of some set in the base that also contains x.

For example, suppose (V, τ) is a metrizable topological space with metric d. Pick any x ∈ V, and define B_x by

    B_x = {B | B = B(x, r) for some r > 0},

where, of course, B(x, r) represents the open ball with center x and radius r in the metric d. Since the metric d gives rise to the topology τ, it follows from the definition of an open set in metric spaces that B_x forms a base at x. Thus, the concept of a base at x is simply the topological analogue of the metric space notion of the open balls B(x, r) around x.

A base for a topological space (V, τ) is a collection B ⊂ τ of open sets such that B is a base at x for each x ∈ V. It is easy to see that

Proposition C.31 Let B be a base for (V, τ). Then, O ∈ τ if and only if for each x ∈ O, there is B ∈ B such that x ∈ B ⊂ O.

Proof Necessity (the "only if" part) follows from the definition of a base B for (V, τ). To see sufficiency, suppose O ⊂ V is such that if x ∈ O, then there is B ∈ B with x ∈ B ⊂ O. Then, we must have

    O = ∪{B ∈ B | B ⊂ O}.

Therefore, O is the union of open sets, and must be open. That is, O ∈ τ. □

A base B is said to be a countable base if it contains countably many sets. A topological space (V, τ) is said to be first-countable, or to satisfy the first axiom of countability, if for each point x, there exists a countable base B_x at x. It is said to be second-countable, or to satisfy the second axiom of countability, if it has a countable base B.

The following result relates separability to the notion of a countable base.

Theorem C.32 Let (V, τ) be a metrizable topological space with metric d. Then,
1. (V, τ) is first-countable.
2. (V, τ) is second-countable if and only if the metric space (V, d) is separable.

Proof If at each x ∈ V, we take the open balls B(x, r) for rational r > 0, then the collection B_x of all such open balls forms a base at x which is countable. Thus, every metric space is first-countable, which proves Part 1. Part 2 is an immediate consequence of Theorem C.24. □
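The countability in Part 1 comes entirely from restricting to rational radii: inside any ball B(x, r) there is a ball B(x, q) around the same center with rational q < r. A short Python illustration of ours picks such a q for a few sample radii (the denominator 10^6 is an arbitrary choice, so the helper assumes r is not too small relative to it).

    from fractions import Fraction

    def rational_radius_inside(r, k=10**6):
        # pick a rational q with 0 < q < r, so that B(x, q) is contained in B(x, r)
        # (assumes r > 2/k)
        q = Fraction(int(r * k) - 1, k)
        assert 0 < q < r
        return q

    for r in [0.3, 1.0, 2.718281828, 1e-3]:
        print(r, "-> rational radius", rational_radius_inside(r))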

C.6 Exercises

1. Show that if a norm || · || on a vector space V is generated by an inner product (·, ·), then the polarization identity holds, i.e., show that for all x and y in V, we have

       (x, y) = (1/4)(||x + y||² - ||x - y||²).

   Hint: This follows by expanding (x ± y, x ± y) = ||x ± y||².

2. Using the polarization identity, prove the Parallelogram Law.

3. Of all the p-norms on ℝⁿ, only one is generated by an inner product, namely the Euclidean (p = 2) norm. Show that, for example, || · ||₁ is not generated by an inner product. One way to do this is to show that the Parallelogram Law is violated for some pair x and y. (A numerical check along these lines appears after Exercise 20 below.)

4. Let d1 and d2 be metrics on a vector space V. Define d1 and d2 to be equivalent if there exist constants c1 and c2 in ℝ such that for all x, y ∈ V, we have

       c1 d2(x, y) ≤ d1(x, y) ≤ c2 d2(x, y).

   Show that if d1 and d2 are equivalent metrics on V, then the following hold:

   (a) A sequence {x_n} in V converges to a limit x in (V, d1) if and only if {x_n} converges to x in (V, d2).

   (b) A set X ⊂ V is open under d1 if and only if it is open under d2.

5. Give an example of two metrics d1 and d2 on a vector space V that generate the same open sets but are not equivalent.

6. Which of the metrics induced by the p-norms || · ||_p on ℝ² are equivalent?

7. Is the equivalence of metrics a transitive relationship? That is, is it true that if d1 is equivalent to d2, and d2 is equivalent to d3, then d1 is equivalent to d3?

8. Show that the discrete metric on ℝ does, in fact, meet the three properties required of a metric.

9. Which subsets of ℝ are closed sets in the discrete metric? Which are compact?

10. Prove that under the discrete metric, (V, d) is separable if and only if V is countable.

11. Let (V1, d1) denote ℝ with the discrete metric and (V2, d2) denote ℝ with the Euclidean metric. Let F denote the class of continuous functions from V1 to V2, G the class of continuous functions from V1 to V1, and H the class of continuous functions from V2 to V2. Compare F, G, and H. Which is the largest set (in terms of containment)? Which is the second largest?

12. How does the class of continuous functions on ℝⁿ under the sup-norm metric compare to those on ℝⁿ under the Euclidean metric? (Assume in both cases that the functions map ℝⁿ into ℝ, and that the range space ℝ is given the Euclidean metric.)

13. (The p-adic metric) Fix a prime p. Any rational r can be written as r = p⁻ⁿ(a/b), where n, a, and b are integers and p does not divide a or b. Let |r|_p = pⁿ. This, the p-adic "valuation," is not a norm. For r, q rational, let the p-adic metric be defined by d(r, q) = |r - q|_p. Show that the p-adic valuation obeys |q + r|_p ≤ max{|q|_p, |r|_p}. Use this to show that the p-adic metric satisfies the triangle inequality. Verify the other requirements for a metric. Finally, show that if r ∈ B(q, ε) then B(q, ε) ⊂ B(r, ε). It follows that B(q, ε) = B(r, ε), so every point in an open ball around r in the p-adic metric is also the center point of the ball! (A computational sketch of this valuation appears after Exercise 20 below.)

14. Let V be a vector space. Suppose V is given the discrete topology. Which subsets of V are compact?

15. Let V be a vector space. Suppose V is given the discrete topology. Which functions f: V → ℝ are continuous on V, if ℝ is given the Euclidean topology?

16. Let (X1, d1) and (X2, d2) be metric spaces. Let Z = X1 × X2. Define a metric (the product metric) on Z by

    Show that the product metric is, in fact, a metric.

17. Let (V, τ) and (V', τ') be topological spaces, and let W = V × V'. The product topology τ_W on W is defined as the topology that takes as open the sets of the form

        O × O',   O ∈ τ, O' ∈ τ',

    and completes the topology by taking finite intersections and arbitrary unions of all such sets. Show that if (V, τ) and (V', τ') are topological spaces that are metrizable (with metrics d and d', say), then the product metric defines the product topology.

18. A set X ⊂ V in a topological space (V, τ) is said to be sequentially compact if every sequence of points in X contains a subsequence converging to a point of X.

    (a) Show that if (V, τ) is a metrizable space (say, with metric d), then X is sequentially compact if and only if it is compact, that is, if and only if every open cover of X contains a finite subcover.

    (b) Give an example of a nonmetrizable topological space (V, τ) in which there is a sequentially compact set that is not compact.

19. Show that if (V, τ) and (V', τ') are metrizable topological spaces, then a function f: V → V' is continuous at a point x in the topological sense if and only if f is sequentially continuous at x, i.e., if and only if for all sequences x_n → x, we have f(x_n) → f(x). Is this true if (V, τ) and/or (V', τ') are not metrizable?

20. Let V = (0, 1]. Let τ be the collection of sets consisting of ∅, V, and all subsets of V whose complements are finite sets. Show that τ is a topology for V. Show also that (V, τ) is not first-countable.
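Related to Exercises 1-3: the Parallelogram Law states that ||x + y||² + ||x - y||² = 2||x||² + 2||y||² whenever the norm comes from an inner product. The Python fragment below (an illustration of ours) evaluates the gap between the two sides and shows that it vanishes for the Euclidean norm but not for the 1-norm at x = (1, 0), y = (0, 1), which is one admissible choice for the pair requested in Exercise 3.

    def norm_p(x, p):
        return sum(abs(xi) ** p for xi in x) ** (1 / p)

    def parallelogram_gap(x, y, p):
        # ||x + y||^2 + ||x - y||^2 - 2||x||^2 - 2||y||^2  (zero iff the law holds)
        s = [a + b for a, b in zip(x, y)]
        d = [a - b for a, b in zip(x, y)]
        return (norm_p(s, p) ** 2 + norm_p(d, p) ** 2
                - 2 * norm_p(x, p) ** 2 - 2 * norm_p(y, p) ** 2)

    x, y = (1.0, 0.0), (0.0, 1.0)
    print(parallelogram_gap(x, y, 2))   # approximately 0: the Euclidean norm obeys the law
    print(parallelogram_gap(x, y, 1))   # 4.0: the 1-norm violates it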
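Related to Exercise 13: the sketch below (ours) implements the p-adic valuation by stripping factors of p from the numerator and denominator, and spot-checks the inequality |q + r|_p ≤ max{|q|_p, |r|_p} over a grid of small rationals with p = 3. Exact Fraction arithmetic keeps the check free of rounding issues.

    from fractions import Fraction

    def p_adic_abs(r, p):
        # write r = p^(-n) * (a/b) with p dividing neither a nor b; return |r|_p = p^n
        r = Fraction(r)
        if r == 0:
            return Fraction(0)
        num, den, n = r.numerator, r.denominator, 0
        while num % p == 0:
            num //= p
            n -= 1              # factors of p in the numerator make |r|_p small
        while den % p == 0:
            den //= p
            n += 1              # factors of p in the denominator make |r|_p large
        return Fraction(p) ** n

    p = 3
    samples = [Fraction(a, b) for a in range(-6, 7) for b in range(1, 7) if a != 0]
    for q in samples:
        for r in samples:
            if q + r != 0:
                assert p_adic_abs(q + r, p) <= max(p_adic_abs(q, p), p_adic_abs(r, p))
    print("|q + r|_p <= max(|q|_p, |r|_p) holds on all sampled pairs")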

Bibliography

Apostol, T. (1967) Calculus, Second Edition, Blaisdell, Waltham, Mass.
Arrow, K.J. and A.C. Enthoven (1961) Quasi-Concave Programming, Econometrica 29(4), 779-800.
Arrow, K.J., L. Hurwicz, and H. Uzawa (1958) (Eds.) Studies in Linear and Non-Linear Programming, Stanford University Press, Stanford, Calif.
Bank, B., et al. (1983) Nonlinear Parametric Optimization, Birkhäuser-Verlag, Boston and Basel.
Bartle, R.G. (1964) Elements of Real Analysis, Wiley, New York.
Berge, C. (1963) Topological Spaces, Macmillan, New York.
Billingsley, P. (1978) Probability and Measure, Wiley, New York.
Birkhoff, G. (1967) Lattice Theory, American Mathematical Society, Providence, R.I.
Blackwell, D. (1964) Memoryless Strategies in Finite-Stage Dynamic Programming, Annals of Mathematical Statistics 35, 863-865.
Blackwell, D. (1965) Discounted Dynamic Programming, Annals of Mathematical Statistics 36, 225-235.
Debreu, G. (1952) Definite and Semidefinite Quadratic Forms, Econometrica 20(2), 295-300.
Debreu, G. (1959) Theory of Value, Wiley, New York.
Fenchel, W. (1953) Convex Cones, Sets, and Functions, mimeograph, Princeton, N.J.
Fudenberg, D. and J. Tirole (1990) Game Theory, MIT Press, Cambridge, Mass.
Halmos, P.R. (1961) Naive Set Theory, Van Nostrand, Princeton, N.J.
Hewitt, E. and K. Stromberg (1965) Real and Abstract Analysis, Springer-Verlag, Berlin and New York.
Hildenbrand, W. (1974) Core and Equilibria of a Large Economy, Princeton University Press, Princeton, N.J.
Hurwicz, L. (1958) Programming in Linear Spaces, in Studies in Linear and Non-Linear Programming (K. Arrow, L. Hurwicz, and H. Uzawa, Eds.), Stanford University Press, Stanford, Calif.
Johnston, J. (1984) Econometric Methods, McGraw-Hill International, Singapore.
Milgrom, P. and J. Roberts (1990) Rationalizability, Learning, and Equilibrium in Games with Strategic Complementarities, Econometrica 58(6), 1255-1278.
Munkres, J.R. (1964) Elementary Linear Algebra, Addison-Wesley, Reading, Mass.
Munkres, J.R. (1975) Topology: A First Course, Prentice-Hall International, Englewood Cliffs, N.J.
Parthasarathy, T. (1973) Discounted, Positive, and Noncooperative Stochastic Games, International Journal of Game Theory 2, 25-37.
Royden, H.L. (1968) Real Analysis, Macmillan, New York.
Rudin, W. (1976) Principles of Mathematical Analysis, Third Edition, McGraw-Hill International Editions, Singapore.
Samuelson, P.A. (1947) Foundations of Economic Analysis, Harvard University Press, Cambridge, Mass.
Smart, D. (1974) Fixed-Point Theorems, Cambridge University Press, Cambridge and New York.
Strichartz, R. (1982) The Way of Analysis, Vols. I and II, mimeograph, Department of Mathematics, Cornell University, Ithaca, N.Y.
Takayama, A. (1985) Mathematical Economics, Second Edition, Cambridge University Press, Cambridge and New York.
Tarski, A. (1955) A Lattice-Theoretic Fixed-Point Theorem and Its Applications, Pacific Journal of Mathematics 5, 285-308.
Topkis, D. (1978) Minimizing a Submodular Function on a Lattice, Operations Research 26(2), 305-321.
Topkis, D. (1995) Comparative Statics of the Firm, Journal of Economic Theory 67(2), 370-401.
Vives, X. (1990) Nash Equilibrium with Strategic Complementarities, Journal of Mathematical Economics 19(3), 305-321.
