Linear Optimization


Dr R.A. Pendavingh

August 26, 2015


Contents

Introduction
1. Optimization
2. Some preliminaries
3. Linear equations
Exercises

Chapter 1. Linear inequalities
1. Convex sets
2. Linear inequalities
3. Caratheodory’s Theorem
Exercises

Chapter 2. Linear optimization duality
1. The duality theorem
2. Optimal solutions
3. Constructing the dual
Exercises

Chapter 3. Polyhedra
1. Polyhedra and polytopes
2. Faces, vertices and facets
3. Polyhedral cones
Exercises

Chapter 4. The simplex algorithm
1. Tableaux and pivoting
2. Cycling and Bland’s rule
3. The revised simplex method
Exercises

Chapter 5. Integrality
1. Linear diophantine equations
2. Lattices
3. Lattices and convex bodies
Exercises

Chapter 6. Integer linear optimization
1. Integer linear optimization
2. Matching
3. Branch & bound
Exercises

Chapter 7. Convexity
1. Convex and concave functions
2. Positive definite and semidefinite matrices
3. Examples of convex functions
Exercises

Chapter 8. Convex optimization
1. Convex optimization with linear constraints
2. Differentiable convex optimization
3. Conic convex optimization
Exercises


Introduction

1. Optimization

1.1. Optimization. Optimization is choosing, from a set of alternative possibilities, the one that is ‘best’. In general terms, optimization is the following.

Given: a set S and a function f : S → R.
Find: an x∗ ∈ S such that f(x∗) ≥ f(x) for all x ∈ S.

Example. Consider a cylindrical can of height h with a circular bottom lid of radius r. The shape of the can is completely determined by h and r, so designing a can amounts to choosing h, r ≥ 0. The volume of the can is πhr², and its surface area is 2πr² + 2πrh. So to design a can of volume at least 1, and with minimal surface area, we must solve

min{2πr² + 2πrh : πhr² ≥ 1, h ≥ 0, r ≥ 0}.

In this optimization problem, the set of feasible possibilities is

S := {(h, r) ∈ R2 : πhr² ≥ 1, h ≥ 0, r ≥ 0}.

Our objective is to minimize the function f : (h, r) ↦ 2πr² + 2πrh.

In optimization the set S is usually referred to as the feasible set, an x ∈ S is a feasible point, and the function f is the objective function. The x∗ ∈ S we are looking for is an optimal solution; this x∗ is such that f(x∗) = inf{f(x) : x ∈ S}, and we also say that x∗ attains the optimum.

In this course, we will mostly consider problems with finitely many design variables, restricted by finitely many constraints. Then the feasible set S can be viewed geometrically as the subset of points x ∈ Rn that satisfy various restrictions, such as inequalities g(x) ≤ 0, where g : Rn → R, equations h(x) = 0, where h : Rn → R, and possibly the requirement that x ∈ Zn.

Example: a transportation problem. We have k factories producing certain goods and l cities needing these goods. The i-th factory has a production capacity of ci ∈ R units, and the j-th city has a demand of dj ∈ R units. The cost of transporting one unit of these goods from factory i to city j is pij ∈ R. We need to choose xij ∈ R, the amount to be moved from the i-th factory to the j-th city, for each i and j. We must choose these numbers such that

• xij ≥ 0 for i = 1, . . . , k and j = 1, . . . , l; the goods are transported from factories to cities, and not the other way around.
• ∑_{j=1}^{l} xij ≤ ci for i = 1, . . . , k; the total amount leaving a factory does not exceed its production capacity.
• ∑_{i=1}^{k} xij ≥ dj for j = 1, . . . , l; the demand for each city is met.
• ∑_{i=1}^{k} ∑_{j=1}^{l} pij xij , the total cost of transportation, is as small as possible.


This is an optimization problem in Rkl, with objective function f : x ↦ ∑_{i=1}^{k} ∑_{j=1}^{l} pij xij , and with feasible set

(1) S := {x ∈ Rkl : ∑_{j=1}^{l} xij ≤ ci, i = 1, . . . , k; ∑_{i=1}^{k} xij ≥ dj , j = 1, . . . , l; xij ≥ 0, i = 1, . . . , k, j = 1, . . . , l}.
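To see how such a problem is handed to a solver in practice, here is a minimal computational sketch (an addition to the text; it assumes NumPy and SciPy are available, neither of which is mentioned in the notes). The data for 2 factories and 3 cities are the ones of Exercise (3) below.

```python
# A minimal sketch of the transportation problem as a linear optimization
# problem, solved with SciPy's linprog (assumption: SciPy is installed).
import numpy as np
from scipy.optimize import linprog

p = np.array([[3.0, 12.0, 4.0],    # p[i][j]: cost of one unit from factory i to city j
              [6.0,  1.0, 2.0]])
c_cap = np.array([5.0, 2.0])       # production capacities c_i
d_dem = np.array([1.0, 3.0, 2.0])  # demands d_j
k, l = p.shape

# Variables x_ij are flattened row by row: x = (x_11, ..., x_1l, x_21, ..., x_kl).
# Capacity constraints: sum_j x_ij <= c_i.
A_cap = np.zeros((k, k * l))
for i in range(k):
    A_cap[i, i * l:(i + 1) * l] = 1.0
# Demand constraints: sum_i x_ij >= d_j, written as -sum_i x_ij <= -d_j.
A_dem = np.zeros((l, k * l))
for j in range(l):
    A_dem[j, j::l] = -1.0

A_ub = np.vstack([A_cap, A_dem])
b_ub = np.concatenate([c_cap, -d_dem])

res = linprog(c=p.ravel(), A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
print(res.x.reshape(k, l))   # optimal transport plan
print(res.fun)               # minimal total cost
```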

1.2. Linear, convex, and integer linear optimization. A linear optimization problem is an optimization problem whose objective is a linear function and whose constraints are linear equations and inequalities. The above transportation problem is a linear optimization problem. Integer linear optimization is linear optimization, but with the additional constraint that the solution be integral. This is a hard problem class, but one with many applications. The most common solution method is an application of linear optimization. Convex optimization is a broad class of problems containing linear optimization, but also quadratic and semidefinite optimization.

1.3. Applications. Whatever people do, they will at some point ask whether there is an easier way to attain the same goals, or whether a better result can be reached with the same effort. With mathematical optimization one can answer some of these questions. In this section, we outline some typical applications of the theory taught in this course.

The classical application of linear optimization is to find the most economical use of resources; the simplest example being the diet problem, where one asks for the cheapest combination of foods that make up a diet containing enough calories, minerals, vitamins, etc. The above transportation problem is of intermediate complexity: not only a combination of sources of stuff is sought for each city, but also a way to distribute the stuff produced in each factory over the cities. Slightly more complicated is the transshipment problem, where cargo has to be moved around the world by a combination of sea lines, and it is possible to move cargo from one ship to another. A somewhat similar problem type is to plan the flow of goods in a production system. For example, a chemical production system, where there are a number of processes, each taking a certain combination of energy, manpower and chemical substances in fixed ratios, and producing new chemical substances in fixed ratios which may be used again as input for other processes; substances being bought in the beginning and sold in the end (where waste disposal is ‘selling’ at a negative price).

Linear optimization problems can be solved on a very large scale, using efficient implementations of the method explained in this course. Problems with ten thousand variables and constraints are solved in about a minute on a 1 GHz machine with sufficient memory.

The integer linear optimization model is more suitable for all the variants of the above problems where the amounts in the solution need to be integral: when the food is canned, the stuff and cargo is transported in batches, and the intermediate products are produced in lots. Especially so when the amounts are small, and mere rounding of the optimal linear optimization solution is not realistic. In integer linear optimization, we can also define so-called binary variables x ∈ {0, 1} (as this is equivalent with ‘0 ≤ x ≤ 1 and x ∈ Z’). Such a variable may represent an all-or-nothing choice to be made: assign this person to this job or not, schedule this task after this other task or vice versa, let this piece of road be a part of the route or not, etc. With linear constraints one can then place logical requirements on the combination of choices to be made: at least someone has to do this job, you cannot plan tasks A after B, B after C, and C after A, etc. So integer linear optimization has many applications in planning, rostering, scheduling, routing, and the like.


An efficient method for solving integer linear optimization problems in general is not known. The existing solution method described in this course proceeds by solving many linear optimization problems to solve one integer linear optimization problem. Since we can solve linear optimization very efficiently, integer linear optimization problems can be solved on a moderate scale with this method. Truly fast solution methods have been developed only for specific integer linear optimization problems. They require more advanced methods and insights than discussed in this course.

Convex optimization traditionally has applications in statistical estimation, like linear regression, where one wants to estimate the parameters of a formula so that it fits the observations best. Some more general curve fitting problems also give rise to convex optimization problems.

The field of convex optimization has been going through important changes since the discovery of the interior point method (which is not covered in this course), an efficient method for solving certain conic convex optimization problems. This method cleared the way for new applications, like robust linear optimization. This is an extension of linear optimization, allowing us to take uncertainty in the coefficients defining the constraints and the objective into account, and to find a solution that is feasible and optimal for a worst-case realization of the coefficients.

2. Some preliminaries

2.1. Solving optimization problems. In optimization, we want to compute

(2) min{f(x) : x ∈ S}

given a set S and a function f : S → R. As a mathematical expression, (2) is just the minimum of the set of reals {f(x) : x ∈ S}. As a problem, (2) can have one of several possible outcomes.

• S = ∅; then we say that (2) is infeasible.
• inf{f(x) : x ∈ S} = −∞; then (2) is unbounded.
• inf{f(x) : x ∈ S} > −∞, but min{f(x) : x ∈ S} does not exist.
• min{f(x) : x ∈ S} exists.

Solving an optimization problem is deciding which of these possibilities occurs, and in case the optimum exists, finding one optimal solution. On several occasions, we shall use the following theorem to show that an optimal solution indeed exists.

Theorem 1 (Weierstrass). If S ⊆ Rn is a nonempty compact set and f : S → R is a continuous function, then min{f(x) : x ∈ S} exists.

Recall that a subset S ⊆ Rn is compact if and only if it is both closed and bounded.
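The following small sketch (an addition, not part of the notes; it assumes SciPy, whose linprog reports status 0 for an optimum found, 2 for an infeasible problem and 3 for an unbounded one) illustrates three of these outcomes on one-variable linear problems. For linear problems a finite infimum is in fact always attained, so the third outcome above does not occur there.

```python
# Illustrating infeasible, unbounded and optimal outcomes with SciPy's linprog.
from scipy.optimize import linprog

# Infeasible: x >= 1 and x <= 0 cannot both hold.
print(linprog(c=[1.0], A_ub=[[-1.0], [1.0]], b_ub=[-1.0, 0.0],
              bounds=(None, None)).status)   # 2

# Unbounded: minimize x subject to x <= 0 with no lower bound.
print(linprog(c=[1.0], A_ub=[[1.0]], b_ub=[0.0],
              bounds=(None, None)).status)   # 3

# Optimal: minimize x subject to x >= 1.
print(linprog(c=[1.0], A_ub=[[-1.0]], b_ub=[-1.0],
              bounds=(None, None)).status)   # 0
```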

2.2. Vectors and matrices. An m × n matrix has m rows and n columns. If A is an m × n matrix, the entry in the i-th row and the j-th column is aij :

(3) A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}.

As a rule, we shall use a capital letter for a matrix and the corresponding small letter for its entries. We may also refer to the entry of A in the i-th row and the j-th column as (A)ij . It is convenient to regard vectors as matrices with either one row or one column; thus we distinguish between row vectors and column vectors. When multiplying matrices we must observe that the


dimensions of the matrices are appropriate: an m × n matrix A and an n × k matrix B may be multiplied, the product AB being an m × k matrix. A column vector

(4) x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix},

being an n × 1 matrix, may be multiplied on the left with any m × n matrix A; the result Ax is a column vector in Rm:

(5) Ax = a1x1 + · · · + anxn.

Here the column vectors a1, . . . , an ∈ Rm are the columns of A: aj := \begin{pmatrix} a_{1j} \\ \vdots \\ a_{mj} \end{pmatrix} for j = 1, . . . , n.

A row vector

(6) y = (y1, . . . , ym),

being a 1 × m matrix, may be multiplied on the right with any m × n matrix B; the result yB is a row vector in Rn:

(7) yB = y1b1 + · · · + ymbm.

Here the row vectors b1, . . . , bm ∈ Rn are the rows of B: bi = (bi1, . . . , bin) for i = 1, . . . ,m. The product of a row vector a and a column vector b in Rn is either the 1 × 1 matrix

(8) ab = a1b1 + · · · + anbn,

which is the inner product of a and b, or the n × n matrix ba with (ba)ij = biaj , depending on the order of multiplication. The transpose At of an m × n matrix A is the n × m matrix with entries (At)ij = aji. Thus transposition turns row vectors into column vectors and vice versa. When A and B are two n × m matrices, then A ≤ B means aij ≤ bij for all i, j. In particular, if a, b ∈ Rn are both column vectors or both row vectors, then a ≤ b means ai ≤ bi for all i. The zero 0 is a vector or matrix with all entries 0, of appropriate dimensions. So if a ∈ Rn then the 0 in a ≤ 0 is a vector in Rn, etc. Finally, for a vector x ∈ Rn, the Euclidean norm or length of x is ‖x‖ := √(x1² + · · · + xn²). The n-dimensional ball with center c ∈ Rn and radius r is Bn(c, r) := {x ∈ Rn : ‖x − c‖ ≤ r}.
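These conventions translate directly into NumPy, as the following added snippet (not part of the notes; it assumes NumPy) shows: a column vector is an n × 1 array, a row vector a 1 × n array, and the two orders of multiplication give the inner and the outer product.

```python
# Row vs. column vectors, Ax as a combination of columns, yB as a combination
# of rows, inner product ab and outer product ba (illustrative only).
import numpy as np

A = np.array([[1, 3], [2, 7], [3, 10]])    # a 3 x 2 matrix
x = np.array([[4], [-1]])                  # a column vector (2 x 1 matrix)
y = np.array([[1, 1, -1]])                 # a row vector (1 x 3 matrix)

print(A @ x)        # Ax = 4*(first column) + (-1)*(second column)
print(y @ A)        # yA = 1*(row 1) + 1*(row 2) - 1*(row 3)

a = np.array([[1, 2, 3]])                  # row vector a
b = np.array([[4], [5], [6]])              # column vector b
print(a @ b)        # 1 x 1 matrix: the inner product ab
print(b @ a)        # 3 x 3 matrix ba with (ba)_ij = b_i * a_j
print(A.T)          # the transpose A^t
```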

2.3. Standard forms of linear optimization problems. With the above notation, we have a compact way of writing down linear inequalities and linear optimization problems. For example we can write ax ≤ b, where a is a row vector, x is a column vector and b is a scalar, as shorthand for the linear inequality a1x1 + · · · + anxn ≤ b. If A is an m × n matrix and b ∈ Rm is a column vector, then Ax ≤ b is a system of m linear inequalities. A linear optimization problem is any problem of the form

(9) min or max{zx : Px ≤ u, Qx ≥ v, Rx = w, x ∈ Rn},

where P, Q, R, u, v, w, z are given matrices and vectors of appropriate dimensions. It is not hard to see that if we can solve the seemingly easier problem

(10) max{cx : Ax ≤ b, x ∈ Rn},

given any A, b, c, then we can also solve the general problem. Thus when proving theorems and giving algorithms we will focus on (10) or other simple problems equivalent to (9) rather than (9) itself.
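For instance (an added illustration, not part of the original text), one way to pass from (9) to the form (10) is: a minimization min{zx : . . .} has the same optimal solutions as the maximization of (−z)x, with the optimal values differing only in sign; each constraint qx ≥ vi can be rewritten as (−q)x ≤ −vi; and each equation rx = wi can be replaced by the pair rx ≤ wi and (−r)x ≤ −wi. Stacking all resulting ≤-inequalities into a single system Ax ≤ b yields a problem of the form (10) with the same feasible set; Exercise (4) below asks for the details.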


2.4. Theorems of the alternative. Suppose we want to prove that a feasible point x∗ ∈ Rn of max{cx : Ax ≤ b, x ∈ Rn} is optimal; then we must show that

(11) max{cx : Ax ≤ b, x ∈ Rn} ≤ d,

where d := cx∗. But (11) is equivalent to:

(12) there is no x ∈ Rn such that Ax ≤ b and cx > d.

How to prove a negative statement like this? There is a way to show that a system of linear equations has no solution: if no solution can be found by Gaussian elimination, then there is none. We will show that the nonexistence of a solution to a given system of linear equations is in fact equivalent to the existence of a solution to a related system of linear equations: this is Fredholm’s Alternative (Theorem 3). So we can not only prove that a system of linear equations has no solution, but we can certify it by showing the solution to the other system. There is a similar theorem for systems of linear inequalities, Farkas’ Lemma (Theorem 10). This theorem is the key to understanding linear optimization problems. In Chapter 6, we prove another theorem of this type, about the existence of an integral solution to a system of linear equations (Theorem 38). Such theorems, stating that either one system has a solution or another system has a solution, but not both, are so-called theorems of the alternative.

3. Linear equations

3.1. Gaussian elimination. A row operation on a matrix C is any one of the following:

(1) exchanging two rows,
(2) multiplying a row by a nonzero scalar, or
(3) adding a scalar multiple of a row to another row.

We denote that C′ can be obtained from C by a sequence of row operations by C ∼ C′ — it is clear that ∼ is an equivalence relation on matrices. Note that

(13) [I  C] ∼ [Y  C′] if and only if C′ = Y C and det(Y) ≠ 0.

It follows that if C ∼ C′, then there exists a nonsingular square matrix Y such that C′ = Y C. Such a matrix Y can be constructed by applying the row operations that change C into C′ to [I  C]. Conversely, the effect of multiplying C on the left by any nonsingular matrix Y can be obtained by a series of row operations.

A matrix C is in row echelon form if either

(1) C = [0  D], where D is in row echelon form, or
(2) C = \begin{pmatrix} 1 & ∗ \\ 0 & D \end{pmatrix}, where D is in row echelon form.

Note that it is easy to recursively find a solution to Cx = d if [C  d] is in row echelon form.

Lemma 2. Let C be a matrix. There is a matrix C′ with C ∼ C′ such that C′ is in row echelon form.

Proof. Let C be an n × m matrix. We prove the lemma by induction on the number of rows plus the number of columns n + m, the case where n = 1 or m = 1 being trivial. If the first column of C has only zero entries, then we may write C = [0  D]. By induction, there is a D′ such that D ∼ D′ and D′ is in row echelon form, hence C = [0  D] ∼ [0  D′] =: C′, with C′ in row echelon form. If, on the other hand, the first column of C has at least one nonzero entry, then by exchanging two rows (if necessary) we obtain a matrix in which the entry in the first row and the first column is nonzero. By multiplying the first row by a suitable number,


we get a matrix in which the top left entry is 1, and after subtracting a suitable multiple of the first row from each other row we see that

(14) C ∼ \begin{pmatrix} 1 & ∗ \\ 0 & D \end{pmatrix}.

Since, by induction, D ∼ D′ with D′ in row echelon form, we have

(15) C ∼ \begin{pmatrix} 1 & ∗ \\ 0 & D \end{pmatrix} ∼ \begin{pmatrix} 1 & ∗ \\ 0 & D′ \end{pmatrix} =: C′,

where C′ is in row echelon form. □

The Gaussian elimination method to solve the matrix equation Ax = b is to write the coefficients in a matrix [A  b] and use that

(1) if [A  b] ∼ [A′  b′], then Ax = b ⇔ A′x = b′, and
(2) if [A′  b′] is in row echelon form, either it has a row of the form [0  1], or a solution to A′x = b′ is easily determined.

Thus, to either find a solution to Ax = b, or to find out that there is no solution, it suffices to find a matrix [A′  b′] in row echelon form such that [A  b] ∼ [A′  b′] — a problem that can be solved by following the steps in the proof of Lemma 2.
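The following rough sketch (an addition to the notes; it uses exact arithmetic via Python's Fraction so that no rounding occurs) follows the steps of the proof of Lemma 2 to bring a matrix into row echelon form by row operations.

```python
# Bring a matrix into row echelon form by the row operations (1)-(3) above.
from fractions import Fraction

def row_echelon(M):
    """Return a row echelon form of the matrix M (given as a list of rows)."""
    M = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        # Find a row with a nonzero entry in column c, at or below row r.
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue                          # column of zeros: move on
        M[r], M[pivot] = M[pivot], M[r]       # (1) exchange two rows
        M[r] = [v / M[r][c] for v in M[r]]    # (2) scale so the pivot becomes 1
        for i in range(r + 1, rows):          # (3) subtract multiples of the pivot row
            M[i] = [vi - M[i][c] * vr for vi, vr in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return M

# The matrix [A | b] from the worked example in Section 3.2 below.
print(row_echelon([[1, 3, 1], [2, 7, 1], [3, 10, 2]]))
```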

3.2. Fredholm’s Alternative. We prove the fundamental theorem of linear algebra, also known as Fredholm’s Alternative.

Theorem 3 (Fredholm, 1903; Gauss, 1809). If A is an m × n real matrix and b ∈ Rm, then exactly one of the following is true:

(1) there is a column vector x ∈ Rn such that Ax = b, or
(2) there is a row vector y ∈ Rm such that yA = 0 and yb = 1.

Proof. Suppose (1) and (2) are both true. Choose x, y such that Ax = b, yA = 0, and yb = 1. Then

(16) 0 = 0x = (yA)x = y(Ax) = yb = 1,

a contradiction. Thus, at most one of (1) and (2) holds. We next show that at least one of (1) and (2) is true. Consider the matrix [A  b]. There is a matrix [A′  b′] ∼ [A  b] such that [A′  b′] is in row echelon form. Since [A′  b′] ∼ [A  b], there is a square matrix Y such that [A′  b′] = Y [A  b], i.e. such that A′ = Y A and b′ = Y b. As [A′  b′] is in row echelon form, either

(1′) A′x = b′ for some x, or
(2′) the i-th row of [A′  b′] is of the form [0  1], for some i ∈ {1, . . . ,m}.

In the former case (1) holds, since A′x = b′ implies Ax = b. In the latter case (2) is true: take y equal to the i-th row of Y ; then yA is the i-th row of Y A = A′, which is 0, and yb equals the i-th entry of Y b = b′, which is 1. □

Example. Consider the matrix A := \begin{pmatrix} 1 & 3 \\ 2 & 7 \\ 3 & 10 \end{pmatrix} and the vectors b := \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}, b′ := \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.

To find a solution x ∈ R2 of Ax = b we do Gaussian elimination on [A  b] and find that

(17) [A  b] = \begin{pmatrix} 1 & 3 & 1 \\ 2 & 7 & 1 \\ 3 & 10 & 2 \end{pmatrix} ∼ \begin{pmatrix} 1 & 3 & 1 \\ 0 & 1 & −1 \\ 0 & 0 & 0 \end{pmatrix}.

It is easy to read off the solution x = \begin{pmatrix} 4 \\ −1 \end{pmatrix} from the coefficient matrix on the right. Seeking a solution x ∈ R2 of Ax = b′, we find that

(18) [A  b′] = \begin{pmatrix} 1 & 3 & 1 \\ 2 & 7 & 1 \\ 3 & 10 & 1 \end{pmatrix} ∼ \begin{pmatrix} 1 & 3 & 1 \\ 0 & 1 & −1 \\ 0 & 0 & 1 \end{pmatrix}.

The third row of the matrix on the right is of the form [0  1], thus there is no x such that Ax = b′. To find a certificate y ∈ R3 for this fact as in Fredholm’s Alternative, we compute

(19) [I  A  b′] = \begin{pmatrix} 1 & 0 & 0 & 1 & 3 & 1 \\ 0 & 1 & 0 & 2 & 7 & 1 \\ 0 & 0 & 1 & 3 & 10 & 1 \end{pmatrix} ∼ \begin{pmatrix} 1 & 0 & 0 & 1 & 3 & 1 \\ −2 & 1 & 0 & 0 & 1 & −1 \\ 1 & 1 & −1 & 0 & 0 & 1 \end{pmatrix}.

From the third row of the coefficient matrix on the right, we read off y = (1, 1, −1). One easily verifies that yA = 0 and yb′ = 1.
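For completeness, the certificate can be checked numerically (an added snippet assuming NumPy; it merely re-verifies the computations above).

```python
# Check that x = (4, -1) solves Ax = b, and that y = (1, 1, -1) certifies that
# Ax = b' has no solution, since yA = 0 while yb' = 1.
import numpy as np

A  = np.array([[1, 3], [2, 7], [3, 10]])
b  = np.array([1, 1, 2])
bp = np.array([1, 1, 1])

x = np.array([4, -1])
y = np.array([1, 1, -1])

print(A @ x)     # [1 1 2] = b
print(y @ A)     # [0 0]
print(y @ bp)    # 1
```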

3.3. Linear equations. Consider the system

(20)
a11x1 + · · · + a1nxn = b1
a21x1 + · · · + a2nxn = b2
⋮
am1x1 + · · · + amnxn = bm

of m linear equations in n variables x1, . . . , xn. We say that the equation

(21) c1x1 + · · · + cnxn = d

is a linear combination of the rows of (20) if there exist y1, . . . , ym ∈ R such that

(22)
y1 × (a11x1 + · · · + a1nxn = b1)
y2 × (a21x1 + · · · + a2nxn = b2)
⋮
ym × (am1x1 + · · · + amnxn = bm)
—————————————————————— +
c1x1 + · · · + cnxn = d.

Writing (20) as a matrix equation Ax = b, we see that Theorem 3 asserts that either

(1) the system of linear equations (20) has a solution x1, . . . , xn, or
(2) the equation

(23) 0x1 + · · · + 0xn = 1

is a linear combination of rows of (20),

but not both. Since any solution x of (20) will satisfy any linear combination cx = d of the rows of (20), and 0x = 1 has no solutions, it is clear that the two statements cannot both be true; the more interesting fact is that at least one of the statements is true.


3.4. Linear spaces. Recall from linear algebra that z ∈ Rm is a linear combination of vectors a1, . . . , an ∈ Rm if there are scalars λ1, . . . , λn ∈ R such that

(24) z = λ1a1 + · · · + λnan,

and that the linear hull of vectors a1, . . . , an is the set of all linear combinations of a1, . . . , an:

(25) lin.hull {a1, . . . , an} := {λ1a1 + · · · + λnan : λ1, . . . , λn ∈ R}.

The set of all vectors orthogonal to a vector y ∈ Rm is

(26) Hy := {x ∈ Rm : yx = 0}.

A set of points H ⊆ Rm is called a linear hyperplane if H = Hy for some nonzero vector y. Let A be the matrix with columns a1, . . . , an. Theorem 3 is equivalent to: either

(1) b ∈ lin.hull {a1, . . . , an}, or
(2) a1, . . . , an ∈ H and b ∉ H for some linear hyperplane H,

but not both. Again, it is easy to see that the two statements exclude each other, since

(27) a1, . . . , an ∈ H ⇒ lin.hull {a1, . . . , an} ⊆ H

for any linear hyperplane H.

Exercises

(1) Consider the linear optimization problem
max{−x1 + 2x2 + x3 : 2x1 + x2 − x3 ≤ −2, −x1 + 4x2 ≤ 3, x2 − x3 ≤ 0, x ∈ R3}.
(a) Rewrite this problem to a problem of the form max{cx : Ax ≤ b, x ∈ Rn}. Give n, A, b, c.
(b) Rewrite this problem to a problem of the form min{cx : Ax = b, x ≥ 0, x ∈ Rn}. Give n, A, b, c.
(2) Consider the linear optimization problem
max{5x1 + x3 : −x1 + x2 ≥ 2, x1 + 4x2 + x3 ≤ 3, x1, x2, x3 ≥ 0, x ∈ R3}.
(a) Rewrite this problem to a problem of the form max{cx : Ax ≤ b, x ∈ Rn}. Give n, A, b, c.
(b) Rewrite this problem to a problem of the form min{cx : Ax = b, x ≥ 0, x ∈ Rn}. Give n, A, b, c.
(3) Write down the objective and the constraints of a transportation problem (Section 1) with 2 factories and 3 cities, and with transportation costs

factory \ city   1    2    3
1                3   12    4
2                6    1    2

capacities

factory    capacity
1          5
2          2

and demands

city       1   2   3
demand     1   3   2

(a) Rewrite this problem to a problem of the form max{cx : Ax ≤ b, x ∈ Rn}. Give n, A, b, c.
(b) Rewrite this problem to a problem of the form min{cx : Ax = b, x ≥ 0, x ∈ Rn}. Give n, A, b, c.
(4) Rewrite the general linear optimization problem (9) to problems of the form max{cx : Ax ≤ b, x ∈ Rn} and min{cx : Ax = b, x ≥ 0, x ∈ Rn}.
(5) Find, for each of the matrices A and vectors b below, either a column vector x such that Ax = b or a row vector y such that yA = 0 and yb ≠ 0.

(a) A = \begin{pmatrix} 1 & 5 & 0 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}, and b = \begin{pmatrix} 2 \\ 2 \\ 3 \end{pmatrix}.

(b) A = \begin{pmatrix} 1 & 5 & 0 & 1 \\ 1 & 2 & 1 & 2 \\ 1 & 1 & 2 & 4 \end{pmatrix}, and b = \begin{pmatrix} 2 \\ 2 \\ 3 \end{pmatrix}.

(c) A = \begin{pmatrix} −1 & 3 \\ 4 & 5 \end{pmatrix}, b = \begin{pmatrix} 4 \\ 0 \end{pmatrix}.

(d) A = \begin{pmatrix} −1 & 3 \\ 4 & 5 \\ 2 & 2 \end{pmatrix}, b = \begin{pmatrix} 4 \\ 0 \\ 1 \end{pmatrix}.

(e) A = \begin{pmatrix} −1 & 3 & 0 \\ 4 & 5 & 1 \\ 2 & 2 & −1 \end{pmatrix}, b = \begin{pmatrix} 4 \\ 0 \\ 1 \end{pmatrix}.

(6) Let a1, . . . , an ∈ Rm, and let H ⊆ Rm be a hyperplane. Show that a1, . . . , an ∈ H ⇒ lin.hull {a1, . . . , an} ⊆ H.
(7) Let K be a field, let A be an m × n matrix with entries in K and let b ∈ Km. Show that exactly one of the following holds:
(a) there is a column vector x ∈ Kn such that Ax = b, or
(b) there is a row vector y ∈ Km such that yA = 0 and yb = 1.
(8) Let A be a set of subsets of {1, . . . , n}. Show that exactly one of the following holds:
(a) there is a set X ⊆ {1, . . . , n} such that |X ∩ A| is odd for all A ∈ A, or
(b) there is a set B ⊆ A such that |B| is odd and |{B ∈ B : i ∈ B}| is even for all i ∈ {1, . . . , n}.


CHAPTER 1

Linear inequalities

1. Convex sets

1.1. Definitions. Given two points x, y ∈ Rn the line segment between x and y is the set of points

(28) [x, y] := {x + λ(y − x) : λ ∈ [0, 1]}.

A set of points C ⊆ Rn is convex if [x, y] ⊆ C for all x, y ∈ C (see Figure 2).

A hyperplane is a set of the form

(29) Hd,δ := {x ∈ Rn : dx = δ},

where d ∈ Rn is a nonzero row vector and δ ∈ R. A halfspace is a set of points

(30) H≤d,δ := {x ∈ Rn : dx ≤ δ} or H≥d,δ := {x ∈ Rn : dx ≥ δ},

where d ∈ Rn is a nonzero row vector and δ ∈ R. The common boundary of these sets is Hd,δ. If X, Y ⊆ Rn, then we say that X and Y are separable (by a hyperplane) if there is some nonzero d ∈ Rn and δ ∈ R such that X ⊆ H≤d,δ and Y ⊆ H≥d,δ. An open halfspace is a set of points

(31) H<d,δ := {x ∈ Rn : dx < δ} or H>d,δ := {x ∈ Rn : dx > δ},

where d ∈ Rn is a nonzero row vector and δ ∈ R. If X, Y ⊆ Rn, then we say that X and Y are strongly separable (by a hyperplane) if there is some nonzero d ∈ Rn and δ ∈ R such that X ⊆ H<d,δ and Y ⊆ H>d,δ.

1.2. The separation theorem. Let Y ⊆ Rn, x ∈ Rn. It is clear that if Y and {x} are strongly separable, then x ∉ Y . For closed and convex Y , the converse statement holds (see Figure 2).

Theorem 4. Let C ⊆ Rn be a closed convex set and let x ∈ Rn. If x ∉ C, then x and C are strongly separable by a hyperplane.

Figure 1. A line segment, and a hyperplane.


Figure 2. A convex set; a nonconvex set; a separating hyperplane; an inseparable pair.

Proof. We must show that there is a nonzero row vector d ∈ Rn and a δ ∈ R such that dy > δ for all y ∈ C and dx < δ. This is trivial if C = ∅, so we may assume that C ≠ ∅. First we find a vector ȳ ∈ C that is closest to x among all y ∈ C, i.e. such that

(32) ‖ȳ − x‖ = inf{‖y − x‖ : y ∈ C}.

To show that such a vector exists in C, choose some y0 ∈ C, and let p := ‖y0 − x‖. Then C ∩ Bn(x, p) is a nonempty compact set and the map y ↦ ‖y − x‖ is continuous, so by Theorem 1

(33) min{‖y − x‖ : y ∈ C ∩ Bn(x, p)}

is attained by some ȳ ∈ C ∩ Bn(x, p). Clearly, ȳ is closest to x among all vectors in C: the vectors outside Bn(x, p) by definition have distance to x greater than p, and ‖ȳ − x‖ ≤ p.

We set d := (ȳ − x)t and δ := ½ d(x + ȳ). Since x ∉ C, we have ȳ ≠ x, and hence d is not the zero vector. A simple calculation proves dx < δ:

(34) dx = d(½(x + ȳ) + ½(x − ȳ)) = δ − ‖d‖²/2 < δ,

and similarly we have dȳ > δ. (Geometrically, the hyperplane Hd,δ is orthogonal to the line segment [x, ȳ] and intersects it in the middle.)

It remains to show that dy > δ for all y ∈ C. So let us assume that this is not true, and that there exists a y∗ ∈ C such that dy∗ ≤ δ. Let f : R → R be defined by

(35) f : λ ↦ ‖ȳ + λ(y∗ − ȳ) − x‖² = ‖d‖² + 2λdw + λ²‖w‖²,

where w := y∗ − ȳ. Then f′(λ) = 2(dw + λ‖w‖²), and hence f′(0) = 2dw = 2d(y∗ − ȳ) < 0. So for a small λ > 0, we have

(36) ‖ȳ − x‖² = f(0) > f(λ) = ‖ȳ + λ(y∗ − ȳ) − x‖²,

and moreover ȳ + λ(y∗ − ȳ) ∈ [ȳ, y∗] ⊆ C, contradicting the choice of ȳ. □


Figure 3. The Lorenz cone L3; a cone generated by 5 vectors.

There are many variants of the above Separation Theorem; we mention two. Their proofs are exercises.

Theorem 5. Let C ⊆ Rn be a closed and convex set, and let c lie on the boundary of C. Then there exists a nonzero d ∈ Rn and a δ ∈ R such that dc = δ and dx ≤ δ for all x ∈ C.

Theorem 6. Let C, D ⊆ Rn be closed and convex sets. If C ∩ D = ∅, then C and D are separable by a hyperplane.

1.3. Cones. A set of points C ⊆ Rn is a cone if C is convex and αx ∈ C for all x ∈ C and nonnegative α ∈ R. For example, the k-dimensional Lorenz cone or ice cream cone (see Figure 3) is

(37) Lk := {x ∈ Rk : √(x1² + · · · + x_{k−1}²) ≤ xk}.

It is an exercise to prove that a set C is a cone if and only if αx + βy ∈ C for all x, y ∈ C and nonnegative α, β ∈ R. We have a variant of the separation theorem for cones.

Theorem 7. Let C ⊆ Rn be a closed cone and let x ∈ Rn. If x ∉ C, then there is a d ∈ Rn such that dy ≥ 0 for all y ∈ C and dx < 0.

Proof. Suppose x ∉ C. Since C is a closed convex set, there exist a nonzero d ∈ Rn and a δ ∈ R such that dy > δ for all y ∈ C and dx < δ. We will show that dx < 0, and dy ≥ 0 for all y ∈ C.

As C is a closed cone, we have 0 ∈ C and hence δ < d0 = 0. So dx < δ < 0. Let y ∈ C. If dy < 0, then α := δ/dy > 0. Hence αy ∈ C, but on the other hand d(αy) = α(dy) = δ, a contradiction. Hence dy ≥ 0. □

The polar of a cone C is C∗ := {y ∈ Rn : xty ≥ 0 for all x ∈ C}. One way to formulate the separation theorem for cones is as follows.

Corollary 7.1. If C is a closed cone, then C = C∗∗.

Proof. Let C be a closed cone. If x ∈ C, then ytx ≥ 0 for all y ∈ C∗, thus x ∈ C∗∗. Now suppose x ∉ C. Then by the theorem there is a d such that dz ≥ 0 for all z ∈ C and dx < 0. Then y := dt ∈ C∗ by definition of C∗, and since xty < 0, it follows that x ∉ C∗∗. □


2. Linear inequalities

2.1. Farkas’ Lemma. Let a1, . . . , an ∈ Rm be vectors. We say that z is a nonnegative combination of a1, . . . , an if

(38) z = λ1a1 + · · ·+ λnan

for some nonnegative λ1, . . . , λn ∈ R. It is an exercise to show that the set of all nonnegative combinations of a given set of vectors is a closed cone (see Figure 3). We define

(39) cone {a1, . . . , an} := {λ1a1 + · · · + λnan : λ1, . . . , λn ∈ R, λ1, . . . , λn ≥ 0}.

Compare the definition of ‘cone’ to the definition of ‘linear hull’. We can use the separation theorem for cones to obtain a statement about nonnegative combinations of vectors; this is Farkas’ Lemma.

Theorem 8 (Farkas, 1894). Let a1, . . . , an and b be column vectors in Rm. Exactly one of the following statements holds:

(1) b ∈ cone {a1, . . . , an}.
(2) There is a row vector d ∈ Rm such that dai ≥ 0 for all i and db < 0.

Proof. Suppose both (1) and (2) hold, so ∑_{i=1}^{n} λiai = b for some nonnegative λ1, . . . , λn, and there is a d such that da1, . . . , dan ≥ 0 and db < 0. Then

(40) 0 > db = d(∑_{i=1}^{n} λiai) = ∑_{i=1}^{n} λi(dai) ≥ 0,

a contradiction. So (1) and (2) cannot both be true; it remains to be shown that at least one of them holds. Suppose (1) does not hold, then b ∉ cone {a1, . . . , an}. By the separation theorem for cones, Theorem 7, there is a vector d ∈ Rm such that da ≥ 0 for all a ∈ cone {a1, . . . , an} and db < 0. Since a1, . . . , an ∈ cone {a1, . . . , an}, we have dai ≥ 0 for all i. Thus (2) holds. □

2.2. An extension of Fredholm’s Alternative. We say that a vector x is nonnegative, notation x ≥ 0, if each of the entries of x is nonnegative. By reformulating Farkas’ Lemma we obtain a theorem similar to Theorem 3, characterizing when the matrix equation Ax = b has a nonnegative solution.

Theorem 9 (Farkas’ Lemma, variant). If A is an m × n real matrix and b ∈ Rm, then exactly one of the following is true:

(1) there is a column vector x ∈ Rn such that x ≥ 0 and Ax = b, or
(2) there is a row vector y ∈ Rm such that yA ≥ 0 and yb < 0.

Proof. Let a1, . . . , an be the columns of A, and apply Theorem 8. □

In terms of systems of linear equations, this theorem states that either

(1) the system of linear equations (20) has a nonnegative solution x, or
(2) there is a linear combination cx = d of rows of (20) with c ≥ 0 and d < 0,

but not both. It is an exercise to show that Theorem 9 generalizes Fredholm’s Alternative.
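Computationally, Theorem 9 can be explored with a linear-programming solver. The sketch below (an addition to the notes; it assumes SciPy, and the function name farkas_alternative is made up for illustration) looks for a nonnegative solution of Ax = b and, failing that, searches for a certificate y with yA ≥ 0 and yb < 0 by solving the alternative system directly.

```python
# Theorem 9 as a computation: find x >= 0 with Ax = b, or a row vector y with
# yA >= 0 and yb < 0 (normalized here to yb <= -1).
import numpy as np
from scipy.optimize import linprog

def farkas_alternative(A, b):
    A, b = np.asarray(A, float), np.asarray(b, float)
    m, n = A.shape
    # Alternative (1): x >= 0 with Ax = b (a pure feasibility problem).
    p = linprog(c=np.zeros(n), A_eq=A, b_eq=b, bounds=(0, None))
    if p.success:
        return ("x", p.x)
    # Alternative (2): y with A^t y >= 0 and b . y <= -1, so that yb < 0.
    q = linprog(c=np.zeros(m),
                A_ub=np.vstack([-A.T, b.reshape(1, -1)]),
                b_ub=np.concatenate([np.zeros(n), [-1.0]]),
                bounds=(None, None))
    return ("y", q.x)

# The matrix A and the vector b' from the example in the Introduction.
print(farkas_alternative([[1, 3], [2, 7], [3, 10]], [1, 1, 1]))
```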

2.3. Linear inequalities. The following form of Farkas’ lemma will be useful. Note that if u and v are vectors, the inequality sign in u ≤ v means that ui ≤ vi for each i.

Theorem 10 (Farkas’ Lemma, variant). If A is an m × n real matrix and b ∈ Rm, then exactly one of the following is true:

(1) there is a column vector x ∈ Rn such that Ax ≤ b, or


(2) there is a row vector y ∈ Rm such that y ≥ 0, yA = 0 and yb < 0.

Proof. We first show that at least one of (1) and (2) must be false. For if x ∈ Rn is such that Ax ≤ b, and y ∈ Rm is such that y ≥ 0, yA = 0 and yb < 0, then

(41) 0 = 0x = yAx ≤ yb < 0,

a contradiction. To see that at least one of (1) and (2) is true, we apply Theorem 9 to the matrix A′ := [A  −A  I] and b. We find that either

(1′) there exist column vectors x′, x′′, x′′′ ≥ 0 such that Ax′ − Ax′′ + Ix′′′ = b, or
(2′) there is a row vector y ∈ Rm such that yA ≥ 0, y(−A) ≥ 0, yI ≥ 0, and yb < 0.

If case (1′) is true, then x = x′ − x′′ satisfies Ax ≤ b, and then (1) holds. If case (2′) is true, it follows that yA = 0, y ≥ 0, and yb < 0, and then (2) holds. □

Consider the system of linear inequalities

(42)
a11x1 + · · · + a1nxn ≤ b1
a21x1 + · · · + a2nxn ≤ b2
⋮
am1x1 + · · · + amnxn ≤ bm.

We say that cx ≤ d is a nonnegative combination of the rows of (42) if there are scalars y1, . . . , ym ≥ 0 such that:

(43)
y1 × (a11x1 + · · · + a1nxn ≤ b1)
y2 × (a21x1 + · · · + a2nxn ≤ b2)
⋮
ym × (am1x1 + · · · + amnxn ≤ bm)
—————————————————————— +
c1x1 + · · · + cnxn ≤ d.

Theorem 10 says that either

(1) the system of linear inequalities (42) has a solution, or
(2) the inequality

(44) 0x1 + · · ·+ 0xn ≤ −1

is a nonnegative combination of the rows of (42),

but not both.

2.4. Fourier-Motzkin elimination. The proof of Farkas’ Lemma is not algorithmic: it does not give us a procedure to compute a vector x satisfying a system of linear inequalities, like Gaussian elimination for systems of linear equations. A straightforward, but elaborate method to solve a system of linear inequalities is Fourier-Motzkin elimination. The method is not efficient as an algorithm; on the other hand it can be used in an alternative proof of Farkas’ Lemma that avoids calling the Separation Theorem.

The core of Fourier-Motzkin elimination is a procedure to eliminate one variable, xn say, and obtain a system of many more linear inequalities on the remaining variables x1, . . . , xn−1 that has a solution if and only if the original system had a solution. Recursively (using Fourier-Motzkin on this system of inequalities with fewer variables), we find a solution x1, . . . , xn−1 to this system, from which we compute the missing xn.

We show how to solve the system of inequalities (42).


Step 1. For each i = 1, . . . ,m, divide the i-th row of (42) by |ain| whenever ain ≠ 0. After this, in each row the coefficient at xn is 1, −1 or 0; say we have k′ inequalities with coefficient −1 at xn, k′′ with coefficient 1 at xn, and k′′′ with coefficient 0 at xn. After reordering the rows, we get:

(45)
a′ix′ − xn ≤ b′i,  i = 1, . . . , k′,
a′ix′ + xn ≤ b′i,  i = k′ + 1, . . . , k′ + k′′,
a′ix′ ≤ b′i,  i = k′ + k′′ + 1, . . . , k′ + k′′ + k′′′,

where x′ = (x1, . . . , xn−1). This system has the same solutions as (42).

Step 2. Find a solution to the system of linear inequalities

(46)
(a′i + a′j)x′ ≤ b′i + b′j,  i = 1, . . . , k′, j = k′ + 1, . . . , k′ + k′′,
a′ix′ ≤ b′i,  i = k′ + k′′ + 1, . . . , k′ + k′′ + k′′′.

Call this solution x′ = (x1, . . . , xn−1).

Step 3. Choose xn such that

(47) max{a′ix′ − b′i : i = 1, . . . , k′} ≤ xn ≤ min{b′j − a′jx′ : j = k′ + 1, . . . , k′ + k′′}.

(That max ≤ min follows from the fact that x′ satisfies (46).) Then x1, . . . , xn is a solution to (42).

In Step 2, the number of inequalities is k′k′′ + k′′′. A careful choice of the variable to be eliminated is recommended, to keep this number small if possible. It is the rapid increase in the number of inequalities that makes this method inefficient. Note that if k′ ≤ 1 or k′′ ≤ 1, the number of inequalities actually decreases.
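A single elimination step is easy to code. The sketch below (an addition to the notes; plain Python, with a function name chosen for illustration) performs Steps 1 and 2 for the last variable and returns the reduced system (46); iterating it and back-substituting as in Step 3 solves the full system.

```python
# One Fourier-Motzkin step: eliminate the last variable x_n from a system
# of inequalities a.x <= b and return the resulting system in x_1..x_{n-1}.
def eliminate_last_variable(rows):
    """rows: list of (a, b) with a a list of n coefficients, meaning a.x <= b."""
    neg, pos, zero = [], [], []
    for a, b in rows:
        t = a[-1]
        if t == 0:
            zero.append((a[:-1], b))
        else:
            a_scaled = [v / abs(t) for v in a[:-1]]   # Step 1: normalize to coefficient +-1
            (neg if t < 0 else pos).append((a_scaled, b / abs(t)))
    new_rows = list(zero)
    # Step 2: each (-1, +1) pair of rows yields one inequality of (46).
    for a_i, b_i in neg:
        for a_j, b_j in pos:
            new_rows.append(([u + v for u, v in zip(a_i, a_j)], b_i + b_j))
    return new_rows

# The system of Exercise (9) below, eliminating x2.
system = [([1, 1], 0), ([1, 0], 0), ([1, 2], 3),
          ([1, -1], 3), ([-1, -2], 4), ([-1, 0], 5)]
print(eliminate_last_variable(system))
```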

3. Caratheodory’s Theorem

3.1. Caratheodory’s Theorem.

Theorem 11 (Caratheodory, 1911). Let a1, . . . , am, b ∈ Rn. If b ∈ cone {a1, . . . , am}, then there is a J ⊆ {1, . . . ,m} such that {ai : i ∈ J} is linearly independent and b ∈ cone {ai : i ∈ J}.

Proof. Choose J ⊆ {1, . . . ,m} so that b ∈ cone {ai : i ∈ J}, and such that |J| is as small as possible. So there exist λi ≥ 0 for each i ∈ J such that ∑_{i∈J} λiai = b.

If {ai : i ∈ J} is not linearly independent, then there exist βi ∈ R for i ∈ J , not all zero, such that ∑_{i∈J} βiai = 0. We may assume that βi > 0 for some i ∈ J ; if not, replace each βi by −βi. Let α := min{λi/βi : βi > 0, i ∈ J}, and let λ′i := λi − αβi for i ∈ J . Then λ′i ≥ 0 for all i ∈ J , and there must be some i0 ∈ J such that α = λi0/βi0 , so that λ′i0 = 0. Moreover,

(48) ∑_{i∈J} λ′iai = ∑_{i∈J} (λi − αβi)ai = ∑_{i∈J} λiai − α(∑_{i∈J} βiai) = b − α0 = b.

But now b ∈ cone {ai : i ∈ J \ {i0}}, contradicting the minimality of J . So {ai : i ∈ J} is linearly independent. □
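The proof is constructive, and the reduction can be sketched numerically as follows (an addition to the notes; it assumes NumPy, uses an SVD to find a dependency among the active vectors, and handles tolerances only crudely).

```python
# Given a nonnegative combination sum_i lam[i]*a_i = b, repeatedly remove a
# dependency among the vectors with lam[i] > 0, as in the proof of Theorem 11,
# until those vectors are linearly independent.
import numpy as np

def caratheodory_reduce(A_cols, lam, tol=1e-12):
    """A_cols: n x m array whose columns are a_1..a_m; lam: nonnegative weights."""
    A_cols, lam = np.asarray(A_cols, float), np.array(lam, float)
    while True:
        J = np.flatnonzero(lam > tol)
        sub = A_cols[:, J]
        if np.linalg.matrix_rank(sub) == len(J):
            return lam                      # {a_i : i in J} is linearly independent
        _, _, Vt = np.linalg.svd(sub)
        beta = Vt[-1]                       # a null-space direction: sub @ beta ~ 0
        if not np.any(beta > tol):
            beta = -beta                    # make some beta_i > 0
        alpha = np.min(lam[J][beta > tol] / beta[beta > tol])
        lam[J] = lam[J] - alpha * beta      # some lam[i0] drops to zero
        lam[lam < tol] = 0.0

# Columns a1..a4 from Exercise (1) below; b = a1 + a2 + a3 + a4.
A = np.array([[1, -1, 3, 1], [2, 1, -1, 0], [2, 0, 2, -5]], float)
lam = caratheodory_reduce(A, [1.0, 1.0, 1.0, 1.0])
print(lam, A @ lam)   # reduced weights; A @ lam is still the same b
```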

Corollary 11.1. Let a1, . . . , am ∈ Rn be row vectors and let b1, . . . , bm ∈ R. If the system of linear inequalities

(49) a1x ≤ b1, . . . , amx ≤ bm

has no solution x ∈ Rn, then there is a set J ⊆ {1, . . . ,m} with at most n + 1 members such that the subsystem

(50) aix ≤ bi for all i ∈ J

has no solution x ∈ Rn.


Proof. Let a′i := \begin{pmatrix} a_i^t \\ b_i \end{pmatrix} for i = 1, . . . ,m and let b := \begin{pmatrix} 0 \\ −1 \end{pmatrix}. If (49) has no solution, then by Farkas’ Lemma (Theorem 10), we have b ∈ cone {a′1, . . . , a′m}. Hence there is a set J ⊆ {1, . . . ,m} such that b ∈ cone {a′i : i ∈ J}, where {a′i : i ∈ J} is linearly independent. It follows that J has at most n + 1 members, and that (50) has no solutions, as required. □

3.2. The fundamental theorem of linear inequalities. It is evident that Theorem 11 allows us to improve Theorem 8 slightly: we can replace ‘b ∈ cone {a1, . . . , an}’ by a stronger statement, stating that we need only a linearly independent subset of {a1, . . . , an} to generate b. In fact, Caratheodory’s Theorem can be used to replace both alternatives in any version of Farkas’ Lemma by stronger statements.

Theorem 12. Let a1, . . . , am, b ∈ Rn. Then exactly one of the following holds:

(1) there is a linearly independent subset X ⊆ {a1, . . . , am} such that b ∈ cone X; and
(2) there is a nonzero d ∈ Rn such that dai ≥ 0 for all i, db < 0, and rank{ai : dai = 0} = rank{a1, . . . , am, b} − 1.

Proof. By Theorem 8, either b ∈ cone {a1, . . . , am} or there is a d ∈ Rn such that dai ≥ 0 for all i, and db < 0. In the former case, (1) follows by Caratheodory’s Theorem. In the latter case, choose d with dai ≥ 0 for all i and db < 0, such that |{i : dai = 0}| is as large as possible. Suppose rank{ai : dai = 0} < rank{a1, . . . , am, b} − 1. Then also rank({b} ∪ {ai : dai = 0}) < rank{a1, . . . , am, b}, hence there is some f ∈ lin.hull {a1, . . . , am, b} so that f ⊥ lin.hull ({b} ∪ {ai : dai = 0}). Then d + λf ⊥ {ai : dai = 0} and (d + λf)b < 0 for all λ. We may assume that fai > 0 for some i. Let

(51) λ∗ := max{λ ∈ R : (d + λf)ai ≥ 0 for all i}.

Then there is some i such that dai ≠ 0 and (d + λ∗f)ai = 0, contradicting our choice of d. □

Although it can be done, it is a bit cumbersome to apply Theorem 11 in its current form to derive the sharper form of the second alternative, which is why we gave a direct proof. Indeed, it is hard to even recognize that the modifications of the second alternative have anything to do with Caratheodory’s Theorem. It is therefore useful to take a more abstract viewpoint. Both Farkas’ lemma and Caratheodory’s Theorem can be formulated in terms of linear subspaces of Rn. As such, they are easily combined.

3.3. An abstract view. Recall that the orthogonal complement of a linear space L ⊆ Rn is the linear space

(52) L⊥ := {y ∈ Rn : y ⊥ x for all x ∈ L}.

Theorem 13 (Farkas’ Lemma, variant). Let L ⊆ Rn be a linear space, and let e ∈ {1, . . . , n}. Then exactly one of the following holds:

(1) there exists an x ∈ L such that x ≥ 0 and xe > 0; and
(2) there exists a y ∈ L⊥ such that y ≥ 0 and ye > 0.

Proof. We prove the theorem by induction on n. If n = 1 the theorem clearly holds. Suppose the theorem fails. Then n > 1, and we may assume e ≠ n.

Let L′ := {x′ ∈ Rn−1 : (x′, 0) ∈ L}. Then L′⊥ = {y′ ∈ Rn−1 : (y′, t) ∈ L⊥ for some t ∈ R}. Since the theorem holds for L′, there exists either an x′ ∈ L′ so that x′ ≥ 0 and x′e > 0, or a y′ ∈ L′⊥ so that y′ ≥ 0 and y′e > 0. In the former case (x′, 0) ∈ L satisfies (1), and in the latter case, let t be such that (y′, t) ∈ L⊥. Then (y′, t) satisfies (2) unless t < 0. So let y := (y′, t).

Let L′′ := {x′′ ∈ Rn−1 : (x′′, s) ∈ L for some s ∈ R}. Then L′′⊥ = {y′′ ∈ Rn−1 : (y′′, 0) ∈ L⊥}. Since the theorem holds for L′′, there exists either an x′′ ∈ L′′ so that x′′ ≥ 0 and x′′e > 0, or a y′′ ∈ L′′⊥ so that y′′ ≥ 0 and y′′e > 0. In the latter case (y′′, 0) ∈ L⊥ satisfies (2), and in the former case, let s ∈ R be such that (x′′, s) ∈ L. Then (x′′, s) satisfies (1) unless s < 0. So let x := (x′′, s).

We now have the contradiction 0 = ytx = y′tx′′ + st ≥ y′ex′′e + st > 0. □

Note that we could also have derived this version of Farkas’ lemma from the previous versions; this is an exercise. Apply the theorem to L := {x ∈ Rn+1 : x1a1 + · · · + xnan = xn+1b} to derive Theorem 8, to L := {x ∈ Rn+1 : [A | −b]x = 0} for a proof of Theorem 9, and to L := {[bs − Ax | s] ∈ Rm+1 : x ∈ Rn, s ∈ R} for a proof of Theorem 10.

The support of a vector x ∈ Rn is

(53) supp(x) := {i ∈ {1, . . . , n} : xi ≠ 0}.

If L ⊆ Rn is a linear space, then X ⊆ {1, . . . , n} is a dependent set of L if there exists an x ∈ L such that supp(x) = X, and an independent set otherwise. A circuit is an inclusionwise minimal dependent set and a basis is an inclusionwise maximal independent set.

A dependent set D of L is positive if there exists some x ∈ L such that x ≥ 0 and supp(x) = D.

Theorem 14 (Caratheodory’s Theorem, variant). Let L ⊆ Rn be a linear space, and let e ∈ {1, . . . , n}. If D is a positive dependent set of L such that e ∈ D, then there exists a positive circuit C of L such that e ∈ C ⊆ D.

Proof. Choose C ⊆ D so that C is a positive dependent set, e ∈ C and |C| is as small as possible. Let x ∈ L be such that x ≥ 0, supp(x) = C and xe > 0. If C is not a circuit, then there is a circuit C′ properly contained in C. Let x′ ∈ L be such that supp(x′) = C′, with x′e < 0 if e ∈ C′, and x′i > 0 for some i in any case. Let α = min{xi/x′i : x′i > 0}. Then x − αx′ ≥ 0, (x − αx′)e > 0, and supp(x − αx′) is properly contained in C, a contradiction. □

The two theorems are combined in an obvious manner.

Theorem 15. Let L ⊆ Rn be a linear space, and let e ∈ {1, . . . , n}. Then exactly one of the following holds:

(1) there exists a positive circuit of L containing e; and
(2) there exists a positive circuit of L⊥ containing e.

To apply this abstract version of the fundamental theorem of linear inequalities, it is useful to translate some linear algebra facts into our new terms. Observe that if a1, . . . , am ∈ Rn and L := {x ∈ Rm : ∑_{i=1}^{m} xiai = 0}, then X ⊆ {1, . . . ,m} is independent in L if and only if {ai : i ∈ X} is an independent set of vectors.

Lemma 16. Let L ⊆ Rn be a linear space.

(1) if B, B′ are bases of L and e ∈ B \ B′, then there exists an f ∈ B′ \ B such that (B \ {e}) ∪ {f} is a basis of L;
(2) if B is a basis of L, then {1, . . . , n} \ B is a basis of L⊥;
(3) if B is a basis of L and e ∉ B, then B ∪ {e} contains a unique circuit of L;
(4) B is a basis of L if and only if B is independent and intersects all circuits of L⊥.

If L ⊆ Rn, the L-rank of X ⊆ {1, . . . , n} is

(54) rankL(X) := max{|I| : I ⊆ X, I independent in L}.

If L = {x ∈ Rm : ∑_{i=1}^{m} xiai = 0}, then rankL(X) = rank{ai : i ∈ X}.


Lemma 17. Let L ⊆ Rn be a linear space.

(1) rankL({1, . . . , n}) = dim(L);
(2) if X, Y ⊆ {1, . . . , n}, then rankL(X) + rankL(Y ) ≥ rankL(X ∩ Y ) + rankL(X ∪ Y );
(3) rankL(X) ≤ |X|, and X is independent iff rankL(X) = |X|.

We sketch a second proof of Theorem 12. Consider the linear space

(55) L := {x ∈ Rm+1 : x1a1 + · · · + xmam = xm+1b}.

Observe that L⊥ = {(da1, . . . , dam, −db) : d ∈ Rn}.

Apply Theorem 15 with e = m + 1. If there exists a positive circuit C of L containing e, then (1) follows taking X := {ai : i ∈ C \ {e}}. If there exists a positive circuit D of L⊥ containing e, let y ∈ L⊥ be such that y ≥ 0, ye > 0 and supp(y) = D. Let d ∈ Rn be such that y = (da1, . . . , dam, −db). Then (2) follows.

Exercises

(1) Let a1 = \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix}, a2 = \begin{pmatrix} −1 \\ 1 \\ 0 \end{pmatrix}, a3 = \begin{pmatrix} 3 \\ −1 \\ 2 \end{pmatrix}, a4 = \begin{pmatrix} 1 \\ 0 \\ −5 \end{pmatrix}. Show that:

(a) \begin{pmatrix} 0 \\ 3 \\ 2 \end{pmatrix} ∈ cone {a1, a2, a3, a4};

(b) \begin{pmatrix} 7 \\ 1 \\ 1 \end{pmatrix} ∈ cone {a1, a2, a3, a4};

(c) \begin{pmatrix} 7 \\ −8 \\ 1 \end{pmatrix} ∉ cone {a1, a2, a3, a4}.

Sketch cone {a1, a2, a3, a4} in R3.
(2) Show that there is no x ∈ R2 such that 2x1 + x2 ≤ 1, −x1 + x2 ≤ 0, x1 − 3x2 ≤ −1 and x1 + x2 ≤ 4.
(3) Prove Theorem 3 using Theorem 9. Hint: apply Theorem 9 to the matrix A′ := [A  −A] and the vector b.
(4) Prove: if A is an m × n matrix, B is an m × k matrix, and c ∈ Rm, then exactly one of the following holds:
(a) There are column vectors x ∈ Rn, y ∈ Rk such that x ≥ 0 and Ax + By = c.
(b) There is a row vector z ∈ Rm such that zA ≥ 0, zB = 0, and zc < 0.
Hint: apply Theorem 9 to the matrix [A  B  −B] and the vector c.
(5) Prove: if A is an m × n matrix, B is an m × k matrix, and c ∈ Rm, then exactly one of the following holds:
(a) There are column vectors x ∈ Rn, y ∈ Rk such that x ≥ 0 and Ax + By ≤ c.
(b) There is a row vector z ∈ Rm such that zA ≥ 0, zB = 0, z ≥ 0, and zc < 0.
Hint: apply Theorem 9 to the matrix [A  B  −B  I] and the vector c.
(6) Prove: if A is an m × n matrix, B is a k × n matrix, and c ∈ Rm, d ∈ Rk, then exactly one of the following holds:
(a) There is a column vector x ∈ Rn such that x ≥ 0, Ax = c, and Bx ≤ d.
(b) There are row vectors y ∈ Rm, z ∈ Rk such that yA + zB ≥ 0, z ≥ 0, and yc + zd < 0.
(7) Prove Gordan’s Theorem: for any vectors a1, . . . , am in Rn, exactly one of the following holds:
(a) There exist λ1, . . . , λm ≥ 0, not all zero, such that ∑_{i=1}^{m} λiai = 0.
(b) There is a vector d ∈ Rn such that dai > 0 for i = 1, . . . ,m.
Hint: apply Theorem 8 to the vectors a′i := \begin{pmatrix} ai \\ 1 \end{pmatrix} and b′ := \begin{pmatrix} 0 \\ 1 \end{pmatrix}.
(8) Prove Theorem 10 by induction on n using Fourier-Motzkin elimination. Hint: verify that it suffices to show: if 0x′ ≤ −1 is a nonnegative combination of the system of inequalities (46), then 0x ≤ −1 is a nonnegative combination of the system of inequalities (42).
(9) Find a solution x1, x2 to the system of linear inequalities

x1 + x2 ≤ 0,
x1 ≤ 0,
x1 + 2x2 ≤ 3,
x1 − x2 ≤ 3,
−x1 − 2x2 ≤ 4,
−x1 ≤ 5

(a) by Fourier-Motzkin elimination,
(b) by drawing the set of solutions in the plane.
What is the graphical interpretation of the minimum and maximum in Step 3 of Fourier-Motzkin elimination?

(10) Prove that the intersection of convex sets is convex.
(11) Prove: if C ⊆ Rn is a nonconvex set, then there exists an x ∈ Rn \ C that cannot be strongly separated from C by a hyperplane.
(12) Prove that for any closed convex set C, we have C = ⋂{H<d,δ : dx < δ for all x ∈ C}.
(13) Let C, D be compact convex sets. Prove: if C ∩ D = ∅, then there is a hyperplane separating C and D. Hint: consider C − D := {c − d : c ∈ C, d ∈ D}.
(14) Prove that a set C ⊆ Rn is a cone if and only if αx + βy ∈ C for all x, y ∈ C and all α, β ∈ R such that α, β > 0.
(15) Let C ⊆ Rn be a closed and convex set, and let c lie on the boundary of C. Prove that there exists a d ∈ Rn \ {0} and a δ ∈ R such that dc = δ and dx ≤ δ for all x ∈ C.
(16) Let C, D ⊆ Rn be closed, convex sets. Show that if C and D are disjoint, then there is a d ∈ Rn \ {0} and δ ∈ R such that dx ≤ δ for all x ∈ C and dy ≥ δ for all y ∈ D.
(17) Verify that cone {a1, . . . , an} is a cone for all a1, . . . , an ∈ Rm.
(18) Show that C = C∗ if
(a) C = {x ∈ Rn : x ≥ 0}, the positive orthant.
(b) C = Lk, the Lorenz cone.
(c) C = C1 × C2, where C∗1 = C1 and C∗2 = C2.
(19) Prove that the following sets are cones:
(a) Pn := {p ∈ Rn+1 : p0x^n + · · · + pnx + pn+1 ≥ 0 for all x ∈ R};
(b) Σn := {p ∈ Rn+1 : p0x^n + · · · + pnx + pn+1 is a sum of squares of polynomials}.
Prove that Pn = Σn.
(20) Complete and prove: if A is an m × n matrix and b ∈ Rm, then exactly one of the following holds:
(a) There is a column vector x ∈ Rn such that Ax ≤ b and x ≥ 0.
(b) (???).
(21) Complete and prove: if A is an m × n matrix, B is a k × n matrix, and c ∈ Rm, d ∈ Rk, then exactly one of the following holds:
(a) There is a column vector x ∈ Rn such that Ax ≤ c and Bx ≥ d.
(b) (???).
(22) Show that Farkas’ Lemma is equivalent to the following statement. Let L ⊆ Rn be a linear space. For each i ∈ {1, . . . , n} exactly one of the following holds:
(a) there is an x ∈ L such that x ≥ 0 and xi > 0, or
(b) there is a y ∈ L⊥ such that y ≥ 0 and yi > 0.
Here L⊥ := {y ∈ Rn : ytx = 0 for all x ∈ L} is the orthogonal complement of L.
(23) Prove: For any closed convex cone C, m × n matrix A and vector b ∈ Rm, exactly one of the following is true:
(a) there exists an x ∈ C such that Ax = b;
(b) there exists a y ∈ Rm such that yA ∈ C∗ and yb < 0.
(24) Prove Helly’s Theorem: Suppose m ≥ n + 1. If C1, . . . , Cm are compact, convex subsets of Rn so that the intersection of any n + 1 of them is nonempty, then the intersection of all m sets is nonempty.


CHAPTER 2

Linear optimization duality

1. The duality theorem

1.1. Weak duality. Consider the linear optimization problem max{cx : Ax ≤ b, x ∈ Rn}, or equivalently,

(56) max{cx : a1x ≤ b1, a2x ≤ b2, . . . , amx ≤ bm, x ∈ Rn},

where a1, . . . , am are the rows of A (see Figure 1). We will describe a method to derive upper bounds on (56). Recall that the inequality cx ≤ d is a nonnegative combination of the constraints of (56) if and only if there exist y1, . . . , ym ≥ 0 such that

(57)
y1 × (a1x ≤ b1)
y2 × (a2x ≤ b2)
⋮
ym × (amx ≤ bm)
—————————————— +
cx ≤ d,

or in other words, if

(58) y1a1 + · · ·+ ymam = c and y1b1 + · · ·+ ymbm = d.

If cx ≤ d is a nonnegative combination of the constraints, then cx ≤ d holds for all feasible points x, and then d is an upper bound on (56).

Figure 1. A linear optimization problem.

To find the best possible upper bound of this type, we need to solve the following optimization problem:

(59) min{y1b1 + · · ·+ ymbm : y1a1 + · · ·+ ymam = c, y1, . . . , ym ≥ 0, y1, . . . , ym ∈ R}.This is itself a linear optimization problem, the dual of (56). Taking y = (y1, . . . , ym), a

more concise way to write (59) is min{yb : yA = c, y ≥ 0, y ∈ Rm}. When speaking of a linearoptimization problem and its dual, the original problem (in this case (56)) is referred to as theprimal problem. We summarize the above in a Lemma, and give the short formal proof.

Lemma 18 (‘Weak duality’). For any m× n matrix A and vectors b ∈ Rm and c ∈ Rn, wehave

(60) sup{cx : Ax ≤ b, x ∈ Rn} ≤ inf{yb : yA = c, y ≥ 0, y ∈ Rm}.

Proof. If x ∈ Rn is such that Ax ≤ b and y ∈ Rm is such that yA = c and y ≥ 0, thenyb− yAx = y(b−Ax) ≥ 0 as both y ≥ 0 and b−Ax ≥ 0. Hence

(61) cx = (yA)x = y(Ax) ≤ yb.This proves the Lemma. �

1.2. Strong duality. We prove the Duality Theorem for linear optimization.

Theorem 19 (von Neumann, 1947; Gale, Kuhn and Tucker, 1951). For any m × n matrix A and vectors b ∈ Rm and c ∈ Rn, we have

(62) max{cx : Ax ≤ b, x ∈ Rn} = min{yb : yA = c, y ≥ 0, y ∈ Rm},

provided that the maximization problem, or the minimization problem, is both feasible and bounded.

Proof. Since we already know that max ≤ min, it suffices to show that there is a feasible solution x to the maximization problem and a feasible solution y to the minimization problem so that cx ≥ yb. To be exact, we want an x ∈ Rn and a y ∈ Rm such that Ax ≤ b, yA = c, y ≥ 0, and cx ≥ yb, or equivalently,

(63)
        ⎡  A    0  ⎤             ⎡  b  ⎤
        ⎢  0   −At ⎥   ⎡ x  ⎤    ⎢ −ct ⎥
        ⎢  0    At ⎥   ⎣ yt ⎦ ≤  ⎢  ct ⎥
        ⎢  0   −I  ⎥             ⎢  0  ⎥
        ⎣ −c    bt ⎦             ⎣  0  ⎦

By Farkas’ Lemma, such a pair (x, y) does not exist if and only if there exists a row vector [u v v′ s z] ∈ Rm+n+n+m+1 such that

(64)
                     ⎡  A    0  ⎤                        ⎡  b  ⎤
                     ⎢  0   −At ⎥                        ⎢ −ct ⎥
        [u v v′ s z] ⎢  0    At ⎥ = 0,      [u v v′ s z] ⎢  ct ⎥ < 0,      [u v v′ s z] ≥ 0.
                     ⎢  0   −I  ⎥                        ⎢  0  ⎥
                     ⎣ −c    bt ⎦                        ⎣  0  ⎦

If the latter system has a solution, then there are u ∈ Rm, w ∈ Rn (taking w := (v − v′)t) and z ∈ R such that z ≥ 0, uA = zc, u ≥ 0, Aw ≤ zb and ub − cw < 0. We prove the theorem by showing that such a triple (u, w, z) cannot exist.

We distinguish two cases, z = 0 and z > 0. If z > 0, let x := (1/z)w and y := (1/z)u. Then Ax ≤ b, yA = c, y ≥ 0, and cx > yb, a violation of weak duality. So z = 0, and hence uA = 0, u ≥ 0, Aw ≤ 0 and either ub < 0 or cw > 0. If ub < 0 (and uA = 0) then there is no x ∈ Rn such that Ax ≤ b by Farkas’ Lemma (Theorem 10), i.e. the maximization problem is infeasible. If cw > 0 and there exists a feasible x for the maximization problem, then x + λw is feasible for any λ ≥ 0 and c(x + λw) → ∞ as λ → ∞, so that in this case the maximization problem is infeasible or unbounded. Thus if z = 0 the maximization problem is either infeasible or unbounded, and one similarly argues that the minimization problem is either infeasible or unbounded, contradicting our assumption. □
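As a quick sanity check of Theorem 19 on a concrete instance, one can solve a primal problem and its dual with an off-the-shelf LP solver and compare the two optimal values. The sketch below uses scipy.optimize.linprog on a small example of our own; both values come out equal, as the theorem predicts.

```python
# Solving a small primal/dual pair numerically to observe max = min (Theorem 19).
# Example data are ours; scipy.optimize.linprog solves min-form LPs.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 2.0], [3.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([4.0, 6.0, 0.0, 0.0])
c = np.array([1.0, 1.0])

# Primal: max{cx : Ax <= b}  ==  -min{-cx : Ax <= b}, with x free.
primal = linprog(-c, A_ub=A, b_ub=b, bounds=[(None, None)] * 2)

# Dual: min{yb : yA = c, y >= 0}, written as equality constraints A^t y = c^t.
dual = linprog(b, A_eq=A.T, b_eq=c, bounds=[(0, None)] * 4)

print("primal optimum:", -primal.fun)   # 2.8
print("dual optimum:  ", dual.fun)      # 2.8
```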

In Chapter 1, we suggested that via Farkas’ Lemma we would find a certificate for optimality, that is, a positive statement equivalent to condition (12). We state such a condition below. Note that (12) is logically equivalent to (1) in the Corollary.

Corollary 19.1. Let A be an m×n matrix, b ∈ Rm a column vector, c ∈ Rn a row vector and d ∈ R. If {x ∈ Rn : Ax ≤ b} ≠ ∅, then the following are equivalent:

(1) cx ≤ d for all x ∈ Rn such that Ax ≤ b, and
(2) there exists a row vector y ∈ Rm such that y ≥ 0, yA = c and yb ≤ d.

Proof. If (2) holds, then cx = yAx ≤ yb ≤ d for all x such that Ax ≤ b. Now consider the problem max{cx : Ax ≤ b}. By the assumption that {x ∈ Rn : Ax ≤ b} ≠ ∅, this problem is feasible. If (1) holds, then the problem is bounded; then Theorem 19 implies the existence of a dual optimal solution y such that yb = max{cx : Ax ≤ b}. Then yb ≤ d, and (2) holds. □

1.3. Sensitivity analysis. In some applications, the coefficients of the vector b in the optimization problem

(65) max{cx : Ax ≤ b, x ∈ Rn}

are uncertain or fluctuating, and it is useful to estimate the value of (65) as a function of b. Note that if for a given b, the optimum solution of the dual problem

(66) min{yb : yA = c, y ≥ 0}

is y, then the feasibility of y does not depend on b; hence

(67) max{cx : Ax ≤ b′, x ∈ Rn} = min{yb′ : yA = c, y ≥ 0} ≤ yb′,

and

(68) max{cx : Ax ≤ b′, x ∈ Rn} − max{cx : Ax ≤ b, x ∈ Rn} ≤ yb′ − yb = y(b′ − b).

So the coefficient yi is an upper bound for the change in the optimal value of (65) per unit of change in bi. In economics, the coefficients of the optimal dual solution are sometimes called shadow prices.
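A small numerical illustration of (68) (the data and the perturbation of b are our own; the dual is solved with scipy to obtain the shadow prices y):

```python
# Illustrating (68): an optimal dual solution y bounds the change of the optimum
# when b is perturbed. Example data are ours; we re-solve the primal for b'.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 2.0], [3.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([4.0, 6.0, 0.0, 0.0])
c = np.array([1.0, 1.0])

def primal_opt(rhs):
    res = linprog(-c, A_ub=A, b_ub=rhs, bounds=[(None, None)] * 2)
    return -res.fun

# An optimal dual solution (the "shadow prices"), obtained by solving the dual.
y = linprog(b, A_eq=A.T, b_eq=c, bounds=[(0, None)] * 4).x

b_new = b + np.array([1.0, 0.0, 0.0, 0.0])     # one extra unit of the first resource
increase = primal_opt(b_new) - primal_opt(b)
print("actual increase:     ", increase)        # 0.4
print("dual bound y(b'-b):  ", y @ (b_new - b)) # 0.4
```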

2. Optimal solutions

2.1. Complementary slackness. We consider again the primal and dual problems of Theorem 19.

Lemma 20 (‘Complementary slackness’). Let A be an m × n matrix, b ∈ Rm a column vector and c ∈ Rn a row vector. Let a1, . . . , am be the rows of A. If x is a feasible solution of max{cx : Ax ≤ b} and y is a feasible solution of min{yb : yA = c, y ≥ 0}, then the following are equivalent.

(1) x and y are both optimal solutions; and
(2) yi = 0 or aix = bi for each i ∈ {1, . . . ,m}.

Proof. Feasible solutions x and y are both optimal if and only if cx = yb, by Theorem 19. Since yb − cx = yb − yAx = y(b − Ax) = ∑_{i=1}^{m} yi(bi − aix), and yi ≥ 0 and bi − aix ≥ 0 for all i if x, y are feasible, we have cx = yb if and only if (2) holds. □
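Condition (2) is straightforward to test numerically. A minimal sketch (example data of our own; x and y below are feasible for the primal and the dual respectively):

```python
# Checking the complementary slackness condition of Lemma 20 for a feasible pair.
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([4.0, 6.0, 0.0, 0.0])
c = np.array([1.0, 1.0])

x = np.array([1.6, 1.2])               # primal feasible
y = np.array([0.4, 0.2, 0.0, 0.0])     # dual feasible

slack = b - A @ x                      # slack of each primal inequality
complementary = np.all((y < 1e-9) | (slack < 1e-9))   # yi = 0 or ai x = bi
print("slacks:", slack)
print("complementary slackness holds:", complementary)
print("cx =", c @ x, " yb =", y @ b)   # equal, so both are optimal (Lemma 20)
```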

Page 30: Linear Optimization - Eindhoven University of …rudi/opt.pdfLinear optimization problems can be solved on a very large scale, using e cient imple-mentations of the method explained

30 2. LINEAR OPTIMIZATION DUALITY

Figure 2. Several feasible solutions x and the optimality criterion. From left to right: x optimal, c ∈ cone {a2, a3}; x not optimal, c ∉ cone ∅; x not optimal, c ∉ cone {a2}; x not optimal, c ∉ cone {a3, a4}.

We give an optimality criterion with a more geometrical flavor.

Lemma 21. Let A be an m × n matrix, let b ∈ Rm be a column vector and c ∈ Rn a row vector. Let a1, . . . , am be the rows of A. If x ∈ Rn is such that Ax ≤ b, then the following are equivalent:

(1) x is an optimal solution of max{cx : Ax ≤ b, x ∈ Rn}, and
(2) c ∈ cone {ai : aix = bi, i ∈ {1, . . . ,m}}.

Proof. If (2) is not true, then by Theorem 8 (Farkas’ Lemma) there is a column vector d ∈ Rn such that cd < 0, and aid ≥ 0 for all i with aix = bi. But then A(x − λd) ≤ b for a small enough λ > 0 and c(x − λd) > cx for all λ > 0, hence x is not optimal and (1) is false. Let B := {i ∈ {1, . . . ,m} : aix = bi}. If (2) is true, then there exist λi ≥ 0 for i ∈ B such that c = ∑_{i∈B} λiai. But then for any x′ ∈ Rn such that Ax′ ≤ b we have

(69) cx′ = ∑_{i∈B} λiaix′ ≤ ∑_{i∈B} λibi = ∑_{i∈B} λiaix = cx,

so x is optimal. □

Observe that the above lemma states nothing else than:

    x is an optimal solution of max{cx : Ax ≤ b} if and only if there exists a feasible solution y of min{yb : yA = c, y ≥ 0} so that the complementary slackness condition holds for the pair (x, y).

So there is nothing here that could not be proved from the strong duality theorem and complementary slackness. The reason for including this lemma is to give some geometrical insight into optimality (see Figure 2; the feasible set and the vectors a1, . . . , a5 in this picture are as in Figure 1).

There is a second, more algebraic way to view Lemma 21. We say that an inequality ax ≤ b is tight in x if ax = b. Lemma 21 states that if x is an optimal solution of (56), it is possible to obtain the inequality cx ≤ d (for some d ∈ R) as a nonnegative combination of inequalities that are tight in x. Then it follows that d = cx, and the upper bound on (56) thus obtained is best possible.

The direct proof of Lemma 21 is sometimes used to prove the strong duality theorem differently: for the lemma implies that if there exists an optimal solution to the primal problem, then there exists an optimal solution to the dual with the same objective value. For an alternative proof of the strong duality theorem, it would suffice to show directly that max{cx : Ax ≤ b} is attained if the problem is feasible and bounded, and then apply the above Lemma to the optimal solution to show the existence of an optimal dual solution y.
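The criterion of Lemma 21 can itself be checked by a small auxiliary LP: membership of c in the cone of the tight rows is a feasibility question. A sketch (data and candidate points are our own; scipy.optimize.linprog with a zero objective is used as a feasibility solver):

```python
# A sketch of the optimality test of Lemma 21: x is optimal for max{cx : Ax <= b}
# iff c lies in the cone of the rows that are tight at x. Cone membership is an
# LP feasibility question. Example data are ours.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 2.0], [3.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([4.0, 6.0, 0.0, 0.0])
c = np.array([1.0, 1.0])

def is_optimal(x, tol=1e-9):
    tight = A[np.abs(A @ x - b) <= tol]          # rows ai with ai x = bi
    if tight.size == 0:
        return False                              # here c != 0, so x cannot be optimal
    # Is there lam >= 0 with tight^t lam = c^t ?  (zero objective, feasibility only)
    res = linprog(np.zeros(len(tight)), A_eq=tight.T, b_eq=c,
                  bounds=[(0, None)] * len(tight))
    return res.status == 0

print(is_optimal(np.array([1.6, 1.2])))   # True:  c in cone of the two tight rows
print(is_optimal(np.array([2.0, 0.0])))   # False: c not in the cone of its tight rows
```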

2.2. Basic solutions. We are still working on the dual pair of optimization problems of Theorem 19. Recall from linear algebra that the rowspace of a matrix A is rowspace(A) := {yA : y ∈ Rm}.

Theorem 22. Let A be an m × n matrix, b ∈ Rm a column vector and c ∈ Rn a row vector. Let a1, . . . , am be the rows of A. Assume that max{cx : Ax ≤ b, x ∈ Rn} is feasible and bounded. Then there exists a set B ⊆ {1, . . . ,m} such that

(1) {ai : i ∈ B} is a basis of the rowspace of A;
(2) any x ∈ Rn such that aix = bi for all i ∈ B is an optimal solution of max{cx : Ax ≤ b}; and
(3) any y ∈ Rm such that yA = c and yi = 0 for all i ∉ B is an optimal solution of min{yb : yA = c, y ≥ 0}.

Proof. By Theorem 19, max{cx : Ax ≤ b} has an optimal solution. Let us choose an optimal solution x ∈ Rn so that the total number of inequalities from Ax ≤ b that are tight in x is as large as possible: i.e. such that |{i ∈ {1, . . . ,m} : aix = bi}| is maximal.

Since x is optimal, it follows from Lemma 21 that c ∈ cone {ai : aix = bi}. By Theorem 11, there exists a set of indices B ⊆ {1, . . . ,m} so that

(1’) {ai : i ∈ B} is a basis of lin.hull {ai : aix = bi};
(2’) aix = bi for all i ∈ B; and
(3’) c ∈ cone {ai : i ∈ B}.

(In fact, Caratheodory’s Theorem only implies that there is a linearly independent set satisfying (2’) and (3’), but it is easy to see that we may add vectors from {ai : aix = bi} to obtain a basis as in (1’).)

Suppose that (1) does not hold. Then

(70) lin.hull {ai : aix = bi} ≠ rowspace(A) = lin.hull {ai : i ∈ {1, . . . ,m}}.

Then there must be a vector d orthogonal to all ai such that aix = bi, but not orthogonal to all rows ai of A, so Ad ≠ 0. Consider the line ℓ : x + λd. Since Ad ≠ 0, ℓ cannot be completely contained in {x ∈ Rn | Ax ≤ b}, so at least one of

(71) min{λ ∈ R : A(x+ λd) ≤ b} or max{λ ∈ R : A(x+ λd) ≤ b}

exists. Let λ∗ be an optimal solution of the maximization or the minimization problem, andlet x∗ := x+λ∗d. Then there is an i∗ 6∈ B such that the linear inequality ai∗x ≤ bi∗ is satisfiedwith equality at x∗. Now

(72) i∗ 6∈ {i : aix = bi} ⊆ {i : aix∗ = bi} 3 i∗,

and x∗ is an optimal solution of max{cx : Ax ≤ b} by lemma 21 and (3’). This is a contradictionto our choice of x. So (1) is true.

If (2) is not true, then there exists an x′ ∈ Rn such that aix′ = bi for all i ∈ B, which is not

an optimal solution of max{cx : Ax ≤ b}. Note that any feasible x ∈ Rn such that aix = bi forall i ∈ B is optimal by (3’) and Lemma 21. So x′ is not feasible. Let d := x′ − x. Then theline ` : x+ λd is not completely contained in {x ∈ Rn|Ax ≤ b}, and again either the minimumor the maximum in (71) exists. We can deduce a contradiction as before.

We leave it to the reader to show (3). �


Figure 3. Basic and nonbasic solutions. From left to right: x basic optimal solution, x nonbasic optimal solution, x basic optimal solution.

When considering the optimization problem max{cx : Ax ≤ b}, we say that a set B ⊆ {1, . . . ,m} is a basis if it satisfies (1) in the above theorem, and that the basis is optimal if both (2) and (3) hold. An x as in (2) is then a basic optimal solution of max{cx : Ax ≤ b} (see Figure 3); such an x necessarily exists, since a system of |B| ≤ n linear equations in n unknowns must have a solution. A y as in (3) is a basic optimal solution of min{yb : yA = c, y ≥ 0}; since B is a basis there exists exactly one such y (exercise). To construct basic optimal solutions, it suffices to know an optimal basis.

Recall that the rank of an m × n matrix A satisfies rank(A) = n − dim ker(A), and equals the dimension of the rowspace of A.

Corollary 22.1. If the problem max{cx : Ax ≤ b, x ∈ Rn} is feasible and bounded, it has an optimal solution satisfying at least rank(A) inequalities from Ax ≤ b with equality.

An application of this corollary is the transportation problem of Chapter 1: this problem has k + l + kl inequality constraints: k for factory productivity, l for city demands and kl nonnegativity constraints. The problem has kl variables, and it is easy to verify that the constraint matrix has rank kl. So there is an optimal solution for which at least kl of the constraints are tight. At least kl − k − l of these tight constraints must be nonnegativity constraints, as there are only k + l other constraints. This implies that there is an optimal solution such that at most k + l (out of kl) variables are nonzero. So an optimal transportation plan need not be complicated.
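To see this sparsity on a concrete instance, the sketch below solves a small transportation problem (two factories, three cities; all data made up) with scipy.optimize.linprog and counts the nonzero variables of the optimal plan:

```python
# Solving a small transportation problem (k = 2 factories, l = 3 cities) and
# counting the nonzero variables in an optimal plan; by the discussion above at
# most k + l of them need to be nonzero. Data are made up for illustration.
import numpy as np
from scipy.optimize import linprog

k, l = 2, 3
p = np.array([[8.0, 6.0, 10.0],
              [9.0, 12.0, 13.0]])          # unit transportation costs p_ij
cap = np.array([30.0, 40.0])               # factory capacities c_i
dem = np.array([20.0, 25.0, 15.0])         # city demands d_j

# Variables x_ij, flattened row by row. Capacity rows: sum_j x_ij <= c_i.
F = np.kron(np.eye(k), np.ones((1, l)))
# Demand rows: sum_i x_ij >= d_j, written as -sum_i x_ij <= -d_j.
G = np.kron(np.ones((1, k)), np.eye(l))

res = linprog(p.ravel(),
              A_ub=np.vstack([F, -G]),
              b_ub=np.concatenate([cap, -dem]),
              bounds=[(0, None)] * (k * l))

x = res.x.reshape(k, l)
print("optimal plan:\n", np.round(x, 3))
print("nonzero variables:", np.count_nonzero(x > 1e-7), "<=", k + l)
```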

2.3. An abstract view. There is more symmetry in the duality theorem than is apparent from Theorem 19. We give an abstract version of the duality theorem that should make this symmetry more clear.

Theorem 23. Let L ⊆ Rn be a linear space, and let s, t ∈ {1, . . . , n}. Then

(73) max{xs : x ∈ L, xi ≥ 0 if i ≠ s, xt = 1} + max{yt : y ∈ L⊥, yi ≥ 0 if i ≠ t, ys = 1} = 0,

provided that both maximization problems in (73) are feasible.

The theorem follows from the following version of the ‘basic solutions’ theorem. For a linear space L, a basis B, and an e ∉ B, let CL(B, e) denote the unique x ∈ L such that xe = 1 and xi = 0 if i ∉ B ∪ {e}.

Theorem 24. Let L ⊆ Rn be a linear space, and let s, t ∈ {1, . . . , n}. If both maximization problems in (73) are feasible, then there exists a basis B ⊆ {1, . . . , n} of L such that

(1) s ∈ B ∌ t;
(2) CL(B, t)i < 0 only if i = s;
(3) CL⊥(B̄, s)i < 0 only if i = t.

Here B̄ := {1, . . . , n} \ B. Note that B̄ is a basis of L⊥ iff B is a basis of L.


3. Constructing the dual

3.1. The dual of the dual. Since max{cx : Ax ≤ b, x ∈ Rn} is a problem to which any linear optimization problem can be reduced, we should be able to construct a dual to any linear optimization problem. A first question that comes to mind is: what is the dual of the dual of max{cx : Ax ≤ b, x ∈ Rn}; i.e. what is the dual of the linear optimization problem min{yb : yA = c, y ≥ 0, y ∈ Rm}? Observe that min{yb : yA = c, y ≥ 0, y ∈ Rm} equals

(74)
        −max{ (−bt)z :  ⎡  At ⎤       ⎡  ct ⎤
                        ⎢ −At ⎥  z ≤  ⎢ −ct ⎥ ,  z ∈ Rm }.
                        ⎣ −I  ⎦       ⎣  0  ⎦

The dual of this problem is

(75)
        −min{ [u u′ w] ⎡  ct ⎤              ⎡  At ⎤
                       ⎢ −ct ⎥ : [u u′ w]   ⎢ −At ⎥ = −bt, [u u′ w] ≥ 0, [u u′ w] ∈ Rn+n+m },
                       ⎣  0  ⎦              ⎣ −I  ⎦

which, after transposing and simplifying some expressions, equals

(76) −min{−c(u′ − u)t : A(u′ − u)t ≤ b, u, u′ ≥ 0, u, u′ ∈ Rn}.

Substituting x for (u′ − u)t, we obtain max{cx : Ax ≤ b, x ∈ Rn}. So taking the dual of the dual brings back the original problem.

3.2. Construction of a standard dual. Again, since any linear optimization problem can be reduced to a problem of the form max{cx : Ax ≤ b}, we should be able to construct a dual for any given linear optimization problem. This is possible in principle; it involves rewriting the constraints to a single matrix inequality Ax ≤ b, then applying Theorem 19, followed by more matrix manipulations. The above computation of the dual of the dual is an example. It can be a messy job, and there is no reason to assume that the result of such manipulations is unique. We state a version of the duality theorem that contains all possible duality theorems almost trivially as a special case. With this theorem, the problem of constructing a dual becomes an exercise in substitution. Moreover, this procedure gives a ‘standard’ dual for each primal problem.

Theorem 25. Let A, B, C, D, E, F, G, H, K be matrices, let d, e, f be row vectors and let a, b, c be column vectors of appropriate sizes. Then

(77)    max{ dx + ey + fz : Ax + By + Cz ≤ a, Dx + Ey + Fz = b, Gx + Hy + Kz ≥ c, x ≥ 0, z ≤ 0 }
      = min{ ua + vb + wc : uA + vD + wG ≥ d, uB + vE + wH = e, uC + vF + wK ≤ f, u ≥ 0, w ≤ 0 }

provided the maximization problem, or the minimization problem, is both feasible and bounded.

The proof of this theorem is an exercise. Note that each variable of the primal problem corresponds to a constraint of the dual and vice versa. The exact correspondence laid down in Theorem 25 is summarized in the following table.


        maximization problem                                minimization problem
        ‘≤’ constraint                                      ‘≥ 0’ variable
        ‘=’ constraint                                      free variable
        ‘≥’ constraint                                      ‘≤ 0’ variable
        ‘≥ 0’ variable                                      ‘≥’ constraint
        free variable                                       ‘=’ constraint
        ‘≤ 0’ variable                                      ‘≤’ constraint
        coefficient at i-th variable in j-th constraint     coefficient in i-th constraint at j-th variable
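As a worked illustration of the table (the numbers are our own), take the primal max{2x1 + 3x2 : x1 + x2 ≤ 4, x1 − x2 = 1, x1 ≥ 0, x2 free}. Reading the table row by row gives the dual min{4u + v : u + v ≥ 2, u − v = 3, u ≥ 0, v free}. The sketch below solves both with scipy and confirms that the optima coincide:

```python
# Reading off a dual with the correspondence table, and checking it numerically.
# Primal (our example): max{ 2x1 + 3x2 : x1 + x2 <= 4, x1 - x2 = 1, x1 >= 0, x2 free }
# Dual from the table:  min{ 4u + v   : u + v >= 2,    u - v = 3,   u >= 0,  v free }
from scipy.optimize import linprog

# Primal in scipy's min-form: minimize -(2x1 + 3x2).
primal = linprog([-2.0, -3.0],
                 A_ub=[[1.0, 1.0]], b_ub=[4.0],
                 A_eq=[[1.0, -1.0]], b_eq=[1.0],
                 bounds=[(0, None), (None, None)])

# Dual: the '>=' constraint u + v >= 2 is written as -u - v <= -2.
dual = linprog([4.0, 1.0],
               A_ub=[[-1.0, -1.0]], b_ub=[-2.0],
               A_eq=[[1.0, -1.0]], b_eq=[3.0],
               bounds=[(0, None), (None, None)])

print("primal optimum:", -primal.fun)   # 9.5
print("dual optimum:  ", dual.fun)      # 9.5
```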

The slack of an inequality ax ≤ b is b − ax. The complementary slackness condition for a pair of feasible solutions of the primal and dual problem is that if an inequality of the primal (resp. dual) problem has nonzero slack in the given solution, then the dual (resp. primal) variable corresponding to that inequality is zero in the given solution, i.e. has no slack in its nonnegativity constraint. Hence the name ‘complementary slackness condition’. A feasible primal-dual solution is optimal if and only if it satisfies complementary slackness.

Corollary 25.1 (‘Complementary slackness’). For feasible solutions x, y, z and u, v, w of the above maximization and minimization problems, the following are equivalent:

(1) (x, y, z) and (u, v, w) are both optimal.
(2) (a) if xi > 0 then the i-th column of uA + vD + wG ≥ d holds with equality, for all i,
    (b) if zk < 0 then the k-th column of uC + vF + wK ≤ f holds with equality, for all k,
    (c) if ul > 0 then the l-th row of Ax + By + Cz ≤ a holds with equality, for all l, and
    (d) if wn < 0 then the n-th row of Gx + Hy + Kz ≥ c holds with equality, for all n.

Example: the transportation problem. We consider again the transportation problem of Chapter 1,

(78)
        min{ ∑_{i=1}^{k} ∑_{j=1}^{l} pij xij :  ∑_{j=1}^{l} xij ≤ ci  for i = 1, . . . , k,
                                                ∑_{i=1}^{k} xij ≥ dj  for j = 1, . . . , l,
                                                xij ≥ 0               for i = 1, . . . , k, j = 1, . . . , l },

where capacities ci, demands dj and unit transportation costs pij are given. In matrix notation, this problem takes the form

(79) min{px : Fx ≤ c, Gx ≥ d, x ≥ 0, x ∈ Rkl},

where F is the k × kl matrix with fi,i′j = 1 if i = i′ and fi,i′j = 0 otherwise, G is the l × kl matrix with gj,ij′ = 1 if j = j′ and gj,ij′ = 0 otherwise, and c ∈ Rk, d ∈ Rl and p ∈ Rkl are the capacity, demand and transportation cost vectors. It is an exercise to show that the dual of (79) is

(80) max{zd− yc : zG− yF ≤ p, y ≥ 0, z ≥ 0, y ∈ Rk, z ∈ Rl}.

Let us translate this matrix version of the dual back to more down-to-earth terms. In the dual, we have a variable for each constraint of the primal problem; in particular, we have a variable yi for each factory i and a variable zj for each city j. The constraints of the dual each correspond to a variable of the primal problem: e.g. the ij-th column of the matrix inequality


zG− yF ≤ p is zj − yi ≤ pij . So the dual is

(81)
        max{ ∑_{j=1}^{l} zj dj − ∑_{i=1}^{k} yi ci :  yi ≥ 0         for i = 1, . . . , k,
                                                      zj ≥ 0         for j = 1, . . . , l,
                                                      zj − yi ≤ pij  for i = 1, . . . , k, j = 1, . . . , l }.

We will take the dualization process one step further and try to interpret the dual in the same economic setting that gave rise to the primal problem. Note that zj − yi is compared to pij; this suggests that the dual variables zj and yi have the same dimension as pij, which is ‘money per unit’. Suppose yi is the price per unit for goods at the front door of factory i and zj is the price per unit one is willing to pay for these goods in city j. Then there is a reasonable explanation for the constraint that zj ≤ yi + pij for each i, j. If citizens have complete knowledge of transportation costs and current factory prices, no citizen is willing to pay more than the cost at any given factory plus the amount it takes to transport from that factory to his or her city. So the feasible region of the dual problem may be interpreted as the range of market prices in a market where the buyers have complete information. The objective ∑_{j=1}^{l} zj dj − ∑_{i=1}^{k} yi ci is a lower bound for the total amount spent by citizens minus an upper bound for the total amount paid to factories, which is a lower bound for the amount of money spent on transportation. It is not hard to verify formally that the dual maximum is less than or equal to the primal minimum: if xij ≥ 0 are such that ∑_i xij ≥ dj for all j and ∑_j xij ≤ ci for all i, and yi ≥ 0, zj ≥ 0 are such that zj − yi ≤ pij for all i, j, then

(82) ∑_{i,j} pij xij ≥ ∑_{i,j} (zj − yi) xij = ∑_j zj (∑_i xij) − ∑_i yi (∑_j xij) ≥ ∑_j zj dj − ∑_i yi ci.

Strong duality implies that equality is attained by the optimal solutions.

Exercises

(1) Prove: min{cx : Ax ≥ b, x ≥ 0, x ∈ Rn} = max{yb : yA ≤ c, y ≥ 0, y ∈ Rm} provided that the minimum, or the maximum, is both feasible and bounded (mimic the proof of Theorem 19). Give a ‘complementary slackness condition’ for a feasible pair x, y that is equivalent to the optimality of both x and y.

(2) Let A = ⎡  1   2 ⎤          ⎡ 12 ⎤
            ⎢  1   0 ⎥          ⎢  4 ⎥
            ⎢  1  −3 ⎥  and b = ⎢  1 ⎥ .
            ⎢ −3  −1 ⎥          ⎢ −3 ⎥
            ⎣ −2   2 ⎦          ⎣  6 ⎦
    Let P := {x ∈ R2 : Ax ≤ b}. Draw P in the plane. Prove that cx ≤ d holds for all x ∈ P, for each of the following c ∈ R2 and d ∈ R.
    (a) c = (−1, 0), d = 1,
    (b) c = (0,−1), d = 0,
    (c) c = (1, 1), d = 9, and
    (d) c = (−1,−2), d = −3.
    Hint: read Corollary 19.1.

(3) Finish the proof of Theorem 22.
(4) Let B be an optimal basis, i.e. a B satisfying (1)-(3) of Theorem 22.
    (a) Prove that {x ∈ Rn : aix = bi for all i ∈ B} = x + ker(A) for some x ∈ Rn.
    (b) Prove that there is exactly one y such that yA = c and yi = 0 for all i ∉ B.

Page 36: Linear Optimization - Eindhoven University of …rudi/opt.pdfLinear optimization problems can be solved on a very large scale, using e cient imple-mentations of the method explained

36 2. LINEAR OPTIMIZATION DUALITY

(5) Prove that the transportation problem has an optimal solution with at most k + l − 1 nonzero variables.
(6) Prove Theorem 25.
(7) Construct the dual of the following linear optimization problems.
    (a) max{cx : Ax = b, l ≤ x ≤ u, x ∈ Rn}, where A is an m × n matrix, b ∈ Rm and c, l, u ∈ Rn.
    (b) min{fx : Ax = b, Cx ≤ d, x ∈ Rn}, where A is an m × n matrix, b ∈ Rm, C is a k × n matrix, d ∈ Rk, and f ∈ Rn.
(8) Prove that the dual of min{px : Fx ≤ c, Gx ≥ d, x ≥ 0, x ∈ Rn} is equal to max{zd − yc : zG − yF ≤ p, y ≥ 0, z ≥ 0, y ∈ Rk, z ∈ Rl}.
(9) Determine the dual of:
    (a) max{−x1 + 2x2 + x3 : 2x1 + x2 − x3 ≤ −2, −x1 + 4x2 ≤ 3, x2 − x3 ≤ 0, x ∈ R3};
    (b) max{5x1 + x3 : −x1 + x2 ≥ 2, x1 + 4x2 + x3 ≤ 3, x1, x2, x3 ≥ 0, x ∈ R3}.

(10) Let A be an m × n matrix and let b ∈ Rm, and let P := {x ∈ Rn : Ax ≤ b}. Consider the problem of finding a ball Bn(x0, r) := {x ∈ Rn : ‖x − x0‖ ≤ r} with center x0 and radius r, such that Bn(x0, r) ⊆ P, and the radius r is as large as possible. Formulate this problem as a linear optimization problem. Determine the dual of this problem.

(11) Let L ⊆ Rn be a linear space and let s, t ∈ {1, . . . , n}, s ≠ t. Show that if
    (a) there exists an x ∈ L such that xs > 0 and xi ≥ 0 for all i ≠ s, t, and
    (b) there exists a y ∈ L⊥ such that yt > 0 and yi ≥ 0 for all i ≠ s, t,
    then
        max{xt : x ∈ L, xs = 1, xi ≥ 0 for all i ≠ s, t} = −max{ys : y ∈ L⊥, yt = 1, yi ≥ 0 for all i ≠ s, t}.
(12) The diet problem is the problem of finding a cheapest combination of foods that will satisfy all the daily nutritional needs of a person or animal. There are n types of food and m nutritional characteristics (vitamins, minerals, calories, dietary fibre, etc.). Given are a unit price ci for each food i, a required amount bj for each characteristic j, and the amount aij of characteristic j present in a unit of food i, for each i, j.
    (a) Formulate the diet problem as a linear optimization problem.
    (b) Determine the dual of this problem.
    (c) Interpret the constraints and variables of the dual.
(13) Tschebyshev approximation is the following. Given are row vectors a1, . . . , am ∈ Rn and numbers b1, . . . , bm ∈ R. We seek a smallest t ∈ R such that there exists an x ∈ Rn with the property that |aix − bi| ≤ t for i = 1, . . . ,m. Formulate the Tschebyshev approximation problem as a linear optimization problem, and determine the dual of this problem.

(14) Structural optimization is the problem of finding a bars-and-joints structure capable of resisting a given set of external forces on its joints, using as little material as possible. Given are a finite set J of possible joints, each joint j ∈ J having coordinates pj ∈ R3, an external force fj ∈ R3 acting on each joint j, and a set B of unordered pairs of joints. A subset K ⊆ J of the joints is fixed to the wall or floor. Between each pair of joints {i, j} ∈ B, you may place a bar of strength sij, meaning that such a bar can withstand a force ≤ sij pushing together or pulling apart its ends i and j. The amount of material needed for a bar is proportional to both its length and its required strength. The bars act like springs: when a framework is subject to an external force, the joints are displaced slightly (the displacement is negligible), compressing or pulling the bars, which respond by exerting an (opposite) force on the joints they connect. Equilibrium is reached when the forces in each joint add up to zero.
    (a) Let xij be the force with which bar {i, j} pushes its endpoints apart (when xij is negative the bar pulls its endpoints together). Give the equations that describe when a framework is in equilibrium. Can there be more than one solution?
    (b) Give an expression for the total amount of material needed in a framework with bars of strength sij.
    (c) Formulate the structural optimization problem as a linear optimization problem with variables xij and sij for each {i, j} ∈ B. Can there be more than one optimal solution?
    (d) Determine the dual of this problem. Give a mechanical interpretation of the variables and constraints.
    (e) Suppose now that a type of rope is available capable of holding any force pulling its ends apart, of negligible cost. Formulate the problem that arises now.
    (f) Can you incorporate the effect of gravity in your model?


CHAPTER 3

Polyhedra

1. Polyhedra and polytopes

1.1. Polyhedra. A set P ⊆ Rn is a polyhedron if P = {x ∈ Rn : Ax ≤ b}, for some m × n matrix A and vector b ∈ Rm, i.e. a polyhedron is the solution set of a system of finitely many linear inequalities. The set of solutions to just one linear inequality ax ≤ b is the closed affine halfspace, or just halfspace, H≤a,b := {x ∈ Rn : ax ≤ b}. Halfspaces are clearly both closed and convex. Since a polyhedron is the intersection of finitely many halfspaces, it follows that all polyhedra are closed, convex sets.

1.2. Polytopes. We say that y ∈ Rn is a convex combination of x1, . . . , xk ∈ Rn, if there exist λ1, . . . , λk ≥ 0 such that y = λ1x1 + · · · + λkxk and λ1 + · · · + λk = 1. The line segment [x, y], for example, is the set of all convex combinations of x and y.

Lemma 26. If C ⊆ Rn is a convex set and x1, . . . , xk ∈ C then any convex combination of x1, . . . , xk is in C.

The convex hull of a set of vectors X ⊆ Rn is the set of all convex combinations of finite sets of vectors in X: conv.hull X := {λ1x1 + · · · + λkxk : k ∈ N, xi ∈ X, λ1 + · · · + λk = 1, λi ≥ 0}. It is clear from Lemma 26 that any convex set containing X will contain the convex hull of X. The convex hull of any set is, as its name suggests, a convex set. A polytope¹ is the convex hull of a finite set of points, i.e. P is a polytope if there are k vectors x1, . . . , xk such that

(83) P = conv.hull {x1, . . . , xk} = {λ1x1 + · · · + λkxk : λ1 + · · · + λk = 1 and λ1, . . . , λk ≥ 0}.

We shall prove the following theorem in the next section. Meanwhile, see Figure 1.

Theorem 27. Let P ⊆ Rn. Then P is a polytope if and only if P is a bounded polyhedron.

¹The terminology is from the Greek: poly=many, hedron=side, topos=point.

Figure 1. A bounded polyhedron, or polytope P ⊆ R3



Figure 2. Lineality space and cone of directions (left: an element d of the lineality space; right: an element d of the cone of directions).

1.3. Lineality space and cone of directions. Let P ⊆ Rn be a convex set. The lineality space of P is the linear space

(84) lin(P ) := {d ∈ Rn : {x+ λd : λ ∈ R} ⊆ P for all x ∈ P},

and the cone of directions of P is

(85) dir(P ) := {d ∈ Rn : {x+ λd : λ ∈ R, λ ≥ 0} ⊆ P for all x ∈ P}.

See Figure 2. It follows directly from these definitions that lin(P ) ⊆ dir(P ). It is an exercise to show that if P = {x ∈ Rn : Ax ≤ b}, then lin(P ) = {d ∈ Rn : Ad = 0} = ker(A) and dir(P ) = {d ∈ Rn : Ad ≤ 0}. If P is a polytope, then lin(P ) = dir(P ) = {0}.

1.4. Affine hull and dimension. Let x, y ∈ Rn. The line through x and y is the set

(86) < x, y >:= {x+ λ(y − x) : λ ∈ R}.

A set W ⊆ Rn is affine if < x, y > ⊆ W for all x, y ∈ W. We say that y ∈ Rn is an affine combination of x1, . . . , xk if there exist λ1, . . . , λk ∈ R so that

(87) y = λ1x1 + · · ·+ λkxk and λ1 + · · ·+ λk = 1.

The affine hull of a set X ⊆ Rn is the set of all affine combinations of points in X:

(88) aff.hull (X) := {λ1x1 + · · ·+ λkxk : k ∈ N, xi ∈ X,λ1 + · · ·+ λk = 1, λi ∈ R}

and one can show that

(89) aff.hull (X) =⋂{W : W ⊇ X,W affine},

i.e. the affine hull of X is the intersection of all affine sets containing X; it can furthermore be shown that aff.hull (X) is an affine set itself (exercise).

A finite set {x1, . . . , xk} ⊆ Rn is affinely independent if for all λ1, . . . , λk ∈ R we have

(90) λ1x1 + · · ·+ λkxk = 0 and λ1 + · · ·+ λk = 0⇒ λ1 = · · · = λk = 0.

The dimension of a set X ⊆ Rn is dim(X) := max{|I| : I ⊆ X, I affinely independent} − 1. It can be shown that dim(X) = dim(aff.hull (X)) for any X ⊆ Rn.


Figure 3. Faces of a polytope P ⊆ R3: a vertex, an edge, a facet.

1.5. Faces. Let P ⊆ Rn be a convex set, and let c ∈ Rn, d ∈ R. The hyperplane Hc,d := {x ∈ Rn : cx = d} is a supporting hyperplane of P if max{cx : x ∈ P} = d. In geometrical terms: H is a supporting hyperplane of P if all of P is on one side of H, but H touches P.

We say that F ⊆ Rn is a face of P if F = P, or F = P ∩ H for some supporting hyperplane H of P; in the latter case, H is said to define F. Clearly, the face F = P ∩ Hc,d is exactly the set of optimal solutions to max{cx : x ∈ P}.

Let P be a polyhedron. A point v ∈ P is a vertex of P if {v} is a face of P. We denote the set of all vertices of P by V (P ). An edge of a polyhedron is a face F such that dim(F ) = 1. A face F is a facet of P if dim(F ) = dim(P ) − 1. See Figure 3.

1.6. Examples of polyhedra / bounded polytopes.

(1) The n-simplex Sn is the convex hull of the n + 1 standard basis vectors in Rn+1:
        Sn := conv.hull {e1, . . . , en+1}.
    The dimension of Sn is n. We have Sn = {x ∈ Rn+1 : x ≥ 0, x1 + · · · + xn+1 = 1}.
(2) The n-cube Qn is the convex hull of all possible vectors with each entry ±1:
        Qn := conv.hull {−1, 1}n.
    We have Qn = {x ∈ Rn : −1 ≤ xi ≤ 1 for i = 1, . . . , n}. The dimension of Qn is n.
(3) The n-orthoplex On is the convex hull of ± all standard basis vectors in Rn:
        On := conv.hull {e1, −e1, . . . , en, −en}.
    We have On = {x ∈ Rn : cx ≤ 1 for all c ∈ {−1, 1}n}. The dimension of On is n.
(4) The 5 Platonic solids in R3 are polytopes. The tetrahedron is isomorphic to S3, the hexahedron or cube is Q3, and the octahedron or cross polytope is O3. The other two are the dodecahedron (20 vertices, 30 edges, 12 pentagonal facets) and the icosahedron (12 vertices, 30 edges, 20 triangular facets).
(5) There are 6 special polyhedra in R4 (polychora²), the pentachoron which is isomorphic to S4, the octachoron or tesseract Q4, and the cross polychoron O4, the 24-cell whose 24 facets are all isomorphic to the octahedron, the 120-cell whose 120 facets are all isomorphic to the dodecahedron, and the 600-cell whose 600 facets are tetrahedra.

²A ‘polychoron’ is a 4-dimensional polyhedron. Greek: choros=space, room.


2. Faces, vertices and facets

2.1. Faces of polyhedra. The aim of this section is to show one direction of Theorem 27, that a bounded polyhedron is a polytope; precisely, we show that a bounded polyhedron is always the convex hull of its set of vertices.

Lemma 28. Let A be an m × n matrix, let b ∈ Rm and let P := {x ∈ Rn : Ax ≤ b}. Let a1, . . . , am be the rows of A. Then the following are equivalent for a nonempty set F ⊆ P :

(1) F is a face of P ; and
(2) F = {x ∈ P : aix = bi for all i ∈ J} for some J ⊆ {1, . . . ,m}.

Proof. (1) ⇒ (2): Suppose F is a face of P. If F = P, then (2) holds taking J = ∅. Otherwise, F is the set of optimal solutions of max{cx : Ax ≤ b} for some c ∈ Rn. Let y be an optimal solution of the dual problem min{yb : yA = c, y ≥ 0}. By complementary slackness, x is an optimal solution of max{cx : Ax ≤ b} if and only if x is feasible and yi > 0 ⇒ aix = bi for all i. In other words, F = {x ∈ P : aix = bi for all i ∈ J} with J := {i ∈ {1, . . . ,m} : yi > 0}.

(2) ⇒ (1): Suppose F = {x ∈ P : aix = bi for all i ∈ J} ≠ ∅ for some J ⊆ {1, . . . ,m}. If J = ∅, then F = P and we are done. Otherwise, let c := ∑_{i∈J} ai and d := ∑_{i∈J} bi. Then

(91) cx = ∑_{i∈J} aix ≤ ∑_{i∈J} bi = d

for all x ∈ P, with equality if and only if aix = bi for all i ∈ J, i.e. if and only if x ∈ F. Since F ≠ ∅, it follows that F is the set of optimal solutions of max{cx : Ax ≤ b}. □

It follows directly that a polyhedron has only finitely many faces, as there are finitely many subsets J of {1, . . . ,m}. In particular, a polyhedron has only finitely many vertices. Any face of a polyhedron is again a polyhedron.

Lemma 29. Let P ⊆ Rn be a polyhedron and let c ∈ Rn. If max{cx : x ∈ P} is feasible and bounded, and lin(P ) = {0}, then max{cx : x ∈ P} is attained by a vertex of P.

Proof. Let A and b be such that P = {x ∈ Rn : Ax ≤ b}, and let a1, . . . , am be the rows of A. Then ker(A) = lin(P ) = {0}, hence rank(A) = n. By Theorem 22 there exists a set B ⊆ {1, . . . ,m}, so that {ai : i ∈ B} is a basis of rowspace(A) and F := {x ∈ Rn : aix = bi for all i ∈ B} is a set of optimal solutions of max{cx : x ∈ P}. In particular, all x ∈ F are feasible, so that F ⊆ P. By Lemma 28, F is a face of P. Since ker(A) = lin(P ) = {0}, we have rowspace(A) = Rn, thus {ai : i ∈ B} is a basis of Rn and F = {x} for some x. Then x is a vertex of P and an optimal solution of max{cx : x ∈ P}. □

We say that a polyhedron is pointed if it has at least one vertex. One can show using the above Lemma that a polyhedron P is pointed if and only if lin(P ) = {0}.

Theorem 30. If P ⊆ Rn is a bounded polyhedron, then P = conv.hull V (P ).

Proof. We may assume that P is nonempty. Since V (P ) ⊆ P, it follows directly that conv.hull (V (P )) ⊆ P. We will show that P ⊆ conv.hull V (P ) as well. Suppose this is not true, and let x ∈ P \ conv.hull V (P ). By Theorem 4, there is a vector c ∈ Rn and a d ∈ R such that the hyperplane Hc,d separates conv.hull V (P ) and x, say cz < d for all z ∈ conv.hull V (P ) and cx > d. As P is bounded, we know that lin(P ) = {0}, and that max{cx : x ∈ P} is attained. Then

(92) max{cz : z ∈ V (P )} ≤ max{cz : z ∈ conv.hull V (P )} < d < cx ≤ max{cx : x ∈ P}.

But by Lemma 29 there is a vertex of P attaining the latter maximum, a contradiction. □
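The content of Lemma 29 and Theorem 30 is easy to observe numerically for a small polyhedron. The sketch below (the polyhedron and the random objectives are our own choices) enumerates the vertices of a bounded polyhedron in R2 by intersecting pairs of constraints, in the spirit of exercise (17) of this chapter, and checks that linear objectives are maximized at one of them:

```python
# A numerical sketch of Lemma 29 / Theorem 30 for a bounded polyhedron in the
# plane: enumerate the vertices from pairs of tight constraints and check that
# random objectives are maximized at a vertex. The polyhedron is our example.
import itertools
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 2.0], [3.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([4.0, 6.0, 0.0, 0.0])

vertices = []
for i, j in itertools.combinations(range(len(A)), 2):
    sub = A[[i, j]]
    if abs(np.linalg.det(sub)) < 1e-12:
        continue                                   # these two rows are not a basis of R^2
    v = np.linalg.solve(sub, b[[i, j]])
    if np.all(A @ v <= b + 1e-9):                  # keep feasible intersection points only
        vertices.append(v)
vertices = np.unique(np.round(vertices, 9), axis=0)
print("vertices of P:\n", vertices)

rng = np.random.default_rng(0)
for _ in range(3):
    c = rng.normal(size=2)
    lp_opt = -linprog(-c, A_ub=A, b_ub=b, bounds=[(None, None)] * 2).fun
    print(np.isclose(lp_opt, max(vertices @ c)))   # maximum attained at a vertex
```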


2.2. Faces of polytopes. We will now complete the proof of Theorem 27, and show that a polytope is always a bounded polyhedron, i.e. that for any set of points x1, . . . , xk ∈ Rn there is a system of finitely many linear inequalities Ax ≤ b so that conv.hull {x1, . . . , xk} = {x ∈ Rn : Ax ≤ b}.

Let X ⊆ Rn. We say that an inequality cx ≤ d is valid for X if cx ≤ d holds for all x ∈ X. For the polyhedral description of our polytope P = conv.hull {x1, . . . , xk}, we need a finite collection of inequalities that are valid for P. Clearly cx ≤ d is valid for P if and only if cx ≤ d is valid for {x1, . . . , xk}. Since a polytope P is both convex and compact, the Separation Theorem (Theorem 4) implies that the set of all valid inequalities is an adequate description of P :

(93) P = {x ∈ Rn : cx ≤ d for all c, d so that cxi ≤ d for all i}.

But this is an infinite set of linear inequalities; so we must argue that most of these valid inequalities are superfluous for the description of P. We will show that the only valid inequalities cx ≤ d we really need for the description of P are such that P ∩ Hc,d is a facet; we call such inequalities facet-defining or essential.

Lemma 31. Let y, x1, . . . , xk ∈ Rn, and let P := conv.hull {x1, . . . , xk}. Suppose that dim(P ) = n. If y ∉ P, then there is a vector c ∈ Rn and a d ∈ R such that cxi ≤ d for i = 1, . . . , k, cy > d, and P ∩ Hc,d is a facet of P.

Proof. Using that y ∈ conv.hull {x1, . . . , xk} if and only if

(94) (y, 1) ∈ cone {(x1, 1), . . . , (xk, 1)}

(writing (x, 1) ∈ Rn+1 for the vector x extended with an extra coordinate equal to 1), and that {x1, . . . , xk} is affinely independent if and only if {(x1, 1), . . . , (xk, 1)} is linearly independent, it follows from Theorem 12 that if y ∉ conv.hull {x1, . . . , xk}, then there is an affinely independent set Y ⊆ {x1, . . . , xk}, a vector c ∈ Rn and a d ∈ R such that

(1) cxi ≤ d for i = 1, . . . , k, cy > d, and
(2) cx = d for all x ∈ Y, and |Y | = dim{y, x1, . . . , xk}.

Let F := P ∩ Hc,d. To show that F is a facet of P, it suffices to show that dim(F ) = n − 1, as dim(P ) = n by assumption. Observe that

(95) Y ⊆ F ⊆ Hc,d,

so that dim(Y ) ≤ dim(F ) ≤ dim(Hc,d). Since Y is affinely independent, we have dim(Y ) = |Y | − 1 = dim{y, x1, . . . , xk} − 1 = n − 1, and clearly dim(Hc,d) = n − 1. □

Theorem 32. Let x1, . . . , xk ∈ Rn, and let P := conv.hull {x1, . . . , xk}. Let Ax ≤ b be a system of valid linear inequalities for P so that for each facet F of P there is a row aix ≤ bi of Ax ≤ b such that F = P ∩ Hai,bi. Let Cx = d be a system of linear equations such that aff.hull (P ) = {x ∈ Rn : Cx = d}. Then P = {x ∈ Rn : Ax ≤ b, Cx = d}.

Proof. Let us first assume that dim(P ) = n, so aff.hull (P ) = Rn. Clearly P ⊆ {x ∈ Rn : Ax ≤ b}; we need to show that P ⊇ {x ∈ Rn : Ax ≤ b}. Suppose y ∉ P. By Lemma 31, there is a valid inequality cx ≤ d for P with cy > d, such that F := P ∩ Hc,d is a facet of P. Let aix ≤ bi be the valid inequality from Ax ≤ b such that F = P ∩ Hai,bi. Then Hai,bi = aff.hull (F ) = Hc,d. It follows that aiy > bi, and hence y ∉ {x ∈ Rn : Ax ≤ b}.

The case where dim(P ) < n is left to the reader. □


3. Polyhedral cones

3.1. Polyhedral cones and finitely generated cones. We say that a cone C ⊆ Rn is polyhedral if there is an m × n matrix A such that C = {x ∈ Rn : Ax ≥ 0}. A cone C ⊆ Rn is finitely generated if there are vectors a1, . . . , am ∈ Rn such that C = cone {a1, . . . , am} — then a1, . . . , am are generators of C. We have:

Lemma 33. Let a1, . . . , am ∈ Rn. Then

(96) C = {x ∈ Rn : at1x ≥ 0, . . . , atmx ≥ 0} ⇒ C∗ = cone {a1, . . . , am}.

Proof. Let C = {x ∈ Rn : at1x ≥ 0, . . . , atmx ≥ 0}. It is easy to see that then C∗ ⊇ cone {a1, . . . , am}. Suppose that y ∈ C∗ \ cone {a1, . . . , am}. By Farkas’ Lemma, there exists an x ∈ Rn such that atix ≥ 0 for all i and ytx < 0. Then x ∈ C, y ∈ C∗, and ytx < 0, contradicting the definition of polar cone. So C∗ = cone {a1, . . . , am}. □

In other words, if a cone is polyhedral, then its polar is finitely generated. This saves us half the work in proving the next theorem. The ‘if’ part is due to Hermann Weyl (1935) and the ‘only if’ part is due to Hermann Minkowski (1896).

Theorem 34. Let C be a cone. Then C is polyhedral if and only if C is finitely generated.

Proof. We first show that if C is finitely generated, then C is polyhedral. Let C = cone {a1, . . . , am} for certain vectors a1, . . . , am ∈ Rn. We distinguish two cases, the case that rank({a1, . . . , am}) = n and the case that rank({a1, . . . , am}) < n. We consider first the case where rank({a1, . . . , am}) = n; then lin.hull {a1, . . . , am} = Rn. Call a row vector d ∈ Rn essential if there is a linearly independent set Y ⊆ {a1, . . . , am} such that

(1) dai ≥ 0 for all i, and
(2) da = 0 for all a ∈ Y, and |Y | = n − 1,

and such that ‖d‖ = 1. There are at most 2 such d’s for each subset Y. Since there are finitely many subsets Y of {a1, . . . , am}, it follows that there are finitely many essential vectors. Thus D := {x ∈ Rn : dx ≥ 0, d essential} is a polyhedral cone. We claim that C = D. Clearly C ⊆ D, since dai ≥ 0 for all essential d and all i. If x ∉ C, then there is an essential d such that dx < 0 by Theorem 12, hence x ∉ D. It follows that C = D. Now suppose that rank({a1, . . . , am}) < n. Let L := lin.hull {a1, . . . , am}, and let b1, . . . , bk be a set of linearly independent vectors such that L = {x ∈ Rn : btix = 0, i = 1, . . . , k}. Then rank({a1, . . . , am, b1, . . . , bk}) = n, and hence the cone C′ = cone {a1, . . . , am, b1, . . . , bk} is polyhedral. Since C = {x ∈ C′ : btix = 0, i = 1, . . . , k}, the cone C is polyhedral as well.

To see that each polyhedral cone is finitely generated, note that if C is polyhedral, then C∗ is finitely generated (by the lemma), and hence C∗ is polyhedral (by the above argument), hence C∗∗ = C is finitely generated (by the lemma applied to C∗). □

Theorem 27 is a corollary to this theorem.

Corollary 34.1. Let P ⊆ Rn. Then P is a bounded polyhedron if and only if P is a polytope.

Proof. We first show that any bounded polyhedron is a polytope. Let P be a bounded polyhedron. Then P = {x ∈ Rn : Ax ≤ b} for some m × n matrix A and vector b ∈ Rm. The cone C := {x′ ∈ Rn+1 : [A −b] x′ ≤ 0} is polyhedral, and we have

(97) P = {x ∈ Rn : (x, 1) ∈ C}.


Note that if (d, 0) ∈ C, then Ad ≤ 0 and hence d ∈ dir(P ). Since we assumed that P is bounded, there are no such d other than d = 0. By the theorem, there are a1, . . . , ak ∈ Rn+1 such that C = cone {a1, . . . , ak}. As ai ∈ C for all i, the last coordinate of each ai is nonzero. Let a′i := (1/ai,n+1) ai. Then we may write a′i = (xi, 1) for i = 1, . . . , k, and we have

(98) C = cone {(x1, 1), . . . , (xk, 1)}.

It follows that P = conv.hull {x1, . . . , xk}.

We now show that each polytope is a bounded polyhedron. Let P be a polytope. Then

P = conv.hull {x1, . . . , xk} for some x1, . . . , xk ∈ Rn. The cone

(99) C := cone {(x1, 1), . . . , (xk, 1)}

is finitely generated, and we have

(100) P = {x ∈ Rn : (x, 1) ∈ C}.

By the theorem C is polyhedral, so there is an m × (n + 1) matrix A′ such that C = {x′ ∈ Rn+1 : A′x′ ≤ 0}. Writing A′ = [A −b], it follows that P = {x ∈ Rn : Ax ≤ b}. □

Exercises

(1) Show that if C ⊆ Rn is convex and y ∈ Rn is a convex combination of x1, . . . , xk ∈ C, then y ∈ C.
(2) Show that the convex hull of any set is convex.
(3) Show that the lineality space of a convex set is a linear space.
(4) Show that if P = {x ∈ Rn : Ax ≤ b} ≠ ∅, the following sets are equal:
    (a) {d ∈ Rn : {x + λd : λ ∈ R} ⊆ P for all x ∈ P} (=: lin(P )).
    (b) {d ∈ Rn : there is an x ∈ P such that {x + λd : λ ∈ R} ⊆ P}.
    (c) {d ∈ Rn : Ad = 0}.
    Hint: prove (a) ⊆ (b) ⊆ (c) ⊆ (a).
(5) Show that the cone of directions of a convex set is a cone.
(6) Show that if P = {x ∈ Rn : Ax ≤ b} ≠ ∅, the following sets are equal:
    (a) {d ∈ Rn : {x + λd : λ ≥ 0, λ ∈ R} ⊆ P for all x ∈ P} (=: dir(P )).
    (b) {d ∈ Rn : there is an x ∈ P such that {x + λd : λ ≥ 0, λ ∈ R} ⊆ P}.
    (c) {d ∈ Rn : Ad ≤ 0}.
    Hint: prove (a) ⊆ (b) ⊆ (c) ⊆ (a).
(7) Show that if P is a polytope, then P = conv.hull V (P ).
(8) Prove Radon’s Theorem: If X is a set of at least n + 2 vectors in Rn, then there exists a partition X1, X2 of X (i.e. X1 ∪ X2 = X and X1 ∩ X2 = ∅) so that conv.hull (X1) ∩ conv.hull (X2) ≠ ∅.
(9) Show that the following are equivalent for a set W ⊆ Rn.
    (a) W is affine, i.e. < x, y > ⊆ W for all x, y ∈ W.
    (b) W = p + L for some p ∈ Rn and linear space L ⊆ Rn.
    (c) W = {x ∈ Rn : Ax = b} for some m × n matrix A and vector b ∈ Rm.
    Prove that if W satisfies one, and hence each of the above, then dim(W ) = dim(L) = n − rank(A).


(10) Let X ⊆ Rn.
    (a) Show that aff.hull (X) = ⋂{W : W ⊇ X, W affine}.
    (b) Show that aff.hull (X) is affine.
    (c) Show that dim(X) = dim(aff.hull (X)).
(11) Show that a finite set X ⊆ Rn is affinely independent if and only if the associated set {(x, 1) : x ∈ X} is linearly independent in Rn+1.
(12) Let X ⊆ Rn, and let x ∈ Rn. Prove: if x ∈ conv.hull (X), then there exists an affinely independent set Y ⊆ X so that x ∈ conv.hull (Y ).
(13) Prove Barany’s Theorem: Let X1, . . . , Xn+1 be finite subsets of Rn so that 0 ∈ conv.hull (Xi) for all i. Then there exist x1 ∈ X1, x2 ∈ X2, . . . , xn+1 ∈ Xn+1 so that 0 ∈ conv.hull {x1, . . . , xn+1}.

(14) Prove that a polytope of dimension n has at least n + 1 vertices.
(15) Let P be a polyhedron and F a face of P. Prove that F is a facet of P if and only if the only face of P that properly contains F is P.
(16) Let P be a polyhedron and F a face of P. Prove that F is a minimal face (i.e. F does not properly contain another face of P ) if and only if F = p + lin(P ) for some vector p.
(17) Prove the following statement: Let A be an m × n matrix, let b ∈ Rm and let P := {x ∈ Rn : Ax ≤ b}. Let a1, . . . , am be the rows of A. Then the following are equivalent for a point v ∈ P :
    (a) v is a vertex of P ; and
    (b) {v} = {x ∈ P : aix = bi for all i ∈ B} for some B ⊆ {1, . . . ,m} such that {ai : i ∈ B} is a basis of Rn.
    Deduce that a polyhedron defined by m inequalities has no more than m choose n vertices.

(18) We say that a polyhedron P is pointed if it has at least one vertex. Show that P is pointed if and only if lin(P ) = {0}.
(19) Let C be a closed convex set and let x lie on the boundary of C. Show that there is a face F of C with dim(F ) < dim(C) containing x.
(20) Let P ⊆ Rn be a polytope with dim(P ) = n. Show:
    (a) ∂P = ⋃{F : F is a facet of P}, where ∂P denotes the boundary of P.
    (b) if y ∉ P there are c ∈ Rn, d ∈ R such that cx ≤ d for all x ∈ P, cy > d, and P ∩ Hc,d is a facet of P.
(21) Let C be a convex set. A point v ∈ C is extreme if there are no x, x′ ∈ C \ {v} so that v ∈ [x, x′]. Show:
    (a) If C is a compact convex set, then C is the convex hull of its extreme points.
    (b) If P is a polyhedron and v ∈ P, then v is extreme if and only if v is a vertex of P.
    Conclude that a bounded polyhedron is the convex hull of its vertices.
(22) Let C be a closed convex set. A subset F ⊆ C is extreme if it is convex, and for all v ∈ F and x, x′ ∈ C \ {v} such that v ∈ [x, x′], we have x, x′ ∈ F. Show that F is an extreme set of C if and only if F is a face of C.
(23) Verify that
    (a) Sn = {x ∈ Rn+1 : x ≥ 0, x1 + · · · + xn+1 = 1}.
    (b) Qn = {x ∈ Rn : −1 ≤ xi ≤ 1 for i = 1, . . . , n}.
    (c) On = {x ∈ Rn : cx ≤ 1 for all c ∈ {−1, 1}n}.


    Hint: that the polytope is included in the polyhedron is easy to verify; to show the converse, either prove that all vertices of the polyhedron occur in the definition of the polytope, or prove that all essential inequalities of the polytope are in the definition of the polyhedron.
(24) Let x1, . . . , xk ∈ Rn and P := conv.hull {x1, . . . , xk}. Show that V (P ) ⊆ {x1, . . . , xk} and P = conv.hull V (P ).
(25) Prove Euler’s formula: for any polytope P ⊆ Rn, we have ∑_{i=0}^{n} (−1)^i fi(P ) = 1, where fi(P ) is the number of i-dimensional faces of P.

(26) A regular polytope may be defined recursively as a polytope whose vertices lie on a sphere, whose edges are all of the same length and whose facets are all isomorphic to the same regular polytope. Each of the examples in subsection 1.6 of this Chapter is a regular polytope.
    (a) Which are the 2-dimensional regular polytopes?
    (b) Prove that there are no other regular polytopes of dimension 3 than the 5 Platonic solids. Hint: use Euler’s formula and your classification of the 2-dimensional regular polytopes.
    (c) Prove that there are no other regular polytopes of dimension 4 than the 6 polychora described in subsection 1.6.
    (d) Prove that if P is a regular polytope of dimension n > 4, then P = Sn, Qn, or On.
    (e) A polytope P is vertex-transitive if for any two vertices v, w ∈ V (P ) there is an orthogonal transformation L : Rn → Rn such that L(v) = w and L[P ] = P. Are all regular polytopes vertex-transitive?

Open problem. Let P be a polytope, and let v, w ∈ V (P ) be two vertices of P. A path from v to w is a sequence of vertices v0, . . . , vk ∈ V (P ), such that [vi, vi+1] is an edge of P for each i = 0, . . . , k − 1, and v = v0, w = vk; the length of that path is k, i.e. the length of a path equals the number of edges it traverses.

Conjecture 1 (Hirsch, 1957). Let P be an n-dimensional polytope with m facets, and let v, w ∈ V (P ) be two vertices of P. Then there is a path from v to w of length at most m − n.


CHAPTER 4

The simplex algorithm

1. Tableaux and pivoting

1.1. Overview. Several methods for solving linear optimization problems currently exist. There is the Simplex method due to George Dantzig (1951), the Ellipsoid method of Leonid Khachiyan (1979), and Narendra Karmarkar’s Interior point method (1984). Theoretically, the latter two are superior methods; it can be shown that they run in polynomial time, which means that the running time of each algorithm on a theoretical computer is bounded by a polynomial function of the size of the input problem in bits. The simplex method does not have this virtue. In practice, both the simplex method and the interior point method perform well. The ellipsoid method requires very high precision calculations, and is slow in practice.

In this Chapter we describe the simplex method, which may be compared to the Gaussian elimination method for solving systems of linear equations. Gaussian elimination is a method to modify a given system of linear equations into an equivalent system of linear equations which is easily solvable. The simplex method is a technique to modify a given linear optimization problem into an equivalent one for which an optimal solution is easily found. Gaussian elimination on Ax = b is usually performed by applying row operations on the coefficient matrix [A b]. Likewise, the simplex method is performed by row operations on a ‘tableau’ containing all the coefficients of the linear optimization problem at hand. These row operations are grouped in so-called pivot steps. After each pivot step, the tableau is ‘basic’, and it is easy to read off a ‘basic’ feasible solution of the optimization problem from the tableau. These solutions are vertices of the feasible region of the problem, and the sequence of basic solutions generated by the method is a path from vertex to vertex over the edges of the feasible region (see Figure 1). Pivot steps are repeated until an optimal basic solution is found.


Figure 1. The simplex method geometrically


1.2. Tableaux. Let A be an m × n matrix, let b ∈ Rm be a column vector, let c ∈ Rn be a row vector and let d ∈ R. The tableau corresponding to the linear optimization problem max{cx + d : Ax = b, x ≥ 0} is the (m + 1) × (n + 1) block matrix

(101)    ⎡ −c  d ⎤
         ⎣  A  b ⎦ .

Note the minus sign before the c in the top row. By convention, the rows of tableau (101) are indexed 0, 1, . . . ,m and the columns 1, . . . , n, n + 1; that way, the ij-th entry of A is the ij-th entry of T.

We say that two tableaux

(102)    T = ⎡ −c  d ⎤    and    T′ = ⎡ −c′  d′ ⎤
             ⎣  A  b ⎦                ⎣  A′  b′ ⎦

are equivalent if the corresponding problems

(103) max{cx + d : Ax = b, x ≥ 0} and max{c′x + d′ : A′x = b′, x ≥ 0}

have the same set of feasible solutions and the same set of optimal solutions.

Recall that row operations on a matrix are

(1) multiplying a row by a nonzero scalar;
(2) interchanging two rows; and
(3) adding a multiple of a row to another row.

Lemma 35. Let T and T′ be tableaux. Suppose T′ is obtained from T by any row operations other than multiplying the top row by a scalar, interchanging the top row with another, or adding a multiple of the top row to another row. Then T and T′ are equivalent.

Proof. Say T and T′ are as in (102), with corresponding problems (103). If T′ is obtained from T by a row operation not involving the top row at all, then {x ∈ Rn : Ax = b} = {x ∈ Rn : A′x = b′} and then clearly the optimization problems have the same set of feasible solutions. If T′ is obtained from T by adding a scalar multiple of the i-th row to the top row, say −c′ = −c + λiai and d′ = d + λibi, then

(104) c′x + d′ = (c − λiai)x + (d + λibi) = cx + d + λi(−aix + bi) = cx + d

for all x such that Ax = b. Thus, the objective functions cx + d and c′x + d′ are identical on the common feasible set of T and T′, and hence the set of optimal solutions is the same for either objective function. □

So we have a certain freedom in changing the appearance of a linear optimization problem: we may move to an equivalent problem by row operations. We must now decide how to use this freedom.

1.3. Basic solutions and basic tableaux. Suppose we have a tableau of the form

(105)    T = ⎡ −c′  0  d ⎤
             ⎣  A′  I  b ⎦ .

Then x∗ := [0 bt]t is a solution to the system of linear equations [A′ I]x = b. If b ≥ 0, then x∗ ≥ 0, and then x∗ is a feasible solution of the optimization problem

(106) max{[c′ 0]x + d : [A′ I]x = b, x ≥ 0, x ∈ Rn}

corresponding to T. Also, [c′ 0]x∗ + d = d as [c′ 0][0 bt]t = c′0 + 0b = 0. Thus if c′ ≤ 0, then [c′ 0]x + d ≤ d for all x ≥ 0, and then x∗ is an optimal solution.

Let A be an m × n matrix and let a_j be the j-th column of A, let b ∈ Rm be a column vector, let c ∈ Rn be a row vector, let d ∈ R and let B ⊆ {1, . . . , n} be a set with |B| = m. The tableau

(107)   \left(\begin{array}{c|c} -c & d \\ A & b \end{array}\right)

is a basic tableau belonging to basis B if {a_j : j ∈ B} = {e_1, . . . , e_m} and c_j = 0 for all j ∈ B. It is easy to determine a vector x ∈ Rn such that Ax = b and x_j = 0 for all j ∉ B; this is x^B, the basic solution belonging to B. This basic solution x^B has entries 0 and b_i only, like the x∗ above, and its objective value cx^B + d equals d, since cx^B = 0. The tableau belonging to B is called feasible if b ≥ 0, dual feasible if c ≤ 0 and optimal if both feasible and dual feasible. The corresponding basic solution x^B is feasible/optimal if the tableau is feasible/optimal.

Given any tableau T = \left(\begin{array}{c|c} -c & d \\ A & b \end{array}\right) and set B such that {a_j : j ∈ B} is a basis of Rm, there is a unique basic tableau T′ corresponding to basis B that is equivalent to T (exercise).

1.4. Pivoting. Given a matrix, pivoting on the ij-th entry is adding multiples of the i-th row to other rows and dividing the i-th row so that the j-th column becomes a unit vector, with a 1 in the i-th row and 0 in the rest of the column.

Let T be a basic tableau belonging to basis B. If i∗ ≠ 0 and j∗ ∉ B, then pivoting on (i∗, j∗) yields an equivalent basic tableau T∗ corresponding to basis B∗ = B ∪ {j∗} \ {j_0}, where j_0 is the unique j ∈ B such that the i∗j-th entry of T is 1. We call the j∗-th column the entering column; the j_0-th column is the leaving column. Suppose

(108)   T = \left(\begin{array}{c|c} -c & d \\ A & b \end{array}\right),   T∗ = \left(\begin{array}{c|c} -c∗ & d∗ \\ A∗ & b∗ \end{array}\right).

Let a_j be the j-th column of A and a∗_j the j-th column of A∗. Then a_{j_0} = e_{i∗} = a∗_{j∗}. Pivoting on (i∗, j∗) amounts to applying the following row operations.

(1) dividing the i∗-th row by a_{i∗j∗};
(2) adding the i∗-th row c_{j∗} times to the top (or 0-th) row; and
(3) subtracting the i∗-th row a_{ij∗} times from the i-th row, for each i ≠ 0, i∗.

The resulting tableau T∗ is equivalent to T by Lemma 35. We deduce that after this pivot on (i∗, j∗), the entries of the resulting tableau T∗ satisfy:

(1) d∗ = d + b_{i∗} c_{j∗}/a_{i∗j∗};
(2) b∗_{i∗} = b_{i∗}/a_{i∗j∗}, and b∗_i = b_i − b_{i∗} a_{ij∗}/a_{i∗j∗} for i ≠ i∗; and
(3) c∗_j = c_j − c_{j∗} a_{i∗j}/a_{i∗j∗} if j ≠ j∗, and of course c∗_{j∗} = 0.

To improve a basic feasible tableau, we will pivot to obtain a better tableau. The above analysis will help us to decide how to choose the pivot elements (i∗, j∗).
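Expressed in code, a pivot step is just these row operations applied to the whole tableau. The following is a minimal sketch (an illustration added here, not taken from the text), operating on a tableau stored as a list of rows of exact Fractions, with row 0 holding (−c | d) and rows 1, . . . , m holding (A | b).

```python
from fractions import Fraction

def pivot(T, i_star, j_star):
    """Pivot the tableau T on entry (i_star, j_star).

    T is a list of rows of Fractions; row 0 holds (-c | d), rows 1..m hold (A | b).
    The pivot row is divided by the pivot entry, and suitable multiples of it are
    subtracted from every other row (including row 0) to clear column j_star.
    """
    p = T[i_star][j_star]
    assert p != 0, "pivot entry must be nonzero"
    T[i_star] = [entry / p for entry in T[i_star]]
    for i in range(len(T)):
        if i != i_star and T[i][j_star] != 0:
            factor = T[i][j_star]
            T[i] = [a - factor * b for a, b in zip(T[i], T[i_star])]
    return T

# Tableau (110) from the example further below; columns are 0-based here.
T = [[Fraction(v) for v in row] for row in
     [[-6, -8, -5, -9, 0, 0, 0], [2, 1, 1, 3, 1, 0, 5], [1, 3, 1, 2, 0, 1, 3]]]
pivot(T, 1, 0)   # the pivot the text writes as (1, 1)
```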

1.5. Pivot selection. Suppose we are given a basic and feasible tableau T, and want to obtain, by pivoting on some (i∗, j∗), another feasible basic tableau T∗ whose objective value is hopefully better, but certainly not worse. With T, T∗ as in (108), we have b ≥ 0 as T is feasible, and we must choose i∗, j∗ such that b∗ ≥ 0 and d∗ ≥ d, i.e.

(109)   0 ≤ b∗_{i∗} = b_{i∗}/a_{i∗j∗},   0 ≤ b∗_i = b_i − b_{i∗} a_{ij∗}/a_{i∗j∗} for i ≠ i∗,   and   d ≤ d∗ = d + b_{i∗} c_{j∗}/a_{i∗j∗}.

A straightforward analysis shows that to achieve this, we must choose

(1) j∗ such that c_{j∗} > 0; and
(2) i∗ such that a_{i∗j∗} > 0 and b_{i∗}/a_{i∗j∗} = min{b_i/a_{ij∗} : i such that a_{ij∗} > 0}.

There are two exceptional events. If there is no j such that c_j > 0, then the tableau T is optimal and the corresponding basic solution is an optimal solution. If there does not exist an i such that a_{ij∗} > 0, then there exists an unbounded direction, i.e. a nonnegative vector f such that Af = 0 and cf > 0, which implies that the problem corresponding to T is unbounded (exercise). In either case, we have solved the optimization problem corresponding to T.

Whenever b_{i∗} > 0, we will have d∗ > d, i.e. the basic solution corresponding to T∗ has a strictly better objective value than the basic solution corresponding to T. Obviously, we can repeat such pivoting steps until we either find an optimal or an unbounded tableau, thereby solving the problem we started with. However, we need an initial feasible basic tableau for the given optimization problem to start this procedure. We show how to find such a feasible tableau after the next example.
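As a hedged sketch of this selection rule (an illustration under the same tableau conventions as the pivot sketch above; exact rational entries are assumed so that comparisons are reliable), the function below returns an improving pivot, or reports optimality or unboundedness when the respective candidate sets are empty. Remember that the stored top row is −c, so c_j > 0 corresponds to a negative stored entry.

```python
def select_pivot(T):
    """Choose a pivot (i_star, j_star) for a feasible basic tableau T.

    T[0] holds (-c | d), T[i] holds (a_i | b_i) for i = 1..m.  Returns one of
      ('pivot', i_star, j_star)   -- an improving pivot element,
      ('optimal', None, None)     -- every c_j <= 0, the tableau is optimal,
      ('unbounded', None, j_star) -- some c_j > 0 but no row has a_{i j_star} > 0.
    """
    m, n = len(T) - 1, len(T[0]) - 1
    improving = [j for j in range(n) if T[0][j] < 0]   # stored row is -c, so c_j > 0 here
    if not improving:
        return ('optimal', None, None)
    j_star = improving[0]
    rows = [i for i in range(1, m + 1) if T[i][j_star] > 0]
    if not rows:
        return ('unbounded', None, j_star)
    i_star = min(rows, key=lambda i: T[i][-1] / T[i][j_star])   # minimum-ratio test
    return ('pivot', i_star, j_star)
```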

Example. Consider the basic and feasible tableau

(110)   \left(\begin{array}{cccccc|c} -6 & -8 & -5 & -9 & 0 & 0 & 0 \\ 2 & 1 & 1 & 3 & 1 & 0 & 5 \\ 1 & 3 & 1 & 2 & 0 & 1 & 3 \end{array}\right)

corresponding to basis B^{(1)} = {5, 6}. We apply the improvement algorithm. We choose j∗ = 1, which fixes i∗ = 1, hence j_0 = 5. We pivot on (1, 1) and obtain

(111)   \left(\begin{array}{cccccc|c} 0 & -5 & -2 & 0 & 3 & 0 & 15 \\ 1 & 1/2 & 1/2 & 3/2 & 1/2 & 0 & 5/2 \\ 0 & 5/2 & 1/2 & 1/2 & -1/2 & 1 & 1/2 \end{array}\right),

a basic tableau corresponding to B^{(2)} = {1, 6} = B^{(1)} \ {5} ∪ {1}. In the next iteration of the improvement algorithm, we choose j∗ = 2, i∗ = 2 and j_0 = 6. Pivoting on (2, 2) results in

(112)   \left(\begin{array}{cccccc|c} 0 & 0 & -1 & 1 & 2 & 2 & 16 \\ 1 & 0 & 2/5 & 7/5 & 3/5 & -1/5 & 12/5 \\ 0 & 1 & 1/5 & 1/5 & -1/5 & 2/5 & 1/5 \end{array}\right),

a basic tableau corresponding to B^{(3)} = {1, 2} = B^{(2)} \ {6} ∪ {2}. The next step: j∗ = 3, i∗ = 2, and j_0 = 2. Pivoting on (2, 3) we get

(113)   \left(\begin{array}{cccccc|c} 0 & 5 & 0 & 2 & 1 & 4 & 17 \\ 1 & -2 & 0 & 1 & 1 & -1 & 2 \\ 0 & 5 & 1 & 1 & -1 & 2 & 1 \end{array}\right),

a tableau belonging to basis B = {1, 3} = B^{(3)} \ {2} ∪ {3}. This tableau is optimal, as the top row contains only nonnegative elements to the left of the vertical line. The basic solution belonging to B is x^B = (2, 0, 1, 0, 0, 0)^t, with objective value cx^B = 17, where c = (6, 8, 5, 9, 0, 0) is the original objective.
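The three pivots of this example are easy to replay numerically; the following small script (an illustration added here, using floating point and numpy) starts from tableau (110), performs the pivots of the example and prints the final tableau, whose top-right entry should read 17.

```python
import numpy as np

def pivot(T, i, j):
    """Divide row i by T[i, j] and clear column j in every other row."""
    T = T.astype(float).copy()
    T[i] /= T[i, j]
    for k in range(T.shape[0]):
        if k != i:
            T[k] -= T[k, j] * T[i]
    return T

# Tableau (110): row 0 is (-c | d), rows 1-2 are (A | b).
T = np.array([[-6, -8, -5, -9, 0, 0, 0],
              [ 2,  1,  1,  3, 1, 0, 5],
              [ 1,  3,  1,  2, 0, 1, 3]], dtype=float)

for i, j in [(1, 0), (2, 1), (2, 2)]:   # the text's pivots (1,1), (2,2), (2,3); columns 0-based here
    T = pivot(T, i, j)

print(np.round(T, 6))                   # last entry of row 0 should be 17, matching tableau (113)
```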


1.6. Finding an initial basic feasible tableau. Suppose we are given a linear optimization problem of the form

(114)   max{cx : Ax ≤ b, x ≥ 0, x ∈ R^n}.

Then we may rewrite this problem to the standard form

(115)   max{ [c\ 0]\begin{pmatrix} x \\ s \end{pmatrix} : [A\ I]\begin{pmatrix} x \\ s \end{pmatrix} = b, \begin{pmatrix} x \\ s \end{pmatrix} ≥ 0, \begin{pmatrix} x \\ s \end{pmatrix} ∈ R^{n+m} },

suitable for the tableau method. If b ≥ 0, we immediately have the basic feasible tableau

(116)   \left(\begin{array}{cc|c} -c & 0 & 0 \\ A & I & b \end{array}\right),

corresponding to the basis {n + 1, . . . , n + m}, where n and m are the number of columns and rows of A. For example, the problem

(117)   max{(6, 8, 5, 9)x : \begin{pmatrix} 2 & 1 & 1 & 3 \\ 1 & 3 & 1 & 2 \end{pmatrix} x ≤ \begin{pmatrix} 5 \\ 3 \end{pmatrix}, x ≥ 0, x ∈ R^4}

gives rise to the initial basic tableau (110). It follows that an optimal solution of (117) is (2, 0, 1, 0)^t, and the optimum is 17.

If b ≱ 0, then we consider the auxiliary problem

(118)   max{−y : Ax − y1 ≤ b, x ≥ 0, y ≥ 0, x ∈ R^n, y ∈ R}.

The optimum of this problem is 0 if and only if the original problem (114) is feasible, and then an optimal basic solution is a feasible basic solution of (114). To solve (118), we need to find a feasible and basic initial tableau for (118). Such a tableau is obtained from the tableau

(119)   \left(\begin{array}{ccc|c} 0 & 0 & 1 & 0 \\ A & I & -1 & b \end{array}\right),

corresponding directly to (118), by pivoting on (i∗, j∗), where j∗ is the index of the ‘−1’-column (j∗ = n + m + 1 if A is an m × n matrix) and i∗ is such that b_{i∗} = min{b_i : i ∈ {1, . . . , m}} (exercise). Suppose the optimal tableau for the auxiliary problem is

(120)   \left(\begin{array}{cc|c} -c′ & p & d′ \\ A′ & q & b′ \end{array}\right),

corresponding to basis B. If d′ = 0 we can make sure that n + m + 1 ∉ B by one more pivot if necessary. Then the tableau

(121)   \left(\begin{array}{c|c} -c & 0 \\ A′ & b′ \end{array}\right)

is equivalent to (116) but not yet basic, as there may be j ∈ B such that c_j ≠ 0. A basic feasible tableau corresponding to B and equivalent to (116) can be obtained by adding rows to the top row in (121). This is the feasible initial tableau we need to start looking for the optimal solution of our original problem (114).


Example. Consider the problem

(122)   max{(−3, 2, 6, 13)x : \begin{pmatrix} -1 & 1 & 2 & 5 \\ 3 & -2 & -5 & -13 \end{pmatrix} x ≤ \begin{pmatrix} -1 \\ 5 \end{pmatrix}, x ≥ 0, x ∈ R^4}.

We write down the tableau corresponding to the auxiliary problem:

(123)   \left(\begin{array}{ccccccc|c} 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ -1 & 1 & 2 & 5 & 1 & 0 & -1 & -1 \\ 3 & -2 & -5 & -13 & 0 & 1 & -1 & 5 \end{array}\right).

Column 7 corresponds to the auxiliary variable. The minimum entry in the rightmost column is the −1 in row 1. We pivot on (1, 7) and obtain

(124)   \left(\begin{array}{ccccccc|c} -1 & 1 & 2 & 5 & 1 & 0 & 0 & -1 \\ 1 & -1 & -2 & -5 & -1 & 0 & 1 & 1 \\ 4 & -3 & -7 & -18 & -1 & 1 & 0 & 6 \end{array}\right),

the initial feasible tableau for the auxiliary problem, which corresponds to the basis {6, 7}. We pivot on (1, 1) and get

(125)   \left(\begin{array}{ccccccc|c} 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 1 & -1 & -2 & -5 & -1 & 0 & 1 & 1 \\ 0 & 1 & 1 & 2 & 3 & 1 & -4 & 2 \end{array}\right),

corresponding to basis B := {1, 6}. This tableau is optimal and 7 ∉ B, thus B is a feasible basis for the original problem. To get the tableau for the original problem corresponding to B we delete the 7th column and replace the top row by (minus) the objective of (122):

(126)   \left(\begin{array}{cccccc|c} 3 & -2 & -6 & -13 & 0 & 0 & 0 \\ 1 & -1 & -2 & -5 & -1 & 0 & 1 \\ 0 & 1 & 1 & 2 & 3 & 1 & 2 \end{array}\right).

This is not yet a basic tableau corresponding to B as there are j ∈ B such that the j-th entry in the top row is nonzero. In this case, there is only a problem for j = 1 ∈ B. Adding −3 times row 1 to the top row we obtain:

(127)   \left(\begin{array}{cccccc|c} 0 & 1 & 0 & 2 & 3 & 0 & -3 \\ 1 & -1 & -2 & -5 & -1 & 0 & 1 \\ 0 & 1 & 1 & 2 & 3 & 1 & 2 \end{array}\right),

a feasible basic tableau corresponding to B. This is the initial feasible tableau for (122). By sheer luck, this tableau is also optimal. The optimal basic solution is (1, 0, 0, 0, 0, 2)^t. The optimal solution of (122) is (1, 0, 0, 0)^t, with objective value −3.
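As an independent check (an illustration, not part of the text's method), problem (122) can be handed to an off-the-shelf LP solver; scipy's linprog minimizes, so the objective is negated. The optimal value should come out as −3; since the optimum is not unique, the solver may report a different optimal vertex than (1, 0, 0, 0).

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([-3, 2, 6, 13])            # objective of (122), to be maximized
A = np.array([[-1,  1,  2,   5],
              [ 3, -2, -5, -13]])
b = np.array([-1, 5])

# linprog minimizes, so we minimize -c x; bounds enforce x >= 0.
res = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None)] * 4)
print(res.status, res.x, -res.fun)      # expected optimal value: -3
```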

2. Cycling and Bland’s rule

2.1. Cycling and pivot rules. We have not yet shown that the simplex method is a finite method, i.e. that the optimal tableau is reached in finitely many pivot steps starting from an initial feasible tableau. In fact, it is possible that the simplex method as described above does not finish, but there are methods to avoid this unwanted behaviour. In particular, by being more careful when choosing pivot elements, one can guarantee that the simplex method will finish in a finite number of pivot steps.

When applying the simplex algorithm starting from a feasible tableau, we get a sequence of feasible basic tableaux T^{(1)}, T^{(2)}, T^{(3)}, . . . corresponding to bases B^{(1)}, B^{(2)}, B^{(3)}, . . . say. There are only finitely many bases, as there are only finitely many subsets of the set of columns.


If the simplex method would run indefinitely without reaching the optimal solution, then it must happen that some basis is repeated in the sequence, i.e. that B^{(s)} = B^{(t)} for some distinct s, t; this is called cycling. Since the objective value of the consecutive basic solutions is nondecreasing in the simplex algorithm, cycling can only occur when there is no improvement in the objective function going from x^{B^{(s)}} to x^{B^{(t)}}.

2.2. Bland's rule. When applying the improvement procedure for basic feasible tableaux, there is some freedom in choosing the pivot elements i∗, j∗. Using the notation of the subsection on pivot selection, we must choose

(1) j∗ ∈ {j : c_j > 0}; and
(2) i∗ ∈ {i : a_{ij∗} > 0, b_i/a_{ij∗} = µ_{j∗}}, where µ_{j∗} := min{b_i/a_{ij∗} : i such that a_{ij∗} > 0}.

A pivot rule is a rule for selecting i∗ and j∗ within these candidate sets. Recall that choosing i∗ fixes j_0 as the unique j ∈ B such that a_j = e_{i∗}.

Bland’s rule is:

choose j∗ as small as possible; choose i∗ such that j0 is as small as possible.

Using Bland’s rule, no cycling can occur.
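In code, Bland's rule is a small refinement of the earlier pivot-selection sketch (again an illustration, assuming exact rational tableau entries and an explicit record of which variable is basic in each row):

```python
def bland_pivot(T, basis):
    """Select a pivot by Bland's rule.

    T: tableau rows, T[0] = (-c | d), T[i] = (a_i | b_i) for i = 1..m.
    basis: list where basis[i-1] is the basic variable of row i.
    Returns (i_star, j_star), or None if the tableau is optimal,
    or ('unbounded', j_star) if the problem is unbounded.
    """
    m, n = len(T) - 1, len(T[0]) - 1
    entering = [j for j in range(n) if T[0][j] < 0]   # stored row is -c, so c_j > 0 here
    if not entering:
        return None                                   # optimal tableau
    j_star = min(entering)                            # smallest improving index
    rows = [i for i in range(1, m + 1) if T[i][j_star] > 0]
    if not rows:
        return ('unbounded', j_star)
    mu = min(T[i][-1] / T[i][j_star] for i in rows)   # minimum ratio
    ties = [i for i in rows if T[i][-1] / T[i][j_star] == mu]
    i_star = min(ties, key=lambda i: basis[i - 1])    # smallest leaving variable j_0
    return (i_star, j_star)
```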

Theorem 36 (Bland, 1977). When using Bland’s rule for pivot selection, no cycling occurs.

Proof. Suppose to the contrary that T^{(1)}, . . . , T^{(s)}, T^{(s+1)} is a sequence of basic tableaux belonging to bases B^{(1)}, . . . , B^{(s)}, B^{(s+1)}, so that T^{(k+1)} is the unique basic tableau obtained from T^{(k)} by pivoting according to Bland's rule, and that B^{(1)} = B^{(s+1)} (and hence T^{(1)} = T^{(s+1)}).

Call an index fickle if it occurs in some, but not all of the bases B^{(1)}, . . . , B^{(s)}. Let t be the largest fickle index, so

(128)   t := max{j ∈ X : ∃p : j ∈ B^{(p)}; ∃q : j ∉ B^{(q)}},

with X = ⋃_{k=1}^{s} B^{(k)}. Note that if j ∈ X and j > t, then j ∈ B^{(k)} for all k. Let p ∈ {1, . . . , s} be such that t ∈ B^{(p)}, t ∉ B^{(p+1)} and let q ∈ {1, . . . , s} be such that t ∉ B^{(q)}, t ∈ B^{(q+1)}. For simplicity we set B := B^{(p)}, T := T^{(p)}, B′ := B^{(q)}, T′ := T^{(q)}. So t leaves the basis at T and enters the basis at T′. Directly from the definition of t, and the fact that t is chosen according to Bland's rule, we have

(1) c′_j ≤ 0 for all j ∈ X such that j < t;
(2) c′_t > 0; and
(3) c′_j = 0 for all j ∈ X such that j > t.

Suppose j∗ is the entering variable at T. Let f be the vector such that f_{j∗} = 1, Af = 0, and {j : f_j ≠ 0} ⊆ B ∪ {j∗}. We claim that

(1) f_j ≥ 0 for all j ∈ X such that j < t; and
(2) f_t < 0.

As c − c′ ∈ rowspace(A), we have cf = c′f. Thus we arrive at

(129)   0 < c_{j∗} = cf = c′f = ∑_{j∈X} c′_j f_j = (∑_{j∈X, j<t} c′_j f_j) + c′_t f_t + (∑_{j∈X, j>t} c′_j f_j) < 0,

a contradiction. To complete the proof, the claim concerning f (that min{j ∈ X : f_j < 0} = t) must be verified. This is an exercise. □


Earlier, the more complicated lexicographic rule had been shown to prevent cycling. Dantzig himself, when he published the simplex method, showed that cycling could be avoided by ‘perturbation’ of the given optimization problem.

2.3. An abstract view. Let L ⊆ Rn be a linear space and let s, t ∈ {1, . . . , n}. Then the simplex method for solving the dual pair

(130)   max{x_s : x ∈ L, x_i ≥ 0 if i ≠ s, x_t = 1}   and   max{y_t : y ∈ L⊥, y_i ≥ 0 if i ≠ t, y_s = 1} = 0

is summarised as follows. For any basis B of L such that s ∈ B ∌ t, let x^B := C_L(B, t) and let y^B := C_{L⊥}(B, s) (the notation is from Chapters 1 and 2). Call B primal feasible if x^B is a feasible solution of the primal problem and dual feasible if y^B is a feasible solution of the dual problem. The simplex algorithm seeks a basis B that is both primal feasible and dual feasible, starting from any primal feasible solution.

Lemma 37. Suppose B is primal feasible and not dual feasible, say y^B_i < 0 for some i ≠ s. Then there is a j ≠ t such that B∗ := B − i + j is a primal feasible basis and x^B_t ≤ x^{B∗}_t, or the dual problem is infeasible.

3. The revised simplex method

3.1. Arithmetic complexity. The arithmetic complexity of an algorithm is the number of additions, subtractions, multiplications, divisions and comparisons performed in a full run of the algorithm. The arithmetic complexity is a reasonable measure of the amount of time a computer will take to run the algorithm, provided that there is not much else going on in the algorithm besides basic arithmetic, and provided that the size of the numbers involved is bounded. We recommend a solid course in complexity theory for those interested in the definition of bit complexity of an algorithm, which is a more precise measure of computational effort. For now, we will do with arithmetic complexity, as we will only want to analyze simple arithmetical procedures.

3.2. The arithmetic complexity of matrix operations and pivoting. Let A be an m × n matrix. Then a row operation takes at most 2n arithmetic operations since

(1) multiplying a row of A by a nonzero scalar takes n multiplications;
(2) adding a multiple of a row to another row of A takes n multiplications and n additions.

A pivoting operation on A takes m row operations, so at most 2mn arithmetic operations. Gaussian elimination takes at most rank(A) pivoting operations on the coefficient matrix [A b] to find a matrix in row echelon form. If A has m rows and n columns, then the arithmetic complexity of Gaussian elimination is at most rank(A) · 2(n + 1)m. Or, since rank(A) ≤ m, at most 2(n + 1)m² arithmetic operations.

The simplex method is to repeatedly apply pivoting steps on a tableau, which is a block matrix of size (m + 1) × (n + 1), say. To determine the pivot (i∗, j∗) takes at most n − 1 comparisons for finding j∗ = min{j : c_j > 0}; then m comparisons and at most m divisions to calculate {b_i/a_{ij∗} : a_{ij∗} > 0}, and finally at most m − 1 comparisons to find the minimizer i∗ of this set. The pivot itself will take at most 2(n + 1)(m + 1) arithmetic operations, in all at most n − 1 + 3m − 1 + 2(n + 1)(m + 1) arithmetic operations, which is O(nm). In practice, this ‘worst-case’ bound is indeed proportional to the amount of time needed.


3.3. The number of pivot steps in the simplex algorithm. We have determined the arithmetic complexity of one pivot step. To know the arithmetic complexity of the simplex method we need a bound on the total number of pivot steps needed to reach an optimal tableau from an initial feasible tableau. Sadly, we have no better general upper bound than the total number of bases, which is n!/((n − m)! m!) for a tableau with m + 1 rows and n + 1 columns. Even more sadly, for each n there is an example on n variables with 2n constraints where the simplex algorithm, using Bland's Rule, visits all 2^n vertices of the cube-like feasible set in question. Similar bad examples exist for all known pivot rules that provably prevent cycling. It is still theoretically possible that a pivot rule exists that prevents cycling and for which a good upper bound on the number of pivots can be derived.

The simplex method would be useless if its average behaviour was close to the worst-case bound. However, empirical studies show that the number of pivot steps is ‘usually’ linear in n, the number of columns of the tableau. This is not an exact result, but the least one can say is that the upper bound of n!/((n − m)! m!) is in practice a very bad estimate for the average number of pivot steps.

3.4. The revised simplex method. It is possible to lower the complexity of pivoting steps by operating on a lighter data structure than a tableau. In a general pivot step, we need to access the current c to determine j∗, and only b and the j∗-th column of the current A to determine i∗. It turns out to be more efficient to compute the j∗-th column of the current A at the last moment, instead of updating each column of A in each iteration. The information needed for this reconstruction is kept and updated, but not the full tableau T.

Let T be the initial feasible basic tableau. Consider the extended initial tableau

(131)   T = \left(\begin{array}{c|c|c} -c & d & 0 \\ A & b & I \end{array}\right).

Let T∗ be obtained by applying several pivots to T. Then applying the same pivots to the extended initial tableau yields the extended tableau

(132)   T∗ = \left(\begin{array}{c|c|c} -c∗ & d∗ & y∗ \\ A∗ & b∗ & Q∗ \end{array}\right).

The matrix Q∗ and the vector y∗ record the combined effect of all row operations that were applied going from T to T∗: it is not hard to see that A∗ = Q∗A, b∗ = Q∗b, c∗ = c − y∗A, d∗ = d + y∗b, and that Q∗ = (A_{{1,...,m}×B∗})^{−1}, where B∗ is the basis corresponding to T∗. It is an exercise to show that if T∗ is optimal, then y∗ is an optimal solution of the dual problem.

In the revised simplex method, we keep and update Q∗ and y∗, and compute only the entries of c∗, A∗ and b∗ we need to determine the pivot elements. Specifically, to perform one pivot step we compute entries of c∗ one by one until a positive entry c∗_{j∗} is found. Then b∗ and the j∗-th column of A∗ are computed to find the pivot row i∗. Updating Q∗ and y∗ is done by pivoting on the i∗-th entry of a∗_{j∗} in the matrix

(133)   \left(\begin{array}{c|c} c∗_{j∗} & y∗ \\ a∗_{j∗} & Q∗ \end{array}\right).

One pivot step now takes O(n) steps for finding j∗ and O(m²) steps for finding i∗ and updating Q∗ and y∗. If m is much smaller than n this is significantly faster than the O(nm) steps needed to perform a pivot on the full tableau.

The simplex method is usually implemented in this revised form. A further advantage of this revised method is numerical stability: we need only worry about the accuracy of the matrix Q and the vector y, and not about the larger matrix A. To make sure that Q is numerically accurate, one resets Q to (A_{{1,...,m}×B})^{−1} after every O(m) steps.
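A compact sketch of this bookkeeping is given below (an illustration with dense numpy arrays, not a production implementation): the basis B, the matrix Q = (A_B)^{-1} and the vector y are kept, reduced costs are computed from y, and for simplicity Q is recomputed from scratch in every iteration rather than updated by a pivot.

```python
import numpy as np

def revised_simplex(A, b, c, basis, max_iter=1000):
    """Revised-simplex sketch for max{ c x : A x = b, x >= 0 }.

    `basis` must index a feasible basis (A[:, basis] nonsingular, Q b >= 0).
    Returns (x, value) at optimality, or None if the iteration limit is hit;
    unboundedness raises ValueError.
    """
    A, b, c = np.asarray(A, float), np.asarray(b, float), np.asarray(c, float)
    m, n = A.shape
    for _ in range(max_iter):
        Q = np.linalg.inv(A[:, basis])        # reset Q = A_B^{-1} (here: every iteration)
        y = c[basis] @ Q                      # dual vector y* = c_B A_B^{-1}
        reduced = c - y @ A                   # c* = c - y* A
        entering = np.where(reduced > 1e-9)[0]
        if entering.size == 0:
            x = np.zeros(n)
            x[basis] = Q @ b                  # b* = Q b
            return x, c @ x
        j = entering[0]                       # smallest improving index
        d = Q @ A[:, j]                       # current column a*_j
        rows = np.where(d > 1e-9)[0]
        if rows.size == 0:
            raise ValueError("problem is unbounded")
        i = rows[np.argmin((Q @ b)[rows] / d[rows])]
        basis[i] = j                          # the leaving variable is replaced by j
    return None

# The standard-form version of (117): expected result x = (2, 0, 1, 0, 0, 0), value 17.
A = [[2, 1, 1, 3, 1, 0], [1, 3, 1, 2, 0, 1]]
print(revised_simplex(A, [5, 3], [6, 8, 5, 9, 0, 0], basis=[4, 5]))
```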

Exercises

To avoid repetition, we will assume that T = \left(\begin{array}{c|c} -c & d \\ A & b \end{array}\right), T∗ = \left(\begin{array}{c|c} -c∗ & d∗ \\ A∗ & b∗ \end{array}\right), etc. in all exercises. Several problems were taken from Linear Programming, Foundations and Extensions by Robert Vanderbei.

(1) Let T be a basic feasible tableau corresponding to basis B. Show that if the j∗-th column of A has only nonpositive elements and c_{j∗} > 0, then there exists an f ≥ 0 such that Af = 0, cf > 0 and f_j = 0 if j ∉ B ∪ {j∗}. Show that in that case x^B + λf is feasible for all λ ≥ 0 and c(x^B + λf) → ∞ if λ → ∞. Conclude that the problem corresponding to T is unbounded.

(2) Solve the following tableaux. That is, find an optimal or an unbounded tableau equivalent to the given tableau, and write down an optimal basic solution or an unbounded improving direction.

(a) \left(\begin{array}{ccccc|c} -6 & -8 & -5 & -9 & 0 & 0 \\ 1 & 1 & 1 & 1 & 1 & 1 \end{array}\right), belonging to basis {5}.

(b) \left(\begin{array}{cccccccc|c} -3 & 3 & 15 & -4 & 0 & 4 & 0 & 0 & 0 \\ 1 & 1 & 1 & 3 & 3 & 3 & 1 & 0 & 8 \\ 5 & 5 & 5 & 2 & 2 & 2 & 0 & 1 & 14 \end{array}\right), belonging to basis {7, 8}.

(c) \left(\begin{array}{cccccc|c} -6 & 2 & -1 & -3 & 0 & 0 & 0 \\ 7 & -2 & -3 & 5 & 0 & 1 & 1 \\ -3 & 1 & 1 & -2 & 1 & 0 & 1 \end{array}\right), belonging to basis {5, 6}.

(d) \left(\begin{array}{cccccc|c} -5 & -4 & -3 & 0 & 0 & 0 & 0 \\ 2 & 3 & 1 & 1 & 0 & 0 & 5 \\ 4 & 1 & 2 & 0 & 1 & 0 & 11 \\ 3 & 4 & 2 & 0 & 0 & 1 & 8 \end{array}\right), belonging to basis {4, 5, 6}.

(3) List all possible pivot elements (i∗, j∗) such that pivoting on (i∗, j∗) in (110) yields a feasible tableau. What is the minimum number of pivot steps needed to find the optimal tableau (113) from (110)?

(4) Solve: max{(2, 3, 4)x : \begin{pmatrix} 0 & 2 & 3 \\ 1 & 1 & 2 \\ 1 & 2 & 3 \end{pmatrix} x ≤ \begin{pmatrix} 5 \\ 4 \\ 7 \end{pmatrix}, x ≥ 0, x ∈ R^3}. Sketch the set of feasible solutions in R^3 and indicate the successive basic solutions found by the simplex method.

(5) Let T be a basic feasible tableau corresponding to basis B. Show that x^B is a vertex of {x ∈ Rn : Ax = b, x ≥ 0}, the feasible set of the problem corresponding to T.

(6) Let T and T∗ be feasible tableaux corresponding to bases B and B∗, so that T∗ arises from T by one improving pivot step. Show that either x^B = x^{B∗}, or that [x^B, x^{B∗}] is an edge of {x ∈ Rn : Ax = b, x ≥ 0}.
(7) Suppose T is an optimal tableau. Is it possible to see from T whether there is another optimal tableau, equivalent to T? If so, how? Is it possible to see from an optimal tableau whether there is more than one optimal solution?

(8) Solve the problem max{cx : Ax ≤ b, x ≥ 0} for the following values of A, b, c.

(a) A = \begin{pmatrix} 2 & -5 & 1 \\ 2 & -1 & 2 \end{pmatrix}, b = \begin{pmatrix} -5 \\ 4 \end{pmatrix}, c = (−1, −3, −1).


(b) A = \begin{pmatrix} 1 & -11 & -5 & 18 \\ 1 & -3 & -1 & 2 \\ 1 & 0 & 0 & 0 \end{pmatrix}, b = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, c = (−19, 4, 17, 5).

(c) A = \begin{pmatrix} -1 & -1 \\ -1 & 1 \\ 1 & 2 \end{pmatrix}, b = \begin{pmatrix} -3 \\ -1 \\ 2 \end{pmatrix}, c = (1, 3).

(d) A = \begin{pmatrix} -1 & -1 \\ -1 & 1 \\ -1 & 2 \end{pmatrix}, b = \begin{pmatrix} -3 \\ -1 \\ 2 \end{pmatrix}, c = (1, 3).

(e) A = \begin{pmatrix} 1 & -2 \\ 1 & -1 \\ 2 & -1 \\ 1 & 0 \\ 2 & 1 \\ 1 & 1 \\ 1 & 2 \\ 0 & 1 \end{pmatrix}, b = \begin{pmatrix} 1 \\ 2 \\ 6 \\ 5 \\ 16 \\ 12 \\ 21 \\ 10 \end{pmatrix}, c = (3, 2).

(9) Suppose that T is a basic dual feasible tableau, i.e. c ≤ 0. A dual pivot step is pivoting on (i∗, j∗) selected as follows:
(a) choose i∗ such that b_{i∗} < 0; and
(b) choose j∗ such that a_{i∗j∗} < 0 and c_{j∗}/a_{i∗j∗} = min{c_j/a_{i∗j} : a_{i∗j} < 0}.
The dual simplex method is the repeated application of dual pivot steps to dual feasible tableaux until an optimal tableau is reached.
(a) Show that if T∗ is obtained from T by a dual pivot step, then T∗ is dual feasible and d∗ ≤ d.
(b) Solve the problems of exercise 8 with c ≤ 0 by the dual simplex method.

(10) By linear optimization duality and elementary matrix manipulation, we have

max{cx : Ax ≤ b, x ≥ 0} = min{yb : yA ≥ c, y ≥ 0} = −max{(−b^t)z : (−A)^t z ≤ (−c)^t, z ≥ 0},

provided that the first maximization problem is feasible and bounded. Solve the duals of the problems of exercise 8 by the simplex method. When is solving the dual easier than solving the primal problem?

(11) Complete the proof of Theorem 36.
(12) Derive Theorem 22 from Theorem 36.


CHAPTER 5

Integrality

1. Linear diophantine equations

1.1. Linear diophantine equations. This section is all about proving the following ‘theorem with an alternative’, which is an analogue of Fredholm's Alternative (Theorem 3) and Farkas' Lemma (Theorem 9).

Theorem 38. Let A be a rational m × n matrix and let b ∈ Qm. Then either

(1) there is an x ∈ Zn such that Ax = b, or
(2) there is a y ∈ Qm such that yA ∈ Zn and yb ∉ Z,

but not both.

We prove this theorem below, after presenting two essential Lemmas. The following operations on a matrix are integral column operations:

(1) exchanging two columns,
(2) multiplying a column by −1, and
(3) adding an integral multiple of a column to another column.

When a matrix A′ can be obtained from another matrix A by integral column operations, we denote this by A′ ≈ A. The following lemma is easy to verify.

Lemma 39. Let A, A′ be rational m × n matrices such that A ≈ A′ and let b ∈ Qm. Then

(1) Ax = b for some x ∈ Zn if and only if A′x′ = b for some x′ ∈ Zn, and
(2) yA ∈ Zn if and only if yA′ ∈ Zn, for all y ∈ Qm.

We say that an m × n matrix H is in Hermite normal form if we can write H = [B 0], where B is a nonnegative m × m lower triangular matrix such that the unique maximum entry in each row is located on the diagonal of B; so if

(134)   B = \begin{pmatrix} b_{11} & 0 & \cdots & 0 \\ b_{21} & b_{22} & \ddots & \vdots \\ \vdots & & \ddots & 0 \\ b_{m1} & b_{m2} & \cdots & b_{mm} \end{pmatrix},

then b_{ii} > b_{ij} ≥ 0 for all i, j such that j < i.

Lemma 40. Let A be an integral m × n matrix with linearly independent rows. Then there is a matrix H ≈ A such that H is in Hermite normal form.

Proof. We prove the Lemma by induction on the number of rows m. For an integral m × n matrix C, let σ(C) := ∑_{j=1}^{n} |c_{1j}|. If C has two nonzero elements in its top row, say c_{1k} ≥ c_{1j} > 0, then subtracting the j-th column λ := ⌊c_{1k}/c_{1j}⌋ times from the k-th column, we obtain a matrix C′ ≈ C with σ(C′) < σ(C), since |c′_{1k}| = |c_{1k} − λc_{1j}| < |c_{1j}| ≤ |c_{1k}|. Starting from A and applying such column operations while possible we obtain a sequence of integral matrices A ≈ A′ ≈ · · · with σ(A^{(p)}) > σ(A^{(p+1)}) for all p. The sequence is finite as σ(A^{(p)}) is a nonnegative integer for all p, so in particular there are no more than σ(A) matrices in the sequence. The final matrix A^{(t)} cannot have more than one nonzero in the top row. By exchanging two columns in A^{(t)}, and/or multiplying the first column by −1 if necessary, we obtain

(135)   A ≈ \begin{pmatrix} b_{11} & 0 \\ ∗ & \bar A \end{pmatrix},

where b_{11} ≥ 0 and \bar A is an (m − 1) × (n − 1) integral matrix with linearly independent rows. By induction there exists a matrix \bar H ≈ \bar A that is in Hermite normal form. It follows that

(136)   A ≈ \begin{pmatrix} b_{11} & 0 \\ ∗ & \bar H \end{pmatrix}.

By subtracting a suitable integer multiple of the i-th column from the first column for i = 2, 3, . . . , m (in that order), we obtain a matrix H ≈ A in Hermite normal form. □

Proof of Theorem 38. We show first that (1) and (2) cannot both be true. For if x, y are as in (1), (2), then

(137)   Z ∋ (yA)x = y(Ax) = yb ∉ Z,

a contradiction. It remains to show that at least one of (1), (2) holds. Note that without loss of generality, we may assume that A is integral, as multiplying both A and b by a λ ∈ Z does not affect the validity of (1) and (2). If the rows of A are linearly dependent, then either one of the rows is redundant in both (1) and (2) (i.e. removing the row does not affect the validity of (1), (2)), or Ax = b has no solutions at all, and then there is a rational y ∈ Qm such that yA = 0 ∈ Zn and yb = 1/2. So we may assume that the rows of A are linearly independent. By Lemma 40, there is a matrix H ≈ A that is in Hermite normal form, say H = [B 0]. Since A is integral with independent rows, it follows that H is integral and has independent rows as well, so B is nonsingular. By Lemma 39, to prove the theorem it suffices to show that either

(1) there is a u ∈ Zm such that Bu = b, or
(2) there is a y ∈ Qm such that yB ∈ Zm and yb ∉ Z.

Since B is nonsingular, the equation Bu = b has a unique solution, namely u = B^{−1}b. Thus if B^{−1}b ∈ Zm we are in case (1); if on the other hand B^{−1}b ∉ Zm, say (B^{−1}b)_i ∉ Z, then taking y equal to the i-th row of B^{−1}, we get yB = (B^{−1}B)_i = e_i ∈ Zm and yb = (B^{−1}b)_i ∉ Z, and we are in case (2). □

Theorem 38 has two Corollaries that may be familiar from first-year Algebra. Their proofs are exercises. Let a1, . . . , an ∈ Z. The greatest common divisor of a1, . . . , an is

(138)   gcd(a1, . . . , an) := max{d ∈ Z : d divides each of a1, . . . , an}.

Corollary 40.1. Let a1, . . . , an ∈ Z. Then λ1a1 + · · · + λnan = gcd(a1, . . . , an) for some λ1, . . . , λn ∈ Z.

Let a, b, d ∈ Z. By a ≡ b mod d we denote that a = b + λd for some λ ∈ Z.

Corollary 40.2 (‘Chinese remainder Theorem’). Let b1, . . . , bm ∈ Z, and let d1, . . . , dm ∈ Z, where gcd(d_i, d_j) = 1 for all i ≠ j. Then there exists an x ∈ Z such that x ≡ b_i mod d_i for i = 1, . . . , m.


1.2. Solving linear diophantine equations. In this section we explain how to compute the set of all integral solutions to a system of linear equations. An n × n matrix U is unimodular if U is integral and |det(U)| = 1. It is an exercise to prove the following Lemma.

Lemma 41. Let U be an n × n matrix. The following are equivalent:

(1) U is unimodular,
(2) U ≈ I,
(3) U^{−1} is unimodular, and
(4) Ux ∈ Zn if and only if x ∈ Zn, for all x ∈ Rn.

Let A and A′ be two m × n matrices. By the Lemma, we have

(139)   A ≈ A′  ⇔  \begin{pmatrix} A \\ I \end{pmatrix} ≈ \begin{pmatrix} A′ \\ U \end{pmatrix} for some U  ⇔  AU = A′ for some unimodular matrix U.

So applying integral column operations amounts to multiplying on the right by a unimodular matrix.

Lemma 42. Let A be an integral m × n matrix with independent rows and let b ∈ Zm. Suppose that H, U, B, V and W are matrices such that

(140)   \begin{pmatrix} A \\ I \end{pmatrix} ≈ \begin{pmatrix} H \\ U \end{pmatrix} = \begin{pmatrix} B & 0 \\ V & W \end{pmatrix},

and such that B is an m × m matrix. If B^{−1}b is not integral, then Ax = b has no integral solutions. Otherwise, {x ∈ Zn : Ax = b} = {v + Wy : y ∈ Z^{n−m}}, where v := V B^{−1}b.

Proof. By (140), we have AU = H, and U is unimodular since U ≈ I. Hence Hx′ = b if and only if A(Ux′) = b, and Ux′ ∈ Zn if and only if x′ ∈ Zn. If B^{−1}b is not integral, then Hx′ = b has no integral solutions, hence Ax = b has no integral solutions. If B^{−1}b is integral, we have {x′ ∈ Zn : Hx′ = b} = {\begin{pmatrix} B^{−1}b \\ y \end{pmatrix} : y ∈ Z^{n−m}}. It follows that

(141)   {x ∈ Zn : Ax = b} = U[{x′ ∈ Zn : Hx′ = b}] = {\begin{pmatrix} V & W \end{pmatrix}\begin{pmatrix} B^{−1}b \\ y \end{pmatrix} : y ∈ Z^{n−m}} = {v + Wy : y ∈ Z^{n−m}}. □

Thus to compute the vector v ∈ Zn and the n × (n − m) matrix W it suffices to apply the integral column operations that bring A into Hermite normal form H to the matrix \begin{pmatrix} A \\ I \end{pmatrix}. This yields B, V, W and hence v. As an example, consider the 1 × 2 matrix A = [29 13] and the vector b = (1). We apply integral column operations:

(142)   \begin{pmatrix} 29 & 13 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} ≈ \begin{pmatrix} 3 & 13 \\ 1 & 0 \\ -2 & 1 \end{pmatrix} ≈ \begin{pmatrix} 3 & 1 \\ 1 & -4 \\ -2 & 9 \end{pmatrix} ≈ \begin{pmatrix} 0 & 1 \\ 13 & -4 \\ -29 & 9 \end{pmatrix} ≈ \begin{pmatrix} 1 & 0 \\ -4 & 13 \\ 9 & -29 \end{pmatrix}.

Now B = (1), V = \begin{pmatrix} -4 \\ 9 \end{pmatrix}, W = \begin{pmatrix} 13 \\ -29 \end{pmatrix}. So B^{−1}b is integral, v := V B^{−1}b = \begin{pmatrix} -4 \\ 9 \end{pmatrix}, and

(143)   {x ∈ Z2 : 29x_1 + 13x_2 = 1} = {\begin{pmatrix} -4 \\ 9 \end{pmatrix} + \begin{pmatrix} 13 \\ -29 \end{pmatrix} y : y ∈ Z}.

The initiate will recognize the Euclidean algorithm in this procedure.
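Indeed, for a 1 × 2 system the column operations amount exactly to the extended Euclidean algorithm. A small sketch (an illustration added here; it assumes a1 and a2 are not both zero):

```python
def solve_two_var(a1, a2, b):
    """All integral solutions of a1*x1 + a2*x2 = b, or None if none exist.

    Returns (v, w): the solution set is { v + y*w : y an integer }.
    """
    # extended Euclidean algorithm: g = gcd(a1, a2) = s*a1 + t*a2
    old_r, r, old_s, s, old_t, t = a1, a2, 1, 0, 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    g = old_r
    if b % g != 0:
        return None
    k = b // g
    v = (old_s * k, old_t * k)          # one particular solution
    w = (a2 // g, -a1 // g)             # generates the solutions of a1*x1 + a2*x2 = 0
    return v, w

print(solve_two_var(29, 13, 1))         # expected: ((-4, 9), (13, -29)), matching (143)
```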

Figure 1. A lattice L generated by two vectors; parallel hyperplanes H‖ such that L ⊆ H‖ ∌ 0.

2. Lattices

2.1. Definitions. Given vectors a1, . . . , an ∈ Rm, we define the integer span as

(144)   int{a1, . . . , an} := {λ1a1 + · · · + λnan : λ1, . . . , λn ∈ Z}.

If L = int{a1, . . . , an}, then L is said to be generated by a1, . . . , an. We say that a set L ⊆ Rm is a lattice if L = int{a1, . . . , an} for some linearly independent set {a1, . . . , an} (see Figure 1). A lattice L is full-dimensional if there is no hyperplane H such that L ⊆ H. Equivalently, L ⊆ Rn is full-dimensional if L is generated by a basis of Rn. Any set of rational vectors, whether linearly independent or not, generates a lattice.

Lemma 43. Let a1, . . . , an ∈ Qm. Then L = int{a1, . . . , an} is a lattice.

Proof. We may assume that lin.hull{a1, . . . , an} = Rm. Let A be the matrix with i-th column a_i for i = 1, . . . , n. Then L = {Ax : x ∈ Zn}, and A has linearly independent rows. By Lemma 39(1), we have {Ax : x ∈ Zn} = {A′x′ : x′ ∈ Zn} if A ≈ A′, and by Lemma 40 there exists an H ≈ A in Hermite normal form, i.e. H = [B 0] where B is a nonsingular matrix. So L = {Bu : u ∈ Zm}, and L is generated by the linearly independent columns of B. □

Irrational vectors may generate sets that are not lattices. It is an exercise to show that int{1, √2} is not a lattice.

2.2. The dual lattice. Let a1, . . . , an, b ∈ Qm, and let L be the lattice generated by a1, . . . , an. Theorem 38 is equivalent to: either

(1) b ∈ L, or
(2) there is a y ∈ Qm such that L ⊆ H‖_y and b ∉ H‖_y,

but not both, where H‖_y := {x ∈ Rn : y^t x ∈ Z} = ⋃_{z∈Z} H_{y,z} is a set of parallel hyperplanes (see Figure 1). The dual of a lattice L is defined as L† := {y ∈ Rn : y^t x ∈ Z for all x ∈ L}. Thus L† = {y ∈ Rn : L ⊆ H‖_y}. We have:

Lemma 44. Let L be a full-dimensional lattice. Then L†† = L.

Proof. Let L be a full-dimensional lattice, say L = {Bu : u ∈ Zn} for some nonsingular matrix B. Then L† = {(B^{−1})^t w : w ∈ Zn}. For suppose that y, w ∈ Rn are such that y = (B^{−1})^t w. Then

(145)   y ∈ L† ⇔ y^t x ∈ Z ∀x ∈ L ⇔ ((B^{−1})^t w)^t Bu ∈ Z ∀u ∈ Zn ⇔ w^t u ∈ Z ∀u ∈ Zn.

Thus y ∈ L† if and only if w ∈ Zn, as required. It follows that L†† = {Bu : u ∈ Zn}†† = {(B^{−1})^t w : w ∈ Zn}† = {Bu : u ∈ Zn} = L. □


2.3. The determinant of a lattice. The following Lemma will allow us to define an invariant of lattices.

Lemma 45. Let A and A′ be m × n matrices with independent columns. Then A ≈ A′ if and only if {Ax : x ∈ Zn} = {A′x : x ∈ Zn}.

Proof. It is easy to see that if A ≈ A′, then {Ax : x ∈ Zn} = {A′x : x ∈ Zn}. So let us assume that {Ax : x ∈ Zn} = {A′x : x ∈ Zn}. Then each column of A is an integral combination of columns of A′ and vice versa, hence there are integral n × n matrices U and U′ such that AU = A′ and A = A′U′. Then AUU′ = A′U′ = A, hence UU′ = I as A has independent columns. But then U′ = U^{−1}, and hence det(U) ∈ Z and det(U)^{−1} = det(U^{−1}) ∈ Z, hence |det(U)| = 1. It follows that U is unimodular, and AU = A′, so A ≈ A′. □

Let L be a lattice, and let A be a matrix with linearly independent columns such that L = {Ax : x ∈ Zn}. The determinant of L is defined as

(146)   d(L) := √(det(A^t A)).

The determinant does not depend on the choice of A: for if A′ is some other matrix such that L = {A′x : x ∈ Zn}, then by the Lemma AU = A′ for some unimodular U, and then

(147)   √(det((A′)^t A′)) = √(det(U^t A^t A U)) = √(det(U^t) det(A^t A) det(U)) = √(det(A^t A)).

If L is full-dimensional, then the columns of A are a basis of Rn, hence A is square and d(L) = √(det(A^t A)) = |det(A)|. It is an exercise to show that d(L)d(L†) = 1 for any full-dimensional lattice L.
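A quick numerical illustration (an example added here, not from the text): the hexagonal lattice A2 can be generated by the columns (1, −1, 0)^t and (0, 1, −1)^t, and d(A2) = √3 comes out the same for any generating basis, in line with Lemma 45.

```python
import numpy as np

def lattice_determinant(A):
    """d(L) = sqrt(det(A^t A)) for L = {Ax : x integral}, A with independent columns."""
    A = np.asarray(A, dtype=float)
    return np.sqrt(np.linalg.det(A.T @ A))

A  = np.array([[1, 0], [-1, 1], [0, -1]])        # one basis of the hexagonal lattice A2
A2 = A @ np.array([[1, 3], [0, 1]])              # another basis: A times a unimodular matrix
print(lattice_determinant(A), lattice_determinant(A2))   # both approximately sqrt(3) = 1.732...
```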

2.4. Examples of lattices. We include the description of a few special lattices.

(1) Zn, the cubic lattice;
(2) An := {x ∈ Z^{n+1} : ∑_i x_i = 0}. A2 is the hexagonal lattice, and A3 is the face-centered cubic lattice;
(3) Dn := {x ∈ Zn : ∑_i x_i ≡ 0 mod 2}, the checkerboard lattice;
(4) the 8-dimensional lattice E8 := D8 ∪ (D8 + (1/2)1);
(5) the 24-dimensional Leech lattice Λ24, generated by the rows of a 24 × 24 integral matrix whose nonzero entries are 8, 4, 2, 1 and −3 (the matrix itself is omitted here).


3. Lattices and convex bodies

3.1. Minkowski's Theorem. A set of sufficiently high volume must contain two points whose difference is in a given lattice.

Lemma 46 (Blichfeldt). Let C ⊆ Rn be a compact set, and let L be a full-dimensional lattice. If Volume(C) ≥ d(L), then there exist distinct x, y ∈ C such that x − y ∈ L.

Proof. Since L is a full-dimensional lattice, we may assume that L = {Bu : u ∈ Zn} for some nonsingular matrix B. Let F := {Bu : u ∈ [0, 1)^n}. Then for each point x ∈ Rn there is a unique l ∈ L such that l + x ∈ F (exercise).

Suppose that there do not exist distinct x, y ∈ C such that x − y ∈ L. Then the sets l + C for l ∈ L are pairwise disjoint, and for each x ∈ C, there is a unique l ∈ L so that l + x ∈ F. Thus

(148)   Volume(C) = ∑_{l∈L} Volume((l + C) ∩ F) ≤ Volume(F) = d(L).

In fact, we cannot have equality in (148), for then the sets {(l + C) ∩ F : l ∈ L} would partition F, which is impossible (exercise). Thus Volume(C) < d(L), as required. □

Consequently, any centrally symmetric convex set of sufficiently high volume contains a nonzero vector in a given lattice.

Theorem 47 (Minkowski). Let C ⊆ Rn be a compact, convex set such that −x ∈ C for all x ∈ C, and let L ⊆ Rn be a full-dimensional lattice. If Volume(C) ≥ 2^n d(L), then C ∩ L contains a nonzero vector.

Proof. Apply Lemma 46 to C and the lattice 2L := {2x : x ∈ L}. Since Volume(C) ≥ 2^n d(L) = d(2L), it follows that there are distinct x, y ∈ C such that x − y ∈ 2L. Hence (1/2)(x − y) ∈ L \ {0}. As y ∈ C, we have −y ∈ C, and since C is convex, it follows that (1/2)(x − y) ∈ [x, −y] ⊆ C. Hence (1/2)(x − y) ∈ C ∩ L \ {0}, as required. □

Let V_n be the volume of the n-dimensional unit ball, i.e. V_n := Volume(B_n(0, 1)). It is an exercise to show that V_n = π^{n/2}/(n/2)! if n is even and V_n = 2^n π^{(n−1)/2} ((n−1)/2)!/n! if n is odd.

Minkowski's Theorem yields an upper bound on the length of a shortest nonzero vector in a lattice.

Theorem 48. Let L be an n-dimensional lattice. Then there exists a nonzero x ∈ L such that ‖x‖ ≤ 2(d(L)/V_n)^{1/n}.

Proof. Without loss of generality, L is full-dimensional. Let C = B_n(0, 2(d(L)/V_n)^{1/n}). Then C is a convex body with −x ∈ C for all x ∈ C and Volume(C) = 2^n d(L). Hence by Theorem 47, there is some x ∈ L ∩ C \ {0}, as required. □

For any lattice L, let r(L) := min{‖x‖ : x ∈ L, x ≠ 0} be the length of the shortest nonzero vector, and let α_n := max{r(L) : rank(L) = n and det(L) = 1}, i.e. α_n is the smallest number so that r(L) ≤ α_n det(L)^{1/n} for all lattices L of rank n. The above theorem states that α_n ≤ 2V_n^{−1/n}. For small values of n, α_n is known exactly:

(149)
n       1     2      3     4     5      6      7     8
α_n^n   1    2/√3   √2     2    2√2    8/√3    8    16

For the lattices L = A2, A3, D4, D5, E8 we have r(L) = α_n det(L)^{1/n}, where n = rank(L).
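Theorem 48 is easy to test numerically on a small example (an illustration added here, not from the text): for a full-dimensional lattice in the plane we can find the shortest nonzero vector by brute force over small coefficients and compare it with the bound 2(d(L)/V_n)^{1/n}.

```python
import numpy as np
from itertools import product

B = np.array([[2.0, 1.0],
              [0.0, 1.0]])                  # columns generate a full-dimensional lattice L in R^2 (made-up example)

d_L = abs(np.linalg.det(B))                 # d(L) = |det B| for a full-dimensional lattice
V2 = np.pi                                  # volume of the 2-dimensional unit ball
bound = 2.0 * np.sqrt(d_L / V2)             # Theorem 48: some nonzero x in L has norm <= bound

# brute force over small integer coefficient vectors (enough for this basis)
shortest = min(np.linalg.norm(B @ np.array(u))
               for u in product(range(-5, 6), repeat=2) if u != (0, 0))
print(shortest, bound)                      # here shortest = sqrt(2) <= bound, as the theorem predicts
```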


3.2. The maximal distance to a lattice. We now come to a related problem: given a full-dimensional lattice L ⊆ Rn, what is the maximal distance of a point z ∈ Rn to the closest lattice point?

Lemma 49. Let L ⊆ Rn be a lattice of rank n with r(L†) ≥ 1/2. Then for each z ∈ Rn there is an l ∈ L such that ‖z − l‖ ≤ √(∑_{k=1}^{n} α_k^4).

Proof. We prove the theorem by induction on n, the case n = 1 being easy. So assume that r(L†) ≥ 1/2. Since d(L)d(L†) = 1, we have

(150)   r(L) r(L†) ≤ α_n^2.

It follows that r(L) ≤ 2α_n^2. Let b ∈ L \ {0} be a vector with ‖b‖ ≤ 2α_n^2. For v ∈ Rn, let v̄ be the orthogonal projection of v on the hyperplane H := {x ∈ Rn : b^t x = 0}. Then L̄ := {l̄ : l ∈ L} is a lattice with L̄† = {y ∈ L† : y ⊥ b}, so that r(L̄†) ≥ r(L†) ≥ 1/2. Thus by induction on n, there is an l̄ ∈ L̄ such that ‖z̄ − l̄‖ ≤ √(∑_{k=1}^{n−1} α_k^4).

Let ℓ := {l + λb : λ ∈ R} = {x ∈ Rn : x̄ = l̄}, and let x be the point of ℓ closest to z. Then x̄ = l̄, and ‖z − x‖ = ‖z̄ − l̄‖ ≤ √(∑_{k=1}^{n−1} α_k^4). Let l be the point of L ∩ ℓ closest to x. Then ‖x − l‖ ≤ (1/2)‖b‖ ≤ α_n^2. Since x − l ⊥ z − x, we have

(151)   ‖z − l‖² = ‖z − x‖² + ‖x − l‖² ≤ ∑_{k=1}^{n−1} α_k^4 + α_n^4 = ∑_{k=1}^{n} α_k^4,

as required. □

For any lattice L ⊆ Rn, let R(L) := max{d(z, L) : z ∈ Rn}, and let

(152)   β_n := max{R(L) : rank(L) = n, r(L†) ≥ 1/2}.

Then β_n ≤ √(∑_{k=1}^{n} α_k^4) ≤ √(∑_{k=1}^{n} 16 V_k^{−4/k}) by Lemma 49 and Theorem 48. By other methods, it can be shown that β_n ≤ n, whereas our exposition yields β_n ≤ cn^{3/2} for some constant c independent of n (exercise).

3.3. Khinchine's Theorem. A convex body is a compact, convex set. Given a lattice L ⊆ Rn, the lattice width of a convex body C ⊆ Rn is

(153)   w(C, L) := min{max{wx : x ∈ C} − min{wx : x ∈ C} : w ∈ L† \ {0}}.

It is an exercise to show that if τ : Rn → Rn is a linear bijection, then w(τ[C], τ[L]) = w(C, L). A set E ⊆ Rn is an ellipsoid if there is some linear bijection τ such that τ[E] is an n-dimensional ball.

Lemma 50. If w(E, L) ≥ β_n then E ∩ L ≠ ∅, for any ellipsoid E and full-dimensional lattice L in Rn.

Proof. The validity of this statement remains invariant if we apply a linear bijection to both E and L. Thus without loss of generality, E = B_n(z, β_n) for some z ∈ Rn. Suppose w(E, L) ≥ β_n. Then for any w ∈ L† \ {0}, we have

(154)   β_n ≤ w(E, L) ≤ max{wx : x ∈ E} − min{wx : x ∈ E} = 2β_n‖w‖.

It follows that ‖w‖ ≥ 1/2 for all w ∈ L† \ {0}, i.e. r(L†) ≥ 1/2. By Lemma 49 there is an l ∈ L with ‖z − l‖ ≤ β_n, i.e. l ∈ E ∩ L, as required. □


Each convex body can be ‘approximated’ by an ellipsoid. This will allow us to derive a similar statement for general convex bodies.

Lemma 51. Let C be a convex body. Then there are ellipsoids E, E′ with common center z such that E ⊆ C ⊆ E′ and E′ = {n(x − z) + z : x ∈ E}.

In the above Lemma, E′ is a homothetic dilation of E, obtained by ‘blowing up’ E by a factor of n. Combining Lemma 50 and Lemma 51, we obtain Khinchine's Flatness Theorem.

Theorem 52 (Khinchine, 1948). If w(C, L) ≥ nβ_n then C ∩ L ≠ ∅, for any convex body C and full-dimensional lattice L in Rn.

Proof. Let L be a lattice and let C be a convex body such that w(C, L) ≥ nβ_n. There are ellipsoids E, E′ as in Lemma 51; thus w(E, L) = (1/n) w(E′, L) ≥ (1/n) w(C, L) ≥ β_n. By Lemma 50, E contains a lattice point. As E ⊆ C, C contains a lattice point as well. □

Exercises

(1) Let a, b ∈ Z \ {0}. Show that |a − ⌊a/b⌋b| < |b|, and that |a − ⌊a/b + 1/2⌋b| ≤ (1/2)|b|.
(2) Let a1, . . . , an ∈ Z. Show that λ1a1 + · · · + λnan = gcd(a1, . . . , an) for some λ1, . . . , λn ∈ Z.

(3) Show that min{ax + by : ax + by ≥ 1, x, y ∈ Z} = gcd(a, b), for any a, b ∈ Z \ {0}.
(4) Prove Corollary 40.2.
(5) Let A be an integer m × n matrix and let b ∈ Zm. Show that the following are equivalent:
(a) there is an x ∈ Zn such that Ax = b; and
(b) for any prime p, there is an x ∈ Zn such that Ax = b mod p.
(6) Prove Lemma 41.
(7) Show that the following statements are equivalent for any x, y ∈ Zn:
(a) gcd(x1, . . . , xn) = gcd(y1, . . . , yn); and
(b) there is a unimodular n × n matrix U such that y = Ux.
(8) For any integral m × n matrix A, let ψ(A) be the greatest common divisor of the determinants of m × m submatrices of A. Prove that if A has independent rows, then ψ(A) = d({Ax : x ∈ Zn}).

(9) Find out whether Ax = b has an integral solution.
(a) A = [8 9], b = 3.
(b) A = [8 10 38], b = 3.
(c) A = [121 22 14 7], b = 12.
(d) A = \begin{pmatrix} 2 & 4 & 3 \\ 8 & -3 & 7 \end{pmatrix}, b = \begin{pmatrix} 5 \\ 6 \end{pmatrix}.
(e) A = \begin{pmatrix} 9 & 7 & -1 & 18 \\ -3 & 11 & 10 & 0 \end{pmatrix}, b = \begin{pmatrix} 6 \\ 1 \end{pmatrix}.
(f) A = \begin{pmatrix} 2 & 6 & 11 & -4 & 3 \\ 0 & -4 & 5 & 2 & -11 \\ 3 & 1 & 5 & 0 & -5 \end{pmatrix}, b = \begin{pmatrix} 2 \\ 3 \\ -4 \end{pmatrix}.

(10) Find out whether Ax = b has an integral solution. If so, compute the set of all integral solutions.
(a) A = [76 42], b = 3.
(b) A = [76 42], b = 6.
(c) A = [76 42], b = 12.


(d) A = \begin{pmatrix} 11 & 7 \\ 13 & 13 \end{pmatrix}, b = \begin{pmatrix} 16 \\ -1 \end{pmatrix}.
(e) A = \begin{pmatrix} 11 & 7 \\ 13 & 13 \end{pmatrix}, b = \begin{pmatrix} 15 \\ 3 \end{pmatrix}.
(f) A = \begin{pmatrix} 10 & 67 & 12 \\ 2 & 18 & 4 \end{pmatrix}, b = \begin{pmatrix} 4 \\ 4 \end{pmatrix}.

(11) Solve the Frobenius problem: Given integers a, b ≥ 2 such that gcd(a, b) = 1, find the largest integer n such that n ∉ {αa + βb : α, β ∈ Z, α, β ≥ 0}.
(12) Consider R := Z[ω] = {a + ωb : a, b ∈ Z} and K := Q[ω] = {p + ωq : p, q ∈ Q}, where ω := e^{i2π/3} = −1/2 + i√3/2. Verify that R is a ring and that K is a field. Show that for any m × n matrix A with entries in R and b ∈ Rm, exactly one of the following holds.
(a) There is an x ∈ Rn such that Ax = b.
(b) There is a y ∈ Km such that yA ∈ Rn and yb ∉ R.
Can you find other pairs (R, K) for which the above statement holds?

(13) A set L ⊆ Rn is an additive subgroup if −a ∈ L for all a ∈ L, and a + b ∈ L for all a, b ∈ L. An additive subgroup L is discrete if inf{‖a − b‖ : a, b ∈ L, a ≠ b} > 0. Show that L is a lattice if and only if L is a discrete additive subgroup.
(14) Let a, b, c, d ∈ Z be such that for all u, v ∈ Z there exist x, y ∈ Z such that ax + by = u and cx + dy = v. Show that ad − bc = ±1.
(15) Suppose a1, . . . , an ∈ Rm. Let U := lin.hull{a1, . . . , an}, and let k := dim U.
(a) Show that there is a linear bijection l : U → Rk.
(b) Let a′_i := l(a_i) for all i. Show that int{a1, . . . , an} is a lattice if and only if int{a′1, . . . , a′n} is a lattice.
(c) Let b1, . . . , bn ∈ U, and let b′_i := l(b_i) for all i. Show that int{a1, . . . , an} = int{b1, . . . , bn} if and only if int{a′1, . . . , a′n} = int{b′1, . . . , b′n}.
(16) Show that int{1, √2} is not a discrete subset of R1.

(17) Let B be a nonsingular n × n matrix, and let L = {Bx : x ∈ Zn}. Show that L† = {(B^{−1})^t x : x ∈ Zn}, and that d(L†) = d(L)^{−1}.
(18) Let a ∈ Zn be a row vector and let L := {x ∈ Zn : ax = 0}. Show that d(L) = ‖a‖.
(19) Show that An, Dn, E8, Λ24 are indeed lattices. Compute d(L), r(L), r(L†) for L = An, Dn, E8 and Λ24.
(20) Let L = {Bu : u ∈ Zn} for some nonsingular matrix B. Let F := {Bu : u ∈ [0, 1)^n}. Show that for each point x ∈ Rn there is a unique l ∈ L such that l + x ∈ F.
(21) Show that in the proof of Lemma 46 the sets {(l + C) ∩ F : l ∈ L} cannot partition F.
(22) Let a ∈ Zn be a row vector. Show that there is an x ∈ Zn such that ax = 0 and ‖x‖ ≤ 2(‖a‖/V_{n−1})^{1/(n−1)}.
(23) Show that V_n = π^{n/2}/(n/2)! if n is even and V_n = 2^n π^{(n−1)/2} ((n−1)/2)!/n! if n is odd. Hint: Show that V_n = (2π/n) V_{n−2} by integrating over the 2-dimensional unit ball.
(24) Show that there exists a constant c_0 ∈ R such that V_n^{−2/n} ≤ c_0 n for all n ∈ N. Hint: use the previous exercise and that √(2π) n^{n+1/2} exp(−n + 1/(12n + 1)) < n! < √(2π) n^{n+1/2} exp(−n + 1/(12n)) (this is a refinement of Stirling's formula).
(25) Show that there is a constant c_1 ∈ R such that β_n ≤ c_1 n^{3/2} for all n ∈ N.
(26) Derive from Lemma 49 that if L ⊆ Rn is a full-dimensional lattice and z ∈ Rn, then there exists an l ∈ L such that ‖z − l‖ ≤ β_n/(2r(L†)).
(27) Show that if τ : Rn → Rn is a linear bijection, then w(τ[C], τ[L]) = w(C, L).
(28) Show that if C is a convex body, then there are simplices S, S′ with common center z so that S ⊆ C ⊆ S′ and S′ = {n(x − z) + z : x ∈ S}. Hint: take S a simplex of maximum volume contained in C. (This is a variant of Lemma 51 which is easier to prove.)
(29) Let L = {Ax : x ∈ Zn}, where A has linearly independent columns, and let r ∈ R. Show that if a ∈ L is such that ‖a‖ < r, then a ∈ {Ax : x ∈ Zn, ‖x‖ < r/√(λ_1(A^tA))}.
(30) Compute d(L), r(L), r(L†) and ρ(L) for L = An, Dn, E8 and Λ24.


CHAPTER 6

Integer linear optimization

1. Integer linear optimization

1.1. Overview. Integer linear optimization is optimizing a linear objective over all integral vectors satisfying given linear equations and/or inequalities. All such problems can be reduced to the standard form

(155) max{cx : Ax ≤ b, x ∈ Zn}.

The feasible set of this problem is the intersection of the lattice Zn with the polyhedron P := {x ∈ Rn : Ax ≤ b} (see Figure 1).

No efficient (i.e. polynomial-time) algorithm for integer linear optimization is known, and it is generally believed that no such algorithm can exist¹. Nevertheless, there is an algorithm that solves any integer linear optimization problem in finite time, the branch & bound algorithm. Used with discretion, this algorithm is a very powerful tool to solve integer linear optimization problems in practice. We describe the branch & bound algorithm in the final section of this chapter.

Many combinatorial optimization problems can be formulated as an integer linear optimization problem. Certain well-structured problems can be solved to a much greater extent than the general problem. We describe such a well-solved problem in section 2 of this chapter.

¹ Integer linear optimization is NP-complete.

Figure 1. An integer linear optimization problem

Figure 2. A polyhedron and its integer hull

1.2. Integral polyhedra and integer hulls. Let P be a polyhedron. The integer hull of P is (see Figure 2)

(156)   P_I := conv.hull(P ∩ Zn).

Clearly, P_I ⊆ P. A polyhedron P is integral if P = P_I. We have

(157)   max{cx : x ∈ P ∩ Zn} = max{cx : x ∈ P_I}.

Thus optimizing a linear function over all integral points in P is no harder than optimizing over P_I, but we need a description of P_I in terms of linear equations and inequalities to do this in practice. For this reason, determining the integer hull and finding integral polyhedra are central problems in integer linear optimization.

If P is bounded, then we can derive from Theorem 27 that P is integral if and only if all its vertices are integral vectors, i.e. if V(P) ⊆ Zn. It also follows from Theorem 27 that the polytope P_I is a polyhedron. It is a difficult problem to determine a polyhedral description of P_I given P. We describe how, in theory, P_I can be approximated.

Suppose P = {x ∈ Rn : Ax ≤ b, x ≥ 0}. The hyperplane H_{c,d} is a cutting plane for P if cx ≤ d is valid for all x ∈ P_I and there is some x ∈ P for which cx > d. Then H_{c,d} ‘cuts off’ a corner of the polyhedron P containing no integer points. One way to obtain a valid inequality for P_I is to take a valid inequality for P and ‘round it down’. For any c ∈ Rn, let ⌊c⌋ be defined by ⌊c⌋_i = ⌊c_i⌋ for all i (similarly for ⌈c⌉). If cx ≤ d for all x ∈ P, then

(158)   ⌊c⌋x ≤ ⌊cx⌋ ≤ ⌊d⌋

for all x ∈ P ∩ Zn, hence ⌊c⌋x ≤ ⌊d⌋ is valid for all x ∈ P_I. Any valid inequality for P can be obtained by making a nonnegative combination of rows of Ax ≤ b, but not every cutting plane is obtained by rounding down a valid inequality for P. However, let

(159)   P′ := {y ∈ P : ⌊c⌋y ≤ ⌊d⌋ for all c, d such that cx ≤ d for all x ∈ P},

let P^{(0)} := P and let P^{(t+1)} := (P^{(t)})′ for all t ∈ N. Without proof, we mention:

Theorem 53. Let P := {x ∈ Rn : Ax ≤ b, x ≥ 0}, where A and b are rational. Then P_I = P^{(t)} for some finite t.


1.3. Totally unimodular matrices. An n × m matrix M is totally unimodular (TU) if det(M′) ∈ {−1, 0, 1} for all square submatrices M′ of M. In particular, the entries (1 × 1 submatrices) of a TU matrix are either −1, 0, or 1. We describe how certain integral polyhedra arise from totally unimodular matrices. Let us first argue that there exist TU matrices.

Let D = (V, A) be a directed graph. The incidence matrix of D is the V × A matrix M_D = (m_{va}) such that m_{va} = 1 if v is the head of a, m_{va} = −1 if v is the tail of a, and m_{va} = 0 otherwise (thus M_D is a matrix with exactly one 1 and exactly one −1 in each column, and 0's otherwise).

Lemma 54. If M is the incidence matrix of a directed graph, then M is totally unimodular.

Proof. Suppose the matrix M contradicts the Lemma. Let B be a square k × k submatrix of M with det(B) ∉ {0, −1, 1}. We may assume B is chosen such that k is as small as possible. There cannot be a column with no nonzero entries, as this would imply det(B) = 0. If there is a column with exactly one nonzero entry, say the i-th column with a nonzero in the j-th row, then striking both the i-th column and the j-th row from B yields a (k − 1) × (k − 1) submatrix B′ of M with |det(B′)| = |det(B)|, i.e. a smaller counterexample, contradicting that k was chosen as small as possible. So each column of B has exactly two nonzeros, a 1 and a −1. But then 1^t B = 0, implying that det(B) = 0. This contradicts the choice of B. □
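Lemma 54 can be checked by brute force on small examples. The sketch below (an illustration; the graph is made up) builds the incidence matrix of a directed graph and verifies that every square submatrix has determinant in {−1, 0, 1}.

```python
import numpy as np
from itertools import combinations

def incidence_matrix(n_vertices, arcs):
    """V x A incidence matrix: +1 at the head of each arc, -1 at its tail."""
    M = np.zeros((n_vertices, len(arcs)), dtype=int)
    for j, (tail, head) in enumerate(arcs):
        M[tail, j] = -1
        M[head, j] = 1
    return M

def is_totally_unimodular(M):
    """Brute-force check: every square submatrix has determinant in {-1, 0, 1}."""
    m, n = M.shape
    for k in range(1, min(m, n) + 1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                if round(np.linalg.det(M[np.ix_(rows, cols)])) not in (-1, 0, 1):
                    return False
    return True

# A made-up digraph: a directed 4-cycle plus a chord.
M = incidence_matrix(4, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)])
print(is_totally_unimodular(M))   # expected: True, in line with Lemma 54
```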

If M is totally unimodular, then it follows easily from the definition that the transpose of M as well as each submatrix of M is totally unimodular. Moreover, duplicating a row or column, adding a row or column with only one nonzero entry, and negating a row or column all preserve total unimodularity. It follows that if M is TU, then [M I], [M −M], etc. are all TU (exercise).

Theorem 55. Let M be a totally unimodular m × n matrix, and let b ∈ Zm be an integral vector. Then P := {x ∈ Rn : Mx ≤ b} is an integral polyhedron.

Proof. We need to show that if y ∈ P, then y ∈ conv.hull(P ∩ Zn). So let y ∈ P. Let P′ := {x ∈ P : ⌊y⌋ ≤ x ≤ ⌈y⌉}. It suffices to show that P′ is integral, since then y ∈ conv.hull(P′ ∩ Zn) and hence y ∈ conv.hull(P ∩ Zn). Now P′ is a bounded polyhedron, and P′ = {x ∈ Rn : M′x ≤ b′}, where

(160)   M′ = \begin{pmatrix} M \\ -I \\ I \end{pmatrix},   b′ = \begin{pmatrix} b \\ -⌊y⌋ \\ ⌈y⌉ \end{pmatrix}.

So M′ is again TU and b′ is integral. Since P′ is a bounded polyhedron, we know that rank(M′) = n. Let m_i denote the i-th row of M′. Let v be a vertex of P′. By Theorem 22, there is a set of rank(M′) = n indices B such that {m_i : i ∈ B} is linearly independent and m_i v = b′_i for all i ∈ B. So the submatrix M′_{B×n} of M′ is nonsingular, and M′_{B×n} v = b′_B. Then M′_{B×n} is unimodular, hence (M′_{B×n})^{−1} is an integral matrix. Since b′_B is integral as well, it follows that v = (M′_{B×n})^{−1} b′_B is integral. □

Corollary 55.1. Let M be a totally unimodular m × n matrix, and let b ∈ Zm and c ∈ Zn be integral vectors. Then

(161)   {x ∈ Rn : Mx ≤ b, x ≥ 0},   {y ∈ Rm : yM ≥ c, y ≥ 0},   {y ∈ Rm : yM = c, y ≥ 0}

are all integral polyhedra.


Figure 3. A bipartite graph, a matching and a vertex cover

2. Matching

2.1. Matchings and vertex covers. Let G = (V, E) be a graph. A matching of G is a set of edges F ⊆ E such that each vertex of G is incident with at most one edge in F, and a vertex cover is a set of vertices X ⊆ V such that each edge of G is incident with some vertex in X. We set ν(G) := max{|F| : F a matching of G} and τ(G) := min{|X| : X a vertex cover of G}. If F is a matching and X is a vertex cover, then each edge in F is incident with at least one vertex in X and no vertex in X covers more than one edge in F, hence |F| ≤ |X|. It follows that ν(G) ≤ τ(G) for any graph. See Figure 3.

The incidence matrix of the undirected graph G = (V,E) is the V × E matrix M_G = (m_ve) such that m_ve = 1 if v is incident with e and m_ve = 0 otherwise. The characteristic vector of a set F ⊆ E is χ^F ∈ {0, 1}^E defined by χ^F_e = 1 if e ∈ F and χ^F_e = 0 otherwise. It is easy to verify that

(162) {χ^F : F is a matching of G} = {x ∈ Z^E : M_G x ≤ 1, x ≥ 0},

and therefore

(163) ν(G) = max{1^t x : M_G x ≤ 1, x ≥ 0, x ∈ Z^E}.

Similarly, we have

(164) {χ^X : X is a vertex cover of G} + {y ∈ Z^V : y ≥ 0} = {y ∈ Z^V : yM_G ≥ 1, y ≥ 0},

and therefore

(165) τ(G) = min{y1 : yMG ≥ 1, y ≥ 0, y ∈ ZV }.

2.2. Matchings in bipartite graphs. A graph G = (V,E) is bipartite if there are twodisjoint subsets U,W ⊆ V such that each edge of G is incident with one vertex in U and one inW . If G is bipartite, the incidence matrix of G is totally unimodular (exercise). It follows thatwe can find a maximum-cardinality matching in any bipartite graph G by solving the linearoptimization problem

(166) max{1^t x : M_G x ≤ 1, x ≥ 0, x ∈ R^E}.

However, there exist more efficient methods.
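As a small computational illustration of (166), the following sketch solves the linear relaxation for a made-up bipartite graph using scipy's LP solver (assuming numpy and scipy are available). Because the incidence matrix of a bipartite graph is totally unimodular, the LP optimum is attained at an integral vertex, so the solution read off below is the characteristic vector of a maximum matching:

    import numpy as np
    from scipy.optimize import linprog

    # vertices 0,1,2 on one side, 3,4 on the other; edges as vertex pairs
    edges = [(0, 3), (1, 3), (1, 4), (2, 4)]
    n_vertices = 5
    M = np.zeros((n_vertices, len(edges)))
    for e, (u, w) in enumerate(edges):
        M[u, e] = 1
        M[w, e] = 1

    # maximize 1^t x  <=>  minimize -1^t x,  subject to M x <= 1, 0 <= x <= 1
    res = linprog(c=-np.ones(len(edges)), A_ub=M, b_ub=np.ones(n_vertices),
                  bounds=[(0, 1)] * len(edges), method="highs")
    print("nu(G) =", round(-res.fun))
    print("matching:", [edges[e] for e in range(len(edges)) if res.x[e] > 0.5])

This is only meant to illustrate the integrality phenomenon; dedicated augmenting-path algorithms are faster in practice.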

Theorem 56 (König, 1931). Let G be a bipartite graph. Then ν(G) = τ(G).


Proof. Let M_G be the V × E incidence matrix of G. As M_G is totally unimodular, both {x ∈ R^E : M_G x ≤ 1, x ≥ 0} and {y ∈ R^V : yM_G ≥ 1, y ≥ 0} are integral polyhedra. Hence

(167) ν(G) = max{1^t x : M_G x ≤ 1, x ≥ 0, x ∈ Z^E} = max{1^t x : M_G x ≤ 1, x ≥ 0, x ∈ R^E}

and

(168) τ(G) = min{y1 : yM_G ≥ 1, y ≥ 0, y ∈ Z^V} = min{y1 : yM_G ≥ 1, y ≥ 0, y ∈ R^V}.

Thus τ(G) = ν(G) follows by linear optimization duality (Theorem 19). □

We say that a matching F of G is perfect if each vertex of G is incident with exactly one edge in F. If G = (V,E) and Q ⊆ V, then the neighbor set of Q in G is

(169) N(Q) := {w ∈ V \ Q : vw ∈ E for some v ∈ Q}.

It is an exercise to prove the following Corollary.

Corollary 56.1 (Hall, 1935). For a bipartite graph G with bipartition U, W, exactly one of the following is true.

(1) G has a perfect matching.
(2) There is a set Q ⊆ U such that |N(Q)| < |Q|.

The above corollary is known as the marriage theorem, since it concerns a classic matchmakers' problem.

2.3. Matchings in nonbipartite graphs. If a graph is not bipartite, then we may have ν(G) < τ(G). For example, ν(K_3) = 1 and τ(K_3) = 2, and more generally, for odd circuits we have ν(C_{2k+1}) = k and τ(C_{2k+1}) = k + 1.

Define the polytope P_M(G) := conv.hull {χ^F : F is a matching of G} for any graph G. If G is bipartite, we have

(170) P_M(G) = {x ∈ R^E : M_G x ≤ 1, x ≥ 0},

as the latter polyhedron is integral. Again this is not necessarily true for a nonbipartite graph: if G is an odd circuit, then

(171) ½·1 ∉ P_M(G) ⊆ {x ∈ R^E : M_G x ≤ 1, x ≥ 0} ∋ ½·1.

Theorem 27 implies that P_M(G) is a bounded polyhedron, as it is by definition a polytope. So there must be a system of linear inequalities whose set of solutions is P_M(G).

Let G = (V,E) be an undirected graph. For any U ⊆ V , let

(172) E(U) := {e ∈ E : e has both ends in U}

be the set of edges spanned by U. Let

(173) Q_M(G) := {x ∈ R^E : x_e ≥ 0 for all e ∈ E,
                           ∑_{e∈δ(v)} x_e ≤ 1 for all v ∈ V,
                           ∑_{e∈E(U)} x_e ≤ ⌊|U|/2⌋ for all U ⊆ V }.

It is an exercise to show that PM (G) ⊆ QM (G) for any graph G. Without proof, we mention:

Theorem 57 (Edmonds, 1965). If G is an undirected graph, then PM (G) = QM (G).

This theorem does not give an efficient method for finding maximum-cardinality matchings through linear optimization, as the number of inequalities that describe Q_M(G) is of the order 2^{|V|}. But an efficient method does exist. In fact, Edmonds' Theorem was a by-product of the development of such a method.


BB(c, P, d, x0):
    Solve max{cx : x ∈ P} and call the optimal solution x;
    if cx ≤ d return (d, x0);
    else if x ∈ Z^n return (cx, x);
    else let i be such that x_i ∉ Z;
        Let P′ := {x ∈ P : x_i ≤ ⌊x_i⌋} and P″ := {x ∈ P : x_i ≥ ⌈x_i⌉};
        (d, x0) ← BB(c, P′, d, x0);
        (d, x0) ← BB(c, P″, d, x0);
        return (d, x0)

Figure 4. The branch & bound algorithm

3. Branch & bound

3.1. Branch & bound. Suppose we want to solve the problem

(174) max{cx : x ∈ P ∩ Z^n},

where P is a polyhedron of which we know a description P = {x ∈ R^n : Ax ≤ b}. Two easy observations lead to the branch and bound algorithm for solving (174).

First of all, if we somehow can find polyhedra P ′, P ′′ so that P ∩Zn = (P ′∩Zn)∪(P ′′∩Zn),then we may solve max{cx : x ∈ P ∩ Zn} by solving both max{cx : x ∈ P ′ ∩ Zn} andmax{cx : x ∈ P ′′ ∩ Zn}: the best of both solutions is the solution to (174).

Second, we obviously have

(175) max{cx : x ∈ P ∩ Zn} ≤ max{cx : x ∈ P}.The latter problem, the LP relaxation of (174), is an ordinary linear optimization problem,which we can solve by the simplex algorithm or any other suitable method. If we are onlyinterested in solutions x ∈ Zn so that cx > d for some d, then we may safely stop if the valueof the LP relaxation is ≤ d.

Branch & bound is the procedure described in Figure 4. To solve (174), we must compute BB(c, P, −∞, x0) (when passing a polyhedron as an argument, we assume that a description in terms of linear inequalities is given), where x0 is arbitrary. If the problem is feasible, the algorithm will eventually find an integral x and set x0 ← x and d ← cx. From then on, it is possible that in a subproblem (c, P̄, d, x0) (so P̄ contains only a subset of P ∩ Z^n), we find that there cannot be any x ∈ P̄ ∩ Z^n such that cx > d, since already max{cx : x ∈ P̄} ≤ d; in that case, there is no reason to investigate P̄ further. Without such a check, the algorithm would be hardly more than a complete enumeration of all elements of P ∩ Z^n. With it, we may save considerable time by not investigating corners of P that cannot contain anything interesting anymore.

It should now be clear why this is called branch and bound: after splitting P into two polyhedra P′ and P″ that together contain all integer points of P, the computation 'branches'; when we abandon a subproblem after computing an upper bound on the objective value, we 'bound'.

The name 'branch & bound' applies to several variants of the above procedure. There are other ways to split up P: in general we may construct some a ∈ Z^n so that ax ∉ Z and put P′ := {x ∈ P : ax ≤ ⌊ax⌋}, P″ := {x ∈ P : ax ≥ ⌈ax⌉}. Also, one may choose the order in which BB(c, P′, d, x0) and BB(c, P″, d, x0) are evaluated, e.g. 'if ax − ⌊ax⌋ ≤ 1/2, do BB(c, P′, d, x0) first'.
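The procedure of Figure 4 can be sketched in a few lines of Python, with the LP relaxations solved by scipy (a minimal illustration under that assumption; a polyhedron {x : Ax ≤ b} is passed as the pair (A, b), and the toy instance at the end is made up, not taken from the text):

    import math
    import numpy as np
    from scipy.optimize import linprog

    def bb(c, A, b, d=-np.inf, x0=None, tol=1e-6):
        # solve the LP relaxation max{cx : Ax <= b}
        res = linprog(-np.asarray(c, float), A_ub=A, b_ub=b,
                      bounds=[(None, None)] * len(c), method="highs")
        if not res.success:          # infeasible (or unbounded) subproblem: abandon it in this sketch
            return d, x0
        x, val = res.x, -res.fun
        if val <= d + tol:           # bound: this part of P cannot beat the incumbent
            return d, x0
        frac = [i for i in range(len(x)) if abs(x[i] - round(x[i])) > tol]
        if not frac:                 # integral LP optimum: new incumbent
            return val, x
        i = frac[0]                  # branch on a fractional coordinate
        row = np.zeros(len(c)); row[i] = 1.0
        d, x0 = bb(c, np.vstack([A, row]),  np.append(b, math.floor(x[i])), d, x0)
        d, x0 = bb(c, np.vstack([A, -row]), np.append(b, -math.ceil(x[i])), d, x0)
        return d, x0

    # toy instance: max{x1 + x2 : 2x1 + 3x2 <= 12, 3x1 + 2x2 <= 12, x >= 0, x in Z^2}
    A = np.array([[2, 3], [3, 2], [-1, 0], [0, -1]], float)
    b = np.array([12, 12, 0, 0], float)
    print(bb([1, 1], A, b))          # optimal value 4, attained at an integral point such as (2, 2)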


3.2. Alternative formulations. There is usually more than one polyhedron containing the same integral points, i.e. we can easily have P ∩ Z^n = Q ∩ Z^n for different polyhedra P, Q. Then max{cx : x ∈ P ∩ Z^n} and max{cx : x ∈ Q ∩ Z^n} are alternative formulations of the same optimization problem. If P, Q are such that P ⊇ Q, then the branch & bound algorithm will in general solve max{cx : x ∈ Q ∩ Z^n} faster than max{cx : x ∈ P ∩ Z^n}, simply because then

(176) max{cx : x ∈ Q} ≤ max{cx : x ∈ P},

and thus max{cx : x ∈ Q} ≤ d is more likely than max{cx : x ∈ P} ≤ d. Consequently, the branch & bound algorithm will on the whole find more occasions to skip work when evaluating max{cx : x ∈ Q ∩ Z^n} than when evaluating max{cx : x ∈ P ∩ Z^n}. Thus, if an integer linear optimization problem max{cx : x ∈ P ∩ Z^n} is solved too slowly by B&B, switching to a formulation max{cx : x ∈ Q ∩ Z^n} where P ⊇ Q and P ∩ Z^n = Q ∩ Z^n might help. We have no mathematical proof of this claim, and indeed it is not a very precise statement. But in practice one does strengthen the description of an optimization problem in this manner to improve the running time, and it often works.

The best possible Q, the 'tightest' description of P ∩ Z^n, is P_I, the integer hull of P. If we have a description of P_I, then there is no need even to apply the branch & bound algorithm: it suffices to solve the linear optimization problem max{cx : x ∈ P_I}. But it may not be possible to determine the inequalities that describe P_I; or there may simply be too many inequalities in the description to solve even this linear optimization problem.

In general, a polyhedron Q such that P ⊇ Q and P ∩ Zn = Q ∩ Zn may be obtained byadding linear inequalities to the description of P that are valid for all x ∈ P ∩ Zn but not forall x ∈ P , as was explained in the beginning of this Chapter. Even when the full descriptionof the integral hull is not found, adding valid inequalities may still improve the running timeof the branch & bound algorithm.

But we need not always use an existing description of a problem to find a better one; sometimes it is clear that two different formulations each accurately describe the problem at hand.

Example: the facility location problem. Consider the following situation. Throughout the country, you have n potential facilities and a total of m clients that require services from a facility. There is a cost c_ij associated with servicing client j from facility i. Moreover, there is a cost f_i for using facility i at all. The facility location problem (FLP) is to decide which facilities to open, and to assign each client to an open facility, such that the combined cost of opening facilities and servicing clients is minimal.

The FLP can be formulated as an integer linear optimization problem:

(177) min{ ∑_{i=1}^n f_i y_i + ∑_{i=1}^n ∑_{j=1}^m c_ij x_ij :
           ∑_{j=1}^m x_ij ≤ m y_i for all i,
           ∑_{i=1}^n x_ij ≥ 1 for all j,
           x_ij, y_i ∈ {0, 1} for all i, j }.

Here, the constraint x_ij, y_i ∈ {0, 1} abbreviates 'x_ij, y_i ∈ Z and 0 ≤ x_ij, y_i ≤ 1'. So (177) is an integer linear optimization problem. The variables represent the choices to be made: y_i = 1 means opening facility i, and x_ij = 1 means assigning client j to facility i. It is clear that any x, y satisfying the constraints of (177) represent a consistent set of choices. So (177) is a correct formulation of the FLP, but so is

(178) min{ ∑_{i=1}^n f_i y_i + ∑_{i=1}^n ∑_{j=1}^m c_ij x_ij :
           x_ij ≤ y_i for all i, j,
           ∑_{i=1}^n x_ij ≥ 1 for all j,
           x_ij, y_i ∈ {0, 1} for all i, j }.


Let

(179) P := { (x, y) ∈ R^{nm+n} : ∑_{j=1}^m x_ij ≤ m y_i for all i,
                                  ∑_{i=1}^n x_ij ≥ 1 for all j,
                                  0 ≤ x_ij, y_i ≤ 1 for all i, j }

and

(180) Q := { (x, y) ∈ R^{nm+n} : x_ij ≤ y_i for all i, j,
                                  ∑_{i=1}^n x_ij ≥ 1 for all j,
                                  0 ≤ x_ij, y_i ≤ 1 for all i, j }.

Then (177) is min{fy + cx : (x, y) ∈ P ∩ Z^{nm+n}} and (178) is min{fy + cx : (x, y) ∈ Q ∩ Z^{nm+n}}, and P ⊇ Q. Even though (178) has more constraints than (177), the B&B algorithm will in general solve (178) faster than (177).
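The difference between the two formulations can be seen numerically by comparing the LP relaxations over P and over Q. The following sketch does this for a small made-up instance (n = 2 facilities, m = 3 clients), assuming numpy and scipy are available; variables are ordered as (x_11, ..., x_nm, y_1, ..., y_n):

    import numpy as np
    from scipy.optimize import linprog

    n, m = 2, 3
    f = np.array([10.0, 12.0])
    c = np.array([[1.0, 6.0, 5.0],
                  [4.0, 1.0, 2.0]])
    obj = np.concatenate([c.ravel(), f])

    def xi(i, j):            # position of x_ij in the variable vector
        return i * m + j

    def lp_bound(aggregated):
        rows, rhs = [], []
        if aggregated:       # relaxation of P:  sum_j x_ij - m*y_i <= 0
            for i in range(n):
                r = np.zeros(n * m + n)
                r[[xi(i, j) for j in range(m)]] = 1.0
                r[n * m + i] = -m
                rows.append(r); rhs.append(0.0)
        else:                # relaxation of Q:  x_ij - y_i <= 0
            for i in range(n):
                for j in range(m):
                    r = np.zeros(n * m + n)
                    r[xi(i, j)] = 1.0
                    r[n * m + i] = -1.0
                    rows.append(r); rhs.append(0.0)
        for j in range(m):   # every client served:  -sum_i x_ij <= -1
            r = np.zeros(n * m + n)
            r[[xi(i, j) for i in range(n)]] = -1.0
            rows.append(r); rhs.append(-1.0)
        res = linprog(obj, A_ub=np.array(rows), b_ub=np.array(rhs),
                      bounds=[(0, 1)] * (n * m + n), method="highs")
        return res.fun

    print("LP bound over P:", lp_bound(True))    # weaker (smaller) lower bound
    print("LP bound over Q:", lp_bound(False))   # stronger (larger) lower bound, since Q is contained in P

The cost data here are purely illustrative; the point is only that the Q-relaxation never gives a worse bound than the P-relaxation.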

Exercises

(1) Determine whether the matrix

        [ −1  1  0  0  1 ]
        [  1 −1  1  0  0 ]
        [  0  1 −1  1  0 ]
        [  0  0  1 −1  1 ]
        [  1  0  0  1 −1 ]

    is totally unimodular.

(2) Let M be a totally unimodular matrix. Show that each of the following matrices is totally unimodular.
    (a) [M I].
    (b) [M −M].
    (c) M^t.
(3) Let M be TU, and let M′ be obtained from M by a pivot. Show that M′ is TU.
(4) Prove Corollary 55.1.

(5) A vector q ∈ {0, 1}^n is an interval vector if, for some a, b ∈ N, we have q_i = 1 if a ≤ i ≤ b and q_i = 0 otherwise. A matrix M is an interval matrix if each column of M is an interval vector. Show that each interval matrix is totally unimodular.

(6) Let M be a totally unimodular m × n matrix and let b ∈ Z^m. Show that exactly one of the following holds.
    (a) There is an x ∈ Z^n such that Mx = b and x ≥ 0.
    (b) There is a y ∈ Z^m such that yM ∈ {0, 1}^n and yb < 0.

(7) Show that the incidence matrix of a bipartite graph is totally unimodular.
(8) Show that the matching and the vertex cover shown in Figure 3 are both optimal. Conclude that this bipartite graph has no perfect matching. Find a set of vertices Q such that |N(Q)| < |Q|.

(9) Prove Corollary 56.1.
(10) Show that P_M(G) ⊆ Q_M(G) for any undirected graph G = (V,E).
(11) Determine the number of inequalities in the definition of Q_M(G), for any undirected graph G = (V,E), as a function of |V| and |E|.
(12) Show that Q_M(G) = {x ∈ R^E : M_G x ≤ 1, x ≥ 0}^(1) for any undirected graph G.


CHAPTER 7

Convexity

1. Convex and concave functions

1.1. Definition. Let C be a subset of Rn and let f : C → R. The function f is convex ifand only if C is a convex set and

(181) f(x+ λ(y − x)) ≤ f(x) + λ(f(y)− f(x))

for all x, y ∈ C and all λ ∈ (0, 1). We say that f is strictly convex if the inequality in (181) is strict whenever x ≠ y. A function f is concave if −f is convex (similarly for strictly concave). For example, a linear function x ↦ cx + d is both convex and concave. The function f : R^n → R defined by f : x ↦ ‖x‖ is convex, since

(182) f(x+ λ(y − x)) = ‖(1− λ)x+ λy‖ ≤ ‖(1− λ)x‖+ ‖λy‖ = f(x) + λ(f(y)− f(x)),

by the triangle inequality: ‖a+ b‖ ≤ ‖a‖+ ‖b‖ for all a, b. We give several examples of convexfunctions in the third section of this chapter.

1.2. Operations that preserve convexity. We describe ways to make new convex func-tions from old. Let f : Rn → R be a convex function.

(1) Multiplication: The function αf : Rn → R is convex for any α ≥ 0, where (αf)(x) :=αf(x) for all x.

(2) Translation: Let t : Rn → Rn be a translation, i.e. t : x 7→ x + p for some p ∈ Rn.Then f ◦ t : Rn → R is convex, where f ◦ t : x 7→ f(x+ p).

(3) Linear transformation: Let l : Rm → Rn be a linear transformation, i.e l : x 7→ Ax forsome n×m matrix A. Then f ◦ l : Rm → R is convex, where f ◦ l : x 7→ f(Ax).

Of course such operations can be combined to show that, say, x 7→ α‖Ax+ p‖ is convex. Now,let f1, . . . , fk : Rn → R be convex functions.

(1) Addition: The function f1 + f2 + · · ·+ fk : Rn → R is convex, where (f1 + f2 + · · ·+fk)(x) := f1(x) + f2(x) + · · ·+ fk(x) for all x.

(2) Taking the maximum: The function max{f1, f2, . . . , fk} : Rn → R is convex, wheremax{f1, f2, . . . , fk}(x) := max{f1(x), f2(x), . . . , fk(x)} for all x.


Figure 1. A convex and a nonconvex function


1.3. Minimizing a differentiable function. Let f : R^n → R and let x̄ ∈ R^n. We say that f is differentiable at x̄ if there is an affine function f^1_x̄ : R^n → R such that

(183) f(x) = f^1_x̄(x) + ‖x − x̄‖ ρ(x),

where ρ is such that ρ(x) → 0 whenever ‖x − x̄‖ → 0. Then f^1_x̄ is the first-order approximation of f at x̄, and one can show that f^1_x̄ : x ↦ f(x̄) + ∇f(x̄)(x − x̄), where

(184) ∇f(x̄) := ( ∂f/∂x_1 (x̄), . . . , ∂f/∂x_n (x̄) )

is the gradient of f at x̄. If f : C → R is differentiable at each x̄ ∈ C, then f is differentiable. If f is not differentiable at x̄, then x̄ is a singular point of f. For example, the function f : x ↦ ‖x‖ has one singular point, the origin 0. When x ≠ 0, we have ∇f(x) = x^t/‖x‖.

Lemma 58. Let f : R^n → R be differentiable at x̄ ∈ R^n and let d ∈ R^n. If ∇f(x̄)d < 0, then there exists an ε > 0 such that f(x̄ + λd) < f(x̄) for all λ ∈ (0, ε).

Proof. If f is differentiable at x̄ then f(x̄ + λd) = f(x̄) + λ∇f(x̄)d + λ‖d‖ρ(λd), where ρ(λd) → 0 whenever λ → 0. Thus there is an ε > 0 such that |∇f(x̄)d| > ‖d‖|ρ(λd)| for all λ ∈ (0, ε). Hence ∇f(x̄)d < 0 implies that f(x̄ + λd) < f(x̄) for all λ ∈ (0, ε). □

It follows that if x is a minimizer of a differentiable function f, then ∇f(x) = 0. In general, the converse need not be true: e.g. the function f : x ↦ −x⁴ has f′(0) = 0 and f(λ) < f(0) for all λ ≠ 0. We shall see that for convex functions, the converse does hold.

1.4. A characterization of differentiable convex functions. We have a characteri-zation of convexity in terms of the derivative for differentiable functions.

Theorem 59. Let C be an open convex subset of Rn and let f : C → R be differentiable.Then f is convex on C if and only if f(y) ≥ f(x) +∇f(x)(y − x) for all x, y ∈ C.

Proof. To see that the condition is sufficient, consider x, y ∈ C, λ ∈ (0, 1) and definez := x+ λ(y − x) = (1− λ)x+ λy. By applying ‘f(y) ≥ f(x) +∇f(x)(y − x)’ twice we get

(185) f(x) ≥ f(z) +∇f(z)(x− z) and f(y) ≥ f(z) +∇f(z)(y − z).

Adding (1 − λ) times the left inequality to λ times the right inequality, we obtain (181). Thus f is convex. It remains to show that the condition is necessary. So suppose f is convex on C and let x ∈ C. Let g : y ↦ f(y) − ∇f(x)(y − x). Then g is convex and ∇g(x) = 0. For any y ∈ C and λ ∈ (0, 1], convexity gives (g(x + λ(y − x)) − g(x))/λ ≤ g(y) − g(x); letting λ ↓ 0, the left-hand side tends to ∇g(x)(y − x) = 0, so g(y) ≥ g(x). Hence f(y) − ∇f(x)(y − x) = g(y) ≥ g(x) = f(x) for all y ∈ C. □

Lemma 60. Let C ⊆ R^n be an open convex set, let f : C → R, and let x ∈ C. Suppose f is convex and f is differentiable at x. Then ∇f(x) = 0 if and only if f(x) ≤ f(y) for all y ∈ C.

Proof. If ∇f(x) ≠ 0, then f(x) > f(y) for some y ∈ C by Lemma 58. Conversely, if ∇f(x) = 0, then f(y) ≥ f(x) + ∇f(x)(y − x) = f(x) for all y ∈ C by Theorem 59. □

It follows that to find a minimizer x of a differentiable convex function f , it suffices to solvethe system of equations ∇f(x) = 0. In some cases, this can be done analytically. When thisis not feasible, there are several methods to find a numerical solution. For example, one mayapply the Newton method for solving systems of equations. Variants of the Newton methodhave been designed specifically for finding the minimizer of a convex function.


Application: Fermat’s problem. Fermat’s Problem is:

“Given three points a, b, c ∈ R2, locate a point x ∈ R2 minimizing the sumof its distances to a, b and c.”

When Fermat said 'locate', he most likely meant a construction by ruler and compasses. But we can at least characterize the point x. Let f : x ↦ ‖x − a‖ + ‖x − b‖ + ‖x − c‖. We seek a minimizer of f. It is easy to see that f is convex (f is the sum of translates of convex functions). By Lemma 60, it follows that if x is the minimizer of f, then either x is a singular point, or ∇f(x) = 0. It is easily seen that a, b, c are the only singular points of f, and that

    ∇f(x) = (x − a)^t/‖x − a‖ + (x − b)^t/‖x − b‖ + (x − c)^t/‖x − c‖

for nonsingular x. It is an exercise to show that ∇f(x) = 0 if and only if the angle between each pair of the vectors (x − a)/‖x − a‖, (x − b)/‖x − b‖, (x − c)/‖x − c‖ is 2π/3. This characterizes the minimizer.
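Numerically, the minimizer and the 120-degree condition can be checked with a general-purpose solver. A minimal sketch, assuming numpy and scipy are available and using three arbitrary sample points (not taken from the text):

    import numpy as np
    from scipy.optimize import minimize

    a, b, c = np.array([0.0, 0.0]), np.array([4.0, 0.0]), np.array([1.0, 3.0])

    def f(x):
        return sum(np.linalg.norm(x - p) for p in (a, b, c))

    x_star = minimize(f, x0=np.array([1.0, 1.0])).x

    # the unit vectors from the minimizer towards a, b, c should pairwise make
    # angles of 2*pi/3, provided the minimizer is not one of the points a, b, c
    units = [(p - x_star) / np.linalg.norm(p - x_star) for p in (a, b, c)]
    angles = [np.degrees(np.arccos(np.clip(units[i] @ units[j], -1, 1)))
              for i, j in [(0, 1), (0, 2), (1, 2)]]
    print(x_star, angles)   # angles are all approximately 120 degrees

For this triangle all interior angles are below 120 degrees, so the minimizer is interior and the gradient condition applies; if one of the triangle's angles is at least 120 degrees, the minimizer is the corresponding singular point instead.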

1.5. Quadratic functions. Let A be a symmetric n×n matrix. We say that A is positivedefinite (PD) if xtAx > 0 for all x ∈ Rn \ {0}, and that A is positive semidefinite (PSD) if andonly if xtAx ≥ 0 for all x ∈ Rn.

Lemma 61. Let f : Rn → R be a quadratic function, say f(x) = xtAx+ bx+ c, where A isa symmetric n× n matrix, b ∈ Rn is a row vector, and c ∈ R. Then f is convex if and only ifA is positive semidefinite.

Proof. For f : x 7→ xtAx + bx + c, we have ∇f(x) = 2xtA + b, hence by Theorem 59, fis convex if and only if ytAy + by + c ≥ xtAx + bx + c + (2xtA + b)(y − x) for all x, y ∈ Rn.This is equivalent to: ‘(y − x)tA(y − x) ≥ 0 for all x, y ∈ Rn’ which in turn is equivalent to‘dtAd ≥ 0 for all d ∈ Rn’. �

Similarly, one shows that f is strictly convex if and only if A is positive definite.

1.6. Twice differentiable functions. Let f : R^n → R and let x̄ ∈ R^n. We say that f is twice differentiable at x̄ if there is a quadratic function f^2_x̄ : R^n → R such that

(186) f(x) = f^2_x̄(x) + ‖x − x̄‖² ρ(x),

where ρ is such that ρ(x) → 0 whenever ‖x − x̄‖ → 0. One can show that then

(187) f^2_x̄ : x ↦ f(x̄) + ∇f(x̄)(x − x̄) + ½ (x − x̄)^t ∇²f(x̄) (x − x̄),

where ∇²f(x̄) is the Hessian of f at x̄, the symmetric n × n matrix defined by

(188) ∇²f(x̄) := [ ∂²f/∂x_1² (x̄)      . . .   ∂²f/∂x_1∂x_n (x̄) ]
                 [        ...                        ...        ]
                 [ ∂²f/∂x_n∂x_1 (x̄)   . . .   ∂²f/∂x_n² (x̄)    ].

If f : C → R is twice differentiable at each x̄ ∈ C, then f is twice differentiable. One can show that f is convex if and only if f^2_x̄ is convex for each x̄, or equivalently, if ∇²f(x̄) is PSD at each x̄.

Theorem 62. Let C be an open convex subset of Rn and let f : C → R be twice differen-tiable. Then f is convex on C if and only if ∇2f(x) is positive semidefinite for all x ∈ C.

The proof is an exercise. Similarly, we have: if ∇2f(x) is positive definite for all x ∈ C,then f is strictly convex. The converse does not always hold: f : x 7→ x4 is strictly convex but∇2f(0) = f ′′(0) = 0.


2. Positive definite and semidefinite matrices

2.1. Characterization of PSD matrices. Recall the following theorem from linear al-gebra.

Theorem 63. If A is a symmetric n × n matrix, then A has n eigenvectors f_1, . . . , f_n belonging to eigenvalues λ_1, . . . , λ_n, such that {f_1, . . . , f_n} is an orthonormal basis of R^n. Moreover, the matrix F whose columns are f_1, . . . , f_n is orthogonal, and

(189) F^t A F = diag(λ_1, . . . , λ_n)

is a diagonal matrix.

We derive a characterization of PSD matrices from this theorem.

Theorem 64. Let A be a symmetric n× n matrix. The following are equivalent:

(1) A is positive semidefinite, i.e. x^tAx ≥ 0 for all x ∈ R^n.
(2) All eigenvalues of A are nonnegative.
(3) A = Z^tZ for some real-valued k × n matrix Z.

Proof. We first prove the implication (1) ⇒ (2). Suppose A is PSD. Let λ be an eigenvalue of A and let f be an eigenvector belonging to λ, i.e. Af = λf. Then λ‖f‖² = f^tAf ≥ 0, hence λ ≥ 0. We now show the implication (2) ⇒ (3). As A is a symmetric matrix, there is an orthogonal matrix F as in (189) by Theorem 63. Assume that the eigenvalues λ_i of A are nonnegative. Then the square roots √λ_i are real numbers. Take

(190) Z := diag(√λ_1, . . . , √λ_n) F^{−1}.

Then Z is real-valued, and A = Z^tZ, as required. To see that (3) ⇒ (1), observe that if A = Z^tZ, then x^tAx = x^tZ^tZx = ‖Zx‖² ≥ 0 for all x. □

Given an n×n matrix A and a set I ⊆ {1, . . . , n}, let AI := (aij)i,j∈I denote the submatrixof A restricted to the rows and columns indexed by I.

Corollary 64.1. Let A be an n×n PSD matrix. Then det(AI) ≥ 0, for any I ⊆ {1, . . . , n}.

Proof. A is PSD ⇒ xtAx ≥ 0 for all x ∈ Rn ⇒ xtAx ≥ 0 for all x ∈ Rn such that xi = 0whenever i 6∈ I ⇒ ytAIy ≥ 0 for all y ∈ RI ⇒ AI is PSD. Note that det(A) equals the productof the eigenvalues of A for a symmetric matrix A. Hence if AI is PSD, then det(AI) ≥ 0. �

2.2. Characterization of PD matrices. It is possible to obtain the following resultsfor PD matrices. The proofs are very similar to the ones about PSD matrices.

Theorem 65. Let A be a symmetric n× n matrix. The following are equivalent:

(1) A is positive definite, i.e. xtAx > 0 for all x ∈ Rn \ {0}.(2) all eigenvalues of A are positive.(3) A = ZtZ for some real-valued n× n matrix Z, with det(Z) 6= 0.

Corollary 65.1. Let A be an n×n PD matrix. Then det(AI) > 0, for any I ⊆ {1, . . . , n}.


2.3. Recognizing PD and PSD matrices. Theorem 64 enables us to check whether a matrix is PSD by calculating all its eigenvalues and verifying that they are nonnegative. The corollary gives us a quick way of arguing that a symmetric matrix A is not PSD: it suffices to find I such that det(A_I) < 0. Both methods are useful for small matrices. In general, we need a more efficient method.

A symmetric matrix operation on an n× n matrix is one of the following:

(1) multiplying the i-th row and the i-th column by the same nonzero scalar λ;(2) interchanging the i-th and the j-th column and interchanging i-th and the j-th row;

and(3) adding λ times the i-th column to the j-th column and λ times the i-th row to the

j-th row.

For two n × n matrices A and B, we denote by A ≅ B that B can be obtained from A by a series of symmetric matrix operations. It is easy to see that A ≅ B if and only if there is some invertible matrix Y such that B = Y^tAY.

Lemma 66. Let A, B be symmetric matrices. If A ≅ B, then A is PSD if and only if B is PSD.

Proof. Suppose B = Y tAY . If A is PSD, then for any x ∈ Rn, we have xtBx =xtY tAY x = ytAy ≥ 0, where y = Y x. It follows that B is PSD. To see the converse, note that∼= is an equivalence relation. �

Similarly, one shows that if A ∼= B, then A is PD if and only if B is PD.Given a symmetric matrix A, it is straightforward to construct a diagonal matrix D such

that A ∼= D by applying symmetric matrix operations. Since the eigenvalues of the diagonalmatrix D are exactly its diagonal entries, it is easy to verify whether D is PSD/PD. The processis quite efficient: no more than n(n − 1)/2 symmetric matrix operations are required for ann× n matrix.

Example. Consider the matrix

(191) A := [ 2  1  4 ]
           [ 1  3 −1 ]
           [ 4 −1  3 ].

Applying symmetric matrix operations, adding suitable multiples of the first row/column to the other rows/columns so as to obtain 0's elsewhere in the first row/column, we get

(192) [ 2    0    0 ]
      [ 0   5/2  −3 ]
      [ 0   −3   −5 ].

We continue with the second row/column and find

(193) [ 2    0      0   ]
      [ 0   5/2     0   ]
      [ 0    0   −43/5 ]  =: D.

This diagonal matrix D clearly has the negative eigenvalue −43/5. So D is not PSD, hence A is not PSD.

In fact, (192) already has a negative diagonal entry (a 1 × 1 principal submatrix with negative determinant), which implies that it is not PSD by Corollary 64.1. We could have finished our computation there.
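The same conclusion can be checked numerically via Theorem 64 by computing the eigenvalues of the symmetric matrix. A minimal sketch, assuming numpy is available:

    import numpy as np

    A = np.array([[2, 1, 4],
                  [1, 3, -1],
                  [4, -1, 3]], dtype=float)
    eigenvalues = np.linalg.eigvalsh(A)       # eigenvalues of a symmetric matrix
    print(eigenvalues)                        # one of them is negative
    print("PSD:", bool(np.all(eigenvalues >= 0)))   # False, in agreement with the example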


2.4. The Geršgorin Theorem. The following theorem can sometimes be applied to show that a matrix has only nonnegative eigenvalues.

Theorem 67 (Geršgorin, 1931). Let A be a complex n × n matrix, and define

(194) ρ_j := ∑{ |a_ji| : i ∈ {1, . . . , n} \ {j} }

for j = 1, . . . , n. If λ is an eigenvalue of A, then there exists a p such that |λ − a_pp| ≤ ρ_p.

Proof. Let λ be an eigenvalue of A. Then Ax = λx for some nonzero x ∈ C^n, in other words

(195) ∑_{k=1}^n a_jk x_k = λ x_j for each j.

Let p be such that |x_p| = max{|x_j| : j}. Then from the above equation with j = p we have (λ − a_pp)x_p = ∑_{k≠p} a_pk x_k. It follows that

(196) |λ − a_pp| |x_p| = | ∑_{k≠p} a_pk x_k | ≤ ∑_{k≠p} |a_pk| |x_k| ≤ ρ_p |x_p|.

Since x ≠ 0, we have |x_p| > 0, and it follows that |λ − a_pp| ≤ ρ_p. □

The sets {z ∈ C : |z − a_jj| ≤ ρ_j} are called Geršgorin disks. Geršgorin's Theorem thus states that each eigenvalue lies in a Geršgorin disk.
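A quick numerical illustration of Theorem 67, assuming numpy is available (the matrix below is an arbitrary example, not taken from the text):

    import numpy as np

    A = np.array([[4.0, 1.0, 0.5],
                  [0.3, 3.0, 0.2],
                  [0.1, 0.4, 5.0]])
    rho = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))   # off-diagonal row sums
    for lam in np.linalg.eigvals(A):
        in_some_disk = bool(np.any(np.abs(lam - np.diag(A)) <= rho))
        print(lam, in_some_disk)    # True for every eigenvalue

For this matrix all disks lie in the right half-plane, so every eigenvalue has positive real part; applied to a symmetric matrix whose disks all lie in [0, ∞), the same argument shows that the matrix is PSD, which is how the theorem is typically used here.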

3. Examples of convex functions

3.1. Some functions of one and two variables. For functions of one variable f : R → R, Theorem 62 states that f is convex if and only if f″(x) ≥ 0 for all x ∈ R. This makes it easy to verify the following.

(1) f : x ↦ exp(ax) is convex for all a ∈ R.
(2) f : x ↦ x^p is convex on (0, ∞) if p ≤ 0 or p ≥ 1, and concave on (0, ∞) if p ∈ (0, 1).
(3) f : x ↦ log(x) is strictly concave on (0, ∞).
(4) f : x ↦ x log(x) is strictly convex on (0, ∞).

The function of two variables f(x, y) := x²/y is convex on {(x, y) ∈ R² : y > 0}. To see this, note that

(197) ∇²f(x, y) = (2/y³) [  y²  −xy ]
                          [ −xy   x² ].

It is an exercise to show that this matrix is PSD for any y > 0.

3.2. Entropy. The function f : x ↦ x log₂(x) is convex and naturally defined on (0, ∞). Extending f by setting f(0) := lim_{x↓0} f(x) = 0, we obtain a convex function defined on [0, ∞).

The Shannon entropy

(198) H_n : x ↦ −∑_{i=1}^n x_i log₂(x_i)

is a concave function defined on [0, ∞)^n, extending each x_i ↦ x_i log₂(x_i) as above.

For a discrete random variable taking value i ∈ {1, . . . , n} with probability p_i, the Shannon entropy H_n(p) is regarded as a measure of the information content of the probability distribution p. The following theorem formalizes this notion.

A binary word of length k is an element of {0, 1}k. We denote the length of a binary wordb by |b|. A binary code is a set of binary words. If b, b′ are binary words, then b is an initial


segment of b′ if |b| ≤ |b′| and b_i = b′_i for all i = 1, . . . , |b|. A prefix code is a binary code C such that b is not an initial segment of b′, for any distinct b, b′ ∈ C. A prefix code C has the advantage of being 'uniquely decodable': that is, if b_1 · b_2 · · · b_l is a concatenation of codewords b_i ∈ C, then b_1 · b_2 · · · b_l = b′_1 · b′_2 · · · b′_l implies b_1 = b′_1, b_2 = b′_2, . . . , b_l = b′_l for any b′_i ∈ C. This makes prefix codes especially suitable for compactly encoding strings of symbols: there is no need to indicate the ends of codewords to ensure unambiguous decoding.

A prefix encoding of a finite set X is an injection f : X → C, where C is a prefix code. If p is a probability distribution on X, then the expected length of a codeword is

(199) E_p(f) := ∑_{i∈X} p_i |f(i)|.

Suppose we want to store or transmit a string of elements from X of which we only know that symbol i ∈ X occurs with rate p_i in the string. Then we may save space/bandwidth by encoding the symbols before storage/transmission. A prefix encoding f will use E_p(f) bits per symbol on average. It is therefore worthwhile to seek a prefix encoding f such that E_p(f) is minimal given the rates p. The optimum encoding has the following relation to the Shannon entropy.

Theorem 68 (Shannon, 1948). Let p be a probability distribution on {1, . . . , n}. Then

(200) H_n(p) ≤ min{E_p(f) : f is a prefix encoding of {1, . . . , n}} ≤ H_n(p) + 1.

The proof of this theorem is the subject of an exercise in the next chapter.

3.3. The log-sum-exp function. The function

(201) f : x 7→ log(exp(x1) + · · ·+ exp(xn))

is convex on Rn.We have

(202) max{x1, . . . , xn} ≤ f(x) ≤ max{x1, . . . , xn}+ log(n)

for all x ∈ Rn. For this reason, the log-sum-exp function is sometimes used to approximatemax in situations where a smooth function is more convenient.
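The bounds (202) are easy to confirm numerically on a few random points, using the numerically stable implementation from scipy (a minimal sketch, assuming numpy and scipy are available):

    import numpy as np
    from scipy.special import logsumexp

    rng = np.random.default_rng(0)
    for _ in range(3):
        x = rng.normal(size=5)
        f = logsumexp(x)                               # log(sum of exp(x_i))
        print(max(x) <= f <= max(x) + np.log(len(x)))  # always True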

3.4. The geometric mean. The function

(203) f : x ↦ ( ∏_{i=1}^n x_i )^{1/n}

is concave on (0, ∞)^n.

3.5. The Lorentz barrier. Recall that the Lorentz cone was defined as

(204) L_k := {x ∈ R^k : √(x_1² + · · · + x_{k−1}²) ≤ x_k}.

The function

(205) f : x ↦ −log(−x_1² − · · · − x_{k−1}² + x_k²)

is convex on the interior of L_k.


3.6. The log-determinant. The cone of PSD matrices is

(206) S_k := {X : X is a symmetric k × k PSD matrix}.

The function

(207) f : X 7→ − log(det(X))

is convex on the interior of Sk.

Exercises

(1) The epigraph of a function f : R^n → R is defined as epi(f) := {(x, z) : x ∈ R^n, z ∈ R, z ≥ f(x)}. Show that f is a convex function if and only if epi(f) is a convex set.
(2) Let C be a convex subset of R^n, let f : C → R, and let α ∈ R. Show that if f is convex on C, then {x ∈ C : f(x) ≤ α} is a convex set.
(3) Let f : R^n → R. Prove that f is an affine function (i.e. f : x ↦ cx + d for some c ∈ R^n and d ∈ R) if and only if f is both convex and concave.
(4) Show that if f is strictly convex, then f has at most one minimizer.
(5) Verify that the operations of subsection 1.2 (multiplication, translation, linear transformation, addition and taking the maximum) indeed preserve convexity.
(6) Let A be a symmetric n × n matrix, let b ∈ R^n be a row vector, let c ∈ R, and let f : x ↦ x^tAx + bx + c. Verify that ∇f(x) = 2x^tA + b.
(7) Prove Theorem 62. Hint: reduce to the case n = 1.
(8) Let g, h : R → R be twice differentiable and let f = h ∘ g (so f(x) := h(g(x))). Show:
    (a) f is convex if g is convex and h is convex and nondecreasing.
    (b) f is convex if g is concave and h is convex and nonincreasing.
    Hint: Verify that f″(x) = (g′(x))² h″(g(x)) + g″(x) h′(g(x)).
(9) Let a, b, c ∈ R². Show that f : x ↦ ‖x − a‖ + ‖x − b‖ + ‖x − c‖ has a unique minimizer.
(10) Fagnano's problem is: Given three lines ℓ_1, ℓ_2 and ℓ_3 in R³, find three points x_1 ∈ ℓ_1, x_2 ∈ ℓ_2, x_3 ∈ ℓ_3 minimizing the sum of distances ‖x_1 − x_2‖ + ‖x_2 − x_3‖ + ‖x_3 − x_1‖. Formulate this problem as a convex optimization problem. Prove that there is a unique optimal solution x_1, x_2, x_3.

(a) A =

[3 44 −1

].

(b) A =

[3 44 5

].

(c) A =

[3 44 7

].

(d) A =

3 4 −24 7 −2−2 −2 1

.

(e) A =

3 4 −24 7 −2−2 −2 2

.

(12) Is f : R2 → R convex (on C)?(a) f(x) = 3x2

1 + 8x1x2 − x22.


    (b) f(x) = (x_1 − x_2) e^{x_1 + x_2}.
    (c) f(x) = (3x_1 + x_2)²/(x_1 − x_2), on C = {x ∈ R² : x_1 > x_2}.
    (d) f(x) = −log(x_1 − x_2), on C = {x ∈ R² : x_1 > x_2}.
(13) For which α ∈ R, if any, is f convex?
    (a) f(x) = sin(x) + αx.
    (b) f(x) = sin(x) + αx².
    (c) f(x) = (5 − α)x_1² + 10x_1x_2 + x_2² + 4αx_1.

(14) Let A be an m × n matrix, and let P_b := {x ∈ R^n : Ax ≤ b} for all b ∈ R^m. Show that f : b ↦ Volume(P_b) is concave on {b ∈ R^m : P_b ≠ ∅}.
(15) Verify that the examples of convex functions of subsection 3.1 are indeed convex. Hint: compute the Hessian (which is just the second derivative in all cases but one), and show that in each case it is positive semidefinite.
(16) Show that the entropy function is concave.
(17) Show that the log-sum-exp function is convex.
(18) Show that the geometric mean is concave. Prove the arithmetic-geometric inequality: (x_1 + · · · + x_n)/n ≥ (x_1 · · · x_n)^{1/n} for all nonnegative x ∈ R^n.
(19) Show that the Lorentz barrier is convex.
(20) Show that the log-determinant is convex.
(21) Consider again Fermat's Problem. Give a construction by ruler and compasses for the point x minimizing the sum of its distances to given a, b, c.
(22) Let A be an n × n symmetric matrix, let b ∈ R^n be a row vector and let c ∈ R. Let f(x) = x^tAx + bx + c. Show that there is an x ∈ R^n such that f(x) = 0 if and only if the matrix b^t b − 4Ac has a nonnegative eigenvalue.
(23) Let A be a symmetric n × n matrix. Show that A is positive definite if and only if det(A_I) > 0 for I = {1}, {1, 2}, {1, 2, 3}, . . . , {1, . . . , n}.
(24) Let g : R^n → R be a twice differentiable concave function, let C := {x ∈ R^n : g(x) > 0}, and let f : x ↦ 1/g(x). Show that f is convex on C.
(25) Show that if D ⊆ R^n is an open set and f : D → R is a convex function, then f is continuous.
(26) We say that a function f : R^n → R is coercive if f(x) → ∞ whenever ‖x‖ → ∞. Prove that if f : R^n → R is a continuous coercive function, then min{f(x) : x ∈ R^n} exists. (Warning: we do not assume that f is convex in this exercise.)
(27) Let f, g : R^n → R, where f is convex and g is concave. Prove that f(x) ≥ g(x) for all x ∈ R^n if and only if there exists a row vector c ∈ R^n and a d ∈ R such that f(x) ≥ cx + d ≥ g(x) for all x ∈ R^n.


CHAPTER 8

Convex optimization

1. Convex optimization with linear constraints

1.1. General convex optimization problems. Consider the optimization problem

(208) min{f(x) : x ∈ S}

where S ⊆ Rn and f : Rn → R. A feasible solution x ∈ S of (208) is locally optimal if asufficiently small neighborhood of x contains no feasible solutions of better objective value, i.e.

(209) ∃ε > 0 ∀y ∈ S ∩Bn(x, ε) : f(y) ≥ f(x).

A locally optimal solution need not be an optimal solution. For example, the optimizationproblem min{x+ 2 sinx : x ≥ 0, x ∈ R} has many local optima but only one optimal solution.

We say that the problem (208) is convex if f is a convex function and S is a convex set (amaximization problem max{f(x) : x ∈ S} is convex if the equivalent problem min{−f(x) : x ∈S} is convex; i.e. if f is concave and S is convex). In convex optimization problems, a locallyoptimal solution is necessarily optimal.

Lemma 69. Let f : Rn → R be a convex function, let S ⊆ Rn be a convex set, and letx ∈ S. Then x is a locally optimal solution if and only if x is an optimal solution of (208).

Proof. Suppose x ∈ S is not an optimal solution of (208). Let y ∈ S be such thatf(y) < f(x). As S is convex, the line segment [x, y] is contained in S, and since f is convex,we have f(x+ λ(y − x)) ≤ f(x) + λ(f(y)− f(x)) for all λ ∈ [0, 1]. Hence if λ ∈ (0, 1), we havex + λ(y − x) ∈ S and f(x + λ(y − x)) < f(x). So we can find feasible solutions with betterobjective value arbitrarily close to x, thus x is not locally optimal. �


Figure 1. A convex optimization problem


This property of convex optimization problems is very important. Most known methodsfor solving optimization problems are essentially methods to find a locally optimal solution. Incase of convex optimization, such methods find an optimal solution.

1.2. Optimality conditions for convex optimization with linear equality con-straints. We first consider convex optimization with linear equality constraints:

(210) min{f(x) : Ax = b, x ∈ Rn},

where f : Rn → R is a convex function, A is an m× n matrix and b ∈ Rm.

Lemma 70. Let x ∈ Rn be such that Ax = b. Then the following are equivalent.

(1) x is an optimal solution of (210); and(2) ∇f(x) ∈ rowspace(A).

Proof. x is an optimal solution of (210) if and only if there is no d ∈ Rn such that∇f(x)d 6= 0 and Ad = 0. For if there does exist such a d, then x + λd is feasible for anyλ ∈ R, and f(x + λd) < f(x) for a sufficiently small λ by Lemma 58. Then x is not optimal.Conversely, if x is not optimal, then there exists a y such that Ay = b and f(y) < f(x). Takingd := y − x, we have ∇f(x)d < 0 and Ad = 0.

By Fredholm’s Alternative (Theorem 3, applied to At,∇f(x)t), the nonexistence of a d suchthat ∇f(x)d 6= 0 and Ad = 0 is equivalent to the existence of v such that vA = ∇f(x). �

Solving (210) is equivalent to finding the (unconstrained) minimum of g : y ↦ f(u + Wy), where u, W are such that {x ∈ R^n : Ax = b} = {u + Wy : y ∈ R^k}. Alternatively, one may determine an optimal solution x of (210) by solving the system of equations ∇f(x) = vA in the variables x, v. In the case of a quadratic function f, this amounts to solving a system of linear equations.
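For instance, for a strictly convex quadratic f(x) = x^tQx with constraint Ax = b, the conditions ∇f(x) = vA and Ax = b form one linear system in (x, v). A minimal numerical sketch, assuming numpy is available (Q, A, b below are sample data, not taken from the text):

    import numpy as np

    Q = np.array([[2.0, 0.5],
                  [0.5, 1.0]])        # symmetric positive definite
    A = np.array([[1.0, 1.0]])        # one equality constraint
    b = np.array([1.0])

    n, m = Q.shape[0], A.shape[0]
    # block system:  [ 2Q  -A^t ] [x]   [0]
    #                [  A    0  ] [v] = [b]
    K = np.block([[2 * Q, -A.T],
                  [A, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([np.zeros(n), b]))
    x, v = sol[:n], sol[n:]
    print("x =", x)                                   # constrained minimizer
    print("grad f(x) - vA =", 2 * Q @ x - v @ A)      # ~0, so grad f(x) lies in rowspace(A)

For this data the solution is x = (1/4, 3/4), which indeed minimizes x^tQx over the line x_1 + x_2 = 1.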

1.3. Optimality conditions for convex optimization with linear inequality con-straints. We proceed to the more general problem

(211) min{f(x) : Ax ≥ b, x ∈ Rn}

where f : Rn → R is a convex function, A is an m× n matrix and b ∈ Rm. Let ai denote thei-th row of A. We have the following necessary and sufficient condition for optimality.

Lemma 71. Let x ∈ Rn be such that Ax ≥ b. Then the following are equivalent:

(1) x is an optimal solution of (211); and(2) ∇f(x) ∈ cone {ai : aix = bi}.

Proof. Let J := {i ∈ {1, . . . ,m} : aix = bi}. Then x is an optimal solution of (211) if andonly if there is no d ∈ Rn such that ∇f(x)d < 0 and aid ≥ 0 for all i ∈ J . For if there doesexist such a d, then x+λd is feasible for small enough λ, and f(x+λd) < f(x) for a sufficientlysmall λ > 0 by Lemma 58. Then x is not optimal. Conversely, if x is not optimal, then thereexists a y such that Ay ≥ b and f(y) < f(x). Then with d := y − x, we have ∇f(x)d < 0 andaid ≥ 0 for all i such that aix = bi, i.e. for all i ∈ J .

By Farkas’ Lemma (Theorem 8, applied to ati,∇f(x)t), the nonexistence of a d such that∇f(x)d < 0 and aid ≥ 0 for all i ∈ J is equivalent to ∇f(x) ∈ cone {ai : i ∈ J}. �


1.4. The active set method. We describe an iterative method to solve a convex opti-mization problem with linear constraints of the form (211), where f is differentiable. We willcompute the minimum of f subject to linear equality constraints as a subroutine of our method;it is assumed that f is such that we can do that. We will further assume that f is strictlyconvex and that min{f(x) : x ∈ Rn} is attained. The latter requirements are not essential;both the method and the proof of its effectiveness can be extended to general convex f .

The method aims at finding a set J ⊆ {1, . . . ,m} such that for some x ∈ Rn we have both

(1) Ax ≥ b and aix = bi for all i ∈ J , and

(2) ∇f(x) ∈ cone {ai : i ∈ J}.Then such an x is optimal by Lemma 71.

At the start of each iteration, we have a set J ⊆ {1, . . . ,m} such that {ai : i ∈ J} is linearlyindependent, and a vector x such that (1) holds for (J, x). In the main loop, a new pair (J∗, x∗)is determined or an optimal solution y is found.

The first step of each iteration is to find an optimal solution y to the problem

(212) min{f(x) : a_j x = b_j for all j ∈ J, x ∈ R^n}.

As f is strictly convex and min{f(x) : x ∈ R^n} is attained, (212) has a unique optimal solution. By Lemma 70, there exist v_i ∈ R such that ∇f(y) = ∑_{i∈J} v_i a_i. As we assume that {a_i : i ∈ J} is linearly independent, these v_i are unique. There are three cases.

(1) If Ay ≥ b and v_i ≥ 0 for all i ∈ J, then J is what we set out to find and y is an optimal solution;
(2) If Ay ≥ b and v_i < 0 for some i ∈ J, then we set i_0 := min{i ∈ J : v_i < 0} and put x* := y, J* := J \ {i_0}.
(3) If Ay ≱ b, then we set λ* := max{λ ∈ [0, 1] : A(x + λ(y − x)) ≥ b}, put x* := x + λ*(y − x) and J* := J ∪ {i*}, where i* := min{i : a_i x* = b_i, a_i(y − x) < 0}.

If we are not done, we repeat the main loop with (J*, x*).

We will prove that starting with J (0) = ∅ and any x(0) such that Ax ≥ b, the algorithmfinishes after finitely many iterations. Let J (k), v(k), x(k), y(k) hold the variables as they are inthe k-th iteration. We show:

Lemma 72. There cannot be distinct s, t such that J (s) = J (t).

Proof. Suppose that there are; we may assume s < t. Choose j_0, p, q as follows:

(1) let j_0 := max{ j ∈ ⋃_{k=s}^{t} J^(k) : there is a k with s ≤ k < t and j ∉ J^(k) },
(2) let p be such that s ≤ p < t, j_0 ∈ J^(p) and j_0 ∉ J^(p+1), and
(3) let q be such that s ≤ q < t, j_0 ∉ J^(q) and j_0 ∈ J^(q+1).

Put d^(k) := y^(k) − x^(k) for all k. Then j ∈ J^(k) for all k with s ≤ k < t and all j > j_0, so that a_j d^(k) = 0 for such j and k. In iteration p we must be in case (2), so ∇f(x^(p)) = ∑_{j∈J^(p)} v^(p)_j a_j, with v^(p)_j ≥ 0 for all j ∈ J^(p) with j < j_0, and v^(p)_{j_0} < 0. In iteration q we must be in case (3), and thus we have a_j d^(q) ≥ 0 for all j ∈ J^(q) with j < j_0, and a_{j_0} d^(q) < 0. It is not difficult to see that x^(p) = x^(q). Moreover, 0 ≥ ∇f(x^(k)) d^(k) for all k, as f(y^(k)) ≤ f(x^(k)). So

(213)  0 ≥ ∇f(x^(q)) d^(q) = ∇f(x^(p)) d^(q)
         = ( ∑_{j∈J^(p), j<j_0} v^(p)_j a_j d^(q) ) + v^(p)_{j_0} a_{j_0} d^(q) + ( ∑_{j∈J^(p), j>j_0} v^(p)_j a_j d^(q) ) > 0,

a contradiction. □


2. Differentiable convex optimization

2.1. The Lagrange dual. In this section, we will consider constrained optimization withinequality constraints, that is problems of the form

(214) min{f(x) : g1(x) ≤ 0, . . . , gm(x) ≤ 0, x ∈ Rn},for some f, g1, . . . , gm : Rn → R. The feasible set is S := {x ∈ Rn : g1(x) ≤ 0, . . . , gm(x) ≤ 0}.If all functions f, gi are convex, then (214) is a convex optimization problem. We will constructa lower bound on (214) given arbitrary functions f, gi. Next we show that this lower bound istight if the functions f, gi are both differentiable and convex.

The Lagrangian of (214) is the function φ : Rn × Rm → R defined as φ(x, u) := f(x) +∑mi=1 uigi(x). If x ∈ S and u ≥ 0, then uigi(x) ≤ 0 for all i, and hence we have

(215) Θ(u) := inf{φ(y, u) : y ∈ Rn} ≤ inf{φ(y, u) : y ∈ S} ≤ φ(x, u) ≤ f(x).

Hence Θ(u) ≤ min{f(y) : y ∈ S} for all u ≥ 0. Choosing u ≥ 0 such that Θ(u) is as large aspossible is a constrained optimization problem:

(216) max{Θ(u) : u ≥ 0, u ∈ Rm}.The optimization problem (216) is the Lagrange dual of (214). We have proved:

Lemma 73 (‘Weak Lagrange duality’). Let f, gi : Rn → R. Then

(217) inf{f(x) : g_1(x) ≤ 0, . . . , g_m(x) ≤ 0, x ∈ R^n} ≥ sup{Θ(u) : u ≥ 0, u ∈ R^m},

where Θ(u) := inf{f(y) + ∑_{i=1}^m u_i g_i(y) : y ∈ R^n}.

2.2. A sufficient optimality condition. Assume now that the functions f, g_i are differentiable. Then we say that x is a Karush-Kuhn-Tucker point (KKT point) of the problem (214) if x is feasible and −∇f(x) ∈ cone {∇g_i(x) : g_i(x) = 0}. In other words, x is a KKT point if x ∈ S and there is a vector u ∈ R^m such that

(1) u ≥ 0,
(2) ∇f(x) + ∑_i u_i ∇g_i(x) = 0, and
(3) u_i > 0 ⇒ g_i(x) = 0 for i = 1, . . . , m.

Then the vector u is called the Lagrange multiplier of the KKT point x. The following Lemmageneralizes one direction of Lemmas 21 and 71.

Lemma 74 (Karush, 1939). Let f, gi : Rn → R be differentiable convex functions. If x is aKKT point of (214) with multiplier u, then f(x) = Θ(u). In particular, x is then an optimalsolution of (214).

Proof. Let x be a KKT point with multiplier u. Let L : R^n → R be defined by L : x ↦ f(x) + ∑_i u_i g_i(x). We will show that

(218) Θ(u) = min{L(y) : y ∈ R^n} = L(x) = f(x).

As either u_i = 0 or g_i(x) = 0 for each i, we have L(x) = f(x). To see that min{L(y) : y ∈ R^n} = L(x), note that L is convex, as it is a nonnegative combination of convex functions. Moreover, ∇L(x) = ∇f(x) + ∑_i u_i ∇g_i(x) = 0, so by Lemma 60 the minimum of L is attained at x. That Θ(u) = min{L(y) : y ∈ R^n} follows directly from the definitions of L and Θ. □

So to prove that (214) equals its Lagrange dual (216) in case f, gi are convex and differ-entiable, it suffices to show the existence of a KKT point x with multiplier u. Then by weakLagrange duality and the above Lemma, it follows that x is an optimal solution of (214) andu is an optimal solution of (216).


2.3. A necessary optimality condition. Let us say that x is a Fritz-John point (FJ point) of (214) if x ∈ S and there exist λ_0, . . . , λ_m ∈ R such that

(1) λ_i ≥ 0 for i = 0, . . . , m, not all λ_i equal to 0,
(2) λ_0 ∇f(x) + ∑_i λ_i ∇g_i(x) = 0, and
(3) λ_i > 0 ⇒ g_i(x) = 0 for i = 1, . . . , m.

Then λ0, . . . , λm are the multipliers of x. Note the difference between FJ points and KKTpoints: any KKT point is an FJ point, but an FJ point is not necessarily KKT when λ0 = 0.

Lemma 75. Suppose that f, gi : Rn → R are differentiable functions, and let x ∈ Rn. If xis a local optimum of (214), then x is a Fritz-John point.

Proof. Let x ∈ S be a local minimum of (214). Put J := {i ∈ {1, . . . , m} : g_i(x) = 0}. If there are a d ∈ R^n and ε_i > 0 such that

(219) f(x + δd) < f(x) for all δ ∈ (0, ε_0), and g_i(x + δd) < g_i(x) for all δ ∈ (0, ε_i), for all i ∈ J,

then x + δd is a better feasible solution for all δ ∈ (0, ε), where ε := min{ε_i : i ∈ J ∪ {0}}. Since x is assumed to be locally optimal, such d ∈ R^n and ε_i do not exist. By applying Lemma 58 it follows that there is no d ∈ R^n such that

(220) ∇f(x)d < 0 and ∇g_i(x)d < 0 for all i ∈ J.

By Gordan's Theorem, there exist nonnegative λ_0 and λ_i for i ∈ J, not all 0, such that

(221) λ_0 ∇f(x) + ∑_{i∈J} λ_i ∇g_i(x) = 0.

Choosing λ_i = 0 for i ∉ J, we obtain λ_i as in the definition of an FJ point. □

We have shown that locally optimal solutions of (214) are necessarily FJ points, withoutassuming even that f and gi are convex functions. Usually the collection of FJ points islimited. Provided that the optimum of (214) is attained, the optimum has to be an FJ point.An elaborate, but effective method to find the optimal solution of (214) in that case is to findall FJ points by solving the equations and inequalities that define FJ points.

2.4. Strong Lagrange duality. The proof of the following Lemma is an exercise.

Lemma 76. Let g_1, . . . , g_m : R^n → R be differentiable convex functions. Suppose that there exists an x ∈ R^n such that g_i(x) ≤ 0 for all i, and there are λ_i ∈ R, not all zero, such that ∑_{i=1}^m λ_i ∇g_i(x) = 0 and λ_i ≥ 0 for all i. Then there is no y ∈ R^n such that g_i(y) < 0 for all i.

We obtain:

Theorem 77 (Kuhn and Tucker, 1951). Let f, g_1, . . . , g_m : R^n → R be differentiable convex functions. Suppose that there exists a y ∈ R^n such that g_i(y) < 0 for i = 1, . . . , m. Then

(222) min{f(x) : g_1(x) ≤ 0, . . . , g_m(x) ≤ 0, x ∈ R^n} = max{Θ(u) : u ≥ 0, u ∈ R^m},

provided that the minimum is attained, where Θ(u) := inf{f(y) + ∑_{i=1}^m u_i g_i(y) : y ∈ R^n}.

Proof. Suppose x is an optimal solution of the minimization problem. By Lemma 75, x is an FJ point, with multipliers λ_0, . . . , λ_m say. If λ_0 ≠ 0, then x is a KKT point with multiplier u defined by u_i := λ_i/λ_0 for i = 1, . . . , m. In this case, we are done by Lemma 74. In the other case, when λ_0 = 0, Lemma 76 implies that there is no feasible y such that g_i(y) < 0 for all i, contradicting our assumption. □


2.5. Problems with both inequality and equality constraints. The above duality theorem and the definitions of KKT point and FJ point can be generalized to cover problems with both inequality and equality constraints, i.e. problems of the form

(223) min{f(x) : g_1(x) ≤ 0, . . . , g_m(x) ≤ 0, h_1(x) = 0, . . . , h_l(x) = 0, x ∈ R^n}.

This problem is convex if f and the g_i are convex functions and the h_j are affine functions. In that case, we can reduce (223) easily to a problem with only inequality constraints: since W := {x ∈ R^n : h_1(x) = · · · = h_l(x) = 0} is an affine space, there exist an n × k matrix U and a vector w ∈ R^n such that

(224) W = {Ux′ + w : x′ ∈ Rk}.

Let f̄, ḡ_i : R^k → R be defined by f̄(x′) = f(Ux′ + w) and ḡ_i(x′) = g_i(Ux′ + w). Then

(225) min{f(x) : g_i(x) ≤ 0, h_j(x) = 0, x ∈ R^n} = min{f̄(x′) : ḡ_i(x′) ≤ 0, x′ ∈ R^k}.

This gives a dual for (223) indirectly, but it is convenient to have a duality theorem that applies to (223) directly.

Theorem 78. Let f, g_1, . . . , g_m : R^n → R be differentiable convex functions, and let h_1, . . . , h_l : R^n → R be affine functions. Suppose that there exists a y ∈ R^n such that g_i(y) < 0 for i = 1, . . . , m. Let Θ(u, v) := inf{f(y) + ∑_{i=1}^m u_i g_i(y) + ∑_{j=1}^l v_j h_j(y) : y ∈ R^n}. Then

(226) min{f(x) : g_i(x) ≤ 0, h_j(x) = 0} = max{Θ(u, v) : u ≥ 0, u ∈ R^m, v ∈ R^l},

provided that the minimum is attained.

The proof is an exercise. It is possible to use the reduction we mentioned above, but one can also modify the definitions of the Lagrangian, a KKT point and an FJ point to include equality constraints, and extend the Lemmas concerning KKT and FJ points accordingly. The Lagrangian of (223) is the function φ : R^n × R^m × R^l → R defined by φ(x, u, v) := f(x) + ∑_{i=1}^m u_i g_i(x) + ∑_{j=1}^l v_j h_j(x). A point x is a KKT point with multipliers u ∈ R^m, v ∈ R^l if

(1) x ∈ S,
(2) u ≥ 0,
(3) ∇f(x) + ∑_i u_i ∇g_i(x) + ∑_j v_j ∇h_j(x) = 0, and
(4) u_i > 0 ⇒ g_i(x) = 0 for i = 1, . . . , m,

and x is an FJ point if there exist λ0, . . . , λm, κ1, . . . , κl ∈ R such that

(1) x ∈ S,
(2) λ_i ≥ 0 for i = 0, . . . , m, and not all λ_i and κ_j are 0,
(3) λ_0 ∇f(x) + ∑_i λ_i ∇g_i(x) + ∑_j κ_j ∇h_j(x) = 0, and
(4) λ_i > 0 ⇒ g_i(x) = 0 for i = 1, . . . , m.

Given arbitrary differentiable functions f, gi, hj , any local optimum of (223) is an FJ point. Ifthe problem is convex, any KKT point is optimal.

3. Conic convex optimization

3.1. Definition of conic optimization. A conic optimization problem is a problem ofthe form

(227) min{cx : Ax = b, x ∈ K},

where K is some closed cone, A is an m × n matrix, b ∈ R^m and c ∈ R^n. The feasible set of (227) is convex: it is the intersection of an affine space and a cone. Thus conic optimization


is convex optimization. Without proof, we mention the following duality theorem for conicoptimization.

Theorem 79. Let K ⊆ Rn be a full-dimensional closed cone, let A be an m×n matrix, letb ∈ Rm and c ∈ Rn. Then

(228) min{cx : Ax = b, x ∈ K} = max{yb : c− yA ∈ K∗, y ∈ Rm},provided that the minimization problem is both strictly feasible and bounded.

For certain special K, it is possible to approximate an optimal solution of the conic opti-mization problem (227) by a so-called interior point algorithm. It is beyond the scope of thissyllabus to present this algorithm, but we will mention three cases of conic optimization forwhich the interior point method has been proven to be an efficient method.

(1) Linear optimization. This is the case where K = {x ∈ R^n : x ≥ 0};
(2) Second-order cone optimization. This is the case where K is a Cartesian product of second-order cones: K = ∏_i L^{k_i}; and
(3) Semidefinite optimization. This is the case where K = S_k, the cone of semidefinite matrices.

The interior point method for linear optimization problems is competitive with the simplexmethod. Most software packages for linear optimization contain implementations of both thesimplex method and the interior point method.

Application: minimizing a univariate polynomial. We shall show how to find theglobal minimum of a univariate polynomial by means of semidefinite optimization.

We say that a polynomial p ∈ R[x1, . . . , xn] is positive semidefinite if p(x) ≥ 0 for all x ∈ Rn.Clearly, any polynomial that is the sum of squares of polynomials is positive semidefinite. Forunivariate polynomials, the converse also holds.

Theorem 80. Let p ∈ R[x]. Then p is positive semidefinite if and only if there are poly-nomials r, q ∈ R[x] such that p = r2 + q2.

Proof. It is clear that if p is a sum of squares, then p is positive semidefinite. We provethe converse implication.

So let us assume that p ∈ R[x] is a positive semidefinite polynomial. By the fundamental theorem of algebra, there are distinct complex numbers a_1, . . . , a_k ∈ C, positive integers m_1, . . . , m_k, and c ∈ R such that p(x) = c(x − a_1)^{m_1} · · · (x − a_k)^{m_k}. This representation is unique up to permutation of the factors (x − a_i)^{m_i}. Since p is a real polynomial, we have p = p̄ and hence

(229) c(x − ā_1)^{m_1} · · · (x − ā_k)^{m_k} = c(x − a_1)^{m_1} · · · (x − a_k)^{m_k}.

It follows from the uniqueness of the representation that if a_i ∉ R, then there is a j such that a_j = ā_i and m_i = m_j. Moreover, if a_i ∈ R then m_i is even, for otherwise p(a_i + ε) and p(a_i − ε) have different signs for sufficiently small ε, contradicting that p is positive semidefinite. The degree of p must be even, otherwise p(x) → −∞ for x → ∞ or x → −∞. Therefore, we have c > 0, since otherwise p(x) → −∞ for x → ∞.

Hence, we may assume that there is a k′ such that a_i ∈ R for i = 1, . . . , k′ and ā_i = a_{k−i+k′+1} for i = k′ + 1, . . . , k. Define the complex polynomial s ∈ C[x] by

(230) s := √c (x − a_1)^{m_1/2} · · · (x − a_{k′})^{m_{k′}/2} (x − a_{k′+1})^{m_{k′+1}} · · · (x − a_{(k′+k)/2})^{m_{(k′+k)/2}}.

Let r, q ∈ R[x] be such that s = r + ιq (so r and q are the real and imaginary parts of s). Then

(231) p = s s̄ = (r + ιq)(r − ιq) = r² + q²,

as required. □


Theorem 81. Let d ∈ N, and let p = c_0 + c_1 x + · · · + c_{2d} x^{2d}, where c_0, . . . , c_{2d} ∈ R. Then p is the sum of squares of polynomials if and only if there exists a positive semidefinite (d+1) × (d+1) matrix A = (a_ij) such that c_k = ∑_{i+j=k+2} a_ij for k = 0, . . . , 2d.

Proof. Suppose p is a sum of squares of polynomials, say p = ∑_{j=1}^l h_j², where

(232) h_j = z_{1j} + z_{2j} x + · · · + z_{d+1,j} x^d for each j.

Let Z = (z_ij). Then Z is an l × (d+1) matrix and A := Z^tZ is a (d+1) × (d+1) positive semidefinite matrix such that c_k = ∑_{i+j=k+2} a_ij for k = 0, . . . , 2d.

Conversely, let A be a (d + 1) × (d + 1) positive semidefinite matrix such that ck =∑i+j=k+2 aij for k = 0, . . . 2d. By Theorem 64 there is a matrix Z = (zij) such that A = ZtZ.

Define hj := z1j + z2jx+ · · ·+ zd+1,jxd for all j. Then p =

∑lj=1 h

2j . �
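
The converse direction of this proof is easy to carry out numerically. The following sketch (assuming numpy; the matrix A is our own illustrative choice, satisfying the constraints for p = x^4 + 1, so d = 2) factors A = Z^t Z via an eigendecomposition and reassembles p as a sum of squares.

import numpy as np

# A positive semidefinite matrix with sum_{i+j=k+2} a_ij = c_k for p = x^4 + 1.
A = np.array([[ 1.0, 0.0, -1.0],
              [ 0.0, 2.0,  0.0],
              [-1.0, 0.0,  1.0]])
w, V = np.linalg.eigh(A)                    # A = V diag(w) V^t with w >= 0
Z = (np.sqrt(np.maximum(w, 0.0)) * V).T     # then A = Z^t Z; row j of Z holds (z_{1j}, ..., z_{d+1,j})
p = np.zeros(5)
for row in Z:
    h = row[::-1]                           # numpy's polynomial routines expect the highest degree first
    p = np.polyadd(p, np.polymul(h, h))     # add h_j^2
print(p)                                    # approximately [1, 0, 0, 0, 1], i.e. x^4 + 1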

Now let p(x) = c_0 + c_1 x + · · · + c_{2d} x^{2d}. By the above theorems, we have

(233)   min{p(x) : x ∈ R} = max{α : p − α is positive semidefinite}
                          = max{α : p − α is a sum of squares of polynomials}
                          = max{c_0 − a_{11} : ∑_{i+j=k+2} a_{ij} = c_k for k = 1, . . . , 2d and (a_{ij}) ∈ S^{d+1}}.

Thus the problem of minimizing a univariate polynomial can be expressed as a semidefinite optimization problem.
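
Formulation (233) is easy to state in a modelling language. The sketch below (assuming the third-party cvxpy package and an SDP-capable solver; the function name and the coefficient ordering are our own choices) maximizes c_0 − a_{11} over positive semidefinite matrices satisfying the anti-diagonal constraints.

import cvxpy as cp

def minimize_univariate(c):
    """c = [c_0, c_1, ..., c_{2d}]: coefficients of p in order of increasing degree."""
    d = (len(c) - 1) // 2
    A = cp.Variable((d + 1, d + 1), PSD=True)          # the matrix (a_ij) of Theorem 81
    # anti-diagonal constraints sum_{i+j=k+2} a_ij = c_k for k = 1, ..., 2d
    # (in 0-based indexing: the sum of A[i, k-i] over admissible i equals c[k])
    constraints = [sum(A[i, k - i] for i in range(max(0, k - d), min(k, d) + 1)) == c[k]
                   for k in range(1, 2 * d + 1)]
    problem = cp.Problem(cp.Maximize(c[0] - A[0, 0]), constraints)   # alpha = c_0 - a_11
    problem.solve()
    return problem.value

For the polynomial of Example 3.2 below, minimize_univariate([3, 14, 6, -10, 1]) should return a value close to −634.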

3.2. Example. Consider the polynomial p = x^4 − 10x^3 + 6x^2 + 14x + 3. Then by the above analysis, min{p(x) : x ∈ R} equals

(234)   max{3 − a_{11} :  a_{12} + a_{21} = 14,  a_{13} + a_{22} + a_{31} = 6,  a_{23} + a_{32} = −10,  a_{33} = 1,
             and the matrix (a_{ij}), 1 ≤ i, j ≤ 3, is a symmetric PSD matrix}.

An optimal solution is

(235)   (a_{ij}) =
        [ 637    7   −14 ]
        [   7   34    −5 ]
        [ −14   −5     1 ],

with objective value 3 − a_{11} = −634.
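
The claimed solution is easily checked numerically; the following sketch (assuming numpy) verifies that the matrix is positive semidefinite and that p indeed attains the value −634 (at x = 7), matching the objective value.

import numpy as np

A = np.array([[637.0,   7.0, -14.0],
              [  7.0,  34.0,  -5.0],
              [-14.0,  -5.0,   1.0]])
print(np.linalg.eigvalsh(A))          # eigenvalues are nonnegative up to rounding (the smallest is 0)
p = [1.0, -10.0, 6.0, 14.0, 3.0]      # x^4 - 10x^3 + 6x^2 + 14x + 3, highest degree first
print(np.polyval(p, 7.0))             # -634.0, matching the objective value 3 - a_11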

Final remark. Not every positive semidefinite polynomial can be expressed as a sum of squares of polynomials. We mention a theorem of Hilbert that completely describes the situation. Note that a polynomial of odd degree can never be positive semidefinite.

Theorem 82 (Hilbert, 1888). Among the polynomials of degree 2m in n variables there are always positive semidefinite polynomials that are not the sum of squares of polynomials of n variables, except in the following cases:

(1) n = 1, m arbitrary;
(2) n arbitrary, m = 1; and
(3) n = 2, m = 2.

The case ‘n = 1, m arbitrary’ is Theorem 80 above. It is an exercise to derive the case ‘n arbitrary, m = 1’ from Theorem 64. The case ‘n = 2, m = 2’ is a theorem of Hilbert: precisely, he showed that any biquadratic bivariate polynomial is the sum of three squares of quadratic polynomials.


A rational function is an expression of the form p/q where p, q ∈ R[x_1, . . . , x_n]. Artin showed in 1927 that any positive semidefinite rational function can be written as the sum of squares of rational functions, thereby solving Hilbert’s 17th problem.

Exercises

(1) Consider the optimization problem
        min{x^t Q x : Ax = b, x ∈ R^n},
    where Q is an n × n positive semidefinite matrix, A is an m × n matrix and b ∈ R^m.
    (a) Give a necessary and sufficient condition for a point x ∈ R^n to be an optimal solution of this optimization problem.
    (b) Find an optimal solution to this problem for the following values of Q, A, b:

            Q = [  3   2  −1 ]
                [  2   5   1 ]
                [ −1   1   2 ],      A = [ 8  9  −1 ],      b = [ 25 ].

(2) Let A be an m × n matrix, b ∈ R^m and f : R^n → R a differentiable concave function. Give a necessary and sufficient optimality condition for the convex optimization problem
        max{f(x) : Ax = b, x ≥ 0, x ∈ R^n}.
(3) Let a_1, . . . , a_m, c ∈ R^n be row vectors and let b_1, . . . , b_m ∈ R. Consider the linear optimization problem
        min{cx : a_1 x ≤ b_1, . . . , a_m x ≤ b_m, x ∈ R^n}.
    (a) Write down the Lagrange dual of this optimization problem.
    (b) Observe that if u ∈ R^n and w ∈ R, then inf{ux + w : x ∈ R^n} equals w if u = 0 and −∞ if u ≠ 0. Use this to formulate the Lagrange dual as a linear optimization problem.
(4) Let Q be an n × n PSD matrix, p ∈ R^n, A an m × n matrix, and b ∈ R^m. Determine the Lagrange dual of
    (a) min{x^t Q x : Ax = b, x ∈ R^n};
    (b) min{x^t Q x : Ax ≥ b, x ∈ R^n};
    (c) min{x^t Q x + px : Ax ≥ b, x ∈ R^n}.
    In each case, make sure to simplify the dual to a quadratic optimization problem.
(5) Let Q be an n × n symmetric matrix. Consider the optimization problem
        min{x^t Q x : x^t x = 1, x ∈ R^n}.
    (a) Show that the minimum of this problem is attained. Hint: Theorem 1.
    (b) Show that x is a Fritz-John point of this problem if and only if x is an eigenvector of Q of length 1.
    (c) Show that the minimum equals the smallest eigenvalue of Q.

(6) Let A be an m × n matrix, let b ∈ R^m and let P := {x ∈ R^n : Ax ≤ b}. Formulate the problem of finding the shortest vector in P as a convex optimization problem.
(7) Let A be an m × n matrix, b ∈ R^m, C a k × n matrix and d ∈ R^k. Formulate the problem of finding the minimum distance between P := {x ∈ R^n : Ax ≤ b} and Q := {x ∈ R^n : Cx ≤ d} as a convex optimization problem.
(8) Prove Lemma 76.


(9) Let a_1, . . . , a_m ∈ R^n. Formulate the problem of finding the ball B^n(x, r) of minimum volume containing all vectors a_i as a convex optimization problem with differentiable objective and constraints. Determine the Lagrange dual of this problem.

(10) Let f, g_1, . . . , g_m : R^n → R, and let Θ : R^m → [−∞, ∞) be defined by Θ(u) := inf{f(y) + ∑_{i=1}^{m} u_i g_i(y) : y ∈ R^n}. Show that Θ is a concave function.

(11) The aim of this exercise is to prove Shannon’s Theorem (Theorem 68). Let p ∈ R^n be a probability distribution, i.e. p ≥ 0, 1^t p = 1.
    (a) Show that for any x ∈ Z^n such that x ≥ 0, there exists a prefix encoding f : {1, . . . , n} → C such that |f(i)| = x_i for all i ∈ {1, . . . , n} if and only if ∑_{i=1}^{n} 2^{−x_i} ≤ 1 (this is Kraft’s inequality). Conclude that min{E_p(f) : f is a prefix encoding of {1, . . . , n}} equals
            min{∑_{i=1}^{n} p_i x_i : ∑_{i=1}^{n} 2^{−x_i} ≤ 1, x ≥ 0, x ∈ Z^n}.
    (b) Prove that
            H_n(p) = min{∑_{i=1}^{n} p_i x_i : ∑_{i=1}^{n} 2^{−x_i} ≤ 1, x ∈ R^n}.
        Hint: Show that x := (−log_2(p_1), . . . , −log_2(p_n))^t is a KKT point of the (convex) optimization problem on the right.
    (c) Prove that
            min{∑_{i=1}^{n} p_i x_i : ∑_{i=1}^{n} 2^{−x_i} ≤ 1, x ≥ 0, x ∈ Z^n} ≤ H_n(p) + 1.
        Hint: Show that x := (⌈−log_2(p_1)⌉, . . . , ⌈−log_2(p_n)⌉)^t is a feasible solution to the problem on the left.
    Piece these facts together into a proof of Theorem 68.
(12) Show that any positive semidefinite quadratic polynomial (i.e. any number of variables, maximum total degree 2) is the sum of squares of linear polynomials. Hint: use Theorem 64.

(13) Let p be a biquadratic bivariate polynomial (i.e. two variables, maximum total degree 4). Write min{p(x, y) : x, y ∈ R} as a semidefinite optimization problem.

(14) Show that the following polynomials are positive semidefinite but cannot be expressed as the sum of squares of polynomials.
    (a) 1 + x^4 y^2 + x^2 y^4 − 3x^2 y^2 (an example due to Motzkin);
    (b) x^4 y^2 + y^4 + x^2 − 3x^2 y^2 (Choi and Lam);
    (c) x^6 + y^6 + 1 − (x^4 y^2 + x^2 y^4 + x^4 + x^2 + y^4 + y^2) + 3x^2 y^2 (Robinson);
    (d) x^2 (x − 1)^2 + y^2 (y − 1)^2 + z^2 (z − 1)^2 + 2xyz(x + y + z − 2) (Robinson);
    (e) x^2 y^2 + x^2 z^2 + y^2 z^2 + 1 − 4xyz (Choi and Lam).
    Remark: some of these are really hard. To prove positive semidefiniteness in (a), (b), and (e), substitute each x_i by a suitable monomial in the arithmetic-geometric inequality (x_1 + · · · + x_n)/n ≥ (x_1 · · · x_n)^{1/n}.