lecture notes - mcgill university · this set of notes was created for a series of invited lectures...

Spring School on Variational AnalysisPaseky nad Jizerou, Czech Republic (May 19–25, 2019)

Topics in Convex Analysis inMatrix Space

Tim Hoheisel1

e-mail: [email protected] University

Department of Mathematics and StatisticsBurnside Hall, Room 1114

805 Sherbrooke Street WestMontreal Quebec H3A0B9

Lecture Notes

Abstract This set of notes constitutes a snapshot in time of some recent results bythe author and his collaborators on different topics from convex analysis of functions ofmatrices. These topics are tied together by their common underlying themes, namelysupport functions, infimal convolution, and K-convexity. A complete exposition of thebasic convex-analytic concepts employed makes the presentation self-contained.

Keywords Convex analysis, Fenchel conjugate, infimal convolution, K-convexity, general-ized matrix-fractional function, Moore-Penrose pseudoinverse, support function, Hormander’sTheorem, variational Gram function.

1Research partially supported by NSERC grant RGPIN-2017-04035.

1

Contents

1 Introduction 3

2 Preliminaries 42.1 The Euclidean setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42.2 Extended arithmetic and extended real-valued functions . . . . . . . . . . . 62.3 Lower semicontinuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82.4 Optimization problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3 Fundamentals from Convex Analysis 143.1 Convex sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143.2 Convex cones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303.3 Convex functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343.4 Functional operations preserving convexity . . . . . . . . . . . . . . . . . . 373.5 Conjugacy of convex functions . . . . . . . . . . . . . . . . . . . . . . . . . 393.6 Horizon functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523.7 The convex subdifferential . . . . . . . . . . . . . . . . . . . . . . . . . . . 543.8 Infimal convolution and the Attouch-Brezis Theorem . . . . . . . . . . . . 573.9 Subdifferential calculus and Fenchel-Rockafellar duality . . . . . . . . . . . 613.10 Infimal projection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

4 Conjugacy of Composite Functions via K-Convexity and Infimal Con-volution 664.1 K-convexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 664.2 The convex-composite setting . . . . . . . . . . . . . . . . . . . . . . . . . 684.3 The Fenchel conjugate of the convex convex-composite . . . . . . . . . . . 704.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

5 A New Class of Matrix Support Functionals 755.1 The matrix-fractional function . . . . . . . . . . . . . . . . . . . . . . . . . 765.2 The generalized matrix-fractional function . . . . . . . . . . . . . . . . . . 785.3 The geometry of Ω(A,B) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 875.4 σΩ(A,0) as a gauge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 905.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

A Background from Linear Algebra 95

2

1 Introduction

This set of notes was created for a series of invited lectures at the Spring School onVariational Analysis in Paseky nad Jizerou, Czech Republic, May 19–25, 2019.

We start in Chapter 2 with some preliminaries about (finite dimensional) Euclideanspaces, extended arithmetic and lower semicontinuity which is used throughout.

Following the preliminary section there are the three main chapters, Chapter 3, 4 and5: In Chapter 3 we present, in a self-contained way, the fundamental tools from convexanalysis that are necessary to understand the subsequent study. An emphasis is put on thestudy of support and gauge functions, including Hormander’s theorem and, most impor-tantly, on infimal convolution, upon which we build our presentation of many importantresults in convex analysis, and which is a main workhorse for large parts of our study. Theapproach centered around the conjugacy relations between infimal convolution and addi-tion of functions is inspired by the excellent textbook [2] by Bauschke and Combettes.The rest of this section is an eclectic selection from other standard references such as[19] by Hiriart-Urruty and Lemarechal, the classic and ever-current book by Rockafellar[24] and the monumental bible of variatonal analysis by Rockafellar and Wets [25], alsoreflecting some of the author’s personal preferences.

In Chapter 4 we take a broader perspective on convexity through a generalized notionof convexity for vector-valued functions with respect to partial orderings induced by a(closed, convex) cone K, which is referred to as K-convexity in the literature. Importantgroundwork in this area was laid by Borwein in his thesis [4]. Another pivotal (andsomewhat overlooked) contribution was made by Pennanen in [23]. Here, we focus onconjugacy results for the composition g F of a (closed, proper) convex function g witha nonlinear map F . Maps of this form are called convex-composite functions. We studythe case where the generalized convexity properties of the nonlinear map F and themonotonicity properties of the convex function g align in such a way that the compositeg F is still convex. Our main result, Theorem 4.7, provides sufficient conditions for theformula

(g F )∗(p) = minv∈−K

g∗(v) + 〈v, F 〉∗ (p)

to hold, thus complementing the work by Bot and Wanka on that subject, see e.g. [5]. Themain ingredients for the proof of said result are K-convexity and the conjugacy calculusfor infimal convolution. The material presented in Section 4 grew out of an ongoingcollaboration with my postdoc Quang Van Nguyen at McGill.

Chapter 5 is based on a series of papers [7, 8, 9] by the author and James V. Burke aswell as his PhD student Yuan Gao at the University of Washington, Seattle. The centralobject of study is the generalized matrix-fractional function (GMF)

ϕA,B : (X, V ) ∈ Rn×m×Sn 7→

12tr((

XB

)TM(V )†

(XB

))if rge

(XB

)⊂ rgeM(V ), V ∈ KA,

+∞, else.

Here, A ∈ Rp×n and B ∈ Rp×m are such that rgeB ⊂ rgeA, KA ⊂ Sn is the set of allsymmetric matrices that are positive semidefinite on the kernel of A, and M(V )† is theMoore-Penrose pseudoinverse of the bordered matrix M(V ) =

(V ATA 0

). The GMF occurs

in different situations, and for different choices of A,B, in matrix optimization or machine

3

and statistical learning, see Section 5.1 and 5.2. A main theoretical result, see Theorem5.8, is the observation that the GMF is the support function of the set

D(A,B) :=

(Y,−1

2Y Y T

)∈ Rn×m × Sn

∣∣ Y ∈ Rn×m : AY = B

.

The second main result, Theorem 5.10, provides the formula

convD(A,B) =

(Y,W ) ∈ Rn×m × Sn

∣∣∣∣ AY = B and1

2Y Y T +W ∈ KA

(1.1)

for the closed convex hull of the set D(A,B). Combining these two facts opens the doorfor a full convex-analytical study of the GMF, and for using it in various applications suchas nuclear norm smoothing, see Section 5.5.2, or variational Gram functions, see Section5.5.1. The description of the closed, convex hull of D(A,B) also establishes a tie betweenthe GMF and our study of K-convexity in Chapter 4 (which is also explicitly used in theproof of Theorem 5.26): The special case A = 0 and B = 0 in (1.1) gives

convD(A,B) =

(Y,W ) ∈ Rn×m × Sn

∣∣∣∣ 1

2Y Y T +W 0

.

In the language of K-convexity, this means that the closed, convex hull of D(0, 0) is thenegative K-epigraph of the map Y 7→ 1

2Y Y T with respect to the positive semidefinite

matrices.The notes end with an appendix on background from linear algebra and the bibliog-

raphy.

2 Preliminaries

2.1 The Euclidean setting

In what follows, E will be an N -dimensional (N ∈ N) Euclidean space, i.e. a finite-dimensional real vector space equipped with a scalar product, which we denote by 〈·, ·〉.Recall that a scalar product on E is a mapping 〈·, ·〉 : E × E → R such that for allx, y, z ∈ E and λ, µ ∈ R we have:

i) 〈λx+ µy, z〉 = λ 〈x, z〉+ µ 〈y, z〉 (linearity in first argument);

ii) 〈x, y〉 = 〈y, x〉 (symmetry);

iii) 〈x, x〉 ≥ 0 and 〈x, x〉 = 0 if and only if x = 0 (positive definiteness).

Altogether 〈·, ·〉 is a positive definite, symmetric bilinear form on E.By ‖ · ‖ we label the norm2 on E induced by the scalar product, i.e.

‖x‖ :=√〈x, x〉 (x ∈ E),

2Recall that a norm on E is a mapping ‖ · ‖∗ : E→ R such that for all x, y ∈ E and λ ∈ R:

i) ‖x‖∗ = 0 ⇐⇒ x = 0 (definiteness)

ii) ‖λx‖∗ = |λ| · ‖x‖∗ (absolute homogeneity)

iii) ‖x+ y‖∗ ≤ ‖x‖∗ + ‖y‖∗ (triangle inequality)

4

Scalar product and induced norm obey the famous Cauchy-Schwarz inequality

| 〈x, y〉 | ≤ ‖x‖ · ‖y‖ (x, y ∈ E),

where equality holds if and only if x and y are linearly dependent.The open ball with radius ε > 0 centered around x ∈ E is denoted by Bε(x). In

particular, we put B := B1(0) for the open unit ball.

If (Ei, 〈·, ·〉i) (i = 1, . . . ,m) is a Euclidean space, then Xm

i=1 Ei is also a Euclidean spaceequipped with the canonical scalar product

〈·, ·〉 :m

Xi=1

Ei → R, 〈(x1, . . . , xm), (y1, . . . , ym)〉 :=m∑i=1

〈xi, yi〉i .

For n = dimE1,m = dimE2 < ∞, the set of all linear (hence continuous) operators isdenoted by L(E1,E2). Recall from Linear Algebra that L(E1,E2) is isomorphic to Rm×n,the set of all real m× n-matrices, and that L ∈ L(E1,E2) if and only if

1) L(λx) = λL(x) (x ∈ E1, λ ∈ R) (homogeneity);

2) L(x+ y) = L(x) + L(y) (x, y ∈ E1) (additivity).

It is known from Linear Algebra that (since we restrict ourselves to finite dimensionalEuclidean spaces alone) for L ∈ L(E1,E2) there exists3 a unique mapping L∗ ∈ L(E2,E1)such that

〈Lx, y〉E2= 〈x, L∗y〉E1

(x, y ∈ E1).

The mapping L∗ is called the adjoint (mapping) of L. If E1 = E2 and L = L∗, we call Lself-adjoint.

With the well-known definitions for

rgeL := L(x) ∈ E2 | x ∈ E1 (image of L)

andkerL := x ∈ E1 | L(x) = 0 (kernel of L)

for L : E1 → E2 linear, the following important relations are standard knowledge fromLinear Algebra.

At this, recall that, for some nonempty subset S ⊂ E we define its orthogonal comple-ment by

S⊥ := x ∈ E | 〈s, x〉 = 0 (s ∈ S) .

Theorem 2.1 (Fundamental subspaces). Let L ∈ L(E1,E2). Then the following hold:

a) kerL = (rgeL∗)⊥ and (kerL)⊥ = rgeL∗;

b) kerL∗ = (rgeL)⊥ and (kerL∗)⊥ = rgeL.

3The unique existence of the adjoint mapping is already guaranteed if only the preimage space is finitedimensional.

5

Minkowski addition and multiplication

For two sets A,B ⊂ E we define their Minkowski sum by

A+B := a+ b | a ∈ A, b ∈ B.

If B = b is a singleton, we put

A+B =: A+ b.

Moreover, if A = ∅ (or B = ∅) then A+B := ∅.In addition, for Λ ⊂ R, we put

Λ · A := λa | a ∈ A, λ ∈ Λ .

The operation (Λ, A) ⊂ 2R × 2E → 2E is called (generalized) Minkowski multiplication.For Λ = λ we simply write

λA := λ · A.

Using the above notation, for instance, we have

Bε(x) = x+ εB

for all x ∈ E and ε > 0.

2.2 Extended arithmetic and extended real-valued functions

Let R := [−∞,∞] be the extended real line. The following conventions for an extendedarithmetic have become standard in the optimization and convex analysis community:The uncritical ones are

α +∞ = +∞ =∞+ α and α−∞ = −∞+ α = −∞ (α ∈ R),

α · ∞ = sgnffl(α) · ∞ =∞ · α and α · (−∞) = −sgnffl(α) · ∞ = −∞ · α (α ∈ R \ 0).

Although we will try to avoid these cases whenever possible it is expedient for our purposesto also use the following conventions:

0 · ∞ = 0 = 0 · (−∞),

∞−∞ = −∞+∞ =∞ (inf-addition).

Every subset S ⊂ R of the extended real line has a supremum (least upper bound) and aninfimum (greatest lower bound), which could be infinite. We use the common conventions

inf ∅ = +∞ and sup ∅ = −∞.

Functions of the type f : E → R occur naturally in various areas of mathematics, inparticular in optimization (e.g. optimal value functions) or measure theory, as soon assuprema or infima are involved.

6

The domain of a function f : E→ R is given by

dom f := x | f(x) <∞ .

We call f proper if dom f 6= ∅ and f(x) > −∞ for all x ∈ E.A central object of study for extended real-valued functions is the epigraph, which for

f : E→ R is defined by

epi f := (x, α) ∈ E× R | f(x) ≤ α ,

see Figure 1 for an illustration. The epigraph establishes a one-to-one correspondence ofsets in E × R and functions E → R. In fact, all the important convex-analytical prop-erties of an extended real-valued function (like lower semicontinuity, convexity, positivehomogeneity or sublinearity) have their correspondence in the geometry and topology, re-spectively, of their epigraph. It is sometimes expedient to also consider the strict epigraph

epi <f := (x, α) ∈ E× R | f(x) < α

of f : E→ R.

gph f

epi f

x

f(x)

Figure 1: Epigraph of a function f : R→ R

Another useful tool are the level sets of f , which are defined by

lev≤αf := x ∈ E | f(x) ≤ α (α ∈ R).

We call f level-bounded if lev≤αf is bounded for all α ∈ R. Level-boundedness is tremen-dously important for the existence of solutions of optimization problems, see Theorem2.8.

The most prominent example of an extended real-valued function is as simple as it isimportant.

Definition 2.2 (Indicator function). For a set S ⊂ E the mapping δS : E → R ∪ +∞given by

δS(x) :=

0 if x ∈ S,

+∞ if x /∈ S

is called the indicator (function) of S.

7

2.3 Lower semicontinuity

We now want to establish the notion of lower semicontinuity for extended real-valuedfunctions. To this end, for f : E→ R and x ∈ E, we define

lim infx→x

f(x) := infα ∈ R

∣∣ ∃xk → x : f(xk)→ α

(2.1)

as the lower limit of f at x.

Definition 2.3 (Continuity notions for extended-real valued functions). Let f : E → Rand x ∈ E. Then f is said to be lower semicontinuous (lsc) at x if

lim infx→x

f(x) ≥ f(x).

In addition, f is called continuous at x if both f and −f are lsc at x, i.e. if

limx→x

f(x) = x,

where the latter means that for any sequence xk → x we have f(xk)→ f(x).

Since the constant sequence xk = x is admitted in (2.1), the inequality in the definitionof lower semicontinuity can actually be substituted for and equality.

This fact that can also be phrased as follows: f : E → R is lower semicontinuous atx if and only if there does not exist a sequence xk → x such that limk→∞ f(xk) < f(x).See also Figure 2 for an illustration: Here the lack of lower semicontinuity at x could beremedied by setting by assigning f the value lim inf

x→xf(x) = lim

x↓xf(x) at x.

xx

f(x)

Figure 2: A function f not lsc at x

We continue with an example of a function that is continuous in the extended real-valuedsense.

Example 2.4 ((Negative) log-determinant). Consider the function

f : Sn → R ∪ +∞, f(X) :=

− log(detX) if X 0,

+∞ else(2.2)

8

which we call the (negative) log-determinant or the (negative) logdet function, for short.Then f is proper and continuous: The properness is clear (as dom f = Sn++ 6= ∅). Recallthat the determinant mapping X 7→ det(X) is continuous (as it is a polynomial of thematrices’ entries, cf. Leibniz formula), and hence f is continuous on its open domaindom f = Sn++. Hence we only need to consider the critical cases of points on the boundaryof the domain, i.e. in X ∈ bd (dom f) = Sn+ \ Sn++ and sequences Xk ∈ Sn++ → X. Atthis, Hence, for X ∈ bd (dom f) and Xk ∈ Sn++ → X we have det(Xk)→ 0, and thus,we obtain

limk→∞

f(Xk) = limk→∞− log(det(Xk)) = +∞ = f(X),

thus f is continuous.

Lower semicontinuity plays an important role in our study. In fact, we often find it usefulto rectify the absence of lower semicontinuity of a function as follows:

We define the lower semicontinuous hull or closure of f to be the function cl f : E→ R,

(cl f)(x) := lim infx→x

f(x).

As the constant sequence xk = x is admitted in (2.1), we always have

lim infx→x

f(x) ≤ f(x) (x ∈ E),

hencecl f ≤ f

for every function f : E→ R. Moreover, we have

epi (cl f) = cl (epi f),

see Exercise 2.10 In particular, in view of the following result (which also clarifies why lscfunctions are also called closed ), this shows that cl f is always lsc.

Proposition 2.5 (Characterization of lower semicontinuity). Let f : E → R. Then thefollowing are equivalent:

i) f is lsc (on E);

ii) epi f is closed;

iii) lev≤αf is closed for all α ∈ R.

Proof. ’i)⇒ii)’: Let (xk, αk) ∈ epi f → (x, α). By lower semicontinuity we have

f(x) ≤ lim infk→∞

f(xk) ≤ limk→∞

αk = α,

hence (x, α) ∈ epi f . This shows that epi f is closed.

’ii)⇒ iii)’: Fix α ∈ R and let xk ∈ lev≤αf → x. Then (xk, α) ∈ epi f → (x, α),and by closedness of epi f , we have (x, α) ∈ epi f , i.e. x ∈ lev≤αf . Thus, lev≤αf is closed,

9

which proves the desired implication.

’iii)⇒ i):’(Contraposition) Suppose that f is not lsc. Then there exists x ∈ E, xk → xand such that

f(xk)→ α < f(x).

Now, pick r ∈ (α, f(x)). Then we have

f(xk) ≤ r < f(x) for all k sufficiently large.

But that means that x /∈ lev≤rf , although almost every member of the sequence xk liesin lev≤rf . Hence, lev≤rf cannot be closed.

Due to the ubiquitiousness of the indicator function, we state a closedness result for itexplicitly.

Corollary 2.6 (Lower semicontinuity of the indicator). For a set C ⊂ E its indicator δCis proper and lsc if and only if C is nonempty and closed.

Lower semicontinuity plays a central role in minimization problems, as the followingparagraph illustrates.

2.4 Optimization problems

Let f : E→ R and C ⊂ E. We define

infCf := inf

x∈Cf(x) := inf f(x) | x ∈ C

andsupCf := sup

x∈Cf(x) := sup f(x) | x ∈ C .

In this scenario, infC f and supC f describe a(n extended) real number. By slight abuse ofnotation (since a minimum/maximum does not need to exist) we write the optimizationproblems that come with f and C by

min f(x) s.t. x ∈ C and maxx∈C

f(x) s.t. x ∈ C,

respectively. The function f is called objective function in this conext, where C is referredto as the feasible set. If C = E, the respective minimization/maximization problems arecalled unconstrained and otherwise constrained.

Note that we always have

infCf = − sup

C−f and sup

Cf = − inf

C−f,

hence there is no big loss in generality if we primarily focus on minimization problems.We now define the notion of a local and global minimizers.

10

Definition 2.7 (Minimizers). Let f : E→ R ∪ +∞ and C ⊂ E. Then x ∈ C ∩ dom fis called

i) local minimizer of f over C if there exists ε > 0 such that

f(x) ≤ f(x) (x ∈ C ∩Bε(x)).

ii) global minimizer of f over C if

f(x) ≤ f(x) (x ∈ C).

A global (local) minimizer of f over C is also called a (local) solution of the optimizationproblem

min f(x) s.t. x ∈ C.

Obviously, every global minimizer is a local minimizer, while the converse implication is,in general, not true. We will see, however, that in the convex setting, in turn, this doeshold.

We define

argminCf := argminx∈Cf(x) :=x ∈ C

∣∣∣ f(x) = infCf,

i.e. argminCf is the set of alll global minimizers of f over C. Using indicator functions,we have

infCf = inf

E(f + δC).

In fact, the minimization problems

min f(x) s.t. x ∈ C

andmin f(x) + δC(x)

are fully equivalent, in that not only there optimal values but also there (global and local)solutions coincide (if they exist), in particular

argminCf = argminEf + δC .

We also define

argmaxCf := argmaxx∈Cf(x) :=

x ∈ C

∣∣∣∣ f(x) = supCf

.

We want to emphasize here that, when minimizing proper functions f : E→ R ∪ +∞,we always have the implication

argminEf 6= ∅ ⇒ infEf ∈ R.

11

However, the converse implication does not hold in general: Consider for example thefunction

f : x ∈ R 7→

1x

if x > 0,+∞ else.

Then clearly, infR f = 0 but argminf = ∅.The significance of lower semicontinuity for minimization problems is highlighted by

the following famous result along the lines of the foregoing remark.

Theorem 2.8 (Existence of minima). Let f : E → R ∪ +∞ be proper, lsc and level-bounded. Then

argminEf 6= ∅ and infEf ∈ R.

Proof. Let f ∗ := infE f < ∞. There exists a sequence xk such that f(xk) → f ∗.Choosing α ∈ (f ∗,+∞) we have xk ∈ levαf for all k ∈ N sufficiently large. Since f islsc, levαf is closed by Proposition 2.5, and since f is level-bounded by assumption, levαis compact. Hence, by the Bolzano-Weierstrass Theorem, there exists x ∈ levα and aninfinite subset K ⊂ N such that xk →K x.

Lower semicontinuity of f then implies

f(x) ≤ lim infx→x

f(x) ≤ limk∈K

f(xk) = f ∗,

hence f(x) = f ∗, which proves the assertion.

The above theorem is the blueprint for what is called the direct method of the calculus ofvariations.

Exercises to Chapter 2

2.1 (Openness of Sn++) Argue that Sn++ is open in Sn.

2.2 (Riesz representation theorem - finite dimensional version) Let (E, 〈·, ·〉)be a (finite dimensional) Euclidean space and L ∈ L(E,R). Show that there existsa unique b ∈ E such that

L(x) = 〈b, x〉 (x ∈ E).

2.3 (Orthogonal matrices) Show that O(n) is a compact subset of Rn×n.

2.4 (Logdet and trace inequality) Let A ∈ Sn++. Show that

log(detA) + n ≤ tr (A).

2.5 (Matrix-fractional function) For Ω := Rn × Sn++ consider the function

f : Ω→ R, f(x, V ) =1

2xTV −1x.

12

Prove that f is differentiable by showing that

∇f(x, V ) =

[V −1x,−1

2V −1xxTV −1

](x, V ∈ Rn × Sn)

(w.r.t the standard scalar product on Rn × Sn).

Hint: Use the fact that for T ∈ Rn×n with ‖T‖ < 1 we have

(I − T )−1 =∞∑k=0

T k (Neumann series).

2.6 (Minimizing a linear function over the unit ball) Let g ∈ Rn \ 0. Computethe solution of the optimization problem

min 〈g, d〉 s.t. ‖d‖ ≤ 1.

2.7 (Minimizing a quadratic function) For A ∈ Sn and b ∈ Rn consider thequadratic function

q : Rn → R, q(x) =1

2xTAx+ bTx.

Prove (without using first-order optimality conditions) that the following are equiv-alent:

i) infRn q > −∞;

ii) A 0 and b ∈ rgeA;

iii) argminRnq 6= ∅.

2.8 (Topology of Minkowski sum) Let A,B ⊂ E nonempty. Prove:

a) A+B is open if A or B is open.

b) A+B is closed if both A and B are closed and at least one of them is bounded.Illustrate by a counterexample that the boundedness assumption cannot beomitted in general.

2.9 (Closures and interiors of epigraphs) Let f : E→ R. Then the following hold:

a) (x, α) ∈ cl (epi f) ⇐⇒ α ≥ lim infx→x f(x).

b) (x, α) ∈ int (epi f) ⇐⇒ α > lim supx→x f(x).

2.10 (Lower semicontinuous hull) Let f : E→ R. Show that

epi (cl f) = cl (epi f),

i.e. cl f is the function whose epigraph is the closure of epi f .

2.11 (Domain of an lsc function) Is the domain of an lsc function closed?

13

2.12 (Closedness of a positive combination) For p ∈ N let fi : E → R ∪ +∞ belsc and αi ≥ 0. Show that f :=

∑pi=1 αifi is lsc.

2.13 (Closedness preserving compositions) Let f : E→ R ∪ +∞ be lsc.

a) Show that f g is lsc if g : E′ → E is continuous.

b) Show that φ f is lsc if φ : R→ R is monotonically increasing and we use theconvention φ(+∞) = supt∈R φ(t).

3 Fundamentals from Convex Analysis

3.1 Convex sets

Definition 3.1 (Convex sets). A set C ⊂ E is called convex if

λx+ (1− λ)y ∈ C (x, y ∈ C, λ ∈ [0, 1]). (3.1)

In other words, a convex set is simply a set which contains all connecting lines of pointsfrom the set, see Figure 3 for examples.

Figure 3: Convex sets in R2

A vector of the form

r∑i=1

λixi,r∑i=1

λi = 1, λi ≥ 0 (i = 1, . . . , r)

is called a convex combination of the points x1, . . . , xr ∈ E. It is easily seen that a setC ⊂ Rn is convex if and only if it contains all convex combinations of its elements.

Below is a list of important classes of convex sets as well as operations that preserveconvexity.

Example 3.2 (Convex sets).

a) (Subspaces) Every subspace of E (in particular E itself) is convex, as convex com-binations are special cases of linear combinations.

14

b) (Minkowski sum) The Minkowski sum

A+B := a+ b | a ∈ A, b ∈ B

of two convex sets A,B ⊂ E is convex: For x, y ∈ A + B there exist a, a′ ∈ A andb, b′ ∈ B such that x = a+ b and y = a′ + b′. Then for λ ∈ [0, 1] we have

λx+ (1− λ)y = λ(a+ b) + (1− λ)(a′ + b′)

= λa+ (1− λ)a′ + λb+ (1− λ)b′

By convexity of A and B, respectively, we see that λa+ (1−λ)a′ ∈ A and λb+ (1−λ)b′ ∈ B, hence λx+ (1− λ)y ∈ A+B.

c) (Affine sets) Any set S ⊂ E which has a representation S = x+U , where x ∈ E andU ⊂ E is a subspace is called an affine set and is, due to a) and b), in particular,convex. It can be seen that the subspace U is uniquely determined and given byS − S. Moreover, S ⊂ E is affine if and only if

αx+ (1− α)y ∈ S (x, y ∈ S, α ∈ R).

For these details see, e.g. [24, Section 1].

d) (Intersection) Arbitrary intersections of convex sets are convex, see Exercise 3.1.1a).

e) (Linear images and pre-images) For convex sets C ⊂ E1 and D ⊂ E2 and a linearmapping F : E1 → E2 the sets F (C) and F−1(D) are convex, see Exercise 3.1.1 b).

f) (Hyperplanes) For s ∈ E \ 0 and γ ∈ R the set

x ∈ E | 〈s, x〉 = γ

is called a hyperplane. It is a convex set, which is easily verified elementary or asa special case part e) with F : x 7→ 〈s, x〉 and D = γ.

g) (Half-spaces) Sets of the form

x ∈ E | 〈s, x〉 ≥ γ , x ∈ E | 〈s, x〉 > γ

interval

h) (Intervals) The intervals (closed, open, half-open) are exactly the convex sets in R.

3.1.1 The convex hull

Definition 3.3 (Convex hull). Let M ⊂ E be nonempty. The convex hull of M is the set

convM :=⋂M⊂C,

C convex

C,

i.e. the convex hull of M is the smallest convex set containing M .

15

We can obtain an intrinsic characterization of the convex hull of a set by means ofconvex combinations of their elements. At this, for x1, . . . , xp ∈ E and λ ∈ ∆p we call∑p

i=1 λixi a convex combination.

Proposition 3.4 (Characterization of the convex hull). Let M ⊂ E be nonempty. Thewe have

convM =

r∑i=1

λixi | r ∈ N, λ ∈ ∆r, xi ∈M (i = 1, . . . , r)

Proof. Exercise 3.1.2.

Proposition 3.4 tells us that the convex hull of a set can be seen as the set of all convexcombinations of elements from the set in question.

In an N -dimensional (N ∈ N) space E this can be sharpened as follows.

Theorem 3.5 (Caratheodory’s Theorem). Let M ⊂ E be nonempty. Then we have

conv M =

N+1∑i=1

λixi | λ ∈ ∆N+1, xi ∈M (i = 1, . . . , N + 1)

,

i.e. every vector in convM can be written as a convex combination of at most N + 1elements from M .

Proof. Let x ∈ conv M . By Proposition 3.4 there exists r ∈ N , λ1, . . . , λr > 0 andx1, . . . , xr ∈M such that

r∑i=1

λi = 1 andr∑i=1

λixi = x. (3.2)

If r ≤ N + 1 there is nothing to do.Hence, let r > N + 1. We are going to show that x can already be written as a convex

combination of r − 1 elements from M , which then (inductively) gives the assertion.As r > N + 1, the vectors

(x1, 1), . . . , (xr, 1) ∈ E× R

are linearly dependent. Hence, there exist α1, . . . , αr ∈ R such that

0 =r∑i=1

αi(xi, 1) (3.3)

and αi 6= 0 for at least one index i.W.l.o.g. we can assume that∣∣∣αi

λi

∣∣∣ ≤ ∣∣∣αrλr

∣∣∣ ∀i = 1, . . . , r − 1, (3.4)

16

(i.e. r = argmax∣∣∣αiλi ∣∣∣ | i = 1, . . . , r

) hence, in particular αr 6= 0. Putting βi :=

− αiαr

(i = 1, . . . , r − 1), equation (3.3) yields

xr =r−1∑i=1

βixi andr−1∑i=1

βi = 1. (3.5)

Then (3.2) and (3.5) imply

x =r∑i=1

λixi =r−1∑i=1

λixi + λr

r−1∑i=1

βixi =r−1∑i=1

(λi + λrβi)xi,

Setting λi := λi + λrβi (i = 1, . . . , r − 1) we thus obtain

x =r−1∑i=1

λixi,

andr−1∑i=1

λi =r−1∑i=1

λi + λr

r−1∑i=1

βi = 1− λr + λr = 1.

Due to (3.4) we also have

λi = λi − λrαiαr

> 0.

Thus, x is already a convex combination of the r−1 vectors x1, . . . , xr−1, which concludesthe proof.

3.1.2 Topological properties of convex sets

We start by reviewing the fundamental concept of the (topological) closure, interior andboundary of an arbitray set M ⊂ E. The (topological) closure clM of M , which is theintersection all closed sets containing M , can also be written as

clM = x | ∀ε > 0 ∃y ∈ Bε(x) ∩M ,

in particular, clM is the set of all cluster (or, equivalently, limit) points of sequences inM . It is easy to see that clM can also be written as

clM =⋂ε>0

M + εB. (3.6)

The interior intM of M is defined by

intM := x ∈M | ∃ε > 0 : Bε(x) ⊂M .

The boundary bdM of M is given by

bdM := clM \ intM.

The first topological result shows that convexity of a set is inherited to its interior andits closure.

17

Proposition 3.6 (Closure and interior of a convex set). Let S ⊂ E be convex. Then clSand intS (which could well be empty, even if S is not) are convex, too.

Proof. Since

clS =⋂ε>0

(S + εB),

and S + εE is convex (see Example 3.2 b)), the convexity of clS follows from Example3.2 d).

Now, let x, y ∈ intS, hence there exist open neighborhoods U1, U2 ⊂ S of x and y,respectively. It follows for λ ∈ [0, 1] that

λx+ (1− λ)y ⊂ λU1 + (1− λ)U2 ⊂ λS + (1− λ)S ⊂ S,

which proves the result as λU1 + (1− λ)U2 is open by Exercise 2.8 a).

Boundedness, closedness and compactness are fundamental topological properties of sub-sets in E. At this point, we want to study in how far they are preserved under the convexhull operation.

Recall that, in (the finite-dimensional space) E, a subset is compact if and only if itis bounded and closed.The following result shows that compactness is preserved under the convex hull operator.

Proposition 3.7. Let M ⊂ E be compact. Then convM is compact.

Proof. Exercise 3.1.3 a).

As an immediate consequence, we obtain that boundedness, too, is preserved under theaffine hull operator.

Corollary 3.8. Let M ⊂ E be bounded. Then convM is bounded.

Proof. Exercise 3.1.3 b).

Unfortunately, closedness is, in general, not preserved under the convex hull operation,which we will illustrate by an example below. In the face of Proposition 3.7 the set tochoose here, necessarily needs to be unbounded.

Example 3.9. Consider M := (

00

) ∪

(a1

)| a ≥ 0 ⊂ R2. Then M is closed and(

11k

)=

1

k

(k

1

)+(

1− 1

k

)(0

0

)∈ convM (k ∈ N).

On the other hand, as one can easily verify, limk→∞(

11k

)=(

10

)/∈ convM , and hence

convM is not closed, see Figure 4.

This justifies the following definition.

Definition 3.10 (Closed convex hull). Let S ⊂ E be nonempty. Then its closed convexhull is the intersection of all closed convex sets containing it, we denote it by convS.

18

1

1

Figure 4: Closed set, non-closed convex hull

It is not surprising that the closed convex hull equals the closure of the convex hull of aset.

Proposition 3.11. Let S ⊂ E be nonempty. Then convS = cl (convS).

Proof. ’⊂’: cl (convS) is a closed convex set containing S, hence convS ⊂ cl (convS), asconvS is the smallest closed and convex set containing S.’⊃’: We have S ⊂ convS, hence convS ⊂ convS, thus cl (convS) ⊂ convS, as convS isclosed and convex.

3.1.3 Projection on convex sets and a separation theorem

For a set S ⊂ Rn and a given point in x ∈ E we want to assign to x the subset of pointsin S which have the shortest distance to it. We formalize this in the following definition.

Definition 3.12 (Projection on a set). Let S ⊂ E be nonempty and x ∈ E. Then wedefine the projection of x on S by

PS(x) := argminy∈S‖x− y‖ ⊂ S.

Observe that no changes occur if we substitute y 7→ ‖y − x‖ for y 7→ 12‖y − x‖2 in the

above definition.In general, the projection PS is a set-valued map E ⇒ S. It is well known, however,

that it is single-valued for nonempty, closed, convex sets. We record this fact in the nextresult.

Lemma 3.13 (Projection on closed convex sets). Let C ⊂ E be nonempty, closed andconvex. Then PC is a mapping E→ C with x = PC(x) if and only if x ∈ C.

Figure 5 illustrates this fact.The following theorem gives an important characterization of the projection on a closedconvex set in terms of a variational inequality.

For its proof, observe that by the definition of the Euclidean norm and the canonicalscalar product 〈·, ·〉 we have

‖x± y‖2 = ‖x‖2 ± 2 〈x, y〉+ ‖y‖2 (x, y ∈ E).

19

x

PC(x)

C

y

PC(y)

Figure 5: Projection on a closed convex set

Theorem 3.14 (Projection Theorem). Let C ⊂ E be nonempty, closed and convex andlet x ∈ E. Then v = PC(x) if and only if

v ∈ C and 〈v − x, v − v〉 ≥ 0 (v ∈ C). (3.7)

Proof. First, assume that v = PC(x) ∈ C and define f : E → R, f(v) = 12‖v − x‖2. By

convexity of C, we have v + λ(v − v) ∈ C for all v ∈ C and λ ∈ (0, 1). This implies

1

2‖v − x‖2 = f(v) ≤ f(v + λ(v − v)) =

1

2‖(v − x) + λ(v − v)‖2 (v ∈ C, λ ∈ (0, 1)),

which, in turn, gives

0 ≤ 1

2‖(v − x) + λ(v − v)‖2 − 1

2‖v − x‖2

= λ 〈v − x, v − v〉+λ2

2‖v − v‖2

for all v ∈ C and λ ∈ (0, 1). Dividing by λ yields

0 ≤ 〈v − x, v − v〉+λ

2‖y − v‖2.

Letting λ ↓ 0 gives the desired inequality in (3.7).In order to see the converse implication, let v ∈ E such that (3.7) holds. For v ∈ C

we hence obtain

0 ≥ 〈x− v, v − v〉= 〈x− v, v − x+ x− v〉= ‖x− v‖2 + 〈x− v, v − x〉≥ ‖x− v‖2 − ‖x− v‖ · ‖v − x‖,

where the last inequality is due to the Cauchy-Schwarz inequality. If x 6= v we can divideby ‖x− v‖ > 0 and infer

‖x− v‖ ≤ ‖x− v‖ (v ∈ C),

i.e. v = PC(x). If, in turn,

20

A geometrical interpretation of the projection theorem is as follows: The angle betweenx− PC(x) and v − PC(x) must be at least 90 for all v ∈ C.

Remark 3.15. A nonempty set S ⊂ H in an arbitrary Hilbert space is called Chebyshevset if PS is single-valued (i.e. for every x ∈ H the projection PS(x) is a singleton).

In finite dimension the (nonempty) closed and convex sets are exactly the Chebyshevsets, see [2, Corollary 21.11]. In infinite dimension this is still an open problem knownas the Chebyshev problem.

The projection theorem in the form of Theorem 3.14 remains valid (with literally thesame proof) if the assumption that E be finite dimensional is dropped and one works inHilbert space.

The following separation theorem is an immediate consequence of the projection theorem.

Theorem 3.16 (Separation Theorem). Let C ⊂ E be nonempty, closed and convex, andlet x /∈ C. Then there exists s ∈ E \ 0 with

〈s, x〉 > supv∈C〈s, v〉 .

Proof. Put s := x− PC(x) 6= 0. Then the projection theorem yields

0 ≥ 〈x− PC(x), v − PC(x)〉 = 〈s, v − x+ s〉 = 〈s, v〉 − 〈s, x〉+ ‖s‖2 (v ∈ C).

Thus,〈s, x〉 − ‖s‖2 ≥ 〈s, v〉 (v ∈ C),

hence, s fulfills the requirements of the theorem.

We would like to note some technicalities about the former theorem.

Remark 3.17. Under the assumptions of Theorem 3.16 the following hold:

a) The vector s can always be substituted for −s and thus, there exists s ∈ E\0 suchthat 〈s, x〉 < infv∈C 〈s, v〉.

b) By positive homogeneity, we can assume w.l.o.g. that ‖s‖ = 1.

c) The statement of the separation theorem can also be formulated as follows: ForC ⊂ E nonempty, closed and convex and x /∈ C there exist s ∈ E \ 0 and β ∈ Rsuch that

〈s, x〉 > β ≥ 〈s, v〉 (v ∈ C).

It is not quite clear yet why the above theorem was labeled separation theorem. In thesituation of the theorem, define γ := 1

2(〈s, x〉+ supy∈C 〈s, y〉). Then

x ∈ z | 〈s, z〉 > γ and C ⊂ z | 〈s, z〉 < γ ,

i.e. x and C lie in two distinct open half-spaces induced by the hyperplane H =z∣∣ sT z = γ

. We say that H separates the set C from the point x /∈ C. This situation

is illustrated in Figure 9.

21

C

•x

H = z | 〈s, z〉 = γ

s

Figure 6: Separation of a point from a closed convex set

Remark 3.18. In view of Remark 3.15 it is clear that the separation theorem in the formof Theorem 3.16 remains valid if for an arbitrary real Hilbert space instead of E withouteven changing the proof.

There is an extension of this result to arbitrary Banach spaces, but the existing proofsrely on the axiom of choice, usually in the form of Zorn’s Lemma.

3.1.4 The relative interior

For a nonempty set M ⊂ E, its affine hull is given by

aff M :=⋂S ∈ E |M ⊂ S, S affine ,

cf. Example 3.2 c). One can easily see (like in the case of the linear hull) that the intrinsiccharacterization

aff M =

r∑i=1

λixi

∣∣∣∣∣r∑i=1

λi = 1, xi ∈M (i = 1, . . . , r)

holds, i.e. aff M is the set of all affine combinations of M , see Exercise 3.1.6.

For convex sets the (topological) interior is too restrictive a concept as in many casesit is going to be empty. For instance, consider the line segment C := [x, y] for somex, y ∈ Rn. We have intC = ∅, but on the other hand it would be nice to declare the set(x, y)(:= C \ x, y) as the ’interior of C in some sense’. Since aff C = λx + (1 − λ)y |λ ∈ R, it is easily verified that

(x, y) = z | ∃ε > 0 : Bε(z) ∩ aff C ⊂ C ,

i.e. (x, y) is the interior of C with respect to the relative topology induced by the affinehull of C. We are going to make a general principle out of that.

Definition 3.19 (Relative interior/boundary and convex dimension). Let C ⊂ E beconvex. Then the relative interior riC of C is its interior with respect to the relativetopology induced by aff C, i.e.

riC := x ∈ C | ∃ε > 0 : Bε(x) ∩ aff C ⊂ C .

22

C aff C dimC riC

x x 0 x[x, x′] λx+ (1− λ)x′ | λ ∈ R 1 (x, x′)∆n a | eTna = 1 n− 1 a ∈ ∆n | ai > 0Bε(x) E N Bε(x)

Table 1: Examples for relative interiors

The relative boundary of C is given by

rbdC := clC \ riC,

which is closed. In addition, we define dimC := dim(aff C) to be the (convex) dimensionof C.

aff CriC

C

Figure 7: Illustration of relative interior

Figure 3.1.4 illustrates the relative interior and affine hull.By definition, for any convex set C ⊂ E, we have

intC ⊂ riC ⊂ C ⊂ clC. (3.8)

One might ask the question why we did not define a relative closure along with the relativeinterior and boundary, respectively. The reason for this is that there is no differencebetween the closure in the standard topology and in the topology relative to the affinehull. This holds since, briefly speaking, aff C is closed and hence clC ⊂ aff C.

Table 3.1.4 contains a list of examples of frequently used sets and their relative inte-riors. We urge the reader to verify them. Note that the first two show that the relativeinterior (as opposed to the interior) is non-monotonic in the sense that C1 ⊂ C2 doesnot necessarily imply riC1 ⊂ riC2. However, Corollary 3.27 points shows that, in mostsituations, monotonicity is valid.

23

Remark 3.20. The relative interior of a convex set C ⊂ E coincides with the interiorif (and only if) C has full dimension, i.e. when aff C = E, in which case all topologicalquestions reduce to the standard topology.

If, on the other hand, dimC = m < N(= dimE), and we choose any m-dimensionalsubspace U ⊂ E, there exists an invertible affine mapping F : E→ E such that F (aff C) =U , see [24, Corollary 1.6.1]. That is F maps aff C to U in a homeomorphic (even diffeo-morphic) way. Thus, it holds that

aff F (C) = F (aff C) = U,

and therefore, it is often possible to reduce topological questions about arbitrary (lowerdimensional) convex sets to the full dimensional case by just working with the affine,diffeomorphic image F (A) as a full dimensional convex set in some (sub-)space U .

Remark 3.20 already comes into play in the proof of the next result.

Proposition 3.21 (Line segment principle). Let C ⊂ E be convex as well as x ∈ riCand y ∈ clC. Then we have [x, y) ∈ riC, i.e.

(1− λ)x+ λy ∈ riC (λ ∈ [0, 1))

Proof. In view of Remark 3.20 we may assume w.l.o.g that dimC = N , i.e. riC = intC.Now, let λ ∈ [0, 1). Since y ∈ clC , we have

y ∈ C + εB (ε > 0). (3.9)

Hence, using Minkowski addition, we get

Bε((1− λ)x+ λy) = (1− λ)x+ λy + εB(3.9)⊂ (1− λ)x+ λ(C + εB) + εB

= (1− λ)[x+

1 + λ

1− λεB︸︷︷︸

=Bε 1+λ

1−λ(x)

]+ λC

for all ε > 0. Since x ∈ intC, we have Bε 1+λ1−λ

(x) ⊂ C for all ε > 0 sufficiently small.

Hence, for these ε > 0, we get

Bε((1− λ)x+ λy) ⊂ (1− λ)C + λC = C,

which shows that (1− λ)x+ λy ∈ intC, and hence concludes the proof.

We encourage the reader to emulate the proof of Proposition 3.21 without the assumptionthat C has nonempty interior to see that nothing changes other than the necessity ofintersecting with aff C.

An immediate consequence of the line segment principle is the convexity of the relativeinterior of a convex set (just take y ∈ riC in Proposition 3.21).

We now show that for C ⊂ E convex, the three convex sets C, riC and clC have thesame affine hull, hence the same convex dimension.

24

Theorem 3.22. Let C ⊂ E convex. Then aff (clC) = aff C = aff (riC). In particular, wehave dimC = dim clC = dim riC and riC 6= ∅ if C 6= ∅.

Proof. Since aff C is closed we have clC ⊂ aff C. Using again the properties of therespective hull operators ’cl ’ and ’aff ’ we hence obtain

aff C ⊂ aff (clC) ⊂ aff (aff C) = aff C,

therefore, in particular, we have aff C = aff (clC).We now show that aff C = aff (riC): In view of Remark 3.20 we can assume that

aff C = E, i.e. dimC = N . Hence, it suffices to show that intC 6= ∅ under this assump-tion, since then also aff (intC) = E. For these purposes, let x0, x1, . . . , xN ∈ C be affinelyindependent and define

S := conv x0, x1, . . . , xN ⊂ C.

We show that

x :=1

N + 1

N∑i=0

xi ∈ S

is an interior point of S hence also of C.Notice that E = aff S = aff x0, x1, . . . , xN, and x0, x1, . . . , xN are affinely indepen-

dent. Hence, for every y ∈ E, we have unique scalars β0(y), β1(y), . . . , βN(y) ∈ R with∑Ni=0 βi(y) = 1 such that

N∑i=0

βi(y)xi = x+ y =1

N + 1

N∑i=0

xi + y.

Thus, putting αi(y) := βi(y)− 1N+1

(i = 0, 1, . . . , N), the vector α(y) ∈ RN+1 is the uniquesolution of the linear system

y =N∑i=0

αixi, 0 =N∑i=0

αi.

The thus induced mapping y ∈ E 7→ α(y) is linear, hence continuous. Thus, we can findδ > 0 such that for all y ∈ δB, we have |αi(y)| ≤ 1

N+1. But then βi(y) = αi(y) + 1

N+1≥ 0

and hence x+y =∑N

i=0 βi(y)xi ∈ conv x0, . . . , xN = S for all y ∈ δE. Thus, x+δB ⊂ S,which gives the assertion.

We continue by providing a list of useful properties of the relative interior.

Proposition 3.23. Let C ⊂ E convex. Then the following hold:

a) ri (riC) = riC = ri (clC);

b) clC = cl (riC);

c) rbdC = rbd (riC) = rbd (clC).

25

Proof. By Remark 3.20 we can again assume that aff C = E, i.e. the standard and therelative topology coincide. The statements then follow from the well-known facts for theinterior, boundary and closure of sets with nonempty interior.

The fact that two convex set with the same closure have the same relative interior is animmediate consequence, which we want to state explicitly because it is so frequently used.

Corollary 3.24. Let C1, C2 ⊂ E such that clC1 = clC2. Then riC1 = riC2.

Proof. From Proposition 3.23 we infer that riC1 = ri (clC1) = ri (clC2) = riC2.

We next present another useful principle for the relative interior of a convex set that wecall the stretching principle.

Proposition 3.25 (Stretching principle). Let C ⊂ E be a nonempty convex set . Thenit holds that

z ∈ riC ⇐⇒ ∀x ∈ C ∃µ > 1 : µz + (1− µ)x ∈ C.

Proof. First, let z ∈ riC. By definition, there exists ε > 0 such that Bε(z) ∩ aff C ∈ C.Moreover, for every x ∈ C and µ ∈ R we have µz + (1 − µ)x ∈ aff C. For every µsufficiently close to 1, µz + (1− µ)x ∈ Bε(z). Hence, µz + (1− µ)x ∈ aff C ∩Bε(z) ⊂ C.

Now, suppose that z satisfies the condition on the right-hand side of the equivalence:As riC 6= ∅ by Theorem 3.22, there exists x ∈ riC. By assumption we then havey := µz + (1− µ)x ∈ C for some µ > 1. Then z = λy+ (1− λ)x, where λ := µ−1. By theline segment principle z ∈ riC.

Proposition 3.25 basically says that every line segment in C having z ∈ riC as oneendpoint can be, to some extent, stretched beyond z without leaving C.

In the following result we want to investigate how relative interiors and closures behavewith regard to intersection of sets. For these purposes, recall that for an intersection of afamily of sets Ai ∈ E (i ∈ I), it always holds that

cl⋂i∈I

Ai ⊂⋂

clAi

due to the monotonicity and idempotence of the closure operator and the fact that anarbitrary intersection of closed sets is closed.

Proposition 3.26. Let Ci ⊂ E be convex for i ∈ I (an index set) such that⋂i∈I riCi 6= ∅.

Then the following hold:

a) cl⋂i∈I Ci =

⋂i∈I clCi;

b) ri⋂i∈I Ci =

⋂i∈I riCi if I is finite.

26

Proof. Fix x ∈⋂i∈I riCi. By the line segment principle, given any y ∈

⋂i∈I clCi, we have

(1− λ)x+ λy ∈⋂i∈I riCi for all λ ∈ [0, 1). Moreover, we have y = limλ→1(1− λ)x+ λy.

As y was chosen arbitrarily in⋂i∈I clCi, we obtain⋂

i∈I

clCi ⊂ cl⋂i∈I

riCi ⊂ cl⋂i∈I

Ci ⊂⋂i∈I

clCi.

This proves a) and shows at the same time that⋂i∈I riCi and

⋂i∈I Ci have the same

closure, hence, by Corollary 3.24, the same relative interior. Thus,

ri⋂i∈I

Ci ⊂⋂i∈I

riCi.

In order to prove the opposite inclusion we assume that I is finite. Fix z ∈⋂i∈I riCi.

Then by the stretching principle (applied to every Ci), for every x ∈⋂Ci and every i ∈ I

there exists µi > 0 such that µiz + (1 − µi) ∈ Ci. Putting µ := mini∈I µi > 1 (I finite!),we see that µz + (1− µ)x ∈

⋂Ci. Hence, again by the stretching principle, z ∈ ri

⋂Ci,

which completes the proof.

As was pointed out above, the relative interior operator is not necessarily monotone. Thefollowing result tells us, however, that in many situations, it actually is.

Corollary 3.27. Let C1, C2 ⊂ E be convex sets such that C2 ⊂ clC1 but C2 6⊆ rbdC1.Then riC2 ⊂ riC2.

Proof. Since by assumption C2 ⊂ clC1, by Proposition 3.23 b) and the monotonicity ofthe closure operator we have

cl (riC2) = clC2 ⊂ clC1 (3.10)

We claim that riC1∩ riC2 6= ∅: Otherwise (3.10) would yield riC2 ⊂ clC1 \ riC1 = rbdC1

which is closed, hence C2 ⊂ cl (riC2) ⊂ rbdC1, which contradicts our assumption. HenceriC1 ∩ riC2 6= ∅, and we may apply Proposition 3.26 to obtain

riC1 ∩ riC2 = riC1 ∩ ri (clC2) = ri (C1 ∩ clC2) = riC2,

hence riC2 ⊂ riC2.

Next, we want to show that affine mappings preserve relative interiors (as was alreadyforeshadowed in Remark 3.20). To this end, recall that for a continuous function f : E1 →E2 and A ∈ E1, we have

f(clA) ⊂ cl (f(A)).

Proposition 3.28 (Relative interior under affine mappings). Let F : E1 → E2 be affineand C ⊂ E1 convex. Then

riF (C) = F (riC).

27

Proof. First note that by Example 3.2 e) the set F (C) is convex, so that we can even talkabout its relative interior.

By the continuity of F , the monotonicity of the closure operation and Proposition 3.23b) we obtain

F (C) ⊂ F (clC) = F (cl (riC)) ⊂ clF (riC) ⊂ clF (C).

Hence, taking the (idempotent) closure again on both sides, we get clF (C) = clF (riC).Therefore, see Corollary 3.24, riF (C) = riF (riC) ⊂ F (riC).

In order to prove the converse inclusion take z ∈ F (riC). Moreover, let x ∈ F (C). Inaddition, choose z′ ∈ F−1(z) ⊂ riC and x′ ∈ F−1(x) ⊂ C. By the stretching principle,there exists µ > 1 such that µz′ + (1− µ)x′ ∈ C, and thus

F (µz′ + (1− µ)x′) = µz + (1− µ)x ∈ F (C).

As x ∈ F (C) was chosen arbitrarily, we can apply the stretching principle (in the oppositedirection to before) to z and infer that z ∈ riF (C), which concludes the proof.

We close out this paragraph on the relative interior with a result that will be very usefulfor our study in section 5.3. It is usually referenced to [24, Theorem 6.8].

Theorem 3.29. Let C ⊂ E1×E2. For each y ∈ E1 we define Cy := z ∈ E2 | (y, z) ∈ C and D := y | Cy 6= ∅. Then riC = (y, z) | y ∈ riD, z ∈ riCy .

Proof. The projection L : (y, z) 7→ y has L(C) = D, hence L(riC) = riD, by Proposition3.28. Given y ∈ riD and the affine set M := (y, z) | z ∈ E2, Proposition 3.26 (Maffine!) and Exercise 3.1.8 we have

M ∩ riC = ri (M ∩ C) = ri (y × Cy) = ri y × riCy = (y, z) | z ∈ riCy ,

which is exactly the set of points in riC mapped to y by L. This shows the desiredstatement (clear?).

Exercises for Section 3.1

3.1.1 (Convexity preserving operations on sets)

a) (Intersection) Let I be an arbitrary index set (possibly uncountable) and letCi ⊂ E (i ∈ I) be a family of convex sets. Show that

⋂i∈I Ci is convex.

b) (Linear images and preimages) Let F ∈ L(E1,E2) and let C ⊂ E1, D ⊂ E2 beconvex. Show that

F (C) := Ax | x ∈ C and F−1(D) = x | Ax ∈ D

are convex.

28

3.1.2 (Convex hull of a set) Let M ⊂ E. Show that

convM =

r∑i=1

λixi | r ∈ N, λ ∈ ∆r, xi ∈M (i = 1, . . . , r)

,

i.e. convM is the set of all convex combinations of vectors in M .

Hint: You may use the comment in the notes that a convex set contains all convex combinations

of its elements.

3.1.3 (Topology of convex hulls) Let M ⊂ E. Show the following:

a) If M is compact then convM is compact.

b) If M is bounded then convM is bounded.

3.1.4 (Convex hulls) Let F : E → E′ be affine and A,C ⊂ E and B ⊂ E′ nonempty.Prove the following:

a) convF (A) = F (convA);

b) conv (A×B) = convA× convB;

c) conv (A+ C) = convA+ convC.

3.1.5 (Spectahedron) For n ∈ N compute convuuT ∈ Sn | u ∈ Rn : ‖u‖ = 1

.

3.1.6 (Affine hulls) Let M ⊂ E be nonempty and let F : E → E′ be linear. Show thefollowing:

a) aff M = ∑r

i=1 λixi |∑r

i=1 λi = 1, xi ∈M (i = 1, . . . , r).b) aff (F (M)) = F (aff M).

3.1.7 (Characterization of relative interior points) Let C ⊂ E be nonempty andconvex and x ∈ C. Show that the following are equivalent:

i) x ∈ riC;

ii) R+(C − x) is a subspace of E.

3.1.8 (Relative interiors and cartesian products) For i = 1, . . . , p let Ci ⊂ Ei. Showthat

ri

(p

Xi=1

Ci

)=

p

Xi=1

riCi.

3.1.9 (Conex hull and relative boundary) Let C ⊂ E be nonempty, convex andcompact. Show that conv (rbdC) = C.

3.1.10 (Open mapping theorem - finite dimensional version) Let A ⊂ E be openand L ∈ L(E,E′) surjective. Then L(A) is open.

29

3.2 Convex cones

3.2.1 Convex cones and conical hulls

Definition 3.30 (Cones). A nonempty set K ⊂ E is said to be a cone if

λK ⊂ K (λ ≥ 0),

i.e. K is a cone if and only if it is closed under multiplication with nonnegative scalars.We call K pointed if for any p ∈ N the implication

x1 + · · ·+ xp = 0 ⇒ xi = 0 (i = 1, . . . , p)

holds as soon as xi ∈ K.

Note that, in our definition, a cone always contains the origin. In the literature this isnot necessarily the case.

For obvious reasons, convex cones are of particular interest to our study. We have thefollowing handy characterization of convexity of a cone.

Proposition 3.31 (Convex cones). Let K ⊂ E be a cone. Then K is convex if and onlyif K +K ⊂ K.

Proof. Let K be convex. Then

x+ y = 2 · 1

2(x+ y)︸︷︷︸∈K

∈ K (x, y ∈ K)

hence K +K ⊂ K.If, in turn, K +K ⊂ K, then

λx︸︷︷︸∈K

+ (1− λ)y︸︷︷︸∈K

∈ K +K ⊂ K (x, y ∈ K,λ ∈ [0, 1]),

i.e. K is convex.

Pointedness of convex cones can be handily characterized.

Proposition 3.32 (Pointedness of convex cones). Let K ⊂ E be a convex cone. Then Kis pointed if and only if K ∩ (−K) = 0.

Proof. Exercise 3.2.1

We proceed with a list of prominent examples of cones.

Example 3.33 (Cones).

a) (Nonnegative Orthant) For all n ∈ N, the nonnegative orthant Rn+ is a pointed,

convex cone, which is also a polyhedron as

Rn+ =

x ∈ Rn

∣∣ (−ei)Tx ≤ 0 (i = 1, . . . , n).

30

b) (Cone complementarity constraints) Let K ⊂ E be a cone. Then the set

Λ := (x, y) ∈ E× E | x, y ∈ K, 〈x, y〉 = 0

is a cone. A prominent realization is K = Rn, in which case Λ is called the comple-mentarity constraint set.

c) (Positive semidefinite matrices) For n ∈ N, the set Sn+ of positive semidefinite n×nmatrices is a pointed, convex cone.

The next example is important enough to merit its own definition.

Definition 3.34 (Polar cone). Let K ∈ E be a cone. Then the polar (cone) of K isdefined by

K := d ∈ E | 〈d, x〉 ≤ 0 (x ∈ K) .respectively. Moreover, K := (K) is called the bipolar cone of K.

The cone K∗ := −K is sometimes referred to as the dual (cone) of K, and K iscalled self-dual if K∗ = K.

In order to visualize the normal cone, we think of E as Rn: Then the normal cone of thecone K is set of all vectors, which have an angle ≥ 90 to every vector in K.

Clearly, for an arbitrary cone K ⊂ E, its polar K is always a closed, convex cone.Hence, for K to be self-dual, it must necessarily be a closed, convex cone. Moreover,polarization is order-reversing, i.e. for K1 ⊂ K2 ⊂ E, we have K2 ⊂ K1 .

We continue with some elementary examples of polar cones.

Example 3.35 (Polar cones).

a) It holds that 0 = E and E = 0, which is a special case of part b).

b) If S is a subspace, S = S⊥ (cf. Exercise 3.2.4).

c) For 0 6= w ∈ E, the polar of the ray tw | t ≥ 0 is the half-space w ∈ E | 〈w, x〉 ≤ 0.

d) The negative orthant Rn+ and the positive semidefinite n × n matrices Sn+ are self-

dual.

Since the intersection of convex cones is again a convex cone, using our usual routine, wecan also build up (convex) conical hulls of arbitrary sets.

Definition 3.36 ((Convex) Conical hull). Let S ⊂ E be nonempty. Then the (convex)conical hull of S is the set

cone S :=⋂S⊂M,

M convex cone

M.

Moreover, we define the closed (convex) conical hull of S to be

cone S := cl (cone S).

We notice, without proof, that cone S is the intersection of all closed, convex cones thatcontain S.

31

3.2.2 The horizon cone

The next conical approximation of a set is, loosely speaking, comprised of the directionsin which one can go (starting at least one point in the set) without ever leaving the set,and thus takes account of the unboundedness of the set.

Definition 3.37 (Horizon cone). For a nonempty set S ⊂ E the set

S∞ := v ∈ E | ∃xk ∈ S, tk ↓ 0 : tkxk → v

is called the horizon cone of S. We put ∅∞ := 0.

C

C∞

Figure 8: The horizon cone of an unbounded set

The following result shows that the horizon cone is indeed a closed cone.

Lemma 3.38. The horizon cone of a set C ⊂ E is a closed cone.

Proof. The fact that C∞ is a cone is trivial. For the closedness of the horizon cone wecan invoke a diagonal sequence arguments as it is usually used to show closedness of thetangent cone.

The horizon cone can be used to very handily express boundedness of an arbitrary set inE.

Proposition 3.39 (Horizon criterion for boundedness). A set S ⊂ E is bounded if andonly if S∞ = 0.

Proof. If S is bounded, then for all sequences xk ∈ S, tk ↓ 0, we have tkxk → 0, henceS∞ = 0.

If, in turn, S is unbounded, then there exists xk ∈ S with ‖xk‖ → ∞. Puttingtk = 1

‖xk‖, we have tk ↓ 0 and tkxk = xk

‖xk‖converges (at least on a subsequence) to some

v 6= 0 and, hence v ∈ S∞ ) 0.

Figure 8 nicely illustrates the foregoing result. It shows that the horizon cone of anunbounded nonconvex set, which is a non-trivial closed cone, which is not necessarilyconvex.

We now want to show that the horizon cone is always going to be convex if the said inquestion is convex itself. For these purposes, we introduce another conical approximationfor convex sets.

32

Definition 3.40 (Recession cone). For nonempty convex sets C ⊂ E, the convex cone

0+(C) := v | ∀x ∈ C, λ ≥ 0 : x+ λv ∈ C ,

is called the recession cone of C,

The recession cone is very closely related to the horizon cone as we will now see.

Proposition 3.41 (Horizon vs. recession cone). Let C ⊂ E be nonempty and convex.Then

C∞ = 0+(clC).

In particular, C∞ is (a closed and) convex (cone) if C is convex.

Proof. Let v ∈ C∞. Then there exist xk ∈ C, tk ↓ 0 such that tkxk → v. Now, letλ ≥ 0 and x ∈ C be given. As C is convex, we have

(1− λtk)x+ λtkxk ∈ C

for all k ∈ N sufficiently large. Hence,

x+ λv = limk→∞

(1− λtk)x+ λtkxk ∈ cl (C),

which shows that v ∈ 0+(clC).Now let v ∈ 0+(clC). Fixing x ∈ C, we get x + λv ∈ clC for all λ ≥ 0. Hence,

we can find a sequence xk ∈ C such that ‖x + kv − xk‖ ≤ 1k

for all k ∈ N. Puttingλk := 1

k(k ∈ N), we obtain

‖v − λkxk‖ ≤1

k(‖x‖+ ‖x+ kv − xk‖)→ 0,

hence λkxk → v, i.e. v ∈ C∞.

An obvious consequence of the foregoing proposition is the fact that horizon and recessioncone coincide for nonempty, closed and convex sets.

Corollary 3.42. Let C ⊂ E be nonempty, closed and convex. Then

C∞ = 0+(C).

Exercises to Section 3.2

3.2.1 (Pointedness of convex cones) Let K ⊂ E be a convex cone. Show that K ispointed if and only if K ∩ (−K) = 0.

3.2.2 (Convex cones vs. subspaces) Let K ⊂ E be a convex cone. Show that:

a) K ∩ (−K) is the largest subspace that is contained in K;

b) K −K is the smallest subspace containing K, i.e. K = K = spanK.

c) K is a subspace if and only if K = −K.

33

3.2.3 (Characterizing the conical hull) Let S ⊂ E be nonempty. Prove that

cone S =

r∑i=1

λixi | r ∈ N, xi ∈ S, λi ≥ 0 (i = 1, . . . , r)

= R+(convS) = conv (R+S).

3.2.4 (Polar of subspace) Let U ⊂ E be a subspace. Show that U = U⊥.

3.2.5 (Pointedness and polarity) Let K be a convex cone. Show that

w ∈ intK ⇐⇒ 〈v, w〉 < 0 (v ∈ K \ 0).

3.2.6 (Polar cone and normal vectors) Let C ⊂ E be nonempty. Then it holds that

(cone C) = w ∈ E | 〈w, x〉 ≤ 0 (x ∈ C) .

3.3 Convex functions

We start the section with the basic definition of a convex function.

Definition 3.43 (Convex function). A function f : E → R is said to be convex if epi fis a convex set.

Note that in the above definition we could have substitued the epigraph for the strictepigraph epi <f := (x, α) ∈ E× R | f(x) < α of f , see Exercise 3.3.3. Moreover, notethat convex functions have convex level sets, see Exercise 3.3.6.

Recall that the domain of a function f : E→ R is defined by dom f := x ∈ E | f(x) <∞.Using the linear mapping

L : (x, α) ∈ E× R 7→ x ∈ E, (3.11)

we have dom f = L(epi f), and hence Example 3.2 e) yields the following immediate butimportant result.

Proposition 3.44 (Domain of a convex function). The domain of a convex function isconvex.

Recall that a (convex) function f : E→ R is proper if dom f 6= ∅ and f(x) > −∞ for allx ∈ E.

Improper convex functions are somewhat pathological (cf. Exercise 3.3.4), but they dooccur; rather as by-products then as primary objects of study. For example the function

f : x ∈ R 7→

−∞ if |x| < 1,

0 if |x| = 1,+∞ if |x| > 1.

is improper and convex.Convex functions have an important interpolation property, which we summarize in thenext result for the case that f does not take the value −∞.

34

Proposition 3.45 (Characterizing convexity). A function f : E→ R ∪ +∞ is convexif and only if for all x, y ∈ E we have

f(λx+ (1− λ)y) ≤ (1− λ)f(x) + λf(y) (λ ∈ [0, 1]). (3.12)

Proof. First, let f be convex. Take x, y ∈ E and λ ∈ [0, 1]. If x /∈ dom f or y /∈ dom fthe inequality (3.12) holds trivially, since the right-hand side is going to be +∞. If, onthe other hand, x, y ∈ dom f , then (x, f(x)), (y, f(y)) ∈ epi f , hence by convexity

(λx+ (1− λ)y, λf(x) + (1− λ)f(y)) ∈ epi f,

i.e. f(λx+ (1− λ)y) ≤ λf(x) + (1− λ)f(y), which proves the first implication.In turn, let (3.12) hold for all x, y ∈ E. Now, take (x, α), (y, β) ∈ epi f and let

λ ∈ [0, 1]. Due to (3.12) we obtain

f(λx+ (1− λ)y) ≤ λf(x) + (1− λ)f(y) ≤ λα + λβ,

i.e. λ(x, α) + (1− λ)(y, β) ∈ epi f , which shows the converse implication.

We move the analogous characterization of convexity for functions E → R to Exercise3.3.3, because these kinds of functions are not our primary object of study.

Definition 3.46 (Convexity on a set). Let f : E→ R∪+∞ and C ⊂ dom f nonempty,convex. Then we call f convex on C if (3.12) holds for all x, y ∈ C.

With this terminology we can formulate the following useful result.

Corollary 3.47. Let f : E→ R ∪ +∞. Then the following are equivalent.

i) f is convex.

ii) f is convex on its domain.

Proof. The implication’ i)⇒ii)’ is obvious from the characterization of convexity in Propo-sition 3.45

For the converse implication note that (3.12) always holds for any pair of points x, yif one of them is not in the domain.

This completes the proof.

As an immediate consequence of Corollary 3.47, we can make the following statementabout proper, convex functions:

”The proper and convex functions E→ R are exactly those proper functions f : E→ Rfor which there exists a nonempty, convex set C ⊂ E such that (3.12) holds on C andf takes the value +∞ outside of C.”

35

We are mainly interested in proper, convex (even lsc) functions E→ R ∪ +∞. Hence,we introduce the abbreviations

Γ := Γ(E) := f : E→ R ∪ +∞ | f proper and convex

andΓ0 := Γ0(E) := f : E→ R ∪ +∞ | f proper, lsc and convex

which we will use frequently in the remainder.Although we are not primarily interested in continuity properties of convex functions

we still want to mention the following to standard result which can be substantiallyrefined, see the analysis in [24, §10].

Theorem 3.48 (Lipschitz continuity of convex functions; [3, Th. 4.1.3]). A proper convexfunction is (locally Lipschitz) continuous on the interior of its domain.

In particular, every finite-valued convex function is (locally Lipschitz) continuous.


3.3.1 (Univariate convex functions) Let f : R → R ∪ +∞ and I ⊂ dom f be anopen interval. Show the following :

a) f is convex on I if and only if the slope-function

x 7→ f(x)− f(x0)

x− x0

is nondecreasing on I \ x0.b) Let f is differentiable on I: Then f is convex on I if f ′ is nondecreasing on I,

i.e.f ′(s) ≤ f ′(t) (s, t ∈ I : s ≤ t).

c) Let f is twice differentiable on I. Then f is convex on I if and only if f ′′(x) ≥ 0for all x ∈ I.

3.3.2 (Convexity of maximum eigenvalue) Let f : Sn → R be given by

f(A) = λmax(A),

where λmax(A) denotes the largest eigenvalue of A. Show that f is convex.Hint: Rayleigh quotient.

3.3.3 (Characterization of convexity) Let f : E→ R. Show the equivalence of:

i) f is convex;

ii) The strict epigraph epi <f := (x, α) ∈ E× R | f(x) < α of f is convex;

iii) For all λ ∈ (0, 1) we have f(λx+ (1−λ)y) < λα+ (1−λ)β whenever f(x) < αand f(y) < β.

36

3.3.4 (Properness and closedness of convex functions) Prove the following:

a) An improper convex function f : E → R must have f(x) = −∞ for all x ∈ri (dom f).

b) An improper convex function which is lsc, can only have infinite values.

c) If f is convex then cl f is proper if and only if f is proper.

3.3.5 (Jensen’s Inequality) Show that f : E→ R ∪ +∞ is convex if and only if

f

(p∑i=1

λixi

)≤

p∑i=1

λif(xi) ∀xi ∈ E (i = 1, . . . , p), λ ∈ ∆p.

3.3.6 (Quasiconvex functions) A function f : E→ R is called quasiconvex if the levelsets lev≤αf are convex for every α ∈ R. Show:

a) Every convex function is quasiconvex.

b) f : E→ R ∪ +∞ is quasiconvex if an only if

f(λx+ (1− λ)y) ≤ maxf(x), f(y) (x, y ∈ dom f, λ ∈ [0, 1]).

c) If f : E→ R ∪ +∞ is quasiconvex then argminf is a convex set.

3.4 Functional operations preserving convexity

Proposition 3.49 (Positive combinations of convex functions). For p ∈ N let fi : E →R ∪ +∞ be convex (and lsc) and αi ≥ 0 for i = 1, . . . , p. Then

p∑i=1

αifi

is convex (and lsc). If, in addtion, ∩pi=1dom fi 6= ∅, then f is also proper.

Proof. The convexity assertion is an immediate consequence of the characterization in(3.12). For the additional closedness see Exercise 2.12. The properness statement isobvious.

Note that the latter result tells us that Γ and Γ0 are convex cones.

Proposition 3.50 (Pointwise supremum of convex functions). For an arbitrary index setI let fi be convex (and lsc) for all i ∈ I. Then the function f = supi∈I fi, i.e.

f(x) = supi∈I

fi(x) (x ∈ E)

is convex (and lsc).

37

Proof. It holds that

epi f =

(x, α)

∣∣∣∣ supi∈I

fi(x) ≤ α

= (x, α) | ∀i ∈ I : fi(x) ≤ α =

⋂i∈I

epi fi.

Since the intersection of (closed) convex sets it (closed) convex, this gives the assertion.

Proposition 3.51 (Pre-composition with and affine mapping). Let H : E1 → E2 be affineand g : E2 → R ∪ +∞ (lsc and) convex. Then the function f := g H is (lsc and)convex.

Proof. Let x, y ∈ E1 and λ ∈ (0, 1). Then we have

f(λx+(1−λx)) = g(λH(x)+(1−λ)y) ≤ λg(H(x))+(1−λ)g(H(y)) = λf(x)+(1−λ)f(y),

which gives the convexity of f . The closedness of f , under the closedness of g, followsfrom the continuity (as a consequence of affineness) of H, cf. Exercise 2.13

Proposition 3.52 (Convexity under epi-composition). Let f ∈ Γ and L ∈ L(E,E′).Then the function Lf : E′ → R defined by

(Lf)(y) := inf f(x) | L(x) = y

is convex.

Proof. We first show that , with T : (x, α) 7→ (Lx, α), we have

epi <Lf = T (epi <f). (3.13)

To this end, recall that

epi <Lf = (y, α) | Lf(y) < α and epi <f = (x, α) | f(x) < α .

First, let (x, α) ∈ epi <f . Then T (x, α) = (L(x), α) and

(Lf)(L(x)) = infzf(z) | L(z) = L(x) ≤ f(x) < α,

thus, T (x, α) ∈ epi <Lf .In turn, if (y, α) ∈ epi <Lf , i.e. inf f(z) | L(z) = y < α, then L−1(y) 6= ∅, hence,

there exists x ∈ L−1(y) with f(x) < α. Thus, we have T (x, α) = (y, α) and (x, α) ∈ epi <f .This proves (3.13).

Now, as f is convex, epi <f is convex (see Exercise 3.3.3). But, since T is linear, from(3.13) it follows that also epi <Lf is convex, which proves the convexity of Lf .

Theorem 3.53 (Infimal projection). Let ψ : E1 × E2 → R ∪ +∞ be convex. Then theoptimal value function

p : E1 → R, p(x) := infy∈E2

ψ(x, y)

is convex. Moreover, the set-valued mapping

x 7→ argminy∈E2ψ(x, y) ⊂ E2.

is convex-valued.

38

Proof. It can easily be shown that epi <p = L(epi <ψ) under the linear mapping L :(x, y, α) 7→ (x, α). This immediately gives the convexity of p.

The remaining assertion follows immediately from the fact that the solution set ofa convex minimization problem is convex (Exercise), and y 7→ ψ(x, y) is convex for allx ∈ E1.

3.5 Conjugacy of convex functions

3.5.1 The closed convex hull of a function

We start this section with the fundamental result that every closed, proper, convex func-tion has an affine minorant.

Proposition 3.54 (Affine minorization principle). For f ∈ Γ0 there exists an affineminorant, i.e. there exist a ∈ E and β ∈ R such that

f(x) ≥ 〈a, x〉+ β (x ∈ E).

Proof. By assumption epi f is nonempty, closed and convex and there exists x ∈ dom f .Choosing γ < f(x), we have (x, γ) /∈ epi f . By the separation theorem (Theorem 3.16)there exists 0 6= (a, λ) ∈ E× R and β ∈ R such that

〈a, x〉+ λα ≥ β > 〈a, x〉+ λγ ((x, α) ∈ epi f). (3.14)

In particular, for (x, f(x)) ∈ epi f this implies

λ(f(x)− γ

)> 0.

Since f(x) > γ, this implies λ > 0. Dividing (3.14) by λ this gives

α ≥ −⟨aλ, x⟩

+β

λ((x, α) ∈ epi f).

Since (x, f(x)) ∈ epi f this yields

f(x) ≥ −⟨aλ, x⟩

+β

λ(x ∈ dom f).

Since this inequality holds trivially for x /∈ dom f , the function x 7→ −⟨aλ, x⟩

+ βλ

has thedesired properties.

We continue with the main result of this section which says that every closed, proper,convex function is the pointwise supremum of its affine minorants. It will be the mainworkhorse for this section and Section 3.5.2 on Fenchel conjugates.

Theorem 3.55 (Envelope representation in Γ0). Let f ∈ Γ0. Then f is the pointwisesupremum of all affine functions minorizing it, i.e.

f(x) = sup h(x) | h ≤ f, h affine .

39

Proof. The inequalityf(x) ≥ sup h(x) | h ≤ f, h affine

is clear.In order to establish the reverse inequality we are going to show that for all (x, γ) such

that f(x) > γ there exists an affine function g : E→ R g ≤ f and γ ≤ g(x). As γ < f(x)can be chosen arbitrarily close to f(x) this will yield the assertion:

To this end, let (x, γ) such that γ < f(x) be given. Then (x, γ) /∈ epi f , and byseparation there exists (a, λ) 6= (0, 0) and β ∈ R such that

〈a, x〉+ λα ≥ β > 〈a, x〉+ λγ ((x, α) ∈ epi f). (3.15)

Note that we do not necessarily have x ∈ dom f . However, from (3.15) we infer at leastλ ≥ 0 (clear?). Hence, we consider two cases:Case 1: λ > 0. Dividing (3.15) by λ > 0 yields⟨a

λ, x⟩

+ α ≥ β

λ⇐⇒ α ≥ −

⟨aλ, x⟩

+β

λ((x, α) ∈ epi f).

In particular, for (x, f(x)) ∈ epi f we infer that

f(x) ≥ g(x) := −⟨aλ, x⟩

+β

λ(x ∈ dom f).

For x /∈ dom f this inequality is satisfied anyway. Due to (3.15) we also have

g(x) = −(aλ

)Tx+

β

λ> γ,

so that g has the desired properties.

Case 2: λ = 0. In this case (3.15) reduces to

〈a, x〉 ≥ β > 〈a, x〉 ((x, α) ∈ epi f),

which is obviously equivalent to

〈a, x〉 ≥ β > 〈a, x〉 (x ∈ dom f). (3.16)

In particular, we have x /∈ dom f here, hence f(x) =∞. We now define

g : E→ R, g(x) := −〈a, x〉+ β.

Then from (3.16) we immediately infer that

g(x) ≤ 0 (x ∈ dom f) and g(x) > 0. (3.17)

Invoking Proposition 3.54 yields another affine mapping g : E→ R such that

g(x) ≤ f(x) (x ∈ E). (3.18)

40

By means of g and g we now define another affine function

g : E→ R, g(x) := g(x) + ρg(x).

where ρ > 0 will be specified shortly. From (3.17) and (3.18) we infer that

g(x) = g(x) + ρg(x) ≤ f(x) + ρ g(x)︸︷︷︸≤0

≤ f(x) (x ∈ dom f),

which, in particular, implies g ≤ f as well as

g(x) = g(x) + ρ g(x)︸︷︷︸>0

> γ

for all ρ > 0 sufficiently large. Hence g has the the desired properties in this case, whichconcludes the proof.

We will now establish the notion of the convex and closed convex hull of a function, andwe will see that for functions that are minorized by an affine function, the latter coincideswith the supremum over all its affine minorants.

For these purposes, let f : E→ R and observe that from our contruction of the lowersemicontinuous hull cl f of f in Section 2.3 it follows that

(cl f)(x) = sup h(x) | h ≤ f, h lsc ,

i.e. cl f is the largest lsc function that minorizes f . We take this and Theorem 3.55 as aguide to build of the (closed) convex hull of f .

Definition 3.56 (Convex hull of a function). Let f : E → R. Then the pointwisesupremum of all convex functions minorizing f , i.e.

(conv f)(x) := sup h(x) | h ≤ f, h convex

is called the convex hull of f . Moreover, we define the closed convex hull of f to be

(conv f)(x) := sup h(x) | h ≤ f, h lsc and convex .

Since both convexity and lower semicontinuity are preserved under pointwise suprema (cf.Proposition 3.50), we find that conv f is the largest convex function that minorizes f , andanalogously, conv f is the largest lsc and convex function that minorizes f . Note that wealways have

conv f ≤ conv f ≤ f

andconv f = cl (conv f).

Moreover, there is an epigraphical characterization of the closed convex hull for properfunctions that have an affine minorant, cf. Exercise 3.5.2. An analogous statement doesnot hold for the convex hull, see the discussion in [19].

41

We close this section with a result which tells us that for a proper function whichhas an affine minorant, its closed convex hull is the pointwise supremum of its affineminorants. Hence the set of functions over which the supremum in the definition of theclose convex hull is taken over can be substantially reduced.

Here, notice that since every affine function E→ R is convex and lsc, by the definitionof the respective hulls, the functions f , cl f , conv f and conv f have the same affineminorants (if any).

Corollary 3.57 (Affine envelope representation of conv ). Let f : E → R be proper andminorized by an affine function. Then conv f is the pointwise supremum of all affinefunctions minorizing f , i.e.

(conv f)(x) = sup h(x) | h ≤ f, h affine .

Proof. Recall that f and conv f have the same affine minorants. Since, by assumptionf has one, so does conv f , and thus f does not take the value −∞. Moreover, f is notconstantly +∞ and hence so neither is conv f ≤ f . Therefore, conv f is proper and hasan affine minorant hence, by Theorem 3.55, conv f is the pointwise supremum of its affineminorants. But, again, since these conincide with those of f , the result follows.

3.5.2 The Fenchel conjugate

We start with the central definition of this section.

Definition 3.58 (Fenchel conjugate). Let f : E → R. Then its Fenchel conjugate (orsimply conjugate) is the function f ∗ : E→ R defined by

f ∗(y) := supx∈E〈x, y〉 − f(x).

The function f ∗∗ := (f ∗)∗ is called the (Fenchel) biconjugate of f .

Note that, clearly, we can restrict the supremum in the above definition of the conjugateto the domain of the underlying function f , i.e.

f ∗(y) = supx∈dom f

〈x, y〉 − f(x).

Moreover, by definition, we always have

f(x) + f ∗(y) ≥ 〈x, y〉 (x, y ∈ E), (3.19)

which is known as the Fenchel-Young inequality.The mapping f 7→ f ∗ from the space of extended real-valued functions to itself is

called the Legendre-Fenchel transform.We always have

f ≤ g =⇒ f ∗ ≥ g∗,

42

i.e. the Legendre-Fenchel transform is order-reversing. Moreover, this implies that forany function f we have

f ≥ f ∗∗.

We will study the case when equality occurs in depth below.Before we start analyzing the conjugate function in-depth, we want to motivate why

we would be interested in studying it: Let f : E→ R. We notice that

epi f ∗ = (y, β) | 〈x, y〉 − f(x) ≤ β (x ∈ E) . (3.20)

This means that the conjugate of f is the function whose epigraph is the set of all (y, β)defining affine functions x 7→ 〈y, x〉 − β that minorize f . In view of Corollary 3.57, if fis proper and has an affine minorant , the pointwise supremum of these affine mappingsis the closed convex hull of f , i.e., through its epigraph, f ∗ encodes information about fand its closed convex hull.

Sincef ∗(y) = sup

x∈E〈x, y〉 − f(x) = sup

(x,α)∈epi f

〈y, x〉 − α (y ∈ E), (3.21)

we also haveepi f ∗ = (y, β) | 〈x, y〉 − α ≤ β ((x, α) ∈ epi f)

We use our recent findings to establish our main result on conjugates and biconjugatesknown in the literature as the Fenchel-Moreau theorem, crediting the founding fathers ofconvex analysis.

Theorem 3.59 (Fenchel-Moreau Theorem). Let f : E→ R be proper and have an affineminorant. Then the following hold:

a) f ∗ and f ∗∗ are closed, proper and convex;

b) f ∗∗ = conv f ;

c) f ∗ = (conv f)∗ = (cl f)∗ = (conv f)∗.

Proof. a) Applying Proposition 3.50 to (3.21), we see that f ∗ is lsc and convex. If f ∗

attained the value −∞, f would be constantly +∞, which is false. On the otherhand, f ∗ is not identically +∞, since that would imply that its epigraph epi f ∗

which encodes all minorizing affine mappings of f , were empty, which is false byassumption . Hence, f ∗ is proper.

Applying the same arguments to f ∗∗ = (f ∗)∗ (and observing that f ∗ is proper andhas an affine minorant) gives that f ∗∗ is closed, proper and convex, too.

b) Applying (3.21) to f ∗∗, for x ∈ E, we have

f ∗∗(x) = sup(y,β)∈epi f∗

〈y, x〉 − β.

Hence, in view of (3.20), f ∗∗ is the pointwise supremum of all affine minorants of f .Therefore, by Corollary 3.57, we see that f ∗∗ = conv f .

43

c) Since the affine minorants of f, conv f, cl f and conv f coincide, their conjugateshave the same epigraph and hence are equal.

f ≥ f ∗∗ for any function f : E → R that has an affine minorant, and it holds thatf ∗∗ = f if and only if f is closed and convex. Thus, the Legendre-Fenchel transforminduces a one-to-one correspondence on Γ0: For f, g ∈ Γ0, f is conjugate to g if and onlyif g is conjugate to f and we write f

∗←→ g in this case. This is called the conjugacycorrespondence. A property on one side is reflected by a dual property on the other as wewill see in the course of our study.

A list of some elementary conjugacy operations is given below

Proposition 3.60 (Elementary cases of conjugacy). Let f : E→ R. Then the followinghold:

a) (f − 〈a, ·〉)∗ = f ∗((·) + a) (a ∈ E);

b) (f + γ)∗ = f ∗ − γ (γ ∈ R);

c) (λf)∗ = λf ∗(

(·)λ

)(λ > 0).

Proof. Exercise 3.5.1.

3.5.3 Special cases of conjugacy

Convex quadratic functions For Q ∈ Sn, b ∈ Rn we consider the quadratic functionq : Rn → R defined by

q(x) :=1

2xTQx+ bTx. (3.22)

By the well known characterization of convexity in the smooth case, we know that q isconvex if and only if Q is (symmetric) positive semidefinite. Hence, for the remainder weassume that Q 0.

Proposition 3.61 (Conjugate of convex quadratic functions). For q from (3.22) withQ ∈ Sn+ we have

q∗(y) =

12(y − b)TQ†(y − b) if y − b ∈ rgeQ,

+∞ else.

In particular, if Q 0, we have

q∗(y) =1

2(y − b)TQ−1(y − b)

Proof. By definition, we have

q∗(y) = supx∈Rn

xTy − 1

2xTQx− bTx

= − inf

x∈E

1

2xTQx− (b− y)Tx

. (3.23)

44

The necessary and sufficient optimality conditions of x to be a minimizer of the convexfunction x 7→ 1

2xTQx− (b− y)Tx read

Qx = y − b. (3.24)

Hence, if y − b /∈ rgeQ, from Exercise 2.7, we know that inf f = −∞, hence q∗(y) = +∞in that case.

Otherwise, we have y− b ∈ rgeQ, hence, in view of Theorem A.4, (3.24) is equivalentto

x = Q†(y − b) + z, z ∈ kerA.

Inserting x = Q†(y − b) (we can choose z = 0) in (3.23) yields

q∗(y) = (Q†(y − b))Ty − 1

2(Q†(y − b))TQQ†(y − b)− bTQ†(y − b)

= (y − b)Q†(y − b)− 1

2(y − b)Q†QQ†(y − b)

=1

2(y − b)Q†(y − b),

where we make use of Theorem A.4 a) and c). Part d) of the latter result gives theremaining assertion.

We point out that, by the foregoing result, the function f = 12‖ · ‖2 is self-conjugated in

the sense that f ∗ = f . Exercise 3.5.5 shows that this is the only function on Rn thathas this property. Clearly, by an isomorphy argument, the same holds for the respectivefunction on an arbitrary Euclidean space.

Support functions

Definition 3.62 (Positive homogeneity, subadditivity, and sublinearity). Let f : E→ R.Then we call f with 0 ∈ dom f

i) positively homogeneous if

f(λx) = λf(x) (λ > 0, x ∈ E);

b) subadditive iff(x+ y) ≤ f(x) + f(y) (x, y ∈ E);

c) sublinear iff(λx+ µy) ≤ λf(x) + µf(y) (x, y ∈ E, λ, µ > 0).

Note that in the definition of positive homogeneity we could have also just demanded aninequality, since f(λx) ≤ λf(x) for all λ > 0 implies that

f(x) = f(λ−1λx) ≤ 1

λf(λx).

We note that norms are sublinear.

45

Example 3.63. Every norm ‖ · ‖ is sublinear.

We next proivide a usful list of characerizations of positive homogeneneity and sublinear-ity, respectively.

Proposition 3.64. (Positive homogeneity, sublinearity and subadditivity) Letf : E→ R. Then the following hold:

a) f is positively homogeneous if and only if epi f is a cone. In this case f(0) ∈0,−∞.

b) If f is lsc and positively homogeneous with f(0) = 0 it must be proper.

c) The following are equivalent:

i) f is sublinear;

ii) f is positively homogeneous and convex;

iii) f is positively homogeneous and subadditive;

iv) epi f is a convex cone.

Proof. Exercise 3.5.9

We continue with the prototype of a sublinear functions, so-called support functions, whichwill from now on occur ubiquitiously.

Definition 3.65 (Support functions). Let C ∈ E nonempty. The support function of Cis defined by

σC : x ∈ E 7→ sups∈C〈s, x〉 .

We start our investigation of support functions with a list of elementary properties.

Proposition 3.66 (Support functions). Let C ⊂ E be nonempty. Then

a) σC = σclC = σconvC = σconvC.

b) σC is proper, lsc and sublinear.

c) δ∗C = σC and σ∗C = δconvC.

d) If C is closed and convex then σC∗←→ δC.

Proof. a) Obviously, closures do not make a difference. On the other hand, we have⟨N+1∑i=1

λisi, x

⟩=

N+1∑i=1

λi 〈si, x〉 ≤ maxi=1,...,r

〈si, x〉

for all si ∈ C, λ ∈ ∆N+1, which shows that convex hulls also do not change anything.

46

b) By Proposition 3.50 σC is lsc and convex, and as 0 ∈ domσC and since λσC(x) =σC(λx) for all x ∈ E and λ > 0 this shows properness and positive homogeneity,which gives the assertion in view of Proposition 3.64 c).

c) Clearly, δ∗C = σC . Hence, σ∗C = δ∗∗C = conv δC = δconvC , since

conv (epi δC) = conv (C × R+) = convC × R+ = epi (δconvC).

d) Follows immediately from c).

One of our main goals in this paragraph is to show that, in fact, part b) of Propostion3.66 can be reversed in the sense that every proper, lsc and sublinear function is a supportfunction. As a preparation we need the following result.

Proposition 3.67. Let f : E → R be closed, proper and convex. Then the following areequivalent:

i) f only takes the values 0 and +∞;

ii) f ∗ is positively homogeneous (i.e. sublinear, since convex).

Proof. ’i)⇒ii):’ In this case f = δC for some closed convex set C ⊂ E. Hence, f ∗ = σC ,which is sublinear, cf. Proposition 3.66.

In turn, let f ∗ be positively homogeneous (hence sublinear). Then, for λ > 0 andy ∈ E, we have

f ∗(y) = λf ∗(λ−1y)

= λ supx∈E

⟨x, λ−1y

⟩− f(x)

= sup

x∈E〈x, y〉 − λf(x)

= (λf)∗(y).

Thus, (λf)∗ = f ∗ for all λ > 0 and hence, by the Fenchel-Moreau Theorem, we have

λf = (λf)∗∗ = f ∗∗ = f (λ > 0).

But as f is proper, hence does not takte the value −∞, this immediately implies that fonly takes the values +∞ and 0.

Theorem 3.68 (Hormander’s Theorem). A function f : E → R is proper, lsc and sub-linear if and only if it is a support function.

Proof. By Proposition 3.66 b), every support function is proper, lsc and sublinear.In turn, if f is proper, lsc and sublinear (hence f = f ∗∗), by Proposition 3.67, f ∗ is the

indicator of some set C ⊂ E, which necessary needs to be nonempty, closed and convex,as f ∗ ∈ Γ0. Hence, f ∗∗ = δ∗C = σC .

47

We now want to give a slight refinement of Hormander’s Theorem, in that we describethe set that a proper, lsc sublinear function supports.

Corollary 3.69. Let f : E→ R be proper and sublinear. Then cl f is the support functionof the closed convex set

s ∈ E | 〈s, x〉 ≤ f(x) (x ∈ E) .

Proof. Since cl f is proper (cf. Exercise 3.3.4) closed and sublinear it is a support functionof a closed convex set C. Therefore, we have cl f = δ∗C and thus f ∗ = (cl f)∗ = δC . Hence,C = s ∈ E | f ∗(s) ≤ 0. But f ∗(s) ≤ 0 if and only 〈s, x〉 − f(x) ≤ 0 for all x ∈ E.

3.5.4 Gauges, polar sets and dual norms

We now present a class of functions that makes a connection between support functionsand norms.

Definition 3.70 (Gauge function). Let C ⊂ E. The gauge (function) of C is defined by

γC : x ∈ E 7→ inf λ ≥ 0 | x ∈ λC .

For a closed convex set that contains the origin, its gauge has very desirable convex-analytical properties.

Proposition 3.71. Let C ⊂ E be nonempty, closed and convex with 0 ∈ C. Then γC isproper, lsc and sublinear.

Proof. γC is obviously proper as γC(0) = 0. Moreover, for t > 0 and x ∈ E, we have

γC(tx) = inf λ ≥ 0 | tx ∈ λC

= inf

λ ≥ 0

∣∣∣∣ x ∈ λt C

= inf tµ ≥ 0 | x ∈ µC = t inf µ ≥ 0 | x ∈ µC = tγC(x),

i.e. γC is positively homogeneous (since also 0 ∈ dom γC). We now show that it is alsosubadditive, hence altogether, sublinear: To this end, take x, y ∈ dom γC (otherwise thereis nothing to prove). Due to the identity

x+ y

λ+ µ=

λ

λ+ µ

x

λ+

µ

λ+ µ

y

µ(λ+ µ 6= 0),

we realize, by convexity of C, that x+y ∈ (λ+µ)C if x ∈ λC and y ∈ µC for all λ, µ ≥ 0.This implies that γC(x+ y) ≤ γC(x) + γC(y).

In order to prove lower semicontinuity of γC notice that (by Exercise 3.5.12 and positivehomogeneity) we have lev≤αγC = αC for α > 0, lev≤αγC = ∅ for α < 0 and lev≤0γC = C∞

(again by Exercise 3.5.12), hence all level sets of γC are closed, i.e. γC is lsc.This concludes the proof.

48

Note that in the proof of Proposition 3.71, we do not need the assumption that C containsthe origin to prove sublinearity. We do need it, though, to get lower semicontinuity, cf.Exercise 3.5.12

Since the gauge of a closed convex set that contains 0 is proper, lsc and sublinearwe know, in view of Hormander’s Theorem (see Theorem 3.68), that it is the supportfunction of some closed convex set. It can be described beautifully using the concept ofpolar sets which generalizes the notion of polar cones, cf. Definition 3.34.

Definition 3.72 (Polar sets). Let C ⊂ E. Then its polar set is defined by

C := v ∈ E | 〈v, x〉 ≤ 1 (x ∈ C) .

Moreover, we put C := (C) and call it the bipolar set of C.

Note that there is no ambiguity in notation, since the polar cone and the polar set of acone coincide, see Exercise 3.5.11 Moreover, as an intersection of closed half-spaces, C isa closed, convex set containing 0. In addition, like we would expect, we have

C ⊂ D ⇒ D ⊂ C,

andC ⊂ C.

Before we continue to pursue our question for the support function representation ofgauges, we provide the famous bipolar theorem. Its proof is based on separation.

Theorem 3.73 (Bipolar Theorem). Let C ⊂ E. Then C = conv (C ∪ 0).

Proof. Since C∪0 ⊂ C and C is closed and convex, we clearly have conv (C∪0) ⊂C. Now assume there were x ∈ C \ conv (C ∪ 0). By strong separation, there existss ∈ E \ 0 such that

〈s, x〉 > σconv (C∪0)(s) ≥ maxσC(s), 0.

After rescaling s accordingly (cf. Remark 3.17) we can assume that

〈s, x〉 > 1 ≥ σC(s),

in particular, s ∈ C. On the other hand 〈s, x〉 > 1 and x ∈ C, which is a contradiction.

As a consequence of the bipolar theorem we see that every closed convex set C ⊂ Econtaining 0 satisfies C = C. Hence, the mapping C 7→ C establishes a one-to-onecorrespondence on the closed convex sets that contain the origin. This is connected toconjugacy through gauge functions as is highlighted by the next result.

Proposition 3.74. Let C ⊂ E be closed and convex with 0 ∈ C. Then

γC = σC∗←→ δC and γC = σC

∗←→ δC .

49

Proof. Since, by Proposition 3.71, γC is proper, lsc and sublinear we have

γC = σD, D = v ∈ E | 〈v, x〉 ≤ γC(x) (x ∈ E)

in view of Corollary 3.69. To prove that γC = σC , we need to show that D = C. SinceγC(x) ≤ 1 if (and only if; see Exercise 3.5.12) x ∈ C, the inclusion D ⊂ C is clear. Inturn, let v ∈ C, i.e. 〈v, x〉 ≤ 1 for all x ∈ C. Now let x ∈ E. By the definition of γC ,there exists λk → γC(x) and ck ∈ C such that x = λkck for all k ∈ N. But then

〈v, x〉 = λk 〈v, ck〉 ≤ λk → γC(x),

hence v ∈ D, which proves γC = σC . Since C = C, this implies γC = σC . Theconjugacy relations are due to Proposition 3.66.

Exercise 3.5.12 tells us that the gauge of a symmetric, compact convex set with nonemptyinterior is a norm. This justifies the following definition.

Definition 3.75 (Dual norm). Let ‖ · ‖∗ be a norm on E with closed unit ball B∗. Thenwe call

‖ · ‖∗ := γB∗

its dual norm.

Corollary 3.76 (Dual norms). For any norm ‖·‖∗ with (closed) unit ball B its dual normis σB, the support of its unit ball. In particular, we have ‖ · ‖ = ‖ · ‖, i.e. the Euclideannorm is self-dual.


3.5.1 (Elementary conjugacy operations) Let f : E→ R. Prove the following:

a) (f − 〈a, ·〉)∗ = f ∗((·) + a) (a ∈ E);

b) (f + γ)∗ = f ∗ − γ (γ ∈ R);

c) (λf)∗ = λf ∗(

(·)λ

)(λ > 0).

3.5.2 (Closed convex hulls) Let C ⊂ E be nonempty. Show the following:

a) convC = cl (convC).

b) Let f : E → R be proper and have an affine minorant. Then epi (conv f) =conv (epi f).

3.5.3 (Convex quadratic functions) Let A ∈ Sn and define f : Rn → R by

f(x) =1

2xTAx.

a) Show that f is convex if and only if A ∈ Sn+.

b) For A ∈ Sn++ compute f ∗ and f ∗∗.

50

3.5.4 (Conjugate of separable sum) For fi : Ei → R (i = 1, 2) the separable sum off1 and f2 is defined by

f1 ⊕ f2 : (x1, x2) ∈ E1 × E2 7→ f1(x1) + f2(x2).

Show that (f1 ⊕ f2)∗ = f ∗1 ⊕ f ∗2 .

3.5.5 (Self-conjugacy) Show that 12‖ · ‖2 is the only function f : E→ R with f ∗ = f .

3.5.6 (Projections and conjugate functions) Let φ : E → R proper with and affineminorant, let V ⊂ E be a subspace containing aff (domφ), and set U := V ⊥. Showthat for any z ∈ aff (domφ) and s ∈ E we have

φ∗(s) = 〈PU(s), z〉+ φ∗(PV (s)).

3.5.7 (Conjugate of max-function) Let f : Rn → R be given by f(x) = maxi=1,...,n xi.Show that f ∗ = δ∆n where ∆n :=

λ ∈ Rn

+ |∑n

i=1 λi = 1.

3.5.8 (Conjugate of negative logdet) Compute f ∗ for

f : X ∈ Sn 7→− log(detX) if X 0,

+∞ else.

3.5.9 (Positive homogeneity, sublinearity and subadditivity) Let f : E→ R. Showthe following:

a) f is positively homogeneous if and only if epi f is a cone. In this case f(0) ∈0,−∞.

b) If f is lsc and positively homogeneous with f(0) = 0 it must be proper.


i) f is sublinear;

ii) f is positively homogeneous and convex;

iii) f is positively homogeneous and subadditive;

iv) epi f is a convex cone.

3.5.10 (Finiteness of support functions) Let S ⊂ E be nonempty. Then σS is finite ifand only if S is bounded.

3.5.11 (Polar sets) Show the following:

a) If C ∈ E is a cone, we have v | 〈v, x〉 ≤ 0 (x ∈ C) = v | 〈v, x〉 ≤ 1 (x ∈ C).b) C ⊂ E is bounded if and only if 0 ∈ intC.

c) For any closed half-space H containing 0 we have H = H.

3.5.12 (Gauge functions) Let C ⊂ E be nonempty, closed and convex with 0 ∈ C. Prove:

51

a) C = lev≤1γC , C∞ = γ−1C (0), dom γC = R+C

b) The following are equivalent:

i) γC is a norm (with C as its unit ball);

ii) C is bounded, symmetric (C = −C) with nonempty interior.

3.5.13 (Cone polarity and conjugacy) Let K ⊂ E be a convex cone. Then δK∗←→

δK .

3.5.14 (Directional derivative of a convex function) Let f ∈ Γ, x ∈ dom f and d ∈ E.Show that the following hold:

a) The difference quotient

t > 0 7→ q(t) :=f(x+ td)− f(x)

t

is nondecreasing.

b) f ′(x; d) exists (in R) with

f ′(x; d) = inft>0

q(t),

c) f ′(x; ·) is sublinear with dom f ′(x; ·) = R+(dom f − x).

d) f ′(x; ·) is proper and lsc for x ∈ ri (dom f).

3.6 Horizon functions

There is a functional correspondence of the horizon/recession calculus for (convex) coneswhich is obtained through the epigraphical perspective.

Definition 3.77 (Horizon functions). For any function f : E→ R the associated horizonfunction f∞ : E→ R is the function determined through

epi f∞ = (epi f)∞ (f 6≡ +∞) and f∞ = δ0 (f ≡ +∞).

As the horizon cone of a set is, by Lemma 3.38, always a closed cone, we find that thehorizon function of is always lsc and positively homogeneous. If f is convex, then, byProposition 3.41, f∞ is, in even sublinear (and lsc), cf. Proposition 3.64.

We have the following analytic description of the horizon function of a closed, proper,convex functions.

Proposition 3.78. Let f ∈ Γ. Then f∞ is lsc and sublinear. If f is also lsc (i.e. f ∈ Γ0)we have

f∞(w) = limt↓0

f(x+ tw)− f(x)

t(x ∈ dom f).

52

Proof. By Proposition 3.41 we have

(w, β) ∈ epi f∞ ⇐⇒ (w, β) ∈ (epi f)∞

⇐⇒ ∀(x, α) ∈ epi f, t > 0 : (x, α) + t(w, β) ∈ epi f

⇐⇒ ∀x ∈ dom f, t > 0 :f(x+ tw)− f(x)

t≤ β

⇐⇒ ∀x ∈ dom f : supt>0

f(x+ tw)− f(x)

t≤ β.

The fact that the supremum is a limit is due to Exercise 3.5.14 a).

Proposition 3.79. Let f ∈ Γ0. Then for all α ∈ R such that lev≤αf 6= ∅ we have

x | f(x) ≤ α∞ = x | f∞(x) ≤ 0 . (3.25)

Proof. This follows from Proposition 3.41 and Proposition 3.78.

Proposition 3.79 motivates the following definition of the horizon cone of a closed, proper,convex function.

Definition 3.80 (Horizon cone of closed proper convex function). Let f ∈ Γ0. Then wedefine the horizon cone of f by hzn f := lev≤0f

∞.

We have the following duality correspondences.

Theorem 3.81. Let f ∈ Γ. Then (cone (dom f)) = hzn f ∗. Dually, if f is also lsc (i.e.f ∈ Γ0), then (hzn f) = cone (dom f).

Proof. We first note that f ∗ ∈ Γ0 by Theorem 3.59, hence hzn f ∗ is well-defined and, byPropositon 3.79, for any α > inf f ∗, equals the horizon cone of

lev≤αf∗ = y | f ∗(y) ≤ α

= y | 〈y, x〉 − f(x) ≤ α (x ∈ dom f)= y | 〈y, x〉 ≤ f(x) + α (x ∈ dom f) .

From the latter representation we find that

w ∈ hzn f ∗ ⇐⇒ ∀y ∈ lev≤αf∗, λ ≥ 0 : y + λw ∈ lev≤αf

∗

⇐⇒ ∀y ∈ lev≤αf∗, λ ≥ 0, x ∈ dom f : λ 〈x, w〉+ 〈y, x〉 ≤ f(x) + α

⇐⇒ ∀x ∈ dom f : 〈w, x〉 ≤ 0.

Thereforehzn f ∗ = w | 〈x, w〉 ≤ 0 (x ∈ dom f) ,

which proves the first statement, see e.g. Exercise 3.2.6. The second statement follows asf = f ∗∗ when f ∈ Γ0, see Theorem 3.59.

53

3.7 The convex subdifferential

For motivational purposes we consider the following well known result for characterizingconvexity for smooth functions which is a consequence of the mean-value theorem.

Proposition 3.82. Let C ⊂ E be open and convex and let f : E → R be continuouslydifferentiable on C. Then f is convex on C if and only if

f(x) ≥ f(x) + 〈∇f(x), x− x〉 (x, x ∈ C). (3.26)

In view of the latter result the following notion of a surrogate for the derivative for anarbitrary convex function appears natural.

Definition 3.83 (The convex subdifferential). Let f : E→ R and x ∈ E. Then g ∈ E iscalled a subgradient of f at x if

f(x) ≥ f(x) + 〈g, x− x〉 (x ∈ E). (3.27)

The set∂f(x) := v ∈ E | f(x) ≥ f(x) + 〈v, x− x〉 (x ∈ E)

of all subgradients is called the (convex) subdifferential of f at x.

Note that we did not restrict ourselves to convex functions in the above definition, butwe point out that its the class Γ where the subdifferential is most meaningful.

Moreover, note that, clearly, in (3.27) (which is called the subgradient inequality), wecan restrict ourselves to points x ∈ dom f , since the inequality holds trivially outside ofdom f .

We point out that the subdifferential ∂f(x) of f : E → R at x ∈ E contains exactlythe slopes of all affine minorants of f that coincide with f at x, see Figure 9.

gph f

Figure 9: Affine minorants at a point of nondifferentiability

The subdifferential of a convex function might well be empty, contain only a single point,be bounded (hence compact as it is always closed, as we wil see shortly) or unbounded asthe following examples illustrate.

Example 3.84.

54

a) (Subdifferential of indicator function) Let C ⊂ E be convex and x ∈ C. Then

g ∈ ∂δC(x) ⇐⇒ δC(x) ≥ δC(x) + 〈g, x− x〉 (x ∈ E)

⇐⇒ 0 ≥ 〈g, x− x〉 (x ∈ C),

i.e. ∂δC(x) = v ∈ E | 〈v, x− x〉 ≤ 0 (x ∈ C). The latter set is also called thenormal cone of C at x, and is denoted by NC(x). It plays a central role in thederivation of optimality conditions of convex optimization problems.

b) (Subdifferential of Euclidean norm) We have

∂‖ · ‖(x) =

x‖x‖ if x 6= 0,

clB if x = 0

as can be verified by elementary considerations, see Exercise 3.7.1.

c) (Empty subdifferential) Consider

f : x ∈ R 7→−√x if x ≥ 0,

+∞ else.

Then ∂f(x) = ∅ for all x ≤ 0, see Exercise 3.7.2.

We start our conceptual study of the subdifferential with some elementary properties.

Proposition 3.85 (Elementary properties of the subdifferential). Let f : E → R. Thenthe following hold:

a) ∂f(x) is closed and convex for all x ∈ dom f .

b) If f is proper then ∂f(x) = ∅ for x /∈ dom f .

c) We have 0 ∈ ∂f(x) if and only if x ∈ argminEf (Generalized Fermat’s rule)

d) If f is convex then ∂f(x) = v ∈ E | (v,−1) ∈ Nepi f (x, f(x)) (x ∈ dom f).

Proof. a) We have

∂f(x) =⋂x∈E

v | 〈x− x, v〉 ≤ f(x)− f(x) ,

and intersection preserves closedness and convexity.

b) Obvious.

c) By definition we have

0 ∈ ∂f(x) ⇐⇒ f(x) ≥ f(x) (x ∈ E).

55

d) Notice that

v ∈ ∂f(x) ⇐⇒ f(x) ≥ f(x) + 〈v, x− x〉 (x ∈ dom f)

⇐⇒ α ≥ f(x) + 〈v, x− x〉 ((x, α) ∈ epi f))

⇐⇒ 0 ≥ 〈(v,−1), (x− x, α− f(x))〉 ((x, α) ∈ epi f))

⇐⇒ (v,−1) ∈ Nepi f (x, f(x)).

There is a tight connection between subdifferentiation and conjugation of convex func-tions.

Theorem 3.86 (Subdifferential and conjugate function). Let f : E → R. Then thefollowing are equivalent:

i) y ∈ ∂f(x);

ii) x ∈ argmaxz〈z, y〉 − f(z);

iii) f(x) + f ∗(y) = 〈x, y〉;

If f ∈ Γ0 these are also equivalent to:

iv) x ∈ ∂f ∗(y);

v) y ∈ argmaxw〈x, w〉 − f ∗(w).

Proof. Notice that

y ∈ ∂f(x) ⇐⇒ f(z) ≥ f(x) + 〈y, z − x〉 (z ∈ E)

⇐⇒ 〈y, x〉 − f(x) ≥ supz〈y, z〉 − f(z)

⇐⇒ f(x) + f ∗(y) ≤ 〈x, y〉⇐⇒ f(x) + f ∗(y) = 〈x, y〉 ,

where the last equality makes use of the Fenchel-Young inequality (3.19). This establishesthe equivalences between i), ii) and iii). Applying the same reasoning to f ∗ and noticingthat f ∗∗ = f if f ∈ Γ0, gives the missing equivalences.

One consequence of Theorem 3.86 is that the set-valued mappings ∂f and ∂f ∗ are in-verse to each other. We notice some other interesting implications of the latter theoremcombined with Proposition 3.66.

Corollary 3.87. Let C ⊂ E. Then the following hold:

a) For x ∈ domσC, we have ∂σC(x) = v ∈ C | x ∈ NconvC(v).

b) If C is a closed, convex cone the following are equivalent:

i) y ∈ ∂δC(x);

ii) x ∈ ∂δC(y);

iii) x ∈ C, y ∈ C and 〈x, y〉 = 0.

56

Smoothness properties of convex functions Although we are not primarily inter-ested in studying the smoothness properties of convex functions we still want to mentionthe following important facts which can be gathered from the extensive study in [24, §25],and which will be sporadically used later on.

Theorem 3.88 (Differentiability of convex functions). Let f ∈ Γ and let x ∈ int (dom f).Then the following are equivalent:

i) ∂f(x) is a singleton;

ii) f is differentiable at x;

iii) f is continuously differentiable at x.

In either case, ∂f(x) = ∇f(x).


3.7.1 (Subdifferential of Euclidean norm) Show that

∂‖ · ‖(x) =

x‖x‖ if x 6= 0,

clB if x = 0.

3.7.2 (Empty subdifferential) Consider

f : x ∈ R 7→−√x if x ≥ 0,

+∞ else.

Show that ∂f(x) = ∅ for all x ≤ 0.

3.8 Infimal convolution and the Attouch-Brezis Theorem

In this section we study a special instance of infimal projection in the spirit of Theorem3.53 in the following form.

Definition 3.89 (Infimal convolution). Let f, g : E→ R ∪ +∞. Then the function

f#g : E→ R, (f#g)(x) := infu∈Ef(u) + g(x− u)

is called the infimal convolution of f and g. We call the infimal convolution f#g exactat x ∈ E if

argminu∈Ef(u) + g(x− u) 6= ∅.We simply call f#g exact if it is exact at every x ∈ dom f#g.

Observe that we have the representation

(f#g)(x) = infu1,u2:u1+u2=x

f(u1) + g(u2). (3.28)

This has some obvious, yet useful consequences.

57

Lemma 3.90. Let f, g : E→ R ∪ +∞. Then the following hold:

a) dom f#g = dom f + dom g;

b) f#g = g#f .

Moreover, observe the trivial inequality

(f#g)(x) ≤ f(u) + g(x− u) (u ∈ E). (3.29)

Infimal convolution preserves convexity, as can be seen in the next result.

Proposition 3.91 (Infimal convolution of convex functions). Let f, g : E → R ∪ +∞be convex. Then f#g is convex.

Proof. Defining

ψ : E× E→ R ∪ +∞, ψ(x, y) := f(y) + g(x− y),

we see that ψ is convex (jointly in (x, y)) as a sum of the convex functions (x, y) 7→ f(y)and (x, y) 7→ g(x − y), the latter being convex by Proposition 3.51. By definition of theinfimal convolution, we have

(f#g)(x) = infy∈E

ψ(x, y),

hence, Theorem 3.53 yields the assertion.

We continue with an important class of functions that can be constructed using infimalconvolution, and that is intimately tied to projection mappings.

Example 3.92 (Distance functions). Let C ⊂ E. Then the function dC := δC#‖ · ‖ iscalled the distance (function) to the set C. It holds that

dC(x) = infu∈C‖x− u‖.

Hence, it is clear that, if C ⊂ E is (nonempty) closed and convex, we have

dC(x) = ‖x− PC(x)‖.

Moreover, Proposition 3.91 tells us that the distance function of a closed set is convex.

Proposition 3.93 (Conjugacy of inf-convolution). Let f, g : E → R ∪ +∞. Then thefollowing hold:

a) (f#g)∗ = f ∗ + g∗;

b) If f, g ∈ Γ0 such that dom f ∩ dom g 6= ∅ then (f + g)∗ = cl (f ∗#g∗).

58

Proof. a) By definition, for all y ∈ E, we have

(f#g)∗(y) = supx

〈x, y〉 − inf

uf(u) + g(x− u)

= sup

x,u〈x, y〉 − f(u)− g(x− u)

= supx,u(〈u, y〉 − f(u)) + (〈x− u, y〉 − g(x− u))

= f ∗(y) + g∗(y).

b) From a) and the fact that f, g are closed, proper convex, we have

(f ∗#g∗)∗ = f ∗∗ + g∗∗ = f + g,

which is proper, as dom f meets dom g, closed and convex. Thus,

conv (f ∗#g∗) = (f ∗#g∗)∗∗ = (f + g)∗.

By Proposition 3.91 the convex hull on the left can be omitted, hence cl (f ∗#g∗) =(f + g)∗.

The closure operation in Proposition 3.93 can be omitted under the qualification condition

ri (dom f) ∩ ri (dom g) 6= 0. (3.30)

This in fact is the statement of the following prominent theorem which in the infinitedimensional setting is attributed to Attouch and Brezis [1], and we reflect that in callingit the Attouch-Brezis Theorem even though in the finite dimensional setting it was wellestablished before.

Theorem 3.94 (Attouch-Brezis). Let f, g ∈ Γ0 such that (3.30) holds. Then (f + g)∗ =f ∗#g∗, and the infimal convolution is exact, i.e. the infimum in the infimal convolutionis attained on dom f ∗#g∗.

Proof. By Proposition 2.5 it suffices to show that f ∗#g∗ has closed level to sets. Tothis end let r ∈ R and zk ∈ lev≤rf

∗#g∗ → z. Given δk ↓ 0 there hence exists(xk, yk) ∈ E× E such that

f ∗(xk) + g∗(yk) ≤ r + δk and L∗(xk, yk) = zk (k ∈ N),

where L : x ∈ E 7→ (x, x) ∈ E×E. Now let (pk, qk) be the orthogonal projection of (xk, yk)onto the linear space V := span (dom f ×dom g)− rgeL. Then by Exercise 3.5.6 (appliedto φ = f ⊕ g, see Exercise 3.5.4), we have f ∗(pk) + g∗(qk) = f ∗(xk) + g∗(yk). Moreover,V ⊥ = span (dom f × dom g)⊥ ∩ kerL∗ ⊂ kerL∗, hence L∗(pk, qk) = zk for all k ∈ N. Allin all, we find that

f ∗(pk) + g∗(qk) ≤ r + δk and pk + qk = zk (k ∈ N). (3.31)

59

If (pk, qk) is bounded we can assume w.l.o.g. that (pk, qk) → (p, q). Passing to thelimit in (3.31) hence yields

f ∗(p) + g∗(q) ≤ lim infk→∞

f ∗(pk) + g∗(qk) ≤ r and p+ q = z.

Therefore z ∈ lev≤rf∗#g∗, which shows that the latter is closed, and hence f ∗#g∗ is lsc.

Moreover with r := (f ∗#g∗)(z) it holds that

f ∗(p) + g∗(z − p) = infp∈Ef ∗(p) + g∗(z − p) = (f ∗#g∗)(z),

which gives exactness.We now show that (pk, qk) is, in fact, bounded: By our assumption that ri (dom f)∩

ri (dom g) 6= ∅ and that L : x 7→ (x, x) we have that (0, 0) ∈ ri (dom f × dom g)− rgeL =ri (dom f × dom g − rgeL). Hence V = aff (dom f × dom g − rgeL) and so there existsε > 0 such that Bε(0, 0)∩V ⊂ dom f ×dom g− rgeL. Hence for any (r, s) ∈ Bε(0, 0)∩Vthere exist (u, v) ∈ dom f × dom g and x ∈ E such that (r, s) = (u− x, v− x). Therefore,for all k ∈ N, we have

〈(pk, qk), (r, s)〉 = 〈(pk, qk), (u− x, v − x)〉= 〈(pk, qk), (u, v)〉 − 〈(pk, qk), (x, x)〉≤ f ∗(pk) + g∗(qk) + f(u) + g(v)− 〈pk + qk, x〉≤ r + δk + f(u) + g(v) + 〈zk, x〉 ,

where the first inequality is simply the due to the Fenchel-Young inequality (3.19), andthe second one follows from (3.31). Since δk ↓ 0, (u, v) ∈ dom f × dom g and zkconverges, and is hence bounded, the last expression is bounded from above. Therefore

supk∈N〈(pk, qk), (r, s)〉 < +∞ ((r, s) ∈ V ).

by scaling (r, s) accordingly. This readily implies that (pk, qk) ∈ V is bounded.

As a consequence we can study the conjugacy of a composition g L where g is closed(proper, convex). We also bring to mind Proposition 3.52 in this context.

Proposition 3.95. Let g : E → R be proper and L ∈ L(E,E′) and T ∈ L(E′,E). Thenthe following hold:

a) (Lg)∗ = g∗ L∗.

b) (g T )∗ = cl (T ∗g∗) if g ∈ Γ.

c) The closure in b) can be dropped and the infimum is attained when finite if g ∈ Γ0

andri (rgeT ) ∩ ri (dom g) 6= ∅. (3.32)

60

Proof. a) For y ∈ E′ we have

(Lg)∗(y) = supz∈E′

〈z, y〉 − inf

x: L(x)=zg(x)

= sup

z∈E′, x∈L−1(z)〈z, y〉 − g(x)

= supx∈E〈x, L∗(y)〉 − g(x)

= g∗(L∗(y)).

b) Follows from a) and the Fenchel-Moreau Theorem.

c) With φ : (x, y) ∈ E× E′ 7→ g(y) we have

(g T )∗(z) = supx∈E〈z, x〉 − g(T (x))

= sup(x,y)∈gphT

〈z, x〉 − g(y)

= sup(x,y)∈E×E′

〈(z, 0), (x, y)〉 − (δgphT + φ)(x, y)

= (δgphT + φ)∗(z, 0).

We now observe that

ri (δgphT ) ∩ ri (domφ) = ri (gphT ) ∩ E× ri (dom g).

Therefore, (3.30) for δgphT and φ is satisfied if (and only if) (3.32) holds. We can henceinvoke Attouch-Brezis to find that

(g T )∗(z) = (σgphT#φ∗)(z, 0)

= inf(u,v)σgphT (u, v) + δ0(z − u) + g∗(−v)

= infvδ0(T

∗(v) + z) + g∗(−v)

= (T ∗g∗)(z),

where the infimum is a minimum when finite.

3.9 Subdifferential calculus and Fenchel-Rockafellar duality

In this section we develop a calculus for the convex subdifferential. Our primary goal is toestablish a subdifferential rule for convex functions of the form f+gL for f ∈ Γ(E1), g ∈Γ(E2) and L ∈ L(E1,E2). There are various different ways to the subdifferential of thatkind of convex function. We choose the path via Fenchel-Rockafellar duality which wecan prove using Attouch-Brezis.

The central qualification condition is

0 ∈ ri (dom g − L(dom f)) (3.33)

or equivalentlyri (L−1dom g) ∩ ri (dom f) 6= ∅

which is simply (3.30) applied to f and g L.

61

Theorem 3.96 (Fenchel-Rockafellar duality). Let f : E1 → R ∪ +∞, g : E2 →R ∪ +∞ and L ∈ L(E1,E2). Then the following hold:

a) (Weak duality) We have

p := infx∈E1

f(x) + g(L(x)) ≥ supy∈E2

−f ∗(L∗(y))− g∗(−y) =: d. (3.34)

b) (Strong duality) If f ∈ Γ0(E1) and g ∈ Γ0(E2) (3.33) holds then equality holds in(3.34) and the supremum is attained if finite.

c) (Primal-dual recovery) If f ∈ Γ0(E1) and g ∈ Γ0(E2) the following are equivalentfor x ∈ E1 and y ∈ E2:

i) p = d, x ∈ argminf(x) + g(L(x)), y ∈ argmax− f ∗(L∗(y))− g∗(−y);

ii) L∗(y) ∈ ∂f(x), −y ∈ ∂g(L(x));

iii) x ∈ ∂f ∗(L∗(y)), L(x) ∈ ∂g∗(−y).

Proof. a) Follows immediately from the Fenchel-Young inequality.

b) By Attouch-Brezis (recall the remark about the qualification conditions above) andProposition 3.95 c) we find that

infx∈E1

f(x) + g(L(x)) = (f + g L)∗(0)

= infz∈E1

f ∗(z) + (L∗g∗)(−z)

= infy∈E2

f ∗(L(y)) + g∗(−y)

where the second (and hence third) infimum is attained if finite. That shows the desiredstatement.

c) ’i)⇔ii)’: Observe that

ii) ⇐⇒ f(x) + f ∗(L∗(y)) = 〈x, L∗(y)〉 , g(L(x)) + g∗(−y) = 〈L(x), −y〉⇐⇒ f(x) + f ∗(L∗(y)) + g(L(x)) + g∗(−y) = 0

⇐⇒ f(x) + g(L(x)) = −f ∗(L∗(y))− g∗(−y)

⇐⇒ i).

Here the first equivalence is due to Theorem 3.86. In the second equivalence the ’⇐’-direction uses the Fenchel-Young inequality. The third one is obvious, while the lastequivalence uses a).

’ii)⇔iii)’: Follows immediately from Theorem 3.86.

There is an abundance of applications of Fenchel-Rockafellar duality, but at this pointin our study, it is primarily a tool for proving a generalized sum rule for the convexsubdifferential.

62

Theorem 3.97 (Generalized sum rule). Let f : E1 → R, g : E2 → R and L ∈ L(E1,E2).Then the following hold:

a) ∂(f + g L)(x) ⊃ ∂f(x) + L∗ [∂g(L(x))] (x ∈ E1).

b) If f ∈ Γ0(E1) and g ∈ Γ0(E2) and condition (3.33) is satisfied, equality holds in a).

Proof. a) Let v ∈ ∂f(x) + L∗ [∂g(L(x))], i.e. v = u + L∗(w) where u ∈ ∂f(x) andw ∈ ∂g(L(x)). By the respective subgradient inequalities we have

〈u, x− x〉+ f(x) ≤ f(x) (x ∈ E1) (3.35)

and〈w, L(x)− L(x)〉+ g(L(x)) ≤ g(L(x)) (x ∈ E1).

The latter is equivalent to

〈L∗(w), x− x〉+ (g L)(x) ≤ (g L)(x) (x ∈ E1).

Adding this to (3.35) yields

〈v, x− x〉+ f(x) + (g L)(x) ≤ f(x) + (g L)(x) (x ∈ E1),

which shows that v ∈ ∂(f + g L)(x).

b) Let v ∈ ∂(f + g L)(x). Observe (cf. Theorem 3.86) that x solves

infx∈E1

(f(x)− 〈v, x〉) + g(L(x)).

By Fenchel-Rockafellar duality (applied to f−〈v, ·〉 and g noticing that dom f = dom (f−〈v, ·〉)) we therefore infer that there exists y ∈ E2 such that

f(x)− 〈v, x〉+ g(L(x)) = −f ∗(L∗(y) + v)− g∗(−y),

or equivalently

〈v, x〉 = f(x) + g(L(x)) + f ∗(L∗(y) + v) + g∗(−y).

Hence,

〈v + L∗(y), x〉+ 〈−y, L(x)〉 = f(x) + f ∗(L∗(y) + v) + g(L(x)) + g∗(−y).

By the Fenchel-Young inequality we hence must have

L∗(y) + v ∈ ∂f(x) and − y ∈ ∂g(L(x)).

Thereforev = v + L∗(y) + L∗(−y) ∈ ∂f(x) + L∗∂g(L(x)),

which concludes the proof.

63

The generalized subdifferential sum rule has many important consequences, two of whichwe present now: The first one is a sum rule, which in the literature is sometimes referredto as Moreau-Rockafellar Theorem, but this is moniker is not used uniformly.

Corollary 3.98 (Subdifferential sum rule). Let f, g ∈ Γ then

∂(f + g)(x) ⊃ ∂f(x) + ∂g(x) (x ∈ E). (3.36)

Under the qualification condition

0 ∈ ri (dom g − dom f)

equality holds in (3.36).

Likewise we obtain a chain rule.

Corollary 3.99 (Subdifferential chain rule). Let g ∈ Γ(E2) and L ∈ L(E1,E2). Then

∂(g L) ⊃ L∗(∂g) L. (3.37)

Under the qualification condition

0 ∈ ri (dom g − rgeL)

equality holds in (3.37).

We continue with a subdifferential rule for infimal convolution.

Theorem 3.100 (Subdifferential of infimal convolution). Let f, g ∈ Γ0 and x ∈ dom (f#g).Then

∂(f#g)(x) = ∂f(u) ∩ ∂g(x− u) (u ∈ argminuf(u) + g(x− u)).

Proof. Let u ∈ argminuf(u) + g(x− u). Then, by Theorem 3.86, we have

y ∈ ∂(f#g)(x) ⇐⇒ x ∈ argminx(f#g)(x)− 〈x, y〉⇐⇒ (f#g)(x)− 〈x, y〉 = inf

x(f#g)(x)− 〈x, y〉

⇐⇒ f(u) + g(x− u)− 〈x, y〉 = inf(x,u)f(u) + g(x− u)− 〈x, y〉

⇐⇒ (x, u) ∈ argminx,uf(u) + g(x− u)− 〈(x, u), (y, 0)〉.

Now consider the self-adjoint linear map

L : (x, u) ∈ E× E 7→ (u, x− u)

and the convex function

c : (v, w) ∈ E× E 7→ g(v) + h(w).

Using again Theorem 3.86 and Corollary 3.99 and Exercise 3.9.1, we find that y ∈∂(f#g)(x) is equivalent to

(y, 0) ∈ ∂(c L)(x, u) = L∗∂c(u, x− u) = L [∂f(u)× ∂g(x− u))].

This yields the desired result.

64


3.9.1 (Subdifferential of separable sum) For fi : Ei → R (i = 1, 2) the separable sumof f1 and f2 is defined by

f1 ⊕ f2 : (x1, x2) ∈ E1 × E2 7→ f1(x1) + f2(x2).

Show that for fi ∈ Γ(Ei) (i = 1, 2) we have

∂(f1 ⊕ f2) = ∂f1 × ∂f2.

3.10 Infimal projection

We would like to revisit the infimal projection setting from 3.53.

Theorem 3.101 (Infimal projection II). Let ψ ∈ Γ0(E1 × E2) and define p : E1 → R by

p(x) := infvψ(x, v). (3.38)

Then the following hold:

a) p is convex.

b) p∗ = ψ∗(·, 0) which is closed and convex.

c) The conditiondomψ∗(·, 0) 6= 0 (3.39)

is equivalent to having p∗ ∈ Γ0.

d) If (3.39) holds then p ∈ Γ0 and the infimum in its definition is attained when finite.

Proof. a) Theorem 3.53.

b) We have

p∗(y) = supx〈y, x〉 − inf

vψ(x, v)

= sup(x,v)

〈(y, 0), (x, v)〉 − ψ(x, v)

= ψ∗(y, 0).

A conjugate function is always closed and convex as a pointwise supremum of affine, henceconvex functions, cf. Proposition 3.50.

c) The is obvious from b).

d) Observe that by b) we have

p∗∗(x) = supy〈x, y〉 − ψ∗(y, 0)

= sup(y,w)

〈(x, 0), (y, w)〉 − (ψ∗ + δE1×0)(y, w)

= (ψ∗ + δE1×0)∗(x, 0).

65

Now we notice that

ri (domψ∗) ∩ ri (dom δE1×0) = ri (domψ∗) ∩ E1 × 0

see Exercise 3.1.8. Hence

ri (domψ∗) ∩ ri (dom δE1×0) 6= ∅ ⇐⇒ domψ∗(·, 0) 6= ∅

as the relative interior of a convex set is nonempty if and only if the set itself is not, seeTheorem 3.22. Since we assume that the latter holds, we can hence apply the Attouch-Brezis Theorem to continue the above derivations and obtain

p∗∗(x) = (ψ∗ + δE1×0)∗(x, 0)

= (ψ#δ0×E2)(x, 0)

= infvψ(x, v)

= p(x),

and the infimum is attained when finite.

4 Conjugacy of Composite Functions via K-Convexity

and Infimal Convolution

In this section we are concerned with functions gF , where g is closed, proper, convex andF is a (generally) nonlinear map. These kinds of functions are known under the monikerconvex-composite function, and are, in general, nonconvex. However, if the monotonicityproperties of g and the generalized convexity properties of F align in a certain way, thisconvex-composite may be convex after all. We want to study this case with a focus onthe Fenchel conjugate of these kinds of (convex) convex-composite functions. Our maintools are infimal convolution, especially the Attouch-Brezis Theorem (see Theorem 3.94)and a notion of generalized convexity for vector-valued functions with respect to somecone-induced ordering.

4.1 K-convexity

Given a cone K ⊂ E, the relation

x ≤K y :⇐⇒ y − x ∈ K (x, y ∈ E)

induces a partial ordering on E. We may then attach to E a largest element +∞• withrespect to that ordering which satisfies

x ≤K +∞• (x ∈ E).

We will set E• := E ∪ +∞•. For a function F : E1 → E•2 we define its domain, graphand range, respectively, as

domF := x ∈ E1 | F (x) ∈ E2 ,gphF := (x, F (x)) ∈ E1 × E2 | x ∈ domF ,rgeF := F (x) ∈ E2 | x ∈ domF .

66

We call F proper if domF 6= ∅. The following concept is central to our study.

Definition 4.1 (K-convexity). Let K ⊂ E2 be a cone and F : E1 → E•2. Then we call Fconvex with respect to K or K-convex if its K-epigraph

K-epiF := (x, v) ∈ E1 × E2 | F (x) ≤K v

is convex (in E1 × E2).

We point out that, in the setting of Definition 4.1, a K-convex function F has a convexdomain as domF = L(K-epiF ) where L : E1×E2 → E1 : (x, v) 7→ x. Thus, F is K-convexif and only if

F (λx+ (1− λ)y) ≤K λF (x) + (1− λ)F (y) (x, y ∈ domF, λ ∈ [0, 1]).

We will make use of the relative interior of the K-epigraph of a K-convex function whichis described in the following lemma, the proof on which is based on results from Section3.1.4.

Lemma 4.2 (Relative interior of K-epigraph). Let K ⊂ E2 be a convex cone, and letF : E1 → E•2 be proper and K-convex. Then

ri (K-epiF ) =

(x, v)∣∣ x ∈ ri (domF ), F (x) ri (K) v

=

(riK-epiF

)⋂(domF × E2

).

In particular, if domF = E1, then ri (K-epiF ) = riK-epiF .

Proof. For each x ∈ E1, set Cx = v ∈ E2 | (x, v) ∈ K-epiF . It follows from Proposition3.28 that

riCx = ri(L−1x (K)

)= L−1

x (riK) = v ∈ E2 | v ∈ F (x) + riK ,

where Lx : v 7→ v − F (x). Hence, by Theorem 3.29, we find that

(x, v) ∈ ri (K-epiF )⇐⇒ x ∈ ri x ∈ E1 | Cx 6= ∅ and v ∈ riCx

⇐⇒ x ∈ ri (domF ) and v ∈ riCx

⇐⇒ x ∈ ri (domF ) and v ∈ F (x) + riK.

It is clear from the definition that if F is K-convex and L ⊃ K is a cone, then F is alsoL-convex. The following result due to Pennanen [23, Lemma 6.1] gives a nice descriptionof the polar of smallest closed, convex cone with respect to which a given function isconvex.

Lemma 4.3. Let f : E1 → E•2 with a convex domain and let K ⊂ E2 be the smallest closedconvex cone with respect to which F is convex. Then

(−K) = v ∈ E2 | 〈v, F 〉 is convex =: PF .

67

Proof. First, we observe the following: As F is K-convex, for λ ∈ (0, 1) and x, y ∈ domF ,we have

λF (x) + (1− λ)F (y)− F (λx+ (1− λ)y) ∈ K⇐⇒ F (λx+ (1− λ)y)− λF (x)− (1− λ)F (y) ∈ −K⇐⇒ 〈v, F (λx+ (1− λ)y)− λF (x)− (1− λ)F (y)〉 ≤ 0 (v ∈ (−K))

⇐⇒ 〈v, F 〉 is convex (v ∈ (−K))

where the third equivalence uses that K = K as K is a closed, convex cone, see Theorem3.73. It hence, implies that

(−K) ⊂ PF . (4.1)

Next, it is clear that PF is a convex cone, and, due to the continuity of the inner product,PF is closed. By the above observation F is −P F -convex and therefore K ⊂ −P F , orequivalently,

−K ⊂ P F . (4.2)

Finally, by taking the polars in (4.2), using Theorem 3.73 and (4.1), we get

PF = (P F ) ⊂ (−K) ⊂ PF .

4.2 The convex-composite setting

For F : E1 → E•2 and g : E2 → R ∪ +∞ we define the composition of g and F as

(g F )(x) :=

g(F (x)) if x ∈ domF,

+∞ else

which extends the usual convention g(+∞) = +∞ when E2 = R. In particular, givenv ∈ E2 and the linear form 〈v, ·〉 : E2 → R that goes with it, we set 〈v, F 〉 := 〈v, ·〉 F ,i.e.

〈v, F 〉 (x) =

〈v, F (x)〉 if x ∈ domF,

+∞ else.

This scalarization of F is quite central to our study which is already foreshadowed in thenext two auxiliary results.

Lemma 4.4. Let F : E1 → E•2 and v ∈ E2. Then the following hold:

a) If F is K-convex then 〈v, F 〉 is convex for all v ∈ −K.

b) The inverse of a) holds true if K is closed and convex.

Proof. a) Suppose that F is K-convex and let v ∈ −K, x, y ∈ E1, and λ ∈ [0, 1]. Since

λF (x) + (1− λ)F (y)− F (λx+ (1− λ)y) ∈ K,

68

it follows that〈v, λF (x) + (1− λ)F (y)− F (λx+ (1− λ)y)〉 ≥ 0,

and hence,〈v, λF (x) + (1− λ)F (y)〉 ≥ 〈v, F (λx+ (1− λ)y)〉 .

b) Let x, y ∈ E1, λ ∈ [0, 1], and v ∈ −K. Set z = λx + (1− λ)y. Since 〈v, F 〉 is convexwe have

〈v, λF (x) + (1− λ)F (y)− F (z)〉 = λ 〈v, F (x)〉+ (1− λ) 〈v, F (y)〉 − 〈v, F (z)〉= λ 〈v, F 〉 (x) + (1− λ) 〈v, F 〉 (y)− 〈v, F 〉 (λx+ (1− λ)y)

≥ 0,

and therefore,

λF (x) + (1− λ)F (y)− F (λx+ (1− λ)y) ∈ −(−K) = K = K,

where K = K because K is closed and convex (and contains 0), see Theorem 3.73.

Lemma 4.5. Let F : E1 → E•2 and let K ⊂ E2 be a closed, convex cone. Then thefollowing hold for all (u, v) ∈ E1 × E2:

a) σK-epiF (u, v) = σgphF (u, v) + δK(v).

b) σgphF (u,−v) = 〈v, F 〉∗ (u).

c) If F is linear then 〈v, F 〉∗ = δF ∗(v).

Proof. a) Observe that

σK-epiF (u, v) = sup(x,y)∈K-epiF

〈(u, v), (x, y)〉

= sup(x,z)∈E1×K

〈(u, v), (x, F (x) + z)〉

= supx∈E1

〈(u, v), (x, F (x))〉+ supz∈K〈z, v〉

= σgphF (u, v) + δK(v),

where the last identity uses Exercise 3.5.13.

b) We have

σgphF (u,−v) = supx∈domF

〈(u,−v), (x, F (x))〉

= supx∈E1

〈u, x〉 − 〈v, F 〉 (x)

= 〈v, F 〉∗ (u).

69

Proposition 4.6. Let K ⊂ E2 be a convex cone, F : E1 → E•2 K-convex and g ∈ Γ(E2)such that rgeF ∩ dom g 6= ∅. If

g(F (x)) ≤ g(y) ((x, y) ∈ K-epiF ) (4.3)

then the following hold:

a) g F is convex and proper.

b) If g is lsc and F is continuous then g F is lower semicontinuous.

Proof. a) As g is proper and rgeF ∩ dom g 6= ∅, the composite g F is proper. Now, letv, w ∈ domF and λ ∈ [0, 1]. Then

g(F (λv + (1− λ)w)) ≤ g(λF (v) + (1− λ)F (w))

≤ λg(F (v)) + (1− λ)g(F (w)).

Here the first inequality is due to the fact that (λv + (1 − λ)w, λF (v) + (1 − λ)F (w)) ∈K-epiF (as F is K-convex) and assumption (4.3).

b) Let xn be a sequence in E1 converging to x and set t = lim infn→+∞ g(F (xn)).Then there exists a subsequence (xkn) such that g(F (xkn))→ t. Due to the continuity ofF , F (xkn)→ F (x), and hence, by the lower semicontinuity of g, we have t ≥ g(F (x)).

4.3 The Fenchel conjugate of the convex convex-composite

Without further ado, we present the main result of this chapter.

Theorem 4.7 (Conjugacy for convex-composite). Let K ⊂ E2 be a closed convex cone,F : E1 → E•2 K-convex such that K-epiF is closed and g0 ∈ Γ(E2) such that (4.3) issatisfied. Then the following hold:

a) (g F )∗ ≤ cl η, where

η(p) = infv∈−K

g∗(v) + 〈v, F 〉∗ (p).

b) IfF (ri (domF )) ∩ ri (dom g −K) 6= ∅. (4.4)

the function η in a) is closed, proper and convex and the infimum is attained iffinite.

c) Under the assumptions of b) we have


g∗(v) + 〈v, F 〉∗ (p)

with dom (g F )∗ = p ∈ E1 | ∃v ∈ dom g∗ ∩ (−K) : 〈v, F 〉∗ (p) < +∞.

70

Proof. a) Define φ : E1 × E2 → R ∪ +∞ : (x, y) 7→ g(y) and observe that

φ∗(u, v) = δ0(u) + g∗(v). (4.5)

Hence we find that

(g F )∗(p) = supx∈E1

〈x, p〉 − g(F (x))

= sup(x,y)∈K-epiF

〈x, p〉 − g(y)

= sup(x,y)∈E1×E2

〈(p, 0), (x, y)〉 − (g(y) + δK-epiF (x, y)

= (φ+ δK-epiF )∗(p, 0)

= cl (φ∗#σK-epiF ) (p, 0).

Here the second equality uses assumption (4.3), and last identity is then due to Proposition3.93 as the functions in play are closed, proper and convex by assumption. Moreover, wehave

(φ∗#σK-epiF ) (p, 0) = inf(u,v)∈E1×E2

φ∗(u, v) + σK-epiF (p− u,−v)

= infv∈E2

g∗(v) + σK-epiF (p,−v)

= infv∈−K

g∗(v) + σgphF (p,−v)

= infv∈−K

g∗(v) + 〈v, F 〉∗ (p) ,

where the second identity uses (4.5) and the third and fourth one rely on Lemma 4.5.This shows the desired statement.

b) This follows from Theorem 3.94 while observing that

ri (domφ) ∩ ri (dom δK-epiF ) 6= ∅ ⇐⇒ E1 × ri (dom g) ∩ ri (K-epiF ) 6= ∅⇐⇒ ∃x ∈ ri (domF ) : F (x) ∈ ri (dom g)− riK

⇐⇒ F (ri (domF )) ∩ ri (dom g −K) 6= ∅.

Here the second equivalence relies on Lemma 4.2.

c) The first statement follows from a) and b) and Theorem 3.94. The expression ofdom (g F )∗ is an immediate consequence of that.

We now continue with a whole sequence of rather immediate consequences of Theorem4.7. The first one shows that our setting g F in fact covers the seemingly more generalsetting with an additional additive term.

Corollary 4.8 (Conjugate of additive composite functions). Under the assumptions ofTheorem 4.7 let f ∈ Γ0 such that

F (ri (dom f ∩ domF )) ∩ ri (dom g −K) 6= ∅. (4.6)

Then(f + g F )∗(p) = min

v∈−K,y∈E1

g∗(v) + f ∗(y) + 〈v, F 〉∗ (p− y).

71

Proof. Define g : R× E2 → R ∪ +∞ by g(s, y) := s+ g(y). Then g ∈ Γ0(R× E2) with

g∗(t, v) = δ1(t) + g∗(v). (4.7)

Moreover, define F : E1 → R ∪ +∞ × E•2 by F (x) = (f(x), F (x)). Then dom F =dom f ∩ domF . Setting K := R+ ×K, we find that F is K-convex, g F = f + g F ∈Γ0(E1) and g F satisfies (4.3) with K (since g F satisfies it with K). Moreover, asdom g = R × dom g, we realize that condition (4.4) for g, F and K amounts to (4.6).Therefore, we obtain

(f + g F )∗(p) = (g F )(p)

= min(t,v)∈−K

g∗(t, v) +⟨

(t, v), F⟩∗

(p)

= minv∈−K

g∗(v) + supx∈E1

〈p, v〉 − (f(x) + 〈v, F 〉 (x))

= minv∈−K

g∗(v) + (f + 〈v, F 〉)∗(p)

= minv∈−K

g∗(v) + (f ∗# 〈v, F 〉∗)(p)

= minv∈−K,y∈E1

g∗(v) + f ∗(y) + 〈v, F 〉∗ (p).

Here the second identity is due to Theorem 4.7, the third one uses (4.7), while the fifthrelies once more on Theorem 3.94, realizing that dom 〈v, F 〉 = domF and (4.6) impliesthat ri (dom f) ∩ ri (domF ) ⊃ ri (dom f ∩ domF ) 6= ∅.

The next corollary follows simply from the fact that condition (4.3) is trivially satisfied ifg has the following property: For a cone K ⊂ E we call f : E→ R∪ +∞ K-increasingif

f(u) ≤ f(v) (u K v).

Corollary 4.9. Let K ⊂ E2 be a closed, convex cone, F : E1 → E•2 K-convex such thatK-epiF is closed and let g ∈ Γ0 and K-increasing such that (4.4) holds (which is true inparticular when g is finite-valued or F is surjective). Then


g∗(v) + 〈v, F 〉∗ (p).

Proof. Since the fact that g is K-increasing implies (4.3) the assertion follows from The-orem 4.7.

The next result shows that any closed, proper, convex function g is monotone with respectto its horizon cone, see Definition 3.80.

Lemma 4.10. Let g ∈ Γ0. Then g is (−hzn g)-increasing.

72

Proof. Put K := −hzn g. Then Theorem 3.81 yields K = −(cone (dom g∗)). Now letx, y ∈ E such that x K y, i.e. y = x+ b for some b ∈ K. Since g = g∗∗ we hence find

g(x) = supz∈dom g∗

〈x, z〉 − g∗(z)

= supz∈dom g∗

〈y, z〉 − 〈b, z〉 − g∗(z)

≤ supz∈dom g∗

〈y, z〉 − g∗(z)

= g(y),

where the inequality relies on the fact that 〈b, z〉 ≥ 0.

Corollary 4.11. Let g ∈ Γ0(E2) and let F : E1 → E•2 be (−hzn g)-convex with −hzn g-epiFclosed such that

F (ri (domF )) ∩ ri (dom g + hzn g) 6= ∅.

Then(g F )∗(p) = min

v∈E2

g∗(v) + 〈v, F 〉∗ (p).

Proof. This follows from combining Corollary 4.9 (with K = −hzn g) and Lemma 4.10while observing that dom g∗ ⊂ cone (dom g∗) = (hzn g), cf. Theorem 3.81.

Finally, as another immediate consequence of our study we get the well known result forthe case when F is linear, cf. Proposition 3.95.

Corollary 4.12 (The linear case). Let g ∈ Γ(E2) and F : E1 → E2 linear such thatrgeF ∩ ri (dom g) 6= ∅. Then

(g F )∗(p) = minv∈E2

g∗(v) | F ∗(v) = p

with dom (g F ) = (F ∗)−1(dom g∗).

Proof. We notice that F is 0-convex. Hence we can apply Theorem 4.7 with K = 0.Condition (4.4) then reads rgeF ∩ ri (dom g) 6= ∅, which is our assumption. Hence weobtain


g∗(v) + 〈v, F 〉∗ p = minv∈E2

g∗(v) + δF ∗(v)(p).

where the second identity uses Lemma 4.5.

4.4 Applications

In this section we present some eclectic applications of our study in Section 4.3 to illustratethe versatility of our findings.

73

4.4.1 Conic programming

We consider the general conic program

min f(x) s.t. F (x) ∈ −K (4.8)

where f : E1 → R is convex, F : E1 → E2 is K-convex and K ⊂ E2 is a closed, convexcone. Clearly, (4.8) can be written in the additive composite form

minx∈E1

f(x) + (δ−K F )(x). (4.9)

This fits the additive composite setting of Corollary 4.8 with g = δ−K which is K-increasing. Moreover, the qualification condition (4.4) for the conjugate calculus reads

rgeF ∩ ri (−K) = ∅, (4.10)

which is simply a generalized version of Slater’s condition.The Fenchel (or Fenchel-Rockafellar) dual problem associated with (4.8) via (4.9) is

maxy∈E1

−f ∗(y)− (δ−K F )∗(−y),

while the Lagrangian dual is

maxv∈−K

infx∈E1

f(x) + 〈v, F (x)〉 .

We obtain the following duality result.

Theorem 4.13. (Strong duality and dual attainment for conic programming) Let f :E1 → R is convex, K ⊂ E2 a closed, convex cone, and let F : E1 → E2 be K-convex withclosed K-epigraph. If (4.10) holds then

infx∈E1

f(x) + (δ−K F )(x) = maxv∈−K

−f ∗(y)− (δ−K F )∗(−y) = maxv∈−K

infx∈E1

f(x) + 〈v, F (x)〉 .

Proof. Observe that, by Corollary 4.8, we have

(f + δ−K F )∗(0) = minv∈−K,y∈E1

f ∗(y) + 〈v, F 〉∗ (−y)

= minv∈−K

(f + 〈v, F 〉)∗(0)

= minv∈−K

supx∈E1

−f(x)− 〈v, F (x)〉.

Moreover, by Theorem 4.7 we have

(δ−K F )∗(−y) = minv∈−K

〈v, F 〉∗ (−y).

Finally, since infE1 f + δ−K F = −(f + δ−K F )∗(0), this shows everything.

Theorem 4.13 furnishes various facts: It shows strong duality and dual attainment forboth the Fenchel duality (first identity) and Lagrangian duality (second identity) schemeunder a generalized Slater condition. Moreover, it shows the equivalence of both dualityconcepts. Of course, these are well-known results, but the proof based on Corollary 4.8and Theorem 4.7, respectively, unifies this in a very elegant and convenient way.

74

4.4.2 Conjugate of pointwise maximum of convex functions

In what follows we denote the unit simplex in Rm by ∆m, i.e.

∆m =

v ∈ Rm

∣∣∣∣∣ vi ≥ 0,m∑i=1

vi = 1

.

The following result provides the conjugate for the pointwise maximum of finitely manyconvex functions. It therefore, slightly generalizes (at least in the finite dimensional case)the results established for the case of two functions in [15] and alternatively proven in [5].

Proposition 4.14. For f1, . . . , fm ∈ Γ0(E) define f := maxi=1,...,m fi. Then f ∈ Γ0(E)with

f ∗(x) = minv∈∆m

( m∑i=1

vifi

)∗(x).

Proof. Define F : E→ (Rm)• by

F (x) =

(f1(x), . . . , fm(x)) if x ∈

⋂mi=1 dom fi,

+∞• otherwise,

and g : Rm → R by g(v) = max1≤i≤m vi. Then f = g F (with the conventions made inSection 4.2) and we observe that F is Rm

+ -convex and g is Rm+ -increasing with dom g = Rm.

Hence, Theorem 4.7 c) is applicable with the qualification condition (4.4) trivially satisfied.Thus, for all x ∈ E, we obtain

(g F )∗(x) = minv∈Rm+

g∗(v) + 〈v, F 〉∗ (x)

= minv∈Rm+

δ∆m(v) + 〈v, F 〉∗ (x)

= minv∈∆m

( m∑i=1

vifi

)∗(x)

where the second equality follows from Exercise 3.5.7.

5 A New Class of Matrix Support Functionals

From this point on we set

E := Rn×m × Sn and κ := dimE.

Then E is a Euclidean space equipped with the inner product

〈(X, V ), (Y,W )〉 = tr (XTY ) + tr (VW ).

75

5.1 The matrix-fractional function

We consider the function φ : E→ R ∪ +∞ given by

φ(X, V ) =

12tr (XTV −1X) if V 0,

+∞ else(5.1)

which is referred to as the matrix-fractional function, see [6, Example 3.4] and [11, Exam-ple 3.6.0.0.2]. However, we reserve this moniker for its closure, see below. The function φfrom (5.1) occurs in a myriad of different situations as the following two examples indicate;the first of which concerns a variational characterization of the nuclear norm.

Example 5.1 (Nuclear norm smoothing). The nuclear norm ‖·‖∗ is a popular regularizerin matrix optimization to promote low-rank solutions, see e.g. [13]. Hsieh and Olsen [20,Lemma 1] found an interesting variational representation

‖X‖∗ = minV ∈Sn++

1

2tr (V ) + φ(X, V ) (5.2)

using φ from (5.1).

The second example puts the function φ in the context of maximum-likelihood estimation.

Example 5.2 (Covariance estimation). Suppose yi ∈ Rn (i = 1, . . . , N) are measurementsof a random vector y that is normally distributed with mean value µ ∈ Rn and covariancematrix Σ ∈ Sn++, which are both unknown. In order to estimate them, one considers theoptimization problem

maxµ∈Rn,Σ∈Sn++

1

(2π)n/2

N∏i=1

1

(det Σ)1/2exp

(−1

2(yi − µ)TΣ−1(yi − µ)

)of maximizing the density with respect to mean value and covariance. This is, by themonotonicity of the logarithm, equivalent to the optimization problem

minµ∈Rn,Σ∈Sn++

N∑i=1

1

2(yi − µ)TΣ−1(yi − µ) +

N

2log det Σ.

Using the function φ from (5.1) this reads

min(Y,Σ)∈Rn×m×Sn

φ(Y − [µ, . . . , µ],Σ)−(−N

2log det Σ

).

From a convex-analytical perspective, the function φ defined in (5.1) has one major draw-back, namely it is not closed. As our study will show its closure is γ : E → R ∪ +∞given by

γ(X, V ) :=

12tr(XTV †X

)if V 0, rgeX ⊂ rgeV,

+∞ else.(5.3)

We will refer to γ as the matrix-fractional function while [11] refers to γ as the pseudomatrix-fractional function.

76

Proposition 5.3. For the functions γ from (5.3) and φ from (5.1) we have:

a) epi γ =

(X, V, α)| ∃Y ∈ Sm :

(V XXT Y

) 0, 1

2tr (Y ) ≤ α

.

b) epiφ =

(X, V, α)| ∃Y ∈ Sm :

(V XXT Y

) 0, V 0, 1

2tr (Y ) ≤ α

.

Moreover, epi γ = cl (epiφ), or equivalently, γ = clφ.

Proof. a) First note that

Y XTV †X ⇒ tr (Y ) ≥ tr (XTV †X). (5.4)

We have

(X, V, α) ∈ epi γ ⇔ V 0, rgeX ⊂ rgeV,1

2tr (XTV †X) ≤ α

⇔ ∃Y ∈ Sm : V 0, rgeX ⊂ rgeV, Y XTV †X,1

2tr (Y ) ≤ α

⇔ ∃Y ∈ Sm :

(V XXT Y

) 0,

1

2tr (Y ) ≤ α,

where the second equivalence follows from (5.4) and the fact that we may take Y =XTV †X, and the final equivalence follows from Lemma A.5.b) Analogous to a) with the additional condition that V 0 in each step.In order to show that cl (epiφ) = epi γ, we first note that, as γ is lsc, epi γ is closed. More-over, since obviously epiφ ⊂ epi γ, the inclusion cl (epiφ) ⊂ epi γ follows immediately.

In order to see the converse inclusion, let (X, V, α) ∈ epi γ. Now, set Vk := V + 1kIn 0

and put αk := α + γ(X, Vk)− γ(X, V ) for all k ∈ N. Then, by definition, we have

αk = α + γ(X, Vk)− γ(X, V ) ≥ γ(X, Vk) = φ(X, Vk),

i.e., (X, Vk, αk) ∈ epiφ for all k ∈ N. Clearly, Vk → V with Vk nonsingular, and soV −1k → V + (e.g., see [22, p. 153]). Consequently, φ(X, Vk) = γ(X, Vk)→ γ(X, V ) so thatαk → α. Therefore, (X, Vk, αk)→ (X, V, α), and so epi γ = cl (epiφ), i.e., γ = clφ.

It is fairly easy to see now that the matrix-fractional function γ is closed, proper, convexas its epigraph is a closed, convex cone. By Hormander’s theorem (Theorem 3.68) ithence must be a support function - the question is which set it supports! We will answerthis question as a by-product of our subequent analysis, in which we will consider asignificantly generalized version of the matrix-fractional function. We would also like topoint out that the fact that γ is a support function was already observed in the specialcase m = n = 1 in [24, p. 83].

77

5.2 The generalized matrix-fractional function

We now want to study the GMF in more detail. To this end the following definition iscrucial.

KS :=V ∈ Sn

∣∣ uTV u ≥ 0, (u ∈ S), (5.5)

where S is a subspace of Rn, that is, KS is the set of all symmetric matrices that arepositive definite with respect to the given subspace S. Observe that if P ∈ Sn is theorthogonal projection onto S, then

KS = V ∈ Sn | PV P 0 . (5.6)

Clearly, KS is a convex cone, and, for S = Rn, it reduces to Sn+. Given a matrix A ∈ Rp×n,the cones KkerA play a special role in our analysis. For this reason, we simply write KAto denote KkerA, i.e. KA := KkerA.

Proposition 5.4 (KS and its polar). Let S be a nonempty subspace of Rn and let P bethe orthogonal projection onto S. Then the following hold:

a) KS = cone−vvT | v ∈ S

= W ∈ Sn | W = PWP 0 .

b) intKS =V ∈ Sn

∣∣ uTV u > 0 (u ∈ S \ 0).

c) aff (KS) = spanvvT | v ∈ S

= W ∈ Sn | rgeW ⊂ S .

d) ri (KS) =W ∈ KS

∣∣ uTWu < 0 (u ∈ S \ 0)

when S 6= 0 and

ri (K0) = 0 (since K0 = Sn).

Proof. a) Put B :=−ssT | s ∈ S

⊂ Sn− and observe that

cone B =

−

r∑i=1

λisisTi | r ∈ N, si ∈ S, λi ≥ 0 (i = 1, . . . , r)

.

We have cone B =W ∈ Sn− | W = PWP

: To see this, first note that cone B ⊂

W ∈ Sn− | W = PWP

. The reverse inclusion invokes the spectral decomposition ofW =

∑ni=1 λiqiq

Ti for λ1, . . . , λn ≤ 0. In particular, this representation of cone B shows

that it is closed. We now prove the first equality in a): To this end, observe that

KS =V ∈ Sn

∣∣ sTV s ≥ 0 (s ∈ S)

=V ∈ Sn

∣∣ ⟨V, −ssT⟩ ≤ 0 (s ∈ S)

= (cone B),

where the third equality uses simply the linearity of the inner product in the secondargument. Polarization then gives

KS = (cone B) = cone B = cone B.

b)The proof is straightforward and follows the pattern of proof for intSn+ = Sn++.

78

c) With B as defined above, observe that

aff KS = spanKS = spanB,

since 0 ∈ KS , which shows the first equality. It is hence obvious that aff KS ⊂ W ∈ Sn | rgeW ⊂ S.On the other hand, every W ∈ Sn such that rgeW ⊂ S has a decomposition W =∑rankW

i=1 λiqiqTi where λi 6= 0 and qi ∈ rgeW ⊂ S for all i = 1, . . . , rankW , i.e. W ∈

spanB = aff KS .

d) Set R :=W ∈ KS

∣∣ uTWu < 0 (u ∈ S \ 0)

and let W ∈ ri (KS) \ R ⊂ KS .Then there exists u ∈ S with ‖u‖ = 1 such that uTWu = 0. Then for every ε > 0 wehave uT (W + εuuT )u = ε > 0. Therefore W + εuuT ∈ (Bε(W ) ∩ aff (KS)) \ KS for allε > 0, and hence W /∈ ri (KS), which contradicts our assumption. Hence, ri (KS) ⊂ R.

To see the reverse implication assume there were W ∈ R \ ri (KS), i.e. for all k ∈ Nthere exists Wk ∈ B 1

k(W ) ∩ aff (KS) \ KS . In particular, there exists uk ∈ S | ‖uk‖ = 1

such that uTkWkuk ≥ 0 for all k ∈ N. W.l.o.g. we can assume that uk → u ∈ S \ 0.Letting k → ∞, we find that uTWu ≥ 0 since Wk → W . This contradicts the fact thatW ∈ R.

5.2.1 Quadratic optimization with affine equality constraints

Given (A, b) ∈ Rp×n ×Rp with b ∈ rgeA we consider the quadratic optimization problem

minu∈Rn

1

2uTV u− xTu s.t. Au = b, (5.7)

and its optimal value function

v(x, V ) := infu∈Rn

1

2uTV u− xTu | Au = b

. (5.8)

The following result is well known.

Lemma 5.5 (Solvabilty of equality constrained QPs). For (x, V,A, b) ∈ Rn×Sn×Rp×n×Rp consider the quadratic optimization problem (5.7). Then (5.7) has a solution if andonly if the following three conditions hold:

i) b ∈ rgeA (i.e. (5.7) is feasible);

ii) x ∈ rge[V AT

];

iii) V ∈ KA.

If, however, b ∈ rgeA, but ii) or iii) are violated, we have v(x, V ) = −∞.

Proof. This is a standard result. It is an immediate consequence of [19, Exercise 5, page17]. We leave the details to the reader as an exercise.

79

We now provide an explicit formula for the optimal value function (5.8). Here, givenA ∈ Rp×n, the matrix

M(V ) :=

(V AT

A 0

), (5.9)

which is often referred to as a bordered matrix in the literature [10, 12, 22], will be useful.It also plays a central role in our subsequent analysis. We record an interesting result oninvertibility of the bordered matrix before we continue our study.

Proposition 5.6 (Invertibility of bordered matrix). The bordered matrix M(V ) is in-vertible if and only if rankA = p and V ∈ intKA in which case

M(V )−1 =

(P (P TV P )−1P T

(I − P (P TV P )−1P TV

)A†

(A†)T(I − V P (P TV P )−1P T

)(A†)T

(V P (P TV P )−1P TV − V

)A†

),

where P ∈ Rn×(n−p) is any matrix whose columns form an orthonormal basis of kerA.

Proof. For the characterization of the invertibility of M(V ) see e.g. [10, Th. 7]. Theinversion formula can be found in [12, Th. 1] and is easily verified by direct matrixmultiplication.

We now provide an explicit formula for the optimal value function in (5.8).

Theorem 5.7. For b ∈ rgeA and v given by (5.8) we have

v(x, V ) =

−12

(xb

)TM(V )†

(xb

)if x ∈ rge [V AT ], V ∈ KA,

−∞ else.

Proof. If b ∈ rgeA, Lemma 5.5 tells us that if x /∈ rge[V AT

]or V is not positive

semidefinite on kerA, we have v(V, x) = −∞. Hence, we need only show the expressionfor v when x ∈ rge

[V AT

]and V is positive semidefinite on kerA. Again, Lemma 5.5

tells us that, in this case, a solution to (5.7) exists. The first-order necessary optimalityconditions at

u ∈ argminu∈Rn

1

2uTV u− xTu s.t. Au = b

6= ∅

are that there exists y ∈ Rp for which(V AT

A 0

)(u

y

)=

(x

b

),

or equivalently, (u

y

)∈(V AT

A 0

)†(x

b

)+ ker

(V AT

A 0

).

80

Plugging such a pair(uy

)into the objective function yields

1

2uTV u− xT u =

1

2uT (V u− x︸︷︷︸

=−AT y

)− 1

2xT u

= −1

2uTAT︸︷︷︸

=bT

y − 1

2xT u

= −1

2

(x

b

)T(u

y

)= −1

2

(x

b

)T (V AT

A 0

)†(x

b

)T,

where the last equation is due to the fact that(x

b

)∈ rge

(V AT

A 0

)=

(ker

(V AT

A 0

))⊥.

Since all such points yield the same optimal value, this concludes the proof.

5.2.2 The generalized matrix fractional function

Given A ∈ Rp×n and B ∈ Rp×m such that rgeB ⊂ rgeA, Theorem 5.7 motivates aninspection of a generalization of the matrix fractional function defined by

ϕA,B : (X, V ) ∈ E 7→

12tr((

XB

)TM(V )†

(XB

))if rge

(XB


+∞, else.(5.10)

Theorem 5.7 then compactly reads

−v(x, V ) = ϕA,b(x, V )

for some b ∈ rgeA. Moreover, for A = 0 and B = 0 we recover the matrix-fractionalfunction from (5.3), i.e.

ϕ0,0 = γ.

this motivates us to call ϕA,B the generalized matrix-fractional function (GMF). Wealready argued in Section 5.1 that the matrix-fractional function γ is a support function.Naturally, this begs the question whether this is also true for the GMF. The followingresult gives a positive answer to this by showing that the GMF defined by A,B is thesupport function of the set

D(A,B) :=

(Y,−1

2Y Y T

)∈ E

∣∣ Y ∈ Rn×m : AY = B

. (5.11)

Theorem 5.8. Let A ∈ Rp×n and B ∈ Rp×m such that rgeB ⊂ rgeA and let D(A,B) begiven by (5.11). Then

σD(A,B)(X, V ) = ϕA,B(X, V ).

81

In particular,

domσD(A,B) =dom ∂σD(A,B) =

(X, V ) ∈ E

∣∣∣∣ rge

(X

B

)⊂ rgeM(V ), V ∈ KA

.

Moreover, we have

int (domσD(A,B)) = (X, V ) ∈ E | V ∈ intKA .

Proof. By direct computation,

σD(A,B)(X, V ) = sup(U,W )∈D(A,B)

〈(X, V ), (U,W )〉

= supU : AU=B

tr (XTU)− 1

2tr (UUTV )

= sup

U : AU=B

tr (XTU)− 1

2tr

(m∑i=1

ui(ui)TV

)

= supU : AU=B

m∑i=1

(xi)Tui − 1

2(ui)TV ui

= −m∑i=1

inf

u: Au=bi

1

2uTV u− (xi)Tu

(5.12)

=m∑i=1

12

(xi

bi

)TM(V )†

(xi

bi

)if

(xi

bi

)∈ rgeM(V ), V ∈ KA,

+∞ else

=

1

2tr

((X

B

)TM(V )†

(X

B

))if rge

(X

B

)⊂rgeM(V ), V ∈ KA,

+∞ else.

Here, the sixth equation exploits Theorem 5.7. This establishes the representation forσD(A,B) as well as for its domain. In addition, since

∂σD(A,B)(X, V ) = argmax 〈(X, V ), (U,W )〉 | (U,W ) ∈ D(A,B) ,

Theorem 5.7 also yields the equivalence domσD(A,B) = dom ∂σD(A,B).In order to see that domσD(A,B) is closed, let (Xk, Vk) ∈ domσD(A,B) → (V,X).

In particular, rgeXk ⊂ rge [Vk AT ] and Vk ∈ KA. These properties are preserved by

passing to the limit, hence (X, V ) ∈ domσD(A,B), i.e., domσD(A,B) is closed.It remains to prove the expression for int (domσD(A,B)). First we show that O :=

(X, V ) ∈ E | V kerA 0 is open. For this purpose, let (X, V ) ∈ O. Suppose q := rankAand let P ∈ Rn×(n−q) be such that its columns form a basis of kerA so that P TV P ispositive definite. Using the fact that Sn++ = int Sn+ (e.g., see [3, Ch. 1, Ex. 1]), thereexists an ε > 0 such that

‖W − V ‖ ≤ ε ⇒ P TWP 0 ∀W ∈ Sn.

82

Hence, Bε(X, V ) ⊂ O so that O is open.Next, we show that O ⊂ domσD(A,B). Again let (X, V ) ∈ O, so that V kerA 0.

By Finsler’s lemma (see Lemma A.6) we readily infer that rge [V AT ] = Rn, and sorgeX ⊂ rge [V AT ]. Hence (X, V ) ∈ domσD(A,B) yielding O ⊂ domσD(A,B). Since O isopen and O ⊂ domσD(A,B), we have O ⊂ int

(domσD(A,B)

).

We now show the reverse inclusion. Let (X, V ) ∈ domσD(A,B) \O. Hence V is positivesemidefinite, but not positive definite on kerA. Therefore, there exists x ∈ kerA \ 0such that xTV x = 0 . Define

Vk := V − 1

kI ∈ Sn (k ∈ N).

Then, for all k ∈ N, xTVkx = −‖x‖2

k< 0 , so that (X, Vk) /∈ domσD(A,B) for all k ∈ N while

(X, Vk)→ (X, V ). Hence (X, V ) ∈ bd(domσD(A,B)

), that is, (X, V ) /∈ int (domσD(A,B)).

Consequently, O ⊃ int(domσD(A,B)

).

Justified by Theorem 5.8, from now on we will σD(A,B) = ϕA,B interchangeably and alsorefer to σD(A,B) as the generalized matrix-fractional function or simply GMF.

We state the case A = 0 and B = 0 explicitly.

Corollary 5.9. We have γ = σD(0,0).

5.2.3 Convex analysis of the generalized matrix-fractional function

Theorem 5.8 allows us to invoke the whole machinery for support functions and its dual-ity correspondence with indicator functions, see Section 3.5.3, and its rich subdifferentialcalculus, see Corollary 3.87. These results already foreshadow that we need a good de-scription of the closed convex hull of D(A,B). This will be achieved by the followingset

Ω(A,B) :=

(Y,W ) ∈ E

∣∣∣∣ AY = B and1

2Y Y T +W ∈ KA

(5.13)

as the next result shows. We also set Ω := Ω(0, 0).

Theorem 5.10. Let D(A,B) and Ω(A,B) be s given by (5.11) and (5.13), respectively.Then

convD(A,B) = Ω(A,B).

Proof. We first show that Ω(A,B) is itself a closed convex set. Obviously, Ω(A,B) isclosed since KA is closed and the mappings Y 7→ AY and (Y,W ) 7→ 1

2Y Y T + W are

continuous.So we need only show that Ω(A,B) is convex: To this end, let (Yi,Wi) ∈ Ω(A,B), i =

1, 2 and 0 ≤ λ ≤ 1. Then there exist Mi ∈ KA, i = 1, 2 such that Wi = −12YiY

Ti + Mi.

83

Observe that A((1− λ)Y1 + λY2) = B. Moreover, we compute that

1

2((1− λ)Y1 + λY2)((1− λ)Y1 + λY2)T + ((1− λ)W1 + λW2)

=1

2((1−λ)Y1+λY2)((1−λ)Y1+λY2)T +

((1−λ)(−1

2Y1Y

T1 +M1)+λ(−1

2Y2Y

T2 +M2)

)=

1

2λ(1− λ)(−Y1Y

T1 + Y1Y

T2 + Y2Y

T1 − Y2Y

T2 ) + (1− λ)M1 + λM2

=λ(1− λ)

(−1

2(Y1 − Y2)(Y1 − Y2)T

)+ (1− λ)M1 + λM2.

Since rge (Y1 − Y2) ⊂ kerA, this shows λ(1− λ)(−1

2(Y1 − Y2)(Y1 − Y2)T

)+ (1− λ)M1 +

λM2 ∈ KA. Consequently, Ω(A,B) is a closed convex set.Next note that if (Y,−1

2Y Y T ) ∈ D(A,B), then (Y,−1

2Y Y T ) ∈ Ω(A,B) since 0 ∈ KA.

Hence, convD(A,B) ⊂ Ω(A,B).It therefore remains to establish the reverse inclusion: For these purposes, let (Y,W ) ∈

Ω(A,B). By Caratheodory’s theorem, there exist µi ≥ 0, vi ∈ kerA (i = 1, . . . , N) suchthat

W = −1

2Y Y T −

N∑i=1

µivivTi ,

where N = n(n+1)2

+ 1. Let 0 < ε < 1. Set λ1 := 1− ε and λ2 = . . . = λN+1 = λ := ε/N .

Denote Y1 := Y/√

1− ε. Take Zi ∈ Rn×m, i = 1, . . . , N such that AZi = B. Finally, set

Vi =

[√2µiλvi, 0, . . . , 0

]∈ Rn×m and Yi+1 = Zi + Vi, (i = 1, . . . , N).

Observe that

N+1∑i=1

λiYi =√

1− εY +ε

N

N+1∑i=2

Yi =√

1− εY +ε

N

N∑i=1

Zi +

√ε

N

N∑i=1

Vi,

where Vi = [√

2µivi, 0, . . . , 0], i = 1, . . . , N , and

−1

2

N+1∑i=1

λiYiYTi = −1

2Y Y T − 1

2

N∑i=1

ε

N

(ZiZ

Ti + ZiV

Ti + ViZ

Ti

)−

N∑i=1

µivivTi

= W −N∑i=1

1

2

(ε

NZiZ

Ti +

√ε

NZiV

Ti +

√ε

NViZ

Ti

),

.

Therefore(√

1− εY +ε

N

N∑i=1

Zi +

√ε

N

N∑i=1

Vi, W −N∑i=1

1

2

(ε

NZiZ

Ti +

√ε

NZiV

Ti +

√ε

NViZ

Ti

))

=

(N+1∑i=1

λiYi, −1

2

N+1∑i=1

λiYiYTi

). (5.14)

84

By Caratheodory’s theorem (see Theorem 3.5) we have

convD(A,B)=

(κ+1∑i=1

λiYi,−1

2

κ+1∑i=1

λiYiYTi

)∣∣∣∣λ ∈ Rκ+1+ ,

∑κ+1i=1 λi = 1, Yi ∈ Rn×m

AYi = B (i = 1, . . . , κ+ 1)

.

By letting ε ↓ 0 in (5.14), we find (Y,W ) ∈ convD(A,B) thereby concluding the proof.

Let us state the immediate consequence of the above theorem.

Corollary 5.11. Let Ω(A,B) be given by(5.13). Then

ϕA,B = σD(A,B) = σΩ(A,B) and γ = σΩ.

The conjugacy relation between indicator and support functions, see Proposition 3.66,now gives the following immediate consequence of Theorem 5.10.

Corollary 5.12 (Conjugate of GMF). We have

σ∗D(A,B) = δΩ(A,B).

In order to derive the subdifferential for ϕA,B = σD(A,B), we use the relation

∂σC(x) = z ∈ convC | x ∈ NconvC (z) (5.15)

from Corollary 3.87. Therefore, we first need the normal cone to Ω(A,B).

Proposition 5.13 (The normal cone to Ω(A,B)). Let Ω(A,B) be as given by (5.13) andlet (Y,W ) ∈ Ω(A,B). Then

NΩ(A,B) (Y,W ) =

(X, V ) ∈ E

∣∣∣∣∣∣ V ∈ KA,⟨V,

1

2Y Y T +W

⟩= 0

and rge (X − V Y ) ⊂ (kerA)⊥

Proof. Observe that Ω(A,B) = C1 ∩ C2 ⊂ E where

C1 :=Y ∈ Rn×m | AY = B

× Sn and C2 := (Y,W ) | F (Y,W ) ∈ KA ,

with F (Y,W ) := 12Y Y T + W . Clearly, C1 is affine, hence convex, and C2 is also convex,

which can be seen by an analogous reasoning as for the convexity of Ω(A,B) (cf. theproof of Theorem 5.10). Therefore, [24, Corollary 23.8.1] tells us that

NΩ(A,B) (Y,W ) = NC1 (Y,W ) +NC2 (Y,W ) , (5.16)

whereNC1 (Y,W ) =

R ∈ Rn×m ∣∣ rgeR ⊂ (kerA)⊥

× 0.

We now compute NC2 ((Y,W )). First recall that for any nonempty closed convex coneC ⊂ E , we have NC (x) = z ∈ C | 〈z, x〉 = 0 for all x ∈ C. Next, note that

∇F (Y,W )∗U = (UY, U) (U ∈ Sn),

85

so that ∇F (Y,W )∗U = 0 if and only if U = 0. Hence, by [25, Exercise 10.26 Part (d)],

NC2 (Y,W ) =

(V Y, V )

∣∣∣∣ V ∈ KA, ⟨V, 1

2Y Y T +W

⟩= 0

.

Therefore, by (5.16), NΩ(A,B) (Y,W ) is given by(X, V )

∣∣∣∣ rge (X − V Y ) ⊂ (kerA)⊥, V ∈ KA,⟨V,

1

2Y Y T +W

⟩= 0

,

which proves the result.

By combining (5.15) and Proposition 5.13 we obtain a simplified representation of thesubdifferential of the support function σD(A,B).

Corollary 5.14 (The subdifferential of σD(A,B)). Let D(A,B) be as given in (5.11). Then,for all (X, V ) ∈ domσD(A,B), we have

∂σD(A,B) (X, V ) =

(Y,W ) ∈ Ω(A,B)

∣∣∣∣∣∣∃Z ∈ Rp×m : X = V Y + ATZ,⟨V,

1

2Y Y T +W

⟩= 0

.

Proof. This follows directly from the normal cone description in Proposition 5.13 and therelation (5.15).

We infer from Corollary 5.14 that the GMF is continuously differentiable on the interiorof its domain, and we give an explicit formula for the gradient in this case.

Corollary 5.15. Let D(A,B) be as given in (5.11). Then σD(A,B) is (continuously)differentiable on the interior of its domain with

∇σD(A,B)(X, V ) =

(Y,−1

2Y Y T

)((X, V ) ∈ int (dom σD(A,B)))

where Y := A†B + (P (P TV P )−1P T )(X − A†X), P ∈ Rn×(n−p) is any matrix whosecolumns form an orthonormal basis of kerA and p := rankA.

Proof. We first recall from Theorem 5.8 that (X, V ) ∈ int dom σD(A,B) if and only ifV ∈ intKA. By Corollary 5.14, (Y,W ) ∈ ∂σD(A,B)(X, V ) if and only if there existsZ ∈ Rp×m such that

AY = B (5.17)

V Y + ATZ = X (5.18)⟨V,

1

2Y Y T +W

⟩= 0 (5.19)

1

2Y Y T +W ∈ KA. (5.20)

86

We now observe that (5.17)-(5.18) are equivalent to

M(V )

(YZ

)=

(XB

).

As V ∈ intKA, by Proposition 5.6 this implies

Y = A†B + P (P TV P )−1P T (X − A†X).

Moreover (5.19), (5.20) and V ∈ intKA readily imply that 12Y Y T + W = 0, which

concludes the proof.

We state the special case A = 0 and B = 0 (i.e. σD(A,B) = γ) as a separate result, whichby the way, was well known all along, see e.g. [6], and can be derived by standard calculusmethods.

Corollary 5.16. The matrix-fractional function γ from (5.3) is (continuously) differen-tiable on the interior of its domain with

∇γ(X, V ) =

(V −1X,−1

2V −1XXTV −1

)((X, V ) ∈ Rn×m × Sn++).

Remark 5.17. We would like to point out that in [7, Proposition 4.3] the followingdescription of the closed convex hull of D(A,B) was established:

convD(A,B) =

(Z(d⊗ Im),−1

2ZZT ) | (d, Z) ∈ F(A,B)

, (5.21)

where d⊗ Im := (d1Im, . . . , dκ+1Im)T ∈ Rm(κ+1)×m and

F(A,B) :=

(d, Z) ∈ Rκ+1 × Rn×m(κ+1)

∣∣∣∣∣ d ≥ 0, ‖d‖ = 1,

AZi = diB (i = 1, . . . , κ+ 1)

. (5.22)

The description in (5.21) was obtained by computing the convex hull of D(A,B) first, see[7, Lemma 4.2], using ”brute force” in the form of Caratheodory’s Theorem (cf. Theorem3.5), and then determining the closure of said convex hull. Although the representationof convD(A,B) from (5.21) was successfully used in [7] to study some convex-analyticalproperties of the GMF and it yielded some interesting applications, the description fromTheorem 5.10 is much more powerful due to its simplicity.

5.3 The geometry of Ω(A,B)

We now compute the relative interior and the affine hull of Ω(A,B). We will rely heavilyand expand on the results established in Section 3.1.4. In particular, Theorem 3.29 willbe very useful as we will use this result to get a representation for the relative interior ofΩ(A,B) directly, and then mimic its technique of proof to tackle its affine hull.

Lemma 5.18. Let A,B ⊂ E be convex with riA ∩ riB 6= ∅. Then aff (A ∩ B) = aff A ∩aff B.

87

Proof. The inclusion aff (A ∩ B) ⊂ aff A ∩ aff B is clear since the latter set is affine andcontains A ∩B.

For proving the reverse inclusion, we can assume w.l.o.g. that 0 ∈ riA ∩ riB =ri (A ∩ B), where for the latter equality we refer to Proposition 3.26. In particular wehave

aff A = R+A, aff B = R+B and aff (A ∩B) = R+(A ∩B), (5.23)

see Exercise 3.1.7 and the discussion afterwards. Now, let x ∈ aff A ∩ aff B. If x = 0there is nothing to prove. If x 6= 0, by (5.23), we have x = λa = µb for some λ, µ > 0 anda ∈ A, b ∈ B. W.l.o.g we have λ > µ, and hence, by convexity of B, we have

a =(

1− µ

λ

)0 +

µ

λb ∈ B.

Therefore x = λa ∈ R+(A ∩B) = aff (A ∩B), see (5.23).

We now prove a result analogous to Theorem 3.29.

Proposition 5.19. In addition to the assumptions of Theorem 3.29 assume that D isaffine. Then (y, z) ∈ aff C if and only if y ∈ D and z ∈ aff Cy.

Proof. We imitate the proof of Theorem 3.29: Let L : (y, z) 7→ z. Since D is assumed tobe affine (hence D = aff D = riD), we have

D = L(C) = L(riC) = L(aff C), (5.24)

where we invoke the fact that linear mappings commute with the relative interior (Propo-sition 3.28) and the affine hull (Exercise 3.1.6). Now fix y ∈ D = riD and define theaffine set My := (y, z) | z ∈ E2 = y × E2. Then, by (5.24), there exists z ∈ E2 suchthat y = L(y, z) and (y, z) ∈ riC. Hence, riMy ∩ riC 6= ∅ and we can invoke Lemma 5.18to obtain

aff My ∩ aff C = aff (My ∩ C) = aff (y × Cy) = y × aff Cy.

Hence, if y ∈ D, z ∈ aff Cy, we have (y, z) ∈ y × aff Cy = My ∩ aff C ⊂ aff C.In turn, for (y, z) ∈ C, we have (y, z) ∈ My ∩ aff C = y × Cy, hence z ∈ Cy 6= ∅, so

y ∈ D.

We are now in a position to prove the desired result on the relative interior and the affinehull of Ω(A,B).

Proposition 5.20. For Ω(A,B) given by (5.13) the following hold:

a) ri Ω(A,B) =

(Y,W ) ∈ E∣∣ AY = B and 1

2Y Y T +W ∈ ri (KA)

.

b) aff Ω(A,B) =

(Y,W ) ∈ E∣∣ AY = B and 1

2Y Y T +W ∈ spanKA

,

where spanKA = spanvvT | v ∈ kerA

.

Proof. We apply the format of Theorem 3.29 and Proposition 5.19, respectively, for C :=Ω(A,B). Then

D = Y | AY = B and CY =

KA − 1

2Y Y T , if AY = B,∅, else

(Y ∈ Rn×m).

88

a) Apply Theorem 3.29 and observe that ri (KA − 12Y Y T ) = ri (KA)− 1

2Y Y T .

b) Apply Proposition 5.19 and observe that D is affine, and that aff (KA − 12Y Y T ) =

aff (KA)− 12Y Y T .

As a direct consequence of Propositions 5.4 and 5.20, we obtain the following result forthe special case (A,B) = (0, 0).

Corollary 5.21. It holds that

conv

(Y,−1

2Y Y T )

∣∣ Y ∈ Rn×m

=

(Y,W ) ∈ E

∣∣∣∣ W +1

2Y Y T 0

,

and

int

(conv

(Y,−1

2Y Y T )

∣∣ Y ∈ Rn×m)

=

(Y,W ) ∈ E

∣∣∣∣ W +1

2Y Y T ≺ 0

.

We conclude this section by giving representations for the horizon cone and polar ofΩ(A,B).

Proposition 5.22 (The polar of Ω(A,B)). Let Ω(A,B) be as given in (5.13). Then

Ω(A,B) =

(X, V )

∣∣∣∣∣ rge(XB


12tr((

XB

)TM(V )†

(XB

))≤ 1

.

Moreover,

Ω(A,B)∞ = 0n×m × KA (5.25)

and

(Ω(A,B))∞=

(X, V )

∣∣∣∣∣ rge(XB


12tr((

XB

)TM(V )†

(XB

))≤ 0

. (5.26)

Proof. For any nonempty convex set C ⊂ E, observe that

z | σC (z) ≤ 1 = z | 〈z, x〉 ≤ 1, (x ∈ C) = C,

see Definition 3.72. Consequently, our expression for Ω(A,B) follows from Theorem 5.8.To see (5.25), let (Y,W ) ∈ Ω(A,B) and recall that (S, T ) ∈ Ω(A,B)∞ if and only if

(Y + tS,W + tT ) ∈ Ω(A,B) for all t ≥ 0. In particular, for (S, T ) ∈ Ω(A,B)∞, we haveA(Y + tS) = B and

1

2

[Y Y T + t(SY T + Y ST ) +

t2

2SST

]+ (W + tT ) ∈ KA (t > 0). (5.27)

Consequently, AS = 0 and, if we divide (5.27) by t2 and let t ↑ ∞, we see that SST ∈ KA.But SST ∈ KA since rgeS ⊂ kerA, so we must have S = 0. If we now divide (5.27) by

89

t and let t ↑ ∞, we find that T ∈ KA. Hence the set on the left-hand side of (5.25) iscontained in the one on the right. To see the reverse inclusion, simply recall that KA is aclosed convex cone so that KA +KA ⊂ KA.

Finally, we show (5.26). Since (0, 0) ∈ Ω(A,B), we have (S, T ) ∈ (Ω(A,B))∞ if andonly if (tS, tT ) ∈ Ω(A,B) for all t > 0, or equivalently, for all t > 0,

tT ∈ KA and ∃ (Yt, Zt) ∈ Rn×m × Rp×m s.t.

(tS

B

)= M(tT )

(YtZt

)with

1

2tr

((YtZt

)TM(tT )

(YtZt

))≤ 1,

or equivalently, by taking Zt := t−1Zt,

T ∈ KA and ∃ (Yt, Zt) ∈ Rn×m × Rp×m s.t.

(S

B

)= M(T )

(Yt

Zt

)with

t

2tr

((Yt

Zt

)TM(T )

(Yt

Zt

))≤ 1.

If we take(YtZt

):= M(T )†

(SB

), we find that (S, T ) ∈ (Ω(A,B))∞ if and only if

T ∈ KA andt

2tr

((S

B

)TM(T )†

(S

B

))≤ 1 (t > 0),

which proves the result.

5.4 σΩ(A,0) as a gauge

Note that the origin is an element of Ω(A,B) if and only if B = 0. In this case the supportfunction of Ω(A, 0) equals the gauge of Ω(A, 0) This fact and an explicit representationfor both γΩ(A,0) and γΩ(A,0) will be given in the following theorem.

Theorem 5.23 (σD(A,0) is a gauge). Let Ω(A,B) be as given in (5.13). Then

σΩ(A,0) (X, V ) = γΩ(A,0) (X, V ) = γΩ(A,0)(X, V ), (5.28)

and

γΩ(A,0) (Y,W )=σΩ(A,0) (Y,W )

=

12σ−1

min(−Y †W (Y †)T ) if rgeY ⊂kerA ∩ rgeW,W ∈ KA,+∞ else,

(5.29)

where σmin(−Y †W (Y †)T ) is the smallest singular value of −Y †W (Y †)T when such a singu-lar value exists and +∞ otherwise, e.g. when Y = 0. Here we interpret 1

∞ as 0 (0 = 1∞),

and so, in particular, γΩ(A,0) (0,W ) = δKA (W ).

90

Proof. The statement in (5.28) follows from Proposition 3.74. To show (5.29), first observethat

tΩ(A, 0)=

(Y,W )

∣∣∣∣ AY = 0 and1

2Y Y T + tW ∈ KA

, (5.30)

whose straightforward proof is left to the reader.Given t ≥ 0, by (5.30), (Y,W ) ∈ tΩ(A, 0) for all t > t if and only if AY = 0 and

12Y Y T + tW ∈ KA for all t > t. By Proposition 5.4 a), this is equivalent to AY = 0 and

1

2Y Y T + tW = P

(1

2Y Y T + tW

)P 0 (t > t), (5.31)

where, again, P is the orthogonal projection onto kerA. Dividing this inequality by t andtaking the limit as t ↑ ∞ tells us that W = PWP 0. Since Y Y T is positive semidefinite,inequality (5.31) also tells us that kerW ⊂ kerY T , i.e. rgeY ⊂ rgeW . Consequently,

dom γΩ(A,0) ⊂ (Y,W ) | rgeY ⊂kerA ∩ rgeW,W ∈ KA .

Now suppose (Y,W ) ∈ dom γΩ(A,0). Let Y = UΣV T be the reduced singular-value de-composition of Y where Σ is an invertible diagonal matrix and U, V have orthonormalcolumns. Since rgeY ⊂ rgeW = (kerW )⊥, we know that UTWU is negative definite,and so Σ−1UTWUΣ−1 is also negative definite. Multiplying (5.31) on the left by Σ−1UT

and on the right by UΣ−1 gives

µI −2Σ−1UTWUΣ−1 (0 < µ ≤ µ),

where µ = t−1. The largest µ satisfying this inequality is

σmin(−2Y †W (Y †)T ) = σmin(−2Σ−1UTWUΣ−1) > 0,

or equivalently, the smallest possible t in (5.31) is 1/σmin(−2Y †W (Y †)T ), which provesthe result.

5.5 Applications

5.5.1 Conjugate of variational Gram functions

Given a set M ⊂ Sn+, the associated variational Gram function (VGF) [9, 21] is given by

ΩM : Rn×m → R ∪ +∞, ΩM(X) =1

2σM(XXT ).

Since σM = σconvM there is no loss in generality to assume that M is closed and convex.For the remainder we let

F : Rn×m → Sn, F (X) =1

2XXT . (5.32)

Then ΩM = σM F fits the composite scheme studied in Section 4.3. It is obvious thatrgeF = Sn+. The following lemma clarifies the K-convexity properties of F .

91

Lemma 5.24. Let F be given by (5.32). Then Sn+ is the smallest closed convex cone inSn with respect to which F is convex.

Proof. Let K be the smallest closed convex cone in Sn such that F is K-convex. On theone hand, since F is Sn+-convex, K ⊂ Sn+. On the other hand, by Lemma 4.3,

(−K) = KF := V ∈ Sn | 〈V, F 〉 is convex .

Now fixing V ∈ Sn for all X ∈ Rn×m the mapping ∇2 〈V, F 〉 (X) : Rn×m × Rn×m → R isgiven by

∇2 〈V, F 〉 (X)[D,H] =

⟨V,

1

2(HDT +DHT )

⟩.

Clearly, this symmetric bilinear form is positive semidefinite if and only if V ∈ Sn+, whichproves that

KF = Sn+.

Finally, by taking the polarity and using the fact that (Sn+) = −Sn+ we get

K = −(−K) = −KF = −(Sn+) = Sn+.

Corollary 5.25. Let M ⊂ Sn+ be nonempty, closed and convex, and let F be given by(5.32). Then the following hold:

a) We have −hznσM ⊃ Sn+. In particular, F is (−hznσM)-convex.

b) ri (hznσM) ⊂ ri (domσM).


i) rgeF ∩ ri (domσM − Sn+) 6= ∅;ii) rgeF ∩ ri (domσM + hznσM) 6= ∅;

iii) M is bounded.

Proof. a) We have cone M ⊂ Sn+ as M ⊂ Sn+, and hence,

−hznσM = −(cone M) ⊃ −(Sn+) = Sn+.

b) Since M is a subset of the closed convex cone cone M , cone M = (cone M)∞ ⊃ M∞.Furthermore, because they lie entirely in Sn+, they are both pointed and hence, by Exercise3.2.5, ∅ 6= int [(cone M)] ⊂ int [(M∞)]. Therefore,

ri (hznσM) = ri [(cone M)] = int [(cone M)]

⊂ int [(M∞)] = ri [(M∞)]

= ri (domσM) = ri (domσM),

where the equality between the second line and the third line follows from Theorem 3.81and the equality in the last line follows from Proposition 3.23 b).

92

c) i) ⇒ ii): This follows immediately from a) and the fact that both domσM and hznσMhave nonempty interiors and Corollary 3.27.

ii)⇔iii): Observe that

∅ 6= int [(M∞)] = W ∈ Sn | tr (WV ) < 0 (V ∈M∞ \ 0) .

In particular, also taking into account a), we have

ri (domσM + hznσM) = ri (domσM) + ri (hznσM)

= ri [(M∞)] + ri [(cone M)]

= ri [(M∞) + (cone M)]

= ri [(M∞)] = int [(M∞)]

where (M∞) + (cone M) = (M∞) as 0 ∈ (cone M) ⊂ (M∞) are convex cones. AsrgeF = Sn+, condition ii) is equivalent to the condition

F :=W ∈ Sn+ | tr (VW ) < 0 (V ∈M∞ \ 0)

6= ∅.

We claim that

F :=

Sn+ if M is bounded,∅ if else.

The first case is clear, since M∞ = 0 if (and only if) M is bounded, see Proposition3.39, in which case the condition restricting F is vacuous.

On the other hand, if M is unbounded, then there exists V ∈ M∞ \ 0 ⊂ Sn+ \ 0.But then tr (VW ) ≥ 0 for all W ∈ Sn+, see Exercise 1.0.1, which proves the second case.

All in all, we have established the equivalence between ii) and iii).

iii) ⇒ i): Follows readily from the fact that (M∞) = 0 = Sn here.

Our analysis in Section 4.3 combined with our findings in Section 5.2.3 allows for a veryshort proof of the conjugate function Ω∗M in case M is bounded (hence compact). Thiscovers what was proven in [21, Proposition 3.4] entirely and one case of [9, Proposition5.10].

Corollary 5.25 c) shows that our framework does not apply when M is unbounded asthe crucial condition (4.4) is violated then.

Theorem 5.26. Let M ⊂ Sn+ be nonempty, convex and compact. Then Ω∗M is finite-valuedand given by

Ω∗(X) =1

2minV ∈M

tr (XTV †X) | rgeX ⊂ rgeV

.

Proof. Let K := Sn+. Recall that (cone M) = hzn σM . Then F given by (5.32) is K-convex by Lemma 5.24 and Corollary 5.25 a), and g is K-increasing by Lemma 4.10 andCorollary 5.25 a). By Corollary 5.25 c) we find that condition (4.4) for σM M and K is

93

satisfied if (and only if) M is bounded. Hence, we can apply Corollary 4.9 and Lemma4.5 a) and Corollary 5.11 to infer that

Ω∗(X) = minV ∈−K

δM(V ) + 〈V, F 〉∗ (X)

= minV ∈M

σK-epiF (X,−V )

= minV ∈M

σΩ(X, V )

= minV ∈M

γ(X, V )

= minV ∈M

1

2tr (XTV †X) | rgeX ⊂ rgeV

As M is compact this proves also the finite-valuedness.

5.5.2 Relation of the GMF and the nuclear norm

We now want to exapand on the intriguing relation between the GMF and the nuclearthat was already foreshadowed in Example 5.1. Here we use the following notation forthe matrix A ∈ Rp×n.

KerA :=V ∈ Rn×n | AV = 0

and RgeA :=

W ∈ Rn×n | rgeW ⊂ rgeA

.

Theorem 5.27. Let p : Rn×m → R be defined by

p(X) = infV ∈Sn

σΩ(A,0)(X, V ) +⟨U , V

⟩for some U ∈ Sn+ ∩KerA and set C(U) :=

Y∣∣ 1

2Y Y T U

. Then we have:

a) p∗ = δC(U)∩KerA is closed, proper, convex.

b) p = σC(U)∩KerA = γC(U)+RgeAT is sublinear, finite-valued, nonnegative and symmet-ric (i.e. a seminorm).

c) If U 0 with 2U = LLT (L ∈ Rn×n) and A = 0 then p = σC(U) = ‖LT (·)‖∗, i.e. pis a norm with C(U) as its unit ball and γC(U) as its dual norm.

Proof. a) Observe that σΩ(A,0)(X, V ) +⟨U , V

⟩= σΩ(A,0)+0×U(X, V ). As Ω(A, 0) +

0×U is nonempty, closed and convex, Theorem 3.101 yields p∗ = δΩ(A,0)+0×U(·, 0)which is closed and convex. Now observe that

dom δΩ(A,0)+0×U(·, 0) =

Y

∣∣∣∣ Y ∈ KerA,1

2Y Y T − U ∈ KA

= KerA ∩ C(U)

6= ∅.

Here we use the first identity uses the definition of Ω(A, 0). The second one is due to thefact that with RgeY Y T = RgeY ⊂ KerA and U ∈ KerA. The nonemptiness is clear as

94

0 ∈ KerA ∩ C(U). All in all, we have p∗ = δKerA∩C(U) which is closed proper, convex, seeTheorem 3.101.

b) We have

p = p∗∗

= σC(U)∩KerA

= γ(C(U)∩KerA)

= γcl (C(U)+RgeAT )

= γC(U)+RgeAT .

The first identity is due to Theorem 3.101 c). The second uses a), the third followsfrom Proposition 3.74. The sublinearity of p is clear. The finite-valuedness is also clearsince the domain of p is the whole space (clear?) and p is proper. Since 0 ∈ C(U) thenonnegativity follows as well, and the symmetry is due to the symmetry of C(U).

c) Consider the case U = 12I: By part a), we have p∗ = δY | Y Y TI . Observe that

Y∣∣ Y Y T I

= Y | ‖Y ‖2 ≤ 1 =: BΛ

is the closed unit ball of the spectral norm. Therefore, p = σBΛ= ‖ · ‖BΛ = ‖ · ‖∗, cf.

Corollary 3.76.To prove the general case suppose that 2U = LLT . Then it is clear that C(U) =

Y∣∣ L−1Y ∈ C(1

2I)

, and therefore

p(X) = σC(U)(X)

= supY :L−1Y ∈C( 1

2I)

〈Y, X〉

= supY :L−1Y ∈C( 1

2I)

⟨L−1Y, LTX

⟩= σC( 1

2I)(L

TX)

= ‖LTX‖∗.

Here the first identity is due to part b) (with A = 0) and the last one follows from thespecial case considered above.

We point out that Theorem 5.27 significantly generalizes the result by Hsieh and Olseneluded to in Example 5.1.

For another result along these lines, which also generalized the Hsieh and Olsen result,we refer the interested reader to [7, Theorem 5.7].

A Background from Linear Algebra

Here we gather some background material and notation from linear algebra that we usein our study. Most of the results (unless stated otherwise) can be found in e.g. [17, 18].

95

Symmetric and orthogonal matrices

Recall that the setO(n) :=

A ∈ Rn×n ∣∣ ATA = I

is a group (in fact, a subgroup of the the invertible matrices in Rn×n) called the orthogonalgroup with its members being called orthogonal matrices.

An important subspace of Rn×n is

Sn =A ∈ Rn×n ∣∣ AT = A

,

the space of all n× n symmetric matrices.In Linear Algebra it is shown that any symmetric matrix is orthogonally similar to

a diagonal matrix with real entries (the eigenvalues), which is usually subsumed in thefollowing theorem.

Theorem A.1 (Spectral Theorem). Let A ∈ Sn. Then there exists U ∈ O(n) orthogonalsuch that

A = UTdiag (λ1, . . . , λn)U

and λ1, . . . , λn ∈ R.

Some important subsets of Sn are:

• Sn+ :=A ∈ Sn

∣∣ xTAx ≥ 0 (x ∈ Rn)

(positive semidefinite matrices),

• Sn++ :=A ∈ Sn

∣∣ xTAx > 0 (x ∈ Rn \ 0)

(positive definite matrices),

• Sn− :=A ∈ Sn

∣∣ xTAx ≤ 0 (x ∈ Rn)

(negative semidefinite matrices),

• Sn−− :=A ∈ Sn

∣∣ xTAx < 0 (x ∈ Rn \ 0)

(negative definite matrices).

Note thatSn− = −Sn+ and Sn−− = −Sn++.

For A,B ∈ Sn, we also make use of the convention

A B :⇐⇒ A−B ∈ Sn+

andA B :⇐⇒ A−B ∈ Sn++.

An immediate consequence of the spectral theorem is the existence of a square root of apositive semidefinite matrix.

Corollary A.2 (Square root of a positive semidefinite matrix). For A ∈ Sn+ there existsa unique matrix B ∈ Sn+ such that B2 = A.

In the scenario of Corollary A.2, we put√A := A1/2 := B and call it the square root of

A. Moreover, we define A−1/2 :=√A−1 = (

√A)−1.

96

Singular value decomposition and the Moore-Penrose Pseudoinverse

An important theoretical and computational tool for matrix analysis is the following well-known theorem which is also a consequence of the spectral theorem applied to the positivesemidefinite matrix

√ATA.

Theorem A.3 (Singular-value decomposition). Let A ∈ Rm×n and put r := rankA. Thenthere exist orthogonal matrices U ∈ Rm×m and V ∈ Rn×n as well as a matrix Σ ∈ Rm×n

with

Σ =

(diag (σ1, . . . σr) 0

0 0

), σ1, . . . , σr > 0

such that A = UΣV T .

The scalars σ1, . . . , σr are called the singular-values of A, and they coincide with thepositive eigenvalues of

√ATA.

The singular-value decomposition of a matrix gives rise to a whole class of matrixnorms, namely the Schatten norms. For A ∈ Rm×n the Schatten p-norm is the p-norm ofthe vector of singular values, i.e.

‖A‖p := ‖σ‖p :=

(r∑i=1

σpi

) 1p

,

where σ = (σ1, . . . , σr) is the vector of singular-values of A. The nuclear norm is theSchatten 1-norm, and often denoted by ‖ · ‖∗.

Theorem A.4 (Moore-Penrose pseudoinverse). Let A ∈ Sn+ with rankA = r and thespectral decomposition

A = QΛQT with Λ =

λ1

...λr

0...

0

, Q ∈ O(n).

Then the matrix

A† := QΛ†QT with Λ† :=

λ−1

1

...λ−1r

0...

0

,

called the (Moore-Penrose) pseudoinverse of A, has the following properties:

a) AA†A = A and A†AA† = A†;

b) (AA†)T = AA† and (A†A)T = A†A;

c) (A†)T = (AT )†;

97

d) If A 0, then A† = A−1;

e) AA† = PrgeA, i.e. AA† is the projection onto the image of A. In particular, ifb ∈ rgeA, we have

x ∈ Rn | Ax = b = A†b+ kerA.

In fact, the Moore-Penrose pseudoinverse can be uniquely defined through properties a)and b) from above for any matrix A ∈ Cm×n, see, e.g. [18], but we confine ourselves withthe positive semidefinite case.

Using the Moore-Penrose pseudoinverse, one has the following extension of the Schurcomplement, see [16, Th. 16.1] for a proof or [6, App. A.5.5].

Lemma A.5 (Schur complement). Let S ∈ Sn, T ∈ Sm, R ∈ Rn×m. Then(S RRT T

) 0 ⇐⇒ [S 0, rgeR ⊂ rgeS, T −RTS†R 0].

The next result is referred to in the literature as Finsler’s Lemma and originally goesback to [14], and can also be found in [10, Th. 2].

Lemma A.6 (Finsler’s lemma). Let A ∈ Rp×n and V ∈ Sn. Then

V kerA 0 ⇐⇒ ∃ε > 0 : V + εATA 0.

Exercises for Section A

1.0.1 (Trace of product of semdefinite matrices) Let A,B ∈ Sn+.

a) Is AB symmetric positive definite?

b) Show that tr (AB) ≥ 0.

98

References

[1] H. Attouch and H. Brezis: Duality for the Sum of Convex Functions in GeneralBanach Spaces. In Aspects of Mathematics and its Applications, Eds. l.A. Barroso,North Holland, Amsterdam, 1986, pp.125–133.

[2] H.H. Bauschke and P.L. Combettes: Convex analysis and Monotone Operator Theoryin Hilbert Spaces. CMS Books in Mathematics, Springer-Verlag, New York, 2011.

[3] J.M. Borwein and A.S. Lewis: Convex Analysis and Nonlinear Optimization.Theory and Examples. CMS Books in Mathematics, Springer-Verlag, New York, 2000.

[4] J. M. Borwein: Optimization with respect to Partial Orderings. Ph.D. Thesis,University of Oxford, 1974.

[5] R.I. Bot and G. Wanka: The conjugate of the pointwise maximum of two convexfunctions revisited. Journal of Global Optimization 41(3), 2008, pp. 625–632.

[6] S. Boyd and L. Vandenbergh: Convex Optimization. Cambridge UniversityPress, 2004.

[7] J. V. Burke and T. Hoheisel: Matrix support functionals for inverse problems,regularization, and learning. SIAM Journal on Optimization 25, 2015, pp. 1135–1159.

[8] J. V. Burke, Y. Gao and T. Hoheisel: Convex Geometry of the GeneralizedMatrix-Fractional Function. SIAM Journal on Optimization, 28, 2018, pp. 2189–2200.

[9] J. V. Burke, Y. Gao, and T. Hoheisel: Variational properties of matrix func-tions via the generalized matrix-fractional function. Submitted to SIAM Journal onOptimization.

[10] Y. Chabrillac and J.-P. Crouzeix: Definiteness and semidefiniteness ofquadratic forms. Linear Algebra and its Applications 63, 1984, pp. 283–292.

[11] J. Dattorro: Convex Optimization & Euclidean Distance Geometry. Mεβoo Pub-lishing USA, Version 2014.04.08, 2005 .

[12] M. Faliva and M.G. Zoia: On a partitioned inversion formula having usefulapplications in econometrics. Econometric Theory 18, 2002, pp. 525–530.

[13] M. Fazel:Matrix Rank Minimization with Applications. PhD Thesis, Stanford Uni-versity, 2002.

[14] P. Finsler: Uber das Vorkommen definiter und semidefiniter Formen und Scharenquadratischer Formen. Commentarii Mathematici Helvetici 9, 1937, pp. 188–192.

[15] S.P. Fitzpatrick and S. Simons: On the pointwise maximum of convex functions.Proceedings of the American Mathematical Society 128(12), 2000, pp. 3553–3561.

99

[16] J. Gallier: Geometric Methods and Applications: For Computer Science and En-gineering. Texts in Applied Mathematics, Springer New York, Dordrecht, London,Heidelberg, 2011.

[17] R.A. Horn and C.R. Johnson: Matrix Analysis. Second Edition, CambridgeUniversity Press, Cambridge, 2013.

[18] R.A. Horn and C.R. Johnson: Topics in Matrix Analysis. Cambridge UniversityPress, Cambridge, 1991.

[19] J.-B. Hiriart-Urruty and C. Lemarechal: Fundamentals of Convex Analysis.Springer-Verlag, New York, N.Y., 2001.

[20] C.-J. Hsieh and P. Olsen: Nuclear norm minimization via active subspace selec-tion. Proceedings of the 31st International Conference on Machine Learning (ICML-14), 2014, pp. 575-583.

[21] A. Jalali, M. Fazel, and L. Xiao: Variational Gram Functions: Convex Anal-ysis and Optimization. SIAM Journal on Optimization 27(4), 2017,pp. 2634–2661.

[22] J.R. Magnus and H. Neudecker: Matrix Differential Calculus with Applicationsin Statistics and Econometrics. John Wiley & Sons, New York, NY, 1988.

[23] Teemu Pennanen: Graph-Convex Mappings and K-Convex Functions. Journal ofConvex Analysis 6(2), 1999, pp. 235–266.

[24] R.T. Rockafellar: Convex Analysis. Princeton University Press, Princeton, NewJersey, 1970.

[25] R.T. Rockafellar and R.J.-B. Wets: Variational Analysis. A Series of Com-prehensive Studies in Mathematics, Vol. 317, Springer, Berlin, Heidelberg, 1998.

100

lecture notes - mcgill university · this set of notes was created for a series of invited lectures...

Documents