OPTIMIZATION OVER SYMMETRIC CONES UNDER UNCERTAINTY
By
BAHA’ M. ALZALG
A dissertation submitted in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
WASHINGTON STATE UNIVERSITY
Department of Mathematics
DECEMBER 2011
To the Faculty of Washington State University:
The members of the Committee appointed to examine the dissertation of
BAHA’ M. ALZALG find it satisfactory and recommend that it be accepted.
K. A. Ariyawansa, Professor, Chair
Robert Mifflin, Professor
David S. Watkins, Professor
For all the people
ACKNOWLEDGEMENTS
My greatest appreciation and my most sincere “Thank You!” go to my advisor Pro-
fessor Ari Ariyawansa for his guidance, advice, and help during the preparation of this
dissertation. I am also grateful for his offer to work as his research assistant and
for giving me the opportunity to write papers and attend several conferences and workshops
in North America and overseas.
I wish also to express my appreciation and gratitude to Professor Robert Mifflin and
Professor David S. Watkins for taking the time to serve as committee members and for
various ways in which they helped me during all stages of doctoral studies.
I want to thank all faculty, staff and graduate students in the Department of Mathe-
matics at Washington State University. I would especially like to thank Associate Professor
Bala Krishnamoorthy from the faculty, Kris Johnson from the staff, and Pietro Paparella
from the students for their kind help.
Finally, no words can express my gratitude to my parents and my grandmother for
their love and prayers. I also owe special gratitude to my brothers and sisters, and to
relatives in Jordan, for their support and encouragement.
OPTIMIZATION OVER SYMMETRIC CONES UNDER UNCERTAINTY
Abstract
by Baha’ M. Alzalg, Ph.D.
Washington State University
December 2011
Chair: Professor K. A. Ariyawansa
We introduce and study two-stage stochastic symmetric programs (SSPs) with recourse to handle uncertainty in data defining (deterministic) symmetric programs in which a linear function is minimized over the intersection of an affine set and a symmetric cone. We present a logarithmic barrier decomposition-based interior point algorithm for solving these problems and prove its polynomial complexity. Our convergence analysis proceeds by showing that the log barrier associated with the recourse function of SSPs behaves as a strongly self-concordant barrier and forms a self-concordant family on the first-stage solutions. Since our analysis applies to all symmetric cones, this algorithm extends Zhao's results [48] for two-stage stochastic linear programs, and Mehrotra and Ozevin's results [25] for two-stage stochastic semidefinite programs (SSDPs). We also present another class of polynomial-time decomposition algorithms for SSPs based on the volumetric barrier. While this extends the work of Ariyawansa and Zhu [10] for SSDPs, our analysis is based on utilizing the advantage of the special algebraic structure associated with the symmetric cone not utilized in [10]. As a consequence, we are able to significantly simplify the proofs of central results. We then describe four applications leading to the SSP problem where, in particular, the underlying symmetric cones are second-order cones and rotated quadratic cones.
Contents
Acknowledgement iv
Abstract v
1 Introduction and Background 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 What is a symmetric cone? . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3 Symmetric cones and Euclidean Jordan algebras . . . . . . . . . . . . . . . 11
2 Stochastic Symmetric Optimization Problems 28
2.1 The stochastic symmetric optimization problem . . . . . . . . . . . . . . . 28
2.1.1 Definition of an SSP in primal standard form . . . . . . . . . . . . 29
2.1.2 Definition of an SSP in dual standard form . . . . . . . . . . . . . . 30
2.2 Problems that can be cast as SSPs . . . . . . . . . . . . . . . . . . . . . . 31
3 A Class of Polynomial Logarithmic Barrier Decomposition Algorithms
for Stochastic Symmetric Programming 42
3.1 The log barrier problem for SSPs . . . . . . . . . . . . . . . . . . . . . . . 43
3.1.1 Formulation and assumptions . . . . . . . . . . . . . . . . . . . . . 43
3.1.2 Computation of ∇xη(µ,x) and ∇2xxη(µ,x) . . . . . . . . . . . . . . 50
3.2 Self-concordance properties of the log-barrier recourse . . . . . . . . . . . . 53
3.2.1 Self-concordance of the recourse function . . . . . . . . . . . . . . . 53
3.2.2 Parameters of the self-concordant family . . . . . . . . . . . . . . . 59
3.3 A class of logarithmic barrier algorithms for solving SSPs . . . . . . . . . . 64
3.4 Complexity analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.4.1 Complexity for short-step algorithm . . . . . . . . . . . . . . . . . . 66
3.4.2 Complexity for long-step algorithm . . . . . . . . . . . . . . . . . . 67
4 A Class of Polynomial Volumetric Barrier Decomposition Algorithms
for Stochastic Symmetric Programming 75
4.1 The volumetric barrier problem for SSPs . . . . . . . . . . . . . . . . . . . 76
4.1.1 Formulation and assumptions . . . . . . . . . . . . . . . . . . . . . 76
4.1.2 The volumetric barrier problem for SSPs . . . . . . . . . . . . . . . 77
4.1.3 Computation of ∇xη(µ,x) and ∇2xxη(µ,x) . . . . . . . . . . . . . . 80
4.2 Self-concordance properties of the volumetric barrier recourse . . . . . . . . 85
4.2.1 Self-Concordance of η(µ, ·) . . . . . . . . . . . . . . . . . . . . . . . 86
4.2.2 Parameters of the self-concordant family . . . . . . . . . . . . . . . 93
4.3 A class of volumetric barrier algorithms for solving SSPs . . . . . . . . . . 97
4.4 Complexity analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.4.1 Complexity for short-step algorithm . . . . . . . . . . . . . . . . . . 100
4.4.2 Complexity for long-step algorithm . . . . . . . . . . . . . . . . . . 101
5 Some Applications 106
5.1 Two applications of SSOCPs . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.1.1 Stochastic Euclidean facility location problem . . . . . . . . . . . . 106
5.1.2 Portfolio optimization with loss risk constraints . . . . . . . . . . . 112
5.2 Two applications of SRQCPs . . . . . . . . . . . . . . . . . . . . . . . . . 118
5.2.1 Optimal covering random ellipsoid problem . . . . . . . . . . . . . . 118
5.2.2 Structural optimization . . . . . . . . . . . . . . . . . . . . . . . . . 125
6 Related Open problems: Multi-Order Cone Programming Problems 130
6.1 Multi-order cone programming problems . . . . . . . . . . . . . . . . . . . 131
6.2 Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
6.3 Multi-order cone programming problems over integers . . . . . . . . . . 138
6.4 Multi-order cone programming problems under uncertainty . . . . . . . . 139
6.5 An application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
6.5.1 CERFLPs—An MOCP model . . . . . . . . . . . . . . . . . . . . . 142
6.5.2 DERFLPs—A 0-1MOCP model . . . . . . . . . . . . . . . . . . . . 143
6.5.3 ERFLPs with integrality constraints—An MIMOCP model . . . . . 145
6.5.4 Stochastic CERFLPs—An SMOCP model . . . . . . . . . . . . . . 146
7 Conclusion 150
List of Abbreviations
CERFLP continuous Euclidean-rectilinear facility location problem
CFLP continuous facility location problem
DERFLP discrete Euclidean-rectilinear facility location problem
DFLP discrete facility location problem
DLP deterministic linear programming
DRQCP deterministic rotated quadratic cone programming
DSDP deterministic semidefinite programming
DSOCP deterministic second-order cone programming
DSP deterministic symmetric programming
EFLP Euclidean facility location problem
ERFLP Euclidean-rectilinear facility location problem
ESFLP Euclidean single facility location problem
FLP facility location problem
KKT Karush-Kuhn-Tucker conditions
MFLP multiple facility location problem
MIMOCP mixed integer multi-order cone programming
MOCP (deterministic) multi-order cone programming
0-1MOCP 0-1 multi-order cone programming
POCP pth-order cone programming
RFLP rectilinear facility location problem
SFLP stochastic facility location problem
SLP stochastic linear programming
SMOCP stochastic multi-order cone programming
SRQCP stochastic rotated quadratic cone programming
SSDP stochastic semidefinite programming
SSOCP stochastic second-order cone programming
SSP stochastic symmetric programming
Arw(x) the arrow-shaped matrix associated with the vector x
Aut(K) the automorphism group of a cone K
diag(·) the operator that maps its argument to a block diagonal matrix
GL(n,R) the general linear group of degree n over R
int(K) the interior of a cone K
En the n dimensional real vector space whose elements are indexed from 0
En+ the second-order cone of dimension n
En+ the rotated quadratic cone of dimension n
Hn the space of complex Hermitian matrices of order n
Hn+ the space of complex Hermitian semidefinite matrices of order n
KJ the cone of squares of a Euclidean Jordan algebra J
Qnp the pth-order cone of dimension n
QHn the space of quaternion Hermitian matrices of order n
QHn+ the space of quaternion Hermitian semidefinite matrices of order n
Sn the space of real symmetric matrices of order n
Sn+ the space of real symmetric positive semidefinite matrices of order n
Rn the space of real vectors of dimension n
Rn+ the nonnegative orthant of Rn
||x|| the Frobenius norm of an element x
||x||2 the Euclidean norm of a vector x
||x||p the p-norm of a vector x
⌊x⌋ the linear representation of an element x
⌈x⌉ the quadratic representation of an element x
X ⪰ 0 the matrix X is positive semidefinite
X ≻ 0 the matrix X is positive definite
x ⪰ 0 the vector x lies in a second-order cone of an appropriate dimension
x ≻ 0 the vector x lies in the interior of a second-order cone of an appropriate dimension
x ⪰N 0 the vector x lies in the Cartesian product of N second-order cones with appropriate dimensions
x ⪰ 0 the vector x lies in a rotated quadratic cone of an appropriate dimension
x ⪰N 0 the vector x lies in the Cartesian product of N rotated quadratic cones with appropriate dimensions
x ⪰KJ 0 x is an element of a symmetric cone KJ
x ≻KJ 0 x is an element of the interior of a symmetric cone KJ
x ⪰〈p〉 0 the vector x lies in a pth-order cone of an appropriate dimension
x ⪰N〈p〉 0 the vector x lies in the Cartesian product of N pth-order cones with appropriate dimensions
x ⪰〈p1,p2,...,pN〉 0 the vector x lies in the Cartesian product of N cones of orders p1, p2, . . . , pN and with appropriate dimensions
x ∘ y the Jordan multiplication of elements x and y of a Jordan algebra
x • y the inner product trace(x ∘ y) of elements x and y of a Euclidean Jordan algebra
Chapter 1
Introduction and Background
1.1 Introduction
The purpose of this dissertation is to introduce the two-stage stochastic symmetric pro-
grams (SSPs)1 with recourse and to study this problem in the dual standard form:
max cTx+ E [Q(x, ω)]
s.t. Ax+ ξ = b
ξ ∈ K1,
(1.1.1)
where x and ξ are the first-stage decision variables, and Q(x, ω) is the maximum value of the problem
max d(ω)Ty
s.t. W (ω)y + ζ = h(ω)− T (ω)x
ζ ∈ K2,
(1.1.2)
1Based on tradition in the optimization literature, we use the term stochastic symmetric program to mean the generic form of a problem, and the term stochastic symmetric programming to mean the field of activities based on that problem. While both will be denoted by the acronym SSP, the plural of the first usage will be denoted by the acronym SSPs. The acronyms DLP, DRQCP, DSDP, DSOCP, DSP, MIMOCP, MOCP, 0-1MOCP, SLP, SMOCP, SRQCP, SSDP, and SSOCP are defined and used in accordance with this custom.
where y and ζ are the second-stage variables, E[Q(x, ω)] := ∫Ω Q(x, ω)P (dω), the matrix
A and the vectors b and c are deterministic data, and the matrices W (ω) and T (ω) and
the vectors h(ω) and d(ω) are random data whose realizations depend on an underlying
outcome ω in an event space Ω with a known probability function P . The cones K1 and K2
are symmetric cones (i.e., closed, convex, pointed, self-dual cones with their automorphism
groups acting transitively on their interiors) in Rn1 and Rn2 . (Here, n1 and n2 are positive
integers.)
The birth of symmetric programming (also known as symmetric cone programming
[36]) as a subfield of convex optimization can be dated back to 2003. The main motivation
of this generalization was its ability to handle many important applications that cannot be
covered by linear programming. In symmetric programs, we minimize a linear function
over the intersection of an affine set and a so-called symmetric cone. In particular, if
the symmetric cone is the nonnegative orthant, the result is linear programming; if
it is the second-order cone, the result is second-order cone programming; and if it is the
semidefinite cone (the cone of all real symmetric positive semidefinite matrices), the result
is semidefinite programming. Symmetric programming is thus a generalization
of linear programming that includes second-order cone programming and semidefinite
programming as special cases. We shall refer to such problems as deterministic linear
programs (DLPs), deterministic second-order cone programs (DSOCPs), deterministic
semidefinite programs (DSDPs) and deterministic symmetric programs (DSPs) because
the data defining applications leading to such problems is assumed to be known with
certainty. It has been found that DSPs are related to many application areas including
finance, geometry, robust linear programming, matrix optimization, norm minimization
problems, and relaxations in combinatorial optimization. We refer the reader to the survey
papers by Todd [38] and Vandenberghe and Boyd [41] which discuss, in particular, DSDP
and its applications, and the survey papers of Alizadeh and Goldfarb [1], and Lobo, et al.
[23] which discuss, in particular, DSOCPs with a number of applications in many areas
including a variety of engineering applications.
Deterministic optimization problems are formulated to find optimal decisions in prob-
lems with certainty in data. In some applications, however, we cannot specify the model
entirely because it depends on information that is not available at the time of formu-
lation but will be determined at some point in the future. Stochastic programs have
been studied since the 1950s to handle problems that involve uncertainty in data. See
[17, 32, 33] and references contained therein. In particular, two-stage stochastic linear
programs (SLPs) have been established to formulate many applications (see [15] for ex-
ample) of linear programming with uncertain data. There are efficient algorithms (both
interior and noninterior point) for solving SLPs. The class of SSP problems may be viewed
as an extension of DSPs (by allowing uncertainty in data) on the one hand, and as an
extension of SLPs (where K1 and K2 are both cones of nonnegative orthants) or, more
generally, stochastic semidefinite programs (SSDPs) with recourse [9, 25] (where K1 and
K2 are both semidefinite cones) on the other hand.
Interior point methods [30] are considered to be one of the most successful classes of
algorithms for solving deterministic (linear and nonlinear) convex optimization problems.
This provides motivation to investigate whether decomposition-based interior point algo-
rithms can be developed for stochastic programming. Zhao [48] derived a decomposition
algorithm for SLPs based on a logarithmic barrier and proved its polynomial complexity.
Mehrotra and Ozevin [25] have proved important results that extend the work of Zhao
[48] to the case of SSDPs including a derivation of a polynomial logarithmic barrier de-
composition algorithm for this class of problems that extends Zhao’s algorithm for SLPs.
An alternative to the logarithmic barrier is the volumetric barrier of Vaidya [40] (see also
[5, 6, 7]). It has been observed [8] that certain cutting plane algorithms [21] for SLPs
based on the volumetric barrier perform better in practice than those based on the loga-
rithmic barrier. Recently, Ariyawansa and Zhu [10] have derived a class of decomposition
algorithms for SSDPs based on a volumetric barrier analogous to work of Mehrotra and
Ozevin [25] by utilizing the work of Anstreicher [7] for DSDP.
Concerning algorithms for DSPs, Schmieta and Alizadeh [35, 36] and Rangarajan [34]
have extended interior point algorithms for DSDP to DSP. Concerning algorithms for
SSPs, we know of no interior point algorithms for solving them that exploit the special
structure of the symmetric cone as is done in [35, 36, 34] for DSP. The question that
naturally arises is whether interior point methods can be derived for solving SSPs
and, if the answer is affirmative, whether we can prove the
polynomial complexity of the resulting algorithms. Our particular concern in this disser-
tation is to extend decomposition-based interior point methods for SSDPs to stochastic
optimization problems over all symmetric cones based on both logarithmic and volumet-
ric barriers and also to prove the polynomial complexity of the resulting algorithms. In
fact, there is a unifying theory [18] based on Euclidean Jordan algebras that connects all
symmetric cones. This theory is very important for a sound understanding of the Jordan
algebraic characterization of symmetric cones and, hence, a correct understanding of the
very close equivalence between Problem (1.1.1, 1.1.2) and our constructive definition of
an SSP which will be presented in Chapter 2.
Now let us indicate briefly how to solve Problem (1.1.1, 1.1.2). We assume that the
event space Ω is discrete and finite. In practice, the case of interest is when the random
data have a finite number of realizations, because the inputs of the modeling process
require that SSPs be solved for a finite number of scenarios. The general scheme of the
decomposition-based interior point algorithms is as follows. As we have an SSP with
finitely many scenarios, we can explicitly formulate the problem as a large-scale DSP; we
then use primal-dual interior point methods to solve the resulting large-scale formulation
directly, which can be successfully accomplished by utilizing the special structure of the
underlying symmetric cones.
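As a concrete illustration of this scheme, here is a minimal runnable sketch (all data made up, and restricted to the linear special case where K1 and K2 are nonnegative orthants so that SciPy's linprog applies): a two-scenario recourse problem is assembled into one large deterministic program.

import numpy as np
from scipy.optimize import linprog

# A minimal sketch (made-up data) of the finite-scenario scheme in the linear
# special case: min c'x + sum_k p_k d'y_k  s.t.  Ax = b,  T_k x + W y_k = h_k,
# x >= 0, y_k >= 0, assembled as one large deterministic program.
c = np.array([1.0, 2.0])                     # first-stage cost
A = np.array([1.0, 1.0]); b = 1.0            # one first-stage constraint
d = np.array([3.0, 1.0])                     # second-stage cost (both scenarios)
W = np.array([1.0, 1.0])                     # one recourse constraint per scenario
probs = [0.4, 0.6]                           # scenario probabilities
T = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
h = [2.0, 3.0]

n_x, n_y, K = c.size, d.size, len(probs)
obj = np.concatenate([c] + [p * d for p in probs])
Aeq = np.zeros((1 + K, n_x + K * n_y))
beq = np.array([b] + h)
Aeq[0, :n_x] = A
for k in range(K):
    Aeq[1 + k, :n_x] = T[k]
    Aeq[1 + k, n_x + k * n_y : n_x + (k + 1) * n_y] = W

res = linprog(obj, A_eq=Aeq, b_eq=beq, bounds=[(0, None)] * (n_x + K * n_y))
print(res.x[:n_x], res.fun)                  # first-stage decision and optimal value

The block structure of Aeq (first-stage constraints on top, one coupled block per scenario) is exactly the structure that the decomposition algorithms of Chapters 3 and 4 exploit rather than factoring the full system at once.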
This dissertation is organized as follows. In the remaining part of this chapter, we
outline a minimal foundation of the theory of Euclidean Jordan algebras. Based on
Euclidean Jordan algebras, in Chapter 2 we explicitly introduce the constructive definition
of the SSP problem in both the primal and dual standard forms and then introduce six general
classes of optimization problems that can be cast as SSPs. The focus of Chapter 3 is on
extending the work of Mehrotra and Ozevin [25] to the case of SSPs by deriving a class
of logarithmic barrier decomposition algorithms for SSPs and establishing the polynomial
complexity of the resulting algorithms.
In Chapter 4, we extend the work of Ariyawansa and Zhu [10] to the case of SSPs
by deriving a class of volumetric barrier decomposition algorithms for SSPs and proving
polynomial complexity of certain members of the class of algorithms.
Chapter 5 is devoted to describing four applications leading to two important
special cases of SSPs. More specifically, we describe the stochastic Euclidean facility
location problem and the portfolio optimization problem with loss risk constraints as two
applications of SSPs when the symmetric cones K1 and K2 are both second-order cones,
and then describe the optimal covering random ellipsoid problem and an application in
structural optimization as two applications of SSPs when the symmetric cones K1 and K2
are both rotated quadratic cones (see §2.2 for definitions).
The material in Chapters 3-5 is independent, but it depends essentially on the
material of Chapters 1 and 2. So, after reading Chapter 2, one can proceed directly to
Chapter 3 and/or Chapter 4, which concern theoretical results (algorithms and complexity
analysis), and/or to Chapter 5, which concerns application models (see Figure 1.1).
In Chapter 6, we propose the so-called multi-order cone programming problem as a
new conic optimization problem. This problem is beyond the scope of this dissertation
because it is posed over non-symmetric cones, so we leave it as an unsolved open problem.
We indicate weak and strong duality relations of this optimization problem and describe
an application of it.
We conclude this dissertation in Chapter 7 by summarizing its contributions and
indicating some possible future research directions.
Figure 1.1: The material has been organized into seven chapters. After reading Chapter 2, the reader can proceed directly to Chapter 3, Chapter 4, and/or Chapter 5.
We gave a concise definition of symmetric cones immediately after Problem (1.1.1,
1.1.2). In §1.2, we write down this definition more explicitly, and in §1.3 we review the
theory of Euclidean Jordan algebras that connects all symmetric cones. The text of Faraut
and Koranyi [18] covers the foundations of this theory.
In Chapter 2, the reader will see how general and abstract the problem is. In order to
make our presentation concrete, we will describe two interesting examples of symmetric
cones in some detail throughout this chapter: the second-order cone and the cone of
real symmetric positive semidefinite matrices. Our presentation in §1.2 and §1.3 mostly
follows that of [18] and [36], and, in particular, most examples in §1.3 are taken from [36].
We now introduce some notation that will be used in the sequel. We use R to
denote the field of real numbers. If A ⊆ Rk and B ⊆ Rl, then their Cartesian product is
A× B := {(x;y) : x ∈ A and y ∈ B}. We use Rm×n to denote the vector space of real
m×n matrices. The matrices 0n, In ∈ Rn×n denote, respectively, the zero and the identity
matrices of order n (we will write 0n and In as 0 and I when n is known from the context).
All vectors we use are column vectors with superscript “T” indicating transposition. We
use “,” for adjoining vectors and matrices in a row, and use “;” for adjoining them in a
column. So, for example, if x, y, and z are vectors, we have:
(x;y; z) = (xT,yT, zT)T.
For each vector x ∈ Rn whose first entry is indexed with 0, we write x̄ for the subvector
consisting of entries 1 through n− 1; therefore x = (x0; x̄) ∈ R×Rn−1. We let En denote
the n-dimensional real vector space R× Rn−1 whose elements x are indexed with 0, and
denote the space of real symmetric matrices of order n by Sn.
1.2 What is a symmetric cone?
This section and the next are elementary. We start with some basic definitions,
leading us finally to the definition of a symmetric cone.
Let V be a finite-dimensional Euclidean vector space over R with inner product 〈·, ·〉.
A subset S of V is said to be convex if it is closed with respect to convex combinations
of finite subsets of S, i.e., for any λ ∈ (0, 1), x,y ∈ S implies that λx + (1 − λ)y ∈ S.
A subset K ⊂ V is said to be a cone if it is closed under scalar multiplication by positive
real numbers, i.e., if for any λ > 0, x ∈ K implies that λx ∈ K. A convex cone is a cone
that is also a convex set. A cone is said to be closed if it is closed with respect to taking
limits, solid if it has a nonempty interior, pointed if it contains no lines or, alternatively,
it does not contain two opposite nonzero vectors (so the origin is an extreme point), and
regular if it is a closed, convex, pointed, solid cone.
We denote by GL(n,R) the general linear group of degree n over R (i.e., the set of n×n invertible
matrices with entries from R, together with the operation of ordinary matrix multiplica-
tion). For a regular cone K ⊂ V , we denote by int(K) the interior of K, and by Aut(K)
the automorphism group of K, i.e., Aut(K) := {ϕ ∈ GL(n,R) : ϕ(K) = K}.
Definition 1.2.1. Let V be a finite-dimensional real Euclidean space. A regular cone K ⊂ V
is said to be homogeneous if for each u,v ∈ int(K), there exists an invertible linear map
ϕ : V −→ V such that
1. ϕ(K) = K, i.e., ϕ is an automorphism of K, and
2. ϕ(u) = v.
In other words, Aut(K) acts transitively on the interior of K.
A regular cone K ⊂ V is said to be self-dual if it coincides with its dual cone K∗, i.e.,

K = K∗ := {s ∈ V : 〈x, s〉 ≥ 0, ∀x ∈ K},
and symmetric if it is both homogeneous and self-dual.
Almost all conic optimization problems in real world applications are associated with
symmetric cones such as the nonnegative orthant cone, the second-order cone (see below
for definitions), and the cone of positive semi-definite matrices over the real or complex
numbers. Except for Chapter 6, which proposes an unsolved open optimization problem
over a non-symmetric cone, our focus in this dissertation is on those cones that are
symmetric.
The material in the rest of this section is not needed later because the theory of Jordan
algebras will lead us independently to the same results. However, we include this material
for the sake of clarity of the examples and completeness.
Example 1. The second-order cone En+.
We show that the second-order cone (also known as the quadratic cone, the Lorentz cone,
or the ice cream cone) of dimension n
En+ := {ξ ∈ En : ξ0 ≥ ‖ξ̄‖2},
with the standard inner product, 〈ξ, ζ〉 := ξTζ, is a symmetric cone.
In Lemma 6.2.1, we prove that the dual cone of the pth-order cone is the qth-order cone,
i.e.,
{ξ ∈ En : ξ0 ≥ ‖ξ̄‖p}∗ = {ξ ∈ En : ξ0 ≥ ‖ξ̄‖q},
where 1 ≤ p ≤ ∞ and q is conjugate to p in the sense that 1/p+ 1/q = 1. Picking p = 2
(hence q = 2) implies that (En+)∗ = En+. This demonstrates the self-duality of En+. So, to
prove that the cone En+ is symmetric, we only need to show that it is homogeneous. The
proof follows that of Example 1 in §2 of [18, Chapter I].
First, notice that En+ can be redefined as
En+ := {ξ ∈ En : ξTJξ ≥ 0, ξ0 ≥ 0},
where

J := (1, 0T; 0, −In−1).

Notice also that each element of the group G := {A ∈ Rn×n : ATJA = J} maps En+ onto
itself (because, for every A ∈ G, we have that (Aξ)TJ(Aξ) = ξTJξ), and so does the
direct product H := R+ × G. It now remains to show that the group H acts transitively
on the interior of En+. To do so, it is enough to show that, for any x ∈ int(En+), there exists
an element in H that maps e to x, where e := (1; 0) ∈ En.
Note that we may write x as x = λy with λ = √(xTJx) and y ∈ En. Moreover, there
exists a reflector matrix Q̄ such that ȳ = Q̄ (0; . . . ; 0; r), with r = ‖ȳ‖2. We then have

y0² − r² = y0² − ‖ȳ‖2² = yTJy = (1/λ²)xTJx = 1.

Therefore, there exists t ≥ 0 such that y0 = cosh t and r = sinh t.
We define

Q := (1, 0T; 0, Q̄) and Ht := (cosh t, 0T, sinh t; 0, In−2, 0; sinh t, 0T, cosh t).

Observing that Q,Ht ∈ G yields that QHt ∈ G, and therefore λQHt ∈ H. The result
follows by noting that x = λQHt e.
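The construction above can be checked numerically. The following sketch (assumed data: n = 4 and a made-up interior point x) builds λ, the reflector Q̄, and Ht with NumPy and verifies that the automorphism λQHt maps e to x.

import numpy as np

# A numeric check (made-up data) of the homogeneity construction for E^n_+:
# any interior point x equals lam * Q * Ht * e.
n = 4
x = np.array([2.0, 0.5, -0.3, 0.7])          # x0 > ||xbar||_2, so x in int(E^n_+)
J = np.diag([1.0] + [-1.0] * (n - 1))

lam = np.sqrt(x @ J @ x)                      # lambda = sqrt(x^T J x)
y = x / lam                                   # so y^T J y = 1
ybar = y[1:]
r = np.linalg.norm(ybar)

# Householder reflector Qbar with ybar = Qbar (0; ...; 0; r)
v = np.zeros(n - 1); v[-1] = r
u = v - ybar
Qbar = np.eye(n - 1) if np.allclose(u, 0) else np.eye(n - 1) - 2 * np.outer(u, u) / (u @ u)

t = np.arccosh(y[0])                          # y0 = cosh t and r = sinh t
Q = np.eye(n); Q[1:, 1:] = Qbar
Ht = np.eye(n)
Ht[0, 0] = Ht[-1, -1] = np.cosh(t)
Ht[0, -1] = Ht[-1, 0] = np.sinh(t)

e = np.zeros(n); e[0] = 1.0
assert np.allclose(Q.T @ J @ Q, J) and np.allclose(Ht.T @ J @ Ht, J)  # Q, Ht in G
assert np.allclose(lam * Q @ Ht @ e, x)       # the element of H maps e to x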
Example 2. The cone of real symmetric positive semidefinite matrices Sn+.
The interior of the cone of real symmetric positive semidefinite matrices of order n, Sn+,
consists of the positive definite matrices; its boundary consists of the singular positive
semidefinite matrices (those having at least one 0 eigenvalue); and the origin of the cone
is the zero matrix (the positive semidefinite matrix having all 0 eigenvalues). We now
show that Sn+, with the Frobenius inner product 〈U, V 〉 := trace(UV ),
is a symmetric cone.
We first verify that Sn+ is self-dual, i.e.,
Sn+ = (Sn+)∗ = {U ∈ Sn : trace(UV ) ≥ 0 for all V ∈ Sn+}.
We can easily prove that Sn+ ⊆ (Sn+)∗ by assuming that X ∈ Sn+ and observing that, for
any Y ∈ Sn+, we have

trace(XY ) = trace(X1/2Y X1/2) ≥ 0.

Here we used the positive semidefiniteness of X1/2Y X1/2 to obtain the inequality. In fact, any
U ∈ Sn can be written as U = QΛQT, where Q ∈ Rn×n is an orthogonal matrix and
Λ ∈ Rn×n is a diagonal matrix whose diagonal entries are the eigenvalues of U (see for
example Watkins [43, Theorem 5.4.20]). Therefore, if U ∈ Sn+ we have that

trace(U) = trace(QΛQT) = trace(ΛQQT) = trace(Λ) = λ1 + λ2 + · · ·+ λn ≥ 0.
To prove that (Sn+)∗ ⊆ Sn+, assume that X ∉ Sn+; then there exists a nonzero vector y ∈ Rn
such that trace(X yyT) = yTXy < 0, which shows that X ∉ (Sn+)∗.
So, to prove that the cone Sn+ is symmetric, it remains to show that it is homogeneous.
The proof follows exactly that of Example 2 in §2 of [18, Chapter I].
For P ∈ GL(n,R), we define a linear map ϕP : Sn −→ Sn by
ϕP (X) = PXPT.
Note that ϕP maps Sn+ into itself. By the Cholesky decomposition theorem, if X ∈ int(Sn+)
then X can be decomposed into a product X = LLT = ϕL(In), where L ∈ GL(n,R), which
establishes the result.
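As a quick numeric sanity check of this argument (random, made-up matrices), one can verify with NumPy that the Cholesky factor L of a positive definite X gives ϕL(In) = X and that ϕL preserves positive semidefiniteness.

import numpy as np

# phi_P(X) = P X P^T is an automorphism of S^n_+; the Cholesky factor of a
# positive definite X maps the identity to X.
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
X = B @ B.T + 4 * np.eye(4)                  # positive definite, in int(S^n_+)

L = np.linalg.cholesky(X)                    # X = L L^T with L invertible
assert np.allclose(L @ np.eye(4) @ L.T, X)   # phi_L(I_n) = X

C = rng.standard_normal((4, 4))
Y = C @ C.T                                  # an arbitrary PSD matrix
assert np.all(np.linalg.eigvalsh(L @ Y @ L.T) >= -1e-10)   # phi_L(Y) stays PSD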
1.3 Symmetric cones and Euclidean Jordan algebras
In this section we review some of the basic definitions and theory of Euclidean
Jordan algebras that are necessary for our subsequent development. We will also see how
we can use Euclidean Jordan algebras to obtain symmetric cones.
Let J be a finite-dimensional vector space over R. A map ∘ : J × J −→ J is
called bilinear if for all x,y, z ∈ J and for all α, β ∈ R, we have that (αx + βy) ∘ z =
α(x ∘ z) + β(y ∘ z) and x ∘ (αy + βz) = α(x ∘ y) + β(x ∘ z).
Definition 1.3.1. A finite-dimensional vector space J over R is called an algebra over R
if a bilinear map ∘ : J × J −→ J exists.

Let x be an element in an algebra J ; then for n ≥ 2 we define xn recursively by
xn := x ∘ xn−1.

Definition 1.3.2. Let J be a finite-dimensional algebra over R with a bilinear product ∘ :
J × J −→ J . Then (J , ∘) is called a Jordan algebra if for all x,y ∈ J
1. x ∘ y = y ∘ x (commutativity);

2. x ∘ (x2 ∘ y) = x2 ∘ (x ∘ y) (Jordan's axiom).
The product x ∘ y between two elements x and y of a Jordan algebra (J , ∘) is called the
Jordan multiplication between x and y. A Jordan algebra (J , ∘) has an identity element if
there exists a (necessarily unique) element e ∈ J such that x ∘ e = e ∘ x = x for all x ∈ J .

A Jordan algebra (J , ∘) is not necessarily associative, that is, x ∘ (y ∘ z) = (x ∘ y) ∘ z may
not hold in general. However, it is power associative, i.e., xp ∘ xq = xp+q for all integers
p, q ≥ 1.
Example 3. It is easy to see that the space Rn×n of n× n real matrices with the Jordan
multiplication X ∘ Y := (XY + Y X)/2 forms a Jordan algebra with identity In.

Example 4. It can be verified that the space En with the Jordan multiplication x ∘ y :=
(xTy; x0ȳ + y0x̄) forms a Jordan algebra with the identity (1; 0) ∈ En.
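Both products are easy to test numerically; the following sketch (random data) verifies Jordan's axiom x ∘ (x2 ∘ y) = x2 ∘ (x ∘ y) for the two products of Examples 3 and 4.

import numpy as np

# Check Jordan's axiom for X o Y = (XY + YX)/2 on S^n and for the E^n product.
rng = np.random.default_rng(1)

def jordan_mat(X, Y):                # S^n product of Example 3
    return (X @ Y + Y @ X) / 2

def jordan_en(x, y):                 # E^n product of Example 4
    return np.concatenate(([x @ y], x[0] * y[1:] + y[0] * x[1:]))

X, Y = rng.standard_normal((2, 3, 3))
X, Y = (X + X.T) / 2, (Y + Y.T) / 2
X2 = jordan_mat(X, X)
assert np.allclose(jordan_mat(X, jordan_mat(X2, Y)), jordan_mat(X2, jordan_mat(X, Y)))

x, y = rng.standard_normal((2, 4))
x2 = jordan_en(x, x)
assert np.allclose(jordan_en(x, jordan_en(x2, y)), jordan_en(x2, jordan_en(x, y)))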
Definition 1.3.3. A Jordan algebra J is called Euclidean if there exists an inner product
〈·, ·〉 on (J , ∘) such that for all x,y, z ∈ J

1. 〈x,x〉 > 0 ∀ x ≠ 0 (positive definiteness);

2. 〈x,y〉 = 〈y,x〉 (symmetry);

3. 〈x,y ∘ z〉 = 〈x ∘ y, z〉 (associativity).
That is, J admits a positive definite, symmetric, quadratic form which is also associative.
In the sequel, we consider only Euclidean Jordan algebras with identity. We simply
denote the Euclidean Jordan algebra (J , ) by J .
Example 5. The space Rn×n is not a Euclidean Jordan algebra. However, under the
operation “∘” defined in Example 3, the subspace Sn is a Jordan subalgebra of Rn×n, and,
indeed, is a Euclidean Jordan algebra with the inner product 〈X, Y 〉 = trace(X ∘ Y ) =
trace(XY ). While both the symmetry and the associativity can be easily proved by using the
fact that trace(XY ) = trace(Y X), the positive definiteness can be immediately obtained
by observing that trace(X2) > 0 for X ≠ 0.

Example 6. It is easy to verify that the space En (with “∘” defined as in Example 4)
is a Euclidean Jordan algebra with the inner product 〈x,y〉 = xTy.
Many of the results below also hold for the general setting of Jordan algebras, but
here we focus entirely on Euclidean Jordan algebras with identity as that generality is not
needed for our subsequent development.
Since a Euclidean Jordan algebra J is power associative, we can define the concepts
of rank, minimal and characteristic polynomials, eigenvalues, trace, and determinant for
it.
Definition 1.3.4. Let J be a Euclidean Jordan algebra. Then

1. for x ∈ J , deg(x) := min{r > 0 : {e,x,x2, . . . ,xr} is linearly dependent} is called
the degree of x;

2. rank(J ) := maxx∈J deg(x) is called the rank of J .
Let x be an element of degree d in a Euclidean Jordan algebra J . We define R[x] as
the set of all polynomials in x over R:
R[x] := {a0e+ a1x+ a2x2 + · · ·+ akxk : k a nonnegative integer, ai ∈ R}.

Since {e,x,x2, . . . ,xd} is linearly dependent, there exists a nonzero monic polynomial
q ∈ R[x] of degree d, such that q(x) = 0. In other words, there are real numbers
a1(x), a2(x), . . . , ad(x), not all zero, such that

q(x) := xd − a1(x)xd−1 + a2(x)xd−2 + · · ·+ (−1)dad(x)e = 0.
Clearly q is of minimal degree among those polynomials in R[x] which have the above
properties, so we call it (or, alternatively, the polynomial p(λ) := λd − a1(x)λd−1 +
a2(x)λd−2 + · · · + (−1)dad(x)) the minimal polynomial of x. Note that the minimal
polynomial of an element x ∈ J is unique, because otherwise, as it is monic, we would have
a contradiction to the minimality of its degree.
An element x ∈ J is called regular if deg(x) = rank(J ). If x is a regular element of a
Euclidean Jordan algebra, then we define its characteristic polynomial to be equal to its
minimal polynomial. We have the following proposition.
Proposition 1.3.1 ([18, Proposition II.2.1]). Let J be an algebra with rank r. The
set of regular elements is open and dense in J . There exist polynomials a1, a2, . . . , ar
such that the minimal polynomial of every regular element x is given by p(λ) = λr −
a1(x)λr−1 + a2(x)λr−2 + · · ·+ (−1)rar(x). The polynomials a1, a2, . . . , ar are unique and
ai is homogeneous with degree i.
The polynomial p(λ) is called the characteristic polynomial of the regular element x.
Since the set of regular elements is dense in J , by continuity we can extend characteristic
polynomials to all x in J . So, the minimal polynomial coincides with the characteristic
polynomial for regular elements and divides the characteristic polynomial of non-regular
elements.
Definition 1.3.5. Let x be an element in a rank-r algebra J , then its eigenvalues are the
roots λ1, λ2, . . . , λr of its characteristic polynomial p(λ) = λr − a1(x)λr−1 + a2(x)λr−2 +
· · ·+ (−1)rar(x).
Whereas the minimal polynomial has only simple roots, it is possible, in the case of
non-regular elements, that the characteristic polynomial have multiple roots. Indeed, the
characteristic and minimal polynomials have the same set of roots, except for their mul-
tiplicities. In fact, we can define the degree of x to be the number of distinct eigenvalues
of x.
Definition 1.3.6. Let x be an element in a rank-r algebra J , and λ1, λ2, . . . , λr be the
roots of its characteristic polynomial p(λ) = λr−a1(x)λr−1+a2(x)λr−2+· · ·+(−1)rar(x).
Then
1. trace(x) := λ1 + λ2 + · · ·+ λr = a1(x) is the trace of x in J ;
2. det(x) := λ1λ2 · · ·λr = ar(x) is the determinant of x in J .
Example 7. All these concepts (characteristic polynomials, eigenvalues, trace, determi-
nant, etc.) coincide with the corresponding concepts used in Sn. Observe that rank(Sn) = n because
deg(X) is the number of distinct eigenvalues of X, which is, indeed, at most n.
Example 8. We can easily prove the following quadratic identity for x ∈ En:

x2 − 2x0x+ (x0² − ‖x̄‖2²)e = 0.

Thus, we can define the polynomial p(λ) := λ2 − 2x0λ+ (x0² − ‖x̄‖2²) as the characteristic
polynomial of x in En, and its two roots, λ1,2 = x0 ± ‖x̄‖2, are the eigenvalues of x. We also
have that trace(x) = 2x0 and det(x) = x0² − ‖x̄‖2². Observe also that λ1 = λ2 if and only
if x̄ = 0, in which case x is a multiple of the identity. Thus, every x ∈ En − {αe : α ∈ R}
has degree 2. This implies that rank(En) = 2, which is, unlike the rank of Sn, independent
of the dimension of the underlying vector space.
For an element x ∈ J , let ⌊x⌋ : J −→ J be the linear map defined by ⌊x⌋y := x ∘ y,
for all y ∈ J . Note that ⌊x⌋e = x and ⌊x⌋x = x2. Note also that the operators ⌊x⌋ and
⌊x2⌋ commute, because, by Jordan's axiom, ⌊x⌋⌊x2⌋y = ⌊x2⌋⌊x⌋y.
Example 9. An equivalent way of dealing with symmetric matrices is dealing with vectors
obtained from the vectorization of symmetric matrices. The operator vec : Sn −→ Rn²
creates a column vector from a matrix X by stacking its column vectors below one another.
Note that

vec(X ∘ Y ) = vec((XY + Y X)/2) = (1/2)(vec(XY I) + vec(IY X)) = (1/2)(I ⊗X +X ⊗ I)vec(Y ),

so that ⌊X⌋ := (1/2)(I ⊗X +X ⊗ I), where we used the fact that vec(ABC) = (CT⊗A)vec(B)
to obtain the last equality (here, the operator ⊗ : Rm×n × Rk×l −→ Rmk×nl is the Kronecker
product, which maps the pair of matrices (A,B) into the matrix A⊗B whose (i, j) block is
aijB for i = 1, 2, . . . ,m and j = 1, 2, . . . , n). This gives the explicit formula of the ⌊·⌋
operator for Sn.
Example 10. The explicit formula of the ⌊·⌋ operator for En can be immediately given
by considering the following multiplication:

x ∘ y = (xTy; x0ȳ + y0x̄) = (x0, x̄T; x̄, x0I)y =: Arw(x)y,

so that ⌊x⌋ = Arw(x). Here Arw(x) ∈ Sn is the arrow-shaped matrix associated with the
vector x ∈ En.
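A short numeric check (random vectors; the helper arw below is an illustrative implementation, not code from the dissertation) confirms that the arrow-shaped matrix reproduces the En Jordan product.

import numpy as np

# Arw(x) y should equal x o y = (x^T y; x0*ybar + y0*xbar).
rng = np.random.default_rng(2)

def arw(x):
    n = x.size
    A = x[0] * np.eye(n)                 # x0 on the diagonal
    A[0, 1:] = x[1:]                     # xbar^T in the first row
    A[1:, 0] = x[1:]                     # xbar in the first column
    return A

x, y = rng.standard_normal((2, 5))
lhs = arw(x) @ y
rhs = np.concatenate(([x @ y], x[0] * y[1:] + y[0] * x[1:]))
assert np.allclose(lhs, rhs)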
For x,y ∈ J , let ⌈x,y⌉ : J −→ J be the quadratic operator defined by

⌈x,y⌉ := ⌊x⌋⌊y⌋+ ⌊y⌋⌊x⌋ − ⌊x ∘ y⌋.

Therefore, in addition to ⌊x⌋, we can define another linear map ⌈x⌉ : J −→ J associated
with x that is called the quadratic representation and defined by

⌈x⌉ := 2⌊x⌋² − ⌊x²⌋ = ⌈x,x⌉.
Example 11. Continuing Example 9 we have

⌈X,Z⌉vec(Y ) = ⌊X⌋⌊Z⌋vec(Y ) + ⌊Z⌋⌊X⌋vec(Y )− ⌊X ∘ Z⌋vec(Y )
= vec(X ∘ (Z ∘ Y )) + vec(Z ∘ (X ∘ Y ))− vec((X ∘ Z) ∘ Y )
= vec((XZY +XY Z + ZY X + Y ZX)/4) + vec((ZXY + ZY X +XY Z + Y XZ)/4) − vec((XZY + ZXY + Y XZ + Y ZX)/4)
= (1/2)(vec(XY Z) + vec(ZY X))
= (1/2)(X ⊗ Z + Z ⊗X)vec(Y ).

This shows that ⌈X,Z⌉ := (1/2)(X ⊗ Z + Z ⊗X), and, in particular, ⌈X⌉ := X ⊗X.
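The last identity can be confirmed numerically; the following sketch (random symmetric matrices, column-major vec) checks that (1/2)(X ⊗ Z + Z ⊗ X)vec(Y ) = (1/2)vec(XY Z + ZY X).

import numpy as np

# Numeric confirmation of the formula for the quadratic operator on S^n.
rng = np.random.default_rng(3)
X, Y, Z = rng.standard_normal((3, 3, 3))
X, Y, Z = (X + X.T) / 2, (Y + Y.T) / 2, (Z + Z.T) / 2

vec = lambda M: M.flatten(order="F")          # stack the columns
lhs = 0.5 * (np.kron(X, Z) + np.kron(Z, X)) @ vec(Y)
rhs = 0.5 * vec(X @ Y @ Z + Z @ Y @ X)
assert np.allclose(lhs, rhs)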
Example 12. Continuing Example 10, we can easily verify that

⌈x⌉ := 2Arw²(x)− Arw(x²) = (‖x‖2², 2x0x̄T; 2x0x̄, det(x)I + 2x̄x̄T).

Notice that ⌊e⌋ = ⌈e⌉ = I, trace(e) = r, and det(e) = 1 (since all the eigenvalues of e
are equal to one). Notice also that the linear operator ⌊x⌋ is symmetric with respect to
〈·, ·〉, because, by the associativity of the inner product 〈·, ·〉, we have that 〈⌊x⌋y, z〉 =
〈x ∘ y, z〉 = 〈x,y ∘ z〉 = 〈y, ⌊x⌋z〉. This implies that ⌈x⌉ is also symmetric with respect
to 〈·, ·〉. It is also easy to see that ⌊y + z⌋ = ⌊y⌋+ ⌊z⌋, and consequently
⌈x,y + z⌉ = ⌈x,y⌉+ ⌈x, z⌉.
As the operator ⌈X⌉ in Example 11 plays an important role in the development of
the interior point methods for DSDP (SSDP) and the operator in Example 12 plays an
important role in the development of the interior point methods for DSOCP (SSOCP),
we would expect that the operator ⌈x⌉ will play a similar role in the development of the
interior point methods for DSP (SSP).
A spectral decomposition is a decomposition of x into idempotents together with the
eigenvalues. Recall that two elements c1, c2 ∈ J are said to be orthogonal if c1 ∘ c2 = 0.
A set of elements of J is orthogonal if all its elements are mutually orthogonal to each
other. An element c ∈ J is said to be an idempotent if c2 = c. An idempotent is primitive
if it is non-zero and cannot be written as a sum of two (necessarily orthogonal) non-zero
idempotents.
Definition 1.3.7. Let J be a Euclidean Jordan algebra. Then a subset {c1, c2, . . . , cr}
of J is called:

1. a complete system of orthogonal idempotents if it is an orthogonal set of idempotents
where c1 + c2 + · · ·+ cr = e;

2. a Jordan frame if it is a complete system of orthogonal primitive idempotents.
Example 13. Let {q1, q2, . . . , qn} be an orthonormal subset of Rn (all its vectors are
mutually orthogonal and of unit norm). Then the set {q1q1T, q2q2T, . . . , qnqnT} is
a Jordan frame in Sn. In fact, by the orthonormality, we have that

(qiqiT) ∘ (qjqjT) = 0n if i ≠ j, and (qiqiT) ∘ (qjqjT) = qiqiT if i = j,

and that q1q1T + q2q2T + · · ·+ qnqnT = In.
Example 14. Let x be a vector in En with x̄ ≠ 0. It is easy to see that the set

{ (1/2)(1; x̄/‖x̄‖2), (1/2)(1; −x̄/‖x̄‖2) }

is a Jordan frame in En.
Theorem 1.3.1 (Spectral decomposition (I), [18]). Let J be a Euclidean Jordan algebra
with rank r. Then for x ∈ J there exist real numbers λ1, λ2, . . . , λr and a Jordan frame
{c1, c2, . . . , cr} such that x = λ1c1 +λ2c2 + · · ·+λrcr, and λ1, λ2, . . . , λr are the eigenvalues
of x.
It is immediately seen that the eigenvalues of elements of Euclidean Jordan algebras
are always real, which is not the case for non-Euclidean Jordan algebras.
Example 15. It is known that for any X ∈ Sn there exist an orthogonal matrix Q ∈ Rn×n
and a diagonal matrix Λ ∈ Rn×n such that X = QΛQT. In fact, λ1, λ2, . . . , λn, the
diagonal entries of Λ, and q1, q2, . . . , qn, the columns of Q, can be used to rewrite X
equivalently as

X = λ1 q1q1T + λ2 q2q2T + · · ·+ λn qnqnT,

which, in view of Example 13, gives the spectral decomposition (I) of X in Sn.
Example 16. Using Example 14, the spectral decomposition (I) of x in En can be obtained
by considering the following identity:

x = (x0 + ‖x̄‖2) (1/2)(1; x̄/‖x̄‖2) + (x0 − ‖x̄‖2) (1/2)(1; −x̄/‖x̄‖2) = λ1c1 + λ2c2.
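The following sketch (a made-up vector x) builds the Jordan frame of Examples 14 and 16 and verifies the defining properties of the frame together with the decomposition x = λ1c1 + λ2c2.

import numpy as np

# Jordan frame of x in E^n and its spectral decomposition (I).
x = np.array([1.5, 0.3, -0.8, 0.4])
x0, xbar = x[0], x[1:]
w = xbar / np.linalg.norm(xbar)

c1 = 0.5 * np.concatenate(([1.0],  w))
c2 = 0.5 * np.concatenate(([1.0], -w))
lam1 = x0 + np.linalg.norm(xbar)
lam2 = x0 - np.linalg.norm(xbar)

def jordan_en(u, v):                 # E^n Jordan product from Example 4
    return np.concatenate(([u @ v], u[0] * v[1:] + v[0] * u[1:]))

e = np.zeros(x.size); e[0] = 1.0
assert np.allclose(jordan_en(c1, c1), c1)      # c1 and c2 are idempotents
assert np.allclose(jordan_en(c2, c2), c2)
assert np.allclose(jordan_en(c1, c2), 0)       # mutually orthogonal
assert np.allclose(c1 + c2, e)                 # complete system: c1 + c2 = e
assert np.allclose(lam1 * c1 + lam2 * c2, x)   # x = lam1*c1 + lam2*c2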
Theorem 1.3.2 (Spectral decomposition (II), [18]). Let J be a Euclidean Jordan algebra.
Then for x ∈ J there exist unique real numbers λ1, λ2, . . . , λk, all distinct, and a unique
complete system of orthogonal idempotents {c1, c2, . . . , ck} such that x = λ1c1 +λ2c2 + · · ·+
λkck.
Continuing Examples 15 and 16, we have the following two examples [36].
Example 17. To write the spectral decomposition (II) of X in Sn, let λ1 > λ2 > · · · > λk
be the distinct eigenvalues of X such that, for each i = 1, 2, . . . , k, the eigenvalue λi has
multiplicity mi and orthonormal eigenvectors qi1, qi2, . . . , qimi. Then X can be written as

X = λ1C1 + λ2C2 + · · ·+ λkCk, where Ci := qi1qi1T + qi2qi2T + · · ·+ qimiqimiT,

and the set {C1, C2, . . . , Ck} is a complete system of orthogonal idempotents with
C1 + C2 + · · ·+ Ck = In. Notice that for each eigenvalue λi, the matrix Ci is uniquely
determined, even though the
corresponding eigenvectors qi1, qi2, . . . , qimi may not be unique.
Example 18. By Example 16, the eigenvalues of x ∈ En are λ1,2 = x0 ± ‖x̄‖2, and
therefore, as mentioned earlier, only multiples of the identity have repeated eigenvalues. In
fact, if x = αe for some α ∈ R, then its spectral decomposition (II) is simply αe (here
{e} is the singleton system of orthogonal idempotents).
Definition 1.3.8. Let J be a Euclidean Jordan algebra. We say that x ∈ J is positive
semidefinite (positive definite) if all its eigenvalues are nonnegative (positive).
Proposition 1.3.2 ([18, Proposition III.2.2]). If the elements x and y are positive defi-
nite, then the element ⌈x⌉y is so.
Definition 1.3.9. We say that two elements x and y of a Euclidean Jordan algebra
are simultaneously decomposed if there is a Jordan frame {c1, c2, . . . , cr} such that x =
λ1c1 + λ2c2 + · · ·+ λrcr and y = µ1c1 + µ2c2 + · · ·+ µrcr.
For x ∈ J , it is now possible to rewrite the definition of x2 as

x2 := λ1²c1 + λ2²c2 + · · ·+ λk²ck = x ∘ x.
We also have the following definition.
Definition 1.3.10. Let x be an element of a Euclidean Jordan algebra J with a spectral
decomposition x := λ1c1 + λ2c2 + · · ·+ λkck. Then
1. the square root of x is x1/2 := √λ1 c1 + √λ2 c2 + · · ·+ √λk ck, whenever all λi ≥ 0,
and undefined otherwise;

2. the inverse of x is x−1 := (1/λ1)c1 + (1/λ2)c2 + · · ·+ (1/λk)ck, whenever all λi ≠ 0, and
undefined otherwise.
More generally, if f is any real-valued continuous function, then it is also possible to
extend the above definition to define f(x) as

f(x) := f(λ1)c1 + f(λ2)c2 + · · ·+ f(λk)ck.

Observe that x−1 ∘ x = e. We call x invertible if x−1 is defined, and non-invertible or
singular otherwise. Note that every positive definite element is invertible and its inverse
is also positive definite.
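As an illustration (a made-up element of En with positive eigenvalues), the next sketch forms x1/2 and x−1 by applying f to the eigenvalues in the spectral decomposition and checks that x1/2 ∘ x1/2 = x and x−1 ∘ x = e.

import numpy as np

# f(x) := f(lam1) c1 + f(lam2) c2 in E^n, specialized to sqrt and inverse.
def spectral_en(x):
    x0, xbar = x[0], x[1:]
    w = xbar / np.linalg.norm(xbar)
    c1 = 0.5 * np.concatenate(([1.0],  w))
    c2 = 0.5 * np.concatenate(([1.0], -w))
    return (x0 + np.linalg.norm(xbar), c1), (x0 - np.linalg.norm(xbar), c2)

def apply_f(x, f):
    (l1, c1), (l2, c2) = spectral_en(x)
    return f(l1) * c1 + f(l2) * c2

def jordan_en(u, v):
    return np.concatenate(([u @ v], u[0] * v[1:] + v[0] * u[1:]))

x = np.array([2.0, 0.6, -1.0])                 # both eigenvalues positive
sqrt_x = apply_f(x, np.sqrt)
inv_x = apply_f(x, lambda t: 1.0 / t)
e = np.array([1.0, 0.0, 0.0])
assert np.allclose(jordan_en(sqrt_x, sqrt_x), x)   # (x^{1/2})^2 = x
assert np.allclose(jordan_en(inv_x, x), e)         # x^{-1} o x = e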
Remark 1. The equality x ∘ y = e may not imply that y = x−1, as can be seen in the
following equality [18, Chapter II]:

(1, 0; 0, −1) ∘ (1, 1; 1, −1) = (1, 0; 0, 1) = I,

where the first matrix X satisfies X = X−1 while the second matrix Y ≠ X−1.
We define the differential operator Dx : J −→ J by

Dxf(x) := (df(λ1)/dλ1)c1 + (df(λ2)/dλ2)c2 + · · ·+ (df(λk)/dλk)ck.
The Jacobian matrix ∇xf(x) is defined so that (∇xf(x))Ty = (Dxf(x)) ∘ y for all
y ∈ J . For n ≥ 2, we define Dnxf(x) recursively by Dnxf(x) := Dx(Dn−1xf(x)). For
instance, D2xx = Dx(Dxx) = Dxe = 0, and Dxx−1 = −x−2, provided that x is invertible.
More generally, if y is a function of x and they are both simultaneously decomposed,
then Dxf(y) := Dyf(y) ∘ Dxy. For example, Dxy−1 = −y−2 ∘ Dxy, provided that y
is invertible. Note that, if y and z are functions of x and they are all simultaneously
decomposed, then Dx(y ± z) = Dxy ± Dxz, and Dx(y ∘ z) = (Dxy) ∘ z + y ∘ (Dxz).
This differential operator will play an important role when computing
partial derivatives.
There is a one-to-one correspondence between Euclidean Jordan algebras and sym-
metric cones.
Definition 1.3.11. If J is a Euclidean Jordan algebra then its cone of squares is the set

KJ := {x2 : x ∈ J }.
Such a one-to-one correspondence between (cones of squares of) Euclidean Jordan
algebras and symmetric cones is given by the following fundamental result, which says
that a cone is symmetric if and only if it is the cone of squares of some Euclidean Jordan
algebra.
Theorem 1.3.3 (Jordan algebraic characterization of symmetric cones, [18]). A regular
cone K is symmetric iff K = KJ for some Euclidean Jordan algebra J .
The above result implies that an element is positive semidefinite if and only if it
belongs to the cone of squares, and it is positive definite if and only if it belongs to the
interior of the cone of squares. In other words, an element x in a Euclidean Jordan algebra
J is positive semidefinite if and only if x ∈ KJ , and is positive definite if and only if
x ∈ int(KJ ), where int(KJ ) denotes the interior of the cone KJ .
The following notations will be used throughout the dissertation. For a Euclidean
Jordan algebra J , we write x ⪰KJ 0 and x ≻KJ 0 to mean that x ∈ KJ and x ∈ int(KJ ),
respectively. We also write x ⪰KJ y (or y ⪯KJ x) and x ≻KJ y (or y ≺KJ x) to mean
that x− y ⪰KJ 0 and x− y ≻KJ 0, respectively.
Example 19. The cone KSn is, indeed, Sn+, the cone of real symmetric positive semidef-
inite matrices of order n. This can be seen in view of the fact that a symmetric matrix is
the square of another symmetric matrix if and only if it is a positive semidefinite matrix. To
prove this fact, suppose that X is positive semidefinite; then it has nonnegative eigenvalues
λ1, λ2, . . . , λn and, by the spectral theorem for real symmetric matrices, there exists an or-
thogonal matrix Q ∈ Rn×n and a diagonal matrix Λ := diag2(λ1;λ2; . . . ;λn) ∈ Rn×n such
that X = QΛQT. By letting Y := QΛ1/2QT ∈ Sn, where Λ1/2 := diag(√λ1;√λ2; . . . ;√λn),
it follows that

X = QΛQT = QΛ1/2Λ1/2QT = (QΛ1/2QT)(QΛ1/2QT) = Y 2.
To prove the other direction, let us assume that X = Y 2 for some Y ∈ Sn. It is clear
that X ∈ Sn. To show that X is positive semidefinite, let (λ,v) be an eigenpair of Y ;
then λ ∈ R (every real symmetric matrix has real eigenvalues). Furthermore, we have
that Xv = Y 2v = Y (λv) = λ2v. Thus, (λ2,v) is an eigenpair of X, which means that
all eigenvalues of X are nonnegative, and this completes the proof.
Example 20. The cone of squares of En is the second-order cone of dimension n:

En+ := {ξ ∈ En : ξ0 ≥ ‖ξ̄‖2}.

To see this, recall that the cone of squares (with respect to “∘” defined in Example 4) is

KEn = {ζ2 : ζ ∈ En} = {(‖ζ‖2²; 2ζ0ζ̄) : ζ ∈ En}.

Thus, any x ∈ KEn can be written as x = (‖y‖2²; 2y0ȳ) for some y ∈ En. It follows that

x0 = ‖y‖2² = y0² + ‖ȳ‖2² ≥ 2|y0| ‖ȳ‖2 = ‖x̄‖2,

where the inequality follows by observing that (|y0| − ‖ȳ‖2)² ≥ 0. This means that x ∈ En+
and hence KEn ⊆ En+. The proof of the other direction can be found in §4 of [1].
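The inclusion KEn ⊆ En+ can also be probed by random sampling (illustrative only, not a proof):

import numpy as np

# Every square zeta o zeta = (||zeta||^2; 2*zeta0*zetabar) should satisfy the
# second-order cone inequality.
rng = np.random.default_rng(4)
for _ in range(1000):
    z = rng.standard_normal(5)
    sq = np.concatenate(([z @ z], 2 * z[0] * z[1:]))   # zeta o zeta
    assert sq[0] >= np.linalg.norm(sq[1:]) - 1e-12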
2We indicate by diag(·) the operator that maps its argument to a block diagonal matrix; for example, if x ∈ Rn, then diag(x) is the n× n diagonal matrix with the entries of x on the diagonal.
Theorem 1.3.4 ([18, Theorem III.1.5]). Let J be a Jordan algebra over R with the
identity element e. The following properties are equivalent:
1. J is a Euclidean Jordan algebra.
2. The symmetric bilinear form trace(x ∘ y) is positive definite.

A direct consequence of the above theorem is that if J is a Euclidean Jordan algebra,
then x • y := trace(x ∘ y) is an inner product. In the sequel, we define the inner product
as 〈x,y〉 := x • y = trace(x ∘ y) and we call it the Frobenius inner product. It is easy to
see that, for x,y, z ∈ J , x • e = trace(x), x • y = y • x, (x + y) • z = x • z + y • z,
x • (y + z) = x • y + x • z, and (x ∘ y) • z = x • (y ∘ z).
For x ∈ J , the Frobenius norm (denoted by ‖ · ‖F , or simply ‖ · ‖) is defined as
‖x‖ := √(x • x). We can also define various norms on J as functions of eigenvalues. For
example, the definition of the Frobenius norm can be rewritten as ‖x‖ := √(λ1² + λ2² + · · ·+ λr²) =
√trace(x2) = √(x • x). The Cauchy–Schwarz inequality holds for the Frobenius inner
product, i.e., for x,y ∈ J , |x • y| ≤ ‖x‖ ‖y‖ (see for example [34]). Note that ‖e‖ =
√(e • e) = √r.
Let x ∈ J and t ∈ R. For a function g := g(x, t) from J ×R into R, we will use “g′”
for the partial derivative of g with respect to t, and “∇xg”, “∇2xxg”, “∇3xxxg” to denote
the gradient, Hessian, and the third order partial derivative of g with respect to x. For a
function y := y(x, t) from J × R into J , we will also use “y′” for the partial derivative
of y with respect to t, “Dxy” to denote the first partial derivative of y with respect to
x, and “∇xy” to denote the Jacobian matrix of y (with respect to x).
Let h1,h2, . . . ,hk ∈ J . For a function f from J into R, we write
∇kx...xf(x)[h1,h2, . . . ,hk] :=
∂kf(x+ t1h1 + · · ·+ tkhk)
∂t1 · · · ∂tk|t1=···=tk=0
to denote the value of the kth differential of f taken at x along the directions h1,h2, . . . ,hk.
We now present some handy tools that will help with our computations.
Lemma 1.3.1. Let J be a Euclidean Jordan algebra with identity e, and x,y, z ∈ J .
Then

1. (ln det(e+ tx))′|t=0 = trace(x) and, more generally, (ln det(y + tx))′|t=0 = y−1 • x,
provided that det(e+ tx) and det(y + tx) are positive.

2. (trace((e+ tx)−1))′|t=0 = −trace(x) and, more generally, (trace((e+ t x ∘ y)−1))′|t=0 =
−x • y, provided that e+ tx and e+ t x ∘ y are invertible.

3. ∇x ln detx = x−1, provided that detx is positive (so x is invertible). More gener-
ally, if y is a function of x, then ∇x ln dety = (∇xy)Ty−1, provided that dety is
positive.

4. ∇xx−1 = −⌈x−1⌉, provided that x is invertible, and hence ∇2xx ln detx = ∇xx−1 =
−⌈x−1⌉. More generally, if y is a function of x, then ∇xy−1 = −⌈y−1⌉∇xy,
provided that y is invertible.

5. ∇x⌈x⌉[y] = 2⌈x,y⌉.

6. If x and y are functions of t, where t ∈ R, then (x ∘ y)′ = x ∘ y′ + x′ ∘ y. In other
words, (⌊x⌋y)′ = ⌊x′⌋y + ⌊x⌋y′. Therefore, ⌊x⌋′ = ⌊x′⌋, ⌈x,y⌉′ = ⌈x,y′⌉+ ⌈x′,y⌉,
and, in particular, ⌈x⌉′ = 2⌈x,x′⌉.

7. ∇x trace(x) = e = Dxx and, more generally, if y is a function of x and they are
both simultaneously decomposed, then ∇x trace(y) = (∇xy)Te = Dxy. Hence, if y
and z are functions of x and they are all simultaneously decomposed, then

∇x(y • z) = Dx(y ∘ z) = (Dxy) ∘ z + (Dxz) ∘ y = (∇xy)Tz + (∇xz)Ty.
The proofs of most of these statements are straightforward. We only indicate that item
1 follows from the facts that (det(e+ tx))′|t=0 = trace(x) and that (det(y + tx))′|t=0 =
det(y)(y−1 • x) [18, Proposition III.4.2]; the first statement in item 2 follows from spectral
decomposition (I); the first statement in item 3 is taken from [18, Proposition III.4.2];
the first statement in item 4 is taken from [18, Proposition II.3.3]; item 5 is taken from
§3 of [18, Chapter II]; the first statement in item 6 is taken from §4 of [18, Chapter II];
and the first statement in item 7 is obtained by using item 3 and the observation that
det(exp(x)) = exp(trace(x)).
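Item 1 of the lemma, specialized to Sn, is easy to confirm with a finite-difference check (random symmetric matrix):

import numpy as np

# d/dt ln det(I + t X) at t = 0 should equal trace(X).
rng = np.random.default_rng(5)
X = rng.standard_normal((4, 4)); X = (X + X.T) / 2
t = 1e-6
fd = (np.linalg.slogdet(np.eye(4) + t * X)[1]
      - np.linalg.slogdet(np.eye(4) - t * X)[1]) / (2 * t)
assert abs(fd - np.trace(X)) < 1e-4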
Definition 1.3.12. We say two elements x,y of a Euclidean Jordan algebra J operator
commute if ⌊x⌋⌊y⌋ = ⌊y⌋⌊x⌋. In other words, x and y operator commute if for all z ∈ J ,
we have that x ∘ (y ∘ z) = y ∘ (x ∘ z).
We remind the reader of the fact that two matrices X, Y ∈ Sn commute if and
only if XY is symmetric, if and only if X and Y can be simultaneously diagonalized, i.e.,
they share a common system of orthonormal eigenvectors (the same Q). This fact is
generalized in the following theorem, which can also be applied to multiple-block elements.
Theorem 1.3.5 ([36, Theorem 27]). Two elements of a Euclidean Jordan algebra operator
commute if and only if they are simultaneously decomposed.
Note that if two operator commuting elements x and y are invertible, then so is
x ∘ y. Moreover, (x ∘ y)−1 = x−1 ∘ y−1, and det(x ∘ y) = det(x) det(y) (see also [18,
Proposition II.2.2]). In Remark 1 we mentioned that the equality x ∘ y = e may not
imply that y = x−1. However, the equality x ∘ y = e does imply that y = x−1 when the
elements x and y operator commute.
Lemma 1.3.2 (Properties of ⌈·⌉). Let x and y be elements of a Euclidean Jordan algebra
with rank r and dimension n, let x be invertible, and let k be an integer. Then

1. det(⌈x⌉y) = det²(x) det(y).

2. ⌈x⌉x−1 = x and ⌈x⌉e = x2.

3. ⌈⌈y⌉x⌉ = ⌈y⌉⌈x⌉⌈y⌉.

4. ⌈xk⌉ = ⌈x⌉k.
The first three items of the above lemma are taken from [18, Chapters II and III],
and the last one is taken from [36, §2].
We use “,” for adjoining elements of a Euclidean Jordan algebra J in a row, and use
“;” for adjoining them in a column. Thus, if J is a Euclidean Jordan algebra, and xi ∈ J
for i = 1, 2, . . . ,m, we have

(x1;x2; . . . ;xm) ∈ J × J × · · · × J (m times),

and we write (x1;x2; . . . ;xm)T := (x1,x2, . . . ,xm) := [x1 x2 · · · xm]. As we mentioned
earlier, we also use the superscript “T” to indicate transposition of column vectors in Rn.
We end this section with the following lemma, which is essentially a part of Lemma 12
in [36].

Lemma 1.3.3. Let x ∈ J with a spectral decomposition x = λ1c1 + λ2c2 + · · · + λrcr.
Then the following statements hold.

1. The matrices ⌊x⌋ and ⌈x⌉ commute and thus share a common system of eigenvec-
tors.

2. Every eigenvalue of ⌊x⌋ has the form (1/2)(λi+λj) for some i, j ≤ r. In particular,
x ⪰KJ 0 (x ≻KJ 0) if and only if ⌊x⌋ ⪰ 0 (⌊x⌋ ≻ 0). The eigenvalues of x, the λi's,
are amongst the eigenvalues of ⌊x⌋.
Chapter 2
Stochastic Symmetric Optimization
Problems
In this chapter, we use the Jordan algebraic characterization of symmetric cones to define
the SSP problem in both the primal and dual standard forms. We then see how this
problem can include some general optimization problems as special cases.
2.1 The stochastic symmetric optimization problem
We define a problem based on the DSP problem analogous to the way the SLP problem is
defined based on the DLP problem. To do so, we first introduce the definition of a DSP.
Let J be a Euclidean Jordan algebra with dimension n and rank r. The DSP problem
and its dual [36] are
min  c • x                                       max  bTy
(P1)  s.t.  ai • x = bi,  i = 1, 2, . . . ,m     (D1)  s.t.  y1a1 + y2a2 + · · ·+ ymam ⪯KJ c
            x ⪰KJ 0;                                         y ∈ Rm,
where c,ai ∈ J for i = 1, 2, . . . ,m, b ∈ Rm, x is the primal variable, y is the dual
variable, and, as mentioned in §1.3, KJ is the cone of squares of J .
The pair (P1, D1) can be rewritten in the following compact form
min  c • x                  max  bTy
(P2)  s.t.  Ax = b          (D2)  s.t.  ATy ⪯KJ c
            x ⪰KJ 0;                    y ∈ Rm,
where A := (a1;a2; . . . ;am) is a linear operator that maps J into Rm and AT is a linear
operator that maps Rm into J such that x • ATy = (Ax)Ty. In fact, we can prove weak
and strong duality properties for the pair (P2, D2) as justification for referring to them as
a primal-dual pair; see, for example, [30].
In the rest of this section, we assume that m1,m2, n1, n2, r1, and r2 are positive integers,
and J1 and J2 are Euclidean Jordan algebras with identities e1 and e2, dimensions n1
and n2, and ranks r1 and r2, respectively.
2.1.1 Definition of an SSP in primal standard form
We define the primal form of the SSP based on the primal form of the DSP. Let ai, tj ∈
J1 and wj ∈ J2, for i = 1, 2, . . . ,m1 and j = 1, 2, . . . ,m2, be given, and let A := (a1;a2; . . . ;am1)
be a linear operator that maps x to the m1-dimensional vector whose ith component is
ai • x, b ∈ Rm1 , c ∈ J1, T := (t1; t2; . . . ; tm2) be a linear operator that maps x to
the m2-dimensional vector whose ith component is ti • x, W := (w1;w2; . . . ;wm2) be a
linear operator that maps y to the m2-dimensional vector whose ith component is wi • y,
h ∈ Rm2 , and d ∈ J2. We also assume that A, b and c are deterministic data, and T,W,h
and d are random data whose realizations depend on an underlying outcome ω in an event
space Ω with a known probability function P . Given this data, an SSP with recourse in
primal standard form is
min c • x+ E [Q(x, ω)]
s.t. Ax = b
x ⪰KJ1 0,
(2.1.1)
where Q(x, ω) is the minimum value of the problem
min d(ω) • y
s.t. W (ω)y = h(ω)− T (ω)x
y ⪰KJ2 0,
(2.1.2)
where x is the first-stage decision variable, y is the second-stage variable, and
E[Q(x, ω)] := ∫Ω Q(x, ω)P (dω).
2.1.2 Definition of an SSP in dual standard form
In many applications we find that an SSP defined based on the dual standard form
(D2) is more useful. In this part we define the dual form of the SSP based on the dual form
of the DSP. Let ai ∈ J1 and ti,wj ∈ J2, for i = 1, 2, . . . ,m1 and j = 1, 2, . . . ,m2, be given. Let
A := (a1,a2, . . . ,am1), b ∈ J1, c ∈ Rm1 , T := (t1, t2, . . . , tm1), W := (w1,w2, . . . ,wm2),
h ∈ J2, and d ∈ Rm2 . We also assume that A, b and c are deterministic data, and T,W,h
and d are random data whose realizations depend on an underlying outcome ω in an event
space Ω with a known probability function P . Given this data, an SSP with recourse in
dual standard form is
max cTx+ E [Q(x, ω)]
s.t. Ax KJ1 b,(2.1.3)
where Q(x, ω) is the maximum value of the problem
30
max d(ω)Ty
s.t. W (ω)y KJ2 h(ω)− T (ω)x,(2.1.4)
where x ∈ Rm1 is the first-stage decision variable, y ∈ Rm2 is the second-stage variable,
and
E[Q(x, ω)] :=
∫Ω
Q(x, ω)P (dω).
In fact, it is also possible to define SSPs in“mixed forms” where the first-stage is based
on primal problem (P2) while the second-stage is based on the dual problem (D2), and
vice versa.
2.2 Problems that can be cast as SSPs
It is interesting to mention that almost all conic optimization problems in real world
applications are associated with symmetric cones, and that, as an illustration of the
modeling power of conic programming, all deterministic convex programming problems
can be formulated as deterministic conic programs (see [29]). As a consequence, by
considering the stochastic counterpart of this result, it is straightforward to show that all
stochastic convex programming problems can be formulated as stochastic conic programs.
In this section we will see how SSPs can include some general optimization problems as
special cases. We start with two-stage stochastic linear programs with recourse.
Problem 1. Stochastic linear programs:
It is clear that the space of real numbers R with Jordan multiplication x y := xy and
inner product x • y := xy forms a Euclidean Jordan algebra. Since a real number is a
square of another real number if and only if it is nonnegative, the cone of squares of R is
indeed R+; the set of all nonnegative real numbers. This verifies that R+ is a symmeric
cone. The cone Rp+ of nonnegative orthants of Rp is also symmetric because it is just
31
Euclidean Jordan algebra Rp = (x1;x2; . . . ;xp) : xi ∈ R, i = 1, 2, . . . , pSymmetric cone Rp
+ = x ∈ Rp : xi ≥ 0, i = 1, 2, . . . , pConic inequality x Rp
+0 ≡ x ≥ 0 and x Rp
+0 ≡ x > 0
Jordan multiplication x y = (x1y1;x2y2; · · · ;xpyp) = diag(x)︸ ︷︷ ︸bxc
y
Inner product x • y = x1y1 + x2y2 + · · ·+ xpyp = xTyIdentity element e = (1; 1; . . . ; 1)Spectral decomposition x = x1︸︷︷︸
λ1
(1; 0; . . . ; 0)︸ ︷︷ ︸c1
+ x2︸︷︷︸λ2
(0; 1; . . . ; 0)︸ ︷︷ ︸c2
+
· · ·+ xp︸︷︷︸λp
(0; 0; . . . ; 1)︸ ︷︷ ︸cp
Cone rank rank(Rp+) = p
Expression for trace trace(x) = x1 + x2 + · · ·+ xpExpression for determinant det(x) = x1 x2 · · · xpExpression for Frobenius norm ||x|| =
√x2
1 + x22 + · · ·+ x2
p = ||x||2Expression for inverse x−1 = ( 1
x1; 1x2
; . . . ; 1xp
) (if xi 6= 0 for all i)
Expression for square root x1/2 = (√x1;√x2; . . . ;
√xp) (if xi ≥ 0 for all i)
Log barrier function − ln det(x) = −∑p
i=1 ln xi if xi > 0 for all i
Table 2.1: The Euclidean Jordan algebraic structure of the nonnegative orthantcone.
the Cartesian product of the symmetric cones
p time︷ ︸︸ ︷R+,R+, . . . ,R+. Table 2.1 summarizes the
Euclidean Jordan algebraic structure of the nonnegative orthant cone Rp+.
When J1 = Rn1 and J2 = Rn2 ; the spaces of vectors of dimensions n1 and n2, re-
spectively, with Jordan multiplication x y := diag(x)y and the standard inner product
x • y := xTy, then KJ1 = Rn1+ and KJ2 = Rn2
+ ; the cones of nonnegative orthants of
Rn1 and Rn2 , respectively, and hence the SSP problem (2.1.1, 2.1.2) becomes the SLP
problem:
min cTx+ E [Q(x, ω)]
s.t. Ax = b
x ≥ 0,
32
where Q(x, ω) is the minimum value of the problem
min d(ω)Ty
s.t. W (ω)y = h(ω)− T (ω)x
y ≥ 0.
Two-stage SLP with recourse has many practical applications, see for example [15].
Problem 2. Stochastic second-order cone programs:
When J1 = En1 and J2 = En2 with Jordan multiplication x y := (xTy, x0y + y0x),
and inner product x • y := xTy, then KJ1 = En1+ and KJ2 = En2
+ ; the second-order cones
of dimensions n1 and n2, respectively (see Example 20), and hence we obtain SSOCP
with recourse. Table 2.21 summarizes the Euclidean Jordan algebraic structure of the
second-order cone in both single- and multiple-block forms.
To introduce the SSOCP problem, we first introduce some notations that will be used
throughout this part and in Chapter 5.
For simplicity sake, we write the single-block second-order cone inequality x Ep+ 0
as x 0 (to mean that x ∈ Ep+) when p is known from the context, and the multiple-
block second-order cone inequality x Ep1+ ×Ep2+ ×···×E
pN+
0 as x N 0 (to mean that x ∈
Ep1+ × Ep2+ × · · · × E
pN+ ) when p1, p2, . . . , pN are known from the context. It is immediately
seen that, for every vector x ∈ Rp where p =∑N
i=1 pi, x N 0 if and only if x is
partitioned conformally as x = (x1;x2; . . . ;xN) and xi 0 for i = 1, 2, . . . , N . We also
write x y (x N y) or y x (y N x) to mean that x− y 0 (x− y N 0).
We now are ready to introduce the definition of an SSOCP with recourse.
Let N1, N2 ≥ 1 be integers. For i = 1, 2, . . . , N1 and j = 1, 2, . . . , N2, let m1,m2, n1, n2,
n1i, n2j be positive integers such that n1 =∑N1
i=1 n1i and n2 =∑N2
i=1 n2j. An SSOCP with
recourse in primal standard form is defined based on deterministic data A ∈ Rm1×n1 , b ∈
1The direct sum of two square matrices A and B is the block diagonal matrix A⊕B :=
[A 00 B
].
33
Euclidean Jordan alg. J Ep = x : x = (x0; x) ∈ R× Rp−1 Ep1 × · · · × EpN
Symmetric cone KJ Ep+ = x ∈ Ep : x0 ≥ ||x||2 Ep1
+ × · · · × EpN
+
x KJ 0 (x KJ 0) x 0 (x 0) x N 0 (x N 0)
Jordan product x y (xTy;x0y + y0x) = Arw(x)y (x1 y1; . . . ;xN yN )
Inner product x • y xTy x1Ty1 + · · ·+ xN
TyNThe identity e (1;0) (e1; . . . ; eN )The matrix bxc Arw(x) Arw(x1)⊕ · · · ⊕Arw(xN )
Spectral decomposition x = (x0 + ||x||2)︸ ︷︷ ︸λ1
(1
2
)(1;
x
||x||2
)︸ ︷︷ ︸
c1
Follows from decomp. of
+ (x0 − ||x||2)︸ ︷︷ ︸λ2
(1
2
)(1;− x
||x||2
)︸ ︷︷ ︸
c2
each block xi, 1 ≤ i ≤ N
rank(KJ ) 2 2N
Expression for trace(x) λ1 + λ2 = 2x0∑N
i=1 trace(xi)Expression for det(x) λ1λ2 = x20 − ||x||2 ΠN
i=1 det(xi)
Frobenius norm ||x||√λ21 + λ22 =
√2||x||2
∑Ni=1 ||xi||2
Inverse x−1 λ−11 c1 + λ−12 c2 = Jxdet(x) (x−11 ; . . . ;x−1N ) (if x−1i exists
(if det(x) 6= 0; o/w x is singular) for all i; o/w x is singular)
Log barrier function − ln (x20 − ||x||22) −∑N
i=1 ln det xi
Table 2.2: The Euclidean Jordan algebraic structure of the second-order cone in single-and multiple-block forms.
Rm1 and c ∈ Rn1 and random data T ∈ Rm2×n1 ,W ∈ Rm2×n2 ,h ∈ Rm2 and d ∈ Rn2
whose realizations depend on an underlying outcome ω in an event space Ω with a known
probability function P . Given this data, the two-stage SSOCP in the primal standard
form is the problem
min cTx+ E [Q(x, ω)]
s.t. Ax = b
x N10,
(2.2.1)
where x ∈ Rn1 is the first-stage decision variable and Q(x, ω) is the minimum value of
the problem
min d(ω)Ty
s.t. W (ω)y = h(ω)− T (w)x
y N20,
(2.2.2)
34
where y ∈ Rn2 is the second-stage variable and
E[Q(x, ω)] :=
∫Ω
Q(x, ω)P (dω).
Note that if N1 = n1 and N2 = n2, then SSOCP (2.2.1, 2.2.2) reduces to SLP. In fact,
if N1 = n1, then ni = 1 for each i = 1, 2, . . . , N1, and so xi ∈ E1+ := t ∈ R : t ≥ 0 for
each i = 1, 2, . . . , N1. Thus, the constraint x N1 0 means the same as x ≥ 0, i.e., x lies
in the nonnegative orthant of Rn1 . The same situation occurs for y N2 0. Thus SLPs is
a special case of SSOCP (2.2.1, 2.2.2).
Stochastic quadratic programs (SQPs) are also a special case of SSOCPs. To demon-
strate this, recall that a two-stage SMIQP with recourse is defined based on deterministic
data C ∈ Sn1+ , c ∈ Rn1 , A ∈ Rm1×n1 and b ∈ Rm1 ; and random data H ∈ Sn2
+ ,d ∈ Rn2 , T ∈
Rm2×n1 ,W ∈ Rm2×n2 , and h ∈ Rm2 whose realizations depend on an underlying outcome
in an event space Ω with a known probability function P . Given this data, an SQP with
recourse is
min q1(x, ω) = xTCx+ cTx+ E[Q(x, ω)]
s.t. Ax = b
x ≥ 0,
(2.2.3)
where x ∈ Rn1 is the first-stage decision variable and Q(x, ω) is the minimum value of
the problem
min q2(y, ω) = yTH(ω)y + d(ω)Ty
s.t. W (ω)y = h(ω)− T (ω)x
y ≥ 0,
(2.2.4)
where y ∈ Rn2 is the second-stage decision variable and
E[Q(x, ω)] :=
∫Ω
Q(x, ω)P (dω).
35
Observe that the objective function of (2.2.3) can be written as (see [1])
q1(x1, ω) = ||u||2 + E[Q(x, ω)]− 1
4cTC−1c where u = C
1/2x+1
2C−
1/2c.
Similarly, the objective function of (2.2.4) can be written as
q2(y, ω) = ||v||2 − 1
4d(ω)TH(ω)−1d (ω) where v = H(ω)
1/2y +1
2H(ω)−
1/2d(ω).
Thus, problem (2.2.3, 2.2.4) can be transformed into the SSOCP:
min u0
s.t. u− C1/2x = 12C−
1/2c
Ax = b
u 0, x ≥ 0,
(2.2.5)
where Q(x, ω) is the minimum value of the problem
min v0
s.t. v −H(ω)1/2y = 1
2H(ω)−
1/2d(ω)
W (ω)y = h(ω)− T (ω)x
v 0, y ≥ 0,
(2.2.6)
where
E[Q(x, ω)] :=
∫Ω
Q(x, ω)P (dω).
Note that the SQP problem (2.2.3, 2.2.4) and the SSOCP problem (2.2.5, 2.2.6) will
have the same minimizing solutions, but their optimal objective values are equal up to
constants. More precisely, the difference between the optimal objective values of (2.2.4,
2.2.6) would be −12d(ω)TH(ω)−1 d(ω). Similarly, the optimal objective values of (2.2.3,
36
2.2.4) and (2.2.5, 2.2.6) will differ by
−1
2cTC−1c− 1
2
∫Ω
(d(ω)T H(ω)−1d(ω)
)P (dω).
In §5.1 of Chapter 5, we describe two applications that illustrate the applicability of
two-stage SSOCPs with recourse.
Problem 3. Stochastic rotated quadratic cone programs:
For each vector x ∈ Rn indexed from 0, we write x for the sub-vector consisting of entries
2 through n − 1; therefore x = (x0;x1; x) ∈ R × R × Rn−2. We let En denote the n
dimensional real vector space R× R× Rn−2 whose elements x are indexed with 0.
The rotated quadratic cone [1] of dimension n is defined by
En+ := x = (x0;x1; x) ∈ En : 2x0x1 ≥ ||x||2, x0 ≥ 0, x1 ≥ 0.
The constraint on x that satisfies the relation 2x0x1 ≥ ||x||2 is called a hyperbolic con-
straint. It is clear that the cones En+ and En+ have the same Euclidean Jordan algebraic
structure. In fact, the later is obtained by rotating the former through an angle of
thirty degrees in the x0x1-plane [1]. More specifically, by writing x 0 (x N 0)
to mean that x ∈ Ep+ (x ∈ Ep1+ × Ep2+ × · · · ˆEpN+ ), then one can easily see that the
hyperbolic constraint (x0;x1; x) 0 is equivalent to the second-order cone constraint
(2x0 + x1; 2x0 − x1; 2x) 0. In fact,
2x0 + x1 ≥ ||(2x0 − x1; 2x)||2 ⇐⇒ (2x0 + x1)2 ≥ ||(2x0 − x1; 2x)||22
⇐⇒ 4x0x1 ≥ −4x0x1 + 4||x||22
⇐⇒ 2x0x1 ≥ ||x||22.
So, if we are given the same setting as in Problem (2.2.1, 2.2.2), the two-stage stochastic
rotated quadratic cone program (SRQCP) in the primal standard form is the problem
37
min cTx+ E [Q(x, ω)]
s.t. Ax = b
x N10,
where x ∈ Rn1 is the first-stage decision variable and Q(x, ω) is the minimum value of
the problem
min d(ω)Ty
s.t. W (ω)y = h(ω)− T (w)x
y N20.
In §5.2 of Chapter 5, we describe two applications that illustrate the applicability of
two-stage SRQCPs with recourse.
Problem 4. Stochastic semidefinite programs:
When J1 = Sn1 and J2 = Sn2 ; the spaces of real symmetric matrices of orders n1 and
n2, respectively, with Jordan multiplication X Y := 12(XY + Y X) and inner product
X •Y := trace(XY ), then KJ1 = Sn1+ and KJ2 = Sn1
+ ; the cones of real symmetric positive
semidefinite matrices of orders n1 and n2, respectively (see Example 19). We simply write
the linear matrix inequality X Sp+ 0p as X 0 (to mean that X ∈ Sp+). Hence the SSP
problem (2.1.1, 2.1.2) becomes the SSDP problem (see also [9, Subsection 2.1]):
min C •X + E [Q(X,ω)]
s.t. AX = b
X 0,
(2.2.7)
where Q(X,ω) is the minimum value of the problem
min D(ω) • Y
s.t. W(ω)Y = h(ω)− T (ω)X
Y 0,
(2.2.8)
38
Euclidean Jordan alg. J Sp = X ∈ Rp×p : XT = X vec(Sp) = vec(X) ∈ Rp2 : XT = XSymmetric cone KJ Sp+ = X ∈ Sp : X 0 vec(Sp+) = vec(X) ∈ Rp2 : X 0Jordan product x y X Y = 1
2(XY + Y X) vec(X) vec(Y ) = 12 vec(XY + Y X)
Inner product x • y X • Y = trace(XY ) vec(X) • vec(Y ) = vec(X)Tvec(Y )
The identity e Ip vec(Ip)
The matrix bXc 12(I ⊗X +X ⊗ I) 1
2(I ⊗X +X ⊗ I)
The matrix dXe X ⊗X X ⊗XSpectral decomposition X = λ1 q1q
T1︸ ︷︷ ︸
C1
+λ1 q2qT2︸ ︷︷ ︸
C2
+ · · · vec(X) = λ1 vec(q1qT1 )︸ ︷︷ ︸
c1
+λ1 vec(q2qT2 )︸ ︷︷ ︸
c2
+λp qpqTp︸ ︷︷ ︸
Cp
+ · · ·+ λp vec(qpqTp )︸ ︷︷ ︸
cp
rank(KJ ) p p
Table 2.3: The Euclidean Jordan algebraic structure of the positive semidefinite cone inmatrix and vectorized matrix forms.
where X ∈ Sn1 is the first-stage decision variable, Y ∈ Sn2 is the second-stage variable,
and
E[Q(X,ω)] :=
∫Ω
Q(X,ω)P (dω).
Here, clearly, C ∈ Sn1 , D ∈ Sn2 , and the linear operator A : Sn1 −→ Rm1 is defined by
AX := (A1 •X;A2 •X; . . . ;Am1 •X)
where Ai ∈ Sn1 for i = 1, 2, . . . ,m1. The linear operators W : Sn2 −→ Rm2 and T :
Sn2 −→ Rm1 are defined in a similar manner. Problem (2.2.7, 2.2.8) can also be written
in a vectorized matrix form (see [25, Section 1]). See Table 2.3 for a summary of the
Euclidean Jordan algebraic structure of the positive semidefinite cone in both matrix and
vectorized matrix forms. Some applications leading to two-stage SSDPs with recourse can
be found in [49].
Problem 5. Stochastic programming over complex Hermitian positive semidef-
inite matrices:
Recall that a square matrix with complex entries is called Hermitian if it is equal to its
39
own conjugate transpose, and that the eigenvalues of a (positive semidefinite) Hermitian
matrix are always (nonnegative) real valued.
Let Hp denote the space of complex Hermitian matrices of order p. Equipped with the
Jordan multiplication X Y := (XY + Y X)/2, the space Hp forms a Euclidean Jordan
algebra. By following a similar argument to that in Example 19 (but using the spectral
theorem for complex Hermitian matrices instead of using the spectral theorem for real
symmetric matrices, see [43, Theorem 5.4.13]), we can show that the cone of squares
of the space of complex Hermitian matrices of order p is the cone of complex Hermitian
positive semidefinite matrices of orders p, denoted byHp+. The Euclidean Jordan algebraic
structure of the cone of complex Hermitian positive semidefinite matrices is analogous to
that of the cone of real symmetric positive semidefinite matrices.
Another subclass of stochastic symmetric programming problems is obtained when
J1 = Hn1 and J2 = Hn2 , and so we have that KJ1 = Hn1+ and KJ2 = Hn1
+ ; the cones
of complex Hermitian positive semidefinite matrices of orders n1 and n2, respectively.
In fact, it is impossible to deal with the field of complex numbers directly as it is an
unordered field. However, this can be overcome by defining a transformation that maps
complex Hermitian matrices to symmetric matrices; see, for example, [35, 18].
Problem 6. Stochastic programming over quaternion Hermitian positive semidef-
inite matrices:
We first recall some notions of quaternions and quaternion matrices. The entries of the
field of real quaternions has the form x = x0 + x1i + x2j + x3k where x0, x1, x2, x3 ∈ R
and i, j, and k are abstract symbols satisfying i2 = j2 = k2 = ijk = −1. The conjugate
transpose of x is x∗ = x0 − x1i − x2j − x3k. If A is a p × p quaternion matrix and
A∗ is its conjugate transpose, then A is called Hermitian if A = A∗. Note that, if A is
Hermitian, then w∗Aw is a real number for any p dimensional column vector w with
quaternion components. The matrix A is called positive semidefinite if it is Hermitian
40
and w∗Aw ≥ 0 for any p dimensional column vector w with quaternion components. A
(positive semidefinite) quaternion Hermitian matrix has (nonnegative) real eigenvalues.
Let QHp denote the space of quaternion Hermitian matrices of order p. Equipped with
the Jordan multiplication X Y := (XY + Y X)/2, the space QHp forms a Euclidean
Jordan algebra. It has been shown [46] that the spectral theorem holds also for quaternion
matrices (i.e., any quaternion Hermitian matrix is conjugate to a real diagonal matrix).
So, by following argument similar to that in Example 19, we can show that the cone of
squares of the space of quaternion Hermitian matrices of order p is the cone of quaternion
Hermitian positive semidefinite matrices of orders p, denoted by QHp+. The Euclidean
Jordan algebraic structure of the cone of complex Hermitian positive semidefinite matrices
is analogues to that of the cone of real symmetric positive semidefinite matrices and that
of the cone of complex Hermitian positive semidefinite matrices.
The last class of problems is obtained when J1 = QHn1 and J2 = QHn2 with X Y :=
(XY +Y X)/2. Hence KJ1 = QHn1+ and KJ2 = QHn1
+ ; the cones of quaternion Hermitian
positive semidefinite matrices of orders n1 and n2, respectively. Similar to Problem (5),
there is a difficulty in dealing with quaternion entries, but this difficulty can be overcome
by defining a transformation that maps Hermitian matrices with quaternion entries to
symmetric matrices; see, for example, [35, 18].
Of course, it is also possible to consider a “mixed problem” such as that one obtained
by selecting J1 = Rn1 and J2 = En2 , or by selecting J1 = En1 and J2 = Sn2 , etc.
In many applications we find that the models lead to SSPs in dual standard form. The
focus of this dissertation is on the SSP in dual standard form. In the next two chapters,
we will focus on interior point algorithms for solving the SSP problem (2.1.3) and (2.1.4).
41
Chapter 3
A Class of Polynomial Logarithmic
Barrier Decomposition Algorithms
for Stochastic Symmetric
Programming
In this chapter, we turn our attention to the study of logarithmic barrier decomposition-
based interior point methods for the general SSP problem. Since our algorithm applies to
all symmetric cones, this work extends Zhao’s work [48] which focuses particularly on SLP
problem (Problem 1), and Mehrotra and Ozevin’s work [25] which focuses particularly on
the SSDP problem (Problem 4). More specifically, we use logarithmic barrier interior
point methods to present a Bender’s decomposition-based algorithm for solving the SSP
problem (2.1.3) and (2.1.4) and then prove its polynomial complexity. Our convergence
analysis proceeds by showing that the log barrier associated with the recourse function
of SSPs behaves as a strongly self-concordant barrier and forms a self-concordant family
on the first stage solutions. Our procedure closely follows that of [25] (which essentially
follows the procedure of [48]), but our setting is much more general. The results of this
42
chapter have been submitted for publication [3].
3.1 The log barrier problem for SSPs
We begin by presenting the extensive formulation of the SSP problem (2.1.3, 2.1.4) and
the log barrier [30] problems associated with them.
3.1.1 Formulation and assumptions
We now examine (2.1.3, 2.1.4) when the event space Ω is finite. Let (T (k),W (k),h(k),
d(k)) : k = 1, 2, . . . , K be the set of the possible values of the random variables(T (ω),
W (ω),h(ω),d(ω))
and let pk := P(T (ω),W (ω),h(ω),d(ω)
)=(T (k),W (k),h(k),d(k)
)be
the associated probability for k = 1, 2, . . . , K. Then Problem (2.1.3, 2.1.4) becomes
max cTx+K∑k=1
pkQ(k)(x)
s.t. Ax KJ1 b,(3.1.1)
where, for k = 1, 2, . . . , K, Q(k)(x) is the maximum value of the problem
max d(k)Ty(k)
s.t. W (k)y(k) KJ2 h(k) − T (k)x,
(3.1.2)
where x ∈ Rm1 is the first-stage decision variable, and y(k) ∈ Rm2 is the second-stage
variable for k = 1, 2, . . . , K. Now, for convenience we redefine d(k) as d(k) := pkd(k) for
43
k = 1, 2, . . . , K, and rewrite Problem (3.1.1, 3.1.2) as
max cTx+K∑k=1
Q(k)(x)
s.t. Ax+ s = b
s KJ1 0,
(3.1.3)
where, for k = 1, 2, . . . , K, Q(k)(x) is the maximum value of the problem
max d(k)Ty(k)
s.t. W (k)y(k) + s(k) = h(k) − T (k)x
s(k) KJ2 0.
(3.1.4)
Let ν(k) be the second-stage dual multiplier. The dual of Problem of (3.1.4) is
min (h(k) − T (k)x) • ν(k)
s.t. W (k)Tν(k) = d(k)
ν(k) KJ2 0.
(3.1.5)
The log barrier problem associated with Problem (3.1.3, 3.1.4) is
max η(µ,x) := cTx+K∑k=1
ρ(k)(µ,x) + µ ln det s
s.t. Ax+ s = b
s KJ1 0,
(3.1.6)
where, for k = 1, 2, . . . , K, ρ(k)(µ,x) is the maximum value of the problem
max d(k)Ty(k) + µ ln det s(k)
s.t. W (k)y(k) + s(k) = h(k) − T (k)x
s(k) KJ2 0.
(3.1.7)
44
(Here µ > 0 is a barrier parameter.) If for some k, Problem (3.1.7) is infeasible, then we
define∑K
k=1 ρ(k)(µ,x) :=∞. The log barrier problem associated with Problem (3.1.5) is
the problem
min (h(k) − T (k)x) • ν(k) − µ ln detν(k)
s.t. W (k)Tν(k) = d(k)
ν(k) KJ2 0,
(3.1.8)
which is the Lagrangian dual of (3.1.7). Because Problems (3.1.7) and (3.1.8) are, respec-
tively, concave or convex, (y(k), s(k)) and ν(k) are optimal solutions to (3.1.7) and (3.1.8),
respectively, if and only if they satisfy the following optimality conditions:
s(k) ν(k) = µ e2,
W (k)y(k) + s(k) = h(k) − T (k)x,
W (k)Tν(k) = d(k),
s(k) KJ2 0, ν(k) KJ2 0.
(3.1.9)
The elements s(k) and ν(k) may not operator commute. So, the equality s(k) = µ ν(k)−1
may not hold (see Remark 1 in Chapter 1). In fact, we need to scale the optimality
conditions (3.1.9) so that the scaled elements are simultaneously decomposed.
Let p KJ2 0. From now on, with respect to p, we define
s(k) := dp−1es(k), ν(k) := dpeν(k), h := dp−1eh, W (k) := dp−1eW (k), and T (k) := dp−1eT (k).
Recall that, dpedp−1e = dpedpe−1 = I. We have the following lemma and proposition.
Lemma 3.1.1 (Lemma 28, [36]). Let p be an invertible element in J2. Then s ν = µe2
if and only if s ν = µe2.
Proposition 3.1.1. (y, s,ν) satisfies the optimality conditions ( 3.1.1) if and only if
45
(y, s, ν) satisfies the relaxed optimality conditions:
s(k) ν(k) = µe2,
W (k)y(k) + s(k) = h(k) − T (k)x,
W (k)Tν(k) = d(k),
s(k) KJ2 0, ν(k) KJ2 0.
(3.1.10)
Proof. The proof follows from Lemma 3.1.1, Proposition 1.3.2, and the fact that dpe(KJ2) =
KJ2 , and likewise, as an operator, dpe(int(KJ2)) = int(KJ2), because KJ2 is symmetric. 2
To our knowledge, this is the first time this effective way of scaling is being used for
stochastic programing, but it was originally proposed by Monteiro [28] and Zhang [47] for
DSDP, and after that generalized by Schmieta and Alizadeh [36] for DSP.
With this change of variable Problem (3.1.6, 3.1.7) becomes
max η(µ,x) := cTx+K∑k=1
ρ(k)(µ,x) + µ ln det s
s.t. Ax+ s = b
s KJ1 0,
(3.1.11)
where, for k = 1, 2, . . . , K, ρ(k)(µ,x) is the maximum value of the problem
max d(k)Ty(k) + µ ln det s(k)
s.t. W (k)y(k) + s(k) = h(k) − T (k)x
s(k) KJ2 0,
(3.1.12)
46
and Problem (3.1.8) becomes
min (h(k) − T (k)x) • ν(k) − µ ln det ν(k)
s.t. W (k)Tν(k) = d(k)
ν(k) KJ2 0,
(3.1.13)
Note that the Problem (3.1.7) and Problem (3.1.12) have the same maximizer, but
their optimal objective values are equal up to a constant. More precisely, the difference
between their optimal objective values would be µ ln detp2 (see item 1 of Lemma (1.3.2)).
Similarly, Problems (3.1.8) and (3.1.13) have the same minimizer but their optimal ob-
jective values differ by the same value (µ ln detp2).
The SSP (3.1.11, 3.1.12) can be equivalently written as a DSP:
max
= η(µ,x)︷ ︸︸ ︷cTx+ µ ln det s+
K∑k=1
(d(k)Ty(k) + µ ln det s(k)
)︸ ︷︷ ︸
= ρ(k)(µ,x)
s.t. Ax+ s = b
W (k)y(k) + s(k) = h(k) − T (k)x, k = 1, 2, . . . , K
s KJ1 0, s(k) KJ2 0, k = 1, 2, . . . , K.
(3.1.14)
In Subsection 3.1.2, we need to compute ∇xη(µ,x) and ∇2η(µ,x) so that we can
determine the Newton direction defined by
∆x := −∇2η(µ,x)−1∇xη(µ,x)
for our algorithms. We shall see that each choice of p leads to a different search direction
(see Algorithm 1). As we mentioned earlier, we are interested in the class of p for which
the scaled elements are simultaneously decomposed. In view of Theorem 1.3.5, it is enough
to choose p so that s(k) and ν(k) operator commute. That is, we restrict our attention to
47
the following set of scalings:
C(s(k),ν(k)) := p KJ2 0 | s(k) and ν(k) operator commute.
We introduce the following definition [1].
Definition 3.1.1. The set of directions ∆x arising from those p ∈ C(s(k),ν(k)) is called
the commutative class of directions, and a direction in this class is called a commutative
direction.
It is clear that p = e may not be in C(s(k),ν(k)). The following choices of p shows
that the set C(s(k),ν(k)) is not empty.
We may choose p = s(k)1/2and get s(k) = e2. In fact, by using Lemma 1.3.2, it can
be seen that
s(k) = dp−1e s(k)
=⌈s(k)−1/2
⌉(s(k)1/2
)2
=(⌈s(k)−1/2
⌉ ⌈s(k)1/2
⌉)e2
=
(⌈s(k)1/2
⌉−1 ⌈s(k)1/2
⌉)e2
= e2.
We may choose p = ν(k)−1/2and get ν(k) = e2. To see this, note that by using Lemma
1.3.2 we have
ν(k) = dpeν(k)
=⌈ν(k)−1/2
⌉(ν(k)1/2
)2
=(⌈ν(k)−1/2
⌉ ⌈ν(k)1/2
⌉)e2
=
(⌈ν(k)1/2
⌉−1 ⌈ν(k)1/2
⌉)e2
= e2.
The above two choices of directions are well-known in commutative classes. These two
choices form a class of Newton directions derived by Helmberg et al [20], Monteiro [28],
48
and Kojima et al [22], and referred to as the HRVW/KSH/M directions. It is interesting
to mention that the popular search direction due to Nesterov-Todd (NT) is also in a
commutative class, because in this case one chooses p in such a way that ν(k) = s(k).
More precisely, in the NT direction we choose
p =
(⌈ν(k)1/2
⌉(⌈ν(k)1/2
⌉s(k))−1/2
)−1/2
=
(⌈s(k)−1/2
⌉(⌈s(k)1/2
⌉ν(k)
)1/2)−1/2
,
and, using Lemma 1.3.2, we get
dpe2ν(k) = dp2eν(k)
=
⌈⌈ν(k)1/2
⌉(⌈ν(k)1/2
⌉s(k))−1/2
⌉−1
ν(k)
=
⌈⌈ν(k)−1/2
⌉(⌈ν(k)1/2
⌉s(k))1/2
⌉ν(k)
=
(⌈ν(k)−1/2
⌉⌈(⌈ν(k)1/2
⌉s(k))1/2
⌉⌈ν(k)−1/2
⌉)ν(k)
=
(⌈ν(k)−1/2
⌉⌈(⌈ν(k)1/2
⌉s(k))1/2
⌉)e2
=(⌈ν(k)−1/2
⌉ ⌈ν(k)1/2
⌉)s(k)
= s(k).
Thus, ν(k) = dpeν(k) = dp−1es(k) = s(k).
We proceed by making some assumptions. First we define
F1 :=x : Ax+ s = b, s KJ1 0
;
F (k)(x) :=y(k) : W (k)y(k) + s(k) = h(k) − T (k)x, s(k) KJ2 0
for k = 1, 2, . . . , K;
F (k)2 :=
x : F (k)(x) 6= ∅
for k = 1, 2, . . . , K;
F2 :=⋂Kk=1F
(k)2 ;
F0 := F1
⋂F2;
49
F :=
(x, s,γ)× (y(1), s(1), ν(1), . . . ,y(K), s(K), ν(K)) : Ax+ s = b, s KJ1 0,
W (k)y(k) + s(k) = h(k) − T (k)x, s(k) KJ2 0, W (k)Tν(k) = d(k),
ν(k) KJ2 0, k = 1, 2, . . . , K;ATγ +∑K
k=1 T(k)Tν(k) = c
.
Here γ is the first-stage dual multiplier. Now we make
Assumption 3.1.1. The matrices A and W (k) for all k have full column rank.
Assumption 3.1.2. The set F is nonempty.
Assumption 3.1.1 is for convenience. Assumption 3.1.2 guarantees strong duality for
first- and second-stage SSPs. In other words, it requires that Problem (3.1.14) and its dual
have strictly feasible solutions. This implies that problems (3.1.11-3.1.14) have a unique
solutions. Note that for a given µ > 0,∑K
k=1 ρ(k)(µ,x) <∞ if and only if x ∈ F2. Hence,
the feasible region for (3.1.11) is described implicitly by F0. Throughout the paper we
denote the optimal solution of the first-stage problem (3.1.11) by x(µ), and the solutions
of the optimality conditions (3.1.10) by (y(k)(µ,x), s(k)(µ,x), ν(k)(µ,x)). The following
proposition establishes the relationship between the optimal solutions of Problems (3.1.11,
3.1.12) and those of Problem (3.1.14).
Proposition 3.1.2. Let µ > 0 be fixed. Then (x(µ), s(µ); y(1)(µ), s(1)(µ); · · · ; y(K)(µ),
s(K)(µ)) is the optimal solution of (3.1.14) if and only if (x(µ), s(µ)) is the optimal solution
of (3.1.11) and (y(1)(µ), s(1)(µ); · · · ; y(K)(µ), s(K)(µ)) are the optimal solutions for (3.1.12)
for given µ and x = x(µ).
3.1.2 Computation of ∇xη(µ,x) and ∇2xxη(µ,x)
In order to compute ∇xη(µ,x) and ∇2xxη(µ,x), we need to determine the derivative of
ρ(k)(µ,x) with respect to x. Let (y(k), ν(k), s(k)) := (y(k)(µ,x), ν(k)(µ,x), s(k)(µ,x)). We
50
first note that from (3.1.10) we have that
(h(k)−T (k)x)•ν(k) = (W (k)y(k)+s(k))•ν(k) = y(k)T(W (k)Tν(k))+s(k)•ν(k) = y(k)Td(k)+r2 µ,
where in the second equality we used the observation that
W (k)y(k) • ν(k) =
m2∑i=1
(y(k)i w
(k)i ) • ν(k) =
m2∑i=1
y(k)i (w
(k)i • ν(k)) = y(k)T(W (k)Tν(k))
(here w(k)i ∈ J2 is the ith column of W (k)), and in the last equality we used that trace(e2) =
rank(J2) = r2. This implies that
(h(k) − T (k)x) • ν(k) − µ ln det ν(k) = ρ(k)(µ,x)− µ ln det s(k) + r2 µ− µ ln det ν(k)
= ρ(k)(µ,x) + r2 µ− µ ln det(s(k) ν(k))
= ρ(k)(µ,x) + r2 µ (1− lnµ).
Thus,
ρ(k)(µ,x) = (h(k) − T (k)x) • ν(k) − µ ln det ν(k) − r2 µ (1− lnµ). (3.1.15)
Differentiating (3.1.15) and using the optimality conditions (3.1.10), we obtain
∇xρ(k)(µ,x) = ∇x((h(k) − T (k)x) • ν(k))− µ ∇xln det ν(k)
= (∇x(h(k) − T (k)x))Tν(k) + (∇xν(k))T(h(k) − T (k)x)− µ (∇xν
(k))Tν(k)−1
= −T (k)Tν(k) + (∇xν(k))T(W (k)y(k) + s(k))− µ (∇xν
(k))Tν(k)−1
= −T (k)Tν(k) + (∇xν(k))T(W (k)y(k)) + (∇xν
(k))T(s(k) − µ ν(k)−1)
= −T (k)Tν(k) + (∇xν(k))T(W (k)y(k))
From (3.1.10) we have that W(k)i • ν(k) = d
(k)i , where d
(k)i is the ith entry of d(k). This
implies that
51
(∇xν(k))T(W (k)y(k)) = (∇xν
(k))Tm2∑i=1
y(k)i w
(k)i
=
m2∑i=1
y(k)i (∇xν
(k))Tw(k)i
=
m2∑i=1
y(k)i ∇x(ν(k) • w(k)
i )
= 0.
Thus,
∇xρ(k)(µ,x) = −T (k)Tν(k) and ∇2
xxρ(k)(µ,x) = −T (k)T∇xν
(k).
Therefore, we also need to determine the derivative of ν(k) with respect to x. Differ-
entiating (3.1.10) with respect to x, we get the system
∇xν(k) = −µ
⌈s(k)−1
⌉∇xs
(k),
W (k)∇xy(k) +∇xs
(k) = −T (k),
W (k)T∇xν(k) = 0.
(3.1.16)
Solving the system (3.1.16), we obtain
∇xs(k) = −
⌈s(k)1/2
⌉P (k)
⌈s(k)−1/2
⌉T (k),
∇xν(k) = µ
⌈s(k)−1/2
⌉P (k)
⌈s(k)−1/2
⌉T (k),
∇xy(k) = −R(k)−1
W (k)T⌈s(k)−1
⌉T (k),
(3.1.17)
where
R(k) := R(k)(µ,x) = W (k)T⌈s(k)−1
⌉W (k),
P (k) := P (k)(µ,x) = I −⌈s(k)−1/2
⌉W (k)R(k)−1
W (k)T⌈s(k)−1/2
⌉.
(3.1.18)
Observing that, by differentiating Ax + s = b with respect to x, we get ∇xs = −A.
52
We then have
∇xη(µ,x) = c−K∑k=1
∇xρ(k)(µ,x) + µ (∇xs)Ts−1,
= c−K∑k=1
T (k)Tν(k) − µ ATs−1,
(3.1.19)
and
∇2xxη(µ,x) = −
K∑k=1
T (k)T∇xν(k) + µ AT
⌈s−1⌉∇xs
= −µK∑k=1
T (k)T⌈s(k)−1/2
⌉P (k)
⌈s(k)−1/2
⌉T (k) − µ AT
⌈s−1⌉A.
(3.1.20)
3.2 Self-concordance properties of the log-barrier re-
course
The notion of so called self-concordant functions introduced by Nesterov and Nemirovskii
[30] allows us to develop polynomial time path following interior point methods for solving
SSPs. In this section, we prove that the recourse function with log barrier is a strongly
self-concordant function and leads to a strongly self-concordant family with appropriate
parameters.
3.2.1 Self-concordance of the recourse function
This subsection is devoted to show that η(µ, ·) is a µ strongly self-concordant barrier on
F0. First, we have the following definition.
Definition 3.2.1 (Nesterov and Nemirovskii [30, Definition 2.1.1]). Let E be a finite-
dimensional real vector space space, G be an open nonempty convex subset of E, and let
53
f be a C3, convex mapping from G to R. Then f is called α-self-concordant on G with
the parameter α > 0 if for every x ∈ G and h ∈ E, the following inequality holds:
|∇3xxxf(x)[h,h,h]| ≤ 2α−1/2(∇2
xxf(x)[h,h])3/2. (3.2.1)
An α-self-concordant function f on G is called strongly α-self-concordant if f tends to
infinity for any sequence approaching a boundary point of G.
We note that in the above definition the set G is assumed to be open. However,
relative openness would be sufficient to apply the definition. See also [30, Item A, Page
57].
The proof of strongly self-concordance of the function η(µ, ·) relies on two lemmas
that we state and prove below. We point out that the proof of the first lemma is shown
in [30, Proposition 5.4.5] for s ∈ Sn1++ := int(Sn1
+ ), i.e., s lies in the interior of the cone of
real symmetric positive semidefinite matrices of orders n1. We now generalize this proof
for the case where s lies in the interior of an arbitrary symmetric cone of dimension n1,
where n1, as mentioned before, is any positive integer.
Lemma 3.2.1. For any fixed µ > 0, the function f(s) := −µ ln det s is a µ strongly
self-concordant barrier on KJ1 .
Proof. Let si∞i=1 be any sequence in KJ1 . It is clear that the function f(si) tends to
∞ when si approaches a point from boundary of KJ1 . It remains to show that (3.2.1) is
satisfied for f(s) on KJ1 . Let s KJ1 0 and h ∈ J1, then there exists an element u ∈ J1
such that s = u2, and therefore, by Lemmas 1.3.1 and 1.3.2, we have that
∇sf(s)[h] = −µln det (s+ th)′|t=0
= −µln det (u2 + th)′|t=0
= −µln det (due(e1 + tdue−1h))′|t=0
= −µln det (u2) + ln det (e1 + tdue−1h))′|t=0
54
= −µ trace(due−1h)
det e1
= −µ(s−1 • h),
∇2ssf(s)[h,h] = ∇s(∇sf(s)[h]) • h
= −µ (∇s(s−1 • h)) • h
= −µ (Ds(s−1 h)) • h
= µ (s−2 h) • h
= µ(s−1 h) • (s−1 h),
∇3sssf(s)[h,h,h] = ∇s(∇2
ssf(s)[h,h]) • h
= µ (∇s((s−1 h) • (s−1 h))) • h
= µ (Ds((s−1 h) (s−1 h))) • h
= −2µ ((s−2 h) (s−1 h)) • h
= −2µ(s−1 h) • ((s−1 h) (s−1 h)).
Let h = s−1 h ∈ J1, and λ1, λ2, . . . , λp be its eigenvalues. The result is established
by observing that
2µ |trace(h3)|︸ ︷︷ ︸= |∇3
sssf(s)[h,h,h]|
= 2µ
p∑i=1
|λi|3 ≤ 2µ
p∑i=1
λi2
3/2
= 2µ trace(h2)3/2︸ ︷︷ ︸= 2µ−1/2∇2
xxf(s)[h,h]3/2
. 2
Lemma 3.2.2. For any fixed µ > 0, ρ(k)(µ,x) is a µ strongly self-concordant barrier on
F (k)2 , k = 1, 2, · · · , K.
Proof. Let xi∞i=1 be any sequence in F (k)2 . It is clear that the function ρ(k)(µ,xi) tends
to ∞ when xi approaches a point from boundary of F (k)2 . It remains to show that (3.2.1)
is satisfied for ρ(k)(µ,x) on F (k)2 . For any µ > 0, x ∈ F (k)
2 , and d ∈ Rm1 , we define the
univariate function
Ψ(k)(t) := ∇2xxρ
(k)(µ,x+ td)[d,d].
55
Note that Ψ(k)(0) = ∇2xxρ
(k)(µ,x)[d,d] and Ψ(k)(0)′ = ∇3xxxρ
(k)(µ,x)[d,d,d]. So, to
prove the lemma, it is enough to show that
|Ψ(k)(0)′| ≤ 2√µ|Ψ(k)(0)|3/2.
Let (ν(k)(t), s(k)(t), P (k)(t), R(k)(t)) := (ν(k)(µ,x+td), s(k)(µ,x+td), P (k)(µ,x+td), R(k)(µ,x+
td)). We also define
u(k)(t) :=õ P (k)(t)
⌈s(k)−1/2
(t)⌉T (k)(t)d.
By the notations introduced in §4, we have (ν(k), s(k)) = (ν(k)(0), s(k)(0)). We also let
(P (k), R(k),u(k)) := (P (k)(0), R(k)(0),u(k)(0)).
Notice that, by the definition of P (k), we have P (k)2= P (k). Using (3.1.17) and
(3.1.18), we get
Ψ(k)(0) = ∇2xxρ
(k)(µ,x)[d,d]
= −(T (k)T∇xν(k))[d,d]
= −µ(T (k)T
⌈s(k)−1/2
(t)⌉P (k)
⌈s(k)−1/2
(t)⌉T (k)
)[d,d]
= −µ dT(T (k)T
⌈s(k)−1/2
(t)⌉P (k)2
⌈s(k)−1/2
(t)⌉T (k)
)d
= −u(k) • u(k)
= −‖u(k)‖2.
Hence Ψ(k)(0)′= −2u(k) • u(k)′. So, in order to bound |Ψ(k)(0)
′|, we need to compute
the derivative of u(k) with respect to t. Using (3.1.18), we have
u(k)′ =√µP (k)
⌈s(k)−1/2
(t)⌉′
T (k)d
=√µ⌈s(k)−1/2
(t)⌉−⌈s(k)−1/2
(t)⌉W (k)R(k)−1
W (k)T⌈s(k)−1/2
(t)⌉2′
T (k)d
56
=√µ⌈s(k)−1/2
(t)⌉′−⌈s(k)−1/2
(t)⌉′W (k)R(k)−1
W (k)T⌈s(k)−1/2
(t)⌉2
−⌈s(k)−1/2
(t)⌉W (k)(R(k)−1
)′W (k)T⌈s(k)−1/2
(t)⌉2
−⌈s(k)−1/2
(t)⌉W (k)R(k)−1
W (k)T(⌈s(k)−1/2
(t)⌉2)′
T (k)d
=√µ⌈s(k)−1/2
(t)⌉′−⌈s(k)−1/2
(t)⌉′W (k)R(k)−1
W (k)T⌈s(k)−1/2
(t)⌉2
+⌈s(k)−1/2
(t)⌉W (k)R(k)−1
(R(k))′R(k)−1W (k)T
⌈s(k)−1/2
(t)⌉2
−⌈s(k)−1/2
(t)⌉W (k)R(k)−1
W (k)T(⌈s(k)−1/2
(t)⌉ ⌈s(k)−1/2
(t)⌉′
+⌈s(k)−1/2
(t)⌉′ ⌈s(k)−1/2
(t)⌉)
T (k)d
=√µ⌈s(k)−1/2
(t)⌉′−⌈s(k)−1/2
(t)⌉′W (k)R(k)−1
W (k)T⌈s(k)−1/2
(t)⌉2
+⌈s(k)−1/2
(t)⌉W (k)R(k)−1
W (k)T(⌈s(k)−1/2
(t)⌉ ⌈s(k)−1/2
(t)⌉′
+⌈s(k)−1/2
(t)⌉′ ⌈s(k)−1/2
(t)⌉)
W (k)R(k)−1W (k)T
⌈s(k)−1/2
(t)⌉2
−⌈s(k)−1/2
(t)⌉W (k)R(k)−1
W (k)T(⌈s(k)−1/2
(t)⌉ ⌈s(k)−1/2
(t)⌉′
+⌈s(k)−1/2
(t)⌉′ ⌈s(k)−1/2
(t)⌉)
T (k)d
=√µ⌈s(k)−1/2
(t)⌉′(
I − W (k)R(k)−1W (k)T
⌈s(k)−1/2
(t)⌉2)
−⌈s(k)−1/2
(t)⌉W (k)R(k)−1
W (k)T(⌈s(k)−1/2
(t)⌉ ⌈s(k)−1/2
(t)⌉′
+⌈s(k)−1/2
(t)⌉′ ⌈s(k)−1/2
(t)⌉)(
− W (k)R(k)−1W (k)T
⌈s(k)−1/2
(t)⌉2
+ I)T (k)d
=√µ⌈s(k)−1/2
(t)⌉′−⌈s(k)−1/2
(t)⌉W (k)R(k)−1
W (k)T(⌈s(k)−1/2
(t)⌉ ⌈s(k)−1/2
(t)⌉′
+⌈s(k)−1/2
(t)⌉′ ⌈s(k)−1/2
(t)⌉)(
I − W (k)R(k)−1W (k)T
⌈s(k)−1/2
(t)⌉2)
T (k)d
=√µ⌈s(k)−1/2
(t)⌉′−⌈s(k)−1/2
(t)⌉W (k)R(k)−1
W (k)T(⌈s(k)−1/2
(t)⌉ ⌈s(k)−1/2
(t)⌉′
+⌈s(k)−1/2
(t)⌉′ ⌈s(k)−1/2
(t)⌉)(⌈
s(k)−1/2(t)⌉−1
P (k)⌈s(k)−1/2
(t)⌉)
T (k)d
=⌈s(k)−1/2
(t)⌉′−⌈s(k)−1/2
(t)⌉W (k)R(k)−1
W (k)T(⌈s(k)−1/2
(t)⌉ ⌈s(k)−1/2
(t)⌉′
+⌈s(k)−1/2
(t)⌉′ ⌈s(k)−1/2
(t)⌉)⌈
s(k)−1/2(t)⌉−1
u(k).
Notice that, for any ξ ∈ Rm2 , we have
u(k) •⌈s(k)−1/2
(t)⌉W (k)ξ =
(P (k)
⌈s(k)−1/2
(t)⌉T (k)d
)•⌈s(k)−1/2
(t)⌉W (k)ξ
=(⌈s(k)−1/2
(t)⌉T (k)d
)• P (k)
⌈s(k)−1/2
(t)⌉W (k)ξ
57
=(⌈s(k)−1/2
(t)⌉T (k)d
)•(⌈s(k)−1/2
(t)⌉W (k)ξ
−⌈s(k)−1/2
(t)⌉W (k)R(k)−1
W (k)T⌈s(k)−1/2
(t)⌉2
W (k)︸ ︷︷ ︸=R(k)
ξ)
= 0.
This implies that
Ψ(k)(0)′= 2u(k) • u(k)′ = 2u(k) •
(⌈s(k)−1/2
(t)⌉′⌈s(k)−1/2
(t)⌉−1
u(k)). (3.2.2)
By (3.2.2), (3.1.17) and (3.1.18) and using norm inequalities, we get
|Ψ(k)(0)′| = 2µ−1/2
∣∣∣∣u(k) •⌈s(k)−1/2
(t)⌉′⌈s(k)1/2(t)
⌉u(k)
∣∣∣∣= µ−1/2
∣∣∣∣u(k) •(⌈s(k)−1/2
(t)⌉′⌈s(k)1/2(t)
⌉+⌈s(k)1/2(t)
⌉⌈s(k)−1/2
(t)⌉′)u(k)
∣∣∣∣= µ−1/2
∣∣∣u(k) •⌈s(k)1/2(t)
⌉(⌈s(k)−1/2
(t)⌉⌈s(k)−1/2
(t)⌉′
+⌈s(k)−1/2
(t)⌉′⌈s(k)−1/2
(t)⌉)⌈
s(k)1/2(t)⌉u(k)
∣∣∣= µ−1/2
∣∣∣u(k) •⌈s(k)1/2(t)
⌉(⌈s(k)−1/2
(t)⌉2)′⌈
s(k)1/2(t)⌉u(k)
∣∣∣= µ−1/2
∣∣∣u(k) •⌈s(k)1/2(t)
⌉⌈s(k)−1
(t)⌉′⌈s(k)1/2(t)
⌉u(k)
∣∣∣= µ−1/2
∣∣∣u(k) •⌈ν(k)−1/2
(t)⌉⌈ν(k)(t)
⌉′⌈ν(k)−1/2
(t)⌉u(k)
∣∣∣= 2µ−1/2
∣∣∣u(k) •⌈ν(k)−1/2
(t)⌉⌈ν(k)(t), ν(k)′(t)
⌉⌈ν(k)−1/2
(t)⌉u(k)
∣∣∣= 2µ−1/2
∣∣∣u(k) •⌈ν(k)−1/2
(t)⌉⌈ν(k)(t), ν(k)(µ,x+ td)′|t=0
⌉⌈ν(k)−1/2
(t)⌉u(k)
∣∣∣= 2µ−1/2
∣∣∣u(k) •⌈ν(k)−1/2
(t)⌉⌈ν(k)(t),∇xν
(k)[d]⌉⌈ν(k)−1/2
(t)⌉u(k)
∣∣∣= 2µ−1/2
∣∣∣u(k) •⌈e2, ν
(k)−1/2(t) ∇xν
(k)[d] ν(k)−1/2(t)⌉u(k)
∣∣∣= 2µ−1/2
∣∣∣u(k) •⌊ν(k)−1/2
(t) ∇xν(k)[d] ν(k)−1/2
(t)⌋u(k)
∣∣∣≤ 2µ−1/2‖u(k)‖
∥∥∥⌊ν(k)−1/2(t) ∇xν
(k)[d] ν(k)−1/2(t)⌋u(k)
∥∥∥≤ 2µ−1/2‖u(k)‖2
∥∥∥ν(k)−1/2(t) ∇xν
(k)[d] ν(k)−1/2(t)∥∥∥
= 2µ−1/2‖u(k)‖2∥∥∥ ⌈ν(k)−1/2
(t)⌉∇xν
(k)d∥∥∥
= 2µ−1 ‖u(k)‖2∥∥∥ ⌈s(k)1/2(t)
⌉∇xν
(k)d∥∥∥
58
= 2µ−1/2‖u(k)‖2‖u(k)‖
= 2µ−1/2‖Ψ(k)(0)‖3/2. The lemma is established. 2
Theorem 3.2.1. For any fixed µ > 0, η(µ,x) is a µ strongly self-concordant barrier on
F0.
Proof. It is trivial to see that the linear function cTx is a µ strongly self-concordant
barrier on F1 (indeed, both sides of (3.2.1) are identically zero). By Lemma 3.2.1, one
can see that the function µ ln det s is also a µ strongly self-concordant barrier on F1.
By Lemma 3.2.2 and [30, Proposition 2.1.1(ii)], we can conclude that∑K
k=1 ρ(k)(µ,x)
is a µ strongly self-concordant barrier on F2. The theorem is then established by [30,
Proposition 2.1.1(ii)]. 2
3.2.2 Parameters of the self-concordant family
We have shown that the recourse function η(µ, ·) is a strongly self-concordant function.
Having such a property, this function enjoys many nice features. In this subsection, we
show that the family of functions η(µ, ·) : µ > 0 is a strongly self-concordant family
with appropriate parameters. We first introduce the definition of self-concordant families.
Definition 3.2.2 (Nesterov and Nemirovskii [30, Definition 3.1.1]). Let R++ be the set of
all positive real numbers. Let G be an open nonempty convex subset of Rn. Let µ ∈ R++
and let fµ : R++×G→ R be a family of functions indexed by µ. Let α1(µ), α2(µ), α3(µ),
α4(µ) and α5(µ) : R++ → R++ be continuously differentiable functions on µ. Then the
family of functions fµµ∈R++ is called strongly self-concordant with the parameters α1,
α2, α3, α4, α5, if the following conditions hold:
(i) fµ is continuous on R++ × G, and for fixed µ ∈ R++, fµ is convex on G. fµ has
three partial derivatives on G, which are continuous on R++ × G and continuously
differentiable with respect to µ on R++.
59
(ii) For any µ ∈ R++, the function fµ is strongly α1(µ)-self-concordant.
(iii) For any (µ,x) ∈ R++ ×G and any h ∈ Rn,
|∇xfµ(µ,x)[h]′ − ln α3(µ)′∇xfµ(µ,x)[h]| ≤ α4(µ)α1(µ)12
(∇2
xxfµ(µ,x)[h,h]) 1
2 ,
|∇2xxfµ(µ, x)[h,h]′ − ln α2(µ)′∇2
xxfµ(µ,x)[h,h]| ≤ 2α5(µ)∇2xxfµ(µ,x)[h,h].
In this section, we need to compute ∇xη(µ,x)′, and in order to do so we need to
determine the partial derivative of ν(k)(µ,x) with respect to µ. Let (y(k)′, ν(k)′ , s(k)′)
denote the partial derivatives of (y(k)(µ,x), ν(k)(µ,x), s(k)(µ,x)) with respect to µ. Dif-
ferentiating (3.1.10) with respect to µ, we get the system
s(k) ν(k)′ + ν(k) s(k)′ = e2,
W (k)y(k)′ + s(k)′ = 0,
W (k)Tν(k)′ = 0.
(3.2.3)
Solving the system (3.2.3), we obtain
s(k)′ = W (k)R(k)−1W (k)T s(k)−1
,
ν(k)′ =⌈s(k)−1/2
(t)⌉P (k)e2,
y(k)′ = −R(k)−1W (k)T s(k)−1
.
(3.2.4)
The proof of strongly self-concordance of the family η(µ, ·) : µ > 0 depends on the
following two lemmas.
Lemma 3.2.3. For any µ > 0,x ∈ F0 and h ∈ Rm1, the following inequality holds:
|∇xη(µ,x)T[h]′| ≤(−(r1 +Kr2)
µ∇2
xxη(µ,x)[h,h]
)1/2
.
60
Proof. By differentiating (3.1.19) with respect to µ and applying (3.2.4), we obtain
∇xη(µ,x)′ = −K∑k=1
T (k)T⌈s(k)−1/2
(t)⌉P (k)e2 − ATs−1 = − 1
√µBε,
where B ∈ Rm1×(Kn2+n1) is defined by
B :=[√
µ T (1)T⌈s(1)−1/2
(t)⌉P (1), . . . ,
õ T (k)T
⌈s(K)−1/2
(t)⌉P (K),
õ AT
⌈s−1/2
⌉ ]
and
ε := (e2; . . . ; e2︸ ︷︷ ︸K times
; e1) ∈ J2 × · · · × J2︸ ︷︷ ︸K times
×J1.
Notice that, in view of (3.1.20), we have
BBT = µK∑k=1
T (k)T⌈s(k)−1/2
(t)⌉P (k)
⌈s(k)−1/2
(t)⌉T (k) + µ AT
⌈s−1⌉A = −∇2
xxη(µ,x).
This gives
−∇xη(µ,x)T′∇2xxη(µ,x)−1∇xη(µ,x)′ = 1
µε •BT[BBT]−1Bε
≤ 1µε • ε
= 1µ(e1 • e1 +Ke2 • e2).
Recall that e1 • e1 = rank(J1) = r1 and e2 • e2 = rank(J2) = r2. It follows that
−∇xη(µ,x)T′∇2xxη(µ,x)−1∇xη(µ,x)′ ≤ (r1 +Kr2)
µ. (3.2.5)
We then have
|∇xη(µ,x)T[h]′| ≤(−∇xη(µ,x)T′∇2
xxη(µ,x)−1∇xη(µ,x)′)1/2
(−∇2xxη(µ,x)[h,h])
1/2
≤(−(r1 +Kr2)
µ∇2
xxη(µ,x)[h,h]
)1/2
, as desired. 2
61
Lemma 3.2.4. For any µ > 0,x ∈ F0 and h ∈ Rm1, the following inequality holds:
|∇2xxη(µ,x)[h,h]′| ≤ −
2√r2
µ∇2
xxη(µ,x)[h,h].
Proof. Let (ν(k), s(k), P (k), R(k)) := (ν(k)(µ,x), s(k)(µ,x), P (k)(µ,x), R(k)(µ,x)). We fix
h ∈ Rm1 and define
u(k) :=õ P (k)
⌈s(k)−1/2
⌉T (k)h.
Now by following some similar steps as in the proof of Lemma 3.2.2, and using (3.2.2),
(3.1.18), (3.2.3), (3.1.10), and (3.2.4), we get
(u(k) • u(k))′ = 2 u(k) • u(k)′
= 2 u(k) •(⌈s(k)−1/2
(t)⌉′⌈s(k)−1/2
(t)⌉−1
u(k))
= u(k) •⌈s(k)1/2(t)
⌉⌈s(k)−1
(t)⌉′⌈s(k)1/2(t)
⌉u(k)
= u(k) •⌈ν(k)−1/2
(t)⌉⌈ν(k)(t)
⌉′⌈ν(k)−1/2
(t)⌉u(k)
= 2 u(k) •⌈ν(k)−1/2
(t)⌉⌈ν(k)(t), ν(k)′(t)
⌉⌈ν(k)−1/2
(t)⌉u(k)
= 2 u(k) •⌈e2, ν
(k)−1/2(t) ν(k)′(t) ν(k)−1/2
(t)⌉u(k)
= 2 u(k) •⌊ν(k)−1/2
(t) ν(k)′(t) ν(k)−1/2(t)⌋u(k)
= 2 u(k) •⌊ν(k)−1
(t) ν(k)′(t)⌋u(k)
= 2µ−1 u(k) •⌊s(k)(t) ν(k)′(t)
⌋u(k)
= 2µ−1 u(k) •⌊e2 − ν(k)(t) s(k)′(t)
⌋u(k)
= 2µ−1 u(k) •⌊e2 − µ s(k)−1
(t) s(k)′(t)⌋u(k)
≤ 2µ−1‖u(k)‖2∥∥∥e2 − µ
⌈s(k)−1/2
(t)⌉s(k)′(t)
∥∥∥= 2µ−1‖u(k)‖2
∥∥∥e2 − µ⌈s(k)−1/2
(t)⌉W (k)R(k)−1
W (k)T s(k)−1∥∥∥
= 2µ−1‖u(k)‖2 ‖e2 − µ (e2 − P (k)e2)‖
≤ 2µ−1‖u(k)‖2 ‖e2‖,
where the last inequality is obtained by observing that e2 − P (k)e2 KJ2 0, which can be
62
immediately seen by noting that P (k) I.
Recall that ‖e2‖2 = rank(J2) = r2. This gives us
|(u(k) • u(k))′| ≤2√r2
µu(k) • u(k).
From (3.1.20), we have that
∇2xxη(µ,x)[h,h] = −
K∑k=1
u(k) • u(k) − µ (AT⌈s−1⌉A)[h,h].
Therefore, for any h ∈ Rn, we have
|∇2xxη(µ,x)[h,h]′| ≤
K∑k=1
|(u(k) • u(k))′|+ (AT⌈s−1⌉A)[h,h]
≤2√r2
µ
K∑k=1
u(k) • u(k) + (AT⌈s−1⌉A)[h,h]
≤ −2√r2µ∇2
xxη(µ,x)[h,h].
This completes the proof. 2
Theorem 3.2.2. The family η(µ, ·) : µ > 0 is a strongly self-concordant family with
the following parameters
α1(µ) = µ, α2(µ) = α3(µ) = 1, α4(µ) =
√r1 +Kr2
µ, α5(µ) =
√r2
µ.
Proof. It is clear that condition (i) of Definition 3.2.2 holds. Theorem 3.2.1 shows that
condition (ii) is satisfied and Lemmas 3.2.4 and 3.2.3 show that condition (iii) is satisfied.
2
63
3.3 A class of logarithmic barrier algorithms for solv-
ing SSPs
In §3.2 we have established that the parametric functions η(µ, ·) constitute a strongly
self-concordant family. Therefore, it is straightforward to develop primal path following
interior point algorithms for solving SSP (3.1.3, 3.1.4). In this section we introduce a
class of log barrier algorithms for solving this problem. This class is stated formally in
Algorithm 1.
Algorithm 1 The Decomposition Algorithm for Solving SSP
Require: ε > 0, γ ∈ (0, 1), θ > 0, β > 0, x0 ∈ F0 and µ0 > 0.x := x0, µ := µ0
while µ ≥ ε dofor k = 1, 2, . . . , K do
solve (3.1.9) to obtain (y(k), s(k),ν(k))choose a scaling element p ∈ C(s(k),ν(k)) and compute (s(k), ν(k))
end forcompute ∆x := −∇2
xxη(µ,x)−1∇xη(µ,x) using (3.1.19) and (3.1.20)
compute δ(µ,x) :=
√− 1
µ∆xT∇2
xxη(µ,x)∆x using (3.1.20)
while δ > β dox := x+ θ∆xfor k = 1, 2, . . . , K do
solve (3.1.9) to obtain (y(k), s(k),ν(k))choose a scaling element p ∈ C(s(k),ν(k)) and compute (s(k), ν(k))
end forcompute ∆x := −∇2
xxη(µ,x)−1∇xη(µ,x) using (3.1.19) and (3.1.20)
compute δ(µ,x) :=
√− 1
µ∆xT∇2
xxη(µ,x)∆x using (3.1.20)
end whileµ := γµapply inverse scaling to (s(k), ν(k))
end while
Our algorithm is initialized with a starting point x0 ∈ F0 and a starting value µ0 > 0
for the barrier parameter µ, and is indexed by a parameter γ ∈ (0, 1). We use δ as a
measure of the proximity of the current point x to the central path, and β as a threshold
64
for that measure. If the current x is too far away from the central path in the sense that
δ > β, Newton’s method is applied to find a point close to the central path. Then the
value of µ is reduced by a factor γ and the whole precess is repeated until the value of
µ is within the tolerance ε. By tracing the central path as µ approaches zero, a strictly
feasible ε-optimal solution to (3.1.11) will be generated.
3.4 Complexity analysis
In this section we present the complexity analysis for two variants of algorithms: short-
step algorithms and long-step algorithms, which are controlled by the input value of γ in
Algorithm 1.
As mentioned in [25], the first part of the following proposition follows directly from
the definition of self concordance and is due to [30, Theorem 2.1.1]. The second part is
results from the first part and is given in [48] without proof.
Proposition 3.4.1. For any µ > 0 and x ∈ F0, we denote ∆x := −∇2xxη(µ,x)−1∇xη(µ,x)
and δ :=√− 1µ∆xT∇2
xxη(µ,x)∆x. Then for δ < 1, τ ∈ [0, 1] and any h,h1,h2 ∈ Rm2 we
have
(i) −(1−τδ)2hT∇2xxη(µ,x)h ≤ −hT∇2
xxη(µ,x+τ∆x)h ≤ −(1−τδ)−2hT∇2xxη(µ,x)h,
(ii) |hT1 (∇2
xxη(µ,x+ τ∆x)−∇2xxη(µ,x))h2| ≤ [(1− τδ)−2 − 1]
√−hT
1∇2xxη(µ,x)h1√
−hT2∇2
xxη(µ,x)h2.
The following lemma is essentially Theorem 2.2.3 of [30] and describes the behavior
of the Newton method as applied to η(µ, ·).
Lemma 3.4.1. For any µ > 0 and x ∈ F0, let ∆x be the Newton direction defined by
∆x := −∇2xxη(µ,x)−1∇xη(µ,x); δ := δ(µ,x) =
√− 1µ∆xT∇2
xxη(µ,x)∆x, x+ = x +
∆x, ∆x+ be the Newton direction calculated at x+, and δ(µ,x+) :=√− 1µ∆x+T∇2
xxη(µ,x+)∆x+.
Then the following relations hold:
65
(i) If δ < 2−√
3, then δ(µ,x+) ≤(
δ
1− δ
)2
≤ δ2.
(ii) If δ ≥ 2−√
3, then η(µ,x)− η(µ,x+ θ∆x) ≥ µ(δ− ln(1 + δ)), where θ = (1 + δ)−1.
3.4.1 Complexity for short-step algorithm
In the short-step version of the algorithm, we decrease the barrier parameter by a factor
γ := 1 − σ/√r1 +Kr2, with σ < 0.1, in each iteration. The kth iteration of the short-
step algorithm is performed as follows: at the beginning of the iteration, we have µ(k−1)
and x(k−1) on hand and x(k−1) is close to the central path, i.e., δ(µ(k−1),x(k−1)) ≤ β.
After the barrier parameter µ is reduced from µ(k−1) to µk := γµ(k−1), we have that
δ(µk,x(k−1)) ≤ 2β. Then a full Newton step with size θ = 1 is taken to produce a new
point xk with δ(µk,xk) ≤ β. We now show that, in this class of algorithms, only one
Newton step is sufficient for recentering after updating the parameter µ. For the purpose
of proving this result, we present the following proposition which is a restatement of [30,
Theorem 3.1.1].
Proposition 3.4.2. Let χκ(η;µ, µ+) :=(
1+r22
+√r1+Kr2κ
)lnγ−1. Assume that δ(µ,x) <
κ and µ+ := γµ satisfies
χκ(η;µ, µ+) ≤ 1− δ(µ,x)
κ.
Then δ(µ+,x) < κ.
Lemma 3.4.2. Let µ+ = γµ, where γ = 1 − σ/√r1 +Kr2 and σ ≤ 0.1, and let β =
(2−√
3)/2. If δ(µ,x) ≤ β, then δ(µ+,x) ≤ 2β.
Proof. Let κ := 2β = 2 −√
3. Since δ(µ, y) ≤ κ/2, one can verify that for σ ≤ 0.1, µ+
satisfies
χκ(η;µ, µ+) ≤ 1
2≤ 1− δ(µ, y)
κ.
By Proposition 3.4.2, we have δ(µ+, y) ≤ κ. 2
66
By Lemmas 3.4.1(i) and 3.4.2, we conclude that we can reduce the parameter µ by the
factor γ := 1 − σ/√r1 +Kr2, σ < 0.1, at each iteration, and that only one Newton step
is sufficient to restore proximity to the central path. So we have the following complexity
result for short-step algorithms.
Theorem 3.4.1. Consider Algorithm 1 and let µ0 be the initial barrier parameter, ε >
0 the stopping criterion, and β = (2 −√
3)/2. If the starting point x0 is sufficiently
close to the central path, i.e., δ(µ0,x0) ≤ β, then the short-step algorithm reduces the
barrier parameter µ at a linear rate and terminates with at most O(√r1 +Kr2 ln(µ0/ε))
iterations.
3.4.2 Complexity for long-step algorithm
In the long-step version of the algorithm, the barrier parameter is decreased by an arbi-
trary constant factor γ ∈ (0, 1). It has a potential for larger decrease on the objective
function value, however, several damped Newton steps might be needed for restoring the
proximity to the central path.
The kth iteration of the long-step algorithms is performed as follows: at the beginning
of the iteration we have a point x(k−1), which is sufficiently close to x(µ(k−1)), where
x(µ(k−1)) is the solution to (3.1.11) for µ := µ(k−1). The barrier parameter is reduced
from µ(k−1) to µk := γµ(k−1), where γ ∈ (0, 1), and then a search is started to find a point
xk that is sufficiently close to x(µk). The long-step algorithm generates a finite sequence
consisting of N points in F0, and we finally take xk to be equal to the last point of this
sequence. We want to determine an upper bound on N , the number of Newton iterations
that are needed to find the point xk.
For µ > 0 and x ∈ F0, we define the function φ(µ,x) := η(µ,x(µ)) − η(µ,x), which
represents the difference between the objective value η(µk,x(k)) at the end of kth iteration
and the minimum objective value η(µk,x(µk−1)) at the beginning of kth iteration. Then
67
our task is to find an upper bound on φ(µ+,x). To do so, we first give upper bounds on
φ(µ,x) and φ′(µ,x) respectively. We have the following lemma.
Lemma 3.4.3. Let µ > 0 and x ∈ F0, we denote ∆x := x(µ)− x and define
δ := δ(µ,x) =
√− 1
µ∆xT∇2
xxη(µ,x)∆x.
For any µ > 0 and x ∈ F0, if δ < 1, then the following inequalities hold:
φ(µ,x) ≤ µ
(δ
1− δ+ ln(1− δ)
), (3.4.1)
|φ′(µ,x)| ≤ −√r1 +Kr2 ln(1− δ). (3.4.2)
Proof.
φ(µ,x) := η(µ,x(µ))− η(µ,x) :=
∫ 1
0
∇xη(µ,x+ τ∆x)T∆xdτ.
Since x(µ) is the optimal solution, we have
∇xη(µ,x(µ)) = 0 (3.4.3)
Hence, with the aid of Proposition 3.4.1(i), we get
φ(µ,x) =
∫ 1
0
∫ τ
1
∆xT∇2xxη(µ,x+ α∆x)∆x dαdτ
≤ −∫ 1
0
∫ 1
τ
∆xT∇2xxη(µ,x)∆x
(1− αδ)2dαdτ
=
∫ 1
0
∫ 1
τ
µδ2
(1− αδ)2dαdτ
= µ
(δ
1− δ+ ln(1− δ)
),
which establishes (3.4.1).
Now, for any µ > 0, by applying the chain rule, using (3.4.3), and applying the
68
Mean-Value Theorem, we get
φ′(µ,x) = η′(µ,x(µ))− η′(µ,x) +∇xη(µ,x(µ))Tx′(µ)
= η′(µ,x(µ))− η′(µ,x) = ∇xη(µ,x+$∆x)T∆x,(3.4.4)
for some $ ∈ (0, 1). Hence
|φ′(µ,x)| =
∣∣∣∣∫ 1
0
∇xη′(µ,x+ τ∆x)T∆x dτ
∣∣∣∣≤
∫ 1
0
√−∆xT∇2
xxη(µ,x+ τ∆x)∆x√−∇xη′(µ,x+ τ∆x)T[∇2
xxη(µ,x+ τ∆x)]−1∇xη′(µ,x+ τ∆x) dτ.
Then, by using (3.2.5) and Proposition 3.4.1(i), we obtain
|φ′(µ,x)| ≤∫ 1
0
√−∆xT∇2
xxη(µ,x)∆x
1− τ δ
√r1 +Kr2
µdτ
=
∫ 1
0
δ√µ
1− τ δ
√r1 +Kr2
µdτ
=√r1 +Kr2
∫ 1
0
δ
1− τ δdτ
= −√r1 +Kr2 ln(1− δ),
which establishes (3.4.2). 2
Lemma 3.4.4. Let µ > 0 and x ∈ F0 be such that δ < 1, where δ is as defined in Lemma
3.4.3. Let µ+ := γµ with γ ∈ (0, 1). Then
η(µ+,x(µ+))− η(µ+,x) ≤ O(r1 +Kr2)µ+.
Proof. Differentiating (3.4.4) with respect to µ, we get
φ′′(µ,x) = η′′(µ,x(µ))− η′′(µ,x) +∇xη′(µ,x(µ))Tx′(µ). (3.4.5)
69
We will work on the right-hand side of (3.4.5) by verifying that the second term
η′′(µ,x) is nonnegative and then bounding the first and the last term.
From the definition of η(x, µ), we have that η′′(µ,x) =∑K
k=1 ρ′′k(µ,x). Differentiating
ρk(µ,x) with respect to µ and using (3.2.4) and (3.1.10) gives
ρ′k(µ,x) = d(k)Ty(k)′ + ln det s(k) + µ s(k)−1 • s(k)′
= ln det s(k) + d(k)T(−R(k)−1W (k)T s(k)−1
) + µ s(k)−1 • (W (k)R(k)−1W (k)T s(k)−1
)
= ln det s(k) + d(k)T(−R(k)−1W (k)T s(k)−1
) + ν(k) • (W (k)R(k)−1W (k)T s(k)−1
)
= ln det s(k) + (−d(k) + W (k)Tν(k))T(R(k)−1W (k)T s(k)−1
)
= ln det s(k).
Observe that (I − P (k))2 = I − P (k) (remember P (k)2= P (k)). Therefore, by differen-
tiating ρ′k(µ,x) with respect to µ and using (3.2.4) and (3.1.18), we obtain
ρ′′k(µ,x) = s(k)−1 • s(k)′
= s(k)−1 • (W (k)R(k)−1W (k)T s(k)−1
)
= s(k)−1 •(⌈s(k)1/2
⌉ (I − P (k)
) ⌈s(k)1/2
⌉s(k)−1
)=
∥∥∥ (I − P (k)) ⌈s(k)1/2
⌉s(k)−1
∥∥∥2
≥ 0.
Thus, for µ > 0 and x ∈ F0, η′′(µ,x) ≥ 0, and hence η(µ,x) is a convex function of
µ. In addition, using (3.1.18), we also have
ρ′′k(µ,x) = s(k)−1 •(⌈s(k)⌉s(k)−1 −
⌈s(k)1/2
⌉P (k)
⌈s(k)1/2
⌉s(k)−1
)= s(k)−1 •
⌈s(k)⌉s(k)−1 − s(k)−1 •
⌈s(k)1/2
⌉P (k)
⌈s(k)1/2
⌉s(k)−1
= s(k)−1 •⌈s(k)⌉s(k)−1 −
∥∥∥P (k)⌈s(k)1/2
⌉s(k)−1
∥∥∥2
≤ s(k)−1 •⌈s(k)⌉s(k)−1
= s(k)−1 • s(k)
70
= µ−1 ν(k) • s(k)
= µ−1 trace(e2)
=r2
µ.
Hence
η′′(µ,x(µ)) ≤ Kr2
µ. (3.4.6)
By differentiating (3.4.3) with respect to µ, we get
∇xη′(µ,x(µ)) +∇2
xxη(µ,x(µ))x′(µ) = 0,
or equivalently
x′(µ) = −∇2xxη(µ,x(µ))−1∇xη
′(µ,x(µ)).
Hence, by using (3.2.5), we have
−∇xη′(µ,x(µ))Tx′(µ) = ∇xη
′(µ,x(µ))T∇2xxη(µ,x(µ))−1∇xη
′(µ,x(µ))
≤ 1
µ(r1 +Kr2).
(3.4.7)
By combining (3.4.6) and (3.4.7), and using the fact that η′′(µ,x) ≥ 0, we obtain
φ′′(µ,x(µ)) ≤ r1 + 2Kr2
µ. (3.4.8)
Applying the Mean-Value Theorem and using Lemma 3.4.3 and (3.4.8) gives
φ(µ+,x) = φ(µ,x) + φ′(µ,x)(µ+ − µ) +
∫ µ+
µ
∫ τ
µ
φ′′(υ,x) dυdτ
≤ µ
(δ
1− δ+ ln(1− δ)
)−√r1 +Kr2 ln(1− δ) (µ− µ+)
+(r1 + 2Kr2)
∫ µ+
µ
∫ τ
µ
υ−1 dυdτ
71
= µ
(δ
1− δ+ ln(1− δ)
)−√r1 +Kr2 ln(1− δ) (µ− µ+)
+(r1 + 2Kr2) (µ− µ+) ln τµ
≤ µ
(δ
1− δ+ ln(1− δ)
)−√r1 +Kr2 ln(1− δ) (µ− µ+)
+(r1 + 2Kr2) (µ− µ+) lnγ−1
(recall γ−1 = µ+/µ ≥ τ/µ). Since δ and γ are constants, the lemma is established. 2
Notice that the previous lemma requires δ < 1. However, evaluating δ explicitly may
not be possible. In the next lemma we will see that δ is actually proportional to δ, which
can be evaluated.
Lemma 3.4.5. For any µ > 0 and x ∈ F0, let ∆x := −∇2xxη(µ,x)−1∇xη(µ,x) and
∆x := x− x(µ). We denote
δ := δ(µ,x) =
√− 1
µ∆xT∇2
xxη(µ,x)∆x and δ := δ(µ,x) =
√− 1
µ∆xT∇2
xxη(µ,x)∆x.
If δ < 1/6, then 23δ ≤ δ ≤ 2δ.
Proof. Let H := ∇2xxη(µ,x) and g := ∇xη(µ,x), and denote g := g + H∆x. From
(3.4.3), we have ∆x = −H−1g. This gives us ∆x = ∆x+H−1g. We then have
δ =
√− 1
µ(∆x+H−1g)T∇2
xxη(µ,x)(∆x+H−1g)
≤
√√√√− 1
µ∆xT∇2
xxη(µ,x)∆x− 1
µ(H−1g)T∇2
xxη(µ,x)︸ ︷︷ ︸H
(H−1g)
≤ δ +
√− 1
µgTH−1g,
(3.4.9)
where we used the triangle inequality to obtain the last inequality. Note that, by (3.4.3),
72
we have ∇xη(µ,x− ∆x) = 0. Applying the Mean-Value Theorem gives
−hTg = −hT(H∆x+ g)
= −hT(∇2xxη(µ,x)∆x+∇xη(µ,x))
= −hT(∇2xxη(µ,x)∆x− (∇xη(µ,x− ∆x)−∇xη(µ,x)))
= −hT(∇2xxη(µ,x)−∇2
xxη(µ,x− (1−$)∆x))∆x, for some $ ∈ (0, 1).
Now in view of Proposition 3.4.1(ii) we have
−hTg = −∫ 1
0
hT(∇2xxη(µ,x)−∇2
xxη(µ,x− (1− τ)∆x))∆x dτ
≤√−∆xTH∆x
√−hTHh
∫ 1
0
((1− (1− τ)δ)−2 − 1)dτ
=
(δ
1− δ
) √−∆xTH∆x
√−hTHh
=
(√µ δ2
1− δ
)√−hTHh.
It can be verified that −gTH−1g = maxhTHh − 2hTg : h ∈ Rm1. It then follows
that
−gTH−1g ≤ max
hTHh+
(2√µ δ2
1− δ
)√−hTHh : h ∈ Rm1
=
µ δ4
(1− δ)2. (3.4.10)
From (3.4.9) and (3.4.10), we obtain δ ≤ δ+ δ2
1−δ , or equivalently, 2δ2−(1+δ) δ+δ ≥ 0.
Therefore, the condition δ ≤ 1/6 implies that 12δ2− 7 δ+ 1 ≥ 0, which in turn gives that
δ ≤ 1/3 = 2δ.
From (3.4.9), by exchanging positions of ∆x and ∆x and following the above steps,
we get
δ ≤ δ +δ2
1− δ, or equivalently, δ ≤ δ
1− δ≤ δ
1− 13
=3
2δ.
Thus, the condition δ < 1/6 implies that 23δ ≤ δ ≤ 2δ. This completes the proof. 2
73
Combining Lemmas 3.4.1(ii), 3.4.4, and 3.4.5, we have the following complexity result
for long-step algorithms.
Theorem 3.4.2. Consider Algorithm 1 and let µ0 be the initial barrier parameter, ε > 0
the stopping criterion, and β = 1/6. If the starting point x0 is sufficiently close to the
central path, i.e., δ(µ0,x0) ≤ β, then the long-step algorithm reduces the barrier parameter
µ at a linear rate and terminates with at most O((r1 +Kr2) ln(µ0/ε)) iterations.
74
Chapter 4
A Class of Polynomial Volumetric
Barrier Decomposition Algorithms
for Stochastic Symmetric
Programming
Ariyawansa and Zhu [10] have derived a class of polynomial time decomposition-based
algorithms for solving the SSDP problem (Problem 4) based on a volumetric barrier
analogous to work of Mehrotra and Ozevin [25] by utilizing the work of Anstreicher [7]
for DSDP. In this chapter, we extend Ariyawansa and Zhu’s work [10] to the case of
SSPs by deriving a class of volumetric barrier decomposition algorithms for the general
SSP problem and establishing polynomial complexity of certain members of the class of
algorithms. The results of this chapter have been submitted for publication [4].
75
4.1 The volumetric barrier problem for SSPs
In this section we formulate appropriate volumetric barrier function for the SSP problem
(with finite event space Ω) and obtain expressions for the derivatives required in the rest
of the paper. Our procedure closely follows that in §3 of [10], although our setting is much
more general.
4.1.1 Formulation and assumptions
We now examine (2.1.3, 2.1.4) when the event space Ω is finite. Let (T (k),W (k),h(k),d(k)) :
k = 1, 2, . . . , K be the set of the possible values of the random variables(T (ω),W (ω),h(ω),
d(ω))
and let pk := P(T (ω),W (ω),h(ω),d(ω)
)=(T (k),W (k),h(k),d(k)
)be the associated
probability for k = 1, 2, . . . , K. Then Problem (2.1.3, 2.1.4) becomes
max cTx+K∑k=1
pkQ(k)(x)
s.t. Ax KJ1 b,(4.1.1)
where, for k = 1, 2, . . . , K, Q(k)(x) is the maximum of the problem
max d(k)Ty(k)
s.t. W (k)y(k) KJ2 h(k) − T (k)x,
(4.1.2)
where x ∈ Rm1 is the first-stage decision variable, and y(k) ∈ Rm2 is the second-stage
variable for k = 1, 2, . . . , K.
We notice that the constraints in (4.1.1, 4.1.2) are negative symmetric while the com-
mon practice in the DSP literature is to use positive symmetric constraints. So for conve-
nience we redefine d(k) as d(k) := pkd(k) for k = 1, 2, . . . , K, and rewrite Problem (4.1.1,
76
4.1.2) as
min cTx+K∑k=1
Q(k)(x)
s.t. Ax− b KJ1 0,
(4.1.3)
where, for k = 1, 2, . . . , K, Q(k)(x) is the maximum of the problem
min d(k)Ty(k)
s.t. W (k)y(k) + T (k)x− h(k) KJ2 0.(4.1.4)
In the rest of this paper our attention will be on Problem (4.1.3, 4.1.4), and from now
on when we use the acronym “SSP” in this paper we mean Problem (4.1.3, 4.1.4).
4.1.2 The volumetric barrier problem for SSPs
In this section we formulate a volumetric barrier for SSPs and obtain expressions for the
derivatives required in our subsequent development.
In order to define the volumetric barrier problem for the SSP (4.1.3, 4.1.4), we need
to make some assumptions. First we define
F1 :=x : s1(x) := Ax− b KJ1 0
;
F (k)(x) :=y(k) : s
(k)2 (x,y(k)) := W (k)y(k) + T (k)x− h(k) KJ2 0
for k = 1, 2, . . . , K;
F2 :=x : F (k)(x) 6= ∅, k = 1, 2, . . . , K
;
F0 := F1
⋂F2.
Now we make
Assumption 4.1.1. The matrix A and every matrix T (k) have full column rank.
Assumption 4.1.2. The set F0 is nonempty.
Assumption 4.1.3. For each x ∈ F0 and for k = 1, 2, . . . , K, Problem (4.1.4) has a
nonempty isolated compact set of minimizers.
77
Assumption 4.1.1 is for convenience. Under Assumption 4.1.2, the set F1 is nonempty.
The logarithmic barrier [30] for F1 is the function l1 : F1 → R defined by
l1(x) := − ln det(s1(x)), ∀x ∈ F1,
and the volumetric barrier [30, 40] for F1 is the function v1 : F1 → R defined by
v1(x) :=1
2ln det(∇2
xxl1(x)), ∀x ∈ F1.
Also under Assumption 4.1.2, F2 is nonempty and for x ∈ F2, F (k)(x) is nonempty for k =
1, 2, . . . , K. The logarithmic barrier [30] for F (k)(x) is the function l(k)2 : F2×F (k)(x)→ R
defined by
l(k)2 (x,y(k)) := −ln det(s
(k)2 (x,y(k))), ∀y(k) ∈ F (k)(x), x ∈ F2,
and the volumetric barrier [30, 40] for F (k)(x) is the function v(k)2 : F2 × F (k)(x) → R
defined by
v(k)2 (x,y(k)) :=
1
2ln det(∇2
y(k)y(k)l(k)2 (x,y(k))), ∀y(k) ∈ F (k)(x), x ∈ F2.
Now, we define the volumetric barrier problem for the SSP (4.1.3, 4.1.4) as
min η(µ,x) : = cTx+K∑k=1
ρk(µ,x) + µc1v1(x), (4.1.5)
where for k = 1, 2, . . . , K and x ∈ F0, ρk(µ,x) is the minimum of the problem
min d(k)Ty(k) + µc2v(k)2 (x,y(k)). (4.1.6)
78
Here c1 := 225√n1 and c2 := 450n3
2 are constants, and µ > 0 is the barrier parameter.
Now, we will show that (4.1.6) has a unique minimizer for each x ∈ F0 and for
k = 1, 2, . . . , K. For this purpose, we present the following theorem.
Theorem 4.1.1 (Fiacco and Mccormick [19, Theorem 8]). Consider the inequality con-
strained problem
min f(x)
s.t. gi(x) ≥ 0, i = 1, 2, . . . ,m,(4.1.7)
where the functions f, g1, . . . , gm : Rn → R are continuous. Let I be a scalar-valued
function of x with the following two properties: I(x) is continuous in the region R0 :=
x : gi(x) > 0, i = 1, 2, . . . ,m, which is assumed to be nonempty; if xk is any infinite
sequence of points in R0 converging to xB such that gi(xB) = 0 for at least one i, then
limk→∞ I(xk) = +∞. Let τ be a scalar-valued function of the single variable s with the
following two properties: if s1 > s2 > 0, then τ(s1) > τ(s2) > 0; if sk is an infinite
sequence of points such that limk→∞ sk = 0, then limk→∞ τ(sk) = 0. Let U : R0×R+ → R
be defined by U(x, s) := f(x)+τ(s)I(x). If (4.1.7) has a nonempty, isolated compact set of
local minimizers and sk is a strictly decreasing infinite sequence, then the unconstrained
local minimizers of U(·, sk) exist for sk small.
Lemma 4.1.1. If Assumptions 4.1.2 and 4.1.3 hold, then for each x ∈ F0 and k =
1, 2, . . . , K, the Problem (4.1.6) has a unique minimizer for µ small.
Proof. For any x ∈ F0, v(k)2 (x,y(k)) is defined on the nonempty set F (k)(x). By Theorem
1.3.1, there exist real numbers λ(k)1 , λ
(k)2 , · · · , λ(k)
r2 and a Jordan frame c(k)1 , c
(k)2 , · · · , c(k)
r2
such that s(k)2 (x,y(k)) = λ
(k)1 c
(k)1 + λ
(k)2 c
(k)2 + · · ·+ λ
(k)r2 c
(k)r2 , and λ
(k)1 , λ
(k)2 , · · · , λ(k)
r2 are the
eigenvalues of s(k)2 (x,y(k)). Moreover, λ
(k)j can be viewed as a function of y(k) ∈ F (k)(x)
for j = 1, 2, . . . , r2. Then λ(k)j is continuous for j = 1, 2, . . . , r2 and hence the constraint
s(k)2 (x,y(k)) KJ2 0 can be replaced by the constraints: λ
(k)j (y(k)) > 0, j = 1, 2, . . . , r2.
So (4.1.4) can be rewritten in the form of (4.1.7). Therefore, by Theorem 4.1.1, local
79
minimizers of (4.1.6) exist for each x ∈ F0 and k = 1, 2, . . . K for µ small. The uniqueness
of the minimizer follows from the fact that v(k)2 is strictly convex. 2
In view of Lemma 4.1.1, Problem (4.1.5) is well-defined, and its feasible set is F0.
4.1.3 Computation of ∇xη(µ,x) and ∇2xxη(µ,x)
In order to compute the derivatives of η we need to determine the derivatives of ρk, k =
1, 2, . . . , K, which in turn require the derivatives of v(k)2 and l
(k)2 for k = 1, 2, . . . , K. We
will drop the superscript (k) when it does not lead to confusion.
Note that
∇xs1(x) = A, ∇xs2(x,y) = T, and ∇ys2(x,y) = W.
Hence
∇xl1(x) = −(∇xs1)Ts−11 = −ATs−1
1 , and ∇yl2(x,y) = −(∇ys2)Ts−12 = −WTs−1
2 .
This implies that
∇2xxl1(x) = ATds−1
1 e∇xs1 = ATds−11 eA,
and
H := ∇2yyl2(x,y) = WTds−1
2 e∇xs2 = WTds−12 eW.
We need the matrix calculus result for our computation.
Proposition 4.1.1. Let X ∈ Rn×n be nonsingular . Then
∂
∂Xij
X−1 = −X−1eieTjX
−1,
for i, j = 1, 2, . . . , n. Here, ei is the ith vector in the standard basis for Rn.
80
To compute the first partial derivatives of v2(x,y), we start by observing that
∂
∂xk
⌈s−1
2
⌉=
∑ij
∂
∂ ds2eij
⌈s−1
2
⌉ ∂
∂xkds2eij
=∑ij
∂
∂ ds2eijds2e−1 ∂
∂xkds2eij
= −2∑ij
ds2e−1 eieTj ds2e−1
⌈s2,
∂
∂xks2
⌉ij
= −2∑ij
ds2e−1 eieTj ds2e−1 ds2, tkeij
= −2⌈s−1
2
⌉ds2, tke
⌈s−1
2
⌉.
(4.1.8)
Then, for i = 1, 2, . . . ,m1, we have
∂
∂xiv2(x,y) = 1
2
∂
∂xiln det H
= 12H−1 • ∂
∂xiH
= 12H−1 • ∂
∂xi(WTds−1
2 eW )
= 12H−1 •WT
(∂
∂xids−1
2 e)W
= −H−1 •WT ds2e−1 ds2, tie ds2e−1W
= −WTH−1W • ds2e−1 ds2, tie ds2e−1
= −⌈s−1/22
⌉WTH−1W
⌈s−1/22
⌉•⌈s−1/22
⌉ds2, tie
⌈s−1/22
⌉By defining
P :=⌈s−1/22
⌉WTH−1W
⌈s−1/22
⌉,
which acts as the orthogonal projection onto the range of⌈s−1/22
⌉W , we get
∂
∂xiv2(x,y) = −WTH−1W •
⌈s−1
2
⌉ds2, tie
⌈s−1
2
⌉= −P •
⌈s−1/22
⌉ds2, tie
⌈s−1/22
⌉,
(4.1.9)
for i = 1, 2, . . . ,m1.
81
Similarly, we have
∂
∂yiv2(x,y) = −WTH−1W •
⌈s−1
2
⌉ds2,wie
⌈s−1
2
⌉= −P •
⌈s−1/22
⌉ds2,wie
⌈s−1/22
⌉,
for i = 1, 2, . . . ,m2.
To compute the second partial derivatives of v2(x,y), we start by observing that
∂
∂xkH−1 =
∑ij
∂
∂Hij
H−1 ∂
∂xkHij
= −∑ij
(H−1eieTjH−1)
∂
∂xk
[WT
⌈s−1
2
⌉W]ij
= −∑ij
(H−1eieTjH−1)[WT
(∂
∂xk
⌈s−1
2
⌉)W]ij
= 2∑ij
(H−1eieTjH−1)[WT ds2e−1 ds2, tke ds2e−1W
]ij
= 2H−1WT⌈s−1
2
⌉ds2, tke
⌈s−1
2
⌉WH−1,
(4.1.10)
and, using (4.1.8) and Lemma 1.3.2, that
∂
∂xj
(⌈s−1
2
⌉ds2, tie
⌈s−1
2
⌉)=
(∂
∂xj
⌈s−1
2
⌉)ds2, tie
⌈s−1
2
⌉+⌈s−1
2
⌉( ∂
∂xjds2, tie
)⌈s−1
2
⌉+⌈s−1
2
⌉ds2, tie
(∂
∂xj
⌈s−1
2
⌉)= −2
⌈s−1
2
⌉ds2, tje
⌈s−1
2
⌉ds2, tie
⌈s−1
2
⌉+⌈s−1
2
⌉dtj, tie
⌈s−1
2
⌉−2
⌈s−1
2
⌉ds2, tie
⌈s−1
2
⌉ds2, tje
⌈s−1
2
⌉= −2
⌈s−1
2
⌉ds2, tje
⌈s−1
2
⌉ds2, tie
⌈s−1
2
⌉+⌈s−1
2
⌉ (dtj, tie − 2 ds2, tie
⌈s−1
2
⌉ds2, tje
) ⌈s−1
2
⌉.
(4.1.11)
By combining (4.1.9), (4.1.10) and (4.1.11), we get
∇2xxv2(y,x) =
∂2
∂x∂xv2(x,y) = 2Qxx +Rxx − 2Txx, (4.1.12)
82
where
Qxxi,j = (WH−1WT) •
⌈s−1
2
⌉ds2, tje
⌈s−1
2
⌉ds2, tie
⌈s−1
2
⌉,
Rxxi,j = (WH−1WT) •
⌈s−1
2
⌉ (2 ds2, tie
⌈s−1
2
⌉ds2, tje − dtj, tie
) ⌈s−1
2
⌉,
Txxi,j = (WH−1WT) •
⌈s−1
2
⌉ds2, tje
⌈s−1
2
⌉WH−1W
⌈s−1
2
⌉ds2, tie
⌈s−1
2
⌉.
By following similar steps as above, we have that
∇2yyv2(y,x) =
∂2
∂y∂yv2(x,y) = 2Qyy +Ryy − 2T yy,
where
Qyyi,j = (WH−1WT) •
⌈s−1
2
⌉ds2,wje
⌈s−1
2
⌉ds2,wie
⌈s−1
2
⌉,
Ryyi,j = (WH−1WT) •
⌈s−1
2
⌉ (2 ds2,wie
⌈s−1
2
⌉ds2,wje − dwj,wie
) ⌈s−1
2
⌉,
T yyi,j = (WH−1WT) •
⌈s−1
2
⌉ds2,wje
⌈s−1
2
⌉WH−1W
⌈s−1
2
⌉ds2,wie
⌈s−1
2
⌉;
∇2xyv2(x,y) =
∂2
∂y∂xv2(x,y) = 2Qxy +Rxy − 2Txy,
where
Qxyi,j = (WH−1WT) •
⌈s−1
2
⌉ds2,wje
⌈s−1
2
⌉ds2, tie
⌈s−1
2
⌉,
Rxyi,j = (WH−1WT) •
⌈s−1
2
⌉ (2 ds2, tie
⌈s−1
2
⌉ds2,wje − dwj, tie
) ⌈s−1
2
⌉,
Txyi,j = (WH−1WT) •
⌈s−1
2
⌉ds2,wje
⌈s−1
2
⌉WH−1W
⌈s−1
2
⌉ds2, tie
⌈s−1
2
⌉;
and
∇2yxv2(y,x) =
∂2
∂x∂yv2(x,y) = 2Qyx +Ryx − 2T yx,
where
Qyyi,j = (WH−1WT) •
⌈s−1
2
⌉ds2, tje
⌈s−1
2
⌉ds2,wie
⌈s−1
2
⌉,
83
Ryyi,j = (WH−1WT) •
⌈s−1
2
⌉ (2 ds2,wie
⌈s−1
2
⌉ds2, tje − dtj,wie
) ⌈s−1
2
⌉,
T yxi,j = (WH−1WT) •
⌈s−1
2
⌉ds2, tje
⌈s−1
2
⌉WH−1W
⌈s−1
2
⌉ds2,wie
⌈s−1
2
⌉.
Now, define ϕk : R+ ×F0 ×F (k)(x) −→ R by
ϕk(µ,x,y) := dTy + µc2v2(x,y).
By (4.1.6) we then have
ρk(µ,x) = miny∈F(k)(x)
ϕk(µ,x,y)
and
ρk(µ,x) = ϕk(µ,x,y)|y=y = ϕk(µ,x, y),
where y is the minimizer of (4.1.6). Observe that y is a function of x and is defined by
∇yϕk(µ,x,y)|y=y = 0. (4.1.13)
Note that, by (4.1.13), we have ∇yv2(x, y) = ∇yv2(x,y)|y=y = − 1µc2d. This implies that
∇2yxv2(x, y) = ∇x∇yv2(x, y) = 0 and ∇3
yxxv2(x, y) = ∇2xx∇yv2(x, y) = 0.
Now we are ready to calculate the first and second order derivatives of ρk with respect to
x. We have
∇xρk(µ,x) = [∇xϕk(µ,x,y) +∇yϕk(µ,x,y) ∇xy]|y=y
= ∇xϕk(µ,x,y)|y=y +∇yϕk(µ,x,y)|y=y ∇xy|y=y
= ∇xϕk(µ,x,y)|y=y
= µc2∇xv2(x,y)|y=y
84
= µc2∇xv2(x, y),
∇2xxρk(µ,x) = µ∇x∇xv2(x, y)
= µc2∇2xxv2(x, y) +∇2
yxv2(x, y) [∇xy]|y=y
= µc2∇2xxv2(x, y),
∇3xxxρk(µ,x) = µ∇x∇2
xxv2(x, y
= µc2∇3xxxv2(x, y) +∇3
yxxv2(x, y) [∇xy]|y=y
= µc2∇3xxxv2(x, y).
In summary we have
∇xρk(µ,x) = µc2∇xv(k)2 (x, y(k)),
∇2xxρk(µ,x) = µc2∇2
xxv(k)2 (x, y(k)),
∇3xxxρk(µ,x) = µc2∇3
xxxv(k)2 (x, y(k)),
(4.1.14)
and
∇xη(µ,x) = c+ µc1∇xv1(x) +K∑k=1
µc2∇xv(k)2 (x, y(k)),
∇2xxη(µ,x) = µc1∇2
xxv1(x) +K∑k=1
µc2∇2xxv
(k)2 (x, y(k)),
(4.1.15)
where ∇xv(k)2 (x, y(k)), ∇2
xxv(k)2 (x, y(k)), and ∇3
xxxv(k)2 (x, y(k)) are calculated in (4.1.9),
(4.1.12), and (4.2.4) respectively.
4.2 Self-concordance properties of the volumetric bar-
rier recourse
In this section we prove that the recourse function with volumetric barrier is a strongly
self-concordant function leading to a strongly self-concordant family with appropriate
85
parameters. Establishing this allows us to develop volumetric barrier polynomial time
path following interior point methods for solving SSPs.
We need the following proposition for proving some results.
Proposition 4.2.1. Let A,B,C ∈ Rn×n. Then
1. A,B 0 implies that A •B ≥ 0;
2. if A 0 and B C, then A •B ≥ A • C.
4.2.1 Self-Concordance of η(µ, ·)
This subsection is devoted to show that η(µ, ·) is a strongly self-concordant barrier on F0
(see Definition 3.2.1). Throughout this subsection, with respect to h ∈ Rm1 , we define
b := b(h) :=
m1∑i=1
hiti and b :=⌈s−1/22
⌉ds2, be
⌈s−1/22
⌉e2.
Our proof relies on the following lemmas.
Lemma 4.2.1. Let (x,y) be such that s2(x,y) KJ2 0. Then we have
0 Qxx ∇2xxv2(x,y). (4.2.1)
Proof. Let h ∈ Rm1 , h 6= 0. We have
hTQxxh =∑i,j
Qxxij hihj
= (WH−1WT) •∑i,j
( ⌈s−1
2
⌉ds2, tje
⌈s−1
2
⌉ds2, tie
⌈s−1
2
⌉hihj
)= (WH−1WT) •
⌈s−1
2
⌉ (∑i,j
ds2, hjtje⌈s−1
2
⌉ds2, hitie
) ⌈s−1
2
⌉= (WH−1WT) •
⌈s−1
2
⌉ [∑j
ds2, hjtje] ⌈s−1
2
⌉ [∑i
ds2, hitie] ⌈s−1
2
⌉= (WH−1WT) •
⌈s−1
2
⌉ds2, be
⌈s−1
2
⌉ds2, be
⌈s−1
2
⌉
86
=(⌈s−1/22
⌉WH−1WT
⌈s−1/22
⌉)•(⌈s−1/22
⌉ds2, be
⌈s−1/22
⌉)2
= P •⌊b2⌋.
Similarly we have
hTRxxh = (WH−1WT) •(
2⌈s−1
2
⌉ds2, be
⌈s−1
2
⌉ds2, be
⌈s−1
2
⌉−⌈s−1
2
⌉dbe⌈s−1
2
⌉ )=
(⌈s−1/22
⌉WH−1WT
⌈s−1/22
⌉)•(
2(⌈s−1/22
⌉ds2, be
⌈s−1/22
⌉)2
−⌈s−1/22
⌉dbe⌈s−1/22
⌉)= P •
⌈b⌉,
and
hTTxxh = (WH−1WT) • ds2e−1 ds2, be⌈s−1
2
⌉WH−1W
⌈s−1
2
⌉ds2, be
⌈s−1
2
⌉= P •
⌈s−1/22
⌉ds2, be
⌈s−1/22
⌉P⌈s−1/22
⌉ds2, be
⌈s−1/22
⌉= P •
⌊b⌋P⌊b⌋.
Using Proposition 4.2.1 and observing that P 0 and⌊b2⌋ 0, we conclude that
Qxx 0.
Since P is a projection, we have that I − P 0 and therefore
⌊b⌋P⌊b⌋⌊b⌋2
=1
2
(⌊b2⌋
+⌈b⌉). (4.2.2)
This implies that
P •⌊b⌋P⌊b⌋≤ 1
2P •
(⌊b2⌋
+⌈b⌉),
which is exactly hTTxxh ≤ 1
2hT(Qxx +Rxx)h. Since h is arbitrary, we have shown that
Txx 1
2(Qxx +Rxx), which together with Qxx 0 establishes the result. 2
87
Lemma 4.2.2. For any h ∈ Rm1, and (x,y) be such that s2(x,y) KJ2 0. Then
‖b‖ ≤√
2 r3/22 (hTQxxh)1/2. (4.2.3)
Proof. By Theorem 1.3.1, there exist real numbers λ1, λ2, · · · , λr2 and a Jordan frame
c1, c2, · · · , cr2 such that b = λ1c1 +λ2c2 + · · ·+λr2cr2 , and λ1, λ2, · · · , λr2 are the eigenval-
ues of b. Without loss of generality (scaling h as needed, and re-ordering indices), we may
assume that 1 = |λ1| ≥ |λ2| ≥ . . . ≥ λr2 . In view of Lemma 1.3.3, the matrix⌊b2⌋
has a
full set of orthonormal eigenvectors cij with corresponding eigenvalues (1/2)(λ2i +λ2
j), for
1 ≤ i ≤ j ≤ r2. It follows that
hTQxxh =1
2P
[r2∑
i,j=1
(λ2i + λ2
j) cijcTij
]
=1
2
r2∑i,j=1
(λ2i + λ2
j) cTijPcij.
Recall that P is a projection onto an m2-dimensional space. So, we can write P as
P =
m2∑l=1
uluTl ,
where u1,u2, . . . ,um2 are the orthonormal eigenvectors of P corresponding to the nonzero
eigenvalues of P . Consider uk for some k, we have
uk =
r2∑i,j=1
αijcij,
for some constants αij, for i, j = 1, 2, . . . , r2, and
1 = ‖uk‖ =∥∥∥ r2∑i,j=1
αijcij
∥∥∥ ≤ r2∑i,j=1
‖αijcij‖ =
r2∑i,j=1
|αij|.
88
This means that there exist ik, jk such that
|αikjk | ≥1
r22
.
Thus
hTQxxh =1
2
∑i,j
(λ2i + λ2
j)cTij
(m2∑l=1
uluTl
)cij
=1
2
∑i,j
(λ2i + λ2
j)
m2∑l=1
cTijuluTl cij
=1
2
∑i,j
(λ2i + λ2
j)
m2∑l=1
‖uTl cij‖2
≥ 1
2
∑i,j
(λ2i + λ2
j)‖uTk cij‖2
=1
2
∑i,j
(λ2i + λ2
j)|αikjk |2
≥ 1
2
∑i,j
(λ2i + λ2
j)1
r42
≥ 1
2r42
∑j
(λ2i + λ2
j)
≥ 1
2r42
∑j
λ2i
=1
2r42
∑j
‖b‖22
=1
2r32
‖b‖22.
The result is established. 2
We will next compute the third partial derivative of v2(x,y) with respect to x. To
start, let (x,y) be such that s2(x,y) KJ2 0, and h ∈ Rn2 . We have
∂
∂xihTQxxh = W
(∂
∂xiH−1
)WT •
⌈s−1
2
⌉ds2, be
⌈s−1
2
⌉ds2, be
⌈s−1
2
⌉+ (WH−1WT) • ∂
∂xj
(⌈s−1
2
⌉ds2, be
⌈s−1
2
⌉ds2, be
⌈s−1
2
⌉)= 2WH−1WT ds2e−1 ds2, tie ds2e−1WH−1WT •
(⌈s−1
2
⌉ds2, be
)2 ⌈s−1
2
⌉+ (WH−1WT) • ∂
∂xi
(⌈s−1
2
⌉ds2, be
⌈s−1
2
⌉ds2, be
⌈s−1
2
⌉),
89
where
∂
∂xi
(⌈s−1
2
⌉ds2, be
⌈s−1
2
⌉ds2, be
⌈s−1
2
⌉)= −2
⌈s−1
2
⌉ (ds2, tie
⌈s−1
2
⌉ds2, be
⌈s−1
2
⌉ds2, be
+ ds2, be⌈s−1
2
⌉ds2, tie
⌈s−1
2
⌉ds2, be
+ ds2, be⌈s−1
2
⌉ds2, be
⌈s−1
2
⌉ds2, tie
) ⌈s−1
2
⌉+⌈s−1
2
⌉ (dti, be
⌈s−1
2
⌉ds2, be
+ ds2, be⌈s−1
2
⌉dti, be
) ⌈s−1
2
⌉.
We conclude that the first directional derivative of hTQxxh with respect to x, in the
direction h, is given by
∇xhTQxxh [h] =
m1∑i=1
hi∂
∂xihTQxxh
= 2P •⌊b⌋P⌊b2⌋− 3P •
⌊b3⌋− P
⌈b2, b
⌉.
By following similar steps as above, we obtain
∇xhTRxxh [h] = 2P •
⌊b⌋P⌈b⌉− 4P •
⌈b2, b
⌉,
∇xhTTxxh [h] = 4P •
⌊b⌋P⌊b⌋P⌊b⌋− 4P •
⌊b⌋P⌊b2⌋− 2P •
⌊b⌋P⌈b⌉.
By combining the previous results, we get
∇3xxxv2(x,y) [h,h,h] = 12P •
⌊b⌋P⌊b2⌋− 6P •
⌊b3⌋− 6P •
⌈b2, b
⌉+6P •
⌊b⌋P⌈b⌉− 8P •
⌊b⌋P⌊b⌋P⌊b⌋.
(4.2.4)
We need the following lemma which bounds ∇3xxxv2(x,y) [h,h,h].
Lemma 4.2.3. For any h ∈ Rm1 and (x,y) be such that s2(x,y) KJ2 0. Then
|∇3xxxv2(x,y) [h,h,h] | ≤ 30‖b‖hTQxxh. (4.2.5)
90
Proof. Note that ⌊b2⌋ ⌊b⌋
=1
2
(⌊b3⌋
+⌈b2, b
⌉).
Then (4.2.4) can be rewritten as
∇3xxxv2(x,y) [h,h,h] = P
⌊b⌋P •
(12⌊b2⌋
+ 6⌈b⌉− 8
⌊b⌋P⌊b⌋)− 12P •
⌊b2⌋ ⌊b⌋.
(4.2.6)
From (4.2.2) we have
12⌊b2⌋
+ 6⌈b⌉− 8
⌊b⌋P⌊b⌋ 8
⌊b2⌋
+ 2⌈b⌉.
By observing that that −⌊b2⌋⌈b⌉⌊b2⌋, we obtain
6⌊b2⌋ 12
⌊b2⌋
+ 6⌈b⌉− 8
⌊b⌋P⌊b⌋ 18
⌊b2⌋. (4.2.7)
Let λ1, λ2, . . . , λr2 be the eigenvalues of B. Then, for i, j = 1, 2, . . . , r2, the eigenvalues of⌊b⌋
are of the form (1/2)(λi + λj) (see Lemma 1.3.3). We then have
−‖b‖ I ⌊b⌋ ‖b‖ I
and hence
−‖b‖P P⌊b⌋P ‖b‖P. (4.2.8)
Using (4.2.7), (4.2.8), and the face that⌊b2⌋ 0, we get
∣∣P ⌊b⌋P • (12⌊b2⌋
+ 6⌈b⌉− 8
⌊b⌋P⌊b⌋)∣∣ ≤ 18||b‖P •
⌊b2⌋. (4.2.9)
In addition, since the elements b2 and b have the same eigenvectors, the matrices⌊b2⌋
91
and⌊b⌋
also have the same eigenvectors. This implies that
−‖b‖⌊b2⌋⌊b2⌋ ⌊b⌋ ‖b‖
⌊b2⌋.
Thus we have that ∣∣P • ⌊b2⌋ ⌊b⌋∣∣ ≤ ‖b‖P • ⌊b2
⌋. (4.2.10)
The result follows from (4.2.6), (4.2.9) and (4.2.10). 2
We can now state the proof of Theorem 4.2.1.
Theorem 4.2.1. For any fixed µ > 0, ρk(µ, ·) is µ-self-concordant on F0, for k =
1, 2, . . . , K.
Proof. By combining the results of (4.2.1), (4.2.3) and (4.2.5), we get
∣∣∇3xxxv2(x, y) [h,h,h]
∣∣ ≤ 30√
2n3/22 (hT∇2
xxv2(x, y)h)3/2,
which combined with (4.1.14) implies that
|∇3xxxρk(y) [h,h,h]| ≤ 30
√2µc2r
3/22 (∇2
xxv2(x, y) [h,h])3/2
= 2µ−1/2(c2µ∇2xxv2(x,y) [h,h])3/2
= 2µ−1/2(∇2xxρk(x) [h,h])3/2.
The theorem is established. 2
Corollary 4.2.1. For any fixed µ > 0, η(µ, ·) is a µ-self-concordant function on F0.
Proof. It is easy to verify that µc1v1 is µ-self-concordant on F1. The corollary follows
from [30, Proposition 2.1.1]. 2
92
4.2.2 Parameters of the self-concordant family
In this subsection, we show that the family of functions η(µ, ·) : µ > 0 is a strongly
self-concordant family with appropriate parameters (see definition 3.2.2).
The proof of self-concordancy of the family η(µ, ·) : µ > 0 relies on the following
two lemmas.
Lemma 4.2.4. For any µ > 0 and x ∈ F0, the following inequality holds:
|∇2xxη(µ,x)′[h,h]| ≤ 1
µ∇2
xxη(µ,x)[h,h], ∀h ∈ Rm1 .
Proof. Differentiating ∇2xxη(µ,x) in (4.1.15) with respect to µ, we obtain
∇2xxη(µ,x)′ = ∇2
xxv1(x) +K∑k=1
∇2xxv
(k)2 (x, y(k)) + µ∇3
xxyv(k)2 (x, y(k)) (y(k))′
= ∇2xxv1(x) +
K∑k=1
∇2xxv
(k)2 (x, y(k)) =
1
µ∇2
xxη(µ,x).
The result immediately follows by observing that ∇2xxη(µ,x) 0, and therefore, for
any h ∈ Rn, we have that 1µ∇2
xxη(µ,x)[h,h] ≥ 0. 2
For fixed (x, y) with s2(x, y) KJ2 0, let
ti =⌈s−1/22
⌉ds2, tie
⌈s−1/22
⌉e2 and wj =
⌈s−1/22
⌉ds2,wje
⌈s−1/22
⌉e2,
for i = 1, 2, . . . ,m1 and j = 1, 2, . . . ,m2. We can apply a Gram-Schmidt procedure to
bwic and obtain buic with ‖ buic ‖ = 1 for all i and ui • uj = 0, i 6= j. Then the
linear span of buic , i = 1, 2, . . . ,m2 is equal to the span of bwic , i = 1, 2, . . . ,m2.
Let U = [u1; u2; . . . ; um2 ] ∈ Rn22×m2 and u =
∑m2
k=1 ui. It follows that P = UUT. We
93
then have∂v2(x, y)
∂xi= −P • btic
= −UUT • btic
= −trace(U bticU)
= −m2∑k=1
uk • (ti uk)
= −m2∑k=1
ti • u2k
= −ti •m2∑k=1
u2k
= −ti • u,
(4.2.11)
and
Qxxi,j = P • btic btjc
= trace(U btic btjcU)
=
m2∑k=1
(uk • (ti (tj uk)))
=
m2∑k=1
ti • (u2k tj)
= ti •
((m2∑k=1
u2k
) tj
)= ti • (u tj).
(4.2.12)
Lemma 4.2.5. Let (x, y) be such that s2(x, y) KJ2 0. Then
∇xv2(x, y)T∇2xxv2(x, y)−1∇xv2(x, y) ≤ m2. (4.2.13)
Proof. Let T = [t1; t2; . . . ; tm1 ] ∈ Rn22×m1 . From (4.2.12) we have that
Qxxi,j = ti • (u tj) = ti • due tj,
94
and hence
Qxx = TT due T .
From (4.2.11), we also have
∇xv2(x, y)T = −TTu.
Thus
∇xv2(x, y)T(Qxx)−1∇xv2(x, y) = u • T (TT due T )−1TTu
= u1/2 •⌈u1/2
⌉T (TT due T )−1TT
⌈u1/2
⌉u1/2
≤ u1/2 • u1/2
= trace(u)
= m2,
where the last equality follows from the fact that u =∑m2
k=1 u2k, and that trace(u2
k) = uk •
uk = 1 for each k. In addition, Qxx ∇2xxv2(x, y) implies ∇2
xxv2(x, y)−1 (Qxx)−1.
This completes the proof. 2
Lemma 4.2.6. For any µ > 0 and x ∈ F0, we have
|∇xη′(µ,x)T[h]| ≤
√(m1c1 +m2c2)(1 +K)
µ∇2
xxη(µ,x)[h,h], ∀h ∈ Rm1 .
Proof. Differentiating ∇xη(µ,x) in (4.1.15) with respect to µ, we obtain
∇xη′(µ,x) = c1∇xv1(x) +
K∑k=1
c1∇xv(k)2 (x, y(k)) + µ∇2
xyv(k)2 (x, y(k)) · (y(k))′
= c1∇xv1(x) +K∑k=1
c2∇xv(k)2 (x, y(k)) .
95
In Lemma 4.2.13, we have shown that
∇xv(k)2 (x, y(k))T∇2
xxv(k)2 (x, y(k))−1∇xv
(k)2 (x, y(k)) ≤ m2,
which is equivalent to
|∇xv(k)2 (x, y(k))[h]| ≤
√m2∇2
xxv(k)2 (x, y(k))[h,h], ∀h ∈ Rm2 . (4.2.14)
Similarly, we can show that
∇xv1(x)T∇2xxv1(x)−1∇xv1(x) ≤ m1,
which is equivalent to
|∇xv1(x)[h]| ≤√m1∇2
xxv1(x)[h,h], ∀h ∈ Rm1 . (4.2.15)
Then, using (4.2.14) and (4.2.15), we have that for all h ∈ Rm1
|∇xη′(µ,x)[h]| =
∣∣∣∣∣(c1∇xv1(x) +
K∑k=1
c2∇xv(k)2 (x, y(k))
)[h]
∣∣∣∣∣≤ |c1∇xv1(x)[h]|+
K∑k=1
|c2∇xv(k)2 (x, y(k))[h]|
≤√m1c2
1∇2xxv1(x)[h,h] +
K∑k=1
√m2c2
2∇2xxv
(k)2 (x, y(k))[h,h]
≤√
(m1c1)c1∇2xxv1(x)[h,h] +
K∑k=1
√(m2c2)c2∇2
xxv(k)2 (x, y(k))[h,h]
≤
√√√√(m1c1 +m2c2)(1 +K)
(c1∇2
xxv1(x)[h,h] +K∑k=1
c2∇2xxv
(k)2 (x, y(k))[h,h]
)
=
√(m1c1 +m2c2)(1 +K)
µ∇2
xxη(µ,x)[h,h].
96
The result is established. 2
Theorem 4.2.2. The family η(µ, ·) : µ > 0 is a strongly self-concordant family with
the following parameters
α1(µ) = µ, α2(µ) = α3(µ) = 1, α4(µ) =
√(1 +K)(m1c1 +m2c2)
µ, α5(µ) =
1
µ.
Proof. It is direct to see that condition (i) of Definition 3.2.2 holds. Corollary 4.2.1
shows that condition (ii) is satisfied and Lemmas 4.2.4 and 4.2.6 show that condition (iii)
is satisfied. 2
4.3 A class of volumetric barrier algorithms for solv-
ing SSPs
In §5 we have established that the parametric functions η(µ, ·) is a strongly self-concordant
family. Therefore, it becomes straightforward to develop primal path following interior
point algorithms for solving SSP (4.1.3, 4.1.4). In this section we introduce a class of
volumetric barrier algorithms for solving this problem. This class is stated formally in
Algorithm 2.
Our algorithm is initialized with a starting point x0 ∈ F0 and a starting value µ0 > 0
for the barrier parameter µ, and is indexed by a parameter γ ∈ (0, 1). We use δ as a
measure of the proximity of the current point x to the central path, and β as a threshold
for that measure. If the current x is too far away from the central path in the sense that
δ > β, Newton’s method is applied to find a point close to the central path. Then the
value of µ is reduced by a factor γ and the whole precess is repeated until the value of
µ is within the tolerance ε. By tracing the central path as µ approaches zero, a strictly
97
Algorithm 2 Volumetric Barrier Algorithm for Solving SSP (4.1.3,4.1.4)
Require: ε > 0, γ ∈ (0, 1), θ > 0, β > 0, x0 ∈ F0 and µ0 > 0.x := x0, µ := µ0
while µ ≥ ε dofor k = 1, 2, . . . , K do
solve (4.1.6) to obtain y(k)
end forcompute ∆x := −∇2
xxη(µ,x)−1∇xη(µ,x) using (4.1.15)
compute δ(µ,x) :=
√− 1
µ∆xT∇2
xxη(µ,x)∆x using (4.1.15)
while δ > β dox := x+ θ∆xfor k = 1, 2, . . . , K do
solve (4.1.6) to obtain y(k)
end forcompute ∆x := −∇2
xxη(µ,x)−1∇xη(µ,x) using (4.1.15)
compute δ(µ,x) :=
√− 1
µ∆xT∇2
xxη(µ,x)∆x using (4.1.15)
end whileµ := γµ
end while
feasible ε-solution to (4.1.6) will be generated.
4.4 Complexity analysis
Theorems 4.4.1 and 4.4.2 present the complexity analysis for two variants of algorithms:
short-step algorithms and long-step algorithms, which are controlled by the manner of
selection γ that we have made in Algorithm 2.
In the short-step version of the algorithm, we decrease the barrier parameter in each
iteration by a factor γ := 1−σ/√
(1 +K)(m1c1 +m2c2), σ < 0.1. The kth iteration of the
short-step algorithms is performed as follows: at the beginning of the iteration, we have
µ(k−1) and x(k−1) on hand and x(k−1) is close to the center path, i.e., δ(µ(k−1),x(k−1)) ≤ β.
After the barrier parameter µ is reduced from µ(k−1) to µk := γµ(k−1), we have that
δ(µk,x(k−1)) ≤ 2β. Then a full Newton step with size θ = 1 is taken to produce a new
98
point xk with δ(µk,xk) ≤ β. We now show that, in this class of algorithms, only one
Newton step is sufficient for recentering after updating the parameter µ. We have the
following theorem.
Theorem 4.4.1. Consider Algorithm 2 and let µ0 be the initial barrier parameter, ε > 0
the stopping criterion, and β = (2 −√
3)/2. If the starting point x0 is sufficiently close
to the central path, i.e., δ(µ0,x0) ≤ β, then the short-step algorithm reduces the barrier
parameter µ at a linear rate and terminates with at most O(√
(1 +K)(m1c1 +m2c2)
ln(µ0/ε)) iterations.
In the long-step version of the algorithm, the barrier parameter is decreased by an
arbitrary constant factor γ ∈ (0, 1). It has a potential for larger decrease on the objective
function value, however, several damped Newton steps might be needed for restoring the
proximity to the central path. The kth iteration of the long-step algorithms is performed
as follows: at the beginning of the iteration we have a point x(k−1), which is sufficiently
close to x(µ(k−1)), where x(µ(k−1)) is the solution to (4.1.5) for µ := µ(k−1). The barrier
parameter is reduced from µ(k−1) to µk := γµ(k−1), where γ ∈ (0, 1), and then searching
is started to find a point xk that is sufficiently close to x(µk). The long-step algorithm
generates a finite sequence consisting of N points in F0, and we finally take xk to be
equal to the last point of this sequence. We want to determine an upper bound on N , the
number of Newton iterations that are needed to find the point xk. We have the following
theorem.
Theorem 4.4.2. Consider Algorithm 2 and let µ0 be the initial barrier parameter, ε > 0
the stopping criterion, and β = 1/6. If the starting point x0 is sufficiently close to the
central path, i.e., δ(µ0,x0) ≤ β, then the long-step algorithm reduces the barrier parameter
µ at a linear rate and terminates with at most O((1+K)(m1c1+m2c2) ln(µ0/ε)) iterations.
The proofs of the Theorems 4.4.1 and 4.4.1 are similar to the proofs of Theorems 3.4.1
99
and 3.4.2 and are given in the subsections below. For convenience, we restate Proposition
3.4.1(i) and Lemma 3.4.1.
Proposition 4.4.1. For any µ > 0 and x ∈ F0, we denote ∆x := −∇2xxη(µ,x)−1
∇xη(µ,x) and δ :=√
1µ∇2
xxη(µ,x)[∆x,∆x]. Then for δ < 1, τ ∈ [0, 1] and any h ∈ Rm2
we have
∇2xxη(µ,x+ τ∆x)[h,h] ≤ (1− τδ)−2∇2
xxη(µ,x)[h,h].
Lemma 4.4.1. For any µ > 0 and x ∈ F0, let ∆x be the Newton direction defined
by ∆x := −∇2xxη(µ,x)−1∇xη(µ,x); δ := δ(µ,x) =
√1µ∇2
xxη(µ,x)[∆x,∆x], x+ =
x+ ∆x, ∆x+ be the Newton direction calculated at x+, and δ(µ,x+) :=√1µ∇2
xxη(µ,x+)[∆x+,∆x+]. Then the following relations hold:
(i) If δ < 2−√
3, then δ(µ,x+) ≤(
δ
1− δ
)2
≤ δ
2.
(ii) If δ ≥ 2−√
3, then η(µ,x)− η(µ,x+ θ∆x) ≥ µ(δ− ln(1 + δ)), where θ = (1 + δ)−1.
4.4.1 Complexity for short-step algorithm
For the purpose of proving Theorems 4.4.1, we present the following proposition which is
a restatement of [30, Theorem 3.1.1].
Proposition 4.4.2. Let χκ(η;µ, µ+) :=
(1 +
√(1+K)(m1c1+m2c2)
κ
)lnγ−1. Assume that
δ(µ,x) < κ and µ+ := γµ satisfies
χκ(η;µ, µ+) ≤ 1− δ(µ,x)
κ.
Then δ(µ+,x) < κ.
Lemma 4.4.2. Let µ+ = γµ, where γ = 1 − σ/√
(1 +K)(m1c1 +m2c2) and σ ≤ 0.1,
and let β = (2−√
3)/2. If δ(µ,x) ≤ β, then δ(µ+,x) ≤ 2β.
100
Proof. Let κ := 2β = 2 −√
3. Since δ(µ,x) ≤ κ/2, one can verify that for σ ≤ 0.1, µ+
satisfies
χκ(η;µ, µ+) ≤ 1
2≤ 1− δ(µ,x)
κ.
By Proposition 4.4.2, we have δ(µ+,x) ≤ κ. 2
By Lemmas 4.4.1(i) and 4.4.2, we conclude that we can reduce the parameter µ by
the factor γ := 1 − σ/√
(1 +K)(m1c1 +m2c2), σ < 0.1, at each iteration, and that only
one Newton step is sufficient to restore proximity to the central path. Hence, Theorem
4.4.1 follows.
4.4.2 Complexity for long-step algorithm
For x ∈ F0 and µ > 0, we define the function φ(µ,x) := η(µ,x(µ)) − η(µ,x), which
represents the difference between the objective value η(µk,x(k)) at the end of kth iteration
and the minimum objective value η(µk,x(µk−1)) at the beginning of kth iteration. Then
the task is to find an upper bound on φ(µ,x). To do so, we first give upper bounds on
φ(µ,x) and φ′(µ,x) respectively. We have the following lemma.
Lemma 4.4.3. Let µ > 0 and x ∈ F0, we denote ∆x := x− x(µ) and define
δ := δ(µ,x) =
√1
µ∇2
xxη(µ,x)[∆x, ∆x].
For any µ > 0 and x ∈ F0, if δ < 1, then the following inequalities hold:
φ(µ,x) ≤ µ
(δ
1− δ+ ln(1− δ)
), (4.4.1)
|φ′(µ,x)| ≤ −√
(1 +K)(m1c1 +m2c2) ln(1− δ). (4.4.2)
101
Proof.
φ(µ,x) := η(µ,x)− η(µ,x(µ)) :=
∫ 1
0
∇xη(µ,x+ (1− τ)∆x)[∆x]dτ.
Since x(µ) is the optimal solution, we have
∇xη(µ,x(µ)) = 0. (4.4.3)
Hence, with the aid of Proposition 4.4.1, we get
φ(µ,x) =
∫ 1
0
∫ 1
0
∇2xxη(µ,x(µ) + (1− α)∆x)[∆x, ∆x] dαdτ
≤∫ 1
0
∫ 1
0
∇2xxη(µ,x)[∆x, ∆x]
(1− δ + αδ)2dαdτ
=
∫ 1
0
∫ 0
1
µδ2
(1− δ + αδ)2dαdτ
= µ
(δ
1− δ+ ln(1− δ)
),
which establishes (4.4.1).
Now, for any µ > 0, by applying the chain rule, using (4.4.3), and applying the
Mean-Value Theorem, we get
φ′(µ,x) = η′(µ,x)− η′(µ,x(µ))−∇xη(µ,x(µ))Tx′(µ)
= η′(µ,x)− η′(µ,x(µ))
= ∇xη(µ,x(µ) +$∆x)T∆x,
(4.4.4)
for some $ ∈ (0, 1). Hence
|φ′(µ,x)| =
∣∣∣∣∫ 1
0
∇xη(µ,x(µ) + τ∆x)′[∆x] dτ
∣∣∣∣
102
≤∫ 1
0
√∇2
xxη(µ,x(µ) + τ∆x)[∆x, ∆x]√∇xη(µ,x(µ) + τ∆x)′ • (∇2
xxη(µ,x(µ) + τ∆x)−1∇xη(µ,x(µ) + τ∆x)′) dτ.
In view of Lemma 4.2.6 we have the following estimation
−∇xη(µ,x)′ • (∇2xxη(µ,x)−1∇xη(µ,x)′) ≤ (1 +K)(m1c1 +m2c2)
µ. (4.4.5)
Then, by using (4.4.5), Proposition 4.4.1, and the observation x(µ)+τ∆x = x−(1−τ)∆x,
we obtain
|φ′(µ,x)| ≤∫ 1
0
√∇2
xxη(µ,x)[∆x, ∆x]
1− (1− τ)δ
√(1 +K)(m1c1 +m2c2)
µdτ
=
∫ 1
0
δ√µ
1− δ + τ δ
√(1 +K)(m1c1 +m2c2)
µdτ
=√
(1 +K)(m1c1 +m2c2)
∫ 1
0
δ
1− δ + τ δdτ
= −√
(1 +K)(m1c1 +m2c2) ln(1− δ),
which establishes (4.4.2). 2
Lemma 4.4.4. Let µ > 0 and x ∈ F0 be such that δ < 1, where δ is as defined in Lemma
4.4.3. Let µ+ := γµ with γ ∈ (0, 1). Then
η(µ+,x)− η(µ+,x(µ+)) ≤ O(1)[(1 +K)(m1c1 +m2c2)]µ+.
Proof. Differentiating (4.4.4) with respect to µ, we get
φ′′(µ,x) = η′′(µ,x)− η′′(µ,x(µ))− ∇xη(µ,x(µ))′Tx′(µ). (4.4.6)
We will work on the right-hand side of (4.4.6) by bounding the second and the last
terms. Observe that η′′(µ,x(µ)) =∑K
k=1 ρ(k)′′(µ,x). Differentiating ρ(k)(µ,x) with re-
103
spect to µ we obtain ρ(k)′(µ,x) = c2v2(x, y). Now, differentiating ρ(k)′(µ,x) with respect
to µ we obtain ρ(k)′′(µ,x) = c2∇yv2(x, y)Ty′. Then, differentiating (4.1.13) yields
y′ = − 1
µ∇2
yyv2(x, y)−1∇yv2(x, y).
Therefore we have
ρ′′k(µ,x) = − 1
µc2∇yv2(x, y)T∇2
yyv2(x, y)−1∇yv2(x, y).
It can be shown (see also [7, Theorem 4.4]) that −ρ′′k(µ, y) ≤ 1
µc2m2. Thus
−η′′(µ,x(µ)) = −K∑k=1
ρ(k)′′(µ,x) ≤ m2c2K
µr2
. (4.4.7)
By differentiating (4.4.3) with respect to µ, we get
∇xη(µ,x(µ))′ +∇2xxη(µ,x(µ))x′(µ) = 0,
or equivalently,
x′(µ) = −∇2xxη(µ,x(µ))−1∇xη
′(µ,x(µ)).
Hence, by using (4.4.5), we have
−∇xη(µ,x(µ))′Tx′(µ) = ∇xη(µ,x(µ))′T∇2xxη(µ,x(µ))−1∇xη(µ,x(µ))′
≤ (1 +K)(m1c1 +m2c2)
µ.
(4.4.8)
Observe that η(µ,x) is concave in µ, it follows that η′′(µ,x) ≤ 0. Combining this with
(4.4.7) and (4.4.8), we obtain
104
φ′′(µ,x(µ)) ≤ m(1 +K)(m1c1 + 2m2c2)
µ. (4.4.9)
Applying the Mean-Value Theorem and using Lemma 4.4.3 and (4.4.9) to get
φ(µ+,x) = φ(µ,x) + φ′(µ,x)(µ+ − µ) +
∫ µ
µ+
∫ µ
τ
φ′′(υ,x) dυdτ
≤ µ
(δ
1− δ+ ln(1− δ)
)−√
(1 +K)(m1c1 +m2c2) ln(1− δ) (µ+ − µ)
+m(1 +K)(m1c1 + 2m2c2)
∫ µ
µ+
∫ τ
µ
υ−1 dυdτ
= µ
(δ
1− δ+ ln(1− δ)
)−√
(1 +K)(m1c1 +m2c2) ln(1− δ) (µ+ − µ)
+m(1 +K)(m1c1 + 2m2c2) (µ− µ+) ln τµ
≤ µ
(δ
1− δ+ ln(1− δ)
)−√
(1 +K)(m1c1 +m2c2) ln(1− δ) (µ− µ+)
+m(1 +K)(m1c1 + 2m2c2) (µ− µ+) lnγ−1.
(Recall that γ−1 = µ+/µ ≥ τ/µ.)
Since δ and γ are constants, the lemma is established. 2
Note that the previous lemma requires δ < 1. However, evaluating δ explicitly may
not be possible. In the next lemma we will see that δ is actually proportional to δ, which
can be evaluated.
Lemma 4.4.5. For any µ > 0 and x ∈ F0, let ∆x := −∇2xxη(µ,x)−1∇xη(µ,x) and
∆x := x− x(µ). We denote
δ := δ(µ,x) =
√− 1
µ∇2
xxη(µ,x)[∆x,∆x] and δ := δ(µ,x) =
√− 1
µ∇2
xxη(µ,x)[∆x, ∆x].
If δ < 1/6, then 23δ ≤ δ ≤ 2δ.
Proof. See the proof of Lemma 3.4.5. 2
Theorem 4.4.2 follows by combining Lemmas 4.4.1(ii), 4.4.4, and 4.4.5.
105
Chapter 5
Some Applications
In this chapter of this dissertation, we will turn our attention to present four applications of
SSOCPs and SRQCPs. Namely, we describe stochastic Euclidean facility location problem
and the portfolio optimization problem with loss risk constraints as two applications
of SSOCPs, then we describe the optimal covering random ellipsoid problem and an
application in structural optimization as two applications of SRQCPs. We also refer the
reader to a paper by Maggioni et al. [24] which describes another important application
of SRQCPs in mobile ad hoc networks. The results of this chapter have been submitted
for publication [2].
5.1 Two applications of SSOCPs
Each one of the following two subsections is devoted to describe an application of SSOCPs.
5.1.1 Stochastic Euclidean facility location problem
In facility location problems (FLPs) we are interested in choosing a location to build a
new facility or locations to build multiple new facilities so that an appropriate measure
of distance from the new facilities to existing facilities is minimized. FLPs arise locating
106
airports, regional campuses, wireless communication towers, etc.
The following are two ways of classifying FLPs (see also [39]): We can classify FLPs
based on the number of new facilities in the following sense: if we add only one new facility
then we get a problem known as a single facility location problem (SFLP), while if we add
multiple new facilities instead of adding only one, then we get more a general problem
known as a multiple facility location problem (MFLP). Another way of classification is
based on the distance measure used in the model between the facilities. If we use the
Euclidean distance then these problems are called Euclidean facility location problems
(EFLPs), if we use the rectilinear distance then these problems are called rectilinear
facility location problems (RFLPs).
In (deterministic) Euclidean single facility location problem (ESFLP), we are given r
existing facilities represented by the fixed points a1,a2, · · · ,ar in Rn, and we plan to place
a new facility represented by x so that we minimize the weighted sum of the distances
between x and each of the points a1,a2, · · · ,ar. This leads us to the problem
min∑r
i=1wi ‖x− ai‖2
or, alternatively, to the problem (see for example [42])
min∑r
i=1wi ti
s.t. (t1;x− a1; · · · ; tr;x− ar) r(n+1)r 0,
where wi is the weight associated with the ith existing facility and the new facility for i =
1, 2, . . . , r. However, the resulting model is a DSOCP model. So, there is no stochasticity
in this model. But what if we assume that some of the fixed existing facilities are random?
Where could such a randomness be found in the real world?
Picture (A) in Figure 5.1 shows the locations of multi-national military facilities led
by troops from the United States and the United Kingdom in the Middle East region as
107
Figure 5.1: The multinational military facilities locations before and after starting theIraq war. Pictures taken from www.globalsecurity.org/military/facility/centcom.htm.
of December, 31, 2002, i.e., before the beginning of the Iraq war (it is known that the
war began on March 20, 2003). This picture shows the multi-national military facilities
including the navy facility locations and the air force facility locations but not the army
facility locations. Assuming that the random existing facilities are the Iraqi military
facilities whose locations are unknown at the time of taking this picture which is before
the beginning of the war. Assuming also that the new facilities are the army multi-national
facilities whose locations have to be determined at the time of taking this picture which
is before the beginning of the war. Then it is reasonable to look at a problem of this kind
as a stochastic ESFLP. Picture (B) in Figure 5.1 shows the locations of all multi-national
military facility locations including the army facility locations as of December, 31, 2002,
i.e., after starting the Iraq war.
So, in some applications, the locations of existing facilities cannot be fully specified
because the locations of some of them depend on information not available at the time
when decision needs to be made but will only be available at a later point in time. In
general, in order to be precise only the latest information of the random facilities is used.
This may require an increasing or decreasing of the number of the new facilities after
108
the latest information about the random facilities become available. In this case, we
are interested in stochastic facility location problems (or abbreviated as stochastic FLPs).
When the locations of all old facilities are fully specified, FLPs are called deterministic
facility location problems (or abbreviated as deterministic FLPs).
In this section we consider (both single and multiple) stochastic Euclidean facility
location problems, and in the next chapter we describe four different models of FLP, two
of them can be viewed as generalizations of the models presented in this section.
Stochastic ESFLPs
Let a1,a2, . . . ,ar1 be fixed points in Rn representing the coordinates of r1 existing fixed
facilities and a1(ω), a2(ω), . . . , ar2(ω) be random points in Rn representing the coordinates
of r2 random facilities who realizations depends on an underlying outcome ω in an event
space Ω with a known probability function P .
Suppose that at present we do not know the realizations of r2 random facilities, and
that at some point in time in future the realizations of these r2 random facilities become
known.
Our goal is to locate a new facility x that minimizes the weighted sums of the Euclidean
distance between the new facility and each one of the existing fixed facilities and also
minimizes the expected weighted sums of the distance between the new facility and the
realization of each one of the random facilities. Note that this decision needs to be made
before the realizations of the r2 random facilities become available. This leads us to the
following SSOCP model:
min∑r1
i=1wi ti + E [Q(x, ω)]
s.t. (t1;x− a1; . . . ; tr1 ;x− ar1) r1 0,
where Q(x, ω) is the minimum value of the problem
109
min∑r2
j=1 wj tj
s.t. (t1;x− a1(ω); . . . ; tr2 ;x− ar2(ω)) r2 0,
and
E[Q(x, ω)] :=
∫Ω
Q(x, ω)P (dω),
where wi is the weight associated with the ith existing facility and the new facility for
i = 1, 2, . . . , r1 and wj(ω) is the weight associated with the jth random existing facility
and the new facility for j = 1, 2, . . . , r2.
Stochastic EMFLPs
Assume that we need to add m new facilities, namely x1,x2, . . . ,xm ∈ Rn, instead of
adding only one. This may require an increasing or decreasing of the number of the new
facilities after the latest information about the random facilities become available. For
simplicity, let us assume that the number of new facilities was previously known and fixed.
Then we have two cases depending on whether or not there is an interaction among the
new facilities in the underlying model. If there is no interaction between the new facilities,
we are just concerned in minimizing the weighted sums of the distance between each one
of the new facilities on one hand and each one of the fixed facilities and the realization of
each one of the random facilities on the other hand. In other words, we solve the following
SSOCP model:
min∑m
j=1
∑r1i=1wij tij + E [Q(x1; . . . ;xm, ω)]
s.t. (t1j;xj − a1; . . . ; tr1j;xj − ar1) r1 0, j = 1, 2, . . . ,m,
where Q(x1; · · · ;xm, ω) is the minimum value of the problem
110
min∑m
j=1
∑r2i=1 wij(ω) tij
s.t. (t1j;xj − a1(ω); . . . ; tr2j;xj − ar2(ω)) r2 0, j = 1, 2, . . . ,m,
and
E[Q(x1; . . . ;xm, ω)] :=
∫Ω
Q(x1; . . . ;xm, ω)P (dω).
where wij is the weight associated with the ith existing facility and the jth new facility
for j = 1, 2, . . . ,m and i = 1, 2, . . . , r1, and wij(ω) is the weight associated with the ith
random existing facility and the jth new facility for j = 1, 2, . . . ,m and i = 1, 2, . . . , r2.
If interaction exists among the new facilities, then, in addition to the above require-
ments, we need to minimize the sum of the Euclidean distances between each pair of the
new facilities. In this case, we are interested in a model of the form:
min∑m
j=1
∑r1i=1wijtij +
∑mj=1
∑j−1j′=1 wjj′ tjj′ + E [Q(x1; . . . ;xr1 , ω)]
s.t. (t1j;xj − a1; . . . ; tr1j;xj − ar1) r1 0, j = 1, 2, . . . ,m
(tj(j+1);xj − xj+1; . . . ; tjm;xj − xm) (m−j) 0, j = 1, 2, . . . ,m− 1,
where Q(x1; . . . ;xm, ω) is the minimum value of the problem
min∑m
j=1
∑r2i=1 wij(ω) tij +
∑mj=1
∑j−1j′=1 wjj′ tjj′
s.t. (t1j;xj − a1(ω); . . . ; tr2j;xj − ar2(ω)) r2 0, j = 1, 2, . . . ,m
(tj(j+1);xj − xj+1; . . . ; tjm;xj − xm) (m−j) 0, j = 1, 2, . . . ,m− 1,
and
E[Q(x1; . . . ;xm, ω)] :=
∫Ω
Q(x1; . . . ;xm, ω)P (dω),
111
where wjj′ is the weight associated with the new facilities j′ and j for j′ = 1, 2, . . . , j − 1
and j = 1, 2, . . . ,m.
5.1.2 Portfolio optimization with loss risk constraints
The application in this subsection is a well-known problem from portfolio optimization.
We consider the problem of maximizing the expected return subject to loss risk con-
straints. The same problem over one period was cited as an application of DSOCP (see
Lobo, et al. [23]). Some extensions of this problem will be described.
The problem
Consider a portfolio problem with n assets or stocks over two periods. We start by letting
xi denote the amount of asset i held at the beginning of (and throughout) the first period,
and pi will denote the price change of asset i over this period. So, the vector p ∈ Rn is
the price vector over the first period. For simplicity, we let p be Gaussian with known
mean p and covariance Σ, so the return over this period is the (scalar) Gaussian random
variable r = pTx with mean r = pTx and variance σ = xTΣx where x = (x1;x2; . . . ;xn).
Let yi denote the amount of asset i held at the beginning of (and throughout) the
second period, and qi(ω) will denote the random price change of asset i over this period
whose realization depends on an underlying outcome ω in an event space Ω with known
probability function P . Similarly, we take the price vector q(ω) ∈ Rn over the second
period to be Gaussian with mean q(ω) and covariance Σ(ω), so the return over this period
is the (scalar) Gaussian random variable r(ω) = q(ω)Ty with mean ¯r(ω) = q(ω)Ty and
variance σ(ω) = yTΣ(ω)y where y = (y1; y2; . . . ; yn). Suppose that in the first period we
do not know the realization of the Gaussian vector q(ω) ∈ Rn, and that at some point in
time in the future (after the first period) the realization of this Gaussian vector becomes
known. As pointed out in [23], the choices of portfolio variables x and y involve the
112
(classical, Markowitz) tradeoff between random return mean and random variance.
The optimization variables are the portfolio vectors x ∈ Rn (of the first stage) and
y ∈ Rn (of the second stage). For these portfolio vectors, we take the simplest assumptions
x ≥ 0,y ≥ 0 (i.e., no short positions [23]) and∑n
i=1 xi =∑n
i=1 yi = 1 (i.e. unit total
budget [23]).
Let α and α be given unwanted return levels over the first and second periods, respec-
tively, and let β and β be given maximum probabilities over the first and second periods,
respectively. Assuming the above data is given, our goal is to determine the amount of
the asset i (which is xi over the first period and yi over the second period), i.e. determine
x and y, such that the expected returns over these two periods are maximized subject
to the following two types of loss risk constraints: the constraint P (r ≤ α) ≤ β must be
satisfied over the first period, and the constraint P (r(ω) ≤ α) ≤ β must be satisfied over
the second period. This determination needs to be made before the realizations become
available.
Formulation of the model
As noted in [23], the constraint P (r ≤ α) ≤ β is equivalent to the second-order cone
constraint (α− r; Φ−1(β)(Σ
12x)) 0,
provided β ≤ 1/2 (i.e.,Φ−1(β) ≤ 0), where
Φ(z) =1√2π
∫ z
−∞e−t
2/2dt
is the cumulative normal distribution function of a zero mean unit variance Gaussian
random variable. To prove this (see also [23]), notice that the constraint P (r ≤ α) ≤ β
113
can be written as
P
(r − r√σ≤ α− r√
σ
)≤ β.
Since (r− r)/√σ is a zero mean unit variance Gaussian random variable, the probability
above is simply Φ((α − r)/√σ), thus the constraint P (r ≤ α) ≤ β can be expressed as
Φ((α− r)/√σ) ≤ β or ((α− r)/
√σ) ≤ Φ−1(β), or equivalently r+ Φ−1(β)
√σ ≥ α. Since
√σ =√xTΣ x =
√(Σ1/2 x)T(Σ1/2x) = ‖Σ1/2x‖2,
the constraint P (r ≤ α) ≤ β is equivalent to the the second-order cone inequality r +
Φ−1(β)‖Σ1/2 x‖2 ≥ α or equivalently to the second-order cone constraint
(α− r; Φ−1(β)(Σ
12x)) 0.
Similarly, provided β ≤ 1/2, the constraint P (r(ω) ≤ α) ≤ β is equivalent to the
second-order cone constraint
(α− ¯r(ω); Φ−1(β)(Σ(ω)1/2y)
) 0.
Our goal is to determine the amount of the asset i (which is xi over the first period
and yi over the second period), i.e. determine x and y, such that the expected returns
over these two periods are maximized. This problem can be cast as a two-stage SSOCP
as follows:
max pT x+ E[Q(x, ω)]
s.t.(α− pTx; Φ−1(β)(Σ1/2x)
) 0
1Tx = 1,x ≥ 0,
(5.1.1)
where Q(x, ω) is the maximum value of the problem
114
max q(ω)T y
s.t.(α− q(ω)Ty; Φ−1(β)(Σ(ω)1/2y)
) 0
1Ty = 1,y ≥ 0,
(5.1.2)
and
E[Q(x, ω)] :=
∫Ω
Q(x, ω)P (dω).
Note that this formulation is different from one suggested by Mehrotra and Ozevin [27]
which minimizes the variance period (two-stage extension of Markowitz’s mean-variance
model). Here we formulate a model with a linear objective function leading to another
approach for solving this problem.
Extensions
The simple problem described above has many extensions [23]. One of these extensions is
imposing several loss risk constraints, i.e., the constraints P (r ≤ αi) ≤ βi, i = 1, 2, . . . , k1
(where βi ≤ 1/2, for i = 1, 2, . . . , k1), or equivalently
(αi − r; Φ−1(βi)(Σ
1/2x)) 0, for i = 1, 2, . . . , k1
to be satisfied over the first period, and the constraints P (r(ω) ≤ αj) ≤ βj, j = 1, 2, . . . , k2
(where βj ≤ 1/2, for j = 1, 2, . . . , k2), or equivalently
(αj − ¯r(ω); Φ−1(βj)(Σ(ω)1/2y)
) 0, for j = 1, 2, . . . , k2
to be satisfied over the second period. So our problem becomes
115
max pT x+ E[Q(x, ω)]
s.t.(αi − r; Φ−1(βi)(Σ
1/2x)) 0, i = 1, 2, . . . , k1
1Tx = 1,x ≥ 0,
where Q(x, ω) is the maximum value of the problem
max q(ω)T y
s.t.(αj − ¯r(ω); Φ−1(βj)(Σ(ω)1/2y)
) 0, j = 1, 2, . . . , k2
1Ty = 1,y ≥ 0,
and
E[Q(x, ω)] :=
∫Ω
Q(x, ω)P (dω).
As another extension, observe that the statistical models (p; Σ) for the price changes
during the first period and(q(ω), Σ(ω)
)for the price changes during the second period
are both uncertain (regardless of the fact that the later depends on ω while the first
does not) and one of the limitations of model in (5.1.1, 5.1.2) is its need to estimate these
statistical models. In [11], Bawa et al. indicated that using estimates of unknown expected
returns and unknown covariance matrices leads to an estimation risk in portfolio choice.
To handle this uncertainty, as in [11], we take those expected returns and covariance
matrices that belong to bounded intervals. i.e., we consider the statistical models (p; Σ)
and (q(ω), Σ(ω)) that belong to the bounded sets:
ℵ := (p; Σ) : pl ≤ p ≤ pu,Σl ≤ Σ ≤ Σu
116
and
< := (q(ω); Σ(ω)) : ql(ω) ≤ q(ω) ≤ qu(ω), Σl(ω) ≤ Σ(ω) ≤ Σu(ω),
respectively, where pl, pu, ql(ω), qu(ω),Σl,Σu, Σl(ω) and Σu(ω) are the extreme values of
the intervals mentioned above, then we take a worst case realization of the statistical
models (p; Σ) and(q(ω), Σ(ω)
)by maximizing the minimum of the expected returns
over all (p; Σ) ∈ ℵ and (q(ω), Σ(ω)) ∈ <.
Let us consider here only a discrete version of this development. Suppose we have
N1 different possible scenarios over the first period, each of which is modeled by a simple
Gaussian model for the price change vector pk with mean pk and covariance Σk. Similarly,
we also have N2 different possible scenarios over the second period, each of which is
modeled by a simple Gaussian model for the price change ql(ω), with mean ql(ω) and
covariance Σl(ω). We can then take a worst case realization of these information by
maximizing the minimum of the expected returns for these different scenarios, subject to
a constraint on the loss risk for each scenario to get the following SSOCP model:
max pT x+ E[Q(x, ω)]
s.t.(α− pTi x; Φ−1(β)(Σ
1/2i x)
) 0, i = 1, 2, . . . , N1
1Tx = 1,x ≥ 0,
where E[Q(x, ω)] is the maximum value of the problem
max q(ω)T y
s.t.(α− qj(ω)Ty; Φ−1(β)(Σj(ω)1/2y)
) 0, j = 1, 2, . . . , N2
1Ty = 1,y ≥ 0,
and
E[Q(x, ω)] :=
∫Ω
Q(x, ω)P (dω).
117
5.2 Two applications of SRQCPs
The stochastic versions of the two problems in this section have been described and
formulated by Ariyawansa and Zhu [49] as SSDP models, but it has been found recently [1,
23, 24] that on numerical grounds solving DSOCPs (SSOCPs) or DRQCPs1 (SRQCPs) by
treating them as DSDPs (SSDPs) is inefficient, and that DSOCPs (SSOCPs) or DRQCPs
(SRQCPs) can be solved more efficiently by exploiting their structure. In fact, in [1, 23]
we see many problems first formulated as DSDPs in [38, 41, 30] have been reformulated
as DSOCPs or DRQCP mainly for this reason. In view of this fact, in this section we
reformulate SRQCP models (as these problems should be solved as such) for the minimum-
volume covering random ellipsoid problem and the structural optimization problem.
5.2.1 Optimal covering random ellipsoid problem
The problem
The problem description in this part is taken from Ariyawansa and Zhu [9, Subsection
3.2]. Suppose we have n1 fixed ellipsoids:
Ei := x ∈ Rn : xTHix+ 2gTi x+ vi ≤ 0 ⊂ Rn, i = 1, 2, . . . , n1
where Hi ∈ Sn+, gi ∈ Rn and vi ∈ R are deterministic data for i = 1, 2, . . . , n1.
Suppose we also have n2 random ellipsoids:
Ei(ω) := x ∈ Rn : xTHi(ω)x+ 2gi(ω)Tx+ vi(ω) ≤ 0 ⊂ Rn, i = 1, 2, . . . , n2,
where, for i = 1, 2, . . . , n2, Hi(ω) ∈ Sn+, gi(ω) ∈ Rn and vi(ω) ∈ R are random data
whose realizations depend on an underlying outcome ω in an event space Ω with a known
1The acronym DRQCPs stands for deterministic rotated quadratic cone programs.
118
probability function P . We assume that, at present time, we do not know the realizations
of n2 random ellipsoids, and that at some point in the future the realizations of these n2
ellipsoids become known.
Given this, we need to determine a ball that contains all n2 fixed ellipsoids and the
realizations of the n2 random ellipsoids. This decision needs to be made before the real-
izations of the random ellipsoids become available. Consequently, when the realizations
of the random ellipsoids do become available, the ball that has already been determined
may or may not contain all the realized random ellipsoids. In order to guarantee that the
modified ball contains all (fixed and realizations of random) ellipsoids, we assume that at
that stage we are allowed to change the radius of the ball (but not its center), if necessary.
We consider the same assumptions as in [49]. We assume that the cost of choosing
the ball has three components:
• the cost of the center, which is proportional to the Euclidean distance to the center
from the origin;
• the cost of the initial radius, which is proportional to the square of the radius;
• the cost of changing the radius. The center and the radius of the initial ball are
determined so that the expected total cost is minimized.
In [49], Ariyawansa and Zhu describe the following concrete version of this generic
application: Let n := 2. The fixed ellipsoids contain targets that need to be destroyed,
and the random ellipsoids contain targets that also need to be destroyed but are moving.
Fighter aircrafts take off from the origin with a planned disk of coverage that contains
the fixed ellipsoids. In order to be accurate only the latest information about the moving
targets is used. This may require increasing the radius of the planned disk of coverage
after latest information about the random ellipsoids become available, which may occur
after the initially planned fighter aircrafts have taken off. This increase, dependent on
119
the initial disk of coverage and the specific information about the moving targets, may
result in an additional cost. The initial disk of coverage need to be determined so that
the expected total cost is minimized.
Our first goal is to determine x ∈ Rn and γ ∈ R such that the ball B defined by
B := x ∈ Rn : xTx− 2xTx+ γ ≤ 0
contains the fixed ellipsoids Ei for i = 1, 2, . . . , n1. As we mentioned, this determination
need to be determined before the realization of the random ellipsoids become available.
When the realizations of the random ellipsoids become available, if necessary, we need to
determine γ so that the new ball
B := x ∈ Rn : xTx− 2xTx+ γ ≤ 0
contains all the realizations of the random ellipsoids Ei(ω) for i = 1, 2, . . . , n2.
Notice that the center of the ball B is x, its radius is r :=√xTx− γ, and the distance
from the origin to its center is√xTx. Notice also that the new ball B has the same center
x as B but a larger radius r :=√xTx− γ.
Formulation of the model
We introduce the constraints d12 ≥ xTx and d2 ≥ r2 = xTx− γ. That is, d1 is an upper
bound on the distance between the center of the ball B and the origin,√xTx, and d2 is
an upper bound on square of the radius of the ball B.
In order to proceed, we feel that it is necessary for the reader to recall the following
fact:
Fact 1 (Sun and Freund [37]). Suppose that we are given two ellipsoids Ei ⊂ Rn, i = 1, 2
defined by Ei := x ∈ Rn : xTHix+ 2gTi x+ vi ≤ 0, where Hi ∈ Sn+, gi ∈ Rn and vi ∈ R
120
for i = 1, 2, then E1 contains E2 if and only if there exists τ ≥ 0 such that the linear
matrix inequality H1 g1
gT1 v1
τ
H2 g2
gT2 v2
holds.
In view of Fact 1 and the requirement that the ball B contains the fixed ellipsoids Ei
for i = 1, 2, . . . , n1, and the realizations of the random ellipsoids Ei(ω) for i = 1, 2, . . . , n2,
we accordingly add the following constraints:
I −x
−xT γ
τi
Hi gi
giT vi
, i = 1, 2, . . . , n1,
and I −x
−xT γ
δi
Hi(ω) gi(ω)
gTi (ω) vi(ω)
, i = 1, 2, . . . , n2,
or equivalently
Mi 0, ∀i = 1, . . . , n1 and Mi(ω) 0, i = 1, 2, . . . , n2
where for each i = 1, . . . , n1,
Mi :=
τiHi − I τigi + x
τigiT + xT τivi − γ
,and for each i = 1, . . . , n2,
Mi(ω) :=
δiHi(ω)− I δigi(ω) + x
δigiT(ω) + xT δivi(ω)− γ
.
121
Since we are looking to minimizing d2, where d2 is an upper bound on square of the
radius of the ball B, we can write the constraint d2 ≥ xTx− γ as d2 = xTx− γ. So, the
matrix Mi can be then written as
Mi =
τiHi − I τigi + x
τigiT + xT τivi + d2 − xTx
.Now, letHi := ΞiΛiΞ
Ti be the spectral decomposition ofHi, where Λi := diag(λi1; . . . ;λin),
and let ui := ΞTi (τigi + x). Then, for each i = 1, . . . , n1, we have
Mi :=
ΞTi 0
0T 1
Mi
Ξi 0
0T 1
=
τiΛi − I ui
uTi τivi + d2 − xTx
.Consequently, Mi 0 if and only if Mi 0 for each i = 1, 2, . . . , n1. Now our
formulation of the problem in SSOCP depends on the following lemma (see also [23]):
Lemma 5.2.1. For each i = 1, 2, . . . , n1, Mi 0 if and only if τiλmin(Hi) ≥ 1 and
xTx ≤ d2 + τivi + 1Tsi, where si = (si1; . . . ; sin), sij = u2ij/(τiλij − 1) for all j such that
τiλij > 1, and sij = 0 otherwise.
Proof. For each i = 1, 2, . . . , n1, it is known that the matrix Mi 0 if and only if every
principle minor of Mi is nonnegative. Since
det(τiΛi − I) =
∣∣∣∣∣∣∣∣∣∣∣∣∣
τiλi1 − 1 0 . . . 0
0 τiλi2 − 1 . . . 0
......
. . ....
0 0 . . . τn1λin1 − 1
∣∣∣∣∣∣∣∣∣∣∣∣∣= Πn1
j=1(τjλij − 1).
It follows that Mi 0 if and only if Πsj=1(τjλij − 1) ≥ 0, for all s = 1, . . . , n1 and
det(Mi) ≥ 0. Thus, Mi 0 if and only if τjλij ≥ 1, for all j = 1, . . . , n1 and det(Mi) ≥ 0.
122
Notice that
det(Mi) =
(k∏j=1
(τiλij − 1)
)((τivi + d2 − xTx)−
k∑j=1
u2ij
).
This means the inequality det(Mi) ≥ 0 strictly holds for each i ≤ j ≤ k such that
τjλij = 1. Hence, det(Mi) ≥ 0 if and only if
(τivi + d2 − xTx)− 1Tsi = (τivi + d2 − xTx)−∑
τiλij>1
(u2ij/(τjλij − 1)) ≥ 0.
Therefore, Mi 0 if and only if τiλmin(Hi) ≥ 1 and d2 ≥ xTx− τivi + 1Tsi. 2
So far, we have shown that each constraint Mi 0 can be replaced by the constraints
τiλmin(Hi) ≥ 1, xTx ≤ σ and σ ≤ d2 + τivi − 1Tsi.
Similarly, for each i = 1, 2, . . . , n2, by letting Hi := ΞTi ΛiΞ
Ti (the spectral decomposi-
tion of Hi), ui := ΞTi (δigi + x), and si := (si1; si2; . . . ; sin), where sij := u2
ij/(δiλij − 1)
for all j when δiλij > 1, and sij := 0 otherwise, then we can show that each constraint
Mi(ω) 0 can be replaced by the constraints δiλmin(Hi(ω)) ≥ 1, xTx ≤ σ(ω), and
σ(ω) ≤ d2 + δivi(ω)− 1Tsi(ω).
Since we are minimizing d2 and d2, then for all j = 1, 2, . . . , n, we can relax the
definitions of sij and sij by replacing them by u2ij ≤ sij(τiλij − 1) for all i = 1, 2, . . . , n1
and u2ij ≤ sij(δiλij − 1) for all i = 1, 2, . . . , n2, respectively.
When the realizations of the random ellipsoids become available, if necessary, we
determine λ so that the new ball
B := x ∈ Rn : xTx− 2xTx+ λ ≤ 0
contains all the realizations of the random ellipsoid. This new ball B has the same center x
as B but a larger radius, r :=√xTx. We note that r2−r2 = (xTx−γ)−(xTx−γ) = γ−γ,
123
and thus we introduce the constraint 0 ≤ γ− γ ≤ z where z is an upper bound of r2− r2.
Let c > 0 denote the cost per unit of the Euclidean distance between the center of the
ball B and the origin; let α > 0 be the cost per unit of the square of the radius of B; and
let β > 0 be the cost per unit increase of the square of the radius if it becomes necessary
after the realizations of the random ellipsoids are available.
We now define the following decision variables
x := (d1; d2; x; γ; τ ) and y := (z; γ; δ).
Then, by introducing the following unit cost vectors
c := (c;α; 0; 0; 0) and q := (β; 0; 0),
and combining all of the above, we get the following SQRCP model:
min cTx + E[Q(x, ω)]
s.t. ui = ΞTi (τigi + x), i = 1, 2, . . . , n1
u2ij ≤ sij(τiλij − 1), i = 1, 2, . . . , n1, j = 1, 2, . . . , n
xTx ≤ σ1
σ1 ≤ d2 + τivi − 1Tsi, i = 1, 2, . . . , n1
τi ≥ 1/λmin(Hi), i = 1, 2, . . . , n1
xTx ≤ d21,
where Q(x, ω) is the minimum value of the problem
124
min qTy
s.t. ui(ω) = ΞTi (ω)T(δigi(ω) + x), i = 1, 2, . . . , n2
uij(ω)2 ≤ sij(ω)(δiλij(ω)− 1), i = 1, 2 . . . , n2, j = 1, 2, . . . , n
xTx ≤ σ2(ω)
σ2(ω) ≤ d2 + δivi(ω)− 1Tsi(ω), i = 1, 2, . . . , n2
δi ≥ 1/λmin(Hi)(ω), i = 1, 2, . . . , n2
0 ≤ δ − δ ≤ z,
and
E[Q(x, ω)] :=
∫Ω
Q(x, ω)P (dω).
5.2.2 Structural optimization
Ben-Tal and Bendsøe [12] and Ben-Tal and Nemirovski [13] consider the following problem from
structural optimization. A structure of k linear elastic bars connects a set of p nodes.
They assume the geometry (topology and lengths of bars) and the material are fixed. The
goal is to size the bars, i.e., to determine appropriate cross-sectional areas of the bars.
For i = 1, 2, . . . , k and j = 1, 2, . . . , p, we define the following decision variables and
parameters:

• fj := the external force applied on the jth node,
• dj := the (small) displacement of the jth node resulting from the load force fj,
• xi := the cross-sectional area of the ith bar,
• x̲i := the lower bound on the cross-sectional area of the ith bar,
• x̄i := the upper bound on the cross-sectional area of the ith bar,
• li := the length of the ith bar,
• v := the maximum allowed volume of the bars of the structure,
• G(x) := ∑_{i=1}^{k} xiGi is the stiffness matrix, where the matrices Gi ∈ Sp, i = 1, 2, . . . , k, depend only on fixed parameters (such as lengths of bars and material).
In the simplest version of the problem, they consider one fixed set of externally applied
nodal forces fj, j = 1, 2, . . . , p. Given this, the elastic stored energy within the structure
is given by

ε = fTd,

which is a measure of the inverse of the stiffness of the structure. In view of the definition
of the stiffness matrix G(x), we also have the following linear relationship between f and d:

f = G(x) d.
The objective is to find the stiffest truss by minimizing ε subject to the inequality
lTx ≤ v as a constraint on the total volume (or, equivalently, weight), and the constraint
x̲ ≤ x ≤ x̄ as lower and upper bounds on the cross-sectional areas.

For simplicity, we assume that x̲ > 0 and G(x) ≻ 0 for all x > 0. In this case we
can express the elastic stored energy in terms of the inverse of the stiffness matrix and
the externally applied nodal forces as follows:

ε = fTG(x)−1f.
In summary, they consider the problem

min  fTG(x)−1f
s.t. x̲ ≤ x ≤ x̄
     lTx ≤ v,

which is equivalent to

min  s
s.t. fTG(x)−1f ≤ s
     x̲ ≤ x ≤ x̄
     lTx ≤ v.
(5.2.1)
The first inequality constraint in (5.2.1) is a fractional quadratic inequality constraint,
and it can be formulated via hyperbolic inequalities. In §2 of [1] (see also §2 of
[23]), the authors demonstrate that this inequality is satisfied if and only if there exist
ui ∈ Rri and ti ∈ R with ti > 0, i = 1, 2, . . . , k, such that

∑_{i=1}^{k} DiTui = f,   uiTui ≤ xiti for i = 1, 2, . . . , k,   and 1Tt ≤ s,

where ri = rank(Gi) and the matrix Di ∈ Rri×p is the Cholesky factor of Gi (i.e., DiTDi =
Gi) for each i = 1, 2, . . . , k. Using this result, problem (5.2.1) becomes the problem
min  s
s.t. ∑_{i=1}^{k} DiTui = f
     uiTui ≤ xiti,       i = 1, . . . , k
     1Tt ≤ s
     x̲ ≤ x ≤ x̄
     lTx ≤ v,
(5.2.2)

which includes only linear and hyperbolic constraints.
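For illustration, problem (5.2.2) can be handled directly by off-the-shelf conic solvers. The following is a minimal Python sketch using the cvxpy modeling package; all data (the factors Di, the load f, the bar lengths, the bounds, and the volume budget) are made-up placeholders rather than values from this section, and the hyperbolic constraints uiTui ≤ xiti are written in disciplined convex form via quad_over_lin:

# A hedged sketch of the truss sizing problem (5.2.2) with placeholder data.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
k, p = 4, 3                                        # k bars, p nodal degrees of freedom
D = [rng.normal(size=(2, p)) for _ in range(k)]    # factors with D_i^T D_i = G_i
f = rng.normal(size=p)                             # fixed nodal forces
l = rng.uniform(1.0, 2.0, size=k)                  # bar lengths
x_lo, x_hi, vol = 0.1, 10.0, 20.0                  # area bounds and volume budget

x = cp.Variable(k)                                 # cross-sectional areas
t = cp.Variable(k)                                 # auxiliary variables
u = [cp.Variable(2) for _ in range(k)]
s = cp.Variable()

cons = [sum(D[i].T @ u[i] for i in range(k)) == f]
# u_i^T u_i <= x_i t_i, expressed as quad_over_lin(u_i, t_i) <= x_i
cons += [cp.quad_over_lin(u[i], t[i]) <= x[i] for i in range(k)]
cons += [cp.sum(t) <= s, x_lo <= x, x <= x_hi, l @ x <= vol]

prob = cp.Problem(cp.Minimize(s), cons)
prob.solve()
print("compliance bound s* =", s.value)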
The model (5.2.2) rests on the simplifying assumption that the external forces
applied to the nodes are fixed; as more complicated versions, they also consider multiple
loading scenarios. In [49], Zhu and Ariyawansa consider the case in which the external
forces applied to the nodes are random variables with known distribution functions, and
they formulate an SSDP model for this problem. In this subsection, we formulate an
SRQCP model for this problem under the assumption that some of the external forces
applied to the nodes are fixed and the rest are random variables with known distribution
functions. Let us denote the external force applied on the jth node by fj if it is
fixed and by f̃j(ω) if it is random. Due to changes in environmental conditions
(such as wind speed and temperature), allowing some of the external forces to be random
is much closer to reality. Without loss of generality, let us assume that
f(ω) = (f; f̃(ω)), where f̃(ω) depends on an underlying outcome ω in an event space Ω
with a known probability function P.
Accordingly, the displacements of the nodes resulting from the random forces, and the
elastic stored energy within the structure, are also random. Then we have the following
relations:

ε = fTd,   f = G(x) d,   ε̃(ω) = f̃(ω)Td̃(ω),   and   f̃(ω) = G(x) d̃(ω).
Suppose we design a structure for a customer. The structure will be installed in an
open environment. From past experience, the customer can provide us with sufficient
information to model the external forces that will be applied to the nodes of the
structure as random variables with known distribution functions. Given this information,
we can formulate a model of this problem that guarantees that our structure will
continue to function and stand up against the worst environmental conditions.
In summary, we solve the following SRQCP problem:

min  s + E[Q(x, ω)]
s.t. ∑_{i=1}^{k} DiTui = f
     uiTui ≤ xiti,       i = 1, . . . , k
     1Tt ≤ s
     x̲ ≤ x ≤ x̄
     lTx ≤ v,

where Q(x, ω) is the minimum value of the problem

min  s̃
s.t. ∑_{i=1}^{k} DiTũi(ω) = f̃(ω)
     ũi(ω)Tũi(ω) ≤ xit̃i,       i = 1, . . . , k
     1Tt̃ ≤ s̃,

and

E[Q(x, ω)] := ∫Ω Q(x, ω)P(dω).
Chapter 6

Related Open Problems: Multi-Order Cone Programming Problems
In this chapter we introduce a new class of convex optimization problems that can be
viewed as an extension of second-order cone programs. We present primal and dual forms
of multi-order cone programs (MOCPs), in which we minimize a linear function over a
Cartesian product of pth-order cones (we allow different values of p for different cones in the
product). We then indicate weak and strong duality relations for the problem. We also
introduce mixed integer multi-order cone programs (MIMOCPs) to handle MOCPs with
integer-valued variables, and two-stage stochastic multi-order cone programs (SMOCPs)
with recourse to handle uncertainty in the data defining (deterministic) MOCPs. We demonstrate
how decision making problems associated with facility location problems lead to
MOCP, MIMOCP, and SMOCP models. It is interesting to investigate other application
settings leading to (deterministic, stochastic, and mixed integer) MOCPs. The development
of algorithms for such multi-order cone programs, which in turn will benefit from a duality
theory, is equally interesting and remains one of the important open problems for future
research.
We begin by introducing some notation that we use in the sequel.

Given p ≥ 1, the pth-order cone of dimension n is defined as

Qnp := {x = (x0; x̄) ∈ R × Rn−1 : x0 ≥ ‖x̄‖p},

where ‖ · ‖p denotes the p-norm. The cone Qnp is regular (see, for example, [45]). As
special cases, when p = 2 we obtain Qn2 := En+, the second-order cone of dimension n, and
when p = 1 or p = ∞, Qnp is a polyhedral cone.
We write x ⪰〈n〉〈p〉 0 to mean that x ∈ Qnp. Given 1 ≤ pi ≤ ∞ for i = 1, 2, . . . , r, we
write x ⪰〈n1,n2,...,nr〉〈p1,p2,...,pr〉 0 to mean that x ∈ Qn1p1 × Qn2p2 × · · · × Qnrpr. It is immediately seen
that, for every vector x ∈ Rn where n = ∑_{i=1}^{r} ni, x ⪰〈n1,n2,...,nr〉〈p1,p2,...,pr〉 0 if and only if x is
partitioned conformally as x = (x1; x2; . . . ; xr) and xi ⪰〈ni〉〈pi〉 0 for i = 1, 2, . . . , r. For
simplicity, we write:

• Qnp as Qp and x ⪰〈n〉〈p〉 0 as x ⪰〈p〉 0 when n is known from the context;
• Qn1p1 × Qn2p2 × · · · × Qnrpr as Q〈p1,p2,...,pr〉 and x ⪰〈n1,n2,...,nr〉〈p1,p2,...,pr〉 0 as x ⪰〈p1,p2,...,pr〉 0 when n1, n2, . . . , nr are known from the context;
• x ⪰〈p,p,...,p〉 0 (with p repeated r times) as x ⪰r〈p〉 0.
The set of all interior points of Qnp is denoted by int(Qnp) := {x = (x0; x̄) ∈ R ×
Rn−1 : x0 > ‖x̄‖p}. We write x ≻〈p1,p2,...,pr〉 0 to mean that x ∈ int(Q〈p1,p2,...,pr〉) :=
int(Qp1) × int(Qp2) × · · · × int(Qpr).
6.1 Multi-order cone programming problems
Let r ≥ 1 be an integer, and let p1, p2, . . . , pr be such that 1 ≤ pi ≤ ∞ for i = 1, 2, . . . , r.
Let m, n, n1, n2, . . . , nr be positive integers such that n = ∑_{i=1}^{r} ni. Then we define an
MOCP in primal standard form as

     min  cTx
(P)  s.t. Ax = b
          x ⪰〈p1,p2,...,pr〉 0,
where A ∈ Rm×n, b ∈ Rm and c ∈ Rn constitute given data, and x ∈ Rn is the primal
decision variable. We define an MOCP in dual standard form as

     max  bTy
(D)  s.t. ATy + z = c
          z ⪰〈q1,q2,...,qr〉 0,

where y ∈ Rm and z ∈ Rn are the dual decision variables, and q1, q2, . . . , qr are such
that 1 ≤ qi ≤ ∞ for i = 1, 2, . . . , r.

If (P) and (D) are defined by the same data, and qi is conjugate to pi in the sense
that 1/pi + 1/qi = 1 for i = 1, 2, . . . , r, then we can prove relations between (P) and (D)
(see §6.2) to justify referring to (D) as the dual of (P) and vice versa.
A pth-order cone programming (POCP) problem in primal standard form is

min  cTx
s.t. Ax = b
     x ⪰r〈p〉 0,
(6.1.1)

where m, n, n1, n2, . . . , nr are positive integers such that n = ∑_{i=1}^{r} ni, p ∈ [1,∞], A ∈
Rm×n, b ∈ Rm and c ∈ Rn constitute given data, and x ∈ Rn is the primal variable. According
to (D), the dual problem associated with the POCP (6.1.1) is

max  bTy
s.t. ATy + z = c
     z ⪰r〈q〉 0,
(6.1.2)

where y ∈ Rm and z ∈ Rn are the dual variables and q is conjugate to p.

Clearly, second-order cone programs are the special case of POCPs obtained when
p = 2 in (6.1.1) (hence q = 2 in (6.1.2)).
Example 21. Norm minimization problems:
In [1], Alizadeh and Goldfarb presented second-order cone programming formulations of
three norm minimization problems in which the norm is the Euclidean norm. Here we
indicate extensions of these three problems in which we use arbitrary p-norms, leading
to MOCPs. Let vi = Aix + bi ∈ Rni−1, i = 1, 2, . . . , r. The following norm minimization
problems can be cast as MOCPs:

1. Minimization of the sum of norms: The problem min ∑_{i=1}^{r} ‖vi‖pi can be formulated
as

min  ∑_{i=1}^{r} ti
s.t. Aix + bi = vi,       i = 1, 2, . . . , r
     (t1; v1; t2; v2; . . . ; tr; vr) ⪰〈p1,p2,...,pr〉 0.

2. Minimization of the maximum of norms: The problem min max1≤i≤r ‖vi‖pi can be
expressed as the MOCP problem

min  t
s.t. Aix + bi = vi,       i = 1, 2, . . . , r
     (t; v1; t; v2; . . . ; t; vr) ⪰〈p1,p2,...,pr〉 0.

3. Minimization of the sum of the k largest norms: More generally, the problem of
minimizing the sum of the k largest norms can also be cast as an MOCP. Let the
norms ‖v[1]‖p[1], ‖v[2]‖p[2], . . . , ‖v[r]‖p[r] be the norms ‖v1‖p1, ‖v2‖p2, . . . , ‖vr‖pr sorted
in nonincreasing order. Then the problem min ∑_{i=1}^{k} ‖v[i]‖p[i] can be formulated (see
also [1] and [23] and the related references contained therein) as the MOCP problem

min  ∑_{i=1}^{r} si + kt
s.t. Aix + bi = vi,       i = 1, 2, . . . , r
     (s1 + t; v1; s2 + t; v2; . . . ; sr + t; vr) ⪰〈p1,p2,...,pr〉 0
     si ≥ 0,              i = 1, 2, . . . , r.
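As a concrete illustration of problem 1 above, the following minimal Python sketch (with assumed random data) minimizes a sum of pi-norms using the cvxpy modeling package, which accepts p-norms for any p ≥ 1; a solver that supports power cones, such as SCS, is assumed for non-Euclidean values of p:

# Minimizing sum_i ||A_i x + b_i||_{p_i} with a different p_i per cone.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
n, dims, ps = 4, [5, 6, 4], [2, 3, 1.5]
A = [rng.normal(size=(d, n)) for d in dims]
b = [rng.normal(size=d) for d in dims]

x = cp.Variable(n)
t = cp.Variable(len(dims))                 # epigraph variables t_i
cons = [cp.norm(A[i] @ x + b[i], ps[i]) <= t[i] for i in range(len(dims))]
prob = cp.Problem(cp.Minimize(cp.sum(t)), cons)
prob.solve(solver=cp.SCS)                  # SCS handles the power cones
print("optimal value:", prob.value)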
6.2 Duality
Since MOCPs are a class of convex optimization problems, we can develop a duality theory
for them. Here we indicate weak and strong duality for the pair (P, D) as justification for
referring to them as a primal–dual pair.

It was shown by Faraut and Koranyi [18, Chapter I.2] that the second-order cone
is self-dual. We now prove the more general result that the dual of the pth-order cone
of dimension n is the qth-order cone of dimension n, where q is the conjugate of p.
Lemma 6.2.1. Qp∗ = Qq, where 1 ≤ p ≤ ∞ and q is the conjugate of p. More generally,
Q〈p1,p2,...,pr〉∗ = Q〈q1,q2,...,qr〉, where 1 ≤ pi ≤ ∞ and qi is the conjugate of pi for
i = 1, 2, . . . , r.

Proof. The proof of the second part follows trivially from the first part and the fact
that (K1 × K2 × · · · × Kr)∗ = K1∗ × K2∗ × · · · × Kr∗. To prove the first part, we first
prove that Qq ⊆ Qp∗. Let x = (x0; x̄) ∈ Qq; we show that x ∈ Qp∗ by verifying that
xTy ≥ 0 for any y ∈ Qp. So let y = (y0; ȳ) ∈ Qp. Then xTy = x0y0 + x̄Tȳ ≥
‖x̄‖q‖ȳ‖p + x̄Tȳ ≥ |x̄Tȳ| + x̄Tȳ ≥ 0, where the first inequality follows from the fact that
x ∈ Qq and y ∈ Qp, and the second one from Hölder's inequality. Now we show Qp∗ ⊆ Qq.
Let y = (y0; ȳ) ∈ Qp∗; we show that y ∈ Qq by verifying that y0 ≥ ‖ȳ‖q. This is trivial if
ȳ = 0 or p = ∞, and for p = 1 it follows by taking x := (1; −sgn(ȳj)ej) ∈ Q1 for each j,
which gives y0 ≥ |ȳj|. If ȳ ≠ 0 and 1 < p < ∞, let u := (sgn(ȳ1)|ȳ1|q/p; sgn(ȳ2)|ȳ2|q/p; . . . ; sgn(ȳn−1)|ȳn−1|q/p)
and consider x := (‖u‖p; −u) ∈ Qp. Then, using Hölder's inequality with equality attained,
we obtain 0 ≤ xTy = ‖u‖p y0 − uTȳ = ‖u‖p y0 − ‖u‖p‖ȳ‖q = ‖u‖p (y0 − ‖ȳ‖q). This
gives y0 ≥ ‖ȳ‖q. □
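Lemma 6.2.1 admits a simple numerical sanity check (illustrative only, not a proof): sample points of Qp and Qq with conjugate exponents and verify that their pairing is nonnegative, as in the following Python sketch.

# Sample x in Q_p and y in Q_q (1/p + 1/q = 1) and check x^T y >= 0.
import numpy as np

rng = np.random.default_rng(3)
p = 3.0
q = p / (p - 1.0)                          # conjugate exponent
n = 6

def sample_cone(p_exp):
    xbar = rng.normal(size=n - 1)
    x0 = np.linalg.norm(xbar, ord=p_exp) + rng.uniform()
    return np.concatenate(([x0], xbar))

for _ in range(10000):
    x, y = sample_cone(p), sample_cone(q)
    assert x @ y >= -1e-12
print("x^T y >= 0 held for all sampled x in Q_p and y in Q_q.")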
From this lemma we deduce that the pth-order cone is reflexive, i.e., Qp∗∗ = Qp, and
more generally, Q〈p1,p2,...,pr〉 is also reflexive. On the basis of this fact, it is natural to infer
that the dual of the dual is the primal.
In view of the above lemma, problem (D) can be derived from (P) through the usual
Lagrangian approach. The Lagrangian function for (P) is

L(x, λ, ν) = cTx − λT(Ax − b) − νTx.

The dual objective is

q(λ, ν) := inf_x L(x, λ, ν) = inf_x (c − ATλ − ν)Tx + λTb.

In fact, we may refer to the constraint x ⪰〈p1,p2,...,pr〉 0 as the "nonnegativity" of x, but
with respect to the multi-order cone Q〈p1,p2,...,pr〉. Note that the Lagrange multiplier ν
corresponding to the inequality constraint x ⪰〈p1,p2,...,pr〉 0 is restricted to be nonnegative
with respect to the dual of Q〈p1,p2,...,pr〉 (i.e., ν ⪰〈q1,q2,...,qr〉 0), whereas the Lagrange
multiplier λ corresponding to the equality constraint Ax − b = 0 is unrestricted. Hence the
dual problem is obtained by maximizing q(λ, ν) subject to ν ⪰〈q1,q2,...,qr〉 0.

If c − ATλ − ν ≠ 0, the infimum is clearly −∞, so we can exclude λ for which
c − ATλ − ν ≠ 0. When c − ATλ − ν = 0, the dual objective function is simply λTb.
              Primal (minimize)              Dual (maximize)
Constraints                                  Variables
  vector: ⪰〈p1,p2,...,pr〉 0                    vector: ⪰〈q1,q2,...,qr〉 0
  vector: ⪯〈p1,p2,...,pr〉 0                    vector: ⪯〈q1,q2,...,qr〉 0
  vector or scalar: ≥                          vector or scalar: ≥
  vector or scalar: ≤                          vector or scalar: ≤
  vector or scalar: =                          vector or scalar: free
Variables                                    Constraints
  vector: ⪰〈p1,p2,...,pr〉 0                    vector: ⪯〈q1,q2,...,qr〉 0
  vector: ⪯〈p1,p2,...,pr〉 0                    vector: ⪰〈q1,q2,...,qr〉 0
  vector or scalar: ≥                          vector or scalar: ≤
  vector or scalar: ≤                          vector or scalar: ≥
  vector or scalar: free                       vector or scalar: =

Table 6.1: Correspondence rules between primal and dual MOCPs.
Hence, we can write the dual problem as follows:

max  bTλ
s.t. ATλ + ν = c
     ν ⪰〈q1,q2,...,qr〉 0.
(6.2.1)

Replacing λ by y and ν by z in (6.2.1), we get (D).

In general, MOCPs can be written in a variety of forms different from the standard
forms (P, D). The situation for MOCPs is similar to that for linear programs: any MOCP
problem can be written in standard form. However, if we consider MOCPs in other
forms, then it is more convenient to apply duality rules directly. Table 6.1 is a
summary of these rules; this table is a generalization of a similar table in [14, Section
4.2].

For instance, using this table, the dual of (P) is the problem

max  bTλ
s.t. ATλ ⪯〈q1,q2,...,qr〉 c,

which is equivalent to problem (6.2.1), where ATλ ⪯〈q1,q2,...,qr〉 c means that c − ATλ ⪰〈q1,q2,...,qr〉 0.
Using Lemma 6.2.1, we can prove the following weak duality property.

Theorem 6.2.1. (Weak duality) If x is any primal feasible solution of (P) and (y, z)
is any feasible solution of (D), then the duality gap cTx − bTy = xTz ≥ 0.

Proof. Note that cTx − bTy = (ATy + z)Tx − bTy = yTAx + zTx − yTb = yT(Ax −
b) + zTx = xTz. Since x ∈ Q〈p1,p2,...,pr〉 and z ∈ Q〈q1,q2,...,qr〉 = Q〈p1,p2,...,pr〉∗, we conclude
that xTz ≥ 0. □
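Theorem 6.2.1 suggests a simple numerical experiment: build a feasible primal–dual pair by construction (choosing b and c so that randomly sampled cone points are feasible) and verify that the duality gap equals xTz and is nonnegative. The data in the following Python sketch are assumed, for illustration only:

# Constructive check of weak duality for a random MOCP instance.
import numpy as np

rng = np.random.default_rng(4)
dims, ps = [4, 5], [2.0, 3.0]              # two cones Q_{p_i}^{n_i}
qs = [p / (p - 1.0) for p in ps]           # conjugate exponents
m = 3
A = rng.normal(size=(m, sum(dims)))

def sample_block(n, p):                    # a point of Q_p^n
    xbar = rng.normal(size=n - 1)
    return np.concatenate(([np.linalg.norm(xbar, ord=p) + rng.uniform()], xbar))

x = np.concatenate([sample_block(n, p) for n, p in zip(dims, ps)])
z = np.concatenate([sample_block(n, q) for n, q in zip(dims, qs)])
y = rng.normal(size=m)
b, c = A @ x, A.T @ y + z                  # makes x and (y, z) feasible by construction

gap = c @ x - b @ y
print("duality gap:", gap, "  x^T z:", x @ z)   # equal, and nonnegative
assert abs(gap - x @ z) < 1e-9 and gap >= -1e-12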
We now give conditions for strong duality to hold. We say that problem (P) is strictly
feasible if there exists a primal feasible point x such that x ≻〈p1,p2,...,pr〉 0. In the remaining
part of this section, we assume that the m rows of the matrix A are linearly independent.
Using the Karush-Kuhn-Tucker (KKT) conditions, we state and prove the following strong
duality result.

Theorem 6.2.2. (Strong duality I) Consider the primal–dual pair (P, D). If (P) is strictly
feasible and solvable with a solution x̄, then (D) is solvable and the optimal values of (P)
and (D) are equal.

Proof. By the assumptions of the theorem, x̄ is an optimal solution of (P) at which we can
apply the KKT conditions. This implies that there are Lagrange multiplier vectors λ̄ and
ν̄ such that (x̄, λ̄, ν̄) satisfies

Ax̄ = b,          x̄ ⪰〈p1,p2,...,pr〉 0,
ATλ̄ + ν̄ = c,     ν̄ ⪰〈q1,q2,...,qr〉 0,
x̄iTν̄i = 0,       for i = 1, 2, . . . , r.

This implies that (λ̄, ν̄) is feasible for the dual problem (D). Let (y, z) be any feasible
solution of (D); then we have that bTy ≤ cTx̄ = x̄Tν̄ + bTλ̄ = bTλ̄, where we used
weak duality to obtain the inequality and complementary slackness to obtain the last
equality. Thus, (λ̄, ν̄) is an optimal solution of (D) and cTx̄ = bTλ̄, as desired. □
Note that the result in Theorem 6.2.2 is not symmetric between (P) and (D). The following
strong duality result, which treats the two problems symmetrically, can be obtained by
applying the duality relations [30, Theorem 4.2.1] to our problem formulation.

Theorem 6.2.3. (Strong duality II) Consider the primal–dual pair (P, D). If both (P)
and (D) have strictly feasible solutions, then they both have optimal solutions x∗ and
(y∗, z∗), respectively, and p∗ := cTx∗ = d∗ := bTy∗ (i.e., x∗Tz∗ = 0 (complementary
slackness)).
From the above results, we get the following corollary.

Corollary 6.2.1. (Optimality conditions) Assume that both (P) and (D) are strictly
feasible. Then (x, y, z) ∈ Rn+m+n is a pair of optimal solutions if and only if

Ax = b,          x ⪰〈p1,p2,...,pr〉 0,
ATy + z = c,     z ⪰〈q1,q2,...,qr〉 0,
xiTzi = 0,       for i = 1, 2, . . . , r.
6.3 Multi-order cone programming problems over integers

In this section we introduce two important related problems that result when decision
variables in an MOCP can only take integer values. Consider the MOCP problem (P).
If we add the constraint that a subset of the variables must attain 0-1 values, then we
are interested in an optimization problem of the form

min  cTx
s.t. Ax = b
     x ⪰〈p1,p2,...,pr〉 0
     xk ∈ {0, 1},      k ∈ Γ,

where Γ ⊆ {1, 2, . . . , n} indexes the components xk of the decision variable x ∈ Rn that
are restricted to binary values. This class of optimization problems may be termed 0-1
multi-order cone programs (0-1MOCPs).

A more general and interesting problem arises when some variables in an MOCP can
take arbitrary integer values within bounds. If we are given the same data A, b, and c as
in (P), then we are interested in a problem of the form

min  cTx
s.t. Ax = b
     x ⪰〈p1,p2,...,pr〉 0
     xk ∈ [αk, βk] ∩ Z,      k ∈ Γ,
(6.3.1)

where Γ ⊆ {1, 2, . . . , n}, i.e., the decision variable x ∈ Rn has some of its components xk
(k ∈ Γ) restricted to integer values bounded by αk, βk ∈ R. This class of optimization
problems may be termed mixed integer multi-order cone programs (MIMOCPs).
6.4 Multi-order cone programming problems under uncertainty

In this section we define two-stage stochastic multi-order cone programs (SMOCPs) with
recourse to handle uncertainty in the data defining (deterministic) MOCPs. Let r1, r2 ≥ 1 be
integers. For i = 1, 2, . . . , r1 and j = 1, 2, . . . , r2, let p1i, p2j ∈ [1,∞] and let m1, m2, n1, n2, n1i,
n2j be positive integers such that n1 = ∑_{i=1}^{r1} n1i and n2 = ∑_{j=1}^{r2} n2j. An SMOCP with
recourse in primal standard form is defined based on deterministic data A ∈ Rm1×n1, b ∈ Rm1
and c ∈ Rn1, and random data T ∈ Rm2×n1, W ∈ Rm2×n2, h ∈ Rm2 and d ∈ Rn2 whose
realizations depend on an underlying outcome ω in an event space Ω with a known probability
function P. Given this data, an SMOCP with recourse in primal standard form is

min  cTx + E[Q(x, ω)]
s.t. Ax = b
     x ⪰〈p11,p12,...,p1r1〉 0,

where x ∈ Rn1 is the first-stage decision variable and Q(x, ω) is the minimum value of
the problem

min  d(ω)Ty
s.t. W(ω)y = h(ω) − T(ω)x
     y ⪰〈p21,p22,...,p2r2〉 0,

where y ∈ Rn2 is the second-stage decision variable, and

E[Q(x, ω)] := ∫Ω Q(x, ω)P(dω).

Two-stage stochastic pth-order (in particular, second-order) cone programs with recourse
are the special case of SMOCPs in which p1i = p2j = p ≥ 1 (in particular, p1i = p2j = 2)
for all i = 1, 2, . . . , r1 and j = 1, 2, . . . , r2.
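When Ω is finite with K equally likely realizations, the SMOCP above collapses to a single large deterministic-equivalent MOCP over the first- and second-stage variables jointly, which can be written down directly in a conic modeling tool. The following Python sketch uses cvxpy with one cone per stage and made-up data constructed so that the instance is feasible and bounded (b and h are built from known interior points, and c and d are drawn from the corresponding dual cones); it illustrates the problem class only, not the decomposition algorithms studied in this dissertation:

# Extensive-form deterministic equivalent of a small SMOCP instance.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(5)
K, m1, m2, n1, n2 = 3, 2, 2, 4, 4
p1, p2 = 2.0, 3.0                          # one cone per stage, for brevity
q1, q2 = 2.0, p2 / (p2 - 1.0)              # conjugate exponents

def interior_point(n, p):                  # a point in int(Q_p^n)
    xbar = rng.normal(size=n - 1)
    return np.concatenate(([np.linalg.norm(xbar, ord=p) + 1.0], xbar))

A = rng.normal(size=(m1, n1))
x_feas = interior_point(n1, p1)
b = A @ x_feas                             # guarantees first-stage feasibility
c = interior_point(n1, q1)                 # c in the dual cone keeps the problem bounded

x = cp.Variable(n1)
cons = [A @ x == b, cp.norm(x[1:], p1) <= x[0]]
obj = c @ x
for _ in range(K):                         # K equally likely scenarios
    T = rng.normal(size=(m2, n1)); W = rng.normal(size=(m2, n2))
    y_feas = interior_point(n2, p2)
    h = W @ y_feas + T @ x_feas            # second stage feasible at x_feas
    d = interior_point(n2, q2)             # d in the dual cone, for boundedness
    y = cp.Variable(n2)
    cons += [W @ y == h - T @ x, cp.norm(y[1:], p2) <= y[0]]
    obj = obj + (d @ y) / K

prob = cp.Problem(cp.Minimize(obj), cons)
prob.solve(solver=cp.SCS)                  # SCS handles the power cones
print(prob.status, prob.value)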
6.5 An application

Our application consists of four versions of the FLP (see Subsection 5.1.1). For these four
versions we present problem descriptions leading to an MOCP model, a 0-1MOCP model,
an MIMOCP model, and an SMOCP model.

As we mentioned in Subsection 5.1.1, FLPs can be classified based on the distance
measure used in the model between the facilities. If we use the Euclidean distance, then
these problems are called Euclidean facility location problems (EFLPs); if we use the
rectilinear distance (also known as the L1 distance, city block distance, or Manhattan
distance), then these problems are called rectilinear facility location problems (RFLPs).
Furthermore, in some applications we use both the Euclidean and the rectilinear distances
(based on the relationships between the pairs of facilities) as the distance measures in the
model, obtaining a mix of EFLPs and RFLPs that we refer to as Euclidean-rectilinear
facility location problems (ERFLPs). Another way of classifying this problem is based
on where the new facilities may be placed in the solution space. When the new facilities
can be placed anywhere in the solution space, the problem is called a continuous facility
location problem (CFLP); but often the decision maker needs the new facilities to be
placed at specific locations (called nodes) rather than anywhere in the solution space. In
this case the problem is called a discrete facility location problem (DFLP).

Each of the next subsections is devoted to a version of ERFLPs. More specifically,
we consider (deterministic) continuous Euclidean-rectilinear facility location problems
(CERFLPs), which lead to an MOCP model; discrete Euclidean-rectilinear facility
location problems (DERFLPs), which lead to a 0-1MOCP model; ERFLPs with integrality
constraints, which lead to an MIMOCP model; and stochastic continuous Euclidean-
rectilinear facility location problems (stochastic CERFLPs), which lead to an SMOCP
model.
6.5.1 CERFLPs—An MOCP model

In single-facility ERFLPs, we are interested in choosing a location to build a new facility
among existing facilities so that this location minimizes the sum of weighted (either
Euclidean or rectilinear) distances to all existing facilities.

Assume that we are given r + s existing facilities represented by the fixed points
a1, a2, . . . , ar, ar+1, ar+2, . . . , ar+s in Rn, and we plan to place a new facility represented
by x so that we minimize the weighted sum of the Euclidean distances between x and
each of the points a1, a2, . . . , ar plus the weighted sum of the rectilinear distances between
x and each of the points ar+1, ar+2, . . . , ar+s. This leads us to the problem

min  ∑_{i=1}^{r} wi ‖x − ai‖2 + ∑_{i=r+1}^{r+s} wi ‖x − ai‖1

or, alternatively, to the problem

min  ∑_{i=1}^{r+s} wi ti
s.t. (t1; x − a1; . . . ; tr; x − ar) ⪰r〈2〉 0
     (tr+1; x − ar+1; . . . ; tr+s; x − ar+s) ⪰s〈1〉 0,

where wi is the weight associated with the ith existing facility and the new facility for
i = 1, 2, . . . , r + s.
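For illustration, the single-facility problem above can be solved directly with a conic modeling tool. In the following Python sketch (with made-up data), the auxiliary variables ti are introduced internally by cvxpy's epigraph reformulation of the norms:

# Single-facility ERFLP: weighted Euclidean plus rectilinear distances.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(6)
n, r, s = 2, 3, 2
a = rng.uniform(0, 10, size=(r + s, n))    # existing facility locations
w = rng.uniform(1, 2, size=r + s)          # weights

x = cp.Variable(n)
obj = sum(w[i] * cp.norm(x - a[i], 2) for i in range(r)) \
    + sum(w[i] * cp.norm(x - a[i], 1) for i in range(r, r + s))
prob = cp.Problem(cp.Minimize(obj))
prob.solve()
print("new facility location:", x.value)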
In multiple-facility ERFLPs we add m new facilities, namely x1, x2, . . . , xm ∈ Rn, instead
of adding only one. We have two cases, depending on whether or not there is interaction
among the new facilities in the underlying model. If there is no interaction between the
new facilities, then we are concerned only with minimizing the weighted sums of the
distances between each of the new facilities and each of the fixed facilities. In other
words, we solve the following MOCP model:
min  ∑_{j=1}^{m} ∑_{i=1}^{r+s} wij tij
s.t. (t1j; xj − a1; . . . ; trj; xj − ar) ⪰r〈2〉 0,                       j = 1, 2, . . . ,m
     (t(r+1)j; xj − ar+1; . . . ; t(r+s)j; xj − ar+s) ⪰s〈1〉 0,          j = 1, 2, . . . ,m,
(6.5.1)

where wij is the weight associated with the ith existing facility and the jth new facility
for j = 1, 2, . . . ,m and i = 1, 2, . . . , r + s.
If interaction exists among the new facilities then, in addition to the above requirements,
we need to minimize the sum of the (either Euclidean or rectilinear) distances
between each pair of the new facilities. Let 1 ≤ l ≤ m and assume that we are required
to minimize the weighted sum of the Euclidean distances between each pair of the new
facilities x1, x2, . . . , xl and the weighted sum of the rectilinear distances between each
pair of the new facilities xl+1, xl+2, . . . , xm. In this case, we are interested in a model of
the form

min  ∑_{j=1}^{m} ∑_{i=1}^{r+s} wij tij + ∑_{j=2}^{m} ∑_{j′=1}^{j−1} wjj′ tjj′
s.t. (t1j; xj − a1; . . . ; trj; xj − ar) ⪰r〈2〉 0,                       j = 1, 2, . . . ,m
     (t(r+1)j; xj − ar+1; . . . ; t(r+s)j; xj − ar+s) ⪰s〈1〉 0,          j = 1, 2, . . . ,m
     (tj(j+1); xj − xj+1; . . . ; tjl; xj − xl) ⪰(l−j)〈2〉 0,             j = 1, 2, . . . , l − 1
     (tj(j+1); xj − xj+1; . . . ; tjm; xj − xm) ⪰(m−j)〈1〉 0,            j = l + 1, l + 2, . . . ,m − 1,
(6.5.2)

where wjj′ is the weight associated with the new facilities j′ and j for j′ = 1, 2, . . . , j − 1
and j = 2, 3, . . . ,m.
6.5.2 DERFLPs—A 0-1MOCP model

We consider the discrete version of the problem by assuming that the new facilities
x1, x2, . . . , xm need to be placed at specific locations, rather than anywhere in 2- or 3-
(or higher) dimensional space. Let the points v1, v2, . . . , vk ∈ Rn represent these specific
locations, where k ≥ m. So, we add the constraint xi ∈ {v1, v2, . . . , vk} for i = 1, 2, . . . ,m.
Clearly, for i = 1, 2, . . . ,m, this constraint can be replaced by the following linear
and binary constraints:

xi = v1yi1 + v2yi2 + · · · + vkyik,
yi1 + yi2 + · · · + yik = 1, and
yi = (yi1; yi2; . . . ; yik) ∈ {0, 1}k.

We also assume that we cannot place more than one facility at each location. Consequently,
we add the following constraints:

(1; y1l; y2l; . . . ; yml) ⪰〈1〉 0,      for l = 1, 2, . . . , k.

If there is no interaction between the new facilities, then the MOCP model (6.5.1) becomes
the following 0-1MOCP model:

min  ∑_{j=1}^{m} ∑_{i=1}^{r+s} wij tij
s.t. (t1j; xj − a1; . . . ; trj; xj − ar) ⪰r〈2〉 0,                       j = 1, 2, . . . ,m
     (t(r+1)j; xj − ar+1; . . . ; t(r+s)j; xj − ar+s) ⪰s〈1〉 0,          j = 1, 2, . . . ,m
     xi = v1yi1 + v2yi2 + · · · + vkyik,                                 i = 1, 2, . . . ,m
     (1; y1l; y2l; . . . ; yml) ⪰〈1〉 0,                                  l = 1, 2, . . . , k
     1Tyi = 1, yi ∈ {0, 1}k,                                             i = 1, 2, . . . ,m.
If interaction exists among the new facilities, then the MOCP model (6.5.2) becomes the
following 0-1MOCP model:

min  ∑_{j=1}^{m} ∑_{i=1}^{r+s} wij tij + ∑_{j=2}^{m} ∑_{j′=1}^{j−1} wjj′ tjj′
s.t. (t1j; xj − a1; . . . ; trj; xj − ar) ⪰r〈2〉 0,                       j = 1, 2, . . . ,m
     (t(r+1)j; xj − ar+1; . . . ; t(r+s)j; xj − ar+s) ⪰s〈1〉 0,          j = 1, 2, . . . ,m
     (tj(j+1); xj − xj+1; . . . ; tjl; xj − xl) ⪰(l−j)〈2〉 0,             j = 1, 2, . . . , l − 1
     (tj(j+1); xj − xj+1; . . . ; tjm; xj − xm) ⪰(m−j)〈1〉 0,            j = l + 1, l + 2, . . . ,m − 1
     xi = v1yi1 + v2yi2 + · · · + vkyik,                                 i = 1, 2, . . . ,m
     (1; y1l; y2l; . . . ; yml) ⪰〈1〉 0,                                  l = 1, 2, . . . , k
     1Tyi = 1, yi ∈ {0, 1}k,                                             i = 1, 2, . . . ,m.
For l = 1, 2, . . . , k, let zl = 1 if the location vl is chosen, and zl = 0 otherwise. Then we
can go further and consider additional requirements. Let k1, k2, k3, k4 ∈ [1, k] be integers
such that k1 ≤ k2 and k3 ≤ k4. If we must choose at most k1 of the locations v1, v2, . . . , vk2,
then we impose the constraints

(k1; z1; z2; . . . ; zk2) ⪰〈1〉 0  and  z ∈ {0, 1}k.

If we must choose at most k1 of the locations v1, v2, . . . , vk2, or at most k3 of the
locations v1, v2, . . . , vk4, then we impose the constraints

(k1f; z1; z2; . . . ; zk2) ⪰〈1〉 0,  (k3(1 − f); z1; z2; . . . ; zk4) ⪰〈1〉 0,  z ∈ {0, 1}k,  and  f ∈ {0, 1}.
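Since the cones involved have p ∈ {1, 2}, the no-interaction DERFLP model above is a mixed integer second-order cone program. The following Python sketch (with placeholder data) illustrates the assignment constraints xj = ∑l yjl vl and the one-facility-per-location rule; it assumes a mixed-integer conic solver (such as SCIP or MOSEK, if installed), since cvxpy's default continuous solvers cannot handle the binary variables:

# A hedged MISOCP sketch of the no-interaction DERFLP model.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(7)
n, m, k, r, s = 2, 2, 5, 3, 2
a = rng.uniform(0, 10, size=(r + s, n))    # existing facilities
V = rng.uniform(0, 10, size=(k, n))        # candidate locations v_1, ..., v_k
w = rng.uniform(1, 2, size=(r + s, m))

Y = cp.Variable((m, k), boolean=True)      # y_jl = 1 if facility j sits at v_l
X = Y @ V                                  # x_j = sum_l y_jl v_l
obj = sum(w[i, j] * cp.norm(X[j] - a[i], 2) for i in range(r) for j in range(m)) \
    + sum(w[i, j] * cp.norm(X[j] - a[i], 1) for i in range(r, r + s) for j in range(m))
cons = [cp.sum(Y, axis=1) == 1,            # each facility gets exactly one location
        cp.sum(Y, axis=0) <= 1]            # at most one facility per location
prob = cp.Problem(cp.Minimize(obj), cons)
prob.solve()                               # requires a MIP-capable conic solver
print(X.value)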
6.5.3 ERFLPs with integrality constraints—An MIMOCP model

In some problems we may need the locations to have integer-valued coordinates. In most
cities, streets are laid out on a grid, so that the city is subdivided into small numbered
blocks that are square or rectangular. In this case, the decision maker usually needs the
new facility to be placed at the corners of the city blocks. Thus, for each i ∈ ∆ ⊂ {1, 2, . . . ,m},
let us assume that the variable xi lies in the hyperrectangle Ξni ≡ {xi : ζi ≤ xi ≤ ηi},
ζi ∈ Rn, ηi ∈ Rn, and has to be integer-valued, i.e., xi must lie in the grid Ξni ∩ Zn. Thus,
if there is no interaction between the new facilities, then instead of solving the MOCP
model (6.5.1), we solve the following MIMOCP model:

min  ∑_{j=1}^{m} ∑_{i=1}^{r+s} wij tij
s.t. (t1j; xj − a1; . . . ; trj; xj − ar) ⪰r〈2〉 0,                       j = 1, 2, . . . ,m
     (t(r+1)j; xj − ar+1; . . . ; t(r+s)j; xj − ar+s) ⪰s〈1〉 0,          j = 1, 2, . . . ,m
     xk ∈ Ξnk ∩ Zn,                                                      k ∈ ∆.
If interaction exists among the new facilities, then instead of solving the MOCP model
(6.5.2), we solve the following MIMOCP model:

min  ∑_{j=1}^{m} ∑_{i=1}^{r+s} wij tij + ∑_{j=2}^{m} ∑_{j′=1}^{j−1} wjj′ tjj′
s.t. (t1j; xj − a1; . . . ; trj; xj − ar) ⪰r〈2〉 0,                       j = 1, 2, . . . ,m
     (t(r+1)j; xj − ar+1; . . . ; t(r+s)j; xj − ar+s) ⪰s〈1〉 0,          j = 1, 2, . . . ,m
     (tj(j+1); xj − xj+1; . . . ; tjl; xj − xl) ⪰(l−j)〈2〉 0,             j = 1, 2, . . . , l − 1
     (tj(j+1); xj − xj+1; . . . ; tjm; xj − xm) ⪰(m−j)〈1〉 0,            j = l + 1, l + 2, . . . ,m − 1
     xk ∈ Ξnk ∩ Zn,                                                      k ∈ ∆.
6.5.4 Stochastic CERFLPs—An SMOCP model

Before we describe the stochastic version of this generic application, we indicate a more
concrete version of it. Assume that we have a new, growing city with many suburbs and
we want to build a hospital for treating the residents of this city. Some people live in
the city at the present time. As the city expands, many houses in new suburbs will be
built, and the locations of these suburbs, in different parts of the city, will become known
only in the future. Our goal is to find the best location for this hospital so that it can
serve both the current suburbs and the new ones. This location must be determined at
the current time, before information about the locations of the new suburbs becomes
available. For those houses that are close enough to the location of the hospital, we use
the rectilinear distance as the distance measure between them and the hospital, while
for the new suburbs that will be located far away from the hospital, we use the Euclidean
distance as the distance measure. See Figure 6.1.
Figure 6.1: A more concrete version of the stochastic CERFLP: a new, growing city with many houses expected to be built in different parts of the city.
Generally speaking, let a1, a2, . . . , ar1, ar1+1, ar1+2, . . . , ar1+s1 be fixed points in Rn
representing the coordinates of r1 + s1 existing fixed facilities, and let a1(ω), a2(ω), . . . , ar2(ω),
ar2+1(ω), ar2+2(ω), . . . , ar2+s2(ω) be random points in Rn representing the coordinates of
r2 + s2 random facilities whose realizations depend on an underlying outcome ω in an
event space Ω with a known probability function P.
Suppose that at present we do not know the realizations of the r2 + s2 random facilities,
and that at some point in the future these realizations become known.
Our goal is to locate m new facilities x1, x2, . . . , xm ∈ Rn that minimize the following
sums:

• the weighted sums of the Euclidean distances between each of the new facilities and each of the fixed facilities a1, a2, . . . , ar1;
• the weighted sums of the rectilinear distances between each of the new facilities and each of the fixed facilities ar1+1, ar1+2, . . . , ar1+s1;
• the expected weighted sums of the Euclidean distances between each of the new facilities and each of the random facilities a1(ω), a2(ω), . . . , ar2(ω);
• the expected weighted sums of the rectilinear distances between each of the new facilities and each of the random facilities ar2+1(ω), ar2+2(ω), . . . , ar2+s2(ω).

Note that this decision needs to be made before the realizations of the r2 + s2 random
facilities become available. If there is no interaction between the new facilities, then the
(deterministic) MOCP model (6.5.1) becomes the following SMOCP model:
min  ∑_{j=1}^{m} ∑_{i=1}^{r1+s1} wij tij + E[Q(x1; . . . ; xm, ω)]
s.t. (t1j; xj − a1; . . . ; tr1j; xj − ar1) ⪰r1〈2〉 0,                            j = 1, 2, . . . ,m
     (t(r1+1)j; xj − ar1+1; . . . ; t(r1+s1)j; xj − ar1+s1) ⪰s1〈1〉 0,           j = 1, 2, . . . ,m,

where Q(x1; . . . ; xm, ω) is the minimum value of the problem

min  ∑_{j=1}^{m} ∑_{i=1}^{r2+s2} wij(ω) tij
s.t. (t1j; xj − a1(ω); . . . ; tr2j; xj − ar2(ω)) ⪰r2〈2〉 0,                      j = 1, 2, . . . ,m
     (t(r2+1)j; xj − ar2+1(ω); . . . ; t(r2+s2)j; xj − ar2+s2(ω)) ⪰s2〈1〉 0,     j = 1, 2, . . . ,m,

and

E[Q(x1; . . . ; xm, ω)] := ∫Ω Q(x1; . . . ; xm, ω)P(dω),

where wij is the weight associated with the ith existing facility and the jth new facility for
j = 1, 2, . . . ,m and i = 1, 2, . . . , r1 + s1, and wij(ω) is the weight associated with the ith
random facility and the jth new facility for j = 1, 2, . . . ,m and i = 1, 2, . . . , r2 + s2.
If interaction exists among the new facilities, then the (deterministic) MOCP model
(6.5.2) becomes the following SMOCP model:

min  ∑_{j=1}^{m} ∑_{i=1}^{r1+s1} wij tij + ∑_{j=2}^{m} ∑_{j′=1}^{j−1} wjj′ tjj′ + E[Q(x1; . . . ; xm, ω)]
s.t. (t1j; xj − a1; . . . ; tr1j; xj − ar1) ⪰r1〈2〉 0,                            j = 1, 2, . . . ,m
     (t(r1+1)j; xj − ar1+1; . . . ; t(r1+s1)j; xj − ar1+s1) ⪰s1〈1〉 0,           j = 1, 2, . . . ,m
     (tj(j+1); xj − xj+1; . . . ; tjl; xj − xl) ⪰(l−j)〈2〉 0,                     j = 1, 2, . . . , l − 1
     (tj(j+1); xj − xj+1; . . . ; tjm; xj − xm) ⪰(m−j)〈1〉 0,                    j = l + 1, l + 2, . . . ,m − 1,

where Q(x1; . . . ; xm, ω) is the minimum value of the problem

min  ∑_{j=1}^{m} ∑_{i=1}^{r2+s2} wij(ω) tij + ∑_{j=2}^{m} ∑_{j′=1}^{j−1} wjj′ tjj′
s.t. (t1j; xj − a1(ω); . . . ; tr2j; xj − ar2(ω)) ⪰r2〈2〉 0,                      j = 1, 2, . . . ,m
     (t(r2+1)j; xj − ar2+1(ω); . . . ; t(r2+s2)j; xj − ar2+s2(ω)) ⪰s2〈1〉 0,     j = 1, 2, . . . ,m
     (tj(j+1); xj − xj+1; . . . ; tjl; xj − xl) ⪰(l−j)〈2〉 0,                     j = 1, 2, . . . , l − 1
     (tj(j+1); xj − xj+1; . . . ; tjm; xj − xm) ⪰(m−j)〈1〉 0,                    j = l + 1, l + 2, . . . ,m − 1,

and

E[Q(x1; . . . ; xm, ω)] := ∫Ω Q(x1; . . . ; xm, ω)P(dω),

where wjj′ is the weight associated with the new facilities j′ and j for j′ = 1, 2, . . . , j − 1
and j = 2, 3, . . . ,m.
Chapter 7
Conclusion
In this dissertation we have introduced two-stage stochastic symmetric programs (SSPs)
with recourse. While one can take the viewpoint that SSPs are a way of dealing
with uncertainty in the data defining deterministic symmetric programs, it is also possible to
take the dual viewpoint that SSPs are a way of allowing any symmetric cone in place of
the nonnegative orthant in stochastic linear programs, or of the positive semidefinite
cone in stochastic semidefinite programs. It is also seen that the SSP problem includes
some important general classes of optimization problems (Problems 1-6) as special cases.

We have presented a logarithmic barrier interior point algorithm for solving SSPs
using Benders decomposition. We have proved the convergence results by showing that
the logarithmic barrier associated with the recourse function is a strongly self-concordant
barrier on the first-stage solutions. We have described and analyzed short- and long-step
variants of the algorithm that follow the primal central trajectory of the first-stage
problem. We concluded that, given two symmetric cones with ranks r1 and r2 in the
first- and second-stage problems, respectively, and K realizations, we need
at most O(√(r1 + Kr2) ln(µ0/ε)) Newton iterations in the short-step variant of the algorithm
to follow the first-stage central path from a starting value µ0 of the barrier parameter to a
terminating value ε. We have also shown that at most O((r1 + Kr2) ln(µ0/ε)) Newton
iterations are needed for this recentering in the long-step variant of the algorithm.
An alternative to the logarithmic barrier is the volumetric barrier of Vaidya [40]. In this
dissertation, we have also presented a class of volumetric barrier decomposition algorithms
for SSPs. We have proved the convergence results by showing that the volumetric barrier
associated with the recourse function is a strongly self-concordant barrier on the first-stage
solutions. We have also described and analyzed short- and long-step variants of
the algorithm that follow the primal central trajectory of the first-stage problem. Given
two symmetric cones with ranks r1 and r2 and dimensions n1 and n2 in the first-
and second-stage problems, respectively, and K realizations, we have seen
that in order to follow the first-stage central path from a starting value µ0 of the barrier
parameter to a terminating value ε, we need at most O(√((1 + K)(m1c1 + m2c2)) ln(µ0/ε))
Newton iterations in the short-step variant of the algorithm, and at most
O((1 + K)(m1c1 + m2c2) ln(µ0/ε)) Newton iterations in the long-step variant of the algorithm.
Chen and Mehrotra [16] have proposed a prototype primal interior point decomposition
algorithm for the two-stage stochastic convex optimization problem. Since stochastic
symmetric programming is a subclass of general stochastic convex optimization, our problem
can be solved using their prototype algorithm; however, solving SSPs as general convex
programs, without exploiting the special structure of the symmetric cone, is inadvisable
both theoretically and practically. Therefore, a separate study of SSP algorithms is
warranted.
Concerning applications, we have presented four application areas that lead to SSPs
in which the underlying symmetric cones are second-order cones and rotated quadratic cones.
This has been accomplished simply by relaxing assumptions involving deterministic data
to allow random data.

Problems 1-6 in Chapter 2 are all important special cases of the SSP problem, so it
is interesting to investigate other application settings leading to these general problems,
such as those obtained when the underlying symmetric cones are cones of real symmetric or
complex Hermitian positive semidefinite matrices.
Concerning implementation of the proposed SSP algorithms, it will be interesting to
implement these methods in the future to determine their practical performance. In this
regard, we refer the reader to a paper by Mehrotra and Ozevin [26], in which they develop
efficient practical implementations of their SSDP algorithm proposed in [25].

We have ended the dissertation by introducing deterministic and stochastic multi-order
cone programs (MOCPs) as new conic optimization problems that are natural
extensions of deterministic and stochastic second-order cone programs. We presented an
application leading to deterministic, 0-1, mixed integer, and stochastic multi-order cone
programs. MOCP problems are posed over nonsymmetric cones; hence we leave such
problems as open problems to contemplate in our future research in conic programming.
Bibliography

[1] F. Alizadeh and D. Goldfarb. Second-order cone programming. Math. Program., Ser. B, 95:3–51, 2003.

[2] B. Alzalg. Stochastic second-order cone programming: Application models. Submitted.

[3] B. Alzalg. Stochastic symmetric optimization. Submitted.

[4] B. Alzalg and K. A. Ariyawansa. A class of polynomial volumetric barrier decomposition algorithms for stochastic symmetric programming. Submitted.

[5] K. M. Anstreicher. On Vaidya's volumetric cutting plane method for convex programming. Mathematics of Operations Research, 22(1):63–89, 1997.

[6] K. M. Anstreicher. Volumetric path following algorithms for linear programming. Mathematical Programming, 76(1B):245–263, 1997.

[7] K. M. Anstreicher. The volumetric barrier for semidefinite programming. Mathematics of Operations Research, 25(3):365–380, 2000.

[8] K. A. Ariyawansa and A. J. Felt. On a new collection of stochastic linear programming test problems. INFORMS Journal on Computing, 16(3):291–299, 2004.

[9] K. A. Ariyawansa and Y. Zhu. Stochastic semidefinite programming: A new paradigm for stochastic optimization. 4OR—The Quarterly Journal of the Belgian, French and Italian OR Societies, 4(3):65–79, 2006.

[10] K. A. Ariyawansa and Y. Zhu. A class of polynomial volumetric barrier decomposition algorithms for stochastic semidefinite programming. Mathematics of Computation, 80:1639–1661, 2011.

[11] V. S. Bawa, S. J. Brown, and R. W. Klein. Estimation risk and optimal portfolio choice: A survey. Proceedings of the American Statistical Association, 53–58, 1979.

[12] A. Ben-Tal and M. P. Bendsøe. A new method for optimal truss topology design. SIAM J. Optimization, 3(2):322–358, 1993.

[13] A. Ben-Tal and A. Nemirovski. Interior point polynomial-time method for truss topology design. Technical report, Faculty of Industrial Engineering and Management, Technion, 1992.

[14] D. Bertsimas and J. Tsitsiklis. Introduction to Linear Optimization. Athena Scientific, 1997.

[15] J. R. Birge. Stochastic programming computation and applications. INFORMS Journal on Computing, 9:111–133, 1997.

[16] M. Chen and S. Mehrotra. Self-concordance tree and decomposition based interior point methods for the two stage stochastic convex optimization problem. Technical report, Northwestern University, 2007.

[17] M. A. H. Dempster. Stochastic Programming. Academic Press, London, UK, 1980.

[18] J. Faraut and A. Koranyi. Analysis on Symmetric Cones. Oxford University Press, Oxford, UK, 1994.

[19] A. V. Fiacco and G. P. McCormick. Nonlinear Programming: Sequential Unconstrained Minimization Techniques. John Wiley, New York, NY, 1968.

[20] C. Helmberg, F. Rendl, R. J. Vanderbei, and H. Wolkowicz. An interior-point method for semidefinite programming. SIAM J. of Optimization, 6:342–361, 1996.

[21] P. L. Jiang. Polynomial Cutting Plane Algorithms for Stochastic Programming and Related Problems. Ph.D. dissertation, Department of Pure and Applied Mathematics, Washington State University, Pullman, WA, 1997.

[22] M. Kojima, S. Shindoh, and S. Hara. Interior-point methods for the monotone semidefinite linear complementarity problem in symmetric matrices. SIAM J. of Optimization, 7(1):86–125, 1997.

[23] M. S. Lobo, L. Vandenberghe, S. Boyd, and H. Lebret. Applications of second-order cone programming. Linear Algebra Appl., 284:193–228, 1998.

[24] F. Maggioni, F. A. Potra, M. I. Bertocchi, and E. Allevi. Stochastic second-order cone programming in mobile ad hoc networks. J. Optim. Theory Appl., 143:309–328, 2009.

[25] S. Mehrotra and M. G. Ozevin. Decomposition-based interior point methods for two-stage stochastic semidefinite programming. SIAM J. of Optimization, 18(1):206–222, 2007.

[26] S. Mehrotra and M. G. Ozevin. On the implementation of interior point decomposition algorithms for two-stage stochastic conic programs. SIAM J. Optim., 19(4):1846–1880, 2008.

[27] S. Mehrotra and M. G. Ozevin. Decomposition based interior point methods for two-stage stochastic convex quadratic programs with recourse. Operations Research, 54(4):964–974, 2009.

[28] R. D. Monteiro. Primal-dual path-following algorithms for semidefinite programming. SIAM J. Optim., 7:663–678, 1997.

[29] Yu. E. Nesterov and A. S. Nemirovskii. Conic formulation of a convex programming problem and duality. Optimization Methods and Software, 1(2):95–115, 1992.

[30] Yu. E. Nesterov and A. S. Nemirovskii. Interior Point Polynomial Algorithms in Convex Programming. SIAM Publications, Philadelphia, PA, USA, 1994.

[31] A. Prekopa. On probabilistic constrained programming. In Proceedings of the Princeton Symposium on Mathematical Programming, 113–138, Princeton Univ. Press, Princeton, NJ, USA, 1970.

[32] A. Prekopa. Stochastic Programming. Kluwer Academic Publishers, Boston, MA, USA, 1995.

[33] A. Prekopa. The use of discrete moment bounds in probabilistic constrained stochastic programming models. Ann. Oper. Res., 85:21–38, 1999.

[34] B. K. Rangarajan. Polynomial convergence of infeasible-interior-point methods over symmetric cones. SIAM J. of Optimization, 16(4):1211–1229, 2006.

[35] S. H. Schmieta and F. Alizadeh. Associative and Jordan algebras, and polynomial time interior point algorithms for symmetric cones. Mathematics of Operations Research, 26(3):543–564, 2001.

[36] S. H. Schmieta and F. Alizadeh. Extension of primal-dual interior point methods to symmetric cones. Math. Program., Ser. A, 96:409–438, 2003.

[37] P. Sun and R. M. Freund. Computation of minimum-cost covering ellipsoids. Operations Research, 52(5):690–706, 2004.

[38] M. J. Todd. Semidefinite optimization. Acta Numerica, 10:515–560, 2001.

[39] J. A. Tompkins, J. A. White, Y. A. Bozer, and J. M. Tanchoco. Facilities Planning, 3rd edn. Wiley, Chichester, 2003.

[40] P. M. Vaidya. A new algorithm for minimizing convex functions over a convex set. Math. Program., Ser. A, 73:291–341, 1996.

[41] L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM Rev., 38:49–95, 1996.

[42] R. J. Vanderbei and H. Yurittan. Using LOQO to solve second-order cone programming problems. Report SOR 98-09, Princeton University, Princeton, USA, 1998.

[43] D. S. Watkins. Fundamentals of Matrix Computations. 3rd edn. Wiley, New York, 2010.

[44] R. J-B. Wets. Stochastic programming: Solution techniques and approximation schemes. In Mathematical Programming: The State of the Art, 1982, A. Bachem, M. Grötschel, and B. Korte, eds., Springer-Verlag, Berlin, 566–603, 1982.

[45] M. M. Wiecek. Advances in cone-based preference modeling for decision making with multiple criteria. Decision Making in Manufacturing and Services, 1(1-2):153–173, 2007.

[46] F. Zhang. Quaternions and matrices of quaternions. Linear Algebra Appl., 251(2):21–57, 1997.

[47] Y. Zhang. On extending primal-dual interior-point algorithms from linear programming to semidefinite programming. SIAM J. Optim., 8:356–386, 1998.

[48] G. Zhao. A log-barrier method with Benders decomposition for solving two-stage stochastic linear programs. Math. Program., Ser. A, 90:507–536, 2001.

[49] Y. Zhu and K. A. Ariyawansa. A preliminary set of applications leading to stochastic semidefinite programs and chance-constrained semidefinite programs. Applied Mathematical Modelling, 35(5):2425–2442, 2011.