a globally convergent primal-dual interior-point three ... · a globally convergent primal-dual...

A globally convergent primal-dual

interior-point three-dimensional filter

method for nonlinear semidefinite

programming

Zhongyi Liu∗

Abstract

This paper proposes a primal-dual interior-point filter methodfor nonlinear semidefinite programming, which is the first multi-dimensional (three-dimensional) filter methods for interior-pointmethods, and of course for constrained optimization. A freshlynew definition of filter entries is proposed, which is greatly dif-ferent from those in all the current filter methods. A mixednorm is used to tackle with trust region constraints and globalconvergence to first-order critical points can easily be proved byslightly modifying the analysis in [7].

Keywords: nonlinear semidefinite programming, interior-point method,

three-dimensional filter method, global convergence

AMS subject classification: 65K05, 90C51

1 Introduction

For the concepts and frame of filter methods, we refer to [1, 2, 3, 4] and specially

to [7]. The multidimensional filter methods [5, 6] can only tackle with uncon-

strained optimization. As mentioned in [7], that multidimensional filter methods

∗Correspondence to: College of Science, Hohai University, Nanjing 210098, China.E-mail: [email protected]

1

for constrained optimization and interior-point methods with three-dimensional

filter is still an open question till now. In the current paper we propose a three-

dimensional filter method for interior-point methods. A freshly new definition

of filter entries in the filter method is proposed for the first time, in which the

first component is always kept smaller than the second one, which looks greatly

different from those in all the current filter methods. This idea is simple but

valuable for it is possible to open access to multidimensional filter methods for

constrained optimization.

We consider the following nonlinear SDP problem:

min f(x), x ∈ Rn,

s.t. h(x) = 0, X ≡ Ax−B � 0, (1.1)

where the functions f : Rn → R and h : Rn → Rm are sufficiently smooth. Here

A is a linear operator defined by Ax =∑n

i=1 xiAi for x ∈ Rn, and Ai ∈ Sp, i =

1, . . . , n, and B ∈ Sp are given matrices, where Sp denotes the set of pth order

real symmetric matrices. By X � 0 and X � 0, we mean that the matrix X is

positive semidefinite and positive definite, respectively.

Throughout this paper, we define the inner product 〈X,Z〉 by 〈X, Z〉 =

Tr(XZ) for any matrices X and Z in Sp, where Tr(M) denotes the trace of

the matrix M . We also define an adjoint operator A∗ of A such that A∗Z is an

n dimensional vector whose ith element is Tr(AiZ). Then we have

〈Ax, Z〉 = xT (A∗Z) = (A∗Z)T x,

where the superscript T denotes the transpose of a vector or a matrix. And the

mixed norm ‖·‖∗ expresses ‖u‖+‖ V ‖F for some form (u, V ), where u is a vector,

V is a matrix, ‖ · ‖ is the Euclidean norm and ‖ · ‖F is the Frobenius norm.

2 Interior-point framework

We return to the nonlinear semidefinite programming problem posed in (1.1).

2.1 Step computation

Let the Lagrangian function of problem (1.1) be defined by

L(x, y, Z) = f(x)− yT h(x)− 〈X,Z〉,

2

where y ∈ Rm and Z ∈ Sp are the Lagrange multiplier vector and matrix which

correspond to the equality and positive semidefiniteness constraints, respectively.

Then Karush-Kuhn-Tucker (KKT) conditions for optimality of problem (1.1) are

given by ∇xL(x, y, Z)

h(x)

X ◦ Z

= 0

and

X � 0, Z � 0.

Here ∇xL(x, y, Z) is given by

∇xL(x, y, Z) = ∇f(x)−∇h(x)y −A∗Z,

and the multiplication X ◦ Z is defined by

X ◦ Z =XZ + ZX

2.

It is known that X ◦ Z = 0 is equivalent to the relation XZ = ZX = 0.

We call (x, y, Z) satisfying X � 0 and Z � 0 the interior point. To construct

an interior-point algorithm, we introduce a positive parameter µ, and we replace

the complementarity condition X ◦ Z = 0 by X ◦ Z = µI, where I denotes the

identity matrix and µ > 0. Then we try find a point that satisfies the following

equation:

F (x, y, Z) :=

∇xL(x, y, Z)

h(x)

X ◦ Z − µI

= 0 (2.1)

and

X � 0, Z � 0.

Throughout we will work with µ = σµ, where σ ∈ (0, 1) is a centering parameter,

and

µ =Tr(XZ)

n.

To abbreviate notation we set

w = (x, y, Z) and ∆w = (∆x, ∆y, ∆Z).

3

We apply a Newton-like method to the system of equations (2.1). Let the Newton

directions for the primal-dual variables by ∆x ∈ Rn and ∆Z ∈ Sp, respectively.

We define ∆X =∑n

i=1 ∆xiAi and we note that ∆X ∈ Sp.

Since (X + ∆X) ◦ (Z + ∆Z) = σµI can be written as

(X + ∆X)(Z + ∆Z) + (Z + ∆Z)(X + ∆X) = 2σµI,

neglecting the nonlinear parts ∆X∆Z and ∆Z∆X implies the Newton equation

for (2.1) as

G∆x−∇h(x)∆y −A∗∆Z = −∇xL(x, y, Z),

∇h(x)T ∆x = −h(x),

∆XZ + Z∆X + X∆Z + ∆ZX = 2σµI −XZ − ZX,

where G denotes the Hessian matrix of the Lagrangian function L(w) or its ap-

proximation.

We rewrite the perturbed KKT-conditions in the form 0

h(x)

X ◦ Z − µI

+

∇xL(x, y, Z)

0

(1− σ)µI

= 0. (2.2)

Now we use the decomposition associated with the splitting (2.2) to define tangen-

tial and normal components of trial steps. For the normal step sn = (∆xn, ∆yn, ∆Zn),

we choose G∆xn −∇h(x)∆yn −A∗∆Zn

∇h(x)T ∆xn

X ◦∆Zn + Z ◦∆Xn

= −

0

h(x)

X ◦ Z − µI

, (2.3)

whereas the tangential step st = (∆xt, ∆yt, ∆Zt) is given by G∆xt −∇h(x)∆yt −A∗∆Zt

∇h(x)T ∆xt

X ◦∆Zt + Z ◦∆X t

= −

∇xL(w)

0

(1− σ)µI

. (2.4)

Thus ∆w = sn +st. We will introduce different stepsizes for sn and st in our trial

step computation.

We introduce ∆ as the positive scalar that primarily controls the length of

the step taken along ∆w, forcing the damped components αn(∆)sn and αt(∆)st,

to satisfy

‖αn(∆)sn‖∗ ≤ ∆, ‖αt(∆)st‖∗ ≤ ∆.

4

Having these bounds in mind, and requiring explicitly αt(∆) ≤ αn(∆), we set

αn(∆) = min

{1,

∆

‖sn‖∗

}, (2.5)

αt(∆) = min

{αn(∆),

∆

‖sn‖∗

}= min

{1,

∆

‖sn‖∗,

∆

‖st‖∗

}. (2.6)

Hereby, we use for ∆ > 0 the natural definition αn(∆) = 1 for ‖sn‖∗ = 0 and

αt(∆) = αn(∆) for ‖st‖∗ = 0 by using the convention min{1,∞} = 1.

Let also

w(∆) = (x(∆), y(∆), Z(∆)) = w + αn(∆)sn + αt(∆)st,

s(∆) = (sx(∆), sy(∆), sZ(∆)) = w(∆)− w = αn(∆)sn + αt(∆)st.

Thus,

‖s(∆)‖∗ ≤ 2∆,

and here ∆ plays a role comparable to the trust-region radius.

The scalars αn(∆) and αt(∆) define the step sizes taken along the normal

and tangential directions, respectively, and will be such that positivity and some

measure of centrality of the new iterate w(∆) are maintained. However, both

αn(∆) and αt(∆) depend on ∆, that in turn will be adjusted, not only to meet

the purpose of positivity and centrality, but also to enforce global convergence.

We introduce the notations

θh(w) := ‖h(x)‖, θc(w) :=

∥∥∥∥X ◦ Z − Tr(XZ)

nI

∥∥∥∥F, θ(w) := θh(w) + θc(w),

θL(w) := ‖∇xL(w)‖, θg(w) :=Tr(XZ)

n+ ‖∇xL(w)‖2.

In particular, we write the filter components as

θ1(w) := min{θh(w), θc(w)}, θ2(w) := max{θh(w), θc(w)},

θ3(w) := θg(w) :=Tr(XZ)

n+ ‖∇xL(w)‖2.

Since X ◦ Z might not be zero matrix, a point w that satisfies θc(w) = θh(w) =

θL(w) = 0 and X � 0, Z � 0, might not be a KKT point. The definition of

θg(w), however, guarantees that a point w verifying θc(w) = θh(w) = θg(w) = 0,

which is the same as θ1(w) = θ2(w) = θ3(w) = 0, and X � 0, Z � 0, indeed a

KKT point.

5

With the purpose of achieving a reduction on the function θg, we introduce,

at a given point w, the quadratic model

m(w(∆))

=Tr(XZ)

n+

Tr((X(∆)−X)Z) + Tr((Z(∆)− Z)X)

n+‖∇xL(w) +∇2

xwL(w)(w(∆)− w)‖2

=Tr(X(∆)Z(∆))− Tr((X(∆)−X)(Z(∆)− Z))

n+‖∇xL(w) +∇2

xwL(w)(w(∆)− w)‖2,

by adding to the linearization of Tr(XZ)/n the squared norm of the linearization

of ∇xL(w). To shorten notation we also set

µ(∆) =Tr(X(∆)Z(∆))

n.

In order to prevent (X(∆), Z(∆)) from approaching the boundary of the positive

cone too rapidly we will keep the iteration in the neighborhood

N (γ, M) =

{w : X � 0, Z � 0, X ◦ Z � γ

Tr(XZ)

n, θh(w) + θL(w) ≤ M

Tr(XZ)

n

}with fixed γ ∈ (0, 1) and M > 0. We will set in the next subsection that

w ∈ N (γ, M) implies w(∆) ∈ N (γ, M) whenever ∆ ∈ (0, ∆min] for a given

constant ∆min > 0.

2.2 Step estimates

Lemma 2.1. Let A, B ∈ Sn×n, λi(A) be its eigenvalue, Λ = diag(λ(A)), and

ρ(A) be its spectral radius, i.e., ρ(A) = max{|λi(A)|, i = 1, . . . , n}. Then

A ◦B � ρ(A)ρ(B)I

Proof. Since

2ρ(A)ρ(B)I − (QT AQQT BQ + QT BQQT AQ)

= ρ(A)ρ(B)I − ΛQT BQ + ρ(A)ρ(B)I −QT BQΛ

� Λρ(B)I − ΛQT BQ + ρ(B)IΛ−QT BQΛ

= Λ(ρ(B)I −QT BQ) + (ρ(B)I −QT BQ)Λ

� 0,

6

then the lemma follows easily.

Lemma 2.2. For all ∆ > 0 and all i = 1, . . . , n it holds

λi(X(∆) ◦ Z(∆)) ≤ (1− αn(∆))λi(X ◦ Z)

+(αn(∆)− αt(∆)(1− σ))µ + 4∆2, (2.7)

λi(X(∆) ◦ Z(∆)) ≥ (1− αn(∆))λi(X ◦ Z)

+(αn(∆)− αt(∆)(1− σ))µ− 4∆2, (2.8)

µ(∆) ≤ (1− αt(∆)(1− σ))µ + 4∆2, (2.9)

µ(∆) ≥ (1− αt(∆)(1− σ))µ− 4∆2. (2.10)

Proof. By the definition of sn and st, we have

X(∆) ◦ Z(∆) = (X + αn(∆)∆Xn + αt(∆)∆X t) ◦ (Z + αn(∆)∆Zn + αt(∆)∆Zt)

= X ◦ Z + αn(∆)(∆Xn ◦ Z + X ◦∆Zn)

+αt(∆)(∆X t ◦ Z + X ◦∆Zt)

+(αn(∆)∆Xn + αt(∆)∆X t) ◦ (αn(∆)∆Zn + αt(∆)∆Zt)

= X ◦ Z − αn(∆)(X ◦ Z − µI)− αt(∆)(1− σ)µ

+(X(∆)−X) ◦ (Z(∆)− Z).

Note that |λi(X(∆)−X)| ≤ ‖X(∆)−X‖F , similarly for |λi(Z(∆)− Z)|. Then

|λi(X(∆)−X)||λi(Z(∆)− Z)| ≤ (2∆)2.

Hence the relations (2.7) and (2.8) follow.

Summing (2.7) and (2.8) over all i, dividing the result by n, and using µ =

Tr(XZ)/n, µ(∆) = Tr(X(∆)Z(∆))/n, yield (2.9) and (2.10).

The following lemma provides, at w(∆), upper bounds on the values of θh, θc, θL

and θg in terms of ∆ and of the corresponding values at w. It also provides a

lower bound for the decrease produced on the quadratic model m by the step

w(∆)− w.

Lemma 2.3. There exist positive constants Mh, Mc, and ML, depending on an

upper bound for θL and on the Lipschitz constants of ∇h and ∇2xwL, such that,

for all ∆ > 0,

θh(w(∆)) ≤ (1− αn)θh(w) + Mh∆2, (2.11)

θc(w(∆)) ≤ (1− αn(∆))θc(w) + Mc∆2, (2.12)

θL(w(∆)) ≤ (1− αt(∆))θL(w) + ML∆2. (2.13)

7

For ∆ satisfying 0 < ∆ ≤ ∆ub, it also holds

θg(w(∆)) ≤ (1− αt(∆)(1− σ))θg(w) + Mg∆2, (2.14)

for some positive constant Mg.

Finally, for any ∆ > 0, we also have

m(w)−m(w(∆)) ≤ αt(∆)(1− σ)θg(w). (2.15)

Proof. Denote by BL an upper bound for θL, and by Ch and CL′ > 1 Lipschitz

constants for ∇h and ∇2xwL, respectively. We will prove this Lemma with

Mh = 2Ch, Mc = 8√

n,

ML = 2CL′ , Mg = 4(1 + BLCL′) + 4C2L′∆2

ub.

Note that ∇h(x)T sx(∆) = −αn(∆)h(x) from (2.3),(2.4) and the definition of

s(∆). Thus

θh(w(∆)) = ‖h(x(∆))‖ = ‖h(x) +

∫ 1

0

∇h(x + tsx(∆))T sx(∆)dt‖

= ‖(1− αn(∆))h(x) +

∫ 1

0

(∇h(x + tsx(∆))−∇h(x))T sx(∆)dt‖

≤ (1− αn(∆))θh(w) + Ch‖sx(∆)‖2∗

∫ 1

0

tdt,

which proves (2.11).

Similarly, we have ∇2xwL(w)T s(∆) = −αt(∆)∇xL(w) from (2.3), (2.4) and

the definition of s(∆), as above, we get

θL(w(∆)) ≤ (1− αt(∆))θL(w) +

∫ 1

0

‖∇2xwL(w + ts(∆))−∇2

xwL(w)‖‖s(∆)‖∗dt,

which yields (2.13).

From Lemma 2.2,

±λi(X(∆) ◦ Z(∆)− µ(∆)I) ≤ ±((1− αn(∆))λi(X ◦ Z) + (αn(∆)− αt(∆)(1− σ))µ

)+ 4∆2

∓(1− αt(∆)(1− σ))µ + 4∆2

= ±(1− αn(∆))(λi(X ◦ Z)− µ) + 8∆2.

Then (2.12) follows.

From Lemma 2.2 and the inequality (2.13),

θg(w(∆)) = µ(∆) + θL(w(∆))2

≤ (1− αt(∆)(1− σ))µ + 4∆2 + ((1− αt(∆))θL(w) + 2CL′∆2)2

≤ (1− αt(∆)(1− σ))θg(w) + (4 + 4(1− αt(∆))θL(w))CL′∆2 + 4C2L′∆4,

8

where the second inequality take advantage of the relation (1 − αt(∆))2 ≤ 1 −αt(∆)(1− σ). Then (2.14) follows.

Finally, we have

m(w)−m(w(∆)) = (Tr(XZ)

n+ ‖∇xL(w)‖2)

−(Tr(X(∆)Z(∆))

n− Tr((X(∆)−X)(Z(∆)− Z))

n+‖∇xL(w) +∇2

xwL(w)(w(∆)− w)‖2)

= µ− µ(∆) +Tr((X(∆)−X)(Z(∆)− Z))

n+(1− (1− α(∆))2)‖∇xL(w)‖2

= αt(∆)(1− σ)µ + (1− (1− α(∆))2)‖∇xL(w)‖2

≥ αt(1− σ)θg(w),

where the second equality is due to the relation ∇2xwL(w)s(∆) = −αt∆∇xL(w)

and the inequality is due to 1 − (1 − αt(∆))2 ≥ (1 − σ)αt(∆). The proof is

therefore completed.

By introducing E = µ(X−1 � X−1) and F as the identity, we see that the

Newton system of equations (2.1) can be transformed into ∇xxL(x, y, Z) −∇h(x) −A∗

∇h(x)T 0 0

EA 0 F

∆x

∆y

∆Z

= −

∇xL(x, y, Z)

h(x)

X ◦ Z − µI

,

that is,

F ′(x, y, Z)∆w = −F (x, y, Z).

Now we state a result that says that if the current point w = (x, y, Z) satis-

fies the centrality requirement X ◦ Z ≥ γµI, so does the next point w(∆) =

(x(∆), y(∆), Z(∆)), provided ∆ is sufficiently small. A similar property is also

stated for the inequalities θh(w) + θL(w) ≤ Mµ and (X, Z) � 0.

Lemma 2.4. Let ‖F ′(w)−1‖ ≤ C and, for γ ∈ (0, 1) and M > 0, assume that

X ◦ Z ≥ γµI, θh(w) + θL(w) ≤ Mµ.

Then there exists a constant ∆min such that, if 0 < ∆ ≤ ∆min, then

X(∆) ◦ Z(∆) ≥ γµ(∆)I, (2.16)

θh(w(∆)) + θL(w(∆)) ≤ Mµ(∆). (2.17)

9

Furthermore, if (X, Z) � 0, then, for all ∆ ∈ (0, ∆min],

(X(∆), Z(∆)) � 0. (2.18)

Thus, w ∈ N (γ, M) implies w(∆) ∈ N (γ, M) for all ∆ ∈ (0, ∆min].

Proof. We will prove this Lemma with

∆min = { σ(1− γ)

4(1 + γ)C(M + n),

σM

(Mh + ML + 4M)C(M + n)}. (2.19)

1. We first show that (2.16) holds for all ∆ ∈ (0, ∆min] with ∆min satisfying

(2.19). The proof is divided into two stages, first we show (2.16) holds whenever

(2.25) holds, then we show (2.25) holds for all ∆ ∈ (0, ∆min].

From Lemma 2.2 we obtain

λi(X(∆) ◦ Z(∆)) ≥ (γ + (1− γ)αn(∆)− αt(∆)(1− σ))µ− 4∆2. (2.20)

On the other hand, Lemma 2.2 also yields

γµ(∆) ≤ γµ− γαt(∆)(1− σ)µ + 4γ∆2. (2.21)

Hence, λi(X(∆) ◦ Z(∆)) ≥ γµ(∆) holds whenever

4∆2 ≤ (1− γ)(αn(∆)− αt(∆)(1− σ))µ

1 + γ. (2.22)

Since αt(∆) ≤ αn(∆), a sufficient condition for this inequality to hold is

∆2 ≤ αn(∆)σ(1− γ)µ

4(1 + γ),

which, by (2.5), is implied by

∆ ≤ min

{√σ(1− γ)µ

4(1 + γ),

σ(1− γ)µ

4(1 + γ)‖sn‖∗

},

which in turn is true if

∆ ≤ δ1(µ) := min

{√σ(1− γ)µ

4(1 + γ),

σ(1− γ)

4(1 + γ)C(M + n)

}, (2.23)

10

since

‖X ◦ Z − µI‖2F =

n∑i=1

λ2i (X ◦ Z)−

n∑i=1

2µλi(X ◦ Z) + nµ2

=n∑

i=1

λ2i (X ◦ Z)− nµ2

≤ (n∑

i=1

λi(X ◦ Z))2 − nµ2

= (n2 − n)µ2,

and then

‖sn‖∗ = ‖(∆xn, ∆yn, ∆Zn)‖∗≤ C(‖h(x)‖+ ‖X ◦ Z − µI‖F)

= C(θh(w) + ‖X ◦ Z − µI‖F)

≤ C(M + (n2 − n)1/2)µ

≤ C(M + n)µ.

On the other hand, we can also deduce that ‖st‖∗ ≤ C(M + n1/2)µ ≤ C(M +

n)µ, and therefore

δ := max{‖sn‖∗, ‖st‖∗} ≤ C(M + n)µ. (2.24)

We consider now two possible cases in order to show that (2.16) holds whenever

min(∆, δ) ≤ δ1(µ). (2.25)

In the first case ∆ ≤ δ we use that, as we have just shown, (2.16) holds

provided ∆ ≤ δ1(µ). Since ∆ ≤ δ, the inequality ∆ ≤ δ1(µ) is equivalent to

(2.25).

In the case δ < ∆, we know that αn(∆) = αt(∆) = 1 by (2.5) and (2.6), note

that

αn(δ) = min

{1,

δ

‖sn‖∗

}= 1,

αt(δ) = min

{αn(δ),

δ

‖st‖∗

}= 1,

then w(∆) = w(δ), and s(∆) = s(δ). Thus, (2.16) is the same as X(δ) ◦ Z(δ) ≥γµ(δ)I. One can apply the same algebraic arguments used in the first paragraph

11

of the proof to show that the result similar to (2.23), that is, X(δ)◦Z(δ) ≥ γµ(δ)

holds if δ ≤ δ1(µ). Now, since δ ≤ ∆, we have that δ ≤ δ1(µ) is also equivalent

to (2.25). The first stage is completed.

Hence, it remains to show (2.25) holds for ∆ ∈ (0, ∆min] with ∆min from (2.19).

Let

µ1crit :=

σ(1− γ)

4(1 + γ)C2(M + n)2.

If µ ≤ µ1crit, then δ1(µ) is given by the first expression in (2.23), and since

δ ≤ C(M + n)µ ≤√

C2(M + n)2µ1critµ =

√σ(1− γ)µ

4(1 + γ)= δ1(µ),

then (2.25) follows.

If µ > µ1crit, then δ1(µ) is given by the second expression in (2.23), that is,

δ1(µ) = σ(1−γ)4(1+γ)C(M+n)

. Hence if ∆ ≤ ∆min, of course, ∆ ≤ σ(1−γ)4(1+γ)C(M+n)

, i.e.,

∆ ≤ δ1(µ), then (2.25) holds.

2. We will prove now that (2.17) holds for all ∆ ∈ (0, ∆min] with ∆min from

(2.19).

From Lemma 2.3 and αt(∆) ≤ αn(∆) we derive

θL(w(∆)) ≤ (1− αt(∆))θL(w) + ML∆2

θh(w(∆)) ≤ (1− αt(∆))θh(w) + Mh∆2.

Using θh(w) + θL(w) ≤ Mµ, we get

θh(w(∆)) + θL(w(∆)) ≤ (1− αt(∆))Mµ + (Mh + ML)∆2.

On the other hand, by Lemma 2.2,

Mµ(∆) ≥ (1− αt(∆))Mµ + σαt(∆)Mµ− 4M∆2.

Therefore, (2.17) holds whenever

(Mh + ML + 4M)∆2 ≤ σαt(∆)Mµ,

which, by (2.6), is implied by

∆ ≤ min

{√σMµ

Mh + ML + 4M,

σMµ

(Mh + ML + 4M) max{‖sn‖∗, ‖st‖∗}

},

which in turn is true, by (2.24), if

∆ ≤ δ2(µ) := min

{√σMµ

Mh + ML + 4M,

σM

(Mh + ML + 4M)C(M + n)

}.

12

Let

µ2crit :=

σM

(Mh + ML + 4M)C2(M + n)2.

If µ ≤ µ2crit, then by (2.24), δ2(µ) =

√σMµ

Mh+ML+4M, and then

δ ≤ C(M + n)µ ≤√

C2(M + n)2µ2critµ =

√σMµ

Mh + ML + 4M= δ2(µ).

If µ > µ2crit, then δ2(µ) = σM

(Mh+ML+4M)C(M+n). Hence, if ∆ ≤ ∆min, of course,

∆ ≤ σM(Mh+ML+4M)C(M+n)

, then min{∆, δ} ≤ σM(Mh+ML+4M)C(M+n)

= δ2(µ). The

similar process will implies that, for all ∆ ∈ (0, ∆min], (2.17) holds.

3.Finally we prove that (2.18) holds for ∆ ∈ (0, ∆min] such that ∆min from

(2.19). We know that (2.20) and (2.21) hold for all ∆ ∈ (0, ∆min] with ∆min

satisfying (2.19). It follows from (2.22) that

4∆2 < (αn(∆)− αt(∆)(1− σ))µ.

Thus, from (2.20), we get

X(∆) ◦ Z(∆) � γ(1− αn(∆))µI � 0.

Hence, (2.18) follows.

3 The interior-point filter method

We choose θ1, θ2 and θ3, instead of θh, θc and θg, to form a filter entry, where

θh measures feasibility, θc measures centrality and θg measures optimality. Since

θ3 = θg, we replace the objective function value by the criticality measure θg.

Definition 3.1. (Dominance). A point w, or the corresponding triple (θ1(w), θ2(w), θ3(w)),

is said to dominate a point w′, or the corresponding triple (θ1(w′), θ2(w

′), θ3(w′)),

if

θ1(w) ≤ θ1(w′), θ2(w) ≤ θ2(w

′) and θ3(w) ≤ θ3(w′),

or, alternatively, if the following inequality is violated:

max{θ1(w)− θ1(w′), θ2(w)− θ2(w

′), θ3(w)− θ3(w′)} > 0.

13

Definition 3.2. (Filter). A filter is a finite subset F ⊂ R3 consisting of triples

(θf1 , θf

2 , θf3 ) such that no triple can dominate any of the others.

The mere requirement that a new iterate is not dominated by any of the filter

entries is too weak as an acceptance criterion, instead, we require:

Definition 3.3. Let γF ∈ (0, 1/2) be fixed. The point w is acceptable to the

filter F if, for all (θf1 , θf

2 , θf3 ) ∈ F , it holds

max{θf1 − θ1(w), θf

2 − θ2(w), θf3 − θ3(w)} > γFτθf ,

where θf := θfh + θf

c = θf1 + θf

2 and τ > 0 is sufficiently small but bounded away

from zero.

In the course of algorithm, we will add selected new points to the filter. This

procedure is done in the following way.

Definition 3.4. By adding w to the filter F we mean the following operation:

F → F = {(θ1(w), θ2(w), θ3(w))} ∪ {(θf1 , θf

2 , θf3 ) ∈ F :

min{θf1 − θ1(w), θf

2 − θ2(w), θf3 − θ3(w)} < 0}.

Now we state the algorithm below.

Algorithm 3.5. Primal-dual interior-point filter method

step 0. Choose σ ∈ (0, 1), ν ∈ (0, 1), γ1, γ2, τ > 0, 0 < β, η, κ < 1, γF ∈ (0, 1/2).

Set F := ∅. Choose (X0, Z0) � 0 and y0, and determined γ ∈ (0, 1) such that

X0 ◦ Z0 ≥ γµ0I with µ0 = Tr(X0Z0)/n. Furthermore, choose M > 0 such that

θh(w0) + θL(w0) ≤ Mµ0. Choose ∆in0 > 0 and set k := 0.

step 1. Set µk := Tr(XkSk)/n and compute snk and st

k by solving the linear

systems (2.3) and (2.4), respectively, with (w, µ) = (wk, µk).

step 2. Compute ∆′k ∈ [0, ∆in

k ] such that

Xk(∆) � 0, Zk(∆) � 0, Xk(∆) ◦ Zk(∆) � γµkI for all ∆ ∈ [0, ∆′k].

step 3. Compute the largest ∆′′k ∈ (0, ∆′

k] such that

θh(wk(∆′′k)) + θL(wk(∆

′′k)) ≤ Mµk(∆

′′k).

Set ∆k := ∆′′k.

step 4. If θ(wk) ≤ ∆k min{γ1, γ2∆βk}, then go to step 5. Otherwise, add wk to

14

the filter and call a restoration algorithm that produces a point wk+1 such that:

wk+1 ∈ N (γ, M) with µk+1 = Tr(Xk+1Zk+1)/n;

wk+1 is acceptable to the filter;

θ(wk+1) ≤ ∆k+1 min{γ1, γ2∆βk} with ∆k+1 = ∆k.

Set ∆ink+1 := ∆k, k := k + 1, and go to step 1.

step 5. If wk(∆k) is not acceptable to the filter (and mk(wk) − mk(wk(∆k)) <

κθ(wk)2), then go to step 11.

step 6. If mk(wk)−mk(wk(∆k)) = 0, then set ρk := 0. Otherwise,

ρk :=θg(wk)− θg(wk(∆k))

mk(wk)−mk(wk(∆k)).

step 7. If ρk < η and mk(wk)−mk(wk(∆k)) ≥ κθ(wk)2, then go to step 11.

step 8. If mk(wk)−mk(wk(∆k)) < κθ(wk)2, then add wk to the filter.

step 9. Choose ∆ink ≥ ∆k.

step 10. Set wk+1 := wk(∆k), k := k + 1, and go to step 1.

step 11. Set wk+1 := wk, snk+1 := sn

k , stk+1 := st

k, ∆′k+1 := ∆k/2. Set k := k + 1

and go to step 3.

Remark 3.6. In step 2 and step 3, we use a Armoji-like method to get ∆′k and

∆′′k. Note that step 5 guarantees that the potentially new iterate wk(∆k) is always

acceptable to the filter. For the restoration algorithm, see [7].

4 Global convergence

Now we assume that {wk} is a sequence of iterates generated by Algorithm 3.5.

We will also impose the following assumptions.

Assumption 4.1. (A1) The sequence {(xk, yk, Zk)} is bounded.

(A2) The derivatives ∇h and ∇2xwL exist and are Lipschitz continuous in an open

set containing all iterates (xk, yk, Zk) and the line segments [wk, wk + sk(∆k)].

(A3) There exists C > 0 such that for all k it holds ‖F ′k(wk)

−1‖F ≤ C.

The following lemma is a direct consequence of the assumptions, Lemma 2.3

and Lemma 2.4.

15

Lemma 4.2. [7, Lemma 3] The following hold:

i) The sequence {θh(wk)}, {θc(wk)}, {µk}, and {θg(wk)} are bounded.

ii) The constants Mh, Mc, ML, Mg in Lemma 2.3 are bounded for all k.

iii) There exists ∆min > 0 such that the conditions in step 2 and step 3 are

satisfied for all ∆′k, ∆

′′k ∈ [0, ∆min]. Thus, step 2 and step 3 leave ∆in

k unchanged

for 0 ≤ ∆ink ≤ ∆min and we have ∆k = ∆in

k .

iv) It holds max{‖snk‖∗, ‖st

k‖∗} ≤ C(M + (n2 − n + 1)1/2)µk for all k.

Lemma 4.3. (Lemma 4,[7]) If wk is added to the filter, then θ(wk) > 0.

Lemma 4.4. (Lemma 5,[7]) In all iterations k ≥ 0, the current iterate wk is

acceptable to the filter.

Lemma 4.5. (Lemma 6,[7]) There exists a ∆r > 0 such that, if ∆k ≤ ∆r in

step 5, it holds

θ(wk(∆k)) ≤ (Mh + Mc)∆2k.

Lemma 4.6. Suppose that θ(wk)+ θg(wk) ≥ ε > 0 for all k. There exists ∆a > 0

depending only on ε and on the values of the filter entries, such that, if

0 < ∆k ≤ ∆a,

then w(∆k) is in step 5 acceptable to the filter (with wk considered in the filter

when mk(wk)−mk(wk(∆k)) < κθ(wk)2).

Proof. One has from Lemma 4.3 that

θF := minF

(1− γF)θf > 0.

Consider first θ(wk) ≥ ε/2. Then wk(∆k) is acceptable to the filter (with wk

considered in the filter when mk(wk)−mk(wk(∆k)) < κθ(wk)2) if either

θ1(wk)− θ1(wk(∆k)) ≥ τγFθ(wk), θf1 − θ1(wk(∆k)) ≥ τγFθf ,

or

θ2(wk)− θ2(wk(∆k)) ≥ τγFθ(wk), θf2 − θ2(wk(∆k)) ≥ τγFθf .

The above conditions hold only if either

θ1(wk(∆k)) ≤1

2min{θ1 − τγFθ, θf

1 − τγFθf} (4.1)

16

or

θ2(wk(∆k)) ≤1

2min{(1− γF)τε/2, τθF} < min{(1− γF)τε/2, τθF}. (4.2)

Here (4.2) is due to θ1 ≤ 12θ ≤ θ2 ≤ θ, θf

1 ≤ 12θf ≤ θf

2 ≤ θf . The right-hand side

of (4.1) is nonnegative whenever θ1 ≥ τθ and θf1 ≥ τθf . At this time, if

θ1(wk(∆k)) ≤1

2min{(1− γF)τε/2, τθF} < min{(1− γF)τε/2, τθF}, (4.3)

then wk(∆k) is acceptable. And also from Lemma 4.5,

θ(wk(∆k)) ≤ (Mh + Mc)∆2k

holds for ∆k ≤ ∆r. Thus (4.2) or (4.1), (4.3) is satisfied for ∆k ≤ ∆(1)a with

∆(1)a > 0 depending only on θF , ε, Mh, Mc, γF , and ∆r.

Now we consider θg ≥ ε/2. If wk is not consider in the filter in step 5, then

θ1(wk(∆k)) ≤1

2(θf

1 − τγFθf ),

or

θ2(wk(∆k)) ≤1

2(θf

2 − τγFθf ),

will show that if ∆k ≤ ∆(1)a then wk(∆k) is acceptable to the filter. On the other

side, with wk considered in the filter when mk(wk) −mk(wk(∆k)) < κθ(wk)2, if,

in addition,

θg(wk(∆k))− θg(wk) < −γFτθ(wk) (4.4)

Since step 5 is reached, we know that

θ(wk) ≤ γ2∆1+βk .

And we can obtain from θg(wk) ≥ ε/2 and Lemma 2.3 with 0 < ∆k ≤ ∆ub that

θg(wk(∆k))− θg(wk) ≤ −(1− σ)αtkε/2 + Mg∆

2k.

What we need to show is

−(1− σ)αtkε/2 + Mg∆

2k < −γτγ2∆

1+βk .

Since ‖snk‖∗ and ‖st

k‖∗ are bounded by a constant Ms and αtk = min{1, ∆k

‖snk‖∗

, ∆k

‖stk‖∗},

we have for all ∆k ≤ Ms, that αtk ≥ ∆k/Ms. Thus (4.4) holds if

Mg∆k + γFτγ2∆βk ≤

(1− σ)ε

4Ms

<(1− σ)ε

2Ms

,

which in turn holds for all ∆k ≤ ∆(2)a with ∆

(2)a > 0 depending only on ε, Mg, γF ,

τ , γ2, β, σ, Ms, and ∆ub. Taking ∆a = min{∆(1)a , ∆

(2)a } concludes the proof.

17

Lemma 4.7. Suppose that for given ε > 0

θg(wk) ≥ ε and θ(wk) >∆k

2min{γ1, γ2(∆k/2)β}. (4.5)

Then there exists ∆f > 0 such that, if

0 < ∆k ≤ ∆f ,

then wk(∆k) is in step 5 acceptable to filter (with wk considered in the filter when

mk(wk)−mk(wk(∆k)) < κθ(wk)2).

Proof. The proof is very similar to the proof of Lemma 8 in [7].

Lemma 4.8. (Lemma 9,[7]) Beginning from the moment where a point wk is

added to the filter, the filter always contains an entry that dominates wk.

We first investigate what happens if infinitely many values are added to the

filter in the course of the algorithm.

Lemma 4.9. Suppose that infinitely many values of (θ1, θ2, θ3) are added to the

filter. Then there exists a subsequence {ki} such that wkiis added to the filter

and

limi→∞

θh(wki) = lim

i→∞θc(wki

) = 0.

Proof. Now assume, for the purpose of a contradiction, that there exists a sub-

sequence {kj} ⊆ {ki} such that

θh(wkj) + θc(wkj

) ≥ ε1

for some ε1 > 0. Since the assumptions imply that θh(wkj), θc(wkj

), θg(wkj) are

all bounded above and below, there must exist a further subsequence {kl} ⊆ {kj}such that

liml→∞

θh(wkl) + θc(wkl

) = θh(w∞) + θc(w∞), (4.6)

with

θh(w∞) + θc(w∞) ≥ ε.

18

Moreover, by the definition of {θkl}, θkl

is acceptable for the filter for every

l, which implies that for each l, at least one holds among the following three

inequalities,

θ1(wkl)− θ1(wkl−1

) < −γFτθ(wkl),


) < −γFτθ(wkl),


) < −γFτθ(wkl).

(4.7)

Hence we deduce from (4.6) that at least one holds among the following three

inequalities,


) < −γFτε,


) < −γFτε,


) < −γFτε,

for l sufficiently large. For the successful one among the three, the left-hand side

tends to zero when l tends to infinity because of (4.6), yielding the contradiction.

Hence

limi→∞

θh(wki) = lim

i→∞θc(wki

) = 0.

Since iterates are added to the filter only if restoration is invoked or in step 8,

the sequence {ki} in the above lemma must contain either a subsequence where

restoration is invoked, or a subsequence where the iterates are added to the filter

in step 8. We start the first case.

Lemma 4.10. (Lemma 11,[7]) Suppose that there exists an infinite sequence

{ki} of iterations at which restoration is invoked and for which holds

limi→∞

θ(wki) = 0.

Then {ki} contains a subsequence {k′j} with

limj→∞

θ(wk′j) = 0, lim

j→∞θg(wk′j

) = 0.

The other situation is where the sequence {ki} of Lemma 4.9 contains a sub-

sequence, where the iterates are added to the filter in step 8.

19

Lemma 4.11. (Lemma 12,[7]) Suppose that there exists an infinite sequence

{ki} of iterations for which wkiis added to the filter in step 8 and, in addition,

limi→∞ θ(wki) = 0. Then {ki} contains a subsequence {k′j} such that

limj→∞

θ(wk′j) = 0, lim

j→∞θg(wk′j

) = 0.

We summarize both situations in the next theorem.

Theorem 4.12. (Theorem 1,[7]) Suppose that infinitely many iterates are

added to the filter. Then there exists a subsequence {kj} such that

limj→∞

θ(wkj) = 0, lim

j→∞θg(wkj

) = 0.

It remains to consider the case where the algorithm runs infinitely but only

finitely many iterates are added to the filter.

Theorem 4.13. (Theorem 2,[7]) Suppose that the algorithm runs infinitely and

only finitely many iterates are added to the filter. Then

limk→∞

θ(wk) = 0, lim infk→∞

θg(wk) = 0.

The main result is obtained by combining Theorem 4.12 and Theorem 4.13.

Corollary 4.14. Under Assumption 4.1, the sequence of iterates {wk} generated

by Algorithm 3.5 satisfies

lim infk→∞

θ1(wk) + θ2(wk) + θ3(wk) = 0.

5 Concluding remarks

We propose a primal-dual interior-point filter method for nonliner semidefinite

programming, where a freshly new definition of filter entries is given. After

the definition, the three-dimensional filter method is proposed for constrained

optimization for the first time. Though the definition of acceptance to filter

is slightly strong in this paper, the new definition of filter triples maybe open

the access to constrained optimization. We expect the future work reducing the

requirement of acceptance to filter.

20

References

[1] R. Fletcher, S. Leyffer, and P.L. Toint. On the Global Convergence of and

SLP-Filter Algorithm, Report 98/13, Dept of Mathematics, FUNDP, Namur

(B), 1998.

[2] R. Fletcher and S. Leyffer, Nonlinear programming without a penalty func-

tion. Mathematical Programming, 91(2):239–269, 2002.

[3] On the Global Convergence of a Filter–SQP Algorithm. SIAM Journal on

Optimization, 13(1):44–59, 2002.

[4] R. Fletcher, N. I. M. Gould, S. Leyffer, P. L. Toint, A. Wachter. Global

convergence of a trust-region SQP-filter algorithm for general nonlinear pro-

gramming. SIAM Journal on Optimization, 13(3):635–659, 2002.

[5] N. I. M. Gould, S. Leyffer, and Ph. L. Toint, A multidimensional filter al-

gorithm for nonlinear equations and nonlinear least-squares. SIAM Journal

on Optimization, 15(1):17-38, 2005.

[6] N. I. M. Gould, C. Sainvitu, and Ph. L. Toint, A filter-trust-region method

for unconstrained optimization. SIAM Journal on Optimization, 16(2):341-

357, 2006.

[7] M. Ulbrich, S. Ulbrich, and L. N. Vicente, A globally convergent primal-

dual interior-point filter method for nonlinear programming.Mathematical

Programming,100(2):379–410, 2004.

21

a globally convergent primal-dual interior-point three ... · a globally convergent primal-dual...

Documents