
Applied nonlinear optimization

Werner Römisch and Fredi Tröltzsch

1 INTRODUCTION

Modelling and simulation of complex processes in key technologies are the main issues of the DFG research center. Being able to efficiently deal with the complex models established and having the corresponding mathematical basis available, engineers and practitioners are interested in optimizing these models according to some objective. This is reflected by several scientific projects in the center. They cover various fields of optimization, ranging from discrete, nonlinear, and stochastic optimization to optimal control of ordinary and partial differential equations. In some of the projects, different fields interact and grow together.

Traditionally, Berlin is a good place to deal with optimization. All fields mentioned above are covered by well-known mathematicians working at one of the scientific institutions in Berlin.

In this paper, we first introduce some basic ideas of nonlinear optimization, aiming at guiding the reader from the calculus of variations, which started in 1696, to necessary optimality conditions for nonlinear optimization problems in Banach spaces. Finally, we arrive at specific applied optimization problems which shall be the subject of study in the DFG research center. A detailed presentation of all aspects of optimization that shall be treated in the center would go beyond the scope of this contribution. Therefore, we will only shed light on a few selected projects contributing to the fields of stochastic optimization and optimal control of partial differential equations, where the two authors are engaged.

Nonlinear optimization developed as an independent field of applied mathematics in the early fifties of the 20th century. It grew out of linear programming, which had posed new questions in the field of extremal problems during World War II and had led to the invention of the simplex method. For a long time this method was the most successful numerical technique for solving optimization problems.

Long before that, nonlinear optimization – in the sense of extremal problems – had been part of calculus. The unconstrained minimization of the Rosenbrock function

f(x1, x2) = 100 (x2 − x1²)² + (1 − x1)²

belongs to this class. It is obvious that the global minimum of this function is attained at x = (1, 1). Therefore, it is quite surprising that this simple minimization problem serves as a benchmark example for numerical methods of unconstrained optimization. A glimpse at the graph of f shows the source of trouble: it has the form of a banana-shaped valley, in which it is difficult to find the minimum iteratively (see Figure 1).
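To illustrate why this function is a standard benchmark, the following sketch (plain Python; the starting point, iteration count, and Armijo backtracking parameters are our own illustrative choices, not from the text) runs steepest descent and shows how slowly the iterates creep along the banana-shaped valley:

```python
def f(x1, x2):
    # Rosenbrock function: global minimum f(1, 1) = 0
    return 100.0 * (x2 - x1**2)**2 + (1.0 - x1)**2

def grad(x1, x2):
    # Analytic gradient of the Rosenbrock function
    return (-400.0 * x1 * (x2 - x1**2) - 2.0 * (1.0 - x1),
            200.0 * (x2 - x1**2))

def steepest_descent(x, iters=5000):
    for _ in range(iters):
        g = grad(*x)
        fx, gg = f(*x), g[0]**2 + g[1]**2
        if gg == 0.0:
            break
        # Armijo backtracking: halve the step until sufficient decrease
        t = 1.0
        while f(x[0] - t * g[0], x[1] - t * g[1]) > fx - 1e-4 * t * gg:
            t *= 0.5
        x = (x[0] - t * g[0], x[1] - t * g[1])
    return x

x = steepest_descent((-1.2, 1.0))
print(x, f(*x))  # close to, but typically not exactly at, the minimizer (1, 1)
```

The method reaches the valley floor within a few steps but then zigzags slowly toward (1, 1), which is exactly the behaviour the benchmark is meant to expose.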


figure 1. Rosenbrock function.

The situation becomes more difficult if more general constraints (e.g., inequalities) are given. Let us consider the nonlinear optimization problem in finite-dimensional spaces,

Minimize f0(x) subject to x ∈ C, (P1)

where f0 : R^m → R is differentiable and C is a closed subset of R^m. Clearly, the properties of the boundary of C can be crucial when characterizing a solution to (P1). The boundary may have all kinds of curvilinear facets, edges, and corners. Hence, an approach to geometry that can cope with such a lack of smoothness is needed. Such an approach, sometimes called variational geometry, was developed by associating certain cones with each point of C instead of the classical subspaces [4], [37].

To characterize optimality for (P1) one makes use of the concepts of tangent and normal cones to a set and of the regularity of a set. The sets

T_C(x̄) := { w ∈ R^m : w = lim_{n→∞} (x_n − x̄)/τ_n for some sequences x_n → x̄ with x_n ∈ C and τ_n ↓ 0 },

N_C(x̄) := { v ∈ R^m : lim sup_{x → x̄, x ∈ C, x ≠ x̄} 〈v, x − x̄〉 / ‖x − x̄‖ ≤ 0 }

are called the tangent cone and the normal cone, respectively, to C at some element x̄ ∈ C. Indeed, the sets T_C(x̄) and N_C(x̄) are closed cones at every x̄ in C. In addition, the normal cone is convex and polar to the tangent cone, i.e., N_C(x̄) = { v ∈ R^m : 〈v, w〉 ≤ 0 for all w ∈ T_C(x̄) }. The set C is said to be regular at x̄ ∈ C if for every v ∈ N_C(x̄) there exist sequences (x_n) in C and (v_n) with v_n ∈ N_C(x_n) that converge to x̄ and v, respectively. If C is regular at x̄, the cones N_C(x̄) and T_C(x̄) are polar to each other and the tangent cone is convex, too. If C is convex, it is regular at any x̄ ∈ C and the normal cone has the form N_C(x̄) = { v ∈ R^m : 〈v, x − x̄〉 ≤ 0 for all x ∈ C }. Now, a necessary condition for x̄ ∈ C to be a local minimizer of (P1) reads

0 ∈ ∇f0(x̄) + N_C(x̄). (1)


If x̄ is an interior point of C, then N_C(x̄) = {0} and, hence, condition (1) reduces to the classical Fermat rule ∇f0(x̄) = 0. If C is convex, condition (1) is equivalent to the well-known variational inequality

〈∇f0(x̄), x − x̄〉 ≥ 0 for all x ∈ C.

The latter condition is sufficient for x̄ to be globally optimal if, in addition, f0 is convex. In general, optimality conditions represent generalized equations.
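For convex C the variational inequality can be checked numerically through an equivalent fixed-point form: x̄ solves it if and only if x̄ = proj_C(x̄ − t∇f0(x̄)) for any fixed t > 0 (a standard reformulation, not stated in the text). The sketch below, with an illustrative quadratic objective and a box as C (our own choices), finds x̄ by projected gradient iteration and verifies the fixed-point property:

```python
def proj_box(x, lo=0.0, hi=1.0):
    # Euclidean projection onto the convex box C = [lo, hi]^m
    return tuple(min(max(xi, lo), hi) for xi in x)

def grad_f(x, c=(2.0, 0.5)):
    # Gradient of the illustrative objective f0(x) = |x - c|^2
    return tuple(2.0 * (xi - ci) for xi, ci in zip(x, c))

def projected_gradient(x, t=0.25, iters=200):
    # Iterate x <- proj_C(x - t * grad f0(x)); fixed points satisfy (1)
    for _ in range(iters):
        g = grad_f(x)
        x = proj_box(tuple(xi - t * gi for xi, gi in zip(x, g)))
    return x

xbar = projected_gradient((0.0, 0.0))
g = grad_f(xbar)
fixed = proj_box(tuple(xi - 0.25 * gi for xi, gi in zip(xbar, g)))
print(xbar)  # ≈ (1.0, 0.5): the point of C closest to c
```

At the solution the negative gradient points outward through the active face, i.e., −∇f0(x̄) lies in the normal cone N_C(x̄), which is precisely condition (1).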

If C is a set with constraint structure, it is possible to obtain explicit forms for its normal cones at points where C is regular. This leads to Lagrange multiplier formulations of optimality conditions. Let C be of the form

C = { x ∈ D : F(x) ∈ K },

where the sets D ⊆ R^m and K ⊆ R^d are closed and F = (f1, . . . , fd) : R^m → R^d is continuously differentiable. Then one has

N_C(x̄) ⊇ { ∑_{i=1}^d y_i ∇f_i(x̄) + z : y = (y1, . . . , yd) ∈ N_K(F(x̄)), z ∈ N_D(x̄) } (2)

at any x̄ ∈ C. The set C is regular at x̄ and equality holds in (2) if the sets D and K are regular at x̄ and F(x̄), respectively, and the constraint qualification

[ y ∈ N_K(F(x̄)), −∑_{i=1}^d y_i ∇f_i(x̄) ∈ N_D(x̄) ] ⟹ y = 0 (3)

or, equivalently,

T_K(F(x̄)) + ∇F(x̄) T_D(x̄) = R^d (4)

is satisfied [37, 6.14 and 6.39]. In many applications the sets D and K are convex and, hence, regular at any of their elements. D is often polyhedral, containing simple constraints (e.g., bounds), and K is a polyhedral cone given by finitely many inequalities and equations. In such cases the tangent and normal cones are polyhedral and allow explicit representations.

If the above regularity conditions are satisfied at a local minimizer x̄ of (P1), the optimality condition (1) implies the existence of y ∈ N_K(F(x̄)) such that

−( ∇f0(x̄) + ∑_{i=1}^d y_i ∇f_i(x̄) ) ∈ N_D(x̄). (5)

If K is a convex cone, the condition y ∈ N_K(F(x̄)) is equivalent to F(x̄) ∈ K, y ∈ K⁻, and 〈y, F(x̄)〉 = 0, with K⁻ denoting the polar cone to K. The latter result is known as the Karush–Kuhn–Tucker theorem [21], [24] or as the Lagrange multiplier rule. The crucial conditions (3) and (4), respectively, are usually referred to as (generalized) Mangasarian–Fromovitz constraint qualifications.
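As a concrete finite-dimensional illustration of the Karush–Kuhn–Tucker conditions (a toy example of our own, not from the text), take f0(x) = (x1 − 2)² + (x2 − 1)² with the two inequality constraints f1(x) = x1 + x2 − 1 ≤ 0 and f2(x) = −x2 ≤ 0, i.e. K = (−∞, 0]² and D = R². The minimizer is x̄ = (1, 0) with multipliers y = (2, 0), which the sketch checks numerically:

```python
def grad_f0(x):
    # Gradient of f0(x) = (x1 - 2)^2 + (x2 - 1)^2
    return (2.0 * (x[0] - 2.0), 2.0 * (x[1] - 1.0))

def F(x):
    # Constraint mapping with K = (-inf, 0]^2:
    #   f1(x) = x1 + x2 - 1 <= 0,   f2(x) = -x2 <= 0
    return (x[0] + x[1] - 1.0, -x[1])

grad_F = ((1.0, 1.0), (0.0, -1.0))  # constant constraint gradients

xbar = (1.0, 0.0)   # candidate minimizer
y = (2.0, 0.0)      # candidate multipliers

# Stationarity, i.e. (5) with D = R^m, N_D = {0}:
#   grad f0(xbar) + sum_i y_i grad f_i(xbar) = 0
lag = tuple(grad_f0(xbar)[j] + sum(y[i] * grad_F[i][j] for i in range(2))
            for j in range(2))
print(lag)       # stationarity residual, exactly zero here
print(F(xbar))   # both constraint values are zero: both constraints active
print([yi * fi for yi, fi in zip(y, F(xbar))])  # complementarity <y, F(xbar)> = 0
```

The multipliers satisfy y ∈ K⁻ = [0, ∞)², F(x̄) ∈ K, and 〈y, F(x̄)〉 = 0, exactly as the theorem requires.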



figure 2. The Brachistochrone problem.

Many real-world or scientific extremal problems cannot be posed in finite-dimensional spaces. Often, functions x = x(t) of a certain variable t are unknown rather than vectors x. One of the oldest examples of this type is the famous Brachistochrone problem posed by Johann Bernoulli in 1696 (see Figure 2). It considers a bead sliding frictionlessly under gravity along a smooth curve joining two points A and B and asks what shape the wire should have so that the bead, when released from rest at A, slides to B in minimum time. In this problem, now viewed as the starting point of the calculus of variations, an optimal function has to be found; see [1] or [16]. Although this first mathematical problem in function spaces was posed more than 300 years ago, nonlinear optimization in function spaces – or, more generally, in Banach spaces – came up only after the beginning of nonlinear optimization. This theory investigates problems of the type

Minimize f(x) subject to x ∈ C, F(x) ∈ K (P2)

where f : X → R and F : X → Y are Fréchet differentiable mappings, X, Y are Banach spaces, and C and K are closed convex subsets of X and Y, respectively. Today, the abstract theory of necessary optimality conditions for optimization problems of the type (P2) is well developed. In particular, the Lagrange multiplier rule

−∇_x L(x̄, λ) ∈ N_C(x̄), λ ∈ N_K(F(x̄))

holds at a regular local minimizer x̄ of (P2), analogous to (5) for (P1) [26], [8]. Here, L(x, λ) := f(x) + 〈λ, F(x)〉 for (x, λ) ∈ X × Y*, with Y* denoting the dual of Y, is the Lagrangian of (P2).

This general class of problems covers many important mathematical optimization problems. In particular, the fields of optimal control and stochastic optimization belong to this class. Nevertheless, people working in optimal control theory have experienced that a direct application of the abstract optimality conditions often does not provide satisfactory results. The reason is that the dual spaces of Lagrange multipliers are not useful in many applications. Therefore, each class of optimization problems needs a special theoretical treatment, guided by the abstract theory.



figure 3. An oscillating pendulum.

2 NONLINEAR OPTIMIZATION

2.1 Optimal control

In optimal control, the unknown element x is a vector of two functions y and u, x = (y, u). The function u denotes the control, which must be chosen optimally. For instance, think of the engines of a spacecraft that must be fired such that the vehicle moves optimally. The function y stands for the state of the physical system that is influenced by u. In optimal control, except in discrete control systems, y is obtained as the solution of a system of differential or integral equations.

Let us consider a simple academic example to explain the situation: an oscillating pendulum (see Figure 3) should be stopped by a controllable force in minimum time; see [27]. In mathematical terms, the problem reads

min T (P3)

subject to

y″(t) = −c1 sin(y(t)) + c2 u(t),

y(0) = y0, y(T) = 0,

y′(0) = y1, y′(T) = 0,

|u(t)| ≤ umax.

Here, y is the deflection angle of the pendulum, t ∈ [0, T] denotes the time, c1, c2 are certain constants, and y0, y1 are the initial values of the angle and the angular velocity. This is a characteristic example of optimal control of ordinary differential equations, and it is formally clear how this problem is related to the optimization problem in Banach spaces defined above: the function f corresponds to T, the mapping F with the cone K = {0} covers the differential equation together with the initial and terminal conditions, while C is formed by the functions u with maximum absolute value umax.
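The dynamics in (P3) are easy to explore numerically. The sketch below (plain Python; the constants, step size, and the simple feedback law u = −umax · sign(y′) are our own illustrative choices — this control is admissible but not the time-optimal one) integrates the pendulum with a symplectic Euler step and confirms that the control removes energy from the swing:

```python
import math

C1, C2, U_MAX = 1.0, 1.0, 0.5   # illustrative constants for (P3)

def energy(y, v):
    # Pendulum energy E = v^2/2 + c1 (1 - cos y); along trajectories of (P3),
    # dE/dt = c2 * u * y', so u = -umax * sign(y') dissipates energy
    return 0.5 * v * v + C1 * (1.0 - math.cos(y))

def simulate(y0, v0, dt=1e-3, steps=10_000):
    y, v = y0, v0
    for _ in range(steps):
        # admissible bang-bang feedback with |u| <= umax (not time-optimal)
        u = -U_MAX if v > 0 else (U_MAX if v < 0 else 0.0)
        # symplectic Euler: update the velocity first, then the angle
        v = v + dt * (-C1 * math.sin(y) + C2 * u)
        y = y + dt * v
    return y, v

y0, v0 = 1.0, 0.0
yT, vT = simulate(y0, v0)
print(energy(y0, v0), energy(yT, vT))  # initial vs. final: the control damps the swing
```

The genuinely time-optimal control is also of bang-bang type, but its switching times come from the Pontryagin maximum principle discussed next rather than from a simple feedback rule.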


Today, it is no challenge to solve this simple problem numerically. A linearized version, where sin(y) is approximated by y, can even be solved analytically. The associated theoretical basis is the Pontryagin maximum principle, a fundamental result belonging to the greatest achievements of applied mathematics in the 20th century [30].

In some sense it is a generalization of the Lagrange multiplier rule for optimization problems in Banach spaces and provides an optimality system containing the equations of (P3) for y, an adjoint differential equation for an adjoint state w, and the so-called maximum condition for u, all being mutually coupled. This optimality system is the basis of direct methods for solving optimal control problems numerically. The optimal control of ordinary differential equations has many applications, for instance in space flight, aviation, robotics, chemical processes, or the generation of electrical power.

There were so many phenomena that could not be appropriately modelled by ordinary differential equations that the investigation of optimal control problems for partial differential equations soon came into play. The associated theory is more difficult, and the numerical solution of such problems is still a challenge for practitioners and mathematicians. One of the most important monographs on this subject is the well-known book [25]. It did not take long before interesting applications were discussed. We refer, for instance, to [10], where several industrial problems were considered.

Let us explain the situation by an academic problem again. Consider a three-dimensional bounded domain Ω ⊂ R³ standing for a body of metal that should be heated in an optimal way. We assume that the energy is produced by induction heating, so that the controlled heat source u appears in the body Ω. The temperature at a point x ∈ Ω is denoted by y(x), while yd = yd(x) stands for a desired steady-state temperature in Ω to be generated. This problem is modelled by the following optimal control problem for the Poisson equation:

min ∫_Ω (y(x) − yd(x))² dx (P4)

subject to

−Δy(x) = u(x) in Ω,

y(x) = yΓ(x) on Γ,

and

0 ≤ u(x) ≤ umax.

Here, yΓ is the temperature at the boundary Γ of Ω, which is assumed to be known. Although this problem is very simple and rather academic, from a mathematical point of view it covers the main aspects of the problems posed in some projects of the research center. Among other states, in these projects the temperature also plays the most significant role, and it is obtained as the solution of a heat equation. The control appears as a source term on the right-hand side of the heat equation. Moreover, inequalities are given as additional constraints.
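A one-dimensional finite-difference analogue of (P4) already shows the typical solution pattern. The sketch below (our own reduction: Ω = (0, 1), homogeneous boundary data, a coarse grid, a constant desired temperature, and a projected-gradient loop with the standard adjoint-based gradient — none of these choices come from the text) drives the state toward yd while keeping the control inside its bounds:

```python
def thomas(a, b, c, d):
    # Solve a tridiagonal system (sub-, main-, super-diagonal, right-hand side)
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

N = 49                       # interior grid points on (0, 1)
H = 1.0 / (N + 1)
SUB = [-1.0 / H**2] * N      # discretization of -y'' (Dirichlet boundary)
DIAG = [2.0 / H**2] * N
SUP = [-1.0 / H**2] * N

def solve_state(u):
    # State equation -y'' = u, y(0) = y(1) = 0
    return thomas(SUB, DIAG, SUP, u)

yd = [1.0] * N               # desired temperature (illustrative)
U_MAX = 40.0

def J(u):
    y = solve_state(u)
    return H * sum((yi - di)**2 for yi, di in zip(y, yd))

u = [0.0] * N
for _ in range(300):         # projected gradient iterations
    y = solve_state(u)
    # adjoint equation -p'' = 2(y - yd); then dJ/du_i = H * p_i
    p = thomas(SUB, DIAG, SUP, [2.0 * (yi - di) for yi, di in zip(y, yd)])
    u = [min(max(ui - 500.0 * H * pi, 0.0), U_MAX) for ui, pi in zip(u, p)]
print(J([0.0] * N), J(u))    # the tracking objective drops markedly
```

The control bound becomes active near the boundary, where an unbounded control would have to work hardest to pull y(x) up to yd; this is the same active-set phenomenon that makes the constrained problems in the research center hard.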


Real-world equations are more difficult to handle than the simple Poisson equation in (P4). In the projects of the DFG center they form nonlinear systems for different state functions; hence the theory of existence and uniqueness is more difficult. Also, their numerical solution is a real challenge. Consequently, associated optimal control problems are not easy to handle. Their discretization leads to very large-scale optimization problems which require special algorithms to be solved. Another specific difficulty arises from the presence of so-called pointwise state constraints, i.e., inequality constraints imposed on the state function y.

2.2 Stochastic optimization

Stochastic optimization is concerned with models that require an optimal decision on the basis of given probabilistic information on random data. We refer to [31], [7] for introductory textbooks and to [45] for a recent state-of-the-art volume. To give an idea of such models, let {ξt}_{t=1}^T be a discrete-time stochastic data process on some probability space (Ω, F, P) with ξt : Ω → R^{st}. The stochastic decision xt : Ω → R^{mt} at period t is assumed to be nonanticipative, i.e., to depend only on ξ^t := (ξ1, . . . , ξt). With Ft ⊆ F denoting the σ-algebra generated by ξ^t, we have Ft ⊆ F_{t+1} for t = 1, . . . , T − 1. Furthermore, we assume that F1 = {∅, Ω}, i.e., that the data ξ1 at t = 1 is deterministic, and that FT = F. We assume that ξt ∈ L1(Ω, F, P; R^{st}) and xt ∈ L∞(Ω, F, P; R^{mt}) for each t = 1, . . . , T. Then the nonanticipativity condition may be expressed by the subspace

N_na = { x ∈ ×_{t=1}^T L∞(Ω, F, P; R^{mt}) : xt = E[xt | Ft], t = 1, . . . , T }

using the conditional expectation E[· | Ft] with respect to the σ-algebra Ft. We consider the stochastic optimization model

min E[ ∑_{t=1}^T ct(ξt, xt) ]   s.t.   x ∈ N_na, xt ∈ Xt(ξt), gt(ξt, xt, x_{t−1}) ≤ 0, (P5)

where the sets Xt(ξt) are nonempty and compact, ct and gt are affine linear functions in each variable, and E denotes expectation w.r.t. P.

(P5) represents an infinite-dimensional optimization problem. It becomes finite-dimensional if Ω is finite, i.e., Ω = {ω_s}_{s=1}^S and F is the power set of Ω. Let ξ_t^s := ξ_t(ω_s) be the data scenario s at time t, x_t^s the decision scenario s at t, and p_s := P(ω_s) the probability of scenario s. Then the nonanticipativity condition x ∈ N_na can be expressed by a finite number of linear equality constraints for the decision variables x_t^s (e.g., using finite partitions E_t of Ω that generate F_t). Then the stochastic program (P5) takes the scenario form

min ∑_{s=1}^S ∑_{t=1}^T p_s c_t(ξ_t^s, x_t^s)   s.t.   x ∈ N_na, x_t^s ∈ X_t(ξ_t^s), g_t(ξ_t^s, x_t^s, x_{t−1}^s) ≤ 0.

Such optimization models contain ST vectorial decisions and about 3ST constraints and, hence, are large scale. The number S is typically large, as it comes from an approximation or sampling procedure; thus, the model is extremely large in most cases and requires specific solution approaches.

As F_t is contained in F_{t+1}, each element of E_t can be represented as the union of certain elements of E_{t+1}. Representing the elements of E_t by nodes and the above set relations by arcs leads to a tree that is called a scenario tree (see Figure 4). It is based on a finite set N ⊂ ℕ of nodes, where n = 1 stands for the root node at period t = 1. We denote by n− the unique predecessor of node n, by path(n) := {1, . . . , n−, n} the path from the root node to node n with length t(n) := |path(n)|, by N_t := {n : t(n) = t} the nodes at time t, and by N+(n) the set of successors of node n. By {π_n}_{n∈N_T} := {p_s}_{s=1}^S and π_n := ∑_{n+ ∈ N+(n)} π_{n+}, n ∈ N, we assign probabilities to each node.

figure 4. Scenario tree with t1 = 2, T = 5, |N| = 23 and 11 leaves.

Let {ξ_n}_{n∈N_t} and {x_n}_{n∈N_t} denote the realizations of ξt and xt, respectively. Then the scenario tree form of the stochastic program (P5) reads

min ∑_{n∈N} π_n c_{t(n)}(ξ_n, x_n)   s.t.   x_n ∈ X_{t(n)}(ξ_n), g_{t(n)}(ξ_n, x_n, x_{n−}) ≤ 0.

The dimensions of the scenario tree form are considerably smaller than in the previous formulation, as the number |N| of nodes is much smaller than TS and the nonanticipativity constraints are incorporated into the tree structure. The models exhibit special structures, but are still of enormous size.
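The node-based bookkeeping above is easy to realize in code. The following sketch (our own small example tree, not the one in Figure 4, with illustrative node data and linear costs c_t(ξ, x) = ξ · x) stores predecessors and leaf probabilities, propagates π_n backwards, and evaluates the tree-form objective:

```python
pred = {2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3}   # n -> predecessor n-
leaf_prob = {4: 0.2, 5: 0.3, 6: 0.1, 7: 0.4}  # pi_n on the leaves N_T

def path(n):
    # path(n) = {1, ..., n-, n}; the stage of n is t(n) = |path(n)|
    return [n] if n == 1 else path(pred[n]) + [n]

# propagate probabilities: pi_n = sum of pi over the successors N+(n)
pi = dict(leaf_prob)
for n in sorted(pred, reverse=True):      # children carry larger indices here
    pi[pred[n]] = pi.get(pred[n], 0.0) + pi[n]

# illustrative node data xi_n and a fixed feasible decision x_n = 1
xi = {1: 1.0, 2: 1.2, 3: 0.8, 4: 1.5, 5: 1.1, 6: 0.9, 7: 0.6}
x = {n: 1.0 for n in xi}

# tree-form objective: sum over nodes of pi_n * c_{t(n)}(xi_n, x_n)
objective = sum(pi[n] * xi[n] * x[n] for n in xi)
print(pi, objective)
```

Note that nonanticipativity needs no extra constraints here: because a node carries one decision shared by all scenarios passing through it, the tree structure enforces it automatically, exactly as stated in the text.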

The standard algorithmic approach in stochastic programming is, consequently, decomposition [6], [43]. While primal decomposition approaches need convexity to become efficient for reasonably large models, dualization techniques seem to work well in large-scale nonconvex situations, too. Dual decomposition approaches are built around duality results for stochastic programs (see [36] for classical work in this direction). They are based on Lagrangian relaxation of certain groups of constraints and on solving the corresponding Lagrangian dual by subgradient-type methods. For instance, relaxing the nonanticipativity constraints leads to scenario decomposition, and relaxing component coupling constraints to component or geographic decomposition (see [41] for details). Hence, the specific decomposition strategy depends on the model structure and, in the nonconvex case, on the size of the relevant duality gaps [11]. By means of the recursive dynamic programming construction

f_T(y1, . . . , yT, ξ(ω)) := ∑_{t=1}^T ct(ξt(ω), yt),

φ_t(y1, . . . , yt, ω) := E^r[ f_t(y1, . . . , yt, ξ(·)) | Ft ](ω),

f_{t−1}(y1, . . . , y_{t−1}, ξ(ω)) := inf_y φ_t(y1, . . . , y_{t−1}, y, ω),

for t = T, . . . , 2 and for each ω ∈ Ω and feasible yτ, τ = 1, . . . , T, the original stochastic program can be shown to be equivalent to the nonlinear program

min E[f1(x1, ξ)]   s.t.   x1 ∈ X1(ξ1) (P6)

for determining the (deterministic) decision x1 ∈ R^{m1} at t = 1. Here, E^r[· | Ft] denotes the regular conditional expectation, which represents a version of E[· | Ft] satisfying special measurability properties. This procedure works under quite weak measurability and boundedness assumptions [14].
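On a finite scenario tree the recursion above reduces to a backward sweep: minimize over the current decision, then take the conditional expectation over the successors. A minimal two-stage sketch (all data and the finite decision set are our own illustrative choices; the costs are c_t(ξ, y) = (ξ − y)², and the dynamic constraint g plays the role of |y2 − y1| ≤ 1):

```python
XI1 = 1.3                                      # deterministic first-stage data
SCEN = [(0.5, 0.2), (1.8, 0.5), (3.4, 0.3)]    # stage-2 scenarios (xi_2^s, p_s)
Y = [0, 1, 2, 3]                               # finite feasible decision set

def cost(xi, y):
    return (xi - y) ** 2

def q(xi2, y1):
    # backward step (t = 2): minimize over stage-2 decisions that are
    # feasible w.r.t. the dynamic constraint |y2 - y1| <= 1
    return min(cost(xi2, y2) for y2 in Y if abs(y2 - y1) <= 1)

def f1(y1):
    # stage 1: current cost plus the expectation of the stage-2 value
    return cost(XI1, y1) + sum(p * q(xi2, y1) for xi2, p in SCEN)

best_y1 = min(Y, key=f1)
print(best_y1, f1(best_y1))  # the high scenario pushes the first-stage decision up to 2
```

Although cost(XI1, 1) alone would favour y1 = 1, the recursion picks y1 = 2: the stage-1 decision hedges against the expensive scenario ξ2 = 3.4, which is exactly the kind of here-and-now effect (P6) captures.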

The nonlinear program (P6) serves as a basis for stability studies of the original stochastic program (P5). The latter is called stable at the underlying probability distribution P of ξ if its optimal value and solution set behave continuously under small perturbations of P in some space of measures equipped with some probability metric. The corresponding analysis is based on general perturbation results for optimization problems [5], [8], [37]. We refer to [40] for bibliographical notes on the stability of stochastic programs and to [32] for a systematic theory of probability metrics. Stability properties justify the approximation or estimation of P by simpler (e.g., finitely discrete) measures. In [33], [40] certain ideal probability metrics for stability are associated with specific classes of stochastic programs. They may be used directly or as a guide for constructing scenario tree approximations to P.

3 OPTIMAL CONTROL OF PDES AND STOCHASTIC OPTIMIZATION

3.1 Production of silicon carbide bulk single crystals

The control of PDEs plays a role in a number of projects of the DFG center. In particular, this concerns the application of optimization methods in regional hyperthermia, methods of model reduction for various kinds of PDEs, and optimization in crystal growth. To give the reader an impression, we concentrate here on the latter subject.

Silicon carbide (SiC) bulk single crystals have important applications in key technologies (MESFETs, thyristors, LEDs, lasers, sensors, etc.), especially in high-power, high-frequency, high-temperature, or intensive-radiation environments. The main procedure for their production, the Physical Vapor Transport method, employs an induction-heated graphite crucible containing polycrystalline SiC source powder and an SiC single-crystal seed cooled by means of a blind hole. The system is kept in a low-pressure inert gas atmosphere and is heated up to temperatures between 2000 and 3000 K by an induction coil located around the crucible (see Figure 5).

figure 5. Induction-heated graphite crucible.

High temperature and low pressure let the SiC powder sublimate, adding species such as Si, Si2C, and SiC2 to the gas phase. Crystallization occurs at the cooled seed, which thereby grows into the reaction chamber. Quality and growth rate of the crystal strongly depend on the evolution of the temperature distribution, mass transport, and species concentrations. The research group of J. Sprekels (WIAS) has a long-standing cooperation in the modelling and simulation of these processes with the Institute of Crystal Growth (IKZ) in Adlershof.

It is a challenging technological aim to optimize quality and growth rate of the crystals by adjusting the control parameters. This issue will be considered in research groups headed by O. Klein, J. Sprekels (WIAS) and A. Rösch, F. Tröltzsch (TU).

We explain the situation for a simplified model where the optimization aims at generating a radially constant temperature profile in the growth chamber Ωg, i.e., ∂θ/∂r = 0 should hold in Ωg. This gives rise to the problem

min ∫_Ωg ∫_0^T H(θ(x, t) − θ_{min,crystal}) (∂θ/∂r (x, t))² dx dt (P7)


(H: Heaviside function) subject to

ρ c(θ) ∂θ/∂t = div(k(θ) grad θ) + u   in Ω × [0, T],

k(θ) ∂θ/∂n = σ (θa⁴ − θ⁴)   on Γo × [0, T],

((k(θ)∇θ)_gas − (k(θ)∇θ)_solid) · n = R − J   on Γi × [0, T],

θ(x, 0) = θ0   in Ω

and

θ(x, t) ≤ θ(x′, t)   ∀x ∈ Γ1, ∀x′ ∈ Γ2, ∀t ∈ [0, T],

θmin ≤ θ(x, t) ≤ θmax   ∀x ∈ Ω, ∀t ∈ [0, T],

θ_{min,crystal} ≤ θ(x, T) ≤ θ_{max,crystal}   ∀x ∈ Γ2,

δ + θ(x, T) ≤ θ(x′, T)   ∀x ∈ Ωsource, ∀x′ ∈ Γ2,

0 ≤ P(t) ≤ Pmax   ∀t ∈ [0, T].

In this setting, the following quantities are used: the domain Ω (covering crucible and growth chamber Ωg), the domain Ωsource of the SiC source, the boundary Γi = Γ1 + Γ2 + Γ3 of Ωg including the surface Γ1 of the SiC source and the surface Γ2 of the SiC seed crystal (see Figure 5), the outer boundary Γo of the crucible, the outer temperature θa, the initial temperature θ0, and the average power P(t) of the induction coil. The number δ is a temperature difference to be established between SiC source and seed, and [θ_{min,crystal}, θ_{max,crystal}] is the temperature range leading to the growth of the desired SiC polytype. Moreover, the so-called radiosity R and the irradiation J occur. Both quantities depend on integrals containing θ and other quantities. Therefore, the heat equation is given with a non-local boundary condition.

In a more realistic setting, the boundary conditions in the blind hole differ from those posed on the other parts of the crucible. Moreover, the heat equation is oversimplified. More precisely, it takes different forms in Ωg and Ω \ Ωg.

The function u = u(x, t; P) stands for the heat source generated by induction heating. It depends on a scalar magnetic potential Φ that is obtained from Maxwell's equations, where the average power P(t) plays the role of the actual control. We do not specify all these equations and associated boundary conditions here. We only state that a mapping P ↦ Φ ↦ u is provided by the solution of Maxwell's equations. Under more realistic assumptions, the heat equation and Maxwell's equations are mutually coupled. In the simplified case of decoupling, all functions u form a set of admissible auxiliary controls which, in some sense, can be precomputed.

Compared with the simplified problem (P4), we see some similarities, but also additional difficulties. The state equations are quasilinear, and now we have a coupled system of PDEs consisting of the instationary heat equation for θ and Maxwell's equations for induction heating. Moreover, certain diffusion equations modelling the transport of chemical substrates and equations for chemical reactions have to be considered additionally. Non-local boundary conditions are given. Altogether, the physical behaviour of the process is modelled by a highly nonlinear system of equations.


Their analysis is a challenge in itself. The numerical analysis, including all aspects of optimization, is even more complicated, since pointwise state constraints are also given.

It is clear that the analysis of this problem must begin with simplified models. The same holds for the numerical optimization. Having solved the simplified problems, more difficulties can be included to finally approach the problem in its full generality.

The investigations do not have to start from scratch. We refer to [9], [22], [23], [29] (modelling and simulation of crystal growth), [35] (Pontryagin principle for the control of semilinear parabolic equations with state constraints), [34] (associated second-order sufficient optimality conditions), and [2], [17], [44], [42] (numerical analysis and application of the SQP method).

3.2 Electricity portfolio management under risk

Traditional models in stochastic programming and in stochastic power management are based on minimizing costs or maximizing expected revenues. Typically, such models do not reflect the risk of decisions. In power utilities, portfolio and risk management are often considered separate tasks. Recently, it was proposed to unify earlier approaches to risk modelling, to identify reasonable properties of risk (e.g., coherence, convexity), to develop a theory of risk measures [3], [15], and to incorporate risk functionals into stochastic programming models [38]. Incorporating risk into the objective or constraints of stochastic programs may change their structural and stability properties and the algorithmic approaches [39], [40]. Accordingly, the first challenge consists in identifying favourable properties of risk functionals for the stability and computation of mean-risk stochastic programming models. Here, mean-risk stands for models containing expectation and risk terms in the objective or constraints.
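One widely used coherent risk functional in the sense of [3] is the Conditional Value-at-Risk (CVaR): the expected cost in the worst (1 − α) probability tail of the distribution. For a finitely discrete distribution it can be computed directly, and adding it to the expectation gives a mean-risk objective. (The distribution, the level α, and the risk weight below are our own illustrative choices, not data from the text.)

```python
def cvar(costs, probs, alpha=0.9):
    # CVaR at level alpha of a discrete cost distribution:
    # expected cost conditional on the worst (1 - alpha) probability tail
    tail = 1.0 - alpha
    order = sorted(range(len(costs)), key=lambda i: costs[i], reverse=True)
    acc, remaining = 0.0, tail
    for i in order:
        take = min(probs[i], remaining)  # take probability mass, worst cost first
        acc += take * costs[i]
        remaining -= take
        if remaining <= 1e-15:
            break
    return acc / tail

costs = [10.0, 20.0, 50.0, 100.0]   # scenario costs
probs = [0.4, 0.4, 0.15, 0.05]      # scenario probabilities
mean = sum(p * c for p, c in zip(probs, costs))
rho = 0.5                           # risk weight in the mean-risk objective
mean_risk = mean + rho * cvar(costs, probs, alpha=0.9)
print(mean, cvar(costs, probs), mean_risk)
```

A pure expectation model would be indifferent to the rare 100-cost scenario beyond its 5% weight; the mean-risk objective penalizes it heavily, which is the point of adding the risk term.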

Let us consider a German power utility that owns a hydro-thermal generation system and acts in the liberalized electricity market. The utility aims at decisions on producing and trading electricity such that its profit is maximized and all operational constraints are met. Corresponding (deterministic) optimization models were regularly solved for short- and mid-term time horizons in the past. During the last years the utilities were confronted with new challenges. The former mostly bilateral power contracts are now supplemented by a variety of electricity contracts at power exchanges. Due to the increasing role of competition and trading, the stochasticity of data (e.g., electricity prices and electrical load) becomes more and more important.

Let I be the set of power generation units and contracts (also called units in the following), and let T be the number of time periods in the time horizon (day, week or year). Let ξ = (ξt)t=1,...,T be the multivariate stochastic data process, pit the stochastic decision (vector) for unit i at time t, and Si(ξ) the set of constraints for the decisions of the single unit i. The vector pit may contain a 0-1 component to model an on/off-decision (e.g., in the case of thermal generation units). The sets Si(ξ) may depend on the data process, e.g., in the case of a hydraulic unit with stochastic inflow. They typically contain bounds for each period and dynamic constraints (e.g., for the operation of reservoirs of hydraulic units and minimum up/down conditions for thermal units). In addition, there is a set C(ξ) describing equilibrium, reserve and group constraints that couple the output of different power units.

figure 6. Thermal power plant.

With ri(ξt, pit) denoting the (stochastic) revenue of the decision pit at unit i and time t, the electricity portfolio management model is of the form

    max E[ ∑i∈I ∑t=1,...,T ri(ξt, pit) ]   s.t.  pi ∈ Si(ξ) (i ∈ I),  p ∈ C(ξ),        (P8)

when the expected total revenue is to be maximized. As such an objective may lead to decisions with enormous risk, one might be led to replacing the expectation E in the objective by some risk measure F of the stochastic total revenue ∑i∈I ∑t=1,...,T ri(ξt, pit), or to introducing additional risk constraints.
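A popular concrete choice for such a risk functional F is the conditional value-at-risk of [38]. The following sketch (the function name and the equal-probability scenario setting are our illustration, not part of the project) evaluates the Rockafellar-Uryasev representation on an empirical loss distribution; for revenues, one applies it to losses, i.e., negative revenues.

```python
def cvar(losses, alpha):
    """Conditional value-at-risk of an empirical loss distribution,
    via the Rockafellar-Uryasev representation [38]:
        CVaR_alpha = min_eta  eta + E[(loss - eta)_+] / (1 - alpha),
    whose minimum is attained at an alpha-quantile (the value-at-risk).
    For n equiprobable scenarios with alpha * n integral, this equals
    the mean of the worst (1 - alpha) * n losses."""
    xs = sorted(losses)
    n = len(xs)
    var = xs[min(int(alpha * n), n - 1)]           # empirical alpha-quantile
    tail = sum(max(x - var, 0.0) for x in xs) / n  # expected excess loss
    return var + tail / (1.0 - alpha)
```

In a mean-risk version of (P8), the objective could then combine the expectation with the CVaR of the total revenue, or CVaR could enter as a constraint.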

If the functions ri(ξt, ·) are piecewise linear concave (which can be assumed in most practical cases), the portfolio management model (P8) represents a large scale linear mixed-integer stochastic program. It can be solved by a Lagrangian relaxation strategy for the constraints in C(ξ), which leads to a decomposition into single unit subproblems, and by a subsequent Lagrangian heuristic [18], [19]. If, however, risk terms enter the model, this decomposition technique has to be modified or may even fail.
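The decomposition idea can be illustrated on a deliberately tiny example (the two units, their data and the single load-balance constraint are invented for this sketch). Relaxing the coupling constraint with a multiplier lam makes the units independent, and lam is driven toward dual feasibility by a subgradient step:

```python
# Toy dual decomposition for a (P8)-like model: relax the coupling
# constraint  sum_i p_i = load  with a multiplier lam, so that each unit
# solves  max_{0 <= p_i <= cap_i} (revenue_i - lam) * p_i  on its own.
caps = {"hydro": 60.0, "thermal": 100.0}   # illustrative capacities
revenue = {"hydro": 5.0, "thermal": 3.0}   # illustrative per-unit revenues
load = 120.0

lam = 0.0
for _ in range(200):
    # single-unit subproblems: linear objective, hence bang-bang solutions
    p = {i: (caps[i] if revenue[i] - lam > 0 else 0.0) for i in caps}
    surplus = sum(p.values()) - load       # subgradient of the dual function
    lam += 0.01 * surplus                  # fixed-step subgradient update
```

With the constant step size, lam oscillates around the marginal revenue of 3, while the primal iterates jump between infeasible commitment patterns; this is exactly the duality gap that the subsequent Lagrangian heuristic of [18], [19] has to close. A diminishing step size would make lam converge.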

Furthermore, the scenario tree construction methods that have to be developed for such mean-risk models must be adapted to the structural properties of the optimization model and of the risk term(s). Recent stability studies in [33], [40] offer certain ideal probability metrics that may be used for such constructions. Extending the work in [19], this methodology will be used to construct multivariate load, spot price and inflow trees.
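The probability-metric viewpoint can be made concrete by the backward reduction heuristic underlying [13], [20]. The following is a much-simplified sketch (the function name, the l1 scenario distance and the greedy loop are our choices; the actual algorithms are more refined): delete, one at a time, the scenario that is cheapest to remove in the transport (Kantorovich) distance, and give its probability to the nearest survivor.

```python
def reduce_scenarios(scens, probs, n_keep):
    """Greedy backward scenario reduction: scenarios are tuples of floats,
    probs their probabilities. Returns the kept indices and their new
    probabilities after redistributing the deleted mass."""
    def dist(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))

    kept = list(range(len(scens)))
    probs = list(probs)
    while len(kept) > n_keep:
        # cost of deleting j: its mass times the distance to the
        # closest scenario that would survive
        def cost(j):
            return probs[j] * min(dist(scens[j], scens[i])
                                  for i in kept if i != j)
        j = min(kept, key=cost)
        kept.remove(j)
        nearest = min(kept, key=lambda i: dist(scens[i], scens[j]))
        probs[nearest] += probs[j]         # optimal redistribution rule
    return kept, {i: probs[i] for i in kept}
```

Applied stagewise to historical data, such reduction steps are one building block of the tree constructions mentioned above.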

The project on mean-risk models in electricity portfolio management in the research center will be directed by W. Romisch (HU) and R. Henrion (WIAS). All algorithms mentioned above will be implemented and tested on real-life data of power exchanges and of the hydro-thermal generation system of the cooperation partner E.ON Sales & Trading GmbH.

figure 7. Airline network.

3.3 OD optimization in airline revenue management

Revenue management aims at controlling the sale of inventory under uncertainty such that the (expected) profit is maximized [28], [46]. Since its foundation, airline revenue management has concentrated on optimizing booking control parameters on a single flight (leg) level. However, many characteristics are not leg-specific (e.g., the uncertain passenger demand), but depend on origin-destination itineraries. Hence, the performance of leg-based approaches is limited and optimization methods that work on the origin-destination (OD) level are needed. But the presently available OD optimization methods [46] have some shortcomings in common (e.g., unrealistic assumptions, ignoring the integer nature of the model, separation of related tasks).

A new stochastic programming approach for OD optimization that applies to general stochastic passenger demand processes, i.e., without restrictive assumptions on the underlying probability distribution, shall be developed.

Let us consider an airline network on a specific day consisting of sets I of origins, J of destinations and L of legs. For each leg l ∈ L we have Kl booking classes. Let fi,j,k denote the fare associated with a passenger in the OD market from origin i to destination j in booking class k. The airline wishes to select the booking limits (i.e., the maximum number of available seats) for all booking classes and legs such that the expected profit is maximized.


Hence, stochastic programming models for static OD optimization are of the form

    max E[ ∑i,j,k fi,j,k xi,j,k ]   s.t.  0 ≤ xi,j,k ≤ di,j,k ,  ∑k xi,j,k ≤ cli,j ,        (P9)

where xi,j,k and di,j,k are the booking limit and the demand, respectively, from i to j in booking class k. By cl we denote the capacity of leg l ∈ L, and li,j denotes the leg in L that is relevant for the OD market from i to j.
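In the special case of a single OD market with a dedicated leg capacity and one fixed demand scenario, the inner optimization of (P9) has an obvious exact solution, which the following sketch computes (the fare-order rule and all names are our illustration; the coupled network case requires an LP or a decomposition method):

```python
def booking_limits(fares, demands, capacity):
    """Booking limits for one OD market under one demand scenario:
    maximize    sum_k fares[k] * x[k]
    subject to  0 <= x[k] <= demands[k]  and  sum_k x[k] <= capacity.
    With a linear objective, filling classes in decreasing fare order
    is optimal (a greedy exchange argument)."""
    limits = {k: 0 for k in fares}
    left = capacity
    for k in sorted(fares, key=fares.get, reverse=True):
        limits[k] = min(demands[k], left)  # sell to the dearest class first
        left -= limits[k]
    return limits
```

Under stochastic demand, (P9) averages such allocations over scenarios, which couples the decisions and destroys this simple structure.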

Dynamic OD optimization models are of particular importance in the airline industry as they take into account that the customer's willingness to pay tends to increase if t gets closer to T and, thus, discount fares are made available for those who make reservations early. In such models the revenues, the demand processes and the booking limits also depend on the booking time t, starting at the initial time t = 0 and ending at t = T, i.e., the day of departure. As a result of dynamic models the airline obtains answers on the optimal booking limits at the initial time and on their change over time until t = T. Clearly, such models are large scale linear stochastic integer programs.

The mathematical challenges for solving OD optimization models consist in the design of solution algorithms and in the approximation of the multivariate stochastic booking demand process (di,j,k,t)k,t for each OD pair (i, j) ∈ I × J. Due to the huge size of the models, algorithms have to be based on suitable decomposition strategies. The choice of a decomposition method depends on the model structure, the subproblem complexity, and also on duality gap estimates [11], [41]. The construction of scenario tree approximations of the stochastic processes (di,j,k,t)k,t is based on historical data of the OD network at relevant days. Many approaches for constructing scenario trees use ideas from cluster analysis [12]. The approach described in [19] combines the clustering of scenarios and scenario reduction techniques [13], [20], where both components are based on a probability metric measuring the distance of multivariate probability distributions. This approach leads to (almost) optimal tree constructions, i.e., to best approximations with respect to the probability metric. An important requirement for such constructions is, in particular, that the procedure works without assuming independence of the demand for different booking classes and time periods, respectively.
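The clustering step can be caricatured by a plain k-means pass over equiprobable demand scenarios (deterministic seeding and squared Euclidean distance are our simplifications; real tree construction combines clustering with reduction techniques and works stagewise over the booking horizon):

```python
def cluster_scenarios(scens, k, iters=20):
    """Group equiprobable scenarios (tuples of floats) into k clusters;
    each cluster yields one tree node: its centroid as the node value,
    its relative size as the node probability."""
    centers = [list(s) for s in scens[:k]]       # deterministic seeding
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for s in scens:                          # assign to nearest center
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(s, centers[c])))
            groups[j].append(s)
        for c, g in enumerate(groups):           # recompute centroids
            if g:
                centers[c] = [sum(col) / len(g) for col in zip(*g)]
    probs = [len(g) / len(scens) for g in groups]
    return centers, probs
```

Importantly, the distance acts on whole multivariate scenarios, so no independence across booking classes or time periods is assumed.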

The project on OD optimization in airline revenue management will be directed by W. Romisch (HU). The whole solution method will be implemented and tested on real-life data of the company Lufthansa Systems Berlin. The stochastic programming model will also be compared with earlier approaches in airline revenue management.

BIBLIOGRAPHY

[1] N.I. Akhiezer. The Calculus of Variations, Ginn (Blaisdell), Boston, 1962.

[2] N. Arada, J.-P. Raymond, and F. Troltzsch. On an augmented Lagrangian SQP method for a class of optimal control problems in Banach spaces, Computational Optimization and Applications 22 (2002), 369–398.

[3] P. Artzner, F. Delbaen, J.-M. Eber and D. Heath. Coherent measures of risk, Mathematical Finance 9 (1999), 203–228.


[4] J.-P. Aubin and H. Frankowska. Set-Valued Analysis, Birkhauser, Boston 1990.

[5] B. Bank, J. Guddat, D. Klatte, B. Kummer and K. Tammer. Non-Linear Parametric Optimization, Akademie-Verlag, Berlin 1982.

[6] J.R. Birge. Stochastic programming computation and applications, INFORMS Journal on Computing 9 (1997), 111–133.

[7] J.R. Birge and F. Louveaux. Introduction to Stochastic Programming, Springer, New York 1997.

[8] J.F. Bonnans and A. Shapiro. Perturbation Analysis of Optimization Problems, Springer, New York 2000.

[9] N. Bubner, O. Klein, P. Philip, J. Sprekels and K. Wilmanski. A transient model for the sublimation growth of silicon carbide single crystals, Journal of Crystal Growth 205 (1999), 294–304.

[10] A.G. Butkovskii. Methods for the Control of Distributed Parameter Systems (in Russian),Nauka, Moscow, 1975.

[11] D. Dentcheva and W. Romisch. Duality gaps in nonconvex stochastic optimization, Preprint02-5, Inst. Math., HU Berlin, 2002 and submitted.

[12] J. Dupacova, G. Consigli and S.W. Wallace. Scenarios for multistage stochastic programs,Annals of Operations Research 100 (2000), 25–53.

[13] J. Dupacova, N. Growe-Kuska and W. Romisch. Scenario reduction in stochastic programming: An approach using probability metrics, Mathematical Programming (to appear).

[14] I.V. Evstigneev. Measurable selection and dynamic programming, Mathematics of Operations Research 1 (1976), 267–272.

[15] H. Follmer and A. Schied. Convex risk measures and trading constraints, Finance and Stochastics 6 (2002), 429–447.

[16] P. Funk. Variationsrechnung und ihre Anwendung in Physik und Technik, Springer, Berlin,1962.

[17] H. Goldberg and F. Troltzsch. On a SQP–multigrid technique for nonlinear parabolic boundary control problems, in: Optimal Control: Theory, Algorithms and Applications (W.W. Hager and P.M. Pardalos eds.), Kluwer, Dordrecht, 1997, 154–177.

[18] N. Growe-Kuska, K.C. Kiwiel, M.P. Nowak, W. Romisch and I. Wegner. Power management in a hydro-thermal system under uncertainty by Lagrangian relaxation, in: Decision Making under Uncertainty: Energy and Power (C. Greengard and A. Ruszczynski eds.), IMA Volumes in Mathematics and its Applications Vol. 128, Springer, New York, 2002, 39–70.

[19] N. Growe-Kuska and W. Romisch. Stochastic unit commitment in hydro-thermal power production planning, Preprint 02-3, Inst. Math., HU Berlin, 2002 and in: Applications of Stochastic Programming (Wallace, S.W.; Ziemba, W.T. eds.), MPS-SIAM Series in Optimization (to appear).

[20] H. Heitsch and W. Romisch. Scenario reduction algorithms in stochastic programming, Com-putational Optimization and Applications (to appear).

[21] W. Karush. Minima of functions of several variables with inequalities as side conditions, Master's thesis, University of Chicago, 1939.

[22] O. Klein and P. Philip. Transient numerical investigation of induction heating during sublimation growth of silicon carbide single crystals, J. of Crystal Growth (to appear).

[23] O. Klein, P. Philip, J. Sprekels, and K. Wilmanski. Radiation- and convection-driven transient heat transfer during sublimation growth of silicon carbide single crystals, J. of Crystal Growth 222 (2001), 832–851.

[24] H.W. Kuhn and A.W. Tucker. Nonlinear programming, in: Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability (Neyman, J. ed.), University of California Press, Berkeley, 1951, 481–492.

[25] J.L. Lions. Controle optimal de systemes gouvernes par des equations aux derivees partielles, Dunod, Gauthier-Villars, Paris, 1968.

[26] D.G. Luenberger. Optimization by Vector Space Methods, Wiley, New York, 1969.

[27] J. Macki and A. Strauss. Introduction to Optimal Control Theory, Springer-Verlag, Berlin, Heidelberg, New York, 1982.


[28] J. McGill and G. van Ryzin. Revenue management: Research overview and prospects, Transportation Science 33 (1999), 233–256.

[29] P. Philip. Transient numerical simulation of sublimation growth of SiC bulk single crystals: Modeling, finite volume method, results. Doctoral thesis, submitted to the Dept. of Math., HU Berlin, 2002.

[30] L.S. Pontrjagin, V.G. Boltyanskii, R.V. Gamkrelidze and E.F. Mischenko. The Mathematical Theory of Optimal Processes, Wiley, New York, 1962.

[31] A. Prekopa. Stochastic Programming, Kluwer, Dordrecht, 1995.

[32] S.T. Rachev. Probability Metrics and the Stability of Stochastic Models, Wiley, Chichester 1991.

[33] S.T. Rachev and W. Romisch. Quantitative stability in stochastic programming: The method of probability metrics, Mathematics of Operations Research (2002) (to appear).

[34] J.-P. Raymond and F. Troltzsch. Second order sufficient optimality conditions for nonlinear parabolic control problems with state constraints, Discrete and Continuous Dynamical Systems 6 (2000), 431–450.

[35] J.-P. Raymond and H. Zidani. Hamiltonian Pontryagin's principles for control problems governed by semilinear parabolic equations, Applied Mathematics and Optimization 39 (1999), 143–177.

[36] R.T. Rockafellar and R.J-B Wets. The optimal recourse problem in discrete time: L1-multipliers for inequality constraints, SIAM Journal on Control and Optimization 16 (1978), 16–36.

[37] R.T. Rockafellar and R.J-B Wets. Variational Analysis, Springer, Berlin 1998.

[38] R.T. Rockafellar and S. Uryasev. Conditional value-at-risk for general loss distributions, Journal of Banking & Finance 26 (2002), 1443–1471.

[39] W. Romisch. Optimierungsmethoden fur die Energiewirtschaft: Stand und Entwicklungstendenzen, in: Optimierung in der Energieversorgung, VDI Berichte 1627, VDI-Verlag, Dusseldorf, 2001, 23–36.

[40] W. Romisch. Stability of Stochastic Programming Problems, Preprint submitted to Elsevier Science, 2002.

[41] W. Romisch and R. Schultz. Multistage stochastic integer programs: An introduction, in: Online Optimization of Large Scale Systems (M. Grotschel, S.O. Krumke and J. Rambau eds.), Springer, Berlin, 2001, 579–598.

[42] A. Rosch. Analysis schneller und stabiler Verfahren zur optimalen Steuerung partieller Differentialgleichungen. Habilitation thesis, TU Berlin, 2001.

[43] A. Ruszczynski. Decomposition methods in stochastic programming, Mathematical Programming 79 (1997), 333–353.

[44] F. Troltzsch. On the Lagrange-Newton-SQP method for the optimal control of semilinear parabolic equations, SIAM Journal on Control and Optimization 38 (1999), 294–312.

[45] R.J.-B. Wets and W.T. Ziemba. Stochastic programming. State of the Art 1998, Annals of Operations Research 85 (1999).

[46] G. Yu (ed.). Operations Research in the Airline Industry, Kluwer, Boston, 1998.