

Symbolic–Numeric Techniques for Solving Nonlinear Systems

Thomas Beelitz, Andreas Frommer, Bruno Lang, and Paul Willems

Bergische Universität Wuppertal, Fachbereich Mathematik und Naturwissenschaften, Scientific Computing, D-42097 Wuppertal, Germany

We describe a multilevel technique combining symbolic and numeric methods to obtain guaranteed enclosures for all solutions of a nonlinear system within a given search box.

Before describing the techniques underlying our nonlinear solver [3] we note that we understand the term “symbolic” in a broad sense. Rather than having it denote only purely algebraic methods, such as the cylindrical algebraic decomposition [2], or approaches with an algebraic component [8], in our view “symbolic” comprises all techniques that focus on a discrete representation of the system. This covers term manipulation as well as graph-based solving techniques such as constraint propagation.

1 Interval arithmetic

First we recall some important properties of interval arithmetic and fix our notation. For a thorough introduction to interval arithmetic the reader is referred to, e.g., [1]. Real intervals are denoted as $[a] = [\underline{a}, \overline{a}] = \{a \in \mathbb{R} : \underline{a} \le a \le \overline{a}\}$ with $\underline{a}, \overline{a} \in \mathbb{R} \cup \{-\infty, +\infty\}$. $\mathbb{IR}$ is the set of all such intervals. $\mathbb{IR}^n$ is the set of all interval vectors (or boxes) $[a] = ([a_1], \ldots, [a_n]) = ([\underline{a}_1, \overline{a}_1], \ldots, [\underline{a}_n, \overline{a}_n]) \equiv [a_1] \times \cdots \times [a_n] \subset \mathbb{R}^n$. For intervals $[a]$, $[b]$ the arithmetic operations $\circ \in \{+, -, \times, /\}$ and the standard functions $\varphi \in \{\exp, \ln, \sqrt{\cdot}, \sin, \cos, \ldots\}$ are defined by

$$[a] \circ [b] := \{a \circ b : a \in [a],\ b \in [b]\} \ (\in \mathbb{IR}), \qquad \varphi([a]) := \{\varphi(a) : a \in [a]\} \ (\in \mathbb{IR}). \tag{1}$$

As a consequence of these definitions, interval arithmetic makes it possible to compute enclosures for the range of a function in a straightforward manner. Let the function $f = f(x_1, \ldots, x_n)$ be given by an expression composed of the variables, some constants, and operations $+, -, \times, /, \exp, \ln, \ldots$. If we replace each variable $x_i$ with an interval $[a_i]$ and proceed according to (1) then we obtain an interval $[f] = f([a_1], \ldots, [a_n])$, which contains the range of $f$ over the $n$-dimensional box $[a_1] \times \cdots \times [a_n] = [a]$, i.e., an enclosure for the range of $f$ over $[a]$. As an example, consider two expressions for the same function, $f(x_1, x_2) = x_1 x_2 - x_1 \equiv x_1(x_2 - 1) = g(x_1, x_2)$, over the box $[a] = [1, 2] \times [-1, 1]$. We have $f([1, 2] \times [-1, 1]) = [1, 2] \cdot [-1, 1] - [1, 2] = [-2, 2] - [1, 2] = [-4, 1]$ and $g([1, 2] \times [-1, 1]) = [1, 2] \cdot ([-1, 1] - 1) = [1, 2] \cdot [-2, 0] = [-4, 0]$. This example shows that different representations of a function in general lead to different enclosures. In the above example $g$ yields the exact range. In most cases, however, the range is over-estimated. Derivative information can be used to obtain tighter enclosures [9].
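To make the dependency effect concrete, here is a minimal interval sketch in Python (illustrative only, with ordinary floating-point arithmetic and no outward rounding; this is not the arithmetic used in our solver) that evaluates both expressions over the box $[1, 2] \times [-1, 1]$:

```python
# Naive interval type: just enough to reproduce the f-vs-g enclosure example.
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        other = _as_interval(other)
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        other = _as_interval(other)
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        other = _as_interval(other)
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Interval(min(p), max(p))

def _as_interval(x):
    return x if isinstance(x, Interval) else Interval(float(x), float(x))

x1, x2 = Interval(1.0, 2.0), Interval(-1.0, 1.0)
print(x1 * x2 - x1)     # f: Interval(lo=-4.0, hi=1.0)
print(x1 * (x2 - 1))    # g: Interval(lo=-4.0, hi=0.0), the exact range
```

Because $x_1$ occurs only once in the second expression, its interval enters the computation only once, which is why $g$ avoids part of the over-estimation.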

For practical purposes it is important that interval operations require only finitely many operations (with the interval boundaries). To give an example, we have

$$[a] \cdot [b] = [\min S, \max S], \quad \text{where } S = \{\underline{a}\,\underline{b},\ \underline{a}\,\overline{b},\ \overline{a}\,\underline{b},\ \overline{a}\,\overline{b}\}. \tag{2}$$

To take rounding errors into account, we can make use of the directed rounding modes of IEEE-compliant processors. That is, for computing the minimum in (2) the products in S are computed with downward rounding, whereas for the maximum they are recomputed with upward rounding. Thus guaranteed results can be obtained automatically even in the context of machine arithmetic with its inherent rounding, at the cost of slightly wider enclosures.
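True directed rounding is most conveniently set from C or C++ (e.g., via fesetround); as a rough stand-in, the sketch below simply widens each computed endpoint by one ulp with math.nextafter, a conservative (slightly wider) substitute for recomputing the products with downward and upward rounding:

```python
import math

def mul_outward(a_lo, a_hi, b_lo, b_hi):
    # Products are computed in round-to-nearest; moving the minimum one ulp
    # down and the maximum one ulp up keeps the result a guaranteed enclosure.
    p = [a_lo * b_lo, a_lo * b_hi, a_hi * b_lo, a_hi * b_hi]
    return (math.nextafter(min(p), -math.inf),
            math.nextafter(max(p), math.inf))

print(mul_outward(1.0, 2.0, -1.0, 1.0))   # slightly wider than [-2, 2]
```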

2 The branch–and–bound scheme

Interval-based rigorous solvers are aimed at finding all solutions of a nonlinear system

$$f_i(z) = 0, \qquad i = 1, \ldots, n,$$

in a domain $[z] = [\underline{z}_1, \overline{z}_1] \times \cdots \times [\underline{z}_n, \overline{z}_n]$. Most of these solvers rely on a branch–and–bound scheme to locate and discard parts of [z] that cannot contain a zero.

At the heart of these methods there is a recursive function Check([z]), which first computes enclosures $f_i[z]$ for the ranges of the $f_i$ over [z] and checks whether $0 \notin f_i[z]$ for some $i \in \{1, \ldots, n\}$. If this is the case then [z] cannot contain a solution $z^*$, and the box can be discarded. Otherwise the box may or may not contain a zero. Then [z] is subdivided into two parts $[z']$, $[z'']$, and the Check function is applied recursively to these. The recursion ends when the size of the box falls below some prescribed threshold. Proceeding this way one obtains a list of small boxes that are guaranteed to cover all the zeros of f in [z], but not every one of these boxes necessarily contains a zero.
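The following Python skeleton makes the control flow of Check explicit; f_enclosures is a hypothetical callable returning range enclosures of the $f_i$ over a box (e.g., via interval evaluation), and no acceleration devices are included:

```python
def check(box, f_enclosures, tol, results):
    """box: list of (lo, hi) pairs; results collects undiscarded small boxes."""
    ranges = f_enclosures(box)
    # Exclusion test: if 0 lies outside some range enclosure, discard the box.
    if any(lo > 0.0 or hi < 0.0 for lo, hi in ranges):
        return
    if max(hi - lo for lo, hi in box) <= tol:
        results.append(box)        # small candidate box, may contain a zero
        return
    # Bisect the widest component and recurse on both halves.
    k = max(range(len(box)), key=lambda i: box[i][1] - box[i][0])
    lo, hi = box[k]
    mid = 0.5 * (lo + hi)
    for half in ((lo, mid), (mid, hi)):
        check(box[:k] + [half] + box[k + 1:], f_enclosures, tol, results)
```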



3 Acceleration devices

The basic branch–and–bound algorithm, as described above, is by far too inefficient to be used in practice. It must be complemented with more sophisticated acceleration devices to determine parts of the box [z] that cannot contain a zero. Removing these parts before the box is subdivided can lead to a substantial reduction of the recursion depth.

3.1 Nonlinear systems and term nets

The acceleration devices discussed below rely on graph representations of the nonlinear system. We will explain these representations with an example from process systems engineering [4]. The nonlinear system to be solved is given by

$$
\left.
\begin{aligned}
f_1(z_1, z_2) &= (p_x - z_1) - p_{Da} \exp\!\left(\frac{z_2}{1 + p_g z_2}\right) z_1 = 0\\
f_2(z_1, z_2) &= (p_y - z_2) + p_B\, p_{Da} \exp\!\left(\frac{z_2}{1 + p_g z_2}\right) z_1 = 0
\end{aligned}
\;\right\}
\tag{3}
$$

with $p_x$, $p_y$, $p_{Da}$, $p_g$, and $p_B$ denoting parameters describing the chemical process. From this “original” system, the “full split” is obtained by introducing a new variable for each subterm (intermediate result) occurring in the terms of $f_1$ and $f_2$:

$$
\left.
\begin{array}{llll}
z_3 - (p_x - z_1) = 0 & z_6 - z_2 / z_5 = 0 & z_9 - z_8 \cdot z_1 = 0 & z_3 - z_9 = 0\\
z_4 - p_g \cdot z_2 = 0 & z_7 - \exp(z_6) = 0 & z_{10} - p_B \cdot z_9 = 0 & z_{11} + z_{10} = 0\\
z_5 - (1 + z_4) = 0 & z_8 - p_{Da} \cdot z_7 = 0 & z_{11} - (p_y - z_2) = 0 &
\end{array}
\right\}
\tag{4}
$$

Here, the first nine equations in (4) define the new variables $z_3, \ldots, z_{11}$, whereas the last two equations (right column) correspond to the original system (3). Adding new variables only for selected subterms leads to an “intermediate split”, e.g.,

$$
\left.
\begin{array}{ll}
z_3 - (1 + p_g \cdot z_2) = 0 & (p_x - z_1) - z_4 = 0\\
z_4 - p_{Da} \cdot \exp(z_2 / z_3) \cdot z_1 = 0 & (p_y - z_2) + p_B \cdot z_4 = 0
\end{array}
\right\}
\tag{5}
$$

Again, the two equations in the right column of (5) correspond to the original system (3), whereas the remaining equations define two new variables. These systems correspond to different views on the term net representing the computational flow for the evaluation of $f_1$ and $f_2$; cf. Fig. 1. Common subterms in the functions are mapped to the same node in the term net.
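Purely for illustration (this is not the data structure used in SONIC), the term net underlying the full split (4) could be encoded as a dictionary mapping node ids to operations on child nodes; note how the shared subterm $z_9 = z_8 \cdot z_1$ is referenced by both function nodes:

```python
# Each entry: node id -> (operation, child node ids); leaves are the original
# variables z1, z2, the parameters, and the constant 1.
term_net = {
    "z3":  ("-",   ["px", "z1"]),    # z3  = px - z1
    "z4":  ("*",   ["pg", "z2"]),    # z4  = pg * z2
    "z5":  ("+",   ["1",  "z4"]),    # z5  = 1 + z4
    "z6":  ("/",   ["z2", "z5"]),    # z6  = z2 / z5
    "z7":  ("exp", ["z6"]),          # z7  = exp(z6)
    "z8":  ("*",   ["pDa", "z7"]),   # z8  = pDa * z7
    "z9":  ("*",   ["z8", "z1"]),    # z9  = z8 * z1   (shared by f1 and f2)
    "z10": ("*",   ["pB", "z9"]),    # z10 = pB * z9
    "z11": ("-",   ["py", "z2"]),    # z11 = py - z2
    "f1":  ("-",   ["z3", "z9"]),    # f1  = z3 - z9   = 0
    "f2":  ("+",   ["z11", "z10"]),  # f2  = z11 + z10 = 0
}
```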

Fig. 1 Views on the term net for the original system, the full split, and an intermediate split (from left to right). Shaded boxes with solid borders represent constants, rounded corners denote variables, and dashed borders indicate the function values in the respective system.

3.2 Constraint propagation

To explain the idea of constraint propagation (CP), we first consider a very simple example. Suppose we know that some quantities $z_1$, $z_2$, and $f$ are bounded by $z_1 \in [z_1] = [0, 2]$, $z_2 \in [z_2] = [0, 2]$, $f \in [f] = [0, 2]$, and that the quantities obey the “constraint” $f = z_1^2 + \exp(z_2)$. Then solving for each variable in turn yields the tighter bounds

$$
\begin{aligned}
z_1 &\in \left(\pm\sqrt{[f] - \exp([z_2])}\right) \cap [z_1] = [0, 1],\\
z_2 &\in \ln\!\left([f] - [z_1]^2\right) \cap [z_2] = [0, \ln 2],\\
f &\in \left([z_1]^2 + \exp([z_2])\right) \cap [f] = [1, 2].
\end{aligned}
$$

In CP this procedure is applied to the nodes of the term net. More precisely, in a forward sweep of CP one starts with the known bounds $z_i \in [z_i]$ for the “original” variables ($i = 1, \ldots, n$), and $z_i \in (-\infty, +\infty)$ for the new variables ($i > n$). Then one proceeds “bottom up”, $i = n+1, n+2, \ldots$, using the current bounds for $z_1, \ldots, z_{i-1}$ to improve the bounds for $z_i$, as hinted at in the above example. When one finally arrives at the topmost nodes representing the function values $f_i$, the resulting bounds are enclosures for the ranges of the $f_i$ over the box [z].

In a backward sweep the computation flows “top down”. Now one starts with the required conditions $f_i = 0$, $i = 1, \ldots, n$, and propagates the bounds downward through the nodes. For example, in the “$f_2$” node the equation $f_2 = z_{11} + z_{10}$ and the current bounds on $f_2$ (i.e., $f_2 \in [0, 0]$), $z_{11}$, and $z_{10}$ are used to obtain narrower bounds for $z_{11}$ and $z_{10}$. Proceeding this way, one finally arrives at improved bounds for the original variables $z_i$.

Full CP has no fixed direction. Here one selects some “promising” node and tries to refine the bounds for its neighboring nodes. Those neighbors with significant improvement become candidates for future selection. The procedure stops if no or only minor reduction was possible during some fixed number of steps.

Constraint propagation can be very effective in reducing the size of the box [z]. In fact, in certain situations one forward sweep and one backward sweep yield the best possible contraction. By contrast, progress may be very slow or nil if there are cycles in the term net. In such situations CP must be complemented with numerical accelerators, such as the so-called order–1 or order–2 Taylor refinement [4], or the Newton–Gauß–Seidel method described below.

3.3 The interval Newton–Gauß–Seidel method

Let $z^*$ be a zero of f in [z], C be a non-singular $n \times n$ matrix (the preconditioner), and $[f']$ be an enclosure for the Jacobian of f over [z]. Then order–1 Taylor expansion around an arbitrary point $z \in [z]$ yields

$$0 = C \cdot f(z^*) = C \cdot f(z) + (C \cdot f'(\xi)) \cdot (z^* - z)$$

with points $\xi_i$ between z and $z^*$. Thus

$$(C \cdot [f']) \cdot ([z] - z) \ni -C \cdot f(z), \tag{6}$$

which is an interval linear system of the form $[A] \cdot [x] = [b]$. The Newton–Gauß–Seidel (NGS) method consists of solving this linear system, which defines the Newton correction, with an interval variant of the Gauß–Seidel scheme:

$$x_i \in \left( \frac{1}{[a_{ii}]} \left( [b_i] - \sum_{j \ne i} [a_{ij}][x_j] \right) \right) \cap [x_i] =: [x_i]^{\mathrm{new}}, \qquad i = 1, \ldots, n. \tag{7}$$

For large systems, the computation of the product $C \cdot [f']$ and the Gauß–Seidel solver can be rather expensive. It is, however, not necessary to apply (7) for all i. We can also solve only for a subset of the variables, and only the i-th row of the preconditioner C is required to solve for $x_i$. Thus the NGS scheme can be made more efficient if we tighten only (or primarily) the bounds of those variables for which CP was not successful.
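A minimal sketch of one sweep of (7), again with plain (lo, hi) tuples, no outward rounding, and under the simplifying assumption that no diagonal entry $[a_{ii}]$ contains zero (otherwise an extended interval division, which may split $[x_i]$, would be needed):

```python
def i_sub(a, b): return (a[0] - b[1], a[1] - b[0])

def i_mul(a, b):
    p = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(p), max(p))

def i_div(a, b):                       # assumes 0 is not contained in b
    return i_mul(a, (1.0 / b[1], 1.0 / b[0]))

def i_meet(a, b):
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

def ngs_sweep(A, b, x):
    """A: interval matrix C*[f'], b: interval vector -C*f(z), x: [z] - z."""
    for i in range(len(x)):
        acc = b[i]
        for j in range(len(x)):
            if j != i:
                acc = i_sub(acc, i_mul(A[i][j], x[j]))
        new = i_meet(i_div(acc, A[i][i]), x[i])
        if new is None:
            return None                # empty intersection: no zero in the box
        x[i] = new
    return x
```

Solving only for selected indices i, as described above, amounts to restricting the outer loop to that subset.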

It remains to decide to which of the systems the interval NGS method is applied and which preconditioner is to be used. The original system has the advantage of being the smallest, so the solution is not too expensive. On the other hand, this system often contains heavy dependencies of the variables, which can lead to severe over-estimation of the ranges. Thus the NGS scheme yields only poor contraction of the box. The “full split” system, by contrast, may be very large, and therefore expensive to solve even if its sparsity is exploited. But since dependencies are eliminated, it may give better contraction. As a compromise, “intermediate split” systems can lead to significant contraction at a reasonable cost if the “right” subterms are chosen to define additional variables. The choice of the preconditioner is also determined by a trade-off between performance and cost: LP preconditioners [9] yield tight results, but since each row of the preconditioner is computed via linear optimization, they are rather expensive. Others, such as the “midpoint inverse” preconditioner, $C = \mathrm{mid}([f'])^{-1}$, are cheaper but less effective.

3.4 A multilevel approach

In order to obtain a robust and efficient tightening scheme, we have combined the CP and NGS techniques in a tightly coupled way. First, we do a forward CP sweep to obtain bounds for all nodes in the term net. Then a backward CP sweep is done to tighten the bounds for the $z_i$. Following this, we apply a truncated full CP to further improve the bounds. The CP part is complemented with a new “multilevel” NGS scheme, which consists of doing i) a full NGS iteration with an expensive preconditioner C on the original system, ii) an NGS with the same or a less expensive preconditioner on selected variables of intermediate splits, and iii) an NGS with an inexpensive preconditioner on selected variables of the full split. If this whole scheme leads to a sufficient contraction of the box then it is repeated.
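A high-level, purely schematic sketch of this combined tightening loop; all helper callables (cp_forward, cp_backward, cp_full, ngs) and the contraction threshold are hypothetical stand-ins, not the actual SONIC interfaces:

```python
def tighten(box, cp_forward, cp_backward, cp_full, ngs, min_reduction=0.3):
    while True:
        width_before = max(hi - lo for lo, hi in box)
        box = cp_forward(box)                    # bounds for all term-net nodes
        box = cp_backward(box)                   # propagate f_i = 0 downward
        box = cp_full(box, max_steps=50)         # truncated full CP
        box = ngs(box, system="original",        # i)  expensive preconditioner
                  preconditioner="LP")
        box = ngs(box, system="intermediate",    # ii) selected variables only
                  preconditioner="LP")
        box = ngs(box, system="full_split",      # iii) cheap preconditioner
                  preconditioner="midpoint_inverse")
        width_after = max(hi - lo for lo, hi in box)
        if width_after > (1.0 - min_reduction) * width_before:
            return box                           # not enough contraction: stop
```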

4 Symbolic preprocessing

In addition to having CP as a symbolic component of our solver, we do some term manipulations in a preprocessing stage, for three reasons. First, dependencies of the variables within a function can lead to severe over-estimation of the ranges. In this case the exclusion test of the branch–and–bound algorithm and its accelerators are not effective, leading to a high recursion depth and an excessive number of boxes to consider. Therefore we apply some term simplifications to reduce the dependencies. There is, however, no hope for always finding an optimal expression because even the simpler question “does an expression represent the zero function?” is not decidable [5]. A second type of transformations is aimed at handling infinite search regions. If the range for some variable $z_i$ is of the form $[\alpha, +\infty)$ then this variable is replaced with $z_i' = (\alpha - \sigma)/(z_i - \sigma)$ for some $\sigma < \alpha$. Similarly, variables with ranges $(-\infty, \beta]$ or $(-\infty, +\infty)$ are replaced with $z_i' = (\beta - \tau)/(z_i - \tau)$, where $\tau > \beta$, and $z_i' = 1/(e^{z_i} + 1)$, respectively. In each case, the range for $z_i'$ is [0, 1]. These manipulations proved useful for regularization variables, which are only known to be nonnegative. Several variables may have infinite bounds, but not all of them, because in that case the exclusion techniques described above fail. Finally, expressions for the derivatives needed in the NGS algorithm and some other accelerating devices are generated during the symbolic preprocessing stage.
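As an illustration of the second type of transformation, the sketch below uses sympy (an assumption; any term-rewriting facility would do) to substitute a variable with range $[\alpha, +\infty)$ by the bounded variable $z' = (\alpha - \sigma)/(z - \sigma)$, i.e., $z = \sigma + (\alpha - \sigma)/z'$ with $z' \in (0, 1]$; the example term is made up:

```python
import sympy as sp

z, zp = sp.symbols("z zp")
alpha, sigma = 1, 0                 # any sigma < alpha works

expr = sp.exp(-z) + z               # hypothetical term of the system
bounded = expr.subs(z, sigma + (alpha - sigma) / zp)
print(sp.simplify(bounded))         # a term in zp, with zp ranging over (0, 1]
```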

5 Numerical results and concluding remarks

The combination of symbolic and numeric techniques, as summarized in Fig. 2, has led to a rather robust and efficient nonlinear solver. The data presented in Tab. 1 show that leaving out either the symbolic CP component or the numeric NGS component, or even only the multilevel scheme for NGS, can lead to a significant increase of the number of boxes to be considered. Note that the default settings do not always yield the best possible run-time; they were chosen for robustness, i.e., to reduce the risk of an excessive number of boxes. The approach described in this paper can also be applied to unconstrained and constrained global optimization. In fact, the “Griewank” data in Tab. 1 come from locating critical points in an unconstrained optimization problem.

Fig. 2 Synopsis of the computational flow in the solver. Dark and light shading indicates symbolic and numeric components, respectively. (Components shown: nonlinear system; preprocessing with generation of the term net and splits, generation of derivatives, and set-up of the H systems; the branch–and–bound framework with constraint propagation, multilevel Newton, the check for non-solvability, and verification.)

Table 1 Overall number of boxes and time for solving two problems with different settings of the control parameters. The default settings are: CP ON (with order–1 Taylor ON, order–2 Taylor OFF), MULTILEVEL Newton (with LP/midpoint inverse preconditioner), HYBRID subdivision strategy.

  Settings                           Robotics (n = 7)          Griewank (n = 7)
                                     #boxes        time        #boxes      time
  default                            5 219         103         7 296       55
  no CP                              7 214         122         18 575      7
  no Newton                          > 1 000 000   > 6 000     18 577      12
  Newton only on original system     40 846        182         11 716      81

Due to limited space the verification component of our solver cannot be discussed here. It makes it possible to prove computationally the existence of a solution within the (small) candidate boxes resulting from the branch–and–bound scheme. One of the available algorithms is based on the topological degree and requires investigating the solvability of systems involving a homotopy function H; for details see [7].

Acknowledgements This work was partially supported by VolkswagenStiftung within the project “Konstruktive Methoden der Nichtlinearen Dynamik zum Entwurf verfahrenstechnischer Prozesse”, Geschäftszeichen I/79 288.

References
[1] G. Alefeld and J. Herzberger, Introduction to Interval Computation (Academic Press, New York, 1983).
[2] D. S. Arnon et al., Cylindrical algebraic decomposition I: the basic algorithm, SIAM J. Comput. 13(1), 865–877 (1984).
[3] T. Beelitz et al., SONIC—a framework for the rigorous solution of nonlinear problems, Preprint BUW-SC 2004/7, University of Wuppertal (2004).
[4] C. H. Bischof et al., Verified determination of singularities in chemical processes, in Scientific Computing, Validated Numerics, Interval Methods, edited by W. Krämer and J. Wolff von Gudenberg, pp. 305–316 (Kluwer Academic Publishers, New York, 2001).
[5] B. F. Caviness, On Canonical Forms and Simplification, Ph.D. thesis, Dept. of Computer Science, Carnegie Mellon University (1968).
[6] K. Deimling, Nonlinear Functional Analysis (Springer, Berlin, 1985).
[7] A. Frommer and B. Lang, A framework for existence tests based on the topological degree and homotopy, Preprint BUW-SC 2005/6, University of Wuppertal (2005).
[8] C. Jäger and D. Ratz, A combined method for enclosing all solutions of nonlinear systems of polynomial equations, Reliable Comput. 1, 41–64 (1995).
[9] R. B. Kearfott, Rigorous Global Search: Continuous Problems (Kluwer Academic Publishers, Dordrecht, The Netherlands, 1996).
