

    JNM: Journal of Natural Sciences and Mathematics

    Vol. 2, No. 1, pp. 17-40 (June 2008/Jumada Al-Akher 1429 H)

© Qassim University Publications

Reduced Gradient Method and its Generalization via Stochastic Perturbation

    Abdelkrim El Mouatasim

    Department of Mathematics,

    Faculty of Sciences, Jazan University,

    B.P. 114, Jazan, Saudi Arabia.

    E-mail: [email protected]

Abstract: In this paper, the global optimization of a nonconvex objective function under linear and nonlinear differentiable constraints is studied. A reduced gradient and generalized reduced gradient (GRG) descent method with random perturbations is proposed, and the global convergence of the algorithm is established. Some numerical examples are also given: statistical, octagon, mixture, alkylation and pooling problems.

Keywords: Stochastic perturbation, constrained optimization, reduced gradient method and its generalization, nonconvex optimization.

    Received for JNM on November 19, 2007.

    1 Introduction

    The general constrained optimization problem is to minimize a nonlinear function

    subject to linear or nonlinear constraints. Two equivalent formulations of this problem

    are useful for describing algorithms. They are

Minimize:   f(x)
subject to: g_i(x) = 0,  i = 1, ..., m_l,
            g_i(x) ≤ 0,  i = m_l + 1, ..., m,
            ℓ ≤ x ≤ u,                                              (1.1)

where f and each g_i : ℝ^n → ℝ are twice continuously differentiable functions and the lower- and upper-bound vectors ℓ and u may contain some infinite components; and

Minimize:   f(x)
Subject to: h(x) = 0,
            x ≥ 0,                                                  (1.2)


where h maps ℝ^n into ℝ^m, the objective function f : ℝ^n → ℝ and the restrictions h_j : ℝ^n → ℝ, j = 1, ..., m (the components of h) are twice continuously differentiable functions. For the case of multi-objective optimization see [10].

The main technique proposed for solving constrained optimization problems in this work is the reduced gradient method and its generalization. We are mainly interested in the situation where, on the one hand, f is not convex and, on the other hand, the constraints are in general not linear; to get equality constraints one might think about introducing slack variables and incorporating them into the functions g_i.
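For instance, an inequality constraint is converted as

g_i(x) ≤ 0   ⇔   g_i(x) + z_i = 0,   z_i ≥ 0,

with the slack variable z_i appended to the vector of unknowns; this is how (1.1) is brought into the equality-constrained form (1.2), and it is the device used again in Section 5.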

The problem (1.2) can be numerically approached by using the reduced gradient method or its generalizations, which generate a sequence {x_k}_{k≥0}, where x_0 is an initial feasible point and, for each k ≥ 0, a new feasible point x_{k+1} is generated from x_k by using an operator Q_k (see Section 3). Thus the iterations are given by:

∀k ≥ 0:  x_{k+1} = Q_k(x_k).                                        (1.3)

In order to prevent convergence to a local minimum, various modifications of these basic methods have been introduced in the literature. For instance, the reduced gradient method has been considered by [1]; the random perturbations projected gradient method in [2]; the sequential convex programming in [24]; the generalized reduced gradient in [5, 7, 25, 26, 32, 34]; sequential quadratic programming in [15, 30]; nonlinear programming in [3, 20, 27]; stochastic perturbation of the active set in [11]. Moreover, stochastic or evolutionary search can be introduced, but these methods are usually considered expensive and not accurate: on the one hand, the flexibility introduced by randomization often implies a large number of evaluations of the objective function and, on the other hand, pure random search generates many points which do not improve the value of the objective function. This remark has led some people to introduce a controlled random search (see for instance [2, 4, 6, 9, 23, 28, 33]).

In such a method, the sequence {x_k}_{k≥0} is replaced by a sequence of random vectors {X_k}_{k≥0} and the iterations are modified as follows:

∀k ≥ 0:  X_{k+1} = Q_k(X_k) + P_k,                                  (1.4)

where P_k is a suitable random variable, called the stochastic perturbation. The sequence {P_k}_{k≥0} goes to zero slowly enough in order to prevent convergence to a local minimum (see Section 4). The reduced gradient method and its generalization are recalled in Section 3, while the notations are introduced in Section 2. The results of numerical experiments will be given in Section 5.

    2 Notations and assumptions

We denote by ℝ the set of real numbers (−∞, +∞), by ℝ+ the set of nonnegative real numbers [0, +∞), and by E = ℝ^n the n-dimensional real Euclidean space. For


x = (x_1, x_2, ..., x_n)^t ∈ E, x^t denotes the transpose of x. We denote by ‖x‖ = (x^t x)^{1/2} = (x_1^2 + ... + x_n^2)^{1/2} the Euclidean norm of x.

The general optimization problem (1.1) is equivalent to the global optimization problem (1.2) with equality constraints and nonnegative variables.

Remark 2.1  The assumed conditions can be satisfied by using a two-phase method, since we can add artificial variables to the constraints.

Linear case: h(x) = Ax − b, where A is an m × n matrix with rank m and b is an m-vector. Any m columns of A are linearly independent, and every extreme point of the feasible region has m strictly positive variables.

We introduce basic and nonbasic variables according to

A = [B, N],   x = (x_B, x_N),   x_N ≥ 0,   x_B > 0,                 (2.1)

by the nondegeneracy assumption. Furthermore, the gradient of f may be conformally partitioned as

∇f(x)^t = (∇_B f(x)^t, ∇_N f(x)^t),

where ∇_B f and ∇_N f are the basic gradient and the nonbasic gradient of f, respectively.

Nonlinear case: for simplicity we also assume that the gradients of the constraint functions h_j are linearly independent at every point x ≥ 0. This assumption, like the assumption of the full rank of the matrix A in the linear case, ensures that we can choose a basis, express the basic variables as a linear combination of the nonbasic variables and then reduce the problem. Hence a similar generalized reduced gradient procedure applies.

Let a feasible solution x^k ≥ 0 with h_j(x^k) = 0 for all j be given. By assumption the Jacobian matrix of the constraints h(x) = (h_1(x), ..., h_m(x))^t has full rank at each x ≥ 0; for simplicity, at the point x^k it will be denoted by

A_k = J_h(x^k),   with   b_k = A_k x^k.

Then a construction similar to the linear case applies.

Let

C = {x ∈ E | h(x) = 0, x ≥ 0}.

The objective function is f : E → ℝ; its lower bound on C is denoted by l*: l* = min_C f. Let us introduce

C_α = S_α ∩ C,   S_α = {x ∈ E | f(x) ≤ α}.


We assume that

f is twice continuously differentiable on E;                        (2.2)

∀α > l*:  C_α is not empty, closed and bounded;                     (2.3)

∀α > l*:  meas(C_α) > 0,                                            (2.4)

where meas(C_α) is the measure of C_α.

Since E is a finite dimensional space, assumption (2.3) is verified when C is bounded or f is coercive, i.e., lim_{‖x‖→+∞} f(x) = +∞. Assumption (2.4) is verified when C contains a sequence of neighborhoods of an optimum point x* having strictly positive measure, i.e., when x* can be approximated by a sequence of points of the interior of C. We observe that assumptions (2.2)-(2.3) yield that

C = ∪_{α>l*} C_α,   i.e.,   ∀x ∈ C, ∃α > l* such that x ∈ C_α.

From (2.2) and (2.3) we get

ν_1 = sup{ ‖∇f(x)‖ : x ∈ C_α } < +∞.

Then

ν_2 = sup{ ‖d‖ : x ∈ C_α } < +∞,

where d is the direction of the reduced gradient method (see Section 3). Thus,

γ(α, β) = sup{ ‖y − (x + ηd)‖ : (x, y) ∈ C_α × C_α, 0 ≤ η ≤ β } < +∞,          (2.5)

where α, β are positive real numbers.

    3 Reduced Gradient and its Generalization

    3.1 Reduced Gradient Method

The reduced gradient method is closely related to the simplex method of linear programming in that the problem variables are partitioned into basic and nonbasic groups. From a theoretical viewpoint, the method can be shown to behave very much like the gradient projection method [1]. Some variants of the method applied to convex nonlinear problems with linear constraints are also referred to as the convex simplex method.

Again we are dealing with the same basic descent algorithm structure and are interested in still other means for computing descent directions. We use the following form of the linearly constrained problem, with f(·) continuously differentiable on ℝ^n:

Minimize:   f(x)
subject to: Ax = b,
            x ≥ 0,                                                  (3.1)


where A is an m × n matrix with full row rank and b ∈ ℝ^m. The reduced gradient method begins with a basis B and a feasible solution x^k = (x^k_B, x^k_N) such that x^k_B > 0. The solution x^k is not necessarily a basic solution, i.e. x_N does not have to be identically zero. Such a solution can be obtained, e.g., by the usual first phase procedure of linear optimization. Using the basis B, from Bx_B + Nx_N = b we have

x_B = B^{-1}b − B^{-1}Nx_N,

hence the basic variables x_B can be eliminated from the problem (3.1):

Minimize:   f_N(x_N)
subject to: B^{-1}b − B^{-1}Nx_N ≥ 0,
            0 ≤ x_N,                                                (3.2)

where f_N(x_N) = f(x) = f(B^{-1}b − B^{-1}Nx_N, x_N). Using the notation

∇f(x)^t = (∇_B f(x)^t, ∇_N f(x)^t),

the gradient of f_N, which is the so-called reduced gradient, can be expressed as

∇f_N(x_N)^t = −∇_B f(x)^t B^{-1}N + ∇_N f(x)^t.

Now let us assume that the basis is nondegenerate, i.e. only the nonnegativity constraints x_N ≥ 0 might be active at the current iterate x^k. Let the search direction be a vector d^t = (d_B^t, d_N^t) in the null space of the matrix A, defined by d_B = −B^{-1}Nd_N with d_N chosen from the reduced gradient. With this definition, the feasibility of x^k + λd is guaranteed as long as x^k_B + λd_B ≥ 0, i.e. as long as λ ≤ λ_max, where

λ_max = min{ −x^k_i / d_i : i ∈ B, d_i < 0 },

with λ_max = +∞ if no component of d_B is negative.
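A minimal sketch of this computation in a dense numpy setting (the helper name, the rule used to freeze nonbasic variables at their bounds, and the data layout are our assumptions, not prescriptions of the paper):

```python
import numpy as np

def reduced_gradient_direction(A, grad, x, basic, nonbasic):
    """One reduced-gradient direction for min f(x) s.t. Ax = b, x >= 0.

    A        : (m, n) constraint matrix with full row rank
    grad     : gradient of f at x, shape (n,)
    basic    : indices of the m basic variables (x[basic] > 0 assumed)
    nonbasic : indices of the remaining variables
    """
    B = A[:, basic]                      # basis matrix
    N = A[:, nonbasic]
    BinvN = np.linalg.solve(B, N)        # B^{-1} N
    # reduced gradient: grad_N - (B^{-1} N)^T grad_B
    r = grad[nonbasic] - BinvN.T @ grad[basic]
    # nonbasic part: move against the reduced gradient, but freeze a
    # nonbasic variable that sits at zero and would be pushed negative
    dN = np.where((x[nonbasic] <= 0.0) & (r > 0.0), 0.0, -r)
    dB = -BinvN @ dN                     # keeps A d = 0
    d = np.zeros_like(x)
    d[basic], d[nonbasic] = dB, dN
    # largest feasible step: x + lam * d >= 0
    neg = d < 0
    lam_max = np.min(-x[neg] / d[neg]) if neg.any() else np.inf
    return d, lam_max
```

The rule for d_N above is only one common variant; the basis is also kept fixed here, whereas practical reduced gradient codes change it when a basic variable reaches its bound.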


    3.2 Generalized reduced gradient (GRG)

The reduced gradient method can be generalized to nonlinearly constrained optimization problems. Similar to the linearly constrained case, we consider the problem with equality constraints and nonnegative variables.

    The reduced gradient method and its generalization try to extend the methods of

    linear optimization to the nonlinear case. These methods are close to, or equivalent

    to projected gradient methods; just the presentation of methods is frequently quite

    different.

    We generate a reduced gradient search direction by virtually keeping the linearized

    constraints valid. This direction by construction will be in the null space of Ak. More

specifically, for the linearized constraints we have

h(x^k) + J_h(x^k)(x − x^k) = 0 + A_k(x − x^k) = 0.

From this, one has

B_k x_{B_k} + N_k x_{N_k} = A_k x^k,

and by introducing the notation b_k = A_k x^k, we have

x_{B_k} = B_k^{-1} b_k − B_k^{-1} N_k x_{N_k}.

Hence the basic variables x_{B_k} can be eliminated from the linearization of the problem (1.2), resulting in the problem (3.2).

From this point the generation of the search direction d goes exactly the same way as in the linearly constrained case. Due to the nonlinearity of the constraints, h(x^{k+1}) = h(x^k + λd) = 0 will not hold in general. Hence something more has to be done to restore feasibility.

In older versions of the GRG, Newton's method is applied to the nonlinear equality system h(x) = 0 from the initial point x^{k+1} to produce a gradient direction, which is combined with a direction from the orthogonal subspace (the range space of A_k^t), and then a modified (nonlinear, discrete) line search is performed. These schemes are quite complicated and are not discussed here in detail.
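As an illustration of the restoration idea only, the following sketch applies Newton's method to h(x) = 0 in the basic variables, keeping the nonbasic variables fixed (our own simplified reading; the schemes used in actual GRG codes are, as said above, more elaborate):

```python
import numpy as np

def restore_feasibility(h, jac, x, basic, tol=1e-10, max_iter=20):
    """Newton correction of the basic variables so that h(x) = 0 again,
    with the nonbasic variables held fixed (GRG restoration phase).

    h   : callable, h(x) -> residual vector of shape (m,)
    jac : callable, jac(x) -> Jacobian of h at x, shape (m, n)
    """
    x = x.copy()
    for _ in range(max_iter):
        r = h(x)
        if np.linalg.norm(r) < tol:
            break
        JB = jac(x)[:, basic]            # square (m, m) block w.r.t. basic vars
        x[basic] -= np.linalg.solve(JB, r)
    return x
```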

Let us recall briefly the essential points of the reduced gradient method: an initial feasible guess x^0 ∈ C is given and a sequence {x^k}_{k≥0} ⊂ C is generated by using iterations of the general form:

∀k ≥ 0:  x^{k+1} = Q_k(x^k) = x^k + λ_k d^k.                        (3.3)

The optimal choice for λ_k is

λ_k ∈ arg min{ f(x^k + λd^k) : 0 ≤ λ ≤ λ_max },                     (3.4)

where

λ_max = min_{1≤j≤n} { −x^k_j / d^k_j : d^k_j < 0 }   if d^k has a negative component,
λ_max = +∞                                            if d^k ≥ 0.

We have f(x^k + λ_k d^k) ≤ f(x^k).
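The one-dimensional minimization in (3.4) can be approximated, as is done in the experiments of Section 5, by an exhaustive search with a fixed step on [0, λ_max]; a minimal sketch (the grid size and the cap on an unbounded step are our own choices):

```python
import numpy as np

def line_search(f, x, d, lam_max, n_steps=100):
    """Approximate argmin of f(x + lam*d) over 0 <= lam <= lam_max by
    exhaustive search on a uniform grid (cf. (3.4) and Section 5)."""
    if not np.isfinite(lam_max):
        lam_max = 1.0                     # assumption: cap an unbounded step
    lams = np.linspace(0.0, lam_max, n_steps + 1)
    vals = np.array([f(x + lam * d) for lam in lams])
    return lams[np.argmin(vals)]
```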


    4 Stochastic perturbation of the reduced gradient

The main difficulty remains the lack of convexity: if f is not convex, the Kuhn-Tucker points may not correspond to a global minimum. In the following we shall improve this point by using an appropriate random perturbation. The deterministic sequence {x_k}_{k≥0} is replaced by a sequence of random vectors {X_k}_{k≥0} involving a random perturbation P_k of the deterministic iteration (3.3). Then we have X_0 = x_0 and

∀k ≥ 0:  X_{k+1} = Q_k(X_k) + P_k = X_k + λ_k d^k + P_k,            (4.1)

where

∀k ≥ 1:  P_k is independent of (X_{k−1}, ..., X_0),                 (4.2)

∀x ∈ C:  Q_k(x) + P_k ∈ C.                                          (4.3)

Equation (4.1) can be viewed as a perturbation of the descent direction d^k, which is replaced by a new direction D_k = d^k + P_k/λ_k, so that the iterations (4.1) become

X_{k+1} = X_k + λ_k D_k.

General properties defining convenient sequences of perturbations {P_k}_{k≥0} can be found in the literature; usually, a sequence of Gaussian laws may be used in order to satisfy these properties.
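One concrete way to realize such a perturbation, anticipating (4.7) and Theorem 2 below (a Gaussian vector truncated by the indicator of C; the feasibility test `in_C` is a problem-specific placeholder of ours, and in practice the perturbed point may additionally need a restoration step such as the one sketched in Section 3.2):

```python
import numpy as np

rng = np.random.default_rng(0)

def perturbation(in_C, xi_k, sigma, n):
    """Stochastic perturbation P_k = xi_k * 1_C(Z_k) * Z_k as in (4.7),
    with Z_k ~ N(0, sigma^2 * Id) as in Theorem 2.

    in_C : callable returning True if its argument is feasible
           (placeholder for the indicator of C)
    """
    z = sigma * rng.standard_normal(n)
    if not in_C(z):
        z = np.zeros(n)                  # truncation 1_C(Z_k) Z_k
    return xi_k * z
```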

We introduce a random vector Z_k; we denote by Φ_k and φ_k the cumulative distribution function and the probability density of Z_k, respectively.

We denote by F_{k+1}(y | X_k = x) the conditional cumulative distribution function

F_{k+1}(y | X_k = x) = P(X_{k+1} < y | X_k = x);

the conditional probability density of X_{k+1} is denoted by f_{k+1}.

Let us introduce a sequence of n-dimensional random vectors {Z_k}_{k≥0} ⊂ C. We consider also {ξ_k}_{k≥0}, a suitable decreasing sequence of strictly positive real numbers converging to 0 and such that ξ_0 ≤ 1. The optimal choice for λ_k is

λ_k ∈ arg min{ f(X_k + λD_k) : 0 ≤ λ ≤ λ_max },                     (4.4)

where

λ_max = min_{1≤j≤n} { −X_{k,j} / D_{k,j} : D_{k,j} < 0 }   if D_k has a negative component,
λ_max = +∞                                                  if D_k ≥ 0.

Then

F_{k+1}(y | X_k = x) = P( Z_k < (y − Q_k(x)) / ξ_k )


and we have

F_{k+1}(y | X_k = x) = Φ_k( (y − Q_k(x)) / ξ_k ),

f_{k+1}(y | X_k = x) = (1 / ξ_k^n) φ_k( (y − Q_k(x)) / ξ_k ),   ∀y ∈ C.        (4.5)

Note that (2.5) shows that

‖y − Q_k(x)‖ ≤ γ(α, β)   for (x, y) ∈ C_α × C_α.

We assume that there exists a decreasing function t ↦ g_k(t), with g_k(t) > 0 on ℝ+, such that

∀y ∈ C_α:  φ_k( (y − Q_k(x)) / ξ_k ) ≥ g_k( γ(α, β) / ξ_k ).                   (4.6)

For simplicity, let

Z̃_k = 1_C(Z_k) Z_k.                                                            (4.7)

The procedure generates a sequence U_k = f(X_k). By construction this sequence is decreasing and lower bounded by l*:

∀k ≥ 0:  l* ≤ U_{k+1} ≤ U_k.                                                   (4.8)

Thus there exists U ≥ l* such that U_k → U for k → +∞.

Lemma 4.1  Let P_k = ξ_k Z̃_k and α = f(x_0), where Z̃_k is given by (4.7). Then there exists ν > 0 such that

P(U_{k+1} < θ | U_k ≥ θ) ≥ ( meas(C_θ ∩ C_α) / ξ_k^n ) g_k( γ(α, β) / ξ_k ) > 0   for all θ ∈ (l*, l* + ν],

where n = dim(E).

Proof. Let C_θ = {x ∈ C | f(x) < θ} for θ ∈ (l*, l* + ν]. Since C_θ ⊂ C_α for l* < θ ≤ α, it follows from (2.4) that C_θ is non empty and has a strictly positive measure.

If meas(C_θ ∩ C_α) = 0 for any θ ∈ (l*, l* + ν], the result is immediate, since we then have f(x) = l* on C.

Let us assume that there exists ν > 0 such that meas(C_θ ∩ C_α) > 0. For θ ∈ (l*, l* + ν] we have C_θ ⊂ C_α and meas(C_θ ∩ C_α) > 0. Since the sequence {U_i}_{i≥0} is decreasing, we have also

{X_i}_{i≥0} ⊂ C_α.                                                              (4.9)

Therefore

P(X_k ∈ C_θ) = P(X_k ∈ C_θ ∩ C_α) = ∫_{C_θ ∩ C_α} P(X_k ∈ dx) > 0   for any θ ∈ (l*, l* + ν].


Let θ ∈ (l*, l* + ν]. We have from (4.8)

P(U_{k+1} < θ | U_k ≥ θ) = P(X_{k+1} ∈ C_θ | X_i ∉ C_θ, i = 0, ..., k).

But (4.4) and (4.5) yield that

P(X_{k+1} ∈ C_θ | X_i ∉ C_θ, i = 0, ..., k) = P(X_{k+1} ∈ C_θ | X_k ∉ C_θ).

Thus

P(X_{k+1} ∈ C_θ | X_k ∉ C_θ) = P(X_{k+1} ∈ C_θ, X_k ∉ C_θ) / P(X_k ∉ C_θ).

Moreover

P(X_{k+1} ∈ C_θ, X_k ∉ C_θ) = ∫_{C_α \ C_θ} P(X_k ∈ dx) ∫_{C_θ ∩ C_α} f_{k+1}(y | X_k = x) dy.

Using (4.9) we get

P(X_{k+1} ∈ C_θ, X_k ∉ C_θ) ≥ ( inf_{x ∈ C_α \ C_θ} ∫_{C_θ ∩ C_α} f_{k+1}(y | X_k = x) dy ) ∫_{C_α \ C_θ} P(X_k ∈ dx).

Thus

P(X_{k+1} ∈ C_θ | X_k ∉ C_θ) ≥ inf_{x ∈ C_α \ C_θ} ∫_{C_θ ∩ C_α} f_{k+1}(y | X_k = x) dy.

Taking (4.5) into account, we have

P(X_{k+1} ∈ C_θ | X_k ∉ C_θ) ≥ (1 / ξ_k^n) inf_{x ∈ C_α \ C_θ} ∫_{C_θ ∩ C_α} φ_k( (y − Q_k(x)) / ξ_k ) dy.

Note that (2.5) shows that

‖y − Q_k(x)‖ ≤ γ(α, β),

and (4.6) yields that

φ_k( (y − Q_k(x)) / ξ_k ) ≥ g_k( γ(α, β) / ξ_k ).

Hence

P(X_{k+1} ∈ C_θ | X_k ∉ C_θ) ≥ (1 / ξ_k^n) inf_{x ∈ C_α \ C_θ} ∫_{C_θ ∩ C_α} g_k( γ(α, β) / ξ_k ) dy.

It follows that

P(X_{k+1} ∈ C_θ | X_k ∉ C_θ) ≥ ( meas(C_θ ∩ C_α) / ξ_k^n ) g_k( γ(α, β) / ξ_k ).

The global convergence is a consequence of the following result, which follows from the Borel-Cantelli lemma (see, for instance, [28]).


Lemma 4.2  Let {U_k}_{k≥0} be a decreasing sequence, lower bounded by l*. Then there exists U such that U_k → U for k → +∞. Assume that there exists ν > 0 such that, for any θ ∈ (l*, l* + ν], there is a sequence of strictly positive real numbers {c_k(θ)}_{k≥0} such that

∀k ≥ 0:  P(U_{k+1} < θ | U_k ≥ θ) ≥ c_k(θ) > 0   and   Σ_{k=0}^{+∞} c_k(θ) = +∞.      (4.10)

Then U = l* almost surely.

    For the proof we refer the reader to [21] or [28].

Theorem 1  Let α = f(x_0) and let λ_k satisfy (4.4). Assume that x_0 ∈ C, that the sequence {ξ_k} is non-increasing and that

Σ_{k=0}^{+∞} g_k( γ(α, β) / ξ_k ) = +∞.                             (4.11)

Then U = l* almost surely.

Proof. Let

c_k(θ) = ( meas(C_θ ∩ C_α) / ξ_k^n ) g_k( γ(α, β) / ξ_k ) > 0.      (4.12)

Since the sequence {ξ_k}_{k≥0} is non-increasing,

c_k(θ) ≥ ( meas(C_θ ∩ C_α) / ξ_0^n ) g_k( γ(α, β) / ξ_k ) > 0.

Thus, equation (4.11) shows that

Σ_{k=0}^{+∞} c_k(θ) ≥ ( meas(C_θ ∩ C_α) / ξ_0^n ) Σ_{k=0}^{+∞} g_k( γ(α, β) / ξ_k ) = +∞.

Using Lemma 4.1 and Lemma 4.2 we obtain U = l* almost surely.

Theorem 2  Let Z̃_k = 1_C(Z)Z, where Z is a random variable following N(0, σ²Id) (σ > 0), and let

ξ_k = a / √(log(k + d)),                                            (4.13)

where a > 0, d > 0 and k is the iteration number. If x_0 ∈ C then, for a large enough, U = l* almost surely.

Proof. We have

φ_k(z) = ( 1 / ((2π)^{n/2} σ^n) ) exp( −‖z‖² / (2σ²) ) = g_k(‖z‖) > 0.


So

g_k( γ(α, β) / ξ_k ) = ( 1 / ((2π)^{n/2} σ^n) ) exp( −γ(α, β)² / (2σ²ξ_k²) ) = ( 1 / ((2π)^{n/2} σ^n) ) (k + d)^{−γ(α, β)²/(2σ²a²)}.

For a such that

0 < γ(α, β)² / (2σ²a²) < 1,

we have

Σ_{k=0}^{∞} g_k( γ(α, β) / ξ_k ) = +∞,

and, from the preceding theorem, we have U = l* almost surely.
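For concreteness, the schedule (4.13) decreases very slowly; with the purely illustrative values a = 1 and d = 2 (our numbers, not taken from the paper):

```python
import numpy as np

a, d = 1.0, 2.0                          # illustrative values only
xi = lambda k: a / np.sqrt(np.log(k + d))
print([round(xi(k), 3) for k in (0, 10, 100, 1000)])
# -> approximately [1.2, 0.63, 0.47, 0.38]: xi_k tends to 0, but slowly,
#    which is what keeps the sum in (4.11) divergent.
```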

    5 Numerical results

In order to apply the method presented in Section 4, we start at the initial value X_0 = x_0 ∈ C. At step k ≥ 0, X_k is known and X_{k+1} is determined.

We denote by ksto the number of perturbations; the case ksto = 0 corresponds to the unperturbed descent method. We add slack variables z1, z2, ... to the inequality constraints in order to obtain equality constraints.

We introduce a maximum iteration number kmax such that the iterations are stopped when k = kmax or x^k is a Kuhn-Tucker point. For each descent direction, the optimal λ_k is calculated by unidimensional exhaustive search with a fixed step on the interval defined by (3.4). This procedure can imply a large number of calls of the objective function. The global number of function evaluations is eval and includes the evaluations for the unidimensional search. We denote by kend the value of k when the iterations are stopped (it corresponds to the number of evaluations of the gradient of f). The optimal value and optimal point are fopt and xopt, respectively. The perturbation is normally distributed and samples are generated by using the log-trigonometric generator and the standard random number generator of the FORTRAN library. We use

ξ_k = a / √(log(k + 2)),

where a > 0.

The maximum number of iterations has been fixed at kmax = 100. The results for ksto = 0, 250, 500, 1000 and 2000 are given in the tables below for each problem. The experiments were performed on an HP workstation with an Intel(R) Celeron(R) M processor at 1.30 GHz and 224 MB RAM. The row cpu gives the mean CPU time in seconds for one run.
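Putting the preceding sketches together, a rough driver for the perturbed iteration (4.1) could look as follows; the stopping test and the acceptance rule for the perturbed point are our simplifications, the basis is kept fixed, and the parameter ksto of the experiments is not modelled here, so this is not the paper's Fortran implementation:

```python
import numpy as np

def step_bound(x, d):
    """Largest lam with x + lam*d >= 0 (cf. the definition of lam_max)."""
    neg = d < 0
    return np.min(-x[neg] / d[neg]) if neg.any() else np.inf

def sprg(f, grad, A, x0, basic, nonbasic, in_C,
         a=1.0, sigma=1.0, kmax=100, tol=1e-8):
    """Sketch of the perturbed reduced-gradient iteration (4.1) for
    min f(x) s.t. Ax = b, x >= 0, starting from a feasible x0."""
    x = x0.copy()
    for k in range(kmax):
        d, _ = reduced_gradient_direction(A, grad(x), x, basic, nonbasic)
        lam = line_search(f, x, d, step_bound(x, d))       # deterministic step
        xi_k = a / np.sqrt(np.log(k + 2))                   # schedule of Section 5
        trial = x + lam * d + perturbation(in_C, xi_k, sigma, x.size)
        # keep the perturbation only if the perturbed point stays feasible
        # (condition (4.3)) and the sequence U_k remains decreasing
        if in_C(trial) and f(trial) <= f(x + lam * d):
            x = trial
        else:
            x = x + lam * d
        if np.linalg.norm(lam * d) < tol:                   # crude stopping test
            break
    return x, f(x)
```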

    5.1 Statistical problem

The statistical problem of estimating a bounded mean with a minimax procedure is nonconvex and nonlinear; we will restrict ourselves to this problem (see, for instance, [16] and [17]). Thus, our problem is reduced to the following: for a suitable m_2 > m_1 ≈ 1.27 and for each m ∈ (m_1, m_2], find a_1 and a_2. In this case, the problem reduces to a global optimization problem with 5 unknowns (v, a_1, a_2, a_3, b) and 3 equality constraints. This


problem is equivalent to the maximization of a convex combination of the several g_i functions without the constraint on v (see for instance [14] and [22]). Using the equality a_1 + a_2 + a_3 = 1, one variable, a_1 for example, can be eliminated. The problem we eventually studied is the following:

Minimize:  {(1 − a_2 − a_3)g_1 + a_2 g_2 + a_3 g_3}
subject to:
    a_2 + a_3 + z_1 = 1
    a_i + z_i = 1,  i = 2, 3
    b + z_4 = 1
    0 ≤ a_i, i = 2, 3;   0 ≤ b;   0 ≤ z_i, i = 1, ..., 4,

where

g_1(a_2, a_3, b):  ρ(1)² = v,

g_2(a_2, a_3, b):  bm exp(−bm) + Σ_{x=1}^{∞} (bm − ρ(x))² ((bm)^{x−1} / x!) exp(−bm) = v,

g_3(a_2, a_3, b):  m exp(−m) + Σ_{x=1}^{∞} (m − ρ(x))² (m^{x−1} / x!) exp(−m) = v,

and

ρ(x) = m ( a_2 b^x exp((1 − b)m) + a_3 ) / ( a_2 b^{x−1} exp((1 − b)m) + a_3 )   if x ≠ 0, 1,

ρ(1) = m ( a_2 b exp(−bm) + a_3 exp(−m) ) / ( 1 − a_2 − a_3 + a_2 exp(−bm) + a_3 exp(−m) ).

There are 7 variables and 10 constraints. We use a = 1 and m = 1.5; the Fortran code furnishes the following optimal solution (see also Table 1):

a*_1 = 0.0958,   a*_2 = 0.9042,   a*_3 = 0.00000,   b* = 0.907159.

Table 1: Results for Statistics

ksto     0        250      500      1000     2000
fopt     0.3401   0.4022   0.4030   0.4044   0.4044
eval     1        1183     4773     10743    17635
kend     2        7        12       13       11
cpu      0.00     37.16    134.72   288.91   491.86


    5.2 Octagon problem

Consider polygons in the plane with eight sides (octagons for short) and unit diameter. Which one of them has maximum area? (See for instance [18, 31].) Using QP and geometric reasoning, the optimal octagon is determined in [6, 10] to have an area about 2.8% larger than the regular octagon.

Graham's conjecture states that the optimal octagon can be illustrated as in Figure 1, in which a solid line between two vertices indicates that the distance between these points is one.

In 1996, Hansen formulated this question as the quadratically constrained quadratic optimization problem defining this configuration, which appears below (see for instance [8]). By symmetry, and without loss of generality, the constraint x2 ≥ x3 is added to reduce the size of the feasible region.

[Figure 1: Definition of variables for the configuration conjectured by Graham to have maximum area. The vertices are A0 = (0, 0), A1 = (x1 − x2, y1 − y2), A2 = (x3 − x1 − x5, y1 − y3 + y5), A3 = (−x1, y1), A4 = (0, 1), A5 = (x1, y1), A6 = (x1 − x2 + x4, y1 − y2 + y4), A7 = (x3 − x1, y1 − y3).]


Minimize:  −(1/2){ (x2 + x3 − 4x1)y1 + (3x1 − 2x3 + x5)y2 + (3x1 − 2x2 + x4)y3
                   + (x3 − 2x1)y4 + (x2 − 2x1)y5 } − x1
Subject to:
    A0A1 ≤ 1:  (x1 − x2)² + (y1 − y2)² + z1 = 1
    A0A2 ≤ 1:  (x3 − x1 − x5)² + (y1 − y3 + y5)² + z2 = 1
    A0A6 ≤ 1:  (x1 − x2 + x4)² + (y1 − y2 + y4)² + z3 = 1
    A0A7 ≤ 1:  (x3 − x1)² + (y1 − y3)² + z4 = 1
    A1A2 ≤ 1:  (2x1 − x2 − x3 + x5)² + (y2 − y3 + y5)² + z5 = 1
    A1A3 ≤ 1:  (2x1 − x2)² + y2² + z6 = 1
    A1A4 ≤ 1:  (x1 − x2)² + (y1 − y2 − 1)² + z7 = 1
    A1A7 ≤ 1:  (2x1 − x2 − x3)² + (y2 − y3)² + z8 = 1
    A2A3 ≤ 1:  (x3 − x5)² + (y3 − y5)² + z9 = 1
    A2A4 ≤ 1:  (x3 − x1 − x5)² + (y1 − y3 + y5 − 1)² + z10 = 1
    A2A5 ≤ 1:  (2x1 − x3 + x5)² + (y3 − y5)² + z11 = 1
    A2A6 = 1:  (2x1 − x2 − x3 + x4 + x5)² + (y2 − y3 − y4 + y5)² = 1
    A3A6 ≤ 1:  (2x1 − x2 + x4)² + (y2 − y4)² + z12 = 1
    A4A6 ≤ 1:  (x1 − x2 + x4)² + (y1 − y2 + y4 − 1)² + z13 = 1
    A4A7 ≤ 1:  (x1 − x3)² + (1 − y1 + y3)² + z14 = 1
    A5A6 ≤ 1:  (x2 − x4)² + (y2 − y4)² + z15 = 1
    A5A7 ≤ 1:  (2x1 − x3)² + y3² + z16 = 1
    A6A7 ≤ 1:  (2x1 − x2 − x3 + x4)² + (y2 − y3 − y4)² + z17 = 1
    x2 − x3 − z18 = 0
    xi² + yi² = 1,  i = 1, 2, 3, 4, 5
    x1 + z19 = 0.5
    xi + z(18+i) = 1,  i = 2, 3, 4, 5
    0 ≤ xi, yi, i = 1, ..., 5;   0 ≤ zi, i = 1, ..., 23.

There are 33 variables and 62 constraints.

We use a = 5; the Fortran code furnishes the following optimal solution (see also Table 2):

xopt = (0.25682, 0.67357, 0.67060, 0.91187, 0.91342),
yopt = (0.96619, 0.73978, 0.74245, 0.41172, 0.40820).

Table 2: Results for Octagon

ksto     0          250       500        1000      2000
fopt     0.726637   0.72712   0.727377   0.72738   0.72739
eval     2          526       1654       2634      4585
kend     3          11        12         10        15
cpu      0.00       1.48      6.70       10.67     12.53


    5.3 Mixture problem

In this example of petrochemical mixture we have four reservoirs: R1, R2, R3 and R4. The first two receive three distinct source products. Then their content is combined in the other two reservoirs to create the desired blends. The question is to determine the quantity of each product to buy in order to maximize the profit.

The R1 reservoir receives two products of quality (e.g., sulfur content) 3 and 1 in quantities q0 and q1 (see for instance [12, 19]). The quality of the mixture contained in R1 is s = (3q0 + q1)/(q0 + q1) ∈ (1, 3). The R2 reservoir contains a product of the third source, of quality 2, in quantity q2. One wants to obtain in the reservoirs R3 and R4, of capacity 10 and 20, products of quality at most 2.5 and 1.5 respectively. Figure 2, where the variables x1 to x4 represent the corresponding quantities, illustrates this situation. The unit prices of the bought products are respectively 60, 160 and 100; those of the finished products are 90 and 150. The difference between the purchase and sale costs is therefore 60q0 + 160q1 + 100q2 − 90(x1 + x3) − 150(x2 + x4). The qualities of the final blends are (sx1 + 2x3)/(x1 + x3) and (sx2 + 2x4)/(x2 + x4). The addition of the volume conservation constraints q0 + q1 = x1 + x2 and q2 = x3 + x4 permits the elimination of the variables q0, q1 and q2:

q0 = (s − 1)(x1 + x2)/2,   q1 = (3 − s)(x1 + x2)/2,   q2 = x3 + x4.

[Figure 2: A problem of mixture. The sources of quality 3 and 1 (quantities q0, q1) feed R1, whose blend has quality s; the third source of quality 2 (quantity q2) feeds R2. Flows x1, x2 (quality s) from R1 and x3, x4 (quality 2) from R2 feed R3 (capacity 10, quality at most 2.5) and R4 (capacity 20, quality at most 1.5).]


The transformed mathematical model (the variable s is denoted x5) is as follows:

Minimize:  120x1 + 60x2 + 10x3 − 50x4 − 50x1x5 − 50x2x5
Subject to:
    2.5x1 + 0.5x3 − x1x5 − z1 = 0
    1.5x2 − 0.5x4 − x2x5 − z2 = 0
    x1 + x3 + z3 = 10
    x2 + x4 + z4 = 20
    x5 + z5 = 3
    x5 − z6 = 1
    0 ≤ xi, i = 1, ..., 5;   0 ≤ zi, i = 1, ..., 6.

There are 11 variables and 17 constraints. We use a = 0.05; the Fortran code furnishes the following optimal solution (see also Table 3):

xopt = (0, 10, 0, 10, 1)

Table 3: Results for Mixture

ksto     0       250     500     1000    2000
fopt     306     353     367     374     400
eval     2       28      51      105     28667
kend     3       7       7       9       101
cpu      0.00    0.02    0.36    1.27    38.53

We obtain the optimum value of the objective function (fopt = 400) for the Mixture problem with ksto = 2000.
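For readers who want to reproduce this model quickly, here is a hedged sketch of its five-variable form (before slack variables are added) using scipy's SLSQP as a stand-in local solver; this is not the paper's Fortran SPRGM code, and, being a local method on a nonconvex problem, it may stop at a local optimum rather than at the global value of −400, depending on the starting point, which is precisely the behaviour the stochastic perturbation is designed to overcome:

```python
import numpy as np
from scipy.optimize import minimize

# variables: x = (x1, x2, x3, x4, s), with s the shared quality in [1, 3]
def objective(x):
    x1, x2, x3, x4, s = x
    return 120*x1 + 60*x2 + 10*x3 - 50*x4 - 50*x1*s - 50*x2*s

cons = [
    {"type": "ineq", "fun": lambda x: 2.5*x[0] + 0.5*x[2] - x[0]*x[4]},  # quality of R3 <= 2.5
    {"type": "ineq", "fun": lambda x: 1.5*x[1] - 0.5*x[3] - x[1]*x[4]},  # quality of R4 <= 1.5
    {"type": "ineq", "fun": lambda x: 10.0 - x[0] - x[2]},               # capacity of R3
    {"type": "ineq", "fun": lambda x: 20.0 - x[1] - x[3]},               # capacity of R4
]
bounds = [(0, None)] * 4 + [(1, 3)]

x0 = np.array([1.0, 1.0, 1.0, 1.0, 2.0])     # arbitrary starting point (ours)
res = minimize(objective, x0, bounds=bounds, constraints=cons, method="SLSQP")
print(res.x, res.fun)    # the global optimum reported in the paper is
                         # x = (0, 10, 0, 10, 1) with objective value -400
```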

    5.4 Alkylation problem

The alkylation process is common in the petroleum industry; it is an important unit used in refineries to upgrade light olefins and isobutane into a much more highly valued gasoline component. The alkylate is usually blended into gasoline in order to improve its performance; see problem 5.3. A simplified process flow diagram of an alkylation process is shown in Figure 3 below. The process model seeks to determine the optimum set of operating conditions for the process, based on a mathematical model, which allows the maximization of profit. The problem is formulated as a direct nonlinear programming model with mixed nonlinear inequality and equality constraints and a nonlinear profit function to be maximized (see for instance [3]). A typical reaction [29] is

(CH3)2C=CH2 (olefin feed) + (CH3)3CH (isobutane make-up)  --fresh acid-->  (CH3)2CHCH2C(CH3)3 (alkylate product)

As shown in Figure 3, an olefin feed (100% butylene), a pure isobutane recycle and a 100% isobutane make-up stream are introduced in a reactor together with an acid


    catalyst. The reactor product stream is then passed through a fractionator where the

    isobutane and the alkylate product are separated. The spent acid is also removed

    from the reactor.

[Figure 3: Simplified alkylation process flow sheet. Olefin feed, isobutane make-up, isobutane recycle and fresh acid enter the reactor; the reactor product passes to a fractionator, which returns the isobutane recycle and delivers the hydrocarbon (alkylate) product; spent acid leaves the reactor.]

    The notation used is shown in Table 4 along with the upper and lower bounds on

    each variable. The bounds represent economic, physical and performance constraints.

Table 4: Variables and their bounds

Symbol   Variable                                  Lower bound   Upper bound
x1       Olefin feed (barrels/day)                 0             2000
x2       Isobutane recycle (barrels/day)           0             16000
x3       Acid addition rate (x1000 pounds/day)     0             120
x4       Alkylate yield (barrels/day)              0             5000
x5       Isobutane make-up (barrels/day)           0             2000
x6       Acid strength (wt.%)                      85            93
x7       Motor octane no.                          90            95
x8       External isobutane-olefin ratio           3             12
x9       Acid dilution factor                      1.2           4
x10      F-4 performance no.                       145           162

The external isobutane-to-olefin ratio, x8, was equal to the sum of the isobutane recycle, x2, and the isobutane make-up, x5, divided by the olefin feed, x1:

x8 = (x2 + x5) / x1.


The motor octane number, x7, was a function of the external isobutane-to-olefin ratio, x8, and the acid strength by weight percent, x6 (for the same reactor temperatures and acid strengths as for the alkylate yield x4):

x7 = 86.35 + 1.098x8 − 0.038x8² + 0.325(x6 − 89).

The acid strength by weight percent, x6, could be derived from an equation that expressed the acid addition rate, x3, as a function of the alkylate yield, x4, the acid dilution factor, x9, and the acid strength by weight percent, x6 (the added acid was assumed to have a strength of 98%):

1000x3 = x4x6x9 / (98 − x6).

This equation can be rewritten as quadratic constraints with the help of an artificial variable x11:

x3 = x6x11   and   98000x11 − 1000x3 = x4x9.

The alkylate yield, x4, was a function of the olefin feed, x1, and the external isobutane-to-olefin ratio, x8. The relationship, determined by holding the reactor temperature between 80°F and 90°F and the reactor acid strength by weight percent between 85 and 93, was:

x4 ≤ x1(1.12 + 0.13167x8 − 0.00667x8²).

By adding an artificial variable x12, one gets the equivalent quadratic constraints:

x4 = x1x12   and   x12 ≤ 1.12 + 0.13167x8 − 0.00667x8².
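For convenience, the relations above can be restated as plain functions in the original numbering x1, ..., x12 (a direct transcription of the correlations just given; the function names are our own):

```python
def external_ratio(x1, x2, x5):
    """x8: external isobutane-to-olefin ratio."""
    return (x2 + x5) / x1

def motor_octane(x6, x8):
    """x7: motor octane number correlation."""
    return 86.35 + 1.098*x8 - 0.038*x8**2 + 0.325*(x6 - 89.0)

def acid_addition_rate(x4, x6, x9):
    """x3 (x1000 pounds/day): acid balance 1000*x3 = x4*x6*x9/(98 - x6)."""
    return x4 * x6 * x9 / (98.0 - x6) / 1000.0

def alkylate_yield_bound(x1, x8):
    """Upper bound on the alkylate yield x4."""
    return x1 * (1.12 + 0.13167*x8 - 0.00667*x8**2)
```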

The objective function is linear in the cost of the purchased products x1, x3 and x5 and in the cost of the recycled isobutane x2, and depends linearly on the product of the alkylate quantity by the octane rating, x4x7.

The other constraints are physical bounds on the variables and linear relations. After the elimination of variables, a scaling and a


renumbering of the indices, one gets the following optimization problem:

Minimize:  614.88x1 − 171.5x2 − 6.3x1x4 + 4.27x1x5 − 3.5x2x5 + 10x3x6
Subject to:
    −0.325x3 + x4 − 1.098x5 + 0.038x5² + z1 = 57.425
    −0.13167x5 + x7 + 0.00667x5² + z2 = 1.12
    12.2x1 − 10x2 + z3 = 200,    12.2x1 − 10x2 − z4 = 0.1
    −10x2 + 12.2x1x5 − 10x2x5 + z5 = 1600,    −10x2 + 12.2x1x5 − 10x2x5 − z6 = 0.6
    5.346x1 − 980x6 − 0.666x1x4 + 10x2x5 = 0
    x1 − 1.22x1x7 + x2x7 = 0
    x3x6 + z7 = 120
    x1 + z8 = 32.786885,    x1 − z9 = 0.01
    x5 + z10 = 12,    x5 − z11 = 3
    x2 + z12 = 20,    x6 + z13 = 1.411765
    x3 + z14 = 95,    x3 − z15 = 85
    x4 + z16 = 95,    x4 − z17 = 92.66666
    0.8196722 + z18 = x7,
    xi ≥ 0, i = 1, ..., 7;    zj ≥ 0, j = 1, ..., 18.

There are 25 variables and 38 constraints; we use a = 10 and kmax = 100.

The SPRGM algorithm furnishes the following optimal solution (see also Table 5):

xopt = (30.4713, 20.0000, 90.9999, 94.1900, 10.4100, 1.0697, 1.7743)

Table 5: Results for Alkylation

ksto     0          250        500        1000       2000
fopt     1161.626   1176.192   1176.191   1176.186   1176.193
eval     2          274        531        1019       2063
kend     3          14         14         14         14
cpu      0.00       0.97       1.95       3.86       7.86

    5.5 Pooling problem

We present in Figure 4 (and Table 6) an example in which the intermediate pools are allowed to be interconnected (see for instance [12]). The pooling problem is thus extended to the case where exiting blends of some intermediate pools are feeds of others.

The proportion model of the generalized pooling problem is not a bilinear program, since the variables are not partitioned into two sets. Therefore, this formulation belongs to the class of quadratically constrained quadratic programs; cf. for instance [19].


[Figure 4: A generalized pooling problem with feeds F1, F2, F3, intermediate pools P1, P2 and blends B1, B2, B3. The flows include x11, x32, y12, y21, y23 and the pool-to-pool flow v12.]

Table 6: Characteristics of GP

Feed   Price ($/bbl)   Attribute quality   Max supply
F1     6               3                   18
F2     16              1                   18
F3     10              2                   18

Pool   Max capacity
P1     20
P2     20

Blend   Price ($/bbl)   Max demand   Attribute max
B1      9               10           2.5
B2      13              15           1.75
B3      14              20           1.5

A hybrid model for GP is given below, where v12 denotes the flow from P1 to P2 (all other variables are defined as before).

Maximize (over q, t, v, w, x, y):
    −6(x11 + y12 + v12 − q21(y12 + v12)) − 16q21(y12 + v12)
    − 10(x32 + y21 + y23 − v12) + 9(x11 + y21) + 13(y12 + x32) + 14y23

Subject to:

supply:
    x11 + y12 + v12 − q21(y12 + v12) + z1 = 18
    q21(y12 + v12) + z2 = 18
    x32 + y21 + y23 − v12 + z3 = 18

demand:
    x11 + y21 + z4 = 10
    y12 + x32 + z5 = 15
    y23 + z6 = 20

capacity:
    y12 + v12 + z7 = 20
    y21 + y23 + z8 = 20

attribute:
    (3 − 2q21)v12 + 2(y21 + y23 − v12) − t12(y21 + y23) = 0
    3x11 + t12y21 − 2.5(x11 + y21) + z9 = 0
    2x32 + (3 − 2q21)y12 − 1.75(x32 + y12) + z10 = 0
    t12 + z11 = 1.5

pos. flow:
    y21 + y23 − v12 − z12 = 0
    q21 + z13 = 1

    q ≥ 0, t ≥ 0, v ≥ 0, w ≥ 0, x ≥ 0, y ≥ 0, zi ≥ 0, i = 1, ..., 13.


There are 21 variables and 35 constraints; we use a = 50.

The SPRGM algorithm furnishes the following optimal solution (see also Table 7):

xopt = (x11, x32) = (0.1506, 2.6564);   yopt = (y12, y21, y23) = (0.7425, 0.0005, 19.9988);
vopt = v12 = 10.0013;   qopt = q21 = 0.9949;   topt = t12 = 1.5000.

Table 7: Results for Pooling

ksto     0       250     500     1000    2000
fopt     26.15   26.63   26.71   26.72   26.73
eval     2       181     441     813     1114
kend     3       5       5       5       5
cpu      0.00    0.03    0.03    0.11    0.17
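As a quick sanity check (our own computation, using only the objective stated above and the reported optimal flows), the profit evaluates to about 26.73, in agreement with Table 7 up to rounding of the printed digits:

```python
# reported optimal flows from the text above
x11, x32 = 0.1506, 2.6564
y12, y21, y23 = 0.7425, 0.0005, 19.9988
v12, q21 = 10.0013, 0.9949

f2 = q21 * (y12 + v12)                      # amount bought from F2
f1 = x11 + y12 + v12 - f2                   # amount bought from F1
f3 = x32 + y21 + y23 - v12                  # amount bought from F3
profit = (-6*f1 - 16*f2 - 10*f3
          + 9*(x11 + y21) + 13*(y12 + x32) + 14*y23)
print(round(profit, 2))                     # -> 26.73
```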

    6 Concluding remarks

We have presented a stochastic modification of the reduced gradient method for linear and nonlinear constraints, involving the adjunction of a stochastic perturbation. This approach leads to a stochastic descent method where the deterministic sequence generated by the reduced gradient is replaced by a sequence of random variables. We have established a global convergence result for this stochastic descent method.

The numerical experiments show that our method is effective for nonconvex optimization problems satisfying a nondegeneracy assumption. The use of stochastic perturbations generally improves the results furnished by the reduced gradient.

Here again, we observe that the adjunction of the stochastic perturbation improves the result, at the price of a larger number of evaluations of the objective function. The final points Xkend are very close or practically identical for ksto ≥ 250.

The main difficulty in the practical use of the stochastic perturbation is connected to the tuning of the parameters. Even if the combination with the deterministic reduced gradient increases the robustness, the values of a and ksto have an important influence on the quality of the result. The random number generator may also influence aspects such as the number of evaluations. All the sequences {ξ_k}_{k≥0} tested have furnished analogous results, but for different sets of parameters.

    References

[1] Beck, P., Lasdon, L. and Engquist, M. A reduced gradient algorithm for nonlinear network problems. ACM Trans. Math. Softw., 9 (1983), 57-70.

[2] Bouhadi, M., Ellaia, R. and Souza de Cursi, J. E. Random perturbations of the projected gradient for linearly constrained problems. Nonconvex Optim. Appl., 54 (2001), 487-499.


[3] Bracken, J. and McCormick, G. P. Selected applications of nonlinear programming, New York-London-Sydney: John Wiley and Sons, Inc. (1986), XII, 110 p.

[4] Carson, Y. and Maria, A. Simulation optimization: methods and applications, Proceedings of the Winter Simulation Conference, USA, 1997.

[5] Conn, A.R., Gould, N. and Toint, P.L. Methods for nonlinear constraints in optimization calculations, Inst. Math. Appl. Conf. Ser., New Ser., 63 (1997), 363-390.

[6] Dorea, C. C. Y. Stopping rules for a random optimization method. SIAM Journal on Control and Optimization, Vol. 28, No. 4 (1990), 841-850.

[7] Drud, A.S. CONOPT - A large-scale GRG code. ORSA J. Comput., Vol. 6, No. 2 (1994), 207-216.

[8] El Mouatasim, A. Stochastic perturbation in global optimization: analyses and application, PhD Thesis, Morocco: Mohammadia School of Engineers, 2007.

[9] El Mouatasim, A., Ellaia, R. and Souza de Cursi, J. E. Random perturbation of variable metric method for unconstrained nonsmooth global optimization. International Journal of Applied Mathematics and Computer Science, Vol. 16, No. 4 (2006), 463-474.

[10] Ellaia, R., El Mouatasim, A., Banga, J. R. and Sendin, O. H. NBI-RPRGM for multi-objective optimization design of bioprocesses, ESAIM: Proceedings, 20 (2007), 118-128.

[11] El Mouatasim, A., Ellaia, R. and Souza de Cursi, J.E. Projected variable metric method for linear constrained nonsmooth global optimization via stochastic perturbation. Submitted to RAIRO Journal (2008).

[12] Foulds, L.R., Haugland, D. and Jornsten, K. A bilinear approach to the pooling problem. Optimization, 24 (1992), 165-180.

[13] Floudas, C. A. and Pardalos, P. M. A collection of test problems for constrained global optimization algorithms, Lecture Notes in Computer Science, 455. Berlin: Springer-Verlag, XIV, 180 p., 1990.

[14] Ghosh, M. N. Uniform approximation of minimax point estimates. Ann. Math. Statist., 35 (1964), 1031-1047.

[15] Gould, N.I.M. and Toint, P.L. SQP methods for large-scale nonlinear programming, System modelling and optimization. Methods, theory and applications, 19th IFIP TC7 Conference, Cambridge, GB, July 12-16, 1999, Boston: Kluwer Academic Publishers, IFIP, Int. Fed. Inf. Process. 46, 149-178, 2000.


[16] Gourdin, E. Global optimization algorithms for the construction of least favorable priors and minimax estimators, Engineering Undergraduate Thesis, Évry, France: Institut d'Informatique d'Entreprise, 1989.

[17] Gourdin, E., Jaumard, B. and MacGibbon, B. Global optimization decomposition methods for bounded parameter minimax evaluation, SIAM Journal on Sci. Comput., Vol. 15, No. 1 (1994), 16-35.

[18] Graham, R. L. The largest small hexagon. Journal of Combinatorial Theory, Series A, 18 (1975), 165-170.

[19] Haverly, C. A. Studies of the behaviour of recursion for the pooling problem, ACM SIGMAP Bulletin, 25 (1996), 19-28.

[20] Hock, W. and Schittkowski, K. Test examples for nonlinear programming codes, Lecture Notes in Economics and Mathematical Systems, 187, Berlin-Heidelberg-New York: Springer-Verlag, V, 178 p., 1981.

[21] L'Ecuyer, P. and Touzin, R. On the Deng-Lin random number generators and related methods. Statistics and Computing, Vol. 14, No. 1 (2003), 5-9.

Johnstone, I.M. and MacGibbon, K.B. Minimax estimation of a constrained Poisson vector, Ann. Stat., Vol. 20, No. 2 (1992), 807-831.

[22] Kall, P. Solution methods in stochastic programming, Lect. Notes Control Inf. Sci., 197, 3-22, 1994.

[23] Kiwiel, K. C. Free-steering relaxation methods for problems with strictly convex costs and linear constraints, Math. Oper. Res., Vol. 22, No. 2 (1997), 326-349.

[24] Lasdon, L.S., Waren, A.D. and Ratner, M. Design and testing of a generalized reduced gradient code for nonlinear programming, ACM Trans. Math. Softw., 4 (1978), 34-50.

[25] Luenberger, D. G. Introduction to linear and nonlinear programming, Addison-Wesley Publishing Company, XII, 356 p., 1973.

[26] Martinez, J. M. A direct search method for nonlinear programming. ZAMM, Z. Angew. Math. Mech., Vol. 79, No. 4 (1999), 267-276.

Pogu, M. and Souza de Cursi, J.E. Global optimization by random perturbation of the gradient method with a fixed parameter. J. Glob. Optim., Vol. 5, No. 2 (1994), 159-180.

[27] Pine, S. H. Organic chemistry, New York-London: McGraw-Hill, 1987.

[28] Raber, U. Nonconvex all-quadratic global optimization problems: solution methods, application and related topics, Dissertation, Universität Trier, 1999.


[29] Reinhardt, K. Extremale Polygone gegebenen Durchmessers, Jber. Dtsch. Math.-Ver., 31 (1922), 251-270.

[30] Smeers, Y. Generalized reduced gradient method as an extension of feasible direction methods, J. Optimization Theory Appl., 22 (1977), 209-226.

[31] Sadegh, P. and Spall, J. C. Optimal random perturbations for stochastic approximation using a simultaneous perturbation gradient approximation, IEEE Trans. Autom. Control, Vol. 44, No. 1 (1999), 231-232.

[32] Stolpe, M. On models and methods for global optimization of structural topology, Doctoral Thesis, Stockholm, 2003.