Smoothed Fischer-Burmeister Equation Methods for the Complementarity Problem — Department of Engineering, University of Cambridge, hj231/Papers/SmoothedFB.pdf

Smoothed Fischer-Burmeister Equation Methods for the Complementarity Problem$^1$

Houyuan Jiang
CSIRO Mathematical and Information Sciences
GPO Box 664, Canberra, ACT 2601, Australia
Email: [email protected]

Abstract: By introducing another variable and an additional equation, we describe a technique to reformulate the nonlinear complementarity problem as a square system of equations. Some useful properties of this new reformulation are explored. These properties show that the new reformulation compares favourably with pure nonsmooth equation reformulations and with smoothing reformulations, because it combines advantages of both nonsmooth equation based methods and smoothing methods. A damped generalized Newton method is proposed for solving the reformulated equation. Global and local superlinear convergence can be established under mild assumptions. Numerical results are reported for a set of the standard test problems from the library MCPLIB.

AMS (MOS) Subject Classifications. 90C33, 65K10, 49M15.

Key Words. Nonlinear complementarity problem, Fischer-Burmeister functional, semismooth equation, Newton method, global convergence, superlinear convergence.

1 Introduction

We are concerned with the solution of the nonlinear complementarity problem (NCP) [35]. Let $F : \mathbb{R}^n \to \mathbb{R}^n$ be continuously differentiable. Then the NCP is to find a vector $x \in \mathbb{R}^n$ such that

$$x \ge 0, \quad F(x) \ge 0, \quad F(x)^T x = 0. \qquad (1)$$

Reformulating the NCP as a constrained or unconstrained smooth optimization problem, or as a constrained or unconstrained system of smooth or nonsmooth equations, has been a popular strategy in the last decade. Based on these reformulations, many algorithms such as merit function methods, smooth or nonsmooth equation methods, smoothing methods, and interior point methods have been proposed.
In almost all these methods, one usually tries to apply techniques from traditional nonlinear programming or from systems of smooth equations to the reformulated problem. Different descent methods have been developed for the NCP by solving the system of nonsmooth equations obtained from the Fischer-Burmeister functional [18]; see for example [10, 16, 17, 19, 25, 26, 27, 37, 42, 45]. In particular, global convergence of the damped generalized Newton method and the damped modified Gauss-Newton method for the Fischer-Burmeister reformulation of the NCP has been established in [25].

$^1$This work was carried out initially at The University of Melbourne and was supported by the Australian Research Council.
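The complementarity conditions in (1) can be monitored numerically through the natural residual $\min(x, F(x))$, which vanishes exactly at solutions of the NCP; the paper itself uses $\|\min(F(x^k), x^k)\|_\infty$ as a termination measure in Section 5. A minimal sketch (the helper `ncp_residual` and the toy map $F(x) = x - 1$ are illustrative choices of ours, not from the paper):

```python
import numpy as np

def ncp_residual(x, F):
    # || min(x, F(x)) ||_inf vanishes iff x >= 0, F(x) >= 0 and F(x)^T x = 0.
    return np.linalg.norm(np.minimum(x, F(x)), ord=np.inf)

# Toy NCP with F(x) = x - 1: the unique solution is x = 1,
# since x = 1 >= 0, F(1) = 0, and hence x^T F(x) = 0.
F = lambda x: x - 1.0
print(ncp_residual(np.array([1.0]), F))   # residual at the solution: 0.0
print(ncp_residual(np.array([0.5]), F))   # residual at a non-solution: 0.5
```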


A number of researchers have proposed and studied different smoothing methods. We refer the reader to [1, 2, 3, 4, 5, 6, 7, 14, 15, 20, 21, 23, 29, 30, 31, 32, 33, 41, 43, 44] and the references therein. The main feature of smoothing methods is to reformulate the NCP as a system of nonsmooth equations, and then to approximate this system by a sequence of systems of smooth equations obtained by introducing one or more parameters. Newton-type methods are applied to these smooth equations. Under certain assumptions, the solutions of the smooth systems converge to a solution of the NCP when the parameters are controlled appropriately. It seems that a great deal of effort is usually needed to establish global convergence of smoothing methods. The introduction of parameters results in underdetermined systems of equations, which, from our viewpoint, may be the reason that the global convergence analysis is complicated.

The use of smoothing methods based on the Fischer-Burmeister functional starts from Kanzow [29] for the linear complementarity problem. It has now become one of the main smoothing tools for solving the NCP and related problems. In particular, Kanzow [30] and Xu [44] have respectively proved global as well as local superlinear convergence of their smoothing methods for the NCP with uniform P-functions. Burke and Xu [1] proved global linear convergence of their smoothing method for the linear complementarity problem with both the $P_0$-matrix and $S_0$-matrix properties. Global convergence and local fast convergence analysis is usually complicated because special techniques are required in order to drive the smoothing parameter to zero. This feature seems to be shared by the other smoothing methods mentioned in the last paragraph.

Motivated by the above points, we shall introduce a technique to approximate the system of nonsmooth equations by a square system of smooth equations. This is accomplished by introducing a new parameter and a new equation.
The solvability of the generalized Newton equation of this system can be guaranteed under very mild conditions. Since the reformulated system still gives rise to a smooth merit function, it turns out that global convergence of the generalized Newton method can be established by following the standard analysis with some minor modifications. Moreover, the damped modified Gauss-Newton method for smooth equations can be extended to our system of nonsmooth equations without difficulty. We use the Fischer-Burmeister functional [18] to demonstrate the new technique, though it may be adapted for other smoothing methods.

In Section 2, the NCP is reformulated as a square system of equations by introducing a parameter and an additional equation and using the Fischer-Burmeister functional. We then study various properties, which include semismoothness of the new system, equivalence between the new system and the NCP, and differentiability of the least squares merit function of the new system. Section 3 is devoted to the study of sufficient conditions that ensure, respectively, nonsingularity of the generalized Newton equations, that a stationary point of the least squares merit function is a solution of the NCP, and boundedness of the level sets associated with the least squares merit function. In Section 4, we propose a damped generalized Newton method for solving this new system. Its global and local superlinear convergence can be established under mild conditions. Numerical results are reported for a set of the test problems from the library MCPLIB. We conclude the paper by offering some remarks in the last section.

The following notation is used throughout the paper. For vectors $x, y \in \mathbb{R}^n$, $x^T$ is the transpose of $x$, and thus $x^T y$ is the inner product of $x$ and $y$. $\|x\|$ denotes the Euclidean norm of the vector $x \in \mathbb{R}^n$. For a given matrix $M = (m_{ij}) \in \mathbb{R}^{n \times n}$ and index sets $\mathcal{I}, \mathcal{J} \subseteq \{1, \ldots, n\}$, $M_{\mathcal{I}\mathcal{J}}$ denotes the submatrix of $M$ associated with the row


indices in $\mathcal{I}$ and the column indices in $\mathcal{J}$. For a continuously differentiable functional $f : \mathbb{R}^n \to \mathbb{R}$, its gradient at $x$ is denoted by $\nabla f(x)$. If the function $F : \mathbb{R}^n \to \mathbb{R}^n$ is continuously differentiable at $x$, then $F'(x)$ denotes its Jacobian at $x$. If $F : \mathbb{R}^n \to \mathbb{R}^n$ is locally Lipschitz continuous at $x$, then $\partial F(x)$ denotes its Clarke generalized Jacobian at $x$ [8]. The notation (A) $\iff$ (B) means that the statements (A) and (B) are equivalent.

2 Reformulations and Equivalence

In order to reformulate the NCP (1), let us recall two basic functions. The first one is now known as the Fischer-Burmeister functional [18], defined by $\phi : \mathbb{R}^2 \to \mathbb{R}$,

$$\phi(b, c) \equiv \sqrt{b^2 + c^2} - (b + c).$$

The second one, denoted by $\psi : \mathbb{R}^3 \to \mathbb{R}$, is a modification of $\phi$, or a variation of the counterpart of $\phi$ in $\mathbb{R}^3$. More precisely, $\psi : \mathbb{R}^3 \to \mathbb{R}$ is defined by

$$\psi(a, b, c) \equiv \sqrt{a^2 + b^2 + c^2} - (b + c).$$

Note that the function $\psi$ was introduced for the study of linear complementarity problems by Kanzow in [29], where $a$ is treated as a parameter rather than an independent variable. Using these two functionals, we define two functions associated with the NCP as follows. For any given $x \in \mathbb{R}^n$ and $\mu \in \mathbb{R}$, define $H : \mathbb{R}^n \to \mathbb{R}^n$ by

$$H(x) \equiv \begin{pmatrix} \phi(x_1, F_1(x)) \\ \vdots \\ \phi(x_n, F_n(x)) \end{pmatrix}$$

and $G : \mathbb{R}^{n+1} \to \mathbb{R}^{n+1}$, $\tilde G : \mathbb{R}^{n+1} \to \mathbb{R}^n$ by

$$G(\mu, x) \equiv \begin{pmatrix} e^\mu - 1 \\ \psi(\mu, x_1, F_1(x)) \\ \vdots \\ \psi(\mu, x_n, F_n(x)) \end{pmatrix} \equiv \begin{pmatrix} e^\mu - 1 \\ \tilde G(\mu, x) \end{pmatrix},$$

where $e$ is the Euler constant (the base of the natural logarithm). Consequently, we may define two systems of equations:

$$H(x) = 0 \qquad (2)$$

and

$$G(\mu, x) = 0. \qquad (3)$$

Note that the first system has been extensively studied for the NCP (see for example [10, 16, 17, 19, 25, 26, 27, 37, 42, 45] and the references therein). If the first equation is removed from the second system, then it reduces to the system introduced by Kanzow [29] for proposing smoothing or continuation methods to solve the LCP.
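The two systems (2) and (3) are straightforward to assemble numerically. The sketch below (the function names are ours, and the toy map $F(x) = x - 1$ is illustrative) builds $\phi$, $\psi$, $H$ and $G$ for a given $F$; at $\mu = 0$ and a solution of the NCP, every component of $G$ vanishes:

```python
import numpy as np

def phi(b, c):
    # Fischer-Burmeister functional: phi(b, c) = sqrt(b^2 + c^2) - (b + c).
    return np.sqrt(b**2 + c**2) - (b + c)

def psi(a, b, c):
    # Smoothed variant: psi(a, b, c) = sqrt(a^2 + b^2 + c^2) - (b + c).
    return np.sqrt(a**2 + b**2 + c**2) - (b + c)

def H(x, F):
    # System (2): H(x)_i = phi(x_i, F_i(x)).
    return phi(x, F(x))

def G(mu, x, F):
    # Square system (3): first row e^mu - 1, remaining rows psi(mu, x_i, F_i(x)).
    return np.concatenate(([np.exp(mu) - 1.0], psi(mu, x, F(x))))

# With F(x) = x - 1 the NCP solution is x = (1, ..., 1), and G(0, x) = 0.
F = lambda x: x - 1.0
x_star = np.ones(3)
print(np.linalg.norm(G(0.0, x_star, F)))  # 0.0
```

Note that, in line with Lemma 2.2 below, $\psi(0, b, c) = \phi(b, c)$, so `G(0.0, x, F)[1:]` coincides with `H(x, F)`.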
Thereafter, this smoothing technique has been used for solving other related problems (see for example [1, 15, 20, 23, 29, 30, 31, 44]).

The novelty of this paper is the introduction of the first equation, which makes (3) a square system. As will be seen later, this new feature overcomes some difficulties


encountered by generalized Newton-type methods based on the system (2), and facilitates the analysis of global convergence, which is, from our point of view, usually complicated in smoothing methods. Some nice properties established for methods based on the system (2) can also be established for similar methods based on (3). Moreover, our analysis is much closer in spirit to the classical Newton method than smoothing methods are. The global convergence analysis of the generalized Newton and modified Gauss-Newton methods for the system (2) has been carried out in [25]. In the sequel, the second system will be the main one considered, although some connections and differences between (2) and (3) are explored.

One may define other functions that play the same role as $e^\mu - 1$. For simplicity of analysis, we use this particular function in the sequel. See the discussion in Section 6 for more details on how to define such functions.

The least squares merit functions of $H$ and $G$ are denoted by $\Psi$ and $\Theta$, namely,

$$\Psi(x) \equiv \tfrac{1}{2}\|H(x)\|^2, \qquad \Theta(\mu, x) \equiv \tfrac{1}{2}\|G(\mu, x)\|^2.$$

$\Psi$ and $\Theta$ are usually called merit functions.

The definitions of the functions $H$ and $G$ depend heavily on the functionals $\phi$ and $\psi$ respectively. Certainly, the study of some fundamental properties of $\phi$ and $\psi$ will help to gain more insight into the functions $H$ and $G$. Let $E : \mathbb{R}^n \to \mathbb{R}^n$ be locally Lipschitz continuous at $x \in \mathbb{R}^n$. Then the Clarke generalized Jacobian $\partial E(x)$ of $E$ at $x$ is well-defined and can be characterized as the convex hull of the set

$$\left\{ \lim_{x^k \to x} E'(x^k) \;\middle|\; E \text{ is differentiable at } x^k \in \mathbb{R}^n \right\}.$$

$\partial E(x)$ is a nonempty, convex and compact set for any fixed $x$ [8]. $E$ is said to be semismooth at $x \in \mathbb{R}^n$ if it is directionally differentiable at $x$, i.e., $E'(x; d)$ exists for any $d \in \mathbb{R}^n$, and if

$$V d - E'(x; d) = o(\|d\|)$$

for any $d \to 0$ and $V \in \partial E(x + d)$.
$E$ is said to be strongly semismooth at $x$ if it is semismooth at $x$ and

$$V d - E'(x; d) = O(\|d\|^2).$$

See [39, 36, 19] for other characterizations and for the differential calculus of semismoothness and strong semismoothness.

We now present some properties of $\psi$, $G$ and $\Theta$. Note that similar properties of $\phi$, $H$ and $\Psi$ have been studied in [10, 17, 18, 22, 27, 28].

Lemma 2.1 (i) When $a = 0$, $\psi(a, b, c) = 0$ if and only if $b \ge 0$, $c \ge 0$ and $bc = 0$.

(ii) $\psi$ is locally Lipschitz, directionally differentiable and strongly semismooth on $\mathbb{R}^3$. Furthermore, if $a^2 + b^2 + c^2 > 0$, then $\psi$ is continuously differentiable at $(a, b, c) \in \mathbb{R}^3$; namely, $\psi$ is continuously differentiable except at $(0, 0, 0)$. The generalized Jacobian of $\psi$ at $(0, 0, 0)$ is

$$\partial \psi(0, 0, 0) = \Omega \equiv \left\{ (\alpha, \beta, \gamma)^T \;\middle|\; \alpha^2 + (\beta + 1)^2 + (\gamma + 1)^2 \le 1 \right\}.$$


(iii) $\psi^2$ is smooth on $\mathbb{R}^3$. The gradient of $\psi^2$ at $(a, b, c) \in \mathbb{R}^3$ is

$$\nabla \psi^2(a, b, c) = 2 \psi(a, b, c) \, \partial \psi(a, b, c).$$

(iv) $\partial_b \psi(a, b, c) \, \partial_c \psi(a, b, c) \ge 0$ for any $(a, b, c) \in \mathbb{R}^3$. If $\psi(0, b, c) \ne 0$, then $\partial_b \psi(0, b, c) \, \partial_c \psi(0, b, c) > 0$.

(v) $\psi^2(0, b, c) = 0 \iff \partial_b \psi^2(0, b, c) = 0 \iff \partial_c \psi^2(0, b, c) = 0 \iff \left( \partial_b \psi^2(0, b, c) = \partial_c \psi^2(0, b, c) = 0 \right)$.

Proof. (i) Note that $\psi(0, b, c) = \phi(b, c)$. The result can be verified easily.

(ii) Note that $\sqrt{a^2 + b^2 + c^2}$ is the Euclidean norm of the vector $(a, b, c)^T$. Hence $\sqrt{a^2 + b^2 + c^2}$ is locally Lipschitz, directionally differentiable and strongly semismooth on $\mathbb{R}^3$. $-(b + c)$ is continuously differentiable on $\mathbb{R}^3$, hence locally Lipschitz, directionally differentiable and strongly semismooth on $\mathbb{R}^3$. Fischer [19] has proved that the composition of strongly semismooth functions is still strongly semismooth. Therefore, $\psi$ is locally Lipschitz, directionally differentiable and strongly semismooth on $\mathbb{R}^3$. If $a^2 + b^2 + c^2 > 0$, then $\sqrt{a^2 + b^2 + c^2}$ is continuously differentiable at $(a, b, c)$, and so is $\psi$.

Let $d \in \mathbb{R}^3$ and $d \ne 0$. Then $\psi$ is continuously differentiable at $t d$ for any $t > 0$, and

$$\nabla \psi(t d) = \left( \frac{d_1}{\sqrt{d_1^2 + d_2^2 + d_3^2}}, \;\; \frac{d_2}{\sqrt{d_1^2 + d_2^2 + d_3^2}} - 1, \;\; \frac{d_3}{\sqrt{d_1^2 + d_2^2 + d_3^2}} - 1 \right)^T.$$

For simplicity, denote $\nabla \psi(t d)$ by $(\alpha, \beta, \gamma)^T$. Clearly,

$$\alpha^2 + (\beta + 1)^2 + (\gamma + 1)^2 = 1.$$

Let $t$ tend to zero. By the semicontinuity property of the Clarke Jacobian, we obtain that

$$(\alpha, \beta, \gamma)^T \in \partial \psi(0, 0, 0).$$

It follows from the convexity of the generalized Jacobian that

$$\Omega \subseteq \partial \psi(0, 0, 0).$$

On the other hand, for any $(a, b, c) \ne 0$,

$$(\nabla_a \psi(a, b, c))^2 + (\nabla_b \psi(a, b, c) + 1)^2 + (\nabla_c \psi(a, b, c) + 1)^2 = 1.$$

By the definition of the Clarke generalized Jacobian, one may conclude that

$$\partial \psi(0, 0, 0) \subseteq \Omega.$$

This shows that $\partial \psi(0, 0, 0) = \Omega$.

(iii) Since $\psi$ is smooth everywhere on $\mathbb{R}^3$ except at $(0, 0, 0)$, $(0, 0, 0)$ is the only point at which $\psi^2$ is possibly not smooth. But it is easy to prove that $\psi^2$ is also smooth at $(0, 0, 0)$. Therefore, $\psi^2$ is smooth on $\mathbb{R}^3$.
Furthermore,

$$\nabla \psi^2(a, b, c) = 2 \psi(a, b, c) \, \partial \psi(a, b, c).$$

Note that $2 \psi(0, 0, 0) \, \partial \psi(0, 0, 0) = \{0\}$ is a singleton even though $\partial \psi(0, 0, 0) = \Omega$ is a set.

(iv) By (ii), for any $(a, b, c) \in \mathbb{R}^3$ and any $(\alpha, \beta, \gamma)^T \in \partial \psi(a, b, c)$, we have

$$\alpha^2 + (\beta + 1)^2 + (\gamma + 1)^2 \le 1.$$


This shows that $\beta \gamma \ge 0$. Suppose $\psi(0, b, c) \ne 0$. Then either $\min\{b, c\} < 0$ or $bc \ne 0$. In both cases, (ii) implies that $\beta \ne 0$ and $\gamma \ne 0$. Consequently, $\beta \gamma > 0$.

(v) Clearly, if $\psi^2(0, b, c) = 0$, then (iii) implies all the other statements. If either $\partial_b \psi^2(0, b, c) = 0$ or $\partial_c \psi^2(0, b, c) = 0$, then we must have $\psi^2(0, b, c) = 0$; for if this were not so, (iv) would imply that $\partial_b \psi(0, b, c) \, \partial_c \psi(0, b, c) > 0$, which is a contradiction. The proof is complete. □

Proposition 2.1 (i) If $(\mu, x)$ is a solution of (3), then $\mu = 0$. Moreover, $x$ is a solution of the NCP if and only if $(0, x)$ is a solution of (3), i.e. $G(0, x) = 0$.

(ii) $G$ is continuously differentiable at $(\mu, x)$ when $\mu \ne 0$ and $F$ is continuously differentiable at $x$. $G$ is semismooth on $\mathbb{R}^{n+1}$ if $F$ is continuously differentiable on $\mathbb{R}^n$, and $G$ is strongly semismooth on $\mathbb{R}^{n+1}$ if $F'(x)$ is Lipschitz continuous on $\mathbb{R}^n$. If $V \in \partial G(\mu, x)$, then $V$ has the form

$$V = \begin{pmatrix} e^\mu & 0 \\ C & D F'(x) + E \end{pmatrix},$$

where $C \in \mathbb{R}^n$, and both $D$ and $E$ are diagonal matrices in $\mathbb{R}^{n \times n}$ satisfying

$$C_i = \frac{\mu}{\sqrt{\mu^2 + x_i^2 + F_i(x)^2}}, \quad D_{ii} = \frac{x_i}{\sqrt{\mu^2 + x_i^2 + F_i(x)^2}} - 1, \quad E_{ii} = \frac{F_i(x)}{\sqrt{\mu^2 + x_i^2 + F_i(x)^2}} - 1$$

if $\mu^2 + x_i^2 + F_i(x)^2 > 0$, and

$$C_i = \alpha_i, \quad D_{ii} = \beta_i, \quad E_{ii} = \gamma_i$$

with $\alpha_i^2 + (\beta_i + 1)^2 + (\gamma_i + 1)^2 \le 1$ if $\mu^2 + x_i^2 + F_i(x)^2 = 0$.

(iii) $\Theta(\mu, x) \ge 0$ for any $(\mu, x) \in \mathbb{R}^{n+1}$. When the NCP has a solution, $x$ is a solution of the NCP if and only if $(0, x)$ is a global minimizer of $\Theta$ over $\mathbb{R}^{n+1}$.

(iv) $\Theta$ is continuously differentiable on $\mathbb{R}^{n+1}$. The gradient of $\Theta$ at $(\mu, x)$ is

$$\nabla \Theta(\mu, x) = V^T G(\mu, x) = \begin{pmatrix} e^\mu (e^\mu - 1) + C^T \tilde G(\mu, x) \\ F'(x)^T D \tilde G(\mu, x) + E \tilde G(\mu, x) \end{pmatrix}$$

for any $V \in \partial G(\mu, x)$.

(v) In (iv), for any $\mu$ and $x$,

$$(D \tilde G(\mu, x))_i \, (E \tilde G(\mu, x))_i \ge 0, \quad 1 \le i \le n.$$

If $\tilde G_i(0, x) \ne 0$, then $(D \tilde G(0, x))_i \, (E \tilde G(0, x))_i > 0$.


(vi) The following four statements are equivalent:

$$\tilde G_i(0, x) = 0; \quad (D \tilde G(0, x))_i = 0; \quad (E \tilde G(0, x))_i = 0; \quad (D \tilde G(0, x))_i = (E \tilde G(0, x))_i = 0.$$

Proof. (i) If $G(\mu, x) = 0$, then $e^\mu - 1 = 0$, i.e., $\mu = 0$. The rest follows from (i) of Lemma 2.1.

(ii) When $\mu \ne 0$ and $F$ is continuously differentiable at $x$, $\psi(\mu, x_i, F_i(x))$ is continuously differentiable at $(\mu, x)$ for $1 \le i \le n$. Hence $G$ is continuously differentiable at $(\mu, x)$. Note that the composition of any two semismooth functions or strongly semismooth functions is semismooth or strongly semismooth respectively (see [19]). Since $\psi$ is strongly semismooth on $\mathbb{R}^3$ by (ii) of Lemma 2.1, semismoothness or strong semismoothness of $G$ follows if $F$ is smooth at $x$ or if $F'$ is Lipschitz continuous at $x$, respectively. The form of an element $V$ of $\partial G(\mu, x)$ follows from the Chain Rule Theorem (Theorem 2.3.9 of [8]) and the form of the generalized Jacobian of $\psi$ in (ii) of Lemma 2.1. It should be pointed out that, unlike for $\partial \psi$, we only manage to give an outer estimate of $\partial G(\mu, x)$. Nevertheless, this outer estimate is enough for the following analysis.

(iii) Trivially, $\Theta(\mu, x) \ge 0$ for any $(\mu, x)$. If $x$ is a solution of the NCP, (i) shows that $G(0, x) = 0$, i.e., $(0, x)$ is a global minimizer of $\Theta$. Conversely, if the NCP has a solution, then the global minimum of $\Theta$ is zero. If, in addition, $(0, x)$ is a global minimizer of $\Theta$, then $\Theta(0, x) = 0$ and $G(0, x) = 0$. The desired result follows from (i) again.

(iv) $\Theta$ can be rewritten as follows:

$$\Theta(\mu, x) = \frac{1}{2}(e^\mu - 1)^2 + \frac{1}{2} \sum_{i=1}^n \psi(\mu, x_i, F_i(x))^2.$$

The smoothness of $\Theta$ over $\mathbb{R}^{n+1}$ follows from the smoothness of $F$ and $\psi^2$. The form of $\nabla \Theta$ follows from the Chain Rule Theorem and the smoothness of $\Theta$.

(v) and (vi) The proof is analogous to that of (iv) and (v) of Lemma 2.1 and is omitted. □

Remark. Let $W$ denote the set of all matrices $D F'(x) + E$ such that there exists a vector $C$ which makes the matrix

$$\begin{pmatrix} 1 & 0 \\ C & D F'(x) + E \end{pmatrix}$$

an element of $\partial G(0, x)$. On the one hand, any element of $\partial G(0, x)$ is very much like an element of $\partial H(x)$, and $\partial H(x) \subseteq W$.
Because of this similarity, some standard analysis of $\partial H(x)$ can be extended to $\partial G(0, x)$, as we shall see in the next section. On the other hand, we must be aware that $\partial H(x)$ and $W$ are not the same in general; see [8] for more details. Therefore, some extra care needs to be taken when we say that techniques for $\partial H$ can be extended to $W$ or $\partial G(0, x)$.

The results below reveal that $\psi$, $\tilde G$ and $\Theta$ reduce to $\phi$, $H$ and $\Psi$ when $\mu = 0$. Further relationships between them can be explored, but we do not proceed here.

Lemma 2.2 (i) $\psi(0, b, c) = \phi(b, c)$ for any $b, c \in \mathbb{R}$.


(ii) $\tilde G(0, x) = H(x)$ for any $x \in \mathbb{R}^n$.

(iii) $\Theta(0, x) = \Psi(x)$ for any $x \in \mathbb{R}^n$.

3 Basic Properties

In this section, some basic properties of the functions $G$ and $\Theta$ are investigated. These properties include nonsingularity of the generalized Jacobian of $G$, sufficient conditions for a stationary point of $\Theta$ to be a solution of the NCP, and boundedness of the level sets of the merit function $\Theta$.

In the context of nonlinear complementarity, the notions of monotone matrices, monotone functions and other related concepts play important roles. We review some of them in the following.

A matrix $M \in \mathbb{R}^{n \times n}$ is called a P-matrix ($P_0$-matrix) if each of its principal minors is positive (nonnegative). A function $F : \mathbb{R}^n \to \mathbb{R}^n$ is said to be a $P_0$-function over the open set $S \subseteq \mathbb{R}^n$ if for any $x, y \in S$ with $x \ne y$, there exists $i$ such that $x_i \ne y_i$ and

$$(x_i - y_i)(F_i(x) - F_i(y)) \ge 0.$$

$F$ is a uniform P-function over $S$ if there exists a positive constant $\kappa$ such that for any $x, y \in S$,

$$\max_{1 \le i \le n} (x_i - y_i)(F_i(x) - F_i(y)) \ge \kappa \|x - y\|^2.$$

Obviously, a P-matrix must be a $P_0$-matrix, and a uniform P-function must be a $P_0$-function. It is well known that the Jacobian of a $P_0$-function is always a $P_0$-matrix and the Jacobian of a uniform P-function is a P-matrix (see [9, 34]).

The following characterization of a $P_0$-matrix can be found in Theorem 3.4.2 of [9].

Lemma 3.1 A matrix $M \in \mathbb{R}^{n \times n}$ is a $P_0$-matrix if and only if for every nonzero $x$ there exists an index $i$ ($1 \le i \le n$) such that $x_i \ne 0$ and $x_i (M x)_i \ge 0$.

To guarantee nonsingularity of the generalized Jacobian of $G$ at a solution of (3), R-regularity, introduced by Robinson [40], will be shown to be one of the sufficient conditions. Suppose $x^*$ is a solution of the NCP (1).
Define three index sets:

$$\mathcal{I} := \{1 \le i \le n \mid x_i^* > 0 = F_i(x^*)\},$$
$$\mathcal{J} := \{1 \le i \le n \mid x_i^* = 0 = F_i(x^*)\},$$
$$\mathcal{K} := \{1 \le i \le n \mid x_i^* = 0 < F_i(x^*)\}.$$

The NCP is said to be R-regular at $x^*$ if the submatrix $F'(x^*)_{\mathcal{I}\mathcal{I}}$ of $F'(x^*)$ is nonsingular and the Schur complement

$$F'(x^*)_{\mathcal{J}\mathcal{J}} - F'(x^*)_{\mathcal{J}\mathcal{I}} F'(x^*)_{\mathcal{I}\mathcal{I}}^{-1} F'(x^*)_{\mathcal{I}\mathcal{J}}$$

is a P-matrix.

Proposition 3.1 (i) If $\mu \ne 0$ and $F'(x)$ is a $P_0$-matrix, then $V$ is nonsingular for any $V \in \partial G(\mu, x)$.

(ii) If $F'(x)$ is a P-matrix, then $V$ is nonsingular for any $V \in \partial G(\mu, x)$.


(iii) If $\mu = 0$ and the NCP is R-regular at $x^*$, then $V$ is nonsingular for any $V \in \partial G(0, x^*)$.

Proof. From the definition of the generalized Jacobian of $G(\mu, x)$, it follows that for any $V \in \partial G(\mu, x)$, $V$ is nonsingular if and only if the following submatrix of $V$ is nonsingular:

$$D F'(x) + E.$$

(i) If $\mu \ne 0$, then both $-D$ and $-E$ are positive definite diagonal matrices. The nonsingularity of $D F'(x) + E$ is equivalent to the nonsingularity of the matrix $F'(x) + D^{-1} E$, with $D^{-1} E$ a positive definite diagonal matrix. It follows that $F'(x) + D^{-1} E$ is a P-matrix, hence nonsingular, if $F'(x)$ is a $P_0$-matrix.

(ii) If $F'(x)$ is a P-matrix, as remarked after Proposition 2.1, the technique for proving nonsingularity of the matrix $D F'(x) + E$ is quite standard. We omit the details here and refer the reader to [27] for a proof.

(iii) If $\mu = 0$ and the NCP is R-regular at $x^*$, the techniques for proving nonsingularity of $D F'(x) + E$ are also standard; see for example [17]. Therefore, nonsingularity of every element of $\partial G(0, x^*)$ follows from nonsingularity of $D F'(x^*) + E$. □

The next result provides a sufficient condition for a stationary point of the least squares merit function to be a solution of the NCP.

Proposition 3.2 If $(\mu, x)$ is a stationary point of $\Theta$ and $F'(x)$ is a $P_0$-matrix, then $\mu = 0$ and $x$ is a solution of the NCP.

Proof. Suppose $(\mu, x)$ is a stationary point of $\Theta$, i.e., $\nabla \Theta(\mu, x) = 0$. By Proposition 2.1 (iv), $\nabla \Theta(\mu, x) = V^T G(\mu, x) = 0$ for any $V \in \partial G(\mu, x)$. We first prove that $\mu = 0$. Assume, to the contrary, that $\mu \ne 0$. Then $V$ is nonsingular by Proposition 3.1, which shows that $G(\mu, x) = 0$ and hence, by Proposition 2.1 (i), $\mu = 0$. This is a contradiction. Therefore, $\mu = 0$. In this case, $V^T G(0, x) = 0$ implies that

$$F'(x)^T D \tilde G(0, x) + E \tilde G(0, x) = 0,$$

and hence, for each $i$,

$$(D \tilde G(0, x))_i (F'(x)^T D \tilde G(0, x))_i + (D \tilde G(0, x))_i (E \tilde G(0, x))_i = 0.$$

Suppose $\tilde G_i(0, x) \ne 0$ for some index $i$. By (v) and (vi) of Proposition 2.1,

$$(D \tilde G(0, x))_i (F'(x)^T D \tilde G(0, x))_i < 0$$

for any index $i$ such that $\tilde G_i(0, x) \ne 0$. By Lemma 3.1, $F'(x)^T$ and $F'(x)$ are not $P_0$-matrices. This is a contradiction.
Therefore,

$$\tilde G(0, x) = 0,$$

which, together with $\mu = 0$, shows that $G(\mu, x) = 0$. The desired result follows from (i) of Proposition 2.1. □

Lemma 3.2 If $F$ is a uniform P-function on $\mathbb{R}^n$ and $\{x^k\}$ is an unbounded sequence, then there exists $i$ ($1 \le i \le n$) such that both the sequences $\{x_i^k\}$ and $\{F_i(x^k)\}$ are unbounded.


Proof. See the proof of Proposition 4.2 of Jiang and Qi [27]. □

Lemma 3.3 Suppose that $\{(a^k, b^k, c^k)\}$ is a sequence such that $\{a^k\}$ is bounded while $\{b^k\}$ and $\{c^k\}$ are unbounded. Then $\{\psi(a^k, b^k, c^k)\}$ is unbounded.

Proof. Without loss of generality, we may assume that $|b^k| \to \infty$ and $|c^k| \to \infty$ as $k$ tends to infinity. By the definition of $\psi$, it is clear that $\psi^k \to +\infty$ if either $b^k$ or $c^k$ tends to $-\infty$. Now assume that $b^k \to +\infty$ and $c^k \to +\infty$. Then for sufficiently large $k$, it follows that

$$|\psi(a^k, b^k, c^k)| = \frac{2 b^k c^k - (a^k)^2}{\sqrt{(a^k)^2 + (b^k)^2 + (c^k)^2} + b^k + c^k} = \frac{2 \max\{b^k, c^k\} \min\{b^k, c^k\} - (a^k)^2}{\sqrt{(a^k)^2 + (b^k)^2 + (c^k)^2} + b^k + c^k} \ge \frac{2 \max\{b^k, c^k\} \min\{b^k, c^k\} - (a^k)^2}{\sqrt{(a^k)^2 + 2(\max\{b^k, c^k\})^2} + 2 \max\{b^k, c^k\}}.$$

Hence, it follows from the boundedness of $\{a^k\}$ that $\{\psi^k\}$ is unbounded. This completes the proof. □

Proposition 3.3 If $F$ is a uniform P-function on $\mathbb{R}^n$ and $\{\mu_k\}$ is bounded, then the level sets

$$L_k(\nu) \equiv \{(\mu_k, x) : \Theta(\mu_k, x) \le \nu\}$$

are bounded for any $\nu \ge 0$.

Proof. Assume that the sets $L_k(\nu)$ are not bounded. Then there exists an unbounded sequence $\{(\mu_k, x^k)\}$ such that $\Theta(\mu_k, x^k) \le \nu$. This implies that $\{x^k\}$ is unbounded, by the boundedness of $\{\mu_k\}$. By Lemma 3.2, there exists an index $i$ such that both $\{x_i^k\}$ and $\{F_i(x^k)\}$ are unbounded. Lemma 3.3 shows that $\{\psi(\mu_k, x_i^k, F_i(x^k))\}$ is unbounded. Hence $\{\Theta(\mu_k, x^k)\}$ is unbounded. This is a contradiction. Therefore, the sets $L_k(\nu)$ are bounded for any $\nu \ge 0$. □

4 A Damped Generalized Newton Method and Convergence

In this section, we develop a generalized Newton method for the system (3). The method contains two main steps. The first is to define a search direction, which we call the Newton step, by solving the so-called generalized Newton equation

$$V d = -G(\mu, x), \qquad (4)$$

where $V \in \partial G(\mu, x)$. The generalized Newton equation can be rewritten as

$$e^\mu d_\mu = -(e^\mu - 1),$$
$$C d_\mu + (D F'(x) + E) d_x = -\tilde G(\mu, x),$$

where $\tilde G(\mu, x)$ is defined as in Section 2. The second main step is to perform a line search along the generalized Newton step to decrease the merit function $\Theta$.
The full description of our method is as follows. For simplicity, let $z = (\mu, x)$, $z^+ = (\mu^+, x^+)$ and $z^k = (\mu_k, x^k)$; similarly, $d^k = (d_\mu^k, d_x^k)$, etc.

Algorithm 1 (Damped generalized Newton method)


Step 1 (Initialization) Choose an initial starting point $z^0 = (\mu_0, x^0) \in \mathbb{R}^{n+1}$ such that $\mu_0 > 0$, choose two scalars $\sigma, \rho \in (0, 1)$, and let $k := 0$.

Step 2 (Search direction) Choose $V_k \in \partial G(z^k)$ and solve the generalized Newton equation (4) with $\mu = \mu_k$, $z = z^k$ and $V = V_k$. Let $d^k$ be a solution of this equation. If $d = 0$ is a solution of the generalized Newton equation, the algorithm terminates. Otherwise, go to Step 3.

Step 3 (Line search) Let $\lambda_k = \rho^{i_k}$, where $i_k$ is the smallest nonnegative integer $i$ such that

$$\Theta(z^k + \rho^i d^k) - \Theta(z^k) \le \sigma \rho^i \nabla \Theta(z^k)^T d^k.$$

Step 4 (Update) Let $z^{k+1} := z^k + \lambda_k d^k$ and $k := k + 1$. Go to Step 2.

The above generalized Newton method reduces to the classical damped Newton method if $G$ is smooth; see Dennis and Schnabel [11]. A similar algorithm for solving the system (2) is proposed in [25]. It has long been recognized that non-monotone line search strategies are superior to the monotone line search strategy from a numerical point of view. As will be seen later, we implement a non-monotone line search in our numerical experiments. In a non-monotone version of the damped generalized Newton method, $\Theta(z^k)$ on the left-hand side of the inequality in Step 3 is replaced by

$$\max\{\Theta(z^k), \Theta(z^{k-1}), \ldots, \Theta(z^{k-l})\},$$

where $l$ is a positive integer. When $l = 0$, the non-monotone line search coincides with the monotone line search.

Lemma 4.1 If $G(z) \ne 0$ and the generalized Newton equation (4) is solvable at $z$, then its solution $d$ is a descent direction of the merit function $\Theta$ at $z$, that is, $\nabla \Theta(z)^T d < 0$. Furthermore, the line search step is well-defined at $z$.

Proof. This follows immediately from the differentiability of $\Theta$ and the generalized Newton equation. □

Since $\Theta$ is continuously differentiable on $\mathbb{R}^{n+1}$, it is easy to see that Algorithm 1 is well-defined provided that the generalized Newton direction is well-defined at each step. In Step 2, the existence of the search direction depends on the solvability of the generalized Newton equation.
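Since $\mu_k$ stays positive throughout (Lemma 4.2 below), $G$ is smooth at every iterate and $V_k$ is the ordinary Jacobian, whose entries are given by the $\mu \ne 0$ branch of Proposition 2.1 (ii). The four steps above can then be sketched as follows. This is an illustrative implementation of ours, not the authors' MATLAB code: it uses the monotone Armijo line search of Step 3 and omits the non-monotone variant and the safeguards described in Section 5.

```python
import numpy as np

def psi(a, b, c):
    return np.sqrt(a**2 + b**2 + c**2) - (b + c)

def G(z, F):
    mu, x = z[0], z[1:]
    return np.concatenate(([np.exp(mu) - 1.0], psi(mu, x, F(x))))

def jac(z, F, Fp):
    # Jacobian of G for mu != 0 (the smooth branch of Proposition 2.1 (ii)).
    mu, x = z[0], z[1:]
    Fx, n = F(x), x.size
    r = np.sqrt(mu**2 + x**2 + Fx**2)        # > 0 since mu != 0
    V = np.zeros((n + 1, n + 1))
    V[0, 0] = np.exp(mu)
    V[1:, 0] = mu / r                        # the column C
    V[1:, 1:] = np.diag(x / r - 1.0) @ Fp(x) + np.diag(Fx / r - 1.0)
    return V

def algorithm1(F, Fp, x0, mu0=10.0, sigma=1e-4, rho=0.5, tol=1e-6, kmax=500):
    z = np.concatenate(([mu0], x0))          # Step 1: mu_0 > 0
    for _ in range(kmax):
        g = G(z, F)
        if np.linalg.norm(g) <= tol:
            break
        V = jac(z, F, Fp)
        d = np.linalg.solve(V, -g)           # Step 2: Newton equation (4)
        grad_d = (V.T @ g) @ d               # grad Theta = V^T G, so grad^T d = -||G||^2 < 0
        t, theta = 1.0, 0.5 * g @ g
        while 0.5 * np.linalg.norm(G(z + t * d, F))**2 - theta > sigma * t * grad_d:
            t *= rho                         # Step 3: Armijo line search on Theta
        z = z + t * d                        # Step 4; by Lemma 4.2, mu stays in (0, mu_k)
    return z[0], z[1:]
```

On the toy uniform P-function $F(x) = x - 1$ with $x^0 = 0$ and the paper's parameters ($\mu_0 = 10$, $\sigma = 10^{-4}$, $\rho = 0.5$), the iterates drive $\mu$ monotonically toward zero and $x$ to the unique NCP solution $x^* = 1$.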
From Proposition 3.1, the generalized Newton equation is solvable if $F'(x)$ is a $P_0$-matrix and $\mu \ne 0$.

We repeat that the main difference between (2) and (3) is that (3) has one more variable, $\mu$, and one more equation than (2). This additional variable $\mu$ must be driven to zero in order to obtain a solution of (3), and hence a solution of the NCP, from Algorithm 1. So we next present a result on $\mu$ and $d_\mu$.

Lemma 4.2 When $\mu > 0$, then $d_\mu \in (-\mu, 0)$. Moreover,

$$\mu + t d_\mu \in (0, \mu)$$

for any $t \in (0, 1]$.


Proof. By the first equation of the generalized Newton equation (4) and the Taylor series, we have

$$d_\mu = -\frac{e^\mu - 1}{e^\mu} = -\frac{\sum_{i=1}^\infty \frac{1}{i!}\mu^i}{\sum_{i=0}^\infty \frac{1}{i!}\mu^i} = -\mu \, \frac{\sum_{i=0}^\infty \frac{1}{(i+1)!}\mu^i}{\sum_{i=0}^\infty \frac{1}{i!}\mu^i},$$

which implies that $d_\mu \in (-\mu, 0)$ when $\mu > 0$. It is then easy to see that $\mu + t d_\mu \in (0, \mu)$ for any $t \in (0, 1]$. □

Simply speaking, the above result says that after each step the variable $\mu$ is closer to zero than its previous value; that is, $\mu$ is driven to zero automatically. However, $\mu$ always remains positive. This implies two important observations. First, $G$ is continuously differentiable at $z^k = (\mu_k, x^k)$, which is convenient. Second, the solvability of the generalized Newton equation is easier to guarantee in the case $\mu \ne 0$ than in the case $\mu = 0$; see Proposition 3.1.

Theorem 4.1 Suppose the generalized Newton equation in Step 2 is solvable for each $k$. Assume that $z^* = (\mu^*, x^*)$ is an accumulation point of $\{z^k\}$ generated by the damped generalized Newton method. Then the following statements hold:

(i) $x^*$ is a solution of the NCP if $\{d^k\}$ is bounded.

(ii) $x^*$ is a solution of the NCP and $\{z^k\}$ converges to $z^*$ superlinearly if $\partial G(z^*)$ is nonsingular and $\sigma \in (0, \frac{1}{2})$. The convergence rate is quadratic if $F'$ is Lipschitz continuous on $\mathbb{R}^n$.

Proof. The proof is similar to that of Theorem 4.1 in [25], where the damped generalized Newton method is applied to the system (2). We omit the details. □

Corollary 4.1 Suppose $F$ is a $P_0$-function on $\mathbb{R}^n$ and $\sigma \in (0, \frac{1}{2})$. Then Algorithm 1 is well-defined. Assume $z^* = (\mu^*, x^*)$ is an accumulation point of $\{z^k\}$ and $\partial G(z^*)$ is nonsingular or $F'(x^*)$ is a P-matrix. Then $\mu^* = 0$, $x^*$ is a solution of the NCP, and $\{z^k\}$ converges to $(0, x^*)$ superlinearly. If $F'$ is Lipschitz continuous on $\mathbb{R}^n$, then the convergence rate is quadratic.

Proof. By Lemma 4.2, $\mu_k > 0$ for every $k$. Since $F$ is a $P_0$-function, it follows from Proposition 3.1 that every element of $\partial G(\mu_k, x^k)$ is nonsingular, which implies that the generalized Newton equation is solvable for every $k$. The result follows from Theorem 4.1.
□

Corollary 4.2 Suppose $F$ is a uniform P-function on $\mathbb{R}^n$ and $\sigma \in (0, \frac{1}{2})$. Then Algorithm 1 is well-defined, $\{z^k\}$ is bounded, $z^k$ converges superlinearly to $z^* = (0, x^*)$ with $x^*$ the unique solution of the NCP, and the convergence rate is quadratic if $F'$ is Lipschitz continuous on $\mathbb{R}^n$.


Proof. The results follow from Proposition 3.3 and Corollary 4.1. □

Remark. One point worth mentioning concerns the calculation of the generalized Jacobian of $G(\mu, x)$, since we only managed to give an outer estimate of $\partial G(\mu, x)$ in Proposition 2.1. However, this is never a concern in Algorithm 1. The reason is that the parameter $\mu_k$ is never equal to zero for any $k$, which implies that $G$ is actually smooth at $(\mu_k, x^k)$ for every $k$. Therefore, the generalized Jacobian of $G$ reduces to the Jacobian of $G$, which is a singleton and easy to calculate.

5 Numerical Results

In this section, we present some numerical experiments with Algorithm 1 of Section 4 using a non-monotone line search strategy. We chose $l = 3$ for $k \ge 4$ and $l = k - 1$ for $k = 2, 3$, where $k$ is the iteration index. We also made the following change in our implementation: $\mu_k$ is replaced by $10^{-6}$ when $\mu_k < 10^{-6}$, because our experience showed that numerical difficulties sometimes occur if $\mu_k$ is too close to zero.

Algorithm 1 was implemented in MATLAB and run on a Sun SPARC workstation. The following parameters were used for all the test problems: $\mu_0 = 10.0$, $\sigma = 10^{-4}$, $\rho = 0.5$. The default initial starting point was used for each test problem in the library MCPLIB [12, 13]. The algorithm is terminated when one of the following criteria is satisfied: (i) the iteration number reaches 500; (ii) the line search step is less than $10^{-10}$; (iii) the minimum of

$$\|\min(F(x^k), x^k)\|_\infty \quad \text{and} \quad \|\nabla \Theta(z^k)\|_2$$

is less than or equal to $10^{-6}$.

We tested the nonlinear and linear complementarity problems from the library MCPLIB [12, 13].
The numerical results are summarized in Table 1, where Dim denotes the number of variables in the problem, Iter the number of iterations (which also equals the number of Jacobian evaluations of $F$), NF the number of function evaluations of $F$, and $\varepsilon$ the final value of $\|\min(F(x^*), x^*)\|_\infty$ at the computed solution $x^*$.

The algorithm initially failed to solve bishop, colvdual, powell and shubik. Therefore, we perturbed the Jacobian matrices for these problems by adding $\delta \cdot I$ to $F'(x^k)$, where $\delta > 0$ is a small constant and $I$ is the identity matrix. We used $\delta = 10^{-5}$ for bishop, powell and shubik, and $\delta = 10^{-2}$ for colvdual. Our code failed to solve tinloi within 500 iterations whether or not the Jacobian perturbation was used. However, our experiment showed that it made no meaningful progress from the 33rd iteration to the 500th iteration; in fact, $\varepsilon = 2.07 \times 10^{-6}$ at both iterations, which is very close to the termination tolerance of $10^{-6}$.

All other problems were solved successfully. One may see that most problems were solved in a small number of iterations. One important observation is that the number of function evaluations is very close to the number of iterations for most of the test problems. This implies that full Newton steps are taken most of the time, and superlinear convergence follows.
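Two ingredients of this setup can be stated compactly. The sketch below (an illustration, not the paper's MATLAB code) shows the accuracy measure $\varepsilon$ used in Table 1 and the Jacobian perturbation applied to the harder problems:

```python
import numpy as np

def natural_residual(x, Fx):
    # The accuracy measure reported as epsilon in Table 1: the infinity norm
    # of min(F(x), x), which vanishes exactly at a solution of the NCP.
    return np.linalg.norm(np.minimum(Fx, x), np.inf)

def regularized_jacobian(JFx, delta):
    # Jacobian perturbation used for the harder problems: replace F'(x^k)
    # by F'(x^k) + delta * I before forming the Newton system.
    return JFx + delta * np.eye(JFx.shape[0])

# A complementary pair (x_i > 0 implies F_i = 0, and vice versa) has zero residual.
x, Fx = np.array([2.0, 0.0, 1.0]), np.array([0.0, 3.0, 0.0])
res = natural_residual(x, Fx)
```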


Problem       Dim   Iter        NF            eps
bertsekas      15     16          17          1.08e-07
billups         1     14          15          6.09e-07
bishop*      1645     83         176          4.87e-07
colvdual+      20     17          18          2.40e-07
colvnlp        15     16          17          1.25e-08
cycle           1     14          15          1.23e-11
degen           2     14          15          1.21e-10
explcp         16     14          15          1.75e-10
hanskoop       14     22          33          7.03e-08
jel             6     14          15          7.20e-11
josephy         4     14          15          1.32e-10
kojshin         4     15          16          8.59e-07
mathinum        3     22          23          6.45e-07
mathisum        3     15          16          4.86e-07
nash           10     14          15          6.68e-11
pgvon106      106     39          71          3.44e-07
powell*        16     15          23          2.22e-09
scarfanum      13     19          20          1.29e-08
scarfasum      14     21          23          1.83e-08
scarfbsum      40     22          32          4.12e-08
shubik*        45    169        1093          7.45e-07
simple-red     13     14          15          2.27e-08
sppe           27     14          15          1.57e-10
tinloi**      146     32 (500)   118 (14540)  2.07e-06
tobin          42     14          15          1.18e-10

Table 1: Numerical results for the problems from MCPLIB (* solved with Jacobian perturbation $\delta = 10^{-5}$; + solved with $\delta = 10^{-2}$; ** terminated at the iteration limit, with the counts at iteration 500 in parentheses)

6 Concluding Remarks

By introducing another variable and an additional equation, we have reformulated the NCP as a square system of nonsmooth equations. It has been proved that this reformulation shares some desirable properties of both nonsmooth equation reformulations and smoothing techniques. The semismoothness of the equation and the smoothness of its least-squares merit function enable us to propose the damped generalized Newton method, and to prove global as well as local superlinear convergence under mild conditions. Encouraging numerical results have been reported.

The main feature of the proposed methods is the introduction of the additional equation
$$e^\mu - 1 = 0.$$
As we have seen, $\{\mu_k\}$ is a monotonically decreasing positive sequence if $\mu_0 > 0$. This property ensures the following important consequences: (i) the reformulated system is smooth at each iteration, which might not be so important for our methods since the system is semismooth everywhere; (ii) the linearized system has a unique solution


at any iteration $k$ under mild conditions such as the $P_0$-property; (iii) the requirement that $\mu_k$ be driven to zero, which is usually needed to ensure correct convergence (i.e., that the accumulation point is a solution of the equation or a stationary point of the least-squares merit function), is satisfied.

One may find other functions which can play a similar role. For example, $e^\mu + \mu - 1 = 0$ might be an alternative. In general, the equation $e^\mu - 1 = 0$ can be replaced by the equation $\phi(\mu) = 0$, where $\phi$ satisfies the following conditions:

(i) $\phi : \mathbb{R} \to \mathbb{R}$ is continuously differentiable with $\phi'(\mu) > 0$ for any $\mu$;

(ii) $\phi(\mu) = 0$ implies that $\mu = 0$;

(iii) $d_\mu = -\phi(\mu)/\phi'(\mu) \in (-\mu, 0)$ for any $\mu > 0$.

Some comments on these requirements are in order. Condition (i) ensures that $G$ is smooth in $\mu$ and that $d_\mu$ is well-defined. Condition (ii) guarantees that $G(\mu, x) = 0$ implies $\mu = 0$ and $x$ is a solution of the NCP, and that a stationary point of the merit function is a solution of the NCP under some mild conditions; see Propositions 2.1 and 3.2. Condition (iii) implies that $0 < \mu + t d_\mu < \mu$ for any $t \in (0, 1]$, which is required in the Armijo line search of Algorithm 1, and which also ensures that $\mu$ always remains positive and bounded.

In [38], Qi, Sun and Zhou also treated smoothing parameters as independent variables in their smoothing methods. In their algorithm, the smoothing parameters are updated according to both the line search rule and the quality of the approximate solution of the problem considered; see that paper for more details. As has been seen in Algorithm 1, our smoothing parameter $\mu$ is updated by the line search rule alone.

The techniques introduced in this paper seem to be applicable to variational inequalities, mathematical programs with equilibrium constraints, semidefinite mathematical programs and related problems.
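Conditions (i)-(iii) can be checked numerically for the two candidates mentioned above, $\phi(\mu) = e^\mu - 1$ and $\phi(\mu) = e^\mu + \mu - 1$. The sketch below (an illustration, not part of the paper) verifies that $\phi(0) = 0$ and that the Newton step $d_\mu = -\phi(\mu)/\phi'(\mu)$ lies in $(-\mu, 0)$ at sample points:

```python
import numpy as np

# The two candidate parameter equations phi(mu) = 0 from the text, paired
# with their derivatives phi'(mu).
candidates = {
    "exp(mu) - 1":      (lambda mu: np.exp(mu) - 1.0,      lambda mu: np.exp(mu)),
    "exp(mu) + mu - 1": (lambda mu: np.exp(mu) + mu - 1.0, lambda mu: np.exp(mu) + 1.0),
}

for name, (phi, dphi) in candidates.items():
    # Condition (ii): phi vanishes at mu = 0 (and, being strictly increasing
    # by condition (i), nowhere else).
    assert phi(0.0) == 0.0
    for mu in [10.0, 1.0, 0.1, 1e-3]:
        # Condition (iii): d_mu = -phi/phi' lies in (-mu, 0), so the damped
        # update mu + t*d_mu stays in (0, mu) for every t in (0, 1].
        d_mu = -phi(mu) / dphi(mu)
        assert -mu < d_mu < 0.0, (name, mu)
```

For $e^\mu - 1$ the inclusion $d_\mu \in (-\mu, 0)$ is just the elementary inequality $1 - e^{-\mu} < \mu$ for $\mu > 0$.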
The technique of introducing an additional equation may be useful in other methods for solving the NCP and related problems whenever parameters need to be introduced. In an early version [24] of this paper, a damped modified Gauss-Newton method and another damped generalized Newton method, based on a modified form of the Fischer-Burmeister functional, were proposed, and global as well as local fast convergence results were established. The interested reader is referred to the report [24] for more details.

Acknowledgements. The author is grateful to Dr. Danny Ralph for numerous motivating discussions and many constructive suggestions and comments, and to Dr. Steven Dirkse for providing the test problems and a MATLAB interface for accessing them. I am also thankful to the anonymous referees and Professor Liqun Qi for their valuable comments.

References

[1] J. Burke and S. Xu, The global linear convergence of a non-interior path-following algorithm for linear complementarity problems, Mathematics of Operations Research 23 (1998) 719-735.


[2] B. Chen and X. Chen, A global and local superlinear continuation-smoothing method for $P_0 + R_0$ and monotone NCP, SIAM Journal on Optimization 9 (1999) 624-645.

[3] B. Chen and P.T. Harker, A continuation method for monotone variational inequalities, Mathematical Programming (Series A) 69 (1995) 237-254.

[4] B. Chen and P.T. Harker, Smooth approximations to nonlinear complementarity problems, SIAM Journal on Optimization 7 (1997) 403-420.

[5] C. Chen and O.L. Mangasarian, Smoothing methods for convex inequalities and linear complementarity problems, Mathematical Programming 71 (1995) 51-69.

[6] C. Chen and O.L. Mangasarian, A class of smoothing functions for nonlinear and mixed complementarity problems, Computational Optimization and Applications 5 (1996) 97-138.

[7] X. Chen, L. Qi and D. Sun, Global and superlinear convergence of the smoothing Newton method and its application to general box constrained variational inequalities, Mathematics of Computation 67 (1998) 519-540.

[8] F.H. Clarke, Optimization and Nonsmooth Analysis, Wiley, New York, 1983.

[9] R.W. Cottle, J.-S. Pang and R.E. Stone, The Linear Complementarity Problem, Academic Press, New York, 1992.

[10] T. De Luca, F. Facchinei and C. Kanzow, A semismooth equation approach to the solution of nonlinear complementarity problems, Mathematical Programming 75 (1996) 407-439.

[11] J.E. Dennis and R.B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice Hall, Englewood Cliffs, New Jersey, 1983.

[12] S.P. Dirkse, MCPLIB and MATLAB interface MPECLIB and MCPLIB models, http://www.gams.com/mpec, 2001.

[13] S.P. Dirkse and M.C. Ferris, MCPLIB: A collection of nonlinear mixed complementarity problems, Optimization Methods and Software 5 (1995) 407-439.

[14] J. Eckstein and M. Ferris, Smooth methods of multipliers for complementarity problems, Mathematical Programming 86 (1999) 65-90.

[15] F. Facchinei, H. Jiang and L. Qi, A smoothing method for mathematical programs with equilibrium constraints, Mathematical Programming 85 (1999) 81-106.

[16] F. Facchinei and C. Kanzow, A nonsmooth inexact Newton method for the solution of large-scale nonlinear complementarity problems, Mathematical Programming (Series B) 76 (1997) 493-512.

[17] F. Facchinei and J. Soares, A new merit function for nonlinear complementarity problems and a related algorithm, SIAM Journal on Optimization 7 (1997) 225-247.


[18] A. Fischer, A special Newton-type optimization method, Optimization 24 (1992) 269-284.

[19] A. Fischer, Solution of monotone complementarity problems with locally Lipschitzian functions, Mathematical Programming 76 (1997) 513-532.

[20] M. Fukushima, Z.-Q. Luo and J.-S. Pang, A globally convergent sequential quadratic programming algorithm for mathematical programs with linear complementarity constraints, Computational Optimization and Applications 10 (1998) 5-34.

[21] S.A. Gabriel and J.J. Moré, Smoothing of mixed complementarity problems, in: M.C. Ferris and J.-S. Pang, eds., Complementarity and Variational Problems, SIAM Publications, Philadelphia, 1997, pp. 105-116.

[22] C. Geiger and C. Kanzow, On the resolution of monotone complementarity problems, Computational Optimization and Applications 5 (1996) 155-173.

[23] K. Hotta and A. Yoshise, Global convergence of a class of non-interior-point algorithms using Chen-Harker-Kanzow functions for nonlinear complementarity problems, Mathematical Programming 86 (1999) 105-133.

[24] H. Jiang, Smoothed Fischer-Burmeister equation methods for the complementarity problem, Manuscript, Department of Mathematics, The University of Melbourne, June 1997.

[25] H. Jiang, Global convergence analysis of the generalized Newton and Gauss-Newton methods for the Fischer-Burmeister equation for the complementarity problem, Mathematics of Operations Research 24 (1999) 529-543.

[26] H. Jiang, M. Fukushima, L. Qi and D. Sun, A trust region method for solving generalized complementarity problems, SIAM Journal on Optimization 8 (1998) 140-157.

[27] H. Jiang and L. Qi, A new nonsmooth equations approach to nonlinear complementarity problems, SIAM Journal on Control and Optimization 35 (1997) 178-193.

[28] C. Kanzow, An unconstrained optimization technique for large-scale linearly constrained convex minimization problems, Computing 53 (1994) 101-117.

[29] C. Kanzow, Some noninterior continuation methods for linear complementarity problems, SIAM Journal on Matrix Analysis and Applications 17 (1996) 851-868.

[30] C. Kanzow, A new approach to continuation methods for complementarity problems with uniform P-functions, Operations Research Letters 20 (1997) 85-92.

[31] C. Kanzow and H. Jiang, A continuation method for (strongly) monotone variational inequalities, Mathematical Programming 81 (1998) 103-125.

[32] M. Kojima, N. Megiddo and S. Mizuno, A general framework of continuation methods for complementarity problems, Mathematics of Operations Research 18 (1993) 945-963.


[33] M. Kojima, S. Mizuno and T. Noma, Limiting behaviour of trajectories generated by a continuation method for monotone complementarity problems, Mathematics of Operations Research 15 (1990) 662-675.

[34] J.J. Moré and W.C. Rheinboldt, On P- and S-functions and related classes of n-dimensional nonlinear mappings, Linear Algebra and its Applications 6 (1973) 45-68.

[35] J.-S. Pang, Complementarity problems, in: R. Horst and P. Pardalos, eds., Handbook of Global Optimization, Kluwer Academic Publishers, Boston, 1994, pp. 271-338.

[36] L. Qi, Convergence analysis of some algorithms for solving nonsmooth equations, Mathematics of Operations Research 18 (1993) 227-244.

[37] L. Qi, Regular pseudo-smooth NCP and BVIP functions and globally and quadratically convergent generalized Newton methods for complementarity and variational inequality problems, Mathematics of Operations Research 24 (1999) 440-471.

[38] L. Qi, D. Sun and G. Zhou, A new look at smoothing Newton methods for nonlinear complementarity problems and box constrained variational inequalities, Mathematical Programming 87 (2000) 1-35.

[39] L. Qi and J. Sun, A nonsmooth version of Newton's method, Mathematical Programming 58 (1993) 353-368.

[40] S.M. Robinson, Strongly regular generalized equations, Mathematics of Operations Research 5 (1980) 43-61.

[41] H. Sellami and S. Robinson, Implementations of a continuation method for normal maps, Mathematical Programming (Series B) 76 (1997) 563-578.

[42] P. Tseng, Growth behavior of a class of merit functions for the nonlinear complementarity problem, Journal of Optimization Theory and Applications 89 (1996) 17-38.

[43] P. Tseng, An infeasible path-following method for monotone complementarity problems, SIAM Journal on Optimization 7 (1997) 386-402.

[44] S. Xu, The global linear convergence of an infeasible non-interior path-following algorithm for complementarity problems with uniform P-functions, Mathematical Programming 87 (2000) 501-517.

[45] N. Yamashita and M. Fukushima, Modified Newton methods for solving a semismooth reformulation of monotone complementarity problems, Mathematical Programming 76 (1997) 469-491.


Attachment

Proof of Theorem 4.1:

(i) The generalized Newton direction in Step 2 is well-defined by the solvability assumption on the generalized Newton equation. By the generalized Newton equation and the smoothness of $\psi$, we have
$$\nabla\psi(z^k)^T d^k = G(z^k)^T V_k d^k = -\|G(z^k)\|^2 = -2\psi(z^k) < 0.$$
Since $d^k \neq 0$ and $d = 0$ is not a solution of the generalized Newton equation, $d^k$ is a descent direction of the merit function at $z^k$. Therefore, the well-definedness of the line search step (Step 3), and hence of the algorithm, follows from the differentiability of the merit function $\psi$.

Without loss of generality, we may assume that $z^*$ is the limit of the subsequence $\{z^k\}_{k \in K}$, where $K$ is a subset of $\{1, 2, \ldots\}$. If $\{\alpha_k\}_{k \in K}$ is bounded away from zero, then, using a standard argument based on the decrease of the merit function at each iteration and its nonnegativity on $\mathbb{R}^{n+1}$, we obtain $\sum_{k \in K} -\sigma\alpha_k \nabla\psi(z^k)^T d^k < +\infty$, which implies that $\sum_{k \in K} \psi(z^k) < +\infty$. Hence $\lim_{k \to \infty, k \in K} \psi(z^k) = \psi(z^*) = 0$ and $z^*$ is a solution of (3). On the other hand, if $\{\alpha_k\}_{k \in K}$ has a subsequence converging to zero, we may pass to the subsequence and assume that $\lim_{k \to \infty, k \in K} \alpha_k = 0$. From the line search step, we have, for all sufficiently large $k \in K$,
$$\psi(z^k + \alpha_k d^k) - \psi(z^k) \leq \sigma\alpha_k \nabla\psi(z^k)^T d^k,$$
$$\psi(z^k + \beta^{-1}\alpha_k d^k) - \psi(z^k) > \sigma\beta^{-1}\alpha_k \nabla\psi(z^k)^T d^k.$$
Since $\{d^k\}$ is bounded, by passing to a further subsequence we may assume that $\lim_{k \to \infty, k \in K} d^k = d^*$. By some algebraic manipulations and passing to the limit, we obtain
$$\nabla\psi(z^*)^T d^* = \sigma \nabla\psi(z^*)^T d^*,$$
which means that $\nabla\psi(z^*)^T d^* = 0$. By the generalized Newton equation, it follows that
$$G(z^k)^T G(z^k) + G(z^k)^T V_k d^k = G(z^k)^T G(z^k) + \nabla\psi(z^k)^T d^k = 0.$$
This shows that $\lim_{k \to \infty, k \in K} G(z^k)^T G(z^k) = G(z^*)^T G(z^*) = 0$, namely, $z^*$ is a solution of (3).

(ii) Since $\partial G(z^*)$ is nonsingular, it follows that
$$\|(V_k)^{-1}\| \leq c$$
for some positive constant $c$ and all sufficiently large $k \in K$. The generalized Newton equation then implies that $\{d^k\}_{k \in K}$ is bounded. Therefore, (i) implies that $G(z^*) = 0$.

We next turn to the convergence rate.
From the semismoothness of $G$ at $z^*$, for any sufficiently large $k \in K$,
$$G(z^k + d^k) = G(z^* + (z^k + d^k - z^*)) - G(z^*) = U(z^k + d^k - z^*) + o(\|z^k + d^k - z^*\|),$$
where $U \in \partial G(z^k + d^k)$, and
$$G(z^k) = G(z^* + (z^k - z^*)) - G(z^*) = V(z^k - z^*) + o(\|z^k - z^*\|),$$


where $V \in \partial G(z^k)$. Let $V = V_k$ in the last equality. Then the generalized Newton equation and the uniform nonsingularity of $V_k$ ($k \in K$) imply that
$$\|z^k + d^k - z^*\| = o(\|z^k - z^*\|) \quad (5)$$
and $\|d^k\| = \|z^k - z^*\| + o(\|z^k - z^*\|)$, which implies that $\lim_{k \to \infty, k \in K} d^k = 0$. Consequently, it follows from the nonsingularity of $\partial G(z^*)$ that
$$\liminf_{k \to \infty, k \in K} \frac{\|G(z^k)\|}{\|z^k - z^*\|} > 0, \qquad \liminf_{k \to \infty, k \in K} \frac{\|G(z^k + d^k)\|}{\|z^k + d^k - z^*\|} > 0.$$
Hence, (5) shows that
$$\|G(z^k + d^k)\| = o(\|G(z^k)\|).$$
By the generalized Newton equation and $\sigma \in (0, \frac{1}{2})$, we obtain $\alpha_k = 1$ for all sufficiently large $k \in K$, i.e., the full generalized Newton step is taken. In other words, when $k$ is sufficiently large, both $z^k$ and $z^k + d^k$ lie in a small neighborhood of $z^*$ by (5), and the damped Newton method becomes the generalized Newton method. Convergence and the convergence rate then follow from Theorem 3.2 of [39]. $\Box$
