Atlanta University Center, DigitalCommons@Robert W. Woodruff Library
ETD Collection for AUC Robert W. Woodruff Library
7-1-1975

Recommended Citation: Smith, Annie Ruth, "Optimization in normed linear spaces" (1975). ETD Collection for AUC Robert W. Woodruff Library. Paper 2075.
OPTIMIZATION
IN
NORMED LINEAR SPACES
A THESIS
SUBMITTED TO THE FACULTY OF ATLANTA UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR
THE DEGREE OF MASTER OF SCIENCE
BY
ANNIE RUTH SMITH
DEPARTMENT OF MATHEMATICS
ATLANTA, GEORGIA
JULY 1975
TABLE OF CONTENTS

LIST OF FIGURES

Chapter
I. PRELIMINARIES
II. OPTIMUM NORMS IN HILBERT SPACES
   The Projection Theorem
   Minimization by Orthonormal Sets
   Minimization by Infinite-Dimensional Subspaces
   Minimization by Convex Sets
III. OPTIMUM NORMS IN NORMED LINEAR SPACES
   Minimization by Dual Spaces
   Minimization Involving Linear Operators
IV. DIFFERENTIATION IN NORMED LINEAR SPACES
   Gateaux and Frechet Differentials
   Frechet Derivatives
V. OPTIMIZATION BY ITERATIVE METHODS
   Methods for Solving Nonlinear Equations
   Descent Methods
   Conjugate Direction Methods

BIBLIOGRAPHY
LIST OF FIGURES

Figure
1. Successive Approximation Process
2. Technique of Newton's Method
3. Descent Process
CHAPTER I
PRELIMINARIES
Before beginning our discussion on optimization, we
will devote this chapter to presenting some necessary
preliminary material. Assuming that we are already familiar
with linear space theory and elementary functional analysis,
we will state only certain basic concepts which will be
directly related to the development of our topics. The
proofs in most cases are omitted. The reader is advised to
refer to Luenberger's Optimization by Vector Space Methods.
Lemma 1.1.1. (Cauchy-Schwarz Inequality) For all x1, x2 in an inner product space X, |(x1,x2)| <= ||x1|| ||x2||. Equality holds if and only if x1 = λx2 for some scalar λ or x2 = 0.

Lemma 1.1.2. (x1,x2) = 0 for all x2 ∈ X implies x1 = 0.

Lemma 1.1.3. (Parallelogram Law) For all x1, x2 ∈ X,

||x1 + x2||² + ||x1 - x2||² = 2||x1||² + 2||x2||².

Proposition 1.1.4. Let Z be a pre-Hilbert space. The function ||z|| = √(z,z) is a norm for z ∈ Z.

Lemma 1.1.5. (Continuity of the Inner Product) If zn → z and vn → v in Z, then (zn,vn) → (z,v).

Definition 1.1.6. z1 and z2 are orthogonal (symbolized z1 ⊥ z2) if (z1,z2) = 0, for z1, z2 ∈ Z. z1 is orthogonal to a set V if z1 ⊥ v for each v ∈ V.

Lemma 1.1.7. (Pythagorean Theorem) If z1 ⊥ z2, then ||z1 + z2||² = ||z1||² + ||z2||² for z1, z2 ∈ Z.

Definition 1.1.8. A sequence {hn} in a Hilbert space H is Cauchy if ||hn - hm|| → 0 as n, m → ∞.
CHAPTER II
OPTIMUM NORMS IN HILBERT SPACES
2.1. The Projection Theorem
The solution to our first minimum norm problem is
characterized by the projection theorem. Consequently, the
problem will be solved by proving two different versions of
the projection theorem.
Theorem 2.1.1. Let z be a vector in a pre-Hilbert space Z and M a subspace of Z. If there is an m0 ∈ M which minimizes ||z - m||, that is, ||z - m0|| <= ||z - m|| for all m ∈ M, then m0 is unique. Moreover, m0 is the minimizing vector if and only if (z - m0) ⊥ M.

Proof. Suppose there is an m ∈ M which is not orthogonal to z - m0; we may take ||m|| = 1 and (z - m0, m) = δ ≠ 0. Let m1 = m0 + δm ∈ M. Then

||z - m1||² = ||z - m0 - δm||²
= ||z - m0||² - (z - m0, δm) - (δm, z - m0) + |δ|² ||m||²
= ||z - m0||² - 2|δ|² + |δ|²
= ||z - m0||² - |δ|² < ||z - m0||².

Hence m0 is not a minimizing vector; it follows that if m0 minimizes, then (z - m0) ⊥ M.

Conversely, suppose (z - m0) ⊥ M and take any m ∈ M. By the Pythagorean theorem,

||z - m||² = ||z - m0 + m0 - m||² = ||z - m0||² + ||m0 - m||².

Therefore ||z - m0|| < ||z - m|| for m ≠ m0, and m0 is unique.
Theorem 2.1.2. Given a Hilbert space H and a closed subspace M, for any h ∈ H there is a unique m0 ∈ M which minimizes ||h - m||, that is, ||h - m0|| <= ||h - m|| for all m ∈ M. Moreover, m0 is the minimizing vector if and only if (h - m0) ⊥ M.

Proof. By theorem 2.1.1, a minimizing m0 ∈ M is unique and satisfies (h - m0) ⊥ M. Thus we need only show that m0 exists.

If h ∈ M, then m0 = h and m0 obviously exists. So suppose h ∉ M. Let δ = inf over m ∈ M of ||h - m||. We find an m0 ∈ M such that ||h - m0|| = δ by taking a sequence of vectors {m_i} in M such that ||h - m_i|| → δ. By the parallelogram law,

||(m_j - h) + (h - m_i)||² + ||(m_j - h) - (h - m_i)||² = 2||m_j - h||² + 2||h - m_i||².

Rearranging,

||m_j - m_i||² = 2||m_j - h||² + 2||h - m_i||² - 4||h - (m_i + m_j)/2||².

Since M is a subspace, (m_i + m_j)/2 ∈ M for all i, j. Thus, by the definition of δ, ||h - (m_i + m_j)/2||² >= δ² and

||m_j - m_i||² <= 2||m_j - h||² + 2||h - m_i||² - 4δ².

Since ||h - m_i||² → δ² as i → ∞, it follows that ||m_j - m_i||² → 0 as i, j → ∞. Therefore {m_i} is Cauchy. Since H is complete and M is closed, {m_i} has a limit m0 ∈ M. Hence ||h - m0|| = δ and m0 exists.
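The projection theorem can be checked concretely in the finite-dimensional Hilbert space R³. The sketch below (illustrative data, not from the thesis) computes the projection m0 of a point h onto a two-dimensional subspace M via an orthonormal basis, then verifies the orthogonality condition (h - m0) ⊥ M and the minimality of ||h - m0||:

```python
# Projection of h onto M = span{u1, u2} in R^3, where {u1, u2} is an
# orthonormal basis (illustrative data).
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return dot(a, a) ** 0.5

u1 = (1.0, 0.0, 0.0)
u2 = (0.0, 0.6, 0.8)          # unit vector orthogonal to u1
h = (2.0, 1.0, 3.0)

# m0 = (h,u1)u1 + (h,u2)u2 is the candidate minimizer.
c1, c2 = dot(h, u1), dot(h, u2)
m0 = tuple(c1 * a + c2 * b for a, b in zip(u1, u2))

# (h - m0) must be orthogonal to M (theorem 2.1.2).
residual = tuple(x - y for x, y in zip(h, m0))
assert abs(dot(residual, u1)) < 1e-12
assert abs(dot(residual, u2)) < 1e-12

# ||h - m0|| <= ||h - m|| for sampled m in M.
import random
random.seed(0)
for _ in range(100):
    a, b = random.uniform(-5, 5), random.uniform(-5, 5)
    m = tuple(a * p + b * q for p, q in zip(u1, u2))
    assert norm(residual) <= norm(tuple(x - y for x, y in zip(h, m))) + 1e-12
```

The random trial vectors only sample M; it is the orthogonality condition that guarantees minimality over all of M.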
2.2. Minimization by Orthonormal Sets
Definition 2.2.1. A set of vectors y1, y2, ..., yn in a linear space Y is linearly dependent if there exist scalars α1, α2, ..., αn, not all zero, such that α1y1 + α2y2 + ... + αnyn = 0. If α1y1 + α2y2 + ... + αnyn = 0 implies α1 = α2 = ... = αn = 0, then the set is linearly independent.

Definition 2.2.2. A set of vectors V in a Hilbert space H is said to be an orthogonal set if v1 ⊥ v2 for each v1, v2 ∈ V, v1 ≠ v2. V is orthonormal if, in addition, ||v|| = 1 for each v ∈ V.

Definition 2.2.3. Let y1, y2, ..., yn be in a Hilbert space H. g(y1, y2, ..., yn) is the determinant of the Gram matrix of y1, y2, ..., yn, whose (i,j) entry is (yj,yi), and is called the Gram determinant.
Theorem 2.2.4. Let y1, y2, ..., yn be linearly independent vectors generating a subspace M of a Hilbert space H. Given an arbitrary h ∈ H, there is a unique h0 ∈ M which minimizes ||h - h0||. If h0 is written in terms of the basis as h0 = α1y1 + α2y2 + ... + αnyn, then (h - h0) ⊥ (y1, y2, ..., yn), so the αi satisfy the normal equations

α1(y1,yi) + α2(y2,yi) + ... + αn(yn,yi) = (h,yi),  i = 1, 2, ..., n,

and the minimum distance δ = ||h - h0|| satisfies

δ² = g(y1, y2, ..., yn, h) / g(y1, y2, ..., yn).

Proof. By definition,

δ² = ||h - h0||² = (h - h0, h) - (h - h0, h0).

By theorem 2.1.1, (h - h0) ⊥ M, so that (h - h0, h0) = 0. Thus

δ² = (h - h0, h) = (h,h) - α1(y1,h) - α2(y2,h) - ... - αn(yn,h).

This equation and the normal equations yield n+1 linear equations for the n+1 unknowns α1, α2, ..., αn, δ². Applying Cramer's rule to this system and recognizing the resulting determinants as bordered Gram determinants gives

δ² = g(y1, y2, ..., yn, h) / g(y1, y2, ..., yn).

David Luenberger, Optimization by Vector Space Methods (New York, 1969), p. 56.
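For a concrete instance of theorem 2.2.4 in R³ (the vectors below are illustrative), the normal equations can be solved by Cramer's rule and the resulting δ² compared against the Gram-determinant quotient:

```python
# Theorem 2.2.4 in R^3 for y1, y2 spanning M and a point h (illustrative
# data). Solve the 2x2 normal equations by Cramer's rule and check
# delta^2 = g(y1, y2, h) / g(y1, y2).
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

y1, y2, h = (1.0, 1.0, 0.0), (0.0, 1.0, 1.0), (1.0, 2.0, 3.0)

# Normal equations: alpha1 (y1,yi) + alpha2 (y2,yi) = (h,yi).
g11, g12, g22 = dot(y1, y1), dot(y1, y2), dot(y2, y2)
b1, b2 = dot(h, y1), dot(h, y2)
det = g11 * g22 - g12 * g12                     # g(y1, y2)
a1 = (b1 * g22 - g12 * b2) / det
a2 = (g11 * b2 - b1 * g12) / det

h0 = tuple(a1 * p + a2 * q for p, q in zip(y1, y2))
delta_sq = dot(h, h) - a1 * b1 - a2 * b2        # delta^2 = (h - h0, h)

# (h - h0) is orthogonal to M.
r = tuple(x - y for x, y in zip(h, h0))
assert abs(dot(r, y1)) < 1e-12 and abs(dot(r, y2)) < 1e-12

# Bordered Gram determinant g(y1, y2, h), expanded by cofactors.
bh = dot(h, h)
g3 = (g11 * (g22 * bh - b2 * b2)
      - g12 * (g12 * bh - b2 * b1)
      + b1 * (g12 * b2 - g22 * b1))
assert abs(delta_sq - g3 / det) < 1e-12
```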
2.3. Minimization by Infinite-Dimensional Subspaces
The approach in theorem 2.2.4 is of limited practical importance, since the subspace M may not be finite dimensional, in which case it is generally impossible to reduce the problem to a finite set of linear equations analogous to the normal equations. We now turn to problems of this type, which involve a modification of the projection theorem applicable to linear varieties.
Definition 2.3.1. Let M0 be a subspace of a linear space Y and let y0 ∈ Y be fixed. Then the set M = y0 + M0 ⊂ Y is called a linear variety; M is a translation of M0.

Theorem 2.3.2. For a closed subspace M0 of a Hilbert space H, a fixed h0 ∈ H, and the linear variety M = h0 + M0, there is a unique h ∈ M of minimum norm, and it satisfies h ⊥ M0.

Proof. Translating M by -h0 gives the closed subspace M0: minimizing ||h0 + m|| over m ∈ M0 is the problem of approximating -h0 by elements of M0. By theorem 2.1.2 a unique minimizing m* exists and (-h0 - m*) ⊥ M0; hence h = h0 + m* is the unique element of M of minimum norm and h ⊥ M0.

Ibid., pp. 56-57.
Definition 2.3.3. Let M be a nonempty subset of a pre-Hilbert space Z. The set {x | (x,m) = 0 for all m ∈ M} is called the orthogonal complement of M, denoted M⊥.
Theorem 2.3.4. Consider the linear variety in a Hilbert space H consisting of all h ∈ H satisfying the equations (h,yi) = ci, i = 1, 2, ..., n, for a linearly independent set {yi} ⊂ H and fixed constants ci. If h0 is the element of minimum norm, then h0 = β1y1 + β2y2 + ... + βnyn, where the βi satisfy the equations

β1(y1,yj) + β2(y2,yj) + ... + βn(yn,yj) = cj,  j = 1, 2, ..., n.

Proof. Let M be the n-dimensional subspace generated by the yi, i = 1, 2, ..., n. The linear variety is a translation of M⊥. Since M⊥ is closed, the existence and uniqueness of an optimal solution follow from theorem 2.3.2, which also gives h0 ⊥ M⊥, and therefore h0 ∈ M⊥⊥. But M is closed, so M⊥⊥ = M. Therefore h0 ∈ M, that is, h0 = β1y1 + ... + βnyn. Choosing the βi so that h0 satisfies the constraints (h0,yj) = cj yields the stated equations, and the proof is complete.
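A small sketch of theorem 2.3.4 in R³, with two illustrative constraint vectors: the minimum-norm point lies in the span of the yi, with coefficients given by the Gram system:

```python
# Theorem 2.3.4 in R^3 (illustrative data): minimize ||h|| subject to
# (h, y1) = c1 and (h, y2) = c2. The minimizer is h0 = b1*y1 + b2*y2,
# where the b's solve the Gram system sum_i b_i (y_i, y_j) = c_j.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

y1, y2 = (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)
c1, c2 = 1.0, 3.0

g11, g12, g22 = dot(y1, y1), dot(y1, y2), dot(y2, y2)
det = g11 * g22 - g12 * g12
b1 = (c1 * g22 - g12 * c2) / det
b2 = (g11 * c2 - c1 * g12) / det
h0 = tuple(b1 * p + b2 * q for p, q in zip(y1, y2))

# h0 satisfies the constraints...
assert abs(dot(h0, y1) - c1) < 1e-12
assert abs(dot(h0, y2) - c2) < 1e-12

# ...and any other feasible point h0 + t*(0,0,1) has larger norm,
# since (0,0,1) spans the directions orthogonal to y1 and y2.
nsq = dot(h0, h0)
for t in (-2.0, -0.5, 0.5, 2.0):
    h = (h0[0], h0[1], h0[2] + t)
    assert dot(h, h) > nsq
```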
2.4. Minimization by Convex Sets
Definition 2.4.1. Given a set K in a linear space Y, if for all k1, k2 ∈ K every point of the form αk1 + (1-α)k2, 0 <= α <= 1, belongs to K, then K is said to be convex.

Proposition 2.4.2. If K, G are convex sets in a linear space, then
(1) αK = {x | x = αk, k ∈ K} is convex for any α;
(2) αK + βG is convex for any α, β.

Proof. (1) Let x1 = αk1, x2 = αk2 with k1, k2 ∈ K. For 0 <= λ <= 1,

λx1 + (1-λ)x2 = α[λk1 + (1-λ)k2] ∈ αK,

since λk1 + (1-λ)k2 ∈ K. Therefore αK is convex.
(2) The proof is similar to that of (1).
Theorem 2.4.3. Given a Hilbert space H, an h ∈ H, and a closed convex set K ⊂ H, there exists a unique k0 ∈ K which minimizes ||h - k||, that is, ||h - k0|| <= ||h - k|| for all k ∈ K. Moreover, k0 is the minimizing vector if and only if (h - k0, k - k0) <= 0 for all k ∈ K.

Proof. (1) We first show that k0 exists. Let δ = inf over k ∈ K of ||h - k||. We find a k0 ∈ K such that ||h - k0|| = δ by taking a sequence of vectors {k_i} in K such that ||h - k_i|| → δ. By the parallelogram law,

||(k_j - h) + (h - k_i)||² + ||(k_j - h) - (h - k_i)||² = 2||k_j - h||² + 2||h - k_i||².

Rearranging,

||k_j - k_i||² = 2||k_j - h||² + 2||h - k_i||² - 4||h - (k_i + k_j)/2||².

Since K is convex, (k_i + k_j)/2 ∈ K for all i, j. Thus, by the definition of δ, ||h - (k_i + k_j)/2|| >= δ and

||k_j - k_i||² <= 2||k_j - h||² + 2||h - k_i||² - 4δ².

Since ||h - k_i||² → δ² as i → ∞, it follows that ||k_j - k_i||² → 0 as i, j → ∞. Hence {k_i} is Cauchy and, K being closed in the complete space H, has a limit k0 ∈ K. Therefore ||h - k0|| = δ and k0 exists.

(2) Next we show that k0 is unique. Let k1 ∈ K be such that ||h - k1|| = δ. Take the sequence {k_n} with k_n = k0 if n is even and k_n = k1 if n is odd, so that ||h - k_n|| → δ. By the same argument as in (1), {k_n} is Cauchy and has a limit in K. This can only be true if k0 = k1; therefore k0 is unique.

(3) We now show that if k0 is the minimizing vector, then (h - k0, k - k0) <= 0 for all k ∈ K. Suppose, to the contrary, that there is a k1 ∈ K with (h - k0, k1 - k0) = ε > 0. Take the vectors kα = (1-α)k0 + αk1, 0 <= α <= 1; each kα ∈ K, since K is convex, and

||h - kα||² = ||(1-α)(h - k0) + α(h - k1)||²

is differentiable with respect to α, with

d/dα ||h - kα||² at α = 0 equal to -2(h - k0, k1 - k0) = -2ε < 0.

Thus ||h - kα|| < ||h - k0|| for some small α > 0, which is a contradiction. Therefore no such k1 exists.

(4) Finally, we show that if (h - k0, k - k0) <= 0 for all k ∈ K, then k0 is the unique minimizing vector. For any k ∈ K,

||h - k||² = ||h - k0 + k0 - k||² = ||h - k0||² + 2(h - k0, k0 - k) + ||k0 - k||² >= ||h - k0||² + ||k0 - k||²,

since (h - k0, k0 - k) = -(h - k0, k - k0) >= 0. Therefore ||h - k|| > ||h - k0|| for k ≠ k0, and k0 is the unique minimizer.
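In R² the projection onto a simple closed convex set can be written down explicitly; the sketch below uses the unit box (an illustrative choice), for which the projection is a componentwise clamp, and checks the characterization (h - k0, k - k0) <= 0 on sample points of K:

```python
# Theorem 2.4.3 in R^2: project h onto the closed convex set
# K = [0,1] x [0,1] (illustrative). The projection is the
# componentwise clamp, and it satisfies (h - k0, k - k0) <= 0
# for every k in K.
def clamp(t, lo=0.0, hi=1.0):
    return max(lo, min(hi, t))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

h = (2.0, -0.5)
k0 = tuple(clamp(t) for t in h)

# Check the variational inequality on a grid of points of K.
import itertools
for k in itertools.product([i / 4 for i in range(5)], repeat=2):
    d = tuple(p - q for p, q in zip(k, k0))
    r = tuple(p - q for p, q in zip(h, k0))
    assert dot(r, d) <= 1e-12
```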
CHAPTER III
OPTIMUM NORMS IN NORMED LINEAR SPACES
3.1. Minimization by Dual Spaces
Definition 3.1.1. Let N be a normed linear space and N* the space consisting of all bounded linear functionals on N. N* is called the dual of N.
Theorem 3.1.2. N* is a Banach space.

Proof. Since N* is a normed linear space, we show only that N* is complete. Let {xn*} be Cauchy in N*. Then ||xn* - xm*|| → 0 as n, m → ∞. For any x ∈ N, {xn*(x)} is a Cauchy sequence of scalars, since |xn*(x) - xm*(x)| <= ||xn* - xm*|| ||x||. Define a functional x* by x*(x) = lim xn*(x) for each x ∈ N. Now

x*(αx + βy) = lim xn*(αx + βy) = α lim xn*(x) + β lim xn*(y) = αx*(x) + βx*(y).

Hence x* is linear.

Since {xn*} is Cauchy, given ε > 0 there exists an M such that

|xn*(x) - xm*(x)| <= ε||x||

for all n, m > M and all x. But xn*(x) → x*(x), so |x*(x) - xm*(x)| <= ε||x|| for m > M. Therefore

|x*(x)| = |x*(x) - xm*(x) + xm*(x)| <= |x*(x) - xm*(x)| + |xm*(x)| <= (ε + ||xm*||) ||x||,

and x* is a bounded linear functional.

Further, |x*(x) - xm*(x)| <= ε||x|| for m > M gives ||x* - xm*|| <= ε, so xm* → x* ∈ N*. Thus N* is complete.

Angus Taylor, Introduction to Functional Analysis (New York, 1958), pp. 33-34, 162-163, 185-186.
We can solve a minimum norm problem by considering two
versions of the Hahn-Banach Extension theorem. The first
parallels the projection theorem and the conclusion is
formulated in a normed linear space as well as its dual.
The second is a geometric approach in which convex sets
are separated with hyperplanes.
Theorem 3.1.3. Let N be a normed linear space and M a subspace of N. For an x ∈ N,

d(x,M) = inf over m ∈ M of ||x - m|| = max {x*(x) : x* ∈ M⊥, ||x*|| <= 1},

where M⊥ denotes the annihilator of M (the bounded linear functionals vanishing on M) and the maximum is achieved by some x0* ∈ M⊥. If the infimum is achieved by some m0 ∈ M, then

x0*(x - m0) = ||x0*|| ||x - m0||.

Proof. Let d = d(x,M). Given ε > 0, let mε ∈ M satisfy ||x - mε|| <= d + ε. Then for any x* ∈ M⊥ with ||x*|| <= 1,

x*(x) = x*(x - mε) <= ||x*|| ||x - mε|| <= d + ε.

Since ε is arbitrary, x*(x) <= d for every such x*.

To show that this bound is attained, let S be the subspace [x + M], whose elements have the form n = αx + m, m ∈ M, α real. Let f be the linear functional on S defined by f(n) = αd. Then

||f|| = sup over n ∈ S, n ≠ 0 of |f(n)|/||n|| = sup |α|d/||αx + m|| = d / inf over m ∈ M of ||x + m|| = d/d = 1.

Form the Hahn-Banach extension x0* of f from S to N. Then ||x0*|| = 1 and x0* = f on S; in particular x0* vanishes on M, so x0* ∈ M⊥, and x0*(x) = d. The first statement is proved.

Finally, suppose there is an m0 ∈ M with ||x - m0|| = d. With x0* ∈ M⊥, ||x0*|| <= 1, and x0*(x) = d as above,

x0*(x - m0) = x0*(x) = d = ||x - m0|| >= ||x0*|| ||x - m0|| >= x0*(x - m0),

so x0*(x - m0) = ||x0*|| ||x - m0||, and the proof is complete.

David Luenberger, Optimization by Vector Space Methods (New York, 1969), pp. 110-113.
Theorem 3.1.4. Let N be a normed linear space and M a subspace of N. Then for an x* ∈ N*,

d(x*,M⊥) = min over m* ∈ M⊥ of ||x* - m*|| = sup {x*(x) : x ∈ M, ||x|| <= 1},

where the minimum is achieved by some m0* ∈ M⊥. If the supremum is achieved by some x0 ∈ M with ||x0|| = 1, then

(x* - m0*)(x0) = ||x* - m0*|| ||x0||.

Proof. For any m* ∈ M⊥,

||x* - m*|| = sup over ||x|| <= 1 of [x*(x) - m*(x)] >= sup over x ∈ M, ||x|| <= 1 of [x*(x) - m*(x)] = sup over x ∈ M, ||x|| <= 1 of x*(x),

since m* vanishes on M.

Now let ||x*||M denote the norm of the functional x* restricted to M, so that ||x*||M = sup {x*(x) : x ∈ M, ||x|| <= 1}. Let y* be a Hahn-Banach extension to the whole space of this restriction. Then ||y*|| = ||x*||M and x* - y* = 0 on M. Set m0* = x* - y*. Then m0* ∈ M⊥ and

||x* - m0*|| = ||y*|| = ||x*||M.

Combined with the inequality above, this proves the first statement, with the minimum achieved at m0*.

If d(x*,M⊥) = x*(x0) for some x0 ∈ M with ||x0|| = 1, then

||x* - m0*|| = x*(x0) = (x* - m0*)(x0),

and hence (x* - m0*)(x0) = ||x* - m0*|| ||x0||.
Definition 3.1.5. Let Y be a linear space and U a linear variety in Y with U ≠ Y. If the only linear varieties V satisfying U ⊂ V are V = U and V = Y, then U is called a hyperplane; that is, a hyperplane is a maximal proper linear variety.

Definition 3.1.6. Let N be a normed linear space and K a convex set in N. The functional h defined on N* by h(x*) = sup over x ∈ K of x*(x) is called the support functional of K.
Theorem 3.1.7. (Minimum Norm Duality) Let N be a normed linear space and K a convex set in N. Let x1 ∈ N with d(x1,K) > 0, and let h be the support functional of K. Then

d(x1,K) = inf over k ∈ K of ||x1 - k|| = max over ||x*|| <= 1 of [x*(x1) - h(x*)],

where the maximum is achieved by some x0* ∈ N*. If the infimum is achieved by some k0 ∈ K, then

-x0*(k0 - x1) = ||x0*|| ||k0 - x1||.

Proof. Let d = d(x1,K). We may translate so that x1 = 0; then d = inf over k ∈ K of ||k|| = max over ||x*|| <= 1 of [-h(x*)], and we need only be concerned with the x* for which h(x*) is negative. If h(x*) is negative, then K lies in the half space {x : x*(x) <= h(x*)}, which does not contain zero, since x*(0) = 0 > h(x*). Thus the hyperplane U = {x : x*(x) = h(x*)} separates K and 0 when h(x*) is negative.

Let S(t) be the sphere of radius t centered at 0. For x* ∈ N* with h(x*) <= 0 and ||x*|| = 1, let t* be the supremum of the ε's for which U separates K and S(ε). Since, for ||x*|| = 1, the distance from 0 to U is -h(x*), we have 0 <= t* = -h(x*) <= d. Thus -h(x*) <= d for every x* ∈ N* with ||x*|| <= 1.

On the other hand, K contains no interior points of S(d). Therefore some hyperplane U separates K and S(d), and for the corresponding x0* ∈ N* with ||x0*|| = 1 we have -h(x0*) = d. The proof of the first statement is complete.

Finally, let k0 ∈ K be such that ||k0|| = d. Since k0 ∈ K, x0*(k0) <= h(x0*) = -d. But -x0*(k0) <= ||x0*|| ||k0|| = d. Consequently -x0*(k0) = ||x0*|| ||k0|| (and in the original variables -x0*(k0 - x1) = ||x0*|| ||k0 - x1||), and the proof is complete.
3.2. Minimization Involving Linear Operators
Definition 3.2.1. Given linear spaces W and Y and a function A with domain D ⊂ W and range R ⊂ Y, A is a linear operator from W into Y if A(β1w1 + β2w2) = β1A(w1) + β2A(w2) for all w1, w2 ∈ W and any scalars β1, β2.

Definition 3.2.2. Let L and N be normed linear spaces. The space consisting of all bounded linear operators from L into N is denoted B(L,N).
If L and N are linear spaces and A is a linear operator from L into N, the equation Al = n, for a given n ∈ N, may (1) have one and only one solution l ∈ L (that is, A⁻¹ exists, and if Al = n then A⁻¹(n) = l), (2) have no solution, in which case an approximate solution can be sought, or (3) have many solutions, from which an optimal one is chosen. Only the latter two cases will be discussed, since they involve choosing an optimal solution.

Angus Taylor, Introduction to Functional Analysis (New York, 1958), pp. 85-86, 163, 213-215.
Theorem 3.2.3. Given Hilbert spaces G and H and A ∈ B(G,H), a vector g ∈ G minimizes ||h - Ag||, h fixed in H, if and only if A*Ag = A*h.

Proof. This is the case in which the equation Ag = h need have no solution; the problem is equivalent to minimizing ||h - ĥ|| over ĥ ∈ R(A) (the range of A). By theorem 2.1.1, ĥ = Ag is a minimizing vector if and only if h - ĥ ∈ [R(A)]⊥. Since [R(A)]⊥ = N(A*) (the nullspace of A*), this holds if and only if h - Ag ∈ N(A*), that is, 0 = A*(h - Ag) = A*h - A*Ag.
Theorem 3.2.4. Let G and H be Hilbert spaces and let A ∈ B(G,H) be such that R(A) is closed in H. Among the solutions of Ag = h, the one of minimum norm is g = A*f, where f is a solution of AA*f = h.

Proof. If g1 satisfies Ag1 = h, the general solution is g = g1 + u, u ∈ N(A). Since N(A) is closed, there is a unique g of minimum norm satisfying Ag = h, and it satisfies g ⊥ N(A). Since R(A) is closed, g ∈ [N(A)]⊥ = R(A*). Thus g = A*f for some f ∈ H. Since Ag = h, it follows that AA*f = h.
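A minimal illustration of theorem 3.2.4: for the 1 x 2 matrix A = [1 1] (an underdetermined system, chosen for illustration), the minimum-norm solution of Ag = h is obtained by solving AA*f = h and setting g = A*f:

```python
# Theorem 3.2.4 for the matrix A = [1 1] : R^2 -> R (illustrative).
# Solve A g = h with minimum norm: solve A A* f = h, then g = A* f.
h = 2.0
# A A* = 1*1 + 1*1 = 2, so f = h / 2 and g = A* f = (f, f).
f = h / 2.0
g = (f, f)

assert g[0] + g[1] == h          # A g = h holds
# Every solution has the form g + t*(1, -1), t real; the
# minimum-norm one is t = 0.
for t in (-1.0, -0.1, 0.1, 1.0):
    other = (g[0] + t, g[1] - t)
    assert other[0] ** 2 + other[1] ** 2 > g[0] ** 2 + g[1] ** 2
```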
Definition 3.2.5. Let G and H be Hilbert spaces and let A ∈ B(G,H) be such that R(A) is closed in H. For each h ∈ H, let g0 ∈ G be the unique vector of minimum norm among those g1 ∈ G which satisfy

||Ag1 - h|| = min over g of ||Ag - h||.

The operator A⁺: h → g0 is called the pseudoinverse of A.

The concept of the pseudoinverse offers another approach to solving the equation Ag = h; however, it will not be discussed here.
CHAPTER IV
DIFFERENTIATION IN NORMED LINEAR SPACES
4.1. Gateaux and Frechet Differentials
Definition 4.1.1. Let Y be a vector space, N a normed space, and T a (possibly nonlinear) transformation defined on a domain D ⊂ Y with range R ⊂ N. Let y ∈ D ⊂ Y and let h be arbitrary in Y. If the limit

(1) δT(y;h) = lim as α → 0 of (1/α)[T(y + αh) - T(y)]

exists, it is called the Gateaux differential of T at y with increment h. If the limit exists for each h ∈ Y, then T is said to be Gateaux differentiable at y.

The limit in (1) is taken in the usual sense of norm convergence in N, and is considered only if y + αh ∈ D for all α sufficiently small. For a fixed y ∈ D, with h regarded as a variable, the Gateaux differential defines a transformation from Y to N.
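The defining limit (1) can be approximated numerically by taking α small. The sketch below (an illustrative nonlinear map on R², not from the thesis) compares the difference quotient against the differential computed from the Jacobian:

```python
# Numerical Gateaux differential of a nonlinear map T: R^2 -> R^2,
# approximating dT(y; h) = lim (1/a)[T(y + a h) - T(y)] as a -> 0.
import math

def T(y):
    return (math.sin(y[0]) + y[1] ** 2, y[0] * y[1])

def gateaux(T, y, h, a=1e-6):
    ya = tuple(p + a * q for p, q in zip(y, h))
    return tuple((u - v) / a for u, v in zip(T(ya), T(y)))

y, h = (0.5, 2.0), (1.0, -1.0)
num = gateaux(T, y, h)

# Since T is differentiable here, the Gateaux differential equals
# the Jacobian of T at y applied to h.
exact = (math.cos(y[0]) * h[0] + 2 * y[1] * h[1],
         y[1] * h[0] + y[0] * h[1])
assert all(abs(n - e) < 1e-4 for n, e in zip(num, exact))
```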
Proposition 4.1.2. If T is linear, then δT(y;h) = T(h).

Proof. Assume T is linear. Then

δT(y;h) = lim as α → 0 of (1/α)[T(y + αh) - T(y)]
= lim as α → 0 of (1/α)[T(y) + αT(h) - T(y)]
= lim as α → 0 of (1/α)[αT(h)] = T(h).
Definition 4.1.3. Let T be a transformation defined on an open domain D in a normed space L, with range in a normed space N. If for a fixed y ∈ D and each h ∈ L there exists a δT(y;h) ∈ N, linear and continuous with respect to h, such that

lim as ||h|| → 0 of ||T(y + h) - T(y) - δT(y;h)|| / ||h|| = 0,

then T is said to be Frechet differentiable at y, and δT(y;h) is called the Frechet differential of T at y with increment h.

Note that δT(y;h) will be used to represent both the Gateaux and Frechet differentials, since it is clear from the context which is meant.
Proposition 4.1.4. If the transformation T has a Frechet differential, then it is unique.

Proof. Assume δT(y;h) and δ'T(y;h) are both Frechet differentials of T at y with increment h. Then

||δT(y;h) - δ'T(y;h)|| = ||[T(y+h) - T(y) - δ'T(y;h)] - [T(y+h) - T(y) - δT(y;h)]||
<= ||T(y+h) - T(y) - δT(y;h)|| + ||T(y+h) - T(y) - δ'T(y;h)||,

and by the definition of the Frechet differential each term on the right is o(||h||) as ||h|| → 0. Thus ||δT(y;h) - δ'T(y;h)|| = o(||h||). Since δT(y;h) - δ'T(y;h) is bounded and linear in h, it must be zero. Consequently δT(y;h) = δ'T(y;h), and the Frechet differential is unique.
Proposition 4.1.5. If the Frechet differential of T exists at y, then the Gateaux differential exists at y and they are equal.

Proof. Let δT(y;h) denote the Frechet differential, and fix h ≠ 0. By the linearity of δT(y;·) with respect to its increment, δT(y;αh) = αδT(y;h), so

||(1/α)[T(y + αh) - T(y)] - δT(y;h)|| = (1/|α|)||T(y + αh) - T(y) - δT(y;αh)||
= ||h|| ||T(y + αh) - T(y) - δT(y;αh)|| / ||αh||,

which approaches zero as α → 0, by the definition of the Frechet differential. Hence

lim as α → 0 of (1/α)[T(y + αh) - T(y)] = δT(y;h),

which is the Gateaux differential.
Proposition 4.1.6. If the transformation T defined on an open set D ⊂ Y has a Frechet differential at y, then T is continuous at y.

Proof. Since δT(y;·) is a bounded linear operator, there is an M with ||δT(y;h)|| <= M||h||. Given ε > 0, there exists a δ > 0 such that

||T(y+h) - T(y) - δT(y;h)|| <= ε||h||  whenever ||h|| <= δ.

Thus

||T(y+h) - T(y)|| <= ||T(y+h) - T(y) - δT(y;h)|| + ||δT(y;h)|| <= ε||h|| + M||h|| = (ε + M)||h||,

which tends to zero as h → 0. Therefore T is continuous at y.
4.2. Frechet Derivatives
If the transformation T defined on an open domain D ⊂ Y has a Frechet differential at each y ∈ D, then for a fixed point y ∈ D, δT(y;h) is a bounded linear operator in h ∈ Y. Thus δT(y;h) can be written as Ah, where A is a bounded linear operator from Y to N; that is, A ∈ B(Y,N), which is itself a normed linear space.

Since A depends on y ∈ D, the map y → A defines a transformation from D into B(Y,N). This transformation is called the Frechet derivative T' of T, and we write A as T'(y). Thus

δT(y;h) = T'(y)h.
Definition 4.2.1. Let U: D → B(Y,N) be defined by U(y) = T'(y), so that U = T'. U is continuous at y0 if and only if for every ε > 0 there exists a δ > 0 such that ||y - y0|| < δ implies ||T'(y) - T'(y0)|| < ε. If U is continuous on some open sphere S, then T is said to be continuously Frechet differentiable on S.
Proposition 4.2.2. Let S be a transformation mapping an open set D ⊂ Y into an open set E ⊂ Y, and let P be a transformation mapping E into a normed space N. Put T = PS. Suppose S is Frechet differentiable at y ∈ D and P is Frechet differentiable at z = S(y) ∈ E. Then T is Frechet differentiable at y and T'(y) = P'(z)S'(y).

Proof. For h ∈ Y small enough that y + h ∈ D,

T(y+h) - T(y) = PS(y+h) - PS(y) = P[S(y+h)] - P[S(y)].

Let g = S(y+h) - S(y). Since z = S(y),

P[S(y+h)] - P[S(y)] = P(g + z) - P(z).

By the definition of the Frechet derivative,

||P(g+z) - P(z) - P'(z)g|| = o(||g||)

and

||S(y+h) - S(y) - S'(y)h|| = ||g - S'(y)h|| = o(||h||).

Now

||T(y+h) - T(y) - P'(z)S'(y)h||
<= ||T(y+h) - T(y) - P'(z)g|| + ||P'(z)g - P'(z)S'(y)h||
<= ||P(g+z) - P(z) - P'(z)g|| + ||P'(z)|| ||g - S'(y)h||.

Since S is continuous at y, ||g|| = ||S(y+h) - S(y)|| = O(||h||), so each term on the right is o(||h||). Consequently δT(y;h) = P'(z)S'(y)h and T'(y) = P'(z)S'(y).
Proposition 4.2.3. Let T be Frechet differentiable on an open domain D. Let y ∈ D and y + αh ∈ D for all α, 0 <= α <= 1. Then

||T(y+h) - T(y)|| <= ||h|| sup over 0 <= α <= 1 of ||T'(y + αh)||.

Proof. Let z0 = T(y+h) - T(y) ∈ N, where N is a real normed linear space. If z0 = 0 the inequality is immediate. If z0 ≠ 0, then by a corollary of the Hahn-Banach extension theorem there exists a z* ∈ N* with ||z*|| = 1 and

z*(z0) = ||T(y+h) - T(y)||.

Define φ(α) = z*[T(y + αh)] on the interval [0,1]. By proposition 4.2.2, φ'(α) = z*[T'(y + αh)h]. By the mean value theorem for functions of a real variable,

φ(1) - φ(0) = φ'(α0),  0 < α0 < 1.

Thus

||T(y+h) - T(y)|| = z*[T(y+h) - T(y)] = z*[T'(y + α0 h)h]
<= ||z*|| ||T'(y + α0 h)h||
<= ||z*|| sup over 0 <= α <= 1 of ||T'(y + αh)|| ||h||.

Therefore ||T(y+h) - T(y)|| <= ||h|| sup over 0 <= α <= 1 of ||T'(y + αh)||.
Definition 4.2.4. If T: Y → N is Frechet differentiable on an open domain D ⊂ Y, then T' maps D into B(Y,N) and may itself be Frechet differentiable on a subset D1 ⊂ D. There the Frechet derivative of T' is denoted by T'' and is called the second Frechet derivative of T.
Proposition 4.2.5. Let T be twice Frechet differentiable on an open domain D. Let y ∈ D and suppose that y + αh ∈ D for all α, 0 <= α <= 1. Then

||T(y+h) - T(y) - T'(y)h|| <= (1/2)||h||² sup over 0 <= α <= 1 of ||T''(y + αh)||.

Proof. The proof follows from that of proposition 4.2.3.

S. C. Saxena and S. M. Shah, Introduction to Real Variable Theory (Scranton, 1972), pp. 168-169.
CHAPTER V
OPTIMIZATION BY ITERATIVE METHODS
5.1. Methods for Solving Nonlinear Equations
The first method which we will discuss is that of
successive approximation. It is used to solve equations
of the form y=T(y) where the solution y is said to be a
fixed point of the transformation T, since T leaves y
invariant. The process is illustrated by figure 1.
We find a fixed point by beginning with an initial trial vector y1 and computing y2 = T(y1). Graphically, this corresponds to locating the intersection of the curve T(y) with the forty-five degree line through the origin; y3 = T(y2) is obtained by moving along the curve as shown. Continuing in this manner, successive vectors y_{n+1} = T(y_n) are computed, and under the conditions given below the sequence {y_n} converges to a solution of the equation y = T(y).
Definition 5.1.1. Let S be a subset of a normed space N and T a transformation mapping S into S. Then T is said to be a contraction mapping if there is an α, 0 <= α < 1, such that ||T(y1) - T(y2)|| <= α||y1 - y2|| for all y1, y2 ∈ S.

A contraction mapping is uniformly continuous on S: given ε > 0 and taking δ = ε/α, ||y1 - y2|| < δ implies ||T(y1) - T(y2)|| <= α||y1 - y2|| < ε.

Fig. 1. Successive approximation process
Note that a transformation having ||T'(y)|| <= α < 1 on a convex set K is a contraction mapping, since by the mean value inequality (proposition 4.2.3),

||T(y1) - T(y2)|| <= sup ||T'(y)|| ||y1 - y2|| <= α||y1 - y2||;

that is, when a transformation is Frechet differentiable with derivative of norm less than one on a convex set, it is a contraction mapping.
Theorem 5.1.2. (Contraction Mapping Theorem) If T is a contraction mapping on a closed subset S of a Banach space, there is a unique vector y0 ∈ S satisfying y0 = T(y0). Furthermore, y0 can be obtained by the method of successive approximation starting from an arbitrary initial vector in S.

Proof. Take an arbitrary y1 ∈ S and define the sequence {y_n} by y_{n+1} = T(y_n). Then

||y_{n+1} - y_n|| = ||T(y_n) - T(y_{n-1})|| <= α||y_n - y_{n-1}||.

It follows that ||y_{n+1} - y_n|| <= α^{n-1}||y2 - y1||. In addition, for any p,

||y_{n+p} - y_n|| <= α^{n-1}(1 + α + ... + α^{p-1})||y2 - y1|| <= (α^{n-1}/(1 - α))||y2 - y1|| → 0

as n → ∞. Therefore {y_n} is Cauchy in N.

Since S is a closed subset of a complete space, there is a y0 ∈ S such that y_n → y0. We know that y_{n+1} = T(y_n); since T is continuous, y0 = lim y_{n+1} = lim T(y_n) = T(lim y_n) = T(y0). Therefore y0 = T(y0).

Assume that y0 and z0 are both fixed points. Since y0 = T(y0) and z0 = T(z0),

||y0 - z0|| = ||T(y0) - T(z0)|| <= α||y0 - z0||.

Therefore ||y0 - z0||(1 - α) <= 0, so y0 = z0 and y0 is unique.
Theorem 5.1.3. Let T be a continuous mapping from a closed subset S of a Banach space into S, and suppose Tⁿ is a contraction mapping for some positive integer n. Then T has a unique fixed point in S which can be found by successive approximation.

Proof. Let s1 be arbitrary in S and define the sequence {s_k} by s_{k+1} = T(s_k). Since Tⁿ is a contraction mapping, the subsequence {s_{nk+1}} = {T^{nk}(s1)} converges, by theorem 5.1.2, to an s0 ∈ S which is a fixed point of Tⁿ.

Since T is continuous,

T(s0) = T[lim as k → ∞ of T^{nk}(s1)] = lim as k → ∞ of T^{nk}[T(s1)].

Further, letting α < 1 be the contraction constant of Tⁿ,

||s0 - T(s0)|| = lim as k → ∞ of ||T^{nk}(s1) - T^{nk}[T(s1)]|| <= lim as k → ∞ of α^k ||s1 - T(s1)|| = 0.

Thus s0 = T(s0).

If s0 and t0 are fixed points of T, then both are fixed points of Tⁿ, and

||s0 - t0|| = ||Tⁿ(s0) - Tⁿ(t0)|| <= α||s0 - t0||.

Therefore s0 = t0, and s0 is the unique fixed point of T.
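Theorem 5.1.3 is genuinely stronger than theorem 5.1.2; for example (an illustration not from the thesis), T = cos on the real line is not a contraction, since sup |T'(y)| = sup |sin y| = 1, but (T²)'(y) = sin(cos y) sin y has absolute value at most sin 1 < 1, so T² is a contraction. Successive approximation still converges to the unique fixed point:

```python
# T = cos is not a contraction on the real line, but T^2 = cos(cos(.))
# is, with constant sin(1) < 1; by theorem 5.1.3 iteration of T still
# converges to the unique fixed point y = cos(y).
import math

y = 5.0                      # arbitrary starting point
for _ in range(200):
    y = math.cos(y)

assert abs(y - math.cos(y)) < 1e-12
```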
The next method which we will discuss is Newton's method. It is used for solving equations of the form P(y) = 0, and it has a direct extension applicable to a nonlinear transformation P on a normed space. The technique is illustrated in figure 2.

At a given point, the graph of the function P is approximated by its tangent, and an approximate solution of P(y) = 0 is taken as the point where the tangent crosses the horizontal axis. The process is then repeated iteratively from this new point, defining a sequence of points according to

y_{n+1} = y_n - [P'(y_n)]⁻¹ P(y_n).
Fig. 2. Technique of Newton's method
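A scalar instance of the Newton iteration g_{n+1} = g_n - P'(g_n)⁻¹P(g_n), for the illustrative equation P(y) = y² - 2 = 0:

```python
# Newton's method for P(y) = y^2 - 2 = 0; the iterates converge
# quadratically to sqrt(2).
def P(y):
    return y * y - 2.0

def Pprime(y):
    return 2.0 * y

y = 1.0                      # initial point where P'(y) is invertible
for _ in range(8):
    y = y - P(y) / Pprime(y)

assert abs(P(y)) < 1e-12
```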
Theorem 5.1.4. Let G and H be Banach spaces and let P be a mapping from G to H. Assume that:

(1) P is twice Frechet differentiable and ||P''(g)|| <= K.
(2) There is a point g1 ∈ G such that P1 = P'(g1) has a bounded inverse P1⁻¹ with ||P1⁻¹|| <= β1 and ||P1⁻¹P(g1)|| <= η1.
(3) The constant l1 = β1η1K satisfies l1 < 1/2.

Then the sequence g_{n+1} = g_n - P'(g_n)⁻¹P(g_n) exists for all n >= 1 and converges to a solution of P(g) = 0.

Proof. We first show that if g1 satisfies (1), (2), and (3), then g2 = g1 - P1⁻¹P(g1) satisfies the same conditions with new constants β2, η2, l2.

Since g2 - g1 = -P1⁻¹P(g1), we have ||g2 - g1|| <= η1. Writing P2 = P'(g2), the mean value inequality applied to P' gives

||P1⁻¹[P1 - P2]|| <= ||P1⁻¹|| sup over 0 <= α <= 1 of ||P''(g1 + α(g2 - g1))|| ||g2 - g1|| <= β1Kη1 = l1 < 1/2.

Since ||P1⁻¹[P1 - P2]|| <= l1 < 1/2, the operator I - P1⁻¹[P1 - P2] = P1⁻¹P2 has a bounded inverse, with ||(P1⁻¹P2)⁻¹|| <= 1/(1 - l1). Therefore P2⁻¹ = (P1⁻¹P2)⁻¹P1⁻¹ exists, and we may take

β2 = β1/(1 - l1).

To obtain a bound for ||P2⁻¹P(g2)||, consider the operator T1(g) = g - P1⁻¹P(g). Clearly T1(g1) = g2. Also

T1'(g) = I - P1⁻¹P'(g), so T1'(g1) = I - P1⁻¹P1 = 0, and T1''(g) = -P1⁻¹P''(g), so ||T1''(g)|| <= β1K.

Since T1(g2) = g2 - P1⁻¹P(g2), we have

P1⁻¹P(g2) = g2 - T1(g2) = T1(g1) - T1(g2) - T1'(g1)(g1 - g2),

and by proposition 4.2.5,

||P1⁻¹P(g2)|| <= (1/2)||g2 - g1||² sup ||T1''|| <= (1/2)η1²β1K = (1/2)l1η1.

Consequently

||P2⁻¹P(g2)|| = ||(P1⁻¹P2)⁻¹P1⁻¹P(g2)|| <= [1/(1 - l1)](1/2)l1η1 = η2.

Let l2 = β2η2K. Then

l2 = [β1/(1 - l1)][l1η1/(2(1 - l1))]K = l1²/(2(1 - l1)²) = (1/2)[l1/(1 - l1)]² < 1/2,

since l1 < 1/2 implies l1/(1 - l1) < 1. The conditions (1), (2), and (3) are thus satisfied for g2 with the constants β2, η2, l2. By induction, g_n exists for all n with constants β_n, η_n, l_n such that

(1) ||P''(g)|| <= K,
(2) P_n = P'(g_n) has a bounded inverse with ||P_n⁻¹|| <= β_n and ||P_n⁻¹P(g_n)|| <= η_n,
(3) l_n = β_nη_nK < 1/2.

Since η_{n+1} = l_nη_n/(2(1 - l_n)) and l_n < 1/2, we have η_{n+1} <= (1/2)η_n, hence η_n <= η1/2^{n-1}. In addition, since ||g_{n+1} - g_n|| <= η_n, it follows that for any k,

||g_{n+k} - g_n|| <= η_n + η_{n+1} + ... + η_{n+k-1} <= (η1/2^{n-1})(1 + 1/2 + 1/4 + ...) <= η1/2^{n-2}.

As n → ∞, ||g_{n+k} - g_n|| → 0 uniformly in k. Therefore {g_n} is Cauchy in G, and there exists a g0 ∈ G such that g_n → g0.

To show that g0 satisfies P(g0) = 0, note first that {||P_n||} is bounded, since

||P_n|| <= ||P'(g_n) - P'(g1)|| + ||P1|| <= K||g_n - g1|| + ||P1||,

and ||g_n - g1|| is bounded, {g_n} being convergent. From g_{n+1} - g_n = -P_n⁻¹P(g_n) we have, for each n, P_n(g_{n+1} - g_n) = -P(g_n). But ||g_{n+1} - g_n|| → 0 and ||P_n|| is bounded, so

||P(g_n)|| = ||P_n(g_{n+1} - g_n)|| <= ||P_n|| ||g_{n+1} - g_n|| → 0.

Hence lim ||P(g_n)|| = 0. Since P is differentiable, it is continuous, and P(g0) = 0.
5.2. Descent Methods

We now turn to descent methods, which iterate in such a way as to decrease the cost functional from one step to the next and thus ensure convergence of the functional values from an arbitrary starting point. The procedure for minimizing a functional f consists of taking a given initial point y1 and constructing iterations according to equations of the form

y_{n+1} = y_n + α_n p_n,

where α_n is a scalar and p_n is a direction vector. After selecting the vector p_n, the scalar α_n is chosen to minimize f(y_n + αp_n), regarded as a function of α. The arrangement is such that f(y_n + αp_n) < f(y_n) for some small positive α; α_n is then often taken as the smallest positive root of the equation

(d/dα) f(y_n + αp_n) = 0.

Then f(y_n + α_n p_n) is evaluated to verify that it has decreased from f(y_n); if it has not, a smaller value of α_n is chosen.
The descent process is illustrated in Figure 3, where the contours
represent the functional f in the space Y.  We start from a point y_1
and then move along the direction vector p_1 to the first point where
the line y_1 + α p_1 is tangent to a contour of f.  If f is bounded
below, note that the descent process defines a bounded decreasing
sequence of functional values, so the objective values tend toward a
limit f_0.
The descent method which we will discuss here is that of steepest
descent.  This method is used to minimize a functional f defined on a
Hilbert space Y.  The direction vector p_n at a given point y_n is
chosen to be the negative gradient of f at y_n.

An important application is to minimize the quadratic functional

    f(y) = (y, Qy) - 2(b, y),

where Q is a self-adjoint positive-definite operator on Y.  Assume that

    m = inf_{y≠0} (y, Qy)/(y, y)

and
Fig. 3. Descent process (contour annotation: f increasing)
    M = sup_{y≠0} (y, Qy)/(y, y)

are positive, finite numbers.  Then f is minimized by solving the linear
equation

    Qy_0 = b.

The vector

    r = b - Qy

is called the residual of the approximation.  Inspecting f(y + αl)
reveals that its derivative with respect to α, at α = 0, is (l, -2r).
Thus 2r is the negative gradient of f at the point y.  Therefore the
steepest descent method applied to f takes the form

    y_{n+1} = y_n + α_n r_n,

where r_n = b - Qy_n and α_n is chosen to minimize f(y_{n+1}).  The
value of α_n is found as follows: setting the derivative of
f(y_n + α r_n) with respect to α equal to zero gives

    2α(r_n, Qr_n) - 2(r_n, r_n) = 0.

Thus,

    α_n = (r_n, r_n)/(r_n, Qr_n),

where r_n = b - Qy_n.
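A small numerical check of this step length may help; the following is an editorial illustration (the matrix and right-hand side are made up, not from the text).

```python
# Check that a_n = (r_n, r_n)/(r_n, Qr_n) minimizes f(y_n + a r_n) along the
# residual direction, for f(y) = (y, Qy) - 2(b, y) in R^3.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
Q = A @ A.T + 3.0 * np.eye(3)            # self-adjoint and positive definite
b = rng.standard_normal(3)
f = lambda y: y @ Q @ y - 2.0 * b @ y

y = np.zeros(3)
r = b - Q @ y                            # residual: half the negative gradient
a_star = (r @ r) / (r @ Q @ r)

# f(y + a r) is a strictly convex parabola in a, so a_star should beat
# nearby step lengths.
vals = [f(y + a * r) for a in (0.5 * a_star, a_star, 1.5 * a_star)]
```

Since f(y + a r) = f(y) - 2a(r, r) + a^2(r, Qr), the middle value is the smallest of the three.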
Theorem 5.2.1.  For any y_1 ∈ Y, the sequence {y_n} defined by

    y_{n+1} = y_n + [(r_n, r_n)/(r_n, Qr_n)] r_n,    r_n = b - Qy_n,

converges (in norm) to the unique solution y_0 of Qy = b.  Furthermore,
defining

    F(y) = (y - y_0, Q(y - y_0)),

the rate of convergence satisfies

    ||z_n||^2 ≤ (1/m)(1 - m/M)^{n-1} F(y_1),

where z_n = y_0 - y_n.
Proof.  Note that

    F(y) = (y - y_0, Q(y - y_0))
         = (y, Qy) - (y, Qy_0) - (y_0, Qy) + (y_0, Qy_0)
         = (y, Qy) - 2(y, b) + (y_0, Qy_0)
         = f(y) + (y_0, Qy_0),

so that f and F achieve a minimum at y_0 and the gradients of f and F
are equal.  By direct computation,

    [F(y_n) - F(y_{n+1})]/F(y_n) = [f(y_n) - f(y_{n+1})]/F(y_n)
                                 = [2α_n(r_n, r_n) - α_n^2(r_n, Qr_n)]/F(y_n)
                                 = (r_n, r_n)^2 / [(r_n, Qr_n) F(y_n)].

We know that r_n = Qz_n, so

    [F(y_n) - F(y_{n+1})]/F(y_n) = (r_n, r_n)^2 / [(r_n, Qr_n)(z_n, Qz_n)].

From the definitions of m and M, where m > 0, it follows that

    (r_n, Qr_n) ≤ M(r_n, r_n)

and

    (r_n, r_n) = (Qz_n, Qz_n) ≥ m(z_n, Qz_n).

In addition,

    [F(y_n) - F(y_{n+1})]/F(y_n) ≥ m/M.

Then

    F(y_{n+1})/F(y_n) ≤ 1 - m/M,

and we have that

    F(y_{n+1}) ≤ (1 - m/M) F(y_n).

Continuing in the same manner,

    F(y_n) ≤ (1 - m/M)^{n-1} F(y_1),

so F(y_n) → 0.  We know that

    m(z_n, z_n) ≤ (z_n, Qz_n) = F(y_n).

Consequently,

    ||z_n||^2 ≤ (1/m) F(y_n) ≤ (1/m)(1 - m/M)^{n-1} F(y_1).

Therefore z_n → 0 and y_n → y_0, and the proof is complete.
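The per-step contraction derived in the proof can be observed numerically; this is an editorial illustration, and the diagonal matrix below (for which m = 1 and M = 10) is a made-up example.

```python
# Illustration of the bound F(y_{n+1}) <= (1 - m/M) F(y_n) for steepest
# descent on f(y) = (y, Qy) - 2(b, y).
import numpy as np

Q = np.diag([1.0, 4.0, 10.0])            # m = 1, M = 10 for this example
b = np.array([1.0, -2.0, 3.0])
y0 = np.linalg.solve(Q, b)               # exact minimizer y_0 = Q^{-1} b
F = lambda y: (y - y0) @ Q @ (y - y0)

y = np.zeros(3)
F_vals = [F(y)]
for _ in range(20):
    r = b - Q @ y
    y = y + (r @ r) / (r @ Q @ r) * r    # steepest descent step
    F_vals.append(F(y))

# Successive ratios F(y_{n+1})/F(y_n), skipping negligible denominators.
ratios = [F_vals[k + 1] / F_vals[k] for k in range(20) if F_vals[k] > 1e-14]
```

Every observed ratio stays at or below 1 - m/M = 0.9, in agreement with the theorem.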
Theorem 5.2.2.  Let f be a functional bounded below and twice Fréchet
differentiable on a Hilbert space H.  Given h_1 ∈ H, let S be the closed
convex hull of {h: f(h) ≤ f(h_1)}.  Assume that f''(h) is self-adjoint
and satisfies

    m||x||^2 ≤ (x, f''(h)x) ≤ M||x||^2,    0 < m ≤ M,

throughout S (i.e., f''(h) is uniformly bounded and positive definite).
If {h_n} is the sequence generated by steepest descent applied to f
starting at h_1, then f'(h_n) → 0.  Furthermore, there exists an
h_0 ∈ S such that h_n → h_0 and f(h_0) = inf{f(h): h ∈ H}.

    7 David Luenberger, Optimization by Vector Space Methods (New York,
1969), pp. 150-152.
Proof.  Given h ∈ S, apply Taylor's expansion(8) with remainder to the
function

    g(t) = f(th + (1 - t)h_1).

Then we have

    g(1) = f(h),
    g(0) = f(h_1),
    g'(0) = f'(h_1)(h - h_1),
    g(t) = g(0) + tg'(0) + (t^2/2)g''(τ),  where 0 < τ < t.

So

    g(1) = g(0) + g'(0) + (1/2)g''(τ),  where 0 < τ < 1.

Then

    g'(t) = f'(t(h - h_1) + h_1)(h - h_1)

implies that

    g'(0) = f'(h_1)(h - h_1)

and

    g''(t) = f''(t(h - h_1) + h_1)(h - h_1)(h - h_1).

Let h be such that f(h) ≤ f(h_1).  Then we have

    -f'(h_1)(h - h_1) ≤ |f'(h_1)(h - h_1)| ≤ ||f'(h_1)|| ||h - h_1||,

and, from the expansion above together with the lower bound on f'',

    (m/2)||h - h_1||^2 ≤ f(h) - f(h_1) + ||f'(h_1)|| ||h - h_1||.

So it follows that {h: f(h) ≤ f(h_1)} is bounded.

    8 G. E. Sherwood and Angus Taylor, Calculus (Englewood Cliffs, N.J.,
1954), pp. 395-398.
{f(h_n)} → f_0 implies that for an ε > 0, there is an N_0 such that

    |f(h_n) - f_0| < ε^2/(4M)

if n ≥ N_0.  So assume that f'(h_n) does not approach zero.  Then there
exists an ε > 0 such that for any N_1, there is an n ≥ N_1 with
||f'(h_n)|| ≥ ε.  Let N = max(N_0, N_1).  Then for some n ≥ N we have
both |f(h_n) - f_0| < ε^2/(4M) and ||f'(h_n)|| ≥ ε.

For an α > 0, let h_α = h_n - αf'(h_n).  By Taylor's expansion,

    f(h_α) = f(h_n) - α f'(h_n)f'(h_n) + (α^2/2) f''(ξ)f'(h_n)f'(h_n)
           ≤ f(h_n) - α||f'(h_n)||^2 + (α^2 M/2)||f'(h_n)||^2.

Hence, for α = 1/M,

    f(h_α) ≤ f(h_n) - (1/2M)||f'(h_n)||^2 ≤ f(h_n) - ε^2/(2M).

Since α_n is chosen to minimize over α, f(h_{n+1}) ≤ f(h_α), and thus

    f(h_{n+1}) ≤ f_0 + ε^2/(4M) - ε^2/(2M) < f_0.

Therefore f(h_{n+1}) < f_0.  But all f(h_n) ≥ f_0.  So we have a
contradiction, and therefore ||f'(h_n)|| → 0.
For any h, l ∈ S, by the one-dimensional mean value theorem,

    (f'(h) - f'(l))(h - l) = (h - l, f''(ξ)(h - l)),

where ξ = th + (1 - t)l, 0 ≤ t ≤ 1.  Consequently,

    ||f'(h) - f'(l)|| ||h - l|| ≥ m||h - l||^2,

and thus

    ||h - l|| ≤ (1/m)||f'(h) - f'(l)||.

Since {f'(h_n)} is Cauchy, so is {h_n}.  Therefore h_0 ∈ S and
h_n → h_0.  Clearly f'(h_0) = 0.  Let s be such that h_0 + s ∈ S.  Then
there is a t, 0 < t < 1, such that

    f(h_0 + s) = f(h_0) + (1/2)(s, f''(h_0 + ts)s)
               ≥ f(h_0) + (m/2)||s||^2.

Therefore h_0 minimizes f in S, and hence in H.  The proof is
complete.
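A minimal sketch of the situation of this theorem in R^2, as an editorial illustration: the functional below is made up, with f''(h) = diag(2 - cos h_0, 4), so one may take m = 1 and M = 4 everywhere; for simplicity the constant step 1/M from the proof is used in place of an exact line search.

```python
# Steepest descent h <- h - (1/M) f'(h) on a bounded-below functional whose
# second derivative satisfies m = 1 <= f'' <= M = 4 on all of R^2.
import numpy as np

f = lambda h: h[0]**2 + 2.0 * h[1]**2 + np.cos(h[0])
grad = lambda h: np.array([2.0 * h[0] - np.sin(h[0]), 4.0 * h[1]])

M = 4.0
h = np.array([2.0, -1.5])
for _ in range(200):
    h = h - (1.0 / M) * grad(h)          # constant step 1/M, as in the proof

# The unique stationary point is h_0 = (0, 0), where f attains its infimum 1.
```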
5.3. Conjugate Direction Methods

We will use Fourier series(9) here to minimize a quadratic functional f
on a Hilbert space H by an appropriate transformation.

Let f(y) = (y, Qy) - 2(y, b), where Q is a self-adjoint linear operator
satisfying (y, Qy) ≤ M(y, y) and (y, Qy) ≥ m(y, y) for all y ∈ H and
some M, m > 0.  Then the unique vector y_0 minimizing f is the unique
solution of the equation Qy = b.  This problem can be considered a
minimum norm problem by introducing an inner product [y, z] = (y, Qz),
since it is equivalent to minimizing ||y - y_0||_Q.
If we can generate a sequence of vectors {p_1, p_2, ...} that are
orthogonal with respect to the inner product [·,·], then the sequence is
said to be Q-orthogonal, or a sequence of conjugate directions.  The
vector y_0 can be expanded in a Fourier series with respect to this
sequence.  If the n-th partial sum of such an expansion is denoted by
y_n, then by the fundamental approximation property of Fourier series,
||y_n - y_0||_Q is minimized over the subspace [p_1, p_2, ..., p_n].
If {p_i} is a maximal orthogonal set in H, then by expanding
||y_n - y_0||_Q^2 and using Parseval's relation we have

    lim ||y_n - y_0||_Q^2 = ||y_0||_Q^2 - lim ||y_n||_Q^2
                          = ||y_0||_Q^2 - ||y_0||_Q^2 = 0.

Thus y_n → y_0, and ||y_n - y_0||_Q decreases as n increases.
Consequently, if {p_i} is complete, the process converges to y_0.

    9 David Luenberger, Optimization by Vector Space Methods (New York,
1969), pp. 58-60.
Theorem 5.3.1 (Method of Conjugate Directions).  Let {p_n} be a
sequence in H such that (p_i, Qp_j) = 0 for i ≠ j, and such that the
closed linear subspace generated by the sequence is H.  Then for any
y_1 ∈ H, the sequence generated by the recursion

    y_{n+1} = y_n + α_n p_n,    α_n = (r_n, p_n)/(p_n, Qp_n),
    r_n = b - Qy_n,

satisfies (r_n, p_k) = 0, k = 1, 2, ..., n-1.  In addition, y_n → y_0
(the unique solution of Qy = b).
Proof.  Define z_n = y_n - y_1.  The recursion is then equivalent to
z_1 = 0 and

    z_{n+1} = z_n + α_n p_n,    α_n = (r_n, p_n)/(p_n, Qp_n).

In terms of the inner product [·,·], knowing that
r_n = Q(y_0 - y_1 - z_n), we have

    α_n = [y_0 - y_1 - z_n, p_n]/[p_n, p_n].

Since z_n ∈ [p_1, p_2, ..., p_{n-1}], it follows that [z_n, p_n] = 0
and

    α_n = [y_0 - y_1, p_n]/[p_n, p_n].

Continuing in this manner,

    z_{n+1} = Σ_{k=1}^{n} ([y_0 - y_1, p_k]/[p_k, p_k]) p_k,

which is the n-th partial sum of the Fourier expansion of
z_0 = y_0 - y_1.  With our assumptions on Q, convergence with respect
to ||·|| is equivalent to convergence with respect to ||·||_Q.  Thus it
follows that z_n → z_0 and y_n → y_0.  Finally, (r_n, p_k) = 0 follows
from the fact that the error z_0 - z_n is Q-orthogonal to the subspace
[p_1, p_2, ..., p_{n-1}].  The proof is complete.
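In R^3 the theorem can be seen at work, as an editorial illustration (the matrix and right-hand side are made up): with Q-orthogonal directions spanning the space, the recursion reaches y_0 = Q^{-1}b after exactly three steps.

```python
# Conjugate directions in R^3: Gram-Schmidt in the inner product
# [y, z] = (y, Qz) applied to the standard basis, then y <- y + a_n p_n.
import numpy as np

Q = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])          # self-adjoint, positive definite
b = np.array([1.0, 2.0, 3.0])

# Build a Q-orthogonal (conjugate) set from the standard basis.
P = []
for v in np.eye(3):
    p = v - sum((v @ Q @ q) / (q @ Q @ q) * q for q in P)
    P.append(p)

y = np.zeros(3)
for p in P:
    r = b - Q @ y
    y = y + (r @ p) / (p @ Q @ p) * p    # a_n = (r_n, p_n)/(p_n, Qp_n)
```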
The next conjugate direction method, which we will discuss, consists of
obtaining Q-orthogonal direction vectors by applying the Gram-Schmidt
procedure to a sequence of vectors that generates a dense subspace of H.

Definition 5.3.2.  Let {v_i} be such a sequence in H, and set p_1 = v_1
and

    p_n = v_n - Σ_{i=1}^{n-1} ([v_n, p_i]/[p_i, p_i]) p_i

for n > 1, where [y, z] = (y, Qz).  Start with an initial vector v_1
and a bounded linear self-adjoint operator B, so that {v_i} is
generated by v_{n+1} = Bv_n.  This sequence is said to be a sequence of
moments of B.
Theorem 5.3.3.  Let {v_i} be a sequence of moments of a self-adjoint
operator B.  Then the sequence defined by p_1 = v_1 and

    p_{n+1} = Bp_n - ([Bp_n, p_n]/[p_n, p_n]) p_n
                   - ([Bp_n, p_{n-1}]/[p_{n-1}, p_{n-1}]) p_{n-1}

for n ≥ 2 defines an orthogonal sequence in H such that for each n,

    [p_1, p_2, ..., p_n] = [v_1, v_2, ..., v_n].

Proof.  It is clear that the theorem is true for p_1 and p_2.  So to
prove that it is true for n > 2, we assume that it is true for
{p_i}, i = 1, ..., n.  We will show it is true for
{p_i}, i = 1, ..., n+1.  Note that p_{n+1} is nonzero and is in the
subspace [v_1, v_2, ..., v_{n+1}].  Thus we need only show that p_{n+1}
is orthogonal to each p_i, i ≤ n.  By direct calculation,

    [p_{n+1}, p_i] = [Bp_n, p_i]
                     - ([Bp_n, p_n]/[p_n, p_n])[p_n, p_i]                (1)
                     - ([Bp_n, p_{n-1}]/[p_{n-1}, p_{n-1}])[p_{n-1}, p_i].  (2)

For i = n and i = n-1 the right side vanishes by construction.  If
i ≤ n-2, then (1) and (2) are zero.  Because

    Bp_i ∈ [p_1, p_2, ..., p_{i+1}],

it follows that

    [Bp_n, p_i] = [p_n, Bp_i] = 0.

Therefore p_{n+1} is orthogonal to each p_i, i ≤ n.  The proof is
complete.
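A minimal numerical sketch of the three-term recurrence, as an editorial illustration: for simplicity we take Q = I (so [y, z] is the ordinary inner product) and a made-up symmetric matrix for B, with the moments v_{n+1} = Bv_n implicit in the recurrence.

```python
# Three-term recurrence p_{n+1} = Bp_n - ([Bp_n,p_n]/[p_n,p_n]) p_n
#                                - ([Bp_n,p_{n-1}]/[p_{n-1},p_{n-1}]) p_{n-1}
# for a sequence of moments of a self-adjoint B, with Q = I.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = A + A.T                              # bounded and self-adjoint
v1 = rng.standard_normal(4)

p = [v1]
p.append(B @ p[0] - (p[0] @ B @ p[0]) / (p[0] @ p[0]) * p[0])
for _ in range(2):
    pn, pm = p[-1], p[-2]
    # Note [Bp_n, p_{n-1}] = (p_{n-1}, B p_n) since B is self-adjoint.
    p.append(B @ pn
             - (pn @ B @ pn) / (pn @ pn) * pn
             - (pm @ B @ pn) / (pm @ pm) * pm)
```

The four vectors produced are mutually orthogonal, even though each step orthogonalizes only against the two most recent directions.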
Finally, we will discuss the conjugate gradient method, which selects
the direction vectors while minimizing the functional

    f(y) = (y, Qy) - 2(b, y)

by choosing

    p_1 = r_1 = b - Qy_1

(the direction of the negative gradient of f at y_1).  A new negative
gradient direction,

    r_2 = b - Qy_2,

is then considered, and p_2 is chosen to be in the space spanned by
r_1, r_2 but Q-orthogonal to p_1.  The selection of the p_i's is
continued in this manner, so that each p_n lies in the span of
r_1, ..., r_n and is Q-orthogonal to p_1, ..., p_{n-1}.  This leads to
a recursion of the form given in the following theorem.
Theorem 5.3.4.  Given y_1 in a Hilbert space H, define

    (1) p_1 = r_1 = b - Qy_1,
    (2) r_n = b - Qy_n,
    (3) y_{n+1} = y_n + α_n p_n,
    (4) p_{n+1} = r_{n+1} + b_n p_n,
    (5) α_n = (r_n, p_n)/(p_n, Qp_n),
    (6) b_n = -(r_{n+1}, Qp_n)/(p_n, Qp_n).

Then {y_n} converges to y_0 = Q^{-1}b.
Proof.  We must first show that this is a method of conjugate
directions.  Assume, by induction, that p_1, ..., p_n are mutually
Q-orthogonal.  From (4) we have, for k ≤ n,

    (p_{n+1}, Qp_k) = (r_{n+1}, Qp_k) + b_n(p_n, Qp_k).      (7), (8)

If k = n, then (7) and (8) cancel by the choice of b_n.  If k < n, then
(8) is zero, and (r_{n+1}, Qp_k) can be written as (Qp_k, r_{n+1}).
But

    Qp_k ∈ [p_1, p_2, ..., p_{k+1}],

and for any conjugate direction method, (r_{n+1}, p_i) = 0, i ≤ n.
Therefore this method is a conjugate direction method.

Next, we must show that {y_n} converges to y_0.  So we define the
functional E by

    E(y) = (b - Qy, Q^{-1}(b - Qy)).

By direct computation, using (1) through (6), we have

    (9)  E(y_n) - E(y_{n+1}) = (r_n, p_n)^2/(p_n, Qp_n).

From (4) and the fact that (r_n, p_{n-1}) = 0,

    (r_n, p_n) = (r_n, r_n),

and from (4) and since p_n and p_{n-1} are Q-orthogonal,

    (10) (r_n, Qr_n) = (p_n, Qp_n) + b_{n-1}^2(p_{n-1}, Qp_{n-1})
                     ≥ (p_n, Qp_n).

Since m(y, y) ≤ (y, Qy) ≤ M(y, y), it follows that

    (11) (r_n, Qr_n) ≤ M(r_n, r_n)  and  E(y_n) ≤ (1/m)(r_n, r_n).

Combining (9), (10), and (11), we see that

    E(y_n) - E(y_{n+1}) ≥ (r_n, r_n)^2/(r_n, Qr_n)
                        ≥ (r_n, r_n)/M
                        ≥ (m/M) E(y_n).

Therefore E(y_n) → 0, which implies that r_n → 0 and hence y_n → y_0.
The proof is complete.
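The recursion (1)-(6) can be sketched in finite dimensions, as an editorial illustration (the matrix and right-hand side below are made up); for an n x n positive-definite Q the method terminates in at most n steps, up to rounding.

```python
# Conjugate gradient for Qy = b, following the recursion of Theorem 5.3.4.
import numpy as np

def conjugate_gradient(Q, b, y, n_steps):
    r = b - Q @ y
    p = r.copy()                          # (1) p_1 = r_1 = b - Qy_1
    for _ in range(n_steps):
        a = (r @ p) / (p @ Q @ p)         # (5) a_n = (r_n,p_n)/(p_n,Qp_n)
        y = y + a * p                     # (3) y_{n+1} = y_n + a_n p_n
        r = b - Q @ y                     # (2) new residual
        if np.linalg.norm(r) < 1e-14:
            break                         # converged; avoid a zero direction
        p = r - (r @ Q @ p) / (p @ Q @ p) * p   # (4) with b_n from (6)
    return y

Q = np.array([[6.0, 2.0, 1.0],
              [2.0, 5.0, 2.0],
              [1.0, 2.0, 4.0]])           # self-adjoint, positive definite
b = np.array([1.0, 0.0, -1.0])
y = conjugate_gradient(Q, b, np.zeros(3), 3)
```

After three steps the residual b - Qy is zero to machine precision.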
BIBLIOGRAPHY

Kantorovich, L. V., and G. P. Akilov.  Functional Analysis in Normed
    Spaces.  Translated by D. E. Brown.  Edited by A. P. Robertson.
    New York: Macmillan Company, 1964.

Luenberger, David G.  Optimization by Vector Space Methods.  New York:
    John Wiley and Sons, Inc., 1969.

Saxena, S. C., and S. M. Shah.  Introduction to Real Variable Theory.
    Scranton: International Textbook Company, 1972.

Sherwood, G. E., and Angus Taylor.  Calculus.  3rd ed.  Englewood
    Cliffs: Prentice-Hall, Inc., 1954.

Taylor, Angus.  Introduction to Functional Analysis.  New York: John
    Wiley and Sons, Inc., 1958.