
NOT FOR QUOTATION WITHOUT THE PERMISSION OF THE AUTHOR

The Theory of Large Deviations for Random Processes and Strong Convergence of Stochastic Approximation Procedures

S.L. Leonov

October 1986 WP-86-57

Working Papers are interim reports on work of the International Institute for Applied Systems Analysis and have received only limited review. Views or opinions expressed herein do not necessarily represent those of the Institute or of its National Member Organizations.

INTERNATIONAL INSTITUTE FOR APPLIED SYSTEMS ANALYSIS A-2361 Laxenburg, Austria

Preface

This paper deals with the application of "large deviation" theory to the analysis of stochastic approximation procedures. The approach makes it possible to obtain new results on the asymptotic behaviour of stochastic procedures under very mild assumptions about the "noise". The paper contains a short but illuminating survey of these results together with some new findings of the author. For applications, the last section, which presents some new ideas in multiobjective optimization, seems to be of particular interest.

Prof. V. Fedorov Environmental Program

The Theory of Large Deviations for Random Processes and Strong Convergence of Stochastic Approximation Procedures

S.L. Leonov

1. INTRODUCTION

Stochastic recursive procedures, or stochastic approximation procedures (SAP), as they are often called, play a significant role in investigations in probability theory and applied mathematics. The popularity of SAP may be explained by their relative simplicity (from the computational point of view) and efficiency, since in a number of cases recursive procedures are asymptotically optimal.

Theoretical methods for studying SAP are in steady development. The "martingale" approach was the first fundamental approach to the analysis of recursive procedures. It is based on the application of the theory of martingales to construct a stochastic analogue of a Lyapunov function (for references see Nevel'son and Has'minskii, 1972).

During the last decade new results were obtained in the theory of random processes concerning the analysis of the probabilities of large deviations (Ventsel and Freidlin, 1983). Application of the theory of large deviations (TLD) made it possible to find new results on the asymptotic behaviour of SAP. The recent monograph by Korostelev (1984) is devoted to the analysis of necessary and sufficient conditions for almost sure (a.s.) convergence and of rates of convergence in terms of upper functions. Kushner (1983, 1984) applies TLD to investigate the asymptotics of the probabilities of escape time from the neighbourhood of stable points.

The present paper gives an account of the main ideas and results of the approach based on TLD. The range of problems is limited here to the study of almost sure convergence for discrete-time procedures, the main attention being given to the behaviour of trajectories in the neighbourhood of stable points.

The structure of the paper is as follows. In section 2 we introduce basic notation and assumptions. Sections 3 and 4 are devoted, respectively, to convergence conditions and rates of convergence for SAP with independent noise. In section 5 SAP with dependent noise are analysed. In section 6 TLD is applied, as an illustrative example, to the problem of minimization of an additive function.

2. BASIC NOTATIONS AND ASSUMPTIONS

The following recursive procedure will be studied:

$$X(t+1) = X(t) + a(t)\bigl[B(X(t)) + \varphi_t(X(t))\bigr] + \beta(t)\,\xi(t), \qquad (1)$$

$t = 1, 2, \dots$; $X(0) = x_0 \in R^d$, where $x_0$ is an arbitrary point in the Euclidean space $R^d$, $d \ge 1$; random vectors $\xi(t) = \xi(t, \omega)$ (the noise) are defined on the probability space $(\Omega, \mathcal{B}, P)$ and have values in $R^d$. The set of assumptions is the following.

A1. A point $x_*$ is an asymptotically stable equilibrium point of the dynamic system

$$\dot{x}_s = B(x_s) \qquad (2)$$

with the domain of attraction $D$, the condition

being held.

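A2. Vector field $B(x)\colon R^d \to R^d$ satisfies the Lipschitz condition in the domain $D \setminus V_\varepsilon$ for every $\varepsilon > 0$, where $V_\varepsilon$ is an $\varepsilon$-neighbourhood of $x_*$: for every $\varepsilon > 0$ there exists a positive constant $L_\varepsilon$ such that

$$\|B(x) - B(y)\| \le L_\varepsilon \|x - y\| \quad \text{for } x, y \in D \setminus V_\varepsilon.$$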

A3. With probability 1 (a.s.) trajectories $X(t) = X(t, \omega)$ return infinitely often into a compact set $K$, $x_* \in K \subset D$.

A4. The sequences $\{a(t)\}$, $\{\beta(t)\}$ are deterministic (non-random) sequences of positive scalars, and

Function $\varphi_t(x)$ is a deterministic vector function $R^d \to R^d$ such that

A5. $\mathsf{M}\xi(t) = 0$ for all $t > 0$, $\mathsf{M}\|\xi(t)\|^2 \le \mathrm{const} < \infty$; $\mathsf{M}$ is a symbol of averaging over measure $P$ (we assume for simplicity that vectors $\xi(t)$ do not depend on space coordinate $X(t)$).

Now let us make some comments on the assumptions introduced.

Assumption A2 makes it possible to consider functions $B(x)$ for which the Lipschitz condition is violated at $x_*$: e.g., in the one-dimensional case ($d = 1$)

Assumption A3 is an analogue of the "boundedness condition" of Ljung (1977), and it helps to avoid traditional assumptions on the behaviour of the field $B(x)$ as $\|x\| \to \infty$. It may be attained by introducing a projector onto $K$:

where $Z(t)$ denotes the right-hand side of (1).
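The projection device mentioned in the comment on Assumption A3 can be sketched as follows. This is only an illustration: it assumes that $K$ is a closed Euclidean ball centred at the origin (the text requires only a compact set), and the name `project_onto_ball` is hypothetical.

```python
import math


def project_onto_ball(z, radius):
    """Euclidean projection of the vector z onto {x: ||x|| <= radius}.

    Assumption for illustration: the compact set K is a ball centred at 0.
    """
    norm = math.sqrt(sum(c * c for c in z))
    if norm <= radius:
        # z already lies in K, so the step is left unchanged
        return list(z)
    # otherwise rescale z back to the boundary of the ball
    scale = radius / norm
    return [c * scale for c in z]


inside = project_onto_ball([0.1, 0.2], 1.0)
outside = project_onto_ball([3.0, 4.0], 1.0)
```

Applied to the right-hand side $Z(t)$ of (1) at every step, such a map keeps the trajectory in $K$, which is one way to make the return condition of A3 hold trivially.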

It must be mentioned that several important optimization and estimation procedures may be put in the framework (1):

- the Robbins-Monro (RM) procedure (1951), $a(t) = \beta(t)$: an algorithm for finding the root of the equation $B(x) = 0$ when the function $B(x)$ is measured with random error $\xi$ (in the RM case we shall denote the coefficients by $\gamma(t)$: $a(t) = \beta(t) = \gamma(t)$);

- the Kiefer-Wolfowitz (KW) procedure (1952), $a(t)/\beta(t) \to 0$ as $t \to \infty$: a stochastic analogue of a gradient method (in the KW case $\varphi_t(x)$ has the meaning of a deterministic bias of the gradient's estimate);

- procedures of parametric adaptation (Tsypkin, 1971).

Throughout the paper the convergence of procedure (1) means a.s. convergence of the trajectories $X(t)$ to a point $x_*$ as $t \to \infty$.
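A minimal simulation of the RM case of procedure (1) may help to fix the notation. The sketch below assumes $d = 1$, $\varphi_t \equiv 0$, $a(t) = \beta(t) = \gamma(t) = 1/t$, independent Gaussian noise, and a hypothetical field $B(x) = -(x - 2)$ whose stable point is $x_* = 2$; none of these choices comes from the paper.

```python
import random


def robbins_monro(field, x0, steps, gamma, noise_std=1.0, seed=0):
    """Iterate X(t+1) = X(t) + gamma(t) * (B(X(t)) + xi(t)) in one dimension."""
    rng = random.Random(seed)
    x = x0
    for t in range(1, steps + 1):
        xi = rng.gauss(0.0, noise_std)  # independent zero-mean noise xi(t)
        x = x + gamma(t) * (field(x) + xi)
    return x


x_star = 2.0  # root of the hypothetical field below
estimate = robbins_monro(
    field=lambda x: -(x - x_star),
    x0=10.0,
    steps=20000,
    gamma=lambda t: 1.0 / t,
)
```

With this linear field and $\gamma(t) = 1/t$ the iterate after $T$ steps equals $x_*$ plus the sample mean of the noise, so `estimate` is expected to lie close to 2.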

3. CONVERGENCE OF SAP WITH INDEPENDENT NOISE

When we study SAP on the basis of the TLD approach, the process $X(t)$ defined by (1) is considered as a small random perturbation of the stable dynamic system (2). Such an idea was proposed by Ljung (1974) and by Derevitskii and Fradkov (1974). The deviation of the $X(t)$-trajectories from the trajectories of the dynamic system (2) may be estimated by means of the Gronwall-Bellman lemma:

Let $s(t)$ be a new time-variable (continuous time analogue),

Let $x(t)$ denote the solution of (2) at time $s(t)$ with initial value $x(N_0) = X(N_0)$. If $\sum_{r=N_0}^{N_1} a(r) \le T$, $T > 0$, then there exists a positive constant $C = C(T)$ such that

(independence of $\xi(t)$ is here of no importance).
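The comparison between the recursion and the dynamic system (2) can be illustrated in the noise-free case. The sketch below is built on assumptions that are not taken from the text: the time change is the accumulated step size $s(t) = \sum_{r=N_0}^{t} a(r)$, the step is $a(t) = 1/t$, and the field is $B(x) = -x$, for which (2) has the explicit solution $x_0 e^{-s}$. The function name `max_gap_to_ode` is hypothetical.

```python
import math


def max_gap_to_ode(n0, horizon, x_start=1.0):
    """Largest gap between the noise-free recursion and the ODE solution.

    Runs X(t+1) = X(t) + a(t) * B(X(t)) with a(t) = 1/t and B(x) = -x,
    starting at t = n0, for as long as the accumulated step size stays
    within `horizon`, and compares with x_start * exp(-s).
    """
    x = x_start
    s = 0.0  # accumulated step size, playing the role of continuous time
    worst = 0.0
    t = n0
    while s + 1.0 / t <= horizon:
        a = 1.0 / t
        x = x + a * (-x)
        s += a
        worst = max(worst, abs(x - x_start * math.exp(-s)))
        t += 1
    return worst


gap_late_start = max_gap_to_ode(10000, 1.0)
gap_early_start = max_gap_to_ode(1000, 1.0)
```

Over a window of fixed length $T$ in the new time scale the gap shrinks as $N_0$ grows, which is the deterministic part of the estimate; the random part is what the large deviation bounds control.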

Investigation of the conditions for a.s. convergence is based on the analysis of the asymptotics of the probabilities of large deviations (PLD)

for arbitrary fixed values $\varepsilon > 0$ and $T > 0$, where

It follows from the conditions

$$\mathsf{M}\|\xi(t)\|^2 \le \mathrm{const} \quad \text{and} \quad \lim_{t \to \infty} \frac{\beta^2(t)}{a(t)} = 0$$

that the variance of the random vectors $Y_{N_0}(t)$ tends to zero as $N_0 \to \infty$, this fact explaining the origin of the term "large deviations".

By means of Levy's inequality the estimation of PLD in (4) may be reduced to the estimation of the probability of deviation at the final time of the interval.

Levy's inequality (see Gihman and Skorohod, 1977, chapter 6).

Let $v_1, v_2, \dots, v_n$ be independent random variables with zero mean, $\rho(k) = \sum_{i=1}^{k} v_i$ and [...]. Then

$$P\Bigl\{\max_{1 \le k \le n} |\rho(k)| > c\Bigr\} \le 2\,P\Bigl\{|\rho(n)| > \frac{c}{2}\Bigr\}. \qquad (5)$$


If we assume now that

it is easy enough to derive the convergence of procedure (1). Indeed, it follows from (3) and the Borel-Cantelli lemma that for all sufficiently large $t$ the trajectories $X(t)$ will a.s. be close to the trajectories $x(t)$ of the dynamic system (2), and $x(t)$ converges to $x_*$ owing to the stability assumptions.

It is well known (see, for instance, Nevel'son and Has'minskii, 1972) that under the condition

$$\mathsf{M}\|\xi(t)\|^2 \le \mathrm{const}$$

the sufficient condition for the convergence has the form

If additional information on the distribution of $\|\xi(t)\|$ is known, then using exact estimates for the probabilities

it is possible to weaken condition (6) and even to obtain the necessary and sufficient convergence conditions for some cases.

Godovančuk and Korostelev (1983) considered the case of vectors $\xi(t)$ having power tails of distributions:

and showed that the necessary and sufficient convergence condition has the form

Necessity of (9) follows from the left-hand inequality in (8) and the Borel-Cantelli lemma.

The right-hand inequality in (8) is used to yield sufficiency; this inequality follows immediately from Chebyshev's inequality if the noise $\xi(t)$ has a finite moment of order $\nu$. With application of the results by Hanson and Wright (1969), the probability in (7) is shown to have the order $O\bigl(\sum_{t=N_0}^{N_1} \beta^{\nu}(t)\bigr)$.

Now let vectors $\xi(t)$ have a finite exponential moment, i.e., Cramer's condition holds:

$$\mathsf{M}\exp\{(z, \xi(t))\} < \infty \quad \text{for all } t > 0,\ z \in R^d \qquad (10)$$

(the Gaussian noise or noise bounded with probability 1 give appropriate examples). For this case Korostelev (1979) proved that the necessary and sufficient condition is the following:

$$\sum_t a(t)\exp\{-\lambda a(t)/\beta^2(t)\} < \infty \quad \text{for all } \lambda > 0. \qquad (11)$$

For the RM procedure it has the form

$$\sum_t \exp\{-\lambda/\gamma(t)\} < \infty \quad \text{for all } \lambda > 0 \qquad (11')$$

(e.g., (11') holds with $\gamma(t) = t^{-c}$, $0 < c \le 1$); here exponential estimates for PLD are used (Ventsel and Freidlin, 1970). The sufficiency of (11') for the Gaussian noise was obtained earlier by Ljung (1974).
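Condition (11') can be explored numerically through partial sums. The sketch below compares the power step $\gamma(t) = t^{-c}$ named in the text with a logarithmic step $\gamma(t) = \lambda_0/\ln t$; the latter is a hypothetical choice, used here only because its terms equal $t^{-\lambda/\lambda_0}$, so the series converges for $\lambda > \lambda_0$ and diverges for $\lambda < \lambda_0$. The function names and the starting index $t = 2$ (which avoids $\ln 1 = 0$) are assumptions of the sketch.

```python
import math


def partial_sum(gamma, lam, n_terms):
    """Partial sum of exp(-lam / gamma(t)) over t = 2, ..., n_terms."""
    return sum(math.exp(-lam / gamma(t)) for t in range(2, n_terms + 1))


def power_step(t, c=0.5):
    """gamma(t) = t**(-c), the example given for (11')."""
    return t ** (-c)


def log_step(t, lam0=1.0):
    """Hypothetical gamma(t) = lam0 / ln(t); terms of the series are t**(-lam/lam0)."""
    return lam0 / math.log(t)


# Power step: the tail of the series is negligible, so (11') holds.
tail_power = partial_sum(power_step, 1.0, 20000) - partial_sum(power_step, 1.0, 10000)

# Logarithmic step with lam0 = 1: bounded partial sums for lam = 2,
# growing partial sums for lam = 0.5, so (11') fails for this step.
bounded = partial_sum(log_step, 2.0, 100000)
growing = partial_sum(log_step, 0.5, 100000)
```

A finite partial sum cannot prove convergence or divergence; the numbers only make the contrast between the two regimes visible.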

4. RATES OF CONVERGENCE FOR SAP WITH INDEPENDENT NOISE

From the standpoint of the general theory of random processes, convergent (in some probability sense) recursive procedures have a specific nature, since their limiting distribution is degenerate. The analysis of rates of convergence is intended to find a deterministic function $k(t)$, $k(t) \to \infty$ as $t \to \infty$, such that the normalized process $Y(t)$,

has a non-degenerate limiting distribution.

The first authors to study rates of convergence for SAP were Chung (1954) and Sacks (1958), who analysed asymptotic normality of the normalized process and rates of convergence of moments. Methods based on the martingale approach received further development in the papers devoted to a.s. rates of convergence for the one-dimensional case ($d = 1$). To formulate the results of these papers we recall the definition of upper functions.

Definition. Let $X(t)$ a.s. converge to $x_*$. A deterministic sequence $h(t)$ is called an upper function for procedure (1) if

$$\limsup_{t \to \infty} \frac{|X(t) - x_*|}{h(t)} = 1 \quad \text{a.s.}$$

Thus, $h(t)$ is a deterministic sequence which majorizes (for all sufficiently large $t$) the random trajectories $X(t) - x_*$, this majorant being precise: it cannot be reduced.

The study of upper functions is of both theoretical and practical interest: their application makes it possible to modify (1) so that $X(t)$ converges to $x_*$ "from one side" (see, for instance, Anbar, 1977). Such modifications are useful for problems with constraints on the domain of definition of the function $B(x)$.

In papers by Gaposhkin and Krasulina (1974), Heyde (1974), Kersting (1977) it is shown that if the function $B(x)$ is linear in the neighbourhood of $x_*$:

and $a(t) = \beta(t) = a/t$, $\mathsf{M}\xi^2(t) = \sigma^2$, then the upper function for procedure (1) is defined by the law of the iterated logarithm:

The strongest result is obtained by Kersting: only the existence of variance is required for the noise $\xi(t)$. But linearity of the function $B(x)$ is of primary importance for this method, since it is based on the explicit solution of the linearized equation.

Application of the technique of TLD (Ventsel and Freidlin, 1983) makes it possible to obtain new results concerning rates of a.s. convergence for SAP. These results follow from the "sweeping theorem" (Korostelev, 1979): if condition (10) holds and the sum in (11) converges for $\lambda > \lambda_0$ and diverges for $\lambda < \lambda_0$ (e.g., $a(t) = \beta(t) = \lambda_0/\ln t$), then under some regularity of the coefficients $a(t)$, $\beta(t)$ the trajectories $X(t)$ a.s. have a set of limit points

where

$\varphi(t)$ are absolutely continuous functions $R^d \to R^d$,

$H(u)$ is a Legendre transform of a function $G(z) = \ln \mathsf{M}\exp\{(z, \xi(t))\}$:
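The Legendre transform $H(u) = \sup_z \{(z, u) - G(z)\}$ can be checked numerically in a case where it is known in closed form. The sketch assumes one-dimensional Gaussian noise with variance $\sigma^2$, for which $G(z) = \sigma^2 z^2/2$ and $H(u) = u^2/(2\sigma^2)$; the grid search and the function names are illustrative choices, not part of the paper.

```python
def legendre_transform(g, u, z_grid):
    """Approximate H(u) = sup_z (z * u - g(z)) by a maximum over a grid."""
    return max(z * u - g(z) for z in z_grid)


sigma = 2.0


def cumulant_gaussian(z):
    """G(z) = ln M exp(z * xi) for xi ~ N(0, sigma**2)."""
    return 0.5 * sigma ** 2 * z ** 2


# grid on [-5, 5] with step 0.001; the maximizer u / sigma**2 lies inside it
z_grid = [i * 0.001 for i in range(-5000, 5001)]
h_numeric = legendre_transform(cumulant_gaussian, 1.0, z_grid)
h_exact = 1.0 ** 2 / (2.0 * sigma ** 2)
```

Only the second moment of the noise enters $H$ in this Gaussian example, which matches the remark made later in the section that the final results depend on the covariance alone.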

Applying the "sweeping theorem", Korostelev (1983) extended (12) to (a) the multidimensional case and (b) non-linear functions $B(x)$ in the one-dimensional case:

(a) if (10) holds, $a(t) = t^{-1}$, $\beta(t) = t^{-b}$, $1/2 < b \le 1$; the matrix $\mathrm{COV} = \mathsf{M}[\xi(t)\xi^T(t)]$ is non-singular and $B(x)$ is linear in the neighbourhood of $x_*$:

where $B_0 + I/2$ is a stable matrix ($I$ is the identity $(d \times d)$-matrix), then the normalized process $Y(t)$,

$$Y(t) = (X(t) - x_*)\sqrt{t^{2b-1}/\ln\ln t},$$

a.s. has the set $G$ of limit points, where

In particular, it follows from (13) that

(it is well known, see, for instance, Nevel'son and Has'minskii, 1972, that finite-dimensional distributions of the vectors $\sqrt{t^{2b-1}}\,(X(t) - x_*)$ converge, in the corresponding time-scale, to the distributions of the stationary Gaussian process with covariance matrix $S$. The above result shows that the set of limit points for the process $Y(t)$ coincides with the ellipsoid of equal probabilities for the limiting Gaussian distribution);

(b) in the one-dimensional case, if (10) holds, upper functions are found for non-linear functions $B(x)$:

E.g., if $a(t) = \beta(t) = t^{-c}$, $0 < c \le 1$; $\mathsf{M}\xi^2(t) = \sigma^2$, then upper functions are defined by the formula

(for $\rho \ne 1$, unlike the linear case, the solution of procedure (1) cannot be evaluated in explicit form).

It must be noted that Cramer's condition of the existence of the exponential moment (10) is rather restrictive; besides that, only the second moments of the noise $\xi(t)$ enter the final results, namely the covariance matrix $\mathrm{COV}$ in (13) and the variance $\sigma^2$ in (14). It turns out that condition (10) can be weakened.

Korostelev and Leonov (1983) studied the problem of existence of upper functions for SAP with the noise $\xi(t)$ having power tails of distributions (8). In the multidimensional case (13) remains valid, and in the one-dimensional case the form of upper functions coincides with (14) if condition

holds.

The fulfillment of condition (15) turns out to be of primary importance for the existence of upper functions. Let us consider, for example, a linear one-dimensional procedure

where variables $\xi(t)$ have power tails of distributions (8). Let condition (15), which here has the form $\nu c > 2$, be violated and instead let the following condition hold:

(the left-hand inequality provides that $X(t)$ a.s. converges to the point $x_* = 0$; cf. (9)).

The process $t^{c/2}X(t)$ is asymptotically normal, but under condition (17) there does not exist an upper function for procedure (16) in the class of non-increasing sequences $h(t)$ (see Korostelev and Leonov, 1983):

if $\sum_t [h(t)\,t^{c}]^{-\nu} < \infty$, then $\lim_{t \to \infty} h^{-1}(t)X(t) = 0$ a.s.;

if $\sum_t [h(t)\,t^{c}]^{-\nu} = \infty$, then $\limsup_{t \to \infty} |h^{-1}(t)X(t)| = +\infty$ a.s.

This example is apparently of independent interest for the theory of random processes.

5. SAP WITH DEPENDENT NOISE

While investigating SAP with independent noise the main attention, as shown above, is given to the analysis of the interdependence between the conditions on the coefficients $a(t)$, $\beta(t)$ and on the distribution of the random vectors $\xi(t)$. In the case of dependent noise it is necessary to introduce conditions for the weakening of dependence between elements of the random sequence $\{\xi(t)\}$ (so that the "past" and the "future" are asymptotically independent). Traditionally the dependence of the $\{\xi(t)\}$-sequence is described by different mixing conditions (see Rosenblatt, 1974, or Ibragimov and Linnik, 1965).

The first publications on SAP with dependent noise appeared at the beginning of the 1960s (Driml and Nedoma, 1960; Sakrison, 1964). During the last decade the number of papers devoted to this problem increased significantly, the methods for the analysis of SAP with independent noise being developed and extended to the dependent case. The approaches to the study of SAP with dependent noise are diverse. They are based on the theory of quasimartingales (Borodin, 1979), the averaging principle (Geman, 1979; Krasulina, 1975; Kul'chitskii, 1978), the analysis of conditional means (Poznyak and Chikin, 1984); publications such as Ljung (1977, 1978), Kushner (1977, 1978), Farden (1981), Solo (1982) may also be mentioned. A bibliography may be found in Korostelev and Leonov (1984), Poznyak and Chikin (1984).

Kushner (1983, 1984) applies TLD to investigate the asymptotics of the probabilities of escape time from the neighbourhood of stable points of SAP with dependent noise.

Application of the TLD approach to the convergence analysis of SAP with dependent noise makes it possible to obtain results which are similar to the corresponding results in the independent case or even coincide with them. In this section we confine ourselves to the formulation of the results for RM-type procedures ($a(t) = \beta(t) = \gamma(t)$).

Leonov (1982) (see also Korostelev, 1984, chapter 6) studied the necessary and sufficient convergence conditions for procedure (1) where vectors $\xi(t)$ have power tails of distribution (8) and satisfy the strong mixing condition:

$F_{\ge s}$, $F_{\le s}$ are sigma-algebras generated by the sequences $\{\xi(s), \xi(s+1), \dots\}$ and $\{\xi(1), \dots, \xi(s)\}$ respectively, $\rho(t)$ is a coefficient of strong mixing (see Ibragimov and Linnik, 1965). If condition

holds, then it is sufficient for the convergence that there exists an arbitrarily small positive $\delta$ such that

If the coefficient $\rho(t)$ decreases exponentially:

$$\rho(t) \le C e^{-\lambda t} \quad \text{for some } C > 0,\ \lambda > 0, \qquad (20)$$

then the necessary condition is the following:

$$\sum_t \gamma^{\nu+\delta}(t) < \infty \quad \text{for all } \delta > 0$$

(similarly to the independent case, the right-hand inequality in (8) is used to obtain sufficiency, the left-hand one to obtain necessity).

In proving the sufficiency of condition (19) the key role is played by the analogues of Levy's inequality (5) and of the Hanson-Wright estimate for dependent noise. Here is the formulation of these results.

Lemma. Let condition (18) hold. Then for arbitrary $\varepsilon > 0$, $T > 0$, $\delta > 0$ there exist constants $C_1 = C_1(\varepsilon, T) > 0$, $C_2 = C_2(\varepsilon, T, \delta) > 0$ and a natural number $r = r(\varepsilon, T)$ such that

inequalities (22), (23) being valid uniformly for such $N_0$, $N_1$ that

Here is an example. Let $\eta(t)$ be i.i.d. random variables with density of distribution $p(z) = C_0\,(1 + |z|^{1+\nu})^{-1}$, $C_0$ being a normalizing constant; then the sequence $\{\xi(t)\}$, defined by

($u(\cdot)$ is a saturation function), satisfies (8), (20) and thus also (18): it can be easily shown that the sequence $\{\xi(t)\}$ satisfies Doeblin's condition (see Doob, 1953).

Another example of noise satisfying the strong mixing condition is a sequence of $m$-dependent random vectors: $\rho(t) = 0$, $t > m$. By means of inequality (22) the necessary and sufficient convergence condition is obtained for this case, and it has the form

which coincides with the independent case (cf. (9)).

The strong mixing condition is rather restrictive and eliminates from consideration some interesting examples. Such an example is provided by a sequence $\{\xi(t)\}$ having an "innovations representation":

where $\eta(k)$ are independent random vectors with zero mean. SAP with noise having the representation (25) were studied by Ljung (1978). Ljung does not use exact estimates for PLD but rough estimates like Chebyshev's inequality, and in our notation his result may be formulated in the following manner: if for some even number $\nu$, $\nu \ge 4$,

$\gamma(t)$ is a non-increasing sequence such that

then condition

is sufficient for the convergence (condition (27) is, certainly, less restrictive than the traditional condition (6)).

With the help of the analysis of exact asymptotics for PLD Korostelev and Leonov (1984) strengthened Ljung's result: if in (25) vectors $\eta(k)$ have power tails of distributions (8), then the necessary and sufficient convergence condition has the form (24) and is the same as for SAP with independent noise.

An important example of noise satisfying (25) is a stationary autoregressive scheme ($d = 1$):

where $q$ is an arbitrary natural number, $a_i$ are real constants such that all the roots $z(i)$ of the characteristic equation

satisfy the inequality $|z(i)| < 1$, $i = \overline{1,q}$.

The case of dependent noise forming a Markov chain on a compact set and thus satisfying condition (10) was studied by Korostelev (1984, chapter 6). It is shown that under regularity conditions of type (26) the necessary and sufficient convergence condition coincides with the one for the independent case and has the form (11). The "sweeping theorem" is also generalized to the case of dependent noise having a finite exponential moment (10).

With the application of limit theorems by Ventsel (1979) and Freidlin (1978) some results are obtained concerning rates of convergence for SAP with dependent noise having power tails of distributions and forming a stationary autoregressive scheme (28). If condition (15) holds, random variables $\eta(t)$ satisfy (8) and $\mathsf{M}\eta^2(t) = \sigma_\eta^2$, then upper functions are defined by formula (14), where

(see Leonov, 1984). The result, entirely analogous to the independent case, is also true for procedure (16) with autoregressive noise.
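The root condition for the autoregressive scheme (28) can be checked directly in the second-order case. The sketch rests on an assumed convention that is not confirmed by the text: the scheme is written as $\xi(t) = a_1\xi(t-1) + a_2\xi(t-2) + \eta(t)$ and its characteristic equation as $z^2 - a_1 z - a_2 = 0$, so that stationarity corresponds to both roots lying inside the unit circle. The function names are hypothetical.

```python
import cmath


def ar2_roots(a1, a2):
    """Roots of z**2 - a1*z - a2 = 0 (assumed form of the characteristic equation)."""
    disc = cmath.sqrt(a1 * a1 + 4.0 * a2)
    return ((a1 + disc) / 2.0, (a1 - disc) / 2.0)


def is_stationary_ar2(a1, a2):
    """True if every characteristic root satisfies |z| < 1."""
    return all(abs(z) < 1.0 for z in ar2_roots(a1, a2))


real_roots_ok = is_stationary_ar2(0.5, 0.3)       # roots about 0.85 and -0.35
root_outside = is_stationary_ar2(0.5, 0.6)        # one root about 1.06
complex_roots_ok = is_stationary_ar2(1.0, -0.5)   # roots (1 +/- i)/2, modulus about 0.71
```

For general order $q$ the same test applies to the roots of the degree-$q$ polynomial; a numerical root finder would then replace the quadratic formula.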

6. A STOCHASTIC RECURSIVE PROCEDURE FOR MINIMIZATION OF AN ADDITIVE FUNCTION

In this section the theory of large deviations is applied, as an illustrative example, to the analysis of a recursive procedure for minimization of an additive function:

The procedure under consideration has the following form:

where $U_R = \{x : \|x\| \le R\}$, $\pi_{U_R}$ is a projector.

Algorithms of type (29) may be used when the application of traditional optimization methods, such as gradient methods, meets with certain difficulties, e.g., when $n$ is large or the calculation of the functions $f_i(x)$ is laborious. In such situations it is possible to use the functions $f_i(x)$ one by one for finding the minimum

$$x_*^{(p)} = \arg\min_{R^d} F_p(x),$$

the indexes $\delta(t)$ being selected in a deterministic ($\delta(t) = t\ [\mathrm{mod}\ n]$, for example) or a random manner.

Here we shall study procedure (29) with random selection of indexes: $\{\delta(t)\}$ is a sequence of i.i.d. random variables such that

$$P\{\delta(t) = i\} = p_i, \quad i = \overline{1,n}.$$
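A one-dimensional simulation may illustrate a procedure of this type with random index selection. The sketch assumes the update $X(t+1) = \pi_{U_R}\bigl(X(t) - \gamma(t)\nabla f_{\delta(t)}(X(t))\bigr)$, hypothetical quadratic components $f_i(x) = (x - c_i)^2/2$, and $\gamma(t) = t^{-a}$ with $a = 0.75$; for these components the minimizer of $F_p(x) = \sum_i p_i f_i(x)$ is $\sum_i p_i c_i$. The function name and all numerical values are illustrative.

```python
import random


def minimize_additive(centers, probs, radius, steps, a=0.75, x0=0.0, seed=0):
    """Projected recursion using one randomly selected component per step."""
    rng = random.Random(seed)
    indexes = list(range(len(centers)))
    x = x0
    for t in range(1, steps + 1):
        i = rng.choices(indexes, weights=probs)[0]  # random index delta(t)
        grad = x - centers[i]                       # gradient of (x - c_i)**2 / 2
        z = x - t ** (-a) * grad                    # unprojected step, gamma(t) = t**(-a)
        x = max(-radius, min(radius, z))            # projection onto [-R, R]
    return x


centers = [-1.0, 0.0, 2.0]
probs = [0.2, 0.3, 0.5]
estimate = minimize_additive(centers, probs, radius=5.0, steps=20000)
target = sum(p * c for p, c in zip(probs, centers))  # minimizer of F_p
```

Only one gradient $\nabla f_i$ is evaluated per step, which is the computational motive given in the text for using the components one by one.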

The following assumptions are also supposed to be fulfilled.

B1. $\{\gamma(t)\}$ is a deterministic sequence of positive numbers; it satisfies the traditional conditions

$$\lim_{t \to \infty} \gamma(t) = 0, \qquad \sum_t \gamma(t) = \infty,$$

and a "regularity condition": for every $T > 0$

$$\lim_{t_0 \to \infty} \Bigl[\max_{t_0 \le t \le t_1} \gamma(t)\Bigr] \Big/ \Bigl[\min_{t_0 \le t \le t_1} \gamma(t)\Bigr] = 1, \quad \text{if } \sum_{t=t_0}^{t_1} \gamma(t) \le T \qquad (30)$$

(e.g., (30) is satisfied with $\gamma(t) = t^{-a}$, $0 < a < 1$).

B2. $f_i(x)$, $i = \overline{1,n}$, are convex continuously differentiable functions $R^d \to R^1$; $F_p(x)$ has a unique minimum for an arbitrary set $\{p_i\}$ such that

Moreover, there exist integers $j$ and $m$ such that

$$x_*^{j} \ne x_*^{m}, \qquad x_*^{i} = \arg\min_{R^d} f_i(x).$$

B3. The gradients $\{\nabla f_i(x)\}$ satisfy the Lipschitz condition in $U_R$, and $x_*^{i} \in U_R$, $i = \overline{1,n}$; $R$ is a positive constant such that

(existence of such $R$ follows from B2).
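The regularity condition (30) of B1 can be examined numerically for $\gamma(t) = t^{-a}$. The sketch below takes $t_1$ to be the largest index for which the accumulated step size from $t_0$ stays within $T$; since $\gamma$ is decreasing, the maximum over the window is $\gamma(t_0)$ and the minimum is $\gamma(t_1)$. The function name is hypothetical, and the boundary case $a = 1$ is included only for contrast.

```python
def step_ratio(a, t0, horizon):
    """Ratio max/min of gamma(t) = t**(-a) over the window starting at t0.

    The window is extended while the accumulated sum of gamma stays within
    `horizon`; the inputs are assumed to allow at least one term.
    """
    total = 0.0
    t = t0
    while total + t ** (-a) <= horizon:
        total += t ** (-a)
        t += 1
    t1 = t - 1  # last index inside the window
    return (t0 ** (-a)) / (t1 ** (-a))


ratio_small_t0 = step_ratio(0.5, 10 ** 2, 1.0)
ratio_large_t0 = step_ratio(0.5, 10 ** 6, 1.0)
ratio_harmonic = step_ratio(1.0, 10 ** 4, 1.0)
```

For $a = 0.5$ the ratio approaches 1 as $t_0$ grows, while for $a = 1$ it stays near $e^T$, which is consistent with the strict inequality $a < 1$ in the example given for (30).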

Procedure (29) can be put in the framework (1):

where $B(x) = -\sum_i p_i \nabla f_i(x)$; $\xi_x(t) = -\nabla f_{\delta(t)}(x) - B(x)$, $\mathsf{M}\xi_x(t) = 0$ (here the noise $\xi_x(t)$ depends on the space coordinate $x$). Thus, the main results of the analysis of SAP discussed in sections 3, 4 remain valid. Since the vectors $\xi_x(t)$ are bounded with probability 1, i.e., condition (10) holds, the necessary and sufficient condition for the convergence to the point $x_*^{(p)}$ has the form (11). The analogue of (13) also remains true, the matrix $\mathrm{COV}$ having the representation

If the sum in (11') converges for $\lambda > \lambda_0$ and diverges for $\lambda < \lambda_0$, then there exists a set $W_p(\lambda_0)$ of a.s. limit points for the trajectories $X(t)$:

$$W_p(\lambda_0) = \{x \in R^d : V_p(x) \le \lambda_0\},$$

where

$$V_p(x) = \inf_{T}\bigl\{ I_{0,T}(x_*^{(p)}, \varphi),\ \varphi(T) = x \bigr\}, \quad T > 0,$$

$\varphi(t)$ are absolutely continuous functions,

$H(x, u)$ is a Legendre transform of a function $G_x(z)$:

Now let us consider procedure (29) for which the sum in (11') diverges for all $\lambda > 0$: $\lambda_0 = \infty$, e.g., $\gamma(t) = 1/[\ldots]$. In this case the set of limit points has the representation

$W_p = W_p(\infty) = \{x$ : there exist an absolutely continuous function $\varphi(t)$ and a positive $T$ such that $\varphi(0) = x_*^{(p)}$, $\varphi(T) = x$ and $I_{0,T}(x_*^{(p)}, \varphi) < \infty\}$.

The following theorem establishes the relation between $W_p$ and $Q_f$, where $Q_f$ is the set of Pareto-optimal solutions of the multicriterial minimization problem

(it must be pointed out that minimization of the weighted sum $\sum_i p_i f_i(x)$ is one of the ways of scalarizing the criteria).
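The link between the weighted sum and Pareto optimality can be illustrated on a small example. The sketch assumes hypothetical quadratic criteria $f_i(x) = \|x - c_i\|^2/2$ in the plane, for which the minimizer of $\sum_i p_i f_i(x)$ is the convex combination $\sum_i p_i c_i$; it then verifies on a finite grid that no grid point dominates this minimizer. The tolerance in `dominates` guards against rounding effects and, like all names here, is an assumption of the sketch.

```python
def objectives(x, centers):
    """Values f_i(x) = ||x - c_i||**2 / 2 for every criterion."""
    return [0.5 * sum((xk - ck) ** 2 for xk, ck in zip(x, c)) for c in centers]


def weighted_minimizer(weights, centers):
    """Minimizer of sum_i p_i f_i(x) for the quadratic criteria above."""
    dim = len(centers[0])
    return tuple(sum(w * c[k] for w, c in zip(weights, centers)) for k in range(dim))


def dominates(fa, fb, tol=1e-9):
    """True if fa is no worse than fb in every criterion and better in one."""
    no_worse = all(a <= b + tol for a, b in zip(fa, fb))
    strictly_better = any(a < b - tol for a, b in zip(fa, fb))
    return no_worse and strictly_better


centers = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
weights = (0.2, 0.3, 0.5)
x_p = weighted_minimizer(weights, centers)
f_p = objectives(x_p, centers)

# search a grid on [-1, 3] x [-1, 3] for a point that dominates x_p
grid = [(i / 10.0, j / 10.0) for i in range(-10, 31) for j in range(-10, 31)]
dominated = any(dominates(objectives(y, centers), f_p) for y in grid)
```

A grid search cannot prove Pareto optimality; here it only confirms, for one choice of weights, the standard fact that a minimizer of a weighted sum with positive weights is not dominated.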

Let $\overline{W}_p$ be the closure of $W_p$.

Theorem. $\overline{W}_p$ does not depend on $\{p_i\}$: $\overline{W}_p = W$, where $W$ is a bounded closed set such that

Moreover, let $E$ be the domain of accessibility for the dynamic system

$v_i(t)$ are absolutely continuous non-negative functions,

$$\sum_i v_i(t) = 1.$$

Then $\overline{E} = W$, where $\overline{E}$ is the closure of $E$ ($E = \{x$ : there exist a function $\varphi(t)$ and $t_1$ such that $\varphi(t)$ satisfies (31) and $\varphi(t_1) = x\}$).

The proof of the theorem is based on the TLD technique (Gartner, 1977; Ventsel and Freidlin, 1983) and was obtained by the present author (Leonov, 1985). This publication also contains examples of the sets $W$, $Q_f$ in the Euclidean space $R^2$. For some cases these sets coincide: $W = Q_f$; for others strict imbedding takes place: $W \supset Q_f$.

Anbar, D. (1977): A modified Robbins-Monro procedure approximating zero of a regression function from below. - Ann. Statist., v.5, No. 1 , pp. 229-234.

Borodin. A.N. (1979): A stochastic approximation procedure in t h e case of weakly dependent observations. - Theor. Prob. Appl., v. 24, No. 1.

Chung, K.L. (1954): On a stochastic approximation method. - Ann. Math. Stat., v. 25, NO. 3, pp. 463-483.

Derevitskii, D.P. and A.L. Fradkov (1974): Two models f o r analyzing the dynamics of adaptation algorithms. - Automat. Telemech., No. 1 (Automat. and Remote Contr., v. 35).

Driml, M. and J. Nedoma (1960): Stochastic approximation f o r continuous random processes. - In: Trans. 2nd Prague Conference on Inform. Theory, Stat . Deci- sion Functions and Random Proc., Prague, pp. 145-158.

Doob, J.L. (1953): Stochastic Processes. - John Wiley & Sons, N.Y.

https://www.researchgate.net/publication/265314959_Two_models_for_analyzing_the_dynamics_of_adaptation_algorithms?el=1_x_8&enrichId=rgreq-f0f54176-864f-43db-8882-5b798f80f7a1&enrichSource=Y292ZXJQYWdlOzI3MDU3MTUzMztBUzoxODM1NTA4NDIyNTMzMTJAMTQyMDc3MzMzOTI3OA==








https://www.researchgate.net/publication/240147095_On_large_deviations_for_invariant_measure?el=1_x_8&enrichId=rgreq-f0f54176-864f-43db-8882-5b798f80f7a1&enrichSource=Y292ZXJQYWdlOzI3MDU3MTUzMztBUzoxODM1NTA4NDIyNTMzMTJAMTQyMDc3MzMzOTI3OA==

https://www.researchgate.net/publication/259527289_Stochastic_algorithm_for_minimization_of_an_additive_function?el=1_x_8&enrichId=rgreq-f0f54176-864f-43db-8882-5b798f80f7a1&enrichSource=Y292ZXJQYWdlOzI3MDU3MTUzMztBUzoxODM1NTA4NDIyNTMzMTJAMTQyMDc3MzMzOTI3OA==

Farden, D.C. (1981): Stochastic approximation with correlated data. - IEEE Trans. on Information Theory, v. IT-21, No. 1 , pp. 105-113.

Freidlin, M.I. (1978): The averaging principle and theorems on large deviations. - Russian Math. Surveys, v. 33, July-Dec.

Gaposhkin, V.F. and T.P. Krasulina (1974): On the law of i terated logarithm f o r stochastic approximation procedures. - Theor. Prob. Appl., v. 19. No. 4.

Gartner, J. (1977): On large deviations from the invariant measure. - Theor. Prob. Appl., v. 22, No. 1.

Geman, S. (1979): A method of averaging for random differential equations with applications to stability and stochastic approximation. - In: Approximate Solution of Random Evolution Equations, N.Y. - Oxford: North Holland, pp. 49-86.

Gihman, I.I. and A.V. Skorohod (1977): Introduction to the Theory of Random Processes. - Moscow: Izd. Nauka (in Russian).

Godovančuk, V.V. and A.P. Korostelev (1983): Conditions for the local convergence of recursive stochastic procedures. - Theor. Prob. Appl., v. 28, No. 1.

Hanson, D.L. and F.T. Wright (1969): Some more results on rates of convergence in the law of large numbers for weighted sums of independent random variables. - Trans. Amer. Math. Soc., v. 141, pp. 443-464.

Heyde, C.C. (1974): On martingale limit theory and strong convergence results for stochastic approximation procedures. - Stoch. Proc. Appl., v. 2, No. 4, pp. 359-370.

Ibragimov, I.A. and Yu.V. Linnik (1965): Independent and Stationary Random Variables. - Moscow: Izd. Nauka (in Russian).

Kersting, G. (1977): Almost sure approximation of the Robbins-Monro process by sums of independent random variables. - Ann. Prob., v. 5, No. 6, pp. 954-965.

Kiefer, J. and J. Wolfowitz (1952): Stochastic estimation of the maximum of a regression function. - Ann. Math. Stat., v. 23, No. 3, pp. 462-466.

Korostelev, A.P. (1979): Damping perturbations of dynamic systems and convergence conditions for recursive stochastic procedures. - Theor. Prob. Appl., v. 24, No. 2.

Korostelev, A.P. (1983): A note on upper functions for stochastic approximation. - Theor. Prob. Appl., v. 28, No. 4.

Korostelev, A.P. (1984): Stochastic Approximation Procedures (Local Properties). - Moscow: Izd. Nauka (in Russian).

Korostelev, A.P. and S.L. Leonov (1983): Upper functions for stochastic approximation procedures with power tails of distributions. - In: Dynamics of Non-Homogeneous Systems, Moscow: VNIISI, pp. 55-64 (in Russian).

Korostelev, A.P. and S.L. Leonov (1984): Stochastic approximation procedures with stationary noise. - In: Statistical Models and Methods, Moscow: VNIISI, pp. 46-61 (in Russian).

Krasulina, T.P. (1975): Some notes on the stochastic approximation. - Automat. Telemech., No. 7.

Kul'chitskii, O.Yu. (1978): Non-Markov algorithms for statistical optimization with continuous time. - Automat. Telemech., No. 5, 6.

Kushner, H.J. (1977): General convergence results for stochastic approximations via weak convergence theory. - J. Math. Anal. Appl., v. 61, pp. 490-503.

Kushner, H.J. and D.S. Clark (1978): Stochastic Approximation for Constrained and Unconstrained Systems. - Springer.

Kushner, H.J. (1983): Asymptotic behaviour of stochastic approximation and large deviations. - LCDS Report 83-1, Brown Univ., Providence, R.I.

Kushner, H.J. and P. Dupuis (1984): Stochastic approximations via large deviations: asymptotic properties. - LCDS Report 84-2, Brown Univ., Providence, R.I.

Leonov, S.L. (1982): Conditions for the convergence of stochastic approximation procedures with dependent noise. - In: Data Analysis in System Studies, Moscow: VNIISI, pp. 44-54 (in Russian).

Leonov, S.L. (1984): Asymptotic properties of stochastic recursive procedures with autoregressive noise. - In: Methods of Complex Systems Studying, Moscow: VNIISI, pp. 20-27 (in Russian).

Leonov, S.L. (1985): A stochastic algorithm for minimization of an additive function. - Automat. Telemech., No. 10.

Ljung, L. (1974): Convergence of recursive stochastic algorithms. - Report 7403, Division of Automatic Control, Lund Institute of Technology, Lund, Sweden.

Ljung, L. (1977): Analysis of recursive stochastic algorithms. - IEEE Transactions on Automatic Control, v. AC-22, No. 4, pp. 551-575.

Ljung, L. (1978): Strong convergence of a stochastic approximation algorithm. - Ann. Stat., v. 6, No. 3, pp. 680-696.

Nevel'son, M.B. and R.Z. Has'minskii (1972): Stochastic Approximation and Recursive Estimation. - Moscow: Izd. Nauka (in Russian).

Poznyak, A.S. and D.O. Chikin (1984): Asymptotical properties of stochastic approximation procedures with dependent noise. - Automat. Telemech., No. 12.

Robbins, H. and S. Monro (1951): A stochastic approximation method. - Ann. Math. Stat., v. 22, No. 3, pp. 400-407.

Rosenblatt, M. (1974): Random Processes. - Springer (2nd ed.).

Sacks, J. (1958): Asymptotic distribution of stochastic approximation procedures. - Ann. Math. Stat., v. 29, No. 2, pp. 373-405.

Sakrison, D.J. (1964): A continuous Kiefer-Wolfowitz procedure for random processes. - Ann. Math. Stat., v. 35, No. 2, pp. 590-599.

Solo, V. (1982): Stochastic approximation with dependent noise. - Stoch. Proc. Appl., v. 13, pp. 157-170.

Tsypkin, Ya.Z. (1971): Adaptation and Learning in Automatic Systems. - N.Y.: Academic.

Ventsel, A.D. (1976): Rough limit theorems on large deviations for Markov processes, I, II. - Theor. Prob. Appl., v. 21, No. 2, 3.

Ventsel, A.D. (1979): Rough limit theorems on large deviations for Markov processes, III. - Theor. Prob. Appl., v. 24, No. 4.

Ventsel, A.D. and M.I. Freidlin (1970): On small random perturbations of dynamic systems. - Russian Math. Surveys, v. 25, Jan.-June.

Ventsel, A.D. and M.I. Freidlin (1983): Large Deviations. - Springer.
