stochastic approximation and simulated annealing
DESCRIPTION
Stochastic Approximation and Simulated Annealing. Lecture 8. Leonidas Sakalauskas Institute of Mathematics and Informatics Vilnius, Lithuania EURO Working Group on Continuous Optimization. Content. Introduction. Stochastic Approximation : SPSA with Lipschitz perturbation operator; - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/1.jpg)
Stochastic Approximation and Simulated Annealing
Lecture 8
Leonidas SakalauskasInstitute of Mathematics and InformaticsVilnius, Lithuania EURO Working Group on Continuous Optimization
![Page 2: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/2.jpg)
Introduction.
Stochastic Approximation: SPSA with Lipschitz perturbation operator; SPSA with Uniform perturbation operator; Standard Finite Difference Approximation
algorithm.
Simulated Annealing
Implementation and Applications
Wrap-Up and Conclusions
Content
![Page 3: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/3.jpg)
In many practical problems of technical design some of the data may be subject to significant uncertainty which is reduced to probabilistic-statistical models.
The performance of such problems can be viewed like constrained stochastic optimization programming tasks.
Stochastic Approximation can be considered as alternative to traditional optimization methods, especially when objective functions are no differentiable or computed with noise.
Introduction
![Page 4: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/4.jpg)
Application of Stochastic Approximation to solving of optimization problems, while the objective function is non-differentiable or nonsmooth and computed with noise is a topical theoretical and practical problem.
The known methods of Stochastic Approximation for solving of these problems use the idea of stochastic gradient and certain rules of changing of step length for ensuring the convergence.
Stochastic Approximation
![Page 5: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/5.jpg)
The optimization problem is (minimization)
as follows:
where is a bounded from below Lipshitz function.
nx
minxf
n:f
Formulation of the optimization problem
![Page 6: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/6.jpg)
Let be generalized gradient of this function.
Assume to be a set of stationary points:
and to be a set of function values:
*X
,xfxX * 0
*F
.Xx,xfzzF **
)x(f
Formulation of the optimization problem
![Page 7: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/7.jpg)
We consider a function smoothed by perturbation operator:
where is the value of the perturbation parameter.
The functions smoothed by this operator are twice continuously differentiable (Rubinstein & Shapiro (1993), Bartkute & Sakalauskas (2004)), that offers certain opportunities creating optimization algorithms.
0
, , ~f x Ef x p
![Page 8: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/8.jpg)
At last time the interesting research was focussed on
Stochastic Perturbation Stochastic Approximation (SPSA)
It is enough to calculate values of the function only in one or some points for the estimation of the stochastic gradient in SPSA algorithms, that promises for us to reduce numerical complexity of optimization.
Advantages of SPSA
![Page 9: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/9.jpg)
1. SPSA with Lipschitz perturbation operator.
2. SPSA with Uniform perturbation operator.
3. Standard Finite Difference Approximation algorithm.
SA algorithms
![Page 10: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/10.jpg)
... 2, ,1 k ,gxx kk
k1k
.0,,,,,, xgxgxgExg
General Stochastic Approximation scheme
where stochastic gradient
and
kkkk xgg ,,
This scheme is the same for different Stochastic Approximation algorithms whose distinguish only by approach for stochastic gradient estimation.
![Page 11: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/11.jpg)
xfxf,,xg
SPSA with Lipschitz perturbation operator
Gradient estimator of the SPSA with Lipschitz perturbation operator is expressed as:
where -is the value of the perturbation parameter,
-is uniformly distributed in the unit ball vector
.yif,
,yif,Vy n
10
11
nV -is the volume of the n-dimensional ball (Bartkute & Sakalauskas (2007))
![Page 12: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/12.jpg)
2
xfxf,,xg
SPSA with Uniform perturbation operator
Gradient estimator of the SPSA with Uniform perturbation operator is expressed as:
where -is the value of the perturbation parameter,
n ,....,, 21 -is a vector consisting of variables uniformly distributed from the interval [-1;1] (Mikhalevitch et al (1987)).
![Page 13: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/13.jpg)
Standard Finite Difference Approximation algorithm
Gradient estimator of the Standard Finite Difference Approximation algorithm is expressed as:
where -is the value of the perturbation parameter,
-is uniformly distributed in the unit ball; vector
xfxf,,,xg i
i
0,.....,1,....,0,0,0i -is the vector with zero components except ith one, which is equal to 1. (Mikhalevitch et al (1987)).
![Page 14: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/14.jpg)
Let consider that the function f(x) has a sharp
minimum in the point , in which the algorithm converges
when
Then
where A>0, H>0, K>0 are certain constants, is minimum point of the smoothed function.
*x
,1121
22*1
1
b
aHk
k
okH
baKAxxE
k
,k
ak ,0a ,
k
bk ,10,0 b .
2
1
Hb
a
1
*
kx
Rate of convergence
![Page 15: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/15.jpg)
The proposed methods were tested with following
functions:
where is a set of real numbers randomly and uniformly
generated in the interval ,
ka
Computer simulation
n
kkk Mxaf
1
K, .K 0
The samples of T=500 test functions were generated, when .K, 52
![Page 16: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/16.jpg)
Empirical and theoretical rates of convergence by SA methods
Theoretical rates
1.5 1.75 1.9
Empirical rates
SPSA (Lipshitz perturbation)
n = 2 1.45509 1.72013 1.892668
n = 4 1.41801 1.74426 1.958998
SPSA ( Uniform perturbation)
n = 2 1.605244 1.938319 1.988265
n = 4 1.551486 1.784519 1.998132
Stochastic Difference Approximation method
n = 2 1.52799 1.76399 1.90479
n = 4 1.50236 1.75057 1.90621
50. 750. 90.
![Page 17: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/17.jpg)
The rate of convergence (n = 2)2*xxE k
![Page 18: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/18.jpg)
The rate of convergence (n = 10)
2*xxE k
![Page 19: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/19.jpg)
Let us consider the application of SA to the minimization of the mean absolute pricing error for the parameter calibration in the Heston Stochastic Volatility model [Heston S. L.(1993)].
We consider the mean absolute pricing error (MAE) defined as :
Volatility estimation by Stochastic Approximation algorithm
N
ii
Hi CC
NMAE
1
v,, , , ,1
v,, , , ,
where N is the total number of options, iC and HiC
the realized market price and the implied the theoretical model price, respectively, while (n=6) are the parameters of the Heston model to be estimated.
v,, , , ,
represent
![Page 20: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/20.jpg)
To compute option prices by the Heston model, one needs input parameters that can hardly be found from the market data. We need to estimate the above parameters by an appropriate calibration procedure. The estimates of the Heston model
parameters are obtained by minimizing MAE:
Let consider the Heston model for the Call option on SPX (29 May 2002).
min v,, , , , MAE
![Page 21: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/21.jpg)
Minimization of the mean absolute pricing error by SPSA and SFDA methods
![Page 22: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/22.jpg)
In cargo oil tankers design, it is necessary to
choose such sizes for bulkheads, that the weight of
bulkheads would be minimal.
Optimal Design of Cargo Oil Tankers
![Page 23: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/23.jpg)
min885.5
22
231
314
xxx
xxxxf
subject to 094.8
6
14.0 2
223131421
xxxxxxxxg
094.82.212
12.0 3
422
231314
222
xxxxxxxxg
015.00156.0 143 xxxg
015.00156.0 344 xxxg
005.145 xxg
0236 xxxg
The minimization of weight of bulkheads for the cargo oil tank we can formulate like nonlinear programing task (Reklaitis et al (1986)):
where 1x - width, 2x -debt, 3x - lenght, 4x - thikness.
![Page 24: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/24.jpg)
SPSA with Lipschitz perturbation for the cargo oil target design
6.5
6.6
6.7
6.8
6.9
7
7.1
7.2
7.3
7.4
7.5
100 1000 1900 2800 3700 4600 5500 6400 7300 8200 9100 10000
Number of iterations
![Page 25: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/25.jpg)
Confidence bounds of the minimum (A=6.84241, T=100, N=1000)
6.4
6.5
6.6
6.7
6.8
6.9
7
7.1
2 102 202 302 402 502 602 702 802 902 Number of iterations
Upper bound
Lower bound
Minimum of theobjective function
![Page 26: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/26.jpg)
Simulated AnnealingGlobal optimization methods
Global algorithms (bounds and branch algorithms, dynamic programming, full selection, etc)
Greedy optimization (local search) Heuristic optimization
![Page 27: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/27.jpg)
Metaheuristics Simulated Annealing Genetic Algorithms Swarm Intelligence Ant Colony Taboo search Scatter search Variable neighborhood Neural Networks Etc.
![Page 28: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/28.jpg)
Simulated Annealing algorithm
Simulated Annealing algorithm is developed by modeling steel annealing process (Metropolis et al. (1953))
A lot of applications in Operational Research and Data Analysis, etc.
![Page 29: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/29.jpg)
Simulated Annealing
Main idea: to simulate drift of current solution with
probability distribution
to improve solution updating - temperature function - neighborhood function
kTk
),( kTxP
![Page 30: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/30.jpg)
Simulated Annealing algorithm
Step 1. Choose , , , set . Step 2. Generate drift with probability distribution Step 3. If and (Metropolis rule)
then accept: ; otherwise Step 2
0T00x 0k1kZ
),( kTxPkkZ 1
k
kkk
T
Zxfxf
e)()( 1
11 kkk Zxx
)1,0(U
![Page 31: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/31.jpg)
Improvement of SA by Pareto Type models
The theoretical investigation of SA convergence shows, that in these algorithms Pareto type models can be applied to form search sequence (Yang (2000)).
Class of Pareto models, main feature and parameter:
Pareto model’s distributions have "heavy tails“.
α - the main parameter of these models, which impacts the heaviness of the tail
α –stable distributions are Pareto (follows to C.L.T.)
![Page 32: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/32.jpg)
Pareto type (Heavy-tailed) distributions
Main features:
infinite variance, infinite mean
Introduced by Pareto in the 1920’s
Mandelbrot established the use of heavy-tailed distributions to model real-world fractal phenomena.
There are a lot of other applications (financial market, traffic in computer and telecommunication networks, etc.).
![Page 33: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/33.jpg)
Pareto type (Heavy-tailed) distributions
Decay of DistributionsHeavy-Tailed - Power Law (polynomial) Decay (e.g. Pareto-Levy):
0,~}{ xCxxXP where 0 < α < 2 and C > 0 are constants
![Page 34: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/34.jpg)
- stable distributions
![Page 35: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/35.jpg)
Comparison of tail probabilities for standard normal, Cauchy and Levy distributions
In this table were compared the tail probabilities for the three distributions. It is clear that the tail probability for the normal quickly becomes negligible, whereas the other two distributions have a significant probability mass in the tail.
![Page 36: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/36.jpg)
Improvement of SAS by Pareto type models
The convergence conditions (Yang (2000)) indicate that, under suitable conditions, an appropriate choice of the temperature and neighborhood size updating functions ensures the convergence of the SA algorithm to the global minimum of the objective function over the domain of interest.
The following corollaries give different forms of temperature and neighborhood size updating functions corresponding to different kinds of generation probability density functions to guarantee the global convergence of the SA algorithm.
![Page 37: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/37.jpg)
Convergence of Simulated Annealing
![Page 38: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/38.jpg)
Improvement of SA in continuous optimization
The above corollaries indicate that a different form of temperature updating function has to be used with respect to a different kind of generation probability density function in order to ensure the global convergence of the corresponding SA algorithm.
![Page 39: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/39.jpg)
Convergence of Simulated Annealing
Some Pareto-type models were explored. See the Table 1.
![Page 40: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/40.jpg)
Convergence of Simulated Annealing
![Page 41: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/41.jpg)
Testing of SA for continuous optimization
In global and combinatorial optimization problems, when optimization algorithms are used, the reliability and efficiency of these algorithms is needed to be tested. Special testing functions, known in literature, are used for this.Some of these functions have one or more global minimum, some of them have global and local minimums. With the help of these functions it can be ensured, that the methods are efficient enough, thus, it is possible to test and prevent algorithms from being trapped in local minimum, as well as the speed and accuracy of convergence and other parameters can be watched.
![Page 42: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/42.jpg)
Testing criteriaBy modeling SA algorithm with some testing functions with two different distributions, and changing some optional parameters, there were some questions:
which of these distributions guarantees the faster convergence to global minimum by value of objective function;
what are probabilities of finding global minimum, how can impact these probabilities the changing of some parameters;
what the proper number of iterations, which guarantees the finding global minimum with desirable probability.
![Page 43: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/43.jpg)
Testing criteria
value of minimized objective function; probability to find global minimum after some
number of iterations.
These characteristics were computed by Monte-Carlo method - N realizations (N=100, 500, 1000) with K iterations each (K=100, 500, 1000, 3000, 10000, 30000).
Characteristics evaluated by Monte-Carlo simulation:
![Page 44: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/44.jpg)
Testing functions
An example of test function:Branin’s RCOS (RC) function (2 variables):
RC(x1,x2)=(x2-(5/(42))x12+(5/)x1-6)2+10(1-(1/(8)))cos(x1)+10; Search domain: 5 < x1 < 10, 0 < x2 < 15;
3 minima: (x1 , x2)*=(- , 12.275), ( , 2.275), (9.42478 , 2.475);RC((x1 , x2)*)=0.397887.
![Page 45: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/45.jpg)
Simulation resulsts
![Page 46: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/46.jpg)
Simulation results
![Page 47: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/47.jpg)
Simulation results
![Page 48: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/48.jpg)
Simulation results
0
0,1
0,2
0,3
0,4
0,5
0,6
0,7
0,8
0,9
1
1 200 400 600 800 1000 1200 1400 1600 1800 2000 2200 2400 2600 2800 3000
Iteracijų skaičius
Fig. 1. Probability to find global minimum by SA for Rastrigin function
![Page 49: Stochastic Approximation and Simulated Annealing](https://reader035.vdocuments.mx/reader035/viewer/2022081419/56813c53550346895da5d53e/html5/thumbnails/49.jpg)
1. The SA methods have been considered for comparison SPSA with Lipschitz perturbation operator; SPSA with Uniform perturbation operator and SFDA method as well Simulated Annealing;
2. Computer simulation by Monte-Carlo method has shown that the empirical estimates of the rate of convergence of SA for nondifferentiable functions corroborate the theoretical rates
Wrap-Up and Conclusions
21,1
kO