Simulated Stochastic Approximation Annealing for Global Optimization with a Square-Root Cooling Schedule
Faming Liang
Faming Liang Simulated Stochastic Approximation Annealing for Global Optimization with a Square-Root Cooling Schedule
Abstract
Simulated annealing has been widely used for solving optimization problems. As is well known, it cannot be guaranteed to locate the global optima unless a logarithmic cooling schedule is used; however, the logarithmic schedule is so slow that the required CPU time is prohibitive. We propose a new stochastic optimization algorithm, the so-called simulated stochastic approximation annealing algorithm. Under the framework of stochastic approximation Markov chain Monte Carlo, we show that the new algorithm can work with a cooling schedule in which the temperature decreases much faster than in the logarithmic cooling schedule, e.g., a square-root cooling schedule, while still guaranteeing that the global optima are reached as the temperature tends to zero. The new algorithm has been tested on a few benchmark optimization problems, including feed-forward neural network training and protein folding. The numerical results indicate that the new algorithm significantly outperforms simulated annealing and other competitors.
The problem
The optimization problem can be stated as a minimization problem:

min_{x∈X} U(x),

where X is the domain of U(x).

Minimizing U(x) is equivalent to sampling from the Boltzmann distribution

f_τ∗(x) ∝ exp(−U(x)/τ∗)

at a very small value (close to 0) of τ∗.
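This equivalence is easy to see numerically: as τ∗ shrinks, the Boltzmann mass concentrates on the minimizer. A minimal sketch on a toy discrete landscape (the energy values are made up for illustration):

```python
import math

def boltzmann(energies, tau):
    """Normalized Boltzmann probabilities p_i ∝ exp(-U_i / tau)."""
    # subtract the minimum energy first for numerical stability
    u_min = min(energies)
    w = [math.exp(-(u - u_min) / tau) for u in energies]
    z = sum(w)
    return [wi / z for wi in w]

# toy energy landscape: state 2 (0-based) is the global minimum
U = [3.0, 1.5, 0.2, 2.7]
for tau in (10.0, 1.0, 0.1, 0.01):
    p = boltzmann(U, tau)
    print(tau, [round(x, 4) for x in p])
# as tau -> 0, virtually all mass sits on argmin U
```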
Simulated Annealing (Kirkpatrick et al., 1983)
It simulates from a sequence of Boltzmann distributions,

f_τ1(x), f_τ2(x), . . . , f_τm(x),

in a sequential manner, where the temperatures τ1, . . . , τm form a decreasing ladder

τ1 > τ2 > · · · > τm = τ∗ > 0

with τ∗ ≈ 0 and τ1 reasonably large, such that most uphill Metropolis-Hastings (MH) moves at that level can be accepted.
Simulated Annealing: Algorithm
1. Initialize the simulation at temperature τ1 and an arbitrary sample x0 ∈ X.
2. At each temperature τi, simulate the distribution f_τi(x) for ni iterations using the MH sampler. Pass the final sample to the next lower temperature level as the initial sample.
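The two steps above can be sketched as follows; the double-well objective, proposal scale, and geometric demo ladder are illustrative choices, not the slide's settings:

```python
import math, random

def simulated_annealing(U, x0, temps, n_iters, propose):
    """Minimal simulated annealing: at each level of a decreasing
    temperature ladder, run an MH chain targeting f_tau(x) ∝
    exp(-U(x)/tau), passing the last state down the ladder."""
    x, best = x0, x0
    for tau, n in zip(temps, n_iters):
        for _ in range(n):
            y = propose(x)                          # symmetric proposal assumed
            accept_p = math.exp(min(0.0, -(U(y) - U(x)) / tau))
            if random.random() < accept_p:          # MH accept/reject
                x = y
            if U(x) < U(best):                      # track best state seen
                best = x
    return best

# usage: minimize a 1-D double-well U(x) = (x^2 - 1)^2 with minima at ±1
random.seed(1)
U = lambda x: (x * x - 1.0) ** 2
temps = [2.0 * 0.9 ** k for k in range(60)]         # geometric ladder (demo only)
best = simulated_annealing(U, 5.0, temps, [50] * 60,
                           lambda x: x + random.gauss(0.0, 0.5))
print(round(best, 3), round(U(best), 4))
```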
Simulated Annealing: Difficulty
The major difficulty with simulated annealing is in choosing the cooling schedule:
▶ Logarithmic cooling schedule O(1/log(t)): it ensures that the simulation converges to the global minima of U(x) with probability 1. However, it is so slow that the required running time is unaffordable.
▶ Linear or geometric cooling schedule: a linear or geometric cooling schedule is commonly used, but, as shown in Holley et al. (1989), these schedules can no longer guarantee that the global minima are reached.
Stochastic Approximation Monte Carlo (SAMC)
SAMC is a general-purpose MCMC algorithm. To be precise, it is an adaptive MCMC algorithm and also a dynamic importance sampling algorithm. Its self-adjusting mechanism makes it immune to local traps.
▶ Let E1, . . . , Em denote a partition of the sample space X, made according to the energy function as follows:

E1 = {x : U(x) ≤ u1}, E2 = {x : u1 < U(x) ≤ u2}, . . . , Em−1 = {x : um−2 < U(x) ≤ um−1}, Em = {x : U(x) > um−1},   (1)

where u1 < u2 < · · · < um−1 are prespecified numbers.
▶ Let {γt} be a positive, non-increasing sequence satisfying the conditions

∑_{t=1}^∞ γt = ∞,  ∑_{t=1}^∞ γt² < ∞.
Stochastic Approximation Monte Carlo: Algorithm
1. (Sampling) Simulate a sample x_{t+1} with a single MH update, which starts with x_t and leaves the following distribution invariant:

f_{θt,τ∗}(x) ∝ ∑_{i=1}^m exp{−U(x)/τ∗ − θt^(i)} I(x ∈ Ei),   (2)

where I(·) is the indicator function.
2. (θ-updating) Set

θ_{t+1} = θt + γ_{t+1} H_{τ∗}(θt, x_{t+1}),   (3)

where H_{τ∗}(θt, x_{t+1}) = e_{t+1} − π, e_{t+1} = (I(x_{t+1} ∈ E1), . . . , I(x_{t+1} ∈ Em)), and π = (π1, . . . , πm).
Obviously, the sampler mixes poorly over the domain X when the temperature τ∗ is very low: in this case, only very few points will be sampled from each subregion.
Space Annealing SAMC (Liang, 2007)
Suppose that the sample space has been partitioned as in (1), with u1, . . . , um−1 arranged in ascending order. Let κ(u) denote the index of the subregion that a sample x with energy u belongs to; for example, if x ∈ Ej, then κ(U(x)) = j. Let X^(t) denote the sample space at iteration t.

Space annealing SAMC starts with X^(1) = ∪_{i=1}^m Ei, and then iteratively shrinks the sample space by setting

X^(t) = ∪_{i=1}^{κ(u_min^(t) + ℵ)} Ei,   (4)

where u_min^(t) is the minimum energy value obtained by iteration t, and ℵ is a user-specified parameter.

A major shortcoming of this algorithm is that it tends to get trapped in local energy minima when ℵ is small and the proposal is relatively local.
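For illustration, the subregion index κ(u) and the shrunken space in (4) can be computed as below; the cut points and the value of ℵ are made-up demo numbers:

```python
import bisect

def kappa(u, cutpoints):
    """1-based index of the subregion that energy u falls in, for the
    partition E_1 = {U <= u_1}, E_i = {u_{i-1} < U <= u_i}, ...,
    E_m = {U > u_{m-1}}; bisect_left respects the '<= upper cut' rule."""
    return bisect.bisect_left(cutpoints, u) + 1

def shrunken_space(u_min_t, cutpoints, aleph):
    """Subregion indices kept at iteration t:
    X^(t) = E_1 ∪ ... ∪ E_{kappa(u_min^(t) + aleph)}."""
    return list(range(1, kappa(u_min_t + aleph, cutpoints) + 1))

cuts = [0.0, 1.0, 2.0, 3.0]        # u_1 < ... < u_{m-1}, so m = 5
print(kappa(1.5, cuts))            # energy 1.5 lies in E_3
print(shrunken_space(0.4, cuts, 1.0))
```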
SAA Algorithm
Simulated stochastic approximation annealing, or SAA for short, is a combination of simulated annealing and stochastic approximation.
▶ Let {Mk, k = 0, 1, . . .} be a sequence of positive numbers increasing to infinity, which serve as truncation bounds for {θt}.
▶ Let σt be a counter for the number of truncations up to iteration t, and σ0 = 0.
▶ Let θ̃0 be a fixed point in Θ.
▶ E1, . . . ,Em is the partition of the sample space.
▶ π = (π1, . . . , πm) is the desired sampling distribution of the m subregions.
▶ {γt} is a gain factor sequence.
▶ {τt} is a temperature sequence.
SAA Algorithm
1. (Sampling) Simulate a sample x_{t+1} with a single MH update, which starts with x_t and leaves the following distribution invariant:

f_{θt,τ_{t+1}}(x) ∝ ∑_{i=1}^m exp{−U(x)/τ_{t+1} − θt^(i)} I(x ∈ Ei),   (5)

where I(·) is the indicator function.
2. (θ-updating) Set

θ_{t+1/2} = θt + γ_{t+1} H_{τ_{t+1}}(θt, x_{t+1}),   (6)

where H_{τ_{t+1}}(θt, x_{t+1}) = e_{t+1} − π, e_{t+1} = (I(x_{t+1} ∈ E1), . . . , I(x_{t+1} ∈ Em)), and π = (π1, . . . , πm).
3. (Truncation) If ∥θ_{t+1/2}∥ ≤ M_{σt}, set θ_{t+1} = θ_{t+1/2}; otherwise, set θ_{t+1} = θ̃0 and σ_{t+1} = σt + 1.
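The three steps can be sketched on a finite state space as below; the toy energy, the constants C1 and C2, the offset in the gain factor, and the truncation bounds Mk = 10⁴·2^k are demo assumptions, not the paper's settings:

```python
import math, random

def saa(U, states, cuts, pi, n_iter, c1=10.0, c2=1.0, tau_star=0.01):
    """Minimal SAA sketch with a uniform independence proposal,
    gain factor gamma_t = C1/(t+100), and the square-root cooling
    schedule tau_t = C2/sqrt(t) + tau_star."""
    m = len(cuts) + 1
    def region(x):                              # 0-based subregion index
        return sum(U(x) > c for c in cuts)
    theta = [0.0] * m                           # theta_0 = tilde(theta)_0 = 0
    sigma = 0                                   # truncation counter
    x = random.choice(states)
    best = x
    for t in range(1, n_iter + 1):
        gamma = c1 / (t + 100.0)                # gain factor (offset keeps it < 1)
        tau = c2 / math.sqrt(t) + tau_star      # square-root cooling schedule
        # 1. sampling: one MH step leaving f_{theta_t, tau_{t+1}} invariant (5)
        y = random.choice(states)
        i, j = region(x), region(y)
        log_r = -(U(y) - U(x)) / tau - (theta[j] - theta[i])
        if random.random() < math.exp(min(0.0, log_r)):
            x, i = y, j
        if U(x) < U(best):
            best = x
        # 2. theta-updating: theta_{t+1/2} = theta_t + gamma*(e_{t+1} - pi) (6)
        half = [th + gamma * ((k == i) - pi[k]) for k, th in enumerate(theta)]
        # 3. truncation: restart from tilde(theta)_0 if the bound is exceeded
        if max(abs(th) for th in half) <= 1e4 * 2 ** sigma:
            theta = half
        else:
            theta, sigma = [0.0] * m, sigma + 1
    return best, theta

random.seed(7)
U = lambda s: (s - 3) ** 2                      # toy energy, minimum at state 3
best, theta = saa(U, list(range(10)), [1.0, 5.0], [1/3, 1/3, 1/3], 20000)
print(best, [round(th, 2) for th in theta])
```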
Features of SAA
▶ Self-adjusting mechanism: this distinguishes SAA from simulated annealing. For simulated annealing, the change of the invariant distribution is solely determined by the temperature ladder, while for SAA it is determined by both the temperature ladder and the past samples. As a result, SAA can converge with a much faster cooling schedule.
▶ Sample space shrinkage: compared to space annealing SAMC, SAA also shrinks its sample space with iterations, but in a soft way: it gradually biases sampling toward the local energy minima of each subregion by lowering the temperature with iterations. This strategy of sample space shrinkage reduces the risk of getting trapped in local energy minima.
▶ Convergence: from the perspective of practical applications, SAA achieves essentially the same convergence toward the global energy minima as simulated annealing.
Formulation of SAA
The SAA algorithm can be formulated as a SAMCMC algorithm aimed at solving the integral equation

h_{τ∗}(θ) = ∫ H_{τ∗}(θ, x) f_{θ,τ∗}(x) dx = 0,   (7)

where f_{θ,τ∗}(x) denotes a density function depending on θ and the limiting temperature τ∗, and h is called the mean field function.

SAA works by solving a system of equations defined along the temperature sequence {τt}:

h_{τt}(θ) = ∫ H_{τt}(θ, x) f_{θ,τt}(x) dx = 0,  t = 1, 2, . . . ,   (8)

where f_{θ,τt}(x) is a density function depending on θ and the temperature τt.
Conditions on mean field function
For SAA, the mean field function is given by

h_τ(θ) = ∫ H_τ(θ, x) f_{θ,τ}(x) dx = (S_τ^(1)(θ)/S_τ(θ) − π1, . . . , S_τ^(m)(θ)/S_τ(θ) − πm),   (9)

for any fixed value of θ ∈ Θ and τ ∈ T, where S_τ^(i)(θ) = ∫_{Ei} e^{−U(x)/τ} dx / e^{θ^(i)}, and S_τ(θ) = ∑_{i=1}^m S_τ^(i)(θ).

Further, we define

v_τ(θ) = (1/2) ∑_{i=1}^m (S_τ^(i)(θ)/S_τ(θ) − πi)²,   (10)

which is the so-called Lyapunov function in the stochastic approximation literature.

Then it is easy to verify that SAA satisfies the stability condition.
Stability Condition: (A1)
The function h_τ(θ) is bounded and continuously differentiable with respect to both θ and τ, and there exists a non-negative, upper-bounded, and continuously differentiable function v_τ(θ) such that for any ∆ > δ > 0,

sup_{δ ≤ d((θ,τ),L) ≤ ∆} ∇_θ^T v_τ(θ) h_τ(θ) < 0,   (11)

where L = {(θ, τ) : h_τ(θ) = 0, θ ∈ Θ, τ ∈ T} is the zero set of h_τ(θ), and d(z, S) = inf_y{∥z − y∥ : y ∈ S}. Further, the set v(L) = {v_τ(θ) : (θ, τ) ∈ L} is nowhere dense.
Conditions on observation noise
Observation noise: ξ_{t+1} = H_{τ_{t+1}}(θt, x_{t+1}) − h_{τ_{t+1}}(θt).
▶ One can directly impose conditions on the observation noise; see, e.g., Kushner and Clark (1978), Kulkarni and Horn (1995), and Chen (2002). These conditions are usually very weak, but difficult to verify.
▶ Alternatively, one can impose conditions on the Markov transition kernel, which lead to the required conditions on the observation noise.
Doeblin condition: (A2)
(A2) (Doeblin condition) For any given θ ∈ Θ and τ ∈ T, the Markov transition kernel P_{θ,τ} is irreducible and aperiodic. In addition, there exist an integer l, 0 < δ < 1, and a probability measure ν such that for any compact subset K ⊂ Θ,

inf_{θ∈K, τ∈T} P_{θ,τ}^l(x, A) ≥ δν(A),  ∀x ∈ X, ∀A ∈ B_X,

where B_X denotes the Borel set of X; that is, the whole support X is a small set for each kernel P_{θ,τ}, θ ∈ K and τ ∈ T.

Uniform ergodicity is slightly stronger than V-uniform ergodicity, but it serves SAA just right: the function H_τ(θ, x) is bounded, and thus the mean field function and the observation noise are bounded. If the drift function V(x) ≡ 1, then V-uniform ergodicity reduces to uniform ergodicity.
Doeblin condition
To verify (A2), one may assume that X is compact, U(x) is bounded on X, and the proposal distribution q(x, y) satisfies the local positivity condition:

(Q) There exist δq > 0 and ϵq > 0 such that, for every x ∈ X, |x − y| ≤ δq ⇒ q(x, y) ≥ ϵq.
Conditions on {γt} and {τt}: (A3)
(i) The sequence {γt} is positive, non-increasing, and satisfies the following conditions:

∑_{t=1}^∞ γt = ∞,  (γ_{t+1} − γt)/γt = O(γ_{t+1}^ι),  ∑_{t=1}^∞ γt^{(1+ι′)/2}/√t < ∞,   (12)

for some ι ∈ [1, 2) and ι′ ∈ (0, 1).

(ii) The sequence {τt} is positive, non-increasing, and satisfies the following conditions:

lim_{t→∞} τt = τ∗,  τt − τ_{t+1} = o(γt),  ∑_{t=1}^∞ γt |τt − τ_{t−1}|^{ι″} < ∞,   (13)

for some ι″ ∈ (0, 1), and

∑_{t=1}^∞ γt |τt − τ∗| < ∞.   (14)
Conditions on {γt} and {τt}
For the sequences {γt} and {τt}, one can typically set

γt = C1/t^ς,  τt = C2/√t + τ∗,   (15)

for some constants C1 > 0, C2 > 0, and ς ∈ (0.5, 1]. Then it is easy to verify that (15) satisfies (A3).
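A quick numerical look at (15), with illustrative constants, shows how much faster the square-root schedule cools than a logarithmic one:

```python
import math

def gain(t, c1=1.0, zeta=0.6):
    """Gain factor gamma_t = C1 / t^zeta, zeta in (0.5, 1]."""
    return c1 / t ** zeta

def temperature(t, c2=5.0, tau_star=0.01):
    """Square-root cooling schedule tau_t = C2 / sqrt(t) + tau_star."""
    return c2 / math.sqrt(t) + tau_star

# compare against a logarithmic schedule C/log(t) at the same constants
for t in (10, 10_000, 10_000_000):
    print(t, round(temperature(t), 5), round(5.0 / math.log(t), 5))
```

By t = 10⁷ the square-root schedule is already essentially at τ∗, while the logarithmic one is still far from it.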
Convergence Theorem
Theorem 1. Assume that T is compact and conditions (A1)-(A3) hold. If θ̃0 used in the SAA algorithm is such that sup_{τ∈T} v_τ(θ̃0) < inf_{∥θ∥=c0, τ∈T} v_τ(θ) for some c0 > 0 and ∥θ̃0∥ < c0, then the number of truncations in SAA is almost surely finite; that is, {θt} remains in a compact subset of Θ almost surely.
Convergence Theorem
Theorem 2. Assume the conditions of Theorem 1 hold. Then, as t → ∞,

d(θt, L_{τ∗}) → 0, a.s.,

where L_{τ∗} = {θ ∈ Θ : h_{τ∗}(θ) = 0} and d(z, S) = inf_y{∥z − y∥ : y ∈ S}. That is,

θt^(i) → C + log(∫_{Ei} f_{τ∗}(x) dx) − log(πi + πe), if Ei ≠ ∅,
θt^(i) → −∞, if Ei = ∅,

where C is a constant, πe = ∑_{j: Ej=∅} πj/(m − m0), and m0 is the number of empty subregions.
Strong Law of Large Numbers (SLLN)
Theorem 3. Assume the conditions of Theorem 1 hold. Let x1, . . . , xn denote a set of samples simulated by SAA in n iterations. Let g : X → R be a measurable function that is bounded and integrable with respect to f_{θ,τ}(x). Then

(1/n) ∑_{k=1}^n g(xk) → ∫_X g(x) f_{θ∗,τ∗}(x) dx, a.s.
Convergence to Global Minima
Corollary. Assume the conditions of Theorem 1 hold. Let x1, . . . , xt denote a set of samples simulated by SAA in t iterations. Then, for any ϵ > 0, as t → ∞,

(1/∑_{k=1}^t I(J(xk) = i)) ∑_{k=1}^t I(U(xk) ≤ u_i∗ + ϵ & J(xk) = i) → ∫_{{x: U(x) ≤ u_i∗+ϵ} ∩ Ei} e^{−U(x)/τ∗} dx / ∫_{Ei} e^{−U(x)/τ∗} dx, a.s.,   (16)

for i = 1, . . . , m, where I(·) denotes the indicator function. Moreover, if τ∗ goes to 0, then

P(U(Xt) ≤ u_i∗ + ϵ | J(Xt) = i) → 1,  i = 1, . . . , m.   (17)

For simulated annealing, as shown in Haario and Saksman (1991), it achieves the following convergence with a logarithmic cooling schedule: for any ϵ > 0,

P(U(Xt) ≤ u_1∗ + ϵ) → 1, a.s.,   (18)

as t → ∞.
Comparison with Simulated Annealing
▶ Simulated annealing can achieve a stronger convergence mode than SAA. As a trade-off, SAA can work with a cooling schedule in which the temperature decreases much faster than in the logarithmic cooling schedule, such as the square-root cooling schedule.
▶ From the perspective of practical applications, (17) and (18) are almost equivalent: both allow one to identify a sequence of samples that converge to the global energy minima of U(x).
▶ In practice, SAA often works better than simulated annealing, because its self-adjusting mechanism makes it immune to local traps.
A 10-state Distribution
The unnormalized mass function of the 10-state distribution:

x      1    2   3  4    5   6  7    8   9  10
P(x)   5  100  40  1  125  75  1  150  50  20

The sample space X = {1, 2, . . . , 10} was partitioned according to the mass function into five subregions: E1 = {8}, E2 = {2, 5}, E3 = {6, 9}, E4 = {3}, and E5 = {1, 4, 7, 10}.
A 10-state Distribution
Convergence of θt for the 10-state distribution: the true value θn is calculated at the end temperature 0.0104472, θ̂n is the average of θn over 5 independent runs, s.d. is the standard deviation of θ̂n, and freq is the averaged relative sampling frequency of each subregion. The standard deviation of freq is nearly 0.

Subregion       E1          E2          E3           E4           E5
θn            6.3404    -11.1113    -60.0072    -120.1772    -186.5248
θ̂n            6.3404    -11.1116    -60.0009    -120.1687    -186.5044
s.d.               0   6.26×10−3   2.28×10−3    6.01×10−3    8.16×10−3
freq          20.29%      20.23%      20.05%       19.84%        19.6%
A 10-state Distribution
A thinned sample path of SAA for the 10-state distribution.
A function with multiple local minima
Consider minimizing the function

U(x) = −{x1 sin(20x2) + x2 sin(20x1)}² cosh{sin(10x1) x1} − {x1 cos(10x2) − x2 sin(10x1)}² cosh{cos(20x2) x2},

where x = (x1, x2) ∈ [−1.1, 1.1]².
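For reference, the function is easy to code directly; the coarse grid scan below is an illustration only (not part of the experiment), but it already finds low-energy values on the surface:

```python
import math

def U(x1, x2):
    """The multimodal test function on [-1.1, 1.1]^2 (to be minimized)."""
    a = (x1 * math.sin(20 * x2) + x2 * math.sin(20 * x1)) ** 2 \
        * math.cosh(math.sin(10 * x1) * x1)
    b = (x1 * math.cos(10 * x2) - x2 * math.sin(10 * x1)) ** 2 \
        * math.cosh(math.cos(20 * x2) * x2)
    return -a - b

# coarse grid scan over the domain (step 0.005; illustration only)
grid = [-1.1 + 0.005 * k for k in range(441)]
best = min(U(p, q) for p in grid for q in grid)
print(round(best, 3))
```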
A function with multiple local minima
Comparison of SAA and simulated annealing for the multi-modal example: average of minimum energy values after the given numbers of iterations (parenthesized values are the associated standard deviations). SA (sr) uses the square-root schedule and SA (geo) the geometric schedule.

                 20000        40000        60000        80000       100000    prop    cpu
SAA            -8.1145      -8.1198      -8.1214      -8.1223      -8.1229   92.0%   0.17
             (3.0×10−4)   (1.5×10−4)   (1.0×10−4)   (7.5×10−5)   (5.9×10−5)
SA (sr)        -5.9227      -5.9255      -5.9265      -5.9269      -5.9271    3.5%   0.14
             (1.3×10−2)   (1.3×10−2)   (1.3×10−2)   (1.3×10−2)   (1.3×10−2)
SA (geo)       -6.5534      -6.5598      -6.5611      -6.5617      -6.5620   30.7%   0.13
             (3.3×10−2)   (3.3×10−2)   (3.3×10−2)   (3.3×10−2)   (3.3×10−2)
A function with multiple local minima
(a) Contour of U(x), (b) sample path of SAA, (c) sample path of simulated annealing with a square-root cooling schedule, and (d) sample path of simulated annealing with a geometric cooling schedule. The white circles show the global minima of U(x).
Feed-forward Neural Networks
A fully connected one-hidden-layer MLP network with four input units (I1, I2, I3, I4), one bias unit (B), three hidden units (H1, H2, H3), and one output unit (O). The arrows show the direction of data feeding.
Two spiral Problem
The two-spiral problem is to learn a feedforward neural network that distinguishesbetween points on two intertwined spirals.
This is a benchmark feedforward neural network training problem. The objectivefunction is high-dimensional, highly nonlinear, and consists of a multitude of localenergy minima separated by high energy barriers.
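The benchmark data themselves are easy to generate; the construction below (97 points per spiral, radius decaying with the angle) is one common version of the task, an assumption rather than the slide's exact data:

```python
import math

def two_spirals(n=97):
    """One common construction of the two-spiral benchmark: n points
    per spiral, the second spiral being the point reflection of the
    first; each point is (x, y, class label)."""
    data = []
    for i in range(n):
        alpha = i * math.pi / 16.0          # angle grows with the index
        r = 6.5 * (104 - i) / 104.0         # radius shrinks with the index
        x, y = r * math.sin(alpha), r * math.cos(alpha)
        data.append((x, y, 0))              # spiral 1
        data.append((-x, -y, 1))            # spiral 2 (point-reflected)
    return data

pts = two_spirals()
print(len(pts), pts[0])
```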
Two spiral Problem
Classification maps learned by SAA with an MLP of 30 hidden units. The black and white points show the training data for the two intertwined spirals. (a) Classification map learned in one run of SAA. (b) Classification map averaged over 20 runs. This figure shows the success of SAA in optimizing complex functions.
Two spiral Problem
Comparison of SAA, space annealing SAMC, simulated annealing, and BFGS for the two-spiral example. Notation: vi denotes the minimum energy value obtained in the i-th run, i = 1, . . . , 20; "Mean" is the average of the vi; "SD" is the standard deviation of "Mean"; "Min" = min_{i} vi; "Max" = max_{i} vi; "Prop" = #{i : vi ≤ 0.21}; "Iter" is the average number of iterations performed in each run. SA-1 employs the linear cooling schedule, and SA-2 employs the geometric cooling schedule with a decreasing rate of 0.9863.

Algorithm               Mean     SD      Min     Max    Prop  Iter(×10⁶)
SAA                     0.341   0.099   0.201    2.04    18      5.82
Space annealing SAMC    0.620   0.191   0.187    3.23    15      7.07
Simulated annealing-1  17.485   0.706   9.02    22.06     0      10.0
Simulated annealing-2   6.433   0.450   3.03    11.02     0      10.0
BFGS                   15.50    0.899  10.00    24.00     0       —
Protein Folding
The AB model consists of only two types of monomers, A and B, which behave as hydrophobic (σi = +1) and hydrophilic (σi = −1) monomers, respectively. The monomers are linked by rigid bonds of unit length to form linear chains living in two- or three-dimensional space.

For the 2D case, the energy function consists of two types of contributions, bond-angle and Lennard-Jones, and is given by

U(x) = ∑_{i=1}^{N−2} (1/4)(1 − cos x_{i,i+1}) + 4 ∑_{i=1}^{N−2} ∑_{j=i+2}^{N} [r_{ij}^{−12} − C2(σi, σj) r_{ij}^{−6}],   (19)

where x = (x_{1,2}, . . . , x_{N−2,N−1}), x_{i,j} ∈ [−π, π] is the angle between the i-th and j-th bond vectors, and r_{ij} is the distance between monomers i and j. The constant C2(σi, σj) is +1, +1/2, and −1/2 for AA, BB, and AB pairs, respectively.
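Energy (19) is straightforward to evaluate; the sketch below assumes the chain is already laid out as 2D coordinates with unit bonds (a coordinate-based form, so the angle terms can be read off the bond vectors):

```python
import math

def ab_energy(coords, sigma):
    """Energy (19) for a 2D AB-model chain given unit-bond monomer
    coordinates [(x, y), ...] and the sequence sigma (+1 = A, -1 = B)."""
    n = len(coords)
    def c2(si, sj):                      # Lennard-Jones prefactor C2
        if si == sj:
            return 1.0 if si == +1 else 0.5   # AA: +1, BB: +1/2
        return -0.5                            # AB: -1/2
    e = 0.0
    for i in range(n - 2):               # bond-angle term, (1/4)(1 - cos)
        (x0, y0), (x1, y1), (x2, y2) = coords[i], coords[i + 1], coords[i + 2]
        b1 = (x1 - x0, y1 - y0)          # consecutive bond vectors
        b2 = (x2 - x1, y2 - y1)
        cos_a = b1[0] * b2[0] + b1[1] * b2[1]  # unit bonds: dot = cos angle
        e += 0.25 * (1.0 - cos_a)
    for i in range(n - 2):               # Lennard-Jones term over |i-j| >= 2
        for j in range(i + 2, n):
            r2 = (coords[i][0] - coords[j][0]) ** 2 \
                 + (coords[i][1] - coords[j][1]) ** 2
            e += 4.0 * (r2 ** -6 - c2(sigma[i], sigma[j]) * r2 ** -3)
    return e

# straight 3-monomer AAA chain: zero angle energy, one LJ pair at r = 2
print(ab_energy([(0, 0), (1, 0), (2, 0)], [+1, +1, +1]))
```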
Protein Folding
Comparison of SAA and simulated annealing for the 2D AB models. (a) The minimum energy value obtained by SAA (subject to a post conjugate-gradient minimization procedure starting from the best configurations found in each run). (b) The averaged minimum energy value sampled by the algorithm, with the standard deviation of the average in parentheses. (c) The minimum energy value sampled by the algorithm in all runs.

                       SAA                                Simulated Annealing
N    Post(a)     Average(b)          Best(c)     Average(b)          Best(c)
13    -3.2941    -3.2833 (0.0011)    -3.2881     -3.1775 (0.0018)    -3.2012
21    -6.1980    -6.1578 (0.0020)    -6.1712     -5.9809 (0.0463)    -6.1201
34   -10.8060   -10.3396 (0.0555)   -10.7689     -9.5845 (0.1260)   -10.5240
Protein Folding
Minimum energy configurations produced by SAA (subject to post conjugate-gradient optimization) for (a) the 13-mer sequence with energy value -3.2941, (b) the 21-mer sequence with energy value -6.1980, and (c) the 34-mer sequence with energy value -10.8060. The solid and open circles indicate the hydrophobic and hydrophilic monomers, respectively.
Summary
▶ We have developed the SAA algorithm for global optimization. Under the framework of stochastic approximation, we show that SAA can work with a cooling schedule in which the temperature decreases much faster than in the logarithmic cooling schedule, e.g., a square-root cooling schedule, while guaranteeing that the global energy minima are reached as the temperature tends to 0.
▶ Compared to simulated annealing, an added advantage of SAA is its self-adjusting mechanism, which makes it immune to local traps.
▶ Compared to space annealing SAMC, SAA shrinks its sample space in a soft way, gradually biasing the sampling toward the local energy minima of each subregion by lowering the temperature with iterations. This strategy of sample space shrinkage reduces the risk of SAA getting trapped in local energy minima.
▶ SAA provides a more general framework of stochastic approximation than the current stochastic approximation MCMC algorithms. By including an additional control parameter τt, stochastic approximation may find new applications or improve its performance in existing ones.
Acknowledgments
▶ Collaborators: Yichen Cheng and Guang Lin.
▶ NSF grants
▶ KAUST grant