
Cross entropy-based importance sampling using Gaussian densities revisited

Sebastian Geyer a,*, Iason Papaioannou a, Daniel Straub a

a Engineering Risk Analysis Group, Technische Universität München, Arcisstraße 21, 80333 München

* Corresponding author. Email address: [email protected] (Sebastian Geyer)

Abstract

The computation of the probability of a rare (failure) event is a common task in structural reliability analysis. In most applications, the numerical model defining the rare event is nonlinear and the resulting failure domain often multimodal. One strategy for estimating the probability of failure in this context is the importance sampling method. The efficiency of importance sampling depends on the choice of the importance sampling density. A near-optimal sampling density can be found through application of the cross entropy method. The cross entropy method is an adaptive sampling approach that determines the sampling density through minimizing the Kullback-Leibler divergence between the theoretically optimal importance sampling density and a chosen parametric family of distributions. In this paper, we investigate the suitability of the multivariate normal distribution and the Gaussian mixture model as importance sampling densities within the cross entropy method. Moreover, we compare the performance of the cross entropy method to sequential importance sampling, another recently proposed adaptive sampling approach, which uses the Gaussian mixture distribution as a proposal distribution within a Markov Chain Monte Carlo algorithm. For the parameter updating of the Gaussian mixture within the cross entropy method, we propose a modified version of the expectation-maximization algorithm that works with weighted samples. To estimate the number of distributions in the mixture, the density-based spatial clustering of applications with noise (DBSCAN) algorithm is adapted to the use of weighted samples. We compare the performance of the different methods in several examples, including component reliability problems, system reliability problems and reliability in varying dimensions. The results show that the cross entropy method using a single Gaussian outperforms the cross entropy method using Gaussian mixture and that both distribution types are not suitable for high-dimensional reliability problems.

This is an electronic reprint of the original article published by Elsevier in: Structural Safety, 2019, Vol. 76, pp. 15–27. URL: http://www.sciencedirect.com/science/article/pii/S0167473017303429, DOI: https://doi.org/10.1016/j.strusafe.2018.07.001 (please cite accordingly). This reprint differs from the original in layout, pagination and typographic details.

Keywords: reliability analysis, simulation, importance sampling, cross entropy method, Gaussian mixture, expectation maximization, sequential importance sampling

1. Introduction

In structural reliability, the goal is to assess the effects of uncertain input variables on the performance of an engineering system. The main objective is to evaluate the probability of unsatisfactory performance of the system, the probability of failure P_F. Let X be a random vector with outcome space x ∈ R^n that collects all uncertain input variables. The performance of a system can be assessed by means of the limit state function, denoted by g(x). If the limit state function gives a value smaller than or equal to zero, the system fails; otherwise, the system has satisfactory performance. Hence, the event defining failure of the system is given by

$$ F = \{ x \in \mathbb{R}^n : g(x) \le 0 \} \qquad (1) $$

The probability of failure is defined through the following integral:

$$ P_F = \int_{\mathbb{R}^n} I(x) \cdot f(x)\, \mathrm{d}x \qquad (2) $$

In this definition, f(x) is the joint probability density function (PDF) of X and I(x) is an indicator function defined as

$$ I(x) = \begin{cases} 1 & \text{if } g(x) \le 0 \\ 0 & \text{else} \end{cases} \qquad (3) $$

The function g(x) depends on the outcome of an engineering model, and hence the probability of failure cannot be evaluated analytically. Therefore, the failure probability needs to be approximated by means of an efficient method that minimizes the number of model evaluations [1]. Several methods have been developed, including approximation methods such as the first/second order reliability method (FORM/SORM) [2, 3] and sampling-based methods such as the Monte Carlo simulation (MCS) and its adaptive variants (e.g. [4–10]). The main advantage of sampling-based strategies is their robustness with respect to the complexity of the limit state function. Furthermore, they are asymptotically exact and, in contrast to approximation methods, information about the quality of the obtained result is more easily available.

Sampling methods are based on MCS; however, MCS is especially inefficient in estimating small failure probabilities, which is typically the goal in reliability assessment of engineering systems. The variance of the MCS estimate is inversely proportional to the target failure probability, which leads to prohibitively large sample sizes for obtaining sufficiently accurate estimates. The aim of advanced sampling-based methods is to enhance the efficiency of MCS through reducing the required sample size while keeping the variance of the estimate low.

A standard variant of MCS is the importance sampling (IS) method. IS aims at decreasing the variance of the MCS probability estimate by sampling from an alternative sampling density, the so-called IS density h(x). Eq. (2) can be modified as follows:

$$ P_F = \int_{\mathbb{R}^n} I(x) \cdot \frac{f(x)}{h(x)} \cdot h(x)\, \mathrm{d}x \qquad (4) $$

Provided that the support of h(x) contains the failure domain, this modification does not alter the value of the integral. The IS estimate of P_F is given as follows:

$$ P_F = \frac{1}{n_s} \sum_{i=1}^{n_s} I(x_i) \cdot \frac{f(x_i)}{h(x_i)}, \qquad (5) $$

wherein the samples {x_i, i = 1, . . . , n_s} are distributed according to h(x). The efficiency of IS depends on the choice of the IS density. A near-optimal IS density can be found through application of the cross entropy (CE) method [11]. The CE method is an adaptive sampling approach that determines the sampling density through minimizing the Kullback-Leibler (KL) divergence between the theoretically optimal IS density and a chosen parametric family of probability distributions. Typically, the multivariate normal distribution is chosen as parametric distribution family, while recently the Gaussian mixture (GM) has been proposed for handling multimodal failure domains [7, 12]. In this study, we investigate the performance of these two distribution types within the CE method and propose a new updating scheme for estimating the parameters of the GM model through a modified expectation-maximization (EM) algorithm. Moreover, another recently proposed approach, called sequential importance sampling (SIS) [6], which also makes use of the Gaussian mixture, is discussed and its performance is compared to that of the CE method.
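For illustration, the estimator of Eq. (5) amounts to only a few lines of code. The following minimal Python sketch is not part of the original paper; the limit state function g and the shifted IS density h used at the bottom are purely hypothetical placeholders:

```python
import numpy as np
from scipy import stats

def is_estimate(g, f_pdf, h, n_samples, seed=0):
    """Plain importance sampling estimate of P_F, cf. Eq. (5).

    g       : limit state function; g(x) <= 0 indicates failure
    f_pdf   : joint input PDF, evaluated row-wise on an (n, dim) array
    h       : frozen scipy distribution used as IS density (provides .rvs and .pdf)
    """
    x = h.rvs(size=n_samples, random_state=seed)       # samples from the IS density h
    indicator = np.array([g(xi) <= 0 for xi in x])     # failure indicator I(x_i)
    weights = f_pdf(x) / h.pdf(x)                      # likelihood ratios f(x_i)/h(x_i)
    return np.mean(indicator * weights)

# Illustrative use: 2-d standard normal input, IS density centred in a presumed failure region
g = lambda x: 3.0 - x[0]                               # hypothetical limit state, not from the paper
f = stats.multivariate_normal(mean=np.zeros(2))
h = stats.multivariate_normal(mean=[3.0, 0.0])
print(is_estimate(g, f.pdf, h, n_samples=10_000))      # close to Phi(-3), i.e. about 1.35e-3
```

With a sampling density shifted towards the failure region, far fewer samples are required than with crude MCS to resolve a probability of this magnitude.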


The structure of the paper is as follows. First, the CE method is described and a general algorithm for its implementation is presented. Afterwards, the multivariate Gaussian and Gaussian mixture distribution types are introduced and their integration in the CE method is described; for integrating the Gaussian mixture model, a novel EM algorithm is proposed. Subsequently, a brief overview of SIS is given. The performance of the different methods is demonstrated by means of numerical examples. Finally, the results are summarized and the conclusions are presented.

2. Cross entropy method

For an efficient and reliable estimation of the failure probability with IS, the sampling density needs to be chosen carefully. In fact, there exists an optimal IS density, whose PDF is [13]:

$$ p^*(x) = \frac{I(x) \cdot f(x)}{\int_{\mathbb{R}^n} I(x) \cdot f(x)\, \mathrm{d}x} \qquad (6) $$

Eq. (6) is the density of the random variables censored at the failure domain; its normalizing constant is the target probability of failure. The IS density of Eq. (6) leads to a variance of the probability estimate of zero. That is, a single sample of this density would lead to the exact value of the probability of failure. However, this density is not applicable in practice, as it requires a priori knowledge of the failure domain and the target failure probability.

Even though Eq. (6) cannot be used directly, one can use samples at the failure domain to identify a near-optimal IS density through fitting a distribution model. The CE method identifies the parameters of a chosen distribution model through minimizing the KL divergence between the sought sampling density and the optimal IS density of Eq. (6). The KL divergence is a measure of the difference between two PDFs, and is defined as [14]

$$ D(p_1(x), p_2(x)) = \mathrm{E}_{p_1}\!\left[\ln\!\left(\frac{p_1(x)}{p_2(x)}\right)\right] = \int_{\mathbb{R}^n} p_1(x) \cdot \ln\!\left(\frac{p_1(x)}{p_2(x)}\right) \mathrm{d}x\,, \qquad (7) $$

where E_{p_1}[·] denotes the expectation with respect to p_1(x). Eq. (7) is also known as the relative cross entropy [15]. In the following, the terms cross entropy and KL divergence are used interchangeably. The expression in Eq. (7) can be divided into two parts:

$$ D(p_1(x), p_2(x)) = \int_{\mathbb{R}^n} p_1(x) \cdot \ln\left(p_1(x)\right) \mathrm{d}x - \int_{\mathbb{R}^n} p_1(x) \cdot \ln\left(p_2(x)\right) \mathrm{d}x \qquad (8) $$

Replacing p_1(x) with the optimal (but unknown) IS density p^*(x) and p_2(x) with a parametric IS density h(x; v), where v is the parameter vector, Eq. (8) can be rewritten to describe the cross entropy between these two PDFs:

$$ D(p^*(x), h(x;v)) = \int_{\mathbb{R}^n} p^*(x) \cdot \ln\left(p^*(x)\right) \mathrm{d}x - \int_{\mathbb{R}^n} p^*(x) \cdot \ln\left(h(x;v)\right) \mathrm{d}x \qquad (9) $$

As the parametric IS density appears in the second term only and p^*(x) is invariant, a minimum of the cross entropy can be found by minimizing only the second part. Substituting Eq. (6) for p^*(x) in Eq. (9) results in the following optimization problem [7]:

$$ \arg\min_v D(p^*(x), h(x;v)) = \arg\max_v \int_{\mathbb{R}^n} I(x) \cdot f(x) \cdot \ln\left(h(x;v)\right) \mathrm{d}x \qquad (10) $$

The IS density h(x; v) found by the minimization of D(p^*(x), h(x; v)) is termed near-optimal IS density. For the efficient evaluation of Eq. (10), an alternative sampling density h(x; w) is defined, which is of the same type as the near-optimal sampling density h(x; v). The optimization program is modified accordingly:

$$ \arg\min_v D(p^*(x), h(x;v)) = \arg\max_v \int_{\mathbb{R}^n} I(x) \cdot \ln\left(h(x;v)\right) \cdot W(x;w) \cdot h(x;w)\, \mathrm{d}x = \arg\max_v \mathrm{E}_w\!\left[I(x) \cdot \ln\left(h(x;v)\right) \cdot W(x;w)\right] \qquad (11) $$

In this expression, E_w[·] denotes the mathematical expectation with respect to the alternative sampling density h(x; w), while W(x; w) is the likelihood ratio of the original sampling PDF to the alternative sampling PDF:

$$ W(x;w) = \frac{f(x)}{h(x;w)} \qquad (12) $$

The expectation in Eq. (11) is approximated via IS. With x_i referring to the i-th sample drawn from the density h(x; w), i = 1, . . . , n_s, the solution to Eq. (11) is approximated by

$$ \arg\min_v D(p^*(x), h(x;v)) \approx \arg\max_v \frac{1}{n_s} \sum_{i=1}^{n_s} I(x_i) \cdot \ln\left(h(x_i;v)\right) \cdot W(x_i;w) \qquad (13) $$

If Eq. (13) has a global maximum, which is typically the case in structural reliability [4], this maximum can be found by taking the gradient with respect to v and setting the result to zero.

According to Eq. (13), obtaining a reliable estimate of the KL divergence requires that a substantial number of samples from h(x; w) fall in the failure domain. The CE method solves this problem through introducing a series of intermediate failure domains that gradually approach the target failure domain. In this way, the CE optimization problem is solved for the optimal IS density of each intermediate failure domain using samples from the fitted parametric density obtained at the previous step. The l-th intermediate failure domain and corresponding indicator function are defined as follows:

$$ F^{(l)} = \{ x \in \mathbb{R}^n : g(x) \le \xi^{(l)} \}, \quad \xi^{(l)} \ge 0 \qquad (14) $$

$$ I_{\xi^{(l)}}(x) = \begin{cases} 1 & \text{if } g(x) \le \xi^{(l)} \\ 0 & \text{else} \end{cases} \qquad (15) $$

The intermediate failure domain is defined such that ρ · n_s samples are located in this domain, where ρ is a chosen value in the interval ]0, 1[. The corresponding ξ^{(l)} is calculated as the ρ-quantile of the limit state function values of the samples. Thus, the optimization problem of Eq. (13) can be rewritten for the l-th step as

$$ \arg\min_v D(p^{*(l)}(x), h(x;v)) \approx \arg\max_v \frac{1}{n_s} \sum_{i=1}^{n_s} I_{\xi^{(l)}}(x_i) \cdot \ln\left(h(x_i;v)\right) \cdot W\!\left(x_i;w^{(l)}\right), \qquad (16) $$

where p^{*(l)}(x) is the theoretically optimal IS density corresponding to the l-th intermediate failure domain. The solution of Eq. (16) then provides a density which approximates this intermediate failure domain. In the next step, n_s samples are drawn from this density and the procedure is repeated until at least ρ · n_s samples fall in the actual failure domain. Starting from an initial sampling density, it is possible to obtain a near-optimal IS density within a number of steps of sampling and parameter updating.

The general CE algorithm for structural reliability applications can be summarized as follows [11]:

1. Initializing:

• Set l = 1.

• Choose a distribution family and initial parameters w^{(l)} for the sampling density h(x; w^{(l)}).

• Select the sample size n_s and the sample quantile ρ.

2. Sampling:

• Generate n_s random samples of h(x; w^{(l)}).

• Evaluate the limit state function g(x) for the samples.

• Determine the ρ-quantile of the limit state function values, denoted by ξ^{(l)}.

• If ξ^{(l)} ≤ 0, set ξ^{(l)} = 0 and proceed with step 4.

3. Parameter updating:

• Find the solution v^{(l)} to the optimization program of Eq. (16).

• Set w^{(l+1)} = v^{(l)}.

• Set l = l + 1 and return to step 2.

4. Estimating the probability of failure:

$$ P_F = \frac{1}{n_s} \sum_{i=1}^{n_s} I(x_i) \cdot W\!\left(x_i; w^{(l)}\right) \qquad (17) $$
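The adaptive level choice in step 2 is simply an empirical quantile of the limit state values combined with the stopping check; a small illustrative helper (names and defaults are assumptions, not prescribed by the paper) could look as follows:

```python
import numpy as np

def intermediate_threshold(g_values, rho=0.1):
    """Intermediate level xi^(l) of step 2 and a convergence flag, cf. Eq. (14).

    g_values : limit state function values of the current samples
    rho      : quantile parameter of the CE method (e.g. 0.1, as used in Section 6)
    """
    xi = np.quantile(g_values, rho)    # rho-quantile of the limit state values
    if xi <= 0.0:                      # enough samples already lie in the true failure domain
        return 0.0, True               # proceed with step 4
    return xi, False                   # continue with the parameter updating of step 3
```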

3. CE method with single Gaussian densities

The classical choice for the distribution family within the CE method is the multivariate normal distribution, referred to as single Gaussian (SG) density. It is fully described by its first two moments, the mean vector μ and the covariance matrix Σ. Hence, the unknown parameter vector in the optimization problem of the CE method is v = [μ; Σ]. As the SG belongs to the exponential family, closed-form updating rules can be derived from Eq. (16) [14].

3.1. Updating rules for SG

Inserting the SG density into Eq. (16), taking the gradient of the objective function with respect to v and setting this to zero, one gets:

$$ \frac{1}{n_s} \sum_{i=1}^{n_s} I_{\xi^{(l)}}(x_i) \cdot \nabla_v \ln\left(\mathcal{N}(x_i;\mu,\Sigma)\right) \cdot W\!\left(x_i;w^{(l)}\right) = 0 \qquad (18) $$

Here, N(x_i; μ, Σ) describes the SG PDF with parameters μ and Σ. From Eq. (18), one obtains the following parameter updates:

$$ \mu^{(l+1)} = \frac{\sum_{i=1}^{n_s} I_{\xi^{(l)}}(x_i) \cdot W\!\left(x_i;w^{(l)}\right) \cdot x_i}{\sum_{i=1}^{n_s} I_{\xi^{(l)}}(x_i) \cdot W\!\left(x_i;w^{(l)}\right)} \qquad (19) $$

$$ \Sigma^{(l+1)} = \frac{\sum_{i=1}^{n_s} I_{\xi^{(l)}}(x_i) \cdot W\!\left(x_i;w^{(l)}\right) \cdot \left(x_i - \mu^{(l+1)}\right) \cdot \left(x_i - \mu^{(l+1)}\right)^{\mathrm{T}}}{\sum_{i=1}^{n_s} I_{\xi^{(l)}}(x_i) \cdot W\!\left(x_i;w^{(l)}\right)} \qquad (20) $$


3.2. Initialization

The implemented algorithm follows the general CE algorithm scheme presented in Section 2. A point of special interest when using an adaptive IS algorithm is the choice of the initial sampling parameters. One possibility would be to perform a FORM analysis to identify the most likely failure point and center the sampling density at this point, e.g. [16]. However, this approach concentrates on the most important failure domain while other domains of interest may be neglected. Moreover, it requires the solution of an optimization problem, which could potentially increase the computational cost significantly. For this reason, we choose the initial sampling density as the original density of the random variables, i.e. h(x; w^{(1)}) = f(x). In this case, the first step of the CE algorithm becomes equivalent to performing a MCS on a system with increased failure probability.
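Combining the general algorithm of Section 2 with the closed-form updates of Eqs. (19) and (20) gives a compact procedure. The following Python sketch is illustrative only: it assumes independent standard normal input variables (as in the examples of Section 6), and the sample size, ρ and the cap on the number of levels are assumed defaults rather than values prescribed by the paper.

```python
import numpy as np
from scipy import stats

def ce_single_gaussian(g, dim, n_samples=1000, rho=0.1, max_levels=50, seed=0):
    """CE-based importance sampling with a single Gaussian density.

    Assumes independent standard normal input, so the initial sampling
    density equals the input density f (cf. Section 3.2).
    """
    rng = np.random.default_rng(seed)
    f = stats.multivariate_normal(mean=np.zeros(dim), cov=np.eye(dim))
    mu, cov = np.zeros(dim), np.eye(dim)              # h(x; w^(1)) = f(x)
    for level in range(1, max_levels + 1):
        h = stats.multivariate_normal(mean=mu, cov=cov)
        x = h.rvs(size=n_samples, random_state=rng)
        g_val = np.array([g(xi) for xi in x])
        xi_l = max(np.quantile(g_val, rho), 0.0)      # intermediate threshold, Eq. (14)
        w = f.pdf(x) / h.pdf(x)                       # likelihood ratios, Eq. (12)
        ind = (g_val <= xi_l).astype(float)           # indicator of the (intermediate) domain
        if xi_l == 0.0:                               # target failure domain reached
            return np.mean(ind * w), level            # failure probability estimate, Eq. (17)
        iw = ind * w
        mu = (iw[:, None] * x).sum(axis=0) / iw.sum()                                     # Eq. (19)
        d = x - mu
        cov = (iw[:, None, None] * d[:, :, None] * d[:, None, :]).sum(axis=0) / iw.sum()  # Eq. (20)
    return np.nan, max_levels                         # failure domain not reached
```

The routine returns the estimate of Eq. (17) together with the number of levels that were needed to reach the failure domain; it can be applied directly to the standard normal examples of Section 6.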

4. CE method with Gaussian mixtures

This section introduces the Gaussian mixture (GM) as parametric model within the CE method, which was recently proposed for handling multimodal failure domains [7]. The PDF of the GM model is defined as the sum of a number of Gaussian PDFs, each of them multiplied by a weighting factor:

$$ h_{\mathrm{GM}}(x;v) = \sum_{k=1}^{K} \pi_k \cdot \mathcal{N}(x;\mu_k,\Sigma_k) \qquad (21) $$

The GM distribution has a valid cumulative distribution function and PDF, provided that $\sum_{k=1}^{K} \pi_k = 1$.
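Eq. (21) can be evaluated directly with standard routines; a minimal sketch, assuming the component weights, mean vectors and covariance matrices are given:

```python
import numpy as np
from scipy import stats

def gm_pdf(x, pis, mus, covs):
    """Gaussian mixture PDF of Eq. (21): sum_k pi_k * N(x; mu_k, Sigma_k)."""
    return sum(pi * stats.multivariate_normal.pdf(x, mean=mu, cov=cov)
               for pi, mu, cov in zip(pis, mus, covs))
```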

The unknown parameters of the CE optimization problem in this case are v = [π_k; μ_k; Σ_k, k = 1, . . . , K]. This optimization problem is solved by substituting the GM PDF into Eq. (16), taking the gradient of the objective function with respect to v and setting it to zero:

$$ \frac{1}{n_s} \sum_{i=1}^{n_s} I_{\xi^{(l)}}(x_i) \cdot W\!\left(x_i;w^{(l)}\right) \cdot \nabla_v \ln\left(\sum_{k=1}^{K} \pi_k \cdot \mathcal{N}(x_i;\mu_k,\Sigma_k)\right) = 0 \qquad (22) $$

The gradients of the logarithm of the GM PDF in Eq. (22) with respect to the different components of v are:

$$ \frac{\partial \ln\left(h_{\mathrm{GM}}(x_i;v)\right)}{\partial \pi_k} = \gamma_{i,k} \cdot \frac{1}{\pi_k}, \quad k = 1,\dots,K \qquad (23) $$

$$ \frac{\partial \ln\left(h_{\mathrm{GM}}(x_i;v)\right)}{\partial \mu_k} = \gamma_{i,k} \cdot \Sigma_k^{-1} \cdot (x_i - \mu_k), \quad k = 1,\dots,K \qquad (24) $$

$$ \frac{\partial \ln\left(h_{\mathrm{GM}}(x_i;v)\right)}{\partial \Sigma_k} = \gamma_{i,k} \cdot \frac{1}{2}\left(\Sigma_k^{-1} \cdot (x_i - \mu_k) \cdot (x_i - \mu_k)^{\mathrm{T}} \cdot \Sigma_k^{-1} - \Sigma_k^{-1}\right), \quad k = 1,\dots,K \qquad (25) $$

Herein, γ_{i,k} is the probability of sample i belonging to distribution k and is defined as follows:

$$ \gamma_{i,k} = \frac{\pi_k \cdot \mathcal{N}(x_i;\mu_k,\Sigma_k)}{\sum_{k=1}^{K} \pi_k \cdot \mathcal{N}(x_i;\mu_k,\Sigma_k)} \qquad (26) $$

It holds that $\sum_{k=1}^{K} \gamma_{i,k} = 1$. The term γ_{i,k} appears in each of the three partial derivatives. As the parameters for which Eq. (22) is to be solved are part of this term, closed-form updating rules cannot be obtained. The term equals 1 for K = 1, which verifies the results from the previous section, where closed-form updating rules were derived for a single Gaussian distribution. An approximation of γ_{i,k} in step l can be found based on the sampling density of the previous step, i.e. [7]

$$ \gamma_{i,k}^{(l)} \approx \frac{\pi_k^{(l-1)} \cdot \mathcal{N}\!\left(x_i;\mu_k^{(l-1)},\Sigma_k^{(l-1)}\right)}{\sum_{k=1}^{K} \pi_k^{(l-1)} \cdot \mathcal{N}\!\left(x_i;\mu_k^{(l-1)},\Sigma_k^{(l-1)}\right)} \qquad (27) $$

If K does not change throughout the sampling steps of the CE algorithm, this approximation may be sufficient. However, in order to keep the algorithm flexible and insensitive to different initial parameters, an efficient procedure to obtain the maximum likelihood estimates (MLE) of the parameters is derived in the following for the parameter update of the GM distribution.

4.1. Maximum likelihood estimation of distribution parameters

The available samples are denoted as x within this section, with their corresponding vector of importance weights denoted by W. The parameter vector to be fitted is denoted by v. In general, the MLE estimate of v is defined as follows:

$$ v_{\mathrm{MLE}} = \arg\max_v h_W(x \mid v) \qquad (28) $$

As the samples x_i, i = 1, . . . , n_s are independent, the joint likelihood h_W(x | v) can be defined as the product of the likelihoods of the samples raised to the power of the corresponding weight W(x_i), which implies W(x_i) occurrences of x_i, i = 1, . . . , n_s:

$$ h_W(x \mid v) = \prod_{i=1}^{n_s} h(x_i \mid v)^{W(x_i)} \qquad (29) $$

Since it is often easier to maximize the log-likelihood rather than the likelihood, the MLE estimate can be obtained by solving the following optimization program:

$$ v_{\mathrm{MLE}} = \arg\max_v \ln\left(h_W(x \mid v)\right) \qquad (30) $$

This does not change the solution due to the monotonicity of the logarithm. The logarithm of the joint likelihood is given by

$$ \ln\left(h_W(x \mid v)\right) = \sum_{i=1}^{n_s} W(x_i) \cdot \ln\left(h(x_i \mid v)\right) \qquad (31) $$

Comparing Eqs. (30) and (31) with Eq. (13) shows that obtaining the MLE estimate of the parameters of a parametric density with pre-selected samples according to the indicator function used in the CE method is equivalent to solving the CE optimization problem. This equivalence has also been discussed in [4].

An efficient way to compute maximum likelihood estimates of parameters of mixture distributions is the expectation-maximization (EM) algorithm [17]. Originally, the EM algorithm was developed for unweighted samples. By including the weights of the samples in the maximum likelihood estimation, it is possible to adapt the algorithm such that it can be used to fit an IS density within the CE method. In the following, a general EM updating scheme for weighted samples is derived and adapted to fitting the GM distribution.

To obtain the MLE estimate for a mixture distribution directly, not only x and the corresponding weights W are required, but in addition the assignment of the samples to the distributions in the mixture, e.g. as defined in Eq. (26) [18]. If only x and W are available, the EM method treats this assignment as missing data and uses an iterative procedure to approximate the MLE estimate. In each step of the iteration, a guess for the missing data is used to maximize the expected log-likelihood of the full data set Z, consisting of the available and the missing data.

For a given value v̂ of the parameter vector, the expected log-likelihood (or Q-function) is defined as [18]:

$$ Q(v \mid \hat{v}) = \mathrm{E}_{Z \mid x, \hat{v}}\!\left[\ln\left(h_W(z \mid v)\right)\right] = \int_{Z(x)} \ln\left(h_W(z \mid v)\right) \cdot h_W(z \mid x, \hat{v})\, \mathrm{d}z, \qquad (32) $$


where Z(x) denotes the support of Z conditioned on x. Based on an initial guess for the missing data, the EM algorithm proceeds with two steps. The maximization step (M-step) provides a new parameter vector by maximizing the expected log-likelihood of Z. Afterwards, in the expectation step (E-step), the expected log-likelihood is calculated following Eq. (32), based on the updated parameter vector. This procedure is repeated until convergence is achieved.

In the next section we discuss application of the EM algorithm for obtaining the parameters of the GM model in the CE method with n_ρ = ρ · n_s available samples.

4.2. EM algorithm for the GM model

For each of the K distributions in the GM, three quantities have to be estimated, namely the mean vector μ_k, the covariance matrix Σ_k and the weighting factor π_k. As mentioned above, the missing data describes the assignment of the samples to the distributions in the mixture. In step m of the EM algorithm, this assignment can be approximated by a probabilistic assignment of the samples to the distributions using the parameter estimates [π_k^{(m)}; μ_k^{(m)}; Σ_k^{(m)}, k = 1, . . . , K]. In this way, an n_ρ × K matrix is obtained with entries calculated as follows:

$$ \gamma_{i,k}^{(m)} = \frac{\pi_k^{(m)} \cdot \mathcal{N}\!\left(x_i;\mu_k^{(m)},\Sigma_k^{(m)}\right)}{\sum_{k=1}^{K} \pi_k^{(m)} \cdot \mathcal{N}\!\left(x_i;\mu_k^{(m)},\Sigma_k^{(m)}\right)}, \quad i = 1,\dots,n_\rho, \quad k = 1,\dots,K \qquad (33) $$

Inserting Eq. (33) into Eq. (32) the following expression for the Q-function in step m can be derived [18]:

$$ Q\!\left(v \mid v^{(m)}\right) = \sum_{i=1}^{n_\rho} \sum_{k=1}^{K} W(x_i) \cdot \gamma_{i,k}^{(m)} \cdot \ln\left(\pi_k \cdot \mathcal{N}(x_i;\mu_k,\Sigma_k)\right), \qquad (34) $$

which completes the E-step of the algorithm. The logarithm in Eq. (34) reads as follows:

$$ \ln\left(\pi_k \cdot \mathcal{N}(x_i;\mu_k,\Sigma_k)\right) = \ln(\pi_k) - \frac{1}{2}\ln\left(|\Sigma_k|\right) - \frac{1}{2}(x_i - \mu_k)^{\mathrm{T}} \cdot \Sigma_k^{-1} \cdot (x_i - \mu_k) + C \qquad (35) $$

The constant C can be dropped, as the equation is part of an optimization program. Defining

$$ n_k^{(m)} = \sum_{i=1}^{n_\rho} W(x_i) \cdot \gamma_{i,k}^{(m)} \qquad (36) $$

allows rewriting Eq. (34) as

$$ Q\!\left(v \mid v^{(m)}\right) = \sum_{k=1}^{K} n_k^{(m)} \cdot \left(\ln(\pi_k) - \frac{1}{2}\ln\left(|\Sigma_k|\right)\right) - \frac{1}{2} \sum_{i=1}^{n_\rho} \sum_{k=1}^{K} W(x_i) \cdot \gamma_{i,k}^{(m)} \cdot (x_i - \mu_k)^{\mathrm{T}} \cdot \Sigma_k^{-1} \cdot (x_i - \mu_k) \qquad (37) $$


The optimization program, which is to be solved within the M-step of the algorithm, can be expressed as follows:

$$ \begin{aligned} \underset{v}{\text{maximize}} \quad & Q\!\left(v \mid v^{(m)}\right) \\ \text{subject to} \quad & \sum_{k=1}^{K} \pi_k = 1, \quad \pi_k \ge 0, \quad k = 1,\dots,K, \\ & \Sigma_k \succ 0, \quad k = 1,\dots,K, \end{aligned} \qquad (38) $$

where the latter condition ensures that the covariance matrix is positive definite. By means of the method of Lagrange multipliers, the following updating rules for each of the K distributions in the mixture can be found [18]:

$$ \pi_k^{(m+1)} = \frac{n_k^{(m)}}{\sum_{k=1}^{K} n_k^{(m)}} \qquad (39) $$

$$ \mu_k^{(m+1)} = \frac{\sum_{i=1}^{n_\rho} W(x_i) \cdot \gamma_{i,k}^{(m)} \cdot x_i}{n_k^{(m)}} \qquad (40) $$

$$ \Sigma_k^{(m+1)} = \frac{\sum_{i=1}^{n_\rho} W(x_i) \cdot \gamma_{i,k}^{(m)} \cdot \left(x_i - \mu_k^{(m+1)}\right) \cdot \left(x_i - \mu_k^{(m+1)}\right)^{\mathrm{T}}}{n_k^{(m)}}, \quad k = 1,\dots,K \qquad (41) $$

With these updating rules, the M-step of the algorithm can be performed, providing a new parameter guess for the subsequent E-step.

The EM algorithm to fit a GM model based on weighted samples can be summarized as follows:

1. Initialization

• Set m = 1.

• Choose the number of distributions in the mixture K and initial values for γ_{i,k}^{(m)}, i = 1, . . . , n_ρ, k = 1, . . . , K.

• Set L^{(m)} = −∞ for the convergence check.

2. M-step

• Calculate

$$ n_k^{(m)} = \sum_{i=1}^{n_\rho} W(x_i) \cdot \gamma_{i,k}^{(m)}, \quad k = 1,\dots,K $$

• Update the distribution weights:

$$ \pi_k^{(m+1)} = \frac{n_k^{(m)}}{\sum_{k=1}^{K} n_k^{(m)}}, \quad k = 1,\dots,K $$

• Update the mean vectors:

$$ \mu_k^{(m+1)} = \frac{\sum_{i=1}^{n_\rho} W(x_i) \cdot \gamma_{i,k}^{(m)} \cdot x_i}{n_k^{(m)}}, \quad k = 1,\dots,K $$

• Update the covariance matrices:

$$ \Sigma_k^{(m+1)} = \frac{\sum_{i=1}^{n_\rho} W(x_i) \cdot \gamma_{i,k}^{(m)} \cdot \left(x_i - \mu_k^{(m+1)}\right) \cdot \left(x_i - \mu_k^{(m+1)}\right)^{\mathrm{T}}}{n_k^{(m)}}, \quad k = 1,\dots,K $$

3. E-step

• Compute the new estimate of the probabilistic assignment:

$$ \gamma_{i,k}^{(m+1)} = \frac{\pi_k^{(m+1)} \cdot \mathcal{N}\!\left(x_i;\mu_k^{(m+1)},\Sigma_k^{(m+1)}\right)}{\sum_{k=1}^{K} \pi_k^{(m+1)} \cdot \mathcal{N}\!\left(x_i;\mu_k^{(m+1)},\Sigma_k^{(m+1)}\right)}, \quad i = 1,\dots,n_\rho, \quad k = 1,\dots,K $$

• Calculate the log-likelihood:

$$ L^{(m+1)} = \frac{1}{\sum_{i=1}^{n_\rho} W(x_i)} \cdot \sum_{i=1}^{n_\rho} W(x_i) \cdot \ln\left(\sum_{k=1}^{K} \pi_k^{(m+1)} \cdot \mathcal{N}\!\left(x_i;\mu_k^{(m+1)},\Sigma_k^{(m+1)}\right)\right) $$

4. Check convergence

• If |L^{(m+1)} − L^{(m)}| > ε · |L^{(m+1)}|, set m = m + 1 and return to step 2.

• Otherwise convergence is reached. Exit the algorithm.

It is mentioned that the one-step parameter update of the GM model, as proposed in [7] and further applied in [12], is equivalent to a single step of the EM algorithm with an initial guess for the missing data, estimated based on the sampling density of the previous step in the CE method. Hence, the parameter updating scheme of [7] may result in suboptimal parameter estimates.

The EM procedure does not require additional evaluations of the model and hence its contribution to the computational effort is negligible.
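For illustration, the weighted EM updates of Eqs. (33), (36) and (39)-(41) condense into a single routine. The sketch below is an assumed implementation, not the authors' code; the initial responsibilities gamma would come from the clustering step of Section 4.3, and the tolerance eps mirrors the relative convergence check of step 4:

```python
import numpy as np
from scipy import stats

def weighted_em_gm(x, w, gamma, eps=1e-6, max_iter=500):
    """EM algorithm for fitting a Gaussian mixture to weighted samples (Section 4.2).

    x     : (n, dim) samples, w : (n,) importance weights,
    gamma : (n, K) initial probabilistic assignment of the samples to the components
    """
    n, dim = x.shape
    K = gamma.shape[1]
    log_lik = -np.inf
    for _ in range(max_iter):
        # M-step: Eqs. (36) and (39)-(41)
        nk = (w[:, None] * gamma).sum(axis=0)              # n_k^(m)
        pis = nk / nk.sum()                                # mixture weights pi_k
        mus = (w[:, None] * gamma).T @ x / nk[:, None]     # mean vectors mu_k
        covs = np.empty((K, dim, dim))
        for k in range(K):
            d = x - mus[k]
            covs[k] = ((w * gamma[:, k])[:, None] * d).T @ d / nk[k]   # covariance Sigma_k
        # E-step: Eq. (33) and the weighted log-likelihood used for the convergence check
        dens = np.column_stack([pis[k] * stats.multivariate_normal.pdf(x, mean=mus[k], cov=covs[k])
                                for k in range(K)])
        gamma = dens / dens.sum(axis=1, keepdims=True)
        new_log_lik = np.sum(w * np.log(dens.sum(axis=1))) / w.sum()
        if abs(new_log_lik - log_lik) <= eps * abs(new_log_lik):
            break
        log_lik = new_log_lik
    return pis, mus, covs, gamma
```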

4.3. Choice of the number of Gaussians in the GM model

The number of Gaussians in the mixture directly influences the performance of the algorithm. Choosing this number too small will result in neglecting failure domains with less importance, while choosing this number too large gives inaccurate estimates of the parameters for small sample sizes. Subsequent steps of the CE method with poor choices of K amplify these effects, as the samples are drawn from the GM distribution approximated in the previous step.

Several methods exist to obtain a guess for the number of distributions in a mixture model. One such method is the k-means algorithm [19] and the further developed k-means++ algorithm [20]. They can cluster available data and identify the centroids. The quality of the clustering can be measured, e.g. by the so-called silhouette value. However, neither k-means nor k-means++ is able to account for the sample weights. The method proposed in [21], based on the so-called minimum message length (MML) criterion, accounts for the weights of the samples; however, evaluation of the MML criterion is not straightforward and has to be adapted to the particular mixture model.

In this study, we identify the number of distributions in the GM model with the density-based spatial clustering of applications with noise (DBSCAN) algorithm [22]. In a nutshell, this algorithm detects the neighbors of every data point and groups the data into three sets: 1) core points, which have a minimum number of points in their direct neighborhood; 2) border points, which are in the neighborhood of a core point but do not have enough points in their neighborhood to be identified as a core point; and 3) noise, which means that the data point is not near to other points and thus is neglected. A set of data points is identified as a cluster if it consists of a minimum number of points that do not exceed a maximum distance to the next neighbor. Originally, the DBSCAN algorithm was developed for unweighted samples. Here, we modify the algorithm to work with weighted samples. We do so through modifying the criterion used to identify a cluster. Instead of counting the number of points, we evaluate the weighted sum of the points in a cluster, whereby each weight is the corresponding IS weight of the sample. This method is able to detect clusters of arbitrary shape and identify outliers, marking them as noise. An initial guess for the missing data can then be obtained through assigning each sample to the corresponding cluster. Based on this initial pseudo-E-step, the alternating procedure of the EM algorithm can be conducted until the change in the log-likelihood is sufficiently small.

To reduce the risk of neglecting failure domains with small probability mass, not only the n_ρ samples with the smallest limit state function values are taken into account. Instead, an adaptive sample range is chosen for the DBSCAN algorithm. By choosing the samples between the 0.05- and the 0.25-quantiles of the limit state function values in the first step, the samples are selected according to the shape of the failure domain and the effect of possible outliers is reduced. In the subsequent steps, the samples in the 0.2-quantile of the limit state function values are selected, since it is assumed that the effect of outliers is reduced due to the distribution fitting by the EM algorithm and the applied IS weights.
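In practice, the weighted clustering step can be built on an existing DBSCAN implementation that accepts sample weights; the sketch below uses scikit-learn, where eps and min_samples are illustrative tuning parameters and not values prescribed by the paper:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def estimate_number_of_gaussians(x, w, eps=0.5, min_samples=5):
    """Estimate K and an initial hard assignment from weighted samples (Section 4.3).

    x : (n, dim) selected samples, w : (n,) corresponding IS weights.
    With sample_weight passed to fit, min_samples acts on the summed weight in a
    neighbourhood, so heavily weighted samples count as several unweighted ones.
    """
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit(x, sample_weight=w).labels_
    clusters = sorted(set(labels) - {-1})          # label -1 marks points classified as noise
    K = max(len(clusters), 1)                      # fall back to a single Gaussian
    gamma = np.zeros((len(x), K))
    for j, c in enumerate(clusters):
        gamma[labels == c, j] = 1.0                # hard initial assignment for the EM algorithm
    gamma[labels == -1] = 1.0 / K                  # spread noise points over all components
    return K, gamma
```

The hard assignment returned here plays the role of the initial pseudo-E-step mentioned above.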

4.4. Initialization

The CE algorithm outlined in Section 2 is directly applicable to the case with the GM distribution. For the parameter update, the EM algorithm is applied with n_ρ samples located at the intermediate failure domain. The choice of the initial sampling density is an important question. One possibility is to identify the initial sampling density through a pre-sampling step. Kurtz and Song [7] propose to run the CE method with a SG density employing a small sample size and thereby locate the radius of a hypersphere that intersects the failure domain. The mean vectors of the GM initial sampling density are uniformly sampled on the surface of this hypersphere, while the distribution weights are chosen as uniform weights and the covariance matrices as identity matrices [7]. However, this method requires additional computational effort and the number of distributions in the mixture has to be chosen a priori. Moreover, as discussed in [12], the sampled mean vectors from this procedure might lead to a biased estimate, as they might cluster near one of the important regions and not capture all of the failure domains. In [12], it is suggested to obtain the mean vectors through Latin hypercube sampling from the whole outcome space. Although this method is superior to the previous one regarding the computational effort, as it does not require additional evaluations of the limit state function, it still requires an initial choice of the number of distributions in the mixture. Here, we choose the initial sampling density as the original density of the random variables. This results in an unweighted initial sampling step and does not require defining the number of distributions in the mixture a priori, which is estimated by the DBSCAN algorithm as discussed in Section 4.3.

5. SIS with Gaussian mixture

SIS is an adaptive sampling method that was recently proposed for efficient estimation of the failure probability of engineering systems [6]. In SIS, it is proposed to make use of a GM model that is fitted by the EM algorithm for weighted samples, as presented in Section 4.2. Hence, a direct comparison of the performance of SIS to the CE method can be drawn within this study. In this section, a brief introduction to the principle of SIS is given, containing the most important ideas and their incorporation in the SIS algorithm. Note that in [6] the method is presented for application in an equivalent standard normal space. Here, we present the method for general application, i.e. without the need to transform the random variable space.

The basic idea of SIS is to adaptively sample a sequence of distributions that gradually approximate the optimal IS density p*(x) of Eq. (6). An approximation of p*(x) can be obtained through inserting the following smooth approximation of the indicator function in Eq. (6):

$$ I(x) \approx \Phi\!\left(\frac{-g(x)}{\sigma_L}\right) \qquad (42) $$


This approximation becomes more accurate for a decreasing value of σ_L. Accordingly, a sequence of densities h_l(x), l = 1, . . . , L is defined as

$$ h_l(x) = \frac{1}{P_l} \cdot \Phi\!\left(\frac{-g(x)}{\sigma_l}\right) \cdot f(x) = \frac{\eta_l(x)}{P_l} \qquad (43) $$

with ∞ = σ_1 > σ_2 > . . . > σ_L. The normalizing constant P_l of the density h_l can be expressed as [6]:

$$ P_l = \int \eta_l(x)\, \mathrm{d}x = P_{l-1} \int \frac{\eta_l(x)}{\eta_{l-1}(x)} \cdot h_{l-1}(x)\, \mathrm{d}x = P_{l-1} \cdot \mathrm{E}_{h_{l-1}}\!\left[w_l(x)\right] \qquad (44) $$

where $w_l(x) = \frac{\eta_l(x)}{\eta_{l-1}(x)}$. Hence, an estimate of the ratio of subsequent normalizing constants can be derived through IS with samples distributed according to h_{l-1} as

$$ S_l = \frac{P_l}{P_{l-1}} = \frac{1}{n_s} \sum_{i=1}^{n_s} w_l(x_i) \qquad (45) $$

The parameters σ_l, l = 2, . . . , L are chosen adaptively to ensure that two subsequent distributions do not vary significantly [6]. Sampling from the sequence of Eq. (43) can be performed through a resample-move scheme starting from samples from h_1(x) = f(x). In each step, samples from h_{l-1}(x) are weighted according to the next distribution in the sequence through employing the likelihood ratio w_l(x). The resulting weighted samples are resampled to obtain unweighted samples from h_l(x) and moved to regions of high probability density of h_l(x) through Markov chain Monte Carlo (MCMC) sampling. To achieve the latter, it is proposed in [6] to apply an independent Metropolis-Hastings (MH) algorithm, which is demonstrated to perform well in low to moderate dimensions. The probability of failure is estimated with the following expression:

$$ P_F = \frac{P_L}{n_s} \sum_{i=1}^{n_s} I(x_i) \cdot \frac{f(x_i)}{\eta_L(x_i)}, \qquad (46) $$

where

$$ P_L = \prod_{l=1}^{L} S_l \qquad (47) $$

For further details on the implementation of the method, the reader is referred to [6].

The proposal distribution in the independent MH algorithm used in the MCMC step of the SIS algorithm is chosen in [6] as the GM model fitted to the samples from h_{l-1}(x) weighted by w_l(x). The fitting is performed by application of the EM algorithm for weighted samples as described in Section 4.2. However, in contrast to the CE method, which only employs n_ρ = ρ · n_s samples, SIS accounts for all n_s available samples in the parameter update, weighted according to the next distribution in the sequence. In [6], the number of densities in the mixture is chosen a priori. To increase the flexibility and to make the results comparable to those of the CE method, here the number of densities is chosen through application of the DBSCAN algorithm. In the first step of the SIS algorithm, the samples featuring weights between the 0.7-quantile and the 0.95-quantile of the importance weight values are selected for the clustering process, while in the following steps the n_s/2 samples with the highest importance weight values are used. Based on the cluster centers, an initial guess of the missing data is obtained.
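For illustration, a single tempering step of Eqs. (42)-(45) reduces to reweighting the current samples with the ratio of two smoothed indicators. The following sketch covers only this reweighting; the adaptive choice of σ_l, the resampling and the MCMC move step of [6] are omitted:

```python
import numpy as np
from scipy import stats

def sis_increment(g_val, sigma_prev, sigma_new):
    """One tempering step of SIS: incremental weights w_l(x_i) and the ratio S_l = P_l/P_{l-1}.

    g_val      : limit state values of samples distributed according to h_{l-1}
    sigma_prev : previous smoothing parameter (sigma_1 = np.inf in the first step)
    sigma_new  : new, smaller smoothing parameter (chosen adaptively in [6])
    """
    g_val = np.asarray(g_val, dtype=float)
    num = stats.norm.cdf(-g_val / sigma_new)       # Phi(-g(x)/sigma_l), Eq. (42)
    den = stats.norm.cdf(-g_val / sigma_prev)      # equals 0.5 everywhere for sigma_prev = inf
    w = num / den                                  # incremental weights w_l(x_i), Eq. (44)
    return w, float(np.mean(w))                    # estimate S_l of Eq. (45)
```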

6. Numerical investigations

We investigate the performance of the CE method with both the SG density and the GM distribution by means of several numerical examples and compare it to the performance of SIS. The effect of varying sample size is analyzed and the performance of the methods in increasing dimensions is assessed. It is noted that in the field of structural reliability the main computational effort lies in the model evaluation. Hence, the total number of samples, i.e. the number of limit state function evaluations, is chosen as a measure for the computational cost.

All selected examples are problems in the independent standard normal space. Hence, all random variables x_1, . . . , x_n have the standard Gaussian distribution. The first example is a parabolic component reliability problem in two dimensions, while the second one is a series system problem with four components. The third example consists of a linear limit state function and is well suited to demonstrate the effect of increasing dimensions. The final example combines a linear limit state surface with a convex one on opposite sides of the origin. The reference values for the probability of failure for examples 6.1, 6.2 and 6.4 are obtained by MCS with 10^9 samples; for example 6.3, an analytical expression for the failure probability is available.

In the CE method, ρ is set to 0.1, which results in n_ρ = 0.1 · n_s available samples for the parameter update. In the MCMC algorithm within SIS, each Markov chain has length 10, with n_s/10 chains being run in each level. We measure the performance of the methods in terms of the mean E[P_F] and the coefficient of variation δ_PF of the estimate of the probability of failure, the average number of sampling steps required to reach the failure domain l, the average total number of samples N_tot (limit state function evaluations) and the average number of distributions in the GM in the final step k_fin. These performance measures are estimated from 500 independent simulation runs.

Table 1: Results for the concave limit state function with a reference probability of failure of 3.01 · 10^−3. Comparison of the CE method using SG, the CE method using GM and SIS using GM for a varying number of samples.

Number of samples per level    E[P_F]          δ_PF    l       N_tot          k_fin

CE method using SG
250                            2.99 · 10^−3    0.40    3.20     0.80 · 10^3   1
500                            3.01 · 10^−3    0.18    3.09     1.55 · 10^3   1
1000                           3.01 · 10^−3    0.11    3.02     3.02 · 10^3   1
2000                           3.00 · 10^−3    0.07    3.00     6.00 · 10^3   1
4000                           3.01 · 10^−3    0.05    3.00    12.00 · 10^3   1

CE method using GM
250                            1.53 · 10^−3    0.72    4.85     1.21 · 10^3   2.55
500                            2.67 · 10^−3    0.33    3.51     1.76 · 10^3   2.30
1000                           2.94 · 10^−3    0.10    3.17     3.17 · 10^3   2.19
2000                           3.00 · 10^−3    0.04    3.08     6.15 · 10^3   2.12
4000                           3.02 · 10^−3    0.04    3.06    12.23 · 10^3   2.03

SIS using GM
250                            1.67 · 10^−3    0.59    4.08     1.02 · 10^3   3.09
500                            2.66 · 10^−3    0.21    3.98     1.99 · 10^3   2.39
1000                           2.93 · 10^−3    0.12    3.98     3.98 · 10^3   2.29
2000                           2.98 · 10^−3    0.07    4.00     8.00 · 10^3   2.17
4000                           3.00 · 10^−3    0.05    4.00    16.00 · 10^3   2.15

6.1. Concave limit state function

The limit state function of the first example has a parabolic shape, concave to the origin of the standard normal space, and has two design points, as shown in Figure 1. It is defined as [23]:

$$ g_1(x) = 5 - x_2 - 0.5\,(x_1 - 0.1)^2 \qquad (48) $$

Figure 1: Optimal IS density for the concave limit state function with two design points.

The reference value for the probability of failure is 3.01 · 10^−3. The simulation results for the CE method using SG, the CE method using GM and SIS using GM are shown in Table 1. While using the SG in the CE method results in a practically unbiased estimate for any sample size, both methods using a GM show a significant negative bias for small sample sizes. In terms of the coefficient of variation of the estimate δ_PF, the CE method with the SG is also superior for sample sizes lower than 1000 per level. By increasing the sample size, this performance gap can be closed and all methods provide unbiased estimates with a comparable δ_PF of around 0.05 with 4000 samples per level. For increasing sample sizes, the average number of sampling steps l stabilizes for all methods. The CE method converges in three steps if the sample size is sufficiently large, while SIS needs an additional step on average. This results in a larger computational effort, which is illustrated by the total number of samples N_tot.

For small sample sizes, the DBSCAN algorithm cannot identify the two design points properly, but overestimates the required number of distributions in the mixture on average. The EM algorithm is not able to correct this initial guess and fits the GM model with K distributions as proposed by the DBSCAN algorithm. As the sample size increases, a correct identification of the important regions is achieved for all methods.

In addition, the performance of the CE method using GM with different choices of K and a fixed sample size of 1000 samples per level is investigated for this example. For this investigation, the initial value of K is fixed throughout the simulation, i.e. the DBSCAN algorithm is not applied to identify K. The initial guess for the probabilistic assignment of the samples to the distributions in the EM algorithm is obtained by randomly choosing K samples from the available samples at each level and setting them as the initial mean vectors. The results are summarized in Table 2. It appears that the performance of the CE method strongly depends on the choice of K. For an increasing K, both the negative bias and δ_PF increase. The reason for this behavior is that as K increases, a smaller number of samples is available for estimating the parameters of each distribution. Even in the case where two distributions are chosen, the method performs worse than when the adaptive mixture is used by application of the DBSCAN algorithm; the latter converges faster, which implies a smaller computational effort, and results in a smaller bias and δ_PF of the estimate.

Table 2: Results for the concave limit state function with a reference probability of failure of 3.01 · 10^−3. Comparison of the CE method using GM with n_s = 1000 per level and the DBSCAN algorithm to the CE method using GM with n_s = 1000 per level and different specified numbers of distributions in the mixture.

CE method using GM

Number of distributions in the mixture    E[P_F]          δ_PF    l      N_tot
DBSCAN                                    2.94 · 10^−3    0.10    3.17   3.17 · 10^3
2                                         2.77 · 10^−3    0.19    4.00   4.00 · 10^3
3                                         2.70 · 10^−3    0.35    4.07   4.07 · 10^3
5                                         2.44 · 10^−3    0.43    4.11   4.11 · 10^3
10                                        1.77 · 10^−3    1.03    4.24   4.24 · 10^3


Figure 2: (a) Optimal IS density for the series system problem with four distinct failure domains; (b)-(d) samples of the approximate optimal IS density with n_s = 1000 obtained from the CE method using SG (b), the CE method using GM (c) and SIS using GM (d).

6.2. Series system problem

This example consists of a series system reliability problem with four components. The limit state function is defined by [24]:

$$ g_2(x) = \min \left\{ \begin{array}{l} 0.1\,(x_1 - x_2)^2 - (x_1 + x_2)/\sqrt{2} + 3 \\ 0.1\,(x_1 - x_2)^2 + (x_1 + x_2)/\sqrt{2} + 3 \\ x_1 - x_2 + 7/\sqrt{2} \\ x_2 - x_1 + 7/\sqrt{2} \end{array} \right\} \qquad (49) $$

Two of the components feature a parabolic limit state surface convex to the origin, while the remaining two components are of the linear type. Altogether, this results in four distinct failure domains.

The reference value for the probability of failure is 2.22 · 10^−3. For the case with n_s = 1000, the CE method using SG results in an estimate of 2.17 · 10^−3, producing a small negative bias, while δ_PF is 0.21. Using the GM in the CE method results in a significantly larger negative bias and a larger δ_PF; the average estimate of the probability of failure is 1.45 · 10^−3 and δ_PF is 0.30. SIS results in an average estimate of 1.82 · 10^−3 with a δ_PF of 0.16. In Figure 2, the samples of the final step of an exemplary simulation run are illustrated for the three methods. The CE method using SG results in widely spread samples covering not only the failure regions but also the safe domain, while the methods that employ the GM model concentrate on the failure regions. Apparently, the CE method using GM is not able to detect all failure domains; it only covers the two most important ones, which results in a strongly biased estimate for the failure probability. In contrast, SIS using GM successfully identifies the failure regions and generates samples in each of the distinct domains.

Figure 3: Estimate of the failure probability for the series system problem against the number of samples per level.

Figure 4: δ_PF for the series system problem against the number of samples per level.

We now investigate the effect of varying sample size. Figure 3 shows the average estimate for an increasing number of samples per sampling step n_s. It is shown that the CE method using SG provides an unbiased estimate for a relatively low number of samples, whereas the CE method using GM features a significant negative bias even for relatively large sample sizes. SIS generates a negative bias for small sample sizes but converges to an unbiased estimate faster than the CE method using GM. For large sample sizes, all three methods provide practically unbiased estimates. In terms of δ_PF, which is illustrated in Figure 4, the CE method using SG and SIS have similar performance, while the CE method using GM results in a larger δ_PF for low and moderate sample sizes. Only when using a large number of samples can this performance gap be closed. While the CE method using SG converges within approximately three steps for n_s ≥ 1000, SIS always converges in four steps if n_s is larger than or equal to 500. The CE method using GM needs more steps than SIS to converge for small numbers of samples, but if n_s is chosen as 2000 or larger, it converges faster than SIS. For large n_s it converges as fast as the CE method using SG.

Finally, the number of distributions in the GM in the final step of the simulation with respect to the sample size is analyzed. As the limit state function consists of four distinct failure domains, the optimal number is k_fin = 4. In both the CE method using GM and SIS using GM, the DBSCAN algorithm underestimates the number of distributions when using a small number of samples and is only able to identify the two parabolic failure domains with higher probability mass. For increasing n_s, the DBSCAN algorithm within SIS is able to identify all four distinct modes, while in the CE method using GM it underestimates this number on average.

6.3. Linear limit state function in varying dimensions

This example is defined by the following limit state function:

$$ g_3(x) = \beta - \frac{1}{\sqrt{n}} \sum_{i=1}^{n} x_i \qquad (50) $$

It is designed to test the effect of increasing dimensions on the performance of a method, as its probability of failure is independent of the number of dimensions and depends only on the parameter β. Moreover, the limit state function remains linear and features one failure domain for any number of dimensions [25]. An analytical solution is available for the probability of failure, given by P_F = Φ(−β). We choose β = 3.5, which corresponds to a failure probability of 2.33 · 10^−4. The simulation in two dimensions for n_s = 1000 samples per level results in strongly biased estimates for the CE method. Using the SG, the average estimate of the probability of failure is 1.52 · 10^−4 with a δ_PF of 1.53. Using the GM results in a mean estimate of 0.64 · 10^−5 with a δ_PF of 1.44. The number of distributions in the GM is estimated as 2.33 on average.
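For reference, the limit state function of Eq. (50) and its analytical solution can be checked with a few lines (an illustrative snippet, not part of the paper):

```python
import numpy as np
from scipy import stats

def g3(x, beta=3.5):
    """Linear limit state function of Eq. (50) in n = len(x) dimensions."""
    return beta - np.sum(x) / np.sqrt(len(x))

# sum(x_i)/sqrt(n) is standard normal for i.i.d. standard normal x_i, so the exact
# failure probability is Phi(-beta), independent of the dimension n:
print(stats.norm.cdf(-3.5))   # approx. 2.33e-4, the reference value used above
```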

Figure 5: Step-wise samples for the linear limit state function obtained from the CE method using SG with n_s = 1000 per level. (a) Convergence to the failure domain in five steps; (b) convergence to the failure domain in more than 50 steps.

We now investigate the number of sampling steps required to reach the failure domain in the CE method. On average, the CE method using SG needs 16.89 steps, while the CE method using GM needs 36.43 steps. In some cases, the algorithm does not converge to the failure domain within the maximum number of steps, while in other cases, the algorithm converges in less than 10 steps. This occurs due to the linear shape of the failure domain. Because the Gaussian model is only able to describe linear dependence, the fitted IS density based on the ρ · n_s available samples at the linear intermediate failure domain results in a high linear dependence between the random variables. Hence, most of the generated samples in the next step are located in a narrow sector around the previous intermediate failure domain. This decreases the step size towards the actual failure domain. In the extreme case, the step size tends towards zero and the algorithm gets stuck at an intermediate level with a sampling density giving almost perfect linear dependence between the random variables. If the samples of the first steps are sufficiently scattered, the resulting sampling densities feature a smaller linear dependence. In that case, the algorithm is able to advance to the failure domain and give an estimate of the failure probability. Figure 5(a) illustrates a case with fast convergence to the failure domain, while Figure 5(b) shows a case with slow convergence to the failure domain. The increasing linear dependence between the samples and the decreasing step size towards the failure domain are clearly visible. This effect becomes less severe with an increase of the number of samples per level.

Running the simulation for the two-dimensional case using SIS with n_s = 1000 per level results in a negative bias, as the probability of failure is estimated as 1.60 · 10^−4, yet δ_PF = 0.54 is significantly smaller than with the CE method. The average number of steps until convergence is 5.17, while the number of Gaussians in the mixture is 3.27 on average. Apparently, the MCMC algorithm within SIS prevents the algorithm from getting stuck at intermediate failure domains, as it converges in less than 10 steps in every simulation run.

To investigate the effect of increasing dimensions, the number of samples per level is increased for the two-dimensional case until all methods result in satisfactory performance in terms of bias and coefficient of variation of the estimate. For n_s = 8000, the CE method using SG gives an estimate of 2.26 · 10^−4 for the failure probability in 5.67 steps on average with a corresponding δ_PF of 0.10. The CE method using GM requires 5.58 steps on average to estimate the probability of failure as 2.01 · 10^−4 with a δ_PF of 0.31, while SIS gives an estimate of 2.28 · 10^−4 with a δ_PF of 0.04 within 5.00 steps on average. Hence, n_s = 8000 per level is chosen for the analysis of increasing dimensions.

Figure 6: Estimated probability of failure for the linear limit state function plotted against the number of random variables, obtained with n_s = 8000 per level.

Figure 7: δ_PF for the linear limit state function plotted against the number of random variables.

Figure 6 compares the average estimates of the failure probability obtained with the three different methods, while the dashed line marks the analytical solution of the linear limit state function. In Figure 7, δ_PF is compared for a varying number of dimensions. The two versions of the CE method show similar behavior, as both feature a rapid increase of the bias and δ_PF for increasing dimensions. It is noted that for n ≥ 10 the DBSCAN algorithm always reduces the GM to a single Gaussian density, and thus the two methods coincide for those cases. The figures show that the CE method with Gaussian distributions is not able to deal with large numbers of random variables. As soon as n is larger than ten, the negative bias and δ_PF increase dramatically. In the case when n = 20, the results are not reliable anymore. SIS results in practically unbiased estimates for up to 20 dimensions, while the bias increases with a further increase of the number of random variables. δ_PF remains moderate for up to 50 dimensions, with an increasing tendency. It is mentioned that in [7], a δ_PF of 0.05 is achieved for this problem in 60 dimensions with approximately 4.5 · 10^4 samples using the GM in the CE method. However, this number does not include the samples required for the pre-sampling step to determine the mean vectors and thus cannot be directly compared to the computational cost reported in this paper.

6.4. Combination of linear and convex limit state function

Finally, a combination of a linear limit state function and a limit state function with convex shape is investigated in a two-dimensional standard normal space. It features two distinct failure domains on opposite sides of the origin. The combined limit state function is defined by [25, 26]:

g_4(\mathbf{x}) = \min\left\{\, 3.2 + \frac{1}{\sqrt{n}} \sum_{i=1}^{n} x_i \,,\;\; 0.1\,(x_1 - x_2)^2 - \frac{x_1 + x_2}{\sqrt{2}} + 2.5 \,\right\}    (51)

The reference value for the probability of failure is 4.90 · 10−3. This example is used for investigating the performance of the methods in system problems with limit state functions of different type. While the convex limit state function features a distinct design point with most of the probability mass in its direct neighborhood, the failure domain of the linear limit state function exhibits probability mass distributed along the linear limit state surface. Furthermore, the failure domain of the convex limit state function accounts for more than 85% of the overall probability mass in the failure domain.
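As a plain check of this reference value, the limit state function of Eq. (51) can be evaluated by crude Monte Carlo simulation. The following minimal Python sketch assumes the usual convention that failure corresponds to g4(x) ≤ 0; the function name, seed and sample size are illustrative choices and not part of the original study.

```python
import numpy as np

def g4(x):
    """Combined limit state function of Eq. (51); x has shape (N, n), failure for g4(x) <= 0."""
    n = x.shape[1]
    g_linear = 3.2 + x.sum(axis=1) / np.sqrt(n)
    g_convex = 0.1 * (x[:, 0] - x[:, 1]) ** 2 - (x[:, 0] + x[:, 1]) / np.sqrt(2) + 2.5
    return np.minimum(g_linear, g_convex)

# crude Monte Carlo check of the reference probability of failure in n = 2 dimensions
rng = np.random.default_rng(0)
samples = rng.standard_normal((10**6, 2))
pf_mc = np.mean(g4(samples) <= 0)
print(pf_mc)  # should be close to 4.90e-3
```

With 10^6 samples the coefficient of variation of this crude estimate is roughly 1.4%, so the sketch only confirms the order of magnitude of the reference value.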

The optimal IS density and samples of the approximated near-optimal IS densities obtained with the investigated methods are illustrated in Figure 8. It appears that the CE method using SG is able to capture both failure domains with a small number of samples per level (ns = 1000), whereas the CE method using GM is more prone to neglecting the linear failure domain and to exclusively approximating the area around the design point of the convex failure domain. With an increase of the sample size this effect is reduced and the algorithm successfully detects both failure domains. Table 3 compares the statistics of the simulation obtained with the CE method and SIS for a varying sample size.

It is shown that the CE method using SG gives nearly unbiased estimates for the failure probability with ns = 1000, and the small negative bias vanishes with an increasing sample size. The CE method using GM needs a larger number of samples to give unbiased results; for ns = 1000 there is a negative bias. This bias can be attributed to the fact that the method neglects the linear failure domain. The convex limit state function alone has a reference value for the probability of failure of 4.21 · 10−3, which is close to the value estimated by the CE method using GM with 1000 samples per level. By increasing the sample size the bias is decreased until the method gives a nearly unbiased estimate with 16000 samples per level. In terms of δPF, for small to moderate sample sizes the CE method using SG outperforms the CE method using GM, and for large sample sizes the two methods perform approximately the same. For this particular example the CE method using SG always converges in three steps, while the CE method using GM needs an additional step, converging in four steps for any number of samples per level. The comparison with SIS shows that SIS gives biased estimates for a small number of samples per level and this bias vanishes with an increasing sample size. δPF is smaller than the one obtained with the CE method, while the numerical effort is comparable to the CE method using GM.

The number of Gaussians in the mixture estimated by the DBSCAN algorithm increases for increasing sample size, while in SIS this number slightly decreases for increasing sample size. That is, the CE method with GM overestimates the number of modes of the optimal IS density as the sample size increases. A reason for this can be found in the linear failure domain. Several Gaussians are allocated there by the algorithm, each of them with highly correlated random variables, which results in parallel-arranged sample clusters in the failure domain of the linear limit state function. SIS is more robust in this regard due to the MCMC algorithm conducted in each iteration step.
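The cluster count referred to above comes from density-based clustering of the weighted samples. As an illustration only, a minimal sketch of this idea using scikit-learn's DBSCAN, which accepts per-sample weights, is given below; the eps and min_samples values are placeholder choices and not the adaptation or tuning used in this paper.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def estimate_num_gaussians(samples, weights, eps=0.5, min_samples=5):
    """Estimate the number of mixture components by density-based clustering of weighted
    samples (generic sketch; eps and min_samples are illustrative, not the paper's tuning)."""
    # rescale the weights so that their sum equals the number of samples, making them
    # comparable to the point counts DBSCAN uses as its density threshold
    w = weights * len(weights) / weights.sum()
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(samples, sample_weight=w)
    # label -1 marks noise points; the remaining labels enumerate the clusters
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    return max(n_clusters, 1), labels
```

The estimated cluster count can then serve as the number of components K when fitting the Gaussian mixture, with noise points either discarded or assigned to the nearest cluster.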


[Figure: four panels of samples in the (x1, x2) plane: (a) Optimal IS density, (b) CE method using SG, (c) CE method using GM, (d) SIS using GM.]

Figure 8: (a) Optimal IS density for combined limit state function with linear and convex limit state surface; (b)-(d) Samples of approximate optimal IS density obtained from CE method using SG with ns = 1000 (b), CE method using GM with ns = 1000 (c) and SIS using GM with ns = 1000 (d).


6.5. Accuracy of the investigated methods

This section considers an overall measure for the accuracy of a method, which combines the bias and variability of the estimates. It is defined as the square root of the mean-square error of the estimated probability of failure:

\mathrm{err}_{P_F} = \sqrt{\mathrm{E}\!\left[\left(\hat{P}_F - P_F\right)^2\right]} = \sqrt{\left(\mathrm{E}\big[\hat{P}_F\big] - P_F\right)^2 + \left(\mathrm{E}\big[\hat{P}_F\big] \cdot \delta_{\hat{P}_F}\right)^2}    (52)

With this measure, it is possible to compare the performances of the investigated methods by plotting the error as a function of the total computational effort, i.e. the average total number of samples, needed to converge.


Table 3: Results for the combination of a linear and a convex limit state function with a reference probability of failure of 4.90 · 10−3. Comparison of CE method using SG, GM and SIS using GM for varying number of samples.

Number of samples per level    E[P̂F]          δPF     l       Ntot           kfin

CE method using SG
 1000    4.83 · 10−3    0.13    3.00     3.00 · 103    1
 2000    4.89 · 10−3    0.10    3.00     6.00 · 103    1
 4000    4.90 · 10−3    0.07    3.00    12.00 · 103    1
 8000    4.89 · 10−3    0.04    3.00    24.00 · 103    1
16000    4.90 · 10−3    0.03    3.00    48.00 · 103    1

CE method using GM
 1000    4.19 · 10−3    0.26    4.00     4.00 · 103    3.49
 2000    4.52 · 10−3    0.16    4.00     8.00 · 103    3.54
 4000    4.74 · 10−3    0.14    4.00    16.00 · 103    4.29
 8000    4.80 · 10−3    0.05    4.00    32.00 · 103    4.74
16000    4.85 · 10−3    0.04    4.00    64.00 · 103    5.21

SIS using GM
 1000    4.64 · 10−3    0.12    3.96     3.96 · 103    2.96
 2000    4.76 · 10−3    0.08    3.98     7.97 · 103    2.61
 4000    4.83 · 10−3    0.05    4.00    16.00 · 103    2.52
 8000    4.87 · 10−3    0.03    4.00    32.00 · 103    2.47
16000    4.88 · 10−3    0.02    4.00    64.00 · 103    2.42

For the examples analyzed in Sections 6.1–6.4, the normalized error is plotted in Figure 9. An interesting effect can be observed for the linear limit state function (Figure 9 (c)). The overall computational effort of the two CE-based methods first decreases with an increasing number of samples per level before eventually increasing again. This can be explained by the fact that with an increasing number of samples per level, the methods need fewer steps to converge to the failure domain and therefore the total number of limit state function calls can decrease.
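For completeness, the normalized error of Eq. (52) is straightforward to evaluate from a set of independent repetitions of a method. The following minimal sketch illustrates this; pf_estimates and pf_exact are placeholder names for the repeated estimates and the reference failure probability.

```python
import numpy as np

def normalized_error(pf_estimates, pf_exact):
    """Normalized root-mean-square error err_PF / P_F of Eq. (52), estimated from
    independent repetitions of a simulation method."""
    pf_estimates = np.asarray(pf_estimates, dtype=float)
    mean_pf = pf_estimates.mean()
    delta_pf = pf_estimates.std(ddof=1) / mean_pf      # coefficient of variation of the estimator
    bias = mean_pf - pf_exact
    err = np.sqrt(bias**2 + (mean_pf * delta_pf)**2)   # Eq. (52)
    return err / pf_exact
```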

7. Discussion

This study investigates the performance of the CE method for structural reliability with two different IS densities, the multivariate normal distribution and the Gaussian mixture distribution. Updating procedures for both distributions within the CE algorithm are discussed. We extend the one-step updating procedure for the GM distribution proposed in [7] to an iterative updating procedure through a new EM algorithm. Furthermore, the DBSCAN algorithm is introduced as a clustering algorithm to determine the number of distributions in the mixture. The initial parameters of the sampling distribution are chosen such that the first CE iteration step is an unweighted MCS. This has the advantage that no knowledge of the failure modes of the investigated problem is required in advance.

Several examples of limit state functions in standard normal space are analyzed to show the performance of the different methods. These include component reliability problems as well as system reliability problems in low and high dimensions. Moreover, the results are compared to SIS, another recently proposed sampling-based method. Within this method, the EM algorithm is used to fit a GM distribution based on weighted samples and the fitted distribution is applied as proposal distribution in an MCMC sampling approach. Hence, a direct comparison of SIS and the CE method can be drawn.

The concave example shows that the CE method using GM performs worse than the CE method using SG. Especially when a small sample size per level is chosen, the CE method using GM features a significant bias, while the CE method using SG gives nearly unbiased estimates. Furthermore, the CE method using SG converges faster to the failure domain, which decreases the computational effort. SIS using GM performs better than the CE method using GM, but compared to the CE method using SG the bias and δPF for small sample sizes are larger and it requires more steps to converge. In addition, the effect of the number of distributions in the GM is investigated. It appears that a specified number results in larger bias and δPF compared to an adaptive estimation of the number of Gaussians K using the DBSCAN algorithm, even for the case when K is chosen as the number of failure domains of the problem.
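The fitting of a Gaussian mixture to weighted samples referred to above can be illustrated with a generic weighted expectation-maximization update, in which the standard EM responsibilities are scaled by the normalized importance weights before the closed-form M-step. The sketch below is a minimal, self-contained variant; it is not the specific algorithm derived in this paper nor the one-step update of [7], and the initialization, tolerance and regularization values are illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

def weighted_em_gm(samples, weights, K, n_iter=100, tol=1e-6, seed=0):
    """Fit a K-component Gaussian mixture to weighted samples with a generic EM scheme
    (illustrative sketch; not the specific updating rules of the paper)."""
    rng = np.random.default_rng(seed)
    N, d = samples.shape
    w = weights / weights.sum()                            # normalized importance weights
    mu = samples[rng.choice(N, size=K, replace=False)]     # initialize means at random samples
    cov = np.tile(np.eye(d), (K, 1, 1))                    # unit covariances
    alpha = np.full(K, 1.0 / K)                            # equal mixture weights
    ll_old = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities gamma[i, k] proportional to alpha_k * N(x_i | mu_k, cov_k)
        dens = np.column_stack([
            multivariate_normal(mu[k], cov[k], allow_singular=True).pdf(samples)
            for k in range(K)
        ])
        mix = dens * alpha
        gamma = mix / mix.sum(axis=1, keepdims=True)
        # M-step: closed-form updates with responsibilities scaled by the sample weights
        wg = gamma * w[:, None]
        Nk = wg.sum(axis=0)
        alpha = Nk                                         # sums to one since w is normalized
        mu = (wg.T @ samples) / Nk[:, None]
        for k in range(K):
            diff = samples - mu[k]
            cov[k] = (wg[:, k, None] * diff).T @ diff / Nk[k] + 1e-8 * np.eye(d)
        # convergence check on the weighted log-likelihood
        ll = np.sum(w * np.log(mix.sum(axis=1) + 1e-300))
        if abs(ll - ll_old) < tol:
            break
        ll_old = ll
    return alpha, mu, cov
```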


[Figure: normalized error err_PF / PF plotted against the average total number of samples in four panels: (a) Concave limit state function, (b) Series system problem, (c) Linear limit state function (n = 2), (d) Combination of linear and convex limit state function; curves for CE-SG, CE-GM and SIS-GM.]

Figure 9: Normalized error of the investigated methods for examples 6.1–6.4 plotted against the average of the total number of samples. For each curve, the asterisk at the beginning indicates the smallest choice of ns per level; along the curves the ns per level increases.

The series system problem shows the ability of the SG distribution to describe multimodal failure by widely spread samples across the outcome space. The CE method using GM faces severe problems in identifying all failure domains, which derives from the fact that only a small number of samples is taken into account for the parameter update, whereas SIS using GM can correctly identify all failure regions and approximate the failure domain with a GM distribution. The reason for that can be found in the MCMC algorithm conducted in every step of the SIS algorithm, which moves the samples to regions of high probability mass of the target distribution.

A linear limit state function with a failure probability independent of the dimension is analyzed to show the performance of the different methods in varying dimensions. A simulation in two dimensions with a small number of samples per level shows that the CE method has difficulties in approximating the optimal IS density of the linear failure domain. Depending on the spread of the samples in the initial steps, the algorithm either converges to the failure domain or gets stuck at an intermediate domain. By choosing a large number of samples this behavior can be avoided, which is also apparent in the results obtained in [7]. As the number of random variables increases, both the CE method using SG and the CE method using GM show a rapidly increasing bias and δPF. A reliable estimate cannot be provided for dimensions larger than 10. On the other hand, SIS is able to provide estimates with acceptable bias and δPF for up to 50 random variables.

Finally, a limit state function with two failure domains of different shape is investigated. One of the domains is of convex shape and features a distinct design point, while the other one is described by a linear limit state surface. It is shown that the CE method using SG is able to capture both failure domains and provide an unbiased estimate even for a small number of samples per level, while the CE method using GM requires a larger number of samples to achieve the same accuracy.


If the number of samples is chosen too small, the CE algorithm with GM is drawn to the design point of the convex limit state function and thus cannot capture the second failure domain. This problem does not occur when using SIS with a GM distribution.

In Figure 9, we plotted the normalized error of the investigated methods, which combines the bias and coefficient of variation, as a function of the computational effort. This comparison shows that the CE method using SG performs more accurately than the CE method using GM for most of the investigated problems, or at least equally accurately. SIS using GM gives accurate results for all investigated examples, but is sometimes less accurate than the CE method using SG for small computational efforts. Altogether, the CE method using GM performs poorly if no knowledge of the system behavior is available in advance. Using the original joint PDF of the random variables as the initial sampling density provides flexibility; however, in cases with multiple failure regions of varying importance, the samples used to fit the GM model in the first sampling step tend to concentrate in regions of higher importance. In [7], the authors attempt to overcome this issue by running a pre-sampling step to gain information about the failure modes of the system. However, this approach requires additional computational effort, and if a small sample size is used in the pre-sampling it might also result in ignoring failure regions [12]. One way to improve the performance of the CE method with GM could be to run a FORM analysis to identify the design points of all failure domains [27]. The distributions in the GM can then be centered at the design points in the first step of the CE algorithm. However, there is no guarantee that the FORM optimization will identify all design points, and the computational effort of the gradient-based optimization algorithms that are often used within FORM increases quickly with the dimension.

8. Conclusion

This study investigates the suitability of Gaussian densities as importance sampling densities in the cross entropy (CE) method. We discuss that, contrary to the single Gaussian distribution, closed-form updating rules for the parameters cannot be obtained for the Gaussian mixture distribution; instead, the update can be performed with an iterative expectation-maximization algorithm. The number of distributions in the mixture is crucial for the performance of the method. By introducing a modified DBSCAN algorithm, an efficient procedure to estimate this number in each step of the CE method is proposed. We show that the CE method using a single Gaussian is superior to the CE method using a Gaussian mixture if no knowledge of the system behavior is available in advance, both in terms of the quality of the estimate and the computational effort. We also demonstrate that the CE method using either the single Gaussian or the Gaussian mixture is not a good choice for linear and high dimensional problems. We compare the performance of the CE method with that of sequential importance sampling (SIS), which employs the Gaussian mixture model as a proposal distribution in a Markov chain Monte Carlo sampling setting. SIS is able to approximate the optimal importance sampling density of problems with multiple failure domains with the Gaussian mixture model and outperforms the CE method in problems with linear failure domains and moderately high dimensions.

Acknowledgements

This work was supported by the Deutsche Forschungsgemeinschaft (DFG) in the framework of the priority programme SPP 1886 by grant STR 1140/6-1.

References

[1] M. Lemaire, A. Chateauneuf, J.-C. Mitteau, Structural Reliability, Wiley-ISTE, 2009.
[2] R. Rackwitz, Reliability analysis - a review and some perspectives, Structural Safety 23 (4) (2001) 365–395.
[3] A. Der Kiureghian, First- and second-order reliability methods, in: E. Nikolaidis, D. M. Ghiocel, S. Singhal (Eds.), Engineering design reliability handbook, CRC Press, 2005, Ch. 14.
[4] R. Y. Rubinstein, D. P. Kroese, Simulation and the Monte Carlo method, 3rd Edition, Springer, 2017.
[5] S.-K. Au, J. L. Beck, Estimation of small failure probabilities in high dimensions by subset simulation, Probabilistic Engineering Mechanics 16 (4) (2001) 263–277.
[6] I. Papaioannou, C. Papadimitriou, D. Straub, Sequential importance sampling for structural reliability analysis, Structural Safety 62 (2016) 66–75.
[7] N. Kurtz, J. Song, Cross-entropy-based adaptive importance sampling using Gaussian mixture, Structural Safety 42 (2013) 35–44.
[8] S. Au, J. Beck, A new adaptive importance sampling scheme for reliability calculations, Structural Safety 21 (2) (1999) 135–158.
[9] M. Hohenbichler, R. Rackwitz, Improvement of second-order reliability estimates by importance sampling, Journal of Engineering Mechanics 114 (12) (1988) 2195–2199.
[10] P. Koutsourelakis, H. Pradlwarter, G. Schueller, Reliability of structures in high dimensions, part I: algorithms and applications, Probabilistic Engineering Mechanics 19 (4) (2004) 409–417.
[11] P. T. de Boer, D. P. Kroese, S. Mannor, R. Y. Rubinstein, A tutorial on the cross-entropy method, Annals of Operations Research 134 (1) (2005) 19–67.
[12] D. Y. Yang, J. Teng, D. M. Frangopol, Cross-entropy-based adaptive importance sampling for time-dependent reliability analysis of deteriorating structures, Structural Safety 66 (2017) 38–50.
[13] G. Biondini, An introduction to rare event simulation and importance sampling, in: Big Data Analytics, Vol. 33 of Handbook of Statistics, Elsevier, 2015, pp. 29–68.
[14] D. P. Kroese, R. Y. Rubinstein, P. W. Glynn, Chapter 2 - The cross-entropy method for estimation, in: Handbook of Statistics: Machine Learning: Theory and Applications, Vol. 31 of Handbook of Statistics, Elsevier, 2013, pp. 19–34.
[15] J. C. Baez, T. Fritz, A Bayesian characterization of relative entropy, CoRR abs/1402.3067.
[16] M. Fujita, R. Rackwitz, Updating first- and second-order reliability estimates by importance sampling, Structural Eng./Earthquake Eng. 5 (1) (1988) 31–37.
[17] A. P. Dempster, N. M. Laird, D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B (Methodological) 39 (1) (1977) 1–38.
[18] Y. Chen, M. R. Gupta, EM demystified: an expectation-maximization tutorial, Tech. rep., Department of Electrical Engineering, University of Washington (2010).
[19] J. MacQueen, Some methods for classification and analysis of multivariate observations, in: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics, University of California Press, Berkeley, Calif., 1967, pp. 281–297.
[20] D. Arthur, S. Vassilvitskii, k-means++: The advantages of careful seeding, Technical Report 2006-13, Stanford InfoLab (2006).
[21] I. D. Gebru, X. Alameda-Pineda, F. Forbes, R. Horaud, EM algorithms for weighted-data clustering with application to audio-visual scene analysis, CoRR abs/1509.01509.
[22] M. Ester, H.-P. Kriegel, J. Sander, X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, in: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, AAAI Press, 1996, pp. 226–231.
[23] A. Der Kiureghian, T. Dakessian, Multiple design points in first and second-order reliability, Structural Safety 20 (1) (1998) 37–49.
[24] P. H. Waarts, Structural reliability using finite element analysis; an appraisal of DARS: directional adaptive response surface sampling, Ph.D. thesis, Technische Universiteit Delft (2000).
[25] S. Engelund, R. Rackwitz, A benchmark study on importance sampling techniques in structural reliability, Structural Safety 12 (4) (1993) 255–276.
[26] S. Katsuki, D. M. Frangopol, Hyperspace division method for structural reliability, Journal of Engineering Mechanics 120 (11) (1994) 2405–2427.
[27] G. Schueller, R. Stix, A critical appraisal of methods to determine failure probabilities, Structural Safety 4 (4) (1987) 293–309.
