materials process design and control laboratory 1 an efficient bayesian inference approach to...

CCOORRNNEELLLL U N I V E R S I T Y


Materials Process Design and Control LaboratoryMaterials Process Design and Control Laboratory 11

An efficient Bayesian inference approach to An efficient Bayesian inference approach to inverse problems based on an adaptive inverse problems based on an adaptive

sparse grid collocation methodsparse grid collocation method

Xiang Ma and Nicholas Zabaras Materials Process Design and Control Laboratory

Sibley School of Mechanical and Aerospace Engineering101 Frank H. T. Rhodes Hall

Cornell University Ithaca, NY 14853-3801

Email: [email protected]: http://mpdc.mae.cornell.edu/




Outline of the presentation

Bayesian approach to inverse problems

Accelerating Bayesian inference via adaptive sparse grid collocation method

Inference of spatial field ---- MRF, Process convolution model, Regularization, e.g. permeability estimation

Conclusions




Inverse problems

system input(unknowns)

inverse problem: effect(d)–to-cause(m)

mathematical representation

d = G(m, ωs) + ωm

physical processes governed by PDEs

observations

forward problem: cause(m)-to-effect(d)

MELTMELTSOLIDSOLIDG G ,V,V g

Forward operators e.g. a system of PDEs;

computationally intensive

observed datamodel parameters/inputs ,

assume random

Given a set of data d, estimate m

( )G m d




Applications and significance

Inverse heat transfer

permeability estimation of a reservoir

ODFpole figure

microstructure

materials processdesign and control

groundwater cleanup

broad applications




Bayesian computational approach

Obstacle:• Posteriors are often high-dimensional • Computation cost for repeated likelihood evaluations

aim: explore the advantagesand resolve this difficulty

• provides probabilistic description of the inverse solution

• direct simulation in deterministic space

• prior distribution regularization

--- incorporation of prior knowledge

--- explores the distribution of regularization parameter

Advantages:




Bayesian inference The model m is now a random variable/field.

Apply Bayes’ theorem:

||

|

p d m p mp m d

p d m p m dm

posterior density

Likelihood function L (m)

prior density

Normalizing const

• The posterior density is the full Bayesian solution to the inverse problem. Compared to most deterministic inverse techniques

|p m d

Not just a single value for m, but a probability density which encapsulates all available information about the inverse problem.

Various summarizing statistics, e.g. mean, modes, marginal distribution and credible intervals, can be computed and quantify the uncertainty associated with a specific inverse solution.




Example: Source Inversion

Source described by location parameters m, active for . Strength s = 5.0, width τ = 0.1.

0,0.1t

Data are collected at 25 sensors on a regular 5x5 grid at two times t = 0.05 and t = 0.1.

[0,1] [0,1]

2

22 2

exp2 2

ˆ 0 on

( ,0) 0

tT sT e

t

T

T

x m

n

x

Measurement error is additive Gaussian.

Priors: 0 1, 0,1m m Um

Goal : Given a set of noisy temperature T, try to estimate the unknown source location m




Posterior density A hierarchical Bayesian model is augmented to detect the noise level σ. A standard practice to select priors for hyper-parameters is to use conjugate priors. Thus, an inverse Gamma distribution is chosen for σ2.

2

/ 2 12 2 222 2

, | exp exp2

n Gp d m

(m) d

where (α,β) is the parameter pair of the inverse Gamma distribution.

The posterior is non-standard and implicit, and the dimension of the posterior state space may be high. Thus, numerical sampling methods are needed for its exploration.

The most popular numerical method for this purpose is Markov Chain Monte Carlo.




Markov Chain Monte Carlo In this problem, the posterior density is sampled using the following hybrid of the Metropolis-Hastings algorithm and the Gibbs sampler:

0 0

1

1

1 12

1. Initialize and

2. For 0 : 1

sample | and sample 0,1

if ,

else

sample |

mcmc

i

i

i

i i

i i

i N

q U

A

p

m

m m m

m m

m m

m m

m

The acceptance ratio is defined as , iA m m

, | | , min 1,

, | |

i i

i

i i i

p qA

p q

m d m mm m

m d m m

The full conditional is also an inverse Gamma 12 | ip m

212 | / 2, / 2ii ii

p IG n d G m m

solving forward problem




Propagate prior uncertainty

In Bayesian problem, everything is assumed random. Thus, the forward problem becomes a stochastic partial differential equation (SPDE). The input uncertainty is the assumed prior density.

Based on the idea of uncertainty quantification, we can propagate the input uncertainty through the forward model and construct a stochastic surrogate model for the forward problem to reduce the computational time

The basic idea is to construct an interpolant in the stochastic prior space using adaptive sparse grid collocation method. Then the interpolant can be used as a surrogate model for the stochastic forward problem.




Mathematical preliminaries

Ω

Sample space of elementary events

Real line

Random

variable

MAP

Collection of all possible outcomes

Each outcome is mapped to a

corresponding real value

Interpreting random variables as functions

A general stochastic process is a random field with variations along space and time – A

function with domain (Ω, Τ, S)

Stochastic prior state space

Define

and their images:

iM

: NM i

i M

Then the state space of is defined as

M

1

Ni N

i

with the joint PDF denotes as . p m

For example, if is an independent uniform random variable in , then [ 1,1]

[ 1,1]N

Define a complete probability space with a sample space which corresponds to the outcomes of some experiments, being the σ-algebra of subsets in (these subsets are called events) and the probability measure.

, ,F P

F

: [0,1]P F

In this framework:

:M

image: :M M




Stochastic forward problem1

Without loss of generality, the data d is the solution u to the forward problem at given sensor locations.

We can define a complete probability space , where the sample space is the set of all possible outcomes of . The forward problem is now taken as a stochastic model. Find a solution such that for P-almost everywhere , the following equation holds:

, 0G u m

, ,T P

M:u

; 0u G M for all The random vector has a state space and a joint PDF which

is the prior PDF. Therefore, for , the realization of random vector takes value in . We define as the stochastic prior state space.

M p m

The stochastic posterior state space can be also defined as

: T M

where the event space is the inverse image (under ) of . T M 1. X. Ma, N. Zabaras, An efficient Bayesian inference approach to inverse problems based on an adaptive sparse

grid collocation method, Inverse Problems 25 (2009) 0350013(27pp).

m M




Stochastic forward problem Since the unknown parameters are always discretized into a random vector , then by using the Doob-Dynkin lemma, u is also a function of i.e. . Also, define D as a d-dimensional physical domain. Therefore, we can restate the deterministic forward problem as the following stochastic forward problem: find a function , such that the following holds:

M M ,u x M

, : nu Dx m

, ; 0, ,u D G x m x m

For each realization , the function value gives one realization of the predicted data, which is equivalent to the solution of the deterministic forward problem using the same m as the input. In this way, the repetitive solution of the deterministic forward problem in MCMC is substituted with the solution to the stochastic forward problem.

When the prior space is the unbounded, we can always truncated it to a bounded one based on some prior information about the unknowns.

The only requirement is that the prior space is large enough to contain the posterior space completely.

m M u m




Sparse grid collocation method1

Function value at any point is simply

Stochastic function in 2 dimensions

Need to represent this function

Sample the function at a finite set of points

Use basis functions to get a approximate representation

: Nf

1 kk i iΘ m

1

( ) ( ) ( )k

i i ii

f f a

m m mI

( )f mISpatial domain is approximated using a FE, FD, or FV discretization.

Stochastic domain is approximated using multidimensional interpolating functions

1. X. Ma, N. Zabaras, A hierarchical adaptive sparse grid collocation method for the solution of stochastic differential equations, Journal of Computational physics, Vol. 228 , pp. 3084-3113, 2009.




Choice of collocation points and nodal basis In the context of incorporating adaptivity, we have utilized the Newton-Cotes grid using equidistant support nodes. The number of nodes is defined as , if ; else Then the support nodes are1ik 1i 12 1.i

ik

Furthermore, by using the linear hat function as the univariate nodal basis function, one ensures a local support in contrast to the global support of Lagrange polynomials. This ensures that discontinuities in the stochastic space can be resolved. The piecewise linear basis functions can be defined as , for , and 1

1 1a 1i

ijm12i i

jm 12i ijm

1

for and1i 1, , ij k




LET OUR BASIC 1D INTERPOLATION SCHEME BE SUMMARIZED AS

IN MULTIPLE DIMENSIONS, THIS CAN BE WRITTEN AS

Denote the one dimensional interpolation formula as

In higher dimension, a simple case is the tensor product formula

Conventional sparse grid collocation (CSGC)

Using the 1D formula, the sparse interpolant , where is the depth of sparse grid interpolation and is the number of stochastic dimensions, is given by the Smolyak algorithm as

,q NA

00,q q N

q

1i i iU U 0 0U

Here we define the hierarchical surplus as:




Nodal basis vs. hierarchical basis

Nodal basis Hierarchical basis

1

1

31m

032m 3

3m 34m 3

5m

31a

32a

33a

34a 3

5a

1

1

21m

031m 1

1m 32m 2

2m

21a

11a

31a 3

2a

22a

31m 3

2m 33m 3

4m 35m

31( )f m

32( )f m 3

3( )f m 34( )f m

35( )f m

21m 3

1m 11m 3

2m 22m

11w

21w

22w

31w

32w

1 1 2 2 2 2 3 3 3 31 1 1 1 2 2 1 1 2 2f w a w a w a w a w a 3 3 3 3 3 3 3 3 3 3

1 1 2 2 3 3 4 4 5 5f f m a f m a f m a f m a f m a




Adaptive sparse grid collocation (ASGC)

Let us first revisit the 1D hierarchical interpolation For smooth functions, the hierarchical surpluses tend to zero as the interpolation level increases.

On the other hand, for non-smooth function, steep gradients/finite discontinuities are indicated by the magnitude of the hierarchical surplus.

The bigger the magnitude is, the stronger the underlying discontinuity is.

Therefore, the hierarchical surplus is a natural candidate for error control and implementation of adaptivity. If the hierarchical surplus is

larger than a pre-defined value (threshold), we simply add the 2N neighbor points of the current point.




Adaptive sparse grid collocation: Algorithms Let be the parameter for the adaptive refinement threshold. 0



Materials Process Design and Control LaboratoryMaterials Process Design and Control Laboratory

Adaptive sparse grid interpolation Ability to detect and reconstruct steep gradients




ASGC in Bayesian inference By using the ASGC method, the stochastic solution to the stochastic forward problem can now be approximated by the following reduced form:

To obtain the predicted data for any , instead of solving the deterministic forward problem, we can simply substitute into above equation to compute the value of within a certain accuracy. Furthermore,

we can easily compute the value of the likelihood.

In this case, most of the computational time is spent on constructing the interpolant for the solution. This construction is embarrassingly parallel and thus the computational time is minimal. Once the interpolant is

constructed, we can simply store it for future use when new observed data are available.

We only consider the unit hypercube here. Other shapes of the bounded prior space can be easily transformed to the unit hypercube.

u m

mu




Solution to the stochastic forward problem Return to the 2D source inversion problem. The stochastic forward problem is solved in the prior space by ASGC.2[0,1]

Propagate the uniform prior on source location. First, we need to check the convergence and accuracy of the surrogate model.

Surface response of a single

component of the stochastic

forward problem solution at point

(0.5,0.5) and t=0.05 with different ε.

110

310 410

25 10




Solution to the stochastic forward problem The accuracy of the surrogate model can be interrogated in several ways. First we compare the PDF at t = 0.05 and t = 0.1 at point (0.5,0.5) with the direct computation by solving the deterministic forward problem.

Therefore, it is seen that when , the PDF converges to the solution by Direct computation, which verifies the accuracy of the surrogate

stochastic forward model.

310




Solution to the stochastic forward problem Next, we examine the accuracy of the surrogate likelihood defined as

2

/ 22 22

| exp2

n up d m

(m) d

The exact source location is (0.5, 0.5) with 5% noisy in the data.

110 210

310

Contours of the exact (solid) and

the surrogate likelihood (dashed)

Convergence of the surrogate posterior:

Kullback-Leibler distance

|KLD p p vs. ε




Solution to the inverse problem Now we illustrate the usage of the surrogate mode for the Bayesian

inference approach on the model inverse problem.

The initial guess for m and σ is (0,0) and 1.0. The pair of parameters (α,β) for the inverse Gamma distribution is taken to be (1x10-3, 1x10-3).

The proposal distribution in the Metropolis-Hastings algorithm is a random walker sampler

| iq m m

22

2| exp 1/ 2i i

mq m m m m

Unless otherwise specified, the scale parameter is taken to be , and the surrogate by stochastic collocation method is constructed with

0.01m

The burn-in period of the Markov chain is 10000, and the last 20000 realizations are used to compute relevant statistical quantities.

410 .




Trace plots and autocorrelation plot of the Markov Chain

The acceptance rate is 0.314.

Visual inspection suggests the chain mixes very well.

The current choice of scale parameter in the proposal distribution leads to a fast decay with lag along the chain, concurring with good mixing in the trace plot.




Speed up of the surrogate model The motivation of using this method is to speed up the Bayesian inference for nonlinear problem, where repeated solution of the forward model for each MCMC sample is replaced by evaluation of .

Time : 34 hours

The direct FEM computation for the likelihood needs 30000 evaluations

Time : 84.18s to construct the surrogate model on 20 nodes and 26.9s for a single MCMC chain

The number of collocation points (thus the number of direct computation needed) in the sparse grid is 1081.




Marginal density A distinct feature of MCMC is easy extraction of the marginal distribution

for the component of interest via kernel density estimation (KDE).

Although the prior distribution is only uniform, the posterior density of greatly refines the prior distribution.

It is seen that the range of distribution is much larger with 5% noise, whereas it is much more concentrated around the exact value with 1% noise.

1m

1m

From the tables, it can be seen that the current Bayesian hierarchical formulation can successfully detect the noise levels.




Effect of source locations Besides the great computational savings, another advantage of using this method is that it is reusable when there are new measurements coming after constructing the surrogate model.

The following table shows the results for four other source locations: (0.1,0.9), (0.75,0.25) and two other random draws from the prior with 1%

noise in the data. The same surrogate model is used.

It can be seen that no matter where the source location is, the method can always infer the exact value without performing any additional direct FEM computations as in the traditional MCMC method.




Error with different ε and noise level It is also interesting to compare the results with respect to different ε and

noise level. The true source location is chosen at (0.2,0.8). The error is defined as the maximum error between the posterior mean and the true source locations.

Surprisingly, for a given noise level, even a big threshold can give us rather accurate results. This is possibly due to error cancellations between the error of the surrogate model and the measurement error.




Effect of the prior distribution We next demonstrate that the accuracy of the method is not affected by

the prior distribution provided the prior space is large enough to contain the posterior space.

The prior is chosen as independent Gaussian, which is a special form of MRF. The hierarchical PPDF is

21

2/ 2 112 22 2

12 2, , | exp exp exp exp

2 2

n Tup

(m) dm d m m

It is obvious that the unit square contains the posterior space. So, we truncate the unbounded space to the unit square and we do not need to perform the calculation using ASGC again.




Effect of the size of prior space We conduct several computations with different sizes of prior space .

The surrogate model is constructed with and 5% noise in the data. The true source location is chosen as (0.2, 0.8).

210

As long as the prior space contains the posterior space, we can always obtain accurate posterior mean estimation. Increasing does not affect the accuracy but it affects the computational cost. The number of

collocation points increases with larger and also the burn-in length of the MCMC chain is elongated. Therefore, it is important to choose an appropriate space which balances accuracy and computational cost.




Example : Permeability estimation In this example, we illustrate the method on the nonlinear inverse problem of estimating the spatial permeability field. The pressure and velocity and characterized by the following set of dimensionless equations:

where denotes the source/sink term.

injection well

production well

1 1

To impose the non-negativity of the permeability, from now on we will treat the logarithm of the

permeability as the main unknown of our problem. To generate simulation data, we consider smooth permeability of the following form

Pressure is measured at a 5x5 sensor network and 5% noise is added.




MRF and process convolution model In the general Bayesian setting, the unknown permeability field and the

forward model must be discretized. If the FE method is used, the field is discretized onto the nodes of the FE mesh. Denote by the number of nodes of the mesh; then we can write both the prior and posterior

densities in terms of the unknown parameter

Possible choices for the prior of is the Gaussian process (GP) and the Markov Random Field. A Gaussian MRF is of the form

m

,

1,

0,else

i

ij

n i j

W i j

Number of neighbors of site i

if and are adjacent. i j

hN

However, it is noted that the dimensionality of is generally very high since is above 1000 in a typical finite element mesh. Exploring the

posterior state space using MCMC is not trivial and the acceptance rate is generally very low. To this end, we use the process convolution approach to reduce the dimensionality.

m

hN




MRF and process convolution model The GP at any location can be represented as

The kernel function is chosen to be a mean zero Gaussian kernel at a discrete set of points which controls the spatial structure of the underlying process and are independent

mN

Thus the prior density of is

The inverse problem has now been transformed to an inference problem on the coefficients

If the locations of the kernels are at a regular grid, a MRF can also be used for the prior distribution of the underlying process on this grid.

The standard deviation of the kernel is usually equal to the distance between adjacent locations of the kernels and the lattice grid of the kernel locations is usually larger than the space of the observed data in

order to eliminate inaccurate results near the boundaries.




Permeability estimation: fixed kernel locations If the unknown is also denoted as , then the hierarchical PPDF can

be computed as follows:

We choose a 5 x 5 lattice in as kernel locations to construct the discrete process convolution model which corresponds to a total of 25

kernels, where the Gaussian kernel is used. MRF is used as the prior and the stochastic prior space is chosen.

m

2[ 0.5,1.5]

20,0.5N25[ 1,1]

A 25-dimensional ASGC with is used, which has 150 322 collocation points. However, the corresponding number of collocation

points for a conventional sparse grid is 199 876 961. Thus, the advantage of using ASGC is obvious.

110




Permeability estimation: fixed kernel locations

5% noise

in the data

True Posterior mean

Marginal densities




Permeability estimation: fixed kernel locations

1% noise

in the data

5 x 5 sensor grid 9 x 9 sensor grid

Marginal of each kernel strengths

5% noise 1% noise




Random kernel locations are independent Gaussian random variables. Each kernel has three

degrees of freedom, i.e. and x, y locations.i

i

Posterior mean: 4 kernels Posterior mean: 5 kernels

43,3i 8, 0,1x y

4 83,3 0,1

53,3i 10, 0,1x y

5 103,3 0,1

110




Conclusions An efficient Bayesian approach is introduced based on adaptive sparse

grid collocation method.

• Efficient propagation of the prior uncertainty through the forward model

• Inference of spatial fields, with dimensionality reduction using MRF process convolution model

• The accuracy of the methods is assessed on two non-linear inverse problems.

• The speed up of this method comparing with that of direct computation is drastic.

However, there are also some limitations of the method. ASGC relies exclusively on a bounded input support. Therefore, it is crucial to choose appropriate and . As long as the prior space includes the support of the posterior, the method provides very good results.




References

1. X. Ma, N. Zabaras, An efficient Bayesian inference approach to inverse problems based on an adaptive sparse grid collocation method, Inverse Problems 25 (2009) 0350013(27pp).

2. X. Ma, N. Zabaras, A hierarchical adaptive sparse grid collocation method for the solution of stochastic differential equations, Journal of Computational physics, Vol. 228 , pp. 3084-3113, 2009.

materials process design and control laboratory 1 an efficient bayesian inference approach to...

Documents