
BAYESIAN COMPUTATIONAL TECHNIQUES FOR INVERSE

PROBLEMS IN TRANSPORT PROCESSES

A Dissertation

Presented to the Faculty of the Graduate School

of Cornell University

in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

by

Jingbo Wang

January 2006

© Jingbo Wang 2006

ALL RIGHTS RESERVED

BAYESIAN COMPUTATIONAL TECHNIQUES FOR INVERSE PROBLEMS

IN TRANSPORT PROCESSES

Jingbo Wang, Ph.D.

Cornell University 2006

Inverse problems in continuum transport processes (governed by partial dif-

ferential equations (PDEs)) have major applications in a variety of scientific and

engineering areas. The ill-posedness, high computational cost, and other compli-

cations of these problems pose significant intellectual challenges. In this thesis, a

computational framework is developed that integrates computational mathematics,

Bayesian statistics, statistical computation, and reduced-order modeling to address

data-driven inverse heat and mass transfer problems. The Bayesian computational

approach is advantageous in many aspects. In particular, it is able to quantify

system uncertainty and random data error, to derive a probabilistic description of

the inverse solution, to provide flexible spatial/temporal regularization of the ill-posed inverse problem, and to allow adaptive sequential estimation. The

components of this framework include hierarchical Bayesian formulation, prior mod-

eling of distributed parameters via spatial statistics, exploration of implicit posterior

distributions using Markov chain Monte Carlo (MCMC) simulation, proper orthog-

onal decomposition (POD)-based reduced-order modeling of the PDE system, and

sequential Bayesian estimation. These methodologies are applied to the solution of

a number of inverse problems in transport processes including inverse heat conduc-

tion, inverse heat radiation, contaminant detection in porous media flows, control of

directional solidification, and multiscale permeability estimation in heterogeneous

media. These problems are selected due to their technological significance as well as

their ability to demonstrate the attributes of the Bayesian computational approach.

The developed methodologies are general and applicable to many other inverse con-

tinuum problems. A summary of achievements and suggestions for future research

are given at the end of the thesis.

Biographical Sketch

The author was born in Shaanxi, China in February, 1978. After completing his high school education at Bao Shi High School in Baoji, China, the author was admitted into the Mechanical Engineering program at Tsinghua University, Beijing in 1996, from which he received his Bachelor's degree in engineering in June, 2000. In August, 2000, the author was admitted into the graduate school at the University of Delaware, and was awarded a Master of Science degree in July, 2002. The author entered the doctoral program at the Sibley School of Mechanical and Aerospace Engineering, Cornell University in August, 2002.


This thesis is in memory of my father Wang, Fulin. The thesis is also dedicated to

my mother Wang, Xiaoping and my sister Wang, Jingyuan for their constant

support and encouragement towards academic pursuits during my school years.


Acknowledgements

I would like to thank my thesis advisor, Professor Nicholas Zabaras, for his constant

support and guidance over the last 3 years. I would also like to thank Professors

David Ruppert and Thorsten Joachims for serving on my special committee and

for their encouragement and suggestions at various times during the course of this

work.

The financial support for this project was provided by NASA (grant NAG8-

1671) and the National Science Foundation (grant DMI-0113295). Partial support

from the Advanced Mechanical Technologies Program at General Electric Global

Research Center (GE-GRC) is also gratefully acknowledged. I would like to thank

the Sibley School of Mechanical and Aerospace Engineering for having supported

me through a teaching assistantship for part of my study at Cornell. The computing

for this project was supported by the Cornell Theory Center during 2002-2005.

Part of the computer code associated with this project was written using the object-oriented programming environment of Diffpack, and the academic license that allowed for these developments is appreciated. The parallel simulators were developed based on the open-source scientific computing package PETSc; I would like to acknowledge the effort of its developers. I am indebted to the present and former members of the MPDC group, especially to Shankar Ganapathysubramanian, Baskar Ganapathysubramanian and Lijian Tan. Finally, my thanks are extended to Elsevier Ltd., the Institute of Physics, and IOP Publishing Ltd. for granting permission to reproduce figures from our papers [29, 32, 31, 30].


Table of Contents

Table of Contents
List of Tables
List of Figures

1 Introduction

2 Fundamentals of Bayesian computation and Markov Random Field (MRF)
   2.1 Bayesian statistical analysis
   2.2 Markov chain Monte Carlo (MCMC) simulation
      2.2.1 Monte Carlo principle
      2.2.2 MCMC algorithms
      2.2.3 Convergence assessment of MCMC
   2.3 Prior distribution modeling using MRF
   2.4 Generic Bayesian computational framework for inverse continuum problems

3 Inverse heat conduction problems (IHCP) - A Bayesian approach
   3.1 The inverse heat conduction problems
   3.2 Bayesian formulation of the inverse heat conduction problems
      3.2.1 The likelihood
      3.2.2 Prior distribution modeling
      3.2.3 The posterior distributions
      3.2.4 Regularization in the Bayesian approach
   3.3 Parameter estimation
   3.4 Heat flux reconstruction under uncertainties
      3.4.1 Automatic selection of the regularization parameter
      3.4.2 Effect of the sensor location
      3.4.3 IHCP under model uncertainties
   3.5 Examples
      3.5.1 Example I: Parameter estimation
      3.5.2 Example II: Boundary heat flux estimation
      3.5.3 Example III: Boundary heat flux identification with simultaneous uncertainties in material property and thermocouple location
      3.5.4 Example IV: 1D piece-wise continuous heat source identification
      3.5.5 Example V: 2D heat source identification
   3.6 Summary

4 Inverse heat radiation problem (IHRP) - An integrated reduced-order modeling and Bayesian computational approach to complex inverse continuum problems
   4.1 The inverse heat radiation problem (IHRP)
   4.2 Direct simulation and reduced-order modeling
   4.3 Bayesian formulation of IHRP
   4.4 MCMC sampler
   4.5 Numerical examples
   4.6 Summary

5 Contamination source identification in porous media flow - Solving the PDEs backward in time using Bayesian method
   5.1 Problem definition
   5.2 The direct simulation and sensitivity analysis
      5.2.1 Solution of the flow equations
      5.2.2 Solution of the concentration equation
      5.2.3 Sensitivity analysis
   5.3 Bayesian backward computation
      5.3.1 Bayesian inverse formulation
      5.3.2 The hierarchical posterior distribution
      5.3.3 The backward marching scheme
   5.4 Numerical exploration of the posterior distribution
   5.5 Numerical examples
      5.5.1 Example 1: 1D advection-dispersion in homogeneous media
      5.5.2 Example 2: 2D concentration reconstruction
   5.6 Summary

6 Open-loop control of directional solidification - A sequential Bayesian computational application
   6.1 Open-loop control of directional solidification using magnetic gradient
   6.2 A Bayesian filter-based control approach
      6.2.1 Bayesian filter
      6.2.2 A sequential Bayesian controller for solidification control
   6.3 Examples
   6.4 Summary

7 Multiscale permeability estimation in heterogeneous porous media - A multiscale Bayesian inversion method
   7.1 Problem definition
   7.2 Bayesian posterior distribution of the random permeability
      7.2.1 Formulation I: MRF-based one scale model
      7.2.2 Formulation II: HMT-based two scale model
      7.2.3 Exploring the posterior state space
   7.3 Examples
      7.3.1 Example I - permeability with bilinear logarithm
      7.3.2 Example II - permeability of a random heterogeneous medium
   7.4 Summary

8 Conclusions and suggestions for the future research
   8.1 Pattern recognition for reduced-order modeling
   8.2 Enhancing the multiscale Bayesian inversion techniques
   8.3 Wavelet function representation

Bibliography

List of Tables

3.1 Bayesian estimates of k using different models.

6.1 Specifications of the direct solidification problem.

List of Figures

2.1 Schematic of Bayesian computation for inverse continuum problems.

3.1 Schematic for inverse problems in heat conduction. The main unknowns considered include the conductivity k, the heat flux q0 on Γ0 or the heat source f(x, t) in Ω.

3.2 Linear finite element basis functions and neighborhood definition for θ. The figure on the left refers to 1D heat conduction (unknown heat flux q(t)) and the figure on the right to 2D heat conduction in a square domain (unknown heat flux q(x, t)).

3.3 The left figure is the schematic of the 1D inverse heat conduction problem. The figure on the right provides the time-profile of the true heat flux that was used to generate the simulated sensor data.

3.4 Computed posterior densities of k using different Bayesian models.

3.5 True heat flux in example II.

3.6 Posterior mean estimates of the heat flux and 98% probability bounds of the posterior distributions when d = 0.5 using a hierarchical Bayesian model (Example II). The figure on the left is obtained when σT = 0.01 and the figure on the right is obtained when σT = 0.001.

3.7 Posterior mean estimates of the heat flux and 98% probability bounds of the posterior distributions when d = 0.1 using a hierarchical Bayesian model (Example II). The figure on the left is obtained when σT = 0.01 and the figure on the right is obtained when σT = 0.001.

3.8 Posterior mean estimates of the heat flux and 98% probability bounds of the posterior distribution when d = 0.5 and σT = 0.01 using a hierarchical and augmented Bayesian model (Example II).

3.9 Posterior density estimate of hyper-parameter λ in the second case.

3.10 Posterior mean estimates of the heat flux and 98% probability bounds of the posterior distribution when uncertainties in d and k exist. The figure on the left is obtained using the true d and k, and the figure on the right is obtained using the nominal values of d and k (Example III).

3.11 Posterior mean estimates of the heat flux and 98% probability bounds of the posterior distribution when k and d are treated as random variables (Example III).

3.12 True heat source (left) and reconstructed heat source (right) for case II of Example IV.

3.13 Posterior mean estimate and 98% probability bounds of the posterior distribution of the step heat source at t = 0.24.

3.14 True heat source profiles for example V.

3.15 Posterior mean estimates of heat source profiles when σT = 0.02 for example V.

3.16 Computed heat source at y = 0.725 at different times (example V, σT = 0.005). Also shown are the 98% probability bounds of the posterior distribution at t = 0.05.

4.1 Schematic of the inverse radiation problem. The objective is to compute the point heat source g(t) given initial conditions, boundary conditions on the surface and temperature measurements at a number of points within the domain.

4.2 Schematic of the numerical example.

4.3 Profile of the step heat source.

4.4 Homogeneous intensity fields on y = 0.5 along directions [0.9082483, 0.2958759, 0.2958759] and [−0.9082483, 0.2958759, 0.2958759] for the step heat source.

4.5 Homogeneous temperature fields on y = 0.5 for the step heat source.

4.6 Eigenfunctions of Ih along [0.9082483, 0.2958759, 0.2958759] on y = 0.5.

4.7 Eigenfunctions of Th on y = 0.5.

4.8 Homogeneous temperature field computed using the POD method on y = 0.5 for the step heat source.

4.9 Temperature evolution at thermocouple locations for the step heat source.

4.10 MAP estimates for the step heat source.

4.11 Posterior mean estimate of the step heat source and probability bounds of the posterior distribution when σT = 0.01.

4.12 Profile of the triangular heat source.

4.13 MAP estimates for the triangular heat source case.

4.14 Posterior mean estimate of the triangular heat source and probability bounds of the posterior distribution when σ = 0.01.

5.1 True and posterior mean estimate of concentration at t = 1.1.

5.2 True and posterior mean estimate of concentration at t = 1.9.

5.3 Posterior density of structure parameter λ in obtaining the concentration estimate at t = 1.1.

5.4 Schematic of Example 2.

5.5 Reconstruction of the history of contaminant concentration: (a) the true concentrations at different past time steps; (b) the reconstructed concentrations.

5.6 Reconstruction of the concentration at t = 0: (a) data are collected at 9 × 9 sensor locations at t = 0.2; (b) data are collected at 5 × 5 sensor locations at t = 1.0.

5.7 Reconstruction of the history of pollutant concentration: (a) shows the true concentrations at different past time steps and (b) shows the reconstructions.

5.8 Reconstruction of the contamination history when data are collected at 9 × 9 sensor locations at t = 1.0.

5.9 Reconstruction of the history of pollutant concentration in a heterogeneous medium (data are collected on a 32 × 32 grid): (a) the true concentrations at different past time steps and (b) the computed reconstructions.

5.10 Reconstruction of the history of pollutant concentration in a heterogeneous medium (data are collected on a 16 × 16 grid).

6.1 Schematic of the directional solidification system. A time-varying magnetic field with spatial gradient ∂B/∂z is applied in the z direction.

6.2 Schematic of a Bayesian filter with Markov properties.

6.3 Snapshots of the solidification process without magnetic gradient control applied. The left figures are the temperature fields and the right ones are the streamlines.

6.4 Configuration of the optimal magnetic gradient when λ = 0.1.

6.5 Configuration of the optimal magnetic gradient when λ = 0.5.

6.6 Configuration of the optimal magnetic gradient when λ = 1.

6.7 Snapshots of the solidification process with the optimal magnetic gradient applied when λ = 0.1. The left figures are the temperature fields and the right ones are the streamlines.

6.8 Snapshots of the solidification process with the optimal magnetic gradient applied when λ = 0.5. The left figures are the temperature fields and the right ones are the streamlines.

6.9 Snapshots of the solidification process with the optimal magnetic gradient applied when λ = 1. The left figures are the temperature fields and the right ones are the streamlines.

6.10 Configuration of the optimal magnetic gradient when the boundary heat flux has random fluctuation with a uniform distribution.

6.11 Configuration of the optimal magnetic gradient when the boundary heat flux has random fluctuation with a Gaussian distribution.

6.12 Snapshots of the solidification process with the optimal magnetic gradient applied when the boundary heat flux has random fluctuation with a uniform distribution. The left figures are the temperature fields and the right ones are the streamlines.

6.13 Snapshots of the solidification process with the optimal magnetic gradient applied when the boundary heat flux has random fluctuation with a Gaussian distribution. The left figures are the temperature fields and the right ones are the streamlines.

7.1 Schematic of a 9-spot problem. An injection well is located at the center of the domain and 8 production wells are distributed at the remaining nodes of a 2 × 2 grid. In general, for an n-spot problem, the n wells are distributed at the nodes of a (√n − 1) × (√n − 1) grid with the single injection well at the center.

7.2 The log permeability of a random porous medium. Two large magnitude discontinuities occur within two darker areas ([2, 4] × [4, 6] and [4, 6] × [2, 4]). Within each of these areas, the permeability is a correlated Gaussian random field with a correlation function of ρ(r) = e^(−r²), with r being the spatial distance between two locations. The random variations within each darker area have much smaller magnitude than the average magnitudes of the permeability of both darker areas.

7.3 The enlarged upper-left darker area ([2, 4] × [4, 6]) of the log permeability in Fig. 7.2.

7.4 The enlarged lower-right darker area ([4, 6] × [2, 4]) of the log permeability in Fig. 7.2.

7.5 A scheme of the hierarchical Markov tree model.

7.6 Schematics of single component updating (upper-left), block updating (upper-right), and whole field updating (lower).

7.7 The true permeability field in example I.

7.8 The permeability estimate on a 32 × 32 grid using data at 24 wells.

7.9 The permeability estimate on a 16 × 16 grid using data at 24 wells.

7.10 The permeability estimate on an 8 × 8 resolution using data at 24 wells.

7.11 The permeability estimate on a 32 × 32 resolution using data at 8 wells without smoothing.

7.12 The permeability estimate on a 32 × 32 resolution using data at 8 wells with smoothing.

7.13 The permeability estimate on a 16 × 16 resolution using data at 8 wells with smoothing.

7.14 The permeability estimate on an 8 × 8 resolution using data at 8 wells with smoothing.

7.15 The coarse scale estimate of the random heterogeneous permeability (logarithm of the permeability is plotted).

7.16 Realization I of the fine scale log permeability distribution. The left plot is the entire field. The middle plot is the enlarged upper-left darker area ([2, 4] × [4, 6]). The right plot is the enlarged lower-right darker area ([4, 6] × [2, 4]).

7.17 Realization II of the fine scale log permeability distribution. The left plot is the entire field. The middle plot is the enlarged upper-left darker area ([2, 4] × [4, 6]). The right plot is the enlarged lower-right darker area ([4, 6] × [2, 4]).

7.18 Realization III of the fine scale log permeability distribution. The left plot is the entire field. The middle plot is the enlarged upper-left darker area ([2, 4] × [4, 6]). The right plot is the enlarged lower-right darker area ([4, 6] × [2, 4]).

7.19 Sample mean of the fine scale log permeability distribution. The left plot is the entire field. The middle plot is the enlarged upper-left darker area ([2, 4] × [4, 6]). The right plot is the enlarged lower-right darker area ([4, 6] × [2, 4]).

7.20 True log permeability with correlation coefficient ρ = e^(−|r|). The left plot is the entire field. The middle plot is the enlarged upper-left darker area ([2, 4] × [4, 6]). The right plot is the enlarged lower-right darker area ([4, 6] × [2, 4]).

7.21 Realization I of the fine scale log permeability distribution with ρ = e^(−|r|). The left plot is the entire field. The middle plot is the enlarged upper-left darker area ([2, 4] × [4, 6]). The right plot is the enlarged lower-right darker area ([4, 6] × [2, 4]).

7.22 Realization II of the fine scale log permeability distribution with ρ = e^(−|r|). The left plot is the entire field. The middle plot is the enlarged upper-left darker area ([2, 4] × [4, 6]). The right plot is the enlarged lower-right darker area ([4, 6] × [2, 4]).

Chapter 1

Introduction

Continuum systems herein refer to physical systems described by partial differential

equations (PDEs). The direct or forward problem in continuum systems (direct

continuum problem) computes the solution to the PDEs with complete specifica-

tion of all pertinent physical information including spatial and temporal domains,

boundary conditions, initial conditions, as well as other physical parameters. In contrast, the inverse problem in continuum systems (inverse continuum prob-

lem) concerns determination of unknown boundary conditions, initial conditions,

physical parameters or geometry from known information about the governing con-

tinuum fields. This known information usually takes the form of discrete values of

these fields (experimental data or desired system response) at given spatial and tem-

poral locations. Thus the majority of inverse problems are data-driven in nature.

Inverse continuum problems are generally stochastic, e.g. random errors are always

present in experimentally collected data.

Inverse continuum problems have major applications in almost all scientific and

engineering areas. For example, among the physical problems studied in this the-

sis, inverse heat conduction is of interest in broad areas including manufacturing


process control; metallurgy; chemical, aerospace and nuclear engineering; food sci-

ence; medical diagnostics; etc. [1]. In this problem, one seeks the heat flux along

part of the boundary of a domain given sufficient conditions along the remaining

part of the boundary and temperature measurements within the domain of a con-

ducting solid. Another example is the study of inverse thermal radiation, in which

the heat source is determined from temperature data. This problem is motivated

by thermal control applications in space technology, combustion, high temperature

forming and coating technology, solar energy utilization, high-temperature engines,

furnace technology and other areas [2]. Inverse continuum problems are very com-

mon in many mass transport processes such as detection of a contaminant source

in groundwater, estimation of heterogeneous permeability of subsurface structures,

identification of species generation rates in chemical reactions, identification of in-

jection rates or initial concentrations in miscible/immiscible displacements of fluids

in porous media, etc. In addition, design and many open-loop control problems in

continuum systems can be posed as inverse problems by treating design or control

objectives as measured data.

Inverse continuum problems have received enormous research interest because of

their technological significance and their mathematical and computational difficul-

ties. The main characteristic of an inverse continuum problem versus a well-posed

direct continuum problem is that it leads to solutions that generally are not unique or

stable to small changes in the given data [3], and this characteristic is often referred

to as ill-posedness. Additional features of solution procedures for solving inverse

continuum problems include the complexity of the direct simulation and the result-

ing implicit inverse formulation, the high computational cost, and the existence of

various uncertainties. The inverse problem requires solution of the direct continuum


process specified by a coupled system of PDEs. Discontinuities and singularities are

often expected in the unknowns as in the case of estimating heterogeneous random

fields in porous media. For a complex continuum transport system, the degrees-

of-freedom (DOF) of the numerical simulation can easily reach the hundreds of

thousands, millions or higher. Uncertainties are also unavoidable and have critical

effects on the inverse solutions. They arise, for example, from instability of the

physics, from insufficient knowledge of the underlying physical and mathematical

models, from lack of knowledge of material properties and initial/boundary condi-

tions, from propagation of errors in the simulation and of course from noise-polluted

experimental data. These features are summarized in Box 1.

• ill-posedness (non-unique solution and instability to random error)

• complexity of direct simulation

• non-linearity and complex objective function response surface

• high computational cost

• discontinuity in the distributed unknown fields

• system and measurement uncertainties

• sparseness of the data

Box 1: Features of inverse continuum problems

Many methods have been developed for the solution of inverse continuum prob-

lems with the majority of them restating the problem as a least-squares minimization

problem [4, 5, 6] (or using other appropriate norms of the deviation between the com-

puted PDE variables at the sensor locations for a guessed inverse solution and the


given measured data). The inverse problem is formulated using either a parametric

approach, where the unknown is first discretized using specified basis functions and

the coefficients are then estimated, or using an infinite-dimension functional opti-

mization approach, in which the functional form of the unknown is not prescribed

[7]. In addition to the direct problem, appropriate continuum or discrete sensitivity

and/or adjoint problems are usually required [8, 9]. The ill-posedness of the inverse

problem is addressed using the Tikhonov regularization technique [10, 11, 12], the

future information method [1, 13], the iterative regularization technique [3, 14], or

the mollification method [15]. The optimization problem is usually solved via a gra-

dient method such as the conjugate gradient method in either a finite or an infinite

dimensional space.

The above deterministic inverse techniques lead to point estimates of unknowns

without rigorously considering the statistical nature of system uncertainties and

without providing quantification of the uncertainty in the inverse solution. For many

inverse problems, these methods do not provide satisfactory solutions. The existing

regularization methods smooth the inverse solution without resolving its disconti-

nuities. Such techniques omit two types of useful information in regularizing the

ill-posed inverse problems: the accumulated prior knowledge of the unknowns and

the spatial and temporal dependence among the unknowns. These drawbacks restrict

their ability to solve distributed parameter estimation problems. Also, the selection

of an optimal regularization parameter is a problem common to all regularization

methods. Furthermore, the gradient method used in minimizing the deterministic

optimization objective often fails to locate the global minimum for highly nonlinear

problems. Finally, since the existing inverse methods do not consider prior informa-

tion regarding the unknowns besides measured data, obtaining a valid solution to


the inverse problem becomes less feasible when only sparse data are available.

• models uncertainties probabilistically and determines their propagation through the inverse solution

• solves the direct problem deterministically with reduced-order models

• transforms the inverse problem into a well-posed problem in an expanded stochastic space

• explores the state space of the regularization parameter and allows selection of its optimal value

• links with spatial statistics models for prior distribution modeling

• obtains accurate estimates using sparse data through prior distribution modeling

• enables computation of statistics of the inverse solution

Box 2: Advantages of the Bayesian computational approach

With the rapid growth of computational power and critical demands on robust-

ness and reliability, solving inverse problems under uncertainty has become more and

more important. Lately a number of methods have been proposed to solve stochastic

inverse problems, including the sensitivity analysis [16, 17], the extended maximum

likelihood estimator (MLE) approach [18, 19], the spectral stochastic method [20],

and the Bayesian inference method [21, 22, 23]. While all stochastic inverse meth-

ods can account for uncertainties and are able to provide point estimates to the

inverse solution with credible intervals, the Bayesian approach offers additional advantages. In

the Bayesian approach, a prior distribution model is combined with the likelihood to

formulate the posterior probability density function (PPDF) of the unknown vari-

able [24, 25]. The Bayesian approach provides a complete probabilistic description


of the unknown quantities given all related observations. The method regularizes

the ill-posed inverse problem through prior distribution modeling [21] and in addi-

tion provides means to estimate the statistics of uncertainties. The advantages of

the Bayesian approach are summarized in Box 2.

The Bayesian approach can probabilistically model various system uncertainties

and determine their propagation to the inverse solution [26]. Therefore, it provides

not only point estimates but also the probability distribution of the unknown quan-

tities conditional on available data [27]. The Bayesian method explores statistics of

random data error, which is rather critical because solutions to inverse problems are

extremely sensitive to data error [28]. In addition, unlike other techniques that aim

to regularize the ill-posed inverse problem to achieve a point estimate, the Bayesian

method treats the inverse problem as a well-posed problem in an expanded sto-

chastic space [27]. Even when seeking only a point estimate, the Bayesian method

can provide more flexible regularization to the inverse problem [29, 30] in the sense

that the non-trivial problem of selecting the regularization parameter [10] is solved

through hierarchical Bayesian formulations [31]. Furthermore, under the Bayesian

framework, the forward problems are solved deterministically and the uncertain-

ties are accounted for solely through statistical inference [31]. Hence, legacy scien-

tific computational methods that simulate continuum processes can be used jointly

with Bayesian computational algorithms. Bayesian regularization through prior dis-

tribution modeling can deal with arbitrary unknown fields using spatial statistics

models [33, 34, 35]. It can provide accurate estimates with a limited number of mea-

surements when reliable prior models are available [36, 37]. Finally, the available

sampling strategies [38, 39] associated with Bayesian computation, especially the

Markov chain Monte Carlo (MCMC) simulation tools [40, 41, 42, 43, 44], are capa-


ble of overcoming the difficulties encountered when dealing with the optimization

of nonlinear problems of high-dimensionality.

Despite the rather long history of Bayesian statistics and the development of MCMC simulation methods over the past several decades, there are few applications of Bayesian statistics to engineering inverse problems. The related

previous work includes those of Beck et al. [45] to structural models, of Kaipio et al.

[46] to electrical impedance tomography, of Sabin et al. [47] to grain size prediction,

of Michalak et al. [48] to flow in porous media, of Osio [49] to engineering design,

and of Higdon et al. [50] to petroleum engineering.

In this thesis, a computational framework that integrates computational math-

ematics aspects of PDEs, Bayesian statistics, statistical computation, as well as

reduced-order modeling is developed to address data-driven inverse problems in con-

tinuum systems. The components of this framework include hierarchical Bayesian

estimation, prior modeling of a distributed parameter via spatial statistics, explo-

ration of implicit posterior distribution using Markov chain Monte Carlo (MCMC)

simulation, proper orthogonal decomposition (POD) based reduced-order modeling

of PDE systems, as well as sequential Bayesian estimation. The emphasis in this the-

sis is on three aspects of inverse continuum problems: (i) developing methodologies

that enable probabilistic modeling and quantification of various uncertainties aris-

ing from physical instability, model and parameter insufficiency, and measurement

errors; (ii) computing inverse solutions with full-probabilistic specification; and (iii)

designing computational tools to address the high-computational cost in stochastic

optimization.

In the following chapters, presentation of the methodologies is fused with solu-

tions to physical problems including inverse heat conduction, inverse heat radiation,


contaminant detection in porous media flows, control of directional solidification,

and multiscale permeability estimation in heterogeneous media. The list of physical

problems is given in Box 3. These problems were selected due to their technical sig-

nificance as well as their ability to demonstrate the attributes of the Bayesian method.

Among these problems, the inverse heat conduction problems, the inverse radia-

tion problem, and the solidification control problem have been studied by other

researchers. However, the Bayesian method improves the solutions to these prob-

lems in the following sense: i.) it quantifies the uncertainties in the inverse heat

conduction and radiation problems and provides statistics of the inverse solutions;

ii.) it allows estimates of discontinuous unknown heat source with sparse temper-

ature measurements; iii.) it saves computational time and memory cost for the

solidification control problem; and iv.) it enables an intelligent way to select the

regularization parameters in all of these problems. The problems of contaminant

detection in porous media flow (with heterogeneous permeability and anisotropic

dispersion) and the multiscale permeability estimation are unsolved problems (al-

though some related research was performed earlier as reviewed in the corresponding

chapters). It will be shown that the Bayesian method provides satisfactory solutions

to these problems.

The developed methodologies are generic and applicable to many other inverse

continuum problems. The new Bayesian computational approach provides a new

outlook to the solution of inverse continuum problems. This work will benefit a

variety of engineering and scientific areas such as materials and chemical process

monitoring/control, metallurgy, geology, combustion diagnostics, nuclear engineer-

ing, etc.


• inverse heat conduction problems (IHCP)

• inverse heat radiation problem (IHRP)

• backward contamination source estimation problem (BCSEP)

• open-loop control of directional solidification using a non-uniform external magnetic field

• multiscale permeability estimation in random heterogeneous porous media

Box 3: Inverse problems of interest in this study

The outline of this thesis is as follows. In Chapter 2, the fundamental knowledge

of Bayesian computation pertaining to this study is reviewed, including Bayesian

statistics, MCMC and MRF. Chapter 3 elaborates the formulation of the posterior

probability density function (PPDF) for inverse heat conduction problems, includ-

ing boundary heat flux estimation, physical parameter estimation, and heat source

estimation. A sequence of MCMC samplers are designed for the posterior explo-

ration of IHCPs. In particular, hierarchical and augmented Bayesian models are

introduced for the purpose of automated regularization parameter selection as well

as uncertainty statistics estimation. Markov random fields (MRF) [51, 52] are used

to model the prior distributions of quantities varying in both space and time such as

heat flux and heat source. The objective of Chapter 3 is to demonstrate the funda-

mental steps in applying Bayesian computation to inverse continuum problems. In

Chapter 4, the Bayesian method is extended to a computationally expensive inverse

radiation problem. The focus of this Chapter is to fuse Bayesian inversion with

reduced-order modeling to address the high-computational cost associated with sto-

chastic inverse methods. In Chapter 5, Bayesian computation is further applied to


a backward inverse problem for estimation of contaminant source in porous media

flow. It demonstrates a backward marching in time procedure for addressing initial condition estimation problems. This is another type of inverse continuum problem besides the boundary value estimation problems discussed in the previous

two chapters. Chapter 6 discusses a Bayesian filter approach for open-loop control

of directional solidification. This study completes the Bayesian computational framework by introducing sequential Bayesian estimation as a complement to the

whole-time domain method addressed in the previous chapters. It is aimed at illus-

trating a group of powerful sequential Bayesian computational methods that can be

applied to data-driven inversion in a dynamic continuum system. A multiscale in-

verse problem of estimating the permeability of a heterogeneous porous medium using

flow data is finally presented in Chapter 7. The hierarchical Markov tree (HMT)

model is introduced to model random parameters at different length scales. The

conclusions of this work and suggestions for future research are finally summarized

in Chapter 8.

Chapter 2

Fundamentals of Bayesian

computation and Markov Random

Field (MRF)

This chapter provides information about Bayesian statistical analysis, the Markov

chain Monte Carlo (MCMC) computation method and Markov Random Field (MRF)

that are necessary background for the work presented in subsequent chapters. The

more advanced models and algorithms used to solve the actual problems in this thesis

are introduced in later chapters as the applications arise. For additional informa-

tion about these topics that is not covered in this thesis, readers are encouraged to

consult [24, 25, 26, 27].

2.1 Bayesian statistical analysis

Bayesian statistics studies the probability of a hypothesis based on both currently acquired information (data) and previous knowledge (the prior distribution) [26]. It provides


powerful analytical tools for parameter estimation, hypothesis testing, and model

selection problems, in particular those related to data-driven identification of input

parameters, satisfying robust design requirements, and real-time decision making.

The basis of Bayesian analysis is the Bayes’ formula:

p(θ|Y) = p(Y|θ)p(θ)/p(Y) = (1/c) p(Y|θ)p(θ). (2.1)

Here, θ is used to represent a hypothesis or a parameter and Y stands for the

observation (data) related to θ. p(θ|Y ), p(Y |θ) and p(θ) are the posterior probability

density function (PPDF), the likelihood and the prior probability density function,

respectively. Eq. (2.1) states that the posterior probability of a hypothesis given

some observations is proportional to the product of its likelihood and the prior

(unconditional) probability.

When θ represents a random parameter, the Bayes formula defines its distribu-

tion conditional on the data, which is the most complete probabilistic description

of the parameter. Hence, in a Bayesian estimation approach, the primary objective

is to derive the posterior distribution. Once the posterior distribution is known,

several point estimators can be defined, such as the Maximum A Posteriori (MAP) estimator:

θ_MAP = argmax_θ p(θ|Y), (2.2)

and the posterior mean estimator:

θ_postmean = E[θ|Y]. (2.3)

However, it is worth emphasizing that it is more meaningful to discuss the probability that an unknown variable lies within a certain range rather than that it takes a particular value. Therefore, estimating a distribution makes more practical sense than computing point estimates.
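As a concrete illustration of these two estimators (a minimal sketch with assumed Gaussian prior, noise level, and grid bounds, not taken from the thesis), the posterior of Eq. (2.1) can be evaluated on a grid for a scalar θ and the MAP and posterior mean read off numerically:

import numpy as np

theta = np.linspace(-5.0, 5.0, 2001)               # discretized state space of theta
prior = np.exp(-0.5 * theta**2)                    # assumed N(0,1) prior, unnormalized
Y, sigma = 1.3, 0.5                                # assumed single observation and noise std
likelihood = np.exp(-0.5 * ((Y - theta) / sigma)**2)   # assumed identity model F(theta) = theta

posterior = likelihood * prior                     # numerator of Eq. (2.1)
posterior /= np.trapz(posterior, theta)            # normalize numerically (the constant c)

theta_map = theta[np.argmax(posterior)]            # Eq. (2.2): argmax of the posterior
theta_postmean = np.trapz(theta * posterior, theta)    # Eq. (2.3): posterior mean

print(f"MAP estimate: {theta_map:.3f}, posterior mean: {theta_postmean:.3f}")

For this linear-Gaussian toy case the two estimators coincide; they generally differ for nonlinear forward models or skewed posteriors.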


The inverse problems of interest in this thesis can be interpreted as parameter es-

timation problems by treating the unknown inverse solutions as random parameters

(when the inverse solution is a function, it can be parametrized by a projection onto

a function space spanned by finite number of basis functions). The main difference

between the Bayesian approach and the other inverse methods is that the Bayesian

approach determines the distribution of the inverse solution instead of point esti-

mates, and as a result, the inverse problem is formulated as a well-posed problem

in a stochastic space (state space defined by the prior distribution).

To obtain the posterior distribution of an inverse solution, one needs to formulate

the likelihood and the prior distribution according to the Bayes formula. Note that

it is not necessary to compute the normalizing constant c in Eq. (2.1) under most

circumstances because either an optimization problem is solved to compute the

point estimate or sampling methods are used to explore the posterior state space.

In either case, there is no need to know the normalizing constant. This greatly

simplifies the analysis and computation as it may not be trivial to calculate the

marginal distribution of Y .

It is relatively straightforward to obtain the likelihood. For instance, for a system

F (θ) with θ being the input parameter and F (·) being the system model, assume

there is a single measurement Y of F (θ) with the additive random error ω. The

likelihood in this case simply depends on the distribution of ω. If ω is a zero-mean

Gaussian variable and the standard deviation of ω is σ (which may be unknown), the likelihood is as follows:

p(Y|θ, σ²) = (1/√(2πσ²)) exp[−(F(θ) − Y)²/(2σ²)]. (2.4)

Modeling of the prior distribution is more complicated. Standard techniques such

as conjugate priors and Jeffreys priors can be used to derive compatible priors


when the likelihoods have explicit functional forms. For problems studied in this

thesis, the system models are PDEs that can generally only be solved using numerical

methods. Therefore, the likelihoods are implicit (numerical solvers). Considering

the fact that the majority of the inverse solutions are distributed parameters or

continuous functions in space and time, spatial statistics models are proper candidates for the priors. A special type of spatial statistics model, the Markov

Random Field (MRF) is used extensively in this study for the prior distribution

modeling of distributed random quantities on finite lattices. The MRF models are

introduced in Section 2.3.

In addition to the basic Bayesian posterior distribution formula, the hyper-

parameters, which are the parameters in the prior distribution of a Bayesian for-

mulation, can also be modeled as random variables and have their own prior dis-

tributions. This leads to a multi-layer hierarchical Bayesian posterior formulation.

The standard prior modeling techniques are used to model the hyper-priors in this

course of study. Through hierarchical Bayesian modeling, the effect of prior un-

certainty, namely the poor knowledge of hyper-parameters, is diminished in the

inverse solutions. This approach also enables a mechanism to select optimal regu-

larization parameters automatically in computing point estimates. As an additional

advantage, the statistics of system errors can be computed from the hierarchical

formulations.

For the likelihood in Eq. (2.4), if the primary parameter θ is assumed to have a Gaussian distribution with known mean value θ̄ and unknown variance v, and σ is assumed to be known, the posterior distribution can be written as:

p(θ, v|Y) ∝ p(Y|θ, v) p(θ, v) ∝ p(Y|θ) p(θ|v) p(v)
          = (1/√(2πσ²)) exp[−(F(θ) − Y)²/(2σ²)] · (1/√(2πv)) exp[−(θ − θ̄)²/(2v)] · v^−(1+α) exp(−β/v), (2.5)

where v is the only hyper-parameter, assumed to have an inverse Gamma distribution. In principle, σ and θ̄ can be assumed to be random variables as well.
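A minimal sketch of the hierarchical posterior in Eq. (2.5) follows; the toy forward model F, the noise level sigma, the prior mean theta_bar, and the inverse-Gamma parameters alpha and beta are all illustrative assumptions (in the thesis, F is a PDE solver):

import numpy as np

def F(theta):
    # Assumed toy forward model; stands in for the numerical PDE solution.
    return theta**2 + theta

sigma, theta_bar = 0.05, 1.0      # assumed known noise std and prior mean
alpha, beta = 2.0, 1.0            # assumed inverse-Gamma hyper-prior parameters

def log_posterior(theta, v, Y):
    """Unnormalized log of Eq. (2.5): likelihood x Gaussian prior x inverse-Gamma hyper-prior."""
    if v <= 0.0:
        return -np.inf
    log_lik   = -0.5 * (F(theta) - Y)**2 / sigma**2            # likelihood term, cf. Eq. (2.4)
    log_prior = -0.5 * np.log(v) - 0.5 * (theta - theta_bar)**2 / v
    log_hyper = -(1.0 + alpha) * np.log(v) - beta / v           # inverse-Gamma prior on v
    return log_lik + log_prior + log_hyper                      # additive constants dropped

print(log_posterior(theta=0.9, v=0.2, Y=1.7))

Working with the log-posterior avoids underflow and is the form typically passed to the samplers of Section 2.2.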

2.2 Markov chain Monte Carlo (MCMC) simulation

To explore the state space defined by the posterior distribution, numerical sam-

pling methods are needed because the posteriors are usually of high dimension,

non-standard and have implicit functional forms. The most widely used numerical

method for exploring the state space of a probability distribution is the Monte Carlo

simulation (MCS), which approximates the true expectation of a function of θ by

the sample mean. MCS is based upon a large sample set from the target distribution

(here the posterior distribution p(θ|Y )). For this purpose, various sampling strate-

gies have been proposed [38]. Among these techniques, Markov chain Monte Carlo

(MCMC) is the most sophisticated and useful [40, 41]. In the following introduction

of the MCMC algorithms, p(θ) denotes any probability density function of θ (not

only the prior) and f(θ) denotes an arbitrary function of θ.

2.2.1 Monte Carlo principle

The idea of Monte Carlo simulation is to draw an independent identically distributed (iid) set of samples {θ^(i)}, i = 1, ..., L, from a target distribution p(θ) defined on a high dimensional space R^m, where m is the dimension of θ [38]. These L samples can be used to approximate the target density with the following empirical point-mass function:

p_L(θ) = (1/L) ∑_{i=1}^{L} δ_{θ^(i)}(θ), (2.6)

where δ_{θ^(i)}(θ) denotes the delta-Dirac mass located at θ^(i). Consequently, one can approximate the expectation of any function f of θ by its sample mean as follows:

E_L(f) = (1/L) ∑_{i=1}^{L} f(θ^(i)) → E(f) = ∫ f(θ) p(θ) dθ  as L → ∞. (2.7)

By the strong law of large numbers, EL(f) will converge to E(f) when the number

of samples goes to infinity. In the case f(θ) = θ, one is able to use Eq. (2.7) to

compute the mean estimate of θ. The L samples can also be used to obtain the

MAP estimate of θ as follows:

θ_MAP = argmax_{θ^(i)} p(θ^(i)). (2.8)

It is also possible to construct simulated annealing algorithms that allow us to

sample approximately from the global maximum of the target distribution [38].
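The following sketch illustrates Eqs. (2.7) and (2.8) for an assumed Gamma target, chosen only because the exact expectation is available in closed form for comparison:

import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(0)
L = 100_000
samples = rng.gamma(shape=3.0, scale=1.0, size=L)    # iid draws from the target p(theta)

f = lambda t: t**2                                   # an arbitrary function of theta
mc_estimate = f(samples).mean()                      # E_L(f), Eq. (2.7)
exact = 3.0 * 4.0                                    # E[theta^2] = k(k+1) = 12 for Gamma(3, 1)

# Eq. (2.8): pick the sample with the highest target density (mode of Gamma(3,1) is 2)
theta_map = samples[np.argmax(gamma.pdf(samples, a=3.0))]

print(f"MC estimate {mc_estimate:.3f} vs exact {exact}, sample MAP {theta_map:.3f}")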

2.2.2 MCMC algorithms

MCMC is a strategy for generating samples θ(i) while exploring the state space of

θ using a Markov chain mechanism. For a particular Markov chain designed in

this simulation, the stationary distribution of the chain is the target distribution to

sample from. A sufficient, but not necessary, condition to ensure target distribution

p(θ) as the stationary distribution is to satisfy the detailed balance:

p(θ^(i)) T(θ^(i−1)|θ^(i)) = p(θ^(i−1)) T(θ^(i)|θ^(i−1)). (2.9)

In this condition, T (θ(i)|θ(i−1)) is the transition kernel of the Markov chain. The

detailed balance condition is in fact the basis of MCMC algorithms. This mechanism

guarantees that the samples θ(i) mimic samples drawn from the target distribution

p(θ) [38]. One advantage of MCMC worth pointing out is that one can draw samples from p(θ) without knowing its normalizing constant.


The basic form of MCMC, the Metropolis-Hastings (MH) algorithm, is as follows [38]:

Algorithm I

1. Initialize θ^(0)

2. For i = 0 : N_mcmc − 1
   — sample u ∼ U(0, 1)
   — sample θ^(*) ∼ q(θ^(*)|θ^(i))
   — if u < A(θ^(*), θ^(i)) = min{1, [p(θ^(*)) q(θ^(i)|θ^(*))] / [p(θ^(i)) q(θ^(*)|θ^(i))]}
        θ^(i+1) = θ^(*)
   — else
        θ^(i+1) = θ^(i)

In the above algorithm, Nmcmc is the total number of runs, u is a random number

generated from standard uniform distribution U(0, 1), p(θ) is the target distribution,

and q(θ^(*)|θ^(i)) is a proposal distribution that has a standard form and generates a candidate sample conditional on the previous sample. By its design, the algorithm guarantees that the transition kernel of this chain satisfies detailed balance, and the samples will converge to the target distribution for any proposal distribution. However, careful design of q(θ^(*)|θ^(i)) can accelerate the convergence. Once convergence of the

chain is achieved, the samples obtained thereafter can be regarded as belonging to

the target distribution.

As a special case of the MH algorithm, the symmetric sampler, which assumes

a symmetric proposal q(θ^(*)|θ^(i)) = q(θ^(i)|θ^(*)), is often used. The acceptance probability in this case simplifies to A(θ^(*), θ^(i)) = min{1, p(θ^(*))/p(θ^(i))}.
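A minimal sketch of Algorithm I with a symmetric random-walk Gaussian proposal is given below; the bimodal target density, the step size, and the chain length are illustrative assumptions:

import numpy as np

def target_pdf(theta):
    # Unnormalized target p(theta): an assumed mixture of two Gaussians.
    return np.exp(-0.5 * (theta - 2.0)**2) + 0.5 * np.exp(-0.5 * (theta + 2.0)**2)

rng = np.random.default_rng(1)
n_mcmc, step = 50_000, 0.5
theta = 0.0                                    # initialize theta^(0)
chain = np.empty(n_mcmc)

for i in range(n_mcmc):
    proposal = theta + step * rng.standard_normal()          # symmetric proposal q
    accept_prob = min(1.0, target_pdf(proposal) / target_pdf(theta))
    if rng.uniform() < accept_prob:                          # u ~ U(0, 1)
        theta = proposal                                     # accept the candidate
    chain[i] = theta                                         # otherwise keep the old state

burn_in = 5_000
print("posterior mean estimate from the chain:", chain[burn_in:].mean())

Because the proposal is symmetric, only the ratio of unnormalized target densities is needed, which is exactly why the normalizing constant of the posterior never has to be computed.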


If the dimension of θ is high (large m), it is rather difficult to update the entire

random vector in a single MH step because the acceptance probability is usually

fairly small. A better approach is to update part of the components of θ each

time and implement an updating cycle inside each MH step, which is often termed

block-update or cycle hybrid MCMC [38]. The extreme case of this strategy is the

single-component Gibbs sampler, which updates a single component each time using

the full conditional distribution as the proposal distribution. The Gibbs sampler [53]

is the most widely used MCMC algorithm. It emphasizes the spatial ingredient of

MCMC algorithms in the sense that its specification is the same as the conditional

probability specification of a Markov Random Field [37]. For an m-dimensional

random vector θ, the full conditional distribution of the ith component θi is defined

as p(θi|θ−i), where θ−i stands for θ1, θ2, ..., θi−1, θi+1, ..., θm. When this full con-

ditional distribution is known and has standard form, it is often advantageous to

use it as the proposal distribution. The important feature of this sampler is that

the acceptance probability is always 1. This means that the candidate sample θ(∗)

generated in this way will always be accepted. The algorithm can be summarized

as follows:

Algorithm II

1. Initialize θ^(0)

2. For i = 1 : N_mcmc
   — sample θ_1^(i+1) ∼ p(θ_1 | θ_2^(i), θ_3^(i), ..., θ_m^(i))
   — sample θ_2^(i+1) ∼ p(θ_2 | θ_1^(i+1), θ_3^(i), ..., θ_m^(i))
   — ...
   — sample θ_m^(i+1) ∼ p(θ_m | θ_1^(i+1), θ_2^(i+1), ..., θ_{m−1}^(i+1))
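A minimal sketch of Algorithm II is given below for an assumed bivariate Gaussian target with correlation rho, for which both full conditionals p(θ_i|θ_{-i}) are Gaussians in closed form:

import numpy as np

rng = np.random.default_rng(2)
rho = 0.8                          # assumed correlation of the bivariate Gaussian target
n_mcmc = 20_000
theta1, theta2 = 0.0, 0.0          # initialize theta^(0)
chain = np.empty((n_mcmc, 2))

for i in range(n_mcmc):
    # full conditional p(theta1 | theta2) = N(rho * theta2, 1 - rho^2)
    theta1 = rho * theta2 + np.sqrt(1.0 - rho**2) * rng.standard_normal()
    # full conditional p(theta2 | theta1) = N(rho * theta1, 1 - rho^2)
    theta2 = rho * theta1 + np.sqrt(1.0 - rho**2) * rng.standard_normal()
    chain[i] = theta1, theta2      # every draw is accepted (acceptance probability is 1)

print("sample correlation:", np.corrcoef(chain[5000:].T)[0, 1])   # approaches rho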


2.2.3 Convergence assessment of MCMC

Although convergence of the samplers introduced above is guaranteed, there is in general no explicit indication of when the chain converges. It is clear that the convergence rate of the Gibbs sampler is the fastest because of its perfect acceptance ratio. Statisticians have developed a large number of techniques for convergence assessment [54]. In the problems studied here, the convergence of MCMC is determined by monitoring the histogram and marginal density of the accepted samples. This is a rather empirical approach, yet it is accurate enough in the current applications to render satisfactory statistics of the inverse solutions.
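As an illustrative (assumed) realization of this kind of empirical monitoring, the marginal histograms of the first and second halves of a chain can be compared; here chain stands in for any 1D array of accepted MCMC samples, and the tolerance is an arbitrary choice:

import numpy as np

def halves_agree(chain, bins=30, tol=0.05):
    """Compare the empirical marginals of the two halves of the chain."""
    half = len(chain) // 2
    lo, hi = chain.min(), chain.max()
    h1, _ = np.histogram(chain[:half], bins=bins, range=(lo, hi), density=True)
    h2, _ = np.histogram(chain[half:], bins=bins, range=(lo, hi), density=True)
    width = (hi - lo) / bins
    dist = 0.5 * np.abs(h1 - h2).sum() * width     # total-variation-like distance
    return dist, dist < tol

chain = np.random.default_rng(4).normal(size=20_000)   # stand-in for MCMC output
print(halves_agree(chain))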

2.3 Prior distribution modeling using MRF

Markov Random Field (MRF) has been successfully used for prior distribution mod-

eling in many image processing and field data analysis applications [51, 52]. In this

work, MRF is introduced for the simultaneous prior distribution modeling of dis-

tributed unknowns in space and time by treating time as another spatial dimension.

Consequently, in discussion of the MRF, the unknown θ is treated as a collection

of spatially distributed random variables on a finite lattice. The canonical form of

MRF is a point-pair spatial model of θ:

p(θ) ∝ exp[−∑_{i∼j} W_ij Φ(γ(θ_i − θ_j))], (2.10)

in which θi is the unknown variable at spatial site i, γ is a scaling parameter and Φ

is an even function (Φ(−x) = Φ(x)) that determines the specific form of the MRF.

The summation in Eq. (2.10) is over all pairs of sites i ∼ j that are neighbors

and the W_ij's are specified nonzero weights. In general, the neighbors of a particular

unknown at a given location of a finite lattice refer to unknowns at adjacent points


on the same lattice.

The Φ used in the current study is in the form Φ(u) = (1/2)u², which is a widely used model in spatial problems [51]. The MRF then can be rewritten as:

p(θ) ∝ λ^(m/2) exp(−(1/2) λ θᵀ W θ). (2.11)

In the above one-parameter model, the entries of the m × m matrix W are determined as W_ij = n_i if i = j, W_ij = −1 if i and j are neighbors, and W_ij = 0 otherwise, where n_i is the number of neighbors of site i. W determines the dependence between components

of θ and λ controls the scale on which the random vector is distributed. This

simple form of MRF has been reported to be effective in a number of applications

[25, 36, 37].

[Figure 2.1: Schematic of Bayesian computation for inverse continuum problems. The schematic links prior distribution modeling (conjugate priors, physical constraints, spatial statistical models) and likelihood computation (computational mathematics, POD reduced-order modeling, parallel computation) through a hierarchical Bayesian formulation p(θ, σ|Y) ∝ p(Y|θ, σ) p(θ) p(σ), whose posterior is explored by MCMC (Metropolis-Hastings, symmetric, independent, hybrid/cyclic, and sequential samplers).]

The prior introduced by the above MRF model is invariant under space shift,

therefore, it will not over-constrain the state space of θ. It is also able to model

different spatial dependences among the variables by adjusting the entries of W .

This prior distribution is improper in the sense that its integral is unbounded.


However, the single impropriety (the W matrix has rank m-1) in this prior is removed

from the corresponding posterior distribution by the presence of any informative

data. The scaling parameter λ is also of great importance. It controls the strength

of spatial dependence and regularization of the inverse problem.

2.4 Generic Bayesian computational framework

for inverse continuum problems

The generic Bayesian framework for the solution of complex inverse continuum prob-

lems is shown in Fig. 2.1. The major steps are prior distribution modeling, Bayesian

formulation, likelihood computation and posterior computation.

In the following chapters, details of how to implement this framework are dis-

cussed via the solutions to specific physical problems. The prior distribution model-

ing is based on MRF and the conjugate priors discussed above. Hierarchical Bayesian

formulations are used in most circumstances in conjunction with different system

models. Efficient computation of the likelihood is another focus. The computational

mathematics and reduced-order modeling will be linked with MCMC algorithms to

address this issue. The basic MCMC algorithms introduced in this chapter will be

enhanced to address specific needs of each of the applications considered later in

this thesis.

Chapter 3

Inverse heat conduction problems

(IHCP) - A Bayesian approach

In this chapter, the Bayesian computational approach is applied to solve some benchmark inverse problems in heat conduction processes. The objective is two-fold: i.)

to illustrate how the Bayesian approach can be applied to address inverse continuum

problems; and ii.) to demonstrate the advantages of this new approach.

Thermal property estimation, boundary heat flux reconstruction, and heat source

identification are the most commonly encountered inverse problems in heat conduc-

tion. These problems are posed when direct measurement of the above physical quantities is not feasible. Although a number of deterministic optimization theories and

algorithms have been developed toward the solution of these problems [6], many

difficulties remain unresolved, such as reconstructing discontinuous heat sources and

selecting the regularization parameter. These problems will be addressed herein

using the Bayesian method.

The outline of this chapter is as follows. In Section 3.1, a rigorous definition of the inverse heat conduction problems (IHCP) is given. This is followed by the


formulation of the posterior probability density function (PPDF) for IHCP with

consideration of system uncertainties and measurement noise in Section 3.2. Hierar-

chical Bayesian models are introduced in this section for the regularization parameter

selection. Section 3.3 discusses the stochastic parameter estimation problem as a

subcase of the formulation given in Section 3.2. In Section 3.4, the boundary heat

flux and heat source reconstruction problems are studied. A sequence of MCMC

algorithms are designed with emphasis on the single component update scheme in

Sections 3.3 and 3.4. Several numerical examples are presented in Section 3.5 to

demonstrate the developed methodologies. A brief summary is given in Section 3.6.

3.1 The inverse heat conduction problems

The classical inverse heat conduction problem (IHCP) refers to the calculation of an

unknown heat flux given temperature measurements in the domain of a conducting

solid. In general, this inverse heat conduction problem can be defined through the

following equations (see Fig. 3.1),

ρ C_p ∂T/∂t = ∇·(k ∇T) + f(x, t),   in Ω, t ∈ [0, t_max],   (3.1)

T(x, t) = T_g,   on Γ_g, t ∈ [0, t_max],   (3.2)

k ∂T(x, t)/∂n = q_h,   on Γ_h, t ∈ [0, t_max],   (3.3)

k ∂T(x, t)/∂n = q_0,   on Γ_0, t ∈ [0, t_max],   (3.4)

T(x, 0) = T_0(x),   in Ω,   (3.5)

where ρ, Cp, k denote the density, heat capacity and thermal conductivity, re-

spectively. Also, f is the heat source, Tg, T0 and qh are the known temperature

conditions along boundary Γg, known initial temperature condition and known heat flux condition on boundary Γh, respectively.

Figure 3.1: Schematic for inverse problems in heat conduction. The domain Ω has boundary segments Γg (known temperature), Γh (known heat flux) and Γ0 (unknown heat flux), with thermocouples and heat sources located inside the domain. The main unknowns considered include the conductivity k, the heat flux q0 on Γ0 or the heat source f(x, t) in Ω.

In the classical IHCP, the main unknown is the heat flux q0 on the boundary Γ0 [3, 31]. The reconstruction of this

unknown heat flux becomes feasible with measurement of the temperature field at

distinct points within Ω × [0, tmax]. Let Y denote the measured temperature data,

i.e. Y = [Y_1^(1), Y_2^(1), ..., Y_M^(1), Y_1^(2), Y_2^(2), ..., Y_M^(2), ..., Y_1^(N), Y_2^(N), ..., Y_M^(N)]^T, with

Y_i^(j) = T(x_i, t_j) + ω,   (3.6)

where i = 1, . . . , M , j = 1, . . . , N and tN = tmax. M and N are the number

of thermocouples and number of measurements at each site, respectively, and ω

is the random measurement noise. Eqs.(3.1)-(3.5) define a well-posed direct heat

conduction problem for each guessed heat flux q0 on Γ0× [0, tmax]. For simplicity of

the presentation, it is assumed that only one sensor is used with its location denoted

by the vector d.

Other related inverse heat conduction problems include thermal parameter esti-

mation problems (e.g. estimating the thermal conductivity k) [55] and identification


of the heat source function f(x, t) [56, 57, 58]. In all these inverse problems, the

missing information can be deduced from the temperature measurements at the

thermocouple locations as given in Eq. (3.6).

In most deterministic approaches to the classical IHCP, one looks for a flux q̂_0(x, t) ∈ L2(Γ0 × [0, t_max]) such that:

J(q̂_0) ≤ J(q_0),   ∀ q_0 ∈ L2(Γ0 × [0, t_max])   (3.7)

where, L2(Γ0×[0, tmax]) is the space of all square integrable functions defined over the

spatial and temporal domains Γ0 and [0, tmax], respectively. The objective function

J (q0) ≡ J (θ) to be minimized is usually chosen as the L2 norm of the error between

the estimated and measured temperatures along the sensor locations:

J(q_0) = (1/2) Σ_{i=1}^{M} Σ_{j=1}^{N} [ T(x_i, t_j; q_0) − Y(x_i, t_j) ]²  =  (1/2) ‖F(θ) − Y‖²_L2   (3.8)

where the solution T(x, t; q_0) of the parametric direct problem was defined earlier. The discrete L2 norm is also introduced above to simplify the notation of the cost function in the following analysis.

In the present implementation of the IHCP, the unknown heat flux q0(x, t) is

discretized linearly in space and time using finite element interpolation for the grid

and time-stepping that is also used in the direct heat conduction analysis. However,

the space/time discretization used in the direct problem is generally finer than that

used in the discretization of q0 to avoid so-called “inverse crime” [27]. Thus the

unknown q0 can be written as:

q_0(x, t) = Σ_{i=1}^{m} θ_i w_i(x, t)   (3.9)


where the w_i's are pre-defined finite element basis functions. The IHCP is then transformed into the estimation of the weights θ_i. These weights are collected into an unknown random vector θ of length m.
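The following minimal sketch illustrates this parameterization for a flux that depends on time only, i.e. q0(t) = Σ θ_i w_i(t) with linear "hat" basis functions on a uniform temporal grid; the number of nodes and the values of θ are illustrative and unrelated to the examples of Section 3.5.

# Minimal sketch of the parameterization in Eq. (3.9) for a time-dependent flux
# expanded in piecewise-linear ("hat") finite element basis functions.
import numpy as np

def hat_basis(t, nodes, i):
    """Piecewise-linear basis w_i(t): equal to 1 at nodes[i] and 0 at the other nodes."""
    return np.interp(t, nodes, np.eye(len(nodes))[i])

def flux(t, theta, nodes):
    """q0(t) = sum_i theta_i w_i(t)."""
    return sum(th * hat_basis(t, nodes, i) for i, th in enumerate(theta))

nodes = np.linspace(0.0, 1.0, 6)                      # m = 6 temporal nodes (illustrative)
theta = np.array([0.0, 0.5, 1.0, 1.0, 0.2, 0.0])      # illustrative weights
t = np.linspace(0.0, 1.0, 101)
q = flux(t, theta, nodes)
print(flux(np.array([nodes[2]]), theta, nodes))       # reproduces theta_2 = 1.0 at that node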

Let us denote with ωm the sensor uncertainty (sensor noise). Then one looks for

the vector θ such that:

Y ≈ F(θ) + ω_m   (3.10)

Direct inversion of Eq. (3.10) (or direct optimization of Eq. (3.8)) to compute

the heat flux is not feasible as it leads to an ill-posed system of equations. In most

deterministic approaches to the IHCP, it is assumed that a quasi-solution to the

inverse problem exists in the sense of Tikhonov [10]. A regularization term, which

is usually the L2 norm of the unknown heat flux or its derivatives, is added to the

objective function (e.g. Eq. (3.8)) to ensure the uniqueness and smoothness of

the inverse solution. The Bayesian approach introduced below allows more flexible

treatment of the inverse problem.

3.2 Bayesian formulation of the inverse heat con-

duction problems

In the following, the thermal conductivity k and the thermocouple location d are

modeled as random variables, and the boundary heat flux q0 is modeled as a stochas-

tic process. It is obvious that the true values of these assumed random quantities are

fixed. The rationale for modeling them as random variables or stochastic processes is that they are all inferred from noise-polluted data; hence, uncertainty exists

in our knowledge of these quantities. In this discussion of the classical inverse heat

conduction problem, the heat source is assumed to be a known quantity. The heat


source identification problem can be addressed simply by replacing the heat flux

term with the heat source term in all following developments. Examples of heat

source identification are presented in Section 3.5 to emphasize the general applica-

bility of the following methodology.

3.2.1 The likelihood

The measurement errors are assumed to be independent identically distributed

(i.i.d.) Gauss random noise with zero mean and variance vT (standard deviation

σT ). It is assumed herein that the numerical errors induced by F are much lower in

magnitude than the measurement errors. This assumption may cause some bias in

the estimation of statistics of measurement noise, however, its effect on the inverse

solution is considered minor in the numerical experiments discussed in this chapter.

Subsequently, the likelihood can be written as,

p(Y | θ, k, d) = 1 / ( (2π)^{n/2} v_T^{n/2} ) exp{ −(Y − F(θ, k, d))^T (Y − F(θ, k, d)) / (2 v_T) },   (3.11)

where n = N ×M is the total number of measurements.
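In an implementation, the (logarithm of the) likelihood in Eq. (3.11) is evaluated from the residual between the data and the forward prediction. The sketch below uses an arbitrary stand-in for the forward map F(θ, k, d); the data and noise variance are illustrative.

# Minimal sketch of the Gaussian log-likelihood of Eq. (3.11).
import numpy as np

def log_likelihood(Y, F_pred, v_T):
    """log p(Y | theta, k, d), including the Gaussian normalization constant."""
    n = Y.size
    resid = Y - F_pred
    return -0.5 * n * np.log(2.0 * np.pi * v_T) - resid @ resid / (2.0 * v_T)

rng = np.random.default_rng(0)
truth = np.sin(np.linspace(0, 1, 50))        # stand-in for the forward prediction F(theta, k, d)
Y = truth + rng.normal(0.0, 0.005, size=50)  # noisy synthetic "measurements"
print(log_likelihood(Y, truth, v_T=0.005 ** 2))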

3.2.2 Prior distribution modeling

In the current study, the criteria for choosing prior distributions are: i) using conjugate priors for all hyper-parameters and lumped random variables, and ii) using the MRF or its variants for all distributed primary unknowns. Consequently, conjugate pri-

ors [26] are used for random variables k and d, and the point-pair Markov random

field (MRF) model introduced in previous chapter is adopted for prior distribution

modeling of the heat flux:

p(θ) ∝ λ^{m/2} exp{ −(1/2) λ θ^T W θ },   (3.12)


where m is the dimension of θ. Each component of θ, namely θi, represents the value

of the heat flux at a site (node) of a finite temporal-spatial lattice by choosing the

basis functions in Eq. (3.9) as linear finite element basis functions [29] (see Fig. 3.2

for the heat flux discretization in 1D and 2D heat conduction). This MRF model

is most appropriate for cases in which the heat flux is only a function of time (as

in the 1D IHCP) or space (e.g. in a time sequential calculation of the heat flux

or in a stationary heat flux identification problem). The neighborhood is defined

as the temporally or spatially adjacent sites in each case, respectively. In more

general situations where the heat flux is a function of both space and time, the heat

flux at one site has neighbor sites in both time and space as shown on the right of

Fig. 3.2. Therefore, the prior model for the heat flux in a general transient inverse

heat conduction problem should differ from the one introduced above because the

physical and discretization length scales in time and space are inherently different. In

this work, a two-scale MRF prior model is used by multiplying the weight coefficients

associated with temporally adjacent random parameter pairs by a scaling parameter

ζ. ζ is defined as the ratio of non-dimensional time step length to space step length

in the discretization of the heat flux. The parameter ζ can in general be treated as

unknown and updated in a (hierarchical) Bayesian formulation, but this approach

has not been followed here.

When discontinuities are expected in the unknown function (e.g. in the bound-

ary heat flux), the above canonical MRF model needs to be further adjusted since

it tends to over-smooth the inverse solution, i.e., the discontinuities may not be

resolved. Discontinuity adaptive MRF (DAMRF) models [59] are appropriate for

prior modeling in this situation. DAMRF can adaptively decrease the correlation

coefficient (entries of W) of two variables at adjacent spatial locations if the dif-


Figure 3.2: Linear finite element basis functions and neighborhood definition for θ. The figure on the left refers to 1D heat conduction (unknown heat flux q(t), with site i having temporal neighbors i−1 and i+1), and the figure on the right to 2D heat conduction in a square domain (unknown heat flux q(x, t), with site i having neighbors in both space and time).

ference between these two variables tends to increase during the MCMC sampling

process. For instance, the correlation coefficient of two adjacent random variables

θi and θj can be defined as inversely proportional to |θi − θj| (i.e. the larger the

deviation between the two adjacent random variables, the smaller the spatial corre-

lation between them). As a consequence, the nonzero off-diagonal entries in W of

Eq. (3.12) vary in each MCMC sampling step instead of being fixed as −1. With

this consecutive update of the prior model (matrix W), the difference between two

adjacent variables tend to be amplified and the discontinuity, if there exists, will

eventually be resolved. For a complete summary and comparison of DAMRF mod-

els and the required programming techniques, one can consult [60]. In the current

study, a simple DAMRF model that mimics the basic line process model is adopted

[59]. In this approach, the total variation of θ is computed at each MCMC sam-

pling step after generating the new sample. If the variation between two adjacent

variables (say θ_i and θ_{i+1}) exceeds a certain fraction (10% in the current examples) of the total variation, then W_{i,i+1} and W_{i+1,i} are both set to zero, and 1 is subtracted from n_i and n_{i+1}, respectively. Otherwise, the canonical MRF model is used. This model

is applied in Section 3.5 in the estimation of a discontinuous distributed heat source.
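The following Python sketch illustrates this line-process adjustment on a 1D lattice: after a new sample is generated, neighbor pairs whose jump exceeds 10% of the total variation of θ are cut by zeroing the corresponding off-diagonal entries of W and reducing the neighbor counts. Only the 10% threshold follows the text; the lattice and test profile are illustrative.

# Minimal sketch of the simple line-process DAMRF adjustment on a 1D lattice.
import numpy as np

def damrf_adjust(theta, frac=0.10):
    """Return the adjusted W for a 1D lattice given the current sample theta."""
    m = len(theta)
    jumps = np.abs(np.diff(theta))
    total_variation = jumps.sum()
    W = np.zeros((m, m))
    n_neighbors = np.zeros(m)
    for i in range(m - 1):
        if jumps[i] <= frac * total_variation:    # keep the canonical -1 coupling
            W[i, i + 1] = W[i + 1, i] = -1.0
            n_neighbors[i] += 1
            n_neighbors[i + 1] += 1
        # else: the pair is "cut" -- coupling stays 0 and neighbor counts are not incremented
    W[np.diag_indices(m)] = n_neighbors
    return W

theta = np.concatenate([np.zeros(10), np.ones(10)])   # a step profile (illustrative)
W = damrf_adjust(theta)
print(W[9, 10], W[10, 9])   # the coupling across the step has been removed (both 0.0)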

The prior distribution, p(k), of the conductivity is assumed to be of the form,

p(k) ∝ exp{ −(k − k̄)² / (2 v_k) },   when k > 0, and 0 otherwise,   (3.13)

where k̄ and v_k are the mean and variance, respectively, of a normal distribution. This is in fact a renormalized normal distribution that enforces the non-negativity of k. The uncorrelated joint normal distribution with mean d̄ and covariance v_d I is assigned to d, where I is the identity matrix. Also, the state space of d is confined to Ω.

3.2.3 The posterior distributions

With the above prior distribution models, the PPDF can be evaluated as,

p(θ, k, d | Y) ∝ exp{ −(Y − F(θ, k, d))^T (Y − F(θ, k, d)) / (2 v_T) }
· exp{ −(1/2) λ θ^T W θ } · exp{ −(k − k̄)² / (2 v_k) } · exp{ −(d − d̄)^T (d − d̄) / (2 v_d) },

when k ∈ (0,∞) ∩ d ∈ Ω, and 0 otherwise.   (3.14)

The parameters λ, k̄, v_k, d̄ and v_d in the above formulation can be treated as random variables in Bayesian inference; these are the hyper-parameters. A hierarchical Bayesian PPDF is then formulated as follows:

p(θ, k, d, λ, k̄, v_k, d̄, v_d | Y) ∝ p(Y | θ, k, d) p(θ | λ) p(k | k̄, v_k) p(d | d̄, v_d)
· p(λ) p(k̄) p(v_k) p(d̄) p(v_d).   (3.15)

The function of this hierarchical Bayesian model is to diminish the effect of poor

prior knowledge of the hyper-parameters on the solution of the inverse problem. The


natural way to select priors for the hyper-parameters is to use the conjugate priors.

Hence, local uniform distributions are assigned to k̄ and d̄, a Gamma distribution is chosen for λ, and inverse Gamma distributions are chosen for v_k and v_d. Equation

(3.15) can then be evaluated as,

p(θ, k, d, λ, k̄, v_k, d̄, v_d | Y) ∝ exp{ −(Y − F(θ, k, d))^T (Y − F(θ, k, d)) / (2 v_T) }
· λ^{m/2} exp{ −(1/2) λ θ^T W θ } · v_k^{−1/2} exp{ −(k − k̄)² / (2 v_k) }
· v_d^{−r/2} exp{ −(d − d̄)^T (d − d̄) / (2 v_d) } · λ^{α_0 − 1} exp{ −β_0 λ }
· v_k^{−(1+α_1)} exp{ −β_1 v_k^{−1} } · v_d^{−(1+α_2)} exp{ −β_2 v_d^{−1} },

when λ ∈ (0,∞) ∩ k ∈ (0,∞) ∩ d ∈ Ω ∩ k̄ ∈ (0, k_max] ∩ d̄ ∈ Ω
∩ v_k ∈ (0,∞) ∩ v_d ∈ (0,∞), and 0 otherwise,   (3.16)

where k_max is the maximum possible value of k (which can be an arbitrarily large number), r is the dimension of Ω, and (α_0, β_0), (α_1, β_1) and (α_2, β_2) are the parameter pairs of the Gamma distribution, which is of the form,

p_X(x) = ( β^α / Γ(α) ) x^{α−1} e^{−βx},   (3.17)

with Γ being the standard Gamma function. Here vT can also be treated as unknown

since it is rather difficult to quantify the magnitude of the measurement noise directly

from data. This is especially true when the experiment for collecting the temperature

data is not repetitive. In this case, the hierarchical and augmented Bayesian PPDF

is introduced as follows:

p(θ, k, d, λ, k̄, v_k, d̄, v_d, v_T | Y) ∝ v_T^{−n/2} exp{ −(Y − F(θ, k, d))^T (Y − F(θ, k, d)) / (2 v_T) }
· λ^{m/2} exp{ −(1/2) λ θ^T W θ } · v_k^{−1/2} exp{ −(k − k̄)² / (2 v_k) }
· v_d^{−r/2} exp{ −(d − d̄)^T (d − d̄) / (2 v_d) } · λ^{α_0 − 1} exp{ −β_0 λ }
· v_k^{−(1+α_1)} exp{ −β_1 v_k^{−1} } · v_d^{−(1+α_2)} exp{ −β_2 v_d^{−1} }
· v_T^{−(1+α_3)} exp{ −β_3 v_T^{−1} },

when λ ∈ (0,∞) ∩ k ∈ (0,∞) ∩ d ∈ Ω ∩ k̄ ∈ (0, k_max] ∩ d̄ ∈ Ω
∩ v_k ∈ (0,∞) ∩ v_d ∈ (0,∞) ∩ v_T ∈ (0,∞), and 0 otherwise.   (3.18)

Although the parameters k, d and θ are modeled as random variables in the same

joint distribution, there is no attempt to solve the inverse problem to simultaneously

estimate all these quantities. The solution of such a problem will, in most cases, be impractical or infeasible unless a substantial amount of temperature data or other constraints among the unknowns are available. Therefore, the idea behind

the above joint distribution is to investigate the effect of uncertainties in k and

d on the distribution of unknown θ provided prior distributions of k and d can

strongly constrain the highest density regions of k and d, respectively. Finally, it

is necessary to point out that the choices of distributions in the above formulations

are based on common practice but are not unique. The selection of distributions for

measurement noise and the priors may vary according to the nature of uncertainties

in each problem examined.

The above PPDFs are implicit due to the presence of numerical solver F , hence

can only be evaluated up to the normalizing constants. Numerical sampling strate-

gies are introduced in the next section to explore the posterior state spaces.

3.2.4 Regularization in the Bayesian approach

Before introducing the exploration of the posterior state space, it is helpful to discuss

the relation between Bayesian prior modeling and classical regularization method


for better understanding of the Bayesian method. Under some assumptions, one

can show that Bayesian prior regularization takes similar form to Tikhonov regular-

ization. For the system of Eq. (3.10), Tikhonov regularization modifies the original

finite dimensional parametric estimation problem posed with minimization of the

functional of Eq. (3.8) as follows:

θLS = argminθ 1

2‖F (θ)− Y ‖2

L2+ α‖θ‖2

p (3.19)

where θLS is the deterministic estimate of θ, α is a regularization parameter and ‖·‖p

represents different norms in the parameter space, usually taken as the L2 norm.

To clarify the relation between the above Tikhonov regularization and the Bayesian

prior regularization induced by Eq. (3.12), it is assumed that ωm is Gauss white

noise with known standard deviation σ. Then the likelihood function is the follow-

ing:

p(Y | θ) = 1 / (2πσ²)^{n/2} exp{ −(F(θ) − Y)^T (F(θ) − Y) / (2σ²) }   (3.20)

where n = M ×N is the length of Y . The posterior PDF of θ can then be written

as follows:

p(θ | Y) ∝ exp{ −(F(θ) − Y)^T (F(θ) − Y) / (2σ²) } · λ^{m/2} exp( −(1/2) λ θ^T W θ )   (3.21)

From this distribution, the MAP estimate of θ can be derived as:

θ_MAP = argmin_θ { (1/2) (F(θ) − Y)^T (F(θ) − Y) + (λσ²/2) θ^T W θ }   (3.22)

By comparing Eqs. (3.19) and (3.22), it is seen that the least-squares estimator and the MAP estimator have similar mathematical forms. For example, by choosing λ = 2α/σ² (equivalently α = λσ²/2) and W as an identity matrix, the two methods become identical for zeroth-order Tikhonov regularization.
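The equivalence can be verified directly for a linear forward model Y = Hθ + noise, for which both estimators are available in closed form. In the sketch below (with H, the true θ and the noise level chosen arbitrarily), the MAP estimate of Eq. (3.22) with W = I coincides with the zeroth-order Tikhonov solution of Eq. (3.19) when α = λσ²/2.

# Minimal sketch: MAP estimate with a Gaussian (W = I) prior equals the
# zeroth-order Tikhonov (ridge) solution for a linear model.
import numpy as np

rng = np.random.default_rng(0)
n, m = 40, 10
H = rng.normal(size=(n, m))                          # illustrative sensitivity matrix
theta_true = rng.normal(size=m)
sigma = 0.01
Y = H @ theta_true + rng.normal(0.0, sigma, size=n)

lam = 50.0
alpha = 0.5 * lam * sigma ** 2                       # matching regularization weights
# MAP: minimize 0.5*||H theta - Y||^2 + 0.5*lambda*sigma^2*||theta||^2
theta_map = np.linalg.solve(H.T @ H + lam * sigma ** 2 * np.eye(m), H.T @ Y)
# Tikhonov: minimize 0.5*||H theta - Y||^2 + alpha*||theta||^2
theta_tik = np.linalg.solve(H.T @ H + 2.0 * alpha * np.eye(m), H.T @ Y)
print(np.allclose(theta_map, theta_tik))             # True: the two estimates coincide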


It is now clear that in Bayesian formulation, λ plays the role of a regularization

constant when σ is known. Different W ’s can be proposed for different problems.

One can in principle derive an MRF model to emulate various norms in the para-

meter space used in Tikhonov regularization.

In either approach, selection of the regularization parameter, α or equivalently (1/2)λσ²,

is important. There are in general three approaches for determining an optimal value of the regularization parameter. One is the so-called Unbiased Predictive Risk Estimator

(UPRE) method. Another approach is the heuristic Tikhonov method, where the

inverse problem is solved with a set of regularization parameters. It was observed

that within a certain range of the regularization parameter (orders of magnitude

long), the obtained inverse solution was practically unchanged. The regularization

parameter is then chosen from this range. The third way, as used in this work, is

the use of a hierarchical Bayesian model. In this case, the problem is modeled in a

more flexible way and one is able to investigate uncertainty in spatial dependence

as well as to find the optimal regularization parameter.

3.3 Parameter estimation

In most cases, the thermophysical properties of conducting solids are not directly

measurable. Therefore, experiments are designed to measure closely related quan-

tities such as temperature. An inverse problem is then solved to obtain an optimal

estimate of the unknown property. Bayesian inference is applicable to this type

of inverse problem because the temperature is recognized as a sufficient statistic

of the thermophysical properties. Herein, thermal conductivity estimation is ana-

lyzed with the following analysis easily being extendable to the estimation of other

thermophysical properties.


Let us reconsider the inverse problems defined in Eqs. (3.1)-(3.9) with the mod-

ification that q0 and f are known and k is unknown. We also assume here that d is

fixed. According to Bayes’ formula, p(k|Y ) can be evaluated as,

p(k|Y ) ∝ p(Y |k)p(k). (3.23)

Therefore, as special cases of Eqs. (3.14), (3.16) and (3.18), the simple, hierarchical, and augmented hierarchical PPDFs of k conditional on the temperature measurements Y are given as,

p(k | Y) ∝ exp{ −(Y − F(k))^T (Y − F(k)) / (2 v_T) } · exp{ −(k − k̄)² / (2 v_k) },

when k > 0, and 0 otherwise,   (3.24)

p(k, k̄, v_k | Y) ∝ exp{ −(Y − F(k))^T (Y − F(k)) / (2 v_T) } v_k^{−1/2} exp{ −(k − k̄)² / (2 v_k) }
· v_k^{−(1+α)} exp{ −β v_k^{−1} },   when k ∈ (0,∞) ∩ k̄ ∈ (0, k_max]
∩ v_k ∈ (0,∞), and 0 otherwise,   (3.25)

and,

p(k, k̄, v_k, v_T | Y) ∝ v_T^{−n/2} exp{ −(Y − F(k))^T (Y − F(k)) / (2 v_T) }
· v_k^{−1/2} exp{ −(k − k̄)² / (2 v_k) } v_k^{−(1+α)} exp{ −β v_k^{−1} }
· v_T^{−(1+α_1)} exp{ −β_1 v_T^{−1} },   when k ∈ (0,∞) ∩ k̄ ∈ (0, k_max]
∩ v_k ∈ (0,∞) ∩ v_T ∈ (0,∞), and 0 otherwise,   (3.26)

respectively.

Equation (3.24) can be interpreted as a balance between prior belief regarding

the unknown parameter and information contained in the data (likelihood). More

precise prior models or more accurate measurements can lead to better posterior

estimates. Hence, the advantages of the above formulation over likelihood inference

are apparent: (i) when the number of measurements is limited, accurate posterior estimates are still possible through proper prior distribution modeling; and (ii) prior belief about the parameter can correct the effects of biased data.

To explore the PPDF of Eq. (3.24) using the MH algorithm, a symmetric proposal distribution q(·|k^(i)) ∝ N(k^(i), σ²_kq) is used, where σ_kq is specified as 5% of the proposal mean k^(i). The numerical experiments showed this symmetric random walk to be a near-optimal proposal distribution in this case: it ensures a high acceptance ratio as well as the capability to visit the entire posterior state space. For Eqs. (3.25) and (3.26), the proposal distributions for all random variables have the same structure. However, for the PPDFs in Eqs. (3.25) and (3.26), the variables are updated one at a time in order to increase the acceptance probability.

By defining ξ = [k, k̄, v_k, v_T]^T and

ξ_{−j}^{(i+1)} = { ξ_1^{(i+1)}, ..., ξ_{j−1}^{(i+1)}, ξ_{j+1}^{(i)}, ..., ξ_4^{(i)} },   (3.27)

the sampler for Eq. (3.26) is designed as follows:

Algorithm III

1. Initialize ξ^(0)

2. For i = 0 : Nmcmc − 1

For j = 1 : 4

— sample u ∼ U(0, 1)

— sample ξ_j^(∗) ∼ q_j(ξ_j^(∗) | ξ_{−j}^{(i+1)}, ξ_j^{(i)})

— if u < A(ξ_j^(∗), ξ_j^{(i)})

ξ_j^{(i+1)} = ξ_j^(∗)

— else

ξ_j^{(i+1)} = ξ_j^{(i)},

where,

A(ξ_j^(∗), ξ_j^(i)) = min{ 1, [ p(ξ_j^(∗) | ξ_{−j}^{(i+1)}) q_j(ξ_j^{(i)} | ξ_j^(∗), ξ_{−j}^{(i+1)}) ] / [ p(ξ_j^{(i)} | ξ_{−j}^{(i+1)}) q_j(ξ_j^(∗) | ξ_j^{(i)}, ξ_{−j}^{(i+1)}) ] },   (3.28)

and,

q_j(ξ_j^(∗) | ξ_{−j}^{(i+1)}, ξ_j^{(i)}) ∝ N(ξ_j^{(i)}, σ²_{ξ_j q}),   (3.29)

where σ_{ξ_j q} is 5% of the proposal mean ξ_j^{(i)}.
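A minimal Python sketch of this single-component Metropolis-Hastings scheme is given below. The target density is a simple stand-in (a product of independent Gamma densities, evaluable up to a constant) rather than the implicit PPDF of Eq. (3.26); the proposal standard deviation of 5% of the current value follows the text, and the acceptance ratio includes the proposal correction required because that standard deviation depends on the current state.

# Minimal sketch of the single-component MH sampler of Algorithm III.
import numpy as np

def log_target(xi):
    """Stand-in for log p(xi | Y); any function evaluable up to a constant will do."""
    if np.any(xi <= 0.0):
        return -np.inf
    return np.sum(2.0 * np.log(xi) - xi)          # independent Gamma(3, 1) components

def single_component_mh(log_p, xi0, n_mcmc=20000, seed=0):
    rng = np.random.default_rng(seed)
    xi = np.array(xi0, dtype=float)
    chain = np.empty((n_mcmc, xi.size))
    for i in range(n_mcmc):
        for j in range(xi.size):                  # update one variable at a time
            prop = xi.copy()
            sd = 0.05 * abs(xi[j])                # proposal sd = 5% of the current value
            prop[j] = rng.normal(xi[j], sd)
            sd_back = 0.05 * abs(prop[j])         # sd of the reverse move
            log_q_forward = -0.5 * ((prop[j] - xi[j]) / sd) ** 2 - np.log(sd)
            log_q_back = -0.5 * ((xi[j] - prop[j]) / sd_back) ** 2 - np.log(sd_back)
            log_A = log_p(prop) - log_p(xi) + log_q_back - log_q_forward
            if np.log(rng.uniform()) < log_A:     # accept with probability A
                xi = prop
        chain[i] = xi
    return chain

chain = single_component_mh(log_target, xi0=[1.0, 1.0, 1.0, 1.0])
print(chain[5000:].mean(axis=0))                  # each component should be near 3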

3.4 Heat flux reconstruction under uncertainties

Boundary heat flux reconstruction is possibly the most popular inverse problem

in heat transfer processes. In existing methods, the difficulty of selecting an optimal regularization parameter to obtain a point estimate has not been well addressed.

In addition, the effects of errors in thermophysical properties (e.g. conductivity

and specific heat) and sensor locations on the solution of the inverse problem were

not examined. Herein, these two issues are addressed separately in the hierarchical

Bayesian inference framework.

3.4.1 Automatic selection of the regularization parameter

Selection of the regularization parameter has never been a trivial problem in almost

all deterministic methods for inverse problems (e.g. in Tikhonov regularization [10]

or the iterative regularization method [3]). In Bayesian estimation, regularization

is still critical as the scaling parameter λ of the MRF prior, which acts as a regu-

larization parameter [29], affects the posterior distribution, and more explicitly, it

substantially affects the posterior point estimates. A hierarchical Bayesian method


provides an elegant approach to choose λ automatically based upon the noise level

and prior distribution models.

In this section, we consider the special cases of Bayesian formulations in Eqs. (3.16)

and (3.18) with assumptions that k and d are known fixed constants. The resulting

formulations are then PPDFs of boundary heat flux under measurement noise with

known or unknown vT . They are given as follows:

p(θ, λ | Y) ∝ exp{ −(Y − F(θ))^T (Y − F(θ)) / (2 v_T) } λ^{m/2} exp{ −(1/2) λ θ^T W θ }
· λ^{α_0 − 1} exp{ −β_0 λ },   when λ ∈ (0,∞), and 0 otherwise,   (3.30)

p(θ, λ, v_T | Y) ∝ v_T^{−n/2} exp{ −(Y − F(θ))^T (Y − F(θ)) / (2 v_T) }
· λ^{m/2} exp{ −(1/2) λ θ^T W θ } λ^{α_0 − 1} exp{ −β_0 λ } v_T^{−(1+α_3)} exp{ −β_3 v_T^{−1} },

when λ ∈ (0,∞) ∩ v_T ∈ (0,∞), and 0 otherwise.   (3.31)

These two hierarchical Bayesian formulations enable a mechanism to select λ au-

tomatically by treating λ as a random variable. In the MCMC exploration of the

above PPDFs, the parameter λ is updated in each iteration so that an optimal

distribution of λ conditional on the measurement data is achieved.

When k and d are fixed, the system equation can be simply written as:

Y = H θ + T_I + ω,   (3.32)

where H is the sensitivity matrix:

H(j, k) = T_H(t_j; w_k),   j = 1 : n, k = 1 : m.   (3.33)

In the above equation, T_H denotes the direct simulation solution at the sensor location with zero initial condition, zero boundary conditions on Γg and Γh, and heat flux w_k on Γ0. Also, T_I denotes the direct solution at the sensor location with zero boundary condition on Γ0 and the known initial condition and boundary conditions on Γg and Γh, respectively.

In this case, the conditional distribution of θ given λ, v_T and Y is multivariate Gaussian. Hence, the full conditional distribution of each component of θ is in standard form and can be derived as follows:

p(θ_i | θ_{−i}, λ, v_T, Y) ∝ N(µ_i, σ_i²),   µ_i = b_i / (2 a_i),   σ_i = √(1 / a_i),

a_i = Σ_{s=1}^{N} H_{si}² / v_T + λ W_{ii},   b_i = 2 Σ_{s=1}^{N} µ_s H_{si} / v_T − λ µ_p,

µ_s = Y_s − (T_I)_s − Σ_{t≠i} H_{st} θ_t,   µ_p = Σ_{j≠i} W_{ji} θ_j + Σ_{k≠i} W_{ik} θ_k.   (3.34)

It was mentioned earlier that the full conditional distribution can be used as proposal

distribution in the MCMC sampler. This will lead to a single-component Gibbs

sampler that has acceptance probability 1.0. A modified single-component Gibbs

sampler is thus used to explore the PPDFs in Eqs. (3.30) and (3.31) as follows,

Algorithm IV

1. Initialize θ^(0), λ^(0) and v_T^(0)

2. For i = 0 : Nmcmc − 1

— sample θ_1^{(i+1)} ∼ p(θ_1 | θ_{−1}^{(i+1)}, λ^{(i)}, v_T^{(i)})

— sample θ_2^{(i+1)} ∼ p(θ_2 | θ_{−2}^{(i+1)}, λ^{(i)}, v_T^{(i)})

— ...

— sample θ_m^{(i+1)} ∼ p(θ_m | θ_{−m}^{(i+1)}, λ^{(i)}, v_T^{(i)})

— sample u ∼ U(0, 1)

— sample λ^(∗) ∼ q_λ(λ^(∗) | λ^{(i)})

— if u < A(λ^(∗), λ^{(i)})

λ^{(i+1)} = λ^(∗)

— else

λ^{(i+1)} = λ^{(i)}

— sample u ∼ U(0, 1)

— sample v_T^(∗) ∼ q_v(v_T^(∗) | v_T^{(i)})

— if u < A(v_T^(∗), v_T^{(i)})

v_T^{(i+1)} = v_T^(∗)

— else

v_T^{(i+1)} = v_T^{(i)},

where q_λ and q_v are determined similarly to Eq. (3.29).
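The component update inside Algorithm IV can be written compactly from Eq. (3.34). The sketch below draws θ_i from its full conditional for the linear model Y = Hθ + T_I + noise; all inputs (H, W, data, λ, v_T) are illustrative stand-ins rather than quantities from the examples of this chapter.

# Minimal sketch of the single-component Gibbs update implied by Eq. (3.34).
import numpy as np

def gibbs_update_theta_i(i, theta, Y, TI, H, W, lam, v_T, rng):
    """Draw theta_i from p(theta_i | theta_-i, lambda, v_T, Y) as in Eq. (3.34)."""
    resid = Y - TI - H @ theta + H[:, i] * theta[i]   # mu_s for every measurement s
    a_i = np.sum(H[:, i] ** 2) / v_T + lam * W[i, i]
    mu_p = W[:, i] @ theta + W[i, :] @ theta - 2.0 * W[i, i] * theta[i]
    b_i = 2.0 * np.sum(resid * H[:, i]) / v_T - lam * mu_p
    mean_i = b_i / (2.0 * a_i)
    sd_i = np.sqrt(1.0 / a_i)
    theta[i] = rng.normal(mean_i, sd_i)
    return theta

# Illustrative usage with arbitrary stand-ins for H, W and the data.
rng = np.random.default_rng(0)
n, m = 30, 8
H = rng.normal(size=(n, m))
W = np.eye(m)                 # stand-in precision structure (not a true MRF W)
TI = np.zeros(n)
theta = np.zeros(m)
Y = rng.normal(size=n)
theta = gibbs_update_theta_i(3, theta, Y, TI, H, W, lam=10.0, v_T=1e-4, rng=rng)
print(theta)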

3.4.2 Effect of the sensor location

In the preceding sections, the focus was on exploring the statistical informa-

tion of measurement errors and the prior modeling of primary- and hyper-unknowns.

However, other factors may also affect the solution of inverse heat conduction prob-

lems. Since the inverse problems are driven by sensor data, it is rational to in-

vestigate the effect of sensor location on the solution of the inverse problem. It is

straightforward to realize that the closer the sensor to the boundary, the better the

point estimate of the boundary heat flux is. However, the question of how the loca-

tion affects the higher order statistics of the boundary heat flux, or more specifically,

how the reliability regions of the inverse solution are affected by the sensor location,

can only be answered through Bayesian computation.


The difficulty of analyzing the effect of sensor location arises from the fact that

for the majority of inverse problems of interest, there is no closed functional form

available to describe the relation between d and the statistics of the inverse solutions

or even the point estimates themselves. For instance, in Eq. (3.33), d affects each

component of the sensitivity matrix H; hence it also affects the PPDF in Eq. (3.31).

However, it is rather difficult to explicitly study the effect of d on the posterior dis-

tribution of θ in an analytical manner. An alternative approach is to investigate the

effect by Bayesian computation. Given the PPDF of Eq. (3.31) and same magnitude

of the measurement noise, a sequence of numerical experiments can be conducted

with different sensor locations d1, d2, . . ., ds. By comparing the posterior estimates

(both point estimates and probability bounds) from MCMC samples, the effect of

d can be revealed. This experimental method provides an approach to guide ex-

perimental design in data-driven inverse problems, especially for higher dimensional

problems where it is of practical importance to use a minimum number of sensors

to achieve desirable inverse solution accuracy and reliability.

3.4.3 IHCP under model uncertainties

In many boundary heat flux reconstruction problems, the knowledge of thermophys-

ical property and/or sensor location is not exact. For instance, the true values of k

and d may exist in a narrow neighborhood of the nominal values. It is not clear up

to now how the uncertainties (small errors) in these system parameters would affect

the inverse solutions and the PPDF. Once again, as mentioned in the discussion

of the sensor location effect, it is impossible to conduct the investigation analyt-

ically. Therefore, the proposed approach is to explore the hierarchical Bayesian

formulation.


In Eq. (3.18), all hyper-parameters are modeled as random variables. Although

reasonable from a statistical inference perspective, the exploration of this formula-

tion is physically feasible only if the prior distributions of k and d are constrained. Let us consider the practical case where k and d are known to be near certain nominal values. In this case, constraints can be added to Eq. (3.18) by fixing k̄ and d̄ at those nominal values. Following these assumptions, the PPDF used to investigate the effects of system uncertainties is,

p(θ, k, d, λ, v_k, v_d, v_T | Y) ∝ v_T^{−n/2} exp{ −(Y − F(θ, k, d))^T (Y − F(θ, k, d)) / (2 v_T) }
· λ^{m/2} exp{ −(1/2) λ θ^T W θ } · v_k^{−1/2} exp{ −(k − k̄)² / (2 v_k) }
· v_d^{−r/2} exp{ −(d − d̄)^T (d − d̄) / (2 v_d) } · λ^{α_0 − 1} exp{ −β_0 λ }
· v_k^{−(1+α_1)} exp{ −β_1 v_k^{−1} } · v_d^{−(1+α_2)} exp{ −β_2 v_d^{−1} }
· v_T^{−(1+α_3)} exp{ −β_3 v_T^{−1} },

when λ ∈ (0,∞) ∩ k ∈ (0,∞) ∩ d ∈ Ω ∩ v_k ∈ (0,∞) ∩ v_d ∈ (0,∞)
∩ v_T ∈ (0,∞), and 0 otherwise.   (3.35)

For this PPDF, the sensitivity matrix H varies in each MCMC iteration since

k and d are updated as well. Therefore, in implementations of the modified single-

component Gibbs sampler (algorithm IV), the sensitivity matrix H needs to be

recomputed at each iteration using updated k and d. H is used to update θ in the

single-component Gibbs sampling algorithm, and then the other random variables

λ, k, d, v_k, v_d and v_T are updated consecutively in each MCMC step. Another modification to Algorithm IV is that the proposal distributions of k and d are N(k̄, v_k) and N(d̄, v_d I) in each iteration, respectively. Solutions of the IHCP accounting for


the sensor location effect and thermophysical property uncertainties are presented

and discussed in the following section.

Before proceeding to the presentation of the numerical examples, it should be noted that the convergence of a Markov chain in an MCMC simulation is in general a complex issue [38]. In this study, a chain is considered converged when the estimates of the posterior density remain practically unchanged when recomputed with the same number of samples.

3.5 Examples

Figure 3.3: The left figure is the schematic of the 1D inverse heat conduction problem (a slab of length L with a sensor at x = d recording Y(d, t) and a heat flux q applied at x = 0). The figure on the right provides the time profile of the true heat flux used to generate the simulated sensor data: a triangular pulse rising from 0 at t = 0 to a peak of 1.0 at t = 0.4 and returning to 0 at t = 0.8.

3.5.1 Example I: Parameter estimation

The first example studied is the estimation of the thermal conductivity k of a conducting solid. Let us consider the experiment in Fig. 3.3 with the solid body at

zero initial temperature and being insulated at the right end (x = L). A heat flux

q(t) with triangular time profile is applied at the left end (x = 0). The temperature


Table 3.1: Bayesian estimates of k using different models.

case #   Bayesian model                            prior of k   data #   σT      kpostmean   σk
1        Simple (Eq. 3.24)                         normal       50       0.005   1.2210      0.0032
2        Simple (Eq. 3.24)                         normal       50       0.001   1.2150      0.0006
3        Simple (Eq. 3.24)                         normal       100      0.005   1.2166      0.0022
4        Simple (Eq. 3.24)                         uniform      50       0.005   1.2205      0.0031
5        Hierarchical (Eq. 3.25 with vk known)     normal       50       0.005   1.2204      0.0031
6        Hierarchical (Eq. 3.25)                   normal       50       0.005   1.2204      0.0032
7        Hierarchical (Eq. 3.26)                   normal       50       0.005   1.2206      0.0058

is recorded at x = d. To simplify the discussion, the numerical study is conducted

in a dimensionless manner as follows:

∂T/∂t = k ∂²T/∂x²,   0 < t < 1, 0 < x < 1,   (3.36)

T(x, 0) = 0,   0 ≤ x ≤ 1,   (3.37)

k ∂T/∂x |_{x=1} = 0,   k ∂T/∂x |_{x=0} = q(t),   0 < t < 1.   (3.38)

The simulation data are generated by adding i.i.d. Gauss random noise with mean

0 and variance v_T to the computed temperature at d. In generating the data, the true value of k is randomly generated from a normal distribution with mean k̄ and variance v_k (standard deviation σ_k). Algorithm III is used in this example. The parameters α, β, α_1 and β_1 all take values of 1.0e−3.
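For reference, the following sketch shows one way such simulated measurements can be produced: an explicit finite-difference solution of Eqs. (3.36)-(3.38) with a triangular flux, sampled at the sensor location and corrupted with i.i.d. Gaussian noise. The thesis uses finite element solutions on fine grids; the finite-difference discretization, grid sizes and flux sign convention below are simplifying assumptions made only for this illustration.

# Minimal sketch of synthetic data generation for the 1D direct problem.
import numpy as np

def solve_direct(k=1.2146, nx=21, nt=2001, d=0.5, seed=0, sigma_T=0.005, n_meas=50):
    dx, dt = 1.0 / (nx - 1), 1.0 / (nt - 1)
    assert k * dt / dx ** 2 <= 0.5                    # explicit stability condition
    # triangular flux: 0 -> 1 at t = 0.4 -> 0 at t = 0.8 (illustrative)
    q = lambda t: 2.5 * t if t < 0.4 else max(1.0 - 2.5 * (t - 0.4), 0.0)
    T = np.zeros(nx)
    sensor = int(round(d / dx))
    history = []
    for n in range(1, nt):
        Tn = T.copy()
        T[1:-1] = Tn[1:-1] + k * dt / dx ** 2 * (Tn[2:] - 2 * Tn[1:-1] + Tn[:-2])
        # left-end flux BC (sign chosen so a positive q heats the body); insulated right end
        T[0] = T[1] + dx * q(n * dt) / k
        T[-1] = T[-2]
        history.append(T[sensor])
    history = np.array(history)
    idx = np.linspace(0, len(history) - 1, n_meas).astype(int)
    rng = np.random.default_rng(seed)
    return history[idx] + rng.normal(0.0, sigma_T, size=n_meas)

Y = solve_direct()
print(Y[:5])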

In this example, k̄ and σ_k are taken as 1.0 and 0.15, respectively, and a value of 1.2146 is generated as the true k. In Table 3.1, the Bayesian estimates using different formulations and different simulation data are listed. kpostmean is the posterior mean estimate and σk is the estimate of the standard deviation of the posterior dis-

tribution. The posterior densities of k in all listed cases are plotted in Fig. 3.4.

For each case, 20000 samples (after convergence) generated by the MH sampler are

used to compute the estimates. It is clear that the posterior mean estimates are

largely accurate. Note that increasing the number of measurements or decreasing

Figure 3.4: Computed posterior densities of k using different Bayesian models (one panel for each of cases 1-7).

the magnitude of measurement errors can both reduce the standard deviation of

the posterior distribution and improve the posterior mean estimate. The posterior

mean estimate obtained from the first case is slightly more biased than the other

cases since the normal prior (fixed mean 1.0 and standard deviation 0.15) is biased

in representing the true value of k while the data contain more accurate information

about k. By relaxing the prior assumption on k, the estimates are improved. In


addition, case 7 enforces almost no assumption about the uncertainties. However,

it still provides an accurate estimate, even though the standard deviation of the

posterior distribution is higher than in the previous cases. Meanwhile, the posterior

mean estimate of σT is 0.0093 in case 7. The bias in this estimate is due to both

non-repetitive experimental data and the existence of numerical errors.

3.5.2 Example II: Boundary heat flux estimation

Figure 3.5: True heat flux q(t) in example II.

In this example, we modify the earlier example by fixing the conductivity at 1.0 and

assuming that the boundary heat flux q(t) is unknown. The inverse problem is then

transformed to reconstructing q(t) from temperature measurements at location d.

To generate the simulation data, a direct heat conduction problem is first solved

on a fine grid and small time step with a boundary heat flux of the profile given in

Fig. 3.5. Simulation noise (i.i.d. Gauss error with mean 0 and variance vT ) is then

added to the direct solution at location d.

The purpose of this study is to show that the Bayesian approach can automati-

cally select the optimal regularization.

The posterior mean estimates and 98% probability bounds of the posterior dis-

tribution of the boundary heat flux are plotted in Figs. 3.6- 3.8. In all cases, the

prior distribution of λ is selected as Gamma distribution with parameters α = 0.001

and β = 0.001.

Figure 3.6: Posterior mean estimates of the heat flux and 98% probability bounds of the posterior distributions when d = 0.5 using a hierarchical Bayesian model (Example II). The figure on the left is obtained when σT = 0.01 and the figure on the right is obtained when σT = 0.001.

This prior barely contains any information regarding λ except

for enforcing its nonnegativity. 100 measurements are taken at the sensor location

(sampling time interval ∆t = 0.01) for all cases. The results in Figs. 3.6 and 3.8 are

obtained when the thermocouple is located at d = 0.5, and the ones in Fig. 3.7 are

obtained when d = 0.1. Two levels of noise are considered, σT = 0.01 and 0.001. In

the discretization of q(t), 51 basis functions are used for each case. The hierarchical

Bayesian model of Eq. (3.30) is used to obtain the results in Figs. 3.6 and 3.7, and

the hierarchical and augmented Bayesian model of Eq. (3.31) is used to obtain the

results in Fig. 3.8. For all cases, 50000 MCMC samples are generated and the results

are obtained from the last 25000 samples.

It is clear that the automatic selection of the regularization parameter using

the hierarchical Bayesian model is rather optimal. The posterior estimates in all

cases are accurate and stable to perturbations in the location of the thermocouple

Figure 3.7: Posterior mean estimates of the heat flux and 98% probability bounds of the posterior distributions when d = 0.1 using a hierarchical Bayesian model (Example II). The figure on the left is obtained when σT = 0.01 and the figure on the right is obtained when σT = 0.001.

and the noise level. To verify the updated distribution of λ conditional on the

data, its estimated posterior density in the second case (corresponding to Fig. 3.6

(right)) is plotted in Fig. 3.9. The distribution of λ is greatly refined compared to

its prior (nearly uniform on (0,∞]). In the MCMC update, an initial value of 50 is

picked for λ. This is based upon a ‘common sense’ estimate of the magnitude of the

regularization parameter (λσ2T ), though it can be an arbitrary positive value. The

posterior mean of λ in this case is 153.4. The plot is obtained from the last 25000

samples among the total 50000 samples.

By comparing Fig. 3.6 (left) and Fig. 3.8, it is also observed that the point es-

timates and probability bounds using the hierarchical Bayesian model are almost

identical to those from the hierarchical and augmented Bayesian model. This implies

that the Bayesian method can detect the magnitude of noise in the data. The result

in Fig. 3.8 is obtained with no knowledge of the noise magnitude and regularization

Figure 3.8: Posterior mean estimates of the heat flux and 98% probability bounds of the posterior distribution when d = 0.5 and σT = 0.01 using a hierarchical and augmented Bayesian model (Example II).

Figure 3.9: Posterior density estimate of the hyper-parameter λ in the second case.

parameter. This example demonstrates the advantages of using the Bayesian infer-

ence method for inverse problem solution. By comparing Figs. 3.6 and 3.7, it is also

observed that the distribution (probability bounds) of the heat flux conditional on

the temperature measurements is affected significantly not only by the noise level

but also by the location of the thermocouple. At the same noise level, the closer the

sensor is to the boundary with unknown heat flux, the narrower the highest density

region of the posterior state space.

3.5.3 Example III: Boundary heat flux identification with

simultaneous uncertainties in material property and

thermocouple location

Figure 3.10: Posterior mean estimates of the heat flux and 98% probability bounds of the posterior distribution when uncertainties in d and k exist. The figure on the left is obtained using the true d and k, and the figure on the right is obtained using the nominal values of d and k (Example III).

In the third numerical experiment, the 1D inverse heat conduction problem is

reconsidered with uncertainties in the thermal conductivity k and sensor location d.

Figure 3.11: Posterior mean estimates of the heat flux and 98% probability bounds of the posterior distribution when k and d are treated as random variables (Example III).

It is thus assumed that the true values of k and d lie near known nominal values k̄ and d̄ for the given experiment, respectively. It is of interest to study the effect of

such system uncertainties on the computed inverse solutions.

In the current cases, k̄ and d̄ are selected as 1.0 and 0.3, respectively (dimensionless quantities). Two random values, 0.968376 and 0.328135, are generated to act as the true values of k and d, respectively. 100 simulation measurements are generated using the true k and d and the heat flux profile on the right of Fig. 3.3, following the

same procedure as in the earlier examples. Also, σT = 0.005 in this part of the

study. First, two cases are studied as shown in Fig. 3.10. The results are obtained

through exploring the PPDF in Eq. (3.31) using the sensitivity matrix H computed

at the nominal values k̄ and d̄ (right figure) and at the true values of k and d (left figure). The third case is conducted by exploring the PPDF in Eq. (3.35), in which k and d are treated as random variables (Fig. 3.11). It is observed from the three plots in Figs. 3.10 and 3.11 that the uncertainties in k and d do not significantly affect the inverse solution (posterior distribution). However, this is based upon the fact that

the magnitude of the uncertainties considered is small. In this case, the distribution

of the inverse solution is mainly dominated by the measurement noise.

3.5.4 Example IV: 1D piece-wise continuous heat source

identification

A heat source identification problem in 1D heat conduction is examined in this

section. The problem has been studied by Yi and Murio [15] using the mollification

method [61]. It is defined in a dimensionless manner as:

∂T/∂t = ( k(x) T_x )_x + f(x, t),   (3.39)

where f(x, t) is the unknown source function to be estimated from temperature

measurements. As in Yi and Murio [15], we examine the following special case: when k(x) = 1 in x ∈ [0, 0.25], k(x) = 4x in x ∈ [0.25, 0.5], k(x) = 3 − 2x in x ∈ [0.5, 0.75] and k(x) = 1.5 in x ∈ [0.75, 1.0], and T(x, t) = e^{x−t} in x ∈ [0, 1], the exact heat source is given as f(x, t) = −2e^{x−t} in x ∈ [0, 0.25], f(x, t) = −(5 + 4x)e^{x−t} in x ∈ [0.25, 0.5], f(x, t) = (−2 + 2x)e^{x−t} in x ∈ [0.5, 0.75] and f(x, t) = −2.5e^{x−t} in x ∈ [0.75, 1.0].

We use the PPDF in Eq. (3.31) to solve these two problems by replacing the heat

flux term with the heat source term. The line process DAMRF model is used as prior

distribution of the heat source. The simulation data are generated by adding i.i.d.

Gauss random errors with σT = 0.005 to the analytical T (x, t). The temperature

is assumed to be measured at 31 evenly distributed sites within the domain [0, 1]

(no sensors on the boundary) at constant sampling time interval of 0.01. A grid

with 128 elements is used in the discretization of the heat source. The heat source

is reconstructed from t = 0 to t = 0.5. The true heat source and the posterior mean estimate are plotted in Fig. 3.12. The results are rather accurate and comparable with those achieved in [15] under similar conditions, except that the number of thermocouples used in the current example is significantly smaller.

Figure 3.12: True heat source (left) and reconstructed heat source (right) for case II of Example IV.

To verify the accuracy of the posterior mean estimates, the estimates and 98% probability bounds of the posterior distributions at t = 0.24 are plotted in Fig. 3.13.

Figure 3.13: Posterior mean estimate and 98% probability bounds of the posterior distribution of the step heat source at t = 0.24.


3.5.5 Example V: 2D heat source identification

In this example, we consider a heat source identification problem as follows,

∂T/∂t = ∂²T/∂x² + ∂²T/∂y² + f(x, y, t),   0 < t, 0 < x, y < 1,   (3.40)

T(x, y, 0) = 0,   0 ≤ x, y ≤ 1,   (3.41)

∂T/∂x |_{x=0} = ∂T/∂y |_{y=0} = ∂T/∂x |_{x=1} = ∂T/∂y |_{y=1} = 0,   0 < t,   (3.42)

where the heat source f(x, y, t) is unknown. The problem is to reconstruct this

temporal-spatially varying quantity from temperature measurements at a number

of sensor locations.

Figure 3.14: True heat source profiles for example V at t = 0, 0.02, 0.05 and 0.1.

A numerical experiment is conducted by simulating the case where 25 thermo-

couples are uniformly distributed within the domain [0, 1]×[0, 1], which is considered

a reasonable setup since no information about the heat source distribution is avail-

able a priori. At each sensor location, 20 measurements are taken at equal frequency

from t = 0 to t = 0.1.


Figure 3.15: Posterior mean estimates of the heat source profiles at t = 0, 0.02, 0.05 and 0.1 when σT = 0.02 for example V.

The true heat source used in the simulation data generation is of the form

f(x, y, t) = exp(−10t) · [ 20 / (2π · 0.125²) ] · exp{ −[ (x − 0.75)² + (y − 0.725)² ] / (2 · 0.125²) }.   (3.43)

The data are generated by adding i.i.d. Gauss random errors (0 mean and standard

deviation σT ) to the direct solution with this heat source on a fine finite element

grid. Two magnitudes of noise level with σT = 0.005 and 0.02, respectively, are

examined.

This example is solved using the Bayesian formulation in Eq. (3.31). The two-

scale MRF model is used in prior modeling of the heat source. The heat source is

reconstructed using a discretization of 32× 32 grid in space and 11 basis functions

in time. The posterior state space is explored using the modified single-component

Gibbs sampler (algorithm IV).

The true heat source profiles at different time points and corresponding recon-

structed heat source profiles (posterior mean estimates) in the second case (σT =

0.02) are plotted in Figs. 3.14 and 3.15, respectively. It is seen that the posterior


mean estimates are overall rather accurate. The deviations in estimates at the ini-

tial time and the final time points are slightly larger. This is because the noise

to signal ratio in the first few time steps is large, and the simulated data contains

less information regarding the heat source in the final time period. Considering the

uniform distribution of sensors and the fact that no assumptions on noise magnitude

and regularization parameter are made in the solution procedure, the estimates are

rather satisfactory.

To further verify the accuracy of the posterior mean estimate, the reconstructed heat source profiles at y = 0.725 at different time steps are plotted in Fig. 3.16 for

the first case (σT = 0.005). The probability bounds for the posterior distribution at

t = 0.05 are also shown in the same figure. It is seen that the estimates are rather

accurate except at early times as discussed above.

Figure 3.16: Computed heat source at y = 0.725 at different times (example V, σT = 0.005). Also shown are the 98% probability bounds of the posterior distribution at t = 0.05.


3.6 Summary

In this chapter, the Bayesian computational approach using hierarchical Bayesian

formulations and MCMC simulation is presented for the solution of stochastic in-

verse problems in heat conduction. It has been demonstrated through numerical

examples that the Bayesian computational approach provides means to quantify

various system uncertainties and to deduce accurate probabilistic specifications of

the inverse solutions. In all presented numerical studies, the direct problems were

solved on much finer finite element grids and using smaller time steps than the dis-

cretization used in computing the inverse solutions. Still the discretization used in

the inverse solutions was fine enough to diminish the regularization effect introduced

by the a priori assumed function specification.

The fundamental steps of using Bayesian statistics to solve inverse continuum

problems and the advantages of Bayesian computational approach in quantifying un-

certainty, resolving discontinuity and selecting regularization parameter are demon-

strated via studies in this chapter.

Chapter 4

Inverse heat radiation problem

(IHRP)- An integrated

reduced-order modeling and

Bayesian computational approach

to complex inverse continuum

problems

In the previous chapter, the basic Bayesian computational approach to inverse con-

tinuum problems has been demonstrated via application to IHCP. However, the

algorithms developed in the previous chapter are fairly expensive to implement for

inverse problems in complex nonlinear PDE systems. To address the high computational cost of the Bayesian computational method for inverse problems, a reduced-order modeling approach is introduced in this chapter and integrated with MCMC algorithms. This combination enables the solution of complex inverse continuum problems. The reduced-order modeling method is based on the proper

orthogonal decomposition (POD) [62] and is illustrated via an inverse heat radiation

problem (IHRP) herein. As demonstrated in the numerical example, the simulation

time is drastically reduced using the POD method in the inverse computation.

The remainder of this chapter is organized as follows. Section

4.1 introduces the inverse heat radiation problem. Section 4.2 briefly describes

the full- and reduced-order finite element models used for the direct analysis. The

formulation of the likelihood is presented in Section 4.3 together with the prior

distribution model and the PPDF under a Bayesian inference framework. The design

of the MCMC sampler is discussed in Section 4.4 including the exploration of the

posterior state space. In Section 4.5, two examples of reconstruction of step and

triangular heat source profiles are provided. Finally, Section 4.6 summarizes the

observations of this numerical study and some related issues.

4.1 The inverse heat radiation problem (IHRP)

Study of thermal radiation has been motivated by a wide range of applications in-

cluding thermal control in space technology, combustion, high temperature forming

and coating technology, solar energy utilization, high temperature engines, furnace technology and others [63]. In participating media, radiation is accompanied by

heat conduction and convection. To simulate such processes, a coupled system

of partial differential equations (PDEs) governing temperature and radiation in-

tensity evolution needs to be solved iteratively. Difficulties arise in the solution

of such systems because the heat flux contributed by radiation varies nonlinearly


with the temperature, the radiation intensity varies in space and in direction, and

the radiation intensity equation is an integro-differential equation [63]. The direct

radiation problem, in which the temperature distribution is computed with pre-

scribed thermal properties, source generation and initial/boundary conditions, is

often solved using a combination of spatial discretization methods such as finite

volume or finite element methods (FEM) and ordinate approximation such as PN

and SN methods [63]. The inverse radiation problem in a participating medium that

is of interest here is defined as reconstruction of the heat source given temperature

measurements within the domain [64, 65, 66]. Similar problems have been studied

in [67, 68, 69, 70, 71, 72] using gradient-based optimization of a least-squares error objective function/functional. Other methods, such as the Monte Carlo method,

have also been developed for solving inverse radiation problems [73].

In this work, the situation where thermal conduction and radiation occur simul-

taneously in a participating medium with diffusively reflecting gray boundaries is

considered. The schematic of the problem of interest is given in Fig. 4.1. Inside the

3D domain V , heat conduction occurs simultaneously with absorption, scattering

and emission of the electromagnetic waves. On the boundary surface S, the temper-

ature is known and the electromagnetic waves are diffusively reflected. The transient

heat source will be estimated through temperature measurements at sensor (ther-

mocouple) sites within the domain. The governing equations for the temperature

and radiation intensity evolution in the domain V are as follows:

\rho C_p \frac{\partial T}{\partial t} = k\nabla^2 T - \nabla\cdot\vec{q}_r + g(t)\, G(x-x^*, y-y^*, z-z^*)   (4.1)

\vec{s}\cdot\nabla I + (\kappa+\sigma)I - \frac{\sigma}{4\pi}\int_{4\pi} I(\vec{r},\vec{s}\,')\, d\Omega' = \kappa I_b   (4.2)


Figure 4.1: Schematic of the inverse radiation problem. The objective is to compute

the point heat source g(t) given initial conditions, boundary conditions on the surface

and temperature measurements at a number of points within the domain.

where I_b is the black body radiation intensity governed by the Planck function,

I_b = \frac{\sigma_b T^4}{\pi}   (4.3)

and \vec{q}_r is the heat flux contributed by radiation:

\nabla\cdot\vec{q}_r = 4\pi\kappa\left(I_b - \frac{1}{4\pi}\int_{4\pi} I(\vec{r},\vec{s})\, d\Omega\right)   (4.4)

On the boundary S, the following holds:

I(\vec{r},\vec{s}) = \varepsilon I_b + \frac{1-\varepsilon}{\pi}\int_{\vec{n}\cdot\vec{s}\,'<0} |\vec{n}\cdot\vec{s}\,'|\, I(\vec{r},\vec{s}\,')\, d\Omega', \quad \vec{n}\cdot\vec{s} > 0   (4.5)

T = T_w   (4.6)

In the above equations, T and I denote the temperature and radiation intensity, respectively, ~r is the position vector and ~s is the direction vector. G(x − x*, y − y*, z − z*) is the spatial approximation of a point heat source located at (x*, y*, z*). In this work, a 3D Gaussian distribution function is used for G. Ω stands for the solid angle over the entire space. ρ is the density of the medium, C_p is the thermal capacity, k is the thermal conductivity, and κ, σ, ε are the absorption coefficient, scattering coefficient and boundary wall emissivity, respectively. Finally, σ_b is the Stefan-Boltzmann constant and ~n is the unit normal vector on S pointing into the domain.

In the inverse problem of interest, the heat source g(t) is the main unknown. Its

calculation becomes feasible by providing the values of the temperature at a given

number of locations within the domain as shown in Fig. 4.1. As in the discussion

of IHCP, let Y denote the measured temperature data. The inverse problem is then

stated as follows: find an estimate ĝ(t) of the real heat source g(t) such that the temperatures computed with this source estimate match Y in some sense. For instance, most deterministic approaches solve for ĝ(t) by minimizing

the least-squares error between Y and the computed temperatures.

4.2 Direct simulation and reduced-order modeling

The direct problem can be solved using a combination of the finite element method

(FEM) in space discretization and the S4 method in ordinate discretization. It is

seen that Eq. (4.1) is a nonlinear partial differential equation (PDE) and Eq. (4.2)

has an integral term. They are coupled by the expressions in Eqs. (4.3) and (4.4).

The iterative process at each time step to solve the coupled Eqs. (4.1) and (4.2) is

summarized next:

1. Set T^{(i)}_{guess} = T^{(i-1)};

2. Substitute T^{(i)}_{guess} into Eq. (4.3) to compute I_b;

3. Solve Eq. (4.2) for I^{(i)};

4. Use Eq. (4.4) to compute ∇·~q_r;

5. Solve Eq. (4.1) and update T^{(i)}_{guess} with the solution;

6. If the solution has converged, set T^{(i)}_{guess} as T^{(i)} and save I^{(i)}; otherwise, go to step 2;

7. Go to the next time step.

Here T^{(i)} denotes the temperature solution at the ith time step (note that T^{(0)} is a known initial temperature field) and T^{(i)}_{guess} is the guessed temperature solution.
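To make the structure of this loop concrete, a minimal sketch in Python is given below; solve_radiation and solve_temperature are hypothetical placeholders for the S4 and FEM solvers of Eqs. (4.2) and (4.1), and the convergence test is a simple relative-norm check.

    import numpy as np

    def advance_one_step(T_prev, solve_radiation, solve_temperature,
                         tol=1e-6, max_iter=50):
        """One time step of the coupled temperature/radiation iteration (steps 1-6).

        solve_radiation(T) and solve_temperature(T_prev, div_qr) are hypothetical
        stand-ins for the S4 and FEM solvers of Eqs. (4.2) and (4.1); both operate
        on arrays of nodal values.
        """
        T_guess = T_prev.copy()                        # step 1: initial guess
        for _ in range(max_iter):
            I, div_qr = solve_radiation(T_guess)       # steps 2-4: I_b, I and div(q_r)
            T_new = solve_temperature(T_prev, div_qr)  # step 5: update temperature
            if np.linalg.norm(T_new - T_guess) <= tol * np.linalg.norm(T_new):
                return T_new, I                        # step 6: converged
            T_guess = T_new
        return T_guess, I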

In each iteration of the above procedure, the integro-differential Eq. (4.2) is solved

using the S4 method [63]. In this approach, the intensity I at each spatial point

is discretized into 24 directions. The integration over solid angles (directions) is approximated as a weighted sum over these 24 directions. The direction vectors and

associated weights are specified in [63]. In each direction, the governing equation

for I can be written as follows:

\vec{s}_i\cdot\nabla I_i + (\kappa+\sigma)I_i - \frac{\sigma}{4\pi}\sum_{j=1}^{24} w_j I_j(\vec{r}) = \kappa I_b   (4.7)

The associated boundary condition takes the following form:

I_i = \varepsilon I_b + \frac{1-\varepsilon}{\pi}\sum_{j:\ \vec{n}\cdot\vec{s}_j<0} |\vec{n}\cdot\vec{s}_j|\, w_j I_j, \quad \vec{n}\cdot\vec{s}_i > 0   (4.8)

where w_j is the weight associated with the jth direction. For any given temperature field, 24 equations of the form of Eq. (4.7), with fixed direction vectors ~s_i, need to be solved iteratively to obtain I. It is noticed that Eq. (4.7) contains an advection term

~si · ∇Ii, hence the streamline-upwind/Petrov-Galerkin (SUPG) formulation [74] is

used to derive stabilized FEM equations. In summary, the weak formulations of

temperature Eq. (4.1) and intensity Eq. (4.7) can be written as follows:

\int_V \rho C_p T^{(i)} W\, dv + \Delta t \int_V k\nabla T^{(i)}\cdot\nabla W\, dv = \Delta t \int_V \left(-\nabla\cdot\vec{q}_r + g(t)\, G(x-x^*, y-y^*, z-z^*)\right) W\, dv + \int_V \rho C_p T^{(i-1)} W\, dv,   (4.9)

and

\int_V \vec{s}_i\cdot\nabla I_i\, \widetilde{W}\, dv + \int_V (\kappa+\sigma) I_i\, \widetilde{W}\, dv = \int_V \kappa I_b\, \widetilde{W}\, dv + \int_V \frac{\sigma}{4\pi}\sum_{j=1}^{24} w_j I_j\, \widetilde{W}\, dv,   (4.10)

where W and \widetilde{W} are the test (basis) functions for the classical Galerkin and SUPG formulations, respectively [74].

Using the above direct simulation framework, the total number of degrees of freedom for the system becomes N_n^3 × 25, where N_n is the number of nodes in each coordinate direction. Also note that there are two iteration loops in each time step. Thus, it is expected that the above full-order direct model solver will be computationally intensive. To solve the stochastic inverse problem, a large number of direct simulations is required. Therefore, reduced-order modeling needs to be introduced for the direct simulation [75].

For the convenience of implementation, the direct problem is separated into an inhomogeneous part (accounting for the temperature boundary condition on S) and a homogeneous part (with zero applied temperature on S), i.e. T = T^I + T^h and I = I^I + I^h. These fields are defined as follows:

For the inhomogeneous fields T^I and I^I:

k\nabla^2 T^I = 0   (4.11)

\vec{s}\cdot\nabla I^I + (\kappa+\sigma)I^I - \frac{\sigma}{4\pi}\int_{4\pi} I^I(\vec{r},\vec{s}\,')\, d\Omega' = \kappa I^I_b   (4.12)

I^I_b = \frac{\sigma_b (T^I)^4}{\pi}   (4.13)

I^I = \varepsilon I^I_b + \frac{1-\varepsilon}{\pi}\int_{\vec{n}\cdot\vec{s}\,'<0} |\vec{n}\cdot\vec{s}\,'|\, I^I(\vec{r},\vec{s}\,')\, d\Omega', \quad \vec{n}\cdot\vec{s} > 0   (4.14)

T^I = T_w, \quad \text{on } S   (4.15)

For the homogeneous fields T^h and I^h:

\rho C_p \frac{\partial T^h}{\partial t} = k\nabla^2 T^h - \nabla\cdot\vec{q}_r + g(t)\, G(x-x^*, y-y^*, z-z^*)   (4.16)

\vec{s}\cdot\nabla I^h + (\kappa+\sigma)I^h - \frac{\sigma}{4\pi}\int_{4\pi} I^h(\vec{r},\vec{s}\,')\, d\Omega' = \kappa I_b - \kappa I^I_b   (4.17)

I^h = \frac{1-\varepsilon}{\pi}\int_{\vec{n}\cdot\vec{s}\,'<0} |\vec{n}\cdot\vec{s}\,'|\, I^h(\vec{r},\vec{s}\,')\, d\Omega', \quad \vec{n}\cdot\vec{s} > 0   (4.18)

T^h = 0, \quad \text{on } S   (4.19)

The reduced-order models are constructed for the homogeneous fields T^h and I^h only, since the steady-state Eqs. (4.11)-(4.15) need to be solved only once in the inverse procedure.

The POD method is considered in the current work for the reduced-order mod-

eling [76, 77]. In this approach, the direct simulation result at each time step is

expressed as a linear combination of a set of orthonormal basis functions. The

coefficients associated with each basis function are computed from the solution of

ordinary differential equations (ODEs) derived by Galerkin projection. The basis

functions can be extracted from computational or experimental snapshots available

in a database through solving the following eigenvalue problem [76]:

\frac{1}{N_e}\sum_{i=1}^{N_e} \int_V U^{(i)} U^{(i)}(\vec{r}\,')\, \Psi(\vec{r}\,')\, dv' = \mu \Psi   (4.20)

where U^{(i)} is the ith field function (temperature or intensity field) from the database, N_e is the number of snapshots used, \mu is the eigenvalue of the operator K\Psi = \frac{1}{N_e}\sum_{i=1}^{N_e}\int_V U^{(i)} U^{(i)}(\vec{r}\,')\Psi(\vec{r}\,')\, dv', and Ψ is the corresponding eigenfunction. In this

study, the basis functions are obtained using ‘the method of snapshots’ as follows:

• Take an ensemble set {U^{(1)}, U^{(2)}, ..., U^{(N_e)}}, where U^{(i)} is the full-model solution of the PDEs at the ith time step. For temperature, U^{(i)} is in fact T^h(t = iΔt). For intensity, U^{(i)} is I^h(t = iΔt).

• Solve the eigenvalue problem CV = Vµ, where C is an N_e × N_e matrix with C_{ij} = \frac{1}{N_e}\int_V U^{(i)} U^{(j)}\, dv, µ is an N_e × N_e diagonal matrix whose ith diagonal entry µ_i is the ith eigenvalue of C, and the corresponding eigenvector V_i is the ith column of the N_e × N_e matrix V.

• Compute the basis functions as \Psi_i = \sum_{j=1}^{N_e} V_i(j)\, U^{(j)} / (N_e \mu_i).

The set {Ψ_1, Ψ_2, ..., Ψ_{N_e}} is orthonormal [76]. Note that the intensity I^h is a function of both space and orientation; therefore, the volume integration in Eq. (4.20) and the subsequent eigenvalue analysis should be replaced with \int_V \int_{4\pi}\, dv\, d\Omega for the model reduction of I^h. Finally, note that the beauty of POD-based model reduction is that in most situations it is sufficient to take only a small number of basis functions (those corresponding to the largest eigenvalues). Convergence and optimality properties of POD expansions can be found in [62].
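A minimal sketch of the method of snapshots is given below, assuming the snapshots are stored as nodal vectors together with a vector of quadrature (mass) weights that approximates the volume integral; the eigenvalue problem and the basis formula follow the bullets above.

    import numpy as np

    def pod_basis(snapshots, mass, K):
        """Method of snapshots for nodal fields on a fixed grid.

        snapshots: array of shape (Ne, n) holding U^(1)..U^(Ne);
        mass:      weights so that int_V f g dv ~= sum_k mass[k] f[k] g[k];
        K:         number of retained basis functions.
        Returns (Psi, mu): basis functions of shape (K, n) and eigenvalues.
        """
        Ne = snapshots.shape[0]
        # Correlation matrix C_ij = (1/Ne) * int_V U^(i) U^(j) dv
        C = (snapshots * mass) @ snapshots.T / Ne
        mu, V = np.linalg.eigh(C)                 # ascending eigenvalues
        mu, V = mu[::-1], V[:, ::-1]              # sort by decreasing eigenvalue
        # Psi_i = sum_j V_i(j) U^(j) / (Ne * mu_i)
        Psi = (V[:, :K].T @ snapshots) / (Ne * mu[:K, None])
        return Psi, mu[:K]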

Let \Psi^T_1, \Psi^T_2, ..., \Psi^T_{K_T} denote the basis functions of T^h and \Psi^I_1, \Psi^I_2, ..., \Psi^I_{K_I} denote the basis functions of I^h, where K_T and K_I are the numbers of basis functions used for expanding the temperature and intensity fields, respectively. The solutions of the reduced-order model are written as follows:

T^h(t,\vec{r}) = \sum_{i=1}^{K_T} a_i(t)\, \Psi^T_i(\vec{r})   (4.21)

I^h(t,\vec{r},\vec{s}) = \sum_{i=1}^{K_I} b_i(t)\, \Psi^I_i(\vec{r},\vec{s})   (4.22)

Substituting the above expressions into Eqs. (4.16) and (4.17), the following ODEs are obtained:

M_j \frac{da_j}{dt} + \sum_{i=1}^{K_T} H_{ji} a_i = -S_j + Q_j\, g(t), \quad j = 1 : K_T   (4.23)

\sum_{i=1}^{K_I} A_{ji} b_i - \sum_{i=1}^{K_I} B_{ji} b_i = D_j, \quad j = 1 : K_I   (4.24)

where the following definitions have been introduced:

M_j = \rho C_p \int_V (\Psi^T_j)^2\, dv   (4.25)

H_{ji} = k \int_V \nabla\Psi^T_j \cdot \nabla\Psi^T_i\, dv   (4.26)

S_j = \int_V (\nabla\cdot\vec{q}_r)\, \Psi^T_j\, dv   (4.27)

Q_j = \int_V \Psi^T_j\, G(x-x^*, y-y^*, z-z^*)\, dv   (4.28)

A_{ji} = \int_V \int_{4\pi} \left[(\vec{s}\cdot\nabla\Psi^I_i)\Psi^I_j + (\kappa+\sigma)\Psi^I_i \Psi^I_j\right] d\Omega\, dv   (4.29)

B_{ji} = \int_V \int_{4\pi} \frac{\sigma}{4\pi}\left(\int_{4\pi} \Psi^I_i\, d\Omega'\right)\Psi^I_j\, d\Omega\, dv   (4.30)

D_j = \int_V \int_{4\pi} (\kappa I_b - \kappa I^I_b)\, \Psi^I_j\, d\Omega\, dv   (4.31)

Solving Eqs. (4.23) and (4.24), the reduced-order solution can be obtained as follows:

T = T^I + \sum_{i=1}^{K_T} a_i \Psi^T_i   (4.32)

I = I^I + \sum_{i=1}^{K_I} b_i \Psi^I_i   (4.33)

It is seen that the total number of degrees of freedom is reduced to K_T + K_I, which is extremely small compared to the full-order model simulation. Using this reduced-order solver for the direct analysis, we are now ready to investigate the inverse problem of interest.
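As an illustration of how the reduced temperature system (4.23) can be advanced in time, the sketch below performs one backward-Euler step with the reduced matrices assumed to be precomputed; the radiative coupling vector S_j is treated as given at each step, whereas in the actual solver it is updated from the reduced intensity equation (4.24). The synthetic values in the usage lines are for illustration only.

    import numpy as np

    def step_reduced_temperature(a, M, H, S, Q, g, dt):
        """One backward-Euler step of the reduced temperature system, Eq. (4.23).

        M is the vector of modal masses M_j, H the K_T x K_T stiffness matrix,
        S the radiative coupling vector, Q the source projection vector and g
        the current heat source strength g(t).  A sketch only.
        """
        A = np.diag(M) / dt + H               # (M_j/dt) a_j^{n+1} + sum_i H_ji a_i^{n+1}
        rhs = M * a / dt - S + Q * g          # right-hand side built from a^n
        return np.linalg.solve(A, rhs)

    # Example with K_T = 6 modes (synthetic data, illustration only)
    K = 6
    a = np.zeros(K)
    M, H, S, Q = np.ones(K), np.eye(K), np.zeros(K), np.ones(K)
    a = step_reduced_temperature(a, M, H, S, Q, g=4.0e5, dt=5.0e-4)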

4.3 Bayesian formulation of IHRP

To introduce the Bayesian formulation, the unknown heat source function is first

discretized using linear finite element basis functions in time as in IHCP:

g(t) = \sum_{i=1}^{m} w_i(t)\, \theta_i   (4.34)

where θ_i's are the corresponding nodal values of g and m is the number of basis functions used.

The likelihood function can be obtained from the following relationship,

Y = F(\theta) + \omega   (4.35)

where F is the numerical solver that computes the temperatures at the thermocouple locations given the heat source, using the reduced-order model introduced in the previous section. F_i represents the temperature at the same location and time as Y_i does. In this work, we regard the measurement errors ω as independent identically distributed (i.i.d.) Gauss random variables with zero mean and standard deviation (std) σ_T. It is assumed that the numerical errors are much smaller in magnitude than the measurement errors. Subsequently, the likelihood can be written as,

p(Y|\theta) = \frac{1}{(2\pi)^{n/2}\,\sigma_T^{n}} \exp\left\{-\frac{(Y - F(\theta))^T (Y - F(\theta))}{2\sigma_T^2}\right\}   (4.36)

The point-pair MRF is used for the prior modeling of θ. With the likelihood function specified in Eq. (4.36) and the prior distribution in Eq. (3.12), the PPDF for the inverse problem can then be formulated as,

p(\theta|Y) \propto \exp\left\{-\frac{1}{2\sigma_T^2}\,[F(\theta) - Y]^T [F(\theta) - Y]\right\} \cdot \exp\left\{-\frac{1}{2}\lambda\,\theta^T W \theta\right\}   (4.37)

In the above formulation, all the normalizing constants are neglected because the numerical algorithm introduced in a later section allows the posterior state space to be explored without knowing these constants. Eq. (4.37) is the Bayesian formulation investigated for the inverse radiation problem of interest. Both point estimates, the MAP (Eq. (3.22)) and the posterior mean (Eq. (2.3)), and probability bounds of the posterior distribution are computed based on this formulation.


4.4 MCMC sampler

For point estimates like MAP, deterministic optimization algorithms such as the

conjugate gradient method can be used to find the approximate solutions. However,

for obtaining the posterior mean estimate, or for estimating higher order statistics of

the random unknown, statistical sampling algorithms such as Markov chain Monte

Carlo (MCMC) simulation must be introduced to explore the posterior state space.

In this study, a modified MH sampler is designed which takes advantage of the idea of the Gibbs sampler, namely, to update the vector θ one component at a time.

The notation

\theta^{(i+1)}_{-j} = \{\theta^{(i+1)}_1, \theta^{(i+1)}_2, ..., \theta^{(i+1)}_{j-1}, \theta^{(i)}_{j+1}, ..., \theta^{(i)}_m\}

is used again here, where the superscript (i) refers to the ith sample and the subscript j refers to the jth component. The sampler is designed as follows:

Algorithm V

1. Initialize \theta^{(0)}

2. For i = 0 : N_{mcmc} - 1

   For j = 1 : m

   — sample u \sim U(0, 1)

   — sample \theta^{(*)}_j \sim q_j(\theta^{(*)}_j | \theta^{(i+1)}_{-j}, \theta^{(i)}_j)

   — if u < A(\theta^{(*)}_j, \theta^{(i)}_j): \theta^{(i+1)}_j = \theta^{(*)}_j

   — else: \theta^{(i+1)}_j = \theta^{(i)}_j,

where

A(\theta^{(*)}_j, \theta^{(i)}_j) = \min\left\{1, \frac{p(\theta^{(*)}_j | \theta^{(i+1)}_{-j})\, q(\theta^{(i)}_j | \theta^{(*)}_j, \theta^{(i+1)}_{-j})}{p(\theta^{(i)}_j | \theta^{(i+1)}_{-j})\, q(\theta^{(*)}_j | \theta^{(i)}_j, \theta^{(i+1)}_{-j})}\right\}

and

q_j(\theta^{(*)}_j | \theta^{(i+1)}_{-j}, \theta^{(i)}_j) = \frac{1}{\sqrt{2\pi}\,\sigma_{q_j}} \exp\left\{-\frac{1}{2\sigma_{q_j}^2}\left(\theta^{(*)}_j - \theta^{(i)}_j\right)^2\right\}   (4.38)

where σ_{q_j} is the std of the jth proposal distribution. The reason for updating a single component of θ at each MCMC step is to improve the acceptance probability. In fact, when the entire vector is updated at once, it is rather difficult to get the candidate accepted. This sampler is essentially a cycle of m symmetric MCMC samplers [38].

Since each run of the above MH step requires a direct computation of the transient temperature field, it is now clear that model reduction is essential.
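A compact sketch of this single-component sampler is given below; log_post is a hypothetical wrapper that evaluates the unnormalized log posterior of Eq. (4.37) through the reduced-order forward solve, and since the proposal (4.38) is symmetric the q terms cancel in the acceptance ratio.

    import numpy as np

    def componentwise_mh(log_post, theta0, sigma_q, n_steps, rng=None):
        """Single-component Metropolis-Hastings sampler in the spirit of Algorithm V.

        log_post(theta) returns the unnormalized log posterior of Eq. (4.37);
        sigma_q gives the proposal standard deviations (e.g. 1% of the current
        component magnitude can be substituted for a fixed value).  A sketch,
        not the thesis code.
        """
        rng = np.random.default_rng() if rng is None else rng
        theta = np.asarray(theta0, dtype=float).copy()
        lp = log_post(theta)
        samples = np.empty((n_steps, theta.size))
        for i in range(n_steps):
            for j in range(theta.size):
                prop = theta.copy()
                prop[j] = theta[j] + sigma_q[j] * rng.standard_normal()  # Eq. (4.38)
                lp_prop = log_post(prop)
                # symmetric Gaussian proposal: the q terms cancel in A
                if np.log(rng.uniform()) < lp_prop - lp:
                    theta, lp = prop, lp_prop
            samples[i] = theta
        return samples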

4.5 Numerical examples

A numerical example is presented in this section to demonstrate the developed

methodologies. The example considered is similar to that discussed in Park and

Sung [75] but with different spatial approximation of the point heat source and with

a reduced number of thermocouples. The schematic of the problem is shown in Fig. 4.2. The boundary conditions associated with Eqs. (4.1) and (4.2) are the following:

T = 800\ \text{K}, \quad \text{on } x = 0, 1,\ y = 0, 1,\ z = 0, 1   (4.39)

I(\vec{r},\vec{s}) = \varepsilon I_b + \frac{1-\varepsilon}{\pi}\int_{\vec{n}\cdot\vec{s}\,'<0} |\vec{n}\cdot\vec{s}\,'|\, I(\vec{r},\vec{s}\,')\, d\Omega', \quad \vec{n}\cdot\vec{s} > 0, \quad \text{on } x = 0, 1,\ y = 0, 1,\ z = 0, 1   (4.40)

Three thermocouples are mounted at 1 − (0.5, 0.5, 0.45), 2 − (0.5, 0.5, 0.4) and 3 − (0.5, 0.5, 0.35), respectively, as seen in Fig. 4.2. The heat source is located at (0.5, 0.5, 0.5). The spatial distribution of the heat source is approximated as follows:

G(x-x^*, y-y^*, z-z^*) = \exp\left\{-\frac{1}{0.05^2}\left[(x-0.5)^2 + (y-0.5)^2 + (z-0.5)^2\right]\right\}   (4.41)

[Figure 4.2: a 1 m × 1 m × 1 m cube with all faces held at 800 K, the point heat source g(t) at (0.5 m, 0.5 m, 0.5 m) and the three thermocouples 1, 2, 3 below it.]

Figure 4.2: Schematic of the numerical example.

The material properties are taken as follows: ρ = 0.4 kg/m³, C_p = 1100 J/kg·K, k = 44 W/m·K, κ = 0.5, σ = 0.5 and ε = 0.5. The steady-state solution when g(t) = 80 kW/m³ and

G(x-x^*, y-y^*, z-z^*) = \exp\left\{-\frac{1}{0.25^2}\left[(x-0.5)^2 + (y-0.5)^2 + (z-0.5)^2\right]\right\}   (4.42)

is taken as the initial condition.

[Figure 4.3: g(t) over 0 to 0.05 s, with levels 80 kW/m³ and 400 kW/m³ and transitions at t = 0.01 s and t = 0.04 s.]

Figure 4.3: Profile of the step heat source.

With the above specified conditions and a step heat source profile of g(t) as

shown in Fig. 4.3, the full-order direct model is first solved on a 26 × 26 × 26

grid from t = 0 to t = 0.05 s at 100 time steps. Fig. 4.4 shows the computed

homogeneous radiation intensities on cross section y = 0.5 at different times along

the specified directions. The homogeneous temperature fields on the same cross


section at different times are plotted in Fig. 4.5.

All 100 temperature and intensity fields are recorded as snapshots to obtain the eigenfunctions (N_e = 100). Eigenfunctions corresponding to the 6 largest eigenvalues are used in the reduced-order model (K_T = K_I = 6). Fig. 4.6 shows the 1st, 3rd and 6th eigenfunctions of I^h on y = 0.5 along the specified direction. The 1st, 3rd and 6th eigenfunctions of T^h on y = 0.5 are plotted in Fig. 4.7. To verify

1st, 3rd and 6th eigenfunctions of T h on y = 0.5 are plotted in Fig. 4.7. To verify

the accuracy of the POD method, the temperature fields on y = 0.5 obtained by

solving the reduced-order model with a heat source as in Fig. 4.3 are given in Fig.

4.8. Fig. 4.9 shows the evolution of the temperature at the thermocouple locations

computed by both full-order and reduced-order model simulations. It is obvious that

the two solutions are almost indistinguishable. It is worth emphasizing that the full

model direction simulation in this example has total DOF of 25× 273 at each time

step. These unknown nodal values of temperature and intensity need to be solved

iteratively at each time step. Therefore, the computational cost is high. In fact, a

single run of the full model simulation at 100 time steps takes almost 24 hours using

13 v2 nodes (2 2.4GHz P4 Xeon processors per node, 2GB RAM per node, 512KB

cache per processor) at Cornell Theory Center (http://www.tc.cornell.edu/CTC-

Main/Services/CTC+Resources.htm). In contrast, the reduced-order model has

only 12 DOF at each time step and one run of the 100 time-step simulation can

be finished on a PC (Intel Pentium4 2.8GHz processor, 512 RAM) within a few

seconds. In general, the number of basis needed for reduced-order modeling is fairly

small (the order of 10). Enormous savings on computational time can be achieved.

To demonstrate the Bayesian method for inverse reconstruction of the heat source

profile of Fig. 4.3, simulation data are generated by adding Gauss random noise with

zero mean and standard deviation σT to the full-order direct model solution at the

73

[Figure 4.4: four panels at (a) t = 0.005 s, (b) t = 0.01 s, (c) t = 0.025 s and (d) t = 0.05 s, plotted on the x–z plane.]

Figure 4.4: Homogeneous intensity fields on y = 0.5 along directions [0.9082483 0.2958759 0.2958759] and [−0.9082483 0.2958759 0.2958759] for the step heat source.

[Figure 4.5: four panels at (a) t = 0.005 s, (b) t = 0.01 s, (c) t = 0.025 s and (d) t = 0.05 s, plotted on the x–z plane.]

Figure 4.5: Homogeneous temperature fields on y = 0.5 for the step heat source.

[Figure 4.6: three panels on the x–z plane showing the 1st, 3rd and 6th eigenfunctions, with eigenvalues λ₁ = 1.877614e+04, λ₃ = 4.693608e−01 and λ₆ = 5.397338e−04.]

Figure 4.6: Eigenfunctions of I^h along [0.9082483 0.2958759 0.2958759] on y = 0.5.

[Figure 4.7: three panels on the x–z plane showing the 1st, 3rd and 6th eigenfunctions, with eigenvalues λ₁ = 21.98019, λ₃ = 2.136851e−03 and λ₆ = 5.771976e−07.]

Figure 4.7: Eigenfunctions of T^h on y = 0.5.

[Figure 4.8: four panels at t = 0.005 s, 0.01 s, 0.025 s and 0.05 s on the x–z plane.]

Figure 4.8: Homogeneous temperature field computed using the POD method on y = 0.5 for the step heat source.

[Figure 4.9: temperature T (K) versus t (s) at the three thermocouple locations 1 − (0.5, 0.5, 0.45), 2 − (0.5, 0.5, 0.4) and 3 − (0.5, 0.5, 0.35), comparing the full-order and reduced-order model solutions over 0 ≤ t ≤ 0.05 s (temperatures between about 798 K and 812 K).]

Figure 4.9: Temperature evolution at thermocouple locations for the step heat source.

thermocouple locations. For all of the following cases, the temperature is assumed to be measured from t = 0 to t = 0.05 s with a sampling interval δt = 0.001 s; hence, there are in total 150 measurements (3 thermocouples × 50 samples) for each case. 26 basis functions are used in the discretization of g(t), with an equal step size of dt = 0.002 s.

To obtain a good starting point for the MH sampling, an initialization step is first conducted by running the sampling algorithm while accepting only moves that increase the likelihood. A few hundred iterations of this procedure are enough to provide a good initial guess of θ.

Fig. 4.10 plots the MAP estimates of the step heat source using MCMC samples

when σT takes different values. It is seen that the MAP estimates are stable to

various magnitudes of errors. In Fig. 4.11, the posterior mean estimate when

σT = 0.01 is plotted. The estimates are achieved using 10000 converged MCMC

samples. The upper and lower bounds plotted in the same figure are the values at

3 standard deviations from the sample mean, which is an indication of the highest

density region of the posterior state space. The σqj used in the proposal distribution

is 1% of the magnitude of θ(i)j . This is to guarantee that the proposal distribution

can fully explore the posterior state space while concentrating on the highest density

79

0 0.01 0.02 0.03 0.04 0.050

0.5

1

1.5

2

2.5

3

3.5

4

4.5x 10

5

t (s)g

(W

/m3)

True heat sourceMAP estimate when σ

T = 0.005

MAP estimate when σT = 0.01

MAP estimate when σT = 0.02

Figure 4.10: MAP estimates for the step heat source.

region. The regularization constant, λ is chosen to be 8.0e−9, 5.0e−9 and 2.0e−9,

respectively for the above three cases. The overall acceptance ratio for the chain

used in Fig. 4.11 is around 77.5%.

A triangular heat source profile, as shown in Fig. 4.12, is also reconstructed following the same procedure as in the earlier example, including using the POD basis generated earlier with snapshots from the step heat source problem. Fig. 4.13 plots the MAP estimates of the triangular heat source when σ_T has different values. It is again seen that the estimates are relatively stable to changes in the magnitude of the noise. Fig. 4.14 plots the posterior mean estimate when σ_T = 0.01. The same proposal distribution as in the previous cases is used for this run. The overall acceptance ratio of the Markov chain is around 77.4%. It is seen that with simulated noise, the posterior mean estimate approximates the true heat source quite well.

Figure 4.11: Posterior mean estimate of the step heat source and probability bounds of the posterior distribution when σ_T = 0.01.

[Figure 4.12: g(t) over 0 to 0.05 s, with levels 80 kW/m³ and 160 kW/m³ and breakpoints at t = 0.02 s and t = 0.04 s.]

Figure 4.12: Profile of the triangular heat source.

Figure 4.13: MAP estimates for the triangular heat source case.

Figure 4.14: Posterior mean estimate of the triangular heat source and probability bounds of the posterior distribution when σ_T = 0.01.


4.6 Summary

In this chapter, an inverse radiation problem is solved using the Bayesian compu-

tational method. The posterior distribution of an unknown heat source strength

is computed from temperature measurements by modeling the measurement errors

as i.i.d. Gauss random variables. The Metropolis-Hastings algorithm is used to explore the posterior state space and the POD method to reduce the computational cost. The simulation results indicate that the method can provide accurate point estimates of the unknown heat source as well as complete statistical information.

Although the study is devoted toward point heat source estimation, the method-

ologies can be extended to reconstruction of distributed heat sources as well. In

the situation where thermal properties are dependent on the temperature and large

temperature variation is observed, the Bayesian computation is still applicable.

In the model reduction used in the reconstruction of the step heat source in

the first example, for demonstration purposes the snapshots were generated using

the same heat source profile. While the snapshots generated with the step heat

source profile were capable resolving the triangular heat source profile in the second

example, they may not be appropriate for use in the identification of heat sources

of other profiles and a more comprehensive set of snapshots generated from various

heat source profiles will be needed. This is indeed an open important research area

of current interest.

In summary, it can be concluded from this chapter that by integrating POD-based reduced-order modeling in the likelihood computation, the Bayesian computational method can be applied to complex nonlinear inverse continuum problems. This is a demonstration of the generic applicability of the Bayesian computational method.

Chapter 5

Contamination source identification in porous media flow - Solving the PDEs backward in time using Bayesian method

The Bayesian computational method has been addressed in the previous chapters for inverse continuum problems of estimating physical parameters and dynamic boundary conditions. In this chapter, the method is extended to another type of inverse problem, the backward solution of PDEs. A contamination source reconstruction

problem is studied herein to illustrate the methodology, in which an advection-

dispersion equation is solved backward in time. The plan of this chapter is as

follows. Section 5.1 introduces the mathematical definition of the problem. The

direct simulation of the contamination propagation is discussed in Section 5.2. It

is then followed by the hierarchical Bayesian formulation of the inverse computa-

tion. The posterior exploration algorithms are presented in Section 5.4. Section 5.5 contains numerical examples to demonstrate the developed methodology. Finally, conclusions of this work are summarized in Section 5.6.

5.1 Problem definition

The contamination source identification problem has received significant research

interest due to its applications in groundwater and soil cleanup. Addressing this

problem requires solving the governing partial differential equations (PDEs) of con-

taminant propagation in porous media flow backwards in time. Namely, the objec-

tive is to compute the history of contaminant concentration from current concentra-

tion data. The ill-posedness of this inverse problem and the difficulties in simulating

the contaminant propagation have been well-recognized. To facilitate the solution

to this challenging problem, a variety of methods have been developed over the past

several decades, which have been reviewed by Atmadja and Bagtzoglou [78] and

Michalak and Kitanidis [79].

The Bayesian approach was first introduced for solving contamination source

identification by Snodgrass and Kitanidis [80]. In follow up studies [48, 81], Micha-

lak and Kitanidis developed a confined Brownian motion model to enforce nonneg-

ativity of concentration estimates and techniques to select structure parameters of

the Bayesian posterior distribution. Ruppert et al. [82, 83] have also developed enhanced MCMC algorithms to improve the mixing of the sampling process from the posterior distributions.

In contrast to the relatively straightforward formulation of the likelihood, mod-

eling the prior distribution in the Bayesian approach is more difficult. Considering

the structure of a spatially varying concentration field, the point-pair MRF model

is used. In this work, in addition to concentration fields, the standard deviation of


measurement errors and the scaling parameter of the prior distribution are treated

as random variables. These parameters are referred to as ‘structure variables’ following the terminology of Michalak and Kitanidis [48]. These parameters are often

called hyper-parameters as well. Hierarchical Bayesian analysis is used to derive the

joint distribution of structure parameters with the unknown concentration fields.

The joint posterior state space is then explored using a mixed sampler that samples

the concentration variables using the Gibbs algorithm and the structure parameters

using the MH algorithm.

Simulation data are used in this study to test the presented inverse computa-

tion method. The Darcy equation for porous media flow is first solved using the

global gradient post-processing method (Loula et al. [84]). The velocity field is then

used to solve the advection-dispersion equation of concentration using a streamline-

upwind/Petrov-Galerkin (SUPG) finite element method. All equations are solved

on a rather fine grid to generate the simulation data, thus avoiding the so-called ‘inverse crimes’.

Propagation of contaminant in an impermeable porous medium can be described

by the following advection-dispersion equation (ADE) [85]:

\phi\frac{\partial c}{\partial t} + \nabla\cdot(c u) - \nabla\cdot(D\nabla c) = \bar{c}\, q, \quad \text{in } \Omega\times(0,T],   (5.1)

with prescribed initial and (Neumann) boundary conditions,

c(x, 0) = c_0(x), \quad \text{in } \Omega,   (5.2)

D\nabla c\cdot n = 0, \quad \text{on } \partial\Omega\times(0,T].   (5.3)

In the above equations, c is the concentration (mass fraction) of the contaminant and c̄ denotes the prescribed concentration values at the injection and production wells. Also, q denotes the volume flux rate at the wells, and φ and D are the medium porosity and dispersion tensor, respectively. Finally, Ω is the spatial domain and (0, T] is the

total time span. The anisotropic dispersion coefficient D can be modeled as follows:

D = \phi\alpha_m I + \|u\|\left[\alpha_l E(u) + \alpha_t E_\perp(u)\right],   (5.4)

with

E(u) = \frac{1}{\|u\|^2}\, u\otimes u, \qquad E_\perp(u) = I - E(u),   (5.5)

where I is the identity matrix, and α_m, α_l and α_t are the molecular diffusivity, longitudinal dispersion coefficient and transverse dispersion coefficient, respectively.
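For reference, a small helper that evaluates Eqs. (5.4)-(5.5) for a single velocity vector is sketched below; the numerical values in the usage line are only illustrative.

    import numpy as np

    def dispersion_tensor(u, phi, alpha_m, alpha_l, alpha_t):
        """Anisotropic dispersion tensor of Eqs. (5.4)-(5.5) for one velocity u."""
        u = np.asarray(u, dtype=float)
        I = np.eye(u.size)
        speed = np.linalg.norm(u)
        if speed == 0.0:
            return phi * alpha_m * I          # pure molecular diffusion
        E = np.outer(u, u) / speed**2         # projection along the flow direction
        return phi * alpha_m * I + speed * (alpha_l * E + alpha_t * (I - E))

    # e.g. parameters of the advection-dominated case of Section 5.5.2,
    # with an arbitrary illustrative velocity
    D = dispersion_tensor(u=[0.02, 0.01], phi=0.1, alpha_m=0.0,
                          alpha_l=0.04, alpha_t=0.004)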

The Darcy velocity u can be computed from the following equations:

\nabla\cdot u = q, \quad \text{in } \Omega\times(0,T],   (5.6)

u = -\frac{K(x)}{\mu(c)}\nabla p, \quad \text{in } \Omega\times(0,T],   (5.7)

u\cdot n = 0, \quad \text{on } \partial\Omega\times(0,T],   (5.8)

where p is the hydrodynamic pressure and K and µ are the permeability and dynamic

viscosity, respectively. In this study, we assume the variation of viscosity can be

neglected, i.e. that µ is a constant equal to the dynamic viscosity of the resident

fluid (water). Therefore, the ADE Eq. (5.1) is decoupled from the flow Eqs. (5.6)

and (5.7).

A direct (or forward) contaminant propagation problem is defined as the compu-

tation of the concentration distribution at all times t ∈ (0, T ], given initial condition

Eq. (5.2) and boundary condition Eq. (5.3). In the inverse problem of interest, the contamination concentration at the current time t = T can be measured at a finite number of locations inside Ω. However, the history of the contaminant distribution is not known. Namely, c_0 and the time span T between the releasing time t = 0 of the contaminant and the measurement time t = T are both unknown. The inverse problem is to

compute the concentration backwards in time, namely c(t) with t < T, on a finer-scale grid than the measurement grid. It is assumed that no prior knowledge of the releasing time and location of the contaminant is available. The releasing time is defined as the time point at which the backward-computed concentration reaches 1.0 at any location.

5.2 The direct simulation and sensitivity analysis

5.2.1 Solution of the flow equations

The solution to the direct problem is required for the inverse computation. The

direct simulation can be separated into two parts: solution to the flow equations

and solution to the concentration equation. In the first part, the constant flow

velocity field is obtained by solving Eqs. (5.6) to (5.8). The velocity is then used in

solving Eqs. (5.1)-(5.5).

In the context of the finite element (FE) method, the most common approaches

to solving the flow Eqs. (5.6)-(5.8) are the stabilized finite element method, in which

the pressure and velocity are determined simultaneously, and the gradient post-

processing method, in which the pressure is found first and then the velocity is

calculated via gradient post-processing. The gradient post-processing method is

easier to implement and computationally less costly. It solves a diffusion equation

derived by substituting Eq. (5.7) into Eq. (5.6) for pressure first. The velocity is

then computed as the smoothed gradient of the pressure field. In this work, the flow

equations are solved using a global post-processing method as discussed in [84].

In the gradient post-processing approach, the pressure is solved using

\nabla\cdot\left(\frac{K}{\mu}\nabla p\right) = -q,   (5.9)

which is derived by substituting Eq. (5.7) into Eq. (5.6). The finite element technique to solve this steady-state diffusion equation is standard. Once the pressure field is

obtained, Eq. (5.7) can be used to compute the velocity. However, since the finite

element solution of pressure is usually not smooth across element boundaries, the

velocity obtained by directly computing the gradient of pressure is discontinuous

across element boundaries, which is not physically feasible. To achieve a continuous

velocity solution, a global L2-smoothing post-processing problem is usually solved

with the following weak formulation:

(u, w) = \left(-\frac{K}{\mu}\nabla p,\ w\right),   (5.10)

where (·, ·) is the L²(Ω) inner product and w is the test function for velocity. According to Loula et al. [84], to further increase the accuracy of the post-processing result, Eq. (5.10) is often modified:

(u, w) + (\delta h)^{\alpha}(\nabla\cdot u,\ \nabla\cdot w) = \left(-\frac{K}{\mu}\nabla p,\ w\right) + (\delta h)^{\alpha}(q,\ \nabla\cdot w),   (5.11)

in which h is the finite element grid size. The parameters δ and α are here taken as 0.1 and 1, respectively [84]. Let

U = \left\{u \ \middle|\ u \in (L^2(\Omega))^{dim},\ \nabla\cdot u \in L^2(\Omega),\ u\cdot n = 0\right\}.   (5.12)

The problem can be stated as: find u ∈ U such that, for all w ∈ U, Eq. (5.11) holds.

5.2.2 Solution of the concentration equation

After computing the velocity from the above approaches, one can return to Eq. (5.1)

to evaluate the concentration. To solve this advection-dispersion equation, the


SUPG finite element formulation is used [86]:

\int_\Omega \phi\frac{\partial c}{\partial t}\, w\, d\Omega + \int_\Omega (u\cdot\nabla c)\, w\, d\Omega + \int_\Omega q c\, w\, d\Omega + \int_\Omega D\nabla c\cdot\nabla w\, d\Omega + \sum_{e=1}^{n_{el}} \int_{\Omega_e} \tau\, u_e\cdot\nabla w \left(\phi\frac{\partial c}{\partial t} + u_e\cdot\nabla c + q c\right) d\Omega_e = \int_\Omega \bar{c} q\, w\, d\Omega + \sum_{e=1}^{n_{el}} \int_{\Omega_e} \tau\, u_e\cdot\nabla w\, \bar{c} q\, d\Omega_e.   (5.13)

where w is the test function for concentration. The weak problem is to find c ∈ H¹(Ω) such that, for all w ∈ H¹(Ω), Eq. (5.13) holds. The element-based integrals (the 5th term on the left-hand side and the 2nd term on the right-hand side) are the SUPG stabilizing terms, in which τ is the upwind parameter. In this form, the SUPG finite element formulation assumes the gradients of the test functions w are discontinuous across the element boundaries. The stabilization parameter τ is computed via the following formula:

\tau = \frac{1}{2}\frac{h}{\|u_e\|}\min\left(\frac{Pe}{3},\ 1.0\right),   (5.14)

where Pe is the local (element) Peclet number that is defined as:

Pe = \frac{1}{2}\frac{h\|u_e\|^3}{u_e^T D u_e}.   (5.15)
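A small helper evaluating Eqs. (5.14)-(5.15) element by element might look as follows; u_e, D and h are the element velocity, dispersion tensor and grid size defined above.

    import numpy as np

    def supg_tau(u_e, D, h):
        """Element stabilization parameter tau of Eqs. (5.14)-(5.15)."""
        u_e = np.asarray(u_e, dtype=float)
        speed = np.linalg.norm(u_e)
        if speed == 0.0:
            return 0.0                                    # no advection: no upwinding needed
        Pe = 0.5 * h * speed**3 / (u_e @ D @ u_e)         # element Peclet number, Eq. (5.15)
        return 0.5 * (h / speed) * min(Pe / 3.0, 1.0)     # Eq. (5.14)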

With the finite element formulations introduced above, the direct problem is

solved using two-dimensional bi-linear finite elements. The simulator was imple-

mented for parallel machines using PETSc [87] and has been tested by comparing

the results to solutions of various numerical examples documented in [86, 88, 89].

5.2.3 Sensitivity analysis

A discussion of sensitivity analysis is necessary to improve understanding of the

Bayesian formulation. To present the sensitivity analysis, a simpler inverse problem


is temporarily considered in this section. By assuming a known releasing time

of the contaminant, the inverse problem introduced in Section 5.1 is reduced to the estimation of a spatially varying function c_0(x). This function estimation problem

is further transformed into a parameter estimation problem by the finite element

approximation:

c_0(x) = \sum_{j=1}^{m} \theta_j\, f_j(x),   (5.16)

where f_j(x)'s are the linear finite element basis functions and θ_j's are the nodal values of the finite element approximation of c_0 (f_j instead of w_j is used in this chapter to avoid confusion with the test functions in the weak formulations). The problem now is to estimate an unknown m-dimensional vector θ, with θ(j) = θ_j being the nodal value associated with the jth basis function.

Let c(x, T) be the concentration at the measurement time t = T that is computed from Eq. (5.1) using c_0 as the initial condition. Due to the linearity of the direct problem,

c(x, T) = \sum_{j=1}^{m} \theta_j\, c_j(x, T)   (5.17)

with c_j(x, T) being the direct solution of the concentration at t = T using f_j(x) as the initial condition.

Let an N-dimensional vector Y denote the concentration measurement data at t = T, with Y(i) being the measurement at the ith sensor location x_i and N being the total number of sensors. Furthermore, let C be an N-dimensional vector with C(i) = c(x_i, T). Using Eq. (5.17), C can be represented as:

C = Hθ, (5.18)

in which H is an N × m matrix with H(i, j) = c_j(x_i, T). H is often called the sensitivity matrix. It reflects the sensitivity of the concentration C(i) at each sensor location i with respect to small variations in each parameter θ(j).


In the remainder of this chapter, the following system relationship is assumed:

Y = Hθ + ω, (5.19)

where ω denotes the error between the measured data and the concentration com-

puted using the true initial condition. Therefore, ω contains both the random

measurement error and the numerical error. The objective of the simpler inverse

problem is to find an estimate of θ such that the discrepancy between Y and C is

minimized in some sense.

With the capability to simulate the direct problem and compute the sensitivity

matrix, we are now ready to investigate the Bayesian formulation.
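One way to assemble H is sketched below; forward_solve is a hypothetical stand-in for the direct ADE solver, and for simplicity the sketch assumes that the unknown initial field and the forward solution share the same nodal grid.

    import numpy as np

    def build_sensitivity_matrix(forward_solve, m, sensor_index, T):
        """Assemble H of Eq. (5.18), one column per finite element basis function.

        forward_solve(c0, T) is a placeholder for the direct solver: it advances
        an initial concentration field c0 to time T and returns the nodal
        concentration vector.  sensor_index lists the node indices of the N
        sensor locations.
        """
        H = np.zeros((len(sensor_index), m))
        for j in range(m):
            f_j = np.zeros(m)
            f_j[j] = 1.0                       # the j-th basis function as initial condition
            c_T = forward_solve(f_j, T)        # c_j(x, T) of Eq. (5.17)
            H[:, j] = c_T[sensor_index]        # H(i, j) = c_j(x_i, T)
        return H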

5.3 Bayesian backward computation

5.3.1 Bayesian inverse formulation

If the random errors in Eq. (5.19) are assumed to be independent identically distributed (i.i.d.) Gauss random noise with zero mean and variance v_T (standard deviation σ_T = \sqrt{v_T}), the likelihood can be formulated as:

p(Y|\theta) = \frac{1}{(2\pi)^{N/2}\, v_T^{N/2}} \exp\left\{-\frac{(Y - H\theta)^T (Y - H\theta)}{2 v_T}\right\}.   (5.20)

It should be noticed that even though other distributions can be used to model

random errors, the Gaussian distribution is the most commonly used model. With

the likelihood Eq. (5.20) and prior distribution Eq. (3.12), the posterior can be

tentatively written as:

p(\theta|Y) \propto \exp\left\{-\frac{(Y - H\theta)^T (Y - H\theta)}{2 v_T}\right\} \cdot \exp\left(-\frac{1}{2}\lambda\,\theta^T W \theta\right).   (5.21)

However, this posterior distribution depends on pre-fixed values of v_T and λ. In reality, the magnitude of the actual noise can only be roughly estimated. Selection of λ is even more non-deterministic. These two structure parameters are key to

the estimation of the posterior distribution and to the degree of smoothness of all

point estimates. Unlike earlier methods that try to select such structure parameters

before the inverse computation, the hierarchical Bayesian approach is considered

here to estimate the distribution of the structure parameters simultaneously with

the computation of the concentration distribution.

5.3.2 The hierarchical posterior distribution

The structure parameters are generally assumed to have nearly non-informative distributions over their supports. For instance, in the current example, the structure parameters λ and v_T are both assumed a priori to be nearly uniformly distributed over (0, ∞). However, the functional form of the nearly non-informative prior varies for different structure parameters. In this study, conjugate priors [26] are used to model the prior distributions of λ and v_T. For Eq. (5.21), Gamma and inverse Gamma distributions are chosen as priors for λ and v_T, respectively:

p(\lambda) \propto \frac{\beta_1^{\alpha_1}}{\Gamma(\alpha_1)}\,\lambda^{\alpha_1 - 1} e^{-\beta_1 \lambda}, \quad \lambda \in (0, \infty)   (5.22)

p(v_T^{-1}) \propto \frac{\beta_2^{\alpha_2}}{\Gamma(\alpha_2)}\, v_T^{-(\alpha_2 + 1)} e^{-\beta_2 v_T^{-1}}, \quad v_T \in (0, \infty)   (5.23)

A small value of 1.0e−3 is selected for the Gamma distribution constants α_1, α_2, β_1 and β_2. Thus, the distributions in Eqs. (5.22) and (5.23) are nearly non-informative over (0, ∞).

With the hyper-priors defined above, a hierarchical Bayesian posterior distribution can be computed as follows:

p(\theta, \lambda, v_T | Y) \propto p(Y|\theta, v_T)\, p(\theta|\lambda)\, p(\lambda)\, p(v_T)

\propto v_T^{-N/2} \exp\left\{-\frac{(Y - H\theta)^T (Y - H\theta)}{2 v_T}\right\} \cdot \lambda^{m/2} \exp\left\{-\frac{1}{2}\lambda\,\theta^T W \theta\right\} \cdot \lambda^{\alpha_1 - 1} \exp\{-\beta_1 \lambda\}\; v_T^{-(1+\alpha_2)} \exp\{-\beta_2 v_T^{-1}\}, \quad \lambda \in (0, \infty),\ v_T \in (0, \infty).   (5.24)

5.3.3 The backward marching scheme

Equation (5.24) models the posterior distribution of the initial concentration field

when T is known. In the primary problem of interest in this study, T is an unknown

variable as well. Therefore, a backward marching scheme is used to reconstruct the

entire history of the concentration fields. The procedure is as follows.

1. Select a small time step ∆t.

2. Formulate a posterior in the form of Eq. (5.24) with T = ∆t.

3. Compute the posterior mean estimate of the concentration at t = t_current − ∆t (with t_current being the current time).

4. Continue marching backwards in time until the estimated concentration reaches 1.0 at any location, which is taken as the releasing time of the contaminant.

5. If t = 0 is reached without the computed concentration reaching 1.0 anywhere, set T = T + ∆t and return to step 2.

In this approach, the concentration prior to the measurement time is recon-

structed backward in time until the releasing time is reached. Note that the sensi-

tivity problems only need to be solved once in this approach (solve the sensitivity

problem over a large time span and record concentration values at sensor locations

at all time steps).
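A schematic of this backward marching loop is sketched below; posterior_mean is a placeholder for forming the posterior of Eq. (5.24) with the recorded sensitivities for time span T and evaluating its mean with the sampler of Section 5.4.

    def reconstruct_history(posterior_mean, Y, dt, t_current, t_min=0.0):
        """Backward marching scheme sketched above (a simplified illustration).

        posterior_mean(Y, T) builds the posterior of Eq. (5.24) for time span T,
        using the sensitivity values recorded at that time, and returns the
        posterior mean concentration field.
        """
        history = []
        T = dt
        while t_current - T >= t_min:
            c_est = posterior_mean(Y, T)      # concentration at t = t_current - T
            history.append((t_current - T, c_est))
            if c_est.max() >= 1.0:            # releasing time identified
                break
            T += dt
        return history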

Computing integrals of the hierarchical posterior distribution Eq. (5.24) is not a

trivial task. More importantly, one is often interested in the highest density region


of the posterior distribution. Based upon these considerations, a Gibbs sampling

based Markov chain Monte Carlo (MCMC) simulation method is used to compute

the posterior mean estimate of concentration.

5.4 Numerical exploration of the posterior distribution

The MCMC sampler designed for exploring Eq. (5.24) is based on the basic MCMC

algorithm, the Metropolis-Hastings algorithm and the Gibbs algorithm. The pseudo-

code is the following:

Algorithm VI

1. Initialize \theta^{(0)}, \lambda^{(0)} and v_T^{(0)}

2. For i = 0 : N_{mcmc} - 1

   — sample \theta^{(i+1)}_1 \sim p(\theta_1 | \theta^{(i+1)}_{-1}, \lambda^{(i)}, v_T^{(i)}, Y)

   — sample \theta^{(i+1)}_2 \sim p(\theta_2 | \theta^{(i+1)}_{-2}, \lambda^{(i)}, v_T^{(i)}, Y)

   — ...

   — sample \theta^{(i+1)}_m \sim p(\theta_m | \theta^{(i+1)}_{-m}, \lambda^{(i)}, v_T^{(i)}, Y)

   — sample u \sim U(0, 1)

   — sample \lambda^{(*)} \sim q_\lambda(\lambda^{(*)} | \lambda^{(i)})

   — if u < A(\lambda^{(*)}, \lambda^{(i)}): \lambda^{(i+1)} = \lambda^{(*)}; else: \lambda^{(i+1)} = \lambda^{(i)}

   — sample u \sim U(0, 1)

   — sample v_T^{(*)} \sim q_v(v_T^{(*)} | v_T^{(i)})

   — if u < A(v_T^{(*)}, v_T^{(i)}): v_T^{(i+1)} = v_T^{(*)}; else: v_T^{(i+1)} = v_T^{(i)}

In the above algorithm, N_{mcmc} is the total number of sampling steps, and \theta^{(i)}, \lambda^{(i)} and v_T^{(i)} are the samples generated in the ith iteration for θ, λ and v_T, respectively. Also, \theta^{(i)}_j is the jth component of \theta^{(i)}. The notation \theta^{(i+1)}_{-j} denotes the (m−1)-dimensional vector \{\theta^{(i+1)}_1, ..., \theta^{(i+1)}_{j-1}, \theta^{(i)}_{j+1}, ..., \theta^{(i)}_m\}. Also, p(\cdot | \theta^{(i+1)}_{-j}, \lambda^{(i)}, v_T^{(i)}) is the full conditional distribution of θ_j in the ith iteration and u is a random number generated from the standard uniform distribution U(0, 1). Finally, q_\lambda(\cdot | \lambda^{(i)}) and q_v(\cdot | v_T^{(i)}) are the proposal distributions for λ and v_T in the ith iteration, respectively.

The first part of this algorithm generates samples of θ using the Gibbs sampling algorithm. It updates one component of θ at a time from that component's full conditional distribution. In other words, the distribution of θ_j is conditional on all other random variables at their current values, and is derived as follows:

p(\theta_j | \theta_{-j}, \lambda, v_T, Y) \sim N(\mu_j, \sigma_j^2),   (5.25)

\mu_j = \frac{b_j}{2 a_j}, \qquad \sigma_j = \sqrt{\frac{1}{a_j}},   (5.26)

a_j = \sum_{s=1}^{N} \frac{H_{sj}^2}{v_T} + \lambda W_{jj}, \qquad b_j = 2\sum_{s=1}^{N} \frac{\mu_s H_{sj}}{v_T} - \lambda \mu_p,   (5.27)

\mu_s = Y_s - \sum_{t \neq j} H_{st}\theta_t, \qquad \mu_p = \sum_{i \neq j} W_{ij}\theta_i + \sum_{k \neq j} W_{jk}\theta_k.   (5.28)

The acceptance probability of every Gibbs sample is 1; hence all the samples of θ

generated in this way are accepted.

The second part of this algorithm uses an MH sampler to update λ and v_T. The proposal distributions used to generate new samples of λ and v_T are both normal distributions, as follows:

q_\lambda(\lambda^{(*)} | \lambda^{(i)}) \sim N(\lambda^{(i)}, \sigma_\lambda^2),   (5.29)

and

q_v(v_T^{(*)} | v_T^{(i)}) \sim N(v_T^{(i)}, \sigma_v^2).   (5.30)

There are two notes to this sampling process. First, the physical limits of θ are 0 and 1. However, if such limits are imposed on the sampling process (i.e. rejecting negative samples and samples greater than 1), the posterior mean estimate can never reach the limits, which introduces bias. Therefore, in this study, no constraint is applied to the sampling process. By doing this, some physically infeasible samples are generated, but the posterior mean estimates are feasible. Second, the design of the proposal distributions q_\lambda(\lambda^{(*)} | \lambda^{(i)}) and q_v(v_T^{(*)} | v_T^{(i)}) must ensure that the effective regularization parameter (\frac{1}{2}\lambda\sigma^2) is not overly large.
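The sketch below puts Algorithm VI together for this posterior: a Gibbs sweep over θ using the full conditionals (5.25)-(5.28), followed by Gaussian random-walk MH updates of λ and v_T against the conditional densities implied by Eq. (5.24). The proposal standard deviations are tunable inputs and the common hyper-parameter value 1.0e−3 is used for α and β; this is an illustration, not the thesis implementation.

    import numpy as np

    def hierarchical_sampler(Y, H, W, n_steps, alpha=1e-3, beta=1e-3,
                             sig_lam=0.1, sig_v=0.1, rng=None):
        """Gibbs-within-MH sampler for the hierarchical posterior, Eq. (5.24)."""
        rng = np.random.default_rng() if rng is None else rng
        N, m = H.shape
        theta, lam, v = np.zeros(m), 1.0, 1.0      # arbitrary initialization
        out = []
        for _ in range(n_steps):
            # --- Gibbs sweep over theta (Eqs. 5.25-5.28) ---
            r = Y - H @ theta                       # running residual
            for j in range(m):
                r += H[:, j] * theta[j]             # mu_s = Y_s - sum_{t != j} H_st theta_t
                a_j = H[:, j] @ H[:, j] / v + lam * W[j, j]
                mu_p = W[j, :] @ theta + W[:, j] @ theta - 2 * W[j, j] * theta[j]
                b_j = 2 * (r @ H[:, j]) / v - lam * mu_p
                theta[j] = b_j / (2 * a_j) + rng.standard_normal() / np.sqrt(a_j)
                r -= H[:, j] * theta[j]
            # --- MH step for lambda (terms of Eq. 5.24 containing lambda) ---
            def log_p_lam(x):
                return (m / 2 + alpha - 1) * np.log(x) - x * (0.5 * theta @ W @ theta + beta)
            prop = lam + sig_lam * rng.standard_normal()
            if prop > 0 and np.log(rng.uniform()) < log_p_lam(prop) - log_p_lam(lam):
                lam = prop
            # --- MH step for v_T (terms of Eq. 5.24 containing v_T) ---
            sse = float(r @ r)
            def log_p_v(x):
                return -(N / 2 + alpha + 1) * np.log(x) - (0.5 * sse + beta) / x
            prop = v + sig_v * rng.standard_normal()
            if prop > 0 and np.log(rng.uniform()) < log_p_v(prop) - log_p_v(v):
                v = prop
            out.append((theta.copy(), lam, v))
        return out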

5.5 Numerical examples

In this section, the above introduced methodology is demonstrated via three numer-

ical examples. Without loss of generality, the examples are studied in dimensionless

form.

5.5.1 Example 1: 1D advection-dispersion in homogeneous media

The first example is a one-dimensional problem adopted from [78]. Inside the spatial

domain [0, 28], Eq. (5.1) holds with unit constant velocity, porosity and dispersion coefficient (u = 1.0, φ = 1.0, D = 1.0).

Figure 5.1: True and posterior mean estimate of concentration at t = 1.1.

Figure 5.2: True and posterior mean estimate of concentration at t = 1.9.

The concentration at x = 0 and x = 28 is

kept at zero. The initial concentration is a rectangular pulse:

c_0(x) = \begin{cases} 1, & 13.5 \le x \le 14.5 \\ 0, & 0 \le x < 13.5 \ \text{or}\ 14.5 < x \le 28 \end{cases}   (5.31)

The concentration data are collected at t = 2.0, while the objective is to estimate

the concentration at t = 1.1 and t = 1.9.

Figure 5.3: Posterior density of the structure parameter λ in obtaining the concentration estimate at t = 1.1.

Following [78], the direct problem is solved on a grid of 112 elements with a time step of 0.02. The true and estimated concentration profiles at t = 1.1 and t = 1.9 are plotted in Figs. 5.1 and 5.2, respectively. The estimate at t = 1.1 is slightly better

because the regularization parameter is more optimal. This example is different

from the one in [78] in that (i) the concentration data are measured at 27 locations

instead of at all element nodes and (ii) random noise with magnitude of 5% of the

true concentration is added to the data. Still the estimates are quite accurate. The

posterior density of the structure parameter λ in obtaining the concentration estimate at t = 1.1 is plotted in Fig. 5.3 (a Gamma distribution).

5.5.2 Example 2: 2D concentration reconstruction

In the following examples, we simulate a quarter area of the classical 5-spot problem.

A schematic of the problem is shown in Fig. 5.4. Inside the square domain (unit

side length), Eqs. (5.1) through (5.8) hold. The injection and production wells are

located at (0, 0) and (1, 1), respectively, both having volume flux rate q (varying in

different examples).

Figure 5.4: Schematic of Example 2. [The unit-square domain has no-flow velocity conditions (u = 0 or v = 0) and zero concentration flux ∂c/∂n = 0 on all boundaries, with the injection well q_in and the production well q_out at opposite corners.]

The actual initial concentration used to generate the simulation data has a normal distribution peaked at (0.375, 0.75) with standard deviation 0.1. The direct

problem is solved on a 128× 128 finite element grid with a time step 0.02. Random

measurement errors are simulated from a normal distribution with zero mean and

standard deviation 0.005 (5% - 15% of the maximum recorded data in examples) in

the homogeneous cases and 0.002 in the heterogeneous case. The simulation con-

centration data are generated by adding random measurement errors to the direct

simulation solution at sensor locations. The data are used to recover the contami-

nant concentration history on a 64× 64 grid. The number of sensors used varies in

the following examples.
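As an illustration of the data-generation step just described, the sketch below samples an assumed fine-grid solution at a uniform sensor network and adds Gaussian noise; the node-selection detail is a simplification.

    import numpy as np

    def sensor_data(c_direct, n_grid, n_sensors, sigma, rng=None):
        """Sample a fine-grid nodal concentration field at a uniform sensor
        network and add Gaussian measurement noise (e.g. n_grid=128,
        n_sensors=8, sigma=0.005 for the diffusion-dominated case)."""
        rng = np.random.default_rng() if rng is None else rng
        c = np.asarray(c_direct).reshape(n_grid + 1, n_grid + 1)   # nodal values
        idx = np.linspace(0, n_grid, n_sensors, dtype=int)         # evenly spaced nodes
        Y = c[np.ix_(idx, idx)].ravel()
        return Y + sigma * rng.standard_normal(Y.size)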

Diffusion-dominated transport in homogeneous porous media

In the first two-dimensional example, a diffusion-dominated mode is considered by

setting a very small value (q = 0.001) for the well flux rate. The permeability and

viscosity are both taken as 1.0. To ensure the molecular diffusion is the dominant

transport mechanism, φ, αm, αl, αt take values 0.1, 0.1, 0.01 and 0.001, respectively.

The concentration data are measured at t = T = 1.0 in this case. The sensors are

evenly distributed on nodes of an 8× 8 grid.

[Figure 5.5: paired panels at t = 0, 0.2, 0.4, 0.6, 0.8 and 1.0 on the unit square, column (a) true and column (b) reconstructed.]

Figure 5.5: Reconstruction of the history of contaminant concentration: (a) the true concentrations at different past time steps; (b) the reconstructed concentrations.

Figure 5.6: Reconstruction of the concentration at t = 0: (a) data are collected at 9 × 9 sensor locations at t = 0.2; (b) data are collected at 5 × 5 sensor locations at t = 1.0.

The true concentration profiles and corresponding reconstructed concentration

profiles (posterior mean estimates) at different time points are plotted in Fig. 5.5.

The time indices are obtained by setting t_current = 1.0. Since the concentration

data are only collected at sparse sites at t = 1.0, it is of interest to reconstruct the

entire concentration field at this time. This is performed here by solving the direct

problem from t = 0 to t = 1.0 using the reconstructed concentration at t = 0 as the

initial condition. The same backward step size is used as in the direct simulation

(∆t = 0.02). It is seen that the estimated concentration profiles are rather close to

the true concentration. The peak concentration value in the posterior mean estimate at t = 0 is 0.9311, indicating that in this case the backward marching procedure would be continued even past the true initial releasing time.

It has also been observed in the study that the posterior mean estimates at time

points close to the measurement time are the most accurate. In Fig. 5.6(a), the

concentration estimate at t = 0 (the true releasing time) using data measured at

t = 0.2 is plotted. The peak value in this case is 0.9916. The estimation of releasing

time is very accurate in this case.

To test if the number of sensors can be further reduced, the above estimation is repeated using data at t = 1.0 from a 5 × 5 sensor network. The posterior mean

estimate of the concentration at t = 0 (true releasing time) is plotted in Fig. 5.6(b).

The peak value is only 0.8 in this case. Therefore, although the peak location and

initial concentration profile can be identified in this case, the estimation of releasing

time is not acceptable.

Case I: Advection-dominated transport in homogeneous media

In the second numerical experiment, we reconsider the earlier example by changing

the following parameters: q = 0.04, αm = 0, αl = 0.04 and αt = 0.004. Convec-

tion and dispersion are the main mechanisms of contaminant propagation in this

case. Fig. 5.7 shows the true concentration profiles and posterior mean estimates

at different time points using data at t = T = 1.0. In this example, the data are

measured using a 16× 16 sensor network.

In Fig. 5.8, the estimated profile was generated using the data collected from

a 9 × 9 sensor network. It is seen that more fluctuations exist in the estimates.

However, the peak location and profile of concentration can still be resolved.

Case II: Advection-dispersion in heterogeneous media

In this example, we extend our earlier studies to heterogeneous porous media. All the

quantities remain the same as in Example 2 in Section 5.5.2 except the permeability,

which in this case is generated randomly from a joint log-normal distribution on a

32 × 32 finite lattice. The permeability mean at each site is 1.0 and the standard

deviation of log permeability is 1.5. An uncorrelated structure is assumed in this

case. The largest and smallest permeability values in this example differ by five orders of magnitude (a factor of 10⁵).

[Figure 5.7: paired panels at t = 0, 0.2, 0.4, 0.6, 0.8 and 1.0, column (a) true and column (b) reconstructed, with a concentration color scale from 0.0 to 1.0.]

Figure 5.7: Reconstruction of the history of pollutant concentration: (a) the true concentrations at different past time steps and (b) the reconstructions.

[Figure 5.8: panels at t = 0, 0.2, 0.4, 0.6, 0.8 and 1.0.]

Figure 5.8: Reconstruction of the contamination history when data are collected at 9 × 9 sensor locations at t = 1.0.

Fig. 5.9 shows the true concentration profiles and posterior mean estimates at

different time points using data at t = T = 1.0. In this example, the data are

measured from a 32 × 32 sensor network. The estimates obtained using data from

a 16 × 16 sensor network are also presented in Fig. 5.10. It is observed that the estimates using fewer sensor data are comparable to the estimates in Fig. 5.9. Con-

sidering the heterogeneity and uniformly distributed sensor network, estimates in

Fig. 5.10 are quite impressive.

[Figure 5.9: paired panels at t = 0, 0.2, 0.4, 0.6, 0.8 and 1.0, column (a) true and column (b) reconstructed.]

Figure 5.9: Reconstruction of the history of pollutant concentration in heterogeneous medium (data are collected at a 32 × 32 grid): (a) the true concentrations at different past time steps and (b) the computed reconstructions.

[Figure 5.10: panels at t = 0, 0.2, 0.4, 0.6, 0.8 and 1.0.]

Figure 5.10: Reconstruction of the history of pollutant concentration in heterogeneous medium (data are collected on a 16 × 16 grid).

5.6 Summary

The hierarchical Bayesian computational method is extended in this chapter to solve

the contaminant history reconstruction problem in porous media flow. Through this study, the application of the Bayesian method to the backward solution of PDEs (initial condition estimation) is demonstrated.

The regularity of the solution to this inverse problem is enforced by the Markov

Random Field model. Complete mathematical models including anisotropic dis-

persion and heterogeneous permeability are used in obtaining the direct simulation

results of contaminant propagation in porous media flow. The attributes of the


method are demonstrated via numerical examples in both homogeneous and hetero-

geneous porous media flows.

The current computational method successfully estimates an instantaneously released contamination source in mixed fluid flow with constant viscosity. When the mobility ratio of the contaminant to the resident fluid (water) deviates significantly from

unity, a more complicated model is required to simulate the direct physical process.

However, the Bayesian method is still applicable in that scenario.

Chapter 6

Open-loop control of directional solidification - A sequential Bayesian computational application

The Bayesian computational methods developed in Chapters 3 and 4 are whole-time-domain data inversion methods. Namely, the unknowns at different time points are modeled in the same joint posterior distribution conditional on all dynamic data. Such a whole-time-domain method is less practical than a sequential estimation method when applied to real-time estimation-prediction applications. For a dynamic system, it is advantageous to develop sequential Bayesian computational algorithms that can make real-time estimates conditional on dynamic data. In this chapter, a directional solidification control problem is studied to demonstrate the algorithm design of the Bayesian filter for the sequential solution of inverse problems. The direct simulator of this study was developed by Ganapathysubramanian and Zabaras in [101].

In this study, the real-time control of directional solidification is achieved by varying the external magnetic field gradient, which is different from previous solidification control mechanisms based on applying either a uniform or a rotating magnetic field. The optimal magnetic gradient is estimated at each time step according to the control objective, the boundary data, as well as the magnetic gradient at the previous time step. A random walk is used to model the evolution of the magnetic gradient, which is treated as a random process in this approach. The likelihood is defined in such a way that the convection in the melt region is minimized. The posterior distribution of the magnetic gradient at each time step is derived from the evolution prior and the likelihood, from which the point estimates are computed. The Bayesian control approach is analytically much simpler and computationally more affordable than the previous whole-time-domain control approach. It enables the on-line control of directional solidification. More significantly, it allows the quantification of uncertainties in boundary conditions such as the heat flux, and therefore provides a more robust control solution.

The plan of this chapter is as follows. In Section 6.1, the physical problem is defined. Section 6.2 introduces the fundamentals of the Bayesian filter, followed by the sequential algorithm design for this particular control problem. Numerical examples are provided in Section 6.3 to demonstrate the methodology. A brief summary is finally provided in Section 6.4.


6.1 Open-loop control of directional solidification

using magnetic gradient

Solidification from the melt to near net shape is a commonly used manufacturing

technique. The fluid flow patterns in the melt affect the quality and properties of

the final product. For instance, solidification of alloys is invariably accompanied

by macrosegregation, which is caused by thermosolutal convection in the melt and

mushy zones. By controlling the flow behavior, the final solidified material can be

suitably affected. In this context, magnetic fields offer a promising means of control-

ling the solidification process. The effect of a Rotating Magnetic Field (RMF) on

crystal growth and solidification was investigated by Patzold et al. [90] and Roplekar

and Dantzig [91]. Galindo et al. [92] studied the effects of a rotating as well as a

travelling magnetic field on crystal growth processes. The use of an RMF results in suppression of convection, but also in temperature and concentration fluctuations that lead to striation patterns in the crystal. Most of the magnetic field approaches to melt

flow control rely on the application of a constant magnetic field [93, 94]. A constant

magnetic field results in the Lorentz force that is used to damp and control the flow

[95, 96, 97, 98]. However, simultaneous application of a magnetic gradient results in

the Kelvin force along with the Lorentz force. This can be used for better control

of the melt flow resulting in higher crystal quality [99, 100].

In [101, 102], the effect of magnetic gradients on the quality of crystal growth is investigated. An infinite-dimensional functional optimization approach was taken to compute the optimal magnetic gradient history. This method requires the formulation of an appropriate continuum adjoint problem that allows the analytical calculation of the exact gradient of the objective functional. This whole-time domain approach is computationally very expensive and requires storage of all transient temperature, solute concentration, and flow fields, which consumes a large amount of computer memory. Thus, this scheme is not applicable to real-time control of the solidification process.

In the following, a computational method for the open-loop control of directional solidification is addressed using a sequential Bayesian filter. The control parameter is the time history of the imposed magnetic gradient. The objective is to reduce the deviation of the velocity field in the melt region from conditions corresponding to convection-less growth.

The direct solidification process is depicted in Fig. 6.1. Let Ω be a closed bounded region in R^{n_sd}, where n_sd is the number of spatial dimensions, with a piecewise smooth boundary Γ. The region is filled with an incompressible, conducting fluid. At time t = 0, a part of the boundary is cooled below the freezing temperature of the fluid and solidification begins along that boundary. Two-dimensional applications are considered in this study, but the formulation presented is dimension-independent. Let us denote the solid region by Ω_s and the liquid region by Ω_l. These regions share a common solid-liquid interface boundary Γ_I. As seen in Fig. 6.1, the region Ω_l has a boundary Γ_l which consists of Γ_I (the solid-liquid interface), Γ_ol (the mold wall on the liquid side), Γ_bl (the bottom boundary of the liquid domain) and Γ_tl (the top boundary of the liquid domain). Similarly, Ω_s has boundary Γ_s, which consists of Γ_I, Γ_os, Γ_bs and Γ_ts. A time-varying magnetic field with spatial gradient ∂B/∂z is applied in the z direction.

The governing PDEs are:

in the melt region:

\nabla \cdot \mathbf{v} = 0, \quad (\mathbf{x}, t) \in \Omega_l(t) \times [0, t_{max}]   (6.1)

Figure 6.1: Schematic of the directional solidification system. A time-varying magnetic field with spatial gradient ∂B/∂z is applied in the z direction.

\frac{\partial \mathbf{v}}{\partial t} + \mathbf{v} \cdot \nabla \mathbf{v} = -\nabla p + \mathrm{Pr}\,\nabla^2 \mathbf{v} - \mathrm{Ra}_T \mathrm{Pr}\,\theta_l \mathbf{e}_g + \mathrm{Ra}_T \mathrm{Pr}\,\gamma_{ob}\,\theta_l \mathbf{e}_g + \mathrm{Ra}_C \mathrm{Pr}\, c\, \mathbf{e}_g + \mathrm{Ha}^2 \mathrm{Pr}\, b^2 \left[-\nabla \phi + \mathbf{v} \times \mathbf{e}_B\right] \times \mathbf{e}_B, \quad (\mathbf{x}, t) \in \Omega_l(t) \times [0, t_{max}]   (6.2)

\frac{\partial c}{\partial t} + \mathbf{v} \cdot \nabla c = \mathrm{Le}^{-1} \nabla^2 c, \quad (\mathbf{x}, t) \in \Omega_l(t) \times [0, t_{max}]   (6.3)

\nabla^2 \phi = \nabla \cdot (\mathbf{v} \times \mathbf{e}_B), \quad (\mathbf{x}, t) \in \Omega_l(t) \times [0, t_{max}]   (6.4)

\frac{\partial \theta_l}{\partial t} + \mathbf{v} \cdot \nabla \theta_l = \nabla^2 \theta_l, \quad (\mathbf{x}, t) \in \Omega_l(t) \times [0, t_{max}]   (6.5)

\frac{\partial \theta_l}{\partial n} = 0, \quad \frac{\partial c}{\partial n} = 0, \quad (\mathbf{x}, t) \in (\Gamma_l(t) - \Gamma_I(t)) \times [0, t_{max}]   (6.6)

\mathbf{v} = \mathbf{0}, \quad \frac{\partial \phi}{\partial n} = 0, \quad (\mathbf{x}, t) \in \Gamma_l(t) \times [0, t_{max}]   (6.7)

\mathbf{v} = \mathbf{0}, \quad c = c_i, \quad \theta = \theta_i, \quad \mathbf{x} \in \Omega_l(t = 0)   (6.8)

in the solid region:

\frac{\partial \theta_s}{\partial t} = R_\alpha \nabla^2 \theta_s, \quad (\mathbf{x}, t) \in \Omega_s(t) \times [0, t_{max}]   (6.9)

\theta_s = \theta_i^*, \quad \mathbf{x} \in \Omega_s(t = 0)   (6.10)

\frac{\partial \theta_s}{\partial n} = 0, \quad (\mathbf{x}, t) \in (\Gamma_s - \Gamma_{os}) \times [0, t_{max}]   (6.11)

\theta_s = \theta_{s2}, \quad (\mathbf{x}, t) \in \Gamma_{os} \times [0, t_{max}]   (6.12)

at the interface:

R_k \frac{\partial \theta_s}{\partial n} - \frac{\partial \theta_l}{\partial n} = \mathrm{Ste}^{-1}\, \mathbf{v}_f \cdot \mathbf{n}, \quad (\mathbf{x}, t) \in \Gamma_I(t) \times [0, t_{max}]   (6.13)

\theta = \theta_0 + m c, \quad (\mathbf{x}, t) \in \Gamma_I(t) \times [0, t_{max}]   (6.14)

\frac{\partial c}{\partial n} = \mathrm{Le}(k - 1)(c + \delta)\, \mathbf{v}_f \cdot \mathbf{n}, \quad (\mathbf{x}, t) \in \Gamma_I(t) \times [0, t_{max}]   (6.15)

All of the above equations are non-dimensional with the following characteristic scales: let L be a characteristic length of the domain; the characteristic scale for time is taken as L²/α and for velocity as α/L; the dimensionless temperature θ is defined as θ ≡ (T − T_o)/∆T, where T, T_o and ∆T are the temperature, reference temperature and reference temperature drop, respectively; likewise, the dimensionless concentration field c is defined as (c − c_o)/∆c, where c, c_o and ∆c are the concentration, reference concentration and reference concentration drop, respectively; the characteristic scale for the electric potential φ is taken as αB_o, where B_o is the maximum value of the externally applied magnetic field. In these definitions, ρ is the density, k is the thermal conductivity, α (α ≡ k/ρc) is the thermal diffusivity, D is the solute diffusivity, σ_e is the electrical conductivity, and ν is the kinematic viscosity. All fields and properties refer to the liquid domain unless denoted otherwise. The magnitude of the applied gradient field is usually given as the value of ∇B². In this case, since B varies in the z direction only, it is given as B ∂B/∂z. Since only dimensionless quantities will be used in the rest of this chapter (unless mentioned otherwise), the symbol φ is used from now on to denote the dimensionless electric potential.

To achieve diffusion-based growth, the magnitude b(t) of the time-varying magnetic field gradient must be chosen so as to negate the effects of the thermal and convective buoyancy. In a whole-time domain approach, the control objective is restated in terms of b(t) ∈ L²[0, t_max]. In particular, we are looking for an optimal solution \tilde{b}(t) ∈ L²[0, t_max] such that:

S(\tilde{b}) \le S(b) \quad \forall\, b(t) \in L^2[0, t_{max}],   (6.16)

where

S(b) = \frac{1}{2} \left\| \mathbf{v}(\mathbf{x}, t; b) \right\|^2_{L^2(\Omega_l \times [0, t_{max}])} = \frac{1}{2} \int_0^{t_{max}} \int_{\Omega_l(t)} \mathbf{v}(\mathbf{x}, t; b) \cdot \mathbf{v}(\mathbf{x}, t; b)\, d\Omega\, dt   (6.17)

with the melt velocity v(x, t; b) defined from the solution of the direct problem with b(t) as a function parameter. The main difficulty with the above optimization problem is the calculation of the gradient S′(b(t)) of the cost functional in L²[0, t_max]. Sensitivity and adjoint problems have to be solved, which requires storage of the solution of the direct problem at all time steps. Below, a Bayesian filter approach is presented to allow real-time control of Eqs. (6.1) to (6.15) via estimation of b(t). In the following

numerical studies, the direct problem (Eqs. (6.1) to (6.15)) is solved using a single domain model along with volume-averaged governing transport equations. Stabilized finite element techniques are used to discretize the thermal, solutal and momentum transport equations of the coupled system. Parallel iterative solution techniques based on a matrix-free GMRES algorithm are employed for the numerical simulations as well. These solvers were developed by Ganapathysubramanian and Zabaras in [101].

6.2 A Bayesian filter-based control approach

6.2.1 Bayesian filter

Bayesian filter refers to a group of predictor-corrector type sequential computation

algorithms [27]. The most commonly used Bayesian filters are the Kalman filter


Figure 6.2: Schematic of a Bayesian filter with Markov properties.

for linear dynamic systems and the extended Kalman filter for nonlinear dynamic systems with explicit input-output relations. For a given dynamic system of the form

X_{k+1} = F_{k+1}(X_k, V_{k+1}),   (6.18)

Y_k = G_k(X_k, W_k),   (6.19)

where X_k and Y_k are the state variable and the measurement data at the kth time step, respectively, and V_k and W_k represent the associated process uncertainty and measurement uncertainty, respectively, Bayesian filters compute the conditional probability

p(X_k | D_k),   (6.20)

where D_k stands for the ensemble of Y_1, ..., Y_k, and p(X_0|D_0) = p(X_0) is known.

The key steps in deriving the posterior distribution (6.20) are the evolution updating and the observation updating, as shown in Fig. 6.2. The evolution updating formulates the distribution p(X_{k+1}|D_k) from known models of p(X_{k+1}|X_k) and p(X_k|D_k). p(X_{k+1}|X_k) is the conditional distribution of the state parameter at the (k+1)th time step given the parameter at the kth time step; this conditional distribution is specified by the model used in each particular application. p(X_k|D_k) is the posterior distribution of the parameter at the kth time step, which contains all information regarding X_k at time step k. After the evolution updating step, p(X_{k+1}|D_k) provides the prior distribution for the estimation of X_{k+1} at the (k+1)th time step.

Observation updating is conducted when data at the (k+1)th time step are collected. It can be regarded as a standard Bayesian inference step. The posterior distribution p(X_{k+1}|D_{k+1}) is derived using the likelihood p(Y_{k+1}|X_{k+1}) and the prior p(X_{k+1}|D_k). It is necessary to point out that, by using the scheme in Fig. 6.2, the dynamic state parameter is assumed to be a Markov process, namely, p(X_{k+1}|X_0, ..., X_k) = p(X_{k+1}|X_k). The Markov property is also assumed for the likelihood: p(Y_k|X_0, ..., X_k) = p(Y_k|X_k).

The essence of all Bayesian filters is the design of the evolution updating step and the observation updating step. Once the posterior distribution p(X_k|D_k) is formulated for all time steps, standard statistical computation algorithms can be used to explore the posterior state space. For problems with low dimensionality, sequential MCMC algorithms such as the particle filter [38] are appropriate for the state space exploration as well.
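To make the evolution/observation updating cycle concrete, the following minimal sketch (in Python) implements a particle-filter version of it for a scalar state. The random-walk evolution model, the observation operator g, and all noise levels are illustrative assumptions made only for this sketch; they are not the solidification model used later in this chapter.

    import numpy as np

    # Minimal particle-filter sketch of the evolution/observation updating cycle.
    # The scalar state, the random-walk evolution model and the observation
    # operator g() are illustrative assumptions, not the solidification model.
    rng = np.random.default_rng(0)

    def g(x):
        # hypothetical observation operator Y_k = G_k(X_k)
        return x**2

    sigma_v, sigma_w = 0.1, 0.2                     # process and measurement noise levels
    particles = rng.normal(0.0, 1.0, 500)           # samples of p(X_0)
    weights = np.full(particles.size, 1.0 / particles.size)

    def filter_step(particles, weights, y_obs):
        # evolution updating: push samples through p(X_{k+1} | X_k)
        particles = particles + rng.normal(0.0, sigma_v, particles.size)
        # observation updating: reweight by the likelihood p(Y_{k+1} | X_{k+1})
        weights = weights * np.exp(-(y_obs - g(particles))**2 / (2.0 * sigma_w**2))
        weights /= weights.sum()
        # resample so the particle set keeps equal weights
        idx = rng.choice(particles.size, particles.size, p=weights)
        return particles[idx], np.full(particles.size, 1.0 / particles.size)

    for y_obs in (0.9, 1.1, 1.0):                   # synthetic measurements
        particles, weights = filter_step(particles, weights, y_obs)
        print("posterior mean of X_k:", particles.mean())

The same predict-reweight structure underlies the sequential controller developed next; there the expensive part is that each likelihood evaluation requires a call to the direct solidification simulator.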

6.2.2 A sequential Bayesian controller for solidification con-

trol

For the solidification control problem, a random walk model is used for the state parameter (here, the magnitude b(t) of the magnetic gradient). The Markov assumptions on the state parameter and the likelihood apply to the current problem as well. Let B_k be the magnetic gradient value at the kth time step (b(k∆t), with ∆t being the time step size); the random walk model is then:

B_{k+1} = B_k + V_{k+1},   (6.21)

where V_{k+1} is a Gaussian random variable with zero mean and standard deviation σ_V. Following this assumption, the evolution distribution can be written as:

p(B_{k+1} | B_k) \propto \frac{1}{\sigma_V} \exp\left[-\frac{(B_{k+1} - B_k)^2}{2\sigma_V^2}\right].   (6.22)

The evolution updating is simplified to Eq. (6.22) for the current problem. In other words, the prior distribution of B_{k+1} at each time step t_{k+1} is Eq. (6.22), with the conditioning value of B_k being the optimal estimate obtained at time step t_k. This is because the posterior distribution of B_k at each time step t_k is implicit. The simplification is also reasonable since, for control purposes, only the one magnetic gradient value considered optimal should be applied at each time step.

To model the likelihood p(Y_k|B_k), a few issues are addressed first. Firstly, the observation data Y_k in the current problem are the desired values of the physical quantities to be controlled; for instance, to minimize the convection in the melt part of the solidification domain, zero fluid velocity is desired. Secondly, the system model G_k(B_k, W_k) is implicit: it is a set of numerical simulators that solves the PDEs (6.1) to (6.15) with the optimal estimates of B_0, ..., B_{k−1} and a guessed value of B_k. Thirdly, W_k does not stand for measurement error in the current problem; it represents the uncertainties in the boundary conditions of the above PDEs, for example, the fluctuation of the boundary heat flux in the energy equation. Based on these arguments, the likelihood can be written as:

p(Y_k | B_k, W_k) \propto \frac{1}{\sigma_Y} \exp\left[-\frac{(Y_k - G_k(B_k, W_k))^2}{2\sigma_Y^2}\right].   (6.23)


In the above equation, σ_Y is the standard deviation of a Gaussian distribution. σ_Y does not correspond to the variation of any physical data; it is solely a parameter controlling the trade-off between the likelihood and the prior in the optimization objective, as illustrated in the following discussion.

With these assumptions on the prior and the likelihood, the posterior distribution at the (k+1)th time step can be written as:

p(B_{k+1}, W_{k+1} | D_{k+1}) \propto \frac{1}{\sigma_Y} \exp\left[-\frac{(Y_{k+1} - G_{k+1}(B_{k+1}, W_{k+1}))^2}{2\sigma_Y^2}\right] \cdot \frac{1}{\sigma_V} \exp\left[-\frac{(B_{k+1} - B_k)^2}{2\sigma_V^2}\right] \cdot \frac{1}{\sigma_W} \exp\left[-\frac{(W_{k+1} - \bar{W}_{k+1})^2}{2\sigma_W^2}\right],   (6.24)

in which the distribution of W_{k+1} is assumed to be Gaussian with known mean value \bar{W}_{k+1} and standard deviation σ_W.

As explained above, only one optimal value of B_{k+1} needs to be computed from the posterior distribution (6.24). Herein, this value is defined as the MAP estimate, namely, the value of B_{k+1} that maximizes the posterior probability in (6.24):

(\hat{B}_{k+1}, \hat{W}_{k+1}) = \operatorname*{argmax}_{B_{k+1}, W_{k+1}} \; \frac{1}{\sigma_Y} \exp\left[-\frac{(Y_{k+1} - G_{k+1}(B_{k+1}, W_{k+1}))^2}{2\sigma_Y^2}\right] \cdot \frac{1}{\sigma_V} \exp\left[-\frac{(B_{k+1} - B_k)^2}{2\sigma_V^2}\right] \cdot \frac{1}{\sigma_W} \exp\left[-\frac{(W_{k+1} - \bar{W}_{k+1})^2}{2\sigma_W^2}\right].   (6.25)

Since σ_Y, σ_V and σ_W are all constants, maximizing (6.25) is equivalent to minimizing the following function:

(\hat{B}_{k+1}, \hat{W}_{k+1}) = \operatorname*{argmin}_{B_{k+1}, W_{k+1}} \; (Y_{k+1} - G_{k+1}(B_{k+1}, W_{k+1}))^2 + \frac{\sigma_Y^2}{\sigma_V^2}(B_{k+1} - B_k)^2 + \frac{\sigma_Y^2}{\sigma_W^2}(W_{k+1} - \bar{W}_{k+1})^2.   (6.26)

Solution of the above optimization problem is feasible when the variation of the heat flux is small. However, instead of computing the optimal estimates of both B_{k+1} and W_{k+1}, an importance sampling approach is used to explore the uncertainty in W_{k+1}.


The distribution of W_{k+1} is known in most applications; for example, the boundary heat flux may vary around a certain nominal value with a variance estimated from consecutive measurements. From the known Gaussian distribution of W_{k+1}, a set of samples is drawn randomly at each time step, and a weight proportional to its probability is assigned to each sample. Let w^{(1)}_{k+1}, w^{(2)}_{k+1}, ..., w^{(n)}_{k+1} be the set of samples at time step t_{k+1}, with n being the total number of samples, and f^{(1)}_{k+1}, f^{(2)}_{k+1}, ..., f^{(n)}_{k+1} be the corresponding sample weights; the MAP estimator of B_{k+1} can then be written as

\hat{B}_{k+1} = \operatorname*{argmin}_{B_{k+1}} \; \sum_{i=1}^{n} f^{(i)}_{k+1} \left(Y_{k+1} - G_{k+1}(B_{k+1}, w^{(i)}_{k+1})\right)^2 + \frac{\sigma_Y^2}{\sigma_V^2}(B_{k+1} - B_k)^2.   (6.27)

Eq. (6.27) is a one-variable optimization problem that can be solved using a standard gradient-based optimization method. It needs to be pointed out that the ratio σ_Y²/σ_V², which can be represented by a single parameter λ, controls the trade-off between a flat magnetic gradient history and small convection.
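As an illustration of how Eq. (6.27) is used in practice, the sketch below performs one sequential MAP control step: heat-flux perturbation samples w^(i) are drawn, and a one-variable optimization over B_{k+1} minimizes the weighted misfit plus the random-walk penalty. The stand-in simulator G(B, w) and all numerical values are assumptions made only for this sketch; in the thesis, G is the full solidification solver.

    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(1)

    def G(B, w):
        # hypothetical stand-in for the direct solidification simulator: a
        # convection measure that is smallest near some balancing value of B
        # and is perturbed by the heat-flux sample w
        return (B - 1.0)**2 + 0.05 * w

    def map_step(B_prev, Y_target=0.0, lam=0.5, n_samples=50, sigma_flux=0.05):
        # draw heat-flux perturbation samples w^(i) with equal weights f^(i)
        w = rng.normal(0.0, sigma_flux, n_samples)
        f = np.full(n_samples, 1.0 / n_samples)

        def objective(B):
            misfit = np.sum(f * (Y_target - G(B, w))**2)   # weighted data term
            return misfit + lam * (B - B_prev)**2          # random-walk penalty

        return minimize_scalar(objective, bracket=(B_prev - 1.0, B_prev + 1.0)).x

    B = 1.0
    for k in range(5):
        B = map_step(B)          # one control decision per time step
        print("time step", k + 1, ": optimal magnetic gradient B =", B)

Here lam plays the role of λ = σ_Y²/σ_V²: larger values keep the magnetic gradient history flatter at the cost of allowing more convection, which is the trend observed in the examples of Section 6.3.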

Another way to use importance sampling is to approximate the posterior distribution of B_{k+1} directly as:

p(B_{k+1} | D_{k+1}) \propto \sum_{i=1}^{n} f^{(i)}_{k+1} \left[ \frac{1}{\sigma_Y} \exp\left(-\frac{(Y_{k+1} - G_{k+1}(B_{k+1}, w^{(i)}_{k+1}))^2}{2\sigma_Y^2}\right) \cdot \frac{1}{\sigma_V} \exp\left(-\frac{(B_{k+1} - B_k)^2}{2\sigma_V^2}\right) \right],   (6.28)

and then to define the MAP estimator using this distribution. This approach is not taken here because the resulting objective function becomes rather complex to optimize.


Table 6.1: Specifications of the direct solidification problem.

    Material specifications                        Setup specifications
    27 NaCl aqueous solution                       Solidification in a rectangular cavity
    Prandtl number: 0.007                          Dimensions: 2 cm x 2 cm
    Thermal Rayleigh number: 200000                Fluid initially at 1 °C
    Solutal Rayleigh number: 10000                 Left wall kept at -25 °C
    Lewis number: 3000
    Marangoni number: 0
    Stefan number: 0.12778
    Ratio of thermal diffusivities: 1.25975

6.3 Examples

In this section, an example solidification process is studied to verify the methodology developed above. The system specifications and material properties are given in Table 6.1. Two cases are considered for this solidification process: in the first case, no system uncertainty is considered, while in the second one, the boundary heat flux has random variations. To demonstrate the effect of the control mechanism applied to the process, solidification without magnetic gradient control is first simulated. Two snapshots are shown in Fig. 6.3. It can be seen that significant convection occurs in the melt region.

In the first three examples, no heat flux uncertainty is assumed. The examples are identical except for the value of λ: λ = 0.1, λ = 0.5 and λ = 1.0 are used, respectively. The computed optimal magnetic gradients as a function of time are plotted in Figs. 6.4, 6.5 and 6.6, respectively.

Figure 6.3: Snapshots of the solidification process without magnetic gradient control applied (at the 100th and 200th time steps). The left figures are the temperature fields and the right ones are the streamlines.

Figure 6.4: Configuration of the optimal magnetic gradient B(t) versus time step when λ = 0.1.

Figure 6.5: Configuration of the optimal magnetic gradient B(t) versus time step when λ = 0.5.

Figure 6.6: Configuration of the optimal magnetic gradient B(t) versus time step when λ = 1.


The corresponding controlled temperature and solute concentration fields in the melt region are plotted in Figs. 6.7, 6.8 and 6.9, respectively. It is seen that the convection in the melt region has been greatly reduced at all time steps in the first two examples. In the third example, the magnetic gradient estimate is kept at 1.0 for a rather long period due to the high λ value. In fact, the higher the value of λ, the stronger the constraint applied to the magnetic gradient estimate and the flatter the estimate. This trend is observed in Figs. 6.4 and 6.5 as well. However, in Fig. 6.6, the magnetic gradient starts fluctuating after a certain number of time steps. The large overshoot of the magnetic gradient causes fluctuations in the melt region as well, which can be observed in Fig. 6.9. The onset of this fluctuation is due to the accumulated effect of the large magnetic gradient at the initial stage. The fluctuation is quickly smoothed out in the following control steps.

In the next two examples, the boundary heat flux is assumed to be random. In example four, the heat flux is assumed to have a uniform random distribution within −5% to 5% of the mean value, while in example five, the distribution of the boundary heat flux is Gaussian with a standard deviation equal to 5% of the mean value. In both examples, 50 samples are taken at each time step. The magnetic gradient estimates and the corresponding results are shown in Figs. 6.10 and 6.12, and in Figs. 6.11 and 6.13, respectively. It is seen that the convection can be significantly reduced as well.

It needs to be pointed out that there are three important aspects in which

the Bayesian filter approach scores over the conventional functional optimization

approach.

i. Memory requirement: The functional optimization approach involves defining

a system of equations known as the continuum sensitivity equations that represent

the sensitivity of each of the dependent variables to changes in the control variable.

Figure 6.7: Snapshots of the solidification process with the optimal magnetic gradient applied when λ = 0.1 (at the 100th, 200th, 300th and 400th time steps). The left figures are the temperature fields and the right ones are the streamlines.

Figure 6.8: Snapshots of the solidification process with the optimal magnetic gradient applied when λ = 0.5 (at the 100th, 200th, 300th and 400th time steps). The left figures are the temperature fields and the right ones are the streamlines.

Figure 6.9: Snapshots of the solidification process with the optimal magnetic gradient applied when λ = 1 (at the 274th, 322nd, 347th and 400th time steps). The left figures are the temperature fields and the right ones are the streamlines.

Figure 6.10: Configuration of the optimal magnetic gradient B(t) versus time step when the boundary heat flux has random fluctuation with a uniform distribution.

Figure 6.11: Configuration of the optimal magnetic gradient B(t) versus time step when the boundary heat flux has random fluctuation with a Gaussian distribution.

Figure 6.12: Snapshots of the solidification process with the optimal magnetic gradient applied when the boundary heat flux has random fluctuation with a uniform distribution (at the 50th, 100th, 150th and 200th time steps). The left figures are the temperature fields and the right ones are the streamlines.

Figure 6.13: Snapshots of the solidification process with the optimal magnetic gradient applied when the boundary heat flux has random fluctuation with a Gaussian distribution (at the 50th, 100th, 150th and 200th time steps). The left figures are the temperature fields and the right ones are the streamlines.


Using these equations, a functional form for the gradient of the cost functional is evaluated in terms of a new set of partial differential equations, called the continuum adjoint equations. At each optimization step the direct problem is solved over the whole time history. Then, using these data, the adjoint equations are solved backwards in time. The gradient of the cost functional is evaluated using the adjoint variables. The sensitivity equations are solved to estimate the step size in the descent direction of the optimization routine. During this process, the whole direct problem solution has to be stored in memory for later use in the solution of the adjoint equations, which is enormously memory intensive. In the solidification example considered, the temperature, velocity (two components u, v), pressure, concentration and potential have to be stored at all the grid points for all the time steps (3500 points for 400 time steps). The requirement is to store 6 × 3500 × 400 variables, which easily exceeds 300 MB. For problems with finer resolution and longer time histories, the memory requirements quickly become unmanageable. In the Bayesian approach, by contrast, only the direct simulation result at the previous time step is stored. For the example problem considered, the memory requirement is on the order of 10⁻³ of that of the conventional approach.

ii. Model complexity: Three sets of equations have to be solved in the conventional approach. The direct problem is a nonlinear problem that has to be solved over the whole time domain. The adjoint problem is a linear problem that is solved backwards in time; at each time step, the direct solution is accessed and the direct variables corresponding to that time step are used in the solution of the adjoint variables. The sensitivity equations are then solved over the whole time domain. In the Bayesian approach, only the direct problem is solved, and at each time step the direct problem only needs to be advanced one time step further.


iii. Real-time control and errors: The conventional optimization problem is a whole-time domain solution and assumes that all the data are fairly accurate. Considering the nonlinearity of the equations and their coupled nature, small errors in the measurements could lead to a catastrophic divergence of the control. In the Bayesian approach, the uncertainty can be quantified and the most robust control variable is computed.

6.4 Summary

A sequential controller for minimizing convection in directional solidification is developed in this chapter using a Bayesian filter algorithm. The control variable, the magnetic gradient, is modeled as a random walk, whose optimal value at each time step is estimated using the maximum a posteriori (MAP) estimator. The effect of system uncertainty is considered and quantified in this approach via a fluctuating boundary heat flux condition. The developed methodology is tested via numerical examples. The key advantage of this Bayesian control approach is that there is no need to solve sensitivity and adjoint problems and no need to store the direct solution in memory; the control is conducted in a real-time fashion and considerable efficiency is achieved. The approach also has the ability to quantify all sources of uncertainty and to estimate the optimal control variable accordingly.

Chapter 7

Multiscale permeability estimation

in heterogeneous porous media - A

multiscale Bayesian inversion

method

A multiscale Bayesian inversion method based on the hierarchical Markov tree (HMT) model [103, 104] is presented in this chapter to address heterogeneous parameter estimation at different length scales. The methodology is introduced using permeability estimation in heterogeneous porous media as the prototype problem. The study is composed of two parts. In the first part, the permeability is estimated at one length scale that is finer than the data-collection scale, using a Markov Random Field as the prior. This method performs well for regular (smooth) permeability fields, as tested by an example of estimating a permeability field whose logarithm is bilinear. In the second part, the more generic case of a permeability with random discontinuities is considered. A two-layer HMT model is used to model the random permeability at both the data-collection scale and at another, much finer

scale. The hierarchical Bayesian analysis is used to derive the posterior distribu-

tions of the permeability at both scales. The inner-scale spatial correlation of the random permeability is assumed to follow a Markov chain. A hybrid MCMC algorithm is

developed to explore the posterior distributions of the permeability. In Section 7.1,

the permeability estimation problem is defined and the direct mathematical model

is introduced. In Section 7.2, the Bayesian formulations of the above discussed two

cases are derived and the associated MCMC algorithms are presented. Section 7.3

contains the numerical examples that are used to demonstrate the methodologies

developed in Section 7.2. A brief summary is provided in Section 7.4.

7.1 Problem definition

Simulation of transport processes in porous media flow requires the permeability of the medium as an input. Nevertheless, the permeability of most media, such as that of a geo-engineering system, is usually not directly attainable and must be estimated from related well or seismic data [51, 105]. The two types of well data available are the static well data, which are the permeability measurements of sample porous structures at the wells, and the dynamic well data, which are the flow measurements at the wells. The estimates obtained using static data are local and therefore not able to represent the distributed porous structure. In this study, we use dynamic flow data to compute the permeability, which makes permeability estimation an inverse problem. A complication of this inverse problem is that the well data are expensive to obtain and can only be collected at sparse locations, while the permeability in practice may vary significantly in space with random discontinuities. To address this strongly ill-posed inverse problem, the direct mathematical model is introduced first.

In this study, we consider the same direct model as in Chapter 5 for species transport in porous media flow,

\nabla \cdot \mathbf{u} = q, \quad \text{in } \Omega \times (0, T],   (7.1)

\mathbf{u} = -\frac{K(\mathbf{x})}{\mu(c)} \nabla p, \quad \text{in } \Omega \times (0, T],   (7.2)

\phi \frac{\partial c}{\partial t} + \nabla \cdot (c\,\mathbf{u}) - \nabla \cdot (D \nabla c) = c q, \quad \text{in } \Omega \times (0, T],   (7.3)

with

\mathbf{u} \cdot \mathbf{n} = 0, \quad \text{on } \partial\Omega \times (0, T],   (7.4)

D \nabla c \cdot \mathbf{n} = 0, \quad \text{on } \partial\Omega \times (0, T],   (7.5)

c(\mathbf{x}, 0) = c_0(\mathbf{x}), \quad \text{in } \Omega,   (7.6)

with all the parameters defined in Chapter 5.
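Before turning to the well configuration, the following minimal finite-difference sketch illustrates the kind of direct solve involved: combining Eqs. (7.1) and (7.2) with constant viscosity gives the pressure equation −∇·((K/µ)∇p) = q, discretized here on a uniform grid with no-flow boundaries and a simplified well arrangement. The grid size, well positions and injection/production rates are illustrative assumptions; the thesis uses the direct simulator of Chapter 5.

    import numpy as np
    from scipy.sparse import lil_matrix, csr_matrix
    from scipy.sparse.linalg import spsolve

    n = 16                               # n x n pressure cells on [0, 8] x [0, 8]
    h = 8.0 / n
    mu = 1.0
    K = np.ones((n, n))                  # permeability field (uniform placeholder)

    # source term: one injection well at the centre, four producers at the
    # corners (a simplified well pattern, not the 9-spot of Fig. 7.1)
    q = np.zeros((n, n))
    q[n // 2, n // 2] = 2.0 / h**2
    q[0, 0] = q[0, -1] = q[-1, 0] = q[-1, -1] = -0.5 / h**2

    def idx(i, j):
        return i * n + j

    A = lil_matrix((n * n, n * n))
    b = q.ravel().copy()
    for i in range(n):
        for j in range(n):
            p = idx(i, j)
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < n and 0 <= nj < n:
                    # harmonic-average transmissibility between neighbouring cells
                    t = 2.0 / (mu * (1.0 / K[i, j] + 1.0 / K[ni, nj])) / h**2
                    A[p, p] += t
                    A[p, idx(ni, nj)] -= t
                # a no-flow (u . n = 0) face simply contributes nothing

    A[0, :] = 0.0                        # pin one pressure value to remove the
    A[0, 0] = 1.0                        # null space of the pure Neumann problem
    b[0] = 0.0

    pressure = spsolve(csr_matrix(A), b).reshape(n, n)
    print("pressure at the injection well:", pressure[n // 2, n // 2])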


Figure 7.1: Schematic of a 9-spot problem. An injection well is located at the center of the domain and 8 production wells are distributed at the remaining nodes of a 2 × 2 grid. In general, for an n-spot problem, the n wells are distributed at the nodes of a (√n − 1) × (√n − 1) grid, with the single injection well at the center.

The permeability K(x) is unknown in the above equations and will be estimated from dynamic well data (pressure and/or concentration at the wells in this study). Specifically, we consider an n-spot problem inside a square domain. An injection well is located at the center of the square domain, while n−1 production wells are distributed at the remaining nodes of a (√n − 1) × (√n − 1) grid. For instance, for a 9-spot problem, the well distribution is as shown in Fig. 7.1. Pressure and concentration data are measured at these well locations and used to estimate the permeability K(x). The solution method for the direct problem has been introduced in Chapter 5, and the same direct simulator will be used in this study, with the permeability being the input to the simulator and the pressure and concentration data at the wells being the output. As far as the inverse problem is concerned, we are interested in computing the permeability at length scales that are equal to or finer than the data-collection scale ((√n − 1) × (√n − 1)).

7.2 Bayesian posterior distribution of the random

permeability

The inverse problem can be posed in two different ways. In the first case, there is a fixed target length scale on which the permeability is needed. This length scale can be much smaller than the characteristic length scale of the sensor network (here, the smallest distance between two wells). Solution of this strongly ill-posed inverse problem is possible by introducing a prior distribution model such as the MRF. This approach can be shown to provide satisfactory results when the underlying permeability has good regularity (smoothness). In the second and more general case, the permeability is expected to have random discontinuities. The problem becomes more complicated and the smoothing MRF prior model will not be able to resolve the rich heterogeneous features of the porous structure. A potential approach in this case is to estimate the permeability at different length scales by using a hierarchically structured prior distribution. In the following sections, Bayesian formulations for these two different situations are presented.

7.2.1 Formulation I: MRF-based one scale model

In this case, the objective is to estimate the permeability on a pre-determined length

scale that is finer than the data-collection scale. Let Y denote the concentration

measurements at the n-1 production wells (concentration at the injection well is

treated as a known boundary condition). The system model can be denoted as:

Y = F (K(x)) + ω, (7.7)

where F (K(x)) is the simulator that solves Eqs. (7.1) to (7.6) with K(x) being the

input and concentration at production wells as the output, and ω is the random

measurement noise following a zero mean Gaussian distribution.

To parameterize the permeability, the square domain is divided into an M × M lattice, with M being the number of elements in each coordinate direction and m = M × M being the total number of elements (pixels). Note that here m is much larger than n. The permeability is assumed to be uniform within each pixel. Let θ denote the unknown vector containing the logarithms of the permeability values at all these pixels (the dimension of θ is m). The system equation can then be rewritten as:

Y = F(θ) + ω.   (7.8)

The reason for representing the permeability using the locally uniform model is that permeability, by its definition, is a locally averaged parameter that models the proportionality relation between the pressure gradient and the velocity field. Also note that in this work θ is taken as the logarithm of the permeability. This is because the permeability is often assumed to be log-normal to account for its non-negativity, and its logarithm then follows a Gaussian distribution, for which it is easier to model the priors.

With the above assumptions, the posterior distribution of the log permeability vector θ is:

p(\theta | Y) \propto \exp\left\{-\frac{1}{2\sigma^2}[F(\theta) - Y]^T [F(\theta) - Y]\right\} \cdot \exp\left\{-\frac{1}{2}\lambda\, \theta^T W \theta\right\},   (7.9)

in which σ is the standard deviation of the measurement error and W is the matrix defined in the MRF model introduced earlier (Section 2.3). This formulation is identical to the inverse formulations in the previous chapters. However, its exploration is much more difficult because the dimension of θ is in general large and the direct simulation is costly. The hybrid MCMC algorithm used to address this issue is discussed in Section 7.2.3. It is emphasized here again that the formulation in Eq. (7.9) is only able to provide a smooth permeability estimate on a fine grid with sparse well data. It will be shown that it works well for applications where only an averaged equivalent permeability is required.
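A compact way to see what Eq. (7.9) asks of the sampler is to write its logarithm as a data-misfit term plus an MRF smoothness penalty. The sketch below evaluates this log-posterior; the forward model F is a trivial placeholder for the porous-media flow simulator, and W is assembled as a first-order lattice graph Laplacian, a common concrete choice assumed here in place of the definition given in Section 2.3.

    import numpy as np

    M = 8                                 # lattice is M x M; theta has m = M*M entries

    def mrf_precision(M):
        # first-order lattice graph Laplacian, used here as the MRF matrix W
        W = np.zeros((M * M, M * M))
        for i in range(M):
            for j in range(M):
                p = i * M + j
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < M and 0 <= nj < M:
                        W[p, p] += 1.0
                        W[p, ni * M + nj] -= 1.0
        return W

    W_mrf = mrf_precision(M)

    def F(theta):
        # trivial placeholder forward model: picks a few "well" entries of theta
        return theta[::7][:10]

    def log_posterior(theta, Y, sigma=0.05, lam=10.0):
        r = F(theta) - Y
        data_term = -0.5 * (r @ r) / sigma**2                 # likelihood exponent
        prior_term = -0.5 * lam * (theta @ (W_mrf @ theta))   # MRF smoothness prior
        return data_term + prior_term

    theta0 = np.zeros(M * M)
    Y_obs = np.full(10, 0.1)
    print("log posterior at theta0:", log_posterior(theta0, Y_obs))

In the MCMC exploration, candidate fields are accepted or rejected based on differences of exactly this quantity, which is why the cost of evaluating F dominates the overall computation.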

7.2.2 Formulation II: HMT-based two scale model

In the more general case, the permeability has many random discontinuities that

have magnitudes varying over several length scales. For example, the logarithm of

a random permeability is shown in Fig. 7.2. Inside the [0, 8]× [0.8] square domain,

there are two sub-areas with large-magnitude permeability ([2, 4]× [4, 6] and [4, 6]×[2, 4]). The permeability in the upper-left darker area ([2, 4]× [4, 6]) has magnitude

around 3,000. The permeability in the lower-right darker area ([4, 6] × [2, 4]) is

around 700. While the permeability in the remaining of the domain is 1.0. Within

each sub-area with large permeability, the logarithm of permeability is a random

Figure 7.2: The log permeability of a random porous medium. Two large-magnitude discontinuities occur within the two darker areas ([2, 4] × [4, 6] and [4, 6] × [2, 4]). Within each of these areas, the permeability is a correlated Gaussian random field with correlation function ρ(r) = e^{−r²}, with r being the spatial distance between two locations. The random variations within each darker area have much smaller magnitude than the average permeability magnitudes of the two darker areas.


Figure 7.3: The enlarged upper-left darker area ([2, 4]×[4, 6]) of the log permeability

in Fig. 7.2.


Figure 7.4: The enlarged lower-right darker area ([4, 6]× [2, 4]) of log permeability

in Fig. 7.2.

Specifically, within each of these sub-areas the log permeability is a Gaussian random field with correlation function ρ(r) = e^{−r²}, where r is the distance between two locations. These two random fields are shown enlarged in Fig. 7.3 and Fig. 7.4, respectively. It is obvious that the Bayesian formulation given in Section 7.2.1 will not work for estimating such a permeability, since it tends to smooth the field.
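For reference, a field of the kind just described — a correlated Gaussian random field with correlation function ρ(r) = e^{−r²} inside one sub-area — can be sampled as in the following sketch. The grid resolution, mean level and variance are illustrative assumptions and do not reproduce the exact field of Fig. 7.2.

    import numpy as np

    rng = np.random.default_rng(2)

    n = 12                                   # n x n points covering [2, 4] x [4, 6]
    xs = np.linspace(2.0, 4.0, n)
    ys = np.linspace(4.0, 6.0, n)
    X, Yg = np.meshgrid(xs, ys)
    pts = np.column_stack([X.ravel(), Yg.ravel()])

    # pairwise squared distances and the correlation matrix rho(r) = exp(-r^2)
    d2 = ((pts[:, None, :] - pts[None, :, :])**2).sum(axis=2)
    C = np.exp(-d2)

    # Cholesky factor (small jitter keeps the matrix numerically positive definite)
    L = np.linalg.cholesky(C + 1e-8 * np.eye(n * n))

    mean_logK = 8.0                          # illustrative log-permeability level
    field = mean_logK + 0.1 * (L @ rng.standard_normal(n * n))
    print("corner values of one sampled log K field:", field.reshape(n, n)[0, :4])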

We consider in this section a hierarchical representation of the heterogeneous permeability, as depicted in Fig. 7.5 [103]. At the coarsest scale (a lumped-parameter representation), a homogeneous value is used to approximate the permeability, which is represented as the root pixel in Fig. 7.5. At each subsequent smaller length scale, every pixel (parent) is assumed to split into t different pixels (children), as indicated in Fig. 7.5, where t is equal to 4 (of course, t can take other values as well). As the splitting proceeds, a finer and finer description of the permeability is allowed.

For convenience of discussion, we introduce the following notation. Let s be the scale index, that is, the distance of the scale from the root pixel (s = 0 for the root scale), m_s be the number of pixels in the sth layer (obviously m_s = t^s), and θ^s_i be the logarithm of the permeability value at the ith pixel of the sth layer.

Figure 7.5: A schematic of the hierarchical Markov tree model (scales s = 0 (root scale), s = 1 and s = 2 shown).

In addition, we use Θ^s to denote all pixel values (log permeability values) at the sth scale. Finally, we assume Y^s is the data set at the sth level.

For the inverse problem considered in this section, one computes the posterior distributions p(Θ^s|Y^r) for all s equal to or larger than r. To derive these posterior probabilities, the hierarchical Markov tree model shown graphically in Figure 7.5 is used.

The situation where the permeability is to be estimated on two length scales r and s is first considered; r is the data-collection scale and s > r is a scale that has much higher resolution than r. It is also assumed that each pixel on scale r is split into t pixels on length scale s. The Bayesian formulation for this inverse problem is composed of two parts, as discussed next.

To estimate the permeability on scale r, no prior regularization is needed since it is the same length scale as the data-collection scale; the given well data are sufficient to solve the problem. Therefore, the distribution of the permeability on length scale r is modeled using the likelihood only,

p(\Theta^r | Y^r) \propto \exp\left\{-\frac{1}{2\sigma^2}[F(\Theta^r) - Y^r]^T [F(\Theta^r) - Y^r]\right\},   (7.10)

which implies that the prior of Θ^r is uniform.


The next step is to derive the posterior distribution of Θ^s, the log permeability estimate on length scale s. Using hierarchical Bayesian analysis, it is easy to obtain that

p(\Theta^s, \Theta^r | Y^r) \propto p(Y^r | \Theta^s, \Theta^r)\, p(\Theta^s, \Theta^r) \propto p(Y^r | \Theta^s)\, p(\Theta^s | \Theta^r)\, p(\Theta^r).   (7.11)

In the derivation of Eq. (7.11), we assume that the data do not depend on coarser

scale permeability once the finer scale permeability is known. This is valid since the

finer scale permeability contains more information of the porous structure than the

coarser scale permeability. The coarser scale log permeability Θr in this formulation

is actually treated as the hyper-parameter and only matters in the prior distribution

of finer scale permeability Θs.

The likelihood in Eq. (7.11) has the same form as Eq. (7.10), with Θ^r replaced by Θ^s. The hyper-prior distribution p(Θ^r) in Eq. (7.11) is nothing but Eq. (7.10). To model the conditional prior distribution p(Θ^s|Θ^r), the Markov assumption is used for both the cross-length-scale and the inner-length-scale correlations. Namely,

p(\theta^s_i | \Theta^s_{-i}, \Theta^r) \propto p(\theta^s_i | \theta^s_{\sim i}, \theta^r_{p_i}),   (7.12)

where Θ^s_{-i} denotes the vector (θ^s_1, ..., θ^s_{i-1}, θ^s_{i+1}, ..., θ^s_{m_s}), θ^s_{∼i} denotes the permeability pixels adjacent to θ^s_i in the same scale (4 of them in the two-dimensional case), and θ^r_{p_i} denotes the parent pixel of θ^s_i on the length scale r. Eq. (7.12) states that the

distribution of the permeability at a pixel on the finer scale depends only on the permeability at adjacent sites and on the parent permeability at the coarser length scale. This conditional distribution also defines a valid joint prior distribution of Θ^s conditional on Θ^r by the Hammersley-Clifford theorem [37]. Since the sampling algorithms introduced in the next section only require the conditional distribution to explore the posterior state space, the joint distribution formula corresponding to Eq. (7.12) is not needed.

Although the examples in this chapter only involve estimation of the permeability on two length scales, it is of interest to derive the Bayesian formula for estimating the permeability at an arbitrary number of length scales. The downward Markov property is assumed in this regard, namely,

p(\Theta^s | \Theta^{s-1}, \Theta^{s-2}, ..., \Theta^0) \propto p(\Theta^s | \Theta^{s-1}),   (7.13)

which is a perfectly admissible assumption since the finer scale permeability representation contains all information about the permeability resolved at the coarser scales. With this assumption, the posterior of the permeability on S ordered length scales s_1, s_2, ..., s_S, with s_1 being the coarsest and also the data-collection scale, is

p(\Theta^{s_1}, ..., \Theta^{s_S} | Y^{s_1}) \propto p(Y^{s_1} | \Theta^{s_S})\, p(\Theta^{s_S} | \Theta^{s_{S-1}}) \cdots p(\Theta^{s_2} | \Theta^{s_1})\, p(\Theta^{s_1}).   (7.14)

The conditional distributions p(Θ^{s_i}|Θ^{s_{i-1}}) in the above equation can be derived as in Eq. (7.12), the likelihood has the same form as Eq. (7.10), and the hyper-prior p(Θ^{s_1}) is basically its likelihood.

7.2.3 Exploring the posterior state space

The Bayesian posterior distributions derived in the previous two sections are in general of very high dimension. The single-component updating schemes used in previous chapters are too expensive to implement for these formulations, while the acceptance ratio would be extremely low if the unknown log permeability field were updated in its entirety in each MCMC sampling step. To explore the joint state space efficiently, the block hybrid MCMC sampler is used here [38]. The procedure is to first divide the two-dimensional permeability field (M_s × M_s pixels on each length scale s) into a number of equal-sized blocks.


Figure 7.6: Schematics of single component updating (upper-left), block updating

(upper-right), and whole field updating (lower).

The pixel values inside the same block are updated at the same time in one sampling iteration.

A schematic comparing the single-component, entire-field, and block updating methods is shown in Fig. 7.6. In the block hybrid MCMC, the log permeability at the pixels inside each block is updated in one MCMC step. The candidate sample vector for the parameters inside each block is generated one component at a time, using the conditional distribution Eq. (7.12) as the proposal distribution. In this algorithm design, the MCMC transition kernel is actually composed of all the block transition kernels. Let Z_s be the number of blocks within length scale s, z^s_j be the number of pixels within the jth block, and Θ^s_j denote all log permeability values within block j. The algorithm is summarized below:

Algorithm VII

1. Initialize (Θ^s)^(0)

2. For i = 0 : N_mcmc − 1
     For j = 1 : Z_s
       — sample u ∼ U(0, 1)
       — For k = 1 : z^s_j
           — sample (θ^s_{j_k})^(*) ∼ q_{j_k}( (θ^s_{j_k})^(*) | (Θ^s_{−j_k})^(i+1), (θ^r_{p_{j_k}})^(i) )
       — if u < A( (Θ^s_j)^(*), (Θ^s_j)^(i) )
             (Θ^s_j)^(i+1) = (Θ^s_j)^(*)
       — else
             (Θ^s_j)^(i+1) = (Θ^s_j)^(i),

where the subscript j_k denotes the kth component in the jth block. Since the proposal distribution q_{j_k} is the conditional prior Eq. (7.12), the acceptance probability is nothing but the likelihood ratio A( (Θ^s_j)^(*), (Θ^s_j)^(i) ) = min{ 1, p(Y^r | (Θ^s_j)^(*)) / p(Y^r | (Θ^s_j)^(i)) }.
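A compact sketch of this block updating scheme is given below. The forward model F, the Gaussian form chosen for the conditional-prior proposal (centred on a blend of in-scale neighbours and the parent value), and all sizes are assumptions made for illustration only; in the thesis the proposal is the conditional prior of Eq. (7.12) and F is the flow simulator.

    import numpy as np

    rng = np.random.default_rng(3)

    Ms = 8                                # fine-scale lattice is Ms x Ms
    blk = 4                               # blocks of blk x blk pixels
    theta_parent = np.zeros((Ms, Ms))     # coarse-scale (parent) values, fixed here

    def F(theta):
        # hypothetical forward model: "well data" = coarse block averages
        return theta.reshape(2, 4, 2, 4).mean(axis=(1, 3)).ravel()

    def log_lik(theta, Y, sigma=0.05):
        r = F(theta) - Y
        return -0.5 * (r @ r) / sigma**2

    def propose_block(theta, bi, bj, tau=0.1):
        cand = theta.copy()
        for i in range(bi, bi + blk):                 # one component at a time
            for j in range(bj, bj + blk):
                nbrs = [cand[i + di, j + dj]
                        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                        if 0 <= i + di < Ms and 0 <= j + dj < Ms]
                centre = 0.5 * np.mean(nbrs) + 0.5 * theta_parent[i, j]
                cand[i, j] = rng.normal(centre, tau)  # conditional-prior proposal
        return cand

    Y_obs = rng.normal(0.0, 0.05, 4)                  # synthetic well data
    theta = np.zeros((Ms, Ms))
    for it in range(200):
        for bi in range(0, Ms, blk):
            for bj in range(0, Ms, blk):
                u = rng.uniform()
                cand = propose_block(theta, bi, bj)
                # with the conditional prior as the proposal, the acceptance
                # probability reduces to the likelihood ratio
                if np.log(u) < log_lik(cand, Y_obs) - log_lik(theta, Y_obs):
                    theta = cand
    print("mean of the sampled fine-scale field:", theta.mean())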

The hybrid MCMC algorithm introduced above works for both formulations in the preceding two sections. However, for the likelihood distribution Eq. (7.10), the single-component updating scheme used in the previous chapters works better, since the dimension of this distribution is very low. An extra sampling step is needed when exploring the posterior distribution Eq. (7.11) for updating the hyper-parameter Θ^r. The sampling importance resampling (SIR) algorithm is used for this purpose [27]. The step is as follows. When using the single-component scheme to explore Eq. (7.10), a set of samples of the coarse scale log permeability is collected, with each sample having a weight proportional to its likelihood (the sampling step). In the exploration of Eq. (7.11), the fine scale log permeability is updated first using the above-listed hybrid MCMC algorithm. Then a Θ^r candidate sample is drawn


Figure 7.7: The true permeability field in example I.


Figure 7.8: The permeability estimate on 32× 32 grid using data at 24 wells.

from the previously collected sample set according to the samples' relative weights. The acceptance of this candidate is determined by the standard MCMC acceptance probability (the importance resampling step). Finally, the resampled Θ^r sample set has equal weights.
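A minimal sketch of the sampling/resampling idea is given below: stored coarse-scale samples are weighted by their likelihoods and then resampled in proportion to those weights, leaving an equally weighted set. The stored samples and the likelihood values are synthetic placeholders for illustration.

    import numpy as np

    rng = np.random.default_rng(4)

    theta_r_samples = rng.normal(0.0, 1.0, (100, 4))     # stored coarse-scale samples
    log_lik = -0.5 * (theta_r_samples**2).sum(axis=1)    # placeholder log likelihoods

    # the sampling step: weights proportional to the likelihood
    w = np.exp(log_lik - log_lik.max())
    w /= w.sum()

    # the importance resampling step: draw indices according to relative weights
    idx = rng.choice(theta_r_samples.shape[0], size=100, p=w)
    resampled = theta_r_samples[idx]                     # equally weighted afterwards

    print("weighted mean of Theta^r :", (w[:, None] * theta_r_samples).sum(axis=0))
    print("resampled mean of Theta^r:", resampled.mean(axis=0))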



Figure 7.9: The permeability estimate on 16× 16 grid using data at 24 wells.


Figure 7.10: The permeability estimate on 8× 8 resolution using data at 24 wells.


7.3 Examples

7.3.1 Example I - permeability with bilinear logarithm

In the first example, we apply the one-scale Bayesian formulation Eq. (7.9) to estimate the permeability shown in Fig. 7.7. A similar problem has been studied in [51] by modeling the mass transport process as immiscible displacement and using the tracer breakthrough time as the data. In the current example, the logarithm of the permeability field is a linear function of x and y,

\log K(x, y) = 0.5(x - 4.0) + 0.5(y - 4.0).   (7.15)

The domain is [0, 8] × [0, 8]. The parameters in the direct model are µ = 1.0 and φ = 0.1. The volume injection rate at the injection well is 2.0. At time zero, the concentration is zero everywhere except at the injection well ([4, 4]), and the concentration at the injection well is kept at 1.0 during the entire time period.
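For concreteness, the true field of Eq. (7.15) can be tabulated on the 32 × 32 direct-solve grid as in the short sketch below; the cell-centre discretization and the use of the natural logarithm in Eq. (7.15) are assumptions of the sketch.

    import numpy as np

    M = 32
    x = (np.arange(M) + 0.5) * (8.0 / M)          # cell-centre coordinates in [0, 8]
    X, Y = np.meshgrid(x, x)

    logK = 0.5 * (X - 4.0) + 0.5 * (Y - 4.0)      # Eq. (7.15)
    K = np.exp(logK)                              # permeability, assuming natural log

    print("K ranges from", K.min(), "to", K.max())   # roughly e^-4 to e^4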


Figure 7.11: The permeability estimate on 32× 32 resolution using data at 8 wells

without smoothing.

The direct problem is solved on a 32 × 32 grid from time zero until the tracer is detected at all the production wells. The simulation data are the concentration values at the production wells over the entire time period.



Figure 7.12: The permeability estimate on 32× 32 resolution using data at 8 wells

with smoothing.


Figure 7.13: The permeability estimate on 16× 16 resolution using data at 8 wells

with smoothing.

The minimum concentration value that can be detected is assumed to be 0.0001. Simulation data are generated by adding random Gaussian errors to the direct simulation results. The Gaussian distribution used to generate the errors has a mean of zero and a standard deviation of 5% of the true concentration value. Two cases are considered. In the first case, it is assumed that there are 24 production wells evenly distributed at the nodes of a 4 × 4 grid (a 25-spot problem). In the second case, only 8 production wells, as shown in Fig. 7.1, are considered (a 9-spot problem). The finest permeability estimate is on a 32 × 32 grid in both cases.



Figure 7.14: The permeability estimate on 8 × 8 resolution using data at 8 wells

with smoothing.

The scale ratios of the unknown-permeability grid to the sensor network, M/(√n − 1), are 8 and 16 in these two cases, respectively.

In Figs. 7.8, 7.9, and 7.10, the equivalent permeability estimates on 32 × 32, 16 × 16, and 8 × 8 grids are plotted (these three problems are solved separately using Eq. (7.9)). In these estimates, the concentration data at 24 wells (evenly distributed in the domain) are used. The estimates obtained using concentration data at only 8 wells (as shown in Fig. 7.1) are plotted in Figs. 7.11 to 7.14. Note that the plotted results are all MAP estimates. For the estimates in Figs. 7.12 to 7.14, the results are obtained by applying MRF smoothing to the original MAP sample. It is seen that the formulation Eq. (7.10) provides rather good estimates of the regular, smooth permeability.

7.3.2 Example II - permeability of a random heterogeneous

medium

In the second example, we apply the two-scale Bayesian posterior distribution developed in Section 7.2.2 to estimate the random permeability shown in Fig. 7.2. In this example, the pressure data at the well locations are used instead of the concentration data.



Figure 7.15: The coarse scale estimate of the random heterogeneous permeability

(logarithm of the permeability is plotted).


Figure 7.16: Realization I of the fine scale log permeability distribution. The left

plot is the entire field. The middle plot is the enlarged upper-left darker area

([2, 4]× [4, 6]). The right plot is the enlarged lower-right darker area ([4, 6]× [2, 4]).

Therefore, in the direct model, only Eqs. (7.1) and (7.2) are solved, with the non-penetrating boundary conditions. Again, the viscosity µ is 1.0 and the volume injection rate is 2.0. The direct problem is solved on a 128 × 128 grid, which is also the resolution of the true permeability. The data are generated by adding zero-mean Gaussian random errors to the simulation results (pressure values at the well locations); again, the standard deviation of the distribution used to generate the random errors is 5% of the true pressure values. A 25-spot case is considered here (a total of 25 pressure measurements).



Figure 7.17: Realization II of the fine scale log permeability distribution. The

left plot is the entire field. The middle plot is the enlarged upper-left darker area

([2, 4]× [4, 6]). The right plot is the enlarged lower-right darker area ([4, 6]× [2, 4]).


Figure 7.18: Realization III of the fine scale log permeability distribution. The

left plot is the entire field. The middle plot is the enlarged upper-left darker area

([2, 4]× [4, 6]). The right plot is the enlarged lower-right darker area ([4, 6]× [2, 4]).



Figure 7.19: Sample mean of the fine scale log permeability distribution. The left

plot is the entire field. The middle plot is the enlarged upper-left darker area

([2, 4]× [4, 6]). The right plot is the enlarged lower-right darker area ([4, 6]× [2, 4]).

In the first step, the formulation Eq. (7.10) is used to estimate the permeability on the same length scale as the data collection (a 4 × 4 lattice). The result is shown in Fig. 7.15.

The fine scale permeability is then estimated on a 128 × 128 grid. Three realizations (samples from the posterior distribution) of the random permeability are plotted in Figs. 7.16 to 7.18; the plots all show log permeability values. The sample mean of 1000 realizations is plotted in Fig. 7.19. Note that the color bars of the middle and right plots in Fig. 7.19 indicate much smaller intervals than the corresponding color bars in the other figures. The mean of these samples is quite close to the coarse scale estimate. This is a reasonable result, since the data do not contain much information about the permeability on length scales smaller than the well spacing. It only makes sense to estimate the distribution of the fine scale permeability instead of point estimates.

In addition to the above weakly correlated permeability example (ρ(r) = e^{−r²}), a permeability with stronger spatial correlation (ρ(r) = e^{−|r|}) is estimated following exactly the same procedure.



Figure 7.20: True log permeability with correlation coefficient ρ = e−|r|. The left

plot is the entire field. The middle plot is the enlarged upper-left darker area

([2, 4]× [4, 6]). The right plot is the enlarged lower-right darker area ([4, 6]× [2, 4]).


Figure 7.21: Realization I of the fine scale log permeability distribution with ρ =

e−|r|. The left plot is the entire field. The middle plot is the enlarged upper-left

darker area ([2, 4] × [4, 6]). The right plot is the enlarged lower-right darker area

([4, 6]× [2, 4]).

The distribution used to generate the random errors is also the same as above. The true log permeability and two realizations from the fine scale posterior distribution are depicted in Figs. 7.20 to 7.22.

7.4 Summary

A multiscale permeability estimation problem is addressed in this chapter using

hierarchical Bayesian analysis and a hierarchical Markov tree (HMT) model. A hy-


Figure 7.22: Realization II of the fine scale log permeability distribution with ρ =

e−|r|. The left plot is the entire field. The middle plot is the enlarged upper-left

darker area ([2, 4] × [4, 6]). The right plot is the enlarged lower-right darker area

([4, 6]× [2, 4]).

brid MCMC sampler is designed to explore the high-dimensional two-scale posterior

state space. It is demonstrated through the estimation of a random heterogeneous permeability from pressure data that the Bayesian formulations are able to provide a distribution estimate of the fine scale permeability using sparse data.

Chapter 8

Conclusions and suggestions for future research

An integrated Bayesian and scientific computational framework has been developed in this thesis for the solution of generic data-driven inverse problems in continuum transport processes (inverse transport problems). Stochastic modeling of inverse transport problems using the Bayesian inference method is first introduced via the study of a group of inverse heat conduction problems (IHCPs). An automated regularization parameter selection method is presented using hierarchical Bayesian models. Two-scale Markov Random Field (MRF) and discontinuity adaptive Markov Random Field (DAMRF) models are used as priors in addressing inverse problems with dynamic unknowns and discontinuous unknowns, respectively. Metropolis-Hastings (MH) algorithms, in particular the Gibbs sampler and cyclic MH samplers, are introduced for the numerical solution of the IHCPs. The fusion of Bayesian computation and continuum modeling is established via these studies.

For complex nonlinear inverse transport problems, a proper orthogonal decomposition (POD) based reduced-order modeling technique is introduced and integrated with MH algorithms and parallel computation. The high computational cost associated with the Bayesian approach to inverse transport problems is thus addressed. Further, the backward solution of continuum transport process equations and the open-loop control of directional solidification are addressed using a sequential Bayesian computational method. Finally, multiscale parameter estimation in heterogeneous porous media flow is addressed by introducing the hierarchical Markov tree (HMT) model.

It is concluded that the Bayesian approach is able to model and quantify system and measurement uncertainties in inverse transport problems. The Bayesian method cures the ill-posedness of the inverse problem by posing it as a well-posed problem in an expanded stochastic space. The use of spatial statistics models in prior distribution modeling, together with hierarchical Bayesian models, allows more flexible regularization than conventional inverse methods and enables automated selection of the regularization parameters. It is also concluded that the high computational cost of the Bayesian computational approach can be addressed by using reduced-order modeling techniques in the exploration of Bayesian posterior distributions. Also, the sequential Bayesian computational approach is more computationally efficient than the conventional whole-time-domain functional optimization approach in the estimation and prediction of dynamic transport processes. Multiscale Bayesian inversion models provide tools to obtain the distribution and statistics of heterogeneous parameters from sparse data.

Although this thesis has addressed the fundamental issues of applying the Bayesian computational method to inverse transport problems, several new developments can broaden the applicability and scope of this approach. Suggestions for the continuation of this study are provided next.


8.1 Pattern recognition for reduced-order modeling

It has been discussed that POD-based reduced-order modeling is one effective approach to reducing the computational cost in Bayesian computation. However, the accuracy of reduced-order modeling depends strongly upon the selection of the POD basis fields. For inverse problems in complex continuum systems, the underlying physics may vary significantly. Different physics may exist simultaneously, such as the combined heat convection and radiation processes in participating media. Also, the mathematical models for the same physical process may differ; for instance, the dispersion tensor in an advection-dispersion equation may be isotropic or anisotropic, homogeneous or inhomogeneous. Furthermore, the characteristics of the same physical process may be completely different for different parameters in the governing PDEs; the well-known Bénard thermal instability, whose behavior changes dramatically with the thermal Rayleigh number, is one example. Therefore, for a complex continuum system with extensive experimental and/or simulation results, it is necessary to develop an algorithm that can identify the proper POD basis containing the correct physical modes.
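
For reference, the sketch below shows one common way of extracting POD basis fields from a snapshot matrix via the singular value decomposition; the snapshot data and the energy threshold are illustrative assumptions, not the specific procedure used in this thesis.

```python
import numpy as np

# Hedged sketch: extracting POD basis fields from simulation snapshots via the
# SVD (the method of snapshots is an equivalent alternative for very large fields).
# The snapshot array below is a placeholder; in practice the columns would hold
# solution fields from the forward transport model.
n_dof, n_snap = 4096, 200
rng = np.random.default_rng(1)
snapshots = rng.standard_normal((n_dof, n_snap))        # columns = solution fields

mean_field = snapshots.mean(axis=1, keepdims=True)
U, s, _ = np.linalg.svd(snapshots - mean_field, full_matrices=False)

# Keep enough modes to capture, say, 99.9% of the snapshot "energy".
energy = np.cumsum(s**2) / np.sum(s**2)
k = int(np.searchsorted(energy, 0.999)) + 1
pod_basis = U[:, :k]                                    # orthonormal POD basis fields
```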

Developing a digital library with associated classification, search and enlargement capabilities is a potential approach to address this issue. Statistical classification algorithms can be trained on the massive field data to build the categorized digital library. Each branch of the library contains a set of POD basis fields that corresponds to a specific physical process within a certain parameter range. The proper basis for a new application can then be extracted from the library. Considering the high dimensionality of such a classification problem, the Support Vector Machine (SVM) [106, 107] is deemed a proper means for conducting the basis selection task, as sketched below. The SVM is a statistical learning algorithm for pattern classification and regression, which involves training a set of optimal separating hyperplanes from labelled fields belonging to different categories. The development of a digital library with an associated classifier will enable the application of Bayesian computation for more generic and complex inverse continuum problems.
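
The following is a minimal sketch of the basis selection step described above, assuming the scikit-learn SVM implementation and that each candidate field has already been reduced to a low-dimensional feature vector; the features, labels and kernel choice are illustrative, not a prescribed design.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hedged sketch: classifying fields into library branches so that the matching
# POD basis set can be retrieved. Features and labels are placeholders; in
# practice features might be leading POD coefficients or physical parameters.
rng = np.random.default_rng(2)
n_fields, n_features, n_branches = 300, 20, 3
features = rng.standard_normal((n_fields, n_features))
labels = rng.integers(0, n_branches, size=n_fields)     # branch index of each training field

classifier = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
classifier.fit(features, labels)

# For a new application, the predicted branch points to the POD basis to use.
new_field_features = rng.standard_normal((1, n_features))
branch = classifier.predict(new_field_features)[0]
print("use POD basis from library branch", branch)
```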

8.2 Enhancing the multiscale Bayesian inversion techniques

To solve multiscale parameter estimation problems such as the permeability estimation problem discussed in Chapter 7, the key issues are modeling the prior distribution of the heterogeneous parameter at hierarchical length scales, deriving the posterior distribution that contains parameters and data at all of these length scales, allowing adaptive selection of the basis functions for the discretization, and exploring the multiscale posterior state space efficiently. A hierarchical Markov tree (HMT) model has been introduced in this thesis to address the problem under a Bayesian analysis framework. Some further developments may enhance the capability to deal with more generic multiscale inverse problems.

Firstly, the development of an adaptive basis function selection method using Bayesian analysis [108] would enable a more flexible scheme for modeling the multiscale parameters. In addition, both deterministic optimization methods and standard statistical computation algorithms fail to calculate the statistics of multiscale parameter estimates. New algorithms that integrate hybrid MCMC simulation [38] and multiscale simulation methods are potential approaches. The major complication in sampling from the multiscale parameter state space is that no single MCMC proposal distribution works universally well for all coefficients at different locations and scales. A hybrid MCMC design allows a set of decoupled proposal kernels, each of which targets samples from a compact region of the multiscale state space, to be composed analytically into one mixed kernel. The mixed proposal distribution guarantees that the entire multiscale posterior state space is explored, while direct multiscale simulation methods can be used within the likelihood computation for each hybrid update step. Hybrid MCMC schemes have to be fused with multiscale simulation tools to explore the distribution with the abilities of i) generating samples of parameter projections at all length scales efficiently; ii) spending more sampling steps in the regions and length scales with high heterogeneity; iii) projecting the distribution on one scale upward and downward to other scales; and iv) updating the distribution with consecutively acquired data.
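
A minimal sketch of such a mixed-kernel (hybrid) Metropolis-Hastings update is given below; the target density, the coarse/fine block split, the step sizes and the mixture weights are all illustrative assumptions.

```python
import numpy as np

# Hedged sketch: a mixed-kernel Metropolis-Hastings update. `log_post` is a
# placeholder log posterior; in practice it would call the multiscale forward
# model and the pressure data likelihood.
rng = np.random.default_rng(3)

def log_post(theta):
    return -0.5 * np.sum(theta**2)          # standard normal stand-in target

def rw_proposal(theta, idx, step):
    """Random-walk proposal acting only on the coefficient block `idx`."""
    prop = theta.copy()
    prop[idx] += step * rng.standard_normal(len(idx))
    return prop

# Two decoupled kernels: one for the coarse scale block, one for the fine scale
# block. The block sizes, step sizes and mixture weights are assumed values.
coarse_idx = np.arange(0, 16)
fine_idx = np.arange(16, 16 + 256)
kernels = [(coarse_idx, 0.5), (fine_idx, 0.05)]
weights = np.array([0.5, 0.5])

theta = np.zeros(16 + 256)
for _ in range(5000):
    idx, step = kernels[rng.choice(len(kernels), p=weights)]   # pick one kernel
    prop = rw_proposal(theta, idx, step)
    # Symmetric proposals => standard Metropolis acceptance ratio.
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        theta = prop
```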

8.3 Wavelet function representation

In all of the examples in this thesis, finite element basis functions are used for the discretization of the inverse solution. This is not the optimal strategy in applications where the inverse solution is multiscale in nature. The wavelet domain method provides better basis sets for such inverse problems [109].

Wavelet basis functions can represent function spaces at all length scales. The unresolved residual can be quantified using its projection at the minimum resolved length scale, which makes wavelets a consistent choice for the parametrization of multiscale inverse solutions. By linking with mutual information theory and maximum entropy theory, the proper minimum resolved length scale and the statistics of the multiscale parameter can be estimated. Therefore, it is expected that by introducing wavelet basis functions into Bayesian inverse computation, broader applications can be achieved.
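
A minimal sketch of such a multiscale wavelet representation is given below, assuming the PyWavelets package; the field, the wavelet family and the decomposition depth are illustrative choices.

```python
import numpy as np
import pywt

# Hedged sketch: representing a 2-D parameter field in a wavelet basis so that
# coefficients at each length scale can be inspected or truncated separately.
rng = np.random.default_rng(4)
field = rng.standard_normal((128, 128))                # placeholder log permeability field

coeffs = pywt.wavedec2(field, wavelet="db2", level=3)  # multiscale decomposition
# coeffs[0] is the coarsest approximation; coeffs[1:] are detail coefficient
# triplets (horizontal, vertical, diagonal) at successively finer length scales.

# Zero the finest-scale details to mimic choosing a minimum resolved length scale.
coeffs_trunc = [coeffs[0]] + list(coeffs[1:-1]) \
    + [tuple(np.zeros_like(d) for d in coeffs[-1])]
field_coarse = pywt.waverec2(coeffs_trunc, wavelet="db2")
field_coarse = field_coarse[:field.shape[0], :field.shape[1]]   # guard against padding

# The unresolved residual is the part the truncated representation leaves out.
residual = field - field_coarse
print("relative residual norm:", np.linalg.norm(residual) / np.linalg.norm(field))
```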

Bibliography

[1] J. V. Beck, B. Blackwell and C. R. St-Clair Jr., Inverse Heat Conduction:

Ill-posed Problems, Wiley-Interscience, New York, 1985.

[2] R. Siegel and J.R. Howell, Thermal Radiation Heat Transfer, 3rd Edt., Hemi-

sphere Publishing Corporation, 1992.

[3] O. M. Alifanov, Inverse Heat Transfer Problems, Springer-Verlag, Berlin, 1994.

[4] M. N. Ozisik and R. B. Orlande, Inverse Heat Transfer: Fundamentals and

Applications, Taylor & Francis, 2000.

[5] K. A. Woodbury. (Edt.), Inverse Engineering Handbook 1st edition, CRC

Press, 2002

[6] N. Zabaras, Inverse problems in heat transfer, Chapter 17, Handbook of

Numerical Heat Transfer, 2nd Edt., John Wiley and Sons, 2004. (W.J.

Minkowycz, E.M. Sparrow, J. Y. Murthy, Edts.)

[7] N. Zabaras and S. Kang, On the Solution of An Ill-posed Inverse Design Solidi-

fication Problem Using Minimization Techniques in Finite and Infinite Dimen-

sional Spaces, International Journal for Numerical Methods in Engineering,

36:3973-3990, 1993.


[8] R. Sampath and N. Zabaras, Inverse thermal design and control of solidifica-

tion processes in the presence of a strong external magnetic field, International

Journal for Numerical Methods in Engineering, 50:2489-2520, 2001.

[9] R. Sampath and N. Zabaras, A functional optimization approach to an inverse

magneto-convection problem, Compt. Meth. Appl. Mech. Eng., 190:2063–2097,

2001.

[10] A.N. Tikhonov, Solution of incorrectly formulated problems and the regular-

ization method, Soviet Math. Dokl., 4(4):1035-1038, 1963.

[11] A.N. Tikhonov, Regularization of incorrectly posed problems, Soviet Math.

Dokl., 4(6):1624-1627, 1963.

[12] A. N. Tikhonov, Solution of ill-posed problems, Halsted Press, Washington,

1977.

[13] N. Zabaras, and J.C. Liu, An analysis of two-dimensional linear inverse heat-

transfer problems using an integral method, Num. Heat Transfer, 13(4):527-

533, 1988.

[14] F. Hettlichy and W. Rundell, Iterative methods for the reconstruction of an

inverse potential problem, Inverse Problems, 12:251-266, 1996.

[15] D.A. Murio, The Mollification Method and the Numerical Solution of Ill-posed

Problem, Wiley, New York, 1993.

[16] P. M. Norris, Application of experimental design methods to assess the effect

of uncertain boundary conditions in inverse heat transfer problems, Int. J.

Heat Mass Transfer, 41:313-322, 1998.


[17] B. F. Blackwell and K. J. Dowding., Sensitivity and uncertainty analysis for

thermal problems, Proceedings of the 4th International Conference on Inverse

Problems in Engineering, Rio de Janeiro, Brazil, 2002.

[18] T. D. Fadale, A. V. Nenarokomov and A. F. Emery, Uncertainties in parameter

estimation: the inverse problem, Int. J. Heat Mass Transfer, 38(3):511-518,

1995.

[19] A. F. Emery, A. V. Nenarokomov and T. D. Fadale, Uncertainties in para-

meter estimation: the optimal experiment design, Int. J. Heat Mass Transfer,

43:3331-3339, 2000.

[20] V. A. Badri Narayanan and N. Zabaras, Stochastic inverse heat conduction

using a spectral approach, Int. J. Numer. Meth. Eng., 60(9):1569-1593, 2004.

[21] A. F. Emery, Stochastic regularization for thermal problems with uncertain

parameters, Inverse Problems in Engineering, 9:109-125, 2001.

[22] C. Ferrero and K. Gallagher, Stochastic thermal history modeling, 1. con-

straining heat flow histories and their uncertainty, Marine and Petroleum Ge-

ology, 19:633-648, 2002.

[23] N. Leoni and C.H. Amon, Bayesian surrogates for integrating numerical, ana-

lytical and experimental data: application to inverse heat transfer in wearable

computers, IEEE Transactions Comps. Pack. Manuf. Technology, 23:23-33,

2000.

[24] C. P. Robert, The Bayesian Choice, From Decision-Theoretic Foundations to

Computational Implementation, 2nd Edt., Springer, 2001.


[25] J. Besag, P. Green, D. Higdon and K. Mengersen, Bayesian computation and

stochastic systems, Statistical Science, 10(1):3-41, 1995.

[26] P. Congdon, Bayesian Statistical Modeling, Wiley, New York, 2001.

[27] J. P. Kaipio and E. Somersalo, Computational and Statistical Methods for

Inverse Problems, Springer-Verlag, New York, 2005.

[28] C. Vogel, An applied mathematician's perspective on regularization methods,

lecture in opening workshop for inverse problem methodology in complex sto-

chastic models, Session of parameter estimation and inverse problems, sta-

tistics perspective, Statistical and Applied Mathematical Sciences Institute,

Duke University, 2002.

[29] J. Wang and N. Zabaras, A Bayesian inference approach to the stochastic

inverse heat conduction problem, International Journal of Heat and Mass

Transfer, 47:3927-3941, 2004.

[30] J. Wang and N. Zabaras, A Markov Random Field model of contamination

source identification in porous media flow, International Journal of Heat and

Mass Transfer, accepted for publication.

[31] J. Wang and N. Zabaras, Hierarchical Bayesian models for inverse problems

in heat conduction, Inverse Problems, 21:183-206, 2005.

[32] J. Wang and N. Zabaras, Using Bayesian statistics in the estimation of heat

source in radiation, International Journal of Heat and Mass Transfer, 48:15-

29, 2005.

[33] J. Besag and C. Kooperberg, On the conditional and intrinsic autoregressions,

Biometrika, 82:733-746, 1995.


[34] J. Møller (Edt.), Spatial Statistics and Computational Methods, Springer-

Verlag, New York, 2003.

[35] J. Mateu and F. Montes (Edts.), Spatial Statistics Through Applications, In-

ternational Series on Advances in Ecological Sciences, WITPress, Boston,

2002.

[36] J. Besag and R. A. Kempton, Statistical analysis of field experiments, Bio-

metrics, 78:301-304, 1986.

[37] J. Besag and P. J. Green, Spatial statistics and Bayesian computation, Journal

of the Royal Statistical Society, Series B, Methodological, 55:25-37, 1993.

[38] C. Andrieu, N. Freitas, A. Doucet and M. I. Gordan, An introduction to

MCMC for machine learning, Machine Learning, 50:5-43, 2003.

[39] J. S. Liu, Monte Carlo Strategies in Scientific Computing, Springer-Verlag,

Berlin, 2001.

[40] P. Bremaud, Markov Chains, Gibbs Fields, Monte Carlo Simulation, and

Queues, Springer-Verlag, New York, 1999.

[41] P. J. Van Laarhoven and E. H. L. Arts, Simulated Annealing: Theory and

Applications, Reidel Publishers, Amsterdam, 1987.

[42] L. Tierney, Markov chains for exploring posterior distributions, The Annals

of Statistics, 22(4):1701-1762, 1994.

[43] W. R. Gilks, S. Richardson and D. J. Spiegelhalter(Edt.), Markov Chain

Monte Carlo in Practice, Chapman & Hall Ltd, New York, 1996.


[44] I. Beichl and F. Sullivan, The Metropolis algorithm, Computing in Science

and Engineering, 2(1):65-69, 2000.

[45] J. L. Beck and S. K. Au, Bayesian updating of structural models and reliabil-

ity using Markov Chain Monte Carlo Simulation, J Engineering Mechanics,

128:380-391, 2002.

[46] J. P. Kaipio, V. Kolehmainen, E. Somersalo and M. Vauhkonen, Statistical

inversion and Monte Carlo sampling methods in electrical impedance tomog-

raphy, Inverse Problems, 16: 1487-1522, 2000.

[47] T. J. Sabin, C. A. L. Bailer-Jones and P.J. Withers, Accelerated learning using

Gaussian process models to predict static recrystallization in an Al-Mg alloy,

Modeling Simul. Mater. Sci. Eng., 8:687-706, 2000.

[48] A. M. Michalak and P. K. Kitanidis, A method for enforcing parameter non-

negativity in Bayesian inverse problems with an application to contaminant

source identification, Water Resour. Res., 39:1033-1046, 2003

[49] I. G. Osio, Multistage Bayesian Surrogates and Optimal Sampling for En-

gineering Design and Process Improvement, Ph.D. Thesis, Carnegie Mellon

University, Pittsburgh, PA, 1996

[50] D. Higdon, H. K. Lee and Z. Bi, A Bayesian approach to characterizing un-

certainty in inverse problems using coarse and fine-scale information, IEEE

Transactions on Signal Processing, 50(2):389-399, 2002.

[51] H. K. Lee, D. M. Higdon, Z. Bi, M. A.R. Ferreira and M. West, Markov

random field models for high-dimensional parameters in simulations of fluid

flow in porous media, Technometrics, 44 (3):230-241, 2002.


[52] R. Chellappa and A. Jain (Edts), Markov Random Fields Theory and Appli-

cation, Academic Press Inc., 1991.

[53] S. Geman and D. Geman, Stochastic relaxation, Gibbs distributions and the

Bayesian restoration of images, Transactions on Pattern Analysis and Machine

Intelligence, 6:721-741, 1984.

[54] S. P. Brooks and G. O. Roberts, Convergence assessment techniques for Markov

Chain Monte Carlo, Statistics and Computing, 8:319-335, 1998.

[55] J. V. Beck and K. J. Arnold, Parameter Estimation in Engineering and Sci-

ence, New York: Wiley, 1977.

[56] J. R. Cannon and P. DuChateau, Inverse problems for an unknown source in

the heat equation, Journal of Mathematical Analysis and Applications, 75:465-485, 1980.

[57] V. Isakov, Inverse Source Problems, American Mathematical Society, 1990

[58] A. Nanda and P. Das, Determination of the source term in the heat conduction

equation, Inverse Problems, 12:325-339, 1996.

[59] D. J. Kang and K. S. Roh, A discontinuity adaptive Markov model for color

image smoothing, Image and Vision Computing, 19:369-379, 2001

[60] S. Z. Li, Discontinuous MRF prior and robust statistics: a comparative study,

Image and Vision Computing, 13:227-233, 1995

[61] Z. Yi and D. A. Murio, Source terms identification for the diffusion equa-

tion, 4th International Conference on Inverse Problems in Engineering Rio de

Janeiro, Brazil, 2002.


[62] P. Holmes, J. L. Lumley and G. Berkooz, Turbulence, Coherent Structures,

Dynamical Systems and Symmetry, Cambridge University Press, 1998.

[63] M. F. Modest, Radiative Heat Transfer, McGraw-Hill, Inc., 1993.

[64] C. H. Ho, M. N. Ozisik, An inverse radiation problem, International Journal

of Heat and Mass Transfer, 32:335-341, 1989.

[65] N. J. McCormick, Inverse radiative transfer problems : a review, Nuclear

Science and Engineering, 112:185-198, 1992.

[66] H. Ertürk, O. A. Ezekoye and J. R. Howell, Comparison of three regularized

solution techniques in a three-dimensional inverse radiation problem, Journal

of Quantitative Spectroscopy and Radiative Transfer, 73:307-316, 2002.

[67] L. H. Liu and H. P. Tan, Inverse radiation problem in three-dimensional com-

plicated geometric systems with opaque boundaries, Journal of Quantitative

Spectroscopy and Radiative Transfer, 68:559-573, 2001.

[68] C. E. Siewert, An inverse source problem in radiative transfer, Journal of

Quantitative Spectroscopy and Radiative Transfer, 50:603-609, 1993.

[69] T. Viik and N. J. McCormick, Numerical test of an inverse polarized radia-

tive transfer algorithm, Journal of Quantitative Spectroscopy and Radiative

Transfer, 78:235-241, 2003.

[70] M. Prud’homme and S. Jasmin, Determination of a heat source in porous

medium with convective mass diffusion by an inverse method, International

Journal of Heat and Mass Transfer, 46:2065-2075, 2003.


[71] H. M. Park and T. Y. Yoon, Solution of the inverse radiation problem using a

conjugate gradient method, International Journal of Heat and Mass Transfer,

43:1767-1776, 2000.

[72] S. M. H. Sarvari, J. R. Howell, and S. H. Mansouri, Inverse boundary design

conduction-radiation problem in irregular two-dimensional domains, Numeri-

cal Heat Transfer Part B - Fundamentals, 44(3):209-224, 2003.

[73] S. Subramaniam and M. P. Menguc, Solution of the inverse radiation problem

for inhomogeneous and anisotropically scattering media using a Monte Carlo

technique, International Journal of Heat and Mass Transfer, 34:253-266, 1991.

[74] A. N. Brooks and T. J. R. Hughes, Streamline-upwind/Petrov-Galerkin formu-

lation for convection dominated flows with particular emphasis on the incom-

pressible Navier-Stokes equation, Comput. Methods Appl. Mech. Eng., 32:199-

259, 1982.

[75] H. M. Park and M. C. Sung, Sequential solution of a three-dimensional inverse

radiation problem, Compt. Meth. Appl. Mech. Eng., 192:3689-3704, 2003.

[76] S. S. Ravindran, A reduced-order approach for optimal control of fluids using

proper orthogonal decomposition, International Journal for Numerical Meth-

ods in Fluids, 34:425-448, 2000.

[77] H. V. Ly and H. T. Tran, Modeling and control of physical processes us-

ing proper orthogonal decomposition, Mathematical and Computer Modeling,

33:223-236, 2001.

[78] J. Atmadja and A. C. Bagtzoglou, Pollution source identification in heteroge-

neous porous media, Water Resources Research, 37: 2113-2125, 2001.


[79] A. M. Michalak and P. K. Kitanidis, Estimation of historical groundwater con-

taminant distribution using the adjoint state method applied to geostatistical

inverse modeling, Water Resources Research, 40, W08302, 2004.

[80] M. F. Snodgrass and P. K. Kitanidis, A geostatistical approach to contaminant

source identification, Water Resources Research, 33:537-546, 1997.

[81] A. M. Michalak and P. K. Kitanidis, Application of geostatistical inverse mod-

eling to contaminant source identification at Dover AFB, Delaware, Journal

of Hydraulic Research, 42:9-18, 2004.

[82] C. Crainiceanu, D. Ruppert, J.R. Stedinger and C.T. Behr, Improving MCMC

mixing for a GLMM describing pathogen concentrations in water supplies,

Case Studies in Bayesian Statistics, VI, Gatsonis, C., et al (editors), Lecture

Notes in Statistics, 167:207-222, 2002.

[83] C. Crainiceanu, J. Stedinger, D. Ruppert and C. Behr, Modeling the U. S. na-

tional distribution of waterborne pathogen concentrations with application to

cryptosporidium parvum, Water Resources Research, 39(9):1235-1249, 2003.

[84] A. F.D. Loula, F. A. Rochinha and M. A. Murad, Higher-order gradient post-

processing for second-order elliptic problems, Computer Methods in Applied

Mechanics and Engineering, 128:361-381, 1995.

[85] H. Wang, D. Liang, R. E. Ewing, S. L. Lyons and G. Qin, An ELLAM-

MFEM solution technique for compressible fluid flows in porous media with

point sources and sinks, Journal of Computational Physics, 159:344-376, 2000.

[86] A. L.G.A. Coutinho and J. L.D. Alves, Parallel finite element simulation of

miscible displacement in porous media, SPE Journal, 1:487-500, 1996.


[87] S. Balay, K. Buschelman, V. Eijkhout, W.D. Gropp, D. Kaushik, M.G. Kne-

pley, L.C. McInnes, B.F. Smith and H. Zhang, PETSc Users Manual, ANL-

95/11 - Revision 2.1.5, Argonne National Laboratory, 2004.

[88] A.L.G.A. Coutinho, C.M. Dias, J.L.D. Alves, L. Landau, A.F.D. Loula,

S.M.C. Malta, R.G.S. Castro and E.L.M. Garcia, Stabilized methods and

post-processing techniques for miscible displacements, Computer Methods in

Applied Mechanics and Engineering, 193:1421-1436, 2004.

[89] R.G. Sanabria Castro, S.M.C. Malta, A.F.D. Loula and L. Landau, Numerical

analysis of space-time finite element formulations for miscible displacements,

Computational Geosciences, 5:301-330, 2001.

[90] O. Patzold, I. Grants, U. Wunderwald, K. Jenker, A. Croll and G. Gerbeth,

Vertical gradient freeze growth of GaAs with a rotating magnetic field, J.

Crys. Growth, 245:237-246, 2002.

[91] J.K. Roplekar and J.A. Dantzig, A study of solidification with Rotating Mag-

netic field, UMinn report 2000.

[92] V. Galindo, G. Gerbeth, W. Von Ammon, E. Tomzig and J. Virbulis, Crystal

growth melt flow control by means of magnetic fields, Energy Conv. Manage.,

in press.

[93] H.P. Utech and M.C. Flemings, Elimination of Solute Banding in Indium

Antimonide Crystals by Growth in a Magnetic Field, J. App. Phys., 7:2021-

2024, 1966.


[94] H.A. Chedzey and D.T.J. Hurle, Avoidance of Growth-Striae in Semiconductor

and Metal Crystals Grown by Zone-Melting Techniques, Nature, 210:933-934,

1966.

[95] G.M. Oreper and J. Szekely, The effect of a magnetic field on transport phe-

nomena in a Bridgman-Stockbarger crystal growth, J. Crys. Growth, 67:405-

435, 1984

[96] S. Motakef, Magnetic field elimination of convective interference with segre-

gation during vertical Bridgman growth of doped semiconductors, J. Crys.

Growth, 64:550-563, 1990

[97] H. Ben Hadid and B. Roux, Numerical study of convection in the horizontal

Bridgeman configuration under the action of a constant magnetic field. 1. Two

dimensional flow, J. Fluid Mech., 333:23-56, 1997

[98] H. Ben Hadid and B. Roux, Numerical study of convection in the horizontal

Bridgman configuration under the action of a constant magnetic field. 2.

Three dimensional flow, J. Fluid Mech., 333:57-83, 1997.

[99] J. W. Evans, C. D. Seybert, F. Leslie and W. K. Jones, Suppression/reversal

of natural convection by exploiting the temperature/composition dependence

of magnetic susceptibility, J. App. Phys., 88(7):4347-4351, 2000.

[100] J. Huang, D.D. Gray and B.F. Edwards, Thermoconvective instability of para-

magnetic fluids in a nonuniform magnetic field, Phys. Rev E, 57:5564-5571,

1998.


[101] B. Ganapathysubramanian and N. Zabaras, Using magnetic field gradients to

control the directional solidification of alloys and the growth of single crystals,

J. Crys. Growth, submitted for publication.

[102] B. Ganapathysubramanian, N. Zabaras, Control of solidification of non-

conducting materials using tailored magnetic fields, J. Crys. Growth, sub-

mitted for publication.

[103] C. Collet and F. Murtagh, Multiband segmentation based on a hierarchical

Markov model, Pattern Recognition, 37:2337-2347, 2004.

[104] M. S. Crouse, R. D. Nowak and R. G. Baraniuk, Wavelet-based statistical

signal processing using Hidden Markov Models, IEEE Transactions on Signal

Processing, 46:886-902, 1997.

[105] A.A. Grimstad, T. Mannseth, G. Navdal and H. Urkedal, Adaptive multi-

scale permeability estimation, Computational Geosciences, 7(1):1-25, 2003.

[106] S. Dumais, Using SVMs for text categorization, IEEE Intell. Systems, 13:21-

23, 1998.

[107] T. Joachims, Learning to Classify Text using Support Vector Machines,

Kluwer, 2002.

[108] C.M. Crainiceanu, D. Ruppert and R.J. Carroll, Spatially adaptive P-spline

with heteroscedastic errors, submitted to Journal of Computational and

Graphical Statistics.

[109] J. Liu and P. Moulin, Information Theoretic analysis of interscale and in-

trascale dependencies between image wavelet coefficients, IEEE transactions

on image processing, 10(11):1647-1658, 2001.