SITE CHARACTERIZATION & INSTRUMENTATION MODULE 8
Dr. P. Anbazhagan Page 1
Module 8: Numerical methods
Topics:
Introduction
Kriging
Artificial neural networks (ANN)
Triangulation with linear interpolation
Natural neighbour
Inverse distance
Minimum curvature
Regression by plane with weights
Radial basis functions
Keywords: Kriging, variogram, ANN, Interpolation methods
8.1 Introduction:
Surface interpolation and construction of maps have been traditionally used in many
fields such as physics, geophysics, geology, geodesy, hydrology, meteorology and so
on. The goal of this module is to present commonly used techniques for solving
interpolation/ approximation problems and to evaluate their applicability for solving
practical tasks in site characterization. The interpolation/approximation methods
presented below are:
Kriging
Artificial neural networks (ANN)
Triangulation with linear interpolation
Natural neighbour
Inverse distance
Minimum curvature
Regression by plane with weights
Radial basis functions
8.2 Kriging method
8.2.1 Introduction:
Geo-statistics is a scientific approach to estimation problems in geology and mining. It
is a branch of statistics dealing with spatial phenomena modelled by random
functions.
Today, geo-statistics is no longer restricted to this kind of application. It is applied in
disciplines such as hydrology, meteorology, oceanography, geography, forestry,
environmental monitoring, landscape ecology and agriculture, and in the study of
ecosystem geography and dynamics.
Underlying each geo-statistical method is the notion of random function. A random
function describes a given spatial phenomenon over a domain. It consists of a set of
random variables, each of which describes the phenomenon at some location of the
domain.
In most geo-statistical methods, the dependencies between the random variables are
preferably described by a variogram. The variogram depicts the variance of the
increments of the quantity of interest as a function of the distance between sites.
By far, kriging is the most popular geo-statistical method. The aim of kriging is to
predict the phenomenon at unobserved sites. This is the problem of spatial estimation,
sometimes called spatial prediction. Examples of spatial phenomena estimations are
soil nutrient or pollutant concentrations over a field observed on a survey grid,
hydrologic variables over an aquifer observed at well locations, and air quality
measurements over an air basin observed at monitoring sites.
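As a concrete illustration of the variogram idea, a sample semi-variogram can be estimated by averaging half the squared differences of all observation pairs whose separation falls in each lag bin. A minimal Python sketch, with made-up coordinates and values:

```python
# Sketch: estimating a sample (semi-)variogram from scattered observations.
# The value for a lag bin is half the mean squared difference between all
# pairs of observations whose separation falls in that bin.
# Illustrative only; the coordinates and values below are made up.
from itertools import combinations
import math

def sample_variogram(points, values, bin_width, n_bins):
    """points: list of (x, y); values: observed quantity at each point."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for (i, j) in combinations(range(len(points)), 2):
        (x1, y1), (x2, y2) = points[i], points[j]
        h = math.hypot(x2 - x1, y2 - y1)          # pair separation (lag)
        b = int(h // bin_width)
        if b < n_bins:
            sums[b] += 0.5 * (values[i] - values[j]) ** 2
            counts[b] += 1
    return [(s / c if c else None) for s, c in zip(sums, counts)]

pts = [(0, 0), (1, 0), (0, 1), (2, 2), (3, 1)]
vals = [1.0, 1.2, 0.9, 2.0, 2.3]
gamma = sample_variogram(pts, vals, bin_width=1.5, n_bins=3)
```

Each returned entry is the average semi-variance for one lag bin; `None` marks bins that received no pairs.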
8.2.2 Kriging:
In the real world, it is impossible to obtain exhaustive values of data at every desired
point because of practical constraints. Thus, interpolation is important and
fundamental to graphing, analysing and understanding 2D data.
Interpolation is the estimation of a variable at an unmeasured location from observed
values at surrounding locations.
The word "kriging" is synonymous with "optimal prediction". It is a method of
interpolation, which predicts unknown values from data observed at known locations.
This method uses the variogram to express the spatial variation, and it minimizes the
error of the predicted values, which are estimated from the spatial distribution of the
observed values.
Kriging is optimal interpolation based on regression against observed z values of
surrounding data points, weighted according to spatial covariance values.
The term kriging was coined by Matheron in honor of D.G. Krige, who published an
early account of this technique. In its simplest form, a kriging estimate of the field at
an unobserved location is an optimized linear combination of the data at observed
locations.
A full application of a kriging method involves different steps:
1. First, a structural analysis is performed: usual statistical tools, such as
histograms and empirical cumulative distributions, can be used in conjunction with an
analysis of the sample variogram.
2. In place of the sample variogram, which does not respect suitable mathematical
properties, a theoretical variogram is chosen. The fitting of the theoretical
variogram model to the sample variogram, informed by the structural analysis, is
performed.
3. Finally, from this variogram specification, the kriging estimate is computed at the
location of interest by solving a system of linear equations of the least squares
type.
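Step 2 above, fitting a theoretical variogram to the sample variogram, can be sketched as a crude grid search over the parameters of a spherical model. The model choice, candidate grids and lag/semi-variance values below are illustrative assumptions, not data from the module:

```python
# Sketch: fitting a spherical variogram model to a sample variogram by a
# simple grid search over sill and range. The lag/gamma pairs are made up.
def spherical(h, sill, a):
    # Spherical model: rises as 1.5r - 0.5r^3 up to range a, then flat at sill.
    if h >= a:
        return sill
    r = h / a
    return sill * (1.5 * r - 0.5 * r ** 3)

def fit_spherical(lags, gammas, sills, ranges):
    """Return the (sill, range) pair minimizing the sum of squared residuals."""
    best, best_err = None, float("inf")
    for s in sills:
        for a in ranges:
            err = sum((spherical(h, s, a) - g) ** 2
                      for h, g in zip(lags, gammas))
            if err < best_err:
                best, best_err = (s, a), err
    return best

lags = [0.5, 1.0, 2.0, 3.0, 4.0]
gammas = [0.3, 0.55, 0.9, 1.0, 1.0]   # levels off near a sill of 1.0
sill, rng = fit_spherical(lags, gammas,
                          sills=[0.8, 0.9, 1.0, 1.1],
                          ranges=[2.0, 2.5, 3.0, 3.5])
```

In practice the fit is usually done interactively or by weighted least squares, but the grid search conveys the idea.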
8.2.3 Advantages of kriging:
1. Helps to compensate for the effects of data clustering, assigning individual points
within a cluster less weight than isolated data points (or, treating clusters more like
single points)
2. Gives estimate of estimation error (kriging variance), along with estimate of the
variable, Z, itself (but error map is basically a scaled version of a map of distance to
nearest data point, so not that unique)
3. Availability of estimation error provides basis for stochastic simulation of possible
realizations of Z(u).
8.2.4 Kriging Approach and Terminology:
All kriging estimators are but variants of the basic linear regression estimator Z*(u)
defined as
Z*(u) − m(u) = Σ_{α=1}^{n(u)} λ_α(u) [Z(u_α) − m(u_α)]
with
u, u_α: location vectors for the estimation point and one of the neighbouring data
points, indexed by α;
n(u): number of data points in the local neighbourhood used for estimation of Z*(u);
m(u), m(u_α): expected values (means) of Z(u) and Z(u_α);
λ_α(u): kriging weight assigned to datum Z(u_α) for estimation location u; the same
datum will receive a different weight for a different estimation location.
Z(u) is treated as a random field with a trend component, m(u), and a residual
component, R(u) = Z(u) − m(u). Kriging estimates the residual at u as a weighted sum
of residuals at surrounding data points. The kriging weights, λ_α, are derived from the
covariance function or semi-variogram, which should characterize the residual
component. The distinction between trend and residual is somewhat arbitrary; it
varies with scale.
Basics of kriging:
The basic form of the kriging estimator is:
Z*(u) − m(u) = Σ_{α=1}^{n(u)} λ_α(u) [Z(u_α) − m(u_α)]
The goal is to determine the weights λ_α that minimize the variance of the estimator,
σ_E²(u) = Var{Z*(u) − Z(u)}
under the unbiasedness constraint E{Z*(u) − Z(u)} = 0.
The random field (RF) Z(u) is decomposed into residual and trend components, Z(u)
= R(u) + m(u), with the residual component treated as an RF with a stationary mean
of 0 and a stationary covariance (a function of lag, h, but not of position, u):
E{R(u)} = 0
Cov{R(u), R(u + h)} = E{R(u) · R(u + h)} = C_R(h)
The residual covariance function is generally derived from the input semi-variogram
model, C_R(h) = C_R(0) − γ(h) = Sill − γ(h).
Thus, the semi-variogram we feed to a kriging program should represent the residual
component of the variable. The three main kriging variants,
1. Simple,
2. Ordinary, and
3. Kriging with a trend,
differ in their treatment of the trend component, m(u).
Simple kriging:
For simple kriging, we assume that the trend component is a constant and known
mean, m(u) = m, so that
Z*_SK(u) = m + Σ_{α=1}^{n(u)} λ_α(u) [Z(u_α) − m]
This estimate is automatically unbiased, since E[Z(u_α) − m] = 0, so that E[Z*_SK(u)] =
m = E[Z(u)]. The estimation error Z*_SK(u) − Z(u) is a linear combination of random
variables representing residuals at the data points, u_α, and the estimation point, u:
Z*_SK(u) − Z(u) = [Z*_SK(u) − m] − [Z(u) − m]
= Σ_{α=1}^{n(u)} λ_α(u) R(u_α) − R(u) = R*_SK(u) − R(u)
Using rules for the variance of a linear combination of random variables, the error
variance is then given by
σ_E²(u) = Var{R*_SK(u)} + Var{R(u)} − 2 Cov{R*_SK(u), R(u)}
= Σ_{α=1}^{n(u)} Σ_{β=1}^{n(u)} λ_α(u) λ_β(u) C_R(u_α − u_β) + C_R(0) − 2 Σ_{α=1}^{n(u)} λ_α(u) C_R(u_α − u)
To minimize the error variance, we take the derivative of the above expression with
respect to each of the kriging weights and set each derivative to zero. This leads to the
following system of equations:
Σ_{β=1}^{n(u)} λ_β(u) C_R(u_α − u_β) = C_R(u_α − u),   α = 1, 2, …, n(u)
Because of the constant mean, the covariance function for Z(u) is the same as that for
the residual component, C(h) = C_R(h), so that we can write the simple kriging system
directly in terms of C(h):
Σ_{β=1}^{n(u)} λ_β(u) C(u_α − u_β) = C(u_α − u),   α = 1, 2, …, n(u)
This can be written in matrix form as
K λ_SK(u) = k
where K is the matrix of covariances between data points, with elements K_{i,j} = C(u_i −
u_j), k is the vector of covariances between the data points and the estimation point,
with elements given by k_i = C(u_i − u), and λ_SK(u) is the vector of simple kriging
weights for the surrounding data points. If the covariance model is licit (meaning the
underlying semi-variogram model is licit) and no two data points are co-located, then
the data covariance matrix is positive definite and we can solve for the kriging
weights using
λ_SK = K⁻¹ k
Once we have the kriging weights, we can compute both the kriging estimate and the
kriging variance, which is given by
σ_SK²(u) = C(0) − λ_SK^T(u) k = C(0) − Σ_{α=1}^{n(u)} λ_α(u) C(u_α − u)
after substituting the kriging weights into the error variance expression above. All this
math finds a set of weights for estimating the variable value at the location u from
values at a set of neighbouring data points. The weight on each data point generally
decreases with increasing distance to that point, in accordance with the decreasing
data-to-estimation covariances specified in the right-hand vector, k. However, the set
of weights is also designed to account for redundancy among the data points,
represented in the data point-to-data point covariances in the matrix K. Multiplying k
by K⁻¹
(on the left) will downweight points falling in clusters relative to isolated
points at the same distance.
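The procedure just described can be sketched end to end in Python: build K from data-to-data covariances, build k from data-to-estimation covariances, solve for the weights, then form the estimate and variance. The exponential covariance model C(h) = sill·exp(−3h/a), the coordinates, values and mean are illustrative assumptions, and a tiny Gaussian-elimination routine stands in for a linear-algebra library:

```python
# Sketch of simple kriging in the plane, assuming a known constant mean m
# and an exponential covariance model. All numbers are illustrative.
import math

def cov(p, q, sill=1.0, a=10.0):
    # Exponential covariance: C(h) = sill * exp(-3h / a), with range a.
    h = math.hypot(p[0] - q[0], p[1] - q[1])
    return sill * math.exp(-3.0 * h / a)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def simple_kriging(points, z, m, u):
    K = [[cov(p, q) for q in points] for p in points]   # data-to-data covariances
    k = [cov(p, u) for p in points]                     # data-to-estimation covariances
    lam = solve(K, k)                                   # lambda = K^-1 k
    est = m + sum(l * (zi - m) for l, zi in zip(lam, z))
    var = cov(u, u) - sum(l * ki for l, ki in zip(lam, k))  # C(0) - lambda^T k
    return est, var

pts = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
z = [2.0, 3.0, 2.5]
est, var = simple_kriging(pts, z, m=2.5, u=(2.0, 2.0))
```

Note that the kriging variance depends only on the data configuration and the covariance model, not on the observed values themselves.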
Ordinary kriging:
For ordinary kriging, rather than assuming that the mean is constant over the entire
domain, we assume that it is constant in the local neighbourhood of each estimation
point, that is, that m(u_α) = m(u) for each nearby data value Z(u_α) that we are using to
estimate Z(u). In this case, the kriging estimator can be written
Z*(u) = m(u) + Σ_{α=1}^{n(u)} λ_α(u) [Z(u_α) − m(u)]
= Σ_{α=1}^{n(u)} λ_α(u) Z(u_α) + [1 − Σ_{α=1}^{n(u)} λ_α(u)] m(u)
and we filter the unknown local mean by requiring that the kriging weights sum to 1,
leading to an ordinary kriging estimator of
Z*_OK(u) = Σ_{α=1}^{n(u)} λ_α^OK(u) Z(u_α),  with  Σ_{α=1}^{n(u)} λ_α^OK(u) = 1
In order to minimize the error variance subject to the unit-sum constraint on the
weights, we actually set up the system to minimize the error variance plus an additional
term involving a Lagrange parameter, μ_OK(u):
L = σ_E²(u) + 2 μ_OK(u) [1 − Σ_{α=1}^{n(u)} λ_α(u)]
so that minimization with respect to the Lagrange parameter forces the constraint to
be obeyed:
(1/2) ∂L/∂μ_OK(u) = 1 − Σ_{α=1}^{n(u)} λ_α(u) = 0
In this case, the system of equations for the kriging weights turns out to be
Σ_{β=1}^{n(u)} λ_β^OK(u) C_R(u_α − u_β) + μ_OK(u) = C_R(u_α − u),   α = 1, …, n(u)
Σ_{β=1}^{n(u)} λ_β^OK(u) = 1
Where, CR(h) is once again the covariance function for the residual component of the
variable. In simple kriging, we could equate CR(h) and C(h), the covariance function
for the variable itself, due to the assumption of a constant mean. That equality does
not hold here, but in practice the substitution is often made anyway, on the
assumption that the semi-variogram, from which C(h) is derived, effectively filters the
influence of large-scale trends in the mean.
In fact, the unit-sum constraint on the weights allows the ordinary kriging system to
be stated directly in terms of the semi-variogram (in place of the CR(h) values above).
In a sense, ordinary kriging is the interpolation approach that follows naturally from a
Semi-variogram analysis, since both tools tend to filter trends in the mean.
Once the kriging weights (and Lagrange parameter) are obtained, the ordinary kriging
error variance is given by
σ_OK²(u) = C(0) − Σ_{α=1}^{n(u)} λ_α^OK(u) C(u_α − u) − μ_OK(u)
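For just two data points, the ordinary kriging system (the two weight equations plus the unit-sum constraint and Lagrange parameter) can be solved in closed form, which makes a compact sketch. The covariance model and all numbers are illustrative assumptions:

```python
# Sketch: ordinary kriging with two data points on a line, where the
# augmented system can be solved in closed form. Numbers are illustrative.
import math

def cov(h, sill=1.0, a=8.0):
    return sill * math.exp(-3.0 * h / a)   # exponential covariance model

def ok_two_points(x1, z1, x2, z2, x0):
    c0 = cov(0.0)
    c12 = cov(abs(x2 - x1))                 # data-to-data covariance
    k1, k2 = cov(abs(x1 - x0)), cov(abs(x2 - x0))
    # Subtracting the two weight equations and using lam1 + lam2 = 1 gives:
    lam1 = 0.5 + (k1 - k2) / (2.0 * (c0 - c12))
    lam2 = 1.0 - lam1
    mu = k1 - lam1 * c0 - lam2 * c12        # Lagrange parameter
    est = lam1 * z1 + lam2 * z2             # weights applied to data directly
    var = c0 - (lam1 * k1 + lam2 * k2) - mu
    return est, var, (lam1, lam2, mu)

est, var, (l1, l2, mu) = ok_two_points(0.0, 10.0, 6.0, 14.0, 2.0)
```

Because the weights sum to one, the estimate stays between the two data values, and no mean needs to be supplied, unlike in simple kriging.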
Kriging with a Trend:
Kriging with a trend (the method formerly known as universal kriging) is much like
ordinary kriging, except that instead of fitting just a local mean in the neighbourhood
of the estimation point, we fit a linear or higher-order trend in the (x,y) coordinates of
the data points. A local linear (a.k.a., first-order) trend model would be given by
m(u) = m(x, y) = a0 + a1x + a2y
Including such a model in the kriging system involves the same kind of extension as
we used for ordinary kriging, with the addition of two more Lagrange parameters and
two extra columns and rows in the K matrix whose (non-zero) elements are the x and
y coordinates of the data points. Higher-order trends (quadratic, cubic) could be
handled in the same way, but in practice it is rare to use anything higher than a first-
order trend. Ordinary kriging is kriging with a zeroth-order trend model.
If the variable of interest does exhibit a significant trend, a typical approach would be
to attempt to estimate a de-trended semi-variogram using one of the methods
described in the Semi-variogram lecture and then feed this into kriging with a first
order trend. However, Goovaerts (1997) warns against this approach and instead
recommends performing simple kriging of the residuals from a global trend (with a
constant mean of 0) and then adding the kriged residuals back into the global trend.
Co-kriging:
It is kriging using information from one or more correlated secondary variables, or
multivariate kriging in general. It requires the development of models for the cross-
covariance, i.e. the covariance between two different variables as a function of lag.
Indicator Kriging:
It is the kriging of indicator variables, which represent membership in a set of
categories. It is used with naturally categorical variables like facies, or continuous
variables that have been thresholded into categories (e.g., quartiles, deciles). It is
especially useful for preserving the connectedness of high- and low-permeability
regions.
8.3 Artificial Neural Networks
8.3.1 Introduction:
Artificial neural networks (ANNs) are a form of artificial intelligence, which attempts
to mimic the function of the human brain and nervous system. ANNs learn from data
examples presented to them in order to capture the subtle functional relationships
among the data even if the underlying relationships are unknown or the physical
meaning is difficult to explain. ANNs are thus well suited to modelling the complex
behaviour of most geotechnical engineering materials, which by their very nature,
exhibit extreme variability.
Geotechnical properties of soils are controlled by factors such as mineralogy, fabric
and pore water, and the interactions of these factors are difficult to establish solely by
traditional statistical methods due to their interdependence. Based on the application
of ANNs, methodologies have been developed for estimating several soil properties
including the pre-consolidation pressure, shear strength and stress history, compaction
and permeability, soil classification and soil density.
Liquefaction during earthquakes and the problem of settlement of shallow
foundations can also be addressed using ANN methods.
8.3.2 Overview of Artificial Neural Network:
ANNs consist of a number of artificial neurons variously known as processing
elements (PEs), nodes or units. For multilayer perceptrons (MLPs), which are the
most commonly used ANNs in geotechnical engineering, processing elements are
usually arranged in layers:
1) An input layer,
2) An output layer and
3) One or more intermediate layers called hidden layers.
Figure 8.1 shows a typical multi-layer ANN arrangement.
Each processing element in a specific layer is fully or partially connected to many
other processing elements via weighted connections. The scalar weights determine the
strength of the connections between interconnected neurons.
A zero weight refers to no connection between two neurons and a negative weight
refers to a prohibitive relationship. An individual processing element receives
weighted inputs from many other processing elements; these are summed, and a
bias unit or threshold is added or subtracted.
Figure 8.1: A typical multi-layer ANN showing the input layer for ten different inputs,
the middle or hidden layer(s), and the output layer having three outputs
The bias unit is used to scale the input to a useful range to improve the convergence
properties of the neural network. The result of this combined summation is passed
through a transfer function (e.g. logistic sigmoid or hyperbolic tangent) to produce the
output of the processing element.
Figure 8.2 shows an ANN with a hidden layer.
For node j, this process is summarized as:
I_j = θ_j + Σ_{i=0}^{n} w_ji x_i
y_j = f(I_j)
where,
I_j = the activation level of node j;
w_ji = the connection weight between nodes j and i;
x_i = the input from node i, i = 0, 1, …, n;
θ_j = the bias or threshold for node j;
y_j = the output of node j; and
f(·) = the transfer function.
Figure 8.2: ANN with a hidden layer
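The two node equations can be sketched directly, here with a logistic sigmoid as the transfer function f; the weights, inputs and bias are arbitrary illustrative numbers:

```python
# Sketch of a single processing element: weighted sum of inputs plus bias,
# passed through a logistic sigmoid transfer function.
import math

def node_output(weights, inputs, bias):
    activation = bias + sum(w * x for w, x in zip(weights, inputs))  # I_j
    return 1.0 / (1.0 + math.exp(-activation))                      # y_j = f(I_j)

y = node_output(weights=[0.4, -0.2, 0.1], inputs=[1.0, 2.0, 3.0], bias=0.05)
```

The sigmoid squashes the activation into (0, 1), which is why output data are usually scaled to the range of the transfer function.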
The propagation of information in MLPs starts at the input layer where the input data
are presented. The inputs are weighted and received by each node in the next layer.
The weighted inputs are then summed and passed through a transfer function to
produce the nodal output, which is weighted and passed to processing elements in the
next layer. The network adjusts its weights on presentation of a set of training data
and uses a learning rule until it can find a set of weights that will produce the input-
output mapping that has the smallest possible error. The above process is known as
learning or training.
Learning in ANNs is usually divided into supervised and unsupervised. In supervised
learning, the network is presented with a historical set of model inputs and the
corresponding (desired) outputs. The actual output of the network is compared with
the desired output and an error is calculated. This error is used to adjust the
connection weights between the model inputs and outputs to reduce the error between
the historical outputs and those predicted by the ANN.
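One step of this supervised error-correction idea can be sketched for a single linear node: the error between desired and actual output adjusts each weight in proportion to its input (a delta-rule update; the learning rate and data are illustrative assumptions):

```python
# Sketch of supervised learning on a single linear node via the delta rule:
# compute the output, compare with the desired output, and nudge each weight
# in proportion to its input and the error. Data and rate are illustrative.
def delta_rule_step(weights, inputs, desired, rate=0.1):
    actual = sum(w * x for w, x in zip(weights, inputs))
    error = desired - actual
    new_weights = [w + rate * error * x for w, x in zip(weights, inputs)]
    return new_weights, error

w = [0.0, 0.0]
for _ in range(50):                     # repeated presentations shrink the error
    w, err = delta_rule_step(w, inputs=[1.0, 2.0], desired=3.0)
```

After repeated presentations the error approaches zero and the weights reproduce the desired input-output mapping.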
In unsupervised learning, the network is only presented with the input stimuli and
there are no desired outputs. The network itself adjusts the connection weights
according to the input values. The idea of training in unsupervised networks is to
cluster the input records into classes of similar features.
ANNs can be categorized on the basis of two major criteria:
1) The learning rule used, and
2) The connections between processing elements.
Based on learning rules, ANNs, as mentioned above, can be divided into supervised
and unsupervised networks. Based on connections between processing elements,
ANNs can be divided into feed-forward and feedback networks. In feed forward
networks, the connections between the processing elements are in the forward
direction only, whereas, in feedback networks, connections between processing
elements are in both the forward and backward directions.
8.3.3 Modelling issues in Artificial Neural Networks:
In order to improve performance, ANN models need to be developed in a systematic
manner. Such an approach needs to address major factors such as the determination of
adequate model inputs, data division and pre-processing, the choice of suitable
network architecture, careful selection of some internal parameters that control the
optimization method, the stopping criteria and model validation. These factors are
explained and discussed below.
Determination of model inputs:
An important step in developing ANN models is to select the model input variables
that have the most significant impact on model performance. A good subset of input
variables can substantially improve model performance. Presenting as large a number
of input variables as possible to ANN models usually increases network size, resulting
in a decrease in processing speed and a reduction in the efficiency of the network.
A number of techniques have been suggested to assist with the selection of input
variables. An approach that is usually utilized in the field of geotechnical engineering
is that appropriate input variables can be selected in advance based on a priori
knowledge. Another approach used is to train many neural networks with different
combinations of input variables and to select the network that has the best
performance.
A step-wise technique described by Maier and Dandy can also be used in which
separate networks are trained, each using only one of the available variables as model
inputs. The network that performs the best is then retained, combining the variable
that results in the best performance with each of the remaining variables. This process
is repeated for an increasing number of input variables, until the addition of additional
variables results in no improvement in model performance.
Another useful approach is to employ a genetic algorithm to search for the best sets of
input variables. For each possible set of input variables chosen by the genetic
algorithm, a neural network is trained and used to rank different subsets of possible
inputs. A set of input variables derives its fitness from the model error obtained based
on those variables.
A potential shortcoming of the above approaches is that they are model-based. In
other words, the determination as to whether a parameter input is significant or not is
dependent on the error of a trained model, which is not only a function of the inputs,
but also model structure and calibration. This can potentially obscure the impact of
different model inputs.
In order to overcome this limitation, model-free approaches can be utilized, which use
linear dependence measures, such as correlation, or non-linear measures of
dependence, such as mutual information, to obtain the significant model inputs prior
to developing the ANN models.
Division of data:
ANNs perform best when they do not extrapolate beyond the range of the data used
for calibration. Therefore, the purpose of ANNs is to non-linearly interpolate
(generalize) in high-dimensional space between the data used for calibration. ANN
models generally have a large number of model parameters (connection weights) and
can therefore over-fit the training data.
In other words, if the number of degrees of freedom of the model is large compared
with the number of data points used for calibration, the model might no longer fit the
general trend, as desired. Consequently, a separate validation set is needed to ensure
that the model can generalize within the range of the data used for calibration. It is
common practice to divide the available data into two subsets: a training set, to
construct the neural network model, and an independent validation set to estimate the
model performance in a deployed environment.
Usually, two-thirds of the data are suggested for model training and one-third for
validation. A modification of the above data division method is cross-validation in
which the data are divided into three sets: training, testing and validation. The training
set is used to adjust the connection weights, whereas the testing set is used to check
the performance of the model at various stages of training and to determine when to
stop training to avoid over-fitting. The validation set is used to estimate the
performance of the trained network in the deployed environment.
In many situations, the available data are so limited that they need to be devoted
solely to model training, and collecting any more data for validation is difficult. In this
situation, the leave-k-out method can be used, which involves holding back a small
fraction of the data for validation and using the rest of the data for training. After
training, the performance of the trained network is estimated with the aid of the
validation set. A different small subset of data is then held back, and the network is
trained and tested again. This process is repeated many times with different subsets
until an optimal model can be obtained from the use of all of the available data.
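The leave-k-out procedure can be sketched as follows; to keep the example self-contained, the "model" here is just the training-set mean, a stand-in for a trained network:

```python
# Sketch of leave-k-out validation: repeatedly hold back k records, fit on
# the rest, score on the held-out part, and average the errors over all
# folds. The trivial mean "model" and data are purely illustrative.
def leave_k_out_error(data, k):
    errors = []
    for start in range(0, len(data), k):
        held_out = data[start:start + k]
        training = data[:start] + data[start + k:]
        model = sum(training) / len(training)          # "train" a trivial model
        errors += [abs(v - model) for v in held_out]   # validate on held-out part
    return sum(errors) / len(errors)

data = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
mae = leave_k_out_error(data, k=2)
```

Every record serves in validation exactly once, so all of the available data contribute to both training and performance estimation.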
In the majority of ANN applications in geotechnical engineering, the data are divided
into their subsets on an arbitrary basis. However, recent studies have found that the
way the data are divided can have a significant impact on the results obtained. As
ANNs have difficulty extrapolating beyond the range of the data used for calibration,
in order to develop the best ANN model, given the available data, all of the patterns
that are contained in the data need to be included in the calibration set.
Data Pre-processing:
Once the available data have been divided into their subsets (i.e. training, testing and
validation), it is important to pre-process the data in a suitable form before they are
applied to the ANN. Data pre-processing is necessary to ensure all variables receive
equal attention during the training process.
Pre-processing can be in the form of data scaling, normalization and transformation.
Scaling the output data is essential, as they have to be commensurate with the limits of
the transfer functions used in the output layer. Scaling the input data is not necessary
but it is almost always recommended. In some cases, the input data need to be
normally distributed in order to obtain optimal results.
Transforming the input data into some known forms may be helpful to improve ANN
performance. However, empirical trials showed that the model fits were the same,
regardless of whether raw or transformed data were used.
Determination of Model Architecture:
Determining the network architecture is one of the most important and difficult tasks
in ANN model development. It requires the selection of the optimum number of
layers and the number of nodes in each of these. For MLPs, there are always two
layers representing the input and output variables in any neural network. It has been
shown that one hidden layer is sufficient to approximate any continuous function
provided that sufficient connection weights are given.
Despite some contradictory findings, Lapedes and Farber (1988) provided a more
practical argument that two hidden layers are sufficient: the first hidden layer is used
to extract the local features of the input patterns while the second hidden layer is
useful to extract the global features of the training patterns. However, Masters (1993)
stated that using
more than one hidden layer often slows the training process dramatically and
increases the chance of getting trapped in local minima.
The number of nodes in the input and output layers is restricted by the number of
model inputs and outputs, respectively. It has been shown in the literature that neural
networks with a large number of free parameters (connection weights) are more
subject to over-fitting and poor generalization. Consequently, keeping the number of
hidden nodes to a minimum, provided that satisfactory performance is achieved, is
always better, as it:
o Reduces the computational time needed for training;
o Helps the network achieve better generalization performance;
o Helps avoid the problem of over-fitting and
o Allows the trained network to be analysed more easily.
For single hidden layer networks, there are a number of rules-of-thumb to obtain the
best number of hidden layer nodes. Hecht-Nielsen and Caudill suggested that the
upper limit of the number of hidden nodes in a single layer network may be taken as
(2I+1), where I is the number of inputs. The best approach found by Nawari et al.
(1999) was to start with a small number of nodes and to slightly increase the number
until no significant improvement in model performance is achieved.
For networks with two hidden layers, the geometric pyramid rule described by Nawari
et al. (1999) can be used. The notion behind this method is that the number of nodes
in each layer follows a geometric progression of a pyramid shape, in which the
number of nodes decreases from the input layer towards the output layer. Kudrycki
found empirically that the optimum ratio of the first to second hidden layer nodes is
3:1, even for high dimensional inputs.
Another way of determining the optimal number of hidden nodes that can result in
good model generalization and avoid over-fitting is to relate the number of hidden
nodes to the number of available training samples (Maier and Dandy, 2000).
A number of systematic approaches have also been proposed to obtain automatically
the optimal network architecture. The adaptive method of architecture determination
is an example of the automatic methods for obtaining the optimal network architecture
that suggests starting with an arbitrary, but small, number of nodes in the hidden
layers.
During training, and as the network approaches its capacity, new nodes are added to
the hidden layers, and new connection weights are generated. Training is continued
immediately after the new hidden nodes are added to allow the new connection
weights to acquire the portion of the knowledge base, which was not stored in the old
connection weights. The above steps are repeated and new hidden nodes are added as
needed to the end of the training process, in which the appropriate network
architecture is automatically determined.
Model Optimization (Training):
As mentioned previously, the process of optimizing the connection weights is known
as training or learning. The aim is to find a global solution to what is typically a
highly non-linear optimization problem. The method most commonly used for finding
the optimum weight combination of feed-forward MLP neural networks is the back-
propagation algorithm which is based on first-order gradient descent.
The use of global optimization methods, such as simulated annealing and genetic
algorithms, have also been proposed. The advantage of these methods is that they
have the ability to escape local minima in the error surface and, thus, produce optimal
or near optimal solutions. However, they also have a slow convergence rate.
Ultimately, the model performance criteria, which are problem specific, will dictate
which training algorithm is most appropriate.
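The core of first-order gradient descent, as used in back-propagation, can be sketched for a single sigmoid node with squared error; the data, learning rate and epoch count are illustrative assumptions, and a real MLP applies the same chain-rule update layer by layer:

```python
# Sketch: first-order gradient descent on a single sigmoid node with
# squared error, the building block of back-propagation. Numbers are
# illustrative; the node learns a simple step-like mapping.
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def train_node(samples, epochs=2000, rate=0.5):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in samples:
            y = sigmoid(w * x + b)
            grad = (y - target) * y * (1.0 - y)   # dE/da for E = (y - t)^2 / 2
            w -= rate * grad * x                  # gradient-descent updates
            b -= rate * grad
    return w, b

samples = [(-2.0, 0.0), (-1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
w, b = train_node(samples)
```

Because the updates follow the local gradient only, the procedure can stall in a local minimum, which is the motivation for the global methods (simulated annealing, genetic algorithms) mentioned above.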
Stopping Criteria:
Stopping criteria are used to decide when to stop the training process. They determine
whether the model has been optimally or sub-optimally trained. Training can be
stopped: after the presentation of a fixed number of training records; when the
training error reaches a sufficiently small value; or when no or slight changes in the
training error occur. However, the above examples of stopping criteria may lead to the
model stopping prematurely or over-training.
The cross-validation technique is an approach that can be used to overcome such
problems. It is considered to be the most valuable tool to ensure over-fitting does not
occur (Smith 1993). A number of stopping criteria can also be used. Unlike cross-
validation, these stopping criteria require the data to be divided into only two sets: a
training set, to construct the model, and an independent validation set, to test the
validity of the model in the deployed environment.
The basic notion of these stopping criteria is that model performance should balance
model complexity with the amount of training data and model error.
Model Validation:
Once the training phase of the model has been successfully accomplished, the
performance of the trained model should be validated. The purpose of the model
validation phase is to ensure that the model has the ability to generalize within the
limits set by the training data in a robust fashion, rather than simply having
memorized the input-output relationships that are contained in the training data.
The approach that is generally adopted is to test the performance of trained ANNs on
an independent validation set, which has not been used as part of the model building
process. If such performance is adequate, the model is deemed to be able to generalize
and is considered to be robust.
The coefficient of correlation, r, the root mean squared error, RMSE, and the mean
absolute error, MAE, are the main criteria that are often used to evaluate the
prediction performance of ANN models. The coefficient of correlation is a measure
that is used to determine the relative correlation and the goodness-of-fit between the
predicted and observed data. Smith (1986) suggested the following guide for values of
r between 0.0 and 1.0:
a. r ≥ 0.8: a strong correlation exists between the two sets of variables;
b. 0.2 < r < 0.8: a correlation exists between the two sets of variables; and
c. r ≤ 0.2: a weak correlation exists between the two sets of variables.
The RMSE is the most popular measure of error and has the advantage that large
errors receive much greater attention than small errors. In contrast with RMSE, MAE
eliminates the emphasis given to large errors. Both RMSE and MAE are desirable
when the evaluated output data are smooth or continuous.
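All three criteria can be computed directly from paired observed and predicted series; the following sketch (assumed helper name, not from the text) implements the standard definitions:

```python
import math

def metrics(observed, predicted):
    """Coefficient of correlation r, RMSE and MAE between two series."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    r = cov / (so * sp)
    rmse = math.sqrt(sum((o - p) ** 2
                         for o, p in zip(observed, predicted)) / n)
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    return r, rmse, mae

obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
r, rmse, mae = metrics(obs, pred)  # r near 1: strong correlation
```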
Despite the success of ANNs in geotechnical engineering and other disciplines, they
suffer from some shortcomings that need further attention in the future including
model transparency and knowledge extraction, extrapolation and uncertainty.
Together, improvements in these issues will greatly enhance the usefulness of ANN
models with respect to geotechnical engineering applications.
8.4 Triangulation with linear interpolation
The method of triangulation with linear interpolation is historically one of the first methods, used before the intensive development of computers. It is based on the division of the domain D into triangles. Each triangle then defines, by its three vertices, a plane, which is why the resulting surface is piecewise linear.
8.4.1 Advantages:
Very fast algorithm
Resulting surface is interpolative
8.4.2 Disadvantages:
The domain of the function f is limited to the convex envelope of the points XYZ.
The resulting surface is not smooth and the iso-lines consist of line segments.
The division into triangles may be ambiguous, as the following simple example of two alternative divisions of a rectangle shows: in the first case a valley is created, in the second case a ridge.
8.4.3 Application:
This method is still used in geodesy and in digital terrain models. As a rule, characteristic points of the terrain are measured: the person performing the terrain survey measures only points where the slope of the terrain changes (tops, edges, valleys and so on) and thus avoids the above-mentioned ambiguity. For the interpretation of such data, the Triangulation with linear interpolation method is quite suitable.
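The per-triangle evaluation described above can be sketched as follows: the plane z = ax + by + c through the three vertices is determined and evaluated at (x, y). This is a minimal illustration (hypothetical helper name), assuming the triangle containing (x, y) has already been located:

```python
def plane_interpolate(p1, p2, p3, x, y):
    """Evaluate, at (x, y), the plane z = a*x + b*y + c defined by the
    three (X, Y, Z) vertices of a triangle -- the per-triangle surface
    used by triangulation with linear interpolation."""
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = p1, p2, p3
    # solve the 2x2 system for the plane gradient (a, b) by Cramer's rule
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    a = ((z2 - z1) * (y3 - y1) - (z3 - z1) * (y2 - y1)) / det
    b = ((x2 - x1) * (z3 - z1) - (x3 - x1) * (z2 - z1)) / det
    c = z1 - a * x1 - b * y1
    return a * x + b * y + c

# plane z = x + y through three vertices; value at the centroid is 2/3
z = plane_interpolate((0, 0, 0), (1, 0, 1), (0, 1, 1), 1 / 3, 1 / 3)
```

Since each triangle contributes its own plane, the assembled surface is interpolative (it passes through every vertex) but only piecewise linear, as noted above.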
8.5 Natural neighbour
The Natural neighbour is an interpolation method based on Voronoi tessellation.
Voronoi tessellation can be defined as the partitioning of a plane with n points into n
convex polygons such that each polygon contains exactly one point and every point in
a given polygon is closer to its central point than to any other. In other words, if {Xi, i = 1, …, n} is a given set of points in ℝ², then the Voronoi polygon corresponding to the point Xi is the set Vi = {X ∈ ℝ² : |X − Xi| ≤ |X − Xj| for all j ≠ i}.
To interpolate at a new point X, a new Voronoi cell is constructed around X, carved out of the cells of the original points as if X were included in the tessellation. The weights of the points A, B, C, D and E, which are used to compute the interpolated value at X, are respectively the areas of the grey region intersecting each original cell of A, B, C, D and E; they are also known as the natural neighbour coordinates of X.
Figure 8.3: New Voronoi cell and areas for computation of neighbour point weights.
The surface formed by natural neighbour interpolation has the useful properties of
being continuous (C0) everywhere and passing exactly through z values of all data
points. Moreover, the interpolated surface is continuously differentiable (C1)
everywhere except at the data points, providing smooth interpolation in contrast to the
Triangulation with linear interpolation method.
8.5.1 Advantages:
Fast algorithm
Resulting surface is interpolative and smooth except at the data points.
8.5.2 Disadvantages:
The domain of the function f is limited to the convex envelope of the points XYZ
The shape of the resulting surface is not acceptable in some fields, such as geology or hydrogeology.
8.5.3 Application:
The Natural neighbour method is mainly used in GIS systems as a digital terrain model and for fast interpolation of terrain data providing a smooth surface.
8.6 Inverse distance
This method computes the value of the function f at an arbitrary point (x, y) ∈ D as a weighted average of the values Zi:
f(x, y) = Σi wi Zi / Σi wi, where wi = 1/hi², hi = √((x − Xi)² + (y − Yi)² + δ²),
and δ² ≥ 0 is a smoothing parameter.
If the number of points n is too large, the value of f(x, y) is calculated only from the points belonging to a specified circle surrounding the point (x, y). The method was frequently implemented in the early stages of computer development.
8.6.1 Advantages:
Simple computer implementation; for its simplicity, the method is implemented in
almost all gridding software packages
If δ² = 0, the method provides interpolation.
8.6.2 Disadvantages:
High computation time if the number of points n is large (due to the computation of distances)
Typical generation of "bull's-eye" patterns surrounding the point locations within the domain D, which is why the resulting function is not acceptable for most applications.
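The weighted-average formula above can be sketched in a few lines (hypothetical helper name; with δ² = 0 the weight diverges at a data point, so the data value is returned directly, which is exactly the interpolation property noted above):

```python
def inverse_distance(points, x, y, delta2=0.0):
    """Inverse distance: f(x, y) = sum(w_i * Z_i) / sum(w_i) with
    w_i = 1 / h_i**2 and h_i**2 = (x - X_i)**2 + (y - Y_i)**2 + delta2."""
    num = den = 0.0
    for xi, yi, zi in points:
        h2 = (x - xi) ** 2 + (y - yi) ** 2 + delta2
        if h2 == 0.0:
            return zi  # delta2 = 0 and (x, y) lies exactly on a data point
        w = 1.0 / h2
        num += w * zi
        den += w
    return num / den

pts = [(0, 0, 1.0), (1, 0, 3.0), (0, 1, 5.0)]
print(inverse_distance(pts, 0, 0))      # 1.0: interpolation at a data point
print(inverse_distance(pts, 0.5, 0.5))  # 3.0: all three points equidistant
```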
8.7 Minimum curvature method
This method, and in particular its computer implementation, was developed by Smith and Wessel (1990). The interpolated surface generated by the Minimum curvature method is analogous to a thin, linearly elastic plate passing through each of the data values with a minimum amount of bending. The algorithm of the Minimum curvature method is based on the numerical solution of the modified bi-harmonic differential equation
(1 − T)∇⁴f(x, y) − T∇²f(x, y) = 0 with three boundary conditions:
(1 − T)∂²f/∂n² + T ∂f/∂n = 0 on the edges
∂(∇²f)/∂n = 0 on the edges
∂²f/∂x∂y = 0 at the corners
where
T ∈ [0, 1] is a tensioning parameter,
∇² is the Laplacian operator, ∇²f = ∂²f/∂x² + ∂²f/∂y²,
∇⁴ = (∇²)² is the bi-harmonic operator, ∇⁴f = ∂⁴f/∂x⁴ + ∂⁴f/∂y⁴ + 2∂⁴f/∂x²∂y², and
n is the boundary normal.
If T = 0, the bi-harmonic differential equation is solved; if T = 1, the Laplace differential equation is solved, and in this case the resulting surface may have local extremes only at the points XYZ.
8.7.1 Advantages:
The speed of computation is high, and increasing the number of points XYZ has little influence on the computational speed.
Suitable method for a large number of points XYZ.
8.7.2 Disadvantages:
Complicated algorithm and computer implementation
If the parameter T is near zero, the resulting surface may have local extremes away from the point locations
Poor ability to preserve extrapolation trends.
8.7.3 Application:
Universal method suitable for smooth approximation and interpolation (for example
distribution of temperature, water heads, potential fields and so on).
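The full tensioned bi-harmonic solver of Smith and Wessel (1990) is beyond a short example, but the T = 1 limiting case (Laplace equation) can be illustrated by simple Jacobi relaxation on a grid, holding the data points fixed. This is only a toy sketch with zero Dirichlet boundaries, not the boundary conditions stated above:

```python
def relax_laplace(grid, fixed, iters=2000):
    """Jacobi relaxation toward a solution of the Laplace equation
    (the T = 1 limit of minimum curvature), holding the grid cells in
    `fixed` (the data points XYZ) at their values.  Boundary cells are
    simply left untouched; the real Smith & Wessel (1990) algorithm
    uses the tensioned bi-harmonic equation and its own boundary
    conditions."""
    ny, nx = len(grid), len(grid[0])
    for _ in range(iters):
        new = [row[:] for row in grid]
        for j in range(1, ny - 1):
            for i in range(1, nx - 1):
                if (j, i) not in fixed:
                    new[j][i] = 0.25 * (grid[j - 1][i] + grid[j + 1][i]
                                        + grid[j][i - 1] + grid[j][i + 1])
        grid = new
    return grid

# 5x5 grid, boundary held at 0, one interior data point fixed at 1
g = [[0.0] * 5 for _ in range(5)]
g[2][2] = 1.0
g = relax_laplace(g, fixed={(2, 2)})
```

Consistent with the remark above, the relaxed T = 1 surface attains its extreme only at the fixed data point; every free cell lies strictly between the boundary value and 1.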
8.8 Regression by plane with weights
This method is based on regression by a plane f(x, y) = ax + by + c using a weighted least squares fit. The weight wi assigned to the point (Xi, Yi, Zi) is computed as an inverse distance from the point (x, y) to the point (Xi, Yi). Then the minimum of the following function of three independent variables has to be found:
F(a, b, c) = Σi wi (aXi + bYi + c − Zi)²,
which leads to the solution of the three linear equations
∂F/∂a = 0, ∂F/∂b = 0, ∂F/∂c = 0.
After rearrangement the following equations are obtained:
a Σ wiXi² + b Σ wiXiYi + c Σ wiXi = Σ wiXiZi
a Σ wiXiYi + b Σ wiYi² + c Σ wiYi = Σ wiYiZi
a Σ wiXi + b Σ wiYi + c Σ wi = Σ wiZi
In addition to regression by a plane, some mapping packages offer the possibility of using polynomials of higher order.
8.8.1 Advantages:
Simple algorithm
Good extrapolation properties
8.8.2 Disadvantages:
Resulting function is only an approximation
Slow speed of computation if n is large (due to computation of distances)
8.8.3 Application:
Surface reconstruction from digitized contour lines. The method was frequently used in the past, when contour maps were transferred from paper sheets to digital maps.
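The weighted normal equations above can be assembled and solved directly. The sketch below (hypothetical helper; an eps term is added to the distance to avoid division by zero when (x, y) coincides with a data point) solves the 3 x 3 system by Cramer's rule:

```python
import math

def plane_regression(points, x, y, eps=1e-6):
    """Fit f(x, y) = a*x + b*y + c by weighted least squares, the weight
    of each point being the inverse of its distance to the evaluation
    point (x, y), and return the fitted value at (x, y)."""
    sxx = sxy = syy = sx = sy = sw = sxz = syz = sz = 0.0
    for xi, yi, zi in points:
        w = 1.0 / (math.hypot(x - xi, y - yi) + eps)
        sxx += w * xi * xi; sxy += w * xi * yi; syy += w * yi * yi
        sx += w * xi; sy += w * yi; sw += w
        sxz += w * xi * zi; syz += w * yi * zi; sz += w * zi
    # normal equations M @ (a, b, c) = r, solved by Cramer's rule
    M = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, sw]]
    r = [sxz, syz, sz]

    def det3(N):
        return (N[0][0] * (N[1][1] * N[2][2] - N[1][2] * N[2][1])
                - N[0][1] * (N[1][0] * N[2][2] - N[1][2] * N[2][0])
                + N[0][2] * (N[1][0] * N[2][1] - N[1][1] * N[2][0]))

    def replaced(col):
        N = [row[:] for row in M]
        for i in range(3):
            N[i][col] = r[i]
        return N

    d = det3(M)
    a, b, c = det3(replaced(0)) / d, det3(replaced(1)) / d, det3(replaced(2)) / d
    return a * x + b * y + c

# data sampled from the exact plane z = 2x + 3y + 1
pts = [(0, 0, 1.0), (1, 0, 3.0), (0, 1, 4.0), (1, 1, 6.0)]
print(plane_regression(pts, 0.5, 0.5))  # recovers the plane: ~3.5
```

Because the data here lie exactly on a plane, the fit reproduces it regardless of the weights; with scattered data the inverse-distance weights make the fit follow the nearby points, which is what gives the method its good extrapolation properties.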
8.9 Radial basis functions
The method of Radial basis functions uses the interpolation function in the form:
f(x, y) = p(x, y) + Σi=1..n wi φ(|(x, y) − (Xi, Yi)|)
where
p(x, y) is a polynomial,
wi are real weights,
|(x, y) − (Xi, Yi)| is the Euclidean distance between the points (x, y) and (Xi, Yi), and
φ(r) is a radial basis function.
Commonly used radial basis functions are (c² is the smoothing parameter):
Multiquadric: φ(r) = √(r² + c²)
Multilog: φ(r) = log(r² + c²)
Natural cubic spline: φ(r) = (r² + c²)^(3/2)
Thin plate spline: φ(r) = (r² + c²) log(r² + c²)
The interpolation process starts with polynomial regression using the polynomial p(x, y). Then the following system of n linear equations is solved for the unknown weights wi, i = 1, …, n:
Zj − p(Xj, Yj) = Σi=1..n wi φ(|(Xj, Yj) − (Xi, Yi)|),  j = 1, …, n
As soon as the weights wi are determined, the z-value of the surface can be computed directly from the equation above at any point (x, y) ∈ D.
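The system above can be assembled and solved with elementary means. The sketch below makes two simplifying assumptions, both labeled here: the multiquadric basis is used, and the constant polynomial p(x, y) = mean(Z) stands in for a general polynomial regression. The system is solved once by Gaussian elimination, after which the interpolant can be evaluated anywhere:

```python
import math

def rbf_fit(points, c2=0.0):
    """Solve the n x n system for the RBF weights w_i (multiquadric
    phi(r) = sqrt(r**2 + c2), constant polynomial p = mean(Z)) by
    Gaussian elimination with partial pivoting; return the interpolant."""
    n = len(points)
    zbar = sum(z for _, _, z in points) / n
    phi = lambda r: math.sqrt(r * r + c2)
    A = [[phi(math.hypot(xj - xi, yj - yi)) for xi, yi, _ in points]
         for xj, yj, _ in points]
    b = [z - zbar for _, _, z in points]         # right-hand side Z_j - p
    for k in range(n):                           # forward elimination
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            b[i] -= m * b[k]
    w = [0.0] * n                                # back substitution
    for i in range(n - 1, -1, -1):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]

    def f(x, y):  # the interpolation function of the equation above
        return zbar + sum(wi * phi(math.hypot(x - xi, y - yi))
                          for wi, (xi, yi, _) in zip(w, points))
    return f

pts = [(0, 0, 1.0), (1, 0, 2.0), (0, 1, 3.0), (1, 1, 2.5)]
f = rbf_fit(pts)
print(f(0, 0))  # the resulting surface passes through the data points
```

Note that the system is solved only once, after which f(x, y) is a cheap direct evaluation, exactly the advantage over Kriging noted below.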
8.9.1 Advantages:
Simple computer implementation; the system of linear equations has to be solved only once (in contrast to the Kriging method, where a system of linear equations must be solved for each grid node; see the next section)
The resulting function is interpolative
Easy implementation of smoothing
8.9.2 Disadvantages:
If the number of points n is large, the number of linear equations is also large;
moreover the matrix of the system is not sparse, which leads to a long computational
time and possibly to the propagation of rounding errors. That is why this method, as
presented, is used for solving small problems with up to a few thousand points.
Solving large problems is also possible, but requires an additional algorithm for
searching points in the specified surrounding of each grid node.
8.9.3 Application:
Universal method suitable for use in any field.