
SITE CHARACTERIZATION & INSTRUMENTATION, MODULE 8

Dr. P. Anbazhagan

    Module 8: Numerical methods

    Topics:

    Introduction

    Kriging

    Artificial neural networks (ANN)

    Triangulation with linear interpolation

    Natural neighbour

    Inverse distance

    Minimum curvature

    Regression by plane with weights

    Radial basis functions

    Keywords: Kriging, variogram, ANN, Interpolation methods

    8.1 Introduction:

    Surface interpolation and construction of maps have been traditionally used in many

    fields such as physics, geophysics, geology, geodesy, hydrology, meteorology and so

    on. The goal of this module is to present commonly used techniques for solving

interpolation/approximation problems and to evaluate their applicability for solving practical tasks in site characterization. The interpolation/approximation methods presented below are:

    Kriging

    Artificial neural networks (ANN)

    Triangulation with linear interpolation

    Natural neighbour

    Inverse distance

    Minimum curvature

    Regression by plane with weights

    Radial basis functions

    8.2 Kriging method

    8.2.1 Introduction:

Geo-statistics is a scientific approach to estimation problems in geology and mining. It

    is a branch of statistics dealing with spatial phenomena modelled by random

    functions.

Today, geo-statistics is no longer restricted to this kind of application. It is applied in

    disciplines such as hydrology, meteorology, oceanography, geography, forestry,

environmental monitoring, landscape ecology, agriculture, and the study of ecosystem geography and dynamics.


    Underlying each geo-statistical method is the notion of random function. A random

    function describes a given spatial phenomenon over a domain. It consists of a set of

    random variables, each of which describes the phenomenon at some location of the

    domain.

    In most geo-statistical methods, the dependencies between the random variables are

    preferably described by a variogram. The variogram depicts the variance of the

    increments of the quantity of interest as a function of the distance between sites.

    By far, kriging is the most popular geo-statistical method. The aim of kriging is to

    predict the phenomenon at unobserved sites. This is the problem of spatial estimation,

    sometimes called spatial prediction. Examples of spatial phenomena estimations are

    soil nutrient or pollutant concentrations over a field observed on a survey grid,

    hydrologic variables over an aquifer observed at well locations, and air quality

    measurements over an air basin observed at monitoring sites.

    8.2.2 Kriging:

In the real world, it is impossible to get exhaustive values of data at every desired point

    because of practical constraints. Thus, interpolation is important and fundamental to

    graphing, analysing and understanding of 2D data.

Interpolation is the estimation of a variable at an unmeasured location from observed values at surrounding locations.

    The word "kriging" is synonymous with "optimal prediction". It is a method of

    interpolation, which predicts unknown values from data observed at known locations.

This method uses a variogram to express the spatial variation, and it minimizes the error of the predicted values, which are estimated from the spatial distribution of the predicted values.

    Kriging is optimal interpolation based on regression against observed z values of

    surrounding data points, weighted according to spatial covariance values.

The term kriging was coined by Matheron in honor of D. G. Krige, who published an early account of this technique. In its simplest form, a kriging estimate of the field at

    an unobserved location is an optimized linear combination of the data at observed

    locations.

    A full application of a kriging method involves different steps:

1. First, a structural analysis is performed: usual statistical tools such as histograms and empirical cumulative distributions can be used in conjunction with an analysis of the sample variogram (a sketch of computing a sample variogram follows this list).

2. In place of the sample variogram, which does not respect suitable mathematical properties, a theoretical variogram is chosen. The theoretical variogram model is fitted to the sample variogram, informed by the structural analysis.

    3. Finally, from this variogram specification, the kriging estimate is computed at the

    location of interest by solving a system of linear equations of the least squares

    type.
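As a rough illustration of step 1, the sketch below computes a sample (experimental) semi-variogram from scattered observations with NumPy; the lag binning scheme and the synthetic data are assumptions made for illustration, not values taken from the text.

```python
import numpy as np

def sample_variogram(coords, values, lags, tol):
    """Empirical semi-variogram: gamma(h) = 0.5 * mean[(z_i - z_j)^2]
    over all point pairs whose separation falls within h +/- tol."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sq = (values[:, None] - values[None, :]) ** 2
    iu = np.triu_indices(len(values), k=1)      # count each pair once
    d, sq = d[iu], sq[iu]
    gamma = []
    for h in lags:
        mask = np.abs(d - h) <= tol
        gamma.append(0.5 * sq[mask].mean() if mask.any() else np.nan)
    return np.array(gamma)

# illustrative use with synthetic data (assumed, not from the module)
rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(200, 2))     # e.g. borehole locations (m)
values = np.sin(coords[:, 0] / 20) + 0.1 * rng.standard_normal(200)
lags = np.arange(5, 60, 5.0)
print(sample_variogram(coords, values, lags, tol=2.5))
```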


    8.2.3 Advantages of kriging:

    1. Helps to compensate for the effects of data clustering, assigning individual points

    within a cluster less weight than isolated data points (or, treating clusters more like

    single points)

    2. Gives estimate of estimation error (kriging variance), along with estimate of the

    variable, Z, itself (but error map is basically a scaled version of a map of distance to

    nearest data point, so not that unique)

    3. Availability of estimation error provides basis for stochastic simulation of possible

    realizations of Z(u).

    8.2.4 Kriging Approach and Terminology:

All kriging estimators are but variants of the basic linear regression estimator Z*(u) defined as

Z*(u) − m(u) = Σ_{α=1}^{n(u)} λ_α(u) [Z(u_α) − m(u_α)]

with:

u, u_α: location vectors for the estimation point and one of the neighbouring data points, indexed by α

n(u): number of data points in the local neighbourhood used for estimation of Z*(u)

m(u), m(u_α): expected values (means) of Z(u) and Z(u_α)

λ_α(u): kriging weight assigned to datum Z(u_α) for estimation location u; the same datum will receive a different weight for a different estimation location.

Z(u) is treated as a random field with a trend component, m(u), and a residual component, R(u) = Z(u) − m(u). Kriging estimates the residual at u as a weighted sum of the residuals at surrounding data points. The kriging weights, λ_α, are derived from the covariance function or semi-variogram, which should characterize the residual component. The distinction between trend and residual is somewhat arbitrary and varies with scale.

    Basics of kriging:

The basic form of the kriging estimator is:

Z*(u) − m(u) = Σ_{α=1}^{n(u)} λ_α(u) [Z(u_α) − m(u_α)]

The goal is to determine the weights λ_α that minimize the variance of the estimator,

σ²_E(u) = Var{Z*(u) − Z(u)},

under the unbiasedness constraint E{Z*(u) − Z(u)} = 0.

The random field (RF) Z(u) is decomposed into residual and trend components, Z(u) = R(u) + m(u), with the residual component treated as an RF with a stationary mean of 0 and a stationary covariance (a function of lag, h, but not of position, u):

E{R(u)} = 0

Cov{R(u), R(u + h)} = E{R(u) · R(u + h)} = C_R(h)

The residual covariance function is generally derived from the input semi-variogram model, C_R(h) = C_R(0) − γ(h) = Sill − γ(h).

    Thus, the semi-variogram we feed to a kriging program should represent the residual

    component of the variable. The three main kriging variants,


    1. Simple,

    2. Ordinary, and

    3. Kriging with a trend,

differ in their treatment of the trend component, m(u).

    Simple kriging:

For simple kriging, we assume that the trend component is a constant and known mean, m(u) = m, so that

Z*_SK(u) = m + Σ_{α=1}^{n(u)} λ_α^SK(u) [Z(u_α) − m]

This estimate is automatically unbiased, since E[Z(u_α) − m] = 0, so that E[Z*_SK(u)] = m = E[Z(u)]. The estimation error Z*_SK(u) − Z(u) is a linear combination of random variables representing residuals at the data points, u_α, and the estimation point, u:

Z*_SK(u) − Z(u) = [Z*_SK(u) − m] − [Z(u) − m] = Σ_{α=1}^{n(u)} λ_α^SK(u) R(u_α) − R(u) = R*_SK(u) − R(u)

Using rules for the variance of a linear combination of random variables, the error variance is then given by

σ²_E(u) = Var{R*_SK(u)} + Var{R(u)} − 2 Cov{R*_SK(u), R(u)}

= Σ_{α=1}^{n(u)} Σ_{β=1}^{n(u)} λ_α^SK(u) λ_β^SK(u) C_R(u_α − u_β) + C_R(0) − 2 Σ_{α=1}^{n(u)} λ_α^SK(u) C_R(u_α − u)

To minimize the error variance, we take the derivative of the above expression with respect to each of the kriging weights and set each derivative to zero. This leads to the following system of equations:

Σ_{β=1}^{n(u)} λ_β^SK(u) C_R(u_α − u_β) = C_R(u_α − u),  α = 1, 2, ..., n(u)

Because of the constant mean, the covariance function for Z(u) is the same as that for the residual component, C(h) = C_R(h), so that we can write the simple kriging system directly in terms of C(h):

Σ_{β=1}^{n(u)} λ_β^SK(u) C(u_α − u_β) = C(u_α − u),  α = 1, 2, ..., n(u)

This can be written in matrix form as

K λ_SK(u) = k

where K is the matrix of covariances between data points, with elements K_ij = C(u_i − u_j), k is the vector of covariances between the data points and the estimation point, with elements k_i = C(u_i − u), and λ_SK(u) is the vector of simple kriging weights for the surrounding data points. If the covariance model is licit (meaning the underlying semi-variogram model is licit) and no two data points are co-located, then the data covariance matrix is positive definite and we can solve for the kriging weights using

λ_SK = K⁻¹ k

Once we have the kriging weights, we can compute both the kriging estimate and the kriging variance, which is given by

σ²_SK(u) = C(0) − λ_SK^T(u) k = C(0) − Σ_{α=1}^{n(u)} λ_α^SK(u) C(u_α − u)

    after substituting the kriging weights into the error variance expression above. All this

    math finds a set of weights for estimating the variable value at the location u from

    values at a set of neighbouring data points. The weight on each data point generally

    decreases with increasing distance to that point, in accordance with the decreasing

    data-to-estimation covariances specified in the right-hand vector, k. However, the set

    of weights is also designed to account for redundancy among the data points,

represented in the data point-to-data point covariances in the matrix K. Multiplying k by K⁻¹ (on the left) will downweight points falling in clusters relative to isolated points at the same distance.
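As an illustration of the system above, the following minimal NumPy sketch solves λ_SK = K⁻¹k for a handful of points, then forms the estimate and kriging variance. The exponential covariance model C(h) = sill·exp(−h/range), the known mean m and all numerical values are illustrative assumptions, not values prescribed by the text.

```python
import numpy as np

def cov(h, sill=1.0, rng=30.0):
    # assumed exponential covariance model C(h) = sill * exp(-h / range)
    return sill * np.exp(-h / rng)

def simple_kriging(coords, z, u, m, sill=1.0, rng=30.0):
    """Simple kriging estimate and variance at location u, with known mean m."""
    d_data = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d_est = np.linalg.norm(coords - u, axis=-1)
    K = cov(d_data, sill, rng)            # data-to-data covariances
    k = cov(d_est, sill, rng)             # data-to-estimation covariances
    lam = np.linalg.solve(K, k)           # lambda_SK = K^-1 k
    z_star = m + lam @ (z - m)            # Z*_SK(u) = m + sum lam_a (Z_a - m)
    var = cov(0.0, sill, rng) - lam @ k   # sigma^2_SK(u) = C(0) - lam^T k
    return z_star, var

coords = np.array([[10.0, 20.0], [30.0, 25.0], [22.0, 40.0]])   # illustrative
z = np.array([1.2, 0.7, 0.9])
print(simple_kriging(coords, z, u=np.array([25.0, 30.0]), m=1.0))
```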

    Ordinary kriging:

For ordinary kriging, rather than assuming that the mean is constant over the entire domain, we assume that it is constant in the local neighbourhood of each estimation point, that is, that m(u_α) = m(u) for each nearby data value Z(u_α) that we are using to estimate Z(u). In this case, the kriging estimator can be written

Z*(u) = m(u) + Σ_{α=1}^{n(u)} λ_α(u) [Z(u_α) − m(u)]

= Σ_{α=1}^{n(u)} λ_α(u) Z(u_α) + [1 − Σ_{α=1}^{n(u)} λ_α(u)] m(u)

and we filter the unknown local mean by requiring that the kriging weights sum to 1, leading to an ordinary kriging estimator of

Z*_OK(u) = Σ_{α=1}^{n(u)} λ_α^OK(u) Z(u_α)  with  Σ_{α=1}^{n(u)} λ_α^OK(u) = 1

In order to minimize the error variance subject to the unit-sum constraint on the weights, we actually set up the system to minimize the error variance plus an additional term involving a Lagrange parameter, μ_OK(u):

L = σ²_E(u) + 2 μ_OK(u) [1 − Σ_{α=1}^{n(u)} λ_α(u)]

so that minimization with respect to the Lagrange parameter forces the constraint to be obeyed:

(1/2) ∂L/∂μ_OK(u) = 1 − Σ_{α=1}^{n(u)} λ_α(u) = 0

In this case, the system of equations for the kriging weights turns out to be

Σ_{β=1}^{n(u)} λ_β^OK(u) C_R(u_α − u_β) + μ_OK(u) = C_R(u_α − u),  α = 1, ..., n(u)

Σ_{β=1}^{n(u)} λ_β^OK(u) = 1

where C_R(h) is once again the covariance function for the residual component of the

variable. In simple kriging, we could equate C_R(h) and C(h), the covariance function

    for the variable itself, due to the assumption of a constant mean. That equality does

    not hold here, but in practice the substitution is often made anyway, on the

    assumption that the semi-variogram, from which C(h) is derived, effectively filters the

    influence of large-scale trends in the mean.

    In fact, the unit-sum constraint on the weights allows the ordinary kriging system to

be stated directly in terms of the semi-variogram (in place of the C_R(h) values above).

    In a sense, ordinary kriging is the interpolation approach that follows naturally from a

semi-variogram analysis, since both tools tend to filter trends in the mean.

Once the kriging weights (and Lagrange parameter) are obtained, the ordinary kriging error variance is given by

σ²_OK(u) = C(0) − Σ_{α=1}^{n(u)} λ_α^OK(u) C(u_α − u) − μ_OK(u)
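A common way to implement the ordinary kriging system above is to solve the augmented matrix equation that bundles the covariances, the unit-sum constraint and the Lagrange parameter. The sketch below does this with NumPy; the exponential covariance model and the sample values are illustrative assumptions.

```python
import numpy as np

def cov(h, sill=1.0, rng=30.0):
    # assumed exponential covariance model (illustrative)
    return sill * np.exp(-h / rng)

def ordinary_kriging(coords, z, u, sill=1.0, rng=30.0):
    """Ordinary kriging: solve the augmented system [C 1; 1^T 0] for the
    weights and the Lagrange parameter, then return estimate and variance."""
    n = len(z)
    d_data = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d_est = np.linalg.norm(coords - u, axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = cov(d_data, sill, rng)     # data-to-data covariances
    A[n, n] = 0.0                          # corner of the constraint row/column
    b = np.ones(n + 1)
    b[:n] = cov(d_est, sill, rng)          # data-to-estimation covariances
    sol = np.linalg.solve(A, b)
    lam, mu = sol[:n], sol[n]
    z_star = lam @ z                       # weights sum to 1 by construction
    var = cov(0.0, sill, rng) - lam @ b[:n] - mu   # sigma^2_OK(u)
    return z_star, var

coords = np.array([[10.0, 20.0], [30.0, 25.0], [22.0, 40.0], [35.0, 38.0]])
z = np.array([1.2, 0.7, 0.9, 1.1])
print(ordinary_kriging(coords, z, u=np.array([25.0, 30.0])))
```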

    Kriging with a Trend:

    Kriging with a trend (the method formerly known as universal kriging) is much like

    ordinary kriging, except that instead of fitting just a local mean in the neighbourhood

    of the estimation point, we fit a linear or higher-order trend in the (x,y) coordinates of

    the data points. A local linear (a.k.a., first-order) trend model would be given by

m(u) = m(x, y) = a_0 + a_1 x + a_2 y

    Including such a model in the kriging system involves the same kind of extension as

    we used for ordinary kriging, with the addition of two more Lagrange parameters and

    two extra columns and rows in the K matrix whose (non-zero) elements are the x and

    y coordinates of the data points. Higher-order trends (quadratic, cubic) could be

    handled in the same way, but in practice it is rare to use anything higher than a first-

    order trend. Ordinary kriging is kriging with a zeroth-order trend model.

    If the variable of interest does exhibit a significant trend, a typical approach would be

    to attempt to estimate a de-trended semi-variogram using one of the methods

    described in the Semi-variogram lecture and then feed this into kriging with a first

    order trend. However, Goovaerts (1997) warns against this approach and instead

    recommends performing simple kriging of the residuals from a global trend (with a

    constant mean of 0) and then adding the kriged residuals back into the global trend.


    Co-kriging:

Co-kriging is kriging that uses information from one or more correlated secondary variables, or multivariate kriging in general. It requires the development of models for the cross-covariance, that is, the covariance between two different variables as a function of lag.

    Indicator Kriging:

It is the kriging of indicator variables, which represent membership in a set of categories. It is used with naturally categorical variables such as facies, or with continuous variables that have been thresholded into categories (e.g., quartiles, deciles). It is especially useful for preserving the connectedness of high- and low-permeability regions.

    8.3 Artificial Neural Networks

    8.3.1 Introduction:

    Artificial neural networks (ANNs) are a form of artificial intelligence, which attempts

    to mimic the function of the human brain and nervous system. ANNs learn from data

    examples presented to them in order to capture the subtle functional relationships

    among the data even if the underlying relationships are unknown or the physical

    meaning is difficult to explain. ANNs are thus well suited to modelling the complex

    behaviour of most geotechnical engineering materials, which by their very nature,

    exhibit extreme variability.

    Geotechnical properties of soils are controlled by factors such as mineralogy, fabric

    and pore water, and the interactions of these factors are difficult to establish solely by

    traditional statistical methods due to their interdependence. Based on the application

    of ANNs, methodologies have been developed for estimating several soil properties

    including the pre-consolidation pressure, shear strength and stress history, compaction

    and permeability, soil classification and soil density.

Problems such as liquefaction during earthquakes and settlement of shallow foundations can also be modelled using the ANN method.

    8.3.2 Overview of Artificial Neural Network:

    ANNs consist of a number of artificial neurons variously known as processing

elements (PEs), nodes or units. For multilayer perceptrons (MLPs), which are the most commonly used ANNs in geotechnical engineering, processing elements are

    usually arranged in layers:

    1) An input layer,

    2) An output layer and

    3) One or more intermediate layers called hidden layers.

Figure 8.1 shows a typical multi-layer ANN arrangement.


Each processing element in a specific layer is fully or partially connected to many

    other processing elements via weighted connections. The scalar weights determine the

    strength of the connections between interconnected neurons.

    A zero weight refers to no connection between two neurons and a negative weight

    refers to a prohibitive relationship. From many other processing elements, an

    individual processing element receives its weighted inputs, which are summed and a

    bias unit or threshold is added or subtracted.

    Figure 8.1: A typical multi-layer ANN showing the input layer for ten different inputs,

    the middle or hidden layer(s), and the output layer having three outputs

    The bias unit is used to scale the input to a useful range to improve the convergence

    properties of the neural network. The result of this combined summation is passed

    through a transfer function (e.g. logistic sigmoid or hyperbolic tangent) to produce the

    output of the processing element.

Figure 8.2 shows an ANN with a hidden layer.

For node j, this process is summarized as:

I_j = θ_j + Σ_i w_ji x_i

y_j = f(I_j)

where

I_j = the activation level of node j;

w_ji = the connection weight between nodes j and i;

x_i = the input from node i, i = 0, 1, ..., n;

θ_j = the bias or threshold for node j;

y_j = the output of node j; and

f(·) = the transfer function.

    Figure 8.2: ANN with a hidden layer
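A minimal sketch of this forward pass for a single processing element, assuming a logistic sigmoid transfer function (one of the options mentioned above); the weights, inputs and threshold are illustrative values, not data from the text.

```python
import numpy as np

def node_output(x, w, theta):
    """Forward pass of one processing element: I_j = theta_j + sum_i w_ji x_i,
    followed by y_j = f(I_j) with a logistic sigmoid transfer function."""
    I = theta + np.dot(w, x)           # weighted sum of inputs plus bias/threshold
    return 1.0 / (1.0 + np.exp(-I))    # logistic sigmoid f(I)

x = np.array([0.2, -0.5, 0.8])         # inputs from the previous layer (assumed)
w = np.array([0.4, 0.1, -0.3])         # connection weights w_ji (assumed)
print(node_output(x, w, theta=0.05))
```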

    The propagation of information in MLPs starts at the input layer where the input data

are presented. The inputs are weighted and received by each node in the next layer. The weighted inputs are then summed and passed through a transfer function to produce the nodal output, which is weighted and passed to processing elements in the

    next layer. The network adjusts its weights on presentation of a set of training data

    and uses a learning rule until it can find a set of weights that will produce the input-

    output mapping that has the smallest possible error. The above process is known as

    learning or training.

    Learning in ANNs is usually divided into supervised and unsupervised. In supervised

    learning, the network is presented with a historical set of model inputs and the

    corresponding (desired) outputs. The actual output of the network is compared with

    the desired output and an error is calculated. This error is used to adjust the

    connection weights between the model inputs and outputs to reduce the error between

    the historical outputs and those predicted by the ANN.

    In unsupervised learning, the network is only presented with the input stimuli and

    there are no desired outputs. The network itself adjusts the connection weights

    according to the input values. The idea of training in unsupervised networks is to

    cluster the input records into classes of similar features.

    ANNs can be categorized on the basis of two major criteria:

    1) The learning rule used, and

    2) The connections between processing elements.

    Based on learning rules, ANNs, as mentioned above, can be divided into supervised

    and unsupervised networks. Based on connections between processing elements,


    ANNs can be divided into feed-forward and feedback networks. In feed forward

    networks, the connections between the processing elements are in the forward

    direction only, whereas, in feedback networks, connections between processing

    elements are in both the forward and backward directions.

    8.3.3 Modelling issues in Artificial Neural Networks:

    In order to improve performance, ANN models need to be developed in a systematic

    manner. Such an approach needs to address major factors such as the determination of

    adequate model inputs, data division and pre-processing, the choice of suitable

    network architecture, careful selection of some internal parameters that control the

    optimization method, the stopping criteria and model validation. These factors are

    explained and discussed below.

    Determination of model inputs:

    An important step in developing ANN models is to select the model input variables

    that have the most significant impact on model performance. A good subset of input

    variables can substantially improve model performance. Presenting as large a number

    of input variables as possible to ANN models usually increases network size, resulting

    in a decrease in processing speed and a reduction in the efficiency of the network.

    A number of techniques have been suggested to assist with the selection of input

    variables. An approach that is usually utilized in the field of geotechnical engineering

    is that appropriate input variables can be selected in advance based on a priori

    knowledge. Another approach used is to train many neural networks with different

    combinations of input variables and to select the network that has the best

    performance.

    A step-wise technique described by Maier and Dandy can also be used in which

    separate networks are trained, each using only one of the available variables as model

    inputs. The network that performs the best is then retained, combining the variable

    that results in the best performance with each of the remaining variables. This process

is repeated for an increasing number of input variables, until the addition of further variables results in no improvement in model performance.

    Another useful approach is to employ a genetic algorithm to search for the best sets of

    input variables. For each possible set of input variables chosen by the genetic

algorithm, a neural network is trained and used to rank different subsets of possible

    inputs. A set of input variables derives its fitness from the model error obtained based

    on those variables.

    A potential shortcoming of the above approaches is that they are model-based. In

    other words, the determination as to whether a parameter input is significant or not is

    dependent on the error of a trained model, which is not only a function of the inputs,

    but also model structure and calibration. This can potentially obscure the impact of

    different model inputs.


    In order to overcome this limitation, model-free approaches can be utilized, which use

    linear dependence measures, such as correlation, or non-linear measures of

    dependence, such as mutual information, to obtain the significant model inputs prior

    to developing the ANN models.

    Division of data:

    ANNs perform best when they do not extrapolate beyond the range of the data used

    for calibration. Therefore, the purpose of ANNs is to non-linearly interpolate

    (generalize) in high-dimensional space between the data used for calibration. ANN

    models generally have a large number of model parameters (connection weights) and

    can therefore over-fit the training data.

    In other words, if the number of degrees of freedom of the model is large compared

    with the number of data points used for calibration, the model might no longer fit the

    general trend, as desired. Consequently, a separate validation set is needed to ensure

    that the model can generalize within the range of the data used for calibration. It is

common practice to divide the available data into two subsets: a training set, to

    construct the neural network model, and an independent validation set to estimate the

    model performance in a deployed environment.

    Usually, two-thirds of the data are suggested for model training and one-third for

    validation. A modification of the above data division method is cross-validation in

    which the data are divided into three sets: training, testing and validation. The training

    set is used to adjust the connection weights, whereas the testing set is used to check

    the performance of the model at various stages of training and to determine when to

    stop training to avoid over-fitting. The validation set is used to estimate the

    performance of the trained network in the deployed environment.
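One possible way to carry out such a three-way division with scikit-learn is sketched below. The roughly two-thirds/one-third style proportions and the synthetic arrays are assumptions made for illustration; the module does not prescribe a specific split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# X: candidate input variables, y: outputs; synthetic placeholders for illustration
rng = np.random.default_rng(0)
X, y = rng.random((300, 5)), rng.random(300)

# first split off an independent validation set (roughly one third), then split
# the remainder into training and testing sets for cross-validation during training
X_rest, X_val, y_rest, y_val = train_test_split(X, y, test_size=1/3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X_rest, y_rest, test_size=0.25,
                                                    random_state=0)
print(len(X_train), len(X_test), len(X_val))
```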

In many situations, the available data set is so small that it needs to be devoted solely to model training, and collecting any more data for validation is difficult.

    leave-k-out method can be used which involves holding back a small fraction of the

    data for validation and using the rest of the data for training. After training, the

performance of the trained network has to be estimated with the aid of the validation set. A different small subset of data is then held back and the network is trained and tested again. This process is repeated many times with different subsets until an optimal

    model can be obtained from the use of all of the available data.

    In the majority of ANN applications in geotechnical engineering, the data are divided

    into their subsets on an arbitrary basis. However, recent studies have found that the

    way the data are divided can have a significant impact on the results obtained. As

    ANNs have difficulty extrapolating beyond the range of the data used for calibration,

    in order to develop the best ANN model, given the available data, all of the patterns

    that are contained in the data need to be included in the calibration set.


    Data Pre-processing:

    Once the available data have been divided into their subsets (i.e. training, testing and

    validation), it is important to pre-process the data in a suitable form before they are

    applied to the ANN. Data pre-processing is necessary to ensure all variables receive

    equal attention during the training process.

    Pre-processing can be in the form of data scaling, normalization and transformation.

Scaling the output data is essential, as they have to be commensurate with the limits of

    the transfer functions used in the output layer. Scaling the input data is not necessary

    but it is almost always recommended. In some cases, the input data need to be

    normally distributed in order to obtain optimal results.

    Transforming the input data into some known forms may be helpful to improve ANN

    performance. However, empirical trials showed that the model fits were the same,

    regardless of whether raw or transformed data were used.
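A hedged example of such scaling, using scikit-learn's MinMaxScaler fitted on the training subset only and then applied unchanged to the other subsets; the feature range and the placeholder arrays are assumptions of this sketch.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(1)
X_train = rng.random((200, 4)) * 50        # placeholder training inputs
X_test = rng.random((50, 4)) * 50          # placeholder testing inputs

# scale inputs into the working range of the transfer function; fitting on the
# training data only avoids leaking information from the other subsets
scaler = MinMaxScaler(feature_range=(0, 1)).fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
print(X_train_s.min(), X_train_s.max())
```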

    Determination of Model Architecture:

    Determining the network architecture is one of the most important and difficult tasks

    in ANN model development. It requires the selection of the optimum number of

    layers and the number of nodes in each of these. For MLPs, there are always two

    layers representing the input and output variables in any neural network. It has been

    shown that one hidden layer is sufficient to approximate any continuous function

    provided that sufficient connection weights are given.

Following several contradictory findings, Lapedes and Farber (1988) provided a more practical proof that two hidden layers are sufficient: the first hidden layer is used to extract the local features of the input patterns, while the second hidden layer is useful to extract the global features of the training patterns. However, Masters (1993) stated that using

    more than one hidden layer often slows the training process dramatically and

    increases the chance of getting trapped in local minima.

    The number of nodes in the input and output layers is restricted by the number of

    model inputs and outputs, respectively. It has been shown in the literature that neural

    networks with a large number of free parameters (connection weights) are more

    subject to over-fitting and poor generalization. Consequently, keeping the number of

    hidden nodes to a minimum, provided that satisfactory performance is achieved, is

    always better, as it:

    o Reduces the computational time needed for training;

    o Helps the network achieve better generalization performance;

    o Helps avoid the problem of over-fitting and

    o Allows the trained network to be analysed more easily.

    For single hidden layer networks, there are a number of rules-of-thumb to obtain the

    best number of hidden layer nodes. Hecht-Nielsen and Caudill suggested that the

    upper limit of the number of hidden nodes in a single layer network may be taken as

    (2I+1), where I is the number of inputs. The best approach found by Nawari et al.


    (1999) was to start with a small number of nodes and to slightly increase the number

    until no significant improvement in model performance is achieved.

    For networks with two hidden layers, the geometric pyramid rule described by Nawari

    et al. (1999) can be used. The notion behind this method is that the number of nodes

    in each layer follows a geometric progression of a pyramid shape, in which the

    number of nodes decreases from the input layer towards the output layer. Kudrycki

    found empirically that the optimum ratio of the first to second hidden layer nodes is

    3:1, even for high dimensional inputs.
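The two rules of thumb quoted above can be written as small helper functions, as sketched below; treating the 2I + 1 upper limit and the 3:1 first-to-second hidden layer ratio as independent guidelines, and the choice of total hidden-node budget, are assumptions of this sketch.

```python
def single_layer_upper_limit(n_inputs):
    # Hecht-Nielsen / Caudill rule of thumb: at most 2I + 1 hidden nodes
    return 2 * n_inputs + 1

def two_layer_split(total_hidden):
    # Kudrycki's empirical 3:1 ratio of first to second hidden layer nodes
    # (split is approximate when the total is not divisible by 4)
    second = max(1, round(total_hidden / 4))
    return total_hidden - second, second

print(single_layer_upper_limit(10))   # e.g. 10 inputs -> upper limit of 21 nodes
print(two_layer_split(20))            # e.g. 20 hidden nodes -> (15, 5)
```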

    Another way of determining the optimal number of hidden nodes that can result in

    good model generalization and avoid over-fitting is to relate the number of hidden

    nodes to the number of available training samples (Maier and Dandy, 2000).

    A number of systematic approaches have also been proposed to obtain automatically

    the optimal network architecture. The adaptive method of architecture determination

    is an example of the automatic methods for obtaining the optimal network architecture

    that suggests starting with an arbitrary, but small, number of nodes in the hidden

    layers.

    During training, and as the network approaches its capacity, new nodes are added to

    the hidden layers, and new connection weights are generated. Training is continued

    immediately after the new hidden nodes are added to allow the new connection

    weights to acquire the portion of the knowledge base, which was not stored in the old

    connection weights. The above steps are repeated and new hidden nodes are added as

    needed to the end of the training process, in which the appropriate network

    architecture is automatically determined.

    Model Optimization (Training):

    As mentioned previously, the process of optimizing the connection weights is known

    as training or learning. The aim is to find a global solution to what is typically a

    highly non-linear optimization problem. The method most commonly used for finding

    the optimum weight combination of feed-forward MLP neural networks is the back-

    propagation algorithm which is based on first-order gradient descent.

The use of global optimization methods, such as simulated annealing and genetic algorithms, has also been proposed. The advantage of these methods is that they

    have the ability to escape local minima in the error surface and, thus, produce optimal

    or near optimal solutions. However, they also have a slow convergence rate.

    Ultimately, the model performance criteria, which are problem specific, will dictate

    which training algorithm is most appropriate.
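As one possible realisation, the sketch below trains a single-hidden-layer MLP by gradient-descent back-propagation using scikit-learn's MLPRegressor; the architecture, learning rate and synthetic data are illustrative assumptions, not recommendations from the text.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# synthetic training data (assumed): a noisy linear target for demonstration
rng = np.random.default_rng(2)
X_train = rng.random((200, 3))
y_train = X_train @ np.array([1.5, -2.0, 0.5]) + 0.05 * rng.standard_normal(200)

# one hidden layer, logistic transfer function, trained by stochastic gradient
# descent back-propagation (solver='sgd'); parameters are illustrative only
model = MLPRegressor(hidden_layer_sizes=(7,), activation='logistic',
                     solver='sgd', learning_rate_init=0.05,
                     max_iter=2000, random_state=0)
model.fit(X_train, y_train)
print(model.loss_)          # final training loss
```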

    Stopping Criteria:

    Stopping criteria are used to decide when to stop the training process. They determine

    whether the model has been optimally or sub-optimally trained. Training can be

    stopped: after the presentation of a fixed number of training records; when the

    training error reaches a sufficiently small value; or when no or slight changes in the


    training error occur. However, the above examples of stopping criteria may lead to the

    model stopping prematurely or over-training.

    The cross-validation technique is an approach that can be used to overcome such

    problems. It is considered to be the most valuable tool to ensure over-fitting does not

    occur (Smith 1993). A number of stopping criteria can also be used. Unlike cross-

    validation, these stopping criteria require the data be divided into only two sets; a

    training set, to construct the model; and an independent validation set, to test the

    validity of the model in the deployed environment.

    The basic notion of these stopping criteria is that model performance should balance

    model complexity with the amount of training data and model error.

    Model Validation:

    Once the training phase of the model has been successfully accomplished, the

    performance of the trained model should be validated. The purpose of the model

    validation phase is to ensure that the model has the ability to generalize within the

    limits set by the training data in a robust fashion, rather than simply having

    memorized the input-output relationships that are contained in the training data.

    The approach that is generally adopted is to test the performance of trained ANNs on

    an independent validation set, which has not been used as part of the model building

    process. If such performance is adequate, the model is deemed to be able to generalize

    and is considered to be robust.

    The coefficient of correlation, r, the root mean squared error, RMSE, and the mean

    absolute error, MAE, are the main criteria that are often used to evaluate the

    prediction performance of ANN models. The coefficient of correlation is a measure

    that is used to determine the relative correlation and the goodness-of-fit between the

    predicted and observed data. Smith (1986) suggested the following guide for values of

    r between 0.0 and 1.0:

a. r ≥ 0.8: a strong correlation exists between the two sets of variables;

b. 0.2 < r < 0.8: a correlation exists between the two sets of variables; and

c. r ≤ 0.2: a weak correlation exists between the two sets of variables.

    The RMSE is the most popular measure of error and has the advantage that large

    errors receive much greater attention than small errors. In contrast with RMSE, MAE

    eliminates the emphasis given to large errors. Both RMSE and MAE are desirable

    when the evaluated output data are smooth or continuous.
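A minimal sketch of computing these three criteria with NumPy; the observed and predicted arrays are placeholders, not results from the module.

```python
import numpy as np

def validation_metrics(observed, predicted):
    """Coefficient of correlation r, RMSE and MAE between observed and
    predicted values, the criteria discussed above."""
    r = np.corrcoef(observed, predicted)[0, 1]
    rmse = np.sqrt(np.mean((observed - predicted) ** 2))
    mae = np.mean(np.abs(observed - predicted))
    return r, rmse, mae

obs = np.array([1.0, 2.0, 3.0, 4.0])        # illustrative values
pred = np.array([1.1, 1.9, 3.2, 3.7])
print(validation_metrics(obs, pred))
```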

    Despite the success of ANNs in geotechnical engineering and other disciplines, they

    suffer from some shortcomings that need further attention in the future including

    model transparency and knowledge extraction, extrapolation and uncertainty.

    Together, improvements in these issues will greatly enhance the usefulness of ANN

    models with respect to geotechnical engineering applications.


    8.4 Triangulation with linear interpolation

    The method of triangulation with linear interpolation is historically one of the first

    methods used before the intensive development of computers. It is based on the

division of the domain D into triangles. Each triangle then defines, by its three vertices, a plane; that is why the resulting surface is piecewise linear.
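A hedged example of the idea using SciPy's LinearNDInterpolator, which builds a Delaunay triangulation of the data points and interpolates linearly within each triangle; the sample points are illustrative, and queries outside the convex envelope return NaN, matching the limitation noted below.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

# scattered data points (X, Y, Z); values are illustrative
xyz = np.array([[0.0, 0.0, 1.0], [10.0, 0.0, 2.0],
                [0.0, 10.0, 0.5], [10.0, 10.0, 1.5], [5.0, 4.0, 3.0]])

# Delaunay triangulation with linear interpolation inside each triangle;
# points outside the convex envelope of the data return NaN
f = LinearNDInterpolator(xyz[:, :2], xyz[:, 2])
print(f(4.0, 5.0), f(50.0, 50.0))   # inside -> value, outside -> nan
```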

    8.4.1 Advantages:

    Very fast algorithm

    Resulting surface is interpolative

    8.4.2 Disadvantages:

    The domain of the function f is limited to the convex envelope of the points XYZ.

The resulting surface is not smooth and the iso-lines consist of line segments.

The division into triangles may be ambiguous, as a simple example of two alternative divisions of a rectangle shows: in the first case a valley is created, in the second case a ridge.

    8.4.3 Application:

This method is still used in geodesy and in digital terrain models. As a rule, characteristic points of the terrain are measured; that is, the person performing the terrain survey measures only points where the slope of the terrain changes (tops, edges, valleys and so on) and thus avoids the above-mentioned ambiguity. For the interpretation of such data, the triangulation with linear interpolation method is quite suitable.

    8.5 Natural neighbour

    The Natural neighbour is an interpolation method based on Voronoi tessellation.

    Voronoi tessellation can be defined as the partitioning of a plane with n points into n

    convex polygons such that each polygon contains exactly one point and every point in

a given polygon is closer to its central point than to any other. In other words, if {X_i}, i = 1, ..., n, is a given set of points in ℝ², then the Voronoi polygon corresponding to the point X_i is the set

V_i = {X ∈ ℝ² : ‖X − X_i‖ ≤ ‖X − X_j‖ for all j ≠ i}

To interpolate at a new point X, this point is inserted into the tessellation, creating a new Voronoi cell that overlaps the cells of the neighbouring points A, B, C, D and E that were included in the tessellation. The weights of the points A, B, C, D and E, which are used to compute the interpolated value at X, are respectively the areas of the grey region intersecting each original cell of A, B, C, D and E; they are also known as the natural neighbour coordinates of X.

    Figure 8.3: New Voronoi cell and areas for computation of neighbour point weights.

    The surface formed by natural neighbour interpolation has the useful properties of

    being continuous (C0) everywhere and passing exactly through z values of all data

    points. Moreover, the interpolated surface is continuously differentiable (C1)

    everywhere except at the data points, providing smooth interpolation in contrast to the

    Triangulation with linear interpolation method.
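The sketch below shows only the first step of the method, building the Voronoi tessellation of the data points with scipy.spatial.Voronoi; computing the natural neighbour weights themselves (the areas that the new cell of a query point takes from the existing cells) would require a second tessellation and is not implemented here.

```python
import numpy as np
from scipy.spatial import Voronoi

# data point locations (illustrative); Voronoi builds the tessellation that
# underlies natural neighbour interpolation, one convex cell per data point
pts = np.array([[0.0, 0.0], [4.0, 1.0], [1.0, 5.0], [5.0, 4.0], [2.5, 2.5]])
vor = Voronoi(pts)

print(vor.point_region)      # index of the Voronoi region of each data point
print(vor.vertices)          # coordinates of the Voronoi cell vertices
# the natural neighbour weights of a query point would be the areas its new
# cell "steals" from these regions when the point is inserted
```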

    8.5.1 Advantages:

    Fast algorithm

    Resulting surface is interpolative and smooth except at the data points.

    8.5.2 Disadvantages:

    The domain of the function f is limited to the convex envelope of the points XYZ

    The shape of the resulting surface is not acceptable in some fields such as in geology

    or hydrogeology.

    8.5.3 Application:

The Natural neighbour method is mainly used in GIS systems as a digital terrain model and for fast interpolation of terrain data, providing a smooth surface.


    8.6 Inverse distance

This method computes a value of the function f at an arbitrary point (x, y) ∈ D as a weighted average of the values Z_i:

f(x, y) = Σ_{i=1}^{n} w_i Z_i / Σ_{i=1}^{n} w_i, where w_i = 1 / h_i², h_i = √((x − X_i)² + (y − Y_i)² + δ²) and

δ² is a smoothing parameter.

If the number of points n is too large, the value of f(x, y) is calculated only from points belonging to a specified circle surrounding the point (x, y). The method was frequently implemented in the early stages of computer development.
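A minimal NumPy sketch of this weighted average, assuming inverse squared-distance weights and the smoothing parameter δ² as above; the exponent and the data values are illustrative assumptions.

```python
import numpy as np

def inverse_distance(x, y, X, Y, Z, delta2=0.0, power=2):
    """Weighted average f(x, y) = sum(w_i Z_i) / sum(w_i) with
    w_i = 1 / h_i^power and h_i = sqrt((x - X_i)^2 + (y - Y_i)^2 + delta^2);
    delta2 is the smoothing parameter (0 gives interpolation)."""
    h2 = (x - X) ** 2 + (y - Y) ** 2 + delta2
    if delta2 == 0.0 and np.any(h2 == 0.0):
        return Z[np.argmin(h2)]          # query coincides with a data point
    w = 1.0 / np.sqrt(h2) ** power
    return np.sum(w * Z) / np.sum(w)

X = np.array([0.0, 10.0, 0.0, 10.0])     # illustrative data
Y = np.array([0.0, 0.0, 10.0, 10.0])
Z = np.array([1.0, 2.0, 0.5, 1.5])
print(inverse_distance(4.0, 5.0, X, Y, Z, delta2=0.0))
```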

    8.6.1 Advantages:

    Simple computer implementation; for its simplicity, the method is implemented in

    almost all gridding software packages

If δ² = 0, the method provides interpolation.

    8.6.2 Disadvantages:

    High computer time consumption if the number of points n is large (due to

    computation of distances)

Typical generation of "bull's-eyes" surrounding the positions of the data points within the domain D; that is why the resulting function is not acceptable for most applications.

    8.7 Minimum curvature method

This method, and in particular its computer implementation, was developed by Smith and Wessel (1990). The interpolated surface generated by the Minimum curvature method is analogous to a thin, linearly elastic plate passing through each of the data values with a minimum amount of bending. The algorithm of the Minimum curvature method is based on the numerical solution of the modified bi-harmonic differential equation

(1 − T) ∇⁴f(x, y) − T ∇²f(x, y) = 0

with three boundary conditions:

(1 − T) ∂²f/∂n² + T ∂f/∂n = 0 on the edges

∂(∇²f)/∂n = 0 on the edges

∂²f/∂x∂y = 0 at the corners

where

T ∈ [0, 1] is a tensioning parameter,

∇² is the Laplacian operator, ∇²f = ∂²f/∂x² + ∂²f/∂y²,

∇⁴ = (∇²)² is the bi-harmonic operator, ∇⁴f = ∂⁴f/∂x⁴ + ∂⁴f/∂y⁴ + 2 ∂⁴f/∂x²∂y², and

n is the boundary normal.

If T = 0, the bi-harmonic differential equation is solved; if T = 1, the Laplace differential equation is solved, and in this case the resulting surface may have local extremes only at the points XYZ.
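For illustration only, the sketch below treats the T = 1 limit (the Laplace equation) by simple Jacobi relaxation on a regular grid, holding the grid nodes that carry data values fixed; the full Smith and Wessel algorithm, with tension and the boundary conditions stated above, is considerably more involved than this simplification.

```python
import numpy as np

def laplace_grid(nx, ny, data, n_iter=2000):
    """Illustrative T = 1 limit only: relax the Laplace equation by Jacobi
    iteration on a regular grid, keeping grid nodes that carry data fixed.
    Boundary rows/columns simply stay at zero here (a crude simplification)."""
    f = np.zeros((ny, nx))
    fixed = np.zeros((ny, nx), dtype=bool)
    for i, j, z in data:                   # data given as (row, col, value)
        f[i, j], fixed[i, j] = z, True
    for _ in range(n_iter):
        avg = 0.25 * (f[2:, 1:-1] + f[:-2, 1:-1] + f[1:-1, 2:] + f[1:-1, :-2])
        f[1:-1, 1:-1] = np.where(fixed[1:-1, 1:-1], f[1:-1, 1:-1], avg)
    return f

grid = laplace_grid(20, 20, data=[(5, 5, 1.0), (14, 10, -1.0), (8, 16, 0.5)])
print(grid[10, 10])
```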

    8.7.1 Advantages:

The speed of computation is high, and an increasing number of points XYZ has little influence on the computational speed.

    Suitable method for a large number of points XYZ.

    8.7.2 Disadvantages:

    Complicated algorithm and computer implementation

If the parameter T is near zero, the resulting surface may have local extremes away from the point locations.

Poor ability to preserve extrapolation trends.

    8.7.3 Application:

    Universal method suitable for smooth approximation and interpolation (for example

    distribution of temperature, water heads, potential fields and so on).

    8.8 Regression by plane with weights

This method is based on regression by the plane f(x, y) = ax + by + c using a weighted least-squares fit. The weight w_i assigned to the point (X_i, Y_i, Z_i) is computed as an inverse distance from the point (x, y) to the point (X_i, Y_i). Then the minimum of the following function of the three independent variables has to be found:

F(a, b, c) = Σ_{i=1}^{n} w_i (a X_i + b Y_i + c − Z_i)²,

which leads to the solution of the three linear equations:

∂F/∂a = 0, ∂F/∂b = 0, ∂F/∂c = 0

After rearrangement the following equations are obtained:

a Σ w_i X_i² + b Σ w_i X_i Y_i + c Σ w_i X_i = Σ w_i X_i Z_i

a Σ w_i X_i Y_i + b Σ w_i Y_i² + c Σ w_i Y_i = Σ w_i Y_i Z_i

a Σ w_i X_i + b Σ w_i Y_i + c Σ w_i = Σ w_i Z_i

In addition to the regression by plane, some mapping packages offer the possibility of using polynomials of higher order.
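A hedged NumPy sketch of the weighted plane fit: scaling the design matrix by √w_i and solving the least-squares problem yields the same normal equations as minimizing F(a, b, c) above; the small offset eps that guards against a zero distance, and the data values, are assumptions of this sketch.

```python
import numpy as np

def weighted_plane(X, Y, Z, x, y, eps=1e-12):
    """Fit f(x, y) = a x + b y + c by weighted least squares, with weights
    w_i = 1 / distance((x, y), (X_i, Y_i)); eps avoids division by zero."""
    w = 1.0 / (np.hypot(X - x, Y - y) + eps)
    A = np.column_stack([X, Y, np.ones_like(X)])
    # solving (sqrt(w) A) p = sqrt(w) Z in the least-squares sense gives the
    # same normal equations as minimizing sum w_i (a X_i + b Y_i + c - Z_i)^2
    a, b, c = np.linalg.lstsq(np.sqrt(w)[:, None] * A, np.sqrt(w) * Z,
                              rcond=None)[0]
    return a * x + b * y + c

X = np.array([0.0, 10.0, 0.0, 10.0])      # illustrative data
Y = np.array([0.0, 0.0, 10.0, 10.0])
Z = np.array([1.0, 2.0, 0.5, 1.5])
print(weighted_plane(X, Y, Z, x=4.0, y=5.0))
```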

    8.8.1 Advantages:

    Simple algorithm

    Good extrapolation properties

    8.8.2 Disadvantages:

    Resulting function is only approximate

Slow speed of computation if n is large (due to the computation of distances)

    8.8.3 Application:

Surface reconstruction from digitized contour lines. The method was frequently used in the past, particularly when contour maps were transferred from paper sheets to digital maps.

    8.9 Radial basis functions

The method of Radial basis functions uses the interpolation function in the form:

f(x, y) = p(x, y) + Σ_{i=1}^{n} w_i φ(|(x, y) − (X_i, Y_i)|)

where

p(x, y) is a polynomial,

w_i are real weights,

|(x, y) − (X_i, Y_i)| is the Euclidean distance between the points (x, y) and (X_i, Y_i), and

φ(r) is a radial basis function.

Commonly used radial basis functions are (c² is the smoothing parameter):

Multiquadric: φ(r) = √(r² + c²)

Multilog: φ(r) = log(r² + c²)

Natural cubic spline: φ(r) = (r² + c²)^(3/2)

Thin plate spline: φ(r) = (r² + c²) log(r² + c²)


The interpolation process starts with polynomial regression using the polynomial p(x, y). Then the following system of n linear equations is solved for the unknown weights w_i, i = 1, ..., n:

Z_j − p(X_j, Y_j) = Σ_{i=1}^{n} w_i φ(|(X_j, Y_j) − (X_i, Y_i)|),  j = 1, ..., n

As soon as the weights w_i are determined, the z-value of the surface can be computed directly from the equation above at any point (x, y) ∈ D.
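A minimal sketch of this procedure for the multiquadric basis, taking the polynomial p(x, y) as simply the mean of the data values (a simplification of the polynomial regression step described above); the data and the smoothing parameter are illustrative.

```python
import numpy as np

def rbf_multiquadric_fit(X, Y, Z, c2=1.0):
    """Solve the linear system above for the weights w_i, using the
    multiquadric basis phi(r) = sqrt(r^2 + c^2); p(x, y) is taken here as
    the constant mean of Z for simplicity (an assumption of this sketch)."""
    p = Z.mean()
    r = np.hypot(X[:, None] - X[None, :], Y[:, None] - Y[None, :])
    Phi = np.sqrt(r ** 2 + c2)             # n x n matrix of phi(|u_j - u_i|)
    w = np.linalg.solve(Phi, Z - p)        # Z_j - p = sum_i w_i phi(...)
    return p, w

def rbf_evaluate(x, y, X, Y, p, w, c2=1.0):
    r = np.hypot(x - X, y - Y)
    return p + np.sum(w * np.sqrt(r ** 2 + c2))

X = np.array([0.0, 10.0, 0.0, 10.0])       # illustrative data
Y = np.array([0.0, 0.0, 10.0, 10.0])
Z = np.array([1.0, 2.0, 0.5, 1.5])
p, w = rbf_multiquadric_fit(X, Y, Z, c2=4.0)
print(rbf_evaluate(4.0, 5.0, X, Y, p, w, c2=4.0))
```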

    8.9.1 Advantages:

Simple computer implementation; the system of linear equations has to be solved only once (in contrast to the Kriging method, where a system of linear equations must be solved for each grid node; see section 8.2).

    The resulting function is interpolative

    Easy implementation of smoothing

    8.9.2 Disadvantages:

    If the number of points n is large, the number of linear equations is also large;

    moreover the matrix of the system is not sparse, which leads to a long computational

    time and possibly to the propagation of rounding errors. That is why this method, as

    presented, is used for solving small problems with up to a few thousand points.

    Solving large problems is also possible, but requires an additional algorithm for

    searching points in the specified surrounding of each grid node.

    8.9.3 Application:

    Universal method suitable for use in any field.