

ORIGINAL ARTICLE

Application of genetic algorithm-based support vector machines for prediction of soil liquefaction

Xinhua Xue 1,2 • Ming Xiao 2

Received: 2 February 2016 / Accepted: 20 April 2016 / Published online: 11 May 2016

© Springer-Verlag Berlin Heidelberg 2016

Abstract This paper presents a hybrid technique that combines a genetic algorithm (GA) with a support vector machine (SVM) to predict soil liquefaction potential. The GA selects the optimal values of the kernel function parameter and the penalty parameter of the SVM model to improve the forecasting accuracy. The database used in this study includes 109 CPT-based field observations from five major earthquakes between 1964 and 1983. Seven parameters, namely the cone resistance, total vertical stress, effective vertical stress, mean grain size, normalized peak horizontal acceleration at ground surface, cyclic stress ratio, and earthquake magnitude, were used as inputs, while the liquefaction potential was the output. The predictions from the GA-SVM model were compared with those from three other methods: an SVM tuned by grid search (GS-SVM), an artificial neural network (ANN) model, and the C4.5 decision tree approach. The overall classification success rates for the entire dataset predicted by the GA-SVM, ANN, C4.5 decision tree, and GS-SVM models are 97.25, 97.2, 96.3, and 92.66 %, respectively. The study concludes that the proposed GA-SVM model improves the classification accuracy and is a feasible method for predicting soil liquefaction.

Keywords Soil liquefaction · Support vector machine · Genetic algorithm · Grid search · Liquefaction · Classification accuracy

Introduction

Soil liquefaction and related ground failures are commonly associated with large earthquakes. Liquefaction is defined as the transformation of a granular material from a solid to a liquefied state as a consequence of increased pore-water pressure and reduced effective stress (Marcuson 1978). During ground shaking, the contraction of pore spaces in loose to medium-compact granular soils squeezes the pore water; when the pore water cannot easily drain, the pore-water pressure increases significantly, thus reducing the effective stress. If the shear resistance of the soil becomes less than the static shear stress, the soil can undergo large deformations and is said to liquefy (Seed and Idriss 1967, 1971). Liquefaction may cause the reduction or loss of bearing capacity, large settlement, and horizontal displacement due to lateral spreading of liquefied soils. It can manifest as sand boils or lateral spreading of surficial soils. Examples of this type of damage were observed in historical earthquakes such as the 1964 Niigata, the 1964 Alaska, the 1971 San Fernando, the 1985 Mexico City, the 1994 Northridge, the 1995 Kobe, the 1999 Taiwan, the 1999 Turkey, the 2010 Baja California, and the 2011 Tōhoku earthquakes. Therefore, assessing the liquefaction potential of a site under earthquake loading is an essential task in geotechnical earthquake engineering.

There are different methods available for determining the liquefaction potential of soil (e.g., Seed et al. 1983; Skempton 1986; Robertson and Campanella 1985; Shibata and Teparaksa 1988; Juang et al. 2000; Zhang et al. 2004; Jha and Suzuki 2009; Samui 2013; Guettaya and Ouni 2014; Pour and Asakereh 2015). Most of these methods depend on limit states that separate the non-liquefaction region from the liquefaction region and were established empirically using in situ field observations from sites where test data are available (Mahesh 2006). Among them, the standard penetration test (SPT), cone penetration test (CPT), flat dilatometer test (DMT), shear wave velocity technique (SWV), and self-boring pressuremeter test (SBPT) are the most commonly used in situ tests for liquefaction potential prediction. However, the high uncertainty in earthquake environments and soil characteristics makes it difficult to choose a suitable empirical equation for regression analysis (Samui 2013). Consequently, many scholars and engineers have attempted to develop analytical models that are simpler, easier to implement, and more accurate than traditional empirical equations for soil liquefaction analysis (Pal 2006; Lee and Chern 2013).

Xinhua Xue [email protected]

1 State Key Laboratory of Hydraulics and Mountain River Engineering, College of Water Resource and Hydropower, Sichuan University, No. 24 South Section 1, Yihuan Road, Chengdu 610065, Sichuan, People's Republic of China

2 Department of Civil and Environmental Engineering, The Pennsylvania State University, University Park, PA 16802, USA

Environ Earth Sci (2016) 75:874
DOI 10.1007/s12665-016-5673-7

In recent years, artificial neural network (ANN) models have been widely used to predict liquefaction potential as a classification problem (Sami and Patrick 2005; Ramakrishnan et al. 2008; Chern and Lee 2009; Mughieda et al. 2009; Samui and Sitharam 2011; Maria 2011; Farrokhzad et al. 2012; Mert 2013). Although the ANN was found to be more efficient than statistical methods, it has several inherent drawbacks, such as overfitting, slow convergence, poor generalization performance, and convergence to local minima. The support vector machine (SVM), developed by Vapnik (1995), is a machine-learning algorithm. SVM classifies data with different class labels by determining a set of support vectors, which are members of the set of training inputs that outline a hyperplane in the feature space. SVM provides a generic mechanism that fits the hyperplane surface to the training data using a kernel function. However, SVM parameters such as the penalty parameter C and the kernel function parameters must be optimized to improve the classification accuracy. The grid search (GS) method is commonly used for parameter selection in SVM. However, this method is prone to being trapped at local optima because GS is limited to the parameter values set initially (Keerthi and Lin 2003). In contrast, the genetic algorithm (GA) is a search algorithm based on the mechanics of natural selection (Kumar and Naresh 2007), and it outperforms GS in terms of both accuracy and time efficiency. Owing to its promising performance, GA has been successfully used in various fields (e.g., Cevik and Cabalar 2009; Ding et al. 2011; Karegowda et al. 2011; Marghany 2013, 2014, 2015; Alemdag et al. 2016). However, few publications compare the performance of GS and GA in optimizing the SVM model for the prediction of soil liquefaction potential. Therefore, the objective of this research is to develop a GA-SVM model for the prediction of soil liquefaction and to compare its performance with that of three other methods: the GS-SVM, ANN, and C4.5 decision tree approaches.

Introduction of support vector machines

Consider a training sample set $\{(x_i, y_i)\}$, $i = 1, \ldots, N$, where $N$ is the total number of samples. It is assumed that the samples can be classified into two classes, namely the triangle class and the five-point star class (Fig. 1). Labels $y_i = -1$ and $y_i = +1$ are associated with the triangle class and the five-point star class, respectively. H in Fig. 1 is the line separating the two classes, and the dashed lines H1 and H2 pass through the samples nearest to it. The distance between H1 and H2 is called the margin, and the nearest data points used to define the margin are called support vectors. The separating hyperplane H that has the maximum distance to the nearest data points is called the optimal separating hyperplane.

As Fig. 1 shows, the hyperplane that separates the given data can be expressed as

$$ w^{T} x + b = 0 \tag{1} $$

where $b$ is the bias, $w$ is an adaptive weight vector, $x$ is an input vector, and $w^{T}x$ is the inner product of $w$ and $x$. For linearly separable classes, a separating hyperplane for the two classes can be defined as

$$ \begin{cases} w^{T} x_i + b \ge 1 & \text{if } y_i = 1 \\ w^{T} x_i + b \le -1 & \text{if } y_i = -1 \end{cases} \tag{2} $$

or, combined into a single inequality,

$$ y_i\left(w^{T} x_i + b\right) - 1 \ge 0 \tag{3} $$

The goal of the SVM is to find the $b$ and $w$ of the optimal separating hyperplane, which maximize the margin $2/\lVert w \rVert$. Therefore, the hyperplane that optimally separates the data is the one that minimizes $\lVert w \rVert$. The optimal separating hyperplane can be obtained by solving the following convex quadratic optimization problem (Vapnik 1995):

$$ \text{Minimize} \quad \frac{1}{2} w^{T} w = \frac{1}{2}\lVert w \rVert^{2} \tag{4} $$

$$ \text{Subject to} \quad y_i\left(w^{T} x_i + b\right) - 1 \ge 0, \quad i = 1, 2, \ldots, N \tag{5} $$

Fig. 1 Hyper-plane classification

The above problem can be transformed into the equivalent Lagrangian dual problem (Lee and Chern 2013):

$$ L(w, b, \alpha) = \frac{1}{2} w^{T} w - \sum_{i=1}^{N} \alpha_i \left[ y_i\left(w^{T} x_i + b\right) - 1 \right] \tag{6} $$

where $\alpha_i \ge 0$ ($i = 1, 2, \ldots, N$) are the Lagrange multipliers. The goal is to find the $w$ and $b$ that minimize Eq. (6) and the $\alpha$ that maximizes it. This can be done by differentiating $L$ with respect to $w$ and $b$ and setting the derivatives to zero (Lee and Chern 2013):

$$ \frac{\partial L(w, b, \alpha)}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{N} \alpha_i y_i x_i \tag{7a} $$

$$ \frac{\partial L(w, b, \alpha)}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{N} \alpha_i y_i = 0 \tag{7b} $$

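Multiplying Eq. (7a) by $w^{T}$ leads to

$$ w^{T} w = \sum_{i=1}^{N} \alpha_i y_i w^{T} x_i = \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j x_i^{T} x_j \qquad (i = 1, 2, \ldots, N;\; j = 1, 2, \ldots, N) \tag{8} $$

Based on Eqs. (7a), (7b), and (8), Eq. (6) can then be reformulated as follows:

$$ L(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j x_i^{T} x_j \tag{9} $$

$$ \text{Subject to} \quad \begin{cases} \sum_{i=1}^{N} \alpha_i y_i = 0 \\ \alpha_i \ge 0, \quad i = 1, 2, \ldots, N \end{cases} \tag{10} $$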
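For readers who want the intermediate step between Eq. (6) and Eq. (9), the substitution can be spelled out as below. This is the standard expansion written in the notation of Eqs. (6) to (8); it is added here as a reading aid and is not reproduced from the cited references.

```latex
\begin{aligned}
L(w,b,\alpha)
  &= \tfrac{1}{2} w^{T} w
     - \sum_{i=1}^{N} \alpha_i y_i\, w^{T} x_i
     - b \sum_{i=1}^{N} \alpha_i y_i
     + \sum_{i=1}^{N} \alpha_i \\
  &= \tfrac{1}{2} w^{T} w - w^{T} w - 0 + \sum_{i=1}^{N} \alpha_i
     && \text{by Eqs. (7a) and (7b)} \\
  &= \sum_{i=1}^{N} \alpha_i
     - \tfrac{1}{2} \sum_{i=1}^{N}\sum_{j=1}^{N}
       \alpha_i \alpha_j y_i y_j\, x_i^{T} x_j
     && \text{by Eq. (8)}
\end{aligned}
```

The weight vector and bias drop out, so the dual depends on the training data only through the inner products $x_i^{T} x_j$.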

This is a convex quadratic optimization problem, and its solution $\alpha_i^{*}$ determines the parameters $w^{*}$ and $b^{*}$ of the optimal hyperplane. Thus, the decision function of the optimal hyperplane can be written as (Lee and Chern 2013)

$$ f(x) = \operatorname{sign}\left( w^{*T} x + b^{*} \right) = \operatorname{sign}\left( \sum_{i=1}^{N} \alpha_i^{*} y_i x_i^{T} x + b^{*} \right) \tag{11} $$

where sign is the signum function. If the result is positive, $x$ is classified as class 1; otherwise, it is classified as class 2.

However, because most classification problems are not linearly separable, Cortes and Vapnik (1995) introduced a positive slack variable $\xi$ and a penalty parameter $C$ to reduce the number of misclassifications. Thus, Eq. (4) can be changed as follows:

$$ \text{Minimize} \quad \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{N} \xi_i \tag{12} $$

$$ \text{Subject to} \quad \begin{cases} y_i\left(w^{T} x_i + b\right) \ge 1 - \xi_i \\ \xi_i \ge 0, \quad i = 1, 2, \ldots, N \end{cases} \tag{13} $$

where the slack variable $\xi$ accounts for the effect of misclassification and the penalty parameter $C$ defines the trade-off between the number of misclassifications in the training data and margin maximization.

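As above, the Lagrange function for this problem can be written as

$$ L(w, b, \xi, \alpha, \beta) = \frac{1}{2} w^{T} w + C \sum_{i=1}^{N} \xi_i - \sum_{i=1}^{N} \alpha_i \left[ y_i\left(w^{T} x_i + b\right) - 1 + \xi_i \right] - \sum_{i=1}^{N} \beta_i \xi_i \tag{14} $$

$$ \text{Subject to} \quad \alpha_i, \beta_i \ge 0 \quad (i = 1, 2, \ldots, N) \tag{15} $$

where $\alpha_i$ and $\beta_i$ are the Lagrange multipliers. Differentiating $L$ with respect to $w$, $b$, and $\xi$ and setting the derivatives to zero leads to

$$ \frac{\partial L(w, b, \xi, \alpha, \beta)}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{N} \alpha_i y_i x_i \tag{16a} $$

$$ \frac{\partial L(w, b, \xi, \alpha, \beta)}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{N} \alpha_i y_i = 0 \tag{16b} $$

$$ \frac{\partial L(w, b, \xi, \alpha, \beta)}{\partial \xi_i} = 0 \;\Rightarrow\; C - \alpha_i - \beta_i = 0 \tag{16c} $$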

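Based on Eqs. (16a), (16b), and (16c), Eq. (14) can then be reformulated as follows:

$$ L(w, b, \xi, \alpha, \beta) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j x_i^{T} x_j \tag{17} $$

$$ \text{Subject to} \quad \begin{cases} \sum_{i=1}^{N} \alpha_i y_i = 0 \\ 0 \le \alpha_i \le C, \quad i = 1, 2, \ldots, N \end{cases} \tag{18} $$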

This is almost the same dual problem as before; the only difference is that the multipliers $\alpha_i$ now have an upper bound: since Eq. (16c) gives $\beta_i = C - \alpha_i$ and $\beta_i \ge 0$, it follows that $\alpha_i \le C$. For nonlinear problems, the optimization problem can be solved by mapping the data into a high-dimensional feature space. In this feature space, the inner product of the linear problem can be replaced by a kernel function, i.e., $K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$. Therefore, the decision function can be written as

$$ f(x) = \sum_{i=1}^{N} y_i \alpha_i K(x_i, x) + b \tag{19} $$

Several kernel functions are used in SVM, such as the linear, polynomial, radial basis function (RBF), and sigmoid kernels. Dibike et al. (2001) demonstrated that the RBF outperformed other kernel functions when different kernels were used in SVM for rainfall-runoff modeling. Therefore, the RBF is adopted in this study and expressed as

$$ K(x_i, x_j) = \exp\left( -\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}} \right) \tag{20} $$

where $\sigma$ is the width of the radial basis function.
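As a rough illustration of Eqs. (19) and (20), the sketch below evaluates the RBF kernel and the kernel expansion of the decision function in plain Python. The function names and the toy support vectors, multipliers, and bias are hypothetical values chosen for the example; they are not quantities from this study.

```python
import math

def rbf_kernel(xi, xj, sigma):
    """RBF kernel of Eq. (20): exp(-||xi - xj||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def decision_value(x, support_vectors, labels, alphas, b, sigma):
    """Kernel expansion of Eq. (19): sum_i y_i alpha_i K(x_i, x) + b."""
    return sum(y * a * rbf_kernel(sv, x, sigma)
               for sv, y, a in zip(support_vectors, labels, alphas)) + b

# Hypothetical toy model with two support vectors (illustrative values only).
svs = [(0.0, 0.0), (1.0, 1.0)]
ys = [-1, +1]
alphas = [1.0, 1.0]
value = decision_value((0.9, 0.8), svs, ys, alphas, b=0.0, sigma=1.0)
predicted = 1 if value > 0 else -1  # the sign of the decision value gives the class
```

A smaller $\sigma$ makes the kernel decay faster with distance, so each support vector influences a narrower neighbourhood.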

GA-based parameter optimization method

The parameters of the kernel function and the penalty parameter ($C$) of the SVM model greatly influence the prediction results. However, SVM theory does not provide a method for selecting these parameters. In this study, a genetic algorithm (GA) was used to select the best-matched kernel parameter and penalty parameter.

Genetic algorithm (GA)

The genetic algorithm (GA) is a search algorithm based on the mechanics of natural selection and derived from the theory of natural evolution. GA simulates the mechanisms of population genetics and the natural rules of survival in pursuit of adaptation. A typical GA starts with an initial set of random solutions called the population, and each individual in the population is called a chromosome. A chromosome is usually, but not necessarily, a binary string and represents a solution to the problem. Chromosomes evolve through successive iterations, called generations. During each generation, the chromosomes are evaluated using some measure of fitness. To create the next generation, new chromosomes, called offspring, are formed by either (1) merging two chromosomes from the current generation using a crossover operator or (2) modifying a chromosome using a mutation operator. A new generation is formed by (1) selecting some of the parents and offspring according to their fitness values and (2) rejecting the rest so that the population size is kept constant. In this process, fitter chromosomes have a higher chance of being selected. After several generations, the algorithm converges to the best set of chromosomes, which ideally represents the optimal or a near-optimal solution to the problem.

SVM optimized by GA (GA-SVM)

To design an effective classifier, the parameters of the SVM model have to be configured properly in advance. The proposed GA-based SVM parameter optimization approach is described in this section.

Chromosome design

In this study, the RBF kernel function is adopted because of its promising performance, and thus only two parameters, $C$ and $\sigma$, need to be optimized by the proposed GA-based method. Therefore, the chromosome comprises two parameters, $C$ and $\sigma$. Figure 2 shows the chromosome model formed by two binary blocks: (1) the first block, which includes $C_i$ with $1 \le i \le n_c$, is the binary representation of the parameter $C$ in $n_c$ bits; (2) the second block, which includes $\sigma_j$ with $1 \le j \le n_\sigma$, is the binary representation of the parameter $\sigma$ in $n_\sigma$ bits.

All the SVM parameters are numerical values, so the binary representations of $C$ and $\sigma$ must be converted to decimal values. This is done at each chromosome fitness evaluation in order to configure the classifier, and was performed here using Eq. (21) (Huang and Wang 2006):

$$ p = \mathrm{min}_p + \frac{\mathrm{max}_p - \mathrm{min}_p}{2^{l} - 1} \times d \tag{21} $$

where $p$ denotes the phenotype parameter, $\mathrm{min}_p$ and $\mathrm{max}_p$ are the minimum and maximum values of parameter $p$, respectively, $d$ is the decimal value of the bit string, and $l$ is the number of bits used in the binary representation of the parameter.
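The genotype-to-phenotype conversion of Eq. (21) can be sketched in a few lines of Python. The helper names, the 8-bit block length, and the search ranges for $C$ and $\sigma$ below are assumptions made for illustration; the values used in the study are not stated in this section.

```python
def decode(bits, min_p, max_p):
    """Map a bit string to a parameter value following Eq. (21)."""
    n_bits = len(bits)            # l in Eq. (21)
    d = int(bits, 2)              # decimal value of the bit string
    return min_p + (max_p - min_p) * d / (2 ** n_bits - 1)

def decode_chromosome(chromosome, n_c, c_range, sigma_range):
    """Split a chromosome into its C block (first n_c bits) and its sigma block."""
    c = decode(chromosome[:n_c], *c_range)
    sigma = decode(chromosome[n_c:], *sigma_range)
    return c, sigma

# Hypothetical bit lengths and ranges, chosen only for this example.
c, sigma = decode_chromosome("1111111100000000", n_c=8,
                             c_range=(0.0, 100.0), sigma_range=(0.01, 10.0))
```

An all-ones block decodes to the upper bound and an all-zeros block to the lower bound, so every chromosome maps to a value inside the chosen range.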

Fig. 2 Chromosome comprises two parameters, C and σ (after Marghany 2014)

Fitness function

The fitness function is an objective function that estimates the quality of each chromosome. In the GA-based SVM parameter optimization process, the most difficult task is to design a fitness function that yields reliable and effective SVM parameters. K-fold cross-validation (CV) is a widely used technique for assessing the generalization ability of an SVM classifier. One option is therefore to use the cross-validation accuracy (CVA), rather than the plain classification accuracy, as the fitness function in the genetic algorithm. A higher CVA indicates greater generalization ability of the SVM classifier on the given training data. Here, the fitness value of a chromosome is the average accuracy of a fivefold cross-validation computed by the SVM classifier on the training set. The corresponding function is defined as

$$ f_i = \frac{1}{n} \sum_{i=1}^{n} \mathrm{SVM}_{\mathrm{CVA}}[i] \tag{22} $$

where $\mathrm{SVM}_{\mathrm{CVA}}[i]$ denotes the cross-validation accuracy of fold $i$ calculated by the SVM model and $n$ is the total number of folds used in cross-validation. Here $n$ is equal to 5.
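A minimal sketch of the fitness computation in Eq. (22) is given below. To stay self-contained it uses a simple threshold classifier as a stand-in for the SVM, contiguous unshuffled folds, and made-up one-feature data; all three are assumptions for illustration, and a real fitness evaluation would train the SVM with the decoded $C$ and $\sigma$ instead.

```python
def kfold_indices(n_samples, n_folds):
    """Split range(n_samples) into n_folds contiguous, nearly equal folds."""
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for k in range(n_folds):
        size = base + (1 if k < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cv_fitness(X, y, fit_predict, n_folds=5):
    """Eq. (22): mean accuracy over the folds of a k-fold cross-validation."""
    accuracies = []
    for test_idx in kfold_indices(len(X), n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        preds = fit_predict([X[i] for i in train_idx],
                            [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        hits = sum(p == y[i] for p, i in zip(preds, test_idx))
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / n_folds

def threshold_classifier(X_train, y_train, X_test):
    """Stand-in for the SVM: cut one feature at the midpoint of the class means."""
    mean0 = sum(x for x, t in zip(X_train, y_train) if t == 0) / y_train.count(0)
    mean1 = sum(x for x, t in zip(X_train, y_train) if t == 1) / y_train.count(1)
    cut = (mean0 + mean1) / 2.0
    return [1 if (x > cut) == (mean1 > mean0) else 0 for x in X_test]

# Made-up data, interleaved so that every fold contains both classes.
X = [1.0, 9.0, 2.0, 8.0, 1.5, 8.5, 2.5, 9.5, 0.5, 7.5]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
fitness = cv_fitness(X, y, threshold_classifier, n_folds=5)
```

Because the fitness is computed only from held-out folds of the training set, the testing dataset plays no part in choosing the parameters.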

Genetic operators

Figure 3 illustrates the genetic operators of crossover and mutation. Crossover is the critical genetic operator that allows new regions of the search space to be explored; it is performed by selecting a random gene along the length of the chromosomes and swapping all genes after that point (Marghany 2015). In mutation, genes may occasionally be altered, i.e., binary-coded genes can change from 1 to 0 or vice versa.
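The two operators can be sketched as follows for bit-string chromosomes. The function names, the fixed random seed, and the mutation rate of 0.1 are assumptions for the example; the operator settings used in the study are not given in this section.

```python
import random

def single_point_crossover(parent_a, parent_b, rng):
    """Pick a random cut point and swap all genes after it."""
    point = rng.randint(1, len(parent_a) - 1)
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

def bit_flip_mutation(chromosome, rate, rng):
    """Flip each bit (0 <-> 1) independently with probability `rate`."""
    return "".join(
        ("1" if gene == "0" else "0") if rng.random() < rate else gene
        for gene in chromosome
    )

rng = random.Random(0)  # fixed seed so the example is repeatable
child_a, child_b = single_point_crossover("00000000", "11111111", rng)
mutant = bit_flip_mutation(child_a, rate=0.1, rng=rng)
```

Crossover recombines material that already exists in the population, while mutation is the only operator that can introduce a bit value missing from every parent at a given position.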

GA-SVM approach

The main steps of the proposed GA-SVM approach are as follows:

1. Data preprocessing. Normalize the datasets according to Eq. (23):

   $$ y = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{23} $$

   where $y$ is the normalized input parameter, $x$ is the original input parameter, and $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of that parameter, respectively.

2. Initialization. Generate the initial population, in which each individual comprises $C$ and $\sigma$, then convert each chromosome from its genotype into its phenotype.

3. Fitness evaluation. For each chromosome representing $C$ and $\sigma$, the training dataset is used to train the SVM classifier and calculate the cross-validation accuracy, while the testing dataset is used to validate the model. Once the cross-validation accuracy is obtained, each chromosome is evaluated by the fitness function, Eq. (22).

4. Termination criteria. When the termination criteria are satisfied, the process ends; otherwise, it proceeds with the next generation.

5. Genetic operation. In this step, the system searches for better solutions by genetic operations, including selection, crossover, and mutation.
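Step 1 of the list above, the min-max scaling of Eq. (23), can be sketched per input column as below. The function name and the handling of a constant column are assumptions; the sample values are the cone resistances of cases 1 to 5 in Table 1.

```python
def min_max_normalize(column):
    """Scale a list of values to [0, 1] following Eq. (23)."""
    x_min, x_max = min(column), max(column)
    if x_max == x_min:
        # Constant column: Eq. (23) is undefined, so return zeros by convention
        # (an assumption of this sketch, not something the paper specifies).
        return [0.0 for _ in column]
    return [(x - x_min) / (x_max - x_min) for x in column]

# Cone resistance q_c (MPa) of cases 1 to 5 in Table 1.
qc = [3.20, 1.6, 7.2, 5.6, 5.45]
qc_scaled = min_max_normalize(qc)
```

Scaling every input to the same interval keeps parameters with large numerical ranges, such as the vertical stresses in kPa, from dominating the distance term in the RBF kernel.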

The proposed GA-SVM approach was implemented in MATLAB R2012b. The flowchart of the GA-SVM algorithm is shown in Fig. 4. This approach is applied to soil liquefaction prediction in the case study.
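The study's implementation is in MATLAB and is not reproduced here. As a language-neutral illustration of how steps 2 to 5 fit together, the following Python sketch runs a small GA over ($C$, $\sigma$). Everything specific in it is an assumption: a toy fitness function replaces the fivefold cross-validation accuracy, selection is by binary tournament, termination is a fixed number of generations, and the bit length, population size, rates, and search bounds are illustrative.

```python
import random

def decode(bits, lo, hi):
    """Eq. (21): map a bit string to a value in [lo, hi]."""
    return lo + (hi - lo) * int(bits, 2) / (2 ** len(bits) - 1)

def toy_fitness(c, sigma):
    """Stand-in for the fivefold CV accuracy; peaks at C = 10, sigma = 1."""
    return 1.0 / (1.0 + (c - 10.0) ** 2 + (sigma - 1.0) ** 2)

def tournament(scored, rng):
    """Binary tournament: the fitter of two random individuals is selected."""
    a, b = rng.sample(scored, 2)
    return a[1] if a[0] >= b[0] else b[1]

def mutate(chrom, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return "".join(("1" if g == "0" else "0") if rng.random() < rate else g
                   for g in chrom)

def run_ga(fitness, n_bits=10, pop_size=20, generations=40,
           crossover_rate=0.8, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    c_range, s_range = (0.1, 100.0), (0.01, 10.0)  # assumed search bounds

    def score(chrom):
        c = decode(chrom[:n_bits], *c_range)
        sigma = decode(chrom[n_bits:], *s_range)
        return fitness(c, sigma), c, sigma

    # Step 2: random initial population of bit strings (C block + sigma block).
    pop = ["".join(rng.choice("01") for _ in range(2 * n_bits))
           for _ in range(pop_size)]
    best = max((score(ch) for ch in pop), key=lambda t: t[0])

    for _ in range(generations):                     # Step 4: generation budget
        scored = [(score(ch)[0], ch) for ch in pop]  # Step 3: fitness evaluation
        nxt = []
        while len(nxt) < pop_size:                   # Step 5: genetic operations
            p1, p2 = tournament(scored, rng), tournament(scored, rng)
            if rng.random() < crossover_rate:
                cut = rng.randint(1, 2 * n_bits - 1)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            nxt.extend(mutate(ch, mutation_rate, rng) for ch in (p1, p2))
        pop = nxt[:pop_size]
        best = max([best] + [score(ch) for ch in pop], key=lambda t: t[0])
    return best  # (fitness, C, sigma) of the best chromosome seen so far

best_fitness, best_c, best_sigma = run_ga(toy_fitness)
```

Keeping track of the best chromosome seen so far means the reported result never gets worse from one generation to the next, even though the population itself is fully replaced.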

Fig. 3 Genetic crossover and mutation operation

Fig. 4 Flowchart of the GA-SVM model

GS-based parameter optimization method

To verify the effectiveness of the proposed GA-SVM model, this study also adopted the grid search (GS) method to optimize the parameters of the SVM model. GS is a common method for searching for the best $C$ and $\sigma$. In the grid search algorithm, pairs of ($C$, $\sigma$) are tried and the one with the best cross-validation accuracy is chosen. After a promising region of the grid has been identified, a finer grid search can be conducted in that region.

The GS-SVM approach was implemented in MATLAB R2012b. The flowchart of the GS-SVM model is shown in Fig. 5.
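The coarse-then-fine procedure described above can be sketched as follows. The power-of-two grid, the exponent ranges, the step sizes, and the toy score function standing in for cross-validation accuracy are all assumptions for illustration; the ranges actually used in the study are those listed in its Table 2.

```python
import math

def grid_search(score, c_values, sigma_values):
    """Try every (C, sigma) pair and keep the one with the highest score."""
    best = None
    for c in c_values:
        for sigma in sigma_values:
            s = score(c, sigma)
            if best is None or s > best[0]:
                best = (s, c, sigma)
    return best

def log2_grid(lo_exp, hi_exp, step):
    """Values 2**e for e = lo_exp, lo_exp + step, ..., hi_exp."""
    n = int(round((hi_exp - lo_exp) / step))
    return [2.0 ** (lo_exp + k * step) for k in range(n + 1)]

def toy_score(c, sigma):
    """Stand-in for cross-validation accuracy; peaks at C = 4, sigma = 2."""
    return 1.0 / (1.0 + (c - 4.0) ** 2 + (sigma - 2.0) ** 2)

# Coarse pass: exponents -5 to 5 in steps of 1 (assumed ranges).
coarse = grid_search(toy_score, log2_grid(-5, 5, 1), log2_grid(-5, 5, 1))

# Fine pass: one exponent either side of the coarse optimum, in steps of 0.25.
ec, es = math.log2(coarse[1]), math.log2(coarse[2])
fine = grid_search(toy_score,
                   log2_grid(ec - 1, ec + 1, 0.25),
                   log2_grid(es - 1, es + 1, 0.25))
```

The result can only be as good as the grid allows: a parameter pair lying between grid points, or outside the initial ranges, is never evaluated, which is the limitation of GS noted in the Introduction.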

Case study

Database

The susceptibility of soil deposits to liquefaction is determined by a combination of factors, such as soil properties, geological conditions, and ground motion characteristics. The determination of liquefaction potential is therefore widely recognized as a complex geotechnical engineering problem. The soil properties and geological conditions determine the resistance of the deposit to liquefaction, while earthquake characteristics control the seismic loading conditions. The general criteria for soil liquefaction are: (1) the subsoil is a loose sandy deposit, (2) the subsoil is saturated, and (3) the earthquake intensity is sufficiently large and the duration of shaking is sufficiently long. Accordingly, the following factors are chosen as the evaluating indices: cone resistance ($q_c$), total vertical stress ($\sigma_0$), effective vertical stress ($\sigma'_0$), mean grain size ($D_{50}$), normalized peak horizontal acceleration at ground surface ($a_{\max}$), cyclic stress ratio ($\tau/\sigma'_0$), and earthquake magnitude ($M_w$).

The database used in this study includes 109 CPT-based field observations from five major earthquakes between 1964 and 1983 (Goh 1996), and it covers a wide range of parameter values. The data consist of 16 case records from Japan, 79 from China, 9 from the USA, and 5 from Romania, representing 74 sites that liquefied and 35 sites that did not liquefy (Table 1). To the best of the authors' knowledge, there is currently no established criterion for dividing data into training and testing sets, and the choice is arbitrary. For instance, Kurup and Dudani (2002) used 63 % of the data for training, Tang et al. (2005) used 75 %, and Padmini et al. (2008) used 80 %. In this study, 74 cases are used as the training dataset, and the other 35 cases are used as the testing dataset. For each case, "1" represents liquefaction and "0" denotes no liquefaction.

Results and discussion

Figure 6 shows that the best ($C$, $\sigma$) is (12.5152, 1.0616), with a cross-validation rate of 97.29 %, after training the GA-SVM model. These optimal parameters are then used to generate the final GA-SVM classifier, which is evaluated on the testing dataset. Of the 35 testing cases, only one was misclassified, and the overall classification accuracy for the entire dataset is 97.25 %.

To further verify the effectiveness of the proposed GA-SVM model, this study compares it with the GS-SVM model, the artificial neural network (ANN) model (Goh 1996), and the C4.5 decision tree model (Ardakani and Kohestni 2015). Since the bounds of the GS-SVM parameters are not known a priori, and a complete grid search may be time-consuming, a coarse grid search is performed first to find the best region of the parameters, followed by a fine grid search in that region. The parameter ranges used for the coarse and fine grid searches are given in Table 2.

Fig. 5 Flowchart of the GS-SVM model

    Table 1 Dataset used in case study (1: liquefaction; 0: no liquefaction) [data from Goh (1996)]

    No.  Mw  σ'0 (kPa)  σ0 (kPa)  qc (MPa)  amax (g)  τ/σ'0  D50 (mm)  Actual  Predicted by GA-SVM

    1 7.5 53 36 3.20 0.16 0.15 0.331 1 1

    2 7.5 87 52 1.6 0.16 0.16 0.331 1 1

    3 7.5 99 58 7.2 0.16 0.17 0.331 1 1

    4 7.5 152 83 5.6 0.16 0.17 0.331 1 1

    5 7.5 91 63 5.45 0.16 0.14 0.331 1 1

    6 7.5 127 80 8.84 0.16 0.15 0.331 1 1

    7 7.5 211 120 9.70 0.16 0.15 0.331 1 1

    8 7.5 86 46 8.0 0.16 0.19 0.30 0 0

    9 7.5 95 50 14.55 0.16 0.18 0.30 0 0

    10 7.7 58 48 10.0 0.23 0.18 0.32 0 0

    11 7.7 73 54 16.0 0.23 0.20 0.32 0 0

    12 7.7 96 6 15.38 0.23 0.21 0.32 0 0

    13 7.7 54 46 1.79 0.23 0.17 0.32 1 1

    14 7.7 64 52 4.10 0.23 0.19 0.32 1 1

    15 7.7 96 67 7.96 0.23 0.21 0.32 1 1

    16 7.7 114 75 8.97 0.23 0.22 0.32 1 1

    17 7.8 57 42 1.70 0.40 0.35 0.06 1 1

    18 7.8 114 69 9.40 0.40 0.41 0.25 1 1

    19 7.8 148 85 5.70 0.40 0.42 0.25 1 1

    20 7.8 162 92 7.60 0.40 0.42 0.30 1 1

    21 7.8 17 17 1.5 0.40 0.27 0.17 1 1

    22 7.8 25 21 1.0 0.40 0.32 0.17 1 1

    23 7.8 34 25 5.0 0.40 0.36 0.17 1 1

    24 7.8 38 34 2.50 0.40 0.29 0.14 1 1

    25 7.8 57 43 2.60 0.40 0.34 0.14 1 1

    26 7.8 76 52 3.20 0.40 0.37 0.16 1 1

    27 7.8 89 58 5.80 0.40 0.9 0.16 1 1

    28 7.8 122 74 3.50 0.40 0.40 0.16 1 1

    29 7.8 181 102 8.40 0.40 0.41 0.16 1 1

    30 7.8 38 29 1.70 0.40 0.35 0.12 1 1

    31 7.8 40 29 3.0 0.40 0.36 0.12 1 1

    32 7.8 51 35 4.10 0.40 0.38 0.12 1 1

    33 7.8 29 27 5.50 0.40 0.29 0.17 1 1

    34 7.8 57 40 9.00 0.40 0.37 0.32 1 1

    35 7.8 23 21 7.00 0.40 0.29 0.48 1 1

    36 7.8 34 26 1.18 0.40 0.35 0.48 1 1

    37 7.8 48 33 4.24 0.40 0.38 0.48 1 1

    38 7.8 76 71 11.47 0.40 0.27 0.16 0 0

    39 7.8 160 111 15.76 0.40 0.34 0.20 0 0

    40 7.8 59 56 11.39 0.20 0.14 0.21 0 0

    41 7.8 78 65 12.12 0.20 0.15 0.21 0 0

    42 7.8 99 75 17.76 0.20 0.17 0.14 0 0

    43 7.8 74 49 2.65 0.20 0.19 0.14 1 1

    44 7.8 53 35 4.40 0.20 0.20 0.16 1 1

    45 7.8 61 39 3.00 0.20 0.20 0.16 1 1

    46 7.8 156 81 9.00 0.20 0.23 0.08 1 1

    47 7.8 99 55 2.00 0.10 0.11 0.14 1 0

    48 7.8 95 52 1.10 0.20 0.23 0.07 1 0


    49 7.8 209 106 15.50 0.10 0.11 0.08 0 0

    50 7.8 217 110 6.50 0.10 0.11 0.08 0 0

    51 7.8 91 53 9.00 0.10 0.11 0.10 0 0

    52 7.8 101 58 2.50 0.10 0.11 0.10 0 0

    53 7.8 112 63 16.50 0.10 0.11 0.10 0 0

    54 7.8 91 68 13.65 0.10 0.06 0.25 0 0

    55 7.8 114 58 8.47 0.20 0.24 0.062 0 0

    56 7.8 228 112 4.55 0.20 0.23 0.067 0 0

    57 7.8 249 122 5.79 0.20 0.22 0.067 0 0

    58 7.8 121 55 2.48 0.20 0.25 0.062 1 1

    59 7.8 114 56 1.57 0.20 0.25 0.062 1 1

    60 7.8 213 103 1.4 0.20 0.23 0.67 1 1

    61 7.8 220 106 2.15 0.20 0.23 0.67 1 1

    62 7.8 230 111 2.60 0.20 0.23 0.67 1 1

    63 7.8 213 103 2.73 0.20 0.23 0.67 1 1

    64 7.8 219 106 1.78 0.20 0.23 0.67 1 1

    65 7.8 211 108 7.64 0.20 0.22 0.67 0 0

    66 6.6 29 29 25.60 0.80 0.44 0.11 0 0

    67 6.6 48 36 24.70 0.80 0.57 0.11 0 0

    68 6.6 64 42 31.40 0.80 0.64 0.11 0 0

    69 6.6 29 29 1.43 0.80 0.44 0.11 1 1

    70 6.6 64 42 2.48 0.80 0.64 0.11 1 1

    71 6.6 96 54 4.03 0.80 0.72 0.11 1 1

    72 6.6 29 29 3.30 0.80 0.44 0.06 0 0

    73 6.6 48 36 8.80 0.80 0.57 0.06 0 0

    74 6.6 64 42 6.70 0.80 0.64 0.06 0 0

    75 7.8 89 78 1.65 0.20 0.41 0.17 1 1

    76 7.8 99 83 3.65 0.20 0.15 0.17 1 1

    77 7.8 29 25 1.03 0.20 0.15 0.19 1 1

    78 7.8 55 37 5.00 0.20 0.19 0.31 1 1

    79 7.8 76 47 2.91 0.20 0.21 0.18 1 1

    80 7.8 105 61 6.06 0.20 0.21 0.18 1 1

    81 7.8 23 22 13.24 0.20 0.14 0.17 0 0

    82 7.8 32 26 13.06 0.20 0.16 0.17 0 0

    83 7.8 40 30 16.59 0.20 0.18 0.17 0 0

    84 7.8 59 59 10.59 0.20 0.13 0.26 0 0

    85 7.8 63 61 9.12 0.20 0.13 0.26 0 0

    86 7.8 76 67 11.29 0.20 0.15 0.26 0 0

    87 7.8 70 41 1.94 0.20 0.22 0.16 1 1

    88 7.8 76 44 5.00 0.20 0.22 0.16 1 1

    89 7.8 70 47 2.24 0.20 0.19 0.14 1 1

    90 7.8 114 79 14.12 0.20 0.09 0.25 0 0

    91 7.8 162 102 18.94 0.20 0.09 0.28 0 0

    92 7.8 44 44 3.52 0.20 0.13 0.16 1 1

    93 7.8 59 51 2.73 0.20 0.15 0.16 1 1

    94 7.8 57 50 3.29 0.20 0.15 0.21 1 1

    95 7.8 61 52 4.12 0.20 0.15 0.21 1 1

    96 7.8 72 57 2.94 0.20 0.16 0.21 1 1


  • After the GS-SVM procedure, the optimal parameters with maximal classification accuracy are selected. As shown in Figs. 7 and 8, the best (C, σ) is (3.7321, 2.0), with a cross-validation rate of 91.89 %. These optimal parameters are then used to generate the final GS-SVM classifier, which is evaluated on the testing data. Of the 35 testing cases, only two were misclassified (a testing success rate of 94.29 %), and the overall classification accuracy for the entire dataset is 92.66 %.
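The two-step grid search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the coarse and fine ranges are read from Table 2 as 2^-10 to 2^10 and 2^-2 to 2^2, the exponent step sizes (1.0 and 0.1) are assumed, and the helper names `power_of_two_grid`, `grid_search`, and `toy_score` are hypothetical. In practice `toy_score` would be replaced by the cross-validated accuracy of an SVM trained with the given (C, σ).

```python
import itertools
import math


def power_of_two_grid(lo_exp, hi_exp, step):
    """Return [2**lo_exp, ..., 2**hi_exp] with exponents spaced by `step`."""
    n = int(round((hi_exp - lo_exp) / step))
    return [2.0 ** (lo_exp + i * step) for i in range(n + 1)]


def grid_search(score, c_values, sigma_values):
    """Exhaustively score every (C, sigma) pair and return the best pair."""
    return max(itertools.product(c_values, sigma_values),
               key=lambda pair: score(*pair))


def toy_score(c, sigma):
    """Hypothetical stand-in for cross-validated SVM accuracy.

    It peaks at (C, sigma) = (2**1.9, 2**1.0) only so that the search has
    something to find; it is not derived from the liquefaction data.
    """
    return -((math.log2(c) - 1.9) ** 2 + (math.log2(sigma) - 1.0) ** 2)


# Step 1: coarse pass over the wide range (assumed exponent step of 1.0).
coarse = power_of_two_grid(-10, 10, 1.0)
coarse_c, coarse_sigma = grid_search(toy_score, coarse, coarse)

# Step 2: fine pass over the narrow range (assumed exponent step of 0.1).
fine = power_of_two_grid(-2, 2, 0.1)
best_c, best_sigma = grid_search(toy_score, fine, fine)

print(f"coarse best: C = {coarse_c:.4f}, sigma = {coarse_sigma:.4f}")
print(f"fine best:   C = {best_c:.4f}, sigma = {best_sigma:.4f}")
```

With an exponent step of 0.1, the fine grid contains 2^1.9 ≈ 3.7321, which coincides with the reported best C. This is consistent with, but does not confirm, the step size used in the study.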

    Table 1 continued

    No. Mw σ′0 (kPa) σ0 (kPa) qc (MPa) amax (g) τ/σ′0 D50 (mm) Actual Predicted by GA-SVM

    97 7.8 61 52 3.00 0.20 0.15 0.15 1 1

    98 7.8 95 68 5.85 0.20 0.18 0.32 1 1

    99 7.8 106 73 9.00 0.20 0.18 0.32 1 1

    100 7.8 49 48 1.88 0.20 0.13 0.13 1 1

    101 7.8 74 64 2.55 0.20 0.15 0.17 1 1

    102 7.8 76 65 4.50 0.20 0.15 0.17 1 1

    103 7.8 106 79 4.24 0.20 0.17 0.17 1 1

    104 7.8 114 77 8.00 0.20 0.18 0.22 0 1

    105 7.2 80 48 5.22 0.22 0.21 0.20 1 1

    106 7.2 95 55 3.73 0.22 0.22 0.20 1 1

    107 7.2 114 64 3.11 0.22 0.22 0.20 1 1

    108 7.2 133 73 1.32 0.22 0.22 0.20 1 1

    109 7.2 152 82 5.22 0.22 0.22 0.20 1 1

    Fig. 6 Fitness curve for GA-SVM to find the best parameters

    Table 2 Coarse and fine range partitions of GS-SVM parameters taken during two-step optimization

    SVM parameters Coarse range partitions Fine range partitions

    C [2^-10, 2^10] [2^-2, 2^2]

    σ [2^-10, 2^10] [2^-2, 2^2]

    The prediction results of GA-SVM are compared with those of GS-SVM, ANN (Goh 1996), and the C4.5 decision tree (Ardakani and Kohestni 2015) in Table 3. The classification success rate of the GA-SVM model for the training data is 97.29 %, which is higher than the rates of the C4.5 decision tree (95.9 %) and GS-SVM (91.89 %) models and slightly lower than the rate of the ANN (98.6 %) reported by Goh (1996). However, the overall classification success rate of GA-SVM for the entire dataset is 97.25 %, while the overall rates of the ANN, C4.5 decision tree, and GS-SVM models are 97.20, 96.30, and 92.66 %, respectively, so GA-SVM achieves the highest overall success rate among the four methods. Moreover, the ANN requires many parameters to be set, such as the number of hidden layers, number of hidden nodes, learning rate, momentum term, number of training epochs, and weight initialization method. The decision tree method is quite transparent and requires neither optimization nor internal parameters; however, it has a limited domain of applicability and is mostly case dependent: its generalization is limited, and it is only applicable within the range of the training data. The case study on soil liquefaction prediction shows that the proposed GA-SVM model achieves better overall performance than the GS-SVM, ANN, and C4.5 decision tree approaches.
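The success rates quoted for the two SVM models can be checked with simple arithmetic. The sketch below assumes 109 cases in total and 35 testing cases (both stated in the paper), derives the training set size as 109 − 35 = 74, and uses correct-prediction counts that are inferred here from the reported percentages rather than quoted from the source. The function name `success_rate` is hypothetical.

```python
def success_rate(n_correct, n_total):
    """Classification success rate in percent."""
    return 100.0 * n_correct / n_total


N_TOTAL = 109                # cases in the database (stated in the paper)
N_TEST = 35                  # testing cases (stated in the paper)
N_TRAIN = N_TOTAL - N_TEST   # 74, derived here rather than quoted

# Assumption: correct-prediction counts inferred from the reported
# percentages; they are not listed explicitly in the text.
correct = {
    "GA-SVM": {"train": 72, "test": 34},
    "GS-SVM": {"train": 68, "test": 33},
}

for name, c in correct.items():
    overall = success_rate(c["train"] + c["test"], N_TOTAL)
    print(f"{name}: train {success_rate(c['train'], N_TRAIN):.2f} %, "
          f"test {success_rate(c['test'], N_TEST):.2f} %, "
          f"overall {overall:.2f} %")
```

Under these inferred counts, GA-SVM misclassifies three of the 109 cases (106/109 ≈ 97.25 %) and GS-SVM misclassifies eight (101/109 ≈ 92.66 %). The GA-SVM training rate of 72/74 ≈ 97.297 % appears in the paper as 97.29 %.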

    Conclusions

    Determining liquefaction potential is a complex geotechnical engineering problem because of the heterogeneous nature of soils and the large number of factors that affect the occurrence of earthquake-induced liquefaction. This study presents a hybrid model that combines SVM and GA to improve forecasting performance: GA is employed to select the optimal values of the kernel function parameter and the penalty parameter of the SVM model. The developed GA-SVM approach was compared with the GS-SVM, ANN, and C4.5 decision tree approaches. The prediction results demonstrate that the overall classification accuracy of the GA-SVM approach (97.25 %) surpasses those of the other methods, and the GA-SVM model can be used as a reliable approach for soil liquefaction prediction.
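The idea of letting a GA choose the penalty parameter C and the kernel parameter σ can be illustrated with a small real-coded GA. This is only a sketch under stated assumptions and not the authors' implementation: the chromosome encoding (log2 C, log2 σ), the tournament selection, arithmetic crossover, Gaussian mutation, elitism, and all rates and bounds are choices made here for illustration, and `toy_fitness` is a hypothetical stand-in for the cross-validated SVM accuracy that would serve as the fitness in practice.

```python
import random


def toy_fitness(log2_c, log2_sigma):
    """Hypothetical stand-in for cross-validated SVM accuracy (higher is better)."""
    return -((log2_c - 1.5) ** 2 + (log2_sigma + 0.5) ** 2)


def run_ga(fitness, bounds=(-10.0, 10.0), pop_size=20, generations=40,
           crossover_rate=0.8, mutation_rate=0.1, seed=0):
    """Real-coded GA over chromosomes (log2 C, log2 sigma) with elitism."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [(rng.uniform(lo, hi), rng.uniform(lo, hi)) for _ in range(pop_size)]
    history = []  # best fitness found in each generation

    def tournament():
        # Binary tournament: keep the fitter of two random individuals.
        a, b = rng.sample(pop, 2)
        return a if fitness(*a) >= fitness(*b) else b

    for _ in range(generations):
        best = max(pop, key=lambda ch: fitness(*ch))
        history.append(fitness(*best))
        children = [best]  # elitism: the current best always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                w = rng.random()  # arithmetic (blend) crossover weight
                child = tuple(w * x + (1 - w) * y for x, y in zip(p1, p2))
            else:
                child = p1
            # Gaussian mutation per gene, clamped to the search bounds.
            child = tuple(
                min(hi, max(lo, g + rng.gauss(0.0, 1.0)))
                if rng.random() < mutation_rate else g
                for g in child
            )
            children.append(child)
        pop = children

    best = max(pop, key=lambda ch: fitness(*ch))
    history.append(fitness(*best))
    return best, history


best, history = run_ga(toy_fitness)
best_c, best_sigma = 2.0 ** best[0], 2.0 ** best[1]
print(f"best C = {best_c:.4f}, best sigma = {best_sigma:.4f}")
```

Because the best individual is copied unchanged into each new generation, the recorded best fitness never decreases, which is the typical shape of a GA fitness curve.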

    Fig. 7 Parameters C and σ versus the accuracy rate in two dimensions

    Fig. 8 Parameters C and σ versus the accuracy rate in three dimensions

    Table 3 Comparison of model predictions of GA-SVM with GS-SVM, ANN, and C4.5 decision tree (performance in terms of successful prediction, %)

    Methods Training Testing Overall

    ANN (Goh 1996) 98.6 94.3 97.20

    C4.5 decision tree (Ardakani and Kohestni 2015) 95.9 97.1 96.30

    GA-SVM 97.29 97.14 97.25

    GS-SVM 91.89 94.29 92.66


  • References

    Alemdag S, Gurocak Z, Cevik A, Cabalar AF, Gokceoglu C (2016)

    Modeling deformation modulus of a stratified sedimentary rock

    mass using neural network, fuzzy inference and genetic

    programming. Eng Geol 203:70–82

    Ardakani A, Kohestni VR (2015) Evaluation of liquefaction potential

    based on CPT results using C4.5 decision tree. J AI Data Mining

    3(1):85–92

    Cevik A, Cabalar AF (2009) Modelling damping ratio and shear

    modulus of sand-mica mixtures using genetic programming.

    Expert Syst Appl 36(4):7749–7757

    Chern SG, Lee CY (2009) CPT-based simplified liquefaction

    assessment by using fuzzy-neural network. J Mar Sci Technol

    17(4):326–331

    Cortes C, Vapnik V (1995) Support-vector network. Mach Learn

    20:273–297

    Dibike YB, Velickov S, Solomatine D, Abbott MB (2001) Model

    induction with support vector machines: introduction and

    applications. J Comput Civ Eng 15(3):208–216

    Ding SF, Su CY, Yu JZ (2011) An optimizing BP neural network

    algorithm based on genetic algorithm. Artif Intell Rev

    36:153–162

    Farrokhzad F, Choobbasti AJ, Barari A (2012) Liquefaction microzonation

    of Babol city using artificial neural network. J King

    Saud Univ Sci 24(1):89–100

    Goh AT (1996) Neural-network modeling of CPT seismic liquefaction

    data. J Geotech Eng 122(1):70–73

    Guettaya I, Ouni MREL (2014) In situ-based assessment of soil

    liquefaction potential-case study of an earth dam in Tunisia.

    Front Struct Civ Eng 8(4):456–461

    Huang CL, Wang CJ (2006) A GA-based feature selection and

    parameters optimization for support vector machines. Expert

    Syst Appl 31:231–240

    Jha SK, Suzuki K (2009) Reliability analysis of soil liquefaction

    based on standard penetration test. Comput Geotech

    36(4):589–596

    Juang CH, Chen CJ, Jiang T, Andrus RD (2000) Risk-based

    liquefaction potential evaluation using standard penetration tests.

    Can Geotech J 37:1195–1208

    Karegowda AG, Manjunath AS, Jayaram MA (2011) Application of

    genetic algorithm optimized neural network connection weights

    for medical diagnosis of Pima Indians diabetes. Int J Soft

    Comput 2(2):15–22

    Keerthi SS, Lin CJ (2003) Asymptotic behaviors of support vector

    machines with Gaussian kernel. Neural Comput 15(7):1667–1689

    Kumar S, Naresh R (2007) Efficient real coded genetic algorithm to

    solve the non-convex hydrothermal scheduling problem. Int J

    Electr Power Energy Syst 29(10):738–747

    Kurup PU, Dudani NK (2002) Neural networks for profiling stress

    history of clays from PCPT data. J Geotech Geoenviron

    128(4):569–579

    Lee CY, Chern SG (2013) Application of a support vector machine

    for liquefaction assessment. J Mar Sci Technol 21(3):318–324

    Mahesh P (2006) Support vector machines-based modelling of

    seismic liquefaction potential. Int J Numer Anal Methods

    Geomech 30:983–996

    Marcuson WF III (1978) Definition of terms related to liquefaction.

    J Geotech Eng Div ASCE 104(9):1197–1200

    Marghany M (2013) Genetic algorithm for oil spill automatic

    detection from Envisat Satellite Data. In: Computational science

    and its applications—ICCSA 2013. Springer, Berlin, pp 587–598

    Marghany M (2014) Multi-objective evolutionary algorithm for oil

    spill detection from COSMO-SkeyMed Satellite. In:

    Computational science and its applications—ICCSA 2014.

    Springer, Berlin, pp 355–371

    Marghany M (2015) Automatic detection of oil spills in the Gulf of

    Mexico from RADARSAT-2 SAR satellite data. Environ Earth

    Sci 74:5935–5947

    Maria JS (2011) Applying artificial neural networks for analysis of

    geotechnical problems. Comput Assist Mech Eng Sci

    18:231–241

    Mert T (2013) A comparative study on computer aided liquefaction

    analysis methods. Int J Housing Sci 37(2):121–135

    Mughieda O, Bani HK, Safieh B (2009) Liquefaction assessment by

    artificial neural networks based on CPT. Int J Geotech Eng

    2:289–302

    Padmini D, Ilamparuthi K, Sudheer KP (2008) Ultimate bearing

    capacity prediction of shallow foundations on cohesionless soils

    using neurofuzzy models. Comput Geotech 35:33–46

    Pal M (2006) Support vector machines-based modeling of seismic

    liquefaction potential. Int J Numer Anal Methods Geomech

    30:983–996

    Pour MN, Asakereh A (2015) A comparison between two field

    methods of evaluation of liquefaction potential in the Bandar

    Abbas city. Am J Civ Eng 3(2-2):1–5

    Ramakrishnan D, Singh TN, Purwar N, Badre KS, Gulati A, Gupta S

    (2008) Artificial neural network and liquefaction susceptibility

    assessment: a case study using the 2001 Bhuj earthquake data,

    Gujarat, India. Comput Geosci 12:491–501

    Robertson PK, Campanella RG (1985) Liquefaction potential of sands

    using the cone penetration test. J Geotech Eng Div ASCE

    111(3):384–403

    Sami M, de Patrick B (2005) Minimum principle and related

    numerical scheme for simulating initial flow and subsequent

    propagation of liquefied ground. Int J Numer Anal Methods

    Geomech 29:1065–1086

    Samui P (2013) Liquefaction prediction using support vector machine

    model based on cone penetration data. Front Struct Civ Eng

    7(1):72–82

    Samui P, Sitharam TG (2011) Machine learning modelling for

    predicting soil liquefaction susceptibility. Nat Hazards Earth

    Syst Sci 11:1–9

    Seed HB, Idriss IM (1967) Analysis of soil liquefaction: Niigata

    earthquake. J Soil Mech Foun Div ASCE 93(3):83–108

    Seed HB, Idriss IM (1971) Simplified procedure for evaluating soil

    liquefaction potential. J Soil Mech Foun Div ASCE

    97(9):1249–1273

    Seed HB, Idriss IM, Arango I (1983) Evaluation of liquefaction

    potential using field performance data. J Geotech Eng Div ASCE

    109(3):458–482

    Shibata T, Teparaksa W (1988) Evaluation of liquefaction potentials

    of soils using cone penetration tests. Soils Found 28(2):49–60

    Skempton AW (1986) Standard penetration test procedures and the

    effects in sands of overburden pressure, relative density, particle

    size, aging and overconsolidation. Géotechnique 36(3):425–447

    Tang Y, Zang YQ, Huang G, Hu X (2005) Granular SVM-RFE gene

    selection algorithm for reliable prostate cancer classification on

    microarray expression data. In: Proceedings of the 5th IEEE

    symposium on bioinformatics and bioengineering (BIBE’05)

    Vapnik V (1995) The nature of statistical learning theory. Springer,

    New York

    Zhang G, Robertson PK, Brachman RWI (2004) Estimating liquefaction-induced

    lateral displacements using the standard penetration

    test or cone penetration test. J Geotech Geoenviron Eng

    130(8):861–871
