ORIGINAL ARTICLE
Application of genetic algorithm-based support vector machines for prediction of soil liquefaction
Xinhua Xue1,2 • Ming Xiao2
Received: 2 February 2016 / Accepted: 20 April 2016 / Published online: 11 May 2016
© Springer-Verlag Berlin Heidelberg 2016
Abstract This paper presents a hybrid technique that combines a genetic algorithm (GA) with a support vector machine (SVM) to predict the potential of soil liquefaction. The GA is employed to select the optimal values of the kernel function parameter and the penalty parameter of the SVM model to improve the forecasting accuracy. The database used in this study includes 109 CPT-based field observations from five major earthquakes between 1964 and 1983. Several important parameters, including the cone resistance, total vertical stress, effective vertical stress, mean grain size, normalized peak horizontal acceleration at ground surface, cyclic stress ratio, and earthquake magnitude, were used as the input parameters, while the potential of soil liquefaction was the output parameter. The predictions from the GA-SVM model were compared with those from three methods: grid search (GS) method, artificial neural network (ANN) model, and C4.5 decision tree approach. The overall classification success rates for the entire dataset predicted by GA-SVM, ANN, C4.5 decision tree, and GS-SVM models are 97.25, 97.2, 96.3, and 92.66 %, respectively. The study concluded that the proposed GA-SVM model improves the classification accuracy and is a feasible method for predicting soil liquefaction.
Keywords Soil liquefaction · Support vector machine · Genetic algorithm · Grid search · Liquefaction · Classification accuracy
Introduction
Soil liquefaction and related ground failures are commonly
associated with large earthquakes. Liquefaction is defined
as the transformation of a granular material from a solid to
a liquefied state as a consequence of increased pore-water
pressure and reduced effective stress (Marcuson 1978).
During ground shaking, shrinkage of pore spaces of loose
to medium-compact granular soils squeezes the pore water;
when the pore water cannot easily drain, the pore-water
pressure significantly increases, thus reducing the effective
stress. If the shear resistance of the soil becomes less than
the static shear stress, the soil can undergo large defor-
mations and is said to liquefy (Seed and Idriss 1967, 1971).
Liquefaction may cause the reduction or loss of bearing
capacity, large settlement, and horizontal displacement due
to lateral spreads of liquefied soils. Liquefaction can manifest as sand boils or lateral spreading of surficial soils. Examples of this type of damage were observed in historical earthquakes, such as the 1964 Niigata, the 1964 Alaska, the 1971 San Fernando, the 1985 Mexico City, the 1994 Northridge, the 1995 Kobe, the 1999 Taiwan, the 1999 Turkey, the 2010 Baja California, and the 2011 Tōhoku earthquakes. Therefore, the assessment of
liquefaction potential due to an earthquake at a site is an
imperative task in geotechnical earthquake engineering.
Corresponding author: Xinhua Xue
1 State Key Laboratory of Hydraulics and Mountain River Engineering, College of Water Resource and Hydropower, Sichuan University, No. 24 South Section 1, Yihuan Road, Chengdu 610065, Sichuan, People's Republic of China
2 Department of Civil and Environmental Engineering, The Pennsylvania State University, University Park, PA 16802, USA
Environ Earth Sci (2016) 75:874
DOI 10.1007/s12665-016-5673-7

There are different methods available for determination of liquefaction potential of soil (e.g., Seed et al. 1983; Skempton 1986; Robertson and Campanella 1985; Shibata and Teparaksa 1988; Juang et al. 2000; Zhang et al. 2004; Jha and Suzuki 2009; Samui 2013; Guettaya and Ouni 2014; Pour and Asakereh 2015). Most of these methods
depend on some limit states that separate the non-lique-
faction region from the liquefaction region and were
established empirically using in situ field observations
from sites where test data are available (Mahesh 2006).
Among them, standard penetration tests (SPT), cone pen-
etration tests (CPT), flat dilatometer tests (DMT), shear
wave velocity technique (SWV), and self-boring pressure
meter test (SBPT) are the most commonly used in situ tests
for liquefaction potential prediction. However, the high uncertainty in earthquake environments and soil characteristics makes it difficult to choose a suitable empirical equation for regression analysis (Samui 2013). Consequently, many scholars and engineers have attempted to
develop scientifically derived analytical models that are
simpler, easier to implement, and more accurate than tra-
ditional empirical equations for soil liquefaction analysis
(Pal 2006; Lee and Chern 2013).
In recent years, artificial neural network (ANN) models have been widely used for prediction of liquefaction potential as a classification problem (Sami and Patrick
2005; Ramakrishnan et al. 2008; Chern and Lee 2009;
Mughieda et al. 2009; Samui and Sitharam 2011; Maria
2011; Farrokhzad et al. 2012; Mert 2013). Although ANNs were found to be more efficient than statistical methods, they have several inherent drawbacks such as overfitting, slow convergence, poor generalization performance, and convergence to local minima. Support vector machine
(SVM), developed by Vapnik (1995), is a machine-learning
algorithm. SVM classifies data with different class labels
by determining a set of support vectors that are members of
a set of training inputs that outline a hyperplane in the
feature space. SVM provides a generic mechanism that fits
the hyperplane surface to the training data using a kernel
function. However, it is necessary to optimize the SVM
parameters such as the penalty parameter C and the kernel
function parameters in order to improve the SVM classification accuracy. The grid search (GS) method is commonly used for parameter selection in SVM. However, this method is prone to being trapped at locally optimal points because GS is limited to the parameter values set initially (Keerthi and Lin 2003). The genetic algorithm (GA), by contrast, is a search algorithm based upon the mechanics of natural selection (Kumar and Naresh 2007), and it outperforms GS in terms of both accuracy and time efficiency. Due to its promising performance, GA has
been successfully used in various fields (e.g., Cevik and
Cabalar 2009; Ding et al. 2011; Karegowda et al. 2011;
Marghany 2013, 2014, 2015; Alemdag et al. 2016). How-
ever, few publications are available on the performance
comparison of GS and GA in optimizing the SVM model
for the prediction of soil liquefaction potential. Therefore,
the research objective is to develop a GA-SVM model for
the prediction of soil liquefaction and compare the per-
formances of the GA-SVM model with three other meth-
ods: GS-SVM, ANN, and C4.5 decision tree approaches.
Introduction of support vector machines
Consider a training sample set $\{(x_i, y_i)\}$, $i = 1$ to $N$, where $N$ is the total number of samples. It is assumed that the samples can be classified into two classes, namely the triangle class and the five-point star class (Fig. 1). Labels $y_i = -1$ and $y_i = +1$ are associated with the triangle class and the five-point star class, respectively. H in Fig. 1 is a labeling line
separating the two samples, and the H1 and H2 dashed
lines pass through the nearest samples to the labeling line.
The distance between H1 and H2 is called the margin, and
the nearest data points used to define the margin are called
support vectors. The separating hyperplane H that has the
maximum distance between the nearest data is called the
optimal separating hyperplane.
As Fig. 1 shows, the hyperplane that separates the given
data can be expressed as
$$w^{T}x + b = 0 \tag{1}$$
where b is the bias, w is an adaptive weight vector, x is an
input vector, and wTx is an inner product of w and x. For
the linearly separable class, a separating hyperplane for the
two classes can be defined as
$$\begin{cases} w^{T}x_i + b \ge 1 & \text{if } y_i = 1 \\ w^{T}x_i + b \le -1 & \text{if } y_i = -1 \end{cases} \tag{2}$$
or it can be presented in the following complete equation:
$$y_i\left(w^{T}x_i + b\right) - 1 \ge 0 \tag{3}$$
Fig. 1 Hyper-plane classification

The goal of the SVM is to find $b$ and $w$ for the optimal separating hyperplane to maximize the margin $2/\lVert w\rVert$. Therefore, the hyperplane that optimally separates the data is the one that minimizes $\lVert w\rVert$. The optimal separating hyperplane can be obtained by solving the following convex quadratic optimization problem (Vapnik 1995):
$$\text{Minimize}\quad \frac{1}{2}w^{T}w = \frac{1}{2}\lVert w\rVert^{2} \tag{4}$$
$$\text{Subject to}\quad y_i\left(w^{T}x_i + b\right) - 1 \ge 0,\quad i = 1, 2, \ldots, N \tag{5}$$
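As a small numerical check of Eqs. (4) and (5), the following Python sketch computes the margin and tests the constraints for one candidate hyperplane. The function names, the toy samples, and the values of w and b are hypothetical illustrations and are not taken from the dataset of this study.

```python
import math

def margin_width(w):
    """Return the margin 2 / ||w|| for a weight vector w."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return 2.0 / norm

def satisfies_constraints(w, b, samples):
    """Check y_i (w^T x_i + b) - 1 >= 0 for every (x_i, y_i) pair (Eq. 5)."""
    return all(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) - 1 >= 0
        for x, y in samples
    )

# Hypothetical toy data: two points per class in two dimensions.
samples = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((0.0, 0.0), -1), ((-1.0, 0.0), -1)]
w, b = (0.5, 0.5), -1.0  # assumed candidate hyperplane

print(satisfies_constraints(w, b, samples))
print(margin_width(w))
```

For this toy hyperplane the points (2, 2) and (0, 0) lie exactly on the dashed lines H1 and H2, so they play the role of support vectors.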
The above problem can be transformed into the equivalent Lagrangian dual problem as (Lee and Chern 2013)
$$L(w, b, \alpha) = \frac{1}{2}w^{T}w - \sum_{i=1}^{N}\alpha_i\left[y_i\left(w^{T}x_i + b\right) - 1\right] \tag{6}$$
where $\alpha_i \ge 0$ ($i = 1, 2, \ldots, N$) are the Lagrange multipliers. The main goal here is to find the $w$ and $b$ that minimize Eq. (6) and the $\alpha$ that maximizes it. This can be done by differentiating $L$ with respect to $w$ and $b$, and setting the derivatives to zero (Lee and Chern 2013):
$$\frac{\partial L(w, b, \alpha)}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{N}\alpha_i y_i x_i \tag{7a}$$
$$\frac{\partial L(w, b, \alpha)}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{N}\alpha_i y_i = 0 \tag{7b}$$
Multiplying Eq. (7a) by $w^{T}$ leads to
$$w^{T}w = \sum_{i=1}^{N}\alpha_i y_i w^{T}x_i = \sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j y_i y_j x_i^{T}x_j \quad (i = 1, 2, \ldots, N;\; j = 1, 2, \ldots, N) \tag{8}$$
Based on Eqs. (7a), (7b), and (8), Eq. (6) can then be
reformulated as follows:
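$$L(\alpha) = \sum_{i=1}^{N}\alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j y_i y_j x_i^{T}x_j \tag{9}$$
$$\text{Subject to}\quad \begin{cases} \sum_{i=1}^{N}\alpha_i y_i = 0 \\ \alpha_i \ge 0, \quad i = 1, 2, \ldots, N \end{cases} \tag{10}$$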
It can be seen that this is a convex quadratic optimization problem, and it requires a solution $\alpha_i^{*}$ to determine the parameters $w^{*}$ and $b^{*}$ for the optimal hyperplane. Thus, the optimal hyperplane decision function can be written as (Lee and Chern 2013)
$$f(x) = \operatorname{sign}\left(w^{*T}x + b^{*}\right) = \operatorname{sign}\left(\sum_{i=1}^{N}\alpha_i^{*} y_i x_i^{T}x + b^{*}\right) \tag{11}$$
where sign is the signum function. If the result is positive, $x$ is classified as class 1; otherwise, it is classified as class 2.
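The decision rule of Eq. (11) can be sketched in Python as follows. The function name `decide`, the support-vector triples, and the bias value are hypothetical and chosen only so that the arithmetic is easy to follow; only samples with a nonzero multiplier contribute to the sum.

```python
def decide(x, support, b):
    """Return +1 or -1 from sign(sum_i alpha_i * y_i * x_i^T x + b).

    support is a list of (alpha_i, y_i, x_i) triples with alpha_i > 0.
    """
    total = b
    for alpha, y, xv in support:
        inner = sum(a * c for a, c in zip(xv, x))  # inner product x_i^T x
        total += alpha * y * inner
    return 1 if total > 0 else -1

# Hypothetical solution: two support vectors with equal multipliers,
# so that sum_i alpha_i * y_i = 0 as required by Eq. (7b).
support = [(0.25, 1, (2.0, 2.0)), (0.25, -1, (0.0, 0.0))]
b_star = -1.0

print(decide((3.0, 3.0), support, b_star))
print(decide((-1.0, 0.0), support, b_star))
```

With these assumed multipliers, Eq. (7a) gives w = (0.5, 0.5), so the first query point falls on the positive side of the hyperplane and the second on the negative side.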
However, because most classification problems are not linearly separable, Cortes and Vapnik (1995) introduced positive slack variables $\xi_i$ and a penalty parameter $C$ to reduce the number of misclassifications. Thus, Eq. (4) can be changed as follows:
$$\text{Minimize}\quad \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{N}\xi_i \tag{12}$$
$$\text{Subject to}\quad \begin{cases} y_i\left(w^{T}x_i + b\right) \ge 1 - \xi_i \\ \xi_i \ge 0, \quad i = 1, 2, \ldots, N \end{cases} \tag{13}$$
where the slack variable $\xi_i$ accounts for the effect of misclassification and the penalty parameter $C$ defines the trade-off between the number of misclassifications in the training data and margin maximization.
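As above, the Lagrange function for this problem can be written as
$$L(w, b, \xi, \alpha, \beta) = \frac{1}{2}w^{T}w + C\sum_{i=1}^{N}\xi_i - \sum_{i=1}^{N}\alpha_i\left[y_i\left(w^{T}x_i + b\right) - 1 + \xi_i\right] - \sum_{i=1}^{N}\beta_i\xi_i \tag{14}$$
$$\text{Subject to}\quad \alpha_i, \beta_i \ge 0 \quad (i = 1, 2, \ldots, N) \tag{15}$$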
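where $\alpha_i$ and $\beta_i$ are the Lagrange multipliers. Differentiating $L$ with respect to $w$, $b$, and $\xi$, and setting the derivatives to zero lead to
$$\frac{\partial L(w, b, \xi, \alpha, \beta)}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{N}\alpha_i y_i x_i \tag{16a}$$
$$\frac{\partial L(w, b, \xi, \alpha, \beta)}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{N}\alpha_i y_i = 0 \tag{16b}$$
$$\frac{\partial L(w, b, \xi, \alpha, \beta)}{\partial \xi_i} = 0 \;\Rightarrow\; C - \alpha_i - \beta_i = 0 \tag{16c}$$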
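Based on Eqs. (16a), (16b), and (16c), Eq. (14) can then be reformulated as follows:
$$L(w, b, \xi, \alpha, \beta) = \sum_{i=1}^{N}\alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j y_i y_j x_i^{T}x_j \tag{17}$$
$$\text{Subject to}\quad \begin{cases} \sum_{i=1}^{N}\alpha_i y_i = 0 \\ 0 \le \alpha_i \le C, \quad i = 1, 2, \ldots, N \end{cases} \tag{18}$$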
The equations are almost the same dual problem as before, with the slight difference that the multipliers $\alpha_i$ have an extra constraint. For nonlinear problems, the optimization problem can be solved by mapping the data into a high-dimensional feature space. In the high-dimensional feature space, the inner product operation of the linear problem can be substituted by kernel functions, i.e., $K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$. Therefore, the decision function can be written as
$$f(x) = \operatorname{sign}\left(\sum_{i=1}^{N} y_i\alpha_i K(x_i, x) + b\right) \tag{19}$$
Several kernel functions are used in SVM, such as the linear, polynomial, radial basis function (RBF), and sigmoid kernels. Dibike et al. (2001) demonstrated that the RBF outperformed other kernel functions after using different kernels in SVM for rainfall runoff modeling. Therefore, the RBF is adopted in this study and expressed as
$$K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j\rVert^{2}}{2\sigma^{2}}\right) \tag{20}$$
where $\sigma$ is the width of the radial basis function.
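Eq. (20) translates almost directly into code. The following Python sketch uses a hypothetical function name `rbf_kernel` and illustrative inputs; it is not the MATLAB implementation used in the study.

```python
import math

def rbf_kernel(xi, xj, sigma):
    """RBF kernel exp(-||xi - xj||^2 / (2 * sigma^2)) of Eq. (20)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

# Identical points give 1; the value decays as the points move apart.
print(rbf_kernel((0.0, 0.0), (0.0, 0.0), sigma=1.0))
print(rbf_kernel((1.0, 0.0), (0.0, 0.0), sigma=1.0))
print(rbf_kernel((1.0, 0.0), (0.0, 0.0), sigma=2.0))
```

A larger width makes the kernel decay more slowly with distance. Some libraries, for example scikit-learn's `SVC`, parameterize the same kernel with `gamma`, which corresponds to 1/(2σ²) in the notation of Eq. (20).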
GA-based parameter optimization method
Parameters of the kernel function and the penalty parameter (C) in the SVM model greatly influence the prediction results. However, the SVM theory does not provide a method for selecting these parameters. In this study, a genetic algorithm (GA) was used to select the best-matched kernel function parameters and penalty parameter (C) of the SVM model.
Genetic algorithm (GA)
Genetic algorithm (GA) is a search algorithm based upon
the mechanics of natural selection, derived from the
theory of natural evolution. GA simulates mechanisms of
population genetics and natural rules of survival in
pursuit of the ideas of adaptation. A typical GA starts
with an initial set of random solutions called population,
and each individual in the population is called a chro-
mosome. A chromosome is usually, but not necessarily,
a binary string and represents a solution to a problem.
Chromosomes evolve through successive iterations,
called generations. During each generation, the chromo-
somes are evaluated, using some measures of fitness. To
create the next generation, new chromosomes, called
offspring, are formed by either (1) merging two chro-
mosomes from the current generation using a crossover
operator or (2) modifying a chromosome using a muta-
tion operator. A new generation is formed by (1)
selecting some of the parents and offspring according to
their fitness values and (2) rejecting the rest so that the
population size is kept constant. In the process, better-
fitting chromosomes have a higher chance of being
selected. After several generations, the algorithm con-
verges to the best set of chromosomes, which hopefully
represent the optimum or near-optimal solution to the
problem.
SVM optimized by GA (GA-SVM)
To design an effective classifier, the parameters of SVM
model have to be configured properly in advance. In this
section, the proposed GA-based SVM parameter opti-
mization approach is described as follows.
Chromosome design
In this study, the RBF kernel function is adopted due to its promising performance, and thus, only two parameters, $C$ and $\sigma$, need to be optimized by using the proposed GA-based method. Therefore, the chromosome comprises two parameters, $C$ and $\sigma$. Figure 2 shows the chromosome model formed by two binary blocks: (1) the first block, which includes $C_i$ with $1 \le i \le n_c$, is the $C$ parameter binary representation in $n_c$ bits; (2) the second block, which includes $\sigma_j$ with $1 \le j \le n_\sigma$, is the $\sigma$ parameter binary representation in $n_\sigma$ bits.
All the SVM parameters are numerical values, and it is necessary to convert the $C$ and $\sigma$ parameter values from their binary representation to a decimal representation. This is done at each chromosome fitness evaluation in order to enable the classifier configuration and was performed here by Eq. (21) (Huang and Wang 2006).
$$p = \mathrm{min}_p + \frac{\mathrm{max}_p - \mathrm{min}_p}{2^{l} - 1} \times d \tag{21}$$
where $p$ denotes the phenotype parameter, $\mathrm{min}_p$ and $\mathrm{max}_p$ are the minimum and maximum values of parameter $p$, respectively, $d$ is the decimal value of the bit string, and $l$ is the number of bits used in the parameter binary representation.
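The genotype-to-phenotype conversion of Eq. (21) can be sketched in Python as follows. The function name `decode` and the parameter bounds in the example are hypothetical, since the search bounds used for the GA are not stated here.

```python
def decode(bits, min_p, max_p):
    """Map a list of 0/1 genes to a real value in [min_p, max_p] (Eq. 21)."""
    l = len(bits)                              # number of bits
    d = int("".join(str(g) for g in bits), 2)  # decimal value of the bit string
    return min_p + (max_p - min_p) / (2 ** l - 1) * d

# Assumed bounds for illustration only.
print(decode([0, 0, 0, 0], 0.01, 100.0))  # all zeros map to the lower bound
print(decode([1, 1, 1, 1], 0.0, 15.0))    # all ones map to the upper bound
print(decode([1, 0, 1], 0.0, 7.0))        # binary 101 is d = 5
```

The resolution of the search is (max_p - min_p)/(2^l - 1), so a longer bit block gives a finer grid of candidate values for the same bounds.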
Fitness function
Fitness function is an objective function that estimates the
quality of each chromosome. In the GA-based SVM
parameter optimization process, the most difficult work is
to design a fitness function to produce SVM parameters
that are reliable and effective for SVM models. K-fold
cross-validation (CV) is a widely used technique to assess
the generalization ability of an SVM classifier. One idea
would be to use cross-validation accuracy (CVA) as fitness
function in the genetic algorithm instead of classification
accuracy. Higher CVA represents greater SVM classifier
generalization ability on given training data. Here, we apply a fivefold cross-validation accuracy computed by the SVM classifier on the training set to the GA fitness function. We further calculate the average accuracy of the fivefold cross-validation as the final fitness value. The corresponding function can be defined as follows:
$$f = \frac{1}{n}\sum_{i=1}^{n}\mathrm{SVM}_{\mathrm{CVA}}[i] \tag{22}$$
where $\mathrm{SVM}_{\mathrm{CVA}}[i]$ denotes the cross-validation accuracy of fold $i$ calculated by the SVM model and $n$ indicates the total number of folds executed with cross-validation. Here $n$ is equal to 5.

Fig. 2 Chromosome comprises two parameters, C and σ (after Marghany 2014)
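The fitness of Eq. (22) is a plain average over folds, which the following Python sketch illustrates. The names `k_fold_indices`, `cv_fitness`, and `train_and_score` are hypothetical; `train_and_score` stands in for whatever routine trains an SVM with a chromosome's (C, σ) on the training part and returns the accuracy on the held-out part. The contiguous, unshuffled split is a simplifying assumption.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cv_fitness(samples, labels, train_and_score, k=5):
    """Average held-out accuracy over k folds (Eq. 22 with n = k)."""
    accuracies = []
    for fold in k_fold_indices(len(samples), k):
        held_out = set(fold)
        train_idx = [i for i in range(len(samples)) if i not in held_out]
        accuracies.append(train_and_score(
            [samples[i] for i in train_idx], [labels[i] for i in train_idx],
            [samples[i] for i in fold], [labels[i] for i in fold],
        ))
    return sum(accuracies) / len(accuracies)

# Fold sizes for a 74-record training set split five ways.
print([len(f) for f in k_fold_indices(74, 5)])
```

In practice the records would usually be shuffled or stratified by class before splitting, so that each fold contains both liquefied and non-liquefied cases.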
Genetic operators
Figure 3 illustrates the genetic operators of crossover and
mutation. Crossover is the critical genetic operator that
allows new solution regions in the search space to be
explored, and it is performed by selecting a random gene
along the length of the chromosomes and swapping all
genes after that point (Marghany 2015). In mutation, the
genes may occasionally be altered, i.e., binary code genes
can change from 1 to 0 or vice versa.
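The two operators described above can be sketched in Python as follows. The function names and the mutation rate are hypothetical; the crossover is the single-point variant described in the text, and the mutation flips each binary gene independently.

```python
import random

def single_point_crossover(parent_a, parent_b, rng):
    """Pick a random cut point and swap all genes after it."""
    point = rng.randrange(1, len(parent_a))
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

def bit_flip_mutation(chromosome, rate, rng):
    """Flip each gene (1 to 0 or 0 to 1) with probability rate."""
    return [1 - g if rng.random() < rate else g for g in chromosome]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
a, b = [0] * 8, [1] * 8
child_a, child_b = single_point_crossover(a, b, rng)
print(child_a, child_b)
print(bit_flip_mutation(child_a, 0.1, rng))  # 0.1 is an assumed rate
```

Because the parents here are all zeros and all ones, the cut point is visible in the children as the position where the genes switch value.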
GA-SVM approach
The main steps of the proposed GA-SVM approach are
described as follows:
1. Data preprocessing. Normalize the datasets according to Eq. (23)
$$y = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{23}$$
where $y$ is a normalized input parameter, $x$ is the original input parameter, and $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of that parameter, respectively.
2. Initialization. Generate an initial population in which each individual comprises $C$ and $\sigma$, then convert each chromosome from its genotype into its phenotype.
3. Fitness evaluation. For each chromosome representing $C$ and $\sigma$, the training dataset is used to train the SVM classifier and calculate the cross-validation accuracy, while the testing dataset is used to validate the model. When the cross-validation accuracy is obtained, each chromosome is evaluated by the fitness function, Eq. (22).
4. Termination criteria. When the termination criteria are
satisfied, the process ends; otherwise, we proceed with
the next generation.
5. Genetic operation. In this step, the system searches for
better solutions by genetic operations, including
selection, crossover, and mutation.
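The normalization used in step 1, Eq. (23), can be sketched in Python as follows. The function name is hypothetical, the three sample values are the cone resistances of records 1 to 3 in Table 1, and returning zeros for a constant column is an assumption made here to avoid division by zero.

```python
def min_max_normalize(column):
    """Scale a list of values to [0, 1] using Eq. (23)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant column carries no information, map it to 0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

qc = [3.20, 1.6, 7.2]  # cone resistance (MPa) of records 1-3 in Table 1
print(min_max_normalize(qc))
```

Each input parameter is normalized separately, so that quantities with large numerical ranges, such as the stresses in kPa, do not dominate the distance in the RBF kernel.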
The implementation of the proposed GA-SVM approach
was carried out using MATLAB R2012b program in this
study. The flowchart of the GA-SVM algorithm is shown in
Fig. 4. This approach is applied in the case study for soil
liquefaction prediction.
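The overall loop of steps 2 to 5 can be sketched in Python as follows. This is an illustration under stated assumptions and not the MATLAB implementation used in the study: tournament selection, elitism, the population size, the rates, the fixed number of generations as termination criterion, the parameter bounds, and the toy fitness (a stand-in for the cross-validation accuracy of Eq. 22) are all hypothetical choices.

```python
import random

def decode(bits, lo, hi):
    """Eq. (21): map a bit list to a real value in [lo, hi]."""
    d = int("".join(map(str, bits)), 2)
    return lo + (hi - lo) / (2 ** len(bits) - 1) * d

def toy_fitness(chrom):
    """Stand-in for cross-validation accuracy; peaks near C=12.5, sigma=1."""
    c = decode(chrom[:10], 0.1, 100.0)       # assumed bounds for C
    sigma = decode(chrom[10:], 0.01, 10.0)   # assumed bounds for sigma
    return -((c - 12.5) ** 2 + (sigma - 1.0) ** 2)

def run_ga(fitness, n_bits=10, pop_size=20, generations=30,
           crossover_rate=0.8, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    length = 2 * n_bits  # one block for C, one block for sigma
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):  # termination: fixed generation count
        scores = [fitness(ch) for ch in pop]

        def pick():
            # Binary tournament selection (an assumption of this sketch).
            i, j = rng.randrange(pop_size), rng.randrange(pop_size)
            return pop[i] if scores[i] >= scores[j] else pop[j]

        nxt = [best[:]]  # elitism: carry the best chromosome forward
        while len(nxt) < pop_size:
            a, b = pick()[:], pick()[:]
            if rng.random() < crossover_rate:
                p = rng.randrange(1, length)
                a, b = a[:p] + b[p:], b[:p] + a[p:]
            for child in (a, b):
                child = [1 - g if rng.random() < mutation_rate else g
                         for g in child]
                if len(nxt) < pop_size:
                    nxt.append(child)
        pop = nxt
        best = max(pop + [best], key=fitness)
    return best

best = run_ga(toy_fitness)
print(decode(best[:10], 0.1, 100.0), decode(best[10:], 0.01, 10.0))
```

Replacing `toy_fitness` with a function that trains an SVM and returns its fivefold cross-validation accuracy would turn this loop into a GA-SVM parameter search of the kind described above.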
Fig. 3 Genetic crossover and mutation operation
Fig. 4 Flowchart of the GA-SVM model

GS-based parameter optimization method
To verify the application effect of the proposed GA-SVM model, this study also adopted the grid search (GS) method to optimize the parameters of the SVM model. GS is a common method for searching for the best $C$ and $\sigma$. Figure 5 shows the process of GS-SVM. In the grid search algorithm, pairs of ($C$, $\sigma$) are tried and the one with the best cross-validation accuracy is chosen. After identifying a better region on the grid, a finer grid search in that region can be conducted.
The implementation of the proposed GS-SVM approach
was carried out using MATLAB R2012b program in this
study. The flowchart of the GS-SVM model is shown in
Fig. 5.
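The coarse-then-fine grid search can be sketched in Python as follows. The function names, the exponent ranges, the step sizes, and the toy score (a stand-in for cross-validation accuracy) are hypothetical; the sketch only illustrates searching over powers of two and then refining around the best coarse cell.

```python
import math

def grid_search(score, c_range, s_range, step):
    """Try (C, sigma) = (2**a, 2**b) on an exponent grid; return (score, a, b)."""
    def axis(lo, hi):
        n = int(round((hi - lo) / step))
        return [lo + i * step for i in range(n + 1)]

    best = None
    for a in axis(*c_range):
        for b in axis(*s_range):
            s = score(2.0 ** a, 2.0 ** b)
            if best is None or s > best[0]:
                best = (s, a, b)
    return best

def toy_score(c, sigma):
    """Stand-in for cross-validation accuracy, peaking at assumed exponents."""
    return -((math.log2(c) - 1.9) ** 2 + (math.log2(sigma) - 1.0) ** 2)

# Coarse pass over integer exponents, then a finer pass around the best cell.
coarse = grid_search(toy_score, (-10, 10), (-10, 10), 1.0)
fine = grid_search(toy_score,
                   (coarse[1] - 1, coarse[1] + 1),
                   (coarse[2] - 1, coarse[2] + 1), 0.1)
print(2.0 ** fine[1], 2.0 ** fine[2])
```

The result can never be better than the best point on the grid, which is the limitation of GS that motivates the comparison with the GA-based search.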
Case study
Database
It is generally known that the susceptibility of soil deposits
to liquefaction is determined by a combination of various
factors to which they may be subjected, such as soil
properties, geological conditions, and ground motion
characteristics. Therefore, it is widely recognized that the
determination of liquefaction potential is a complex
geotechnical engineering problem. The soil properties and
geological conditions determine the resistance of the
deposit to liquefaction, while earthquake characteristics
control the seismic loading conditions. The general criteria
for soil liquefaction are: (1) the subsoil is loose sandy
deposit, (2) the subsoil is saturated, and (3) the earthquake
intensity is sufficiently large, and the duration of shaking is
sufficiently long. Accordingly, the following factors are
chosen as the evaluating indices: cone resistance ($q_c$), total vertical stress ($\sigma_0$), effective vertical stress ($\sigma'_0$), mean grain size ($D_{50}$), normalized peak horizontal acceleration at ground surface ($a_{\max}$), cyclic stress ratio ($\tau/\sigma'_0$), and earthquake magnitude ($M_w$).
The database used in this study includes 109 CPT-based field observations from five major earthquakes between 1964 and 1983 (Goh 1996), and it has a wide range of parameters.
The data consisted of 16 case records from Japan, 79 from
China, 9 from the USA, and 5 from Romania. This repre-
sented 74 sites that liquefied and 35 sites that did not liquefy
(Table 1). To the best of the authors’ knowledge, there is no
criterion used in dividing the data into training and testing
sets presently, and the method for dividing the data into
training and testing is arbitrary. For instance, Kurup and
Dudani (2002) used 63 % of the data for training, Tang et al.
(2005) used 75 %, while Padmini et al. (2008) used 80 %. In this study, 74 records are used as the training dataset, and the other 35 records are used as the testing dataset. Here, for each case, "1" represents liquefaction and "0" denotes no liquefaction.
Results and discussion
Figure 6 shows that the best ($C$, $\sigma$) is (12.5152, 1.0616) with a cross-validation rate of 97.29 % after training the GA-SVM model. These optimal parameters are then used to generate the final GA-SVM classifier, which is evaluated on the testing data. Out of the 35 testing datasets used for prediction, only one case was misclassified (a testing accuracy of 97.14 %), and the overall classification accuracy rate for the entire dataset is 97.25 %.
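The reported success rates can be checked with simple arithmetic, as in the Python sketch below. The counts are read from Table 1 (three misclassified cases out of 109) and from the text (one misclassified case among the 35 testing records); attributing the remaining two errors to the 74 training records is an inference that is consistent with the training rate in Table 3.

```python
def success_rate(n_correct, n_total):
    """Classification success rate in percent."""
    return 100.0 * n_correct / n_total

testing = success_rate(35 - 1, 35)    # one testing case misclassified
overall = success_rate(109 - 3, 109)  # three cases misclassified in Table 1
training = success_rate(74 - 2, 74)   # inferred: two training cases misclassified

print(round(testing, 2), round(overall, 2), training)
```

The testing and overall values round to 97.14 and 97.25 %. The training value is 97.297... %, which appears as 97.29 % in the text and in Table 3.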
In addition, to verify the effectiveness of the proposed GA-SVM model, this study compares it with the GS-SVM model, artificial neural network (ANN) (Goh 1996), and C4.5 decision tree (Ardakani and Kohestni 2015) models.
Since the bounds of parameters of GS-SVM are not known
a priori, a coarse range search is made to find the best
region of the GS-SVM parameters. Performing a complete
grid search may be time-consuming. Hence, a coarse grid
search is performed first. Once the coarse grid search is
performed, fine grid search is then performed. The ranges
of parameters taken for coarse grid and fine grid search are
given in Table 2.

Fig. 5 Flowchart of the GS-SVM model
Table 1 Dataset used in case study (1: liquefaction; 0: no liquefaction) [data from Goh (1996)]
No. Mw σ0 (kPa) σ'0 (kPa) qc (MPa) amax (g) τ/σ'0 D50 (mm) Actual Predicted by GA-SVM
1 7.5 53 36 3.20 0.16 0.15 0.331 1 1
2 7.5 87 52 1.6 0.16 0.16 0.331 1 1
3 7.5 99 58 7.2 0.16 0.17 0.331 1 1
4 7.5 152 83 5.6 0.16 0.17 0.331 1 1
5 7.5 91 63 5.45 0.16 0.14 0.331 1 1
6 7.5 127 80 8.84 0.16 0.15 0.331 1 1
7 7.5 211 120 9.70 0.16 0.15 0.331 1 1
8 7.5 86 46 8.0 0.16 0.19 0.30 0 0
9 7.5 95 50 14.55 0.16 0.18 0.30 0 0
10 7.7 58 48 10.0 0.23 0.18 0.32 0 0
11 7.7 73 54 16.0 0.23 0.20 0.32 0 0
12 7.7 96 6 15.38 0.23 0.21 0.32 0 0
13 7.7 54 46 1.79 0.23 0.17 0.32 1 1
14 7.7 64 52 4.10 0.23 0.19 0.32 1 1
15 7.7 96 67 7.96 0.23 0.21 0.32 1 1
16 7.7 114 75 8.97 0.23 0.22 0.32 1 1
17 7.8 57 42 1.70 0.40 0.35 0.06 1 1
18 7.8 114 69 9.40 0.40 0.41 0.25 1 1
19 7.8 148 85 5.70 0.40 0.42 0.25 1 1
20 7.8 162 92 7.60 0.40 0.42 0.30 1 1
21 7.8 17 17 1.5 0.40 0.27 0.17 1 1
22 7.8 25 21 1.0 0.40 0.32 0.17 1 1
23 7.8 34 25 5.0 0.40 0.36 0.17 1 1
24 7.8 38 34 2.50 0.40 0.29 0.14 1 1
25 7.8 57 43 2.60 0.40 0.34 0.14 1 1
26 7.8 76 52 3.20 0.40 0.37 0.16 1 1
27 7.8 89 58 5.80 0.40 0.9 0.16 1 1
28 7.8 122 74 3.50 0.40 0.40 0.16 1 1
29 7.8 181 102 8.40 0.40 0.41 0.16 1 1
30 7.8 38 29 1.70 0.40 0.35 0.12 1 1
31 7.8 40 29 3.0 0.40 0.36 0.12 1 1
32 7.8 51 35 4.10 0.40 0.38 0.12 1 1
33 7.8 29 27 5.50 0.40 0.29 0.17 1 1
34 7.8 57 40 9.00 0.40 0.37 0.32 1 1
35 7.8 23 21 7.00 0.40 0.29 0.48 1 1
36 7.8 34 26 1.18 0.40 0.35 0.48 1 1
37 7.8 48 33 4.24 0.40 0.38 0.48 1 1
38 7.8 76 71 11.47 0.40 0.27 0.16 0 0
39 7.8 160 111 15.76 0.40 0.34 0.20 0 0
40 7.8 59 56 11.39 0.20 0.14 0.21 0 0
41 7.8 78 65 12.12 0.20 0.15 0.21 0 0
42 7.8 99 75 17.76 0.20 0.17 0.14 0 0
43 7.8 74 49 2.65 0.20 0.19 0.14 1 1
44 7.8 53 35 4.40 0.20 0.20 0.16 1 1
45 7.8 61 39 3.00 0.20 0.20 0.16 1 1
46 7.8 156 81 9.00 0.20 0.23 0.08 1 1
47 7.8 99 55 2.00 0.10 0.11 0.14 1 0
48 7.8 95 52 1.10 0.20 0.23 0.07 1 0
49 7.8 209 106 15.50 0.10 0.11 0.08 0 0
50 7.8 217 110 6.50 0.10 0.11 0.08 0 0
51 7.8 91 53 9.00 0.10 0.11 0.10 0 0
52 7.8 101 58 2.50 0.10 0.11 0.10 0 0
53 7.8 112 63 16.50 0.10 0.11 0.10 0 0
54 7.8 91 68 13.65 0.10 0.06 0.25 0 0
55 7.8 114 58 8.47 0.20 0.24 0.062 0 0
56 7.8 228 112 4.55 0.20 0.23 0.067 0 0
57 7.8 249 122 5.79 0.20 0.22 0.067 0 0
58 7.8 121 55 2.48 0.20 0.25 0.062 1 1
59 7.8 114 56 1.57 0.20 0.25 0.062 1 1
60 7.8 213 103 1.4 0.20 0.23 0.67 1 1
61 7.8 220 106 2.15 0.20 0.23 0.67 1 1
62 7.8 230 111 2.60 0.20 0.23 0.67 1 1
63 7.8 213 103 2.73 0.20 0.23 0.67 1 1
64 7.8 219 106 1.78 0.20 0.23 0.67 1 1
65 7.8 211 108 7.64 0.20 0.22 0.67 0 0
66 6.6 29 29 25.60 0.80 0.44 0.11 0 0
67 6.6 48 36 24.70 0.80 0.57 0.11 0 0
68 6.6 64 42 31.40 0.80 0.64 0.11 0 0
69 6.6 29 29 1.43 0.80 0.44 0.11 1 1
70 6.6 64 42 2.48 0.80 0.64 0.11 1 1
71 6.6 96 54 4.03 0.80 0.72 0.11 1 1
72 6.6 29 29 3.30 0.80 0.44 0.06 0 0
73 6.6 48 36 8.80 0.80 0.57 0.06 0 0
74 6.6 64 42 6.70 0.80 0.64 0.06 0 0
75 7.8 89 78 1.65 0.20 0.41 0.17 1 1
76 7.8 99 83 3.65 0.20 0.15 0.17 1 1
77 7.8 29 25 1.03 0.20 0.15 0.19 1 1
78 7.8 55 37 5.00 0.20 0.19 0.31 1 1
79 7.8 76 47 2.91 0.20 0.21 0.18 1 1
80 7.8 105 61 6.06 0.20 0.21 0.18 1 1
81 7.8 23 22 13.24 0.20 0.14 0.17 0 0
82 7.8 32 26 13.06 0.20 0.16 0.17 0 0
83 7.8 40 30 16.59 0.20 0.18 0.17 0 0
84 7.8 59 59 10.59 0.20 0.13 0.26 0 0
85 7.8 63 61 9.12 0.20 0.13 0.26 0 0
86 7.8 76 67 11.29 0.20 0.15 0.26 0 0
87 7.8 70 41 1.94 0.20 0.22 0.16 1 1
88 7.8 76 44 5.00 0.20 0.22 0.16 1 1
89 7.8 70 47 2.24 0.20 0.19 0.14 1 1
90 7.8 114 79 14.12 0.20 0.09 0.25 0 0
91 7.8 162 102 18.94 0.20 0.09 0.28 0 0
92 7.8 44 44 3.52 0.20 0.13 0.16 1 1
93 7.8 59 51 2.73 0.20 0.15 0.16 1 1
94 7.8 57 50 3.29 0.20 0.15 0.21 1 1
95 7.8 61 52 4.12 0.20 0.15 0.21 1 1
96 7.8 72 57 2.94 0.20 0.16 0.21 1 1
After the GS-SVM procedure, the optimal parameters with maximal classification accuracy are selected. As shown in Figs. 7 and 8, the best ($C$, $\sigma$) is (3.7321, 2.0) with a cross-validation rate of 91.89 %. These optimal parameters are then used to generate the final GS-SVM classifier, which is evaluated on the testing data. Out of the 35 testing datasets used, only two cases were misclassified (a testing accuracy of 94.29 %), and the overall classification accuracy rate for the entire dataset is 92.66 %.
The comparison of prediction results of GA-SVM with
GS-SVM, ANN (Goh 1996), and C4.5 decision tree (Ardakani and Kohestni 2015) is shown in Table 3. From
Table 3, it can be seen that the classification success rate of
the GA-SVM model for the training data is 97.29 %, which
is higher than the rates of C4.5 decision tree (95.9 %) and
GS-SVM models (91.89 %), and is slightly lower than the
rate of ANN (98.6 %) reported by Goh (1996). However,
the overall classification success rate of GA-SVM for the
entire dataset is 97.25 %, while the overall classification
success rates predicted by ANN, C4.5 decision tree, and
GS-SVM models are 97.20, 96.30, and 92.66 %, respec-
tively. It can be seen that the overall classification success
rate predicted by GA-SVM model is the highest among the
four methods. Moreover, the ANN uses many parameters,
such as the number of hidden layers, number of hidden nodes, learning rate, momentum term, number of training epochs, and weight initialization methods. The decision tree method is quite transparent and does not need optimization and internal parameters; however, it has a limited domain of applicability and is mostly case dependent. The generalization of the decision tree method is limited, and it is only applicable in the range of the training data. The case study on soil liquefaction prediction shows that the proposed GA-SVM model has better performance than the GS-SVM, ANN, and C4.5 decision tree approaches.

Table 1 continued
No. Mw σ0 (kPa) σ'0 (kPa) qc (MPa) amax (g) τ/σ'0 D50 (mm) Actual Predicted by GA-SVM
97 7.8 61 52 3.00 0.20 0.15 0.15 1 1
98 7.8 95 68 5.85 0.20 0.18 0.32 1 1
99 7.8 106 73 9.00 0.20 0.18 0.32 1 1
100 7.8 49 48 1.88 0.20 0.13 0.13 1 1
101 7.8 74 64 2.55 0.20 0.15 0.17 1 1
102 7.8 76 65 4.50 0.20 0.15 0.17 1 1
103 7.8 106 79 4.24 0.20 0.17 0.17 1 1
104 7.8 114 77 8.00 0.20 0.18 0.22 0 1
105 7.2 80 48 5.22 0.22 0.21 0.20 1 1
106 7.2 95 55 3.73 0.22 0.22 0.20 1 1
107 7.2 114 64 3.11 0.22 0.22 0.20 1 1
108 7.2 133 73 1.32 0.22 0.22 0.20 1 1
109 7.2 152 82 5.22 0.22 0.22 0.20 1 1

Fig. 6 Fitness curve for GA-SVM to find the best parameters

Table 2 Coarse and fine range partitions of GS-SVM parameters taken during two-step optimization
SVM parameters    Coarse range partitions    Fine range partitions
C                 [2^-10, 2^10]              [2^-2, 2^2]
σ                 [2^-10, 2^10]              [2^-2, 2^2]
Conclusions
Determination of liquefaction potential is a complex
geotechnical engineering problem due to the heterogeneous
nature of the soils and the participation of a large number
of factors that affect the liquefaction occurrence due to
earthquakes. This study presents a hybrid model based on
the combination of SVM and GA to improve the fore-
casting performance. GA is employed in selecting the
optimal values of the kernel function and the penalty
parameter in SVM model to improve the forecasting
accuracy. The developed GA-SVM approach was com-
pared with GS-SVM, ANN model, and C4.5 decision tree
approach. Prediction results demonstrate that the overall classification accuracy rate of the developed GA-SVM approach surpasses those of the other methods. The GA-SVM model
can be used as a reliable approach in soil liquefaction
prediction.
Fig. 7 Parameters C and σ versus the accuracy rate in two dimensions
Fig. 8 Parameters C and σ versus the accuracy rate in three dimensions
Table 3 Comparison of model predictions of GA-SVM with GS-SVM, ANN, and C4.5 decision tree
Performance in terms of successful prediction (%)
Methods                                            Training    Testing    Overall
ANN (Goh 1996)                                     98.6        94.3       97.20
C4.5 decision tree (Ardakani and Kohestni 2015)    95.9        97.1       96.30
GA-SVM                                             97.29       97.14      97.25
GS-SVM                                             91.89       94.29      92.66
References
Alemdag S, Gurocak Z, Cevik A, Cabalar AF, Gokceoglu C (2016)
Modeling deformation modulus of a stratified sedimentary rock
mass using neural network, fuzzy inference and genetic
programming. Eng Geol 203:70–82
Ardakani A, Kohestni VR (2015) Evaluation of liquefaction potential
based on CPT results using C4.5 decision tree. J AI Data Mining
3(1):85–92
Cevik A, Cabalar AF (2009) Modelling damping ratio and shear
modulus of sand-mica mixtures using genetic programming.
Expert Syst Appl 36(4):7749–7757
Chern SG, Lee CY (2009) CPT-based simplified liquefaction
assessment by using fuzzy-neural network. J Mar Sci Technol
17(4):326–331
Cortes C, Vapnik V (1995) Support-vector network. Mach Learn
20:273–297
Dibike YB, Velickov S, Solomatine D, Abbott MB (2001) Model
induction with support vector machines: introduction and
applications. J Comput Civ Eng 15(3):208–216
Ding SF, Su CY, Yu JZ (2011) An optimizing BP neural network
algorithm based on genetic algorithm. Artif Intell Rev
36:153–162
Farrokhzad F, Choobbasti AJ, Barari A (2012) Liquefaction
microzonation of Babol city using artificial neural network. J King
Saud Univ Sci 24(1):89–100
Goh AT (1996) Neural-network modeling of CPT seismic
liquefaction data. J Geotech Eng 122(1):70–73
Guettaya I, Ouni MREL (2014) In situ-based assessment of soil
liquefaction potential-case study of an earth dam in Tunisia.
Front Struct Civ Eng 8(4):456–461
Huang CL, Wang CJ (2006) A GA-based feature selection and
parameters optimization for support vector machines. Expert
Syst Appl 31:231–240
Jha SK, Suzuki K (2009) Reliability analysis of soil liquefaction
based on standard penetration test. Comput Geotech
36(4):589–596
Juang CH, Chen CJ, Jiang T, Andrus RD (2000) Risk-based
liquefaction potential evaluation using standard penetration tests.
Can Geotech J 37:1195–1208
Karegowda AG, Manjunath AS, Jayaram MA (2011) Application of
genetic algorithm optimized neural network connection weights
for medical diagnosis of Pima Indians diabetes. Int J Soft
Comput 2(2):15–22
Keerthi SS, Lin CJ (2003) Asymptotic behaviors of support vector
machines with Gaussian kernel. Neural Comput 15(7):1667–1689
Kumar S, Naresh R (2007) Efficient real coded genetic algorithm to
solve the non-convex hydrothermal scheduling problem. Int J
Electr Power Energy Syst 29(10):738–747
Kurup PU, Dudani NK (2002) Neural networks for profiling stress
history of clays from PCPT data. J Geotech Geoenviron Eng
128(4):569–579
Lee CY, Chern SG (2013) Application of a support vector machine
for liquefaction assessment. J Mar Sci Technol 21(3):318–324
Mahesh P (2006) Support vector machines-based modelling of
seismic liquefaction potential. Int J Numer Anal Methods
Geomech 30:983–996
Marcuson WF III (1978) Definition of terms related to liquefaction.
J Geotech Eng Div ASCE 104(9):1197–1200
Marghany M (2013) Genetic algorithm for oil spill automatic
detection from Envisat Satellite Data. In: Computational science
and its applications—ICCSA 2013. Springer, Berlin, pp 587–598
Marghany M (2014) Multi-objective evolutionary algorithm for oil
spill detection from COSMO-SkyMed Satellite. In:
Computational science and its applications—ICCSA 2014.
Springer, Berlin, pp 355–371
Marghany M (2015) Automatic detection of oil spills in the Gulf of
Mexico from RADARSAT-2 SAR satellite data. Environ Earth
Sci 74:5935–5947
Maria JS (2011) Applying artificial neural networks for analysis of
geotechnical problems. Comput Assist Mech Eng Sci
18:231–241
Mert T (2013) A comparative study on computer aided liquefaction
analysis methods. Int J Housing Sci 37(2):121–135
Mughieda O, Bani HK, Safieh B (2009) Liquefaction assessment by
artificial neural networks based on CPT. Int J Geotech Eng
2:289–302
Padmini D, Ilamparuthi K, Sudheer KP (2008) Ultimate bearing
capacity prediction of shallow foundations on cohesionless soils
using neurofuzzy models. Comput Geotech 35:33–46
Pal M (2006) Support vector machines-based modeling of seismic
liquefaction potential. Int J Numer Anal Methods Geomech
30:983–996
Pour MN, Asakereh A (2015) A comparison between two field
methods of evaluation of liquefaction potential in the Bandar
Abbas city. Am J Civ Eng 3(2-2):1–5
Ramakrishnan D, Singh TN, Purwar N, Badre KS, Gulati A, Gupta S
(2008) Artificial neural network and liquefaction susceptibility
assessment: a case study using the 2001 Bhuj earthquake data,
Gujarat, India. Comput Geosci 12:491–501
Robertson PK, Campanella RG (1985) Liquefaction potential of sands
using the cone penetration test. J Geotech Eng Div ASCE
111(3):384–403
Sami M, de Patrick B (2005) Minimum principle and related
numerical scheme for simulating initial flow and subsequent
propagation of liquefied ground. Int J Numer Anal Methods
Geomech 29:1065–1086
Samui P (2013) Liquefaction prediction using support vector machine
model based on cone penetration data. Front Struct Civ Eng
7(1):72–82
Samui P, Sitharam TG (2011) Machine learning modelling for
predicting soil liquefaction susceptibility. Nat Hazards Earth
Syst Sci 11:1–9
Seed HB, Idriss IM (1967) Analysis of soil liquefaction: Niigata
earthquake. J Soil Mech Foun Div ASCE 93(3):83–108
Seed HB, Idriss IM (1971) Simplified procedure for evaluating soil
liquefaction potential. J Soil Mech Foun Div ASCE
97(9):1249–1273
Seed HB, Idriss IM, Arango I (1983) Evaluation of liquefaction
potential using field performance data. J Geotech Eng Div ASCE
109(3):458–482
Shibata T, Teparaksa W (1988) Evaluation of liquefaction potentials
of soils using cone penetration tests. Soils Found 28(2):49–60
Skempton AW (1986) Standard penetration test procedures and the
effects in sands of overburden pressure, relative density, particle
size, aging and overconsolidation. Géotechnique 36(3):425–447
Tang Y, Zang YQ, Huang G, Hu X (2005) Granular SVM-RFE gene
selection algorithm for reliable prostate cancer classification on
microarray expression data. In: Proceedings of the 5th IEEE
symposium on bioinformatics and bioengineering (BIBE’05)
Vapnik V (1995) The nature of statistical learning theory. Springer,
New York
Zhang G, Robertson PK, Brachman RWI (2004) Estimating
liquefaction-induced lateral displacements using the standard
penetration test or cone penetration test. J Geotech Geoenviron Eng
130(8):861–871