


Construction and Building Materials 73 (2014) 771–780

Contents lists available at ScienceDirect

Construction and Building Materials

journal homepage: www.elsevier.com/locate/conbuildmat

Machine learning in concrete strength simulations: Multi-nation data analytics

http://dx.doi.org/10.1016/j.conbuildmat.2014.09.054
0950-0618/© 2014 Elsevier Ltd. All rights reserved.

* Corresponding author. Tel.: +886 2 2737 6321; fax: +886 2 2737 6606.
E-mail addresses: [email protected] (J.-S. Chou), [email protected] (C.-F. Tsai), [email protected] (A.-D. Pham), [email protected] (Y.-H. Lu).

Jui-Sheng Chou a,*, Chih-Fong Tsai b, Anh-Duc Pham a,c, Yu-Hsin Lu d

a Department of Civil and Construction Engineering, National Taiwan University of Science and Technology, Taiwan
b Department of Information Management, National Central University, Taiwan
c Faculty of Project Management, The University of Danang, University of Science and Technology, Vietnam
d Department of Accounting, Feng Chia University, Taiwan

Highlights

- This comprehensive study used advanced machine learning techniques to predict concrete compressive strength.
- Model performance is evaluated through multi-nation data simulation experiments.
- The prediction accuracy of ensemble techniques is superior to that of single learning models.
- This study developed advanced learning approaches for solving civil engineering problems.
- The approach also has potential applications in materials science.

Article info

Article history:
Received 30 May 2014
Received in revised form 5 September 2014
Accepted 24 September 2014

Keywords:
High performance concrete
Compressive strength
Multi-nation data analysis
Machine learning
Ensemble classifiers
Prediction

Abstract

Machine learning (ML) techniques are increasingly used to simulate the behavior of concrete materials and have become an important research area. The compressive strength of high performance concrete (HPC) is a major civil engineering problem. However, the validity of reported relationships between concrete ingredients and mechanical strength is questionable. This paper provides a comprehensive study using advanced ML techniques to predict the compressive strength of HPC. Specifically, individual and ensemble learning classifiers are constructed from four different base learners, including multilayer perceptron (MLP) neural network, support vector machine (SVM), classification and regression tree (CART), and linear regression (LR). For ensemble models that integrate multiple classifiers, the voting, bagging, and stacking combination methods are considered. The behavior simulation capabilities of these techniques are investigated using concrete data from several countries. The comparison results show that ensemble learning techniques are better than learning techniques used individually to predict HPC compressive strength. Although the two single best learning models are SVM and MLP, the stacking-based ensemble model composed of MLP/CART, SVM, and LR in the first level and SVM in the second level often achieves the best performance measures. This study validates the applicability of ML, voting, bagging, and stacking techniques for simple and efficient simulations of concrete compressive strength.

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

An important research problem in materials science is predicting the mechanical properties of construction materials [1]. For many years, the use of high performance concrete (HPC) in various structural applications has markedly increased [2]. Cement materials such as fly ash, blast furnace slag, metakaolin, and silica fume are often used to increase the compressive strength and durability of HPC [3–5]. In terms of concrete mix design and quality control, compressive strength is generally considered the most important quality of HPC. Developing accurate and reliable compressive strength prediction models can save time and costs by providing designers and structural engineers with vital data. Thus, accurate and early prediction of concrete strength is a critical issue in concrete construction.

Concrete compressive strength (CCS) is usually predicted using linear or non-linear regression methods [3,6–8]. The general form of the regression method is


y = f(b_i · x_i)    (1)

where y, f, b_i, and x_i are the CCS, a linear or nonlinear function, the regression coefficients, and the concrete attributes, respectively.

However, obtaining an accurate regression equation when using these empirical-based models is difficult. Moreover, several factors that affect the compressive strength of HPC differ from those that affect the compressive strength of conventional concrete. Therefore, regression analysis may be unsuitable for predicting CCS [9].

To compensate for the drawbacks of conventional models, machine learning algorithms (i.e., neural networks, classification and regression tree, linear regression, or support vector machine (SVM)) have been applied as baseline models in evolutionary or hybrid approaches to developing accurate and effective models for predicting CCS [10]. Machine learning, which is a branch of artificial intelligence (AI), can be used not only in knowledge-generation tools, but also in general information-modeling tools for conventional statistical techniques [11]. ML models approximate the relationships between inputs and outputs based on a measured set of data.

Recently, the use of ML-based applications has increased in many areas of civil engineering, ranging from engineering design to project planning [11–15]. Other materials science problems that have been solved by ML include mixture design, prediction of mechanical properties, and fault diagnosis [10,14,16–19]. In particular, ML-based solutions using learning mechanisms readily available in WEKA software¹ often provide a good alternative approach to solving prediction problems.

For example, Chou et al. proposed several supervised learning models for predicting CCS. Their analytical results indicated that multiple additive regression trees achieve the highest predictive accuracy [20]. Moreover, Yan and Shi reported that SVM was better than other models for predicting the elastic modulus of normal and high strength concrete [21]. Notably, artificial neural networks (ANNs) are used to construct mapping functions for predicting CCS [19,22]. Thus, ML techniques such as SVM and ANN are frequently used in prediction models. However, no single model has consistently proven superior.

Conversely, ensemble approaches that combine multiple learning classifiers (or prediction models) have been proposed to improve the performance of single learning techniques [23,24]. The three methods of combining multiple prediction models into a single model are the voting, bagging, and stacking combination methods [25–27]. However, a literature review shows no studies that have compared individual models and ensemble learning techniques for predicting the compressive strength of HPC.

In particular, related works develop their prediction models based only on single statistical or machine learning techniques; it is unknown whether prediction models based on ensemble learning techniques can outperform single models in the problem of HPC compressive strength prediction.

Therefore, individual and ensemble ML techniques were compared in this study to identify the best model for predicting the mechanical properties of HPC. Specifically, four well-known individual ML techniques are compared: multilayer perceptron neural network, SVM, classification and regression tree, and linear regression. Additionally, the voting, bagging, and stacking methods of combining individual models are examined in terms of the mean absolute error (MAE), root mean squared error (RMSE), and mean absolute percentage error (MAPE) via a synthesis index. Meanwhile, the cross-validation method [28] is used to avoid bias in the experimental datasets.
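The three error measures named above have standard definitions. As an illustration only (the study itself used WEKA, not this code), a minimal pure-Python sketch of MAE, RMSE, and MAPE might look like the following; the strength values are hypothetical:

```python
import math

def mae(actual, predicted):
    # Mean absolute error (MPa): average magnitude of the errors.
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    # Root mean squared error (MPa): penalizes large errors more heavily.
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    # Mean absolute percentage error (%), relative to the measured values.
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical measured vs. predicted compressive strengths (MPa).
strengths = [40.0, 55.0, 62.5]
predictions = [42.0, 50.0, 60.0]
errors = (mae(strengths, predictions),
          rmse(strengths, predictions),
          mape(strengths, predictions))
```

A synthesis index can then aggregate normalized values of these measures to rank models, as done later in the paper's result tables.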

1 http://www.cs.waikato.ac.nz/ml/weka/

To sum up, the contribution of this paper is two-fold. The first is a comparison of the performance of single and ensemble techniques, which has not been thoroughly investigated for HPC compressive strength prediction. The second is the identification of the best prediction model, which provides the lowest error rate and can be used not only for practical purposes, but also in future research as the baseline prediction model against which other advanced models are compared.

The rest of this paper is organized as follows. The study context is introduced by a brief literature review, including studies of CCS prediction and some well-known ML applications. The methodology section then describes individual and ensemble ML schemes and evaluation methods. The modeling experiments section discusses the experimental settings and compares the prediction results among individual and ensemble ML models of HPC compressive strength. Finally, the last section summarizes the findings and conclusions.

2. Literature review

The use of computer-aided modeling for predicting the mechanical properties of construction materials is growing [12]. The many prediction techniques proposed so far include empirical models, statistical techniques, and artificial intelligence algorithms [3,8,10,13,21,29]. Some linear or non-linear regression analyses have achieved good prediction accuracy. Zain and Abd, for instance, used multivariate power equations to predict the strength of high performance concrete [8]. Similarly, Aciti developed a regression model for estimating concrete strength through non-destructive testing methods and then performed statistical tests to verify the model [7].

However, predicting the behavior of HPC is relatively more difficult than predicting the behavior of conventional concrete. Since the relationship between components and concrete properties is highly non-linear, mathematically modeling the compressive strength of HPC based on available data is difficult [30,31]. Therefore, conventional methods are often unsuitable for predicting concrete compressive strength [9]. Since conventional materials models are inadequate for simulating complex non-linear behaviors and uncertainties, researchers have proposed various AI techniques for enhancing prediction accuracy [9,13,15,20,21]. Boukhatem et al. showed that simulation models, decision support systems, and AI techniques are useful and powerful tools for solving complex problems in concrete technology [12].

Notably, researchers have applied or evaluated the capability of ANNs to predict strength and other concrete behaviors [29,30,32–34]. Ni and Wang, for instance, used multi-layer feed-forward neural networks to predict 28-day CCS based on various factors [32]. Altun et al. further showed that the ANN was superior to the regression method in estimating CCS [29]. Yeh also successfully used an ANN to predict the slump of concrete with fly ash and blast furnace slag [30]. The use of an adaptive probabilistic neural network for improving accuracy in predicting CCS was studied by Lee et al. [35].

Meanwhile, the SVM has excellent generalization capability when solving non-linear problems. The SVM can also overcome the problem of small sample size. An SVM analysis was used to estimate the temperatures at which concrete structures are damaged by fire [14]. Moreover, Gupta investigated the potential use of SVM for predicting CCS [18] by combining a radial basis function with SVM. Yan and Shi used SVM to predict the elastic modulus of normal and high strength concrete. The analytical results showed that the SVM outperformed other models [21].

Similarly, evolutionary algorithm-based methodologies have been used for knowledge discovery. Cheng et al. proposed an advanced hybrid AI model that fused fuzzy logic, weighted SVM, and fast messy genetic algorithms to predict compressive strength in HPC [16]. Comparisons showed that their model was better than SVM and the back-propagation neural network. Additionally, Mousavi et al. applied gene expression programming (GEP), a subset of genetic programming (GP), to approximate the compressive strength in various HPC mixes [36]. The prediction performance of the optimal GEP model was superior to that of regression-based models. Yeh and Lien combined an operation tree and a genetic algorithm into a proposed Genetic Operation Tree (GOT) for automatically obtaining self-organized formulas that accurately predict HPC compressive strength [9]. Their comparisons showed that GOT was more accurate than nonlinear regression formulas. However, GOT was less accurate than neural network models.

As the need for prediction accuracy increases, model complexity approaches that of combined ML techniques. The unique advantages of ensemble learning methods are apparent when solving problems involving small sample sizes, high dimensionality, and complex database structures [24,27,37–39]. Chou and Tsai, for example, improved accuracy in predicting HPC compressive strength by using a novel hierarchical approach that combined classification and regression techniques [23].

Generally, various ensemble approaches may prove to be efficient tools for solving problems that are difficult or impossible to solve by individual ML techniques or by conventional regression methods. However, very few studies have compared various single and ensemble learning techniques for predicting HPC compressive strength. Therefore, the objective of this study was to evaluate the usefulness of ML techniques and to identify, in terms of error rate, the best technique for predicting HPC compressive strength.

3. Methodology

Machine learning technologies are now used in many fields to simulate materials behavior. For predicting HPC compressive strength, ML-based methodologies include the artificial neural network, classification and regression trees (CART), linear regression (LR), and SVM. These techniques were chosen because they are among the most popular and most widely applied techniques in related works (cf. Sections 1 and 2), and some of them are also recognized as top data mining algorithms [40].

3.1. Machine learning techniques

3.1.1. Artificial neural network

The ANN is a powerful tool for solving very complex problems. Essentially, the processing elements of a neural network resemble neurons in the human brain and consist of many simple computational elements arranged in layers. The use of ANNs to predict CCS has been studied intensively [5,19,30,32,34]. Multilayer perceptron (MLP) neural networks are standard neural network models with an input layer containing a set of sensory input nodes representing concrete components, one or more hidden layers containing computation nodes, and an output layer containing one computation node representing CCS. Like any intelligence model, ANNs have learning capability.

The most widely used and effective learning algorithm for training an MLP neural network is the back-propagation (BP) algorithm, which adjusts connection weights and bias values during training. An activated neuron in a hidden or output layer can be expressed as

net_j = Σ_i w_ji x_i   and   y_j = f(net_j)    (2)

where net_j is the activation of the jth neuron, i indexes the neurons in the preceding layer, w_ji is the weight of the connection between neuron j and neuron i, x_i is the output of neuron i, and y_j is the neuron output given by the sigmoid or logistic transfer function f:

f(net_j) = 1 / (1 + e^(−k·net_j))    (3)

where k controls the function gradient. The formula for training and updating the weights w_ji in each cycle h is

w_ji(h) = w_ji(h−1) + Δw_ji(h)    (4)

The change Δw_ji(h) is

Δw_ji(h) = η δ_pi x_pi + α Δw_ji(h−1)    (5)

where η is the learning rate parameter, δ_pi is the propagated error, x_pi is the output of neuron i for record p, α is the momentum parameter, and Δw_ji(h−1) is the change in w_ji in the previous cycle.
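A minimal sketch of Eqs. (2)-(5) follows, assuming a single neuron and a given error term (a full BP implementation would compute δ_pi from the output error). The learning rate 0.3 and momentum 0.2 mirror the default MLP settings listed later in Table 3; all other values are hypothetical:

```python
import math

def sigmoid(net, k=1.0):
    # Eq. (3): logistic transfer function; k controls the gradient.
    return 1.0 / (1.0 + math.exp(-k * net))

def update_weight(w_prev, delta_w_prev, error_term, neuron_output,
                  learning_rate=0.3, momentum=0.2):
    # Eqs. (4)-(5): gradient step plus a momentum term that reuses the
    # previous cycle's weight change.
    delta_w = learning_rate * error_term * neuron_output + momentum * delta_w_prev
    return w_prev + delta_w, delta_w

# Eq. (2): one forward pass for a single neuron, net_j = sum_i w_ji * x_i.
weights = [0.5, -0.3]
inputs = [1.0, 2.0]
net = sum(w * x for w, x in zip(weights, inputs))
output = sigmoid(net)

# One training update for the first weight (error term assumed given).
new_w, dw = update_weight(weights[0], delta_w_prev=0.0,
                          error_term=0.1, neuron_output=inputs[0])
```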

3.1.2. Support vector machine

The SVM, which was developed by Vapnik in 1995 [41], is widely used for classification, forecasting, and regression. The high learning capability of SVMs has been confirmed in the civil engineering field [14,15,18,21]. In this supervised learning method, SVMs are generated from the input–output mapping functions of a labeled training dataset. An SVM target can be either a classification target, which has only two values (i.e., 0 and 1), or a regression target, which has a continuous real value. The regression model of SVMs is typically used to construct the input–output model because it effectively solves nonlinear regression problems [42].

The input for SVM regression is first mapped to an n-dimensional feature space by using a fixed mapping procedure. Nonlinear kernel functions then fit the high-dimensional feature space, in which the input data become more separable than in the original input space. The linear model in this space is f(x, w), which can be expressed by the following equation:

f(x, w) = Σ_{j=1..n} w_j g_j(x) + b    (6)

where g_j(x) is a set of nonlinear transformations from the input space, b is a bias term, and w denotes the weight vector estimated by minimizing a regularized risk function that includes the empirical risk.

Estimation quality is also measured by a loss function L_ε, where

L_ε = L_ε(y, f(x, w)) =
    0                if |y − f(x, w)| ≤ ε
    |y − f(x, w)|    otherwise            (7)

The novel feature of SVM regression is its ε-insensitive loss function, which computes a linear regression function in the new higher-dimensional feature space while simultaneously decreasing model complexity by minimizing ||w||². This function is introduced by including nonnegative slack variables ξ_i and ξ_i*, where i = 1, …, n, to identify training samples that fall outside the ε-insensitive zone. The SVM regression can thus be formulated by minimizing the following function:

min  (1/2)||w||² + C Σ_{i=1..n} (ξ_i + ξ_i*)    (8)

subject to
    y_i − f(x_i, w) ≤ ε + ξ_i
    f(x_i, w) − y_i ≤ ε + ξ_i*
    ξ_i, ξ_i* ≥ 0,  i = 1, …, n

This optimization problem can be transformed into a dual problem, which is solved by

f(x) = Σ_{i=1..n_SV} (α_i − α_i*) K(x, x_i)   subject to  0 ≤ α_i* ≤ C,  0 ≤ α_i ≤ C    (9)

where n_SV is the number of support vectors. The kernel function is

K(x, x_i) = Σ_{j=1..m} g_j(x) g_j(x_i)    (10)

During training, selected SVM kernel functions (i.e., linear, radial basis, polynomial, or sigmoid functions) are used to identify support vectors along the function surface. The default kernel parameters depend on the kernel type and on the implemented software.
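Two of the pieces above are easy to sketch directly. The following illustrative Python functions implement the loss of Eq. (7) as stated (note that some formulations subtract ε from the error outside the tube) and a radial basis kernel, one common choice of K(x, x_i); the ε and gamma values are arbitrary assumptions, not the paper's settings:

```python
import math

def eps_insensitive_loss(y, f_x, eps=0.1):
    # Eq. (7): errors inside the epsilon tube incur no penalty.
    err = abs(y - f_x)
    return 0.0 if err <= eps else err

def rbf_kernel(x, x_i, gamma=0.5):
    # Radial basis kernel: K(x, x_i) = exp(-gamma * ||x - x_i||^2).
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_i))
    return math.exp(-gamma * sq_dist)

inside_tube = eps_insensitive_loss(50.0, 50.05)      # -> 0.0
same_point = rbf_kernel([1.0, 2.0], [1.0, 2.0])      # -> 1.0
```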

3.1.3. Classification and regression tree

The CART is a machine-learning method for constructing prediction models from data. This decision tree method constructs classification trees or regression trees depending on the variable type, which may be categorical or numerical [43,44]. Breiman et al. showed that a learning tree can be optimized by using a learning data set to prune the saturated tree and select among the obtained sequence of nested trees [43]. This process helps to retain a simple tree, which ensures robustness.

Depending on the target field, several impurity measures can be used to locate splits for CART models. For instance, the Gini index is usually applied to symbolic (categorical) target fields, while the least-squared deviation method is used for continuous targets. For node t in a CART, the Gini index g(t) is defined as

g(t) = Σ_{j≠i} p(j|t) p(i|t)    (11)

where i and j are target field categories and

p(j|t) = p(j, t) / p(t),   p(j, t) = p(j) N_j(t) / N_j,   and   p(t) = Σ_j p(j, t)    (12)

Fig. 1. Ten-fold cross-validation method.

where p(j) is the prior probability value for category j, N_j(t) is the number of records in category j of node t, and N_j is the number of records of category j in the root node. Notably, when the improvement after a split during tree growth is determined by using the Gini index, only records in node t and only root nodes with valid split predictors are used to compute N_j(t) and N_j, respectively.

Fig. 2. Materials used in regular concrete and HPC.

Table 1. Sources of datasets in the literature.

Dataset     Data source               Supplementary cementing materials                Laboratory    Sample size
Dataset 1   Yeh [31]; Yeh [54]        Blast-furnace slag; fly ash; super-plasticizer   Taiwan        1,030; 103
Dataset 2   Videla et al. [55]        Blast-furnace slag; silica fume                  Chile         194
Dataset 3   Lam et al. [56]           Fly ash; silica fume                             Hong Kong     144
Dataset 4   Lim et al. [57]           Fly ash; silica fume; super-plasticizer          South Korea   104
Dataset 5   Safarzadegan et al. [58]  Metakaolin                                       Iran          100

Many studies have investigated the use of CART in various fields, including road safety analysis, traffic engineering, and motor vehicle emissions [45]. A novel feature of CART is its automatic search for the best predictors and the best threshold values for all predictors to classify the target variable.
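For a two-class node, Eq. (11) reduces to the familiar identity g(t) = 1 − Σ_j p(j|t)², since the cross terms Σ_{j≠i} p(j|t) p(i|t) are what remain when the squared probabilities are subtracted from (Σ_j p(j|t))² = 1. A minimal illustrative sketch (not the paper's WEKA implementation):

```python
def gini_index(class_counts):
    # Eq. (11): g(t) = sum_{j != i} p(j|t) p(i|t) = 1 - sum_j p(j|t)^2.
    total = sum(class_counts)
    probs = [c / total for c in class_counts]
    return 1.0 - sum(p * p for p in probs)

# A pure node (one class) has zero impurity; an even split is maximal
# for the two-class case.
pure_node = gini_index([10, 0])   # -> 0.0
even_node = gini_index([5, 5])    # -> 0.5
```

CART evaluates candidate splits by how much they reduce this impurity, which is the automatic search described above.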

3.1.4. Linear regression

The multiple linear regression (LR) model, an extension of the simple regression model, determines the relationship between a numerical response variable and two or more explanatory variables [46,47]. This model specifies that an appropriate function of the fitted probability of the event is a linear function of the observed values of the available explanatory variables.

In the literature, LR is commonly used for modeling the mechanical properties of construction materials [6,7,20,23,48]. The computational problem addressed by LR is fitting a hyperplane to an n-dimensional space, where n is the number of independent variables. For a system with n inputs (independent variables) X and one output (dependent variable) Y, the general least squares problem is to determine the unknown parameters of the linear regression model. Because of its simplicity, this study investigated the applicability of LR. The general formula for LR models is shown in Eq. (13).

Y = b_0 + b_1x_1 + b_2x_2 + … + b_nx_n + e    (13)

In the proposed model, Y is the concrete compressive strength, b_i is a regression coefficient (i = 1, 2, …, n), e is an error term, and the X values represent concrete attributes. Regression analysis estimates the unbiased values of the regression coefficients b_i from the training data set. The LR model applies four regression methods using ordinary least squares estimation: enter, stepwise, forward, and backward [49]. Equation (14) is a concise vector-matrix form.

Y = bX + e    (14)
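To make the least squares estimation concrete, here is a minimal pure-Python sketch of the one-predictor case of Eq. (13); the cement/strength values are hypothetical, and a real analysis would use a statistics package and all n predictors:

```python
def fit_simple_lr(xs, ys):
    # Ordinary least squares for Y = b0 + b1 * x + e, the one-predictor
    # case of Eq. (13); the coefficients minimize the squared error.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical data: compressive strength (MPa) vs. cement content (kg/m3).
cement = [300.0, 350.0, 400.0, 450.0]
strength = [30.0, 36.0, 42.0, 48.0]
b0, b1 = fit_simple_lr(cement, strength)
# For this exactly linear data, b0 is about -6.0 and b1 is about 0.12.
```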

3.2. Ensemble models and cross-validation

Various supervised learning techniques, such as the SVM, classification and regression tree, linear regression, and multilayer perceptron neural network [40], are typically used individually to construct single classifiers as benchmark models. In ML, ensemble classifiers, or combinations of (different) classifiers, have proven superior to many individual classifiers [50,51].

Specifically, a combination of classifiers can compensate for errors made by the individual classifiers on different parts of the input space. Therefore, the strategy used in ensemble systems is to create many classifiers and combine their outputs such that the combination improves upon the performance of a single classifier in isolation [52].



Next, the three methods used to combine classifiers into ensembles in this study are described.

3.2.1. Voting method

The simplest method of combining multiple classifiers is voting. For prediction, the outputs of the individual classifiers are pooled, and the class that receives the largest number of votes is selected as the final classification decision. In general, a numerical output can be determined by different combinations of probability estimates.

By combining two to four different individual classifiers, this study obtained eleven different ensemble classifiers. Ensembles of two different classifiers included MLP+CART, MLP+SVM, MLP+LR, CART+SVM, CART+LR, and SVM+LR; ensembles of three different classifiers included MLP+CART+SVM, MLP+CART+LR, CART+SVM+LR, and MLP+SVM+LR. One ensemble of four different classifiers was used: MLP+CART+SVM+LR.
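For a numeric target such as compressive strength, the "vote" amounts to averaging the individual model outputs. A minimal illustrative sketch (the MLP/SVM/LR outputs below are hypothetical stand-ins, not results from the paper):

```python
def average_vote(model_predictions):
    # Combine numeric predictions from several models by averaging,
    # one combined value per test sample.
    n_models = len(model_predictions)
    n_samples = len(model_predictions[0])
    return [sum(preds[i] for preds in model_predictions) / n_models
            for i in range(n_samples)]

# Hypothetical outputs of an MLP+SVM+LR ensemble on three test mixes (MPa).
mlp_out = [41.0, 52.0, 60.0]
svm_out = [40.0, 50.0, 61.0]
lr_out = [45.0, 54.0, 65.0]
combined = average_vote([mlp_out, svm_out, lr_out])  # -> [42.0, 52.0, 62.0]
```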

3.2.2. Bagging method

The bagging method uses the bootstrap method to train several classifiers independently on different training sets [26]. Bootstrapping builds k replicate training datasets, used to construct k independent classifiers, by randomly re-sampling the original training dataset with replacement. That is, each training example may appear repeatedly, or not at all, in any particular one of the k replicated training datasets. The k classifiers are then aggregated through an appropriate combination method, e.g., an average of probabilities.
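The bootstrap resampling step can be sketched as follows; k = 10 matches the default number of bagging iterations listed later in Table 3, while the seed and toy records are arbitrary assumptions:

```python
import random

def bootstrap_replicates(dataset, k, seed=0):
    # Draw k replicate training sets by sampling with replacement; each
    # replicate has the same size as the original dataset, so a given
    # record may appear several times or not at all.
    rng = random.Random(seed)
    return [[rng.choice(dataset) for _ in range(len(dataset))]
            for _ in range(k)]

records = list(range(10))  # stand-ins for concrete mix records
replicates = bootstrap_replicates(records, k=10)
```

One base learner is then trained on each replicate, and their predictions are averaged.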

Table 2. The HPC attributes in the datasets.

Dataset 1 – Taiwan
Parameter                       Unit     Min.     Ave.      Max.     Direction
Cement                          kg/m3    102.0    276.50    540.0    Input
Blast-furnace slag              kg/m3    0.0      74.27     359.4    Input
Fly ash                         kg/m3    0.0      62.81     260.0    Input
Water                           kg/m3    121.8    182.98    247.0    Input
Super-plasticizer               kg/m3    0.0      6.42      32.2     Input
Coarse aggregate                kg/m3    708.0    964.83    1145.0   Input
Fine aggregate                  kg/m3    594.0    770.49    992.6    Input
Age of testing                  Days     1.0      44.06     365.0    Input
Concrete compressive strength   MPa      2.3      35.84     82.6     Output

Dataset 2 – Chile
Coarse aggregate                kg/m3    1105.0   1135.73   1173.0   Input
Fine aggregate                  kg/m3    488.0    602.71    700.0    Input
Cement                          kg/m3    408.0    518.31    659.0    Input
Silica fume                     kg/m3    0.0      24.57     59.0     Input
Water                           kg/m3    160.0    164.74    168.0    Input
Plasticizer                     kg/m3    2.2      2.73      3.3      Input
High-range water-reducing       kg/m3    6.7      9.30      14.5     Input
Entrapped air content           %        1.3      1.82      2.5      Input
Age of testing                  Days     1.0      18.81     56.0     Input
Concrete compressive strength   MPa      21.2     67.11     113.7    Output

Dataset 3 – Hong Kong
Fly ash replacement ratio       %        0.0      25.00     55.0     Input
Silica fume replacement ratio   %        0.0      1.88      5.0      Input
Total cementitious material     kg/m3    400.0    436.67    500.0    Input
Fine aggregate                  kg/m3    536.0    639.38    724.0    Input
Coarse aggregate                kg/m3    1086.0   1125.00   1157.0   Input
Water content                   lit/m3   150.0    171.98    205.0    Input
High rate water reducing agent  lit/m3   0.0      4.87      13.0     Input
Age of samples                  Days     3.0      60.67     180.0    Input
Concrete compressive strength   MPa      7.8      56.63     107.8    Output

Dataset 4 – South Korea
Water to binder ratio           %        30.0     37.60     45.0     Input
Water content                   kg/m3    160.0    170.00    180.0    Input
Fine aggregate                  %        37.0     46.00     53.0     Input
Fly ash                         %        0.0      10.10     20.0     Input
Air entraining ratio            kg/m3    0.04     0.05      0.08     Input
Super-plasticizer               kg/m3    1.89     4.48      8.5      Input
Concrete compressive strength   MPa      38.0     52.68     74.0     Output

Dataset 5 – Iran
Cement                          kg/m3    320.0    357.4     400.0    Input
Coarse aggregate                kg/m3    765.0    881.3     954.0    Input
Fine aggregate                  kg/m3    796.0    884.7     1017.5   Input
Water                           kg/m3    140.0    173.0     200.0    Input
Metakaolin                      kg/m3    0.0      42.0      80.0     Input
Age of testing                  Days     7.0      76.3      180.0    Input
Concrete compressive strength   MPa      19.0     49.3      82.5     Output

Applying bagging to each of the four individual learning techniques, this study built an MLP ensemble, an SVM ensemble, a CART ensemble, and an LR ensemble.

3.2.3. Stacking method

Stacking, or stacked generalization [25], is a method of constructing multi-level classifiers hierarchically. The first level consists of several single classifiers, and the outputs of the first-level classifiers are used to train the 'stacked' classifier, i.e., the second-level classifier. Therefore, the final decision depends on the output of the stacked classifier. Unlike the combination methods described above, such as voting, which use a 'static' combiner, stacking uses a 'trainable' combiner. That is, it estimates the classifier errors for a particular learning dataset and then corrects them.

Since the SVM performs better than many supervised learning techniques in many pattern recognition problems, the stacking-based ensemble classifiers in this study used a two-level scheme with three different individual classifiers in the first level and an SVM regression in the second level [53].
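The two-level structure can be sketched as follows. Everything here is a hypothetical stand-in: the base models are trivial functions rather than trained MLP/CART/LR models, and the fixed weighted sum stands in for the second-level SVM regressor, whose weights a real stacking scheme would learn from the first-level outputs on training data:

```python
def stack_predict(first_level_models, second_level_model, x):
    # Level 1: each base model produces a prediction for input x.
    level1_outputs = [model(x) for model in first_level_models]
    # Level 2: the trainable combiner maps those outputs to a final value.
    return second_level_model(level1_outputs)

# Hypothetical stand-ins for trained base models (e.g., MLP, CART, LR).
base_models = [lambda x: 1.1 * x, lambda x: 0.9 * x, lambda x: x + 2.0]

def combiner(outs):
    # Stand-in for the trained second-level regressor: a fixed weighted sum.
    return 0.5 * outs[0] + 0.3 * outs[1] + 0.2 * outs[2]

final = stack_predict(base_models, combiner, 10.0)
```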

3.2.4. Cross-validation method

The k-fold cross-validation algorithm is often used to minimize the bias associated with random sampling of training and held-out data samples. Since Kohavi reported that ten-fold validation testing yields the optimal computational time and reliable variance [28], this work applied a stratified ten-fold cross-validation approach to assess model performance. This method categorizes a fixed number of data samples into ten subsets. In each of ten rounds of model building and validation, it chooses a different data subset for testing and trains the model with the remaining nine data subsets.

Table 2 (continued). Average (Ave.) and maximum (Max.) values of each attribute, and its direction (input/output).

Dataset 1 – Taiwan
  Cement                          276.50 / 540.0    Input
  Blast furnace slag               74.27 / 359.4    Input
  Fly ash                          62.81 / 260.0    Input
  Water                           182.98 / 247.0    Input
  Superplasticizer                  6.42 / 32.2     Input
  Coarse aggregate                964.83 / 1145.0   Input
  Fine aggregate                  770.49 / 992.6    Input
  Age of testing                   44.06 / 365.0    Input
  Concrete compressive strength    35.84 / 82.6     Output

Dataset 2 – Chile
  Coarse aggregate               1135.73 / 1173.0   Input
  Fine aggregate                  602.71 / 700.0    Input
  Cement                          518.31 / 659.0    Input
  Silica fume                      24.57 / 59.0     Input
  Water                           164.74 / 168.0    Input
  Plasticizer                       2.73 / 3.3      Input
  High-range water reducer          9.30 / 14.5     Input
  Entrapped air content             1.82 / 2.5      Input
  Age of testing                   18.81 / 56.0     Input
  Concrete compressive strength    67.11 / 113.7    Output

Dataset 3 – Hong Kong
  Fly ash replacement ratio        25.00 / 55.0     Input
  Silica fume replacement ratio     1.88 / 5.0      Input
  Total cementitious material     436.67 / 500.0    Input
  Fine aggregate                  639.38 / 724.0    Input
  Coarse aggregate               1125.00 / 1157.0   Input
  Water content                   171.98 / 205.0    Input
  High-rate water-reducing agent    4.87 / 13.0     Input
  Age of samples                   60.67 / 180.0    Input
  Concrete compressive strength    56.63 / 107.8    Output

Dataset 4 – South Korea
  Water-to-binder ratio            37.60 / 45.0     Input
  Water content                   170.00 / 180.0    Input
  Fine aggregate                   46.00 / 53.0     Input
  Fly ash                          10.10 / 20.0     Input
  Air-entraining ratio              0.05 / 0.08     Input
  Super-plasticizer                 4.48 / 8.5      Input
  Concrete compressive strength    52.68 / 74.0     Output

Dataset 5 – Iran
  Cement                          357.4 / 400.0     Input
  Coarse aggregate                881.3 / 954.0     Input
  Fine aggregate                  884.7 / 1017.5    Input
  Water                           173.0 / 200.0     Input
  Metakaolin                       42.0 / 80.0      Input
  Age of testing                   76.3 / 180.0     Input
  Concrete compressive strength    49.3 / 82.5      Output


Table 3
Default model parameter settings.

Model      Parameter                      Setting
MLP        Hidden layers                  1
           Learning rate                  0.3
           Momentum                       0.2
           Training time                  500
           Validation threshold           20
CART       Initial count                  0.0
           Max depth                      -1
           MinNum                         2.0
           minVarianceProp                0.001
           noPruning                      false
           numFolds                       3
           seed                           1
SVM        C                              1.0
           Kernel                         RBF
LR         EliminateColinearAttributes    True
           Minimal                        False
           Ridge                          1.0E-8
Voting     classifiers                    2-4 weka.classifiers.Classifier
           combinationRule                Average of Probabilities
Bagging    bagSizePercent                 100
           numIterations                  10
Stacking   classifiers                    3 weka.classifiers.Classifier
           metaClassifier                 SMOreg
           numFolds                       10

Table 4
Prediction performances of individual models.

Dataset                  ML technique   MAE (MPa)   RMSE (MPa)   MAPE (%)   SI
Dataset 1 – Taiwan       MLP            6.19        7.95         20.84      0.54
                         CART           5.86        7.84         20.66      0.50
                         SVM            3.75        5.59         12.03      0.00
                         LR             7.87        10.11        29.89      1.00
Dataset 2 – Chile        MLP            4.00        5.40         6.81       0.00
                         CART           4.29        5.72         7.35       0.04
                         SVM            8.02        10.32        16.95      0.61
                         LR             11.38       13.49        21.87      1.00
Dataset 3 – Hong Kong    MLP            4.28        5.81         10.51      0.00
                         CART           7.29        9.28         16.31      0.56
                         SVM            5.34        6.62         13.26      0.19
                         LR             9.56        11.04        23.91      1.00
Dataset 4 – South Korea  MLP            1.43        1.90         11.34      0.47
                         CART           1.86        2.58         3.22       0.68
                         SVM            1.31        1.73         2.95       0.00
                         LR             1.72        2.20         3.41       0.45
Dataset 5 – Iran         MLP            6.08        7.87         14.16      0.27
                         CART           5.42        7.11         12.55      0.00
                         SVM            7.73        10.10        18.83      1.00
                         LR             6.47        7.86         15.68      0.40

The best model for each dataset is the one with the lowest SI value.

776 J.-S. Chou et al. / Construction and Building Materials 73 (2014) 771–780

subsets. The test subset is used to validate model accuracy (Fig. 1). Algorithm accuracy is then expressed as the average accuracy acquired by the ten models in the ten validation rounds.
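The ten-fold procedure can be sketched as follows. scikit-learn's plain KFold is used as an approximation (the paper's stratified variant additionally balances the target distribution across folds), and the data are synthetic.

```python
# Ten-fold cross-validation: nine subsets train, one subset tests, in each round.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(2)
X = rng.uniform(size=(200, 8))
y = X.sum(axis=1)

fold_mae = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=2).split(X):
    model = DecisionTreeRegressor().fit(X[train_idx], y[train_idx])   # nine subsets
    fold_mae.append(mean_absolute_error(y[test_idx],                  # held-out subset
                                        model.predict(X[test_idx])))

avg_mae = float(np.mean(fold_mae))  # accuracy reported as the ten-round average
```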

3.3. Performance evaluation methods

The following performance measures were used to evaluate the accuracy of the proposed machine learning models.

• Mean absolute error

\[ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y - y'\right| \tag{15} \]

where y' is the predicted value, y is the actual value, and n is the number of data samples.

• Root mean squared error

\[ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y' - y\right)^2} \tag{16} \]

• Mean absolute percentage error

\[ \mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y - y'}{y}\right| \tag{17} \]
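Eqs. (15)-(17) translate directly into code; the small y and y_hat arrays below are illustrative, not values from the paper.

```python
# Direct transcription of the three error measures for actual values y
# and predictions y_hat.
import numpy as np

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))                 # Eq. (15)

def rmse(y, y_hat):
    return np.sqrt(np.mean((y_hat - y) ** 2))         # Eq. (16)

def mape(y, y_hat):
    return np.mean(np.abs((y - y_hat) / y)) * 100.0   # Eq. (17), as a percentage

y = np.array([30.0, 40.0, 50.0])      # hypothetical strengths, MPa
y_hat = np.array([33.0, 38.0, 50.0])  # hypothetical predictions
```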

To obtain a comprehensive performance measure, a synthesis index (SI) based on the three statistical measures (MAE, RMSE, and MAPE) was derived as

\[ \mathrm{SI} = \frac{1}{n}\sum_{i=1}^{n}\frac{P_i - P_{\min,i}}{P_{\max,i} - P_{\min,i}} \tag{18} \]

where n = number of performance measures and P_i = ith performance measure. The SI ranges from 0 to 1; an SI value close to 0 indicates a highly accurate predictive model.
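Eq. (18) can be computed as follows: each performance measure is min-max normalized across the candidate models and the normalized values are averaged. The perf values below are hypothetical, not taken from the paper's tables.

```python
# Synthesis index (SI): min-max normalize each measure across models, then average.
import numpy as np

def synthesis_index(perf):
    """perf: (models x measures) array, e.g. columns MAE, RMSE, MAPE."""
    p_min = perf.min(axis=0)   # best value of each measure across models
    p_max = perf.max(axis=0)   # worst value of each measure across models
    return ((perf - p_min) / (p_max - p_min)).mean(axis=1)  # one SI per model

perf = np.array([[4.0, 5.5, 12.0],    # hypothetical model A: MAE, RMSE, MAPE
                 [6.0, 8.0, 21.0],    # hypothetical model B
                 [8.0, 10.0, 30.0]])  # hypothetical model C
si = synthesis_index(perf)            # best model -> SI = 0, worst -> SI = 1
```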

4. Modeling experiments

4.1. Data description and preparation

Data collected from reliable laboratory tests and published studies were used as experimental data in evaluations of the forecasting performance of the proposed ML models. Supplementary cement materials such as fly ash, blast furnace slag, metakaolin, and silica fume are often added to HPC [2] to improve its material properties (Fig. 2). Table 1 lists the five multi-nation datasets with various additional cementing materials.

Dataset 1 – Taiwan: In this experimental dataset originally collected by Yeh [31,54], the final set of 1133 samples of ordinary Portland cement concrete containing various additives and cured under normal conditions was evaluated from numerous university research labs. All tests were performed on 15-cm cylindrical concrete specimens prepared using standard procedures.

Dataset 2 – Chile [55]: Rapid hardening Portland blast-furnace slag cement, silica fume, coarse and fine crushed siliceous aggregates, and plasticizer and high-range water-reducing chemical admixtures were used. Concrete trial mixtures were proportioned using the coarse aggregate dosages recommended by ACI 211.4R-93. All samples were stored in a standard curing room (T = 20 ± 3 °C and relative humidity [RH] > 90%) until testing at varying ages. Compressive strength tests were performed on two 150 × 300 mm cylindrical samples according to ASTM C 39 on days 1, 3, 7, 28, and 56.

Dataset 3 – Hong Kong [56]: Concrete mixes were prepared at different ratios of water to cementitious materials, with low and high volumes of fly ash, and with or without addition of small amounts of silica fume. The 144 different samples consisted of 24 different mixes cured for 3, 7, 28, 56, or 180 days. In each mix series, the percentage of cement replacement by fly ash (on a direct weight-to-weight basis) varied from 0% to 55%. Some mixes contained a further 5% silica fume replacement. The cementitious materials were Portland cement equivalent to ASTM Type I, low-calcium fly ash equivalent to ASTM Class F, and a condensed silica fume commercially available in Hong Kong.

Dataset 4 – South Korea [57]: All materials used in this experiment were produced in South Korea. Compressive strength was 40–80 MPa. The W/B varied between 0.30 and 0.45, the amount of fly ash varied from 0% to 20% of the total binder, and the contents of super-plasticizer and air-entraining agent were 0–2% and 0.010–0.013%, respectively, when expressed as a


Table 5
Prediction performances of ensemble models.

Ensemble method   Model                MAE (MPa)   RMSE (MPa)   MAPE (%)   SI

Dataset 1 – Taiwan
Voting            MLP+CART             5.13        6.67         17.85      0.05
                  MLP+SVM              8.35        10.65        30.20      0.17
                  MLP+LR               6.13        7.87         22.29      0.09
                  CART+SVM             3.97        5.51         14.24      0.02
                  CART+LR              5.70        7.40         21.50      0.08
                  SVM+LR               5.27        7.05         19.88      0.06
                  MLP+CART+SVM         12.08       15.45        35.44      0.27
                  MLP+CART+LR          36.32       38.41        111.04     1.00
                  CART+SVM+LR          4.66        6.18         17.54      0.04
                  MLP+SVM+LR           6.51        8.47         23.90      0.10
                  MLP+CART+SVM+LR      28.84       31.71        80.89      0.76
Bagging           MLP                  15.66       19.70        53.40      0.41
                  CART                 4.26        5.71         15.02      0.02
                  SVM                  4.01        5.75         14.09      0.02
                  LR                   7.88        10.13        29.96      0.16
Stacking          MLP+CART+SVM         7.25        9.48         25.92      0.13
                  MLP+CART+LR          7.03        9.02         24.59      0.12
                  CART+SVM+LR          3.52        5.08         11.97      0.00
                  MLP+SVM+LR           5.92        7.66         20.64      0.08

Dataset 2 – Chile
Voting            MLP+CART             18.83       22.06        25.30      0.26
                  MLP+SVM              10.74       12.79        19.71      0.14
                  MLP+LR               54.97       70.13        89.40      1.00
                  CART+SVM             4.91        6.39         9.80       0.03
                  CART+LR              5.90        7.52         11.47      0.05
                  SVM+LR               8.51        10.87        17.98      0.11
                  MLP+CART+SVM         17.54       20.79        25.67      0.25
                  MLP+CART+LR          48.26       58.00        74.39      0.83
                  CART+SVM+LR          5.98        7.76         12.50      0.05
                  MLP+SVM+LR           7.84        9.65         15.27      0.08
                  MLP+CART+SVM+LR      37.98       45.95        56.16      0.63
Bagging           MLP                  20.91       24.31        38.68      0.34
                  CART                 3.82        5.13         6.50       0.00
                  SVM                  9.66        12.06        19.82      0.13
                  LR                   11.49       13.52        21.79      0.15
Stacking          MLP+CART+SVM         5.97        7.49         9.94       0.04
                  MLP+CART+LR          5.86        7.30         9.71       0.04
                  CART+SVM+LR          4.86        6.30         8.30       0.02
                  MLP+SVM+LR           5.66        7.15         9.37       0.03

Dataset 3 – Hong Kong
Voting            MLP+CART             19.22       22.46        31.77      0.08
                  MLP+SVM              10.75       12.87        28.26      0.04
                  MLP+LR               161.77      166.78       403.48     1.00
                  CART+SVM             4.89        6.26         11.35      0.00
                  CART+LR              5.78        7.12         14.74      0.01
                  SVM+LR               6.14        7.51         16.19      0.01
                  MLP+CART+SVM         18.04       21.53        34.79      0.08
                  MLP+CART+LR          95.62       99.98        251.60     0.59
                  CART+SVM+LR          4.81        6.09         12.54      0.00
                  MLP+SVM+LR           7.10        8.61         17.86      0.02
                  MLP+CART+SVM+LR      70.32       74.84        193.99     0.44
Bagging           MLP                  21.33       26.03        67.16      0.12
                  CART                 5.63        6.98         14.97      0.01
                  SVM                  6.13        7.72         15.93      0.01
                  LR                   9.64        11.10        24.14      0.03
Stacking          MLP+CART+SVM         6.40        8.09         15.74      0.01
                  MLP+CART+LR          5.57        7.38         14.22      0.01
                  CART+SVM+LR          5.22        7.01         12.18      0.00
                  MLP+SVM+LR           4.70        6.02         11.75      0.00




Table 5 (continued)

Ensemble method   Model                MAE (MPa)   RMSE (MPa)   MAPE (%)   SI

Dataset 4 – South Korea
Voting            MLP+CART             4.78        6.18         8.42       0.09
                  MLP+SVM              4.44        4.98         8.61       0.08
                  MLP+LR               45.53       49.72        79.73      1.00
                  CART+SVM             1.29        1.79         2.50       0.01
                  CART+LR              1.50        2.05         2.89       0.01
                  SVM+LR               1.26        1.65         2.44       0.01
                  MLP+CART+SVM         5.46        6.66         9.93       0.10
                  MLP+CART+LR          26.15       30.72        49.58      0.59
                  CART+SVM+LR          1.24        1.70         2.39       0.01
                  MLP+SVM+LR           1.35        1.78         2.64       0.01
                  MLP+CART+SVM+LR      17.79       21.51        33.72      0.40
Bagging           MLP                  5.74        7.34         11.34      0.11
                  CART                 1.68        2.37         3.22       0.02
                  SVM                  1.52        2.03         2.95       0.01
                  LR                   1.73        2.17         3.41       0.01
Stacking          MLP+CART+SVM         2.43        3.63         4.61       0.04
                  MLP+CART+LR          1.64        2.34         3.28       0.01
                  CART+SVM+LR          1.09        1.51         2.11       0.00
                  MLP+SVM+LR           1.13        1.59         2.20       0.01

Dataset 5 – Iran
Voting            MLP+CART             4.71        6.22         11.17      0.14
                  MLP+SVM              5.72        7.39         13.61      0.40
                  MLP+LR               4.97        6.39         11.85      0.20
                  CART+SVM             5.12        7.06         12.61      0.29
                  CART+LR              4.26        5.90         10.56      0.06
                  SVM+LR               6.57        8.15         16.21      0.62
                  MLP+CART+SVM         4.74        6.31         11.38      0.16
                  MLP+CART+LR          4.06        5.61         9.87       0.00
                  CART+SVM+LR          4.83        6.58         12.13      0.21
                  MLP+SVM+LR           5.15        6.60         12.59      0.26
                  MLP+CART+SVM+LR      4.47        6.07         10.99      0.11
Bagging           MLP                  4.38        5.69         10.29      0.05
                  CART                 4.25        5.78         10.17      0.04
                  SVM                  7.93        10.47        19.13      1.00
                  LR                   6.22        7.75         15.00      0.52
Stacking          MLP+CART+SVM         5.47        6.99         12.42      0.31
                  MLP+CART+LR          5.21        6.53         12.21      0.25
                  CART+SVM+LR          5.54        7.17         12.64      0.33
                  MLP+SVM+LR           6.21        7.83         14.62      0.51

The best model for each dataset is the one with the lowest SI value.

Fig. 3. Average MAEs of prediction models. [Figure: horizontal bar chart plotting average mean absolute error (MAE, MPa; axis 0.00–7.00) against concrete dataset (Datasets 1–5), comparing the best single (MLP, SVM, CART), voting (CART+SVM, CART+SVM+LR, MLP+CART+LR), bagging (CART, SVM), and stacking (CART+SVM+LR, MLP+SVM+LR, MLP+CART+LR) models for each dataset.]



Table 6
Comparison of best individual and ensemble models.

Dataset                   Predictive technique       MAE (MPa)   RMSE (MPa)   MAPE (%)
Dataset 1 (Taiwan)        SVM                        3.75        5.59         12.03
                          CART+SVM+LR (stacking)     3.52        5.08         11.97
Dataset 2 (Chile)         MLP                        4.00        5.40         6.81
                          CART (bagging)             3.82        5.13         6.50
Dataset 3 (Hong Kong)     MLP                        4.28        5.81         10.51
                          MLP+SVM+LR (stacking)      4.70        6.02         11.75
Dataset 4 (South Korea)   SVM                        1.31        1.73         2.95
                          CART+SVM+LR (stacking)     1.09        1.51         2.11
Dataset 5 (Iran)          CART                       5.42        7.11         12.55
                          MLP+CART+LR (voting)       4.06        5.61         9.87


percentage of dry solids to the binder content. Portland cement prepared according to the definition for ASTM Type I was used. The coarse aggregate used was crushed granite (specific gravity, 2.7; fineness modulus, 7.2; maximum particle size, 19 mm). The fine aggregate was quartz sand (specific gravity, 2.61; fineness modulus, 2.94).

Dataset 5 – Iran [58]: Metakaolin (MK) is a very active natural pozzolan that can produce concrete with high early strength. Recently, the highly pozzolanic properties of MK have been studied intensively; incorporating MK in concrete significantly increased resistance to chloride penetration. ASTM C150 Type I Portland cement (PC) was used for all of the concrete mixtures. Coarse and fine aggregates were crushed calcareous stone (maximum size, 19 mm) and natural sand, respectively. Potable water was used for casting and curing of all concrete specimens. MK was used as the supplementary cementitious material. The percentages of MK used to replace PC in this experiment were 0%, 5%, 10%, 12.5%, 15%, and 20% by mass of concrete added to the clinker. Data for 100 samples of concrete containing MK were obtained.

The multi-nation datasets used in this study have been confirmed in part or in whole in many studies of predictive models (Table 1). Based on four predictive techniques used as baseline models, this study used five experimental datasets to investigate the performance of the ensemble models. Table 2 summarizes the statistical parameters of each database, which were obtained from various university research laboratories, including descriptive data for various additives and various curing times under normal conditions. The response/target was CCS (concrete compressive strength), and the predictor variables were the remaining attributes.

4.2. Model construction

The four different ML techniques used as single and combined classifiers in the prediction models were the multilayer perceptron (MLP) neural network, classification and regression tree (CART), SVM, and linear regression (LR). Table 3 lists the parameters used to develop these individual and ensemble models. The WEKA software was used to implement the models.
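One way to instantiate the four base learners with settings approximating the WEKA defaults of Table 3. The scikit-learn equivalents are assumptions, since the study itself ran in WEKA; for example, WEKA's ridge of 1.0E-8 is mapped here to a near-unregularized Ridge regression, and the hidden-layer width is an illustrative choice.

```python
# Approximate scikit-learn counterparts of the Table 3 base-learner settings.
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.linear_model import Ridge

models = {
    # MLP: one hidden layer, learning rate 0.3, momentum 0.2, 500 training epochs
    "MLP": MLPRegressor(hidden_layer_sizes=(10,), solver="sgd",
                        learning_rate_init=0.3, momentum=0.2, max_iter=500),
    # CART: unrestricted depth (WEKA's -1), minimum of 2 samples per leaf
    "CART": DecisionTreeRegressor(max_depth=None, min_samples_leaf=2),
    # SVM: RBF kernel with C = 1.0
    "SVM": SVR(kernel="rbf", C=1.0),
    # LR: ridge parameter 1.0E-8, i.e. effectively ordinary least squares
    "LR": Ridge(alpha=1.0e-8),
}
```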

5. Results and discussion

Tables 4 and 5 show the prediction performances of the single and ensemble learning based models, respectively, over the multi-nation datasets. Table 4 shows the cross-fold modeling performance. Among the single learning based models, SVM performed best over two datasets (Datasets 1 – Taiwan and 4 – South Korea) and MLP had the lowest error rates over two datasets (Datasets 2 – Chile and 3 – Hong Kong). Notably, the best single ML model for Dataset 5 – Iran was CART, according to the synthesis index.

Overall, the ensemble models achieved good outcomes in terms of the overall performance measures. For example, of the models based on ensemble learning by the voting approach, CART+SVM performed best in Datasets 1–4 (SI values of 0.02, 0.03, 0.00 and 0.01, respectively). For the bagging approach, the SVM or CART ensemble had the lowest error rates over Datasets 1–5 (SI values of 0.02, 0.00, 0.01, 0.01 and 0.04, respectively). For the stacking approach, the first-level combination CART+SVM+LR often had the best performance over the multi-nation datasets.

Specifically, the ensemble learning based models that performed best, in terms of SI values, over the five datasets were stacking-based CART+SVM+LR for Datasets 1 and 4, bagging-based CART for Dataset 2, stacking-based MLP+SVM+LR for Dataset 3, and voting-based MLP+CART+LR for Dataset 5. Fig. 3 shows the average MAEs for the best prediction models based on single, voting, bagging, and stacking learning. For example, the best learning models were voting-based MLP+CART+LR (5.21 MPa in Dataset 5), stacking-based CART+SVM+LR (3.52 and 1.09 MPa in Datasets 1 and 4, respectively), single MLP (4.28 MPa in Dataset 3), and bagging-based CART (3.82 MPa in Dataset 2).

Table 6 further compares the best single and ensemble learning based models over the five datasets, based on the k-fold cross-validation algorithm. The comparative results showed that the best ensemble learning based models outperformed the single learning based models over four datasets.

The comparison results indicated that SVM and MLP can be considered the two best individual learning based models and that, if chosen carefully, the ensemble techniques generally perform better than the best individual models for predicting HPC compressive strength. Specifically, the voting-based MLP+CART+LR, bagging-based CART, and stacking-based MLP/CART+SVM+LR were the best combinations for constructing ensemble learning based models.

Moreover, the analytical results indicated that most ML models achieved good performance and obtained lower error values than those of previous works, such as the multi-gene genetic programming approach (MAE = 5.480 MPa) in Dataset 1 [59], the combination of hyperbolic and exponential equations (MAE = 5.000 MPa) in Dataset 2 [55], and the weighted genetic programming approach (RMSE = 2.180 MPa) in Dataset 4 [60].

6. Conclusions

The objective of this study was to perform a comprehensive comparison of various learning techniques used individually and in combination for performing simulations of concrete compressive strength based on multi-nation datasets with diverse additive materials. Four individual ML techniques (MLP, SVM, CART, and LR) were used to construct the prediction models as benchmark baselines. The ensemble learning based models used voting, bagging, and stacking approaches to combine multiple single learning based models.

The experimental results showed that the best individual learning techniques were SVM and MLP, except for Dataset 5. On average, the stacking-based ensemble model composed of MLP/CART,



SVM, and LR as the first-level models and SVM as the second-level model performed best in terms of MAE, RMSE, and MAPE (Datasets 1, 3, and 4).

Generally, ensemble learning techniques outperformed individual learning techniques in predicting HPC compressive strength. However, individual ML based models should be selected carefully to obtain the best ensemble model. Specifically, the best individual ML model (i.e., MLP) sometimes provided lower error rates than the ensemble learning based models (Dataset 3).

The contribution of this paper to the domain knowledge is to propose and validate machine learning, voting, bagging, and stacking techniques for simulating concrete compressive strength. To maximize ease of use and modeling efficiency, this work used the default settings for the individual and ensemble models in WEKA. Therefore, further studies are needed to explore how the parameters in these models can be optimized automatically.
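Such automatic tuning could start from a simple grid search; the sketch below tunes the SVM's C and kernel width by ten-fold cross-validation on synthetic data (the grid itself is hypothetical, not taken from the paper).

```python
# Grid search over SVM hyperparameters, scored by mean absolute error.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(3)
X = rng.uniform(size=(100, 8))
y = X.sum(axis=1)

search = GridSearchCV(SVR(kernel="rbf"),
                      param_grid={"C": [0.1, 1.0, 10.0],      # hypothetical grid
                                  "gamma": ["scale", 0.1]},
                      cv=10, scoring="neg_mean_absolute_error")
search.fit(X, y)
best = search.best_params_  # parameter combination with the lowest cross-validated MAE
```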

References

[1] Sobhani J, Najimi M, Pourkhorshidi AR, Parhizkar T. Prediction of the compressive strength of no-slump concrete: a comparative study of regression, neural network and ANFIS models. Constr Build Mater 2010;24(5):709–18.

[2] Kosmatka SH, Wilson ML. Design and control of concrete mixtures, EB001. Fifteenth ed. Skokie (IL, USA): Portland Cement Association; 2011.

[3] Bharatkumar BH, Narayanan R, Raghuprasad BK, Ramachandramurthy DS. Mix proportioning of high performance concrete. Cement Concr Compos 2001;23(1):71–80.

[4] Papadakis VG, Tsimas S. Supplementary cementing materials in concrete: Part I: Efficiency and design. Cem Concr Res 2002;32(10):1525–32.

[5] Prasad BKR, Eskandari H, Reddy BVV. Prediction of compressive strength of SCC and HPC with high volume fly ash using ANN. Constr Build Mater 2009;23(1):117–28.

[6] Bhanja S, Sengupta B. Investigations on the compressive strength of silica fume concrete using statistical methods. Cem Concr Res 2002;32(9):1391–4.

[7] Atici U. Prediction of the strength of mineral-addition concrete using regression analysis. In: Concrete Research. Thomas Telford Ltd.; 2010. p. 585–92.

[8] Zain MFM, Abd SM. Multiple regression model for compressive strength prediction of high performance concrete. J Appl Sci 2009;9(1):155–60.

[9] Yeh IC, Lien L-C. Knowledge discovery of concrete material using Genetic Operation Trees. Expert Syst Appl 2009;36(3, Part 2):5807–12.

[10] Topçu İB, Sarıdemir M. Prediction of compressive strength of concrete containing fly ash using artificial neural networks and fuzzy logic. Comput Mater Sci 2008;41(3):305–11.

[11] Reich Y. Machine learning techniques for civil engineering problems. Comput Aided Civil Infrastruct Eng 1997;12(4):295–310.

[12] Boukhatem B, Kenai S, Tagnit-Hamou A, Ghrici M. Application of new information technology on concrete: an overview. J Civil Eng Manage 2011;17(2):248–58.

[13] Tiryaki S, Aydın A. An artificial neural network model for predicting compression strength of heat treated woods and comparison with a multiple linear regression model. Constr Build Mater 2014;62:102–8.

[14] Chen B-T, Chang T-P, Shih J-Y, Wang J-J. Estimation of exposed temperature for fire-damaged concrete using support vector machine. Comput Mater Sci 2009;44(3):913–20.

[15] Majid A, Khan A, Javed G, Mirza AM. Lattice constant prediction of cubic and monoclinic perovskites using neural networks and support vector regression. Comput Mater Sci 2010;50(2):363–72.

[16] Cheng M-Y, Chou J-S, Roy AFV, Wu Y-W. High-performance concrete compressive strength prediction using time-weighted evolutionary fuzzy support vector machines inference model. Automat Constr 2012;28:106–15.

[17] Peng C-H, Yeh IC, Lien L-C. Building strength models for high-performance concrete at different ages using genetic operation trees, nonlinear regression, and neural networks. Eng Comput 2010;26(1):61–73.

[18] Gupta S. Support vector machines based modelling of concrete strength. In: Proceedings of World Academy of Science: Engineering & Technology, vol. 36; 2007.

[19] Dantas ATA, Batista Leite M, de Jesus Nagahama K. Prediction of compressive strength of concrete containing construction and demolition waste using artificial neural networks. Constr Build Mater 2013;38:717–22.

[20] Chou J-S, Chiu C, Farfoura M, Al-Taharwa I. Optimizing the prediction accuracy of concrete compressive strength based on a comparison of data-mining techniques. J Comput Civil Eng 2011;25(3):242–53.

[21] Yan K, Shi C. Prediction of elastic modulus of normal and high strength concrete by support vector machine. Constr Build Mater 2010;24(8):1479–85.

[22] Uysal M, Tanyildizi H. Predicting the core compressive strength of self-compacting concrete (SCC) mixtures with mineral additives using artificial neural network. Constr Build Mater 2011;25(11):4105–11.

[23] Chou J-S, Tsai C-F. Concrete compressive strength analysis using a combined classification and regression technique. Automat Constr 2012;24:52–60.

[24] Singh KP, Gupta S, Rai P. Identifying pollution sources and predicting urban air quality using ensemble learning methods. Atmos Environ 2013;80:426–37.

[25] Wolpert DH. Stacked generalization. Neural Networks 1992;5(2):241–59.

[26] Breiman L. Bagging predictors. Mach Learn 1996;24(2):123–40.

[27] Dietterich T. Ensemble methods in machine learning. In: Multiple classifier systems. Berlin (Heidelberg): Springer; 2000. p. 1–15.

[28] Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. In: International Joint Conference on Artificial Intelligence. Morgan Kaufmann; 1995. p. 1137–43.

[29] Altun F, Kişi Ö, Aydin K. Predicting the compressive strength of steel fiber added lightweight concrete using neural network. Comput Mater Sci 2008;42(2):259–65.

[30] Yeh IC. Analysis of strength of concrete using design of experiments and neural networks. J Mater Civ Eng 2006;18(4):597–604.

[31] Yeh IC. Modeling of strength of high-performance concrete using artificial neural networks. Cem Concr Res 1998;28(12):1797–808.

[32] Ni H-G, Wang J-Z. Prediction of compressive strength of concrete by neural networks. Cem Concr Res 2000;30(8):1245–50.

[33] Parichatprecha R, Nimityongskul P. Analysis of durability of high performance concrete using artificial neural networks. Constr Build Mater 2009;23(2):910–7.

[34] Topçu İB, Sarıdemir M. Prediction of properties of waste AAC aggregate concrete using artificial neural network. Comput Mater Sci 2007;41(1):117–25.

[35] Lee JJ, Kim D, Chang SK, Nocete CFM. An improved application technique of the adaptive probabilistic neural network for predicting concrete strength. Comput Mater Sci 2009;44(3):988–98.

[36] Mousavi SM, Aminian P, Gandomi AH, Alavi AH, Bolandi H. A new predictive model for compressive strength of HPC using gene expression programming. Adv Eng Softw 2012;45(1):105–14.

[37] Adeodato PJL, Arnaud AL, Vasconcelos GC, Cunha RCLV, Monteiro DSMP. MLP ensembles improve long term prediction accuracy over single networks. Int J Forecast 2011;27(3):661–71.

[38] Erdal HI, Karakurt O, Namli E. High performance concrete compressive strength forecasting using ensemble models based on discrete wavelet transform. Eng Appl Artif Intell 2013.

[39] Erdal HI, Karakurt O. Advancing monthly streamflow prediction accuracy of CART models using ensemble learning paradigms. J Hydrol 2013;477:119–28.

[40] Wu X, Kumar V, Ross Quinlan J, Ghosh J, Yang Q, Motoda H, et al. Top 10 algorithms in data mining. Knowl Inf Syst 2008;14(1):1–37.

[41] Vapnik VN. The nature of statistical learning theory. New York: Springer-Verlag; 1995.

[42] Smola A, Schölkopf B. A tutorial on support vector regression. Stat Comput 2004;14(3):199–222.

[43] Breiman L, Friedman J, Olshen R, Stone C. Classification and regression trees. New York: Chapman & Hall/CRC; 1984.

[44] Loh W-Y. Classification and regression trees. Wiley Interdiscipl Rev Data Mining Knowl Discov 2011;1(1):14–23.

[45] de Oña J, de Oña R, Calvo FJ. A classification tree approach to identify key factors of transit service quality. Expert Syst Appl 2012;39(12):11164–71.

[46] Neter J, Kutner MH, Nachtsheim CJ, Wasserman W. Applied linear statistical models. Fourth ed. McGraw-Hill/Irwin; 1996.

[47] Liang H, Song W. Improved estimation in multiple linear regression models with measurement error and general constraint. J Multivar Anal 2009;100(4):726–41.

[48] Chen L. A multiple linear regression prediction of concrete compressive strength based on physical properties of electric arc furnace oxidizing slag. Int J Appl Sci Eng 2010;7(2):153–8.

[49] Yan X, Su XG. Linear regression analysis: theory and computing. Singapore: World Scientific Publishing Co. Pte. Ltd.; 2009.

[50] Frosyniotis D, Stafylopatis A, Likas A. A divide-and-conquer method for multi-net classifiers. Pattern Anal Appl 2003;6(1):32–40.

[51] Ghosh J. Multiclassifier systems: back to the future. In: Roli F, Kittler J, editors. Multiple classifier systems. Berlin (Heidelberg): Springer; 2002. p. 1–15.

[52] Polikar R. Ensemble based systems in decision making. IEEE Circuits Syst Mag 2006;6(3):21–45.

[53] Lee S-W, Byun H. A survey on pattern recognition applications of support vector machines. Int J Pattern Recognit Artif Intell 2003;17(3):459–86.

[54] Yeh IC. Modeling slump of concrete with fly ash and superplasticizer. Comput Concr 2008;5(6):559–72.

[55] Videla C, Gaedicke C. Modeling Portland blast-furnace slag cement high-performance concrete. ACI Mater J 2004;101(5):365–75.

[56] Lam L, Wong YL, Poon CS. Effect of fly ash and silica fume on compressive and fracture behaviors of concrete. Cem Concr Res 1998;28(2):271–83.

[57] Lim C-H, Yoon Y-S, Kim J-H. Genetic algorithm in mix proportioning of high-performance concrete. Cem Concr Res 2004;34(3):409–20.

[58] Safarzadegan GS, Bahrami Jovein H, Ramezanianpour AA. Hybrid support vector regression – particle swarm optimization for prediction of compressive strength and RCPT of concretes containing metakaolin. Constr Build Mater 2012;34:321–9.

[59] Gandomi A, Alavi A. A new multi-gene genetic programming approach to nonlinear system modeling. Part I: Materials and structural engineering problems. Neural Comput Appl 2012;21(1):171–87.

[60] Tsai H-C, Lin Y-H. Predicting high-strength concrete parameters using weighted genetic programming. Eng Comput 2011;27(4):347–55.