Classification in conservation biology: A comparison of five machine-learning methods

Download Classification in conservation biology: A comparison of five machine-learning methods

Post on 25-Nov-2016

214 views

Category:

Documents

1 download

TRANSCRIPT

mmherx 40se, E00,oke,Received 12 October 2009Received in revised form 17 June 2010Accepted 18 June 2010Keywords:Articial neural networksClassication treesFuzzy logicMeleagris ocellataRandom forests1. Introduction 2001). To overcome this problem, machine-learning methods haveEcological Informatics 5 (2010) 441450Contents lists available at ScienceDirectEcological Inj ourna l homepage: www.e lsClassication is one of the most widely applied tasks in ecology,and has been encountered in a variety of contexts, such as suitable siteassessment for ecological conservation and restoration (e.g., Chaseand Rothley, 2007), bioindicator identication (Kampichler andPlaten, 2004), vegetation mapping by remote sensing (Steele, 2000),risk assessment systems for introduced species (Caley and Kuhnert,2006), and species distribution and habitat models (e.g., Guisan andThuiller, 2005). In most cases, ecologists have to deal with noisy, high-dimensional data that are strongly non-linear and which do not meetbeen increasingly adopted as ecological classication methods overthe last 20 years. The most popular of these are classication trees andarticial neural networks (De'ath and Fabricius, 2000; Recknagel,2001; zesmi et al., 2006). New data mining approaches arecontinually being developed, thus steadily increasing the number ofavailable methods. Diversication of the ecological toolbox is notalways an advantage, however. Some methods are computationallyintensive and there are few guidelines or standard proceduresavailable, causing some confusion among ecologists over whichmethod should be selected for a given classication problem.the assumptions of conventional statistical Corresponding author. Vogeltrekstation, Dutch CDemography, NIOO-KNAW, P.O. Box 40, 6666 ZG Heter26 4791 233; fax: +31 26 4723 227.E-mail addresses: christian.kampichler@web.de (C. K(R. Wieland), sophie.calme@gmail.com (S. Calm), holg(H. Weissenberger), slaw2000@prodigy.net.mx (S. Arria1574-9541/$ see front matter 2010 Elsevier B.V. Aldoi:10.1016/j.ecoinf.2010.06.003Support vector machinesdimensional data that often are non-linear and do not meet the assumptions of conventional statisticalprocedures. To overcome this problem, machine-learning methods have been adopted as ecologicalclassication methods. We compared ve machine-learning based classication techniques (classicationtrees, random forests, articial neural networks, support vector machines, and automatically induced rule-based fuzzy models) in a biological conservation context. The study case was that of the ocellated turkey(Meleagris ocellata), a bird endemic to the Yucatan peninsula that has suffered considerable decreases in localabundance and distributional area during the last few decades. On a grid of 1010 km cells that wassuperimposed to the peninsula we analysed relationships between environmental and social explanatoryvariables and ocellated turkey abundance changes between 1980 and 2000. Abundance was expressed inthree (decrease, no change, and increase) and 14 more detailed abundance change classes, respectively.Modelling performance varied considerably between methods with random forests and classication treesbeing the most efcient ones as measured by overall classication error and the normalised mutualinformation index. Articial neural networks yielded the worst results along with linear discriminantanalysis, which was included as a conventional statistical approach. We not only evaluated classicationaccuracy but also characteristics such as time effort, classier comprehensibility and method intricacyaspects that determine the success of a classication technique among ecologists and conservation biologistsas well as for the communication with managers and decision makers. We recommend the combined use ofclassication trees and random forests due to the easy interpretability of classiers and the highcomprehensibility of the method. 2010 Elsevier B.V. All rights reserved.procedures (Recknagel, Various attembased classicaticomparisons, idapplications (ThuGarzn et al., 20et al., 2006; PrasMeynard and Qution trees, randentre for Avian Migration &en, The Netherlands. Tel.: +31ampichler), rwieland@zalf.deerweissen@ecosur.mxga-Weiss).l rights reserved.asks in ecology. Ecologists have to deal with noisy, high-Article history: Classication is one of the most widely applied ta b s t r a c ta r t i c l e i n f oClassication in conservation biology: A coChristian Kampichler a,b,, Ralf Wieland c, Sophie Cala Universidad Jurez Autnoma de Tabasco, Divisin de Ciencias Biolgicas, Carretera Villab Vogeltrekstation, Dutch Centre for Avian Migration & Demography, NIOO-KNAW, P.O. Boc ZALF Leibniz-Zentrum fr Agrarlandschaftsforschung, Institut fr Landschaftssystemanalyd El Colegio de la Frontera Sur, Unidad Chetumal, Avenida del Centenario km. 5.5, C.P. 779e Universit de Sherbrooke, Dpartement de biologie, 2500 avenue de l'Universit, Sherbroparison of vemachine-learningmethodsd,e, Holger Weissenberger d, Stefan Arriaga-Weiss amosa-Crdenas km. 0.5 s/n, C.P. 86150, Villahermosa, Tabasco, Mexico, 6666 ZG Heteren, The Netherlandsberswalder Strae 84, D-15374 Mncheberg, GermanyChetumal, Quintana Roo, MexicoQuebec, Canada J1K 2R1formaticsev ie r.com/ locate /eco l in fpts have been made to popularise machine-learningonmethods among ecologists, and through a series ofentify their strengths and weaknesses for differentiller et al., 2003; Segurado and Arajo, 2004; Benito-06; Elith et al., 2006; Leathwick et al., 2006; Moisenad et al., 2006; Cutler et al., 2007; Peters et al., 2007;inn, 2007). Different methods (including classica-om forests, bagging, articial neural networks,considered a reference grid of 1010 km cells, a resolution commonlyused tomap species distributions in bird atlases. Maximumnumber ofbirds within ocks was used as a proxy for species abundance, asgroup size varies with local abundance in gregarious species (Calmet al., in press). Estimates of OT abundance in 1980 and 2000 wereavailable on an ordinal scale (absent, low abundance, mediumabundance, and high abundance) for 688 cells. We applied twoways for dening the change of OT abundance between 1980 and2000. First, we classied abundance changes in three classes(increase, decrease, and no change), irrespective of prior abundancein 1980. For example, a cell with low OT abundance in 1980 that wentextinct falls into the same class decrease as does a cell with high OTabundance in 1980 that dropped to medium or low abundance or toextinction. Second, we resolved abundance changes at a ner grain,using 14 classes, with each one dened by abundance class in 1980and abundance class in 2000 (Table 1). We refer to them as thecoarse and ne models, respectively, throughout the paper.442 C. Kampichler et al. / Ecological Informatics 5 (2010) 441450multivariate adaptive regression splines, and genetic algorithms,among others) have been tested and compared with one another orwith more conventional statistical procedures such as generalisedlinear models and discriminant analysis. Due to the large number ofmethods and their alternatives (e.g., the various algorithms forinducing classication trees and the different methods for improvingtheir efciency such as pruning and boosting), the picture is still farfrom complete or conclusive and many questions remain unanswered(Elith et al., 2006). Moreover, the adoption of a new method dependsnot only on its efciency, but also on how comprehensible it is, howmuch time is necessary to learn and understand it, how easily can itsresults be interpreted, and whether it requires specialised software orcan be executed in an already familiar environment. In this paper, wecompare ve machine-learning methods against the traditionalclassication technique of discriminant function analysis (DA).The majority of articles that we examined and which were testingnovel classication methods in the context of distribution and habitatmodelling have used binary classication, i.e., presence or absence of aspecies. In the present paper the techniques of classication trees,random forests, articial neural networks, support vector machines,and automatically generated fuzzy classiers were tested, along withDA, for a casewithmultiple classes, viz., the analysis of abundance anddistribution loss of the ocellated turkey (Meleagris ocellata), anendangered species that is endemic to the Yucatan Peninsula, Mexico.The ocellated turkey (OT) is a large and easily recognisable birdwhose geographic range encompasses the Mexican states of Yucatn,Campeche and Quintana Roo, together with the Guatemalan depart-ment of El Petn and the northern portion of Belize (Howell andWebb, 1995). Calm et al. (in press) have demonstrated a consider-able loss of distributional area and local abundance of the OT between1980 and 2000, based on an analysis of 688 grid cells (1010 kmeach) distributed throughout the Mexican part of the species' range.Over this 20-year period, OT went extinct in 16.6% of the cells, whileits local abundance decreased in 34.6% of the cells. Most probably,colonisation from other regions of Mexico, which began in the 1970s,and its immediate consequences, such as increased levels ofsubsistence hunting (Quijano-Hernndez and Calm, 2002) anddeforestation, have driven OT abundance below a critical threshold.The current decrease of OT abundance is apparently decoupled fromexternal factors (Kampichler et al., submitted for publication), fromwhich it is concluded that the OT has encountered an extinctionvortex very similar to the model proposed by Oborny et al. (2005).This model predicts spontaneously fragmenting populations with,consequently, failing density regulation.The ve machine-learning based classication methods that weused to model OT abundance changes have included those that haveyet to be or which have been rarely been tested in an ecologicalapplication. Classication trees and neural networks are frequentlyused in ecological data analysis (e.g. De'ath and Fabricius, 2000;Schultz et al., 2000; Henderson et al., 2005; Jones et al., 2006; zesmiet al., 2006; Lee et al., 2007) and random forests are gaining increasinginterest (Benito-Garzn et al., 2006; Lawler et al., 2006; Cutler et al.,2007; Peters et al., 2007; Kampichler et al., submitted for publication),but support vector machines and automatically generated fuzzyclassiers have been only rarely used (Guo et al., 2005; Drake et al.,2006; Tscherko et al., 2007).We compared the classication efciencyof these methods, as measured by total classication error as well asby normalized mutual information (Forbes, 1995), and then com-pared them with the more conventional linear discriminant analysis.2. Material and methods2.1. Data originOT data stem from the comprehensive assessment made by Calmet al. (in press) in the Yucatan Peninsula of Mexico. The study regionWe used 44 explanatory variables to model OT abundance change.Six variables included information on prior OT abundance at twoscales: local (within a given cell) and regional (the Moore neighbour-hood, i.e., the eight cells immediately surrounding a given cell).Eighteen variables were related to local and regional vegetationproperties and land use types (for example, cover of certain foresttypes in a given cell). Eighteen variables included information on localand regional social aspects, such as the number of settlements of agiven size in a cell, and the proportion of males N18-years-old(typically those that go hunting) in the human population of a cell.Finally, two geographic variables (cell longitude and latitudecoordinates) were included. (For a complete list of explanatoryvariables, consult Table A, electronic supplementary material). Maindata sources were the national forest inventories of 1980 and 2000,which were compiled by the Mexican National Forest CommissionCONAFOR (URL http://www.conafor.gob.mx), and the populationcensus in 2000, which was compiled by the Mexican Institute ofStatistics, Geography and Informatics (INEGI, 2002).2.2. Software and data preparationThe R language and environment for statistical computing (RDevelopment Core Team, 2008) has continued to gain acceptanceamong ecologists (Kangas, 2004). We thus used R as a commonplatform for all of the classication methods that we applied and usedthe respective R packages: tree (Ripley, 2007) for classication trees;randomForest (Liaw and Wiener, 2002) for random forests; nnet(Venables and Ripley, 2002) for articial neural networks; kernlab(Karatzoglou et al., 2004) for support vector machines; gcl (Vinterbo,2007) for fuzzy classiers; and MASS (Venables and Ripley, 2002) fordiscriminant analysis.Table 1Classes of ocellated turkey abundance changes between 1980 and 2000. The abundancechange classes absent to high and low to mediumwere not represented in the data.OT abundance in 1980 OT abundance in 2000 AbbreviationHigh High HHHigh Medium HMHigh Low HLHigh Absent HAMedium High MHMedium Medium MMMedium Low MLMedium Absent MALow High LHLow Low LLLow Absent LAAbsent Medium AMAbsent Low ALAbsent Absent AAFig. 1. Prediction of ocellated turkey abundance changes between 1980 and 2000 on the Yucatan Peninsula using the coarse model based on three abundance change classes(increase, no change, and decrease). Each point or circle represents a 1010 km grid cell; grid cells without available data were left free. A, reference map with the states ofCampeche, Yucatn and Quintana Roo, including major cities; B, observed ocellated turkey abundance change; C, prediction by classication tree; D, prediction by random forest; E,prediction by articial neural network; F, prediction by support vector machine; G, prediction by automatically induced fuzzy rule-based model; H, prediction by linear discriminantanalysis. , increase; , no change; and , decrease.443C. Kampichler et al. / Ecological Informatics 5 (2010) 441450Each package implements its own performance measure proce-dures, which cannot be directly compared (e.g., bootstrapping and thedetermination of out-of-bag error in randomForest, continuousmonitoring of the proportion of correctly classied cases throughoutthe training process in nnet). To guarantee comparability, we did notrule-based models permit the representation and processing ofecological knowledge in terms of natural language, based on a set oftion and fuzzy operators, rule induction and rule ltering, which wasespecially designed for the production of an easily interpretable smallnumber of short rules. The main parameter to be adapted for FRBMinduction is the number of levels into which the explanatory variablesare discretised; for other options of gcl, their default values wereYuwerclassle-bnce444 C. Kampichler et al. / Ecological Informatics 5 (2010) 441450IF-THEN rules and fuzzy logic (Wieland, 2008). They have beenextensively used in ecological modelling; the application of automat-ed induction of FRBM for classication purposes, however, is a recentdevelopment (Bouchon-Meunier et al., 2007). Vinterbo et al. (2005)presented a set of algorithms for the combination of fuzzy discretisa-Fig. 2. Prediction of ocellated turkey abundance changes between 1980 and 2000 on theEach point or circle represents a 1010 km grid cell; grid cells without available dataincluding major cities; B, observed ocellated turkey abundance in 2000; C, prediction byF, prediction by support vector machine; G, prediction by automatically induced fuzzy ruclasses (for example, from low abundance to high abundance); , increase by one abundamake use of built-in procedures; rather, we divided the data set of 688cases into a set of 512 randomly chosen training cases and a set of 176held-back test cases. All classiers were generated using the sametraining set and were validated by applying them to the same test setand analysing the resulting confusion matrices (see below).2.3. Classication methodsClassication trees (CT). CT summarise the relationshipsbetween attributes (explanatory variables) and the class of an object(a categorical response variable) in a dichotomously branching treestructure (Breiman et al., 1984; Ripley, 1996). Each bifurcation isdened by one of the attributes, a certain value of which divides thedata set in two more homogeneous subsets. Through a process ofrecursive partitioning, the same attribute, or different ones, is used tocreate subsequent branching points in the tree. Classication trees aregrown automatically using algorithms that minimise the impurity ofthe subsets (Breiman et al., 1984; Venables and Ripley, 2002). While afully grown tree will explain the training data with certain accuracy, itwill fail for unseen data because of overtting. The tree packageincreases tree generality by pruning it according to a cost-complexitycriterion, which yields a simpler tree that can usually result in betterclassication of unseen cases (Ripley, 2007).Random forests (RF). RF are a novel ensemble learningalternative to CT; many trees are constructed, with classes beingpredicted by a majority vote (Breiman, 2001, Liaw andWiener, 2002).An RF is grown by a procedure called bagging (Breiman, 1996), whichis short for bootstrap aggregating, where each tree is independentlyconstructed by using a bootstrap sample (with replacement) of theentire data set. Each node of the trees is split using only a subset of theexplanatory variables chosen randomly for each tree. The parametersthat must be tuned for growing a random forest are the number ofattributes chosen at each split and the number of trees to be grown.Back-propagation neural networks (BPNN). Artical neuralnetworks are non-linear statistical data modelling tools that are basedon the architecture of biological neural networks, andwhich consist ofa group of interconnected computing units or articial neurons(Ripley, 1996; Warner and Misra, 1996). During a learning phase,connection weights among the neurons or nodes can be adapted bypropagating training data through the net. Typically, artical neuralnetworks are used for continuous variable modelling, but they alsocan be used for classication purposes (e.g., zesmi et al., 2006). Wetrained the most commonly applied architecture of artical neuralnetworks, a BPNN with one layer of hidden nodes. The parameter tobe optimised was the number of nodes in the hidden layer; all otherparameters used default values provided by nnet.Automatically induced fuzzy rule-based models (FRBM). Fuzzyclasses; and O, decrease by three abundance classes (for example, from high abundance toapplied.Support vector machines (SVM). The SVM has been developedfrom a linear classier using a maximum margin hyperplane toseparate two classes. In a non-linear case, the central idea ofclassication with SVM is to map training data into a higher-dimensional feature space and to compute separating hyperplanesthat achieve maximum separation (margin) between the classes. Theapplication of a kernel function permits the construction of theseparating hyperplanes without the necessity of explicitly carryingover the data into the feature space (Schlkopf and Smola, 2002). Themaximum separation hyperplane is only a function of the trainingdata which lie on the margin; these are called support vectors, andhence, the name of the classication method. The kernlab packageprovides several kernels, the most efcient of which must beidentied during the training phase.Discriminant analysis (DA). DA is a well-known, classicstatistical procedure going back to Fisher (1936), the intention ofwhich is nding a linear combination of the input variables thatmaximises the ratio between the separation of class means and thewithin-class variance (Venables and Ripley, 2002).2.4. Performance measuresThe performance of the six classiers that were developed on thetraining and test cases was measured by analysing their respectiveconfusion matrices, which were 33 matrices in case of the coarsemodel, and 1414 matrices in case of the ne model. We applied thenormalised mutual information (NMI) criterion (Forbes, 1995). NMIattempts to determine how well one classication is able to predict asecond, and has certain advantages over other tools for confusionmatrix analysis (for a detailed discussion, see Forbes, 1995). The indexis calculated as:NMI 1 table entropy row entropycolumn entropy 1 PP cijN TlncijN +P c+ jN Tlnc+ jNP ci+N Tln ci+Nwherecij is the cell count in row i (observations) and column j (modelpredictions),c+ j is the sum over the row entries making up column j,ci+ is the sum over the column entries making up row i, andN is the count total.NMI values range between 0 (random confusion matrix) and 1(complete correspondence between model prediction and observa-tion). NMI quanties the proportion of information in the observa-tions provided by a classication algorithm which is contained in itsmodel predictions. NMI has yet to attain the same popularity as otherclassication performance measures, but it is slowly entering thebiologists' toolbox (e.g., Hart et al., 2005). Since NMI can only becalculated formatrices with cijN0, we added 0.001 to every empty cell.catan Peninsula using the ne model based on 14 abundance change classes (Table 1).e left free. A, reference map with the states of Campeche, Yucatn and Quintana Roo,ication tree; D, prediction by random forest; E, prediction by articial neural network;ased model; H, prediction by linear discriminant analysis., increase by two abundanceclass; , no change; B, decrease by one abundance class;, decrease by two abundanceabsent).445C. Kampichler et al. / Ecological Informatics 5 (2010) 4414503. ResultsThe general spatial pattern of increase and decrease in OTabundance, between 1980 and 2000 was reproduced by almost allmethods (Figs. 1, 2), except for BPNN (coarse model, Fig. 1E) andBPNN and FRBM (ne model, Fig. 2E, F). In the latter cases, themethods most notably missed identifying the few regions where OTabundance increased, viz., in the south and southeast of the peninsula(a biosphere reserve declared in 1989 and permanent reserves incommunal farming areas, Kampichler et al., submitted for publica-tion). Fine model CT tended to overestimate OT performance overlarge parts of the study area, but correctly represented trends in theabundance changes (Fig. 2C).No single classication method was superior to the others in everycase. However, there were clear differences in classication efciency.RF were unsurpassed when it came to successfully classifying thetraining data; all cases were classied correctly, yielding an NMI of 1.00and classication accuracy of 100% (Fig. 3; for full confusion matrices,see Tables B and C, electronic supplementary material). They were alsothe best method when measured by the proportion of correctlyclassied cases of the test data, closely followed by CT (Fig. 3A, B).When performance for the test data was measured by NMI, RF werethird (coarsemodel) and second (nemodel), compared to SVMandCT(coarse model) and CT (ne model), respectively (Fig. 3C, D).Performance for the coarse model, when measured as the numberof correctly classied cases, varied between 69% (BPNN) and 100%(RF) for training cases and between 56% (SVM) and 78% (RF) for testcases (Fig. 3A). Differences were more pronounced for the ne model,with a range from 28% (BPNN) to 100% (RF) for training cases and 28%(BPNN) to 60% (RF) for test cases (Fig. 3B).NMI detected larger performance differences between the differ-ent models. For the coarse model, classication success ranged from0.15 (BPNN) to 1.00 (RF) for training data, and from 0.07 (BPNN) to0.52 (SVM) for test data (Fig. 3C). For the ne model, NMI rangedhannorm446 C. Kampichler et al. / Ecological Informatics 5 (2010) 441450Fig. 3. Performance measures of coarse and ne models of ocellated turkey abundance ccases by the nemodels; C, normalised mutual information of the coarse models; and D,BPNN, articial neural network; SVM, support vector machine; FRBM, automatically inducege predictions. A, correctly classied cases by the coarse models; B, correctly classifiedalised mutual information of the nemodels. CT, classication tree; RF, random forest;d rule-based fuzzy model; and DA, linear discriminant analysis.447C. Kampichler et al. / Ecological Informatics 5 (2010) 441450between 0.32 (DA) and 1.00 (RF) for training data, and 0.34 (DA) and0.97 (CT) for test data (Fig. 3D; for the full confusion matrices, seeTables B and C, electronic supplementary material).4. Discussion4.1. Model performanceRFwere unmatched in their ability to explain the training data, andamong the best methods for classifying unseen test data. Moreover, RFare robust against overtting; indeed, Breiman (2001) presented aproof that the error associated with random forests converges to alimit as the number of trees in a forest becomes large. This advantagewas demonstrated clearly in this and in previous studies (Benito-Garzn et al., 2006; Cutler et al., 2007), which makes RF an extremelyuseful tool in ecological classication. CT were surprisingly successfulfor predicting unseen cases (Fig. 3) and outcompetedmethods that arenormally regarded as superior, such as BPNN and SVM. In otherstudies, CT have been demonstrated to be less efcient than competingtechniques, e.g., generalised additive models (Thuiller et al., 2003;Segurado and Arajo, 2004; Moisen et al., 2006). BPNN performedpoorly in the present study, despite their well-known learningcapabilities and their good performance in comparisons of habitatmodelling techniques (Segurado and Arajo, 2004; Benito-Garzn etal., 2006).A remarkable difference among classication techniques appearedin the ne model, where prior abundance was incorporated into theabundance change classes. For example, if OT abundance in 1980 wasH, then the only possible abundance change classes are HH, HM, HLand HA. We found that CT, RF and SVM made efcient use of OTabundance in 1980, and they classied each case correctly into itsgroup with respect to this information (see confusion matrices inTable C, supplementary electronic material). In contrast, BPNN, FRBMand DA showed a high degree of classication error where OTabundance in 1980 was not respected for the nal class assignment.For example, a case with prior abundance H could have been assignedto abundance change classes AA, AL, AM, AH, LA, LL, LH, MA, ML, MMor MH (Table C, supplementary electronic material), therebydecreasing model performance. Ranking all applied methods accord-ing to their performance measures (classication error, NMI) for thecoarse and ne models, together with the training and test data, putthe methods in the following decreasing order: RFCT, SVMFRBMDABPNN.As has been observed repeatedly, a given modelling techniquemight be efcient for a given data set. Yet, it might fail for another,thereby rendering the identication of the best modelling approachimpossible. As a possible solution, Thuiller et al. (2009) proposed thatensembles of forecasts be tted by simulating across more than onemodelling technique and that the resulting range of predictionsshould be analysed rather than relying on the result of a single model.The corresponding software, BIOMOD (Thuiller et al., 2009, is alreadyavailable, but it is restricted to the modelling of presenceabsencedata. Also Huang and Lees (2004, 2005) suggested methods how tocombine models to improve classication accuracy and to providecondence measures. Hopefully, data mining platforms such as Rattle(Williams, 2009) will facilitate the simultaneous application ofdifferent modelling techniques, their evaluation and their use inensemble forecasting.4.2. Time effort for modelling and optimisationCT and DA can be calculated in a very short time. They aredeterministic methods in that, for a given data set and a given set ofmodelling parameters (e.g., the criterion used for measuring nodeimpurity during CT induction), there is only one solution. SVM havedeterministic solutions too, but the best kernel has to be found duringthe training phase. All other models (RF, BPNN, and FRBM) includerandom aspects during the training phase (e.g., the initialisation of anuntrained BPNN with random weights and bootstrapping of cases andrandom selection of explanatory variables in RF induction), and thus,require iterated modelling runs until the best solution is found. In thecase of BPNN and FRBM, overtting is a serious problem (i.e., the bestmodel for training data classication most probably will not work wellfor unseendata), and each trained net or fuzzymodelmust be applied totest data, which retards the optimisation process. To have reallyindependent test data for model validation, three data sets are actuallynecessary: (i) training data for model generation; (ii) test data toprevent overtting; and (iii) validation data for the trainedmodel. SinceFRBM and especially BPNN are quite data-intensive and ecological datasets are typically small, this puts practical limits to their application. Incontrast, RF induction has appeal in that the setting aside of test data forthe estimation of the error on unseen data is not necessary, since theprediction for the cases not included in the bootstrap sample at eachiteration (the so-called out-of-bag cases, or in short, OOB cases) can becompared with the observed data. All OOB predictions of a randomforest are aggregated and yield the OOB estimate of error rate (Liaw andWiener, 2002). (Note that we did not take advantage of this feature inorder to establish a uniform procedure of model validation for all testedmethods; see Section 2.2.)Optimisation time could be minimised by using specialised software,such as the SNNS Stuttgart Neural Network Simulator (http://www.ra.cs.uni-tuebingen.de/SNNS/) or SADATO (http://www.zalf.de/home_samt-lsa/download.html), which provides tools for cross-validation and includestechniques to avoid overtraining. Yet this would reduce the advantage ofrunning various modelling techniques in the same environment andincrease the time needed for software familiarisation.The number of parameters is quite high in some modellingtechniques. For example, not only are there several kernels availablefor SVM, but a number of hyper-parameters have to be determined foreach kernel (Karatzoglou et al., 2004). Moreover, the use of defaultparameters that are provided by the software does not necessarily yieldan optimal solution. Optimisation also requires considerable experiencewith the frequently non-trivial underlying mathematics of the method(Schlkopf and Smola, 2002) as well as the application software. Incontrast, CT and RF needonly a fewparameters to be determined. For RFinduction, for example, only the number of trees and the number ofexplanatory variables used at each split have to be chosen (Liaw andWiener, 2002), which makes the method easily accessible to non-specialists. Indeed, comprehensible demonstrations of the use of CT andRF in ecological contexts are available which facilitate their application(e.g., De'ath and Fabricius, 2000; Cutler et al., 2007).4.3. Classier comprehensibility and potential for knowledge detectionThe classication methods differed in how explicitly theypresented relationships between explanatory and response variables.Classiers that can be easily understood and communicated, however,are the best choice for ecologists, conservation managers, andbiologists in general. For example, in the context of gene expressiondata analysis, Vinterbo et al. (2005, p. 1964) stated thatcomplex data mining algorithms, such as support vector machines,neural networks and logistic regression have been used in theclassication of [...] data. Usually, they produce models that are noteasily interpretable by biologists and biomedical researchers [...]. Ifsimple but accurate rules could be induced from relatively smalltraining samples, interpretation of the models would be greatlyfacilitated.CTs meet this claim to a high degree; the resulting tree-basedclassier is explicit, but also comprehensible to those lacking extensivemathematical training (Fig. 4). Because RF are an ensemblemethod, i.e.,4.4. Intricacy of classication techniquesThe gap separating theoretically and empirically minded ecologistswas lamentedseveral decadesago (e.g.,omnicki, 1988). Since then,novelteaching toolshave improved themathematical and theoretical trainingofecologists, e.g., the use of simulationmodels in the classroom (Koratis etal., 1999). Yet, the gap still exists. In particular, the decisions made bymany conservation managers are still based on personal experience,common sense, and anecdote rather than on scientic evidence (forexamples, see Sutherland et al., 2004). Therefore, the application ofclassication methods in ecology and conservation, and their communi-cation tomanagers, will depend heavily on how easily thesemethods aregrasped. Methods that require a high degree of mathematical skill and along familiarisation phase, or those that work as black boxes, will be lessattractive than more comprehensible methods, the results of which areeasier to communicate. The theory and working principle behind CT andRF are particularly appealing since they can easily be understood byecologists andconservationmanagerswith limitedmathematical training.In contrast, BPNN, SVM (particularly in the non-linear case), and DArequire considerablemathematical knowledgeand imagination tobe fullyunderstoodand implemented. FRBMfalls between theseextremes;on theone hand, fuzzy rule-basedmodels are intuitively graspable; on the otherhand, the algorithms applied for their automated induction are far fromtrivial (Vinterbo et al., 2005) andmight be quite incomprehensible for anuntrained ecologist.448 C. Kampichler et al. / Ecological Informatics 5 (2010) 441450many single tree-based classiers make a majority vote, there is nosimple way to illustrate the complete classier. This needs not to be adisadvantage, since an ecologist or conservation manager normally isnot interested in knowingprecisely how the classier comes to a certainconclusion; rather, he or she wishes to know which of the explanatoryvariables are the most important predictors. Variable importance in RFcan be evaluated by looking at howmuch the prediction error increaseswhen theOOBdata are permuted for a certain variable,while keepingallothers constant. Only a randomly selected subset of explanatoryvariables is used for the induction of the single trees, which means therelative importance of every variable can be determined and displayed(Fig. 5), even if explanatory variables are correlated.FRBM represent relationships between explanatory and responsevariables as a setof if-thenrules and their corresponding fuzzy sets. For theOT dataset, the classier consisted of 128 (coarse model) and 162 (nemodel) rules, which were numbers far too high to be satisfactorilyinterpreted. DA normally yields quite transparent classiers, albeit in theform of discriminant function equations. In the case of the high-dimensional OT data and the 14 abundance classes used for the nemodel, the coefcients of the functions lled an overly complicated1344 matrix.BPNN are notoriously opaque. They do not provide explicit relation-ships between explanatory and response variables; rather their outputincludesmatrices of connectionweights. Although there aremethods toextract information from an BPNN (e.g., Huang and Xing, 2002) and torank the explanatory variables according to their importance (e.g., bymeans of sensitivity analysis, Huang and Lees, 2004, 2005), this wouldmean even greater expenditure of time because the analysis of an BPNNcan be more cumbersome than its training (Wieland and Mirschel,2008).Fig. 4. Classication trees for predicting ocellated turkey abundance changes between1980 and 2000 on the Yucatan Peninsula. (A) coarse model and (B) ne model. SeeTable 2 for abbreviations of abundance change classes.Fig. 5. Importance of the ten most important explanatory variables of random forestspredicting ocellated turkey abundance changes between 1980 and 2000 on the YucatanPeninsula. (A) coarse model and (B) ne model. See Table A (supplementary electronicmaterial) for descriptions of the explanatory variables.llingcase449C. Kampichler et al. / Ecological Informatics 5 (2010) 4414504.5. Integrative ranking of classication methodsWe concur with the recommendation made by Prasad et al. (2006)and regard tree-based methods as the most promising methods forecological prediction (here: classication), inparticular the combinationof CT with their simple interpretation (De'ath and Fabricius, 2000), andRFwith their highclassication accuracy (Cutler et al., 2007).We rankedthe methods, taking into consideration the different aspects ofmodelling performance (with special emphasis of the performance oftest data), modelling time effort, classier comprehensibility, andmethod intricacy (Table 2). CT and RF turned out to be the mostattractive methods, due to their high modelling performance, the littletime effort necessary for their generation, their transparent classiers,and their high comprehensibility. BPNN ranked low because they arevery complex methods that yield inapprehensible classiers. In Table 2,all aspects received the same weight; of course, the table might lookdifferent if a certain aspect had to be emphasised (e.g., modellingperformance) and others could be ignored (e.g., modelling time effort).We tried to keep the evaluation as objective as possible, but theassessments would be different for some aspects if they were madeeither by a mathematically well-trained theoretician or by a rst-timemodelling eld ecologist.5. ConclusionA modelling technique that works well for a given class ofmodelling problem was not necessarily appropriate for another one.In ecology and biological conservation, a classier should betransparent and easily interpreted, thereby allowing detected knowl-edge (e.g., a set of rules that relates a class to explanatory variables) tobe transformed into concrete guidelines for conservation. Theclassication technique should be straight forward and intuitivelygraspable to permit broad use by the ecological community, andcommunication with conservation managers, decision makers, andlast, but not least, the public. In the context presented in this paper,BPNN and to a certain degree SVM do not appear to be promisingtools, although they are extremely successful methods in otherdomains, whereas methods that might be regarded as old-fashionedby machine-learning specialists, CT in this case, appear to be quitepowerful. We recommend the application of CT and RF to ecologicalclassication problems due to their highmodelling accuracy as well astheir transparency and comprehensibility. In any case, interpretablemodels should be preferred to black box models unless prediction isthe only objective of the classication task.Table 2Ranking of applied classication techniques based on modelling performance, modeassessment is given, accompanied by a score ranging from 2 (worst case) to 2 (bestMethod Modelling performance Modelling and optimisation effortCT High (1) Very low (2)RF Very high (2) Low (1)DA Low (1) Very low (2)FRBM Medium (0) High (1)SVM High (1) Very high (2)BPNN Very low (2) Very high (2)AcknowledgementsWe thank Bill Parsons for revising the English text. This researchwas nanced by the Mexican Programa de Mejoramiento delProfesorado PROMEP (project number UJATAB-PTC-30, Promep/103.5/04/1401) and was supported by the German Federal Ministryof Consumer Protection, Food and Agriculture, and the Ministry ofAgriculture, Environmental Protection and Regional Planning of theFederal State of Brandenburg (Germany). This is publication 4828 ofthe Netherlands Institute of Ecology (NIOO-KNAW).Appendix A. Supplementary dataSupplementary data associated with this article can be found, inthe online version, at doi:10.1016/j.ecoinf.2010.06.003.ReferencesBenito-Garzn, M., Blazek, R., Neteler, M., Snchez de Dios, R., Sainz-Ollero, H.,Furlanello, C., 2006. Predicting habitat suitability with machine learning models:the potential area of Pinus sylvestris. Ecological Modelling 197, 383393.Bouchon-Meunier, B., Detyniecki, M., Lesot, M.-J., Marsala, C., Rifqi, M., 2007. Real-worldfuzzy logic applications in data mining and information retrieval. In: Wang, P.P.,Ruan, D., Kerre, E.E. (Eds.), Fuzzy Logic A Spectrum of Theoretical & PracticalIssues. Springer, Berlin, pp. 219247.Breiman, L., 1996. Bagging predictors. Machine Learning 24, 123140.Breiman, L., 2001. Random forests. Machine Learning 45, 532.Breiman, L., Friedman, J., Olshen, R., Stone, C., 1984. Classication and Regression Trees.Chapman & Hall/CRC, Boca Raton, FL, USA.Caley, P., Kuhnert, P.M., 2006. Application and evaluation of classication trees forscreening unwanted plants. Austral Ecology 31, 647655.Calm, S., Sanvicente, M., Weissenberger, H., in press. Changes in the distribution of theOcellated Turkey (Meleagris ocellata) in Mexico as documented by evidence ofmixed origins. Studies in Avian Biology.Chase, T., Rothley, K.D., 2007. Hierarchical tree classiers to nd suitable sites forsandplain grasslands and heathlands on Martha's Vineyard Island, Massachusetts.Biological Conservation 136, 6575.Cutler, D.R., Edwards Jr., T.C., Beard, K.H., Cutler, A., Hess, K.T., Gibson, J., Lawler, J.J.,2007. Random forests for classication in ecology. Ecology 88, 27832792.De'ath, G., Fabricius, K.E., 2000. Classication and regression trees: a powerful yetsimple technique for ecological data analysis. Ecology 81, 31783192.Drake, J.M., Randin, C., Guisan, A., 2006. Modelling ecological niches with support vectormachines. Journal of Applied Ecology 43, 424432.Elith, J., Graham, C.H., Anderson, R.P., Dudik, M., Ferrier, S., Guisan, A., Hijmans, R.J.,Huettmann, F., Leathwick, J.R., Lehmann, A., Li, J., Lohmann, L.G., Loiselle, B.A.,Manion, G., Moritz, C., Nakamura, M., Nakazawa, Y., Overton, J.McC., Peterson, A.T.,Phillips, S.J., Richardson, K., Scachetti-Pereira, R., Schapire, R.E., Sobern, J.,Williams, S., Wisz, M.S., Zimmermann, N.E., 2006. Novel methods improveprediction of species' distributions from occurrence data. Ecography 29, 129151.Fisher, R.A., 1936. The use of multiple measurements in taxonomic problems. Annals ofEugenics 7, 179188.Forbes, A.D., 1995. Classicationalgorithm evaluation: ve performance measuresbased on confusion matrices. Journal of Clinical Monitoring 11, 189206.Guisan, A., Thuiller, W., 2005. Predicting species distribution: offering more than simplehabitat models. Ecology Letters 8, 9931009.Guo, Q., Kelly, M., Graham, C.H., 2005. Support vector machines for predictingdistribution of Sudden Oak Death in California. Ecological Modelling 182, 7590.Hart, C.E., Sharenbroich, L., Bornstein, B.J., Trout, D., King, B., Mjolsness, E., Wold, B.J.,2005. A mathematical and computational framework for quantitative comparisonand integration of large-scale gene expression data. Nucleic Acids Research 33,25802594.Henderson, B.L., Bui, E.N., Moran, C.J., Simon, D.A.P., 2005. Australia-wide predictions ofsoil properties using decision trees. Geoderma 124, 383398.Howell, S.N.G., Webb, S., 1995. A Guide to the Birds of Mexico and Northern CentralAmerica. Oxford University Press, Oxford.Huang, Z., Lees, B.G., 2004. Combining non-parametric models for multisource predictiveeffort, classier comprehensibility, and method intricacy. In each column, a verbal). The last column shows the total score and a concluding evaluation.Classier comprehensibilty Method intricacy Concluding evaluationVery high (2) Very low (2) Good (7)Very high (2) Low (1) Good (6)Medium (0) High (1) Medium (0)Medium (0) Medium (0) Medium (1)Low (1) High (1) Low (3)Very low (2) High (1) Low (7)forest mapping. Photogrammetric Engineering and Remote Sensing 70, 415426.Huang, Z., Lees, B.G., 2005. Representing and reducing error in natural resourceclassication. International Journal of Geographical Information Science 19, 603621.Huang, S.H., Xing, H., 2002. Extract intelligible and concise fuzzy rules from neuralnetworks. Fuzzy Sets and Systems 132, 233243.INEGI, 2002. XII Censo de poblacin y vivienda 2000. Instituto de Estadstica Geografa eInformtica, Mexico DF.Jones, M.J., Fielding, A., Sullivan, M., 2006. Analysing extinction risk in parrots usingdecision trees. Biodiversity and Conservation 15, 19932007.Kampichler, C., Platen, R., 2004. Ground beetle occurrence and moor degradation:modelling a bioindication system by automated decision-tree induction and fuzzylogic. Ecological Indicators 4, 99109.Kampichler, C., Calm, S., Weissenberger, H., Arriaga-Weiss S. L., submitted forpublication. Indication of a species in an extinction vortex: the case of the ocellatedturkey (Meleagris ocellata) on Yucatan peninsula, Mexico. Acta Oecologica.Kangas, M., 2004. R: a computational and graphics resource for ecologists. Frontiers inEcology and Environment 5, 277.Karatzoglou, A., Smola, A., Hornik, K., Zeileis, A., 2004. kernlab an S4 package forkernel methods in R. Journal of Statistical Software 11 (9), 120.Koratis, K., Papatheodorou, E., Stamou, G.P., Paraskevopoulos, S., 1999. An investiga-tion of the effectiveness of computer simulation programs as tutorial tools forteaching population ecology at university. International Journal of ScienceEducation 21, 12691280.Lawler, J.J., White, D., Neilson, R.P., Blaustein, A.R., 2006. Predicting climate-induced rangeshifts: model differences andmodel reliability. Global Change Biology 12, 15681584.Leathwick, J.R., Elith, J., Hastie, T., 2006. Comparative performance of generalizedadditive models and multivariate adaptive regression splines for statisticalmodelling of species distributions. Ecological Modelling 199, 188196.Lee, J., Kwak, I.-S., Lee, E., Kim, K.A., 2007. Classication of breeding bird communitiesalong an urbanization gradient using an unsupervised articial neural network.Ecological Modelling 203, 6271.Liaw, A., Wiener, M., 2002. Classication and regression by randomForest. R News 2,1822.omnicki, A., 1988. The place of modelling in ecology. Oikos 52, 139142.Meynard, C.N., Quinn, J.F., 2007. Predicting species distributions: a critical comparisonof the most common statistical models using articial species. Journal ofBiogeography 34, 14551469.Moisen, G.G., Freeman, E.A., Blackard, J.A., Frescino, T.S., Zimmermann, N.E., Edwards Jr.,T.C., 2006. Predicting tree species and basal area in Utah: a comparison of stochasticgradient boosting, generalized additive models, and tree-based methods. Ecolog-ical Modelling 199, 176187.Oborny, B., Meszna, G., Szab, G., 2005. Dynamics of populations on the verge ofextinction. Oikos 109, 291296.zesmi, U., Tan, C.O., zesmi, S.L., Robertson, R.J., 2006. Generalizability of articialneural network models in ecological applications: predicting nest occurrence andbreeding success of the red-winged blackbird Agelaius phoeniceus. EcologicalModelling 195, 94104.Peters, J., De Baets, B., Verhoest, N.E.C., Samson, R., Degroeve, S., De Becker, P.,Huybrechts, W., 2007. Random forests as a tool for ecohydrological distributionmodelling. Ecological Modelling 207, 304318.Prasad, A.M., Iverson, L.R., Liaw, A., 2006. Newer classication and regression treetechniques: bagging and random forests for ecological prediction. Ecosystems 9,181199.Quijano-Hernndez, E., Calm, S., 2002. Aprovechamiento y conservacin de la faunasilvestre en una comunidad maya de Quintana Roo. Etnobiologia 2, 118.R Development Core Team, 2008. R: A Language and Environment for StatisticalComputing. http://www.R-project.org.Recknagel, F., 2001. Applications of machine learning to ecological modelling.Ecological Modelling 146, 303310.Ripley, B.D., 1996. Pattern Recognition and Neural Networks. Cambridge UniversityPress, Cambridge, UK.Ripley, B.D., 2007. Tree: Classication and Regression Trees. R package version 1.0-26,http://CRAN.R-project.org/.Schlkopf, B., Smola, A., 2002. Learning with Kernels: Support Vector Machines,Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA, USA.Schultz, A.,Wieland, R., Lutze, G., 2000. Neural networks in agroecologicalmodelling stylishapplication or helpful tool? Computers and Electronics in Agriculture 29, 7397.Segurado, P., Arajo, M.B., 2004. An evaluation of methods for modelling speciesdistributions. Journal of Biogeography 31, 15551568.Steele, B.M., 2000. Combining multiple classiers: an application using spatial andremotely sensed information for land cover mapping. Remote Sensing ofEnvironment 74, 545556.Sutherland, W.J., Pullin, A.S., Dolman, P.M., Knight, T.M., 2004. The need for evidence-based conservation. Trends in Ecology and Evolution 19, 305308.Thuiller, W., Arajo, M.B., Lavorel, S., 2003. Generalized models vs. classication treeanalysis: predicting species distributions of plant species at different scales. Journalof Vegetation Science 14, 669680.Thuiller, W., Lafourcade, B., Engler, R., Arajo, M., 2009. BIOMOD a platform forensemble forecasting of species distributions. Ecography 32, 369373.Tscherko, D., Kandeler, E., Bardossy, A., 2007. Fuzzy classication of microbial biomass andenzyme activities in grassland soils. Soil Biology and Biochemistry 39, 17991808.Venables, W.N., Ripley, B.D., 2002. Modern Applied Statistics with S-PLUS. Springer,New York.Vinterbo, S.A., 2007. gcl: Compute a Fuzzy Rules or Tree Classier from Data. R packageversion 1.06.5, 2007, http://CRAN.R-project.org.Vinterbo, S.A., Kim, E.-Y., Ohno-Machado, L., 2005. Small, fuzzy and interpretable geneexpression based classiers. Bioinformatics 21, 19641970.Warner, B., Misra, M., 1996. Understanding neural networks as statistical tools.American Statistician 50, 284293.Wieland, R., 2008. Fuzzy models. In: Jrgensen, S.E., Fath, B.D. (Eds.), Encyclopedia ofEcology, Vol. 2. Elsevier, Amsterdam, The Netherlands, pp. 17171726.Wieland, R., Mirschel, M., 2008. Adaptive fuzzy modeling versus articial neuralnetworks. Environmental Modelling & Software 23, 215224.Williams, G.J., 2009. Rattle: a data mining GUI for R. R Journal 1 (2), 4555.450 C. Kampichler et al. / Ecological Informatics 5 (2010) 441450Classification in conservation biology: A comparison of five machine-learning methodsIntroductionMaterial and methodsData originSoftware and data preparationClassification methodsPerformance measuresResultsDiscussionModel performanceTime effort for modelling and optimisationClassifier comprehensibility and potential for knowledge detectionIntricacy of classification techniquesIntegrative ranking of classification methodsConclusionAcknowledgementsSupplementary dataReferences

Recommended

View more >