classification in conservation biology: a comparison of five machine-learning methods

Download Classification in conservation biology: A comparison of five machine-learning methods

Post on 25-Nov-2016




1 download

Embed Size (px)


  • mmherx 40se, E00,oke,

    Received 12 October 2009Received in revised form 17 June 2010Accepted 18 June 2010

    Keywords:Articial neural networksClassication treesFuzzy logicMeleagris ocellataRandom forests

    1. Introduction 2001). To overcome this problem, machine-learning methods have

    Ecological Informatics 5 (2010) 441450

    Contents lists available at ScienceDirect

    Ecological In

    j ourna l homepage: www.e lsClassication is one of the most widely applied tasks in ecology,and has been encountered in a variety of contexts, such as suitable siteassessment for ecological conservation and restoration (e.g., Chaseand Rothley, 2007), bioindicator identication (Kampichler andPlaten, 2004), vegetation mapping by remote sensing (Steele, 2000),risk assessment systems for introduced species (Caley and Kuhnert,2006), and species distribution and habitat models (e.g., Guisan andThuiller, 2005). In most cases, ecologists have to deal with noisy, high-dimensional data that are strongly non-linear and which do not meet

    been increasingly adopted as ecological classication methods overthe last 20 years. The most popular of these are classication trees andarticial neural networks (De'ath and Fabricius, 2000; Recknagel,2001; zesmi et al., 2006). New data mining approaches arecontinually being developed, thus steadily increasing the number ofavailable methods. Diversication of the ecological toolbox is notalways an advantage, however. Some methods are computationallyintensive and there are few guidelines or standard proceduresavailable, causing some confusion among ecologists over whichmethod should be selected for a given classication problem.the assumptions of conventional statistical

    Corresponding author. Vogeltrekstation, Dutch CDemography, NIOO-KNAW, P.O. Box 40, 6666 ZG Heter26 4791 233; fax: +31 26 4723 227.

    E-mail addresses: (C. K(R. Wieland), (S. Calm), holg(H. Weissenberger), (S. Arria

    1574-9541/$ see front matter 2010 Elsevier B.V. Aldoi:10.1016/j.ecoinf.2010.06.003Support vector machinesdimensional data that often are non-linear and do not meet the assumptions of conventional statisticalprocedures. To overcome this problem, machine-learning methods have been adopted as ecologicalclassication methods. We compared ve machine-learning based classication techniques (classicationtrees, random forests, articial neural networks, support vector machines, and automatically induced rule-based fuzzy models) in a biological conservation context. The study case was that of the ocellated turkey(Meleagris ocellata), a bird endemic to the Yucatan peninsula that has suffered considerable decreases in localabundance and distributional area during the last few decades. On a grid of 1010 km cells that wassuperimposed to the peninsula we analysed relationships between environmental and social explanatoryvariables and ocellated turkey abundance changes between 1980 and 2000. Abundance was expressed inthree (decrease, no change, and increase) and 14 more detailed abundance change classes, respectively.Modelling performance varied considerably between methods with random forests and classication treesbeing the most efcient ones as measured by overall classication error and the normalised mutualinformation index. Articial neural networks yielded the worst results along with linear discriminantanalysis, which was included as a conventional statistical approach. We not only evaluated classicationaccuracy but also characteristics such as time effort, classier comprehensibility and method intricacyaspects that determine the success of a classication technique among ecologists and conservation biologistsas well as for the communication with managers and decision makers. We recommend the combined use ofclassication trees and random forests due to the easy interpretability of classiers and the highcomprehensibility of the method.

    2010 Elsevier B.V. All rights reserved.procedures (Recknagel, Various attembased classicaticomparisons, idapplications (ThuGarzn et al., 20et al., 2006; PrasMeynard and Qution trees, rand

    entre for Avian Migration &en, The Netherlands. Tel.: +31

    ampichler), rwieland@zalf.deerweissen@ecosur.mxga-Weiss).

    l rights reserved.asks in ecology. Ecologists have to deal with noisy, high-Article history: Classication is one of the most widely applied ta b s t r a c ta r t i c l e i n f oClassication in conservation biology: A co

    Christian Kampichler a,b,, Ralf Wieland c, Sophie Cala Universidad Jurez Autnoma de Tabasco, Divisin de Ciencias Biolgicas, Carretera Villab Vogeltrekstation, Dutch Centre for Avian Migration & Demography, NIOO-KNAW, P.O. Boc ZALF Leibniz-Zentrum fr Agrarlandschaftsforschung, Institut fr Landschaftssystemanalyd El Colegio de la Frontera Sur, Unidad Chetumal, Avenida del Centenario km. 5.5, C.P. 779e Universit de Sherbrooke, Dpartement de biologie, 2500 avenue de l'Universit, Sherbroparison of vemachine-learningmethodsd,e, Holger Weissenberger d, Stefan Arriaga-Weiss a

    mosa-Crdenas km. 0.5 s/n, C.P. 86150, Villahermosa, Tabasco, Mexico, 6666 ZG Heteren, The Netherlandsberswalder Strae 84, D-15374 Mncheberg, GermanyChetumal, Quintana Roo, MexicoQuebec, Canada J1K 2R1


    ev ie locate /eco l in fpts have been made to popularise machine-learningonmethods among ecologists, and through a series ofentify their strengths and weaknesses for differentiller et al., 2003; Segurado and Arajo, 2004; Benito-06; Elith et al., 2006; Leathwick et al., 2006; Moisenad et al., 2006; Cutler et al., 2007; Peters et al., 2007;inn, 2007). Different methods (including classica-om forests, bagging, articial neural networks,

  • considered a reference grid of 1010 km cells, a resolution commonlyused tomap species distributions in bird atlases. Maximumnumber ofbirds within ocks was used as a proxy for species abundance, asgroup size varies with local abundance in gregarious species (Calmet al., in press). Estimates of OT abundance in 1980 and 2000 wereavailable on an ordinal scale (absent, low abundance, mediumabundance, and high abundance) for 688 cells. We applied twoways for dening the change of OT abundance between 1980 and2000. First, we classied abundance changes in three classes(increase, decrease, and no change), irrespective of prior abundancein 1980. For example, a cell with low OT abundance in 1980 that wentextinct falls into the same class decrease as does a cell with high OTabundance in 1980 that dropped to medium or low abundance or toextinction. Second, we resolved abundance changes at a ner grain,using 14 classes, with each one dened by abundance class in 1980and abundance class in 2000 (Table 1). We refer to them as thecoarse and ne models, respectively, throughout the paper.

    442 C. Kampichler et al. / Ecological Informatics 5 (2010) 441450multivariate adaptive regression splines, and genetic algorithms,among others) have been tested and compared with one another orwith more conventional statistical procedures such as generalisedlinear models and discriminant analysis. Due to the large number ofmethods and their alternatives (e.g., the various algorithms forinducing classication trees and the different methods for improvingtheir efciency such as pruning and boosting), the picture is still farfrom complete or conclusive and many questions remain unanswered(Elith et al., 2006). Moreover, the adoption of a new method dependsnot only on its efciency, but also on how comprehensible it is, howmuch time is necessary to learn and understand it, how easily can itsresults be interpreted, and whether it requires specialised software orcan be executed in an already familiar environment. In this paper, wecompare ve machine-learning methods against the traditionalclassication technique of discriminant function analysis (DA).

    The majority of articles that we examined and which were testingnovel classication methods in the context of distribution and habitatmodelling have used binary classication, i.e., presence or absence of aspecies. In the present paper the techniques of classication trees,random forests, articial neural networks, support vector machines,and automatically generated fuzzy classiers were tested, along withDA, for a casewithmultiple classes, viz., the analysis of abundance anddistribution loss of the ocellated turkey (Meleagris ocellata), anendangered species that is endemic to the Yucatan Peninsula, Mexico.

    The ocellated turkey (OT) is a large and easily recognisable birdwhose geographic range encompasses the Mexican states of Yucatn,Campeche and Quintana Roo, together with the Guatemalan depart-ment of El Petn and the northern portion of Belize (Howell andWebb, 1995). Calm et al. (in press) have demonstrated a consider-able loss of distributional area and local abundance of the OT between1980 and 2000, based on an analysis of 688 grid cells (1010 kmeach) distributed throughout the Mexican part of the species' range.Over this 20-year period, OT went extinct in 16.6% of the cells, whileits local abundance decreased in 34.6% of the cells. Most probably,colonisation from other regions of Mexico, which began in the 1970s,and its immediate consequences, such as increased levels ofsubsistence hunting (Quijano-Hernndez and Calm, 2002) anddeforestation, have driven OT abundance below a critical threshold.The current decrease of OT abundance is apparently decoupled fromexternal factors (Kampichler et al., submitted for publication), fromwhich it is concluded that the OT has encountered an extinctionvortex very similar to the model proposed by Oborny et al. (2005).This model predicts spontaneously fragmenting populations with,consequently, failing density regulation.

    The ve machine-learning based classication methods that weused to model OT abundance changes have included those that ha


View more >