GeoFIS: An integrated tool for the assessment of landslide susceptibility

Turgay Osna a, Ebru Akcapinar Sezer b,*, Aykut Akgun c

a Netcad, Cyberpark, 06800 Bilkent, Ankara, Turkey
b Hacettepe University, Computer Engineering Department, 06800 Beytepe, Ankara, Turkey
c Karadeniz Technical University, Geological Engineering Department, Trabzon, Turkey

Article info

Article history: Received 10 October 2013; received in revised form 26 December 2013; accepted 28 December 2013; available online 15 January 2014.

Keywords: Mamdani fuzzy inference system; Landslide susceptibility; Geographic information systems (GIS); Trabzon (Northern Turkey)

Abstract

In this study, requirements of landslide susceptibility mapping by a Mamdani fuzzy inference system (FIS) are identified, and a single standalone application (GeoFIS) is developed. GeoFIS includes two main open source libraries, one for GIS operations and the other for creating Mamdani FIS. As a result, it is possible to construct a landslide susceptibility map based on expert opinion, to visualize maps instantly and to measure model performance. GeoFIS supports all steps of the landslide susceptibility mapping process, starting from data deployment and ending with performance measurement. In GeoFIS, visual controls allow use of the inferred results and actual landslide occurrence information, and ROC–AUC values are calculated automatically. Moreover, a confusion matrix is produced, and alternative measurement schemes such as recall are suggested, to reveal those performance details not observable with ROC–AUC and to create trust in the inferred results. GeoFIS is applied to the Trabzon region of northern Turkey, and the recall and ROC–AUC values were .902 and .602, respectively.

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Landslides are complex natural events with large effects on social and economic life. Thus, assessment of landslide susceptibility is highly important for society. Gokceoglu and Sezer (2009) emphasized the importance of the susceptibility concept based on its frequent use in the landslide literature. Landslide information in the landslide inventory is crisp, such as 1 (occurred) and 0 (did not occur). However, susceptibility involves the spatial probability of occurrence and is presented as a numerical value that indicates the degree of susceptibility. In other words, susceptibility is the derived value based on the geomorphological, geological and environmental data from the region of interest. Predictive, classification or statistical methods can be used for this type of derivation. In fact, the variety of available methods has increased with the developments in soft computing techniques.

The complexity of the landslide susceptibility assessment is due to the fuzziness in the conditioning parameters of the landslide. Thus, landslide susceptibility mapping is a context-dependent problem. In other words, all parameters that are used for landslide susceptibility mapping should be assessed in the study area in which they are measured. As a result, a given value of one parameter can have different effects in different study areas. Furthermore, there is no crisp mathematical function relating landslide susceptibility to the conditioning parameters. At this point, data-driven methods become appropriate for landslide susceptibility derivation. Akgun et al. (2012) classify the methods for landslide susceptibility mapping into three categories: statistical, soft computing and combined index maps. In this study, soft computing techniques are addressed and criticized, and the integrated tool GeoFIS is proposed to conduct landslide susceptibility assessment using a Mamdani-based inference system (Mamdani and Assilian, 1973).

In the literature, several soft computing methods have been applied to this problem, such as artificial neural networks, Bayesian classifiers, adaptive neuro fuzzy systems, support vector machines, and decision trees (e.g., Arora et al., 2004; Lee et al., 2004; Lui et al., 2006; Kanungo et al., 2006; Melchiorre et al., 2008; Pradhan et al., 2010; Caniani et al., 2008; Yilmaz, 2009; Nefeslioglu et al., 2010; Sezer et al., 2011; Xu et al., 2012; Ramakrishnan et al., 2013; Venkatesan et al., 2013). The general applications of these methods are illustrated in Fig. 1.

The data-driven methods include sampling, training, testing and performance evaluation phases. In the sampling phase, data are divided into training and testing datasets by using any sampling method proposed in the literature (e.g., cross validation (Hirsch, 1991) or stratified random sampling (Groves et al., 2009)). The training phase aims to discover the nonlinear relationships between input parameters and the output parameter. In this phase, method parameters such as coefficients or weights are adjusted to minimize the error between measured and output values. After training, the testing phase is employed to assess the quality of the model. At the same time, the training data are assessed once more to check whether the model over-fits. A performance-evaluation method is used to quantify the success of the predictions; Area Under Curve (AUC) with a Receiver Operating Characteristics (ROC) curve assessment is the most useful method because of its threshold-free approach.

Computers & Geosciences 66 (2014) 20–30. Contents lists available at ScienceDirect. Journal homepage: www.elsevier.com/locate/cageo

0098-3004/$ - see front matter © 2014 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.cageo.2013.12.016

* Corresponding author. Tel.: +90 3122977500; fax: +90 3122977502. E-mail address: [email protected] (E.A. Sezer).

The major disadvantage associated with this type of process is that all constructed models are completely dependent on the training data used. This means that the model cannot be used for another study area, even if the conditioning characteristics are similar. In other words, a new model has to be built for each new study area. In addition, methods such as support vector machines or artificial neural networks isolate the modeler from the training stage, so the modeler has no chance to learn re-usable relationships between the inputs and the output. If construction of more broadly applicable models is necessary, a Mamdani-type fuzzy inference system, which excludes the training stage, becomes a more suitable approach. In fact, the use of fuzzy inference systems to produce landslide susceptibility maps is discussed in the literature: Kayastha (2012), Tangestani (2004) and Lee (2007) all used fuzzy logic for landslide susceptibility mapping. In these studies, different fuzzy operators were applied to generate landslide susceptibility maps. In Ercanoglu and Gokceoglu (2004), fuzzy relations are employed for producing the susceptibility map, whereas Ercanoglu and Gokceoglu (2002) extracted 23 if-then fuzzy rules for this purpose. Kanungo et al. (2008) proposed fuzzy set theory to assess landslide risk. Pourghasemi et al. (2012) presented a similar application of fuzzy inference to this problem. Akgun et al. (2012) produced a landslide susceptibility map using a Mamdani-type fuzzy inference system (FIS) for the first time in the literature. They presented modeling software, MamLand, which was developed with MATLAB. This study and software enable modelers to process large quantities of data in the Mamdani FIS. In landslide susceptibility mapping, the datasets employed always contain many data points (e.g., 350,000 vs. 700,000 or more), so it is not possible to develop an FIS without automatic data processing. MamLand enables this type of automation.

However, MamLand has three impractical features. First, the modeler must use Microsoft Excel files to build models. This means that the modeler is completely responsible for preparing error-free files that fulfill the requirements. Second, MATLAB and MS Excel must be installed to run MamLand. Third, MamLand conducts only fuzzy inference; the modeler must use a separate geographic information system to produce and view the susceptibility map. For these reasons, this study develops the GeoFIS software to meet all needs of the expert while modeling with Mamdani FIS. More specifically, the features of GeoFIS are as follows:

• Deployment of data.
• Building Mamdani FIS (constructing fuzzy rules, fuzzy sets, membership functions).
• Inference on susceptibility values.
• Visual checking of inferred and measured results on a map.
• Local inference debugging.
• Performance evaluation with ROC–AUC and confusion matrices.

In addition, GeoFIS is directly executable software: it does not depend on any other previously installed software.

2. Mamdani-type fuzzy inference system

Fuzzy inference systems (FIS) produce a crisp output for supplied crisp inputs by using fuzzy set theory (Zadeh, 1965). The basic element of FIS is a linguistic variable with a linguistic value. Each linguistic value corresponds to a fuzzy set, and a fuzzy set has unsharp boundaries; that is, an instance can belong to multiple sets with particular membership degrees (μ). Membership degrees are produced with membership functions. In the literature, there are several membership functions, such as triangular-shaped, trapezoidal-shaped, sigmoidal-shaped, bell-shaped, Gaussian combination, and π-shaped. If “x” is a crisp value, and “low” is a fuzzy set, then μ_low(x) indicates the membership value of x in the “low” set. Similarly, x can have another membership value for another fuzzy set because of the overlapping boundaries of fuzzy sets. As a result, fuzzy logic can resemble human reasoning, especially for values near the boundaries. This type of modeling enables one to consider uncertainty, context-dependency and complexity during the inference process.
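As an illustration of how one crisp value can belong to two overlapping sets, the following minimal Python sketch implements three of the membership function shapes mentioned above. The set boundaries for the hypothetical "slope" variable are assumptions chosen only for the example; they are not taken from the study.

```python
import math

def triangular(x, a, b, c):
    """Triangular membership: feet at a and c, peak at b (a <= b <= c)."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def trapezoidal(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], flat on [b, c], falls on [c, d]."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

def gaussian(x, mean, sigma):
    """Gaussian membership centred on mean with spread sigma."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)

# Hypothetical fuzzy sets for a slope gradient in degrees (assumed values).
slope = 30.0
mu_low = triangular(slope, 0.0, 0.0, 40.0)     # "low" peaks at 0 degrees
mu_high = triangular(slope, 10.0, 60.0, 60.0)  # "high" peaks at 60 degrees

# The same crisp value is a partial member of both sets.
print(mu_low, mu_high)
```

Because the two sets overlap, the value 30 belongs to both "low" and "high" with different degrees, which is the behaviour the paragraph above describes for values near set boundaries.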

There are three types of fuzzy inference systems: Mamdani-style, Sugeno-style and Tsukamoto-style. Their general processes are similar, but their rule structure differs at the conclusion. As a result, aggregation or defuzzification techniques differ as well. In this study, Mamdani-style fuzzy inference is used because it is the most appealing and commonly used method. The general characteristics of Mamdani-style FIS are given here as a reminder, and detailed information can be found in Alvarez Grima (2000). There are four inference phases in Mamdani-style FIS, including fuzzification, rule evaluation, aggregation and defuzzification (Mamdani and Assilian, 1973); they are illustrated in Fig. 2.

In Fig. 2, “x” and “y” represent crisp inputs from the environment, and “A”, “B” and “C” represent linguistic variables (e.g., slope, curvature). The values (A1, A2, B1, B2, C1 and C2) of the linguistic variables are defined as fuzzy sets. Each inference phase can be summarized as follows:

Fig. 1. General view of modeling process using soft computing methods.

Fig. 2. A generalized scheme of the Mamdani style inference.

1) Fuzzification: The crisp inputs coming from the environment are mapped into a vector of linguistic values with calculated membership values.

2) Rule Evaluation: Fuzzy If-Then rules are prepared by the domain expert using linguistic variables and Boolean operators. A rule is evaluated if the linguistic values in its antecedent part received nonzero membership values in the fuzzification step. After the rule is applied, the fuzzy set in the consequent part of the rule is scaled or clipped at the calculated membership degree, as shown in Fig. 2.

3) Aggregation: The final fuzzy output of the model is produced by aggregating all local results from the fuzzy rules triggered in the rule evaluation phase. The max operator can be used for aggregation.

4) Defuzzification (optional): The fuzzy output produced in the aggregation phase is converted into a crisp output. This process can be conducted in many ways, such as the centroid technique (center of gravity, COG), the max method, which selects the fuzzy set with the largest membership value, or the largest maximum mean technique (Cox, 1994). In the defuzzification phase of the experiment, the centroid technique is used.
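The four phases above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the GeoFIS implementation: the two input variables, their set boundaries, and the two rules are hypothetical, the AND operator is taken as the product, rule outputs are scaled, aggregation uses max, and the centroid is approximated on a discretized output range.

```python
def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical linguistic variables and fuzzy sets (assumed for illustration).
INPUT_SETS = {
    "slope": {"low": (0.0, 0.0, 90.0), "high": (0.0, 90.0, 90.0)},
    "twi": {"low": (0.0, 0.0, 20.0), "high": (0.0, 20.0, 20.0)},
}
OUTPUT_SETS = {"low": (0.0, 0.0, 1.0), "high": (0.0, 1.0, 1.0)}

# Hypothetical expert rules: (antecedent, consequent output set).
RULES = [
    ({"slope": "high", "twi": "high"}, "high"),
    ({"slope": "low", "twi": "low"}, "low"),
]

def infer(crisp, steps=1000):
    # 1) Fuzzification: membership of each crisp input in each set.
    mu = {var: {name: tri(crisp[var], *abc) for name, abc in sets.items()}
          for var, sets in INPUT_SETS.items()}
    # 2) Rule evaluation: AND as product gives each rule's firing strength.
    strengths = []
    for antecedent, consequent in RULES:
        w = 1.0
        for var, name in antecedent.items():
            w *= mu[var][name]
        strengths.append((w, consequent))
    # 3) Aggregation (max of scaled consequents) and
    # 4) defuzzification (discrete centroid) over the output range [0, 1].
    num = den = 0.0
    for i in range(steps + 1):
        y = i / steps
        agg = max(w * tri(y, *OUTPUT_SETS[c]) for w, c in strengths)
        num += y * agg
        den += agg
    return num / den if den > 0 else None

print(infer({"slope": 45.0, "twi": 10.0}))
```

With both inputs at the midpoint of their ranges, the two rules fire equally and the centroid falls in the middle of the output range; with both inputs at their maxima, only the "high" rule fires.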

3. GeoFIS

GeoFIS aims to provide a single platform for landslide susceptibility mapping by including all necessary steps. In other words, GeoFIS is a standalone application with Mamdani-type inference ability and open source GIS software analysis (Fig. 3).

DotSpatial is a geographic information system library developed with .NET (DotSpatial, 2012). It allows developers to incorporate spatial data, analysis and mapping functionality into their applications or to contribute GIS extensions to the community (DotSpatial, 2012). The Fuzzy Logic Library for Microsoft .Net (FuzzyNet) is an easy-to-use component that implements a fuzzy inference system (both Mamdani and Sugeno methods are supported) (Fuzzynet, 2013). Both of these libraries are open source and are compatible with each other. In GeoFIS, DotSpatial and FuzzyNet are employed, and easy-to-use graphical interfaces are developed to execute the process of landslide susceptibility mapping. Some highlights of GeoFIS were listed in the previous section; a detailed list of GeoFIS features follows:

• Presents a single platform with integrated GIS and Mamdani-type inference.
• No external library or software requirements.
• Efficient modeling ability based on expert opinion.
• Batch input–output process.
• Instant mapping/visual control.
• Develops a regional model.
• Debugging of model behavior for selected regions.
• User-friendly graphical interface.
• Map coloring and labeling.

3.1. Inference with GeoFIS

In GeoFIS, triangular, trapezoidal and Gaussian membership functions are supported, and the intersection and union of the fuzzy sets are calculated with Eqs. (1) and (2), respectively:

$$\mu_{A \cap B}(x) = \mu_A(x)\,\mu_B(x) \qquad (1)$$

$$\mu_{A \cup B}(x) = \mu_A(x) + \mu_B(x) - \mu_A(x)\,\mu_B(x) \qquad (2)$$

where “A” and “B” are fuzzy sets. The use of Eqs. (1) and (2) together is advisable because of the principle of duality. The scaling method is selected for rule evaluation, and two types of defuzzification methods, center of gravity and mean of maximum, are supported.
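The duality between Eqs. (1) and (2) can be checked numerically. The sketch below assumes the standard fuzzy complement 1 − μ, which the text does not state explicitly, and verifies that the union equals the complement of the intersection of the complements.

```python
def fuzzy_and(mu_a, mu_b):
    """Intersection as the algebraic product, Eq. (1)."""
    return mu_a * mu_b

def fuzzy_or(mu_a, mu_b):
    """Union as the probabilistic sum, Eq. (2)."""
    return mu_a + mu_b - mu_a * mu_b

def fuzzy_not(mu):
    """Standard complement (an assumption; not given in the text)."""
    return 1.0 - mu

mu_a, mu_b = 0.3, 0.6
intersection = fuzzy_and(mu_a, mu_b)   # 0.3 * 0.6
union = fuzzy_or(mu_a, mu_b)           # 0.3 + 0.6 - 0.18

# De Morgan duality: A OR B == NOT(NOT A AND NOT B)
dual_union = fuzzy_not(fuzzy_and(fuzzy_not(mu_a), fuzzy_not(mu_b)))
print(intersection, union, dual_union)
```

Pairing the product with the probabilistic sum keeps the two operators consistent with each other, which is the reason the text advises using them together.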

GeoFIS supports the .shp data format, a popular GIS format, and data in Excel or text files can be easily imported and transformed to the .shp format. The main requirement is the X and Y coordinates of each parameter. In landslide susceptibility mapping, each parameter (e.g., curvature and soil type) can be imported to the system one-by-one or as a group. After importing, parameters can be queried or colored easily in GeoFIS. Most of the abilities listed so far are based on DotSpatial (2012).

Fig. 3. A generalized scheme of the GeoFIS.

Fig. 4. Definition of membership functions in GeoFIS.

After inputs are loaded to GeoFIS, the user can select an existing model or build a new one. Input parameters used in the model are selected individually, and the output parameter is defined. In this landslide susceptibility problem, the output is susceptibility. However, to make GeoFIS applicable to different problems, the system has no default output parameter. After definition of the input–output parameters, membership functions are defined for the parameters; the definition for the susceptibility parameter (output) is illustrated in Fig. 4. The next step is population of the fuzzy rules by the domain expert, with the help of the graphical interface illustrated in Fig. 5. Moreover, the user can import fuzzy rules from an Excel file or the MATLAB environment.

At this point, all requirements for inferring susceptibility values are satisfied, and the user can run the inference with the click of a button (Fig. 6). With a hardware configuration including an i7 processor, 64-bit CPU and 8 GB RAM, the inference process completes in approximately 15 min. When inference is completed, the user can overlay actual landslide occurrence data on the inferred values to observe the performance visually (Fig. 7). If some intervals of the inferred results are colored differently (e.g., red for > .8), as seen in Fig. 7, more effective analysis can be conducted. If the actual landslide occurrence value is available, inferred landslide susceptibility values and actual occurrences of the landslides can be visualized together (Fig. 7).

Fig. 5. Definition of fuzzy rules in GeoFIS.

Fig. 6. Inference GUI of GeoFIS.


At this point, the user can do reverse engineering using the debugging ability in GeoFIS (Fig. 8). In other words, if the user identifies a region in which the inferred result differs greatly from the expectation, he/she can select that region and analyze the triggered rules and the input parameter values. Thereafter, the user can make tentative changes to membership values and fuzzy rules, and he/she can see the instantaneous inference results of these changes. The user can keep or cancel these changes.

3.2. GeoFIS-evaluation criteria

Overall accuracy has been shown to be inadequate for measuring the performance of a classifier when the data are imbalanced (Liao, 2008); multiple different measurement styles are required. Liao (2008) presents the geometric mean of accuracies, the F1-measure and ROC–AUC as the three most commonly used criteria to measure classifier performance on imbalanced data. In the landslide susceptibility literature, method performance is assessed using ROC, and the issue of imbalanced data is completely ignored. However, Davis and Goadrich (2006) and Sun et al. (2009) stated that the area under the ROC curve could present an overly optimistic view of an algorithm in an imbalanced scenario and suggested the area under the precision-recall curve (PR curve) instead. Therefore, both the ROC curve and confusion matrices are included in GeoFIS, and performance is assessed by both methods.

Fig. 7. Visual comparison GUI in GeoFIS. (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.)

Fig. 8. Debug mode in GeoFIS for selected subregion.

The confusion matrix (Table 1) gives basic statistics for assessment of inference system performance, such as true positive rate (TPR), false positive rate (FPR), precision and recall (Eqs. 3–5). Precision and recall reflect the sensitivity and specificity of the models; their optimal values are equal to 1.

$$\mathrm{TPR} = \mathrm{Recall} = \frac{tp}{tp + fn} \qquad (3)$$

$$\mathrm{FPR} = \frac{fp}{fp + tn} \qquad (4)$$

$$\mathrm{Precision} = \frac{tp}{tp + fp} \qquad (5)$$
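A short Python sketch of Eqs. (3)–(5) may help to make the definitions concrete. The scores and labels below are made-up toy values, and treating a score equal to the threshold as a positive prediction is an assumption of this example.

```python
def confusion_counts(scores, labels, threshold=0.5):
    """Count tp, fp, fn, tn for scores in [0, 1] and binary labels."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold  # assumed tie rule
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def rates(tp, fp, fn, tn):
    """Recall (TPR), FPR and precision as in Eqs. (3)-(5)."""
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return recall, fpr, precision

# Toy data (hypothetical): inferred susceptibility and landslide labels.
scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]

counts = confusion_counts(scores, labels)
recall, fpr, precision = rates(*counts)
print(counts, recall, fpr, precision)
```

Here one landslide cell scored below the threshold becomes a false negative, which lowers recall; this is the kind of error that the confusion matrix exposes directly.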

Table 1
Confusion matrix (tp: true positive, fp: false positive, fn: false negative, tn: true negative).

| | Predicted positive | Predicted negative |
|---|---|---|
| Measured positive | tp | fn |
| Measured negative | fp | tn |

Fig. 9. Typical ROC curve (Menzies et al., 2004, 2007).

Fig. 10. Location of the study area (Modified from Yalcin et al., 2011).

The ROC curve is a threshold-free criterion and shows changes in TPR versus FPR. In other words, a ROC curve plots the TPR on the y-axis and FPR on the x-axis. As a result, a ROC curve starts at the point (0, 0) and reaches (1, 1). The ideal ROC curve passes through the point (0, 1) with AUC = 1, indicating that there is no prediction error. An acceptable AUC value for a ROC curve should be higher than .5. A typical ROC curve is shown in Fig. 9 (Menzies et al., 2004, 2007).
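The threshold sweep behind a ROC curve can be sketched as follows. This is a minimal illustration with made-up scores and labels, not the routine used in GeoFIS; it sweeps the threshold over the observed score values and integrates the curve with the trapezoidal rule, and it assumes both classes are present.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by lowering the threshold stepwise."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Toy data (hypothetical): inferred susceptibility and landslide labels.
scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]

points = roc_points(scores, labels)
area = auc(points)
print(points[0], points[-1], area)
```

The curve starts at (0, 0) and ends at (1, 1) as described above, and a perfect ranking of landslide cells above all other cells would give an area of 1.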

ROC can express the overall performance of the system, but it assumes that an increase in true predictions may be accompanied by an increase in false predictions. However, in landslide susceptibility problems, the output is designed to be a continuous (regression-type) value in the interval [0, 1] and is used as a susceptibility value. As a result, the false negative predictions (fn) need to be considered, but ROC cannot emphasize this factor.

When inference in GeoFIS is completed, the ROC curve is drawn and a confusion matrix is calculated automatically. The values of TPR, FPR, precision and recall are listed for the current inference. The use of these metrics together enables the user to interpret the results in more detail.

4. Test area and data

The test area is located in the eastern part of the Black Sea Region of Turkey, has an area of 300 km² including the city of Trabzon, and is one of the most landslide-prone areas of Turkey (Fig. 10).

In the eastern part of the Black Sea Region, several catastrophic landslides have occurred, killing or injuring several people. Thus, the area is an attractive area of study for earth scientists interested in landslide hazard (Akgun and Bulut, 2007; Akgun et al., 2008; Nefeslioglu and Gokceoglu, 2011; Yalcin et al., 2011; Kavzoglu et al., 2013).

Due to its topography, climate and geology, the area is highly susceptible to landslide phenomena. The annual precipitation of the area is almost 800 mm and is distributed evenly across months (Kavzoglu et al., 2013). This precipitation can trigger failures on susceptible slopes and cause landslides. In the area, the altitude ranges between 0 and 1800 m (Fig. 11a) (Table 2). The slope gradients in the area range from 0° to 76.03°, with an average slope of 18.55° (Fig. 11b) (Table 2). These slopes are generally southeast-facing (135.95°) (Fig. 11c) (Table 2) and frequently encounter heavy rainfall that makes the slopes susceptible to landslide occurrence. Several drainages generally control the runoff on the topography and affect landslide occurrence; this was considered in this study.

Fig. 11. Topographical data used as landslide conditioning parameters. (a) Elevation, (b) slope gradient, (c) slope aspect, (d) stream power index (SPI), and (e) topographical wetness index (TWI).

To prepare a landslide susceptibility map based on expert opinion, information on the landslide occurrence conditions is crucial. In the study area, a total of 140 landslides were identified, and the mode of failure of each was determined to be a translational rock block slide, debris block slide, translational earth slide or rotational earth slump according to the classification proposed by Varnes (1978) (Fig. 12). The areal extent of the smallest observable landslide is approximately .023 km² and the largest is 2175 km². The failure depth of the rotational and planar slides ranges from 2 m to more than 10 m. The fuzzy “if-then” rules are extremely important for a representative and realistic landslide susceptibility map. For this reason, the rules were developed by considering the conditions of the landslides identified in the study area.

A total of six parameters were considered for the landslide susceptibility zoning of the study area. These parameters are lithology and the morphological parameters altitude, slope gradient, slope aspect, topographical wetness index (TWI) and stream power index (SPI) (Fig. 11).

Lithology is one of the most important parameters controlling landslide occurrence in the study area. To investigate the relationship between landslide occurrence and lithology, a 1:100,000 scale lithology map prepared by the General Directorate of Mineral Research and Exploration (1998) was used (Fig. 12). In this map, nine lithological units were differentiated depending on the age, stratigraphic position and rock type. These units were reclassified into three main categories based on rock type because nine different units might have caused difficulties when constructing if-then rules. The reclassified units included alluvium (Al) (4.73% of the total area), volcanic rocks (86.11% of the total area) such as Eocene volcanic facies (Ev), basalt, andesite and their pyroclasts (Kru1 and Kru3), riodacite, dacite and their pyroclasts (Kru2), riolite, riodacite and their pyroclasts (Kru4b and Kru5a), and sedimentary rocks (9.16% of the total area) such as Pliocene continental units (Pl) and uncemented sand pebble and clay (S) (Table 3). Almost all of the landslides observed in the area (97.83% of all landslides) occurred in the volcanic rock units. In and around the study region, volcanic rock weathering is a widespread process due to the fracture structure of the rock units and the hydrothermal activity in the region. As a result of the weathering process, clay mineral-bearing soils develop. When heavy rainfall saturates these soils, the increase in soil mass makes the lithological units very susceptible to landslides. In general, the weathering depth in the area reaches a maximum of four meters, and this depth is highly suitable for shallow-seated landslides. Weathering is not the only factor for landslide occurrence in the area. Weathered rock units with steeply sloping topography are much more susceptible to landslide occurrence. Fewer landslides were observed in the sedimentary rock units (2.17% of all landslides). These sedimentary rocks are generally loosely cemented and have a heterogeneous composition; thus, the steep sedimentary slopes are more susceptible to landslide occurrence. Descriptive statistics showing the landslide occurrence probability for each of the lithological units are given in Table 3.

Table 2
Descriptive statistics for the data.

Grid cells without landslides

| Statistic | Altitude | Slope | Aspect | SPI | TWI |
|---|---|---|---|---|---|
| N (valid) | 478,096 | 478,096 | 478,096 | 478,096 | 478,096 |
| N (missing) | 0 | 0 | 0 | 0 | 0 |
| Mean | 488.0386 | 18.5522 | 135.9553 | 126.3481 | 6.9235 |
| Median | 423.8300 | 19.3900 | 102.2500 | 58.0850 | 6.4500 |
| Mode | 200.00 | .00 | −1.00 | .00 | 10.13 |
| Std. deviation | 318.42666 | 13.09510 | 122.22035 | 273.52478 | 1.70535 |
| Variance | 101395.5 | 171.482 | 14937.814 | 74815.806 | 2.908 |
| Skewness | .921 | .133 | .423 | 9.349 | .905 |
| Std. error of skewness | .004 | .040 | .040 | .040 | .040 |
| Kurtosis | .683 | −.633 | −1.296 | 149.343 | .026 |
| Std. error of kurtosis | .007 | .007 | .007 | .007 | .007 |
| Minimum | 3.00 | .00 | −1.00 | .00 | 3.05 |
| Maximum | 1800.00 | 76.03 | 359.71 | 10153.67 | 19.20 |

Grid cells with landslides

| Statistic | Altitude | Slope | Aspect | SPI | TWI |
|---|---|---|---|---|---|
| N (valid) | 6531 | 6531 | 6531 | 6531 | 6531 |
| N (missing) | 0 | 0 | 0 | 0 | 0 |
| Mean | 590.4380 | 20.7700 | 143.8797 | 140.7517 | 6.5869 |
| Median | 544.2400 | 21.8200 | 119.6600 | 68.2000 | 6.1500 |
| Mode | 500.00 | .00 | −1.00 | .00 | 10.13 |
| Std. deviation | 248.13826 | 12.41416 | 116.24321 | 289.33936 | 1.52125 |
| Variance | 61572.596 | 154.111 | 13512.483 | 83717.265 | 2.314 |
| Skewness | .462 | −.072 | .272 | 8.037 | 1.239 |
| Std. error of skewness | .030 | .030 | .030 | .030 | .030 |
| Kurtosis | −.206 | −.320 | −1.332 | 106.777 | 1.121 |
| Std. error of kurtosis | .061 | .061 | .061 | .061 | .061 |
| Minimum | 100.00 | .00 | −1.00 | .00 | 3.67 |
| Maximum | 1350.00 | 66.78 | 359.55 | 6498.54 | 14.53 |

Fig. 12. Lithology map (General Directorate of Mineral Research and Exploration, 1998) and the landslide inventory of the study area.

To produce the altitude, slope gradient, slope aspect, TWI and SPI data, a 1:25,000 scale topographical sheet of the area was first digitized. Then, using the digitized topographical data, a digital elevation model (DEM) of the area was created with a triangulated irregular network (TIN) data model. All the morphological data were produced from this DEM.

Altitude was considered to be a good indicator for landslide susceptibility and has been used by many researchers (Pachauri and Pant, 1992; Ercanoglu and Gokceoglu, 2002; Nefeslioglu et al., 2010; Akgun et al., 2012). For this reason, in this study, altitude was considered as one of the landslide-conditioning parameters.

Slope gradient is frequently used in landslide susceptibility studies because sliding of the loose material is directly related to the slope gradient (Liu et al., 2004; Akgun and Turk, 2010). Although the relationship between slope aspect and landsliding has been investigated extensively, there is no general consensus on the effect of slope aspect (Carrara et al., 1991; Ercanoglu and Gokceoglu, 2002). However, aspect is considered an important landslide-conditioning parameter for landslide susceptibility mapping because the aspect-related factors, such as exposure to rainfall and wetting–drying cycles, may control the occurrence of landslides (Van Westen and Bonilla, 1990; Carrara et al., 1991; Dai et al., 2001). The other secondary derivative of the DEM produced is the TWI. TWI has been used to describe the effect of topography on the location and size of saturated areas of runoff generation (Nefeslioglu et al., 2008; Akgun and Turk, 2010). To calculate the TWI values, Moore et al. (1991) proposed the following equation:

$$\mathrm{TWI} = \ln\left(\frac{A_s}{\tan\beta}\right) \qquad (6)$$

where A_s is the specific catchment area (m²/m), and β is the slope gradient (in degrees). The TWI map was produced using DIGEM 2.0, developed for digital terrain analysis by Conrad (2002), and then exported to the ArcGIS (Version 9.3.1) (2009) software for use in the model construction.

Another DEM-derived morphological feature is the SPI. This index is used to describe potential flow erosion and related landscape processes (Moore et al., 1991). The SPI is calculated from the following equation:

$$\mathrm{SPI} = A_s \tan\beta \qquad (7)$$

where A_s is the specific catchment area (m²/m), and β is the slope gradient in degrees. As the specific catchment area and gradient increase, the amount of water contributed by upslope areas and the velocity of water flow increase; hence, the SPI and slope-erosion risk increase (Moore et al., 1991). Therefore, this parameter can be considered to be one of the components of landslide occurrence (Lee and Min, 2001; Gokceoglu et al., 2005; Nefeslioglu et al., 2008; Akgun and Turk, 2010). The SPI map was also produced by DIGEM 2.0.
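Eqs. (6) and (7) can be evaluated per grid cell as in the sketch below. It is not the DIGEM routine: the function names are hypothetical, the slope is converted from degrees to radians before taking the tangent, and the small minimum slope used to avoid division by zero on flat cells is an assumption of this example.

```python
import math

def twi(specific_catchment_area, slope_deg, min_slope_deg=0.1):
    """Topographical wetness index, Eq. (6).

    min_slope_deg is an assumed floor that keeps tan(beta) > 0 on flat cells.
    """
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(specific_catchment_area / math.tan(beta))

def spi(specific_catchment_area, slope_deg):
    """Stream power index, Eq. (7)."""
    return specific_catchment_area * math.tan(math.radians(slope_deg))

# Hypothetical cell: A_s = 100 m2/m on a 45 degree slope, where tan(beta) = 1.
cell_twi = twi(100.0, 45.0)
cell_spi = spi(100.0, 45.0)
print(cell_twi, cell_spi)
```

For a fixed catchment area, a steeper slope lowers the TWI and raises the SPI, which matches the interpretation given in the text.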

To show the relationship between the morphological data and landslide occurrence, descriptive statistics are given in Table 2. Based on these statistics, landslides occur on average at an altitude of about 590 m. The mean slope gradient and slope aspect of the landslide cells are 20.77° and 143.87°, respectively, and the mean SPI and TWI values of the landslide cells are 140.75 and 6.58 (Table 2).

5. Experimental results and discussion

In the study area, 478,096 pixel values were processed, and 6531 pixels had actual landslide occurrence values. The remainder (471,565 pixels) had no landslide information, and their susceptibilities were inferred with GeoFIS. Because only 1.37% of the pixels had a landslide value, the data were imbalanced. A total of six parameters were used in the modeling phase, and a total of 96 rules were developed by the domain expert. While modeling, two fuzzy sets (low, high) were used for each input, and the membership functions were adjusted to handle fuzziness at the maximum level. In detail, the triangular membership function of the “low” set was designed with a = 0, b = 0 and c = maximum value of the parameter, and the triangular membership function of the “high” set was designed with a = 0, b = maximum value of the parameter and c = maximum value of the parameter. A total of five symmetric fuzzy sets (very low, low, moderate, high, very high) were used for the output, and the [0, 1] range of the output was equally divided among the five sets. All membership functions were triangular, and center of area was used as the defuzzification method.
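The input design described above can be written out directly. The sketch below is an illustration only: it uses the (a, b, c) parameters given in the text, takes the maximum slope of 76.03° and the mean slope of 18.55° from Table 2 as example values, and the helper names are hypothetical.

```python
def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def low(x, x_max):
    """'low' set: a = 0, b = 0, c = maximum value of the parameter."""
    return tri(x, 0.0, 0.0, x_max)

def high(x, x_max):
    """'high' set: a = 0, b = maximum, c = maximum value of the parameter."""
    return tri(x, 0.0, x_max, x_max)

SLOPE_MAX = 76.03   # maximum slope gradient in Table 2 (degrees)
slope = 18.55       # mean slope gradient in Table 2 (degrees)

mu_low = low(slope, SLOPE_MAX)
mu_high = high(slope, SLOPE_MAX)
print(mu_low, mu_high, mu_low + mu_high)
```

With this design, every value strictly between 0 and the maximum belongs to both sets, and the two membership degrees sum to 1; only the two end points are full members of a single set.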

The ROC curve for the inference is given in Fig. 13; the AUCvalue was .602, a low but acceptable value. However, the confusionmatrix yielded a TPR value of .902 (Fig. 13), suggesting that theimplemented FIS model was highly successful at predictingsusceptibility of landslide occurrence. A threshold value of .5 wasused to produce the confusion matrix because the output is in theinterval of [0..1] This observed value (.902) represents the degree

Table 3. The distribution of the lithological units with respect to landslides in the area.

| Lithology | # of grid cells | % | # of grid cells in occurrence area | % | Frequency ratio |
|---|---|---|---|---|---|
| Alluvium (Al) | 22,615 | 4.73 | 0 | 0 | .000 |
| Eocene volcanic facies (Ev) | 70,154 | 14.67 | 334 | 5.11 | .348 |
| Basalt, andesite and their pyroclasts (Kru1) | 1191 | .25 | 0 | .00 | .000 |
| Riodacite, dacite and their pyroclasts (Kru2) | 8827 | 1.85 | 152 | 2.34 | 1.264 |
| Basalt, andesite and their pyroclasts (Kru3) | 305,202 | 63.83 | 5369 | 82.21 | 1.287 |
| Riolite, riodacite and their pyroclasts (Kru4b) | 7509 | 1.57 | 314 | 4.80 | 3.057 |
| Riolite, riodacite and their pyroclasts (Kru5a) | 18,827 | 3.94 | 220 | 3.37 | .855 |
| Pliocene continental units (Pl) | 42,587 | 8.91 | 142 | 2.17 | .243 |
| Uncemented sand pebble and clay (S) | 1184 | .25 | 0 | .00 | .000 |

T. Osna et al. / Computers & Geosciences 66 (2014) 20–30


of trust in the inferred results for the pixels which had no landslide occurrence information. In other words, the success of inference on the landslide occurrences increases the trust in the inferred susceptibility values for the pixels which have not yet experienced landslides. The TPR value was high, but the ROC–AUC value was not as high because most of the actual landslides correspond to the secondary susceptibility zone (Fig. 7).
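The two measures discussed above can be illustrated with a small sketch. The scores and labels below are invented toy values, not study data. Treating pixels without landslide information as negatives, and counting a score equal to the threshold as positive, are assumptions of this sketch rather than documented GeoFIS behavior.

```python
def true_positive_rate(scores, labels, threshold=0.5):
    """TPR (recall) = TP / (TP + FN) at a fixed threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    return tp / (tp + fn)


def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random
    negative; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy susceptibility scores: 4 landslide pixels (1) and 6 other pixels (0).
scores = [0.55, 0.52, 0.58, 0.45, 0.60, 0.51, 0.48, 0.40, 0.35, 0.30]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

tpr = true_positive_rate(scores, labels, threshold=0.5)
auc = roc_auc(scores, labels)
print(tpr, auc)
```

TPR looks only at the landslide pixels at one threshold, whereas AUC summarizes the ranking of landslide pixels against all other pixels over every threshold, so the two values need not move together.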

As can be seen from the design of the FIS model, the core of each fuzzy set used for the inputs included only one value (the minimum or maximum value of the parameter). Thus, almost all inputs became members of both fuzzy sets (low and high) with different degrees, and this situation prevented the outputs from approaching a value of 1. In fact, this type of design limits the upper values that can be obtained by defuzzification. The ROC assessment therefore produced a slightly lower performance, because it is based on shifting the threshold from 0 to 1 in a stepwise manner. Consequently, when TPR and ROC–AUC are considered together, the landslide susceptibility map for the study area is considered to be successful and usable by land planners and decision makers.
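The ceiling on defuzzified values can be illustrated numerically. The sketch below assumes a hypothetical output layout in which the five triangular sets peak at 0, .25, .5, .75 and 1 with a half-width of .25; the exact layout used in GeoFIS is not stated here. It also assumes min implication and max aggregation, and the firing strengths .7 and .3 are arbitrary.

```python
def triangular(x, a, b, c):
    """Triangular membership with feet a and c and peak b (a <= b <= c)."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def centroid(membership, lo=0.0, hi=1.0, steps=10000):
    """Center-of-area defuzzification on a uniform grid over [lo, hi]."""
    num = den = 0.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        mu = membership(x)
        num += x * mu
        den += mu
    return num / den


def very_high(x):
    return triangular(x, 0.75, 1.0, 1.0)   # assumed output layout


def moderate(x):
    return triangular(x, 0.25, 0.5, 0.75)  # assumed output layout


# Best case: only "very high" fires, with full strength.
upper_bound = centroid(very_high)


def aggregated(x):
    # "Very high" fires at 0.7 and "moderate" at 0.3 (arbitrary strengths).
    return max(min(0.7, very_high(x)), min(0.3, moderate(x)))


mixed = centroid(aggregated)
print(upper_bound, mixed)
```

Under these assumptions, even full activation of "very high" alone gives a centroid of about .917, and any simultaneous activation of lower sets pulls the result further down, so outputs near 1 are out of reach.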

6. Conclusion

Landslide susceptibility mapping is a complex problem because of the imprecision of geological data and the inadequacy of the many soft computing and machine learning methods employed previously. However, it is possible to produce successful landslide maps, which are important for land planners and decision makers. The production of this type of map has not yet become routine for engineers and earth scientists because it has been impractical and difficult. In this study, bottlenecks that make landslide susceptibility mapping difficult are identified and are considered in the design criteria for development of the proposed GeoFIS application. GeoFIS operates as a single platform with integrated GIS and Mamdani-style inference, eliminating the requirements for external libraries or software to infer or visualize results. It has efficient modeling abilities based on expert opinion, batch input–output processes, instant mapping/visual controls, debugging of model behavior for selected regions, performance assessment and user-friendly graphical interfaces. The imbalanced nature of the data used in landslide mapping is discussed, and confusion matrix-based performance metrics such as TPR are proposed in addition to ROC–AUC. To assess the benefits of GeoFIS and demonstrate its performance, landslide susceptibility is assessed for a selected landslide-prone area; the achieved performance is .602 with ROC–AUC and .902 with TPR. These results encourage us to recommend the use of GeoFIS for landslide susceptibility map production and to try other expert-based methods to produce landslide susceptibility maps.

Appendix A. Supporting information

Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.cageo.2013.12.016.

References

Akgun, A., Bulut, F., 2007. GIS-based landslide susceptibility for Arsin-Yomra (Trabzon, North Turkey) region. Environ. Geol. 51, 1377–1387.

Akgun, A., Dag, S., Bulut, F., 2008. Landslide susceptibility mapping for a landslide-prone area (Findikli, NE of Turkey) by likelihood frequency ratio and weighted linear combination models. Environ. Geol. 54 (6), 1127–1143.

Akgun, A., Sezer, E.A., Nefeslioglu, H.A., Gokceoglu, C., Pradhan, B., 2012. An easy-to-use MATLAB program (MamLand) for the assessment of landslide susceptibility using a Mamdani fuzzy algorithm. Comput. Geosci. 38 (1), 23–34.

Akgun, A., Turk, N., 2010. Landslide susceptibility mapping for Ayvalik (Western Turkey) and its vicinity by multicriteria decision analysis. Environ. Earth Sci. 61 (3), 595–611.

Alvarez Grima, M., 2000. Neuro-Fuzzy Modeling in Engineering Geology. A.A. Balkema, Rotterdam, p. 244.

ArcGIS (Version 9.3.1), 2009. Integrated GIS Software, ESRI, CA.

Arora, M.K., Gupta, A.S.D., Gupta, R.P., 2004. An artificial neural network approach for landslide hazard zonation in the Bhagirathi (Ganga) Valley, Himalayas. Int. J. Remote Sensing 25 (3), 559–572.

Caniani, D., Pascale, S., Sdao, F., Sole, A., 2008. Neural networks and landslide susceptibility: a case study of the urban area of Potenza. Nat. Hazards 45, 55–72.

Carrara, A., Cardinalli, M., Detti, R., Guzetti, F., Pasqui, V., Reichenbach, P., 1991. GIS techniques and statistical models in evaluating landslide hazard. Earth Surf. Process Landforms 16 (5), 427–445.

Conrad, O., 2002. Digitales Gelande-Modell (DiGeM) Terrain Analysis Software. ⟨http://www.geogr.uni-goettingen.de/pg/saga/digem/⟩ (accessed 18.04.06).

Cox, E., 1994. The Fuzzy Systems Handbook: A Practitioner's Guide to Building, Using, and Maintaining Fuzzy Systems, second ed. Academic Press, San Diego, CA.

Dai, F.C., Lee, C.F., Xu, Z.W., 2001. Assessment of landslide susceptibility on the natural terrain of Lantau Island, Hong Kong. Environ. Geol. 40 (3), 381–391.

Davis, J., Goadrich, M., 2006. The relationship between precision-recall and ROC curves. In: Proceedings of the ICML'06, ACM Press, pp. 233–240.

DotSpatial 1.4, 2012. ⟨http://dotspatial.codeplex.com⟩ (last visit date June 2013).

Ercanoglu, M., Gokceoglu, C., 2004. Use of fuzzy relations to produce landslide susceptibility map of a landslide prone area (West Black Sea Region, Turkey). Eng. Geol. 75 (3-4), 229–250.

Ercanoglu, M., Gokceoglu, C., 2002. Assessment of landslide susceptibility for a landslide-prone area (north of Yenice, NW Turkey) by fuzzy approach. Environ. Geol. 41 (6), 720–730.

Fuzzynet 1.2.0, 2013. ⟨http://sourceforge.net/projects/fuzzynet⟩ (last visit date June 2013).

General Directorate of Mineral Research and Exploration, 1998. Geological map of Turkey. 1,100.000-scaled Trabzon Sheet.

Gokceoglu, C., Sezer, E.A., 2009. Statistical assessment on international landslide literature (1945–2008). Landslides 6, 345–351.

Fig. 13. Performance assessment in GeoFIS.


Gokceoglu, C., Sonmez, H., Nefeslioglu, H.A., Duman, T.Y., Can, T., 2005. The 17 March 2005 Kuzulu landslide (Sivas, Turkey) and landslide-susceptibility map of its near vicinity. Eng. Geol. 81, 65–83.

Groves, R.M., Fowler, F.J., Couper, M.P., Lepkowski, J.M., Singer, E., Tourangeau, R., 2009. Survey Methodology, second ed. Wiley, ISBN: 978-0-470-46546-2.

Hirsch, R., 1991. Validation samples. Biometrics 47 (3), 1193–1194.

Kanungo, D.P., Arora, M.K., Sarkar, S., Gupta, R.P., 2006. A comparative study of conventional, ANN black box, fuzzy and combined neural and fuzzy weighting procedures for landslide susceptibility zonation in Darjeeling Himalayas. Eng. Geol. 85 (3–4), 347–366.

Kanungo, D.P., Arora, M.K., Gupta, R.P., Sarkar, S., 2008. Landslide risk assessment using concepts of danger pixels and fuzzy set theory in Darjeeling Himalayas. Landslides 5, 407–416.

Kavzoglu, T., Sahin, E.K., Colkesen, I., 2013. Landslide susceptibility mapping using GIS-based multi-criteria decision analysis, support vector machines, and logistic regression. Landslides, http://dx.doi.org/10.1007/s10346-013-0391-7.

Kayastha, P., 2012. Application of fuzzy logic approach for landslide susceptibility mapping in Garuwa sub-basin, East Nepal. Front. Earth Sci. 6 (4), 420–432.

Lee, S., 2007. Application and verification of fuzzy algebraic operators to landslide susceptibility mapping. Environ. Geol. 52 (4), 615–623.

Lee, S., Min, K., 2001. Statistical analysis of landslide susceptibility at Yongin, Korea. Environ. Geol. 40, 1095–1113.

Lee, S., Ryu, J.H., Won, J.S., Park, H., 2004. Determination and application of the weights for landslide susceptibility mapping using an artificial neural network. Eng. Geol. 71, 289–302.

Liao, T.W., 2008. Classification of weld flaws with imbalanced class data. Expert Syst. Appl. 35, 1041–1052.

Liu, J.G., Mason, P., Hilton, F., Lee, H., 2004. Detection of rapid erosion in SE Spain: a GIS approach based on ERS SAR coherence imagery. Photogramm. Eng. Remote Sensing 70 (10) (1197–1185).

Lui, Y., Guo, H.C., Zou, R., Wang, L.J., 2006. Neural network modelling for regional hazard assessment of debris flow in Lake Qionghai Watershed, China. Environ. Geol. 49, 968–976.

Mamdani, E.H., Assilian, S., 1973. An experiment in linguistic synthesis with a fuzzy logic controller. Int. J. Man-Mach. Stud. 7 (1), 1–13.

Melchiorre, C., Matteucci, M., Azzoni, A., Zanchi, A., 2008. Artificial neural networks and cluster analysis in landslide susceptibility zonation. Geomorphology 94, 379–400.

Menzies, T., DiStefano, J., Orrego, A., Chapman, R., 2004. Assessing predictors of software defects. Predictive Software Models Workshop, Chicago, USA.

Menzies, T., Greenwald, J., Frank, A., 2007. Data mining static code attributes to learn defect predictors. IEEE Trans. Softw. Eng. 32 (1), 2–13.

Moore, I.D., Grayson, R.B., Ladson, A.R., 1991. Digital terrain modeling: a review of hydrological, geomorphological and biological applications. Hydrol. Processes 13 (4), 305–320.

Nefeslioglu, H.A., Gokceoglu, C., 2011. Probabilistic risk assessment in medium scale for rainfall induced earthflows: Catakli catchment area (Cayeli, Rize, Turkey). Math. Probl. Eng. (Article ID 280431).

Nefeslioglu, H.A., Gokceoglu, C., Sonmez, H., 2008. An assessment on the use of logistic regression and artificial neural networks with different sampling strategies for the preparation of landslide susceptibility maps. Eng. Geol. 97, 171–191.

Nefeslioglu, H.A., Sezer, E., Gokceoglu, C., Bozkir, A.S., Duman, T.Y., 2010. Assessment of landslide susceptibility by decision trees in the metropolitan area of Istanbul, Turkey. Math. Probl. Eng., 1–15, http://dx.doi.org/10.1155/2010/901095 (Article ID 901095).

Pachauri, A.K., Pant, M., 1992. Landslide hazard mapping based on geological attributes. Eng. Geol. 32, 81–100.

Pourghasemi, H.R., Pradhan, B., Gokceoglu, C., 2012. Application of fuzzy logic and analytical hierarchy process (AHP) to landslide susceptibility mapping at Haraz watershed, Iran. Nat. Hazards 63 (2), 965–996.

Pradhan, B., Sezer, E.A., Gokceoglu, C., Buchroithner, M.F., 2010. Landslide susceptibility mapping by neuro-fuzzy approach in a landslide prone area (Cameron Highland, Malaysia). IEEE Trans. Geosci. Remote Sensing 48 (12), 4164–4177.

Ramakrishnan, D., Singh, T.N., Verma, A.K., Gulati, A., Tiwari, K.C., 2013. Soft computing and GIS for landslide susceptibility assessment in Tawaghat area, Kumaon Himalaya, India. J. Int. Soc. Prev. Mitigation Nat. Hazards 65 (1), 315–330.

Sezer, E.A., Pradhan, B., Gokceoglu, C., 2011. Manifestation of an adaptive neuro-fuzzy model on landslide susceptibility mapping: Klang valley, Malaysia. Expert Syst. Appl. 38 (7), 8208–8219.

Sun, A., Lim, E.P., Liu, Y., 2009. On strategies for imbalanced text classification using SVM: a comparative study. Decision Support Syst. 48, 191–201.

Tangestani, M.H., 2004. Landslide susceptibility mapping using the fuzzy gamma approach in a GIS, Kakan catchment area, southwest Iran. Aust. J. Earth Sci. 51 (3), 439–450.

Varnes, D.J., 1978. Slope movement types and processes. In: Schuster, R.L., Krizek, R.J. (Eds.), Landslides Analysis and Control. Special Report, vol. 176. Transportation Research Board, National Academy of Sciences, New York, pp. 12–33.

Van Westen, C.J., Bonilla, J.B.A., 1990. Mountain hazard analysis using PC-based GIS. In: Proceedings of the 6th IAEG Congress, vol. 1, Balkema, Rotterdam, pp. 265–271.

Venkatesan, M., Thangavelu, A., Prabhavathy, P., 2013. An improved bayesian classification data mining method for early warning landslide susceptibility model using GIS. Adv. Intelligent Syst. Comput. 202, 277–288.

Xu, C., Xu, X., Dai, F., Saraf, A.K., 2012. Comparison of different models for susceptibility mapping of earthquake triggered landslides related with the 2008 Wenchuan earthquake in China. Comput. Geosci. 46, 317–329.

Yalcin, A., Reis, S., Aydinoglu, A.C., Yomralioglu, T., 2011. A GIS-based comparative study of frequency ratio, analytical hierarchy process, bivariate statistics and logistics regression methods for landslide susceptibility mapping in Trabzon, NE Turkey. Catena 85, 274–287.

Yilmaz, I., 2009. Landslide susceptibility using frequency ratio, logistic regression, artificial neural networks and their comparison: a case study from Kat landslides (Tokat-Turkey). Comput. Geosci. 35 (6), 1125–1138.

Zadeh, L.A., 1965. Fuzzy sets. Inf. Control 8, 338–353.
