
Geoderma 136 (2006) 774–787
www.elsevier.com/locate/geoderma

Fuzzy soil mapping based on prototype category theory

Feng Qi a,⁎, A-Xing Zhu b,c, Mark Harrower c, James E. Burt c

a Department of Political Science and Geography, University of Texas at San Antonio, 6900 NW Loop 1604, San Antonio, TX 78249, USA

b State Key Laboratory of Resources and Environmental Information System, Institute of Geographical Sciences and Natural Resources Research, Chinese Academy of Sciences, Building 917, Datun Road, An Wai, Beijing 100101, China

c Department of Geography, University of Wisconsin–Madison, 550 North Park Street, Madison, WI 53706, USA

Received 11 January 2005; received in revised form 26 May 2006; accepted 6 June 2006
Available online 1 August 2006

Abstract

An essential component of soil mapping is classification, a process of assigning spatial soil entities to predefined categories (classes). However, by their nature soils exist as a continuum both in the spatial and attribute domains and often cannot be fitted into discrete categories without introducing errors or at least over-simplification. One approach to mitigate this problem in digital soil mapping is the combination of fuzzy logic-based class assignment with a raster GIS representation model which allows the continuous spatial variation of soils to be expressed at much greater detail than has been achieved in conventional (analog) soil survey. However, applications of fuzzy soil mapping face two significant challenges: defining the central concept of a soil category and determining the degree of membership to the central concept. Prototype category theory is presented here as a potential solution to these difficulties. Emerging from ideas of family resemblance, centrality and membership gradience, and fuzzy boundaries (fuzzy set theory), prototype category theory stresses the fact that category membership is not homogenous and that some members are better representatives of a category than others. A prototype can be viewed as a representation of the category, that 1) reflects the central tendency of the instances' properties or patterns; 2) consequently is more similar to some category members than others; and 3) is itself realizable but is not necessarily an instance. Based on this notion, we developed a prototype-based approach to acquire and represent knowledge on soil–landscape relationships and apply the knowledge in digital soil mapping under fuzzy logic. The prototype-based approach was applied in a case study to map soils in central Wisconsin, USA. Our approach created maps that were more accurate in terms of both soil series prediction and soil texture estimation than either the traditional soil survey or a case-based reasoning approach.

© 2006 Published by Elsevier B.V.

Keywords: Soil map; Fuzzy logic; Cognitive theory; Prototype category theory; GIS; SoLIM

1. Introduction

In traditional soil mapping it is long-standing convention to classify soils and depict soil classes as discrete polygons on an ‘area-class’ map (Mark and Csillag, 1989). However, soils are known to exist more or less as a continuum in both geographic space and attribute space (Burrough, 1996; Zhu, 1997a) as one soil type blends into another. Fitting this continuous spatial character of soils into discrete soil categories with full memberships over-generalizes the inherent complexity of soil variation, which in turn degrades the accuracy of soil spatial information products. In addition to the over-generalization of soil variations, manual production in traditional soil mapping also faces other limitations, primarily low speed and high cost of production (Zhu et al., 2001).

⁎ Corresponding author. Tel.: +1 210 458 5611; fax: +1 210 458 5430.
E-mail addresses: [email protected] (F. Qi), [email protected] (A.-X. Zhu), [email protected] (M. Harrower), [email protected] (J.E. Burt).

0016-7061/$ - see front matter © 2006 Published by Elsevier B.V.
doi:10.1016/j.geoderma.2006.06.001

In order to overcome these limitations, many researchers started to explore the use of knowledge-based techniques and fuzzy logic concepts to improve the soil mapping process and its products (see McBratney et al., 2003 for a review). Among these endeavors, the SoLIM approach (Zhu and Band, 1994; Zhu et al., 1996, 1997; Zhu, 1997a,b; Zhu et al., 2001) is a knowledge-based soil mapping system developed for mass production of soil survey. SoLIM is an automated soil inference system that combines fuzzy logic-based class assignment with a raster GIS representation model, which allows the continuous spatial variation of soils to be expressed at much greater detail so that the class transitions and within-class variations can be represented. SoLIM uses an n-dimensional similarity vector to depict soil properties at each pixel location (i, j) (Zhu, 1997a): $S_{ij} = (S_{ij}^{1}, S_{ij}^{2}, \ldots, S_{ij}^{k}, \ldots, S_{ij}^{n})$, where $S_{ij}^{k}$ represents the similarity value or fuzzy membership of the soil at location (i, j) to the prescribed soil class k, and n is the total number of these prescribed classes. The soil similarity vectors (S) are inferred under fuzzy logic based on the same concept S = f(E) as in traditional soil survey, where the essential formative environment data E can be derived using GIS techniques (Zhu et al., 1996; McSweeney et al., 1994), and the relationship between soil and relevant environmental variables (the soil–landscape model) f is obtained through knowledge acquisition. f should reflect both the central concepts of soil classes and the transitions between central concepts.

Knowledge-based fuzzy digital soil mapping approaches currently face two major challenges in defining f: the representation of the central concept of a soil category and how membership to this central concept is modeled. These difficulties are largely due to two factors. First, knowledge acquisition from human experts has long been noted to be the bottleneck for the development of knowledge-based systems (Molokova, 1993; Weibel et al., 1995), especially in the case of knowledge-based soil mapping, where knowledge of the soil–landscape model largely exists as “tacit knowledge” (Hudson, 1992). Second, it is desirable that the extracted knowledge be represented in a form that is computable as well as readily communicable with soil scientists so that other soil scientists can validate and update this knowledge base. This makes it necessary for the knowledge representation to approximate the mental representation of soil scientists' understanding of the soil classes and for the knowledge acquisition to be based on a cognitive process suited to soil scientists' understanding of soil variation.

Early implementations of SoLIM, however, did not take into consideration the cognitive aspects of knowledge formulation and representation for soil classification. In previous SoLIM applications, soil scientists were required to provide either the exact forms of fuzzy membership functions (Zhu, 1999; Zhu et al., 2001) or a large set of typical cases for known soil types in the study area (Shi et al., 2004). Soil inference was then conducted through fuzzy inference or case-based reasoning. Lacking a cognitive basis, these previous approaches place unreasonable demands on soil scientists during knowledge acquisition by requiring that they reformulate their mental knowledge into the desired form. Difficulty may also arise when the acquired knowledge needs to be interpreted or updated. This paper presents an approach that addresses this issue by employing the principles of cognitive theory in both the knowledge representation and the associated knowledge acquisition process in an effort to provide guidance to practitioners of digital soil mapping using approaches similar to SoLIM.

It has been contended that better understanding of how human beings acquire, organize, and process domain knowledge eases the difficulties in acquiring knowledge from domain experts in the development of knowledge-based systems (McCracken and Cate, 1986; Ford et al., 1991; Zhu, 1999; Özesmi and Özesmi, 2004). Especially for soil classification, it has long been suggested that it would be possible and desirable to connect the design of knowledge-based classification systems with cognitive theories (McCracken and Cate, 1986). This paper presents an approach to obtaining and representing knowledge on soil–landscape relationships based on cognitive theories on human categorization, specifically, the prototype category theory (Rosch, 1973, 1978; Smith and Medin, 1981; Lakoff, 1987; Minda and Smith, 2001). We represent and acquire the knowledge of the soil–landscape model from soil scientists in terms of prototypes and membership gradations from prototypes. Variations of soil properties can then be modeled through prototype-based reasoning. The next section of this paper gives a brief introduction to prototype theory and how it differs from the “classical” category theory that underpins traditional approaches to soil mapping. A novel prototype-based fuzzy soil mapping method is then presented. The advantages of this new method are illustrated and evaluated through a case study.

2. Prototype theory

Developments in cognitive psychology over the last 30 years have radically changed our understanding of how humans comprehend and describe the world through the use of categories. The act of classification (the ability to distinguish that A is different from B) is possible because of categories. Whenever we see something as a kind of thing we are using categories, whether we are conscious of it or not. Categories help to (1) structure the world for us, (2) reduce detail, (3) aid memory retention and (4) facilitate the understanding of relationships.

The classical view of categories, traceable back to Aristotle, treats categories as empty containers into which we place similar items. Similarity is derived from shared or common properties and assignment is made using a mental “check-list” of these attributes (Lakoff, 1987; Hahn and Ramscar, 2001). Classical category theory further assumes:

1. Once placed in the container, all members of a category are equally good examples of that category;

2. the boundaries between categories can be sharply delineated;

3. no overlap exists between categories;

4. categories are free of individual concerns, such as experience, neurobiology, intelligence, or embodied sensory knowledge (i.e., kinesthetic intelligence).

It was not until the 1970s that the classical view of categories received intense criticism and new views started to emerge (Smith and Medin, 1981). The new theories viewed categories as a human construct created by an active mind searching for meaning rather than seeing them as pre-existing structures in the world. According to this new view, categories are conceptual structures resulting from humans' perception of and interaction with the environment (Hahn and Ramscar, 2001). Categorization is a process of comparing a new instance to previously established (but highly malleable) mental representations of the category, and classification is based on the similarity of the instance to the existing representations.

Among the newly developed category theories, prototype theory enjoys the largest support. Emerging from Wittgenstein's earlier ideas on family resemblance, centrality and membership gradience (Wittgenstein, 1953) and Zadeh's fuzzy set theory (Zadeh, 1965), prototype theory (Rosch, 1973, 1978) stresses the fact that category membership is not homogenous and that some members are better representatives of a category than others, which is noted as the “prototype effects” (Lakoff, 1987). Furthermore, it is believed that people may hold more than one kind of prototype for a given category and will deploy them differently depending upon the situation. Other concepts in the prototype approach include:

1. Internal heterogeneity. Prototype categories allow for uneven fit. Although penguins cannot fly, they are still birds.

2. Indeterminate boundaries. The category “U.S. Senator” is unambiguous: one is either a senator or one is not. However, what threshold defines “rich”? Is it possible to be 5 cents short of being rich?

3. Categories are relative. Most qualitative categories (hot, slow, old, for example) only have meaning in relation to their counterparts. Moreover, perception is often dependent upon experience: a big city to one person may be a village to another.

4. Categories change. Criteria for membership often evolve, as in the case of video games expanding our notion of “game”. Classical category theory assumes that categories are static and do not change.

Characteristics of a category can be very broad and haphazard. Categories are no longer seen as passive and static “mental containers.” Instead, they are constantly created, refined, and combined to create new categories, and thereby, construct new knowledge.

The prototype of a category is a composite or average of all the real instances experienced, that 1) reflects some measure of central tendency of the instances' properties; 2) consequently is more similar to some category members than others; and 3) is itself realizable but may not necessarily be an instance (Smith and Medin, 1981). Compared to the classical view of categories, the prototype is a summary representation of a category in terms of features that may be only probable to its category members (Davidsson, 1992).

In the case of soil classification, the continuous soil body is categorized into soil classes (soil categories). Such categories are conceptual structures, and internal heterogeneity is shown when soil scientists think a certain pedon is more representative of a soil class than another one, although both are classified as the same class. Although soils have been traditionally mapped as discrete polygons, it has long been recognized that soil classes have indeterminate boundaries in both geographic space and attribute space (Burrough, 1996). In the mapping of soil polygons in terms of soil–landscape units (Hudson, 1990), definitions have always been relative, as in a convex (versus concave) position or steep (versus not steep) slopes. Prototypes may change and criteria for determining memberships evolve through years of field work by experiencing more and more real instances of a soil category. Such a prototype, if realized as an actual instance, has the highest membership among all instances of the category under concern. Soil categorization/classification then proceeds through the comparison of the new instance to such established prototypes.

Traditional soil inventory mapping, however, is rooted in the classical view of categories. It overlooks the prototypical characteristics of soil categories and leads to over-generalization in both the spatial and attribute domain (Zhu, 1997b). Prototype category theory, as presented here, provides an alternative to acquiring knowledge about soil categories by creating category prototypes and defining membership gradations based on the difference between a new instance and the prototype.

3. Prototype-based soil mapping

Based on prototype theory, a category can be represented by its prototype: a composite set of features that summarizes the real instances of the category. This then serves as the cognitive reference point for inference (Khalidi, 1995; Minda and Smith, 2001). New instances are identified and memberships determined by matching the features (Khalidi, 1995; Markman, 1999). Our approach to soil mapping based on prototype theory consists of two steps. The first step involves acquiring knowledge about the soil categories from soil scientists and representing the knowledge in terms of prototypes and membership gradations. The second step is prototype-based soil inference.

3.1. Knowledge representation and knowledge acquisition

Smith and Medin (1981) distinguish two kinds of features related to the representation of a category: the “core features” versus the “identification features”. In the case of soil classification, the core features are those properties that are formally described in Soil Taxonomy (Soil Survey Staff, 1999) and reveal relations between classes, while the identification features are those that are commonly used to identify the spatial occurrences of certain soil classes. In soil survey, soils are mapped based on local soil–landscape models that describe the relationships between soil and its formative environmental factors (Hudson, 1992). In the context of predictive soil mapping (Scull et al., 2003; McBratney et al., 2003), the formative environment is usually characterized by variables such as terrain characteristics, parent materials, etc. The representation of soil classes for soil inference can thus use these environmental factors as the identification features.

In order to incorporate the prototypical aspects of soil classes, it is necessary to explicitly model the prototypes and membership gradations in the knowledge representation. Specifically, the prototypes of classes can be stored using a common frame structure (Minsky, 1975; Fillmore, 1985; Zhu, 1999), while the membership gradations can be represented with optimality curves (Zhu, 1999). An example of such a frame is illustrated in Fig. 1. Each frame defines the prototype(s) of a soil class and is also linked to a set of optimality curves that describe how membership responds when the values of the identification features change. An optimality value of 1 refers to full membership, and 0 means no membership at all. For soil class Valton (Fig. 1), we see that the membership drops from 1 to 0 when the bedrock changes from Oneota to any other type, and the membership decreases constantly when slope gets steeper.

Fig. 1. Improved frame representation of the soil class Valton.
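As a rough illustration of the frame-plus-curves idea, the Python sketch below stores a prototype and one optimality curve per identification feature. The class name `PrototypeFrame`, the helper functions, the linear shape of the slope curve, and all numeric slope values are assumptions made for this example only; they are not taken from Fig. 1 or from SoLIM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Union

FeatureValue = Union[str, float]
OptimalityCurve = Callable[[FeatureValue], float]


@dataclass
class PrototypeFrame:
    """Hypothetical frame: one soil class, its prototype, and its curves."""
    soil_class: str
    prototype: Dict[str, FeatureValue]  # typical value of each feature
    curves: Dict[str, OptimalityCurve] = field(default_factory=dict)

    def optimality(self, feature: str, value: FeatureValue) -> float:
        """Optimality in [0, 1] of one observed feature value."""
        return self.curves[feature](value)


def boolean_curve(required: str) -> OptimalityCurve:
    """Categorical feature: 1 for the required category, 0 for any other."""
    return lambda value: 1.0 if value == required else 0.0


def decreasing_curve(full_until: float, zero_at: float) -> OptimalityCurve:
    """Continuous feature whose optimality falls as the value rises.

    A straight-line drop is used purely as a placeholder shape.
    """
    def curve(value: float) -> float:
        if value <= full_until:
            return 1.0
        if value >= zero_at:
            return 0.0
        return (zero_at - value) / (zero_at - full_until)
    return curve


# Illustrative frame for Valton; the slope numbers are made up.
valton = PrototypeFrame(
    soil_class="Valton",
    prototype={"bedrock": "Oneota", "slope_percent": 6.0},
    curves={
        "bedrock": boolean_curve("Oneota"),
        "slope_percent": decreasing_curve(12.0, 30.0),
    },
)
```

Keeping the curves next to the prototype in one structure mirrors the stated goal that the knowledge stay explicit and readable by other soil scientists.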

Knowledge represented in such a way can be obtained from soil scientists through knowledge acquisition. If the prototype of a soil class is realizable as a local instance, it can be identified on a virtual landscape using a knowledge-acquisition tool such as 3dMapper (Burt and Zhu, 2004). 3dMapper is a visualization software tool that can display arbitrary 3D views of topography and allow for overlay of GIS data layers that are necessary in the identification of a typical soil landscape unit. Fig. 2 shows the identification of the prototype of soil class Valton on a virtual landscape using 3dMapper. In the identification of this prototype, the soil scientist can load in data layers that capture soil-formative environmental factors. The values of these environmental factors for the prototype can be recorded and stored in the frame structure in Fig. 1. In many cases, however, the prototype of a soil class may not be realizable as a real instance. It is very common that certain soil classes (e.g. soil series) may be first found and defined for other areas and no typical pedons develop in the area currently under concern. In these cases, prototypes should be defined by a soil scientist in the form of descriptive knowledge in knowledge acquisition. This involves the determination of a list of relevant environmental variables (identification features) for each individual soil class and the typical environmental conditions under which the soil class is expected to occur. Once such knowledge is obtained from the soil scientist through interviews, a set of frames (Fig. 1) can be constructed to store the prototypes.

Fig. 2. Identify the prototype of soil class Valton on a virtual landscape using 3dMapper.

Given the prototypes defined for all the possible soil classes in the mapping area, optimality curves can be modeled empirically using a priori heuristic curves, also known as the Semantic Import Model (SI) approach (Burrough, 1989; McBratney and Odeh, 1997). The choice of heuristic curves is based on the types of the environmental variables (identification features) (Shi et al., 2004). For a feature that is categorical (such as bedrock geology), the function is Boolean, like the one shown in Fig. 1. For continuous features, the soil scientist can choose from the various widely accepted models discussed by Burrough et al. (1992) and MacMillan et al. (2000). The next section will show the use of Gaussian functions in a case study for soil mapping in Wisconsin.

3.2. Prototype-based inference

Once the expert knowledge is acquired in the form of prototype frames and membership curves, it can be used for prototype-based soil inference. For any pixel in the mapping area, the degree of class membership to any predefined soil class can be determined by its degree of similarity to the class prototype based on the membership curves. It seems easy to confuse prototype-based inference with case-based reasoning (Shi et al., 2004), in which conclusions are drawn based on similarities to existing classified cases. The important differences between these two approaches include: (1) case-based reasoning is based on the exemplar view of categories, which takes specific exemplars as the category's mental representation and assumes knowledge about the category is implicitly embedded in the individual cases. In contrast, prototype theory views prototypes as the mental representation of a category where knowledge about the category (e.g. identification features) is explicitly represented (Smith and Medin, 1981); (2) case-based reasoning usually requires a vast number of cases (exemplars) while prototype-based reasoning, as will be shown in the case study, may need only one or a few prototypes (central tendency) for each category (Zeithamova, 2003); (3) case-based reasoning uses examples that actually exist as members of the categories under concern while prototypes may not be actual instances, but rather abstract representations or “textbook” examples that might not exist in a given dataset; (4) case-based reasoning uses not only similarities to the existing cases but also the typicality of the cases themselves to determine memberships of a new item, while prototype-based reasoning needs to account for only the similarity to the prototypes.

Our paper suggests that prototype-based inference is a better choice for fuzzy soil mapping due to the following considerations. First, in the case of soil mapping, abundant classified examples may not always be available for case-based reasoning to achieve acceptable performance. Second, soil mapping using case-based reasoning requires the existence of local instances of the categories, but representative examples for some soil classes may not exist locally within the mapping area. Third, it is always desirable to have the knowledge explicitly represented and documented for communication and map updates. Case-based reasoning does not directly allow this because knowledge is embedded in a large number of cases. Unlike in situations where it is impossible to have an abstract summary of the category features, the knowledge of a soil–landscape model can be well represented in a holistic way using prototypes rather than being implicitly embedded in individual exemplars (which are unique to a data set and not readily transferable to new data sets). And fourth, research in cognitive psychology (Minda and Smith, 2001) has noted that exemplar theory has often favored small, poorly structured categories, whereas prototype-based models may better represent categories that are more complex and well structured. In our case study, we will show the results from both case-based reasoning and prototype-based inference to compare the two in soil mapping.

4. Case study

The prototype-based soil inference approach was implemented under the SoLIM framework to update the soil survey of Raffelson watershed in La Crosse, Wisconsin (Fig. 3). The watershed is located in the “Driftless area” of southwestern Wisconsin that has remained free of direct impact from the most recent Pleistocene era continental glaciers. The area has a typical ridge and valley terrain with relatively flat, narrow ridges. There are moderate side slopes with gradients below 20% and steep slopes with high gradient values around 50%. The bedrock in this watershed is mainly of two types: (1) Prairie du Chien dolomite and (2) sandstone of Upper Cambrian age. Dolomite is only present on the high, rounded ridge areas; elsewhere it was removed by geologic erosion. Wherever the dolomite is eroded, the bedrock is sandstone, with erosion and cutting responsible for most of the differences in the landforms. The erosion of bedrock has divided what was once a fairly level plateau and has formed a relatively dissected upland with apparent relief. Most ridges and valleys in the area have been cultivated since the late 19th century. Side-slopes are generally forested, though some have been cleared for pasturing.

Fig. 3. Location and topography of the Raffelson Watershed, WI.

In terms of parent material, there are as many as five distinct types within the Raffelson watershed area, which are Oneota (dolomite), Glauconite (sandstone), Jordan (sandstone), Wonewoc (sandstone) and alluvial materials. The complexity of parent material over the area has led to the development of 16 different series (classes) of soil in the watershed. In this study a senior soil scientist from the local office of the U.S. Department of Agriculture–Natural Resources Conservation Service (USDA–NRCS) was asked to provide expert knowledge in the form of a soil–landscape model concerning the 16 soil classes.

In acquiring the knowledge of the prototypes for the soil classes, we first interviewed the soil scientist and obtained the descriptions of the central concept for each soil class. We then identified realizations of the prototypes for four soil classes (Lamoille, Churchtown, Norden, and Council) on a virtual landscape because fully representative instances of these soil classes exist in the watershed. For the other 12 classes, we obtained the most typical values of relevant environmental variables from the soil scientist in an interview session. Finally, membership changes were modeled with Gaussian optimality curves as shown in Fig. 1. The cross-over points (Burrough, 1989) were determined empirically based on the soil scientist's knowledge of the non-overlapping boundary of two adjacent central concepts. Likewise, the shapes of the curves were determined based on the soil scientist's descriptions of the central concepts. For example, if the central concept of soil class Valton indicates it occurs on slopes that are less than 12% steep, the optimality curve is S-shaped (Fig. 1). On the other hand, if a soil occurs on slopes that are steeper than 20%, the curve has a reversed S-shape. The normal bell-shaped curve indicates the highest membership in the middle, with membership decreasing toward both tails.
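The three curve shapes described above can be sketched with a single Gaussian form. The parameterization below, in which `width` is the distance from the central value to the cross-over point (membership 0.5), is an assumption chosen for illustration; the source does not give the exact equation used, and the function names are hypothetical.

```python
import math


def bell_curve(x, center, width):
    """Gaussian optimality: 1 at `center`, 0.5 at `center +/- width`."""
    return math.exp(-math.log(2.0) * ((x - center) / width) ** 2)


def upper_limited_curve(x, limit, width):
    """Full membership at or below `limit`, Gaussian decay above it
    (e.g. a class that occurs on slopes gentler than some value)."""
    return 1.0 if x <= limit else bell_curve(x, limit, width)


def lower_limited_curve(x, limit, width):
    """Full membership at or above `limit`, Gaussian decay below it
    (e.g. a class that occurs on slopes steeper than some value)."""
    return 1.0 if x >= limit else bell_curve(x, limit, width)
```

With this form, a soil scientist only needs to state the central value (or limit) and where membership should fall to one half, which matches the two pieces of knowledge the text says were elicited.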

In order to conduct soil inference with the acquired expert knowledge, we first created a GIS database of the environmental variables that were listed by soil scientists as the identification features for the soil prototypes. Because different soil classes may have different identification features, our GIS database is extensive enough to include all features identified in knowledge acquisition. The features used in this research include parent material, elevation, slope gradient, surface curvatures (profile and planform curvatures) (Zevenbergen and Thorne, 1987), and two spatial variables: topographic wetness index and percentage of colluvium from competing bedrocks. Topographic wetness index is used to combine connectivity information based on flow direction with slope dynamics to represent the hydrological topographic characteristics that influence soil formation (Moore et al., 1993; Band et al., 1993). Since colluvium from different bedrocks tends to influence soil development, we included the spatial variable that describes the percentage of colluvium from competing upslope bedrocks for footslope locations. Specifically, for a given footslope location, the relative amount of colluvium it received from a certain bedrock is approximated on the basis of the accumulated upstream drainage cells originating from the given bedrock polygon. The percentage of colluvium from multiple competing upslope bedrocks is then computed relatively.
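The last step, turning accumulated upstream cell counts into relative percentages, amounts to a simple normalization. A minimal sketch, assuming the per-bedrock counts have already been derived from a flow-accumulation analysis (the function name and input format are hypothetical):

```python
def colluvium_percentages(upslope_cells):
    """Convert upstream drainage cell counts per bedrock into percentages.

    `upslope_cells` maps a bedrock name to the number of accumulated
    upstream cells that originate from that bedrock polygon.
    """
    total = sum(upslope_cells.values())
    if total == 0:
        # No upslope contribution at all: report zero for every bedrock.
        return {bedrock: 0.0 for bedrock in upslope_cells}
    return {
        bedrock: 100.0 * count / total
        for bedrock, count in upslope_cells.items()
    }
```

For example, a footslope cell with 30 upstream cells from one bedrock and 10 from another would be assigned 75% and 25%, respectively.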

During soil inference, our inference engine scans through all pixels within the mapping area to compute the similarity vectors. For any given pixel, the inference engine first looks up in the knowledge base the prototype and identification features defined for every soil class. The similarity of the pixel to each class is then obtained by applying a fuzzy-AND operation (Zadeh, 1965) to the optimality values of all identification features of that class. The use of the fuzzy-AND operation follows Zhu and Band (1994) and is based on the limiting factor principle in ecology, which states that the limiting factor controls soil development; thus the environmental variable that gives the lowest optimality value determines the membership of the soil.
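The per-pixel step can be sketched as follows, taking the fuzzy-AND to be the minimum operator of Zadeh (1965). The knowledge-base layout (a dictionary of classes, each mapping feature names to optimality functions) and the function names are assumptions for illustration, not the SoLIM data model.

```python
def class_similarity(curves, pixel):
    """Fuzzy-AND (minimum) over the optimality values of all
    identification features defined for one soil class."""
    return min(curve(pixel[feature]) for feature, curve in curves.items())


def similarity_vector(knowledge_base, pixel):
    """Similarity of one pixel to every soil class in the knowledge base."""
    return {
        soil_class: class_similarity(curves, pixel)
        for soil_class, curves in knowledge_base.items()
    }


# Toy knowledge base with two hypothetical classes and made-up curves.
knowledge_base = {
    "class_a": {
        "bedrock": lambda b: 1.0 if b == "Oneota" else 0.0,
        "slope": lambda s: 1.0 if s <= 12.0 else 0.5,
    },
    "class_b": {
        "slope": lambda s: 0.25 if s <= 12.0 else 1.0,
    },
}
pixel = {"bedrock": "Oneota", "slope": 20.0}
vector = similarity_vector(knowledge_base, pixel)
```

Note that each class is evaluated only on its own identification features, so classes with different feature lists can share one knowledge base.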

Once the similarity vectors for all locations in the mapping area are computed, a soil series map can be created through the process of defuzzification (Janikow, 1998) by assigning each location the soil class that has the highest membership value in the similarity vector. The resulting raster map is expected to be spatially more detailed than traditional soil survey based on the ‘area-class’ model. In addition, uncertainties associated with this classification process can also be computed from the similarity vector (Zhu, 1997b) to depict the typicalities of the classified soils and the transitions between soil classes. The continuous variation of soils can be further represented by continuous soil property maps derived from the similarity vectors. A continuous soil property (e.g. A horizon depth) map can be generated with the following formula according to Zhu et al. (1997):

$$v_{ij} = \frac{\sum_{k=1}^{n} s_{ij}^{k}\, v^{k}}{\sum_{k=1}^{n} s_{ij}^{k}} \qquad (1)$$

where $v_{ij}$ is the property at site (i, j); $v^{k}$ is the typical value (either defined in national standards or determined locally) of that property of soil class k; $s_{ij}^{k}$ is the membership value of soil class k at (i, j); n is the total number of soil classes in the area.

Fig. 4. Distribution of A horizon sand percentage for the Raffelson watershed: (left) from conventional soil survey map; (right) from prototype-based inference.
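Both outputs derived from a similarity vector, the hardened class and the property estimate of Eq. (1), can be sketched in a few lines of Python. The function names, the dictionary representation of the similarity vector, and the example numbers are assumptions made for illustration.

```python
def harden(similarity):
    """Defuzzification: return the class with the highest membership."""
    return max(similarity, key=similarity.get)


def property_value(similarity, typical_values):
    """Eq. (1): membership-weighted average of class-typical values.

    Returns None when all memberships are zero, since the weighted
    average is undefined in that case.
    """
    total = sum(similarity.values())
    if total == 0:
        return None
    weighted = sum(s * typical_values[k] for k, s in similarity.items())
    return weighted / total


# Made-up memberships and typical property values for two classes.
similarity = {"class_a": 0.75, "class_b": 0.25}
typical_sand = {"class_a": 20.0, "class_b": 60.0}
estimate = property_value(similarity, typical_sand)
```

Here the pixel is labeled `class_a`, yet its estimated property (30.0) lies between the two typical values, which is how the approach expresses within-class variation that a hardened map alone would hide.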

Field validation was conducted in order to evaluate the prototype-based fuzzy soil mapping approach. We collected data from 99 field points in the Raffelson watershed, all of which were classified and assigned soil series names by two experienced soil scientists from USDA–NRCS local offices, and 49 of which were given a texture analysis to determine the percentages of sand and silt in the A horizon. The surface texture property was selected because the relative sand, silt, and clay content of the surface soil is a fundamental soil property that is closely related to many other soil properties such as permeability, porosity, water holding capacity, and soil fertility (Posadas et al., 2001).

Fig. 5. Distribution of A horizon silt percentage for the Raffelson watershed: (left) from conventional soil survey map; (right) from prototype-based inference.

The validity of the prototype-based fuzzy mapping approach was evaluated in two aspects. First, the inference results were compared to the traditional soil survey map in terms of classification accuracy, map consistency and the level of detail both spatially and parametrically. Second, the prototype-based inference results were compared to case-based reasoning results in terms of classification accuracy based on inferred soil series maps and ability to capture continuous soil property variations based on derived soil property maps.

5. Results and discussion

5.1. Comparison of prototype-based inference results with traditional soil survey map

The 99 sample points for field evaluation were chosen based on two sampling strategies: transect sampling and point sampling. Transect sampling was employed to cover transitions between major landscape units (such as from ridge top to valley bottom and from concave draw to convex nose positions) at 53 of the 99 sites. The remaining sites were selected by local scientists based on the topography of the watershed to cover the major landscape units (such as ridge tops, shoulder positions, valley bottoms) throughout the watershed. The soil series names assigned by the soil scientists to these 99 sites were compared to both those mapped in conventional soil survey and those obtained from the inferred soil series map derived from prototype-based inference. Of the 99 sites, the prototype-based approach correctly inferred the soil series at 83 sites (∼83.8%), while the conventional soil survey mapped 66 sites correctly (∼66.7%). A thorough examination of both maps indicates that the improved performance is largely explained by three factors. First, spatial distributions of soil classes on the inferred soil map accurately follow the soil scientist's knowledge about the soil–landscape relationships, while the conventional soil map shows evidence of misplacement of class boundaries. This is often inevitable in manual soil survey due to the difficulty in precise determination of landscape characteristics through visual perception. Second, the automated approach allows for consistent application of the soil scientist's knowledge in the entire mapping area, while manual soil survey may introduce inconsistency. And third, inclusions that are common in conventional soil maps due to limitations of scale can be avoided with the raster-based representation because soil variations can be depicted in more detail.

Fig. 6. Scatter plots of observed A horizon silt percentage values vs. the values derived from traditional soil map (left) and prototype-based inference (right) at 49 sample locations in Raffelson watershed.

Table 1
Accuracy of the derived A horizon texture in the Raffelson watershed: the prototype-based inference result vs. the soil survey map

                    Percentage of sand        Percentage of silt
                    MAE     RMSE    AC        MAE     RMSE    AC
Inference result    7.75    12.65   0.85      6.49    11.11   0.85
Soil survey map     14.87   19.83   0.58      14.51   18.08   0.46

Table 2
Comparison of soil series inferred from prototype-based approach and case-based reasoning against the field observations for the Raffelson study area

                   Oneota          Glauconite      Wonewoc         Jordan          Alluvial
                   (39 samples)    (41 samples)    (8 samples)     (6 samples)     (5 samples)
                   Correct   %     Correct   %     Correct   %     Correct   %     Correct   %
Prototype-based    37        95    36        88    5         63    3         50    2         40
Case-based         35        90    34        83    6         75    4         67    2         40

Table 3
Accuracy of the derived A horizon texture in the Raffelson watershed: the prototype-based inference result vs. the case-based reasoning result

                             Percentage of sand        Percentage of silt
                             MAE     RMSE    AC        MAE     RMSE    AC
Prototype-based inference    7.75    12.65   0.85      6.49    11.11   0.85
Case-based reasoning         11.07   15.92   0.76      8.93    13.23   0.75

In addition to improved accuracy in classifying soil series, the advantage of prototype-based soil inference lies largely in its ability to capture within-class variations and transitions between class prototypes. As mentioned earlier, the soil at each pixel location is represented by a similarity vector, and the soil series map is created by defuzzifying the similarity vectors at all pixel locations. Although each pixel location is labeled as the soil class to which it has the highest similarity value, its similarity to other soil classes provides information about how typical the soil pedon is of its assigned class. To measure numerically the accuracy of the inferred map in capturing continuous variations of soil properties, we created continuous soil texture maps using Eq. (1). The typical soil texture values ($v^k$) for each soil series that appears in our study area were obtained from the sample site associated with the highest similarity value to the series, following Moore (2004). For comparison, soil texture maps based on the published soil survey map were also generated by assigning each pixel the typical texture values of the soil series with which the pixel is labeled on the map, according to official soil survey records. Figs. 4 and 5 show the maps of A horizon texture in terms of sand and silt percentages, respectively. The apparent difference between the maps derived from the prototype-based inference and those based on soil survey lies in the spatial detail and continuity of soil texture variations. The inferred texture maps depict more detail than those based on soil survey and tend to show continuous changes between soil types within the same bedrock breaks, while the survey maps show abrupt changes along class boundaries.
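The defuzzification step described in this paragraph, in which each pixel is labeled with the class of highest similarity while the similarity values themselves are kept as a measure of typicality, can be sketched as follows. This is a minimal illustration only; the function name `harden`, the class names, and the similarity values are hypothetical.

```python
def harden(similarity, class_names):
    """Return (label, similarity) for the class with the highest similarity.

    similarity: membership values for one pixel, one per soil class
    class_names: class labels in the same order as the similarity vector
    """
    if len(similarity) != len(class_names) or not class_names:
        raise ValueError("need one similarity value per class")
    best = max(range(len(similarity)), key=lambda k: similarity[k])
    return class_names[best], similarity[best]


# Hypothetical classes and similarity vectors for two pixels.
class_names = ["A", "B", "C"]
typical_pixel = [0.9, 0.3, 0.1]       # close to the prototype of class A
transitional_pixel = [0.4, 0.5, 0.2]  # between classes A and B

label_1, typicality_1 = harden(typical_pixel, class_names)
label_2, typicality_2 = harden(transitional_pixel, class_names)
```

Both pixels receive a single label on the hardened map, but the second is assigned to class B with a similarity of only 0.5 and retains a similarity of 0.4 to class A. That retained information is what a property map derived with Eq. (1) can use and what a conventional class map discards.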

Closer inspection of the texture maps also shows that the prototype-based inference results provide realistic details about spatial variations of soil texture. For example, in the Raffelson watershed, sandstone bedrock (Wonewoc, Jordan) tends to develop soils with coarser texture (higher sand and lower silt) than dolomite (Oneota). Within a catena, however, we would expect a low percentage of sand on flat ridge tops due to the preservation of finer materials and an increase of sand on back slopes due to the erosion of finer materials (silt percentage should exhibit the reverse pattern). Furthermore, convergent areas should have a low percentage of sand and a high percentage of silt due to the accumulation of fine materials, and divergent areas should show the opposite. This expected spatial pattern of soil texture is clearly observable in the maps derived from inference results, while the map based on soil survey lacks the same level of detail. The traditional soil survey classification differentiates only two discrete soil classes occurring on ridge top and back slope positions and does not capture the continuous transitions between and within such positions. As a result, the texture maps based on soil survey fail to separate ridge and shoulder positions, although these tend to have different soil textures in reality, and cannot differentiate convex back slopes from concave locations, where differential erosion and accumulation patterns should result in distinct textures.

Particle-size analyses were conducted on samples collected at 49 field sites using the pipette method (Kilmer and Alexander, 1949) to determine A horizon textures. The two scatter plots in Fig. 6 compare the estimates of silt percentage in the A horizon from prototype-based inference with those derived from the soil survey map. Fig. 6 (right) shows the tendency of the inferred values to follow the field observations, while Fig. 6 (left) shows little correlation between the observed values and those obtained from the soil survey map. Three indices were also computed to evaluate the performance of prototype-based inference: mean absolute error (MAE), root mean square error (RMSE), and agreement coefficient (AC). The AC is defined as (Willmott, 1984):

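$$\mathrm{AC} = 1 - \frac{n \cdot \mathrm{RMSE}^2}{\mathrm{PE}}, \qquad (2)$$

where $n$ is the number of observations and PE is the potential error variance, defined as:

$$\mathrm{PE} = \sum_{i=1}^{n} \left( |P_i - \bar{O}| + |O_i - \bar{O}| \right)^2, \qquad (3)$$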

given that $\bar{O}$ is the observed mean, and $P_i$ and $O_i$ are the estimated and observed values, respectively. AC values vary between 0 and 1, where 1 indicates perfect agreement and 0 indicates complete disagreement between the estimated and observed values (Willmott, 1984). Table 1 lists these statistics for comparing the performance of prototype-based inference with that of the soil survey map in estimating soil texture values. The MAE and RMSE statistics for prototype-based inference are consistently lower than those for the soil survey map, and the markedly higher AC for prototype-based inference (0.85 for both sand and silt, against 0.58 and 0.46 for the soil survey map) supports the earlier evidence that prototype-based inference captures continuous variations of soil properties better than the soil survey map.
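For readers who wish to reproduce these indices, the following sketch computes MAE, RMSE, and AC directly from Eqs. (2) and (3). The function name `agreement_stats` and the sample values are hypothetical; the field data of the case study are not reproduced here.

```python
import math


def agreement_stats(predicted, observed):
    """Return (MAE, RMSE, AC) for paired estimated and observed values."""
    if len(predicted) != len(observed) or len(observed) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(observed)
    obs_mean = sum(observed) / n
    pairs = list(zip(predicted, observed))
    mae = sum(abs(p - o) for p, o in pairs) / n
    mse = sum((p - o) ** 2 for p, o in pairs) / n
    rmse = math.sqrt(mse)
    # Potential error, Eq. (3): sum of (|P_i - O_mean| + |O_i - O_mean|)^2
    pe = sum((abs(p - obs_mean) + abs(o - obs_mean)) ** 2 for p, o in pairs)
    # Agreement coefficient, Eq. (2); n * RMSE^2 is the sum of squared errors.
    # AC is undefined when every value equals the observed mean (PE = 0).
    ac = 1.0 - n * mse / pe if pe > 0 else float("nan")
    return mae, rmse, ac


# Hypothetical silt percentages at three sites.
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
mae, rmse, ac = agreement_stats(predicted, observed)
```

Because PE grows with the spread of both the estimates and the observations around the observed mean, AC rewards estimates that track the observed variation rather than estimates that merely stay close to the mean.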

Fig. 7. A horizon sand percentage for the Raffelson watershed: (left) from prototype-based inference; (right) from case-based reasoning.

5.2. Comparison of prototype-based inference with case-based reasoning

Case-based reasoning was implemented in the Raffelson watershed following the procedure outlined in Shi et al. (2004). A total of 78 cases were used to create a case-based soil series map for the area. Of the 99 field sites, the prototype-based approach correctly inferred the soil series at 83 sites (∼83.8%), while case-based reasoning inferred 81 sites correctly (∼81.8%). A breakdown of soil series by parent material provides more information about the performance of the two approaches. Table 2 lists the accuracies of the two approaches in classifying soil series developed on the five different parent materials in the watershed. The table shows that the prototype-based approach performs better on the two most dominant parent materials, which cover over 90% of the watershed area, while case-based reasoning shows an advantage in the small areas (less than 5% of the total area) of two minor parent materials.
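The overall and per-parent-material accuracies follow directly from the counts of correctly classified sites. The short sketch below recomputes them from the counts reported in Table 2; the dictionary layout and function names are illustrative only.

```python
# Sample sizes and correct counts per parent material, as listed in Table 2.
samples = {"Oneota": 39, "Glauconite": 41, "Wonewoc": 8, "Jordan": 6, "Alluvial": 5}
correct = {
    "prototype-based": {"Oneota": 37, "Glauconite": 36, "Wonewoc": 5,
                        "Jordan": 3, "Alluvial": 2},
    "case-based": {"Oneota": 35, "Glauconite": 34, "Wonewoc": 6,
                   "Jordan": 4, "Alluvial": 2},
}


def accuracy_by_class(correct_counts, sample_counts):
    """Percentage of correctly classified sites for each parent material."""
    return {name: 100.0 * correct_counts[name] / sample_counts[name]
            for name in sample_counts}


def overall_accuracy(correct_counts, sample_counts):
    """Percentage of correctly classified sites over all parent materials."""
    return 100.0 * sum(correct_counts.values()) / sum(sample_counts.values())


by_class = {method: accuracy_by_class(counts, samples)
            for method, counts in correct.items()}
overall = {method: overall_accuracy(counts, samples)
           for method, counts in correct.items()}
```

Summing the counts gives 83 and 81 correct sites out of 99, which matches the overall figures quoted in the text. The per-class values make the contrast between the two dominant parent materials and the minor ones explicit.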

Examination of the misclassified samples revealed an apparent difference between prototype-based inference and case-based reasoning. The sample points misclassified by the prototype-based inference in areas of the three minor parent materials are those whose occurrences cannot be explained by the soil scientist's soil–landscape model. In other words, the landscape characteristics of these sample points do not match those provided by the soil scientist for the observed soil series. One possible reason is that these local soils exist as exceptions caused by local disturbances. A second possible reason is that the soil scientist's knowledge about these soils is not well developed, since the area is small and the formulation of mental prototypes has not reached a sufficient level. In either case, the soil class cannot be sufficiently represented in the soil scientist's mind with a single prototype that summarizes the central tendency of real instances. Therefore, inference based on prototypes did not perform as well as inference based on individual cases.

Prototype-based inference performed better than case-based reasoning in the two dominant parent material areas, for which the prototypes are well developed given sufficient experience with real instances. Case-based reasoning misclassified some of the samples even though the occurrences of these soils exactly match the soil scientist's knowledge. This could be because case-based reasoning requires a large number of real cases to achieve high accuracy. Although it helped in capturing exceptions to general rules, its overall accuracy suffers from the lack of cases at dominant landscape locations.

Fig. 8. A horizon silt percentage for the Raffelson watershed: (left) from prototype-based inference; (right) from case-based reasoning.

Fig. 9. Membership curves for soil Valton based on curvature: (left) membership curve defined by 2 cases; (right) membership curve defined by one prototype.


The performances of prototype-based inference and case-based reasoning in modeling continuous soil properties were further compared through field evaluation using A horizon textures at 49 sample points. Figs. 7 and 8 illustrate the A horizon soil texture maps derived from case-based reasoning results with Eq. (1) in comparison to those derived from prototype-based inference, and Table 3 lists the statistics computed for both approaches. The property maps generated from prototype-based inference exhibit more continuous transitions than those from case-based reasoning (Figs. 7 and 8). The reason is that the prototype-based approach models membership gradations with a set of curves that give the highest membership to instances closest to class prototypes and decrease membership gradually in proportion to the deviation from the prototypes. This leads to smooth transitions between class prototypes. Case-based reasoning, on the other hand, models instance memberships based on the typicalities of the cases themselves by taking into account the occurrences of cases at similar landscape locations. A lack of cases at certain landscape locations may therefore misrepresent the case typicalities and result in faulty membership values. Fig. 9 shows the membership curve simulated from 3 cases used in case-based reasoning, where the lack of cases between points A and B causes the unrealistic local dip in membership at P. As a result, the property values derived using the similarity vectors computed from these cases may not match the observed values. As shown in Table 3, the texture values of the 49 field points obtained from prototype-based inference match the observed values better (AC = 0.85 for both sand and silt) than those from case-based reasoning (AC = 0.76 for sand and 0.75 for silt), although the two approaches differ little in their accuracy in classifying soil series names.
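The contrast between the two kinds of membership curves can be illustrated with a toy example. The sketch below is not the membership function used in this study or in Shi et al. (2004): the bell-shaped curve, its width parameter, the rule of taking the maximum over case-centered curves, and all numbers are assumptions chosen only to show how a gap between cases can produce a local dip.

```python
import math


def bell(x, center, width):
    """Bell-shaped membership: 1.0 at the center, 0.5 at center +/- width."""
    return math.exp(-math.log(2.0) * ((x - center) / width) ** 2)


def prototype_membership(x, prototype, width):
    """Membership defined by a single prototype value (assumed form)."""
    return bell(x, prototype, width)


def case_membership(x, cases, width):
    """Membership defined by isolated cases: the best match to any case
    (assumed form, for illustration only)."""
    return max(bell(x, c, width) for c in cases)


# Hypothetical curvature values: two cases at A and B, and a location P
# halfway between them where no case was collected.
a, b, p = -1.0, 1.0, 0.0

dip_at_p = case_membership(p, [a, b], width=0.5)          # low value between cases
smooth_at_p = prototype_membership(p, prototype=0.0, width=1.5)
at_case_a = case_membership(a, [a, b], width=0.5)          # full membership at a case
```

With these assumed settings the case-defined curve falls to a low value at P even though P lies between two locations of full membership, whereas the prototype-defined curve peaks at the prototype and declines smoothly on either side.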

6. Conclusions

This paper has presented a fuzzy soil mapping approach based on prototype category theory. The case study in the Raffelson watershed shows that results from the prototype-based inference are more accurate than the conventional soil map in both soil class identification and property estimation (an accuracy increase of ∼17 percentage points in soil series name prediction and an agreement coefficient increase of ∼0.3 in A horizon texture estimation). Compared to case-based reasoning, the prototype-based approach shows a clear improvement only in property estimation (an accuracy increase of ∼2 percentage points in soil series name prediction and an agreement coefficient increase of ∼0.1 in A horizon texture estimation). This case study demonstrates that the prototype-based approach is an effective method in terms of knowledge acquisition and knowledge-based fuzzy mapping. It shares the advantages of previous knowledge-based fuzzy soil mapping methods: the automated mapping process is more efficient than manual soil survey and reduces errors introduced in manual compilation; the soil scientist's knowledge of the local soil–landscape model can be applied consistently over the entire mapping area; and the fuzzy representation scheme allows soil information to be represented at a high level of detail. Meanwhile, prototype category theory provides a cognitive basis for the processes of knowledge acquisition, knowledge representation, and fuzzy soil inference. It not only leads to more accurate classification of soil categories, but also represents the knowledge of a soil–landscape model explicitly in a holistic way that is readily reusable and transferable to new data sets. By taking into account the primary characteristic elements of human categorization, the prototype-based approach shows advantages over case-based reasoning in capturing within-class variations and transitions between soil classes. As noted by Suchan (1998, p. v–vi):

Prototype category theory can be a prompt to imaginative thinking about the categories we craft or choose before definitions are formalized for geographic representation. A category precedes and influences data collection, analysis, synthesis, and presentation. By viewing categories differently, we may gain insights to a given category and its relations to others.

Our case study shows that the prototype-based approach can be applied successfully in the “driftless area” of southwestern Wisconsin, especially for soils for which the soil–landscape model is well developed based on the soil scientist's field experience. The general framework (the knowledge representation, acquisition, and soil inference schemes) should also be applicable in other parts of the world. Successful application of this approach, however, may require changes to the soil category level and the identification features used, depending on the local landscape characteristics and the purpose of the inventory. Applications in other landscapes are under way to further examine the prototype-based approach within the SoLIM framework.

As with other knowledge-based methods, this approach is highly sensitive to the quality of the acquired expert knowledge. If the prototypes of soil classes are poorly defined, the results of this approach are less trustworthy, as shown by soils on the Jordan and Wonewoc parent materials in the study watershed. In this case, proper use of specific knowledge in the form of cases can improve the accuracy of soil inference. The cognitive basis for inference based on specific knowledge is exemplar category theory. Exemplar models store specific exemplars as the category's mental representation, and inference is accomplished through case-based reasoning. Studies have shown that no single theory accounts for all real-world categories: exemplar theory is often favored for small, poorly structured categories containing low-dimensional stimuli, whereas prototype-based strategies are better for representing categories that are better structured and contain higher-dimensional stimuli (Minda and Smith, 2001). For real-world categories such as soil classes, the two situations may exist at the same time. As suggested by the psychologist Zeithamova (2003), a composite model is needed in situations where both kinds of categories are present in the categorization system. Our future research will therefore need to develop a more versatile system that allows for knowledge acquisition and inference based on both prototypes and specific exemplars when not all soil classes are represented as well-formulated prototypes in the soil scientist's mental representation of the soil–landscape model.

Acknowledgements

The work reported in this paper was supported by funding from the Natural Resources Conservation Service of the United States Department of Agriculture, by the Chinese Academy of Sciences through its “One-Hundred Talents” program to A-Xing Zhu, and by the University of Wisconsin–Madison. The authors also wish to thank the anonymous reviewers for their comments on the manuscript.

References

Band, L.E., Patterson, P., Nemani, R.R., Running, S.W., 1993. Forest ecosystem processes at the watershed scale: 2. Incorporating hillslope hydrology. Agricultural and Forest Meteorology 63, 93–126.

Burrough, P.A., 1989. Fuzzy mathematical methods for soil survey and land evaluation. Journal of Soil Science 40, 477–492.

Burrough, P.A., 1996. Natural objects with indeterminate boundaries. In: Burrough, P.A., Frank, A.U. (Eds.), Geographic Objects with Indeterminate Boundaries. Taylor and Francis, London.

Burrough, P.A., MacMillan, R.A., Van Deursen, W., 1992. Fuzzy classification methods for determining land suitability from soil profile observations and topography. Journal of Soil Science 43, 193–210.

Burt, J.E., Zhu, A.X., 2004. 3dMapper 4.02. Terrain Analytics LLC, Madison, WI. www.terrainanalytics.com.

Davidsson, P., 1992. Concept acquisition by autonomous agents: cognitive modeling versus the engineering approach. Lund University Cognitive Studies, vol. 12, ISSN 1101–8453. Lund University, Sweden.

Fillmore, C., 1985. Frames and semantics of understanding. Quaderni di Semantica 6, 222–253.

Ford, K.M., Petry, F.E., Adams-Webber, J.R., Chang, P.J., 1991. An approach to knowledge acquisition based on the structure of personal construct systems. IEEE Transactions on Knowledge and Data Engineering 3, 78–87.

Hahn, U., Ramscar, M., 2001. Similarity and Categorization. Oxford University Press, New York.

Hudson, B.D., 1990. Concepts of soil mapping and interpretation. Soil Survey Horizons 31, 63–73.

Hudson, B.D., 1992. The soil survey as paradigm-based science. Soil Science Society of America Journal 56, 836–841.

Janikow, C.Z., 1998. Fuzzy decision trees: issues and methods. IEEE Transactions on Systems, Man and Cybernetics. Part B. Cybernetics 28, 1–14.

Khalidi, M.A., 1995. Two concepts of concept. Mind and Language 10, 402–422.

Kilmer, V.J., Alexander, L.T., 1949. Methods of making mechanical analyses of soils. Soil Science 68, 15–24.

Lakoff, G., 1987. Women, Fire, and Dangerous Things: What Categories Reveal about the Mind. University of Chicago Press, Chicago.

MacMillan, R.A., Pettapiece, W.W., Nolan, S.C., Goddard, T.W., 2000. A generic procedure for automatically segmenting landforms into landform elements using DEMs, heuristic rules and fuzzy logic. Fuzzy Sets and Systems 113, 81–109.

Mark, D.M., Csillag, F., 1989. The nature of boundaries on ‘area-class’ maps. Cartographica 26, 65–78.

Markman, A.B., 1999. Knowledge Representation. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

McBratney, A.B., Odeh, I.O.A., 1997. Application of fuzzy sets in soil science: fuzzy logic, fuzzy measurements and fuzzy decisions. Geoderma 77, 85–113.

McBratney, A.B., Mendonca Santos, M.L., Minasny, B., 2003. On digital soil mapping. Geoderma 117, 3–52.

McCracken, R.J., Cate, R.B., 1986. Artificial intelligence, cognitive science, and measurement theory applied in soil classification. Soil Science Society of America Journal 50, 557–561.

McSweeney, K., Gessler, P.E., Slater, B.K., Petersen, G.W., Hammer, R.D., Bell, J.C., 1994. Towards a new framework for modeling the soil–landscape continuum. Factors of soil formation: a fiftieth anniversary retrospective. Soil Science Society of America Journal Special Publication 33, 127–143.

Minda, J.P., Smith, J.D., 2001. Prototypes in category learning: the effects of category size, category structure, and stimulus complexity. Journal of Experimental Psychology. Learning, Memory, and Cognition 27, 775–799.

Minsky, M., 1975. A framework for representing knowledge. In: Winston, P.H. (Ed.), The Psychology of Computer Vision. McGraw-Hill, New York.

Molokova, O.S., 1993. Methodology for the acquisition of knowledge for expert systems: part 1. Basic concepts and definitions. Journal of Computer and Systems Sciences International 31, 1–5.

Moore, A.C., 2004. Predicting soil property variation using a soil landscape inference model. Thesis (M.S.), University of Wisconsin–Madison.

Moore, I.D., Gessler, P.E., Nielsen, G.A., Peterson, G.A., 1993. Soil attribute prediction using terrain analysis. Soil Science Society of America Journal 57, 443–452.

Özesmi, U., Özesmi, S.L., 2004. Ecological models based on people's knowledge: a multi-step fuzzy cognitive mapping approach. Ecological Modelling 176, 43–64.

Posadas, A.N.D., Gimenez, D., Bitelli, M., Vaz, C.M.P., Flury, M., 2001. Multifractal characterization of soil particle size distributions. Soil Science Society of America Journal 65, 1361–1367.

Rosch, E.H., 1973. Natural categories. Cognitive Psychology 4, 328–350.

Rosch, E.H., 1978. Principles of categorization. In: Rosch, E.H., Lloyd, B.B. (Eds.), Cognition and Categorization. Lawrence Erlbaum Associates, Hillsdale, NJ.

Scull, P., Franklin, J., Chadwick, O.A., McArthur, D., 2003. Predictive soil mapping: a review. Progress in Physical Geography 27, 171–197.

Shi, X., Zhu, A.X., Burt, J.E., Qi, F., Simonson, D., 2004. A case-based reasoning approach to fuzzy soil mapping. Soil Science Society of America Journal 68, 885–894.

Smith, E.E., Medin, D.L., 1981. Categories and Concepts. Harvard University Press, Cambridge, MA.

Soil Survey Staff, 1999. Soil taxonomy: a basic system of soil classification for making and interpreting soil surveys. Natural Resources Conservation Service, U.S. Department of Agriculture, Agriculture Handbook, vol. 436. U.S. Government Printing Office, Washington D.C., pp. 10–869.

Suchan, T., 1998. Categories in Geographic Representation. Ph.D. Thesis, The Pennsylvania State University, University Park, PA.

Weibel, R., Keller, S., Reichenbacher, T., 1995. Overcoming the knowledge acquisition bottleneck in map generalization: the role of interactive systems and computational intelligence. In: Frank, A., Kuhn, W. (Eds.), Spatial Information Theory: A Theoretical Basis for GIS. Springer-Verlag, London, pp. 139–156.

Willmott, C.J., 1984. On the evaluation of model performances in physical geography. In: Gaile, G.L., Willmott, C.J. (Eds.), Spatial Statistics and Models. D. Reidel Publishing Co., Dordrecht, the Netherlands, pp. 443–460.

Wittgenstein, L., 1953. Philosophical Investigations. Macmillan, New York.

Zadeh, L., 1965. Fuzzy sets. Information and Control 8, 338–353.

Zeithamova, D., 2003. Theories of categorization and category learning: why a single approach cannot be sufficient to account for the phenomenon. Studia Psychologica 45, 169–185.

Zevenbergen, L.W., Thorne, C.R., 1987. Quantitative analysis of land surface topography. Earth Surface Processes and Landforms 12, 47–56.

Zhu, A.X., 1997a. A similarity model for representing soil spatial information. Geoderma 77, 217–242.

Zhu, A.X., 1997b. Measuring uncertainty in class assignment for natural resource maps under fuzzy logic. Photogrammetric Engineering and Remote Sensing 63, 1195–1202.

Zhu, A.X., 1999. A personal construct-based knowledge acquisition process for natural resource mapping. International Journal of Geographical Information Science 13, 119–141.

Zhu, A.X., Band, L.E., 1994. A knowledge-based approach to data integration for soil mapping. Canadian Journal of Remote Sensing 20, 408–418.

Zhu, A.X., Band, L.E., Dutton, B., Nimlos, T., 1996. Automated soil inference under fuzzy logic. Ecological Modelling 90, 123–145.

Zhu, A.X., Band, L.E., Vertessy, R., Dutton, B., 1997. Deriving soil property using a soil land inference model (SoLIM). Soil Science Society of America Journal 61, 523–533.

Zhu, A.X., Hudson, B., Burt, J., Lubich, K., Simonson, D., 2001. Soil mapping using GIS, expert knowledge, and fuzzy logic. Soil Science Society of America Journal 65, 1463–1472.
