int. j. remote sensing, 1999, vol. 20, no. 18, 3549-3562

The significance of border training patterns in classification by a feedforward neural network using back propagation learning

GILES M. FOODY†

Department of Geography, University of Southampton, Highfield, Southampton SO17 1BJ, UK

† Tel: 023 8059 5493. Fax: 023 8059 3295. E-mail address: [email protected]

(Received 9 December 1997; in final form 18 August 1998)

Abstract. Training patterns vary in their importance in image classification. Consequently, the selection and refinement of training sets can have a major impact on classification accuracy. For classification by a neural network, training patterns that lie close to the location of decision boundaries in feature space may aid the derivation of an accurate classification. The role of such border training patterns and their identification is discussed in relation to a series of crop classifications from airborne Thematic Mapper data. It is shown that a neural network trained with a set of border patterns may have a lower accuracy of learning but a significantly higher accuracy of generalization than one trained with a set of patterns drawn from the cores of the classes. Unfortunately, conventional training pattern selection and refinement procedures tend to favour core training patterns. For classification by a neural network, procedures which encourage the inclusion of border training patterns should be adopted as this may facilitate the production of an accurate classification.

1. Introduction

Neural networks have become a popular tool in the analysis of remotely sensed data. Although there is a wide range of network types and possible applications in remote sensing, most attention has focused on the use of multi-layer perceptron or feedforward networks trained with a backpropagation learning algorithm for supervised classification (Atkinson and Tatnall 1997, Day 1997, Wilkinson 1997). Numerous studies on this topic have demonstrated that neural networks may generally classify data more accurately than conventional statistical methods. As with all supervised classifications, however, the quality of the training stage is of major, if not paramount, importance (Foody and Arora 1997). In the training stage, the aim is for the network to adapt itself iteratively until it is able to identify correctly the class membership of the training patterns provided to it and to generalize accurately.

Numerous factors influence the ability of a neural network to learn the class characteristics from the training set and develop a capacity to predict the class membership of previously unseen cases accurately. These range from the properties of the network itself, such as its complexity and learning parameters, to those of the training set. The combined effect of these factors complicates the design of the training stage of a neural network classification, particularly as the analyst must sometimes respond to conflicting pressures. It is important, for instance, that the network is sufficiently complex and intensively trained to characterize the classes without loss of generalization power (i.e. the ability to identify the class membership of previously unseen cases correctly) or the imposition of excessive computational demands. For this, the analyst must weigh up the costs and benefits of a range of issues such as different network architectures, learning algorithms and parameters. To help resolve some of the problems, research has addressed issues such as the optimization of the network architecture and the development of accelerated learning approaches (Dreyer 1993, Jiang et al. 1994, Manry et al. 1994). Whatever network properties are selected, the training set will have a significant effect on classification accuracy.

The influence of the training set on a classification extends well beyond the training stage. The nature of the training set can have a major impact on the ability of the network to generalize and thus on the accuracy with which an independent test set may be classified. The size of the training set in relation to the complexity of the network is, for instance, related to the likely error rate of a neural network classification (Baum and Haussler 1989, Bishop 1995). This is supported by observations that classification accuracy may be positively related to training set size (Zhuang et al. 1994). However, beyond a certain training set size the rate of increase in accuracy with the addition of further training patterns diminishes (Zhuang et al. 1994, Foody and Arora 1997) while the cost of ground data acquisition rises. Size is not, however, the only property of the training set that influences the accuracy of a neural network classification. The composition of the training set, notably in terms of the number of training samples or patterns for each class, is important, particularly in relation to the relative abundance of the classes at the site and the degree of intra-class heterogeneity (Foody et al. 1995, Blamire 1996, Staufer and Fischer 1997, Fardanesh and Ersoy 1998). There must therefore be a sufficient number and variety of training patterns to enable the network to train appropriately and represent reality (Dawson et al. 1993, Atkinson and Tatnall 1997).

Analysis of the training data may also reveal means of increasing the performance of the network, particularly in terms of speed, computational requirements and classification accuracy. It may, for instance, reveal redundant discriminating variables which may then be excluded from the classification. In addition to restricting the classification to those discriminating variables that are useful for separating the classes, this feature selection results in a decrease in network complexity and training time while maintaining, if not increasing, classification accuracy (Chang and Lippmann 1991, Battiti 1994, Lee and Landgrebe 1997, Benediktsson and Sveinsson 1997). A further refinement that may be made is through editing of the training set itself. Training patterns are of unequal value and, therefore, vary in their usefulness. The realization of this, and that not all training patterns are useful, with some even detrimental to the classification, opens up further possible methods of increasing the accuracy of network learning and generalization. Neural networks have, for instance, difficulty coping with conflicting information. Thus the removal of ambiguous training cases, while decreasing the size of the training set, may increase the accuracy of network learning and generalization (Dawson et al. 1993). Further increases in accuracy could be obtained by selecting training patterns and refining training sets intelligently (Ahmad and Tesauro 1989, Chang and Lippmann 1991).

Typically, training sets are acquired and refined using approaches that were developed for use with other classification techniques. In general, the training patterns for a supervised classification are supposed to be drawn from sites that are representative of the classes, but there is a tendency for them to be taken from exemplar sites of each class. Once extracted, the training set may also be subjected to refinement procedures that generally aim to remove or down-weight the influence of atypical or extreme training patterns (Buttner et al. 1989, Arai 1992, Ediriwickrema and Khorram 1997, Mather 1999). As a consequence of these training site location and refinement issues, class separability is often over-estimated in training datasets. Consequently, a set of such training patterns may not be an ideal basis for classification. This is particularly evident with a neural network classification, in which the actual distribution of training patterns in feature space (rather than summary statistics of their distribution) is a major determinant of the location of classification decision boundaries or hyperplanes. Ahmad and Tesauro (1989) demonstrated the importance of training patterns that lie close to the decision boundary in achieving an accurate classification. These border training patterns may, however, lie outside the 'core' of each class and so be prone to removal or revision by conventional training set refinement techniques. The aim of this paper is to evaluate the significance of such border training patterns in a neural network classification, with particular emphasis on the accuracy of network learning and generalization. The implications for training set selection and refinement are also addressed.

2. Test site and data

Airborne Thematic Mapper (ATM) data in eleven wavebands with a spatial resolution of approximately 5 m, acquired in July 1986 for a region of flat agricultural land located near the village of Feltwell, Norfolk, UK, were used. At this site in 1986 a range of crops was grown, but most of the land was planted to sugar beet, wheat, barley, carrots, potatoes and grass, and this study focused on these six classes only. Near the time of the ATM data acquisition a crop map was produced by conventional field survey methods and this was used as ground data. Further details and examples of the datasets were given in an earlier paper (Foody and Arora 1997).

3. Methods

Since the spatial resolution of the ATM data was relatively fine in relation to the size of the fields at the test site, most pixels could be assumed to be pure. The dataset, however, contained a small proportion of mixed pixels, mainly in the vicinity of field boundaries. As boundary regions are often unrepresentative and the inclusion of mixed pixels degrades analyses, a buffer zone around the boundaries of the fields was defined and masked out of the study, conforming to common practice in crop classifications. A stratified random sample of 1050 pixels, 175 pixels per class, was acquired from the ATM data. This was divided systematically into training (n=450) and testing (n=600) samples. A stepwise discriminant analysis was performed on the sample of training patterns to evaluate the extent of inter-class separability and the relative discriminatory power of the data acquired in the eleven wavebands for a feature selection. On the basis of the results, three wavebands (0.60-0.63 μm, 0.69-0.75 μm and 1.55-1.75 μm) were identified as providing a high level of inter-class separability, classifying 85.1% of the training set correctly; only the data acquired in these three wavebands were used in the later analyses. In addition to focusing on the wavebands providing the highest degree of inter-class separability, the feature selection also helped to limit the complexity of the neural network and its likely error rate for a training set of defined size.
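This kind of stepwise feature selection can be approximated with a simple greedy search. The sketch below is illustrative only and not what the original study used (the paper reports a stepwise discriminant analysis, presumably from a statistical package); the use of scikit-learn and the function name are assumptions.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def greedy_band_selection(X, y, n_bands=3):
    """Forward selection: repeatedly add the waveband that most improves
    the training accuracy of a discriminant-analysis classifier."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_bands:
        scores = []
        for b in remaining:
            lda = LinearDiscriminantAnalysis().fit(X[:, selected + [b]], y)
            scores.append(lda.score(X[:, selected + [b]], y))  # fraction correct
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected  # e.g. indices of the three most discriminatory bands
```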

To evaluate the significance of border training patterns for the accuracy of a neural network classification, the training sample was divided into two training sets: one that may be considered to be composed of border patterns, the other of non-border patterns. For this, a scale of 'borderness' was derived.

A set of border training patterns would ideally contain patterns from different classes that are close together in feature space and thereby expected to lie near the classification decision boundary. Such training patterns would therefore be expected to show a relatively similar degree of membership to the classes either side of the boundary. Class membership may be expressed in numerous ways. In this paper, the strength of class membership was expressed by the Mahalanobis distance. This measures the closeness of a training pattern to a class centroid and so provides a measure of the typicality of that pattern to the class; its magnitude is inversely related to the strength of class membership and may be converted to a probability of membership with reference to a chi-square distribution (McKay and Campbell 1982). A border training pattern would be expected to be almost as close to its actual class of membership as it is to another class. Ideally, therefore, the difference in the Mahalanobis distances between the two most likely classes of membership would be small for a border training pattern. Thus 'borderness' may be expressed as the difference between the two smallest Mahalanobis distances measured for each training pattern.

The Mahalanobis distance between each training pattern and each class was computed and the difference between the two smallest distances derived for each of the 450 training patterns. The patterns were then ordered by the magnitude of this difference and divided into two equally sized training sets. Those training patterns with a small difference between the two smallest distances formed the training set considered to contain border training patterns. Conversely, the training set containing the training patterns with a relatively large difference between the two smallest distances was considered to contain non-border patterns. These interpretations are supported by examination of the location of the training patterns in feature space. In general, the patterns in the training set with a large difference between the two smallest Mahalanobis distances appear clustered into classes, sometimes clearly separate from other classes (figure 1). These training patterns, therefore, appear to have been drawn from the cores of the classes. With the other training set the patterns are more dispersed and the locations of the classes are less visually distinct (figure 2). For brevity, the training sets comprising the training patterns with a small and a large difference between the two smallest Mahalanobis distances will be referred to as the 'border' training set and the 'core' training set respectively.
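As a concrete illustration of this procedure, the sketch below scores each pattern by the difference between its two smallest Mahalanobis distances and splits the sample into border and core halves. It is a minimal sketch, not the study's original code: the NumPy implementation, the function names and the use of squared distances with per-class sample covariances are all assumptions.

```python
import numpy as np

def mahalanobis_sq(X, mean, cov):
    """Squared Mahalanobis distance of each row of X to a class centroid."""
    inv_cov = np.linalg.inv(cov)
    d = X - mean
    return np.einsum('ij,jk,ik->i', d, inv_cov, d)

def split_by_borderness(X, y):
    """Order patterns by the difference between the two smallest
    Mahalanobis distances and split into border/core halves."""
    classes = np.unique(y)
    # distance from every pattern to every class centroid
    dists = np.stack([
        mahalanobis_sq(X, X[y == c].mean(axis=0), np.cov(X[y == c], rowvar=False))
        for c in classes
    ], axis=1)
    two_smallest = np.sort(dists, axis=1)[:, :2]
    borderness = two_smallest[:, 1] - two_smallest[:, 0]  # small => border pattern
    order = np.argsort(borderness)
    half = len(X) // 2
    return order[:half], order[half:]  # indices of border set, core set

# Example with the dimensions used in the paper: 450 patterns,
# 3 wavebands, 6 classes (random data purely for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(450, 3))
y = rng.integers(0, 6, size=450)
border_idx, core_idx = split_by_borderness(X, y)
```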

A series of trial runs was undertaken to define an appropriate neural network approach for the investigation. For comparative purposes, emphasis was placed on a series of classifications in which the same network architecture and parameters were used, although the results from a limited number of other architectures are provided to broaden the applicability of the discussion. The focus, however, was on a conventional feedforward neural network with a single hidden layer comprising ten units and employing the standard logistic sigmoid transfer function. There were three input and six output units, determined respectively by the number of wavebands and classes used, and a bias unit connected to all processing units. A stochastic backpropagation learning algorithm was used with the parameters defining the learning rate and momentum set initially at 0.8 and 0.2 respectively.

Figure 1. Scatterplots showing the location of the core training patterns in the feature space of the datasets used. (a) 1.55-1.75 and 0.60-0.63 μm wavebands; (b) 1.55-1.75 and 0.69-0.75 μm wavebands; (c) 0.60-0.63 and 0.69-0.75 μm wavebands.

Figure 2. Scatterplots showing the location of the border training patterns in the feature space of the datasets used. (a) 1.55-1.75 and 0.60-0.63 μm wavebands; (b) 1.55-1.75 and 0.69-0.75 μm wavebands; (c) 0.60-0.63 and 0.69-0.75 μm wavebands. For comparative purposes the axes were scaled as in figure 1.

The learning rate was reduced during training to 0.15 after 2000 epochs, and to 0.1 after a further 4000 epochs. Each network was trained for a total of 10 000 epochs and the final network error in learning was noted as a guide to the accuracy of training. The magnitude of this error is negatively related to the accuracy of network learning and is often used as a guide to the quality of a neural network for classification applications.
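The configuration described above can be sketched as follows. This is an illustrative reconstruction, not the original implementation (the study used the NCS NeuralDesk package): the use of PyTorch, the squared-error loss and the per-pattern update order are assumptions.

```python
import torch
from torch import nn

# 3 input units (wavebands), 10 hidden units, 6 output units (classes),
# logistic sigmoid activations; nn.Linear supplies the bias connections.
net = nn.Sequential(nn.Linear(3, 10), nn.Sigmoid(),
                    nn.Linear(10, 6), nn.Sigmoid())

opt = torch.optim.SGD(net.parameters(), lr=0.8, momentum=0.2)
loss_fn = nn.MSELoss(reduction='sum')  # summed squared-error loss (assumption)

def train(X, targets, epochs=10_000):
    """Stochastic (per-pattern) backpropagation with the paper's schedule:
    learning rate 0.8, then 0.15 after 2000 epochs, then 0.1 after a
    further 4000 epochs, for 10 000 epochs in total."""
    for epoch in range(epochs):
        if epoch == 2000:
            for g in opt.param_groups: g['lr'] = 0.15
        elif epoch == 6000:
            for g in opt.param_groups: g['lr'] = 0.1
        for i in torch.randperm(len(X)):  # one pattern at a time
            opt.zero_grad()
            loss = loss_fn(net(X[i]), targets[i])
            loss.backward()
            opt.step()
```

Targets here would be one-hot class vectors; a test pattern x is then allocated to the class of the most activated output unit, i.e. net(x).argmax().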

Once trained, the test set was passed through the network and each case was allocated to the class associated with the most activated output unit. The accuracy of the derived classification was evaluated with the aid of a confusion matrix, from which quantitative measures of accuracy were derived to illustrate the quality of the network's generalizations. These measures were the percentage correct allocation and a kappa-like coefficient (Kn), sometimes termed a tau coefficient, that provides a more realistic degree of compensation for the effects of chance agreement than the widely used Cohen's kappa coefficient of agreement (Brennan and Prediger 1981, Foody 1992). The kappa-like coefficients were calculated from the equation:

Kn = (Po - 1/c) / (1 - 1/c)    (1)

where Po is the observed proportion of agreement and c is the number of classes. The significance of differences in classification accuracy expressed by Kn was assessed with a Z test (Ma and Redmond 1995).
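A brief sketch of these two measures follows. The Kn formula is equation (1); the variance expression used for the Z test is the large-sample approximation commonly quoted for tau-type coefficients, var(Kn) = Po(1 - Po) / (n(1 - 1/c)^2), and should be treated as an assumption rather than a statement of Ma and Redmond's exact formulation.

```python
import numpy as np

def kn_coefficient(confusion):
    """Kn (tau-like) coefficient of equation (1) from a confusion matrix."""
    confusion = np.asarray(confusion, dtype=float)
    n = confusion.sum()                    # number of test cases
    c = confusion.shape[0]                 # number of classes
    po = np.trace(confusion) / n           # observed proportion of agreement
    kn = (po - 1.0 / c) / (1.0 - 1.0 / c)
    var = po * (1.0 - po) / (n * (1.0 - 1.0 / c) ** 2)  # assumed variance form
    return kn, var

def z_statistic(kn1, var1, kn2, var2):
    """Z statistic for the difference between two independent Kn coefficients;
    |Z| > 1.96 indicates significance at the 95% level of confidence."""
    return (kn1 - kn2) / np.sqrt(var1 + var2)
```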

4. Results and discussion

Classifications were undertaken using training sets composed of core and border training patterns to illustrate the effect of border training patterns on network learning and generalization. The implications of these classifications for training set selection and refinement were then assessed.

4.1. Classifications with core and/or border training patterns

The border and core training sets were used to train two similarly constructed neural networks. The final error in training was substantially lower for the network trained with the set of core training patterns than for that trained with the set of border training patterns (table 1). This reflected the somewhat simpler task of separating the classes as represented in the core training set, where the classes were relatively discrete and unambiguous, than in the border training set. In terms of the ability of the networks to generalize and predict accurately the class membership of previously unseen cases, which is a more rigorous and realistic evaluation of the accuracy of a neural network classification, markedly different results were obtained. While apparently having trained to a lower accuracy than the network trained with the set of core patterns, the network trained with the set of border patterns generalized more accurately (table 1). The accuracy of the classifications differed by 5% and the difference in Kn coefficients was significant at the 95% level of confidence. These results highlight the danger of using the training error as an index of network quality. More importantly, they indicate the importance of border training patterns for an accurate classification. The same trends were also observed for networks with different architectures (table 2). It is apparent, therefore, that training pattern selection and refinement approaches for neural network classification should aim to ensure the presence of border training patterns. Different relationships may, however, be observed with other classifiers, including neural networks using a different learning algorithm. For example, classifications of the test set by a discriminant analysis trained with the core and border training patterns differed insignificantly (Foody 1998).

Table 1. Summary of the results of the classifications derived from neural networks trained with the border, core and combined training sets. The accuracy of training is indicated by the final root sum squared error observed at the termination of training, while the percentage correct allocation and Kn coefficient express the accuracy of the classification of the independent testing set. For each classification the network used had an architecture comprising three input units, ten hidden units in a single layer and six output units (i.e. 3:10:6). Note that the accuracy values derived for a single classification are not directly comparable; each should only be interpreted relatively between classifications derived with the use of different training sets.

                                                      Generalization accuracy
              Number of   Training accuracy
Training set  patterns    (final learning error)   Percentage correct     Kn

Border        225         0.1755                   83.33                  0.806
Core          225         0.0407                   78.33                  0.739
              219         0.0003                   78.33                  0.739
              164         0.0004                   78.66                  0.743
              123         0.0005                   79.50                  0.754
               93         0.0004                   78.33                  0.739
Combined      225         0.1402                   86.33                  0.836
              450         0.1386                   87.00                  0.844

Table 2. Summary of results of classifications obtained using neural networks with different architectures (input units : hidden units in a single layer : output units).

                                                      Generalization accuracy
                            Training accuracy
Architecture  Training set  (final learning error)  Percentage correct     Kn

3:4:6         Border        0.2313                  82.16                  0.786
              Core          0.0837                  75.16                  0.702
3:7:6         Border        0.1995                  85.00                  0.820
              Core          0.0498                  77.83                  0.739
3:12:6        Border        0.1634                  78.50                  0.742
              Core          0.0224                  77.33                  0.728

Border training patterns appear to be more important than core patterns for the derivation of an accurate classification. By being less distinct or more ambiguous than core training patterns, they lie closer to the locality of an effective decision boundary. Conversely, the core training patterns appear to present a relatively unambiguous and simple discrimination problem, but the resultant decision boundary may be a poor discriminator of unseen testing cases, particularly if these are drawn from a varied sample. The core training set used was, however, not free from ambiguity. It had been assumed that, with the core training patterns at least, the closest class would be the actual class of membership. This was not the case for six of the training patterns, for two of which the actual class of membership was not even the second most likely class of membership. Given that in this training set the differences in the distances between the classes are relatively large, it is possible that these training patterns could be a significant source of confusion to the learning process and could distort the generalization ability of the network. These six training patterns were therefore deleted from the core training set and the network was re-trained before it was used to re-classify the test set. The result was a marked decrease in the training error at the termination of the learning process, but there was no significant effect on the accuracy of generalization (table 1). This supports the view that ambiguous cases may degrade network learning. In this instance, however, they had no significant effect on the accuracy of the classification of the testing set.

Since the core training patterns are relatively clumped in feature space, it may be hypothesized that variations in the size of the training set would have a limited effect on the accuracy of network training or generalization; a complex and largely case-dependent relationship would be expected with the border training set. To test this, the core training set was reduced in size by systematic removal of training patterns in relation to their position along the scale of borderness defined. In this way, each training set had the same general distribution of training patterns, with the only major difference being the overall size of the training set. Three training sets were produced, containing respectively 164, 123 and 93 training patterns; the six ambiguous cases were excluded from all of these sets. Each was used to train a network and the accuracy of network learning and generalization was evaluated as with the other networks. The results revealed a high degree of stability in both the accuracy of network learning and that of generalization. The accuracies of the classifications of the testing set, for instance, differed by at most 1.17%, insignificant at the 95% level of confidence (table 1).

Overall, the results indicate the significance of border training patterns in a neural network classification. It is apparent that, relative to core training patterns, their presence may reduce the accuracy of network learning but increase the accuracy of generalization. Numerous factors restrict further evaluation and comparison of the results. For instance, while the core and border training sets were of the same size they differed in class composition, and the scale of borderness defined, while simple to derive and intuitively logical, may not be optimal. Furthermore, since the same network architecture and learning algorithm parameters were used in all classifications it is possible that the network was of variable suitability for the individual classifications. Consequently, there are many training set issues still to be addressed fully. Critically, these include the specification of the ideal properties for a training set (e.g. size, class composition, level of inter- and intra-class variability) in addition to the methods for the selection and refinement of training sets. It may be that an ideal training set includes both core and border training patterns. This is evident in table 1, which shows the results of classifications derived using a combination of training patterns drawn from the core and border training sets. A combined training set, composed of half of each of the border and core training sets sampled systematically along the scale of borderness to form a set of the same size for comparative purposes, exhibited an accuracy of learning intermediate between those observed with classifications based on the core and border training sets. The accuracy of the classification of the test set was, however, higher than that derived from networks trained with the core and border patterns (table 1). This indicates that core and border training patterns provide different and useful discriminatory information. Using all the training patterns available, and thereby doubling the size of the combined training set, resulted in a marginal, but insignificant (at the 95% level of confidence), increase in accuracy.

4.2. Implications for training set selection and refinement

The important conclusion to be drawn from the classifications reported above is that the inclusion of border training patterns in a training set appears to facilitate the production of an accurate classification. Consequently it is important that training set refinement procedures for a neural network classification aim to retain the border patterns, perhaps at the expense of some core training patterns. This may prove to be a challenging task as training set refinement is, as succinctly described by Bolstad and Lillesand (1991), more of an art than a science. There are many methods that may be used but, in general, they aim to remove or down-weight the contribution of apparently atypical training patterns and so raise accuracy. Unfortunately, as border training patterns may be relatively ambiguous and lie away from the class centroids, they may be identified as candidates for removal or revision in the course of a conventional training set refinement procedure. The Mahalanobis distance is, for example, the basis of some training set refinement methods, with distant patterns removed or down-weighted. Table 3 shows that border patterns are generally more distant from, and so more atypical of, their actual class of membership than are core training patterns; they are thus stronger candidates for refinement operations. Paradoxically, therefore, the application of a conventional training set refinement procedure could result in the loss of useful discriminatory information through, for instance, the removal of border training patterns. The impact of this is difficult to predict but, as a guide, some analyses were repeated using equally sized training sets that were defined on the basis of their Mahalanobis distance to the actual class of membership.

Table 3. Summary of Mahalanobis distances to the actual class of membership for the training patterns in the border and core training sets. Of the six ambiguous cases in the core training set (see text for discussion) five were of sugar beet and one of potatoes; the figures in brackets show the values obtained after their deletion. In general, the training patterns in the border set are of similar or larger distance from the class centroids than the patterns in the core training set.

                           Mahalanobis distance to actual class
Training set  Class        Mean          Median        Minimum   Maximum

Border        Barley       3.57          2.98          0.51       9.11
              Carrots      3.80          3.44          0.73      15.35
              Grass        5.19          5.82          0.78      10.38
              Potatoes     6.05          4.33          1.75      21.50
              Sugar beet   1.59          1.09          0.02      11.10
              Wheat        1.40          0.96          0.07       8.86
              All          2.96          2.10          0.02      21.50

Core          Barley       4.40          3.05          0.43      13.58
              Carrots      2.78          1.73          0.17       9.10
              Grass        2.63          1.37          0.02      12.73
              Potatoes     2.09 (1.79)   1.54 (1.53)   0.09      18.55 (5.20)
              Sugar beet   6.99 (1.74)   2.34 (1.66)   0.47      23.47 (3.53)
              Wheat        0.69          0.67          0.26       1.40
              All          2.95 (2.54)   1.56 (1.43)   0.02      23.47 (13.58)

For this, the entire training sample (n=450) was ordered by the distance of each training pattern to the centroid of its actual class of membership. This was then divided into two halves. One set contained the patterns with a small distance to their actual class of membership, which are typical of their class and likely to be ignored in training set refinement. Conversely, the other set contained those patterns for which the distance to the actual class was high and which may be considered to be relatively atypical of their class (table 4). The training patterns in this set would thus be the focus of training set refinement procedures, with some perhaps removed or revised. In terms of the accuracy with which the training set was characterized, the final error in learning was lower when the low- rather than the high-distance training set was used. Indeed, the error was similar to that derived from the use of the core training pattern set (tables 1 and 5). This is not surprising as both effectively comprise training patterns that lie near the class centroids. In terms of the accuracy with which the test set could be classified, a higher accuracy was derived from the network trained with the low-distance training set than with the high-distance training set (table 5). While this indicates that the exclusion of atypical patterns increases the classification accuracy, it is apparent that a significantly higher accuracy was observed with the use of the border training set (table 1). Accepting that training pattern refinement can be valuable in a neural network classification, it is, therefore, important that approaches that recognize the significance of border patterns are used. Clearly, basic procedures such as those based directly on the Mahalanobis distance to a class or the typicality of class membership are inappropriate as they are defined in relation to a single class. Although border patterns are difficult to characterize, it is apparent that their definition must involve more than one class. The scale of borderness used here was effectively based upon the typicality of membership to two classes and could be used to identify training patterns capable of facilitating an accurate classification of the test set.
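The low/high-distance split differs from the earlier borderness split only in its sort key: the distance of each pattern to its own class centroid rather than the difference between the two smallest distances. A minimal sketch, reusing the hypothetical mahalanobis_sq helper defined earlier:

```python
import numpy as np

def split_by_own_class_distance(X, y):
    """Order patterns by Mahalanobis distance to their actual class centroid
    and split into 'low-distance' (typical) and 'high-distance' (atypical) halves."""
    own = np.empty(len(X))
    for c in np.unique(y):
        mask = y == c
        own[mask] = mahalanobis_sq(X[mask], X[mask].mean(axis=0),
                                   np.cov(X[mask], rowvar=False))
    order = np.argsort(own)
    half = len(X) // 2
    return order[:half], order[half:]  # low-distance set, high-distance set
```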

Table 4. Summary statistics of the training sets defined by the Mahalanobis distance to the actual class of membership.

                                    Mahalanobis distance to actual class
Training set   Number of patterns   Mean   Median   Minimum   Maximum

Low distance   225                  0.87   0.79     0.02       1.78
High distance  225                  5.04   3.57     1.78      23.47

Table 5. Accuracy of network training and generalization for the networks trained with the training sets defined by the Mahalanobis distance to the actual class of membership (table 4).

                                                          Generalization accuracy
Training set   Training accuracy (final learning error)   Percentage correct   Kn

Low distance   0.0005                                      77.33               0.728
High distance  0.1495                                      73.66               0.684

5. Summary and conclusions

The main results from a series of classifications were as follows.

(1) The final error in training a neural network was substantially lower for the network trained with the set of core training patterns than for that trained with the set of border training patterns.

(2) The network trained with border patterns classified an independent test set to a significantly higher accuracy than did the network trained with core patterns.

(3) Training patterns differ in their value and information content. Overall, border training patterns appear to present a difficult problem in training but aid the accuracy of network generalization. In the core training set it was apparent that some patterns were problematic and their removal aided network learning. Additionally, variation of the number of core patterns used, over the range studied, had little effect on the accuracy of network generalization.

(4) The combination of core and border training patterns appeared to make use of the different information contents of the data. The highest accuracies of the classifications of the test set were, however, observed for networks in which the training set contained border training patterns.

(5) Border training patterns are generally more atypical of the actual class of membership than are core training patterns. Unfortunately, this makes them stronger candidates for removal or revision in conventional training set refinement operations.

The degree of borderness of a training set appears to have a marked effect on the accuracy of a neural network classification. A training set comprising border patterns presents the network with the relatively difficult task of fitting a decision boundary through a set of data in which the classes appear relatively indistinct. These training patterns, however, provide important information with which to guide the location of the classification decision boundary and aid the derivation of an accurate classification of an unseen test set. Conversely, with a set of core training patterns it is relatively easy to discriminate between the classes in training the network, but the classification decision boundary fitted is less well located than that derived with the aid of the border patterns. As a result, a network trained with a set of border training patterns may be expected to have a lower accuracy of learning but a higher accuracy in generalization than one trained with a set of core training patterns.

While training sets containing core training patterns (i.e. the core and combined training sets) consistently provided a high accuracy in learning, the highest accuracies in generalization were derived from networks trained with at least some border training patterns (i.e. the border and combined training sets). It is clear, therefore, that training patterns vary in their value in a classification. The character of a training set, not just its size, must therefore be considered. In addition to the issues discussed in this paper there are other important implications of the results. For instance, defining network architecture with rules based on the number of training patterns may be inappropriate in some circumstances; the composition of the training set and the nature of the classes will need consideration. Although the optimal composition of a training set was not defined, it is apparent that the presence of border patterns substantially influences the accuracy of a neural network classification. Since, by definition, these patterns generally lie outside the cores of the classes, they may be less favoured in training site selection, which has traditionally been biased towards exemplar sites. Furthermore, border training patterns may be expected to display characteristics that identify them as candidates for deletion or revision in the course of a training set refinement operation. They may, for instance, be expected to lie at some distance from the core of their class and have a high degree of membership to another class. The application of such refinement procedures could therefore, paradoxically, remove informative training patterns while retaining less informative ones.

Acknowledgments

The datasets used were acquired as part of the AgriSAR campaign sponsored by the Commission of the European Communities. The neural network analyses were undertaken with the NCS NeuralDesk package. I am grateful for comments on the research arising from conference presentations and for suggestions for improvements to the paper from the two referees.

References

Ahmad, S., and Tesauro, G., 1989, Scaling and generalisation in neural networks: a case study. In Proceedings 1988 Connectionist Models Summer School, edited by D. Touretzky, G. Hinton and T. Sejnowski (San Mateo: Morgan Kaufmann), pp. 3-10.

Arai, K., 1992, A supervised Thematic Mapper classification with a purification of training samples. International Journal of Remote Sensing, 13, 2039-2049.

Atkinson, P. M., and Tatnall, A. R. L., 1997, Neural networks in remote sensing. International Journal of Remote Sensing, 18, 699-709.

Battiti, R., 1994, Using mutual information for selecting features in supervised neural net learning. IEEE Transactions on Neural Networks, 5, 537-550.

Baum, E. B., and Haussler, D., 1989, What size net gives valid generalisation? In Advances in Neural Information Processing Systems, volume 1, edited by D. S. Touretzky (San Mateo: Morgan Kaufmann), pp. 81-90.

Benediktsson, J. A., and Sveinsson, J. R., 1997, Feature extraction for neural network classifiers. In Neurocomputation in Remote Sensing Data Analysis, edited by I. Kanellopoulos, G. G. Wilkinson, F. Roli and J. Austin (Berlin: Springer-Verlag), pp. 97-104.

Bishop, C. M., 1995, Neural Networks for Pattern Recognition (Oxford: Oxford University Press).

Blamire, P. A., 1996, The influence of relative sample size in training artificial neural networks. International Journal of Remote Sensing, 17, 223-230.

Bolstad, P. V., and Lillesand, T. M., 1991, Semi-automated training approaches for spectral class definition. International Journal of Remote Sensing, 13, 3157-3166.

Brennan, R. L., and Prediger, D. J., 1981, Coefficient kappa: some uses, misuses and alternatives. Educational and Psychological Measurement, 41, 687-699.

Buttner, G., Hajos, T., and Korandi, M., 1989, Improvements to the effectiveness of supervised training procedures. International Journal of Remote Sensing, 10, 1005-1013.

Chang, E. I., and Lippmann, R. P., 1991, Using genetic algorithms to improve pattern classification performance. In Advances in Neural Information Processing Systems, volume 3, edited by R. P. Lippmann, J. E. Moody and D. S. Touretzky (San Mateo: Morgan Kaufmann), pp. 797-803.

Dawson, M. S., Fung, A. K., and Manry, M. T., 1993, Surface parameter retrieval using fast learning neural networks. Remote Sensing Reviews, 7, 1-18.

Day, C., 1997, Remote sensing applications which may be addressed by neural networks using parallel processing technology. In Neurocomputation in Remote Sensing Data Analysis, edited by I. Kanellopoulos, G. G. Wilkinson, F. Roli and J. Austin (Berlin: Springer-Verlag), pp. 262-278.

Dreyer, P., 1993, Classification of land cover using optimized neural nets on SPOT data. Photogrammetric Engineering and Remote Sensing, 59, 617-621.

Ediriwickrema, J., and Khorram, S., 1997, Hierarchical maximum-likelihood classification for improved accuracies. IEEE Transactions on Geoscience and Remote Sensing, 35, 810-816.

Fardanesh, M. T., and Ersoy, O. K., 1998, Classification accuracy improvement of neural network classifiers by using unlabeled data. IEEE Transactions on Geoscience and Remote Sensing, 36, 1020-1025.

Foody, G. M., 1992, On the compensation for chance agreement in image classification accuracy assessment. Photogrammetric Engineering and Remote Sensing, 58, 1459-1460.

Foody, G. M., 1998, The effect of border training patterns on the accuracy of neural and statistical image classifications. In Developing International Connections (Nottingham: Remote Sensing Society), pp. 589-595.

Foody, G. M., and Arora, M. K., 1997, An evaluation of some factors affecting the accuracy of classification by an artificial neural network. International Journal of Remote Sensing, 18, 799-810.

Foody, G. M., McCulloch, M. B., and Yates, W. B., 1995, The effect of training set size and composition on artificial neural network classification. International Journal of Remote Sensing, 16, 1707-1723.

Jiang, X., Chen, M.-S., Manry, M. T., Dawson, M. S., and Fung, A. K., 1994, Analysis and optimisation of neural networks for remote sensing. Remote Sensing Reviews, 9, 97-114.

Lee, C., and Landgrebe, D. A., 1997, Decision boundary feature extraction for neural networks. IEEE Transactions on Neural Networks, 8, 75-83.

Ma, Z., and Redmond, R. L., 1995, Tau coefficients for accuracy assessment of classification of remote sensing data. Photogrammetric Engineering and Remote Sensing, 61, 435-439.

Manry, M. T., Dawson, M. S., Fung, A. K., Apollo, S. J., Allen, L. S., and Lyle, W. D., 1994, Fast training of neural networks for remote sensing. Remote Sensing Reviews, 9, 77-96.

Mather, P. M., 1999, Computer Processing of Remotely-Sensed Images, second edition (Chichester: Wiley).

McKay, R. J., and Campbell, N. A., 1982, Variable selection techniques in discriminant analysis II: Allocation. British Journal of Mathematical and Statistical Psychology, 35, 30-41.

Staufer, P., and Fischer, M. M., 1997, Spectral pattern recognition by a two-layer perceptron: effects of training set size. In Neurocomputation in Remote Sensing Data Analysis, edited by I. Kanellopoulos, G. G. Wilkinson, F. Roli and J. Austin (Berlin: Springer-Verlag), pp. 105-116.

Wilkinson, G. G., 1997, Open questions in neurocomputing for Earth observation. In Neurocomputation in Remote Sensing Data Analysis, edited by I. Kanellopoulos, G. G. Wilkinson, F. Roli and J. Austin (Berlin: Springer-Verlag), pp. 3-13.

Zhuang, X., Engel, B. A., Lozano-Garcia, D. F., Fernandez, R. N., and Johannsen, C. J., 1994, Optimisation of training data required for neuro-classification. International Journal of Remote Sensing, 15, 3271-3277.
