a comparison of machine learning algorithms: the effects of

Advisor: Dr. John Rogan

A comparison of Machine Learning Algorithms:A comparison of Machine Learning Algorithms:The Effects of Classification Scheme Detail on Map AccuracyThe Effects of Classification Scheme Detail on Map Accuracy

Sponsors:

2

4

5

7

3

B2 > 0.5B2 < 0.5

DEM > 200

B3 > 0.1

B3 < 0.1

B4 < 0.7 B4 < 0.7

B5 > 0.1

DEM < 200

B5 < 0.1

F1 Layer Neurons

Vigilance parameter

F2 Layer Neurons

ARTb

ARTa

Joe Fortier Clark University Worcester, MA

Multi-Layer Perceptron (MLP)

Classification Tree Algorithm (CTA)

Self-Organizing Map (SOM)

Fuzzy ARTMAP

AbstractAbstractRemote sensing research has benefited from the introduction of machine learning algorithms for specific land-cover and land-use mapping. As research into machine learning tools grows, knowing which algorithm is most appropriate for a given application can be difficult. Previous studies have compared machine learning classifiers to assess land-cover map accuracies, but little is known about the impact of thematic detail on algorithm performance. In this study Landsat ETM+ and environmental GIS data are used to compare classification speeds and output accuracies between a Classification Tree Algorithm, Fuzzy ARTMAP, Self Organizing Feature Map, and a Multi-layer Perceptron at aggregated levels of classification scheme detail, over a subset of Massachusetts, USA. The results of this study demonstrate that while the four algorithms produce statistically similar accuracies at three levels of thematic detail, the classification tree algorithm outperforms other classifiers in speed and integration of multi-source data.

Accuracy Assessment

Level I Level II Level III

• Spectral 07/07/1999

• Spectral 07/07/1999

• Spectral 9/27/2000

• Spectral 07/07/1999

• Spectral 9/27/2000

• Ancillary data & texture

Inpu

t Dat

a U

sed

CTALand-cover Maps

SOM

Fuzzy ARTMAPMLP

Classification

Ortho-Photo Interpretation

Level II

ETM+ 7/7/1999Bands 1-5 and 7

70%

30%

Thematic Detail Comparison

Environmental Variables Level I

ETM+ 9/27/2000Bands 1-5 and 7

Usefulness of Multi-seasonal

imageryUsefulness of Environmental

variables

Classification Time Comparison

IntroductionIntroductionThe usefulness of satellite data for land-cover mapping projects has been well researched and documented in remote sensing literature since the early 1970’s. As the capabilities of remote sensing based mapping and monitoring programs improve, natural resource managers and researchers are taking on more ambitious programs than previously possible. The result are land-cover mapping programs that attempt to characterize large spatial domains (biomes and countries) with medium spatial scale data (e.g., 30 m) (Franklin and Wulder, 2002). The ideal classification algorithms for such activities are ones that can handle multi-source, voluminous datasets while minimizing user input (Rogan et al., 2003).

To this appeal, machine learning algorithms have commonly outperformed conventional classifiers (e.g., Maximum Likelihood) in their use over the past 15 years (Gopal et al., 1999). Machine learning refers to a variety of algorithms, based on artificial intelligence, that are able to recognize data patterns through repeated learning techniques without prior data distribution assumptions. However, while machine learning classifiers are known to address the shortcomings of traditional, parametric algorithms, comprehensive comparative studies are still required to assess their usefulness when compared with each other since studies, to date, provide no clear evidence that any one algorithm outperforms the others (DeFries and Chan, 2000).

This objective of this study was to compare four machine Learning algorithms in their ability to accurately map land-cover at three levels of categorical detail. Additional comparison was performed testing algorithm speed and the usefulness of multi-temporal and environmental GIS variables.

MassachusettsStudy Area

Worcester

Boston

MethodologyMethodology

ConclusionConclusionThe number of map categories did not prove to be a distinguishing characteristic in comparing classifiers since the four algorithms produced statistically similar land-cover maps at all three levels of thematic detail. The classification tree proved the best classifier to meet the needs of large-area land-cover mapping programs based on its speed and ability to integrate multi-source data without requiring any pre-classification variable testing to weed out detrimental variables.

Level III1. Orchards2. Pasture/Row Crop3. Cranberry Bogs4. Broadleaf Forest5. Conifer Forest6. Mixed Forest7. Golf Course8. Grassland 9. Commercial/ Industrial10.H. Density Residential11.L. Density Residential12.Deep Water13.Shallow Water14.Non-Forested Wetland15.Freshwater Marsh16.Saltwater Marsh17.Sand Quarry18.Bare Soil

Level II1. Agriculture2. Broadleaf Forest3. Conifer Forest4. Grassland 5. Developed H. Intensity6. Developed L. Intensity7. Open Water8. Freshwater Wetlands9. Saltwater Wetlands10.Barren

Level I1. Agriculture2. Forest Land3. Grasslands 4. Developed5. Water6. Barren

ResultsResults

Input DataInput DataLandLand--cover Categoriescover Categories Multi-source Data:

In 80% of the trials the inclusion of multi-temporal spectral imagery produced significantly higher accuracies at the 95% confidence level; increasing accuracies an average of 11.3% at Level I, 11.6% at Level II, and 15.5% at Level III. The inclusion of environmental variables had lesser effect on accuracy, averaging increases of 0.2% at Level I, 3.2% at Level II, and 1.7% at Level III.

Study AreaStudy AreaEncompasses the portion of the state of Massachusetts, USA that falls within WRS-2 Path/Row 12/31

Machine Learning AlgorithmsMachine Learning Algorithms

FuzzyARTMAP

SOMMLP

AgricultureForestGrasslandDevelopedWaterBarren

Detrimental Variables:

The images below highlight how MLP, Fuzzy ARTMAP, and SOM were negatively affected by the inclusion of detrimental variables. The TWI, Southwestness and Texture images proved to be detrimental variables in the case of this study. CTA was not affected by the inclusion of these images in either overall accuracy or visual correctness of classification.

Thematic Detail:

No significant difference in kappa accuracies were produced between classifiers as the level of thematic detail increased. Though not significantly lower, SOM produced the lowest accuracy maps in the majority of trials.

Fuzzy ARTMAP performed the slowest averaging greater than 10 hours perclassification. In all cases increasing the amount of input data or thematic detail required longer classification times.

Level III

SOMMLP CTA

Overall Kappa Values: Level II

0.50.550.6

0.650.7

0.750.8

0.850.9

spectral July spectral Julyand Sept.

spectral Julyand Sept.and GISvariables

Kap

pa

MLPFuzzy ARTMAPSOMCTA

Overall Kappa Values: Level III

0.50.550.6

0.650.7

0.750.8

0.850.9



Kap

pa


Overall Kappa Values: Level I

0.50.550.6

0.650.7

0.750.8

0.850.9



Kap

pa


Average Running Time

0.825

10.45

65

4.442

5

0.145

5

0

2

4

6

8

10

12

MLP Fuzzy ARTMAP SOM CTA

hour

s

Classification Time:

This graph displays the average time taken by each algorithm to classify the image. CTA performed the fastest in an average of 8 minutes, while

ReferencesDeFries, R.S. and J.C-W. Chan. (2000), Multiple Criteria for Evaluating Machine Learning Algorithms for Land Cover Classification from Satellite Data.Remote Sensing of Environment. 74: 503-515

Franklin, S.E. and M. Wulder. (2002), Remote sensing methods in medium spatial resolution satellite data land cover classification of large areas. Progress in Physical Geography. 26(2): 173-205

Gopal, Sucharita, Curtis E. Woodcock, and Aland H. Strahler. (1999), Fuzzy Neural Network Classification of Global Land Cover from a 1° AVHRR Data Set. Remote Sensing of Environment. 67: 230-243

Rogan, J., J. Miller, D. Stow, J. Franklin, L. Levien, and C. Fischer. (2003), Land-Cover Change Monitoring with Classification Trees Using Landsat TM and Ancillary Data. Photogrammetric Engineering and Remote Sensing. 69( 7): 793–804.

7/7/1999 Spectral 9/27/2000 Spectral

Ancillary

Spectral

PrecipitationDEM Slope

TWISouthwestness Texture

Level I

Level I

a comparison of machine learning algorithms: the effects of

Documents