Pedometrics2013 book of abstracts 14

Using the Multi-layer Perceptron Model to Predict Soil Classes - Data and Sampling Issues

Fonseca, I.L.; Brasil, R.; Freire, S.; Rocha, J.; Tenedurio, J.A.
Centro de Estudos Geográficos - IGOT, University of Lisbon

Oral Presentation

Artificial Neural Networks (ANNs) have been successfully used in digital soil mapping. The Multi-Layer Perceptron (MLP) architecture has been applied to classification problems in many fields, with varying degrees of accuracy. However, in soil science, and particularly in digital soil mapping, where ANNs can be very useful for predicting the spatial distribution of soil classes in areas lacking soil maps, the robustness of that architecture is not well known. Moreover, it is not clear whether the accuracy of soil class predictions depends on (1) the sampling strategy used, (2) data pre-processing, and (3) location information, or (4) whether any such dependence holds across catchments.

Thus, in addition to geological and land cover information, a set of variables derived from SRTM DEMs was calculated for catchments with published soil maps and used as input data for the MLP model. Different sampling strategies were compared using standardized and non-standardized data, with either up to 10 input variables or up to 12 (the latter adding latitude and longitude), for four catchments: two in northern Portugal (Mondim de Basto and Vila Real), one in central Portugal (Castanheira) and one in Andalusia, in southern Spain (El Almendro). These four catchments are covered by four digital soil maps published by different institutions, with different FAO legend levels and scales (1:100000 in Portugal vs. 1:400000 in Spain).

Results were largely consistent across catchments: randomly sampled training data always outperformed all other sampling strategies. Non-standardized data allowed the ANN to predict soil classes better: accuracy, i.e. the level of agreement between predicted and existing soil maps, was generally higher when the data were not pre-processed. Adding latitude and longitude as input data also generally improved accuracy, because it allowed the ANN to learn how the relative distance between training points determines the soil class. This relates to the concepts of hillslope connectivity, catena and autocorrelation: soils tend to vary across hillslopes, with close neighbouring positions more likely to have similar soils, as determined by local soil-forming characteristics and lateral water and sediment flows.
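
The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation. The function names, the training fraction, the use of z-scores for "standardized" data, and the "stratified" strategy label are all assumptions introduced here; the abstract names only random sampling and does not say how the data were standardized. Accuracy is taken, as in the abstract, to be the share of map cells where the predicted and published soil classes agree.

```python
import random
from itertools import product
from statistics import mean, pstdev

# Catchment names are taken from the abstract.
CATCHMENTS = ["Mondim de Basto", "Vila Real", "Castanheira", "El Almendro"]


def standardize(columns):
    """Z-score each input column (assumed form of 'standardized' data).

    Constant columns are mapped to zeros to avoid dividing by zero.
    """
    out = []
    for col in columns:
        mu, sigma = mean(col), pstdev(col)
        out.append([(v - mu) / sigma if sigma else 0.0 for v in col])
    return out


def random_training_sample(n_cells, fraction, seed=0):
    """Pick a simple random sample of cell indices to use for training."""
    rng = random.Random(seed)
    k = max(1, round(n_cells * fraction))
    return sorted(rng.sample(range(n_cells), k))


def map_agreement(predicted, reference):
    """Fraction of cells whose predicted class matches the existing map."""
    if len(predicted) != len(reference):
        raise ValueError("maps must cover the same cells")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


def experiment_grid(sampling_strategies):
    """Enumerate every catchment x sampling x pre-processing x coordinates run."""
    return [
        {
            "catchment": catchment,
            "sampling": strategy,
            "standardized": standardized,
            # 10 inputs without coordinates, 12 with latitude and longitude.
            "n_inputs": 12 if with_coordinates else 10,
        }
        for catchment, strategy, standardized, with_coordinates in product(
            CATCHMENTS, sampling_strategies, (False, True), (False, True)
        )
    ]


if __name__ == "__main__":
    # "stratified" is a hypothetical stand-in for the unnamed alternatives.
    runs = experiment_grid(["random", "stratified"])
    print(len(runs), "configurations")
    print(map_agreement(["a", "b", "c", "d"], ["a", "b", "x", "d"]))
```

Each configuration in the grid would then be passed to an MLP classifier of the reader's choice, with `map_agreement` computed against the published soil map for that catchment.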
