# spatial interpolation comparison - evaluation of spatial ...spatial- interpolation comparison...

Post on 19-May-2018

212 views

Embed Size (px)

TRANSCRIPT

Spatial Interpolation ComparisonEvaluation of spatial prediction methods

Tomislav Hengl

ISRIC World Soil Information, Wageningen University

Geostatistics course, 2529 October 2010, Wageningen

Topic

I Geostatistics = a toolbox to generate maps from point datai.e.to interpolate;

I There are many possibilities;

I An inexperienced user will often be challenged by the amountof techniques to run spatial interpolation;

I . . .which method should we use?

Geostatistics course, 2529 October 2010, Wageningen

Topic

I Geostatistics = a toolbox to generate maps from point datai.e.to interpolate;

I There are many possibilities;

I An inexperienced user will often be challenged by the amountof techniques to run spatial interpolation;

I . . .which method should we use?

Geostatistics course, 2529 October 2010, Wageningen

Topic

I Geostatistics = a toolbox to generate maps from point datai.e.to interpolate;

I There are many possibilities;

I An inexperienced user will often be challenged by the amountof techniques to run spatial interpolation;

I . . .which method should we use?

Geostatistics course, 2529 October 2010, Wageningen

Topic

I Geostatistics = a toolbox to generate maps from point datai.e.to interpolate;

I There are many possibilities;

I . . .which method should we use?

Geostatistics course, 2529 October 2010, Wageningen

Have you heard of SIC?

Geostatistics course, 2529 October 2010, Wageningen

http://bookshop.europa.eu/uri?target=EUB:NOTICE:LBNA21595:EN:HTML

The spatial prediction game

Participants were invited to estimate values located at 1000locations (right, crosses), using 200 observations (left, circles).

Geostatistics course, 2529 October 2010, Wageningen

Lessons learned (from SIC)

Geostatistics course, 2529 October 2010, Wageningen

Li and Heap (2008)

Geostatistics course, 2529 October 2010, Wageningen

http://www.ga.gov.au/image_cache/GA12526.pdf

How many techniques are there?

Li and Heap (2008) list over 40 unique techniques.

1. Are all these equally valid?

2. How to objectively compare various methods (which criteriato use)?

3. Which method to pick for your own case study?

Geostatistics course, 2529 October 2010, Wageningen

http://www.ga.gov.au/image_cache/GA12526.pdf

There are not as many

There are roughly four main clusters of techniques:

1. splines (deterministic);

2. kriging-based (plain geostatistics);

3. regression-based;

4. expert systems / machine learning;

Geostatistics course, 2529 October 2010, Wageningen

There are not as many

There are roughly four main clusters of techniques:

1. splines (deterministic);

2. kriging-based (plain geostatistics);

3. regression-based;

4. expert systems / machine learning;

Geostatistics course, 2529 October 2010, Wageningen

There are not as many

There are roughly four main clusters of techniques:

1. splines (deterministic);

2. kriging-based (plain geostatistics);

3. regression-based;

4. expert systems / machine learning;

Geostatistics course, 2529 October 2010, Wageningen

There are not as many

There are roughly four main clusters of techniques:

1. splines (deterministic);

2. kriging-based (plain geostatistics);

3. regression-based;

4. expert systems / machine learning;

Geostatistics course, 2529 October 2010, Wageningen

The 5 criteria

1. the overall mapping accuracy, e.g.standardized RMSE atcontrol points the amount of variation explained by thepredictor expressed in %;

2. the bias, e.g.mean error the accuracy of estimating thecentral population parameters;

3. the model robustness, also known as model sensitivity inhow many situations would the algorithm completely fail /how much artifacts does it produces?;

4. the model reliability how good is the model in estimatingthe prediction error (how accurate is the prediction varianceconsidering the true mapping accuracy)?;

5. the computational burden the time needed to completepredictions;

Geostatistics course, 2529 October 2010, Wageningen

The 5 criteria

1. the overall mapping accuracy, e.g.standardized RMSE atcontrol points the amount of variation explained by thepredictor expressed in %;

2. the bias, e.g.mean error the accuracy of estimating thecentral population parameters;

3. the model robustness, also known as model sensitivity inhow many situations would the algorithm completely fail /how much artifacts does it produces?;

4. the model reliability how good is the model in estimatingthe prediction error (how accurate is the prediction varianceconsidering the true mapping accuracy)?;

5. the computational burden the time needed to completepredictions;

Geostatistics course, 2529 October 2010, Wageningen

The 5 criteria

1. the overall mapping accuracy, e.g.standardized RMSE atcontrol points the amount of variation explained by thepredictor expressed in %;

2. the bias, e.g.mean error the accuracy of estimating thecentral population parameters;

3. the model robustness, also known as model sensitivity inhow many situations would the algorithm completely fail /how much artifacts does it produces?;

4. the model reliability how good is the model in estimatingthe prediction error (how accurate is the prediction varianceconsidering the true mapping accuracy)?;

5. the computational burden the time needed to completepredictions;

Geostatistics course, 2529 October 2010, Wageningen

The 5 criteria

2. the bias, e.g.mean error the accuracy of estimating thecentral population parameters;

5. the computational burden the time needed to completepredictions;

Geostatistics course, 2529 October 2010, Wageningen

The 5 criteria

2. the bias, e.g.mean error the accuracy of estimating thecentral population parameters;

5. the computational burden the time needed to completepredictions;

Geostatistics course, 2529 October 2010, Wageningen

Can we simplify this?

1. In theory, we could derive a single composite measure thatwould then allow you to select the optimal predictor for anygiven data set (but this is not trivial!)

2. But how to assign weights to different criteria?

3. In many cases we simply finish using some nave predictor that is predictor that we know has a statistically more optimalalternative, but this alternative is not feasible.

Geostatistics course, 2529 October 2010, Wageningen

Automated mapping

In the intamap package1 decides which method to pick for you:

> meuse$value output

Hypothesis

We need a single criteria to compare various prediction methods.

Geostatistics course, 2529 October 2010, Wageningen

Mapping accuracy and survey costs

The cost of a soil survey is also a function of mapping scale,roughly:

log(X) = b0 + b1 log(SN) (1)

We can fit a linear model to the empirical table data frome.g.Legros (2006; p.75), and hence we get:

X = exp (19.0825 1.6232 log(SN)) (2)

where X is the minimum cost/ha in Euros (based on estimates in2002). To map 1 ha of soil at 1:100,000 scale, for example, oneneeds (at least) 1.5 Euros.

Geostatistics course, 2529 October 2010, Wageningen

Survey costs and mapping scale

9.5 10.0 10.5 11.0 11.5 12.0 12.5

1

01

23

Scale number (logscale)

Min

imum

sur

vey

cost

s in

EU

R /

ha (

log

scal

e)

Geostatistics course, 2529 October 2010, Wageningen

Survey costs and mapping scale

Total costs of a soil survey can be estimated by using the size ofarea and number of samples.The effective scale number (SN) is:

SN =

4 A

N 102 . . . SN =

A

N 102 (3)

where A is the surface of the study area in m2 and N is the totalnumber of observations.

Geostatistics course, 2529 October 2010, Wageningen

Converges to:

X = exp

(19.0825 1.6232 log

[0.0791

A

N 102

])(4)

Geostatistics course, 2529 October 2010, Wageningen

Output map, from info perspective

The resulting (predictions) map is a sum of two signals:

Z (s) = Z (s) + (s) (5)

where Z (s) is the true variation, and (s) is the error component.Th

Recommended