A Practical Guide to Geostatistical Mapping Tomislav Hengl ISRIC — World Soil Information, Wageningen University GEOSTAT course, 11-17 April 2011, Canberra


Page 1: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

A Practical Guide to Geostatistical Mapping

Tomislav Hengl

ISRIC — World Soil Information, Wageningen University

GEOSTAT course, 11-17 April 2011, Canberra

Page 2: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Topics

* spatio-temporal data: elements, aspects, formats
* data import (GDAL) and visual exploration
* geographic data, maps, cartographic projection systems (proj4)
* Google Earth: the final GIS?
* spatio-temporal statistics, basics:
  1. spatial prediction / automated mapping
  2. kriging, regression, regression-kriging
  3. some applications

Page 3: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Today, everybody is a spatial analyst!

* We have the tools that allow GIS + statistics integration
* There is more and more auxiliary data:
  1. MODIS (global coverage, 250 m, every 2 days, 36 bands)
  2. Meteorological images (e.g. SEVIRI; 1 km, every 15 min, 12 bands)
  3. SRTM DEM, GDEM, LiDAR (topography, 30–100 m)
* We can automate data analysis (“Get results sooner, with more accuracy... and retire sooner”, Chih Jeng Kenneth Tan)
* Google Earth (GE) has registered more than 350 million downloads!

Page 4: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

GIS analysis for all

“From a period in which geographic information systems, and later geocomputation and geographical information science, have been agenda setters, there seems to be interest in trying things out, in expressing ideas in code, and in encouraging others to apply the coded functions in teaching and applied research settings.”

Roger Bivand

Page 5: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

The missing link

* Our projects typically depend on both statistical and GIS analysis
* Some believe that this could all be done within R
* Others believe that this could all be done within commercial packages (ArcGIS)
* ... and the winner is:
  1. R: scripting, statistical computing
  2. SAGA/GRASS: GIS data input and geographical analysis
  3. Google Earth: storage, sharing, browsing

Page 6: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Basic concepts

* Models: statistical model (conceptual); data models (formats); model parameters;
* Methods (functions): implemented as algorithms; inputs, outputs, arguments;
* Data: variables (target variables, auxiliary variables or predictors); metadata; geoinformation;
* Applications: field-specific; result interpretation; associated uncertainty;

Page 7: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

What is spatio-temporal statistics about?

Spatio-temporal statistics: statistical techniques adjusted to handle spatio-temporal data.

Geostatistics is a subset of statistics specialized in the analysis and interpretation of geographically (and temporally) referenced data.

Geostatistics is an analytical tool for statistical analysis of sampled field data.

The bottom line: you collect (spatio-temporal) data and you need tools that can help you answer field-specific questions (i.e. that can help you produce outputs of interest: maps, predictions, statistical measures).

Page 8: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Geostatistics — topics

Typical questions of interest to a geostatistician are:

* how does a variable vary in space?
* what controls its variation in space?
* where to locate samples to describe its spatial variability?
* how many samples are needed to represent its spatial variability?
* what is the value of a variable at some new location?
* what is the uncertainty of the estimate?

Page 9: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Analysis objectives

For Diggle and Ribeiro (2007) there are three scientific objectives of geostatistics:

1. model estimation, i.e. inference about the model parameters;
2. prediction, i.e. inference about the unobserved values of the target variable;
3. hypothesis testing;

Page 10: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Environmental variables

Quantitative or descriptive measures of different environmental features.

* biology (distribution of species and biodiversity measures)
* soil science (soil properties and types)
* vegetation science (plant species and communities, land cover types)
* climatology (climatic variables at the surface and beneath/above it)
* hydrology (water quantities and conditions)

Page 11: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Example


Page 12: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Variables

* Name, definition

* Feature of interest

* Measurement units

* Representation, data model, domain

* Spatio-temporal pattern

* Application, decision making process, data interpretation

Page 13: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Example: pH

* environmental feature: acidity in soil
* variable of interest: pH
* units: pH units, i.e. the negative base-10 logarithm of the concentration of H+ ions in the soil solution
* sampling technique: pH meter (field or laboratory); soil solution
* targeted output: a map of continuous values of concentration (continuous fields)
* interpretation: values of pH define properties of soil (acid, neutral, alkaline soils)
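The relation between H+ concentration and pH can be sketched in a few lines of base R. The concentrations and the class breaks (6.5 and 7.5) below are illustrative assumptions, not values from the course.

```r
# Illustrative H+ concentrations in mol/L (made-up values)
h_conc <- c(1e-4, 1e-7, 1e-9)

# pH is the negative base-10 logarithm of the H+ concentration
pH <- -log10(h_conc)

# Assumed class breaks, used only to illustrate the interpretation step
soil_class <- cut(pH, breaks = c(0, 6.5, 7.5, 14),
                  labels = c("acid", "neutral", "alkaline"))
data.frame(h_conc, pH, soil_class)
```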

Page 14: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Spatial variability

Commonly a result of complex processes working at the same time and over long periods of time, rather than an effect of a single realization of a single factor.

Sum of two components: (a) the natural spatial variation and (b) the inherent noise.

* Geographical variation (2D)
* Vertical variation (3D)
* Temporal variation
* Variation at different scales (support size)

Page 15: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

A way to classify variables

1. SRV — short-range variability

2. TV — temporal variability

3. VV — vertical variability

4. SSD — standard sampling density

5. DRS — remote-sensing detectability

Other important issues: (6) sampling costs, (7) global or local coverage, (8) relationship with other variables, (9) scalability

Page 16: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

What you need to know!

Each 2D map of an environmental variable should always indicate a time reference (interval), the applicable vertical dimension [1] and the sample (support) size, i.e. the effective scale.

It is also important to know: the approximate geographical coordinates of the study area (gravity point), the borders of the area of interest (mask), the coordinate system (proj4 string), and who made the map and how.

[1] Orthogonal distance from the land surface.

Page 17: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Look again at these maps


Page 18: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Nature of variables

From a metaphysical perspective, what we are most often mapping in geostatistics are, in fact, quantities of molecules of a certain kind or quantities of energy.

Many variables directly refer to processes and are expressed in quantity per time unit, e.g. mm of rainfall per year.

In ecology, the objects of interest are individual plants or animals, which are often immeasurable in quantity: animal species change their location dynamically, often in unpredictable directions and with unpredictable spatial patterns (non-linear trajectories). The data are then occurrence records (0/1 observations), which are modeled using statistical probability theory.

Page 19: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Data models/formats

Data format is the way we define the structure and elements of a record of some variable/feature.

Data format dictates many things: the way we edit (reading/writing), search, compute, transform or scale data.

Data formats are software-specific: everybody has a different idea about how to represent data digitally.

Page 20: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Data formats in R

R classes — what type of object is it?

* numeric: array of numbers (vector/matrix);
* data.frame: the fundamental structure for statistical analysis, a matrix-like table with named columns (roughly, database fields) and (optionally) named rows (roughly, database cases);
* models/formulas: complex hierarchical structure (a set of lists, vectors, data frames);

Common classes in R: numeric, character (string), integer, factor, Date-Time (POSIXct/POSIXlt) etc.
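A minimal base R sketch of how to ask "what type of object is it?" follows. All object names and values are made up for illustration.

```r
# Illustrative objects; names and values are assumptions for this sketch
x <- c(5.2, 6.1, 7.4)                               # numeric vector
landuse <- factor(c("grass", "forest", "grass"))    # categorical variable
obs <- data.frame(id = 1:3, pH = x, landuse = landuse)
t0 <- as.POSIXct("2011-04-14 03:15:00", tz = "GMT") # date-time
fit <- lm(pH ~ id, data = obs)                      # a fitted model object

class(x)        # "numeric"
class(obs$id)   # "integer"
class(landuse)  # "factor"
class(obs)      # "data.frame"
class(t0)       # "POSIXct" "POSIXt"
class(fit)      # "lm"
is.list(fit)    # TRUE: a model is a hierarchical list of components
```

Calling `str()` on any of these objects shows the full structure, which is how the gridded example on the following slide was printed.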

Page 21: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Spatial data in R

R has special classes for spatial data: spatial points, point patterns, pixels, lines, grids, CRS etc.; many existing spatial statistics packages work with these classes;

http://www.r-project.org/Rgeo/

Pebesma, E.J., Bivand, R.S., 2005. Classes and methods for spatial data in R. R News 5/2, 9–13.
Bivand, R.S., Pebesma, E.J., Gomez-Rubio, V., 2008. Applied Spatial Data Analysis with R. UseR! Series, Springer, 378 pp.

Page 22: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Example of gridded data in R

Formal class 'SpatialGridDataFrame' [package "sp"] with 6 slots
..@ data :'data.frame': 5530 obs. of 1 variables:
.. ..$ lgn3: int [1:5530] 1 1 1 1 1 1 1 22 22 22 ...
..@ grid :Formal class 'GridTopology' [package "sp"] with 3 slots
.. .. ..@ cellcentre.offset: Named num [1:2] 190163 314013
.. .. .. ..- attr(*, "names")= chr [1:2] "x" "y"
.. .. ..@ cellsize : num [1:2] 25 25
.. .. ..@ cells.dim : int [1:2] 70 79
..@ grid.index : int(0)
..@ coords : num [1:2, 1:2] 190163 191888 314013 315963
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:2] "x" "y"
..@ bbox : num [1:2, 1:2] 190150 314000 191900 315975
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:2] "x" "y"
.. .. ..$ : chr [1:2] "min" "max"
..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slots
.. .. ..@ projargs: chr NA

Page 23: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Compare with ArcInfo ASCII

ncols 70
nrows 79
xllcorner 190150
yllcorner 314000
cellsize 25.00
nodata_value 0
1 1 1 1 1 1 1 22 22 22 22 22 22 22 22 22 1 1 1 17 17 17 17 24 17 17 17 17 17 17 17 17 17 17 24 24 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 5 5 5 5 5 5 5 5 6 6 6 2 2 2 2 2
1 1 1 1 1 22 22 22 22 22 22 22 22 22 22 22 1 1 1 17 17 17 24 24 24 17 17 17 17 17 17 17 17 17 24 24 24 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 5 5 5 5 5 5 5 5 5 5 5 5 2 2 2 2 2
...
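The two representations describe the same grid. As a rough check in base R, the ArcInfo ASCII header values can be turned into the cell-centre offset, bounding box and cell count reported by the sp object. Only the header numbers come from the slides; the variable names mirror the sp slot names for readability.

```r
# Header values copied from the ArcInfo ASCII example
ncols <- 70; nrows <- 79
xllcorner <- 190150; yllcorner <- 314000
cellsize <- 25

# Centre of the lower-left cell (the cellcentre.offset slot in sp)
cellcentre.offset <- c(x = xllcorner, y = yllcorner) + cellsize / 2

# Bounding box: lower-left corner plus the full grid extent
bbox <- rbind(x = c(min = xllcorner, max = xllcorner + ncols * cellsize),
              y = c(min = yllcorner, max = yllcorner + nrows * cellsize))

cellcentre.offset   # 190162.5 314012.5 (str() prints these rounded)
bbox                # x: 190150 to 191900; y: 314000 to 315975
ncols * nrows       # 5530, the number of observations in the data slot
```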

Page 24: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Compare with Idrisi raster I

file title : Land cover types from LGN3
data type : byte
file type : binary
columns : 70
rows : 79
ref. system : epsg:28992
ref. units : m
unit dist. : 1.0000000
min. X : 190150.00000
max. X : 191900.00000
min. Y : 314000.00000
max. Y : 315975.00000
pos'n error : unknown
resolution : 25.000000
min. value : 0
max. value : 39
value units : meter
value error : unknown
flag value : 0
flag def'n : missing data
legend cats : 26

Page 25: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Compare with Idrisi raster II

category 0 :
category 1: Agrarisch gras
category 2: Maïs
category 3: Aardappelen
category 4: Bieten
category 5: Granen
category 6: Overige landbouwgewassen
category 7: Glastuinbouw
category 8: Boomgaarden
category 9: Bloembollen
category 10: Loofbos
category 11: Naaldbos
category 12: Droge heide
category 13: Open begroeid natuurgebied
category 14: Kale grond natuurgebied
category 15: Zoet water
...

Page 26: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Land cover map in NL

[Figure: LGN3 land cover classes. Map axes: x from 190500 to 191500, y from 314500 to 315500. Legend classes: 1, 2, 3, 4, 5, 6, 8, 10, 15, 17, 18, 19, 20, 21, 22, 24, 25.]

Page 27: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

What is GDAL/OGR?

Translation of data from one software package to another is now made simple thanks to:

* GDAL [2]: Geospatial Data Abstraction Library
* OGR: OpenGIS Simple Features Reference Implementation

Note: not all software producers support GDAL!

[2] http://www.gdal.org/formats_list.html

Page 28: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

PROJ.4

Geographic data always refers to some reference coordinate system.

PROJ.4: Cartographic Projections Library (this allows you to reproject maps to almost any coordinate system)
http://spatialreference.org

Today, it is much easier to move maps from one projection system to another than it was 10 years ago (reprojection on the fly). You only need to assign the correct proj4string and then you do not have to worry about it any more. Unless you got it wrong: all parameters need to be absolutely correct!

Page 29: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

EPSG

European Petroleum Survey Group (EPSG) database: http://www.epsg-registry.org/

World standard or user-defined coordinate systems; e.g. Amersfoort / RD New (EPSG 28992):

+proj=sterea +lat_0=52.15616055555555 +lon_0=5.38763888888889
+k=0.999908 +x_0=155000 +y_0=463000 +ellps=bessel
+towgs84=565.237,50.0087,465.658,-0.406857,0.350733,-1.87035,4.0812
+units=m +no_defs
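Because every parameter has to be correct, it can help to split a proj4 string into named parameters before comparing it with a reference definition. The following is a sketch in base R using plain string handling; the variable names are hypothetical, and no projection library is called.

```r
# The RD New definition from the slide, written as one string
rd_new <- paste(
  "+proj=sterea +lat_0=52.15616055555555 +lon_0=5.38763888888889",
  "+k=0.999908 +x_0=155000 +y_0=463000 +ellps=bessel",
  "+towgs84=565.237,50.0087,465.658,-0.406857,0.350733,-1.87035,4.0812",
  "+units=m +no_defs")

# Split on whitespace, drop the leading "+", then separate key and value
tokens <- sub("^\\+", "", strsplit(rd_new, "\\s+")[[1]])
keys <- sub("=.*$", "", tokens)
vals <- ifelse(grepl("=", tokens), sub("^[^=]*=", "", tokens), NA)
params <- setNames(vals, keys)

params[["proj"]]                                   # projection name
as.numeric(params[["x_0"]])                        # false easting in metres
length(strsplit(params[["towgs84"]], ",")[[1]])    # number of datum-shift terms
is.na(params[["no_defs"]])                         # flags carry no value
```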

Page 30: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

WGS84

The only truly global reference model of the Earth is the World Geodetic System (WGS84) ellipsoid:

> library(rgdal)  # make_EPSG() is provided by the rgdal package
> EPSG <- make_EPSG()
> EPSG[EPSG$note=="# WGS 84",-2]
code prj4
249 4326 +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs

Our current space-time location:

149.11902 E, -35.28028 N, 14 April, 03:15 GMT

Page 31: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Google Earth


Page 32: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Why not always use Longlat system?

* The aspect ratio of X (longitude) and Y (latitude) coordinates is not 1:1
* Grid cell size is not constant: most algorithms in geomorphometry and geostatistics assume a Cartesian system (e.g. derivation of slope, distances etc.)
* You cannot print and use such maps to determine distances, areas, directions (in a traditional cartographic way)
* Map units are abstract: arcseconds, arcminutes, arcdegrees
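The first point can be made concrete with a short calculation. This sketch assumes a spherical Earth with a mean radius of 6371 km (an approximation introduced here, not a figure from the course) and uses the Canberra latitude quoted on the WGS84 slide.

```r
# Spherical approximation; R_earth is an assumed mean radius in metres
R_earth <- 6371000
deg2rad <- pi / 180

lat <- -35.28028                        # latitude quoted on the WGS84 slide
m_per_deg_lat <- R_earth * deg2rad      # about 111.2 km per degree of latitude
m_per_deg_lon <- m_per_deg_lat * cos(lat * deg2rad)

ratio <- m_per_deg_lon / m_per_deg_lat  # about 0.82 at this latitude
ratio
```

A grid cell of 1 by 1 arcsecond at this latitude is therefore clearly narrower east-west than north-south, which is why distance-based algorithms are better run on a projected grid.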


Page 36: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

From Geographic to Projected coordinates


Page 37: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Grid cell size / scale

see also “Finding the right pixel size”


Page 38: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Impact of grid cell size


Page 39: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Resampling

For example in SAGA GIS:

$module name : Get Grid Data for Shapes
...
-INTERPOL:<num> Interpolation
Choice
Available Choices:
[0] Nearest Neighbor
[1] Bilinear Interpolation
[2] Inverse Distance Interpolation
[3] Bicubic Spline Interpolation
[4] B-Spline Interpolation

Page 40: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Resampling scheme (nearest neighbor and bilinear)
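The difference between the two schemes can be sketched in base R on a tiny grid. The grid values, cell-centre coordinates and function names below are made up for illustration; this is not SAGA's implementation, and the query point is assumed to lie inside the grid.

```r
# Small illustrative grid: row 1 is the southern row here, and cell
# centres sit at x = 0, 1, 2 and y = 0, 1 (all values are made up)
g <- matrix(c(10, 20, 30,
              40, 50, 60), nrow = 2, byrow = TRUE)
xs <- c(0, 1, 2); ys <- c(0, 1)

# Nearest neighbour: take the value of the closest cell centre
nearest <- function(x, y) {
  g[which.min(abs(ys - y)), which.min(abs(xs - x))]
}

# Bilinear: distance-weighted mean of the four surrounding cell centres
bilinear <- function(x, y) {
  j <- min(max(which(xs <= x)), length(xs) - 1)   # left column
  i <- min(max(which(ys <= y)), length(ys) - 1)   # lower row
  tx <- (x - xs[j]) / (xs[j + 1] - xs[j])
  ty <- (y - ys[i]) / (ys[i + 1] - ys[i])
  (1 - tx) * (1 - ty) * g[i, j]     + tx * (1 - ty) * g[i, j + 1] +
  (1 - tx) * ty       * g[i + 1, j] + tx * ty       * g[i + 1, j + 1]
}

nearest(0.4, 0.2)    # 10
bilinear(0.4, 0.2)   # 20
```

Nearest neighbour keeps the original values, so it suits categorical maps such as land cover; bilinear produces new intermediate values and suits continuous variables.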


Page 41: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Spatial prediction techniques

Next topics:

* Spatial prediction: basic principles, classification (mechanical / statistical methods)
* Kriging: semivariance, variogram, ordinary kriging, characteristics of kriging, variants of kriging
* Regression: correlation, prediction error, OLS, GLS, GLMs
* Regression-kriging: the generic spatial prediction model

Page 42: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Books on geostatistics

* Goovaerts, P., 1997. Geostatistics for Natural Resources Evaluation (Applied Geostatistics). Oxford University Press, New York, 496 pp.
* Webster, R. and Oliver, M.A., 2007. Geostatistics for Environmental Scientists. Statistics in Practice. John Wiley & Sons, Chichester, 330 pp.
* Pebesma, E.J., 2003. Gstat User’s manual. University of Utrecht, Utrecht. www.gstat.org
* Rossiter, D.G., 2008. Spatial analysis and Geostatistics, lecture notes, ITC.

Page 43: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Geostatistical mapping

“Analytical production of maps by using field observations, auxiliary information and a computer program that calculates values at locations of interest (a study area)”

Page 44: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Spatial prediction model

A widely accepted generic spatial prediction model [3]:

$$\hat{z}(s_0) = E\left\{ Z \mid z(s_i),\; q_k(s_0),\; \gamma(h),\; s \in \mathbb{A} \right\} \tag{1}$$

where $z(s_i)$ is the input point dataset, $q_k(s_0)$ is the list of deterministic predictors and $\gamma(h)$ is the covariance model defining the spatial autocorrelation structure.

[3] A spatial prediction model defines inputs, outputs and the computational procedure to derive outputs based on the given inputs.

Page 45: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Spatial prediction scheme


Page 46: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Spatial prediction techniques I

1. MECHANICAL (DETERMINISTIC) MODELS: These are models where arbitrary or empirical model parameters are used. No estimate of the model error is available and usually no strict assumptions about the variability of a feature exist. The most common techniques that belong to this group are:

* Thiessen polygons;
* Inverse distance interpolation;
* Regression on coordinates;
* Natural neighbors;
* Splines;
* ...

Page 47: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Spatial prediction techniques II

2. LINEAR STATISTICAL (PROBABILITY) MODELS: In the case of statistical models, the model parameters are commonly estimated in an objective way, following probability theory. The predictions are accompanied by an estimate of the prediction error. A drawback is that the input data set usually needs to satisfy strict statistical assumptions. There are at least four groups of linear statistical models:

* kriging (plain geostatistics);
* environmental correlation (e.g. regression-based);
* Bayesian-based models (e.g. Bayesian Maximum Entropy);
* hybrid models (e.g. regression-kriging);
* ...

Page 48: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Spatial prediction techniques III

3. EXPERT-BASED SYSTEMS: These models can be completely subjective (ergo irreproducible) or completely based on data; predictions are typically different for each run. Expert systems can also largely be based on probability theory (especially Bayesian statistics); however, it is good to put them in a different group because they are conceptually different from standard linear statistical techniques. There are at least three groups of expert-based systems:

* mainly knowledge-driven expert systems (e.g. hand-drawn maps);
* mainly data-driven expert systems (e.g. based on neural networks);
* machine learning algorithms (purely data-driven);

Page 49: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Inverse distance interpolation

A value of the target variable at some new location can be derived as a weighted average:

$$\hat{z}(s_0) = \sum_{i=1}^{n} \lambda_i(s_0) \cdot z(s_i) = \boldsymbol{\lambda}_0^{T} \cdot \mathbf{z} \tag{2}$$

where $\lambda_i$ is the weight for neighbour $i$. The sum of weights needs to equal one to ensure an unbiased interpolator.

The simplest approach for determining the weights is to use the inverse distances from all points to the new point:

$$\lambda_i(s_0) = \frac{\dfrac{1}{d^{\beta}(s_0, s_i)}}{\sum\limits_{i=1}^{n} \dfrac{1}{d^{\beta}(s_0, s_i)}}; \qquad \beta > 1 \tag{3}$$
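Equations (2) and (3) translate almost directly into base R. The coordinates, values and the function name `idw_predict` are made up for this sketch; the default `beta = 2` is a common choice rather than a value prescribed by the course.

```r
# Illustrative data: coordinates and values are made up for this sketch
xy <- cbind(x = c(0, 1, 0, 1), y = c(0, 0, 1, 1))
z  <- c(10, 20, 30, 40)

idw_predict <- function(s0, xy, z, beta = 2) {
  d <- sqrt((xy[, 1] - s0[1])^2 + (xy[, 2] - s0[2])^2)
  if (any(d == 0)) return(z[which(d == 0)[1]])   # exact hit on a sample point
  w <- 1 / d^beta
  lambda <- w / sum(w)     # weights of Eq. (3); they sum to one
  sum(lambda * z)          # weighted average of Eq. (2)
}

idw_predict(c(0.5, 0.5), xy, z)   # 25: all four points are equally distant
idw_predict(c(0, 0), xy, z)       # 10: reproduces the observation
```

Larger values of `beta` give more weight to the closest points, so the surface approaches a nearest-neighbour (Thiessen polygon) map.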

Page 50: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Kriging

A standard version of kriging is called ordinary kriging (OK). The predictions are based on the model:

$$Z(s) = \mu + \varepsilon'(s) \tag{4}$$

where $\mu$ is the constant stationary function (global mean) and $\varepsilon'(s)$ is the spatially correlated stochastic part of variation.

The predictions are obtained using:

$$\hat{z}_{OK}(s_0) = \sum_{i=1}^{n} w_i(s_0) \cdot z(s_i) = \boldsymbol{\lambda}_0^{T} \cdot \mathbf{z} \tag{5}$$

where $\boldsymbol{\lambda}_0$ is the vector of kriging weights ($w_i$) and $\mathbf{z}$ is the vector of $n$ observations at primary locations.

Page 51: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Kriging (2)

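The OK weights are solved by multiplying the covariances:

$$\boldsymbol{\lambda}_0 = \mathbf{C}^{-1} \cdot \mathbf{c}_0; \qquad C(|h| = 0) = C_0 + C_1 \tag{6}$$

where $\mathbf{C}$ is the covariance matrix derived for $n \times n$ observations and $\mathbf{c}_0$ is the vector of covariances at the new location. Written out in full:

$$\begin{bmatrix} C(s_1,s_1) & \cdots & C(s_1,s_n) & 1 \\ \vdots & \ddots & \vdots & \vdots \\ C(s_n,s_1) & \cdots & C(s_n,s_n) & 1 \\ 1 & \cdots & 1 & 0 \end{bmatrix}^{-1} \cdot \begin{bmatrix} C(s_0,s_1) \\ \vdots \\ C(s_0,s_n) \\ 1 \end{bmatrix} = \begin{bmatrix} w_1(s_0) \\ \vdots \\ w_n(s_0) \\ \varphi \end{bmatrix} \tag{7}$$

where $\varphi$ is the Lagrange multiplier.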
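The kriging system of Eq. (7) can be solved in base R with `solve()`. This is a sketch under stated assumptions: the data, the exponential covariance function and its parameters (`C0`, `C1`, `a`) are all made up, and the function name `ok_predict` is hypothetical. In practice the covariance parameters come from a fitted variogram.

```r
# Illustrative data and covariance parameters (all assumed for this sketch)
xy <- cbind(x = c(0, 1, 0, 1), y = c(0, 0, 1, 1))
z  <- c(10, 20, 30, 40)
C0 <- 0; C1 <- 1; a <- 2            # nugget, partial sill, range parameter

# Exponential covariance: C(h) = C1 * exp(-h / a), with C(0) = C0 + C1
cov_fun <- function(h) ifelse(h == 0, C0 + C1, C1 * exp(-h / a))

ok_predict <- function(s0, xy, z) {
  n  <- nrow(xy)
  C  <- cov_fun(as.matrix(dist(xy)))              # n x n covariance matrix
  c0 <- cov_fun(sqrt(colSums((t(xy) - s0)^2)))    # covariances to s0
  A  <- rbind(cbind(C, 1), c(rep(1, n), 0))       # extended matrix of Eq. (7)
  sol <- solve(A, c(c0, 1))                       # weights and multiplier
  w <- sol[1:n]
  list(pred = sum(w * z), weights = w, multiplier = sol[n + 1])
}

res <- ok_predict(c(0.5, 0.5), xy, z)
res$pred           # 25 by symmetry: all four weights are equal
sum(res$weights)   # 1: the row and column of ones enforce this constraint
```

With a zero nugget, predicting at a sampled location returns the observed value, which is the exact-interpolation property of kriging.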

Page 52: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Semivariance

The basis of kriging is the derivation and plotting of the so-called semivariances, i.e. half the expected squared differences between neighbouring values:

$$\gamma(h) = \frac{1}{2} E\left[ \left( z(s_i) - z(s_i + h) \right)^2 \right] \tag{8}$$

where $z(s_i)$ is the value of the target variable at some sampled location and $z(s_i + h)$ is the value of the neighbour at distance $h$ (i.e. at location $s_i + h$). If there are $n$ point observations, this yields $n \cdot (n - 1)/2$ pairs for which a semivariance can be calculated.

Once we have calculated an experimental variogram, we can fit it using one of the authorized variogram models, such as linear, spherical, exponential, circular, Gaussian, Bessel, power...
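As a sketch of these steps in base R, the following computes the semivariance of every pair on a small made-up transect, averages it per distance class to get an experimental variogram, and evaluates a spherical model. The data, the function name `sph` and its parameter values are illustrative assumptions; fitting the model to the experimental values is not shown.

```r
# Illustrative 1-D transect: locations and values are made up
s <- c(0, 1, 2, 3, 4)
z <- c(1, 3, 2, 5, 4)
n <- length(z)

h     <- as.vector(dist(s))          # separation distance of every pair
gamma <- as.vector(dist(z))^2 / 2    # half the squared difference, as in Eq. (8)
length(h) == n * (n - 1) / 2         # TRUE: 10 pairs for n = 5

# Experimental variogram: mean semivariance per distance class
vario <- tapply(gamma, h, mean)
vario

# Spherical model with nugget C0, partial sill C1 and range a
sph <- function(h, C0, C1, a) {
  ifelse(h == 0, 0,
         ifelse(h < a, C0 + C1 * (1.5 * h / a - 0.5 * (h / a)^3), C0 + C1))
}
sph(c(0, 1, 2, 5), C0 = 0.5, C1 = 2, a = 4)
```

Plotting `gamma` against `h` gives the variogram cloud, while `vario` corresponds to the averaged points to which a model is normally fitted.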

Page 53: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Variograms


Page 54: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Anisotropy


Page 55: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Anisotropy in gstat

The variogram models can be extended to an even larger number of parameters if either (a) anisotropy or (b) smoothness is considered in addition to modelling of nugget and sill variation.

The 2D geometric anisotropy in gstat, for example, is modelled by replacing the range parameter with three parameters: range in the major direction (direction of the strongest correlation), angle of the principal direction and the anisotropy ratio:

# requires library(gstat); in vgm() the (partial) sill argument is named psill
vgm(psill=10, model="Sph", range=2, nugget=1, anis=c(30, 0.5))

where the angle of the major direction is 30 (azimuthal direction measured in degrees clockwise from north), and the anisotropy ratio is 0.5 (the range in the minor direction is half the range in the major direction).
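The geometry behind these two anisotropy parameters can be sketched in base R: separation vectors are rotated onto the major axis, and the component across it is divided by the anisotropy ratio before the distance is computed. The function name `aniso_dist` is hypothetical and this is an illustration of the idea, not gstat's internal code.

```r
# Defaults taken from the vgm() example: major direction 30 degrees
# clockwise from north, anisotropy ratio 0.5
aniso_dist <- function(dx, dy, angle = 30, ratio = 0.5) {
  theta <- angle * pi / 180
  major <- dx * sin(theta) + dy * cos(theta)   # component along the major axis
  minor <- dx * cos(theta) - dy * sin(theta)   # component across it
  sqrt(major^2 + (minor / ratio)^2)            # distance in isotropic space
}

# Unit steps along and across the major direction
aniso_dist(sin(pi / 6), cos(pi / 6))     # 1: unchanged along the major axis
aniso_dist(cos(pi / 6), -sin(pi / 6))    # 2: stretched across it
```

A unit step across the major direction counts as a distance of 2, so with a major range of 2 the correlation vanishes after only 1 unit in the minor direction.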

Page 56: A Practical Guide to - Geostatistical Mapping - GEOSTAT courses

Kriging, steps

[Figure: (a) map of sample locations drawn as proportional symbols (legend classes 100, 200, 400, 800, 1600; x axis from 178500 to 181500, y axis from 330000 to 333000), followed by plots of semivariance (axis ticks 1 to 3) against distance (axis ticks 500 to 1500).]

●●

●●

●●●●●● ●●

●●

●●●● ●

●●

●●●

●●

●●

●●

●●

● ●●

●●●●●●●●● ●●

●●

●●●●●

●● ●

●●● ●

● ●

●●

●●

●● ●●

●●●●●●●●● ●

●●

●●●●●●

● ●

●●● ●

● ●

●●

●●

●●● ●●

●●●●● ●●●● ● ●●

●●●●

●●● ●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●●

● ●

● ●

●●●

●●

● ●

●● ● ●●●● ●

●●

●●

●●●● ●

●●

●●

●●● ●●

●●

● ●

●●

●●●

●●

●●

● ●

● ●

●●●

●●

●●

●● ● ●●●● ● ● ●●●●●●

● ●●● ●

●●●●

●● ● ●●

●●

●●

●●

●●●

●●

● ●

●●

●●

●●

●●

●● ● ●●●● ● ● ● ● ●●●●●● ● ● ●● ●

●●●●●

●● ● ●

●●

●●

●● ●●●●●

●●●●●●●●● ●

●●

●●

●●●●●● ●●

●●●● ●

● ●

●●●

●●

●●

●●

●●

●●

●●

●● ●●●

●●

●●●

●●

●●

●●●

●●

●●●

● ●

● ●●●●● ●

●●●

●●

●●●● ●

●●

●●

●●●

●●

●●

●●

●●

●●●

●● ●

●●●●

●●●

●●

●●

●●

●●●

●●

● ●

●●

●●

●●

●●

● ●● ● ●●●● ● ● ● ● ●●●●●● ● ● ●● ● ●

● ●●●●●●●●

●●● ●

●●

● ●

●●●

●●●

●●

●●

●●●

● ●●●●

● ●●●

●● ●

●●

●●

●● ●●

●●

●●

● ●● ●

(b)

distance

sem

ivar

ianc

e

0.2

0.4

0.6

500 1000 1500

+

+

+

+

+

+ +

++

+ +

+

+

+ +

57

299

419

457547

533574564589

543500

477452

457415

(c)

distance

sem

ivar

ianc

e0.2

0.4

0.6

500 1000 1500

+

+

+

+

+

+ +

++

+ +

+

+

+ +

(d)


Burrough and McDonnell (1998) example

What is the value of the target variable at the location x=5, y=5? (Example from the book by Burrough and McDonnell (1998).)

> grid10 = expand.grid(x = seq(0, 10, 10), y = seq(0, 10, 10))
> gridded(grid10) = ~x + y
> newpoint = as.data.frame(matrix(c(5, 5), nrow = 1, ncol = 2,
+     dimnames = list(c("x1"), c("X", "Y"))))
> coordinates(newpoint) <- ~X + Y
> krige(Z ~ 1, points5, newpoint, vgm(nugget = 2.5, "Sph",
+     psill = 7.5, range = 10))
[using ordinary kriging]
  coordinates var1.pred var1.var
1      (5, 5)       4.3     4.93
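To see what krige solves in the call above, here is a minimal base R sketch of ordinary kriging at a single point. It assumes the same spherical model (nugget 2.5, partial sill 7.5, range 10). The slide does not show the contents of points5, so the five observations below are illustrative assumptions, and the helper names sph and ok_point are hypothetical (they are not gstat functions).

```r
# Spherical semivariogram: nugget c0, partial sill c1, range a; gamma(0) = 0.
sph <- function(h, c0, c1, a) {
  g <- ifelse(h < a, c0 + c1 * (1.5 * h / a - 0.5 * (h / a)^3), c0 + c1)
  ifelse(h == 0, 0, g)
}

# Ordinary kriging at one location xy0 from points xy (n x 2) with values z.
ok_point <- function(xy, z, xy0, c0, c1, a) {
  n <- nrow(xy)
  G <- sph(as.matrix(dist(xy)), c0, c1, a)     # semivariances between data points
  A <- rbind(cbind(G, 1), c(rep(1, n), 0))     # kriging system with the unbiasedness row
  d0 <- sqrt((xy[, 1] - xy0[1])^2 + (xy[, 2] - xy0[2])^2)
  g0 <- sph(d0, c0, c1, a)                     # semivariances to the target location
  sol <- solve(A, c(g0, 1))
  w <- sol[1:n]                                # kriging weights (sum to 1)
  # Kriging variance: weighted semivariances plus the Lagrange multiplier.
  list(pred = sum(w * z), var = sum(w * g0) + sol[n + 1], weights = w)
}

# Illustrative observations (an assumption; points5 is not listed on the slide).
xy <- cbind(x = c(2, 3, 9, 6, 5), y = c(2, 7, 9, 5, 3))
z <- c(3, 4, 2, 4, 6)
res <- ok_point(xy, z, c(5, 5), c0 = 2.5, c1 = 7.5, a = 10)
print(res)
```

Because the semivariance at zero distance is 0, the same function returns the observed value (with zero variance) when the target coincides with a data point.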


Kriging = geostatistics

Kriging has for many decades been used as a synonym for geostatistical interpolation.

It originated in the mining industry in the early 1950s as a means of improving ore reserve estimation (the mining engineer D. G. Krige and the statistician H. S. Sichel).

The technique was first published in Krige (1951), but it took almost a decade until the French mathematician G. Matheron derived the formulas and essentially established the whole field of linear geostatistics.


Environmental correlation

The concept of vegetation / soil – environment relationships has frequently been presented in terms of an equation with six key environmental factors as:

$$V \times S[x, y, t] = f \left\{ s[x, y, t],\; c[x, y, t],\; o[x, y, t],\; r[x, y, t],\; p[x, y, t],\; a[x, y, t] \right\} \tag{9}$$


Types of lm’s

There are (at least) four groups of statistical models that have been used to make spatial predictions with the help of environmental factors:

- Classification-based models
- Tree-based models (decision trees)
- Regression models (Generalized Linear Models, Generalized Additive Models)


Environmental correlation with OLS

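A common regression-based approach to spatial prediction is multiple linear regression:

$$\hat{z}_{\mathrm{OLS}}(s_0) = \hat{b}_0 + \hat{b}_1 \cdot q_1(s_0) + \ldots + \hat{b}_p \cdot q_p(s_0) = \sum_{k=0}^{p} \hat{\beta}_k \cdot q_k(s_0) = \hat{\boldsymbol{\beta}}^{T} \cdot \mathbf{q}_0 \tag{10}$$

where $q_k(s_0)$ are the values of the auxiliary variables at the target location (with $q_0(s_0) = 1$ for the intercept), $p$ is the number of predictors or auxiliary variables, and $\hat{\beta}_k$ are the regression coefficients, solved using Ordinary Least Squares:

$$\hat{\boldsymbol{\beta}} = \left( \mathbf{q}^{T} \cdot \mathbf{q} \right)^{-1} \cdot \mathbf{q}^{T} \cdot \mathbf{z} \tag{11}$$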
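As an illustration of Eqs. (10) and (11), the sketch below solves the normal equations directly in base R and compares the result with lm(). The data are simulated and the variable names (q1, q2, q0) are assumptions made for this example only.

```r
set.seed(1)
n <- 50
q1 <- runif(n)
q2 <- runif(n)
z <- 2 + 1.5 * q1 - 0.8 * q2 + rnorm(n, sd = 0.3)   # simulated target variable

q <- cbind(1, q1, q2)                        # design matrix; first column is the intercept
beta_hat <- solve(t(q) %*% q, t(q) %*% z)    # Eq. (11): (q'q)^-1 q'z

q0 <- c(1, 0.4, 0.6)                         # predictors at a new location s0
z_ols <- sum(q0 * beta_hat)                  # Eq. (10): beta' q0

fit <- lm(z ~ q1 + q2)                       # reference fit with the built-in function
print(cbind(normal_eq = drop(beta_hat), lm = coef(fit)))
```

In practice lm() is preferable because it uses a QR decomposition, which is numerically more stable than inverting q'q; the explicit form is shown only to mirror the equations.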


OLS prediction error

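The prediction error (variance) of a multiple linear regression model is:

$$\hat{\sigma}^2_{\mathrm{OLS}}(s_0) = \mathrm{MSE} \cdot \left[ 1 + \mathbf{q}_0^{T} \cdot \left( \mathbf{q}^{T} \cdot \mathbf{q} \right)^{-1} \cdot \mathbf{q}_0 \right] \tag{12}$$

where MSE is the mean square (residual) error around the regression line:

$$\mathrm{MSE} = \frac{\sum_{i=1}^{n} \left[ z(s_i) - \hat{z}(s_i) \right]^2}{n - 2} \tag{13}$$

and $\mathbf{q}_0$ is the vector of predictors at the new, unvisited location. The denominator $n - 2$ applies to a single predictor; with $p$ predictors and an intercept it becomes $n - p - 1$. The OLS prediction error reflects the amount of extrapolation in the feature space!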
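The effect of extrapolation in feature space can be checked numerically. The following base R sketch evaluates Eqs. (12) and (13) for a single simulated predictor (so the n - 2 denominator applies) at one value inside and one far outside the sampled range. The data and the helper name pred_var are assumptions for illustration.

```r
set.seed(2)
n <- 40
q1 <- runif(n)                                # predictor sampled on [0, 1]
z <- 1 + 2 * q1 + rnorm(n, sd = 0.5)

q <- cbind(1, q1)
beta_hat <- solve(t(q) %*% q, t(q) %*% z)
mse <- sum((z - q %*% beta_hat)^2) / (n - 2)  # Eq. (13)

# Eq. (12): prediction variance at a new predictor vector q0 = (1, q1 value)
pred_var <- function(q0) mse * (1 + drop(t(q0) %*% solve(t(q) %*% q) %*% q0))

v_inside <- pred_var(c(1, 0.5))               # within the sampled feature space
v_outside <- pred_var(c(1, 3))                # strong extrapolation
print(c(inside = v_inside, outside = v_outside))
```

The same quantity can be obtained from predict(lm(z ~ q1), se.fit = TRUE) as se.fit^2 + residual.scale^2.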


Adjusted R-square

The sum of squares of residuals (SSE) can be used to determine the adjusted coefficient of multiple determination ($R_a^2$), which describes the goodness of fit:

$$R_a^2 = 1 - \left( \frac{n - 1}{n - p} \right) \cdot \frac{\mathrm{SSE}}{\mathrm{SSTO}} = 1 - \left( \frac{n - 1}{n - p} \right) \cdot \left( 1 - R^2 \right) \tag{14}$$

where SSTO is the total sum of squares, $R^2$ indicates the amount of variance explained by the model, and $R_a^2$ adjusts for the number of variables ($p$) used. For many environmental mapping projects, an $R_a^2 \geq 0.85$ is already a very satisfactory solution, and higher values will typically only mean over-fitting of the data.
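A short base R check of Eq. (14) on simulated data follows. One assumption is made explicit here: p is taken to count all estimated coefficients including the intercept, which is the convention under which the formula agrees with the value reported by summary.lm().

```r
set.seed(3)
n <- 60
q1 <- rnorm(n)
q2 <- rnorm(n)
z <- 1 + 0.7 * q1 + 0.2 * q2 + rnorm(n)       # simulated target variable

fit <- lm(z ~ q1 + q2)
sse <- sum(residuals(fit)^2)                  # sum of squares of residuals
ssto <- sum((z - mean(z))^2)                  # total sum of squares
p <- length(coef(fit))                        # assumption: p includes the intercept

r2 <- 1 - sse / ssto
r2_adj <- 1 - (n - 1) / (n - p) * sse / ssto  # Eq. (14)
print(c(R2 = r2, R2_adjusted = r2_adj))
```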


Comparison of spatial prediction techniques

[Figure: four map panels (zinc.id, zinc.tr, zinc.lm, zinc.ok) with a common legend; ticks 0, 500, 1000, 1500.]


Universal model of spatial variation

From the statistical perspective, an environmental variable can be viewed as an information signal consisting of three components:

$$Z(s) = Z^{*}(s) + \varepsilon'(s) + \varepsilon'' \tag{15}$$

where $Z^{*}(s)$ is the deterministic component, $\varepsilon'(s)$ is the spatially correlated random component and $\varepsilon''$ is the pure noise, usually the result of measurement error.


Best Linear Unbiased Predictor

Matheron (1969) proposed that the value of a target variable at some location can be modelled as a sum of the deterministic and stochastic components:

$$\hat{z}(s_0) = \hat{m}(s_0) + \hat{e}(s_0) = \sum_{k=0}^{p} \hat{\beta}_k \cdot q_k(s_0) + \sum_{i=1}^{n} \lambda_i \cdot e(s_i) \tag{16}$$

where $\hat{m}(s_0)$ is the fitted deterministic part, $\hat{e}(s_0)$ is the interpolated residual, $\hat{\beta}_k$ are the estimated deterministic model coefficients ($\hat{\beta}_0$ is the estimated intercept), $\lambda_i$ are the kriging weights determined by the spatial dependence structure of the residual, and $e(s_i)$ is the residual at location $s_i$.


BLUP (2)

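The regression coefficients $\hat{\beta}_k$ can be estimated from the sample by some fitting method, e.g. ordinary least squares (OLS) or, optimally, using Generalized Least Squares (GLS):

$$\hat{\boldsymbol{\beta}}_{\mathrm{GLS}} = \left( \mathbf{q}^{T} \cdot \mathbf{C}^{-1} \cdot \mathbf{q} \right)^{-1} \cdot \mathbf{q}^{T} \cdot \mathbf{C}^{-1} \cdot \mathbf{z} \tag{17}$$

where $\hat{\boldsymbol{\beta}}_{\mathrm{GLS}}$ is the vector of estimated regression coefficients, $\mathbf{C}$ is the covariance matrix of the residuals, $\mathbf{q}$ is a matrix of predictors at the sampling locations and $\mathbf{z}$ is the vector of measured values of the target variable.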
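Eq. (17) translates almost literally into matrix code. The base R sketch below uses simulated locations and an exponential covariance function with assumed parameters (nugget 0.2, partial sill 0.8, range parameter 30); the helper names cov_exp and gls are hypothetical. Passing the identity matrix as C recovers the OLS solution.

```r
set.seed(4)
n <- 30
xy <- cbind(runif(n, 0, 100), runif(n, 0, 100))     # simulated sampling locations
q <- cbind(1, runif(n))                             # intercept + one predictor
z <- drop(q %*% c(5, 2)) + rnorm(n)

# Assumed covariance model of the residuals: full sill at h = 0, exponential decay otherwise.
cov_exp <- function(h, C0 = 0.2, C1 = 0.8, r = 30) {
  ifelse(h == 0, C0 + C1, C1 * exp(-h / r))
}
C <- cov_exp(as.matrix(dist(xy)))                   # n x n covariance matrix of residuals

gls <- function(q, C, z) {
  Ci <- solve(C)
  solve(t(q) %*% Ci %*% q, t(q) %*% Ci %*% z)       # Eq. (17)
}

beta_gls <- gls(q, C, z)
beta_ols <- gls(q, diag(n), z)                      # C = I reduces GLS to OLS
print(cbind(GLS = drop(beta_gls), OLS = drop(beta_ols)))
```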


BLUP (3)

In matrix notation, regression-kriging is commonly written as (Christensen, 2001):

$$\hat{z}_{\mathrm{RK}}(s_0) = \mathbf{q}_0^{T} \cdot \hat{\boldsymbol{\beta}}_{\mathrm{GLS}} + \boldsymbol{\lambda}_0^{T} \cdot \left( \mathbf{z} - \mathbf{q} \cdot \hat{\boldsymbol{\beta}}_{\mathrm{GLS}} \right) \tag{18}$$

where $\hat{z}_{\mathrm{RK}}(s_0)$ is the predicted value at location $s_0$, $\mathbf{q}_0$ is the vector of $p + 1$ predictors and $\boldsymbol{\lambda}_0$ is the vector of $n$ kriging weights used to interpolate the residuals.

The estimation of the residuals is an iterative process: first the deterministic part of variation is estimated using ordinary least squares (OLS), then the covariance function of the residuals is used to obtain the GLS coefficients. The most reliable way to estimate the model coefficients is restricted maximum likelihood (REML).


RK variance

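The RK prediction variance reflects the position of new locations (extrapolation) in both geographical and feature space:

$$\hat{\sigma}^2_{\mathrm{RK}}(s_0) = (C_0 + C_1) - \mathbf{c}_0^{T} \cdot \mathbf{C}^{-1} \cdot \mathbf{c}_0 + \left( \mathbf{q}_0 - \mathbf{q}^{T} \cdot \mathbf{C}^{-1} \cdot \mathbf{c}_0 \right)^{T} \cdot \left( \mathbf{q}^{T} \cdot \mathbf{C}^{-1} \cdot \mathbf{q} \right)^{-1} \cdot \left( \mathbf{q}_0 - \mathbf{q}^{T} \cdot \mathbf{C}^{-1} \cdot \mathbf{c}_0 \right) \tag{19}$$

where $C_0 + C_1$ is the sill variation and $\mathbf{c}_0$ is the vector of covariances of residuals at the unvisited location.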
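The regression-kriging prediction and its variance can be computed together from the same matrices. The sketch below is a base R illustration on simulated data, assuming an exponential residual covariance with known parameters (in practice these would be estimated from a fitted variogram); rk_point and cov_exp are hypothetical helper names.

```r
set.seed(5)
n <- 25
xy <- cbind(runif(n, 0, 100), runif(n, 0, 100))     # simulated sampling locations
q <- cbind(1, xy[, 1] / 100)                        # intercept + a hypothetical covariate
z <- drop(q %*% c(3, 4)) + rnorm(n)

C0 <- 0.1; C1 <- 0.9; r <- 25                       # assumed nugget, partial sill, range parameter
cov_exp <- function(h) ifelse(h == 0, C0 + C1, C1 * exp(-h / r))

rk_point <- function(xy0, q0) {
  Ci <- solve(cov_exp(as.matrix(dist(xy))))         # C^-1
  c0 <- cov_exp(sqrt((xy[, 1] - xy0[1])^2 + (xy[, 2] - xy0[2])^2))
  A <- solve(t(q) %*% Ci %*% q)                     # (q' C^-1 q)^-1
  beta <- A %*% t(q) %*% Ci %*% z                   # GLS coefficients
  lambda <- Ci %*% c0                               # kriging weights for the residuals
  pred <- sum(q0 * beta) + sum(lambda * (z - q %*% beta))       # RK prediction
  d <- q0 - t(q) %*% Ci %*% c0
  v <- (C0 + C1) - sum(c0 * lambda) + drop(t(d) %*% A %*% d)    # RK variance, Eq. (19)
  c(pred = pred, var = v)
}

new_loc <- rk_point(c(50, 50), c(1, 0.5))           # unvisited location
at_obs <- rk_point(xy[1, ], q[1, ])                 # an observed location
print(rbind(new_loc, at_obs))
```

At an observed location the prediction reproduces the measurement and the variance drops to zero, because the covariance vector then equals a column of C.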


RK and MLR

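If the residuals show no spatial auto-correlation (pure nugget effect), regression-kriging converges to pure multiple linear regression, because $\mathbf{C}$ becomes proportional to the identity matrix:

$$\mathbf{C} = \begin{bmatrix} C_0 + C_1 & \cdots & 0 \\ \vdots & C_0 + C_1 & 0 \\ 0 & 0 & C_0 + C_1 \end{bmatrix} = (C_0 + C_1) \cdot \mathbf{I} \tag{20}$$

so the kriging weights at any location predict the mean residual, i.e. a value of 0. Similarly, the regression-kriging variance reduces to the multiple linear regression variance:

$$\hat{\sigma}^2_{\mathrm{RK}}(s_0) = (C_0 + C_1) - 0 + \mathbf{q}_0^{T} \cdot \left( \mathbf{q}^{T} \cdot \frac{1}{(C_0 + C_1)} \cdot \mathbf{q} \right)^{-1} \cdot \mathbf{q}_0$$

$$\hat{\sigma}^2_{\mathrm{RK}}(s_0) = \hat{\sigma}^2_{\mathrm{OLS}}(s_0) = \mathrm{MSE} \cdot \left[ 1 + \mathbf{q}_0^{T} \cdot \left( \mathbf{q}^{T} \cdot \mathbf{q} \right)^{-1} \cdot \mathbf{q}_0 \right] \tag{21}$$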
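The intermediate steps of this reduction can be written out as follows. This is a worked sketch under two assumptions: under a pure nugget effect the covariance between an unvisited location and every observation is zero, and the sill $C_0 + C_1$ is estimated by the MSE of the regression.

```latex
\begin{align*}
\mathbf{C} = (C_0 + C_1)\,\mathbf{I},\quad \mathbf{c}_0 = \mathbf{0}
  \;&\Rightarrow\; \boldsymbol{\lambda}_0 = \mathbf{C}^{-1}\mathbf{c}_0 = \mathbf{0} \\
\hat{\boldsymbol{\beta}}_{\mathrm{GLS}}
  &= \left(\mathbf{q}^{T}\tfrac{1}{C_0 + C_1}\,\mathbf{q}\right)^{-1}
     \mathbf{q}^{T}\tfrac{1}{C_0 + C_1}\,\mathbf{z}
   = \left(\mathbf{q}^{T}\mathbf{q}\right)^{-1}\mathbf{q}^{T}\mathbf{z}
   = \hat{\boldsymbol{\beta}}_{\mathrm{OLS}} \\
\hat{\sigma}^2_{\mathrm{RK}}(s_0)
  &= (C_0 + C_1) + (C_0 + C_1)\,\mathbf{q}_0^{T}\left(\mathbf{q}^{T}\mathbf{q}\right)^{-1}\mathbf{q}_0
   = (C_0 + C_1)\left[1 + \mathbf{q}_0^{T}\left(\mathbf{q}^{T}\mathbf{q}\right)^{-1}\mathbf{q}_0\right]
\end{align*}
```

Replacing $C_0 + C_1$ by MSE in the last line gives the OLS prediction variance.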


RK and OK

If the target variable shows no correlation with the auxiliary predictors, the regression-kriging model reduces to the ordinary kriging model, because the deterministic part equals the (global) mean value.


RK and KED/UK

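In the case of KED/UK, the extended covariance matrix of residuals is used, which looks like this:

$$\mathbf{C}^{\mathrm{KED}} = \begin{bmatrix}
C(s_1, s_1) & \cdots & C(s_1, s_n) & 1 & q_1(s_1) & \cdots & q_p(s_1) \\
\vdots & & \vdots & \vdots & \vdots & & \vdots \\
C(s_n, s_1) & \cdots & C(s_n, s_n) & 1 & q_1(s_n) & \cdots & q_p(s_n) \\
1 & \cdots & 1 & 0 & 0 & \cdots & 0 \\
q_1(s_1) & \cdots & q_1(s_n) & 0 & 0 & \cdots & 0 \\
\vdots & & \vdots & 0 & \vdots & & \vdots \\
q_p(s_1) & \cdots & q_p(s_n) & 0 & 0 & \cdots & 0
\end{bmatrix} \tag{22}$$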


RK and KED/UK (2)

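The KED/UK weights are solved using the extended matrices:

$$\boldsymbol{\lambda}_0^{\mathrm{KED}} = \left\{ w_1^{\mathrm{KED}}(s_0), \ldots, w_n^{\mathrm{KED}}(s_0), \varphi_0(s_0), \ldots, \varphi_p(s_0) \right\}^{T} = \left( \mathbf{C}^{\mathrm{KED}} \right)^{-1} \cdot \mathbf{c}_0^{\mathrm{KED}} \tag{23}$$

where $\boldsymbol{\lambda}_0^{\mathrm{KED}}$ is the vector of solved weights, $\varphi_p$ are the Lagrange multipliers, $\mathbf{C}^{\mathrm{KED}}$ is the extended covariance matrix of residuals and $\mathbf{c}_0^{\mathrm{KED}}$ is the extended vector of covariances at the new location.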


RK and KED/UK (3)

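The predictions at new locations are made by:

$$\hat{z}_{\mathrm{KED}}(s_0) = \sum_{i=1}^{n} w_i^{\mathrm{KED}}(s_0) \cdot z(s_i) = \boldsymbol{\delta}_0^{T} \cdot \mathbf{z} \tag{24}$$

for:

$$\sum_{i=1}^{n} w_i^{\mathrm{KED}}(s_0) \cdot q_k(s_i) = q_k(s_0); \qquad k = 1, \ldots, p \tag{25}$$

where $\boldsymbol{\delta}_0$ is the vector of KED/UK weights ($w_i^{\mathrm{KED}}$).

Hence, KED/UK looks exactly like ordinary kriging, except that the covariance matrix is extended with the values of the auxiliary predictors!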
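The extended system can be assembled and solved in a few lines. The base R sketch below builds the KED matrix for one auxiliary predictor, solves for the weights, and compares the prediction with regression-kriging based on GLS coefficients; with the same covariance model the two should agree up to rounding. All data and covariance parameters are simulated assumptions, and the object names are hypothetical.

```r
set.seed(6)
n <- 20
xy <- cbind(runif(n, 0, 100), runif(n, 0, 100))     # simulated sampling locations
q1 <- xy[, 2] / 100 + rnorm(n, sd = 0.1)            # hypothetical auxiliary predictor
z <- 2 + 3 * q1 + rnorm(n)
xy0 <- c(50, 50)                                    # target location
q0 <- c(1, 0.5)                                     # intercept term and predictor value at the target

C0 <- 0.1; C1 <- 0.9; r <- 25                       # assumed residual covariance parameters
cov_exp <- function(h) ifelse(h == 0, C0 + C1, C1 * exp(-h / r))
C <- cov_exp(as.matrix(dist(xy)))
c0 <- cov_exp(sqrt((xy[, 1] - xy0[1])^2 + (xy[, 2] - xy0[2])^2))
Q <- cbind(1, q1)                                   # n x (p + 1) matrix of predictors

# Extended covariance matrix and extended right-hand side
C_ked <- rbind(cbind(C, Q), cbind(t(Q), matrix(0, 2, 2)))
sol <- solve(C_ked, c(c0, q0))
w <- sol[1:n]                                       # KED weights; remaining entries are Lagrange multipliers
z_ked <- sum(w * z)                                 # weighted sum of observations

# Regression-kriging with GLS coefficients for comparison
Ci <- solve(C)
beta <- solve(t(Q) %*% Ci %*% Q, t(Q) %*% Ci %*% z)
z_rk <- sum(q0 * beta) + sum((Ci %*% c0) * (z - Q %*% beta))

print(c(KED = z_ked, RK = z_rk))
```

The weights also satisfy the constraints: they sum to 1 and reproduce the predictor value at the target location.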


The name confusion

Matheron (1969) originally termed the technique Le krigeage universel (Universal kriging). However, the technique was intended as a generalized case of kriging where the trend is modelled as a function of coordinates only.

If the deterministic part of variation (drift) is defined externally as a linear function of some auxiliary variables, rather than the coordinates, the term Kriging with External Drift (KED) is preferred.

The drift and residuals can also be estimated separately and then summed. This procedure was suggested by Ahmed and de Marsily (1987) and Odeh et al. (1995), who later named it Regression-kriging.

Minasny and McBratney (2007) suggest that a mathematically accurate term should be used instead to name the technique: EBLUP (Empirical Best Linear Unbiased Predictor).


Why do I prefer the term RK?

1. RK explicitly separates trend estimation from spatial prediction of residuals, allowing the use of arbitrarily complex forms of regression, rather than only simple linear techniques.

2. RK allows the separate interpretation of the two interpolated components.

3. The emphasis on regression is also important because fitting the deterministic part of variation (regression) is often more beneficial for the quality of the final maps than fitting the stochastic part (residuals).

4. The (extended) KED matrix is unstable if the covariate does not vary smoothly in space.


Decision tree

[Figure: decision tree for selecting a spatial prediction method. Decision nodes (each with YES/NO branches): "Is the physical model known?"; "Is the variable correlated with environmental factors?"; "Do the residuals show spatial auto-correlation?" (appears on two branches); "Does the variable show spatial auto-correlation?"; "Can a variogram with >1 parameters be fitted?". Outcomes: DETERMINISTIC MODEL; REGRESSION-KRIGING (calibration); REGRESSION-KRIGING (GLS); ENVIRONMENTAL CORRELATION (OLS); ORDINARY KRIGING; INVERSE DISTANCE INTERPOLATION; NO PREDICTIONS POSSIBLE.]


Gstat

inverse distance interpolation:

ev.id = krige(ev ~ 1, locations=points, newdata=mapgrid)

correlation with coordinates (2nd order polynomial model):

# I() protects arithmetic inside the formula; a bare x*x would collapse to x
ev.ts = krige(ev ~ x + y + I(x*y) + I(x^2) + I(y^2), locations=points, newdata=mapgrid)

moving window (with coordinates):

ev.mv = krige(ev ~ x + y + I(x*y) + I(x^2) + I(y^2), locations=points, newdata=mapgrid, nmax=20)


Gstat (2)

ordinary kriging:

ev.ok = krige(ev ~ 1, locations=points, newdata=mapgrid, model=vgm(psill=5, "Exp", range=1000, nugget=1))

environmental correlation (OLS):

ev.ec = krige(ev ~ q1 + q2, locations=points, newdata=mapgrid)

regression-kriging (universal kriging):

ev.rk = krige(ev ~ q1 + q2, locations=points, newdata=mapgrid, model=vgm(psill=3, "Exp", range=500, nugget=0))


R syntax


Space-time data

Universal kriging model for spatio-temporal data (Heuvelink & Griffith, 2010):

$$T(s, t) = m(s, t) + \varepsilon(s, t) \tag{26}$$

where $m(s, t)$ is the deterministic part of the variation (i.e. a linear function of the auxiliary variables) and $\varepsilon(s, t)$ is the residual for every $(s, t)$.


Space-time cube


Space-time workshop (Munster)


Space-time semivariance

$$\gamma(s_i, t_i; s_j, t_j) = 0.5 \cdot E\left[ \left( \varepsilon(s_i, t_i) - \varepsilon(s_j, t_j) \right)^2 \right] \tag{27}$$


Residuals

Residuals ($\varepsilon$) consist of three stationary and independent components (Heuvelink & Griffith, 2010):

$$\varepsilon(s, t) = \varepsilon_s(s) + \varepsilon_t(t) + \varepsilon_{s,t}(s, t) \tag{28}$$

where $\varepsilon_s(s)$ is a purely spatial process (with constant realizations over time), $\varepsilon_t(t)$ is a purely temporal process, and $\varepsilon_{s,t}(s, t)$ is a space-time process for which distance in space is made comparable to distance in time by introducing a space-time anisotropy ratio.


Zonal anisotropies

The covariance structure can be represented by (Snepvangers et al., 2003):

$$C(h, u) = C_s(h) + C_t(u) + C_{s,t}\left( \sqrt{h^2 + (\alpha \cdot u)^2} \right) \tag{29}$$

where $C(h, u)$ is the covariance at distance $h$ in space and time-distance $u$, $C_s(h) + C_t(u)$ allow the presence of zonal anisotropies (different variogram sills in different directions), and $C_{s,t}\left( \sqrt{h^2 + (\alpha \cdot u)^2} \right)$ allows the presence of geometric anisotropy, represented by the ratio $\alpha$ that rescales time-distance into space-distance.
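A covariance structure of this form is easy to prototype. The base R sketch below assumes exponential models for all three components and treats the anisotropy ratio as a multiplier on the time lag (alpha in metres per day), so that h and alpha * u share the same unit. All sills, ranges and the value of alpha are hypothetical; cov_exp and cov_st are illustrative helper names.

```r
# Exponential covariance with a given sill and range parameter (assumed model form).
cov_exp <- function(d, sill, range) sill * exp(-d / range)

# Space-time covariance: spatial + temporal + joint component.
# h: space lag in metres, u: time lag in days, alpha: metres per day (hypothetical value).
cov_st <- function(h, u, alpha = 5000) {
  cov_exp(h, sill = 4, range = 50000) +                          # C_s(h), purely spatial
    cov_exp(u, sill = 3, range = 10) +                           # C_t(u), purely temporal
    cov_exp(sqrt(h^2 + (alpha * u)^2), sill = 2, range = 80000)  # C_{s,t}, joint component
}

# Covariance surface over a small grid of space and time lags
lags <- expand.grid(h = c(0, 25000, 100000), u = c(0, 5, 30))
lags$cov <- cov_st(lags$h, lags$u)
print(lags)
```

At zero lag in both space and time the function returns the sum of the three sills, and it decreases as either lag grows.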


The data set


In space-time cube

[Figure: observations plotted in a space-time cube; axes X, Y and cdays.]


Variograms (separately)

[Figure: two experimental variograms. Panel "365 days": semivariance (0 to 25) against distance in m (0 to 200000). Panel "159 stations": semivariance (0 to 50) against distance in days (0 to 60).]


Variograms (zonal anisotropy)

[Figure: two variogram panels. Left: semivariance (2 to 10) against distance in m (50000 to 200000). Right: semivariance (2 to 10) against distance in days (5 to 15).]

Marginal experimental variograms for residuals and fitted models: (left) space-domain only, (right) time-domain only.


Final results


Some experiences

- By adding the time component we are better off.
- Automation of space-time regression-kriging (overlay, regression modeling, variogram fitting, predictions, visualization in Google Earth) is anticipated.
- Fitting and visualization of space-time variograms is a bottleneck!
- Predictions need to be visualized as animations.
- We have ignored the one-way auto-correlation (time works only one way)?


Universal space-time reference

Each observation should have by default:

- Longitude and latitude (WGS84), or projected X, Y coordinates plus a proj4 string;
- Begin / end of the time interval in the UTC (GMT) system;
- Support size (in square meters);
- Uncertainty or measurement error.
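As a sketch of what such a record could look like in R, the data frame below stores one observation with all of the listed attributes. The column names, the site coordinates and the measured value are hypothetical; only the list of required attributes comes from the slide.

```r
obs <- data.frame(
  lon = 149.13, lat = -35.28,                                  # WGS84 decimal degrees (hypothetical site)
  t_begin = as.POSIXct("2011-04-11 00:00:00", tz = "UTC"),     # begin of the time interval (UTC)
  t_end   = as.POSIXct("2011-04-12 00:00:00", tz = "UTC"),     # end of the time interval (UTC)
  support_m2 = 1,                                              # support size in square meters
  value = 12.3,                                                # hypothetical measurement
  value_sd = 0.5                                               # measurement error (standard deviation)
)

# Basic sanity checks on the space-time reference
stopifnot(obs$t_end > obs$t_begin, abs(obs$lat) <= 90, abs(obs$lon) <= 180)
str(obs)
```

Storing times as POSIXct with an explicit UTC time zone avoids ambiguity when observations from different regions are combined.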


Space-time algebra re-visited

Should we (re)define and (re)implement space-time (4D) algebra?


What does this mean?

- Distances always on a sphere (spherical geometry);
- Always use information about uncertainty (weighted regression);
- Always use information about the support size (nugget estimation, cross-validation);
- Also re-implement any raster processing (geomorphometry, resampling, filtering etc.);
- Use Google Earth to visualize any type of geographic data.
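The first point in the list, distances on a sphere, can be illustrated with the haversine formula for great-circle distance. The base R sketch below assumes a spherical Earth with a mean radius of 6371 km (a common approximation); gc_dist is a hypothetical helper name.

```r
# Great-circle distance (haversine formula) on a sphere of radius R (metres).
# Inputs are longitude/latitude in decimal degrees.
gc_dist <- function(lon1, lat1, lon2, lat2, R = 6371000) {
  rad <- pi / 180
  dlat <- (lat2 - lat1) * rad
  dlon <- (lon2 - lon1) * rad
  a <- sin(dlat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2)^2
  2 * R * asin(pmin(1, sqrt(a)))     # pmin guards against rounding slightly above 1
}

# A quarter of the equator: from (0, 0) to (90 E, 0)
d_quarter <- gc_dist(0, 0, 90, 0)
print(d_quarter / 1000)              # distance in km
```

The function is vectorized, so it can be applied to whole columns of coordinates when building a distance matrix for variogram estimation.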


Global Multiscale Nested RK

Global models will soon replace local (isolated) modeling. One such approach is the nested RK model:

$$z(s_B) = m_0(s_{B-k}) + e_1(s_{B-k} \mid s_{B-(k+1)}) + \ldots + e_k(s_{B-2} \mid s_{B-1}) + \varepsilon(s_B) \tag{30}$$

where $z(s_B)$ is the value of the target variable estimated at ground scale ($B$), $B-1, \ldots, B-k$ are the higher order components, $e_k(s_{B-k} \mid s_{B-(k+1)})$ is the residual variation from scale $s_{B-(k+1)}$ to a higher resolution scale $s_{B-k}$, and $\varepsilon$ is the spatially auto-correlated residual soil variation (dealt with by ordinary kriging).


Multi-scale concept

[Figure: nested scale levels, with an axis labelled "extent".]

- 5.6 km. Global: climatic patterns and processes, vegetation zones, elevation
- 1 km. Continental: geological zones, meso-climatic conditions, erosion/deposition at large scales
- 250 m. Regional: general land use, vegetation cover
- 100 m. Local: land management, erosion/deposition at watershed level


Multi-resolution signal (McBratney, 1998)


Some thoughts

- Global models (global multiscale predictions) are here now.
- It is very probable that, in the near future, any geostatistical analysis will be global.
- We probably need to re-write the geostatistical algorithms so they work with spherical geometry (3D + time).
- There is an enormous amount of publicly available RS and GIS data waiting to be used for geostatistical mapping: use it!


Limits of geostatistics in R

- Sampling optimization algorithms for geostatistical modelling are missing
- Local regression-kriging is still waiting to be implemented
- In gstat, only linear models can be implemented; geoR can also incorporate non-linear models, but it can only work with $\ll 10^3$ points
- The KED algorithm can be rather slow and can lead to instabilities (singularity problem)
- Interactive visualization and fitting of 3D space-time variograms is missing


Nothing can save bad data!

- Even the most sophisticated geostatistical tools will not be able to save data sets of poor quality! If you want to produce quality outputs (maps/reports), make sure your input field data satisfies some minimum requirements:

  1. it is large enough
  2. it is representative
  3. it is independent
  4. it is produced using a consistent methodology
  5. it is sufficiently precise

- Geostatistical mapping using inconsistent point samples is possible, but do you really need this?
