geostatistic_2
TRANSCRIPT
-
7/28/2019 Geostatistic_2
1/54
Introduction to Spatial
Statistics
Budhi Setiawan, PhD
-
Types of Spatial Data
Continuous Random Field
Lattice Data
Point Pattern Data
Note: Each type of data is analyzed
differently
-
Geostatistics
Geostatistical analysis is distinct from other spatial models in the statistics
literature in that it assumes the region of study is continuous.
Observations could be taken at any point within the study area.
Interpolation at points in between observed locations makes sense.
[Figure: surface plot of Z over the X-Y plane]
-
Spatial Autocorrelation
Spatial modeling is based on the assumption that observations close in space tend to co-vary more strongly than those far from each other.
Positively co-vary: nearby values tend to be similar.
  E.g. elevation (or depth) tends to be similar for locations close together.
Negatively co-vary: nearby values tend to be opposite in value.
  E.g. density of an organism that is highly spatially clustered, where observations in between clusters are low and values within clusters are high.
-
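Covariance
Definition: two variables are said to co-vary if their correlation coefficient is not zero:
$\rho_{X,Y} = \dfrac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \dfrac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}$
where $\rho_{X,Y}$ is the correlation coefficient between X and Y, $\mu_X$ ($\mu_Y$) is the mean of X (Y), and $\sigma_X$ ($\sigma_Y$) is the standard deviation of X (Y).
Consider this in the context of a single variable.
  E.g. do nearest neighbors have non-zero covariance?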
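The single-variable case can be illustrated with a minimal sketch: pair each value along a transect with its nearest neighbor and compute the correlation coefficient of the pairs. The function names and the elevation values are hypothetical, chosen only for illustration.

```python
import math

def covariance(x, y):
    """Population covariance E[(X - mu_X)(Y - mu_Y)] of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

def correlation(x, y):
    """Correlation coefficient rho = cov(X, Y) / (sigma_X * sigma_Y)."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

# Hypothetical elevations at equally spaced locations along a transect.
z = [10.0, 11.0, 13.0, 14.0, 13.0, 11.0, 10.0, 9.0]
# Pair each value with its nearest neighbor to the right.
rho_neighbors = correlation(z[:-1], z[1:])
```

A positive `rho_neighbors` means neighboring values co-vary positively, which is the situation the elevation example describes.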
-
Continuous Data Geostatistics
Notation
$Z(s)$ is the random process at location $s = (x, y)$
$z(s)$ is the observed value of the process at location $s = (x, y)$
$D$ is the study region
The sample is the set $\{z(s) : s \in D\}$. We say that it is a partial realization of the random spatial process $\{Z(s) : s \in D\}$.
-
Conceptual Model
$Z(s) = \mu(s) + W(s) + \eta(s) + \varepsilon(s)$
where
$\mu(s)$ is the mean structure; called large-scale non-spatial trend
$W(s)$ is a zero-mean, stationary process whose autocorrelation range is larger than $\min\{\|s_i - s_j\| : i \ne j;\ i, j = 1, 2, \ldots, n\}$; called smooth small-scale variation
$\eta(s)$ is a zero-mean, stationary process whose autocorrelation range is smaller than $\min\{\|s_i - s_j\| : i \ne j;\ i, j = 1, 2, \ldots, n\}$ and which is independent of $W(s)$; called micro-scale variation or measurement error
$\varepsilon(s)$ is the random noise term with zero mean and constant variance
-
Simpler Conceptual Model
$Z(s) = \mu(s) + W(s) + \eta(s) + \varepsilon(s)$
simplifies to
$Z(s) = \mu(s) + \delta(s) + \varepsilon(s)$
where
$\mu(s)$ is the mean structure; called large-scale non-spatial trend
$\delta(s) = W(s) + \eta(s)$ is a zero-mean, stationary process with autocorrelation, which combines the smooth small-scale and micro-scale variation
$\varepsilon(s)$ is the random noise term with zero mean and constant variance, which is independent of $W(s)$ and $\eta(s)$
-
Graphical Concept with Trend
[Figure: plot of Z against X with a red trend line and a green line through the data]
Red line indicates large-scale trend.
Green line shows how the data are arranged around the trend.
Note that there is a pattern to the points around the red line. The pattern implies possible positive autocorrelation in Z(x).
Finally, there is white noise.
-
Graphical Concept without Trend
Red line indicates a constant mean, i.e. no large-scale trend.
Green line shows how the data are arranged around the trend.
Again, the pattern of the green line implies possible positive autocorrelation in RZ(x).
[Figure: plot of RZ against X]
-
Important Point
The model indicates that Z can be decomposed into large-scale variation, small + micro-scale variation, and noise.
The reality is that any estimated decomposition is not unique.
  E.g. in the graph just shown, we could have instead added a sinusoidal aspect to the large-scale trend and hence captured much of the apparent autocorrelation.
-
Example
Red line indicates large-scale trend captured by a sinusoidal + linear trend.
Green line shows how the data are arranged around the trend.
Note that now there is no obvious pattern and so the remaining unexplained variation in Z(x) is likely white noise.
[Figure: plot of Z against X]
-
Modeling
Ultimately we want to do modeling of Z using the geostatistical model
$Z(s) = \mu(s) + \delta(s) + \varepsilon(s)$
Requires estimates of the model components:
  the mean, $\mu(s)$
  the small-scale variation, $\delta(s)$, and the covariances among Z values at different locations
  any leftovers, $\varepsilon(s)$, i.e. the unexplained or residual variability
-
Important Point
The choice of approach (detailed fit of a trend vs. large-scale trend + autocorrelation) to estimating/predicting Z depends strongly on the reason for and uses of the model.
  E.g. if you are interested in predicting Z at unsampled locations within the study area, then any model that uses covariates to estimate large-scale trend must also have the covariates known for the unsampled locations.
  E.g. if you are interested in understanding the reasons for the spatial distribution of Z, then you may or may not want to incorporate a spatial correlation component.
-
Correlation Structure
(Semivariogram)
Now, to assess spatial autocorrelation we look at the behavior of the following:
$\gamma_{ij} = \dfrac{[Z(s_i) - Z(s_j)]^2}{2}$
for every possible pair of locations in the dataset (N locations yield N(N-1)/2 pairs).
Correlated: we would expect $Z(s_i)$ to be similar in value to $Z(s_j)$ and hence the squared difference to be small.
Independent: we would expect the squared difference to be relatively large since the two numbers would vary according to the population variability.
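As a minimal sketch of the pairwise quantity above, the following computes the separation distance and half the squared difference for every pair of observations. The function name and the sample points are hypothetical.

```python
import math
from itertools import combinations

def variogram_cloud(points):
    """Return (distance, gamma_ij) for every pair of (x, y, z) observations."""
    cloud = []
    for (x1, y1, z1), (x2, y2, z2) in combinations(points, 2):
        dist = math.hypot(x1 - x2, y1 - y2)      # Euclidean separation
        gamma_ij = (z1 - z2) ** 2 / 2.0          # half the squared difference
        cloud.append((dist, gamma_ij))
    return cloud

# Hypothetical observations (x, y, z); not data from the slides.
points = [(0, 0, 1.0), (3, 4, 2.0), (6, 8, 4.0), (0, 5, 1.5)]
cloud = variogram_cloud(points)
```

With N = 4 points the cloud has N(N-1)/2 = 6 entries; plotting the second element against the first gives the variogram cloud.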
-
Plot (Variogram Cloud)
[Figure: variogram cloud, gamma against distance]
Looking for a pattern, i.e. is there a trend in $\gamma_{ij}$ with respect to distance between two locations?
Variogram cloud for a dataset of 400 observations
-
Empirical Variogram
The variogram cloud is usually very uninformative.
  Difficult to discern trend or pattern
More pertinent is to calculate the average values of $\gamma_{ij}$ for different distances.
  Problem is we don't usually have discrete distances between locations (happens only when data are on a perfect grid).
A common method for averaging at specific distances is to bin the distances into intervals (called lag distances), i.e. use all points within some bin width around a given distance value.
-
Continuous Data Geostatistics
Because we do not usually have lots of values at discrete distances, a common method for averaging the values at discrete distances is to use all points within some bin width around a given distance value.
So we choose several levels of h (distances) and calculate the empirical variogram:
$2\hat{\gamma}(h) = \dfrac{1}{|N(h)|} \sum_{N(h)} [Z(s) - Z(t)]^2$
where $N(h)$ is the set of all pairs of locations that are a distance of h apart, within a tolerance region around h, i.e.
$N(h) = \{(s, t) : \|s - t\| = h \ \text{or} \ \|s - t\| \in h \pm \operatorname{tol}(h)\}$
and $|N(h)|$ is the number of pairs in $N(h)$.
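The binned estimator can be sketched as follows, assuming a fixed tolerance `tol` on each side of every lag (the slides leave the choice of tolerance open). The function name, argument names, and data are hypothetical.

```python
import math
from itertools import combinations

def empirical_semivariogram(points, lags, tol):
    """Classical estimator: gamma_hat(h) = sum over N(h) of [Z(s) - Z(t)]^2 / (2 |N(h)|).

    points: list of (x, y, z); lags: lag distances h; tol: half-width of each bin.
    Returns a list of (h, gamma_hat, n_pairs); lags with no pairs are skipped.
    """
    pairs = list(combinations(points, 2))
    result = []
    for h in lags:
        sq_diffs = [
            (z1 - z2) ** 2
            for (x1, y1, z1), (x2, y2, z2) in pairs
            if abs(math.hypot(x1 - x2, y1 - y2) - h) <= tol
        ]
        if sq_diffs:
            result.append((h, sum(sq_diffs) / (2.0 * len(sq_diffs)), len(sq_diffs)))
    return result

# Hypothetical transect: z observed at x = 0, 1, 2, 3 (y = 0).
points = [(0, 0, 1.0), (1, 0, 2.0), (2, 0, 4.0), (3, 0, 3.0)]
semivar = empirical_semivariogram(points, lags=[1, 2, 3], tol=0.5)
```

Returning the pair count alongside each estimate is a deliberate choice: lags supported by very few pairs are unreliable, and the count makes that visible.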
-
Empirical Semivariogram
[Figure: empirical semivariogram, gamma against distance; graph based on a set of 20 distance lags]
This plot is called an omnidirectional classical empirical semivariogram.
Omnidirectional because the direction between the pairs of locations was ignored
Classical because of the equation used to estimate the mean (alternatives exist that are robust to outliers or to failure of assumptions of the model)
Semi because of the division by 2 in the equation used
-
Important Points
The constantly increasing semivariogram indicates that there is a problem with this dataset.
  Ideally, it should at some distance level off at the variance of the process, implying that beyond that distance the relationship between 2 locations is the same regardless of the distance between them (i.e. observations are independent at large distances)
This graph indicates that
  The data imply correlation exists at all distances (and therefore the study region is small relative to the range of autocorrelation) or
  The data have a large-scale trend which may account for most of the seeming autocorrelation (small-scale
-
Semivariogram
[Figure: empirical semivariogram, gamma against distance]
Note the rise and then leveling off of the $\gamma(h)$ values as distance increases.
We'll cover shapes for variograms in more detail later.
Empirical semivariogram for a different dataset in which there was no large-scale trend but definite autocorrelation
-
Semivariogram
Note that the $\gamma(h)$ values are more-or-less the same regardless of distance.
Empirical semivariogram for a different dataset in which there was no large-scale trend and no autocorrelation
[Figure: empirical semivariogram, gamma against distance]
-
Important Points
If the empirical semivariogram increases with distance between locations, then the correlation between points is decreasing as distance increases.
The point at which it flattens to a constant value is the distance at which any two points that distance or farther apart are independent. The value of $\gamma$ there is the variance of the spatial process.
At this point in our analyses, the number of lag distances you use is not that critical, but when we try to fit a curve to the empirical semivariogram later the number of lags
-
Important Point About Directionality
Another point to consider is whether the
pattern of autocorrelation, i.e. the shape of
the curve describing the semivariogram, is
the same in every direction.
Can't tell from the omnidirectional plot.
Need to check if there is a directional effect.
-
Directional Semivariograms
To check directionality in the covariance, plot $\hat{\gamma}(h)$ for each h for different directions.
Modify the sets of locations over which the averaging occurs.
Typically done using a set of binned directions (wedges of the compass)
  Requires that you modify the definition of neighborhood:
$N(h, \theta) = \{(s, t) : \|s - t\| \in h \pm \operatorname{tol}(h),\ \operatorname{angle}(s, t) \in \theta \pm \operatorname{tol}(\theta)\}$
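A directional neighborhood can be sketched by adding an angle test to the distance test. This is an illustration under stated assumptions: directions are measured counterclockwise from the x-axis and folded modulo 180 degrees (the pair s, t and the pair t, s count as the same direction), which may differ from the compass convention used for the slides. All names are hypothetical.

```python
import math
from itertools import combinations

def directional_pairs(points, h, h_tol, angle_deg, angle_tol_deg):
    """Select pairs with separation within h +/- h_tol and direction within
    angle_deg +/- angle_tol_deg, comparing directions modulo 180 degrees."""
    selected = []
    for p, q in combinations(points, 2):
        dx, dy = q[0] - p[0], q[1] - p[1]
        dist = math.hypot(dx, dy)
        # Direction from the x-axis, folded into [0, 180).
        direction = math.degrees(math.atan2(dy, dx)) % 180.0
        diff = abs(direction - angle_deg) % 180.0
        diff = min(diff, 180.0 - diff)           # shortest angular gap
        if abs(dist - h) <= h_tol and diff <= angle_tol_deg:
            selected.append((p, q))
    return selected

# Hypothetical observations (x, y, z).
points = [(0, 0, 1.0), (1, 0, 2.0), (0, 1, 3.0)]
east_west = directional_pairs(points, h=1.0, h_tol=0.1, angle_deg=0.0, angle_tol_deg=11.25)
north_south = directional_pairs(points, h=1.0, h_tol=0.1, angle_deg=90.0, angle_tol_deg=11.25)
```

Averaging half the squared differences over each selected set, one direction at a time, gives a directional semivariogram.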
-
Directional Semivariograms
EXAMPLE: calculate mean variability for the angles 0, 22.5, 45, 67.5, 90, and 112.5 degrees with a tolerance of 11.25 degrees on each side.
[Figure: directional semivariograms against distance, one panel per angle: 0, 22.5, 45, 67.5, 90, 112.5]
-
Need for Assumptions in Order to
Proceed Beyond This Point
The data that are collected are a partial observation of the spatial surface (e.g. map) that we are interested in.
In addition, it is usually assumed that there is some super process that created the particular surface for which we have this partial view.
  To estimate the spatial autocorrelation we need to make some assumptions.
  Otherwise, we don't have sufficient information to make any inferences.
-
Two Assumptions
Stationarity, specifically second-order stationarity:
$\operatorname{cov}(Z(s), Z(t)) = C(s - t), \quad s, t \in D$
Isotropy
-
Stationarity
The mean of the process is constant, i.e. no trend:
$\mu(s) = \mu$ for all $s \in D$   (1)
The covariance between any pair of points depends only on the distance (and possibly direction) between the points, NOT the location of the points in space:
$\operatorname{cov}(Z(s_i), Z(s_j)) = C(s_i - s_j), \quad s_i, s_j \in D$
where $C(\cdot)$ is the covariance function.
  This implies that the variance of Z is constant everywhere.
If both conditions are met then the spatial process we are studying is said to be second-order stationary.
-
Relationship between Semivariogram and Correlation
Assuming intrinsic stationarity, we have
$E[Z(s + h) - Z(s)] = 0$
$\operatorname{Var}[Z(s + h) - Z(s)] = 2\gamma(h)$
Now, assuming that $\operatorname{Var}[Z(s_1)] = \operatorname{Var}[Z(s_2)] = \sigma^2$, we have
$\operatorname{Var}[Z(s_1) - Z(s_2)] = \operatorname{Var}[Z(s_1)] + \operatorname{Var}[Z(s_2)] - 2\operatorname{Cov}[Z(s_1), Z(s_2)] = 2\sigma^2 - 2C(h)$
where $s_1 - s_2 = h$. Thus,
$\gamma(h) = \sigma^2 - C(h)$
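The identity can be checked numerically for any covariance function. The sketch below assumes an exponential covariance with hypothetical parameter values; the function name and numbers are illustrative only.

```python
import math

def exp_covariance(h, sigma2=2.0, range_par=3.0):
    """Hypothetical exponential covariance function C(h) = sigma^2 * exp(-h / a)."""
    return sigma2 * math.exp(-h / range_par)

sigma2 = 2.0
h = 1.5
# Var[Z(s1) - Z(s2)] = Var[Z(s1)] + Var[Z(s2)] - 2 Cov[Z(s1), Z(s2)]
var_diff = sigma2 + sigma2 - 2.0 * exp_covariance(h)
gamma_from_variance = var_diff / 2.0             # half the variance of the difference
gamma_from_identity = sigma2 - exp_covariance(h) # sigma^2 - C(h)
```

Both routes give the same value, and at h = 0 the covariance equals the variance, so the semivariogram is 0 there.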
-
Isotropy
The covariance between any pair of points does not depend on direction but only on distance:
$\operatorname{cov}(Z(s_i), Z(s_j)) = C(\|s_i - s_j\|) = C(h)$
If this holds then the spatial process is said to be isotropic.
-
Non-Constant Mean
Two ways to handle a trend when it does exist:
Detrend the data using regression (or similar) with covariates and then use the residuals from the trend analysis for the spatial autocorrelation analysis
  E.g. disease rates as a function of population density
Universal kriging (UK), which allows for estimating the trend as a global polynomial in s = (x, y) and estimating the spatial autocorrelation simultaneously
  UK ignores other explanatory covariates which can be
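The first option, detrending by regression, can be sketched with an ordinary least squares fit on a single covariate. The data are invented to mirror the disease-rate example, and a real analysis might use several covariates or a different trend model.

```python
def fit_linear_trend(covariate, z):
    """Ordinary least squares fit of z = b0 + b1 * covariate; returns (b0, b1)."""
    n = len(z)
    mean_c = sum(covariate) / n
    mean_z = sum(z) / n
    sxx = sum((c - mean_c) ** 2 for c in covariate)
    sxz = sum((c - mean_c) * (v - mean_z) for c, v in zip(covariate, z))
    b1 = sxz / sxx
    b0 = mean_z - b1 * mean_c
    return b0, b1

# Hypothetical data: disease rate z against population density.
density = [1.0, 2.0, 3.0, 4.0, 5.0]
z = [2.1, 3.9, 6.2, 7.8, 10.0]
b0, b1 = fit_linear_trend(density, z)
# Residuals are what would be passed on to the spatial autocorrelation analysis.
residuals = [v - (b0 + b1 * c) for c, v in zip(density, z)]
```

The residuals sum to zero by construction, so the detrended values have a constant (zero) mean.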
-
Non-Constant Variance
To account for heterogeneity (non-constant variance):
Estimate variability in smaller subregions of the study area
  Need to make decisions about the size and extent of the subregions
  Need sufficient numbers of observations within each subregion
Transform or standardize your data so that the variability of the transformed values is constant over the region
-
Anisotropy
Two types of anisotropy
Geometric
  The range over which correlation is non-zero depends on direction
  The variance is constant over all directions
  This type can be adjusted for in geostatistical analyses
Zonal
  Anything not geometric anisotropy
Anisotropy implies that the spatial process evolves differentially throughout the study region
-
Variography
Fitting a valid semivariogram function to the empirical semivariogram
Now we are interested in describing the variogram as an equation in which variance is a function of the distance.
We shall assume that the spatial process is second-order stationary and isotropic in the following.
-
Semivariogram
We have already seen how to obtain the empirical variogram, the estimate of
$2\gamma(h) = \operatorname{var}(Z(s) - Z(t))$
$\gamma(h)$ is the semivariogram and is the primary quantity of interest because
$\gamma(h) = C(0) - C(h)$
Now we are interested in describing the semivariogram as a function of the distance.
We shall assume that the spatial process is second-order stationary and isotropic in the following.
-
Semivariogram
Semivariogram models have the following properties:
1) Many are not linear in their parameters
2) Must be conditionally negative-definite, i.e. the function must satisfy
$\sum_s \sum_t a_s a_t \, 2\gamma(s - t) \le 0$
for any real numbers $a_i$ satisfying $\sum_i a_i = 0$
3) If $\gamma(h) \to c_0 > 0$ as $h \to 0$, there is microscale variation, which is assumed to be due to measurement error (ME) or a process occurring at the microscale. ME is measurable only if we have replicate values at each location in the sample.
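Property 2 can be spot-checked numerically. The sketch below assumes an exponential semivariogram without a nugget (a commonly used valid model), random hypothetical locations, and weights centered so that they sum to zero. A single non-positive result does not prove validity; it only illustrates the condition.

```python
import math
import random

def exp_semivariogram(h, sill=1.0, range_par=2.0):
    """Exponential semivariogram with no nugget: sill * (1 - exp(-h / a))."""
    return sill * (1.0 - math.exp(-h / range_par))

random.seed(0)
# Hypothetical locations in a 10 x 10 study region.
sites = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(8)]
# Center random weights so that they sum to zero, as the condition requires.
raw = [random.gauss(0, 1) for _ in sites]
mean_raw = sum(raw) / len(raw)
a = [w - mean_raw for w in raw]

# Double sum of a_s * a_t * 2 * gamma(s - t) over all ordered pairs of sites.
quad_form = sum(
    a_s * a_t * 2.0 * exp_semivariogram(math.hypot(s[0] - t[0], s[1] - t[1]))
    for a_s, s in zip(a, sites)
    for a_t, t in zip(a, sites)
)
```

For a valid model `quad_form` is never positive, which is what guarantees non-negative variances for linear combinations of the data.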
-
Semivariogram
Semivariogram models have the following properties:
If $\gamma(h)$ is constant for every h except h = 0, where $\gamma(0) = 0$, then $Z(s)$ and $Z(t)$ are uncorrelated for any pair of locations s and t
$2\gamma(h) / \|h\|^2 \to 0$ as $\|h\| \to \infty$, i.e. $\|h\|^2$ is increasing faster than $\gamma(h)$ as h increases
-
A Typical Semivariogram
[Figure: semivariogram against distance, annotated with nugget, sill, and range]
-
Characteristics of the Semivariogram
It is 0 when the separation distance is 0 ($\gamma(0) = 0$).
Nugget effect: variation between two points very close together.
  May be measurement error
  May be indicative of an erratic process (gold ore).
The sill corresponds to the overall variance of the data.
Data separated by distances less than the range are spatially autocorrelated. (Less variation between close observations than between far observations.)
-
Estimating the Semivariogram
Take all pairwise differences in the data: $Z(s_i) - Z(s_j)$, where s = (x, y) is a point in the 2-D plane.
Compute the Euclidean distance between the spatial locations:
$\|s_i - s_j\| = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$
Average pairs that have the same distance class.
Binning: like a 2-D histogram.
-
End Result: Empirical Semivariogram
-
Modeling the Semivariogram
The semivariogram measures variation among
units h units apart.
Note: We do not want negative standard errors.
So, we model the semivariogram with selected parametric functions, ensuring all standard errors are nonnegative.
We estimate the nugget, sill, and range
parameters of the model that best fit the empirical
semivariogram (nonlinear least squares problem).
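The fitting step can be sketched as a least-squares search over nugget, sill, and range. Two assumptions are made loudly here: an exponential model stands in for whichever model is selected, and a plain grid search stands in for a proper nonlinear least-squares optimizer. The "empirical" values are generated from known parameters so the result can be checked.

```python
import math

def exp_model(h, nugget, sill, range_par):
    """Exponential semivariogram; sill is the total sill (nugget + partial sill)."""
    return nugget + (sill - nugget) * (1.0 - math.exp(-h / range_par))

def fit_grid(lags, gammas, nuggets, sills, ranges):
    """Return the grid point (nugget, sill, range) with the smallest sum of squared errors."""
    best, best_sse = None, float("inf")
    for n0 in nuggets:
        for s in sills:
            if s < n0:
                continue  # the sill must not be below the nugget
            for r in ranges:
                sse = sum((g - exp_model(h, n0, s, r)) ** 2 for h, g in zip(lags, gammas))
                if sse < best_sse:
                    best, best_sse = (n0, s, r), sse
    return best, best_sse

# Hypothetical empirical semivariogram generated from known parameters.
lags = [1.0, 2.0, 4.0, 6.0, 8.0, 10.0]
gammas = [exp_model(h, 0.1, 1.0, 3.0) for h in lags]
grid = [i / 10.0 for i in range(21)]        # candidate nuggets and sills: 0.0, 0.1, ..., 2.0
ranges = [float(r) for r in range(1, 11)]   # candidate ranges: 1, 2, ..., 10
best, best_sse = fit_grid(lags, gammas, grid, grid, ranges)
```

Because the true parameters lie on the grid, the search recovers them exactly; with real data the grid result would more sensibly serve as a starting value for an optimizer.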
-
Selected semivariogram models
-
Covariogram Models
Spherical Model
Exponential Model
Gaussian Model
Power Model is simply a reparameterization of the exponential model.
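The formulas for these models did not survive in this transcript. As a hedged illustration, the sketch below uses common textbook parameterizations of the spherical, exponential, and Gaussian covariograms (no nugget, sill `sill`, range parameter `a`); the slides may use a different parameterization, and the function names are hypothetical.

```python
import math

def spherical_cov(h, sill=1.0, a=1.0):
    """Spherical covariogram: exactly zero at and beyond the range a."""
    if h >= a:
        return 0.0
    r = h / a
    return sill * (1.0 - 1.5 * r + 0.5 * r ** 3)

def exponential_cov(h, sill=1.0, a=1.0):
    """Exponential covariogram: decays toward zero without reaching it."""
    return sill * math.exp(-h / a)

def gaussian_cov(h, sill=1.0, a=1.0):
    """Gaussian covariogram: very smooth behavior near h = 0."""
    return sill * math.exp(-((h / a) ** 2))

def semivariogram(cov, h, **params):
    """gamma(h) = C(0) - C(h) for any covariogram function cov."""
    return cov(0.0, **params) - cov(h, **params)

# Semivariogram values at h = 0.5 for each model with unit sill and range.
values = {f.__name__: semivariogram(f, 0.5) for f in (spherical_cov, exponential_cov, gaussian_cov)}
```

Writing the covariograms first and deriving each semivariogram from them keeps the two views consistent.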
-
Covariogram vs. Semivariogram
The covariogram and semivariogram are related:
$\gamma(h) = C(0) - C(h)$
-
The fitted semivariogram model
Estimates: nugget=0.084, sill=0.269, range=110.3 miles
-
Common methods for fitting these functions to a set of empirical semivariogram means:
1) Choose the most likely candidate model
2) Methods for estimating the parameters of the model:
  non-linear least squares estimation: allows for the estimation of parameters that enter the equation non-linearly but ignores any dependences among the empirical variogram values
  non-linear weighted least squares / generalized least squares: the variance-covariance of the variogram data points is accounted for in the estimation procedure
  maximum likelihood: assumes the data are Normally distributed, but the estimators are likely to be highly biased, especially in small samples (the usual remedy is jackknifing)
  restricted maximum likelihood: maximizes a slightly altered likelihood function, which reduces the bias of the MLEs
-
Properties of Variogram Models
If $\gamma(h) \to c_0 > 0$ as $h \to 0$ then there is microscale variation
  Usually assumed to be due to measurement error (ME)
  ME is measurable only if we have replicate values at each location in the sample
  When fitting a variogram function, you may estimate a non-zero value for $c_0$ even when you do not have replicate observations at sites. This is called the nugget.
If $\gamma(h)$ is constant for every h except h = 0, where $\gamma(0) = 0$, then $Z(s_i)$ and $Z(s_j)$ are uncorrelated for any pair of locations $s_i$ and $s_j$
-
Properties of Variogram
Models
-
Choosing a Best Model
Need to choose the variogram model that best fits the data
Best = minimum unexplained variation after fitting
  Look at a measure of deviance:
$\sum_i [\hat{\gamma}(h_i) - \gamma(h_i)]^2$
where $\hat{\gamma}(h_i)$ is the empirical semivariogram for the ith lag and $\gamma(h_i)$ is the value predicted by the fitted semivariogram model
7/28/2019 Geostatistic_2
53/54
Choosing a Best Model
In the absence of comparing deviance (or similar) measures, to determine if the model seems appropriate:
  Compare fits visually
  Use prior knowledge from other studies to determine
-
Next Steps
Using the results of the variography to do statistical modeling of the spatial process
  kriging