geostatistic_2

Upload: edwin-harsiga

Post on 03-Apr-2018


  • 7/28/2019 Geostatistic_2

    1/54

    Introduction to Spatial Statistics

    Budhi Setiawan, PhD


    Types of Spatial Data

    Continuous Random Field

    Lattice Data

    Point Pattern Data

    Note: Each type of data is analyzed differently.


    Geostatistics

    Geostatistical analysis is distinct from other spatial models in the statistics literature in that it assumes the region of study is continuous.

    Observations could be taken at any point within the study area

    Interpolation at points in between observed locations makes sense

    [Figure: surface plot of Z (0 to 0.5) over X and Y (0 to 20)]


    Spatial Autocorrelation

    Spatial modeling is based on the assumption that observations close in space tend to co-vary more strongly than those far from each other.

    Positively co-vary: values are similar, e.g. elevation (or depth) tends to be similar for locations close together.

    Negatively co-vary: values tend to be opposite, e.g. the density of an organism that is highly spatially clustered, where observations in between clusters are low and values within clusters are high.


    Covariance

    Definition: two variables are said to co-vary if their correlation coefficient is not zero:

    $$\operatorname{cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = \rho_{XY}\,\sigma_X\,\sigma_Y$$

    where $\mu_X$ ($\mu_Y$) is the mean of X (Y), $\rho_{XY}$ is the correlation coefficient between X and Y, and $\sigma_X$ ($\sigma_Y$) is the standard deviation of X (Y).

    Consider this in the context of a single variable, e.g. do nearest neighbors have non-zero correlation?
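The single-variable question can be made concrete with a minimal Python sketch for the simplest case. It assumes observations on an evenly spaced 1-D transect, where each location's nearest neighbour is taken to be the next point along the line. The function names (`pearson`, `lag1_correlation`) and the two simulated series (independent noise and a first-order autoregressive series) are illustrative assumptions, not data from these slides.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation rho_XY = cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)


def lag1_correlation(z):
    """Correlation between each value on an evenly spaced transect and its right-hand neighbour."""
    return pearson(z[:-1], z[1:])


rng = random.Random(1)

# Independent values: neighbours should be roughly uncorrelated.
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# Spatially autocorrelated values: first-order autoregressive series with phi = 0.9.
smooth = [0.0]
for _ in range(1999):
    smooth.append(0.9 * smooth[-1] + rng.gauss(0.0, 1.0))

print(round(lag1_correlation(noise), 2))   # roughly 0
print(round(lag1_correlation(smooth), 2))  # roughly 0.9
```

A value near zero suggests no nearest-neighbour covariance. Irregularly spaced 2-D data need the distance-based tools developed on the following slides.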


    Continuous Data Geostatistics

    Notation

    $Z(s)$ is the random process at location $s = (x, y)$

    $z(s)$ is the observed value of the process at location $s = (x, y)$

    D is the study region

    The sample is the set $\{z(s) : s \in D\}$. We say that it is a partial realization of the random spatial process $\{Z(s) : s \in D\}$.


    Conceptual Model

    $$Z(s) = \mu(s) + W(s) + \eta(s) + \varepsilon(s)$$

    where

    $\mu(s)$ is the mean structure; called large-scale non-spatial trend

    $W(s)$ is a zero-mean, stationary process whose autocorrelation range is larger than $\min\{\lVert s_i - s_j \rVert : 1 \le i < j \le n\}$; called smooth small-scale variation

    $\eta(s)$ is a zero-mean, stationary process whose autocorrelation range is smaller than $\min\{\lVert s_i - s_j \rVert : 1 \le i < j \le n\}$ and which is independent of $W(s)$; called micro-scale variation or measurement error

    $\varepsilon(s)$ is the random noise term with zero mean and constant variance


    Simpler Conceptual Model

    $$Z(s) = \mu(s) + \delta(s) + \varepsilon(s)$$

    where

    $\mu(s)$ is the mean structure; called large-scale non-spatial trend

    $\delta(s) = W(s) + \eta(s)$ is a zero-mean, stationary process with autocorrelation, which combines the smooth small-scale and micro-scale variation

    $\varepsilon(s)$ is the random noise term with zero mean and constant variance, which is independent of $W(s)$ and $\eta(s)$
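A small simulation can make this decomposition concrete. The sketch below generates a 1-D transect from the simpler model using three assumed components:

- a linear large-scale trend for μ(x);
- a stationary first-order autoregressive process for δ(x), the combined small-scale and micro-scale term;
- independent Gaussian noise for ε(x).

All parameter values and the function name are hypothetical.

```python
import math
import random


def simulate_transect(n=200, seed=0, phi=0.9, noise_sd=0.5):
    """Simulate Z(x) = mu(x) + delta(x) + eps(x) at x = 0, 1, ..., n - 1.

    mu(x):    hypothetical large-scale linear trend, 0.5 + 0.8 * x
    delta(x): zero-mean, stationary AR(1) process (autocorrelated small-scale variation)
    eps(x):   independent zero-mean noise with constant variance noise_sd ** 2
    """
    rng = random.Random(seed)
    mu = [0.5 + 0.8 * x for x in range(n)]
    # Draw delta(0) from the stationary distribution, whose variance is 1 / (1 - phi^2).
    delta = [rng.gauss(0.0, 1.0 / math.sqrt(1.0 - phi ** 2))]
    for _ in range(n - 1):
        delta.append(phi * delta[-1] + rng.gauss(0.0, 1.0))
    eps = [rng.gauss(0.0, noise_sd) for _ in range(n)]
    z = [m + d + e for m, d, e in zip(mu, delta, eps)]
    return mu, delta, eps, z


mu, delta, eps, z = simulate_transect()
# Subtracting the trend leaves delta + eps: the autocorrelated part plus white noise.
detrended = [zi - mi for zi, mi in zip(z, mu)]
```

Plotting `z` and `mu` against x gives a picture like the "Graphical Concept with Trend" slide. As that discussion notes, the split between μ and δ is not unique: a wigglier μ would absorb part of δ.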


    Graphical Concept with Trend

    [Figure: Z (-5 to 35) against X (0 to 35)]

    Red line indicates the large-scale trend

    Green line shows how the data are arranged around the trend

    Note that there is a pattern to the points around the red line. The pattern implies possible positive autocorrelation in Z(x).

    Finally, there is white noise.


    Graphical Concept without Trend

    Red line indicates a constant mean, i.e. no large-scale trend

    Green line shows how the data are arranged around the mean

    Again, the pattern of the green line implies possible positive autocorrelation in RZ(x)

    [Figure: RZ (-15 to 15) against X (0 to 35)]


    Important Point

    The model indicates that Z can be decomposed into large-scale variation, small- plus micro-scale variation, and noise. In reality, any estimated decomposition is not unique.

    E.g. in the graph just shown, we could instead have added a sinusoidal component to the large-scale trend and hence captured much of the apparent autocorrelation.


    Example

    Red line indicates the large-scale trend, captured by a sinusoidal + linear trend

    Green line shows how the data are arranged around the trend

    Note that now there is no obvious pattern, and so the remaining unexplained variation in Z(x) is likely white noise.

    [Figure: Z (-5 to 35) against X (0 to 35), with a sinusoidal + linear trend line]


    Modeling

    Ultimately we want to model Z using the geostatistical model

    $$Z(s) = \mu(s) + \delta(s) + \varepsilon(s)$$

    This requires estimates of the model components:

    the mean, $\mu(s)$

    the small-scale variation, $\delta(s)$, and the covariances among Z values at different locations

    any leftovers, $\varepsilon(s)$, i.e. the unexplained or residual variability


    Important Point

    The choice of approach to estimating/predicting Z (a detailed fit of a trend vs. a large-scale trend + autocorrelation) depends strongly on the reason for, and uses of, the model.

    E.g. if you are interested in predicting Z at unsampled locations within the study area, then any model that uses covariates to estimate the large-scale trend must also have the covariates known for the unsampled locations.

    E.g. if you are interested in understanding the reasons for the spatial distribution of Z, then you may or may not want to incorporate a spatial correlation component.


    Correlation Structure (Semivariogram)

    Now, to assess spatial autocorrelation we look at the behavior of the following:

    $$\gamma_{ij} = \frac{1}{2}[Z(s_i) - Z(s_j)]^2$$

    for every possible pair of locations in the dataset (N locations yield N(N-1)/2 pairs).

    Correlated: we would expect $Z(s_i)$ to be similar in value to $Z(s_j)$, and hence the squared difference to be small.

    Independent: we would expect the squared difference to be relatively large, since the two numbers would vary according to the population variability.


    Plot (Variogram Cloud)

    [Figure: variogram cloud, gamma (0 to 0.10) against distance (0 to 15)]

    Looking for a pattern, i.e. is there a trend in $\gamma_{ij}$ with respect to the distance between the two locations?

    Variogram cloud for a dataset of 400 observations
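Computing the variogram cloud is a direct translation of the $\gamma_{ij}$ formula over all N(N-1)/2 pairs. The minimal Python sketch below assumes 2-D coordinates given as (x, y) tuples. The function name and the toy data are hypothetical.

```python
import math
from itertools import combinations


def variogram_cloud(coords, values):
    """Return (distance, gamma_ij) for every pair i < j.

    coords: list of (x, y) tuples; values: observed z(s_i) at each location.
    gamma_ij = [z(s_i) - z(s_j)]**2 / 2, so N locations give N(N-1)/2 points.
    """
    cloud = []
    for i, j in combinations(range(len(coords)), 2):
        (xi, yi), (xj, yj) = coords[i], coords[j]
        dist = math.hypot(xi - xj, yi - yj)
        gamma = 0.5 * (values[i] - values[j]) ** 2
        cloud.append((dist, gamma))
    return cloud


coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
values = [1.0, 1.5, 2.0, 4.0]
cloud = variogram_cloud(coords, values)
print(len(cloud))  # 6, i.e. 4 * 3 / 2
```

Scatter-plotting these (distance, gamma) pairs gives the cloud. With 400 observations, as on the slide, there are 79,800 pairs, which is one reason the cloud is hard to read.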


    Empirical Variogram

    The variogram cloud is usually very uninformative: it is difficult to discern a trend or pattern.

    More pertinent is to calculate the average values of $\gamma_{ij}$ for different distances. The problem is that we don't usually have pairs at exactly the same, discrete distances (this happens only when the data are on a perfect grid).

    A common method for averaging at specific distances is to bin the distances into intervals (called lag distances), i.e. use all pairs within some bin width around a given distance value.


    Continuous Data Geostatistics

    Because we do not usually have many values at exactly the same distance, a common method for averaging at discrete distances is to use all pairs within some bin width around a given distance value.

    So we choose several levels of h (distances) and calculate the empirical semivariogram:

    $$\hat{\gamma}(h) = \frac{1}{2|N(h)|} \sum_{(s, t) \in N(h)} [Z(s) - Z(t)]^2$$

    where N(h) is the set of all pairs of locations that are a distance h apart, within a tolerance region around h, i.e.

    $$N(h) = \{(s, t) : s - t = h \text{ or } \lVert s - t \rVert \in \operatorname{tol}(h)\}$$

    and |N(h)| is the number of pairs in N(h).
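The binned estimator can be sketched as follows. The sketch makes two assumptions:

- Bins are centred on the chosen lags with half-width `tol`, so each pair is counted in the bin where h - tol <= ||s - t|| < h + tol.
- Distances are Euclidean.

The names and toy data are illustrative.

```python
import math
from itertools import combinations


def empirical_semivariogram(coords, values, lags, tol):
    """Classical estimator gamma_hat(h) = sum over N(h) of [Z(s) - Z(t)]^2 / (2 |N(h)|).

    coords: list of (x, y); values: observed z at each location.
    Returns a list of (h, gamma_hat, |N(h)|); gamma_hat is None for an empty bin.
    """
    # All pairwise distances and squared differences, computed once.
    pairs = [
        (math.dist(coords[i], coords[j]), (values[i] - values[j]) ** 2)
        for i, j in combinations(range(len(coords)), 2)
    ]
    result = []
    for h in lags:
        in_bin = [sq for dist, sq in pairs if h - tol <= dist < h + tol]
        gamma_hat = sum(in_bin) / (2 * len(in_bin)) if in_bin else None
        result.append((h, gamma_hat, len(in_bin)))
    return result


# Five equally spaced points along a line with a pure linear trend z = x.
coords = [(x, 0) for x in range(5)]
values = [float(x) for x in range(5)]
print(empirical_semivariogram(coords, values, lags=[1, 2, 3, 4, 5], tol=0.5))
# [(1, 0.5, 4), (2, 2.0, 3), (3, 4.5, 2), (4, 8.0, 1), (5, None, 0)]
```

Because the toy data are a pure linear trend, the estimate grows like h²/2 without levelling off. This is the symptom discussed below for a constantly increasing semivariogram.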


    Empirical Semivariogram

    [Figure: empirical semivariogram, gamma (0 to 3.0) against distance (0 to 12)]

    This plot is called an omnidirectional classical empirical semivariogram:

    Omnidirectional because the direction between the pairs of locations was ignored

    Classical because the estimator is the simple average of the squared differences (alternatives exist that are robust to outliers or to failure of the assumptions of the model)

    Semi because of the division by 2 in the equation used

    Graph based on a set of 20 distance lags


    Important Points

    The constantly increasing semivariogram indicates that there is a problem with this dataset. Ideally, it should level off at some distance at the variance of the process. This would imply that beyond that distance the relationship between two locations is the same regardless of the distance between them (i.e. observations are independent at large distances).

    This graph indicates that either:

    the data imply correlation exists at all distances (and therefore the study region is small relative to the range of autocorrelation), or

    the data have a large-scale trend which may account for most of the seeming autocorrelation (small-scale …


    Semivariogram

    [Figure: empirical semivariogram, gamma (0 to 1.5) against distance (0 to 12)]

    Note the rise and then leveling off of the $\gamma(h)$ values as distance increases.

    We'll cover shapes for variograms in more detail later.

    Empirical semivariogram for a different dataset in which there was no large-scale trend but definite autocorrelation


    Semivariogram

    Note that the $\gamma(h)$ values are more or less the same regardless of distance.

    Empirical semivariogram for a different dataset in which there was no large-scale trend and no autocorrelation

    [Figure: empirical semivariogram, gamma (0 to 0.08) against distance (0 to 15)]


    Important Points

    If the empirical semivariogram increases with the distance between locations, then the correlation between points is decreasing as distance increases.

    The point at which it flattens to a constant value is the distance at which any two points that distance or farther apart are independent. The value of $\gamma(h)$ there is the variance of the spatial process.

    At this point in our analyses, the number of lag distances you use is not that critical, but when we try to fit a curve to the empirical semivariogram later, the number of lags …


    Important Point About Directionality

    Another point to consider is whether the pattern of autocorrelation, i.e. the shape of the curve describing the semivariogram, is the same in every direction.

    We can't tell this from the omnidirectional plot.

    Need to check if there is a directional effect.


    Directional Semivariograms

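    To check directionality in the covariance, plot $\hat{\gamma}(h)$ for each h for different directions

    Modify the sets of locations over which the averaging occurs

    Typically done using a set of binned directions (wedges of the compass)

    Requires that you modify the definition of neighborhood:

    $$N(h, \theta) = \{(s, t) : \lVert s - t \rVert \in \operatorname{tol}(h),\ \angle(s - t) \in \operatorname{tol}(\theta)\}$$

    where $\angle(s - t)$ is the direction of the separation vector and $\theta$ is the chosen direction (angle).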
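A directional version only adds an angle filter to the binned estimator. The sketch below makes two assumptions:

- Angles are in degrees, measured counterclockwise from the x-axis.
- Angles are folded into [0°, 180°), since (s, t) and (t, s) give the same direction.

Many software packages instead measure azimuth clockwise from north. The names and the toy grid are illustrative.

```python
import math
from itertools import combinations


def directional_semivariogram(coords, values, lags, tol, angle, angle_tol):
    """Classical semivariogram using only pairs whose direction is within
    angle_tol degrees of `angle` (both in degrees)."""
    result = []
    for h in lags:
        sq = []
        for i, j in combinations(range(len(coords)), 2):
            dx = coords[j][0] - coords[i][0]
            dy = coords[j][1] - coords[i][1]
            if not (h - tol <= math.hypot(dx, dy) < h + tol):
                continue  # distance outside tol(h)
            theta = math.degrees(math.atan2(dy, dx)) % 180.0
            diff = (theta - angle) % 180.0
            diff = min(diff, 180.0 - diff)  # undirected angular difference
            if diff <= angle_tol:  # direction inside tol(angle)
                sq.append((values[i] - values[j]) ** 2)
        gamma_hat = sum(sq) / (2 * len(sq)) if sq else None
        result.append((h, gamma_hat, len(sq)))
    return result


# 5 x 5 grid whose values change only in the x direction.
coords = [(x, y) for x in range(5) for y in range(5)]
values = [float(x) for x, _ in coords]
for angle in (0.0, 22.5, 45.0, 67.5, 90.0, 112.5):
    print(angle, directional_semivariogram(coords, values, [1], 0.5, angle, 11.25))
```

The loop uses the angles and the 11.25° tolerance of the example that follows. On this toy grid, the 0° estimate at lag 1 is 0.5 while the 90° estimate is 0, a direction dependence the omnidirectional plot cannot show.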


    Directional Semivariograms

    EXAMPLE: calculate the mean variability, $\hat{\gamma}(h)$, for the angles 0°, 22.5°, 45°, 67.5°, 90°, and 112.5°, with a tolerance of 11.25° on each side.

    [Figure: six directional semivariogram panels (0°, 22.5°, 45°, 67.5°, 90°, 112.5°), gamma (0 to 5) against distance (0 to 12)]


    Need for Assumptions in Order to Proceed Beyond This Point

    The data that are collected are a partial observation of the spatial surface (e.g. map) that we are interested in.

    In addition, it is usually assumed that there is some super-process that created the particular surface for which we have this partial view.

    To estimate the spatial autocorrelation we need to make some assumptions. Otherwise, we don't have sufficient information to make any inferences.


    Two Assumptions

    Stationarity, specifically second-order stationarity:

    $$\operatorname{cov}(Z(s), Z(t)) = C(s - t), \quad s, t \in D$$

    Isotropy


    Stationarity

    The mean of the process is constant, i.e. there is no trend:

    $$\mu(s) = \mu \quad \text{for all } s \in D \qquad (1)$$

    The covariance between any pair of points depends only on the distance (and possibly direction) between the points, NOT on the location of the points in space:

    $$\operatorname{cov}(Z(s_i), Z(s_j)) = C(s_i - s_j), \quad s_i, s_j \in D$$

    where C(·) is the covariance function. This implies that the variance of Z is constant everywhere, since $\operatorname{Var}(Z(s)) = \operatorname{cov}(Z(s), Z(s)) = C(0)$.

    If both conditions are met, then the spatial process we are studying is said to be second-order stationary.


    Relationship between Semivariogram and Correlation

    Assuming intrinsic stationarity, we have

    $$E[Z(s + h) - Z(s)] = 0, \qquad \operatorname{Var}[Z(s + h) - Z(s)] = 2\gamma(h)$$

    Now, assuming that $\operatorname{Var}[Z(s_1)] = \operatorname{Var}[Z(s_2)] = \sigma^2$, we have

    $$\operatorname{Var}[Z(s_1) - Z(s_2)] = \operatorname{Var}[Z(s_1)] + \operatorname{Var}[Z(s_2)] - 2\operatorname{Cov}[Z(s_1), Z(s_2)] = 2\sigma^2 - 2C(h)$$

    where $h = s_1 - s_2$. Thus,

    $$\gamma(h) = \sigma^2 - C(h)$$
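The identity can be checked numerically. The sketch below is a Monte Carlo check under an assumed Gaussian (bivariate normal) model. It draws many pairs (Z(s), Z(s + h)) with common variance σ² and covariance C(h) = ρσ², then compares half the mean squared difference with σ² − C(h). The parameter values and the function name are arbitrary choices for illustration.

```python
import math
import random


def check_semivariogram_identity(sigma2=2.0, rho=0.6, n=200_000, seed=42):
    """Monte Carlo check of gamma(h) = sigma^2 - C(h) at one lag h.

    Pairs (z1, z2) are bivariate normal with Var = sigma2 and
    Cov = rho * sigma2, so C(h) = rho * sigma2.
    Returns (estimated gamma, sigma2 - C(h)).
    """
    rng = random.Random(seed)
    sigma = math.sqrt(sigma2)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rng.gauss(0.0, 1.0)
        z1 = sigma * x
        z2 = sigma * (rho * x + math.sqrt(1.0 - rho ** 2) * y)
        total += 0.5 * (z1 - z2) ** 2  # half the squared difference
    return total / n, sigma2 - rho * sigma2


est, theory = check_semivariogram_identity()
print(round(theory, 3))  # 0.8
print(round(est, 2))     # close to 0.8
```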


    Isotropy

    The covariance between any pair of points does not depend on direction, but only on distance:

    $$\operatorname{cov}(Z(s_i), Z(s_j)) = C(\lVert s_i - s_j \rVert) = C(\lVert h \rVert)$$

    If this holds, then the spatial process is said to be isotropic.


    Non-Constant Mean

    Two ways to handle a trend when it does exist:

    Detrend the data using regression (or similar) with covariates, and then use the residuals from the trend analysis for the spatial autocorrelation analysis

    E.g. disease rates as a function of population density

    Universal kriging (UK), which allows for estimating the trend as a global polynomial in s = (x, y) and estimating the spatial autocorrelation simultaneously

    UK ignores other explanatory covariates which can be …
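The detrending route can be sketched in Python. The sketch assumes the trend is modelled by regression on the coordinates themselves, i.e. a first-order trend surface μ(s) = b0 + b1·x + b2·y. The slide's example uses covariates such as population density instead, and these would simply be extra columns. The residuals would then go into the semivariogram analysis. The function names and data are hypothetical.

```python
def solve(a, rhs):
    """Solve the small dense system a @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [list(row) + [v] for row, v in zip(a, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def detrend_linear(coords, values):
    """Fit mu(s) = b0 + b1*x + b2*y by least squares; return (coefficients, residuals)."""
    rows = [(1.0, x, y) for x, y in coords]
    # Normal equations (X^T X) b = X^T z for the 3 trend coefficients.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xtz = [sum(r[i] * z for r, z in zip(rows, values)) for i in range(3)]
    b = solve(xtx, xtz)
    residuals = [z - (b[0] + b[1] * x + b[2] * y) for (x, y), z in zip(coords, values)]
    return b, residuals


# Hypothetical data: a linear trend plus a small alternating (checkerboard) pattern.
coords = [(x, y) for x in range(5) for y in range(5)]
values = [2.0 + 0.5 * x - 1.0 * y + 0.1 * (-1) ** (x + y) for x, y in coords]
b, residuals = detrend_linear(coords, values)
```

Here the fitted slopes recover 0.5 and -1.0. The checkerboard component survives in the residuals, and that remaining spatial structure is what the residual semivariogram would describe.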


    Non-Constant Variance

    To account for heterogeneity (non-constant variance):

    Estimate variability in smaller subregions of the study area

    Need to make decisions about the size and extent of the subregions

    Need sufficient numbers of observations within each subregion

    Transform or standardize your data so that the variability of the transformed values is constant over the region


    Anisotropy

    Two types of anisotropy:

    Geometric: the range over which correlation is non-zero depends on direction

    The variance (sill) is constant over all directions

    This type can be adjusted for in geostatistical analyses

    Zonal: anything that is not geometric anisotropy

    Anisotropy implies that the spatial process behaves differently in different directions across the study region
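For geometric anisotropy, one common adjustment is to rotate and rescale the separation vector so that an isotropic model applies. The minimal sketch below assumes that the direction of maximum range and the ratio of minor to major range are already known, e.g. from directional semivariograms. The angle convention (degrees counterclockwise from the x-axis) and the function name are assumptions.

```python
import math


def anisotropic_distance(dx, dy, major_angle_deg, ratio):
    """Effective isotropic distance for geometric anisotropy.

    major_angle_deg: direction of the longest correlation range, in degrees
        counterclockwise from the x-axis (assumed convention).
    ratio: minor range / major range, 0 < ratio <= 1.
    The separation is rotated onto the major/minor axes and the minor-axis
    component is stretched by 1 / ratio, so an isotropic model with the
    major range can be applied to the result.
    """
    theta = math.radians(major_angle_deg)
    u = dx * math.cos(theta) + dy * math.sin(theta)    # component along the major axis
    v = -dx * math.sin(theta) + dy * math.cos(theta)   # component along the minor axis
    return math.hypot(u, v / ratio)
```

With `ratio = 0.5`, a separation of 2 units along the minor axis is treated like 4 units along the major axis. Correlation therefore dies out twice as fast in the minor direction, while the sill is unchanged.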


    Variography

    Fitting a valid semivariogram function to the empirical semivariogram

    Now we are interested in describing the variogram as an equation in which variance is a function of the distance.

    We shall assume in the following that the spatial process is second-order stationary and isotropic.


    Semivariogram

    We have already seen how to obtain the empirical variogram of

    $$2\gamma(h) = \operatorname{Var}(Z(s) - Z(t)), \quad h = s - t$$

    $\gamma(h)$ is the semivariogram and is the primary quantity of interest because

    $$\gamma(h) = C(0) - C(h)$$

    Now we are interested in describing the semivariogram as a function of the distance.

    We shall assume in the following that the spatial process is second-order stationary and isotropic.


    Semivariogram

    Semivariogram models have the following properties:

    1) Many are not linear in their parameters.

    2) They must be conditionally negative-definite, i.e. the function must satisfy

    $$\sum_{i=1}^{m} \sum_{j=1}^{m} a_i a_j \, 2\gamma(s_i - s_j) \le 0$$

    for any locations $s_1, \ldots, s_m$ and any real numbers $a_1, \ldots, a_m$ satisfying $\sum_i a_i = 0$.

    3) If $\gamma(h) \to c_0 > 0$ as $h \to 0$, there is microscale variation. This is assumed to be due to measurement error (ME) or a process occurring at the microscale. ME is measurable only if we have replicate values at each location in the sample.
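Condition 2) can be probed numerically: for a valid model, the quadratic form is never positive when the weights sum to zero. The sketch makes these assumptions:

- an exponential model with a nugget, γ(h) = c0 + c1(1 − e^(−h/a)) for h > 0, as one common parameterization;
- random planar points;
- arbitrary parameter values.

A check like this can expose an invalid function but cannot prove that a function is valid.

```python
import math
import random


def exponential_semivariogram(h, c0, c1, a):
    """Exponential model with nugget: 0 at h = 0, else c0 + c1 * (1 - exp(-h / a))."""
    if h == 0:
        return 0.0
    return c0 + c1 * (1.0 - math.exp(-h / a))


def cnd_quadratic_form(points, weights, gamma):
    """Return sum_i sum_j a_i * a_j * 2 * gamma(||s_i - s_j||)."""
    total = 0.0
    for si, ai in zip(points, weights):
        for sj, aj in zip(points, weights):
            total += ai * aj * 2.0 * gamma(math.dist(si, sj))
    return total


rng = random.Random(7)
worst = -math.inf
for _ in range(200):
    pts = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(8)]
    a = [rng.gauss(0.0, 1.0) for _ in pts]
    mean_a = sum(a) / len(a)
    a = [ai - mean_a for ai in a]  # enforce sum(a_i) = 0
    q = cnd_quadratic_form(pts, a, lambda h: exponential_semivariogram(h, 0.1, 1.0, 3.0))
    worst = max(worst, q)

print(worst <= 1e-9)  # True: no configuration gave a positive value
```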


    Semivariogram

    Semivariogram models have the following properties:

    If $\gamma(h)$ is constant for every h except h = 0, where $\gamma(0) = 0$, then Z(s) and Z(t) are uncorrelated for any pair of distinct locations s and t

    $$\frac{2\gamma(h)}{\lVert h \rVert^2} \to 0 \quad \text{as } \lVert h \rVert \to \infty,$$

    i.e. $\lVert h \rVert^2$ increases faster than $\gamma(h)$ as h increases


    A Typical Semivariogram

    [Figure: semivariogram against distance, with the nugget, sill, and range marked]


    Characteristics of the Semivariogram

    It is 0 when the separation distance is 0: $\gamma(0) = \tfrac{1}{2}\operatorname{Var}[Z(s) - Z(s)] = 0$.

    Nugget effect: variation between two points very close together.

    May be measurement error

    May be indicative of an erratic process (e.g. gold ore)

    The sill corresponds to the overall variance of the data.

    Data separated by distances less than the range are spatially autocorrelated (less variation between close observations than between far observations).


    Estimating the Semivariogram

    Take all pairwise differences in the data, $Z(s_i) - Z(s_j)$, where s = (x, y) is a point in the 2-D plane.

    Compute the Euclidean distance between the spatial locations:

    $$\lVert s_i - s_j \rVert = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$$

    Average the pairs that fall in the same distance class (binning: like a 2-D histogram).


    End Result: Empirical Semivariogram


    Modeling the Semivariogram

    The semivariogram measures the variation among observations h units apart.

    Note: we do not want negative variances (and hence invalid standard errors) for predictions.

    So, we model the semivariogram with selected parametric functions that ensure all variances, and therefore standard errors, are nonnegative.

    We estimate the nugget, sill, and range parameters of the model that best fits the empirical semivariogram (a nonlinear least squares problem).
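One simple way to solve this least-squares problem is sketched here as an illustration, not as the method used in these slides. It assumes an exponential model γ(h) = c0 + c1(1 − e^(−h/a)). Once the range parameter a is fixed, the model is linear in the nugget c0 and partial sill c1, so:

1. For each a on a grid, solve for c0 and c1 in closed form.
2. Keep the a with the smallest residual sum of squares.

The parameterization, the grid, and the synthetic input are assumptions. In this form, the practical range (where about 95% of c1 is reached) is about 3a.

```python
import math


def fit_exponential(lags, gammas, ranges):
    """Fit gamma(h) = c0 + c1 * (1 - exp(-h / a)) by ordinary least squares.

    For each candidate a in `ranges`, (c0, c1) are solved in closed form;
    the a with the smallest sum of squared residuals is kept.
    Returns (nugget c0, sill c0 + c1, range parameter a, deviance).
    """
    best = None
    for a in ranges:
        f = [1.0 - math.exp(-h / a) for h in lags]
        n = len(f)
        sf, sg = sum(f), sum(gammas)
        sff = sum(fi * fi for fi in f)
        sfg = sum(fi * gi for fi, gi in zip(f, gammas))
        c1 = (n * sfg - sf * sg) / (n * sff - sf * sf)
        c0 = (sg - c1 * sf) / n
        if c0 < 0:  # a negative nugget is not allowed: refit through the origin
            c0, c1 = 0.0, sfg / sff
        dev = sum((g - (c0 + c1 * fi)) ** 2 for g, fi in zip(gammas, f))
        if best is None or dev < best[3]:
            best = (c0, c0 + c1, a, dev)
    return best


# Synthetic empirical semivariogram generated from known parameters (no noise).
lags = [0.5 * k for k in range(1, 13)]
gammas = [0.1 + 0.9 * (1.0 - math.exp(-h / 2.0)) for h in lags]
nugget, sill, a, dev = fit_exponential(lags, gammas, ranges=[0.5 * k for k in range(1, 11)])
```

On this noise-free input, the search recovers nugget 0.1, sill 1.0 and a = 2.0. A fuller implementation would use a proper nonlinear optimizer and also constrain c1 >= 0.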


    Selected semivariogram models


    Covariogram Models

    Power Model is simply a reparameterization of the exponential model.

    Spherical Model

    Exponential Model

    Gaussian Model
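The model formulas themselves were shown as images and did not survive in this transcript. For reference, the sketch below gives common textbook forms, written as semivariograms with:

- nugget c0;
- partial sill c1, so the sill is c0 + c1;
- range parameter a.

It also includes the covariogram obtained from the relation γ(h) = C(0) − C(h). Parameterizations differ between texts and software; some put a factor of 3 inside the exponentials so that a is the practical range. Treat these as one assumed convention.

```python
import math


def spherical(h, c0, c1, a):
    """Spherical model: reaches the sill c0 + c1 exactly at the range a."""
    if h == 0:
        return 0.0
    if h >= a:
        return c0 + c1
    r = h / a
    return c0 + c1 * (1.5 * r - 0.5 * r ** 3)


def exponential(h, c0, c1, a):
    """Exponential model: approaches the sill asymptotically (95% of c1 at about h = 3a)."""
    if h == 0:
        return 0.0
    return c0 + c1 * (1.0 - math.exp(-h / a))


def gaussian(h, c0, c1, a):
    """Gaussian model: parabolic near the origin (95% of c1 at about h = sqrt(3) * a)."""
    if h == 0:
        return 0.0
    return c0 + c1 * (1.0 - math.exp(-((h / a) ** 2)))


def covariogram(model, h, c0, c1, a):
    """C(h) = C(0) - gamma(h), with C(0) = c0 + c1 (the sill)."""
    return (c0 + c1) - model(h, c0, c1, a)
```

The spherical model reaches its sill at a finite distance. The exponential and Gaussian models only approach it, which is why a "practical range" is often quoted for them.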


    Covariogram vs. Semivariogram

    The covariogram and semivariogram are related:

    $$\gamma(h) = C(0) - C(h)$$


    The fitted semivariogram model

    Estimates: nugget=0.084, sill=0.269, range=110.3 miles


    Common methods for fitting these functions to a set of empirical semivariogram values:

    1) Choose the most likely candidate model.

    2) Estimate the parameters of the model, using one of:

    Non-linear least squares: allows for the estimation of parameters that enter the equation non-linearly, but ignores any dependence among the empirical variogram values.

    Non-linear weighted least squares / generalized least squares: the variances (weighted) or the full variance-covariance matrix (generalized) of the empirical variogram values are accounted for in the estimation procedure.

    Maximum likelihood: assumes the data are normally distributed, but the estimators are likely to be highly biased, especially in small samples (the usual remedy is jackknifing).

    Restricted maximum likelihood: maximizes a slightly altered likelihood function, which reduces the bias of the MLEs.


    Properties of Variogram Models

    If $\gamma(h) \to c_0 > 0$ as $h \to 0$, then there is microscale variation, usually assumed to be due to measurement error (ME).

    ME is measurable only if we have replicate values at each location in the sample. When fitting a variogram function, you may estimate a non-zero value for $c_0$ even when you do not have replicate observations at sites. This is called the nugget.

    If $\gamma(h)$ is constant for every h except h = 0, where $\gamma(0) = 0$, then $Z(s_i)$ and $Z(s_j)$ are uncorrelated for any pair of locations $s_i$ and $s_j$.


    Properties of Variogram Models


    Choosing a Best Model

    Need to choose the variogram model that best fits the data

    Best: minimum unexplained variation after fitting. Look at a measure of deviance:

    $$\sum_i [\hat{\gamma}(h_i) - \gamma(h_i)]^2$$

    where $\hat{\gamma}(h_i)$ is the empirical semivariogram for the ith lag and $\gamma(h_i)$ is the value predicted by the fitted semivariogram model
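The deviance comparison is straightforward to compute once each candidate has been fitted. The sketch below assumes the fitted models are supplied as functions of h. The empirical values and the two candidate curves (a hypothetical spherical fit and a straight line) are made up for illustration.

```python
def deviance(empirical, model):
    """Sum over lags of [gamma_hat(h_i) - gamma(h_i)]^2.

    empirical: list of (h_i, gamma_hat_i); model: fitted semivariogram, h -> gamma(h).
    """
    return sum((g - model(h)) ** 2 for h, g in empirical)


def best_model(empirical, candidates):
    """Return (name, deviance) of the candidate with the smallest deviance."""
    scores = {name: deviance(empirical, m) for name, m in candidates.items()}
    name = min(scores, key=scores.get)
    return name, scores[name]


empirical = [(1.0, 0.2), (2.0, 0.4), (3.0, 0.5), (4.0, 0.5)]
candidates = {
    # Hypothetical spherical fit: nugget 0, partial sill 0.5, range 3.
    "spherical": lambda h: 0.5 * (1.5 * h / 3 - 0.5 * (h / 3) ** 3) if h < 3 else 0.5,
    # Hypothetical linear fit: 0.1 + 0.1 h.
    "linear": lambda h: 0.1 + 0.1 * h,
}
print(best_model(empirical, candidates))  # spherical wins, deviance about 0.0023
```

When candidates have the same number of parameters, a smaller deviance is a reasonable basis for choosing between them. When they differ in complexity, deviance alone tends to favour the more flexible model.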


    Choosing a Best Model

    To determine whether the model seems appropriate in the absence of deviance (or similar) comparisons:

    Compare fits visually

    Use prior knowledge from other studies


    Next Steps

    Using the results of the variography to do statistical modeling of the spatial process: kriging