Exploratory Tools for Spatial Data: Diagnosing Spatial Autocorrelation


Posted on 05-Jan-2016






  • Exploratory Tools for Spatial Data: Diagnosing Spatial Autocorrelation

    Main message when modeling and analyzing spatial data: SPACE MATTERS! Relationships among spatially referenced observations can be modeled in several ways. Some include:

    1. Estimation through stochastic dependencies.

    2. Spatial regression: the deterministic structure of the mean function.

    3. Lattice Modeling: expressing observations as functions of neighboring values.

    Chapter Emphasis: exploratory tools for spatial data must allow some insight into the spatial structure in the data.

  • For instance, stem-and-leaf plots and histograms represent the data pictorially, but they tell us nothing about the data's spatial orientation or structure.

    [Figures: stem-and-leaf plot; histogram]

  • Example demonstrating the importance of retaining spatial information: two 10 × 10 lattices are filled with the same 100 observations drawn at random. In lattice A the observations are assigned to lattice positions completely at random. In lattice B they are assigned so that each value is surrounded by values of similar magnitude.

  • Histograms of the 100 observed values, which ignore spatial position, are identical for the two lattices. Note: the density estimate is not an estimate of the probability distribution of the data; that requires a different formula. Even if a histogram computed by lumping data across spatial locations appears Gaussian, this does not imply that the data are a realization of a Gaussian random field.
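
The claim that pooled histograms cannot tell the two lattices apart is easy to check numerically. This Python sketch is an illustration under stated assumptions, not the original construction: it assumes standard Gaussian draws, an arbitrary seed, and builds lattice B by filling the sorted values row by row, which is only one simple way to put similar values next to each other.

```python
import numpy as np

SIZE = 10
rng = np.random.default_rng(1)          # arbitrary seed, for reproducibility only
values = rng.normal(size=SIZE * SIZE)   # the 100 observations (assumed Gaussian)

# Lattice A: observations assigned to positions completely at random.
lattice_a = rng.permutation(values).reshape(SIZE, SIZE)
# Lattice B (assumed construction): sorted values filled row by row,
# so each value sits next to values of similar magnitude.
lattice_b = np.sort(values).reshape(SIZE, SIZE)

# A histogram ignores position, so both lattices give the same counts.
bins = np.linspace(values.min(), values.max(), 11)
hist_a, _ = np.histogram(lattice_a.ravel(), bins=bins)
hist_b, _ = np.histogram(lattice_b.ravel(), bins=bins)
print(hist_a.tolist())
print(hist_b.tolist())
print("identical histograms:", np.array_equal(hist_a, hist_b))
```

Any other arrangement of the same 100 values would give the same counts; only the positions differ.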

  • When the observed values are plotted against the average value of their nearest neighbors (a lag plot), the difference in spatial distribution between the two lattices emerges:

    The data in lattice A are not spatially correlated, whereas the data in lattice B are very strongly autocorrelated.
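
A lag plot can be summarized by the correlation between each value and the mean of its neighbors. The sketch below assumes rook (edge-sharing) neighbors, Gaussian draws with an arbitrary seed, and the same row-by-row sorted construction for lattice B; `rook_neighbor_mean` is a hypothetical helper written for this illustration.

```python
import numpy as np

def rook_neighbor_mean(z):
    """Average of the rook (edge-sharing) neighbors of every cell of a 2-D lattice."""
    nrows, ncols = z.shape
    means = np.empty((nrows, ncols))
    for r in range(nrows):
        for c in range(ncols):
            nbrs = [z[rr, cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < nrows and 0 <= cc < ncols]
            means[r, c] = np.mean(nbrs)
    return means

rng = np.random.default_rng(1)                        # arbitrary seed
values = rng.normal(size=100)
lattices = {
    "A": rng.permutation(values).reshape(10, 10),     # random placement
    "B": np.sort(values).reshape(10, 10),             # similar values adjacent (assumed)
}

corr = {}
for name, z in lattices.items():
    lag = rook_neighbor_mean(z)
    # The lag plot is the scatter of z against lag; its correlation summarizes it.
    corr[name] = np.corrcoef(z.ravel(), lag.ravel())[0, 1]
    print(f"lattice {name}: corr(value, neighbor mean) = {corr[name]:.2f}")
```

Lattice A should give a correlation near zero and lattice B one close to 1, even though their histograms are identical.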

  • Terminology: distinguishing between spatial and non-spatial arrangements helps detect outliers. In a box plot or a stem-and-leaf plot, outliers are termed distributional. A spatial outlier is an observation that is unusual compared to its surrounding values.

    Diagnosing spatial outliers:

    1. Median-polish the data, that is, remove the large-scale trends in the data with an outlier-resistant method, and look for outlying observations in a box plot of the median-polished residuals.

    2. Use lag plots (as in the previous example).
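
The median-polish step can be sketched as a standard form of Tukey's median polish on a rows × columns grid, followed by the usual box-plot fences (Q1 − 1.5·IQR, Q3 + 1.5·IQR) on the residuals. The 7 × 5 field and its single bad plot are hypothetical test data, and the function names are illustrative, not from any macro in this deck.

```python
import numpy as np

def median_polish(z, max_iter=10, tol=1e-10):
    """Tukey's median polish: z[i, j] = overall + row[i] + col[j] + resid[i, j]."""
    resid = np.asarray(z, dtype=float).copy()
    overall = 0.0
    row = np.zeros(resid.shape[0])
    col = np.zeros(resid.shape[1])
    for _ in range(max_iter):
        # Sweep row medians out of the residuals into the row effects.
        rmed = np.median(resid, axis=1)
        resid -= rmed[:, None]
        row += rmed
        delta = np.median(col)
        col -= delta
        overall += delta
        # Sweep column medians out of the residuals into the column effects.
        cmed = np.median(resid, axis=0)
        resid -= cmed[None, :]
        col += cmed
        delta = np.median(row)
        row -= delta
        overall += delta
        if np.abs(rmed).max() < tol and np.abs(cmed).max() < tol:
            break
    return overall, row, col, resid

def boxplot_outliers(resid, k=1.5):
    """Cells whose residual lies outside the box-plot fences Q1 - k*IQR, Q3 + k*IQR."""
    q1, q3 = np.percentile(resid, [25, 75])
    iqr = q3 - q1
    mask = (resid < q1 - k * iqr) | (resid > q3 + k * iqr)
    return [(int(r), int(c)) for r, c in np.argwhere(mask)]

# Hypothetical 7 x 5 field: additive row and column trend plus one bad plot.
rows, cols = np.indices((7, 5))
field = 10 + 2 * rows + 3 * cols
field[6, 4] += 50
overall, row_eff, col_eff, resid = median_polish(field)
print("flagged cells:", boxplot_outliers(resid))
```

Because medians resist the bad plot, the row and column effects absorb the trend and the residual box plot isolates the unusual cell.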

  • Concerning the Mercer and Hall grain yield data. S+SpatialStats code:

      # box plots of grain yield by row and by column
      bwplot(y ~ grain, data = wheat, ylab = "Row", xlab = "Grain Yield")
      bwplot(x ~ grain, data = wheat, ylab = "Column", xlab = "Grain Yield")

  • Describing, Diagnosing, and Testing the Degree of Spatial Autocorrelation

    Geostatistical data: the empirical semivariogram provides an estimate of the spatial structure.

    Lattice data: join-count statistics have been developed for binary and nominal data. Moran (1950) and Geary (1954) developed autocorrelation coefficients for continuous attributes observed on lattices.

    Moran's I and Geary's c compare an estimate of the covariation among the Z(s) to an estimate of their variation.

  • Let $Z(s_i)$, $i = 1, 2, \ldots, n$, denote the attribute Z observed at site $s_i$, and let $U_i = Z(s_i) - \bar{Z}$ be its centered version. $w_{ij}$ denotes the neighborhood connectivity weight between sites $s_i$ and $s_j$, with $w_{ii} = 0$.
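
The coefficient formulas did not survive extraction. In this notation, the usual textbook definitions of Moran's I and Geary's c are as follows (stated as the standard forms, an assumption about what the original slide showed):

```latex
I = \frac{n}{\sum_{i}\sum_{j} w_{ij}}
    \cdot \frac{\sum_{i}\sum_{j} w_{ij}\, U_i U_j}{\sum_{i} U_i^{2}},
\qquad
c = \frac{n-1}{2\sum_{i}\sum_{j} w_{ij}}
    \cdot \frac{\sum_{i}\sum_{j} w_{ij}\,\bigl(Z(s_i) - Z(s_j)\bigr)^{2}}{\sum_{i} U_i^{2}}
```

Both ratios put a weighted covariation (I) or weighted squared difference (c) over the total variation $\sum_i U_i^2$. Under no spatial autocorrelation $E[c] = 1$; values $c < 1$ point to positive and $c > 1$ to negative autocorrelation.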

  • In the absence of spatial autocorrelation, I has expected value $E[I] = -1/(n-1)$.

    Values $I > E[I]$ indicate positive autocorrelation; values $I < E[I]$ indicate negative autocorrelation. To determine whether a deviation of I from its expectation is statistically significant, one relies on the asymptotic distribution of I, which is Gaussian with mean $-1/(n-1)$ and variance $\sigma_I^2$. The hypothesis of no spatial autocorrelation is rejected at the $\alpha \times 100\%$ significance level if $|Z_{obs}| = |I - E[I]| / \sigma_I$ is more extreme than the $z_{\alpha/2}$ cutoff of a standard Gaussian distribution.
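
A minimal sketch of this z-test under the Gaussian assumption, using the commonly cited variance expression $\sigma_I^2 = (n^2 S_1 - n S_2 + 3 S_0^2) / ((n^2 - 1) S_0^2) - E[I]^2$ with $S_0 = \sum_i\sum_j w_{ij}$, $S_1 = \tfrac12 \sum_i\sum_j (w_{ij} + w_{ji})^2$, $S_2 = \sum_i (w_{i\cdot} + w_{\cdot i})^2$. The rook weights, the helper names, and the two test patterns (a checkerboard and a row-index gradient) are hypothetical choices for illustration.

```python
import math
import numpy as np

def rook_weights(nrows, ncols):
    """Binary rook-contiguity W for a lattice numbered row by row (w_ii = 0)."""
    n = nrows * ncols
    w = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    w[r * ncols + c, rr * ncols + cc] = 1.0
    return w

def moran_normal_test(z, w):
    """Moran's I, E[I], sigma_I^2, Z_obs and two-sided p under the Gaussian assumption."""
    u = np.asarray(z, dtype=float).ravel()
    n = u.size
    u = u - u.mean()                                   # U_i = Z(s_i) - Zbar
    s0 = w.sum()
    i_obs = (n / s0) * (u @ w @ u) / (u @ u)
    e_i = -1.0 / (n - 1)
    s1 = 0.5 * ((w + w.T) ** 2).sum()
    s2 = ((w.sum(axis=1) + w.sum(axis=0)) ** 2).sum()
    var_i = (n * n * s1 - n * s2 + 3 * s0 ** 2) / ((n * n - 1) * s0 ** 2) - e_i ** 2
    z_obs = (i_obs - e_i) / math.sqrt(var_i)
    p_value = math.erfc(abs(z_obs) / math.sqrt(2))      # equals 2 * (1 - Phi(|Z_obs|))
    return i_obs, e_i, var_i, z_obs, p_value

w = rook_weights(10, 10)
checker = np.indices((10, 10)).sum(axis=0) % 2         # hypothetical: alternating 0/1
gradient = np.indices((10, 10))[0]                      # hypothetical: the row index
res_checker = moran_normal_test(checker, w)
res_gradient = moran_normal_test(gradient, w)
for name, res in (("checkerboard", res_checker), ("gradient", res_gradient)):
    print("{}: I = {:.3f}, E[I] = {:.4f}, Zobs = {:.2f}, p = {:.2g}".format(
        name, res[0], res[1], res[3], res[4]))
```

Under the rook move every neighbor of a checkerboard cell has the opposite value, so I = −1 exactly; the smooth gradient gives a large positive I.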

  • 1. Gaussian assumption: under the null hypothesis, the $Z(s_i)$ are assumed iid $G(\mu, \sigma^2)$, so that $U_i \sim G(0, \sigma^2(1 - 1/n))$.

    2. Randomization framework: the $Z(s_i)$ are considered fixed and are randomly permuted among the n lattice sites.

    There are n! equally likely random permutations, and $\sigma_I^2$ is the variance of the n! Moran's I values. Best alternative to randomization.
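
Enumerating all n! permutations is infeasible for realistic n, so the randomization distribution is commonly approximated by Monte Carlo: permute the observed values a few hundred times and recompute I. This sketch assumes rook weights, 999 permutations, an arbitrary seed, and a hypothetical row-index gradient as data; the helper names are illustrative.

```python
import numpy as np

def rook_weights(nrows, ncols):
    """Binary rook-contiguity W for a lattice numbered row by row (w_ii = 0)."""
    n = nrows * ncols
    w = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    w[r * ncols + c, rr * ncols + cc] = 1.0
    return w

def moran_i(z, w):
    u = np.asarray(z, dtype=float).ravel()
    u = u - u.mean()
    return (u.size / w.sum()) * (u @ w @ u) / (u @ u)

def moran_permutation(z, w, n_perm=999, seed=0):
    """Monte Carlo approximation to the randomization distribution of Moran's I."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float).ravel()
    i_obs = moran_i(z, w)
    # Each draw keeps the observed values fixed and permutes them over the sites.
    sims = np.array([moran_i(rng.permutation(z), w) for _ in range(n_perm)])
    var_hat = sims.var(ddof=1)                              # estimate of sigma_I^2
    p_upper = (1 + np.sum(sims >= i_obs)) / (n_perm + 1)   # one-sided, positive autocorrelation
    return i_obs, sims.mean(), var_hat, p_upper

w = rook_weights(10, 10)
gradient = np.indices((10, 10))[0]      # hypothetical smooth pattern: the row index
i_obs, mean_sim, var_sim, p = moran_permutation(gradient, w)
print(f"I = {i_obs:.3f}, mean of permuted I = {mean_sim:.4f}, "
      f"var = {var_sim:.5f}, p = {p:.3f}")
```

The mean of the permuted values should sit near $-1/(n-1)$, and counting the observed statistic itself gives the smallest attainable p-value of 1/(n_perm + 1).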

  • The SAS macro %MoranI calculates the $Z_{obs}$ statistics and p-values under both the Gaussian and the randomization assumptions. A data set containing the W matrix ($W = [w_{ij}]$) is passed to the macro through the w_data= option. For rectangular lattices, the macro %ContWght (in file \SASMacros\ContiguityWeights.sas) calculates the W matrices for the classical neighborhood definitions.
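
The internals of %ContWght are not shown here. As a hypothetical stand-in, this Python function builds binary rook (shared edge), bishop (shared corner only), or queen (either) weights for a rectangular lattice whose sites are numbered row by row.

```python
import numpy as np

# Neighbor offsets (row, column) for the classical lattice neighborhoods.
MOVES = {
    "rook": ((-1, 0), (1, 0), (0, -1), (0, 1)),        # shared edge
    "bishop": ((-1, -1), (-1, 1), (1, -1), (1, 1)),    # shared corner only
}
MOVES["queen"] = MOVES["rook"] + MOVES["bishop"]       # edge or corner

def contiguity_weights(rows, cols, move="rook"):
    """Binary W = [w_ij] for a rows x cols lattice, sites numbered row by row; w_ii = 0."""
    offsets = MOVES[move]
    n = rows * cols
    w = np.zeros((n, n), dtype=int)
    for r in range(rows):
        for c in range(cols):
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    w[r * cols + c, rr * cols + cc] = 1
    return w

# Number of neighbors of each site of a 3 x 3 lattice under each move.
for move in ("rook", "bishop", "queen"):
    print(move, contiguity_weights(3, 3, move).sum(axis=1).reshape(3, 3).tolist())
```

For a 20 × 25 lattice such as the Mercer and Hall layout, the rook matrix has 1910 nonzero entries (955 neighbor pairs, each counted in both directions).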

  • SAS code:

      %include 'DriveLetterofCDROM:\Data\SAS\MercerWheatYieldData.sas';
      %include 'DriveLetterofCDROM:\SASMacros\ContiguityWeights.sas';
      %include 'DriveLetterofCDROM:\SASMacros\MoranI.sas';

      title1 "Moran's I for Mercer and Hall Wheat Yield, Rook's Move";
      /* rook-move weights for the 20 x 25 lattice */
      %ContWght(rows=20, cols=25, move=rook, out=rook);
      %MoranI(data=mercer, y=grain, row=row, col=col, w_data=rook);

  • Moran's I is sensitive to large-scale trends in the data and very sensitive to the choice of the neighborhood matrix W. If the rook definition (edges abut) is replaced by the bishop's move (corners touch), the autocorrelation remains significant, but the value of the test statistic is reduced by about 50%.

      title1 "Moran's I for Mercer and Hall Wheat Grain Data, Bishop's Move";
      %ContWght(rows=20, cols=25, move=bishop, out=bishop);
      %MoranI(data=mercer, y=grain, row=row, col=col, w_data=bishop);
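
The dependence on W can be made vivid with a deliberately extreme, hypothetical pattern rather than the Mercer and Hall data, so this sketch does not reproduce the 50% reduction. On a checkerboard, rook neighbors always differ and bishop neighbors always agree, so the same values give I = −1 under the rook move and I = +1 under the bishop move. The helper names are illustrative.

```python
import numpy as np

OFFSETS = {
    "rook": ((-1, 0), (1, 0), (0, -1), (0, 1)),        # edges abut
    "bishop": ((-1, -1), (-1, 1), (1, -1), (1, 1)),    # corners touch
}

def weights(nrows, ncols, move):
    """Binary contiguity W for a lattice numbered row by row."""
    n = nrows * ncols
    w = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            for dr, dc in OFFSETS[move]:
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    w[r * ncols + c, rr * ncols + cc] = 1.0
    return w

def moran_i(z, w):
    u = np.asarray(z, dtype=float).ravel()
    u = u - u.mean()
    return (u.size / w.sum()) * (u @ w @ u) / (u @ u)

checker = np.indices((20, 25)).sum(axis=0) % 2   # hypothetical pattern on a 20 x 25 grid
i_by_move = {move: moran_i(checker, weights(20, 25, move)) for move in OFFSETS}
print(i_by_move)
```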

  • Linear model: $Z = 1.4 + 0.1x + 0.2y + 0.002x^2 + e$, $e \sim \text{iid } G(0, 1)$, where x and y are the lattice coordinates.

      data simulate;
         do x = 1 to 10;
            do y = 1 to 10;
               z = 1.4 + 0.1*x + 0.2*y + 0.002*x*x + rannor(2334);
               output;
            end;
         end;
      run;
      title1 "Moran's I for independent data with large-scale trend";
      %ContWght(rows=10, cols=10, move=rook, out=rook);
      %MoranI(data=simulate, y=z, row=x, col=y, w_data=rook);

    The test indicates strong positive autocorrelation, which is an artifact of the changes in E[Z] rather than of stochastic spatial dependency among the sites.
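
The same experiment can be sketched in Python. NumPy's generator with seed 2334 does not reproduce the draws of SAS rannor(2334), so the numbers will differ from the SAS output; the rook weights and the Gaussian-assumption variance used here are assumptions of this illustration, and the helper names are hypothetical.

```python
import math
import numpy as np

def rook_weights(nrows, ncols):
    """Binary rook-contiguity W for a lattice numbered row by row (w_ii = 0)."""
    n = nrows * ncols
    w = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    w[r * ncols + c, rr * ncols + cc] = 1.0
    return w

def moran_normal_z(z, w):
    """Moran's I and Z_obs under the Gaussian assumption (S0, S1, S2 variance)."""
    u = np.asarray(z, dtype=float).ravel()
    n = u.size
    u = u - u.mean()
    s0 = w.sum()
    i_obs = (n / s0) * (u @ w @ u) / (u @ u)
    e_i = -1.0 / (n - 1)
    s1 = 0.5 * ((w + w.T) ** 2).sum()
    s2 = ((w.sum(axis=1) + w.sum(axis=0)) ** 2).sum()
    var_i = (n * n * s1 - n * s2 + 3 * s0 ** 2) / ((n * n - 1) * s0 ** 2) - e_i ** 2
    return i_obs, (i_obs - e_i) / math.sqrt(var_i)

# x varies along the first (row) index and y along the second, matching rook_weights.
x, y = np.meshgrid(np.arange(1, 11), np.arange(1, 11), indexing="ij")
trend = 1.4 + 0.1 * x + 0.2 * y + 0.002 * x * x     # E[Z]: the large-scale trend only
rng = np.random.default_rng(2334)   # NumPy stream: not the same draws as SAS rannor(2334)
z = trend + rng.normal(size=x.shape)                 # independent G(0, 1) errors

w = rook_weights(10, 10)
i_trend, _ = moran_normal_z(trend, w)
i_sim, z_sim = moran_normal_z(z, w)
print(f"I of the trend alone = {i_trend:.3f}")
print(f"I of simulated data  = {i_sim:.3f}, Zobs = {z_sim:.2f}")
```

The errors are independent by construction, so any positive autocorrelation detected here comes from the smooth mean, not from spatial dependence.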

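  • If trend contamination distorts inferences about the spatial autocorrelation coefficient, it seems reasonable to remove the trend and calculate the autocorrelation coefficient from the RESIDUALS. With $\mathbf{X}$ the $(n \times k)$ regression matrix of the large-scale mean model, the residual vector is $\hat{\mathbf{e}} = \mathbf{M}\mathbf{Z}$ with $\mathbf{M} = \mathbf{I}_n - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$, and the modified test statistic is $I^* = \frac{n}{\sum_i\sum_j w_{ij}} \cdot \frac{\hat{\mathbf{e}}'\mathbf{W}\hat{\mathbf{e}}}{\hat{\mathbf{e}}'\hat{\mathbf{e}}}$. Its mean and variance differ slightly from those of I; in particular, $E[I^*]$ now depends on both the weights W and the X matrix.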
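
A minimal sketch of the residual-based statistic, assuming OLS residuals $\hat{e} = MZ$ and the standard expectation $E[I^*] = n\,\mathrm{tr}(MW) / \bigl(S_0 (n - k)\bigr)$ with $S_0 = \sum_i\sum_j w_{ij}$. With an intercept-only X this reduces to the ordinary I and to $E[I] = -1/(n-1)$, which serves as a check. The simulated data, seed, and helper names are hypothetical.

```python
import numpy as np

def rook_weights(nrows, ncols):
    """Binary rook-contiguity W for a lattice numbered row by row (w_ii = 0)."""
    n = nrows * ncols
    w = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    w[r * ncols + c, rr * ncols + cc] = 1.0
    return w

def residual_moran(z, x_mat, w):
    """Moran's I* for OLS residuals and its expectation n tr(MW) / (S0 (n - k))."""
    z = np.asarray(z, dtype=float).ravel()
    n, k = x_mat.shape
    # M = I - X (X'X)^{-1} X' maps Z to the residuals of the mean model.
    m = np.eye(n) - x_mat @ np.linalg.solve(x_mat.T @ x_mat, x_mat.T)
    e = m @ z
    s0 = w.sum()
    i_star = (n / s0) * (e @ w @ e) / (e @ e)
    e_i_star = (n / s0) * np.trace(m @ w) / (n - k)
    return i_star, e_i_star

x, y = np.meshgrid(np.arange(1, 11), np.arange(1, 11), indexing="ij")
x, y = x.ravel().astype(float), y.ravel().astype(float)
rng = np.random.default_rng(2334)                   # arbitrary seed; not the SAS stream
z = 1.4 + 0.1 * x + 0.2 * y + 0.002 * x * x + rng.normal(size=x.size)
w = rook_weights(10, 10)

ones = np.ones_like(x)
x_mean_only = ones[:, None]                         # intercept only: I* reduces to I
x_trend = np.column_stack([ones, x, y, x * x])      # the simulated mean model
i_int, e_int = residual_moran(z, x_mean_only, w)
i_tr, e_tr = residual_moran(z, x_trend, w)
print(f"intercept only: I* = {i_int:.3f}, E[I*] = {e_int:.4f}")
print(f"trend model:    I* = {i_tr:.3f}, E[I*] = {e_tr:.4f}")
```

Once the trend is in X, the residual statistic should fall back toward its expectation, since the simulated errors are independent.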

  • SAS code:

      title1 "Moran's I for Mercer and Hall Wheat Yield Data";
      title2 "Calculated for Regression Residuals";
      %include 'DriveLetterofCDROM:\SASMacros\MoranResiduals.sas';

      /* regressors for the large-scale mean: cubic polynomial in column */
      data xmat;
         set mercer;
         x1 = col;
         x2 = col**2;
         x3 = col**3;
         keep x1 x2 x3;
      run;
      %RegressI(xmat=xmat, data=mercer, z=grain, weight=rook, local=1);

    This code fits a large-scale mean model with cubic column effects and no row effects. Adding higher-order terms for the column effects leaves the results essentially unchanged.

  • The value of $Z_{obs}$ is slightly reduced from Output 9.3 (slide 14), indicating that the column trends did add some spurious autocorrelation. The p-value is still highly significant, so conventional tests that assume independent data are not appropriate here.

  • Optional parameter local=: LISA, Local Indicator of Spatial Association. If the local test statistic for site $s_i$ is less than its expected value, the sites connected to $s_i$ have attribute values dissimilar from $Z(s_i)$: a high (low) value at $s_i$ is surrounded by low (high) values. If the test statistic is greater than its expected value, a high (low) value at $s_i$ is surrounded by high (low) values at the connected sites.
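
One widely used LISA is Anselin's local Moran statistic, $I_i = (U_i / m_2) \sum_j w_{ij} U_j$ with $m_2 = \sum_i U_i^2 / n$, whose expectation under randomization is $-\sum_j w_{ij}/(n-1)$. This sketch assumes that form; the scaling used by the SAS macro's local= output may differ. The rook weights and the two test patterns are hypothetical.

```python
import numpy as np

def rook_weights(nrows, ncols):
    """Binary rook-contiguity W for a lattice numbered row by row (w_ii = 0)."""
    n = nrows * ncols
    w = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    w[r * ncols + c, rr * ncols + cc] = 1.0
    return w

def local_moran(z, w):
    """Local Moran statistics I_i and their expectations -w_i. / (n - 1)."""
    u = np.asarray(z, dtype=float).ravel()
    n = u.size
    u = u - u.mean()
    m2 = (u @ u) / n
    local = (u / m2) * (w @ u)          # U_i / m2 times the weighted sum of neighbors
    expected = -w.sum(axis=1) / (n - 1)
    return local, expected

w = rook_weights(10, 10)
gradient = np.indices((10, 10))[0]                 # hypothetical smooth pattern
checker = np.indices((10, 10)).sum(axis=0) % 2     # hypothetical alternating pattern
local_g, exp_g = local_moran(gradient, w)
local_c, exp_c = local_moran(checker, w)
for name, local, expected in (("gradient", local_g, exp_g), ("checkerboard", local_c, exp_c)):
    above = int(np.sum(local > expected))
    print(f"{name}: {above} of {local.size} sites have I_i above E[I_i]")
```

The local statistics add up to $S_0$ times the global Moran's I, so sites with large positive $I_i$ mark the places that drive the overall autocorrelation.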

  • The graph shows the detrended Mercer and Hall grain yield data, highlighting the sites with positive LISAs. Hot spots, where autocorrelation is locally much greater than in the remainder of the lattice, are obvious.

