applied spatial statistics in r, section 2 - huit sites hostingzhukov/spatial2.pdf · ·...

Applied Spatial Statistics in R, Section 2Spatial Autocorrelation

Yuri M. Zhukov

IQSS, Harvard University

January 16, 2010

Yuri M. Zhukov (IQSS, Harvard University) Applied Spatial Statistics in R, Section 2 January 16, 2010 1 / 1

Spatial Autocorrelation

Outline

1 IntroductionWhy use spatial methods?The spatial autoregressive data generating process

2 Spatial Data and Basic Visualization in R

PointsPolygonsGrids

3 Spatial Autocorrelation

4 Spatial Weights

5 Point Processes

6 Geostatistics7 Spatial Regression

Models for continuous dependent variablesModels for categorical dependent variablesSpatiotemporal models



What is Spatial Autocorrelation?

Spatial autocorrelation measures the degree to which a phenomenonof interest is correlated to itself in space (Cliff and Ord 1973, 1981).

Tests of spatial autocorrelation examine whether the observed valueof a variable at one location is independent of values of that variableat neighboring locations.

Positive spatial autocorrelation indicates that similar values appearclose to each other, or cluster, in space

Negative spatial autocorrelation indicates that neighboring values aredissimilar or, equivalenty, that similar values are dispersed.

Null spatial autocorrelation indicates that the spatial pattern israndom.



Global autocorrelation: Moran’s I

The Moran’s I coefficient calculates the ratio between the product ofthe variable of interest and its spatial lag, with the product of thevariable of interest, adjusted for the spatial weights used.

I =n∑n

i=1

∑nj=1 wij

∑ni=1

∑nj=1 wij(yi − y)(yj − y)∑n

i=1(yi − y)2

where yi is the value of a variable for the ith observation, y is thesample mean and wij is the spatial weight of the connection between iand j .

Values range from –1 (perfect dispersion) to +1 (perfect correlation).A zero value indicates a random spatial pattern.

Under the null hypothesis of no autocorrelation, E[I] = −1n−1



Global autocorrelation: Moran’s I

Calculating the variance of Moran’s I is a little more involved:

Var(I) =ns1 − s2s3

(n − 1)(n − 2)(n − 3)(∑

i

∑j wij)2

s1 =(n2 − 3n + 3)(1

2

∑i

∑j

(wij + wji )2)

− n(∑

i

(∑j

wij +∑j

wji )2)

+ 3(∑i

∑j

wij)2

s2 =n−1

∑i (yi − x)4

(n−1∑

i (yi − x)2)2

s3 =1

2

∑i

∑j

(wij + wji )2 − 2n

(1

2

∑i

∑j

(wij + wji )2)

+ 6(∑

i

∑j

wij

)2



Global autocorrelation: Geary’s C

The Geary’s C uses the sum of squared differences between pairs ofdata values as its measure of covariation.

C =(n − 1)

∑i

∑j wij(yi − yj)

2

2(∑

i

∑j wij)

∑i (yi − y)2

where yi is the value of a variable for the ith observation, y is thesample mean and wij is the spatial weight of the connection between iand j .

Values range from 0 (perfect correlation) to 2 (perfect dispersion). Avalue of 1 indicates a random spatial pattern.



Global autocorrelation: Join Counts

When the variable of interest is categorical, a join count analysis canbe used to assess the degree of clustering or dispersion.

A binary variable is mapped in two colors (Black & White), such thata join, or edge, is classified as either WW (0-0), BB (1-1), or BW(1-0).

Join count statistics can show

positive spatial autocorrelation (clustering) if the number of BW joinsis significantly lower than what we would expect by chance,negative spatial autocorrelation (dispersion) if the number of BW joinsis significantly higher than what we would expect by chance,null spatial autocorrelation (random pattern) if the number of BWjoins is approximately the same as what we would expect by chance.




By the naive definition of probability, if we have nB Black units andnW = n − nB White units, the respective probabilities of observingthe two types of units are:

PB =nBn

PW =n − nB

n= 1− PB

The probabilities of BB and WW in two adjacent cells are

PBB = PBPB = P2B PWW = (1− PB)(1− PB) = (1− PB)2

The probability of BW in two adjacent cells is

PBW = PB(1− PB) + (1− PB)PB = 2PB(1− PB)




The expected counts of each type of join are:

E[BB] =1

2

∑i

∑j

wijP2B E[WW ] =

1

2

∑i

∑j

wij(1− PB)2

E[BW ] =1

2

∑i

∑j

wij2PB(1− PB)

Where 12

∑i

∑j wij is the total number of joins (of any type) on a

map, assuming a binary connectivity matrix.The observed counts are:

BB =1

2

∑i

∑j

wijyiyj WW =1

2

∑i

∑j

wij(1− yi )(1− yj)

BW =1

2

∑i

∑j

wij(yi − yj)2

where yi = 1 if unit i is Black and yi = 0 if White.Yuri M. Zhukov (IQSS, Harvard University) Applied Spatial Statistics in R, Section 2 January 16, 2010 9 / 1



The variance of BW is calculated as

σ2BW =E[BW 2]− E[BW ]2

=1

4

(2s2nB(n − nB)

n(n − 1)+

(s3 − s1)nB(n − nB)

n(n − 1)

+4(s2

1 + s2 − s3)nB(nB − 1)(n − nB)(n − nB − 1)

n(n − 1)(n − 2)(n − 3)

)− E[BW ]2

s1 =∑i

∑j

wij

s2 =1

2

∑i

∑j

(wij − wji )2

s3 =∑i

(∑j

wij +∑j

wji )2




A test statistic for the BW join count is

Z(BW ) =BW − E[BW ]√

σ2BW

The join count statistic is assumed to be asymptotically normallydistributed under the null hypothesis of no spatial autocorrelation.

The test of significance is then provided by evaluating the BWstatistic as a standard deviate (Cliff and Ord, 1981).



Local autocorrelation

Global tests for spatial autocorrelation are calculated from localrelationships between observed values at spatial units and theirneighbors.

It is possible to break these measures down into their components,thus constructing local tests for spatial autocorrelation.

These tests can be used to detect

Clusters, or units with similar neighborsHotspots, or units with dissimilar neighbors



Local autocorrelation

Below is a scatterplot of county vote for Bush and its spatial lag (averagevote received in neighboring counties). The Moran’s I coefficient is drawnas the slope of the linear relationship between the two. The plot ispartitioned into four quadrants: low-low, low-high, high-low and high-high.

0 20 40 60 80 100

020

4060

80100

Moran Scatterplot

Percent for Bush

Per

cent

for B

ush

(Spa

tial L

ag)

GlacierRolette

St. Louis

San Juan

Roosevelt

Clallam

Jefferson

GarfieldMissoula

Custer

DouglasAshland

SiouxDeer Lodge

Carter

Big Horn

Harding

Multnomah

Dewey

Valley

Ramsey

Yellowstone National Park (Part)

ClintonLamoilleChittenden

Clark

Buffalo

Blaine Teton Madison

Mower

Ada

Shannon Todd

Windham

Bannock

Sioux

Berkshire

Franklin

Cassia

HampshireSuffolk

FranklinBlaineBox ElderRich

Arthur

Geauga

Banner

Cuyahoga

Dukes

WeberLincoln

Nantucket

Morgan

Summit

Mahoning

Bergen

Bronx

Salt Lake

Nassau

Essex

New York

UintahDuchesne

Hudson

Queens

Kings

Hayes

Richmond

Utah

Philadelphia

Marion

Carbon

DenverPitkin

Montgomery

Monroe

Anne ArundelPrince Georges

Washington

St. Louis

Sonoma

Monroe

Marin

Elliott

San Miguel

Garfield

DoloresAlameda

San FranciscoSan Mateo

Costilla

Jackson

KnottSanta Clara

Santa Cruz

San JuanRio Arriba

Apache

TaosHertfordWarren

Person

Ochiltree

OrangeDurham

Kingfisher

Roberts

Santa Fe

McKinley

Los Alamos

San Miguel Tipton

Potter

Oldham

St. FrancisTorrance De Soto

Tunica

Parmer

CatronGrant

Florence

Foard

Fulton

Clarke Jefferson

TaliaferroCarroll

Clayton

ChicotFayette

Hancock

Bamberg

Holmes

DorchesterGlascock

NoxubeeGreene Allendale

West Carroll

Dallas

Sumter

Shackelford

Perry

Warren

MaconEffingham

Marengo

SchleyLowndes

BullockWilcox

Beaufort

Jasper

Claiborne

SterlingGlasscock

Loving

Lee

Jefferson Davis

Crane

Miller

Decatur

Grady

Gadsden Livingston

DuvalJefferson

Orleans

St. Bernard

Alachua

LaFourche

Bexar

ZavalaMcMullen

Dimmit

Duval

BrooksStarr

Willacy

Menominee

Orleans

Monroe

WayneDane

Milwaukee

LivingstonWayneWashtenaw

Broome Cook

Monroe

Lagrange

Baltimore City

Fairfax

ArlingtonAlexandria

HarrisonburgStaunton

Charlottesville

Clifton ForgeRichmond City

Colonial Heights

Petersburg

Poquoson CitySouth Boston



Local autocorrelation: Local Moran’s I

A local Moran’s I coefficient for unit i can be constructed as one ofthe n components which comprise the global test:

Ii =(yi − y)

∑nj=1 wij(yj − y)∑n

i=1(yi−y)2

n

As with global statistics, we assume that the global mean y is anadequate representation of the variable of interest.

As before, local statistics can be tested for divergence from expectedvalues, under assumptions of normality.



Local autocorrelation: Local Moran’s IBelow is a plot of Local Moran |z |-scores for the 2004 PresidentialElections. Higher absolute values of z scores (red) indicate the presence of“hot spots”, where the percentage of the vote received by President Bushwas significantly different from that in neighboring counties.

Local Moran's I (|z| scores)

0

5

10

15

20



Words of Caution

1 Autocorrelation tests are highly sensitive to spatial patterning in thevariable of interest from any source. But by assuming that the meanmodel removes such systematic spatial patterning, spatialautocorrelation tests do not always produce useful insights into theDGP.

2 These tests are also highly sensitive to one’s choice of spatial weights.Where the weights do not reflect the “true” structure of spatialinteraction, estimated autocorrelation (or lack thereof) may actuallystem from misspecification.



Words of Caution

Below is a correlogram of Moran’s I coefficients for Polity IV countrydemocracy scores in 2008. The x-axis represents distances betweencountry capitals, in kilometers. Here, democracy is significantly (p ≤ .05)spatially autocorrelated only at distances of 3,000 km and below. So,autocorrelation estimates will depend highly on choice of lag distance.

0 2000 6000 10000 14000 18000 22000 26000 30000 34000 38000

-1.0

-0.6

-0.2

d2m(corD1[, 1])

Mor

an's

I C

oeffi

cien

t

0 2000 6000 10000 14000 18000 22000 26000 30000 34000 38000

0.00.20.40.60.81.0

p-value

p ≤ 0.05



Words of Caution

1 Autocorrelation tests are highly sensitive to spatial patterning in thevariable of interest from any source. But by assuming that the meanmodel removes such systematic spatial patterning, spatialautocorrelation tests do not always produce useful insights into theDGP.

2 These tests are also highly sensitive to one’s choice of spatial weights.Where the weights do not reflect the “true” structure of spatialinteraction, estimated autocorrelation (or lack thereof) may actuallystem from misspecification.

3 As originally designed, spatial autocorrelation tests assumed there areno neighborless units in the study area. When this assumption isviolated, the size of n may be adjusted (reduced) to reflect the factthat some units are effectively being ignored. Not doing so willgenerally bias the absolute value for the autocorrelation statisticupward and the variance downward.



Examples in R

Switch to R tutorial script. Section 2.


applied spatial statistics in r, section 2 - huit sites hostingzhukov/spatial2.pdf · ·...

Documents