Advanced Spatial Analysis
Spatial Regression Modeling
Day 4
Paul R. Voss and Katherine J. Curtis
GISPopSci
Review of yesterday
• Spatial processes
– spatial heterogeneity
– spatial dependence
• Spatial regression models
• Various specifications for spatial dependence
– spatial lag model
– spatial error model
– higher-order models
• Afternoon lab
– spatial regression modeling in GeoDa & R
GISPopSci
Questions?
GISPopSci
Plan for today
• Understanding spatial heterogeneity in relationships
• Local multivariate methods for spatial data analysis
• Introduction to Geographically Weighted Regression (GWR)
– theory & concept
– uses of GWR
– cautions regarding GWR
• Discrete spatial heterogeneity in relationships
• Lab: GWR in R; spatial regime analysis in R
Review: Spatial Dependence & Spatial Heterogeneity
GISPopSci
Spatial dependence…
the existence of a functional relationship between what happens at one point in space & what happens elsewhere
Spatial heterogeneity…
exists when the mean, and/or variance, and/or covariance
structure “drifts” over a mapped process
GISPopSci
Spatial heterogeneity…
• Typified by regional differentiation
• Reflects “spatial continuities” of social processes which, “taken together help bind social space into recognizable structures”
– a “mosaic of homogeneous (or nearly homogeneous)” areas in which
each is different from its neighbors (Haining 1990:22)
GISPopSci
Suppose we observe the following map…
% Child Poverty US South, Census 2000
GISPopSci
The question for us is…
Is this observed spatial distribution of child poverty generated by spatial heterogeneity or spatial dependence (or nuisance)?
It’s not always easy to know…
% Child Poverty US South, Census 2000
GISPopSci
Iterate as needed…
1. EDA & ESDA on variables—global & local patterns of spatial
autocorrelation under different neighborhood specifications
2. OLS baseline model & accompanying diagnostics
3. Correct for spatial heterogeneity if indicated
4. (With possible controls for spatial heterogeneity) estimate & contrast
spatial error & spatial lag model results
GISPopSci
LISA map of PPOV (1st order queen weights)…
GISPopSci
The question for us is…
Is the process generating poverty in the Mississippi Delta the same as the process generating poverty in Appalachia, or are there different processes?
In other words, is there spatial heterogeneity in the relationships?
% Child Poverty US South, Census 2000
GISPopSci
Spatial Heterogeneity in Relationships
GISPopSci
“The term spatial heterogeneity refers to variation in
relationships over space.”
James P. LeSage
Spatial Econometrics
December, 1998, p. 6
(book manuscript online at
http://www.spatial-econometrics.com/html/wbook.pdf)
GISPopSci
Constancy Assumption…
• Slope of a regression line (or average association among all units)
applies to separate units that comprise the whole (Freedman et
al. 1991:678)
– Unemployment has same association with child poverty in all
counties
GISPopSci
Spatial Heterogeneity…
• Regionally-specific circumstances influence structural
relationships (O’Loughlin et al. 1994)
– Unemployment has different associations with child poverty in
different counties
GISPopSci
Aspatial Context…
Individual wage returns (y) to achieved education (x) by gender
(in some hypothetical advanced society)
A0 = male
A1 = female
GISPopSci
Spatial Context…
Median wage returns (y) to HS+ education (x) by region
(hypothetical values)
A0 = South
A1 = non-South
GISPopSci
Spatial Context…
• Differentiation in the magnitude & nature of relationships
across the spatial region
• Geographic space represents a physically bounded area that
holds social characteristics
– which intersect to create divergent social, economic, & political
outcomes
– across sub-areas within the larger spatial region
GISPopSci
A Brief Digression: “Neighborhood Effects” or “Contextual
Effects”
GISPopSci
Contextual effects…
“[T]he essential feature of all contextual-effects models is an
allowance for macro processes that are presumed to have
an impact on the individual actor over and above the effects of any individual-level variables that may be operating.”
Hubert M. Blalock, Annual Review of Sociology (1984:354)
“Putting people into place means explaining behavior and
outcomes in relation to a potentially changing local context.”
Barbara Entwisle, Demography (2007:687)
GISPopSci
Broad Social, Economic, Cultural, Health, & Environmental Conditions & Policies at the Global, National, & Local Levels
Living & Working Conditions
Social, Family, & Community Networks
Individual Behavior
Individual Characteristics
Adapted from Luke (2004:5)
Contextual layers…
GISPopSci
Conceptual motivations…
• Ecological & atomistic fallacies
– Misattribution of relationships discovered at one level to relationships at another level
• Collectives and their members
– Both have properties that can be dis/aggregated across levels
– But the relationships between the properties may differ between the levels
GISPopSci
Statistical motivations…
• Non-independence in error structure
– Correlated errors
– Inaccurate standard errors
• Coefficients apply equally to all contexts
– Relationship assumed stable across contexts (constancy assumption!)
GISPopSci
[Figure: two diagrams.
Variation in outcomes: Community Inequality (Community i), Community Inequality (Community j), Individual Health (Individual i, j).
Variation in the effects: Individual Characteristic (Individual i), Community Inequality (Community j), Individual Health (Individual i).
Paths are marked * and ***.]
GISPopSci
“Place” versus “Space”…
[Figure: one panel shows an Individual in Place i, described by Labor Market Structure, Political Climate, Population Composition, and Population Health; the other shows an Individual in Place i alongside Places j, k, l, m, n, o, p, and q.]
GISPopSci
“Multilevel models do not incorporate any notion of space and, as such, may be described as nonspatial: they consider the neighborhood affiliation of individuals but neglect spatial connections between neighborhoods.”
Basile Chaix et al., American Journal of Epidemiology (2005:177)
GISPopSci
Introducing “space” into “place” framework…
“[The multilevel approach] fragments space into administrative neighborhoods and ignores spatial associations between them.”
Basile Chaix et al., American Journal of Epidemiology (2005:171)
“A more dynamic conceptualization is needed that…integrates multiple dimensions of local social and spatial context…”
Barbara Entwisle, Demography (2007:687)
GISPopSci
Multiple membership model…
[Figure: Household, Individual, Occasion. From Goldstein et al. (2000)]
GISPopSci
Extended multiple membership model…
[Figure: Household j, Individual i, Occasion t; Household k, Individual i, Occasion t+1. From Goldstein et al. (2000)]
GISPopSci
Spatially lagged predictors…
[Figure: an Individual in Place i, described by Labor Market Structure, Political Climate, Population Composition, and Population Health, shown alongside Places j, k, l, m, n, o, p, and q.]
GISPopSci
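Standard multilevel model…
X_{1ij}: individual characteristic
Z_{1j}: contextual characteristic
\[ Y_{ij} = \gamma_{00} + \gamma_{10} X_{1ij} + \gamma_{01} Z_{1j} + \gamma_{11} Z_{1j} X_{1ij} + [\, u_{0j} + u_{1j} X_{1ij} + e_{ij} \,] \]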
GISPopSci
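Multilevel model with spatially lagged predictors…
X_{1ij}: individual characteristic
Z_{1j}: contextual characteristic
WZ_{1j}: spatially-weighted contextual characteristic
\[ Y_{ij} = \gamma_{00} + \gamma_{10} X_{1ij} + \gamma_{01} Z_{1j} + \gamma_{11} Z_{1j} X_{1ij} + \delta_{01} WZ_{1j} + \delta_{11} WZ_{1j} X_{1ij} + [\, u_{0j} + u_{1j} X_{1ij} + e_{ij} \,] \]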
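The spatially-weighted contextual term WZ can be illustrated with a small sketch. It assumes row-standardized contiguity weights, so each place's lagged value is the mean of Z over its neighbors; the place names, neighbor lists, and values are hypothetical and not from the slides.

```python
from typing import Dict, List


def spatial_lag(z: Dict[str, float], neighbors: Dict[str, List[str]]) -> Dict[str, float]:
    """Row-standardized spatial lag: the mean of z over each place's neighbors."""
    lag = {}
    for place, nbrs in neighbors.items():
        # Each neighbor gets weight 1/len(nbrs), so the weights in a row sum to 1.
        lag[place] = sum(z[n] for n in nbrs) / len(nbrs)
    return lag


# Hypothetical contextual values Z_1j for three places arranged in a line A - B - C.
z1 = {"A": 0.2, "B": 0.4, "C": 0.9}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
wz1 = spatial_lag(z1, neighbors)  # WZ_1j, ready to enter the model as an extra predictor
```

Place B's lagged value averages A and C, so it carries information about the surrounding places rather than about B itself.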
GISPopSci
The point being…
• Multilevel modeling framework consistent with concept of heterogeneity in relationships
– relationships & outcomes might be conditioned by place
• But not necessarily “spatial” heterogeneity
– though framework can be modified to explicitly incorporate “space,” not just “place”
GISPopSci
Geographically Weighted Regression (continuous spatial heterogeneity)
GISPopSci
This week we’ve looked at the results of a simple OLS
multivariate regression model
Dependent variable: sqrt(PPOV)
Independent variables: sqrt(UNEM), sqrt(PFHH), log(HSPLUS)
GISPopSci
What about all those diagnostics at the end of the GeoDa regression output?
They revealed a specification with lots of problems
GISPopSci
Recall, the lower half of the GeoDa output from the OLS regression run looked like this
GISPopSci
Let’s remind ourselves…
What’s the take-home message?
• Perhaps continue explorations in R (richer diagnostic environment)
• But, unless we’re very fortunate, the OLS model diagnostics almost always leave us with a bad taste. Why?
– BECAUSE WE WANT TO MOVE ON! We want to do something about the spatial autocorrelation in the residuals
– BUT neither the residual Moran statistic nor the Lagrange multiplier statistics are trustworthy in the presence of non-normality & heteroskedasticity
– Furthermore, econometric simulations have shown that in the presence of residual spatial autocorrelation, heteroskedasticity is induced
– We’ve got problems!!
– This is where we begin to look for ways of reducing the unresolved heterogeneity that appears to be plaguing our OLS model
GISPopSci
What to do?
• Our model was very simple. Surely we could add important covariates that might improve the statistical qualities of our residuals
– additional covariates?
– corrections for large 1st-order trends in the data?
• Alternative functional forms?
• Other specifications?
– interactions?
– spatial regime approaches?
• Here’s where we may get some additional guidance from GWR
GISPopSci
GWR
Background
• Social processes are non-stationary
• We have come to accept the reality that phenomena vary depending on where they are measured
– Certainly for single variables
– But multivariate relationships?
• It is not at all uncommon to see published research studies that specify and estimate multivariate models (based, say, on census data) that report only “global” regression results
– e.g., the relationship of median HH income to home ownership rates for counties across the U.S.
GISPopSci
Local methods for spatial data analysis: Lots of them!
• Local univariate spatial data analysis
– point pattern clustering; scan statistics
– local graphical analysis; dynamically linked windows
– local filtering
– local measures of spatial dependence
• Local multivariate spatial data analysis
– spatial expansion models
– multilevel modeling
– random coefficient models; spatial regime models
– geographically weighted regression
GISPopSci
What is GWR?
• Method of exploratory spatial data analysis
• Software (GWR 3.x & spgwr in R)
• Book
• Website
• Very much identified with three people…
GISPopSci
GWR 3.x
Software developed by:
• A. Stewart Fotheringham
• Martin Charlton
• Chris Brunsdon
University of Newcastle upon Tyne (at the time)
Specifically, GWR is a tool for exploring and identifying variation in statistical relationships over space
It’s a way of exploring “spatial heterogeneity” (“spatial non-stationarity”); i.e., where the same
stimulus provokes a different response in different parts of the study region
GISPopSci
Linear regression model:
\[ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i \]
OLS estimator:
\[ \hat{\beta} = (X'X)^{-1} X'Y \]
Assumptions:
• Gauss-Markov assumptions
• Parameters constant over space
• If there’s spatial heterogeneity, we only see it through the residuals
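As an illustration of the OLS estimator, the following sketch solves the normal equations (X'X)b = X'Y directly. It is a teaching sketch under stated assumptions, not what GeoDa or R do internally: the helper names `solve` and `ols` and the noise-free data are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are left untouched.
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def ols(X: Matrix, y: List[float]) -> List[float]:
    """Return the solution of (X'X) b = X'y; X must already contain a column of 1s."""
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    return solve(xtx, xty)


# Hypothetical data generated exactly from y = 1 + 2*x1 - 3*x2 (no noise).
X = [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0], [1.0, 2.0, 2.0], [1.0, 3.0, 1.0], [1.0, 4.0, 3.0]]
y = [1 + 2 * r[1] - 3 * r[2] for r in X]
beta_hat = ols(X, y)  # one coefficient vector for the whole study area
```

The single `beta_hat` is the point of the slide: every observation, wherever it is located, shares the same coefficients.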
So… what to do?
• We’ve looked at several approaches this week:
– Map residuals; look for spatial patterning
– Compute an autocorrelation statistic for the residuals
– “Model” the error dependency using a spatial regression model
• But… why not address the issue of spatial nonstationarity directly, and allow the relationships to vary over space?
GISPopSci
Geographically Weighted Regression model:
\[ y_i = \beta_{i0} + \beta_{i1} x_{i1} + \beta_{i2} x_{i2} + \dots + \beta_{ik} x_{ik} + \varepsilon_i \]
GWR estimator:
\[ \hat{\beta}_i = (X' W_i X)^{-1} X' W_i Y \]
Where W_i is a matrix of weights specific to location i such that observations nearer to i are given greater weight than observations further away
GISPopSci
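Where w_{ik} is the weight given to data at location k for the estimate of the local parameters at location i
\[ W_i = \begin{pmatrix} w_{i1} & 0 & 0 & \cdots & 0 \\ 0 & w_{i2} & 0 & \cdots & 0 \\ 0 & 0 & w_{i3} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & w_{in} \end{pmatrix} \]
Optimizing the weights matrix is a computational task of O(n^2), so for large data sets it takes some time. Good news: we need only derive this once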
Note: if Wi = I (identity matrix), each observation in the data has a weight of unity, and the GWR model reduces to the OLS model
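A minimal sketch of the local estimator may help here. It assumes a single predictor plus an intercept, a Gaussian distance-decay kernel with a fixed bandwidth (the kernel choice is an assumption; the slides defer kernel details to later), and a hypothetical transect of ten locations. The function names `wls` and `gwr_at` are illustrative and are not the spgwr API.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def wls(w: Sequence[float], x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Weighted least squares for y = b0 + b1*x; returns (b0, b1).

    This is (X'WX)^{-1} X'WY written out for an intercept and one predictor.
    """
    sw = sum(w)
    xbar = sum(wk * xk for wk, xk in zip(w, x)) / sw
    ybar = sum(wk * yk for wk, yk in zip(w, y)) / sw
    sxx = sum(wk * (xk - xbar) ** 2 for wk, xk in zip(w, x))
    sxy = sum(wk * (xk - xbar) * (yk - ybar) for wk, xk, yk in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def gwr_at(i: int, coords: Sequence[Point], x: Sequence[float], y: Sequence[float],
           bandwidth: float) -> Tuple[float, float]:
    """Local coefficients at location i using Gaussian weights (assumed kernel)."""
    w = [math.exp(-0.5 * (math.dist(coords[i], c) / bandwidth) ** 2) for c in coords]
    return wls(w, x, y)


# Hypothetical transect of 10 locations; the true slope is 1 in the west, 3 in the east.
coords = [(float(j), 0.0) for j in range(10)]
x = [1.0, 2.0, 3.0, 4.0, 5.0] * 2
y = [xj * (1.0 if j < 5 else 3.0) for j, xj in enumerate(x)]

global_b0, global_b1 = wls([1.0] * len(x), x, y)            # W_i = I: ordinary least squares
west_b0, west_b1 = gwr_at(0, coords, x, y, bandwidth=1.0)   # local fit at the west end
east_b0, east_b1 = gwr_at(9, coords, x, y, bandwidth=1.0)   # local fit at the east end
flat_b0, flat_b1 = gwr_at(0, coords, x, y, bandwidth=1e6)   # huge bandwidth: near-equal weights
```

With a small bandwidth the local slopes recover the two regimes, while a very large bandwidth makes all weights nearly equal and the local fit collapses to the global OLS fit, as the note above says.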
GISPopSci
Weights are determined using a kernel estimation function
Let’s illustrate how it works
I’ll show you some more detail about kernel function options near the end of today, and (in lab) how to derive and optimize the weights
GISPopSci
Data used for this illustration:
Proportion of Children in Poverty (transformed)
2000 Census of Population (U.S.)
Counties are unit of analysis (n = 3,074)
Contiguous 48 states only
Most indep. cities merged with surrounding county
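The maps that follow are labeled as log odds, so the transformation is presumably the logit of the county proportion. A small sketch under that assumption, with a hypothetical poverty proportion:

```python
import math


def logit(p: float) -> float:
    """Log odds of a proportion p, defined only for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("proportion must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inv_logit(v: float) -> float:
    """Map a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-v))


rate = 0.25                        # hypothetical county child poverty proportion
log_odds = logit(rate)             # ln(0.25 / 0.75), a negative number
recovered = inv_logit(log_odds)    # back on the proportion scale
```

The transform stretches proportions near 0 and 1 onto an unbounded scale, which is why coefficients in the models that follow are read in log-odds units rather than percentage points.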
GISPopSci
Log Odds of Proportion of Children in Poverty: 2000
GISPopSci
The “global regression model”
GISPopSci
Our Model (after Friedman and Lichter, 1998)
Dependent variable: Log Odds % Children in Poverty
Industrial Structure: % Extr. Ind.; % Non Dur Ind.; % Misc. Svcs.; % Prof. Svcs
Emp. Oppy. Structure: % Unemp; % Males under emp.
Family Structure: % of families w/ children headed by females
Control Variables: % Hispanic; % Black; % HS or less; % Emp. in county
Let’s make sure we understand this model
• Ecological regression (with all the attendant caveats)
• Friedman & Lichter fit the model using weighted logistic regression, but were not aware (at the time) of spatial modeling
• Should have been fit using formal spatial regression approaches (we did this in a replication of the research)
GISPopSci
Now, we’ll take a look at this model from a GWR perspective
GISPopSci
Here’s how it works…
[Figure: map of Log Odds of Proportion of Children in Poverty: 2000, with points marked]
GISPopSci
Regression Intercept
Global = -4.984
GISPopSci
T-value: Regression Intercept
GISPopSci
Variable: Prop. H.S. Educ. or Less
Global: 2.6
Local: -4.1 - +6.5
GISPopSci
Variable: Proportion Labor Force Unemployed
Global: 4.2
Local: -9.0 - +14.2
GISPopSci
Variable: Prop. Extractive Industries
Global: 2.5
Local: -14.6 - +25.5
GISPopSci
Variable: Prop. Female Headed Households
Local: -4.3 - +8.4 ns
Global: 3.9
GISPopSci
Local R2 Values
Global R2: 0.713
So? What can be done with this other than making pretty maps?
• Exploratory spatial data analysis
– Look for possible interaction effects
– Regime analysis
– Policy tool
• For example…
GISPopSci
[Figure: map panels labeled PHSLS, PUNEM ns, PFHH ns, PPSRV, PNDMFG, PEXTR]
GISPopSci
Strengths of GWR
• Potentially important tool when exploring spatial data. Nothing is the same everywhere. Helps you to understand spatial heterogeneity in your data.
• Provides better understanding of the global model. Serves as a device for possibly identifying specification errors in the global model (e.g., important interaction effects). Thus, GWR and local analysis become a potential model-building procedure
• Permits logistic regression and Poisson regression as well as “normal” regression
GISPopSci
Okay. Pretty slick! Everyone likes this, right?
Well, No
GISPopSci
Some faults regarding GWR
• Very rote. Regression undertaken at each “regression point” without much care regarding regression assumptions
• Data with spatially autocorrelated residuals fit with OLS rather than spatial regression model (MLE, IV/GMM)
• The results are not easily amenable to tabular presentation; they often make great maps, however
GISPopSci
But wait… There’s more!
• Usual rule in regression: n observations, k parameters; n >> k
• GWR fits n × k parameters with only n observations
• Some unusual results arising from GWR are not yet fully understood. For example, it has been observed and commented upon that GWR can sometimes generate high (negative) correlations among estimated parameters
GISPopSci
[Figure: scatterplot of local GWR parameter estimates, PARM_2 (vertical axis) vs. PARM_9 (horizontal axis)]
Parameters: Proportion Black vs. Proportion Nondurable Manufacturing
Pearson’s r = -.729
[Figure: scatterplot of the variables, PROBLCK (vertical axis) vs. PRONMAN (horizontal axis)]
Variables: Proportion Black vs. Proportion Nondurable Manufacturing
Pearson’s r = .368
GISPopSci
David Wheeler & Michael Tiefelsdorf
Journal of Geographical Systems 7(2005):161-187
• “GWR appears to amplify regression parameter correlations present in global model”
• “One local parameter pattern can be used to predict another local parameter pattern”
• “Misspecified kernel function increases coefficient correlation”
• “Perhaps, local spatial autocorrelation among the residuals influences the parameter correlation”
Let’s take a look at what we get with the Southern Counties
shape file as a preview of this afternoon’s lab
• Richer understanding of kernel weighting options
• GWR results from the model regressing sqrt(PPOV) on sqrt(UNEM), sqrt(PFHH), and log(HSPLUS)
• Examine parameter relationships and residuals
GISPopSci
This afternoon we’ll take a closer look at the kernel options available in GWR & R.
The following resulted from bw search:
Adaptive bandwidth kernels are difficult to plot, as bw varies for different areas within the study area
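To see why an adaptive bandwidth varies across the study area, here is a sketch that sets each location's bandwidth to the distance to its k-th nearest neighbor and then applies a bisquare kernel. Both choices are common in the GWR literature but are assumptions here; the slides do not say which kernel the bw search used, and the coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def bisquare(d: float, h: float) -> float:
    """Bisquare kernel: weight falls smoothly from 1 at d = 0 to 0 at d >= h."""
    return (1.0 - (d / h) ** 2) ** 2 if d < h else 0.0


def adaptive_bandwidth(i: int, coords: Sequence[Point], k: int) -> float:
    """Bandwidth at location i = distance to its k-th nearest neighbor (excluding i)."""
    dists = sorted(math.dist(coords[i], c) for j, c in enumerate(coords) if j != i)
    return dists[k - 1]


# Hypothetical centroids: a dense cluster of four small units and two isolated units.
coords = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 0.0), (9.0, 0.0)]

bw_dense = adaptive_bandwidth(0, coords, k=3)    # small: neighbors are close by
bw_sparse = adaptive_bandwidth(5, coords, k=3)   # large: neighbors are far away
# Under bisquare, the k-th neighbor itself sits on the kernel edge and gets weight 0.
weights_sparse = [bisquare(math.dist(coords[5], c), bw_sparse) for c in coords]
```

The kernel stretches where units are sparse and shrinks where they are dense, so there is no single kernel shape to draw for the whole map.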
GISPopSci
Map of GWR results for independent variable sqrt(PFHH)
Global: 0.387
GISPopSci
Map of GWR results for independent variable sqrt(UNEM)
Global: 0.828
GISPopSci
Map of OLS residuals and GWR residuals
Global: Moran’s I = 0.451
GWR: Moran’s I = 0.059
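The residual Moran’s I values reported above can be computed along the following lines. This sketch uses a hypothetical four-unit chain with binary contiguity weights rather than the Southern counties data, and it returns only the statistic, with no inference.

```python
from typing import List


def morans_i(values: List[float], w: List[List[float]]) -> float:
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]          # deviations from the mean
    s0 = sum(sum(row) for row in w)         # sum of all weights
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)


# Hypothetical chain of four areas: 0 - 1 - 2 - 3 (binary contiguity weights).
w_line = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
trend = morans_i([1.0, 2.0, 3.0, 4.0], w_line)       # similar neighbors: positive I
checker = morans_i([1.0, -1.0, 1.0, -1.0], w_line)   # alternating signs: negative I
```

A drop in residual Moran’s I like the one above, from the global model to GWR, is read the same way: the GWR residuals of neighboring units resemble each other much less than the OLS residuals do.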
GISPopSci
Map of GWR R-squared values
GISPopSci
Boxplots of GWR parameters & R2
Anything interesting in the correlations among the GWR
parameters?
GISPopSci
cor: -0.178
GISPopSci
cor: 0.106
GISPopSci
cor: 0.138
What to make of it all…
• While there’s lots of modeling going on, GWR is primarily a tool for exploring spatial heterogeneity in your data
– does it point to revised specification?
– perhaps regime analysis?
• The Leung et al. (2000) test is useful for testing whether the range of GWR parameter values is significant
– ours were all highly significant
– when not, mixed GWR models are suggested (tools for actually doing this are not readily at hand)
GISPopSci
Spatial Regime Analysis (discrete spatial heterogeneity)
GISPopSci
Spatial Regime Analysis…
• Discrete spatial heterogeneity (versus continuous spatial
heterogeneity)
• Allows model coefficients to vary between discrete spatial
subsets of the data (interactions)
GISPopSci
In practice…
• Interact each explanatory variable with each sub-region dummy
variable
– unemployment X Delta (0,1)
• Chow test to assess significance of regimes
– spatial Chow test necessary when spatial autocorrelation evidenced
– test on residuals, F distribution (non-spatial) or Chi-squared distribution
(spatial)
• Can incorporate small-scale spatial effects
– spatial diagnostics
– spatial lag / spatial error
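The classic (non-spatial) Chow test can be sketched as follows. It compares a pooled fit with separate fits per regime, which is equivalent to interacting every term with the regime dummy. The sketch assumes one predictor plus an intercept (k = 2) and hypothetical data; the spatial Chow test mentioned above is not implemented here.

```python
from typing import List


def ssr_simple_ols(x: List[float], y: List[float]) -> float:
    """Residual sum of squares from an OLS fit of y = b0 + b1*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum(
        (xi - xbar) ** 2 for xi in x
    )
    b0 = ybar - b1 * xbar
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


def chow_f(x1: List[float], y1: List[float], x2: List[float], y2: List[float],
           k: int = 2) -> float:
    """Chow F statistic with k coefficients per regime (k = 2: intercept and slope)."""
    ssr_pooled = ssr_simple_ols(x1 + x2, y1 + y2)                # coefficients constrained equal
    ssr_split = ssr_simple_ols(x1, y1) + ssr_simple_ols(x2, y2)  # coefficients free by regime
    n = len(x1) + len(x2)
    return ((ssr_pooled - ssr_split) / k) / (ssr_split / (n - 2 * k))


# Hypothetical regimes: slope 1 in regime A, slope 3 in regime B, small fixed deviations.
x_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
noise = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
y_a = [1.0 + 1.0 * xi + e for xi, e in zip(x_a, noise)]
y_b = [1.0 + 3.0 * xi + e for xi, e in zip(x_a, noise)]

f_different = chow_f(x_a, y_a, x_a, y_b)   # large: the regimes differ
f_same = chow_f(x_a, y_a, x_a, y_a)        # essentially zero: identical regimes
```

The statistic is referred to an F distribution with k and n - 2k degrees of freedom; a large value suggests the coefficients are not stable across regimes.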
GISPopSci
County Child Poverty Rates (log odds), 2000…
GISPopSci
County Child Poverty Rates (log odds), 2000…
GISPopSci
County Child Poverty, South & non-South…
                                            South             Non-South         Structural
                                                β      SE         β      SE
Racial/Ethnic Concentration
  Proportion African American                0.34      0.21   -0.46 ***  0.09    12.17 ***
  Proportion Native American                 0.53 ***  0.11    0.57 *    0.24     0.03
  Proportion Hispanic                        0.28 **   0.10    0.32 ***  0.08     0.11
Economic Conditions †
  Farming Dependent                          0.34 ***  0.02    0.14 ***  0.03    32.78 ***
  Mining Dependent                           0.06      0.04    0.03      0.03     0.53
  Manufacturing Dependent                   -0.10 ***  0.02   -0.05 *    0.02     4.89 *
  Federal/State Government Dependent        -0.03      0.03   -0.03      0.02     0.02
  Services Dependent                        -0.07 **   0.02   -0.08 *    0.03     0.06
  Proportion Unemployed                      2.19 ***  0.42    3.61 ***  0.43     5.62 *
  Proportion Male Underemployed              2.43 ***  0.19    1.72 ***  0.21     6.46 *
Demographic Structure
  Proportion Female-Headed Households        3.07 ***  0.23    3.36 ***  0.24     0.78
  Proportion Disabled                        2.03 ***  0.27    1.70 ***  0.26     0.82
  Proportion Age 65 & Older                  0.31      0.23    0.63 **   0.23     0.99
  Proportion Foreign-Born                    0.54 **   0.20    0.75 **   0.25     0.40
Human Capital
  Proportion High School Educated or Less    1.41 ***  0.10    1.43 ***  0.12     0.01
Constant                                    -3.75 ***  0.08   -3.62 ***  0.08     1.47
Spatial Lag Parameter (ρ)                    0.33 ***  0.01

Chow Test for Structural Instability across Regimes   106.59 ***
Likelihood Ratio Test for Spatial Lag                 482.11 ***
-2 Log Likelihood                                    -299.09

* p < .05, ** p < .01, *** p < .001
† Nonspecialized economic dependence is the reference category.
GISPopSci
GWR as a descriptive tool…
GISPopSci
Spatial regime analysis of Plantation / Non-Plantation Belt…
GISPopSci
Implications for the outcome variable…
GISPopSci
Readings for today
• Fotheringham, A. Stewart, and Chris Brunsdon. 1999. “Local Forms of Spatial Analysis.” Geographical Analysis 31(4):340-358.
• Wheeler, David, and Michael Tiefelsdorf. 2005. “Multicollinearity and Correlation among Local Regression Coefficients in Geographically Weighted Regression.” Journal of Geographical Systems 7:161-187.
• O’Loughlin, John, Colin Flint, and Luc Anselin. 1994. “The Geography of the Nazi Vote: Context, Confession, and Class in the Reichstag Election of 1930.” Annals of the Association of American Geographers 84(3):351-380.
• Cahill, Meagan, and Gordon Mulligan. 2007. “Using Geographically Weighted Regression to Explore Local Crime Patterns.” Social Science Computer Review 25(2):174-193.
• Grose, Daniel, Chris Brunsdon, and Richard Harris. No date. Introduction to Geographically Weighted Regression (GWR) and to Grid Enabled GWR.
• Harris, Richard, Alex Singleton, Daniel Grose, Chris Brunsdon, and Paul Longley. 2010. “Grid-enabling Geographically Weighted Regression: A Case Study of Participation in Higher Education in England.” Transactions in GIS 14(1):43-61.
• Anselin, Luc. 2007. “Discrete Spatial Heterogeneity” & “Continuous Spatial Heterogeneity.” Spatial Regression Analysis in R: A Workbook. Chap. 8 & 9.
GISPopSci
Afternoon Lab
GWR hands on (using R) & Spatial Regime Analysis (using R)
GISPopSci
Questions?