AN OVERVIEW Spatial Analysis

Upload: jordan-andrew-booker

Post on 03-Jan-2016

Page 1: AN OVERVIEW Spatial Analysis. Spatial Analysis involves... Data Exploration – the uncovering of patterns, identification of the unusual, discovery of

AN OVERVIEW

Spatial Analysis

Spatial Analysis involves ...

• Data Exploration – the uncovering of patterns, identification of the unusual, discovery of groups
• Visualisation – mapping and charting
• Summary – data reduction, noise removal, synthesis of information
• Modelling and Estimation
• Explanation (Causal analysis)
• Reliability and Quality Measurement

Data Types in Spatial Analysis

Spatial analysis draws on three types of information:

• THEMATIC (attribute) data – WHAT are the key characteristics we are interested in?
• SPATIAL data – WHERE are things in space?
• TEMPORAL data – WHEN do or did things exist, or when did particular events take place?

We can explore thematic, spatial or temporal relationships separately or in combination, BUT the more we explore simultaneously, the HARDER it gets and the LONGER it takes.

Properties of Spatial Data

SPATIAL PATTERN of locations

SPATIAL DEPENDENCE between attribute values observed at different locations

SPATIAL HETEROGENEITY - systematic variation across space

Spatial Patterns

• Systematic behaviour / occurrence in space: RANDOM, CLUSTERED or DISPERSED
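
One common way to put a number on the RANDOM / CLUSTERED / DISPERSED distinction, not described on the slide itself, is the Clark-Evans nearest-neighbour ratio R: the observed mean nearest-neighbour distance divided by the mean expected under complete spatial randomness, 0.5 / sqrt(n / A). The sketch below is a minimal illustration, assuming points in a study area of known size A and applying no edge correction; the function names, the tolerance used to label R and the three synthetic patterns are all hypothetical.

```python
import math
import random

def clark_evans_r(points, area):
    """Clark-Evans ratio R = observed mean nearest-neighbour distance / CSR expectation.

    The expectation under complete spatial randomness is 0.5 / sqrt(n / area).
    No edge correction is applied.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        # distance from point i to its nearest other point
        total += min(math.hypot(xi - xj, yi - yj)
                     for j, (xj, yj) in enumerate(points) if j != i)
    observed = total / n
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected

def classify(r, tol=0.15):
    """Rough reading of R (tol is an arbitrary, illustrative threshold)."""
    if r < 1 - tol:
        return "CLUSTERED"
    if r > 1 + tol:
        return "DISPERSED"
    return "RANDOM"

# Three synthetic patterns in a 10 x 10 study area (area = 100)
random.seed(1)
dispersed = [(x + 0.5, y + 0.5) for x in range(10) for y in range(10)]  # regular lattice
clustered = (
    [(2 + random.gauss(0, 0.2), 2 + random.gauss(0, 0.2)) for _ in range(50)]
    + [(8 + random.gauss(0, 0.2), 8 + random.gauss(0, 0.2)) for _ in range(50)]
)
rand_pts = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(100)]

for name, pts in [("lattice", dispersed), ("two clusters", clustered), ("uniform", rand_pts)]:
    r = clark_evans_r(pts, area=100.0)
    print(f"{name:12s} R = {r:.2f} -> {classify(r)}")
```

R well below 1 points to clustering and well above 1 to dispersion (the lattice gives exactly 2.0); without edge correction, R for a truly random pattern tends to come out slightly above 1.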

Spatial Dependence

• Tobler’s First Law of Geography: “Everything is related to everything else, but near things are more related than distant things”

• Consequently, the way in which we aggregate and partition data may have implications… (the Modifiable Areal Unit Problem, MAUP – see later)

What is Spatial Heterogeneity?

Systematic Variation Across Space. This can be caused by:

• intrinsically different relationships across space (spatial variations in attitudes or preferences due to administrative, political or social contexts);
• spatial variation in relationships due to sampling variations;
• model misspecification (omitted variables, inappropriate functional form).

Exploratory Data Analysis

Exploratory techniques are useful in:

• Pre-modelling: exploring data accuracy, formulating hypotheses, detecting clusters, outliers and trends (“brushing” in GeoDa);
• Post-modelling: examining model accuracy and robustness (e.g. mapping of residuals from a model).

They may be applied to individual variables, or to relationships between variables.

Exploratory Spatial Data Analysis in GeoDa

http://www.ph.ucla.edu/epi/snow/broadstreetpump.html

Exploratory Spatial Data Analysis: John Snow’s Map of Cholera Deaths, 1855

Exploratory Post-Modelling Analysis in ONS: Checking the validity of small area estimates

Accounting for Spatial Effects

We can exploit spatial properties to:

• produce smooth estimates and eliminate noise;
• formulate hypotheses.

We can potentially build models to explain and measure:

• spatial interactions (understanding flows);
• similarity and influences across space;
• variations across space;
• processes that evolve across space and time.

Similarity and influences between neighbours

Rook’s Case, Bishop’s Case, Queen’s Case

‘W’ Adjacency Matrix
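
A minimal sketch of how rook, bishop and queen contiguity translate into a ‘W’ adjacency matrix, assuming the areas are the cells of a regular grid (real zones would need polygon adjacency instead). grid_weights is a hypothetical helper; optional row standardisation is included because it is a common convention, not because the slide requires it.

```python
def grid_weights(nrows, ncols, case="queen", row_standardise=False):
    """Contiguity matrix W for the cells of a regular nrows x ncols grid.

    case: "rook"   -> neighbours share an edge (N, S, E, W)
          "bishop" -> neighbours share only a corner (the diagonals)
          "queen"  -> rook + bishop neighbours
    Cells are numbered row by row: index = r * ncols + c.
    """
    rook = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    bishop = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    offsets = {"rook": rook, "bishop": bishop, "queen": rook + bishop}[case]
    n = nrows * ncols
    w = [[0.0] * n for _ in range(n)]
    for r in range(nrows):
        for c in range(ncols):
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    w[r * ncols + c][rr * ncols + cc] = 1.0
    if row_standardise:
        # each row then sums to 1, so W applied to x averages the neighbours' values
        for row in w:
            s = sum(row)
            if s > 0:
                row[:] = [v / s for v in row]
    return w

for case in ("rook", "bishop", "queen"):
    w = grid_weights(3, 3, case)
    centre = 4  # middle cell of a 3 x 3 grid
    print(case, [j for j, v in enumerate(w[centre]) if v])
# rook [1, 3, 5, 7]
# bishop [0, 2, 6, 8]
# queen [0, 1, 2, 3, 5, 6, 7, 8]
```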

Spatial dependence with area data

Measuring local relationships:

• Moran scatterplot (visualising)
• Moran’s I and Geary’s c (hypothesis testing)

[Figure: Moran scatterplot of the proportion of economically active heads of household in social classes 1 & 2 (East Anglia wards), plotting PHHSO against its spatial lag WPHHSO]
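
The global Moran’s I and Geary’s c statistics, and the spatial lag plotted on the y-axis of a Moran scatterplot, can be sketched with the standard textbook formulas. The function names and the four-area example are illustrative assumptions, not the data behind the East Anglia plot.

```python
def morans_i(x, w):
    """Global Moran's I for values x and an n x n spatial weights matrix w."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    s0 = sum(sum(row) for row in w)  # sum of all weights
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(v * v for v in z)
    return (n / s0) * num / den

def gearys_c(x, w):
    """Geary's c: below 1 suggests positive spatial autocorrelation, above 1 negative."""
    n = len(x)
    mean = sum(x) / n
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))
    den = sum((v - mean) ** 2 for v in x)
    return ((n - 1) / (2 * s0)) * num / den

def spatial_lag(x, w):
    """Row-standardised spatial lag: the average of each area's neighbours."""
    lag = []
    for row in w:
        s = sum(row)
        lag.append(sum(wij * xj for wij, xj in zip(row, x)) / s if s else 0.0)
    return lag

# Four areas in a line (1-2-3-4), each adjacent to the next; values rise steadily
x = [1.0, 2.0, 3.0, 4.0]
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(round(morans_i(x, w), 4))  # 0.3333
print(round(gearys_c(x, w), 4))  # 0.3
print(spatial_lag(x, w))         # [2.0, 2.0, 3.0, 3.0]
```

Here I = 0.33 lies well above its expectation of -1/(n - 1) = -0.33 under no spatial autocorrelation, and c = 0.3 is below 1, both pointing to positive spatial dependence. A formal test would compare I with a permutation or normal-approximation reference distribution, which is not shown.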

Modelling Spatial Dependence

• The spatial interaction matrix W normally shows direct neighbours (R, B or Q case), but may accommodate second- or higher-order neighbours by means of spatial lags, thus:
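
One way to derive second- and higher-order neighbours from a first-order contiguity list is a breadth-first search, which gives each area the fewest contiguity steps needed to reach it. This is a sketch under that assumption (an order-k neighbour is exactly k steps away, never fewer); neighbour_orders and kth_order_w are hypothetical names.

```python
from collections import deque

def neighbour_orders(adj, source):
    """Return {area: order}, where order is the fewest contiguity steps from source.

    adj maps each area to its first-order (e.g. rook or queen) neighbours.
    Order 1 = direct neighbours, order 2 = neighbours of neighbours that are not
    themselves direct neighbours, and so on.
    """
    order = {source: 0}
    queue = deque([source])
    while queue:
        a = queue.popleft()
        for b in adj[a]:
            if b not in order:
                order[b] = order[a] + 1
                queue.append(b)
    return order

def kth_order_w(adj, k):
    """Binary W whose (i, j) entry is 1 if area j is exactly a k-th order neighbour of i."""
    areas = sorted(adj)
    index = {a: i for i, a in enumerate(areas)}
    w = [[0] * len(areas) for _ in areas]
    for a in areas:
        for b, o in neighbour_orders(adj, a).items():
            if o == k:
                w[index[a]][index[b]] = 1
    return w

# Five areas in a chain A-B-C-D-E
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
print(neighbour_orders(adj, "A"))  # {'A': 0, 'B': 1, 'C': 2, 'D': 3, 'E': 4}
print(kth_order_w(adj, 2)[0])      # [0, 0, 1, 0, 0]  (A's second-order neighbour is C)
```

Simply squaring a binary W is not enough on its own: W² counts paths, so it also flags each area itself, and sometimes first-order neighbours, as "second-order".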

What is a neighbourhood?

What relationships to a given location might make neighbouring spatial data influential?

• Only immediate neighbours (R, B or Q case)?
• Neighbours of first, second, third etc. ‘order’?
• All neighbours within a given radius (h)?
• A fixed number of neighbours?
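
Two of these definitions, all neighbours within a radius h and a fixed number k of nearest neighbours, can be sketched as below, assuming point locations and straight-line (Euclidean) distances; distance_band and k_nearest are illustrative names. The isolated point shows the practical difference: a distance band can leave it with no neighbours at all, whereas k nearest neighbours always returns k, however distant.

```python
import math

def distance_band(points, i, h):
    """Indices of all points within straight-line distance h of point i (excluding i)."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if j != i and math.hypot(xj - xi, yj - yi) <= h]

def k_nearest(points, i, k):
    """Indices of the k points closest to point i (ties broken by index)."""
    xi, yi = points[i]
    ranked = sorted((math.hypot(xj - xi, yj - yi), j)
                    for j, (xj, yj) in enumerate(points) if j != i)
    return [j for _, j in ranked[:k]]

pts = [(0, 0), (1, 0), (0, 2), (3, 0), (10, 10)]  # the last point is isolated
print(distance_band(pts, 0, h=2.0))  # [1, 2]
print(k_nearest(pts, 0, k=3))        # [1, 2, 3]
print(distance_band(pts, 4, h=2.0))  # []
print(k_nearest(pts, 4, k=3))        # [3, 2, 1]
```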

Capturing Neighbourhoods – the Spatial Kernel

A ‘tent’ that can be placed over each data point:

• Fixed Kernel – the radius is fixed, so the number of neighbours captured varies across the study area;
• Adaptive Kernel – the radius varies over the study area, but the number of neighbours captured for every point is fixed.

Neighbours enclosed ‘within the tent’ are then weighted according to a ‘distance decay curve’ implemented by the kernel.
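
The fixed and adaptive kernels can be sketched with a bisquare ‘tent’, w = (1 - (d/h)²)² inside radius h and 0 outside, which is one common choice of distance decay curve (the slides do not say which kernel was used). For the adaptive version this sketch sets h at the distance to the (k + 1)-th nearest neighbour, so that k neighbours receive positive weight; that convention, the helper names and the sample points are assumptions.

```python
import math

def bisquare(d, h):
    """Bisquare 'tent': weight 1 at the centre, decaying smoothly to 0 at radius h."""
    return (1 - (d / h) ** 2) ** 2 if d < h else 0.0

def kernel_weights(points, i, h=None, k=None):
    """Kernel weights of the other points around point i, as {index: weight}.

    Fixed kernel:    pass h (the same radius everywhere, so the number of
                     neighbours captured varies from point to point).
    Adaptive kernel: pass k (the radius is the distance to the (k+1)-th nearest
                     neighbour, so k neighbours get positive weight when no
                     distances are tied).
    """
    if (h is None) == (k is None):
        raise ValueError("give exactly one of h (fixed) or k (adaptive)")
    xi, yi = points[i]
    dists = {j: math.hypot(xj - xi, yj - yi)
             for j, (xj, yj) in enumerate(points) if j != i}
    if k is not None:
        h = sorted(dists.values())[k]  # (k+1)-th smallest distance
    return {j: bisquare(d, h) for j, d in dists.items() if d < h}

# A dense group of four points and a sparse group of three
pts = [(0, 0), (0.5, 0), (0, 0.5), (0.5, 0.5), (5, 5), (6, 5), (5, 7)]
for i in (0, 4):
    fixed = kernel_weights(pts, i, h=1.0)
    adaptive = kernel_weights(pts, i, k=2)
    print(f"point {i}: fixed h=1 captures {len(fixed)}, adaptive k=2 captures {len(adaptive)}")
# point 0: fixed h=1 captures 3, adaptive k=2 captures 2
# point 4: fixed h=1 captures 0, adaptive k=2 captures 2
```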

Spatial Kernels - Potential for errors

The coverage (floor area of the tent) might be chosen:

• arbitrarily (resulting in too much or too little smoothing);
• by some rule of thumb, e.g.
• optimally (using Kriging interpolation).

IDW Smoothing

A simple smoothing method used widely in GIS is IDW (inverse distance-weighted) interpolation

IDW interpolation works by estimating the value of the target variable y at unsampled points using a locally weighted average of known values

The influence of each local point is weighted by proximity to the point of interest

IDW Interpolation

[Figure: distances d1 to d8 from the point being estimated to eight known points]

In general terms, the smoothed value of a target variable y at location i is a weighted average of the values at n known points, with weights given by a WEIGHTING FUNCTION:

$$\hat{y}_i = \frac{\sum_{j=1}^{n} w_{ij} y_j}{\sum_{j=1}^{n} w_{ij}}$$

where $w_{ij}$ are the weights linking observation i and any other observation j.

The weighting function, w, is used to model the influence of distance on the contribution that the other points make.

It is usually some function of distance, like $w = 1/d^{k}$ or $w = e^{-kd}$.
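
A minimal IDW sketch, assuming Euclidean distances and the power form w = 1/d^k (passed as power). It returns the known value unchanged when the target coincides with a data point, and max_points optionally limits the estimate to the nearest known points. The function name and example values are hypothetical.

```python
import math

def idw(known, target, power=2.0, max_points=None):
    """Inverse distance-weighted estimate of y at target.

    known: list of ((x, y), value) pairs; each weight is w_j = 1 / d_j ** power.
    max_points: if given, only the nearest max_points known points are used.
    """
    tx, ty = target
    ranked = []
    for (x, y), value in known:
        d = math.hypot(x - tx, y - ty)
        if d == 0:
            return value  # target sits on a known point
        ranked.append((d, value))
    ranked.sort()  # nearest first
    if max_points is not None:
        ranked = ranked[:max_points]
    weights = [1.0 / d ** power for d, _ in ranked]
    # weighted average: sum(w_j * y_j) / sum(w_j)
    return sum(w * v for w, (_, v) in zip(weights, ranked)) / sum(weights)

known = [((0, 0), 0.0), ((2, 0), 10.0), ((0, 4), 20.0)]
print(round(idw(known, (1, 0)), 4))       # 5.4286
print(idw(known, (1, 0), max_points=2))   # 5.0
```

Swapping in an exponential decay would only change the line that builds the weights.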

Exploiting Spatial Dependency - Example

There were about 215,000 house sales in London in 2002

Two contrasting smoothing regimes

[Figure panels: Adaptive Kernel – 25 nearest neighbours; Fixed Kernel – 5 Kilometre Search Radius]

Considerations and Refinements

• How many points should we include?
• What is the maximum distance we should search within?
• What should the value of w be? It need not be 0 – 1.
• Should these values be fixed or variable across space?

We can best-guess the first three and assume the fourth is fixed, or we can refer to local spatial properties, using Kriging or GWR (geographically weighted regression).

Some Limitations ...

As we try to analyse more things simultaneously (space, time, theme), models become more complicated, more difficult to specify and (usually) take longer to compute.

• Software to implement them is LIMITED or bespoke.
• Data handling and data volumes can be a problem (the W matrix for LSOAs!).

More complexity (usually) means:

• it is more difficult to specify the model properly;
• the results are harder to interpret;
• quality is more difficult to judge.

Combining techniques and models is tricky.
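
Some back-of-the-envelope arithmetic shows why an LSOA W matrix is a data-handling problem. It uses the 34,378 LSOAs listed later in this deck and assumes 8-byte weights, 4-byte indices and, for the sparse case, an average of six neighbours per LSOA stored in compressed sparse row (CSR) form; the neighbour count is an assumption, not a measured figure.

```python
n_lsoa = 34_378        # LSOAs in England and Wales (2001), as listed in the ONS MAUP slides
bytes_per_weight = 8   # assumption: weights stored as 8-byte floats
index_bytes = 4        # assumption: 4-byte integer indices for sparse storage
mean_neighbours = 6    # assumption: average number of contiguous neighbours per LSOA

# Dense n x n matrix: every pair of LSOAs gets a cell, almost all of them zero
dense_cells = n_lsoa ** 2
dense_bytes = dense_cells * bytes_per_weight

# CSR: one value + one column index per non-zero, plus n + 1 row pointers
nnz = n_lsoa * mean_neighbours
sparse_bytes = nnz * (bytes_per_weight + index_bytes) + (n_lsoa + 1) * index_bytes

print(f"dense W : {dense_cells:,} cells, {dense_bytes / 1e9:.1f} GB")  # 1,181,846,884 cells, 9.5 GB
print(f"sparse W: {nnz:,} non-zeros, {sparse_bytes / 1e6:.1f} MB")     # 206,268 non-zeros, 2.6 MB
```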

… and gains

Spatial analysis builds spatial information explicitly into the analysis and modelling framework.

Methods allow us to explore our data for patterns and trends, to build models that include spatial and space / time relationships, and to visualise the results of our analysis.

If methods are based on statistical principles, we can obtain reliability measures alongside the outputs.

Modifiable Units of Spatial Coverage

There is no “standard” unit of spatial coverage like the HH:MM:SS of time.

When we move from unit-level spatial data to groupings of individual point events, these geographical groups are inherently MODIFIABLE.

A space can be subdivided into n zones in many thousands of different ways.

In practice, this means that the areas used to report aggregated data are often arbitrary (in terms of the data being studied) or designed without data analysis in mind.

One dataset – two boundaries

[Figure panels: Cross-cut by 2003 CAS Wards; Cross-cut by MSOAs]

The two faces of MAUP

• The “Scale Effect” or “Aggregation Effect”: different results and inferences may be obtained when the same set of data is grouped into increasingly larger areal units.
• The “Zoning Effect” or “Partitioning Effect”: results and inferences may vary depending on the partition of space that is applied at a given geographical scale.

These effects interact. Exactly how?
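
Both effects can be reproduced on synthetic data. The sketch below (hypothetical data and helper names, not ONS figures) generates two unit-level variables on a 64 x 64 grid that share a smooth spatial trend plus independent noise, then averages them into square zones: changing the zone size illustrates the scale effect, and shifting the zone boundaries at a fixed size illustrates the zoning effect.

```python
import math
import random

def pearson(a, b):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))

def aggregate(x, y, size, offset=0):
    """Average unit-level grids x and y into square zones of size x size cells.

    offset shifts every zone boundary by that many cells (same scale, different
    partition); zones cut by the grid edge are simply smaller.
    Returns the zone means of x and of y, in matching order.
    """
    zones = {}
    for r, row in enumerate(x):
        for c, xv in enumerate(row):
            key = ((r + offset) // size, (c + offset) // size)
            sx, sy, k = zones.get(key, (0.0, 0.0, 0))
            zones[key] = (sx + xv, sy + y[r][c], k + 1)
    return ([sx / k for sx, _, k in zones.values()],
            [sy / k for _, sy, k in zones.values()])

random.seed(42)
N = 64
# Synthetic unit-level data: x and y share a smooth spatial trend plus independent noise
trend = [[math.sin(r / 8) + math.cos(c / 8) for c in range(N)] for r in range(N)]
x = [[t + random.gauss(0, 2) for t in row] for row in trend]
y = [[t + random.gauss(0, 2) for t in row] for row in trend]

# Scale (aggregation) effect: same data, progressively larger zones
for size in (1, 4, 16):
    print(f"zones of {size:2d} x {size:2d} cells: r = {pearson(*aggregate(x, y, size)):.2f}")

# Zoning (partitioning) effect: same zone size, boundaries shifted
for offset in (0, 5, 10):
    print(f"16 x 16 zones, offset {offset:2d}: r = {pearson(*aggregate(x, y, 16, offset)):.2f}")
```

With this set-up the correlation rises sharply as zones grow, because averaging removes the independent noise while the shared trend survives; shifting the boundaries at a fixed scale typically moves r again. The exact figures depend on the random seed.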

[Figure panels: Scale; Zoning]

ONS Preliminary MAUP Project - Data

2001 Census data for England and Wales: a standard dataset of 122 variables, aggregated to seven different geographies:

• Output Areas (175,434 areas)
• Lower Layer Super Output Areas (34,378 areas)
• Middle Layer Super Output Areas (7,194 areas)
• 2003 Statistical Wards (8,868 areas)
• Local Authorities (376 areas)
• Counties (34 areas)
• Government Office Regions (10 areas)

ONS Preliminary MAUP Project - Procedures

Pearson product-moment (PM) correlations between all variables were calculated at all seven levels and ‘mapped’ in MatLab.

Both the x and y axes show the 112 census variables, starting at the top left-hand corner.

Pearson PM correlations were calculated for every variable against all the others.

Naturally, some correlations were positive, others negative.

MatLab Pearson Correlation Matrices - Counts

However, for count data, as the areas grew in size from Output Area to GOR, there was a distinct shift towards positive correlations.

This is expected – larger areas contain larger counts. The sudden ‘redshift’ from MSOA to Ward is accounted for by the heterogeneous (size and socio-economic) nature of wards.
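
One mechanism consistent with "larger areas contain larger counts" can be shown with simulated data: if two characteristics are counted in areas of very different population size, their counts correlate positively even when the underlying rates are unrelated. The areas, rates and seed below are hypothetical, chosen only to illustrate the point.

```python
import math
import random

def pearson(a, b):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))

random.seed(7)
n_areas = 500
# Hypothetical areas whose populations differ a lot in size
pop = [random.uniform(100, 10_000) for _ in range(n_areas)]
# Two characteristics whose RATES are drawn independently of each other
rate_a = [random.uniform(0.1, 0.3) for _ in range(n_areas)]
rate_b = [random.uniform(0.1, 0.3) for _ in range(n_areas)]
# Counts = population x rate, so both counts grow with population size
count_a = [p * r for p, r in zip(pop, rate_a)]
count_b = [p * r for p, r in zip(pop, rate_b)]

print(f"correlation of rates : {pearson(rate_a, rate_b):+.2f}")    # close to zero
print(f"correlation of counts: {pearson(count_a, count_b):+.2f}")  # strongly positive
```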

MatLab Pearson Correlation Matrices – Counts

[Figure panels: OA, LSOA, MSOA, Ward, LAD, County, GOR]

MatLab Pearson Correlation Matrices – Rates

Rate data also shows a drift towards positive correlation with larger spatial units, but proceeds more smoothly than with count data.

This is because, as the spatial units grow in size, the differences between the areas are reduced.

MatLab Pearson Correlation Matrices – Rates

[Figure panels: OA, LSOA, MSOA, Ward, LAD, County, GOR]

ONS Preliminary MAUP Project – Contd.

We also fitted simple linear regression models to a smaller, common subset of variables at each geography.

We then applied the models to all seven geographies using raw count and rate-based input data.

These confirmed that Pearson PM correlations change considerably when calculated for different geographies.

Summary of (tentative) findings so far

As geographical level alters, Pearson (and Spearman) correlations, relationships between independent and dependent variables, regression parameters and model predictive power also alter, but not necessarily consistently. The direction of change can reverse, and this can include sign changes.

• Changes are most severe for large-area configurations.
• Count models are less resilient than rate models.
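
The possibility of a sign change can be seen in a deliberately constructed example (hypothetical data, not ONS results): within each of five zones y falls as x rises, but the zone averages rise together, so the ordinary least-squares slope changes sign between the pooled individual level and the zone level.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical individuals: 5 zones (z = 0..4) with 11 people each (t = -5..5).
# Within a zone, y = 2z - t falls as x = 2z + t rises; both zone averages equal 2z.
units = [(2 * z + t, 2 * z - t, z) for z in range(5) for t in range(-5, 6)]
unit_x = [u[0] for u in units]
unit_y = [u[1] for u in units]

# Aggregate to zone means
zone_x = [sum(u[0] for u in units if u[2] == z) / 11 for z in range(5)]
zone_y = [sum(u[1] for u in units if u[2] == z) / 11 for z in range(5)]

print(f"individual-level slope: {ols_slope(unit_x, unit_y):+.3f}")  # -0.111
print(f"zone-level slope:       {ols_slope(zone_x, zone_y):+.3f}")  # +1.000
```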

What next?

Extension of work to consider “typical” ONS analysis scenarios and geographical effects

Seven geographies are not enough to explore these effects properly, especially the partitioning (zoning) effect as opposed to the aggregation effect.

Plan to construct many “pseudo” geographies using Dave Martin’s AZ Tool utility.

Use of the 2001 individual-level census base to construct a series of analysis scenarios for a proper simulation study across hundreds or thousands of artificial geographies.
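
A very simple way to generate many "pseudo" geographies with the same number of zones is random region growing on a grid. This is a hypothetical stand-in for illustration only: it does not reproduce the AZ Tool algorithm, and the grid, trend, noise and zone count are assumptions. Each zoning keeps the data and the number of zones fixed, so the spread of r across zonings isolates the partitioning effect.

```python
import math
import random
import statistics

def pearson(a, b):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))

def random_zoning(n, n_zones, rng):
    """Split an n x n grid into n_zones contiguous zones by random region growing.

    Seeds are dropped on random cells; then, repeatedly, a random zone cell absorbs
    one unassigned rook neighbour, until every cell belongs to a zone.
    Returns an n x n list of zone labels 0 .. n_zones - 1.
    """
    label = [[-1] * n for _ in range(n)]
    seeds = rng.sample([(r, c) for r in range(n) for c in range(n)], n_zones)
    for z, (r, c) in enumerate(seeds):
        label[r][c] = z
    frontier = list(seeds)  # zone cells that may still have unassigned neighbours
    while frontier:
        i = rng.randrange(len(frontier))
        r, c = frontier[i]
        free = [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= r + dr < n and 0 <= c + dc < n and label[r + dr][c + dc] < 0]
        if not free:
            frontier[i] = frontier[-1]  # exhausted cell: swap-remove it
            frontier.pop()
            continue
        rr, cc = rng.choice(free)
        label[rr][cc] = label[r][c]
        frontier.append((rr, cc))
    return label

def zone_correlation(x, y, label, n_zones):
    """Correlation between the zone means of x and y under one zoning."""
    sx, sy, k = [0.0] * n_zones, [0.0] * n_zones, [0] * n_zones
    for r, row in enumerate(label):
        for c, z in enumerate(row):
            sx[z] += x[r][c]
            sy[z] += y[r][c]
            k[z] += 1
    return pearson([a / m for a, m in zip(sx, k)], [b / m for b, m in zip(sy, k)])

rng = random.Random(2001)
N, ZONES = 32, 20
# Synthetic unit-level data: a shared smooth trend plus independent noise
trend = [[math.sin(r / 6) + math.cos(c / 6) for c in range(N)] for r in range(N)]
x = [[t + rng.gauss(0, 2) for t in row] for row in trend]
y = [[t + rng.gauss(0, 2) for t in row] for row in trend]

# Same data, same number of zones, 200 different random partitions
rs = sorted(zone_correlation(x, y, random_zoning(N, ZONES, rng), ZONES) for _ in range(200))
print(f"r across 200 zonings: min {rs[0]:.2f}, median {statistics.median(rs):.2f}, max {rs[-1]:.2f}")
```

The gap between the smallest and largest r is a crude measure of how sensitive this particular statistic is to zoning alone.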