chapter 6 – analysis of mapped point patterns
DESCRIPTION
Chapter 6 – Analysis of mapped point patterns This chapter will introduce methods for analyzing and modeling the spatial distribution of mapped point data in which the location of every individual in the population is known. - PowerPoint PPT PresentationTRANSCRIPT
1
Chapter 6 – Analysis of mapped point patterns
This chapter will introduce methods for analyzing and modeling the spatial
distribution of mapped point data in which the location of every individual in the
population is known.
Two types of analyses can be conducted with mapped point patterns: (1) detecting
patterns (hypothesis test – if a pattern is at random, regular or aggregated
distribution), and (2) model fitting (inference – e.g., fit point pattern models to an
observed point pattern, see Chapter 7.)
In this chapter, we will concentrate on the first type of analysis by introducing an
important technique for detecting spatial patterns of mapped data.
2
Nearest-neighbor distribution functions: G(r) and F(r)
The various distance methods presented in Chapter 4 only provide summary
information on a spatial pattern at a particular distance (e.g., first nearest neighbor
distance, etc.). We now present methods that actually describe the distribution of the
nearest-neighbor distances, i.e., we model the nn distances by considering the distance
as a random variable.
G(r) is defined as a probability that the distance from a random chosen event to its
nearest neighbor is less than or equal to r:
The estimator is:
where ri is the nn distance for a randomly chosen event i (i = 1, 2, …, n ),
I(ri r) is an indicator function, I(ri r) = 1 if (ri r) is true, 0 otherwise.
F(r) is a probability that the distance from a random chosen point to its nearest
neighbor is less than or equal to r, also called “empty space function”. It has exactly
the same expression as G(r), but r in F(r) is a point-to-event distance.
).distancenn ()( rprobrG
nrrIrGn
ii /)()(ˆ
1
3
More on G(r) and F(r)
Under csr, it can be shown that G(r) has the form
To judge how far the empirical is from the csr, a simulation envelope could be
computed for based on, say, 100
realizations of s1, s2, …, s652 from a
uniform distribution in a study area
(i.e., assume the 652 Douglas-firs
follow the Poisson distribution). The
estimator is calculated from
each realization and for each distance
r, the largest and smallest values
define the simulation envelope.
(The envelope is not shown in the
figure here.)
.12)(22
0
2 rr x edxxerG )(ˆ rG
dist
Gr×
65
2
0 5 10 15 20 25 30
010
020
030
040
050
060
0Douglas-fir (n = 652)
2
1)( rerG
)(ˆ rG
)(ˆ rG
22)( rrerp
4
R package spatstat for point pattern analysis
Developed by Adrian Baddeley and Rolf Turner
The package supports:
1. creation, manipulation and plotting of point patterns
2. exploratory data analysis
3. simulation of point process models
4. parametric model-fitting
5. hypothesis tests and diagnosticsThe first thing to do for all these analyses is to create a ppp object!
Use Douglas-fir data as example:
> df.dat=subset(victoria.dat,victoria.dat$sp==“DF”)
> df.ppp=ppp(df.dat$x,df.dat$y,c(0,103),c(0,87))
> df.ppp=ppp(df.dat$x,df.dat$y,window=owin(c(0,103),c(0,87))
> df.ppp=ppp(df.dat$x,df.dat$y,poly=list(x=c(0,50,60,0),y=c(0,0,60,50))) # ploygon window
5Baddeley, A.J. & Gill, R.D. 1997. Kaplan-Meier estimators of interpoint distance distributions for spatial point processes. Annals of Statistics 25:263-292.
2
1)( rerG
Regular 1st nn dist
Aggregated
R implementation
1. Prepare Douglas-fir and Hemlock data into
ppp format (df.ppp, hl.ppp)
2. df.G=Gest(df.ppp)
3. plot(df.G)
4. plot(envelope(df.ppp,fun=Gest))
#generate envelope.
5. The pointwise envelopes are not “confidence
bands” for the true value of the function! The
test is constructed by choosing a fixed value
of r, and rejecting the null hypothesis if the
observed function value lies outside the
envelope at this value of r. This test has exact
significance level alpha = 2nrank/(1 + nsim).
nrank = the rank of the envelope value
amongst the nsim simulated values.
6
K-function
It is the most important function for quantifying mapped point pattern, proposed by Ripley in
1976, often called Ripley’s K-function.
K-function is a second-moment measure as it is closely related to the second-order intensity of a
stationary isotropic point process. It captures the spatial (in)dependence between different
regions of the point process. Let’s first look at the 1st- and 2nd-order properties of a spatial
point process.
1st-order property:
where Ax is an infinitesimal region which contains point x. For a stationary process, (x) =
constant.
2nd-order property:
For a stationary + isotropic process, (x, y) = (h), where h = |x-y|.
* Ripley, B. D. 1976. The second-order analysis of stationary point process. J. of Appl. Prob. 13:255-266.
x
x
A A
ANEx
x
)]([lim)(
0
yx
yx
AA AA
ANANEyx
yx
)]()([lim),(
0,2
7
Definition of K-function
K-function is defined as
K(h) = -1 E(# of other events within distance h of an arbitrary event).
E(# of other events within distance h of an arbitrary event) = K(h).
. h
8
The relationship between K-function and 2(x, y)
where 2(r)/ is interpreted as the conditional intensity of an event at x given an
event at 0, i.e., 2(0, x)/. This intensity corresponds to the intensity at the point
x conditional on that there is an event at 0.
For a Poisson process, (r) = 2, then K(h) = h2. Use as a null model for csr:
K(h) > h2 suggests aggregated pattern.K(h) = h2 suggests random pattern.K(h) < h2 suggests regular pattern.
. h
h
h h
rdrrhK
rdrrrdrdr
hK
0
22
2
0 0 0
22
)(2)(
)(2)(
)(
9
The properties of K(h)
1. For a Poisson process, (r) = 2, then K(h) = h2. Use as a null model for csr:
K(h) > h2 suggests aggregated pattern.K(h) = h2 suggests random pattern.K(h) < h2 suggests regular pattern.
2. K-function is invariant under random thinning. By “random thinning”, we mean that if each event of a process is retained or not according to a series of Bernoulli trials. This property means that the K-function of the resulting thinned process is identical to that of the original, unthinned process.
10
A simple estimator of K(h)
..
. . .
. ..
..
.
.
.
.
.
.
hsi
sj
N
i
N
ijj
ji hssIN
hK1 1
||||1
ˆ1
)(ˆ
h
.Edge effect: Thosepoint close to the edges will have less # of points with the h circle than those points far from the edges.
11
Toroidal unbiased estimator of K(h)
Because of edge effect, the simple estimator is not very efficient and is biased. An alternative is the estimator based on toroidal correction. N+ is the number of points that fall within ||si – sj|| h.
N
i
N
ijj
ji hssIN
hK1 1
||||1
ˆ1
)(ˆ
Toroidal edge correction, use only for stationary + isotropic patterns
h
Misuse of toroidal edge correction for non-stationary patterns
12
Weighted unbiased estimator of K(h)
Another unbiased estimator, initially proposed by Ripley (1976), gives more weight to those points near the boundaries.
where the weight w(si, sj) is the proportion of the circumference of a circle centered at si, passing through sj (si must be within the study area). w(si, sj) = 1 if the circle entirely locates within the study area.
N
i
N
ijj
jiji
hssIsswN
hK1 1
||||),(
11ˆ1
)(ˆ
..
. . .
. ..
..
.
.
.
.
.
.
hsi
sj
h
.
13
Computing w(si, sj)
Assume the study area is [0, a][0, b] and si has coordinates si = (x, y). Rewrite w(si,
sj) = w(si, h), h is the radius for a circle centered at si. Denote d1 = min(x, a-x), and
d2 = min(y, b-y); thus d1 and d2 are the distances from si to the nearest vertical and
horizontal edges of A.
w(si, h) is calculated as follows:
1. If h2 d12 + d2
2 (circle intersects with both vertical and horizontal edges):
2. If h2 d12 + d2
2 (circle intersects with one edge):
)
),min((cos)
),min((cos
11),( 2111
h
hd
h
hdhsw i
)(cos)(cos
2
1
4
3),( 2111
h
d
h
dhsw i ..
. . .. .
.
..
..
.
.
.
.
hsi.
si.
si.
14
Variance and simulation envelopes
As it was mentioned earlier that a csr has K(h) = h2. It is usual to express K(h) as
(*)
This transformation stabilizes variance for the transformed K0(h), which is
approximately:
To judge how far the observed K-function deviates from the csr, a simulation envelope could be constructed based on, say, 99 realizations of s1, s2, …, s982 from a uniform distribution in a study area (i.e., the 982 western hemlock trees follow the Poisson distribution). The K-function is calculated from each realization, and for each distance h the largest and smallest values define the simulation envelope.
.2
12N
)(ˆ
)(ˆ0
hKhK
15
R implementation
Let’s model the distribution of the 982 western hemlocks. The spatstat program computes the transformed K(h) presented on previous page.
>hl.kest=Kest(hl.ppp)
# hl.ppp = is ppp object of sptatstat
>plot(hl.kest)
>plot(hl.kest$r,sqrt(hl.kest$iso/pi)-hl.kest$r)
>hl.env=envelope(hl.ppp)>plot(hl.kest$r,sqrt(hl.kest$iso/pi))>lines(hl.env$r,sqrt(hl.env$lo/pi),col=2)>lines(hl.env$r,sqrt(hl.env$hi/pi),col=2)
16
L-function
In practice, K-function is usually displayed in L-function, defined as
• For an aggregated distribution
• For a random distribution
• For a regular distribution
Examples:
hhKhL )(ˆ)(ˆ0
0))(ˆ( hLE
0))(ˆ( hLE
0))(ˆ( hLE
0 10 20 30 40 50
-1.0
-0.5
0.0
0.5
Douglas-fir(n = 652)
L(h
)
0 10 20 30 40 50
01
23
Hemlock(n = 982)
h h
17
g-function (pair correlation function)
g-function is derivative of K-function, defined as
Obviously, g-function describes how K-function changes with spatial distance lag h. K-
function is a cumulative function which may accumulate confounding large scale (large
h) effect with the effect of small scales (small h). g-function is said to be able to separate
these effects.
R implementation:
>pcf(hl.ppp)
h
hKhg
2
)(')(
. h
18
Bivariate spatial point patterns
A bivariate spatial point pattern consists of the locations of two types of events in a
bounded study area A, e.g., the distributions of two tree species (Douglas-fir and
western hemlock). It can be defined as {sj(i): i = 1, 2; j = 1, 2, …} of type i (i = 1, 2)
species at jth location. The two species may or may not be spatially independent. A
natural working hypothesis is that the patterns of the two species are independent.
However, it is worth to note that the independence does not necessarily guarantee the
csr for each of the species.
Similar to the univariate case, the K-function can be extended to the bivariate case to
quantify the relationship between the two species, defined as
K12(h) = -1 E(# of type 2 events within distance h of an arbitrary type 1 event).
If both species are at csr, K12(h) (= K21(h)) has a simple result
K12(h) = h2.
19
An unbiased estimator
For a given data, K12(h) and K21(h) can be respectively estimated as
where w(si(1), sj
(2)) is the proportion of the circumference of the circle with centre sj(1)
and radius h that lies within the study region A.
)||(||),(
111)(
~
)||(||),(
111)(
~
)1()2(
1 1)1()2(
2121
1 1
)2()1()2()1(
1212
2 1
1 2
hssIsswN
hK
hssIsswN
hK
ji
N
i
N
j ji
N
i
N
jji
ji
.
... . .
. ..
..
.
.
.
.
hsi
(1)
sj(2)
.. .
20
An estimator of variance reduction
When the underlying process for both species are independent Poisson, Lotwick &
Silverman (1982) show that the most efficient estimator is a linear combination
Because for csr K12(h) = h2, we can similarly define an L-function:
• For an aggregated distribution
• For a random distribution
• For a regular distribution
* Lotwick, H. W. & Silverman, B. W. 1982. Methods for analysing spatial processes of several types of
points. J. R. Stat. Soc. B, 44:406-413.
.)(
~)(
~)(ˆ
21
21112212 NN
hKNhKNhK
hhK
hL
)(ˆ)(ˆ 12
12
0))(ˆ( 12 hLE
0))(ˆ( 12 hLE
0))(ˆ( 12 hLE
21
R implementation
Let’s use Splus to compute K12(h) for redcedar and western hemlock.
>victoria.ppp=ppp(victoria.dat$x,victoria.dat$y,c(0,103),c(0,87),marks=victoria.dat$sp)
>cdhl.kcross=Kcross(victoria.ppp,”HL”,”CD”)
>plot(cdhl.kcross)
Also see Kmulti
>plot(Kmulti(victoria.ppp,victoria.ppp$marks=="CD",victoria.ppp$marks=="HL"))
22
K12
(h)
0 10 20 30 40 50
05
1015
2025
Distance h
K12
(h)
0 10 20 30 40 50
05
1015
20
Hemlock-redcedar
Douglas fir-hemlock
23
Assignment: Compute bivariate L function for CD and HL of Victoria.dat
> victoria.ppp=ppp(victoria.dat$x,victoria.dat$y,c(0,103),c(0,87),marks=victoria.dat$sp)> cdhl.kcross=Kcross(victoria.ppp,”HL”,”CD”)> plot(cdhl.kcross)> cdhl.env=envelope(victoria.ppp, Kcross, i="HL", j="CD")
> cdhl.lfn=sqrt(cdhl.kcross$iso/pi)-cdhl.kcross$r> plot(cdhl.kcross$r, cdhl.lfn, ylim=c(-0.25,1.1), xlab="h", ylab="L function")> lines(cdhl.env$r, sqrt(cdhl.env$hi/pi)-cdhl.env$r, col="red")> lines(cdhl.env$r, sqrt(cdhl.env$lo/pi)-cdhl.env$r, col="blue")