
Chapter 6 – Analysis of mapped point patterns

This chapter will introduce methods for analyzing and modeling the spatial distribution of mapped point data, in which the location of every individual in the population is known.

Two types of analyses can be conducted with mapped point patterns: (1) detecting patterns (hypothesis testing: is the pattern random, regular, or aggregated?), and (2) model fitting (inference, e.g., fitting point process models to an observed point pattern; see Chapter 7).

In this chapter, we concentrate on the first type of analysis by introducing important techniques for detecting spatial patterns in mapped data.


Nearest-neighbor distribution functions: G(r) and F(r)

The various distance methods presented in Chapter 4 provide only summary information on a spatial pattern at a particular distance (e.g., the first nearest-neighbor distance). We now present methods that describe the whole distribution of the nearest-neighbor (nn) distances, i.e., we model the nn distance as a random variable.

G(r) is defined as the probability that the distance from a randomly chosen event to its nearest neighbor is less than or equal to r:

$$G(r) = \Pr(\text{nn distance} \le r).$$

The estimator is:

$$\hat G(r) = \frac{1}{n}\sum_{i=1}^{n} I(r_i \le r),$$

where r_i is the nn distance for event i (i = 1, 2, …, n), and I(r_i ≤ r) is an indicator function: I(r_i ≤ r) = 1 if r_i ≤ r is true, 0 otherwise.

F(r), also called the "empty space function", is the probability that the distance from a randomly chosen point (an arbitrary location in the study area) to its nearest event is less than or equal to r. Its estimator has exactly the same form as that of G(r), but r_i in F(r) is a point-to-event distance.
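Both estimators can be computed directly from the coordinates. The following is a minimal base-R sketch, not spatstat's Gest/Fest (which add edge corrections); the function names nn_dist, G_hat and F_hat are hypothetical, no edge correction is applied, and for F̂(r) it assumes the point-to-event distances are measured from a user-supplied set of sample points (here a regular grid).

```r
# Naive (no edge correction) estimators of G(r) and F(r).
# x, y: event coordinates; px, py: sample-point coordinates used for F(r).

# Nearest-neighbour (event-to-event) distance r_i for every event
nn_dist <- function(x, y) {
  d <- as.matrix(dist(cbind(x, y)))  # pairwise distances between events
  diag(d) <- Inf                     # ignore each event's zero distance to itself
  apply(d, 1, min)
}

# G-hat(r) = (1/n) * sum_i I(r_i <= r), evaluated at each value in r
G_hat <- function(r, x, y) {
  ri <- nn_dist(x, y)
  sapply(r, function(rr) mean(ri <= rr))
}

# F-hat(r): same form, but r_i are point-to-event distances
F_hat <- function(r, x, y, px, py) {
  di <- sapply(seq_along(px), function(k) min(sqrt((x - px[k])^2 + (y - py[k])^2)))
  sapply(r, function(rr) mean(di <= rr))
}

# Usage on a simulated csr pattern in an assumed 103 x 87 study area
set.seed(1)
x <- runif(100, 0, 103); y <- runif(100, 0, 87)
pts <- expand.grid(px = seq(2, 101, by = 3), py = seq(2, 85, by = 3))
G_hat(c(2, 5, 10), x, y)
F_hat(c(2, 5, 10), x, y, pts$px, pts$py)
```

Under aggregation Ĝ(r) tends to rise faster than F̂(r) (events have close neighbours, empty space is large); under regularity the reverse tends to hold.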


More on G(r) and F(r)

Under csr, the nn distance has probability density p(r) = 2πλr exp(-λπr²), where λ is the intensity (number of events per unit area), so G(r) has the form

$$G(r) = \int_0^r 2\pi\lambda x\, e^{-\lambda\pi x^2}\,dx = 1 - e^{-\lambda\pi r^2}.$$

To judge how far the empirical Ĝ(r) is from csr, a simulation envelope for Ĝ(r) can be computed from, say, 100 realizations of s1, s2, …, s652 drawn from a uniform distribution over the study area (i.e., assuming the 652 Douglas-firs follow a homogeneous Poisson process). The estimator Ĝ(r) is calculated for each realization, and for each distance r the largest and smallest values define the simulation envelope. (The envelope is not shown in the figure here.)

[Figure: Ĝ(r) × 652 and the csr curve G(r) = 1 - exp(-λπr²) against nn distance r (0 to 30) for Douglas-fir (n = 652).]
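The csr curve and the min/max envelope described above can be sketched in base R as follows. This is an illustrative sketch, not spatstat's envelope(): the names G_csr and G_envelope are hypothetical, no edge correction is used, and the study area is assumed to be the 103 × 87 rectangle used in the spatstat examples later in this chapter.

```r
# Theoretical csr curve G(r) = 1 - exp(-lambda * pi * r^2)
G_csr <- function(r, lambda) 1 - exp(-lambda * pi * r^2)

# Min/max simulation envelope of G-hat(r) under csr: simulate n uniform
# events in [0, a] x [0, b] nsim times and keep the pointwise extremes.
G_envelope <- function(r, n, a, b, nsim = 100) {
  one_sim <- function() {
    x <- runif(n, 0, a); y <- runif(n, 0, b)
    d <- as.matrix(dist(cbind(x, y))); diag(d) <- Inf
    ri <- apply(d, 1, min)                  # nn distances of this realization
    sapply(r, function(rr) mean(ri <= rr))  # G-hat(r) of this realization
  }
  # rows = distances r, columns = realizations
  sims <- matrix(replicate(nsim, one_sim()), nrow = length(r))
  list(r = r, lo = apply(sims, 1, min), hi = apply(sims, 1, max))
}

# Usage: 652 events (as for the Douglas-firs) in the assumed 103 x 87 area
r <- seq(0, 10, by = 1)
env <- G_envelope(r, n = 652, a = 103, b = 87, nsim = 100)
cbind(r, lo = env$lo, csr = G_csr(r, 652 / (103 * 87)), hi = env$hi)
```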


R package spatstat for point pattern analysis

Developed by Adrian Baddeley and Rolf Turner

The package supports:

1. creation, manipulation and plotting of point patterns

2. exploratory data analysis

3. simulation of point process models

4. parametric model-fitting

5. hypothesis tests and diagnostics

The first thing to do for all these analyses is to create a ppp object!

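Use the Douglas-fir data as an example (the three ppp() calls below are alternative ways to specify the window):

> library(spatstat)

> df.dat=subset(victoria.dat,victoria.dat$sp=="DF")

> df.ppp=ppp(df.dat$x,df.dat$y,c(0,103),c(0,87)) # rectangular window [0,103] x [0,87]

> df.ppp=ppp(df.dat$x,df.dat$y,window=owin(c(0,103),c(0,87))) # same window, built with owin()

> df.ppp=ppp(df.dat$x,df.dat$y,poly=list(x=c(0,50,60,0),y=c(0,0,60,50))) # polygon window (vertices anticlockwise); points outside it are rejected with a warning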

* Baddeley, A. J. & Gill, R. D. 1997. Kaplan-Meier estimators of interpoint distance distributions for spatial point processes. Annals of Statistics 25:263-292.

[Figure: G(r) for regular and aggregated patterns against 1st nn distance, compared with the csr curve G(r) = 1 - exp(-λπr²).]

R implementation

1. Prepare the Douglas-fir and hemlock data in ppp format (df.ppp, hl.ppp).

2. df.G=Gest(df.ppp)

3. plot(df.G)

4. plot(envelope(df.ppp,fun=Gest)) # generate and plot the simulation envelope

5. The pointwise envelopes are not "confidence bands" for the true value of the function! The test is constructed by choosing a fixed value of r and rejecting the null hypothesis if the observed function value lies outside the envelope at this value of r. This test has exact significance level alpha = 2 × nrank/(1 + nsim), where nrank is the rank of the envelope value amongst the nsim simulated values.
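The significance-level arithmetic and the pointwise decision rule in step 5 can be checked with a few lines of base R. This is a hedged sketch: envelope_alpha and pointwise_reject are hypothetical helper names, not spatstat functions, and the test is applied at a single fixed r.

```r
# Exact significance level of the pointwise envelope test at one fixed r
envelope_alpha <- function(nrank, nsim) 2 * nrank / (1 + nsim)

# Pointwise test at a fixed r: reject H0 if the observed value lies outside
# the envelope given by the nrank-th smallest and nrank-th largest simulated values
pointwise_reject <- function(obs, sims, nrank = 1) {
  s <- sort(sims)
  lo <- s[nrank]
  hi <- s[length(s) - nrank + 1]
  obs < lo || obs > hi
}

envelope_alpha(1, 99)   # 0.02 (spatstat's envelope() defaults: nsim = 99, nrank = 1)
envelope_alpha(1, 39)   # 0.05
```

Because the rule is applied at one chosen r, scanning the whole plot for any excursion gives a larger, unknown overall error rate.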


K-function

The K-function, proposed by Ripley (1976) and often called Ripley's K-function, is the most important function for quantifying mapped point patterns.

The K-function is a second-moment measure because it is closely related to the second-order intensity of a stationary isotropic point process. It captures the spatial (in)dependence between different regions of the point process. Let's first look at the 1st- and 2nd-order properties of a spatial point process.

1st-order property:

$$\lambda(x) = \lim_{|A_x| \to 0} \frac{E[N(A_x)]}{|A_x|},$$

where A_x is an infinitesimal region that contains the point x and N(A_x) is the number of events in A_x. For a stationary process, λ(x) = λ, a constant.

2nd-order property:

$$\lambda_2(x, y) = \lim_{|A_x|,\,|A_y| \to 0} \frac{E[N(A_x)\,N(A_y)]}{|A_x|\,|A_y|}.$$

For a stationary + isotropic process, λ₂(x, y) = λ₂(h), where h = ||x - y||.

* Ripley, B. D. 1976. The second-order analysis of stationary point processes. J. of Appl. Prob. 13:255-266.


Definition of K-function

The K-function is defined as

K(h) = (1/λ) E(# of other events within distance h of an arbitrary event),

so that

E(# of other events within distance h of an arbitrary event) = λK(h).


The relationship between K-function and λ₂(x, y)

$$K(h) = \frac{1}{\lambda^2}\int_0^{2\pi}\!\!\int_0^h \lambda_2(r)\, r\,dr\,d\theta = \frac{2\pi}{\lambda^2}\int_0^h \lambda_2(r)\, r\,dr,$$

where λ₂(r)/λ is interpreted as the conditional intensity of an event at x given an event at 0, i.e., λ₂(0, x)/λ with r = ||x||. This is the intensity at the point x conditional on there being an event at 0.
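The integral relationship can be evaluated numerically to confirm that a Poisson process gives K(h) = πh². This is a minimal sketch with base R's integrate(); K_from_lambda2 is a hypothetical name, the intensity 0.07 is an arbitrary illustrative value, and the "clustered" λ₂ is a made-up function used only to show the direction of the departure.

```r
# K(h) = (2 * pi / lambda^2) * integral_0^h lambda2(r) * r dr
K_from_lambda2 <- function(h, lambda, lambda2) {
  (2 * pi / lambda^2) * integrate(function(r) lambda2(r) * r, lower = 0, upper = h)$value
}

lambda <- 0.07   # illustrative intensity (events per unit area)

# Poisson process: lambda2(r) = lambda^2 (vectorized for integrate)
poisson_lambda2 <- function(r) rep(lambda^2, length(r))
K_from_lambda2(10, lambda, poisson_lambda2)    # approximately pi * 10^2

# Hypothetical clustered process: extra pair density at short distances
clustered_lambda2 <- function(r) lambda^2 * (1 + exp(-r / 2))
K_from_lambda2(10, lambda, clustered_lambda2)  # exceeds pi * 10^2
```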


The properties of K(h)

1. For a Poisson process, λ₂(r) = λ², so K(h) = πh². This is used as the null model for csr:

K(h) > πh² suggests an aggregated pattern.

K(h) = πh² suggests a random pattern.

K(h) < πh² suggests a regular pattern.

2. The K-function is invariant under random thinning. By "random thinning" we mean that each event of a process is retained or deleted according to independent Bernoulli trials with a common retention probability. This property means that the K-function of the resulting thinned process is identical to that of the original, unthinned process.


A simple estimator of K(h)

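$$\hat K(h) = \frac{1}{\hat\lambda}\,\frac{1}{N}\sum_{i=1}^{N}\sum_{\substack{j=1\\ j\ne i}}^{N} I(\|s_i - s_j\| \le h),$$

where N is the number of events, λ̂ = N/|A| is the estimated intensity in the study area A, and I(·) is the indicator function.

[Figure: a circle of radius h centred at event s_i, containing another event s_j.]

Edge effect: points close to the edges will have fewer points within the circle of radius h than points far from the edges, because part of the circle lies outside the study area.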
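The simple estimator translates directly into base R. This is a sketch under the assumption that the area |A| is supplied by the user; K_naive is a hypothetical name and, like the formula, it applies no edge correction.

```r
# Simple (no edge correction) estimator:
# K-hat(h) = (1/lambda-hat) * (1/N) * sum_i sum_{j != i} I(||s_i - s_j|| <= h),
# with lambda-hat = N / |A|.
K_naive <- function(h, x, y, area) {
  n <- length(x)
  lambda_hat <- n / area
  d <- as.matrix(dist(cbind(x, y)))
  diag(d) <- Inf                     # exclude the pairs with j == i
  sapply(h, function(hh) sum(d <= hh) / (lambda_hat * n))
}

# Usage: csr pattern in an assumed 103 x 87 area, compared with pi * h^2
set.seed(3)
x <- runif(200, 0, 103); y <- runif(200, 0, 87)
h <- c(5, 10, 20)
cbind(h, K = K_naive(h, x, y, 103 * 87), csr = pi * h^2)
```

Because of the edge effect, the naive values tend to fall below πh², and the shortfall grows with h.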


Toroidal unbiased estimator of K(h)

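Because of the edge effect, the simple estimator is biased (it underestimates K(h), increasingly so at larger h) and is not very efficient. An alternative is the estimator based on toroidal correction, in which the rectangular study area is wrapped onto a torus (opposite edges joined). It has the same form as the simple estimator,

$$\hat K(h) = \frac{1}{\hat\lambda}\,\frac{1}{N}\sum_{i=1}^{N}\sum_{\substack{j=1\\ j\ne i}}^{N} I(\|s_i - s_j\| \le h),$$

but the distances ||s_i - s_j|| are measured on the torus, so N+, the number of points that fall within ||s_i - s_j|| ≤ h, also counts points lying across the edges.

[Figure panels: toroidal edge correction, to be used only for stationary + isotropic patterns; misuse of toroidal edge correction for non-stationary patterns.]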
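A minimal base-R sketch of the toroidal correction for a rectangle [0, a] × [0, b] follows. K_torus is a hypothetical name; the only change from the simple estimator is that each coordinate difference is wrapped around, so the shorter of the direct and the "across the edge" separation is used.

```r
# Toroidal edge correction for a rectangular study area [0, a] x [0, b]:
# the rectangle is wrapped onto a torus, so distances "wrap around" edges.
K_torus <- function(h, x, y, a, b) {
  n <- length(x)
  lambda_hat <- n / (a * b)
  dx <- abs(outer(x, x, "-")); dx <- pmin(dx, a - dx)  # wrap-around x separation
  dy <- abs(outer(y, y, "-")); dy <- pmin(dy, b - dy)  # wrap-around y separation
  d <- sqrt(dx^2 + dy^2)
  diag(d) <- Inf                                       # exclude j == i
  sapply(h, function(hh) sum(d <= hh) / (lambda_hat * n))
}

# Usage: csr pattern in an assumed 103 x 87 area, compared with pi * h^2
set.seed(5)
x <- runif(200, 0, 103); y <- runif(200, 0, 87)
h <- c(5, 10, 20)
cbind(h, K = K_torus(h, x, y, 103, 87), csr = pi * h^2)
```

As the slide warns, wrapping implicitly assumes the pattern just outside one edge looks like the pattern inside the opposite edge, which is only reasonable for stationary patterns.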


Weighted unbiased estimator of K(h)

Another unbiased estimator, initially proposed by Ripley (1976), gives more weight to points near the boundaries:

$$\hat K(h) = \frac{1}{\hat\lambda}\,\frac{1}{N}\sum_{i=1}^{N}\sum_{\substack{j=1\\ j\ne i}}^{N} w(s_i, s_j)^{-1}\, I(\|s_i - s_j\| \le h),$$

where the weight w(s_i, s_j) is the proportion of the circumference of the circle centred at s_i and passing through s_j that lies within the study area (s_i must be within the study area). w(s_i, s_j) = 1 if the circle lies entirely within the study area.

[Figure: a circle centred at s_i and passing through s_j, partly outside the study area.]


Computing w(si, sj)

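Assume the study area is A = [0, a] × [0, b] and s_i has coordinates s_i = (x, y). Rewrite w(s_i, s_j) = w(s_i, h), where h = ||s_i - s_j|| is the radius of the circle centred at s_i. Denote d1 = min(x, a - x) and d2 = min(y, b - y); thus d1 and d2 are the distances from s_i to the nearest vertical and horizontal edges of A.

w(s_i, h) is calculated as follows (assuming h is at most half the shorter side of A, so that only the nearest vertical and horizontal edges can be crossed):

1. If h² > d1² + d2² (the circle contains the nearest corner and intersects both the vertical and horizontal edges):

$$w(s_i, h) = \frac{3}{4} - \frac{1}{2\pi}\left[\cos^{-1}\!\left(\frac{d_1}{h}\right) + \cos^{-1}\!\left(\frac{d_2}{h}\right)\right].$$

2. If h² ≤ d1² + d2² (the corner lies outside the circle; the circle intersects one edge, or both edges without containing the corner):

$$w(s_i, h) = 1 - \frac{1}{\pi}\left[\cos^{-1}\!\left(\frac{\min(d_1, h)}{h}\right) + \cos^{-1}\!\left(\frac{\min(d_2, h)}{h}\right)\right].$$

If h ≤ d1 and h ≤ d2, both terms vanish and w(s_i, h) = 1.

[Figure: circles of radius h centred at s_i in different positions relative to the edges of A.]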
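The two weight formulas and Ripley's weighted estimator can be sketched in base R as follows, assuming a rectangular study area, h ≤ min(a, b)/2, and no coincident events (a zero distance would give a 0/0 weight). The names w_rect and K_ripley are hypothetical; spatstat's Kest computes the same kind of correction internally (its "iso" column), but this code is not taken from it.

```r
# Ripley's weight for a rectangle [0, a] x [0, b]: proportion of the
# circumference of the circle centred at (x, y) with radius h inside the area.
# Assumes h <= min(a, b) / 2 and h > 0.
w_rect <- function(x, y, h, a, b) {
  d1 <- min(x, a - x)   # distance to nearest vertical edge
  d2 <- min(y, b - y)   # distance to nearest horizontal edge
  if (h^2 <= d1^2 + d2^2) {
    # corner outside the circle: crosses one edge, both edges, or none
    1 - (acos(min(d1, h) / h) + acos(min(d2, h) / h)) / pi
  } else {
    # corner inside the circle: crosses both edges
    3 / 4 - (acos(d1 / h) + acos(d2 / h)) / (2 * pi)
  }
}

# Weighted (Ripley) estimator of K(h); each weight uses the circle centred
# at s_i passing through s_j, i.e. radius ||s_i - s_j||.
K_ripley <- function(h, x, y, a, b) {
  n <- length(x)
  lambda_hat <- n / (a * b)
  d <- as.matrix(dist(cbind(x, y)))
  sapply(h, function(hh) {
    total <- 0
    for (i in seq_len(n)) for (j in seq_len(n)) {
      if (i != j && d[i, j] <= hh) total <- total + 1 / w_rect(x[i], y[i], d[i, j], a, b)
    }
    total / (lambda_hat * n)
  })
}

# Usage: csr pattern in an assumed 103 x 87 area, compared with pi * h^2
set.seed(4)
x <- runif(150, 0, 103); y <- runif(150, 0, 87)
h <- c(5, 10, 20)
cbind(h, K = K_ripley(h, x, y, 103, 87), csr = pi * h^2)
```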


Variance and simulation envelopes

As mentioned earlier, csr has K(h) = πh². It is usual to express K(h) as

$$\hat K_0(h) = \sqrt{\hat K(h)/\pi}. \qquad (*)$$

This transformation stabilizes the variance: under csr, the variance of the transformed K̂₀(h) is approximately

$$\operatorname{Var}\{\hat K_0(h)\} \approx \frac{|A|}{2\pi N^2},$$

which does not depend on h.

To judge how far the observed K-function deviates from csr, a simulation envelope can be constructed from, say, 99 realizations of s1, s2, …, s982 drawn from a uniform distribution over the study area (i.e., assuming the 982 western hemlock trees follow a homogeneous Poisson process). The K-function is calculated for each realization, and for each distance h the largest and smallest values define the simulation envelope.
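The transformation (*) and the approximate csr standard deviation can be turned into a rough ±2 sd band around K̂₀(h) = h. This is a hedged sketch: K0 and K0_sd are hypothetical names, the variance approximation ignores edge effects and is only a rough guide (a simulation envelope is preferable), and the 103 × 87 area for the hemlock plot is an assumption borrowed from the spatstat examples.

```r
# Variance-stabilizing transform K0(h) = sqrt(K(h) / pi); under csr K0(h) = h
K0 <- function(K) sqrt(K / pi)

# Approximate csr standard deviation of K0-hat(h): sqrt(|A| / (2 * pi * N^2)),
# which does not depend on h
K0_sd <- function(area, n) sqrt(area / (2 * pi * n^2))

# Example: N = 982 hemlocks in an assumed 103 x 87 area
h <- c(5, 10, 20)
s <- K0_sd(103 * 87, 982)
cbind(h, lower = h - 2 * s, upper = h + 2 * s)
```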


R implementation

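Let's model the distribution of the 982 western hemlocks. The spatstat function Kest computes K̂(h); the transformation presented on the previous page is then applied to its output (spatstat's Lest returns sqrt(K̂(h)/π) directly).

> hl.kest=Kest(hl.ppp) # hl.ppp is a spatstat ppp object

> plot(hl.kest)

> plot(hl.kest$r,sqrt(hl.kest$iso/pi)-hl.kest$r,type="l") # K0(h) - h, isotropic edge correction

> hl.env=envelope(hl.ppp) # csr simulation envelope of Kest (99 simulations by default)

> plot(hl.kest$r,sqrt(hl.kest$iso/pi),type="l")

> lines(hl.env$r,sqrt(hl.env$lo/pi),col=2)

> lines(hl.env$r,sqrt(hl.env$hi/pi),col=2)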


L-function

In practice, the K-function is usually displayed as the L-function, defined as

$$\hat L(h) = \hat K_0(h) - h.$$

• For an aggregated distribution, E(L̂(h)) > 0.

• For a random distribution, E(L̂(h)) = 0.

• For a regular distribution, E(L̂(h)) < 0.

Examples:

[Figure: L̂(h) against h (0 to 50) for Douglas-fir (n = 652) and hemlock (n = 982).]
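A small base-R sketch of the L-function and the sign rule above. L_hat and classify_L are hypothetical names; the tolerance tol is an assumption standing in for whatever band is used in practice (a simulation envelope or a ±2 sd band), since E(L̂(h)) itself is not observed.

```r
# L(h) = sqrt(K(h) / pi) - h, computed from K values at distances h
L_hat <- function(K, h) sqrt(K / pi) - h

# Hypothetical helper: label each h by the sign of L(h), treating values
# within +/- tol of 0 as consistent with a random pattern
classify_L <- function(L, tol) {
  ifelse(L > tol, "aggregated", ifelse(L < -tol, "regular", "random"))
}

h <- c(5, 10, 20)
K <- pi * h^2 * c(1.3, 1.0, 0.8)   # illustrative K values, not real data
classify_L(L_hat(K, h), tol = 0.1)
```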


g-function (pair correlation function)

The g-function is the derivative of the K-function, scaled by the circumference 2πh:

$$g(h) = \frac{K'(h)}{2\pi h}.$$

The g-function describes how the K-function changes with the spatial distance lag h; under csr, K'(h) = 2πh and hence g(h) = 1. The K-function is cumulative, so it may confound large-scale (large h) effects with small-scale (small h) effects. The g-function is said to be able to separate these effects.

R implementation:

> pcf(hl.ppp)
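The definition can be illustrated by differencing K̂ on a grid of distances. This is a crude sketch, not how spatstat's pcf works (pcf smooths rather than differencing); g_from_K is a hypothetical name and the result is reported at the grid midpoints.

```r
# g(h) = K'(h) / (2 * pi * h), with K' approximated by finite differences
# of K evaluated on a grid of h values; returned at the grid midpoints.
g_from_K <- function(h, K) {
  dK <- diff(K) / diff(h)                   # slope of K between grid points
  h_mid <- (head(h, -1) + tail(h, -1)) / 2  # midpoints of the grid intervals
  data.frame(h = h_mid, g = dK / (2 * pi * h_mid))
}

h <- seq(0, 10, by = 0.5)
g_from_K(h, pi * h^2)$g   # csr: all (numerically) equal to 1
```

With an estimated K̂(h), the differences are noisy, so in practice a smoothed estimate such as pcf() is used.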


Bivariate spatial point patterns

A bivariate spatial point pattern consists of the locations of two types of events in a bounded study area A, e.g., the distributions of two tree species (Douglas-fir and western hemlock). It can be written as {s_j^(i): i = 1, 2; j = 1, 2, …}, where s_j^(i) is the jth location of species type i (i = 1, 2). The two species may or may not be spatially independent. A natural working hypothesis is that the patterns of the two species are independent. However, it is worth noting that independence does not necessarily imply csr for each of the species.

Similar to the univariate case, the K-function can be extended to the bivariate case to quantify the relationship between the two species:

K12(h) = (1/λ₂) E(# of type 2 events within distance h of an arbitrary type 1 event),

where λ₂ (without an argument) here denotes the intensity of type 2 events.

If the two species are independent (e.g., both at csr and independent of each other), K12(h) (= K21(h)) has the simple form

K12(h) = πh².


An unbiased estimator

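For given data, K12(h) and K21(h) can be estimated, respectively, as

$$\tilde K_{12}(h) = \frac{1}{\hat\lambda_2}\,\frac{1}{N_1}\sum_{i=1}^{N_1}\sum_{j=1}^{N_2} w\big(s_i^{(1)}, s_j^{(2)}\big)^{-1}\, I\big(\|s_i^{(1)} - s_j^{(2)}\| \le h\big),$$

$$\tilde K_{21}(h) = \frac{1}{\hat\lambda_1}\,\frac{1}{N_2}\sum_{i=1}^{N_2}\sum_{j=1}^{N_1} w\big(s_i^{(2)}, s_j^{(1)}\big)^{-1}\, I\big(\|s_i^{(2)} - s_j^{(1)}\| \le h\big),$$

where N_1 and N_2 are the numbers of type 1 and type 2 events, λ̂_1 = N_1/|A| and λ̂_2 = N_2/|A|, and w(s_i^(1), s_j^(2)) is the proportion of the circumference of the circle with centre s_i^(1) and radius ||s_i^(1) - s_j^(2)|| that lies within the study region A.

[Figure: a circle of radius h centred at a type 1 event s_i^(1), with a type 2 event s_j^(2).]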
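A base-R sketch of K̃12(h) for a rectangular study area; K̃21(h) is obtained by swapping the roles of the two types. The names w_rect and K12_tilde are hypothetical, the rectangle weight from the "Computing w(si, sj)" slide is repeated so the block runs on its own, and it assumes h ≤ min(a, b)/2 and no coincident type 1 and type 2 events.

```r
# Rectangle weight (proportion of the circle centred at (x, y) with radius h
# inside [0, a] x [0, b]); assumes 0 < h <= min(a, b) / 2.
w_rect <- function(x, y, h, a, b) {
  d1 <- min(x, a - x); d2 <- min(y, b - y)
  if (h^2 <= d1^2 + d2^2) 1 - (acos(min(d1, h) / h) + acos(min(d2, h) / h)) / pi
  else 3 / 4 - (acos(d1 / h) + acos(d2 / h)) / (2 * pi)
}

# K12-tilde(h) = (1/lambda2-hat)(1/N1) sum_i sum_j w(s_i^(1), s_j^(2))^(-1) I(||s_i^(1) - s_j^(2)|| <= h)
K12_tilde <- function(h, x1, y1, x2, y2, a, b) {
  n1 <- length(x1)
  lambda2_hat <- length(x2) / (a * b)
  d <- sqrt(outer(x1, x2, "-")^2 + outer(y1, y2, "-")^2)  # N1 x N2 cross distances
  sapply(h, function(hh) {
    total <- 0
    for (i in seq_len(n1)) for (j in seq_along(x2)) {
      if (d[i, j] <= hh) total <- total + 1 / w_rect(x1[i], y1[i], d[i, j], a, b)
    }
    total / (lambda2_hat * n1)
  })
}

# Usage: two independent csr types in an assumed 103 x 87 area
set.seed(6)
x1 <- runif(80, 0, 103); y1 <- runif(80, 0, 87)
x2 <- runif(120, 0, 103); y2 <- runif(120, 0, 87)
h <- c(5, 10, 20)
cbind(h, K12 = K12_tilde(h, x1, y1, x2, y2, 103, 87),
      K21 = K12_tilde(h, x2, y2, x1, y1, 103, 87), csr = pi * h^2)
```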


A reduced-variance estimator

When the underlying processes for both species are independent Poisson, Lotwick & Silverman (1982) show that the most efficient estimator is a linear combination:

$$\hat K_{12}(h) = \frac{N_1 \tilde K_{12}(h) + N_2 \tilde K_{21}(h)}{N_1 + N_2}.$$

Because K12(h) = πh² under csr, we can similarly define an L-function:

$$\hat L_{12}(h) = \sqrt{\hat K_{12}(h)/\pi} - h.$$

• For an aggregated distribution, E(L̂12(h)) > 0.

• For a random distribution, E(L̂12(h)) = 0.

• For a regular distribution, E(L̂12(h)) < 0.

* Lotwick, H. W. & Silverman, B. W. 1982. Methods for analysing spatial processes of several types of points. J. R. Stat. Soc. B, 44:406-413.
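The combination and the bivariate L-function are one-liners once K̃12(h) and K̃21(h) are available. K12_combined and L12_hat are hypothetical names; the input values below are illustrative numbers, not results from the Victoria data.

```r
# Lotwick-Silverman linear combination of the two cross estimators:
# each estimator is weighted by the number of centre events it averages over
K12_combined <- function(K12_tilde, K21_tilde, n1, n2) {
  (n1 * K12_tilde + n2 * K21_tilde) / (n1 + n2)
}

# Bivariate L-function: L12(h) = sqrt(K12(h) / pi) - h
L12_hat <- function(K12, h) sqrt(K12 / pi) - h

h <- 10
K <- K12_combined(K12_tilde = 330, K21_tilde = 300, n1 = 80, n2 = 120)  # illustrative values
L12_hat(K, h)   # positive values point towards aggregation of the two types
```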


R implementation

Let's use R (spatstat) to compute K12(h) for redcedar and western hemlock.

> victoria.ppp=ppp(victoria.dat$x,victoria.dat$y,c(0,103),c(0,87),marks=factor(victoria.dat$sp))

> cdhl.kcross=Kcross(victoria.ppp,"HL","CD") # type "HL" as centres, "CD" as neighbours

> plot(cdhl.kcross)

Also see Kmulti:

> plot(Kmulti(victoria.ppp,victoria.ppp$marks=="CD",victoria.ppp$marks=="HL"))


[Figure: K12(h) against distance h (0 to 50), with panels for hemlock-redcedar and Douglas-fir-hemlock.]


Assignment: Compute the bivariate L-function for CD and HL of victoria.dat

> victoria.ppp=ppp(victoria.dat$x,victoria.dat$y,c(0,103),c(0,87),marks=factor(victoria.dat$sp))

> cdhl.kcross=Kcross(victoria.ppp,"HL","CD")

> plot(cdhl.kcross)

> cdhl.env=envelope(victoria.ppp, Kcross, i="HL", j="CD")

> cdhl.lfn=sqrt(cdhl.kcross$iso/pi)-cdhl.kcross$r

> plot(cdhl.kcross$r, cdhl.lfn, type="l", ylim=c(-0.25,1.1), xlab="h", ylab="L function")

> lines(cdhl.env$r, sqrt(cdhl.env$hi/pi)-cdhl.env$r, col="red")

> lines(cdhl.env$r, sqrt(cdhl.env$lo/pi)-cdhl.env$r, col="blue")
