Page 1: Using Plot Correlation Structure to Compare Species

Using Plot Correlation Structure to Compare Species Accumulation Curves Derived from Nested and Non-nested Sampling Schemes

Abstract

An approach for computing the correlations of species richness values obtained from nested and non-nested sampling schemes is described. The correlations obtained are used to construct species accumulation curves for the Carolina Vegetation Survey data set using the log Arrhenius, Gleason, and Arrhenius models. These models are then used to predict the richness at the 400 m² scale, a value outside the range of the data. To compare nested and non-nested sampling schemes under the different models, the MSE, bias, and variance of the different estimators of predicted species richness were calculated. These quantities were then used to make aggregated comparisons (comparing the overall distributions of the estimator properties, ignoring the plot from which they came) and pairwise comparisons (comparing the results of the different sampling schemes on the same plot, for all plots). Comparisons were made both with (GLS) and without (OLS) the plot correlation structure included in the fitting of the models, and the results of the two estimation methods were contrasted. The aggregated approach provided a clear MSE ranking of the models: the Arrhenius model is superior to the log Arrhenius model, which is superior to the Gleason model. This same ranking was obtained for both nested and non-nested samples. The estimator based on the Gleason model is negatively biased, often extremely so, but highly precise. The estimator based on the log Arrhenius model is the least precise of the three and is positively biased. The estimator based on the Arrhenius model is also positively biased, but typically less so than the log Arrhenius estimator. GLS estimation greatly improved the estimates obtained from the log Arrhenius model. A paired-difference approach was used to compare the sampling schemes. Results varied depending on the model used.
With the log Arrhenius model, non-nested samples yield estimates that typically are more (positively) biased and have larger MSE than those from nested samples, although GLS estimation removes much of the difference in MSE and some of the difference in bias. With the Gleason model, non-nested samples yield better estimates than nested samples by all three criteria: MSE, bias, and variance. But the Gleason model with either sampling scheme yields extremely biased estimates, a fact that is not improved by switching to GLS estimation. With the Arrhenius model, nested and non-nested samples are indistinguishable with respect to MSE and variance. Estimates from nested samples do, however, tend to be less biased than those from non-nested samples, with the difference between them increasing monotonically with the value being estimated.

Part 1: Deriving the Correlation Structure

Background

In thinking about how to quantify the inherent correlation of nested samples, I initially headed down the wrong path. Following the approach taken by Cam et al. (2002), I looked to the overlap in area between the samples as the source of the correlation. Because the species accumulation relation is nonlinear, this is the wrong way to tackle the problem.

The Rosetta stone for me in all this was realizing that while the nestedness or non-nestedness of the samples does control the structure of the observed inter-plot correlation matrix, the sampling scheme by itself is not the cause of the correlation. For that we need to look to the response variable, species richness. Because species richness is computed from species lists, it induces a correlation in the response independently of the sampling scheme used. As a result, the richness values obtained from both nested and non-nested samples are typically correlated. The primary role of the sampling scheme is to determine the pattern of the correlation.


Incorporating the Correlation Structure of Plots … 2

I begin by focusing generically on pairs of samples, without reference to whether the samples are used to fit species accumulation curves or species area curves. Initially I also ignore the underlying sampling scheme. Fig. 1 shows the three possible configurations, in terms of shared species, for two plots at different scales in which a species richness value of 2 was obtained for both. In Fig. 1a exactly the same species appear at each scale, so the plots are perfectly correlated. In Fig. 1b entirely different species appear at each scale, so there is no overlap and the plots exhibit no correlation. In Fig. 1c one species is found at both scales. Since the plots have half their species in common, the correlation should be 0.5.

[Figure 1 graphic. Each panel shows a Scale 1 plot and a Scale 2 plot, both with Richness = 2. (a) Both species in common: Species A and B at both scales, r = 1. (b) No species in common: Species A and B at one scale, Species C and D at the other, r = 0. (c) One species in common: Species A and B at one scale, Species A and D at the other, r = 0.5.]

Figure 1 Three examples of pairs of plots at different scales with the same richness values but different correlations. The richness-induced correlations vary depending on species composition.

Next let's include the sampling scheme in the discussion. If we're told that the plots depicted in Fig. 1 are nested, then only the scenario of Fig. 1a is possible. In short, if the same richness is observed at two different scales using a nested sampling scheme, then the plots must contain the same species and hence the outcomes are perfectly correlated. With non-nested samples, on the other hand, all three scenarios of Fig. 1 could occur, although their relative likelihood would be controlled by the abundance and distribution of the various species in the subplots.

Extending these results to plots with unequal numbers of species is straightforward. Consider the case where richness values of 1 and 3 are obtained for two plots. As before, if the plots share no species, the correlation is zero. This leaves the case where the plots share a single species. There are two possible scenarios (Fig. 2).

[Figure 2 graphic. (a) Larger area plot has more species: Scale 1 has Richness = 1 (Species B) and Scale 2 has Richness = 3 (Species A, B, C); r = 0.33. (b) Smaller area plot has more species: Scale 1 has Richness = 3 (Species A, B, C) and Scale 2 has Richness = 1 (Species B); r = –0.33.]

Figure 2 Two examples of pairs of plots at different scales with different richness values but overlapping species lists. The sign of the correlation is determined by which plot, the larger (+) or the smaller (–), has the greater number of species.


Since the plots in Fig. 2 share 1/3 of their species, the magnitude of the correlation in both cases is 1/3. The sign of the correlation indicates whether richness changes in the same direction as area or in the opposite direction. The negative sign in Fig. 2b arises because richness decreased when plot area increased.
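A small Python sketch can make the Fig. 1 and Fig. 2 values concrete. It assumes one reading of the figures that reproduces all five worked values: the magnitude is the number of shared species divided by the larger of the two richness values, and the sign is negative only when the smaller-area plot has more species. The function name `richness_correlation` is hypothetical, and the rule is an interpretation of the figures rather than a formula stated in the text.

```python
def richness_correlation(smaller_area_plot, larger_area_plot):
    """Signed richness correlation for two species lists (hypothetical helper).

    Assumed rule, read off Figs. 1 and 2:
      magnitude = (number of shared species) / max(richness of the two plots)
      sign      = -1 if the smaller-area plot has more species, else +1
    """
    small, large = set(smaller_area_plot), set(larger_area_plot)
    top = max(len(small), len(large))
    if top == 0:
        return 1.0  # both plots empty: treated as perfectly correlated, as in eqn (1)
    magnitude = len(small & large) / top
    sign = -1.0 if len(small) > len(large) else 1.0
    return sign * magnitude


# Fig. 1: both plots have richness 2
print(richness_correlation({"A", "B"}, {"A", "B"}))  # 1.0
print(richness_correlation({"A", "B"}, {"C", "D"}))  # 0.0
print(richness_correlation({"A", "B"}, {"A", "D"}))  # 0.5
# Fig. 2: richness 1 and 3, one shared species
print(round(richness_correlation({"B"}, {"A", "B", "C"}), 2))  # 0.33
print(round(richness_correlation({"A", "B", "C"}, {"B"}), 2))  # -0.33
```

When the smaller plot's species are a subset of the larger plot's, as in a nested scheme, this rule reduces to the ratio of the two richness values.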

It's worth noting at this point that while negative correlations can occur in species area curve calculations for non-nested samples, they can never occur in species accumulation curve calculations, regardless of the sampling scheme. This is because with species accumulation curves, by construction, the species list at each stage is added to the accumulated list of species already encountered, so the number of species present is never allowed to decrease. Thus non-nested samples yield accumulated species lists that are nested. As a result, when fitting species accumulation curves we should expect similar between-subplot correlations regardless of the sampling scheme used.

The discussion accompanying Figs. 1 and 2 generalizes easily to any pair of plots. In a nested sampling scheme, calculating the inter-plot correlation for use in estimating species accumulation curves requires only that we know the richness values of the two plots in question. Consider two plots a and b. Suppose b has greater area than a and that a is properly nested in b. The richness correlation of these two plots, denoted $r_{ab}$, is calculated as follows:

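$$
r_{ab} = \begin{cases} \dfrac{\operatorname{richness}(a)}{\operatorname{richness}(b)}, & \text{if } \operatorname{richness}(b) \neq 0 \\[6pt] 1, & \text{if } \operatorname{richness}(b) = 0 \end{cases} \tag{1}
$$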
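Because eqn (1) needs only the richness values, the six pairwise correlations among four nested subplots can be assembled into a matrix in a few lines. This is a sketch in Python with NumPy; the function names and the richness values in the usage line are hypothetical, not survey data.

```python
import numpy as np


def nested_correlation(richness_a, richness_b):
    """Eqn (1): plot a is nested in the larger plot b."""
    if richness_b == 0:
        return 1.0
    return richness_a / richness_b


def nested_correlation_matrix(richness):
    """Pairwise eqn (1) correlations for subplots ordered by increasing area."""
    n = len(richness)
    r = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            r[i, j] = r[j, i] = nested_correlation(richness[i], richness[j])
    return r


# Hypothetical richness at the 0.1, 1, 10, and 100 m^2 subplots of one nested sample
print(np.round(nested_correlation_matrix([3, 7, 14, 28]), 2))
```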

For non-nested samples the protocol is only slightly more complicated. We begin by replacing the sequence of non-nested subplots (arranged in order of increasing area) with a corresponding sequence of "equivalent" nested subplots obtained by merging species lists. Having reduced the problem to one of comparing nested subplots, we can obtain the correlation between subplots from eqn (1) above. Further details and an example of the calculations for non-nested samples are given below.

Nested Samples

Eqn (1) was used to calculate the six pairwise correlations among the four subplots in each of the 5408 available nested samples (1352 plots with four replicates in each plot). To get a sense of how the different correlations compare to each other, their frequency distributions were compared. Kernel density estimates (smoothed histograms) of the correlation coefficients are shown in Fig. 3 below. Observe that the correlation distributions for plots the same "distance" apart (on a log10 scale) tend to cluster together.


[Figure 3 graphic: "Nested Samples Correlation Structure", kernel density curves for r12, r13, r14, r23, r24, and r34; x-axis: Estimated Correlation (0.0 to 1.0); y-axis: Density.]

Figure 3 The subscripts 1, 2, 3, and 4 on the correlation labels in the figure refer to nested subplots at the 0.1, 1, 10, and 100 m² scales, respectively (n = 5408 for each curve). Solid lines correspond to the correlation distributions of plot sizes that are one unit apart (on a log10 scale), dashed lines to correlations between plot sizes two units apart, and the dotted line to the correlation between plot sizes that are three units apart.

If the correlation that maximizes the density for each distribution in Fig. 3 is used as a “typical” value, the following correlation matrix is obtained:

$$
\mathbf{r} = \begin{pmatrix} 1 & .37 & .15 & .07 \\ .37 & 1 & .48 & .22 \\ .15 & .48 & 1 & .50 \\ .07 & .22 & .50 & 1 \end{pmatrix} \tag{2}
$$

where the entry in row i, column j of this matrix is the correlation between subplots i and j. The nearly banded pattern of the matrix in eqn (2), with entries that decline with distance from the main diagonal, approximates the correlation structure one would expect from a first-order autoregressive, AR(1), process.

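AR(1) structure:

$$
\mathbf{r} = \begin{pmatrix} 1 & \rho & \rho^2 & \rho^3 \\ \rho & 1 & \rho & \rho^2 \\ \rho^2 & \rho & 1 & \rho \\ \rho^3 & \rho^2 & \rho & 1 \end{pmatrix}
$$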

(Note: the estimated correlation matrix of eqn (2) suggests that two distinct values of ρ might be more appropriate: one value for row 1 and column 1, and another for the rest of the matrix.) The resemblance to an AR(1) correlation structure is a satisfying result, because theoretically an AR(1) process is exactly what one would expect based on the degree of area overlap of the subplots. The degree of area overlap itself possesses an AR(1) structure with ρ = 0.10.

Area overlap for pairs of nested subplots:

$$
\begin{pmatrix} 1 & .1 & .01 & .001 \\ .1 & 1 & .1 & .01 \\ .01 & .1 & 1 & .1 \\ .001 & .01 & .1 & 1 \end{pmatrix}
$$
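The AR(1) pattern and the area-overlap matrix can be checked numerically. The sketch below (Python with NumPy; `ar1_correlation` is a hypothetical helper) builds the AR(1) matrix for a given ρ and confirms that the area-overlap ratios of the nested 0.1, 1, 10, and 100 m² subplots, computed as smaller area divided by larger area, form exactly an AR(1) matrix with ρ = 0.10.

```python
import numpy as np


def ar1_correlation(rho, n=4):
    """AR(1) correlation matrix: entry (i, j) equals rho ** |i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


areas = np.array([0.1, 1.0, 10.0, 100.0])  # nested subplot areas, m^2
# Fraction of the larger subplot's area covered by the smaller (nested) one
overlap = np.minimum.outer(areas, areas) / np.maximum.outer(areas, areas)

print(np.allclose(overlap, ar1_correlation(0.10)))  # True
```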

Non-nested Samples

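Let $x_1, x_2, \ldots, x_n$ be the species lists for a sequence of non-nested plots arranged in order of increasing area. Define

$$
\begin{aligned} y_1 &= x_1 \\ y_2 &= x_1 \cup x_2 \\ &\;\;\vdots \\ y_n &= x_1 \cup x_2 \cup \cdots \cup x_n \end{aligned}
$$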

Clearly, by definition, $y_1 \subseteq y_2 \subseteq \cdots \subseteq y_n$. Suppose $y_a \subseteq y_b$. Then the correlation between non-nested plots a and b is defined as follows:

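$$
r_{ab} = \begin{cases} \dfrac{\operatorname{richness}(y_a)}{\operatorname{richness}(y_b)}, & \text{if } \operatorname{richness}(y_b) \neq 0 \\[6pt] 1, & \text{if } \operatorname{richness}(y_b) = 0 \end{cases} \tag{3}
$$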

Observe that except for notation this formula is identical to eqn (1) for nested samples.

Note: In principle eqn (3) should be multiplied by

$$
\operatorname{sign}\!\left[\frac{\operatorname{richness}(y_b) - \operatorname{richness}(y_a)}{\operatorname{area}(y_b) - \operatorname{area}(y_a)}\right]
$$

which takes the value +1 when richness increases with area and –1 when richness decreases with area. As was noted above, this additional factor is unnecessary for species accumulation curves (but not for species area curves), because accumulated richness is a nondecreasing function of area and hence the correlation is always nonnegative.

Here's an example of applying the formula. A random selection of subplots at different spatial scales from plot 001-02-0202 yielded the following species lists, the $x_i$'s in the notation above.

$x_1$: 9324 11694 11949 15051

$x_2$: 4887 7476 11949 15051

$x_3$: 3920 6305 7469 7476 9898 11694 11949 15051

$x_4$: 100 2468 2963 3057 7476 7821 9324 9898 10425 10504 10584 11694 11949 12957 13959 15051 15998


Next form the accumulated species lists, the $y_i$'s.

$y_1$: 9324 11694 11949 15051

$y_2$: 9324 11694 11949 15051 4887 7476

$y_3$: 9324 11694 11949 15051 4887 7476 3920 6305 7469 9898

$y_4$: 9324 11694 11949 15051 4887 7476 3920 6305 7469 9898 100 2468 2963 3057 7821 10425 10504 10584 12957 13959 15998

The corresponding richness values are

$$
\begin{array}{l|cccc} \text{Plot} & y_1 & y_2 & y_3 & y_4 \\ \hline \text{Richness} & 4 & 6 & 10 & 21 \end{array}
$$

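Using eqn (3), the inter-plot correlations are

$$
\begin{array}{l|cccccc} \text{Correlation} & r_{12} & r_{13} & r_{14} & r_{23} & r_{24} & r_{34} \\ \hline \text{Value} & \tfrac{4}{6} & \tfrac{4}{10} & \tfrac{4}{21} & \tfrac{6}{10} & \tfrac{6}{21} & \tfrac{10}{21} \end{array}
$$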

Arranging these in the form of a correlation matrix yields

Example correlation matrix:

$$
\mathbf{r} = \begin{pmatrix} 1 & .67 & .40 & .19 \\ .67 & 1 & .60 & .29 \\ .40 & .60 & 1 & .48 \\ .19 & .29 & .48 & 1 \end{pmatrix}
$$
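The worked example can be reproduced with a short Python sketch of eqn (3). The helper names are hypothetical. The list for $x_4$ includes species 15998, which appears in the accumulated list $y_4$ and therefore must come from $x_4$; with it, the accumulated richness values match the 4, 6, 10, and 21 reported above.

```python
import numpy as np

# Species lists x_1..x_4 for plot 001-02-0202, smallest to largest scale.
# Species 15998 is included in x_4 because it appears in y_4.
x = [
    [9324, 11694, 11949, 15051],
    [4887, 7476, 11949, 15051],
    [3920, 6305, 7469, 7476, 9898, 11694, 11949, 15051],
    [100, 2468, 2963, 3057, 7476, 7821, 9324, 9898, 10425, 10504,
     10584, 11694, 11949, 12957, 13959, 15051, 15998],
]


def accumulate(species_lists):
    """Accumulated lists: y_k is the union of x_1, ..., x_k."""
    acc, out = set(), []
    for xs in species_lists:
        acc |= set(xs)
        out.append(set(acc))
    return out


def accumulation_correlation_matrix(species_lists):
    """Eqn (3): r_ab = richness(y_a) / richness(y_b) for a < b (1 if richness(y_b) = 0)."""
    rich = [len(y) for y in accumulate(species_lists)]
    n = len(rich)
    r = np.eye(n)
    for a in range(n):
        for b in range(a + 1, n):
            r[a, b] = r[b, a] = rich[a] / rich[b] if rich[b] else 1.0
    return rich, r


rich, r = accumulation_correlation_matrix(x)
print(rich)  # [4, 6, 10, 21]
print(np.round(r, 2))
```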

Continuing in this fashion for each random selection of subplots within a plot, eqn (3) was used to calculate the correlations for the six possible pairs of subplots. This was done for each of the 1352 plots. To match the results for the nested samples, four different subplot combinations were selected from each plot (using the Latin square algorithm described below) to yield a total of 5408 samples.

The individual correlations are random variables and hence vary from plot to plot. Kernel density estimates (smoothed histograms) of the resulting correlation coefficients are shown in Fig. 4 below. Observe that the correlation distributions for plots the same "distance" apart (on a log10 scale) tend to cluster together, and that the arrangement closely parallels what was seen in Fig. 3.


[Figure 4 graphic: "Non-nested Samples Correlation Structure", kernel density curves for r12, r13, r14, r23, r24, and r34; x-axis: Estimated Correlation (0.0 to 1.0); y-axis: Density.]

Figure 4 The subscripts 1, 2, 3, and 4 on the correlation labels in the figure refer to accumulated subplots with areas 0.1, 1.1, 11.1, and 111.1 m², respectively (n = 5408 for each curve). Solid lines correspond to the correlation distributions of plot sizes that are one unit apart (on a log10 scale), dashed lines to correlations between plot sizes two units apart, and the dotted line to the correlation between plot sizes that are three units apart. The structure shown is correct for species accumulation curves only.

If the correlation that maximizes the density for each distribution in Fig. 4 is used as a "typical" value, the following correlation matrix is obtained.

Typical (non-nested) correlation structure:

$$
\mathbf{r} = \begin{pmatrix} 1 & .30 & .12 & .06 \\ .30 & 1 & .47 & .24 \\ .12 & .47 & 1 & .54 \\ .06 & .24 & .54 & 1 \end{pmatrix} \tag{4}
$$

The values shown are a close match to those shown in eqn (2) for nested samples. (Keep in mind though that the areas of plots 2, 3, and 4 are slightly larger for non-nested samples than for nested samples.)
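The "typical" matrices in eqns (2) and (4) take, for each pair of subplots, the correlation at which the kernel density estimate peaks. The text does not say which kernel or bandwidth was used, so the Python sketch below assumes a Gaussian kernel with the normal-reference rule-of-thumb bandwidth 1.06·s·n^(−1/5), evaluated on a grid over [0, 1]. The function names and the simulated input are hypothetical.

```python
import numpy as np


def kde_mode(values, grid_size=501):
    """Value in [0, 1] that maximizes a Gaussian kernel density estimate.

    Assumptions: Gaussian kernel, rule-of-thumb bandwidth 1.06 * s * n**(-1/5).
    """
    values = np.asarray(values, dtype=float)
    h = 1.06 * values.std(ddof=1) * values.size ** (-1 / 5)
    grid = np.linspace(0.0, 1.0, grid_size)
    z = (grid[:, None] - values[None, :]) / h
    density = np.exp(-0.5 * z**2).sum(axis=1) / (values.size * h * np.sqrt(2 * np.pi))
    return grid[np.argmax(density)]


def typical_correlation_matrix(corr_stack):
    """Entry-wise KDE mode over a stack of correlation matrices, shape (k, n, n)."""
    corr_stack = np.asarray(corr_stack, dtype=float)
    n = corr_stack.shape[1]
    typical = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):  # only the upper triangle is read
            typical[i, j] = typical[j, i] = kde_mode(corr_stack[:, i, j])
    return typical


# Simulated (not survey) input: 5408 hypothetical 4 x 4 correlation matrices
rng = np.random.default_rng(0)
stack = rng.beta(20, 30, size=(5408, 4, 4))
print(np.round(typical_correlation_matrix(stack), 2))
```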

The correlation distributions for nested and non-nested samples are compared in Fig. 5. From the figure it’s clear that for the purposes of constructing species accumulation curves the correlation distributions of plots from nested and non-nested sampling schemes are essentially identical.


[Figure 5 graphic: two panels of kernel density curves (x-axis: Estimated Correlation, 0.0 to 1.0; y-axis: Density), each comparing nested and non-nested samples. First panel, "Intercorrelations of Plots at Scales 0.1, 1, 10 m²": curves r12, r13, r23. Second panel, "Correlations with Plot at Largest Scale, 100 m²": curves r14, r24, r34.]

Figure 5 Comparing the correlation structures using nested and non-nested sampling schemes in constructing species accumulation curves.

Part 2: Using Subplot Correlations to Fit Species Accumulation Curves

Background

In the ordinary least squares approach to simple linear regression, the equation for the ith observation is

$$
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i
$$

where the individual $\varepsilon_i$ are independent and identically distributed normal random variables with common variance $\sigma^2$. The individual regression equations can be arranged as a single vector equation.

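$$
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
= \begin{pmatrix} \beta_0 + \beta_1 x_1 + \varepsilon_1 \\ \beta_0 + \beta_1 x_2 + \varepsilon_2 \\ \vdots \\ \beta_0 + \beta_1 x_n + \varepsilon_n \end{pmatrix}
= \begin{pmatrix} \beta_0 + \beta_1 x_1 \\ \beta_0 + \beta_1 x_2 \\ \vdots \\ \beta_0 + \beta_1 x_n \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}
= \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}
$$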

or, using vector-matrix notation,

$$
\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}.
$$

Keeping with this notation, the ordinary least squares solution for the coefficient vector β is

$$
\hat{\boldsymbol{\beta}} = \left(X^{T}X\right)^{-1}X^{T}\mathbf{y} \tag{5}
$$

and the standard regression assumptions become


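OLS assumptions:

$$
E(\boldsymbol{\varepsilon}) = \mathbf{0}, \quad
\operatorname{Var}(\boldsymbol{\varepsilon}) = \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{pmatrix} = \sigma^2 I, \quad
\boldsymbol{\varepsilon} \sim N\!\left(\mathbf{0}, \sigma^2 I\right) \tag{6}
$$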

where I is the identity matrix. A natural generalization of ordinary least squares is to allow the residuals to be correlated, thus replacing the identity matrix I in eqn (6) with a general matrix Ω. Since $\sigma^2\Omega$ is a covariance matrix, not all possible choices of Ω are suitable. Covariance matrices must be positive semidefinite, and because GLS estimation requires the inverse $\Omega^{-1}$, here Ω must in fact be positive definite. (I discuss this property at length below.) For generalized least squares (GLS) estimation the assumptions of eqn (6) become the following.

GLS assumptions:

$$
E(\boldsymbol{\varepsilon}) = \mathbf{0}, \quad \operatorname{Var}(\boldsymbol{\varepsilon}) = \sigma^2\Omega, \quad \boldsymbol{\varepsilon} \sim N\!\left(\mathbf{0}, \sigma^2\Omega\right) \tag{7}
$$

It's an easy exercise to show that, under the GLS assumptions, the best linear unbiased estimator of β is no longer the ordinary least squares solution of eqn (5) but the following generalized least squares solution.

$$
\hat{\boldsymbol{\beta}} = \left(X^{T}\Omega^{-1}X\right)^{-1}X^{T}\Omega^{-1}\mathbf{y} \tag{8}
$$
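Eqns (5) and (8) translate directly into a few lines of NumPy. The sketch below (hypothetical function names) uses linear solves rather than explicit inverses, which is numerically preferable but algebraically the same. The usage lines fit a straight line in log area, that is, a Gleason-type model, to the worked example's accumulated richness values 4, 6, 10, and 21 at accumulated areas 0.1, 1.1, 11.1, and 111.1 m², taking Ω to be that example's eqn (3) correlation matrix.

```python
import numpy as np


def ols(X, y):
    """Eqn (5): beta = (X^T X)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)


def gls(X, y, omega):
    """Eqn (8): beta = (X^T Omega^{-1} X)^{-1} X^T Omega^{-1} y, for symmetric Omega."""
    omega_inv_X = np.linalg.solve(omega, X)
    omega_inv_y = np.linalg.solve(omega, y)
    return np.linalg.solve(X.T @ omega_inv_X, X.T @ omega_inv_y)


area = np.array([0.1, 1.1, 11.1, 111.1])     # accumulated areas, m^2
richness = np.array([4.0, 6.0, 10.0, 21.0])  # accumulated richness (worked example)
X = np.column_stack([np.ones_like(area), np.log10(area)])

# Eqn (3) for increasing richness: r_ab = richness_a / richness_b for a < b
omega = np.minimum.outer(richness, richness) / np.maximum.outer(richness, richness)

print("OLS:", ols(X, richness))
print("GLS:", gls(X, richness, omega))
```

With Ω equal to the identity matrix, `gls` returns exactly the OLS coefficients, which is a convenient sanity check.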

Implementing the GLS Approach for Species Accumulation Curves

Generating the non-nested samples

I followed the protocol for choosing subplots from nested samples in a single plot that I described in an earlier document. In short, the four nested samples in a plot and the four area scales at which samples were taken can be organized as a 4 × 4 Latin square in which columns represent the scale, the elements identify specific nested samples (module numbers), and the rows correspond to the non-nested samples that are generated. The Latin square design guarantees that the four subplots at different area scales in a single row come from different nested samples, and that the four subplots at the same scale in a column also come from different nested samples. Although there are 576 distinct Latin squares of order 4, there are only four distinct reduced Latin squares of order 4 (squares in which the elements of both the first row and the first column are in ascending sequence). Every other square can be obtained from one of these four by permuting its columns and its rows. Because permuting the rows merely reorders the non-nested samples, column permutations of the four reduced squares already generate every possible set of non-nested samples. Fig. 6 shows the four reduced Latin squares of order 4. The numbers in each square correspond to module numbers from the modified Whittaker nested design of the Carolina Vegetation Survey.

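$$
\begin{array}{cccc} 2 & 3 & 8 & 9 \\ 3 & 2 & 9 & 8 \\ 8 & 9 & 2 & 3 \\ 9 & 8 & 3 & 2 \end{array}
\qquad
\begin{array}{cccc} 2 & 3 & 8 & 9 \\ 3 & 9 & 2 & 8 \\ 8 & 2 & 9 & 3 \\ 9 & 8 & 3 & 2 \end{array}
\qquad
\begin{array}{cccc} 2 & 3 & 8 & 9 \\ 3 & 8 & 9 & 2 \\ 8 & 9 & 2 & 3 \\ 9 & 2 & 3 & 8 \end{array}
\qquad
\begin{array}{cccc} 2 & 3 & 8 & 9 \\ 3 & 2 & 9 & 8 \\ 8 & 9 & 3 & 2 \\ 9 & 8 & 2 & 3 \end{array}
$$

Figure 6 The four distinct reduced Latin squares of order 4. Columns correspond to scale and rows correspond to samples. The numbers in the squares are the module numbers used in the modified Whittaker nested design of the Carolina Vegetation Survey.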


The protocol I used for generating the non-nested samples from the four nested samples in a plot is as follows. For each plot I randomly selected one of the four reduced Latin squares of Fig. 6 and then randomly permuted its columns. The rows of the permuted Latin square then defined the four non-nested samples for that plot. The numbers in each row indicate the scale order in which the modules should be selected for use in that sample. For example, if the module order in a row is 9, 2, 3, 8, then module 9 should be used for the species list at the smallest scale, module 2 for the species list at the next larger scale, and so on.
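The sampling protocol can be sketched in Python as follows. The four squares are the reduced Latin squares of Fig. 6 written on the module numbers 2, 3, 8, and 9; the function name and the use of Python's `random` module are assumptions for illustration. Each returned row lists the modules to use from the smallest to the largest scale.

```python
import random

# The four reduced Latin squares of order 4 on module numbers 2, 3, 8, 9
# (first row and first column in ascending order).
REDUCED_SQUARES = [
    [[2, 3, 8, 9], [3, 2, 9, 8], [8, 9, 2, 3], [9, 8, 3, 2]],
    [[2, 3, 8, 9], [3, 9, 2, 8], [8, 2, 9, 3], [9, 8, 3, 2]],
    [[2, 3, 8, 9], [3, 8, 9, 2], [8, 9, 2, 3], [9, 2, 3, 8]],
    [[2, 3, 8, 9], [3, 2, 9, 8], [8, 9, 3, 2], [9, 8, 2, 3]],
]


def non_nested_samples(rng):
    """Pick a reduced square at random and permute its columns at random.

    Row k of the result is non-nested sample k: the module to use at each
    scale, ordered from the smallest scale to the largest.
    """
    square = rng.choice(REDUCED_SQUARES)
    cols = list(range(4))
    rng.shuffle(cols)
    return [[row[c] for c in cols] for row in square]


rng = random.Random(2021)
for sample in non_nested_samples(rng):
    print(sample)
```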

Correlation or covariance matrix?

To implement GLS estimation of the species accumulation curves, I took Ω to be the estimated correlation matrix obtained from the species lists. Note that because the residual covariance matrix is $\sigma^2\Omega$, but Ω is a correlation matrix and hence has ones on the diagonal, my choice for Ω constrains the residual variance to be the same at each spatial scale. Essentially I'm assuming that the variability in accumulated richness is independent of area. (Of course OLS does this also.) Typically the variance of count data increases with its mean, requiring the use of probability models such as the Poisson and negative binomial.

When dealing with a count-derived quantity such as accumulated richness, it’s not clear that a monotonic relationship between the mean and variance is appropriate. Fig. 7 illustrates the situation. Because each plot contains four nested samples, we can treat these samples as plot replicates and use them to calculate the mean and variance of richness at each scale for each plot. Fig. 7a plots the mean richness trajectories while Fig. 7b plots the variance trajectories for a random sample of 50 plots. Keep in mind that with n = 4 at each scale, the variance estimates are likely to be rather poor.
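Treating the four modules as replicates amounts to taking column-wise means and variances of a 4 × 4 array of richness values. A minimal NumPy sketch follows; the numbers are hypothetical, not values from the Carolina Vegetation Survey, and the sample variance (divisor n − 1 = 3) is an assumption about how the variances were computed.

```python
import numpy as np

# Hypothetical richness for one plot: rows are the four modules (replicates),
# columns are scales 1-4 (0.1, 1, 10, and 100 m^2).
richness = np.array([
    [2, 5, 12, 30],
    [1, 6, 15, 28],
    [3, 4, 11, 35],
    [2, 7, 14, 31],
])

scale_means = richness.mean(axis=0)             # one mean per scale
scale_variances = richness.var(axis=0, ddof=1)  # sample variance, n = 4 per scale

print(scale_means)
print(scale_variances)
```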

[Figure 7 graphic: two panels with x-axis Scale (1 to 4). (a) y-axis: Plot Mean (0 to 70). (b) y-axis: Plot Variance (0 to 120).]

Figure 7 Means and variances of species richness for a random sample of 50 plots. Modules (individual nested samples) are treated as replicates (n = 4 for each plot).

Observe that while mean richness increases monotonically in all plots, a variety of patterns are exhibited by the variance, including constant, monotonic, and quadratic trends. Other than the fact that the smallest variance typically occurs at the smallest scale, little else can be said. My gut reaction is that the quadratic pattern might be closer to the truth. (Note: I did fit a multilevel model to the variance data using the entire collection of 1352 trajectories and found evidence for a significant linear trend in variance. In a multilevel model one essentially obtains estimates of the slopes for each trajectory and asks whether the mean of those slopes is different from zero.)

Given that we have only a single observation at each spatial scale there are operationally only a limited number of sensible ways to construct estimates for the variance at each scale for use in fitting a species accumulation curve. I can’t think of an a priori theoretical approach (like the one I used for subplot autocorrelation) for doing this except to appeal to what is known for count data, but unlike count data the variance of richness data would eventually have to decrease once the majority of the species in the plot were inventoried. Unfortunately there is not enough data available per sample to try to model the variance parametrically as part of the process of fitting the species accumulation curve. A resampling-based empirical solution would be to use a bootstrap on the combined species lists for all modules in a plot to get an estimate of variance at each scale. (If we had abundance lists rather than species lists to work with the bootstrap could be done using just a single nested sample rather than all four.) Perhaps this is worth doing. I confess I haven’t played with this to see what arises.

I think ignoring heteroscedasticity is a less severe criticism than ignoring the autocorrelation issue. After all, with only one observation at each spatial scale one can’t actually “see” that heteroscedasticity is a problem with these data. Furthermore, it’s possible that the transformations used in some of the linearized species accumulation curve models might serve to mitigate the problem some, if the problem exists at all. Rather than drag this analysis out any longer I’m going to punt on this question for now and proceed under the assumption that the variance of species richness is constant across spatial scales.

Positive definiteness

A technical issue is to what extent the estimated Ω is truly a correlation matrix. Other than being symmetric with ones on the diagonal and elements restricted to the interval [–1, 1], the mathematical requirement is that Ω must be positive definite. Because the component correlations of Ω were calculated pairwise, one correlation at a time, rather than simultaneously, it is conceivable that the correlation matrix obtained might not be positive definite. A symmetric matrix is positive definite if all of its eigenvalues are positive or, equivalently, if the determinant of each of its leading principal submatrices (including the matrix itself) is positive. Of the 5408 correlation matrices randomly constructed for this analysis, 240 turned out not to be positive definite. More specifically, all 240 were positive semidefinite but singular: none of their eigenvalues was negative, and at least one was zero. The matrix below, which arose as the fourth replicate obtained from plot 001-05-0207, is a typical example.

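$$
\mathbf{r} = \begin{pmatrix} 1 & 1 & \tfrac{2}{9} & \tfrac{1}{6} \\ 1 & 1 & \tfrac{2}{9} & \tfrac{1}{6} \\ \tfrac{2}{9} & \tfrac{2}{9} & 1 & \tfrac{3}{4} \\ \tfrac{1}{6} & \tfrac{1}{6} & \tfrac{3}{4} & 1 \end{pmatrix}
$$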

This matrix arose from four plots with 2, 1, 9, and 10 different species which when accumulated yielded accumulated richness values of 2, 2, 9, and 12. Because the first two accumulated richness values are the same, the first two rows of the correlation matrix are identical. As a result the matrix is singular and its determinant is zero.


Rather than discard these 240 observations, I elected to alter their correlation matrices slightly so that each new matrix is nonsingular. I did so by replacing all off-diagonal perfect correlations (correlations of 1) with a near-perfect correlation of 0.95. Thus the correlation matrix shown above was replaced with

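$$
\mathbf{r}^{*} = \begin{pmatrix} 1 & .95 & \tfrac{2}{9} & \tfrac{1}{6} \\ .95 & 1 & \tfrac{2}{9} & \tfrac{1}{6} \\ \tfrac{2}{9} & \tfrac{2}{9} & 1 & \tfrac{3}{4} \\ \tfrac{1}{6} & \tfrac{1}{6} & \tfrac{3}{4} & 1 \end{pmatrix}.
$$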

The altered matrix is now positive definite and still contains the crucial detail about this sample, namely that the two smallest plots were highly correlated in species richness.
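Both steps, detecting a matrix that is not positive definite and replacing off-diagonal perfect correlations with 0.95, can be sketched with NumPy. The check below attempts a Cholesky factorization, which exists only for positive definite matrices; that choice and the helper names are assumptions, not necessarily how the original analysis tested the matrices. The example is the plot 001-05-0207 matrix shown above.

```python
import numpy as np


def is_positive_definite(m):
    """True if the symmetric matrix m has a Cholesky factorization."""
    try:
        np.linalg.cholesky(m)
        return True
    except np.linalg.LinAlgError:
        return False


def replace_perfect_correlations(r, replacement=0.95):
    """Copy of r with every off-diagonal correlation of 1 set to `replacement`."""
    r = np.array(r, dtype=float)
    off_diagonal = ~np.eye(r.shape[0], dtype=bool)
    r[off_diagonal & np.isclose(r, 1.0)] = replacement
    return r


# Accumulated richness 2, 2, 9, 12 (plot 001-05-0207, fourth replicate)
r = np.array([
    [1.0, 1.0, 2 / 9, 1 / 6],
    [1.0, 1.0, 2 / 9, 1 / 6],
    [2 / 9, 2 / 9, 1.0, 3 / 4],
    [1 / 6, 1 / 6, 3 / 4, 1.0],
])
r_star = replace_perfect_correlations(r)

print(is_positive_definite(r))       # False
print(is_positive_definite(r_star))  # True
```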

Fitting the Models

Because plots at the smallest scale occasionally did not contain any species, zero richness values appear in the database. Because the logarithm of zero is undefined, it is technically not possible to fit the log Arrhenius model to such plots. Of course this problem doesn't arise in fitting the Gleason and Arrhenius models, which do not log-transform richness. Instead of deleting these plots, I elected to follow a fairly standard protocol for log-transforming count data that contain zero counts: replace log y with log(y + c) for some choice of c. Typical choices for c are 1/2 or 1. I elected to use c = 1/2 and, departing from the standard convention, I added this constant only to the zero richness values, not to all the values. Because the log Arrhenius and Gleason models are linear models, their coefficient estimates were obtained directly from eqn (8), using the correlation matrix estimated from the species lists. The Arrhenius model was fit using gnls, the generalized nonlinear least squares function in the R package nlme.

Part 3: Comparing OLS and GLS Results Using MSE, Bias, and Variance

Background

In each plot the goal is to accurately estimate the known species richness at the 400 m² scale. To simplify discussion, let's denote this value as θ. In reality θ varies from plot to plot, but let's overlook this complication for the moment. A protocol used for estimating θ is typically called an estimator of θ and is denoted by $\hat{\theta}$. The two sampling schemes (nested and non-nested), three species accumulation curve models (log Arrhenius, Gleason, and Arrhenius), and two methods of estimation (OLS and GLS) combine to give us 12 different estimators, as shown in the table below.

OLS Estimation GLS Estimation Model Nested Non-nested Nested Non-nested Log Arrhenius OLS

)nest(LAθ OLSLAθ GLS

)(nestLAθ GLSLAθ

Gleason OLS(nest)Gθ OLS

Gθ GLS)nest(Gθ GLS

Gθ Arrhenius OLS

)nest(Aθ OLSAθ GLS

)nest(Aθ GLSAθ


Not all pairwise comparisons of these estimators are relevant. Essentially, the primary question of interest can be formulated as follows: “For a given model, does the choice of sampling scheme (nested or non-nested) appreciably alter the accuracy of the estimate obtained, and do the results depend upon the method of estimation (OLS or GLS) used?” How should these estimators be compared? The primary concern is one of accuracy, i.e., which estimator yields estimates that tend to come closer to the true value θ. A standard statistical measure of accuracy is mean squared error (MSE). The mean squared error of an estimator $\hat{\theta}$ is defined as follows:

$$\operatorname{MSE}(\hat{\theta}) = E\left[(\hat{\theta} - \theta)^2\right] \qquad (9)$$

where E is the ordinary expectation operator. Estimators with smaller MSE are more accurate. The inclusion of the expectation operator is necessary: $\hat{\theta}$ is a random variable; it varies from sample to sample. Thus in a given sample the squared difference between the estimate and the true value might be small, but it could just as well be large for another sample. Such sample-dependence is unsatisfactory when evaluating an estimator, and the use of MSE avoids this problem. A small MSE means that the squared differences between the estimates and the true value will be small on average across all samples.

The presence of squared differences in the expectation is necessary to prevent underestimates and overestimates from canceling each other out. Options other than squaring would work as well, e.g., absolute value would be another sensible choice. The primary appeal of squared differences, and hence of MSE, is that MSE can be algebraically decomposed into two interpretable components: bias and precision. Bias is a measure of whether an estimator is consistently too high or too low in its estimate of a parameter and is defined as follows.

$$\operatorname{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta \qquad (10)$$

An estimator with negative bias tends to underestimate θ on average, while one with positive bias tends to overestimate θ. An estimator without bias is said to be unbiased.

Precision, on the other hand, is a measure of the variability of an estimator. The more variable an estimator is, the less precise it is. An estimator's variance, its average squared deviation about its own expected value, is typically used as a measure of its precision.

$$\operatorname{Var}(\hat{\theta}) = E\left[\hat{\theta} - E(\hat{\theta})\right]^2 \qquad (11)$$

Algebraically it can be shown that

$$\operatorname{MSE}(\hat{\theta}) = \operatorname{Var}(\hat{\theta}) + \left[\operatorname{Bias}(\hat{\theta})\right]^2$$

In comparing two estimators we can look at MSE, its two components bias and variance, or all three. For further discussion of MSE, bias, and precision in an ecological context, see Hellmann and Fowler (1999).
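Eqns (9)–(11) and the decomposition can be checked numerically by replacing each expectation with an average over a set of estimates. The sketch below is illustrative only: the function name and the four estimates are hypothetical, and the variance uses divisor n (not n − 1), which is what makes the identity hold exactly for the averages.

```python
def mse_bias_variance(estimates, theta):
    """Sample analogues of eqns (9)-(11): expectations become averages over `estimates`."""
    n = len(estimates)
    mean_est = sum(estimates) / n
    mse = sum((e - theta) ** 2 for e in estimates) / n          # eqn (9)
    bias = mean_est - theta                                      # eqn (10)
    variance = sum((e - mean_est) ** 2 for e in estimates) / n   # eqn (11), divisor n
    return mse, bias, variance

mse, bias, variance = mse_bias_variance([48, 52, 55, 61], theta=50)
print(mse, bias, variance)  # 38.5 4.0 22.5
assert abs(mse - (variance + bias ** 2)) < 1e-12  # MSE = Var + Bias^2
```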


Obtaining an Aggregated Measure of MSE, Bias, and Variance When θ Varies

The problem for us in applying the above definitions to the plot data is that there is not a single unique θ that we are trying to estimate. Instead θ varies from plot to plot. Thus we are forced to compare our estimators at each different value of θ (see Part 4 for an efficient way to do this), or, alternatively, use some ad hoc method to combine the results for different θ into some sort of single overall measure. In this section I focus on one approach for constructing such an aggregated measure. Define a random variable $\hat{\theta}^*$ as follows: $\hat{\theta}^* = \hat{\theta} - \theta$. Clearly if $\hat{\theta}$ is an estimator of θ, then $\hat{\theta}^*$ is an estimator of 0. Observe the following:

$$\operatorname{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta = E(\hat{\theta} - \theta) = E(\hat{\theta}^*) = E(\hat{\theta}^*) - 0 = \operatorname{Bias}(\hat{\theta}^*).$$

So $\hat{\theta}$ and $\hat{\theta}^*$ have the same bias. Also

$$\operatorname{Var}(\hat{\theta}) = \operatorname{Var}(\hat{\theta} - \theta) = \operatorname{Var}(\hat{\theta}^*)$$

because subtracting a constant from a random variable does not change its variability. Finally,

$$\operatorname{MSE}(\hat{\theta}) = E\left[(\hat{\theta} - \theta)^2\right] = E\left[(\hat{\theta}^*)^2\right] = E\left[(\hat{\theta}^* - 0)^2\right] = \operatorname{MSE}(\hat{\theta}^*).$$

So $\hat{\theta}$ and $\hat{\theta}^*$ have the same mean squared error. Thus if we subtract from $\hat{\theta}$ the value of θ it is trying to estimate, we end up with a new estimator $\hat{\theta}^*$ that estimates 0 in that plot. Since the $\hat{\theta}^*$ from different plots are estimating the same quantity 0, we can treat them as realizations of a single random variable. By using $\hat{\theta}^*$ we can obtain an overall measure of MSE, bias, and variance without reference to a specific value of θ.
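In code, the aggregation amounts to centring each plot's estimate on that plot's own θ and then pooling. A minimal sketch, assuming the estimates are available as (θ, $\hat{\theta}$) pairs; the function name and the three example plots are hypothetical.

```python
def aggregated_summary(pairs):
    """Pool (theta, estimate) pairs from different plots through theta_star = estimate - theta.

    Each theta_star estimates 0, so the pooled values are summarized as draws of a single
    random variable. Returns (MSE, bias, variance) about 0, using divisor n.
    """
    stars = [est - theta for theta, est in pairs]
    n = len(stars)
    bias = sum(stars) / n                                # E(theta_star) - 0
    mse = sum(s ** 2 for s in stars) / n                 # E[(theta_star - 0)^2]
    variance = sum((s - bias) ** 2 for s in stars) / n   # spread about E(theta_star)
    return mse, bias, variance

# Three hypothetical plots with different true richness values.
mse, bias, variance = aggregated_summary([(40, 43), (75, 71), (120, 128)])
```

As the text cautions, the pooled numbers depend on the mix of θ values that enter the list, so two estimators should be compared on the same set of plots.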

One caveat: do not take this approach too seriously. This is a method for combining a heterogeneous collection of estimators that are functions of a varying parameter into a homogeneous collection that no longer depends on that parameter. The values of MSE, bias, and variance that we obtain are conditional on the population distribution of θ; a different population distribution will produce different results. (If MSE, bias, and precision did not change with θ there would be no philosophical objection to combining them. As we'll see in Part 4, though, they clearly do change with θ.) These objections cause us no difficulty because when we evaluate our different estimators we use the same population distribution of θ. Thus the comparisons we carry out are valid. To the extent that the given population distribution of θ is typical of other populations, our results may generalize.

Results of the Aggregated Approach

Each panel in Fig. 8 compares the results for different models in which the same sampling scheme and estimation method were used. $\hat{\theta}$ is the estimator of the predicted richness at 400 m² after being transformed as discussed above. (Technically it is $\hat{\theta}^*$ that is shown in Fig. 8.) As Fig. 8 reveals, all four panels show the same general pattern for the three models. The pattern does not depend on the estimation method (OLS or GLS) or the sampling scheme (nested or non-nested) used.

[Figure 8 panels, left to right and top to bottom: "Distribution of $\hat{\theta}$: Nested Samples", "Nested Samples (GLS)", "Non-nested Samples", "Non-nested Samples (GLS)". Each panel overlays density curves for the log Arrhenius, Gleason, and Arrhenius models; vertical axis: density; horizontal axis: θ − 40 to θ + 40 (to θ + 60 in the non-nested panels).]

Figure 8 The distribution of the predicted richness at the 400 m² scale for three different models using the aggregated approach. Models in the left column were fit using ordinary least squares (OLS); the correlation between plots was ignored in fitting the models. Models in the right column were fit using generalized least squares (GLS); the correlation between plots was used in fitting the models. The top row is the estimator's distribution for nested samples and the bottom row is the distribution for non-nested samples. The $\hat{\theta}$ in the figure more properly corresponds to $\hat{\theta}^*$ of the text. Similarly, the θ shown on the horizontal axis is really 0.

We see that the Gleason model consistently underestimates θ (has negative bias) while the Arrhenius and log Arrhenius models show a much smaller (in absolute value) positive bias. The log Arrhenius model appears to yield the most variable estimator. Comparisons across sampling schemes are easier to make using Fig. 9, but even in Fig. 8 we can see that under OLS estimation, both the Arrhenius and log Arrhenius models exhibit a positive bias for non-nested sampling that is not seen in the nested samples. This bias appears to disappear under GLS estimation. The two tables below display typical aggregated values of bias, variance, and MSE for each of the three models, cross-classified by sampling scheme and estimation method. The tabulated values are those that correspond to the peak density in each of the kernel density displays of Fig. 8. The observed negative bias of the Gleason model in general, and the positive bias of the Arrhenius and log Arrhenius models for non-nested sampling, are confirmed by the tables.

OLS Estimation

| Model | Bias, nested | Bias, non-nested | Variance, nested | Variance, non-nested | MSE, nested | MSE, non-nested |
|---|---|---|---|---|---|---|
| Log Arrhenius | 10.779 | 21.499 | 548.652 | 797.471 | 664.849 | 1259.688 |
| Gleason | −15.621 | −10.830 | 93.451 | 53.139 | 337.463 | 170.425 |
| Arrhenius | 2.174 | 5.571 | 139.046 | 100.331 | 143.770 | 131.367 |

GLS Estimation

| Model | Bias, nested | Bias, non-nested | Variance, nested | Variance, non-nested | MSE, nested | MSE, non-nested |
|---|---|---|---|---|---|---|
| Log Arrhenius | 1.960 | 8.612 | 343.793 | 348.228 | 347.636 | 422.390 |
| Gleason | −15.581 | −10.557 | 94.748 | 54.985 | 337.528 | 166.437 |
| Arrhenius | 1.915 | 2.633 | 221.426 | 145.743 | 225.092 | 152.675 |

What was not obvious from the figure but is apparent in the tables is how much more precise (smaller variance) the estimates obtained using the Gleason model are (although some of that advantage disappears under GLS estimation). GLS estimation has its biggest effect on the log Arrhenius model, improving its bias, variance, and MSE for both sampling schemes. Using MSE to rank models, we obtain the same order for nested and non-nested sampling under OLS estimation: Arrhenius is better than Gleason, which is better than log Arrhenius. Under GLS estimation the tabulated MSEs give the same ordering for both sampling schemes, although for nested samples the log Arrhenius model (347.636) is now only slightly worse than the Gleason model (337.528).
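Because MSE = Var + Bias², the two tables can be checked for internal consistency. The sketch below transcribes the tabulated values (the dictionary layout and the function names are my own, not the author's) and tests each row against the identity, allowing for the rounding of every entry to three decimal places.

```python
# (bias, variance, MSE) transcribed from the OLS and GLS tables above.
TABLE = {
    ("OLS", "Log Arrhenius", "nested"): (10.779, 548.652, 664.849),
    ("OLS", "Log Arrhenius", "non-nested"): (21.499, 797.471, 1259.688),
    ("OLS", "Gleason", "nested"): (-15.621, 93.451, 337.463),
    ("OLS", "Gleason", "non-nested"): (-10.830, 53.139, 170.425),
    ("OLS", "Arrhenius", "nested"): (2.174, 139.046, 143.770),
    ("OLS", "Arrhenius", "non-nested"): (5.571, 100.331, 131.367),
    ("GLS", "Log Arrhenius", "nested"): (1.960, 343.793, 347.636),
    ("GLS", "Log Arrhenius", "non-nested"): (8.612, 348.228, 422.390),
    ("GLS", "Gleason", "nested"): (-15.581, 94.748, 337.528),
    ("GLS", "Gleason", "non-nested"): (-10.557, 54.985, 166.437),
    ("GLS", "Arrhenius", "nested"): (1.915, 221.426, 225.092),
    ("GLS", "Arrhenius", "non-nested"): (2.633, 145.743, 152.675),
}

HALF_UNIT = 0.0005  # every entry is reported to three decimal places

def rounding_tolerance(bias):
    """Largest gap in MSE - (Var + Bias^2) that rounding of the three entries can explain."""
    return 2 * abs(bias) * HALF_UNIT + HALF_UNIT ** 2 + 2 * HALF_UNIT

def consistent_rows(table):
    """Map each row key to True if MSE = Var + Bias^2 holds within rounding."""
    return {key: abs(mse - (var + bias ** 2)) <= rounding_tolerance(bias)
            for key, (bias, var, mse) in table.items()}
```

All twelve rows pass, so the tabulated bias, variance, and MSE values agree with the decomposition to within rounding.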

Fig. 9 compares the nested and non-nested sampling schemes directly for each model and separately for each estimation method. The small squares (indicating the distribution mean) on the density = 0 line in each panel are included to aid in assessing the relative bias of the sampling schemes. The reduced bias accruing from nested sampling is clear for the Arrhenius and log Arrhenius models, although this difference for the log Arrhenius model essentially disappears under GLS estimation. Notice that in all cases the non-nested density is shifted to the right of the nested density. This suggests that, overall, non-nested samples yield higher estimates of richness than do nested samples.


[Figure 9 panels, left to right and top to bottom: "Distribution of $\hat{\theta}$: Log-Arrhenius Model", "Log-Arrhenius Model (GLS)", "Gleason Model", "Gleason Model (GLS)", "Arrhenius Model", "Arrhenius Model (GLS)". Each panel shows the nested density, nested mean, non-nested density, and non-nested mean; vertical axis: density; horizontal axis: offsets from θ (roughly θ − 40 to θ + 60).]

Figure 9 The distribution of the predicted richness at the 400 m² scale for three different models. The left column displays model results using ordinary least squares, in which the correlation between plots is ignored in fitting the models. The right column displays model results using generalized least squares (GLS), in which the correlation between plots is used in fitting the models. Each display contrasts the estimator's distribution for nested and non-nested sampling schemes for a specific model. The $\hat{\theta}$ in the figure more properly corresponds to $\hat{\theta}^*$ of the text. Similarly, the θ shown on the horizontal axis is really 0.


Part 4: A Pairwise Approach to Comparing Bias, Variance, and MSE

Background

While the aggregated approach described in Part 3 is informative, it ignores an important aspect of the experimental design. The 1352 plots that are used in each of the models under the two different sampling schemes are not separate independent samples. In truth each model, sampling scheme, and estimation method uses the same set of plots over again. In other words, the data are inherently paired. The fact that this pairing has been ignored up to now is not a fatal flaw in the analysis, because we have not carried out significance tests or made any use of the “inflated” sample sizes in our results. (In fact it is not clear to me what role, if any, significance testing should play here at all. I've avoided it completely because in my opinion the analysis is not based on a sample but instead makes use of the entire population of interest.) Still, if plots vary in any important way, some of this variability can potentially obscure interesting patterns in the data. Incorporating a paired design as part of the analysis can have the beneficial effect of increasing the power to detect such patterns. Treating the data as paired also allows us to grapple with the population-level heterogeneity of θ directly, without the need for ad hoc methods. Each plot yields 4 nested samples and as many as 24 different non-nested samples. Since each plot has a single θ to be estimated, each of these samples, when coupled with a model, sampling scheme, and estimation method, provides a separate estimate of the same θ. Thus we can use these individual estimates to provide a plot-specific estimate of bias, variance, and MSE. With only n = 4 samples to estimate these quantities, the estimates themselves will not be of the best quality. But because this limitation stems from the number of samples per plot rather than from any particular estimator, it should affect the different estimators equally.
Furthermore, if we focus on the pairwise differences in bias, variance, and MSE for pairs of estimators in each plot separately, some of these inadequacies should wash out. Finally, the large number of pairs available (1352) should dilute any plot-specific biases that might arise. In what follows I use the pairwise method to focus specifically on differences due to sampling method, nested or non-nested, controlling for model and estimation method used. I proceed as follows. In each plot I use the four nested samples, four randomly created non-nested samples, and eqns (9), (10), and (11) to obtain separate estimates of MSE, bias, and variance for each of the three models and each of the two estimation methods. In each plot the difference in these estimates between nested and non-nested samples is calculated, and the differences obtained for all 1352 plots are plotted against the true value of θ. Figs. 11–19 show the results for MSE, bias, and variance for log Arrhenius (Figs. 11–13), Gleason (Figs. 14–16), and Arrhenius (Figs. 17–19) models. The arrangement of the graphs is the same in each figure (Fig. 10). There are two columns: OLS results are displayed on the left and GLS results on the right. The top row (brown plot symbols) displays the differences, non-nested minus nested; the middle row (blue plot symbols) displays the values for nested samples only; and the bottom row (green plot symbols) displays the results for non-nested samples. As a general remark, if the sampling schemes have no effect on the property in question, the plot of differences (the top row) should show only random scatter about the horizontal zero line. I discuss the results for each model in turn.
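The per-plot procedure can be sketched as follows. This is an illustration under stated assumptions, not the author's code: the data layout (one dict per plot with keys `theta`, `nested`, and `non_nested`), the function names, and the example numbers are hypothetical, and the variance uses divisor n, so the plot-specific values satisfy MSE = Var + Bias² exactly.

```python
def plot_properties(estimates, theta):
    """Plot-specific MSE, bias and variance (eqns 9-11) with expectations replaced by averages."""
    n = len(estimates)
    mean_est = sum(estimates) / n
    return {
        "mse": sum((e - theta) ** 2 for e in estimates) / n,
        "bias": mean_est - theta,
        "variance": sum((e - mean_est) ** 2 for e in estimates) / n,
    }

def paired_differences(plots):
    """Return (theta, {property: non-nested minus nested}) for every plot."""
    rows = []
    for plot in plots:
        nested = plot_properties(plot["nested"], plot["theta"])
        non_nested = plot_properties(plot["non_nested"], plot["theta"])
        diffs = {key: non_nested[key] - nested[key] for key in nested}
        rows.append((plot["theta"], diffs))
    return rows

# One hypothetical plot: four nested and four non-nested estimates of theta = 60.
example = [{"theta": 60, "nested": [58, 61, 63, 60], "non_nested": [66, 70, 64, 68]}]
theta, diffs = paired_differences(example)[0]
print(theta, diffs)  # 60 {'mse': 50.5, 'bias': 6.5, 'variance': 1.75}
```

Plotting each difference against θ for all plots gives the top row of Figs. 11–19; the nested and non-nested dictionaries themselves give rows two and three.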


| | OLS | GLS |
|---|---|---|
| Row 1 | Differences (non-nested − nested) | Differences (non-nested − nested) |
| Row 2 | Nested | Nested |
| Row 3 | Non-nested | Non-nested |

Figure 10 Logical arrangement of graphs in each of Figs. 11–19. The top row shows paired differences; rows 2 and 3 show individual values.

Log Arrhenius Model (Figs. 11–13)

Observe first that MSE, bias, and variance tend to increase with θ for both sampling schemes and both estimation methods. (Each figure shows a trend line with positive slope in rows two and three.) Furthermore, the scatter in these estimates increases with θ. Next turn to the differences. While many of the paired differences in MSE are close to zero, there is clearly an excess of positive differences (Fig. 11, top row, left column). In other words, in a given plot the estimator based on non-nested samples typically has a larger MSE than does the estimator based on nested samples. Interestingly, this pattern completely disappears when GLS estimation is used (right column). The pattern is even more striking when we look at bias (Fig. 12). Non-nested samples yield estimates that are positively biased when compared to nested samples, and the difference between them increases with θ (the scatter plot of differences shows a clear positive trend). Observe that an increase in bias with θ occurs in non-nested samples by themselves (bottom row), but is not observed in nested samples (middle row), where we see an increasing spread with θ but essentially random scatter about 0. Recall from the aggregated results that we concluded non-nested samples yielded estimates with a slightly larger positive bias, but that this bias appeared to disappear under GLS estimation. Fig. 12 reveals that the latter conclusion was overly optimistic. Using the more powerful paired difference approach, we see that even under GLS estimation there is a preponderance of positive bias differences at all values of θ, and this imbalance increases with θ. From Fig. 13 we can conclude that there is essentially no difference in precision between the two sampling methods.
Conclusions: Non-nested samples yield estimates that typically are more biased (positively) and have larger MSE than do nested samples when using the log Arrhenius model, although much of the difference in MSE and some of the difference in bias is removed by using GLS estimation.
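The two features read off the difference plots, an excess of positive differences and a trend with θ, can be summarized descriptively. A minimal sketch (the function name and the data are hypothetical): it reports the share of positive differences and the ordinary least-squares slope of the differences on θ, with no significance test attached, in keeping with the approach of Part 4.

```python
def summarize_differences(rows):
    """Summarize (theta, difference) points: share of positive differences and LS slope on theta."""
    n = len(rows)
    thetas = [t for t, _ in rows]
    diffs = [d for _, d in rows]
    share_positive = sum(1 for d in diffs if d > 0) / n
    mean_t, mean_d = sum(thetas) / n, sum(diffs) / n
    sxx = sum((t - mean_t) ** 2 for t in thetas)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in rows)
    return share_positive, sxy / sxx

# Hypothetical bias differences (non-nested minus nested) that grow with theta.
share, slope = summarize_differences([(20, 1.0), (60, 3.0), (100, 5.0), (140, 7.0)])
```

A share well above one half, together with a positive slope, corresponds to the pattern described for the log Arrhenius bias differences.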


Gleason Model (Figs. 14–16)

MSE, variance, and the magnitude of the bias increase with θ regardless of the sampling scheme or method of estimation. For large θ the bias is profoundly negative. The paired differences confirm the results from the aggregated approach: nested samples tend to yield estimates that are more negatively biased than those from non-nested samples (Fig. 15). While the variance differences show no trend with θ (Fig. 16), the preponderance of negative differences suggests that nested samples yield estimates that are typically less precise than those from non-nested samples. Finally, the MSE of estimates from non-nested samples is smaller than that from nested samples (Fig. 14), and the magnitude of the difference between them increases with θ.

Conclusions: The estimator based on non-nested samples beats the estimator based on nested samples with respect to all three criteria: MSE, bias, and variance. But as Fig. 15 shows, both sampling schemes yield severely biased estimates, a fact that is little improved by switching to GLS estimation. Thus the victory of the non-nested sampling scheme appears to be a hollow one.

Arrhenius Model (Figs. 17–19)

The Arrhenius model yields the best-behaved results of any of the models. There is essentially no trend in MSE or variance with θ for either sampling scheme. The difference plots for MSE and variance show random scatter about zero; there is essentially no difference between the sampling methods with respect to these two properties. Bias is a different story. Non-nested samples yield estimates that are more positively biased than their nested counterparts. Although GLS estimation mitigates this somewhat, there is still clearly a preponderance of positive differences even here. The individual sample plots of rows two and three are even more revealing. Nested samples show essentially random scatter of bias about zero. Non-nested samples show a distinct positive bias that tends to increase slightly with θ. While GLS improves this somewhat, non-nested samples still tend to be more positively biased than are nested samples.

Conclusions: Nested and non-nested samples, when using the Arrhenius model, are indistinguishable with respect to MSE and variance. Unlike the other two models, the accuracy of the Arrhenius model is nearly invariant to the magnitude of θ. Estimates from nested samples tend to be less biased than those from non-nested samples, with the extent of the difference increasing with θ.


[Figure 11 panels: scatter plots against True Richness at 400 m² (20 to 140). Left column: Log Arrhenius Model; right column: Log Arrhenius Model (GLS). Rows, top to bottom: MSE Difference (non-nested − nested), MSE (nested samples), MSE (non-nested samples).]

Figure 11 MSE for the estimator of predicted richness at the 400 m² scale for log Arrhenius models. The left column shows results using ordinary least squares (OLS). The right column shows results using generalized least squares (GLS). The first row shows paired differences in MSE between nested and non-nested sampling schemes. The second and third rows show the MSE for nested and non-nested samples alone, respectively.


[Figure 12 panels: scatter plots against True Richness at 400 m² (20 to 140). Left column: Log Arrhenius Model; right column: Log Arrhenius Model (GLS). Rows, top to bottom: Bias Difference (non-nested − nested), Bias (nested samples), Bias (non-nested samples).]

Figure 12 Bias for the estimator of predicted richness at the 400 m² scale for log Arrhenius models. The left column shows results using ordinary least squares (OLS). The right column shows results using generalized least squares (GLS). The first row shows paired differences in bias between nested and non-nested sampling schemes. The second and third rows show the bias for nested and non-nested samples alone, respectively.


[Figure 13 panels: scatter plots against True Richness at 400 m² (20 to 140). Left column: Log Arrhenius Model; right column: Log Arrhenius Model (GLS). Rows, top to bottom: Variance Difference (non-nested − nested), Variance (nested samples), Variance (non-nested samples).]

Figure 13 Variance of the estimator of predicted richness at the 400 m² scale for log Arrhenius models. The left column shows results using ordinary least squares (OLS). The right column shows results using generalized least squares (GLS). The first row shows paired differences in variance between nested and non-nested sampling schemes. The second and third rows show the variance for nested and non-nested samples alone, respectively.


[Figure 14 panels: scatter plots against True Richness at 400 m² (20 to 140). Left column: Gleason Model; right column: Gleason Model (GLS). Rows, top to bottom: MSE Difference (non-nested − nested), MSE (nested samples), MSE (non-nested samples).]

Figure 14 MSE for the estimator of predicted richness at the 400 m² scale for Gleason models. The left column shows results using ordinary least squares (OLS). The right column shows results using generalized least squares (GLS). The first row shows paired differences in MSE between nested and non-nested sampling schemes. The second and third rows show the MSE for nested and non-nested samples alone, respectively.


[Figure 15 panels: scatter plots against True Richness at 400 m² (20 to 140). Left column: Gleason Model; right column: Gleason Model (GLS). Rows, top to bottom: Bias Difference (non-nested − nested), Bias (nested samples), Bias (non-nested samples).]

Figure 15 Bias for the estimator of predicted richness at the 400 m² scale for Gleason models. The left column shows results using ordinary least squares (OLS). The right column shows results using generalized least squares (GLS). The first row shows paired differences in bias between nested and non-nested sampling schemes. The second and third rows show the bias for nested and non-nested samples alone, respectively.


[Figure 16 panels: scatter plots against True Richness at 400 m² (20 to 140). Left column: Gleason Model; right column: Gleason Model (GLS). Rows, top to bottom: Variance Difference (non-nested − nested), Variance (nested samples), Variance (non-nested samples).]

Figure 16 Variance of the estimator of predicted richness at the 400 m² scale for Gleason models. The left column shows results using ordinary least squares (OLS). The right column shows results using generalized least squares (GLS). The first row shows paired differences in variance between nested and non-nested sampling schemes. The second and third rows show the variance for nested and non-nested samples alone, respectively.


[Figure 17 panels: scatter plots against True Richness at 400 m² (20 to 140). Left column: Arrhenius Model; right column: Arrhenius Model (GLS). Rows, top to bottom: MSE Difference (non-nested − nested), MSE (nested samples), MSE (non-nested samples).]

Figure 17 MSE for the estimator of predicted richness at the 400 m² scale for Arrhenius models. The left column shows results using ordinary least squares (OLS). The right column shows results using generalized least squares (GLS). The first row shows paired differences in MSE between nested and non-nested sampling schemes. The second and third rows show the MSE for nested and non-nested samples alone, respectively.


[Figure 18 panels: scatter plots against True Richness at 400 m² (20 to 140). Left column: Arrhenius Model; right column: Arrhenius Model (GLS). Rows, top to bottom: Bias Difference (non-nested − nested), Bias (nested samples), Bias (non-nested samples).]

Figure 18 Bias for the estimator of predicted richness at the 400 m² scale for Arrhenius models. The left column shows results using ordinary least squares (OLS). The right column shows results using generalized least squares (GLS). The first row shows paired differences in bias between nested and non-nested sampling schemes. The second and third rows show the bias for nested and non-nested samples alone, respectively.


[Figure 19 plot panels. Columns: Arrhenius Model (OLS, left) and Arrhenius Model (GLS, right). Rows (y-axis): Variance Difference (non-nested − nested); Variance (nested samples); Variance (non-nested samples). x-axis: True Richness at 400 m².]

Figure 19 Variance of the estimator of predicted richness at the 400 m² scale for Arrhenius models. The left column shows results using ordinary least squares (OLS). The right column shows results using generalized least squares (GLS). The first row shows paired differences in variance (non-nested minus nested) between the two sampling schemes. The second and third rows show the variance for nested and non-nested samples alone, respectively.
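The log Arrhenius model (log S = log c + z log A) and the Gleason model (S = a + b log A) are both linear in their parameters. Their GLS estimates therefore have the closed form beta = (X^T V^-1 X)^-1 X^T V^-1 y, which reduces to OLS when V is the identity. The sketch below assumes that V is the residual covariance of the response actually regressed among one plot's sampled scales: log S for the log Arrhenius model and S for the Gleason model. The function names, the data and V are hypothetical. The log Arrhenius prediction is back-transformed naively, with no retransformation correction.

```python
import numpy as np


def gls_coefficients(X, y, cov):
    """Closed-form GLS: beta = (X^T V^-1 X)^-1 X^T V^-1 y (OLS if V = I)."""
    Vinv_X = np.linalg.solve(cov, X)
    Vinv_y = np.linalg.solve(cov, y)
    return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)


def predict_richness(area, richness, model, cov=None, target_area=400.0):
    """Fit a linear-in-parameters species-area model and predict at target_area.

    model: "log_arrhenius" (log S = log c + z log A) or
           "gleason"       (S = a + b log A).
    cov:   residual covariance of the regressed response (hypothetical
           input); None gives the OLS fit.
    """
    A = np.asarray(area, dtype=float)
    S = np.asarray(richness, dtype=float)
    V = np.eye(A.size) if cov is None else np.asarray(cov, dtype=float)
    X = np.column_stack([np.ones_like(A), np.log(A)])
    if model == "log_arrhenius":
        b0, b1 = gls_coefficients(X, np.log(S), V)
        # Naive back-transform from the log scale.
        return float(np.exp(b0 + b1 * np.log(target_area)))
    if model == "gleason":
        b0, b1 = gls_coefficients(X, S, V)
        return float(b0 + b1 * np.log(target_area))
    raise ValueError(f"unknown model: {model}")


# Hypothetical sampled scales (m^2), richness values and correlation matrix.
areas = np.array([0.01, 0.1, 1.0, 10.0, 100.0])
rich = np.array([2.0, 4.0, 9.0, 18.0, 35.0])
idx = np.arange(areas.size)
R = 0.6 ** np.abs(idx[:, None] - idx[None, :])

for model in ("log_arrhenius", "gleason"):
    ols = predict_richness(areas, rich, model)
    gls = predict_richness(areas, rich, model, cov=R)
    print(f"{model}: OLS {ols:.1f}, GLS {gls:.1f}")
```

Because 400 m² lies outside the sampled range, both fits are used here to extrapolate. Any difference between the OLS and GLS slopes is therefore magnified in the prediction.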


Summary

Richness predictions based on non-nested samples tend to be larger than those obtained from nested samples. For the Gleason model this is beneficial because it reduces the model's negative bias. For the Arrhenius and log Arrhenius models, which are positively biased, it is detrimental because it increases the bias. When used with the Arrhenius and log Arrhenius models, the two sampling schemes yield equally precise estimates; with the Gleason model, non-nested samples yield greater precision. Overall, the Arrhenius-type models yield better results (lower MSE) than the Gleason model. The Arrhenius model is especially attractive because, unlike the other two models, its accuracy does not vary with the magnitude of the parameter being estimated (the true richness at 400 m²). If the log-transformed Arrhenius model is chosen to fit species accumulation curves, it is recommended that the plot correlation structure be incorporated into the analysis through generalized least squares (GLS) estimation. For the other models, ordinary least squares (OLS) estimation appears to be adequate.

Cited References

Cam, E., J. D. Nichols, J. E. Hines, J. R. Sauer, R. Alpizar-Jara, and C. H. Flather. 2002. Disentangling sampling and ecological explanations underlying species-area relationships. Ecology 83(4): 1118–1130.

Hellmann, J. J., and G. W. Fowler. 1999. Bias, precision, and accuracy of four measures of species richness. Ecological Applications 9(3): 824–834.