latent spatial models and sampling design for landscape ...landscape genetic sampling design 1045 of...

22
The Annals of Applied Statistics 2016, Vol. 10, No. 2, 1041–1062 DOI: 10.1214/16-AOAS929 © Institute of Mathematical Statistics, 2016 LATENT SPATIAL MODELS AND SAMPLING DESIGN FOR LANDSCAPE GENETICS 1 BY EPHRAIM M. HANKS ,MEVIN B. HOOTEN , ,STEVEN T. KNICK § , SARA J. OYLER-MCCANCE ,J ENNIFER A. FIKE , TODD B. CROSS , ∗∗ AND MICHAEL K. SCHWARTZ Pennsylvania State University , U.S. Geological Survey, Colorado Cooperative Fish and Wildlife Research Unit , Colorado State University , U.S. Geological Survey, Forest and Rangeland Ecosystem Science Center § , U.S. Geological Survey, Fort Collins Science Center , U.S. Forest Service, Rocky Mountain Research Station and University of Montana ∗∗ We propose a spatially-explicit approach for modeling genetic variation across space and illustrate how this approach can be used to optimize spa- tial prediction and sampling design for landscape genetic data. We propose a multinomial data model for categorical microsatellite allele data commonly used in landscape genetic studies, and introduce a latent spatial random ef- fect to allow for spatial correlation between genetic observations. We illus- trate how modern dimension reduction approaches to spatial statistics can allow for efficient computation in landscape genetic statistical models cover- ing large spatial domains. We apply our approach to propose a retrospective spatial sampling design for greater sage-grouse (Centrocercus urophasianus) population genetics in the western United States. 1. Introduction. Landscape connectivity is “the degree to which the land- scape facilitates or impedes movement among resource patches” [Taylor et al. (1993)]. Connectivity among subpopulations or habitat patches is an important fac- tor in maintaining biodiversity, understanding the spread of infectious diseases and allocating resources for conservation. Many management efforts devote consider- able resources to maintaining or restoring landscape connectivity [e.g., Crooks and Sanjayan (2006)]. Natural and anthropogenic disturbance can alter the land- scape, affecting connectivity and increasing uncertainty about the functioning of current ecosystems. Data driven decision-making can lead to effective manage- ment of species and ecosystems. Spatially-referenced genetic data are commonly used to study functional con- nectivity [e.g., Cushman et al. (2006), Durand et al. (2009), Guillot et al. (2005), McRae (2006), Slatkin (1987)]. Genetic variation in a metapopulation is a func- tion of gene flow and mutation, and the spatial distribution of genetic variation is a function of migration within the metapopulation and the mating process which Received December 2014; revised March 2016. 1 Supported by the U.S. Geological Survey RWO 98. Key words and phrases. Landscape genetics, sage grouse, optimal sampling. 1041

Upload: others

Post on 02-Oct-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Latent spatial models and sampling design for landscape ...LANDSCAPE GENETIC SAMPLING DESIGN 1045 of the kth allele at the th locus across the entire population being studied.Large

The Annals of Applied Statistics2016, Vol. 10, No. 2, 1041–1062DOI: 10.1214/16-AOAS929© Institute of Mathematical Statistics, 2016

LATENT SPATIAL MODELS AND SAMPLING DESIGNFOR LANDSCAPE GENETICS1

BY EPHRAIM M. HANKS∗, MEVIN B. HOOTEN†,‡, STEVEN T. KNICK§,SARA J. OYLER-MCCANCE¶, JENNIFER A. FIKE¶,

TODD B. CROSS‖,∗∗ AND MICHAEL K. SCHWARTZ‖

Pennsylvania State University∗, U.S. Geological Survey, Colorado CooperativeFish and Wildlife Research Unit†, Colorado State University‡, U.S. Geological

Survey, Forest and Rangeland Ecosystem Science Center§, U.S. GeologicalSurvey, Fort Collins Science Center¶, U.S. Forest Service, Rocky Mountain

Research Station‖ and University of Montana∗∗

We propose a spatially-explicit approach for modeling genetic variationacross space and illustrate how this approach can be used to optimize spa-tial prediction and sampling design for landscape genetic data. We propose amultinomial data model for categorical microsatellite allele data commonlyused in landscape genetic studies, and introduce a latent spatial random ef-fect to allow for spatial correlation between genetic observations. We illus-trate how modern dimension reduction approaches to spatial statistics canallow for efficient computation in landscape genetic statistical models cover-ing large spatial domains. We apply our approach to propose a retrospectivespatial sampling design for greater sage-grouse (Centrocercus urophasianus)population genetics in the western United States.

1. Introduction. Landscape connectivity is “the degree to which the land-scape facilitates or impedes movement among resource patches” [Taylor et al.(1993)]. Connectivity among subpopulations or habitat patches is an important fac-tor in maintaining biodiversity, understanding the spread of infectious diseases andallocating resources for conservation. Many management efforts devote consider-able resources to maintaining or restoring landscape connectivity [e.g., Crooksand Sanjayan (2006)]. Natural and anthropogenic disturbance can alter the land-scape, affecting connectivity and increasing uncertainty about the functioning ofcurrent ecosystems. Data driven decision-making can lead to effective manage-ment of species and ecosystems.

Spatially-referenced genetic data are commonly used to study functional con-nectivity [e.g., Cushman et al. (2006), Durand et al. (2009), Guillot et al. (2005),McRae (2006), Slatkin (1987)]. Genetic variation in a metapopulation is a func-tion of gene flow and mutation, and the spatial distribution of genetic variation isa function of migration within the metapopulation and the mating process which

Received December 2014; revised March 2016.1Supported by the U.S. Geological Survey RWO 98.Key words and phrases. Landscape genetics, sage grouse, optimal sampling.

1041

Page 2: Latent spatial models and sampling design for landscape ...LANDSCAPE GENETIC SAMPLING DESIGN 1045 of the kth allele at the th locus across the entire population being studied.Large

1042 E. M. HANKS ET AL.

mixes genes at a fine temporal scale. Genetic data consist of a snapshot in time ofthis spatio-temporal process, and are thus spatially-correlated observations. Spatialstatistical models have been used in many forms in the field of landscape genetics[e.g., Guillot et al. (2009), Hanks and Hooten (2013), Selander (1970), Smouseand Peakall (1999), Sokal, Oden and Wilson (1991)]. Here we introduce a spatialstatistical model with the specific aim to use the model to identify optimal spatialsampling designs for landscape genetic data.

In Section 2, we introduce our motivating landscape genetic study, which in-volves optimal retrospective sampling design for landscape genetics. In Section 3,we propose a hierarchical multinomial model with latent spatial autocorrelationfor spatially-referenced genetic data. In Section 4, we describe a spatial covari-ance function that takes into account the lekking behavior of sage-grouse, and fitthe hierarchical model from Section 3 to the sage-grouse genetic data. In Section 5,we describe approaches for retrospective spatial sampling design based on the hi-erarchical model from Section 3, and make recommendations on the allocation ofthe retrospective sampling effort for sage-grouse genetic data in the western UnitedStates. In Section 6, we discuss possible extensions to our latent variable approachto modeling landscape genetic data.

2. Greater sage-grouse in the western United States. The greater sage-grouse (Centrocercus urophasianus) is the largest species of grouse found in NorthAmerica. This species depends on sagebrush (Artemisia spp.) for both food andcover, and the health of the sage-grouse population is a strong indicator of thehealth of sagebrush in the western U.S. [Connelly et al. (2000), Patterson (1952)].Sage-grouse are a lekking species with leks often found in natural openings insagebrush communities, surrounded by potential nesting habitat [e.g., Connelly,Hagen and Schroeder (2011)]. Leks are spatial locations where, during breedingseasons, male sage-grouse engage in competitive mating displays. Leks can be re-markably persistent in space and time, with some leks remaining active for up to90 years [Dalke et al. (1963), Smith et al. (2005), Wiley (1973)].

Greater sage-grouse were historically widely distributed across 12 westernstates in the U.S. and three Canadian provinces, but have undergone both a pop-ulation decline [Garton et al. (2011)] and a range contraction [Schroeder et al.(2004)]. Quantifying the distribution of sage-grouse genetic variability across therange of the species could be used to identify population structure and characteris-tics of gene flow that provide insight into regions where management actions couldhave the greatest impact.

2.1. Sample collection. From 2009 to 2012, sage-grouse feathers were col-lected from sage-grouse leks during the spring breeding season. An opportunisticsample of all the feathers observed on the lek was collected by collecting feathersobserved while traversing the lek. Feathers were placed in paper envelopes withthe spatial coordinates for the lek. Following collection, envelopes were stored incool, dry facilities out of direct sunlight. All envelopes and feathers collected from

Page 3: Latent spatial models and sampling design for landscape ...LANDSCAPE GENETIC SAMPLING DESIGN 1045 of the kth allele at the th locus across the entire population being studied.Large

LANDSCAPE GENETIC SAMPLING DESIGN 1043

any given lek were pooled by lek, as it was impossible to determine before a labanalysis which feathers could have come from the same sage-grouse.

2.2. Sample extraction and genotyping. DNA was extracted from feathers us-ing QIAGEN’s DNeasy Blood and Tissue Kit and a user-developed protocol forpurification of total DNA from nails, hair or feathers. Each sample was genotypedacross a panel of 14 variable microsatellite loci, all of which have been redesignedspecifically to optimize efficacy with low quality and quantity DNA acquired fromnoninvasively collected feather samples.

Each sample was amplified multiple times at each microsatellite locus to screenfor genotyping error (e.g., allelic drop out, false alleles, scoring error, etc.). If therewere any variation between the genotypes generated for each sample, sampleswere genotyped another two to six times to confirm genotype accuracy. As ge-netic material was coming from feathers, many of which were significantly weath-ered, some samples failed at some loci. We removed from the analysis any indi-vidual where samples failed at more than 1/3 of the loci. The program DROPOUT[McKelvey and Schwartz (2005)] was used to identify duplicate sample genotypesand screen for genotyping error.

By December 2012, genotype data had been obtained for 830 unique individualsat 243 distinct leks clustered in the southwestern half of the sage-grouse range(Figure 1). While feathers had been collected across the entire range, genetic datawere not yet available for samples from many leks by December 2012.

FIG. 1. Sage-grouse lek locations in the western United States. Feather samples were collectedfrom 1177 sage-grouse leks between 2009 and 2012. Microsatellite allele data from 243 of these lekswere available for analysis in March, 2013.

Page 4: Latent spatial models and sampling design for landscape ...LANDSCAPE GENETIC SAMPLING DESIGN 1045 of the kth allele at the th locus across the entire population being studied.Large

1044 E. M. HANKS ET AL.

3. Latent spatial models for landscape genetics. A microsatellite is a por-tion of repetitive DNA in which certain short DNA sequences are repeated multipletimes. In transcription, it is possible for the number of repeats of the DNA segmentat a microsatellite to be either increased or decreased, as repeats are either tran-scribed multiple times or skipped. This leads microsatellites to often exhibit highmutation rates and corresponding high diversity in a population. An individualstrand of DNA’s microsatellite allele is characterized by the number of repeats ofthe short DNA sequence. Microsatellite alleles are inherited bi-parentally, and soare codominant, such that common alleles between two individuals shows sharedancestry, and a greater proportion of shared alleles indicates greater relatedness.While multiple feathers were collected at each lek, individual sage-grouse at eachlek were identified by their unique genotype.

We propose a latent spatial model for spatially referenced microsatellite alleledata where genetic relatedness is modeled using spatial correlation. In our spec-ification, the categorical allele data are modeled using a multinomial probit datamodel with latent Gaussian spatial random effects.

Consider microsatellite allele data observed at L distinct loci for each spa-tially referenced individual in the study. At the �th locus, for � = 1,2, . . . ,L,we denote the list of all distinct observed alleles from all individuals in thestudy as {a�1, a�2, . . . , a�K�

}. The actual numeric repeat length of the allele isnot typically correlated with any important feature (e.g., fitness); rather, com-mon alleles can indicate genetic relatedness. We will thus specify a model inwhich the alleles are considered as categorical observations. In particular, wemodel the two observed alleles for each (diploid) individual at each locus asarising from a multinomial distribution with spatially varying allele probabilitiesps� ≡ (ps�1, ps�2, · · · , ps�K�

)′, where s ∈ {1,2, . . . , S} indexes the spatial lo-cation.

Let ysip� be a vector coded such that the kth entry ysip�k = 1 if the pth (indexingploidy) observed allele at the �th locus is a�k for the ith individual sage-grouse atthe sth spatial lek location, and ysip�k = 0 otherwise. Then the multinomial probitmodel [e.g., Albert and Chib (1993)] for categorical data is specified in terms oflatent variables, z, as follows. Let

(1) ysip�k ={

1, zsip�k = max{zsip�a, a = 1, . . . ,K�},0, otherwise,

where

(2) zsip�ki.i.d.∼ N(μ�k + ηs�k,1).

The allele a�k makes up a fraction ps�k of the genetic makeup of the sage-grouseat location s, where ps�k = P(zsip�k = max{zsip�a, a = 1, . . . ,K�}).

The mean of the latent variable zsip�k in (2) consists of the sum of two effects.The first is μ�k , an allele specific intercept which determines the relative frequency

Page 5: Latent spatial models and sampling design for landscape ...LANDSCAPE GENETIC SAMPLING DESIGN 1045 of the kth allele at the th locus across the entire population being studied.Large

LANDSCAPE GENETIC SAMPLING DESIGN 1045

of the kth allele at the �th locus across the entire population being studied. Largevalues of μ�k , relative to μ�k′ , make it more likely that zsip�k will be larger thanzsip�k′ , and so the kth allele will be more prevalent than the (k′)th allele. We notethat the model (1)–(2) is invariant to a shift in all μ�k , as the likelihood is a func-tion of the contrasts zsip�k − zsip�k′ , and not the actual values of zsip�k . Thus, ifμ�k were replaced by μ�k + c for k = 1,2, . . . ,K� and some constant c, then thelikelihood of the observed allele data would remain unchanged. To maintain modelidentifiability, we fix μ�1 = 0 for � = 1,2, . . . ,L because only the relative differ-ences (contrasts) in μ�k are identifiable.

The second term in the mean of (2) is ηs�k , which is a spatially varying randomeffect that allows the allele frequencies ps� to vary over the spatial range of thespecies. Consider

(3) η�k = [η1�k η2�k · · · ηn�k

]′ ∼ N(0,�(θ)

),

where �(θ) is the spatial covariance matrix of the spatially referenced locationsfor which we have observations, parameterized by θ .

We adopt a Bayesian approach to modeling, and specify prior distributions forall parameters in (1)–(3):

μ�k ∼ N(0, σ 2

μ

), � = 1,2, . . . ,L, k = 2,3, . . . ,K�,(4)

θ ∼ [θ ],(5)

where the bracket notation “[·]” indicates a probability distribution. The prior on θwill depend on the covariance model used, and we thus leave the prior specification(5) in a general form for now. We note that for � = 1,2, . . . ,L, each μ�1 is fixedat 0, and thus in (4) we do not specify prior distributions for these fixed baselineallele intercepts.

Implicit in the multinomial model for observed alleles is the assumption that wehave observed all possible alleles at each locus. This is not likely to be the case inmost natural populations. Under the assumption that the latent allelic processes areindependent (zsip�k is independent of zsip�k′ , k �= k′), the latent model (2) is unaf-fected by not observing some alleles, but the interpretation of ps�k changes to bethe relative probability of observing the kth allele at the �th locus of an individualat the sth spatial location, given that one of {a�1, a�2, . . . , a�K�

} is observed.

4. Sage-grouse landscape genetic analysis.

4.1. Lek network connectivity model. In the previous section we specified amultinomial model for spatially referenced microsatellite allele data with latentspatial autocorrelation. Critical to our approach is the choice of covariance matrix�(θ) for the latent spatial random effects in (3). Leks are persistent centers of sage-grouse breeding activity, and are thus essential to gene flow and spatial geneticvariation.

Page 6: Latent spatial models and sampling design for landscape ...LANDSCAPE GENETIC SAMPLING DESIGN 1045 of the kth allele at the th locus across the entire population being studied.Large

1046 E. M. HANKS ET AL.

We will consider a covariance that is based on the isolation by resistance (IBR)approach of McRae (2006) and takes into account the importance of leks in sage-grouse gene flow, similar in spirit to the model of Knick and Hanser (2011). TheIBR approach envisions space as a graph: a set of spatial nodes (leks in our case)connected by edges, where the edges are resistors in an electric circuit. McRae(2006) showed that if the resistance (edge weights) of the resistors was inverselyproportional to migration rates between nodes, and migration could be modeledas a random walk on the graph of nodes, then the resistance distance of the graph[Klein and Randic (1993)] is proportional to linearized Fst .

Consider a spatial network with nodes at known sage-grouse leks across thewestern U.S. and edge weights (conductances in the IBR framework) being a func-tion of the Euclidean distance between leks. For the sth lek, consider the set of“neighboring” leks indexed by t ∈ N (s), where N (s) is the set of leks that aredirectly connected (neighbors) to the sth lek. We consider lek s and lek t to beneighbors if a sage-grouse could plausibly migrate from lek s to lek t directly,without an intermediate stop at any other lek. This is based on the migration modelfor movement central to the IBR approach as described by McRae (2006).

Define the edge weight between the sth and t th leks as αst/σ2, where αst is a

decreasing function of the Euclidean distance between the leks and σ 2 is a scal-ing factor. Under the random walk model for migration in the IBR framework,αst/σ

2 is proportional to the migration rate of sage-grouse between the two leks.We consider edge weights that are decreasing functions f (dst ) of the Euclideandistance dst between leks, and that are set to 0 for all leks that are more distantthan a maximum distance dMAX:

(6) αst ={f (dst ), dst ≤ dMAX,

0, dst > dMAX.

We will consider multiple functional forms for f (d) and multiple maximum dis-tances dMAX in Section 4.3, and use information criteria to choose the functionalform and maximum distance most appropriate for the sage-grouse genetic data.Setting edge weights equal to zero for leks that are more distant than dMAX indi-cates that migration (and the resulting flow of genetic information) is only possiblebetween distant leks through the use of intermediate leks in the lek network. Thisis a Markov assumption, and our resulting random effects {η�k} in (3) are GaussianMarkov random fields.

Hanks and Hooten (2013) showed that the spatial connectivity implied by sucha random walk migration model (or an equivalent electric circuit) can be mod-eled as an intrinsic conditional autoregressive (ICAR) Gaussian Markov randomfield [Besag (1974), Besag and Kooperberg (1995)]. The spatial precision matrix

Page 7: Latent spatial models and sampling design for landscape ...LANDSCAPE GENETIC SAMPLING DESIGN 1045 of the kth allele at the th locus across the entire population being studied.Large

LANDSCAPE GENETIC SAMPLING DESIGN 1047

(inverse covariance matrix) for an ICAR on the entire lek network is given by

(7)1

σ 2 Q = 1

σ 2

⎡⎢⎢⎢⎢⎣

∑j �=1 α1j −α12 −α13 · · ·−α21

∑j �=2 α2j −α23 · · ·

−α31 −α32∑

j �=3 α3j · · ·...

......

. . .

⎤⎥⎥⎥⎥⎦ .

We do not have observed genetic data for all known lek locations. If we dividethe leks into sampled (or observed o) and unobserved (u) leks, the spatial randomeffects η (suppressing the locus � and allele k subscripts) and precision matrix Qcan be partitioned accordingly:

η =[ηo

ηu

], Q = 1

σ 2

[Qoo Qou

Quo Quu

]

and the covariance matrix �(σ 2) of the observed leks is given by the Schur com-plement

(8) �(σ 2) = σ 2[

Qoo − Qou(Quu)−1Quo

]−1.

We assign a conjugate inverse gamma prior distribution to σ 2:

(9) σ 2 ∼ IG(r, q),

with r and q chosen so that the prior has mean of 10 and variance of 100. Wealso fixed the hyperparameter σ 2

μ = 100. Inference on the parameters ({μ�k} andσ 2) in the full model, given by equations (1)–(4) and (6)–(9), can then be madeusing a Markov chain Monte Carlo (MCMC) algorithm under a Bayesian statisticalparadigm.

4.2. Reduced-rank spatial model. Approximately 6000 known sage-grouseleks reside across the western United States, resulting in a large (≈ 6000 × 6000)spatial precision matrix Q representing pairwise connections between leks. Themagnitude of this spatial precision matrix will make model fitting and the compar-ison of potential retrospective sampling designs very computationally intensive.We thus propose a reduced-rank model for the spatial random effect η�k in (3).Our reduced-rank approach utilizes a spectral decomposition of the spatial preci-sion matrix Q and is similar to that of Wikle and Cressie (1999), Berliner, Wikleand Cressie (2000) and others.

For a fully observed lek network, η�k ∼ N(0, σ 2Q−), where Q/σ 2 is an ns ×ns

precision matrix. The spectral decomposition of Q yields

Q = MD−M′,where the columns of M are the eigenvectors of Q and D is a diagonal matrixcontaining the respective eigenvalues {λ1, λ2, . . . , λns } of Q−. Then the spatialrandom effect can be expressed as

η�k = Mδ�k,

Page 8: Latent spatial models and sampling design for landscape ...LANDSCAPE GENETIC SAMPLING DESIGN 1045 of the kth allele at the th locus across the entire population being studied.Large

1048 E. M. HANKS ET AL.

where

δ�k ∼ N(0, σ 2D

).

Then a reduced-rank version of the spatial random effect is given by taking only thefirst ne eigenvectors of Q− and setting λm = 0 for all m > ne. This approximatesthe random effect η�k with a random effect η�k that captures a portion of the spatialstructure in Q−. In practice, ne can be chosen so that η�k ≈ η�k but ne is still muchsmaller than ns , the number of leks in the network. This leads to a computationallyefficient representation of the spatial random effect, with (3) becoming

ηs�k = m′s δ�k,(10)

δ�k ∼ N(0, σ 2D

),(11)

where m′s is the sth row of M, the ns × ne matrix of the first ne eigenvectors of

Q−, and D is the ne × ne diagonal matrix of the first ne eigenvalues of Q−.The complete hierarchical statistical model we have described for the multino-

mial allele model with latent reduced-rank spatial random effects is

ysip�k ={

1, zsip�k = max{zsip�a, a = 1, . . . ,K�},0, otherwise,

zsip�k ∼ N(μ�k + ηs�k,1),

ηs�k = m′s δ�k,

δ�k ∼ N(0, σ 2D

),

μ�k ∼ N(0, τ 2)

,

σ 2 ∼ IG(r, q).

Full-conditional distributions are available for all parameters in this hierarchicalmodel, and are given in the supplemental article [Hanks et al. (2016)].

4.3. Model fitting. For our analysis of sage-grouse genetic data, two functionalforms for f (d) in (6) were specified, inverse distance [f1(d) = 1/d], and inversesquared distance [f2(d) = 1/d2]. For each of these functional forms, three differ-ent models with dMAX being specified as 18 km, 25 km and 50 km were specified.The six resulting spatial models were each fit to the observed allele data from 830feather samples collected at 243 distinct leks (Figure 1) across the western U.S.Under a Bayesian statistical paradigm, an MCMC algorithm was used to obtainsamples from the posterior distribution of parameters in each network model, con-ditioned on the observed sage-grouse genotype data. Convergence was assessedvisually, with chains showing good mixing.

The models were compared using the deviance information criterion (DIC) ofSpiegelhalter et al. (2002) (Table 1). The best model (DIC = 661,844) was the

Page 9: Latent spatial models and sampling design for landscape ...LANDSCAPE GENETIC SAMPLING DESIGN 1045 of the kth allele at the th locus across the entire population being studied.Large

LANDSCAPE GENETIC SAMPLING DESIGN 1049

TABLE 1Comparison of spatial lek-network models based on DIC

Model f (d) dMAX DIC

1 1/d2 25,000.00 661,8442 1/d 50,000.00 662,4093 1/d 25,000.00 664,8484 1/d2 50,000.00 670,8835 1/d 18,000.00 671,3756 1/d2 18,000.00 674,832

network model with edge weight between leks inversely proportional to the squareof the distance between leks [f1(d) = 1/d2] and dMAX = 25 km. We note thatdMAX gives the distance at which pairwise edges between leks and correspondingentries of Q are set to zero, implying conditional independence between two leksfurther apart than 25 km, conditional on the random effects ηs�k at all other leks.

For the following analysis, we focus only on the model with the best (low-est) DIC. This corresponds to an empirical Bayesian approach to model fitting.An alternative approach would be to assign a prior distribution to the exponent(ξ ) in f (d) = d−ξ in (6) and jointly estimate the posterior distribution of ξ withthe posteriors for other model parameters. A similar approach could be taken forthe cutoff parameter dMAX. However, this approach would be computationally de-manding, and in testing leads to poor mixing of MCMC chains. Our empiricalBayes approach is similar in spirit to assigning a discrete uniform prior on therange parameter in a geostatistical spatial model [e.g., Diggle and Ribeiro (2007)].

The posterior distribution for σ 2 in the best model has mean of 0.169 and vari-ance of 0.015. The MCMC standard error based on the consistent batch meansestimator [Flegal, Haran and Jones (2008)] for σ 2 is 0.0011. This provides someconfidence that our MCMC algorithm has converged to the stationary posteriordistribution.

The quantity σ 2 is not easily interpreted directly, but the latent representation ofzsip�k lends an interpretation to the latent spatial covariance matrix �(σ 2). From(2),

zsip�k = μ�k + ηs�k + εsip�k, εsip�k ∼ N(0,1),

where η�k ∼ N(0, �(σ 2)). The spatial covariance matrix �(σ 2) is nonstation-ary, and the average of the posterior mean diagonal elements of �(σ 2) is 8.43,which is much greater than the unit variance of the nonspatial noise (nugget) con-tained in εsip�k of (2), leading us to conclude that spatial variability in the ge-netic data accounts for a much larger proportion of the variability than does non-spatial (within lek) variability. Summaries of posterior distributions for {μ�k, � =1,2, . . . ,14, k = 1,2, . . . ,K�} are not shown, but correlate well with empirical al-lele frequencies taken over all 1136 genetic samples.

Page 10: Latent spatial models and sampling design for landscape ...LANDSCAPE GENETIC SAMPLING DESIGN 1045 of the kth allele at the th locus across the entire population being studied.Large

1050 E. M. HANKS ET AL.

5. Optimal retrospective sampling for sage-grouse in the western UnitedStates. Having fit the model to microsatellite allele data from sage-grouse feathersamples collected in 2009–2012, we now consider a retrospective sampling designfor this system. Our goal is to recommend optimal regions for sampling previouslyunsampled sage-grouse leks, given what is known from the spatially referencedgenetic data already collected. This requires the definition of a criterion φ(dj ) bywhich we can compare J potential sampling designs (dj , j = 1, . . . , J ).

5.1. Latent Gaussian design. A common criterion to minimize for optimalsampling of Gaussian random variables is the mean squared prediction error(MSPE) at unobserved locations [e.g., Zimmerman (2006)]. The observations{ysip�k} in our model are not Gaussian, rather they are categorical alleles whichwe model as arising from a multinomial distribution. However, each zsip�k is a la-tent representation of the true observation ysip�k , and zsip�k is normally distributedsuch that

zsip�k = μ�k + ηs�k + εsip�k, εsip�k ∼ N(0,1).

This leads us to consider the MSPE of the latent z variables as a design crite-rion. Full posterior predictive inference on this MSPE would be computationallyprohibitive, as it would require integrating over the entire posterior distribution.Instead, in what follows, we will consider the parameters {μ�k} and σ 2 to be fixedand known, and we will focus on finding optimal designs conditional on the poste-rior mean values of model parameters. Our development of a design criterion firstfocuses on the design for the latent Gaussian z, and is reminiscent of Kriging [e.g.,Cressie (1993)]. We will then consider a design criterion that takes into accountthe categorical nature of the data by integrating over the latent Gaussian z.

For the kth allele at the �th locus, we can divide the spatial locations, and theircorresponding z variables, into observed (z(o)

�k = {zsip�k : s is observed}) and un-

observed (z(u)�k = {zsip�k : s is unobserved}) categories. The “observed” set corre-

sponds to the genetic samples obtained from the sage-grouse leks, while the “un-observed” set will be a single unsampled sage-grouse at each unsampled lek. Thejoint distribution of z�k at all leks is Gaussian:

(12) z�k =(

z(o)�k

z(u)�k

)∼ N

(μ�k1,

(�oo �ou

�uo �uu

)),

where the covariance matrix of z�k is � = σ 2� + I and has been partitioned ac-cording to observed and unobserved locations.

If z(o) are known (observed), then the mean of the distribution of z(u)�k (condi-

tional on z(o)) is given by

(13) z(u)�k = μ�k1 + �uo�

−1oo

(z(o)�k − μ�k1

),

Page 11: Latent spatial models and sampling design for landscape ...LANDSCAPE GENETIC SAMPLING DESIGN 1045 of the kth allele at the th locus across the entire population being studied.Large

LANDSCAPE GENETIC SAMPLING DESIGN 1051

and the corresponding conditional Gaussian covariance matrix is given by

(14) � = E[(zu − zu)(zu − zu)

′] = �uu − �uo�−1oo �ou.

Each diagonal entry of � contains the MSPE for z at an unobserved location, andwe choose the sum of these entries [“A-optimality” in Harville (2008)] for ourlatent Gaussian design criterion:

(15) φLG(d) = tr(�) = sum(diag(�)

).

Other possible alternatives include “D-optimality,” in which the design criterionconsists of the product of the diagonal elements of �. While D-optimality hassome desirable properties when considering optimal covariate designs, we focuson A-optimality, which is the prevailing approach in spatial sampling [e.g., Hootenet al. (2009), Wikle and Royle (2005), Zimmerman (2006)] and is interpretablewith the design criterion being the mean square prediction error averaged over allunobserved spatial locations.

It is notable that neither the covariance matrix � nor the design criterion φLG(d)

depend on z(o)�k . Additionally, this design criterion would be identical for any locus

� and allele k, as we assume in (3) that the spatial correlation is shared by all latentallelic processes {z�k}. As the design criterion φLG depends only on the covariancematrix �, computation of this design criterion is very efficient, as changing a de-sign (e.g., adding a new lek to the set of previously sampled leks) only involvesevaluating (14) and (15) for a different partition of locations into “observed” and“unobserved” in (12).

5.2. Design for categorical observations. The genetic observations are notGaussian, and it is not clear whether the latent Gaussian design criterion (15) isappropriate for categorical data. We thus consider a categorical design criterion.Consider predicting the first (of ploidy 2) observed allele at locus � for an indi-vidual sage-grouse at an unsampled spatial location u, that is, we want to predictyu� = (yu11�1, . . . , yu11�K)′, which will be a vector with one entry equal to 1 (indi-cating the observed allele) and all other entries equal to zero. Note that, due to ourlatent Gaussian model (1)–(3), predictions on yu� can be obtained by integratingover predictions on (zu11�1, . . . , zu11�K�

). As we did in Section 5.1, we will con-sider prediction and sampling design conditioned on the posterior mean values of{μ�k}, σ 2 and the latent {z(o)

�k }. Our approach could be extended to full posteriorpredictive inference by integrating over the posterior distribution of these parame-ters, but this is computationally prohibitive in the case of the sage-grouse study.

If we condition on {z(o)�1 , . . . , z(o)

�K�}, then the mean of the latent Gaussian

(zu11�1, . . . , zu11�K�) is (zu11�1, . . . , zu11�K�

), with each entry obtained using (13).Our prediction of yu�|{z(o)

�1 , . . . , z(o)�K�

} is then yu11�ku�

= 1, where zu11�ku�

> zu11�a ,

a �= ku�, with all other entries in yu� equal to zero.

Page 12: Latent spatial models and sampling design for landscape ...LANDSCAPE GENETIC SAMPLING DESIGN 1045 of the kth allele at the th locus across the entire population being studied.Large

1052 E. M. HANKS ET AL.

To obtain a design criterion, we note that the MSPE of yu�|{z(o)�1 , . . . , z(o)

�K�} is

equal to the probability that the index ku� of the maximum entry of (zu11�1, . . . ,

zu11�K�) is different from the index ku� of the maximum entry of (zu11�1, . . . ,

zu11�K�).

Each zu11�a is marginally distributed as

zu11�a|{z(o)�a

} ∼ N(zu11�a,ψ

2u

),

where zu11�a is given by (13) and ψ2u is the uth diagonal element of � in (14).

While the latent z random variables are correlated in space, they are uncorrelatedbetween alleles; that is, zu11�a is independent of zu11�a′ for a �= a′, and zu� =(zu11�1, zu11�2, . . . , zu11�K�

)′ is distributed

(16) zu�|{z(o)�a , a = 1, . . . ,K�

} ∼ N(zu�,ψ

2uI

).

Without loss of generality, suppose that the index ku11� of the maximum zu11�a

is ku11� = K�; that is, the observed allele is the last allele in the list. Then theprobability of an incorrect categorical allele prediction is

θu� = P(ku11� �= ku11�|{z(o)

�a , a = 1, . . . ,K�

})= 1 − P

(ku11� = ku11�|{z(o)

�a , a = 1, . . . ,K�

})= 1 − P

(zu11�1 − zu11�K�

< 0, zu11�2 − zu11�K�< 0, . . . ,

zu11�K�−1 − zu11�K�< 0|{z(o)

�a , a = 1, . . . ,K�

}),

which is an orthant probability of the multivariate normal wu� ∼ N(Lzu�,ψ2uLL′),

defined as

(17) wu� =

⎡⎢⎢⎢⎣

wu11�1wu11�2

...

wu11�K�−1

⎤⎥⎥⎥⎦ = Lzu� =

⎡⎢⎢⎢⎣

1 0 0 · · · −10 1 0 · · · −1...

. . ....

0 0 · · · 1 −1

⎤⎥⎥⎥⎦

⎡⎢⎢⎢⎣

zu11�1zu11�2

...

zu11�K�

⎤⎥⎥⎥⎦ ,

with L a contrast matrix of dimension (K� −1)×K�. Our proposed categorical de-sign criterion is then the sum of this probability for each locus at each unobservedlocation:

φcat(d) = ∑u

∑�

θu�

= ∑u

∑�

(1 − P(wu11�1 < 0,wu11�2 < 0, . . . ,wu11�K�−1 < 0)

).

(18)

Note that this design criterion considers prediction error at each locus for oneindividual at each unobserved sample location.

The calculation of each of the multivariate normal orthant probabilities {θu�} issimplified, as the correlation matrix can be written as LL′ = I+11′. In this special

Page 13: Latent spatial models and sampling design for landscape ...LANDSCAPE GENETIC SAMPLING DESIGN 1045 of the kth allele at the th locus across the entire population being studied.Large

LANDSCAPE GENETIC SAMPLING DESIGN 1053

case, each multivariate orthant probability can be expressed as a one-dimensionalintegral on [0,1] [Genz and Bretz (2009), pages 3, 16–17, Marsaglia (1963)]:

θu� = 1 −∫ 1

0

K�−1∏k=1

(1 − F

(zu�K�

− zu�k√ψ2

u

− F−1(t)

))dt,

where F is the CDF of a standard normal random variable. We approximated thisone-dimensional integral using numerical quadrature. This provides an efficientapproach to calculating the categorical design criterion φcat (18).

5.3. Optimal sampling for reduced-rank spatial models. Computing the de-sign criteria (15) and (18) requires calculation of the inverse �−1

oo which hascomputational cost of O(n3

d), where nd is the number of locations in the pro-posed design. In the sage-grouse study, nd = 1177 leks were sampled duringthe years of 2009–2012. For our retrospective design, we consider adding leksto the sampling design, which will increase this number. Inverting a matrix ofthis size is computationally demanding, especially if we wish to compare a largenumber of potential sampling designs. However, we can take advantage of thereduced-rank spatial model presented in Section 4.2 to significantly reduce thecomputational cost of computing the design criterion by replacing the covari-ance matrix � = �uu − �uo�

−1oo �ou (14) with a reduced-rank approximation

� = �uu − �uo�−1oo �ou.

Note that the reduced-rank covariance matrix � of all observed and unobservednodes can be decomposed and written in block form:

� = σ 2MDM′ + I

= σ 2 ·[

Mo

Mu

]D

[M′

o M′u

] +[Io 00 Iu

],

which leads to a reduced-rank version of the Kriging covariance matrix:

� = �uu − �uo�−1oo �ou(19)

= σ 2MuDM′u + Iu

(20)− (

σ 2MuDM′o

)(σ 2MoDM′

o + Io

)−1(σ 2MoDM′

u

).

This decomposition requires the inverse of an nd ×nd matrix, but the inverse inquestion can be expressed using the Sherman–Morrison–Woodbury identity [e.g.,Gentle (2007), page 221]

(21)(Mo

(σ 2D

)M′

o + Io

)−1 = Io − Mo

(1

σ 2 D−1 + M′oIoMo

)−1M′

o.

Page 14: Latent spatial models and sampling design for landscape ...LANDSCAPE GENETIC SAMPLING DESIGN 1045 of the kth allele at the th locus across the entire population being studied.Large

1054 E. M. HANKS ET AL.

The resulting expression requires only the inverse of a matrix of dimension-ality ne × ne, where ne is equal to the number of eigenvectors retained in thereduced-rank version of Q and typically ne � nd . This formulation allows us totake advantage of the reduced-rank approximation to the spatial covariance in thecomputation of the sampling criteria φLG(d) and φcat(d). Similar reduced-rank ap-proximations have been used for prediction [e.g., Wikle and Cressie (1999)], butnot, to our knowledge, in optimal sampling design.

5.4. Recommendations for retrospective sage-grouse sampling effort. Tomake recommendations for the retrospective sampling effort of sage-grouse leks,we identify optimal retrospective designs under both the latent Gaussian MSPEsampling criterion (15) and the categorical sampling criterion (18). As described inSection 4.3, genetic data from 830 individuals at 243 distinct leks (Figure 1) wereused to fit the statistical model proposed in Section 4.2. Each of these leks wereconsidered to be observed for our purpose of identifying an optimal retrospec-tive design. We also included 934 additional leks where feather samples had beencollected during 2009–2012 at these additional leks, but had not been genotypedbefore the resumption of sampling. The genetic information from these samplescould thus not be used to estimate the spatial covariance parameters in our statis-tical model, but knowing that samples have been obtained from these additionalsampling locations can aid in identifying regions of high sampling importance forthe retrospective sampling effort. This results in a total of 243 + 934 = 1177 lekswhich we consider to be observed, or sampled, in the existing design.

Given this set of observed leks, we consider adding R additional leks to theexisting design. The motivation behind this type of design is that resources mayexist to collect genetic information (e.g., sage-grouse feathers) at only a subset ofthe known leks across the western United States. Under such a constraint, we chosethis retrospective design of R additional leks so that the sampling design criterionis optimized. For illustrative purposes, we consider R = 10.

A complete search of the space of retrospective designs would be computa-tionally prohibitive, thus we constructed an iterative search algorithm to identifya pseudo-optimal retrospective design. We first considered 10,000 random retro-spective designs of ten additional leks randomly chosen from all leks where feath-ers had not been collected (unsampled leks), and computed the latent Gaussiansampling criterion (15) and categorical criterion (18) for each. We also consideredmultiple manually specified initial designs chosen so that the ten new leks wereregularly spaced in geographic space. The best six designs from this initial searchwere used as initial designs in an iterative exchange algorithm [Royle (1998)]. Theswitching algorithm is detailed in Algorithm 1.

We ran the iterative algorithm for Nswitch = 1000 iterations for each initial de-sign. A local minimum was found for each initial design; the resulting designswere then compared, and the design with the smallest design criterion was chosen.This process was repeated once for the latent Gaussian criterion (15) and once for

Page 15: Latent spatial models and sampling design for landscape ...LANDSCAPE GENETIC SAMPLING DESIGN 1045 of the kth allele at the th locus across the entire population being studied.Large

LANDSCAPE GENETIC SAMPLING DESIGN 1055

Algorithm 1 Sampling design switching algorithm1: Set the initial design d = (s1, s2, . . . , sR) � R = the size of the design2: Compute the initial design criterion φ(d)

3: Set Nswitch = the number of iterations4: Set dMAX5: for iter in 1:Nswitch do6: for v in 1:R do7: Draw s∗

v uniformly from {s : ‖s − sv‖ < dMAX}8: Compute the design criterion φ(d∗) for d∗ = (s1, . . . , sv−1, s

∗v ,

sv+1, . . . , sR)

9: if φ(d∗) < φ(d) then10: d ← d∗11: return d � d is a locally-optimal design

the categorical criterion (18). The resulting optimal designs for the two criteriaare shown in Figure 2. This retrospective design is based on the estimated spa-tial covariance for a network model of lek connectivity and provides guidance onhow states can prioritize future sage-grouse lek sampling efforts by identifying 10leks of high sampling importance in a study designed to characterize sage-grousegenetic diversity.

5.5. Simulation study. We conducted a simulation study to compare how ef-fective the latent Gaussian design criterion φLG and the categorical design crite-rion φ∗

cat are at identifying important sampling locations. Using posterior meanparameters from our results in Section 4, we simulated spatially correlated cate-gorical data on the entire sage-grouse lek network using the model described in(1)–(3). We considered simulating one locus with 10 alleles for one individual ateach lek in the spatial network. We considered predicting the simulated categori-cal observation conditional on a fixed, true value of σ 2 and the fixed, true latentGaussian zsip�k at each lek in the optimal design. The latent zsip�k at leks not inthe optimal design (where predictions are required) were treated as unknown. Thismimics the case in which a researcher has estimated σ 2 and zsip�k at the locationsin the optimal design from existing data, such as through the posterior mean aftermodel fitting as described in Section 4. In practice, there will be uncertainty aboutthese parameter estimates; fixing them at their true values for this simulation studyallows us to study the relative merits of the latent Gaussian and full categoricaldesign criteria in a “best case” scenario. As always, poor estimation of model pa-rameters resulting from biased samples or poor choices of prior distributions couldnegatively affect predictive error. For optimal designs, we used the optimal retro-spective designs from Section 5.4, which are shown in Figure 2. This will allow us

Page 16: Latent spatial models and sampling design for landscape ...LANDSCAPE GENETIC SAMPLING DESIGN 1045 of the kth allele at the th locus across the entire population being studied.Large

1056 E. M. HANKS ET AL.

FIG. 2. Sage-grouse lek sampling recommendations based on latent Gaussian MSPE. Leks shownas grey circles indicate leks that have not been sampled, while leks shown as black dots have beensampled in 2009–2012. Leks shown as crosses are the optimal categorical retrospective lek samplinglocations under the constraint that no more than 10 additional leks are sampled in 2013. Leks shownas open circles are the optimal latent Gaussian retrospective lek sampling locations under the sameconstraint.

to study the ability of the latent Gaussian design criterion to approximate the fullcategorical criterion. The predicted categorical observations were compared withthe true simulated categorical observations, and the MSPE was computed as thenumber of leks where the predicted allele was not the true simulated allele. Thisprocess was repeated 1000 times, with results shown in Figure 3.

In Figure 3(a), we show the categorical and latent Gaussian design criteria for300 random designs. It is clear that the categorical and latent Gaussian designcriteria are not always in agreement. This is evident in Figure 2 as well, where thereis very little overlap between the optimal categorical and latent Gaussian designs.Figure 3(b) shows the simulation MSPEs for the optimal latent Gaussian designand the optimal categorical design for all runs in the simulation study. While theoptimal categorical design (MSPE = 2451) does, on average, predict better thanthe optimal latent Gaussian design (MSPE = 2459), the difference is slight. Thisindicates that the latent Gaussian design criterion may be an effective substitutefor the more computationally intensive categorical design criterion.

Page 17: Latent spatial models and sampling design for landscape ...LANDSCAPE GENETIC SAMPLING DESIGN 1045 of the kth allele at the th locus across the entire population being studied.Large

LANDSCAPE GENETIC SAMPLING DESIGN 1057

FIG. 3. Simulation study comparing the categorical (φcat) and latent Gaussian (φLG) design cri-teria. The latent Gaussian design criterion φLG shows little correlation with φcat (a). The optimaldesigns from Section 5.4 were compared by simulating categorical observations on the entire lek net-work and comparing predictions from the optimal design with actual simulated observations. Whilethe optimal categorical design results in better predictions than the latent Gaussian design (b), thedifference is slight.

6. Discussion. We have proposed a multinomial model with latent spatial ran-dom effects for microsatellite allele data and illustrated how optimal retrospectivespatial designs can be obtained for landscape genetic studies. We defined spatialcovariance between leks based on a graphical lek network, with edge weights afunction of Euclidean distance between leks. This model was developed to accountfor the lekking behavior of greater sage-grouse that results in a spatial constella-tion of breeding nodes [Knick and Hanser (2011)], and will be appropriate for otherspecies which have clearly defined breeding areas or population locations. Alter-nately, a landscape graph could be specified where the edge weights are defined bythe local landscape features [e.g., Cushman and Landguth (2010), Cushman et al.(2006), McRae (2006), McRae and Beier (2007), McRae et al. (2008), Spear et al.(2010)], and connectivity is defined under an isolation by resistance or least-costpath (LCP) approach. Both the IBR and LCP approaches imply a nonstationaryspatial covariance, with the nonstationarity a function of landscape characteristicshypothesized to facilitate or impede gene flow. Hanks and Hooten (2013) parame-terized the edge weights in an ICAR spatial model based on landscape covariates,and made inference on parameters related to the resistance of different landscapecovariates using a Bayesian approach.

We have not included fixed covariate effects in our latent model (2), rather wehave modeled only population-level allele prevalence (μ�k) and latent spatial ran-dom variation ηs�k . If the loci used in the analysis are from genes influenced byselection for specific landscape features, then a linear predictor x′

sβ�k could be in-cluded in the mean of (2), where xs is a vector containing landscape characteristicsat the sth spatial location, and β�k are allele-specific regression parameters whichprovide indications of the effect of selection on each allele. In the case where theloci are selectively neutral, there is no reason to assume that allele frequency would

Page 18: Latent spatial models and sampling design for landscape ...LANDSCAPE GENETIC SAMPLING DESIGN 1045 of the kth allele at the th locus across the entire population being studied.Large

1058 E. M. HANKS ET AL.

be linked to the landscape features of the spatial locations where the individuals areobserved. Loci could be classified as being selectively neutral if a regression analy-sis reveals no significant relationships between allele prevalence and the landscapecharacteristics in x.

We also assumed in (3) that the latent allelic spatial effects ηla have the samespatial covariance matrix �(θ) for each locus and allele. This implies that the pro-cesses driving spatial variation in allele frequencies are the same for each locus. Ifthe loci in question have similar mutation rates, then this assumption is likely tobe met, as the remaining microevolutionary processes driving spatial variation inallele frequencies (e.g., mating, survival and movement) should be shared acrossloci. If mutation rates are highly variable between loci, then (3) could be general-ized to allow for loci-specific covariance matrices [e.g., η�k ∼ N(0,��)].

While genetic data were collected from 2009 to 2012, we did not include atemporal component in our model. The main assumption implicit in collapsingover time is that the mean lek-specific allele probabilities {ps�k} are constant from2009–2012. One could consider allowing the spatial allele probabilities to varyover time, but because no leks were visited more than once, we proceeded with thissimplifying assumption. Additionally, before the start of the 2013 season, geno-typed feather samples were only available for leks in the southwestern half of thesage-grouse range. When genotype data are available from the northeastern half ofthe range, the analysis could be performed with more complete observations.

Other considerations reflecting our a priori understanding of the metapopulationstructure could also be considered in the sampling design. If population delineationis likely to be small across the range of the species, then regular spatial coveragemight be preferred. If directional climate change or habitat change is shifting thespecies distribution, areas at both the trailing and leading edge may be importantto sample, as selective pressures may be greater in these locations. Even in a stablespecies distribution, edge or peripheral populations tend to have smaller effectivepopulation sizes, and thus are subject to more change due to random processes, po-tentially making them of higher importance for sampling [Schwartz et al. (2003)].Thus, any process that could cause a lek to be small and subject to drift (e.g., closeto energy development or agricultural tillage), or to be large and under unique se-lective pressures, could be a cause for prioritization of that lek in the samplingprocess. Similarly, the large geographic range of sage-grouse means that differentpopulations may be subject to different selective pressures. Areas that are in eco-logically unique habitats may be important to sample to assess if unique geneticadaptations (e.g., local adaptation) have occurred. Cryptic species, subspecies ordistinct populations segments can be found by sampling these areas. The designcriteria proposed in this manuscript focus on minimizing the global uncertainty inour understanding of population genetics. Future work will consider local designcriteria that emphasize understanding changes on the fringes of a population.

The use of a network model assumes that we know the location of all sage-grouse leks across the study region. In many situations, it would be more real-istic to assume that there are subpopulations at many unknown locations across

Page 19: Latent spatial models and sampling design for landscape ...LANDSCAPE GENETIC SAMPLING DESIGN 1045 of the kth allele at the th locus across the entire population being studied.Large

LANDSCAPE GENETIC SAMPLING DESIGN 1059

the study region. In this case, the latent spatial processes could be modeled usinga continuous Gaussian random field instead of the discrete-space network modelused in this paper. Nonstationary spatial covariance could be modeled using spatialdeformation approaches [e.g., Schmidt and O’Hagan (2003)] or convolution-basedcovariance functions [e.g., Calder and Cressie (2007)]. Hierarchical spatial model-ing of landscape genetic data provides new opportunities to provide inference forlandscape effects on gene flow within a formal statistical framework.

Acknowledgments. We appreciate the collaborative effort by 11 states in theWestern Association of State Agencies in contributing sage-grouse data and ex-pertise, and collecting feathers for this study. We are also grateful for the helpfulsuggestions provided by Karen Kafadar, an anonymous Associate Editor and twoanonymous reviewers. Their careful reading of this manuscript strengthened it inmany ways. Any use of trade, firm or product names is for descriptive purposesonly and does not imply endorsement by the U.S. Government.

SUPPLEMENTARY MATERIAL

Full conditional distributions (DOI: 10.1214/16-AOAS929SUPP; .pdf). Herewe provide full conditional distributions for all parameters in the hierarchical sta-tistical model with latent reduced-rank spatial random effects.

REFERENCES

CUSHMAN, S. A. and LANDGUTH, E. L. (2010). Scale dependent inference in landscape genetics.Landsc. Ecol. 25 967–979.

ALBERT, J. H. and CHIB, S. (1993). Bayesian analysis of binary and polychotomous response data.J. Amer. Statist. Assoc. 88 669–679. MR1224394

BERLINER, L. M., WIKLE, C. K. and CRESSIE, N. (2000). Long-lead prediction of Pacific SSTsvia Bayesian dynamic modeling. J. Climate 13 3953–3968.

BESAG, J. (1974). Spatial interaction and the statistical analysis of lattice systems. J. R. Stat. Soc.Ser. B. Stat. Methodol. 36 192–236. With discussion by D. R. Cox, A. G. Hawkes, P. Clifford,P. Whittle, K. Ord, R. Mead, J. M. Hammersley, and M. S. Bartlett and with a reply by the author.MR0373208

BESAG, J. and KOOPERBERG, C. (1995). On conditional and intrinsic autoregressions. Biometrika82 733–746. MR1380811

CALDER, C. A. and CRESSIE, N. (2007). Some topics in convolution-based spatial modeling. InProceedings of the 56th Session of the International Statistics Institute 22–29. Instituto Nacionalde Estatistica, Lisbon, Portugal.

CONNELLY, J. W., HAGEN, C. A. and SCHROEDER, M. A. (2011). Characteristics and dynam-ics of greater sage-grouse populations. In Greater Sage-Grouse: Ecology and Conservation of aLandscape Species and Its Habitats. Studies in Avian Biology 38 53–67. Univ. California Press,Oakland.

CONNELLY, J. W., SCHROEDER, M. A., SANDS, A. R. and BRAUN, C. E. (2000). Guidelines tomanage sage-grouse populations and their habitats. Wildl. Soc. Bull. 28 967–985.

CRESSIE, N. A. C. (1993). Statistics for Spatial Data. Wiley, New York. MR1239641CROOKS, K. R. and SANJAYAN, M. A. (2006). Connectivity Conservation. Cambridge Univ. Press,

Cambridge.

Page 20: Latent spatial models and sampling design for landscape ...LANDSCAPE GENETIC SAMPLING DESIGN 1045 of the kth allele at the th locus across the entire population being studied.Large

1060 E. M. HANKS ET AL.

CUSHMAN, S. A., MCKELVEY, K. S., HAYDEN, J. and SCHWARTZ, M. K. (2006). Gene flow incomplex landscapes: Testing multiple hypotheses with causal modeling. Amer. Nat. 168 486–499.

DALKE, P. D., PYRAH, D. B., STANTON, D. C., CRAWFORD, J. E. and SCHLATTERER, E. F.(1963). Ecology, productivity, and management of sage-grouse in Idaho. J. Wildl. Manag. 27811–841.

DIGGLE, P. J. and RIBEIRO, P. J. JR. (2007). Model-Based Geostatistics. Springer, New York.MR2293378

DURAND, E., JAY, F., GAGGIOTTI, O. E. and FRANÇOIS, O. (2009). Spatial inference of admixtureproportions and secondary contact zones. Mol. Biol. Evol. 26 1963–1973.

FLEGAL, J. M., HARAN, M. and JONES, G. L. (2008). Markov chain Monte Carlo: Can we trustthe third significant figure? Statist. Sci. 23 250–260. MR2516823

GARTON, E. O., CONNELLY, J. W., HORNE, J. S., HAGEN, C. A., MOSER, A. andSCHROEDER, M. A. (2011). Greater sage-grouse population dynamics and probability of per-sistence. In Greater Sage-Grouse: Ecology and Conservation of a Landscape Species and ItsHabitats. Studies in Avian Biology 38 293–381. Univ. California Press, Oakland.

GENTLE, J. E. (2007). Matrix Algebra: Theory, Computations, and Applications in Statistics.Springer, New York. MR2337395

GENZ, A. and BRETZ, F. (2009). Computation of Multivariate Normal and t Probabilities. LectureNotes in Statistics 195. Springer, Dordrecht. MR2840595

GUILLOT, G., ESTOUP, A., MORTIER, F. and COSSON, J. F. (2005). A spatial statistical model forlandscape genetics. Genetics 170 1261–1280.

GUILLOT, G., LEBLOIS, R., COULON, A. and FRANTZ, A. C. (2009). Statistical methods in spatialgenetics. Mol. Ecol. 18 4734–4756.

HANKS, E. M. and HOOTEN, M. B. (2013). Circuit theory and model-based inference for landscapeconnectivity. J. Amer. Statist. Assoc. 108 22–33. MR3174600

HANKS, E. M., HOOTEN, M. B., KNICK, S. T., OYLER-MCCANCE, S. J., FIKE, J. A.,CROSS, T. B. and SCHWARTZ, M. K. (2016). Supplement to “Latent spatial models and sam-pling design for landscape genetics.” DOI:10.1214/00-AOAS929SUPP.

HARVILLE, D. A. (2008). Matrix Algebra from a Statistician’s Perspective. Springer, New York.HOOTEN, M. B., WIKLE, C. K., SHERIFF, S. L. and RUSHIN, J. W. (2009). Optimal spatio-

temporal hybrid sampling designs for ecological monitoring. J. Veg. Sci. 20 639–649.KLEIN, D. J. and RANDIC, M. (1993). Resistance distance. J. Math. Chem. 12 81–95. Applied graph

theory and discrete mathematics in chemistry (Saskatoon, SK, 1991). MR1219566KNICK, S. T. and HANSER, S. E. (2011). Connecting pattern and process in greater sage-grouse

populations and sagebrush landscapes. In Greater Sage-Grouse: Ecology and Conservation of aLandscape Species and Its Habitats. Studies in Avian Biology 38 383–406. Univ. California Press,Oakland.

MARSAGLIA, G. (1963). Expressing the normal distribution with covariance matrix A + B in termsof one with covariance matrix A. Biometrika 50 535–538. MR0181061

MCKELVEY and SCHWARTZ (2005). dropout: A program to identify problem loci and samples fornoninvasive genetic samples in a capture-mark-recapture framework. Molecular Ecology Notes 5716–718.

MCRAE, B. H. (2006). Isolation by resistance. Evolution 60 1551–1561.MCRAE, B. H. and BEIER, P. (2007). Circuit theory predicts gene flow in plant and animal popula-

tions. Proc. Natl. Acad. Sci. USA 104 19885–19890.MCRAE, B. H., DICKSON, B. G., KEITT, T. H. and SHAH, V. B. (2008). Using circuit theory to

model connectivity in ecology, evolution, and conservation. Ecology 89 2712–2724.PATTERSON, R. L. (1952). The Sage-Grouse in Wyoming. Sage Books, Los Angeles.ROYLE, J. (1998). An algorithm for the construction of spatial coverage designs with implementation

in S-PLUS. Comput. Geosci. 24 479–488.

Page 21: Latent spatial models and sampling design for landscape ...LANDSCAPE GENETIC SAMPLING DESIGN 1045 of the kth allele at the th locus across the entire population being studied.Large

LANDSCAPE GENETIC SAMPLING DESIGN 1061

SCHMIDT, A. M. and O’HAGAN, A. (2003). Bayesian inference for non-stationary spatial co-variance structure via spatial deformations. J. R. Stat. Soc. Ser. B. Stat. Methodol. 65 743–758.MR1998632

SCHROEDER, M. A., ALDRIDGE, C. L., APA, A. D., BOHNE, J. R., BRAUN, C. E., BUN-NELL, S. D., CONNELLY, J. W., DEIBERT, P. A., GARDNER, S. C., HILLIARD, M. A. et al.(2004). Distribution of sage-grouse in North America. Condor 106 363–376.

SCHWARTZ, M. K., MILLS, L. S., ORTEGA, Y., RUGGIERO, L. F. and ALLENDORF, F. W. (2003).Landscape location affects genetic variation of Canada lynx (Lynx canadensis). Mol. Ecol. 121807–1816.

SELANDER, R. K. (1970). Behavior and genetic variation in natural populations. Am. Zool. 10 53–66.

SLATKIN, M. (1987). Gene flow and the geographic structure of natural populations. Science 236787–792.

SMITH, J. T., FLAKE, L. D., HIGGINS, K. F., KOBRIGER, G. D. and HOMER, C. G. (2005). Eval-uating lek occupancy of greater sage-grouse in relation to landscape cultivation in the Dakotas.West. N. Am. Nat. 65 310–320.

SMOUSE, P. E. and PEAKALL, R. (1999). Spatial autocorrelation analysis of individual multialleleand multilocus genetic structure. Heredity 82 561–573.

SOKAL, R. R., ODEN, N. L. and WILSON, C. (1991). Genetic evidence for the spread of agriculturein Europe by demic diffusion. Nature 351 143–145.

SPEAR, S. F., BALKENHOL, N., FORTIN, M. J., MCRAE, B. H. and SCRIBNER, K. (2010). Use ofresistance surfaces for landscape genetic studies: Considerations for parameterization and analy-sis. Mol. Ecol. 19 3576–3591.

SPIEGELHALTER, D. J., BEST, N. G., CARLIN, B. P. and VAN DER LINDE, A. (2002). Bayesianmeasures of model complexity and fit. J. R. Stat. Soc. Ser. B. Stat. Methodol. 64 583–639.MR1979380

TAYLOR, P. D., FAHRIG, L., HENEIN, K. and MERRIAM, G. (1993). Connectivity is a vital elementof landscape structure. Oikos 68 571–573.

WIKLE, C. K. and CRESSIE, N. (1999). A dimension-reduced approach to space-time Kalman fil-tering. Biometrika 86 815–829. MR1741979

WIKLE, C. K. and ROYLE, J. A. (2005). Dynamic design of ecological monitoring networks fornon-Gaussian spatio-temporal data. Environmetrics 16 507–522. MR2147540

WILEY, R. H. (1973). Territoriality and non-random mating in sage-grouse, Centrocercusurophasianus. Anim. Behav. Monogr. 6 87–169.

ZIMMERMAN, D. L. (2006). Optimal network design for spatial prediction, covariance parameterestimation, and empirical prediction. Environmetrics 17 635–652. MR2247174

E. M. HANKS

DEPARTMENT OF STATISTICS

PENNSYLVANIA STATE UNIVERSITY

UNIVERSITY PARK, PENNSYLVANIA 16802USAE-MAIL: [email protected]

M. B. HOOTEN

U.S. GEOLOGICAL SURVEY

COLORADO COOPERATIVE FISH

AND WILDLIFE RESEARCH UNIT

DEPARTMENT OF FISH, WILDLIFE,AND CONSERVATION BIOLOGY

DEPARTMENT OF STATISTICS

COLORADO STATE UNIVERSITY

FORT COLLINS, COLORADO 80523USA

Page 22: Latent spatial models and sampling design for landscape ...LANDSCAPE GENETIC SAMPLING DESIGN 1045 of the kth allele at the th locus across the entire population being studied.Large

1062 E. M. HANKS ET AL.

S. T. KNICK

U.S. GEOLOGICAL SURVEY

FOREST AND RANGELAND ECOSYSTEM

SCIENCE CENTER

BOISE, IDAHO 83706USA

S. J. OYLER-MCCANCE

J. A. FIKE

FORT COLLINS SCIENCE CENTER

2150 CENTRE AVE BLDG CFORT COLLINS, COLORADO 80526USA

T. B. CROSS

M. K. SCHWARTZ

USFS ROCKY MOUNTAIN RESEARCH STATION

NATIONAL GENOMICS CENTER FOR WILDLIFE

AND FISH CONSERVATION

800 E. BECKWITH AVE

MISSOULA, MONTANA 59801USA