
Development of high-density SNP genotyping arrays for white spruce (Picea glauca) and transferability to subtropical and nordic congeners

NATHALIE PAVY,1* FRANCE GAGNON,1* PHILIPPE RIGAULT,2* SYLVIE BLAIS,1 ASTRID DESCHÊNES,1 BRIAN BOYLE,1 BETTY PELGAS,1,3 MARIE DESLAURIERS,1,4 SÉBASTIEN CLÉMENT,1,4 PATRICIA LAVIGNE,1,3 MANUEL LAMOTHE,1,3 JANICE E.K. COOKE,5 JUAN P. JARAMILLO-CORREA,1,6 JEAN BEAULIEU,1,4 NATHALIE ISABEL,1,3 JOHN MACKAY1 and JEAN BOUSQUET1

1Canada Research Chair in Forest and Environmental Genomics, Centre for Forest Research and Institute for Systems and Integrative Biology, Université Laval, Québec, Canada, QC G1V 0A6, 2Gydle Inc., 1363 Avenue Maguire, Québec, Canada, QC G1T 1Z2, 3Natural Resources Canada, Canadian Forest Service, Laurentian Forestry Centre, 1055 Rue du P.E.P.S., C.P. 10380, succ. Sainte-Foy, Québec, Canada, QC G1V 4C7, 4Natural Resources Canada, Canadian Wood Fibre Centre, 1055 Rue du P.E.P.S., C.P. 10380, succ. Sainte-Foy, Québec, Canada, QC G1V 4C7, 5Department of Biological Sciences, CW405 Biological Sciences Building, University of Alberta, Edmonton, Canada, AB T6G 2E9, 6Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México, Apartado Postal 70-275, México, D.F. Mexico

Abstract

High-density SNP genotyping arrays can be designed for any species given sufficient high-quality sequence information. Two high-density SNP arrays relying on the Infinium iSelect technology (Illumina) were designed for use in the conifer white spruce (Picea glauca). One array contained 7338 segregating SNPs representative of 2814 genes of various molecular functional classes, mainly for use in genetic association and population genetics studies. The other contained 9559 segregating SNPs representative of 9543 genes, mainly for use in population genetics, linkage mapping of the genome and genomic prediction. The SNPs assayed were discovered from various sources of gene resequencing data. SNPs predicted from high-quality sequences derived from genomic DNA reached a genotyping success rate of 64.7%. Nonsingleton in silico SNPs (i.e. a sequence polymorphism present in at least two reads) predicted from expressed sequence tags obtained with the Roche 454 technology and the Illumina GAII analyser reached a similar genotyping success rate of 71.6% when the deepest alignment was used and the most favourable SNP probe per gene was selected. A variable proportion of these SNPs was shared by other nordic and subtropical spruce species from North America and Europe. The number of shared SNPs was inversely related to phylogenetic divergence and standing genetic variation in the recipient species, but positively related to allele frequency in P. glauca natural populations. These validated SNP resources should open up new avenues for population genetics and comparative genetic mapping at a genomic scale in spruce species.

Keywords: gene SNPs, genotyping, Infinium SNP array, interspecific divergence, Picea, SNP sharing

Received 1 October 2012; revision received 1 December 2012; accepted 4 December 2012

Introduction

SNPs have become cornerstone markers for a wide variety of genetic applications in model and nonmodel species (Stapley et al. 2010; Ekblom & Galindo 2011) because of their abundance in genomes and their amenability to high-throughput genotyping. EST sequence redundancy and gene resequencing offer efficient ways to identify extensive SNP data sets amenable to genotyping (e.g. Pavy et al. 2006). In conifers, the development of in silico SNP resources was enhanced by several large-scale EST sequencing projects (MacKay & Dean 2011). These SNP resources have enabled the development of genotyping assays used for linkage mapping (e.g. Pavy et al. 2008a; Eckert et al. 2010; Chancerel et al. 2011), quantitative trait loci analyses (e.g. Pelgas et al. 2011) as well as population genetics and genetic association studies (e.g. Namroud et al. 2008; Eckert et al. 2009, 2010; Holliday et al. 2010; Beaulieu et al. 2011; Prunier et al. 2011; Chen et al. 2012). The application of genomic prediction to trees (Grattapaglia & Resende 2011; Kumar et al. 2012; Resende et al. 2012) is also expected to rely heavily on the genotyping of large sets of SNPs (Grattapaglia et al. 2011).

Correspondence: Jean Bousquet, Fax: 418-656-3493; E-mail: [email protected]

*These authors contributed equally to this work.

© 2013 Blackwell Publishing Ltd

Molecular Ecology Resources (2013) doi: 10.1111/1755-0998.12062

We have previously developed several spruce SNP genotyping arrays based on the GoldenGate assay (Illumina) (Fan et al. 2006), each targeting from several hundred to around 1000 SNPs (e.g. Pavy et al. 2008a, 2012a; Beaulieu et al. 2011; Pelgas et al. 2011; Prunier et al. 2011). In the present study, we tested previously genotyped SNPs and many thousands of new ones with the iSelect Infinium genotyping assay (Illumina) (Gunderson 2009), which is more highly multiplexed than the GoldenGate assay. As is typical of the Infinium assay, the first step is a whole-genome amplification that produces a large number of DNA copies. This is favourable for obtaining enough template DNA for species with large genomes, such as spruces (in excess of 10¹⁰ bp, Murray 1998), and for accommodating small tissue samples, such as in nondestructive analysis at the seedling stage. In plants, Infinium iSelect SNP arrays have been designed for a number of species whose genome is completely or almost completely sequenced, such as grape (Myles et al. 2010), maize (Ganal et al. 2011; Mammadov et al. 2011), potato (Hamilton et al. 2011), apple tree (Chagné et al. 2012) and peach tree (Verde et al. 2012). For plant species with unsequenced genomes, the assay has been used for loblolly pine (Eckert et al. 2010), rye (Haseneyer et al. 2011) and sunflower (Bachlava et al. 2012).

Picea is a large circumpolar genus with over 35 species found at northern latitudes or at high altitudes in subtropical areas. Several spruce species have wide ecological and economic relevance in northern countries, with tree improvement and reforestation programmes involving millions of seedlings annually (Bousquet et al. 2007; Mullin et al. 2011). The genus is phylogenetically complex, with deep lineages and closely related species forming hybridizing complexes (Bouillé et al. 2011). Although phylogenetically distant spruce lineages diverged more than 10 Ma, it has been shown that some nuclear gene SNPs are shared between distant lineages (Bouillé & Bousquet 2005; Namroud et al. 2010). As previously shown for PCR primer sequences (Perry & Bousquet 1998), SNP genotyping resources developed for one species may have applications in other species, depending on sequence identity between genomes. Interspecies transferability would facilitate population genetics and comparative genome mapping studies across the genus, especially for species with more limited genomic resources.

In this study, we integrated data from high-throughput sequencing, data mining of gene functions and functional genomic analyses to identify sets of candidate genes for further genetic analyses in spruce. We designed two large-scale SNP genotyping arrays for white spruce [Picea glauca (Moench) Voss], which provided valid genotyping data for 1855 previously published gene SNPs and for 13 533 new SNPs, distributed among 10 296 genes. We also assessed to what extent P. glauca SNPs were shared with other spruce species, including endangered subtropical taxa with limited genomic resources and major nordic species involved in various population and comparative genomics projects.

Methods

SNP discovery and array design

Two iSelect Infinium (Illumina) SNP arrays were designed from P. glauca SNPs. The PgAS1 array was mainly designed for population genetics and genetic association studies, whereas the PgLM3 array was constructed for population genetics, genomic prediction and linkage mapping. SNPs were obtained from three sources: (i) SNPs recovered from previously described GoldenGate (Illumina) arrays (Pavy et al. 2008a, 2012a; Beaulieu et al. 2011; Pelgas et al. 2011); (ii) SNPs identified from the alignment of genomic DNA (gDNA) sequences obtained with Sanger technology, as previously described (Pavy et al. 2008a); and (iii) in silico SNPs predicted from the alignment of next-generation transcript sequences (Rigault et al. 2011). The in silico SNPs assayed on the PgAS1 array were derived from the alignment of 454 GS-FLX and 454 Titanium (Roche) cDNA reads sequenced from 40 P. glauca individuals (reported by Rigault et al. 2011), whereas SNPs assayed on the PgLM3 array were generated from the alignment of cDNA reads from the parents of the mapping population #C94-1-2516 (♀77111 × ♂2388) (Pelgas et al. 2011), sequenced with the 454 GS-FLX, 454 Titanium (Roche) and Genome Analyzer II (Illumina) platforms. These sequence reads are available in the NCBI archive under project number SRP003565. SNP prediction was achieved with proprietary software developed by Gydle (Québec), considering several criteria: depth of alignment, minor allele frequency (MAF), and conformation of the sequence surrounding the SNP, including the absence of repetitive elements and of neighbouring polymorphisms. If a targeted SNP was close to another polymorphic site, the latter was encoded as a degenerate base in the sequence flanking the SNP. Thus, 20.4% of the sequences submitted to Illumina for the PgAS1 array included at least one degenerate base, compared with 6.1% for the PgLM3 array. Because genetic association studies required more SNPs per gene, more SNPs per gene were selected for PgAS1 than for PgLM3, for which a single SNP per gene was sufficient for gene mapping. This design requirement resulted in a less optimal selection of SNPs for PgAS1 than for PgLM3, including a higher frequency of degenerate sites in the flanking sequences of the SNPs. Whenever possible, type II SNPs (one bead) were prioritized over type I SNPs (two beads) (Gunderson 2009). The locus-specific probes were designed and synthesized by Illumina.
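The flanking-sequence convention described above can be illustrated with a short sketch. It assumes a submission format in which the target SNP is written as [A/G] and every other polymorphic site within the flank is replaced by its two-allele IUPAC ambiguity code. The function names, the 60-bp default flank and the input layout are illustrative assumptions; this is not the Gydle software or Illumina's design tool.

```python
from typing import Dict, Tuple

# IUPAC ambiguity codes for the six possible biallelic combinations.
IUPAC: Dict[frozenset, str] = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def build_flanking_sequence(
    seq: str,
    snp_pos: int,
    alleles: Tuple[str, str],
    neighbours: Dict[int, Tuple[str, str]],
    flank: int = 60,  # hypothetical flank length on each side of the SNP
) -> str:
    """Write the target SNP as [X/Y] and nearby polymorphic sites as IUPAC codes."""
    start, end = max(0, snp_pos - flank), min(len(seq), snp_pos + flank + 1)
    out = []
    for i in range(start, end):
        if i == snp_pos:
            out.append(f"[{alleles[0]}/{alleles[1]}]")
        elif i in neighbours:
            out.append(IUPAC[frozenset(neighbours[i])])
        else:
            out.append(seq[i])
    return "".join(out)


def has_degenerate_base(flanking: str) -> bool:
    """True if any base outside the [X/Y] target is not a plain A, C, G or T."""
    left, _, rest = flanking.partition("[")
    _, _, right = rest.partition("]")
    return any(base not in "ACGT" for base in left + right)


if __name__ == "__main__":
    probe = build_flanking_sequence("ACGTACGTAC", 4, ("A", "G"), {6: ("G", "T")}, flank=3)
    print(probe, has_degenerate_base(probe))  # CGT[A/G]CKT True
```

Applied to every submitted sequence, a check such as has_degenerate_base yields the kind of proportion reported above for each array.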

For the PgAS1 array, SNPs were selected in candidate genes putatively involved in a large number of traits, including wood formation, growth and adaptation to biotic and abiotic factors (see Results). A broad list of 3532 potential candidate genes was thus compiled based on transcriptomic studies (Pavy et al. 2008b; El Kayal et al. 2011), functional analyses of transcription factors (Bomal et al. 2008; Bedon et al. 2010; Côté et al. 2010), QTL (Pelgas et al. 2011) and outlier detection studies (Namroud et al. 2008), as well as functional annotations from Arabidopsis genes (e.g. Groover 2005; Demura & Fukuda 2007; Zhang et al. 2011). As linkage disequilibrium decays rapidly within P. glauca genes (Namroud et al. 2010; Pavy et al. 2012b), one SNP was targeted, wherever possible, every 200 bp of cDNA sequence. This design constraint meant that some singleton SNPs (i.e. a variant observed only once at a given position of the read sequence alignment) had to be assayed because of low sequencing depth in some genes, even though lower confidence and success rates were anticipated for singleton SNPs. On average, the alignment depths for in silico PgAS1 SNPs were 19.7 and 7.4 reads for nonsingleton and singleton SNPs, respectively. In total, 14 734 distinct SNPs were submitted to Illumina. In the end, 13 162 SNPs (89.3%) from 3473 candidate genes could be assayed; for the remaining SNPs, the bead chip manufacturing procedures failed.
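One way to picture the "one SNP every 200 bp" rule is a window-based selection along each cDNA contig: within each 200-bp window, keep the candidate judged most reliable, preferring nonsingleton SNPs and then deeper alignments. This is a hypothetical sketch; the paper does not specify how window boundaries or ties were handled, and the CandidateSNP fields are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CandidateSNP:
    position: int      # 0-based position on the cDNA contig
    depth: int         # number of reads covering the site
    minor_reads: int   # reads carrying the less frequent variant

    @property
    def singleton(self) -> bool:
        # A singleton variant is observed in only one read of the alignment.
        return self.minor_reads < 2


def priority(snp: CandidateSNP) -> Tuple[bool, int]:
    # Nonsingleton SNPs rank first, then deeper alignments.
    return (not snp.singleton, snp.depth)


def select_per_window(snps: List[CandidateSNP], window: int = 200) -> List[CandidateSNP]:
    """Keep at most one SNP per `window`-bp window, choosing the highest priority."""
    best: Dict[int, CandidateSNP] = {}
    for snp in snps:
        w = snp.position // window
        if w not in best or priority(snp) > priority(best[w]):
            best[w] = snp
    return [best[w] for w in sorted(best)]


if __name__ == "__main__":
    candidates = [
        CandidateSNP(50, 10, 1),   # singleton
        CandidateSNP(120, 8, 3),   # nonsingleton in the same window
        CandidateSNP(250, 30, 1),  # only candidate in its window (singleton)
        CandidateSNP(610, 5, 2),
    ]
    print([s.position for s in select_per_window(candidates)])  # [120, 250, 610]
```

The singleton at position 250 is kept because its window offers no alternative, which mirrors why some singleton SNPs had to be assayed in genes with low sequencing depth.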

With the PgLM3 array, the major aim was to develop a resource providing large genome coverage for genomic prediction in populations in high linkage disequilibrium, and for the genetic mapping of as many genes as possible with a single SNP per gene locus (except for a few high-priority genes, which received two SNPs). Moreover, to ensure map integration with previous spruce gene linkage maps (Pelgas et al. 2006, 2011; Pavy et al. 2008a, 2012a), SNPs from already positioned genes were included. SNPs successfully genotyped with the PgAS1 array and polymorphic between the parents of the mapping population were also included on PgLM3. Again, because of design constraints, SNPs of lower quality, including singletons, had to be used for genes with low sequencing depth to complete the array. For the PgLM3 array, mean depths of 43.1 and 28.3 reads were reached for nonsingleton and singleton SNPs, respectively. These depths were higher than those obtained when predicting SNPs for the PgAS1 array, which is consistent with the larger number of ESTs used in the alignments for predicting PgLM3 SNPs. Among the 15 660 SNPs submitted to Illumina, 14 139 (90.3%) distinct SNPs from 14 063 genes were successfully manufactured.
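For mapping in an outbred F1 cross, a SNP is informative only if at least one parent is heterozygous; two parents homozygous for different alleles (AA × GG) differ but produce uniformly heterozygous progeny. The sketch below classifies a biallelic SNP from the two parental genotypes under that assumption. It is illustrative only; the functions and their labels are hypothetical and do not represent the authors' pipeline.

```python
from typing import Optional


def is_heterozygous(genotype: str) -> bool:
    """Genotypes are written as two allele letters, e.g. 'AG' or 'CC'."""
    if len(genotype) != 2:
        raise ValueError(f"expected a two-letter genotype, got {genotype!r}")
    return genotype[0] != genotype[1]


def segregating_parent(mother: str, father: str) -> Optional[str]:
    """Return which parent(s) can produce segregation in the F1 progeny.

    None means no segregation is expected (both parents homozygous),
    even when the two homozygous genotypes differ.
    """
    m_het, f_het = is_heterozygous(mother), is_heterozygous(father)
    if m_het and f_het:
        return "both"
    if m_het:
        return "maternal"
    if f_het:
        return "paternal"
    return None


if __name__ == "__main__":
    for mother, father in [("AG", "AA"), ("CC", "CT"), ("AG", "AG"), ("AA", "GG")]:
        print(mother, father, segregating_parent(mother, father))
```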

Gene annotation

All genotyped SNPs reported in the present study are from expressed genes. For diverse types of applications, such as genetic association testing or gene mapping, gene annotations are an essential part of the SNP array description. A previous report on the white spruce transcriptome sequenced to date described its gene content, as well as gene families based on similarities with sequences from the PFAM protein families database (Rigault et al. 2011). The sequences were retrieved from this P. glauca GCAT3.3 assembly (Rigault et al. 2011). To complement their functional annotations, we performed a Blastx search with Blast2GO default parameters (except for an e-value threshold of 10⁻¹⁰) against the nonredundant (nr) protein sequence database and then ran the Gene Ontology mapping step using the plant GO-Slim terms (Conesa et al. 2005). An enrichment analysis of GO terms was conducted using the two-tailed Fisher's exact test implemented in Blast2GO, with an FDR <5%.
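The enrichment test named above compares, for each GO term, how often the term occurs in a test set (for example, the genes on one array) versus a reference set (for example, GCAT3.3), and then controls the false discovery rate across all terms. The sketch below implements a two-tailed Fisher's exact test and the Benjamini-Hochberg adjustment from first principles. It assumes that Blast2GO's FDR follows the standard Benjamini-Hochberg procedure and assumes the 2 × 2 table layout given in the comments, so it is an illustration rather than Blast2GO's code.

```python
from math import comb
from typing import List


def fisher_two_tailed(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout: a = test genes with the term, b = test genes without it,
    c = reference genes with the term, d = reference genes without it.
    The p-value sums the hypergeometric probabilities of all tables with the
    same margins that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Small relative tolerance so tied tables are not lost to rounding error.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    print(round(fisher_two_tailed(3, 1, 1, 3), 4))  # 0.4857
    qvals = benjamini_hochberg([0.01, 0.04, 0.03, 0.2])
    print([round(q, 4) for q in qvals])  # [0.04, 0.0533, 0.0533, 0.2]
```

Terms whose adjusted value falls below 0.05 would pass an FDR <5% cut-off.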

Plant material

For the PgAS1 array, 3670 P. glauca trees from more than 40 natural populations representative of the species range in eastern Canada were genotyped, including two large-scale association genetics populations for growth, adaptation and wood-related traits, as well as the parents of the outbred F1 crosses #C94-1-2516 (♀77111 × ♂2388) and #C96-1-2856 (♀80112 × ♂80109) (Pelgas et al. 2011). Pedigree trees or trees from natural populations of seven other spruce species distributed worldwide were also genotyped: ten individuals of black spruce [P. mariana (Mill.) B.S.P.], including eight trees from distant natural populations in Québec and two pedigree trees; seven pedigree trees of interior spruce (P. glauca × engelmannii) provided by K. Ritland (University of British Columbia, Canada); six pedigree trees of Sitka spruce [P. sitchensis (Bong.) Carrière] provided by S. A'Hara and J. Cottrell (British Forestry Commission, Midlothian, United Kingdom); four pedigree trees of Norway spruce [P. abies (L.) Karst] provided by M. Lascoux (Uppsala University, Sweden) and M. Fladung (Federal Research Centre for Forestry and Forest Products, Institute for Forest Genetics and Forest Tree Breeding, Großhansdorf, Germany); and five individuals from distinct natural populations for each of three Mexican spruce species, namely Mexican spruce (P. mexicana Martínez), Chihuahua spruce (P. chihuahuana Martínez) and Martinez spruce (P. martinezii T.F. Patterson). DNA samples for genotyping were isolated from spruce needles and terminal buds using the NucleoSpin 96 Plant II extraction system of Macherey-Nagel (Bethlehem, Pennsylvania) and the DNeasy 96 Plant Kit of Qiagen (Mississauga, Ontario).

In total, 2236 P. glauca individuals were genotyped with the PgLM3 array, the majority of them (1996) forming the linkage mapping population #C94-1-2516 (Pelgas et al. 2011), whereas the others (240) were from natural populations of eastern Canada. Trees from the seven other spruce species were also included, as described above. Twenty additional P. mariana trees representative of distant natural populations in Québec (for a total of 30), as well as four additional pedigree trees of P. abies provided by P. Ingvarsson (Umeå Plant Science Centre, Sweden), were also analysed.

Positive controls were used for both the PgAS1 and PgLM3 SNP arrays. These were the parents of the mapping population #C94-1-2516, whose genotypes were known for the SNPs recovered from previous GoldenGate arrays designed for gene mapping (Pelgas et al. 2011; Pavy et al. 2012a). Each parent was replicated on every other DNA plate prepared for genotyping. In addition, one sample from a natural population was replicated on each DNA plate. Thus, a total of two positive controls were used for each DNA plate.

Genotyping assays

The SNP genotyping assays were carried out with the team of A. Montpetit at the Genome Quebec Innovation Centre at McGill University (Montréal). A minimum of 80 ng of template gDNA per sample was used. Genotype calling was performed using the GenomeStudio v2010.3 software (Illumina). The criteria for calling SNPs were based on signals detected from all the white spruce samples, as follows: a minimum GenTrain score of 0.15, a minimum call rate of 50% and a GenCall score above 0.05. The minimum GenTrain score of 0.15 is relatively permissive and was adopted because spruce DNA samples had never been handled on this genotyping platform before. Therefore, all SNPs with a GenTrain score below 0.4 were visually inspected and, if necessary, manually curated or rejected, which resulted in the inspection of several hundred SNPs. A few SNPs also assayed on previously designed GoldenGate arrays were removed because of inconsistent clustering, i.e. heterozygous clusters shifted to homozygous ones or vice versa, as described by Mammadov et al. (2011). We also considered only SNPs with a MAF across the P. glauca natural populations above 0.001 for PgAS1 and 0.01 for PgLM3, and exhibiting a minimum of two heterozygous genotypes. Different MAF thresholds were used given the different numbers of trees from natural populations assayed with the two arrays. Furthermore, SNPs with fixed or nearly fixed heterozygosity were discarded, as they may represent paralogous variation. For the PgLM3 array, the segregation pattern across the genetic mapping population was also taken into account.
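The SNP-level filters above can be combined into a single decision function. The sketch below is a hypothetical re-implementation, not GenomeStudio functionality. It assumes genotypes have already been called as 'AA', 'AB' or 'BB', that calls with a GenCall score at or below 0.05 have been set to None (no-call), and that "nearly fixed heterozygosity" means a heterozygote proportion of at least 0.95, a threshold the paper does not state.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SnpCalls:
    gentrain: float                # GenTrain cluster-quality score for the SNP
    calls: List[Optional[str]]     # 'AA', 'AB', 'BB', or None for a no-call


def minor_allele_frequency(called: List[str]) -> float:
    """MAF from called genotypes only (no-calls must already be removed)."""
    n_alleles = 2 * len(called)
    if n_alleles == 0:
        return 0.0
    b = sum(g.count("B") for g in called)
    return min(b, n_alleles - b) / n_alleles


def qc_status(
    snp: SnpCalls,
    min_maf: float,                  # 0.001 for PgAS1, 0.01 for PgLM3 (from the text)
    min_gentrain: float = 0.15,
    min_call_rate: float = 0.5,
    review_below: float = 0.4,       # GenTrain scores below this were inspected visually
    max_het_fraction: float = 0.95,  # hypothetical cut-off for 'nearly fixed' heterozygosity
) -> str:
    """Return 'fail', 'review' (passes, but needs visual inspection) or 'pass'."""
    called = [g for g in snp.calls if g is not None]
    call_rate = len(called) / len(snp.calls) if snp.calls else 0.0
    if snp.gentrain < min_gentrain or call_rate < min_call_rate:
        return "fail"
    n_het = sum(1 for g in called if g == "AB")
    if n_het < 2 or n_het / len(called) >= max_het_fraction:
        return "fail"  # too few heterozygotes, or possible paralogous variation
    if minor_allele_frequency(called) <= min_maf:
        return "fail"
    return "review" if snp.gentrain < review_below else "pass"


if __name__ == "__main__":
    calls = ["AA"] * 6 + ["AB"] * 3 + ["BB"] + [None]
    print(qc_status(SnpCalls(0.62, calls), min_maf=0.01))  # pass
    print(qc_status(SnpCalls(0.30, calls), min_maf=0.01))  # review
```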

Genotyping signals obtained for the other species, used to estimate SNP transferability, were not included in the overall analysis of GenTrain scores and success rates, which was restricted to P. glauca samples. A SNP was declared valid for a congener when its genotyping signals were distributed in more than one P. glauca cluster and its call rate was above 50%. Only SNPs successfully called for P. glauca were considered in this comparative analysis.
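The transferability rule can be written as a small predicate. This sketch assumes the congener's genotypes have been called against the P. glauca cluster positions and are represented as cluster labels, with None for no-calls; the function name is hypothetical.

```python
from typing import List, Optional


def valid_in_congener(calls: List[Optional[str]], min_call_rate: float = 0.5) -> bool:
    """True when the call rate exceeds min_call_rate and the called samples
    fall into more than one P. glauca genotype cluster."""
    if not calls:
        return False
    called = [g for g in calls if g is not None]
    if len(called) / len(calls) <= min_call_rate:
        return False
    return len(set(called)) > 1


if __name__ == "__main__":
    print(valid_in_congener(["AA", "AB", None, "AA", "AB"]))  # True
    print(valid_in_congener(["AA", "AA", "AA", "AA"]))        # False: one cluster only
```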

Results

Efficiency of Infinium arrays

The reproducibility of the Infinium assay, estimated with positive controls on all segregating SNPs, was 99.98% overall (99.97% for PgAS1 and 99.99% for PgLM3). The genotyping of a subset of 240 trees tested with both arrays resulted in a reproducibility rate of 99.82%, based on a set of 1509 segregating SNPs replicated on both arrays. A reproducibility rate of 99.49% was also observed between SNPs successfully genotyped with the Infinium and GoldenGate arrays, based on a set of 1855 segregating SNPs obtained from previous GoldenGate arrays.
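Reproducibility figures of this kind are concordance rates between replicate genotype calls. A minimal sketch, assuming that a comparison is counted only when both replicates were called (the text does not say how no-calls were treated) and that pairs from all replicated samples are pooled before dividing:

```python
from typing import Optional, Sequence


def concordance(rep1: Sequence[Optional[str]], rep2: Sequence[Optional[str]]) -> float:
    """Fraction of identical genotypes among SNPs called in both replicates."""
    if len(rep1) != len(rep2):
        raise ValueError("replicates must cover the same SNPs in the same order")
    pairs = [(a, b) for a, b in zip(rep1, rep2) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no SNP was called in both replicates")
    return sum(a == b for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    first = ["AA", "AB", "BB", None]
    second = ["AA", "AB", "AB", "AA"]
    print(round(concordance(first, second), 4))  # 0.6667
```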

In total, 7338 SNPs distributed over 2814 genes were successfully genotyped with PgAS1 for 3670 P. glauca trees, and 9559 SNPs distributed over 9543 genes were successfully genotyped with PgLM3 for 2236 P. glauca trees (Table 1). Merging the data from both arrays, which had 1509 segregating SNPs in common, resulted in 15 388 unique segregating SNPs over 10 296 genes (Table 2), including 13 533 newly genotyped SNPs and 1855 previously published SNPs. Their annotations are provided in Table S1.

The success rate was defined as the number of segregating SNPs in the assay relative to the number of SNPs assayed. The overall success rates were 55.8% and 67.6% for the PgAS1 and PgLM3 arrays, respectively (Table 1). SNPs previously genotyped successfully with a GoldenGate assay had a success rate of 92.3% with PgAS1 and 95.4% with PgLM3 (Table 1). SNPs genotyped on the PgAS1 array and included on the PgLM3 array were recovered at a rate of 90.7% (Table 1). On PgAS1, SNPs derived from gDNA resequencing were more successfully genotyped than in silico SNPs predicted from EST alignments (Table 1). Among in silico SNPs, nonsingleton SNPs reached success rates of 55.1% for PgAS1 and 71.6% for PgLM3 (Table 1). The success rate on a gene basis was higher for PgAS1 (81.0% of assayed genes) than for PgLM3 (67.9% of assayed genes), because more SNPs were assayed per gene with PgAS1 than with PgLM3. For all segregating SNPs, missing data represented 1.20% of the P. glauca samples for PgAS1 and 1.79% for PgLM3.
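As a worked check of this definition, the success rates in Table 1 can be recomputed from its assayed and segregating counts. The dictionary below simply transcribes Table 1; the function name is illustrative.

```python
from typing import Dict, Tuple

# (assayed SNPs, segregating SNPs) per source, transcribed from Table 1.
TABLE1: Dict[str, Dict[str, Tuple[int, int]]] = {
    "PgAS1": {
        "GoldenGate": (1084, 1000),
        "gDNA resequencing": (544, 352),
        "EST singleton": (1645, 537),
        "EST nonsingleton": (9889, 5449),
    },
    "PgLM3": {
        "GoldenGate": (1430, 1364),
        "PgAS1 array": (1055, 957),
        "EST singleton": (2518, 699),
        "EST nonsingleton": (9136, 6539),
    },
}


def success_rates(sources: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Success rate (%) = segregating SNPs / assayed SNPs, per source and in total."""
    rates = {name: round(100 * seg / assayed, 1) for name, (assayed, seg) in sources.items()}
    total_assayed = sum(assayed for assayed, _ in sources.values())
    total_seg = sum(seg for _, seg in sources.values())
    rates["Total"] = round(100 * total_seg / total_assayed, 1)
    return rates


if __name__ == "__main__":
    for array, sources in TABLE1.items():
        print(array, success_rates(sources))
```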

Successfully genotyped SNPs were grouped according to their minor allele frequency (MAF) in natural populations (Fig. 1). Overall, 20.3% of these SNPs were rare (MAF <0.05) in white spruce natural populations.
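The frequency classes of Fig. 1 are 0.05-wide bins of the MAF. A minimal binning sketch, assuming each class includes its lower bound and excludes its upper bound (the last class also includes 0.50), so that "rare" coincides with the first class:

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, List

# Upper bounds of the first nine classes; the tenth class is 0.45-0.50.
EDGES = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45]
LABELS = [f"{lo:.2f}-{lo + 0.05:.2f}" for lo in [0.0] + EDGES]


def maf_class(maf: float) -> str:
    """Return the 0.05-wide frequency class of a minor allele frequency."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError(f"a minor allele frequency must lie in [0, 0.5], got {maf}")
    # bisect_right on literal edges avoids floating-point errors from maf / 0.05
    return LABELS[bisect_right(EDGES, maf)]


def maf_distribution(mafs: List[float]) -> Dict[str, float]:
    """Percentage of SNPs in each frequency class."""
    counts = Counter(maf_class(m) for m in mafs)
    return {label: 100 * counts[label] / len(mafs) for label in LABELS}


def percent_rare(mafs: List[float], threshold: float = 0.05) -> float:
    """Percentage of SNPs with MAF below the rarity threshold."""
    return 100 * sum(m < threshold for m in mafs) / len(mafs)
```

Applied to the MAFs of all segregating SNPs, percent_rare gives the share of rare SNPs reported above.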

Annotation of the genes carrying segregating SNPs

A large diversity of gene families encompassing many molecular functions and biological processes was successfully genotyped with the SNP arrays (Fig. 2). When considering segregating SNPs, PgAS1 contained 738 nonredundant gene families, representing 27.0% of the 2734 families in the white spruce GCAT3.3 gene catalogue (e-value < 10⁻¹⁰) (Rigault et al. 2011). On PgLM3, 1868 gene families were represented among segregating SNPs, corresponding to 68.3% of the families in the GCAT3.3 gene catalogue. Among the molecular functions at level 3 of the Gene Ontology, eight classes were represented by more than 10 genes on both PgAS1 and PgLM3, and PgLM3 also included 10 genes of the 'carbohydrate binding' class (Fig. 2). Among the biological process classes at level 3 of the GO classification, 23 classes encompassed more than 10 genes on PgAS1 and 25 classes on PgLM3 (Fig. 2). The gene representation of the SNP arrays and GCAT3.3 was further compared for GO terms assigned to 10 or more genes at all levels of classification. As expected, PgLM3 was more representative of GCAT3.3 than PgAS1: 39 GO terms were differentially represented between PgLM3 and GCAT3.3, compared with 56 between PgAS1 and GCAT3.3 (Fisher's test, FDR <5%). Across molecular functions, PgLM3 was highly representative of GCAT3.3, with only four terms being differentially represented, compared with 14 terms between PgAS1 and GCAT3.3 (Fisher's test, FDR <5%). These four terms were protein binding and nucleic acid binding activities, which were overrepresented on PgLM3, and catalytic activity and lipid binding, which were underrepresented.

Numbers of SNPs shared across spruce species

The numbers of shared SNPs were estimated by considering the overall results from both arrays (Tables 2 and 3). Results showed that 11 181 P. glauca SNPs (73.1%) were shared with one or several other spruce species (Fig. 3). The highest rate (64.4%) was observed with P. glauca × engelmannii from western Canada (Table 2). Next, 22.4% of P. glauca SNPs were segregating in P. sitchensis, a species known to hybridize with P. glauca in western Canada. Lower rates of 17.6% and 12.5% were observed, respectively, with the North American P. mariana and the Eurasian P. abies, two species that are reproductively isolated from P. glauca. Contrasting levels of shared SNPs were observed with the three subtropical species from Mexico: 16.7% with P. mexicana, 2.2% with P. chihuahuana and 1.6% with P. martinezii. Except for the comparisons involving the naturally hybridizing interior spruce (64.4%) and P. sitchensis (22.4%), another species from western North America, all other comparisons gave around 10–20% of shared SNPs, apart from those involving P. chihuahuana or P. martinezii, where the proportions of shared SNPs were well below 10%. The lists of SNPs successfully genotyped in each species are provided in Table S2.
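The sharing rates quoted above, and the gene-based rates in Table 2, are simple proportions of the 15 388 segregating P. glauca SNPs and of the 10 296 genes carrying them. The sketch below recomputes them from the Table 2 counts; it is a transcription of the table, not new data, and the function name is illustrative.

```python
from typing import Dict, Tuple

TOTAL_SNPS, TOTAL_GENES = 15_388, 10_296  # P. glauca totals from Table 2

# (segregating SNPs, genes with >= 1 segregating SNP) per congener, from Table 2.
SHARED: Dict[str, Tuple[int, int]] = {
    "P. glauca x engelmannii": (9903, 7126),
    "P. sitchensis": (3451, 2783),
    "P. mariana": (2710, 2354),
    "P. mexicana": (2565, 2166),
    "P. abies": (1922, 1710),
    "P. chihuahuana": (341, 322),
    "P. martinezii": (250, 229),
}


def sharing_rates(snps: int, genes: int) -> Tuple[float, float]:
    """Return (SNP-based, gene-based) sharing rates in percent."""
    return round(100 * snps / TOTAL_SNPS, 1), round(100 * genes / TOTAL_GENES, 1)


if __name__ == "__main__":
    for species, (snps, genes) in SHARED.items():
        snp_rate, gene_rate = sharing_rates(snps, genes)
        print(f"{species:<25} {snp_rate:5.1f}% of SNPs  {gene_rate:5.1f}% of genes")
```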

Table 1 Genotyping success of the Picea glauca genotyping arrays according to SNP sources

| SNP array | Source of SNPs | Number of assayed SNPs | Number of segregating SNPs (success rate in %) |
|---|---|---|---|
| PgAS1 | Recovered from GoldenGate arrays | 1084 | 1000 (92.3) |
| | Genomic DNA resequencing | 544 | 352 (64.7) |
| | EST alignments: singleton SNPs | 1645 | 537 (32.6) |
| | EST alignments: nonsingleton SNPs | 9889 | 5449 (55.1) |
| | Total | 13 162 | 7338 (55.8) |
| | Number of represented genes | 3473 genes | 2814 genes (81.0) |
| PgLM3 | Recovered from GoldenGate arrays | 1430 | 1364 (95.4) |
| | Recovered from PgAS1 array | 1055 | 957 (90.7) |
| | EST alignments: singleton SNPs | 2518 | 699 (27.8) |
| | EST alignments: nonsingleton SNPs | 9136 | 6539 (71.6) |
| | Total | 14 139 | 9559 (67.6) |
| | Number of represented genes | 14 063 genes | 9543 genes (67.9) |

A large part of the P. glauca shared SNPs were shared with a single other species (34.4%), or with two (20.7%) or three (11.0%) other species (Fig. 3). The SNPs shared by two or more taxa also had the highest MAF within the P. glauca natural populations (Fig. 3). In contrast, the 4208 P. glauca SNPs not shared with any other species tested had a lower MAF in natural populations of P. glauca (average MAF = 0.081) than the 11 181 shared SNPs (average MAF = 0.237) (Fig. 3). Although in some cases the numbers of shared SNPs were low, they still represent hundreds of validated SNPs usable in some of these species that lack any genomic resources.
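The comparison between shared and unshared SNPs can be expressed as two summaries: how many congeners share each SNP, and the mean P. glauca MAF of shared versus unshared SNPs. The sketch below assumes a per-SNP set of congeners in which the SNP was declared valid; the SNP identifiers and values in the example are made up for illustration and are not data from this study.

```python
from collections import Counter
from statistics import mean
from typing import Dict, Optional, Set, Tuple


def sharing_summary(
    maf: Dict[str, float],              # P. glauca MAF per SNP identifier
    shared_with: Dict[str, Set[str]],   # congeners in which each SNP is valid
) -> Tuple[Counter, Optional[float], Optional[float]]:
    """Return (number of SNPs per number of sharing species,
    mean MAF of shared SNPs, mean MAF of unshared SNPs)."""
    n_species = Counter(len(shared_with.get(snp, set())) for snp in maf)
    shared = [f for snp, f in maf.items() if shared_with.get(snp)]
    private = [f for snp, f in maf.items() if not shared_with.get(snp)]
    return (
        n_species,
        mean(shared) if shared else None,
        mean(private) if private else None,
    )


if __name__ == "__main__":
    # Toy values only.
    toy_maf = {"snp1": 0.30, "snp2": 0.10, "snp3": 0.02, "snp4": 0.40}
    toy_sharing = {"snp1": {"P. mariana", "P. abies"}, "snp4": {"P. sitchensis"}}
    print(sharing_summary(toy_maf, toy_sharing))
```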

Discussion

Large-scale SNP genotyping in spruces

The genomic resources presented and evaluated herein represent an unprecedented effort to deploy high-throughput SNP genotyping in conifers. They are the outcome of years of effort involving sequencing projects and the development of genomics tools including gene

Table 2 Numbers of segregating SNPs from Picea glauca genotyping arrays present in other spruce species

| Species | PgAS1: number of segregating SNPs | PgAS1: number of tested individuals | PgLM3: number of segregating SNPs | PgLM3: number of tested individuals | PgAS1 + PgLM3*: number of segregating SNPs (sharing rate in %) | PgAS1 + PgLM3*: number of tested individuals | Total number of genes carrying at least one segregating SNP (sharing rate in %)† |
|---|---|---|---|---|---|---|---|
| P. glauca (white spruce) | 7338 | 3670 | 9559 | 2236 | 15 388 | 5666 | 10 296 |
| P. glauca x engelmannii (interior spruce) | 4881 | 7 | 6181 | 7 | 9903 (64.4) | 7 | 7126 (69.2) |
| P. sitchensis (Sitka spruce) | 1815 | 6 | 2029 | 6 | 3451 (22.4) | 6 | 2783 (27.0) |
| P. mariana (black spruce) | 1162 | 10 | 1786 | 30 | 2710 (17.6) | 30 | 2354 (22.9) |
| P. mexicana (Mexican spruce) | 1231 | 5 | 1600 | 5 | 2565 (16.7) | 5 | 2166 (21.0) |
| P. abies (Norway spruce) | 891 | 4 | 1216 | 8 | 1922 (12.5) | 8 | 1710 (16.6) |
| P. chihuahuana (Chihuahua spruce) | 163 | 5 | 210 | 5 | 341 (2.2) | 5 | 322 (3.1) |
| P. martinezii (Martinez spruce) | 154 | 5 | 120 | 5 | 250 (1.6) | 5 | 229 (2.2) |

*Combinations of trees and SNPs present on both genotyping arrays were counted only once.
†Sharing rates estimated on a gene locus basis for potential linkage mapping and genomic prediction applications in populations in linkage disequilibrium.
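Footnote † of Table 2 states that gene-level sharing rates were estimated on a gene-locus basis. We read this as: a gene counts as shared when at least one of its SNPs segregates in the other species (for interior spruce, 7126 of 10 296 genes, i.e. 69.2%). A minimal Python sketch under that reading, assuming a hypothetical mapping from SNP identifiers to gene identifiers:

```python
from collections.abc import Iterable, Mapping


def gene_sharing_rate(snp_to_gene: Mapping[str, str],
                      segregating_elsewhere: Iterable[str]) -> float:
    """Percentage of reference genes carrying at least one SNP that also
    segregates in the other species (gene-locus basis)."""
    reference_genes = set(snp_to_gene.values())
    shared_genes = {snp_to_gene[s] for s in segregating_elsewhere if s in snp_to_gene}
    return 100 * len(shared_genes) / len(reference_genes)


# Hypothetical example: five SNPs in four genes; three SNPs segregate elsewhere,
# two of them in the same gene, so only two genes count as shared.
snp_to_gene = {"s1": "g1", "s2": "g1", "s3": "g2", "s4": "g3", "s5": "g4"}
print(gene_sharing_rate(snp_to_gene, ["s1", "s2", "s4"]))  # 50.0
```

Because several SNPs of one gene collapse into a single shared gene, gene-level rates are higher than SNP-level rates, as seen in every row of Table 2.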

[Figure 1: histogram of numbers of segregating SNPs (y-axis) per minor allele frequency class (x-axis, 0.00–0.05 to 0.45–0.50 in steps of 0.05), shown for PgAS1, PgLM3 and both arrays; overall percentages per class: 20.3, 12.6, 11.5, 9.6, 8.8, 8.0, 7.9, 7.3, 6.8 and 7.2%.]

Fig. 1 Distribution of minor allele frequencies in natural populations of Picea glauca in eastern Canada as tested with 3670 trees genotyped with the PgAS1 SNP array and 240 trees genotyped with the PgLM3 SNP array. The overall percentage of segregating SNPs in each frequency class is indicated on top of histograms.


[Figure 2: horizontal bar charts of numbers of genes per GO term; (A) PgAS1 and (B) PgLM3, biological processes; (C) PgAS1 and (D) PgLM3, molecular functions.]

Fig. 2 Biological processes (A–B) and molecular functions (C–D) assigned at level 3 of the GO classification for genes carrying segregating SNPs in Picea glauca genotyping arrays. Only GO terms assigned to 10 or more genes successfully genotyped on the PgAS1 (A, C) and PgLM3 (B, D) SNP arrays are represented. Bars represent the numbers of genes found for each GO term, with proportions of total annotations in parentheses.

Table 3 Numbers and rates (in parentheses, in %) of shared SNPs between spruce species based on a sample of 15 388 nonredundant segregating SNPs from the Picea glauca PgAS1 and PgLM3 genotyping arrays. Rates of shared SNPs were estimated relative to the total number of segregating nonredundant SNPs for each pairwise comparison. There were 7338 P. glauca segregating SNPs from the PgAS1 array and 9559 segregating SNPs from the PgLM3 array, while 1509 segregating SNPs were common to both arrays.

| Species | P. glauca (white spruce) | P. glauca x engelmannii (interior spruce) | P. sitchensis (Sitka spruce) | P. mariana (black spruce) | P. mexicana (Mexican spruce) | P. abies (Norway spruce) | P. chihuahuana (Chihuahua spruce) | P. martinezii (Martinez spruce) |
|---|---|---|---|---|---|---|---|---|
| P. glauca | 15 388 (100) | 9904 (64.4) | 3452 (22.4) | 2710 (17.6) | 2565 (16.7) | 1922 (12.5) | 341 (2.2) | 250 (1.6) |
| P. glauca x engelmannii | | | 3053 (29.6) | 2112 (20.1) | 2272 (22.3) | 1604 (15.7) | 271 (2.7) | 190 (1.9) |
| P. sitchensis | | | | 1114 (22.1) | 1157 (23.8) | 864 (19.2) | 177 (4.9) | 110 (3.1) |
| P. mariana | | | | | 790 (17.6) | 789 (20.5) | 187 (6.5) | 133 (4.7) |
| P. mexicana | | | | | | 605 (15.6) | 141 (5.1) | 79 (2.9) |
| P. abies | | | | | | | 119 (5.6) | 86 (4.1) |
| P. chihuahuana | | | | | | | | 33 (5.9) |
| Number of genotyped individuals | 5666 | 7 | 6 | 30 | 5 | 8 | 5 | 5 |
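The Table 3 caption defines each rate relative to the total number of segregating nonredundant SNPs in a pairwise comparison. Back-calculating from the counts in Tables 2 and 3 is consistent with that total being the union of SNPs segregating in either species, i.e. a Jaccard-type index: for interior vs. Sitka spruce, 3053 / (9903 + 3451 - 3053) ≈ 29.6%. This is our inference from the tabulated numbers, not a formula stated by the authors. A short Python sketch:

```python
def sharing_rate(n_a: int, n_b: int, n_shared: int) -> float:
    """Shared SNPs as a percentage of the union of SNPs segregating in
    species A or species B (Jaccard index x 100)."""
    union = n_a + n_b - n_shared
    if union <= 0:
        raise ValueError("empty union")
    return 100 * n_shared / union


def sharing_rate_sets(a: set[str], b: set[str]) -> float:
    """Same quantity computed from explicit sets of SNP identifiers."""
    return sharing_rate(len(a), len(b), len(a & b))


# Counts taken from Tables 2 and 3.
print(round(sharing_rate(9903, 3451, 3053), 1))  # 29.6 (interior vs Sitka spruce)
print(round(sharing_rate(341, 250, 33), 1))      # 5.9 (Chihuahua vs Martinez spruce)
```

For the P. glauca row the other species' SNPs are a subset of the 15 388 reference SNPs, so the union is simply 15 388 and the formula reduces to shared / 15 388.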


catalogues in spruces. The present SNP arrays are publicly available and should provide substantial resources and knowledge for population genetics, association genetics, genomic prediction and genome linkage mapping in P. glauca, in addition to information for designing future genotyping arrays in this species and other spruce taxa. The use of these SNPs is not tied to Illumina technologies, because the information deposited in the dbSNP database makes it possible to design any type of genotyping assay. For genetic association testing, the PgAS1 array represents 7-fold and 5-fold increases in SNP and gene coverage, respectively, compared with previous GoldenGate arrays (Beaulieu et al. 2011; Namroud et al. 2012). For gene linkage mapping, the PgLM3 array represents a 5-fold increase in gene coverage compared with the GoldenGate arrays used to map 1800 genes on the white spruce genome (Pavy et al. 2012a), which should help to map genes and to pinpoint those involved in quantitative trait loci more precisely (Pelgas et al. 2011). Together, the arrays represent a nearly 30-fold increase in SNP coverage compared with the first-generation GoldenGate arrays used to identify outlier polymorphisms related to adaptation in natural spruce populations (Namroud et al. 2008; Prunier et al. 2011). A marginally higher proportion of rare SNPs (21% vs. 18% for SNPs with MAF < 0.05) was noted for the PgAS1 array compared with PgLM3, consistent with the nearly 20 times larger sample of trees from natural populations genotyped with PgAS1.

Performance of the assays

The highest genotyping success rates (above 90%) were obtained for SNPs previously genotyped with GoldenGate arrays and for SNPs that had been successfully genotyped on the Infinium PgAS1 array and were then included in PgLM3. Similar success rates for such previously validated SNPs were reported in apple (Chagné et al. 2012) and maize (Mammadov et al. 2011). In maize, 89% of GoldenGate SNPs segregated with the Infinium assay; the failures were attributed to differences in the position and length of SNP probes between the two assays (Mammadov et al. 2011).

Among the sources of SNPs tested, those derived from Sanger resequencing of gDNA were quite successful (64.7% on PgAS1). Marginally higher success rates were observed for previously described genotyping arrays relying on SNPs newly discovered by similar resequencing of genomic DNA in spruces (Pavy et al. 2008a; Beaulieu et al. 2011; Pelgas et al. 2011) and pines (Lepoittevin et al. 2010; Chancerel et al. 2011). However, this sequencing approach is time-consuming and labour-intensive, even though various DNA pooling schemes can be used to reduce the sequencing effort (Pelgas et al. 2004).

For in silico nonsingleton SNPs, the success rate was highest on the PgLM3 array (71.6%), marginally higher than that observed for SNPs discovered by resequencing gDNA for the PgAS1 array. Among the in silico SNPs predicted from next-generation sequences, singleton SNPs, which were assayed only because of chip design constraints, had the lowest genotyping success rates on both arrays. For these SNPs, the lower coverage and lack of repeatability in the alignment of transcript sequences were by far the main causes of failure.
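The success rates discussed above are ratios of segregating to attempted SNPs. The Python sketch below recomputes them from the SNP-source rows listed at the start of this section, assuming (as the totals suggest) that the first numeric column gives attempted SNPs and the second gives segregating SNPs, with the success rate in parentheses. The dictionary labels are ours.

```python
# Attempted and segregating SNP counts per source, as listed in the table rows
# at the start of this section (column meanings assumed from the totals).
SOURCES = {
    "GoldenGate arrays": (1430, 1364),
    "PgAS1 array": (1055, 957),
    "EST singleton": (2518, 699),
    "EST nonsingleton": (9136, 6539),
}


def success_rate(attempted: int, segregating: int) -> float:
    """Genotyping success rate: segregating SNPs as a percentage of attempted SNPs."""
    return 100 * segregating / attempted


for source, (attempted, segregating) in SOURCES.items():
    print(f"{source:18s} {success_rate(attempted, segregating):5.1f}%")

total_attempted = sum(a for a, _ in SOURCES.values())
total_segregating = sum(s for _, s in SOURCES.values())
print(f"{'Total':18s} {success_rate(total_attempted, total_segregating):5.1f}%")
```

This prints 95.4, 90.7, 27.8, 71.6 and 67.6%, matching the percentages in parentheses, and makes the gap between singleton and nonsingleton in silico SNPs explicit.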

Another probable cause of assay failure is the design of long (50-mer) oligonucleotide probes from transcript sequences, in which intron positions are unknown, so that a probe may span an intron in the genomic DNA. This effect would be more severe when multiple SNPs from the same gene had to be tested because of chip design constraints, as for the PgAS1 array. Indeed, PgAS1 was designed primarily for genetic association testing of candidate genes, and its success rate for nonsingleton SNPs from transcript sequence alignments was noticeably lower than that of PgLM3, whose design required the genotyping of only one SNP per gene. The extent of this potential problem will be better understood once genomic sequences including introns can be examined. Such sequences are currently being obtained by sequencing the white spruce genome (www.smartforests.ca/en-ca/home.aspx).
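Checking whether a 50-mer probe designed on a transcript straddles an intron requires exon boundaries in transcript coordinates, for example from aligning the transcript to genomic sequence once that is available. The Python sketch below assumes an Infinium II-style layout in which the probe's 3' end abuts the queried SNP, so that on the transcript it occupies either the 50 bases immediately before the SNP or, when designed on the other strand, the 50 bases immediately after it. The junction list is a hypothetical input; this is not the design procedure used for PgAS1 or PgLM3.

```python
from bisect import bisect_right


def probe_interval(snp_pos: int, probe_len: int = 50,
                   upstream: bool = True) -> tuple[int, int]:
    """Half-open transcript interval [start, end) covered by a probe whose
    3' end abuts the SNP: the bases just before it (upstream=True) or just
    after it (upstream=False, i.e. a probe designed on the other strand)."""
    if upstream:
        return snp_pos - probe_len, snp_pos
    return snp_pos + 1, snp_pos + 1 + probe_len


def spans_junction(interval: tuple[int, int], junctions: list[int]) -> bool:
    """True if an exon-exon junction falls strictly inside the interval.

    `junctions` are sorted 0-based transcript positions at which a new exon
    begins, so a junction at j separates bases j-1 and j.
    """
    start, end = interval
    i = bisect_right(junctions, start)  # first junction located after `start`
    return i < len(junctions) and junctions[i] < end


junctions = [100, 300]  # hypothetical exon starts on one transcript
print(spans_junction(probe_interval(120), junctions))                 # True
print(spans_junction(probe_interval(200), junctions))                 # False
print(spans_junction(probe_interval(60, upstream=False), junctions))  # True
```

The sketch does not check whether the interval runs off either end of the transcript; a real design step would also need that test.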

[Figure 3: histogram of numbers of segregating SNPs (y-axis) by number of taxa sharing P. glauca SNPs (x-axis, 0* to 7), with minor allele frequency in P. glauca populations on a secondary axis; percentages of SNPs per class: 27.35% (0*), 34.44% (1), 20.72% (2), 10.96% (3), 4.40% (4), 1.75% (5), 0.35% (6) and 0.02% (7).]

Fig. 3 Transferability of Picea glauca SNPs across spruce species according to their minor allele frequencies (MAF) as estimated in natural populations of P. glauca in eastern Canada. The percentages of SNPs are indicated on top of histograms. The category named 0* includes SNPs segregating only in P. glauca.

The other ‘usual culprits’ of failure in genome-wide arrays (repeats, low-complexity sequences, locally dense SNPs) probably played a minor role in the success rates obtained. Both SNP arrays were designed mostly from exon sequences, which are largely high-complexity, unique sequences. In addition, the reference exon sequences were of high quality because they were derived from a reliable Sanger-based catalogue of cDNAs (Rigault et al. 2011). SNPs in flanking sequences were taken into account to the extent that they had been detected, given the low coverage of some genes and the lack of intron sequences in such a catalogue. Flanking SNPs are also much less of a problem for Infinium assays, which can tolerate several mismatches in a 50-mer probe, than for the GoldenGate assay.
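Each known SNP inside a probe footprint becomes a mismatch for one of its alleles. The Python sketch below lists known SNPs within the 50 bases immediately 5' of a target SNP (the same assumed Infinium II-type layout, one strand only) and applies a mismatch ceiling. The ceiling of two is a hypothetical parameter, not a documented Infinium tolerance, and the positions are toy data.

```python
def flanking_snps_in_probe(target: int, known_snps: list[int],
                           probe_len: int = 50) -> list[int]:
    """Known SNP positions falling in the `probe_len` bases immediately before
    the target SNP, i.e. positions that would create probe mismatches."""
    start = target - probe_len
    return sorted(p for p in known_snps if start <= p < target)


def probe_acceptable(target: int, known_snps: list[int],
                     max_mismatches: int = 2, probe_len: int = 50) -> bool:
    """Hypothetical design rule: keep the probe if no more than
    `max_mismatches` known flanking SNPs fall inside its footprint."""
    return len(flanking_snps_in_probe(target, known_snps, probe_len)) <= max_mismatches


known = [40, 50, 75, 99, 100, 130]  # toy SNP positions along one transcript
print(flanking_snps_in_probe(100, known))              # [50, 75, 99]
print(probe_acceptable(100, known))                    # False: 3 > 2
print(probe_acceptable(100, known, max_mismatches=3))  # True
```

Such a filter only sees flanking SNPs that were actually detected, which is the limitation noted above for genes with low coverage.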

In spite of their lower success rate, singleton SNPs provided hundreds of segregating markers that will be useful for genetic analyses. A similarly low genotyping success rate (33.3%) for in silico singleton SNPs was reported for a GoldenGate array in catfish (Wang et al. 2008). In other species without an available genome sequence, diverse success rates have been reported for in silico SNPs tested on Infinium arrays. In sunflower, 84.1% of the attempted nonsingleton in silico SNPs were recovered, 80.6% of which were segregating (Bachlava et al. 2012). This high success rate was attributed to stringent SNP selection criteria, such as sequence depth and redundancy among genotypes (Bachlava et al. 2012). In a rye SNP array that included only nonsingleton in silico SNPs, 58.2% of the SNPs turned out to be valid (Haseneyer et al. 2011). This rate is in the same range as that observed in the present study for the PgAS1 array, although lower than that observed for PgLM3.

Overall, the PgLM3 array performed better than PgAS1. PgLM3 was based on SNPs predicted from deeper alignments, with more reads covering each position and therefore more reliable SNP calling. The mean depths at the predicted nonsingleton SNP sites were 19.7 and 43.1 reads for PgAS1 and PgLM3, respectively (see Methods). Furthermore, because the chip design constraints for PgLM3 were less stringent, with only one SNP selected per gene, the highest-quality SNP of each gene could be retained. Our general findings and those of other studies indicate that singleton SNPs should be avoided as much as possible, as they are likely to represent sequencing errors (Pavy et al. 2006; Appleby et al. 2009; Metzker 2009; van Oeveren & Janssen 2009). Sequence assembly and SNP probe design also represent significant challenges because the spruce genome has not yet been sequenced and contains large gene families (e.g. Bedon et al. 2010). Our main goal was not to study the effect of in silico prediction parameters on the genotyping success rate, but to obtain a large validated SNP resource enabling future genetic investigations at the spruce transcriptome level; this objective was well achieved.
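Both the singleton/nonsingleton distinction and read depth enter SNP calling from transcript alignments. Assuming that a singleton SNP is one whose minor allele is observed in only one sequence of the alignment, the Python sketch below classifies a single alignment column. The minimum depth of 10 reads is a hypothetical threshold, chosen only to sit below the mean depths of 19.7 and 43.1 reads reported above; it is not the study's parameter.

```python
from collections import Counter


def classify_site(bases: str, min_depth: int = 10) -> str:
    """Classify one alignment column given the bases of the reads covering it.

    Gaps and ambiguous calls (anything other than A, C, G, T) are ignored.
    Returns 'low_depth', 'monomorphic', 'multiallelic', 'singleton'
    (minor allele seen in exactly one read) or 'nonsingleton'.
    """
    counts = Counter(b for b in bases.upper() if b in "ACGT")
    depth = sum(counts.values())
    if depth == 0 or depth < min_depth:
        return "low_depth"
    if len(counts) == 1:
        return "monomorphic"
    if len(counts) > 2:
        return "multiallelic"
    minor = min(counts.values())
    return "singleton" if minor == 1 else "nonsingleton"


print(classify_site("A" * 19 + "G"))      # singleton
print(classify_site("A" * 15 + "G" * 5))  # nonsingleton
print(classify_site("AAAG"))              # low_depth
```

At a depth of 20 a single discordant read is indistinguishable from a sequencing error, which is why such sites are classified separately rather than treated as SNPs outright.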

SNP annotation

Both arrays described in this report contained a higher proportion of genes carrying segregating SNPs that could be annotated, either by similarity to known protein families or with the GO classification, than the overall white spruce gene catalogue (Table 2). This higher representation of annotated genes on the genotyping arrays resulted from the criteria used to select candidate genes, particularly for PgAS1 compared with PgLM3 (Table 2). The two arrays were complementary with regard to the representation of some gene categories. For example, genes involved in catalytic activity were overrepresented on PgAS1 and underrepresented on PgLM3. Genes known to be involved in wood formation (cell wall, lignification, secondary metabolism), as well as transcription factors, which are an essential focus of our research on the regulation of wood development (e.g. Bomal et al. 2008; Bedon et al. 2010), were also better represented among the PgAS1 genes.

Numbers of SNPs shared across spruce species

The several thousand P. glauca individuals genotyped in the present study came from the eastern part of the species range in Canada, where there is little or no significant genetic structure among natural populations (Jaramillo-Correa et al. 2001; Namroud et al. 2008, 2010; Beaulieu et al. 2011) and where populations belong to the same geographical lineage (de Lafontaine et al. 2010). Therefore, the P. glauca segregating SNPs identified here should show a very high transfer rate at the intraspecific level when used in other regions or with geographically distinct populations of the eastern lineage. For populations of the western lineage (de Lafontaine et al. 2010), the transfer rate should also be high, exceeding that found for the naturally hybridizing interior spruce species complex, P. glauca x engelmannii (64.4%). The transfer rate across P. glauca populations should also be maximized by avoiding SNPs with a low minor allele frequency (MAF) in natural populations, as shown for the interspecific transfer rates (Fig. 3).
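The recommendation to avoid low-MAF SNPs can be applied as a simple filter before transferring markers to other populations. The Python sketch below assumes per-SNP MAF estimates and a set of SNPs known to segregate elsewhere; the 0.05 cut-off and the toy data are hypothetical.

```python
def select_by_maf(maf: dict[str, float], min_maf: float = 0.05) -> list[str]:
    """SNP identifiers with MAF >= min_maf, most polymorphic first."""
    kept = [snp for snp, f in maf.items() if f >= min_maf]
    return sorted(kept, key=lambda snp: maf[snp], reverse=True)


def shared_fraction(maf: dict[str, float], shared: set[str],
                    lo: float, hi: float) -> float:
    """Fraction of SNPs with lo <= MAF < hi that also segregate elsewhere."""
    ids = [snp for snp, f in maf.items() if lo <= f < hi]
    if not ids:
        return float("nan")
    return sum(snp in shared for snp in ids) / len(ids)


# Toy example: four hypothetical SNPs, two of them shared with another population.
maf = {"s1": 0.02, "s2": 0.04, "s3": 0.21, "s4": 0.35}
shared = {"s3", "s4"}
print(select_by_maf(maf))  # ['s4', 's3']
print(shared_fraction(maf, shared, 0.0, 0.05), shared_fraction(maf, shared, 0.05, 0.51))
```

Comparing `shared_fraction` below and above the cut-off on real data is one way to check, for a new population, whether the MAF trend of Fig. 3 holds before committing to a marker panel.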

Although small panels of individuals were used to assess the numbers of SNPs shared across spruce species, the results should be highly useful, given that the shared SNPs are directly amenable to genotyping on similar platforms with an expected success rate of 90% or more, and that little or no genomic resources exist for some of the species tested. Sets of shared SNPs will be useful not only for population genomics applications but also to accelerate comparative genome mapping studies among spruces. The numbers of shared SNPs varied widely, which could be explained by a number of factors. The highest number was observed with P. glauca x P. engelmannii (interior spruce), a species complex involving hybrids of various compositions in British Columbia. P. engelmannii Parry is also the sister taxon to P. glauca according to both cpDNA and mtDNA phylogenies (Bouillé et al. 2011). The second-highest number of shared SNPs was observed with P. sitchensis, which hybridizes naturally with P. glauca in their large zone of contact in British Columbia. Although the genotyped individuals came from Britain, where the species was introduced more than 100 years ago, the species is native to western North America. Whereas the cpDNA phylogeny places P. sitchensis in a lineage remote from that leading to P. glauca and P. engelmannii, P. sitchensis forms, with P. engelmannii, a close sister group to P. glauca on the mtDNA phylogeny (Bouillé et al. 2011). Thus, the phylogenetic placement of P. sitchensis is uncertain, but overall it is more phylogenetically remote from P. glauca than P. engelmannii is, a pattern in line with the numbers of shared SNPs observed.

For P. mexicana, P. mariana and P. abies, the numbers of shared SNPs were noticeably lower than for the two taxa above. On the mtDNA phylogeny (Bouillé et al. 2011), these species are located well apart from white spruce, with the European P. abies being the most divergent, followed by the North American P. mariana and P. mexicana. On the cpDNA phylogeny (Bouillé et al. 2011), P. mexicana is also located closer to P. glauca than the two other species. Although P. glauca and P. mexicana are allopatric, they can cross readily (Gordon 1982), whereas interspecific crossability among P. glauca, P. mariana and P. abies is low or null (Mikkola 1969; Gordon 1976), a pattern consistent with the larger phylogenetic divergence observed among them (Bouillé et al. 2011). Thus, if phylogeny and crossability are taken as predictors of the number of SNPs shared between P. glauca and its congeners, this number is expected to be higher for P. mexicana than for either P. mariana or P. abies. The number of SNPs shared between P. glauca and P. mexicana was higher than that observed with P. abies, but lower than that observed with P. mariana. This partly unexpected result could be due to marginally reduced genetic diversity in P. mexicana or, more likely, to an inflated number of SNPs shared with P. mariana, given that a larger number of P. mariana individuals had to be genotyped for various purposes. Indeed, when a more restricted number of P. mariana trees was considered, as for the PgAS1 array (Table 2), fewer shared SNPs were observed for P. mariana than for P. mexicana. Hence, for these species, the variation in the numbers of shared SNPs was well in line with expectations from crossability studies and patterns of phylogenetic divergence from P. glauca.
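The argument about P. mariana rests on the fact that the number of SNPs seen segregating in a panel can only grow as trees are added, since a SNP polymorphic in a subset is polymorphic in the whole panel. The Python sketch below illustrates this with simulated genotypes (Hardy-Weinberg proportions, arbitrary allele frequencies) by averaging the number of segregating SNPs over random panels of different sizes. The simulation settings are assumptions and not study data.

```python
import random


def n_segregating(genotypes: list[list[int]], individuals: list[int]) -> int:
    """Count SNPs (rows; genotypes coded 0/1/2) that are polymorphic among
    the selected individuals (column indices)."""
    count = 0
    for row in genotypes:
        calls = {row[i] for i in individuals}
        # Both alleles are present if any heterozygote or both homozygotes occur.
        if 1 in calls or calls == {0, 2}:
            count += 1
    return count


def mean_segregating(genotypes: list[list[int]], n: int,
                     reps: int = 50, seed: int = 1) -> float:
    """Average number of segregating SNPs over `reps` random panels of n trees."""
    rng = random.Random(seed)
    n_trees = len(genotypes[0])
    total = 0
    for _ in range(reps):
        panel = rng.sample(range(n_trees), n)
        total += n_segregating(genotypes, panel)
    return total / reps


# Simulated data: 500 SNPs in Hardy-Weinberg proportions with allele
# frequencies drawn between 0.01 and 0.5, scored in 30 trees.
sim = random.Random(42)
freqs = [sim.uniform(0.01, 0.5) for _ in range(500)]
geno = [[(sim.random() < p) + (sim.random() < p) for _ in range(30)] for p in freqs]
for n in (5, 10, 30):
    print(n, mean_segregating(geno, n))
```

Comparing species on panels of equal size, as done above with the smaller PgAS1 panel for P. mariana, removes this sample-size effect.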

Unexpectedly small numbers of shared SNPs were observed for the Mexican P. chihuahuana and P. martinezii. To our knowledge, no reliable crossability data exist between these species and P. glauca, but their phylogenetic divergence from P. glauca is about the same as that observed between P. mexicana and P. glauca on the mtDNA phylogeny, and in the same range as that observed for P. mariana and P. abies on the cpDNA phylogeny (Bouillé et al. 2011). Thus, based on phylogenetic divergence, a larger number of shared SNPs was anticipated with these species, somewhere between that observed for P. mexicana and that for P. mariana or P. abies. The low numbers observed suggest that factors other than phylogenetic divergence are involved. This trend could be explained in part by lower levels of genome-wide standing genetic variation in P. chihuahuana and P. martinezii than in P. mexicana, consistent with reports of reduced genetic diversity in these endangered mountain species of Mexico (Ledig et al. 1997, 2000; Jaramillo-Correa et al. 2006). Although P. mexicana is also restricted to a small number of populations, it harbours higher observed and expected heterozygosities and more alleles per allozyme locus than the other two Mexican species, together with a significant excess of heterozygotes within populations (Ledig et al. 2002). This pattern suggests that the standing genetic variation in P. mexicana might be maintained in part by selection against inbreds, as observed in other conifers (Isabel et al. 1995). This higher genetic diversity presumably contributed to the larger number of SNPs shared with P. mexicana than with the other two Mexican species.

The present results also indicate that, whereas some gene SNPs might be shared because of a lack of reproductive isolation and interspecific gene flow, as between P. glauca, P. glauca x P. engelmannii (interior spruce) and P. sitchensis, others probably result from shared ancestry between species that diverged many millions of years ago, such as the reproductively isolated P. glauca and P. mariana or P. abies (Bouillé & Bousquet 2005; Namroud et al. 2010). Also, a positive relationship was observed between the minor allele frequency of SNPs in natural populations of P. glauca and the number of other spruce taxa in which these SNPs were shared (Fig. 3). This trend indicates that, on average, frequent alleles were exchanged more often between hybridizing taxa and/or are likely to be of more ancient origin, in some cases dating back millions of years (Bouillé & Bousquet 2005).
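One way to quantify the positive relationship between MAF and the number of taxa sharing a SNP (Fig. 3) is a rank correlation, which copes with the many ties in taxa counts. The authors do not report such a test here; the following Python sketch of Spearman's coefficient (Pearson correlation of average ranks) and its toy data are illustrative only.

```python
def ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values assigned their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation computed on ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("constant input: correlation undefined")
    return cov / (vx * vy) ** 0.5


# Toy data: MAF in P. glauca and number of other taxa sharing each SNP.
maf = [0.01, 0.03, 0.08, 0.15, 0.22, 0.31, 0.42, 0.48]
n_taxa = [0, 0, 1, 1, 2, 2, 4, 5]
print(round(spearman(maf, n_taxa), 3))
```

A rank-based measure is used here because the taxa counts are small integers with many ties, which a linear correlation on raw values would handle less gracefully.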

The value of the SNP resource reported here extends well beyond white spruce. It is validated at the genotyping level and largely transferable to other spruce species for population genetics applications and comparative mapping. These markers are also far from anonymous: they represent a diversity of targeted genes that are well annotated through a comprehensive gene catalogue and are being progressively mapped onto the spruce genome (Pavy et al. 2012a). The complete genomic resource has been made available through the dbSNP database.

Acknowledgements

This work was funded by grants from Genome Quebec and Genome Canada to J. Mackay and J. Bousquet for Arborea II and a pilot genome sequencing project, by an NSERC Discovery grant to J. Bousquet, and by grants from the Genomics R&D Initiative to J. Beaulieu and N. Isabel. We thank Frank Bedon, Claude Bomal, Sébastien Caron, Isabelle Giguère, Vicky Roy (Univ. Laval), Denis Lachance, Caroline Levasseur, Marie-Josée Morency and Armand Séguin (Canadian Forest Service, Québec, Canada) for contributions to the selection of candidate genes. We are also grateful to the collaborators who provided needle specimens or DNA samples from pedigree populations: Kermit Ritland (Univ. of British Columbia, Canada), Stuart A’Hara and Joan Cottrell (British Forestry Commission, Midlothian, United Kingdom), Martin Lascoux (Uppsala Univ., Sweden), Matthias Fladung (Federal Research Centre for Forestry and Forest Products, Institute for Forest Genetics and Forest Tree Breeding, Großhansdorf, Germany) and Pär Ingvarsson (Univ. of Umeå Plant Science Centre, Sweden). The authors wish to acknowledge Sauphie Senneville (Univ. Laval) for logistics and infrastructure support, as well as Daniel Vincent and Alexandre Montpetit of the genotyping platform of the McGill University and Génome Québec Innovation Centre for their excellent work and assistance with conducting the genotyping assays.

References

Appleby N, Edwards D, Batley J (2009) New technologies for ultra-high throughput genotyping in plants. Methods in Molecular Biology, 513, 19–39.

Bachlava E, Taylor CA, Tang S et al. (2012) SNP discovery and development of a high-density genotyping array for sunflower. PLoS ONE, 7, e29814.

Beaulieu J, Doerksen T, Boyle B et al. (2011) Association genetics of wood physical traits in the conifer white spruce and relationships with gene expression. Genetics, 188, 197–214.

Bedon F, Bomal C, Caron S et al. (2010) Subgroup 4 R2R3-MYBs in conifer trees: gene family expansion and contribution to the isoprenoid- and flavonoid-oriented responses. Journal of Experimental Botany, 61, 3847–3864.

Bomal C, Bedon F, Caron S et al. (2008) Involvement of Pinus taeda MYB1 and MYB8 in phenylpropanoid metabolism and secondary cell wall biogenesis: a comparative in planta analysis. Journal of Experimental Botany, 59, 3925–3939.

Bouillé M, Bousquet J (2005) Trans-species shared polymorphisms at orthologous nuclear gene loci among distant species in the conifer Picea (Pinaceae): implications for the long-term maintenance of genetic diversity in trees. American Journal of Botany, 92, 63–73.

Bouillé M, Senneville S, Bousquet J (2011) Discordant mtDNA and cpDNA phylogenies indicate geographic speciation and reticulation as driving factors for the diversification of the genus Picea. Tree Genetics & Genomes, 7, 469–484.

Bousquet J, Isabel N, Pelgas B et al. (2007) Spruce. In: Forest Trees (ed. Kole C), pp. 93–114. Springer-Verlag, Berlin Heidelberg New York.

Chagné D, Crowhurst RN, Troggio M et al. (2012) Genome-wide SNP detection, validation, and development of an 8K SNP array for apple. PLoS ONE, 7, e31745.

Chancerel E, Lepoittevin C, Le Provost G et al. (2011) Development and implementation of a highly-multiplexed SNP array for genetic mapping in maritime pine and comparative mapping with loblolly pine. BMC Genomics, 12, 368.

Chen J, Källman T, Ma X et al. (2012) Disentangling the roles of history and local selection in shaping clinal variation in allele frequencies and gene expression in Norway spruce (Picea abies). Genetics, 191, 865–881.

Conesa A, Gotz S, Garcia-Gomez JM et al. (2005) Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics, 21, 3674–3676.

Côté CL, Boileau F, Roy V et al. (2010) Gene family structure, expression and functional analysis of HD-Zip III genes in angiosperm and gymnosperm forest trees. BMC Plant Biology, 10, 273.

Demura T, Fukuda H (2007) Transcriptional regulation in wood formation. Trends in Plant Science, 12, 64–70.

Eckert AJ, Bower AD, Wegrzyn JL et al. (2009) Association genetics of coastal Douglas fir (Pseudotsuga menziesii var. menziesii, Pinaceae). I. Cold-hardiness related traits. Genetics, 182, 1289–1302.

Eckert AJ, van Heerwaarden J, Wegrzyn JL et al. (2010) Patterns of population structure and environmental associations to aridity across the range of loblolly pine (Pinus taeda L., Pinaceae). Genetics, 185, 969–982.

Ekblom R, Galindo J (2011) Applications of next generation sequencing in molecular ecology of non-model organisms. Heredity, 107, 1–15.

El Kayal W, Allen CCG, Ju CJ-T et al. (2011) Molecular events of apical bud formation in white spruce, Picea glauca. Plant, Cell & Environment, 34, 480–500.

Fan JB, Chee MS, Gunderson KL (2006) Highly parallel genomic assays. Nature Reviews Genetics, 7, 632–644.

Ganal MW, Durstewitz G, Polley A et al. (2011) A large maize (Zea mays L.) SNP genotyping array: development and germplasm genotyping, and genetic mapping to compare with the B73 reference genome. PLoS ONE, 6, e28334.

Gordon AG (1976) The taxonomy and genetics of Picea rubens and its relationship to Picea mariana. Canadian Journal of Botany, 54, 781–813.

Gordon A (1982) Genetics of genecology of spruce, Sault Ste. Marie, Ontario, 1979 and 1980. In: Proceedings of the eighteenth meeting of the Canadian Tree Improvement Association. Part 1 (ed. Yeatman C), pp. 112–115. Canadian Tree Improvement Association, Duncan, British Columbia.

Grattapaglia D, Resende MDV (2011) Genomic selection in forest tree breeding. Tree Genetics & Genomes, 7, 241–255.

Grattapaglia D, Silva-Junior OB, Kirst M et al. (2011) High-throughput SNP genotyping in the highly heterozygous genome of Eucalyptus: assay success, polymorphism and transferability across species. BMC Plant Biology, 11, 65.

Groover AT (2005) What genes make a tree a tree? Trends in Plant Science, 10, 210–214.

Gunderson KL (2009) Whole-genome genotyping on bead arrays. Methods in Molecular Biology, 529, 197–213.

Hamilton JP, Hansey CN, Whitty BR et al. (2011) Single nucleotide polymorphism discovery in elite North American potato germplasm. BMC Genomics, 12, 302.

Haseneyer G, Schmutzer T, Seidel M et al. (2011) From RNA-seq to large-scale genotyping - genomics resources for rye (Secale cereale L.). BMC Plant Biology, 11, 131.

Holliday J, Ritland K, Aitken SN (2010) Widespread, ecologically relevant genetic markers developed from association mapping of climate-related traits in Sitka spruce (Picea sitchensis). New Phytologist, 188, 501–514.

Isabel N, Beaulieu J, Bousquet J (1995) Complete congruence between gene diversity estimates derived from genotypic data at enzyme and RAPD loci in black spruce. Proceedings of the National Academy of Sciences of the USA, 92, 6369–6373.


van Oeveren J, Janssen A (2009) Mining SNPs from DNA sequence data; computational approaches to SNP discovery and analysis. Methods in Molecular Biology, 578, 73–91.

Jaramillo-Correa JP, Beaulieu J, Bousquet J (2001) Contrasting evolutionary forces driving population structure at expressed sequence tag polymorphisms, allozymes and quantitative traits in white spruce. Molecular Ecology, 10, 2729–2740.

Jaramillo-Correa JP, Beaulieu J, Ledig FT et al. (2006) Decoupled mitochondrial and chloroplast DNA population structure reveals Holocene collapse and population isolation in a threatened Mexican-endemic conifer. Molecular Ecology, 15, 2787–2800.

Kumar S, Bink MCAM, Volz RK et al. (2012) Towards genomic selection in apple (Malus × domestica Borkh.) breeding programmes: prospects, challenges and strategies. Tree Genetics & Genomes, 8, 1–14.

de Lafontaine G, Turgeon J, Payette S (2010) Phylogeography of white spruce (Picea glauca) in eastern North America reveals contrasting ecological trajectories. Journal of Biogeography, 37, 741–751.

Ledig FT, Jacob-Cervantes V, Hodgskiss PD et al. (1997) Recent evolution and divergence among populations of a rare Mexican endemic, Chihuahua spruce, following Holocene climatic warming. Evolution, 51, 1815–1827.

Ledig FT, Bermejo-Velázquez B, Hodgskiss PD et al. (2000) The mating system and genic diversity in Martínez spruce, an extremely rare endemic of México’s Sierra Madre Oriental: an example of facultative selfing and survival in interglacial refugia. Canadian Journal of Forest Research, 30, 1156–1164.

Ledig FT, Hodgskiss PD, Jacob-Cervantes V (2002) Genetic diversity, mating system, and conservation of a Mexican subalpine relict, Picea mexicana Martínez. Conservation Genetics, 3, 113–122.

Lepoittevin C, Frigerio JM, Garnier-Géré P et al. (2010) In vitro vs in silico detected SNPs for the development of a genotyping array: what can we learn from a non-model species? PLoS ONE, 5, e11034.

MacKay J, Dean J (2011) Transcriptomics. In: Genetics, Genomics and Breeding of Conifers (eds Plomion C, Bousquet J & Kole C), pp. 323–357. Edenbridge Science Publishers & CRC Press, New York.

Mammadov JA, Chen W, Mingus J et al. (2011) Development of versatile

gene-based SNP assays in maize (Zea mays L.). Molecular Breeding, 29,

779–790.

Metzker ML (2009) Sequencing technologies - the next generation. Nature

Reviews Genetics, 11, 31–46.

Mikkola L (1969) Observations on inter-specific sterility in Picea. Annales Botanici Fennici, 6, 285–339.

Mullin T, Andersson B, Bastien J-C et al. (2011) Economic importance, breeding objectives and achievements. In: Genetics, Genomics and Breeding of Conifers (eds Plomion C, Bousquet J & Kole C), pp. 40–127. Edenbridge Science Publishers & CRC Press, New York.

Murray BG (1998) Nuclear DNA amounts in gymnosperms. Annals of Botany, 82, 3–15.

Myles S, Chia J-M, Hurwitz B et al. (2010) Rapid genomic characterization of the genus Vitis. PLoS ONE, 5, e8219.

Namroud M-C, Beaulieu J, Juge N et al. (2008) Scanning the genome for gene single nucleotide polymorphisms involved in adaptive population differentiation in white spruce. Molecular Ecology, 17, 3599–3613.

Namroud M-C, Guillet-Claude C, Mackay J et al. (2010) Molecular evolution of regulatory genes in spruces from different species and continents: heterogeneous patterns of linkage disequilibrium and selection but correlated recent demographic changes. Journal of Molecular Evolution, 70, 371–386.

Namroud M-C, Bousquet J, Doerksen T et al. (2012) Scanning SNPs from a large set of expressed genes to assess the impact of artificial selection on the undomesticated genetic diversity of white spruce. Evolutionary Applications, 5, 641–656.

Pavy N, Parsons LS, Paule C et al. (2006) Automated SNP detection from a large collection of white spruce expressed sequences: contributing factors and approaches for the categorization of SNPs. BMC Genomics, 7, 174.

Pavy N, Pelgas B, Beauseigle S et al. (2008a) Enhancing genetic mapping of complex genomes through the design of highly-multiplexed SNP arrays: application to the large and unsequenced genomes of white spruce and black spruce. BMC Genomics, 9, 21.

Pavy N, Boyle B, Nelson C et al. (2008b) Identification of conserved core xylem gene sets: conifer cDNA microarray development, transcript profiling and computational analyses. New Phytologist, 180, 766–786.

Pavy N, Pelgas B, Laroche J et al. (2012a) A spruce gene map infers ancient genome reshuffling and subsequent slow evolution in the gymnosperm lineage leading to extant conifers. BMC Biology, 10, 84.

Pavy N, Namroud M-C, Gagnon F et al. (2012b) The heterogeneous levels of linkage disequilibrium in white spruce genes and comparative analysis with other conifers. Heredity, 108, 273–284.

Pelgas B, Isabel N, Bousquet J (2004) Efficient screening for expressed sequence tag polymorphisms (ESTPs) by DNA pool sequencing and denaturing gradient gel electrophoresis (DGGE) in spruces. Molecular Breeding, 13, 263–279.

Pelgas B, Beauseigle S, Acheré V et al. (2006) Comparative genome mapping between Picea glauca, P. abies and P. mariana × rubens, and correspondence with other Pinaceae. Theoretical and Applied Genetics, 113, 1371–1393.

Pelgas B, Bousquet J, Meirmans PG et al. (2011) QTL mapping in white spruce: gene maps and genomic regions underlying adaptive traits across pedigrees, years and environments. BMC Genomics, 12, 145.

Perry DJ, Bousquet J (1998) Sequence-tagged-site (STS) markers of arbitrary genes: the utility of black spruce-derived STS primers in other conifers. Theoretical and Applied Genetics, 97, 735–743.

Prunier J, Laroche J, Beaulieu J et al. (2011) Scanning the genome for gene SNPs related to climate adaptation and estimating selection at the molecular level in boreal black spruce. Molecular Ecology, 20, 1702–1716.

Resende MFR Jr, Muñoz P, Acosta JJ et al. (2012) Accelerating the domestication of trees using genomic selection: accuracy of prediction models across ages and environments. New Phytologist, 193, 617–624.

Rigault P, Boyle B, Lepage P et al. (2011) A white spruce gene catalogue for conifer genome analyses. Plant Physiology, 157, 14–28.

Stapley J, Reger J, Feulner PGD et al. (2010) Adaptation genomics: the next generation. Trends in Ecology & Evolution, 25, 705–712.

Verde I, Bassil N, Scalabrin S et al. (2012) Development and evaluation of a 9K SNP array for peach by internationally coordinated SNP detection and validation in breeding germplasm. PLoS ONE, 7, e35668.

Wang S, Sha Z, Sonstegard TS et al. (2008) Quality assessment parameters for EST-derived SNPs from catfish. BMC Genomics, 9, 450.

Zhang J, Elo A, Helariutta Y (2011) Arabidopsis as a model for wood formation. Current Opinion in Biotechnology, 22, 293–299.

J.P.J.-C., J.Be., N.I. provided plant material; B.B., B.P., M.D., S.C., M.L., J.E.K.C., J.Be., N.I., J.M. selected the genes; F.G., S.B. prepared the samples and coordinated the manufacture of SNP arrays; P.L., S.B. submitted data to dbSNP; P.R. predicted SNPs; P.R. and J.Bo. designed the genotyping assays; N.P., F.G., S.B., J.P.J.-C., A.D., J.Bo. analysed data and prepared the manuscript; N.I., J.Be., J.M., J.Bo. provided funding for the study.

Data Accessibility

Data were deposited in the database of single nucleotide polymorphisms (dbSNP at http://www.ncbi.nlm.nih.gov/SNP/). Bethesda (MD): National Center for Biotechnology Information, National Library of Medicine. dbSNP accessions: ss511222299-ss538955466 (dbSNP Build ID: 136). The sequence reads are available in the NCBI archive under the project number SRP003565. Lists of SNPs and their gene annotations are reported in supplementary tables.

Supporting Information

Additional Supporting Information may be found in the online version of this article:

Table S1 Accessions of SNPs and annotation of genes represented on the PgAS1 and PgLM3 genotyping arrays.

Table S2 Lists of SNPs (NCBI assay IDs) successfully genotyped in each spruce species.
