Post on 23-Jan-2016
Association genetics in forest trees
Santiago C. González-Martínez
Center of Forest Research, INIA, PO Box 8111 28080 Madrid, Spain
SNP 1  SNP 2  SNP 3  Trait 1  Trait 2
AT     GT     CC     1        3
AT     GT     GG     10       4
TT     TT     GG     10       7
AA     GG     CG     5        1
AA     GG     CG     5        3
AT     TT     CG     5        5
AA     GT     CC     1        2
TT     GT     GG     10       8
TT     TT     GG     10       7
TT     TT     CC     1        10
AT     GG     CG     5        6
AA     GT     CG     5        4
AA     TT     CG     5        1
TT     TT     GG     10       9
AT     GG     CC     1        6
AT     GT     CG     5        4
AT     GG     CC     1        5
AA     GG     CC     1        2
AA     GG     GG     10       1
TT     GT     GG     10       8

Genotype frequencies
SNP 1: f(AA)=0.35, f(AT)=0.35, f(TT)=0.30
SNP 2: f(GG)=0.35, f(GT)=0.35, f(TT)=0.30
SNP 3: f(CC)=0.30, f(CG)=0.35, f(GG)=0.35

Trait 1 means by genotype
SNP 1: u(AA)=32/7=4.57, u(AT)=28/7=4.00, u(TT)=51/6=8.50
SNP 2: u(GG)=28/7=4.00, u(GT)=42/7=6.00, u(TT)=41/6=6.83
SNP 3: u(CC)=6/6=1.00, u(CG)=35/7=5.00, u(GG)=70/7=10.00

Trait 2 means by genotype
SNP 1: u(AA)=14/7=2.00, u(AT)=33/7=4.71, u(TT)=49/6=8.17
SNP 2: u(GG)=24/7=3.43, u(GT)=33/7=4.71, u(TT)=39/6=6.50
SNP 3: u(CC)=28/6=4.67, u(CG)=24/7=3.43, u(GG)=44/7=6.29
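The frequencies and genotype means of this worked example can be recomputed with a few lines of Python. This is a minimal sketch: the function name `genotype_summary` and the tuple layout of `ROWS` are illustrative choices, not part of the original slides.

```python
from collections import defaultdict

# The 20 individuals of the worked example:
# (SNP 1, SNP 2, SNP 3, Trait 1, Trait 2)
ROWS = [
    ("AT", "GT", "CC", 1, 3), ("AT", "GT", "GG", 10, 4),
    ("TT", "TT", "GG", 10, 7), ("AA", "GG", "CG", 5, 1),
    ("AA", "GG", "CG", 5, 3), ("AT", "TT", "CG", 5, 5),
    ("AA", "GT", "CC", 1, 2), ("TT", "GT", "GG", 10, 8),
    ("TT", "TT", "GG", 10, 7), ("TT", "TT", "CC", 1, 10),
    ("AT", "GG", "CG", 5, 6), ("AA", "GT", "CG", 5, 4),
    ("AA", "TT", "CG", 5, 1), ("TT", "TT", "GG", 10, 9),
    ("AT", "GG", "CC", 1, 6), ("AT", "GT", "CG", 5, 4),
    ("AT", "GG", "CC", 1, 5), ("AA", "GG", "CC", 1, 2),
    ("AA", "GG", "GG", 10, 1), ("TT", "GT", "GG", 10, 8),
]


def genotype_summary(rows, snp_index, trait_index):
    """Return {genotype: (frequency, mean trait value)} for one SNP and one trait."""
    values = defaultdict(list)
    for row in rows:
        values[row[snp_index]].append(row[trait_index])
    n = len(rows)
    return {g: (len(v) / n, sum(v) / len(v)) for g, v in values.items()}


# SNP 3 (column 2) against Trait 1 (column 3)
summary = genotype_summary(ROWS, snp_index=2, trait_index=3)
for genotype in sorted(summary):
    freq, mean = summary[genotype]
    print(f"SNP 3 {genotype}: f = {freq:.2f}, mean Trait 1 = {mean:.2f}")
```

In this toy data set SNP 3 separates Trait 1 perfectly (each genotype has a single trait value), whereas SNP 1 shows the clearest trend for Trait 2.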
What is association genetics?
Linkage versus Association: finding the molecular variation underlying complex traits
[Figure: a favourable mutation (X) followed over several generations in a mapping pedigree (linkage, LG) and in a natural population (= multiple genetic backgrounds).]
For which organisms is genetic association a promising approach?
• Relatively undomesticated species with outbred mating systems and large natural populations.
• Organisms with long life spans, where generating pedigrees would take several years.
• Organisms (such as humans) where artificial crosses are not possible or are difficult to obtain (incompatible species).
• In plants: the opportunity to test for genetic association with multiple traits and phenotypes in long-term common garden experiments (including clonal tests, which give high precision in the estimation of phenotypes).
The ‘immortal’ association population
Linkage disequilibrium and association
[Figure, panels a) to c): Stumpf & McVean (2003) Nature Reviews Genetics]
Rapid decay of LD in conifers, but LD might be stronger in regions under selection. Example: in maize, LD extends over 800 kb around the Y1 gene (Palaisa et al. 2004), although maize in general also shows a rapid decay of LD with physical distance (Remington et al. 2001).
[Figure: decay of LD (r2, 0 to 0.5) with distance (base pairs, 0 to 3500). Series: Picea abies all; P. abies without Romania; Baltico-Nordic domain; Alpine domain. Heuertz et al. (2006) Genetics]
Extent of LD and association: higher LD makes it easier to detect associations but more difficult to identify the causal mutations
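The r2 statistic plotted in the LD decay figures is the squared correlation between alleles at two loci. The sketch below computes it from phased two-locus haplotypes. The function name `ld_r2` and the example haplotypes are hypothetical, and the slides do not state which estimator the cited studies used.

```python
def ld_r2(haplotypes):
    """Squared allelic correlation r^2 between two biallelic loci.

    `haplotypes` is a list of 2-character strings, one per phased
    chromosome, e.g. "AG" = allele A at locus 1 and allele G at locus 2.
    """
    n = len(haplotypes)
    # Take the alleles of the first haplotype as the reference alleles.
    a, b = haplotypes[0][0], haplotypes[0][1]
    p_a = sum(h[0] == a for h in haplotypes) / n
    p_b = sum(h[1] == b for h in haplotypes) / n
    p_ab = sum(h == a + b for h in haplotypes) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


# Hypothetical samples: complete LD versus linkage equilibrium
complete_ld = ["AG"] * 5 + ["TC"] * 5
equilibrium = ["AG", "AC", "TG", "TC"]
print(ld_r2(complete_ld), ld_r2(equilibrium))
```

With high r2 a genotyped SNP tags its neighbours well, which is why associations are easier to detect but harder to pin to the causal site.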
Variation among genes
Variation among species
[Figures: LD decay in conifers (r2 versus distance in base pairs, Picea abies) compared with humans. Stumpf & McVean (2003) Nature Reviews Genetics]
Approaches to genetic association in plants
[Figure, based on Yu & Buckler (2006) Current Opinion in Biotechnology: methods (GLM, GC, SA, MLM, TDT, QTDT) arranged by level of population structure and familial relatedness. Labels: natural populations; breeding populations; complex demography; unknown.]
Power considerations: the size of an association population
[Figure: power (0 to 1) versus % variation explained by the QTN (0 to 50), for N=500, N=100 and N=50, in a single random mating population with mutation, random genetic drift, and recombination. Long & Langley (1999) Genome Research]
Hirschhorn & Daly (2005) Nature Reviews Genetics
Increased rate of false positives due to population structure…
Zhao et al. (2007) PLoS Genetics
…but correcting for population structure can produce false negatives!
[Figure, panels a to c: maritime pine haplotypes (Moroccan, Western, Eastern). Labels: multiple glacial refugia; drought cline; postglacial migrations.]
Power considerations: structured populations
[Figure: power versus % variation explained by the QTN, for a small association population of ~100 accessions. Zhao et al. (2007) PLoS Genetics]
Methods for genetic association in forest trees
• Standard general linear models (GLMs), usually with p values computed by permutation.
y_ij = μ + m_i + e_ij, where y_ij is the trait value, μ is a general mean, m_i is the genotype of the i-th SNP and e_ij is the residual.
• Structured Association (Pritchard et al. 2000; Thornsberry et al. 2001) and PCA Association (Price et al. 2006).
Controls for population structure by incorporating a Q matrix. This matrix is an n × p population structure incidence matrix where n is the number of individuals assayed and p is the number of populations defined.
• Mixed Linear Models (MLMs; Yu et al. 2006).
They incorporate a Q matrix (fixed effect) and also a pairwise relatedness matrix (K matrix, a random effect), which accounts for structure within populations.
• Family-based methods (Transmission Disequilibrium Test, TDT or QTDT, and its several extensions).
Parents must be heterozygous to be informative.
Only a few to a moderate number of genetic backgrounds are tested.
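The first method in the list, a general linear model with permutation p values, can be sketched in plain Python as a one-way ANOVA on genotype classes whose F statistic is compared against trait-shuffled replicates. The function names, the number of permutations and the seed are illustrative assumptions. The data are SNP 1 and Trait 2 from the worked example at the start of the talk, and no correction for population structure is included.

```python
import random


def f_statistic(genotypes, trait):
    """One-way ANOVA F statistic for the model y = mu + m_i + e_ij."""
    groups = {}
    for g, y in zip(genotypes, trait):
        groups.setdefault(g, []).append(y)
    n, k = len(trait), len(groups)
    grand_mean = sum(trait) / n
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    ss_between = sum(len(v) * (means[g] - grand_mean) ** 2
                     for g, v in groups.items())
    ss_within = sum((y - means[g]) ** 2
                    for g, v in groups.items() for y in v)
    if ss_within == 0:
        return float("inf")  # genotype explains the trait perfectly
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(genotypes, trait, n_perm=2000, seed=1):
    """P value: share of trait permutations with F >= the observed F."""
    rng = random.Random(seed)
    observed = f_statistic(genotypes, trait)
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if f_statistic(genotypes, shuffled) >= observed:
            hits += 1
    # Count the observed arrangement itself so p is never exactly zero.
    return (hits + 1) / (n_perm + 1)


SNP1 = ["AT", "AT", "TT", "AA", "AA", "AT", "AA", "TT", "TT", "TT",
        "AT", "AA", "AA", "TT", "AT", "AT", "AT", "AA", "AA", "TT"]
TRAIT2 = [3, 4, 7, 1, 3, 5, 2, 8, 7, 10, 6, 4, 1, 9, 6, 4, 5, 2, 1, 8]

print(f_statistic(SNP1, TRAIT2), permutation_p_value(SNP1, TRAIT2))
```

Shuffling the trait values breaks any genotype-trait link while keeping both marginal distributions, so no normality assumption is needed. In a structured sample the shuffling would have to be restricted within populations, which this sketch does not attempt.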
FBRC association population in loblolly pine
González-Martínez et al. (2008) Heredity
Partial diallel, including 15-24 offspring from 61 families. Association with WUE (isotope discrimination in two sites)
[Figure: trait value (-0.8 to 0.8) by genotype and family for DHN1-S2.]
Corrections for multiple testing
• Experiment-wise permutation
• Bonferroni (α/k, with k = the number of tests)
• False Discovery Rate (FDR)
Storey & Tibshirani (2003) PNAS
FDR: the expected proportion of false positives among all significant tests
Permutation tests (Hirschhorn and Daly 2005)
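Two of the corrections listed above can be compared on a small set of p values. This is a sketch under stated assumptions: the FDR part uses the Benjamini-Hochberg step-up procedure, which is a different estimator from the q-value method of Storey & Tibshirani (2003) cited on the slide, and the p values are made up for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag tests that pass the Bonferroni threshold alpha / k."""
    k = len(p_values)
    return [p <= alpha / k for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Flag tests declared significant at FDR level alpha (step-up rule)."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])
    # Largest rank whose p value is below its rank-specific threshold.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / k:
            cutoff_rank = rank
    significant = [False] * k
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[i] = True
    return significant


# Hypothetical p values from six SNP-trait tests
P_VALUES = [0.001, 0.012, 0.02, 0.04, 0.30, 0.75]
print(bonferroni(P_VALUES))
print(benjamini_hochberg(P_VALUES))
```

On these values Bonferroni keeps only the first test, while the FDR procedure keeps the first three, which illustrates why FDR is often preferred when many candidate-gene SNPs are screened.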
Some examples
Monolignol biosynthesis and cell-wall related genes: González-Martínez et al. (2007) Genetics
Drought tolerance: Collada et al. (in prep.)
Pinus taeda L.
Continuous range, no clear population genetic structure
ADEPT project
Pinus pinaster Ait.
Fragmented range, significant population structure
TREESNIPS project (also P. sylvestris, Picea abies and oaks)
22 populations
[Map: Pinus pinaster geographic range in France, Spain, Portugal, Morocco and Tunisia. Site labels: Tamrabta, Tabarka, Cuellar, Bayubas de Abajo, Coca, San Leonardo de Yagüe, Valdemaqueda, Arenas de San Pedro, San Cipriano, Petrock, Le Verdon, Olonne/Mer, Hourtin, Mimizan, Cenicientos, Ahin, St Jean de Monts, Pleucadec, Pineta, Aulenne, Restonica, Pinia, Oria, Erdeven.]
Phenotypic traits
[Figure: wood cell-wall layers (primary wall; secondary wall with layers S1, S2 and S3) and microfibril angle.]
• Earlywood specific gravity (ewsg)
• Latewood specific gravity (lwsg)
• Percent latewood (lw)
• Earlywood microfibril angle (ewmfa)
• Lignin & cellulose content (lgn-cel)
• Synthetic PCAs for different wood-age types
SNP genotyping
[Figure: FP-TDI genotype cluster plot. X axis: R110 (mP), 0 to 150; Y axis: TAMRA (mP), 0 to 200.]
FP-TDI platform: 58 SNPs from 20 wood- and drought-related candidate genes.
Genetic association with wood property traits
González-Martínez et al. 2007 Genetics
Significant genetic association of the cad gene with earlywood specific gravity, and of 4cl with % latewood.
cinnamyl alcohol dehydrogenase (cad)
SNP M28 (position 16 bp)
[Figure: two CAD amino acid sequences with residues 10 and 180 marked: MGSLESEKTV […] SPMKHFGMTEP versus MGSLETEKTV […] SPMKHFAMTEP. SNPs M28 and M29 are labelled; legend: tested but not giving significant associations.]
[Figure: gene maps of 4cl and cad (scale in bp) with SNP positions and primer pairs (F/R).]
Genetic association with WUE
Phenotypic traits
Provenance-progeny combined tests in two sites: Cálcena (central Spain) & Bordeaux (southwestern France)
• Isotope discrimination (WUE)
• Growth (height, diameter, annual increments)
• Biomass (total and aerial)
• Ontogeny scores
• Survival
SNP genotyping
Pyrosequencing. Relatively high genotyping error.
Collada et al. (in prep.)
Central/marginal pairs
[Haplotype alignment; the label after each row is the outgroup name or the haplotype count]
C--------- ---------- TTtCcAtcCc AgtATGATAT TCCGGT  Pinus taeda
TcTCCATGGC GGACACaTAC TTTCTGTTCC GTCTTGGCTC TCCACT  1
CcTCCATGGC GGACACaTAC TTTCTGTTCC GTCTTGGCTC TCCACT  6
CCTCCATGGC GGACACATAC TTTCTGTTCC GTCTTGGCTC TCCACT  5
TATTTACGGC GGACACTTGT TTTCTGTTCC GTCACGGCAT CTCGGT  10
CATTTACGGC GGACACTTGT TTTCTGTTCC GTCACGGCAT CTCGGT  29
CATTTACGAC GGACACTTGT TTTCTGTTCC GTCACGGCAT CTCGGT  1
CATTTGCGGC GGACACTTGT TTTCTGTTCC GTCACGGCAT CTCGGT  2
CATTTACGGT GGACACTTGT TTTCTGTTCC GTCACGGCAT CTCGGT  1
CATTTACGGC GGACATATAC CCTTCAGTCC GTCACATTAC TCTGGT  1
CCGTTACGGC GGACATATAC CCTTCAGTCC GTCACATTAC TCTGGT  1
CCTCCATAGC GTGAGCATAC TTACCATCTC AGTACGTTAC TCTGGT  1
FRD13C
Gene     SNP position  P value
pr-agp4  470bp         0.1469
pr-agp4  1062bp        0.000999
pr-agp4  1069bp        0.000999
dhn1     116bp         0.013
dhn1     171bp         0.2188
ccoaomt  1229bp        0.0699
erd3     92bp          0.4256
dhn2     248bp         0.4286
dhn2     254bp         0.2927
dhn2     259bp         0.3646
dhn2     293bp         0.3457
lp3-3    43bp          0.4605
lp3-3    69bp          0.7373
lp3-3    75bp          0.027
lp3-3    223bp         0.3377
lp3-3    267bp         0.9071
lp3-3    272bp         0.4366
rd21     3bp           0.7313
agp4
GLMs, population as a factor
[Figure: isotope discrimination BLUEs (pop effect removed; axis from -0.20 to 0.20) by genotype: TT, GT, GG.]
Average for TT: 0.0034
Average for GT: -0.0407
Tassel demo
R SNPassoc package demo
Perspectives on genetic association in forest trees
• Enormous potential, but still many technical challenges ahead: optimizing SNP genotyping platforms, dealing with recently evolved gene families, building large unstructured association populations, transferring information to non-model species, etc.
• Linking genotype and phenotype through association genetics works well for well-known metabolic pathways, and for some species, such as loblolly pine, genome-wide approaches are now in place. As large-scale association studies are developed, more complex questions will be addressed: gene interactions, heterosis, plasticity (G x E), etc.
• Apart from industry applications, given the ecosystem-wide importance of forest trees, genetic association will have a strong influence on evolutionary and ecological research.
Absence of trans-specific SNPs between P. pinaster and P. taeda, two pine species separated by ~120 Myr
[Figure: lp3-3 amplicon in P. pinaster, primers F1 and R1, scale 0, 185, 352, 406.]
Variable sites (nt position), in column order for the haplotypes below: 43, 44, 55, 59, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 81, 85, 87, 91, 97, 106, 115, 127, 134, 143, 156, 158, 161, 188, 196, 198, 199, 200, 201, 204, 223, 235, 236, 246, 267, 272, 298, 318, 319, 330, 363
Hap_1 C G C G G G A G G T G A A G A C T G A G T G C G A C C T G G G C C G A T C C T T T T A A G A T A C
Hap_2 C G C G G G A G G T G A A G A G T G A G T G C G A C C T G G G C C G A T C C T T T T C T C A T A C
Hap_3 C G C G G G A G G T G A A G A C T G A G T G C G A C C T G G G C C G A T C C T T T T C T C A T A C
Hap_4 C G C G G G A G G A G A A G A C T G A G T G C G A C C T G G G C C G A T C C T T C T A A G A T A C
Hap_5 C G C G G G A G G T G A A G A C T G A G T G C G A C C T G G G C C G A T C C A T T T A A G A T A C
Hap_6 T G C G G G A G G T G A A G A C T G A G T G C G A C C T G G G C C G A T C C A T T T A A G A T A C
Hap_7 C G C G G G A G G T G A A G A C T G A G T G C G A C C T G G G C C G A T C C T T T T A A G A T A T
Hap_8 C G C G G G A G G T G A A G A C T G A G T G C G A C C T G G G C C G A T C C T T T T C A G A T A C
Hap_A C G T A - - - - - - - - - - - - C A T T C T T A G T A G G A A - T A - - - T T C T C A A G A C G C
Hap_B C T T A - - - - - - - - - - - - C A T T C T T A G T A G G A A - T A - - - T T C T C A A G A C G C
Hap_C C G T A - - - - - - - - - - - - C A T T C T T A G T A G A A A - T A - - - T T C T C A A G A C G C
Hap_D C G T A - - - - - - - - - - - - C A T T C T T A G T A G G A A - T A - - - T T C T C A A G G C G C
ABA-and-WDS-induced-gene-3 (lp3-3): Hap_1 to Hap_8, P. pinaster; Hap_A to Hap_D, P. taeda
Average Ks between P. pinaster and P. taeda of ~2%
Acknowledgements
TREESNIPS (for maritime pine: C. Collada, E. Eveno, M.A. Guevara, A. Booth, A. Soto, C. Plomion, L. Díaz, S. McCallum, I. Aranda, O. Brendel, R. Alía, V. Leger, J. Brach, J. Russell, P.H. Garnier-Géré, M.T. Cervera)
ADEPT & ADEPT2 (N.C. Wheeler, E. Ersoz, G.R. Brown, G.P. Gill, R.J. Kuntz, J.A. Beal, J. Manares, D. Huber, J. Davis, B. Pande, J. Lee, A. Eckert, J. Wegrzyn, C.D. Nelson)
FUNDING AGENCIES (NSF, CSREES-USDA, EU, MEC-Spain)
and, of course, all of you!