assocn mapg for class
TRANSCRIPT
-
8/13/2019 Assocn Mapg for Class
1/35
ASSOCIATION MAPPING
IN PLANTS
-
Approaches to dissect complex traits
Linkage analysis (QTL mapping)
Association mapping (linkage disequilibrium based)
Candidate gene studies through genomic approaches
-
Disadvantages of linkage mapping
The amount of genetic variation between any two parents is limited.
The genetic backgrounds in which QTL mapping is done are not always representative of the crop's genetic background.
Only a few generations of effective recombination take place, which leaves long segments in LD and so reduces resolution.
i.e. homozygosity is reached quickly, making dissection of a genomic region difficult (fine mapping is not possible).
-
Linkage mapping: counts recombination between markers and the trait of interest (linkage) in a biparental population.
Association mapping: measures correlation between marker alleles and trait alleles in a population (linkage disequilibrium).
Association analysis, also known as LD mapping or association mapping, is a population-based survey used to identify trait-marker relationships based on linkage disequilibrium (Flint-Garcia et al. 2003).
-
How does one proceed?
Haplotype- a set of closely linked genetic markers on
a chromosome that tend to be inherited together
-
Linkage vs Association
Linkage
Family-based
Few markers for genome coverage (300-400 SSRs)
Good for initial detection; poor for fine-mapping
Powerful for rare variants
Association (LD mapping)
Families or unrelated individuals
Many markers for genome coverage (10^5 to 10^6 SNPs)
Poor for initial detection; good for fine-mapping
Powerful for common variants; rare variants generally impossible
The two approaches are complementary.
The idea is to understand the molecular genetic basis of phenotypic variation.
-
Gene and genotype frequencies
Locus 1: allele A freq = pA
allele a freq = pa
pA + pa = 1
pAA + pAa + paa = 1
Locus 2: allele B freq = pB
allele b freq = pb
pB + pb = 1
pBB + pBb + pbb = 1
What is pAB (gamete) when there is linkage disequilibrium?
And when there is no linkage?
HWE is restored after a single generation of random mating in a population only when the individual loci are considered singly, one at a time.
-
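Linkage Disequilibrium
Linkage Disequilibrium (LD) is a measure of the non-random association of alleles at two different loci.
Under LD, the haplotype frequencies differ from the products of the allele frequencies:
$P_{AB} \neq P_A P_B$
$P_{Ab} \neq P_A P_b = P_A (1 - P_B)$
$P_{aB} \neq P_a P_B = (1 - P_A) P_B$
$P_{ab} \neq P_a P_b = (1 - P_A)(1 - P_B)$
Haplotype frequencies (rows: alleles at SNP 1; columns: alleles at SNP 2):
        B       b       Total
A       P_AB    P_Ab    P_A
a       P_aB    P_ab    P_a
Total   P_B     P_b     1.0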
-
Whatever we measure as LD is in fact gametic phase disequilibrium, and thus will remain true only when it is due to linkage. (corollary?)
D = ru - st
-
Factors affecting LD
Factors increasing LD:
Mutation
Self pollination
Genetic isolation
Population admixture
Small founder population / genetic drift
Selection
Epistasis
Factors decreasing LD:
High recombination
Recurrent mutation
Outcrossing
Gene conversion
-
Linkage Disequilibrium Example: How does LD arise?
Before mutation (two haplotypes):
-- A -- -- -- G -- -- --
-- C -- -- -- G -- -- --
SNP 1: P_A = 1/2, P_C = 1/2
SNP 2: P_G = 1
After mutation (three haplotypes):
-- A -- -- -- G -- -- --
-- C -- -- -- G -- -- --
-- C -- -- -- C -- -- --
SNP 1: P_A = 1/3, P_C = 2/3
SNP 2: P_G = 2/3, P_C = 1/3
There are only three haplotypes: AG, CG, and CC.
There is no AC haplotype, i.e., $P_{AC} = 0$.
However, $P_A P_C = 1/3 \times 1/3 = 1/9$ (A at SNP 1, C at SNP 2), thus $P_A P_C \neq P_{AC}$.
These two SNPs are in linkage disequilibrium.
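The comparison on this slide can be checked by counting. The sketch below is only an illustration: the function names are made up for this example, and haplotypes are written as two-letter strings (SNP 1 allele, then SNP 2 allele).

```python
from collections import Counter
from fractions import Fraction

def haplotype_freqs(haplotypes):
    """Frequency of each two-SNP haplotype in the sample."""
    n = len(haplotypes)
    return {h: Fraction(c, n) for h, c in Counter(haplotypes).items()}

def allele_freq(haplotypes, position, allele):
    """Frequency of `allele` at SNP `position` (0 = SNP 1, 1 = SNP 2)."""
    hits = sum(1 for h in haplotypes if h[position] == allele)
    return Fraction(hits, len(haplotypes))

# The three haplotypes present after the mutation.
after_mutation = ["AG", "CG", "CC"]

freqs = haplotype_freqs(after_mutation)
p_ac_observed = freqs.get("AC", Fraction(0))
p_ac_expected = (allele_freq(after_mutation, 0, "A")
                 * allele_freq(after_mutation, 1, "C"))

print(p_ac_observed, p_ac_expected)  # 0 1/9
```

The observed AC frequency (0) differs from the product of the allele frequencies (1/9), which is what the slide means by linkage disequilibrium.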
-
Linkage Equilibrium Example: How does LD disappear?
Before recombination (three haplotypes):
-- A -- -- -- G -- -- --
-- C -- -- -- G -- -- --
-- C -- -- -- C -- -- --
After recombination (four haplotypes):
-- A -- -- -- C -- -- --
-- A -- -- -- G -- -- --
-- C -- -- -- G -- -- --
-- C -- -- -- C -- -- --
SNP 1: P_A = 1/2, P_C = 1/2
SNP 2: P_G = 1/2, P_C = 1/2
After recombination,
$P_{AG} = P_A P_G = 1/4$,
$P_{CG} = P_C P_G = 1/4$,
$P_{CC} = P_C P_C = 1/4$, and
$P_{AC} = P_A P_C = 1/4$.
Thus, these two SNPs are in linkage equilibrium.
-
Measure of LD: D Coefficient
The non-randomness of two loci is measured by a deviation D as follows:
$D = P_{AB} P_{ab} - P_{Ab} P_{aB}$
$P_{AB} = P_A P_B + D$
$P_{Ab} = P_A (1 - P_B) - D$
$P_{aB} = (1 - P_A) P_B - D$
$P_{ab} = (1 - P_A)(1 - P_B) + D$
D = 0 when the two loci are in linkage equilibrium.
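As a worked illustration, D can be evaluated for the two earlier examples. The labelling is an assumption made here, not the slides': A/a stands for A/C at SNP 1, and B/b for G/C at SNP 2.

```latex
% Three haplotypes (AG, CG, CC), each with frequency 1/3; AC is absent.
\begin{align*}
D &= P_{AG} P_{CC} - P_{AC} P_{CG}
   = \tfrac{1}{3}\cdot\tfrac{1}{3} - 0\cdot\tfrac{1}{3} = \tfrac{1}{9} \\
% Consistency check against P_{AB} = P_A P_B + D:
P_{AG} &= P_A P_G + D
   = \tfrac{1}{3}\cdot\tfrac{2}{3} + \tfrac{1}{9} = \tfrac{1}{3}
\end{align*}

% Four haplotypes (AG, CG, CC, AC), each with frequency 1/4.
\begin{align*}
D &= P_{AG} P_{CC} - P_{AC} P_{CG}
   = \tfrac{1}{4}\cdot\tfrac{1}{4} - \tfrac{1}{4}\cdot\tfrac{1}{4} = 0
\end{align*}
```

The first case gives a non-zero D (disequilibrium); the second gives D = 0 (equilibrium), in line with the two examples.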
-
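Standardization of the D Coefficient
The D coefficient is normalized (Lewontin, 1964) since the range of D is always determined by the allele frequencies.
D' = D/Dmax, where Dmax stands for the absolute maximal possible value of D.
Every haplotype frequency must be non-negative, which bounds D:
$P_{AB} = P_A P_B + D \geq 0 \Rightarrow D \geq -P_A P_B$
$P_{ab} = P_a P_b + D \geq 0 \Rightarrow D \geq -P_a P_b$
$P_{Ab} = P_A P_b - D \geq 0 \Rightarrow D \leq P_A P_b$
$P_{aB} = P_a P_B - D \geq 0 \Rightarrow D \leq P_a P_B$
Hence
$D' = \dfrac{D}{\min(P_A P_b,\ P_a P_B)}$ if $D > 0$;
$D' = \dfrac{D}{\min(P_A P_B,\ P_a P_b)}$ if $D < 0$.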
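The normalization can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names are invented here, and the signed convention (D' between -1 and +1) is assumed, with D divided by the positive bound that matches its sign.

```python
from fractions import Fraction

def d_coefficient(p_AB, p_A, p_B):
    """D = P_AB - P_A * P_B."""
    return p_AB - p_A * p_B

def d_prime(p_AB, p_A, p_B):
    """Signed D' = D / Dmax, with Dmax chosen by the sign of D."""
    d = d_coefficient(p_AB, p_A, p_B)
    if d == 0:
        return d  # linkage equilibrium
    p_a, p_b = 1 - p_A, 1 - p_B
    if d > 0:
        d_max = min(p_A * p_b, p_a * p_B)
    else:
        d_max = min(p_A * p_B, p_a * p_b)
    return d / d_max

third = Fraction(1, 3)
# Haplotypes AG, CG, CC with A = "A at SNP 1" and B = "G at SNP 2".
print(d_prime(third, third, 2 * third))       # 1
# Same data with B = "C at SNP 2": haplotype AC is absent.
print(d_prime(Fraction(0), third, third))     # -1
```

Whenever one of the four haplotypes is missing, D' reaches +1 or -1, as in both calls above.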
-
Interpretation of D'
D' is constrained between -1 and +1.
D' = 1 (perfect positive LD between SNP alleles)
D' = 0 (linkage equilibrium between SNP alleles)
D' = -1 (perfect negative LD between SNP alleles)
D' = 0.87 (strong positive LD between SNP alleles)
D' = 0.12 (weak positive LD between SNP alleles)
-
Measure of LD: $r^2$ (Hill and Robertson, 1968)
$r^2 = \dfrac{(P_{AB} - p_A p_B)^2}{p_A\, p_a\, p_B\, p_b}$
$0 \leq r^2 \leq 1$.
Most relevant LD measurement.
LD is usually considered to have decayed once $r^2$ falls to 0.1 to 0.2.
$r^2$ is the square of the correlation coefficient when alleles are binary coded.
If $r^2$ = LD value of a SNP with another and $h^2_q$ = total trait variation, then
$r^2 \times h^2_q$ = the trait variation that can be explained by these SNPs.
$r^2 = 2/K$, where K is the number of chromosomes.
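The sketch below illustrates two of the statements above. It computes $r^2$ from haplotype and allele frequencies, then checks that it equals the squared correlation of binary-coded alleles. The function names and the 0/1 coding (A = 1 at SNP 1, G = 1 at SNP 2) are assumptions for this example, applied to the AG, CG, CC haplotypes used earlier.

```python
from fractions import Fraction

def r_squared(p_AB, p_A, p_B):
    """r^2 = (P_AB - pA*pB)^2 / (pA * pa * pB * pb)."""
    d = p_AB - p_A * p_B
    return d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))

def squared_correlation(xs, ys):
    """Squared Pearson correlation of two equally long 0/1 vectors."""
    n = len(xs)
    mx, my = Fraction(sum(xs), n), Fraction(sum(ys), n)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    return cov * cov / (var_x * var_y)

# Haplotypes AG, CG, CC coded as (A=1, C=0) at SNP 1 and (G=1, C=0) at SNP 2.
snp1 = [1, 0, 0]
snp2 = [1, 1, 0]

from_freqs = r_squared(Fraction(1, 3), Fraction(1, 3), Fraction(2, 3))
from_coding = squared_correlation(snp1, snp2)
print(from_freqs, from_coding)  # 1/4 1/4
```

For these haplotypes D' is 1 while $r^2$ is only 1/4, which shows that the two measures can differ considerably for the same data.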
-
Decay of LD over Time
Recombination breaks down LD a little in each generation, so that linkage equilibrium is eventually attained.
-
Allelic Association
[Figure: marker genotypes (e.g. 3/6, 2/4, 6/6) of individuals in the Controls and Cases groups.]
Allele 6 is associated with the trait of interest.
-
Allelic Association: Three Common Forms
Direct association
Mutant or susceptible polymorphism
Allele of interest is itself involved in phenotype
Indirect association
Allele itself is not involved, but a nearby correlated marker changes phenotype
Spurious association
Apparent association not related to genetic causes (most common outcome)
Linkage disequilibrium: correlation between (any) markers in a population
Allelic association: correlation between marker allele and trait
-
Indirect and Direct Allelic Association
Direct Association
[Figure: trait locus D, marked (*).]
Measure trait relevance (*) directly, ignoring correlated markers nearby
Indirect Association & LD
[Figure: markers M1, M2, ..., Mn around the trait locus D.]
Assess trait effects on QTL via correlated markers (Mi)
-
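Expected decay of linkage disequilibrium
How far apart can markers be to detect association?
$D_t = (1 - \theta)^t D_0$, where $\theta$ is the recombination fraction between the two loci and $t$ is the number of generations.
[Figure: D (y-axis, 0 to 1) against recombination fraction (x-axis, 0 to 0.1), with one curve each for 10, 20, 50 and 250 generations.]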
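The curves in the figure can be reproduced numerically. The sketch below is only illustrative: the function name is invented here, D_0 = 1 matches the top of the plot's y-axis, and the recombination fractions are arbitrary values within the plotted range.

```python
def ld_after(d0, theta, generations):
    """Expected LD after t generations: D_t = (1 - theta)^t * D_0."""
    return (1.0 - theta) ** generations * d0

# Generation counts taken from the figure legend; theta values are examples.
generation_counts = (10, 20, 50, 250)
for theta in (0.01, 0.05, 0.1):
    row = [round(ld_after(1.0, theta, t), 3) for t in generation_counts]
    print(f"theta={theta}: {row}")
```

The closer the two loci are (smaller recombination fraction), the more generations LD persists, which is why older populations need denser markers.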
-
Association Mapping for crop improvement (AM versus QTL Mapping)
Since Association Mapping can be conducted directly on the breeding material:
1. Direct inference from research to breeding is possible.
2. Phenotypic variation is observed for most traits of interest.
3. Marker polymorphism is higher than in biparental populations.
4. No need for pedigrees or structured mapping populations.
5. Routine evaluations provide phenotypic data.
6. Higher resolution is possible because recombination over a very large number of generations is studied through the extent of haplotype sharing.
7. Association Mapping provides other useful information about:
Organization of genetic variation and
Polymorphism across the genome
-
Types of Populations
1. Germplasm Bank Collection
A collection of genetic resources including landraces, exotic material
and wild relatives.
2. Synthetic Populations
Outcrossing populations synthesized from inbred lines (segregating
generations). May be used for recurrent selection.
3. Elite Lines
Inbred lines (and checks) manipulated with the objective of releasing
new varieties in the short term.
-
Characteristics Related to Association Mapping
S. No | Aspects of AM | GP Bank | Synthetic population | Elite GP
1 | Sample | Core collection | Segregating progenies | Elite lines
2 | Sample turnover | Static | Ephemeral | Gradually substituting
3 | Source of phenotypic data | Screenings | Progeny tests | Yield trials
4 | Types of traits | High h2 and domestication traits | Depends on the evaluation scheme | Low h2 traits
5 | Type of marker | SNP | SNP/SSR | SSR
6 | LD | Low | Intermediate and fast decaying | High
-
S. No | Aspects of AM | GP Bank | Synthetic population | Elite GP
7 | Population structure | Medium | Low | High
8 | Allele diversity among samples | High | Intermediate | Low
9 | Allele diversity within samples | Variable | 1 or 2 alleles | 1 allele
10 | Power | Low | Intermediate and decreasing | High; could allow genomic scan
11 | Resolution | High; could allow fine mapping | Intermediate and increasing | Low
12 | Use of informative markers | Transfer of new alleles by marker assisted BC | Incorporation in selection index | MAS in progenies (after validation)
-
Summary (of characteristics)
Germplasm bank core collections: for allele mining of candidate genes and fine-mapped QTLs
Elite lines: to detect genomic regions associated with traits of interest
Synthetic populations: might represent a balance between power and precision, and have the major advantage of being unstructured
-
Pearson chi-square test
Yates' correction
Fisher's exact test
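The three tests named above can be sketched for a 2x2 table of allele counts (cases and controls by allele A or a). This is a minimal standard-library illustration with invented function names. The counts are borrowed from the mixed-population table on the Population Stratification slide, and the two-sided Fisher rule used here (summing all tables no more probable than the observed one) is one common convention, assumed for the example.

```python
from math import comb

def pearson_chi2(a, b, c, d, yates=False):
    """Chi-square statistic (1 d.f.) for the 2x2 table [[a, b], [c, d]].

    With yates=True, Yates' continuity correction is applied.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2.0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

# Cases: A = 70, a = 80; controls: A = 50, a = 100.
chi2 = pearson_chi2(70, 80, 50, 100)
chi2_yates = pearson_chi2(70, 80, 50, 100, yates=True)
p_fisher = fisher_exact_two_sided(70, 80, 50, 100)
print(round(chi2, 3), round(chi2_yates, 3), p_fisher)
```

Both chi-square statistics are compared with the chi-square distribution on one degree of freedom. Fisher's exact test is usually preferred when expected cell counts are small.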
-
Structured Association
To tackle highly structured populations
Looks for closely related clusters and develops the Q matrix using a set of random unlinked markers
Corrects the false associations
STRUCTURE (Pritchard et al. 2000) estimates population structure and shared coancestry coefficients for all markers
Not good enough when some degree of relationship (kinship) is also present.
-
Mixed Linear Model (MLM)
Accounts for multiple levels of relatedness (Yu et al., 2006)
Uses the Q matrix (from STRUCTURE) to account for population subdivision
Uses the K matrix, estimated with the SPAGeDi software, to account for relatedness within populations
Superior to other methods (Structured Association, Genomic Control and Quantitative Transmission Disequilibrium Test) in Type I error control and statistical power
Implemented in the software TASSEL
Replacement of the Q matrix with the P matrix makes it more robust (Price et al., 2006 and Zhou et al., 2007)
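For orientation, the unified mixed model is commonly written as below. The notation follows the usual presentation of Yu et al. (2006) and is an addition here, not taken from the slides.

```latex
% y: vector of phenotypic observations
% \beta: fixed effects other than marker and population structure
% \alpha: marker (SNP) effects
% v: population (subgroup) effects, with Q the population structure matrix
% u: polygenic background effects, with K the relative kinship matrix
% e: residual effects; X, S, Z are incidence matrices
\begin{align*}
y &= X\beta + S\alpha + Qv + Zu + e \\
\operatorname{Var}(u) &= 2K V_g, \qquad \operatorname{Var}(e) = R V_R
\end{align*}
```

Population subdivision enters as a fixed effect through Q, while relatedness within populations enters through the covariance of the random polygenic effects, which is proportional to K.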
-
Population Stratification
A population under study may have sub-populations, which may lead to
Spurious association.
Loss of power to detect real association.
EIGENSTRAT (Price et al. 2006 Nat. Genet.) uses principal components to extract information on stratification and adjust for the stratification in association analysis.
Allele counts: Mixed Population = Sub-population 1 + Sub-population 2
          Mixed           Sub-pop 1       Sub-pop 2
          A     a         A     a         A     a
Case      70    80    =   10    40    +   60    40
Control   50    100   =   20    80    +   30    20
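The counts on this slide can be turned into odds ratios to show how pooling creates a spurious signal. The sketch below uses exactly the slide's numbers; the function name and tuple layout are made up for the illustration.

```python
from fractions import Fraction

def odds_ratio(case_A, case_a, control_A, control_a):
    """Odds ratio of carrying allele A in cases versus controls."""
    return Fraction(case_A * control_a, case_a * control_A)

# (case A, case a, control A, control a), taken from the slide.
sub_pop_1 = (10, 40, 20, 80)
sub_pop_2 = (60, 40, 30, 20)
mixed = tuple(x + y for x, y in zip(sub_pop_1, sub_pop_2))

or_1 = odds_ratio(*sub_pop_1)
or_2 = odds_ratio(*sub_pop_2)
or_mixed = odds_ratio(*mixed)
print(or_1, or_2, or_mixed)  # 1 1 7/4
```

Within each sub-population the allele is unrelated to case status (odds ratio 1), yet the pooled sample gives an odds ratio of 7/4. The apparent association comes entirely from the sub-populations differing in both allele frequency and case proportion.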
-
Nested Association Mapping
A method for joint QTL linkage and association mapping in a set of RIL populations derived from matings to a common parent
Dense genotyping of SNPs is performed only in the parents
Only common parent-specific SNPs are genotyped in the RILs and are used to identify the parental origin of chromosomal segments, allowing projection of sequence information from parents to RILs
LD information from ancient recombination is thus captured, allowing for high resolution mapping with far less genotyping effort
-
Power of Association Mapping
Decisive Factors
Extent and evolution of LD in the population (mode of pollination)
Complexity and mode of gene action for the trait of interest
Sample size
Preselecting a priori known QTLs or candidate genes
Samples with longer LD blocks
Availability of pedigree and genomic information and resources
Quality of phenotypic data
Association mapping panel constructed
Efficiency of targeted gene sequencing (genotypic data)
-
LD and AM studies in various plant species
S. No. | Crop | Population | LD Extent | Traits
1 | Rice | Diverse landraces and accessions | 5-500 kb; 20-30 cM; 50-225 cM | Glutinous phenotype, starch quality, yield and its components
2 | Wheat | Diverse cultivars
-
Common Statistical Software Packages for Association Mapping
S. No. | Software Package | Focus | Remarks
1 | TASSEL | Association analysis | Free; LD statistics, sequence analysis, AM by GLM and MLM
2 | STRUCTURE | Population structure | Free; widely used for PS analysis
3 | SPAGeDi | Relative kinship | Free; genetic relationship analysis
4 | EIGENSTRAT | PCA, association analysis | Free; PCA was proposed as an alternative for population structure analysis (PSA)
5 | MTDFREML | Mixed model | Free; can be used for plant data
6 | ASREML | Mixed model | Commercial; can be used for plant data
-
Two major types: genome-wide screen and candidate gene
Genome-wide screen
Hypothesis-free
High-cost: large genotyping requirements
Multiple-testing issues
Possibly many false positives, fewer misses
Candidate gene
Hypothesis-driven
Low-cost: small genotyping requirements
Multiple testing less important
Possibly many misses, fewer false positives