Varalakshmi,B.C
Association Mapping approaches for tagging quality traits in Maize
What is association mapping?
Also called “association genetics,” “association studies,” or “linkage disequilibrium mapping”
(Oraguzie et al. 2007)
Tool to resolve complex trait variation down to the sequence level by exploiting historical and evolutionary recombination events at the population level.
(Nordborg & Tavare, 2002; Risch & Merikangas, 1996).
LD mapping detects and locates quantitative trait loci (QTL) by the strength of the correlation between a trait and a marker.
Offers greater precision in QTL location than family-based linkage analysis
Enables more efficient marker-assisted selection and facilitates gene discovery.
Does not require family or pedigree information, so it can be applied to a range of experimental and non-experimental populations.
Care must be taken during analysis to control for the increased rate of false positive results arising from population structure and relatedness among lines.
(Mackay and Powell, 2007)
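Scoring a marker by the strength of its correlation with a trait can be illustrated with a minimal Python sketch. It assumes genotypes coded as 0, 1, or 2 copies of one allele; the function name `marker_trait_r2` and the example data are hypothetical. The sketch deliberately omits the population-structure and kinship corrections that a real analysis needs.

```python
import math


def marker_trait_r2(genotypes, trait):
    """Squared Pearson correlation between marker genotype codes and a trait.

    genotypes: list of allele counts (0, 1, or 2) per individual (assumed coding).
    trait: list of phenotype values, in the same individual order.
    Returns 0.0 when the marker or the trait shows no variation.
    """
    n = len(genotypes)
    if n != len(trait) or n < 3:
        raise ValueError("need matching lists with at least 3 individuals")
    mean_g = sum(genotypes) / n
    mean_t = sum(trait) / n
    # Sums of squares and cross-products around the means
    s_gg = sum((g - mean_g) ** 2 for g in genotypes)
    s_tt = sum((t - mean_t) ** 2 for t in trait)
    s_gt = sum((g - mean_g) * (t - mean_t) for g, t in zip(genotypes, trait))
    if s_gg == 0 or s_tt == 0:
        return 0.0  # monomorphic marker or constant trait: no association signal
    r = s_gt / math.sqrt(s_gg * s_tt)
    return r * r


# Hypothetical data: six inbred lines, trait rising with allele count
example_genotypes = [0, 0, 1, 1, 2, 2]
example_trait = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
print(marker_trait_r2(example_genotypes, example_trait))
```

A high value here only says the marker and the trait co-vary in this sample; without a correction for structure and relatedness, such a signal can be a false positive.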
Why association mapping?
New tool: resolves trait variation down to the sequence level; identifies novel and superior alleles.
Genomic technology: sequencing technologies have markedly reduced the cost; sequencing, gene expression profiling, comparative genomics.
Natural diversity: harnesses the genetic diversity of natural populations down to individual nucleotides; annotated genome sequence from model species.
(Hansen et al., 2001; Kraakman et al., 2006; Zhu et al., 2008)
Genomic technologies for high-throughput genome sequencing
Diversity Panel
Comparison of Association Genetics and Conventional QTL Mapping
Attribute | QTL mapping | Association genetics
Detection goal | Quantitative trait locus, i.e., a wide region within specific pedigrees within which a QTL is located | Quantitative trait nucleotide, i.e., physically as close as possible to the causative sequence(s)
Resolution of causative trait polymorphism | Low: only moderate-density linkage maps required | High: disequilibrium within small physical regions, requiring many markers
Marker discovery costs | Moderate | Moderate for few traits, high for many traits
Experimental populations for detection | Defined pedigrees, e.g., backcross, F2, RI, three- and two-generation pedigrees/families, half-sib families, etc. | Unrelated individuals (“unstructured” populations), large numbers of small unrelated families
Number of markers required for genome coverage | 10^2 to low 10^3 | 10^5 for small genomes, ~10^9 for large genomes
Linkage analysis and association mapping (figure; panels: Linkage analysis, Association mapping)
Advantages of Using Natural Population
Broader genetic variation, with a wider genetic background for marker-trait correlations.
Higher-resolution mapping (more recombination events).
Historically measured trait data can be exploited for association.
No need to develop expensive and tedious bi-parental populations.
(Kraakman et al., 2006; Hansen et al., 2001)
RILs vs. Association Mapping Panel
Morrell et al., 2011
Scheme of association mapping or tagging a gene of interest using germplasm accessions.
(Nordborg et al., 2005)
Types of Association Mapping
Genome-wide association mapping (GWAS)
A comprehensive approach that systematically searches the genome for causal genetic variation.
A large number of markers are tested for association with complex traits.
Prior information about candidate genes is not required.
Works best for a research consortium with complementary expertise and adequate funding.
Candidate-gene association mapping
Candidate genes are selected based on knowledge from mutational analysis, biochemical pathways, or linkage analysis.
An independent set of random markers needs to be scored to infer genetic relationships.
A low-cost, hypothesis-driven, trait-specific approach, but it will miss other unknown loci.
(Zhu et al., 2010).
The principle of association mapping is linkage disequilibrium (LD)
(Oraguzie et al., 2007)
Linkage refers to the coinheritance of different loci within a genetic distance on the chromosome.
Linkage equilibrium (LE) is the random association of alleles at different loci: each haplotype frequency equals the product of the frequencies of its alleles.
LD is the non-random association of alleles at different loci: haplotype frequencies in the population differ from those expected under LE.
Concept of LD
Linkage disequilibrium is also referred to as “gametic phase disequilibrium” (GPD) or “gametic disequilibrium” (GLD).
It was first described by Jennings in 1917, and its quantification (D, the coefficient of LD) was developed by Lewontin in 1964.
D is the difference between the observed gametic frequencies of haplotypes and the gametic haplotype frequencies expected under linkage equilibrium.
$D = P_{AB} - P_A P_B = P_{AB} P_{ab} - P_{Ab} P_{aB}$
Besides D, other measures of LD include D′, r², D², and D*.
(Oraguzie et al., 2007)
Choosing appropriate LD measures depends on the objective of the study.
r² is the square of the correlation coefficient between the two loci.
r² is affected by both mutation and recombination, whereas D′ is affected more by mutational history.
The r² value varies from 0 to 1.
An r² value of 0.1 (10%) or above is considered significant.
(Abdurakhmonov and Abdukarimov, 2008)
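The definitions of D and r² above can be worked through in a short Python sketch. It assumes two biallelic loci (alleles A/a and B/b) with known haplotype frequencies; the function name `ld_measures` and the example frequencies are hypothetical. D′ is computed with the commonly used normalisation by the maximum attainable |D| and is returned as an absolute value.

```python
def ld_measures(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, |D'|, r2) for two biallelic loci from haplotype frequencies."""
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab  # frequency of allele A
    p_B = p_AB + p_aB  # frequency of allele B
    # D = P_AB - P_A * P_B (equivalently P_AB * P_ab - P_Ab * P_aB)
    d = p_AB - p_A * p_B
    # Largest |D| possible given the allele frequencies and the sign of D
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r2 = D^2 / (P_A * P_a * P_B * P_b)
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical haplotype frequencies for AB, Ab, aB, ab
d, d_prime, r2 = ld_measures(0.4, 0.1, 0.1, 0.4)
print(d, d_prime, r2)
```

Under linkage equilibrium (for example, all four haplotypes at frequency 0.25) all three measures are zero, which matches the definition of LE as haplotype frequencies equal to the product of allele frequencies.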
Calculation and visualization of LD: LD triangle and decay plots
LD can be calculated using haplotyping algorithms.
One option is maximum likelihood estimation (MLE) using an expectation-maximization algorithm.
A graphical display of pairwise LD between loci is useful for assessing LD patterns measured with a large number of molecular markers.
(Abdurakhmonov and Abdukarimov, 2008)
Software used for calculation of LD
“Graphical overview of linkage disequilibrium” (GOLD) to depict the structure and pattern of LD.
“Trait Analysis by aSSociation, Evolution and Linkage” (TASSEL) and PowerMarker
TASSEL-generated triangle plot for pairwise LD
(Abdurakhmonov and Abdukarimov, 2008)
Each cell represents the comparison of a pair of marker sites, colour-coded for the presence of significant LD.
LD decay plot
To estimate the size of LD blocks, r² values (alternatively, D′ can also be used) are usually plotted against genetic (cM) or physical (bp) distance; this is referred to as an “LD decay plot”.
(Abdurakhmonov and Abdukarimov, 2008)
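The data behind an LD decay plot can be sketched in Python as follows. The sketch assumes pairwise r² values have already been computed together with the distance between each marker pair; the function names, the example numbers, and the use of the 0.1 threshold as a decay cut-off are illustrative assumptions, not the output of any particular program.

```python
from collections import defaultdict


def ld_decay_curve(pairs, window):
    """Average r2 within fixed-width distance windows.

    pairs: iterable of (distance, r2) for marker pairs (distance in bp or cM).
    window: window width, in the same unit as the distances.
    Returns a list of (window_start, mean_r2), sorted by distance.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for distance, r2 in pairs:
        index = int(distance // window)
        sums[index] += r2
        counts[index] += 1
    return [(index * window, sums[index] / counts[index]) for index in sorted(sums)]


def decay_distance(curve, threshold=0.1):
    """Start of the first window whose mean r2 falls below the threshold."""
    for start, mean_r2 in curve:
        if mean_r2 < threshold:
            return start
    return None  # LD never dropped below the threshold in the sampled range


# Hypothetical (distance in bp, r2) values for six marker pairs
example_pairs = [(10, 0.8), (40, 0.6), (60, 0.3), (90, 0.2), (120, 0.05), (140, 0.07)]
curve = ld_decay_curve(example_pairs, window=50)
print(curve)
print(decay_distance(curve))
```

Plotting `window_start` against `mean_r2` gives the decay curve; the distance at which it crosses the chosen threshold is one simple estimate of LD block size.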
Factors affecting LD & association mapping
Mutation and recombination are among the strongest factors influencing LD.
Factors increasing LD: new mutation, mating system (self-pollination), genetic isolation, population structure, relatedness (kinship), genetic drift, admixture, selection (natural, artificial).
Factors decreasing LD: high recombination and mutation rates, recurrent mutations, outcrossing (Huttley et al., 2005).
Need for Association Mapping in Maize
Source of cooking oil, biofuel, and animal feed.
Model organism for cytogenetics, genetics, genomics, and functional genomics studies (Strable and Scanlon, 2009).
Primary staple food in many African countries.
Map-based cloning of QTLs is a time-consuming and expensive process in maize.
Association mapping can explore all recombination events and mutations in a given population, and does so at higher resolution.
(Yu and Buckler, 2006)
Examples of the range of phenotypic variation in maize germplasm held in the CIMMYT genebank (Dr. Suketoshi Taba).
Nested Association Mapping (NAM)
Joint linkage and linkage disequilibrium mapping has been proposed as a “fine mapping” approach (Mott and Flint, 2002; Wu et al., 2002).
NAM is currently implemented in maize.
A powerful strategy for dissecting the genetic basis of quantitative traits in species with low LD.
For other crop species, different genetic designs (e.g., diallel, eight-way cross) could be used to accommodate the level of LD.
NAM allows high-power, cost-effective genome scans and helps link molecular variation with complex trait variation.
(Yu et al., 2008)
Nested Association Mapping
Population | Sample size | Background markers | Association method | Candidate genes | Traits | References
Diverse inbred lines | 97 | 47 | LR+Q | ae1, bt2, sh1, sh2, sugary1, waxy1 | Kernel composition & starch pasting properties | Wilson et al., 2004
Diverse inbred lines | 42 | 101 | LR+Q, GLM–Q | bm3 | Forage quality traits | Lübberstedt et al., 2005
Diverse inbred lines | 57 | --- | Haplotype tree scanning | Sugary1 | Sweet taste | Tracy et al., 2006
Diverse inbred lines | 281 | 89 plus 553 | MLM | crtRB1 | Carotenoid content | Yan et al., 2010
Elite lines | 71 | --- | Unknown | DGAT | Oil content & composition | Zheng et al., 2008
Elite inbred lines | 75 | --- | Case-control | Y1 | Endosperm colour | Palaisa et al., 2003
Application of candidate gene strategy to identify CrtRB1 locus
β-carotene biosynthetic pathway
Simplified carotenoid biosynthetic pathway in maize (Tian et al., 2001).
crtRB1 is the target gene
Zea mays crtRB1 is the target gene in the present study. Translated exons are depicted as black boxes.
Methods
Germplasm evaluation
Panel 1 (P1): 281 maize inbred lines grown in Urbana, Illinois (USA) in 2002–2005.
Panel 2 (P2): 245 diverse maize inbred lines derived from tropical and subtropical adapted maize germplasm.
Panel 3 (P3): 55 diverse maize inbred lines derived from temperate-adapted maize germplasm.
Carotenoid Quantification
HPLC analysis: carotenoids were extracted and quantified by HPLC for all segregating mapping populations (Kurilich and Juvik, 1999).
Population structure and kinship analysis
Population structure and kinship for P1 were estimated using 89 simple sequence repeat (SSR) markers and 553 SNP markers, respectively (Yu et al., 2006).
STRUCTURE 2.1 was used to estimate the population structure of P2 and P3 using 46 and 86 SSRs, respectively.
Linkage mapping and QTL mapping
crtRB1 was mapped via genetic linkage mapping in a RIL population derived from B73 and BY80415, using the crtRB1 3′TE polymorphism.
QTL analysis in this population was done using QTL Cartographer 2.5
(Wang et al., 2005).
Statistical analysis
Association analysis was carried out using a mixed model incorporating kinship and population structure, as implemented in TASSEL 2.1 (Bradbury et al., 2007).
LD analysis was carried out in TASSEL 2.1 on the entire sequence of crtRB1; a window size of 50 bp was used to plot the average r² against distance.
5′TE allelic series: 1, 397-bp insertion; 2, 206-bp insertion; 3, 0-bp insertion.
InDel4 allelic series: 12-bp or 0-bp insertion.
3′TE allelic series: 1, no insertion; 2, 325-bp insertion; 3, 1,250-bp insertion.
P value: from association analysis carried out using the mixed model incorporating population structure and kinship, using data from 4 different years.
R² values: from analysis of variance (ANOVA) of the data, showing the percentage of phenotypic variation explained.
Haplotype is shown as a linear combination of:
5′TE allele (1, 397-bp insertion; 2, 206-bp insertion; 3, 0-bp insertion),
InDel4 allele (12-bp or 0-bp insertion),
3′TE allele (1, no insertion; 2, 325-bp insertion; 3, 1,250-bp insertion).
Allele-specific crtRB1 effects on biochemical activity and transcriptional expression.
CrtRB1 quantitative RT-PCR from whole kernel at 15 d after pollination (DAP) and seedling leaf messenger RNA for the six indicated lines of Zea mays.
β-carotene hydroxylase product profiles for the four CRTRB1 allozymes expressed in a recombinant E. coli assay system producing β-carotene. Genetic variation for each allozyme is listed according to InDel4 and C-terminal (3′TE) differences.
Whole genome scan association mapping for oleic acid content
To identify loci with major effect on oleic acid content in maize kernels.
8,590 loci were tested for association with oleic acid content in 553 maize inbreds.
A single locus with major effect on oleic acid was mapped between 380 and 384 cM in the IBM2 neighbors genetic map on chromosome 4 and confirmed in a biparental population.
A fatty acid desaturase, fad2, identified approximately 2 kb from the associated genetic marker, is the most likely candidate gene responsible for the difference in the phenotype.
Non-conservative amino acid polymorphism near the active site of fad2 contributes to the effect on oleic acid content.
First report on use of a high resolution whole genome scan association mapping.
Materials and Methods
Whole genome scan association mapping
Single nucleotide polymorphism (SNP) haplotypes at 8,590 genetic loci were genotyped in 553 maize inbred lines.
A statistical test for association between haplotypes and embryo oleic acid content was performed with the STRUCTURE program (Pritchard et al. 2000).
LD was computed between the locus of interest and all other loci using r² (Devlin and Risch 1995).
Results
Comparison of Low-oleic Acid Content (Lo) Against High-oleic Acid Content (Ho) Alleles of fad2.
Boxes: domain regions of the protein sequence. Horizontal grey arrows in both sequences: coding region. Vertical bars: nucleotide polymorphisms between both alleles. Half-length vertical bars: synonymous substitutions. Triangles: amino acid substitutions. Lines across both sequences: deletions and insertions. Black triangle: non-conservative amino acid substitution of a leucine by threonine.
Association mapping of the markers MZA10924, MZA4015, and MZA5102 (top) and linkage disequilibrium (LD) of all markers against the MZA10924 (bottom).
Vertical scale: negative logarithm of the association mapping P value. Horizontal scale: genetic position in cM on Pioneer’s genetic map.
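The vertical scale of such a plot can be reproduced with a few lines of Python. The sketch below converts P values to their negative base-10 logarithm and, as an assumption that is not taken from the study, applies a Bonferroni cut-off for the 8,590 loci tested. The marker names and P values are invented for illustration.

```python
import math


def neg_log10(p_value):
    """Negative base-10 logarithm of a P value, as plotted on the vertical axis."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("P value must be in (0, 1]")
    return -math.log10(p_value)


def bonferroni_cutoff(n_tests, alpha=0.05):
    """-log10 of the Bonferroni-corrected threshold (an assumed correction)."""
    return neg_log10(alpha / n_tests)


def significant_markers(p_values, n_tests, alpha=0.05):
    """Names of markers whose -log10(P) reaches the Bonferroni cut-off."""
    cutoff = bonferroni_cutoff(n_tests, alpha)
    return [name for name, p in p_values.items() if neg_log10(p) >= cutoff]


# Hypothetical P values for three made-up markers, with 8,590 loci tested
example_p = {"marker_1": 1e-9, "marker_2": 1e-4, "marker_3": 0.2}
print(bonferroni_cutoff(8590))
print(significant_markers(example_p, n_tests=8590))
```

Bonferroni is conservative when markers are in LD with each other, so other thresholds (for example, permutation-based ones) may be preferred in practice.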
Thank you