
Varalakshmi,B.C

Association Mapping approaches for tagging quality traits in Maize

What is association mapping?

Also called "association genetics", "association studies", or "linkage disequilibrium mapping"

(Oraguzie et al. 2007)

Tool to resolve complex trait variation down to the sequence level by exploiting historical and evolutionary recombination events at the population level.

(Nordborg & Tavare, 2002; Risch & Merikangas, 1996).

LD mapping detects and locates quantitative trait loci (QTL) by the strength of the correlation between a trait and a marker.

Offers greater precision in QTL location than family-based linkage analysis

Enables more efficient marker-assisted selection and facilitates gene discovery.

Does not require family or pedigree information, so it can be applied to a range of experimental and non-experimental populations.

Care must be taken during analysis to control for the increased rate of false-positive results arising from population structure.

(Mackay and Powell, 2007)
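The two points above (detection through marker-trait correlation, and the risk of false positives) can be illustrated with a small sketch. The data are hypothetical and deliberately constructed: the marker has no effect within either subpopulation, yet pooling the two subpopulations produces a clear marker-trait correlation because allele frequency and trait mean both differ between them. The function name `pearson_r` and all values are assumptions made for illustration.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, correlation undefined; report 0
    return sxy / (sxx * syy) ** 0.5

# Hypothetical inbred lines: marker coded 0/1, trait in arbitrary units.
# Subpopulation 1: allele 1 is rare and the trait mean is low.
pop1_marker = [0, 0, 0, 1, 0, 0, 0, 1]
pop1_trait = [10, 11, 9, 10, 10, 9, 11, 10]
# Subpopulation 2: allele 1 is common and the trait mean is high.
pop2_marker = [1, 1, 1, 0, 1, 1, 1, 0]
pop2_trait = [20, 21, 19, 20, 20, 19, 21, 20]

# Correlation when population structure is ignored
pooled_r = pearson_r(pop1_marker + pop2_marker, pop1_trait + pop2_trait)
# Correlation within each subpopulation
within_r1 = pearson_r(pop1_marker, pop1_trait)
within_r2 = pearson_r(pop2_marker, pop2_trait)

print(f"pooled r = {pooled_r:.3f}")
print(f"within-subpopulation r = {within_r1:.3f}, {within_r2:.3f}")
```

In this constructed example the pooled correlation is spurious: it disappears once each subpopulation is analysed separately, which is the effect that structure-aware models are meant to control.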

Why association mapping?

New tool: resolves trait variation down to the sequence level; identifies novel and superior alleles.

Genomic technology: sequencing technologies have markedly reduced the cost; sequencing, gene expression profiling, comparative genomics.

Natural diversity: harnesses the genetic diversity of natural populations down to individual nucleotides; annotated genome sequence from model species.

(Hansen et al., 2001; Kraakman et al., 2006; Zhu et al., 2008)

Genomic technologies for high-throughput genome sequencing

Diversity Panel

Comparison of Association Genetics and Conventional QTL mapping

| Attribute | QTL mapping | Association genetics |
| --- | --- | --- |
| Detection goal | Quantitative trait locus, i.e., wide region within specific pedigrees within which a QTL is located | Quantitative trait nucleotide, i.e., physically as close as possible to causative sequence(s) |
| Resolution of causative trait polymorphism | Low: only moderate-density linkage maps required | High: disequilibrium within small physical regions, requiring many markers |
| Marker discovery costs | Moderate | Moderate for few traits, high for many traits |
| Experimental populations for detection | Defined pedigrees, e.g., backcross, F2, RI, three- and two-generation pedigrees/families, half-sib families, etc. | Unrelated individuals ("unstructured" populations), large numbers of small unrelated families |
| Number of markers required for genome coverage | 10^2 to low 10^3 | 10^5 for small genomes, ~10^9 for large genomes |

Linkage analysis and Association Mapping

[Figure: linkage analysis compared with association mapping]

Advantages of Using Natural Population

Broader genetic variation, with a wider background for marker-trait correlations.

Higher-resolution mapping (historical recombination events).

Exploits historically measured trait data for association.

No need to develop expensive and tedious bi-parental populations.

(Hansen et al., 2001; Kraakman et al., 2006)

RILs vs. Association Mapping Panel

Morrell et al., 2011

Scheme of association mapping or tagging a gene of interest using germplasm accessions.

(Nordborg et al., 2005)

Types of Association Mapping

Genome-wide association mapping (GWAS)

Comprehensive approach to systematically search the genome for causal genetic variation.

A large number of markers are tested for association with complex traits.

Prior information regarding candidate genes is not required.

Works best for a research consortium with complementary expertise and adequate funding.

Candidate-gene association mapping

Candidate genes are selected based on knowledge from mutational analysis, biochemical pathways, or linkage analysis.

An independent set of random markers needs to be scored to infer genetic relationships.

Low-cost, hypothesis-driven, and trait-specific approach, but it will miss other unknown loci.

(Zhu et al., 2010).

Principle of association mapping: linkage disequilibrium (LD)

(Oraguzie et al., 2007)

Linkage refers to the co-inheritance of different loci that lie within a given genetic distance of each other on a chromosome.

Linkage equilibrium (LE) is the random association of alleles at different loci: each haplotype frequency equals the product of its allele frequencies.

Linkage disequilibrium (LD) is the non-random association of alleles at different loci: haplotype frequencies in the population differ from the product of the allele frequencies.

Concept of LD

Linkage disequilibrium is also referred to as "gametic phase disequilibrium" (GPD) or "gametic disequilibrium" (GLD).

It was first described by Jennings in 1917, and its quantification (D, the coefficient of LD) was developed by Lewontin in 1964.

D is the difference between the observed gametic frequencies of haplotypes and the gametic haplotype frequencies expected under linkage equilibrium.

$D = P_{AB} - P_A P_B = P_{AB} P_{ab} - P_{Ab} P_{aB}$

Besides D, various other measures of LD are used, including D′, r², D², and D*.

(Oraguzie et al., 2007)

Choosing appropriate LD measures depends on the objective of the study.

r² is the square of the correlation coefficient between the two loci.

r² is affected by both mutation and recombination, whereas D′ is affected more by mutational history.

The r² value varies from 0 to 1.

An r² value of 0.1 (10%) or above is considered significant.

(Abdurakhmonov and Abdukarimov, 2008)
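The LD measures discussed above can be computed directly from the four haplotype frequencies of a pair of biallelic loci. The following is a minimal sketch, assuming the standard definitions of D, Lewontin's normalised D′ (reported here as an absolute value), and r²; the function name `ld_measures` and the example frequencies are hypothetical.

```python
def ld_measures(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, D_prime, r2) from the four two-locus haplotype frequencies."""
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab  # frequency of allele A at locus 1
    p_B = p_AB + p_aB  # frequency of allele B at locus 2
    D = p_AB - p_A * p_B
    # Largest |D| possible given the allele frequencies
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(D) / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = D * D / denom if denom > 0 else 0.0
    return D, d_prime, r2

# Hypothetical haplotype frequencies with an excess of AB and ab haplotypes
D, d_prime, r2 = ld_measures(0.4, 0.1, 0.1, 0.4)
print(f"D = {D:.3f}, D' = {d_prime:.3f}, r2 = {r2:.3f}")

# Under linkage equilibrium every haplotype frequency is the product
# of its allele frequencies, so all three measures are zero
D_le, d_prime_le, r2_le = ld_measures(0.25, 0.25, 0.25, 0.25)
```

With these example frequencies r² is well above the 0.1 level mentioned above, so the pair of loci would be counted as being in LD.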

Calculation and visualization of LD: LD triangle and decay plots

LD can be calculated using haplotyping algorithms.

One such approach is maximum likelihood estimation (MLE) of haplotype frequencies using an expectation-maximization (EM) algorithm.

A graphical display of pairwise LD between loci is useful for assessing LD patterns measured with a large number of molecular markers.
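A minimal sketch of the EM idea for two biallelic loci is given below. It assumes unphased diploid genotype counts, where the only ambiguous class is the double heterozygote (AaBb), which may carry AB/ab or Ab/aB. The function name `em_haplotype_freqs`, the count layout, and the example data are hypothetical; this is not the algorithm of any specific software package.

```python
def em_haplotype_freqs(geno_counts, n_iter=100, tol=1e-10):
    """Estimate two-locus haplotype frequencies by EM.

    geno_counts[i][j] is the number of individuals carrying
    i copies of allele A and j copies of allele B (i, j in 0..2).
    """
    g = geno_counts
    n = sum(sum(row) for row in g)
    if n == 0:
        raise ValueError("no individuals")
    # Haplotype counts from genotypes whose phase is unambiguous
    base = {
        "AB": 2 * g[2][2] + g[2][1] + g[1][2],
        "Ab": 2 * g[2][0] + g[2][1] + g[1][0],
        "aB": 2 * g[0][2] + g[1][2] + g[0][1],
        "ab": 2 * g[0][0] + g[1][0] + g[0][1],
    }
    n_dh = g[1][1]  # double heterozygotes, phase unknown
    total = 2 * n   # total number of haplotypes
    p = {h: 0.25 for h in base}
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote is AB/ab
        cis = p["AB"] * p["ab"]
        trans = p["Ab"] * p["aB"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: re-estimate frequencies from expected haplotype counts
        new = {
            "AB": (base["AB"] + w * n_dh) / total,
            "Ab": (base["Ab"] + (1 - w) * n_dh) / total,
            "aB": (base["aB"] + (1 - w) * n_dh) / total,
            "ab": (base["ab"] + w * n_dh) / total,
        }
        converged = max(abs(new[h] - p[h]) for h in p) < tol
        p = new
        if converged:
            break
    return p

# Hypothetical counts: 10 aabb, 10 AaBb and 10 AABB individuals
counts = [[10, 0, 0],
          [0, 10, 0],
          [0, 0, 10]]
freqs = em_haplotype_freqs(counts)
print(freqs)
```

For fully homozygous inbred lines there are no double heterozygotes, so the haplotype frequencies can be counted directly and the EM step is unnecessary.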


Software used for calculation of LD

"Graphical Overview of Linkage Disequilibrium" (GOLD), used to depict the structure and pattern of LD.

"Trait Analysis by aSSociation, Evolution and Linkage" (TASSEL) and PowerMarker

TASSEL-generated triangle plot for pairwise LD


Each cell represents the comparison of a pair of marker sites, colour-coded for the presence of significant LD.

LD decay plot

To estimate the size of LD blocks, the r² values (alternatively, D′ can also be used) are usually plotted against the genetic (cM) or weighted (bp) distance; this is referred to as an "LD decay plot".
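The data behind an LD decay plot can be prepared by averaging r² within distance bins. The sketch below assumes a list of (distance, r²) pairs for marker pairs; the function names, the 50-unit bin width, and the example values are hypothetical, and the 0.1 cut-off is the significance level quoted earlier in these slides.

```python
def ld_decay_bins(pairs, bin_width):
    """Average r2 within distance bins.

    pairs: iterable of (distance, r2) for marker pairs.
    Returns a list of (bin_start, mean_r2, n_pairs) sorted by distance.
    """
    sums = {}
    for dist, r2 in pairs:
        b = int(dist // bin_width)
        s, n = sums.get(b, (0.0, 0))
        sums[b] = (s + r2, n + 1)
    return [(b * bin_width, s / n, n) for b, (s, n) in sorted(sums.items())]

def decay_distance(binned, threshold=0.1):
    """Start of the first bin whose mean r2 falls below the threshold."""
    for start, mean_r2, _ in binned:
        if mean_r2 < threshold:
            return start
    return None  # LD never dropped below the threshold

# Hypothetical (distance in bp, r2) values for marker pairs
pairs = [(10, 0.9), (40, 0.7), (60, 0.5), (90, 0.3),
         (120, 0.15), (140, 0.11), (160, 0.05), (190, 0.03)]
binned = ld_decay_bins(pairs, bin_width=50)
extent = decay_distance(binned, threshold=0.1)
for start, mean_r2, n in binned:
    print(f"{start:>4} bp  mean r2 = {mean_r2:.2f}  (n = {n})")
print("LD decays below 0.1 from", extent, "bp")
```

Plotting `mean_r2` against `bin_start` gives the decay curve; in practice a smoothed or fitted curve is often used instead of raw bin means.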


Factors affecting LD and association mapping

Mutation and recombination are among the strongest factors influencing LD.

Factors increasing LD: new mutation, mating system (self-pollination), genetic isolation, population structure, relatedness (kinship), genetic drift, admixture, selection (natural, artificial).

Factors decreasing LD: high recombination and mutation rates, recurrent mutations, outcrossing (Huttley et al., 2005).
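The effect of recombination on LD can be made concrete with the classical result that, in a large random-mating population, D shrinks by a factor of (1 - c) each generation, where c is the recombination fraction between the two loci. The sketch below assumes exactly that idealised model (no drift, selection, or selfing); the function names and example values are hypothetical.

```python
import math

def ld_after_generations(d0, recomb_fraction, generations):
    """D after a number of generations of random mating: D_t = D_0 * (1 - c)^t."""
    if not 0.0 <= recomb_fraction <= 0.5:
        raise ValueError("recombination fraction must be between 0 and 0.5")
    return d0 * (1.0 - recomb_fraction) ** generations

def generations_to_halve(recomb_fraction):
    """Number of generations needed for D to fall to half its starting value."""
    if not 0.0 < recomb_fraction <= 0.5:
        raise ValueError("recombination fraction must be in (0, 0.5]")
    return math.log(0.5) / math.log(1.0 - recomb_fraction)

# Unlinked loci (c = 0.5): LD halves every generation
d_unlinked = ld_after_generations(0.25, 0.5, 1)
# Tightly linked loci (c = 0.01): LD persists for many generations
d_linked = ld_after_generations(0.25, 0.01, 100)
half_life = generations_to_halve(0.01)
print(d_unlinked, round(d_linked, 4), round(half_life, 1))
```

Self-pollination reduces the effective recombination rate, which is one reason LD extends further in selfing species than in outcrossing ones.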

Need for association mapping in maize

Source of cooking oil, biofuel, and animal feed.

Model organism for cytogenetics, genetics, genomics, and functional genomics studies (Strable and Scanlon, 2009).

Primary staple food in many African countries.

Map-based cloning of QTLs is a time-consuming and expensive process in maize.

Association mapping can explore all recombination events and mutations in a given population, and at higher resolution.

(Yu and Buckler, 2006)

Examples of the range of phenotypic variation in maize germplasm held in the CIMMYT genebank (Dr. Suketoshi Taba).

Nested Association Mapping (NAM)

Joint linkage and linkage disequilibrium mapping has been proposed as a "fine mapping" approach (Mott and Flint, 2002; Wu et al., 2002).

NAM is currently implemented in maize.

Powerful strategy for dissecting the genetic basis of quantitative traits in species with low LD.

For other crop species, different genetic designs (e.g., diallel, eight-way cross) could be used to accommodate the level of LD.

NAM allows high-power, cost-effective genome scans and helps link molecular variation with complex trait variation.

(Yu et al., 2008)

Nested Association Mapping

| Population | Sample size | Background markers | Association method | Candidate genes | Traits | References |
| --- | --- | --- | --- | --- | --- | --- |
| Diverse inbred lines | 97 | 47 | LR+Q | ae1, bt2, sh1, sh2, sugary1, waxy1 | Kernel composition & starch pasting properties | Wilson et al., 2004 |
| | 42 | 101 | LR+Q, GLM–Q | bm3 | Forage quality traits | Lübberstedt et al., 2005 |
| | 57 | --- | Haplotype tree scanning | Sugary1 | Sweet taste | Tracy et al., 2006 |
| | 281 | 89 plus 553 | MLM | crtRB1 | Carotenoid content | Yan et al., 2010 |
| Elite lines | 71 | --- | Unknown | DGAT | Oil content & composition | Zheng et al., 2008 |
| Elite inbred lines | 75 | --- | Case-control | Y1 | Endosperm colour | Palaisa et al., 2003 |

Application of candidate gene strategy to identify CrtRB1 locus

β-carotene biosynthetic pathway

Simplified carotenoid biosynthetic pathway in maize and (Tian et al., 2001).

crtRB1 is the target gene

Zea mays crtRB1 is the target gene in the present study. Translated exons are depicted as black boxes.

Methods

Germplasm evaluation

Panel 1 (P1): 281 maize inbred lines grown in Urbana, Illinois (USA) in 2002–2005.

Panel 2 (P2): 245 diverse maize inbred lines derived from tropical and subtropical adapted maize germplasm.

Panel 3 (P3): 55 diverse maize inbred lines derived from temperate-adapted maize germplasm.

Carotenoid Quantification

HPLC analysis: carotenoids from all segregating mapping populations were extracted and quantified by HPLC.

(Kurilich and Juvik, 1999).

Population structure and kinship analysis

Population structure and kinship for P1 were estimated using 89 simple sequence repeat (SSR) markers and 553 SNP markers, respectively

(Yu et al., 2006).

STRUCTURE 2.1 was used to estimate the population structure of P2 and P3 using 46 and 86 SSRs, respectively.
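To give a feel for what a kinship matrix contains, the sketch below computes a simple allele-sharing proportion between pairs of inbred lines from background markers. This is only an illustrative proxy, not the kinship estimator used in the study or in STRUCTURE; the function name `allele_sharing_matrix`, the line names, and the genotypes are hypothetical.

```python
def allele_sharing_matrix(genotypes):
    """Pairwise proportion of markers at which two inbred lines share an allele.

    genotypes: dict mapping line name -> list of allele codes,
    one per marker (None marks missing data).
    """
    names = list(genotypes)
    matrix = {a: {} for a in names}
    for a in names:
        for b in names:
            shared = 0
            compared = 0
            for x, y in zip(genotypes[a], genotypes[b]):
                if x is None or y is None:
                    continue  # skip markers missing in either line
                compared += 1
                if x == y:
                    shared += 1
            matrix[a][b] = shared / compared if compared else float("nan")
    return matrix

# Hypothetical inbred lines scored at four background markers
lines = {
    "line_a": [1, 1, 0, 0],
    "line_b": [1, 1, 0, 1],
    "line_c": [0, 0, 1, None],
}
kin = allele_sharing_matrix(lines)
print(kin["line_a"]["line_b"], kin["line_a"]["line_c"])
```

Random background markers are used for this purpose so that the similarity reflects genome-wide relatedness rather than the candidate gene itself.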

Linkage mapping and QTL mapping

crtRB1 was mapped via genetic linkage mapping in a RIL population derived from B73 and BY80415, using the crtRB1 3′TE polymorphism.

QTL analysis in this population was done using QTL Cartographer 2.5

(Wang et al., 2005).

Statistical analysis

Association analysis was carried out using a mixed model incorporating kinship and population structure, as implemented in TASSEL 2.1 (Bradbury et al., 2007).

LD analysis was carried out using TASSEL 2.1 with the entire sequence of crtRB1; a window size of 50 bp was used to plot the average r² against the distance.
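For orientation, a mixed model that incorporates both population structure (Q) and kinship (K) is commonly written in the general form below, following Yu et al. (2006). The notation is an assumption for illustration: the slides do not give the exact parameterisation used in the study.

```latex
\begin{equation}
  y = X\beta + S\alpha + Qv + Zu + e,
  \qquad u \sim N(0,\, 2K\sigma_g^2),
  \qquad e \sim N(0,\, I\sigma_e^2)
\end{equation}
```

Here y is the vector of phenotypes, β holds fixed effects other than the marker and population structure, α is the effect of the marker being tested, v holds the population-structure effects (with Q the subpopulation membership matrix), u holds the polygenic background effects whose covariance is determined by the kinship matrix K, and e is the residual. X, S, and Z are the corresponding incidence matrices.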

5′TE allelic series: 1, 397-bp insertion; 2, 206-bp insertion; 3, 0-bp insertion. InDel4 allelic series: 12-bp or 0-bp insertion. 3′TE allelic series: 1, no insertion; 2, 325-bp insertion; 3, 1,250-bp insertion.

P values are from association analysis carried out using the mixed model incorporating population structure and kinship, using data from 4 different years.

R² values are from analysis of variance (ANOVA) and show the percentage of phenotypic variation explained.

Haplotype is shown as a linear combination of the 5′TE allele (1, 397-bp insertion; 2, 206-bp insertion; 3, 0-bp insertion), the InDel4 allele (12-bp or 0-bp insertion), and the 3′TE allele (1, no insertion; 2, 325-bp insertion; 3, 1,250-bp insertion).

Allele-specific crtRB1 effects on biochemical activity and transcriptional expression.

CrtRB1 quantitative RT-PCR from whole kernel at 15 d after pollination (DAP) and seedling leaf messenger RNA for the six indicated lines of Zea mays.

β-carotene hydroxylase product profiles for the four CRTRB1 allozymes expressed in a recombinant E. coli assay system producing β-carotene. Genetic variation for each allozyme is listed according to InDel4 and C-terminal (3′TE) differences.

Whole genome scan association mapping for oleic acid content

To identify loci with major effect on oleic acid content in maize kernels.

8,590 loci were tested for association with oleic acid content in 553 maize inbreds.

A single locus with major effect on oleic acid was mapped between 380 and 384 cM in the IBM2 neighbors genetic map on chromosome 4 and confirmed in a biparental population.

A fatty acid desaturase, fad2, identified approximately 2 kb from the associated genetic marker, is the most likely candidate gene responsible for the difference in the phenotype.

Non-conservative amino acid polymorphism near the active site of fad2 contributes to the effect on oleic acid content.

First report on the use of high-resolution whole-genome scan association mapping.

Materials and Methods

Whole genome scan association mapping

Single nucleotide polymorphism (SNP) haplotypes at 8,590 genetic loci were genotyped in 553 maize inbred lines.

A statistical test for association between haplotypes and embryo oleic acid content was performed using the STRUCTURE program (Pritchard et al. 2000).

LD was computed between the locus of interest and all other loci using r² (Devlin and Risch 1995).
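Testing thousands of loci inflates the number of chance associations, so a genome-scan P-value is normally compared against a corrected threshold. The sketch below assumes a Bonferroni correction at a 5% family-wise error rate; the slides do not state which correction the study used, so this is an illustration only, and the function name `bonferroni_threshold` is hypothetical.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

n_loci = 8590  # number of loci tested in the genome scan
threshold = bonferroni_threshold(0.05, n_loci)
# Association results are usually plotted as -log10(P)
neg_log10_threshold = -math.log10(threshold)
print(f"P threshold = {threshold:.2e}")
print(f"-log10(P) threshold = {neg_log10_threshold:.2f}")
```

Under this assumption, a marker would need a -log10(P) value above roughly 5.2 to be declared significant across 8,590 tests.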

Results

Boxes: domain regions of the protein sequence. Horizontal grey arrows in both sequences: coding region. Vertical bars: nucleotide polymorphisms between the two alleles; half-length vertical bars: synonymous substitutions. Triangles: amino acid substitutions. Lines across both sequences: deletions and insertions. Black triangle: non-conservative amino acid substitution of a leucine by a threonine.

Comparison of Low-oleic Acid Content (Lo) Against High-oleic Acid Content (Ho) Alleles of fad2.

Association mapping of the markers MZA10924, MZA4015, and MZA5102 (top) and linkage disequilibrium (LD) of all markers against MZA10924 (bottom).

Vertical scale: negative logarithm of the association mapping P-value. Horizontal scale: genetic position in cM on Pioneer's genetic map.

Thank you
