assocn mapg for class

Upload: lordniklaus

Post on 04-Jun-2018

  • 8/13/2019 Assocn Mapg for Class

    1/35

    ASSOCIATION MAPPING IN PLANTS

    2/35

    Approaches to dissect complex traits

    1. Linkage analysis (QTL mapping)
    2. Association mapping (linkage disequilibrium based)
    3. Candidate gene studies through genomic approaches

    3/35

    Disadvantages of linkage mapping

    The amount of genetic variation between any two parents is limited.

    The genetic backgrounds in which QTL mapping is done are not always representative of the crop's genetic background.

    Only a few generations of effective recombination take place, so long chromosome segments remain in LD and resolution is reduced: homozygosity is reached quickly, which makes dissection of a genomic region difficult (fine mapping is not possible).

    4/35

    Linkage mapping counts recombination events between markers and the trait of interest (linkage) in a biparental population.

    Association mapping measures the correlation between marker alleles and trait alleles in a population (linkage disequilibrium).

    Association analysis, also known as LD mapping or association mapping, is a population-based survey used to identify trait-marker relationships based on linkage disequilibrium (Flint-Garcia et al. 2003).

    5/35

    How does one proceed?

    Haplotype: a set of closely linked genetic markers on a chromosome that tend to be inherited together.

    6/35

    Linkage vs Association

                          Linkage                          Association (LD mapping)
    Sampling              Family-based                     Families or unrelated individuals
    Genome coverage       Few markers (300-400 SSRs)       Many markers (10^5 to 10^6 SNPs)
    Use                   Good for initial detection;      Poor for initial detection;
                          poor for fine-mapping            good for fine-mapping
    Variants              Powerful for rare variants       Powerful for common variants;
                                                           rare variants generally impossible

    The two approaches are complementary: the idea is to understand the molecular genetic basis of phenotypic variation.

    7/35

    Gene and genotype frequencies

    Locus 1: allele A frequency = $p_A$, allele a frequency = $p_a$

    $p_A + p_a = 1$;  $p_{AA} + p_{Aa} + p_{aa} = 1$

    Locus 2: allele B frequency = $p_B$, allele b frequency = $p_b$

    $p_B + p_b = 1$;  $p_{BB} + p_{Bb} + p_{bb} = 1$

    What is $p_{AB}$ (gamete frequency) when there is linkage disequilibrium? And when there is no linkage?

    Hardy-Weinberg equilibrium (HWE) is restored after a single generation of random mating only when the individual loci are considered singly, one at a time; equilibrium between loci is not restored in a single generation.
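As a small numerical illustration of the single-locus identities above, the sketch below estimates allele frequencies from genotype counts and then computes the genotype frequencies expected under HWE. The genotype counts and the function names are hypothetical, chosen only for illustration.

```python
def allele_freqs(n_AA: int, n_Aa: int, n_aa: int) -> tuple[float, float]:
    """Estimate (pA, pa) from genotype counts; each diploid individual carries two alleles."""
    n = n_AA + n_Aa + n_aa
    p_A = (2 * n_AA + n_Aa) / (2 * n)
    return p_A, 1.0 - p_A


def hwe_genotype_freqs(p_A: float) -> tuple[float, float, float]:
    """Genotype frequencies (pAA, pAa, paa) expected under HWE: p^2, 2pq, q^2."""
    p_a = 1.0 - p_A
    return p_A ** 2, 2 * p_A * p_a, p_a ** 2


# Hypothetical genotype counts at locus 1 (100 individuals).
p_A, p_a = allele_freqs(n_AA=36, n_Aa=48, n_aa=16)
p_AA, p_Aa, p_aa = hwe_genotype_freqs(p_A)

print(f"pA = {p_A:.2f}, pa = {p_a:.2f}")                          # pA + pa = 1
print(f"pAA = {p_AA:.2f}, pAa = {p_Aa:.2f}, paa = {p_aa:.2f}")    # pAA + pAa + paa = 1
```

With these made-up counts the observed genotypes happen to match the HWE expectation exactly; a real sample would usually deviate slightly.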

    8/35

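    Linkage Disequilibrium

    $P_{AB} \neq P_A P_B$
    $P_{Ab} \neq P_A P_b = P_A(1-P_B)$
    $P_{aB} \neq P_a P_B = (1-P_A)P_B$
    $P_{ab} \neq P_a P_b = (1-P_A)(1-P_B)$

    Haplotype frequencies (rows: SNP 1 alleles; columns: SNP 2 alleles):

                B         b         Total
    A           P_AB      P_Ab      P_A
    a           P_aB      P_ab      P_a
    Total       P_B       P_b       1.0

    Linkage Disequilibrium (LD) is a measure of the non-random association of alleles at two different loci: under LD, the haplotype frequencies differ from the products of the corresponding allele frequencies.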

    9/35

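    Whatever we measure as LD is in fact gametic phase disequilibrium; it is truly linkage disequilibrium only when it is due to linkage. (Corollary?)

    $D = ru - st$, where r and u are the frequencies of the coupling gametes AB and ab, and s and t are the frequencies of the repulsion gametes Ab and aB.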

    10/35

    Factors affecting LD

    Factors increasing LD                      Factors decreasing LD
    Mutation                                   High recombination
    Self pollination                           Recurrent mutation
    Genetic isolation                          Outcrossing
    Population admixture                       Gene conversion
    Small founder population / genetic drift
    Selection
    Epistasis

    11/35

    Linkage Disequilibrium Example: How does LD arise?

    Before mutation                  After mutation
    -- A -- -- -- G -- -- --         -- A -- -- -- G -- -- --
    -- C -- -- -- G -- -- --         -- C -- -- -- G -- -- --
                                     -- C -- -- -- C -- -- --
    SNP 1: P_A = 1/2, P_C = 1/2      SNP 1: P_A = 1/3, P_C = 2/3
    SNP 2: P_G = 1                   SNP 2: P_G = 2/3, P_C = 1/3

    There are only three haplotypes: AG, CG, and CC. There is no AC haplotype, i.e., $P_{AC} = 0$. However, $P_A P_C = (1/3)(1/3) = 1/9$ (here $P_C$ is the frequency of C at SNP 2), thus $P_A P_C \neq P_{AC}$. These two SNPs are in linkage disequilibrium.

    12/35

    Linkage Equilibrium Example: How does LD disappear?

    Before recombination             After recombination
    -- A -- -- -- G -- -- --         -- A -- -- -- G -- -- --
    -- C -- -- -- G -- -- --         -- C -- -- -- G -- -- --
    -- C -- -- -- C -- -- --         -- C -- -- -- C -- -- --
                                     -- A -- -- -- C -- -- --
                                     SNP 1: P_A = 1/2, P_C = 1/2
                                     SNP 2: P_G = 1/2, P_C = 1/2

    After recombination, $P_{AG} = P_A P_G = 1/4$, $P_{CG} = P_C P_G = 1/4$, $P_{CC} = P_C P_C = 1/4$, and $P_{AC} = P_A P_C = 1/4$. Thus, these two SNPs are in linkage equilibrium.
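The two worked examples above can be checked by counting haplotypes. The minimal sketch below assumes each haplotype is written as a two-letter string (SNP 1 allele, then SNP 2 allele) and compares the observed frequency of the AC haplotype with the product $P_A P_C$ expected under linkage equilibrium. Exact fractions are used so that the 1/9 and 1/4 values come out exactly; the function name `frequencies` is our own.

```python
from collections import Counter
from fractions import Fraction


def frequencies(haplotypes: list[str]):
    """Return haplotype, SNP-1 allele and SNP-2 allele frequencies as exact fractions."""
    n = len(haplotypes)
    hap = {h: Fraction(c, n) for h, c in Counter(haplotypes).items()}
    snp1 = {a: Fraction(c, n) for a, c in Counter(h[0] for h in haplotypes).items()}
    snp2 = {a: Fraction(c, n) for a, c in Counter(h[1] for h in haplotypes).items()}
    return hap, snp1, snp2


examples = {
    "after mutation": ["AG", "CG", "CC"],
    "after recombination": ["AG", "CG", "CC", "AC"],
}

results = {}
for label, haps in examples.items():
    hap, snp1, snp2 = frequencies(haps)
    observed = hap.get("AC", Fraction(0))   # P_AC (0 if the haplotype is absent)
    expected = snp1["A"] * snp2["C"]        # P_A * P_C, with C taken at SNP 2
    results[label] = (observed, expected)
    state = "equilibrium" if observed == expected else "disequilibrium"
    print(f"{label}: P_AC = {observed}, P_A*P_C = {expected} -> linkage {state}")
```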

    13/35

    Measure of LD: D Coefficient

    The non-randomness of two loci is measured by a deviation D, defined as follows:

    $D = P_{AB}P_{ab} - P_{Ab}P_{aB}$

    $P_{AB} = P_A P_B + D$
    $P_{Ab} = P_A(1-P_B) - D$
    $P_{aB} = (1-P_A)P_B - D$
    $P_{ab} = (1-P_A)(1-P_B) + D$

    D = 0 when the two loci are in linkage equilibrium.
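A minimal sketch of the D coefficient, assuming the four haplotype frequencies are already known (the values below are hypothetical). It computes D from the definition above, checks the four identities, and confirms the equivalent single-haplotype form $D = P_{AB} - P_A P_B$.

```python
import math


def ld_coefficient(p_AB: float, p_Ab: float, p_aB: float, p_ab: float) -> float:
    """D = P_AB * P_ab - P_Ab * P_aB; the four haplotype frequencies must sum to 1."""
    assert math.isclose(p_AB + p_Ab + p_aB + p_ab, 1.0)
    return p_AB * p_ab - p_Ab * p_aB


# Hypothetical haplotype frequencies for two biallelic SNPs.
p_AB, p_Ab, p_aB, p_ab = 0.5, 0.1, 0.1, 0.3
p_A = p_AB + p_Ab  # marginal allele frequency of A
p_B = p_AB + p_aB  # marginal allele frequency of B

D = ld_coefficient(p_AB, p_Ab, p_aB, p_ab)
print(f"D = {D:.2f}")

# Each haplotype frequency is the product of its allele frequencies shifted by +D or -D.
assert math.isclose(p_AB, p_A * p_B + D)
assert math.isclose(p_Ab, p_A * (1 - p_B) - D)
assert math.isclose(p_aB, (1 - p_A) * p_B - D)
assert math.isclose(p_ab, (1 - p_A) * (1 - p_B) + D)
# Equivalent single-haplotype form of D.
assert math.isclose(D, p_AB - p_A * p_B)
```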

    14/35

    Standardization of D Coefficient

    The D coefficient is normalized (Lewontin, 1964) because the range of D is always determined by the allele frequencies.

    $D' = D/D_{max}$, where $D_{max}$ is the absolute maximal possible value of D given the allele frequencies. Because all four haplotype frequencies must be non-negative, $-\min(P_A P_B, P_a P_b) \le D \le \min(P_A P_b, P_a P_B)$, so

    $$D_{max} = \begin{cases} \min(P_A P_b,\ P_a P_B), & \text{if } D > 0; \\ \min(P_A P_B,\ P_a P_b), & \text{if } D < 0. \end{cases}$$
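The sign-dependent choice of $D_{max}$ is easy to get wrong, so here is a minimal sketch of Lewontin's D' under the definitions above. The haplotype and allele frequencies in the usage lines are hypothetical.

```python
def d_prime(p_AB: float, p_A: float, p_B: float) -> float:
    """Lewontin's D' = D / Dmax, with Dmax chosen according to the sign of D."""
    p_a, p_b = 1.0 - p_A, 1.0 - p_B
    D = p_AB - p_A * p_B
    if D > 0:
        d_max = min(p_A * p_b, p_a * p_B)
    elif D < 0:
        d_max = min(p_A * p_B, p_a * p_b)
    else:
        return 0.0
    return D / d_max  # always lies in [-1, 1]


# Hypothetical allele frequencies pA = pB = 0.5 with different P_AB values.
print(round(d_prime(0.50, 0.5, 0.5), 3))  # → 1.0 (only AB and ab haplotypes: perfect positive LD)
print(round(d_prime(0.40, 0.5, 0.5), 3))  # → 0.6 (partial positive LD)
print(round(d_prime(0.25, 0.5, 0.5), 3))  # → 0.0 (P_AB = pA * pB: linkage equilibrium)
print(round(d_prime(0.00, 0.5, 0.5), 3))  # → -1.0 (no AB haplotype: perfect negative LD)
```

Because the denominator tracks the allele frequencies, D' reaches ±1 whenever one of the four haplotypes is missing, even if the alleles are rare.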

    15/35

    Interpretation of D'

    D' is constrained between -1 and +1.

    D' = 1 (perfect positive LD between SNP alleles)
    D' = 0 (linkage equilibrium between SNP alleles)
    D' = -1 (perfect negative LD between SNP alleles)
    D' = 0.87 (strong positive LD between SNP alleles)
    D' = 0.12 (weak positive LD between SNP alleles)

    16/35

    Measure of LD: $r^2$ (Hill and Robertson, 1968)

    $$r^2 = \frac{(P_{AB} - p_A p_B)^2}{p_A\, p_a\, p_B\, p_b}, \qquad 0 \le r^2 \le 1$$

    The most relevant LD measure for association mapping.

    LD decay is taken as the distance at which $r^2$ drops to 0.1-0.2.

    $r^2$ is the square of the correlation coefficient between the two loci when alleles are coded as binary (0/1) variables.

    If $r^2$ is the LD between a SNP and a QTL, and $h^2_q$ is the proportion of trait variation due to the QTL, then $r^2 h^2_q$ is the trait variation that the SNP can explain.

    $r^2 = 2/K$, where K is the number of chromosomes.
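The sketch below, using a hypothetical sample of ten two-SNP haplotypes, computes $r^2$ from the formula above and checks that it equals the squared Pearson correlation of the 0/1-coded alleles. The $h^2_q$ value is also made up, and SNP 2 is treated as if it were the QTL purely to illustrate the $r^2 h^2_q$ product.

```python
def r_squared(p_AB: float, p_A: float, p_B: float) -> float:
    """Hill and Robertson's r^2 = (P_AB - pA*pB)^2 / (pA * pa * pB * pb)."""
    D = p_AB - p_A * p_B
    return D ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))


def pearson_r(x: list[int], y: list[int]) -> float:
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    vx = sum((xi - mx) ** 2 for xi in x) / n
    vy = sum((yi - my) ** 2 for yi in y) / n
    return cov / (vx * vy) ** 0.5


# Hypothetical sample of 10 two-SNP haplotypes: 5 AB, 1 Ab, 1 aB, 3 ab.
haplotypes = ["AB"] * 5 + ["Ab", "aB"] + ["ab"] * 3
x = [1 if h[0] == "A" else 0 for h in haplotypes]  # SNP 1 coded 1 for allele A
y = [1 if h[1] == "B" else 0 for h in haplotypes]  # SNP 2 coded 1 for allele B

n = len(haplotypes)
r2_formula = r_squared(haplotypes.count("AB") / n, sum(x) / n, sum(y) / n)
r2_corr = pearson_r(x, y) ** 2
print(f"r^2 from formula = {r2_formula:.4f}, squared correlation = {r2_corr:.4f}")

# Hypothetical: SNP 2 stands in for a QTL explaining 20% of the trait variation.
h2_q = 0.20
explained = r2_formula * h2_q
print(f"trait variation explainable by SNP 1 = {explained:.3f}")
```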

    17/35

    Decay of LD over Time

    Recombination between chromosomes gradually breaks down LD, so that linkage equilibrium is eventually attained.

    18/35

    Allelic Association

    [Figure: marker genotypes (e.g., 3/6, 2/4, 6/6) of controls and cases.]

    Allele 6 is associated with the trait of interest.

    19/35

    Allelic Association: Three Common Forms

    Direct association: the mutant or susceptible polymorphism itself; the allele of interest is itself involved in the phenotype.

    Indirect association: the allele itself is not involved, but it is correlated with a nearby variant that changes the phenotype.

    Spurious association: an apparent association not related to genetic causes (the most common outcome).

    Linkage disequilibrium: correlation between (any) markers in a population.
    Allelic association: correlation between a marker allele and the trait.

    20/35

    Indirect and Direct Allelic Association

    Direct association: measure the trait relevance of the variant (*) at locus D directly, ignoring correlated markers nearby.

    Indirect association and LD: assess the trait effects of the QTL at locus D via correlated markers (M1, M2, ..., Mn).

    21/35

    How far apart can markers be to detect association?

    Expected decay of linkage disequilibrium:

    $$D_t = (1-\theta)^t D_0$$

    where θ is the recombination fraction between the two loci and t is the number of generations.

    [Figure: D (y-axis, 0 to 1) against recombination fraction (x-axis, 0 to 0.1) after 10, 20, 50 and 250 generations.]
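The decay formula above can be tabulated directly. This minimal sketch reproduces the kind of curves shown in the figure (relative LD $D_t/D_0$ for a few recombination fractions and generation counts) and inverts the formula to answer the slide's question for one hypothetical case: how small θ must be for half of the initial LD to survive 250 generations. The helper names are our own.

```python
def ld_after_generations(d0: float, theta: float, t: int) -> float:
    """Expected LD after t generations of random mating: D_t = (1 - theta)^t * D_0."""
    return (1.0 - theta) ** t * d0


def max_theta_retaining(fraction: float, t: int) -> float:
    """Largest theta for which D_t / D_0 is still >= fraction after t generations."""
    # (1 - theta)^t >= fraction  <=>  theta <= 1 - fraction^(1/t)
    return 1.0 - fraction ** (1.0 / t)


thetas = [0.0, 0.02, 0.04, 0.06, 0.08, 0.10]
for t in (10, 20, 50, 250):
    row = ", ".join(f"{ld_after_generations(1.0, th, t):.3f}" for th in thetas)
    print(f"{t:>3} generations: D_t/D_0 = {row}")

print(f"half of D_0 survives 250 generations only if theta <= {max_theta_retaining(0.5, 250):.4f}")
```

With θ = 0.5 (unlinked loci) the same formula halves D in every generation, which is why long-lasting LD is expected only between tightly linked loci.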

    22/35

    Association Mapping for Crop Improvement (AM versus QTL Mapping)

    Since association mapping can be conducted directly on the breeding material:

    1. Direct inference from research to breeding is possible.
    2. Phenotypic variation is observed for most traits of interest.
    3. Marker polymorphism is higher than in biparental populations.
    4. No pedigrees or structured mapping populations are needed.
    5. Routine evaluations provide the phenotypic data.
    6. Higher resolution is possible because recombination over a very large number of generations is captured through the extent of haplotype sharing.
    7. Association mapping provides other useful information about the organization of genetic variation and polymorphism across the genome.

    23/35

    Types of Populations

    1. Germplasm Bank Collection
    A collection of genetic resources including landraces, exotic material and wild relatives.

    2. Synthetic Populations
    Outcrossing populations synthesized from inbred lines (segregating generations). May be used for recurrent selection.

    3. Elite Lines
    Inbred lines (and checks) manipulated with the objective of releasing new varieties in the short term.

    24/35

    Characteristics Related to Association Mapping
    (GP bank = germplasm bank; Elite GP = elite germplasm; h2 = heritability)

    S. No  Aspect of AM           GP bank                     Synthetic population          Elite GP
    1      Sample                 Core collection             Segregating progenies         Elite lines
    2      Sample turnover        Static                      Ephemeral                     Gradually substituted
    3      Source of phenotypic   Screenings                  Progeny tests                 Yield trials
           data
    4      Types of traits        High-h2 and domestication   Depends on the evaluation     Low-h2 traits
                                  traits                      scheme
    5      Type of marker         SNP                         SNP/SSR                       SSR
    6      LD                     Low                         Intermediate and fast         High
                                                              decaying

    25/35

    S. No  Aspect of AM           GP bank                     Synthetic population          Elite GP
    7      Population structure   Medium                      Low                           High
    8      Allele diversity       High                        Intermediate                  Low
           among samples
    9      Allele diversity       Variable                    1 or 2 alleles                1 allele
           within samples
    10     Power                  Low                         Intermediate and decreasing   High; could allow genomic scan
    11     Resolution             High; could allow fine      Intermediate and increasing   Low
                                  mapping
    12     Use of informative     Transfer of new alleles     Incorporation in selection    MAS in progenies
           markers                by marker-assisted          index                         (after validation)
                                  backcrossing (BC)

    26/35

    Summary (of characteristics)

    Germplasm bank core collections: for allele mining of candidate genes and fine-mapped QTLs.

    Elite lines: to detect genomic regions associated with traits of interest.

    Synthetic populations might represent a balance between power and precision, and have the major advantage of being unstructured.

    27/35

    Pearson chi-square test

    Yates' correction

    Fisher's exact test
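A minimal sketch of the three tests for a 2 x 2 table of allele counts (rows: cases and controls; columns: alleles A and a). The counts and helper names are hypothetical. The chi-square p-value uses the 1-degree-of-freedom identity $P(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$, and the two-sided Fisher p-value sums the probabilities of all tables with the same margins that are no more likely than the observed one. In practice a library routine (for example `scipy.stats.fisher_exact` or `scipy.stats.chi2_contingency`) would normally be used instead.

```python
from math import comb, erfc, sqrt


def chi_square_2x2(a: int, b: int, c: int, d: int, yates: bool = False) -> float:
    """Pearson chi-square for the table [[a, b], [c, d]], optionally with Yates' continuity correction."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi_square_p_value(stat: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(stat / 2))


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value with row and column totals fixed (hypergeometric)."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Small relative tolerance so tables tied with the observed one are not lost to rounding.
    return sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-7))


# Hypothetical allele counts: cases 30 A / 20 a, controls 15 A / 35 a.
table = (30, 20, 15, 35)
chi2 = chi_square_2x2(*table)
chi2_yates = chi_square_2x2(*table, yates=True)
p_fisher = fisher_exact_two_sided(*table)
print(f"Pearson chi2 = {chi2:.3f}, p = {chi_square_p_value(chi2):.4f}")
print(f"Yates chi2   = {chi2_yates:.3f}, p = {chi_square_p_value(chi2_yates):.4f}")
print(f"Fisher exact p = {p_fisher:.4f}")
```

Yates' correction always shrinks the statistic, and Fisher's test avoids the large-sample approximation altogether, which matters when some expected allele counts are small.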

    28/35

    Structured Association

    Designed to tackle highly structured populations.

    Looks for clusters of closely related individuals and estimates the Q matrix (sub-population membership) from a set of random, unlinked markers.

    Corrects for false associations caused by population structure.

    STRUCTURE (Pritchard et al. 2000) estimates population structure and the shared coancestry (membership) coefficients of each individual using all markers.

    Not good enough when some degree of relationship (kinship) is also present.

    29/35

    Mixed Linear Model (MLM)

    Accounts for multiple levels of relatedness (Yu et al., 2006).

    Uses the Q matrix (from STRUCTURE) to account for population subdivision.

    Uses the K (kinship) matrix, estimated with the SPAGeDi software, to account for relatedness within populations.

    Superior to other methods (Structured Association, Genomic Control and Quantitative Transmission Disequilibrium Test) in Type I error control and statistical power.

    Implemented in the software TASSEL.

    Replacing the Q matrix with a P matrix (principal components) makes it more robust (Price et al., 2006 and Zhou et al., 2007).
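For reference, the unified mixed model of Yu et al. (2006) is commonly written in the form below. This is our summary of the published model, offered as a reading aid under that assumption rather than as content from the lecture itself.

```latex
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{S}\boldsymbol{\alpha} + \mathbf{Q}\mathbf{v} + \mathbf{Z}\mathbf{u} + \mathbf{e},
\qquad \operatorname{Var}(\mathbf{u}) = 2\mathbf{K}V_g,
\qquad \operatorname{Var}(\mathbf{e}) = \mathbf{R}V_R
```

Here y holds the phenotypes, β the fixed effects other than marker and population effects, α the marker (allele) effects, v the population effects tied to the Q matrix, u the random polygenic background effects, and e the residuals; X, S and Z are incidence matrices, K is the kinship matrix, R is a diagonal matrix (the identity when each line has one observation), and V_g and V_R are the genetic and residual variances. Dropping the Zu term gives a Q-only model, and substituting principal-component scores for Q gives the P + K variant mentioned above.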

    30/35

    Population Stratification

    A population under study may have sub-populations, which may lead to:
    spurious association;
    loss of power to detect real association.

    EIGENSTRAT (Price et al. 2006 Nat. Genet.) uses principal components to extract information on stratification and to adjust for it in association analysis.

    Allele counts (A, a):

                 Mixed population  =  Sub-population 1  +  Sub-population 2
                 A        a           A        a           A        a
    Case         70       80          10       40          60       40
    Control      50       100         20       80          30       20
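Using the allele counts in the table above, the short sketch below computes the A-allele frequency in cases and controls and the allelic odds ratio within each sub-population and in the pooled (mixed) population. The function name `odds_ratio` is our own, and this is a plain stratification check, not the EIGENSTRAT method itself.

```python
def odds_ratio(case_A: int, case_a: int, ctrl_A: int, ctrl_a: int) -> float:
    """Allelic odds ratio: odds of allele A in cases relative to controls."""
    return (case_A * ctrl_a) / (case_a * ctrl_A)


# (case A, case a, control A, control a) taken from the table above.
tables = {
    "sub-population 1": (10, 40, 20, 80),
    "sub-population 2": (60, 40, 30, 20),
}
# Pooling the two strata column by column gives the mixed population (70, 80, 50, 100).
tables["mixed population"] = tuple(sum(col) for col in zip(*tables.values()))

for name, (case_A, case_a, ctrl_A, ctrl_a) in tables.items():
    f_case = case_A / (case_A + case_a)
    f_ctrl = ctrl_A / (ctrl_A + ctrl_a)
    ratio = odds_ratio(case_A, case_a, ctrl_A, ctrl_a)
    print(f"{name}: freq(A) cases = {f_case:.3f}, controls = {f_ctrl:.3f}, OR = {ratio:.2f}")
```

Within each sub-population the A frequency is the same in cases and controls (OR = 1), yet the pooled table gives OR = 1.75: the apparent association is produced entirely by the two sub-populations differing in both allele frequency and case/control proportion.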

    31/35

    Nested Association Mapping

    A method for joint QTL linkage and association mapping in a set of RIL populations derived from matings to a common parent.

    Dense genotyping of SNPs is performed only in the parents.

    Only common parent-specific SNPs are genotyped in the RILs and are used to identify the parental origin of chromosomal segments, allowing projection of sequence information from parents to RILs.

    LD information from ancient recombination is thus captured, allowing for high resolution mapping with far less genotyping effort.

    32/35

    Power of Association Mapping: Decisive Factors

    Extent and evolution of LD in the population (mode of pollination)
    Complexity and mode of gene action for the trait of interest
    Sample size
    Preselection of a priori known QTLs or candidate genes
    Samples with longer LD blocks
    Availability of pedigree and genomic information and resources
    Quality of phenotypic data
    How the association mapping panel is constructed
    Efficiency of targeted gene sequencing (genotypic data)

    33/35

    LD and AM studies in various plant species

    S. No.  Crop    Population                          LD extent                        Traits
    1       Rice    Diverse landraces and accessions    5-500 kb; 20-30 cM; 50-225 cM    Glutinous phenotype, starch quality, yield and its components
    2       Wheat   Diverse cultivars

    34/35

    Common Statistical Software Packages for Association Mapping

    S. No.  Software package  Focus                        Remarks
    1       TASSEL            Association analysis         Free; LD statistics, sequence analysis, AM by GLM and MLM
    2       STRUCTURE         Population structure         Free; widely used for population structure (PS) analysis
    3       SPAGeDi           Relative kinship             Free; genetic relationship analysis
    4       EIGENSTRAT        PCA, association analysis    Free; PCA was proposed as an alternative for population structure analysis (PSA)
    5       MTDFREML          Mixed model                  Free; can be used for plant data
    6       ASREML            Mixed model                  Commercial; can be used for plant data

    35/35

    Two major types: genome-wide screen and candidate gene

                        Genome-wide screen                     Candidate gene
    Hypothesis          Hypothesis-free                        Hypothesis-driven
    Cost                High: large genotyping requirements    Low: small genotyping requirements
    Multiple testing    Multiple-testing issues                Multiple testing less important
    Errors              Possibly many false positives,         Possibly many misses,
                        fewer misses                           fewer false positives
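To make the multiple-testing contrast concrete, the sketch below compares a hypothetical genome-wide screen of 100,000 SNPs with a hypothetical candidate-gene study of 20 markers. It uses the Bonferroni correction as one simple (and conservative) way to control the family-wise error rate; the numbers of tests and the helper names are illustrative assumptions, not values from the lecture.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of false positives if every null hypothesis is true and no correction is applied."""
    return alpha * n_tests


# Hypothetical numbers of tests for the two study types.
for label, m in [("genome-wide screen", 100_000), ("candidate gene", 20)]:
    print(f"{label}: {m} tests, "
          f"uncorrected false positives ~ {expected_false_positives(0.05, m):g}, "
          f"Bonferroni threshold = {bonferroni_threshold(0.05, m):.2e}")
```

The much stricter per-test threshold in the genome-wide case is what trades false positives for misses, mirroring the last row of the comparison above.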