molecular marker and its application to genome mapping and molecular breeding

Molecular Marker and Its Application to

Genome Mapping and Molecular Breeding

Binying Fu

Institute of Crop Sciences

The Chinese Academy of Agricultural Sciences

Beijing 100081, China

Nov-14-2012

Definition of Biological Marker

• Biological markers can be anything that distinguishes one individual or population from another
• Can be phenotypic
  - Color: yellow vs white, etc.
  - Texture: smooth vs rough, etc.
  - Shape: round vs irregular, etc.
• Can be a biochemical or genetic difference

Phenotypic Markers

Weakness: unstable, and limited in number and polymorphism

http://cgil.uoguelph.ca/QTL/Fig2_3.htm

Cytological Marker

Any distinct and heritable feature of chromosome structure that

can be used to follow (usually by microscopy) that chromosome

or chromosome region in breeding experiments.

Weakness: may have side effects and requires special techniques

Biochemical Marker-Isozyme and Protein

Weakness: limited in number, expressed only at particular times and in particular tissues, and requires special techniques such as starch gel electrophoresis with special staining

Characteristics of Ideal Markers

• Polymorphism
• Stability, no influences from the environment
• Wide dispersion through the genome
• Simplicity of observation
• Low cost
• Mendelian heritability
• Co-dominance
• Reproducibility
• Portability between species

Molecular Markers

Definition: A molecular selection technique based on DNA signposts that allows differences in the nucleotide sequences of the DNA of different individuals to be identified. Alternatively, any genetic element (locus, allele, DNA sequence, or chromosome feature) that can be readily detected by phenotypic, cytological, or molecular techniques and used to follow a chromosome or chromosomal segment during genetic analysis. (Also called a DNA marker.)

Agriculture: a tool that allows crop geneticists and breeders to locate on a plant chromosome the genes for a trait of interest. It is considered more efficient than conventional breeding because it has the potential to greatly reduce development times and substitutes laboratory selection for much of the fieldwork. MAS or MDB!

Molecular, or DNA-based, markers have become increasingly important in plant breeding because of their features: phenotypic stability (not affected by environment), useful polymorphism, and ease of development.

Where does the molecular marker come from?

Mutation = heritable (at the cell level) change in DNA sequence, regardless of whether the change produces any detectable effect on a gene product. Mutations are the source of new variation (polymorphism) upon which natural selection works. Inherited mutations that are dispersed through a population can become polymorphisms.

Polymorphism = presence in the same population of two or more alternative forms of a DNA sequence, with the most common allele having a frequency of 99% or less. Any two individuals have a polymorphic difference every 1,000-10,000 base pairs.
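The 99% rule in the polymorphism definition can be checked mechanically. The sketch below is illustrative only: the function name `is_polymorphic` and the sample allele lists are hypothetical, not part of the lecture.

```python
from collections import Counter

def is_polymorphic(alleles, max_major_freq=0.99):
    """Apply the slide's rule: a locus is polymorphic when at least two
    alleles are present and the most common one has frequency <= 99%."""
    counts = Counter(alleles)
    if len(counts) < 2:
        return False  # a single allele is monomorphic by definition
    major_freq = counts.most_common(1)[0][1] / len(alleles)
    return major_freq <= max_major_freq

# Hypothetical samples of allele calls at one locus
print(is_polymorphic(["A"] * 95 + ["G"] * 5))    # major allele at 95%
print(is_polymorphic(["A"] * 995 + ["G"] * 5))   # major allele at 99.5%
```

A rare variant that leaves the major allele above 99% is counted as a mutation, not as a polymorphism.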

Comparison of Mutation Frequencies

| Class of Mutation | Mechanism | Frequency | Example |
| Genome mutation | Chromosome missegregation | 10^-2/cell division | Aneuploidy |
| Chromosome mutation | Chromosome rearrangement | 6x10^-4/cell division | Translocation |
| Gene mutation | Base-pair mutation | 10^-10/base pair/cell division; 10^-5 to 10^-6/locus/generation | Point mutation |

Humans have ~10^9 base pairs per haploid genome; therefore each person will have 1-100 new mutations.

1 in 20 people will have a new gene mutation.
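The "1-100 new mutations" figure can be reproduced as rough arithmetic from the per-base-pair rate and the genome size quoted above. The slide does not state how many cell divisions it assumes per generation; the range of 10 to 1000 divisions used below is an assumption, chosen only because it reproduces the stated range.

```python
BASE_PAIRS_PER_HAPLOID_GENOME = 1e9   # value quoted on the slide
RATE_PER_BP_PER_DIVISION = 1e-10      # gene mutation rate quoted on the slide

def new_mutations(cell_divisions):
    """Expected new base-pair mutations after the given number of cell divisions."""
    return RATE_PER_BP_PER_DIVISION * BASE_PAIRS_PER_HAPLOID_GENOME * cell_divisions

# ASSUMPTION: 10 to 1000 divisions per generation (not given in the lecture)
for divisions in (10, 1000):
    print(divisions, "divisions:", round(new_mutations(divisions), 3), "new mutations")
```

Each division contributes about 0.1 expected mutation per haploid genome, so the total scales linearly with the number of divisions.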

Types of Mutations (1)

Nucleotide Substitutions Altering Coding Sequence

• Missense mutations (amino acid substitution)
• Nonsense mutations (premature stop codon)

Types of Mutations (2)

Nucleotide Substitutions Altering Gene Expression

• RNA processing mutations (destruction of splice sites, cap sites, poly A sites, or creation of cryptic sites)
• Regulatory mutations (promoter mutations)

Types of Mutations (3)

Deletions and Insertions (InDels)

• Insertion or deletion of a small number of bases
  - If the number of bases involved is not a multiple of 3, causes a frameshift
  - If the number of bases involved is a multiple of 3, causes loss or gain of codons
• Larger deletions, inversions, and duplications
  - Can create gene syndromes
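The multiple-of-3 rule for small InDels in a coding sequence reduces to a remainder check. A minimal sketch, with a hypothetical function name:

```python
def classify_indel(num_bases):
    """Classify a small coding-region insertion or deletion by its length."""
    if num_bases <= 0:
        raise ValueError("num_bases must be a positive integer")
    if num_bases % 3 == 0:
        # Reading frame is preserved; whole codons are lost or gained
        return f"in-frame: {num_bases // 3} codon(s) lost or gained"
    return "frameshift"

for n in (1, 3, 4, 6):
    print(n, "bases ->", classify_indel(n))
```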

Recombination-Generated Duplications, Deletions, Insertions

[Figure: panels labeled Duplication, Insertion, Inversion]

Brief Summary

The term MARKER is usually used for "LOCUS MARKER". Each gene has a particular place along the chromosome called its LOCUS. Due to mutations, genes can be modified into several mutually exclusive forms called ALLELES (or allelic forms). All allelic forms of a gene occur at the same locus on homologous chromosomes. When the allelic forms at one locus are identical, the genotype is called HOMOZYGOUS (at this locus), whereas different allelic forms constitute a HETEROZYGOTE. In diploid organisms, the GENOTYPE is constituted by the two allelic forms on the homologous chromosomes.

Thus, MOLECULAR MARKERS are all locus markers related to DNA (sometimes biochemical or morphological markers are included).

Molecular Marker Classes

First Generation: 1980s
• Based on DNA-DNA hybridization, such as RFLP.

Second Generation: 1990s
• Based on PCR:
  - Using random primers: RAPD, DAF, ISSR
  - Using specific primers: SSR, SCAR, STS
• Based on PCR and restriction cutting: AFLP, CAPS

Third Generation: recently
• Based on DNA point mutations (SNP), which can be detected by SSCP, DASH, DNA chip, sequencing, etc.

The Evolution of Markers

[Figure: timeline of marker technologies, with hallmark events highlighted. Eras labeled in the figure: Protein scene, DNA-hybridization scene (pre-PCR), Oligo scene, Genomic era (automation).]

• Morphological variants (pre-1950s)
• Gel electrophoresis (1950s)
• Allozymes (1960s)
• Restriction (1968) and Southern blotting (1975)
• RFLPs (1980)
• PCR (1986); gene-specific PCR
• Microsatellites (SSRs, 1989)
• RAPDs (1990)
• SSCPs, CAPS (1993)
• AFLPs (1996)
• AFLPs on automated sequencers (1998)
• AFLPs on microarrays (2000)
• cDNA sequencing: cSSR
• Complete genomic sequence
• High-throughput marker analysis
• SNPs on chips

DNA Markers

Simple Sequence Repeats-SSR

Single Nucleotide Polymorphism-SNP

Single Feature Polymorphisms (SFPs)

Microsatellites

What are microsatellites?

• Simple sequence repeats (SSRs) or microsatellites are tandemly repeated mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide motifs. SSR length polymorphisms are caused by differences in the number of repeats.
• SSR loci are "individually amplified by PCR using pairs of oligonucleotide primers specific to unique DNA sequences flanking the SSR sequence".

Example

Mononucleotide SSR (A)11: AAAAAAAAAAA
Dinucleotide SSR (GT)6: GTGTGTGTGTGT
Trinucleotide SSR (CTG)4: CTGCTGCTGCTG
Tetranucleotide SSR (ACTC)4: ACTCACTCACTCACTC
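Perfect tandem repeats like these can be located in a sequence with a regular-expression backreference. This is a minimal sketch, not a published SSR-mining tool: the function name, the `min_repeats` threshold, and the flanking bases in the demo sequence are all illustrative assumptions.

```python
import re

def find_ssrs(seq, min_repeats=4, max_motif_len=6):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    The lazy quantifier makes the regex try the shortest motif first, so
    AAAA is reported as (A)4 and never as (AA)2.
    """
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif_len, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits

# Two of the slide's example repeats embedded in hypothetical flanking sequence
demo = "TTAGC" + "GT" * 6 + "CATCA" + "CTG" * 4 + "AATCG"
print(find_ssrs(demo))
```

The flanking sequence on either side of each hit is where locus-specific PCR primers would be designed.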

Features of SSR Markers

• SSRs tend to be highly polymorphic.
• SSRs are highly abundant and randomly dispersed throughout most genomes.
• Most SSR markers are co-dominant and locus specific.
• Genotyping throughput is high and can be automated.

Where are microsatellites found?

Majority are in non-coding region

Microsatellites

Repeat Motifs

• AC repeats tend to be more abundant than other di-nucleotide repeat motifs in animals.
• The most abundant di-nucleotide repeat motifs in plants, in descending order, are AT, AG, and AC.
• Because AT repeats self-anneal, AT-enrichment methods have not been developed.
• Typically, SSRs are developed for di-, tri-, and tetra-nucleotide repeat motifs. CA and GA have been widely used in plants.
• SSR markers have been developed for a variety of tri- and tetra-nucleotide repeats in plants.
• Tetra-nucleotide repeats have the potential to be very highly polymorphic.

SSR-Containing Sequences from BAC Ends

SSR-containing sequences in BAC ends: 1% in corn, 0.6% in soybean. Among these, most are dinucleotide repeats.

Trinucleotide Repeats in Soy BAC-End Sequences

In the soybean genome, most of the trinucleotide repeats in BAC-end sequences are AAT repeats; one quarter of them are AAC repeats.

Simple sequence repeats (SSRs). SSRs are particularly useful for developing genetic markers. They are believed to vary through DNA replication slippage, and are related to genetic instability. In Table 2, we describe SSR content for two sectors, n = 6 to 11 units and n > 11 units, to emphasize that the number of SSRs dropped substantially after 11 units. The SSR content for 93-11 was 1.7% of the genome, lower than in the human, where it was 3%. The overwhelming majority of rice SSRs were mononucleotides, primarily (A)n or (T)n, with n from 6 to 11. In contrast, for the human, the greatest contributions came from dinucleotides.

From Nipponbare (Goff et al., 2002, Science): the most prevalent SSR is tri-nucleotide; the most frequent di-nucleotide SSR is AG, the most frequent tri-nucleotide SSR is CGG, and the most frequent tetra-nucleotide SSR is CGAT.

Microsatellites

How do microsatellites mutate?

• Replication slippage ("polymerase slippage" or "slipped-strand mispairing")
• Unequal crossing-over during meiosis

Replication Slippage

When the DNA replicates, the polymerase loses track of its place, and either leaves out repeat units or adds too many repeat units.

A commonly observed replication error is replication slippage, which occurs at repetitive sequences when the new strand mispairs with the template strand. Microsatellite polymorphism is mainly caused by replication slippage. If the mutation occurs in a coding region, it could produce abnormal proteins, leading to diseases.

Unequal crossing-over during meiosis

This is thought to explain more drastic changes in numbers of repeats. In this diagram, chromosome A obtained too many repeats during crossing-over, and chromosome B obtained too few repeats.

Microsatellites

Why do microsatellites exist?

• "junk" DNA, and the variation is mostly neutral
• a necessary source of genetic variation
• regulate gene expression and protein function

Moxon, E. R., Wills, C. 1999. "DNA microsatellites: Agents of Evolution?" Scientific

American. Jan., pp. 72-77.

Kashi, Y. and M. Soller. 1999. "Functional Roles of Microsatellites and Minisatellites."

In: Microsatellites: Evolution and Applications. Edited by Goldstein and Schlotterer.

Oxford University Press.

Models of Microsatellite Mutation (1)

1. Stepwise Mutation Model (SMM)

This model holds that when microsatellites mutate, they only gain or lose one repeat. This implies that two alleles that differ by one repeat are more closely related (have a more recent common ancestor) than alleles that differ by many repeats. In other words, size matters when doing statistical tests of population substructuring. The SMM is generally the preferred model when calculating relatedness between individuals and population substructuring, although there is the problem of homoplasy.

Models of Microsatellite Mutation (2)

2. Infinite Alleles Model (IAM)

Each mutation can create any new allele randomly. A 15-repeat allele could be just as closely related to a 10-repeat allele as to an 11-repeat allele. All that matters is that they are different alleles. In other words, size isn't important.

[Figure: alleles labeled 15-repeat, 11-repeat, 10-repeat, 8-repeat]
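The difference between the two models shows up in how the distance between alleles is scored. In this sketch the two function names and the choice of "minimum number of one-repeat steps" as the SMM distance are illustrative assumptions; published statistics built on these models are more elaborate.

```python
def smm_steps(repeats_a, repeats_b):
    """SMM: each mutation adds or removes one repeat, so the minimum number
    of mutational steps between two alleles is their size difference."""
    return abs(repeats_a - repeats_b)

def iam_distance(repeats_a, repeats_b):
    """IAM: any mutation can create any allele, so only identity matters."""
    return 0 if repeats_a == repeats_b else 1

for other in (11, 10, 8):
    print(f"15 vs {other}: SMM steps = {smm_steps(15, other)}, "
          f"IAM distance = {iam_distance(15, other)}")
```

Under the SMM the 11-repeat allele is closer to the 15-repeat allele than the 10-repeat allele is; under the IAM both are equally distant.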

Conventional Developmental Steps of SSR Markers

1. Genomic DNA
2. DNA library
3. Screening with SSR probes
4. Positive clones
5. Sequencing of positive DNA clones
6. Specific SSR
7. PCR test using diverse genotypes

Four Assay Methods

1. The customary method for SSR genotyping is denaturing polyacrylamide gel electrophoresis using silver-stained PCR products. These assays can usually distinguish alleles differing by 4 bp and may distinguish alleles differing by 2 bp.

2. Semi-automated SSR genotyping can be performed by assaying fluorescently labelled PCR products for length variants on an automated DNA sequencer. Several instruments have been developed (e.g., Applied Biosystems and Li-Cor). Alleles differing by 2 to 4 bp can usually be distinguished.

3. SSR length polymorphisms can be assayed using non-denaturing high performance liquid chromatography (Marino et al. 1998). Alleles differing by 2 to 4 bp can usually be distinguished.

4. SSR alleles differing by several repeat units can often be distinguished on agarose gels.

SSRs assayed on polyacrylamide gels typically show a characteristic "stuttering". Stutter bands are artifacts produced by DNA polymerase slippage. Typically, the most prominent stutter bands are +1 and -1 repeat (e.g., + or - 2 bp for a di-nucleotide repeat), and, if visible, the next most prominent stutter bands are +2 and -2 repeats.
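The expected positions of stutter bands follow directly from the motif length. A small sketch, where the function name and the 150 bp allele size are hypothetical:

```python
def stutter_sizes(allele_bp, motif_len, max_offset=2):
    """Map repeat offset (+/-1, +/-2, ...) to the expected stutter band size in bp."""
    return {
        offset: allele_bp + offset * motif_len
        for offset in range(-max_offset, max_offset + 1)
        if offset != 0
    }

# Hypothetical 150 bp allele of a di-nucleotide SSR
print(stutter_sizes(150, motif_len=2))
```

For a di-nucleotide repeat the nearest stutter bands sit only 2 bp from the true allele, which is why assay resolution matters when calling genotypes.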

Weaknesses

• The development of SSRs is labor intensive (not so for sequence-based SSR development).
• SSR marker development costs are very high.
• SSR markers are taxon specific.
• Start-up costs are high for automated SSR assay methods.
• Developing PCR multiplexes is difficult and expensive. Some markers may not multiplex.

Single Nucleotide Polymorphisms

• SNPs are the molecular basis for most phenotypic differences between individuals.
• SNPs are the most common type of genetic variation.
• SNPs are highly abundant, stable, and distributed throughout the genome.
• SNP assays are amenable to automation and high throughput.
• SNPs are biallelic.

GATTTAGATCGCGATAGAG

GATTTAGATCTCGATAGAG
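The single difference between these two sequences can be found by comparing them position by position. A minimal sketch that assumes the sequences are already aligned and of equal length; the function name is hypothetical.

```python
def find_snps(seq_a, seq_b):
    """Return (1-based position, base in seq_a, base in seq_b) for each mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]

print(find_snps("GATTTAGATCGCGATAGAG", "GATTTAGATCTCGATAGAG"))
# [(11, 'G', 'T')]
```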

• SNPs in intergenic regions may …
  - Have no genetic effect …
  - Affect genetic regulatory signals …
  - Interfere with RNA splice sites …
• SNPs in coding regions (cSNP) may …
  - Synonymously change the codon of an amino acid, which may have no further effect, or may influence e.g. codon bias.
  - Non-synonymously alter the encoded amino acid (nsSNP) by a conservative exchange or a non-conservative (radical) mutation.
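Whether a cSNP is synonymous can be decided by translating the codon before and after the change. The sketch below uses the standard genetic code; the function name and the example codons are illustrative, and it does not attempt to separate conservative from radical amino acid exchanges.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_csnp(ref_codon, alt_codon):
    """Classify a coding SNP from the reference and altered codon."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"  # premature stop codon
    return "non-synonymous"

print(classify_csnp("GAA", "GAG"))  # Glu -> Glu
print(classify_csnp("GAG", "GTG"))  # Glu -> Val
print(classify_csnp("TAT", "TAA"))  # Tyr -> stop
```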

SNP Variation in Maize and Soybean

Frequency of Candidate SNPs from Different Sources in Maize and Soy

| Region | Maize | Soy |
| EST (5' end) | 1/1.5 kb | 1/1.9 kb |
| Genomic | 1/640 bp | 1/750 bp |
| 3' UTR | 1/441 bp | 1/416 bp |

[Figure labels: SNP/250 bp, SNP/268 bp, SNP/243 bp, SNP/236 bp]

SNP Discovery

1. Sequence database searches
2. Target-specific SNP discovery and development
   - Conformation-based mutation scanning
   - Direct DNA sequencing

Identify SNP from Sequence Databases

Identification of Target-Specific SNPs

Steps:

1. Amplify the genes of interest with PCR
2. Scan for mutations with various methods
   - Conformation-based mutation scanning
     - Single-strand conformation polymorphism analysis
     - Gel electrophoresis
     - Chemical and enzymatic mismatch cleavage detection
     - Denaturing gradient gel electrophoresis
     - Denaturing HPLC
3. Sequence positive PCR products
   - Sequence multiple individuals
   - Sequence heterozygotes
4. Align sequences from different sources to find SNPs

Technologies for Detecting Known SNPs

Gel-Based Methods
- PCR-restriction fragment length polymorphism analysis
- PCR-based allele-specific amplification
- Oligonucleotide ligation assay genotyping
- Minisequencing (10~20 bases)

Non-Gel-Based High-Throughput Genotyping Technologies
- Solution hybridization using fluorescent dyes
- Allele-specific ligation
- Allele-specific nucleotide incorporation
  1. High resolution separation
  2. Chemical color reaction
- DNA microarray genotyping

Oligo Ligation Assay (OLA)

Two allele-specific oligonucleotide probes (one specific for the wild-type allele and the

other specific for the variant allele) and a fluorescent common probe are used in each

assay. The 3' ends of the allele-specific probes are immediately adjacent to the 5' end of

the common probe. In the presence of thermally stable DNA ligase, ligation of the

fluorescently labeled probe to the allele-specific probe(s) occurs only when there is a

perfect match between the variant or the wild-type probe and the PCR product template.

These ligation products are then separated by electrophoresis, which permits the

recognition of the wild-type genotypes, the variants, the heterozygotes, and the unligated

probes.

Allele-Specific Codominant PCR Strategy

Figure. Schematic representation of the allele-specific codominant PCR strategy. Oligonucleotide primers with 3' nucleotides that correspond to an SNP site are used to preferentially amplify specific alleles. A, Primer P1 forms a perfect match with allele 1 but forms a mismatch at the 3' terminus with the DNA sequence of allele 2. Primer P2 similarly forms a perfect match with allele 2 and a 3' terminus mismatch with allele 1. B, Schematic of agarose gel analysis showing the expected outcome for the amplification of organisms homozygous and heterozygous for both alleles using primers P1 and P2. P1, primer 1; P2, primer 2; A1, allele 1; A2, allele 2.

Eliana Drenkard et al. 2000 Plant Physiol 124: 1483-1492
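The expected gel outcome in panel B can be sketched as a lookup of which primer matches which allele at its 3' end. This is a simplified, all-or-nothing model (real mismatched primers amplify less efficiently, not never), and the allele and primer sequences are hypothetical, borrowed from the SNP example earlier in the lecture.

```python
# Hypothetical allele sequences ending at the SNP site
ALLELE_1 = "GATTTAGATCG"
ALLELE_2 = "GATTTAGATCT"

# Each primer's 3' nucleotide sits on the SNP and matches one allele only
PRIMERS = {"P1": ALLELE_1[-10:], "P2": ALLELE_2[-10:]}

def gel_pattern(genotype):
    """Return which primer reactions give a band for a pair of alleles."""
    return {
        name: any(allele.endswith(primer) for allele in genotype)
        for name, primer in PRIMERS.items()
    }

print("A1/A1:", gel_pattern((ALLELE_1, ALLELE_1)))
print("A1/A2:", gel_pattern((ALLELE_1, ALLELE_2)))
print("A2/A2:", gel_pattern((ALLELE_2, ALLELE_2)))
```

Because heterozygotes give a band with both primers, the marker is scored as codominant.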

SNP Detection: Allele-Specific Oligo Hybridization

Principle: A 1 bp mismatch in the center of a 15mer will change the Tm by 5-10 degrees; therefore a SNP in the middle of a 15mer can be genotyped using paired ASOs.

• PCR amplify target gene (different individuals) in 96-well format
• Prepare dot-blot on nylon filter
• Hybridize to allele-specific 15mer and detect the signal
• Wash at stringency temperature
• Repeat for alternate allele and other SNPs
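The stringency-wash idea can be illustrated with a rough Tm estimate. The sketch below uses the Wallace rule, Tm = 2(A+T) + 4(G+C), a common rule of thumb for short oligos that the lecture itself does not mention; the 5-10 degree mismatch penalty is the slide's figure, and the 15mer is a hypothetical probe centered on the SNP from the earlier sequence example.

```python
def wallace_tm(oligo):
    """Rough Tm in degrees C for a short oligo: 2*(A+T) + 4*(G+C). ASSUMED rule of thumb."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc

def mismatch_tm_range(oligo, penalty=(5, 10)):
    """Tm range for a duplex with one central mismatch, using the slide's 5-10 degree drop."""
    tm = wallace_tm(oligo)
    return tm - penalty[1], tm - penalty[0]

aso = "TTAGATCGCGATAGA"  # hypothetical 15mer, SNP base (G) at the center
print("perfect match Tm:", wallace_tm(aso))
print("mismatched Tm range:", mismatch_tm_range(aso))
```

A wash temperature between the mismatched range and the perfect-match Tm would, in this simplified picture, retain only the perfectly matched probe.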

Single-Strand Conformation Polymorphism Analysis

Single-stranded DNAs are generated by denaturation of the PCR products and separated on a nondenaturing polyacrylamide gel. A fragment with a single-base modification generally forms a different conformer and migrates differently when compared with the wild type.

Size <200 bp: accuracy 70%-95%
Size >400 bp: accuracy 50%
1% false positives

SNP Genotyping Using Oligo Chip

Oligo chip: a set of 15-nucleotide probes organized into overlapping probe sets, with neighboring probes overlapping by 14 nucleotides. The four probes in one set have almost the same sequence and differ at a single position (A/G/C/T).

[Figure labels: T genotype, C genotype]

http://www.ricesnp.org/index.aspx##
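The probe layout described for the oligo chip can be sketched as a tiling of 15mers along a reference sequence. The slide does not say which position varies within a probe set; placing the variable base at the center is an assumption here, and the function name and reference sequence are illustrative.

```python
def probe_sets(reference, probe_len=15):
    """Map each interrogated position to four probes differing only at the
    central base (ASSUMED position of the variable base)."""
    half = probe_len // 2
    sets = {}
    for center in range(half, len(reference) - half):
        window = reference[center - half:center + half + 1]
        sets[center] = [window[:half] + base + window[half + 1:] for base in "ACGT"]
    return sets

reference = "GATTTAGATCGCGATAGAG"  # sequence from the SNP example earlier in the lecture
sets = probe_sets(reference)
for probe in sets[10]:  # 0-based position of the G/T SNP
    print(probe)
```

Shifting the window by one base per set gives the 14-nucleotide overlap between neighboring probes; the probe in a set that hybridizes most strongly indicates the base present in the sample.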

Direct Sequencing - New Sequencing Technology

Pyrosequencing technology offers rapid and accurate genotyping, allowing for dependable SNP and mutation analysis. This technology utilizes an enzyme cascade system that results in the production of measurable light whenever a nucleotide forms a base pair with its complementary base in a DNA template strand.

Solexa/Illumina Sequencing

Munroe & Harris, (2010) Third-generation sequencing fireworks at Marco Island.

Nature Biotechnology 28: 426–428.

Use of SNPs

1. Markers for linkage mapping: discover SNPs that contribute to agronomic traits
2. Trace origin of introgression
3. Markers for association studies (linkage disequilibrium)
4. Markers for population genetic analysis
