Microsatellite markers: what they mean and why they are so ... ?· Microsatellite markers: what they…
Post on 16-Feb-2019
Embed Size (px)
Microsatellite markers: what they mean and why they are so useful
Maria Lucia Carneiro Vieira1, Luciane Santini1, Augusto Lima Diniz1 and Carla de Freitas Munhoz1
1Departamento de Gentica, Escola Superior de Agricultura Luiz de Queiroz (ESALQ), Universidade de
So Paulo (USP), Piracicaba, SP, Brazil.
Microsatellites or Single Sequence Repeats (SSRs) are extensively employed in plant genetics studies, using bothlow and high throughput genotyping approaches. Motivated by the importance of these sequences over the last de-cades this review aims to address some theoretical aspects of SSRs, including definition, characterization and bio-logical function. The methodologies for the development of SSR loci, genotyping and their applications as molecularmarkers are also reviewed. Finally, two data surveys are presented. The first was conducted using the main data-base of Web of Science, prospecting for articles published over the period from 2010 to 2015, resulting in approxi-mately 930 records. The second survey was focused on papers that aimed at SSR marker development, published inthe American Journal of Botanys Primer Notes and Protocols in Plant Sciences (over 2013 up to 2015), resulting in atotal of 87 publications. This scenario confirms the current relevance of SSRs and indicates their continuous utiliza-tion in plant science.
Keywords: SSR biological function, genomic distribution, genotyping approaches, molecular marker, practical utility.
Received: February 29, 2016; Accepted: May 13, 2016.
Ongoing technological advances in all fields ofknowledge mean that we cannot be sure which technolo-gies will survive the impact of innovation, and for howlong. Over the years, advances in molecular genetics meth-odology have lead to widespread use of codominant molec-ular markers, especially Simple Sequence Repeats (SSRs)and, more recently, Single Nucleotide Polymorphisms(SNPs). This paper attempts to present an overview of howthe concept of SSRs has evolved and how their biologicalfunctions were discovered. We also address the develop-ment of methods for identifying polymorphic SSRs, and theapplication of these markers in genetic analysis. It revealsthat much remains to be explored regarding these se-quences, particularly in relation to cultivated and wildplants.
Definition and genome occurrence ofmicrosatellites and their use as genetic markers
Microsatellites (1 to 10 nucleotides) and mini-satellites (> 10 nucleotides) are subcategories of tandem re-peats (TRs) that, together with the predominant inter-spersed repeats (or remnants of transposable elements),make up genomic repetitive regions. TRs are evolutionarily
relevant due to their instability. They mutate at rates be-tween 103 and 106 per cell generation i.e., up to 10 orders ofmagnitude greater than point mutations (Gemayel et al.,2012).
Microsatellites, Simple Sequence Repeats (SSR),Short Tandem Repeats (STR) and Simple Sequence LengthPolymorphisms (SSLP) are found in prokaryotes andeukaryotes. They are widely distributed throughout the ge-nome, especially in the euchromatin of eukaryotes, andcoding and non-coding nuclear and organellar DNA(Prez-Jimnez et al., 2013; Phumichai et al., 2015).
There is a lot of evidence to back up the hypothesisthat SSRs are not randomly distributed along the genome.In a comparative study, SSR distribution was found to behighly non-random and to vary a great deal in different re-gions of the genes of Arabidopsis thaliana and rice (Law-son and Zhang, 2006). In the major cereals, for instance,authors have tended to categorize microsatellites based ondifferent criteria. In barley and Avena species, SSRs wereclassified in two types: those with unique sequences on ei-ther flank and those intimately associated with retrotrans-posons and other dispersed repetitive elements. The secondtype was found to be less polymorphic in oat cultivars(Ramsay et al., 1999; Li et al., 2000). Using publicly avail-able DNA sequence information on the rice genome,Temnykh et al. (2001) categorized microsatellites based onlength and noticed that longer perfect repeats ( 20 nucleo-tides) were highly polymorphic. Microsatellites with SSRsshorter than 12 bp were found to have a mutation potential
Genetics and Molecular Biology, 39, 3, 312-328 (2016)Copyright 2016, Sociedade Brasileira de Gentica. Printed in BrazilDOI: http://dx.doi.org/10.1590/1678-4685-GMB-2016-0027
Send correspondence to Maria Lucia Carneiro Vieira. USP/ESALQ,P.O. Box 83, 13400-970, Piracicaba, SP, Brazil. E-mail:firstname.lastname@example.org
no different from that of most unique sequences. Moreover,authors reported that ~80% of GC-rich trinucleotides oc-curred in exons, whereas AT-rich trinucleotides weredistributed roughly evenly throughout all genomic compo-nents (coding sequences, untranslated regions, introns andintergenic spaces). Tetranucleotide SSRs were predomi-nantly situated in non-coding, mainly intergenic regions ofthe rice genome. It was later established that the SSR distri-butions in different regions of the maize genome werenon-random, and that density was highest in untranslatedregions (UTR), gradually falling off in the promotor,intron, intergenic, and coding sequence regions, in that or-der (Qu and Liu, 2013).
On the other hand, comparisons of microsatellite dis-tributions in Rumex acetosa and Silene latifolia chromo-somes showed that some motifs (e.g. CAA or TAA) arestrongly accumulated in non-recombining regions of thesex chromosome (Y) in both plant species (Kejnovsky etal., 2009). Similarly, a very large accumulation consistingmainly of microsatellites on the heterochromatic W chro-mosome was reported in a group of fish species (Leporinusspp.) that share a ZW sex system, showing an interconnec-tion between heterochromatinization and the accumulationof repetitive sequences, which has been proposed as the ba-sis of sex chromosome evolution (Poltronieri et al., 2014).
Generally speaking, it can be affirmed that the occur-rence of SSRs is lower in gene regions, due to the fact thatSSRs have a high mutation rate that could compromisegene expression. Studies indicate that in coding regionsthere is a predominance of SSRs with gene motifs of the tri-and hexanucleotide type, the result of selection pressureagainst mutations that alter the reading frame (Zhang et al.,2004; Xu et al., 2013b). In humans, the consensus is thatSSRs can also originate in coding regions, leading to theappearance of repetitive patterns in protein sequences. Inprotein sequence database studies, it was reported that tan-dem repeats are common in many proteins, and the mecha-nisms involved in their genesis may contribute to the rapidevolution of proteins (Katti et al., 2000; Huntley and Gol-ding, 2000).
Repeat polymorphisms usually result from the addi-tion or deletion of the entire repeat units or motifs. There-fore, different individuals exhibit variations as differencesin repeat numbers. In other words, the polymorphisms ob-served in SSRs are the result of differences in the number ofrepeats of the motif caused by polymerase strand-slippagein DNA replication or by recombination errors.Strand-slippage replication is a DNA replication error inwhich the template and nascent strands are mismatched.This means that the template strand can loop out, causingcontraction. The nascent strand can also loop out, leading torepeat expansion. Recombination events, such as unequalcrossing over and gene conversion, may additionally leadto SSR sequence contractions and expansions. Accordingto several authors, the longer and purer the repeat, the
higher the mutation frequency, whereas shorter repeatswith lower purity have a lower mutation frequency.
Mutations that have evaded correction by the DNAmismatch repair system form new alleles at SSR loci. Forthis reason, different alleles may exist at a given SSR locus,which means that SSRs are more informative than othermolecular markers, including SNPs.
As for their composition, SSRs can be classified ac-cording to motif as: i) perfect if composed entirely of re-peats of a single motif; ii) imperfect if a base pair notbelonging to the motif occurs between repeats; iii) inter-rupted if a sequence of a few base pairs is inserted into themotif; or iv) composite if formed by multiple, adjacent, re-petitive motifs (reviewed in Oliveira et al., 2006; revisitedby Mason, 2015).
SSRs have been the most widely used markers forgenotyping plants over the past 20 years because they arehighly informative, codominant, multi-allele genetic mark-ers that are experimentally reproducible and transferableamong related species (Mason, 2015). In particular, SSRsare useful for wild species (i) in studies of diversity mea-sured on the basis of genetic distance; (ii) to estimate geneflow and crossing over rates; and (iii) in evolutionary stud-ies, above all to infer infraspecific genetic relations. On theother hand, for cultivated plants SSRs are commonly usedfor (i) constructing linkage maps; (ii) mapping loci in-volved in quantitative traits (QTL); (iii) estimating thedegree of kinship between genotypes; (iv) using marker-assisted selection; and (v) defining cultivar DNA finger-prints (Jonah et al., 2011; Kalia et al., 2011). SSRs havebeen particularly useful for generating integrated maps forplant species in which full-sib families are used for con-structing linkage maps (Garcia et al., 2006; Souza et al.,2013; Pereira et al., 2013), and for combining genetic,physical, and sequence-based maps (Temnykh, 2001), pro-viding breeders and geneticists with a tool to link pheno-typic and genotypic variation (see Mammadov et al., 2012;Hayward et al., 2015 for review articles).
These markers are enormously useful in studies ofpopulation structure, genetic mapping, and evolutionaryprocesses. SSRs with core repeats 3 to 5 nucleotides longare preferred in forensics and parentage analysis. It is worthnoting that a number of SSR search algorithms have beendeveloped, including TRF (Benson, 1999), SSRIT(Temnykh, 2001), MISA (Thiel et al., 2003), SSRFinder(Gao et al., 2003), TROLL (Castelo et al., 2002) andSciRoKo (Kofler et al., 2007).
Detailing the biological functions of SSRs
Despite the wide applicability of SSRs as geneticmarkers since their discovery in the 1980s, little is knownabout the biological importance of microsatellites (Tautzand Renz, 1984), especially in plants. Morgante et al.(2002) estimated the density of SSRs in Arabidopsisthaliana, rice (Oryza sativa), soybean (Glycine max),
Vieira et al. 313
maize (Zea mays) and wheat (Triticum aestivum) and ob-served a high frequency of SSRs in transcribed regions, es-pecially in untranslated regions (UTRs). Interestingly,there are substantial data indicating that SSR expansions orcontractions in protein-coding regions can lead to a gain orloss of gene function via frameshift mutation or expandedtoxic mRNAs. SSR variations in 5-UTRs could regulategene expression by affecting transcription and translation,but expansions in the 3-UTRs cause transcription slippageand produce expanded mRNA, which can disrupt splicingand may disrupt other cellular functions. Intronic SSRs canaffect gene transcription, mRNA splicing, or export to cy-toplasm. Triplet SSRs located in UTRs or introns can alsoinduce heterochromatin-mediated-like gene silencing. Allthese effects can eventually lead to phenotypic changes (Liet al., 2004; Nalavade et al., 2013).
In fact, variation in the length of DNA triplet repeatshas been linked to phenotypic variability in microbes and toseveral human disorders, including Huntingtons diseasewhich is caused mainly by (CAG)n expansions. Moreover,the frequencies of different codon repeats vary consider-ably depending on the type of encoded amino acid. Inplants, a triplet repeat-associated genetic defect was identi-fied in a wild variety of A. thaliana that carries a dramati-cally expanded TTC/GAA repeat in the intron of the geneencoding the large subunit 1 of the isopropyl malate iso-merase. Expansion of the repeat causes an environment-dependent reduction in the enzymes activity and severelyimpairs plant growth, whereas contraction of the expandedrepeat can reverse the detrimental effect on the phenotype(Sureshkumar et al., 2009).
Historically, tandem repeats have been designated asnonfunctional DNA, mainly because they are highly unsta-ble. With the exception of tandem repeats involved in hu-man neurodegenerative diseases, repeat variation was oftenbelieved to be neutral with no phenotypic consequences(see Gemayel et al., 2012).
The detection of microsatellites in transcripts and reg-ulatory regions of the genome encouraged scientific inter-est in discovering their possible biological functions. Moreand more publications have presented evidence that micro-satellites play a role in relevant processes, such as the regu-lation of transcription and translation, organization ofchromatin, genome size and the cell cycle (Nevo, 2001; Liet al., 2004; Gao et al., 2013).
As mentioned above, most of the knowledge acquiredon microsatellites occurring in genes was obtained bystudying humans and animals, indicating their relationshipwith the manifestation of disease. In bacteria, maintainingnumerous microsatellite variants provides a source ofhighly mutable sequences that enable prompt generation ofnovel variations, ensuring the survival of the bacterial pop-ulation in widely varying environments, and adaptation topathogenesis and virulence. Nevertheless, few studies havefocused on whether the typical instability of microsatellites
is linked to phenotypic effects in plants (Li et al., 2004; Gaoet al., 2013). However, thanks to whole genome sequenc-ing the important role repeats might play in genomes is be-ing elucidated.
The consensus is that the biological function of amicrosatellite is related to its position in the genome. Forinstance, SSRs in 5-UTRs serve as protein binding sites,thereby regulating gene translation and protein componentand function, as classically demonstrated for the humangenes for thymidylate synthase (Horie et al., 1995) andcalmodulin-1 (Toutenhoofd et al., 1998). Ten years later,SSR densities in different regions (5-UTRs, introns, cod-ing exons, 3-UTRs, and upstream regions) in housekeep-ing and tissue-specific genes in human and mouse werecompared. Specifically, SSRs in the 5-UTRs of house-keeping genes are more abundant than in tissue-specificgenes. Additionally, it was suggested that SSRs may havean effect on gene expression and may play an importantrole in contributing to the different expression profiles ofhousekeeping and tissue-specific genes (Lawson andZhang, 2008).
In plants, despite the fact that a high density of SSRshas been detected in 5-UTR regions (Fujimori et al., 2003;Tranbarger et al., 2012; Zhao et al., 2014), there are fewstudies verifying their effect on the regulation of gene ex-press...