Jacqueline Batley (ed.), Plant Genotyping: Methods and Protocols, Methods in Molecular Biology, vol. 1245, DOI 10.1007/978-1-4939-1966-6_4, © Springer Science+Business Media New York 2015
Chapter 4
Molecular Marker Databases
Kaitao Lai, Michał Tadeusz Lorenc, and David Edwards
Abstract
The detection and analysis of genetic variation plays an important role in plant breeding and this role is increasing with the continued development of genome sequencing technologies. Molecular genetic markers are important tools to characterize genetic variation and assist with genomic breeding. Processing and storing the growing abundance of molecular marker data being produced requires the development of specific bioinformatics tools and advanced databases. Molecular marker databases range from species-specific through to organism-wide and often host a variety of additional related genetic, genomic, or phenotypic information. In this chapter, we will present some of the features of plant molecular genetic marker databases, highlight the various types of marker resources, and predict the potential future direction of crop marker databases.
Key words Molecular marker, Genetic marker, Genetic variation, SNP marker, SSR marker
1 Introduction
The characterization of genetic variation can provide knowledge to help understand the molecular basis of various biological phenomena in plants. Phenotype-based genetic markers were used in Gregor Mendel’s experiments in the nineteenth century. Later, phenotype-based genetic markers helped establish the theory of genetic linkage. More recently, DNA-based markers have been developed to overcome the limitations of phenotype-based genetic markers [1].
While several diverse DNA-based marker types have been developed, single nucleotide polymorphisms (SNPs) and simple sequence repeats (SSRs, also known as microsatellites) predominate and are widely used in plant breeding, genomic research, and modern genetic analysis [2, 3]. Applications of molecular markers include the mapping of genes, quantitative trait loci (QTL) analysis, phylogenetic studies, comparative genomics, and marker-assisted breeding [4–6].
Most molecular marker databases host SNP and SSR markers [7]. Some databases also include other, less commonly used marker types, such as restriction fragment length polymorphism (RFLP), amplified fragment length polymorphism (AFLP), random amplification of polymorphic DNA (RAPD), short tandem repeat (STR), and diversity arrays technology (DArT).
A SNP is a DNA sequence variation at a single nucleotide position in the genome that differs between individual genomes [8]. SNPs are regarded as evolutionarily conserved markers and have been used as markers for QTL analysis and in association studies in place of SSRs. There are several approaches to identify and genotype SNPs in plants [9, 10], and their diverse applications suggest that they will continue to be the dominant DNA molecular marker in the foreseeable future [11]. The application of new sequencing methods is leading to the discovery of large numbers of SNPs in wheat [12, 13], rice [14, 15], Brassicas [16], and other crop species [17, 18].
SSRs are highly polymorphic and informative markers. SSRs demonstrate a high degree of transferability between different species and so are regarded as excellent markers for comparative genetic and genomic analysis. PCR primers designed to an SSR from one species frequently amplify a corresponding locus in related species. The mining of SSRs from gene and genome sequence data is now routine [19], with large numbers of SSRs identified in a range of species including Brassicas [20, 21], wheat [22], and strawberry [23]. SSR loci also provide hot spots for SNP discovery and SSRs may readily be converted to SNP markers [24].
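As an illustration of what mining SSRs from sequence data can involve, the following minimal Python sketch scans a DNA string for perfect tandem repeats using regular expressions. It is not the method of any tool or database cited in this chapter; the function name `find_ssrs`, the `SSR` record, and the minimum repeat thresholds in `MIN_REPEATS` are hypothetical choices made for the example.

```python
import re
from typing import Iterator, NamedTuple


class SSR(NamedTuple):
    start: int    # 0-based start position of the repeat tract
    motif: str    # repeated unit, e.g. "AG"
    repeats: int  # number of tandem copies of the motif


# Hypothetical thresholds: minimum tandem copies required per motif length.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(sequence: str) -> Iterator[SSR]:
    """Yield perfect di- to hexanucleotide repeats found in `sequence`."""
    seq = sequence.upper()
    for motif_len, min_rep in MIN_REPEATS.items():
        # ([ACGT]{n}) captures a motif; \1{m,} requires at least m more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are a shorter unit repeated (e.g. "AGAG" is "AG" x 2).
            if any(motif == motif[:k] * (motif_len // k)
                   for k in range(1, motif_len) if motif_len % k == 0):
                continue
            yield SSR(match.start(), motif, len(match.group(0)) // motif_len)


if __name__ == "__main__":
    demo = "TTGC" + "AG" * 8 + "CCTA" + "GAT" * 5 + "TTTT"
    for ssr in find_ssrs(demo):
        print(ssr)
```

A production pipeline would normally also handle imperfect and compound repeats and would design flanking PCR primers; the sketch covers only the repeat detection step.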
Advances in genome sequencing technology and the increasing availability of genome sequences are providing an abundance of dense molecular markers [25, 26]. For example, sequence polymorphisms developed using the Brassica rapa genome sequence [27] have been used to identify and characterize SNPs and polymorphisms in agronomically important genes in canola (B. napus) [28–30]. In addition, the sequencing of isolated chromosome arms in wheat [31–33] has led to the identification of large numbers of molecular markers [22].
Genetic linkage maps represent the order of known molecular genetic markers along a given chromosome for a given species. Comparative mapping is a valuable technique to identify similarities and differences between species [34]. Many marker databases provide a CMap map visualization tool or their own customized viewer tools for displaying data, including chromosomes and genetic markers with associated mapping locations in the form of genetic linkage maps or comparative maps. A list of molecular marker databases is presented in Table 1. In addition, web links and references for relevant marker databases are presented in Table 2.
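The idea of comparing two linkage maps through their shared markers can be sketched in a few lines of Python. This is an illustrative data structure only, not how CMap or any other viewer is implemented; the marker names, positions, and the functions `shared_markers` and `is_collinear` are hypothetical.

```python
from typing import Dict, List, Tuple

# A linkage group is modelled as marker name -> map position in centimorgans (cM).
LinkageGroup = Dict[str, float]


def shared_markers(map_a: LinkageGroup,
                   map_b: LinkageGroup) -> List[Tuple[str, float, float]]:
    """Return (marker, position in A, position in B) for markers on both maps,
    ordered by their position on map A."""
    common = set(map_a) & set(map_b)
    return sorted(((m, map_a[m], map_b[m]) for m in common),
                  key=lambda row: row[1])


def is_collinear(pairs: List[Tuple[str, float, float]]) -> bool:
    """True when the shared markers occur in the same order on both maps."""
    positions_b = [pos_b for _, _, pos_b in pairs]
    return positions_b == sorted(positions_b)


if __name__ == "__main__":
    # Hypothetical marker names and positions, for illustration only.
    species_1 = {"ssr001": 0.0, "snp014": 12.4, "ssr020": 30.1, "rflp07": 55.8}
    species_2 = {"ssr001": 3.2, "ssr020": 41.0, "snp099": 18.7, "rflp07": 25.5}
    pairs = shared_markers(species_1, species_2)
    print(pairs)
    print("collinear:", is_collinear(pairs))
```

In this toy example the shared markers are not collinear, which is the kind of difference in marker order that a comparative map view makes visible.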
Table 1 Examples of molecular marker databases with different types of markers
Marker type columns: SNPs, SSRs, RFLPs, RAPDs, AFLPs, ESTs, BACs, DArTs, DNA probes, PCR primers

Database name | Viewer | Number of marker types marked +
autoSNPdb | * | 1
Brassica.info |  | 5
Brassica rapa genome database | * | 1
Chickpea root EST database |  | 1
Cotton Marker Database (CMD) | * | 3
GenBank dbSNP | * | 1
Graingenes | * | 7
Gramene | * | 9
ICRISAT |  | 1
Legume Information System (LIS) | * | 5
MaizeGDB |  | 5
MoccaDB |  | 3
Panzea | * | 2
Rice Genome Annotation Project | * | 1
SSR Primer |  | 1
SSR taxonomy tree |  | 1
SOL Genomics Network (SGN) | * | 5
SoyBase | * | 6
tfGDR Project Website | * | 1
Triticeae Mapped EST DataBase ver.2.0 (TriMEDB) | * | 2
VegMarks | * | 3
Wheat genome information |  | 2

* indicates that this database provides a viewer; + indicates that this database supplies this type of marker
Table 2 Examples of molecular marker databases related to crop improvement

Database name | Web link | References
autoSNPdb | http://autosnpdb.appliedbioinformatics.com.au/ | [60, 62, 63]
Brassica.info | http://www.brassica.info/resource/markers.php | [56]
Brassica rapa genome database | http://brassicadb.org/brad/geneticMarker.php | [75]
Chickpea root EST database | http://www.icrisat.org/what-we-do/biotechnology/Cpest/home.asp | [73]
Cotton Marker Database (CMD) | http://www.cottonmarker.org/cgi-bin/cmd_search_marker_result.cgi | [59]
GenBank dbSNP | http://www.ncbi.nlm.nih.gov/projects/SNP/ | [76–78]
Graingenes | http://wheat.pw.usda.gov/cgi-bin/graingenes/browse.cgi?class=marker | [45–47]
Gramene | http://www.gramene.org/db/markers/marker_view | [42]
ICRISAT | http://www.icrisat.org/ | [74]
Legume Information System (LIS) | http://www.comparative-legumes.org/ | [79, 80]
MaizeGDB | http://www.maizegdb.org/probe.php | [81–83]
MoccaDB | http://moccadb.mpl.ird.fr/index.php?cat=1 | [58]
Panzea | http://www.panzea.org/db/searches/webform/marker_search | [84]
Rice Genome Annotation Project | http://rice.plantbiology.msu.edu/annotation_pseudo_putativessr.shtml | [52]
SSR Primer 2 | http://flora.acpfg.com.au/ssrprimer2/ | [68]
SSR taxonomy tree | http://appliedbioinformatics.com.au/projects/ssrtaxonomy/php/ | [68]
SOL Genomics Network (SGN) | http://solgenomics.net/ | [57]
SoyBase | http://soybase.org/ | [85]
tfGDR Project Website | http://tfgdr.bioinfo.wsu.edu/ | [86]
Triticeae Mapped EST Database ver.2.0 (TriMEDB) | http://trimedb.psc.riken.jp/index.pl | [50]
VegMarks | http://vegmarks.nivot.affrc.go.jp/VegMarks/jsp/page.do?transition=marker |
Wheat genome information | http://www.wheatgenome.info | [65, 67]
2 Molecular Marker Databases
With the ever-increasing amount of genetic and genomic information there is a requirement to manage the data to make it available and accessible to researchers [35, 36]. This includes the development of custom visualization tools [36–38] and bioinformatics systems to traverse the genome to phenome divide [39, 40]. Many molecular marker databases provide various types of markers for a range of species while some databases provide information on a single type of marker [41]. The largest single marker database is dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/). dbSNP provides SNP data mostly for humans and other vertebrates, although it also includes some plant data.
There are several databases for the grasses. The Gramene database (http://www.gramene.org/) hosts many types of markers based on the genomes of rice, maize, grape, and Arabidopsis [42]. The website provides a search engine with which users can search for specific markers. Marker details are displayed in text format, including database cross-references and map positions linked to chromosomes in CMap [43]. The sources of SSR markers include the International Rice Genome Sequencing Project, the International Rice Microsatellite Initiative (IRMI), MaizeGDB, the Cornell SSR library, and the Indian Agricultural Research Institute. Most of the SSR markers are from rice and maize. A total of 2,942 SNP markers in the Gramene database belong to barley and are related to high-throughput SNP genotyping in barley [44].
GrainGenes (http://wheat.pw.usda.gov/cgi-bin/graingenes/) hosts multiple types of markers for Triticeae and Avena [45–47]. The website also provides comparative map views for wheat, barley, rye, and oats using CMap. Marker types include SSR, RFLP, and SNP. Most of the SNP markers are from two sources [44, 48]. An improved SNP-based consensus genetic map has been developed from 1,133 individuals from ten mapping populations. The database provides a search panel that accepts a query name or a list of marker names as input.
MaizeGDB (http://www.maizegdb.org) provides a search engine to identify ESTs, AFLPs, RAPD probes, and sequence data for maize. The Legume Information System (LIS) provides access to markers such as SNPs, SSRs, RFLPs, and RAPDs for diverse legumes, including peanut, soybean, alfalfa, and common bean.
The Panzea (http://www.panzea.org/) database describes the genetic architecture of complex traits in maize and teosinte. This database also provides a marker search interface. Two common types of marker, SNP and SSR, can be searched for. The search results display a list of markers with position details related to different chromosomes. When a marker is selected, the website can display it in precomputed multiple sequence alignments using the Look-Align viewer [49].
TriMEDB (Triticeae mapped EST database) [50] provides information on mapped cDNA markers that are related between barley and wheat. The current version of TriMEDB provides map-location data for barley and wheat. These data were retrieved from three published barley linkage maps, namely the barley SNP database of SCRI (http://bioinf.scri.ac.uk/barley_snpdb/), the barley transcript map of IPK (http://pgrc.ipk-gatersleben.de/transcript_map/), and HarvEST barley versions 1.63 and 1.68 (http://harvest.ucr.edu/), and from one diploid wheat map [51]. Users can search the database from the search markers page using marker and chromosome names. The search results include the name of any retrieved marker, related linkage maps, chromosome number, map positions, primer pairs for PCR, EST contigs for each sequence resource, a link to the cDNA assembly, and comparative maps for the rice genome. The database can be accessed at http://trimedb.psc.riken.jp/.
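A search by marker name and chromosome, as offered by several of the databases described here, can be pictured with a small relational example. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table layout, column names, marker names, and primer sequences are all hypothetical and do not reflect the actual schema of TriMEDB or any other resource.

```python
import sqlite3
from typing import List, Optional, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory marker table filled with made-up example rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE marker (
               name           TEXT PRIMARY KEY,
               chromosome     TEXT NOT NULL,
               position_cm    REAL NOT NULL,
               forward_primer TEXT,
               reverse_primer TEXT
           )"""
    )
    conn.executemany(
        "INSERT INTO marker VALUES (?, ?, ?, ?, ?)",
        [
            ("k00123", "3H", 12.5, "ACGTTGCAAGCTTGCA", "TGCAAGCTTGCAACGT"),
            ("k00456", "3H", 48.0, "GGATCCATTGCAGGTA", "TACCTGCAATGGATCC"),
            ("k00789", "5H", 7.2, "CCTAGGTTAACCGGAT", "ATCCGGTTAACCTAGG"),
        ],
    )
    return conn


def search_markers(conn: sqlite3.Connection,
                   name_pattern: Optional[str] = None,
                   chromosome: Optional[str] = None) -> List[Tuple]:
    """Return (name, chromosome, position_cm) rows matching the given filters."""
    query = "SELECT name, chromosome, position_cm FROM marker WHERE 1 = 1"
    params: List[str] = []
    if name_pattern is not None:
        query += " AND name LIKE ?"   # SQL wildcard, e.g. 'k004%'
        params.append(name_pattern)
    if chromosome is not None:
        query += " AND chromosome = ?"
        params.append(chromosome)
    query += " ORDER BY chromosome, position_cm"
    return conn.execute(query, params).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(search_markers(db, chromosome="3H"))
    print(search_markers(db, name_pattern="k007%"))
```

Passing the user input as bound parameters rather than building the SQL string by concatenation keeps such a search form safe against malformed or malicious queries.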
The database of the Rice Genome Annotation Project [52] hosts putative SSRs in the rice genome pseudomolecules (http://rice.plantbiology.msu.edu/). The Rice Genome Annotation Project pseudomolecules (Release 7) were used for SSR identification [53]. This database provides a web interface and displays predicted SSR markers filtered by type and/or chromosome, as well as a GBrowse view to display the SSR sequences.
With the exception of some important species, databases for non-grass species tend to be more limited in scope. A large number of Brassica molecular markers have been developed together with bioinformatics resources [54, 55]. The central Brassica portal (http://www.brassica.info) provides access to a range of Brassica molecular markers, including SNP/InDel, SSR, RFLP, AFLP, and RAPD. This website provides a summary of available information for Brassica SSRs and provides a means to exchange and distribute these markers at the Brassica microsatellite information exchange [56].
The Sol Genomics Network database (SGN; http://solgenomics.net/) is a clade-oriented database (COD) hosting biological data for species in the Solanaceae and their close relatives. The data types range from chromosomes and genes to phenotypes and accessions. SGN hosts more than 20 genetic and physical maps for tomato, potato, pepper, and tobacco with thousands of markers. Genetic marker types in the database include SNP, SSR, AFLP, PCR, and RFLP [57].
The SoyBase database (http://soybase.org/) hosts genomic and genetic data for soybean. The markers include SNP, SSR, RFLP, RAPD, and AFLP. The markers can be viewed in CMap and have also been linked to their corresponding location in a GBrowse2 genome viewer. Each marker comes with the genomic sequence, detection method, and information source.
VegMarks (http://vegmarks.nivot.affrc.go.jp/) is a database for vegetable genetic markers developed by the National Institute of Vegetable and Tea Science (NIVTS) in Japan. This database provides various marker characteristics, including ID number, genetic map position, nucleotide sequence of the clones/PCR primers, and polymorphism data among varieties/accessions for Chinese cabbage, bunching onion, cucumber, eggplant, melon, and tomato. The markers hosted in this database include SNP, SSR, and RFLP. Some marker data are restricted to registered users. This database provides a single map for each chromosome together with marker position information.
MoccaDB (http://moccadb.mpl.ird.fr/) is an integrative database for functional, comparative, and diversity studies in the Rubiaceae family, which includes coffee [58]. It provides easy access to markers such as SSR, SNP, and RFLP, and to related information such as PCR assay conditions, cross amplification within related species, locus position on different linkage maps, and diversity parameters. It also provides a search engine for finding markers by keyword and allows related data to be downloaded in Microsoft Office Excel format.
The Cotton Microsatellite Database (CMD) (http://www.cottonmarker.org/) is a curated and integrated web-based relational database providing centralized access to publicly available cotton SSRs. CMD contains publication, sequence, primer, mapping, and homology data for nine major cotton SSR projects, collectively representing 5,484 SSR markers [59].
In addition to species-specific databases, other databases focus on specific marker types. The autoSNPdb database [60] is based on an early pipeline for SNP discovery from EST sequence data [24, 61]. It provides an interface facilitating a variety of queries to search for SNPs within known genes from a range of species including Brassica, rice, barley [62], and wheat [63]. The SNP identification method was developed based on polymorphisms related to specific genes identified through keyword, sequence similarity, or comparative genomics approaches. The results provide sequence annotation and SNP information in tabular and graphical format.
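The general idea of using redundancy in aligned sequence data to separate likely SNPs from sequencing errors can be sketched as follows. This is a simplified illustration and not the autoSNP or autoSNPdb implementation; the function name `call_snps`, the `min_allele_count` threshold, and the toy alignment are assumptions made for the example. A column is reported only when at least two different bases are each observed a minimum number of times.

```python
from collections import Counter
from typing import Dict, List, Tuple


def call_snps(aligned_reads: List[str],
              min_allele_count: int = 2) -> List[Tuple[int, Dict[str, int]]]:
    """Report alignment columns where two or more alleles are each supported
    by at least `min_allele_count` reads.

    `aligned_reads` are equal-length strings from one alignment, with '-' for
    gaps. Returns a list of (0-based column, {base: read count}) tuples.
    """
    snps = []
    for column, bases in enumerate(zip(*aligned_reads)):
        # Count only unambiguous bases; gaps and N are ignored.
        counts = Counter(b for b in bases if b in "ACGT")
        supported = {base: n for base, n in counts.items()
                     if n >= min_allele_count}
        if len(supported) >= 2:
            snps.append((column, supported))
    return snps


if __name__ == "__main__":
    # Toy alignment: column 2 is a well-supported G/T difference, while the
    # single T in the last column of the fifth read is treated as a likely error.
    reads = [
        "ACGTAC",
        "ACGTAC",
        "ACTTAC",
        "ACTTAC",
        "ACGTAT",
    ]
    print(call_snps(reads))
```

Raising `min_allele_count` makes the calls more conservative at the cost of missing polymorphisms in regions with few overlapping sequences.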
There are an increasing number of bioinformatics resources available for wheat [64]. WheatGenome.info is an integrated database resource which supplies a variety of web-based systems hosting wheat genetic and genomic data. WheatGenome.info [65] provides a GBrowse2-based wheat genome viewer and the CMap and CMap3D comparative genetic map viewers [38, 43]. In the GBrowse2-based wheat genome viewer, wheat reference genomic sequences are currently only available for the wheat group 7 chromosomes [31, 32]. SGSautoSNP (Second Generation Sequencing autoSNP) software has been used to identify more than 900,000 SNPs between four Australian varieties along this chromosome group [66]. More SNPs can be expected to be identified between further wheat cultivars as this project develops.
SSR Primer 2 (http://flora.acpfg.com.au/ssrprimer2/) [67] provides the real-time discovery of SSRs within submitted DNA sequences, with the concomitant design of PCR primers for SSR amplification [68]. The success of this system has been demonstrated in Brassica [69–71] and strawberry [23].
A chickpea (Cicer arietinum L.) root EST database hosted at ICRISAT (http://www.icrisat.org/) provides access to over 2,800 chickpea ESTs from a library constructed after subtractive suppressive hybridization (SSH) of root tissue from two closely related chickpea genotypes possessing different sources of drought avoidance and tolerance [72]. This chickpea root EST database is a subset of a larger ICRISAT-maintained database. ICRISAT also hosts a nonredundant set of 4,543 SNPs, which were identified between two chickpea genotypes [73].
3 Conclusions and Future Direction
Molecular marker databases are expanding rapidly as increasing numbers of markers are developed from the latest high-throughput DNA sequencing technologies. There is an increasing challenge to manage and maintain this expanding data as well as integrate marker data with the growth of available genome sequences. Finally, the greatest challenge will be to fully integrate genetic diversity information with heritable trait information, bridging the genome to phenome divide and providing the tools for more advanced breeding and crop improvement.
References
1. Duran C, Edwards D, Batley J (2009) Molecular marker discovery and genetic map visualisation. In: Edwards D, Hanson D, Stajich J (eds) Applied bioinformatics. Springer, New York, pp 165–189
2. Edwards D, Batley J (2008) Bioinformatics: fundamentals and applications in plant genetics, mapping and breeding. In: Kole C, Abbott AG (eds) Principles and practices of plant genomics. Science Publishers, Inc., New York, pp 269–302
3. Appleby N, Edwards D, Batley J (2009) New technologies for ultra-high throughput genotyping in plants. In: Somers D, Langridge P, Gustafson J (eds) Plant genomics. Humana, New York, pp 19–40
4. Prasad M, Varshney RK, Roy JK, Balyan HS, Gupta PK (2000) The use of microsatellites for detecting DNA polymorphism, genotype identification and genetic diversity in wheat. Theor Appl Genet 100:592–594
5. Stein N, Graner A (2005) Map-based gene isolation in cereal genomes. In: Gupta P, Varshney R (eds) Cereal genomics. Springer, Amsterdam, pp 331–360
6. Varshney RK, Sigmund R, Börner A, Korzun V, Stein N, Sorrells ME, Langridge P, Graner A (2005) Interspecific transferability and comparative mapping of barley EST-SSR markers in wheat, rye and rice. Plant Sci 168:195–202
7. Batley J, Edwards D (2009) Mining for single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) molecular genetic markers. In: Posada D (ed) Bioinformatics for DNA sequence analysis. Humana, New York, pp 303–322
8. Edwards D, Forster JW, Chagné D, Batley J (2007) What are SNPs? In: Oraguzie NC, Rikkerink EHA, Gardiner SE, Silva HND (eds) Association mapping in plants. Springer, New York, pp 41–52
9. Chagné D, Batley J, Edwards D, Forster JW (2007) Single nucleotide polymorphism genotyping in plants. In: Oraguzie N, Rikkerink E, Gardiner S, De Silva H (eds) Association mapping in plants. Springer, New York, pp 77–94
10. Edwards D, Forster JW, Cogan NOI, Batley J, Chagné D (2007) Single nucleotide polymorphism discovery. In: Oraguzie N, Rikkerink E, Gardiner S, De Silva H (eds) Association mapping in plants. Springer, New York, pp 53–76
11. Batley J, Edwards D (2007) SNP applications in plants. In: Oraguzie N, Rikkerink E, Gardiner S, De Silva H (eds) Association mapping in plants. Springer, New York, pp 95–102
12. Allen AM, Barker GL, Berry ST, Coghill JA, Gwilliam R, Kirby S, Robinson P, Brenchley RC, D'Amore R, McKenzie N, Waite D, Hall A, Bevan M, Hall N, Edwards KJ (2011) Transcript-specific, single-nucleotide polymorphism discovery and linkage analysis in hexaploid bread wheat (Triticum aestivum L.). Plant Biotechnol J 9:1086–1099
13. Winfield MO, Wilkinson PA, Allen AM, Barker GL, Coghill JA, Burridge A, Hall A, Brenchley RC, D'Amore R, Hall N, Bevan MW, Richmond T, Gerhardt DJ, Jeddeloh JA, Edwards KJ (2012) Targeted re-sequencing of the allohexaploid wheat exome. Plant Biotechnol J 10:733–742
14. Kharabian-Masouleh A, Waters DLE, Reinke RF, Henry RJ (2011) Discovery of polymorphisms in starch-related genes in rice germplasm by amplification of pooled DNA and deeply parallel sequencing. Plant Biotechnol J 9:1074–1085
15. Subbaiyan GK, Waters DL, Katiyar SK, Sadananda AR, Vaddadi S, Henry RJ (2012) Genome-wide DNA polymorphisms in elite indica rice inbreds discovered by whole-genome sequencing. Plant Biotechnol J 10:623–634
16. Trick M, Long Y, Meng JL, Bancroft I (2009) Single nucleotide polymorphism (SNP) discovery in the polyploid Brassica napus using Solexa transcriptome sequencing. Plant Biotechnol J 7:334–346
17. Barker GLA, Edwards KJ (2009) A genome-wide analysis of single nucleotide polymorphism diversity in the world's major cereal crops. Plant Biotechnol J 7:318–325
18. Bundock PC, Eliott FG, Ablett G, Benson AD, Casu RE, Aitken KS, Henry RJ (2009) Targeted single nucleotide polymorphism (SNP) discovery in a highly polyploid plant species using 454 sequencing. Plant Biotechnol J 7:347–354
19. Edwards D, Batley J (2010) Plant genome sequencing: applications for crop improvement. Plant Biotechnol J 7:1–8
20. Hong CP, Piao ZY, Kang TW, Batley J, Yang TJ, Hur YK, Bhak J, Park BS, Edwards D, Lim YP (2007) Genomic distribution of simple sequence repeats in Brassica rapa. Mol Cells 23:349–356
21. Burgess B, Mountford H, Hopkins CJ, Love C, Ling AE, Spangenberg GC, Edwards D, Batley J (2006) Identification and characterization of simple sequence repeat (SSR) markers derived in silico from Brassica oleracea genome shotgun sequences. Mol Ecol Notes 6:1191–1194
22. Nie X, Li B, Wang L, Liu P, Biradar SS, Li T, Dolezel J, Edwards D, Luo M, Weining S (2012) Development of chromosome-arm-specific microsatellite markers in Triticum aestivum (Poaceae) using NGS technology. Am J Bot 99:e369–e371
23. Keniry A, Hopkins CJ, Jewell E, Morrison B, Spangenberg GC, Edwards D, Batley J (2006) Identification and characterization of simple sequence repeat (SSR) markers from Fragaria x ananassa expressed sequences. Mol Ecol Notes 6:319–322
24. Batley J, Barker G, O'Sullivan H, Edwards KJ, Edwards D (2003) Mining for single nucleotide polymorphisms and insertions/deletions in maize expressed sequence tag data. Plant Physiol 132:84–91
25. Lee H, Lai K, Lorenc MT, Imelfort M, Duran C, Edwards D (2012) Bioinformatics tools and databases for analysis of next generation sequence data. Brief Funct Genomics 2:12–24
26. Imelfort M, Duran C, Batley J, Edwards D (2009) Discovering genetic polymorphisms in next-generation sequencing data. Plant Biotechnol J 7:312–317
27. Wang X, Wang H, Wang J, Sun R, Wu J, Liu S, Bai Y, Mun J-H, Bancroft I, Cheng F, Huang S, Li X, Hua W, Wang J, Wang X, Freeling M, Pires JC, Paterson AH, Chalhoub B, Wang B, Hayward A, Sharpe AG, Park B-S, Weisshaar B, Liu B, Li B, Liu B, Tong C, Song C, Duran C, Peng C, Geng C, Koh C, Lin C, Edwards D, Mu D, Shen D, Soumpourou E, Li F, Fraser F, Conant G, Lassalle G, King GJ, Bonnema G, Tang H, Wang H, Belcram H, Zhou H, Hirakawa H, Abe H, Guo H, Wang H, Jin H, Parkin IAP, Batley J, Kim J-S, Just J, Li J, Xu J, Deng J, Kim JA, Li J, Yu J, Meng J, Wang J, Min J, Poulain J, Hatakeyama K, Wu K, Wang L, Fang L, Trick M, Links MG, Zhao M, Jin M, Ramchiary N, Drou N, Berkman PJ, Cai Q,
Huang Q, Li R, Tabata S, Cheng S, Zhang S, Zhang S, Huang S, Sato S, Sun S, Kwon S-J, Choi S-R, Lee T-H, Fan W, Zhao X, Tan X, Xu X, Wang Y, Qiu Y, Yin Y, Li Y, Du Y, Liao Y, Lim Y, Narusaka Y, Wang Y, Wang Z, Li Z, Wang Z, Xiong Z, Zhang Z (2011) The genome of the mesopolyploid crop species Brassica rapa. Nat Genet 43:1035–1040
28. Hayward A, Dalton-Morgan J, Mason A, Zander M, Edwards D, Batley J (2012) SNP discovery and applications in Brassica napus. J Plant Biotechnol 39:49–61
29. Hayward A, Vighnesh G, Delay C, Samian MR, Manoli S, Stiller J, McKenzie M, Edwards D, Batley J (2012) Second-generation sequencing for gene discovery in the Brassicaceae. Plant Biotechnol J 10:750–759
30. Tollenaere R, Hayward A, Dalton-Morgan J, Campbell E, McLanders J, Lorenc M, Manoli S, Stiller J, Raman R, Raman H, Edwards D, Batley J (2012) Identification and characterisation of candidate Rlm4 blackleg resistance genes in Brassica napus using next generation sequencing. Plant Biotechnol J 10:709–715
31. Berkman PJ, Skarshewski A, Lorenc MT, Lai K, Duran C, Ling EYS, Stiller J, Smits L, Imelfort M, Manoli S, McKenzie M, Kubalakova M, Simkova H, Batley J, Fleury D, Dolezel J, Edwards D (2011) Sequencing and assembly of low copy and genic regions of isolated Triticum aestivum chromosome arm 7DS. Plant Biotechnol J 9:768–775
32. Berkman PJ, Skarshewski A, Manoli S, Lorenc MT, Stiller J, Smits L, Lai K, Campbell E, Kubalakova M, Simkova H, Batley J, Dolezel J, Hernandez P, Edwards D (2012) Sequencing wheat chromosome arm 7BS delimits the 7BS/4AL translocation and reveals homoeologous gene conservation. Theor Appl Genet 124:423–432
33. Hernandez P, Martis M, Dorado G, Pfeifer M, Galvez S, Schaaf S, Jouve N, Simkova H, Valarik M, Dolezel J, Mayer KF (2012) Next-generation sequencing and syntenic integration of flow-sorted arms of wheat chromosome 4A exposes the chromosome structure and gene content. Plant J Cell Mol Biol 69:377–386
34. Duran C, Edwards D, Batley J (2009) Genetic maps and the use of synteny. In: Gustafson JP, Langridge P, Somers DJ (eds) Plant genomics. Humana, New York, pp 41–55
35. Batley J, Edwards D (2009) Genome sequence data: management, storage, and visualization. Biotechniques 46:333–336
36. Duran C, Appleby N, Edwards D, Batley J (2009) Molecular genetic markers: discovery, applications, data storage and visualisation. Curr Bioinform 4:16–27
37. Lim G, Jewell E, Li X, Erwin T, Love C, Batley J, Spangenberg G, Edwards D (2007) A comparative map viewer integrating genetic maps for Brassica and Arabidopsis. BMC Plant Biol 7:40
38. Duran C, Boskovic Z, Imelfort M, Batley J, Hamilton NA, Edwards D (2010) CMap3D: a 3D visualisation tool for comparative genetic maps. Bioinformatics 26:273–274
39. Duran C, Eales D, Marshall D, Imelfort M, Stiller J, Berkman PJ, Clark T, McKenzie M, Appleby N, Batley J, Basford K, Edwards D (2010) Future tools for association mapping in crop plants. Genome 53:1017–1023
40. Edwards D, Batley J (2004) Plant bioinformatics: from genome to phenome. Trends Biotechnol 22:232–237
41. Lai K, Lorenc MT, Edwards D (2012) Genomic databases for crop improvement. Agronomy 2:62–73
42. Youens-Clark K, Buckler E, Casstevens T, Chen C, DeClerck G, Derwent P, Dharmawardhana P, Jaiswal P, Kersey P, Karthikeyan AS, Lu J, McCouch SR, Ren L, Spooner W, Stein JC, Thomason J, Wei S, Ware D (2011) Gramene database in 2010: updates and extensions. Nucleic Acids Res 39:D1085–D1094
43. Youens-Clark K, Faga B, Yap IV, Stein L, Ware D (2009) CMap 1.01: a comparative mapping application for the Internet. Bioinformatics 25:3040–3042
44. Close TJ, Bhat PR, Lonardi S, Wu Y, Rostoks N, Ramsay L, Druka A, Stein N, Svensson JT, Wanamaker S, Bozdag S, Roose ML, Moscou MJ, Chao S, Varshney RK, Szucs P, Sato K, Hayes PM, Matthews DE, Kleinhofs A, Muehlbauer GJ, DeYoung J, Marshall DF, Madishetty K, Fenton RD, Condamine P, Graner A, Waugh R (2009) Development and implementation of high-throughput SNP genotyping in barley. BMC Genomics 10:582
45. O'Sullivan H (2007) GrainGenes – a genomic database for Triticeae and Avena. In: Edwards D (ed) Methods in molecular biology. Humana, Totowa, NJ, pp 301–314
46. Carollo V, Matthews DE, Lazo GR, Blake TK, Hummel DD, Lui N, Hane DL, Anderson OD (2005) GrainGenes 2.0. An improved resource for the small-grains community. Plant Physiol 139:643–651
47. Matthews DE, Carollo VL, Lazo GR, Anderson OD (2003) GrainGenes, the genome database for small-grain crops. Nucleic Acids Res 31:183–186
48. Szűcs P, Blake VC, Bhat PR, Chao S, Close TJ, Cuesta-Marcos A, Muehlbauer GJ, Ramsay L, Waugh R, Hayes PM (2009) An integrated resource for barley linkage map and malting quality QTL alignment. Plant Gen 2:134–140
49. Canaran P, Stein L, Ware D (2006) Look-Align: an interactive web-based multiple sequence alignment viewer with polymorphism analysis support. Bioinformatics 22:885–886
50. Mochida K, Saisho D, Yoshida T, Sakurai T, Shinozaki K (2008) TriMEDB: a database to integrate transcribed markers and facilitate genetic studies of the tribe Triticeae. BMC Plant Biol 8:72
51. Hori K, Takehara S, Nankaku N, Sato K, Sasakuma T, Takeda K (2007) Barley EST markers enhance map saturation and QTL mapping in diploid wheat. Breed Sci 57:39–45
52. Ouyang S, Zhu W, Hamilton J, Lin H, Campbell M, Childs K, Thibaud-Nissen F, Malek RL, Lee Y, Zheng L, Orvis J, Haas B, Wortman J, Buell CR (2007) The TIGR rice genome annotation resource: improvements and new features. Nucleic Acids Res 35:D883–D887
53. Temnykh S, DeClerck G, Lukashova A, Lipovich L, Cartinhour S, McCouch S (2001) Computational and experimental analysis of microsatellites in rice (Oryza sativa L.): fre-quency, length variation, transposon associa-tions, and genetic marker potential. Genome Res 11:1441–1452
54. Lorenc MT, Boskovic Z, Stiller J, Duran C, Edwards D (2012) Role of Bioinformatics as a tool for oilseed Brassica species. In: Edwards D, Parkin IAP, Batley J (eds) Genetics, genom-ics and breeding of oilseed Brassicas . Science Publishers Inc., New Hampshire, pp 194–205
55. Duran C, Boskovic Z, Batley J, Edwards D (2011) Role of bioinformatics as a tool for veg-etable Brassica species. In: Sadowski J (ed) Vegetable Brassicas. Science Publishers, Inc., New Hampshire, pp 406–418
56. Choi SR, Teakle GR, Plaha P, Kim JH, Allender CJ, Beynon E, Piao ZY, Soengas P, Han TH, King GJ, Barker GC, Hand P, Lydiate DJ, Batley J, Edwards D, Koo DH, Bang JW, Park BS, Lim YP (2007) The reference genetic link-age map for the multinational Brassica rapa genome sequencing project. Theor Appl Genet 115:777–792
57. Bombarely A, Menda N, Tecle IY, Buels RM, Strickler S, Fischer-York T, Pujar A, Leto J, Gosselin J, Mueller LA (2011) The sol genomics network (solgenomics.net): grow-ing tomatoes using Perl. Nucleic Acids Res 39:D1149–D1155
58. Plechakova O, Tranchant-Dubreuil C, Benedet F, Couderc M, Tinaut A, Viader V, De Block P, Hamon P, Campa C, de Kochko A, Hamon S, Poncet V (2009) MoccaDB – an integrative
database for functional, comparative and diver-sity studies in the Rubiaceae family. BMC Plant Biol 9:123
59. Blenda A, Scheffler J, Scheffler B, Palmer M, Lacape JM, Yu JZ, Jesudurai C, Jung S, Muthukumar S, Yellambalase P, Ficklin S, Staton M, Eshelman R, Ulloa M, Saha S, Burr B, Liu S, Zhang T, Fang D, Pepper A, Kumpatla S, Jacobs J, Tomkins J, Cantrell R, Main D (2006) CMD: a cotton microsatellite database resource for Gossypium genomics. BMC Genomics 7:132
60. Duran C, Appleby N, Clark T, Wood D, Imelfort M, Batley J, Edwards D (2009) AutoSNPdb: an annotated single nucleotide polymorphism database for crop plants. Nucleic Acids Res 37:D951–D953
61. Barker G, Batley J, O'Sullivan H, Edwards KJ, Edwards D (2003) Redundancy based detection of sequence polymorphisms in expressed sequence tag data using autoSNP. Bioinformatics 19:421–422
62. Duran C, Appleby N, Vardy M, Imelfort M, Edwards D, Batley J (2009) Single nucleotide polymorphism discovery in barley using autoSNPdb. Plant Biotechnol J 7:326–333
63. Lai K, Duran C, Berkman PJ, Lorenc MT, Stiller J, Manoli S, Hayden MJ, Forrest KL, Fleury D, Baumann U, Zander M, Mason AS, Batley J, Edwards D (2012) Single nucleotide polymorphism discovery from wheat next-generation sequence data. Plant Biotechnol J 10:743–749
64. Edwards D (2011) Wheat bioinformatics. In: Bonjean A, Angus W, Van Ginkel M (eds) The world wheat book. Lavoisier, Paris, pp 851–875
65. Lai K, Berkman PJ, Lorenc MT, Duran C, Smits L, Manoli S, Stiller J, Edwards D (2012) WheatGenome.info: an integrated database and portal for wheat genome information. Plant Cell Physiol 53:e2
66. Edwards D, Wilcox S, Barrero RA, Fleury D, Cavanagh CR, Forrest KL, Hayden MJ, Moolhuijzen P, Keeble-Gagnère G, Bellgard MI, Lorenc MT, Shang CA, Baumann U, Taylor JM, Morell MK, Langridge P, Appels R, Fitzgerald A (2012) Bread matters: a national initiative to profile the genetic diversity of Australian wheat. Plant Biotechnol J 10:703–708
67. Jewell E, Robinson A, Savage D, Erwin T, Love CG, Lim GAC, Li X, Batley J, Spangenberg GC, Edwards D (2006) SSRPrimer and SSR taxonomy tree: biome SSR discovery. Nucleic Acids Res 34:W656–W659
68. Robinson AJ, Love CG, Batley J, Barker G, Edwards D (2004) Simple sequence repeat marker loci discovery using SSR primer. Bioinformatics 20:1475–1476
69. Batley J, Hopkins CJ, Cogan NOI, Hand M, Jewell E, Kaur J, Kaur S, Li X, Ling AE, Love C, Mountford H, Todorovic M, Vardy M, Walkiewicz M, Spangenberg GC, Edwards D (2007) Identification and characterization of simple sequence repeat markers from Brassica napus expressed sequences. Mol Ecol Notes 7:886–889
70. Hopkins CJ, Cogan NOI, Hand M, Jewell E, Kaur J, Li X, Lim GAC, Ling AE, Love C, Mountford H, Todorovic M, Vardy M, Spangenberg GC, Edwards D, Batley J (2007) Sixteen new simple sequence repeat markers from Brassica juncea expressed sequences and their cross-species amplification. Mol Ecol Notes 7:697–700
71. Ling AE, Kaur J, Burgess B, Hand M, Hopkins CJ, Li X, Love CG, Vardy M, Walkiewicz M, Spangenberg G, Edwards D, Batley J (2007) Characterization of simple sequence repeat markers derived in silico from Brassica rapa bacterial artificial chromosome sequences and their application in Brassica napus. Mol Ecol Notes 7:273–277
72. Jayashree B, Buhariwalla HK, Shinde S, Crouch JH (2005) A legume genomics resource: the chickpea root expressed sequence tag database. Electron J Biotechnol 8:128–133
73. Azam S, Thakur V, Ruperao P, Shah T, Balaji J, Amindala B, Farmer AD, Studholme DJ, May GD, Edwards D, Jones JD, Varshney RK (2012) Coverage-based consensus calling (CbCC) of short sequence reads and comparison of CbCC results to identify SNPs in chickpea (Cicer arietinum; Fabaceae), a crop species without a reference genome. Am J Bot 99:186–192
74. Cheng F, Liu S, Wu J, Fang L, Sun S, Liu B, Li P, Hua W, Wang X (2011) BRAD, the genetics and genomics database for Brassica plants. BMC Plant Biol 11:136
75. Karsch-Mizrachi I, Nakamura Y, Cochrane G (2012) The international nucleotide sequence database collaboration. Nucleic Acids Res 40:D33–D37
76. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW (2009) GenBank. Nucleic Acids Res 37:D26–D31
77. Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Edgar R, Federhen S, Feolo M, Geer LY, Helmberg W, Kapustin Y, Khovayko O, Landsman D, Lipman DJ, Madden TL, Maglott DR, Miller V, Ostell J, Pruitt KD, Schuler GD, Shumway M, Sequeira E, Sherry ST, Sirotkin K, Souvorov A, Starchenko G, Tatusov RL, Tatusova TA, Wagner L, Yaschenko E (2008) Database resources of the national center for biotechnology information. Nucleic Acids Res 36:D13–D21
78. Gonzales MD, Gajendran K, Farmer AD, Archuleta E, Beavis WD (2007) Leveraging model legume information to find candidate genes for soybean sudden death syndrome using the legume information system. In: Edwards D (ed) Methods in molecular biology. Humana, Totowa, NJ, pp 245–259
79. Gonzales MD, Archuleta E, Farmer A, Gajendran K, Grant D, Shoemaker R, Beavis WD, Waugh ME (2005) The legume information system (LIS): an integrated information resource for comparative legume biology. Nucleic Acids Res 33:D660–D665
80. Schaeffer ML, Harper LC, Gardiner JM, Andorf CM, Campbell DA, Cannon EK, Sen TZ, Lawrence CJ (2011) MaizeGDB: curation and outreach go hand-in-hand. Database (Oxford) 2011, bar022
81. Lawrence CJ (2007) MaizeGDB – the maize genetics and genomics database. In: Edwards D (ed) Methods in molecular biology. Humana, Totowa, NJ, pp 331–345
82. Lawrence CJ, Schaeffer ML, Seigfried TE, Campbell DA, Harper LC (2007) MaizeGDB's new data types, resources and activities. Nucleic Acids Res 35:D895–D900
83. Canaran P, Buckler ES, Glaubitz JC, Stein L, Sun Q, Zhao W, Ware D (2008) Panzea: an update on new content and features. Nucleic Acids Res 36:D1041–D1043
84. Grant D, Nelson RT, Cannon SB, Shoemaker RC (2010) SoyBase, the USDA-ARS soybean genetics and genomics database. Nucleic Acids Res 38:D843–D846
85. Wegrzyn J, Main D, Figueroa B, Choi M, Yu J, Neale D, Jung S, Lee T, Stanton M, Zheng P, Ficklin S, Cho I, Peace C, Evans K, Volk G, Oraguzie N, Chen C, Olmstead M, Gmitter G, Abbott A (2012) Uniform standards for genome databases in forest and fruit trees. Tree Genet Genomes 8:1–2
86. Tree fruit Genome Database Resources (tfGDR) (2002) Washington State University, Pullman, WA. http://www.tfgdr.org