
Jacqueline Batley (ed.), Plant Genotyping: Methods and Protocols, Methods in Molecular Biology, vol. 1245, DOI 10.1007/978-1-4939-1966-6_4, © Springer Science+Business Media New York 2015

Chapter 4

Molecular Marker Databases

Kaitao Lai, Michał Tadeusz Lorenc, and David Edwards

Abstract

The detection and analysis of genetic variation plays an important role in plant breeding, and this role is increasing with the continued development of genome sequencing technologies. Molecular genetic markers are important tools to characterize genetic variation and assist with genomic breeding. Processing and storing the growing abundance of molecular marker data requires the development of specific bioinformatics tools and advanced databases. Molecular marker databases range from species-specific through to organism-wide and often host a variety of additional related genetic, genomic, or phenotypic information. In this chapter, we present some of the features of plant molecular genetic marker databases, highlight the various types of marker resources, and predict the potential future direction of crop marker databases.

Key words Molecular marker, Genetic marker, Genetic variation, SNP marker, SSR marker

1 Introduction

The characterization of genetic variation can provide knowledge to help understand the molecular basis of various biological phenomena in plants. Phenotype-based genetic markers were used in Gregor Mendel’s experiments in the nineteenth century. Later, phenotype-based genetic markers helped establish the theory of genetic linkage. More recently, DNA-based markers have been developed to overcome the limitations of phenotype-based genetic markers [1].

While several diverse DNA-based marker types have been developed, single nucleotide polymorphisms (SNPs) and simple sequence repeats (SSRs, also known as microsatellites) predominate and are widely used in plant breeding, genomic research, and modern genetic analysis [2, 3]. In plant breeding and genetic research, molecular markers are used for gene mapping, quantitative trait locus (QTL) analysis, phylogenetic studies, comparative genomics, and marker-assisted breeding [4–6].

Most molecular marker databases host SNP and SSR markers [7]. Some databases also include other, less commonly used types of marker, such as restriction fragment length polymorphism (RFLP), amplified fragment length polymorphism (AFLP), random amplified polymorphic DNA (RAPD), short tandem repeat (STR), and diversity arrays technology (DArT) markers.

A SNP is a single-base variation in DNA sequence: a position in the genome at which the nucleotide differs between individuals [8]. SNPs are regarded as evolutionarily conserved markers and have been used in place of SSRs as markers for QTL analysis and in association studies. There are several approaches to identify and genotype SNPs in plants [9, 10], and their diverse applications suggest that they will continue to be the dominant DNA molecular marker in the foreseeable future [11]. The application of new sequencing methods is leading to the discovery of large numbers of SNPs in wheat [12, 13], rice [14, 15], Brassicas [16], and other crop species [17, 18].

SSRs are highly polymorphic and informative markers. They show a high degree of transferability between species and so are regarded as excellent markers for comparative genetic and genomic analysis: PCR primers designed to an SSR in one species frequently amplify the corresponding locus in related species. Mining SSRs from gene and genome sequence data is now routine [19], and large numbers of SSRs have been identified in a range of species, including Brassicas [20, 21], wheat [22], and strawberry [23]. SSR loci also provide hot spots for SNP discovery, and SSRs may readily be converted to SNP markers [24].
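SSRs are tandem repeats of a short motif, so the basic mining step can be illustrated with a regular expression that finds a motif immediately followed by further exact copies of itself. The Python sketch below is a minimal illustration under assumed thresholds (motifs of 2 to 6 bp, at least five copies). It finds only perfect repeats and is not the algorithm of any tool cited in this chapter. The function and type names are hypothetical.

```python
import re
from typing import List, NamedTuple


class SSR(NamedTuple):
    start: int   # 0-based start of the repeat tract
    end: int     # 0-based, exclusive end of the repeat tract
    motif: str   # repeat unit, e.g. "AG"
    copies: int  # number of complete, uninterrupted motif copies


def _is_compound(motif: str) -> bool:
    """True if the motif is itself a repeat of a shorter unit (e.g. "AA" or "ATAT")."""
    n = len(motif)
    return any(n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n))


def find_ssrs(seq: str, min_unit: int = 2, max_unit: int = 6,
              min_copies: int = 5) -> List[SSR]:
    """Find perfect SSRs with motif lengths min_unit..max_unit and >= min_copies copies."""
    seq = seq.upper()
    hits = []
    for unit in range(min_unit, max_unit + 1):
        # One motif of `unit` bases followed by at least (min_copies - 1) exact copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if _is_compound(motif):
                continue  # skip: this motif is a repeat of a shorter unit
            hits.append(SSR(m.start(), m.end(), motif, (m.end() - m.start()) // unit))
    return sorted(hits)


seq = "GGCT" + "AG" * 6 + "TTCA" + "ACG" * 5 + "GG"  # toy sequence
for ssr in find_ssrs(seq):
    print(ssr)
# SSR(start=4, end=16, motif='AG', copies=6)
# SSR(start=20, end=35, motif='ACG', copies=5)
```

The sketch skips motifs that are themselves repeats of a shorter unit (for example AA or ATAT), so it does not report a mononucleotide run as a dinucleotide SSR. It ignores imperfect and compound repeats, which practical SSR finders usually handle.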

Advances in genome sequencing technology and the increasing availability of genome sequences are providing an abundance of dense molecular markers [25, 26]. For example, sequence polymorphisms developed using the Brassica rapa genome sequence [27] have been used to identify and characterize SNPs and other polymorphisms in agronomically important genes in canola (B. napus) [28–30]. In addition, the sequencing of isolated chromosome arms in wheat [31–33] has led to the identification of large numbers of molecular markers [22].

Genetic linkage maps represent the order of known molecular genetic markers along a given chromosome for a given species. Comparative mapping, which aligns the maps of different species through the markers they share, is a valuable technique to identify similarities and differences between species [34]. Many marker databases provide the CMap map visualization tool, or their own customized viewers, to display chromosomes and genetic markers with their mapping locations in the form of genetic linkage maps or comparative maps. A list of molecular marker databases is presented in Table 1. In addition, web links and references for relevant marker databases are presented in Table 2.
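Map comparison usually starts from the markers that two maps share. The Python sketch below treats each map as a dictionary from marker name to position in centimorgans for a single chromosome, which is an assumption made for brevity. It lists the shared markers and reports pairs whose order differs between the two maps, a simple signal of a possible inversion or rearrangement. The marker names and positions are invented. Viewers such as CMap may also link features through correspondences other than identical names.

```python
from itertools import combinations
from typing import Dict, List, Tuple

LinkageMap = Dict[str, float]  # marker name -> position (cM) on one chromosome


def shared_markers(map_a: LinkageMap, map_b: LinkageMap) -> List[str]:
    """Markers present on both maps, in their order along map_a."""
    return sorted(set(map_a) & set(map_b), key=lambda m: (map_a[m], m))


def discordant_pairs(map_a: LinkageMap, map_b: LinkageMap) -> List[Tuple[str, str]]:
    """Pairs of shared markers whose relative order is reversed between the two maps.

    An empty list means the shared markers are fully collinear; pairs tied in
    position on either map are not counted as discordant.
    """
    out = []
    for x, y in combinations(shared_markers(map_a, map_b), 2):
        # Opposite signs mean x precedes y on one map but follows it on the other.
        if (map_a[x] - map_a[y]) * (map_b[x] - map_b[y]) < 0:
            out.append((x, y))
    return out


# Hypothetical positions for one chromosome in two species.
map_a = {"m1": 0.0, "m2": 12.5, "m3": 30.1, "m4": 41.0, "a9": 55.0}
map_b = {"m1": 2.0, "m2": 20.0, "m3": 15.0, "m4": 60.0, "b7": 8.0}
print(shared_markers(map_a, map_b))    # ['m1', 'm2', 'm3', 'm4']
print(discordant_pairs(map_a, map_b))  # [('m2', 'm3')]
```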

Table 1
Examples of molecular marker databases with different types of markers

Database name | Viewer | Marker types supplied (SNPs, SSRs, RFLPs, RAPDs, AFLPs, ESTs, BACs, DArTs, DNA probes, PCR primers)
autoSNPdb | * | +
Brassica.info | | + + + + +
Brassica rapa genome database | * | +
Chickpea root EST database | | +
Cotton Marker Database (CMD) | * | + + +
GenBank dbSNP | * | +
Graingenes | * | + + + + + + +
Gramene | * | + + + + + + + + +
ICRISAT | | +
Legume Information System (LIS) | * | + + + + +
MaizeGDB | | + + + + +
MoccaDB | | + + +
Panzea | * | + +
Rice Genome Annotation Project | * | +
SSR Primer | | +
SSR taxonomy tree | | +
SOL Genomics Network (SGN) | * | + + + + +
SoyBase | * | + + + + + +
tfGDR Project Website | * | +
Triticeae Mapped EST DataBase ver.2.0 (TriMEDB) | * | + +
VegMarks | * | + + +
Wheat genome information | | + +

* indicates that this database provides a viewer; + indicates that this database supplies this type of marker

Table 2
Examples of molecular marker databases related to crop improvement

Database name | Web link | References
autoSNPdb | http://autosnpdb.appliedbioinformatics.com.au/ | [60, 62, 63]
Brassica.info | http://www.brassica.info/resource/markers.php | [56]
Brassica rapa genome database | http://brassicadb.org/brad/geneticMarker.php | [75]
Chickpea root EST database | http://www.icrisat.org/what-we-do/biotechnology/Cpest/home.asp | [73]
Cotton Marker Database (CMD) | http://www.cottonmarker.org/cgi-bin/cmd_search_marker_result.cgi | [59]
GenBank dbSNP | http://www.ncbi.nlm.nih.gov/projects/SNP/ | [76–78]
Graingenes | http://wheat.pw.usda.gov/cgi-bin/graingenes/browse.cgi?class=marker | [45–47]
Gramene | http://www.gramene.org/db/markers/marker_view | [42]
ICRISAT | http://www.icrisat.org/ | [74]
Legume Information System (LIS) | http://www.comparative-legumes.org/ | [79, 80]
MaizeGDB | http://www.maizegdb.org/probe.php | [81–83]
MoccaDB | http://moccadb.mpl.ird.fr/index.php?cat=1 | [58]
Panzea | http://www.panzea.org/db/searches/webform/marker_search | [84]
Rice Genome Annotation Project | http://rice.plantbiology.msu.edu/annotation_pseudo_putativessr.shtml | [52]
SSR Primer 2 | http://flora.acpfg.com.au/ssrprimer2/ | [68]
SSR taxonomy tree | http://appliedbioinformatics.com.au/projects/ssrtaxonomy/php/ | [68]
SOL Genomics Network (SGN) | http://solgenomics.net/ | [57]
SoyBase | http://soybase.org/ | [85]
tfGDR Project Website | http://tfgdr.bioinfo.wsu.edu/ | [86]
Triticeae Mapped EST Database ver.2.0 (TriMEDB) | http://trimedb.psc.riken.jp/index.pl | [50]
VegMarks | http://vegmarks.nivot.affrc.go.jp/VegMarks/jsp/page.do?transition=marker |
Wheat genome information | http://www.wheatgenome.info | [65, 67]

2 Molecular Marker Databases

With the ever-increasing amount of genetic and genomic information, there is a requirement to manage the data and make it available and accessible to researchers [35, 36]. This includes the development of custom visualization tools [36–38] and of bioinformatics systems that traverse the genome to phenome divide [39, 40]. Many molecular marker databases provide various types of markers for a range of species, while some databases provide information on a single type of marker [41]. The largest single marker database is dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/), which provides SNP data mostly for humans and other vertebrates, although it also includes some plant data.

There are several databases for the grasses. The Gramene database (http://www.gramene.org/) hosts many types of markers based on the genomes of rice, maize, grape, and Arabidopsis [42]. The website provides a search engine that users can use to search for specific markers. Marker details are displayed in text format, including database cross-references and map positions linked to chromosomes in CMap [43]. Sources of SSR markers include the International Rice Genome Sequencing Project, the International Rice Microsatellite Initiative (IRMI), MaizeGDB, the Cornell SSR library, and the Indian Agricultural Research Institute. Most of the SSR markers are from rice and maize. Gramene also hosts 2,942 barley SNP markers derived from high-throughput SNP genotyping in barley [44].

GrainGenes (http://wheat.pw.usda.gov/cgi-bin/graingenes/) hosts multiple types of markers for Triticeae and Avena [45–47]. The website also provides comparative map views for wheat, barley, rye, and oats using CMap. Marker types include SSR, RFLP, and SNP. Most of the SNP markers come from two sources [44, 48]. An improved SNP-based consensus genetic map has been developed from 1,133 individuals from ten mapping populations. The database provides a search panel that accepts either a single query name or a list of marker names as input.

MaizeGDB (http://www.maizegdb.org) provides a search engine to identify ESTs, AFLPs, RAPD probes, and sequence data for maize. The Legume Information System (LIS) provides access to markers such as SNPs, SSRs, RFLPs, and RAPDs for diverse legumes, including peanut, soybean, alfalfa, and common bean.

The Panzea (http://www.panzea.org/) database describes the genetic architecture of complex traits in maize and teosinte. This database also provides a marker search interface for the two most common marker types, SNPs and SSRs. The search results display a list of markers with their positions on different chromosomes. When a marker is selected, the website can display it in precomputed multiple sequence alignments using the Look-Align viewer [49].

TriMEDB (Triticeae Mapped EST Database) [50] provides information on mapped cDNA markers relating barley and wheat. The current version of TriMEDB provides map-location data for barley and wheat. These data were retrieved from three published barley linkage maps and one diploid wheat map [51]. The barley sources are the barley SNP database of SCRI (http://bioinf.scri.ac.uk/barley_snpdb/), the barley transcript map of IPK (http://pgrc.ipk-gatersleben.de/transcript_map/), and HarvEST barley versions 1.63 and 1.68 (http://harvest.ucr.edu/). Users can search the database from the search markers page using marker and chromosome names. The search results include the name of any retrieved marker, related linkage maps, chromosome number, map positions, primer pairs for PCR, EST contigs for each sequence resource, a link to the cDNA assembly, and comparative maps for the rice genome. The database can be accessed at http://trimedb.psc.riken.jp/.
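The kind of search TriMEDB offers maps naturally onto a relational layout: users query by marker or chromosome name and get back map positions and PCR primer pairs. One table can hold the markers and another their placements on linkage maps. The sketch below uses Python's built-in sqlite3 module. The schema, the table and column names, and the example rows are hypothetical; they do not describe the design of TriMEDB or of any other database discussed here.

```python
import sqlite3

# Hypothetical schema: one row per marker, one row per placement of a marker on a map.
SCHEMA = """
CREATE TABLE marker (
    marker_id      INTEGER PRIMARY KEY,
    name           TEXT NOT NULL UNIQUE,
    marker_type    TEXT NOT NULL,       -- e.g. 'SNP', 'SSR', 'RFLP'
    forward_primer TEXT,                -- NULL for markers without PCR primers
    reverse_primer TEXT
);
CREATE TABLE map_position (
    marker_id   INTEGER NOT NULL REFERENCES marker (marker_id),
    map_name    TEXT NOT NULL,          -- linkage map the placement comes from
    chromosome  TEXT NOT NULL,
    position_cm REAL NOT NULL
);
CREATE INDEX idx_position ON map_position (chromosome, position_cm);
"""


def open_marker_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def search_markers(conn, chromosome, name_pattern="%"):
    """Markers placed on `chromosome` whose name matches a SQL LIKE pattern,
    ordered by map and then by position along the map."""
    return conn.execute(
        """
        SELECT m.name, m.marker_type, p.map_name, p.position_cm,
               m.forward_primer, m.reverse_primer
        FROM marker AS m
        JOIN map_position AS p ON p.marker_id = m.marker_id
        WHERE p.chromosome = ? AND m.name LIKE ?
        ORDER BY p.map_name, p.position_cm
        """,
        (chromosome, name_pattern),
    ).fetchall()


# Made-up example rows; the primer strings are placeholders, not real primers.
conn = open_marker_db()
conn.executemany("INSERT INTO marker VALUES (?, ?, ?, ?, ?)", [
    (1, "mk001", "SSR", "ACGTTGCAAGGTCATG", "TTGACCAGTGCAGCTA"),
    (2, "mk002", "SNP", None, None),
    (3, "mk003", "SSR", None, None),
])
conn.executemany("INSERT INTO map_position VALUES (?, ?, ?, ?)", [
    (1, "mapA", "1H", 12.5), (2, "mapA", "1H", 3.0), (3, "mapA", "2H", 40.0),
])
for row in search_markers(conn, "1H"):
    print(row)
```

Map placements live in their own table, so one marker can carry positions on several linkage maps. A comparative display relies on exactly that.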

The database of the Rice Genome Annotation Project [52] hosts putative SSRs in the rice genome pseudomolecules (http://rice.plantbiology.msu.edu/). The Rice Genome Annotation Project pseudomolecules (Release 7) were used for SSR identification [53]. This database provides a web interface that displays predicted SSR markers filtered by type and/or chromosome, as well as a GBrowse view to display the SSR sequences.

With the exception of some important species, databases for nongrass species tend to be more limited in scope. A large number of Brassica molecular markers have been developed, together with supporting bioinformatics resources [54, 55]. The central Brassica portal (http://www.brassica.info) provides access to a range of Brassica molecular markers, including SNP/InDel, SSR, RFLP, AFLP, and RAPD markers. This website provides a summary of available information for Brassica SSRs and a means to exchange and distribute these markers through the Brassica microsatellite information exchange [56].

The Sol Genomics Network database (SGN; http://solgenomics.net/) is a clade-oriented database (COD) hosting biological data for species in the Solanaceae and their close relatives. The data types range from chromosomes and genes to phenotypes and accessions. SGN hosts more than 20 genetic and physical maps for tomato, potato, pepper, and tobacco, with thousands of markers. Genetic marker types in the database include SNP, SSR, AFLP, PCR-based, and RFLP markers [57].

The SoyBase database (http://soybase.org/) hosts genomic and genetic data for soybean. The markers include SNP, SSR, RFLP, RAPD, and AFLP markers. The markers can be viewed in CMap and have also been linked to their corresponding locations in a GBrowse2 genome viewer. Each marker comes with its genomic sequence, detection method, and information source.


VegMarks (http://vegmarks.nivot.affrc.go.jp/) is a database of vegetable genetic markers developed by the National Institute of Vegetable and Tea Science (NIVTS) in Japan. This database provides various marker characteristics, including ID number, genetic map position, nucleotide sequence of the clones/PCR primers, and polymorphism data among varieties/accessions for Chinese cabbage, bunching onion, cucumber, eggplant, melon, and tomato. The markers hosted in this database include SNP, SSR, and RFLP markers. Some marker data are restricted to registered users. This database provides a single map for each chromosome, together with marker position information.

MoccaDB (http://moccadb.mpl.ird.fr/) is an integrative database for functional, comparative, and diversity studies in the Rubiaceae family, which includes coffee [58]. It provides easy access to markers such as SSRs, SNPs, and RFLPs. It also gives access to related information such as PCR assay conditions, cross-amplification within related species, locus positions on different linkage maps, and diversity parameters. It also provides a search engine for finding related markers by keyword, and downloads of related data in Microsoft Office Excel format.

The Cotton Microsatellite Database (CMD) (http://www.cottonmarker.org/) is a curated and integrated web-based relational database providing centralized access to publicly available cotton SSRs. CMD contains publication, sequence, primer, mapping, and homology data for nine major cotton SSR projects, collectively representing 5,484 SSR markers [59].

In addition to species-specific databases, other databases focus on specific marker types. The autoSNPdb database [60] is based on an early pipeline for SNP discovery from EST sequence data [24, 61]. Its interface supports a variety of queries that search for SNPs within known genes from a range of species, including Brassica, rice, barley [62], and wheat [63]. Users can retrieve polymorphisms in specific genes of interest, identified through keyword, sequence similarity, or comparative genomics searches. The results provide sequence annotation and SNP information in tabular and graphical formats.
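EST-based SNP discovery of the kind autoSNPdb builds on exploits redundancy. A base difference seen in several independent sequences is less likely to be a sequencing error than one seen only once. The Python sketch below illustrates only that idea. It assumes the sequences are already clustered and aligned to equal length, and it uses an assumed threshold of two sequences per allele. It is not the autoSNPdb pipeline, and it omits assembly, quality scores, and any further scoring a production tool would apply.

```python
from collections import Counter
from typing import Dict, List


def call_snps(aligned: List[str], min_allele_count: int = 2) -> Dict[int, Dict[str, int]]:
    """Return {column: {base: count}} for alignment columns that carry at least two
    different bases, each observed in at least `min_allele_count` sequences.

    `aligned` holds equal-length, pre-aligned sequences; any character other than
    A, C, G or T (e.g. '-' or 'N') is treated as missing data.
    """
    if not aligned:
        return {}
    width = len(aligned[0])
    if any(len(s) != width for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    snps = {}
    for col in range(width):
        counts = Counter(s[col].upper() for s in aligned if s[col].upper() in "ACGT")
        # Redundancy filter: a base seen fewer than min_allele_count times is
        # treated as a possible sequencing error rather than a real allele.
        supported = {base: n for base, n in counts.items() if n >= min_allele_count}
        if len(supported) >= 2:
            snps[col] = supported
    return snps


# Hypothetical EST fragments, already aligned to one another.
reads = ["ACGTAC", "ACGTAC", "ACCTAC", "ACCTAG", "AC-TAC"]
print(call_snps(reads))  # {2: {'G': 2, 'C': 2}}
```

Column 5 is not reported because its G is seen only once. Lowering `min_allele_count` to 1 would accept it, at the cost of admitting more sequencing errors.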

An increasing number of bioinformatics resources are available for wheat [64]. WheatGenome.info [65] is an integrated database resource that supplies a variety of web-based systems hosting wheat genetic and genomic data. These include a GBrowse2-based wheat genome viewer and the CMap and CMap3D comparative genetic map viewers [38, 43]. In the GBrowse2-based wheat genome viewer, wheat reference genomic sequences are currently available only for the group 7 chromosomes [31, 32]. SGSautoSNP (Second Generation Sequencing autoSNP) software has been used to identify more than 900,000 SNPs between four Australian varieties across this chromosome group [66]. More SNPs can be expected to be identified between further wheat cultivars as this project develops.

SSR Primer 2 (http://flora.acpfg.com.au/ssrprimer2/) [67] provides real-time discovery of SSRs within submitted DNA sequences, with concomitant design of PCR primers for SSR amplification [68]. The success of this system has been demonstrated in Brassica [69–71] and strawberry [23].
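The primer design half of that workflow can be sketched in a deliberately simplified form. The Python example below assumes the SSR coordinates are already known. It slides a fixed-length window along each flank and keeps windows whose GC fraction and Wallace-rule melting temperature fall within assumed ranges (40 to 60 % GC, 56 to 64 °C). These thresholds and the function names are illustrative assumptions. This is not the procedure SSR Primer 2 uses. Dedicated primer design software also checks secondary structure, primer-dimers, and product size.

```python
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def wallace_tm(seq: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule, 2*(A+T) + 4*(G+C).

    A rule of thumb for short oligonucleotides only; dedicated primer design
    software uses nearest-neighbour thermodynamic models instead.
    """
    seq = seq.upper()
    return 2 * (seq.count("A") + seq.count("T")) + 4 * (seq.count("G") + seq.count("C"))


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def _acceptable(p: str, gc_range: Tuple[float, float], tm_range: Tuple[int, int]) -> bool:
    return (gc_range[0] <= gc_fraction(p) <= gc_range[1]
            and tm_range[0] <= wallace_tm(p) <= tm_range[1])


def flanking_primers(seq: str, ssr_start: int, ssr_end: int, length: int = 20,
                     flank: int = 60, gc_range=(0.4, 0.6), tm_range=(56, 64)
                     ) -> Tuple[List[Tuple[int, str]], List[Tuple[int, str]]]:
    """Candidate forward primers upstream and reverse primers downstream of an SSR
    spanning seq[ssr_start:ssr_end]. Each candidate is (start offset, primer 5'->3')."""
    seq = seq.upper()
    forward, reverse = [], []
    for i in range(max(0, ssr_start - flank), ssr_start - length + 1):
        window = seq[i:i + length]
        if _acceptable(window, gc_range, tm_range):
            forward.append((i, window))
    for i in range(ssr_end, min(len(seq), ssr_end + flank) - length + 1):
        window = seq[i:i + length]
        if _acceptable(window, gc_range, tm_range):
            # The reverse primer anneals to the given strand, so report its reverse complement.
            reverse.append((i, reverse_complement(window)))
    return forward, reverse


up, ssr, down = "ACGT" * 5, "AG" * 8, "TTGCAGCATG" * 2  # toy sequence, not a real locus
fwd, rev = flanking_primers(up + ssr + down, len(up), len(up) + len(ssr))
print(fwd)  # [(0, 'ACGTACGTACGTACGTACGT')]
print(rev)  # [(36, 'CATGCTGCAACATGCTGCAA')]
```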

A chickpea (Cicer arietinum L.) root EST database hosted at ICRISAT (http://www.icrisat.org/) provides access to over 2,800 chickpea ESTs. These come from a library constructed by suppression subtractive hybridization (SSH) of root tissue from two closely related chickpea genotypes possessing different sources of drought avoidance and tolerance [72]. This chickpea root EST database is a subset of a larger database maintained by ICRISAT. ICRISAT also hosts a nonredundant set of 4,543 SNPs identified between two chickpea genotypes [73].

3 Conclusions and Future Direction

Molecular marker databases are expanding rapidly as increasing numbers of markers are developed using the latest high-throughput DNA sequencing technologies. Managing and maintaining these expanding data, and integrating marker data with the growing number of available genome sequences, are increasing challenges. Finally, the greatest challenge will be to fully integrate genetic diversity information with heritable trait information, bridging the genome to phenome divide and providing the tools for more advanced breeding and crop improvement.

References

1. Duran C, Edwards D, Batley J (2009) Molecular marker discovery and genetic map visualisation. In: Edwards D, Hanson D, Stajich J (eds) Applied bioinformatics. Springer, New York, pp 165–189

2. Edwards D, Batley J (2008) Bioinformatics: fundamentals and applications in plant genetics, mapping and breeding. In: Kole C, Abbott AG (eds) Principles and practices of plant genomics. Science Publishers, Inc., New York, pp 269–302

3. Appleby N, Edwards D, Batley J (2009) New technologies for ultra-high throughput genotyping in plants. In: Somers D, Langridge P, Gustafson J (eds) Plant genomics. Humana, New York, pp 19–40

4. Prasad M, Varshney RK, Roy JK, Balyan HS, Gupta PK (2000) The use of microsatellites for detecting DNA polymorphism, genotype identification and genetic diversity in wheat. Theor Appl Genet 100:592–594

5. Stein N, Graner A (2005) Map-based gene isolation in cereal genomes. In: Gupta P, Varshney R (eds) Cereal genomics. Springer, Amsterdam, pp 331–360

6. Varshney RK, Sigmund R, Börner A, Korzun V, Stein N, Sorrells ME, Langridge P, Graner A (2005) Interspecific transferability and comparative mapping of barley EST-SSR markers in wheat, rye and rice. Plant Sci 168:195–202

7. Batley J, Edwards D (2009) Mining for single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) molecular genetic markers. In: Posada D (ed) Bioinformatics for DNA sequence analysis. Humana, New York, pp 303–322


8. Edwards D, Forster JW, Chagné D, Batley J (2007) What are SNPs? In: Oraguzie NC, Rikkerink EHA, Gardiner SE, Silva HND (eds) Association mapping in plants. Springer, New York, pp 41–52

9. Chagné D, Batley J, Edwards D, Forster JW (2007) Single nucleotide polymorphism genotyping in plants. In: Oraguzie N, Rikkerink E, Gardiner S, De Silva H (eds) Association mapping in plants. Springer, New York, pp 77–94

10. Edwards D, Forster JW, Cogan NOI, Batley J, Chagné D (2007) Single nucleotide polymorphism discovery. In: Oraguzie N, Rikkerink E, Gardiner S, De Silva H (eds) Association mapping in plants. Springer, New York, pp 53–76

11. Batley J, Edwards D (2007) SNP applications in plants. In: Oraguzie N, Rikkerink E, Gardiner S, De Silva H (eds) Association mapping in plants. Springer, New York, pp 95–102

12. Allen AM, Barker GL, Berry ST, Coghill JA, Gwilliam R, Kirby S, Robinson P, Brenchley RC, D'Amore R, McKenzie N, Waite D, Hall A, Bevan M, Hall N, Edwards KJ (2011) Transcript-specific, single-nucleotide polymorphism discovery and linkage analysis in hexaploid bread wheat (Triticum aestivum L.). Plant Biotechnol J 9:1086–1099

13. Winfield MO, Wilkinson PA, Allen AM, Barker GL, Coghill JA, Burridge A, Hall A, Brenchley RC, D'Amore R, Hall N, Bevan MW, Richmond T, Gerhardt DJ, Jeddeloh JA, Edwards KJ (2012) Targeted re-sequencing of the allohexaploid wheat exome. Plant Biotechnol J 10:733–742

14. Kharabian-Masouleh A, Waters DLE, Reinke RF, Henry RJ (2011) Discovery of polymorphisms in starch-related genes in rice germplasm by amplification of pooled DNA and deeply parallel sequencing. Plant Biotechnol J 9:1074–1085

15. Subbaiyan GK, Waters DL, Katiyar SK, Sadananda AR, Vaddadi S, Henry RJ (2012) Genome-wide DNA polymorphisms in elite indica rice inbreds discovered by whole-genome sequencing. Plant Biotechnol J 10:623–634

16. Trick M, Long Y, Meng JL, Bancroft I (2009) Single nucleotide polymorphism (SNP) discovery in the polyploid Brassica napus using Solexa transcriptome sequencing. Plant Biotechnol J 7:334–346

17. Barker GLA, Edwards KJ (2009) A genome-wide analysis of single nucleotide polymorphism diversity in the world's major cereal crops. Plant Biotechnol J 7:318–325

18. Bundock PC, Eliott FG, Ablett G, Benson AD, Casu RE, Aitken KS, Henry RJ (2009) Targeted single nucleotide polymorphism (SNP) discovery in a highly polyploid plant species using 454 sequencing. Plant Biotechnol J 7:347–354

19. Edwards D, Batley J (2010) Plant genome sequencing: applications for crop improvement. Plant Biotechnol J 7:1–8

20. Hong CP, Piao ZY, Kang TW, Batley J, Yang TJ, Hur YK, Bhak J, Park BS, Edwards D, Lim YP (2007) Genomic distribution of simple sequence repeats in Brassica rapa. Mol Cells 23:349–356

21. Burgess B, Mountford H, Hopkins CJ, Love C, Ling AE, Spangenberg GC, Edwards D, Batley J (2006) Identification and characterization of simple sequence repeat (SSR) markers derived in silico from Brassica oleracea genome shotgun sequences. Mol Ecol Notes 6:1191–1194

22. Nie X, Li B, Wang L, Liu P, Biradar SS, Li T, Dolezel J, Edwards D, Luo M, Weining S (2012) Development of chromosome-arm-specific microsatellite markers in Triticum aestivum (Poaceae) using NGS technology. Am J Bot 99:e369–e371

23. Keniry A, Hopkins CJ, Jewell E, Morrison B, Spangenberg GC, Edwards D, Batley J (2006) Identification and characterization of simple sequence repeat (SSR) markers from Fragaria x ananassa expressed sequences. Mol Ecol Notes 6:319–322

24. Batley J, Barker G, O'Sullivan H, Edwards KJ, Edwards D (2003) Mining for single nucleotide polymorphisms and insertions/deletions in maize expressed sequence tag data. Plant Physiol 132:84–91

25. Lee H, Lai K, Lorenc MT, Imelfort M, Duran C, Edwards D (2012) Bioinformatics tools and databases for analysis of next generation sequence data. Brief Funct Genomics 2:12–24

26. Imelfort M, Duran C, Batley J, Edwards D (2009) Discovering genetic polymorphisms in next-generation sequencing data. Plant Biotechnol J 7:312–317

27. Wang X, Wang H, Wang J, Sun R, Wu J, Liu S, Bai Y, Mun J-H, Bancroft I, Cheng F, Huang S, Li X, Hua W, Wang J, Wang X, Freeling M, Pires JC, Paterson AH, Chalhoub B, Wang B, Hayward A, Sharpe AG, Park B-S, Weisshaar B, Liu B, Li B, Liu B, Tong C, Song C, Duran C, Peng C, Geng C, Koh C, Lin C, Edwards D, Mu D, Shen D, Soumpourou E, Li F, Fraser F, Conant G, Lassalle G, King GJ, Bonnema G, Tang H, Wang H, Belcram H, Zhou H, Hirakawa H, Abe H, Guo H, Wang H, Jin H, Parkin IAP, Batley J, Kim J-S, Just J, Li J, Xu J, Deng J, Kim JA, Li J, Yu J, Meng J, Wang J, Min J, Poulain J, Hatakeyama K, Wu K, Wang L, Fang L, Trick M, Links MG, Zhao M, Jin M, Ramchiary N, Drou N, Berkman PJ, Cai Q, Huang Q, Li R, Tabata S, Cheng S, Zhang S, Zhang S, Huang S, Sato S, Sun S, Kwon S-J, Choi S-R, Lee T-H, Fan W, Zhao X, Tan X, Xu X, Wang Y, Qiu Y, Yin Y, Li Y, Du Y, Liao Y, Lim Y, Narusaka Y, Wang Y, Wang Z, Li Z, Wang Z, Xiong Z, Zhang Z (2011) The genome of the mesopolyploid crop species Brassica rapa. Nat Genet 43:1035–1040

28. Hayward A, Dalton-Morgan J, Mason A, Zander M, Edwards D, Batley J (2012) SNP discovery and applications in Brassica napus . J Plant Biotechnol 39:49–61

29. Hayward A, Vighnesh G, Delay C, Samian MR, Manoli S, Stiller J, McKenzie M, Edwards D, Batley J (2012) Second-generation sequenc-ing for gene discovery in the Brassicaceae. Plant Biotechnol J 10:750–759

30. Tollenaere R, Hayward A, Dalton-Morgan J, Campbell E, McLanders J, Lorenc M, Manoli S, Stiller J, Raman R, Raman H, Edwards D, Batley J (2012) Identifi cation and characterisa-tion of candidate Rlm4 blackleg resistance genes in Brassica napus using next generation sequencing. Plant Biotechnol J 10:709–715

31. Berkman BJ, Skarshewski A, Lorenc MT, Lai K, Duran C, Ling EYS, Stiller J, Smits L, Imelfort M, Manoli S, McKenzie M, Kubalakova M, Simkova H, Batley J, Fleury D, Dolezel J, Edwards D (2011) Sequencing and assembly of low copy and genic regions of iso-lated Triticum aestivum chromosome arm 7DS. Plant Biotechnol J 9:768–775

32. Berkman PJ, Skarshewski A, Manoli S, Lorenc MT, Stiller J, Smits L, Lai K, Campbell E, Kubalakova M, Simkova H, Batley J, Dolezel J, Hernandez P, Edwards D (2012) Sequencing wheat chromosome arm 7BS delimits the 7BS/4AL translocation and reveals homoeolo-gous gene conservation. Theor Appl Genet 124:423–432

33. Hernandez P, Martis M, Dorado G, Pfeifer M, Galvez S, Schaaf S, Jouve N, Simkova H, Valarik M, Dolezel J, Mayer KF (2012) Next- generation sequencing and syntenic integration of fl ow-sorted arms of wheat chromosome 4A exposes the chromosome structure and gene content. Plant J Cell Mol Biol 69:377–386

34. Duran C, Edwards D, Batley J (2009) Genetic maps and the use of synteny. In: Gustafson JP, Langridge P, Somers DJ (eds) Plant genomics. Humana, New York, pp 41–55

35. Batley J, Edwards D (2009) Genome sequence data: management, storage, and visualization. Biotechniques 46:333–336

36. Duran C, Appleby N, Edwards D, Batley J (2009) Molecular genetic markers: discovery, applications, data storage and visualisation. Curr Bioinform 4:16–27

37. Lim G, Jewell E, Li X, Erwin T, Love C, Batley J, Spangenberg G, Edwards D (2007) A com-parative map viewer integrating genetic maps for Brassica and Arabidopsis. BMC Plant Biol 7:40

38. Duran C, Boskovic Z, Imelfort M, Batley J, Hamilton NA, Edwards D (2010) CMap3D: a 3D visualisation tool for comparative genetic maps. Bioinformatics 26:273–274

39. Duran C, Eales D, Marshall D, Imelfort M, Stiller J, Berkman PJ, Clark T, McKenzie M, Appleby N, Batley J, Basford K, Edwards D (2010) Future tools for association mapping in crop plants. Genome 53:1017–1023

40. Edwards D, Batley J (2004) Plant bioinformatics: from genome to phenome. Trends Biotechnol 22:232–237

41. Lai K, Lorenc MT, Edwards D (2012) Genomic databases for crop improvement. Agronomy 2:62–73

42. Youens-Clark K, Buckler E, Casstevens T, Chen C, DeClerck G, Derwent P, Dharmawardhana P, Jaiswal P, Kersey P, Karthikeyan AS, Lu J, McCouch SR, Ren L, Spooner W, Stein JC, Thomason J, Wei S, Ware D (2011) Gramene database in 2010: updates and extensions. Nucleic Acids Res 39:D1085–D1094

43. Youens-Clark K, Faga B, Yap IV, Stein L, Ware D (2009) CMap 1.01: a comparative mapping application for the Internet. Bioinformatics 25:3040–3042

44. Close TJ, Bhat PR, Lonardi S, Wu Y, Rostoks N, Ramsay L, Druka A, Stein N, Svensson JT, Wanamaker S, Bozdag S, Roose ML, Moscou MJ, Chao S, Varshney RK, Szucs P, Sato K, Hayes PM, Matthews DE, Kleinhofs A, Muehlbauer GJ, DeYoung J, Marshall DF, Madishetty K, Fenton RD, Condamine P, Graner A, Waugh R (2009) Development and implementation of high-throughput SNP genotyping in barley. BMC Genomics 10:582

45. O'Sullivan H (2007) GrainGenes – a genomic database for Triticeae and Avena. In: Edwards D (ed) Methods in molecular biology. Humana, Totowa, NJ, pp 301–314

46. Carollo V, Matthews DE, Lazo GR, Blake TK, Hummel DD, Lui N, Hane DL, Anderson OD (2005) GrainGenes 2.0. An improved resource for the small-grains community. Plant Physiol 139:643–651

47. Matthews DE, Carollo VL, Lazo GR, Anderson OD (2003) GrainGenes, the genome database for small-grain crops. Nucleic Acids Res 31:183–186

48. Szűcs P, Blake VC, Bhat PR, Chao S, Close TJ, Cuesta-Marcos A, Muehlbauer GJ, Ramsay L, Waugh R, Hayes PM (2009) An integrated resource for Barley linkage map and malting quality QTL alignment. Plant Gen 2:134–140

49. Canaran P, Stein L, Ware D (2006) LookAlign: an interactive web-based multiple sequence alignment viewer with polymorphism analysis support. Bioinformatics 22:885–886

50. Mochida K, Saisho D, Yoshida T, Sakurai T, Shinozaki K (2008) TriMEDB: a database to integrate transcribed markers and facilitate genetic studies of the tribe Triticeae. BMC Plant Biol 8:72

51. Hori K, Takehara S, Nankaku N, Sato K, Sasakuma T, Takeda K (2007) Barley EST markers enhance map saturation and QTL mapping in diploid wheat. Breed Sci 57:39–45

52. Ouyang S, Zhu W, Hamilton J, Lin H, Campbell M, Childs K, Thibaud-Nissen F, Malek RL, Lee Y, Zheng L, Orvis J, Haas B, Wortman J, Buell CR (2007) The TIGR rice genome annotation resource: improvements and new features. Nucleic Acids Res 35:D883–D887

53. Temnykh S, DeClerck G, Lukashova A, Lipovich L, Cartinhour S, McCouch S (2001) Computational and experimental analysis of microsatellites in rice (Oryza sativa L.): frequency, length variation, transposon associations, and genetic marker potential. Genome Res 11:1441–1452

54. Lorenc MT, Boskovic Z, Stiller J, Duran C, Edwards D (2012) Role of bioinformatics as a tool for oilseed Brassica species. In: Edwards D, Parkin IAP, Batley J (eds) Genetics, genomics and breeding of oilseed Brassicas. Science Publishers Inc., New Hampshire, pp 194–205

55. Duran C, Boskovic Z, Batley J, Edwards D (2011) Role of bioinformatics as a tool for vegetable Brassica species. In: Sadowski J (ed) Vegetable Brassicas. Science Publishers, Inc., New Hampshire, pp 406–418

56. Choi SR, Teakle GR, Plaha P, Kim JH, Allender CJ, Beynon E, Piao ZY, Soengas P, Han TH, King GJ, Barker GC, Hand P, Lydiate DJ, Batley J, Edwards D, Koo DH, Bang JW, Park BS, Lim YP (2007) The reference genetic linkage map for the multinational Brassica rapa genome sequencing project. Theor Appl Genet 115:777–792

57. Bombarely A, Menda N, Tecle IY, Buels RM, Strickler S, Fischer-York T, Pujar A, Leto J, Gosselin J, Mueller LA (2011) The sol genomics network (solgenomics.net): growing tomatoes using Perl. Nucleic Acids Res 39:D1149–D1155

58. Plechakova O, Tranchant-Dubreuil C, Benedet F, Couderc M, Tinaut A, Viader V, De Block P, Hamon P, Campa C, de Kochko A, Hamon S, Poncet V (2009) MoccaDB – an integrative database for functional, comparative and diversity studies in the Rubiaceae family. BMC Plant Biol 9:123

59. Blenda A, Scheffler J, Scheffler B, Palmer M, Lacape JM, Yu JZ, Jesudurai C, Jung S, Muthukumar S, Yellambalase P, Ficklin S, Staton M, Eshelman R, Ulloa M, Saha S, Burr B, Liu S, Zhang T, Fang D, Pepper A, Kumpatla S, Jacobs J, Tomkins J, Cantrell R, Main D (2006) CMD: a cotton microsatellite database resource for Gossypium genomics. BMC Genomics 7:132

60. Duran C, Appleby N, Clark T, Wood D, Imelfort M, Batley J, Edwards D (2009) AutoSNPdb: an annotated single nucleotide polymorphism database for crop plants. Nucleic Acids Res 37:D951–D953

61. Barker G, Batley J, O'Sullivan H, Edwards KJ, Edwards D (2003) Redundancy based detection of sequence polymorphisms in expressed sequence tag data using autoSNP. Bioinformatics 19:421–422

62. Duran C, Appleby N, Vardy M, Imelfort M, Edwards D, Batley J (2009) Single nucleotide polymorphism discovery in barley using autoSNPdb. Plant Biotechnol J 7:326–333

63. Lai K, Duran C, Berkman PJ, Lorenc MT, Stiller J, Manoli S, Hayden MJ, Forrest KL, Fleury D, Baumann U, Zander M, Mason AS, Batley J, Edwards D (2012) Single nucleotide polymorphism discovery from wheat next-generation sequence data. Plant Biotechnol J 10:743–749

64. Edwards D (2011) Wheat bioinformatics. In: Bonjean A, Angus W, Van Ginkel M (eds) The world wheat book. Lavoisier, Paris, pp 851–875

65. Lai K, Berkman PJ, Lorenc MT, Duran C, Smits L, Manoli S, Stiller J, Edwards D (2012) WheatGenome.info: an integrated database and portal for wheat genome information. Plant Cell Physiol 53:e2

66. Edwards D, Wilcox S, Barrero RA, Fleury D, Cavanagh CR, Forrest KL, Hayden MJ, Moolhuijzen P, Keeble-Gagnère G, Bellgard MI, Lorenc MT, Shang CA, Baumann U, Taylor JM, Morell MK, Langridge P, Appels R, Fitzgerald A (2012) Bread matters: a national initiative to profile the genetic diversity of Australian wheat. Plant Biotechnol J 10:703–708

67. Jewell E, Robinson A, Savage D, Erwin T, Love CG, Lim GAC, Li X, Batley J, Spangenberg GC, Edwards D (2006) SSRPrimer and SSR taxonomy tree: biome SSR discovery. Nucleic Acids Res 34:W656–W659

68. Robinson AJ, Love CG, Batley J, Barker G, Edwards D (2004) Simple sequence repeat marker loci discovery using SSR primer. Bioinformatics 20:1475–1476

69. Batley J, Hopkins CJ, Cogan NOI, Hand M, Jewell E, Kaur J, Kaur S, Li X, Ling AE, Love C, Mountford H, Todorovic M, Vardy M, Walkiewicz M, Spangenberg GC, Edwards D (2007) Identification and characterization of simple sequence repeat markers from Brassica napus expressed sequences. Mol Ecol Notes 7:886–889

70. Hopkins CJ, Cogan NOI, Hand M, Jewell E, Kaur J, Li X, Lim GAC, Ling AE, Love C, Mountford H, Todorovic M, Vardy M, Spangenberg GC, Edwards D, Batley J (2007) Sixteen new simple sequence repeat markers from Brassica juncea expressed sequences and their cross-species amplification. Mol Ecol Notes 7:697–700

71. Ling AE, Kaur J, Burgess B, Hand M, Hopkins CJ, Li X, Love CG, Vardy M, Walkiewicz M, Spangenberg G, Edwards D, Batley J (2007) Characterization of simple sequence repeat markers derived in silico from Brassica rapa bacterial artificial chromosome sequences and their application in Brassica napus. Mol Ecol Notes 7:273–277

72. Jayashree B, Buhariwalla HK, Shinde S, Crouch JH (2005) A legume genomics resource: the chickpea root expressed sequence tag database. Electron J Biotechnol 8:128–133

73. Azam S, Thakur V, Ruperao P, Shah T, Balaji J, Amindala B, Farmer AD, Studholme DJ, May GD, Edwards D, Jones JD, Varshney RK (2012) Coverage-based consensus calling (CbCC) of short sequence reads and comparison of CbCC results to identify SNPs in chickpea (Cicer arietinum; Fabaceae), a crop species without a reference genome. Am J Bot 99:186–192

74. Cheng F, Liu S, Wu J, Fang L, Sun S, Liu B, Li P, Hua W, Wang X (2011) BRAD, the genetics and genomics database for Brassica plants. BMC Plant Biol 11:136

75. Karsch-Mizrachi I, Nakamura Y, Cochrane G (2012) The international nucleotide sequence database collaboration. Nucleic Acids Res 40:D33–D37

76. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW (2009) GenBank. Nucleic Acids Res 37:D26–D31

77. Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Edgar R, Federhen S, Feolo M, Geer LY, Helmberg W, Kapustin Y, Khovayko O, Landsman D, Lipman DJ, Madden TL, Maglott DR, Miller V, Ostell J, Pruitt KD, Schuler GD, Shumway M, Sequeira E, Sherry ST, Sirotkin K, Souvorov A, Starchenko G, Tatusov RL, Tatusova TA, Wagner L, Yaschenko E (2008) Database resources of the national center for biotechnology information. Nucleic Acids Res 36:D13–D21

78. Gonzales MD, Gajendran K, Farmer AD, Archuleta E, Beavis WD (2007) Leveraging model legume information to find candidate genes for soybean sudden death syndrome using the legume information system. In: Edwards D (ed) Methods in molecular biology. Humana, Totowa, NJ, pp 245–259

79. Gonzales MD, Archuleta E, Farmer A, Gajendran K, Grant D, Shoemaker R, Beavis WD, Waugh ME (2005) The legume information system (LIS): an integrated information resource for comparative legume biology. Nucleic Acids Res 33:D660–D665

80. Schaeffer ML, Harper LC, Gardiner JM, Andorf CM, Campbell DA, Cannon EK, Sen TZ, Lawrence CJ (2011) MaizeGDB: curation and outreach go hand-in-hand. Database (Oxford) 2011, bar022

81. Lawrence CJ (2007) MaizeGDB – the maize genetics and genomics database. In: Edwards D (ed) Methods in molecular biology. Humana, Totowa, NJ, pp 331–345

82. Lawrence CJ, Schaeffer ML, Seigfried TE, Campbell DA, Harper LC (2007) MaizeGDB's new data types, resources and activities. Nucleic Acids Res 35:D895–D900

83. Canaran P, Buckler ES, Glaubitz JC, Stein L, Sun Q, Zhao W, Ware D (2008) Panzea: an update on new content and features. Nucleic Acids Res 36:D1041–D1043

84. Grant D, Nelson RT, Cannon SB, Shoemaker RC (2010) SoyBase, the USDA-ARS soybean genetics and genomics database. Nucleic Acids Res 38:D843–D846

85. Wegrzyn J, Main D, Figueroa B, Choi M, Yu J, Neale D, Jung S, Lee T, Stanton M, Zheng P, Ficklin S, Cho I, Peace C, Evans K, Volk G, Oraguzie N, Chen C, Olmstead M, Gmitter G, Abbott A (2012) Uniform standards for genome databases in forest and fruit trees. Tree Genet Genomes 8:1–2

86. Tree fruit Genome Database Resources (tfGDR) (2002) Washington State University, Pullman, WA. http://www.tfgdr.org
