
Novel primers for complete mitochondrial cytochrome b gene sequencing in mammals

ASHWIN NAIDU,* ROBERT R. FITAK,† ADRIAN MUNGUIA-VEGA* and MELANIE CULVER*†‡

*School of Natural Resources and the Environment, University of Arizona, 1311 East Fourth Street, Room 317, Tucson, AZ 85721, USA

†Genetics Graduate Interdisciplinary Program, University of Arizona, Tucson, AZ 85721, USA

‡Arizona Cooperative Fish and Wildlife Research Unit, University of Arizona, Tucson, AZ 85721, USA

Abstract

Sequence-based species identification relies on the extent and integrity of sequence data available in online databases such as GenBank. When identifying species from a sample of unknown origin, partial DNA sequences obtained from the sample are aligned against existing sequences in databases. When the sequence from the matching species is not present in the database, high-scoring alignments with closely related sequences might produce unreliable results on species identity. For species identification in mammals, the cytochrome b (cyt b) gene has been shown to be highly informative, so large amounts of reference sequence data from the cyt b gene are needed. To increase the availability of cyt b gene sequence data for a large number of mammalian species in GenBank and other publicly accessible online databases, we identified a primer pair for complete cyt b gene sequencing in mammals. Using this primer pair, we successfully PCR amplified and sequenced the complete cyt b gene from 40 of 44 mammalian species representing 10 orders of mammals. We submitted 40 complete, correctly annotated cyt b protein-coding sequences to GenBank. To our knowledge, this is the first single primer pair to amplify the complete cyt b gene in a broad range of mammalian species. This primer pair can be used to add new cyt b gene sequences and to enhance the data available on species represented in GenBank. The availability of novel and complete gene sequences as high-quality reference data can improve the reliability of sequence-based species identification.

Keywords: cytochrome b, mammals, primers, sequence data, species identification

Received 12 June 2010; revision received 31 August 2011; accepted 7 September 2011

Introduction

Sequence-based species identification relies on the extent and integrity of sequence data present in online databases such as GenBank, but the availability and quality of reference sequence data have often been questioned (Bridge et al. 2003; Harris 2003; Nilsson et al. 2006). When identifying species from a sample of unknown origin, the partial DNA sequence(s) obtained from the sample and submitted to a BLAST search may not overlap with, or may only partially match, the reference sequences present in GenBank. When the sequence from the matching species is not present, high-scoring alignments with closely related sequences might produce unreliable results on species identity. Accurate identification of species from DNA sequences depends on the existing reference data in GenBank and on the interpretation of BLAST search results (reviewed in Kang et al. 2010). To minimize errors in species identification, complete gene sequences need to be deposited in GenBank, where they can be particularly useful as reference sequences. Such reference sequences would provide full coverage in local alignments with query sequences obtained by amplifying partial gene regions (e.g. in forensic cases and ancient DNA applications).

For species identification, the query sequence must show little intraspecific variation and enough interspecific variation to separate two closely related members of the same genus. This query sequence can be the whole gene or only a short region of it (e.g. Hebert et al. 2003). Recently, the mitochondrial cytochrome b (cyt b) gene and the cytochrome c oxidase subunit 1 (COI) gene have become popular for species identification in mammals (reviewed in Ogden et al. 2009). In mammals, the cyt b gene shows higher interspecific variation, so when shorter regions are used it can be more informative than the COI gene for species identification (Tobe et al. 2009). Although many mammalian species are represented in online databases such as GenBank, generally only partial DNA sequences of the cyt b gene are available for these species. This may reflect the primer pairs that have been used for cyt b gene sequencing, most of which target only part of the gene (e.g. Kocher et al. 1989; Irwin et al. 1991; Bartlett & Davidson 1992; Parson et al. 2000; Hsieh et al. 2001; Verma & Singh 2003).

Correspondence: Ashwin Naidu, Fax: (520) 621 8801; E-mail: [email protected]

© 2011 Blackwell Publishing Ltd

Molecular Ecology Resources (2012) 12, 191–196 doi: 10.1111/j.1755-0998.2011.03078.x

To alleviate the issues associated with reference sequence availability and efficient amplification of whole genes, we developed a single PCR primer pair that enables sequencing of the complete cyt b gene (~1140 bp) in a large number of mammal species. Because the cyt b gene is valuable for mammalian species identification, we developed this primer pair with the aim of expanding the complete cyt b sequence data in GenBank.

Materials and methods

To design this primer pair, we downloaded from GenBank DNA sequences spanning ~1740 bp of the mitochondrial genome of six species representing six mammalian orders (Table 1). These DNA sequences included the complete coding sequence (cds) of the cyt b gene and ~300 bp of flanking sequence on either side. We aligned these sequences using BioEdit v7.0.9.0 (Hall 1999). We identified and designed a forward primer, MTCB-F (5′-CCHCCATAAATAGGNGAAGG-3′), and a reverse primer, MTCB-R (5′-WAGAAYTTCAGCTTTGGG-3′), located in highly conserved regions. The forward primer lies in the NADH dehydrogenase subunit 6 gene, and the reverse primer is anchored in the transfer RNA-Pro gene. The positions of these forward and reverse primers in the human mitochondrial genome are 14588–14607 and 15989–16006, respectively, according to Anderson et al. (1981).
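Each of these primers contains degenerate positions (IUPAC codes H, N, W and Y), so each one is actually a pool of concrete oligonucleotides. The Python sketch below expands each primer, counts its degeneracy, and derives the expected amplicon length in the human reference from the coordinates given above. It also estimates a crude melting-temperature range with the Wallace rule (2 °C per A/T, 4 °C per G/C). The Wallace rule is a swapped-in approximation for illustration only. It is not the nearest-neighbour model that OligoAnalyzer uses. The function names are hypothetical.

```python
from itertools import product

# IUPAC nucleotide code -> concrete bases it stands for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

MTCB_F = "CCHCCATAAATAGGNGAAGG"  # forward primer, 5'->3' (from the text)
MTCB_R = "WAGAAYTTCAGCTTTGGG"    # reverse primer, 5'->3' (from the text)


def expand(primer: str) -> list[str]:
    """Every concrete oligonucleotide encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]


def degeneracy(primer: str) -> int:
    """Number of concrete oligonucleotides (product of per-position choices)."""
    n = 1
    for b in primer:
        n *= len(IUPAC[b])
    return n


def wallace_tm(seq: str) -> int:
    """Crude Tm estimate in deg C for a concrete (non-degenerate) oligo.

    Wallace rule: 2 per A/T plus 4 per G/C. Illustration only; a
    nearest-neighbour thermodynamic model gives different values.
    """
    return 2 * sum(seq.count(b) for b in "AT") + 4 * sum(seq.count(b) for b in "GC")


for name, primer in (("MTCB-F", MTCB_F), ("MTCB-R", MTCB_R)):
    tms = [wallace_tm(s) for s in expand(primer)]
    print(f"{name}: {len(primer)} nt, {degeneracy(primer)} variants, "
          f"Wallace Tm {min(tms)}-{max(tms)} C")

# Expected amplicon in the human reference: from the 5' end of MTCB-F
# (position 14588) to the 5' end of MTCB-R (position 16006), inclusive.
FWD_START, REV_END = 14588, 16006
print("human amplicon:", REV_END - FWD_START + 1, "bp")
```

Run as is, the sketch reports 12 variants for MTCB-F (Wallace Tm 58–62 °C) and 4 variants for MTCB-R (50–52 °C). It also reports a 1419 bp human amplicon, which is consistent with the ~1420 bp target fragment.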

We used OligoAnalyzer v3.1 (Integrated DNA Technologies, Inc., Coralville, IA, USA, http://www.idtdna.com/analyzer/Applications/OligoAnalyzer/) to assess melting temperatures and to check the putative primer pair, MTCB-F and MTCB-R, for homodimers, heterodimers, secondary structures and self-priming. We ran a BLAST search optimized for short input sequences on each primer sequence to check for matching sequences in GenBank. To test the functionality and scope of this primer pair, we performed in silico PCR with Amplify v3.1 (Copyright of Bill Engels, 2005, University of Wisconsin, http://engels.genetics.wisc.edu/amplify/) on the mitochondrial genome sequences of 27 species representing 26 mammalian orders (Table 1). We obtained amplification in all 27 species. The amplified target fragment size ranged between 1415 and 1442 bp (Table 1). We also observed amplification of nontarget fragments in some species. This indicates that the target fragment amplified in vitro may need gel purification before sequencing.
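In silico PCR amounts to finding the sites where each primer could anneal and then measuring the distance between compatible forward and reverse sites. The sketch below is a simplified stand-in for that idea, not Amplify's algorithm. It scans only the given strand and tolerates up to `max_mm` IUPAC-aware mismatches per primer (an assumed threshold). It also ignores the extra weight that real tools give to mismatches at the 3′ end. The function names and the synthetic template in the usage lines are hypothetical.

```python
# IUPAC code -> concrete bases it can pair against on the template
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement of each IUPAC code (e.g. R = A/G complements Y = C/T)
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq: str) -> str:
    """Reverse complement, preserving degenerate codes."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(primer: str, site: str) -> int:
    """Count template bases the degenerate primer cannot match."""
    return sum(base not in IUPAC[p] for p, base in zip(primer, site))


def binding_sites(primer: str, template: str, max_mm: int) -> list[int]:
    """0-based start positions where the primer matches with <= max_mm mismatches."""
    k = len(primer)
    return [i for i in range(len(template) - k + 1)
            if mismatches(primer, template[i:i + k]) <= max_mm]


def in_silico_pcr(fwd: str, rev: str, template: str,
                  max_mm: int = 2, max_len: int = 5000) -> list[int]:
    """Sizes (bp) of products predicted on the given strand of `template`."""
    template = template.upper()
    rev_site = revcomp(rev)  # how the reverse primer's target reads on this strand
    fwd_hits = binding_sites(fwd, template, max_mm)
    rev_hits = binding_sites(rev_site, template, max_mm)
    sizes = []
    for f in fwd_hits:
        for r in rev_hits:
            size = r + len(rev_site) - f  # 5' end of fwd to 5' end of rev
            if r >= f and size <= max_len:
                sizes.append(size)
    return sorted(sizes)


# Usage on a synthetic template carrying one concrete site for each primer
fwd_site = "CCACCATAAATAGGAGAAGG"  # MTCB-F with H=A, N=A
rev_site = "CCCAAAGCTGAAGTTCTA"    # top-strand site of MTCB-R with W=A, Y=C
template = "T" * 100 + fwd_site + "T" * 1000 + rev_site + "T" * 100
print(in_silico_pcr("CCHCCATAAATAGGNGAAGG", "WAGAAYTTCAGCTTTGGG", template))
```

The synthetic template yields a single 1038 bp product, which is 20 + 1000 + 18 bp. Running the same scan on real mitochondrial genomes with a looser `max_mm` is one way to see how nontarget products, like those listed in Table 1, can appear.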

We further tested the primer pair by PCR in vitro on 44 species representing 10 orders of mammals (Table 2). Of these 44 species, 41 were distinct from the 27 species tested in silico. All 44 samples were tissue samples collected in the field by experienced field biologists (see Acknowledgements), who identified the species with the specimen in hand. To extract DNA from these samples, we used the QIAamp Tissue & Blood kit (Qiagen, Valencia, CA, USA) following the manufacturer's instructions. To test the specificity of the primer pair, we also included DNA samples from two nonmammalian species, Athene cunicularia and Anodonta californiensis. We used 0.1–10 ng/µL of DNA as template for PCR. We performed amplifications in a 20-µL reaction volume with the following final concentrations: 1× PCR buffer containing 1.5 mM MgCl2 (Qiagen), an additional 1.0 mM MgCl2 (Qiagen), 0.2 mM dNTPs (Qiagen), 0.05% BSA (Sigma-Aldrich, St. Louis, MO, USA), 0.5 U of Taq DNA Polymerase (Qiagen) and 0.5 µM each of forward and reverse primers. We performed PCR cycling in Mastercycler PCR machines (Eppendorf, Westbury, NY, USA) as follows:

- initial denaturation at 95 °C for 10 min;
- 35 cycles of denaturation at 95 °C for 45 s, annealing at 55 °C for 1 min and extension at 72 °C for 2 min;
- final extension at 72 °C for 10 min.

We ran PCR products on a 1% agarose gel stained with ethidium bromide. We prepared PCR products for sequencing by treating them with the ExoSAP-IT PCR Clean-up kit (USB Corporation, Cleveland, OH, USA) following the manufacturer's recommendations. Where fragments other than the target region (~1420 bp) were also amplified, we purified the target fragment for sequencing with the QIAquick Gel Extraction kit (Qiagen) following the manufacturer's instructions. We sequenced all purified PCR products with both the forward and the reverse primer on a 3730xl Automated DNA Analyzer (Applied Biosystems, Foster City, CA, USA).
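The final concentrations above can be converted into pipetting volumes with the dilution relation C1V1 = C2V2. The text does not give stock concentrations, so the sketch below assumes them: 10× buffer, 25 mM MgCl2, 10 mM dNTPs, 1% BSA, 10 µM primers and 5 U/µL Taq. It also uses a hypothetical 2 µL template volume. Substitute the concentrations printed on your own reagents.

```python
REACTION_UL = 20.0  # total reaction volume (from the text)

# component: (final concentration, ASSUMED stock concentration).
# Units only need to agree within a row (x, mM, % or uM).
COMPONENTS = {
    "PCR buffer":       (1.0, 10.0),   # 1x from a 10x stock
    "MgCl2 supplement": (1.0, 25.0),   # +1.0 mM from a 25 mM stock
    "dNTPs":            (0.2, 10.0),   # mM
    "BSA":              (0.05, 1.0),   # % (w/v)
    "MTCB-F":           (0.5, 10.0),   # uM
    "MTCB-R":           (0.5, 10.0),   # uM
}
TAQ_UNITS = 0.5           # units per reaction (from the text)
TAQ_STOCK_U_PER_UL = 5.0  # ASSUMED enzyme stock
TEMPLATE_UL = 2.0         # hypothetical template volume


def reaction_volumes(reaction_ul: float = REACTION_UL) -> dict[str, float]:
    """Microlitres of each stock for one reaction, via V1 = C2 * V2 / C1."""
    vols = {name: final * reaction_ul / stock
            for name, (final, stock) in COMPONENTS.items()}
    vols["Taq polymerase"] = TAQ_UNITS / TAQ_STOCK_U_PER_UL
    vols["template DNA"] = TEMPLATE_UL
    vols["water"] = reaction_ul - sum(vols.values())  # make up to volume
    if vols["water"] < 0:
        raise ValueError("components exceed the reaction volume")
    return vols


for name, ul in reaction_volumes().items():
    print(f"{name:<17} {ul:5.2f} uL")
```

Under these assumptions, one reaction needs 0.8 µL of the MgCl2 supplement, which brings the total MgCl2 to 2.5 mM together with the 1.5 mM from the buffer. It also needs 0.1 µL of Taq and 11.7 µL of water. For a master mix, multiply every volume except the template by the number of reactions plus a small excess.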

All samples were sequenced in triplicate. We assembled and edited the sequences from each species using Sequencher v4.9 (Gene Codes Corporation, Ann Arbor, MI, USA). We aligned the forward and reverse sequences from all three sequencing attempts with reference sequences to check for base-calling errors, frameshifts, insertions and deletions. We trimmed the flanking sequences from the assembly and derived a consensus of the cyt b cds (~1140 bp). Before submitting sequences to GenBank, we annotated each cyt b cds in Sequin v10.3 (NCBI, http://www.ncbi.nlm.nih.gov/Sequin/) to confirm complete amino acid translation to the cyt b protein.
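A similar translation check can be approximated by translating each consensus cds with the vertebrate mitochondrial genetic code (NCBI translation table 2). In this code, TGA encodes Trp, ATA encodes Met, and AGA/AGG are stop codons. The check then flags internal stop codons and lengths that are not a multiple of three. This is a minimal sketch of the idea, not what Sequin does internally, and the function name `check_cds` is hypothetical. Some mitochondrial protein-coding genes end in an incomplete stop codon that is completed by polyadenylation. A length that is not a whole number of codons should therefore prompt inspection; it is not proof of a frameshift.

```python
BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial), codons in TCAG order
VERT_MITO_AA = ("FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRR"
                "IIMMTTTTNNKKSS**VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: VERT_MITO_AA[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def check_cds(cds: str) -> dict:
    """Translate a cds and report features that argue against a functional gene."""
    cds = cds.upper().replace("U", "T")
    n_full = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, n_full, 3)]
    # Codons containing ambiguous bases (e.g. N) translate as X
    protein = "".join(CODON_TABLE.get(c, "X") for c in codons)
    # Stops before the final codon indicate a pseudogene, frameshift or error
    internal_stops = [i + 1 for i, aa in enumerate(protein[:-1]) if aa == "*"]
    return {
        "length_bp": len(cds),
        "whole_codons": len(cds) % 3 == 0,
        "internal_stops": internal_stops,  # 1-based codon numbers
        "protein": protein,
    }


print(check_cds("ATGTGAATAAGA"))  # TGA=Trp, ATA=Met, AGA=stop in this code
print(check_cds("ATGTAAATGTAA"))  # stop at codon 2 -> flagged
```

The first toy sequence translates to `MWM*` with no internal stops. The second is flagged at codon 2. This is the same kind of evidence the Discussion uses to withhold sequences that may be numts.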

To compare sequence similarity between species, we used MegAlign v9.0.4 (DNASTAR, Inc., Madison, WI, USA, https://www.dnastar.com/t-sub-products-lasergene-megalign.aspx). We performed a slow-accurate ClustalW alignment of the full-length cyt b gene, using 1140 bp from each sequence. We used the following multiple alignment parameters:

- gap penalty = 15.00
- gap length penalty = 6.66
- delay divergent sequences = 30%
- DNA transition weight = 0.50
- protein weight matrix = Gonnet series
- DNA weight matrix = ClustalW

We calculated pairwise percent identity and divergence for each pair of sequences.
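Pairwise percent identity can be computed directly from an alignment. The sketch below counts identical bases over the columns where neither sequence has a gap. It reports divergence as the uncorrected p-distance (100 minus percent identity). This definition is an assumption made for illustration. MegAlign may compute its divergence values differently, so the numbers are not guaranteed to reproduce Table S1. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identical bases over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no overlapping positions")
    same = sum(x == y for x, y in pairs)
    return 100.0 * same / len(pairs)


def pairwise_table(aligned: dict[str, str]) -> dict[tuple[str, str], tuple[float, float]]:
    """(percent identity, p-distance in %) for every pair of aligned sequences."""
    table = {}
    for (n1, s1), (n2, s2) in combinations(aligned.items(), 2):
        pid = percent_identity(s1, s2)
        table[(n1, n2)] = (pid, 100.0 - pid)
    return table


# Toy alignment (hypothetical sequences, not data from this study)
toy = {"sp1": "ACGTACGTAC", "sp2": "ACGTTCGTAC", "sp3": "ACG-ACGTAA"}
for (x, y), (pid, div) in pairwise_table(toy).items():
    print(f"{x} vs {y}: identity {pid:.1f}%, divergence {div:.1f}%")
```

In the toy alignment, sp1 and sp2 differ at one of ten columns (90.0% identity). The gapped column is skipped when sp3 is compared.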

Results

We obtained PCR amplicons from 43 of 44 mammalian species (98%) and none from either nonmammalian species. The sample from Chaetodipus rudinoris failed to amplify even after multiple PCR attempts. Of the 43 amplicons, 30 (70%) were sequenced directly, and 13 (30%) required gel extraction before sequencing (Table 2). On average, we obtained a read length of 700–750 bp in each direction. Because this is shorter than the gene, sequencing in both the forward and reverse directions was necessary to obtain the entire gene sequence, which putatively spans ~1140 bp. The cyt b gene sequences in our data set spanned 1140–1200 bp. Our submission to GenBank consisted of consensus cyt b gene sequences from 40 mammalian species that correctly annotated to the cyt b protein (Table 2).

The intraspecific percent identity, for example between Canis lupus baileyi and Canis lupus familiaris, was 99.3%. The interspecific percent identity was 85.4% between two Dipodomys spp. and averaged 93.5% among three Lepus spp. We also generated an overall chart of pairwise percent identity and sequence divergence within species and between closely related genera in our data set (see Table S1, Supporting information).

Table 1 Sequences from GenBank used in primer design and in silico testing

| Order | Species | GenBank accession | TFS (bp) | NTFS (bp) |
|---|---|---|---|---|
| Sequences used in alignment for primer design | | | | |
| Carnivora | Acinonyx jubatus | AF344830.1 | | |
| Cetartiodactyla | Cervus elaphus | AB245427.2 | | |
| Didelphimorphia | Didelphis virginiana | Z29573.1 | | |
| Primates | Homo sapiens | J01415.2 | | |
| Proboscidea | Loxodonta africana | AJ224821.1 | | |
| Rodentia | Mus musculus | DQ874614.2 | | |
| Sequences used for in silico PCR simulations | | | | |
| Afrosoricida | Echinops telfairi | AJ400734.2 | 1433 | – |
| Carnivora | Herpestes javanicus | AY873843.1 | 1421 | – |
| Cetartiodactyla | Grampus griseus | EU557095.1 | 1422 | 93, 877, 1075 |
| Cetartiodactyla | Lama glama | AP003426.1 | 1421 | – |
| Cingulata | Dasypus novemcinctus | Y11832.1 | 1425 | 363, 633 |
| Chiroptera | Artibeus jamaicensis | AF061340.1 | 1419 | 60, 92, 635, 973 |
| Dasyuromorphia | Dasyurus hallucatus | AY795973.1 | 1430 | 59, 252, 367, 884, 975 |
| Dermoptera | Cynocephalus variegatus | AJ428849.1 | 1420 | – |
| Didelphimorphia | Didelphis virginiana | Z29573.1 | 1436 | – |
| Diprotodontia | Vombatus ursinus | AJ304826.1 | 1427 | – |
| Erinaceomorpha | Erinaceus europaeus | X88898.2 | 1421 | – |
| Hyracoidea | Procavia capensis | AB096865.1 | 1442 | 1305 |
| Lagomorpha | Ochotona collaris | AF348080.1 | 1415 | 654, 716, 873, 1056, 1214 |
| Macroscelidea | Macroscelides proboscideus | AJ421452.1 | 1417 | 60 |
| Monotremata | Ornithorhynchus anatinus | X83427.1 | 1416 | 370, 1689 |
| Notoryctemorphia | Notoryctes typhlops | AJ639874.1 | 1429 | 534, 1098, 1173 |
| Paucituberculata | Rhyncholestes raphanurus | AJ508399.1 | 1432 | 97 |
| Peramelemorphia | Isoodon macrourus | AF358864.1 | 1428 | 719 |
| Perissodactyla | Equus caballus | X79547.1 | 1427 | 703 |
| Pholidota | Manis tetradactyla | AJ421454.1 | 1419 | – |
| Primates | Eulemur macaco | AB371088.1 | 1419 | 1504 |
| Proboscidea | Elephas maximus | DQ316068.1 | 1419 | 186, 309, 897 |
| Rodentia | Myoxus glis | AJ001562.1 | 1421 | 248, 633, 871, 971 |
| Scandentia | Tupaia belangeri | AF217811.1 | 1415 | 761 |
| Sirenia | Dugong dugon | AJ421723.1 | 1420 | – |
| Soricomorpha | Crocidura russula | AY769264.1 | 1422 | 364, 373, 1465 |
| Tubulidentata | Orycteropus afer | Y18475.1 | 1426 | 718, 764 |

TFS, target fragment size; NTFS, nontarget fragment size(s). Both were detected by in silico PCR in Amplify v3.1.

Discussion

The cyt b gene is a valuable marker for sequence-based species identification in mammals, although effective species identification depends on the availability of reference sequence data in databases such as GenBank. We developed a single PCR primer pair that enables sequencing of the complete cyt b gene in a large and diverse range of mammalian species. Our aim was to expand the complete cyt b sequence data for mammalian species represented in GenBank. Primer sets previously described for mammalian cyt b gene sequencing (Kocher et al. 1989; Bartlett & Davidson 1992; Parson et al. 2000; Hsieh et al. 2001; Verma & Singh 2003) amplify only partial fragments of the cyt b gene. The primers described by Irwin et al. (1991) amplify the entire cyt b gene but were tested in a limited set of mammalian species. To our knowledge, this is the first single primer pair tested for complete cyt b gene sequencing in a broadly representative set of mammalian species.

Table 2 DNA samples used in PCR in vitro testing and respective sequences deposited in GenBank

| Order | Species | SD/GE | GenBank accession | Sequence length (bp) |
|---|---|---|---|---|
| Carnivora | Panthera onca | SD | GU175435 | 1140 |
| Carnivora | Lynx rufus | SD | GU175436 | 1140 |
| Carnivora | Procyon lotor | GE | GU175439 | 1140 |
| Carnivora | Mephitis mephitis | SD | GU175440 | 1140 |
| Carnivora | Puma concolor couguar | SD | GU175442 | 1140 |
| Carnivora | Canis lupus baileyi | SD | HM222711 | 1140 |
| Carnivora | Canis lupus familiaris | SD | JF489119 | 1140 |
| Carnivora | Urocyon cinereoargenteus | SD | JF489121 | 1140 |
| Carnivora | Vulpes macrotis | SD | JF489127 | 1140 |
| Cetartiodactyla | Phocoena sinus | GE | HM222714 | 1140 |
| Cetartiodactyla | Balaena mysticetus | SD | JF489130 | 1140 |
| Cetartiodactyla | Antilocapra americana sonoriensis | SD | GU175434 | 1140 |
| Cetartiodactyla | Ovis canadensis mexicana | SD | HM222706 | 1140 |
| Cetartiodactyla | Odocoileus hemionus | SD | HM222707 | 1140 |
| Cetartiodactyla | Alces alces | SD | JF489131 | 1140 |
| Cetartiodactyla | Lama pacos | SD | JF489132 | 1140 |
| Cetartiodactyla | Cervus elaphus | SD | JF489133 | 1140 |
| Cetartiodactyla | Pecari tajacu | SD | JF489135 | 1140 |
| Cetartiodactyla | Madoqua kirkii | SD | JF489137 | 1140 |
| Chiroptera | Leptonycteris curasoae | SD | GU175441 | 1140 |
| Chiroptera | Myotis auriculus | SD | JF489122 | 1140 |
| Chiroptera | Euderma maculatum | SD | JF489125 | 1140 |
| Chiroptera | Tadarida brasiliensis | SD | JF489129 | 1140 |
| Didelphimorphia | Didelphis virginiana | GE | HM222715 | 1149 |
| Didelphimorphia | Caluromys derbianus | GE | JF489138 | 1200 |
| Insectivora | Sorex monticolus | GE | JF489124 | 1140 |
| Lagomorpha | Lepus californicus xanti | GE | HM222712 | 1140 |
| Lagomorpha | Lepus insularis | GE | HM222713 | 1140 |
| Lagomorpha | Lepus americanus | SD | JF489126 | 1140 |
| Perissodactyla | Equus caballus | SD | JF489134 | 1140 |
| Primates | Ateles geoffroyi frontatus | SD | HM222708 | 1140 |
| Primates | Lemur catta | SD | JF489136 | 1140 |
| Rodentia | Dipodomys simulans peninsularis | SD | GU175437 | 1140 |
| Rodentia | Dipodomys merriami melanurus | SD | GU175438 | 1140 |
| Rodentia | Tamiasciurus hudsonicus grahamensis | SD | GU175443 | 1140 |
| Rodentia | Mus musculus | GE | HM222709 | 1146 |
| Rodentia | Rattus norvegicus | GE | HM222710 | 1143 |
| Rodentia | Peromyscus maniculatus | SD | JF489123 | 1143 |
| Rodentia | Neotoma albigula | SD | JF489128 | 1143 |
| Sirenia | Trichechus manatus | GE | JF489120 | 1140 |
| DNA samples amplified by PCR in vitro but not submitted to GenBank | | | | |
| Carnivora | Ursus americanus | GE | | |
| Cetartiodactyla | Phocoena spinipinnis | GE | | |
| Rodentia | Castor canadensis | GE | | |
| DNA samples that failed to amplify by PCR in vitro | | | | |
| Rodentia | Chaetodipus rudinoris | | | |

SD, sequenced directly; GE, gel extracted.

This primer pair has two major applications: (i) development of mammalian cyt b gene sequence databases for species identification and (ii) verification of cyt b sequence data before and after deposition into online databases. Because a single primer pair covers the whole gene, PCR is more time- and cost-effective than with a set or panel of primers. This is especially desirable when dealing with a large number of samples or when setting up reference sequence databases, where multiple sequencing runs may be required.

This primer pair may have limited use in degraded or ancient DNA applications, because such templates rarely preserve fragments as long as the ~1420 bp target. It can, however, be used to obtain sequences from known specimens, including field-collected samples from well-documented species.

We did not test this primer pair by PCR in vitro in monotremes or xenarthrans, and we tested it in only two marsupial species, one of them Didelphis virginiana. This primer pair may therefore prove most useful for studies of eutherian mammals. In addition, samples from different individuals of Chaetodipus rudinoris failed to amplify despite multiple PCR attempts. We therefore accept that exceptions may exist to the universal nature of this primer pair, most likely owing to mismatches at the primer sites on the template DNA.

Universal primers can easily coamplify nuclear mitochondrial pseudogenes (numts) (Song et al. 2008). However, numts are usually nonfunctional, so they can be identified and removed by examining sequence characteristics that bear on the functionality of the gene product. These include indels, in-frame stop codons and nucleotide composition (Song et al. 2008). We did not submit sequences from three species, Ursus americanus, Phocoena spinipinnis and Castor canadensis. These sequences contained internal stop codons, indels or frameshift mutations, which we identified on translation and/or on comparison with closely related complete cyt b reference sequences from GenBank. This may indicate numt amplicons, or it may indicate that the template DNA used in PCR was of poor quality or degraded. Previous studies have encountered similar problems with pseudogene coamplification (e.g. Song et al. 2008; Moulton et al. 2010). Nonspecificity is therefore an important concern for a primer pair designed specifically for universal amplification across a range of taxa.

We used amino acid translation as a quality control measure for protein-coding genes and submitted only the complete coding sequences that correctly annotated to the cyt b protein (Table 2). The software we used for in silico testing (Amplify v3.1) predicted several nonspecific fragment amplifications in the mitochondrial genomes. In vitro, we needed to gel-extract the target fragment for 30% of the species tested. Increasing primer specificity may not eliminate the coamplification of pseudogenes (Moulton et al. 2010). Nevertheless, we suggest more stringent PCR conditions to reduce nonspecific amplification. Specificity could be enhanced by using 1–1.5 mM MgCl2 instead of 2.5 mM MgCl2 (total concentration, including the MgCl2 already contained in the Qiagen PCR buffer) and an annealing time of 30–45 s instead of 1 min.
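The 1× Qiagen buffer already supplies 1.5 mM MgCl2. The suggested 1–1.5 mM total therefore cannot be reached just by reducing the supplement. A total of 1.5 mM means adding no extra MgCl2, and anything lower would require a buffer that supplies less MgCl2. The sketch below makes that arithmetic explicit. It also shows how a shorter annealing step changes the programmed cycling time of the protocol described in Materials and methods. The 25 mM MgCl2 stock is an assumption, ramp times are ignored, and the function names are hypothetical.

```python
BUFFER_MG_MM = 1.5   # MgCl2 supplied by 1x Qiagen PCR buffer (from the text)
MG_STOCK_MM = 25.0   # ASSUMED concentration of the MgCl2 supplement stock
REACTION_UL = 20.0   # reaction volume (from the text)


def mgcl2_supplement_ul(target_total_mm: float) -> float:
    """uL of MgCl2 stock to add so the reaction reaches the target total."""
    extra_mm = target_total_mm - BUFFER_MG_MM
    if extra_mm < 0:
        raise ValueError(
            f"{target_total_mm} mM is below the {BUFFER_MG_MM} mM the buffer "
            "already supplies; a buffer with less MgCl2 would be needed")
    return extra_mm * REACTION_UL / MG_STOCK_MM


def program_minutes(anneal_s: float, cycles: int = 35) -> float:
    """Total programmed hold time (min) of the cycling profile, excluding ramps."""
    initial_s = final_s = 10 * 60            # 95 C for 10 min; 72 C for 10 min
    per_cycle_s = 45 + anneal_s + 2 * 60     # 95 C 45 s, 55 C anneal, 72 C 2 min
    return (initial_s + cycles * per_cycle_s + final_s) / 60


for target in (2.5, 1.5, 1.0):
    try:
        print(f"{target} mM total: add {mgcl2_supplement_ul(target):.2f} uL of stock")
    except ValueError as err:
        print(f"{target} mM total: {err}")

for anneal in (60, 45, 30):
    print(f"annealing {anneal} s: {program_minutes(anneal):.2f} min of holds")
```

Under these assumptions, the published 2.5 mM total corresponds to a 0.8 µL supplement and 1.5 mM to none. Shortening annealing from 60 s to 30 s cuts the programmed holds from 151.25 to 133.75 min.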

Degenerate primers are known to be less likely to anneal preferentially to numts (Sorenson et al. 1999). Nevertheless, because our primer pair contains degenerate bases, we tested whether these primers putatively match any nuclear DNA sequences available in GenBank. A BLAST search optimized for short input sequences, run on each primer sequence, returned hits only from mitochondrial DNA sequences (data not shown). However, this result does not imply that these primers will not anneal to other sites in the nuclear genome, even under stringent PCR conditions. On the other hand, genomic DNA extracted from tissue samples contains a much higher copy number of mitochondrial loci than of nuclear loci. In addition, most eukaryotic numts are shorter than mitochondrial DNA fragments of >700 bp such as those amplified here (see Pereira & Baker 2004; Richly & Leister 2004; also see http://www.pseudogene.net/). We can therefore be confident that the target fragments we amplified (~1420 bp), sequenced (~750 bp in each direction) and submitted to GenBank are not numt sequences.

A search of GenBank will show many partial mammalian cyt b sequences that correspond to shorter (<500 bp) fragments (partial cds) of the gene. Effective species identification will require reference data that are both reliable and complete. Sequence data on species represented in GenBank also need updating, particularly in light of the data sharing concerns in the scientific community expressed by Noor et al. (2006). We believe that this primer pair, and the design and use of other such primers, will enhance sequence data availability for species identification, minimize errors in DNA sequence submissions and promote data sharing in a more convenient, efficient and cost-effective manner.

Acknowledgements

We would like to thank the researchers who contributed DNA samples to us at the Conservation Genetics Laboratory in the School of Natural Resources and the Environment, University of Arizona (Alberto Macias-Duarte, Cora Varas-Nelson, Judith Ramirez, Karla Pelz-Serrano, Sarah Rinkevich, Terry Myers and Ron Thompson); Carol Chambers, Tad Theimer, Tzeidle Wasserman and Suzanne Hagell from Northern Arizona University; and Linda Searles from Southwest Wildlife Conservation Center. We also thank the University of Arizona Genetics Core for assistance with DNA sequencing. We are grateful to the United States Fish and Wildlife Service–Science Support Program for funding this work.

References

Anderson S, Bankier AT, Barrell BG et al. (1981) Sequence and organization of the human mitochondrial genome. Nature, 290, 457–465.

Bartlett SE, Davidson WS (1992) FINS (forensically informative nucleotide sequencing): a procedure for identifying the animal origin of biological specimens. BioTechniques, 12, 408–411.

Bridge PD, Roberts PJ, Spooner BM, Panchal G (2003) On the unreliability of published DNA sequences. New Phytologist, 160, 43–48.

Hall TA (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symposium Series, 41, 95–98.

Harris DJ (2003) Can you bank on GenBank? Trends in Ecology and Evolution, 18, 317–319.

Hebert PDN, Cywinska A, Ball SL, DeWaard JR (2003) Biological identifications through DNA barcodes. Proceedings of the Royal Society B: Biological Sciences, 270, 313–321.

Hsieh HM, Chiang HL, Tsai LC et al. (2001) Cytochrome b gene for species identification of the conservation animals. Forensic Science International, 122, 7–18.

Irwin DM, Kocher TD, Wilson AC (1991) Evolution of cytochrome b in mammals. Journal of Molecular Evolution, 32, 128–144.

Kang S, Mansfield MA, Park B et al. (2010) The promise and pitfalls of sequence-based identification of plant pathogenic fungi and oomycetes. Phytopathology, 100, 732–737.

Kocher TD, Thomas WK, Meyer A et al. (1989) Dynamics of mitochondrial DNA evolution in animals: amplification and sequencing with conserved primers. Proceedings of the National Academy of Sciences of the USA, 86, 6196–6200.

Moulton MJ, Song H, Whiting MF (2010) Assessing the effects of primer specificity on eliminating numt coamplification in DNA barcoding: a case study from Orthoptera (Arthropoda: Insecta). Molecular Ecology Resources, 10, 615–662.

Nilsson RH, Ryberg M, Kristiansson E, Abarenkov K, Larsson K-H, Koljalg U (2006) Taxonomic reliability of DNA sequences in public sequence databases: a fungal perspective. PLoS ONE, 1, e59.

Noor MAF, Zimmerman KJ, Teeter KC (2006) Data sharing: how much doesn't get submitted to GenBank? PLoS Biology, 4, 1113–1114.

Ogden R, Dawnay N, McEwing R (2009) Wildlife DNA forensics–bridging the gap between conservation genetics and law enforcement. Endangered Species Research, 9, 179–195.

Parson W, Pegoraro K, Niederstatter H, Foger M, Steinlechner M (2000) Species identification by means of the cytochrome b gene. International Journal of Legal Medicine, 114, 23–28.

Pereira SL, Baker AJ (2004) Low number of mitochondrial pseudogenes in the chicken (Gallus gallus) nuclear genome: implications for molecular inference of population history and phylogenetics. BMC Evolutionary Biology, 4, 17.

Richly E, Leister D (2004) NUMTs in sequenced eukaryotic genomes. Molecular Biology and Evolution, 21, 1081–1084.

Song H, Buhay JE, Whiting MF, Crandall KA (2008) Many species in one: DNA barcoding overestimates the number of species when nuclear mitochondrial pseudogenes are coamplified. Proceedings of the National Academy of Sciences of the USA, 105, 13486–13491.

Sorenson MD, Ast JC, Dimcheff DE, Yuri T, Mindell DP (1999) Primers for a PCR-based approach to mitochondrial genome sequencing in birds and other vertebrates. Molecular Phylogenetics and Evolution, 12, 105–114.

Tobe SS, Kitchener A, Linacre A (2009) Cytochrome b or cytochrome c oxidase subunit I for mammalian species identification–an answer to the debate. Forensic Science International, 2, 306–307.

Verma SK, Singh L (2003) Novel universal primers establish identity of an enormous number of animal species for forensic application. Molecular Ecology Notes, 3, 28–31.

Data Accessibility

DNA sequences: GenBank accessions GU175434–GU175443, HM222706–HM222715 and JF489119–JF489138.
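Together, the three accession ranges above should expand to exactly the 40 sequences listed in Table 2. The helper below can be used to check this. It is a hypothetical convenience function, not an NCBI tool. It assumes that each end of a range is a letter prefix followed by digits and that both ends share the same prefix.

```python
import re


def expand_accessions(span: str) -> list[str]:
    """Expand a GenBank accession range such as 'GU175434–GU175443'."""
    start, end = re.split(r"\s*[–-]\s*", span.strip())
    (p1, n1), (p2, n2) = (re.fullmatch(r"([A-Z]+)(\d+)", acc).groups()
                          for acc in (start, end))
    if p1 != p2 or int(n2) < int(n1):
        raise ValueError(f"not a simple ascending range: {span!r}")
    width = len(n1)  # preserve zero-padding of the numeric part
    return [f"{p1}{i:0{width}d}" for i in range(int(n1), int(n2) + 1)]


RANGES = ["GU175434–GU175443", "HM222706–HM222715", "JF489119–JF489138"]
accessions = [acc for span in RANGES for acc in expand_accessions(span)]
print(len(accessions), accessions[0], accessions[-1])
```

The expanded list (10 + 10 + 20 = 40 accessions) can then be compared against the GenBank accession column of Table 2.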

Supporting Information

Additional supporting information may be found in the online version of this article.

Table S1 Pairwise percent identity (above diagonal) and sequence divergence (below diagonal) between all 40 cyt b gene sequences submitted to GenBank.

Please note: Wiley-Blackwell are not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.