
Evaluation of GenBank, EzTaxon, and BIBI Services for the Molecular Identification of Clinical Blood Culture Isolates that were Unidentifiable or Misidentified by Conventional Methods

Kyung Sun Park1*, Chang-Seok Ki1*, Cheol-In Kang2, Yae-Jean Kim3, Doo Ryeon Chung2, Kyong-Ran Peck2, Jae-Hoon Song2, Nam Yong Lee1

Department of Laboratory Medicine & Genetics1, Division of Infectious Diseases2, Department of Pediatrics3, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea

Running title: Evaluation of GenBank, EzTaxon, and BIBI: 16S rRNA

*The first two authors contributed equally to this work.

Corresponding authors:
Nam Yong Lee, M.D., Ph.D.
Department of Laboratory Medicine & Genetics, Samsung Medical Center, Sungkyunkwan University School of Medicine, 50 Irwon-Dong, Gangnam-Gu, Seoul, South Korea, 135-710
Tel: +82-2-3410-2706, Fax: +82-2-3410-2719, E-mail: [email protected]

Copyright © 2012, American Society for Microbiology. All Rights Reserved.
J. Clin. Microbiol. doi:10.1128/JCM.00081-12
JCM Accepts, published online ahead of print on 7 March 2012
Downloaded from http://jcm.asm.org/ on April 13, 2018 by guest

Abstract

We compared 16S rRNA gene sequencing results analyzed with the GenBank, EzTaxon, and BIBI databases for blood culture specimens whose identification by conventional methods was incomplete, conflicting, or not possible. Analyses performed using GenBank combined with EzTaxon (kappa=0.79) were more discriminative than those using any database alone or either of the other two-database combinations.


The 16S rRNA gene is increasingly used for the molecular identification of microbes, but a major problem with 16S rRNA gene sequencing is the difficulty of interpreting the results. A number of public and commercial DNA sequence databases are available for microbes. Public databases such as GenBank, which can be searched using the National Center for Biotechnology Information Basic Local Alignment Search Tool (NCBI BLAST), contain sequences of both type strains and non-type strains that have not been peer reviewed (3). Commercial databases, on the other hand, potentially contain high-quality filtered sequence data but include only a limited number of reference sequences (5, 9, 11).

Recently, several freely available, quality-controlled, web-based public databases, such as the EzTaxon database (http://www.eztaxon.org/) (1) and the BIBI database (http://pbil.univ-lyon1.fr/bibi/) (8), have been developed for bacterial identification based on 16S rRNA gene sequences. Despite these advances, 16S rRNA gene databases have rarely been evaluated or compared.

The aim of this study was to compare 16S rRNA gene sequencing results analyzed with GenBank searched using BLAST (hereafter GenBank), EzTaxon, and BIBI for blood culture specimens whose identification by conventional methods was incomplete, conflicting, or not possible.

In our laboratory, sequencing of the 16S rRNA gene or of alternative DNA target genes such as gyrB, tuf, secA1, or recA is used as an adjunct to established conventional methods for the identification of difficult-to-identify or rarely encountered bacteria. From January 2010 to April 2011, we encountered 41 consecutive blood culture isolates whose identification by conventional methods was conflicting, incomplete, or not possible (37 cases of colonies cultured from blood culture bottles and four cases taken directly from blood culture bottles).

First, the 16S rRNA gene sequences were analyzed with GenBank. The 16S rRNA gene sequence analysis complied with Clinical and Laboratory Standards Institute (CLSI) guideline MM18-A (4). The same sequences were then submitted to the EzTaxon and BIBI database servers. The characteristics of the GenBank, EzTaxon, and BIBI databases are compared in Table 1.

The identification of a microorganism was considered final when two or more of the three 16S rRNA databases gave concordant results, when the biochemical characteristics of the identified strain were concordant with the known biochemical profile of the reference strain, or when the strain was identified by additional alternative target genes according to CLSI guidelines.
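
The first of these criteria, agreement between at least two of the three databases, can be sketched as a simple majority vote. The Python function below is a minimal illustration only: the function name, the dictionary layout, and the placeholder species names are hypothetical, and the biochemical and alternative-target-gene criteria are not modeled.

```python
from collections import Counter
from typing import Dict, Optional


def concordant_identification(results: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the name reported by two or more databases, or None.

    `results` maps a database name to the identification it returned;
    None stands for an unidentifiable result (hypothetical layout).
    """
    counts = Counter(name for name in results.values() if name is not None)
    if not counts:
        return None
    name, votes = counts.most_common(1)[0]
    return name if votes >= 2 else None


# Placeholder input, not data from the study.
example = {"GenBank": "species A", "EzTaxon": "species A", "BIBI": "species B"}
print(concordant_identification(example))  # species A
```

A result of None here would mean only that the sequence-based criterion was not met; under the rule described above, the isolate could still be finalized by biochemical characteristics or alternative target genes.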

Using this strategy, we identified 30 (73.2%) of 41 strains as single species (Table 2). Three cases (strains 19, 27, and 36) were identified as a single genus with multiple species, and four cases (strains 11, 18, 21, and 34) were identified at the genus level only. We were unable to identify two cases because of unsatisfactory sequence quality. One further case (strain 22) was identified only by the BIBI database and was therefore interpreted as not fully identified.

We used inter-rater agreement statistics (kappa calculation) to evaluate how well the 16S rRNA analyses using each database, and each combination of two databases, correlated with the comprehensive identification, which considered the 16S rRNA gene together with biochemical characteristics or alternative target genes (Table 2). No database, alone or in combination, had a kappa value greater than 0.80, the level that would indicate very good correlation with the comprehensive identification. These results imply that 16S rRNA gene analysis alone has limitations for cases that are unidentifiable or misidentified by conventional methods. The 16S rRNA analysis using GenBank (kappa=0.66) correlated better with the comprehensive identification than did the analyses using the other individual databases. Furthermore, analysis using GenBank combined with EzTaxon (kappa=0.79) proved more discriminative than analysis using GenBank alone, either of the other databases alone, or either of the other two-database combinations.
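
For readers unfamiliar with the statistic, the sketch below shows how a kappa value of this kind can be computed from two parallel lists of per-isolate categories. It assumes the unweighted form of Cohen's kappa; the text does not state which software or which variant was used, and the category lists in the example are hypothetical, not the study data.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters need the same, non-zero number of items")
    n = len(rater_a)
    # Observed agreement: fraction of items placed in the same category.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two marginal proportions, summed over categories.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical per-isolate categories (one database vs. the comprehensive result).
database = ["species", "species", "genus", "unidentifiable"]
comprehensive = ["species", "genus", "genus", "unidentifiable"]
print(round(cohens_kappa(database, comprehensive), 2))  # 0.64
```

Reproducing the values in Table 2 would require the per-isolate category assigned by each database, which the table summarizes only as column totals.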

Of 39 isolates (excluding the two cases with illegible sequences), the best-matched strains were concordant among the three databases for 29 (74.4%) (Table 3) and discordant for ten (25.6%) (Table 4).

The discrepancies among the databases and the differences in their correlations with the comprehensive identifications indicate that there is no consensus on a definite standard for what sequence databases must provide, and that the various databases have not been adequately evaluated.

First, these differences might result in part from the different software used by the various databases. A previous study (2) reported that the less closely related the taxa being compared are, the more strongly the dendrogram relationships are affected by the program used. We observed several cases (strains 11, 16, 19, 22, and 28) in which the best-matched strain in one database was not used for analysis in another database because of its low similarity to the query, even though it had the same GenBank accession number. It is difficult to apply the defined threshold values for genus and species in the CLSI guidelines because different similarity values are obtained when the same sequence is compared using different programs. Guidelines or a consensus on the programs and parameters used with 16S rRNA gene sequence databases are therefore needed.
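
To make the threshold problem concrete, the sketch below computes a percent similarity from a count of base differences and an aligned length, then assigns an identification level from caller-supplied cutoffs. It is a minimal illustration under stated assumptions: the function names are hypothetical, and the cutoffs of 99.0 and 97.0 in the example are placeholders, not values taken from MM18-A.

```python
def percent_similarity(base_differences: int, aligned_length: int) -> float:
    """Percent of aligned positions that are identical."""
    if aligned_length <= 0 or not 0 <= base_differences <= aligned_length:
        raise ValueError("need 0 <= base_differences <= aligned_length, length > 0")
    return 100.0 * (aligned_length - base_differences) / aligned_length


def assign_level(similarity: float, species_cutoff: float, genus_cutoff: float) -> str:
    """Map a similarity value to a level using cutoffs supplied by the caller."""
    if similarity >= species_cutoff:
        return "species"
    if similarity >= genus_cutoff:
        return "genus"
    return "unidentified"


# Two differences over 712 aligned bases, as in the "(99.72, 2/712)" entry of Table 4.
print(round(percent_similarity(2, 712), 2))  # 99.72

# Placeholder cutoffs: a small change in the reported similarity changes the level.
print(assign_level(99.05, species_cutoff=99.0, genus_cutoff=97.0))  # species
print(assign_level(98.95, species_cutoff=99.0, genus_cutoff=97.0))  # genus
```

Because each program aligns the query differently, both the number of differences and the aligned length can change between databases, so the same sequence may fall on either side of a fixed cutoff.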

Furthermore, because we did not compare 16S rRNA analyses using other popular databases such as Greengenes (7), the Ribosomal Database Project (RDP) (6), or Ribosomal Differentiation of Medical Microorganisms (RIDOM) (10), more comparative studies that include these databases are needed. In addition, because our study is biased by its limited and selected population of isolates, such comparative studies should analyze more 16S rRNA sequences from a wider variety of organisms recovered from clinical specimens.

Second, the databases differ in the total number of 16S rRNA gene sequences they contain. In the present study, the EzTaxon database yielded more unidentified results than the other databases because it contains only 16S rRNA gene sequences of type strains.

In conclusion, analysis of the 16S rRNA gene alone is not sufficient for the molecular identification of the rare cases in which conventional methods fail to identify the strain correctly. Based on our experience, we propose that 16S rRNA gene sequencing results be analyzed with two or more databases including GenBank, preferably starting with GenBank and then confirming the result with other peer-reviewed databases, because the interpretation of 16S rRNA gene sequences depends on the program used by the database.

References

1. Chun, J., J. H. Lee, Y. Jung, M. Kim, S. Kim, B. K. Kim, and Y. W. Lim. 2007. EzTaxon: a web-based tool for the identification of prokaryotes based on 16S ribosomal RNA gene sequences. Int J Syst Evol Microbiol 57:2259-2261.

2. Clarridge, J. E., 3rd. 2004. Impact of 16S rRNA gene sequence analysis for identification of bacteria on clinical microbiology and infectious diseases. Clin Microbiol Rev 17:840-862.

3. Clayton, R. A., G. Sutton, P. S. Hinkle, Jr., C. Bult, and C. Fields. 1995. Intraspecific variation in small-subunit rRNA sequences in GenBank: why single sequences may not adequately represent prokaryotic taxa. Int J Syst Bacteriol 45:595-599.

4. Clinical and Laboratory Standards Institute. 2008. Interpretive Criteria for Identification of Bacteria and Fungi by DNA Target Sequencing; Approved Guideline. MM18-A. Clinical and Laboratory Standards Institute.

5. Cloud, J. L., P. S. Conville, A. Croft, D. Harmsen, F. G. Witebsky, and K. C. Carroll. 2004. Evaluation of partial 16S ribosomal DNA sequencing for identification of Nocardia species by using the MicroSeq 500 system with an expanded database. J Clin Microbiol 42:578-584.

6. Cole, J. R., Q. Wang, E. Cardenas, J. Fish, B. Chai, R. J. Farris, A. S. Kulam-Syed-Mohideen, D. M. McGarrell, T. Marsh, G. M. Garrity, and J. M. Tiedje. 2009. The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res 37:D141-145.

7. DeSantis, T. Z., P. Hugenholtz, N. Larsen, M. Rojas, E. L. Brodie, K. Keller, T. Huber, D. Dalevi, P. Hu, and G. L. Andersen. 2006. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol 72:5069-5072.

8. Devulder, G., G. Perriere, F. Baty, and J. P. Flandrois. 2003. BIBI, a bioinformatics bacterial identification tool. J Clin Microbiol 41:1785-1787.

9. Mellmann, A., J. L. Cloud, S. Andrees, K. Blackwood, K. C. Carroll, A. Kabani, A. Roth, and D. Harmsen. 2003. Evaluation of RIDOM, MicroSeq, and GenBank services in the molecular identification of Nocardia species. Int J Med Microbiol 293:359-370.

10. Turenne, C. Y., L. Tschetter, J. Wolfe, and A. Kabani. 2001. Necessity of quality-controlled 16S rRNA gene sequence databases: identifying nontuberculous Mycobacterium species. J Clin Microbiol 39:3637-3648.

11. Woo, P. C. Y., K. H. L. Ng, S. K. P. Lau, K. T. Yip, A. M. Y. Fung, K. W. Leung, D. M. W. Tam, T. L. Que, and K. Y. Yuen. 2003. Usefulness of the MicroSeq 500 16S ribosomal DNA-based bacterial identification system for identification of clinically significant bacterial isolates with ambiguous biochemical profiles. J Clin Microbiol 41:1996-2001.


Table 1. Comparison of characteristics of GenBank, EzTaxon, and BIBI databases

| Characteristic | GenBank | EzTaxon | BIBI |
|---|---|---|---|
| Resources of reference sequences | DNA DataBank of Japan (DDBJ) + the European Molecular Biology Laboratory (EMBL) + GenBank at NCBI | GenBank + sequences of strains provided by authors | GenBank |
| Target genes | All | 16S rRNA | 16S rRNA, gyrB, recA, sodA, rpoB, tmRNA, tuf, groel2-hsp65 |
| Curated sequences | No | Yes | Yes |
| Updated nomenclature | No | Yes, from DSMZ^a | Yes, from DSMZ^a |
| Origin of sequences | All | Only type strains | All, but with categorized analysis (type strains, BacteriaArchaea_TS_SSU-rDNA-16S_stringent; type strains + strains with validly published name, BacteriaArchaea_SSU-rDNA-16S_stringent; type strains + strains with/without validly published name, BacteriaArchaea_SSU-rDNA-16S_lax) |
| Search engines | BLAST^b | BLAST^b and FASTA, then pairwise global sequence alignment (algorithm of Myers & Miller) | BLAST^b |
| Multiple sequence alignment | No | Yes, using Clustal W | Yes, using Clustal W |
| Phylogenetic inference | Neighbor-joining, fast minimum evolution (using BLAST^b pairwise alignment) | Neighbor-joining, maximum-parsimony, maximum-likelihood (using Clustal W) | Not described (using Clustal W) |

^a DSMZ, Deutsche Sammlung von Mikroorganismen und Zellkulturen (German Collection of Microorganisms and Cell Cultures)
^b BLAST, basic local alignment search tool

Table 2. 16S rRNA gene analysis using GenBank, EzTaxon, and BIBI databases: correlation with the comprehensive identification considering 16S rRNA gene analysis and biochemical characteristics or alternative DNA target gene sequences

| Identification level | GenBank | EzTaxon | BIBI | GenBank + EzTaxon^a | GenBank + BIBI^a | EzTaxon + BIBI^a | Comprehensive identification |
|---|---|---|---|---|---|---|---|
| Single species level | 25 | 23 | 30 | 27 | 30 | 31 | 31 |
| Single genus with multiple species | 10 | 9 | 3 | 7 | 4 | 3 | 3 |
| Genus level, only | 3 | 3 | 2 | 3 | 4 | 4 | 4 |
| Unidentifiable | 3 | 5 | 4 | 3 | 2 | 2 | 3 |
| Misidentification | 0 | 1 | 2 | 1 | 1 | 1 | 0 |
| Kappa (95% CI) | 0.66 (0.45-0.88) | 0.63 (0.41-0.85) | 0.43 (0.18-0.68) | 0.79 (0.60-0.98) | 0.54 (0.28-0.81) | 0.57 (0.31-0.82) | |

^a When the two databases in a combination gave discrepant results, the more specific outcome was taken as the provisional identification. When one database gave an unidentifiable result and the other an identification, the identification was taken as provisional. When one database identified a single species and the other a single genus with multiple species, the single species was taken as provisional. When the two databases agreed on the genus but differed on the species, the provisional result was at the genus level.
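
The provisional-result rules in footnote a of Table 2 can be sketched as a small function. This is an illustrative reading of the footnote, not the authors' procedure: the `Result` class, the `specificity` ranking, and the decision to raise an error when the two databases report different genera (a case the footnote does not address) are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Result:
    genus: Optional[str] = None    # None: unidentifiable
    species: Tuple[str, ...] = ()  # empty: genus level only; several: multiple species


def specificity(r: Result) -> int:
    """0 unidentifiable, 1 genus only, 2 genus with multiple species, 3 single species."""
    if r.genus is None:
        return 0
    if not r.species:
        return 1
    return 2 if len(r.species) > 1 else 3


def provisional(a: Result, b: Result) -> Result:
    """Combine the results of two databases into one provisional result."""
    if a.genus is None:
        return b  # an identification beats an unidentifiable result
    if b.genus is None:
        return a
    if a.genus != b.genus:
        raise ValueError("different genera: not covered by the footnote")
    if specificity(a) == 3 and specificity(b) == 3 and a.species != b.species:
        return Result(genus=a.genus)  # same genus, different species: genus level
    return a if specificity(a) >= specificity(b) else b  # otherwise the more specific


mitis = Result("Streptococcus", ("mitis",))
pneumoniae = Result("Streptococcus", ("pneumoniae",))
print(provisional(mitis, pneumoniae))  # Result(genus='Streptococcus', species=())
```

Encoding the rules this way makes the one unspecified case explicit; a laboratory applying such rules would need its own policy for discordant genera.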

Table 3. 16S rRNA gene identification using GenBank, EzTaxon, and BIBI databases: 29 concordant results for the best-matched strains

| Bacteria | 16S rRNA-based identification | Strain no. (N) |
|---|---|---|
| Gram-positive cocci | Staphylococcus aureus | 14, 40 (2) |
| | S. lugdunensis | 5 (1) |
| | Granulicatella adiacens | 4, 13 (2) |
| | Gemella morbillorum | 35 (1) |
| | Streptococcus pneumoniae | 24 (1) |
| | S. pneumoniae/S. pseudopneumoniae/S. mitis^a | 27, 36 (2) |
| | S. mutans | 33 (1) |
| | Enterococcus faecalis | 25, 30 (2) |
| Gram-negative cocci | Neisseria species | 18 (1) |
| Gram-negative bacilli | Klebsiella pneumoniae | 1 (1) |
| | Burkholderia pseudomallei | 7 (1) |
| | Achromobacter xylosoxidans | 16 (1) |
| | Capnocytophaga sputigena | 6 (1) |
| | Eikenella corrodens | 8 (1) |
| | Leptotrichia trevisanii | 3 (1) |
| | Odoribacter splanchnicus | 38 (1) |
| Gram-positive bacilli | Mycobacterium chelonae/M. abscessus/M. massiliense/M. bolletii | 41 (1) |
| | Microbacterium paraoxydans | 9 (1) |
| | Lactobacillus salivarius | 29, 32 (2) |
| | L. paracasei | 15 (1) |
| | Bacillus circulans | 39 (1) |
| | Clostridium tertium | 23, 31 (2) |
| | C. symbiosum | 12 (1) |

^a The blood culture bottles presented positive signs, but these isolates were not cultured because of their autolytic tendencies. Failure to distinguish among S. mitis, S. oralis, and S. pneumoniae is a well-known problem when performing gene sequencing. Even after several attempts to differentiate the species using tuf, rpoB and recA genes, we could not obtain the correct sequencing results, presumably because of low DNA concentrations.

Table 4. 16S rRNA gene identification using GenBank, EzTaxon, and BIBI databases: ten discordant results for the best-matched strains. GenBank and EzTaxon matches are given with (% similarities, base differences).

| Strain No. | Identification by conventional methods | GenBank: best match | GenBank: second best match | GenBank: third best match | EzTaxon: best match | EzTaxon: second best match | EzTaxon: third best match | BIBI^a: best match | BIBI^a: second best match | Comprehensive identification |
|---|---|---|---|---|---|---|---|---|---|---|
| 19 | Unidentified | Streptococcus mitis (99.44, 2/535) | Streptococcus pseudopneumoniae (99.43, 1/530), Streptococcus pneumoniae (99.43, 1/530) | Streptococcus oralis (99.25, 1/530) | Streptococcus pseudopneumoniae (99.61, 2/510) | Streptococcus mitis (99.43, 3/530) | Streptococcus pneumoniae (99.24, 4/526) | Streptococcus mitis | | Streptococcus mitis/S. pseudopneumoniae/S. pneumoniae^b |
| 28 | Colonies 1: E. faecium/colonies 2: unidentified | Enterococcus faecium (99.86, 0/698) | Enterococcus durans (99.86, 1/699) | | Enterococcus faecium (99.86, 1/697) | Enterococcus durans (99.71, 2/698) | Enterococcus hirae (99.57, 3/698) | Enterococcus durans | Enterococcus faecium | Enterococcus faecium |
| 10 | Sphingomonas paucimobilis (VITEK2-GN)/Pasteurella species (API-NE) | Aggregatibacter aphrophilus (99.31, 0/722) | Haemophilus paraphrophilus (99.03, 0/719) | | Unidentified | | | Aggregatibacter aphrophilus | | Aggregatibacter aphrophilus |
| 11 | Pasteurella canis (VITEK2-GN)/Acinetobacter lwoffii (MicroScan-O/N combo 44) | Acinetobacter species (99.05, 3/739) | | | Acinetobacter parvus (99.18, 6/729) | | | Unidentified (stringent), Acinetobacter species (lax) | | Acinetobacter species |
| 37 | Unidentified | Moraxella nonliquefaciens (99.58, 2/715) | | | Moraxella nonliquefaciens (99.72, 2/712) | | | Unidentified (stringent), Moraxella nonliquefaciens (lax) | | Moraxella nonliquefaciens |
| 20 | Unidentified | Microbacterium aurum (99.82, 0/544) | | | Unidentified | | | Microbacterium aurum | | Microbacterium aurum |
| 21 | Unidentified | Microbacterium oxydans (99.60, 0/497) | Microbacterium paraoxydans (99.20, 1/498) | | Microbacterium paraoxydans (98.19, 9/497) | Microbacterium luteolum (98.05, 9/461) | Microbacterium thalassium (97.96, 10/498) | Microbacterium phyllosphaerae | | Microbacterium species |
| 26 | Clostridium bifermentans (VITEK2-ANC) | Catabacter hongkongensis (99.12, 6/678) | | | Unidentified | | | Catabacter hongkongensis | | Catabacter hongkongensis |
| 34 | Unidentified | Fontibacillus aquaticus (97.11, 6/691) | | | Fontibacillus aquaticus (97.23, 19/687) | | | Unidentified (stringent, lax) | | Fontibacillus species |
| 22 | Unidentified | Unidentified | | | Unidentified | | | Oscillibacter valericigenes | | Unidentified |

^a The % similarities and base differences are not reported for the BIBI database because they are not listed in the BIBI database.
^b Described in Table 3.