precision long-read metagenomics sequencing for food safety by … · 2020. 7. 17. · metagenomics...
TRANSCRIPT
1
Precision long-read metagenomics sequencing for food safety by detection and 1
assembly of Shiga toxin-producing Escherichia coli in irrigation water 2
3
Meghan Maguire, Julie A. Kase, Dwayne Roberson, Tim Muruvanda, Eric W. Brown, 4
Marc Allard, Steven M. Musser, Narjol González-Escalona* 5
6
Center for Food Safety and Applied Nutrition, Food and Drug Administration, College 7
Park, MD USA. 8
9
Key words: Foodborne pathogens, qPCR, Escherichia coli, nanopore sequencing, real 10
time PCR, irrigation water, metagenomics 11
12
13
Running title: E. coli long-read metagenomics for food safety 14
15
16
17
18
______________ 19
*Corresponding author. [email protected]. Mailing address, Center 20
for Food Safety and Applied Nutrition, Food and Drug Administration, 5001 Campus 21
Drive, College Park, MD 20740, USA 22
23
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
2
ABSTRACT 24
Shiga toxin-producing Escherichia coli (STEC) contamination of agricultural water might 25
be an important factor to recent foodborne illness and outbreaks involving leafy greens. 26
Whole genome sequencing generation of closed bacterial genomes plays an important 27
role in source tracking. We aimed to determine the limits of detection and classification 28
of STECs by qPCR and nanopore sequencing using enriched irrigation water artificially 29
contaminated with E. coli O157:H7 (EDL933). We determined the limit of STEC 30
detection by qPCR to be 30 CFU/reaction, which is equivalent to 105 CFU/ml in the 31
enrichment. By using Oxford Nanopore’s EPI2ME WIMP workflow and de novo 32
assembly with Flye followed by taxon classification with a k-mer analysis software 33
(Kraken), E. coli O157:H7 could be detected at 103 CFU/ml (68 reads) and a complete 34
fragmented E. coli O157:H7 metagenome-assembled genome (MAG) was obtained at 35
105-108 CFU/ml. Using a custom script to extract the E. coli reads, a completely closed 36
MAG was obtained at 107-108 CFU/ml and a complete, fragmented MAG was obtained 37
at 105-106 CFU/ml. In silico virulence detection for E. coli MAGs for 105-108 CFU/ml 38
showed that the virulotype was indistinguishable from the spiked E. coli O157:H7 strain. 39
We further identified the bacterial species in the un-spiked enrichment, including 40
antimicrobial resistance genes, which could have important implications to food safety. 41
We propose this workflow could be used for detection and complete genomic 42
characterization of STEC from a complex microbial sample and could be applied to 43
determine the limit of detection and assembly of other foodborne bacterial pathogens. 44
45
46
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
3
IMPORTANCE 47
Foodborne illness caused by Shiga toxin-producing E. coli (STEC) ranges in severity 48
from diarrhea to hemolytic uremic syndrome and produce-related incidence is 49
increasing. The pervasive nature of E. coli requires not only detection, but also a 50
complete genome to determine potential pathogenicity based on stx and eae genes, 51
serotype, and other virulence factors. We have developed a pipeline to determine the 52
limits of nanopore sequencing for STECs in a metagenomic sample. By utilizing the 53
current qPCR in the FDA Bacteriological Analytical Manual (BAM) Chapter 4A, we can 54
quantify the amount of STEC in the enrichment and then sequence and classify the 55
STEC in less than half the time as current protocols that require a single isolate. These 56
methods have wide implications for food safety, including decreased time to STEC 57
identification during outbreaks, characterization of the microbial community, and the 58
potential to use these methods to determine the limits for other foodborne pathogens. 59
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
4
INTRODUCTION 60
Shiga toxin-producing Escherichia coli (STEC) is a foodborne pathogen capable of 61
causing severe illness, notably hemolytic uremic syndrome (HUS), and death (1-4). 62
STEC-mediated foodborne illness cases and outbreaks are most commonly associated 63
with the O157:H7 serotype; however, non-O157 STEC illnesses are increasingly being 64
reported (5-8). In the United States, O157:H7 is responsible for approximately 95,000 65
cases per year of which over 2,000 require hospitalization (7). A further 170,000 cases 66
can be attributed to non-O157 STEC exposure. Foodborne transmission accounts for 67
nearly 70% of O157:H7 incidents. Foodborne outbreaks have been increasingly 68
produce-related, from 0.7% in the 1970s to 6% in the 1990s (9). More recently (2004-69
2013) produce accounts for approximately 18% of foodborne outbreaks and E. coli is 70
one of the most common bacterial sources (10-13). Produce can become contaminated 71
during production, packaging, or preparation; however, half of the produce-associated 72
infections are linked to contamination prior to purchase (14). As such, agricultural water, 73
including water used for irrigation, is an important potential source of contamination 74
(https://www.fda.gov/food/outbreaks-foodborne-illness/environmental-assessment-75
factors-potentially-contributing-contamination-romaine-lettuce-implicated; 76
https://www.fda.gov/food/outbreaks-foodborne-illness/investigation-summary-factors-77
potentially-contributing-contamination-romaine-lettuce-implicated-fall) (15-20). While 78
prevention and mitigation strategies are beyond the scope of this study, detection of 79
STEC bacteria remains paramount for public health. 80
81
A standard and accepted method for the detection of STECs in foods is based on a 82
combined result from qPCR and a traditional microbiological method described in the 83
FDA Bacteriological Analytical Manual (BAM) Chapter 4A 84
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
5
(https://www.fda.gov/food/laboratory-methods-food/bam-chapter-4a-diarrheagenic-85
escherichia-coli). It consists of a 24-hour sample enrichment in modified buffered 86
peptone water with pyruvate at 42°C followed by qPCR detection of the main virulence 87
genes (stx1 and stx2) and the wzy gene of the O157 antigen for a total analysis time of 88
2-3 days. Positive qPCR results undergo further analysis and O157:H7 STEC is 89
confirmed by several rounds of selective plating on tellurite cefixime – sorbitol 90
MacConkey agar (TC-SMAC), chromogenic agar, and trypticase soy agar with 0.6% 91
yeast extract (TSAYE) plates for 2-4 more days of analysis time. Isolates confirmed to 92
be pure cultures are assessed for toxigenic potential again by qPCR. Further analysis by 93
whole genome sequencing (WGS) is used to determine the complete scope of 94
pathogenicity and antimicrobial resistance status which would add 3-5 days. In 95
conclusion, approximately two weeks are needed for STEC confirmation by the BAM 96
method. 97
98
WGS is rapidly changing the approach to foodborne illnesses and outbreak 99
investigations (21). WGS is being used to monitor and identify foodborne pathogens 100
(22,23) and the presence of antimicrobial resistance or virulence genes (24,25). Specific 101
information on serotype and pathogenicity as it relates to phylogenic relationships is 102
increasingly important in outbreak scenarios (26,27). Some of the virulence genes 103
detected by WGS mediate attachment and colonization of STECs and can be found in 104
the locus of enterocyte effacement (LEE), including intimin (eae) and type 3 secretion 105
system (TTSS) effector proteins (esp, esc, tir), non-LEE effectors (nleA, nleB, nleC), and 106
other putative virulence genes (ehxA, etpD, subA, toxB, saa) (4,25,28,29). 107
108
Metagenomics for sample microbial analysis and targeted detection have been used 109
extensively in many sample types (e.g. spinach, chapati flour, and ice cream) (30-33). 110
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
6
Analysis is typically either by 16S rDNA profiling or by shotgun metagenomic sequencing 111
(30-35). Many studies have recently started using long read approaches (36-38) for 112
metagenomic studies because it provides finished metagenome-assembled genomes 113
(MAGs) for the most abundant species or bacteria in the microbiome sample (36-38). 114
Closed MAGs provide a better assessment of those genomes and their virulence 115
potential or functionality in that ecosystem. Most of these approaches either used mock 116
communities or did not properly address the limits of the technique for either detection or 117
the initial sample concentration required to get a completely closed or draft genome that 118
would allow for target organism characterization in the sample. Each metagenomic 119
approach provides different levels of analysis. 16S rDNA has the most sensitive 120
detection limit, requiring the lowest initial CFU, but the lowest resolution, reporting at the 121
genus level because some species have almost identical 16S rDNA genes and the 122
fragment of the 16S rDNA used is relatively small (39). Shotgun metagenomic WGS 123
provides a less sensitive detection limit (above 103 CFU/ml), but provides information 124
from species to a strain level, including many functional genes in the microbiome sample 125
analyzed (33,36,40,41). Metagenomic analyses can be made either using short or long 126
sequencing reads technologies. Short-read shotgun metagenomics was most commonly 127
used for microbiome analyses (30-32,35,39,39,42), while the use of long-read 128
metagenomics is on the rise in the last few years (36-38,40) for numerous reasons. 129
These were summarized very concisely by Bertrand et al. (2019) (40), where the authors 130
mention that short-read sequencing presents difficulty in accurately assembling the 131
complex, highly repetitive regions that can range in sizes up to hundreds of kilobases 132
(36), especially when multiple species are present. Classification of these short reads 133
into species bins based on clustering relies on consensus genomes and is not precise 134
enough for strain-level metagenomic assemblies that are necessary for outbreak and 135
traceback scenarios (40). 136
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
7
137
Nanopore sequencing can produce completely closed genomes, while also offering 138
affordability and portability (24,43). It does not employ a size selection process (unlike 139
those seen with Illumina or Pacific Biosciences sequencers) resulting in longer reads 140
that can help assemble complex genomic areas (e.g. phages and highly repetitive 141
regions). Furthermore, because nanopore sequencing can perform real-time base 142
calling, it allows for semi-real-time analysis when paired with the Oxford Nanopore 143
EPI2ME cloud service that has the “What’s in my pot” (WIMP) workflow (44). WIMP 144
identifies reads by taxa using an algorithm with the Centrifuge software (45) and the 145
RefSeq sequence database at NCBI (https://www.ncbi.nlm.nih.gov/refseq/). It identifies 146
the number of reads matching an organism of interest. These reads can potentially be 147
retrieved from the total reads and analyzed separately. These extracted reads can then 148
be de novo assembled and could result in a completely closed MAG for the organism of 149
interest, in our case STEC O157:H7. This approach could dramatically reduce the time 150
to detect and identify an STEC strain in a sample from two weeks to three days from 151
enrichment. 152
153
Long-read nanopore sequencing has proven a useful tool to close bacterial genomes in 154
metagenomic samples where the bacterial species are present in approximately equal 155
proportions (approx. 12% or 107 CFU/ml) (36,37). The actual cell numbers in the initial 156
sample needed for successful assembly of the genome of interest has not been 157
determined yet. Nanopore sequencing of mock microbial communities suggests that the 158
technology is capable of detecting as few as 50 cells in the reaction (i.e. 4 reads) (37). 159
This, however, will be not enough reads to make an informative identification of STEC 160
serotype or evaluation of potential risk to human health via presence of important 161
virulence genes. While current protocols utilize WGS after isolating individual colonies by 162
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
8
the selective plating methods described, culture-independent detection of O157:H7 163
STECs in irrigation water (that were FDA BAM-confirmed O157 STEC-positive) by 164
Illumina MiSeq and Oxford Nanopore shotgun sequencing resulted in negative results 165
due to low concentrations of the organism of interest (Gonzalez-Escalona, unpublished 166
results). However, instead of analyzing the water directly, we suggest that using samples 167
after enrichment will have a sufficient STEC concentration to assemble their genome 168
from nanopore sequencing, which will allow for identification of their serotype, virulence 169
composition, and antimicrobial resistance gene (AMR) content (31). 170
171
In order to test and empirically determine the limits of the technique, nanopore 172
sequencing for: 1) detection, 2) characterization, and 3) closing genomes of STECs, we 173
artificially contaminated STEC-negative irrigation water with 10-fold dilutions of E. coli 174
O157:H7 EDL933. We propose a workflow for detection and quantification (qPCR) for 175
STECs in enriched samples followed by identification and typing by nanopore 176
sequencing. We also developed a script to extract the desired reads by taxa from the 177
total sequenced reads. 178
179
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
9
MATERIALS AND METHODS 180
Bacterial strains and media. We used a variant of the shiga toxin-producing E. coli 181
(STEC) EDL933 O157:H7 strain for all our experiments. This strain was from our 182
collection at CFSAN and is a variant of ATCC 43895 that after several passages in the 183
lab has lost the stx2 phage. EDL933 was grown in static culture overnight in tryptic soy 184
broth (TSB) at 37°C. 185
186
Preparation of E. coli EDL933 inocula for spiking experiments. For artificial 187
contamination, overnight culture (109 CFU/ml) of E. coli EDL933 was serially diluted 10-188
fold in TSB. Dilutions containing approximately 109 CFU/ml through less than 10 CFU/ml 189
were used for spiking studies. Dilutions of the overnight culture spread on tryptic soy 190
agar (TSA) plates were used to calculate the number of CFUs per ml in the original 191
culture. 192
193
Sample processing and artificial contamination. An STEC-negative irrigation water 194
sample (200 ml) from the Southwestern US was enriched by adding an equal volume of 195
2X modified Buffered Peptone Water with pyruvate (mBPWp) and incubated at 37°C 196
static for 5 hours. Antimicrobial cocktail [Acriflavin-Cefsulodin-Vancomycin (ACV)] was 197
added and incubated at 42°C static overnight (18-24 h), according to Chapter 4A of the 198
BAM. E. coli EDL933 dilutions were added in a 1:1 ratio to the enriched irrigation water 199
sample. 200
201
Nucleic acid extraction. DNA from artificially contaminated irrigation water enrichment 202
was extracted by two methods for 1) qPCR and 2) nanopore sequencing. A 1ml fraction 203
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
10
of each spiked enrichment sample was processed for qPCR analysis according to the 204
FDA BAM Chapter 4A. Briefly, cells were pelleted by centrifugation at 12,000 x g for 3 205
minutes. The pellet was washed in 0.85% NaCl and resuspended in 1mL sterile water. 206
Samples were boiled at 100°C for 10 minutes then centrifuged to pellet debris. The DNA 207
supernatant was saved. Another 1 ml portion of each spiked enrichment sample was 208
extracted using the Maxwell RSC Cultured Cells DNA kit with a Maxwell RSC Instrument 209
(Promega Corporation, Madison, WI) according to manufacturer’s instructions for Gram-210
negative bacteria with additional RNase treatment. DNA concentration was determined 211
by Qubit 4 Fluorometer (Invitrogen, Carlsbad, CA) according to manufacturer’s 212
instructions. 213
214
STEC qPCR detection. The presence of STEC EDL933 was determined by qPCR as 215
described in Chapter 4A of the BAM detecting stx1, stx2, and wzy. Briefly, the DNA 216
supernatants recovered from boiled samples were diluted 1:10 in nuclease-free water 217
and 2l was added to 28l master mix containing 0.25M stx1 and stx2 primers, 0.3M 218
wzy primers, 0.2M stx1 probe, 0.15M stx2 and wzy probes, 1X Internal Positive 219
Control Mix (Cat: 4308323, Applied Biosystems), 1X Express qPCR Supermix Universal 220
Taq (Cat: 11785200, Invitrogen), and ROX passive dye. All primers and probes 221
(Supplementary Table 1) employed in this study were purchased from IDT (Coralville, IA, 222
USA). 223
224
Whole genome sequencing, contigs assembly and annotation. DNA recovered from 225
the E. coli EDL933 spiked water enrichment samples was sequenced using a GridION 226
nanopore sequencer (Oxford Nanopore Technologies, Oxford, UK). The sequencing 227
libraries were prepared using the Genomic DNA by Ligation kit (SQK-LSK109) and run 228
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
11
in FLO-MIN106 (R9.4.1) flow cells, according to the manufacturer’s instructions for 72 229
hours (Oxford Nanopore Technologies). The run was live base called using Guppy 230
v3.2.10 included in the MinKNOW v3.6.5 (v19.12.6) software (Oxford Nanopore 231
Technologies). 232
233
The initial classification of the reads for each run was done using the “What's in my pot” 234
(WIMP) workflow contained in the EPI2ME cloud service (Oxford Nanopore 235
Technologies). That workflow allows for taxonomic classification of the reads generated 236
by the GridION sequencing in real time. Using the WIMP classification output, the reads 237
that were identified as “Escherichia coli*” were extracted and saved in a single fastq file 238
using a custom python script (v2.7.3) (Supplementary Note 1). The metagenome-239
assembled genomes (MAGs) for each spiked sample were obtained by de novo 240
assembly using 1) all nanopore data output and 2) extracted E. coli reads using the Flye 241
program v2.6 (46) with the meta parameter. The assembled contigs were classified by 242
taxonomy by Kraken (47) using GalaxyTrakr (48) (Flye+Kraken). The presence of the 243
complete genome and synteny of the completely closed genomes on the final 244
assemblies was checked using Mauve genome aligner (49) . 245
246
Closure of strain EDL933 genome by nanopore sequencing. For bioinformatic 247
quality control purposes we generated a closed genome of the strain used in in the 248
artificial contamination studies. The long-read sequencing library was prepared and run 249
as mentioned above for the spiked experiments. All reads below 5,000 base pairs in 250
length were removed from further analysis. The genome was assembled using Flye 251
v1.6 (46). The nanopore output resulted in 144,000 reads for a total yield of 716 Mb. 252
253
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
12
In silico serotyping. The major serotype present in each sample was determined by 254
batch screening in Ridom SeqSphere+ v 7.0.6 (Ridom, Münster, Germany) using the 255
genes deposited in the Center for Genomic Epidemiology 256
(https://cge.cbs.dtu.dk/services/) for E. coli as part of their web-based tool, 257
SerotypeFinder 2.0 (https://cge.cbs.dtu.dk/services/SerotypeFinder/). 258
259
In silico identification of virulence genes. The presence of virulence genes was 260
determined by batch screening in Ridom SeqSphere+ software v 7.0.6 (Ridom) using the 261
genes deposited in the NCBI Pathogen Detection Reference Gene Catalog 262
(https://www.ncbi.nlm.nih.gov/pathogens/isolates#/refgene/) and described in Gonzalez-263
Escalona and Kase (2019) (25). 264
265
In silico identification of antimicrobial resistance genes. We identified the 266
antimicrobial resistance genes present in our sequenced genomes using the EPI2ME 267
Fastq Antimicrobial Resistance workflow (Oxford Nanopore Technologies). This 268
workflow consists of three processes, including 1) quality control of the reads, 2) WIMP 269
analysis using centrifuge and NCBI RefSeq database, and 3) detection of antimicrobial 270
genes using the CARD database (50). 271
272
Metagenomic and EDL933 data accession numbers. The metagenomic sequence 273
data from this study and the nanopore data for the EDL933 strain used in this study are 274
available in GenBank under bioproject number PRJNA639799. 275
276
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
13
RESULTS 277
Determination of the detection limit of the STEC qPCR method. The qPCR detection 278
limit of E. coli EDL933 was determined by calculating CFUs per reaction. DNA was 279
extracted from E. coli EDL933 culture according to the boil method described in the BAM 280
Chapter 4A for quantification. The starting CFU/ml concentration determined by plating 281
on TSA plates was 1.5 x 109 CFU/ml. Several 10-fold dilutions of the original boil sample 282
were tested in triplicate with the STEC qPCR assay. The wzy gene was detected over 283
six orders of magnitude from 30 to 3 x 106 CFU per reaction (correlation coefficient (R2) = 284
0.99 and efficiency (E) = 98%, Figure 1A). Likewise, the stx1 gene was detected linearly 285
over six orders of magnitude from 30 to 3 x 106 CFU per reaction (R2 = 0.99 and E = 286
96%, Figure 1B). The limit of the STEC qPCR detection, therefore, was 30 CFU per 287
reaction. This means that the minimal number of detectable STEC cells in enrichment 288
culture using this qPCR protocol will be approximately 105 CFU/ml. 289
290
STEC-spiked enriched sample preparation for limit of detection of nanopore 291
sequencing. In order to assess the performance and detection limit of nanopore 292
sequencing on STEC enrichment samples, we used an aliquot of irrigation water found 293
by the FDA BAM method to have nondetectable amounts of O157:H7 STEC. Fractions 294
of this enriched irrigation water sample were artificially contaminated with serial dilutions 295
of E. coli EDL933 overnight culture. A schematic representation of the workflow is shown 296
in Figure 2. The concentration of the stock E. coli EDL933 overnight culture was 297
determined by agar plating to be 1.46 x 109 CFU/ml. Artificial contamination of the 298
enriched field irrigation water sample (1:1), therefore, resulted in an approximate 299
concentration of 7.3 x 108 CFU/ml for the sample referred to as Water+Ecoli1. We 300
confirmed detection and quantification by the BAM qPCR method for this sample. The 301
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
14
BAM qPCR method detected the presence of both wzy (Cq = 22) and stx1 (Cq = 23) 302
genes in Water+Ecoli1. Using the standard curve determined above, we calculated 1.1 x 303
105 CFU/reaction using the wzy gene and 9.8 x 104 CFU/reaction using the stx1 gene. 304
This approximates to 5.4 x 108 CFU/ml using the wzy gene and 4.9 x 108 CFU/ml using 305
the stx1 gene in the Water+Ecoli1 sample. Therefore, the estimated CFU/ml 306
concentration measured by plating and qPCR was very similar. 307
308
Bacterial community associated with BAM enrichment of irrigation water. The 309
bacterial community composition of the enriched irrigation water was determined by a 310
metagenomic analysis using Oxford Nanopore sequencing of the DNA isolated from the 311
sample. The nanopore output resulted in 5.75M reads in 16.9Gb total yield 312
(Supplementary Table 2). The metagenomic bacterial composition of the sample was 313
analyzed by two methods, Oxford Nanopore EPI2ME “What’s in my pot” (WIMP) 314
workflow analysis and a de novo assembly using Flye of all the reads followed by a 315
classification of all the contigs in the assembly by the k-mer software (Kraken). We only 316
reported the taxa accounting for greater than 1% of the total bacterial community. The 317
total WIMP output can be found at 318
https://epi2me.nanoporetech.com/workflow_instance/227465?token=D9EAC8AA-4839-319
11EA-99BC-71806BDB886C and the total Flye+Kraken output can be found in 320
Supplementary Table 3. 321
322
WIMP analysis of the reads obtained by nanopore sequencing of the enriched water 323
sample showed a highly diverse bacterial composition. Even though hundreds of 324
bacterial species were identified in the sample, the majority of the sample was 325
composed of nine bacterial species (>1% in Table 1). The nine bacterial species found in 326
descending cumulative order of frequency were: Klebsiella pneumoniae (28.52%), 327
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
15
Enterobacter cloacae (21.18%), Enterobacter sp. ODB01 (6.71%), Enterobacter kobei 328
(6.53%), Pseudomonas putida (3.88%), Citrobacter freundii (3.86%), Acinetobacter 329
baumannii (3.69%), Enterobacter hormaechei (3.42%), and Enterobacter xiangfangensis 330
(1.22%). Additionally, there were 10,806 reads (0.25%) that were identified as belonging 331
to Escherichia coli. These E. coli reads did not match STEC O157:H7. 332
333
All these microorganisms were identified in the Flye de novo assembly. Flye assembly 334
resulted in 677 contigs of different sizes. Taxa identification by Kraken showed that the 335
contigs of bigger sizes belonged to these taxa: Acinetobacter baumannii (3,784,399 bp), 336
Citrobacter freundii (2,003,414 bp), Klebsiella pneumoniae (1,934,072 bp), Enterobacter 337
cloacae (1,001,408 bp), Pseudomonas putida (438,653 bp), and Enterobacter kobei 338
(365,044 bp). Enterobacter hormaechei and Enterobacter xiangfangensis were also 339
represented in the contigs assembled by Flye, but in the Kraken database hormachei 340
and xiangfangensis are listed as subspecies of E. hormachei. Many other 341
microorganisms were also identified (Supplementary Table 3). Four of the 677 contigs 342
were identified as matching E. coli with the largest being 30,837 bp in length. 343
344
We were also interested in testing the possibility that AMR genes could be identified. We 345
have used the EPI2ME Fastq Antimicrobial Resistance workflow, which processes all 346
nanopore reads in three stages: 1) reads are passed through a quality filter, 2) taxa are 347
identified by the WIMP workflow, and 3) the classified reads are then analyzed for AMR 348
genes using the CARD database (https://card.mcmaster.ca/home). The prior 349
classification by WIMP permits identification of the AMR genes in each particular 350
species. The AMR genes found in the field irrigation water sample include -lactamase 351
genes in Klebsiella pneumoniae (blaSHV, blaACT), Enterobacter cloacae (blaSHV, blaACT), 352
Citrobacter freundii (blaSHV, blaCMY), Acinetobacter baumannii (blaOXA), and Enterobacter 353
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
16
hormaechei (blaACT) 354
(https://epi2me.nanoporetech.com/workflow_instance/232304?token=CCD817B6-742C-355
11EA-B9EA-1AEE73EF14E7). Several efflux pump genes that can be associated with 356
antibiotic resistance were also found in Klebsiella pneumoniae (acr, ram), Enterobacter 357
cloacae (acr, ram, vga), and Acinetobacter baumannii (abe, ade, mex). Lastly, a PhoP 358
gene mutation was detected in Klebsiella pneumoniae (977 reads matching) conferring 359
colistin resistance and the QnrB23 gene (29 reads matching) that confers 360
fluoroquinolone resistance was detected in Citrobacter freundii. 361
362
Nanopore long-read detection limit for E. coli spiked into irrigation water 363
enrichment. After establishing the bacterial community of the un-spiked enriched 364
irrigation water sample, we sequenced DNA obtained from the artificially contaminated 365
water enrichment at levels 7 x 108 CFU/ml (Water+Ecoli1) to 7 x 103 CFU/ml 366
(Water+Ecoli6) by nanopore. One microgram of DNA per sample was used as initial 367
DNA amount for nanopore sequencing and each was run in a single flow cell (Figure 2). 368
The total nanopore output per sample can be found in Supplementary Table 2. On 369
average, each run resulted in approximately 5 million reads with a total yield of 17 Gb. 370
371
The nanopore output for each sample/run was analyzed using the WIMP workflow and 372
Flye+Kraken as described earlier for the enriched water sample. WIMP workflow 373
classified between 1.7 million (Water+Ecoli1) and 8,500 (Water+Ecoli6) reads as E. coli 374
for the serially diluted spiked irrigation water samples (Table 2). These reads accounted 375
for 53 to 0.26% of the total reads for each run (Figure 3). The de novo Flye assembly of 376
the reads for each of the spiked water samples produced assemblies with 555 to 780 377
contigs. 378
379
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
17
The enrichment samples with the highest E. coli EDL933 spiked concentrations, 380
Water+Ecoli1 and 2, showed the highest number of reads classified as E. coli by WIMP 381
(Table 2). As expected, the number of E. coli reads classified by WIMP decreased 382
accordingly with dilution of spiked E. coli. Approximately 8,500 E. coli reads were 383
identified in the lowest level of spiked EDL933 Water+Ecoli6 (7 x 103 CFU/ml), almost 384
the same number of reads as the un-spiked enriched water sample (Table 2). WIMP 385
further classified 68 reads as belonging to O157 in that sample. Therefore, the detection 386
limit for O157:H7 by nanopore sequencing was established at 7 x 103 CFU/ml in the 387
enrichment. 388
389
Nanopore long-read genome assembly limit for E. coli spiked into irrigation water 390
enrichment. After showing that the detection limit of nanopore sequencing was 7 x 103 391
CFU/ml, we sought to establish the genome assembly limit. A genome assembly limit is 392
the minimum number of reads used in a de novo assembly that produces 1) a complete 393
metagenome-assembled genome or MAG at 20X coverage (chromosome and plasmid, 394
if present) or 2) a fragmented MAG from a complex bacterial background. Obtaining a 395
complete or fragmented E. coli O157 MAG will allow us to perform a positive 396
identification of the genome of interest, EDL933, as well as enabling a complete 397
genomic characterization (determine serotype and presence of stx types, eae gene, 398
virulence genes, and AMR genes). We expect this read number to be proportionally 399
related to the initial number of CFU/ml in the sample. To test this hypothesis, the 400
completely closed genome of the E. coli EDL933 strain used in our spiking experiments 401
was used. By using this reference genome, we ensured the accuracy of our in silico 402
analysis to detect the serotype and the entire virulence gene profile. Our EDL933 strain 403
is serotype O157:H7 and has stx1a, eae gamma-1 and other virulence genes (toxB, 404
etpD, tccP, etc). EDL933 available at GenBank (AE005174) has additionally the stx2 405
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
18
gene. We aligned both genomes and found that our version has the entire stx2 phage 406
missing (results not shown). 407
408
When the total nanopore sequencing output was de novo assembled by Flye, the total 409
number of contigs was similar to the un-spiked water sample with 555 and 627 contigs 410
for Water+Ecoli1 and 2, respectively. The E. coli EDL933 O157:H7 genome could be 411
detected in 4 or 5 contigs, the plasmid was present as a single contig in each (Table 2). 412
Serotype analysis accurately identified the E. coli as O157:H7. The presence of stx1a 413
and eae gamma-1 were also detected. In fact, the serotype and stx and eae genes could 414
be determined in Water+Ecoli3 and 4, however the completely closed E. coli O157 MAG 415
could not be obtained at any spiked concentration (Table 2). 416
417
At the lowest spiked levels in samples Water+Ecoli5 and 6, the number of reads 418
associated with E. coli was approximately the same as had been detected in the un-419
spiked water sample (Table 2). Only the O9 serotype was identified in the assemblies, 420
which was the same as detected in the un-spiked water sample. Likewise, detection of 421
the stx and eae genes was lost (Table 2). Therefore, we determined that the limit of 422
fragmented assembly was approximately 7 x 105 CFU/ml, but we were not able to obtain 423
a completely O157:H7 STEC closed MAG even at the highest level of 7 x 108 CFU/ml 424
using this approach. 425
426
In order to improve the assembly, we decided to extract the E. coli reads and perform 427
the de novo assembly again. Using WIMP classified reads allowed us to run a script 428
(Supplementary Note 1) that extracted only the reads identified as E. coli and perform a 429
similar analysis as above. Flye assembly of these filtered reads produced assemblies 430
that contained fewer number of contigs with larger sizes. High inoculation levels in 431
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
19
spiked samples Water+Ecoli1 and 2 resulted in assemblies with 44 and 40 contigs, 432
respectively, each containing the completely closed E. coli O157 MAG (chromosome 433
and plasmid each in a single contig) (Table 3). Serotype analysis accurately detected 434
O157:H7 in assemblies from samples Water+Ecoli1 through 4. The same was observed 435
for the recognition of stx1a and eae gamma-1 genes. 436
437
At the lowest spiked levels in samples Water+Ecoli5 and 6, Flye was able to still produce 438
an assembly but with lower number of contigs (approximately 30 contigs). However, in 439
this case no O157:H7, stx1a, or eae gamma-1 were detected by in silico analyses. 440
Serotype analysis identified the presence of the O9 serotype (Table 3). 441
442
Further analysis of the E. coli extracted reads for Water+Ecoli5 and Water+Ecoli6 443
revealed that only 43 and 31 reads matched E. coli O157. Kraken identified those reads 444
from a total of 2,326 (Water+Ecoli5) and 1,749 (Water+Ecoli6) E. coli reads, 445
respectively. This confirmed our previous analysis that the limit of detection of reads 446
matching O157 is approximately 7 x 103 CFU/ml. Our fragmented assembly limit was still 447
7 x 105 CFU/ml, but the assembly improved as we were able to produce a completely 448
circular closed chromosome in a single contig with a concentration of at least 7 x 107 449
CFU/ml. 450
451
Virulence gene identification. In addition to detecting and serotyping STECs, the 452
detection of virulence genes provides necessary information in outbreak scenarios and 453
is important to food safety. We again used the spiked E. coli EDL933 genome as 454
reference for in silico analysis (Bioproject number PRJNA639799). We have previously 455
reported a set of 94 virulence genes (25) and the EDL933 genome showed the presence 456
of 23 of those genes (Table 4). Among those genes were esp, tccP, nle genes, tir, and 457
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
20
toxB. The assemblies generated above were analyzed for the presence of all 23 458
virulence genes. Corresponding to the limits of MAG assembly (either completely closed 459
or fragmented), all virulence genes were detected in Water+Ecoli1 and Water+Ecoli2 460
when using the total nanopore reads output or extracted reads. On the other hand, we 461
failed to detect tccP in Water+Ecoli3 (7 x 106 CFU/ml) and pssA and air in Water+Ecoli4 462
(7 x 105 CFU/ml) samples in assemblies with the total nanopore output, but when the E. 463
coli reads were extracted from the WIMP output, all virulence genes could be detected. 464
Thus, our extracted E. coli reads script improved our genome assembly. 465
466
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
21
DISCUSSION 467
Considering the importance of irrigation water to food safety, accurate detection and 468
classification of STECs potentially present in those waters is paramount, particularly 469
during an outbreak incident. Current methods of detection include qPCR and extensive 470
selective plating before WGS analysis. This method is a time-consuming process that 471
only provides confirmation of an isolate after almost two weeks of labor. By combining 472
qPCR and long-read metagenomic analysis of the pre-enrichment we can definitively 473
detect an STEC isolate, as well as characterize its virulence potential in 3 - 4 days. 474
While this will not replace eventual confirmation by microbiological methods, this 475
reduces the time for a prospective corrective measure by a complete week. 476
477
In our study we have empirically determined the in silico limits of detection, classification, 478
and closing genomes of STECs in E. coli EDL933 artificially contaminated irrigation 479
water using nanopore sequencing as a proof of concept. We have also developed a 480
pipeline for determination of these limits that can be used for other foodborne or clinical 481
bacteria (Figure 4). Our results showed that the level of STEC O157 needed for 482
detection in the enrichment sample was 103 CFU/ml (Tables 2 and 3). While STEC O157 483
levels between 105 CFU/ml to 106 CFU/ml were enough to produce a fragmented MAG 484
in a few contigs that allowed for complete characterization of the STEC genome, levels 485
above 107 CFU/ml were enough to produce a completely closed STEC MAG (Table 3) 486
with genome coverage of 385X. The complete plasmid was generated from STEC levels 487
above 104 CFU/ml. These recovered MAGs (either fragmented or completely closed) 488
from all spiked samples above 105 CFU/ml allowed us to comprehensively characterize 489
the virulotype and genome synteny matching 100% to the spiked EDL933 strain 490
(Supplementary Figure 1). The genome of the strain used in this study was sequenced 491
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
22
and completely closed by us and used as reference for genome completeness for the de 492
novo assemblies sourced from the artificially contaminated samples. Our variant strain of 493
EDL933 was devoid of the stx2 phage by qPCR. A comparison of the EDL933 genome 494
published earlier (GenBank accession AE005174) and ours showed that stx2 phage was 495
completely missing in our strain, confirming the qPCR results. 496
497
These limits of detection and assembly by nanopore sequencing in conjunction with the 498
use of qPCR for screening the levels of STEC in the pre-enrichments provide an obvious 499
advantage. By combining the qPCR result and the likelihood of genome closure by 500
nanopore sequencing, we have provided an excellent tool for predicting when to pursue 501
sequencing DNA from a particular sample (Figure 4A). The tangible benefit of this 502
combination will depend on the depth of analysis needed – detection versus 503
characterization. While metagenomics has a lower detection limit than that of the BAM 504
qPCR methodology (103 CFU/ml versus 105 CFU/ml) for STEC, it costs 1,000 times 505
more per sample. The sensitivity of nanopore can partially be attributed to the proportion 506
of the initial sample that is used. Nanopore uses approximately 3% of the initial sample 507
(approx. 200 ng/rxn of 6 ug DNA extracted), while qPCR uses 0.02% (2ul of a 1:10 508
dilution from 1 ml extraction, approx. 1ng/rxn). qPCR provides a directed approach but is 509
only as good as the primers included and can only detect the intended target. In the 510
cases of Salmonella spp. or Listeria monocytogenes, their detection by qPCR or 511
metagenomic analysis, is beneficial as per FDA’s zero tolerance policy for these two 512
microorganisms in foods. However, presumptive-positive samples would have to be 513
culturally confirmed before any regulatory action could be taken to stop importation or 514
interstate transport of a particular commodity (51) 515
(https://www.fda.gov/media/102633/download; 516
https://www.fda.gov/media/83177/download). On the other hand, if you want to fully 517
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
23
characterize an organism by using their complete genome, the methodology described 518
herein will allow any laboratory to expedite the process of detection and characterization 519
(Figure 4B). In the case of STECs the entire genome is needed in order to make an 520
informative decision of their potential health risk to humans. Many E. coli strains are not 521
harmful to humans and will pose no risk for public health. Thus, obtaining the completely 522
closed genome of any potential STEC will provide an accurate characterization of all 523
virulence genes it possesses to allow a prediction of potential health risk (25). In our 524
study, the de novo assembly of the complete MAG for EDL933 (either in fragments or 525
completely closed) was achieved in all samples with levels above 105 CFU/ml, which 526
was almost equivalent to the limit of detection by qPCR, but with the added benefit of 527
complete genome characterization critically needed during outbreak and traceback 528
investigations. 529
530
Mining for specific reads matching your organism of interest in metagenomic sequencing 531
data is challenging and requires conducting assemblies using millions of reads with the 532
consequent time and computing resources that can impact the accuracy of the genome 533
assembler employed (30,31,37,39,43). We took advantage of the WIMP workflow 534
included in the EPI2ME cloud service (Oxford Nanopore) that classifies each single read 535
in a .csv file and downloads those classified reads into a pass folder. A script was written 536
that separated the desired reads by taxa into a new folder. Assemblies produced by 537
using all reads versus using only filtered reads by taxa were compared and completely 538
closed O157:H7 MAGs for 107 and 108 CFU/ml levels with filtered reads were obtained 539
(Tables 2 and 3). As expected, the assemblies produced with taxa filtered reads were 540
faster, more precise and consumed less resources. 541
542
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
24
Unlike other technologies, nanopore sequencing output is dependent on the quality of 543
the DNA. Some nanopore metagenomics applications can be conducted directly from 544
samples in which DNA extracts do not contain inhibitors and where the target 545
organism(s) is in enough concentration to be detected (36,37). However, STECs in 546
irrigation water require further processing due to low initial concentrations and the 547
presence of considerable humic acid and other unknown inhibitors. Cleaning of those 548
DNAs resulted in loss and shearing of the DNA (Gonzalez-Escalona, unpublished 549
results), with the consequent loss of both resolution and capability of closing target 550
genomes or MAGs. Hence, there is a need to increase the initial biomass of the target 551
organism (STEC in our case) by an enrichment method that will also minimize the 552
contaminants and improve the quality and quantity of the final DNA. By using the Oxford 553
Nanopore Ligation Kit, we have maximized the potential output, which is preferable for 554
metagenomic analyses. In the future we plan to validate these findings with the Rapid 555
Kit, which would further decrease time between sample processing and analysis with the 556
expectation of completely closing MAGs of organisms above 107 CFU/ml. During sample 557
analysis, we noticed that 73% of the reads were below 5000 bp, this also could impact 558
the closing of genomes of interest and also impact the microbial profile or MAGs from 559
that sample. In our case after removing those reads, the same microbial profile was 560
observed, albeit with fewer reads per organism (results not shown). Our future plans 561
include finding a better method for DNA extraction that could provide higher DNA 562
recovery with less shearing to maximize the potential of nanopore sequencing from 563
enriched culture samples. Some authors have addressed DNA shearing when extracting 564
the DNA and suggest gentler bead-beating steps may yield less sheared high molecular 565
weight (HMW) DNA, but might fail in extracting DNA from most difficult organisms (36). 566
567
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
25
By using our proposed pipeline, we were not only able to improve detection and 568
characterization of our desired organism (STEC), we were also able to identify the 569
bacteria species that were present in the un-spiked enriched irrigation water. This 570
analysis showed that the most common microorganisms (>1% abundance) enriched by 571
the BAM method belonged to the genera Enterobacter, Klebsiella, Pseudomonas, 572
Acinetobacter, and Citrobacter. We also identified Salmonella, Escherichia, Serratia, 573
Edwardsiella, Yersinia, and Cronobacter among the 0.1% abundance. De novo 574
assembly of the long-read data resulted in 677 contigs with most of these MAGs in a 575
fragmented stage, we were not able to recover a completely closed genome. 576
Nevertheless, we were able to recover the complete genome for Acinetobacter 577
baumannii (~3.9 Mb) in 17 contigs (with the longest contig 3,784,399 bp), Enterobacter 578
cloacae in 167 contigs, Klebsiella pneumoniae in 35 contigs, Citrobacter freundii in 21 579
contigs, and Pseudomonas putida in 214 contigs (Supplementary Table 4). We could 580
not recover the MAGs for Enterobacter sp., Enterobacter kobei, Enterobacter 581
hormaechei subsp. Hormachei or Enterobacter hormaechei subsp. Xiangfangensis 582
strains, even though they were present in higher abundance than Acinetobacter 583
baumannii. The most plausible explanation could be that there were many different 584
strains representing those species and therefore it was very hard to assemble their 585
individual genomes. We did find some E. coli reads in the un-spiked enriched irrigation 586
water sample (10,806 reads), suggesting the presence of E. coli in the original irrigation 587
water sample at very low concentrations. However, the E. coli identified by in silico 588
molecular serotyping matched to O9 serotype (Table 3), not the O157:H7 serotype of 589
our spiked EDL933 strain, and no virulence genes were found by our in silico 590
virulotyping (25). As mentioned by Leonard et al. (2015), having found other non-591
pathogenic E. coli in the original water sample reinforces the need to obtain the 592
complete genomes in order to assess the potential virulence of any E. coli strain (31). If 593
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
26
we applied the same script for other organisms with abundance above 3% but used 594
different taxa to filter their reads, we could potentially close those genomes as well. This 595
opens up a very attractive way of obtaining closed MAGs from metagenomic samples, 596
similar to what was obtained previously for fecal samples (36). 597
598
In addition to taxa identification, another advantage of our proposed pipeline was that we 599
surveyed the microbial community for the presence of AMR genes in the un-spiked 600
enriched water sample. The Antimicrobial Resistance workflow in EPI2ME (Oxford 601
Nanopore) provides AMR gene detection and identifies the organism carrying that AMR 602
gene based on the WIMP classification of the read. This specific result will be hard to 603
achieve when using short reads. AMR genes found in the irrigation water sample 604
included several beta-lactamase and efflux pump genes that confer antibiotic resistance 605
in Klebsiella pneumoniae, Enterobacter, Citrobacter freundii, Acinetobacter baumannii, 606
and Enterobacter hormaechei. The qnrB23 gene variant (29 reads matching) that 607
confers fluoroquinolone resistance was detected in Citrobacter freundii. Finally, besides 608
AMR genes we observed the presence of point mutations which confers resistance to 609
colistin in Klebsiella pneumoniae (PhoP gene mutation with 977 reads matching, 855X 610
coverage). Antibiotic resistant bacteria in humans has been linked to food sources (52), 611
making the presence of these AMR genes in known human pathogens such as 612
Klebsiella pneumoniae and Acinetobacter baumannii worrisome. Acetinobacter has 613
recently been shown to use killing-enhanced horizontal gene transfer (53), which 614
suggests further study given the high number of AMR genes present in this sample. 615
Additionally, as the soil filters and concentrates the bacteria in the irrigation water, the 616
risk for human consumption increases (54). National and international organizations, 617
such as the National Antimicrobial Resistance Monitoring System (NARMS - 618
https://www.cdc.gov/narms/index.html), One Health approach 619
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
27
(https://www.cdc.gov/onehealth/index.html) and the Global AMR Surveillance System 620
(GLASS - https://www.who.int/glass/en/) use the resources of the CDC, USDA, FDA, 621
and WHO to monitor and report the prevalence of and distribute regulatory guidance on 622
antimicrobial resistance in pathogenic and commensal bacteria in food and food animals 623
(52,55). Our pipeline could be an important screening tool to enhance future testing. 624
625
Conclusions: 626
Overall, we tested the limits of detection and assembly for EDL933 O157:H7 in enriched 627
irrigation water using a shotgun long-read sequencing approach. We determined that the 628
limit of detection was 103 CFU/ml. We were able to show that if the levels of the target 629
organism were above 105 CFU/ml, the complete genome will be assembled into several 630
fragments, aided by filtering the reads by taxa. This fragmented genome will be enough 631
to make a complete characterization of the STEC strain including serotype and 632
virulotype. We also determined the detection limit of the BAM STEC qPCR (105 CFU/ml) 633
coincided with our STEC assembly limit by nanopore sequencing. Therefore, we 634
recommend a combination approach using qPCR and nanopore sequencing. In the 635
screening stage, qPCR can provide both detection and an estimate of CFU/ml 636
concentration which could predict if subsequent nanopore sequencing will produce 637
enough data to obtain a complete MAG of the target organism, either closed or 638
fragmented. It also provides information on how many samples can be run in a single 639
flow cell. In our case we used one sample per flow cell, but if levels > 107 CFU/ml we 640
could use 3 - 4 samples per flow cell, reducing the sequencing cost per sample. We 641
expect that the use of this pipeline could enhance the capacity of Public Health entities 642
to respond faster and more accurately during outbreak and traceback investigations. 643
Moreover, further advances in nanopore long-read sequencing accuracy will improve the 644
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
28
quality of these MAGs allowing use for phylogenetic analysis and providing a myriad of 645
applications in metagenomics. 646
647
648
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
29
ACKNOWLEDGEMENTS 649
The study was supported by funding from the MCMi Challenge Grants Program 650
Proposal #2018-646 and the FDA Foods Program Intramural Funds. Dr. Maguire 651
acknowledges a Research Fellowship Program (ORISE) for the Center for Food 652
Safety and Applied Nutrition administered by the Oak Ridge Associated 653
Universities through a contract with the U.S. Food and Drug Administration. 654
655
656
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
30
REFERENCES 657
658
1. Beutin, L. and A. Martin. 2012. Outbreak of Shiga toxin-producing Escherichia 659
coli (STEC) O104:H4 infection in Germany causes a paradigm shift with regard 660
to human pathogenicity of STEC strains. J. Food Prot. 75:408-418. 661
2. Mellmann, A., M. Bielaszewska, R. Kock, A. W. Friedrich, A. Fruth, B. 662
Middendorf, D. Harmsen, M. A. Schmidt, and H. Karch. 2008. Analysis of 663
collection of hemolytic uremic syndrome-associated enterohemorrhagic 664
Escherichia coli. Emerg. Infect. Dis. 14:1287-1290. 665
3. Tarr, P. I., C. A. Gordon, and W. L. Chandler. 2005. Shiga-toxin-producing 666
Escherichia coli and haemolytic uraemic syndrome. Lancet 365:1073-1086. 667
4. Gonzalez-Escalona, N., J. Meng, and M. P. Doyle. 2019. Shiga toxin-producing 668
Escherichia coli, p. 289-315. In: M. P. Doyle, F. Diez-Gonzalez, and C. Hill 669
(eds.), Food Microbiology: Fundamentals And Frontiers. 5th ed. ASM press, 670
Washington DC. 671
5. Mead, P. S., L. Slutsker, V. Dietz, L. F. McCaig, J. S. Bresee, C. Shapiro, P. 672
M. Griffin, and R. V. Tauxe. 1999. Food-related illness and death in the United 673
States. Emerg. Infect Dis 5:607-625. 674
6. Allos, B. M., M. R. Moore, P. M. Griffin, and R. V. Tauxe. 2004. Surveillance 675
for sporadic foodborne disease in the 21st century: the FoodNet perspective. Clin 676
Infect. Dis. 38 Suppl 3:S115-S120. 677
7. Scallan, E., R. M. Hoekstra, F. J. Angulo, R. V. Tauxe, M. A. Widdowson, S. 678
L. Roy, J. L. Jones, and P. M. Griffin. 2011. Foodborne illness acquired in the 679
United States--major pathogens. Emerg. Infect. Dis. 17:7-15. 680
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
31
8. Brooks, J. T., E. G. Sowers, J. G. Wells, K. D. Greene, P. M. Griffin, R. M. 681
Hoekstra, and N. A. Strockbine. 2005. Non-O157 Shiga toxin-producing 682
Escherichia coli infections in the United States, 1983-2002. J. Infect Dis. 683
192:1422-1429. 684
9. Sivapalasingam, S., C. R. Friedman, L. Cohen, and R. V. Tauxe. 2004. Fresh 685
produce: a growing cause of outbreaks of foodborne illness in the United States, 686
1973 through 1997. J Food Prot. 67:2342-2353. 687
10. Tack, D. M., L. Ray, P. M. Griffin, P. R. Cieslak, J. Dunn, T. Rissman, R. 688
Jervis, S. Lathrop, A. Muse, M. Duwell, K. Smith, M. Tobin-D'Angelo, D. J. 689
Vugia, K. J. Zablotsky, B. J. Wolpert, R. Tauxe, and D. C. Payne. 2020. 690
Preliminary Incidence and Trends of Infections with Pathogens Transmitted 691
Commonly Through Food - Foodborne Diseases Active Surveillance Network, 10 692
U.S. Sites, 2016-2019. MMWR. Morb. Mortal. Wkly. Rep. 69:509-514. 693
11. Bottichio, L., A. Keaton, D. Thomas, T. Fulton, A. Tiffany, A. Frick, M. 694
Mattioli, A. Kahler, J. Murphy, M. Otto, A. Tesfai, A. Fields, K. Kline, J. 695
Fiddner, J. Higa, A. Barnes, F. Arroyo, A. Salvatierra, A. Holland, W. 696
Taylor, J. Nash, B. M. Morawski, S. Correll, R. Hinnenkamp, J. Havens, K. 697
Patel, M. N. Schroeder, L. Gladney, H. Martin, L. Whitlock, N. Dowell, C. 698
Newhart, L. F. Watkins, V. Hill, S. Lance, S. Harris, M. Wise, I. Williams, C. 699
Basler, and L. Gieraltowski. 2019. Shiga Toxin-Producing Escherichia coli 700
Infections Associated With Romaine Lettuce - United States, 2018. Clinical 701
Infectious Diseases. 702
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
32
12. Fischer, M., Bourner, A., and Plunkett, G. 2015. Outbreak alert! 2015: a 703
review of foodborne illness in the US from 2004-2013. Washington DC, Center 704
for Science in the Public Interest. 705
13. Olaimat, A. N. and R. A. Holley. 2012. Factors influencing the microbial safety 706
of fresh produce: a review. Food Microbiol. 32:1-19. 707
14. Rangel, J. M., P. H. Sparling, C. Crowe, P. M. Griffin, and D. L. Swerdlow. 708
2005. Epidemiology of Escherichia coli O157:H7 outbreaks, United States, 1982-709
2002. Emerg. Infect Dis. 11:603-609. 710
15. Steele, M. and Odumeru, J. Irrigation water as source of foodborne pathogens 711
on fruit and vegetables. Journal of food protection 67(12), 2839-2849. 2004. 712
16. Uyttendaele, M., L. A. Jaykus, P. Amoah, A. Chiodini, D. Cunliffe, L. 713
Jacxsens, K. Holvoet, L. Korsten, M. Lau, P. McClure, G. Medema, I. 714
Sampers, and P. Rao Jasti. 2015. Microbial Hazards in Irrigation Water: 715
Standards, Norms, and Testing to Manage Use of Water in Fresh Produce Primary 716
Production. Comprehensive Reviews in Food Science and Food Safety 14:336-717
356. 718
17. Leaman, S., Gorny, J., Wetherington, D., and Bekris, H. 2014. Agricultural 719
Water. Center for Produce Safety Five Year Research Review . 720
18. Monaghan, J. M. and M. L. Hutchison. 2012. Distribution and decline of 721
human pathogenic bacteria in soil after application in irrigation water and the 722
potential for soil-splash-mediated dispersal onto fresh produce. J Appl. Microbiol 723
112:1007-1019. 724
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
33
19. Oliveira, M., I. Vinas, J. Usall, M. Anguera, and M. Abadias. 2012. Presence 725
and survival of Escherichia coli O157:H7 on lettuce leaves and in soil treated 726
with contaminated compost and irrigation water. Int J Food Microbiol. 156:133-727
140. 728
20. Allende, A. and J. Monaghan. 2015. Irrigation Water Quality for Leafy Crops: 729
A Perspective of Risks and Potential Solutions. Int J Environ. Res. Public Health 730
12:7457-7477. 731
21. Allard, M. W., R. Bell, C. M. Ferreira, N. Gonzalez-Escalona, M. Hoffmann, 732
T. Muruvanda, A. Ottesen, P. Ramachandran, E. Reed, S. Sharma, E. 733
Stevens, R. Timme, J. Zheng, and E. W. Brown. 2018. Genomics of foodborne 734
pathogens for microbial food safety. Curr. Opin. Biotechnol 49:224-229. 735
22. Bergholz, T. M., A. I. Moreno Switt, and M. Wiedmann. 2014. Omics 736
approaches in food safety: fulfilling the promise? Trends Microbiol. 22:275-281. 737
23. Sekse, C., A. Holst-Jensen, U. Dobrindt, G. S. Johannessen, W. Li, B. 738
Spilsberg, and J. Shi. 2017. High Throughput Sequencing for Detection of 739
Foodborne Pathogens. Front. Microbiol. 8:2029. 740
24. Gonzalez-Escalona, N., M. A. Allard, E. W. Brown, S. Sharma, and M. 741
Hoffmann. 2019. Nanopore sequencing for fast determination of plasmids, 742
phages, virulence markers, and antimicrobial resistance genes in Shiga toxin-743
producing Escherichia coli. PLoS One 14:e0220494. 744
25. Gonzalez-Escalona, N. and J. A. Kase. 2019. Virulence gene profiles and 745
phylogeny of shiga toxin-positive Escherichia coli strains isolated from FDA 746
regulated foods during 2010-2017. PLoS One 14:e0214620. 747
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
34
26. Hoffmann, M., Y. Luo, S. R. Monday, N. Gonzalez-Escalona, A. R. Ottesen, 748
T. Muruvanda, C. Wang, G. Kastanis, C. Keys, D. Janies, I. F. Senturk, U. V. 749
Catalyurek, H. Wang, T. S. Hammack, W. J. Wolfgang, D. Schoonmaker-750
Bopp, A. Chu, R. Myers, J. Haendiges, P. S. Evans, J. Meng, E. A. Strain, M. 751
W. Allard, and E. W. Brown. 2016. Tracing Origins of the Salmonella Bareilly 752
Strain Causing a Food-borne Outbreak in the United States. J. Infect Dis. 753
213:502-508. 754
27. Gonzalez-Escalona, N., M. Toro, L. V. Rump, G. Cao, T. G. Nagaraja, and J. 755
Meng. 2016. Virulence gene profiles and clonal relationships of Escherichia coli 756
O26:H11 isolates from feedlot cattle as determined by whole-genome sequencing. 757
Appl. Environ. Microbiol. 82:3900-3912. 758
28. Kaper JB, Nataro JP, and Mobley HL. 2004. Pathogenic Escherichia coli. Nat 759
Rev Microbiol 2:123-140. 760
29. Garmendia, J., G. Frankel, and V. F. Crepin. 2005. Enteropathogenic and 761
enterohemorrhagic Escherichia coli infections: translocation, translocation, 762
translocation. Infect. Immun. 73:2573-2585. 763
30. Leonard, S. R., M. K. Mammel, D. W. Lacher, and C. A. Elkins. 2016. Strain-764
Level Discrimination of Shiga Toxin-Producing Escherichia coli in Spinach 765
Using Metagenomic Sequencing. PLoS One 11:e0167870. 766
31. Leonard, S. R., M. K. Mammel, D. W. Lacher, and C. A. Elkins. 2015. 767
Application of metagenomic sequencing to food safety: detection of Shiga Toxin-768
producing Escherichia coli on fresh bagged spinach. Appl. Environ. Microbiol. 769
81:8183-8191. 770
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
35
32. Lusk, P. T., P. Ramachandran, E. Reed, J. A. Kase, and A. Ottesen. 2018. 771
Metagenomic Description of Preenrichment and Postenrichment of Recalled 772
Chapati Atta Flour Using a Shotgun Sequencing Approach. Genome Announc. 6. 773
33. Ottesen, A., P. Ramachandran, Y. Chen, E. Brown, E. Reed, and E. Strain. 774
2020. Quasimetagenomic source tracking of Listeria monocytogenes from 775
naturally contaminated ice cream. BMC Infect. Dis. 20:83. 776
34. Kovac, J., H. d. Bakker, L. M. Carroll, and M. Wiedmann. 2017. Precision 777
food safety: A systems approach to food safety facilitated by-ágenomics tools. 778
TrAC Trends in Analytical Chemistry 96:52-61. 779
35. Gigliucci, F., F. A. B. von Meijenfeldt, A. Knijn, V. Michelacci, G. Scavia, F. 780
Minelli, B. E. Dutilh, H. M. Ahmad, G. C. Raangs, A. W. Friedrich, J. W. A. 781
Rossen, and S. Morabito. 2018. Metagenomic characterization of the human 782
intestinal microbiota in fecal samples from STEC-infected patients. Front. Cell 783
Infect. Microbiol. 8:25. 784
36. Moss, E. L., D. G. Maghini, and A. S. Bhatt. 2020. Complete, closed bacterial 785
genomes from microbiomes using nanopore sequencing. Nat Biotechnol. 701–707 786
37. Nicholls, S. M., J. C. Quick, S. Tang, and N. J. Loman. 2019. Ultra-deep, long-787
read nanopore sequencing of mock microbial community standards. Gigascience. 788
8. 789
38. Boykin, L. M., P. Sseruwagi, T. Alicai, E. Ateka, I. U. Mohammed, J. L. 790
Stanton, C. Kayuki, D. Mark, T. Fute, J. Erasto, H. Bachwenkizi, B. Muga, 791
N. Mumo, J. Mwangi, P. Abidrabo, G. Okao-Okuja, G. Omuut, J. Akol, H. 792
B. Apio, F. Osingada, M. A. Kehoe, D. Eccles, A. Savill, S. Lamb, T. Kinene, 793
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
36
C. B. Rawle, A. Muralidhar, K. Mayall, F. Tairo, and J. Ndunguru. 2019. 794
Tree Lab: portable genomics for early detection of plant viruses and pests in Sub-795
Saharan Africa. Genes (Basel) 10:632. 796
39. Grutzke, J., B. Malorny, J. A. Hammerl, A. Busch, S. H. Tausch, H. Tomaso, 797
and C. Deneke. 2019. Fishing in the Soup - Pathogen Detection in Food Safety 798
Using Metabarcoding and Metagenomic Sequencing. Front. Microbiol. 10:1805. 799
40. Bertrand, D., J. Shaw, M. Kalathiyappan, A. H. Q. Ng, M. S. Kumar, C. Li, 800
M. Dvornicic, J. P. Soldo, J. Y. Koh, C. Tong, O. T. Ng, T. Barkham, B. 801
Young, K. Marimuthu, K. R. Chng, M. Sikic, and N. Nagarajan. 2019. Hybrid 802
metagenomic assembly enables high-resolution analysis of resistance 803
determinants and mobile elements in human microbiomes. Nat. Biotechnol. 804
37:937-944. 805
41. Gu, G., A. Ottesen, S. Bolten, Y. Luo, S. Rideout, and X. Nou. 2020. 806
Microbiome convergence following sanitizer treatment and identification of 807
sanitizer resistant species from spinach and lettuce rinse water. Int J Food 808
Microbiol. 318:108458. 809
42. Suttner, B., E. R. Johnston, L. H. Orellana, R. Rodriguez, J. K. Hatt, D. 810
Carychao, M. Q. Carter, M. B. Cooley, and K. T. Konstantinidis. 2020. 811
Metagenomics as a Public Health Risk Assessment Tool in a Study of Natural 812
Creek Sediments Influenced by Agricultural and Livestock Runoff: Potential and 813
Limitations. Appl. Environ. Microbiol. 86 (6):e02525-19. doi: 814
10.1128/AEM.02525-19 815
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
37
43. Loman, N. J., J. Quick, and J. T. Simpson. 2015. A complete bacterial genome 816
assembled de novo using only nanopore sequencing data. Nat. Methods 12:733-817
735. 818
44. Juul, S., F. Izquierdo, A. Hurst, X. Dai, A. Wright, E. Kulesha, R. Pettett, 819
and D. J. Turner. 2015. Whats in my pot? Real-time species identification on the 820
MinION. bioRxiv030742. 821
45. Kim, D., L. Song, F. P. Breitwieser, and S. L. Salzberg. 2016. Centrifuge: rapid 822
and sensitive classification of metagenomic sequences. Genome Res. 26:1721-823
1729. 824
46. Kolmogorov, M., J. Yuan, Y. Lin, and P. A. Pevzner. 2019. Assembly of long, 825
error-prone reads using repeat graphs. Nat Biotechnol 37:540-546. 826
47. Wood, D. E. and S. L. Salzberg. 2014. Kraken: ultrafast metagenomic sequence 827
classification using exact alignments. Genome Biol. 15:R46. 828
48. Afgan, E., D. Baker, B. Batut, M. van den Beek, D. Bouvier, M. Cech, J. 829
Chilton, D. Clements, N. Coraor, B. A. Gruning, A. Guerler, J. Hillman-830
Jackson, S. Hiltemann, V. Jalili, H. Rasche, N. Soranzo, J. Goecks, J. Taylor, 831
A. Nekrutenko, and D. Blankenberg. 2018. The Galaxy platform for accessible, 832
reproducible and collaborative biomedical analyses: 2018 update. Nucleic Acids 833
Res. 46:W537-W544. 834
49. Darling, A. C. E., B. Mau, F. R. Blattner, and N. T. Perna. 2004. Mauve: 835
Multiple alignment of conserved genomic sequence with rearrangements. 836
Genome Res. 14:1394-1403. 837
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
38
50. Alcock, B. P., A. R. Raphenya, T. T. Y. Lau, K. K. Tsang, M. Bouchard, A. 838
Edalatmand, W. Huynh, A. V. Nguyen, A. A. Cheng, S. Liu, S. Y. Min, A. 839
Miroshnichenko, H. K. Tran, R. E. Werfalli, J. A. Nasir, M. Oloni, D. J. 840
Speicher, A. Florescu, B. Singh, M. Faltyn, A. Hernandez-Koutoucheva, A. 841
N. Sharma, E. Bordeleau, A. C. Pawlowski, H. L. Zubyk, D. Dooley, E. 842
Griffiths, F. Maguire, G. L. Winsor, R. G. Beiko, F. S. L. Brinkman, W. W. 843
L. Hsiao, G. V. Domselaar, and A. G. McArthur. 2020. CARD 2020: antibiotic 844
resistome surveillance with the comprehensive antibiotic resistance database. 845
Nucleic Acids Res. 48:D517-D525. 846
51. Archer, D. L. 2018. The evolution of FDA's policy on Listeria monocytogenes in 847
ready-to-eat foods in the United States. Current Opinion in Food Science 20:64-848
68. 849
52. Karp, B. E., H. Tate, J. R. Plumblee, U. Dessai, J. M. Whichard, E. L. 850
Thacker, K. R. Hale, W. Wilson, C. R. Friedman, P. M. Griffin, and P. F. 851
McDermott. 2017. National Antimicrobial Resistance Monitoring System: Two 852
Decades of Advancing Public Health Through Integrated Surveillance of 853
Antimicrobial Resistance. Foodborne. Pathog. Dis. 14:545-557. 854
53. Cooper, R. M., L. Tsimring, and J. Hasty. 2017. Inter-species population 855
dynamics enhance microbial horizontal gene transfer and spread of antibiotic 856
resistance. Elife. 6. 857
54. Holzel, C. S., J. L. Tetens, and K. Schwaiger. 2018. Unraveling the Role of 858
Vegetables in Spreading Antimicrobial-Resistant Bacteria: A Need for 859
Quantitative Risk Assessment. Foodborne Pathog. Dis 15:671-688. 860
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
39
55. Thakur, S. and G. C. Gray. 2019. The Mandate for a Global "One Health" 861
Approach to Antimicrobial Resistance Surveillance. Am J Trop Med Hyg 862
2018/12/27:227-228. 863
864 865
866
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
40
TABLES 867
Table 1. Bacterial community analysis of enriched irrigation water by WIMP. Only 868
species with frequencies above 1% are shown. 869
Reads % Reads a
Klebsiella pneumoniae 1,215,922 28.52
Enterobacter cloacae 902,950 21.18
Enterobacter sp. 286,190 6.71
Enterobacter kobei 278,513 6.53
Pseudomonas putida 165,323 3.88
Citrobacter freundii 164,595 3.86
Acinetobacter baumannii 157,421 3.69
Enterobacter hormaechei 145,839 3.42
Enterobacter xiangfangensis 52,058 1.22
870 a% Reads were calculated as the proportion of total reads classified by WIMP software 871
analysis 872
873
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
41
Table 2. Virulence of the Flye assemblies obtained with all reads. 874
Sample Innoculation
Level (CFU/ml)
Serotypea stx type
eae type
Contig No. (genome and
plasmid)b
Water 0 O9 - - 677
Water+Ecoli1 7.3 x 108 O157:H7 1a gamma-1 555 (3+)
Water+Ecoli2 7.3 x 107 O157:H7 1a gamma-1 627 (4+)
Water+Ecoli3 7.3 x 106 O157:H7/ O9 1a gamma-1 753 (35+)
Water+Ecoli4 c 7.3 x 105 O157:H7/ O9 1a gamma-1 780 (80+)
Water+Ecoli5 7.3 x 104 O9 - -
Water+Ecoli6d 7.3 x 103 O9 - - 644
875
ain silico serotype using DTU. 876
bIn parenthesis, the number of contigs that contain the entire chromosome, a plus sign 877
(+) indicates the presence of a contig that matched the EDL933 circular plasmid. 878
cfragmented genome assembly limit. 879
dE. coli O157 detection limit (68 reads matching O157 by WIMP classification). 880
881
882
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
42
Table 3. Virulence of the Flye assemblies obtained with the WIMP E. coli extracted 883
reads. 884
Samplea WIMP E. coli Reads
Serotypeb stx type eae type
Contig No. (genome and
plasmid)c
Water 10,806 O9 - - 31
Water+Ecoli1 1,659,463 O157:H7 1a gamma-1 44 (1+)
Water+Ecoli2 432,649 O157:H7 1a gamma-1 40 (1+)
Water+Ecoli3 73,783 O157:H7/ O9 1a gamma-1 41 (8+)
Water+Ecoli4d 17,203 O157:H7/ O9 1a gamma-1 92 (63+)
Water+Ecoli5 10,086 O9 - - 28
Water+Ecoli6e 8,515 O9 - - 24
885 aCFU/ml levels of EDL933 inoculation can be found in Table 2. 886
bin silico serotype using DTU. 887
cIn parenthesis, the number of contigs that contain the entire chromosome, a plus sign 888
(+) indicates the presence of a contig that matched the EDL933 circular plasmid. 889
dfragmented genome assembly limit. 890
eE. coli O157 detection limit (31 reads matching O157 by Kraken classification). 891
892
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
43
Table 4. In silico detection of virulence genes in Flye assemblies with all nanopore reads 893
and E. coli extracted reads. 894
All reads
E. coli extracted reads
Water+Ecoli
Water+Ecoli
Virulence Gene
EDL933 Reference Water 1 2 3 4 5 6
Water 1 2 3 4 5 6
astA + - + + + + - -
- + + + + - -
ehxA + - + + + + - -
- + + + + - -
espA + - + + + + - -
- + + + + - -
espB + - + + + + - -
- + + + + - -
espF + - + + + + - -
- + + + + - -
espJ + - + + + + - -
- + + + + - -
espK + - + + + + - -
- + + + + - -
espP + - + + + + - -
- + + + + - -
tccP + - + + - + - -
- + + + + - -
etpD + - + + + + - -
- + + + + - -
gad + - + + + + - -
- + + + + - -
iha + - + + + + - -
- + + + + - -
iss + - + + + + - -
- + + + + - -
nleA + - + + + + - -
- + + + + - -
nleB + - + + + + - -
- + + + + - -
nleC + - + + + + - -
- + + + + - -
tir + - + + + + - -
- + + + + - -
katP + - + + + + - -
- + + + + - -
pssA + - + + + - - -
- + + + + - -
air + - + + + - - -
- + + + + - -
toxB + - + + + + - -
- + + + + - -
ecf1 + - + + + + - -
- + + + + - -
IEE + - + + + + - -
- + + + + - -
+ Virulence gene detected 895
- Virulence gene not detected 896
897
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
44
FIGURES 898
Figure 1. Determination of the detection limit of the qPCR assay. Calibration curves 899
generated using 10-fold dilutions of DNA standards for E. coli EDL933 (top) detecting 900
the wzy (A) and stx1 (B) genes. The Cq values were plotted against the log-scale CFU 901
per reaction target concentration (bottom). The R2 values and reaction efficiency (E) are 902
also shown. 903
904
905
906
907
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
45
Figure 2. Flow diagram of artificial contamination of enriched, STEC-negative irrigation 908
water through analysis by nanopore sequencing. 909
910
911
912
913
914
915
E. coli O157:H7 EDL933
24h culture (~109)
serial dilutions (10-fold)
STEC-negative
enriched irrigation water (24h)
Spike-In
(1mL water + 1mL diluted EDL933) Plate to count CFU
(24h)
DNA Extraction
Oxford Nanopore Sequencing
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
46
Figure 3. Relative abundance of bacterial species associated with irrigation water un-916
spiked and spiked with E. coli EDL933. Enriched irrigation water (Water) was artificially 917
contaminated with 10-fold dilutions of E. coli EDL933 (+Ecoli) with a starting 918
concentration of 7 x 108 CFU/ml (Water+Ecoli1). Reads were analyzed by the EPI2ME 919
WIMP workflow. Bacterial species contributing more than 1% of the classified reads are 920
shown and the sum of the remaining species identified are included as “Other.” 921
922
923
924
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint
47
Figure 4. STEC detection and classification by combined qPCR and nanopore 925
sequencing approach. A) Direct comparison of quantitative qPCR detection with de novo 926
assembly limits by nanopore sequencing informs detection and classification of STECs. 927
B) Pipeline for detection and classification of STECs in enriched irrigation water using 928
nanopore sequencing and EPI2ME cloud-based services to identify reads of a desired 929
taxa for de novo assembly with Flye and in silico analysis. 930
931
932
and is also made available for use under a CC0 license. was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105
The copyright holder for this preprint (whichthis version posted July 18, 2020. ; https://doi.org/10.1101/2020.07.17.209718doi: bioRxiv preprint