Precision long-read metagenomics sequencing for food safety by detection and assembly of Shiga toxin-producing Escherichia coli in irrigation water

Meghan Maguire, Julie A. Kase, Dwayne Roberson, Tim Muruvanda, Eric W. Brown, Marc Allard, Steven M. Musser, Narjol González-Escalona*

Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, MD USA.

Key words: Foodborne pathogens, qPCR, Escherichia coli, nanopore sequencing, real time PCR, irrigation water, metagenomics

Running title: E. coli long-read metagenomics for food safety

______________
*Corresponding author. [email protected]. Mailing address, Center for Food Safety and Applied Nutrition, Food and Drug Administration, 5001 Campus Drive, College Park, MD 20740, USA

bioRxiv preprint doi: https://doi.org/10.1101/2020.07.17.209718; this version posted July 18, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license.

ABSTRACT

Shiga toxin-producing Escherichia coli (STEC) contamination of agricultural water might be an important factor in recent foodborne illnesses and outbreaks involving leafy greens. Closed bacterial genomes generated by whole genome sequencing play an important role in source tracking. We aimed to determine the limits of detection and classification of STECs by qPCR and nanopore sequencing using enriched irrigation water artificially contaminated with E. coli O157:H7 (EDL933). We determined the limit of STEC detection by qPCR to be 30 CFU/reaction, which is equivalent to 10⁵ CFU/ml in the enrichment. By using Oxford Nanopore’s EPI2ME WIMP workflow and de novo assembly with Flye followed by taxon classification with a k-mer analysis software (Kraken), E. coli O157:H7 could be detected at 10³ CFU/ml (68 reads) and a complete, fragmented E. coli O157:H7 metagenome-assembled genome (MAG) was obtained at 10⁵-10⁸ CFU/ml. Using a custom script to extract the E. coli reads, a completely closed MAG was obtained at 10⁷-10⁸ CFU/ml and a complete, fragmented MAG was obtained at 10⁵-10⁶ CFU/ml. In silico virulence detection for E. coli MAGs for 10⁵-10⁸ CFU/ml showed that the virulotype was indistinguishable from the spiked E. coli O157:H7 strain. We further identified the bacterial species and antimicrobial resistance genes in the un-spiked enrichment, which could have important implications for food safety. We propose this workflow could be used for detection and complete genomic characterization of STEC from a complex microbial sample and could be applied to determine the limit of detection and assembly of other foodborne bacterial pathogens.

IMPORTANCE

Foodborne illness caused by Shiga toxin-producing E. coli (STEC) ranges in severity from diarrhea to hemolytic uremic syndrome, and produce-related incidence is increasing. The pervasive nature of E. coli requires not only detection, but also a complete genome to determine potential pathogenicity based on stx and eae genes, serotype, and other virulence factors. We have developed a pipeline to determine the limits of nanopore sequencing for STECs in a metagenomic sample. By utilizing the current qPCR in the FDA Bacteriological Analytical Manual (BAM) Chapter 4A, we can quantify the amount of STEC in the enrichment and then sequence and classify the STEC in less than half the time required by current protocols, which depend on a single isolate. These methods have wide implications for food safety, including decreased time to STEC identification during outbreaks, characterization of the microbial community, and the potential to use these methods to determine the limits for other foodborne pathogens.

INTRODUCTION

Shiga toxin-producing Escherichia coli (STEC) is a foodborne pathogen capable of causing severe illness, notably hemolytic uremic syndrome (HUS), and death (1-4). STEC-mediated foodborne illness cases and outbreaks are most commonly associated with the O157:H7 serotype; however, non-O157 STEC illnesses are increasingly being reported (5-8). In the United States, O157:H7 is responsible for approximately 95,000 cases per year, of which over 2,000 require hospitalization (7). A further 170,000 cases can be attributed to non-O157 STEC exposure. Foodborne transmission accounts for nearly 70% of O157:H7 incidents. Foodborne outbreaks have been increasingly produce-related, from 0.7% in the 1970s to 6% in the 1990s (9). More recently (2004-2013), produce accounted for approximately 18% of foodborne outbreaks, and E. coli is one of the most common bacterial sources (10-13). Produce can become contaminated during production, packaging, or preparation; however, half of the produce-associated infections are linked to contamination prior to purchase (14). As such, agricultural water, including water used for irrigation, is an important potential source of contamination (https://www.fda.gov/food/outbreaks-foodborne-illness/environmental-assessment-factors-potentially-contributing-contamination-romaine-lettuce-implicated; https://www.fda.gov/food/outbreaks-foodborne-illness/investigation-summary-factors-potentially-contributing-contamination-romaine-lettuce-implicated-fall) (15-20). While prevention and mitigation strategies are beyond the scope of this study, detection of STEC bacteria remains paramount for public health.

A standard and accepted method for the detection of STECs in foods is based on a combined result from qPCR and a traditional microbiological method described in the FDA Bacteriological Analytical Manual (BAM) Chapter 4A (https://www.fda.gov/food/laboratory-methods-food/bam-chapter-4a-diarrheagenic-escherichia-coli). It consists of a 24-hour sample enrichment in modified buffered peptone water with pyruvate at 42°C followed by qPCR detection of the main virulence genes (stx1 and stx2) and the wzy gene of the O157 antigen, for a total analysis time of 2-3 days. Positive qPCR results undergo further analysis, and O157:H7 STEC is confirmed by several rounds of selective plating on tellurite cefixime-sorbitol MacConkey agar (TC-SMAC), chromogenic agar, and trypticase soy agar with 0.6% yeast extract (TSAYE) plates, adding 2-4 more days of analysis time. Isolates confirmed to be pure cultures are assessed for toxigenic potential again by qPCR. Further analysis by whole genome sequencing (WGS) is used to determine the complete scope of pathogenicity and antimicrobial resistance status, which would add 3-5 days. In total, approximately two weeks are needed for STEC confirmation by the BAM method.

WGS is rapidly changing the approach to foodborne illnesses and outbreak investigations (21). WGS is being used to monitor and identify foodborne pathogens (22,23) and the presence of antimicrobial resistance or virulence genes (24,25). Specific information on serotype and pathogenicity as it relates to phylogenetic relationships is increasingly important in outbreak scenarios (26,27). Some of the virulence genes detected by WGS mediate attachment and colonization of STECs and can be found in the locus of enterocyte effacement (LEE), including intimin (eae) and type 3 secretion system (TTSS) effector proteins (esp, esc, tir), non-LEE effectors (nleA, nleB, nleC), and other putative virulence genes (ehxA, etpD, subA, toxB, saa) (4,25,28,29).

Metagenomics for sample microbial analysis and targeted detection has been used extensively in many sample types (e.g. spinach, chapati flour, and ice cream) (30-33). Analysis is typically either by 16S rDNA profiling or by shotgun metagenomic sequencing (30-35). Many recent studies have used long-read approaches for metagenomics because they provide finished metagenome-assembled genomes (MAGs) for the most abundant species or bacteria in the microbiome sample (36-38). Closed MAGs provide a better assessment of those genomes and their virulence potential or functionality in that ecosystem. Most of these approaches either used mock communities or did not properly address the limits of the technique for either detection or the initial sample concentration required to get a completely closed or draft genome that would allow for target organism characterization in the sample. Each metagenomic approach provides different levels of analysis. 16S rDNA profiling has the most sensitive detection limit, requiring the lowest initial CFU, but the lowest resolution, reporting at the genus level because some species have almost identical 16S rDNA genes and the fragment of the 16S rDNA used is relatively small (39). Shotgun metagenomic WGS has a less sensitive detection limit (above 10³ CFU/ml), but provides information from the species to the strain level, including many functional genes in the microbiome sample analyzed (33,36,40,41). Metagenomic analyses can be performed using either short- or long-read sequencing technologies. Short-read shotgun metagenomics has been most commonly used for microbiome analyses (30-32,35,39,42), while the use of long-read metagenomics has been on the rise in the last few years (36-38,40) for numerous reasons. These were summarized concisely by Bertrand et al. (2019) (40), who note that short-read sequencing presents difficulty in accurately assembling complex, highly repetitive regions that can range in size up to hundreds of kilobases (36), especially when multiple species are present. Classification of these short reads into species bins based on clustering relies on consensus genomes and is not precise enough for the strain-level metagenomic assemblies that are necessary for outbreak and traceback scenarios (40).

Nanopore sequencing can produce completely closed genomes, while also offering affordability and portability (24,43). It does not employ a size selection process (unlike those seen with Illumina or Pacific Biosciences sequencers), resulting in longer reads that can help assemble complex genomic areas (e.g. phages and highly repetitive regions). Furthermore, because nanopore sequencing can perform real-time base calling, it allows for semi-real-time analysis when paired with the Oxford Nanopore EPI2ME cloud service, which offers the “What’s in my pot” (WIMP) workflow (44). WIMP identifies reads by taxa using the Centrifuge software (45) and the RefSeq sequence database at NCBI (https://www.ncbi.nlm.nih.gov/refseq/). It identifies the number of reads matching an organism of interest. These reads can potentially be retrieved from the total reads and analyzed separately. The extracted reads can then be de novo assembled and could result in a completely closed MAG for the organism of interest, in our case STEC O157:H7. This approach could dramatically reduce the time to detect and identify an STEC strain in a sample from two weeks to three days from enrichment.

Long-read nanopore sequencing has proven a useful tool to close bacterial genomes in metagenomic samples where the bacterial species are present in approximately equal proportions (approx. 12% or 10⁷ CFU/ml) (36,37). The actual cell numbers in the initial sample needed for successful assembly of the genome of interest have not yet been determined. Nanopore sequencing of mock microbial communities suggests that the technology is capable of detecting as few as 50 cells in the reaction (i.e. 4 reads) (37). This, however, will not be enough reads to make an informative identification of STEC serotype or to evaluate the potential risk to human health via the presence of important virulence genes. While current protocols utilize WGS after isolating individual colonies by the selective plating methods described, culture-independent detection of O157:H7 STECs in irrigation water (that were FDA BAM-confirmed O157 STEC-positive) by Illumina MiSeq and Oxford Nanopore shotgun sequencing resulted in negative results due to low concentrations of the organism of interest (Gonzalez-Escalona, unpublished results). However, instead of analyzing the water directly, we suggest that samples taken after enrichment will have a sufficient STEC concentration to assemble their genome from nanopore sequencing, which will allow for identification of their serotype, virulence composition, and antimicrobial resistance (AMR) gene content (31).

In order to test and empirically determine the limits of nanopore sequencing for 1) detection, 2) characterization, and 3) closing genomes of STECs, we artificially contaminated STEC-negative irrigation water with 10-fold dilutions of E. coli O157:H7 EDL933. We propose a workflow for detection and quantification (qPCR) of STECs in enriched samples followed by identification and typing by nanopore sequencing. We also developed a script to extract the desired reads by taxa from the total sequenced reads.

MATERIALS AND METHODS

Bacterial strains and media. We used a variant of the Shiga toxin-producing E. coli (STEC) EDL933 O157:H7 strain for all our experiments. This strain was from our collection at CFSAN and is a variant of ATCC 43895 that, after several passages in the lab, has lost the stx2 phage. EDL933 was grown in static culture overnight in tryptic soy broth (TSB) at 37°C.

Preparation of E. coli EDL933 inocula for spiking experiments. For artificial contamination, an overnight culture (10⁹ CFU/ml) of E. coli EDL933 was serially diluted 10-fold in TSB. Dilutions containing approximately 10⁹ CFU/ml through less than 10 CFU/ml were used for spiking studies. Dilutions of the overnight culture spread on tryptic soy agar (TSA) plates were used to calculate the number of CFUs per ml in the original culture.

Sample processing and artificial contamination. An STEC-negative irrigation water sample (200 ml) from the Southwestern US was enriched by adding an equal volume of 2X modified Buffered Peptone Water with pyruvate (mBPWp) and incubating at 37°C static for 5 hours. Antimicrobial cocktail [Acriflavin-Cefsulodin-Vancomycin (ACV)] was then added and the enrichment was incubated at 42°C static overnight (18-24 h), according to Chapter 4A of the BAM. E. coli EDL933 dilutions were added in a 1:1 ratio to the enriched irrigation water sample.
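
The spiking scheme described above (a 10-fold dilution series, each dilution mixed 1:1 with the enrichment) can be sketched as a short calculation. This is only an illustration: the function name is hypothetical, the nominal 10⁹ CFU/ml stock is taken from the text, and the number of dilution steps (ten) is an assumption chosen so that the series ends below 10 CFU/ml.

```python
def spiked_concentrations(stock_cfu_per_ml, n_steps, dilution_factor=10.0, spike_fraction=0.5):
    """Return (step, inoculum CFU/ml, CFU/ml after 1:1 spiking) for each dilution step.

    spike_fraction=0.5 models mixing one volume of inoculum with one volume
    of enrichment, which halves the inoculum concentration.
    """
    rows = []
    for step in range(n_steps):
        inoculum = stock_cfu_per_ml / dilution_factor ** step
        rows.append((step, inoculum, inoculum * spike_fraction))
    return rows


# Nominal stock of 1e9 CFU/ml; ten steps is an assumed series length.
rows = spiked_concentrations(1e9, 10)
for step, inoculum, spiked in rows:
    print(f"10^-{step}: inoculum {inoculum:.1e} CFU/ml -> spiked enrichment {spiked:.1e} CFU/ml")
```

Under these assumptions the undiluted culture yields about 5 x 10⁸ CFU/ml in the spiked enrichment, and each further step is ten times lower.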

Nucleic acid extraction. DNA from the artificially contaminated irrigation water enrichment was extracted by two methods, one for qPCR and one for nanopore sequencing. A 1 ml fraction of each spiked enrichment sample was processed for qPCR analysis according to the FDA BAM Chapter 4A. Briefly, cells were pelleted by centrifugation at 12,000 x g for 3 minutes. The pellet was washed in 0.85% NaCl and resuspended in 1 ml sterile water. Samples were boiled at 100°C for 10 minutes, then centrifuged to pellet debris. The DNA supernatant was saved. Another 1 ml portion of each spiked enrichment sample was extracted using the Maxwell RSC Cultured Cells DNA kit with a Maxwell RSC Instrument (Promega Corporation, Madison, WI) according to the manufacturer’s instructions for Gram-negative bacteria with additional RNase treatment. DNA concentration was determined by Qubit 4 Fluorometer (Invitrogen, Carlsbad, CA) according to the manufacturer’s instructions.

STEC qPCR detection. The presence of STEC EDL933 was determined by qPCR as described in Chapter 4A of the BAM, detecting stx1, stx2, and wzy. Briefly, the DNA supernatants recovered from boiled samples were diluted 1:10 in nuclease-free water and 2 µl was added to 28 µl master mix containing 0.25 µM stx1 and stx2 primers, 0.3 µM wzy primers, 0.2 µM stx1 probe, 0.15 µM stx2 and wzy probes, 1X Internal Positive Control Mix (Cat: 4308323, Applied Biosystems), 1X Express qPCR Supermix Universal Taq (Cat: 11785200, Invitrogen), and ROX passive dye. All primers and probes (Supplementary Table 1) employed in this study were purchased from IDT (Coralville, IA, USA).

Whole genome sequencing, contigs assembly and annotation. DNA recovered from the E. coli EDL933 spiked water enrichment samples was sequenced using a GridION nanopore sequencer (Oxford Nanopore Technologies, Oxford, UK). The sequencing libraries were prepared using the Genomic DNA by Ligation kit (SQK-LSK109) and run in FLO-MIN106 (R9.4.1) flow cells for 72 hours, according to the manufacturer’s instructions (Oxford Nanopore Technologies). The run was live base called using Guppy v3.2.10 included in the MinKNOW v3.6.5 (v19.12.6) software (Oxford Nanopore Technologies).

The initial classification of the reads for each run was done using the “What's in my pot” (WIMP) workflow contained in the EPI2ME cloud service (Oxford Nanopore Technologies). That workflow allows for taxonomic classification of the reads generated by the GridION sequencing in real time. Using the WIMP classification output, the reads that were identified as “Escherichia coli*” were extracted and saved in a single fastq file using a custom Python script (v2.7.3) (Supplementary Note 1). The metagenome-assembled genomes (MAGs) for each spiked sample were obtained by de novo assembly of 1) all nanopore data output and 2) the extracted E. coli reads using the Flye program v2.6 (46) with the meta parameter. The assembled contigs were classified by taxonomy with Kraken (47) using GalaxyTrakr (48) (Flye+Kraken). The presence of the complete genome and the synteny of the completely closed genomes in the final assemblies were checked using the Mauve genome aligner (49).
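
The read-extraction step can be illustrated with a minimal sketch. It is not the script from Supplementary Note 1: it is written for Python 3 rather than 2.7.3, it assumes the classification output is a CSV file, and the column names `read_id` and `name` are hypothetical placeholders for whatever the actual WIMP export uses. It also assumes plain four-line FASTQ records.

```python
import csv
import io


def load_target_read_ids(classification_csv, taxon_prefix="Escherichia coli",
                         id_column="read_id", name_column="name"):
    """Collect IDs of reads whose assigned taxon name starts with taxon_prefix.

    The column names are assumptions; adjust them to the real export.
    """
    reader = csv.DictReader(classification_csv)
    return {row[id_column] for row in reader
            if row[name_column].startswith(taxon_prefix)}


def extract_reads(fastq_in, fastq_out, wanted_ids):
    """Copy four-line FASTQ records whose read ID is in wanted_ids; return the count."""
    kept = 0
    while True:
        header = fastq_in.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines
        seq, plus, qual = fastq_in.readline(), fastq_in.readline(), fastq_in.readline()
        read_id = header[1:].split()[0]  # text after '@' up to the first whitespace
        if read_id in wanted_ids:
            fastq_out.write(header + seq + plus + qual)
            kept += 1
    return kept


# Tiny in-memory example with made-up read names and sequences.
classification = io.StringIO(
    "read_id,name\n"
    "r1,Escherichia coli O157:H7 str. EDL933\n"
    "r2,Klebsiella pneumoniae\n"
    "r3,Escherichia coli\n"
)
fastq = io.StringIO(
    "@r1 runid=x\nACGT\n+\nIIII\n"
    "@r2 runid=x\nGGCC\n+\nIIII\n"
    "@r3 runid=x\nTTAA\n+\nIIII\n"
)
ids = load_target_read_ids(classification)
out = io.StringIO()
n_kept = extract_reads(fastq, out, ids)
print(n_kept, "reads written")
```

Matching on a name prefix mirrors the “Escherichia coli*” wildcard in the text, so reads assigned to any E. coli strain are retained alongside those assigned at the species level.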

Closure of strain EDL933 genome by nanopore sequencing. For bioinformatic quality control purposes we generated a closed genome of the strain used in the artificial contamination studies. The long-read sequencing library was prepared and run as described above for the spiked experiments. All reads below 5,000 base pairs in length were removed from further analysis. The genome was assembled using Flye v1.6 (46). The nanopore output resulted in 144,000 reads for a total yield of 716 Mb.
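
The length filter mentioned above (discarding reads shorter than 5,000 bp) could look like the following sketch. The tool the authors actually used for this step is not stated, so the function here is purely illustrative and again assumes four-line FASTQ records.

```python
import io


def filter_by_length(fastq_in, fastq_out, min_length=5000):
    """Write reads with at least min_length bases; return (kept, dropped)."""
    kept = dropped = 0
    while True:
        record = [fastq_in.readline() for _ in range(4)]
        if not record[0]:
            break  # end of file
        if len(record[1].strip()) >= min_length:
            fastq_out.writelines(record)
            kept += 1
        else:
            dropped += 1
    return kept, dropped


def make_read(name, length):
    """Build a synthetic FASTQ record of the given length (illustration only)."""
    return f"@{name}\n{'A' * length}\n+\n{'I' * length}\n"


reads = io.StringIO(make_read("short", 4999) + make_read("edge", 5000) + make_read("long", 12000))
filtered = io.StringIO()
kept, dropped = filter_by_length(reads, filtered)
print(kept, dropped)
```

A read of exactly 5,000 bp is kept, since the text removes only reads below that length.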

In silico serotyping. The major serotype present in each sample was determined by batch screening in Ridom SeqSphere+ v7.0.6 (Ridom, Münster, Germany) using the genes deposited in the Center for Genomic Epidemiology (https://cge.cbs.dtu.dk/services/) for E. coli as part of their web-based tool, SerotypeFinder 2.0 (https://cge.cbs.dtu.dk/services/SerotypeFinder/).

In silico identification of virulence genes. The presence of virulence genes was determined by batch screening in Ridom SeqSphere+ software v7.0.6 (Ridom) using the genes deposited in the NCBI Pathogen Detection Reference Gene Catalog (https://www.ncbi.nlm.nih.gov/pathogens/isolates#/refgene/) and described in Gonzalez-Escalona and Kase (2019) (25).

In silico identification of antimicrobial resistance genes. We identified the antimicrobial resistance genes present in our sequenced genomes using the EPI2ME Fastq Antimicrobial Resistance workflow (Oxford Nanopore Technologies). This workflow consists of three processes: 1) quality control of the reads, 2) WIMP analysis using Centrifuge and the NCBI RefSeq database, and 3) detection of antimicrobial resistance genes using the CARD database (50).

Metagenomic and EDL933 data accession numbers. The metagenomic sequence data from this study and the nanopore data for the EDL933 strain used in this study are available in GenBank under BioProject number PRJNA639799.

RESULTS

Determination of the detection limit of the STEC qPCR method. The qPCR detection limit of E. coli EDL933 was determined by calculating CFUs per reaction. DNA was extracted from E. coli EDL933 culture according to the boil method described in the BAM Chapter 4A for quantification. The starting concentration determined by plating on TSA plates was 1.5 x 10⁹ CFU/ml. Several 10-fold dilutions of the original boil sample were tested in triplicate with the STEC qPCR assay. The wzy gene was detected over six orders of magnitude from 30 to 3 x 10⁶ CFU per reaction (correlation coefficient (R²) = 0.99 and efficiency (E) = 98%, Figure 1A). Likewise, the stx1 gene was detected linearly over six orders of magnitude from 30 to 3 x 10⁶ CFU per reaction (R² = 0.99 and E = 96%, Figure 1B). The limit of the STEC qPCR detection, therefore, was 30 CFU per reaction. Given that each reaction receives 2 µl of a 1:10 dilution of the boiled sample, this means that the minimal number of detectable STEC cells in enrichment culture using this qPCR protocol will be approximately 10⁵ CFU/ml.
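
The reported efficiencies can be related to the slope of a standard curve through the usual qPCR relationship E = 10^(-1/slope) - 1, where the slope is in Cq units per log10 of template amount. The sketch below applies that textbook formula; the slopes it prints are back-calculated from the reported efficiencies and are not values given in the paper.

```python
import math


def efficiency_from_slope(slope):
    """Amplification efficiency (as a fraction) from a standard-curve slope.

    slope is the change in Cq per log10 of template amount (negative for qPCR).
    """
    return 10 ** (-1.0 / slope) - 1.0


def slope_from_efficiency(efficiency):
    """Inverse of efficiency_from_slope: slope implied by a given efficiency."""
    return -1.0 / math.log10(1.0 + efficiency)


# Efficiencies reported in the text for the two targets.
for gene, eff in (("wzy", 0.98), ("stx1", 0.96)):
    print(gene, "implied slope:", round(slope_from_efficiency(eff), 2))
```

For reference, perfect doubling each cycle (E = 100%) corresponds to a slope of about -3.32, so both assays sit close to the ideal.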

STEC-spiked enriched sample preparation for limit of detection of nanopore sequencing. In order to assess the performance and detection limit of nanopore sequencing on STEC enrichment samples, we used an aliquot of irrigation water found by the FDA BAM method to have nondetectable amounts of O157:H7 STEC. Fractions of this enriched irrigation water sample were artificially contaminated with serial dilutions of E. coli EDL933 overnight culture. A schematic representation of the workflow is shown in Figure 2. The concentration of the stock E. coli EDL933 overnight culture was determined by agar plating to be 1.46 x 10⁹ CFU/ml. Artificial contamination of the enriched field irrigation water sample (1:1), therefore, resulted in an approximate concentration of 7.3 x 10⁸ CFU/ml for the sample referred to as Water+Ecoli1. We confirmed detection and quantification by the BAM qPCR method for this sample. The BAM qPCR method detected the presence of both wzy (Cq = 22) and stx1 (Cq = 23) genes in Water+Ecoli1. Using the standard curve determined above, we calculated 1.1 x 10⁵ CFU/reaction using the wzy gene and 9.8 x 10⁴ CFU/reaction using the stx1 gene. This approximates to 5.4 x 10⁸ CFU/ml using the wzy gene and 4.9 x 10⁸ CFU/ml using the stx1 gene in the Water+Ecoli1 sample. Therefore, the CFU/ml concentrations estimated by plating and by qPCR were very similar.
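
The conversion from CFU per reaction to CFU per ml can be reproduced with a short calculation. The factor used here is inferred from the Methods (2 µl of template per reaction, taken from a 1:10 dilution of a boiled sample that was resuspended in its original 1 ml volume); the paper does not spell the formula out, so treat it as an assumption. The function name is hypothetical.

```python
def cfu_per_ml(cfu_per_reaction, template_volume_ul=2.0, predilution=10.0):
    """Scale a per-reaction estimate to CFU/ml of enrichment.

    Assumes the template is template_volume_ul of a 1:predilution dilution
    and that the boiled pellet was resuspended in the original sample volume.
    """
    return cfu_per_reaction / template_volume_ul * predilution * 1000.0


plating_estimate = 1.46e9 / 2  # stock culture mixed 1:1 with the enrichment

for gene, per_reaction in (("wzy", 1.1e5), ("stx1", 9.8e4)):
    estimate = cfu_per_ml(per_reaction)
    ratio = estimate / plating_estimate
    print(f"{gene}: {estimate:.1e} CFU/ml ({ratio:.2f} of the plating estimate)")
```

With the rounded per-reaction values this gives about 5.5 x 10⁸ and 4.9 x 10⁸ CFU/ml, in line with the reported 5.4 x 10⁸ and 4.9 x 10⁸ CFU/ml (the small difference for wzy presumably reflects rounding). Applying the same factor to the 30 CFU/reaction detection limit gives 1.5 x 10⁵ CFU/ml.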

Bacterial community associated with BAM enrichment of irrigation water. The bacterial community composition of the enriched irrigation water was determined by a metagenomic analysis using Oxford Nanopore sequencing of the DNA isolated from the sample. The nanopore output resulted in 5.75M reads and 16.9 Gb total yield (Supplementary Table 2). The metagenomic bacterial composition of the sample was analyzed by two methods: the Oxford Nanopore EPI2ME “What’s in my pot” (WIMP) workflow, and a de novo assembly of all the reads using Flye followed by classification of all the contigs in the assembly by the k-mer software Kraken. We report only the taxa accounting for greater than 1% of the total bacterial community. The total WIMP output can be found at https://epi2me.nanoporetech.com/workflow_instance/227465?token=D9EAC8AA-4839-11EA-99BC-71806BDB886C and the total Flye+Kraken output can be found in Supplementary Table 3.

WIMP analysis of the reads obtained by nanopore sequencing of the enriched water sample showed a highly diverse bacterial composition. Even though hundreds of bacterial species were identified in the sample, the majority of the sample was composed of nine bacterial species (>1% in Table 1). In descending order of abundance, these nine species were: Klebsiella pneumoniae (28.52%), Enterobacter cloacae (21.18%), Enterobacter sp. ODB01 (6.71%), Enterobacter kobei (6.53%), Pseudomonas putida (3.88%), Citrobacter freundii (3.86%), Acinetobacter baumannii (3.69%), Enterobacter hormaechei (3.42%), and Enterobacter xiangfangensis (1.22%). Additionally, 10,806 reads (0.25%) were identified as belonging to Escherichia coli. These E. coli reads did not match STEC O157:H7.
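
The 1% reporting threshold can be expressed as a simple filter over the WIMP percentages listed above. This is a minimal sketch for illustration only; the dictionary and function names are hypothetical, and the values are copied from the text rather than read from the WIMP output.

```python
# Relative abundances (% of classified reads) as reported in the text
wimp_percent = {
    "Klebsiella pneumoniae": 28.52,
    "Enterobacter cloacae": 21.18,
    "Enterobacter sp. ODB01": 6.71,
    "Enterobacter kobei": 6.53,
    "Pseudomonas putida": 3.88,
    "Citrobacter freundii": 3.86,
    "Acinetobacter baumannii": 3.69,
    "Enterobacter hormaechei": 3.42,
    "Enterobacter xiangfangensis": 1.22,
    "Escherichia coli": 0.25,
}


def taxa_above(abundance, threshold_percent=1.0):
    """Return (taxon, percent) pairs above the threshold, most abundant first."""
    kept = [(taxon, pct) for taxon, pct in abundance.items()
            if pct > threshold_percent]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


reported = taxa_above(wimp_percent)
combined = sum(pct for _, pct in reported)

for taxon, pct in reported:
    print(f"{taxon}: {pct:.2f}%")
print(f"Combined share of reported taxa: {combined:.2f}%")
```

The nine species above the threshold together account for about 79% of the classified reads, while E. coli at 0.25% falls below it.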

All of these microorganisms were also identified in the Flye de novo assembly, which resulted in 677 contigs of varying sizes. Taxonomic identification by Kraken showed that the largest contigs belonged to the following taxa: Acinetobacter baumannii (3,784,399 bp), Citrobacter freundii (2,003,414 bp), Klebsiella pneumoniae (1,934,072 bp), Enterobacter cloacae (1,001,408 bp), Pseudomonas putida (438,653 bp), and Enterobacter kobei (365,044 bp). Enterobacter hormaechei and Enterobacter xiangfangensis were also represented in the contigs assembled by Flye, but in the Kraken database hormaechei and xiangfangensis are listed as subspecies of E. hormaechei. Many other microorganisms were also identified (Supplementary Table 3). Four of the 677 contigs were identified as matching E. coli, the largest being 30,837 bp in length.

We were also interested in testing whether AMR genes could be identified. We used the EPI2ME Fastq Antimicrobial Resistance workflow, which processes all nanopore reads in three stages: 1) reads are passed through a quality filter, 2) taxa are identified by the WIMP workflow, and 3) the classified reads are then analyzed for AMR genes using the CARD database (https://card.mcmaster.ca/home). The prior classification by WIMP permits identification of the AMR genes in each particular species. The AMR genes found in the field irrigation water sample include β-lactamase genes in Klebsiella pneumoniae (blaSHV, blaACT), Enterobacter cloacae (blaSHV, blaACT), Citrobacter freundii (blaSHV, blaCMY), Acinetobacter baumannii (blaOXA), and Enterobacter hormaechei (blaACT) (https://epi2me.nanoporetech.com/workflow_instance/232304?token=CCD817B6-742C-11EA-B9EA-1AEE73EF14E7). Several efflux pump genes that can be associated with antibiotic resistance were also found in Klebsiella pneumoniae (acr, ram), Enterobacter cloacae (acr, ram, vga), and Acinetobacter baumannii (abe, ade, mex). Lastly, a PhoP gene mutation conferring colistin resistance was detected in Klebsiella pneumoniae (977 reads matching), and the QnrB23 gene, which confers fluoroquinolone resistance, was detected in Citrobacter freundii (29 reads matching).

Nanopore long-read detection limit for E. coli spiked into irrigation water enrichment. After establishing the bacterial community of the un-spiked enriched irrigation water sample, we used nanopore sequencing on DNA obtained from the artificially contaminated water enrichment at levels from 7 x 10^8 CFU/ml (Water+Ecoli1) to 7 x 10^3 CFU/ml (Water+Ecoli6). One microgram of DNA per sample was used as the initial DNA amount for nanopore sequencing, and each sample was run in a single flow cell (Figure 2). The total nanopore output per sample can be found in Supplementary Table 2. On average, each run resulted in approximately 5 million reads with a total yield of 17 Gb.

The nanopore output for each sample/run was analyzed using the WIMP workflow and Flye+Kraken as described earlier for the enriched water sample. The WIMP workflow classified between 1.7 million (Water+Ecoli1) and 8,500 (Water+Ecoli6) reads as E. coli for the serially diluted spiked irrigation water samples (Table 2). These reads accounted for 53% to 0.26% of the total reads for each run (Figure 3). The de novo Flye assembly of the reads for each of the spiked water samples produced assemblies with 555 to 780 contigs.
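
For orientation, the six spiking levels can be generated as a ten-fold dilution series. This is only a sketch: the ten-fold step is inferred from the reported endpoints (7 x 10^8 CFU/ml for Water+Ecoli1 and 7 x 10^3 CFU/ml for Water+Ecoli6), and the function name is hypothetical.

```python
def dilution_series(top_cfu_per_ml=7e8, steps=6, fold=10):
    """Map sample names to nominal CFU/ml for a serial dilution.

    Assumes each step dilutes the previous level by `fold`.
    """
    return {f"Water+Ecoli{i + 1}": top_cfu_per_ml / fold ** i
            for i in range(steps)}


levels = dilution_series()
for sample, cfu in levels.items():
    print(f"{sample}: {cfu:.0e} CFU/ml")
```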

The enrichment samples with the highest E. coli EDL933 spiked concentrations, Water+Ecoli1 and 2, showed the highest number of reads classified as E. coli by WIMP (Table 2). As expected, the number of E. coli reads classified by WIMP decreased with increasing dilution of the spiked E. coli. Approximately 8,500 E. coli reads were identified at the lowest level of spiked EDL933, Water+Ecoli6 (7 x 10^3 CFU/ml), almost the same number of reads as in the un-spiked enriched water sample (Table 2). WIMP further classified 68 of the Water+Ecoli6 reads as belonging to O157. Therefore, the detection limit for O157:H7 by nanopore sequencing was established at 7 x 10^3 CFU/ml in the enrichment.

Nanopore long-read genome assembly limit for E. coli spiked into irrigation water enrichment. After showing that the detection limit of nanopore sequencing was 7 x 10^3 CFU/ml, we sought to establish the genome assembly limit. A genome assembly limit is the minimum number of reads used in a de novo assembly that produces 1) a complete metagenome-assembled genome (MAG) at 20X coverage (chromosome and plasmid, if present) or 2) a fragmented MAG from a complex bacterial background. Obtaining a complete or fragmented E. coli O157 MAG allows positive identification of the genome of interest, EDL933, as well as a complete genomic characterization (serotype and presence of stx types, eae gene, virulence genes, and AMR genes). We expect this read number to be proportional to the initial number of CFU/ml in the sample. To test this hypothesis, the completely closed genome of the E. coli EDL933 strain used in our spiking experiments was used. By using this reference genome, we ensured the accuracy of our in silico analysis to detect the serotype and the entire virulence gene profile. Our EDL933 strain is serotype O157:H7 and has stx1a, eae gamma-1, and other virulence genes (toxB, etpD, tccP, etc.). The EDL933 genome available at GenBank (AE005174) additionally carries the stx2 gene. We aligned both genomes and found that our version is missing the entire stx2 phage (results not shown).
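
To give a sense of scale for the 20X criterion, the number of target reads needed can be estimated from genome size and mean read length. The sketch below rests on two assumptions that are not taken from the results above: a chromosome of roughly 5.5 Mb for EDL933, and a mean read length equal to that of the un-spiked run (16.9 Gb over 5.75 M reads). It also assumes reads are spread evenly over the genome. All names are hypothetical.

```python
import math

GENOME_SIZE_BP = 5.5e6   # assumption: approximate EDL933 chromosome size
TARGET_COVERAGE = 20     # coverage named in the assembly-limit definition


def mean_read_length(total_bases: float, n_reads: float) -> float:
    """Average read length in bp for a run."""
    return total_bases / n_reads


def reads_for_coverage(coverage: float, genome_bp: float,
                       mean_len_bp: float) -> int:
    """Number of reads whose combined length equals coverage x genome size."""
    return math.ceil(coverage * genome_bp / mean_len_bp)


# Yield and read count of the un-spiked enriched water run, used as a proxy
mean_len = mean_read_length(16.9e9, 5.75e6)
needed = reads_for_coverage(TARGET_COVERAGE, GENOME_SIZE_BP, mean_len)

print(f"Mean read length: {mean_len:.0f} bp")
print(f"Reads needed for {TARGET_COVERAGE}X: {needed}")
```

Under these assumptions the requirement is on the order of tens of thousands of target-derived reads, which is a rough guide only, since the read-length distribution in a real run is skewed toward short reads.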

When the total nanopore sequencing output was de novo assembled by Flye, the total number of contigs was similar to that of the un-spiked water sample, with 555 and 627 contigs for Water+Ecoli1 and 2, respectively. The E. coli EDL933 O157:H7 genome could be detected in 4 or 5 contigs, and the plasmid was present as a single contig in each (Table 2). Serotype analysis accurately identified the E. coli as O157:H7. The stx1a and eae gamma-1 genes were also detected. In fact, the serotype and the stx and eae genes could also be determined in Water+Ecoli3 and 4. However, a completely closed E. coli O157 MAG could not be obtained at any spiked concentration (Table 2).

At the lowest spiked levels, in samples Water+Ecoli5 and 6, the number of reads associated with E. coli was approximately the same as had been detected in the un-spiked water sample (Table 2). Only the O9 serotype was identified in the assemblies, which was the same serotype detected in the un-spiked water sample. Likewise, detection of the stx and eae genes was lost (Table 2). Therefore, we determined that the limit of fragmented assembly was approximately 7 x 10^5 CFU/ml, but we were not able to obtain a completely closed O157:H7 STEC MAG even at the highest level of 7 x 10^8 CFU/ml using this approach.

In order to improve the assembly, we decided to extract the E. coli reads and perform the de novo assembly again. Using the WIMP-classified reads, we ran a script (Supplementary Note 1) that extracted only the reads identified as E. coli and then performed a similar analysis as above. Flye assembly of these filtered reads produced assemblies with fewer, larger contigs. The high inoculation levels in spiked samples Water+Ecoli1 and 2 resulted in assemblies with 44 and 40 contigs, respectively, each containing the completely closed E. coli O157 MAG (chromosome and plasmid each in a single contig) (Table 3). Serotype analysis accurately detected O157:H7 in assemblies from samples Water+Ecoli1 through 4. The same was observed for the detection of the stx1a and eae gamma-1 genes.

At the lowest spiked levels, in samples Water+Ecoli5 and 6, Flye was still able to produce an assembly, but with a lower number of contigs (approximately 30). However, in this case no O157:H7, stx1a, or eae gamma-1 were detected by in silico analyses. Serotype analysis identified the presence of the O9 serotype (Table 3).

Further analysis of the extracted E. coli reads for Water+Ecoli5 and Water+Ecoli6 revealed that only 43 and 31 reads, respectively, matched E. coli O157. Kraken identified those reads from a total of 2,326 (Water+Ecoli5) and 1,749 (Water+Ecoli6) E. coli reads. This confirmed our previous analysis that the limit of detection of reads matching O157 is approximately 7 x 10^3 CFU/ml. Our fragmented assembly limit was still 7 x 10^5 CFU/ml, but the assembly improved: we were able to produce a completely closed circular chromosome in a single contig at concentrations of at least 7 x 10^7 CFU/ml.

Virulence gene identification. In addition to detecting and serotyping STECs, the detection of virulence genes provides necessary information in outbreak scenarios and is important to food safety. We again used the spiked E. coli EDL933 genome as the reference for in silico analysis (Bioproject number PRJNA639799). We have previously reported a set of 94 virulence genes (25), and the EDL933 genome showed the presence of 23 of those genes (Table 4). Among those genes were esp, tccP, nle genes, tir, and toxB. The assemblies generated above were analyzed for the presence of all 23 virulence genes. Corresponding to the limits of MAG assembly (either completely closed or fragmented), all virulence genes were detected in Water+Ecoli1 and Water+Ecoli2 when using either the total nanopore read output or the extracted reads. On the other hand, in assemblies from the total nanopore output we failed to detect tccP in Water+Ecoli3 (7 x 10^6 CFU/ml) and pssA and air in Water+Ecoli4 (7 x 10^5 CFU/ml), but when the E. coli reads were extracted from the WIMP output, all virulence genes could be detected. Thus, extracting the E. coli reads with our script improved the genome assembly.
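
The comparison between expected and detected virulence genes amounts to a set difference per sample. The sketch below uses only a small illustrative subset of the 23 reference genes (those named in the text), and the detected sets are reconstructed from the statements above rather than taken from Table 4; all names are hypothetical.

```python
# Illustrative subset of the 23 reference virulence genes (not the full list)
REFERENCE_GENES = {"tccP", "toxB", "tir", "pssA", "air"}


def missing_genes(detected, reference=REFERENCE_GENES):
    """Return reference genes absent from an assembly, sorted by name."""
    return sorted(reference - set(detected))


# Detected genes in assemblies built from the total nanopore output,
# reconstructed from the text for illustration
total_output = {
    "Water+Ecoli3": REFERENCE_GENES - {"tccP"},
    "Water+Ecoli4": REFERENCE_GENES - {"pssA", "air"},
}

for sample, detected in total_output.items():
    print(sample, "missing:", missing_genes(detected))
```

With the taxon-filtered reads, the same check would return an empty list for both samples, since all virulence genes were detected.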

DISCUSSION

Considering the importance of irrigation water to food safety, accurate detection and classification of STECs potentially present in those waters is paramount, particularly during an outbreak incident. Current methods of detection include qPCR and extensive selective plating before WGS analysis. This is a time-consuming process that provides confirmation of an isolate only after almost two weeks of work. By combining qPCR and long-read metagenomic analysis of the pre-enrichment, we can definitively detect an STEC isolate, as well as characterize its virulence potential, in 3 - 4 days. While this will not replace eventual confirmation by microbiological methods, it reduces the time to a prospective corrective measure by a full week.

In our study we have empirically determined the in silico limits of detection, classification, and genome closure for STECs in irrigation water artificially contaminated with E. coli EDL933, using nanopore sequencing as a proof of concept. We have also developed a pipeline for determining these limits that can be used for other foodborne or clinical bacteria (Figure 4). Our results showed that the level of STEC O157 needed for detection in the enrichment sample was 10^3 CFU/ml (Tables 2 and 3). While STEC O157 levels between 10^5 and 10^6 CFU/ml were enough to produce a fragmented MAG in a few contigs that allowed for complete characterization of the STEC genome, levels above 10^7 CFU/ml were enough to produce a completely closed STEC MAG (Table 3) with genome coverage of 385X. The complete plasmid was generated from STEC levels above 10^4 CFU/ml. These recovered MAGs (either fragmented or completely closed) from all spiked samples above 10^5 CFU/ml allowed us to comprehensively characterize the virulotype and genome synteny, matching 100% to the spiked EDL933 strain (Supplementary Figure 1). The genome of the strain used in this study was sequenced and completely closed by us and used as the reference for genome completeness for the de novo assemblies sourced from the artificially contaminated samples. Our variant strain of EDL933 lacked the stx2 phage according to qPCR. A comparison of the EDL933 genome published earlier (GenBank accession AE005174) with ours showed that the stx2 phage was completely missing in our strain, confirming the qPCR results.

These limits of detection and assembly by nanopore sequencing, in conjunction with the use of qPCR for screening the levels of STEC in the pre-enrichments, provide a clear advantage. By combining the qPCR result with the likelihood of genome closure by nanopore sequencing, we provide a useful tool for predicting when to pursue sequencing of DNA from a particular sample (Figure 4A). The tangible benefit of this combination will depend on the depth of analysis needed: detection versus characterization. While metagenomics has a lower detection limit than the BAM qPCR methodology (10^3 CFU/ml versus 10^5 CFU/ml) for STEC, it costs 1,000 times more per sample. The sensitivity of nanopore sequencing can partially be attributed to the proportion of the initial sample that is used. Nanopore sequencing uses approximately 3% of the initial sample (approx. 200 ng/rxn of 6 µg DNA extracted), while qPCR uses 0.02% (2 µl of a 1:10 dilution from 1 ml extraction, approx. 1 ng/rxn). qPCR provides a directed approach but is only as good as the primers included and can only detect the intended target. In the cases of Salmonella spp. or Listeria monocytogenes, detection by either qPCR or metagenomic analysis is beneficial given the FDA's zero-tolerance policy for these two microorganisms in foods. However, presumptive-positive samples would have to be culturally confirmed before any regulatory action could be taken to stop importation or interstate transport of a particular commodity (51) (https://www.fda.gov/media/102633/download; https://www.fda.gov/media/83177/download). On the other hand, if the goal is to fully characterize an organism using its complete genome, the methodology described herein will allow any laboratory to expedite the process of detection and characterization (Figure 4B). In the case of STECs, the entire genome is needed in order to make an informed decision about their potential health risk to humans. Many E. coli strains are not harmful to humans and pose no risk to public health. Thus, obtaining the completely closed genome of any potential STEC will provide an accurate characterization of all the virulence genes it possesses, allowing a prediction of potential health risk (25). In our study, the de novo assembly of the complete MAG for EDL933 (either in fragments or completely closed) was achieved in all samples with levels above 10^5 CFU/ml, which was almost equivalent to the limit of detection by qPCR, but with the added benefit of the complete genome characterization critically needed during outbreak and traceback investigations.

Mining for specific reads matching an organism of interest in metagenomic sequencing data is challenging. It requires assembling millions of reads, with the attendant cost in time and computing resources, which can affect the accuracy of the genome assembler employed (30,31,37,39,43). We took advantage of the WIMP workflow included in the EPI2ME cloud service (Oxford Nanopore), which records the classification of every read in a .csv file and downloads the classified reads into a pass folder. A script was written that separated the desired reads by taxa into a new folder. Assemblies produced using all reads were compared with those produced using only the reads filtered by taxa, and completely closed O157:H7 MAGs were obtained with filtered reads at the 10^7 and 10^8 CFU/ml levels (Tables 2 and 3). As expected, the assemblies produced with taxa-filtered reads were faster, more precise, and consumed fewer resources.
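
The read-separation step described above can be sketched as follows. This is not the script from Supplementary Note 1; it is a minimal illustration that assumes the classification table has columns named `read_id` and `name` (hypothetical names that may differ from the actual WIMP .csv layout) and that reads are stored as standard four-line FASTQ records.

```python
import csv
import io


def read_ids_for_taxon(csv_text, taxon, id_col="read_id", name_col="name"):
    """Collect IDs of reads whose classification name contains `taxon`.

    The column names are assumptions; adjust them to the real CSV header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[id_col] for row in reader
            if taxon.lower() in row[name_col].lower()}


def filter_fastq(fastq_text, keep_ids):
    """Return only the four-line FASTQ records whose read ID is in `keep_ids`."""
    lines = fastq_text.strip().splitlines()
    kept = []
    for start in range(0, len(lines), 4):
        record = lines[start:start + 4]
        # Header looks like '@<read_id> <optional description>'
        read_id = record[0][1:].split()[0]
        if read_id in keep_ids:
            kept.extend(record)
    return "\n".join(kept) + ("\n" if kept else "")


# Tiny in-memory example standing in for the WIMP table and the pass folder
classification = (
    "read_id,name\n"
    "r1,Escherichia coli\n"
    "r2,Klebsiella pneumoniae\n"
    "r3,Escherichia coli O157:H7\n"
)
reads = (
    "@r1 runid=x\nACGT\n+\nIIII\n"
    "@r2\nGGCC\n+\nIIII\n"
    "@r3\nTTAA\n+\nIIII\n"
)

ecoli_ids = read_ids_for_taxon(classification, "Escherichia coli")
ecoli_fastq = filter_fastq(reads, ecoli_ids)
print(ecoli_fastq, end="")
```

The filtered FASTQ would then be passed to the assembler in place of the full read set. Matching on a substring of the taxon name keeps strain-level classifications such as O157:H7 together with the species-level reads.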

Unlike other technologies, nanopore sequencing output is dependent on the quality of the DNA. Some nanopore metagenomics applications can be conducted directly from samples in which DNA extracts do not contain inhibitors and where the target organism(s) is present at a high enough concentration to be detected (36,37). However, STECs in irrigation water require further processing due to low initial concentrations and the presence of considerable humic acid and other unknown inhibitors. Cleaning of those DNAs resulted in loss and shearing of the DNA (Gonzalez-Escalona, unpublished results), with the consequent loss of both resolution and the capability of closing target genomes or MAGs. Hence, there is a need to increase the initial biomass of the target organism (STEC in our case) by an enrichment method that also minimizes the contaminants and improves the quality and quantity of the final DNA. By using the Oxford Nanopore Ligation Kit, we have maximized the potential output, which is preferable for metagenomic analyses. In the future we plan to validate these findings with the Rapid Kit, which would further decrease the time between sample processing and analysis, with the expectation of completely closing MAGs of organisms above 10^7 CFU/ml. During sample analysis, we noticed that 73% of the reads were below 5,000 bp. This could affect the closing of genomes of interest as well as the microbial profile or MAGs obtained from that sample. In our case, after removing those reads, the same microbial profile was observed, albeit with fewer reads per organism (results not shown). Our future plans include finding a better method for DNA extraction that could provide higher DNA recovery with less shearing, to maximize the potential of nanopore sequencing from enriched culture samples. Some authors have addressed DNA shearing during extraction and suggest that gentler bead-beating steps may yield less sheared high molecular weight (HMW) DNA, but might fail to extract DNA from the most difficult organisms (36).

By using our proposed pipeline, we were not only able to improve detection and characterization of our desired organism (STEC), but were also able to identify the bacterial species present in the un-spiked enriched irrigation water. This analysis showed that the most common microorganisms (>1% abundance) enriched by the BAM method belonged to the genera Enterobacter, Klebsiella, Pseudomonas, Acinetobacter, and Citrobacter. We also identified Salmonella, Escherichia, Serratia, Edwardsiella, Yersinia, and Cronobacter among the taxa at the 0.1% abundance level. De novo assembly of the long-read data resulted in 677 contigs, with most of these MAGs in a fragmented state, and we were not able to recover a completely closed genome. Nevertheless, we were able to recover the complete genome of Acinetobacter baumannii (~3.9 Mb) in 17 contigs (the longest contig being 3,784,399 bp), Enterobacter cloacae in 167 contigs, Klebsiella pneumoniae in 35 contigs, Citrobacter freundii in 21 contigs, and Pseudomonas putida in 214 contigs (Supplementary Table 4). We could not recover the MAGs for Enterobacter sp., Enterobacter kobei, Enterobacter hormaechei subsp. hormaechei, or Enterobacter hormaechei subsp. xiangfangensis strains, even though they were present in higher abundance than Acinetobacter baumannii. The most plausible explanation is that many different strains represented those species, which made it very hard to assemble their individual genomes. We did find some E. coli reads in the un-spiked enriched irrigation water sample (10,806 reads), suggesting the presence of E. coli in the original irrigation water sample at very low concentrations. However, the E. coli identified by in silico molecular serotyping matched the O9 serotype (Table 3), not the O157:H7 serotype of our spiked EDL933 strain, and no virulence genes were found by our in silico virulotyping (25). As mentioned by Leonard et al. (2015), finding other non-pathogenic E. coli in the original water sample reinforces the need to obtain complete genomes in order to assess the potential virulence of any E. coli strain (31). If we applied the same script to other organisms with abundance above 3%, using different taxa to filter their reads, we could potentially close those genomes as well. This opens up a very attractive way of obtaining closed MAGs from metagenomic samples, similar to what was obtained previously for fecal samples (36).

In addition to taxa identification, another advantage of our proposed pipeline was that we could survey the microbial community for the presence of AMR genes in the un-spiked enriched water sample. The Antimicrobial Resistance workflow in EPI2ME (Oxford Nanopore) provides AMR gene detection and identifies the organism carrying each AMR gene based on the WIMP classification of the read. This specific result would be hard to achieve using short reads. AMR genes found in the irrigation water sample included several beta-lactamase and efflux pump genes that confer antibiotic resistance in Klebsiella pneumoniae, Enterobacter cloacae, Citrobacter freundii, Acinetobacter baumannii, and Enterobacter hormaechei. The qnrB23 gene variant (29 reads matching), which confers fluoroquinolone resistance, was detected in Citrobacter freundii. Finally, besides AMR genes, we observed a point mutation that confers resistance to colistin in Klebsiella pneumoniae (PhoP gene mutation with 977 reads matching, 855X coverage). Antibiotic-resistant bacteria in humans have been linked to food sources (52), making the presence of these AMR genes in known human pathogens such as Klebsiella pneumoniae and Acinetobacter baumannii worrisome. Acinetobacter has recently been shown to use killing-enhanced horizontal gene transfer (53), which warrants further study given the high number of AMR genes present in this sample. Additionally, as the soil filters and concentrates the bacteria in the irrigation water, the risk associated with human consumption increases (54). National and international programs, such as the National Antimicrobial Resistance Monitoring System (NARMS - https://www.cdc.gov/narms/index.html), the One Health approach (https://www.cdc.gov/onehealth/index.html), and the Global AMR Surveillance System (GLASS - https://www.who.int/glass/en/), use the resources of the CDC, USDA, FDA, and WHO to monitor and report the prevalence of, and distribute regulatory guidance on, antimicrobial resistance in pathogenic and commensal bacteria in food and food animals (52,55). Our pipeline could be an important screening tool to enhance future testing.

Conclusions:

Overall, we tested the limits of detection and assembly for EDL933 O157:H7 in enriched irrigation water using a shotgun long-read sequencing approach. We determined that the limit of detection was 10^3 CFU/ml. We showed that if the levels of the target organism were above 10^5 CFU/ml, the complete genome could be assembled into several fragments, aided by filtering the reads by taxon. This fragmented genome is sufficient for a complete characterization of the STEC strain, including serotype and virulotype. We also determined that the detection limit of the BAM STEC qPCR (10^5 CFU/ml) coincided with our STEC assembly limit by nanopore sequencing. Therefore, we recommend a combination approach using qPCR and nanopore sequencing. In the screening stage, qPCR can provide both detection and an estimate of the CFU/ml concentration, which could predict whether subsequent nanopore sequencing will produce enough data to obtain a complete MAG of the target organism, either closed or fragmented. It also provides information on how many samples can be run in a single flow cell. In our case we used one sample per flow cell, but if levels were > 10^7 CFU/ml we could use 3 - 4 samples per flow cell, reducing the sequencing cost per sample. We expect that the use of this pipeline could enhance the capacity of public health entities to respond faster and more accurately during outbreak and traceback investigations. Moreover, further advances in nanopore long-read sequencing accuracy will improve the quality of these MAGs, allowing their use for phylogenetic analysis and providing a myriad of applications in metagenomics.
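The thresholds reported in the Conclusions can be read as a simple triage rule. The following is a minimal sketch of that reading, not the authors' software: the function name `triage`, the constant names, and the handling of values exactly at a threshold are assumptions made for illustration. Only the three CFU/ml values come from the text.

```python
# Thresholds taken from the Conclusions; everything else is illustrative.
LOD_CFU_PER_ML = 1e3        # nanopore detection limit
ASSEMBLY_CFU_PER_ML = 1e5   # fragmented-assembly limit (also the qPCR detection limit)
MULTIPLEX_CFU_PER_ML = 1e7  # above this, 3 - 4 samples per flow cell are suggested


def triage(estimated_cfu_per_ml: float) -> str:
    """Map an estimated target concentration to an expected sequencing outcome.

    Boundary handling (>= versus >) is an assumption; the text only states
    the order of magnitude of each limit.
    """
    if estimated_cfu_per_ml > MULTIPLEX_CFU_PER_ML:
        return "assembly expected; 3 - 4 samples per flow cell may be possible"
    if estimated_cfu_per_ml >= ASSEMBLY_CFU_PER_ML:
        return "assembly expected; one sample per flow cell"
    if estimated_cfu_per_ml >= LOD_CFU_PER_ML:
        return "detection only"
    return "below detection limit"


if __name__ == "__main__":
    # Inoculation levels listed in Table 2.
    for level in (7.3e8, 7.3e7, 7.3e6, 7.3e5, 7.3e4, 7.3e3):
        print(f"{level:.1e} CFU/ml -> {triage(level)}")
```

Under these assumptions, the six inoculation levels in Table 2 fall into the same groups the tables report: assembly for the four highest levels and detection only for the two lowest.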


ACKNOWLEDGEMENTS

The study was supported by funding from the MCMi Challenge Grants Program Proposal #2018-646 and the FDA Foods Program Intramural Funds. Dr. Maguire acknowledges a Research Fellowship Program (ORISE) for the Center for Food Safety and Applied Nutrition administered by the Oak Ridge Associated Universities through a contract with the U.S. Food and Drug Administration.


REFERENCES

1. Beutin, L. and A. Martin. 2012. Outbreak of Shiga toxin-producing Escherichia coli (STEC) O104:H4 infection in Germany causes a paradigm shift with regard to human pathogenicity of STEC strains. J. Food Prot. 75:408-418.
2. Mellmann, A., M. Bielaszewska, R. Kock, A. W. Friedrich, A. Fruth, B. Middendorf, D. Harmsen, M. A. Schmidt, and H. Karch. 2008. Analysis of collection of hemolytic uremic syndrome-associated enterohemorrhagic Escherichia coli. Emerg. Infect. Dis. 14:1287-1290.
3. Tarr, P. I., C. A. Gordon, and W. L. Chandler. 2005. Shiga-toxin-producing Escherichia coli and haemolytic uraemic syndrome. Lancet 365:1073-1086.
4. Gonzalez-Escalona, N., J. Meng, and M. P. Doyle. 2019. Shiga toxin-producing Escherichia coli, p. 289-315. In: M. P. Doyle, F. Diez-Gonzalez, and C. Hill (eds.), Food Microbiology: Fundamentals And Frontiers. 5th ed. ASM press, Washington DC.
5. Mead, P. S., L. Slutsker, V. Dietz, L. F. McCaig, J. S. Bresee, C. Shapiro, P. M. Griffin, and R. V. Tauxe. 1999. Food-related illness and death in the United States. Emerg. Infect. Dis. 5:607-625.
6. Allos, B. M., M. R. Moore, P. M. Griffin, and R. V. Tauxe. 2004. Surveillance for sporadic foodborne disease in the 21st century: the FoodNet perspective. Clin. Infect. Dis. 38 Suppl 3:S115-S120.
7. Scallan, E., R. M. Hoekstra, F. J. Angulo, R. V. Tauxe, M. A. Widdowson, S. L. Roy, J. L. Jones, and P. M. Griffin. 2011. Foodborne illness acquired in the United States--major pathogens. Emerg. Infect. Dis. 17:7-15.


8. Brooks, J. T., E. G. Sowers, J. G. Wells, K. D. Greene, P. M. Griffin, R. M. Hoekstra, and N. A. Strockbine. 2005. Non-O157 Shiga toxin-producing Escherichia coli infections in the United States, 1983-2002. J. Infect. Dis. 192:1422-1429.
9. Sivapalasingam, S., C. R. Friedman, L. Cohen, and R. V. Tauxe. 2004. Fresh produce: a growing cause of outbreaks of foodborne illness in the United States, 1973 through 1997. J. Food Prot. 67:2342-2353.
10. Tack, D. M., L. Ray, P. M. Griffin, P. R. Cieslak, J. Dunn, T. Rissman, R. Jervis, S. Lathrop, A. Muse, M. Duwell, K. Smith, M. Tobin-D'Angelo, D. J. Vugia, K. J. Zablotsky, B. J. Wolpert, R. Tauxe, and D. C. Payne. 2020. Preliminary Incidence and Trends of Infections with Pathogens Transmitted Commonly Through Food - Foodborne Diseases Active Surveillance Network, 10 U.S. Sites, 2016-2019. MMWR. Morb. Mortal. Wkly. Rep. 69:509-514.
11. Bottichio, L., A. Keaton, D. Thomas, T. Fulton, A. Tiffany, A. Frick, M. Mattioli, A. Kahler, J. Murphy, M. Otto, A. Tesfai, A. Fields, K. Kline, J. Fiddner, J. Higa, A. Barnes, F. Arroyo, A. Salvatierra, A. Holland, W. Taylor, J. Nash, B. M. Morawski, S. Correll, R. Hinnenkamp, J. Havens, K. Patel, M. N. Schroeder, L. Gladney, H. Martin, L. Whitlock, N. Dowell, C. Newhart, L. F. Watkins, V. Hill, S. Lance, S. Harris, M. Wise, I. Williams, C. Basler, and L. Gieraltowski. 2019. Shiga Toxin-Producing Escherichia coli Infections Associated With Romaine Lettuce - United States, 2018. Clinical Infectious Diseases.


12. Fischer, M., A. Bourner, and G. Plunkett. 2015. Outbreak alert! 2015: a review of foodborne illness in the US from 2004-2013. Washington DC, Center for Science in the Public Interest.
13. Olaimat, A. N. and R. A. Holley. 2012. Factors influencing the microbial safety of fresh produce: a review. Food Microbiol. 32:1-19.
14. Rangel, J. M., P. H. Sparling, C. Crowe, P. M. Griffin, and D. L. Swerdlow. 2005. Epidemiology of Escherichia coli O157:H7 outbreaks, United States, 1982-2002. Emerg. Infect. Dis. 11:603-609.
15. Steele, M. and J. Odumeru. 2004. Irrigation water as source of foodborne pathogens on fruit and vegetables. J. Food Prot. 67:2839-2849.
16. Uyttendaele, M., L. A. Jaykus, P. Amoah, A. Chiodini, D. Cunliffe, L. Jacxsens, K. Holvoet, L. Korsten, M. Lau, P. McClure, G. Medema, I. Sampers, and P. Rao Jasti. 2015. Microbial Hazards in Irrigation Water: Standards, Norms, and Testing to Manage Use of Water in Fresh Produce Primary Production. Comprehensive Reviews in Food Science and Food Safety 14:336-356.
17. Leaman, S., J. Gorny, D. Wetherington, and H. Bekris. 2014. Agricultural Water. Center for Produce Safety Five Year Research Review.
18. Monaghan, J. M. and M. L. Hutchison. 2012. Distribution and decline of human pathogenic bacteria in soil after application in irrigation water and the potential for soil-splash-mediated dispersal onto fresh produce. J. Appl. Microbiol. 112:1007-1019.


19. Oliveira, M., I. Vinas, J. Usall, M. Anguera, and M. Abadias. 2012. Presence and survival of Escherichia coli O157:H7 on lettuce leaves and in soil treated with contaminated compost and irrigation water. Int. J. Food Microbiol. 156:133-140.
20. Allende, A. and J. Monaghan. 2015. Irrigation Water Quality for Leafy Crops: A Perspective of Risks and Potential Solutions. Int. J. Environ. Res. Public Health 12:7457-7477.
21. Allard, M. W., R. Bell, C. M. Ferreira, N. Gonzalez-Escalona, M. Hoffmann, T. Muruvanda, A. Ottesen, P. Ramachandran, E. Reed, S. Sharma, E. Stevens, R. Timme, J. Zheng, and E. W. Brown. 2018. Genomics of foodborne pathogens for microbial food safety. Curr. Opin. Biotechnol. 49:224-229.
22. Bergholz, T. M., A. I. Moreno Switt, and M. Wiedmann. 2014. Omics approaches in food safety: fulfilling the promise? Trends Microbiol. 22:275-281.
23. Sekse, C., A. Holst-Jensen, U. Dobrindt, G. S. Johannessen, W. Li, B. Spilsberg, and J. Shi. 2017. High Throughput Sequencing for Detection of Foodborne Pathogens. Front. Microbiol. 8:2029.
24. Gonzalez-Escalona, N., M. A. Allard, E. W. Brown, S. Sharma, and M. Hoffmann. 2019. Nanopore sequencing for fast determination of plasmids, phages, virulence markers, and antimicrobial resistance genes in Shiga toxin-producing Escherichia coli. PLoS One 14:e0220494.
25. Gonzalez-Escalona, N. and J. A. Kase. 2019. Virulence gene profiles and phylogeny of shiga toxin-positive Escherichia coli strains isolated from FDA regulated foods during 2010-2017. PLoS One 14:e0214620.


26. Hoffmann, M., Y. Luo, S. R. Monday, N. Gonzalez-Escalona, A. R. Ottesen, T. Muruvanda, C. Wang, G. Kastanis, C. Keys, D. Janies, I. F. Senturk, U. V. Catalyurek, H. Wang, T. S. Hammack, W. J. Wolfgang, D. Schoonmaker-Bopp, A. Chu, R. Myers, J. Haendiges, P. S. Evans, J. Meng, E. A. Strain, M. W. Allard, and E. W. Brown. 2016. Tracing Origins of the Salmonella Bareilly Strain Causing a Food-borne Outbreak in the United States. J. Infect. Dis. 213:502-508.
27. Gonzalez-Escalona, N., M. Toro, L. V. Rump, G. Cao, T. G. Nagaraja, and J. Meng. 2016. Virulence gene profiles and clonal relationships of Escherichia coli O26:H11 isolates from feedlot cattle as determined by whole-genome sequencing. Appl. Environ. Microbiol. 82:3900-3912.
28. Kaper, J. B., J. P. Nataro, and H. L. Mobley. 2004. Pathogenic Escherichia coli. Nat. Rev. Microbiol. 2:123-140.
29. Garmendia, J., G. Frankel, and V. F. Crepin. 2005. Enteropathogenic and enterohemorrhagic Escherichia coli infections: translocation, translocation, translocation. Infect. Immun. 73:2573-2585.
30. Leonard, S. R., M. K. Mammel, D. W. Lacher, and C. A. Elkins. 2016. Strain-Level Discrimination of Shiga Toxin-Producing Escherichia coli in Spinach Using Metagenomic Sequencing. PLoS One 11:e0167870.
31. Leonard, S. R., M. K. Mammel, D. W. Lacher, and C. A. Elkins. 2015. Application of metagenomic sequencing to food safety: detection of Shiga Toxin-producing Escherichia coli on fresh bagged spinach. Appl. Environ. Microbiol. 81:8183-8191.


32. Lusk, P. T., P. Ramachandran, E. Reed, J. A. Kase, and A. Ottesen. 2018. Metagenomic Description of Preenrichment and Postenrichment of Recalled Chapati Atta Flour Using a Shotgun Sequencing Approach. Genome Announc. 6.
33. Ottesen, A., P. Ramachandran, Y. Chen, E. Brown, E. Reed, and E. Strain. 2020. Quasimetagenomic source tracking of Listeria monocytogenes from naturally contaminated ice cream. BMC Infect. Dis. 20:83.
34. Kovac, J., H. d. Bakker, L. M. Carroll, and M. Wiedmann. 2017. Precision food safety: A systems approach to food safety facilitated by genomics tools. TrAC Trends in Analytical Chemistry 96:52-61.
35. Gigliucci, F., F. A. B. von Meijenfeldt, A. Knijn, V. Michelacci, G. Scavia, F. Minelli, B. E. Dutilh, H. M. Ahmad, G. C. Raangs, A. W. Friedrich, J. W. A. Rossen, and S. Morabito. 2018. Metagenomic characterization of the human intestinal microbiota in fecal samples from STEC-infected patients. Front. Cell Infect. Microbiol. 8:25.
36. Moss, E. L., D. G. Maghini, and A. S. Bhatt. 2020. Complete, closed bacterial genomes from microbiomes using nanopore sequencing. Nat. Biotechnol. 38:701-707.
37. Nicholls, S. M., J. C. Quick, S. Tang, and N. J. Loman. 2019. Ultra-deep, long-read nanopore sequencing of mock microbial community standards. Gigascience 8.

38. Boykin, L. M., P. Sseruwagi, T. Alicai, E. Ateka, I. U. Mohammed, J. L. Stanton, C. Kayuki, D. Mark, T. Fute, J. Erasto, H. Bachwenkizi, B. Muga, N. Mumo, J. Mwangi, P. Abidrabo, G. Okao-Okuja, G. Omuut, J. Akol, H. B. Apio, F. Osingada, M. A. Kehoe, D. Eccles, A. Savill, S. Lamb, T. Kinene, C. B. Rawle, A. Muralidhar, K. Mayall, F. Tairo, and J. Ndunguru. 2019. Tree Lab: portable genomics for early detection of plant viruses and pests in Sub-Saharan Africa. Genes (Basel) 10:632.

39. Grutzke, J., B. Malorny, J. A. Hammerl, A. Busch, S. H. Tausch, H. Tomaso, and C. Deneke. 2019. Fishing in the Soup - Pathogen Detection in Food Safety Using Metabarcoding and Metagenomic Sequencing. Front. Microbiol. 10:1805.
40. Bertrand, D., J. Shaw, M. Kalathiyappan, A. H. Q. Ng, M. S. Kumar, C. Li, M. Dvornicic, J. P. Soldo, J. Y. Koh, C. Tong, O. T. Ng, T. Barkham, B. Young, K. Marimuthu, K. R. Chng, M. Sikic, and N. Nagarajan. 2019. Hybrid metagenomic assembly enables high-resolution analysis of resistance determinants and mobile elements in human microbiomes. Nat. Biotechnol. 37:937-944.
41. Gu, G., A. Ottesen, S. Bolten, Y. Luo, S. Rideout, and X. Nou. 2020. Microbiome convergence following sanitizer treatment and identification of sanitizer resistant species from spinach and lettuce rinse water. Int. J. Food Microbiol. 318:108458.
42. Suttner, B., E. R. Johnston, L. H. Orellana, R. Rodriguez, J. K. Hatt, D. Carychao, M. Q. Carter, M. B. Cooley, and K. T. Konstantinidis. 2020. Metagenomics as a Public Health Risk Assessment Tool in a Study of Natural Creek Sediments Influenced by Agricultural and Livestock Runoff: Potential and Limitations. Appl. Environ. Microbiol. 86(6):e02525-19. doi: 10.1128/AEM.02525-19.


43. Loman, N. J., J. Quick, and J. T. Simpson. 2015. A complete bacterial genome assembled de novo using only nanopore sequencing data. Nat. Methods 12:733-735.
44. Juul, S., F. Izquierdo, A. Hurst, X. Dai, A. Wright, E. Kulesha, R. Pettett, and D. J. Turner. 2015. What's in my pot? Real-time species identification on the MinION. bioRxiv 030742.
45. Kim, D., L. Song, F. P. Breitwieser, and S. L. Salzberg. 2016. Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Res. 26:1721-1729.
46. Kolmogorov, M., J. Yuan, Y. Lin, and P. A. Pevzner. 2019. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37:540-546.
47. Wood, D. E. and S. L. Salzberg. 2014. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 15:R46.
48. Afgan, E., D. Baker, B. Batut, M. van den Beek, D. Bouvier, M. Cech, J. Chilton, D. Clements, N. Coraor, B. A. Gruning, A. Guerler, J. Hillman-Jackson, S. Hiltemann, V. Jalili, H. Rasche, N. Soranzo, J. Goecks, J. Taylor, A. Nekrutenko, and D. Blankenberg. 2018. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update. Nucleic Acids Res. 46:W537-W544.
49. Darling, A. C. E., B. Mau, F. R. Blattner, and N. T. Perna. 2004. Mauve: Multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 14:1394-1403.


50. Alcock, B. P., A. R. Raphenya, T. T. Y. Lau, K. K. Tsang, M. Bouchard, A. Edalatmand, W. Huynh, A. V. Nguyen, A. A. Cheng, S. Liu, S. Y. Min, A. Miroshnichenko, H. K. Tran, R. E. Werfalli, J. A. Nasir, M. Oloni, D. J. Speicher, A. Florescu, B. Singh, M. Faltyn, A. Hernandez-Koutoucheva, A. N. Sharma, E. Bordeleau, A. C. Pawlowski, H. L. Zubyk, D. Dooley, E. Griffiths, F. Maguire, G. L. Winsor, R. G. Beiko, F. S. L. Brinkman, W. W. L. Hsiao, G. V. Domselaar, and A. G. McArthur. 2020. CARD 2020: antibiotic resistome surveillance with the comprehensive antibiotic resistance database. Nucleic Acids Res. 48:D517-D525.
51. Archer, D. L. 2018. The evolution of FDA's policy on Listeria monocytogenes in ready-to-eat foods in the United States. Current Opinion in Food Science 20:64-68.
52. Karp, B. E., H. Tate, J. R. Plumblee, U. Dessai, J. M. Whichard, E. L. Thacker, K. R. Hale, W. Wilson, C. R. Friedman, P. M. Griffin, and P. F. McDermott. 2017. National Antimicrobial Resistance Monitoring System: Two Decades of Advancing Public Health Through Integrated Surveillance of Antimicrobial Resistance. Foodborne Pathog. Dis. 14:545-557.
53. Cooper, R. M., L. Tsimring, and J. Hasty. 2017. Inter-species population dynamics enhance microbial horizontal gene transfer and spread of antibiotic resistance. Elife 6.
54. Holzel, C. S., J. L. Tetens, and K. Schwaiger. 2018. Unraveling the Role of Vegetables in Spreading Antimicrobial-Resistant Bacteria: A Need for Quantitative Risk Assessment. Foodborne Pathog. Dis. 15:671-688.


55. Thakur, S. and G. C. Gray. 2019. The Mandate for a Global "One Health" Approach to Antimicrobial Resistance Surveillance. Am. J. Trop. Med. Hyg. 2018/12/27:227-228.


TABLES

Table 1. Bacterial community analysis of enriched irrigation water by WIMP. Only species with frequencies above 1% are shown.

Species                        Reads        % Reads^a
Klebsiella pneumoniae          1,215,922    28.52
Enterobacter cloacae             902,950    21.18
Enterobacter sp.                 286,190     6.71
Enterobacter kobei               278,513     6.53
Pseudomonas putida               165,323     3.88
Citrobacter freundii             164,595     3.86
Acinetobacter baumannii          157,421     3.69
Enterobacter hormaechei          145,839     3.42
Enterobacter xiangfangensis       52,058     1.22

^a % Reads were calculated as the proportion of total reads classified by WIMP software analysis.


Table 2. Virulence of the Flye assemblies obtained with all reads.

Sample           Inoculation level (CFU/ml)   Serotype^a     stx type   eae type   Contig No. (genome and plasmid)^b
Water            0                            O9             -          -          677
Water+Ecoli1     7.3 x 10^8                   O157:H7        1a         gamma-1    555 (3+)
Water+Ecoli2     7.3 x 10^7                   O157:H7        1a         gamma-1    627 (4+)
Water+Ecoli3     7.3 x 10^6                   O157:H7/O9     1a         gamma-1    753 (35+)
Water+Ecoli4^c   7.3 x 10^5                   O157:H7/O9     1a         gamma-1    780 (80+)
Water+Ecoli5     7.3 x 10^4                   O9             -          -
Water+Ecoli6^d   7.3 x 10^3                   O9             -          -          644

^a In silico serotype using DTU.
^b In parentheses, the number of contigs that contain the entire chromosome; a plus sign (+) indicates the presence of a contig that matched the EDL933 circular plasmid.
^c Fragmented genome assembly limit.
^d E. coli O157 detection limit (68 reads matching O157 by WIMP classification).


Table 3. Virulence of the Flye assemblies obtained with the WIMP E. coli extracted reads.

Sample^a         WIMP E. coli reads   Serotype^b     stx type   eae type   Contig No. (genome and plasmid)^c
Water            10,806               O9             -          -          31
Water+Ecoli1     1,659,463            O157:H7        1a         gamma-1    44 (1+)
Water+Ecoli2     432,649              O157:H7        1a         gamma-1    40 (1+)
Water+Ecoli3     73,783               O157:H7/O9     1a         gamma-1    41 (8+)
Water+Ecoli4^d   17,203               O157:H7/O9     1a         gamma-1    92 (63+)
Water+Ecoli5     10,086               O9             -          -          28
Water+Ecoli6^e   8,515                O9             -          -          24

^a CFU/ml levels of EDL933 inoculation can be found in Table 2.
^b In silico serotype using DTU.
^c In parentheses, the number of contigs that contain the entire chromosome; a plus sign (+) indicates the presence of a contig that matched the EDL933 circular plasmid.
^d Fragmented genome assembly limit.
^e E. coli O157 detection limit (31 reads matching O157 by Kraken classification).


Table 4. In silico detection of virulence genes in Flye assemblies with all nanopore reads and E. coli extracted reads. Columns 1 to 6 correspond to samples Water+Ecoli1 to Water+Ecoli6.

                              All reads                       E. coli extracted reads
Virulence gene   EDL933 ref.  Water  1  2  3  4  5  6         Water  1  2  3  4  5  6
astA             +            -      +  +  +  +  -  -         -      +  +  +  +  -  -
ehxA             +            -      +  +  +  +  -  -         -      +  +  +  +  -  -
espA             +            -      +  +  +  +  -  -         -      +  +  +  +  -  -
espB             +            -      +  +  +  +  -  -         -      +  +  +  +  -  -
espF             +            -      +  +  +  +  -  -         -      +  +  +  +  -  -
espJ             +            -      +  +  +  +  -  -         -      +  +  +  +  -  -
espK             +            -      +  +  +  +  -  -         -      +  +  +  +  -  -
espP             +            -      +  +  +  +  -  -         -      +  +  +  +  -  -
tccP             +            -      +  +  -  +  -  -         -      +  +  +  +  -  -
etpD             +            -      +  +  +  +  -  -         -      +  +  +  +  -  -
gad              +            -      +  +  +  +  -  -         -      +  +  +  +  -  -
iha              +            -      +  +  +  +  -  -         -      +  +  +  +  -  -
iss              +            -      +  +  +  +  -  -         -      +  +  +  +  -  -
nleA             +            -      +  +  +  +  -  -         -      +  +  +  +  -  -
nleB             +            -      +  +  +  +  -  -         -      +  +  +  +  -  -
nleC             +            -      +  +  +  +  -  -         -      +  +  +  +  -  -
tir              +            -      +  +  +  +  -  -         -      +  +  +  +  -  -
katP             +            -      +  +  +  +  -  -         -      +  +  +  +  -  -
pssA             +            -      +  +  +  -  -  -         -      +  +  +  +  -  -
air              +            -      +  +  +  -  -  -         -      +  +  +  +  -  -
toxB             +            -      +  +  +  +  -  -         -      +  +  +  +  -  -
ecf1             +            -      +  +  +  +  -  -         -      +  +  +  +  -  -
IEE              +            -      +  +  +  +  -  -         -      +  +  +  +  -  -

+ Virulence gene detected
- Virulence gene not detected


FIGURES

Figure 1. Determination of the detection limit of the qPCR assay. Calibration curves generated using 10-fold dilutions of DNA standards for E. coli EDL933 (top) detecting the wzy (A) and stx1 (B) genes. The Cq values were plotted against the log-scale CFU per reaction target concentration (bottom). The R^2 values and reaction efficiency (E) are also shown.
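As an illustration of how the R^2 and efficiency values on such a calibration curve are commonly obtained, the sketch below fits Cq against log10(CFU per reaction) by ordinary least squares and applies the standard relation E = 10^(-1/slope) - 1. The function name `fit_standard_curve` and the data points are hypothetical; they are not the measurements behind Figure 1.

```python
def fit_standard_curve(log10_cfu, cq):
    """Least-squares fit of Cq versus log10(CFU per reaction).

    Returns (slope, intercept, r_squared, efficiency), where
    efficiency = 10 ** (-1 / slope) - 1 (1.0 corresponds to 100%).
    """
    n = len(log10_cfu)
    mean_x = sum(log10_cfu) / n
    mean_y = sum(cq) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_cfu)
    syy = sum((y - mean_y) ** 2 for y in cq)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_cfu, cq))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    efficiency = 10 ** (-1 / slope) - 1
    return slope, intercept, r_squared, efficiency


if __name__ == "__main__":
    # Hypothetical 10-fold dilution series (NOT data from this study).
    log10_cfu = [1, 2, 3, 4, 5]
    cq = [35.0 - 3.32 * (x - 1) for x in log10_cfu]
    slope, intercept, r2, eff = fit_standard_curve(log10_cfu, cq)
    print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r2:.4f} E={eff:.1%}")
```

A slope near -3.32 corresponds to an efficiency near 100%, i.e. the amplicon roughly doubles each cycle.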


Figure 2. Flow diagram of artificial contamination of enriched, STEC-negative irrigation water through analysis by nanopore sequencing.

[Flow diagram labels: E. coli O157:H7 EDL933 24h culture (~10^9); serial dilutions (10-fold); STEC-negative enriched irrigation water (24h); Spike-In (1mL water + 1mL diluted EDL933); Plate to count CFU (24h); DNA Extraction; Oxford Nanopore Sequencing]
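The 10-fold dilution series in this workflow can be written out as a short calculation. This is a minimal sketch: the helper name `tenfold_series` is hypothetical, and the starting value of 7.3 x 10^8 CFU/ml is the highest inoculation level listed in Table 2.

```python
def tenfold_series(start_cfu_per_ml: float, steps: int) -> list[float]:
    """Return `steps` concentrations, each 10-fold lower than the previous one."""
    return [start_cfu_per_ml / 10 ** k for k in range(steps)]


if __name__ == "__main__":
    # Six spiked samples, Water+Ecoli1 through Water+Ecoli6 (levels as in Table 2).
    for i, level in enumerate(tenfold_series(7.3e8, 6), start=1):
        print(f"Water+Ecoli{i}: {level:.1e} CFU/ml")
```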


Figure 3. Relative abundance of bacterial species associated with irrigation water unspiked and spiked with E. coli EDL933. Enriched irrigation water (Water) was artificially contaminated with 10-fold dilutions of E. coli EDL933 (+Ecoli) with a starting concentration of 7 x 10^8 CFU/ml (Water+Ecoli1). Reads were analyzed by the EPI2ME WIMP workflow. Bacterial species contributing more than 1% of the classified reads are shown and the sum of the remaining species identified are included as “Other.”


Figure 4. STEC detection and classification by combined qPCR and nanopore sequencing approach. A) Direct comparison of quantitative qPCR detection with de novo assembly limits by nanopore sequencing informs detection and classification of STECs. B) Pipeline for detection and classification of STECs in enriched irrigation water using nanopore sequencing and EPI2ME cloud-based services to identify reads of a desired taxon for de novo assembly with Flye and in silico analysis.
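The read-extraction step in panel B (keeping only reads assigned to the taxon of interest before assembly) can be sketched in plain Python. This is an illustration under stated assumptions, not the EPI2ME export format or the authors' scripts: the column names `read_id` and `name`, both function names, and the four-line FASTQ layout are hypothetical choices for the example.

```python
import csv
import io


def select_read_ids(classification_csv: str, taxon: str,
                    id_col: str = "read_id", taxon_col: str = "name") -> set[str]:
    """Collect IDs of reads whose classification matches `taxon`.

    Column names are assumptions; adjust them to the actual classifier output.
    """
    reader = csv.DictReader(io.StringIO(classification_csv))
    return {row[id_col] for row in reader if row[taxon_col] == taxon}


def filter_fastq(fastq_text: str, keep_ids: set[str]) -> str:
    """Keep FASTQ records (assumed to be exactly 4 lines each) whose ID is in keep_ids."""
    lines = fastq_text.strip().splitlines()
    kept = []
    for i in range(0, len(lines), 4):
        record = lines[i:i + 4]
        read_id = record[0][1:].split()[0]  # drop leading '@' and any description
        if read_id in keep_ids:
            kept.extend(record)
    return "\n".join(kept) + ("\n" if kept else "")


if __name__ == "__main__":
    # Tiny made-up inputs for demonstration only.
    table = "read_id,name\nr1,Escherichia coli\nr2,Klebsiella pneumoniae\n"
    reads = "@r1\nACGT\n+\n!!!!\n@r2\nGGGG\n+\n!!!!\n"
    print(filter_fastq(reads, select_read_ids(table, "Escherichia coli")), end="")
```

The filtered FASTQ would then be passed to the assembler; that step is not shown here.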
