Using long native reads to partition and assemble genomes from complex metagenomic samples
Long PCR-free nanopore reads allow partitioning and assembly of individual genomes from complex mixtures of different organisms, using several different bioinformatics approaches
© 2019 Oxford Nanopore Technologies. All rights reserved. Oxford Nanopore Technologies’ products are currently for research use only. P17004 - Version 7.0
Fig. 1 De novo metagenomic assembly: a) laboratory workflow; b) typical bioinformatics pipeline
Long reads provide more genomic context, improving assembly from complex samples
Fig. 2 Assembly by coverage binning: a) workflow; b–e) performance on Zymo mock community
Binning using differential coverage profiles to improve assembly contiguity
Fig. 1a workflow: total DNA extracted from complex sample and long-fragment library prepared; library sequenced and long reads obtained; genome sequences reconstructed by metagenomic assembly of long reads.
Contact: [email protected] More information at: www.nanoporetech.com and publications.nanoporetech.com
The majority of microbes cannot be cultured in the laboratory, so the most direct way to derive whole-genome sequences from complex mixtures of organisms is metagenomic assembly, in which all genomes in the sample are assembled together (Fig. 1a). Such mixtures often contain many similar genomes at different levels of abundance, which frequently leads to misassembly. A common approach to this problem is to bin reads into subsets that ideally each represent a single genome, and then to assemble each bin individually. Long reads help by increasing the sensitivity and specificity of binning strategies and by providing longer overlaps for assembly. An example of such a workflow is shown in Fig. 1b.
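The pipeline in Fig. 1b starts by filtering reads on length (> 2 kb) and Q score (> 7). The following is a minimal sketch of such a filter, not the tool used for the poster. It assumes reads are already parsed into (name, sequence, quality) tuples with Phred+33 quality strings, and it assumes the mean Q score is derived from the mean error probability, which is one common convention. The function names are hypothetical.

```python
import math


def mean_qscore(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a read, computed via the mean error probability.

    Assumption: quality is a non-empty Phred+33 encoded string.
    """
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_reads(records, min_length=2000, min_q=7.0):
    """Yield (name, seq, qual) records with length > min_length and mean Q > min_q.

    Assumption: each quality string has the same length as its sequence.
    """
    for name, seq, qual in records:
        if len(seq) > min_length and mean_qscore(qual) > min_q:
            yield name, seq, qual


if __name__ == "__main__":
    reads = [
        ("long_good", "A" * 2500, "5" * 2500),   # Q20, 2.5 kb: kept
        ("short_good", "A" * 1500, "5" * 1500),  # too short: dropped
        ("long_poor", "A" * 2500, "%" * 2500),   # Q4: dropped
    ]
    print([name for name, _, _ in filter_reads(reads)])
```

The thresholds default to the values shown in the figure; both are parameters so that other cut-offs can be tried.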
For metagenomic samples where the microbial genomes are not well represented in reference databases, differences in organism abundance across samples can be exploited as a binning strategy (Fig. 2a). We used three different extraction protocols on the Zymo mock community to create different genome abundances (Fig. 2b). We aligned reads from each sample to contigs assembled from the combined set of samples, and used aligned read depth to measure contig abundance in each sample (Fig. 2c). This allows contigs to be binned by matching abundance profiles (Fig. 2d). Finally, the initial bins can be refined by taking all reads that align to the contigs in a bin and conducting a second, bin-specific assembly (Fig. 2e).
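The idea of binning contigs by matching abundance profiles can be illustrated with a small sketch. This is not the algorithm used by metabat2; it is a simplified greedy grouping that assumes the per-sample aligned base counts for each contig are already available, and it uses cosine similarity with a hypothetical threshold of 0.99 to decide whether two profiles match. All function names are illustrative.

```python
import math


def coverage_profile(aligned_bases, contig_length):
    """Mean read depth of one contig in each sample (aligned bases / contig length)."""
    return [bases / contig_length for bases in aligned_bases]


def cosine(u, v):
    """Cosine similarity of two depth profiles; 0.0 if either profile is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def bin_contigs(profiles, min_similarity=0.99):
    """Greedily group contigs whose profile matches the first contig of a bin.

    profiles: dict mapping contig name -> list of depths, one per sample.
    Returns a list of bins, each a list of contig names.
    """
    bins = []  # each entry is (seed_profile, member_names)
    for name, profile in profiles.items():
        for seed, members in bins:
            if cosine(seed, profile) >= min_similarity:
                members.append(name)
                break
        else:
            # No existing bin matched, so this contig seeds a new bin.
            bins.append((profile, [name]))
    return [members for _, members in bins]


if __name__ == "__main__":
    profiles = {
        "contig1": [10, 50, 5],
        "contig2": [20, 100, 10],  # same shape as contig1, twice the depth
        "contig3": [40, 4, 40],
        "contig4": [38, 5, 41],    # close to contig3
    }
    print(bin_contigs(profiles))
```

Cosine similarity compares the shape of a profile across samples rather than its absolute depth, so two contigs from the same genome can match even if one is covered more deeply than the other.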
Fig. 3 Partitioning native bacterial reads using strain-specific methylation patterns: a) overview of experimental set-up; b) bioinformatics workflow; c) hexbin plot showing partitioned reads
Separating reads from closely related bacterial genomes using Tombo to identify strain-specific patterns of Dam and Dcm methylation
The high degree of sequence similarity between multiple strains in microbial communities can present significant challenges for analysis. One way to resolve strain-specific sequences is to take advantage of the patterns of DNA methylation that are often present in microbial genomes. Methylation occurs at specific target motifs, yet there exists a great diversity of these motifs in the bacterial world, even among members of the same species. These naturally occurring methylation patterns can be detected in nanopore reads and can serve as epigenetic barcodes for binning reads by strain. In the example shown, two strains of E. coli were co-cultured: a wild-type K12 strain and a K12 mutant lacking the Dcm and Dam methyltransferases that methylate the 5’-CCWGG-3’ and 5’-GATC-3’ motifs, respectively (Fig. 3a). Nanopore sequencing resulted in a mixture of reads from each strain. After aligning all reads to an E. coli K12 reference sequence, the methylation detection tool Tombo was used to characterise the methylation status at 5’-GATC-3’ and 5’-CCWGG-3’ sites. Read-level statistics were compiled by assessing all motif sites from each read and taking the median methylation score for these sites (Fig. 3b). The resulting hexbin plot shows a division of reads solely based on these read-level methylation assessments at the two motifs in question: one group has high scores for both the Dcm and Dam motifs, while the other group has low methylation scores for both motifs (Fig. 3c).
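The read-level step described above (pool the motif-site scores on each read, take the median, then separate reads) can be sketched as follows. This does not use the Tombo API; it assumes per-site scores have already been exported as (read_id, motif, score) tuples in which a higher score means stronger evidence of methylation. The threshold of 0.0 and the function names are hypothetical.

```python
from statistics import median


def read_level_scores(site_scores):
    """Median methylation score per read and motif.

    site_scores: iterable of (read_id, motif, score) tuples.
    Returns {read_id: {motif: median_score}}.
    """
    per_read = {}
    for read_id, motif, score in site_scores:
        per_read.setdefault(read_id, {}).setdefault(motif, []).append(score)
    return {
        read_id: {motif: median(values) for motif, values in motifs.items()}
        for read_id, motifs in per_read.items()
    }


def partition_reads(read_scores, threshold=0.0, motifs=("GATC", "CCWGG")):
    """Split reads into high, low and ambiguous groups.

    A read is 'high' if its median score exceeds the threshold at every motif,
    'low' if it is at or below the threshold at every motif, and 'ambiguous'
    if the motifs disagree or a motif has no scored site on the read.
    """
    high, low, ambiguous = [], [], []
    for read_id, scores in read_scores.items():
        if any(motif not in scores for motif in motifs):
            ambiguous.append(read_id)
            continue
        values = [scores[motif] for motif in motifs]
        if all(value > threshold for value in values):
            high.append(read_id)
        elif all(value <= threshold for value in values):
            low.append(read_id)
        else:
            ambiguous.append(read_id)
    return high, low, ambiguous


if __name__ == "__main__":
    sites = [
        ("read1", "GATC", 2.0), ("read1", "GATC", 3.0), ("read1", "GATC", 1.0),
        ("read1", "CCWGG", 1.5),
        ("read2", "GATC", -2.0), ("read2", "CCWGG", -1.0), ("read2", "CCWGG", -3.0),
        ("read3", "GATC", 1.0),  # no CCWGG site scored on this read
    ]
    print(partition_reads(read_level_scores(sites)))
```

Using the median rather than the mean keeps a single outlying site from dominating a read's score, which matters for reads that cover only a few motif sites.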
Fig. 1b pipeline: nanopore reads; filter (length > 2 kb, Q score > 7); classify (Kaiju) into classified and unclassified reads; bin by genus or species into taxonomic bins; for each bin: initial assembly (miniasm), recruit reads (minimap2), final assembly (canu), QC/identification (BLASTN).
PLOTTING: min_cov: 5 window: 20
Fig. 2a workflow: nanopore reads from Sample 1, Sample 2, ... Sample N; filter (length > 2 kb, Q score > 7); co-assemble all samples (wtdgb2); align sample 1, sample 2, ... sample N (minimap2); create coverage profiles and assign contigs to bins (metabat2), giving coverage bins; for each bin: evaluate bin quality (CheckM), recruit reads (minimap2), final assembly (canu/wtdgb2), QC/identification (BLASTN).
Fig. 2b: relative abundance (0.0 to 1.0) by genus in the Supernatant, Pellet 1 and Pellet 2 extractions. Legend: Bacillus, Enterococcus, Escherichia, Lactobacillus, Listeria, Pseudomonas, Salmonella, Staphylococcus, Unassigned.
Fig. 2c: contig coverage, with axes Supernatant and Pellet 2 (log scale, 10^1 to 10^3); size legend: 50 kb, 500 kb, 1,000 kb, 2,000 kb.
Fig. 2d: heatmap of species (E. faecalis, E. coli, S. enterica, L. fermentum, L. monocytogenes, S. aureus, P. aeruginosa, B. subtilis, unknown) against bin (1 to 10 and n/a); colour scale 0 to 1.0.
Fig. 2e: % Pseudomonas genome (0 to 100) for the initial and final assembly; labels N = 53 and N = 18.
Fig. 3a: co-cultured strains with distinct MTase activities: E. coli K12 and E. coli K12 Dam-/Dcm-; motifs 5’-GATC-3’ (N6-methyladenine, 6mA) and 5’-CCWGG-3’ (5-methylcytosine, 5mC).
Fig. 3b: read-level methylation detection: nanopore reads; align to reference (minimap2); call methylation at motifs of interest (Tombo); pool motif scores on each read; separate reads by motif scores; strain binning by methylation.
Fig. 3c: hexbin plot with axes Median Dam and Median Dcm; colour scale: read count (0 to 140).