TRANSCRIPT
SMRT-Cappable-seq reveals the complex operome of bacteria
Bo Yan
Ettwiller Lab
New England Biolabs
PacBio User Group Meeting
June 28, 2017
Identification of operons is crucial for understanding gene regulation
- Operon:
A transcriptional unit containing a set of genes under the control of the same promoter
- The organization of bacterial genes into operons benefits co-regulation
[Figure: E. coli galactose operon (promoter; galT, galE, galK, galM), with TSS and TTS marked]
TSS: transcription start site
TTS: transcription termination site
Developing full-length sequencing technology is important for operomics
- Current methods cannot provide accurate genome-wide operon information.
Operome analysis is currently based on high-throughput short-read sequencing
Short-read sequencing cannot couple the TSS to the TTS of transcripts
Most operons are predicted by computational methods
[Figure: Illumina sequencing reads for the E. coli leu operon; 5’ end and 3’ end tracks with annotated operons; TSS, riboswitch, and TTS marked]
Project Goal
- SMRT-Cappable-seq:
- isolation of full-length bacterial primary transcripts with long-read sequencing
- PacBio SMRT (single molecule, real-time) sequencing: high-throughput, long-read sequencing
- Development of the SMRT-Cappable-seq method
- A novel mechanism through which bacteria could regulate operon expression:
Through controlling the read-through across intrinsic termination sites
- Identification of novel operons in E. coli
Specifically cap the 5’ end of the bacterial primary transcripts
[Figure: transcripts from the gal operon (galT, galE, galK, galM; TSS to TTS) aligned to the genome. Primary transcripts (5’ ppp, 3’ OH) are ~1-5% of the RNA; processed rRNA and degraded transcripts (5’ OH or 5’ p, 3’ OH or 3’ p) are ~95-99%. Only the 5’ ppp end of primary transcripts is capped.]
Capping primary transcripts for Transcription start site analysis
DTB
Schildkraut & Ettwiller, 2016, BMC Genomics
SMRT-Cappable-seq: isolate and sequence the full-length primary transcripts
[Figure: SMRT-Cappable-seq workflow]
- Input RNA: primary transcripts (5’ ppp, ~1-5%); processed rRNA and degraded transcripts (5’ OH or 5’ p, ~95-99%)
- Capping reaction and tailing reaction: primary transcripts become DTB-Gppp capped and poly(A) tailed; processed and degraded transcripts are not capped
- Streptavidin binding of DTB-Gppp primary transcripts, followed by elution
- SMRT-Cappable-seq library: cDNA synthesis and SMRT sequencing of the eluted primary transcripts
- Control library: cDNA synthesis and SMRT sequencing of primary, processed, and degraded transcripts
PacBio iso-seq protocol is not adapted for bacterial transcripts
[Figure: PacBio RNA-seq protocol (Iso-Seq): cDNA synthesis from mRNA, with template switching to add an adapter]
- Problems with using the Iso-Seq protocol for bacterial transcripts
- Low yield
- Bias for bacterial RNA starting with different nucleotides
Use a TdT-based method to add an adapter to the 3’ end of cDNA
- Terminal transferase (TdT) adds deoxyribonucleotides to the 3’ hydroxyl terminus of cDNA
- The TdT method has a higher yield than template switching
- The TdT method does not introduce bias
[Figure: TdT adds a poly(G) adapter (GGGGG) to the 3’ end of the cDNA made from DTB-capped mRNA; second-strand synthesis then uses a poly(C) sequence (CCCCC)]
Using cohesive-end ligation increases library preparation efficiency
- Limitations of blunt-end ligation:
- Low efficiency; self-ligation; adapter dimers
- Using USER to generate cohesive ends for ligation:
- Increases ligation efficiency; prevents self-ligation and adapter dimer formation
[Figure: second-strand synthesis and PCR amplification (dU-containing products), USER digestion and ligation to form the SMRTbell template, then SMRT sequencing]
Pipeline of SMRT-Cappable-seq
- E. coli total RNA
- Capping and tailing: capped, tailed primary transcripts
- Streptavidin enrichment
- cDNA synthesis and TdT: 5’ adapter - cDNA - 3’ adapter
- Second-strand cDNA synthesis
- PCR amplification: dU-PCR products
- USER digestion and cohesive ligation: SMRTbell template
- PacBio sequencing
90% of the E. coli genome is covered by SMRT-Cappable-seq reads
- E. coli growth media: M9 minimal medium and Rich medium
- 12 SMRT Cells were sequenced for each condition, generating ~0.3 million reads
- Maximum size 5 kb, average size 2 kb
- More than 90% of the E. coli genome is covered
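As a rough illustration of how a genome coverage figure like the one above could be computed, the sketch below merges aligned read intervals and divides the covered length by the genome length. The function name, the interval convention (0-based, half-open), and the toy numbers are assumptions made for this example; they are not taken from the talk.

```python
def covered_fraction(intervals, genome_length):
    """Fraction of genome positions covered by at least one read.

    intervals: iterable of (start, end) alignments, 0-based and half-open.
    This is an illustrative helper, not the pipeline used in the talk.
    """
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # A gap: close the previous merged block and open a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent read: extend the current block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length


# Toy example: three reads on a hypothetical 10 kb genome.
toy_reads = [(0, 2000), (1500, 4000), (6000, 9000)]
print(covered_fraction(toy_reads, 10000))
```

With real data the intervals would come from the read alignments, and the two strands could be treated separately if strand-specific coverage is of interest.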
Collaboration with PacBio
Tyson Clark
Matthew Boitano
[Figure: IGV view of SMRT-Cappable-seq reads with gene annotation]
SMRT-Cappable-seq has high specificity for capturing primary transcripts
[Figure 1 (E. coli). A. Principle of SMRT-Cappable-seq. B. Read distribution across genomic features (protein-coding genes, primary rRNA, processed rRNA) for Cappable-seq and control libraries; mRNA percentages shown: 95%, 30%, 20%, 92%. C. TSS correlation across conditions: number of reads at TSS in Rich medium vs. M9 medium, log scale from 1 to 10000.]
qPCR result:
1000-fold greater recovery of primary transcripts
SMRT-Cappable-seq accurately defines the transcription start site
- SMRT-Cappable-seq provides a snapshot of the real-time transcription
[Figure: IGV view of a 35 kb region showing SMRT-Cappable-seq reads, gene annotation, and TSS]
SMRT-Cappable-seq defines the transcription termination site
- Transcription termination sites (TTS):
positions with a significant accumulation of read 3’ ends (binomial test).
- 408 TTSs were identified, of which 98 are experimentally confirmed.
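The slide only states that TTSs are called from a significant accumulation of 3’ ends by a binomial test. One possible way to set up such a test is sketched below, assuming a null model in which the 3’ ends in a local window fall uniformly on every position. The window size, the significance threshold, and the function names are hypothetical choices for illustration, not the parameters used in the study.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def call_tts(end_counts, half_window=25, alpha=0.01):
    """Call candidate TTS positions from read 3' end counts.

    end_counts: dict mapping genomic position -> number of read 3' ends.
    Null model (an assumption of this sketch): the n ends inside the window
    around a position are spread uniformly over its 2*half_window + 1 positions.
    Suitable for small toy counts only; large n would need a numerically
    stable tail computation.
    """
    window_size = 2 * half_window + 1
    p_uniform = 1.0 / window_size
    called = []
    for pos, k in sorted(end_counts.items()):
        # Total number of 3' ends observed in the window around this position.
        n = sum(c for q, c in end_counts.items() if abs(q - pos) <= half_window)
        if binomial_tail(k, n, p_uniform) < alpha:
            called.append(pos)
    return called


# Toy counts: one position collects most of the 3' ends.
toy_counts = {100: 1, 105: 30, 110: 1, 120: 2}
print(call_tts(toy_counts))
```

A genome-wide analysis would also need a multiple-testing correction, which is left out here to keep the sketch short.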
[Figure 3. A. Example of condition-dependent read-through. B. Transcriptional coordination for overlapping sense transcripts. C. Transcriptional coordination for overlapping antisense transcripts. Axes: genomic position (bp) vs. number of reads, in Rich and minimal (M9) medium. Genes shown: prs, dauA; pncC, recA, recX (TSS1, TSS2, TTS); hpt, can, yadG, yadH, yadI (positions 141361 and 142703).]
The previously validated intrinsic termination sites have transcripts passing-through
[Figure: 5’ end and 3’ end tracks at validated terminators. Termination sites without read-through: pflB terminator (pflB; positions 952000-953000). Termination sites with read-through: lolB-ispE-prs terminator and prs terminator (ychH, prs, dauA; positions 1259000-1262000; TSS_1262176; M9 and Rich medium), showing 3’ end extension across the TTS.]
[Figure 2. A. Genome-wide characterisation of termination read-through: % of reads passing through the TTS vs. median size (in nt) of extension beyond the TTS (10 to 1000), for novel and known TTSs. B. Percentage of read-through at termination sites (bins: 0-5%, 5-20%, 20-50%, 50-80%, 80-100%). C. Length of the read-through (bins: 0-10 nt, 10-100 nt, 100-1000 nt, >1000 nt). D. Presence of attenuator and terminator loop (RNAstructure) at SMRT-Cappable-seq TTSs, known attenuators, and random positions.]
Median size of the 3’ end extension across the TTSs
The majority of defined intrinsic termination sites have transcripts passing-through
- 40% of defined TTSs have read-throughs that contain additional gene(s)
[Figure: internal termination sites with read-through: lolB-ispE-prs terminator and prs terminator (ychH, prs, dauA; positions 1259000-1262000; TSS_1262176), M9 and Rich medium, with 3’ end extension across the TTS]
The degree of read-through is condition dependent for some TTSs
- Upregulation of the dauA gene is necessary to respond to pH changes in Rich medium
- dauA:
encodes a C4-dicarboxylic acid transporter, necessary for succinate transport
[Figure: 5’ end and 3’ end tracks across prs and dauA in M9 and Rich medium (positions 1259000-1262000; TSS_1262176), with read-through labels of 90% and 40% at the TTS]
Regulate operon expression through Riboswitch in the 5’ UTR
[Figure: leu operon (leuL, leuA, leuB, leuC, leuD; TSS to TTS) with a riboswitch (segments 1-4, leader peptide) in the 5’ untranslated region. Leu (+): anti-anti-terminator and terminator form, RNAP falls off, operon OFF.]
- Riboswitch: regulatory RNA element in the 5’ untranslated region (5’ UTR)
- Common strategy for regulating amino acid synthesis and antibiotic resistance
[Figure: leu operon (leuL, leuA, leuB, leuC, leuD; TSS to TTS). Leu (-): the antiterminator forms, operon ON, and the protein-coding genes are transcribed.]
The switch between terminator and antiterminator might exist in some defined TTSs
[Figure: antiterminator (on) vs. terminator (off). Antiterminator structure: 15% of SMRT-Cappable-seq defined TTSs vs. 1% of randomly selected sequences.]
A novel mechanism for bacterial operon regulation
- Control read-through across internal termination sites at the 3’ end of genes
- Control the expression of part of the operon
- Establish the operon polarity and increase the plasticity of gene regulation
[Supplementary Figure 2: number of reads (5’ end and 3’ end) vs. genomic position (bp), 190000-193000, with gene annotation. Polycistronic TUs with different gene combinations (genes 1-5, 1-3, 1-2) arise from internal TTSs.]
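To make the idea of polycistronic TUs with different gene combinations concrete, the sketch below counts which sets of genes are fully contained in each full-length read. Gene names, coordinates, and function names are invented for the example, and only forward-strand reads are considered; none of this is taken from the talk.

```python
from collections import Counter


def genes_in_read(read_start, read_end, genes):
    """Names of genes fully contained in a read (forward strand only)."""
    return tuple(
        name for name, g_start, g_end in genes
        if read_start <= g_start and g_end <= read_end
    )


def transcription_units(reads, genes):
    """Count reads per gene combination; reads covering no full gene are skipped."""
    combos = Counter()
    for read_start, read_end in reads:
        names = genes_in_read(read_start, read_end, genes)
        if names:
            combos[names] += 1
    return combos


# Hypothetical five-gene operon with internal termination sites.
toy_genes = [("g1", 100, 400), ("g2", 450, 800), ("g3", 850, 1200),
             ("g4", 1250, 1600), ("g5", 1650, 2000)]
toy_reads = [(90, 2010), (90, 1210), (90, 1215), (90, 810), (90, 300)]
for combo, n in transcription_units(toy_reads, toy_genes).items():
    print("-".join(combo), n)
```

Because each read spans a whole transcript, the gene combination is read directly from one molecule rather than inferred from overlapping short reads.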
Sometimes transcription termination coordinates with the transcription start
[Figure: reads in M9 and Rich medium across fabG, acpP, fabF, pabC, mltG (positions 1151000-1155000); TSS_1150933, TSS_1151052, TSS_1151511, TSS_1151575]
[Figure: transcription coordination between TSSs (TSS1, TSS2, TSS3) and the TTS; 5’ end and 3’ end tracks for positions 1151000-1155000 and for pncC, recA, recX; labels: 5%, 30%, 5%, 80%]
[Figure 4. A. Example of a novel extended operon (rpsB, tsf, pyrH, frr): known operon and elongated 3’ end beyond the TTS; read counts vs. genomic position (bp). B. Genome-wide identification of novel extended operons: size (base) of known operons vs. SMRT-Cappable-seq operons, with elongated 5’ and elongated 3’ ends. C. Fractions of novel operons in E. coli.]
- 40% of the annotated operons in RegulonDB are extended by SMRT-Cappable-seq
SMRT-Cappable-seq identifies 840 novel operons for E. coli
[Figure 4C categories: same, novel, elongated 5’ end, elongated 3’ end, shorter 3’ end]
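The comparison of SMRT-Cappable-seq operons with annotated operons can be pictured with a small classifier like the one below. It is only a sketch: it compares gene lists in transcription order, the category names follow the figure labels, and the annotation passed in is a toy list, not RegulonDB. The rule set is an assumption of this example and is not the procedure described in the talk.

```python
def classify_operon(observed, known_operons):
    """Classify an observed operon against a list of annotated operons.

    observed and each known operon are tuples of gene names in transcription
    order. Returns one of: "same", "elongated 3' end", "elongated 5' end",
    "shorter 3' end", "novel". Illustrative rules only.
    """
    if observed in known_operons:
        return "same"
    for known in known_operons:
        n = len(known)
        if len(observed) > n and observed[:n] == known:
            return "elongated 3' end"   # extra genes downstream of the annotation
        if len(observed) > n and observed[-n:] == known:
            return "elongated 5' end"   # extra genes upstream of the annotation
        if len(observed) < n and known[:len(observed)] == observed:
            return "shorter 3' end"     # stops before the annotated end
    return "novel"


# Toy annotation (hypothetical, for illustration only).
toy_annotation = [("rpsB", "tsf"), ("geneA", "geneB", "geneC")]
print(classify_operon(("rpsB", "tsf", "pyrH", "frr"), toy_annotation))
print(classify_operon(("geneA", "geneB"), toy_annotation))
```

A real comparison would work on TSS and TTS coordinates rather than gene names, so that extensions into non-coding sequence are also detected.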
Summary
SMRT-Cappable-seq
- Enriches and sequences full-length mRNA
- Identifies transcription start and end sites at single-base resolution
- Can be used for bacterial operome and transcriptome analysis
E. coli operome:
- 840 novel operons identified
- Pervasive control of read-through across termination sites revealed
- A novel mechanism for generating polycistronic transcripts and operon polarity
Applications of SMRT-Cappable-seq
- Study of the bacterial operome and transcriptome
- Enrichment of full-length prokaryotic mRNA; microbiome studies
Acknowledgements
Laurence Ettwiller
Ira Schildkraut
Ivan Correa
Madalee Wulf
Nick Guan
Alex Fomenkov
Brian Anton
Bill Jack
Rich Roberts
Jim Ellard
Don Comb
Collaboration with PacBio
Tyson Clark
Matthew Boitano
[Supplementary Figure 2: number of reads vs. genomic position (bp); functional protein products; novel TTS and known TTS]
The degree of read-through is not correlated with the strength of the terminator
cDNA synthesis:
- First-strand cDNA synthesis on DTB-Gppp capped, poly(A)-tailed RNA with an RT primer (TTTTTT)
- Removal of incomplete cDNA (cDNA incomplete at the 3’ end) by RNase I
- Streptavidin capture
- Addition of poly(G) linker by TdT
Library preparation:
- Second-strand synthesis and PCR amplification (dU-containing products)
- USER digestion and ligation to form the SMRTbell template
- SMRT sequencing
Regulate operon expression by controlling premature termination in the 5’UTR
[Figure: non-coding 5’ UTR with leader peptide and riboswitch (Leu)]
New condition efficiently enriches the full-length primary transcripts
- Cappable-seq conditions cannot be used for full-length transcripts.
- Optimization: capping reaction; enrichment step
- SMRT-Cappable-seq: 1000-fold greater recovery of primary transcripts
- 90% mRNA: ~80% protein-coding mRNA, ~10% primary transcripts for rRNA
[Figure: transcript composition (percentage of mapped reads), Cappable vs. control; labels: 18% mRNA, 90% mRNA, 5% mRNA, 5% mRNA]
Sometimes transcription termination coordinates with the transcription start
[Figure: reads in M9 and Rich medium across aroG and gpmA (positions 785500-787500); TSS_785591, TSS_787632, TSS_787670]
Termination of transcription from both directions at the TTS
TSS-dependent termination (hpt, can, yadG)
- Transcription coordination: different assemblies of the RNA transcription machinery
SMRT-Cappable-seq is a powerful method to phase the transcription start and termination sites
[Figure: 5’ end and 3’ end tracks phasing TSS and TTS at the termination site (pncC, recA, recX; TSS1, TSS2; positions 2821000-2824000; M9), compared with Illumina sequencing reads for the same loci]
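Phasing means that each full-length read links one start site to one termination site. A minimal sketch of that bookkeeping is shown below: every read is assigned to the nearest called TSS and TTS and the (TSS, TTS) pairs are counted. The distance tolerance, the function names, and the toy coordinates are assumptions for illustration, and only forward-strand reads are handled.

```python
from collections import Counter


def nearest_site(position, sites, max_distance):
    """Closest site within max_distance of position, or None."""
    best = min(sites, key=lambda s: abs(s - position), default=None)
    if best is None or abs(best - position) > max_distance:
        return None
    return best


def phase_reads(reads, tss_sites, tts_sites, max_distance=10):
    """Count reads per (TSS, TTS) pair.

    reads: list of (five_prime_end, three_prime_end) on the forward strand.
    Reads whose ends do not match a called site are ignored.
    """
    pairs = Counter()
    for five_prime, three_prime in reads:
        tss = nearest_site(five_prime, tss_sites, max_distance)
        tts = nearest_site(three_prime, tts_sites, max_distance)
        if tss is not None and tts is not None:
            pairs[(tss, tts)] += 1
    return pairs


# Toy data: two start sites and two termination sites.
toy_reads = [(101, 898), (100, 1502), (399, 900), (402, 903), (250, 900)]
for (tss, tts), n in sorted(phase_reads(toy_reads, [100, 400], [900, 1500]).items()):
    print(tss, tts, n)
```

Short reads cannot provide this table, because a 5’ end and a 3’ end observed in different fragments cannot be assigned to the same molecule.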
PacBio Iso-Seq protocol is not adapted for bacterial transcripts
[Figure: PacBio RNA-seq protocol (Iso-Seq): cDNA synthesis from mRNA, with template switching to add an adapter]
Problems with using the Iso-Seq protocol for SMRT-Cappable-seq:
- Low yield
- Bias for different DTB-capped transcripts
TdT-based method generates the same TSS motif profile
- Has higher yield than using template switching.
- Does not introduce bias at the TSS.
[Figure: E. coli TSS motif (sequence logos, positions -30 to +10, WebLogo 3.5.0) for annotated TSSs compared with TSSs defined by template switching and by the TdT method]
Motif of SMRT-cappable-seq defined termination sites
A GC-rich hairpin structure is present in the TTSs
[Figure: sequence motif at the TTS with a GC-rich region, and an example RNAstructure hairpin prediction coloured by base-pair probability; labels: 95% of SMRT-Cappable-seq defined TTSs vs. 20% of randomly selected sequences]
The previously validated intrinsic termination sites have transcripts passing-through
[Figure: 5’ end and 3’ end tracks in M9 and Rich medium at the proS terminator (nlpE, yaeF, proS; positions 216000-219000; TSS_218837) and at the lolB-ispE-prs and prs terminators (ychH, prs, dauA; positions 1259000-1262000; TSS_1262176), showing 3’ end extension across the TTS]
Percentage of read-through across the TTSs
Median size of the 3’ end extension across the TTSs
40% of TTSs have read-throughs that contain additional gene(s)
Percentage = (number of reads passing through) / (number of reads at TTS + number of reads passing through)
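The read-through percentage defined above can be computed per TTS from the 3’ ends of the reads that reach it. The sketch below assumes forward-strand reads, a small positional tolerance around the TTS, and invented coordinates; the tolerance and function name are illustrative choices, not values from the talk.

```python
def read_through_fraction(three_prime_ends, tts, tolerance=5):
    """Fraction of reads passing through a TTS.

    three_prime_ends: 3' end positions of forward-strand reads that start
    upstream of the TTS. Reads ending within `tolerance` nt of the TTS count
    as terminated; reads ending further downstream count as passing through;
    reads ending upstream of the TTS are ignored.
    Returns None when no read reaches the TTS.
    """
    at_tts = sum(1 for end in three_prime_ends if abs(end - tts) <= tolerance)
    passing = sum(1 for end in three_prime_ends if end > tts + tolerance)
    total = at_tts + passing
    return passing / total if total else None


# Toy example: three reads stop at the TTS, two continue, one ends upstream.
toy_ends = [1000, 1002, 998, 1500, 2000, 700]
print(read_through_fraction(toy_ends, tts=1000))
```

Running the same function on reads from each growth condition separately gives the kind of per-condition comparison shown for individual terminators in the talk.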
The majority of defined intrinsic termination sites have transcripts passing-through
Current concept for E. coli termination sites needs to be redefined.
[Figure: reads in M9 and Rich medium across cytR, ftsN, hslV, hslU, menA, rraA (positions 4120000-4124000); TSS_4119356, TSS_4120407, TSS_4121960, TSS_4122348, TSS_4123368, TSS_4124510. Promoter labels: cytRp, hslVp, menAp, rraAp (RegulonDB); cytRp, hslVp, menAp, hslUp, ftsNp (SMRT-Cappable-seq).]
SMRT-Cappable-seq identifies 840 novel operons for E. coli
[Figure legend: TSS; TTS; no confirmed TTS; 5’ end and 3’ end tracks]