SMRT-Cappable-seq reveals the complex operome of bacteria Bo Yan Ettwiller Lab New England Biolabs PacBio User Group Meeting June 28, 2017


Page 1

SMRT-Cappable-seq reveals the complex operome of bacteria

Bo Yan

Ettwiller Lab

New England Biolabs

PacBio User Group Meeting

June 28, 2017

Page 2

Identifying operons is crucial for understanding gene regulation

- Operon: a transcriptional unit containing a set of genes under the control of the same promoter

- The organization of bacterial genes into operons benefits co-regulation

[Figure: the E. coli galactose operon, showing the promoter, the genes galE, galT, galK and galM, and the TSS and TTS positions]

TSS: transcription start site

TTS: transcription termination site

Page 3

Developing full-length sequencing technology is important for operomics

- Current methods cannot provide accurate genome-wide operon information.

Operome analysis is based on high-throughput short-read sequencing

Short-read sequencing cannot couple the TSS to the TTS of a transcript

Most operons are predicted by computational methods
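To illustrate why a full-length read couples a TSS to a TTS where short reads cannot, here is a minimal sketch. It assumes each read has already been reduced to a hypothetical `(five_prime, three_prime)` pair of genomic coordinates on one strand. The function name, the `tolerance` parameter and the toy coordinates are illustrative assumptions, not part of the presented pipeline.

```python
from collections import Counter

def phase_transcript_ends(reads, known_tss, known_tts, tolerance=5):
    """Count reads per (TSS, TTS) pair.

    reads: iterable of (five_prime, three_prime) coordinates (hypothetical input).
    known_tss / known_tts: non-empty lists of candidate site positions.
    A read end is assigned to the nearest site within `tolerance` bases.
    """
    def nearest(pos, sites):
        best = min(sites, key=lambda s: abs(s - pos))
        return best if abs(best - pos) <= tolerance else None

    pairs = Counter()
    for five_prime, three_prime in reads:
        tss = nearest(five_prime, known_tss)
        tts = nearest(three_prime, known_tts)
        if tss is not None and tts is not None:
            pairs[(tss, tts)] += 1
    return pairs

# Toy example: two start sites and two termination sites (made-up coordinates).
reads = [(100, 900), (101, 1500), (400, 1499), (100, 1502), (398, 901)]
pairs = phase_transcript_ends(reads, known_tss=[100, 400], known_tts=[900, 1500])
```

Each key of `pairs` is one start/end combination observed on a single molecule, which is the information a short-read library loses.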

[Figure: Illumina sequencing reads for the E. coli leu operon, with tracks for read 5’ ends and 3’ ends, the annotated operons, and the TSS, riboswitch and TTS positions marked]

Page 4

Project Goal

- SMRT-Cappable-seq:

- Isolation of full-length bacterial primary transcripts combined with long-read sequencing

- PacBio SMRT (single molecule real-time) sequencing: high-throughput, long-read sequencing

- Development of the SMRT-Cappable-seq method

- A novel mechanism through which bacteria could regulate operon expression: controlling read-through across intrinsic termination sites

- Identification of novel operons in E. coli

Page 5

Specifically capping the 5’ end of bacterial primary transcripts

[Figure: transcripts from the gal operon (galE, galT, galK, galM; TSS to TTS) aligned to the genome. Primary transcripts carry a 5’ triphosphate (5’ppp) and a 3’ OH and make up ~1-5% of the RNA. Processed rRNA transcripts and degraded transcripts carry 5’p or 5’OH ends and 3’ OH or 3’ p ends and make up ~95-99%. Only the 5’ppp end of a primary transcript receives the DTB (desthiobiotin) cap.]

Capping primary transcripts for transcription start site analysis

Schildkraut & Ettwiller, 2016, BMC Genomics

Page 6

SMRT-Cappable-seq: isolate and sequence the full-length primary transcripts

[Figure: SMRT-Cappable-seq workflow. Total RNA contains primary transcripts (5’ppp, ~1-5%) alongside processed rRNA transcripts and degraded transcripts (5’p or 5’OH, ~95-99%). The capping reaction adds a DTB-Gppp cap to primary transcripts and the tailing reaction adds a poly(A) tail to 3’ OH ends. SMRT-Cappable-seq library: capped primary transcripts are bound to streptavidin, eluted, and used for cDNA synthesis and SMRT sequencing. Control library: processed, degraded and primary transcripts all proceed to cDNA synthesis and SMRT sequencing.]

Page 7

The PacBio Iso-Seq protocol is not adapted to bacterial transcripts

[Figure: PacBio RNA-seq protocol (Iso-Seq): cDNA synthesis from mRNA with a template-switch adapter]

- Problems with using the Iso-Seq protocol for bacterial transcripts:

- Low yield

- Bias for bacterial RNA starting with different nucleotides

Page 8

Using a TdT-based method to add an adapter to the 3’ end of cDNA

- Terminal transferase (TdT) adds deoxyribonucleotides to the 3’ hydroxyl terminus of cDNA

- The TdT method has a higher yield than template switching

- The TdT method does not introduce bias

[Figure: cDNA synthesized on DTB-capped mRNA; addition of poly(G) by TdT gives cDNA with a poly(G) adapter (GGGGG) at the 3’ end, followed by second-strand synthesis (CCCCC)]

Page 9

Cohesive-end ligation increases library preparation efficiency

- Limitations of blunt-end ligation:

- Low efficiency; self-ligation; adapter dimers

- Using USER to generate cohesive ends for ligation:

- Increases ligation efficiency; prevents self-ligation and adapter-dimer formation

[Figure: second-strand synthesis and PCR amplification with dU-containing ends, USER digestion and ligation to form the SMRTbell template, then SMRT sequencing]

Page 10

Pipeline of SMRT-Cappable-seq

- E. coli total RNA

- Capping and tailing, giving capped, tailed primary transcripts

- Streptavidin enrichment

- cDNA synthesis

- TdT, giving 5’ adapter-cDNA-3’ adapter

- Second-strand cDNA synthesis

- PCR amplification, giving dU-PCR products

- USER digestion and cohesive ligation, giving the SMRTbell template

- PacBio sequencing

Page 11

90% of the E. coli genome is covered by SMRT-Cappable-seq reads

- E. coli growth media: M9 minimal medium and rich medium

- 12 SMRT Cells were sequenced for each condition, generating ~0.3 million reads

- Maximum read size 5 kb, average size 2 kb

- More than 90% of the E. coli genome is covered
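A coverage figure such as "more than 90% of the genome" can be computed by merging read alignments and summing the merged lengths. The sketch below assumes half-open `(start, end)` alignment intervals on a single linear reference. The function name and the toy numbers are illustrative assumptions, not the analysis actually used for the slide.

```python
def covered_fraction(intervals, genome_length):
    """Fraction of the genome covered by at least one read.

    intervals: iterable of (start, end) half-open read alignments (hypothetical input).
    """
    covered = 0
    current_start, current_end = None, None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Disjoint from the running block: close it and open a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent: extend the running block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length

# Toy example on a 1000-base "genome" (made-up intervals).
fraction = covered_fraction([(0, 400), (300, 700), (800, 1000)], genome_length=1000)
```

In the toy example the first two reads merge into one 700-base block, so 900 of 1000 bases are covered.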

Collaboration with PacBio

Tyson Clark

Matthew Boitano

[Figure: IGV view of SMRT-Cappable-seq reads aligned above the gene track]

Page 12

SMRT-Cappable-seq has high specificity for capturing primary transcripts

[Figure 1: A. Principle of SMRT-Cappable-seq. B. Read distribution across genomic features (protein-coding genes, primary rRNA, processed rRNA) for control and Cappable-seq libraries in E. coli; the panel labels read 95%, 30%, 20% and 92% mRNA. C. TSS correlation across conditions: number of reads at each TSS in M9 medium against rich medium, on log axes from 1 to 10000.]

qPCR result: 1000-fold greater recovery of primary transcripts
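The TSS correlation panel compares read counts per TSS between rich and M9 medium on log axes. A minimal sketch of such a comparison is a Pearson correlation on log-transformed counts. The pseudocount, the function name and the toy counts are assumptions for illustration; the slides do not say which statistic was used.

```python
from math import log10, sqrt

def log_pearson(counts_a, counts_b, pseudocount=1):
    """Pearson correlation of log10(count + pseudocount) over shared TSSs."""
    shared = sorted(set(counts_a) & set(counts_b))
    xs = [log10(counts_a[t] + pseudocount) for t in shared]
    ys = [log10(counts_b[t] + pseudocount) for t in shared]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Toy counts keyed by hypothetical TSS identifiers.
rich = {"tss_a": 9, "tss_b": 99, "tss_c": 999}
m9 = {"tss_a": 99, "tss_b": 999, "tss_c": 9999}
r = log_pearson(rich, m9)
```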

Page 13

SMRT-Cappable-seq accurately defines the transcription start site

- SMRT-Cappable-seq provides a snapshot of real-time transcription

[Figure: IGV view of a 35 kb region showing SMRT-Cappable-seq reads, the gene track and TSS positions]

Page 14

SMRT-Cappable-seq defines the transcription termination site

- Transcription termination sites (TTS): positions with a significant accumulation of read 3’ ends, assessed by a binomial test.

- 408 TTSs were identified, of which 98 are experimentally confirmed.
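The slide states only that TTSs are called from a significant accumulation of read 3’ ends by a binomial test. One way to sketch this is shown below. The null model (every 3’ end in a window lands uniformly at random on one of its positions), the Bonferroni-corrected `alpha` and the toy counts are assumptions of this example, not details given in the talk.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def call_tts(three_prime_counts, alpha=0.01):
    """Return positions whose 3' end count is unexpectedly high.

    three_prime_counts: dict position -> number of read 3' ends in one window.
    Assumed null model: every 3' end lands uniformly at random on one of the
    window's positions. `alpha` is Bonferroni-corrected over the window.
    """
    window = len(three_prime_counts)
    total = sum(three_prime_counts.values())
    threshold = alpha / window
    return [pos for pos, k in sorted(three_prime_counts.items())
            if k > 0 and binomial_upper_tail(k, total, 1 / window) < threshold]

# Toy window of 10 positions: 40 of 49 read ends pile up at position 1005.
counts = {1000 + i: 1 for i in range(10)}
counts[1005] = 40
sites = call_tts(counts)
```

Only the position with the pile-up passes the threshold; positions with a single read end are consistent with the uniform background.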

[Figure 3: A. Example of condition-dependent read-through. B. Transcriptional coordination for overlapping sense transcripts. C. Transcriptional coordination for overlapping antisense transcripts. Loci shown: prs-dauA (genomic positions 1259000-1262000), pncC-recA-recX (2821000-2824000) and hpt, can, yadG, yadH, yadI (142000-145000, with positions 141361 and 142703 marked). Tracks give the number of reads in rich and minimal (M9) medium, with TSS1, TSS2, the TTS and read 5’ and 3’ ends marked.]

Page 15

Previously validated intrinsic termination sites have transcripts passing through

[Figure: read 5’ end and 3’ end profiles in M9 and rich medium. Termination site without read-through: the pflB terminator (genomic positions 952000-953000). Termination sites with read-through: the lolB-ispE-prs and prs terminators (ychH, prs, dauA; TSS_1262176; genomic positions 1259000-1262000), showing 3’ end extension across the TTS.]

Page 16

The majority of defined intrinsic termination sites have transcripts passing through

- 40% of defined TTSs have read-throughs that contain additional gene(s)

[Figure 2: A. Genome-wide characterisation of termination read-through: percentage of reads passing through the TTS against the median size (in nt) of the extension beyond the TTS, for known TTSs, novel TTSs and known attenuators, with a legend for the number of reads at the TSS (5000, 10000, 15000). B. Percentage of read-through at termination sites (bins 0-5%, 5-20%, 20-50%, 50-80%, 80-100%; novel and known TTSs). C. Length of the read-through (bins 0-10 nt, 10-100 nt, 100-1000 nt, >1000 nt; novel and known TTSs). D. Presence of attenuator and terminator loop structures (RNAstructure) at SMRT-Cappable-seq TTSs and at random positions. Highlighted on this slide: median size of the 3’ end extension across the TTSs.]

[Figure: internal termination sites with read-through at the prs-dauA locus (ychH, prs, dauA; lolB-ispE-prs and prs terminators; TSS_1262176; genomic positions 1259000-1262000) in M9 and rich medium, showing 3’ end extension across the TTS]

Page 17

The degree of read-through is condition-dependent for some TTSs

- Upregulation of the dauA gene is necessary to respond to pH changes in rich medium

- dauA: encodes a C4-dicarboxylic acid transporter, necessary for succinate transport

[Figure: read 5’ end and 3’ end profiles at the prs-dauA locus (TSS_1262176; genomic positions 1259000-1262000) in M9 and rich medium; the labels 90% and 40% mark the read-through across the TTS]

Page 18

Regulating operon expression through a riboswitch in the 5’ UTR

- Riboswitch: regulatory RNA element in the 5’ untranslated region (5’ UTR)

- Common strategy for regulating amino acid synthesis and antibiotic resistance

[Figure: the E. coli leu operon (leuL leader peptide, leuA, leuB, leuC, leuD; TSS to TTS) with regions 1-4 of the riboswitch in the 5’ untranslated region; structure labels are anti-anti-terminator, terminator and antiterminator. Leu (+): the terminator forms, RNAP falls off, and expression of the protein-coding genes is OFF. Leu (-): the antiterminator forms and expression is ON.]

Page 19

A switch between terminator and antiterminator might exist at some defined TTSs

[Figure: terminator (off) and antiterminator (on) structures. Antiterminator structure: 15% of SMRT-Cappable-seq defined TTSs, 1% of randomly selected sequences.]

Page 20

A novel mechanism for bacterial operon regulation

- Control read-through across internal termination sites at the 3’ end of genes

- Control the expression of part of the operon

- Establish the operon polarity and increase the plasticity of gene regulation
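Internal termination sites mean that one promoter can give several polycistronic transcription units with different gene combinations. The sketch below shows how such combinations could be tallied from full-length reads. It assumes reads and genes are given as `(start, end)` coordinates on the same strand; the gene names, coordinates and function names are hypothetical.

```python
from collections import Counter

def genes_in_read(read, genes):
    """Names of genes fully contained in a read (same strand assumed)."""
    start, end = read
    return tuple(name for name, (g_start, g_end) in genes.items()
                 if start <= g_start and g_end <= end)

def transcription_units(reads, genes):
    """Count reads per gene combination, ignoring reads that span no gene."""
    combos = Counter(genes_in_read(r, genes) for r in reads)
    combos.pop((), None)
    return combos

# Hypothetical five-gene locus with internal termination sites after genes 2 and 3.
genes = {"g1": (100, 400), "g2": (450, 800), "g3": (850, 1200),
         "g4": (1250, 1600), "g5": (1650, 2000)}
reads = [(50, 820), (50, 820), (50, 1220), (50, 2050), (50, 300)]
units = transcription_units(reads, genes)
```

Here the same start site yields three units (genes 1-2, genes 1-3 and genes 1-5), and their read counts give the relative use of each termination site.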

[Supplementary Figure 2: number of reads by genomic position (190000-193000) for read 5’ ends and 3’ ends across a five-gene locus (genes 1-5). Internal TTSs give polycistronic TUs with different gene combinations: 1-2, 1-2-3 and 1-2-3-4-5.]

Page 21

Transcription termination is sometimes coordinated with the transcription start

[Figure: transcription coordination at the fabG-acpP-fabF-pabC-mltG locus (genomic positions 1151000-1155000) in M9 and rich medium, plotted as number of reads for 5’ ends and 3’ ends. Start sites marked: TSS_1150933, TSS_1151052, TSS_1151511 and TSS_1151575 (TSS1, TSS2 and TSS3 in the schematic). Percentage labels at the TTS: 5%, 30%, 5% and 80%. The slide also repeats the Figure 3 panels: A. Example of condition-dependent read-through. B. Transcriptional coordination for overlapping sense transcripts. C. Transcriptional coordination for overlapping antisense transcripts.]

Page 22

SMRT-Cappable-seq identifies 840 novel operons in E. coli

- 40% of the annotated operons in RegulonDB are extended by SMRT-Cappable-seq

[Figure 4: A. Example of a novel extended operon: read counts by genomic position across rpsB, tsf, pyrH and frr, with the known operon, the TTS and the elongated 3’ end marked. B. Genome-wide identification of novel extended operons: size (bases) of known operons and of SMRT-Cappable-seq operons, elongated at the 5’ or 3’ end. C. Fractions of novel operons in E. coli: same, novel, shorter 3’ end.]

Page 23

Summary

SMRT-Cappable-seq

- Enrich and sequence full-length mRNA

- Identify transcription start and end at single-base resolution

- Used for bacterial operome and transcriptome analysis

E. coli operome:

- Identify 840 novel operons

- Reveal pervasive control of read-through across the termination sites

- A novel mechanism for generating polycistronic transcripts and operon polarity

Applications of SMRT-Cappable-seq

- Study bacterial operome and transcriptome

- Enrich full-length prokaryotic mRNA, including in microbiome samples

Page 24

Acknowledgements

Laurence Ettwiller

Ira Schildkraut

Ivan Correa

Madalee Wulf

Nick Guan

Alex Fomenkov

Brian Anton

Bill Jack

Rich Roberts

Jim Ellard

Don Comb

Collaboration with PacBio

Tyson Clark

Matthew Boitano

Page 25

Page 26

[Supplementary Figure 2: number of reads by genomic position, with functional protein products marked]

Page 27

The degree of read-through is not correlated with the strength of the terminator

[Figure: panels for novel TTSs and known TTSs]

Page 28

[Figure: library construction workflow. Steps shown under cDNA synthesis: first-strand cDNA synthesis on DTB-Gppp capped, poly(A)-tailed transcripts with an RT primer (TTTTTT); removal of incomplete cDNA (incomplete at the 3’ end) by RNase I; streptavidin capture; addition of a poly(G) linker by TdT. Steps shown under library preparation: second-strand synthesis and PCR amplification with dU-containing ends; USER digestion and ligation to form the SMRTbell template; SMRT sequencing.]

Page 29

[Figure 1: A. Principle of SMRT-Cappable-seq. B. Read distribution across genomic features (protein-coding genes, primary rRNA, processed rRNA) for control and Cappable-seq libraries in E. coli. C. TSS correlation across conditions: number of reads at each TSS in M9 medium against rich medium, on log axes from 1 to 10000.]

Page 30

Regulating operon expression by controlling premature termination in the 5’ UTR

[Figure labels: non-coding, leader peptide, riboswitch, Leu]

Page 31

The new conditions efficiently enrich full-length primary transcripts

- Cappable-seq conditions cannot be used for full-length transcripts.

- Optimization: capping reaction; enrichment step

- SMRT-Cappable-seq: 1000-fold greater recovery of primary transcripts

90% mRNA:

~80% protein-coding mRNA

~10% primary transcripts for rRNA

[Figure: transcript composition (percentage of mapped reads) for Cappable and control libraries; the panel labels read 18% mRNA, 90% mRNA, 5% mRNA and 5% mRNA]

Page 32

Transcription termination is sometimes coordinated with the transcription start

- Transcription coordination: different assemblies of the RNA transcriptional machinery

[Figure: termination of transcription from both directions at the aroG-gpmA locus (genomic positions 785500-787500) in M9 and rich medium, with TSS_785591, TSS_787632, TSS_787670 and the TTS marked. A second panel shows TSS-dependent termination at the hpt, can, yadG locus. The slide also repeats the Figure 3 panels: A. Example of condition-dependent read-through. B. Transcriptional coordination for overlapping sense transcripts. C. Transcriptional coordination for overlapping antisense transcripts.]

Page 33

SMRT-Cappable-seq is a powerful method to phase transcription start and termination sites

[Figure: phasing of TSS and TTS. Read 5’ ends and 3’ ends in M9 medium (genomic positions 2821000-2824000) with TSS1, TSS2 and the termination site marked, compared with Illumina sequencing reads for the same locus. The slide also repeats the Figure 3 panels: A. Example of condition-dependent read-through. B. Transcriptional coordination for overlapping sense transcripts. C. Transcriptional coordination for overlapping antisense transcripts.]

Page 34

The PacBio Iso-Seq protocol is not adapted to bacterial transcripts

Problems with using the Iso-Seq protocol for SMRT-Cappable-seq:

- Low yield

- Bias between different DTB-capped transcripts

[Figure: PacBio RNA-seq protocol (Iso-Seq) with template-switch cDNA synthesis and adapter; the TSS motif defined by template switching is compared with the E. coli TSS motif at annotated TSSs]

Page 35

The TdT-based method generates the same TSS motif profile

- Has a higher yield than template switching.

- Does not introduce bias at the TSS.

[Figure: TSS motifs obtained with template switching and with TdT, compared with the E. coli TSS motif at annotated TSSs]

Page 36

A GC-rich hairpin structure is present in the TTSs

[Figure: motif of SMRT-Cappable-seq defined termination sites (WebLogo 3.5.0; positions -30 to +10 relative to the TTS; 0.0-1.0 bits), with a GC-rich region marked. Two predicted hairpin structures of about 40 nt are shown with a base-pairing probability legend (from >= 99% down to < 50%) and the TTS marked. Labels: 95% for SMRT-Cappable-seq defined TTSs, 20% for randomly selected sequences. The slide also repeats Figure 2: A. Genome-wide characterisation of termination read-through. B. Percentage of read-through at termination sites. C. Length of the read-through (nt). D. Presence of attenuator and terminator loop.]
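As a rough illustration of what a GC-rich hairpin near a TTS looks like at the sequence level, the sketch below searches for a perfect inverted repeat and reports the GC fraction of its stem. This is a deliberately simple heuristic with assumed parameters (`stem_len`, loop size range) and a made-up sequence. It is not the structure prediction used for the slide, which relied on RNA folding software.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def find_hairpin(seq, stem_len=6, min_loop=3, max_loop=8):
    """Return (left_arm, loop, right_arm) of the first perfect hairpin, or None.

    A hairpin here is a left arm of `stem_len` bases, a loop of
    min_loop..max_loop bases, and a right arm that is the reverse
    complement of the left arm. Real terminators tolerate mismatches and
    G-U pairs, which this sketch ignores.
    """
    for i in range(len(seq) - 2 * stem_len - min_loop + 1):
        left = seq[i:i + stem_len]
        target = reverse_complement(left)
        for loop in range(min_loop, max_loop + 1):
            j = i + stem_len + loop
            if seq[j:j + stem_len] == target:
                return left, seq[i + stem_len:j], target
    return None

def gc_fraction(seq):
    return sum(b in "GC" for b in seq) / len(seq)

# Made-up terminator-like sequence: GC-rich stem, 4-base loop, T tract.
example = "TTGCCGCCGAAAGGCGGCTTTTTTT"
hairpin = find_hairpin(example)
```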

Page 37

Previously validated intrinsic termination sites have transcripts passing through

[Figure: read 5’ end and 3’ end profiles in M9 and rich medium at the proS terminator (nlpE, yaeF, proS; TSS_218837; genomic positions 216000-219000) and at the lolB-ispE-prs and prs terminators (ychH, prs, dauA; TSS_1262176; genomic positions 1259000-1262000), showing 3’ end extension across the TTS]

Page 38

The majority of defined intrinsic termination sites have transcripts passing through

The current concept of E. coli termination sites needs to be redefined.

40% of TTSs have read-throughs that contain additional gene(s)

$$\text{Percentage} = \frac{\text{number of reads passing through}}{\text{number of reads at TTS} + \text{number of reads passing through}}$$

[Figure 2: A. Genome-wide characterisation of termination read-through: percentage of reads passing through the TTS against the median size (in nt) of the extension beyond the TTS, for known TTSs, novel TTSs and known attenuators, with a legend for the number of reads at the TSS. B. Percentage of read-through at termination sites (bins 0-5%, 5-20%, 20-50%, 50-80%, 80-100%). C. Length of the read-through (bins 0-10 nt, 10-100 nt, 100-1000 nt, >1000 nt). D. Presence of attenuator and terminator loop. Highlighted on this slide: percentage of read-through across the TTSs and median size of the 3’ end extension across the TTSs.]

[Figure: read profiles in M9 and rich medium at the lolB-ispE-prs and prs terminators (ychH, prs, dauA; TSS_1262176; genomic positions 1259000-1262000), showing 3’ end extension across the TTS]
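The read-through percentage defined on this slide (reads passing through, divided by reads at the TTS plus reads passing through) can be sketched as follows. The `tolerance` window for deciding that a read ends "at" the TTS, the strand handling and the toy numbers are assumptions of this example.

```python
def read_through_fraction(three_prime_ends, tts, strand="+", tolerance=5):
    """Fraction of reads that extend past a TTS.

    three_prime_ends: 3' end coordinates of reads that start upstream of the
    TTS (hypothetical input). Reads ending within `tolerance` bases of the TTS
    count as terminated; reads ending further downstream count as read-through.
    """
    at_tts = 0
    passing = 0
    for end in three_prime_ends:
        offset = end - tts if strand == "+" else tts - end
        if abs(offset) <= tolerance:
            at_tts += 1
        elif offset > tolerance:
            passing += 1
        # Reads ending upstream of the TTS are ignored.
    total = at_tts + passing
    return passing / total if total else None

# Toy data for one TTS at position 1000 in two conditions (made-up numbers).
condition_a = [1000] * 6 + [1400] * 4
condition_b = [1001] * 1 + [1800] * 9
fraction_a = read_through_fraction(condition_a, tts=1000)
fraction_b = read_through_fraction(condition_b, tts=1000)
```

Comparing the two fractions for the same site is the kind of contrast behind a condition-dependent read-through call.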

Page 39

SMRT-Cappable-seq identifies 840 novel operons in E. coli

[Figure: the cytR, ftsN, hslV, hslU, menA, rraA locus (genomic positions 4120000-4124000) in M9 and rich medium, with read 5’ ends and 3’ ends and TSS_4119356, TSS_4120407, TSS_4121960, TSS_4122348, TSS_4123368 and TSS_4124510 marked. Promoters listed under RegulonDB: cytRp, hslVp, menAp, rraAp. Promoters listed under SMRT-Cappable-seq: cytRp, ftsNp, hslVp, hslUp, menAp, rraAp. Legend: TSS, TTS, no confirmed TTS.]

[Figure 4: A. Example of a novel extended operon: read counts by genomic position across rpsB, tsf, pyrH and frr, with the known operon and the elongated 3’ end marked. B. Genome-wide identification of novel extended operons: size (bases) of known operons and of SMRT-Cappable-seq operons, elongated at the 5’ or 3’ end. C. Fractions of novel operons in E. coli: same, novel, shorter 3’ end.]