Post on 10-Aug-2020


Page 1: (Positive) Design of RiboNucleic Acidsbioinformatics.aut.ac.ir/workshop/wp-content/uploads/2016/05/2017 … · The Ancestor’s Tale: A Pilgrimage to the Dawn of Evolution. RNA folding

(Positive) Design of RiboNucleic Acids

Yann Ponty*,•,†

* Centre National de la Recherche Scientifique (CNRS)

• LIX, Ecole Polytechnique

† AMIBio team, Inria Saclay

http://goo.gl/mejsFh


PhD in Computer Science, Université Paris-Sud (France)

CNRS research scientist

Faculty at LIX, Computer Science department of Ecole Polytechnique

Group leader – AMIBio team (Ecole Polytechnique and Inria Saclay)

Postdoc experience in RNA Computational Biology (Boston, Paris) and Discrete Mathematics (Paris)

Extended sabbatical at Simon Fraser University (Vancouver, Canada)


Fundamental dogma of molecular biology

DNA {A,C,G,T}*: THE CODE (genes)

RNAs {A,C,G,U}*: MEH. . .

Proteins {Ala, Arg, . . . , Val}* (20+ amino acids): THE MACHINE (enzymes)

DNA (two complementary strands):

A T G G T T A C C C A T

T A C C A A T G G G T A

RNA (transcribed by the polymerase, Pol):

A U G G U U A C C C A U

Protein (translated by the ribosome):

Met Val Thr His Ile Leu His Asn
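The transcription and translation steps of this slide can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the codon table is deliberately partial, covering just the four codons of the 12-nucleotide example (which yield the first four residues, Met Val Thr His).

```python
# Minimal sketch of DNA -> RNA -> protein for the slide's 12-nt example.
# Names and the partial codon table are illustrative assumptions.

DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Only the four codons needed for the example (standard genetic code).
PARTIAL_CODON_TABLE = {"AUG": "Met", "GUU": "Val", "ACC": "Thr", "CAU": "His"}


def complement_strand(dna):
    """Position-by-position complement of a DNA strand (as drawn on the slide)."""
    return "".join(DNA_COMPLEMENT[base] for base in dna)


def transcribe(coding_strand):
    """The RNA transcript reads like the coding strand, with U replacing T."""
    return coding_strand.replace("T", "U")


def translate(mrna, table=PARTIAL_CODON_TABLE):
    """Read the mRNA three letters at a time and look each codon up."""
    usable = len(mrna) - len(mrna) % 3
    return [table[mrna[i:i + 3]] for i in range(0, usable, 3)]


coding = "ATGGTTACCCAT"
mrna = transcribe(coding)
protein = translate(mrna)
```

Here `complement_strand(coding)` reproduces the second DNA strand shown on the slide, and `mrna` matches the displayed transcript.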



Fundamental dogma of molecular biology (v2.0)

DNA, RNA, Proteins

Diagram labels: Transcription, Translation, Regulation, Transfer, Participates, Maturation, Carrier, Synthesis

RNA functions: Messenger, Translation, Regulation, Enzyme, Catalytic, . . .

[Figure: number of RNA functional families (RFAM DB); axis from 0 to 3000]


RNA world: Resolving the chicken vs egg paradox at the origin of life. . .

Gene and Enzyme: the gene encodes the enzyme, the enzyme replicates the gene

RNA: encodes and replicates

"A gene big enough to specify an enzyme would be too big to replicate accurately without the aid of an enzyme of the very kind that it is trying to specify. So the system apparently cannot get started.

[· · · ] This is the RNA World. To see how plausible it is, we need to look at why proteins are good at being enzymes but bad at being replicators; at why DNA is good at replicating but bad at being an enzyme; and finally why RNA might just be good enough at both roles to break out of the Catch-22."

R. Dawkins. The Ancestor’s Tale: A Pilgrimage to the Dawn of Evolution


RNA folding

Canonical base-pairs:

Watson/Crick base-pairs: G/C, U/A

Wobble base-pair: U/G

RNAs are single-stranded and fold on themselves, establishing complex 3D structures that are essential to their function(s).

RNA structures are stabilized by base-pairs, each mediated by hydrogen bonds.
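As a small illustration of the pairing rules listed above, the following Python sketch classifies a pair of nucleotides as Watson/Crick, wobble, or not canonical. The function name and return values are hypothetical choices made for this example.

```python
# Sketch: classify a nucleotide pair according to the canonical pairing rules.
# Function name and labels are illustrative assumptions.

WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
WOBBLE = {("G", "U"), ("U", "G")}


def pair_type(x, y):
    """Return 'Watson-Crick', 'wobble', or None for a non-canonical pair."""
    if (x, y) in WATSON_CRICK:
        return "Watson-Crick"
    if (x, y) in WOBBLE:
        return "wobble"
    return None


def is_canonical(x, y):
    """True when the two nucleotides can form a canonical base-pair."""
    return pair_type(x, y) is not None
```

For instance, `pair_type("G", "U")` is classified as a wobble pair, while `is_canonical("A", "G")` is false.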


RNA Design

RNA = Linear Polymer = Sequence in {A,C,G,U}*

UUAGGCGGCCACAGC

GGUGGGGUUGCCUCC

CGUACCCAUCCCGAA

CACGGAAGAUAAGCC

CACCAGCGUUCCGGG

GAGUACUGGAGUGCG

CGAGCCUCUGGGAAA

CCCGGUUCGCCGCCA

CC

[Figure: Primary Structure, Secondary Structure and Tertiary Structure of the 5S rRNA (PDB ID: 1K73:B); positions numbered 1 to 122]


Evolution of RNAs

Homologous genes = Functionally equivalent, within or across organisms
Usually well-captured by sequence similarity in proteins, binding sites. . .

Problem: Many classes of non-(protein) coding RNAs (ncRNAs) are poorly conserved at the sequence level but adopt a conserved structure!

RFAM Bacterial RNAse-P class B Alignment


RNA Design

RNA = Linear Polymer = Sequence in {A,C,G,U}*

UUAGGCGGCCACAGC

GGUGGGGUUGCCUCC

CGUACCCAUCCCGAA

CACGGAAGAUAAGCC

CACCAGCGUUCCGGG

GAGUACUGGAGUGCG

CGAGCCUCUGGGAAA

CCCGGUUCGCCGCCA

CC

[Figure: Primary Structure, Secondary Structure and Tertiary Structure of the 5S rRNA (PDB ID: 1K73:B); positions numbered 1 to 122]

Structure Prediction: sequence → structure

RNA Design: structure → sequence


Why we design RNAs

To create building blocks for synthetic systems
Rationally-designed RNAs increase orthogonality

To assess the significance of observed phenomena
Random models should include every established characteristic. . .
. . . including adoption of a single structure

To test/push our understanding of how RNA folds
Misfolding RNAs reveal gaps in our energy models and descriptors for the conformational spaces

To help search for homologous sequences
Incomplete covariance models are hindered by limited training sets
Design can be used to generalize existing alignments

To fuel RNA-based therapeutics
Sequence-based (siRNA, synthetic genes), but structure matters

To perform controlled experiments


Controlled experiments through RNA design

Motivation: Quantifying the impact of structure S on the efficacy of a single Exonic Splicing Enhancer (ESE):

Presence of ESE motif E;

Different structures S1, S2. . . ;

Avoid library of (∼1500!) documented ESE motifs.

Objectives. Design RNA which:

1. Folds into a prescribed structure;

2. Features/avoids motifs;

3. Controls GC%, Boltzmann probability. . .

The structural context of an ESE motif in a transcript was shown to affect its functionality. [Liu et al, FEBS Lett. 2010]


Design objectives

Positive structural design
Optimize affinity of designs towards target structure(s)
Examples: Most stable sequence for given fold. . .

Negative structural design
Limit affinity of designs towards alternative structures
Examples: Lowest free-energy, High Boltzmann probability/Low entropy. . .

Additional constraints:

Forbid motif list to appear anywhere in design
Force motif list to appear each at least once
Limit available alternatives at certain positions
Control overall composition (GC-content)
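The four additional constraints can be read as a simple filter on candidate sequences. The sketch below is one hedged way to express them in Python; the function names, the parameter names (`forbidden`, `required`, `allowed`, `gc_range`) and the example values are assumptions made for illustration, not part of any tool named in these slides.

```python
# Sketch: checking the sequence constraints listed above on a candidate design.
# All names and default values are illustrative assumptions.

def gc_content(seq):
    """Fraction of G and C nucleotides in the sequence."""
    return sum(1 for base in seq if base in "GC") / len(seq)


def satisfies_constraints(seq, forbidden=(), required=(), allowed=None,
                          gc_range=(0.0, 1.0)):
    """Return True when seq respects every constraint.

    forbidden: motifs that must not appear anywhere
    required:  motifs that must each appear at least once
    allowed:   dict position -> string of letters permitted at that position
    gc_range:  (min, max) bounds on the overall GC-content
    """
    if any(motif in seq for motif in forbidden):
        return False
    if not all(motif in seq for motif in required):
        return False
    if allowed is not None:
        for position, letters in allowed.items():
            if seq[position] not in letters:
                return False
    low, high = gc_range
    return low <= gc_content(seq) <= high


candidate = "GGGAAACCC"
ok = satisfies_constraints(candidate, forbidden=["UUU"], required=["AAA"],
                           allowed={0: "GC"}, gc_range=(0.5, 0.7))
```

A filter of this kind only rejects candidates after the fact; generating sequences that satisfy the constraints by construction is a separate problem.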


Outline

I. Single Structure Design (IncaRNAtion)

II. Constrained Design using Formal Languages

III. Multiple Structures


I. Inverse Folding

Designing a given structure


RNA sequence and structure(s)

RNA = Linear Polymer = Sequence in {A,C,G,U}*

UUAGGCGGCCACAGC

GGUGGGGUUGCCUCC

CGUACCCAUCCCGAA

CACGGAAGAUAAGCC

CACCAGCGUUCCGGG

GAGUACUGGAGUGCG

CGAGCCUCUGGGAAA

CCCGGUUCGCCGCCA

CC

[Figure: Primary Structure, Secondary Structure and Tertiary Structure of the 5S rRNA (PDB ID: 1K73:B); positions numbered 1 to 122]

MFE folding prediction: O(n³)
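To give a feel for where the cubic complexity comes from, here is a sketch of a much simpler relative of MFE folding: Nussinov-style base-pair maximization. It is a stand-in, not the energy model used by real MFE predictors, but it shares the same dynamic-programming shape: a quadratic table whose cells each scan a linear number of split points. The function name and the `min_loop` parameter (minimum number of unpaired bases inside a hairpin) are assumptions of this sketch.

```python
# Sketch: base-pair maximization (Nussinov-style), a simplified stand-in for
# MFE folding. O(n^2) table cells, O(n) work per cell, hence O(n^3) time.

CANONICAL = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def max_pairs_fold(seq, min_loop=3):
    """Return (number of pairs, dot-bracket string) of one optimal structure."""
    n = len(seq)
    if n == 0:
        return 0, ""
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]                  # case 1: j is unpaired
            for k in range(i, j - min_loop):        # case 2: j pairs with k
                if (seq[k], seq[j]) in CANONICAL:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Traceback: rebuild one structure achieving the optimal score.
    structure = ["."] * n
    todo = [(0, n - 1)]
    while todo:
        i, j = todo.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            todo.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in CANONICAL:
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + 1 + best[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        todo.append((i, k - 1))
                    todo.append((k + 1, j - 1))
                    break
    return best[0][n - 1], "".join(structure)


count, fold = max_pairs_fold("GGGAAACCC")
```

On the toy input `GGGAAACCC`, the three G/C pairs close a three-nucleotide hairpin loop.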


Crossing interactions

Excluded from the secondary structure:

Non-canonical base-pairs:
Any base-pair other than {(A-U), (C-G), (G-U)}, or interacting in a non-standard way (anything other than WC/WC-Cis) [Leontis & Westhof, RNA 2001].

[Figure: Canonical CG base-pair (WC/WC-Cis) and non-canonical base-pair (Sugar/WC-Trans)]

(Pseudo?)knots: Crossing sets of nested stable base-pairs

G A G C C U U U A U A C A G U A A U G U A U A U C G A A A A A U C C U C U A A U U C A G G G A A C A C C U A A G G C A A U C C U G A G C U A A G C U C U U A G U A A U A A G A G A A A G U G C A A C G A C U A U U C C G A U A G G A A G U A G G G U C A A G U G A C U C G A A A U G G G G A U U A C C C U U C U A G G G U A G U G A U A U A G U C U G A A C A U A U A U G G A A A C A U A U A G A A G G A U A G G A G U A A C G A A C C U A U C C G U A A C A U A A U U G

[Figure: Group I Ribozyme (PDB ID: 1Y0Q:A), positions numbered 1 to 229]


Crossing interactions do exist!

Example: Group II Intron (PDB ID: 3IGI)

But they are hard to predict [Lyngsoe, ICALP’04] [Sheikh, Backofen, Ponty, CPM’12]
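Although predicting crossing structures is hard, detecting crossings in a given set of base-pairs is easy: two pairs (i, j) and (k, l) cross when i < k < j < l. The Python sketch below applies this test to every couple of pairs; the function name and the 0-based position convention are assumptions of the example.

```python
# Sketch: detect crossing base-pairs (pseudoknots) in a list of (i, j) pairs.
# Quadratic in the number of pairs; names and indexing are illustrative.

def crossing_pairs(base_pairs):
    """Return every couple of base-pairs ((i, j), (k, l)) with i < k < j < l."""
    pairs = sorted((min(p), max(p)) for p in base_pairs)
    crossings = []
    for a in range(len(pairs)):
        i, j = pairs[a]
        for b in range(a + 1, len(pairs)):
            k, l = pairs[b]
            if i < k < j < l:
                crossings.append(((i, j), (k, l)))
    return crossings


nested = [(0, 9), (1, 8), (2, 7)]   # a plain helix: no crossing
knotted = [(0, 5), (3, 8)]          # two interleaved pairs: one crossing
```

A structure is a valid secondary structure in the sense used here exactly when `crossing_pairs` returns an empty list.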


Thermodynamics vs Kinetics

Paradigms for RNA structure prediction

1978–1990s: Most probable structure = Minimal Free-Energy (MFE)

1990s–2010s: Functional structure(s) = Boltzmann ensemble (partition function)

2010s–????: Embracing the kinetics of RNA folding

[Animation: snapshots at T = 0h, 1h, 2h, 5h, 10h and T → ∞]

mRNA half-life: ∼7h (Mouse [Sharova2009])
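The link between the first two paradigms can be made concrete with a toy computation: at thermodynamic equilibrium, a structure of free-energy E has probability exp(-E/RT)/Z, where the partition function Z sums exp(-E/RT) over all structures, so the MFE structure is the most probable one. The sketch below uses made-up energies for three hypothetical structures; only the formula itself is standard.

```python
# Sketch: Boltzmann probabilities over a tiny, made-up set of structures.
# The energies below are invented for illustration.
import math

RT = 0.0019872 * 310.15  # kcal/mol at 37 degrees Celsius


def boltzmann_probabilities(energies, rt=RT):
    """Map each structure to exp(-E/RT) / Z, with Z the partition function."""
    weights = {s: math.exp(-e / rt) for s, e in energies.items()}
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}


toy_energies = {             # hypothetical free-energies in kcal/mol
    "(((...)))": -3.0,
    "((.....))": -1.5,
    ".........": 0.0,
}
probabilities = boltzmann_probabilities(toy_energies)
most_probable = max(probabilities, key=probabilities.get)
```

In this toy ensemble the lowest-energy structure dominates, but the other structures keep a non-zero probability, which is what the ensemble view accounts for.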


Problem statement

CGGAUGUUACAGCGAGCAUGUGCCCG

[Figure: the 26-nucleotide sequence above drawn as a secondary structure, positions numbered 1 to 26]

RNA structure S: Non-crossing base-pairs for positions in sequence w

Motifs: Sequence/structure features (e.g. Base-pairs, Stacks, Loops. . . )

Energy model:
Motif → Free-energy contribution ∆(·) ∈ ℝ⁻ ∪ {+∞}
Free-Energy E_w(S): Sum over (independently contributing) motifs in S

Page 53: (Positive) Design of RiboNucleic Acidsbioinformatics.aut.ac.ir/workshop/wp-content/uploads/2016/05/2017 … · The Ancestor’s Tale: A Pilgrimage to the Dawn of Evolution. RNA folding

Problem statement

C G G A U G U U A C A G C G A G C A U G U G C C C G

1 10 20 26

C

G

G

A

U

G

U

U

ACA

G

C

GA

GC A

U

GUG

CC

C

G

1

10

20

26

RNA structure S: Non-crossing base-pairs for positions in sequence w

Motifs: Sequence/structure features (e.g. Base-pairs, Stacks, Loops. . . )

Energy model:Motif→ Free-energy contribution ∆(·) ∈ R− ∪ {+∞}Free-Energy Ew (S): Sum over (independently contributing) motifs in S

Page 54: (Positive) Design of RiboNucleic Acidsbioinformatics.aut.ac.ir/workshop/wp-content/uploads/2016/05/2017 … · The Ancestor’s Tale: A Pilgrimage to the Dawn of Evolution. RNA folding

Problem statement

C G G A U G U U A C A G C G A G C A U G U G C C C G

1 10 20 26

C

G

G

A

U

G

U

U

ACA

G

C

GA

GC A

U

GUG

CC

C

G

1

10

20

26

RNA structure S: Non-crossing base-pairs for positions in sequence w

Motifs: Sequence/structure features (e.g. Base-pairs, Stacks, Loops. . . )

Energy model:Motif→ Free-energy contribution ∆(·) ∈ R− ∪ {+∞}Free-Energy Ew (S): Sum over (independently contributing) motifs in S

Page 55: (Positive) Design of RiboNucleic Acidsbioinformatics.aut.ac.ir/workshop/wp-content/uploads/2016/05/2017 … · The Ancestor’s Tale: A Pilgrimage to the Dawn of Evolution. RNA folding

Problem statement

C G G A U G U U A C A G C G A G C A U G U G C C C G

1 10 20 26

C

G

G

A

U

G

U

U

ACA

G

C

GA

GC A

U

GUG

CC

C

G

1

10

20

26

RNA structure S: Non-crossing base-pairs for positions in sequence w

Motifs: Sequence/structure features (e.g. Base-pairs, Stacks, Loops. . . )

Energy model:Motif→ Free-energy contribution ∆(·) ∈ R− ∪ {+∞}Free-Energy Ew (S): Sum over (independently contributing) motifs in S

Page 56: (Positive) Design of RiboNucleic Acidsbioinformatics.aut.ac.ir/workshop/wp-content/uploads/2016/05/2017 … · The Ancestor’s Tale: A Pilgrimage to the Dawn of Evolution. RNA folding

Problem statement

C G G A U G U U A C A G C G A G C A U G U G C C C G

1 10 20 26

[Figure: secondary structure of the sequence above, positions 1–26]

RNA structure S: Non-crossing base-pairs for positions in sequence w

Motifs: Sequence/structure features (e.g. Base-pairs, Stacks, Loops, ...)

Energy model: Motif → Free-energy contribution $\Delta(\cdot) \in \mathbb{R}^- \cup \{+\infty\}$

Free-Energy $E_w(S)$: Sum over (independently contributing) motifs in S

$E_S = 2\cdot\Delta(\mathrm{U},\mathrm{G}) + 4\cdot\Delta(\mathrm{G},\mathrm{C}) + 2\cdot\Delta(\mathrm{C},\mathrm{G})$
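The additive energy model can be sketched in a few lines. This is a minimal illustration for base-pair motifs only. The energy values in `PAIR_ENERGY`, the helper names, and the dot-bracket structure `STRUCT` are assumptions: the structure is one that is consistent with the pair counts above (2 U–G, 4 G–C, 2 C–G), since the original figure is not available in the text.

```python
import math

# Hypothetical base-pair energies in kcal/mol (illustrative, not Turner values).
PAIR_ENERGY = {
    ("G", "C"): -3.0, ("C", "G"): -3.0,
    ("A", "U"): -2.0, ("U", "A"): -2.0,
    ("G", "U"): -1.0, ("U", "G"): -1.0,
}

def parse_pairs(dot_bracket):
    """Return the sorted list of base-pairs (i, j), 0-based, of a dot-bracket string."""
    stack, pairs = [], []
    for j, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            pairs.append((stack.pop(), j))
    return sorted(pairs)

def energy(seq, dot_bracket):
    """Sum of independent base-pair contributions; +inf if some pair is invalid."""
    return sum(PAIR_ENERGY.get((seq[i], seq[j]), math.inf)
               for i, j in parse_pairs(dot_bracket))

SEQ = "CGGAUGUUACAGCGAGCAUGUGCCCG"
# Assumed structure (not given in the text), written stem by stem.
STRUCT = "(((." + "(((....)))" + "." + "((....))" + ")))"

print(energy(SEQ, STRUCT))  # 2*(-1.0) + 4*(-3.0) + 2*(-3.0)
```

A stacking or loop-based model would replace the per-pair lookup by a lookup on larger motifs, but the sum over independently contributing motifs stays the same.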

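$E_S = \Delta(\mathrm{C\,G\,G\,C}) + \Delta(\mathrm{G\,C\,G\,C}) + \Delta(\mathrm{U\,G\,G\,C}) + \Delta(\mathrm{U\,G\,G\,C}) + \Delta(\mathrm{U\,G\,G\,C}) + \Delta(\mathrm{U\,U\,ACA\,G}) + \Delta(\mathrm{G\,A\,UG\,A\,G\,CC}) + \Delta(\mathrm{C\,A\,U\,GUG})$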

Definition (MFE-PREDICT(E) problem)

Input: RNA sequence $w \in \{A,C,G,U\}^*$

Output: Secondary struct. $S^*$ with Minimal Free-Energy (MFE) $E_w(S^*)$

Problem solved exactly in $O(n^3)$ time [Nussinov Jacobson, PNAS 1980] [Zuker Stiegler, NAR 1981], ...

Dynamic programming (DP) for RNA folding

Theorem ([Nussinov and Jacobson(1980)])

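Max #base-pairs/min energy structure computed in $O(n^3)$/$O(n^2)$ time/memory

[Diagram: interval [i, j] = (i unpaired, then interval [i+1, j]) + (i paired with k, with ≥ θ positions between i and k, then interval [k+1, j])]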

Ei,k : Free-energy contribution of base-pair (i, k). (−1/+∞ or ∆G(si?≡ sk ))

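$N_{i,j}$: Max #base-pairs over interval [i, j], computed as a minimal energy (with $E_{i,k} = -1$ per pair, $N_{i,j}$ is minus the max #base-pairs)

$N_{i,t} = 0, \forall t \in [i, i+\theta]$

$$N_{i,j} = \min \begin{cases} N_{i+1,j} & \{i \text{ unpaired}\} \\ \min_{k=i+\theta+1}^{j} \; E_{i,k} + N_{i+1,k-1} + N_{k+1,j} & \{i \text{ paired to } k\} \end{cases}$$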
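The recurrence above translates almost literally into a memoized function. The sketch below assumes the unit energy model (−1 for a valid pair, +∞ otherwise); the set `VALID` and the function name `nussinov` are illustrative choices, not part of the original slides.

```python
from functools import lru_cache

# Assumed set of valid pairs: Watson-Crick plus G-U wobble.
VALID = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov(seq, theta=3):
    """Max number of base-pairs of seq, with at least theta unpaired
    positions between the two ends of any pair."""
    n = len(seq)

    @lru_cache(maxsize=None)
    def N(i, j):
        # Base case: intervals too short to hold a pair (includes empty ones).
        if j - i <= theta:
            return 0
        best = N(i + 1, j)  # case: i unpaired
        for k in range(i + theta + 1, j + 1):  # case: i paired to k
            if (seq[i], seq[k]) in VALID:
                best = min(best, -1 + N(i + 1, k - 1) + N(k + 1, j))
        return best

    return -N(0, n - 1)

print(nussinov("GGGAAACCC"))
```

The memoization table has O(n²) entries and each entry scans O(n) values of k, which matches the stated O(n³)/O(n²) bounds. The recursive form is only meant for short sequences; an iterative table fill avoids Python's recursion limit on long ones.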

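$C_{i,j}$: Number of secondary structures compatible with interval [i, j]

$C_{i,t} = 1, \forall t \in [i, i+\theta]$

$$C_{i,j} = \sum \begin{cases} C_{i+1,j} & \{i \text{ unpaired}\} \\ \sum_{k=i+\theta+1}^{j} \mathbb{1}_{\text{comp.}(i,k)} \times C_{i+1,k-1} \times C_{k+1,j} & \{i \text{ paired to } k\} \end{cases}$$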

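$Z_{i,j} = \sum_{S \text{ comp. with } w[i,j]} e^{-E_w(S)/RT}$ = Partition function for compatible structs within [i, j]

$Z_{i,t} = 1, \forall t \in [i, i+\theta]$

$$Z_{i,j} = \sum \begin{cases} Z_{i+1,j} & \{i \text{ unpaired}\} \\ \sum_{k=i+\theta+1}^{j} e^{-E_{i,k}/RT} \times Z_{i+1,k-1} \times Z_{k+1,j} & \{i \text{ paired to } k\} \end{cases}$$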
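The same decomposition yields the partition function by replacing (min, +) with (+, ×). The sketch below is an illustration under assumptions: `unit_energy` and `zero_energy` are hypothetical energy functions, and the default `rt` is RT in kcal/mol at roughly 37 °C. With zero energy for every compatible pair, the result is simply the number of compatible structures.

```python
import math
from functools import lru_cache

VALID = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def unit_energy(a, b):
    """Assumed energy model: -1 for a valid pair, +inf otherwise."""
    return -1.0 if (a, b) in VALID else math.inf

def zero_energy(a, b):
    """Zero energy for valid pairs: the partition function then counts structures."""
    return 0.0 if (a, b) in VALID else math.inf

def partition_function(seq, pair_energy, theta=3, rt=0.616):
    """Sum of Boltzmann factors over all structures compatible with seq."""
    n = len(seq)

    @lru_cache(maxsize=None)
    def Z(i, j):
        if j - i <= theta:  # only the empty structure fits
            return 1.0
        total = Z(i + 1, j)  # case: i unpaired
        for k in range(i + theta + 1, j + 1):  # case: i paired to k
            e = pair_energy(seq[i], seq[k])
            if e != math.inf:
                total += math.exp(-e / rt) * Z(i + 1, k - 1) * Z(k + 1, j)
        return total

    return Z(0, n - 1)

print(partition_function("GAAAC", zero_energy))            # number of structures
print(partition_function("GAAAC", unit_energy, rt=1.0))    # 1 + e
```

Correctness of the sum relies on the decomposition being unambiguous: each structure is generated by exactly one derivation, so no Boltzmann factor is counted twice.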

Many extensions:

Nearest-neighbor/Turner energy model [Zuker1981]

Comparative folding [Sankoff1985]

Equilibrium base-pairing probabilities [McCaskill1990]

Moments of additive features [Miklos2005,Ponty2011]

$\Delta$ kcal.mol$^{-1}$ suboptimal structures of MFE [Wuchty1999]

Basic crossing structures [Rivas1999], ...

Exact sampling in Boltzmann distr. [Ding2003,Ponty2008]

Maximum expected accuracy structure [Do2006]

Distance-classified partitioning of Boltzmann ens. [E.Freyhult2007a]

Made possible by:

Completeness/Unambiguity of decomposition: ∃ energy-preserving bijection between derivations of DP scheme and search space

Objective function additive with respect to DP scheme

RNA inverse folding

RNA = Linear Polymer = Sequence in $\{A,C,G,U\}^*$

UUAGGCGGCCACAGC
GGUGGGGUUGCCUCC
CGUACCCAUCCCGAA
CACGGAAGAUAAGCC
CACCAGCGUUCCGGG
GAGUACUGGAGUGCG
CGAGCCUCUGGGAAA
CCCGGUUCGCCGCCA
CC

[Figure: Primary Structure | Secondary Structure (positions 1–122) | Tertiary Structure]

5s rRNA (PDBID: 1K73:B)

MFE folding prediction: $\Theta(n^3)$

Inverse folding: NP-hard?

RNA Inverse Folding

Definition (INVERSE-FOLDING(E) problem)

Input: Secondary structure S + Energy distance $\Delta > 0$.

Output: RNA sequence $w \in \Sigma^*$ such that:

$\forall S' \in \mathcal{S}_{|w|} \setminus \{S\} : E_{w,S'} \ge E_{w,S} + \Delta$

or ∅ if no such sequence exists.

Difficult problem: No obvious DP decomposition

Existing algorithms: Heuristics or Exponential-time

Complexity of problem unknown (despite [Schnall Levin et al, ICML’08])

Reason: Non locality, no theoretical frameworks, too many parameters, ...
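To make the definition concrete, here is a brute-force (exponential-time) sketch that is only usable for very short structures. It assumes a base-pair energy model with hypothetical values in `PAIR_ENERGY`, represents structures as sets of 0-based pairs, and is not one of the published tools: it simply enumerates every sequence and every competing structure.

```python
import itertools
import math

# Hypothetical base-pair energies in kcal/mol (illustrative only).
PAIR_ENERGY = {
    ("G", "C"): -3.0, ("C", "G"): -3.0,
    ("A", "U"): -2.0, ("U", "A"): -2.0,
    ("G", "U"): -1.0, ("U", "G"): -1.0,
}

def structures(n, theta=3):
    """All non-crossing structures on n positions, as frozensets of pairs."""
    def rec(i, j):
        if j - i <= theta:
            yield frozenset()
            return
        yield from rec(i + 1, j)  # i unpaired
        for k in range(i + theta + 1, j + 1):  # i paired to k
            for inner in rec(i + 1, k - 1):
                for outer in rec(k + 1, j):
                    yield inner | outer | {(i, k)}
    return list(rec(0, n - 1))

def energy(seq, pairs):
    return sum(PAIR_ENERGY.get((seq[i], seq[k]), math.inf) for i, k in pairs)

def inverse_fold(n, target, delta, theta=3):
    """First sequence (in lexicographic order) for which every other structure
    is at least delta above the target, or None if no sequence qualifies."""
    others = [s for s in structures(n, theta) if s != target]
    for letters in itertools.product("ACGU", repeat=n):
        e_target = energy(letters, target)
        if e_target == math.inf:
            continue  # sequence is incompatible with the target
        if all(energy(letters, s) >= e_target + delta for s in others):
            return "".join(letters)
    return None

hairpin = frozenset({(0, 4)})
print(inverse_fold(5, hairpin, delta=2.0))
print(inverse_fold(5, hairpin, delta=4.0))
```

The outer loop visits 4^n sequences and the inner check visits every structure, which is why practical tools fall back on heuristics.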

Example:

[Figure: target structure over positions 1–26, drawn as two consecutive copies of a structure over positions 1–13]

[Figure: AAGAGUCGCUCUC folds into the structure over positions 1–13]

[Figure: AAGAGUCGCUCUCAAGAGUCGCUCUC, labeled "Folds", next to structures over positions 1–26]

Existing approaches for negative design

Based on local search. . .

RNAInverse - TBI Vienna

Info-RNA - Backofen@Freiburg

RNA-SSD - Condon@UBC

NUPack - Pierce@Caltech

. . . bio-inspired algorithms. . .

RNAFBinv - Barash@Ben Gurion

FRNAKenstein - Hein@Oxford

AntaRNA - Backofen@Freiburg

ERD - Ganjtabesh@Tehran

. . . exact approaches. . .

RNAIFold - Clote@Boston College

CO4 - Will@Leipzig

Typical issues:

Naive initialization strategies

Poor coverage of sequence space:

Local search remains confined near initial sequence

GC-rich produced sequences

⇒ Global sampling [Levin et al, NAR 12]

The case for a control of GC-content

High GC-content suspected to induce kinetic traps

Global sampling [Levin et al, NAR 12]

Target structure S

Boltzmann distribution based on affinity towards S

Random generation from Boltzmann Distribution

Fold sampled sequences and compare to target

Boltzmann factor: $B_w(S) := e^{-E_w(S)/RT}$

Pseudo-Partition Function: $Z(S) = \sum_{w \in \Sigma^*} B_w(S)$

Boltzmann probability: $p(w) := B_w(S) / Z(S)$

[Figure: seven secondary structures over positions 1–30]
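For a tiny target structure, the three quantities above can be computed by plain enumeration. The sketch below assumes a base-pair energy model with hypothetical values in `PAIR_ENERGY`, restricts the sum to sequences of the same length as S, and uses made-up function names; it is meant to illustrate the definitions, not the actual algorithm of the cited work.

```python
import itertools
import math

# Hypothetical base-pair energies in kcal/mol (illustrative only).
PAIR_ENERGY = {
    ("G", "C"): -3.0, ("C", "G"): -3.0,
    ("A", "U"): -2.0, ("U", "A"): -2.0,
    ("G", "U"): -1.0, ("U", "G"): -1.0,
}

def parse_pairs(dot_bracket):
    stack, pairs = [], []
    for j, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            pairs.append((stack.pop(), j))
    return pairs

def boltzmann_distribution(dot_bracket, rt=0.616):
    """Return (p, Z): Boltzmann probability of each sequence w for the fixed
    structure, and the pseudo-partition function Z(S)."""
    pairs = parse_pairs(dot_bracket)
    factors = {}
    for letters in itertools.product("ACGU", repeat=len(dot_bracket)):
        e = sum(PAIR_ENERGY.get((letters[i], letters[j]), math.inf)
                for i, j in pairs)
        factors["".join(letters)] = math.exp(-e / rt)  # exp(-inf) is 0.0
    z = sum(factors.values())
    return {w: b / z for w, b in factors.items()}, z

p, z = boltzmann_distribution("(...)")
best = max(p, key=p.get)
print(best, p[best], z)
```

Sequences incompatible with S get probability 0, and the most probable sequences are those placing the strongest pairs in the helices, which hints at why this distribution leans towards GC-rich designs.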

IncaRNAtion [Reinharz et al, Bioinformatics 2013]

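Explore sequence space, structure fixed (only 1 case applies to each interval [i, j], with nucleotides a, b):

[Position i unpaired]: $\sum_{a'}$ over the nucleotide a′, continuing on interval [i + 1, j]

[Position i paired with k]: $\sum_{a',b'}$ over the nucleotides a′, b′, continuing on intervals [i + 1, k − 1] and [k + 1, j]

[Paired ends + Stacking pairs]: $\sum_{a',b'}$ over the nucleotides a′, b′, continuing on interval [i + 1, j − 1]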

"Global" Stochastic Backtrack

Z = weight of nt1

Z = weight of nt2 ∧ nt17

Z = weight of nt3 ∧ nt16

Z = weight of nt4 ∧ nt8

Z = weight of nt9 ∧ nt15

Z = weight of nt10 ∧ nt14

Sequence: nt1 nt2 nt17 nt3 nt16 nt4 nt8 nt5 nt6 nt7 nt9 nt15 nt10 nt14

Positions: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

GC-content bias

[Histogram: GC content (x-axis, 0.0 to 1.0) vs Quantity (y-axis, 0 to 1000)]

Target 15% GC

Average GC content is 64%

RFAM average GC content 48.8%

Weighted DP Recursions

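State: interval [i, j] with nucleotides a, b

[Position i unpaired]: $\sum_{a'} x^{\#GC(a')}$, continuing on interval [i + 1, j]

[Position i paired with k]: $\sum_{a',b'} x^{\#GC(a'b')}$, continuing on intervals [i + 1, k − 1] and [k + 1, j]

[Paired ends + Stacking pairs]: $\sum_{a',b'} x^{\#GC(a'b')}$, continuing on interval [i + 1, j − 1]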
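The effect of the weight x can be illustrated on a simplified model. The sketch below assumes a base-pair energy model without stacking (hypothetical values in `PAIR_ENERGY`), in which unpaired positions and base-pairs are independent, so a weighted sequence can be drawn position by position instead of through the full recursions. The function names are illustrative.

```python
import math
import random

# Hypothetical base-pair energies in kcal/mol (illustrative only).
PAIR_ENERGY = {
    ("G", "C"): -3.0, ("C", "G"): -3.0,
    ("A", "U"): -2.0, ("U", "A"): -2.0,
    ("G", "U"): -1.0, ("U", "G"): -1.0,
}

def parse_pairs(dot_bracket):
    stack, pairs = [], []
    for j, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            pairs.append((stack.pop(), j))
    return pairs

def gc_count(letters):
    return sum(c in "GC" for c in letters)

def sample_weighted(dot_bracket, x, rng, rt=0.616):
    """Draw a sequence with probability proportional to B_w(S) * x**#GC(w)."""
    seq = [None] * len(dot_bracket)
    pair_types = list(PAIR_ENERGY)
    pair_weights = [math.exp(-PAIR_ENERGY[p] / rt) * x ** gc_count(p)
                    for p in pair_types]
    for i, j in parse_pairs(dot_bracket):
        seq[i], seq[j] = rng.choices(pair_types, weights=pair_weights)[0]
    unpaired_weights = [x ** gc_count(a) for a in "ACGU"]
    for i, c in enumerate(seq):
        if c is None:
            seq[i] = rng.choices("ACGU", weights=unpaired_weights)[0]
    return "".join(seq)

rng = random.Random(0)
print(sample_weighted("((...))..", x=1.0, rng=rng))   # unweighted Boltzmann sample
print(sample_weighted("((...))..", x=0.05, rng=rng))  # GC strongly penalized
```

Setting x = 1 recovers the unweighted distribution, x < 1 penalizes each G or C, and x > 1 favors them.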

IncaRNAtion NT distribution: Bisection scheme

[Histogram: GC content (x-axis, 0.0 to 1.0) vs Quantity (y-axis, 0 to 1000)]

Target 15% GC

x#GC = 0.2500: Tot round 1 is 0

x#GC = 0.1250: Tot round 2 is 141

x#GC = 0.0625: Tot round 3 is 547

x#GC = 0.0938: Tot round 4 is 823

x#GC = 0.0781: Tot round 5 is 1165

[Waldispühl and Ponty, RECOMB, 2011]
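The search for a suitable weight can be sketched as a plain bisection. The version below is a simplification under assumptions: it reuses a stacking-free base-pair model (hypothetical `PAIR_ENERGY` values) in which the expected GC content is available in closed form, whereas the scheme shown above adjusts x from sampled sequences. It relies on the expected GC content increasing with x, and on the target lying between the values reached at `lo` and `hi`.

```python
import math

# Hypothetical base-pair energies in kcal/mol (illustrative only).
PAIR_ENERGY = {
    ("G", "C"): -3.0, ("C", "G"): -3.0,
    ("A", "U"): -2.0, ("U", "A"): -2.0,
    ("G", "U"): -1.0, ("U", "G"): -1.0,
}

def gc_count(letters):
    return sum(c in "GC" for c in letters)

def expected_gc(x, n_unpaired, n_pairs, rt=0.616):
    """Expected GC fraction of a sequence drawn with weight B_w(S) * x**#GC(w)."""
    per_unpaired = x / (1.0 + x)  # weights: A=1, C=x, G=x, U=1
    weights = {p: math.exp(-e / rt) * x ** gc_count(p)
               for p, e in PAIR_ENERGY.items()}
    z = sum(weights.values())
    per_pair = sum(gc_count(p) * w for p, w in weights.items()) / z
    return (n_unpaired * per_unpaired + n_pairs * per_pair) / (n_unpaired + 2 * n_pairs)

def find_weight(target, n_unpaired, n_pairs, lo=0.0, hi=1.0, tol=1e-9):
    """Bisection on x so that the expected GC fraction matches target."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_gc(mid, n_unpaired, n_pairs) > target:
            hi = mid  # too GC-rich: lower the weight
        else:
            lo = mid  # too GC-poor: raise the weight
    return (lo + hi) / 2.0

x = find_weight(0.15, n_unpaired=7, n_pairs=5)
print(x, expected_gc(x, 7, 5))
```

In this toy model the unweighted distribution (x = 1) is already well above 50% GC for a structure with 5 pairs and 7 unpaired positions, so a target of 15% requires x well below 1.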

Limits of the approach

[Figure: seven secondary structures over positions 1–30]

Heuristic: Strong affinity is neither sufficient, nor necessary, but . . .

Strong empirical correlation affinity/success of design [Levin et al, NAR 2012]

Linear time-complexity [Reinharz Ponty Waldispühl, ISMB/ECCB’13]

Composition control [Bodini Ponty, AofA’10] [Reinharz et al, ISMB/ECCB’13]

Complementary with local search approaches [Reinharz et al, ISMB/ECCB’13]

Local vs Global vs "Glocal"

The success of glocal strategies

Sampling + Optimize creates highly probable design sequences

II. Constrained design

Avoiding/forcing motifs

Existing approaches for negative design

Based on local search. . .

RNAInverse - TBI Vienna

Info-RNA - Backofen@Freiburg

RNA-SSD - Condon@UBC

NUPack - Pierce@Caltech

. . . bio-inspired algorithms. . .

RNAFBinv - Barash@Ben Gurion

FRNAKenstein - Hein@Oxford

AntaRNA - Backofen@Freiburg

. . . exact approaches. . .

RNAIFold - Clote@Boston College

CO4 - Will@Leipzig

Few algorithms support avoided/mandatory motifs. . .

. . . none guarantees reasonable runtime.

Typical reasons:

Deep local minima (Rugged landscape)

Mandatory motifs ⇒ Late dead ends (Branch and Bound)

Forbidden motifs ⇒ Search space disconnection (Local Search)

Page 103: (Positive) Design of RiboNucleic Acidsbioinformatics.aut.ac.ir/workshop/wp-content/uploads/2016/05/2017 … · The Ancestor’s Tale: A Pilgrimage to the Dawn of Evolution. RNA folding

Existing approaches for negative design

Based on local search. . .

RNAInverse - TBI Vienna

Info-RNA -Backofen@Freiburg

RNA-SSD - Condon@UBC

NUPack - Pierce@Caltech

. . . bio-inspired algorithms. . .

RNAFBinv - Barash@Ben Gurion

FRNAKenstein - Hein@Oxford

AntaRNA - Backofen@Freiburg

. . . exact approaches. . .

RNAIFold - Clote@Boston College

CO4 - Will@Leipzig

Few algorithms support avoided/mandatory motifs. . .

. . . none guarantees reasonable runtime.

Typical reasons:Deep local minima (Rugged landscape)

Mandatory motifs⇒ Late deadends (Branch and Bound)

Forbidden motifs⇒ Search space disconnection (Local Search)

Page 104: (Positive) Design of RiboNucleic Acidsbioinformatics.aut.ac.ir/workshop/wp-content/uploads/2016/05/2017 … · The Ancestor’s Tale: A Pilgrimage to the Dawn of Evolution. RNA folding

Problem with local approaches: An example

Simplified vocabulary {A,U}

+ Forbidden motifs F = {AU,UA}

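[Figure: graph of the length-5 sequences over {A,U}, linked by single mutations. The only two sequences avoiding F, UUUUU and AAAAA, are separated by sequences that all contain a forbidden motif (AAUAA, AUAAA, UAAAA, UUUUA, . . . ).]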

⇒ F may disconnect search space (holds for any move set!)
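
The disconnection can be checked by brute force on the slide's toy example. The sketch below assumes single point mutations as the move set (the slide's claim is more general); the helper names `is_valid`, `neighbors` and `valid_component` are illustrative only.

```python
from itertools import product

ALPHABET = "AU"
FORBIDDEN = ("AU", "UA")

def is_valid(seq):
    """True if no forbidden motif occurs in seq."""
    return not any(motif in seq for motif in FORBIDDEN)

def neighbors(seq):
    """All sequences at Hamming distance 1 (assumed move set)."""
    for i, old in enumerate(seq):
        for new in ALPHABET:
            if new != old:
                yield seq[:i] + new + seq[i + 1:]

def valid_component(start):
    """Valid sequences reachable from start through valid sequences only."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbors(stack.pop()):
            if is_valid(nxt) and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

valid = ["".join(p) for p in product(ALPHABET, repeat=5) if is_valid("".join(p))]
print(valid)                      # the two valid designs
print(valid_component("AAAAA"))   # a local search started here never leaves
```

A local search started at AAAAA can only visit AAAAA, so it never reaches UUUUU even though both satisfy the constraints.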

Idea

Use formal language constructs to constrain global sampling

Forced motifs + Avoided motifs

→ Regular language L_C ∈ Reg

Structure compatibility + Positional constraints + Energy model

→ Weighted context-free language L_S ∈ CFL

Folklore theorem (constructive): Reg ∩ (W)CFL ⊆ (W)CFL

Build a weighted context-free grammar G for L_C ∩ L_S + Random generation

⇒ Global sampling under constraints

Building the Finite State Automaton

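To force multiple words, keep track of generated words:

Create disjunctive automata for each M′ ⊆ M

Reroute accepting states

Accepting state = no forced word remaining (ε in A_∅)

Forbidden words can be added to sub-automata

#States: $O\left(2^{|M|} \cdot \left(\sum_i |f_i| + \sum_j |m_j|\right)\right)$

Example: M = {AGC, GG}

[Figure: automaton for the example, assembled from the sub-automata A_M, A_{GG}, A_{AGC} and A_∅. States are labelled by prefixes of the forced words (ε, A, AG, AGC, G, GG); the ε state of A_∅ loops on A, C, G, U.]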

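Example: M = {AGC, GG}; F = {AA}

[Figure: the same automaton with the forbidden word AA added to each sub-automaton; reading AA leads to a rejecting state ⊥, and the final loop is restricted accordingly.]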
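
A minimal sketch of such an automaton, under assumptions: states are pairs (longest suffix that is a prefix of some word, set of forced words already seen), which is one standard way to realise the "one sub-automaton per M′ ⊆ M" idea and not necessarily the exact construction of the slides. The names `make_dfa`, `accepts` and `reachable_states` are hypothetical.

```python
from itertools import product

ALPHABET = "ACGU"

def make_dfa(mandatory, forbidden):
    """Return (start state, transition function, goal set)."""
    mandatory, forbidden = frozenset(mandatory), frozenset(forbidden)
    prefixes = {""} | {w[:i] for w in mandatory | forbidden
                       for i in range(1, len(w) + 1)}

    def step(state, letter):
        if state is None:                      # rejecting sink
            return None
        suffix, seen = state
        text = suffix + letter
        if any(text.endswith(f) for f in forbidden):
            return None
        seen = seen | {m for m in mandatory if text.endswith(m)}
        # keep the longest suffix that is still a prefix of some word
        keep = next(text[i:] for i in range(len(text) + 1)
                    if text[i:] in prefixes)
        return keep, seen

    return ("", frozenset()), step, mandatory

def accepts(seq, mandatory, forbidden):
    """True if seq contains every mandatory word and no forbidden word."""
    state, step, goal = make_dfa(mandatory, forbidden)
    for letter in seq:
        state = step(state, letter)
    return state is not None and state[1] == goal

def reachable_states(mandatory, forbidden):
    """Non-rejecting states reachable from the start state."""
    start, step, _ = make_dfa(mandatory, forbidden)
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        for letter in ALPHABET:
            nxt = step(state, letter)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

print(accepts("AGCGG", {"AGC", "GG"}, {"AA"}))   # both forced words, no AA
print(len(reachable_states({"AGC", "GG"}, {"AA"})))
```

The second component of a state plays the role of the index M′ of the sub-automaton, which is where the $2^{|M|}$ factor in the state count comes from.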

Building the grammar

Input: Secondary Structure S + Positional constraints

A. Create Parse Tree for secondary structure

B. Translate Parse Tree into single-word grammar

C. Expand grammar to instantiate compatible base/base-pairs

D. Restrict to bases/base-pairs allowed at each position

Example structure (positions 1 to 12): . ( ( ( . ) ) ( . . ) )

Step B, single-word grammar:

S1 → . S2

S2 → ( S3 )

S3 → ( S4 ) S8

S4 → ( S5 )

S5 → .

S8 → ( S9 )

S9 → . S10

S10 → .

Step C, grammar expanded to compatible bases/base-pairs:

V1 → A V2 | C V2 | G V2 | U V2

V2 → A V3 U | C V3 G | G V3 C | G V3 U | U V3 A | U V3 G

V3 → A V4 U V8 | C V4 G V8 | G V4 C V8 | G V4 U V8 | U V4 A V8 | U V4 G V8

V4 → A V5 U | C V5 G | G V5 C | G V5 U | U V5 A | U V5 G

V5 → A | C | G | U

V8 → A V9 U | C V9 G | G V9 C | G V9 U | U V9 A | U V9 G

V9 → A V10 | C V10 | G V10 | U V10

V10 → A | C | G | U

Step D, grammar restricted to the bases/base-pairs allowed at each position (alternatives crossed out on the slide are omitted):

V1 → A V2 | C V2 | G V2 | U V2

V2 → A V3 U | U V3 A

V3 → A V4 U V8 | U V4 A V8

V4 → A V5 U | U V5 A

V5 → A | C | G | U

V8 → A V9 U | U V9 A

V9 → A V10 | C V10 | G V10 | U V10

V10 → A | C | G | U
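
Steps A to D can be sketched in a few lines. This is an illustrative reconstruction, not the author's code: `build_grammar` and `count` are hypothetical names, non-terminals are named V followed by the 1-based position as on the slides, and `allowed` (an assumed parameter) maps a 1-based position to its permitted letters.

```python
BASES = "ACGU"
PAIRS = ["AU", "CG", "GC", "GU", "UA", "UG"]

def pair_table(structure):
    """Map each opening position (0-based) to its closing partner."""
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            partner[stack.pop()] = i
    return partner

def build_grammar(structure, allowed=None):
    """Rules as {non-terminal: [right-hand side as a list of symbols]}."""
    partner, allowed, rules = pair_table(structure), allowed or {}, {}

    def ok(i, letter):
        return letter in allowed.get(i + 1, BASES)

    def build(i, end):
        """Create the rule deriving positions [i, end); return its name."""
        if i >= end:
            return []
        name = f"V{i + 1}"
        if structure[i] == ".":
            tail = build(i + 1, end)
            rules[name] = [[x] + tail for x in BASES if ok(i, x)]
        else:
            j = partner[i]
            inner, outer = build(i + 1, j), build(j + 1, end)
            rules[name] = [[x] + inner + [y] + outer
                           for x, y in PAIRS if ok(i, x) and ok(j, y)]
        return [name]

    build(0, len(structure))
    return rules

def count(rules, symbol="V1"):
    """Number of words derived from symbol (terminals count for 1)."""
    if symbol not in rules:
        return 1
    total = 0
    for rhs in rules[symbol]:
        words = 1
        for s in rhs:
            words *= count(rules, s)
        total += words
    return total

structure = ".(((.))(..))"
print(count(build_grammar(structure)))
only_au = {p: "AU" for p in (2, 3, 4, 6, 7, 8, 11, 12)}
print(build_grammar(structure, only_au)["V3"])
```

With the `only_au` restriction the rules coincide with the restricted grammar of step D, where every paired position accepts only A or U.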

Random generation

Combine CFG and automaton → CFG (multiplying #Rules by $|Q|^3$)

GenRGenS [Ponty Termier Denise, Bioinformatics 2006]:

Precomputes #words for each non-terminal

Random generation w.r.t. weighted distribution

Energy models:

Uniform distribution

Nussinov energy model

Stacking-pairs model (Turner 2004)

Based on refined, yet similar, grammar

Overall complexity: $|S| \cdot 2^{3|M|} \cdot \left(\sum_i |f_i| + \sum_j |m_j|\right)^3$

Linear in |S|

Exponential in |M|, but NP-hard problem
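
The "precompute counts, then sample" principle can be illustrated on a simplified case. The sketch below is not GenRGenS: it assumes a regular language only (sequences avoiding a set of forbidden words, no structure and no energy weights), and `make_sampler`, `completions` and `sample` are hypothetical names. Each automaton state plays the role of a non-terminal.

```python
import random
from functools import lru_cache

ALPHABET = "ACGU"

def make_sampler(forbidden, n):
    """Uniform sampler over length-n sequences avoiding forbidden words."""
    forbidden = tuple(forbidden)
    k = max(map(len, forbidden), default=1) - 1   # letters to remember

    def step(state, letter):
        word = state + letter
        if any(word.endswith(f) for f in forbidden):
            return None
        return word[-k:] if k else ""

    @lru_cache(maxsize=None)
    def completions(state, remaining):
        """Number of valid ways to append `remaining` letters after state."""
        if remaining == 0:
            return 1
        return sum(completions(nxt, remaining - 1)
                   for nxt in (step(state, a) for a in ALPHABET)
                   if nxt is not None)

    def sample(rng=random):
        if completions("", n) == 0:
            raise ValueError("no sequence satisfies the constraints")
        state, out = "", []
        for remaining in range(n, 0, -1):
            # weight of a letter = number of valid completions after it
            weights = []
            for a in ALPHABET:
                nxt = step(state, a)
                weights.append(0 if nxt is None
                               else completions(nxt, remaining - 1))
            letter = rng.choices(ALPHABET, weights=weights)[0]
            out.append(letter)
            state = step(state, letter)
        return "".join(out)

    return completions, sample

completions, sample = make_sampler({"AA"}, 10)
print(completions("", 10), sample())
```

Because every letter is drawn with probability proportional to the number of valid completions, each valid sequence is produced with the same probability and no rejection step is needed. Replacing counts by sums of weights would give a weighted distribution in the same way.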

III. Positive design for multiple structures

Motivation: Kinetics and riboswitches

[Figure: free-energy landscape (ΔG axis) with local minima (LM) marked.]

Design objectives

Positive structural design

Optimize affinity of designed sequences towards target structure

Or simply ensure their compatibility with one or several structures

Examples: Most stable sequence for given fold. . .

Negative structural design

Limit affinity of designed sequences towards alternative structures

Examples: Lowest free-energy, High Boltzmann probability/Low entropy. . .

Additional constraints:

Forbid every motif of a list from appearing anywhere in the design

Force every motif of a list to appear at least once

Limit available alternatives at certain positions

Control overall composition (GC-content)

Counting compatible RNAs: Watson-Crick + Single structure

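[Figure: compatibility graph on {A, C, G, U}.]

Compatible Base Pairs = Only Watson-Crick base pairs

[Figure: target secondary structure drawn over positions a b c d e f g h i j k l m n o p q r s t u v.]

Question: How many compatible sequences?

Answer: $4^{\#\mathrm{BPs}} \times 4^{\#\mathrm{Unpaired}} = 4^{14} = 268\,435\,456$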
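
The count follows directly from the structure: each base pair and each unpaired position is chosen independently. A small sketch is given below. The target structure of the slide is not recoverable from the transcript, so `HYPOTHETICAL` is an assumed 22-nucleotide structure with 8 base pairs and 6 unpaired positions, chosen only because it reproduces the same count; function names are illustrative.

```python
WATSON_CRICK = {"AU", "UA", "GC", "CG"}
WOBBLE = WATSON_CRICK | {"GU", "UG"}

def base_pairs(structure):
    """List of (i, j) pairs of a dot-bracket string (0-based)."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return pairs

def count_compatible(structure, allowed=WATSON_CRICK):
    """len(allowed) choices per base pair, 4 per unpaired position."""
    pairs = base_pairs(structure)
    unpaired = len(structure) - 2 * len(pairs)
    return len(allowed) ** len(pairs) * 4 ** unpaired

def is_compatible(seq, structure, allowed=WATSON_CRICK):
    """True if every base pair of the structure is an allowed pair."""
    return all(seq[i] + seq[j] in allowed for i, j in base_pairs(structure))

HYPOTHETICAL = "((((..))))..((((..))))"   # assumed, not the slide's structure
print(count_compatible(HYPOTHETICAL))            # Watson-Crick only
print(count_compatible(HYPOTHETICAL, WOBBLE))    # Watson-Crick + wobble
print(is_compatible("GGGGAACCCCAAGGGGAACCCC", HYPOTHETICAL))
```

Passing `WOBBLE` gives 6 choices per base pair instead of 4, which is the only change needed when G-U pairs are also allowed.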

Compatible sequence (example): G A G C U C A G C G C G A A C G C U C U C A

Incompatible sequence (example): A A A A G A C U G G A C U U G G C C U U U C

Counting compatible RNAs: Watson-Crick + Two structures

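[Figure: compatibility graph on {A, C, G, U}.]

Compatible Base Pairs = Only Watson-Crick base pairs

[Figure: two target secondary structures drawn over positions a b c d e f g h i j k l m n o p q r s t u v, and their dependency graph (one vertex per position, one edge per base pair of either structure).]

Dependency graph: Cycles + Paths

Question: How many compatible sequences?

Answer: ≠ ∅! (both base-pairs and dependency graphs bipartite)

$4^{\#\mathrm{CCs}} = 4^{8} = 65\,536$ (one free nucleotide per connected component; every other position of the component follows by complementarity)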
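
A sketch of the dependency-graph count for Watson-Crick pairs, with a brute-force cross-check. The two structures used here are small hypothetical examples (not those of the slide), and the helper names are illustrative.

```python
from itertools import product

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def base_pairs(structure):
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return pairs

def dependency_graph(structures):
    """One vertex per position, one edge per base pair of any structure."""
    adj = {i: set() for i in range(len(structures[0]))}
    for s in structures:
        for i, j in base_pairs(s):
            adj[i].add(j)
            adj[j].add(i)
    return adj

def components(adj):
    """Connected components, as a list of vertex sets."""
    seen, comps = set(), []
    for v in adj:
        if v in seen:
            continue
        comp, stack = {v}, [v]
        while stack:
            for w in adj[stack.pop()]:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        comps.append(comp)
    return comps

def brute_force(structures):
    """Count sequences compatible with every structure (small inputs only)."""
    pairs = [p for s in structures for p in base_pairs(s)]
    return sum(all(COMPLEMENT[seq[i]] == seq[j] for i, j in pairs)
               for seq in product("ACGU", repeat=len(structures[0])))

two = ["(())..", ".(())."]       # hypothetical pair of structures
n_cc = len(components(dependency_graph(two)))
print(n_cc, 4 ** n_cc, brute_force(two))
```

Here the dependency graph is a path on five positions plus one isolated position, so there are two components and 16 compatible sequences.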

Counting compatible RNAs: Watson-Crick + > 2 structs

[Figure: compatibility graph on {A, C, G, U}.]

Compatible Base Pairs = Only Watson-Crick base pairs

[Figure: more than two target secondary structures drawn over positions a b c d e f g h i j k l m n o p q r s t u v, and their dependency graph.]

Dependency graph: Cycles, Paths, Trees. . .

Question: How many compatible sequences?

Answer: Non-bipartite → ∅; Bipartite → $4^{\#\mathrm{CCs}} = 4^{3} = 64$
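
With three or more structures the dependency graph may contain an odd cycle, in which case no Watson-Crick design exists. The sketch below 2-colours the graph while counting components; the edge lists are hypothetical examples (a triangle and a star, each obtainable from three tiny structures), and `count_wc_designs` is an illustrative name.

```python
from itertools import product

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def count_wc_designs(n, edges):
    """0 if the graph has an odd cycle, else 4 ** (number of components)."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    color, n_components = {}, 0
    for start in range(n):
        if start in color:
            continue
        n_components += 1
        color[start] = 0
        stack = [start]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in color:
                    color[w] = 1 - color[v]
                    stack.append(w)
                elif color[w] == color[v]:
                    return 0              # odd cycle: not bipartite
    return 4 ** n_components

def brute_force(n, edges):
    """Direct enumeration, for cross-checking on small graphs."""
    return sum(all(COMPLEMENT[s[i]] == s[j] for i, j in edges)
               for s in product("ACGU", repeat=n))

triangle = [(0, 1), (1, 2), (0, 2)]   # from "().", ".()", "(.)"
star = [(0, 1), (0, 2), (0, 3)]       # from "()..", "(.).", "(..)"
print(count_wc_designs(3, triangle), count_wc_designs(4, star))
```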

Counting compatible RNAs: WC/Wobble + Single struct.

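[Figure: compatibility graph on {A, C, G, U}, now with the G-U edge.]

Compatible Base Pairs = Include Wobble base pairs

[Figure: target secondary structure drawn over positions a b c d e f g h i j k l m n o p q r s t u v.]

Question: How many compatible sequences?

Answer: $4^{\#\mathrm{Unpaired}} \times 6^{\#\mathrm{BPs}} = 4^{6} \times 6^{8} = 6\,879\,707\,136$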

Counting compatible RNAs: WC/Wobble + Two structures

[Figure: compatibility graph on {A, C, G, U}, with the G-U edge.]

Compatible Base Pairs = Include Wobble base pairs

[Figure: two target secondary structures drawn over positions a b c d e f g h i j k l m n o p q r s t u v, and their dependency graph.]

Dependency graph: Cycles + Paths

Question: How many compatible sequences?

Answer: ≠ ∅! (base-pairs and dependency graphs always bipartite)

$\#\mathrm{Designs}(G) = \prod_{cc \in CC(G)} \#\mathrm{Designs}(cc)$

Counting compatible designs for paths and cycles

Theorem (#Compatible designs for paths and cycles)

The numbers of compatible designs for paths and cycles of length n are:

$p(n) = 2F_{n+2}$ and $c(n) = 2F_n + 4F_{n-1}$

where $F_n$ is the nth Fibonacci number: $F_0 = 0$, $F_1 = 1$ and $F_n = F_{n-1} + F_{n-2}$.

For paths: A simple DFA generates compatible sequences

[Figure: DFA with start state ε and one state per nucleotide (A, U, G, C); transitions follow the compatible pairs.]

Remark: A ↔ C / G ↔ U symmetry

For paths, the A ↔ C / G ↔ U symmetry reduces the DFA to two states:

[Figure: DFA with start state ε and two states, • (entered on A or C) and ◦ (entered on G or U).]

$m_\bullet(n) = m_\circ(n-1)$

$m_\circ(n) = m_\circ(n-1) + m_\bullet(n-1) = m_\circ(n-1) + m_\circ(n-2) = F(n+2)$

(Since $m_\circ(0) = 1$ and $m_\circ(1) = 2$)

$p(n) := m_\varepsilon(n) = 2\,m_\bullet(n-1) + 2\,m_\circ(n-1) = 2\,(F(n) + F(n+1)) = 2F(n+2)$
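
The recurrences can be checked numerically against exhaustive enumeration. The sketch assumes that "length n" counts the vertices of the path (this matches p(1) = 4); function names are illustrative.

```python
from itertools import product

PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}

def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def path_designs_recurrence(n):
    """p(n) from the two-state recurrences, for n >= 1 vertices."""
    black, white = 1, 1                # m_black(0), m_white(0)
    for _ in range(n - 1):
        black, white = white, white + black
    # black = m_black(n-1), white = m_white(n-1); two letters enter each state
    return 2 * black + 2 * white

def path_designs_brute_force(n):
    """Enumerate all 4**n labelings of a path on n vertices."""
    return sum(all(s[i] + s[i + 1] in PAIRS for i in range(n - 1))
               for s in product("ACGU", repeat=n))

for n in range(1, 7):
    print(n, path_designs_recurrence(n), path_designs_brute_force(n), 2 * fib(n + 2))
```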

For cycles: A barely more involved DFA generates compatible sequences

[Figure: DFA for cycles, with start state ε and states •, ◦1, •1, ◦2, •2; transitions are labelled A,C and G,U.]

Remark: A ↔ C / G ↔ U symmetry

$m_{\circ_2}(n) = F(n+2)$

$m_{\circ_1}(n) = F(n+1)$

(Since $m_{\circ_1}(0) = 1$ and $m_{\circ_1}(1) = 1$)

$c(n) := m_\varepsilon(n) = 2\,m_{\circ_1}(n-2) + 2\,m_{\circ_2}(n-1) = 2\,(F(n-1) + F(n+1)) = 2F(n) + 4F(n-1)$
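
As an independent cross-check (a different technique from the DFA of the slide), the number of designs of a cycle on n vertices equals the number of closed walks of length n in the compatibility graph, i.e. the trace of the n-th power of its adjacency matrix. The sketch below compares that trace with the closed formula for even n, the only cycle lengths that can arise from two structures since edges alternate between them.

```python
BASES = "ACGU"
PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}

# adjacency matrix of the compatibility graph (A-U, G-C, G-U)
M = [[1 if x + y in PAIRS else 0 for y in BASES] for x in BASES]

def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def cycle_designs(n):
    """trace(M ** n): closed walks of length n in the compatibility graph."""
    power = [[int(i == j) for j in range(4)] for i in range(4)]
    for _ in range(n):
        power = mat_mul(power, M)
    return sum(power[i][i] for i in range(4))

def cycle_formula(n):
    return 2 * fib(n) + 4 * fib(n - 1)

for n in (2, 4, 6, 8, 10):
    print(n, cycle_designs(n), cycle_formula(n))
```

For odd n the trace is 0, because the compatibility graph itself is bipartite, which is the "non-bipartite → ∅" case.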

Theorem (#Compatible designs for general 2-structures graphs)

G: dependency graph associated with 2 RNA structures (max degree = 2).

The number #Designs(G) of compatible designs for G is given by

$\#\mathrm{Designs}(G) = \prod_{p \in P(G)} 2F_{|p|+2} \times \prod_{c \in C(G)} \left(2F_{|c|} + 4F_{|c|-1}\right)$

where G decomposes into paths P(G) and cycles C(G).
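
The product formula is straightforward to evaluate once the component sizes are known. In this sketch the sizes are passed in directly, an isolated position is treated as a path on one vertex (an assumption consistent with p(1) = 4), and the example graph is hypothetical; a brute-force enumeration is included as a check.

```python
from itertools import product
from math import prod

PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}

def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def designs_two_structures(path_sizes, cycle_sizes):
    """Product formula over paths and cycles (sizes = vertex counts)."""
    return (prod(2 * fib(p + 2) for p in path_sizes)
            * prod(2 * fib(c) + 4 * fib(c - 1) for c in cycle_sizes))

def brute_force(n, edges):
    """Direct enumeration, for small graphs only."""
    return sum(all(s[i] + s[j] in PAIRS for i, j in edges)
               for s in product("ACGU", repeat=n))

# hypothetical graph: a path 0-1-2 and a 4-cycle 3-4-5-6-3
edges = [(0, 1), (1, 2), (3, 4), (4, 5), (5, 6), (6, 3)]
print(designs_two_structures([3], [4]), brute_force(7, edges))
```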

Counting compatible sequences: WC/Wobble + Two structures

[Figure: compatibility graph on {A, C, G, U}, with the G-U edge.]

Compatible Base Pairs = Include Wobble base pairs

[Figure: two target secondary structures drawn over positions a b c d e f g h i j k l m n o p q r s t u v, and their dependency graph.]

Dependency graph: Cycles + Paths

Question: How many compatible sequences?

Answer: ≠ ∅! (base-pairs and dependency graphs always bipartite)

$\#\mathrm{Designs}(G) = \prod_{cc \in CC(G)} \#\mathrm{Designs}(cc) = 2\,322\,432$

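Counting compatible sequences: WC/Wobble + > 2 structures

[Figure: compatibility graph on {A, C, G, U}, with the G-U edge.]

Compatible Base Pairs = Include Wobble base pairs

[Figure: more than two target secondary structures drawn over positions a b c d e f g h i j k l m n o p q r s t u v, and their dependency graph.]

Dependency graph: Cycles, Paths, Trees. . .

Question: How many compatible sequences?

Answer: Non-bipartite → ∅; Bipartite → $\prod_{cc \in CC(G)} 2 \times \#\mathrm{IS}(cc)$, where #IS(cc) is the number of independent sets of cc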
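
The identity #Designs(cc) = 2 × #IS(cc) can be verified by exhaustive enumeration on small connected bipartite graphs. The sketch below uses exponential brute force on both sides, so it is only a sanity check and not a counting algorithm; the graphs and function names are hypothetical.

```python
from itertools import product

PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}

def count_independent_sets(n, edges):
    """Number of vertex subsets containing no edge (empty set included)."""
    return sum(all(not ((mask >> i) & 1 and (mask >> j) & 1)
                   for i, j in edges)
               for mask in range(1 << n))

def count_designs(n, edges):
    """Number of labelings where every edge carries a WC or wobble pair."""
    return sum(all(s[i] + s[j] in PAIRS for i, j in edges)
               for s in product("ACGU", repeat=n))

star = [(0, 1), (0, 2), (0, 3)]     # a small tree
path = [(0, 1), (1, 2), (2, 3)]
for name, edges in (("star", star), ("path", path)):
    print(name, count_independent_sets(4, edges), count_designs(4, edges))
```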

Bijection between Independent Sets and Valid Designs

[Figure: compatibility graph on {A, C, G, U} with A and C drawn in black, and an example dependency graph on vertices a to i.]

Remark: No adjacent black letters in compatible designs

Up to trivial symmetry⋆ (e.g. top-left position ∈ {G,A}): Designs⋆(cc) ⊆ IndependentSets(cc)

Also, IS (black) + top-left vertex ∈ {G,A} ⇒ Unique compatible design

⇒ Bijection between Designs⋆(cc) and IndependentSets(cc).
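
The reverse direction of the bijection can be made explicit. The sketch assumes a connected bipartite component whose vertex 0 plays the role of the top-left position: a 2-colouring from vertex 0 decides which vertices carry a purine (A or G) and which a pyrimidine (C or U), and membership in the independent set decides between the black letter (A or C) and the white one (G or U). Names are illustrative.

```python
from itertools import combinations

PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}

def design_from_independent_set(n, edges, black):
    """Unique design with vertex 0 in {G, A} whose black vertices are `black`."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    side = {0: 0}            # 0: purine side {A, G}; 1: pyrimidine side {C, U}
    stack = [0]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in side:
                side[w] = 1 - side[v]
                stack.append(w)
    letters = {(0, True): "A", (0, False): "G",
               (1, True): "C", (1, False): "U"}
    return "".join(letters[side[v], v in black] for v in range(n))

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]          # hypothetical 4-cycle
independent_sets = [set(c) for k in range(5)
                    for c in combinations(range(4), k)
                    if not any(i in c and j in c for i, j in edges)]
designs = {design_from_independent_set(4, edges, s) for s in independent_sets}
print(len(independent_sets), sorted(designs))
```

Swapping the roles of the two sides gives the designs whose vertex 0 is in {C, U}, which accounts for the factor 2 in 2 × #IS(cc).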

Bijection between Independent Sets and Valid Designs

C

U

G

AA

CG

U

fed

i

g

h

cba

A

U

U

U

G C

G

A U

Remark: No adjacent black letters in compatible designs

Up to trivial symmetry? (e.g. top-left position ∈ {G,A}):Designs?(cc) ⊆ IndependentSets(cc)

Also, IS (black) +↖ vert. ∈ {G,A} ⇒ Unique compatible design

⇒ Bijection between Designs?(cc) and IndependentSets(cc).

Page 164: (Positive) Design of RiboNucleic Acidsbioinformatics.aut.ac.ir/workshop/wp-content/uploads/2016/05/2017 … · The Ancestor’s Tale: A Pilgrimage to the Dawn of Evolution. RNA folding

Bijection between Independent Sets and Valid Designs

C

U

G

AA

CG

U

fed

i

g

h

cba

Remark: No adjacent black letters in compatible designs

Up to trivial symmetry? (e.g. top-left position ∈ {G,A}):Designs?(cc) ⊆ IndependentSets(cc)

Also, IS (black) +↖ vert. ∈ {G,A} ⇒ Unique compatible design

⇒ Bijection between Designs?(cc) and IndependentSets(cc).

Page 165: (Positive) Design of RiboNucleic Acidsbioinformatics.aut.ac.ir/workshop/wp-content/uploads/2016/05/2017 … · The Ancestor’s Tale: A Pilgrimage to the Dawn of Evolution. RNA folding

Bijection between Independent Sets and Valid Designs

C

U

G

AA

CG

U

fed

i

g

h

cba

Remark: No adjacent black letters in compatible designs

Up to trivial symmetry? (e.g. top-left position ∈ {G,A}):Designs?(cc) ⊆ IndependentSets(cc)

Also, IS (black) +↖ vert. ∈ {G,A} ⇒ Unique compatible design

⇒ Bijection between Designs?(cc) and IndependentSets(cc).

Page 166: (Positive) Design of RiboNucleic Acidsbioinformatics.aut.ac.ir/workshop/wp-content/uploads/2016/05/2017 … · The Ancestor’s Tale: A Pilgrimage to the Dawn of Evolution. RNA folding

Bijection between Independent Sets and Valid Designs

C

U

G

AA

CG

U

fed

i

g

h

cba

C

G

G

G

U A

U

C G

Remark: No adjacent black letters in compatible designs

Up to trivial symmetry? (e.g. top-left position ∈ {G,A}):Designs?(cc) ⊆ IndependentSets(cc)

Also, IS (black) +↖ vert. ∈ {G,A} ⇒ Unique compatible design

⇒ Bijection between Designs?(cc) and IndependentSets(cc).

Page 167: (Positive) Design of RiboNucleic Acidsbioinformatics.aut.ac.ir/workshop/wp-content/uploads/2016/05/2017 … · The Ancestor’s Tale: A Pilgrimage to the Dawn of Evolution. RNA folding

Valid designs and independent sets

Theorem (#Valid designs for bipartite connected dependency graphs)

Let G be a bipartite connected dependency graph. Then:

#Designs(G) = 2 × #Designs⋆(G) = 2 × #IS(G)

For a bipartite dependency graph G, taking the product over connected components gives:

#Designs(G) = ∏_{cc ∈ CC(G)} 2 × #IS(cc) = 2^{|CC(G)|} × #IS(G)

But computing #IS(G) is #P-hard on bipartite graphs [Bubley & Dyer ’01]

(+ Any G is a dependency graph)

A polynomial-time algorithm A for #Designs(G) would yield a polynomial-time algorithm A′ for #BIS...

Theorem

#Designs is #P-hard.

No polynomial-time algorithm for #Designs(G) unless #P = FP (⇒ P = NP)
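The product formula can be turned into a small program. The following is a minimal sketch under stated assumptions, not the author's implementation: vertices are numbered 0 to n-1, the function names are hypothetical, and independent sets are counted by plain exponential branching (consistent with the #P-hardness result, no polynomial method is expected in general).

```python
from collections import deque


def count_independent_sets(vertices, adj):
    """Branch on the first vertex: leave it out, or take it and drop its neighbours."""
    if not vertices:
        return 1
    v, rest = vertices[0], vertices[1:]
    without_v = count_independent_sets(rest, adj)
    blocked = set(adj[v])
    with_v = count_independent_sets([u for u in rest if u not in blocked], adj)
    return without_v + with_v


def count_designs_by_component(n, edges):
    """Return 0 if G is not bipartite, else the product over components of 2 * #IS(cc)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    color = [None] * n
    total = 1
    for start in range(n):
        if color[start] is not None:
            continue
        # BFS 2-colouring of one connected component.
        color[start] = 0
        component, queue = [start], deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if color[v] is None:
                    color[v] = 1 - color[u]
                    component.append(v)
                    queue.append(v)
                elif color[v] == color[u]:
                    return 0  # odd cycle: no compatible sequence
        total *= 2 * count_independent_sets(component, adj)
    return total
```

For instance, a path on three vertices, one isolated vertex and one extra edge give 10 × 4 × 6 = 240 compatible sequences, which is also 2^3 times the 30 independent sets of the whole graph.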


Consequences

Corollary (#Approximability for ≤ 5 structures) [Weitz ’06]

For any G built from ≤ 5 pseudoknotted structures, #Designs(G) can be approximated within any ratio in polynomial time (PTAS)

Corollary (#BIS hardness for > 5 struct.) [Cai, Galanis, Goldberg, Jerrum, McQuillan ’16]

Beyond 5 pseudoknotted structures, approximating #Designs becomes as hard as approximating #BIS without any constraint.

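Why pseudoknotted? Because any bipartite graph of max degree ∆ can be decomposed into ∆ matchings in polynomial time (König’s edge-coloring theorem; Vizing’s theorem gives ∆ + 1 for general graphs), and each matching is a, possibly pseudoknotted, structure.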

Lastly, connection between counting and sampling [Jerrum, Valiant, Vazirani’86].

Conjecture (#BIS hardness of sampling)

Generating compatible sequences (almost) uniformly for general input is #BIS-hard.
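The link between counting and sampling can be illustrated on toy instances: with access to a counter for partial assignments, a uniform compatible sequence can be drawn one position at a time. The sketch below is an assumption-laden illustration, not a method from the slides: `count_completions` is a hypothetical brute-force stand-in for a counting oracle, so it only scales to very small graphs.

```python
import random
from itertools import product

# Watson-Crick and wobble pairs, in both orientations.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def count_completions(n, edges, prefix):
    """Count valid designs of length n that start with `prefix` (brute force)."""
    count = 0
    for tail in product("ACGU", repeat=n - len(prefix)):
        seq = prefix + "".join(tail)
        if all((seq[u], seq[v]) in PAIRS for u, v in edges):
            count += 1
    return count


def sample_design(n, edges, rng):
    """Draw a uniform compatible sequence position by position; None if none exists."""
    prefix = ""
    for _ in range(n):
        # Each letter is chosen proportionally to the number of designs extending it.
        weights = [count_completions(n, edges, prefix + b) for b in "ACGU"]
        if sum(weights) == 0:
            return None
        prefix += rng.choices("ACGU", weights=weights)[0]
    return prefix


print(sample_design(4, [(0, 1), (1, 2), (2, 3)], random.Random(0)))
```

The product of the successive conditional probabilities telescopes to one over the total count, so every compatible sequence is drawn with the same probability. Conversely, hardness of counting is what makes an efficient exact oracle unlikely for general inputs.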


Perspectives: FPT and Boltzmann sampling algorithms

[Figure: RNARedPrint workflow. Panels: i) Input Structures (R1, R2, R3 over positions a to v); ii) Merged Base-Pairs; iii) Compatibility Graph; iv) Tree Decomposition; Partition Function and Stochastic Backtrack, producing sampled sequences with energies E1, E2, E3 (kcal.mol−1) and GC% (0 to 100); v) Weight Optimization (Adaptive Sampling), with weights update; vi) Final Designs]

FPT algorithm for counting based on tree decomposition

Multidimensional Boltzmann sampling to control energies, GC...
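The partition function and stochastic backtrack steps can be sketched in the simplest setting. This is a simplified illustration under explicit assumptions, not the RNARedPrint implementation: the dependency graph is taken to be a rooted tree (treewidth 1) given as a hypothetical `children` dictionary, energies are ignored, and a single weight `gc_weight` is attached to every G or C to bias the GC content.

```python
import random

# Letters that may face each letter across a base pair (Watson-Crick + wobble).
PARTNERS = {"A": "U", "C": "G", "G": "CU", "U": "AG"}


def partition_function(children, root, gc_weight):
    """Z[v][x]: total weight of valid designs of the subtree of v with letter x at v."""
    Z = {}

    def visit(v):
        for c in children.get(v, []):
            visit(c)
        Z[v] = {}
        for x in "ACGU":
            w = gc_weight if x in "GC" else 1.0
            for c in children.get(v, []):
                w *= sum(Z[c][y] for y in PARTNERS[x])
            Z[v][x] = w

    visit(root)
    return Z


def stochastic_backtrack(children, root, Z, rng):
    """Draw one design with probability proportional to gc_weight ** (number of G and C)."""
    design = {}

    def choose(v, allowed):
        letters = list(allowed)
        x = rng.choices(letters, weights=[Z[v][y] for y in letters])[0]
        design[v] = x
        for c in children.get(v, []):
            choose(c, PARTNERS[x])

    choose(root, "ACGU")
    return design


# Hypothetical path-shaped dependency graph 0 - 1 - 2, rooted at 0.
children = {0: [1], 1: [2]}
Z = partition_function(children, 0, gc_weight=1.0)
print(sum(Z[0].values()), stochastic_backtrack(children, 0, Z, random.Random(1)))
```

With `gc_weight = 1.0` the sum of `Z[root]` is simply the number of compatible sequences, and the backtrack is uniform. Raising the weight above 1 shifts the sampled sequences towards higher GC content, which is the single-dimension version of the weight tuning mentioned on the slide.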


Counting compatible sequences: Watson-Crick + > 2 structures

[Figure: nucleotides C, U, G, A and their compatible pairings]

Compatible base pairs: include Wobble base pairs

[Figure: input structures over positions a to v]

Dependency graph: cycles, paths, trees...

[Figure: dependency graph over positions a to v]

Question: How many compatible sequences?

Answer: Bipartite → ∏_{cc ∈ CC(G)} 2 × #IS(cc) = 496 672



Conclusions

RNA is cool!

RNA design is one of the current challenges of RNA bioinformatics, with far-reaching consequences for drug design, synthetic biology...

Practical use-cases require expressive and modular constraints

Future methods: kinetics, interactions, multiple structures, pseudoknots...

RNA inverse folding is the combinatorial core of design. It remains largely unsolved, and opens new lines of research in Computer Science.


Collaborators

University McGill: Vladimir Reinharz, Jérôme Waldispühl

MIT: Bonnie Berger, Srinivas Devadas, Alex Levin, Mieszko Lis, Charles O’Donnell

LRI – Univ. Paris Sud: Alain Denise, Vincent Le Gallic

Wuhan University: Yi Zhang, Yu Zhou

LIGM – Marne la Vallée: Stéphane Vialette

LIX – Ecole Polytechnique: Alice Héliou, Mireille Regnier

Simon Fraser University: Jozef Hales, Jan Manuch (UBC), Ladislav Stacho

Cédric Chauve, Julien Courtiel

TBI Vienna: Ronnie Lorenz, Andrea Tanzer


Thanks!

Poster submission & registration open soon... (+ ISCB travel fellowships for students)


References I

R. Nussinov and A.B. Jacobson. Fast algorithm for predicting the secondary structure of single-stranded RNA. Proc Natl Acad Sci U S A, 77:6903–13, 1980.