Modeling RNA motifs by graph-grammars

Franç[email protected]

www.iric.ca

07.05 - Madison (ROC) 2

MC-Tools: Functions

• ( MC-Annotate 3-D ) -> graph

• ( MC-Cycles graph ) -> [ NCM ]

• ( MC-Seq graph ) -> [ sequence ]

• ( MC-Fold sequence ) -> [ graph ]

• ( MC-Cons [ ( sequence, [ graph ] ) ] ) -> [ graph ]

• ( MC-Search ( graph, [ 3-D ] ) ) -> [ 3-D ]

• ( MC-Sym graph ) -> [ 3-D ]

07.05 - Madison (ROC) 3

MC-Tools: Objects (rat 28S rRNA sarcin/ricin stem-loop)

Sequence: GGGUGCUCAGUACGAGAGGAACCGCACCC

Graph:

Nucleotide cyclic motifs:

3-D structure:

Szewczak et al. PNAS (USA) 1993

Lemieux & Major NAR 2006

Parisien, Thibault & Major (in prep.)

( MC-Fold sequence ) -> [ graph ]

( MC-Sym graph ) -> [ 3-D ]

07.05 - Madison (ROC) 4

Graph

( MC-Annotate 3-D ) -> graph

Gendron, Lemieux & Major JMB 2001

Lemieux & Major NAR 2002

Leontis & Westhof RNA 2001

07.05 - Madison (ROC) 5

Shortest Cycle Basis

[Figure: graph decomposed into its shortest cycle basis; labels C1 to C5, X1 to X4, Y1 to Y3, and the 5’ and 3’ strand ends]

( MC-Cycles graph ) -> [ NCM ]

Horton SIAM J Comp 1987

St-Onge et al. NAR 2007
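The decomposition of a graph into a shortest cycle basis can be sketched in a few lines. The following is a minimal illustration in the spirit of Horton's candidate-cycle approach, not the MC-Cycles implementation: the function names `bfs_paths` and `minimum_cycle_basis` and the toy graph (six nucleotides, five backbone edges, two base pairs) are assumptions made for this example, and the graph is assumed to be connected, simple and unweighted.

```python
from collections import deque


def bfs_paths(adj, root):
    """Return {node: one shortest path from root to node} along a BFS tree."""
    paths = {root: [root]}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):
            if v not in paths:
                paths[v] = paths[u] + [v]
                queue.append(v)
    return paths


def minimum_cycle_basis(edges):
    """Greedy shortest cycle basis of a connected, unweighted, simple graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Each edge gets one bit, so a cycle is a bitmask over the edge set.
    edge_bit = {frozenset(e): 1 << i for i, e in enumerate(edges)}

    # Candidate cycles: path(root, u) + edge (u, v) + path(v, root).
    candidates = {}
    for root in adj:
        paths = bfs_paths(adj, root)
        for u, v in edges:
            pu, pv = paths[u], paths[v]
            if set(pu) & set(pv) != {root}:
                continue  # the two paths overlap beyond the root
            cycle = pu + pv[:0:-1]  # root..u, then v back toward the root
            if len(cycle) < 3:
                continue  # (u, v) is a tree edge at the root, not a cycle
            mask = 0
            for i, node in enumerate(cycle):
                nxt = cycle[(i + 1) % len(cycle)]
                mask |= edge_bit[frozenset((node, nxt))]
            candidates[mask] = cycle

    # Keep the shortest candidates that are linearly independent over GF(2).
    basis, reduced = [], {}
    ordered = sorted(candidates.items(), key=lambda kv: (len(kv[1]), kv[0]))
    for mask, cycle in ordered:
        vec = mask
        while vec:
            lead = vec.bit_length() - 1
            if lead not in reduced:
                reduced[lead] = vec
                basis.append(cycle)
                break
            vec ^= reduced[lead]
    return basis


# Hypothetical toy hairpin: backbone 1-2-3-4-5-6, base pairs (1,6) and (2,5).
toy_edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (1, 6), (2, 5)]
toy_basis = minimum_cycle_basis(toy_edges)
```

On this toy graph the basis consists of the two 4-node cycles; the 6-node outer cycle is rejected because it is the sum of the other two.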

07.05 - Madison (ROC) 6

The Nucleotide Cyclic Motifs (NCM)

i. Embrace all base-pairing types without distinction (Watson-Crick and others)

ii. Precisely designate how any nucleotide in the sequence relates to the others

iii. Are joined through a common base pair (context). This helps us predict coherent chains of NCMs and project them in 3-D. Tentative definition of a motif: an “ordered” chain of NCMs.

iv. Recur within and across all RNAs

v. Are short (< 10 nts; most of 3 to 5 nts)

vi. Compose the classical motifs (cf. GNRA tetraloop; sarcin/ricin motif, etc.). There are exceptions (cf. AA platform).

Lemieux & Major (2006) NAR 34:2340

Parisien, Thibault & Major (in prep.)

07.05 - Madison (ROC) 7

Aim

We want a computational model that can encode the valid sequences and structural features of RNA motifs.

Hypothesis: A relation between the sequence and the structure of RNA motifs exists.

07.05 - Madison (ROC) 8

Graph Grammars

• A graph grammar is to a set of graphs what a formal generative grammar is to a set of strings, i.e. a precise and formal description of that set.

• A graph-grammar consists of a set of rules or productions for transforming graphs.

• Formally, a graph grammar, H = {N, Σ, P}, consists of a set of non‑terminal symbols, N, a set of terminal symbols, Σ, and a set of production rules, P.

Hypothesis: NCMs are “independent” building blocks.

Nagl Computing 1976

Nagl In H. Ehrig et al., eds 1987

St-Onge et al. NAR 2007
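The triple of non-terminals, terminals and production rules can be written down as a small data structure. This is a deliberately reduced sketch: it assumes each production simply assigns one terminal to one non-terminal, which is narrower than general graph rewriting, and the class name `GraphGrammar`, its field names and the toy symbols are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphGrammar:
    """Reduced view of a grammar: non-terminals, terminals, productions."""
    nonterminals: frozenset  # N
    terminals: frozenset     # the terminal alphabet
    productions: tuple       # P, as (non-terminal, terminal) pairs

    def is_well_formed(self):
        """Every production must go from a known non-terminal to a known terminal."""
        return all(
            lhs in self.nonterminals and rhs in self.terminals
            for lhs, rhs in self.productions
        )


# Hypothetical toy grammar with two non-terminals and two terminals.
toy = GraphGrammar(
    nonterminals=frozenset({"C1", "C2"}),
    terminals=frozenset({"S1", "S2"}),
    productions=(("C1", "S1"), ("C2", "S2")),
)
broken = GraphGrammar(
    nonterminals=frozenset({"C1"}),
    terminals=frozenset({"S1"}),
    productions=(("C1", "S9"),),  # "S9" is not a declared terminal
)
```

The `is_well_formed` check only validates symbol membership; it says nothing about whether the assignments are mutually consistent.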

07.05 - Madison (ROC) 9

Sarcin/Ricin Graph Grammar

N = {C1, C2, … C5}, the set of NCMs:

Σ = {S1, S2, … S5}, the sets of sequences for each NCM:

P is a set of consistent assignments of the sequences in Σ to the NCMs in N (production rules):

Yeast tRNA, 23S H. marismortui, 16S E. coli

St-Onge et al. NAR 2007

07.05 - Madison (ROC) 10

Sarcin/Ricin Building Blocks

C1 :

Theoretical : 256 (16 x 16)

IMs (isostericity matrices) : 120 (10 x 12)

PDB : 7

C2 :

Theoretical : 64 (16 x 4)

IMs : 40 (10 x 4)

PDB : 5

Theoretical : 16

IMs : 10

PDB : 15

[Figure: nucleotide labels from the building-block diagrams: A, AA, U, U, A, A, U A]

C3 :

Theoretical : 64 (16 x 4)

IMs : 56 (14 x 4)

PDB : 2

C4 :

Theoretical : 256 (16 x 16)

IMs : 160 (16 x 10)

PDB : 3

C5 :

Theoretical : 64 (16 x 4)

IMs : 40 (10 x 4)

PDB : 8

[Figure: nucleotide labels from the building-block diagrams: A, G U, G, A G, U, A G, A]

St-Onge et al. NAR 2007
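The counts on this slide are simple products, and they can be re-derived to see how much each filter narrows the sequence space. The sketch below only restates the factors printed for C1 to C5; the dictionary layout and the names `blocks`, `size` and `summary` are assumptions for illustration. The factor 16 is consistent with 4 x 4 nucleotide combinations for a base pair, although the slide does not spell this out.

```python
# Factors and PDB counts as printed on the slide for each labeled block.
blocks = {
    "C1": {"theoretical": (16, 16), "IMs": (10, 12), "PDB": 7},
    "C2": {"theoretical": (16, 4), "IMs": (10, 4), "PDB": 5},
    "C3": {"theoretical": (16, 4), "IMs": (14, 4), "PDB": 2},
    "C4": {"theoretical": (16, 16), "IMs": (16, 10), "PDB": 3},
    "C5": {"theoretical": (16, 4), "IMs": (10, 4), "PDB": 8},
}


def size(factors):
    """Number of combinations for a pair of independent factors."""
    first, second = factors
    return first * second


# (theoretical, allowed by isostericity matrices, observed in the PDB)
summary = {
    name: (size(b["theoretical"]), size(b["IMs"]), b["PDB"])
    for name, b in blocks.items()
}

for name, (theoretical, ims, pdb) in summary.items():
    kept = ims / theoretical
    print(f"{name}: {theoretical} -> {ims} ({kept:.0%} kept by IMs) -> {pdb} in PDB")
```

The per-block numbers are not multiplied together here, because adjacent blocks share a base pair and are therefore not independent.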

07.05 - Madison (ROC) 11

( MC-Seq sarcin-ricin-graph ) -> [ sequence ]

Sequences supported by the NCMs in the PDB:

AGUA-GAA AGUA-AAA

GGUA-GAA GGUA-AAA

If we remove from the PDB the instances of the sarcin/ricin motif found by

( MC-Search ( sarcin-ricin-graph, [ PDB ] ) ) -> [ 3-D ]

then the same four sequences are still supported.

=> NCMs are found outside the sarcin/ricin context

Larose et al. (in prep.)

St-Onge et al. NAR 2007
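Generating the sequences supported by a chain of building blocks amounts to joining per-block fragments that agree on the positions they share. The sketch below illustrates that idea only; it is not MC-Seq. The function `enumerate_sequences`, the position layout and the allowed fragments are all hypothetical toy data (a 6-nt stem whose two blocks share the base pair at positions 1 and 4), not real NCM sequence sets.

```python
def enumerate_sequences(length, blocks):
    """Join block fragments that agree on shared positions.

    blocks: list of (positions, allowed), where each allowed fragment is a
    string with one character per listed position.
    """
    partials = [{}]
    for positions, allowed in blocks:
        extended = []
        for partial in partials:
            for fragment in sorted(allowed):
                pairs = list(zip(positions, fragment))
                # A fragment fits if it matches every position already fixed.
                if all(partial.get(p, ch) == ch for p, ch in pairs):
                    merged = dict(partial)
                    merged.update(pairs)
                    extended.append(merged)
        partials = extended
    complete = [p for p in partials if len(p) == length]
    return sorted({"".join(p[i] for i in range(length)) for p in complete})


# Hypothetical toy blocks: X covers positions 0,1,4,5 and Y covers 1,2,3,4,
# so the shared context is the pair of positions (1, 4).
toy_blocks = [
    ((0, 1, 4, 5), {"GCGC", "GUAC"}),
    ((1, 2, 3, 4), {"CAUG", "CGCG", "UGCA"}),
]
toy_sequences = enumerate_sequences(6, toy_blocks)
```

Without the shared-position constraint the two blocks would yield 2 x 3 = 6 combinations; requiring agreement on the common base pair leaves 3.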

07.05 - Madison (ROC) 12

Graph Grammar Parsing

Westhof (personal comm.)

St-Onge et al. NAR 2007

806 sequences aligned according to E. coli 23S rRNA structure; site 204-207 / 189-191.
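One way to read parsing here is as a membership test: cut the two strands of the site out of each aligned sequence and check whether the result belongs to the set of sequences the grammar supports. The sketch below assumes 1-based inclusive column ranges and ignores alignment gaps. The helper names `site_key` and `parse_alignment`, the toy rows and the toy column ranges are hypothetical; only the four accepted sequences are taken from the MC-Seq slide.

```python
from collections import Counter


def site_key(aligned, first, second):
    """Join two 1-based inclusive column ranges as 'XXXX-YYY'."""
    (a, b), (c, d) = first, second
    return f"{aligned[a - 1:b]}-{aligned[c - 1:d]}"


def parse_alignment(rows, accepted, first, second):
    """Count aligned sequences whose site is in the accepted set."""
    counts = Counter()
    for row in rows:
        key = site_key(row, first, second)
        counts["accepted" if key in accepted else "rejected"] += 1
    return counts


# The four sequences supported by the NCMs in the PDB (from the slides).
accepted = {"AGUA-GAA", "AGUA-AAA", "GGUA-GAA", "GGUA-AAA"}

# Hypothetical 10-column toy alignment; the real site is 204-207 / 189-191.
toy_rows = ["GAACCAGUAC", "AAACCGGUAC", "GACCCAUUAC"]
toy_counts = parse_alignment(toy_rows, accepted, first=(6, 9), second=(1, 3))
```

With the real alignment, the call would use `first=(204, 207)` and `second=(189, 191)` on sequences indexed according to the E. coli 23S rRNA numbering.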

07.05 - Madison (ROC) 13

MC-Seq PDB

Alignment: 5S, 16S, 23S

AGUA-AAA AGUA-GAA GGUA-GAA

GGUA-AAA

AAUA-AAA AAUA-GAA ACUA-AAA ACUA-GAA ACUA-GAC AGUA-AAC

AGUA-CAA AGUA-GAC AGUA-GAU AGUA-GCC AGUA-GGG AGUA-GUG AGUC-GAA AUUA-GAA

CGUA-GAA GAUA-GAA GGUA-GAU GUUA-GAA UGUA-GAA UGUA-GAC

Isostericity matrices

10 000 sequences

Validation (MC-Seq vs. PDB vs. Alignment)

St-Onge et al. NAR 2007

07.05 - Madison (ROC) 14

Perspectives

• We want to develop a version of MC-Seq that would be useful during the alignment process.

• The PDB does not yet seem to contain enough structural information.

• The NCMs (context) are necessary to avoid generating too many sequences.

• Two more things need to be considered…

07.05 - Madison (ROC) 15

Sarcin/Ricin (Sequence/Structure Space Is Not Simple)

St-Onge et al. (in prep.)

07.05 - Madison (ROC) 16

Modeling In 3-D Might Be Necessary

Alignment: AUUA-GAA (0.9 Å)

MC-Fold: CAUU-AAG (2.1 Å)

St-Onge et al. NAR 2007

07.05 - Madison (ROC) 17

Acknowledgments

Martin Larose (Res. assistant)

Philippe Thibault (Res. assistant)

Patrick Gendron (Res. assistant)

Romain Rivière (Postdoc, CS)

Véronique Lisi (Ph.D. Molecular Biology)

Marc Parisien (Ph.D. Computer Science)

Emmanuelle Permal (Ph.D. Bioinformatics)

Karine St-Onge (Ph.D. Computer Science)

Louis-Philippe Lavoie (M.Sc. Bioinformatics)

Maxime Caron (M.Sc. Bioinformatics)

Caroline Louis-Jeune (M.Sc. Bioinformatics)

Montréal:

Pascal Chartrand

Gerardo Ferbeyre

Sylvie Hamel

Sébastien Lemieux

Pascale Legault

Luc Desgroseillers

Kathy Borden

Daniel Lamarre

Éric Westhof (Strasbourg)

Alain Denise (Paris)

Dave Mathews (Rochester)