Modeling RNA Motifs by Graph-Grammars
Posted on 21-Dec-2015
07.05 - Madison (ROC) 2
MC-Tools: Functions
• ( MC-Annotate 3-D ) -> graph
• ( MC-Cycles graph ) -> [ NCM ]
• ( MC-Seq graph ) -> [ sequence ]
• ( MC-Fold sequence ) -> [ graph ]
• ( MC-Cons [ ( sequence, [ graph ] ) ] ) -> [ graph ]
• ( MC-Search ( graph, [ 3-D ] ) ) -> [ 3-D ]
• ( MC-Sym graph ) -> [ 3-D ]
MC-Tools: Objects (rat 28S rRNA sarcin/ricin stem-loop)
Sequence: GGGUGCUCAGUACGAGAGGAACCGCACCC
Graph:
Nucleotide cyclic motifs:
3-D structure:
Szewczak et al. PNAS (USA) 1993; Lemieux & Major NAR 2006; Parisien, Thibault & Major (in prep.)
( MC-Fold sequence ) -> [ graph ]
( MC-Sym graph ) -> [ 3-D ]
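The two calls above chain naturally: MC-Fold proposes candidate graphs for a sequence, and MC-Sym builds 3-D models from each graph. A minimal Python sketch of that data flow, assuming each tool can be wrapped as a plain function; `sequence_to_3d`, `toy_fold`, and `toy_sym` are hypothetical stand-ins, not the MC-Tools interfaces.

```python
from typing import Callable, List

# Placeholder types (assumption): the real tools exchange files, not Python strings.
RnaSequence = str
Graph = str
Model3D = str

FoldFn = Callable[[RnaSequence], List[Graph]]  # ( MC-Fold sequence ) -> [ graph ]
SymFn = Callable[[Graph], List[Model3D]]       # ( MC-Sym graph ) -> [ 3-D ]


def sequence_to_3d(seq: RnaSequence, fold: FoldFn, sym: SymFn) -> List[Model3D]:
    """Fold a sequence into candidate graphs, then collect the 3-D models built from each graph."""
    models: List[Model3D] = []
    for graph in fold(seq):
        models.extend(sym(graph))
    return models


# Hypothetical stand-ins that only label their inputs, to show the data flow.
def toy_fold(seq: RnaSequence) -> List[Graph]:
    return [f"{seq}/graph{i}" for i in range(2)]


def toy_sym(graph: Graph) -> List[Model3D]:
    return [f"{graph}/model"]


print(sequence_to_3d("GGGUGC", toy_fold, toy_sym))
```

Because MC-Fold returns a list of graphs, the composed pipeline returns the union of the models generated for every candidate graph.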
Graph
( MC-Annotate 3-D ) -> graph
Gendron, Lemieux & Major JMB 2001; Lemieux & Major NAR 2002; Leontis & Westhof RNA 2001
Shortest Cycle Basis
Figure: cycles C1 to C5 over nucleotides X1 to X4 and Y1 to Y3, with the 5' and 3' ends marked.
( MC-Cycle graph ) -> [ NCM ]
Horton SIAM J. Comput. 1987; St-Onge et al. NAR 2007
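The NCMs come from a shortest (minimum) cycle basis of the graph, after Horton (1987). A compact Python sketch of a Horton-style algorithm for an unweighted, undirected graph, assuming the graph is given as an adjacency dict of integer vertex ids; `minimum_cycle_basis` and the toy ladder graph are illustrative, not the MC-Cycles code. Candidates are the fundamental cycles of a breadth-first tree rooted at every vertex; they are sorted by length and kept greedily when independent over GF(2).

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def _bfs_parents(adj: Dict[int, Set[int]], root: int) -> Dict[int, int]:
    """Breadth-first (shortest-path) tree: each reachable vertex -> its parent; root -> root."""
    parent = {root: root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in sorted(adj[u]):  # sorted for deterministic tie-breaking
            if w not in parent:
                parent[w] = u
                queue.append(w)
    return parent


def minimum_cycle_basis(adj: Dict[int, Set[int]]) -> List[List[Edge]]:
    """Horton-style minimum cycle basis of an unweighted, undirected simple graph."""
    edges = sorted({(min(u, w), max(u, w)) for u in adj for w in adj[u]})
    index = {e: i for i, e in enumerate(edges)}

    def path_vector(parent: Dict[int, int], x: int) -> int:
        """Edge-incidence bitmask of the tree path from x back to the root."""
        vec = 0
        while parent[x] != x:
            p = parent[x]
            vec ^= 1 << index[(min(x, p), max(x, p))]
            x = p
        return vec

    # Candidates: for every root v and edge (x, y), P(v, x) + (x, y) + P(y, v).
    # XOR cancels any shared tail, leaving the fundamental cycle of (x, y) in v's tree.
    candidates: Set[int] = set()
    for v in adj:
        parent = _bfs_parents(adj, v)
        for x, y in edges:
            if x in parent:  # (x, y) lies in v's connected component
                vec = path_vector(parent, x) ^ path_vector(parent, y) ^ (1 << index[(x, y)])
                if vec:  # zero when (x, y) is itself a tree edge
                    candidates.add(vec)

    # Greedy: shortest candidates first, kept only if independent over GF(2).
    pivots: Dict[int, int] = {}  # highest set bit -> reduced vector
    basis: List[int] = []
    for vec in sorted(candidates, key=lambda c: (bin(c).count("1"), c)):
        reduced = vec
        while reduced:
            top = reduced.bit_length() - 1
            if top not in pivots:
                pivots[top] = reduced
                basis.append(vec)
                break
            reduced ^= pivots[top]
    return [[e for e in edges if vec >> index[e] & 1] for vec in basis]


# Toy "ladder": two squares sharing the edge (1, 4).
ladder = {0: {1, 3}, 1: {0, 2, 4}, 2: {1, 5}, 3: {0, 4}, 4: {1, 3, 5}, 5: {2, 4}}
print(minimum_cycle_basis(ladder))
```

On the ladder, the two squares form the basis; the outer 6-cycle is their GF(2) sum, so the greedy step rejects it as dependent.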
The Nucleotide Cyclic Motifs (NCM)
i. Include all base-pairing types without distinction (Watson-Crick and non-Watson-Crick)
ii. Precisely designate how each nucleotide in the sequence relates to the others
iii. Are joined through a common base pair (context). This helps us predict coherent chains of NCMs and project them in 3-D. Tentative definition of a motif: an "ordered" chain of NCMs.
iv. Recur within and across all RNAs
v. Are short (< 10 nts; most have 3 to 5 nts)
vi. Compose the classical motifs (cf. GNRA tetraloop, sarcin/ricin motif, etc.). There are exceptions (cf. AA platform).
Lemieux & Major (2006) NAR 34:2340; Parisien, Thibault & Major (in prep.)
Aim
We want a computational model that can encode the valid sequences and structural features of RNA motifs.
Hypothesis: a relation exists between the sequence and the structure of RNA motifs.
Graph Grammars
• A graph grammar is to a set of graphs what a formal generative grammar is to a set of strings, i.e. a precise and formal description of that set.
• A graph-grammar consists of a set of rules or productions for transforming graphs.
• Formally, a graph-grammar, H = {N, Σ, P}, consists of a set of non-terminal symbols, N, a set of terminal symbols, Σ, and a set of production rules, P.
Hypothesis: NCMs are “independent” building blocks.
Nagl, Computing 1976; Nagl, in H. Ehrig et al., eds., 1987; St-Onge et al. NAR 2007
Sarcin/Ricin Graph Grammar
N = {C1, C2, …, C5}, the set of NCMs:
Σ = {S1, S2, …, S5}, the sets of sequences for each NCM:
P is a set of consistent assignments of the sequences in Σ to the NCMs in N (production rules):
tRNA (yeast), 23S (H. marismortui), 16S (E. coli)
St-Onge et al. NAR 2007
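In this grammar, a production amounts to choosing one sequence from each set in Σ such that NCMs sharing a base pair agree on those nucleotides. A minimal Python sketch of that consistency check by backtracking, assuming each NCM is described by the motif positions it covers and a list of accepted fragments; the two NCMs `Ca`/`Cb`, their positions, and their fragments are hypothetical toy data, not the sarcin/ricin sets.

```python
from typing import Dict, List, Tuple

# Hypothetical toy grammar: each NCM (a symbol of N) covers some motif positions,
# and its sequence set (in Σ) lists the fragments it accepts, one letter per position.
# Ca and Cb share positions 1 and 4, playing the role of their common base pair.
NCMS: Dict[str, Tuple[int, ...]] = {"Ca": (0, 1, 4, 5), "Cb": (1, 2, 3, 4)}
FRAGMENTS: Dict[str, List[str]] = {
    "Ca": ["GCGC", "GAUC"],
    "Cb": ["CAAG", "CGAG", "AUUU"],
}


def consistent_sequences(ncms: Dict[str, Tuple[int, ...]],
                         fragments: Dict[str, List[str]]) -> List[str]:
    """Enumerate full motif sequences from fragment choices that agree on shared positions."""
    length = 1 + max(p for positions in ncms.values() for p in positions)
    order = list(ncms)
    results: List[str] = []

    def extend(k: int, assigned: Dict[int, str]) -> None:
        if k == len(order):
            # Positions covered by no NCM are left as "N" (any nucleotide).
            results.append("".join(assigned.get(i, "N") for i in range(length)))
            return
        positions = ncms[order[k]]
        for frag in fragments[order[k]]:
            # Consistent if every already-assigned position receives the same nucleotide.
            if all(assigned.get(p, nt) == nt for p, nt in zip(positions, frag)):
                extend(k + 1, {**assigned, **dict(zip(positions, frag))})

    extend(0, {})
    return results


print(consistent_sequences(NCMS, FRAGMENTS))
```

Of the six fragment pairs, only three agree on the shared pair, so the toy grammar generates three motif sequences rather than the full cross product.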
Sarcin/Ricin Building Blocks (IMs: isostericity matrices)
C1:
Theoretical: 256 (16 x 16)
IMs: 120 (10 x 12)
PDB: 7
C2:
Theoretical: 64 (16 x 4)
IMs: 40 (10 x 4)
PDB: 5
Unlabeled block:
Theoretical: 16
IMs: 10
PDB: 15
C3:
Theoretical: 64 (16 x 4)
IMs: 56 (14 x 4)
PDB: 2
C4:
Theoretical: 256 (16 x 16)
IMs: 160 (16 x 10)
PDB: 3
C5:
Theoretical: 64 (16 x 4)
IMs: 40 (10 x 4)
PDB: 8
St-Onge et al. NAR 2007
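The counts above are products of per-component factors. Reading 16 as the 4 x 4 nucleotide combinations of one base pair and 4 as one unpaired nucleotide (our interpretation of the "16 x 16" and "16 x 4" factors, not stated on the slide), a short Python check reproduces the theoretical and IM totals; `theoretical_count` and the `BLOCKS` table are illustrative names.

```python
from math import prod
from typing import Dict, Tuple

NUCLEOTIDES = 4
PAIR_COMBINATIONS = NUCLEOTIDES * NUCLEOTIDES  # 16 ordered nucleotide pairs


def theoretical_count(n_pairs: int, n_unpaired: int) -> int:
    """Sequences an NCM could take if no position were constrained (assumed model)."""
    return PAIR_COMBINATIONS ** n_pairs * NUCLEOTIDES ** n_unpaired


# Factors as listed on the slide: (theoretical factors, IM factors, PDB count).
BLOCKS: Dict[str, Tuple[Tuple[int, ...], Tuple[int, ...], int]] = {
    "C1": ((16, 16), (10, 12), 7),
    "C2": ((16, 4), (10, 4), 5),
    "C3": ((16, 4), (14, 4), 2),
    "C4": ((16, 16), (16, 10), 3),
    "C5": ((16, 4), (10, 4), 8),
}

for name, (theo, ims, pdb) in BLOCKS.items():
    print(f"{name}: theoretical {prod(theo)}, IMs {prod(ims)}, PDB {pdb}")
```

Under this reading, C1 and C4 have two base pairs (256 = 16 x 16) and C2, C3, C5 have one base pair plus one unpaired nucleotide (64 = 16 x 4).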
( MC-Seq sarcin-ricin-graph ) -> [ sequence ]
Sequences supported by the NCMs in the PDB:
AGUA-GAA AGUA-AAA
GGUA-GAA GGUA-AAA
If we remove the sarcin/ricin motif instances, located in the PDB with
( MC-Search ( sarcin-ricin-graph, [ PDB ] ) ) -> [ 3-D ]
then the same four sequences are still supported.
=> The NCMs also occur outside the sarcin/ricin context.
Larose et al. (in prep.); St-Onge et al. NAR 2007
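The four supported sequences vary at exactly two positions, and every combination of the two choices occurs. A short Python check of that product structure, using the sequences listed on this slide; `positional_choices` and `is_product_set` are hypothetical helpers, and reading the result as support for "independent" NCMs is an interpretation, not a claim from the slide.

```python
from itertools import product
from typing import List, Set

SUPPORTED = {"AGUA-GAA", "AGUA-AAA", "GGUA-GAA", "GGUA-AAA"}  # from the slide


def positional_choices(seqs: Set[str]) -> List[List[str]]:
    """Letters observed at each column of equal-length sequences."""
    length = len(next(iter(seqs)))
    return [sorted({s[i] for s in seqs}) for i in range(length)]


def is_product_set(seqs: Set[str]) -> bool:
    """True if the set contains every combination of its per-column letters."""
    return {"".join(combo) for combo in product(*positional_choices(seqs))} == set(seqs)


print(positional_choices(SUPPORTED))
print(is_product_set(SUPPORTED))
```

A set such as {AGUA-GAA, GGUA-AAA} would fail the check, because its two variable positions would then be coupled.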
Graph Grammar Parsing
Westhof (personal comm.); St-Onge et al. NAR 2007
806 sequences aligned according to the E. coli 23S rRNA structure; sites 204-207 / 189-191.
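Parsing asks the reverse question: does an aligned sequence decompose into fragments that each NCM accepts? A minimal Python sketch, assuming each NCM is described by the motif positions it covers and the fragments it accepts; the NCM names, positions, fragments, and the three example sequences are hypothetical, not the 806 aligned sequences.

```python
from typing import Dict, List, Tuple

# Hypothetical toy grammar: NCM -> covered positions, NCM -> accepted fragments.
NCMS: Dict[str, Tuple[int, ...]] = {"Ca": (0, 1, 4, 5), "Cb": (1, 2, 3, 4)}
FRAGMENTS: Dict[str, List[str]] = {
    "Ca": ["GCGC", "GAUC"],
    "Cb": ["CAAG", "CGAG", "AUUU"],
}


def parse(seq: str) -> List[str]:
    """Return the NCMs whose fragment of `seq` is not accepted; an empty list means `seq` parses."""
    failures = []
    for name, positions in NCMS.items():
        fragment = "".join(seq[p] for p in positions)
        if fragment not in FRAGMENTS[name]:
            failures.append(name)
    return failures


aligned = ["GCAAGC", "GCAUGC", "GAUUUC"]  # toy stand-ins for aligned sequences
accepted = [s for s in aligned if not parse(s)]
print(accepted, parse("GCAUGC"))
```

Returning the failing NCMs, rather than a bare yes/no, points to the part of the motif that a sequence variant disrupts.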
MC-Seq PDB
Alignment: 5S, 16S, 23S
AGUA-AAA, AGUA-GAA, GGUA-GAA
GGUA-AAA
AAUA-AAA, AAUA-GAA, ACUA-AAA, ACUA-GAA, ACUA-GAC, AGUA-AAC
AGUA-CAA, AGUA-GAC, AGUA-GAU, AGUA-GCC, AGUA-GGG, AGUA-GUG, AGUC-GAA, AUUA-GAA
CGUA-GAA, GAUA-GAA, GGUA-GAU, GUUA-GAA, UGUA-GAA, UGUA-GAC
Isostericity matrices
10 000 sequences
Validation (MC-Seq vs. PDB vs. Alignment)
St-Onge et al. NAR 2007
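The validation compares three sequence sets (MC-Seq output, PDB, alignment) as overlapping regions. A minimal Python sketch that groups each sequence by the exact combination of sources containing it; `venn_regions` and the toy sets are hypothetical, and the toy memberships are not the slide's results.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Set


def venn_regions(sources: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Map each combination of source names to the items found in exactly those sources."""
    regions: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for item in set().union(*sources.values()):
        members = frozenset(name for name, items in sources.items() if item in items)
        regions[members].add(item)
    return dict(regions)


# Hypothetical toy memberships, only to show the grouping.
toy = {
    "MC-Seq": {"s1", "s2"},
    "PDB": {"s1"},
    "Alignment": {"s1", "s2", "s3"},
}
for members, items in venn_regions(toy).items():
    print(sorted(members), sorted(items))
```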
Perspectives
• We want to develop a version of MC-Seq that would be useful during the alignment process.
• The PDB does not yet seem to contain enough structural information.
• The NCMs (context) are necessary to avoid generating too many sequences.
• Two more things need to be considered…
Sarcin/Ricin (Sequence/Structure Space Is Not Simple)
St-Onge et al. (in prep.)
Modeling In 3-D Might Be Necessary
Alignment: AUUA-GAA (0.9 Å)
MC-Fold: CAUU-AAG (2.1 Å)
St-Onge et al. NAR 2007
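The Å values on this slide are presumably root-mean-square deviations (RMSD) of 3-D models from a reference, although the slide does not say so. For reference, a minimal Python sketch of RMSD over matched atoms, assuming the two coordinate lists are already superimposed and in the same atom order (a full comparison would first superimpose them, e.g. with the Kabsch algorithm); `rmsd` is an illustrative helper.

```python
from math import sqrt
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Root-mean-square deviation between matched, pre-superimposed atom coordinates (in Å)."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
                  for (x1, y1, z1), (x2, y2, z2) in zip(a, b))
    return sqrt(squared / len(a))


model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
print(rmsd(model, reference))
```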
Acknowledgments
Martin Larose (Res. assistant)
Philippe Thibault (Res. assistant)
Patrick Gendron (Res. assistant)
Romain Rivière (Postdoc, CS)
Véronique Lisi (Ph.D. Molecular Biology)
Marc Parisien (Ph.D. Computer Science)
Emmanuelle Permal (Ph.D. Bioinformatics)
Karine St-Onge (Ph.D. Computer Science)
Louis-Philippe Lavoie (M.Sc. Bioinformatics)
Maxime Caron (M.Sc. Bioinformatics)
Caroline Louis-Jeune (M.Sc. Bioinformatics)
Montréal:
Pascal Chartrand
Gerardo Ferbeyre
Sylvie Hamel
Sébastien Lemieux
Pascale Legault
Luc Desgroseillers
Kathy Borden
Daniel Lamarre
Éric Westhof (Strasbourg)
Alain Denise (Paris)
Dave Mathews (Rochester)