rna foldingckingsf/bioinfo-lectures/rnafold.pdf · rna folding rules rna folding rules: 1. if two...
TRANSCRIPT
![Page 1: RNA Foldingckingsf/bioinfo-lectures/rnafold.pdf · RNA Folding Rules RNA folding rules: 1. If two bases are closer than 4 bases apart, they cannot pair 2. Each base is matched to](https://reader036.vdocuments.mx/reader036/viewer/2022071508/61298ffb7710992f994925e8/html5/thumbnails/1.jpg)
RNA FoldingCMSC 423
Lecture by Darya Filippova
![Page 2: RNA Foldingckingsf/bioinfo-lectures/rnafold.pdf · RNA Folding Rules RNA folding rules: 1. If two bases are closer than 4 bases apart, they cannot pair 2. Each base is matched to](https://reader036.vdocuments.mx/reader036/viewer/2022071508/61298ffb7710992f994925e8/html5/thumbnails/2.jpg)
RNA Folding
G C
CG
UA
A U
U A
C G
CG
AU
A U
G
G
G
U
A
A
A
A G C C
GGCU
UA
A
A
GA
C
C
G
GU
C
U
U
U
A
CC
C
C
GG
A
U
A
U
G
C
CC
C
A
A
RNA is single stranded and folds up:• G and C stick together• A and U stick together
![Page 3: RNA Foldingckingsf/bioinfo-lectures/rnafold.pdf · RNA Folding Rules RNA folding rules: 1. If two bases are closer than 4 bases apart, they cannot pair 2. Each base is matched to](https://reader036.vdocuments.mx/reader036/viewer/2022071508/61298ffb7710992f994925e8/html5/thumbnails/3.jpg)
RNA Folding Rules
RNA folding rules:1. If two bases are closer than 4 bases apart, they cannot
pair2. Each base is matched to at most one other base3. The allowable pairs are {U, A} and {C, G}4. Pairs cannot “cross.”
G C
CG
UA
A U
U A
C G
CG
AU G CCG UAA UU AC G CG AU
![Page 4: RNA Foldingckingsf/bioinfo-lectures/rnafold.pdf · RNA Folding Rules RNA folding rules: 1. If two bases are closer than 4 bases apart, they cannot pair 2. Each base is matched to](https://reader036.vdocuments.mx/reader036/viewer/2022071508/61298ffb7710992f994925e8/html5/thumbnails/4.jpg)
No Crossings
If (i,j) and (k,m) are paired, we must have i < k < m < j.
Paired bases have to be nested.
i jk m
![Page 5: RNA Foldingckingsf/bioinfo-lectures/rnafold.pdf · RNA Folding Rules RNA folding rules: 1. If two bases are closer than 4 bases apart, they cannot pair 2. Each base is matched to](https://reader036.vdocuments.mx/reader036/viewer/2022071508/61298ffb7710992f994925e8/html5/thumbnails/5.jpg)
RNA Folding
Given: a string r = b1b2b3,...,bn with bi ∈ {A,C,U,G}Find: the largest set of pairs S = {(i,j)}, where i,j ∈ {1,2,...,n} that satisfies the RNA folding rules.
Goal: match as many bases as possible.
![Page 6: RNA Foldingckingsf/bioinfo-lectures/rnafold.pdf · RNA Folding Rules RNA folding rules: 1. If two bases are closer than 4 bases apart, they cannot pair 2. Each base is matched to](https://reader036.vdocuments.mx/reader036/viewer/2022071508/61298ffb7710992f994925e8/html5/thumbnails/6.jpg)
Subproblems
G CCG UUA UU AC G CG AU1 j
G CCG UUA UU AC G CG AU1 j
j is not paired with anything
j is paired with some t ≤ j -4
t
OPT(t+1, j-1)OPT(1, t-1)
OPT(1, j-1)
![Page 7: RNA Foldingckingsf/bioinfo-lectures/rnafold.pdf · RNA Folding Rules RNA folding rules: 1. If two bases are closer than 4 bases apart, they cannot pair 2. Each base is matched to](https://reader036.vdocuments.mx/reader036/viewer/2022071508/61298ffb7710992f994925e8/html5/thumbnails/7.jpg)
Recurrence
If j - i ≤ 4:
OPT (i, j) = max
(OPT (i, j � 1)
maxt{1 +OPT (i, t� 1) +OPT (t+ 1, j � 1)
If j - i > 4:
In the 2nd case above, we try all possible t with which to pair j.That is, t runs from i to j-4.
OPT (i, j) = 0
![Page 8: RNA Foldingckingsf/bioinfo-lectures/rnafold.pdf · RNA Folding Rules RNA folding rules: 1. If two bases are closer than 4 bases apart, they cannot pair 2. Each base is matched to](https://reader036.vdocuments.mx/reader036/viewer/2022071508/61298ffb7710992f994925e8/html5/thumbnails/8.jpg)
Order to solve the subproblems
• In what order should we solve the subproblems?
![Page 9: RNA Foldingckingsf/bioinfo-lectures/rnafold.pdf · RNA Folding Rules RNA folding rules: 1. If two bases are closer than 4 bases apart, they cannot pair 2. Each base is matched to](https://reader036.vdocuments.mx/reader036/viewer/2022071508/61298ffb7710992f994925e8/html5/thumbnails/9.jpg)
Order to solve the subproblems
• In what order should we solve the subproblems?
• What problems do we need to solve OPT(i,j)?
OPT(i,t-1) and OPT(t+1, j-1) for every t between i and j
• In what sense are these problems “smaller?”
![Page 10: RNA Foldingckingsf/bioinfo-lectures/rnafold.pdf · RNA Folding Rules RNA folding rules: 1. If two bases are closer than 4 bases apart, they cannot pair 2. Each base is matched to](https://reader036.vdocuments.mx/reader036/viewer/2022071508/61298ffb7710992f994925e8/html5/thumbnails/10.jpg)
Order to solve the subproblems
• In what order should we solve the subproblems?
• What problems do we need to solve OPT(i,j)?
OPT(i,t-1) and OPT(t+1, j-1) for every t between i and j
• In what sense are these problems “smaller?”
• They involve smaller intervals of the string:
We solve OPT(i,j) in order of increase value of j - i.
![Page 11: RNA Foldingckingsf/bioinfo-lectures/rnafold.pdf · RNA Folding Rules RNA folding rules: 1. If two bases are closer than 4 bases apart, they cannot pair 2. Each base is matched to](https://reader036.vdocuments.mx/reader036/viewer/2022071508/61298ffb7710992f994925e8/html5/thumbnails/11.jpg)
Filling in the matrix
i
j
n
1n1
only use half: i < j
OPT(i,j)
![Page 12: RNA Foldingckingsf/bioinfo-lectures/rnafold.pdf · RNA Folding Rules RNA folding rules: 1. If two bases are closer than 4 bases apart, they cannot pair 2. Each base is matched to](https://reader036.vdocuments.mx/reader036/viewer/2022071508/61298ffb7710992f994925e8/html5/thumbnails/12.jpg)
Filling in the matrix
i
j
n
1n1
in order of increasing j-i
![Page 13: RNA Foldingckingsf/bioinfo-lectures/rnafold.pdf · RNA Folding Rules RNA folding rules: 1. If two bases are closer than 4 bases apart, they cannot pair 2. Each base is matched to](https://reader036.vdocuments.mx/reader036/viewer/2022071508/61298ffb7710992f994925e8/html5/thumbnails/13.jpg)
Filling in the matrix
i
j
n
1n1
in order of increasing j-i
![Page 14: RNA Foldingckingsf/bioinfo-lectures/rnafold.pdf · RNA Folding Rules RNA folding rules: 1. If two bases are closer than 4 bases apart, they cannot pair 2. Each base is matched to](https://reader036.vdocuments.mx/reader036/viewer/2022071508/61298ffb7710992f994925e8/html5/thumbnails/14.jpg)
Filling in the matrix
i
j
n
1n1
in order of increasing j-i
![Page 15: RNA Foldingckingsf/bioinfo-lectures/rnafold.pdf · RNA Folding Rules RNA folding rules: 1. If two bases are closer than 4 bases apart, they cannot pair 2. Each base is matched to](https://reader036.vdocuments.mx/reader036/viewer/2022071508/61298ffb7710992f994925e8/html5/thumbnails/15.jpg)
Filling in the matrix
i
j
n
1n1
in order of increasing j-i
![Page 16: RNA Foldingckingsf/bioinfo-lectures/rnafold.pdf · RNA Folding Rules RNA folding rules: 1. If two bases are closer than 4 bases apart, they cannot pair 2. Each base is matched to](https://reader036.vdocuments.mx/reader036/viewer/2022071508/61298ffb7710992f994925e8/html5/thumbnails/16.jpg)
Filling in the matrix
i
j
n
1n1
in order of increasing j-i
![Page 17: RNA Foldingckingsf/bioinfo-lectures/rnafold.pdf · RNA Folding Rules RNA folding rules: 1. If two bases are closer than 4 bases apart, they cannot pair 2. Each base is matched to](https://reader036.vdocuments.mx/reader036/viewer/2022071508/61298ffb7710992f994925e8/html5/thumbnails/17.jpg)
Case 1
i
j
n
1n1
OPT(i,j)
OPT(i,j-1)
OPT (i, j) = max
(OPT (i, j � 1). . .
![Page 18: RNA Foldingckingsf/bioinfo-lectures/rnafold.pdf · RNA Folding Rules RNA folding rules: 1. If two bases are closer than 4 bases apart, they cannot pair 2. Each base is matched to](https://reader036.vdocuments.mx/reader036/viewer/2022071508/61298ffb7710992f994925e8/html5/thumbnails/18.jpg)
Case 1
i
j
n
1n1
OPT(i,j)
OPT(i,j-1)
OPT (i, j) = max
(OPT (i, j � 1). . .
![Page 19: RNA Foldingckingsf/bioinfo-lectures/rnafold.pdf · RNA Folding Rules RNA folding rules: 1. If two bases are closer than 4 bases apart, they cannot pair 2. Each base is matched to](https://reader036.vdocuments.mx/reader036/viewer/2022071508/61298ffb7710992f994925e8/html5/thumbnails/19.jpg)
Case 2
i
j
n
1n1
OPT(i,j)
OPT(t+1,j-1)
OPT(i,t-1)
OPT (i, j) = max
(. . .
maxt{1 + OPT (i, t� 1) + OPT (t + 1, j � 1)}
![Page 20: RNA Foldingckingsf/bioinfo-lectures/rnafold.pdf · RNA Folding Rules RNA folding rules: 1. If two bases are closer than 4 bases apart, they cannot pair 2. Each base is matched to](https://reader036.vdocuments.mx/reader036/viewer/2022071508/61298ffb7710992f994925e8/html5/thumbnails/20.jpg)
Case 2
i
j
n
1n1
OPT(i,j)
OPT(t+1,j-1)
OPT(i,t-1)
OPT (i, j) = max
(. . .
maxt{1 + OPT (i, t� 1) + OPT (t + 1, j � 1)}
![Page 21: RNA Foldingckingsf/bioinfo-lectures/rnafold.pdf · RNA Folding Rules RNA folding rules: 1. If two bases are closer than 4 bases apart, they cannot pair 2. Each base is matched to](https://reader036.vdocuments.mx/reader036/viewer/2022071508/61298ffb7710992f994925e8/html5/thumbnails/21.jpg)
Case 2
i
j
n
1n1
OPT(i,j)
OPT(t+1,j-1)
OPT(i,t-1)
OPT (i, j) = max
(. . .
maxt{1 + OPT (i, t� 1) + OPT (t + 1, j � 1)}
![Page 22: RNA Foldingckingsf/bioinfo-lectures/rnafold.pdf · RNA Folding Rules RNA folding rules: 1. If two bases are closer than 4 bases apart, they cannot pair 2. Each base is matched to](https://reader036.vdocuments.mx/reader036/viewer/2022071508/61298ffb7710992f994925e8/html5/thumbnails/22.jpg)
Code
def rnafold(rna): n = len(rna) OPT = make_matrix(n, n) Arrows = make_matrix(n, n) for k in xrange(5, n): # interval length for i in xrange(n-k): # interval start j = i + k # interval end best_t = OPT[i][j-1] arrow = -1 for t in xrange(i, j): if is_complement(rna[t], rna[j]): val = 1 + \\
(OPT[i][t-1] if t > i else 0) + OPT[t+1][j-1] if val >= best_t: best_t, arrow = val, t OPT[i][j] = best_t Arrows[i][j] = arrow return OPT, Arrows
![Page 23: RNA Foldingckingsf/bioinfo-lectures/rnafold.pdf · RNA Folding Rules RNA folding rules: 1. If two bases are closer than 4 bases apart, they cannot pair 2. Each base is matched to](https://reader036.vdocuments.mx/reader036/viewer/2022071508/61298ffb7710992f994925e8/html5/thumbnails/23.jpg)
Backtrace code
def rna_backtrace(Arrows): Pairs = [] # holds the pairs in the optimal solution Stack = [(0, len(Arrows) - 1)] # tracks cells we have to visit while len(Stack) > 0: i, j = Stack.pop() if j - i <= 4: continue # if cell is base case, skip it # Arrow = -1 means we didn’t match j if Arrows[i][j] == -1: Stack.append((i, j - 1)) else: t = Arrows[i][j] Pairs.append((t, j)) # save that j matched with t # add the two daughter problems if t > i: Stack.append((i, t - 1)) Stack.append((t + 1, j - 1))return Pairs
![Page 24: RNA Foldingckingsf/bioinfo-lectures/rnafold.pdf · RNA Folding Rules RNA folding rules: 1. If two bases are closer than 4 bases apart, they cannot pair 2. Each base is matched to](https://reader036.vdocuments.mx/reader036/viewer/2022071508/61298ffb7710992f994925e8/html5/thumbnails/24.jpg)
Subproblems, 2
• We have a subproblem for every interval (i,j)
• How many subproblems are there?
![Page 25: RNA Foldingckingsf/bioinfo-lectures/rnafold.pdf · RNA Folding Rules RNA folding rules: 1. If two bases are closer than 4 bases apart, they cannot pair 2. Each base is matched to](https://reader036.vdocuments.mx/reader036/viewer/2022071508/61298ffb7710992f994925e8/html5/thumbnails/25.jpg)
Subproblems, 2
• We have a subproblem for every interval (i,j)
• How many subproblems are there?
✓n
2
◆= O(n2)
![Page 26: RNA Foldingckingsf/bioinfo-lectures/rnafold.pdf · RNA Folding Rules RNA folding rules: 1. If two bases are closer than 4 bases apart, they cannot pair 2. Each base is matched to](https://reader036.vdocuments.mx/reader036/viewer/2022071508/61298ffb7710992f994925e8/html5/thumbnails/26.jpg)
Running Time
• O(n2) subproblems
• Each takes O(n) time to solve(have to search over all possible choices of t)
• Total running time is O(n3).
![Page 27: RNA Foldingckingsf/bioinfo-lectures/rnafold.pdf · RNA Folding Rules RNA folding rules: 1. If two bases are closer than 4 bases apart, they cannot pair 2. Each base is matched to](https://reader036.vdocuments.mx/reader036/viewer/2022071508/61298ffb7710992f994925e8/html5/thumbnails/27.jpg)
Summary
• This is essentially “Nussinov’s algorithm,” which was proposed for finding RNA structures in 1978.
• Same dynamic programming idea: write the answer to the full problem in terms of the answer to smaller problems.
• Still have an O(n2) matrix to fill.
• Main differences from sequence alignment:• We fill in the matrix in a different order: entries (i,j) in order
of increasing j - i.• We have to try O(n) possible subproblems inside the max.
This leads to an O(n3) algorithm.
![Page 28: RNA Foldingckingsf/bioinfo-lectures/rnafold.pdf · RNA Folding Rules RNA folding rules: 1. If two bases are closer than 4 bases apart, they cannot pair 2. Each base is matched to](https://reader036.vdocuments.mx/reader036/viewer/2022071508/61298ffb7710992f994925e8/html5/thumbnails/28.jpg)
Pseudoknots
(Staple & Butcher, PLoS Biol, 2005)