rna secondary structure -...

Post on 04-Jun-2018

222 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

RNA Secondary Structure

CSE 417W.L. Ruzzo

The Double Helix

Los Alamos Science

The “Central Dogma” ofMolecular Biology

DNA RNA Protein

DNA (chromosome) RNA

(messenger)

Proteingene

cell

Non-coding RNA

• Messenger RNA - codes for proteins• Non-coding RNA - all the rest

– Before, say, mid 1990’s, 1-2 dozen known(critically important, but narrow roles: e.g.ribosomal and transfer RNA, splicing, SRP)

• Since mid 90’s dramatic discoveries• Regulation, transport, stability/degradation• E.g. “microRNA”: hundreds in humans• E.g. “riboswitches”: thousands in bacteria

DNA structure: dull

…ACCGCTAGATG…

…TGGCGATCTAC…

• RNA’s fold,and function

• Nature useswhat works

RNAStructure:Rich

Why is structure Important?

• For protein-coding, similarity in sequence is apowerful tool for finding related sequences– e.g. “hemoglobin” is easily recognized in all

vertebrates• For non-coding RNA, many different

sequences have the same structure, andstructure is most important for function.– So, using structure plus sequence, can find related

sequences at much greater evolutionary distances

Q: What’s so hard?

AC

U

G

C

A

G

G

G

A

G

C

AA G

C

GA

G

G CC

U

C

UGC A

A

UG

A C

G

GU

G

CA

U

GA

G

A G

C

G UCU UU

U

C

A

A

CA

C UG

U

UA

U

G

G

A

A

G U

UU

G

GC

UA

GC

G UU C

UA

G

AG

C U

G

UG

A

CA

C

UG

CC

G

C

GA

C

G

G GA

A

A

GU A A

C

GG

G

CGG

C

G

A

GU

AA

A

C C

C

GA

UC CC

G

GU

G

A

A

U

AG

CC

U GA

A

A

A

A

CA

A

A

GU

A

CA CGG

G

A

UAC

G

A: Structure often more important than sequence

6S mimics anopen promoter

Barrick et al. RNA 2005Trotochaud et al. NSMB 2005Willkomm et al. NAR 2005

E.coli

Chloroflexus aurantiacus

Geobacter metallireducensGeobacter sulphurreducens

Chloroflexiδ -Proteobacteria

Symbiobacterium thermophilum

Used by CMfinder

Found by scan

“Central Dogma”=

“Central Chicken & Egg”?DNA RNA Protein

Was there once an “RNA World”?

DNA (chromosome) RNA

(messenger)

Proteingene

cell

6.5 RNA Secondary Structure

Algorithms

RNA Secondary Structure

RNA. String B = b1b2…bn over alphabet { A, C, G, U }.

Secondary structure. RNA is single-stranded so it tends to loop backand form base pairs with itself. This structure is essential forunderstanding behavior of molecule.

G

U

C

A

GA

A

G

CG

A

UG

A

U

U

A

G

A

C A

A

C

U

G

A

G

U

C

A

U

C

G

G

G

C

C

G

Ex: GUCGAUUGAGCGAAUGUAACAACGUGGCUACGGCGAGA

complementary base pairs: A-U, C-G

RNA Secondary Structure

Secondary structure. A set of pairs S = { (bi, bj) } that satisfy: [Watson-Crick.]

– S is a matching and– each pair in S is a Watson-Crick pair: A-U, U-A, C-G, or G-C.

[No sharp turns.] The ends of each pair are separated by at least 4intervening bases. If (bi, bj) ∈ S, then i < j - 4.

[Non-crossing.] If (bi, bj) and (bk, bl) are two pairs in S, then wecannot have i < k < j < l.

Free energy. Usual hypothesis is that an RNA molecule will form thesecondary structure with the optimum total free energy.

Goal. Given an RNA molecule B = b1b2…bn, find a secondary structure Sthat maximizes the number of base pairs.

approximate by number of base pairs

RNA Secondary Structure: Examples

Examples.

C

G G

C

A

G

U

U

U A

A U G U G G C C A U

G G

C

A

G

U

U A

A U G G G C A U

C

G G

C

A

U

G

U

U A

A G U U G G C C A U

sharp turn crossingok

G

G≤4

base pair

RNA Secondary Structure: Subproblems

First attempt. OPT(j) = maximum number of base pairs in a secondarystructure of the substring b1b2…bj.

Difficulty. Results in two sub-problems. Finding secondary structure in: b1b2…bt-1. Finding secondary structure in: bt+1bt+2…bn-1.

1 t n

match bt and bn

OPT(t-1)

need more sub-problems

Dynamic Programming Over Intervals

Notation. OPT(i, j) = maximum number of base pairs in a secondarystructure of the substring bibi+1…bj.

Case 1. If i ≥ j - 4.– OPT(i, j) = 0 by no-sharp turns condition.

Case 2. Base bj is not involved in a pair.– OPT(i, j) = OPT(i, j-1)

Case 3. Base bj pairs with bt for some i ≤ t < j - 4.– non-crossing constraint decouples resulting sub-problems– OPT(i, j) = 1 + maxt { OPT(i, t-1) + OPT(t+1, j-1) }

Remark. Same core idea in CKY algorithm to parse context-free grammars.

take max over t such that i ≤ t < j-4 andbt and bj are Watson-Crick complements

Bottom Up Dynamic Programming Over Intervals

Q. What order to solve the sub-problems?A. Do shortest intervals first.

Running time. O(n3).

RNA(b1,…,bn) { for k = 5, 6, …, n-1 for i = 1, 2, …, n-k j = i + k Compute M[i, j]

return M[1, n]}

using recurrence

0 0 0

0 0

02

3

4

1

i

6 7 8 9

j

0 0 0

0 0

02

3

4

1

i

6 7 8 9

j

CUCCGGUUGCAAUGUC n= 16((.(....).)..).. 0 0 0 0 0 1 1 1 1 1 2 2 2 3 3 3 0 0 0 0 0 0 0 0 1 1 2 2 2 2 2 2 0 0 0 0 0 0 0 0 1 1 1 1 1 2 2 2 0 0 0 0 0 0 0 0 1 1 1 1 1 2 2 2 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 2 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 2 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

E.g.:OPT(6,16) = 2:

GUUGCAAUGUC(.(...)...)

E.g.:OPT(1,6) = 1:

CUCCGG(....)

top related