Phylogenetic Reconstruction Based on RNA Secondary Structural Alignment

Post on 05-Jan-2016



Phylogenetic Reconstruction based on RNA Secondary Structural Alignment

Benny Chor, Tel-Aviv Univ.

Joint work with Moran Cabili, Assaf Meirovich, and Metsada Pasmanik-Chor

Phylogenetic Trees Based on What?

• Morphology (1800 - )

• Single gene sequence (DNA or AA) (1960 - )

• Whole genomes (2002 - )

More Sources to Base Phylogeny On? A Proposed, Metric Induced Approach

1. Find a reliable metric between pairs of objects.

2. Design / choose / modify a good algorithm for computing the metric (pairwise distances).

3. Compute the distance matrix.

4. Construct a Neighbor Joining (NJ) tree from the distance matrix.

5. As a sanity check, compare the resulting tree to “standard & accepted” ones.
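Step 4 above can be sketched in a few lines of Python. This is a minimal, unoptimized illustration of the standard Neighbor Joining procedure, not the implementation used for the trees in this talk. The function name `neighbor_joining`, the internal node labels, and the five-taxon toy matrix are assumptions made for the example.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Build a Neighbor Joining tree and return it as a Newick string.

    names: list of leaf labels.
    dist:  symmetric matrix (list of lists) of pairwise distances.
    """
    nodes = list(names)
    # Distances stored as a dict of dicts so that new internal nodes can be added.
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    label = {n: n for n in names}  # Newick text for the subtree under each node
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of every active node.
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Join the pair that minimizes the NJ criterion Q.
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from the new internal node to a and to b.
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        u = f"_node{next_id}"  # hypothetical label for the new internal node
        next_id += 1
        label[u] = f"({label[a]}:{len_a:g},{label[b]}:{len_b:g})"
        d[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # Two nodes remain: connect them with the one remaining edge.
    a, b = nodes
    return f"({label[a]}:{d[a][b]:g},{label[b]}:0);"


# Toy additive distance matrix on five hypothetical taxa.
names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
tree = neighbor_joining(names, dist)
print(tree)
```

The resulting Newick string can be loaded into any tree viewer for the sanity check in step 5.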

Metric Induced Approach

The approach has already been applied (fairly successfully), e.g. for constructing phylogenies based on whole genomes/proteomes (Burstein et al., 2005), and by others based on metabolic networks (Tuller et al., 2006).

Of course, distances that are appropriate to each domain must be applied (or specially designed).

Our Question

Can phylogenetic reconstruction be based on RNA secondary structures?

Answer: Yes, And Even Quite Well

[Figure: our tree, based on secondary structures of 16S rRNA from 91 species; clades labeled Archaea, Eukarya, Bacteria]

Metric Induced Approach: Specifics

1. Find an efficient, similarity-based algorithm for pairwise alignment of RNA secondary structures.

2. Transform similarity to distance.

3. Use RNA databases to get the RNA molecules and structures. Apply the algorithm to compute the distance for each pair of molecules.

4. Run NJ to produce trees.
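Step 3 amounts to filling a symmetric matrix with one distance per pair of molecules. The sketch below shows that loop with a deliberately simplified stand-in for the aligner: a sequence-only global alignment distance with placeholder unit costs. It ignores secondary structure entirely, and the function names and toy sequences are assumptions for illustration only.

```python
def alignment_distance(a, b, mismatch=1.0, gap=1.0):
    """Global alignment distance by dynamic programming, O(len(a) * len(b)).

    Stand-in only: compares sequences, not secondary structures.
    The mismatch and gap costs are placeholder values.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            substitute = prev[j - 1] + (0.0 if a[i - 1] == b[j - 1] else mismatch)
            cur.append(min(substitute, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def distance_matrix(items, distance):
    """Apply `distance` to every unordered pair and return a symmetric matrix."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = distance(items[i], items[j])
    return matrix


# Hypothetical toy sequences, not taken from any database.
molecules = ["GAUUACA", "GAUUGCA", "GAUACA"]
matrix = distance_matrix(molecules, alignment_distance)
for row in matrix:
    print(row)
```

Any pairwise distance with the same call signature can be passed to `distance_matrix` in place of the stand-in.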

The Alignment Algorithm Chosen

- We chose to use RSmatch: a sophisticated dynamic programming algorithm, based on the “dot bracket” representation of the secondary structure. J. Liu, J.T. Wang, J. Hu, B. Tian. BMC Bioinformatics 2005, 6:89.

- RSmatch groups the dots and brackets into components, and then compares components according to their order in the secondary structure.

- RSmatch employs both sequences and structures.

- Complexity: O(nm), where n and m are the lengths of the two RNA molecules that are compared.

Example of a sequence and its dot bracket structure:

TAATTATCGGAAGCAGTGCCTTCCATAATTA

(((((((.(((((......))))))))))))
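To make the “dot bracket” representation concrete, the sketch below converts the example structure from this slide into a table of paired positions using a stack. It only illustrates the notation and is not part of RSmatch. The helper name `pair_table` is an assumption.

```python
def pair_table(structure):
    """Map each position to its pairing partner (or None if unpaired)."""
    stack = []
    pairs = [None] * len(structure)
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


# Sequence and structure from the slide (spaces removed from the structure).
sequence = "TAATTATCGGAAGCAGTGCCTTCCATAATTA"
structure = "(" * 7 + "." + "(" * 5 + "." * 6 + ")" * 12

pairs = pair_table(structure)
# List each base pair once, as (5' base, 3' base).
base_pairs = [(sequence[i], sequence[j])
              for i, j in enumerate(pairs) if j is not None and i < j]
print(base_pairs)
```

Positions where `pairs` holds `None` are the “dots”; all other positions belong to a base pair.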

From Similarity to Distance

In transforming the scoring matrix from similarity to distance, we tried to preserve the ratios between mismatch values, and of course lower similarity should imply higher distance.

Distance metric requirements: symmetry, triangle (Δ) inequality, non-negativity, self-distance = 0.

Actual Distance Matrices: Higher Mismatch Penalties at “Dots”

Base pairs:

      AU    CG    GC    GU    UA    UG
AU    0     1     1     0.5   0.5   0.5
CG    1     0     0.5   1     1     1
GC    1     0.5   0     1     1     1
GU    0.5   1     1     0     0.5   0.5
UA    0.5   1     1     0.5   0     0.5
UG    0.5   1     1     0.5   0.5   0

Unpaired nucleotides (“dots”):

      A     C     G     U
A     0     2     2     2
C     2     0     2     2
G     2     2     0     2
U     2     2     2     0

- Gap cost: 3 per nucleotide involved.

- Δ inequality: mismatch < 2 × gap cost
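The metric requirements can be checked mechanically for the two matrices above. The sketch below copies the values from this slide and tests symmetry, non-negativity, zero self-distance, and the triangle inequality. The helper name `is_metric` and the variable names are assumptions.

```python
from itertools import product

# Values copied from the slide: base pairs (brackets) and unpaired nucleotides (dots).
PAIR_LABELS = ["AU", "CG", "GC", "GU", "UA", "UG"]
PAIR_DIST = [
    [0, 1, 1, 0.5, 0.5, 0.5],
    [1, 0, 0.5, 1, 1, 1],
    [1, 0.5, 0, 1, 1, 1],
    [0.5, 1, 1, 0, 0.5, 0.5],
    [0.5, 1, 1, 0.5, 0, 0.5],
    [0.5, 1, 1, 0.5, 0.5, 0],
]
BASE_LABELS = ["A", "C", "G", "U"]
BASE_DIST = [[0 if i == j else 2 for j in range(4)] for i in range(4)]
GAP_COST = 3


def is_metric(m):
    """Check the four requirements listed on the slide for a square matrix."""
    idx = range(len(m))
    zero_self = all(m[i][i] == 0 for i in idx)
    non_negative = all(m[i][j] >= 0 for i, j in product(idx, idx))
    symmetric = all(m[i][j] == m[j][i] for i, j in product(idx, idx))
    triangle = all(m[i][k] <= m[i][j] + m[j][k]
                   for i, j, k in product(idx, idx, idx))
    return zero_self and non_negative and symmetric and triangle


largest_mismatch = max(max(row) for row in PAIR_DIST + BASE_DIST)
print(is_metric(PAIR_DIST), is_metric(BASE_DIST), largest_mismatch < 2 * GAP_COST)
```

The last condition reflects the slide's gap rule: replacing a mismatch by two gaps (one in each molecule) should never be cheaper than the mismatch itself.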

DBs of Reliable Secondary Structures

DBs constructed with manual intervention.

• RNaseP DB: http://www.mbio.ncsu.edu/RNaseP/

Sequence length: ~300-400 nucleotides

RNaseP function: cleaves off an extra, or precursor, sequence of RNA on tRNA molecules.

• 16S rRNA: Comparative RNA Web Site: http://www.rna.icmb.utexas.edu/

Sequence length: ~1,500 nucleotides

16S function: in charge of tRNA binding and formation of peptide bonds during translation.

Our results …ahhm… trees

RNaseP Tree, 51 Species: Secondary structure based tree

• Good partition to 3 kingdoms.

• Bacteria (characterized by Bxy) also look good.

RNaseP, 51 Species: Sequence based tree

[Figure: sequence based tree; clades labeled Eukarya, Bacteria, Archaea]

Eukaryotes are not monophyletic (yeast external).

16S rRNA, 20 Species: Secondary structure based tree

[Figure: secondary structure based tree; clades labeled Fungi, Bacillariophyta, Viridiplantae, Mammalia, Amphibia]

16S rRNA, 91 Species: Secondary structure based tree

[Figure: secondary structure based tree; clades labeled Eukarya, Bacteria, Archaea]

After completing this project, we discovered a related, earlier work from David Penny’s group (Collins et al., 2000). When determining evolutionary relationships between some catalytic RNA molecules, they constructed a 16S rRNA tree based on a similar “distance approach”.

We compared our results to the trees published in their article (which used a different distance algorithm, RNAdistance, by Shapiro & Zhang).

[Figure: Collins’ 16S rRNA sequence based tree and Collins’ 16S rRNA secondary structure based tree (16 species); clades labeled Bacteria and Archaea]

Our Tree, 13 Out of 16 Collins’ Species

[Figure: our secondary structure based tree; clades labeled Archaea and Bacteria]

A Close Look at the Trees

[Figure: three panels: Collins’ 16S rRNA sequence based tree; our 16S secondary structure tree; Collins’ 16S secondary structure based tree; outgroups marked]

A Close Look at Secondary Structures Supports a “Thermoplasma Outgroup” Theory

[Figure: panels labeled Methanobacterium, Methanococcus, Thermoplasma]

Conclusions

1. Encouraging results

2. Accuracy of structure based trees is comparable to sequence based trees.

3. Warning: Reliable secondary structures are crucial for accurate tree reconstruction.
