multiple sequence alignments algorithms. mlagan: progressive alignment of dna given n sequences,...
![Page 1: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/1.jpg)
Multiple Sequence Alignments
Algorithms
![Page 2: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/2.jpg)
MLAGAN: progressive alignment of DNA
Given N sequences, phylogenetic tree
Align pairwise, in order of the tree (LAGAN)
Human
Baboon
Mouse
Rat
![Page 3: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/3.jpg)
MLAGAN: main steps
Given a collection of sequences, and a phylogenetic tree
1. Find local alignments for every pair of sequences x, y
2. Find anchors between every pair of sequences, similar to LAGAN anchoring
3. Progressive alignment
• Multi-anchoring based on reconciling the pairwise anchors
• LAGAN-style limited-area DP
4. Optional refinement steps
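The progressive step visits the phylogenetic tree bottom-up, so each internal node merges the alignments of its two children. The sketch below only derives that merge order. The tree topology `((Human, Baboon), (Mouse, Rat))` and the function name `merge_order` are assumptions for illustration; the slides list the four species but the topology itself is only in the figure.

```python
def merge_order(tree):
    """Return the pairwise merges of a binary guide tree in post-order.

    A leaf is a string; an internal node is a (left, right) tuple.
    """
    steps = []

    def visit(node):
        if isinstance(node, str):
            return node
        left, right = node
        a = visit(left)    # finish the left subtree first
        b = visit(right)   # then the right subtree
        steps.append((a, b))
        return f"({a},{b})"

    visit(tree)
    return steps


# Hypothetical topology for the four species named on the slide.
tree = (("Human", "Baboon"), ("Mouse", "Rat"))
for a, b in merge_order(tree):
    print("align", a, "with", b)
```

Each printed step would be one pairwise (or alignment-to-alignment) call in a progressive aligner.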
![Page 4: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/4.jpg)
MLAGAN: multi-anchoring
To anchor the (X/Y) and (Z) alignments:
[Figure: pairwise anchor sets XZ and YZ, and the X/Y alignment anchored against Z]
![Page 5: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/5.jpg)
Whole-genome alignment Human/Mouse/Rat
![Page 6: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/6.jpg)
Insertion/Deletion Rate Analysis
![Page 7: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/7.jpg)
Heuristics to improve multiple alignments
• Iterative refinement schemes
• A*-based search
• Consistency
• Simulated Annealing
• …
![Page 8: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/8.jpg)
Iterative Refinement
One problem of progressive alignment:
• Initial alignments are “frozen” even when new evidence comes
Example:
x: GAAGTT
y: GAC-TT
Frozen!
z: GAACTG
w: GTACTG
Now it is clear that the correct alignment is y = GA-CTT
![Page 9: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/9.jpg)
Iterative Refinement
Algorithm (Barton-Sternberg):
1. Align the most similar pair xi, xj
2. Align the xk most similar to (xi xj)
3. Repeat 2 until (x1…xN) are aligned
4. For j = 1 to N, remove xj and realign it to x1…xj-1 xj+1…xN
5. Repeat 4 until convergence
Note: Guaranteed to converge
![Page 10: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/10.jpg)
Iterative Refinement
For each sequence y:
1. Remove y
2. Realign y (while the rest stay fixed)
[Figure: DP over x, y, z; the x,z projection is fixed and y is allowed to vary]
![Page 11: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/11.jpg)
Iterative Refinement
Example: align (x,y), (z,w), (xy, zw):
x: GAAGTTA
y: GAC-TTA
z: GAACTGA
w: GTACTGA
After realigning y:
x: GAAGTTA
y: G-ACTTA (+ 3 matches)
z: GAACTGA
w: GTACTGA
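The realignment step can be sketched as a small dynamic program that keeps the other rows fixed and re-places the residues of y into their columns. This is a simplified illustration, not the Barton-Sternberg implementation: the scoring (count of identical residues per column, gaps scoring 0) and the restriction that y may not open new columns are assumptions, and the names `realign` and `match_score` are made up here.

```python
def match_score(fixed_rows, row):
    """Number of identities between `row` and the fixed rows, column by column."""
    cols = list(zip(*fixed_rows))
    return sum(cols[i].count(c) for i, c in enumerate(row) if c != "-")


def realign(fixed_rows, y):
    """Place ungapped y into the columns of fixed_rows to maximise identities.

    Assumption: y is no longer than the fixed alignment and never opens a
    new column, so the only moves are "place a residue" or "gap in y".
    """
    cols = list(zip(*fixed_rows))
    n, m = len(cols), len(y)
    if m > n:
        raise ValueError("this sketch does not open new columns")
    neg = float("-inf")
    # best[i][j]: best score using the first i columns and first j residues
    best = [[neg] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0
    for i in range(1, n + 1):
        for j in range(0, min(i, m) + 1):
            skip = best[i - 1][j]  # gap in y at column i
            place = neg
            if j > 0:
                place = best[i - 1][j - 1] + cols[i - 1].count(y[j - 1])
            best[i][j] = max(skip, place)
    # Trace back from the last cell to recover the gapped row for y.
    out, i, j = [], n, m
    while i > 0:
        if j > 0 and best[i][j] == best[i - 1][j - 1] + cols[i - 1].count(y[j - 1]):
            out.append(y[j - 1])
            j -= 1
        else:
            out.append("-")
        i -= 1
    return "".join(reversed(out)), best[n][m]


fixed = ["GAAGTTA", "GAACTGA", "GTACTGA"]   # x, z, w from the slide
print(match_score(fixed, "GAC-TTA"))        # score of the frozen row
print(realign(fixed, "GACTTA"))             # realigned row and its score
```

Under this scoring the frozen row has 12 identities and the realigned row has 15, which agrees with the "+ 3 matches" annotation on the slide.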
![Page 12: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/12.jpg)
Iterative Refinement
Example not handled well:
x: GAAGTTA
y1: GAC-TTA
y2: GAC-TTA
y3: GAC-TTA
z: GAACTGA
w: GTACTGA
Realigning any single yi changes nothing
![Page 13: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/13.jpg)
Restricted MDP
Run MDP, restricted to radius R from m
[Figure: DP volume with axes x, y, z]
Running Time: $O(2^N R^{N-1} L)$
![Page 14: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/14.jpg)
Tree Refinement
Run 3D-DP, restricted to radius R from m, for each tree node
[Figure: DP volume with axes x, y, z]
Running Time: $\sim 7 R^2 L N$
R: Radius
L: Alignment Length
N: Number of Sequences
![Page 15: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/15.jpg)
A* for Multiple Alignments
Review of the A* algorithm
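[Figure: a path from START through v to GOAL; g(v) covers START to v, h(v) covers v to GOAL]
• g(v) is the cost so far
• h(v) is an estimate of the minimum cost from v to GOAL
• f(v) = g(v) + h(v) estimates the minimum cost of a path passing through v; when h(v) never overestimates, the true cost is ≥ f(v)
1. Expand the v with the smallest f(v)
2. Never expand v if f(v) ≥ the shortest path to the goal found so far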
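A minimal A* sketch on a toy graph may help fix the roles of g, h and f. The graph, the heuristic values and the function name `a_star` are invented for illustration; the heuristic is chosen so that it never overestimates the remaining cost.

```python
import heapq


def a_star(start, goal, neighbors, h):
    """Return (cost, path) of a cheapest path, or None if goal is unreachable.

    neighbors(v) yields (w, edge_cost) pairs; h(v) must not overestimate
    the true remaining cost from v to goal.
    """
    heap = [(h(start), 0, start, [start])]   # entries are (f, g, node, path)
    best_g = {start: 0}
    while heap:
        f, g, v, path = heapq.heappop(heap)  # expand the smallest f(v)
        if v == goal:
            return g, path
        if g > best_g.get(v, float("inf")):
            continue                          # stale entry, a cheaper g is known
        for w, cost in neighbors(v):
            g2 = g + cost
            if g2 < best_g.get(w, float("inf")):
                best_g[w] = g2
                heapq.heappush(heap, (g2 + h(w), g2, w, path + [w]))
    return None


# Hypothetical toy graph and heuristic.
EDGES = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 5)], "B": [("G", 1)], "G": []}
H = {"S": 3, "A": 2, "B": 1, "G": 0}

print(a_star("S", "G", lambda v: EDGES[v], lambda v: H[v]))
```

For multiple alignment the nodes would be cells of the DP matrix and the edge costs would be column costs, as the next slide describes.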
![Page 16: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/16.jpg)
A* for Multiple Alignments
• Nodes: cells in the DP matrix
• g(v): alignment cost so far
• h(v): sum-of-pairs of individual pairwise alignments
• Initial minimum alignment cost estimate: sum-of-pairs of global pairwise alignments
[Figure: a path from START through v to GOAL, labelled g(v) and h(v)]
![Page 17: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/17.jpg)
Consistency – T-Coffee
[Figure: sequences x, y, z with residues xi, yj, yj’ and zk]
![Page 18: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/18.jpg)
![Page 19: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/19.jpg)
T-Coffee Layout
LALIGN CLUSTALW
![Page 20: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/20.jpg)
Generating Primary Library
[Figure: pairwise alignments among sequences A, B, C feeding two libraries]
ClustalW Primary Library (global pairwise alignment)
Lalign Primary Library (10 top-scoring non-intersecting local pairwise alignments)
The library has information for each of the N(N-1)/2 sequence pairs.
![Page 21: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/21.jpg)
Primary Library
Seq A GARFIELD THE LAST FAT CAT
Seq B GARFIELD THE FAST CAT ---
Prim. Weight = 88

Seq A GARFIELD THE LAST FA-T CAT
Seq C GARFIELD THE VERY FAST ---
Prim. Weight = 77

Seq A GARFIELD THE LAST FAT CAT
Seq D -------- THE ---- FAT CAT
Prim. Weight = 100

Seq B GARFIELD THE ---- FAST CAT
Seq C GARFIELD THE VERY FAST CAT
Prim. Weight = 100

Seq B GARFIELD THE FAST CAT
Seq D -------- THE FA-T CAT
Prim. Weight = 100

Seq C GARFIELD THE VERY FAST CAT
Seq D -------- THE ---- FA-T CAT
Prim. Weight = 100
![Page 22: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/22.jpg)
Combining the libraries
Seq A GARFIELD THE LAST FAT CAT
Seq B GARFIELD THE FAST CAT - - -
Primary weight(ClustalW)=88
Primary Weight(Lalign)=88
W(A(G),B(G)) = 88 + 88 = 176
If a residue pair is duplicated across the two libraries, it is merged into a single entry with weight = sum of the two weights.
Pairs of residues that did not occur are not present (weight 0).
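The merge rule is easy to express with dictionaries. In this sketch a residue is a hypothetical `(sequence_name, position)` tuple and a library maps a pair of residues to its weight; neither the representation nor the name `combine_libraries` comes from T-Coffee itself.

```python
from collections import defaultdict


def combine_libraries(*libraries):
    """Merge primary libraries; weights of duplicated pairs are summed.

    Pairs absent from every library are simply not stored (weight 0).
    """
    merged = defaultdict(int)
    for library in libraries:
        for pair, weight in library.items():
            merged[pair] += weight
    return dict(merged)


# A(G) aligned with B(G): first residue of Seq A and of Seq B.
pair = (("A", 0), ("B", 0))
clustalw = {pair: 88}
lalign = {pair: 88, (("A", 1), ("B", 1)): 88}   # second entry only in Lalign

combined = combine_libraries(clustalw, lalign)
print(combined[pair])   # weight of A(G), B(G) after merging
```

The duplicated pair ends up with 88 + 88 = 176, as on the slide, while the pair seen in only one library keeps its single weight.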
![Page 23: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/23.jpg)
Library Extension
• Complete extension requires examination of all triplets.
• Not all bring information (e.g. A and B through D).
• Weight of a pair = weights gathered through examination of all triplets involving that pair.
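One way to read "weights gathered through all triplets" is the rule described in the T-Coffee paper: a pair (a, b) gains, for every residue c of a third sequence aligned to both, the smaller of the two weights W(a, c) and W(c, b). The slides do not state this rule, so treat the `min` below as an assumption; the data layout and the name `extend_library` are also illustrative.

```python
from collections import defaultdict
from itertools import combinations


def extend_library(primary):
    """Return extended weights: primary weight plus min-weights via third residues.

    `primary` maps (residue_a, residue_b) to a weight, where a residue is a
    (sequence_name, position) tuple and the two residues come from
    different sequences.
    """
    adjacent = defaultdict(dict)
    extended = defaultdict(int)
    for (a, b), weight in primary.items():
        adjacent[a][b] = weight
        adjacent[b][a] = weight
        extended[tuple(sorted((a, b)))] += weight
    # Every residue c links each pair of its neighbours from different sequences.
    for c, nbrs in adjacent.items():
        for a, b in combinations(sorted(nbrs), 2):
            if a[0] != b[0]:
                extended[(a, b)] += min(nbrs[a], nbrs[b])
    return dict(extended)


# First residues of Seq A, B, C with the primary weights from the Garfield slide.
A0, B0, C0 = ("A", 0), ("B", 0), ("C", 0)
primary = {(A0, B0): 88, (A0, C0): 77, (B0, C0): 100}
print(extend_library(primary)[(A0, B0)])
```

Here A(G) and B(G) receive 88 from their own alignment plus min(77, 100) = 77 through Seq C. Seq D contributes nothing because it has a gap opposite GARFIELD, which matches the remark that not every triplet brings information.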
![Page 24: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/24.jpg)
Running Time
• Complexity of entire procedure:
$O(N^2 L^2) + O(N^3 L) + O(N^3) + O(N L^2)$
• $O(N^2 L^2)$: pairwise library computation
• $O(N^3 L)$: library extension
• $O(N^3)$: computation of the NJ tree
• $O(N L^2)$: progressive alignment computation
Where:
L = average sequence length
N = number of sequences
![Page 25: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/25.jpg)
T-Coffee compared with other methods
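| Method | Cat1 (81) | Cat2 (23) | Cat3 (4) | Cat4 (12) | Cat5 (11) | Total (141) |
|---|---|---|---|---|---|---|
| Dialign | 71.0 | 25.2 | 35.1 | 74.7 | 80.4 | 61.5 |
| ClustalW | 78.5 | 32.2 | 42.5 | 65.7 | 74.3 | 66.4 |
| Prrp | 78.6 | 32.5 | 50.2 | 51.1 | 82.7 | 66.4 |
| T-Coffee | 80.7 | 37.3 | 52.9 | 83.2 | 88.7 | 72.1 |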
![Page 26: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/26.jpg)
Gene Recognition
Credits for slides:
Marina Alexandersson
Lior Pachter
Serge Saxonov
![Page 27: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/27.jpg)
Reading
• GENSCAN
• EasyGene
• SLAM
• Twinscan
Optional:
Chris Burge’s Thesis
![Page 28: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/28.jpg)
Gene expression
DNA: CCTGAGCCAACTATTGATGAA
↓ transcription
RNA: CCUGAGCCAACUAUUGAUGAA
↓ translation
Protein: PEPTIDE
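The example on this slide can be reproduced with the standard genetic code. The sketch below is a simplification: it treats the DNA string as the coding strand, so transcription is just T to U, and it reads codons from position 0. The function names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def transcribe(dna):
    """Coding-strand DNA to mRNA (simplifying assumption: just T -> U)."""
    return dna.replace("T", "U")


def translate(rna):
    """Translate an mRNA from its first base until a stop codon or the end."""
    dna = rna.replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


rna = transcribe("CCTGAGCCAACTATTGATGAA")
print(rna)
print(translate(rna))
```

The seven codons CCU GAG CCA ACU AUU GAU GAA spell P, E, P, T, I, D, E.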
![Page 29: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/29.jpg)
Gene structure
[Figure: gene laid out as exon1, intron1, exon2, intron2, exon3; steps shown: transcription, splicing, translation]
exon = coding
intron = non-coding
![Page 30: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/30.jpg)
Finding genes
[Figure: gene from 5’ to 3’: Exon 1, Intron 1, Exon 2, Intron 2, Exon 3]
Start codon: ATG
Stop codon: TAG/TGA/TAA
Splice sites
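The start and stop signals alone already give a naive scanner for open reading frames. The sketch below ignores introns and splice sites entirely and looks at one strand only, so it is closer to a prokaryotic ORF scan than to a real gene finder; `find_orfs` is a made-up name.

```python
STOP_CODONS = {"TAG", "TGA", "TAA"}


def find_orfs(dna):
    """Return (start, end, sequence) for each ATG followed by an in-frame stop.

    `end` is exclusive and includes the stop codon. Introns are not modelled.
    """
    orfs = []
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        # Walk codon by codon from the start until the first stop codon.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                orfs.append((start, pos + 3, dna[start:pos + 3]))
                break
    return orfs


print(find_orfs("CCATGAAATGATT"))
```

In the example the ATG at index 2 reaches the in-frame stop TGA, while the second ATG at index 7 runs off the end of the string and is not reported.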
![Page 31: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/31.jpg)
![Page 32: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/32.jpg)
![Page 33: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/33.jpg)
Approaches to gene finding
• Homology: BLAST, Procrustes.
• Ab initio: Genscan, Genie, GeneID.
• Hybrids: GenomeScan, GenieEST, Twinscan, SGP, ROSETTA, CEM, TBLASTX, SLAM.
![Page 34: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/34.jpg)
HMMs for single species gene finding: Generalized HMMs
![Page 35: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/35.jpg)
HMMs for gene finding
GTCAGAGTAGCAAAGTAGACACTCCAGTAACGC
[Figure: the sequence labelled with states: intergene, exon, intron, exon, intron, exon, intergene]
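Labelling a sequence with hidden states is what the Viterbi algorithm does. The toy model below has only two states, and every probability in it is invented for illustration (exons are pretended to be GC-rich); it is not the state space or parameter set of any real gene finder.

```python
import math


def viterbi(obs, states, start, trans, emit):
    """Return the most probable state path for `obs` (log-space Viterbi)."""
    scores = [{s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}]
    back = []
    for x in obs[1:]:
        row, ptr = {}, {}
        for s in states:
            # Best predecessor of state s at this position.
            prev = max(states, key=lambda r: scores[-1][r] + math.log(trans[r][s]))
            row[s] = scores[-1][prev] + math.log(trans[prev][s]) + math.log(emit[s][x])
            ptr[s] = prev
        scores.append(row)
        back.append(ptr)
    # Follow back-pointers from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return "".join(reversed(path))


# Hypothetical parameters: I = intergene (AT-rich), E = exon (GC-rich).
STATES = "IE"
START = {"I": 0.5, "E": 0.5}
TRANS = {"I": {"I": 0.9, "E": 0.1}, "E": {"I": 0.1, "E": 0.9}}
EMIT = {
    "I": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
    "E": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4},
}

print(viterbi("AATTGCGCGCAATT", STATES, START, TRANS, EMIT))
```

A plain HMM like this implies geometric state durations, which is the limitation the generalized HMM slides address.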
![Page 36: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/36.jpg)
GHMM for gene finding
[Figure: a nucleotide sequence partitioned into Exon1, Exon2 and Exon3, each segment with its own duration]
![Page 37: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/37.jpg)
Observed duration times
![Page 38: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/38.jpg)
Better way to do it: negative binomial
• EasyGene: prokaryotic gene-finder (Larsen TS, Krogh A)
• Negative binomial with n = 3
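A negative binomial length distribution arises when a state is replaced by n copies in series, each with the same self-loop, so the total duration is a sum of n geometric waiting times. The sketch below assumes the "number of trials until the n-th success" parametrization with per-step exit probability p; the slide only fixes n = 3, so p = 0.1 and the name `duration_pmf` are illustrative.

```python
from math import comb


def duration_pmf(length, n, p):
    """P(duration = length) for a sum of n geometric(p) waiting times."""
    if length < n:
        return 0.0
    return comb(length - 1, n - 1) * p ** n * (1 - p) ** (length - n)


p = 0.1  # hypothetical exit probability
for length in (1, 3, 10, 20, 40):
    geometric = duration_pmf(length, 1, p)   # ordinary HMM state
    negbin = duration_pmf(length, 3, p)      # three chained states
    print(length, round(geometric, 4), round(negbin, 4))
```

With n = 1 the distribution is geometric and is largest at length 1; with n = 3 the mass moves away from very short lengths, which is a better fit for observed gene lengths than a single self-looping state.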
![Page 39: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/39.jpg)
Biology of Splicing
(http://genes.mit.edu/chris/)
![Page 40: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/40.jpg)
Consensus splice sites
(http://www-lmmb.ncifcrf.gov/~toms/sequencelogo.html)
Donor: 7.9 bits
Acceptor: 9.4 bits
(Stephens & Schneider, 1996)
![Page 41: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/41.jpg)
Splice site detection
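Donor site (5’ to 3’); values are percentages, each column sums to 100

| Position | -8 | … | -2 | -1 | 0 | 1 | 2 | … | 17 |
|---|---|---|---|---|---|---|---|---|---|
| A | 26 | … | 60 | 9 | 0 | 1 | 54 | … | 21 |
| C | 26 | … | 15 | 5 | 0 | 1 | 2 | … | 27 |
| G | 25 | … | 12 | 78 | 99 | 0 | 41 | … | 27 |
| T | 23 | … | 13 | 8 | 1 | 98 | 3 | … | 25 |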
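The five contiguous columns shown for positions -2 to 2 are enough to sketch a weight matrix and to compute per-position information content. The uniform background of 0.25, the pseudocount of 0.01 and the function names are assumptions; the elided columns are left out rather than guessed.

```python
import math

BASES = "ACGT"
POSITIONS = [-2, -1, 0, 1, 2]
# Donor-site percentages for positions -2..2, copied from the table.
PERCENT = {
    "A": [60, 9, 0, 1, 54],
    "C": [15, 5, 0, 1, 2],
    "G": [12, 78, 99, 0, 41],
    "T": [13, 8, 1, 98, 3],
}


def column(i):
    """Base frequencies (0..1) of the i-th tabulated column."""
    return {b: PERCENT[b][i] / 100 for b in BASES}


def information_bits(i):
    """Information content 2 - H of column i, in bits."""
    entropy = -sum(p * math.log2(p) for p in column(i).values() if p > 0)
    return 2.0 - entropy


def wmm_score(site, pseudo=0.01, background=0.25):
    """Log-odds score (bits) of a 5-mer under the weight matrix model."""
    total = 0.0
    for i, base in enumerate(site):
        p = (column(i)[base] + pseudo) / (1 + 4 * pseudo)  # avoid log(0)
        total += math.log2(p / background)
    return total


consensus = "".join(max(BASES, key=lambda b: PERCENT[b][i]) for i in range(5))
print(consensus)
for pos, i in zip(POSITIONS, range(5)):
    print(pos, round(information_bits(i), 2))
print(round(wmm_score(consensus), 2), round(wmm_score("CCCCC"), 2))
```

Positions 0 and 1, the nearly invariant GT, carry almost the full 2 bits each, while the flanking positions carry much less.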
![Page 42: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/42.jpg)
Splice Site Models
• WMM: weight matrix model = PSSM (Staden 1984)
• WAM: weight array model = 1st order Markov (Zhang & Marr 1993)
• MDD: maximal dependence decomposition (Burge & Karlin 1997), a decision-tree-like algorithm that takes significant pairwise dependencies into account
![Page 43: Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d815503460f94a66d91/html5/thumbnails/43.jpg)
[Figure: gene model annotated with the signals atg, tga, ggtgag (×3), caggtg, cagatg, cagttg and caggccggtgag]