Coffee Shop
F91921025 黃仁暐, F92921029 戴志華, F92921041 施逸優, R93921142 吳於芳, R94921035 林與絜
2005/12/14 2
Menu
Coffee Shop Opening
Why coffee shop?
Three Flavors
COFFEE
T-Coffee
3DCoffee
Remarks
Recipes
Multiple Sequence Alignment
Multiple sequence alignment is one of the most important tools for analyzing biological sequences. Applications include:
structure prediction
phylogenetic analysis
function prediction
polymerase chain reaction (PCR) primer design.
Multiple Sequence Alignment
However, the accuracy is not good enough:
It is difficult to evaluate the quality of a multiple alignment
It is algorithmically very hard to produce the optimal alignment
To increase the accuracy of multiple sequence alignment, we opened a coffee shop to share three kinds of coffee.
Before (drinking) COFFEE
Why comparative genomics?
Understanding the process of evolution at the gross level and the local level
Translating DNA sequence data into proteins of known function
Meaning of conserved regions
E. coli, C. elegans, Drosophila, Human…What’s their relationship?
Arabidopsis
E. coli
Yeast
Synechocystis (blue-green algae)
Nematode, fruit fly
Human
Classification of genes by function
Adapted from “Principles of genome analysis and genomics” Fig. 7.5 (p.129), by S. B. Primrose and R. M. Twyman, 3rd edition
Comparative genomics vs. multiple sequence alignment
Alignment → conserved region
Conserved region → gene location
Evidence of evolution
http://www.public.iastate.edu/~semrich/compgen/
http://gchelpdesk.ualberta.ca/news/02jun05/cbhd_news_02jun05.php
A: human chromosome I
B: human chromosome II
C: human chromosome III
Chromosome III region 125-128 Mb was magnified 120X
The alignment between the chromosomes
Our Flavors
COFFEE: A New Objective Function For Multiple Sequence Alignment.
C. Notredame, L. Holm and D.G. Higgins, Bioinformatics, Vol 14 (5), 407-422, 1998
T-Coffee: A novel method for multiple sequence alignments.
C. Notredame, D. Higgins, J. Heringa, Journal of Molecular Biology, Vol 302, pp 205-217, 2000
3DCoffee: Combining Protein Sequences and Structures within Multiple Sequence Alignments.
O. O'Sullivan, K. Suhre, C. Abergel, D.G. Higgins, C. Notredame, Journal of Molecular Biology, Vol 340, pp 385-395, 2004
COFFEE
COFFEE
An objective function for multiple sequence alignments
Cédric Notredame, Liisa Holm and Desmond G. Higgins
SAGA with COFFEE score
Introduction
COFFEE - Consistency based Objective Function For alignmEnt Evaluation
An objective function, the COFFEE score, is proposed to measure the quality of multiple sequence alignments
Optimize the COFFEE score of a multiple sequence alignment with the genetic algorithm package SAGA (Sequence Alignment Genetic Algorithm)
Overview of their method
Given a set of sequences to be aligned
and a library containing all pairwise alignments between them,
the COFFEE score reflects the level of consistency between a multiple sequence alignment and the library.
COFFEE score
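\[
\text{COFFEE score} = \frac{\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} W_{i,j} \times \mathrm{SCORE}(A_{i,j})}{\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} W_{i,j} \times \mathrm{LEN}(A_{i,j})}
\]
with: SCORE(A_{i,j}) = number of aligned pairs of residues that are shared between A_{i,j} and the library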
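A minimal Python sketch of how the score above could be evaluated. It assumes the multiple alignment is a list of equal-length gapped strings, that the library maps each sequence pair (i, j) to a set of aligned residue-index pairs, and that LEN is the number of columns left after dropping columns where both rows are gaps. The names `aligned_pairs`, `coffee_score`, `library` and `weights` are illustrative and do not come from the slides.

```python
from itertools import combinations


def aligned_pairs(row_i, row_j):
    """Project two rows of a multiple alignment onto a pairwise alignment.

    Returns the set of (residue index in i, residue index in j) pairs that
    share a column, and the pairwise length (columns not gapped in both rows).
    """
    pairs, length = set(), 0
    pos_i = pos_j = 0
    for a, b in zip(row_i, row_j):
        if a == "-" and b == "-":
            continue  # column is empty for this pair of sequences
        length += 1
        if a != "-" and b != "-":
            pairs.add((pos_i, pos_j))
        if a != "-":
            pos_i += 1
        if b != "-":
            pos_j += 1
    return pairs, length


def coffee_score(msa, library, weights):
    """Weighted fraction of the alignment that agrees with the library."""
    num = den = 0.0
    for i, j in combinations(range(len(msa)), 2):
        pairs, length = aligned_pairs(msa[i], msa[j])
        shared = len(pairs & library[(i, j)])  # SCORE(A_ij)
        num += weights[(i, j)] * shared
        den += weights[(i, j)] * length        # LEN(A_ij), as assumed above
    return num / den if den else 0.0
```

With all weights equal, the result is simply the share of pairwise columns that the library supports, so a score of 1.0 would require every column of every projected pair to be a residue pair found in the library.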
COFFEE score
Using COFFEE in SAGA
Iteratively, SAGA generates a multiple sequence alignment with a higher COFFEE score until the COFFEE score cannot be improved
SAGA follows the general principle of genetic algorithms:
The notion of survival of the fittest
SAGA iteratively does the following:
Evaluate the score of the alignments
The fitter an alignment, the more likely it is to survive and produce offspring
Surviving alignments may be kept unchanged, randomly modified (mutation), or combined with another alignment (cross-over)
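The loop described above can be sketched generically in Python. This is not SAGA itself: the function `evolve`, the `patience` stopping rule, the truncation-style selection (the fitter half survives) and the toy bit-string operators are all assumptions made for illustration. In SAGA the individuals would be alignments and the fitness would be the COFFEE score.

```python
import random


def evolve(population, fitness, mutate, crossover, patience=20, rng=None):
    """Generic genetic-algorithm loop: stop once the best score stalls."""
    rng = rng or random.Random(0)
    best = max(population, key=fitness)
    stale = 0
    while stale < patience:
        # Evaluate and rank; here the fitter half survives (a simplification).
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[: max(2, len(ranked) // 2)]
        children = [survivors[0]]  # the current best is always kept unchanged
        while len(children) < len(population):
            parent = rng.choice(survivors)
            operation = rng.choice(("keep", "mutate", "crossover"))
            if operation == "keep":
                children.append(parent)
            elif operation == "mutate":
                children.append(mutate(parent, rng))
            else:
                children.append(crossover(parent, rng.choice(survivors), rng))
        population = children
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best, stale = candidate, 0
        else:
            stale += 1
    return best


# Toy operators on tuples of bits, standing in for alignment operators.
def flip_one(bits, rng):
    k = rng.randrange(len(bits))
    return bits[:k] + (1 - bits[k],) + bits[k + 1:]


def one_point(a, b, rng):
    k = rng.randrange(1, len(a))
    return a[:k] + b[k:]
```

For example, `evolve([(0,) * 8 for _ in range(6)], sum, flip_one, one_point)` searches for a bit string with many ones. Because the best individual is always carried over, the returned score can never be lower than the best score in the starting population.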
Results
COFFEE function
SAGA
Optimization of COFFEE function
Effect of optimization
Comparison: COFFEE and others
Others: PRRP, Clustal W, PILEUP, SAGA MSA, SAM
COFFEE score & alignment accuracy
You are about to see a lot of tables, which is rather dry, so please bear with us…
Optimization
The COFFEE function was optimized by SAGA
Using ClustalW alignments
Using SAGA alignments
Comparison
Multiple alignments of SAGA COFFEE and 5 other methods
PRRP, ClustalW, PILEUP, SAGA MSA, SAM
Performance of SAGA and ClustalW
Comparison with the other 5 methods
Even when SAGA-COFFEE does not give the best result, it is not far from the best
Identity level lower → better SAGA-COFFEE results
Ratio of (E+H) residues correctly aligned
Better or worse alignment? SAGA-COFFEE vs. others
There is NO such thing as an ideal method
Correctly aligned ratio: better than PRRP / worse than PRRP
COFFEE score and alignment accuracy
Plots: E+H accuracy (%) vs. COFFEE sequence score, and E+H accuracy (%) vs. average identity (%)
r=0.65
The COFFEE score can be used to predict alignment accuracy
Average identity cannot predict alignment accuracy
For >85% of the sequences, the accuracy can be predicted (error ~ ±10%)
Correlation between score and accuracy
Higher score → higher accuracy
SAGA produces more high-scoring sequences than ClustalW
Coffee Break ?
T-Coffee
T-Coffee
A novel method for multiple sequence alignments
C.Notredame, D. Higgins, J. Heringa
ClustalW with extended library
ClustalW
ClustalW is the core alignment strategy of T-Coffee. It follows the procedure below:
Pairwise Alignment: calculate distance matrix
Guide Tree
Unrooted Neighbor-Joining Tree
Rooted Neighbor-Joining Tree: guide tree with sequence weights
Progressive Alignment: align following the guide tree
Calculate distance matrix
Guide tree
Use Neighbor-Joining Method to build guide tree from distance matrix.
First construct an unrooted Neighbor-Joining tree, then convert it to a rooted Neighbor-Joining tree, the guide tree.
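A short Python sketch of the first joining step of the standard Neighbor-Joining method, applied to a distance matrix. The function name `nj_first_join` and the list-of-lists matrix layout are illustrative assumptions. A full implementation would replace the joined pair with a new node and repeat until the tree is complete, and ClustalW's own code may differ in detail.

```python
def nj_first_join(dist):
    """Pick the first pair of sequences to join and their branch lengths.

    `dist` is a symmetric matrix (list of lists) with more than two rows.
    """
    n = len(dist)
    totals = [sum(row) for row in dist]
    best_pair, best_q = None, None
    for i in range(n):
        for j in range(i + 1, n):
            # Q criterion: small distance to each other, large to the rest.
            q = (n - 2) * dist[i][j] - totals[i] - totals[j]
            if best_q is None or q < best_q:
                best_pair, best_q = (i, j), q
    i, j = best_pair
    # Branch lengths from the new internal node to the two joined leaves.
    branch_i = dist[i][j] / 2 + (totals[i] - totals[j]) / (2 * (n - 2))
    branch_j = dist[i][j] - branch_i
    return best_pair, branch_i, branch_j
```

The result of repeating this step is an unrooted tree, which is why a separate rooting step is needed before the tree can guide the progressive alignment.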
Unrooted Neighbor-Joining Tree
Rooted Neighbor-Joining Tree
Progressive Alignment: align following the guide tree
Seq1 Seq2 Seq3 Seq4 Seq5
Alignment 1 Alignment 2
Alignment 3 Final alignment
Progressive-alignment strategy
Pros: faster and saves space (compared with computing all possible multiple alignments)
Cons: may not find the optimal solution
Errors made in the first alignments cannot be rectified later as the rest of the sequences are added in: “Once a gap, always a gap!”
T-Coffee is an attempt to minimize that effect!
T-Coffee Algorithm
Generating a primary library of alignments
Derivation of the primary library weights
Combination of the libraries
Extending the library
Progressive alignment strategy
ClustalW Primary Library (Global)
Lalign Primary Library (Local)
Weighting
Primary Library
Primary Library
ClustalW Primary Library (Global)
Lalign Primary Library (Local)
Weighting
Primary Library
Extension
Extended Library
Extended Library
Weight(A-C-B) = min( Weight(A-C), Weight(B-C) ) = min( 77, 100 ) = 77
Weight(A-D-B) = min( Weight(A-D), Weight(B-D) ) = min( 100, 100 ) = 100
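The two calculations above can be reproduced with a small Python sketch. It simplifies T-Coffee by keeping one weight per sequence pair, as in the slide example, whereas the real library stores weights per residue pair. The names `triplet_weight` and `extended_weight` are illustrative. Summing the direct weight with the contributions of all intermediate sequences follows the published description of T-Coffee and is not stated on the slide, so treat it as an assumption.

```python
def triplet_weight(primary, a, b, via):
    """Weight of aligning a with b through the intermediate sequence `via`."""
    return min(primary[frozenset((a, via))], primary[frozenset((b, via))])


def extended_weight(primary, a, b, intermediates):
    """Direct weight (0 if absent) plus every triplet contribution."""
    direct = primary.get(frozenset((a, b)), 0)
    return direct + sum(triplet_weight(primary, a, b, c) for c in intermediates)


# Primary weights taken from the slide example.
primary = {
    frozenset(("A", "C")): 77,
    frozenset(("B", "C")): 100,
    frozenset(("A", "D")): 100,
    frozenset(("B", "D")): 100,
}
```

Taking the minimum means a path through a third sequence is only as trustworthy as its weaker link, so one poor pairwise alignment cannot inflate the support for a residue pair.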
Extended Library
SeqA: GARFIELD THE LAST FAT CAT
SeqB: GARFIELD THE FAST CAT
SeqA: GARFIELD THE LAST FAT CAT
SeqB: GARFIELD THE FAST CAT
Extended Library
SeqA: GARFIELD THE LAST FAT CAT
SeqB: GARFIELD THE FAST CAT
SeqA: GARFIELD THE LAST FAT CAT
SeqB: GARFIELD THE FAST CAT
Progressive Alignment
ClustalW Primary Library (Global)
Lalign Primary Library (Local)
Weighting
Primary Library
Extension
Extended Library
Multiple Alignment Information
Progressive Alignment
Complexity Analysis
Complexity of the whole procedure: O(N^2 L^2) + O(N^3 L) + O(N^3) + O(N L^2)
O(N^2 L^2): computation of the pair-wise library
O(N^3 L): computation of the extended pair-wise library
O(N^3): computation of the NJ tree
O(N L^2): computation of the progressive alignment
where N is the number of sequences that can be aligned in a multiple alignment of length L
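To see which term dominates for a given input size, the four terms can be evaluated directly. This Python sketch treats each term as a raw operation count with constant factors ignored, so only the relative sizes are meaningful. The function name `tcoffee_cost_terms` is illustrative.

```python
def tcoffee_cost_terms(n, l):
    """Evaluate each complexity term for n sequences and alignment length l."""
    return {
        "pairwise library": n**2 * l**2,
        "library extension": n**3 * l,
        "NJ tree": n**3,
        "progressive alignment": n * l**2,
    }


def dominant_term(n, l):
    """Name of the largest term for the given input size."""
    terms = tcoffee_cost_terms(n, l)
    return max(terms, key=terms.get)
```

Under these assumptions the pair-wise library dominates when there are few sequences (for example N = 10, L = 100), while the library extension takes over once N exceeds L (for example N = 1000, L = 100).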
Experiment
Implementation environment
Result 1: Effect of combining local and global alignments without extension; effect of the library extension
Result 2: compared with other multiple sequence alignment methods
Implementation environment
Programming language: ANSI C
Hardware: LINUX platform with Pentium II processors (330 MHz).
Test case: BAliBASE database of multiple sequence alignments
Result 1
Table 1: The effect of combining local and global alignments
Name  global/local/extend        Cat1(81)  Cat2(23)  Cat3(4)  Cat4(12)  Cat5(11)  Total(141)  Significance
C     ClustalW pw/.../...        70.6      26.7      43.0     56.0      60.0      58.9        7.8
CE    ClustalW pw/.../ex         77.1      33.6      47.6     64.8      75.9      66.3        17.7
L     .../Lalign pw/...          65.4      12.1      22.8     53.9      66.0      52.0        7.8
LE    .../Lalign pw/ex           72.6      25.6      47.2     77.5      85.5      64.2        16.3
CL    ClustalW pw/Lalign pw/...  76.2      32.0      48.3     76.2      74.6      66.5        12.1
CLE   ClustalW pw/Lalign pw/ex   80.6      37.1      52.9     83.2      88.6      72.0
Result 2
Table 2: T-Coffee compared with other multiple sequence alignment methods
Method    Cat1(81)  Cat2(23)  Cat3(4)  Cat4(12)  Cat5(11)  Total1(141)  Total2(141)  Significance
Dialign   71.0      25.2      35.1     74.7      80.4      61.5         57.3         11.3
ClustalW  78.5      32.2      42.5     65.7      74.3      66.4         58.6         26.2
Prrp      78.6      32.5      50.2     51.1      82.7      66.4         59.0         36.9
T-Coffee  80.6      37.1      52.9     83.2      88.6      72.0         68.6
3DCoffee
3DCoffee
Combining protein sequences and structures within multiple sequence alignments
O. O'Sullivan, K. Suhre, C. Abergel, D.G. Higgins, C. Notredame
T-Coffee with structure information
3DCoffee
Structural information can help to improve the quality of multiple sequence alignments
3DCoffee:
Combines protein sequences and structures
Is based on T-Coffee version 2.00
Uses a mixture of pairwise sequence alignments and pairwise structure comparison methods
3DCoffee
Use T-Coffee to compile:
A primary library: a list of weighted pairs of residues
An extended library: uses the column consistency relationship between all sequences
According to the structure information: Fugue, SAP, LSQman
3DCoffee
Fugue – a threading method that aligns a protein sequence with a 3D-structure
SAP – uses DP to compute a pairwise alignment based on a non-rigid structure superposition
LSQman – a rigid body structure superposition package
3DCoffee
Set the weight of each new alignment to 100, which is the highest score in the primary library
Add the weighted alignments to the library
Carry out the progressive alignment in the same way as T-Coffee
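A hedged Python sketch of the first two steps above, assuming the library is a dictionary keyed by residue pairs of the form (sequence, residue index, sequence, residue index). The name `add_structure_pairs` and the key layout are illustrative. Adding the new weight to an existing entry follows the way T-Coffee is described as combining libraries, which is an assumption here because the slides do not say what happens to duplicated pairs.

```python
STRUCTURE_WEIGHT = 100  # highest score in the primary library, per the slide


def add_structure_pairs(library, structure_pairs, weight=STRUCTURE_WEIGHT):
    """Return a new library with structure-derived residue pairs added.

    Pairs already present have the new weight added to their existing
    weight (assumed merge rule); unseen pairs start at `weight`.
    """
    merged = dict(library)  # leave the caller's library untouched
    for pair in structure_pairs:
        merged[pair] = merged.get(pair, 0) + weight
    return merged
```

The merged library can then go through the same extension and progressive alignment steps as a purely sequence-based library, which matches the slide's statement that the final stage is unchanged from T-Coffee.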
Remarks
COFFEE: An objective function for multiple sequence alignments
SAGA with COFFEE score
T-Coffee: A novel method for multiple sequence alignments
ClustalW with extended library
3DCoffee: Combining protein sequences and structures within multiple sequence alignments
T-Coffee with structure information
Recipes
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.
Julie D. Thompson, Desmond G. Higgins and Toby J. Gibson, 1994
COFFEE: A New Objective Function For Multiple Sequence Alignment.
C. Notredame, L. Holm and D.G. Higgins, Bioinformatics, Vol 14 (5), 407-422, 1998
T-Coffee: A novel method for multiple sequence alignments.
C. Notredame, D. Higgins, J. Heringa, Journal of Molecular Biology, Vol 302, pp 205-217, 2000
3DCoffee: Combining Protein Sequences and Structures within Multiple Sequence Alignments.
O. O'Sullivan, K. Suhre, C. Abergel, D.G. Higgins, C. Notredame, Journal of Molecular Biology, Vol 340, pp 385-395, 2004
Q & A
Thank You
Residue score
Sequence score measurement: a global measurement
A residue is scored 9 if >90% of the pairs it is involved in are also present in the reference library
Residue scores evaluated → substitution classes defined
Class 5 substitution → residue score ≥ 5
5566677788888888899999877- - - - -66666666788888888887
vsdvprdlevvaatptslliswdap gslevvaatptslliswdap
• Correct substitutions: SAGA > ClustalW
• Lower accuracy: more false positives in the SAGA alignments
High-scoring residues have high accuracy
Higher substitution category → smaller number of predictions