Chapter 5
Multiple Sequence Alignment
•Multiple alignment is an extension of pairwise alignment in which more than two sequences are aligned
•This alignment provides insights not possible in pairwise alignments and supports further analyses, such as
  •Conserved sequence patterns
  •Conserved and functionally critical amino acid residues
  •Phylogenetic analyses, for which it is a prerequisite
  •Prediction of protein secondary and tertiary structures
  •Design of degenerate PCR primers
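As a rough illustration of how conserved residues are read off a multiple alignment, the Python sketch below reports the most common residue in each column and the fraction of sequences that share it. The function name `column_conservation` and the toy alignment are hypothetical, chosen only for this example.

```python
from collections import Counter

def column_conservation(alignment):
    """Return (most common residue, fraction of rows carrying it) per column."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    result = []
    for column in zip(*alignment):
        residue, count = Counter(column).most_common(1)[0]
        result.append((residue, count / len(column)))
    return result

# Hypothetical toy alignment; "-" marks a gap.
toy = ["GKN-A", "TRNCA", "SHECA"]
profile = column_conservation(toy)

# Columns in which every sequence carries the same (non-gap) residue.
conserved = [i for i, (res, frac) in enumerate(profile)
             if frac == 1.0 and res != "-"]
print(conserved)
```

Only the last column of the toy alignment is fully conserved. Real tools usually score conservation with a substitution matrix rather than strict identity.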
Scoring Function
•The purpose of multiple alignment is to line up sequences so that a maximum number of residues from each sequence are matched according to a scoring function
•The scoring function is generally based on the “sum of pairs” (SP)
•The SP score is the sum of the scores of all residue pairs in every column of the alignment
Sequence 1: G K N
Sequence 2: T R N
Sequence 3: S H E

Column 1: G:T = 1, T:S = 1, G:S = 0 (sum 2)
Column 2: K:R = 2, R:H = 0, K:H = -1 (sum 1)
Column 3: N:N = 6, N:E = 0, N:E = 0 (sum 6)
Total: 2 + 1 + 6 = 9
Thus, reading the scores as base-2 log-odds, the alignment is 2^9 = 512 times more likely than by random chance

BLOSUM62 substitution matrix
   C  S  T  P  A  G  N  D  E  Q  H  R  K  M  I  L  V  F  Y  W
C  9 -1 -1 -3  0 -3 -3 -3 -4 -3 -3 -3 -3 -1 -1 -1 -1 -2 -2 -2
S -1  4  1 -1  1  0  1  0  0  0 -1 -1  0 -1 -2 -2 -2 -2 -2 -3
T -1  1  4  1 -1  1  0  1  0  0  0 -1  0 -1 -2 -2 -2 -2 -2 -3
P -3 -1  1  7 -1 -2 -1 -1 -1 -1 -2 -2 -1 -2 -3 -3 -2 -4 -3 -4
A  0  1 -1 -1  4  0 -1 -2 -1 -1 -2 -1 -1 -1 -1 -1 -2 -2 -2 -3
G -3  0  1 -2  0  6 -2 -1 -2 -2 -2 -2 -2 -3 -4 -4  0 -3 -3 -2
N -3  1  0 -2 -2  0  6  1  0  0 -1  0  0 -2 -3 -3 -3 -3 -2 -4
D -3  0  1 -1 -2 -1  1  6  2  0 -1 -2 -1 -3 -3 -4 -3 -3 -3 -4
E -4  0  0 -1 -1 -2  0  2  5  2  0  0  1 -2 -3 -3 -3 -3 -2 -3
Q -3  0  0 -1 -1 -2  0  0  2  5  0  1  1  0 -3 -2 -2 -3 -1 -2
H -3 -1  0 -2 -2 -2  1  1  0  0  8  0 -1 -2 -3 -3 -2 -1  2 -2
R -3 -1 -1 -2 -1 -2  0 -2  0  1  0  5  2 -1 -3 -2 -3 -3 -2 -3
K -3  0  0 -1 -1 -2  0 -1  1  1 -1  2  5 -1 -3 -2 -3 -3 -2 -3
M -1 -1 -1 -2 -1 -3 -2 -3 -2  0 -2 -1 -1  5  1  2 -2  0 -1 -1
I -1 -2 -2 -3 -1 -4 -3 -3 -3 -3 -3 -3 -3  1  4  2  1  0 -1 -3
L -1 -2 -2 -3 -1 -4 -3 -4 -3 -2 -3 -2 -2  2  2  4  3  0 -1 -2
V -1 -2 -2 -2  0 -3 -3 -3 -2 -2 -3 -3 -2  1  3  1  4 -1 -1 -3
F -2 -2 -2 -4 -2 -3 -3 -3 -3 -3 -1 -3 -3  0  0  0 -1  6  3  1
Y -2 -2 -2 -3 -2 -3 -2 -3 -2 -1  2 -2 -2 -1 -1 -1 -1  3  7  2
W -2 -3 -3 -4 -3 -2 -4 -4 -3 -2 -2 -3 -3 -1 -3 -2 -3  1  2 11
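The sum-of-pairs arithmetic in the worked example can be sketched in Python as follows. The `PAIR_SCORES` table holds only the pair scores quoted in the example (it is not a full substitution matrix), gaps are not handled, and the function names are hypothetical.

```python
from itertools import combinations

# Pair scores copied from the worked example above; not a complete matrix.
PAIR_SCORES = {
    ("G", "T"): 1, ("T", "S"): 1, ("G", "S"): 0,
    ("K", "R"): 2, ("R", "H"): 0, ("K", "H"): -1,
    ("N", "N"): 6, ("N", "E"): 0,
}

def pair_score(a, b):
    """Look up a symmetric pair score, whichever order it was stored in."""
    if (a, b) in PAIR_SCORES:
        return PAIR_SCORES[(a, b)]
    return PAIR_SCORES[(b, a)]

def column_scores(alignment):
    """Sum the scores of all residue pairs within each alignment column."""
    return [sum(pair_score(a, b) for a, b in combinations(column, 2))
            for column in zip(*alignment)]

def sp_score(alignment):
    """Sum-of-pairs score: the total over all columns."""
    return sum(column_scores(alignment))

alignment = ["GKN", "TRN", "SHE"]
print(column_scores(alignment), sp_score(alignment))
```

For three sequences each column contributes three pairs; for N sequences it contributes N(N-1)/2 pairs.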
Exhaustive Algorithms
Brute Force Algorithm
•Similar to the dynamic programming algorithms used for pairwise alignment: searches for the best solution by examining every possible solution
•Pairwise alignment uses a 2D matrix
•For N sequences, an N-dimensional matrix is used
•The number of calculations increases exponentially with the number of sequences (on the order of L×L×L×… = L^N cells for N sequences of length L)
•Generally only useful for <=10 short sequences
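To make the exponential growth concrete, the Python sketch below counts the cells of the N-dimensional dynamic programming matrix and the number of neighbouring cells each cell must consult. The function names are hypothetical; the formulas follow from the standard full-lattice formulation, with one extra row per dimension for the leading gap.

```python
def lattice_cells(lengths):
    """Number of cells in the DP matrix: one axis of size (length + 1) per sequence."""
    cells = 1
    for n in lengths:
        cells *= n + 1
    return cells

def predecessors_per_cell(n_sequences):
    """Each cell is reached from 2^N - 1 neighbours (every non-empty subset of axes steps back)."""
    return 2 ** n_sequences - 1

for n in (2, 3, 5, 10):
    print(n, lattice_cells([100] * n), predecessors_per_cell(n))
```

Two sequences of length 100 need about ten thousand cells, three need about a million, and ten need roughly 10^20, which is why exhaustive alignment is limited to a handful of short sequences.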
Divide and Conquer Alignment (DCA)
•Identify regional similarities in multiple sequences
•Do a brute force alignment of the similar regions
•Join the independently aligned regions
•http://bibiserv.techfak.uni-bielefeld.de/dca/
Heuristic Algorithm
Progressive Alignment Method
•All pairs of sequences are aligned pairwise by Needleman-Wunsch
•Similarity scores of the aligned pairs are recorded
•Scores are entered into a matrix
•A guide tree is constructed that reflects the similarity between aligned pairs
•The most closely related sequences are re-aligned with Needleman-Wunsch
•Different substitution matrices are selected depending on the evolutionary distance between the sequences to be aligned
•The aligned pair is converted to a “consensus sequence” with fixed gaps
•The consensus sequence is treated as an ordinary sequence for the next step, which is pairwise alignment with the most closely related sequence in the guide tree
•The next “consensus sequence” is calculated and the process repeated until all sequences are aligned
•Most famous: ClustalW (command line), ClustalX (GUI)
•http://www.ebi.ac.uk/Tools/clustalw2/index.html
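The first steps of the progressive method (all-against-all Needleman-Wunsch scores, a score matrix, and the choice of the closest pair to align first) can be sketched in Python as below. This is a simplified illustration and not ClustalW itself: the match/mismatch/gap values are assumptions, real programs use substitution matrices and affine gap penalties, and the sequences and function names are hypothetical.

```python
from itertools import combinations

def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def pairwise_scores(seqs):
    """Score every pair of sequences; keys are (name, name) tuples."""
    return {(x, y): nw_score(seqs[x], seqs[y])
            for x, y in combinations(sorted(seqs), 2)}

def closest_pair(scores):
    """The highest-scoring pair is the one a guide tree would join first."""
    return max(scores, key=scores.get)

# Hypothetical toy sequences.
seqs = {"s1": "GKNA", "s2": "GKNC", "s3": "TRHE"}
scores = pairwise_scores(seqs)
print(scores, closest_pair(scores))
```

A full implementation would go on to build the guide tree from these scores and then merge sequences and consensus sequences in the order the tree dictates.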
Download and install ClustalW from
ftp://ftp.ebi.ac.uk/pub/software/clustalw2/2.0.9/
Spend a few minutes entering sequences and doing alignments
•ClustalW uses gap penalties that are context sensitive:
  •Gaps are penalized more heavily close to runs of hydrophobic amino acids (more likely to be in internal, conserved regions of a protein) than next to hydrophilic regions or glycine (G), which are likely to be on the outside in loops
•Weighting scheme: closely related sequences are given a lower weighting score
  •The weighting score depends on the branch length divided by the number of sequences sharing that branch
  •This minimizes a possible dominating effect of common sequences
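The weighting idea can be sketched in Python as below, under the assumption that each sequence's weight is the sum, along its path to the root, of each branch length divided by the number of sequences sharing that branch. The nested-tuple tree encoding, the branch lengths, and the function names are hypothetical and chosen only for illustration.

```python
def leaf_names(node):
    """Collect the leaf names below a node; a node is (label_or_children, branch_length)."""
    label, _length = node
    if isinstance(label, str):
        return [label]
    names = []
    for child in label:
        names.extend(leaf_names(child))
    return names

def sequence_weights(node, weights=None, is_root=True):
    """Give each leaf its share of every branch on its path to the root."""
    if weights is None:
        weights = {}
    label, length = node
    if not is_root:  # the root has no branch above it
        leaves = leaf_names(node)
        for name in leaves:
            weights[name] = weights.get(name, 0.0) + length / len(leaves)
    if not isinstance(label, str):
        for child in label:
            sequence_weights(child, weights, is_root=False)
    return weights

# Hypothetical guide tree: A and B are close relatives, C is more distant.
tree = ([([("A", 0.1), ("B", 0.1)], 0.3), ("C", 0.4)], 0.0)
weights = sequence_weights(tree)
print(weights)
```

A and B split the branch they share and end up with about 0.25 each, while C keeps its whole branch and gets 0.4, so the two close relatives do not dominate the alignment.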
Drawbacks and Solutions
•Based on global alignment, so only sequences of similar length can be aligned well
•The long gaps required to align sequences of dissimilar length are penalized
•“Greedy” algorithm: once gaps are introduced, they stay in subsequent consensus sequences
T-Coffee
•Tree-based Consistency Objective Function for alignment Evaluation
•http://www.ebi.ac.uk/Tools/t-coffee/
•http://tcoffee.vital-it.ch/cgi-bin/Tcoffee/tcoffee_cgi/index.cgi
•Performs global pairwise alignment with Clustal
•Performs local pairwise alignment with Lalign
•Global and ten best local alignments are pooled to form a library
•Each pairwise alignment in the library is then compared with each possible third sequence to check its consistency
•Distance matrix calculated to build a guide tree
•Guide tree used for final multiple alignment
•Does not get “stuck” in sub-optimal initial alignments
•Slower than Clustal
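The consistency step can be sketched in Python as a "library extension": the weight for aligning residue x with residue y is its direct weight plus, for every residue z in a third sequence, the smaller of the weights x-z and z-y. This is a minimal sketch of that idea and not T-Coffee's code. The residue encoding `(sequence name, position)`, the toy weights, and the function name are all hypothetical.

```python
from collections import defaultdict

def extend_library(primary):
    """Add support through third sequences to each residue-pair weight."""
    weights = defaultdict(dict)
    for (x, y), w in primary.items():  # store both directions for lookup
        weights[x][y] = w
        weights[y][x] = w

    extended = {}
    for x in list(weights):
        # Candidate partners: direct partners plus partners of partners.
        candidates = set(weights[x])
        for z in weights[x]:
            candidates.update(weights[z])
        for y in candidates:
            if y[0] == x[0] or x > y:  # skip same sequence; keep one orientation
                continue
            score = weights[x].get(y, 0)
            for z, w_xz in weights[x].items():
                if z[0] in (x[0], y[0]):  # z must come from a third sequence
                    continue
                score += min(w_xz, weights[z].get(y, 0))
            extended[(x, y)] = score
    return extended

# Hypothetical primary library: residues are (sequence name, position).
primary = {
    (("A", 0), ("B", 0)): 88,
    (("A", 0), ("C", 0)): 77,
    (("B", 0), ("C", 0)): 100,
    (("A", 1), ("C", 1)): 50,
    (("C", 1), ("B", 2)): 60,
}
extended = extend_library(primary)
print(extended[(("A", 0), ("B", 0))], extended[(("A", 1), ("B", 2))])
```

The pair A0-B0 gains support because both residues also align with C0, and A1-B2 receives a weight even though the two residues were never aligned directly, which is how consistency information helps avoid early errors.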
dbClustal
•First performs a BLASTP search for a query sequence
•Aligned pairs are analyzed to obtain anchor points (locally conserved regions) using a program called Ballast
•Global alignment generated by Clustal, weighted toward the anchor points
•Initial local alignment minimizes errors in divergent sequences
•Multiple alignment subsequently evaluated by NorMD, which removes poorly aligned sequences
•http://bips.u-strasbg.fr/PipeAlign/jump_to.cgi?DbClustal+noid
Partial Order Alignment (POA)
•http://bioinformatics.ucla.edu/poa/
•The multiple alignment is built up by adding more and more sequences from a list
•Aligned identical residues are condensed into nodes of a graph
•Each new sequence is aligned with each sequence of the graph model
•Eliminates the problem of error fixation
•Faster and more accurate than Clustal
PRALINE
•http://zeus.cs.vu.nl/programs/pralinewww/
•Builds profiles of the sequences to be aligned
•Profiles are generated by PSI-BLAST
•Because profiles contain information on close relatives, divergent sequences are more accurately aligned
•Program can incorporate protein secondary structure information
•Very sophisticated but very slow
Iterative Alignment
PRRN
•Finds an optimal solution by iteratively modifying sub-optimal solutions
•http://prrn.ims.u-tokyo.ac.jp/
•Multiple alignment is performed on the whole group of sequences
•Sequences are randomly distributed into two groups
•Dynamic programming is applied to consensus sequences derived from each group
•The random split is repeated and another round of dynamic programming alignment performed
•This is repeated until the alignment score no longer increases
•A multiple alignment of the sequences is then performed again
•Process repeated until the multiple alignment score no longer improves
Iterative Alignment
DIALIGN2
•http://mobyle.pasteur.fr/cgi-bin/MobylePortal/portal.py?form=dialign
•Breaks all sequences down into segments, and performs alignment between segments
•High-scoring segments are progressively assembled into larger and larger blocks
•The score of an alignment is calculated from the blocks and not from individual residues
•Sequence regions between blocks are left unaligned
•Well suited to the alignment of divergent sequences
Practical Issues
•DNA alignments are based on only 4 nucleotides, and are less reliable than protein sequence alignments
•Alignment of DNA sequences does not consider functional issues, such as gene boundaries
•Insertion of gaps may “break” codons or cause frameshifts that would not be tolerated in the protein and make no functional sense
•Thus, for coding sequences it is better to align protein sequences
•Possible to convert DNA to amino acid sequence, then align, and then decode back to DNA
•RevTrans (http://www.cbs.dtu.dk/services/RevTrans/)
•PROTA2DNA (missing link…)
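The "decode back to DNA" step can be sketched in Python as below: each residue in the aligned protein is replaced by its original codon and each protein gap by three DNA gaps, so codons are never broken. This is only an illustration of the idea, not the RevTrans implementation; the function name is hypothetical, and the DNA is assumed to be in frame with exactly one codon per residue (no stop codon).

```python
def thread_codons(aligned_protein, dna, gap="-"):
    """Map a gapped protein row back onto its coding DNA, codon by codon."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(dna) % 3 != 0 or len(codons) != n_residues:
        raise ValueError("DNA must supply exactly one codon per residue")
    remaining = iter(codons)
    out = []
    for aa in aligned_protein:
        # A protein gap becomes a three-nucleotide gap; a residue takes its codon.
        out.append(gap * 3 if aa == gap else next(remaining))
    return "".join(out)

# Hypothetical example: the aligned protein row "M-K" for coding DNA ATGAAA.
print(thread_codons("M-K", "ATGAAA"))  # ATG---AAA
```

Applying the function to every row of a protein alignment yields a DNA alignment whose gaps always fall between codons.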
Editing and Format
•Most alignments require final editing by a human to ensure that they make functional sense
  •Finding badly aligned regions
  •Removing nonsensical gaps, etc.
•BioEdit: http://www.mbio.ncsu.edu/bioEdit/bioedit.html
•Sequences often need to be converted from one format to another: http://iubio.bio.indiana.edu/cgi-bin/readseq.cgi/