Scoring Matrices for Sequence Comparisons
DEKM book
Notes from Dr. Bino John and Dr. Takis Benos
Why compare sequences?
• Given a new sequence, infer its function based on similarity to another sequence
• Find important molecular regions – conserved across species
Sequence -> Structure -> Function
[Figure: SEQUENCE (? MALRAK…) -> STRUCTURE -> FUNCTION; example: cytochrome protein]
Query:   1 MLAKGLPPRSVLVKGYQTFLSAPREGLGRLRVPTGEGAGISTRSPRPFNEIPSPGDNGWL 60
           MLA+GL RSVLVKG Q FLSAPRE G RV TGEGA IST++PRPF+EIPSPGDNGW+
Sbjct:   1 MLARGLALRSVLVKGCQPFLSAPRECPGHPRVGTGEGACISTKTPRPFSEIPSPGDNGWI 60

Query:  61 NLYHFWRETGTHKVHLHHVQNFQKYGPIYREKLGNVESVYVIDPEDVALLFKSEGPNPER 120
           NLY FW+E GT K+H HHVQNFQKYGPIYREKLGN+ESVY+IDPEDVALLFK EGPNPER
Sbjct:  61 NLYRFWKEKGTQKIHYHHVQNFQKYGPIYREKLGNLESVYIIDPEDVALLFKFEGPNPER 120

Query: 121 FLIPPWVAYHQYYQRPIGVLLKKSAAWKKDRVALNQEVMAPEATKNFLPLLDAVSRDFVS 180
           + IPPWVAYHQ+YQ+P+GVLLKKS AWKKDR+ LN EVMAPEA KNF+PLLD VS+DFV
Sbjct: 121 YNIPPWVAYHQHYQKPVGVLLKKSGAWKKDRLVLNTEVMAPEAIKNFIPLLDTVSQDFVG 180

Query: 181 VLHRRIKKAGSGNYSGDISDDLFRFAFESITNVIFGERQGMLEEVVNPEAQRFIDAIYQM 240
           VLHRRIK+ GSG +SGDI +DLFRFAFESITNVIFGER GMLEE+V+PEAQ+FIDA+YQM
Sbjct: 181 VLHRRIKQQGSGKFSGDIREDLFRFAFESITNVIFGERLGMLEEIVDPEAQKFIDAVYQM 240

Query: 241 FHTSVPMLNLPPDLFRLFRTKTWKDHVAAWDVIFSKADIYTQNFYWELRQKGSVHHDYRG 300
           FHTSVPMLNLPPDLFRLFRTKTW+DHVAAWD IF+KA+ YTQNFYW+LR+K ++Y G
Sbjct: 241 FHTSVPMLNLPPDLFRLFRTKTWRDHVAAWDTIFNKAEKYTQNFYWDLRRKRE-FNNYPG 299

Query: 301 MLYRLLGDSKMSFEDIKANVTEMLAGGVDTTSMTLQWHLYEMARNLKVQDMLRAEVLAAR 360
           +LYRLLG+ K+ ED+KANVTEMLAGGVDTTSMTLQWHLYEMAR+L VQ+MLR EVL AR
Sbjct: 300 ILYRLLGNDKLLSEDVKANVTEMLAGGVDTTSMTLQWHLYEMARSLNVQEMLREEVLNAR 359

Query: 361 HQAQGDMATMLQLVPLLKASIKETLRLHPISVTLQRYLVNDLVLRDYMIPAKTLVQVAIY 420
           QAQGD + MLQLVPLLKASIKETLRLHPISVTLQRYLVNDLVLRDYMIPAKTLVQVA+Y
Sbjct: 360 RQAQGDTSKMLQLVPLLKASIKETLRLHPISVTLQRYLVNDLVLRDYMIPAKTLVQVAVY 419

Query: 421 ALGREPTFFFDPENFDPTRWLSKDKNITYFRNLGFGWGVRQCLGRRIAELEMTIFLINML 480
           A+GR+P FF +P FDPTRWL K++++ +FRNLGFGWGVRQC+GRRIAELEMT+FLI++L
Sbjct: 420 AMGRDPAFFSNPGQFDPTRWLGKERDLIHFRNLGFGWGVRQCVGRRIAELEMTLFLIHIL 479

Query: 481 ENFRVEIQHLSDVGTTFNLILMPEKPISFTFWPFNQEATQ 520
           ENF+VE+QH SDV T FNLILMP+KPI F PFNQ+ Q
Sbjct: 480 ENFKVELQHFSDVDTIFNLILMPDKPIFLVFRPFNQDPLQ 519
Human (C11A_HUMAN; P05108) vs. pig (C11A_PIG; P10612)
Important molecular regions conserved across species
Query:  34 TGEGAGISTRSPRPFNEIPSPGDNGWLNLYHFWRETGTHKVHLHHVQNFQKYGPIYREKL 93
           T G + +PFN+IP N L++ F + G VH V NF+ +GPIYREK+
Sbjct:  27 TRSGRAPQNSTVQPFNKIPGRWRNSLLSVLAFTKMGGLRNVHRIMVHNFKTFGPIYREKV 86

Query:  94 GNVESVYVIDPEDVALLFKSEGPNPERFLIPPWVAYHQYYQRPIGVLLKKSAAWKKDRVA 153
           G +SVY+I PED A+LFK+EG +P R + W AY Y + GVLLK+ AWK DR+
Sbjct:  87 GIYDSVYIIKPEDGAILFKAEGHHPNRINVDAWTAYRDYRNQKYGVLLKEGKAWKTDRMI 146

Query: 154 LNQEVMAPEATKNFLPLLDAVSRDFVSVLHRRIKKAGSGNYSGDISDDLFRFAFESITNV 213
           LN+E++ P+ F+PLLD V +DFV+ ++++I+++G ++ D++ DLFRF+ ES++ V
Sbjct: 147 LNKELLLPKLQGTFVPLLDEVGQDFVARVNKQIERSGQKQWTTDLTHDLFRFSLESVSAV 206

Query: 214 IFGERQGMLEEVVNPEAQRFIDAIYQMFHTSVPMLNLPPDLFRLFRTKTWKDHVAAWDVI 273
           ++GER G+L + ++PE Q FID + MF T+ PML LPP L R + WK+HV AWD I
Sbjct: 207 LYGERLGLLLDNIDPEFQHFIDCVSVMFKTTSPMLYLPPGLLRSIGSNIWKNHVEAWDGI 266

Query: 274 FSKADIYTQNFYWELRQKGSVHHDYRGMLYRLLGDSKMSFEDIKANVTEMLAGGVDTTSM 333
           F++AD QN + + ++ + Y G+L LL K+S EDIKA+VTE++AGGVD+ +
Sbjct: 267 FNQADRCIQNIFKQWKENPEGNGKYPGVLAILLMQDKLSIEDIKASVTELMAGGVDSVTF 326

Query: 334 TLQWHLYEMARNLKVQDMLRAEVLAARHQAQGDMATMLQLVPLLKASIKETLRLHPISVT 393
           TL W LYE+AR +QD LRAE+ AAR +GDM M++++PLLKA++KETLRLHP++++
Sbjct: 327 TLLWTLYELARQPDLQDELRAEISAARIAFKGDMVQMVKMIPLLKAALKETLRLHPVAMS 386

Query: 394 LQRYLVNDLVLRDYMIPAKTLVQVAIYALGREPTFFFDPENFDPTRWLSKDKNITYFRNL 453
           L RY+ D V+++Y IPA TLVQ+ +YA+GR+ FF PE + P+RW+S ++ YF++L
Sbjct: 387 LPRYITEDTVIQNYHIPAGTLVQLGVYAMGRDHQFFPKPEQYCPSRWISSNRQ--YFKSL 444

Query: 454 GFGWGVRQCLGRRIAELEMTIFLINMLENFRVEIQHLSDVGTTFNLILMPEKPISFTFWP 513
           GFG+G RQCLGRRIAE EM IFLI+MLENFR+E Q +V + F L+LMPEKPI T P
Sbjct: 445 GFGFGPRQCLGRRIAETEMQIFLIHMLENFRIEKQKQIEVRSKFELLLMPEKPIILTIKP 504

Query: 514 FN 515
           N
Sbjct: 505 LN 506
Human (C11A_HUMAN; P05108) vs. zebrafish (Cyp11a1; Q8JH93)
Important molecular regions conserved across species
Why compare sequences? Do more..
• Determine the evolutionary constraints at work
• Find mutations in a population or family of genes
• Find similar-looking sequences in a database
• Find the secondary/tertiary structure of a sequence of interest
  – molecular modeling using a template (homology modeling)
How to compare sequences?
• We need to compare DNA or protein sequences and estimate their distances, e.g., to infer molecular function by finding similarity to a sequence of known function
• => we need a ‘good alignment’ of sequences
• => we need a scoring system: a measure for judging the quality of an alignment in relation to other possible alignments
Sequence alignment
• Are two sequences related?
  – Align sequences or parts of them
  – Decide whether the alignment arose by chance or is evolutionarily linked
• Issues:
  – What sorts of alignments to consider?
  – How to score an alignment and hence rank it?
  – Algorithm to find good alignments
  – Evaluate the significance of the alignment
Aligning two sequences
• Problem: Given two strings, S and T, find their “best” arrangement (global alignment) or the two largest substrings, s and t, with maximum similarity (local alignment).
• Aligned residue states:
  – Match
  – Mismatch
  – Gap (insertion/deletion)
• We need:
  – Scoring scheme for “similarity”
  – Alignment algorithm
Global alignment
• Given two strings, S and T, find their relative alignment with the highest “score”.
Seq. #1: G A A T T C A G T T A
Seq. #2: G G A T C G A
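A minimal sketch of how the highest global alignment score could be found by dynamic programming (the Needleman-Wunsch recurrence). The scoring values (match +1, mismatch -1, linear gap -2) and the function name `global_score` are assumptions chosen for illustration; they are not taken from the slides.

```python
def global_score(s, t, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of s and t under a linear gap penalty."""
    # prev[j] = best score for aligning the already processed prefix of s with t[:j]
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [i * gap]  # s[:i] aligned against an empty prefix of t
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            up = prev[j] + gap        # s[i-1] aligned to a gap
            left = curr[j - 1] + gap  # t[j-1] aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


print(global_score("GAATTCAGTTA", "GGATCGA"))
```

Only the score is returned here; recovering the alignment itself would need a traceback through the full table, which this sketch leaves out to stay short.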
Sequence comparison
• Look for evidence that the sequences had a common ancestor
• Mutations and selection cause divergence between sequences
• Mutational processes:
  – Substitutions, which change residues in a sequence
  – Insertions or deletions (indels), which add or remove residues
• Selection screens mutations, so that some substitutions occur more often than others
Global alignment (cntd)
G A A T T C A G T T A
|   | |     |
G G A T C G A
4 matches, 3 mism., 0 gaps

  G A A T T C A G T T A
  | |
G G A T C G A
2 matches, 4 mism., 0 gaps

G A A T T C - A G T T A
|   |   | |   |
G G A - T C G A
5 matches, 1 mism., 2 gaps
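The three tallies can be checked with a short sketch. It assumes the counts cover only the columns where the two sequences overlap (overhanging ends are not counted as gaps), and the helper name `tally` is hypothetical.

```python
def tally(a, b):
    """Count matches, mismatches and gap columns in two equal-length aligned strings."""
    assert len(a) == len(b)
    matches = mismatches = gaps = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            gaps += 1
        elif x == y:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps


print(tally("GAATTCA", "GGATCGA"))    # both sequences start together -> (4, 3, 0)
print(tally("GAATTC", "GATCGA"))      # Seq. #2 shifted one position left -> (2, 4, 0)
print(tally("GAATTC-A", "GGA-TCGA"))  # gapped alignment -> (5, 1, 2)
```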
Alignment: the problem (cntd)
Scoring schemes: three possible situations…
• Match
• Mismatch
• Gap, with a penalty that can be:
  – Linear
  – Convex
  – Affine
How much??
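A sketch of how the three gap-penalty shapes differ as a gap grows. The parameter values and the logarithmic form used for the convex case are assumptions for illustration only; penalties are returned as positive costs to subtract from the alignment score.

```python
import math


def linear_gap(length, d=2.0):
    """Linear: every gap position costs the same amount d."""
    return d * length


def affine_gap(length, open_=3.0, extend=1.0):
    """Affine: the first position costs open_, each further one costs extend."""
    return 0.0 if length == 0 else open_ + extend * (length - 1)


def convex_gap(length, open_=3.0, scale=1.0):
    """Convex (here logarithmic): each extra position costs less than the one before."""
    return 0.0 if length == 0 else open_ + scale * math.log(length)


for n in (1, 2, 5, 10):
    print(n, linear_gap(n), affine_gap(n), round(convex_gap(n), 2))
```

With `open_` larger than `extend`, the affine form favors one long gap over several short ones, which is the usual motivation for preferring it to a linear penalty.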
Sequence Alignment
• Methods to align
  – short genome segments
  – database of sequences
  – whole genomes and chromosomes
  – in the presence of large-scale rearrangements
• Approximations for speed and storage
Things to learn..
• Compare different scoring schemes
• Techniques for obtaining best scoring alignment of a given type
• Reduce computational burden
• Speed up database search techniques
• Evaluate approximations used in common database search programs
• Techniques for aligning DNA and protein sequences together
• Identify sequences of low complexity
• Identify significant alignments based on their score
• Techniques for aligning complex genomic sequences
Circular logic in alignment and scoring
How do we know what the right distance is without a good alignment? How do we construct a good alignment without knowing what substitutions were made previously?
Fewer gaps but 7 matches
More gaps but 8 matches
Which is better?
Substitution matrices for proteins
• We need to compare DNA or protein sequences and estimate their distances, e.g., to infer molecular function by finding similarity to a sequence of known function
• => we need a ‘good alignment’ of sequences
• => we need a scoring system: a measure for judging the quality of an alignment in relation to other possible alignments
Additive Scoring Systems
• Look at each position of a given alignment, and give a score for the ‘quality of the match’. The total (cumulative) score is the sum over all individual scores.
• e.g. two DNA sequences; match = +1; mismatch = -1; gap open = -3; gap extension = -1
• Cumulative score = 7 – 4 = 3
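A sketch of an additive score with the values from the example above. The aligned pair of strings is hypothetical (the slide's own alignment is not in the transcript); it was chosen so that 7 matches and one gap of length 2 reproduce 7 – 4 = 3. The sketch assumes the first column of a gap costs the gap-open value and each further column of the same gap costs the gap-extension value.

```python
def additive_score(a, b, match=1, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Sum per-column scores of two equal-length aligned strings ('-' marks a gap)."""
    assert len(a) == len(b)
    total = 0
    prev_gap = None  # which sequence carried a gap in the previous column, if any
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            # Assumed convention: opening a gap costs gap_open,
            # continuing the same gap costs gap_extend.
            total += gap_extend if side == prev_gap else gap_open
            prev_gap = side
        else:
            total += match if x == y else mismatch
            prev_gap = None
    return total


# Hypothetical alignment: 7 matches, one gap of length 2 -> 7 + (-3 - 1) = 3
print(additive_score("ACGTACGTA", "ACG--CGTA"))
```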
Scoring or substitution matrices
• Score(i, j) => the score given when AAs or nts i and j are aligned at any position
• DNA = 4 x 4
• Protein = 20 x 20
Base mutations (general): definitions
Purines: A, G
Pyrimidines: C, T
Transitions: purine <-> purine (A <-> G) or pyrimidine <-> pyrimidine (C <-> T)
Transversions: purine <-> pyrimidine
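The definitions can be written as a small classifier. This is a sketch; the function name `substitution_type` is made up for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(x, y):
    """Classify a pair of aligned bases as identity, transition or transversion."""
    if x == y:
        return "identity"
    # A transition stays within one chemical class; a transversion crosses classes.
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(substitution_type("A", "G"))  # transition
print(substitution_type("C", "T"))  # transition
print(substitution_type("A", "C"))  # transversion
```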
Nucleic acid Distance Matrices
Nucleotide substitution matrices.

Identity
     A   T   C   G
A    1   0   0   0
T    0   1   0   0
C    0   0   1   0
G    0   0   0   1

BLAST
     A   T   C   G
A    5  -4  -4  -4
T   -4   5  -4  -4
C   -4  -4   5  -4
G   -4  -4  -4   5

Transition/Transversion (transitions: A <-> G, C <-> T)
     A   T   C   G
A    0   5   5   1
T    5   0   1   5
C    5   1   0   5
G    1   5   5   0
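A sketch that applies the three matrices to one ungapped pair of sequences. The cell values follow the matrices on this slide; the function names and the choice of test sequences are assumptions for illustration. Note that the first two are similarity scores (higher is better) while the third is a distance (lower is better).

```python
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def identity(x, y):
    return 1 if x == y else 0


def blast(x, y):
    return 5 if x == y else -4


def ts_tv_distance(x, y):
    """Distance: 0 for identical bases, 1 for a transition, 5 for a transversion."""
    if x == y:
        return 0
    return 1 if frozenset((x, y)) in TRANSITIONS else 5


def total(a, b, cell):
    """Sum a per-column cell function over two ungapped, equal-length sequences."""
    return sum(cell(x, y) for x, y in zip(a, b))


a, b = "GAATTCA", "GGATCGA"
print(total(a, b, identity))        # 4
print(total(a, b, blast))           # 8
print(total(a, b, ts_tv_distance))  # 7
```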
Finding biologically meaningful scores
• DNA: simple scoring matrices are often effective
  – usually don’t study very diverged DNA sequences
• Proteins: some substitutions are more likely to occur than others because chemical properties are similar, e.g., isoleucine for valine, serine for threonine (conservative substitutions)
  – To get better alignments, use scoring matrices derived from statistical analysis of protein data
Conservative substitutions
Slide courtesy: Serafim Batzoglou
Specs for deriving biologically meaningful scores
• Identically aligned AAs should have a greater score than any substitution
• Conservative substitutions should score higher than non-conservative ones
• Scores should reflect evolutionary distances
  – Mouse and Rat => very similar sequences
  – Mouse and Yeast => divergent sequences
  – => scoring matrices are a function of evolutionary distance between sequences
Sample substitution matrix
Two frequently used matrices
• PAM (point accepted mutation) family
  – Markov chains and phylogenetic trees for fitting evolutionary model
  – Log-likelihood ratios for getting a score from an estimated transition matrix
• BLOSUM family
  – Log-likelihood ratios for getting a score from an estimated transition matrix
1. log-odds scoring
• What are the odds that an alignment is biologically meaningful?
• Random model: product of chance events
• Non-random model: two sequences derived from a common ancestor
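A sketch of the comparison between the two models for an ungapped alignment. Under the random model each aligned pair occurs with probability q_x * q_y; under the non-random model it occurs with a joint probability p_xy; the score of a column is the log of their ratio, and column scores add up. All numbers below (uniform background, 0.22 for identical pairs, 0.01 for each mismatched pair) are hypothetical toy values, not taken from the slides.

```python
import math

# Hypothetical toy model for DNA:
Q = {base: 0.25 for base in "ACGT"}  # background frequency of each base


def pair_prob(x, y):
    """Assumed joint probability of x aligned with y in related sequences."""
    return 0.22 if x == y else 0.01  # 4 * 0.22 + 12 * 0.01 = 1.0


def log_odds(x, y):
    """log2 of (non-random model probability) / (random model probability)."""
    return math.log2(pair_prob(x, y) / (Q[x] * Q[y]))


def alignment_log_odds(a, b):
    """Independent columns: the log of a product becomes a sum of column scores."""
    return sum(log_odds(x, y) for x, y in zip(a, b))


print(round(log_odds("A", "A"), 2), round(log_odds("A", "C"), 2))
print(round(alignment_log_odds("GAATTCA", "GGATCGA"), 2))
```

A positive total favors the common-ancestor model and a negative total favors chance, which is why additive scoring matrices are usually built from log-odds values.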
2. log-odds scoring
3. log-odds scoring
Point/Percent Accepted Mutations (PAM)
• PAM Markov transition matrix
  – Table of estimated transition probabilities for the underlying evolutionary model
• PAM substitution matrix
  – Table of scores for all possible pairs of AAs
PAM Model
• Each site evolves as a Markov chain
• Independent of others
• All Markov chains have the same transition matrix
• Dayhoff et al (1978) estimated one-step transitions
PAM1 transition matrix
• Expect 1% of the AAs to undergo accepted point mutations.
  – Align protein sequences that are at least 85% identical
  – Reconstruct phylogenetic trees and infer ancestral sequences
  – Count AA replacements that occur along the tree, i.e. count mutations accepted by natural selection
  – Use the counts to estimate probabilities for replacements
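A simplified sketch of the last step: turning replacement counts into a one-step transition matrix scaled so that 1% of residues are expected to change. It uses a hypothetical 3-letter alphabet with made-up counts and frequencies, and it is a simplification of Dayhoff's procedure (no tree reconstruction, and mutability is taken directly from the counts), so it should be read as an illustration rather than the original method.

```python
# Hypothetical symmetric replacement counts and residue frequencies.
COUNTS = {
    "A": {"B": 30, "C": 10},
    "B": {"A": 30, "C": 20},
    "C": {"A": 10, "B": 20},
}
FREQS = {"A": 0.5, "B": 0.3, "C": 0.2}


def pam1_from_counts(counts, freqs, target=0.01):
    """Row-stochastic matrix m[a][b] = P(a is replaced by b in one PAM step)."""
    letters = sorted(freqs)
    total = sum(counts[a][b] for a in letters for b in letters if a != b)
    pam1 = {}
    for a in letters:
        # Off-diagonal entries are proportional to the observed counts and scaled
        # so that sum_a freqs[a] * P(a changes) equals the target (1%).
        row = {b: target * counts[a][b] / (freqs[a] * total)
               for b in letters if b != a}
        row[a] = 1.0 - sum(row.values())  # probability that a stays unchanged
        pam1[a] = row
    return pam1


pam1 = pam1_from_counts(COUNTS, FREQS)
expected_change = sum(FREQS[a] * (1.0 - pam1[a][a]) for a in FREQS)
print(round(expected_change, 6))
for a in sorted(pam1):
    print(a, {b: round(p, 5) for b, p in sorted(pam1[a].items())})
```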
From transition matrix to scores
PAMn substitution
PAM Matrices
• Family of matrices: PAM 80, PAM 120, PAM 250
• The number at the end indicates the evolutionary distance between the sequences on which the matrix is based
• Greater numbers denote greater distances
  – PAM250 is for more distant proteins than PAM80
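A sketch of how a matrix for distance n could be obtained from a one-step matrix: raise the PAM1 transition matrix to the n-th power, then convert each entry to a log-odds score. The 3-letter PAM1 matrix and frequencies below are hypothetical toy values, and the scaling `10 * log10` with rounding is assumed here as the usual PAM convention.

```python
import math

# Hypothetical 3-letter example: FREQS[b] is the background frequency of letter b,
# PAM1[a][b] is the probability that letter a is replaced by letter b in one step.
FREQS = [0.5, 0.3, 0.2]
PAM1 = [
    [1 - 0.005 - 1 / 600, 0.005, 1 / 600],
    [1 / 120, 1 - 1 / 120 - 1 / 180, 1 / 180],
    [1 / 240, 1 / 120, 1 - 1 / 240 - 1 / 120],
]


def mat_mul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(m, n):
    """m raised to the n-th power (n >= 1) by repeated squaring."""
    result = None
    base = m
    while n > 0:
        if n & 1:
            result = base if result is None else mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result


def pam_scores(pam1, freqs, n):
    """Rounded 10*log10 odds of a pair under n PAM steps versus chance."""
    mn = mat_power(pam1, n)
    size = len(freqs)
    return [[round(10 * math.log10(mn[a][b] / freqs[b])) for b in range(size)]
            for a in range(size)]


for n in (1, 80, 250):
    print(n, pam_scores(PAM1, FREQS, n))
```

As n grows the diagonal probabilities shrink and the off-diagonal ones grow, so scores for substitutions become less negative, which matches the use of higher PAM numbers for more distant sequences.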
PAM250
PAM
Assumptions of the PAM model:
• Replacement at any site depends only on the a.a. on that site, given the mutability table.
• Sequences in the training set (and those compared) have average a.a. composition.
Sources of error in PAM
• Many proteins depart from the average a.a. composition.
• The a.a. composition can vary even within a protein (e.g. transmembrane proteins).
• A.a. positions are not “mutated” with equal probability, especially over long evolutionary distances.
Sources of error in PAM (cntd)
• A.a. residues are not equally prone to mutations.
• Rare replacements are observed too infrequently and…
• …errors in PAM1 are magnified in PAM250.
AA matrices: BLOSUM
Blocks Substitution Matrices (BLOSUM):
• Log-likelihood matrix (Henikoff & Henikoff, 1992)
• BLOCKS database of aligned sequences used as primary source set.
BLOSUM matrices
• Different BLOSUMn matrices are calculated independently from BLOCKS (ungapped local alignments)
• BLOSUMn is based on a cluster of BLOCKS of sequences that share at least n percent identity
• BLOSUM62 represents closer sequences than BLOSUM45
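A sketch of how log-odds scores could be derived from one ungapped block, in the style of the Henikoff and Henikoff pair-counting scheme: count residue pairs within each column, compare observed pair frequencies with those expected from the residue frequencies, and take 2 * log2 of the ratio. The toy block is hypothetical, the clustering by percent identity is left out, and pairs that never occur in the block simply get no score, so this is an illustration rather than a reproduction of the published procedure.

```python
import math
from collections import Counter
from itertools import combinations


def blosum_like_scores(block):
    """Half-bit log-odds scores for residue pairs observed in an ungapped block."""
    pair_counts = Counter()
    for column in zip(*block):
        for x, y in combinations(column, 2):
            pair_counts[tuple(sorted((x, y)))] += 1
    total = sum(pair_counts.values())
    observed = {pair: count / total for pair, count in pair_counts.items()}

    # Residue frequency implied by the pairs: identical pairs count fully,
    # mixed pairs contribute half to each of their two residues.
    residue = Counter()
    for (x, y), value in observed.items():
        if x == y:
            residue[x] += value
        else:
            residue[x] += value / 2
            residue[y] += value / 2

    scores = {}
    for (x, y), value in observed.items():
        expected = residue[x] * residue[y] if x == y else 2 * residue[x] * residue[y]
        scores[(x, y)] = round(2 * math.log2(value / expected))
    return scores


toy_block = ["AIVT", "AIVS", "AVVT", "ALIT"]  # hypothetical aligned segments
for pair, score in sorted(blosum_like_scores(toy_block).items()):
    print(pair, score)
```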
BLOSUM matrices designed to find conserved regions of proteins
• BLOCKS database contains a large number of ungapped multiple local alignments of conserved regions of proteins
• Alignments include distantly related sequences in which multiple base substitutions at the same position could be observed
• BLOCKS alignments used to derive BLOSUM matrices include sequences that are much less similar to each other than those used by Dayhoff, but whose evolutionary homology can be confirmed through intermediate sequences
• Alignments created without phylogenetic tree
• Direct comparison of aligned residues does not model real substitutions, because the sequences have evolved from a common ancestor and not from each other.
• Unfortunately, large variation in sequences prevents tree construction.
• If the alignment is correct, aligned residues will be related by their evolutionary history and the alignment is expected to contain useful info on substitution preferences.
• Also, direct sequence comparison can often be used for testing the “significance” of similarity
BLOSUM50 scoring matrix
Substitution matrices: a comparison
PAM vs BLOSUM
• PAM is based on closely related sequences, thus is biased for short evolutionary distances where the number of mutations is scalable
• PAM is based on globally aligned sequences, thus includes conserved and non-conserved positions; BLOSUM is based on conserved positions only
Substitution matrices: a comparison (cntd)
PAM vs BLOSUM (cntd)
• Lower PAM/higher BLOSUM matrices identify shorter local alignments of highly similar sequences
• Higher PAM/lower BLOSUM matrices identify longer local alignments of more distant sequences
Substitution matrices: a comparison (cntd)
PAM vs BLOSUM
• Matrices of choice:
  – BLOSUM62: the all-weather matrix
  – PAM250: for distant relatives